This tutorial covers the complete workflow for training models in Onyx Engine.
Complete Training Example
Here’s a full training script you can use as a starting point:
from onyxengine import Onyx
from onyxengine.modeling import (
    Output,
    Input,
    MLPConfig,
    TrainingConfig,
    AdamWConfig,
)
# Initialize the client (defaults to ONYX_API_KEY env var)
onyx = Onyx()
# Define model outputs and inputs
outputs = [
    Output(name='acceleration'),
]
inputs = [
    Input(name='velocity', parent='acceleration', relation='derivative'),
    Input(name='position', parent='velocity', relation='derivative'),
    Input(name='control_input'),
]
# Configure the model
model_config = MLPConfig(
    outputs=outputs,
    inputs=inputs,
    dt=0.0025,
    sequence_length=8,
    hidden_layers=3,
    hidden_size=64,
    activation='relu',
    dropout=0.2,
    bias=True
)
# Configure training
training_config = TrainingConfig(
    training_iters=3000,
    train_batch_size=1024,
    test_dataset_size=500,
    checkpoint_type='multi_step',
    optimizer=AdamWConfig(lr=3e-4, weight_decay=1e-2),
    lr_scheduler=None
)
# Start training
onyx.train_model(
    model_name='example_model',
    model_config=model_config,
    dataset_name='example_data',
    training_config=training_config,
)
To simplify hardware dynamics modeling, Onyx provides built-in tools for creating inputs/outputs with physical relationships, traceable naming, and feature scaling.
Outputs
Outputs are the features your model predicts. You can define multiple outputs and include parent and relation to derive outputs from other outputs. See Output for the full API.
from onyxengine.modeling import Output
outputs = [
    Output(name='jerk'),
    Output(name='acceleration', parent='jerk', relation='derivative'),
    Output(name='temperature_delta'),
]
Inputs
Inputs are the features fed into the model to make predictions. Similarly, you can define multiple inputs and include parent and relation to derive inputs from both outputs and other inputs. See Input for the full API.
from onyxengine.modeling import Input
inputs = [
    # Feed acceleration back to the model as an input
    Input(name='acceleration_feedback', parent='acceleration', relation='equal'),
    # Derive velocity and position from acceleration
    Input(name='velocity', parent='acceleration', relation='derivative'),
    Input(name='position', parent='velocity', relation='derivative'),
    # Sometimes delta values are more useful/intuitive than derivatives
    Input(name='temperature', parent='temperature_delta', relation='delta'),
    Input(name='control_input'),
]
Feature Relationships
The parent parameter specifies the feature to derive from. The relation parameter specifies the mathematical relationship:
| Relation | Equation | Use Case |
|---|---|---|
| 'derivative' | state[t+1] = state[t] + parent[t] * dt | Predicting/using time derivatives |
| 'delta' | state[t+1] = state[t] + parent[t] | Predicting more general deltas |
| 'equal' | state[t+1] = parent[t] | Feeding an output back as an input |
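To make the relations concrete, here is a minimal sketch (plain Python with made-up values, not the Engine's internal code) of what the 'derivative' relation computes when rolling a trajectory forward:
dt = 0.0025
acceleration = [1.0, 1.0, 0.5, 0.0]  # parent feature (e.g. a model output)
velocity = [0.0]                     # derived via relation='derivative'
position = [0.0]                     # derived from velocity the same way
for t in range(len(acceleration)):
    # state[t+1] = state[t] + parent[t] * dt
    velocity.append(velocity[t] + acceleration[t] * dt)
    position.append(position[t] + velocity[t] * dt)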
Feature Naming
By default, the name parameter will be used to find the input/output’s matching feature in the dataset when training the model. If you want to use a different name for the input/output, you can use the dataset_feature parameter.
# Model input name matches the dataset feature name
Input(name='encoder_position_radians')
# Model input 'position' uses the 'encoder_position_radians' dataset feature
Input(name='position', dataset_feature='encoder_position_radians')
# Dataset features can be used multiple times
Input(name='current_position', dataset_feature='encoder_position_radians')
Output(name='next_position_pred', dataset_feature='encoder_position_radians')
Model input/output names must be unique.
Feature Scaling
When training AI models, it’s important to scale inputs/outputs so that they are in similar ranges.
For example, if you were training a model with outputs Pressure [Pascal] and Torque [Newton-Meters], the pressure values (and therefore their loss/gradients) would be much larger than the torque values, which can encourage the model to focus on predicting pressure and potentially ignore torque.
To address this, it’s typical to scale all features to have a mean of 0 and standard deviation of 1, or range from [-1, 1] or [0, 1].
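For reference, the two common schemes look like this (a sketch of the standard formulas with made-up values, not Onyx internals):
import numpy as np

x = np.array([101325.0, 99000.0, 103500.0])  # e.g. pressure in Pascals

# Standardization: mean 0, std 1
x_standardized = (x - x.mean()) / x.std()

# Known physical bounds [x_min, x_max] mapped to [-1, 1]
x_min, x_max = 95000.0, 105000.0
x_bounded = 2.0 * (x - x_min) / (x_max - x_min) - 1.0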
Keeping track of the scaling factors of each feature for use in both training and deployment is prone to error, so Onyx automatically calculates the scaling factors and bundles them with each model for you:
- train_mean: The mean of the dataset feature during training
- train_std: The standard deviation of the dataset feature during training
- train_min: The minimum value of the dataset feature during training
- train_max: The maximum value of the dataset feature during training
- scale: The scaling method used, either "mean" or [min, max] (see the section below)
For example, a trained model's config includes the computed scaling factors:
{
  "name": "example_model",
  "type": "model",
  "created_at": "2026-01-20T19:56:25.969196+00:00",
  "config": {
    "type": "rnn",
    "outputs": [
      {
        "type": "output",
        "name": "acceleration",
        "dataset_feature": "acceleration",
        "scale": "mean",
        "parent": null,
        "relation": null,
        "train_mean": -5.359800977241818,
        "train_std": 4.525139292151637,
        "train_min": -19.051991812957425,
        "train_max": 0.5755097293625168
      }
    ],
    "inputs": [
      {
        "type": "input",
        "name": "velocity",
        "dataset_feature": "velocity",
        "scale": "mean",
        "parent": null,
        "relation": null,
        "train_mean": 4.896938152458872,
        "train_std": 6.035669440028341,
        "train_min": -0.8852670185723328,
        "train_max": 30.22476247402063
      },
      ...
The scale parameter controls how model inputs/outputs are scaled:
- "mean" (default): Scales the feature to have a mean of 0 and std of 1.
- [min, max]: For dataset features with known physical bounds, scales from [min, max] to [-1, 1].
The default "mean" scaling works well for most features, bounded or not.
from onyxengine.modeling import Input, Output
# Mean scaling (default): always use when feature distribution is unknown
Output(name='acceleration')
Input(name='velocity', scale='mean')
# Min-max scaling: can be used when feature has known min/max (e.g. voltage 0–5 V, PWM 0–1)
Input(name='voltage', dataset_feature='voltage_V', scale=[0.0, 5.0])  # 0–5 V → [-1, 1]
Input(name='pwm_input', dataset_feature='duty', scale=[0.0, 1.0])     # 0–1 duty → [-1, 1]
Model Configuration
Model configurations (e.g. MLPConfig, RNNConfig, TransformerConfig) are separate from the actual model classes (MLP, RNN, Transformer) to make configurations easier to manage.
The following configuration parameters are common to all model configurations:
- outputs: List of model outputs
- inputs: List of model inputs
- dt: Time step for the model (does not need to match the dataset time step)
- sequence_length: The number of previous time steps the model sees as a history window (see the shape sketch below)
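Together, sequence_length and the number of inputs determine the shape of the tensor the model consumes, which makes a useful sanity check (this mirrors the prediction example under After Training below):
import torch

batch_size = 1
sequence_length = 8  # history window from the model config
num_inputs = 3       # e.g. velocity, position, control_input
x = torch.randn(batch_size, sequence_length, num_inputs)  # model input shape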
MLP
Best for systems with relatively simple dynamics or when fast inference matters:
from onyxengine.modeling import MLPConfig
model_config = MLPConfig(
    outputs=outputs,
    inputs=inputs,
    dt=0.0025,
    sequence_length=5,
    hidden_layers=3,    # Number of hidden layers
    hidden_size=64,     # Neurons per layer
    activation='relu',  # Activation function
    dropout=0.1,        # Dropout rate for regularization
    bias=True           # Include bias terms
)
RNN
Better for systems with complex temporal dependencies:
from onyxengine.modeling import RNNConfig
model_config = RNNConfig(
    outputs=outputs,
    inputs=inputs,
    dt=0.0025,
    sequence_length=10,
    rnn_type='LSTM',  # 'RNN', 'LSTM', or 'GRU'
    hidden_layers=2,  # Number of RNN layers
    hidden_size=64,   # Hidden units per layer
    dropout=0.1,      # Dropout rate for regularization
    bias=True         # Include bias terms
)
Transformer
Powerful for capturing long-range dependencies:
from onyxengine.modeling import TransformerConfig
model_config = TransformerConfig(
    outputs=outputs,
    inputs=inputs,
    dt=0.0025,
    sequence_length=20,
    n_layer=2,    # Transformer layers
    n_head=4,     # Attention heads
    n_embd=64,    # Embedding dimension (must be divisible by n_head)
    dropout=0.1,  # Dropout rate for regularization
    bias=True     # Include bias terms
)
Optimizers
Optimizers are responsible for updating the model’s weights during training, and can use an accompanying lr_scheduler (more below). As such, the OptimizerConfig can have a significant impact on how well the model learns.
This is an area where model optimization can be particularly useful, but we recommend first verifying you can train a simple model that shows signs of learning your dynamics.
A good starting baseline is the AdamWConfig with a learning rate of 3e-4 to 3e-5 and weight decay of 1e-2 to 1e-3.
AdamW
Recommended for most cases.
from onyxengine.modeling import AdamWConfig
optimizer = AdamWConfig(
    lr=3e-4,          # Learning rate (higher = more aggressive learning)
    weight_decay=1e-2 # L2 regularization (higher = more regularization)
)
SGD
from onyxengine.modeling import SGDConfig
optimizer = SGDConfig(
    lr=1e-3,           # Learning rate
    weight_decay=1e-4, # L2 regularization
    momentum=0.9       # Momentum factor
)
Learning Rate Schedulers
The lr_scheduler adjusts the learning rate used by the optimizer over the course of training. This lets the model learn efficiently (fewer iterations) with larger learning rates early on, while squeezing out extra performance near convergence with fine-tuned weight updates from smaller learning rates.
The default lr_scheduler is None, which means the optimizer will use a constant learning rate. We recommend starting with None and first finding a constant learning rate + simple model that shows stable learning on the kinds of data you’re working with.
No Scheduler
Use a constant learning rate:
training_config = TrainingConfig(
    optimizer=AdamWConfig(lr=3e-4),
    lr_scheduler=None  # Constant learning rate
)
Cosine Decay with Warmup
Learning rate starts low, linearly warms up to peak, then decays smoothly until the minimum learning rate is reached. Recommended for most cases.
from onyxengine.modeling import CosineDecayWithWarmupConfig
lr_scheduler = CosineDecayWithWarmupConfig(
    max_lr=3e-4,       # Peak learning rate
    min_lr=3e-5,       # Final learning rate
    warmup_iters=200,  # Warmup period
    decay_iters=1000   # Decay period
)
Learning rate curve:
LR
│ ╭───────╮
│ ╱ ╲
│ ╱ ╲____
│╱
└────────────────── Iterations
↑ ↑
warmup decay
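If you want to reason about the schedule numerically, the standard warmup + cosine formula looks like this (a sketch of the common implementation, not necessarily the Engine's exact code):
import math

def lr_at(it, max_lr=3e-4, min_lr=3e-5, warmup_iters=200, decay_iters=1000):
    if it < warmup_iters:  # linear warmup to max_lr
        return max_lr * (it + 1) / warmup_iters
    if it > decay_iters:   # hold at min_lr after the decay period
        return min_lr
    # cosine decay from max_lr to min_lr between warmup_iters and decay_iters
    ratio = (it - warmup_iters) / (decay_iters - warmup_iters)
    return min_lr + 0.5 * (1.0 + math.cos(math.pi * ratio)) * (max_lr - min_lr)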
When to use:
- Standard choice for most training runs
- When you want smooth convergence to a minimum
Cosine Annealing with Warm Restarts
Periodic cosine decay with restarts that can lengthen over time.
from onyxengine.modeling import CosineAnnealingWarmRestartsConfig
lr_scheduler = CosineAnnealingWarmRestartsConfig(
    T_0=500,     # Initial cycle length
    T_mult=2,    # Cycle length multiplier
    eta_min=1e-5 # Minimum learning rate
)
Learning rate curve (T_mult=2):
LR
│╲ ╲ ╲
│ ╲ ╲ ╲
│ ╲ ╲ ╲____
│ ╲ ╲
└────────────────── Iterations
↑ ↑ ↑
T_0 2*T_0 4*T_0
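With T_0=500 and T_mult=2, the cycles are 500, 1000, and 2000 iterations long, so restarts land at cumulative iterations 500, 1500, and 3500. A quick check:
T_0, T_mult, training_iters = 500, 2, 4000
restarts, cycle, total = [], T_0, 0
while total + cycle <= training_iters:
    total += cycle
    restarts.append(total)
    cycle *= T_mult
print(restarts)  # [500, 1500, 3500]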
When to use:
- Finding multiple local minima
- When stuck in suboptimal solutions
- Longer training runs
Training Configuration
The TrainingConfig class defines parameters for training the model you’ve configured.
from onyxengine.modeling import TrainingConfig, AdamWConfig
training_config = TrainingConfig(
    training_iters=2000,           # Total training iterations (how long to train)
    train_batch_size=1024,         # Samples per batch (generally leave as is)
    train_val_split_ratio=0.9,     # Train/validation split (generally leave as is)
    test_dataset_size=500,         # Samples to set aside for platform visuals
    checkpoint_type='single_step', # Training checkpoint type (see below)
    optimizer=AdamWConfig(         # Optimizer config (see below)
        lr=3e-4,
        weight_decay=1e-2
    ),
    lr_scheduler=None              # Learning rate scheduler config (see below)
)
Checkpoint Types
| Type | Description | Use Case |
|---|---|---|
| 'single_step' | Save best training checkpoint for next-step prediction | Deploying with model(input) |
| 'multi_step' | Save best training checkpoint for trajectory simulation | Deploying with model.simulate(input) |
Start with single_step to verify your model learns the dynamics, then switch to multi_step if you’re using the .simulate() method in deployment.
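In deployment terms, the two checkpoint types map onto the two call styles from the table above (shapes are illustrative; see the API reference for the exact simulate signature):
# 'single_step': one forward pass per control step
output = model(x)  # x: (batch, sequence_length, num_inputs)

# 'multi_step': roll out a trajectory with the simulate method
trajectory = model.simulate(x)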
Running Training
onyx.train_model(
    model_name='example_model',      # Name for the trained model
    model_config=model_config,
    dataset_name='example_data',     # Dataset to train on
    dataset_version_id=None,         # Optional: specific dataset version
    training_config=training_config,
)
Monitor training progress via the Engine Platform for detailed loss curves and predictions.
After Training
The trained model is automatically saved, versioned, and traced in the Engine to be pulled directly from code. The onyx.load_model function returns a full model object as an MLP, RNN, or Transformer, depending on the model configuration.
Load a Model (latest version)
import torch
from onyxengine import Onyx
# Load the latest model version
onyx = Onyx()
model = onyx.load_model('example_model')
print(model.config.model_dump_json(indent=2))
# Make a single prediction
batch_size = 1
sequence_length = model.config.sequence_length
num_inputs = len(model.config.inputs)
x = torch.randn(batch_size, sequence_length, num_inputs)
with torch.no_grad():
    output = model(x)
print("\nModel Prediction:\n", output)
Load a Model (specific version)
We recommend pinning a specific version_id when loading a model, both to select the best-performing version and to keep your code reproducible.
model = onyx.load_model('example_model', version_id='dcfec841-1748-47e2-b6c7-3c821cc69b4a')
Load a Model (local offline mode)
To load the model without downloading it from the Engine, you can use the mode='offline' parameter. This will only load the model from the local cache (more below), and will not check for updates from the Engine.
You will need to have previously downloaded the model files (.pt and .json) and have them in the local cache directory ([SCRIPT_DIRECTORY]/.onyx/models/).
model = onyx.load_model('example_model', mode='offline')
Local Caching
Models are cached locally after the first download. The SDK automatically:
- Checks if the local version matches the requested version
- Downloads only if the local cache is outdated
- Stores files in [SCRIPT_DIRECTORY]/.onyx/models/
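For example, you can check which model files are cached and available for offline mode (a hypothetical inspection, assuming the script-relative cache path above):
from pathlib import Path

cache_dir = Path(__file__).parent / '.onyx' / 'models'
for f in sorted(cache_dir.glob('example_model*')):
    print(f.name)  # e.g. example_model.pt, example_model.json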
Next Steps