This tutorial covers the complete workflow for training models in Onyx Engine.
Complete Training Example
Here’s a full training script you can use as a starting point:
from onyxengine import Onyx
from onyxengine.modeling import (
    Output,
    Input,
    MLPConfig,
    TrainingConfig,
    AdamWConfig,
)
# Initialize the client (defaults to ONYX_API_KEY env var)
onyx = Onyx()
# Define model outputs and inputs
outputs = [
    Output(name='acceleration'),
]
inputs = [
    Input(name='velocity', parent='acceleration', relation='derivative'),
    Input(name='position', parent='velocity', relation='derivative'),
    Input(name='control_input'),
]
# Configure the model
model_config = MLPConfig(
    outputs=outputs,
    inputs=inputs,
    dt=0.0025,
    sequence_length=8,
    hidden_layers=3,
    hidden_size=64,
    activation='relu',
    dropout=0.2,
    bias=True
)
# Configure training
training_config = TrainingConfig(
    training_iters=3000,
    train_batch_size=1024,
    test_dataset_size=500,
    checkpoint_type='multi_step',
    optimizer=AdamWConfig(lr=3e-4, weight_decay=1e-2),
    lr_scheduler=None
)
# Start training
onyx.train_model(
    model_name='example_model',
    model_config=model_config,
    dataset_name='example_data',
    training_config=training_config,
)
To simplify hardware dynamics modeling, Onyx provides built-in tools for creating inputs/outputs with physical relationships, traceable naming, and feature scaling.
Outputs
Outputs are the features your model predicts. You can define multiple outputs and include parent and relation to derive outputs from other outputs. See Output for the full API.
from onyxengine.modeling import Output
outputs = [
    Output(name='jerk'),
    Output(name='acceleration', parent='jerk', relation='derivative'),
    Output(name='temperature_delta'),
]
Inputs
Inputs are the features fed into the model to make predictions. Similarly, you can define multiple inputs and include parent and relation to derive inputs from both outputs and other inputs. See Input for the full API.
from onyxengine.modeling import Input
inputs = [
    # Feed acceleration back to the model as an input
    Input(name='acceleration_feedback', parent='acceleration', relation='equal'),
    # Derive velocity and position from acceleration
    Input(name='velocity', parent='acceleration', relation='derivative'),
    Input(name='position', parent='velocity', relation='derivative'),
    # Sometimes delta values are more useful/intuitive than derivatives
    Input(name='temperature', parent='temperature_delta', relation='delta'),
    Input(name='control_input'),
]
Feature Relationships
The parent parameter specifies the feature to derive from. The relation parameter specifies the mathematical relationship:
| Relation | Equation | Use Case |
|---|---|---|
| 'derivative' | state[t+1] = state[t] + parent[t] * dt | Predicting/using time derivatives |
| 'delta' | state[t+1] = state[t] + parent[t] | Predicting more general deltas |
| 'equal' | state[t+1] = parent[t] | Feeding an output back as an input |
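To make the relations concrete, here is a minimal sketch (plain Python with made-up values, not the Engine's internal code) of what the 'derivative' relation computes when rolling a trajectory forward:
dt = 0.0025
acceleration = [1.0, 1.0, 0.5, 0.0]  # parent feature (e.g. a model output)
velocity = [0.0]                     # derived via relation='derivative'
position = [0.0]                     # derived from velocity the same way
for t in range(len(acceleration)):
    # state[t+1] = state[t] + parent[t] * dt
    velocity.append(velocity[t] + acceleration[t] * dt)
    position.append(position[t] + velocity[t] * dt)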
Feature Naming
By default, the name parameter will be used to find the input/output’s matching feature in the dataset when training the model. If you want to use a different name for the input/output, you can use the dataset_feature parameter.
# Model input name matches the dataset feature name
Input(name='encoder_position_radians')
# Model input 'position' uses the 'encoder_position_radians' dataset feature
Input(name='position', dataset_feature='encoder_position_radians')
# Dataset features can be used multiple times
Input(name='current_position', dataset_feature='encoder_position_radians')
Output(name='next_position_pred', dataset_feature='encoder_position_radians')
Model input/output names must be unique.
Feature Scaling
When training AI models, it’s important to scale inputs/outputs so that they are in similar ranges.
For example, if you were training a model with outputs Pressure [Pascal] and Torque [Newton-Meters], the pressure values (and therefore their loss/gradients) would be much larger than the torque values, which can encourage the model to focus on predicting pressure and potentially ignore torque.
To address this, it’s typical to scale all features to have a mean of 0 and standard deviation of 1, or range from [-1, 1] or [0, 1].
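For reference, the two common schemes look like this (a sketch of the standard formulas with made-up values, not Onyx internals):
import numpy as np

x = np.array([101325.0, 99000.0, 103500.0])  # e.g. pressure in Pascals

# Standardization: mean 0, std 1
x_standardized = (x - x.mean()) / x.std()

# Known physical bounds [x_min, x_max] mapped to [-1, 1]
x_min, x_max = 95000.0, 105000.0
x_bounded = 2.0 * (x - x_min) / (x_max - x_min) - 1.0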
Keeping track of the scaling factors of each feature for use in both training and deployment is prone to error, so Onyx automatically calculates the scaling factors and bundles them with each model for you:
- train_mean: The mean of the dataset feature during training
- train_std: The standard deviation of the dataset feature during training
- train_min: The minimum value of the dataset feature during training
- train_max: The maximum value of the dataset feature during training
- scale: The scaling method used, either "mean" or [min, max] (see the section below)
For example, a trained model's config includes the computed scaling factors:
{
  "name": "example_model",
  "type": "model",
  "created_at": "2026-01-20T19:56:25.969196+00:00",
  "config": {
    "type": "rnn",
    "outputs": [
      {
        "type": "output",
        "name": "acceleration",
        "dataset_feature": "acceleration",
        "scale": "mean",
        "parent": null,
        "relation": null,
        "train_mean": -5.359800977241818,
        "train_std": 4.525139292151637,
        "train_min": -19.051991812957425,
        "train_max": 0.5755097293625168
      }
    ],
    "inputs": [
      {
        "type": "input",
        "name": "velocity",
        "dataset_feature": "velocity",
        "scale": "mean",
        "parent": null,
        "relation": null,
        "train_mean": 4.896938152458872,
        "train_std": 6.035669440028341,
        "train_min": -0.8852670185723328,
        "train_max": 30.22476247402063
      },
      ...
The scale parameter controls how model inputs/outputs are scaled:
- "mean" (default): Scales the feature to have a mean of 0 and std of 1.
- [min, max]: For dataset features with known physical bounds, scales from [min, max] to [-1, 1].
The default "mean" scaling works well for most features, bounded or not.
from onyxengine.modeling import Input, Output
# Mean scaling (default): always use when feature distribution is unknown
Output(name='acceleration')
Input(name='velocity', scale='mean')
# Min-max scaling: can be used when feature has known min/max (e.g. voltage 0–5 V, PWM 0–1)
Input(name='voltage', dataset_feature='voltage_V', scale=[0.0, 5.0])  # 0–5 V → [-1, 1]
Input(name='pwm_input', dataset_feature='duty', scale=[0.0, 1.0])     # 0–1 duty → [-1, 1]
Model Configuration
Model configurations (e.g. MLPConfig, RNNConfig, TransformerConfig) are separate from the actual model classes (MLP, RNN, Transformer) to make configurations easier to manage.
The following configuration parameters are common to all model configurations:
- outputs: List of model outputs
- inputs: List of model inputs
- dt: Time step for the model (does not need to match the dataset time step)
- sequence_length: The number of previous time steps the model sees as a history window (see the shape sketch below)
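Together, sequence_length and the number of inputs determine the shape of the tensor the model consumes, which makes a useful sanity check (this mirrors the prediction example under After Training below):
import torch

batch_size = 1
sequence_length = 8  # history window from the model config
num_inputs = 3       # e.g. velocity, position, control_input
x = torch.randn(batch_size, sequence_length, num_inputs)  # model input shape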
MLP
Best for systems with relatively simple dynamics or when fast inference matters:
from onyxengine.modeling import MLPConfig
model_config = MLPConfig(
    outputs=outputs,
    inputs=inputs,
    dt=0.0025,
    sequence_length=5,
    hidden_layers=3,    # Number of hidden layers
    hidden_size=64,     # Neurons per layer
    activation='relu',  # Activation function
    dropout=0.1,        # Dropout rate for regularization
    bias=True           # Include bias terms
)
RNN
Better for systems with complex temporal dependencies:
from onyxengine.modeling import RNNConfig
model_config = RNNConfig(
    outputs=outputs,
    inputs=inputs,
    dt=0.0025,
    sequence_length=10,
    rnn_type='LSTM',  # 'RNN', 'LSTM', or 'GRU'
    hidden_layers=2,  # Number of RNN layers
    hidden_size=64,   # Hidden units per layer
    dropout=0.1,      # Dropout rate for regularization
    bias=True         # Include bias terms
)
Transformer
Powerful for capturing long-range dependencies:
from onyxengine.modeling import TransformerConfig
model_config = TransformerConfig(
    outputs=outputs,
    inputs=inputs,
    dt=0.0025,
    sequence_length=20,
    n_layer=2,    # Transformer layers
    n_head=4,     # Attention heads
    n_embd=64,    # Embedding dimension (must be divisible by n_head)
    dropout=0.1,  # Dropout rate for regularization
    bias=True     # Include bias terms
)
Optimizers
Optimizers are responsible for updating the model’s weights during training, and can use an accompanying lr_scheduler (more below). As such, the OptimizerConfig can have a significant impact on how well the model learns.
This is an area where model optimization can be particularly useful, but we recommend first verifying you can train a simple model that shows signs of learning your dynamics.
A good starting baseline is the AdamWConfig with a learning rate of 3e-4 to 3e-5 and weight decay of 1e-2 to 1e-3.
AdamW
Recommended for most cases.
from onyxengine.modeling import AdamWConfig
optimizer = AdamWConfig(
    lr=3e-4,          # Learning rate (higher = more aggressive learning)
    weight_decay=1e-2 # L2 regularization (higher = more regularization)
)
SGD
from onyxengine.modeling import SGDConfig
optimizer = SGDConfig(
    lr=1e-3,           # Learning rate
    weight_decay=1e-4, # L2 regularization
    momentum=0.9       # Momentum factor
)
Learning Rate Schedulers
The lr_scheduler adjusts the learning rate used by the optimizer over the course of training. This lets the model learn efficiently (fewer iterations) with larger learning rates early on, while squeezing out extra performance near convergence with fine-tuned weight updates from smaller learning rates.
The default lr_scheduler is None, which means the optimizer will use a constant learning rate. We recommend starting with None and first finding a constant learning rate + simple model that shows stable learning on the kinds of data you’re working with.
No Scheduler
Use a constant learning rate:
training_config = TrainingConfig(
    optimizer=AdamWConfig(lr=3e-4),
    lr_scheduler=None  # Constant learning rate
)
Cosine Decay with Warmup
Learning rate starts low, linearly warms up to peak, then decays smoothly until the minimum learning rate is reached. Recommended for most cases.
from onyxengine.modeling import CosineDecayWithWarmupConfig
lr_scheduler = CosineDecayWithWarmupConfig(
    max_lr=3e-4,       # Peak learning rate
    min_lr=3e-5,       # Final learning rate
    warmup_iters=200,  # Warmup period
    decay_iters=1000   # Decay period
)
Learning rate curve:
LR
│ ╭───────╮
│ ╱ ╲
│ ╱ ╲____
│╱
└────────────────── Iterations
↑ ↑
warmup decay
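If you want to reason about the schedule numerically, the standard warmup + cosine formula looks like this (a sketch of the common implementation, not necessarily the Engine's exact code):
import math

def lr_at(it, max_lr=3e-4, min_lr=3e-5, warmup_iters=200, decay_iters=1000):
    if it < warmup_iters:  # linear warmup to max_lr
        return max_lr * (it + 1) / warmup_iters
    if it > decay_iters:   # hold at min_lr after the decay period
        return min_lr
    # cosine decay from max_lr to min_lr between warmup_iters and decay_iters
    ratio = (it - warmup_iters) / (decay_iters - warmup_iters)
    return min_lr + 0.5 * (1.0 + math.cos(math.pi * ratio)) * (max_lr - min_lr)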
When to use:
- Standard choice for most training runs
- When you want smooth convergence to a minimum
Cosine Annealing with Warm Restarts
Periodic cosine decay with restarts that can lengthen over time.
from onyxengine.modeling import CosineAnnealingWarmRestartsConfig
lr_scheduler = CosineAnnealingWarmRestartsConfig(
    T_0=500,     # Initial cycle length
    T_mult=2,    # Cycle length multiplier
    eta_min=1e-5 # Minimum learning rate
)
Learning rate curve (T_mult=2):
LR
│╲ ╲ ╲
│ ╲ ╲ ╲
│ ╲ ╲ ╲____
│ ╲ ╲
└────────────────── Iterations
↑ ↑ ↑
T_0 2*T_0 4*T_0
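With T_0=500 and T_mult=2, the cycles are 500, 1000, and 2000 iterations long, so restarts land at cumulative iterations 500, 1500, and 3500. A quick check:
T_0, T_mult, training_iters = 500, 2, 4000
restarts, cycle, total = [], T_0, 0
while total + cycle <= training_iters:
    total += cycle
    restarts.append(total)
    cycle *= T_mult
print(restarts)  # [500, 1500, 3500]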
When to use:
- Finding multiple local minima
- When stuck in suboptimal solutions
- Longer training runs
Training Configuration
The TrainingConfig class defines parameters for training the model you’ve configured.
from onyxengine.modeling import TrainingConfig, AdamWConfig
training_config = TrainingConfig(
    training_iters=2000,           # Total training iterations (how long to train)
    train_batch_size=1024,         # Samples per batch (generally leave as is)
    train_val_split_ratio=0.9,     # Train/validation split (generally leave as is)
    test_dataset_size=500,         # Samples to set aside for platform visuals
    checkpoint_type='single_step', # Training checkpoint type (see below)
    optimizer=AdamWConfig(         # Optimizer config (see below)
        lr=3e-4,
        weight_decay=1e-2
    ),
    lr_scheduler=None              # Learning rate scheduler config (see below)
)
Checkpoint Types
| Type | Description | Use Case |
|---|---|---|
| 'single_step' | Save best training checkpoint for next-step prediction | Deploying with model(input) |
| 'multi_step' | Save best training checkpoint for trajectory simulation | Deploying with model.simulate(input) |
Start with single_step to verify your model learns the dynamics, then switch to multi_step if you’re using the .simulate() method in deployment.
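In deployment terms, the two checkpoint types map onto the two call styles from the table above (shapes are illustrative; see the API reference for the exact simulate signature):
# 'single_step': one forward pass per control step
output = model(x)  # x: (batch, sequence_length, num_inputs)

# 'multi_step': roll out a trajectory with the simulate method
trajectory = model.simulate(x)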
Running Training
onyx.train_model(
    model_name='example_model',      # Name for the trained model
    model_config=model_config,
    dataset_name='example_data',     # Dataset to train on
    dataset_version_id=None,         # Optional: specific dataset version
    training_config=training_config,
)
Monitor training progress via the Engine Platform for detailed loss curves and predictions.
After Training
The trained model is automatically saved, versioned, and traced in the Engine to be pulled directly from code. The onyx.load_model function returns a full model object as an MLP, RNN, or Transformer, depending on the model configuration.
Load a Model (latest version)
import torch
from onyxengine import Onyx
# Load the latest model version
onyx = Onyx()
model = onyx.load_model('example_model')
print(model.config.model_dump_json(indent=2))
# Make a single prediction
batch_size = 1
sequence_length = model.config.sequence_length
num_inputs = len(model.config.inputs)
x = torch.randn(batch_size, sequence_length, num_inputs)
with torch.no_grad():
    output = model(x)
print("\nModel Prediction:\n", output)
Load a Model (specific version)
We recommend pinning a specific version_id when loading a model, both to select the best-performing version and to keep your code reproducible.
model = onyx.load_model('example_model', version_id='dcfec841-1748-47e2-b6c7-3c821cc69b4a')
Load a Model (local offline mode)
To load the model without downloading it from the Engine, you can use the mode='offline' parameter. This will only load the model from the local cache (more below), and will not check for updates from the Engine.
You will need to have previously downloaded the model files (.pt and .json) and have them in the local cache directory ([SCRIPT_DIRECTORY]/.onyx/models/).
model = onyx.load_model('example_model', mode='offline')
Local Caching
Models are cached locally after the first download. The SDK automatically:
- Checks if the local version matches the requested version
- Downloads only if the local cache is outdated
- Stores files in [SCRIPT_DIRECTORY]/.onyx/models/
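For example, you can check which model files are cached and available for offline mode (a hypothetical inspection, assuming the script-relative cache path above):
from pathlib import Path

cache_dir = Path(__file__).parent / '.onyx' / 'models'
for f in sorted(cache_dir.glob('example_model*')):
    print(f.name)  # e.g. example_model.pt, example_model.json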
Next Steps