Complete Training Example
Here’s a full training script you can use as a starting point:

Defining Model Inputs and Outputs
To simplify hardware dynamics modeling, Onyx provides built-in tools for creating inputs/outputs with physical relationships, traceable naming, and feature scaling.

Outputs
Outputs are the features your model predicts. You can define multiple outputs and include parent and relation to derive outputs from other outputs. See Output for the full API.
Inputs
Inputs are the features fed into the model to make predictions. Similarly, you can define multiple inputs and include parent and relation to derive inputs from both outputs and other inputs. See Input for the full API.
Feature Relationships
The parent parameter specifies the feature to derive from. The relation parameter specifies the mathematical relationship:
| Relation | Equation | Use Case |
|---|---|---|
| 'derivative' | state[t+1] = state[t] + parent[t] * dt | Predicting/using time derivatives |
| 'delta' | state[t+1] = state[t] + parent[t] | Predicting more general deltas |
| 'equal' | state[t+1] = parent[t] | Feed output back as input |
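The update rules in the table can be sketched in plain Python. This is a conceptual illustration of the three relations, not Onyx’s internal implementation:

```python
def step(state, parent, relation, dt=0.01):
    """Advance state one time step using a parent feature and its relation."""
    if relation == "derivative":
        return state + parent * dt  # parent is a time derivative of state
    if relation == "delta":
        return state + parent       # parent is an additive change to state
    if relation == "equal":
        return parent               # parent value is fed through directly
    raise ValueError(f"unknown relation: {relation!r}")

# Example: integrating a velocity (parent) into a position (state)
position = step(1.0, 2.0, "derivative", dt=0.1)  # 1.0 + 2.0 * 0.1
```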
Feature Naming
By default, the name parameter is used to find the input/output’s matching feature in the dataset when training the model. If you want to use a different name for the input/output, you can use the dataset_feature parameter.
Model input/output names must be unique.
Feature Scaling
When training AI models, it’s important to scale inputs/outputs so that they are in similar ranges. For example, if you were training a model with outputs Pressure [Pascal] and Torque [Newton-Meters], the values for pressure (and therefore its loss/gradients) would be much larger than the values for torque, which can encourage the model to focus on learning to predict pressure and potentially ignore torque.
To address this, it’s typical to scale all features to have a mean of 0 and standard deviation of 1, or range from [-1, 1] or [0, 1].
Keeping track of the scaling factors of each feature for use in both training and deployment is prone to error, so Onyx automatically calculates the scaling factors and bundles them with each model for you:
- train_mean: The mean of the dataset feature during training
- train_std: The standard deviation of the dataset feature during training
- train_min: The minimum value of the dataset feature during training
- train_max: The maximum value of the dataset feature during training
- scale: The scaling method used, either "mean" or [min, max] (see section below)
The scale parameter controls how model inputs/outputs are scaled:
- "mean" (default): Scales the feature to have a mean of 0 and std of 1.
- [min, max]: For dataset features with known physical bounds, scales from [min, max] to [-1, 1].
"mean" scaling is fine for most features (bounded or not).
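Both methods are simple affine transforms. Here is a plain-Python sketch of the math, using the train_* statistics described above; this illustrates the formulas, not Onyx’s actual code:

```python
def mean_scale(x, train_mean, train_std):
    """'mean' scaling: shift and scale to zero mean and unit std."""
    return (x - train_mean) / train_std

def minmax_scale(x, train_min, train_max):
    """[min, max] scaling: map [train_min, train_max] onto [-1, 1]."""
    return 2.0 * (x - train_min) / (train_max - train_min) - 1.0

# A reading at the middle of its physical bounds maps to 0.0
print(minmax_scale(50.0, 0.0, 100.0))  # 0.0
```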
Model Configuration
Model configurations (e.g. MLPConfig, RNNConfig, TransformerConfig) are separate from the actual model classes (MLP, RNN, Transformer) to make configurations easier to manage.
The following configuration parameters are common to all model configurations:
- outputs: List of model outputs
- inputs: List of model inputs
- dt: Time step for the model (does not need to match the dataset time step)
- sequence_length: The number of previous time steps the model sees as a history window
MLP (MLPConfig)
Best for systems with relatively simple dynamics or for fast inference:
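As a rough sketch, a configuration might look like the following. Only the common parameters (outputs, inputs, dt, sequence_length) are documented above, so MLP-specific arguments are omitted here and the exact constructor signature should be checked against the MLPConfig API reference; RNNConfig and TransformerConfig follow the same pattern:

```python
# Hypothetical sketch -- values are placeholders, and the constructor
# signature is assumed from the common parameters documented above.
config = MLPConfig(
    outputs=outputs,      # list of Output definitions
    inputs=inputs,        # list of Input definitions
    dt=0.01,              # model time step (need not match the dataset)
    sequence_length=16,   # history window of previous time steps
)
```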
RNN (RNNConfig)
Better for systems with complex temporal dependencies:
Transformer (TransformerConfig)
Powerful for capturing long-range dependencies:
Optimizers
Optimizers are responsible for updating the model’s weights during training, and can use an accompanying lr_scheduler (more below). As such, the OptimizerConfig can have a significant impact on how well the model learns.
This is an area where model optimization can be particularly useful, but we recommend first verifying you can train a simple model that shows signs of learning your dynamics.
A good starting baseline is the AdamWConfig with a learning rate of 3e-4 to 3e-5 and weight decay of 1e-2 to 1e-3.
AdamW (AdamWConfig):
Recommended for most cases.
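As a sketch of that baseline (the parameter names here are assumptions; check the AdamWConfig reference for the actual signature):

```python
# Hypothetical sketch -- parameter names are assumed; values follow the
# recommended baseline (lr 3e-4 to 3e-5, weight decay 1e-2 to 1e-3).
optimizer = AdamWConfig(lr=3e-4, weight_decay=1e-2)
```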
SGD (SGDConfig):
Learning Rate Schedulers
The lr_scheduler adjusts the learning rate used by the optimizer over the course of training. This is a tradeoff that can help the model learn efficiently (fewer iterations) with larger learning rates, while also squeezing out more performance with fine-tuned weight updates from smaller learning rates when close to convergence.
The default lr_scheduler is None, which means the optimizer will use a constant learning rate. We recommend starting with None and first finding a constant learning rate + simple model that shows stable learning on the kinds of data you’re working with.
No Scheduler
Use a constant learning rate:

Cosine Decay with Warmup
Learning rate starts low, warms up linearly to the peak, then decays smoothly until the minimum learning rate is reached. Recommended for most cases.
- Standard choice for most training runs
- When you want smooth convergence to a minimum
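The shape of this schedule can be written out in a few lines of plain Python. This is a conceptual sketch of warmup plus cosine decay, not Onyx’s implementation:

```python
import math

def cosine_decay_with_warmup(step, total_steps, warmup_steps,
                             peak_lr, min_lr=0.0):
    """Linear warmup to peak_lr, then cosine decay down to min_lr."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps  # linear warmup phase
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return min_lr + (peak_lr - min_lr) * 0.5 * (1.0 + math.cos(math.pi * progress))

# Halfway through warmup the learning rate is half the peak
lr = cosine_decay_with_warmup(5, 100, 10, 3e-4)  # 1.5e-4
```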
Cosine Annealing with Warm Restarts
Periodic cosine decay with restarts that can lengthen over time.
- Finding multiple local minima
- When stuck in suboptimal solutions
- Longer training runs
Training Configuration
The TrainingConfig class defines parameters for training the model you’ve configured.
Checkpoint Types
| Type | Description | Use Case |
|---|---|---|
| 'single_step' | Save best training checkpoint for next-step prediction | Deploying with model(input) |
| 'multi_step' | Save best training checkpoint for trajectory simulation | Deploying with model.simulate(input) |
Running Training
After Training
The trained model is automatically saved, versioned, and traced in the Engine so it can be pulled directly from code. The onyx.load_model function returns a full model object as either an MLP, RNN, or Transformer, depending on the model configuration.
Load a Model (latest version)
Load a Model (specific version)
It is recommended to specify a version_id when loading a model, both to select the best-performing version and to ensure your code is reproducible.
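A sketch of the two load patterns above; the model name and version string are placeholders, and the keyword name for the version is an assumption based on the version_id parameter mentioned here:

```python
import onyx  # assumed import, based on the onyx.load_model reference above

# Hypothetical sketch -- "pump_dynamics" and "v12" are placeholder values.
model = onyx.load_model("pump_dynamics")                    # latest version
model = onyx.load_model("pump_dynamics", version_id="v12")  # pinned version
```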
Load a Model (local offline mode)
To load the model without downloading it from the Engine, you can use the mode='offline' parameter. This will only load the model from the local cache (more below), and will not check for updates from the Engine.
You will need to have previously downloaded the model files (.pt and .json) and have them in the local cache directory ([SCRIPT_DIRECTORY]/.onyx/models/).
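Combining the cached-files requirement with the parameters above, an offline load might look like this sketch (the model name and version are placeholders):

```python
import onyx  # assumed import

# Hypothetical sketch -- loads only from [SCRIPT_DIRECTORY]/.onyx/models/
# and never contacts the Engine.
model = onyx.load_model("pump_dynamics", version_id="v12", mode="offline")
```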
Local Caching
Models are cached locally after the first download. The SDK automatically:
- Checks if the local version matches the requested version
- Downloads only if the local cache is outdated
- Stores files in [SCRIPT_DIRECTORY]/.onyx/models/
Next Steps
Optimizing Models
Automatically search for the best hyperparameters
Simulating with Models
Deploy your trained model for simulation