onyxengine

The primary API functions for interacting with the Engine.
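
All examples on this page call these functions through an onyx alias and use config classes (ModelSimulatorConfig, MLPConfig, OnyxDataset, and friends) without showing imports. A plausible setup is sketched below; the onyxengine.api alias follows directly from the function paths in this reference, but the other module paths are assumptions, so check your installation:

# Assumed setup for the examples on this page
import pandas as pd
import onyxengine.api as onyx  # matches the onyxengine.api.* paths in this reference
from onyxengine.data import OnyxDataset  # assumed module path
from onyxengine.modeling import ModelSimulatorConfig, State, MLP, MLPConfig  # assumed module path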

onyxengine.api.get_object_metadata(name: str, version_id: str | None = None) → dict

Get the metadata for an object in the Engine.

Parameters:
  • name (str) – The name of the object to get metadata for.

  • version_id (str, optional) – The version ID of the object; if None, the latest version is used. (Default is None)

Returns:

The metadata for the object, or None if the object does not exist.

Return type:

dict

Example:

# Get metadata for an Onyx object (dataset, model)
metadata = onyx.get_object_metadata('example_data')
print(metadata)

# Get metadata for a specific version
metadata = onyx.get_object_metadata('example_data', version_id='a05fb872-0a7d-4a68-b189-aeece143c7e4')
print(metadata)
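
Because the call returns None for a missing object, it's worth guarding the result (the object name below is a placeholder):

# get_object_metadata returns None if the object does not exist
metadata = onyx.get_object_metadata('nonexistent_object')
if metadata is None:
    print('Object not found')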

onyxengine.api.load_dataset(name: str, version_id: str | None = None) → OnyxDataset

Load a dataset from the Engine, using a locally cached copy when available or downloading it otherwise.

Parameters:
  • name (str) – The name of the dataset to load.

  • version_id (str, optional) – The version ID of the dataset to load; if None, the latest version is used. (Default is None)

Returns:

The loaded dataset.

Return type:

OnyxDataset

Example:

# Load the training dataset
train_dataset = onyx.load_dataset('example_train_data')
print(train_dataset.dataframe.head())
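
A specific version can be requested just as with the other loaders (the version ID below is a placeholder):

# Load a specific version of the dataset
train_dataset = onyx.load_dataset('example_train_data', version_id='a05fb872-0a7d-4a68-b189-aeece143c7e4')
print(train_dataset.dataframe.head())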

onyxengine.api.load_model(name: str, version_id: str | None = None) → Module

Load a model from the Engine, using a locally cached copy when available or downloading it otherwise.

Parameters:
  • name (str) – The name of the model to load.

  • version_id (str, optional) – The version ID of the model to load; if None, the latest version is used. (Default is None)

Returns:

The loaded Onyx model.

Return type:

torch.nn.Module

Example:

# Load our model
model = onyx.load_model('example_model')
print(model.config)

# Load a specific version of the model
model = onyx.load_model('example_model', version_id='a05fb872-0a7d-4a68-b189-aeece143c7e4')
print(model.config)
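
Because the loader returns a standard torch.nn.Module, the usual PyTorch inference pattern applies. The sketch below rests on assumptions: the (batch, sequence_length, num_inputs) input layout and the model.config.num_inputs attribute are guesses based on the config classes shown elsewhere on this page, not documented behavior.

# Hedged sketch: one forward pass with the loaded model
import torch

model = onyx.load_model('example_model')
model.eval()
with torch.no_grad():
    x = torch.zeros(1, 10, model.config.num_inputs)  # assumed input layout
    prediction = model(x)
print(prediction.shape)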

onyxengine.api.optimize_model(model_name: str = '', model_sim_config: ModelSimulatorConfig | None = None, dataset_name: str = '', dataset_version_id: str | None = None, optimization_config: OptimizationConfig | None = None)

Optimize a model on the Engine using a specified dataset, model simulator config, and optimization config. The optimization config defines the search space of hyperparameters to explore.
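
The search-space notation in the example below is my reading of the example itself, not confirmed elsewhere in this reference:

# Assumed semantics of the two search-space forms (inferred from the example below)
hidden_size_space = {"select": [12, 24, 32, 64, 128]}  # pick one value from a discrete set
dropout_space = {"range": [0.0, 0.4, 0.1]}             # sweep from low to high in steps of the third value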

Parameters:
  • model_name (str) – The name of the model to optimize. (Required)

  • model_sim_config (ModelSimulatorConfig) – The configuration for the model simulator. (Required)

  • dataset_name (str) – The name of the dataset to optimize on. (Required)

  • dataset_version_id (str, optional) – The version ID of the dataset to optimize on; if None, the latest version is used. (Default is None)

  • optimization_config (OptimizationConfig) – The configuration for the optimization process. (Required)

Example:

# Model sim config (used across all trials)
sim_config = ModelSimulatorConfig(
    outputs=['acceleration'],
    states=[
        State(name='velocity', relation='derivative', parent='acceleration'),
        State(name='position', relation='derivative', parent='velocity'),
    ],
    controls=['control_input'],
    dt=0.0025
)

# Model optimization configs
mlp_opt = MLPOptConfig(
    sim_config=sim_config,
    num_inputs=sim_config.num_inputs,
    num_outputs=sim_config.num_outputs,
    sequence_length={"select": [1, 2, 4, 5, 6, 8, 10]},
    hidden_layers={"range": [2, 4, 1]},
    hidden_size={"select": [12, 24, 32, 64, 128]},
    activation={"select": ['relu', 'tanh']},
    dropout={"range": [0.0, 0.4, 0.1]},
    bias=True
)
rnn_opt = RNNOptConfig(
    sim_config=sim_config,
    num_inputs=sim_config.num_inputs,
    num_outputs=sim_config.num_outputs,
    rnn_type={"select": ['RNN', 'LSTM', 'GRU']},
    sequence_length={"select": [1, 2, 4, 5, 6, 8, 10, 12, 14, 15]},
    hidden_layers={"range": [2, 4, 1]},
    hidden_size={"select": [12, 24, 32, 64, 128]},
    dropout={"range": [0.0, 0.4, 0.1]},
    bias=True
)
transformer_opt = TransformerOptConfig(
    sim_config=sim_config,
    num_inputs=sim_config.num_inputs,
    num_outputs=sim_config.num_outputs,
    sequence_length={"select": [1, 2, 4, 5, 6, 8, 10, 12, 14, 15]},
    n_layer={"range": [2, 4, 1]},
    n_head={"range": [2, 10, 2]},
    n_embd={"select": [12, 24, 32, 64, 128]},
    dropout={"range": [0.0, 0.4, 0.1]},
    bias=True
)

# Optimizer configs
adamw_opt = AdamWOptConfig(
    lr={"select": [1e-5, 5e-5, 1e-4, 3e-4, 5e-4, 8e-4, 1e-3, 5e-3, 1e-2]},
    weight_decay={"select": [1e-4, 1e-3, 1e-2, 1e-1]}
)
sgd_opt = SGDOptConfig(
    lr={"select": [1e-5, 5e-5, 1e-4, 3e-4, 5e-4, 8e-4, 1e-3, 5e-3, 1e-2]},
    weight_decay={"select": [1e-4, 1e-3, 1e-2, 1e-1]},
    momentum={"select": [0, 0.8, 0.9, 0.95, 0.99]}
)

# Learning rate scheduler configs
cos_decay_opt = CosineDecayWithWarmupOptConfig(
    max_lr={"select": [1e-4, 3e-4, 5e-4, 8e-4, 1e-3, 3e-3, 5e-3]},
    min_lr={"select": [1e-6, 5e-6, 1e-5, 3e-5, 5e-5, 8e-5, 1e-4]},
    warmup_iters={"select": [50, 100, 200, 400, 800]},
    decay_iters={"select": [500, 1000, 2000, 4000, 8000]}
)
cos_anneal_opt = CosineAnnealingWarmRestartsOptConfig(
    T_0={"select": [200, 500, 1000, 2000, 5000, 10000]},
    T_mult={"select": [1, 2, 3]},
    eta_min={"select": [1e-6, 5e-6, 1e-5, 3e-5, 5e-5, 8e-5, 1e-4, 3e-4]}
)

# Optimization config
opt_config = OptimizationConfig(
    training_iters=2000,
    train_batch_size=512,
    test_dataset_size=500,
    checkpoint_type='single_step',
    opt_models=[mlp_opt, rnn_opt, transformer_opt],
    opt_optimizers=[adamw_opt, sgd_opt],
    opt_lr_schedulers=[None, cos_decay_opt, cos_anneal_opt],
    num_trials=5
)

# Execute model optimization
onyx.optimize_model(
    model_name='example_model_optimized',
    model_sim_config=sim_config,
    dataset_name='example_train_data',
    optimization_config=opt_config,
)
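
This reference does not state where optimization results land; presumably (an assumption) the best trial's model is stored under model_name and can be fetched with load_model:

# Assumption: the optimized model is saved under the model_name given above
best_model = onyx.load_model('example_model_optimized')
print(best_model.config)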

onyxengine.api.save_dataset(name: str, dataset: OnyxDataset, source_datasets: List[Dict[str, str | None]] = [])

Save a dataset to the Engine.

Parameters:
  • name (str) – The name for the new dataset.

  • dataset (OnyxDataset) – The OnyxDataset object to save.

  • source_datasets (List[Dict[str, Optional[str]]]) – The source datasets used, as a list of dictionaries, e.g. [{'name': 'dataset_name', 'version_id': 'dataset_version'}]. If no version_id is provided, the latest version is used.

Example:

# Load data
raw_data = onyx.load_dataset('example_data')

# Pull out features for model training
train_data = pd.DataFrame()
train_data['acceleration_predicted'] = raw_data.dataframe['acceleration']
train_data['velocity'] = raw_data.dataframe['velocity']
train_data['position'] = raw_data.dataframe['position']
train_data['control_input'] = raw_data.dataframe['control_input']
train_data = train_data.dropna()

# Save training dataset
train_dataset = OnyxDataset(
    features=train_data.columns,
    dataframe=train_data,
    num_outputs=1,
    num_state=2,
    num_control=1,
    dt=0.0025
)
onyx.save_dataset(name='example_train_data', dataset=train_dataset, source_datasets=[{'name': 'example_data'}])
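
A quick round-trip with load_dataset confirms the save:

# Reload the dataset we just saved to verify the round trip
reloaded = onyx.load_dataset('example_train_data')
print(reloaded.dataframe.head())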

onyxengine.api.save_model(name: str, model: Module, source_datasets: List[Dict[str, str | None]] = [])

Save a model to the Engine. Generally, you won't need to call this function, since the Engine automatically saves the models it trains.

Parameters:
  • name (str) – The name for the new model.

  • model (torch.nn.Module) – The Onyx model to save.

  • source_datasets (List[Dict[str, Optional[str]]]) – The source datasets used, as a list of dictionaries, e.g. [{'name': 'dataset_name', 'version_id': 'dataset_version'}]. If no version_id is provided, the latest version is used.

Example:

# Create model configuration
sim_config = ModelSimulatorConfig(
    outputs=['acceleration'],
    states=[
        State(name='velocity', relation='derivative', parent='acceleration'),
        State(name='position', relation='derivative', parent='velocity'),
    ],
    controls=['control_input'],
    dt=0.0025
)
mlp_config = MLPConfig(
    sim_config=sim_config,
    num_inputs=sim_config.num_inputs,
    num_outputs=sim_config.num_outputs,
    hidden_layers=2,
    hidden_size=32,
    activation='relu',
    dropout=0.2,
    bias=True
)

# Create and save model
model = MLP(mlp_config)
onyx.save_model(name='example_model', model=model, source_datasets=[{'name': 'example_train_data'}])

onyxengine.api.train_model(model_name: str = '', model_config: MLPConfig | RNNConfig | TransformerConfig | None = None, dataset_name: str = '', dataset_version_id: str | None = None, training_config: TrainingConfig = TrainingConfig(training_iters=3000, train_batch_size=32, train_val_split_ratio=0.9, test_dataset_size=500, checkpoint_type='single_step', optimizer=AdamWConfig(name='adamw', lr=0.0003, weight_decay=0.01), lr_scheduler=None), monitor_training: bool = True)

Train a model on the Engine using a specified dataset, model config, and training config.

Parameters:
  • model_name (str) – The name of the model to train. (Required)

  • model_config (Union[MLPConfig, RNNConfig, TransformerConfig]) – The configuration for the model to train. (Required)

  • dataset_name (str) – The name of the dataset to train on. (Required)

  • dataset_version_id (str, optional) – The version ID of the dataset to train on; if None, the latest version is used. (Default is None)

  • training_config (TrainingConfig) – The configuration for the training process. (Default is TrainingConfig())

  • monitor_training (bool, optional) – Whether to monitor the training job. (Default is True)

Example:

# Model config
sim_config = ModelSimulatorConfig(
    outputs=['acceleration'],
    states=[
        State(name='velocity', relation='derivative', parent='acceleration'),
        State(name='position', relation='derivative', parent='velocity'),
    ],
    controls=['control_input'],
    dt=0.0025
)
model_config = MLPConfig(
    sim_config=sim_config,
    num_inputs=sim_config.num_inputs,
    num_outputs=sim_config.num_outputs,
    hidden_layers=2,
    hidden_size=64,
    activation='relu',
    dropout=0.2,
    bias=True
)

# Training config
training_config = TrainingConfig(
    training_iters=2000,
    train_batch_size=32,
    test_dataset_size=500,
    checkpoint_type='single_step',
    optimizer=AdamWConfig(lr=3e-4, weight_decay=1e-2),
    lr_scheduler=CosineDecayWithWarmupConfig(max_lr=3e-4, min_lr=3e-5, warmup_iters=200, decay_iters=1000)
)

# Execute training
onyx.train_model(
    model_name='example_model',
    model_config=model_config,
    dataset_name='example_train_data',
    training_config=training_config,
    monitor_training=True
)
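
Since the Engine automatically saves the models it trains (noted under save_model above), the trained model should then be loadable by name:

# Retrieve the trained model once the job completes
trained_model = onyx.load_model('example_model')
print(trained_model.config)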