onyxengine
The primary API functions for interacting with the Engine.
- onyxengine.api.get_object_metadata(name: str, version_id: str | None = None) → dict
Get the metadata for an object in the Engine.
- Parameters:
name (str) – The name of the object to get metadata for.
version_id (str, optional) – The version id of the object to get metadata for, None = latest_version. (Default is None)
- Returns:
The metadata for the object, or None if the object does not exist.
- Return type:
dict
Example:
    # Get metadata for an Onyx object (dataset, model)
    metadata = onyx.get_object_metadata('example_data')
    print(metadata)

    # Get metadata for a specific version
    metadata = onyx.get_object_metadata('example_data', version_id='a05fb872-0a7d-4a68-b189-aeece143c7e4')
    print(metadata)
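Because a missing object returns None rather than raising, the call doubles as an existence check. A minimal sketch (the object name below is illustrative):

    # Check whether an object exists before using it ('my_dataset' is hypothetical)
    metadata = onyx.get_object_metadata('my_dataset')
    if metadata is None:
        print('Object not found in the Engine')
    else:
        print(metadata)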
- onyxengine.api.load_dataset(name: str, version_id: str | None = None) → OnyxDataset
Load a dataset from the Engine, either from a local cached copy or by downloading from the Engine.
- Parameters:
name (str) – The name of the dataset to load.
version_id (str, optional) – The version id of the dataset to load, None = latest_version. (Default is None)
- Returns:
The loaded dataset.
- Return type:
OnyxDataset
Example:
    # Load the training dataset
    train_dataset = onyx.load_dataset('example_train_data')
    print(train_dataset.dataframe.head())
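A specific dataset version can also be pinned via version_id, mirroring get_object_metadata and load_model; a short sketch (the UUID is illustrative):

    # Load a pinned version of the training dataset (version id is illustrative)
    train_dataset = onyx.load_dataset('example_train_data', version_id='a05fb872-0a7d-4a68-b189-aeece143c7e4')
    print(train_dataset.dataframe.head())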
- onyxengine.api.load_model(name: str, version_id: str | None = None) → Module
Load a model from the Engine, either from a local cached copy or by downloading from the Engine.
- Parameters:
name (str) – The name of the model to load.
version_id (str, optional) – The version of the model to load, None = latest_version. (Default is None)
- Returns:
The loaded Onyx model.
- Return type:
torch.nn.Module
Example:
    # Load our model
    model = onyx.load_model('example_model')
    print(model.config)

    # Load a specific version of the model
    model = onyx.load_model('example_model', version_id='a05fb872-0a7d-4a68-b189-aeece143c7e4')
    print(model.config)
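The returned model is a standard torch.nn.Module, so it can be called directly. A minimal sketch of a forward pass, assuming the model consumes a (batch, sequence_length, num_inputs) tensor and that model.config exposes sequence_length and num_inputs (inspect model.config for the exact layout):

    import torch

    # Hypothetical forward pass; the input shape is an assumption based on the config fields
    x = torch.zeros(1, model.config.sequence_length, model.config.num_inputs)
    with torch.no_grad():
        y = model(x)
    print(y.shape)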
- onyxengine.api.optimize_model(model_name: str = '', model_sim_config: ModelSimulatorConfig | None = None, dataset_name: str = '', dataset_version_id: str | None = None, optimization_config: OptimizationConfig | None = None)
Optimize a model on the Engine using a specified dataset, model simulator config, and optimization config. The optimization config defines the search space for hyperparameters.
- Parameters:
model_name (str) – The name of the model to optimize. (Required)
model_sim_config (ModelSimulatorConfig) – The configuration for the model simulator. (Required)
dataset_name (str) – The name of the dataset to optimize on. (Required)
dataset_version_id (str, optional) – The version of the dataset to optimize on, None = latest_version. (Default is None)
optimization_config (OptimizationConfig) – The configuration for the optimization process. (Required)
Example:
    # Model sim config (used across all trials)
    sim_config = ModelSimulatorConfig(
        outputs=['acceleration'],
        states=[
            State(name='velocity', relation='derivative', parent='acceleration'),
            State(name='position', relation='derivative', parent='velocity'),
        ],
        controls=['control_input'],
        dt=0.0025
    )

    # Model optimization configs
    mlp_opt = MLPOptConfig(
        sim_config=sim_config,
        num_inputs=sim_config.num_inputs,
        num_outputs=sim_config.num_outputs,
        sequence_length={"select": [1, 2, 4, 5, 6, 8, 10]},
        hidden_layers={"range": [2, 4, 1]},
        hidden_size={"select": [12, 24, 32, 64, 128]},
        activation={"select": ['relu', 'tanh']},
        dropout={"range": [0.0, 0.4, 0.1]},
        bias=True
    )
    rnn_opt = RNNOptConfig(
        sim_config=sim_config,
        num_inputs=sim_config.num_inputs,
        num_outputs=sim_config.num_outputs,
        rnn_type={"select": ['RNN', 'LSTM', 'GRU']},
        sequence_length={"select": [1, 2, 4, 5, 6, 8, 10, 12, 14, 15]},
        hidden_layers={"range": [2, 4, 1]},
        hidden_size={"select": [12, 24, 32, 64, 128]},
        dropout={"range": [0.0, 0.4, 0.1]},
        bias=True
    )
    transformer_opt = TransformerOptConfig(
        sim_config=sim_config,
        num_inputs=sim_config.num_inputs,
        num_outputs=sim_config.num_outputs,
        sequence_length={"select": [1, 2, 4, 5, 6, 8, 10, 12, 14, 15]},
        n_layer={"range": [2, 4, 1]},
        n_head={"range": [2, 10, 2]},
        n_embd={"select": [12, 24, 32, 64, 128]},
        dropout={"range": [0.0, 0.4, 0.1]},
        bias=True
    )

    # Optimizer configs
    adamw_opt = AdamWOptConfig(
        lr={"select": [1e-5, 5e-5, 1e-4, 3e-4, 5e-4, 8e-4, 1e-3, 5e-3, 1e-2]},
        weight_decay={"select": [1e-4, 1e-3, 1e-2, 1e-1]}
    )
    sgd_opt = SGDOptConfig(
        lr={"select": [1e-5, 5e-5, 1e-4, 3e-4, 5e-4, 8e-4, 1e-3, 5e-3, 1e-2]},
        weight_decay={"select": [1e-4, 1e-3, 1e-2, 1e-1]},
        momentum={"select": [0, 0.8, 0.9, 0.95, 0.99]}
    )

    # Learning rate scheduler configs
    cos_decay_opt = CosineDecayWithWarmupOptConfig(
        max_lr={"select": [1e-4, 3e-4, 5e-4, 8e-4, 1e-3, 3e-3, 5e-3]},
        min_lr={"select": [1e-6, 5e-6, 1e-5, 3e-5, 5e-5, 8e-5, 1e-4]},
        warmup_iters={"select": [50, 100, 200, 400, 800]},
        decay_iters={"select": [500, 1000, 2000, 4000, 8000]}
    )
    cos_anneal_opt = CosineAnnealingWarmRestartsOptConfig(
        T_0={"select": [200, 500, 1000, 2000, 5000, 10000]},
        T_mult={"select": [1, 2, 3]},
        eta_min={"select": [1e-6, 5e-6, 1e-5, 3e-5, 5e-5, 8e-5, 1e-4, 3e-4]}
    )

    # Optimization config
    opt_config = OptimizationConfig(
        training_iters=2000,
        train_batch_size=512,
        test_dataset_size=500,
        checkpoint_type='single_step',
        opt_models=[mlp_opt, rnn_opt, transformer_opt],
        opt_optimizers=[adamw_opt, sgd_opt],
        opt_lr_schedulers=[None, cos_decay_opt, cos_anneal_opt],
        num_trials=5
    )

    # Execute model optimization
    onyx.optimize_model(
        model_name='example_model_optimized',
        model_sim_config=sim_config,
        dataset_name='example_train_data',
        optimization_config=opt_config,
    )
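Reading the example above, the search-space dictionaries appear to take two forms: {"select": [...]} picks from an explicit candidate list, while {"range": [low, high, step]} sweeps an interval; plain values (e.g. bias=True) are held fixed across trials. Under that reading, a narrower single-architecture search is just a smaller OptimizationConfig, and the winning model can be fetched afterwards with load_model. A sketch reusing sim_config, mlp_opt, and adamw_opt from above (names and trial counts are illustrative):

    # Minimal search over MLP hyperparameters only
    small_opt_config = OptimizationConfig(
        training_iters=1000,
        train_batch_size=256,
        test_dataset_size=500,
        checkpoint_type='single_step',
        opt_models=[mlp_opt],        # one architecture
        opt_optimizers=[adamw_opt],  # one optimizer family
        opt_lr_schedulers=[None],    # no LR scheduler
        num_trials=3
    )
    onyx.optimize_model(
        model_name='example_model_small_search',
        model_sim_config=sim_config,
        dataset_name='example_train_data',
        optimization_config=small_opt_config,
    )

    # Retrieve the optimized model like any other Engine model
    model = onyx.load_model('example_model_small_search')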
- onyxengine.api.save_dataset(name: str, dataset: OnyxDataset, source_datasets: List[Dict[str, str | None]] = [])
Save a dataset to the Engine.
- Parameters:
name (str) – The name for the new dataset.
dataset (OnyxDataset) – The OnyxDataset object to save
source_datasets (List[Dict[str, Optional[str]]]) – The source datasets used, as a list of dictionaries, e.g. [{'name': 'dataset_name', 'version_id': 'dataset_version'}]. If no version is provided, the latest version will be used.
Example:
    import pandas as pd

    # Load data
    raw_data = onyx.load_dataset('example_data')

    # Pull out features for model training
    train_data = pd.DataFrame()
    train_data['acceleration_predicted'] = raw_data.dataframe['acceleration']
    train_data['velocity'] = raw_data.dataframe['velocity']
    train_data['position'] = raw_data.dataframe['position']
    train_data['control_input'] = raw_data.dataframe['control_input']
    train_data = train_data.dropna()

    # Save training dataset
    train_dataset = OnyxDataset(
        features=train_data.columns,
        dataframe=train_data,
        num_outputs=1,
        num_state=2,
        num_control=1,
        dt=0.0025
    )
    onyx.save_dataset(name='example_train_data', dataset=train_dataset, source_datasets=[{'name': 'example_data'}])
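To pin the lineage to an exact source version, include version_id in the source_datasets entry, per the parameter description above (the UUID is illustrative):

    # Record lineage against a specific version of the source dataset
    onyx.save_dataset(
        name='example_train_data',
        dataset=train_dataset,
        source_datasets=[{'name': 'example_data', 'version_id': 'a05fb872-0a7d-4a68-b189-aeece143c7e4'}]
    )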
- onyxengine.api.save_model(name: str, model: Module, source_datasets: List[Dict[str, str | None]] = [])
Save a model to the Engine. Generally you won't need to use this function, as the Engine automatically saves the models it trains.
- Parameters:
name (str) – The name for the new model.
model (torch.nn.Module) – The Onyx model to save.
source_datasets (List[Dict[str, Optional[str]]]) – The source datasets used, as a list of dictionaries, e.g. [{'name': 'dataset_name', 'version_id': 'dataset_version'}]. If no version is provided, the latest version will be used.
Example:
    # Create model configuration
    sim_config = ModelSimulatorConfig(
        outputs=['acceleration'],
        states=[
            State(name='velocity', relation='derivative', parent='acceleration'),
            State(name='position', relation='derivative', parent='velocity'),
        ],
        controls=['control_input'],
        dt=0.0025
    )
    mlp_config = MLPConfig(
        sim_config=sim_config,
        num_inputs=sim_config.num_inputs,
        num_outputs=sim_config.num_outputs,
        hidden_layers=2,
        hidden_size=32,
        activation='relu',
        dropout=0.2,
        bias=True
    )

    # Create and save model
    model = MLP(mlp_config)
    onyx.save_model(name='example_model', model=model, source_datasets=[{'name': 'example_train_data'}])
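One way to confirm the save landed is to fetch the object's metadata with get_object_metadata, as documented above; a short sketch:

    # Verify the model was saved by fetching its metadata
    metadata = onyx.get_object_metadata('example_model')
    print(metadata)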
- onyxengine.api.train_model(model_name: str = '', model_config: MLPConfig | RNNConfig | TransformerConfig | None = None, dataset_name: str = '', dataset_version_id: str | None = None, training_config: TrainingConfig = TrainingConfig(training_iters=3000, train_batch_size=32, train_val_split_ratio=0.9, test_dataset_size=500, checkpoint_type='single_step', optimizer=AdamWConfig(name='adamw', lr=0.0003, weight_decay=0.01), lr_scheduler=None), monitor_training: bool = True)
Train a model on the Engine using a specified dataset, model config, and training config.
- Parameters:
model_name (str) – The name of the model to train. (Required)
model_config (Union[MLPConfig, RNNConfig, TransformerConfig]) – The configuration for the model to train. (Required)
dataset_name (str) – The name of the dataset to train on. (Required)
dataset_version_id (str, optional) – The version of the dataset to train on, None = latest_version. (Default is None)
training_config (TrainingConfig) – The configuration for the training process. (Default is TrainingConfig())
monitor_training (bool, optional) – Whether to monitor the training job. (Default is True)
Example:
    # Model config
    sim_config = ModelSimulatorConfig(
        outputs=['acceleration'],
        states=[
            State(name='velocity', relation='derivative', parent='acceleration'),
            State(name='position', relation='derivative', parent='velocity'),
        ],
        controls=['control_input'],
        dt=0.0025
    )
    model_config = MLPConfig(
        sim_config=sim_config,
        num_inputs=sim_config.num_inputs,
        num_outputs=sim_config.num_outputs,
        hidden_layers=2,
        hidden_size=64,
        activation='relu',
        dropout=0.2,
        bias=True
    )

    # Training config
    training_config = TrainingConfig(
        training_iters=2000,
        train_batch_size=32,
        test_dataset_size=500,
        checkpoint_type='single_step',
        optimizer=AdamWConfig(lr=3e-4, weight_decay=1e-2),
        lr_scheduler=CosineDecayWithWarmupConfig(max_lr=3e-4, min_lr=3e-5, warmup_iters=200, decay_iters=1000)
    )

    # Execute training
    onyx.train_model(
        model_name='example_model',
        model_config=model_config,
        dataset_name='example_train_data',
        training_config=training_config,
        monitor_training=True
    )
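Since the Engine saves the models it trains automatically (see save_model above), the trained weights can be pulled down afterwards with load_model; a short sketch:

    # After the training job completes, retrieve the trained model
    model = onyx.load_model('example_model')
    print(model.config)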