from onyxengine.modeling import TransformerConfig
TransformerConfig(
    outputs: List[Output],
    inputs: List[Input],
    dt: float,
    sequence_length: int = 1,
    n_layer: int = 1,
    n_head: int = 4,
    n_embd: int = 32,
    dropout: float = 0.0,
    bias: bool = True
)
Configuration class for Transformer models (GPT-style decoder-only architecture).
Parameters
outputs (List[Output]): List of output feature definitions.
inputs (List[Input]): List of input feature definitions.
dt (float): Time step in seconds. Must match your dataset’s sampling rate.
sequence_length (int, default 1): Length of the input sequence. Range: 1-100.
n_layer (int, default 1): Number of transformer blocks. Range: 1-10.
n_head (int, default 4): Number of attention heads. Range: 1-12.
n_embd (int, default 32): Embedding dimension. Must be divisible by n_head. Range: 1-1024.
dropout (float, default 0.0): Dropout rate. Range: 0.0-1.0.
bias (bool, default True): Whether to include bias terms.
Example
from onyxengine.modeling import TransformerConfig, Input, Output
outputs = [Output(name='acceleration')]
inputs = [
    Input(name='velocity', parent='acceleration', relation='derivative'),
    Input(name='position', parent='velocity', relation='derivative'),
    Input(name='control_input'),
]

config = TransformerConfig(
    outputs=outputs,
    inputs=inputs,
    dt=0.01,
    sequence_length=16,
    n_layer=2,
    n_head=4,
    n_embd=64,  # Must be divisible by n_head (64/4=16)
    dropout=0.1,
    bias=True
)
Architecture
Input: (batch, sequence_length, num_inputs)
↓ Linear embedding to n_embd dimensions
↓ Add positional encoding
↓ Transformer block (multi-head attention + feed-forward)
↓ ... (n_layer times)
↓ Linear(n_embd, num_outputs)
Output: (batch, num_outputs)
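The flow above maps directly onto a small PyTorch module. The sketch below is illustrative only and is not onyxengine's actual implementation; the learned positional embedding, the causal attention mask, and reading the last time step for the output head are assumptions consistent with a GPT-style decoder:

import torch
import torch.nn as nn

class TransformerSketch(nn.Module):
    # Shape-flow sketch: (batch, sequence_length, num_inputs) -> (batch, num_outputs)
    def __init__(self, num_inputs, num_outputs, sequence_length,
                 n_layer=2, n_head=4, n_embd=64, dropout=0.1, bias=True):
        super().__init__()
        # Linear embedding to n_embd dimensions
        self.embed = nn.Linear(num_inputs, n_embd, bias=bias)
        # Learned positional encoding (assumed form)
        self.pos = nn.Parameter(torch.zeros(1, sequence_length, n_embd))
        # n_layer blocks of multi-head self-attention + feed-forward
        layer = nn.TransformerEncoderLayer(
            d_model=n_embd, nhead=n_head, dim_feedforward=4 * n_embd,
            dropout=dropout, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers=n_layer)
        # Final linear head: n_embd -> num_outputs
        self.head = nn.Linear(n_embd, num_outputs, bias=bias)

    def forward(self, x):                             # x: (batch, seq, num_inputs)
        h = self.embed(x) + self.pos                  # (batch, seq, n_embd)
        causal = nn.Transformer.generate_square_subsequent_mask(x.size(1))
        h = self.blocks(h, mask=causal)               # (batch, seq, n_embd)
        return self.head(h[:, -1, :])                 # (batch, num_outputs), last step

x = torch.randn(8, 16, 3)                             # batch=8, sequence_length=16, 3 inputs
print(TransformerSketch(num_inputs=3, num_outputs=1, sequence_length=16)(x).shape)
# torch.Size([8, 1])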
Constraints
n_embd must be divisible by n_head. For example:
n_embd=64, n_head=4 (valid: 64/4 = 16)
n_embd=64, n_head=3 (invalid: 64/3 ≈ 21.33, not an integer)
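Attention splits the embedding evenly across heads, so each head gets n_embd / n_head dimensions and the division must be exact. A plain-Python check (not an onyxengine API) before building a config:

n_embd, n_head = 64, 4
assert n_embd % n_head == 0, f"n_embd={n_embd} must be divisible by n_head={n_head}"
print(f"per-head dimension: {n_embd // n_head}")  # per-head dimension: 16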
For hyperparameter optimization, use TransformerOptConfig, which accepts search-space specifications (select or range) in place of fixed values:
from onyxengine.modeling import TransformerOptConfig
transformer_opt = TransformerOptConfig(
    outputs=outputs,
    inputs=inputs,
    dt=0.01,
    sequence_length={"select": [4, 8, 12, 16]},
    n_layer={"range": [2, 4, 1]},
    n_head={"select": [2, 4, 8]},
    n_embd={"select": [32, 64, 128]},
    dropout={"range": [0.0, 0.4, 0.1]},
    bias=True
)
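The dict-valued parameters describe the search space rather than fixed values. Assuming "select" lists explicit candidates and "range" means [min, max, step] inclusive (semantics inferred from the example above, not confirmed), the space could be enumerated roughly like this; the code below is only an illustration, not onyxengine's sweep logic:

import itertools

def expand(spec):
    # {"select": [...]} -> explicit candidates; {"range": [lo, hi, step]} ->
    # evenly spaced values (inclusive); anything else is a fixed value.
    if isinstance(spec, dict) and "select" in spec:
        return list(spec["select"])
    if isinstance(spec, dict) and "range" in spec:
        lo, hi, step = spec["range"]
        count = int(round((hi - lo) / step)) + 1
        vals = [lo + i * step for i in range(count)]
        return [round(v, 10) for v in vals] if isinstance(step, float) else vals
    return [spec]

space = {
    "sequence_length": {"select": [4, 8, 12, 16]},
    "n_layer": {"range": [2, 4, 1]},
    "n_head": {"select": [2, 4, 8]},
    "n_embd": {"select": [32, 64, 128]},
    "dropout": {"range": [0.0, 0.4, 0.1]},
}

valid = []
for values in itertools.product(*(expand(s) for s in space.values())):
    candidate = dict(zip(space, values))
    if candidate["n_embd"] % candidate["n_head"] == 0:  # enforce the divisibility constraint
        valid.append(candidate)
print(len(valid), "candidate configurations")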