Architecture Overview
| Architecture | Best For | Temporal Handling | Training Speed |
|---|---|---|---|
| MLP | Simple dynamics, fast inference | Fixed window | Fastest |
| RNN | Sequential patterns, long dependencies | Recurrent state | Medium |
| Transformer | Complex interactions, variable contexts | Self-attention | Slowest |
Multi-Layer Perceptron (MLP)
The MLP flattens the input sequence into a single vector and processes it through fully-connected layers.
Architecture
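The original architecture diagram is not reproduced here; the sketch below illustrates the idea in PyTorch, assuming inputs of shape (batch, sequence_length, state_dim). The class name MLPForecaster and all layer sizes are illustrative, not part of a documented API.

```python
import torch.nn as nn

class MLPForecaster(nn.Module):
    """Flattens a (batch, sequence_length, state_dim) window and maps it
    through fully-connected layers to a prediction of the next state."""

    def __init__(self, state_dim, sequence_length, hidden_size=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),                                         # (B, L, D) -> (B, L * D)
            nn.Linear(sequence_length * state_dim, hidden_size),
            nn.ReLU(),
            nn.Linear(hidden_size, hidden_size),
            nn.ReLU(),
            nn.Linear(hidden_size, state_dim),                    # next-state prediction
        )

    def forward(self, window):
        return self.net(window)
```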
Configuration
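A hypothetical way to instantiate the sketch above; the values are placeholders, and only sequence_length corresponds to a parameter discussed elsewhere in this guide.

```python
import torch

# Illustrative values only (MLPForecaster is the sketch above, not a documented class).
model = MLPForecaster(
    state_dim=4,         # number of state variables per time step
    sequence_length=8,   # length of the input history window
    hidden_size=128,     # width of the fully-connected layers
)
next_state = model(torch.randn(32, 8, 4))   # (batch, sequence_length, state_dim) -> (batch, state_dim)
```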
When to Use
- Systems with relatively simple dynamics
- When inference speed is critical
- As a baseline before trying more complex architectures
- When you have limited training data
Strengths
- Fastest training and inference
- Fewest parameters for a given capacity
- Works well with short sequence lengths
Limitations
- Fixed-length input window
- Limited ability to capture long-range temporal dependencies
- May struggle with highly nonlinear dynamics
Recurrent Neural Network (RNN)
RNNs process sequences step by step, maintaining a hidden state that captures temporal patterns.
Architecture
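Again, a minimal PyTorch-style sketch rather than the library's own definition; RNNForecaster and its parameter names are illustrative, and the rnn_type strings match the table below.

```python
import torch.nn as nn

class RNNForecaster(nn.Module):
    """Processes the window step by step, carrying a hidden state, and
    predicts the next state from the final time step."""

    def __init__(self, state_dim, hidden_size=128, num_layers=1, rnn_type="LSTM"):
        super().__init__()
        rnn_cls = {"RNN": nn.RNN, "LSTM": nn.LSTM, "GRU": nn.GRU}[rnn_type]
        self.rnn = rnn_cls(state_dim, hidden_size, num_layers, batch_first=True)
        self.head = nn.Linear(hidden_size, state_dim)

    def forward(self, window):            # window: (batch, sequence_length, state_dim)
        output, _ = self.rnn(window)      # output: (batch, sequence_length, hidden_size)
        return self.head(output[:, -1])   # predict from the last hidden output
```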
Configuration
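A hypothetical instantiation of the sketch above, with placeholder values.

```python
import torch

# Illustrative values only; the rnn_type strings mirror the table below.
model = RNNForecaster(
    state_dim=4,
    hidden_size=128,
    num_layers=2,
    rnn_type="GRU",
)
next_state = model(torch.randn(32, 16, 4))   # longer windows do not grow the parameter count
```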
RNN Types
| Type | Description | Best For |
|---|---|---|
| 'RNN' | Basic recurrent unit | Simple sequences |
| 'LSTM' | Long Short-Term Memory | Long-range dependencies |
| 'GRU' | Gated Recurrent Unit | Most of the LSTM's benefits with fewer parameters |
When to Use
- Systems with strong temporal dependencies
- When longer sequence context improves predictions
- Systems where recent history matters more than distant past
Strengths
- Natural handling of sequential data
- Can capture longer temporal patterns than MLP
- LSTM/GRU handle vanishing gradient problem
Limitations
- Sequential processing limits parallelization
- Slower training than MLP
- May overfit on small datasets
Transformer
Transformers use self-attention to relate all positions in the input sequence directly.
Architecture
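As with the other sections, this is a minimal PyTorch-style sketch of the idea; TransformerForecaster, the learned positional embedding, and all sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class TransformerForecaster(nn.Module):
    """Uses self-attention so every position in the window can attend
    directly to every other position."""

    def __init__(self, state_dim, sequence_length, d_model=64, nhead=4, num_layers=2):
        super().__init__()
        self.input_proj = nn.Linear(state_dim, d_model)
        self.pos_embedding = nn.Parameter(torch.zeros(1, sequence_length, d_model))
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.head = nn.Linear(d_model, state_dim)

    def forward(self, window):                        # (batch, sequence_length, state_dim)
        x = self.input_proj(window) + self.pos_embedding
        x = self.encoder(x)                           # self-attention across all positions
        return self.head(x[:, -1])                    # predict from the final position
```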
Configuration
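A hypothetical instantiation of the sketch above, with placeholder values.

```python
import torch

# Illustrative values only; attention cost grows quadratically with sequence_length.
model = TransformerForecaster(
    state_dim=4,
    sequence_length=32,
    d_model=64,
    nhead=4,
    num_layers=2,
)
next_state = model(torch.randn(8, 32, 4))
```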
When to Use
- Complex systems with multiple interacting dynamics
- When you need to capture relationships across the full sequence
- Systems where both recent and distant history matter equally
Strengths
- Parallel processing of sequence elements
- Direct modeling of long-range dependencies
- Flexible attention patterns
Limitations
- Memory usage scales quadratically with sequence length
- May need more data to train effectively
Choosing an Architecture
Decision Guide
Start with MLP
Use a simple MLP as your baseline. If it achieves good single-step accuracy, you may not need a more complex architecture.
Try sequence length first
Before switching architectures, try increasing sequence_length. Often a longer MLP window captures enough temporal context.
Sequence Length Considerations
The sequence_length parameter affects each architecture differently:
| Architecture | Effect of Longer Sequence |
|---|---|
| MLP | Linear increase in parameters (flattened input) |
| RNN | More steps to process, similar parameter count |
| Transformer | Quadratic increase in attention computation |
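As a rough illustration of the table (reusing the hypothetical MLPForecaster sketched earlier, so the absolute numbers are meaningless), the MLP's parameter count grows linearly with the window length, while the per-head attention matrix in a transformer layer grows with its square:

```python
def count_params(model):
    return sum(p.numel() for p in model.parameters())

for L in (8, 16, 32):
    mlp = MLPForecaster(state_dim=4, sequence_length=L)
    print(f"sequence_length={L:3d}  MLP params={count_params(mlp):7d}  "
          f"attention entries per head={L * L}")
```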