Onyx Engine provides three neural network architectures optimized for hardware simulation: MLP, RNN, and Transformer. Each has different strengths depending on your system’s dynamics.

Architecture Overview

| Architecture | Best For | Temporal Handling | Training Speed |
| --- | --- | --- | --- |
| MLP | Simple dynamics, fast inference | Fixed window | Fastest |
| RNN | Sequential patterns, long dependencies | Recurrent state | Medium |
| Transformer | Complex interactions, variable contexts | Self-attention | Slowest |

Multi-Layer Perceptron (MLP)

The MLP flattens the input sequence into a single vector and processes it through fully-connected layers.

Architecture

Input: (batch, sequence_length, num_inputs)
  ↓ Flatten
  ↓ Linear → LayerNorm → Activation → Dropout
  ↓ Linear → LayerNorm → Activation → Dropout
  ↓ ... (hidden_layers times)
  ↓ Linear
Output: (batch, num_outputs)
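
As a rough illustration of this layer stack (not Onyx Engine's actual implementation), a plain PyTorch sketch with a hypothetical SimpleMLP class might look like this:

import torch
import torch.nn as nn

# Illustrative sketch only -- the layer order mirrors the diagram above.
class SimpleMLP(nn.Module):
    def __init__(self, num_inputs, num_outputs, sequence_length=8,
                 hidden_layers=3, hidden_size=64, dropout=0.2):
        super().__init__()
        layers = []
        in_features = sequence_length * num_inputs   # flattened history window
        for _ in range(hidden_layers):
            layers += [nn.Linear(in_features, hidden_size),
                       nn.LayerNorm(hidden_size),
                       nn.ReLU(),
                       nn.Dropout(dropout)]
            in_features = hidden_size
        layers.append(nn.Linear(hidden_size, num_outputs))
        self.net = nn.Sequential(*layers)

    def forward(self, x):                  # x: (batch, sequence_length, num_inputs)
        return self.net(x.flatten(1))      # -> (batch, num_outputs)

x = torch.randn(32, 8, 4)                  # 32 samples, 8-step window, 4 input signals
print(SimpleMLP(num_inputs=4, num_outputs=2)(x).shape)   # torch.Size([32, 2])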

Configuration

from onyxengine.modeling import MLPConfig

config = MLPConfig(
    outputs=outputs,
    inputs=inputs,
    dt=0.01,
    sequence_length=8,      # Input history window
    hidden_layers=3,        # Number of hidden layers (1-10)
    hidden_size=64,         # Neurons per layer (1-1024)
    activation='relu',      # 'relu', 'gelu', 'tanh', 'sigmoid'
    dropout=0.2,            # Dropout rate (0.0-1.0)
    bias=True               # Include bias terms
)

When to Use

  • Systems with relatively simple dynamics
  • When inference speed is critical
  • As a baseline before trying more complex architectures
  • When you have limited training data

Strengths

  • Fastest training and inference
  • Fewest parameters for a given capacity
  • Works well with short sequence lengths

Limitations

  • Fixed-length input window
  • Limited ability to capture long-range temporal dependencies
  • May struggle with highly nonlinear dynamics

Recurrent Neural Network (RNN)

RNNs process sequences step by step, maintaining a hidden state that captures temporal patterns.

Architecture

Input: (batch, sequence_length, num_inputs)
  ↓ RNN/LSTM/GRU layers (with hidden state)
  ↓ Linear
Output: (batch, num_outputs)
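
For intuition only, a minimal PyTorch sketch of this shape of model (the SimpleRNNModel class below is hypothetical, not Onyx Engine's implementation) could be:

import torch
import torch.nn as nn

# Illustrative sketch only -- an LSTM stack followed by a linear readout.
class SimpleRNNModel(nn.Module):
    def __init__(self, num_inputs, num_outputs, hidden_layers=2,
                 hidden_size=64, dropout=0.1):
        super().__init__()
        self.rnn = nn.LSTM(num_inputs, hidden_size, num_layers=hidden_layers,
                           batch_first=True, dropout=dropout)
        self.head = nn.Linear(hidden_size, num_outputs)

    def forward(self, x):                  # x: (batch, sequence_length, num_inputs)
        out, _ = self.rnn(x)               # hidden state carried across timesteps
        return self.head(out[:, -1, :])    # predict from the final timestep

x = torch.randn(32, 12, 4)                 # 32 samples, 12-step window, 4 input signals
print(SimpleRNNModel(num_inputs=4, num_outputs=2)(x).shape)   # torch.Size([32, 2])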

Configuration

from onyxengine.modeling import RNNConfig

config = RNNConfig(
    outputs=outputs,
    inputs=inputs,
    dt=0.01,
    sequence_length=12,
    rnn_type='LSTM',        # 'RNN', 'LSTM', 'GRU'
    hidden_layers=2,        # Number of RNN layers (1-10)
    hidden_size=64,         # Hidden state dimension (1-1024)
    dropout=0.1,            # Dropout between layers (0.0-1.0)
    bias=True               # Include bias terms
)

RNN Types

| Type | Description | Best For |
| --- | --- | --- |
| 'RNN' | Basic recurrent unit | Simple sequences |
| 'LSTM' | Long Short-Term Memory | Long-range dependencies |
| 'GRU' | Gated Recurrent Unit | Most of the LSTM benefits with fewer parameters |
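
The parameter trade-off between the cell types can be checked directly in PyTorch; the counts below assume a single recurrent layer with 4 inputs and a hidden size of 64.

import torch.nn as nn

def n_params(module):
    return sum(p.numel() for p in module.parameters())

# A vanilla RNN uses one weight set per step, a GRU three gates, an LSTM four.
for cls in (nn.RNN, nn.GRU, nn.LSTM):
    cell = cls(input_size=4, hidden_size=64, batch_first=True)
    print(cls.__name__, n_params(cell))
# RNN 4480, GRU 13440, LSTM 17920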

When to Use

  • Systems with strong temporal dependencies
  • When longer sequence context improves predictions
  • Systems where recent history matters more than distant past

Strengths

  • Natural handling of sequential data
  • Can capture longer temporal patterns than MLP
  • LSTM/GRU mitigate the vanishing gradient problem

Limitations

  • Sequential processing limits parallelization
  • Slower training than MLP
  • May overfit on small datasets

Transformer

Transformers use self-attention to relate all positions in the input sequence directly.

Architecture

Input: (batch, sequence_length, num_inputs)
  ↓ Linear embedding
  ↓ Positional encoding
  ↓ Multi-head self-attention + Feed-forward
  ↓ ... (n_layer times)
  ↓ Linear
Output: (batch, num_outputs)
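
Again as an illustration rather than Onyx Engine's actual implementation (the SimpleTransformer class and the learned positional embedding below are assumptions), the same shape of model in plain PyTorch:

import torch
import torch.nn as nn

# Illustrative sketch only -- embedding, positions, encoder blocks, linear head.
class SimpleTransformer(nn.Module):
    def __init__(self, num_inputs, num_outputs, sequence_length=16,
                 n_layer=2, n_head=4, n_embd=64, dropout=0.1):
        super().__init__()
        self.embed = nn.Linear(num_inputs, n_embd)                        # linear embedding
        self.pos = nn.Parameter(torch.zeros(1, sequence_length, n_embd))  # learned positions
        block = nn.TransformerEncoderLayer(d_model=n_embd, nhead=n_head,
                                           dropout=dropout, batch_first=True)
        self.encoder = nn.TransformerEncoder(block, num_layers=n_layer)
        self.head = nn.Linear(n_embd, num_outputs)

    def forward(self, x):                  # x: (batch, sequence_length, num_inputs)
        h = self.encoder(self.embed(x) + self.pos)
        return self.head(h[:, -1, :])      # predict from the last position

x = torch.randn(32, 16, 4)                 # 32 samples, 16-step window, 4 input signals
print(SimpleTransformer(num_inputs=4, num_outputs=2)(x).shape)   # torch.Size([32, 2])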

Configuration

from onyxengine.modeling import TransformerConfig

config = TransformerConfig(
    outputs=outputs,
    inputs=inputs,
    dt=0.01,
    sequence_length=16,
    n_layer=2,              # Transformer blocks (1-10)
    n_head=4,               # Attention heads (1-12)
    n_embd=64,              # Embedding dimension (must be divisible by n_head)
    dropout=0.1,            # Dropout rate (0.0-1.0)
    bias=True               # Include bias terms
)
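
The divisibility constraint exists because the embedding is split evenly across the attention heads; a quick sanity check for the values above:

n_embd, n_head = 64, 4
assert n_embd % n_head == 0      # the embedding must split evenly across heads
head_dim = n_embd // n_head      # each head attends over a 16-dimensional slice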

When to Use

  • Complex systems with multiple interacting dynamics
  • When you need to capture relationships across the full sequence
  • Systems where both recent and distant history matter equally

Strengths

  • Parallel processing of sequence elements
  • Direct modeling of long-range dependencies
  • Flexible attention patterns

Limitations

  • Memory usage scales quadratically with sequence length
  • May need more data to train effectively

Choosing an Architecture

Decision Guide

1. Start with MLP
   Use a simple MLP as your baseline. If it achieves good single-step accuracy, you may not need a more complex architecture.

2. Try sequence length first
   Before switching architectures, try increasing sequence_length. Often a longer MLP window captures enough temporal context. A sketch of this kind of baseline sweep follows this guide.

3. Move to RNN or Transformer for temporal patterns
   If increasing sequence length helps but plateaus, try an LSTM/GRU or Transformer. RNNs extract patterns from longer sequences; Transformers suit systems with multiple interacting components or where relationships between distant timesteps matter.
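
As a sketch of steps 1 and 2 (assuming inputs and outputs are defined as in the configuration examples above), you might compare MLP baselines that differ only in their history window:

from onyxengine.modeling import MLPConfig

# Baseline sweep: identical MLPs that differ only in how much input history they see.
candidate_configs = [
    MLPConfig(
        outputs=outputs,
        inputs=inputs,
        dt=0.01,
        sequence_length=seq_len,   # 4-, 8-, and 16-step history windows
        hidden_layers=3,
        hidden_size=64,
        activation='relu',
        dropout=0.2,
        bias=True,
    )
    for seq_len in (4, 8, 16)
]
# Train each candidate and compare validation error; only move to an RNN or
# Transformer if the longest window still leaves a clear accuracy gap.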

Sequence Length Considerations

The sequence_length parameter affects each architecture differently:

| Architecture | Effect of Longer Sequence |
| --- | --- |
| MLP | Linear increase in parameters (flattened input) |
| RNN | More steps to process, similar parameter count |
| Transformer | Quadratic increase in attention computation |
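
To make the scaling concrete, here is a rough back-of-the-envelope comparison, assuming 4 input signals and a width of 64 (only the dominant terms are shown):

# Rough scaling arithmetic, assuming 4 input signals and width 64.
num_inputs, hidden_size = 4, 64

for seq_len in (8, 16, 32):
    mlp_first_layer = (seq_len * num_inputs) * hidden_size   # grows linearly with seq_len
    attn_scores = seq_len * seq_len                          # one score per pair of positions
    print(f"seq_len={seq_len:>2}  MLP first-layer weights={mlp_first_layer:>5}  "
          f"attention scores per head={attn_scores:>5}")
# seq_len= 8  MLP first-layer weights= 2048  attention scores per head=   64
# seq_len=16  MLP first-layer weights= 4096  attention scores per head=  256
# seq_len=32  MLP first-layer weights= 8192  attention scores per head= 1024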