Architecture Overview
| Architecture | Best For | Temporal Handling | Training Speed |
|---|---|---|---|
| MLP | Simple dynamics, fast inference | Fixed window | Fastest |
| RNN | Sequential patterns, long dependencies | Recurrent state | Medium |
| Transformer | Complex interactions, variable contexts | Self-attention | Slowest |
Multi-Layer Perceptron (MLP)
The MLP flattens the input sequence into a single vector and processes it through fully-connected layers.
Architecture
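The original architecture diagram is not reproduced here; the sketch below illustrates the idea in PyTorch, assuming inputs of shape (batch, sequence_length, state_dim). The class name MLPForecaster and all layer sizes are illustrative, not part of a documented API.

```python
import torch.nn as nn

class MLPForecaster(nn.Module):
    """Flattens a (batch, sequence_length, state_dim) window and maps it
    through fully-connected layers to a prediction of the next state."""

    def __init__(self, state_dim, sequence_length, hidden_size=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),                                         # (B, L, D) -> (B, L * D)
            nn.Linear(sequence_length * state_dim, hidden_size),
            nn.ReLU(),
            nn.Linear(hidden_size, hidden_size),
            nn.ReLU(),
            nn.Linear(hidden_size, state_dim),                    # next-state prediction
        )

    def forward(self, window):
        return self.net(window)
```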
Configuration
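A hypothetical way to instantiate the sketch above; the values are placeholders, and only sequence_length corresponds to a parameter discussed elsewhere in this guide.

```python
import torch

# Illustrative values only (MLPForecaster is the sketch above, not a documented class).
model = MLPForecaster(
    state_dim=4,         # number of state variables per time step
    sequence_length=8,   # length of the input history window
    hidden_size=128,     # width of the fully-connected layers
)
next_state = model(torch.randn(32, 8, 4))   # (batch, sequence_length, state_dim) -> (batch, state_dim)
```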
When to Use
- Systems with relatively simple dynamics
- When inference speed is critical
- As a baseline before trying more complex architectures
- When you have limited training data
Strengths
- Fastest training and inference
- Fewest parameters for a given capacity
- Works well with short sequence lengths
Limitations
- Fixed-length input window
- Limited ability to capture long-range temporal dependencies
- May struggle with highly nonlinear dynamics
Recurrent Neural Network (RNN)
RNNs process sequences step by step, maintaining a hidden state that captures temporal patterns.
Architecture
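Again, a minimal PyTorch-style sketch rather than the library's own definition; RNNForecaster and its parameter names are illustrative, and the rnn_type strings match the table below.

```python
import torch.nn as nn

class RNNForecaster(nn.Module):
    """Processes the window step by step, carrying a hidden state, and
    predicts the next state from the final time step."""

    def __init__(self, state_dim, hidden_size=128, num_layers=1, rnn_type="LSTM"):
        super().__init__()
        rnn_cls = {"RNN": nn.RNN, "LSTM": nn.LSTM, "GRU": nn.GRU}[rnn_type]
        self.rnn = rnn_cls(state_dim, hidden_size, num_layers, batch_first=True)
        self.head = nn.Linear(hidden_size, state_dim)

    def forward(self, window):            # window: (batch, sequence_length, state_dim)
        output, _ = self.rnn(window)      # output: (batch, sequence_length, hidden_size)
        return self.head(output[:, -1])   # predict from the last hidden output
```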
Configuration
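A hypothetical instantiation of the sketch above, with placeholder values.

```python
import torch

# Illustrative values only; the rnn_type strings mirror the table below.
model = RNNForecaster(
    state_dim=4,
    hidden_size=128,
    num_layers=2,
    rnn_type="GRU",
)
next_state = model(torch.randn(32, 16, 4))   # longer windows do not grow the parameter count
```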
RNN Types
| Type | Description | Best For |
|---|---|---|
| 'RNN' | Basic recurrent unit | Simple sequences |
| 'LSTM' | Long Short-Term Memory | Long-range dependencies |
| 'GRU' | Gated Recurrent Unit | Most of the LSTM's benefits with fewer parameters |
When to Use
- Systems with strong temporal dependencies
- When longer sequence context improves predictions
- Systems where recent history matters more than distant past
Strengths
- Natural handling of sequential data
- Can capture longer temporal patterns than MLP
- LSTM/GRU handle vanishing gradient problem
Limitations
- Sequential processing limits parallelization
- Slower training than MLP
- May overfit on small datasets
Transformer
Transformers use self-attention to relate all positions in the input sequence directly.
Architecture
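As with the other sections, this is a minimal PyTorch-style sketch of the idea; TransformerForecaster, the learned positional embedding, and all sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class TransformerForecaster(nn.Module):
    """Uses self-attention so every position in the window can attend
    directly to every other position."""

    def __init__(self, state_dim, sequence_length, d_model=64, nhead=4, num_layers=2):
        super().__init__()
        self.input_proj = nn.Linear(state_dim, d_model)
        self.pos_embedding = nn.Parameter(torch.zeros(1, sequence_length, d_model))
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.head = nn.Linear(d_model, state_dim)

    def forward(self, window):                        # (batch, sequence_length, state_dim)
        x = self.input_proj(window) + self.pos_embedding
        x = self.encoder(x)                           # self-attention across all positions
        return self.head(x[:, -1])                    # predict from the final position
```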
Configuration
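A hypothetical instantiation of the sketch above, with placeholder values.

```python
import torch

# Illustrative values only; attention cost grows quadratically with sequence_length.
model = TransformerForecaster(
    state_dim=4,
    sequence_length=32,
    d_model=64,
    nhead=4,
    num_layers=2,
)
next_state = model(torch.randn(8, 32, 4))
```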
When to Use
- Complex systems with multiple interacting dynamics
- When you need to capture relationships across the full sequence
- Systems where both recent and distant history matter equally
Strengths
- Parallel processing of sequence elements
- Direct modeling of long-range dependencies
- Flexible attention patterns
Limitations
- Memory usage scales quadratically with sequence length
- May need more data to train effectively
Choosing an Architecture
Decision Guide
Start with MLP
Use a simple MLP as your baseline. If it achieves good single-step accuracy, you may not need a more complex architecture.
Try sequence length first
Before switching architectures, try increasing sequence_length. Often a longer MLP window captures enough temporal context.
Sequence Length Considerations
The sequence_length parameter affects each architecture differently:
| Architecture | Effect of Longer Sequence |
|---|---|
| MLP | Linear increase in parameters (flattened input) |
| RNN | More steps to process, similar parameter count |
| Transformer | Quadratic increase in attention computation |
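As a rough illustration of the table (reusing the hypothetical MLPForecaster sketched earlier, so the absolute numbers are meaningless), the MLP's parameter count grows linearly with the window length, while the per-head attention matrix in a transformer layer grows with its square:

```python
def count_params(model):
    return sum(p.numel() for p in model.parameters())

for L in (8, 16, 32):
    mlp = MLPForecaster(state_dim=4, sequence_length=L)
    print(f"sequence_length={L:3d}  MLP params={count_params(mlp):7d}  "
          f"attention entries per head={L * L}")
```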