Backtest Specification

Backtest Engine Requirements

Core Principles

  1. No lookahead: Signals computed only from data available at decision time
  2. No repainting: Once a signal is generated, it is immutable
  3. Realistic execution: All costs, slippage, and latency modeled
  4. Reproducible: Same inputs → same outputs (seeded randomness)
  5. Auditable: Full trade log with entry rationale

Engine Configuration

# src/evaluation/backtester.py
from dataclasses import dataclass

@dataclass
class BacktestConfig:
    # Timing
    signal_delay_bars: int = 1        # Signal on bar N, execute on bar N+1
    execution_price: str = 'open'     # 'open', 'close', 'vwap', 'worst_case'

    # Position
    max_positions: int = 1            # Concurrent positions
    position_sizing: str = 'fixed'    # 'fixed', 'risk_pct', 'kelly'

    # Costs (applied per trade)
    apply_commission: bool = True
    apply_spread: bool = True
    apply_slippage: bool = True

    # Validation
    run_anti_bias_checks: bool = True
    require_oos_validation: bool = True
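
As an illustration of `signal_delay_bars`, the sketch below shifts a signal series so that a signal computed at the close of bar N is paired with the open of bar N+1. The function name `delayed_entries`, the DataFrame layout with an `open` column, and the sample bars are assumptions made for this example, not part of the engine.

```python
import pandas as pd

def delayed_entries(bars: pd.DataFrame, signals: pd.Series,
                    signal_delay_bars: int = 1) -> pd.DataFrame:
    """Pair each non-zero signal with the open of the bar it executes on."""
    # Shifting moves a signal from bar N to bar N + signal_delay_bars
    executed = signals.shift(signal_delay_bars).fillna(0).astype(int)
    fills = bars.loc[executed != 0, ["open"]].copy()
    fills["direction"] = executed[executed != 0]
    return fills

# Hypothetical daily bars; signals are computed at each bar's close
bars = pd.DataFrame(
    {"open": [100.0, 101.0, 102.0, 103.0],
     "close": [101.0, 102.0, 103.0, 104.0]},
    index=pd.date_range("2023-01-02", periods=4, freq="D"),
)
signals = pd.Series([0, 1, 0, -1], index=bars.index)
fills = delayed_entries(bars, signals)
```

The long signal on the second bar fills at the third bar's open. The short signal on the last bar produces no fill because no later bar exists to execute it on.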

Cost Models

Commission Structure

| Market | Type | Rate | Example (1 lot) |
|--------|------|------|-----------------|
| FX (MT5) | Per lot | $3.50/side | $7.00 round trip (RT) |
| FX (Prop firm) | Included in spread | $0 | $0 |
| Crypto (Binance) | Percentage | 0.04% maker, 0.06% taker | 0.1% RT |
| Crypto (Bybit) | Percentage | 0.02% maker, 0.055% taker | 0.075% RT |
| NQ/MNQ Futures | Per contract | $0.50-2.50/side | $1-5 RT |
| Gold Futures | Per contract | $1.25/side | $2.50 RT |
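
The table can be turned into a small lookup, sketched here under assumptions: the market keys, the function name `round_trip_commission`, and the default of a maker entry with a taker exit (which reproduces the RT column for crypto) are all illustrative. NQ/MNQ is left out because the table gives a range, not a single rate.

```python
# USD per lot or contract, per side (values from the table above)
PER_SIDE_FEES = {"fx_mt5": 3.50, "fx_prop": 0.0, "gold_futures": 1.25}
# (maker, taker) as fractions of notional
PCT_FEES = {"binance": (0.0004, 0.0006), "bybit": (0.0002, 0.00055)}

def round_trip_commission(market, quantity=1.0, notional=0.0,
                          entry_is_maker=True, exit_is_maker=False):
    """Commission for one entry plus one exit."""
    if market in PER_SIDE_FEES:
        return 2 * PER_SIDE_FEES[market] * quantity
    maker, taker = PCT_FEES[market]
    entry_rate = maker if entry_is_maker else taker
    exit_rate = maker if exit_is_maker else taker
    return notional * (entry_rate + exit_rate)

fx_cost = round_trip_commission("fx_mt5")                       # 1 lot
binance_cost = round_trip_commission("binance", notional=10_000)
```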

Spread Model

def get_spread_cost(symbol, volatility_regime='normal'):
    """
    Return spread in price units.

    Spreads widen during:
    - News events (2-5x)
    - Low liquidity periods (1.5-2x)
    - High volatility (1.5-3x)
    """
    base_spreads = {
        'EURUSD': 0.00010,  # 1.0 pip
        'GBPUSD': 0.00012,  # 1.2 pips
        'USDJPY': 0.012,    # 1.2 pips
        'USDCAD': 0.00015,  # 1.5 pips
        'BTCUSD': 5.0,      # $5
        'ETHUSD': 0.50,     # $0.50
        'NQ': 0.50,         # 0.5 points
        'GC': 0.30,         # $0.30
    }

    multipliers = {
        'normal': 1.0,
        'elevated': 1.5,
        'high': 2.5,
        'extreme': 5.0
    }

    return base_spreads.get(symbol, 0.0001) * multipliers[volatility_regime]

Slippage Model

| Condition | Slippage (FX) | Slippage (Crypto) | Slippage (Futures) |
|-----------|---------------|-------------------|--------------------|
| Normal | 0.1-0.3 pips | 0.01-0.03% | 0.25-0.5 ticks |
| High volume | 0.3-0.5 pips | 0.03-0.1% | 0.5-1 tick |
| Low liquidity | 0.5-2.0 pips | 0.1-0.3% | 1-3 ticks |
| News/events | 1.0-5.0 pips | 0.2-1.0% | 2-10 ticks |

def calculate_slippage(symbol, order_type, volatility, position_size):
    """
    Calculate expected slippage.

    Factors:
    - Base slippage per instrument
    - Volatility multiplier (ATR-based)
    - Size impact (larger orders = more slippage)
    - Order type (market vs limit)
    """
    base_slippage = get_base_slippage(symbol)

    # Volatility adjustment
    vol_mult = max(1.0, volatility / get_avg_volatility(symbol))

    # Size impact (only for larger positions)
    size_mult = 1.0 + max(0, (position_size / get_avg_volume(symbol)) - 0.01)

    # Limit orders get 50% less slippage on average
    order_mult = 0.5 if order_type == 'limit' else 1.0

    return base_slippage * vol_mult * size_mult * order_mult
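
To show how the three cost components combine, here is a worked round trip for one standard lot of EURUSD (100,000 units). It is a sketch: `round_trip_cost` is a hypothetical helper, the 0.2 pip slippage is picked from the "Normal" range above, and it assumes the spread is crossed once per round trip while slippage is paid on both the entry and the exit.

```python
def round_trip_cost(spread, slippage_per_side, commission_rt, units):
    """All-in cost in quote currency for one entry plus one exit.

    spread and slippage_per_side are in price units; commission_rt is
    the round-trip commission in currency.
    """
    price_cost = (spread + 2 * slippage_per_side) * units
    return price_cost + commission_rt

cost = round_trip_cost(
    spread=0.00010,             # 1.0 pip, EURUSD base spread
    slippage_per_side=0.00002,  # 0.2 pip, assumed normal conditions
    commission_rt=7.00,         # FX (MT5), 1 lot
    units=100_000,
)
# spread $10 + slippage $4 + commission $7
```

Under these assumptions the trade needs to move about 2.1 pips in its favor just to break even.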

Fill Assumptions

Order Types

| Order Type | Fill Price | Certainty | Use Case |
|------------|------------|-----------|----------|
| Market | Next bar open + slippage | 100% | Exits, urgent entries |
| Limit | Limit price (if touched) | 70-90% | Standard entries |
| Stop | Stop price + slippage | 95% | Stop losses |
| Stop-Limit | Limit (if stop triggered) | 60-80% | Controlled stops |

Partial Fill Handling

# Conservative assumption: No partial fills modeled
# All orders either fully filled or not filled

fill_rate_assumptions = {
    'market': 1.00,     # Always fills
    'limit': 0.80,      # 80% fill rate for limits at touch
    'stop': 0.95,       # 95% for stops (gaps can skip)
    'stop_limit': 0.70  # 70% for stop-limits
}
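
One way to apply these rates while keeping the backtest reproducible is to draw each fill decision from a seeded generator. This is a minimal sketch: `simulate_fill` and the seed value are assumptions for illustration, and the function is meant to be called only once an order's price has been touched.

```python
import random

FILL_RATES = {"market": 1.00, "limit": 0.80, "stop": 0.95, "stop_limit": 0.70}

def simulate_fill(order_type: str, rng: random.Random) -> bool:
    """Return True if a touched order is treated as fully filled."""
    # rng.random() is in [0, 1), so a rate of 1.00 always fills
    return rng.random() < FILL_RATES[order_type]

# Two runs with the same seed produce the same fill sequence
rng_a = random.Random(42)
rng_b = random.Random(42)
fills_a = [simulate_fill("limit", rng_a) for _ in range(1000)]
fills_b = [simulate_fill("limit", rng_b) for _ in range(1000)]
```

Passing the generator in explicitly, instead of using the module-level `random` functions, keeps the fill sequence independent of any other randomness in the run.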

Gap Handling

| Scenario | Treatment |
|----------|-----------|
| Gap through stop | Fill at gap open (worse price) |
| Gap through limit | Fill at limit (better price) |
| Gap through entry | Fill at gap open |
| Weekend gap | Apply to Monday open |
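
The "gap through stop" rule can be sketched as below. The helper name `stop_fill_price` is hypothetical, slippage is left out, and the function assumes it is only called for a bar whose range reached the stop.

```python
def stop_fill_price(direction: str, stop_price: float, bar_open: float) -> float:
    """Fill price for a stop-loss on a bar that may open beyond the stop."""
    if direction == "long":
        # A long position exits by selling: a gap down opens below the
        # stop, so the fill is the lower of the two prices
        return min(stop_price, bar_open)
    # A short position exits by buying: a gap up opens above the stop
    return max(stop_price, bar_open)

gapped_long = stop_fill_price("long", stop_price=100.0, bar_open=98.0)
normal_long = stop_fill_price("long", stop_price=100.0, bar_open=101.0)
gapped_short = stop_fill_price("short", stop_price=100.0, bar_open=103.0)
```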

Latency Assumptions

Signal-to-Execution Timeline

Signal Generated (bar close)
    ├── Pine Script computation: ~50ms
    ├── TradingView alert dispatch: 100-500ms
    ├── Webhook receive + process: 100-300ms
    ├── Order submission to broker: 50-200ms
    └── Broker acknowledgment: 50-500ms
    ──────────────────────────────────
    Total: 350ms - 1.5s typical
           Up to 5s in degraded conditions

Backtest Latency Modeling

| Mode | Entry Execution | Exit Execution |
|------|-----------------|----------------|
| Conservative | Next bar open | Next bar open |
| Realistic | Same bar close + slippage | Same bar close + slippage |
| Optimistic | Signal price (no delay) | Signal price |

Default: Conservative mode - Trade on next bar open after signal.

Anti-Bias Tests

Mandatory Pre-Deployment Checks

| Test | Description | Pass Criteria |
|------|-------------|---------------|
| Lookahead | Shuffle future data, re-run | Results unchanged |
| Leakage | Remove target from features | No target in feature set |
| Survivorship | Include delisted instruments | Results within 10% |
| Time alignment | Shift data by 1 bar | Signals shift correctly |
| Randomized entry | Random entries, same sizing | Worse than strategy |

Lookahead Detection

def test_no_lookahead(strategy, data, n=1000, m=500):
    """
    Test that strategy doesn't use future data.

    Method:
    1. Run strategy on data[0:N]
    2. Run strategy on data[0:N+M] where M > 0
    3. Signals for bars 0:N must be identical

    If signals change, lookahead is present.
    """
    signals_original = strategy.generate_signals(data[:n])
    signals_extended = strategy.generate_signals(data[:n + m])

    # Compare signals for the first n bars
    assert signals_original.equals(signals_extended[:n]), \
        "FAIL: Lookahead detected - signals changed with future data"

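To see the truncation check catch a real problem, the sketch below runs it against two toy strategies on a synthetic rising price series. Both strategy classes, the `signals_stable` wrapper, and the data are hypothetical and exist only for this example.

```python
import numpy as np
import pandas as pd

class TrailingMA:
    """Causal: compares close to a mean of past and current bars only."""
    def generate_signals(self, data):
        ma = data["close"].rolling(20).mean()
        return (data["close"] > ma).astype(int)

class NextBarPeek:
    """Leaky: uses the next bar's close, which is unknown at decision time."""
    def generate_signals(self, data):
        return (data["close"].shift(-1) > data["close"]).astype(int)

def signals_stable(strategy, data, n=1000, m=500):
    """True if the first n signals are unchanged when m later bars are added."""
    original = strategy.generate_signals(data.iloc[:n])
    extended = strategy.generate_signals(data.iloc[:n + m])
    return original.equals(extended.iloc[:n])

# Synthetic, strictly rising closes
data = pd.DataFrame({"close": 100.0 + np.arange(1500, dtype=float)})

causal_ok = signals_stable(TrailingMA(), data)
leaky_ok = signals_stable(NextBarPeek(), data)
```

The leaky strategy fails because its signal on the last bar of the short run has no "next close" to read, while the extended run supplies one and flips that signal.
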
Overfitting Detection

import numpy as np

def test_overfitting(strategy, data, n_shuffles=100):
    """
    Compare strategy to randomized versions.

    Method:
    1. Run strategy, record Sharpe
    2. Shuffle signal timing randomly, run N times
    3. Strategy should beat 95% of random versions

    If not, edge may be spurious.
    """
    real_sharpe = strategy.backtest(data).sharpe_ratio

    random_sharpes = []
    for _ in range(n_shuffles):
        shuffled = shuffle_signals(strategy, data)
        random_sharpes.append(shuffled.backtest(data).sharpe_ratio)

    percentile = (np.array(random_sharpes) < real_sharpe).mean()
    assert percentile >= 0.95, \
        f"FAIL: Strategy only beats {percentile*100:.0f}% of random"
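
The helper `shuffle_signals` is not defined in this document. One possible reading of "shuffle signal timing" is a permutation test on per-bar positions, sketched here on plain arrays. The names `sharpe` and `permutation_percentile`, the per-bar (non-annualized) Sharpe, and the synthetic returns are all assumptions of this example.

```python
import numpy as np

def sharpe(returns: np.ndarray) -> float:
    """Per-bar Sharpe ratio (not annualized)."""
    sd = returns.std(ddof=1)
    return 0.0 if sd == 0 else float(returns.mean() / sd)

def permutation_percentile(positions, bar_returns, n_shuffles=100, seed=0):
    """Fraction of shuffled-timing runs that the real positions beat."""
    rng = np.random.default_rng(seed)  # seeded for reproducibility
    real = sharpe(positions * bar_returns)
    shuffled = [
        # Same positions, random timing: keeps exposure, destroys any edge
        sharpe(rng.permutation(positions) * bar_returns)
        for _ in range(n_shuffles)
    ]
    return float((np.array(shuffled) < real).mean())

# Synthetic returns and a deliberately "perfect" strategy for illustration
bar_returns = np.random.default_rng(1).normal(0.0, 0.01, size=500)
foresight = (bar_returns > 0).astype(float)
pct = permutation_percentile(foresight, bar_returns)
```

Permuting positions keeps the number of bars in the market constant, so the comparison isolates timing skill from overall market exposure.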

Full Anti-Bias Suite

# Run all anti-bias tests
pytest tests/test_bias.py -v

# Individual tests
pytest tests/test_bias.py::test_no_lookahead
pytest tests/test_bias.py::test_no_leakage
pytest tests/test_bias.py::test_time_alignment
pytest tests/test_bias.py::test_overfitting
pytest tests/test_bias.py::test_survivorship

Walk-Forward & OOS Protocol

Data Split

Total Data: 2020-01-01 to 2024-01-01 (4 years)

├── In-Sample (IS): 2020-01-01 to 2022-12-31 (3 years, 75%)
│   └── Used for: Parameter optimization, feature selection
└── Out-of-Sample (OOS): 2023-01-01 to 2024-01-01 (1 year, 25%)
    └── Used for: Final validation only (touch once)
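
The split above could be applied as follows, assuming the data is a pandas DataFrame with a `DatetimeIndex`. The function name `split_is_oos` and the synthetic daily data are illustrative.

```python
import pandas as pd

def split_is_oos(data: pd.DataFrame, oos_start: str = "2023-01-01"):
    """Split by date so no OOS bar can appear in the in-sample set."""
    cutoff = pd.Timestamp(oos_start)
    in_sample = data[data.index < cutoff]
    out_of_sample = data[data.index >= cutoff]
    return in_sample, out_of_sample

# Synthetic daily index covering the full period
idx = pd.date_range("2020-01-01", "2024-01-01", freq="D")
data = pd.DataFrame({"close": range(len(idx))}, index=idx)
is_data, oos_data = split_is_oos(data)
```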

Walk-Forward Validation

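import pandas as pd

def walk_forward_validation(strategy, data, n_windows=6,
                            train_months=12, test_months=2):
    """
    Rolling window validation.

    Assumes `data` is a pandas DataFrame with a sorted DatetimeIndex.

    Parameters:
    - Training window: 12 months
    - Testing window: 2 months
    - Step: 2 months (non-overlapping test)
    - Total windows: 6

    Returns:
    - Per-window performance
    - Aggregate statistics
    - Consistency score (% of profitable windows)
    """
    window_results = []
    start = data.index[0]

    for i in range(n_windows):
        # Window boundaries as timestamps, stepped by one test window
        train_start = start + pd.DateOffset(months=i * test_months)
        train_end = train_start + pd.DateOffset(months=train_months)
        test_end = train_end + pd.DateOffset(months=test_months)

        # Half-open intervals [start, end) so train and test never overlap
        train_data = data[(data.index >= train_start) & (data.index < train_end)]
        test_data = data[(data.index >= train_end) & (data.index < test_end)]

        # Fit on train, evaluate on test
        strategy.fit(train_data)
        result = strategy.backtest(test_data)
        window_results.append(result)

    return aggregate_results(window_results)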

OOS Degradation Limits

| Metric | Max Degradation | Action if Exceeded |
|--------|-----------------|--------------------|
| Sharpe Ratio | 30% | Reject strategy |
| Win Rate | 15% absolute | Investigate |
| Profit Factor | 25% | Investigate |
| Max Drawdown | 50% increase | Reject strategy |
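
These limits could be checked mechanically, as in the sketch below. The function name, the metric keys, and the conventions (win rate as a fraction, max drawdown as a positive magnitude, positive IS Sharpe and profit factor) are assumptions of this example.

```python
def check_degradation(is_m: dict, oos_m: dict) -> dict:
    """Return {metric: action} for every limit that is exceeded."""
    findings = {}

    # Relative drop in Sharpe ratio
    if (is_m["sharpe"] - oos_m["sharpe"]) / is_m["sharpe"] > 0.30:
        findings["sharpe"] = "reject"
    # Absolute drop in win rate (0.15 = 15 percentage points)
    if is_m["win_rate"] - oos_m["win_rate"] > 0.15:
        findings["win_rate"] = "investigate"
    # Relative drop in profit factor
    pf_drop = (is_m["profit_factor"] - oos_m["profit_factor"]) / is_m["profit_factor"]
    if pf_drop > 0.25:
        findings["profit_factor"] = "investigate"
    # Relative increase in max drawdown
    dd_rise = (oos_m["max_drawdown"] - is_m["max_drawdown"]) / is_m["max_drawdown"]
    if dd_rise > 0.50:
        findings["max_drawdown"] = "reject"
    return findings

in_sample = {"sharpe": 2.0, "win_rate": 0.55, "profit_factor": 1.8, "max_drawdown": 0.10}
out_sample = {"sharpe": 1.2, "win_rate": 0.50, "profit_factor": 1.5, "max_drawdown": 0.12}
findings = check_degradation(in_sample, out_sample)
```

In this made-up case only the Sharpe ratio breaches its limit (a 40% drop), which is enough to reject the strategy.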

Validation Report Template

# Strategy Validation Report: [STRATEGY_NAME]

## Summary
- IS Sharpe: X.XX
- OOS Sharpe: X.XX
- Degradation: XX%
- Status: PASS / FAIL

## Walk-Forward Results
| Window | Period | Sharpe | Win% | PF | Max DD |
|--------|--------|--------|------|-----|--------|
| 1 | 2022-01 to 2022-02 | ... | ... | ... | ... |

## Anti-Bias Tests
- [ ] Lookahead: PASS
- [ ] Leakage: PASS
- [ ] Overfitting: PASS (beats 97% of random)

## Recommendation
[APPROVED FOR PAPER / NEEDS REVISION / REJECTED]

Backtest Output Requirements

Trade Log Schema

| Field | Type | Description |
|-------|------|-------------|
| trade_id | int | Unique identifier |
| entry_time | datetime | Entry bar timestamp |
| exit_time | datetime | Exit bar timestamp |
| direction | string | 'long' or 'short' |
| entry_price | float | Actual entry price |
| exit_price | float | Actual exit price |
| position_size | float | Units traded |
| pnl_gross | float | Profit before costs |
| commission | float | Commission paid |
| slippage | float | Slippage cost |
| pnl_net | float | Net profit/loss |
| entry_reason | string | Signal that triggered entry |
| exit_reason | string | TP/SL/Signal/Timeout |
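
A record type matching this schema might look like the following sketch. The class name `TradeRecord` is hypothetical, and deriving `pnl_gross` and `pnl_net` from the other fields (instead of storing them) is a design choice of this example, as is treating `commission` and `slippage` as currency amounts subtracted from gross PnL.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)  # frozen: a logged trade cannot be edited afterwards
class TradeRecord:
    trade_id: int
    entry_time: datetime
    exit_time: datetime
    direction: str        # 'long' or 'short'
    entry_price: float
    exit_price: float
    position_size: float  # units traded
    commission: float
    slippage: float
    entry_reason: str
    exit_reason: str      # TP/SL/Signal/Timeout

    @property
    def pnl_gross(self) -> float:
        sign = 1.0 if self.direction == "long" else -1.0
        return sign * (self.exit_price - self.entry_price) * self.position_size

    @property
    def pnl_net(self) -> float:
        return self.pnl_gross - self.commission - self.slippage

# Made-up trade: long 100,000 units, 50 pips of profit before costs
trade = TradeRecord(
    1, datetime(2023, 3, 1, 9), datetime(2023, 3, 1, 15), "long",
    1.1000, 1.1050, 100_000, 7.0, 4.0, "ma_cross", "TP",
)
```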

Performance Summary

from dataclasses import dataclass

@dataclass
class BacktestResult:
    # Returns
    total_return: float
    cagr: float
    sharpe_ratio: float
    sortino_ratio: float
    calmar_ratio: float

    # Risk
    max_drawdown: float
    avg_drawdown: float
    max_drawdown_duration: int  # bars

    # Trades
    total_trades: int
    win_rate: float
    avg_win: float
    avg_loss: float
    profit_factor: float
    expectancy: float

    # Validation
    is_sharpe: float
    oos_sharpe: float
    degradation_pct: float
    walk_forward_consistency: float
    anti_bias_passed: bool
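
The trade-level fields can be derived from a list of net PnLs using their usual definitions. The sketch below is illustrative: `trade_stats` is a hypothetical helper, break-even trades count as neither wins nor losses, and `avg_loss` is reported as a negative number.

```python
import math

def trade_stats(pnls: list) -> dict:
    """Compute trade statistics from per-trade net PnL values."""
    wins = [p for p in pnls if p > 0]
    losses = [p for p in pnls if p < 0]
    gross_profit = sum(wins)
    gross_loss = abs(sum(losses))
    return {
        "total_trades": len(pnls),
        "win_rate": len(wins) / len(pnls),
        "avg_win": gross_profit / len(wins) if wins else 0.0,
        "avg_loss": sum(losses) / len(losses) if losses else 0.0,
        # Profit factor: gross profit divided by gross loss
        "profit_factor": gross_profit / gross_loss if gross_loss else math.inf,
        # Expectancy: average net PnL per trade
        "expectancy": sum(pnls) / len(pnls),
    }

stats = trade_stats([100.0, -50.0, 200.0, -50.0])
```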