Backtest Specification

Backtest Engine Requirements

Core Principles

No lookahead: Signals computed only from data available at decision time
No repainting: Once a signal is generated, it is immutable
Realistic execution: All costs, slippage, and latency modeled
Reproducible: Same inputs → same outputs (seeded randomness)
Auditable: Full trade log with entry rationale

Engine Configuration

# src/evaluation/backtester.py
from dataclasses import dataclass

@dataclass
class BacktestConfig:
    # Timing
    signal_delay_bars: int = 1        # Signal on bar N, execute on bar N+1
    execution_price: str = 'open'     # 'open', 'close', 'vwap', 'worst_case'

    # Position
    max_positions: int = 1            # Concurrent positions
    position_sizing: str = 'fixed'    # 'fixed', 'risk_pct', 'kelly'

    # Costs (applied per trade)
    apply_commission: bool = True
    apply_spread: bool = True
    apply_slippage: bool = True

    # Validation
    run_anti_bias_checks: bool = True
    require_oos_validation: bool = True
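
As a rough illustration of the two timing settings, the sketch below looks up the fill price `signal_delay_bars` after each signal. It assumes bars live in a pandas DataFrame with `open` and `close` columns; `entry_prices` is a hypothetical helper, not part of the engine, and it covers only the 'open' and 'close' execution prices.

```python
import pandas as pd

def entry_prices(bars: pd.DataFrame, signals: pd.Series,
                 signal_delay_bars: int = 1,
                 execution_price: str = "open") -> pd.Series:
    """Return the fill price for each signal, taken `signal_delay_bars` later.

    Hypothetical helper: 'vwap' and 'worst_case' are left out of this sketch.
    """
    if execution_price not in ("open", "close"):
        raise ValueError(f"unsupported execution_price: {execution_price}")
    # Price observed N bars after each bar, aligned back to the signal bar.
    delayed = bars[execution_price].shift(-signal_delay_bars)
    # Keep prices only where a signal fired; signals on the final bars have
    # no execution bar yet and are dropped.
    return delayed[signals.astype(bool)].dropna()

bars = pd.DataFrame({"open": [100.0, 101.0, 102.0, 103.0],
                     "close": [100.5, 101.5, 102.5, 103.5]})
signals = pd.Series([True, False, True, True])
fills = entry_prices(bars, signals)
print(fills)  # bar 0 fills at the bar 1 open, bar 2 at the bar 3 open
```

The signal on the last bar produces no fill because its execution bar does not exist yet, which is the behavior a one-bar delay implies.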

Cost Models

Commission Structure

Market	Type	Rate	Round-trip (RT) example
FX (MT5)	Per lot	$3.50/side	$7.00 RT per lot
FX (Prop firm)	Included in spread	$0	$0
Crypto (Binance)	Percentage of notional	0.04% maker, 0.06% taker	0.10% RT (one maker fill + one taker fill)
Crypto (Bybit)	Percentage of notional	0.02% maker, 0.055% taker	0.075% RT (one maker fill + one taker fill)
NQ/MNQ Futures	Per contract	$0.50-2.50/side	$1.00-5.00 RT per contract
Gold Futures	Per contract	$1.25/side	$2.50 RT per contract
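
The table can be turned into a small lookup, sketched below. The rates are copied from the table; the function name, the market keys, and the choice of one maker fill plus one taker fill for crypto are assumptions made for illustration. NQ/MNQ is omitted because the table gives a range rather than a single rate.

```python
def round_trip_commission(market: str, quantity: float = 1.0,
                          notional: float = 0.0) -> float:
    """Round-trip commission in account currency (hypothetical helper).

    quantity: lots or contracts, used by per-unit markets.
    notional: trade value, used by percentage-based markets.
    """
    per_unit_side = {"fx_mt5": 3.50, "fx_prop": 0.0, "gold_futures": 1.25}
    # Assumption: entry is a maker fill and exit is a taker fill.
    pct_round_trip = {"binance": 0.0004 + 0.0006, "bybit": 0.0002 + 0.00055}
    if market in per_unit_side:
        return 2 * per_unit_side[market] * quantity
    if market in pct_round_trip:
        return pct_round_trip[market] * notional
    raise KeyError(f"no commission rule for {market}")

print(round_trip_commission("fx_mt5", quantity=1))           # 7.0
print(round_trip_commission("binance", notional=10_000.0))   # about 10.0
```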

Spread Model

def get_spread_cost(symbol, volatility_regime='normal'):
    """
    Return spread in price units.

    Spreads widen during:
    - News events (2-5x)
    - Low liquidity periods (1.5-2x)
    - High volatility (1.5-3x)
    """
    base_spreads = {
        'EURUSD': 0.00010,  # 1.0 pip
        'GBPUSD': 0.00012,  # 1.2 pips
        'USDJPY': 0.012,    # 1.2 pips
        'USDCAD': 0.00015,  # 1.5 pips
        'BTCUSD': 5.0,      # $5
        'ETHUSD': 0.50,     # $0.50
        'NQ': 0.50,         # 0.5 points
        'GC': 0.30,         # $0.30
    }

    multipliers = {
        'normal': 1.0,
        'elevated': 1.5,
        'high': 2.5,
        'extreme': 5.0
    }

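    # Unknown symbols fall back to 0.0001 (1 pip on a non-JPY FX pair);
    # an unknown volatility_regime raises KeyError.
    return base_spreads.get(symbol, 0.0001) * multipliers[volatility_regime]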
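
One way a spread in price units might be applied to a fill is sketched below. It assumes backtest bars hold mid prices, so a buy pays half the spread above mid and a sell receives half the spread below it; `apply_spread` is a hypothetical helper and the 1.10000 mid price is an arbitrary example.

```python
def apply_spread(mid_price: float, spread: float, side: str) -> float:
    """Shift a mid price by half the spread (assumes bars are mid prices)."""
    half = spread / 2.0
    if side == "buy":
        return mid_price + half   # buys are filled at the ask
    if side == "sell":
        return mid_price - half   # sells are filled at the bid
    raise ValueError(f"side must be 'buy' or 'sell', got {side!r}")

# EURUSD at a 1.0 pip spread: entering and exiting at the same mid price
# costs the full spread over the round trip.
ask = apply_spread(1.10000, 0.00010, "buy")
bid = apply_spread(1.10000, 0.00010, "sell")
round_trip_cost = ask - bid
```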

Slippage Model

Condition	Slippage (FX)	Slippage (Crypto)	Slippage (Futures)
Normal	0.1-0.3 pips	0.01-0.03%	0.25-0.5 ticks
High volume	0.3-0.5 pips	0.03-0.1%	0.5-1 tick
Low liquidity	0.5-2.0 pips	0.1-0.3%	1-3 ticks
News/events	1.0-5.0 pips	0.2-1.0%	2-10 ticks

def calculate_slippage(symbol, order_type, volatility, position_size):
    """
    Calculate expected slippage.

    Factors:
    - Base slippage per instrument
    - Volatility multiplier (ATR-based)
    - Size impact (larger orders = more slippage)
    - Order type (market vs limit)
    """
    base_slippage = get_base_slippage(symbol)

    # Volatility adjustment
    vol_mult = max(1.0, volatility / get_avg_volatility(symbol))

    # Size impact: adds slippage only once the order exceeds 1% of average volume
    size_mult = 1.0 + max(0, (position_size / get_avg_volume(symbol)) - 0.01)

    # Limit orders get 50% less slippage on average
    order_mult = 0.5 if order_type == 'limit' else 1.0

    return base_slippage * vol_mult * size_mult * order_mult
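
To make the multipliers concrete, the sketch below repeats the same arithmetic with hard-coded reference values. The three lookup tables stand in for `get_base_slippage`, `get_avg_volatility`, and `get_avg_volume`, whose real implementations are not shown here; the numbers in them are invented for illustration.

```python
# Hypothetical per-symbol reference values, for illustration only.
BASE_SLIPPAGE = {"EURUSD": 0.00002}   # 0.2 pips, inside the "Normal" FX range
AVG_VOLATILITY = {"EURUSD": 0.0008}   # average ATR in price units
AVG_VOLUME = {"EURUSD": 1000.0}       # average bar volume in lots

def slippage_estimate(symbol, order_type, volatility, position_size):
    """Same formula as calculate_slippage, with table lookups as stand-ins."""
    vol_mult = max(1.0, volatility / AVG_VOLATILITY[symbol])
    size_mult = 1.0 + max(0.0, position_size / AVG_VOLUME[symbol] - 0.01)
    order_mult = 0.5 if order_type == "limit" else 1.0
    return BASE_SLIPPAGE[symbol] * vol_mult * size_mult * order_mult

# Average volatility, order at 0.5% of volume: no multiplier applies.
calm = slippage_estimate("EURUSD", "market", 0.0008, 5)
# Double volatility, order at 6% of volume: 2.0 * 1.05 times the base.
stressed = slippage_estimate("EURUSD", "market", 0.0016, 60)
# Same conditions with a limit order: half of the market-order figure.
stressed_limit = slippage_estimate("EURUSD", "limit", 0.0016, 60)
```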

Fill Assumptions

Order Types

Order Type	Fill Price	Certainty	Use Case
Market	Next bar open + slippage	100%	Exits, urgent entries
Limit	Limit price (if touched)	70-90%	Standard entries
Stop	Stop price + slippage	95%	Stop losses
Stop-Limit	Limit (if stop triggered)	60-80%	Controlled stops

Partial Fill Handling

# Conservative assumption: No partial fills modeled
# All orders either fully filled or not filled

fill_rate_assumptions = {
    'market': 1.00,     # Always fills
    'limit': 0.80,      # 80% fill rate for limits at touch
    'stop': 0.95,       # 95% for stops (gaps can skip)
    'stop_limit': 0.70  # 70% for stop-limits
}
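
A minimal sketch of how these fill rates could be applied as all-or-nothing draws while keeping runs reproducible. The seeded `random.Random` instance and the `simulate_fills` name are assumptions; the engine may draw fills differently.

```python
import random

FILL_RATES = {"market": 1.00, "limit": 0.80, "stop": 0.95, "stop_limit": 0.70}

def simulate_fills(order_types, seed=42):
    """Return one True/False fill decision per order, with no partial fills.

    A private, seeded generator keeps the sequence identical across runs.
    """
    rng = random.Random(seed)
    return [rng.random() < FILL_RATES[order_type] for order_type in order_types]

orders = ["market", "limit", "stop", "stop_limit"] * 250
fills = simulate_fills(orders, seed=7)
limit_fill_share = (
    sum(f for f, t in zip(fills, orders) if t == "limit") / orders.count("limit")
)
```

Because `rng.random()` is always below 1.0, market orders fill every time, and rerunning with the same seed reproduces the same fills.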

Gap Handling

Scenario	Treatment
Gap through stop	Fill at gap open (worse price)
Gap through limit	Fill at limit (better price)
Gap through entry	Fill at gap open
Weekend gap	Apply to Monday open
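
The stop row of the table can be sketched as follows. `stop_fill_price` is a hypothetical helper that covers stop orders only and leaves slippage out; it assumes the function is called on the bar where the stop triggers.

```python
def stop_fill_price(side: str, stop_price: float, bar_open: float) -> float:
    """Fill price for a stop order on the bar where it triggers.

    side: 'sell' for a stop protecting a long, 'buy' for a stop protecting a short.
    If the bar opens beyond the stop, the fill is the (worse) gap open.
    """
    if side == "sell":
        return min(stop_price, bar_open)
    if side == "buy":
        return max(stop_price, bar_open)
    raise ValueError(f"side must be 'buy' or 'sell', got {side!r}")

gapped_long = stop_fill_price("sell", stop_price=100.0, bar_open=98.0)    # 98.0
normal_long = stop_fill_price("sell", stop_price=100.0, bar_open=100.5)   # 100.0
gapped_short = stop_fill_price("buy", stop_price=100.0, bar_open=103.0)   # 103.0
```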

Latency Assumptions

Signal-to-Execution Timeline

Signal Generated (bar close)
    │
    ├── Pine Script computation: ~50ms
    │
    ├── TradingView alert dispatch: 100-500ms
    │
    ├── Webhook receive + process: 100-300ms
    │
    ├── Order submission to broker: 50-200ms
    │
    └── Broker acknowledgment: 50-500ms
    ──────────────────────────────────
    Total: 350ms - 1.5s typical
           Up to 5s in degraded conditions
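
The totals in the timeline are the sums of the per-stage bounds, which a few lines can confirm. The dictionary keys and the `latency_share_of_bar` helper are naming assumptions; the millisecond figures come from the timeline.

```python
# (best case, worst case) in milliseconds for each stage of the timeline.
LATENCY_MS = {
    "pine_script": (50, 50),
    "alert_dispatch": (100, 500),
    "webhook": (100, 300),
    "order_submission": (50, 200),
    "broker_ack": (50, 500),
}

best_ms = sum(low for low, _ in LATENCY_MS.values())
worst_ms = sum(high for _, high in LATENCY_MS.values())
print(best_ms, worst_ms)  # 350 1550

def latency_share_of_bar(latency_ms: float, bar_seconds: float) -> float:
    """Fraction of one bar consumed by the signal-to-execution delay."""
    return latency_ms / (bar_seconds * 1000.0)

# A degraded 5 s delay uses about 8% of a 1-minute bar.
degraded_share = latency_share_of_bar(5000, 60)
```

On short timeframes the degraded case is a meaningful share of the bar, which is one argument for the next-bar-open default.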

Backtest Latency Modeling

Mode	Entry Execution	Exit Execution
Conservative	Next bar open	Next bar open
Realistic	Same bar close + slippage	Same bar close + slippage
Optimistic	Signal price (no delay)	Signal price

Default: Conservative mode - Trade on next bar open after signal.

Anti-Bias Tests

Mandatory Pre-Deployment Checks

Test	Description	Pass Criteria
Lookahead	Shuffle future data, re-run	Results unchanged
Leakage	Remove target from features	No target in feature set
Survivorship	Include delisted instruments	Results within 10% of the run without delisted instruments
Time alignment	Shift data by 1 bar	Signals shift correctly
Randomized entry	Random entries with the same position sizing	Random runs perform worse than the strategy

Lookahead Detection

def test_no_lookahead(strategy, data, n=1000, m=500):
    """
    Test that strategy doesn't use future data.

    Method:
    1. Run strategy on data[0:N]
    2. Run strategy on data[0:N+M] where M > 0
    3. Signals for bars 0:N must be identical

    If signals change, lookahead is present.
    """
    signals_original = strategy.generate_signals(data[:n])
    signals_extended = strategy.generate_signals(data[:n + m])

    # Compare signals for the first N bars only
    assert signals_original.equals(signals_extended[:n]), \
        "FAIL: Lookahead detected - signals changed with future data"
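
To show what this check catches, the sketch below runs the same comparison on two toy strategies. Both classes and the `signals_stable` helper are invented for illustration; the only interface assumed from the text is a `generate_signals(data)` method returning a pandas Series.

```python
import pandas as pd

class CausalStrategy:
    """Compares each close with a trailing mean: past and current bars only."""
    def generate_signals(self, data):
        return data["close"] > data["close"].rolling(3).mean()

class LeakyStrategy:
    """Compares each close with the NEXT close: deliberate lookahead."""
    def generate_signals(self, data):
        return data["close"].shift(-1) > data["close"]

def signals_stable(strategy, data, n, extra=5):
    """True if the first n signals are unchanged when later bars are appended."""
    short = strategy.generate_signals(data.iloc[:n])
    extended = strategy.generate_signals(data.iloc[:n + extra])
    return short.equals(extended.iloc[:n])

data = pd.DataFrame({"close": [1.0, 2.0, 3.0, 2.0, 1.0, 2.0,
                               3.0, 4.0, 3.0, 2.0, 3.0, 4.0]})
causal_ok = signals_stable(CausalStrategy(), data, n=6)
leaky_ok = signals_stable(LeakyStrategy(), data, n=6)
```

The leaky strategy fails because its signal on the last bar of the short run flips once the following bar becomes visible.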

Overfitting Detection

import numpy as np

def test_overfitting(strategy, data, n_shuffles=100):
    """
    Compare strategy to randomized versions.

    Method:
    1. Run strategy, record Sharpe
    2. Shuffle signal timing randomly, run N times
    3. Strategy should beat 95% of random versions

    If not, edge may be spurious.
    """
    real_sharpe = strategy.backtest(data).sharpe_ratio

    random_sharpes = []
    for _ in range(n_shuffles):
        shuffled = shuffle_signals(strategy, data)
        random_sharpes.append(shuffled.backtest(data).sharpe_ratio)

    percentile = (np.array(random_sharpes) < real_sharpe).mean()
    assert percentile >= 0.95, \
        f"FAIL: Strategy only beats {percentile*100:.0f}% of random"
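
`shuffle_signals` is not defined in this section. One plausible reading, sketched below on plain NumPy arrays, is a permutation test: keep the number of signals fixed, randomize their timing with a seeded generator, and compare Sharpe ratios. The function names, the unannualized Sharpe, and the convention that `bar_returns[i]` is the return earned by acting on signal `i` are all assumptions.

```python
import numpy as np

def sharpe(returns: np.ndarray) -> float:
    """Unannualized Sharpe ratio; 0.0 when returns have no variance."""
    std = returns.std()
    return 0.0 if std == 0 else float(returns.mean() / std)

def permutation_percentile(signals, bar_returns, n_shuffles=100, seed=0):
    """Share of shuffled-timing runs whose Sharpe is below the real one."""
    rng = np.random.default_rng(seed)
    real = sharpe(signals * bar_returns)
    random_sharpes = np.array([
        sharpe(rng.permutation(signals) * bar_returns)
        for _ in range(n_shuffles)
    ])
    return float((random_sharpes < real).mean())

# Toy data: a signal with perfect foresight should beat its shuffled versions.
returns = np.random.default_rng(1).normal(0.0, 0.01, size=200)
foresight = (returns > 0).astype(float)
pct = permutation_percentile(foresight, returns)
```

Seeding the generator keeps the percentile identical between runs, in line with the reproducibility principle.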

Full Anti-Bias Suite

# Run all anti-bias tests
pytest tests/test_bias.py -v

# Individual tests
pytest tests/test_bias.py::test_no_lookahead
pytest tests/test_bias.py::test_no_leakage
pytest tests/test_bias.py::test_time_alignment
pytest tests/test_bias.py::test_overfitting
pytest tests/test_bias.py::test_survivorship

Walk-Forward & OOS Protocol

Data Split

Total Data: 2020-01-01 to 2024-01-01 (4 years)

├── In-Sample (IS): 2020-01-01 to 2022-12-31 (3 years, 75%)
│   └── Used for: Parameter optimization, feature selection
│
└── Out-of-Sample (OOS): 2023-01-01 to 2024-01-01 (1 year, 25%)
    └── Used for: Final validation only (touch once)
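
A minimal sketch of the split, assuming the data is a pandas DataFrame with a DatetimeIndex. `split_is_oos` is a hypothetical helper and the daily toy data is only there to show where the boundary falls.

```python
import pandas as pd

def split_is_oos(data: pd.DataFrame, oos_start: str = "2023-01-01"):
    """Split into in-sample (before oos_start) and out-of-sample (from oos_start)."""
    cutoff = pd.Timestamp(oos_start)
    return data[data.index < cutoff], data[data.index >= cutoff]

# Toy daily data spanning the full range in the diagram.
idx = pd.date_range("2020-01-01", "2024-01-01", freq="D")
data = pd.DataFrame({"close": list(range(len(idx)))}, index=idx)

is_data, oos_data = split_is_oos(data)
print(is_data.index.max().date(), oos_data.index.min().date())
# 2022-12-31 2023-01-01
```

The two parts share no bar, so nothing tuned on the in-sample portion has seen the validation period.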

Walk-Forward Validation

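import pandas as pd

def walk_forward_validation(strategy, data, n_windows=6,
                            train_months=12, test_months=2):
    """
    Rolling window validation.

    Parameters:
    - Training window: 12 months
    - Testing window: 2 months
    - Step: 2 months (non-overlapping test)
    - Total windows: 6 (needs 12 + 6 * 2 = 24 months of data)

    Assumes `data` is indexed by a sorted pandas DatetimeIndex.

    Returns:
    - Per-window performance
    - Aggregate statistics
    - Consistency score (% of profitable windows)
    """
    window_results = []
    first_bar = data.index[0]

    for i in range(n_windows):
        # Window boundaries as timestamps, stepped by the test window length
        train_start = first_bar + pd.DateOffset(months=i * test_months)
        train_end = train_start + pd.DateOffset(months=train_months)
        test_end = train_end + pd.DateOffset(months=test_months)

        # Half-open ranges so no bar lands in both train and test
        train_data = data[(data.index >= train_start) & (data.index < train_end)]
        test_data = data[(data.index >= train_end) & (data.index < test_end)]

        # Fit on train, evaluate on test
        strategy.fit(train_data)
        result = strategy.backtest(test_data)
        window_results.append(result)

    return aggregate_results(window_results)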
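
The window schedule implied by these parameters can be listed explicitly, as in the sketch below. Offsets are in months from the first bar; `walk_forward_windows` and `consistency_score` are hypothetical helpers, and the per-window PnL values are made up.

```python
def walk_forward_windows(n_windows=6, train_months=12, test_months=2):
    """Return (train_start, train_end, test_start, test_end) month offsets."""
    windows = []
    for i in range(n_windows):
        train_start = i * test_months            # step equals the test length
        train_end = train_start + train_months
        windows.append((train_start, train_end,
                        train_end, train_end + test_months))
    return windows

def consistency_score(window_pnls):
    """Fraction of windows that ended with a positive result."""
    return sum(1 for pnl in window_pnls if pnl > 0) / len(window_pnls)

windows = walk_forward_windows()
print(windows[0], windows[-1])  # (0, 12, 12, 14) (10, 22, 22, 24)

score = consistency_score([120.0, -40.0, 15.0, 60.0, -5.0, 30.0])
```

Six windows therefore need 24 months of data, and each test window starts exactly where its training window ends.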

OOS Degradation Limits

Metric	Max Degradation	Action if Exceeded
Sharpe Ratio	30%	Reject strategy
Win Rate	15 percentage points (absolute)	Investigate
Profit Factor	25%	Investigate
Max Drawdown	50% increase	Reject strategy
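
The limits above might be checked along these lines. The metric keys and function names are assumptions, drawdowns are taken as positive fractions, and in-sample values are assumed to be greater than zero.

```python
def degradation_pct(is_value: float, oos_value: float) -> float:
    """Relative drop from IS to OOS in percent (assumes is_value > 0)."""
    return (is_value - oos_value) / is_value * 100.0

def check_oos_limits(is_metrics: dict, oos_metrics: dict) -> dict:
    """Return {metric: action} for every limit in the table that is exceeded."""
    actions = {}
    if degradation_pct(is_metrics["sharpe"], oos_metrics["sharpe"]) > 30.0:
        actions["sharpe"] = "reject"
    # The win rate limit is absolute: 0.15 means 15 percentage points.
    if is_metrics["win_rate"] - oos_metrics["win_rate"] > 0.15:
        actions["win_rate"] = "investigate"
    if degradation_pct(is_metrics["profit_factor"],
                       oos_metrics["profit_factor"]) > 25.0:
        actions["profit_factor"] = "investigate"
    # Drawdown degrades by growing, so the sign is flipped.
    dd_increase = -degradation_pct(is_metrics["max_drawdown"],
                                   oos_metrics["max_drawdown"])
    if dd_increase > 50.0:
        actions["max_drawdown"] = "reject"
    return actions

in_sample = {"sharpe": 2.0, "win_rate": 0.55, "profit_factor": 1.8, "max_drawdown": 0.10}
out_sample = {"sharpe": 1.5, "win_rate": 0.50, "profit_factor": 1.2, "max_drawdown": 0.16}
actions = check_oos_limits(in_sample, out_sample)
```

In this example the Sharpe ratio drops 25% and passes, while the profit factor (down about 33%) and the drawdown (up 60%) both breach their limits.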

Validation Report Template

# Strategy Validation Report: [STRATEGY_NAME]

## Summary
- IS Sharpe: X.XX
- OOS Sharpe: X.XX
- Degradation: XX%
- Status: PASS / FAIL

## Walk-Forward Results
| Window | Period | Sharpe | Win% | PF | Max DD |
|--------|--------|--------|------|-----|--------|
| 1 | 2022-01 to 2022-02 | ... | ... | ... | ... |

## Anti-Bias Tests
- [ ] Lookahead: PASS
- [ ] Leakage: PASS
- [ ] Overfitting: PASS (beats 97% of random)

## Recommendation
[APPROVED FOR PAPER / NEEDS REVISION / REJECTED]

Backtest Output Requirements

Trade Log Schema

Field	Type	Description
trade_id	int	Unique identifier
entry_time	datetime	Entry bar timestamp
exit_time	datetime	Exit bar timestamp
direction	string	'long' or 'short'
entry_price	float	Actual entry price
exit_price	float	Actual exit price
position_size	float	Units traded
pnl_gross	float	Profit before costs
commission	float	Commission paid
slippage	float	Slippage cost
pnl_net	float	Net profit/loss
entry_reason	string	Signal that triggered entry
exit_reason	string	TP/SL/Signal/Timeout

Performance Summary

from dataclasses import dataclass

@dataclass
class BacktestResult:
    # Returns
    total_return: float
    cagr: float
    sharpe_ratio: float
    sortino_ratio: float
    calmar_ratio: float

    # Risk
    max_drawdown: float
    avg_drawdown: float
    max_drawdown_duration: int  # bars

    # Trades
    total_trades: int
    win_rate: float
    avg_win: float
    avg_loss: float
    profit_factor: float
    expectancy: float

    # Validation
    is_sharpe: float
    oos_sharpe: float
    degradation_pct: float
    walk_forward_consistency: float
    anti_bias_passed: bool
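
The trade-level fields could be derived from the net PnL of each trade roughly as follows. The definitions used here (expectancy as mean net PnL per trade, profit factor as gross wins over gross losses, `avg_loss` reported as a negative number) are common conventions assumed for this sketch, not definitions given in this document.

```python
def trade_stats(pnls):
    """Compute trade statistics from a list of per-trade net PnL values."""
    wins = [p for p in pnls if p > 0]
    losses = [p for p in pnls if p < 0]
    total = len(pnls)
    gross_loss = -sum(losses)
    return {
        "total_trades": total,
        "win_rate": len(wins) / total if total else 0.0,
        "avg_win": sum(wins) / len(wins) if wins else 0.0,
        "avg_loss": sum(losses) / len(losses) if losses else 0.0,
        # With no losing trades the ratio is unbounded.
        "profit_factor": sum(wins) / gross_loss if gross_loss else float("inf"),
        "expectancy": sum(pnls) / total if total else 0.0,
    }

stats = trade_stats([100.0, -50.0, 200.0, -50.0])
print(stats["win_rate"], stats["profit_factor"], stats["expectancy"])
# 0.5 3.0 50.0
```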