Backtest Specification
Backtest Engine Requirements
Core Principles
- No lookahead: Signals computed only from data available at decision time
- No repainting: Once a signal is generated, it is immutable
- Realistic execution: All costs, slippage, and latency modeled
- Reproducible: Same inputs → same outputs (seeded randomness)
- Auditable: Full trade log with entry rationale
Engine Configuration
# src/evaluation/backtester.py
from dataclasses import dataclass


@dataclass
class BacktestConfig:
    # Timing
    signal_delay_bars: int = 1       # Signal on bar N, execute on bar N+1
    execution_price: str = 'open'    # 'open', 'close', 'vwap', 'worst_case'
    # Position
    max_positions: int = 1           # Concurrent positions
    position_sizing: str = 'fixed'   # 'fixed', 'risk_pct', 'kelly'
    # Costs (applied per trade)
    apply_commission: bool = True
    apply_spread: bool = True
    apply_slippage: bool = True
    # Validation
    run_anti_bias_checks: bool = True
    require_oos_validation: bool = True
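The timing settings can be illustrated with a minimal sketch. It assumes signals and bar opens are plain lists aligned by bar index; `execution_bars` is a hypothetical helper for illustration, not part of the engine.

```python
def execution_bars(signals, opens, signal_delay_bars=1):
    """Pair each non-zero signal on bar N with the open of bar N + delay.

    Hypothetical helper: returns (signal_bar, execution_bar, fill_price) tuples.
    """
    fills = []
    for n, signal in enumerate(signals):
        if signal == 0:
            continue
        exec_bar = n + signal_delay_bars
        if exec_bar >= len(opens):
            continue  # no later bar exists yet, so the signal cannot be executed
        fills.append((n, exec_bar, opens[exec_bar]))
    return fills


signals = [0, 1, 0, -1, 1]
opens = [100.0, 101.0, 102.5, 101.5, 103.0]
print(execution_bars(signals, opens))  # [(1, 2, 102.5), (3, 4, 103.0)]
```

The signal on the final bar produces no fill because the bar it would execute on has not formed yet.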
Cost Models
Commission Structure
| Market | Type | Rate | Example (1 lot) |
| FX (MT5) | Per lot | $3.50/side | $7.00 round trip |
| FX (Prop firm) | Included in spread | $0 | $0 |
| Crypto (Binance) | Percentage | 0.04% maker, 0.06% taker | 0.1% RT |
| Crypto (Bybit) | Percentage | 0.02% maker, 0.055% taker | 0.075% RT |
| NQ/MNQ Futures | Per contract | $0.50-2.50/side | $1-5 RT |
| Gold Futures | Per contract | $1.25/side | $2.50 RT |
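As a rough sketch of how the table translates into per-trade costs, the snippet below computes round-trip commission. The rates are copied from the table; the dictionary keys and function names are hypothetical, and the NQ/MNQ row is left out because the table gives a range rather than a single rate.

```python
# Per-side dollar rates and percentage rates restated from the commission table.
# Keys and layout are illustrative assumptions, not the engine's own schema.
PER_UNIT_SIDE_USD = {"fx_mt5": 3.50, "fx_prop": 0.0, "gold_futures": 1.25}
PCT_RATES = {
    "binance": {"maker": 0.0004, "taker": 0.0006},
    "bybit": {"maker": 0.0002, "taker": 0.00055},
}


def per_unit_round_trip(market, quantity=1.0):
    """Round-trip commission in dollars for lot- or contract-based pricing."""
    return 2 * PER_UNIT_SIDE_USD[market] * quantity


def pct_round_trip(venue, notional, entry_liquidity="maker", exit_liquidity="taker"):
    """Round-trip commission in dollars for percentage-based pricing."""
    rates = PCT_RATES[venue]
    return notional * (rates[entry_liquidity] + rates[exit_liquidity])


print(per_unit_round_trip("fx_mt5"))                # 7.0
print(round(pct_round_trip("bybit", 10_000.0), 2))  # 7.5
```

The round-trip figures in the table correspond to a maker entry plus a taker exit; two taker fills would cost more.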
Spread Model
def get_spread_cost(symbol, volatility_regime='normal'):
    """
    Return spread in price units.

    Spreads widen during:
    - News events (2-5x)
    - Low liquidity periods (1.5-2x)
    - High volatility (1.5-3x)
    """
    base_spreads = {
        'EURUSD': 0.00010,  # 1.0 pip
        'GBPUSD': 0.00012,  # 1.2 pips
        'USDJPY': 0.012,    # 1.2 pips
        'USDCAD': 0.00015,  # 1.5 pips
        'BTCUSD': 5.0,      # $5
        'ETHUSD': 0.50,     # $0.50
        'NQ': 0.50,         # 0.5 points
        'GC': 0.30,         # $0.30
    }
    multipliers = {
        'normal': 1.0,
        'elevated': 1.5,
        'high': 2.5,
        'extreme': 5.0,
    }
    # Unknown symbols fall back to 0.0001 price units; an unknown regime raises KeyError
    return base_spreads.get(symbol, 0.0001) * multipliers[volatility_regime]
Slippage Model
| Condition | Slippage (FX) | Slippage (Crypto) | Slippage (Futures) |
| Normal | 0.1-0.3 pips | 0.01-0.03% | 0.25-0.5 ticks |
| High volume | 0.3-0.5 pips | 0.03-0.1% | 0.5-1 tick |
| Low liquidity | 0.5-2.0 pips | 0.1-0.3% | 1-3 ticks |
| News/events | 1.0-5.0 pips | 0.2-1.0% | 2-10 ticks |
def calculate_slippage(symbol, order_type, volatility, position_size):
    """
    Calculate expected slippage.

    Factors:
    - Base slippage per instrument
    - Volatility multiplier (ATR-based)
    - Size impact (larger orders = more slippage)
    - Order type (market vs limit)
    """
    base_slippage = get_base_slippage(symbol)
    # Volatility adjustment: never scales slippage below the base level
    vol_mult = max(1.0, volatility / get_avg_volatility(symbol))
    # Size impact: only the share of average volume above 1% adds slippage
    size_mult = 1.0 + max(0, (position_size / get_avg_volume(symbol)) - 0.01)
    # Limit orders get 50% less slippage on average
    order_mult = 0.5 if order_type == 'limit' else 1.0
    return base_slippage * vol_mult * size_mult * order_mult
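A worked example may make the multipliers easier to check by hand. The sketch below restates the same formula with every input passed explicitly, so it runs without the `get_*` helpers; `slippage_from_inputs` and the sample numbers are assumptions for illustration only.

```python
def slippage_from_inputs(base_slippage, volatility, avg_volatility,
                         position_size, avg_volume, order_type):
    """Same formula as calculate_slippage, with lookups replaced by arguments."""
    vol_mult = max(1.0, volatility / avg_volatility)
    size_mult = 1.0 + max(0.0, position_size / avg_volume - 0.01)
    order_mult = 0.5 if order_type == "limit" else 1.0
    return base_slippage * vol_mult * size_mult * order_mult


# Assumed inputs: 0.2 pips base, volatility at 1.5x its average,
# and an order equal to 3% of average volume.
market = slippage_from_inputs(0.2, 0.0015, 0.0010, 30.0, 1000.0, "market")
limit = slippage_from_inputs(0.2, 0.0015, 0.0010, 30.0, 1000.0, "limit")
print(round(market, 3), round(limit, 3))  # 0.306 0.153
```

By hand: 0.2 × 1.5 × (1 + 0.03 − 0.01) = 0.306 pips for the market order, and half of that for the limit order.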
Fill Assumptions
Order Types
| Order Type | Fill Price | Certainty | Use Case |
| Market | Next bar open + slippage | 100% | Exits, urgent entries |
| Limit | Limit price (if touched) | 70-90% | Standard entries |
| Stop | Stop price + slippage | 95% | Stop losses |
| Stop-Limit | Limit (if stop triggered) | 60-80% | Controlled stops |
Partial Fill Handling
# Conservative assumption: No partial fills modeled
# All orders either fully filled or not filled
fill_rate_assumptions = {
    'market': 1.00,      # Always fills
    'limit': 0.80,       # 80% fill rate for limits at touch
    'stop': 0.95,        # 95% for stops (gaps can skip)
    'stop_limit': 0.70,  # 70% for stop-limits
}
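One way these rates could be applied, sketched under the assumption that each order whose price was touched gets a single all-or-nothing random draw. The names `is_filled` and `simulate_fills` are hypothetical; the seeded generator keeps the run reproducible, in line with the core principles.

```python
import random

# Fill rates restated from the assumptions above.
FILL_RATES = {"market": 1.00, "limit": 0.80, "stop": 0.95, "stop_limit": 0.70}


def is_filled(order_type, rng):
    """All-or-nothing fill: one Bernoulli draw against the assumed fill rate."""
    return rng.random() < FILL_RATES[order_type]


def simulate_fills(order_type, n_orders, seed=42):
    """Return a list of booleans, one per order, using a seeded generator."""
    rng = random.Random(seed)  # same seed, same sequence of fills
    return [is_filled(order_type, rng) for _ in range(n_orders)]


fills = simulate_fills("limit", 10_000)
print(sum(fills) / len(fills))  # close to 0.80
```

Because `random.Random.random()` returns values in [0, 1), a rate of 1.00 always fills.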
Gap Handling
| Scenario | Treatment |
| Gap through stop | Fill at gap open (worse price) |
| Gap through limit | Fill at limit price (no price improvement assumed) |
| Gap through entry | Fill at gap open |
| Weekend gap | Apply to Monday open |
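The stop and limit rows of the table can be sketched as two small functions. This is a minimal illustration, assuming the stop has already been triggered on the bar in question and that slippage is added separately; the function names are hypothetical.

```python
def stop_fill_price(direction, stop_price, bar_open):
    """Exit price for a triggered stop, before slippage.

    direction is the position being closed: 'long' or 'short'.
    """
    if direction == "long":
        # A gap down opens below the stop, so the fill is the worse open
        return min(stop_price, bar_open)
    if direction == "short":
        # A gap up opens above the stop, so the fill is the worse open
        return max(stop_price, bar_open)
    raise ValueError(f"unknown direction: {direction!r}")


def limit_entry_fill(side, limit_price, bar_low, bar_high):
    """Fill at the limit price if the bar touched it, else None (no fill)."""
    touched = bar_low <= limit_price if side == "buy" else bar_high >= limit_price
    return limit_price if touched else None


print(stop_fill_price("long", 100.0, 98.5))       # 98.5 (gap through stop)
print(limit_entry_fill("buy", 100.0, 97.0, 99.0))  # 100.0 (gap through limit)
```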
Latency Assumptions
Signal-to-Execution Timeline
Signal Generated (bar close)
│
├── Pine Script computation: ~50ms
│
├── TradingView alert dispatch: 100-500ms
│
├── Webhook receive + process: 100-300ms
│
├── Order submission to broker: 50-200ms
│
└── Broker acknowledgment: 50-500ms
──────────────────────────────────
Total: 350ms - 1.5s typical
Up to 5s in degraded conditions
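As a quick arithmetic check on the totals, the per-stage ranges can be summed. The dictionary below only restates the timeline; treating the ~50ms Pine Script step as a fixed 50ms is an assumption, and the stage names are illustrative.

```python
# Per-stage latency ranges in milliseconds (min, max), restated from the timeline.
LATENCY_MS = {
    "pine_script": (50, 50),        # "~50ms" treated as fixed (assumption)
    "alert_dispatch": (100, 500),
    "webhook": (100, 300),
    "order_submission": (50, 200),
    "broker_ack": (50, 500),
}


def latency_budget(stages=LATENCY_MS):
    """Return (best_case_ms, worst_case_ms) by summing the stage ranges."""
    best = sum(low for low, _ in stages.values())
    worst = sum(high for _, high in stages.values())
    return best, worst


print(latency_budget())  # (350, 1550)
```

The stage maxima sum to 1550ms, which the timeline rounds to 1.5s.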
Backtest Latency Modeling
| Mode | Entry Execution | Exit Execution |
| Conservative | Next bar open | Next bar open |
| Realistic | Same bar close + slippage | Same bar close + slippage |
| Optimistic | Signal price (no delay) | Signal price |
Default: Conservative mode - Trade on next bar open after signal.
Anti-Bias Tests
Mandatory Pre-Deployment Checks
| Test | Description | Pass Criteria |
| Lookahead | Shuffle future data, re-run | Results unchanged |
| Leakage | Remove target from features | No target in feature set |
| Survivorship | Include delisted instruments | Results within 10% |
| Time alignment | Shift data by 1 bar | Signals shift correctly |
| Randomized entry | Random entries same sizing | Worse than strategy |
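Two of the checks in the table, lookahead via shuffled future data and time alignment, can be sketched on toy data. Everything here is an illustrative assumption: the two toy strategies, the helper names, and the use of plain lists of closes in place of the engine's data structures.

```python
import random


def causal_signals(closes):
    """Toy strategy: +1 when the close is above the previous close."""
    return [0] + [1 if closes[i] > closes[i - 1] else 0 for i in range(1, len(closes))]


def leaky_signals(closes):
    """Deliberately broken: +1 when the NEXT close is higher (peeks ahead)."""
    return [1 if closes[i + 1] > closes[i] else 0 for i in range(len(closes) - 1)] + [0]


def passes_future_shuffle(signal_fn, closes, n, n_trials=20, seed=0):
    """Shuffle bars n onward several times; signals for bars 0..n-1 must not change."""
    rng = random.Random(seed)
    baseline = signal_fn(closes)[:n]
    for _ in range(n_trials):
        future = closes[n:]
        rng.shuffle(future)
        if signal_fn(closes[:n] + future)[:n] != baseline:
            return False
    return True


def passes_time_alignment(signal_fn, closes, warmup=1):
    """Drop the first bar; signals after the warmup must move one bar earlier."""
    return signal_fn(closes)[1 + warmup:] == signal_fn(closes[1:])[warmup:]


closes = [9.0, 10.0, 12.0] + [8.0] * 9
print(passes_future_shuffle(causal_signals, closes, n=2))  # True
print(passes_future_shuffle(leaky_signals, closes, n=2))   # False
print(passes_time_alignment(causal_signals, closes))       # True
```

The leaky strategy fails because its signal on bar 1 depends on bar 2, which changes once the future bars are shuffled.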
Lookahead Detection
def test_no_lookahead(strategy, data, n=1000, m=500):
    """
    Test that the strategy doesn't use future data.

    Method:
    1. Run strategy on data[0:N]
    2. Run strategy on data[0:N+M] where M > 0
    3. Signals for bars 0:N must be identical

    If signals change, lookahead is present.
    """
    # Assumes data and the returned signals are pandas objects (positional slicing)
    signals_original = strategy.generate_signals(data.iloc[:n])
    signals_extended = strategy.generate_signals(data.iloc[:n + m])
    # Compare signals for the first n bars
    assert signals_original.equals(signals_extended.iloc[:n]), \
        "FAIL: Lookahead detected - signals changed with future data"
Overfitting Detection
import numpy as np


def test_overfitting(strategy, data, n_shuffles=100):
    """
    Compare strategy to randomized versions.

    Method:
    1. Run strategy, record Sharpe
    2. Shuffle signal timing randomly, run N times
    3. Strategy should beat 95% of random versions

    If not, edge may be spurious.
    """
    real_sharpe = strategy.backtest(data).sharpe_ratio
    random_sharpes = []
    for _ in range(n_shuffles):
        # shuffle_signals must return a strategy-like object with permuted signal timing
        shuffled = shuffle_signals(strategy, data)
        random_sharpes.append(shuffled.backtest(data).sharpe_ratio)
    # Fraction of random versions the real strategy beats
    percentile = (np.array(random_sharpes) < real_sharpe).mean()
    assert percentile >= 0.95, \
        f"FAIL: Strategy only beats {percentile*100:.0f}% of random"
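The shuffling step is not spelled out here, so the following is only one possible reading: permute the signal sequence with a seeded generator and compare scores. To stay dependency-free, this sketch scores by summed P&L rather than Sharpe, which is a deliberate simplification; `strategy_pnl` and `permutation_percentile` are hypothetical names.

```python
import random


def strategy_pnl(signals, returns):
    """Total P&L of holding `signals[i]` units over `returns[i]`."""
    return sum(s * r for s, r in zip(signals, returns))


def permutation_percentile(signals, returns, n_shuffles=100, seed=0):
    """Fraction of randomly re-timed signal sequences the real one beats."""
    rng = random.Random(seed)  # seeded so the test is reproducible
    real = strategy_pnl(signals, returns)
    beaten = 0
    for _ in range(n_shuffles):
        shuffled = signals[:]   # same number of trades, random timing
        rng.shuffle(shuffled)
        if strategy_pnl(shuffled, returns) < real:
            beaten += 1
    return beaten / n_shuffles


# Toy data: alternating up and down bars, with a signal on every up bar.
returns = [((-1) ** i) * (i + 1) / 1000 for i in range(20)]
signals = [1 if r > 0 else 0 for r in returns]
print(permutation_percentile(signals, returns) >= 0.95)  # True
```

Shuffling keeps the number of trades fixed and randomizes only their timing, so the comparison isolates timing skill from market exposure.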
Full Anti-Bias Suite
# Run all anti-bias tests
pytest tests/test_bias.py -v
# Individual tests
pytest tests/test_bias.py::test_no_lookahead
pytest tests/test_bias.py::test_no_leakage
pytest tests/test_bias.py::test_time_alignment
pytest tests/test_bias.py::test_overfitting
pytest tests/test_bias.py::test_survivorship
Walk-Forward & OOS Protocol
Data Split
Total Data: 2020-01-01 to 2024-01-01 (4 years)
├── In-Sample (IS): 2020-01-01 to 2022-12-31 (3 years, 75%)
│ └── Used for: Parameter optimization, feature selection
│
└── Out-of-Sample (OOS): 2023-01-01 to 2024-01-01 (1 year, 25%)
└── Used for: Final validation only (touch once)
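A minimal sketch of the split, assuming the data is a list of `(date, value)` rows; the constant names and `split_is_oos` are hypothetical. Both boundaries are inclusive, matching the dates in the diagram.

```python
from datetime import date

# Boundaries restated from the data split above (inclusive on both ends).
IS_START, IS_END = date(2020, 1, 1), date(2022, 12, 31)
OOS_START, OOS_END = date(2023, 1, 1), date(2024, 1, 1)


def split_is_oos(rows):
    """Split (date, value) rows into in-sample and out-of-sample lists."""
    in_sample = [r for r in rows if IS_START <= r[0] <= IS_END]
    out_of_sample = [r for r in rows if OOS_START <= r[0] <= OOS_END]
    return in_sample, out_of_sample


rows = [
    (date(2019, 12, 31), 0), (date(2020, 1, 1), 1), (date(2022, 12, 31), 2),
    (date(2023, 1, 1), 3), (date(2024, 1, 1), 4), (date(2024, 1, 2), 5),
]
in_sample, out_of_sample = split_is_oos(rows)
print([v for _, v in in_sample], [v for _, v in out_of_sample])  # [1, 2] [3, 4]
```

Keeping the OOS rows in a separate object makes it harder to touch them by accident during optimization.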
Walk-Forward Validation
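import pandas as pd


def walk_forward_validation(strategy, data, n_windows=6,
                            train_months=12, test_months=2):
    """
    Rolling window validation.

    Parameters:
    - Training window: 12 months
    - Testing window: 2 months
    - Step: 2 months (non-overlapping test)
    - Total windows: 6

    Returns:
    - Per-window performance
    - Aggregate statistics
    - Consistency score (% of profitable windows)
    """
    window_results = []
    start = data.index[0]  # assumes data has a sorted DatetimeIndex
    for i in range(n_windows):
        # Window boundaries are calendar offsets, not row positions
        train_start = start + pd.DateOffset(months=i * test_months)
        train_end = train_start + pd.DateOffset(months=train_months)
        test_end = train_end + pd.DateOffset(months=test_months)
        # Half-open ranges [start, end) so no bar is in both train and test
        train_data = data[(data.index >= train_start) & (data.index < train_end)]
        test_data = data[(data.index >= train_end) & (data.index < test_end)]
        # Fit on train, evaluate on test
        strategy.fit(train_data)
        result = strategy.backtest(test_data)
        window_results.append(result)
    return aggregate_results(window_results)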
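To see which calendar periods the six windows cover, the boundaries can be generated on their own. This sketch assumes the data starts on the first of a month; `add_months`, `walk_forward_windows`, and `consistency_score` are hypothetical helpers, and the score is expressed as a fraction rather than a percentage.

```python
from datetime import date


def add_months(d, months):
    """Shift a first-of-month date by a whole number of months."""
    total = d.year * 12 + (d.month - 1) + months
    return date(total // 12, total % 12 + 1, 1)


def walk_forward_windows(start, n_windows=6, train_months=12, test_months=2):
    """Return (train_start, train_end, test_end) per window; ends are exclusive.

    The test window runs from train_end up to test_end.
    """
    windows = []
    for i in range(n_windows):
        train_start = add_months(start, i * test_months)
        train_end = add_months(train_start, train_months)
        test_end = add_months(train_end, test_months)
        windows.append((train_start, train_end, test_end))
    return windows


def consistency_score(window_pnls):
    """Share of windows with positive P&L, as a fraction in [0, 1]."""
    return sum(1 for p in window_pnls if p > 0) / len(window_pnls)


for train_start, train_end, test_end in walk_forward_windows(date(2020, 1, 1)):
    print(train_start, train_end, test_end)
```

With a 2020-01-01 start, the first test window covers 2021-01 and 2021-02 and the last one ends at 2022-01-01, so six windows consume 24 months of data.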
OOS Degradation Limits
| Metric | Max Degradation | Action if Exceeded |
| Sharpe Ratio | 30% | Reject strategy |
| Win Rate | 15% absolute | Investigate |
| Profit Factor | 25% | Investigate |
| Max Drawdown | 50% increase | Reject strategy |
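The limits could be applied mechanically along the following lines. This is a sketch under stated assumptions: metrics arrive as dictionaries with the hypothetical keys shown, IS Sharpe and profit factor are positive, win rate is a fraction, and max drawdown is a positive fraction.

```python
def oos_verdict(is_metrics, oos_metrics):
    """Return 'REJECT', 'INVESTIGATE', or 'PASS' per the degradation limits."""
    sharpe_drop = (is_metrics["sharpe"] - oos_metrics["sharpe"]) / is_metrics["sharpe"]
    dd_increase = (oos_metrics["max_dd"] - is_metrics["max_dd"]) / is_metrics["max_dd"]
    win_drop = is_metrics["win_rate"] - oos_metrics["win_rate"]  # absolute points
    pf_drop = (is_metrics["pf"] - oos_metrics["pf"]) / is_metrics["pf"]

    # Reject conditions take priority over investigate conditions
    if sharpe_drop > 0.30 or dd_increase > 0.50:
        return "REJECT"
    if win_drop > 0.15 or pf_drop > 0.25:
        return "INVESTIGATE"
    return "PASS"


is_m = {"sharpe": 2.0, "max_dd": 0.10, "win_rate": 0.55, "pf": 1.8}
oos_m = {"sharpe": 1.6, "max_dd": 0.12, "win_rate": 0.50, "pf": 1.5}
print(oos_verdict(is_m, oos_m))  # PASS
```

In this example Sharpe falls 20%, drawdown rises 20%, win rate drops 5 points, and profit factor falls about 17%, all inside the limits.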
Validation Report Template
# Strategy Validation Report: [STRATEGY_NAME]
## Summary
- IS Sharpe: X.XX
- OOS Sharpe: X.XX
- Degradation: XX%
- Status: PASS / FAIL
## Walk-Forward Results
| Window | Period | Sharpe | Win% | PF | Max DD |
|--------|--------|--------|------|-----|--------|
| 1 | 2022-01 to 2022-02 | ... | ... | ... | ... |
## Anti-Bias Tests
- [ ] Lookahead: PASS
- [ ] Leakage: PASS
- [ ] Overfitting: PASS (beats 97% of random)
## Recommendation
[APPROVED FOR PAPER / NEEDS REVISION / REJECTED]
Backtest Output Requirements
Trade Log Schema
| Field | Type | Description |
| trade_id | int | Unique identifier |
| entry_time | datetime | Entry bar timestamp |
| exit_time | datetime | Exit bar timestamp |
| direction | string | 'long' or 'short' |
| entry_price | float | Actual entry price |
| exit_price | float | Actual exit price |
| position_size | float | Units traded |
| pnl_gross | float | Profit before costs |
| commission | float | Commission paid |
| slippage | float | Slippage cost |
| pnl_net | float | Net profit/loss |
| entry_reason | string | Signal that triggered entry |
| exit_reason | string | TP/SL/Signal/Timeout |
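The schema does not state how `pnl_gross` and `pnl_net` relate, so the sketch below assumes one plausible convention: gross P&L comes from the price difference, and net P&L subtracts the `commission` and `slippage` fields. If `entry_price` and `exit_price` already include slippage, subtracting it again would double count, so treat this as an assumption to confirm. `trade_pnl` is a hypothetical helper.

```python
def trade_pnl(direction, entry_price, exit_price, position_size,
              commission, slippage):
    """Return (pnl_gross, pnl_net) under the assumed cost convention."""
    if direction == "long":
        sign = 1.0
    elif direction == "short":
        sign = -1.0
    else:
        raise ValueError(f"direction must be 'long' or 'short', got {direction!r}")
    pnl_gross = sign * (exit_price - entry_price) * position_size
    # Assumption: both cost fields are positive amounts in account currency
    pnl_net = pnl_gross - commission - slippage
    return pnl_gross, pnl_net


print(trade_pnl("long", 100.0, 105.0, 10.0, 7.0, 3.0))   # (50.0, 40.0)
print(trade_pnl("short", 100.0, 105.0, 10.0, 7.0, 3.0))  # (-50.0, -60.0)
```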
from dataclasses import dataclass


@dataclass
class BacktestResult:
    # Returns
    total_return: float
    cagr: float
    sharpe_ratio: float
    sortino_ratio: float
    calmar_ratio: float
    # Risk
    max_drawdown: float
    avg_drawdown: float
    max_drawdown_duration: int  # bars
    # Trades
    total_trades: int
    win_rate: float
    avg_win: float
    avg_loss: float
    profit_factor: float
    expectancy: float
    # Validation
    is_sharpe: float
    oos_sharpe: float
    degradation_pct: float
    walk_forward_consistency: float
    anti_bias_passed: bool
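The trade-level fields can be derived from the `pnl_net` column of the trade log. The sketch below uses common definitions (profit factor as gross profit over gross loss, expectancy as mean net P&L per trade); the spec does not define them, so these are assumptions, and `trade_stats` is a hypothetical helper.

```python
def trade_stats(pnls):
    """Compute trade statistics from a non-empty list of net P&L values."""
    wins = [p for p in pnls if p > 0]
    losses = [p for p in pnls if p < 0]
    gross_profit = sum(wins)
    gross_loss = -sum(losses)  # reported as a positive number
    return {
        "total_trades": len(pnls),
        "win_rate": len(wins) / len(pnls),
        "avg_win": gross_profit / len(wins) if wins else 0.0,
        "avg_loss": -gross_loss / len(losses) if losses else 0.0,  # negative
        # With no losing trades the ratio is undefined; infinity is one convention
        "profit_factor": gross_profit / gross_loss if gross_loss else float("inf"),
        "expectancy": sum(pnls) / len(pnls),
    }


stats = trade_stats([100.0, -50.0, 200.0, -50.0])
print(stats["win_rate"], stats["profit_factor"], stats["expectancy"])  # 0.5 3.0 50.0
```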