Data Specification
Data Sources
| Market | Primary Source | Backup Source | Frequency |
| FX | TradingView | MT5 broker feed | 1m, 5m, 15m, 1H, 4H, D |
| Crypto | Binance/Bybit API | TradingView | 1m, 5m, 15m, 1H, 4H, D |
| Gold/Silver Futures | TradingView | Broker feed | 5m, 15m, 1H, 4H, D |
| NASDAQ Futures | TradingView | Broker feed | 1m, 5m, 15m, 1H |
Data Fields
OHLCV Standard Schema
| Field | Type | Description | Required |
| timestamp | datetime64[ns, UTC] | Bar open time | Yes |
| open | float64 | Opening price | Yes |
| high | float64 | Highest price | Yes |
| low | float64 | Lowest price | Yes |
| close | float64 | Closing price | Yes |
| volume | float64 | Volume (contracts/units) | Yes |
| symbol | string | Instrument identifier | Yes |
| timeframe | string | Bar period (1m, 5m, etc.) | Yes |
Extended Fields (When Available)
| Field | Type | Description | Source |
| bid | float64 | Best bid at close | MT5 |
| ask | float64 | Best ask at close | MT5 |
| spread | float64 | Ask - Bid | Calculated |
| tick_volume | int64 | Number of ticks | MT5 |
| funding_rate | float64 | Perpetual funding | Crypto exchanges |
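The two schema tables above can be checked mechanically before a bar enters the pipeline. The following is a minimal sketch using plain Python dictionaries rather than a DataFrame; the names `REQUIRED_FIELDS`, `schema_errors`, and `with_spread` are hypothetical and not part of the project code.

```python
from datetime import datetime, timedelta, timezone

# Required OHLCV fields and the Python type this sketch accepts for each.
REQUIRED_FIELDS = {
    "timestamp": datetime,
    "open": float,
    "high": float,
    "low": float,
    "close": float,
    "volume": float,
    "symbol": str,
    "timeframe": str,
}


def schema_errors(bar: dict) -> list:
    """Return a list of schema problems for one bar (empty list means valid)."""
    errors = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in bar:
            errors.append(f"missing field: {name}")
        elif not isinstance(bar[name], expected):
            errors.append(f"{name}: expected {expected.__name__}")
    ts = bar.get("timestamp")
    # The schema stores bar open time in UTC, so naive or offset times are rejected.
    if isinstance(ts, datetime) and ts.utcoffset() != timedelta(0):
        errors.append("timestamp: must be timezone-aware UTC")
    return errors


def with_spread(bar: dict) -> dict:
    """Add the calculated spread (ask - bid) when both quotes are present."""
    if "bid" in bar and "ask" in bar:
        return {**bar, "spread": bar["ask"] - bar["bid"]}
    return bar
```

Returning a list of problems instead of raising lets a caller log every issue for a bar in one pass.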
FX Sessions
Session Definitions (UTC)
| Session | UTC Start | UTC End | Summer (DST) Hours (UTC) |
| Sydney | 21:00 | 06:00 | No adjustment |
| Tokyo | 00:00 | 09:00 | No adjustment |
| London | 07:00 | 16:00 | 06:00-15:00 (BST) |
| New York | 12:00 | 21:00 | 11:00-20:00 (EDT) |
Kill Zones (High-Probability Windows)
| Kill Zone | UTC Time | Local (Dublin, GMT; +1h during IST) | Notes |
| Asia | 00:00-04:00 | 00:00-04:00 | Lower volatility |
| London Open | 07:00-10:00 | 07:00-10:00 | Best reversals |
| NY AM | 12:30-15:00 | 12:30-15:00 | Primary window |
| NY Lunch | 16:00-17:00 | 16:00-17:00 | Avoid |
| NY PM | 17:30-20:00 | 17:30-20:00 | Continuation moves |
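As an illustration, the kill-zone table can be turned into a lookup. This is a sketch only: the name `get_kill_zone` is hypothetical, and treating each window as start-inclusive and end-exclusive is an assumption the table does not state.

```python
from datetime import datetime, time, timezone

# Kill-zone windows copied from the table above, as (start, end) UTC times.
KILL_ZONES_UTC = {
    "asia": (time(0, 0), time(4, 0)),
    "london_open": (time(7, 0), time(10, 0)),
    "ny_am": (time(12, 30), time(15, 0)),
    "ny_lunch": (time(16, 0), time(17, 0)),
    "ny_pm": (time(17, 30), time(20, 0)),
}


def get_kill_zone(timestamp_utc: datetime):
    """Return the kill-zone name for a UTC timestamp, or None outside all windows."""
    t = timestamp_utc.time()
    for name, (start, end) in KILL_ZONES_UTC.items():
        if start <= t < end:  # assumption: start inclusive, end exclusive
            return name
    return None
```

Because the windows do not overlap, at most one name can match a given timestamp.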
Session Overlap Handling
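def get_active_sessions(timestamp_utc):
    """Return list of active sessions for given UTC timestamp.

    Uses the standard-time UTC boundaries from the session table;
    the summer (DST) shifts for London and New York are not applied.
    """
    hour = timestamp_utc.hour
    sessions = []
    # Sydney wraps past midnight (21:00-06:00 UTC)
    if hour >= 21 or hour < 6:
        sessions.append('sydney')
    if 0 <= hour < 9:
        sessions.append('tokyo')
    if 7 <= hour < 16:
        sessions.append('london')
    if 12 <= hour < 21:
        sessions.append('new_york')
    return sessions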
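The session table also lists summer (DST) hours for London and New York. One possible way to honour them is sketched below, assuming the caller already knows which sessions are on summer time (for example from a timezone library). The names `get_active_sessions_dst` and `dst_sessions` are hypothetical.

```python
from datetime import datetime, timezone

# Standard-time session hours in UTC, from the session table: (start, end).
SESSION_HOURS_UTC = {
    "sydney": (21, 6),
    "tokyo": (0, 9),
    "london": (7, 16),
    "new_york": (12, 21),
}
# Hours by which a session moves earlier in UTC while on summer time.
DST_SHIFT_HOURS = {"london": 1, "new_york": 1}


def get_active_sessions_dst(timestamp_utc, dst_sessions=frozenset()):
    """Return active sessions, shifting those named in dst_sessions."""
    hour = timestamp_utc.hour
    active = []
    for name, (start, end) in SESSION_HOURS_UTC.items():
        if name in dst_sessions:
            shift = DST_SHIFT_HOURS.get(name, 0)
            start, end = (start - shift) % 24, (end - shift) % 24
        if start < end:
            is_open = start <= hour < end
        else:  # window wraps past midnight (e.g. Sydney)
            is_open = hour >= start or hour < end
        if is_open:
            active.append(name)
    return active
```

An empty list means no session is open; two or more names indicate an overlap, such as London and New York at 12:00 UTC in winter.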
Crypto: 24/7 Handling
Funding Rate Schedule
| Exchange | Funding Interval | Timestamps (UTC) |
| Binance | 8 hours | 00:00, 08:00, 16:00 |
| Bybit | 8 hours | 00:00, 08:00, 16:00 |
| OKX | 8 hours | 00:00, 08:00, 16:00 |
Funding Rate Integration
# data/catalog.yaml entry
btc_perpetual:
  symbol: BTCUSDT
  type: perpetual
  funding_rate_field: funding_rate
  funding_times_utc: ["00:00", "08:00", "16:00"]
  include_funding_in_backtest: true
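A backtest that includes funding needs to know which funding events fall inside a holding period and what each one costs. The sketch below assumes the 8-hour schedule above and the usual perpetual-swap convention that a positive rate means longs pay shorts. The function names are hypothetical, and real exchanges compute the notional from the mark price at the funding time, which is simplified here to a value passed in by the caller.

```python
from datetime import datetime, timedelta, timezone

FUNDING_INTERVAL = timedelta(hours=8)  # 00:00, 08:00, 16:00 UTC


def funding_times_between(start, end):
    """Return funding timestamps t with start < t <= end (both UTC-aware)."""
    # Begin at midnight of the start day and step through the 8-hour slots.
    t = start.replace(hour=0, minute=0, second=0, microsecond=0)
    times = []
    while t <= end:
        if t > start:
            times.append(t)
        t += FUNDING_INTERVAL
    return times


def funding_payment(position_notional, funding_rate):
    """Signed cash flow to the position holder for one funding event.

    position_notional is positive for longs and negative for shorts.
    With a positive rate longs pay, so a long's cash flow is negative.
    """
    return -position_notional * funding_rate
```

Summing `funding_payment` over `funding_times_between(entry, exit)` gives the total funding cost of a trade under these assumptions.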
Weekend/Holiday Handling
- Crypto trades 24/7/365 - no gaps expected
- Reduced liquidity: Saturday 00:00 - Sunday 12:00 UTC
- Flag low-liquidity periods in data catalog
- Widen slippage assumptions during low-liquidity
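The low-liquidity window above is simple to flag in code. The sketch below is illustrative only: the base slippage of 2 bps and the 2x multiplier are placeholder assumptions, not values from this specification, and the function names are hypothetical.

```python
from datetime import datetime, timezone


def is_low_liquidity(timestamp_utc):
    """True from Saturday 00:00 UTC up to (not including) Sunday 12:00 UTC."""
    weekday = timestamp_utc.weekday()  # Monday=0 ... Saturday=5, Sunday=6
    if weekday == 5:
        return True
    return weekday == 6 and timestamp_utc.hour < 12


def slippage_bps(timestamp_utc, base_bps=2.0, low_liquidity_multiplier=2.0):
    """Return the slippage assumption in basis points (placeholder defaults)."""
    if is_low_liquidity(timestamp_utc):
        return base_bps * low_liquidity_multiplier
    return base_bps
```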
Futures Rollovers
Contract Specifications
| Instrument | Symbol Root | Contract Months | Roll Timing |
| E-mini NASDAQ | NQ | H, M, U, Z | 8 days before expiry |
| Micro NASDAQ | MNQ | H, M, U, Z | 8 days before expiry |
| Gold | GC | G, J, M, Q, V, Z | 3 days before FND |
| Silver | SI | H, K, N, U, Z | 3 days before FND |
Month Codes
| Code | Month | Code | Month |
| F | January | N | July |
| G | February | Q | August |
| H | March | U | September |
| J | April | V | October |
| K | May | X | November |
| M | June | Z | December |
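The month codes combine with the symbol root to name a specific contract. The sketch below assumes symbols of the form root + month code + two-digit year (for example `NQH24`) and that two-digit years fall in the 2000s. Both are assumptions, since vendors differ in how they format contract symbols, and `parse_contract` is a hypothetical helper.

```python
import re

# Futures month codes from the table above.
MONTH_CODES = {
    "F": 1, "G": 2, "H": 3, "J": 4, "K": 5, "M": 6,
    "N": 7, "Q": 8, "U": 9, "V": 10, "X": 11, "Z": 12,
}

# Root letters, then one month code, then a two-digit year.
CONTRACT_RE = re.compile(r"^([A-Z]+)([FGHJKMNQUVXZ])(\d{2})$")


def parse_contract(symbol):
    """Split e.g. 'NQH24' into (root, month, year)."""
    match = CONTRACT_RE.match(symbol)
    if match is None:
        raise ValueError(f"unrecognised contract symbol: {symbol!r}")
    root, code, yy = match.groups()
    return root, MONTH_CODES[code], 2000 + int(yy)
```

The month code is always the last letter before the digits, so roots that themselves contain code letters (such as `MNQ`) still parse correctly.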
Rollover Protocol
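def build_continuous_contract(symbol_root, roll_days_before=8):
    """
    Build a continuous contract series using back-adjustment.

    Method: shift all prices before each roll by the roll gap
    (new contract price minus old contract price on the roll date).
    - Preserves absolute price differences (point moves); use ratio
      adjustment when percentage returns must be preserved
    - No lookahead in bar-to-bar changes (adjustment applied to past,
      not future), but historical price levels shift with each new roll
    - roll_days_before defaults to 8 (NQ, MNQ); GC and SI roll
      3 days before FND
    """
    pass  # Implementation in src/data/rollovers.py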
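To make the mechanics concrete, here is a minimal sketch of additive back-adjustment on plain lists. It is not the implementation in `src/data/rollovers.py`. The input layout (`segments`, one price list per contract while it is the front month, and `roll_gaps`, new minus old price at each roll) is an assumption made for illustration.

```python
def back_adjust(segments, roll_gaps):
    """Stitch per-contract price segments into one back-adjusted series.

    segments:  price lists in chronological order, one per front contract.
    roll_gaps: roll_gaps[i] = price of contract i+1 minus price of
               contract i on the roll date between them.
    """
    if len(roll_gaps) != len(segments) - 1:
        raise ValueError("need exactly one roll gap between consecutive segments")
    continuous = []
    adjustment = 0.0
    # Walk from the newest contract back to the oldest, accumulating gaps,
    # so the most recent prices are left unchanged.
    for i in range(len(segments) - 1, -1, -1):
        continuous = [price + adjustment for price in segments[i]] + continuous
        if i > 0:
            adjustment += roll_gaps[i - 1]
    return continuous
```

For example, `back_adjust([[100.0, 101.0], [105.0, 106.0]], [3.0])` shifts the older segment up by 3 points. The one-point move from 100 to 101 is preserved as an absolute difference, while its percentage size changes slightly. That is the trade-off between additive and ratio adjustment.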
Rollover Data Schema
| Field | Type | Description |
| front_contract | string | Current front month symbol |
| roll_date | date | Date of roll |
| adjustment | float64 | Price adjustment applied |
| method | string | 'back_adjusted' or 'ratio_adjusted' |
Corporate Actions (Indices)
Handling Policy
| Event | Action | Backtest Impact |
| Index rebalance | Use adjusted data | Minimal for futures |
| Stock splits (constituents) | Index auto-adjusts | None |
| Dividends | Index adjusts for ex-div | Small gap possible |
| Halts | Use last valid price | Mark as suspect |
Note
For index futures (NQ, ES), corporate actions are absorbed into the index calculation. No manual adjustment required. Data from TradingView is pre-adjusted.
Data Quality Rules
Validation Checks
| Check | Threshold | Action if Failed |
| Missing bars | > 1% of session | Flag + interpolate or exclude |
| Price spike | > 5 ATR in 1 bar | Flag for manual review |
| Volume anomaly | > 10× rolling avg | Flag as suspect |
| Timestamp gap | > expected interval × 2 | Insert missing bar marker |
| OHLC logic | high < low, or open/close outside [low, high] | Reject bar |
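Three of the checks in the table can be sketched as small predicates. The function names are hypothetical. Applying the range test to the open as well as the close, and skipping the volume test when the rolling average is zero, are assumptions of this sketch.

```python
from datetime import datetime, timedelta, timezone


def ohlc_is_valid(open_, high, low, close):
    """True when low <= high and both open and close lie within [low, high]."""
    return low <= high and low <= open_ <= high and low <= close <= high


def find_timestamp_gaps(timestamps, expected_interval):
    """Return (previous, current) pairs spaced more than 2x the expected interval."""
    gaps = []
    for prev, cur in zip(timestamps, timestamps[1:]):
        if cur - prev > 2 * expected_interval:
            gaps.append((prev, cur))
    return gaps


def is_volume_anomaly(volume, rolling_avg, factor=10.0):
    """True when volume exceeds factor x the rolling average volume."""
    return rolling_avg > 0 and volume > factor * rolling_avg
```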
Gap Handling
def handle_gaps(df, max_gap_bars=5):
    """
    Handle missing bars in OHLCV data.

    Rules:
    - Gap <= max_gap_bars: Forward-fill close, zero volume
    - Gap > max_gap_bars: Mark as session break, don't fill
    - Weekend gaps: Expected, mark as session_break=True
    """
    pass  # Implementation in src/data/validators.py
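The first two gap rules can be illustrated on a list of bar dictionaries. This is a sketch, not the implementation in `src/data/validators.py`: the name `fill_gaps`, the `filled` flag, and the dictionary layout are assumptions, and weekends are not special-cased, so a weekend is marked as a session break only when it spans more than `max_gap_bars` bars.

```python
from datetime import datetime, timedelta, timezone


def fill_gaps(bars, interval, max_gap_bars=5):
    """Forward-fill short gaps and mark long gaps as session breaks.

    bars: list of dicts sorted by 'timestamp', each with OHLCV keys.
    interval: expected spacing between bars, as a timedelta.
    """
    if not bars:
        return []
    out = [dict(bars[0], filled=False, session_break=False)]
    for bar in bars[1:]:
        prev = out[-1]  # last real bar before any synthetic fills
        missing = (bar["timestamp"] - prev["timestamp"]) // interval - 1
        current = dict(bar, filled=False, session_break=False)
        if 0 < missing <= max_gap_bars:
            # Short gap: repeat the previous close with zero volume.
            for k in range(1, missing + 1):
                out.append({
                    "timestamp": prev["timestamp"] + k * interval,
                    "open": prev["close"], "high": prev["close"],
                    "low": prev["close"], "close": prev["close"],
                    "volume": 0.0, "filled": True, "session_break": False,
                })
        elif missing > max_gap_bars:
            # Long gap: leave it unfilled and flag the first bar after it.
            current["session_break"] = True
        out.append(current)
    return out
```

Keeping a `filled` flag on synthetic bars lets later stages exclude them from volume-based statistics.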
Data Versioning
Version Control Strategy
data/
├── raw/
│   └── {symbol}/
│       └── {timeframe}/
│           └── {YYYY}/{MM}/
│               └── {symbol}_{timeframe}_{YYYYMMDD}.parquet
│
├── processed/
│   └── v{VERSION}/
│       └── {symbol}_{timeframe}.parquet
│
└── catalog.yaml  # Tracks versions and lineage
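Paths in this layout can be generated rather than typed by hand. The helpers below are hypothetical and assume the layout shown above, with `data` as the default root.

```python
from datetime import date
from pathlib import PurePosixPath


def raw_path(symbol, timeframe, day, root="data"):
    """Path of the raw daily file for one symbol and timeframe."""
    filename = f"{symbol}_{timeframe}_{day:%Y%m%d}.parquet"
    return PurePosixPath(root, "raw", symbol, timeframe,
                         f"{day:%Y}", f"{day:%m}", filename)


def processed_path(symbol, timeframe, version, root="data"):
    """Path of the processed file for a given processing version number."""
    return PurePosixPath(root, "processed", f"v{version}",
                         f"{symbol}_{timeframe}.parquet")
```

For instance, `raw_path("EURUSD", "1H", date(2024, 1, 15))` yields `data/raw/EURUSD/1H/2024/01/EURUSD_1H_20240115.parquet`.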
Catalog Schema
# data/catalog.yaml
# Example entries; row counts and checksums are illustrative placeholders
version: "2024.01.15"
datasets:
  eurusd_1h:
    symbol: EURUSD
    timeframe: 1H
    source: tradingview
    start_date: "2020-01-01"
    end_date: "2024-01-15"
    row_count: 35040
    checksum: sha256:abc123...
    processed_version: v3
    processing_date: "2024-01-15"
    quality_report:
      missing_bars: 0.2%
      flagged_bars: 12
  btcusdt_15m:
    symbol: BTCUSDT
    timeframe: 15m
    source: binance
    start_date: "2021-01-01"
    end_date: "2024-01-15"
    row_count: 105120
    checksum: sha256:def456...
    includes_funding: true
Immutability Rules
- Raw data is append-only (never modified)
- Processed data is versioned (new version on reprocessing)
- Catalog tracks all versions and lineage
- Checksums verify data integrity
- Git LFS for large files (> 50MB)
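Checksum verification can be sketched with the standard library. The `sha256:` prefix mirrors the catalog entries above; the function names are hypothetical and this is not necessarily how `src.data.catalog` computes them.

```python
import hashlib


def file_checksum(path, chunk_size=1 << 20):
    """Return 'sha256:<hex digest>' for a file, read in chunks to bound memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return "sha256:" + digest.hexdigest()


def verify_checksum(path, expected):
    """True when the file's current checksum matches the catalog value."""
    return file_checksum(path) == expected
```

Since raw data is append-only, a mismatch on a raw file that was not deliberately appended to suggests corruption or an unintended modification.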
Data Refresh Schedule
| Dataset | Refresh Frequency | Retention |
| Intraday (< 1D) | Daily at 00:00 UTC | Rolling 2 years |
| Daily | Weekly (Sunday) | Full history |
| Funding rates | Every 8 hours | Rolling 1 year |
| Roll calendar | Monthly | Full history |
Fetch Commands
# Fetch recent data
python -m src.data.fetch --symbol EURUSD --timeframe 1H --days 30
# Validate existing data
python -m src.data.validate --dataset eurusd_1h
# Build continuous futures
python -m src.data.rollovers --symbol NQ --method back_adjusted
# Update catalog checksums
python -m src.data.catalog --update-checksums