Data Specification

Data Sources

Market | Primary Source | Backup Source | Frequency
FX | TradingView | MT5 broker feed | 1m, 5m, 15m, 1H, 4H, D
Crypto | Binance/Bybit API | TradingView | 1m, 5m, 15m, 1H, 4H, D
Gold/Silver Futures | TradingView | Broker feed | 5m, 15m, 1H, 4H, D
NASDAQ Futures | TradingView | Broker feed | 1m, 5m, 15m, 1H

Data Fields

OHLCV Standard Schema

Field | Type | Description | Required
timestamp | datetime64[ns, UTC] | Bar open time | Yes
open | float64 | Opening price | Yes
high | float64 | Highest price | Yes
low | float64 | Lowest price | Yes
close | float64 | Closing price | Yes
volume | float64 | Volume (contracts/units) | Yes
symbol | string | Instrument identifier | Yes
timeframe | string | Bar period (1m, 5m, etc.) | Yes
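
A minimal sketch of how a single bar could be checked against the schema above, using only the standard library. `REQUIRED_FIELDS` and `validate_bar` are hypothetical names, and the mapping from column dtypes to Python types is an assumption made for illustration.

```python
from datetime import datetime, timedelta, timezone

# Required fields from the OHLCV Standard Schema, mapped to the Python
# types assumed here to stand in for the column dtypes (hypothetical).
REQUIRED_FIELDS = {
    "timestamp": datetime,
    "open": float,
    "high": float,
    "low": float,
    "close": float,
    "volume": float,
    "symbol": str,
    "timeframe": str,
}


def validate_bar(bar: dict) -> list:
    """Return a list of schema problems for one bar (empty list means OK)."""
    problems = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in bar:
            problems.append(f"missing field: {name}")
        elif not isinstance(bar[name], expected):
            problems.append(f"wrong type for {name}: {type(bar[name]).__name__}")
    ts = bar.get("timestamp")
    # The schema requires UTC timestamps, so naive or non-UTC values are flagged.
    if isinstance(ts, datetime) and ts.utcoffset() != timedelta(0):
        problems.append("timestamp is not UTC")
    return problems
```

Returning a list of problems rather than raising lets a caller log every issue for a bar in one pass.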

Extended Fields (When Available)

Field | Type | Description | Source
bid | float64 | Best bid at close | MT5
ask | float64 | Best ask at close | MT5
spread | float64 | Ask - Bid | Calculated
tick_volume | int64 | Number of ticks | MT5
funding_rate | float64 | Perpetual funding | Crypto exchanges

FX Sessions

Session Definitions (UTC)

Session | UTC Start | UTC End | Summer (DST) Adjustment
Sydney | 21:00 | 06:00 | No adjustment
Tokyo | 00:00 | 09:00 | No adjustment
London | 07:00 | 16:00 | 06:00-15:00 (BST)
New York | 12:00 | 21:00 | 11:00-20:00 (EDT)

Kill Zones (High-Probability Windows)

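Kill Zone | UTC Time | Local (Dublin, GMT/winter) | Notes
Asia | 00:00-04:00 | 00:00-04:00 | Lower volatility
London Open | 07:00-10:00 | 07:00-10:00 | Best reversals
NY AM | 12:30-15:00 | 12:30-15:00 | Primary window
NY Lunch | 16:00-17:00 | 16:00-17:00 | Avoid
NY PM | 17:30-20:00 | 17:30-20:00 | Continuation moves

Dublin local time equals UTC only in winter (GMT); during Irish summer time it is UTC+1.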
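
The kill-zone windows need minute precision (for example 12:30), so an hour-based check is not enough. A possible lookup, assuming start-inclusive and end-exclusive boundaries; `KILL_ZONES` and `get_kill_zone` are hypothetical names.

```python
from datetime import datetime, time, timezone

# Kill-zone windows copied from the table above, as (start, end) in UTC.
KILL_ZONES = {
    "asia": (time(0, 0), time(4, 0)),
    "london_open": (time(7, 0), time(10, 0)),
    "ny_am": (time(12, 30), time(15, 0)),
    "ny_lunch": (time(16, 0), time(17, 0)),
    "ny_pm": (time(17, 30), time(20, 0)),
}


def get_kill_zone(timestamp_utc: datetime):
    """Return the kill-zone name for a UTC timestamp, or None if outside all."""
    t = timestamp_utc.time()
    for name, (start, end) in KILL_ZONES.items():
        if start <= t < end:  # start inclusive, end exclusive (assumed)
            return name
    return None
```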

Session Overlap Handling

def get_active_sessions(timestamp_utc):
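    """Return list of active sessions for given UTC timestamp.

    Uses the standard-time windows from the session table; the summer
    (DST) adjustments are not applied.
    """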
    hour = timestamp_utc.hour
    sessions = []
    if 21 <= hour or hour < 6:
        sessions.append('sydney')
    if 0 <= hour < 9:
        sessions.append('tokyo')
    if 7 <= hour < 16:
        sessions.append('london')
    if 12 <= hour < 21:
        sessions.append('new_york')
    return sessions
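
To see which hours two sessions share, the same windows can be put in a table and intersected. This is a sketch only: `SESSIONS`, `in_session` and `overlap_hours` are hypothetical helpers, and the windows are the standard-time values from the session table.

```python
# Session windows from the table above as (start_hour, end_hour) in UTC.
SESSIONS = {
    "sydney": (21, 6),   # wraps past midnight
    "tokyo": (0, 9),
    "london": (7, 16),
    "new_york": (12, 21),
}


def in_session(hour: int, start: int, end: int) -> bool:
    """True if hour falls in [start, end), allowing windows that wrap midnight."""
    if start < end:
        return start <= hour < end
    return hour >= start or hour < end


def overlap_hours(a: str, b: str) -> list:
    """Return the UTC hours during which both sessions are active."""
    return [
        h for h in range(24)
        if in_session(h, *SESSIONS[a]) and in_session(h, *SESSIONS[b])
    ]


print(overlap_hours("london", "new_york"))  # [12, 13, 14, 15]
```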

Crypto: 24/7 Handling

Funding Rate Schedule

Exchange | Funding Interval | Timestamps (UTC)
Binance | 8 hours | 00:00, 08:00, 16:00
Bybit | 8 hours | 00:00, 08:00, 16:00
OKX | 8 hours | 00:00, 08:00, 16:00

Funding Rate Integration

# data/catalog.yaml entry
btc_perpetual:
  symbol: BTCUSDT
  type: perpetual
  funding_rate_field: funding_rate
  funding_times_utc: ["00:00", "08:00", "16:00"]
  include_funding_in_backtest: true
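
One way a backtest might consume these settings is sketched below. The function names are hypothetical, and the sign convention (a positive rate means longs pay shorts) is the usual one for perpetual swaps but should be confirmed against the exchange documentation.

```python
from datetime import datetime, timedelta, timezone

FUNDING_HOURS_UTC = (0, 8, 16)  # from funding_times_utc in the catalog entry


def next_funding_time(ts_utc: datetime) -> datetime:
    """Return the first funding timestamp strictly after ts_utc."""
    day_start = ts_utc.replace(hour=0, minute=0, second=0, microsecond=0)
    for hour in FUNDING_HOURS_UTC:
        candidate = day_start + timedelta(hours=hour)
        if candidate > ts_utc:
            return candidate
    return day_start + timedelta(days=1)  # 00:00 UTC of the next day


def funding_payment(position_notional: float, funding_rate: float) -> float:
    """
    Cash flow to the position at one funding timestamp.

    Positive notional = long, negative = short. With a positive rate,
    longs pay and shorts receive (assumed convention), hence the minus sign.
    """
    return -position_notional * funding_rate
```

For example, a 10,000 USDT long held through one funding timestamp at a rate of 0.0001 pays roughly 1 USDT.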

Weekend/Holiday Handling

  • Crypto trades 24/7/365 - no gaps expected
  • Reduced liquidity: Saturday 00:00 - Sunday 12:00 UTC
  • Flag low-liquidity periods in data catalog
  • Widen slippage assumptions during low-liquidity periods
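
The low-liquidity window above could be flagged and fed into a slippage model roughly as follows. The helper names and the default numbers (`base_bps`, `low_liquidity_multiplier`) are placeholders, not calibrated values.

```python
from datetime import datetime, timezone


def is_low_liquidity(ts_utc: datetime) -> bool:
    """True from Saturday 00:00 UTC up to (not including) Sunday 12:00 UTC."""
    weekday = ts_utc.weekday()  # Monday=0 ... Saturday=5, Sunday=6
    if weekday == 5:
        return True
    return weekday == 6 and ts_utc.hour < 12


def slippage_bps(ts_utc: datetime, base_bps: float = 1.0,
                 low_liquidity_multiplier: float = 2.0) -> float:
    """Base slippage in basis points, widened inside the low-liquidity window."""
    if is_low_liquidity(ts_utc):
        return base_bps * low_liquidity_multiplier
    return base_bps
```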

Futures Rollovers

Contract Specifications

Instrument | Symbol Root | Contract Months | Roll Timing
E-mini NASDAQ | NQ | H, M, U, Z | 8 days before expiry
Micro NASDAQ | MNQ | H, M, U, Z | 8 days before expiry
Gold | GC | G, J, M, Q, V, Z | 3 days before FND (first notice day)
Silver | SI | H, K, N, U, Z | 3 days before FND

Month Codes

Code | Month | Code | Month
F | January | N | July
G | February | Q | August
H | March | U | September
J | April | V | October
K | May | X | November
M | June | Z | December
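
The month codes and contract-month lists above can be combined to find the next listed contract for a root. This is an illustrative sketch: `next_contract` is a hypothetical helper and it ignores roll timing, so a contract in its own expiry month may already have been rolled.

```python
# Month codes from the table above.
MONTH_CODES = {
    "F": 1, "G": 2, "H": 3, "J": 4, "K": 5, "M": 6,
    "N": 7, "Q": 8, "U": 9, "V": 10, "X": 11, "Z": 12,
}

# Listed contract months per symbol root, in calendar order.
CONTRACT_MONTHS = {
    "NQ": "HMUZ",
    "MNQ": "HMUZ",
    "GC": "GJMQVZ",
    "SI": "HKNUZ",
}


def next_contract(symbol_root: str, year: int, month: int):
    """Return (month_code, year) of the first listed contract month >= month."""
    for code in CONTRACT_MONTHS[symbol_root]:
        if MONTH_CODES[code] >= month:
            return code, year
    # No listed month left this year: wrap to the first month of next year.
    return CONTRACT_MONTHS[symbol_root][0], year + 1


print(next_contract("NQ", 2024, 4))  # ('M', 2024)
```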

Rollover Protocol

def build_continuous_contract(symbol_root, roll_days_before=8):
    """
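    Build continuous contract series using back-adjustment.

    Method: Back-adjust prior prices by the roll gap (additive shift)
    - Preserves absolute price changes (point P&L); percentage returns
      are preserved by ratio adjustment, not by additive back-adjustment
    - Adjustment is applied to past bars only, so bar-to-bar price
      changes carry no lookahead
    - Default roll_days_before=8 matches NQ/MNQ; GC and SI roll
      3 days before FND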
    """
    pass  # Implementation in src/data/rollovers.py
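
The difference between the two adjustment methods can be shown on plain lists. This is a simplified sketch, not the implementation in src/data/rollovers.py: the functions and their inputs (`segments`, `gaps`, `ratios`) are hypothetical, and real data would be aligned by date.

```python
def back_adjust(segments, gaps):
    """
    Additive back-adjustment.

    segments: list of close-price lists, oldest contract first; each list
              covers the bars during which that contract was front month.
    gaps:     gaps[i] = close of contract i+1 minus close of contract i
              on the roll date (len(gaps) == len(segments) - 1).
    The newest segment is left unchanged; older segments are shifted.
    """
    adjusted = []
    offset = 0.0
    # Walk from newest to oldest, accumulating the gaps to add to older prices.
    for i in range(len(segments) - 1, -1, -1):
        adjusted = [p + offset for p in segments[i]] + adjusted
        if i > 0:
            offset += gaps[i - 1]
    return adjusted


def ratio_adjust(segments, ratios):
    """
    Multiplicative (ratio) adjustment.

    ratios[i] = close of contract i+1 divided by close of contract i on
    the roll date. Preserves percentage returns within each segment.
    """
    adjusted = []
    factor = 1.0
    for i in range(len(segments) - 1, -1, -1):
        adjusted = [p * factor for p in segments[i]] + adjusted
        if i > 0:
            factor *= ratios[i - 1]
    return adjusted


# Old contract closed at 101 on the roll date while the new one closed at 104.
print(back_adjust([[100.0, 101.0], [105.0, 106.0]], [3.0]))
# [103.0, 104.0, 105.0, 106.0]
```

In the additive result the old segment keeps its 1-point move (103 to 104) but its percentage return changes; the ratio version keeps the 1% return instead.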

Rollover Data Schema

Field | Type | Description
front_contract | string | Current front month symbol
roll_date | date | Date of roll
adjustment | float64 | Price adjustment applied
method | string | 'back_adjusted' or 'ratio_adjusted'

Corporate Actions (Indices)

Handling Policy

Event | Action | Backtest Impact
Index rebalance | Use adjusted data | Minimal for futures
Stock splits (constituents) | Index auto-adjusts | None
Dividends | Index adjusts for ex-div | Small gap possible
Halts | Use last valid price | Mark as suspect

Note

For index futures (NQ, ES), corporate actions are absorbed into the index calculation. No manual adjustment required. Data from TradingView is pre-adjusted.

Data Quality Rules

Validation Checks

Check | Threshold | Action if Failed
Missing bars | > 1% of session | Flag + interpolate or exclude
Price spike | > 5 ATR in 1 bar | Flag for manual review
Volume anomaly | > 10× rolling avg | Flag as suspect
Timestamp gap | > expected interval × 2 | Insert missing bar marker
OHLC logic | high < low, or open/close outside [low, high] | Reject bar
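
Two of the checks above (OHLC logic and timestamp gap) are simple enough to sketch with the standard library. The function names are hypothetical and the thresholds are taken directly from the table.

```python
from datetime import datetime, timedelta


def ohlc_is_consistent(open_, high, low, close) -> bool:
    """False if high < low or if open/close fall outside [low, high]."""
    return low <= high and low <= open_ <= high and low <= close <= high


def find_timestamp_gaps(timestamps, expected_interval: timedelta):
    """
    Return (previous, next) timestamp pairs whose spacing exceeds
    twice the expected interval. Input must be sorted ascending.
    """
    return [
        (a, b)
        for a, b in zip(timestamps, timestamps[1:])
        if b - a > 2 * expected_interval
    ]
```

Note that with the "> expected interval × 2" threshold a single missing bar (spacing of exactly two intervals) is not flagged by this check; it would be picked up by the missing-bars count instead.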

Gap Handling

def handle_gaps(df, max_gap_bars=5):
    """
    Handle missing bars in OHLCV data.

    Rules:
    - Gap <= max_gap_bars: Forward-fill close, zero volume
    - Gap > max_gap_bars: Mark as session break, don't fill
    - Weekend gaps: Expected, mark as session_break=True
    """
    pass  # Implementation in src/data/validators.py
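
The rules in the docstring can be illustrated on a list of bar dicts. This is a sketch under simplifying assumptions, not the code in src/data/validators.py: `fill_gaps`, the `filled` flag and the dict-based bar layout are hypothetical, and weekend gaps are treated like any other gap longer than `max_gap_bars`.

```python
from datetime import datetime, timedelta


def fill_gaps(bars, interval: timedelta, max_gap_bars: int = 5):
    """
    bars: list of dicts with 'timestamp', 'open', 'high', 'low', 'close',
          'volume', sorted by timestamp. Returns a new list.
    """
    out = []
    for bar in bars:
        bar = dict(bar, session_break=False, filled=False)
        if out:
            prev = out[-1]
            # Number of bars missing between prev and the current bar.
            missing = (bar["timestamp"] - prev["timestamp"]) // interval - 1
            if 0 < missing <= max_gap_bars:
                c = prev["close"]
                for k in range(1, missing + 1):
                    # Synthetic flat bar: forward-filled close, zero volume.
                    out.append({
                        "timestamp": prev["timestamp"] + k * interval,
                        "open": c, "high": c, "low": c, "close": c,
                        "volume": 0.0, "session_break": False, "filled": True,
                    })
            elif missing > max_gap_bars:
                # Too long to fill: mark the first bar after the gap instead.
                bar["session_break"] = True
        out.append(bar)
    return out
```

Keeping a `filled` flag on synthetic bars makes it possible to exclude them later from volume-based statistics.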

Data Versioning

Version Control Strategy

data/
├── raw/
│   └── {symbol}/
│       └── {timeframe}/
│           └── {YYYY}/{MM}/
│               └── {symbol}_{timeframe}_{YYYYMMDD}.parquet
├── processed/
│   └── v{VERSION}/
│       └── {symbol}_{timeframe}.parquet
└── catalog.yaml  # Tracks versions and lineage
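
The layout above maps directly to path templates. A small sketch of helpers that build those paths; `raw_path` and `processed_path` are hypothetical names.

```python
from datetime import date
from pathlib import PurePosixPath


def raw_path(symbol: str, timeframe: str, day: date) -> PurePosixPath:
    """data/raw/{symbol}/{timeframe}/{YYYY}/{MM}/{symbol}_{timeframe}_{YYYYMMDD}.parquet"""
    filename = f"{symbol}_{timeframe}_{day:%Y%m%d}.parquet"
    return (PurePosixPath("data/raw") / symbol / timeframe
            / f"{day:%Y}" / f"{day:%m}" / filename)


def processed_path(symbol: str, timeframe: str, version: int) -> PurePosixPath:
    """data/processed/v{VERSION}/{symbol}_{timeframe}.parquet"""
    return PurePosixPath("data/processed") / f"v{version}" / f"{symbol}_{timeframe}.parquet"


print(raw_path("EURUSD", "1H", date(2024, 1, 15)))
# data/raw/EURUSD/1H/2024/01/EURUSD_1H_20240115.parquet
```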

Catalog Schema

# data/catalog.yaml
version: "2024.01.15"

datasets:
  eurusd_1h:
    symbol: EURUSD
    timeframe: 1H
    source: tradingview
    start_date: "2020-01-01"
    end_date: "2024-01-15"
    row_count: 35040
    checksum: sha256:abc123...
    processed_version: v3
    processing_date: "2024-01-15"
    quality_report:
      missing_bars: 0.2%
      flagged_bars: 12

  btcusdt_15m:
    symbol: BTCUSDT
    timeframe: 15m
    source: binance
    start_date: "2021-01-01"
    end_date: "2024-01-15"
    row_count: 105120
    checksum: sha256:def456...
    includes_funding: true

Immutability Rules

  1. Raw data is append-only (never modified)
  2. Processed data is versioned (new version on reprocessing)
  3. Catalog tracks all versions and lineage
  4. Checksums verify data integrity
  5. Git LFS for large files (> 50MB)
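
A checksum in the `sha256:<hex>` form used by the catalog can be computed with the standard library. `file_checksum` is a hypothetical helper; the file is read in chunks so large files do not need to fit in memory.

```python
import hashlib


def file_checksum(path, chunk_size: int = 1 << 20) -> str:
    """Return 'sha256:<hex digest>' for the file at path."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return f"sha256:{digest.hexdigest()}"
```

Comparing this value with the `checksum` stored in the catalog detects any modification of a file that is supposed to be immutable.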

Data Refresh Schedule

Dataset | Refresh Frequency | Retention
Intraday (< 1D) | Daily at 00:00 UTC | Rolling 2 years
Daily | Weekly (Sunday) | Full history
Funding rates | Every 8 hours | Rolling 1 year
Roll calendar | Monthly | Full history

Fetch Commands

# Fetch recent data
python -m src.data.fetch --symbol EURUSD --timeframe 1H --days 30

# Validate existing data
python -m src.data.validate --dataset eurusd_1h

# Build continuous futures
python -m src.data.rollovers --symbol NQ --method back_adjusted

# Update catalog checksums
python -m src.data.catalog --update-checksums