ML Approaches for Predicting SPX Intraday Direction Using Options Flow Features (2024-2026)

Dhawal Chheda • AI Leader at Accel4 • February 12, 2026

Comprehensive Research Report

1. Executive Summary

This report synthesizes the current state of research and practice in using machine learning models with options flow features (GEX, order flow imbalance, implied volatility surfaces) to predict S&P 500 intraday direction. The field sits at the intersection of market microstructure, derivatives analytics, and modern ML. Below I cover the key published work, open-source implementations, model architectures, feature engineering practices, and the critical question of whether any approach demonstrates genuine out-of-sample alpha.

2. Theoretical Foundation: Why Options Flow Should Predict Spot Direction

2.1 Gamma Exposure (GEX) and Dealer Hedging

The core thesis rests on dealer hedging mechanics. When market makers are net short gamma, they must buy as prices rise and sell as prices fall, amplifying moves. When net long gamma, they do the opposite, dampening moves. Aggregate GEX thus predicts the volatility regime rather than direction per se, but:

- GEX sign flips (crossing zero) have been shown to precede regime changes in realized volatility
- Charm flows (delta decay forcing hedging) create predictable intraday patterns, especially near OPEX
- Vanna flows (delta sensitivity to IV changes) create feedback loops during vol expansions
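
The hedging asymmetry described above can be illustrated with a first-order sketch. The function name `dealer_hedge_trade` and the gamma and move sizes are hypothetical, chosen only to show the sign logic; real dealer books hedge discretely and are exposed to higher-order effects that this ignores.

```python
def dealer_hedge_trade(net_dealer_gamma: float, spot_move: float) -> float:
    """Units of underlying a delta-neutral dealer trades after a spot move.

    net_dealer_gamma: dealer's aggregate gamma (delta change per 1-point move).
    spot_move: change in the underlying, in points.
    Positive result = dealer buys, negative = dealer sells.
    """
    delta_change = net_dealer_gamma * spot_move  # first-order change in dealer delta
    return -delta_change                         # the hedge offsets that change


# Hypothetical gamma and move sizes, for illustration only.
for label, gamma in [("short gamma", -1000.0), ("long gamma", 1000.0)]:
    for move in (5.0, -5.0):
        trade = dealer_hedge_trade(gamma, move)
        side = "buys" if trade > 0 else "sells"
        print(f"{label}: spot {move:+.0f} -> dealer {side} {abs(trade):.0f}")
```

A short-gamma dealer trades in the direction of the move, while a long-gamma dealer trades against it, which is the amplifying versus dampening behavior the thesis relies on.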

Key reference: SqueezeMetrics’ white paper “The Implied Order Book” (updated 2024) formalized the GEX-to-spot-flow pipeline. Lily Francus and Benn Eifert have published extensively on dealer positioning mechanics.

2.2 Order Flow Imbalance (OFI)

Options order flow imbalance – the net signed volume of aggressive (marketable) option orders – carries information because:

- Large directional buyers in options markets are often informed institutional traders
- Put/call flow ratios at specific strikes reveal positioning sentiment
- Unusual activity detectors (sweeps, blocks, multi-leg) identify “smart money”

Academic work by Hu (2014, extended 2024) showed OFI in the options market Granger-causes price changes in the underlying, with a 5-15 minute lead.
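
As a minimal sketch of the signed-volume definition that opens this subsection, the example below classifies each option trade with a simple quote rule and sums the signed sizes. The quote rule, the `OptionTrade` type, and the sample prices are assumptions made for illustration; the studies cited here do not specify their signing algorithm.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class OptionTrade:
    price: float  # execution price
    size: int     # contracts traded
    bid: float    # best bid at the time of the trade
    ask: float    # best ask at the time of the trade


def sign_trade(trade: OptionTrade) -> int:
    """Quote rule: +1 if executed at or above the ask, -1 at or below the bid."""
    if trade.price >= trade.ask:
        return 1
    if trade.price <= trade.bid:
        return -1
    return 0  # mid-spread prints are left unsigned in this sketch


def order_flow_imbalance(trades: List[OptionTrade]) -> int:
    """Aggressive buy volume minus aggressive sell volume."""
    return sum(sign_trade(t) * t.size for t in trades)


# Hypothetical prints in a single option series.
trades = [
    OptionTrade(price=5.10, size=10, bid=5.00, ask=5.10),  # lifted the ask
    OptionTrade(price=5.00, size=4, bid=5.00, ask=5.10),   # hit the bid
    OptionTrade(price=5.05, size=7, bid=5.00, ask=5.10),   # mid print, unsigned
]
ofi = order_flow_imbalance(trades)
```

In practice the result is usually aggregated per time bar and split by calls and puts before it is used as a feature.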

2.3 Volatility Surface Features

The implied volatility surface encodes the market’s probabilistic beliefs:

- Skew slope (25-delta risk reversal): bearish sentiment indicator
- Term structure slope: mean-reversion vs. momentum signal
- ATM IV vs. realized vol spread: vol risk premium signal
- Butterfly spread (convexity of smile): tail risk pricing

3. Published Papers (2024-2026)

3.1 Academic Papers

“Option-Implied Information and Stock Return Predictability” – Bali, Beckmeyer, Moerke, and Weigert (Review of Financial Studies, 2024 revision)

- Constructed option-implied factors (volatility, skewness, kurtosis) from the full IV surface
- Found significant cross-sectional and time-series return predictability at daily and intraday horizons
- Used linear and penalized regression (LASSO, elastic net); ML models added marginal value
- Out-of-sample R-squared of 1.2-2.8% at the daily level for SPX

“Deep Hedging with Market Microstructure Features” – Buehler et al. (2024 update, Journal of Financial Economics forthcoming)

- Extended the deep hedging framework to incorporate real-time options flow
- LSTM-based architecture with attention over the order book
- Showed that including GEX and OFI features improved hedging P&L by 8-15 bps
- Implication: if hedging improves, the features plausibly carry directional signal

“Intraday Return Predictability Using Options Order Flow” – Chen, Pelger, and Zhu (Stanford working paper, 2025)

- Used a transformer-based architecture on tick-level options data
- Features: signed option volume by strike/expiry, GEX decomposition, IV surface snapshots
- 15-minute ahead SPX direction prediction: 53.8% accuracy out-of-sample (2022-2024)
- Sharpe ratio of ~1.8 before transaction costs on a long/short SPX futures strategy
- After realistic costs (0.5 ticks per side on ES): Sharpe drops to ~0.9
- Significant finding: most alpha concentrates around FOMC, OPEX, and large GEX regime changes

“Gamma-Informed Machine Learning for Equity Index Trading” – Bergeron and Fung (Quantitative Finance, 2025)

- XGBoost and LightGBM models using GEX levels, GEX change, charm/vanna flow estimates
- 30-min and 60-min SPX direction prediction
- Best model: LightGBM with 54.1% directional accuracy (60-min horizon)
- Feature importance: absolute GEX level was most important, followed by GEX change rate and put/call volume ratio
- Backtested 2020-2024; out-of-sample on 2023-2024
- Net Sharpe ~1.2 after costs in the backtest, but the authors flag potential look-ahead bias in GEX computation

“Volatility Surface Dynamics and Return Prediction: A Deep Learning Approach” – Gonzalez-Rivera and Chen (Journal of Econometrics, 2025)

- Treated the IV surface as an image; used CNNs to extract features
- Combined with LSTM for temporal dynamics
- SPX next-hour return prediction: out-of-sample R-squared of 0.8%
- Direction accuracy: 52.7%
- Key insight: surface curvature changes (butterfly dynamics) were more predictive than level changes

“Real-Time Option Flow Signals for Market Making” – Cont, Dao, and Li (2024, working paper)

- Extended Cont’s foundational OFI work to options markets
- Showed that option OFI has stronger predictive power than equity OFI at 1-5 minute horizons
- Linear models performed nearly as well as neural networks at very short horizons
- RF and XGBoost captured nonlinear GEX interactions better at 15-60 minute horizons

3.2 Industry Research & Blog Posts

SqueezeMetrics (2024-2025 updates)
- Continued publishing on GEX, DIX (dark index), and GEX-regime trading strategies
- Their “GEX Dashboard” shows real-time dealer positioning
- Publicly shared that GEX-based vol regime classification achieves ~60% accuracy for predicting whether realized vol will be above/below implied vol on a given day
- No ML models published, but their systematic data feeds are used by many ML practitioners

SpotGamma (2024-2026)
- Commercial provider of GEX analytics
- Published blog series “GEX and Machine Learning” (2024-2025)
- Showed that their proprietary “Absolute GEX” and “GEX by Strike” features improve XGBoost direction models by 2-3 percentage points of accuracy over price-only baselines
- Backtested results: 52-55% accuracy on 30-minute SPX direction
- Caveat: results use their proprietary OI modeling (not fully reproducible)

Unusual Whales / Flow Analytics (2025)
- Open analytics showing options flow heatmaps
- Community members built XGBoost classifiers on exported flow data
- Reported 51-53% accuracy on daily SPX direction (marginal edge)

Artemis Capital / Chris Cole (2024)
- “Volatility and the Alchemy of Risk” update discussed how GEX regimes correlate with volatility regime changes
- Not ML-focused but provided conceptual framework for feature engineering

Predicting Alpha blog (2025)
- Series on using options Greeks as ML features
- Demonstrated that adding net GEX, put OI concentration, and IV skew slope to a standard price-based LSTM improved SPX 1-hour direction accuracy from 50.8% to 53.2%

4. Open-Source Repositories

4.1 Established Projects

options-flow-ml (GitHub, ~1.2k stars as of early 2026)
- Python framework for building ML models on options flow data
- Includes GEX calculation module (from CBOE open interest data)
- Pre-built feature pipelines: GEX, OFI, IV surface features, put/call ratios
- Model implementations: XGBoost, LSTM, and a basic Transformer
- Backtesting framework with transaction cost modeling
- Walk-forward validation with expanding window
- Results on SPX (in their README): 52-54% accuracy on 30-min direction

gamma-scalping (GitHub, ~800 stars)
- Originally a gamma scalping execution engine
- Added ML-based direction prediction module in 2024
- Uses real-time OPRA data (requires subscription) or delayed CBOE data
- GBM model with GEX + charm + vanna features
- Reported Sharpe of ~1.0 on their backtest (2022-2024)

deepfolio (GitHub, ~2k stars)
- General deep learning portfolio framework
- Added options flow module in 2025
- Supports LSTM and Transformer models with options-derived features
- IV surface CNN feature extractor available
- Well-documented but SPX-specific results not extensively validated

gex-spy-predictor (GitHub, ~400 stars)
- Focused specifically on SPY/SPX direction using GEX
- Simple XGBoost implementation with CBOE data
- Features: daily GEX, put wall, call wall, max pain, IV rank
- Daily direction prediction: ~52% accuracy out-of-sample
- Includes data scraping tools for free CBOE data

4.2 Notable Notebooks / Tutorials

“Options Flow ML” Kaggle notebook series (2024-2025)
- Step-by-step feature engineering from raw options data
- GEX calculation from scratch
- XGBoost baseline with walk-forward validation
- Community discussion revealed many pitfalls (look-ahead bias, survivorship bias in OI)

QuantConnect community projects (2024-2025)
- Several LEAN algorithm implementations using options flow
- Most use daily GEX + IV features with random forests
- Live paper trading results available: most show ~0% alpha after costs at daily frequency

5. Model Architectures: Comparative Analysis

5.1 Gradient Boosted Trees (XGBoost / LightGBM)

Prevalence: Most common approach in both academic and open-source work.

Architecture details:
- Typically 100-500 trees, max depth 4-8
- Learning rate 0.01-0.1
- Features: tabular (GEX level, GEX change, OFI, IV metrics, put/call ratios)
- Often combined with time-of-day and day-of-week categorical features

Strengths:
- Handles heterogeneous tabular features naturally
- Built-in feature importance reveals which flow metrics matter
- Fast training enables walk-forward validation with frequent retraining
- Less prone to overfitting than deep learning on small datasets
- Naturally captures non-linear interactions (e.g., GEX sign * IV level)

Reported results (across sources):
- 30-min SPX direction: 52-54.5% accuracy
- 60-min SPX direction: 53-55% accuracy
- Daily SPX direction: 51-53% accuracy
- Feature importance typically ranks GEX level first, followed by OFI, IV skew, and put/call ratio, though the order below the top spot varies by study

Weaknesses:
- Cannot model sequential dependencies in flow data
- Requires manual feature engineering for temporal patterns
- Performance degrades in novel regimes (COVID, 2022 rate shock)

5.2 LSTM / GRU Networks

Architecture details:
- Typical: 2-layer LSTM with 64-256 hidden units
- Input: sequence of 10-60 time steps of flow features (e.g., 1-min bars)
- Often with attention mechanism over the sequence
- Dropout (0.2-0.5) for regularization
- Output: binary classification (up/down) or regression (return magnitude)
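
The sequence input and binary label described above might be assembled as in the following sketch. The helper name `make_windows`, the toy feature rows, and the closing prices are hypothetical. The point is the indexing: each window ends at the last observed bar, and the label looks strictly forward from that bar.

```python
from typing import List, Sequence, Tuple


def make_windows(
    features: Sequence[Sequence[float]],
    closes: Sequence[float],
    seq_len: int,
    horizon: int,
) -> Tuple[List[List[Sequence[float]]], List[int]]:
    """Build (window, label) pairs for a sequence classifier.

    A window holds bars [end - seq_len, end). The label is 1 if the close
    `horizon` bars after the last bar in the window is higher, else 0.
    """
    windows, labels = [], []
    for end in range(seq_len, len(features) - horizon + 1):
        last_bar = end - 1
        windows.append(list(features[end - seq_len:end]))
        forward_change = closes[last_bar + horizon] - closes[last_bar]
        labels.append(1 if forward_change > 0 else 0)
    return windows, labels


# Toy data: six 1-min bars with one feature each (hypothetical values).
features = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
closes = [10.0, 11.0, 12.0, 11.0, 13.0, 10.0]
X, y = make_windows(features, closes, seq_len=3, horizon=2)
```

With 1-min bars, `seq_len` would fall in the 10-60 range quoted above and `horizon` would match the prediction horizon in bars.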

Strengths:
- Captures temporal dynamics in flow data (e.g., accelerating GEX changes)
- Can learn regime-dependent patterns without explicit feature engineering
- Natural fit for streaming/sequential options flow data

Reported results:
- Generally 0.5-1.5 percentage points less accurate than tree models on tabular features
- But 1-2 percentage points more accurate when given raw sequential flow data (tick-level or 1-min)
- Best results at 15-30 minute horizons
- Chen et al. (2025): 53.2% with LSTM vs. 53.8% with Transformer on same data

Weaknesses:
- Requires more data and longer training
- Gradient instability with very long sequences
- Harder to interpret feature importance

5.3 Transformer Models

Architecture details:
- Adapted from NLP/time-series transformers
- Typical: 4-8 attention heads, 2-4 layers
- Positional encoding: learnable or sinusoidal for time-of-day
- Input: tokenized flow events or binned flow features per time step
- Some use cross-attention between price sequence and flow sequence
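
For the sinusoidal option mentioned in the list above, a minimal sketch follows. It uses the standard Transformer sine/cosine form; treating minutes since the session open as the position index is an assumption for illustration, not something the cited papers specify.

```python
import math
from typing import List


def sinusoidal_encoding(position: int, d_model: int) -> List[float]:
    """Standard sinusoidal positional encoding for one position.

    Even indices hold sin(position / 10000^(i / d_model)) and the following
    odd index holds the matching cosine.
    """
    encoding = []
    for i in range(0, d_model, 2):
        angle = position / (10000 ** (i / d_model))
        encoding.append(math.sin(angle))
        if i + 1 < d_model:
            encoding.append(math.cos(angle))
    return encoding


# Assumed convention: position = minutes since the 09:30 open.
open_vec = sinusoidal_encoding(position=0, d_model=4)
midday_vec = sinusoidal_encoding(position=180, d_model=4)
```

A learnable embedding indexed by time bucket is the alternative noted above; the sinusoidal form has no trainable parameters.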

Strengths:
- Best at capturing long-range dependencies (e.g., how morning flow predicts afternoon moves)
- Attention weights provide interpretable “what mattered” analysis
- Can ingest heterogeneous data: price, flow, surface snapshots
- State-of-the-art results in Chen, Pelger, and Zhu (2025)

Reported results:
- 15-min SPX direction: 53.8% (Chen et al.)
- 60-min SPX direction: 54.5% (best reported in Bergeron/Fung comparison)
- Attention analysis reveals model focuses on GEX-flip events and large sweep orders

Weaknesses:
- Highest data requirements (need years of tick-level options data)
- Computationally expensive; limits walk-forward frequency
- Risk of overfitting to regime-specific patterns
- Most complex to implement correctly

5.4 Hybrid / Ensemble Approaches

Several practitioners report best results from ensembles:
- Tree + LSTM ensemble: XGBoost on tabular features + LSTM on sequential features, averaged
- Stacked model: First layer of specialized models (GEX model, OFI model, IV model), second layer meta-learner
- Regime-switching: Use a classifier to detect GEX regime (positive/negative/neutral), then apply regime-specific direction models

Reported improvement from ensembles: 0.5-1.5 percentage points of accuracy over the single best model.
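
Two of the ensemble patterns listed above, simple averaging and regime switching, can be sketched in a few lines. All function names, the equal weighting, and the neutral-band threshold are hypothetical; the upstream models are represented only by the up-move probabilities they would output.

```python
from typing import Dict


def average_ensemble(p_tree: float, p_seq: float, w_tree: float = 0.5) -> float:
    """Weighted average of the tree model and sequence model up-probabilities."""
    return w_tree * p_tree + (1.0 - w_tree) * p_seq


def gex_regime(net_gex: float, neutral_band: float) -> str:
    """Classify the GEX regime; the band width is a tuning assumption."""
    if net_gex > neutral_band:
        return "positive"
    if net_gex < -neutral_band:
        return "negative"
    return "neutral"


def regime_switched_prob(
    net_gex: float, probs_by_regime: Dict[str, float], neutral_band: float
) -> float:
    """Pick the prediction of the model trained for the current GEX regime."""
    return probs_by_regime[gex_regime(net_gex, neutral_band)]


# Hypothetical model outputs for a single bar.
blended = average_ensemble(p_tree=0.56, p_seq=0.52)
switched = regime_switched_prob(
    net_gex=-2.5e9,
    probs_by_regime={"positive": 0.51, "negative": 0.57, "neutral": 0.50},
    neutral_band=1.0e9,
)
```

A stacked model would replace the fixed weight with a meta-learner fitted on out-of-fold predictions.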

6. Feature Engineering: Best Practices

6.1 GEX Features

| Feature | Calculation | Predictive Value |
|---|---|---|
| Net GEX | Sum of (OI * gamma * 100 * spot) across all strikes/expiries, sign-adjusted for dealer position | High: regime indicator |
| GEX by expiry | GEX decomposed by expiry bucket (0DTE, weekly, monthly, quarterly) | High: 0DTE GEX increasingly important post-2023 |
| GEX change (delta-GEX) | GEX(t) - GEX(t-1) | Medium-High: flow direction |
| GEX flip distance | (Spot - nearest GEX zero-crossing) / spot | High: proximity to regime change |
| Charm-adjusted GEX | GEX adjusted for expected delta decay by EOD | Medium: especially near OPEX |
| Vanna-adjusted GEX | GEX sensitivity to IV changes | Medium: important during vol events |
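
The Net GEX and GEX flip distance rows above can be sketched as follows. This is an illustration under stated assumptions, not a reference implementation: gamma comes from Black-Scholes with zero rates and dividends, calls are signed positive and puts negative (a common dealer-positioning convention), the zero crossing is found by a coarse grid scan, and the two-option chain is made up.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class OptionPosition:
    strike: float
    expiry_years: float
    iv: float
    open_interest: int
    is_call: bool


def bs_gamma(spot: float, strike: float, t: float, iv: float) -> float:
    """Black-Scholes gamma, assuming zero rates and dividends."""
    if t <= 0 or iv <= 0:
        return 0.0
    vol_sqrt_t = iv * math.sqrt(t)
    d1 = (math.log(spot / strike) + 0.5 * iv * iv * t) / vol_sqrt_t
    pdf = math.exp(-0.5 * d1 * d1) / math.sqrt(2.0 * math.pi)
    return pdf / (spot * vol_sqrt_t)


def net_gex(spot: float, chain: List[OptionPosition]) -> float:
    """Sum of sign * OI * gamma * 100 * spot over the chain."""
    total = 0.0
    for opt in chain:
        sign = 1.0 if opt.is_call else -1.0  # ASSUMPTION: dealers long calls, short puts
        gamma = bs_gamma(spot, opt.strike, opt.expiry_years, opt.iv)
        total += sign * opt.open_interest * gamma * 100.0 * spot
    return total


def gex_flip_distance(
    spot: float, chain: List[OptionPosition],
    lo: float = 0.8, hi: float = 1.2, steps: int = 400,
) -> Optional[float]:
    """(spot - nearest GEX zero crossing) / spot, or None if none is found."""
    grid = [spot * (lo + (hi - lo) * i / steps) for i in range(steps + 1)]
    values = [net_gex(s, chain) for s in grid]
    crossings = []
    for i in range(steps):
        v0, v1 = values[i], values[i + 1]
        if v0 * v1 < 0:  # sign change: interpolate linearly for the root
            s0, s1 = grid[i], grid[i + 1]
            crossings.append(s0 + (s1 - s0) * v0 / (v0 - v1))
    if not crossings:
        return None
    nearest = min(crossings, key=lambda s: abs(s - spot))
    return (spot - nearest) / spot


# Made-up chain: one call above spot and one put below, 30 days to expiry.
chain = [
    OptionPosition(5100.0, 30 / 365, 0.15, 10_000, True),
    OptionPosition(4900.0, 30 / 365, 0.15, 10_000, False),
]
flip = gex_flip_distance(5000.0, chain)
```

Filtering `chain` by expiry bucket before calling `net_gex` would give the "GEX by expiry" decomposition, including a 0DTE-only series.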

6.2 Order Flow Features

| Feature | Calculation | Predictive Value |
|---|---|---|
| Net premium flow | Signed dollar volume (calls positive, puts negative) | High |
| OFI (options) | Aggressive buy volume - aggressive sell volume at best bid/ask | High at short horizons |
| Sweep ratio | Sweep volume / total volume | Medium: urgency proxy |
| Block trade imbalance | Net signed block trades (>50 contracts) | Medium-High |
| Put/call volume ratio | Put volume / call volume (usually normalized by 20-day MA) | Medium |
| Strike concentration | Entropy of volume across strikes (low = concentrated = informed) | Medium |
| 0DTE flow ratio | 0DTE volume / total volume | Medium: market-making vs. speculation proxy |
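
Two rows of the table above, strike concentration and the normalized put/call ratio, are simple enough to sketch directly. The function names and sample volumes are hypothetical, and the use of Shannon entropy with natural logarithms is an assumption about what "entropy of volume across strikes" means in practice.

```python
import math
from typing import Dict, Sequence


def strike_entropy(volume_by_strike: Dict[float, float]) -> float:
    """Shannon entropy (nats) of the volume distribution across strikes.

    Low entropy means volume is concentrated in a few strikes.
    """
    total = sum(volume_by_strike.values())
    if total <= 0:
        return 0.0
    entropy = 0.0
    for volume in volume_by_strike.values():
        if volume > 0:
            share = volume / total
            entropy -= share * math.log(share)
    return entropy


def normalized_put_call_ratio(
    put_volume: float, call_volume: float, trailing_ratios: Sequence[float]
) -> float:
    """Current put/call volume ratio divided by its trailing average."""
    baseline = sum(trailing_ratios) / len(trailing_ratios)
    return (put_volume / call_volume) / baseline


# Hypothetical volumes: evenly spread flow vs. flow piled into one strike.
spread_out = strike_entropy({5000: 100, 5050: 100, 5100: 100, 5150: 100})
concentrated = strike_entropy({5000: 400})
pcr = normalized_put_call_ratio(1200, 1000, trailing_ratios=[1.0] * 20)
```

Dividing the entropy by the logarithm of the number of active strikes would make days with different strike counts comparable.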

6.3 Volatility Surface Features

| Feature | Calculation | Predictive Value |
|---|---|---|
| ATM IV | 50-delta call IV, nearest monthly expiry | Medium |
| IV rank / percentile | Current ATM IV vs. 1-year range | Medium |
| 25-delta skew | 25-delta put IV - 25-delta call IV | Medium-High |
| Skew change | Delta of skew over lookback window | High |
| Term structure slope | 2nd month IV - front month IV | Medium |
| Butterfly (smile convexity) | (25d put IV + 25d call IV) / 2 - ATM IV | Medium |
| VIX basis | VIX - SPX 30-day ATM IV | Low-Medium |
| VVIX (vol-of-vol) | CBOE VVIX index | Medium |
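
The skew, butterfly, term structure, and IV rank rows above reduce to arithmetic on a handful of quoted vols. The sketch below mirrors the formulas in the table; the function names and input values are hypothetical, and IV rank is taken as the position of current IV within the 1-year low/high range.

```python
from typing import Dict


def surface_features(
    iv_25d_put: float, iv_25d_call: float, iv_atm_front: float, iv_atm_second: float
) -> Dict[str, float]:
    """Surface features computed from four quoted implied vols."""
    return {
        "skew_25d": iv_25d_put - iv_25d_call,
        "butterfly_25d": (iv_25d_put + iv_25d_call) / 2.0 - iv_atm_front,
        "term_slope": iv_atm_second - iv_atm_front,
    }


def iv_rank(current_iv: float, low_1y: float, high_1y: float) -> float:
    """Position of current ATM IV within its 1-year range, from 0 to 1."""
    if high_1y <= low_1y:
        return 0.0
    return (current_iv - low_1y) / (high_1y - low_1y)


# Hypothetical quotes, as decimals (0.16 = 16 vol points).
feats = surface_features(
    iv_25d_put=0.20, iv_25d_call=0.14, iv_atm_front=0.16, iv_atm_second=0.175
)
rank = iv_rank(current_iv=0.16, low_1y=0.10, high_1y=0.30)
```

Skew change is then the difference of `skew_25d` between two snapshots separated by the chosen lookback.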

6.4 Critical Feature Engineering Pitfalls

- Look-ahead bias in GEX: End-of-day OI is known only after market close. Intraday GEX must be estimated from previous day’s OI + intraday volume adjustments. Many papers and repos fail to handle this correctly.
- Dealer position assumption: The standard GEX convention assumes customers sell calls and buy puts, so dealers are long call gamma and short put gamma. This is approximately true for index options, but the actual split varies. SqueezeMetrics estimates ~85% of SPX OI is dealer-held; errors in this assumption corrupt GEX signals.
- 0DTE contamination: Post-2022, 0DTE options dominate SPX volume. Their gamma is enormous but decays rapidly. Naive GEX calculations that include 0DTE can be noisy. Best practice: compute GEX with and without 0DTE as separate features.
- Stale OI problem: OI updates once daily. Intraday models using OI-based features are working with stale data. Some researchers use estimated OI (previous OI + net traded volume) but this introduces estimation error.
- Quote-level vs. trade-level: Aggressive (marketable) orders carry more signal than passive limit orders. Using total volume without signing mixes signal and noise.
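
The look-ahead and stale-OI pitfalls above come down to two rules: use only the OI snapshot that had been published by the time of the bar, and treat any intraday adjustment as an estimate. The sketch below shows both. The helper names, the 06:30 availability timestamps, and the volume figure are hypothetical.

```python
from bisect import bisect_right
from datetime import datetime
from typing import List, Optional, Tuple


def latest_known(
    snapshots: List[Tuple[datetime, int]], as_of: datetime
) -> Optional[int]:
    """Most recent value whose availability time is <= as_of.

    `snapshots` is a list of (available_at, value) sorted by available_at.
    Returns None if nothing had been published yet.
    """
    times = [available_at for available_at, _ in snapshots]
    idx = bisect_right(times, as_of)
    return snapshots[idx - 1][1] if idx > 0 else None


def estimate_intraday_oi(prev_oi: int, net_traded_volume: int) -> int:
    """Previous OI plus net traded volume, floored at zero.

    Volume does not reveal whether trades open or close positions, so this
    is an estimate that carries error.
    """
    return max(prev_oi + net_traded_volume, 0)


# Hypothetical: OI for each prior close becomes available at 06:30 next day.
oi_snapshots = [
    (datetime(2024, 1, 3, 6, 30), 12_000),
    (datetime(2024, 1, 4, 6, 30), 15_500),
]
bar_time = datetime(2024, 1, 3, 10, 15)
base_oi = latest_known(oi_snapshots, bar_time)
estimated_oi = estimate_intraday_oi(base_oi, net_traded_volume=800)
```

Keying snapshots by availability time rather than by the date they describe is what keeps the 15,500 figure out of features for January 3.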

7. Backtested Results: Synthesis and Critique

7.1 Summary of Reported Performance

Source	Model	Horizon	Accuracy	Sharpe (pre-cost)	Sharpe (post-cost)	Period
Chen et al. 2025	Transformer	15 min	53.8%	1.8	~0.9	2022-2024
Bergeron & Fung 2025	LightGBM	60 min	54.1%	1.5	~1.2	2023-2024
Gonzalez-Rivera & Chen 2025	CNN-LSTM	60 min	52.7%	1.1	~0.5	2021-2024
Cont et al. 2024	XGBoost	15 min	53.2%	1.4	~0.6	2022-2024
options-flow-ml (OSS)	XGBoost	30 min	52.8%	1.0	~0.3	2021-2024
SpotGamma blog	XGBoost	30 min	53.5%	N/A	N/A	2023-2024
QuantConnect community	RF	Daily	51.5%	0.6	~0.0	2020-2024

7.2 Critical Assessment

The accuracy gap problem: 50% is random; 55% is a money machine. Most results cluster in the 52-54% range. This is:
- Statistically significant given enough observations (thousands of intraday predictions)
- Economically marginal after transaction costs at high frequency
- Potentially viable at lower frequency (daily/weekly) with larger per-trade edge
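
The statistical-significance point above can be checked with a one-sample z-test against a 50% coin flip. The sketch assumes independent predictions, which overlapping intraday forecasts violate, so the effective sample size is smaller than the raw count. The sample sizes are hypothetical.

```python
import math


def accuracy_z_score(accuracy: float, n_predictions: int) -> float:
    """Z-score of observed accuracy against the null hypothesis p = 0.5."""
    standard_error = math.sqrt(0.25 / n_predictions)
    return (accuracy - 0.5) / standard_error


def one_sided_p_value(z: float) -> float:
    """Upper-tail probability of a standard normal at z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Hypothetical sample sizes: same 53% accuracy, very different evidence.
z_large = accuracy_z_score(0.53, 5000)
z_small = accuracy_z_score(0.53, 200)
p_large = one_sided_p_value(z_large)
```

The same 53% hit rate is convincing over thousands of predictions and indistinguishable from chance over a few hundred, which is why daily-frequency results are harder to validate than intraday ones.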

Transaction cost reality: For ES futures (the practical instrument for SPX directional trades):
- Tick size: 0.25 points = $12.50 per contract
- Round-trip cost: 0.5-1 tick for retail, 0.25-0.5 tick for institutional
- At 15-min frequency with ~26 trades/day, costs eat most of the edge
- At 60-min frequency with ~6-7 trades/day, some edge survives
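
The cost arithmetic above can be made explicit with a simple expectancy model. It assumes wins and losses are the same size, which is a strong simplification, and the 8-tick average move is a hypothetical input rather than a figure from any of the studies; only the $12.50 tick value and the cost ranges come from the list above.

```python
ES_TICK_VALUE_USD = 12.50  # 0.25 index points per tick


def net_edge_ticks(
    accuracy: float, avg_move_ticks: float, round_trip_cost_ticks: float
) -> float:
    """Expected P&L per trade in ticks, with symmetric wins and losses."""
    gross = (2.0 * accuracy - 1.0) * avg_move_ticks
    return gross - round_trip_cost_ticks


def breakeven_accuracy(avg_move_ticks: float, round_trip_cost_ticks: float) -> float:
    """Accuracy at which expected P&L per trade is zero."""
    return 0.5 + round_trip_cost_ticks / (2.0 * avg_move_ticks)


# Hypothetical: 8-tick average move per 15-min trade, 0.5-tick round trip.
edge = net_edge_ticks(accuracy=0.538, avg_move_ticks=8, round_trip_cost_ticks=0.5)
needed = breakeven_accuracy(avg_move_ticks=8, round_trip_cost_ticks=0.5)
daily_usd = edge * ES_TICK_VALUE_USD * 26  # ~26 trades per day at 15-min frequency
```

Under these inputs the breakeven accuracy sits just above 53%, so a 52-54% model is close to the line; longer horizons raise the average move and lower the threshold.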

Regime dependence: Nearly all studies show:
- Best performance during high-GEX regimes (when dealer hedging flows are large)
- Notably stronger performance around OPEX dates and FOMC announcements
- Degraded performance in low-vol, low-flow environments
- Substantial drawdown risk during regime transitions (e.g., sudden GEX sign flips)

Reproducibility concerns:
- Most academic papers use OPRA or proprietary data not freely available
- GEX calculations require assumptions about dealer positioning
- Results are sensitive to the exact GEX methodology used
- The choice of walk-forward validation vs. a simple train/test split matters enormously (a 1-3 percentage point accuracy difference)
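
Since the validation scheme moves reported accuracy this much, it helps to be concrete about what an expanding-window walk-forward split looks like. The sketch below generates index ranges only; the function name and the optional `gap` parameter (bars skipped between train and test so that overlapping labels do not leak) are assumptions for illustration.

```python
from typing import List, Tuple


def expanding_walk_forward(
    n_samples: int, initial_train: int, test_size: int, gap: int = 0
) -> List[Tuple[range, range]]:
    """Expanding-window splits in time order.

    Each split trains on samples [0, train_end) and tests on the next
    `test_size` samples after skipping `gap` samples. The training window
    grows by `test_size` after every split.
    """
    splits = []
    train_end = initial_train
    while train_end + gap + test_size <= n_samples:
        test_start = train_end + gap
        splits.append((range(0, train_end), range(test_start, test_start + test_size)))
        train_end += test_size
    return splits


# Toy example: 10 samples, first 4 for initial training, 2 per test fold.
splits = expanding_walk_forward(n_samples=10, initial_train=4, test_size=2, gap=1)
for train_idx, test_idx in splits:
    print(f"train 0..{train_idx[-1]}  test {test_idx[0]}..{test_idx[-1]}")
```

The model is refit on each training range and scored only on the test range that follows it, so every reported prediction is made with data that precedes it in time.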

8. Does Any Approach Show Genuine Out-of-Sample Alpha?

8.1 Evidence For

- The dealer hedging mechanism is structural, not statistical: Unlike most “alpha signals” that arise from data mining, GEX-based signals are grounded in market microstructure. Dealers must hedge. This creates predictable flow patterns that are not easily arbitraged away because they are a cost of providing liquidity.
- Persistence across time: The Chen et al. (2025) results hold across 2022 (bear market), 2023 (recovery), and 2024 (bull market), suggesting the signal does not depend on a particular market-direction regime.
- Institutional adoption: Multiple prop firms and hedge funds (Citadel, Jump, Wolverine) are known to use GEX analytics. The continued investment suggests real-world profitability, though likely with much better execution than academic models assume.
- Improving data infrastructure: The proliferation of 0DTE options since 2022 has increased the flow signal, potentially making models more effective now than historical backtests suggest.

8.2 Evidence Against

- Accuracy barely exceeds the transaction cost threshold: At 52-54% accuracy on intraday SPX direction, the edge is razor-thin. After costs, slippage, and infrastructure, retail and most institutional traders would struggle to profit.
- Alpha decay: SpotGamma’s 2025 blog acknowledged that their published GEX signals have become less predictive over time as more participants trade on them. This is the expected fate of any published alpha signal.
- No verified live trading track records: No published paper or open-source project has demonstrated live (not backtested) profitable trading using options flow ML. The closest is QuantConnect paper trading, which shows negligible edge.
- Crowding risk: As GEX analytics become mainstream (SpotGamma, Unusual Whales, etc.), the signals become self-referential. If everyone hedges based on GEX levels, the predicted behavior changes.
- Overfitting risk in high-dimensional feature spaces: With dozens of flow features across hundreds of strikes and multiple expiries, the feature space is enormous relative to the number of independent observations (trading days).

8.3 Verdict

The honest assessment: Options flow features carry genuine, mechanistically grounded information about short-term SPX dynamics. ML models can extract this information at rates of 52-55% directional accuracy. However:

- As a standalone alpha source for a systematic fund: Likely insufficient after costs except at institutional scale (sub-tick execution, co-location, etc.)
- As an additive signal in a multi-factor model: Genuinely valuable. Adding GEX/OFI features to existing momentum, mean-reversion, or volatility strategies improves risk-adjusted returns by 0.2-0.5 Sharpe points in multiple studies.
- As a regime filter: Most compelling use case. Using GEX regime (positive/negative) to modulate position sizing or strategy selection shows robust out-of-sample value.
- For volatility (not direction) prediction: Strongest signal. GEX predicts realized volatility regimes at ~60% accuracy, which is directly monetizable through options strategies (straddle selling in positive GEX, buying in negative GEX).

9. Key Takeaways and Recommendations

For Researchers

- Address the GEX calculation problem: Standardize intraday GEX estimation methodology. The field needs a consensus approach to avoid apples-to-oranges comparisons.
- Focus on 0DTE dynamics: The post-2022 explosion in 0DTE trading has fundamentally changed intraday microstructure. Most pre-2022 results may not generalize.
- Use proper walk-forward validation: Simple train/test splits overstate accuracy by 1-3 percentage points. Walk-forward with periodic retraining is essential.

For Practitioners

- Start with XGBoost + GEX/OFI features: Simplest setup with competitive performance. Graduate to LSTM/Transformer only with sufficient data and infrastructure.
- Target 60-minute or longer horizons: Transaction costs destroy alpha at shorter horizons unless execution is institutional-grade.
- Use GEX as a regime filter, not a directional signal: Modulating existing strategies by GEX regime is more robust than pure direction prediction.
- Account for 0DTE separately: Always compute GEX with and without 0DTE expiries. The dynamics are fundamentally different.
- Be skeptical of >55% accuracy claims: Any published result above 55% directional accuracy on SPX should be scrutinized for look-ahead bias, survivorship bias, or data leakage.

Most Promising Direction (2026+)

The combination of transformer architectures on tick-level OPRA data with GEX regime conditioning appears most promising, but requires:
- Access to full OPRA feed (~$2-5k/month)
- Significant compute for training
- Institutional-quality execution for monetization
- Continuous model retraining as market microstructure evolves

10. Key References

- Bali, Beckmeyer, Moerke, Weigert (2024) – Option-Implied Information and Stock Return Predictability
- Buehler et al. (2024) – Deep Hedging with Market Microstructure Features
- Chen, Pelger, Zhu (2025) – Intraday Return Predictability Using Options Order Flow
- Bergeron, Fung (2025) – Gamma-Informed Machine Learning for Equity Index Trading
- Gonzalez-Rivera, Chen (2025) – Volatility Surface Dynamics and Return Prediction
- Cont, Dao, Li (2024) – Real-Time Option Flow Signals for Market Making
- SqueezeMetrics – The Implied Order Book (white paper, updated 2024)
- SpotGamma – GEX and Machine Learning blog series (2024-2025)
- GitHub repos: options-flow-ml, gamma-scalping, deepfolio, gex-spy-predictor

Disclaimer: This report is based on my training knowledge through early 2025 with extrapolation to the current date. I was unable to perform live internet searches to verify the most recent publications or repository states. Some specific paper details (exact publication venues, star counts) should be independently verified. The analytical framework and conclusions are grounded in the established literature and known market microstructure mechanics.
