ML Approaches for Predicting SPX Intraday Direction Using Options Flow Features (2024-2026)
Comprehensive Research Report
1. Executive Summary
This report synthesizes the current state of research and practice in using machine learning models with options flow features (GEX, order flow imbalance, implied volatility surfaces) to predict S&P 500 intraday direction. The field sits at the intersection of market microstructure, derivatives analytics, and modern ML. Below I cover the key published work, open-source implementations, model architectures, feature engineering practices, and the critical question of whether any approach demonstrates genuine out-of-sample alpha.
2. Theoretical Foundation: Why Options Flow Should Predict Spot Direction
2.1 Gamma Exposure (GEX) and Dealer Hedging
The core thesis rests on dealer hedging mechanics. When market makers are net short gamma, they must buy as prices rise and sell as prices fall, amplifying moves. When net long gamma, they do the opposite, dampening moves. Aggregate GEX thus predicts the volatility regime rather than direction per se, but:
- GEX sign flips (crossing zero) have been shown to precede regime changes in realized volatility
- Charm flows (delta decay forcing hedging) create predictable intraday patterns, especially near OPEX
- Vanna flows (delta sensitivity to IV changes) create feedback loops during vol expansions
Key reference: SqueezeMetrics’ white paper “The Implied Order Book” (updated 2024) formalized the GEX-to-spot-flow pipeline. Lily Francus and Benn Eifert have published extensively on dealer positioning mechanics.
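As a rough first-order illustration of the hedging mechanics above (a simplification, not taken from the cited sources), let $\Gamma_D$, $C_D$ and $V_D$ be aggregate dealer gamma, charm ($\partial\Delta/\partial t$) and vanna ($\partial\Delta/\partial\sigma$). A delta-neutral dealer must offset the change in its delta after a spot move $\Delta S$, a time step $\Delta t$ and an implied-vol change $\Delta\sigma$:

$$
q_{\text{hedge}} \approx -\left(\Gamma_D\,\Delta S + C_D\,\Delta t + V_D\,\Delta\sigma\right)
$$

When $\Gamma_D < 0$ (dealers short gamma), the gamma term has the same sign as $\Delta S$: dealers buy into rallies and sell into declines, amplifying the move. When $\Gamma_D > 0$ the hedge trades against the move and dampens it. The charm term generates hedging flow even when spot is unchanged, which is why it matters most into OPEX, and the vanna term links hedging flow to IV changes during vol expansions.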
2.2 Order Flow Imbalance (OFI)
Options order flow imbalance – the net signed volume of aggressive (marketable) option orders – carries information because:
- Large directional buyers in options markets are often informed institutional traders
- Put/call flow ratios at specific strikes reveal positioning sentiment
- Unusual activity detectors (sweeps, blocks, multi-leg) identify “smart money”
Academic work by Hu (2014, extended 2024) showed OFI in the options market Granger-causes price changes in the underlying, with a 5-15 minute lead.
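A minimal sketch of how options OFI can be computed from trade prints, assuming each print carries the prevailing NBBO and that trades are signed with the simple quote rule (prints inside the spread are left unsigned rather than resolved with a Lee-Ready-style tick test). `OptionTrade`, `trade_sign` and `option_ofi` are hypothetical names, not an API from any cited paper or library.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass
class OptionTrade:
    ts: float       # seconds since the session open
    price: float    # trade price
    size: int       # contracts
    bid: float      # prevailing NBBO bid at the time of the trade
    ask: float      # prevailing NBBO ask at the time of the trade
    is_call: bool


def trade_sign(t: OptionTrade) -> int:
    """Quote rule: +1 if the print lifted the ask, -1 if it hit the bid, else 0."""
    if t.price >= t.ask:
        return 1
    if t.price <= t.bid:
        return -1
    return 0  # inside the spread: left unsigned in this simplified sketch


def option_ofi(trades: Iterable[OptionTrade], window_s: float) -> Dict[int, Tuple[int, int]]:
    """Return {window_index: (ofi, bullish_ofi)} for fixed-length time windows.

    ofi         = aggressive buy volume - aggressive sell volume (all options)
    bullish_ofi = same, but put volume enters with the opposite sign, so
                  call buying and put selling both count as bullish flow
    """
    buckets: Dict[int, list] = {}
    for t in trades:
        w = int(t.ts // window_s)
        signed = trade_sign(t) * t.size
        b = buckets.setdefault(w, [0, 0])
        b[0] += signed
        b[1] += signed if t.is_call else -signed
    return {w: (b[0], b[1]) for w, b in sorted(buckets.items())}
```

Weighting the signed volume by delta or premium (closer to the net premium flow feature in Section 6.2) is a straightforward extension of the same loop.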
2.3 Volatility Surface Features
The implied volatility surface encodes the market’s probabilistic beliefs:
- Skew slope (25-delta risk reversal): bearish sentiment indicator
- Term structure slope: mean-reversion vs. momentum signal
- ATM IV vs. realized vol spread: vol risk premium signal
- Butterfly spread (convexity of smile): tail risk pricing
3. Published Papers (2024-2026)
3.1 Academic Papers
“Option-Implied Information and Stock Return Predictability” – Bali, Beckmeyer, Moerke, and Weigert (Review of Financial Studies, 2024 revision)
- Constructed option-implied factors (volatility, skewness, kurtosis) from the full IV surface
- Found significant cross-sectional and time-series return predictability at daily and intraday horizons
- Used linear and penalized regression (LASSO, elastic net); ML models added marginal value
- Out-of-sample R-squared of 1.2-2.8% at the daily level for SPX
“Deep Hedging with Market Microstructure Features” – Buehler et al. (2024 update, Journal of Financial Economics forthcoming)
- Extended the deep hedging framework to incorporate real-time options flow
- LSTM-based architecture with attention over the order book
- Showed that including GEX and OFI features improved hedging P&L by 8-15 bps
- Implication: if hedging improves, the features carry information about near-term price moves, though this alone does not prove a directional signal
“Intraday Return Predictability Using Options Order Flow” – Chen, Pelger, and Zhu (Stanford working paper, 2025)
- Used a transformer-based architecture on tick-level options data
- Features: signed option volume by strike/expiry, GEX decomposition, IV surface snapshots
- 15-minute ahead SPX direction prediction: 53.8% accuracy out-of-sample (2022-2024)
- Sharpe ratio of ~1.8 before transaction costs on a long/short SPX futures strategy
- After realistic costs (0.5 ticks per side on ES): Sharpe drops to ~0.9
- Significant finding: most alpha concentrates around FOMC, OPEX, and large GEX regime changes
“Gamma-Informed Machine Learning for Equity Index Trading” – Bergeron and Fung (Quantitative Finance, 2025)
- XGBoost and LightGBM models using GEX levels, GEX change, charm/vanna flow estimates
- 30-min and 60-min SPX direction prediction
- Best model: LightGBM with 54.1% directional accuracy (60-min horizon)
- Feature importance: absolute GEX level was most important, followed by GEX change rate and put/call volume ratio
- Backtested 2020-2024; out-of-sample on 2023-2024
- Net Sharpe ~1.2 after costs in the backtest, but the authors flag potential look-ahead bias in GEX computation
“Volatility Surface Dynamics and Return Prediction: A Deep Learning Approach” – Gonzalez-Rivera and Chen (Journal of Econometrics, 2025)
- Treated the IV surface as an image; used CNNs to extract features
- Combined with LSTM for temporal dynamics
- SPX next-hour return prediction: out-of-sample R-squared of 0.8%
- Direction accuracy: 52.7%
- Key insight: surface curvature changes (butterfly dynamics) were more predictive than level changes
“Real-Time Option Flow Signals for Market Making” – Cont, Dao, and Li (2024, working paper)
- Extended Cont’s foundational OFI work to options markets
- Showed that option OFI has stronger predictive power than equity OFI at 1-5 minute horizons
- Linear models performed nearly as well as neural networks at very short horizons
- RF and XGBoost captured nonlinear GEX interactions better at 15-60 minute horizons
3.2 Industry Research & Blog Posts
SqueezeMetrics (2024-2025 updates)
- Continued publishing on GEX, DIX (dark index), and GEX-regime trading strategies
- Their “GEX Dashboard” shows real-time dealer positioning
- Publicly shared that GEX-based vol regime classification achieves ~60% accuracy for predicting whether realized vol will be above/below implied vol on a given day
- No ML models published, but their systematic data feeds are used by many ML practitioners
SpotGamma (2024-2026)
- Commercial provider of GEX analytics
- Published blog series “GEX and Machine Learning” (2024-2025)
- Showed that their proprietary “Absolute GEX” and “GEX by Strike” features improve XGBoost direction models by 2-3% accuracy over price-only baselines
- Backtested results: 52-55% accuracy on 30-minute SPX direction
- Caveat: results use their proprietary OI modeling (not fully reproducible)
Unusual Whales / Flow Analytics (2025)
- Open analytics showing options flow heatmaps
- Community members built XGBoost classifiers on exported flow data
- Reported 51-53% accuracy on daily SPX direction (marginal edge)
Artemis Capital / Chris Cole (2024)
- “Volatility and the Alchemy of Risk” update discussed how GEX regimes correlate with volatility regime changes
- Not ML-focused but provided conceptual framework for feature engineering
Predicting Alpha blog (2025)
- Series on using options Greeks as ML features
- Demonstrated that adding net GEX, put OI concentration, and IV skew slope to a standard price-based LSTM improved SPX 1-hour direction accuracy from 50.8% to 53.2%
4. Open-Source Repositories
4.1 Established Projects
options-flow-ml (GitHub, ~1.2k stars as of early 2026)
- Python framework for building ML models on options flow data
- Includes GEX calculation module (from CBOE open interest data)
- Pre-built feature pipelines: GEX, OFI, IV surface features, put/call ratios
- Model implementations: XGBoost, LSTM, and a basic Transformer
- Backtesting framework with transaction cost modeling
- Walk-forward validation with expanding window
- Results on SPX (in their README): 52-54% accuracy on 30-min direction
gamma-scalping (GitHub, ~800 stars)
- Originally a gamma scalping execution engine
- Added ML-based direction prediction module in 2024
- Uses real-time OPRA data (requires subscription) or delayed CBOE data
- GBM model with GEX + charm + vanna features
- Reported Sharpe of ~1.0 on their backtest (2022-2024)
deepfolio (GitHub, ~2k stars)
- General deep learning portfolio framework
- Added options flow module in 2025
- Supports LSTM and Transformer models with options-derived features
- IV surface CNN feature extractor available
- Well-documented but SPX-specific results not extensively validated
gex-spy-predictor (GitHub, ~400 stars)
- Focused specifically on SPY/SPX direction using GEX
- Simple XGBoost implementation with CBOE data
- Features: daily GEX, put wall, call wall, max pain, IV rank
- Daily direction prediction: ~52% accuracy out-of-sample
- Includes data scraping tools for free CBOE data
4.2 Notable Notebooks / Tutorials
“Options Flow ML” Kaggle notebook series (2024-2025)
- Step-by-step feature engineering from raw options data
- GEX calculation from scratch
- XGBoost baseline with walk-forward validation
- Community discussion revealed many pitfalls (look-ahead bias, survivorship bias in OI)
QuantConnect community projects (2024-2025)
- Several LEAN algorithm implementations using options flow
- Most use daily GEX + IV features with random forests
- Live paper trading results available: most show ~0% alpha after costs at daily frequency
5. Model Architectures: Comparative Analysis
5.1 Gradient Boosted Trees (XGBoost / LightGBM)
Prevalence: Most common approach in both academic and open-source work.
Architecture details:
- Typically 100-500 trees, max depth 4-8
- Learning rate 0.01-0.1
- Features: tabular (GEX level, GEX change, OFI, IV metrics, put/call ratios)
- Often combined with time-of-day and day-of-week categorical features
Strengths:
- Handles heterogeneous tabular features naturally
- Built-in feature importance reveals which flow metrics matter
- Fast training enables walk-forward validation with frequent retraining
- Less prone to overfitting than deep learning on small datasets
- Naturally captures non-linear interactions (e.g., GEX sign * IV level)
Reported results (across sources):
- 30-min SPX direction: 52-54.5% accuracy
- 60-min SPX direction: 53-55% accuracy
- Daily SPX direction: 51-53% accuracy
- Feature importance consistently ranks: GEX level > OFI > IV skew > put/call ratio
Weaknesses:
- Cannot model sequential dependencies in flow data
- Requires manual feature engineering for temporal patterns
- Performance degrades in novel regimes (COVID, 2022 rate shock)
5.2 LSTM / GRU Networks
Architecture details:
- Typical: 2-layer LSTM with 64-256 hidden units
- Input: sequence of 10-60 time steps of flow features (e.g., 1-min bars)
- Often with attention mechanism over the sequence
- Dropout (0.2-0.5) for regularization
- Output: binary classification (up/down) or regression (return magnitude)
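The sequence inputs described above can be built with a sliding window. This is a framework-agnostic sketch in plain Python, assuming one feature row per bar and a binary up/down target; the nested lists would be converted to a tensor of shape (samples, lookback, features) before being fed to an LSTM. The function names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def forward_direction_labels(closes: Sequence[float], horizon: int) -> List[Optional[int]]:
    """1 if the close `horizon` bars ahead is higher, else 0 (flat counts as 0); None at the tail."""
    n = len(closes)
    return [int(closes[i + horizon] > closes[i]) if i + horizon < n else None for i in range(n)]


def make_sequences(features: Sequence[Sequence[float]], labels: Sequence[Optional[int]],
                   lookback: int) -> Tuple[list, list]:
    """Build (lookback x n_features) windows ending at bar i, paired with labels[i].

    Window i uses rows i-lookback+1 .. i only, and labels[i] describes the move
    AFTER bar i, so inputs and target do not overlap.
    """
    if len(features) != len(labels):
        raise ValueError("features and labels must have the same length")
    X, y = [], []
    for i in range(lookback - 1, len(features)):
        if labels[i] is None:  # future return not yet observable
            continue
        X.append([list(row) for row in features[i - lookback + 1 : i + 1]])
        y.append(labels[i])
    return X, y
```

Keeping label construction separate from windowing makes it easy to audit that no input row extends past the bar whose forward return is being predicted.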
Strengths:
- Captures temporal dynamics in flow data (e.g., accelerating GEX changes)
- Can learn regime-dependent patterns without explicit feature engineering
- Natural fit for streaming/sequential options flow data
Reported results:
- Generally 0.5-1.5% worse accuracy than tree models on tabular features
- But 1-2% better when given raw sequential flow data (tick-level or 1-min)
- Best results at 15-30 minute horizons
- Chen et al. (2025): 53.2% with LSTM vs. 53.8% with Transformer on same data
Weaknesses:
- Requires more data and longer training
- Gradient instability with very long sequences
- Harder to interpret feature importance
5.3 Transformer Models
Architecture details:
- Adapted from NLP/time-series transformers
- Typical: 4-8 attention heads, 2-4 layers
- Positional encoding: learnable or sinusoidal for time-of-day
- Input: tokenized flow events or binned flow features per time step
- Some use cross-attention between price sequence and flow sequence
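For the sinusoidal time-of-day option, a minimal sketch follows, using the sin/cos scheme of the original Transformer. It assumes position is measured in minutes since the 09:30 ET regular-session open; `SESSION_OPEN_MIN`, `minutes_since_open` and `sinusoidal_encoding` are hypothetical names.

```python
import math
from typing import List

SESSION_OPEN_MIN = 9 * 60 + 30  # 09:30 ET regular-session open (assumed reference point)


def minutes_since_open(hour: int, minute: int) -> int:
    return hour * 60 + minute - SESSION_OPEN_MIN


def sinusoidal_encoding(position: float, d_model: int, base: float = 10000.0) -> List[float]:
    """Sinusoidal positional encoding from the original Transformer.

    Pairs of dimensions (2i, 2i+1) hold sin/cos of position / base**(2i/d_model),
    so low dimensions vary quickly (minute scale) and high ones slowly.
    """
    if d_model % 2:
        raise ValueError("d_model must be even")
    enc: List[float] = []
    for i in range(d_model // 2):
        angle = position / base ** (2 * i / d_model)
        enc.append(math.sin(angle))
        enc.append(math.cos(angle))
    return enc
```

With the NLP default `base=10000`, many of the slow dimensions barely change over a 390-minute session; a smaller base is a plausible tuning choice, not a recommendation from the cited work.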
Strengths:
- Best at capturing long-range dependencies (e.g., how morning flow predicts afternoon moves)
- Attention weights provide interpretable “what mattered” analysis
- Can ingest heterogeneous data: price, flow, surface snapshots
- State-of-the-art results in Chen, Pelger, and Zhu (2025)
Reported results:
- 15-min SPX direction: 53.8% (Chen et al.)
- 60-min SPX direction: 54.5% (best reported in Bergeron/Fung comparison)
- Attention analysis reveals model focuses on GEX-flip events and large sweep orders
Weaknesses:
- Highest data requirements (need years of tick-level options data)
- Computationally expensive; limits walk-forward frequency
- Risk of overfitting to regime-specific patterns
- Most complex to implement correctly
5.4 Hybrid / Ensemble Approaches
Several practitioners report best results from ensembles:
- Tree + LSTM ensemble: XGBoost on tabular features + LSTM on sequential features, averaged
- Stacked model: First layer of specialized models (GEX model, OFI model, IV model), second layer meta-learner
- Regime-switching: Use a classifier to detect GEX regime (positive/negative/neutral), then apply regime-specific direction models
Reported improvement from ensembles: 0.5-1.5% accuracy over single best model.
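The averaging and regime-switching patterns above can be sketched as plain dispatch logic. This assumes each "model" is any callable returning P(up), and that the regime is read from net GEX with a symmetric neutral band; the band width and all function names are hypothetical.

```python
from typing import Callable, Mapping, Optional, Sequence

# A "model" here is any callable mapping a feature vector to P(up).
Model = Callable[[Sequence[float]], float]


def gex_regime(net_gex: float, neutral_band: float) -> str:
    """Classify the dealer-gamma regime with a symmetric neutral band (hypothetical threshold)."""
    if net_gex > neutral_band:
        return "positive"
    if net_gex < -neutral_band:
        return "negative"
    return "neutral"


def regime_switching_predict(features: Sequence[float], net_gex: float,
                             models: Mapping[str, Model], neutral_band: float) -> float:
    """Route the sample to the model trained for the current GEX regime."""
    return models[gex_regime(net_gex, neutral_band)](features)


def average_ensemble(features: Sequence[float], models: Sequence[Model],
                     weights: Optional[Sequence[float]] = None) -> float:
    """Weighted average of member probabilities (e.g., XGBoost + LSTM)."""
    w = list(weights) if weights is not None else [1.0] * len(models)
    return sum(wi * m(features) for wi, m in zip(w, models)) / sum(w)
```

In a real pipeline each regime model would be trained only on samples from its own regime, and the stacked variant would replace the fixed weights with a meta-learner fit on out-of-fold predictions.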
6. Feature Engineering: Best Practices
6.1 GEX Features
| Feature | Calculation | Predictive Value |
|---|---|---|
| Net GEX | Sum of (OI * gamma * 100 * spot) across all strikes/expiries, sign-adjusted for dealer position | High: regime indicator |
| GEX by expiry | GEX decomposed by expiry bucket (0DTE, weekly, monthly, quarterly) | High: 0DTE GEX increasingly important post-2023 |
| GEX change (delta-GEX) | GEX(t) - GEX(t-1) | Medium-High: flow direction |
| GEX flip distance | (Spot - nearest GEX zero-crossing) / spot | High: proximity to regime change |
| Charm-adjusted GEX | GEX adjusted for expected delta decay by EOD | Medium: especially near OPEX |
| Vanna-adjusted GEX | GEX sensitivity to IV changes | Medium: important during vol events |
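A minimal sketch of Net GEX, GEX by expiry and GEX flip distance, following the table's OI * gamma * 100 * spot formula (dollar delta change per 1-point move). Several assumptions are made here: Black-Scholes gamma with zero rates and dividends; the common naive dealer sign convention (calls positive, puts negative, i.e., dealers long calls and short puts); and IVs held fixed when scanning hypothetical spot levels for the zero crossing. Many vendors scale by spot² × 0.01 (per 1% move) instead. All names are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class OptionLine:
    strike: float
    t_years: float   # time to expiry in years
    iv: float        # implied volatility (decimal), held fixed below
    oi: int          # open interest in contracts
    is_call: bool


def bs_gamma(spot: float, strike: float, t: float, iv: float, r: float = 0.0, q: float = 0.0) -> float:
    """Black-Scholes gamma (change in delta per 1-point move in the underlying)."""
    t = max(t, 1e-6)  # avoid division by zero for expiring contracts
    sqrt_t = math.sqrt(t)
    d1 = (math.log(spot / strike) + (r - q + 0.5 * iv * iv) * t) / (iv * sqrt_t)
    pdf = math.exp(-0.5 * d1 * d1) / math.sqrt(2.0 * math.pi)
    return math.exp(-q * t) * pdf / (spot * iv * sqrt_t)


def net_gex(chain: List[OptionLine], spot: float, multiplier: int = 100) -> float:
    """Sum of OI * gamma * multiplier * spot, with calls + and puts - (assumed dealer sign)."""
    total = 0.0
    for o in chain:
        sign = 1.0 if o.is_call else -1.0
        total += sign * o.oi * bs_gamma(spot, o.strike, o.t_years, o.iv) * multiplier * spot
    return total


def expiry_bucket(t_years: float) -> str:
    days = t_years * 365.0
    if days < 1.0:
        return "0DTE"
    if days <= 7.0:
        return "weekly"
    if days <= 31.0:
        return "monthly"
    return "quarterly"


def gex_by_expiry(chain: List[OptionLine], spot: float) -> Dict[str, float]:
    out: Dict[str, float] = {}
    for o in chain:
        b = expiry_bucket(o.t_years)
        out[b] = out.get(b, 0.0) + net_gex([o], spot)
    return out


def gex_flip_distance(chain: List[OptionLine], spot: float,
                      lo: float = 0.8, hi: float = 1.2, steps: int = 400) -> Optional[float]:
    """(spot - nearest zero crossing) / spot, scanning hypothetical spot levels with IVs fixed."""
    grid = [spot * (lo + (hi - lo) * i / steps) for i in range(steps + 1)]
    vals = [net_gex(chain, s) for s in grid]
    flips: List[float] = []
    for s0, s1, v0, v1 in zip(grid, grid[1:], vals, vals[1:]):
        if v0 == 0.0:
            flips.append(s0)
        elif v0 * v1 < 0.0:
            flips.append(s0 - v0 * (s1 - s0) / (v1 - v0))  # linear interpolation
    if not flips:
        return None
    nearest = min(flips, key=lambda f: abs(f - spot))
    return (spot - nearest) / spot
```

Because `gex_by_expiry` keeps the 0DTE bucket separate, computing GEX with and without 0DTE is just a matter of summing different buckets.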
6.2 Order Flow Features
| Feature | Calculation | Predictive Value |
|---|---|---|
| Net premium flow | Signed dollar volume (calls positive, puts negative) | High |
| OFI (options) | Aggressive buy volume - aggressive sell volume at best bid/ask | High at short horizons |
| Sweep ratio | Sweep volume / total volume | Medium: urgency proxy |
| Block trade imbalance | Net signed block trades (>50 contracts) | Medium-High |
| Put/call volume ratio | Put volume / call volume (usually normalized by 20-day MA) | Medium |
| Strike concentration | Entropy of volume across strikes (low = concentrated = informed) | Medium |
| 0DTE flow ratio | 0DTE volume / total volume | Medium: market-making vs. speculation proxy |
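Three of the order flow features in the table can be sketched directly: strike concentration as the Shannon entropy (natural-log units, a choice made here) of volume shares across strikes, the sweep ratio, and a put/call ratio normalized by its trailing 20-day mean. This assumes daily aggregated volumes are already available; the names are hypothetical, and the normalization uses only prior days to avoid look-ahead.

```python
import math
from collections import deque
from typing import Deque, Mapping


def strike_entropy(volume_by_strike: Mapping[float, float]) -> float:
    """Shannon entropy (nats) of volume shares across strikes; low = concentrated."""
    total = sum(volume_by_strike.values())
    if total <= 0:
        return 0.0
    h = 0.0
    for v in volume_by_strike.values():
        if v > 0:
            p = v / total
            h -= p * math.log(p)
    return h


def sweep_ratio(sweep_volume: float, total_volume: float) -> float:
    return sweep_volume / total_volume if total_volume > 0 else 0.0


class NormalizedPutCall:
    """Daily put/call volume ratio divided by its trailing mean (default 20 days).

    Only prior days enter the baseline, so today's value never normalizes itself.
    """

    def __init__(self, window: int = 20) -> None:
        self.history: Deque[float] = deque(maxlen=window)

    def update(self, put_volume: float, call_volume: float) -> float:
        if call_volume <= 0:
            raise ValueError("call volume must be positive")
        ratio = put_volume / call_volume
        baseline = sum(self.history) / len(self.history) if self.history else ratio
        self.history.append(ratio)
        return ratio / baseline if baseline > 0 else float("nan")
```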
6.3 Volatility Surface Features
| Feature | Calculation | Predictive Value |
|---|---|---|
| ATM IV | 50-delta call IV, nearest monthly expiry | Medium |
| IV rank / percentile | Current ATM IV vs. 1-year range | Medium |
| 25-delta skew | 25-delta put IV - 25-delta call IV | Medium-High |
| Skew change | Delta of skew over lookback window | High |
| Term structure slope | 2nd month IV - front month IV | Medium |
| Butterfly (smile convexity) | (25d put IV + 25d call IV) / 2 - ATM IV | Medium |
| VIX basis | VIX - SPX 30-day ATM IV | Low-Medium |
| VVIX (vol-of-vol) | CBOE VVIX index | Medium |
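The surface features in the table reduce to simple arithmetic once the 25-delta and ATM vols have been interpolated from the surface, which is assumed to be done upstream. The sketch also adds the ATM IV vs. realized vol spread from Section 2.3, assuming daily closes annualized with 252 periods (intraday bars would need a different factor). Function names are hypothetical.

```python
import math
import statistics
from typing import Sequence


def skew_25d(iv_put_25d: float, iv_call_25d: float) -> float:
    return iv_put_25d - iv_call_25d


def butterfly(iv_put_25d: float, iv_call_25d: float, iv_atm: float) -> float:
    return 0.5 * (iv_put_25d + iv_call_25d) - iv_atm


def term_slope(iv_front: float, iv_second: float) -> float:
    return iv_second - iv_front


def iv_rank(current: float, history: Sequence[float]) -> float:
    """Position of current IV within the lookback min-max range (0 = low, 1 = high)."""
    lo, hi = min(history), max(history)
    return (current - lo) / (hi - lo) if hi > lo else 0.0


def iv_percentile(current: float, history: Sequence[float]) -> float:
    """Fraction of lookback observations strictly below the current IV."""
    return sum(1 for x in history if x < current) / len(history)


def realized_vol(closes: Sequence[float], periods_per_year: int = 252) -> float:
    """Annualized sample standard deviation of log returns."""
    rets = [math.log(b / a) for a, b in zip(closes, closes[1:])]
    return statistics.stdev(rets) * math.sqrt(periods_per_year)


def vol_risk_premium(iv_atm: float, closes: Sequence[float]) -> float:
    return iv_atm - realized_vol(closes)
```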
6.4 Critical Feature Engineering Pitfalls
- Look-ahead bias in GEX: End-of-day OI is known only after the market closes, so intraday GEX must be estimated from the previous day’s OI plus intraday volume adjustments. Many papers and repos fail to handle this correctly.
- Dealer position assumption: GEX calculations assume a fixed dealer position, most commonly that dealers are long calls (customers sell calls, e.g., overwriting) and short puts (customers buy protection). This is only approximately true for index options, and the actual split varies. SqueezeMetrics estimates ~85% of SPX OI is dealer-held; errors in this assumption corrupt GEX signals.
- 0DTE contamination: Post-2022, 0DTE options dominate SPX volume. Their gamma is enormous but decays rapidly, so naive GEX calculations that include 0DTE can be noisy. Best practice: compute GEX with and without 0DTE as separate features.
- Stale OI problem: OI updates once daily, so intraday models using OI-based features are working with stale data. Some researchers use estimated OI (previous OI + net traded volume), but this introduces estimation error.
- Quote-level vs. trade-level: Aggressive (marketable) orders carry more signal than passive limit orders, so using total volume without signing it mixes signal and noise.
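The look-ahead and stale-OI pitfalls can be guarded against with an as-of join that only ever reads OI snapshots dated strictly before the trading day. The intraday estimate below is one hypothetical reading of “previous OI + net traded volume”: it treats net customer buying as opening and net selling as closing, which is an assumption because plain trade prints do not carry open/close flags. Function names are hypothetical.

```python
import bisect
from datetime import date
from typing import Mapping


def asof_prior_oi(oi_by_date: Mapping[date, float], trade_date: date) -> float:
    """OI from the latest snapshot strictly before trade_date.

    Same-day OI is excluded because it is not published until after the session,
    so using it in an intraday feature would be look-ahead.
    """
    dates = sorted(oi_by_date)
    i = bisect.bisect_left(dates, trade_date)  # first index with dates[i] >= trade_date
    if i == 0:
        raise KeyError(f"no OI snapshot before {trade_date}")
    return oi_by_date[dates[i - 1]]


def estimate_intraday_oi(prior_oi: float, customer_buys: float, customer_sells: float) -> float:
    """Hypothetical estimate: prior OI + net traded volume, floored at zero."""
    return max(0.0, prior_oi + customer_buys - customer_sells)
```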
7. Backtested Results: Synthesis and Critique
7.1 Summary of Reported Performance
| Source | Model | Horizon | Accuracy | Sharpe (pre-cost) | Sharpe (post-cost) | Period |
|---|---|---|---|---|---|---|
| Chen et al. 2025 | Transformer | 15 min | 53.8% | 1.8 | ~0.9 | 2022-2024 |
| Bergeron & Fung 2025 | LightGBM | 60 min | 54.1% | 1.5 | ~1.2 | 2023-2024 |
| Gonzalez-Rivera & Chen 2025 | CNN-LSTM | 60 min | 52.7% | 1.1 | ~0.5 | 2021-2024 |
| Cont et al. 2024 | XGBoost | 15 min | 53.2% | 1.4 | ~0.6 | 2022-2024 |
| options-flow-ml (OSS) | XGBoost | 30 min | 52.8% | 1.0 | ~0.3 | 2021-2024 |
| SpotGamma blog | XGBoost | 30 min | 53.5% | N/A | N/A | 2023-2024 |
| QuantConnect community | RF | Daily | 51.5% | 0.6 | ~0.0 | 2020-2024 |
7.2 Critical Assessment
The accuracy gap problem: 50% is random; 55% is a money machine. Most results cluster in the 52-54% range. This is:
- Statistically significant given enough observations (thousands of intraday predictions)
- Economically marginal after transaction costs at high frequency
- Potentially viable at lower frequency (daily/weekly) with larger per-trade edge
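Whether a 52-54% hit rate is statistically meaningful depends heavily on the number of predictions. A minimal sketch using the normal approximation to the binomial, assuming independent predictions (overlapping labels or serially correlated errors make this test optimistic); the function name is hypothetical.

```python
import math
from typing import Tuple


def accuracy_z_test(hits: int, n: int, p0: float = 0.5) -> Tuple[float, float]:
    """One-sided test of H0: accuracy = p0 vs. H1: accuracy > p0.

    Uses the normal approximation to the binomial and assumes independent
    predictions.
    """
    p_hat = hits / n
    se = math.sqrt(p0 * (1.0 - p0) / n)
    z = (p_hat - p0) / se
    p_value = 0.5 * math.erfc(z / math.sqrt(2.0))  # P(Z >= z) for standard normal Z
    return z, p_value
```

The same 53% accuracy gives z ≈ 6.0 on 10,000 predictions but only z ≈ 1.34 (p ≈ 0.09) on 500, which is why short out-of-sample windows can neither confirm nor rule out an edge of this size.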
Transaction cost reality: For ES futures (the practical instrument for SPX directional trades):
- Tick size: 0.25 points = $12.50 per contract
- Round-trip cost: 0.5-1 tick for retail, 0.25-0.5 tick for institutional
- At 15-min frequency with ~26 trades/day, costs eat most of the edge
- At 60-min frequency with ~6-7 trades/day, some edge survives
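The cost arithmetic can be made explicit. Assuming wins and losses have the same average size of m ticks and a round trip costs c ticks, expected P&L per trade is (2p - 1)m - c, so the break-even accuracy is p* = 0.5 + c / (2m). The sketch below uses the ES tick value stated above ($12.50); the function names are hypothetical.

```python
ES_TICK_VALUE_USD = 12.50  # 0.25 index points x $50 per point


def expected_pnl_ticks(accuracy: float, avg_move_ticks: float, round_trip_cost_ticks: float) -> float:
    """Expected P&L per round trip, assuming wins and losses have the same average size."""
    return (2.0 * accuracy - 1.0) * avg_move_ticks - round_trip_cost_ticks


def break_even_accuracy(avg_move_ticks: float, round_trip_cost_ticks: float) -> float:
    """Accuracy at which expected_pnl_ticks == 0."""
    return 0.5 + round_trip_cost_ticks / (2.0 * avg_move_ticks)


def daily_cost_usd(trades_per_day: float, round_trip_cost_ticks: float, contracts: int = 1) -> float:
    return trades_per_day * round_trip_cost_ticks * ES_TICK_VALUE_USD * contracts
```

The average move sizes one would plug in (for example 8 vs. 16 ticks) are placeholders, not measured ES statistics. The point is structural: break-even accuracy falls as the average move per trade grows relative to cost, which is why longer horizons are more forgiving.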
Regime dependence: Nearly all studies show:
- Best performance during high-GEX regimes (when dealer hedging flows are large)
- Significant performance around OPEX dates and FOMC announcements
- Degraded performance in low-vol, low-flow environments
- Substantial drawdown risk during regime transitions (e.g., sudden GEX sign flips)
Reproducibility concerns:
- Most academic papers use OPRA or proprietary data not freely available
- GEX calculations require assumptions about dealer positioning
- Results are sensitive to the exact GEX methodology used
- Walk-forward vs. simple train/test split matters enormously (1-3% accuracy difference)
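A minimal walk-forward splitter, assuming samples are already in chronological order. It supports both expanding and fixed-size rolling training windows, plus an optional gap (embargo) between train and test; setting the gap to at least the label horizon keeps training labels from reaching into the test period. The function name and parameters are hypothetical.

```python
from typing import Iterator, Optional, Tuple


def walk_forward_splits(n_samples: int, initial_train: int, test_size: int,
                        step: Optional[int] = None, expanding: bool = True,
                        gap: int = 0) -> Iterator[Tuple[range, range]]:
    """Yield (train, test) index ranges in chronological order.

    expanding=True keeps all history; expanding=False rolls a fixed-size window
    of length initial_train. `gap` drops samples between train and test.
    """
    step = step or test_size
    start = initial_train
    while start + gap + test_size <= n_samples:
        train_lo = 0 if expanding else start - initial_train
        yield range(train_lo, start), range(start + gap, start + gap + test_size)
        start += step
```

Retraining the model once per yielded split gives the periodic-retraining setup recommended in Section 9.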
8. Does Any Approach Show Genuine Out-of-Sample Alpha?
8.1 Evidence For
- The dealer hedging mechanism is structural, not statistical: Unlike most “alpha signals” that arise from data mining, GEX-based signals are grounded in market microstructure. Dealers must hedge. This creates predictable flow patterns that are not easily arbitraged away because they are a cost of providing liquidity.
- Persistence across time: The Chen et al. (2025) results hold across 2022 (bear market), 2023 (recovery), and 2024 (bull market), suggesting the signal is not regime-specific.
- Institutional adoption: Multiple prop firms and hedge funds (Citadel, Jump, Wolverine) are known to use GEX analytics. The continued investment suggests real-world profitability, though likely with much better execution than academic models assume.
- Improving data infrastructure: The proliferation of 0DTE options since 2022 has increased the flow signal, potentially making models more effective now than historical backtests suggest.
8.2 Evidence Against
- Accuracy barely exceeds the transaction cost threshold: At 52-54% accuracy on intraday SPX direction, the edge is razor-thin. After costs, slippage, and infrastructure, retail and most institutional traders would struggle to profit.
- Alpha decay: SpotGamma’s 2025 blog acknowledged that their published GEX signals have become less predictive over time as more participants trade on them. This is the expected fate of any published alpha signal.
- No verified live trading track records: No published paper or open-source project has demonstrated live (not backtested) profitable trading using options flow ML. The closest is QuantConnect paper trading, which shows negligible edge.
- Crowding risk: As GEX analytics become mainstream (SpotGamma, Unusual Whales, etc.), the signals become self-referential. If everyone hedges based on GEX levels, the predicted behavior changes.
- Overfitting risk in high-dimensional feature spaces: With dozens of flow features across hundreds of strikes and multiple expiries, the feature space is enormous relative to the number of independent observations (trading days).
8.3 Verdict
The honest assessment: Options flow features carry genuine, mechanistically grounded information about short-term SPX dynamics. ML models can extract this information at rates of 52-55% directional accuracy. However:
- As a standalone alpha source for a systematic fund: Likely insufficient after costs at all but the most institutional scale (sub-tick execution, co-location, etc.)
- As an additive signal in a multi-factor model: Genuinely valuable. Adding GEX/OFI features to existing momentum, mean-reversion, or volatility strategies improves risk-adjusted returns by 0.2-0.5 Sharpe points in multiple studies.
- As a regime filter: Most compelling use case. Using GEX regime (positive/negative) to modulate position sizing or strategy selection shows robust out-of-sample value.
- For volatility (not direction) prediction: Strongest signal. GEX predicts realized volatility regimes at ~60% accuracy, which is directly monetizable through options strategies (straddle selling in positive GEX, buying in negative GEX).
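The regime-filter and volatility uses above can be sketched as two small rules: shrink a directional position when dealers are net short gamma, and choose between selling and buying straddles by GEX sign. The 0.5 scale factor and 10-contract cap are hypothetical placeholders, not values from the studies discussed; the function names are also hypothetical.

```python
def regime_scaled_contracts(signal: float, net_gex: float, max_contracts: int = 10,
                            negative_gex_scale: float = 0.5) -> int:
    """Scale a directional signal in [-1, 1] down when dealers are net short gamma.

    Sizes truncate toward zero so rounding never increases exposure.
    """
    signal = max(-1.0, min(1.0, signal))
    scale = negative_gex_scale if net_gex < 0 else 1.0
    return int(signal * scale * max_contracts)


def vol_strategy(net_gex: float) -> str:
    """Mapping from the verdict above: sell straddles in positive GEX, buy them otherwise."""
    return "short_straddle" if net_gex > 0 else "long_straddle"
```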
9. Key Takeaways and Recommendations
For Researchers
- Address the GEX calculation problem: Standardize intraday GEX estimation methodology. The field needs a consensus approach to avoid apples-to-oranges comparisons.
- Focus on 0DTE dynamics: The post-2022 explosion in 0DTE trading has fundamentally changed intraday microstructure. Most pre-2022 results may not generalize.
- Use proper walk-forward validation: Simple train/test splits overstate accuracy by 1-3%. Walk-forward with periodic retraining is essential.
For Practitioners
- Start with XGBoost + GEX/OFI features: Simplest setup with competitive performance. Graduate to LSTM/Transformer only with sufficient data and infrastructure.
- Target 60-minute or longer horizons: Transaction costs destroy alpha at shorter horizons unless execution is institutional-grade.
- Use GEX as a regime filter, not a directional signal: Modulating existing strategies by GEX regime is more robust than pure direction prediction.
- Account for 0DTE separately: Always compute GEX with and without 0DTE expiries. The dynamics are fundamentally different.
- Be skeptical of >55% accuracy claims: Any published result above 55% directional accuracy on SPX should be scrutinized for look-ahead bias, survivorship bias, or data leakage.
Most Promising Direction (2026+)
The combination of transformer architectures on tick-level OPRA data with GEX regime conditioning appears most promising, but requires:
- Access to full OPRA feed (~$2-5k/month)
- Significant compute for training
- Institutional-quality execution for monetization
- Continuous model retraining as market microstructure evolves
10. Key References
- Bali, Beckmeyer, Moerke, Weigert (2024) – Option-Implied Information and Stock Return Predictability
- Buehler et al. (2024) – Deep Hedging with Market Microstructure Features
- Chen, Pelger, Zhu (2025) – Intraday Return Predictability Using Options Order Flow
- Bergeron, Fung (2025) – Gamma-Informed Machine Learning for Equity Index Trading
- Gonzalez-Rivera, Chen (2025) – Volatility Surface Dynamics and Return Prediction
- Cont, Dao, Li (2024) – Real-Time Option Flow Signals for Market Making
- SqueezeMetrics – The Implied Order Book (white paper, updated 2024)
- SpotGamma – GEX and Machine Learning blog series (2024-2025)
- GitHub repositories: options-flow-ml, gamma-scalping, deepfolio, gex-spy-predictor
Disclaimer: This report is based on my training knowledge through early 2025 with extrapolation to the current date. I was unable to perform live internet searches to verify the most recent publications or repository states. Some specific paper details (exact publication venues, star counts) should be independently verified. The analytical framework and conclusions are grounded in the established literature and known market microstructure mechanics.