Research Layer

Last updated: 2026-05-19

Overview

The research layer sits above the gold layer and answers: “which signals actually work, and how should I combine them into a strategy?” It consists of three components, deployed incrementally:

Gold signals ──┐
               ├──► Signal Validation (walk-forward)  ──► Signal Scorecard     [IMPLEMENTED]
               ├──► Meta-Model (ML combination)       ──► Combined Alpha Score [PLANNED #1]
               └──► Backtest Engine                   ──► Strategy PnL + Risk  [PLANNED #2]

1. Walk-Forward Signal Validation (Implemented)

Purpose

Continuously validates whether gold signals have predictive power over forward equity returns. It detects signal decay, regime sensitivity, and overfitting, so you know which signals to trust before trading on them.

Methodology

Expanding window approach:

|------------ train (>= 252 days) ------------|-- test (63 days) --|
                                              ^
                                              train/test split advances by 21 days per step (train start stays fixed)

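Start with a minimum of 252 trading days of history (about one year) as the training window
Test on the next 63 trading days (about one quarter)
Advance the train/test split by 21 trading days (about one month); the training window grows, and consecutive test windows overlap by 42 days
Repeat until the end of available data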
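The steps above can be sketched as a small generator. This is an illustration only, not the code in walk_forward_validation.py: the names Window and expanding_windows are hypothetical, rows are assumed to be one per trading day, and a final partial test window is assumed to be dropped.

```python
from typing import Iterator, NamedTuple


class Window(NamedTuple):
    train_start: int  # inclusive row index (always 0 for an expanding window)
    train_end: int    # exclusive; also the first row of the test window
    test_end: int     # exclusive


def expanding_windows(
    n_days: int, min_train: int = 252, test_len: int = 63, step: int = 21
) -> Iterator[Window]:
    """Yield expanding-window train/test splits over n_days rows."""
    split = min_train
    # Assumption: stop once a full test window no longer fits in the data.
    while split + test_len <= n_days:
        yield Window(0, split, split + test_len)
        split += step


windows = list(expanding_windows(n_days=400))
for w in windows:
    print(w)
```

With 400 rows this yields five windows, the first training on rows 0-251 and testing on rows 252-314.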

Metrics per window:

Metric	Definition	Threshold
IC (Information Coefficient)	Spearman rank correlation: signal vs. forward return	|IC| > 0.03 is meaningful
Hit Rate	% of days where signal direction matched return direction	> 55% is meaningful
t-statistic	Statistical significance of IC	|t| > 1.96 (95% confidence)
Regime IC	IC computed within each regime subset (hiking/cutting/stress/etc.)	Same thresholds

Signal health classification:

Health	Criteria	Action
Active	Latest IC > 0.03 AND statistically significant	Use in strategies
Weakening	Historical avg IC > 0.02 but latest not significant	Monitor, reduce weight
Dead	No evidence of predictive power	Do not trade on this
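The classification table can be read as a simple decision rule. The sketch below is one literal reading of it, with hypothetical names and lowercase labels (matching the 'dead' literal used in the scorecard query). The table does not say how to treat a signal whose latest IC is significant but at or below 0.03; this sketch assumes it falls through to "dead".

```python
def classify_health(
    latest_ic: float,
    latest_t: float,
    hist_avg_ic: float,
    ic_min: float = 0.03,
    hist_min: float = 0.02,
    t_crit: float = 1.96,
) -> str:
    """Map scorecard metrics to a health label, following the table above."""
    significant = abs(latest_t) > t_crit
    if latest_ic > ic_min and significant:
        return "active"
    if hist_avg_ic > hist_min and not significant:
        return "weakening"
    # Assumption: everything else is treated as no evidence of predictive power.
    return "dead"


print(classify_health(latest_ic=0.05, latest_t=2.4, hist_avg_ic=0.04))
print(classify_health(latest_ic=0.01, latest_t=0.6, hist_avg_ic=0.03))
print(classify_health(latest_ic=0.00, latest_t=0.1, hist_avg_ic=0.00))
```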

What Gets Validated

All gold signal columns across three horizons (5d, 20d, 60d forward SPY return):

Yield Curve signal:

DGS30, DGS10, DGS2, T10Y2Y levels
dgs10_dgs2_spread, dgs30_dgs10_spread
t10y2y_ma20, t10y2y_ma60
dgs10_momentum (20d, 60d), dgs30_momentum (20d, 60d)
Regime: fed_cycle (hiking/cutting/hold)

Risk Composite signal:

risk_score, risk_score_smoothed
VIXCLS_zscore, DTWEXBGS_zscore
spy_momentum_20d_zscore, hyg_spy_ratio_zscore
Regime: regime (risk_on/neutral/risk_off)

Credit Leverage signal:

credit_stress_score, credit_stress_smoothed
BAMLH0A0HYM2, BAMLC0A4CBBB, hy_bbb_spread_diff
hy_spread_momentum_20d, bbb_spread_momentum_20d
margin_debt_pct_chg_3m
Regime: credit_regime (stress/elevated/neutral/benign)

Infrastructure

ETL: etl/research/walk_forward_validation.py
Compute: AWS Glue (3x G.1X workers)
Schedule: Weekly, Sunday 08:00 UTC
Output: s3://{research_bucket}/validation/signal={name}/ + validation/scorecard/
Trigger: EventBridge emits SignalValidationComplete when scorecard is ready

How to Read the Scorecard

Query via Athena:

SELECT signal_name, horizon_days, latest_ic, latest_hit_rate, signal_health
FROM stratum_research_{env}.signal_scorecard
WHERE signal_health != 'dead'
ORDER BY latest_ic DESC;

Key questions it answers:

“Which signals are currently predictive at the 20-day horizon?”
“Has credit_stress_score lost its edge since the regime shifted to ‘hold’?”
“Which signals work in a hiking cycle but die in a hold?”

2. ML Signal Combination — Meta-Model (Planned)

GitHub: Issue #1

Concept

Individual signals are noisy and fire at different times. The meta-model learns how to weight and time all gold signals, producing a single daily score (combined_alpha; a probability when trained on the binary drawdown target).

Architecture

Model: LightGBM (interpretable via feature importances; handles missing values natively, which helps when signals update at mixed frequencies)
Features: All gold signal columns aligned on date
Target: Forward SPY returns at multiple horizons, or binary drawdown indicator (> 5% in next 20 days)
Training: Walk-forward (expanding window), so the model never trains on future data
Output: Daily combined_alpha score + feature importances
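The two target types can be made concrete with a small labeling sketch. Since the meta-model is still planned, everything here is an assumption: the function names are hypothetical, and "drawdown > 5% in next 20 days" is interpreted as the lowest close over the next 20 trading days falling more than 5% below today's close.

```python
from typing import Optional, Sequence


def forward_return(prices: Sequence[float], t: int, horizon: int) -> Optional[float]:
    """Simple return from day t to day t + horizon; None if it is not yet known."""
    if t + horizon >= len(prices):
        return None
    return prices[t + horizon] / prices[t] - 1.0


def drawdown_label(
    prices: Sequence[float], t: int, horizon: int = 20, threshold: float = 0.05
) -> Optional[int]:
    """1 if the lowest close in the next `horizon` days is > threshold below day t."""
    if t + horizon >= len(prices):
        return None  # the label window runs past the end of the data
    worst = min(prices[t + 1 : t + horizon + 1])
    return int(worst / prices[t] - 1.0 < -threshold)


# Toy series: flat at 100 with a single dip to 94 on day 5.
prices = [100.0] * 5 + [94.0] + [100.0] * 30
labels = [drawdown_label(prices, t) for t in range(len(prices))]
print(labels[0], labels[6], labels[-1])
```

One design point worth keeping in mind: a label at day t looks up to 20 days ahead, so in a walk-forward setup the last 20 training rows before each split would need to be dropped (or the split gapped) to keep test-period prices out of the training labels.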

Why Not a Simple Average?

Signals have different optimal horizons (credit stress leads by 2-4 weeks; VIX is contemporaneous)
Signal importance changes by regime (rates momentum matters more during hiking)
Non-linear interactions (credit stress + rising rates = multiplicative risk)

3. Strategy Backtesting Framework (Planned)

GitHub: Issue #2

Concept

Translate validated signals into simulated portfolio decisions with realistic constraints.

Architecture

Engine: Event-driven loop (iterates day by day, no lookahead)
Inputs: Gold signals + meta-model + Yahoo price histories
Assets: SPY, TLT, GLD, cash (expandable)
Constraints: Transaction costs, slippage, position limits
Output: Equity curve, drawdown, Sharpe, Calmar, turnover
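A minimal sketch of what a day-by-day loop with no lookahead could look like is shown below. It is not the planned engine: run_backtest and max_drawdown are hypothetical names, cash is implicit (weights need not sum to 1), costs are a single proportional charge in basis points of turnover, and the portfolio is assumed to rebalance to target weights every day, ignoring intraday weight drift, slippage, and position limits.

```python
from typing import Callable


def run_backtest(
    prices: list[dict[str, float]],
    signals: list[dict],
    strategy: Callable[[dict], dict[str, float]],
    cost_bps: float = 5.0,
) -> list[float]:
    """Return the equity curve, starting at 1.0, for a daily-rebalanced strategy."""
    equity = [1.0]
    weights: dict[str, float] = {}
    for t in range(len(prices) - 1):
        # Decide at the close of day t using only information available on day t.
        target = strategy(signals[t])
        assets = set(target) | set(weights)
        turnover = sum(abs(target.get(a, 0.0) - weights.get(a, 0.0)) for a in assets)
        cost = turnover * cost_bps / 1e4
        # The return is earned from close t to close t + 1.
        ret = sum(w * (prices[t + 1][a] / prices[t][a] - 1.0) for a, w in target.items())
        equity.append(equity[-1] * (1.0 - cost) * (1.0 + ret))
        weights = target
    return equity


def max_drawdown(equity: list[float]) -> float:
    """Largest peak-to-trough decline, as a negative fraction."""
    peak, worst = equity[0], 0.0
    for value in equity:
        peak = max(peak, value)
        worst = min(worst, value / peak - 1.0)
    return worst


prices = [{"SPY": 100.0}, {"SPY": 110.0}, {"SPY": 99.0}]
signals = [{}, {}, {}]
always_long = lambda row: {"SPY": 1.0}
equity = run_backtest(prices, signals, always_long, cost_bps=0.0)
print(equity, max_drawdown(equity))
```

Sharpe, Calmar, and turnover can be derived from the same equity curve and the per-day turnover values.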

Example Strategies to Test

Risk-off on credit stress: Go 50% cash when credit_stress_score > 1.0
Rates + leverage squeeze: Short equities when dgs30_momentum_60d > 50bps AND credit_regime == "stress"
Combined alpha threshold: Size positions proportional to meta-model combined_alpha
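The three example strategies could be expressed as functions from one day's signal row to target weights. The sketch below fills in details the list leaves open, and each one is an assumption: the baseline is 100% SPY, any unallocated weight is cash, dgs30_momentum_60d is in percentage points (so 50bps is 0.50), and combined_alpha is a bullish score that is clipped to [0, max_weight]. Function names are hypothetical.

```python
def risk_off_on_credit_stress(row: dict) -> dict[str, float]:
    """Hold 50% SPY / 50% cash when credit stress is high, else fully invested."""
    if row["credit_stress_score"] > 1.0:
        return {"SPY": 0.5}
    return {"SPY": 1.0}


def rates_leverage_squeeze(row: dict) -> dict[str, float]:
    """Short SPY when long rates are rising fast during credit stress."""
    # Assumption: momentum is in percentage points, so 50bps == 0.50.
    if row["dgs30_momentum_60d"] > 0.50 and row["credit_regime"] == "stress":
        return {"SPY": -1.0}
    return {"SPY": 1.0}


def combined_alpha_sizing(row: dict, max_weight: float = 1.0) -> dict[str, float]:
    """Size the SPY position in proportion to combined_alpha, clipped to bounds."""
    weight = max(0.0, min(max_weight, row["combined_alpha"]))
    return {"SPY": weight}


row = {
    "credit_stress_score": 1.4,
    "dgs30_momentum_60d": 0.62,
    "credit_regime": "stress",
    "combined_alpha": 0.35,
}
print(risk_off_on_credit_stress(row))
print(rates_leverage_squeeze(row))
print(combined_alpha_sizing(row))
```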

Dependency Chain

Walk-forward validation (which signals to trust)
        │
        ▼
Meta-model (how to combine trusted signals)
        │
        ▼
Backtester (what would this combination have returned)

Each layer feeds the next. Walk-forward tells you which inputs to give the meta-model. The meta-model produces the score the backtester trades on.