Gold signals are pre-computed, opinionated quantitative datasets built from one or more silver sources. They are designed for direct consumption in quant notebooks, backtesting engines, and the external API.
All gold signals are:
Partitioned by date (Parquet format)
Registered in Glue Catalog (stratum_gold_{env})
Lineage-tracked with cryptographic provenance
Tier-gated via Lake Formation
Signal: Yield Curve & Rates
| Property | Value |
| --- | --- |
| Domain | rates |
| Signal name | yield_curve |
| Output path | domain=rates/signal=yield_curve/ |
| Source | FRED silver (single-source) |
| Frequency | Daily |
| ETL | etl/external/rates_gold.py |
Input Series
| Series | Description | Frequency |
| --- | --- | --- |
| DGS30 | 30-Year Treasury Constant Maturity Rate | daily |
| DGS10 | 10-Year Treasury Constant Maturity Rate | daily |
| DGS2 | 2-Year Treasury Constant Maturity Rate | daily |
| T10Y2Y | 10Y minus 2Y Treasury (yield curve spread) | daily |
| DFF | Federal Funds Effective Rate | daily |
| FEDFUNDS | Federal Funds Rate | monthly |
Output Schema
| Column | Type | Description |
| --- | --- | --- |
| date | timestamp | Observation date |
| DGS30 | decimal | 30-year yield |
| DGS10 | decimal | 10-year yield |
| DGS2 | decimal | 2-year yield |
| T10Y2Y | decimal | 10Y-2Y spread (raw FRED) |
| DFF | decimal | Fed funds effective rate |
| FEDFUNDS | decimal | Fed funds rate (monthly, forward-filled) |
| dgs10_dgs2_spread | decimal | Computed 10Y-2Y spread |
| dgs30_dgs10_spread | decimal | 30Y-10Y long-end term premium |
| is_inverted | boolean | True when T10Y2Y < 0 |
| t10y2y_ma20 | decimal | 20-day moving average of T10Y2Y |
| t10y2y_ma60 | decimal | 60-day moving average of T10Y2Y |
| dgs10_momentum_20d | decimal | 20-day rate change in DGS10 |
| dgs10_momentum_60d | decimal | 60-day rate change in DGS10 |
| dgs30_momentum_20d | decimal | 20-day rate change in DGS30 |
| dgs30_momentum_60d | decimal | 60-day rate change in DGS30 |
| dff_delta_5d | decimal | 5-day change in fed funds rate |
| fed_cycle | string | “hiking” / “cutting” / “hold” |
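The derived columns in the schema above can be sketched in plain Python. This is an illustration only, not the logic in etl/external/rates_gold.py: the helper names (`diff_n`, `moving_average`, `fed_cycle`), the 3-row demo window, and the 0.10 percentage-point threshold used to label `fed_cycle` are all assumptions, since the source does not state how the label is derived.

```python
from typing import Optional, Sequence


def diff_n(values: Sequence[float], n: int) -> list[Optional[float]]:
    """Change versus the value n rows earlier; None until n rows of history exist."""
    return [None if i < n else values[i] - values[i - n] for i in range(len(values))]


def moving_average(values: Sequence[float], window: int) -> list[Optional[float]]:
    """Trailing simple moving average; None until a full window is available."""
    out: list[Optional[float]] = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(values[i + 1 - window : i + 1]) / window)
    return out


def fed_cycle(dff_delta_5d: Optional[float], threshold: float = 0.10) -> str:
    """Label Fed direction from the 5-day change in DFF.

    The threshold is a placeholder assumption, not a documented value.
    """
    if dff_delta_5d is None or abs(dff_delta_5d) < threshold:
        return "hold"
    return "hiking" if dff_delta_5d > 0 else "cutting"


# Toy inputs (percent), six consecutive business days.
dgs10 = [4.00, 4.05, 4.10, 4.20, 4.15, 4.30]
dgs2 = [4.20, 4.22, 4.18, 4.15, 4.10, 4.05]
t10y2y = [-0.20, -0.17, -0.08, 0.05, 0.05, 0.25]  # raw FRED spread
dff = [5.08, 5.08, 5.08, 5.08, 5.08, 5.33]

dgs10_dgs2_spread = [round(a - b, 4) for a, b in zip(dgs10, dgs2)]
is_inverted = [s < 0 for s in t10y2y]
t10y2y_ma3 = moving_average(t10y2y, 3)  # the table uses 20 and 60 days
dff_delta_5d = diff_n(dff, 5)
cycle = [fed_cycle(d) for d in dff_delta_5d]
```

With real data the same helpers would be called with windows of 20 and 60 for the moving averages and momentum columns.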
Interpretation
is_inverted = True: an inverted 10Y-2Y curve has historically preceded U.S. recessions by roughly 6-18 months
fed_cycle tracks Fed tightening/easing direction (useful for duration positioning)
hy_bbb_spread_diff widening → flight from junk, risk appetite collapsing (investors moving up the quality ladder)
margin_debt_pct_chg_3m > 5% combined with dgs30_momentum_60d > 50bps (from yield_curve signal) → the leverage-meets-rising-rates scenario where forced deleveraging becomes likely
bond_bid_to_cover falling → supply pressure on long-end Treasuries, market struggling to absorb issuance
Cross-Source Join Notes
Corporate spreads (FRED) define the output calendar (daily business days)
FINRA margin data (monthly) is forward-filled to daily
Treasury auction data is joined on auction date; between auctions the last value carries forward
All inputs are z-scored over a trailing 252-day window before composite aggregation
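The join notes above combine two mechanical steps: carrying lower-frequency values forward onto the daily calendar, and z-scoring each input over a trailing window. A minimal sketch follows, under stated assumptions: the window includes the current observation, the population standard deviation is used, and the function names are hypothetical rather than taken from the ETL.

```python
from statistics import mean, pstdev
from typing import Optional, Sequence


def forward_fill(values: Sequence[Optional[float]]) -> list[Optional[float]]:
    """Carry the last non-missing value forward; leading gaps stay None."""
    out: list[Optional[float]] = []
    last: Optional[float] = None
    for v in values:
        if v is not None:
            last = v
        out.append(last)
    return out


def trailing_zscore(
    values: Sequence[Optional[float]], window: int = 252
) -> list[Optional[float]]:
    """Z-score of each point against the trailing `window` rows (current row included).

    Returns None until a full window of non-missing values exists, or when
    the window has zero variance.
    """
    out: list[Optional[float]] = []
    for i, v in enumerate(values):
        hist = [x for x in values[max(0, i + 1 - window) : i + 1] if x is not None]
        if v is None or len(hist) < window:
            out.append(None)
            continue
        sd = pstdev(hist)
        out.append(None if sd == 0 else (v - mean(hist)) / sd)
    return out


# A monthly series placed on a daily calendar, then forward-filled.
monthly_on_daily = [None, 1.0, None, None, 2.0, None]
filled = forward_fill(monthly_on_daily)

# Tiny window so the arithmetic is easy to follow; the notes above use 252.
z = trailing_zscore([1.0, 2.0, 3.0], window=3)
```

Forward-filling before z-scoring means a monthly input contributes long runs of identical values to the window, which shrinks its measured variance relative to a truly daily series.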
Research Layer: Signal Validation
| Property | Value |
| --- | --- |
| Output path | validation/signal={signal_name}/ and validation/scorecard/ |
| Sources | All gold signals + Yahoo silver (SPY) |
| Frequency | Weekly (Sunday 08:00 UTC) |
| ETL | etl/research/walk_forward_validation.py |
What It Does
Runs expanding-window walk-forward validation on every gold signal column against SPY forward returns (5d, 20d, 60d horizons). Answers: “does this signal actually predict anything, and is it still working?”
Methodology
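Expanding window: Starts with 252 days of history, tests on the next 63 days, then moves the train/test boundary forward by 21 days per step. The training window always begins at the first observation, so it grows rather than slides.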
Information Coefficient (IC): Spearman rank correlation between signal value and realized forward return
Hit Rate: % of days where signal direction matched return direction
t-statistic: Statistical significance of IC (threshold: |t| > 1.96)
Regime-conditional: Splits IC computation by regime column (fed_cycle, credit_regime, risk regime) to detect “this signal only works in hiking cycles”
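The methodology above can be sketched as follows. This is an illustrative reading, not the code in etl/research/walk_forward_validation.py: the function names are hypothetical, and the t-statistic uses the standard correlation-coefficient form t = IC * sqrt((n - 2) / (1 - IC^2)), which is an assumption because the source does not say how significance is computed.

```python
from math import sqrt
from statistics import mean
from typing import Iterator, Sequence


def expanding_windows(
    n_obs: int, initial_train: int = 252, test_size: int = 63, step: int = 21
) -> Iterator[tuple[slice, slice]]:
    """Yield (train, test) slices; the train slice always starts at index 0."""
    train_end = initial_train
    while train_end + test_size <= n_obs:
        yield slice(0, train_end), slice(train_end, train_end + test_size)
        train_end += step


def _ranks(xs: Sequence[float]) -> list[float]:
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def _pearson(a: Sequence[float], b: Sequence[float]) -> float:
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return float("nan") if va == 0 or vb == 0 else cov / sqrt(va * vb)


def spearman_ic(signal: Sequence[float], fwd_return: Sequence[float]) -> float:
    """Information Coefficient: Pearson correlation of the two rank vectors."""
    return _pearson(_ranks(signal), _ranks(fwd_return))


def ic_t_stat(ic: float, n: int) -> float:
    """t-statistic of a correlation estimated from n pairs (assumed formula)."""
    if n <= 2:
        raise ValueError("need more than 2 observations")
    if abs(ic) >= 1.0:
        return float("inf") if ic > 0 else float("-inf")
    return ic * sqrt((n - 2) / (1.0 - ic * ic))


windows = list(expanding_windows(400))
```

With 400 observations this yields five windows, the last training on rows 0-335 and testing on rows 336-398. An IC of 0.1 on a 63-day test window gives a t-statistic well below 1.96, which shows how large an IC a single window needs in order to pass the significance threshold.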
Output: Signal Scorecard
| Column | Description |
| --- | --- |
| signal_name | Signal column being validated |
| horizon_days | Forward return horizon (5, 20, or 60 days) |
| latest_ic | IC in most recent test window |
| latest_hit_rate | Hit rate in most recent window |
| avg_ic | Average IC across all historical windows |
| ic_volatility | Std dev of IC across windows (stability) |
| n_windows | Number of test windows evaluated |
| signal_health | “active” (IC > 0.03 and significant) / “weakening” (avg IC > 0.02 but latest not significant) / “dead” |
Interpretation
signal_health = “active” → signal is currently predictive, safe to use in strategies
signal_health = “weakening” → historically worked but losing edge; monitor closely
signal_health = “dead” → no evidence of predictive power; do not trade on this signal
Regime-conditional IC reveals whether a signal only works in specific environments (e.g., credit_stress_score may be highly predictive in hiking cycles but useless during holds)
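The signal_health rules from the scorecard can be written as a small classifier. This sketch assumes that "IC > 0.03" refers to the latest window's IC and that "significant" means |t| > 1.96 on that same window; the function and parameter names are hypothetical.

```python
def signal_health(
    latest_ic: float,
    latest_t: float,
    avg_ic: float,
    ic_min: float = 0.03,
    avg_ic_min: float = 0.02,
    t_crit: float = 1.96,
) -> str:
    """Map scorecard statistics to "active", "weakening", or "dead"."""
    significant = abs(latest_t) > t_crit
    if latest_ic > ic_min and significant:
        return "active"
    if avg_ic > avg_ic_min and not significant:
        return "weakening"
    return "dead"


examples = {
    "strong": signal_health(latest_ic=0.06, latest_t=2.4, avg_ic=0.05),
    "fading": signal_health(latest_ic=0.01, latest_t=0.6, avg_ic=0.04),
    "noise": signal_health(latest_ic=0.00, latest_t=0.1, avg_ic=0.00),
}
```

Under this reading, a signal whose latest IC is significant but at or below 0.03 falls through to "dead", because neither of the first two rules matches.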
Signal: Energy Volatility
| Property | Value |
| --- | --- |
| Domain | energy |
| Signal name | energy_volatility |
| Output path | domain=energy/signal=energy_volatility/ |
| Sources | EIA (single-source) |
| Frequency | Daily |
| ETL | etl/external/energy_volatility_gold.py |
Input Series
| Series | Source | Description | Frequency |
| --- | --- | --- | --- |
| henry_hub_gas | EIA | Henry Hub Natural Gas Spot Price ($/MMBtu) | daily |
| wti_crude | EIA | WTI Crude Oil Spot Price ($/barrel) | daily |
| brent_crude | EIA | Brent Crude Oil Spot Price ($/barrel) | daily |
Output Schema
| Column | Type | Description |
| --- | --- | --- |
| date | timestamp | Trading date |
| henry_hub_price | decimal | Natural gas spot price |
| wti_price | decimal | WTI crude spot price |
| brent_price | decimal | Brent crude spot price |
| gas_return_1d | decimal | Daily log return on gas |
| oil_return_1d | decimal | Daily log return on WTI |
| gas_vol_20d | decimal | 20-day realized volatility (gas, annualized) |
| gas_vol_60d | decimal | 60-day realized volatility (gas, annualized) |
| oil_vol_20d | decimal | 20-day realized volatility (oil, annualized) |
| oil_vol_60d | decimal | 60-day realized volatility (oil, annualized) |
| gas_oil_ratio | decimal | Henry Hub / WTI price ratio |
| gas_oil_ratio_zscore | decimal | Z-score of gas/oil ratio (252d window) |
| gas_momentum_20d | decimal | 20-day % change in gas price |
| oil_momentum_20d | decimal | 20-day % change in oil price |
| energy_vol_score | decimal | Composite energy volatility score |
| energy_vol_smoothed | decimal | 5-day smoothed score |
| energy_regime | string | “spike” / “elevated” / “normal” / “suppressed” |
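The return and volatility columns in the schema above follow a common recipe: daily log returns, then a rolling standard deviation scaled to a yearly figure. The sketch below assumes a sample standard deviation and a sqrt(252) annualization factor; neither is confirmed by the source, and the function names are hypothetical.

```python
from math import log, sqrt
from statistics import stdev
from typing import Optional, Sequence


def log_returns(prices: Sequence[float]) -> list[Optional[float]]:
    """ln(p_t / p_{t-1}); the first row has no prior price, so it is None."""
    return [None] + [log(b / a) for a, b in zip(prices, prices[1:])]


def realized_vol(
    returns: Sequence[Optional[float]], window: int = 20, periods_per_year: int = 252
) -> list[Optional[float]]:
    """Annualized rolling standard deviation of returns; None until the window is full."""
    out: list[Optional[float]] = []
    for i in range(len(returns)):
        win = returns[max(0, i + 1 - window) : i + 1]
        if len(win) < window or any(r is None for r in win):
            out.append(None)
        else:
            out.append(stdev(win) * sqrt(periods_per_year))
    return out


# Toy price path; a 3-day window keeps the example short (the table uses 20 and 60).
henry_hub = [100.0, 110.0, 99.0, 108.9]
gas_return_1d = log_returns(henry_hub)
gas_vol_3d = realized_vol(gas_return_1d, window=3)
```

Because the first return is missing, a window of N returns needs N + 1 prices, so the first non-null 20-day volatility appears on the 21st row.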
Interpretation
energy_vol_score > 1.5 → extreme energy volatility event (spike)
energy_regime = “spike” historically coincides with equity drawdowns, particularly in energy-exposed sectors
gas_oil_ratio divergence (high z-score) signals energy substitution dynamics or supply dislocations
flow_zscore > 2 → security receiving outsized capital relative to peers this quarter
conviction_score identifies securities where managers are making concentrated bets (high weight × many holders)
Caveats
13F data has a 45-day reporting lag. By the time a filing is public, the trade is 1.5-4.5 months old.
The signal’s value comes from (a) identifying persistent trends that continue past the disclosure date, and (b) detecting herd behavior early in the filing window when only some managers have reported.
Not all position changes are directional bets — some are index rebalancing, tax-loss harvesting, or fund inflows/outflows.
positioning_zscore_26w > 2.0 → “extreme_long” — trade is maximally crowded on the long side. Historically mean-reverting. Contrarian sell signal.
positioning_zscore_26w < -2.0 → “extreme_short” — specs are max short. Mean-reverting. Contrarian buy signal.
spec_commercial_divergence — when specs and commercials disagree strongly, commercials (who have physical exposure and structural information edge) tend to be right over 2-8 week horizons.
positioning_momentum = “reversing_to_long/short” — early detection of a positioning unwind. The weekly change direction has flipped vs the 4-week trend.
crowding_score — high values mean the top 4 traders dominate one side, making the market vulnerable to forced unwinds.
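The positioning_zscore_26w thresholds above translate into a short labelling routine. This is a sketch under assumptions: the input is a weekly net speculative position series, the 26-week window includes the current week, and the "neutral" label for non-extreme readings is a placeholder not named in the source.

```python
from statistics import mean, stdev
from typing import Optional, Sequence


def positioning_zscore(
    net_positions: Sequence[float], window: int = 26
) -> list[Optional[float]]:
    """Z-score of each week's net position against the trailing `window` weeks."""
    out: list[Optional[float]] = []
    for i in range(len(net_positions)):
        if i + 1 < window:
            out.append(None)
            continue
        hist = net_positions[i + 1 - window : i + 1]
        sd = stdev(hist)
        out.append(None if sd == 0 else (net_positions[i] - mean(hist)) / sd)
    return out


def positioning_label(z: Optional[float], threshold: float = 2.0) -> str:
    """Apply the +/-2.0 cut-offs; "neutral" is a hypothetical fallback label."""
    if z is None:
        return "neutral"
    if z > threshold:
        return "extreme_long"
    if z < -threshold:
        return "extreme_short"
    return "neutral"


# 25 quiet weeks followed by one sharp build-up in net longs.
net = [0.0, 1.0] * 12 + [0.0] + [10.0]
z = positioning_zscore(net)
label = positioning_label(z[-1])
```

As the caveats below note, a reading beyond the threshold flags crowding but does not by itself time the reversal.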
Key Contracts for Macro Trading
| Contract Code | Market | Signal Relevance |
| --- | --- | --- |
| 088691 | Gold (COMEX) | Rates/inflation hedge positioning |
| 06765T | 10-Year T-Note (CBOT) | Rates/duration positioning |
| 13874+ | E-mini S&P 500 (CME) | Equity risk appetite |
| 023651 | Crude Oil WTI (NYMEX) | Energy/inflation positioning |
| 023391 | Nat Gas (NYMEX) | Energy supply/demand |
| 096742 | EUR/USD (CME) | Dollar strength |
| 099741 | VIX (CBOE) | Volatility/tail-risk positioning |
Caveats
COT data has a 3-day lag: published Friday, as-of Tuesday. Intraweek positioning changes are not captured.
Managed money category includes both trend-followers (CTAs) and discretionary macro funds — these may trade opposite directions.
Extreme positioning can persist for weeks before mean-reverting. Z-score > 2 is a necessary but not sufficient condition for reversal.
The signal works best at the 2-8 week horizon. Not useful for intraday or next-day trading.
risk_appetite_rvol < -0.3 and risk_appetite_momentum < -1.0 → “strong_risk_off”: flight to safety in progress
High relative_volume (>2.0) on individual sector ETFs signals large institutional position building in that sector
Signal: Yahoo Price Analytics
| Property | Value |
| --- | --- |
| Domain | prices |
| Signal name | yahoo_prices |
| Output path | domain=prices/signal=yahoo_prices/ |
| Glue table | stratum_gold_{env}.yahoo_prices |
| Source | Yahoo silver (silver_yahoo_prices) |
| Frequency | Daily, partitioned by ticker, year |
| ETL | etl/external/yahoo_prices_gold.py |
Purpose
Per-ticker technical indicators computed from OHLCV silver data. This is a price-analytics table, not a strategy signal — its columns are reusable features that downstream signals (or notebooks) can consume.
Output Schema
| Column | Type | Description |
| --- | --- | --- |
| date | date | Trading date |
| ticker | string | Ticker symbol (partition key) |
| open, high, low, close | double | OHLC pass-through from silver |
| volume | bigint | Daily volume |
| trade_year, trade_month, trade_dow | int | Calendar dimensions |
| is_month_end | boolean | True if date == last_day(date) |
| daily_return_pct | double | (close - prev_close) / prev_close |
| log_return | double | ln(close / prev_close) |
| sma_50, sma_200 | double | Simple moving averages |
| true_range | double | high - low |
| rolling_volatility_20d, rolling_volatility_30d | double | Standard deviation of daily returns |
| avg_volume_20d | double | 20-day rolling mean volume |
| relative_volume | double | volume / avg_volume_20d |
| price_vs_200dma_pct | double | (close - sma_200) / sma_200 |
| rsi_14 | double | Relative Strength Index, 14-day |
| vol_zscore | double | Z-score of 30d realized vol against 1y rolling history |
| vpt_signal | double | daily_return_pct * 20d-mean(relative_volume) |
| cycle_score | double | Howard Marks-inspired greed/fear composite, clamped to [-1, +1] |
| cycle_score_smoothed | double | 63-day average of cycle_score |
| record_pk | string | `md5(ticker …)` |
| year | int | Year (partition key) |
Interpretation
log_return is the primary input for portfolio-level signal aggregation (additive across time, symmetric for long/short).
cycle_score → +1 suggests peak greed (mean-reversion candidate); → −1 suggests deep fear (accumulation candidate). Built from price-vs-200dma, RSI, vol regime, and volume-price trend, each z-scored against its own 1-year history.
relative_volume > 2.0 with positive daily_return_pct flags institutional accumulation (used by etf_rotation signal).
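SELECT date, ticker, close, log_return, cycle_score_smoothed  -- illustrative column list
FROM stratum_gold_{env}.yahoo_prices
WHERE ticker IN ('SPY','QQQ','IWM','TLT','GLD') AND year = 2026
ORDER BY ticker, date;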
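The interpretation notes call log_return "additive across time". A short check with made-up closing prices illustrates what that means and why daily_return_pct does not share the property; the variable names here are only for the example.

```python
from math import exp, log

closes = [100.0, 110.0, 99.0, 104.0]  # toy closing prices

log_rets = [log(b / a) for a, b in zip(closes, closes[1:])]
simple_rets = [(b - a) / a for a, b in zip(closes, closes[1:])]

# Summing log returns and exponentiating recovers the exact period return.
total_from_logs = exp(sum(log_rets)) - 1

# Summing simple returns does not: it ignores compounding.
total_from_simple_sum = sum(simple_rets)

actual_total = closes[-1] / closes[0] - 1
```

Here the actual period return is 4%, which the summed log returns reproduce, while the summed simple returns give about 5.05%.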
Signal: Newsletter Inbox (LLM-Enriched)
| Property | Value |
| --- | --- |
| Domain | email |
| Signal name | newsletter_inbox |
| Output path | domain=email/signal=newsletter_inbox/ |
| Glue table | stratum_gold_email_{env}.newsletter_inbox (note: separate Glue catalog DB, region-pinned to eu-west-1) |
| Source | silver_email ⟕ DynamoDB enrichment cache ({project}-{env}-email-enrichment). Silver itself is fed by SES inbound mail and the hourly RSS / Atom poller (Reuters, Fed/ECB/BoE press, FT, CNBC, etc.; see docs/tech/ingestion-scheduling.md#rss-ingestion-phase-2). RSS items land with alias='rss' and source_type='rss'. |
| Frequency | Daily, 00:30 UTC, partitioned by alias, received_year, received_month |
| ETL | etl/external/email_gold.py |
| Enrichment | lambda/email_enricher/handler.py: Bedrock (Nova Micro by default) + Titan Text Embeddings v2 |
Purpose
Each inbound newsletter (or, in upcoming phases, RSS / Reuters item) is parsed by an LLM into a typed feature row suitable for ML or signal generation: sentiment, ticker mentions, named entities, claim type, time horizon, plus a 1024-dim embedding for clustering and similarity search. The regex-based v1 (topic_tags, is_newsletter_likely, body_summary_preview) is fully replaced — see git history if you need to reference it.
Architecture
SES inbound → email_parser (Lambda) → silver_email (Glue Spark)
│
▼
┌─────────────────────────────┐
│ enrichment_pending_lister │
│ Athena over last 7 days, │
│ filter cache hits │
└────────────┬────────────────┘
│ batches of 25
▼
┌─────────────────────────────┐
│ Step Functions Map │
│ maxConcurrency=10 │
└────────────┬────────────────┘
│
▼
┌─────────────────────────────┐
│ email_enricher (Lambda) │
│ Bedrock Converse │
│ + Titan embeddings │
│ → DynamoDB cache │
└────────────┬────────────────┘
│
▼
email_gold (Glue Spark)
silver ⟕ enrichment cache
Schema (selected columns; full list in lib/email_enrichment_schema.py)
| Column | Type | Description |
| --- | --- | --- |
| content_hash | string | sha256(subject\|raw_object_key\|sender_email), a stable id that joins to the enrichment cache |
| embedding | | L2-normalised Titan v2 vector over summary_long; use cosine similarity |
| enrichment_status | string | ok, truncated, parse_error, model_error, pending (filter on this) |
| enrichment_model | string | Bedrock model id (cost / model-version provenance) |
| enrichment_version | int | Bumped when prompt/schema changes; older rows eligible for re-enrichment |
| enriched_at, tokens_in, tokens_out | various | Provenance / cost attribution |
Interpretation
enrichment_status = 'ok' is the default filter for any analytics query. pending means the silver row hasn’t been enriched yet (possible during catch-up); other statuses indicate a model or parse failure.
tickers are the primary join key to yahoo_prices and any market-data table. They’re uppercased and deduped at validation time; trust them as exact join keys.
is_actionable=true AND noise_score < 0.3 narrows to “this is a real analyst view, not a news aggregator”. Useful as an ML label.
embedding + cosine similarity is how the upcoming digest builder will dedupe stories across newsletters/RSS — see Phase 3.
Sentiment vs. confidence: a bullish view at confidence=0.4 is hedged; a neutral view at confidence=0.9 is a confident “no edge here”. Don’t collapse them.
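Because the embedding column is L2-normalised, cosine similarity reduces to a dot product. The sketch below shows that and a greedy near-duplicate filter of the kind the notes above describe. It is not the digest builder's code: the 0.9 threshold, the greedy strategy, and the function names are assumptions, and the 2-dimensional vectors stand in for the real 1024-dimensional ones.

```python
from math import sqrt
from typing import Sequence


def l2_normalise(v: Sequence[float]) -> list[float]:
    """Scale a vector to unit length."""
    norm = sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """For unit vectors the cosine is simply the dot product."""
    return sum(x * y for x, y in zip(a, b))


def dedupe(embeddings: Sequence[Sequence[float]], threshold: float = 0.9) -> list[int]:
    """Keep the index of each item that is not too similar to an already-kept item."""
    kept: list[int] = []
    for i, emb in enumerate(embeddings):
        if all(cosine_similarity(emb, embeddings[j]) < threshold for j in kept):
            kept.append(i)
    return kept


stories = [
    l2_normalise([1.0, 0.0]),    # original story
    l2_normalise([0.99, 0.10]),  # near-duplicate of the first
    l2_normalise([0.0, 1.0]),    # unrelated story
]
unique_indices = dedupe(stories)
```

The greedy pass is order-dependent: the first item of each cluster is the one that survives, so sorting by received time first would keep the earliest version of a story.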
Versioning
ENRICHMENT_VERSION (in lib/email_enrichment_schema.py) is the contract version between the prompt + schema and what’s persisted. Bumping it makes the pending-list Lambda re-enrich rows below the new version on the next scheduled run (within the 7-day lookback window). For a full backfill on a bump, run the lister manually with LOOKBACK_DAYS widened.
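The re-enrichment rule described above, combined with the "batches of 25" step in the architecture diagram, can be sketched as a filter plus a chunker. Everything here is illustrative: the version number, the function names, and the in-memory row tuples are hypothetical stand-ins for the Athena query and DynamoDB cache lookup that the real lister performs.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Sequence

ENRICHMENT_VERSION = 3  # hypothetical value; the real constant lives in lib/email_enrichment_schema.py


def needs_enrichment(
    received_at: datetime,
    cached_version: Optional[int],
    now: datetime,
    lookback_days: int = 7,
    current_version: int = ENRICHMENT_VERSION,
) -> bool:
    """True if the row is inside the lookback window and has no up-to-date cache entry."""
    if received_at < now - timedelta(days=lookback_days):
        return False
    return cached_version is None or cached_version < current_version


def batches(items: Sequence, size: int = 25) -> list[list]:
    """Split pending items into fixed-size batches for the Step Functions Map state."""
    return [list(items[i : i + size]) for i in range(0, len(items), size)]


now = datetime(2026, 1, 10, tzinfo=timezone.utc)
rows = [
    ("fresh, never enriched", now - timedelta(days=1), None),
    ("fresh, old version", now - timedelta(days=2), 2),
    ("fresh, current version", now - timedelta(days=2), 3),
    ("stale, old version", now - timedelta(days=30), 2),
]
pending = [name for name, received, ver in rows if needs_enrichment(received, ver, now)]
```

The last row shows why a version bump alone does not trigger a full backfill: rows outside the lookback window are skipped until the lister is run with a wider LOOKBACK_DAYS.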
Downstream consumers
The newsletter_inbox table feeds the daily digest builder — see
docs/finance/digest.md. Phase 3 reads last-24h rows where
enrichment_status='ok', clusters by embedding similarity, scores
against a watchlist, and delivers a Markdown briefing daily at 07:00 UTC.