Gold Layer Signals

Last updated: 2026-05-19

Overview

Gold signals are pre-computed, opinionated quantitative datasets built from one or more silver sources. They are designed for direct consumption in quant notebooks, backtesting engines, and the external API.

All gold signals are:

Partitioned by date (Parquet format)
Registered in Glue Catalog (stratum_gold_{env})
Lineage-tracked with cryptographic provenance
Tier-gated via Lake Formation

Signal: Yield Curve & Rates

Property	Value
Domain	rates
Signal name	yield_curve
Output path	`domain=rates/signal=yield_curve/`
Source	FRED silver (single-source)
Frequency	Daily
ETL	`etl/external/rates_gold.py`

Input Series

Series	Description	Frequency
DGS30	30-Year Treasury Constant Maturity Rate	daily
DGS10	10-Year Treasury Constant Maturity Rate	daily
DGS2	2-Year Treasury Constant Maturity Rate	daily
T10Y2Y	10Y minus 2Y Treasury (yield curve spread)	daily
DFF	Federal Funds Effective Rate	daily
FEDFUNDS	Federal Funds Rate	monthly

Output Schema

Column	Type	Description
date	timestamp	Observation date
DGS30	decimal	30-year yield
DGS10	decimal	10-year yield
DGS2	decimal	2-year yield
T10Y2Y	decimal	10Y-2Y spread (raw FRED)
DFF	decimal	Fed funds effective rate
FEDFUNDS	decimal	Fed funds rate (monthly, forward-filled)
dgs10_dgs2_spread	decimal	Computed 10Y-2Y spread
dgs30_dgs10_spread	decimal	30Y-10Y long-end term premium
is_inverted	boolean	True when T10Y2Y < 0
t10y2y_ma20	decimal	20-day moving average of T10Y2Y
t10y2y_ma60	decimal	60-day moving average of T10Y2Y
dgs10_momentum_20d	decimal	20-day rate change in DGS10
dgs10_momentum_60d	decimal	60-day rate change in DGS10
dgs30_momentum_20d	decimal	20-day rate change in DGS30
dgs30_momentum_60d	decimal	60-day rate change in DGS30
dff_delta_5d	decimal	5-day change in fed funds rate
fed_cycle	string	“hiking” / “cutting” / “hold”

Interpretation

is_inverted = True historically precedes recessions by 6-18 months
fed_cycle tracks Fed tightening/easing direction (useful for duration positioning)
dgs30_dgs10_spread widening indicates market repricing long-end duration risk (term premium increase)
dgs30_momentum_60d > 50bps signals accelerating long-term borrowing costs — historically precedes forced deleveraging in margin-heavy environments
momentum columns indicate speed of rate moves (convexity risk)
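
The derived columns above can be sketched in plain Python. This is a minimal illustration, not the code in `etl/external/rates_gold.py`: the helper name `momentum`, the sample yields, and the choice to measure the lookback in observations rather than calendar days are all assumptions.

```python
def momentum(series, lookback):
    """Change versus the value `lookback` observations earlier.

    Returns None until enough history exists or when either value is missing.
    """
    out = []
    for i, value in enumerate(series):
        if i < lookback or value is None or series[i - lookback] is None:
            out.append(None)
        else:
            out.append(value - series[i - lookback])
    return out


# Illustrative yields in percent (made-up values, not real FRED data).
dgs10 = [4.00, 4.05, 4.10, 4.20]
dgs2 = [4.30, 4.25, 4.10, 4.00]
t10y2y = [-0.30, -0.20, 0.00, 0.20]

# Computed 10Y-2Y spread, kept alongside the raw FRED T10Y2Y series.
dgs10_dgs2_spread = [long_y - short_y for long_y, short_y in zip(dgs10, dgs2)]

# Inversion flag follows the schema definition: True when T10Y2Y < 0.
is_inverted = [value < 0 for value in t10y2y]

# A 2-observation lookback keeps the example short; the signal uses 20 and 60.
dgs10_momentum_2 = momentum(dgs10, 2)
```

Note that a zero spread is not flagged as inverted, because the schema uses a strict inequality.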

Signal: Risk Sentiment Composite

Property	Value
Domain	sentiment
Signal name	risk_composite
Output path	`domain=sentiment/signal=risk_composite/`
Sources	FRED + Yahoo Finance (cross-source)
Frequency	Daily
ETL	`etl/external/risk_sentiment_gold.py`

Input Series

Series	Source	Description	Frequency
VIXCLS	FRED	CBOE Volatility Index	daily
DTWEXBGS	FRED	Trade Weighted US Dollar Index	daily
UMCSENT	FRED	U. Michigan Consumer Sentiment	monthly
SPY	Yahoo	S&P 500 ETF price	daily
HYG	Yahoo	High Yield Corporate Bond ETF price	daily

Output Schema

Column	Type	Description
date	timestamp	Trading date
VIXCLS	decimal	VIX level
DTWEXBGS	decimal	Dollar index level
UMCSENT	decimal	Consumer sentiment (forward-filled)
SPY	decimal	S&P 500 ETF close
HYG	decimal	High yield bond ETF close
spy_momentum_20d	decimal	SPY 20-day return
hyg_spy_ratio	decimal	HYG/SPY price ratio
VIXCLS_zscore	decimal	VIX rolling 1Y z-score
DTWEXBGS_zscore	decimal	Dollar rolling 1Y z-score
UMCSENT_zscore	decimal	Sentiment rolling 1Y z-score
spy_momentum_20d_zscore	decimal	SPY momentum z-score
hyg_spy_ratio_zscore	decimal	Credit appetite z-score
risk_score	decimal	Composite (-1 to +1 range typical)
risk_score_smoothed	decimal	5-day smoothed composite
regime	string	“risk_on” / “neutral” / “risk_off”

Interpretation

risk_score > 0.5 → broad risk appetite (equities favored, vol suppressed, credit tight)
risk_score < -0.5 → defensive positioning (elevated vol, credit stress, weak momentum)
regime provides a discrete label for rule-based strategies
Use risk_score_smoothed to filter out single-day noise

Cross-Source Join Notes

See Cross-Source Joins for the alignment methodology. Key points:

Yahoo (SPY) defines the output calendar (NYSE trading days)
UMCSENT (monthly) is forward-filled to daily
All components z-scored over trailing 252 days before combining
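
A rough sketch of the forward-fill step, assuming the trading calendar and the monthly observations are both keyed by date. The function name `forward_fill_to_calendar` and the sample values are hypothetical. The sketch keys on observation date only; whether the real join also accounts for publication lag is not stated here.

```python
from datetime import date


def forward_fill_to_calendar(calendar, observations):
    """For each calendar date, take the latest observation dated on or before it."""
    ordered = sorted(observations.items())
    filled, idx, last_value = {}, 0, None
    for day in sorted(calendar):
        # Advance through observations that are visible as of `day`.
        while idx < len(ordered) and ordered[idx][0] <= day:
            last_value = ordered[idx][1]
            idx += 1
        filled[day] = last_value
    return filled


# Illustrative inputs (made-up values): three trading days, two monthly prints.
trading_days = [date(2026, 1, 30), date(2026, 2, 2), date(2026, 2, 3)]
umcsent = {date(2026, 1, 1): 71.0, date(2026, 2, 1): 69.5}

aligned = forward_fill_to_calendar(trading_days, umcsent)
```

Dates before the first observation come back as None rather than being back-filled, which avoids leaking later values into earlier rows.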

Signal: Credit & Leverage

Property	Value
Domain	credit
Signal name	credit_leverage
Output path	`domain=credit/signal=credit_leverage/`
Sources	FRED + FINRA + TreasuryDirect (cross-source)
Frequency	Daily (margin debt forward-filled from monthly, auctions from event dates)
ETL	`etl/external/credit_leverage_gold.py`

Input Series

Series	Source	Description	Frequency
BAMLH0A0HYM2	FRED	ICE BofA US High Yield OAS	daily
BAMLC0A4CBBB	FRED	ICE BofA BBB US Corporate OAS	daily
margin_debt	FINRA	Total debit balances in margin accounts	monthly
free_credit_cash	FINRA	Free credit in cash accounts	monthly
free_credit_margin	FINRA	Free credit in margin accounts	monthly
Bond auctions	TreasuryDirect	20-30yr Treasury auction results (bid-to-cover)	event-driven

Output Schema

Column	Type	Description
date	timestamp	Trading date
BAMLH0A0HYM2	decimal	High yield OAS (bps)
BAMLC0A4CBBB	decimal	BBB corporate OAS (bps)
hy_bbb_spread_diff	decimal	HY minus BBB spread (flight-from-junk indicator)
margin_debt	decimal	Total margin debt (millions USD, forward-filled)
margin_debt_mom_1m	decimal	1-month margin debt change
margin_debt_mom_3m	decimal	3-month margin debt change
margin_debt_pct_chg_3m	decimal	3-month margin debt % change
net_credit_balance	decimal	Free credit minus margin debt
bond_bid_to_cover	decimal	Latest 30yr auction bid-to-cover ratio
bond_bid_to_cover_ma5	decimal	5-auction moving avg bid-to-cover
BAMLH0A0HYM2_zscore	decimal	HY spread rolling 1Y z-score
BAMLC0A4CBBB_zscore	decimal	BBB spread rolling 1Y z-score
hy_bbb_spread_diff_zscore	decimal	HY-BBB differential z-score
margin_debt_pct_chg_3m_zscore	decimal	Margin growth z-score
credit_stress_score	decimal	Composite stress score (avg of z-scores)
credit_stress_smoothed	decimal	5-day smoothed stress score
credit_regime	string	“stress” / “elevated” / “neutral” / “benign”
hy_spread_momentum_20d	decimal	20-day HY spread change
bbb_spread_momentum_20d	decimal	20-day BBB spread change

Interpretation

credit_stress_score > 1.0 → credit markets in stress (spreads widening + leverage growing = reflexive risk)
credit_regime = “stress” → historically precedes equity drawdowns by 1-4 weeks
hy_bbb_spread_diff widening → flight from junk, risk appetite collapsing (investors moving up the quality ladder)
margin_debt_pct_chg_3m > 5% combined with dgs30_momentum_60d > 50bps (from yield_curve signal) → the leverage-meets-rising-rates scenario where forced deleveraging becomes likely
bond_bid_to_cover falling → supply pressure on long-end Treasuries, market struggling to absorb issuance

Cross-Source Join Notes

Corporate spreads (FRED) define the output calendar (daily business days)
FINRA margin data (monthly) is forward-filled to daily
Treasury auction data is joined on auction date; between auctions the last value carries forward
All z-scored over trailing 252 days before composite aggregation
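
The z-score and averaging steps might look like the following. This is a sketch under assumptions: the population standard deviation, the inclusive trailing window, and the rule that a row with any missing component yields no composite are illustrative choices, and `rolling_zscore` and `composite` are hypothetical names.

```python
from statistics import mean, pstdev


def rolling_zscore(values, window):
    """Z-score of each value against the trailing `window` observations (inclusive)."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)  # not enough history yet
            continue
        history = values[i + 1 - window : i + 1]
        sd = pstdev(history)
        out.append(None if sd == 0 else (values[i] - mean(history)) / sd)
    return out


def composite(*zscore_columns):
    """Row-wise average of z-score columns; None when any component is missing."""
    return [
        None if any(z is None for z in row) else mean(row)
        for row in zip(*zscore_columns)
    ]
```

In production the window would be 252 observations, for example `rolling_zscore(hy_oas, 252)`, where `hy_oas` stands for the BAMLH0A0HYM2 column.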

Research Layer: Signal Validation

Property	Value
Output path	`validation/signal={signal_name}/` and `validation/scorecard/`
Sources	All gold signals + Yahoo silver (SPY)
Frequency	Weekly (Sunday 08:00 UTC)
ETL	`etl/research/walk_forward_validation.py`

What It Does

Runs expanding-window walk-forward validation on every gold signal column against SPY forward returns (5d, 20d, 60d horizons). Answers: “does this signal actually predict anything, and is it still working?”

Methodology

Expanding window: Starts with 252 days of history, tests on the next 63 days, then extends the training window by 21 days and repeats
Information Coefficient (IC): Spearman rank correlation between signal value and realized forward return
Hit Rate: % of days where signal direction matched return direction
t-statistic: Statistical significance of IC (threshold: |t| > 1.96)
Regime-conditional: Splits IC computation by regime column (fed_cycle, credit_regime, risk regime) to detect “this signal only works in hiking cycles”
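
The methodology can be sketched with the standard library alone. The code below is an illustration, not the contents of `etl/research/walk_forward_validation.py`. The t-statistic formula (the usual correlation t-test) and the sign convention for hit rate are assumptions, since the exact definitions are not given above.

```python
import math


def walk_forward_windows(n_obs, min_train=252, test_len=63, step=21):
    """Yield (train_end, test_end) index pairs; training always starts at index 0."""
    train_end = min_train
    while train_end + test_len <= n_obs:
        yield train_end, train_end + test_len
        train_end += step


def _ranks(xs):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_ic(signal, fwd_returns):
    """Spearman rank correlation, computed as the Pearson correlation of ranks."""
    rs, rr = _ranks(signal), _ranks(fwd_returns)
    ms, mr = sum(rs) / len(rs), sum(rr) / len(rr)
    cov = sum((a - ms) * (b - mr) for a, b in zip(rs, rr))
    var_s = sum((a - ms) ** 2 for a in rs)
    var_r = sum((b - mr) ** 2 for b in rr)
    if var_s == 0 or var_r == 0:
        return None  # a constant series has no defined rank correlation
    return cov / math.sqrt(var_s * var_r)


def ic_tstat(ic, n):
    """t-statistic of a correlation from n pairs (assumed formula)."""
    if abs(ic) >= 1.0:
        return math.copysign(math.inf, ic)
    return ic * math.sqrt((n - 2) / (1.0 - ic * ic))


def hit_rate(signal, fwd_returns):
    """Share of observations where signal sign matches return sign."""
    hits = sum((s > 0) == (r > 0) for s, r in zip(signal, fwd_returns))
    return hits / len(signal)
```

With 360 observations, `list(walk_forward_windows(360))` produces three test windows; the training slice for each is everything before `train_end`.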

Output: Signal Scorecard

Column	Description
signal_name	Signal column being validated
horizon_days	Forward return horizon (5, 20, or 60 days)
latest_ic	IC in most recent test window
latest_hit_rate	Hit rate in most recent window
avg_ic	Average IC across all historical windows
ic_volatility	Std dev of IC across windows (stability)
n_windows	Number of test windows evaluated
signal_health	“active” (IC > 0.03 and significant) / “weakening” (avg IC > 0.02 but latest not significant) / “dead”

Interpretation

signal_health = “active” → signal is currently predictive, safe to use in strategies
signal_health = “weakening” → historically worked but losing edge; monitor closely
signal_health = “dead” → no evidence of predictive power; do not trade on this signal
Regime-conditional IC reveals whether a signal only works in specific environments (e.g., credit_stress_score may be highly predictive in hiking cycles but useless during holds)
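
The scorecard labels can be expressed as a small classifier. This is a sketch that assumes "IC > 0.03" refers to the latest window's IC and that significance means |t| > 1.96 on that same window; the function name and argument names are hypothetical.

```python
def signal_health(latest_ic, latest_tstat, avg_ic,
                  ic_floor=0.03, avg_floor=0.02, t_crit=1.96):
    """Map scorecard statistics to the documented health labels."""
    significant = abs(latest_tstat) > t_crit
    if latest_ic > ic_floor and significant:
        return "active"
    if avg_ic > avg_floor and not significant:
        return "weakening"
    return "dead"
```

For example, `signal_health(0.05, 2.4, 0.04)` returns `"active"`, while the same averages with an insignificant latest window return `"weakening"`.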

Signal: Energy Volatility

Property	Value
Domain	energy
Signal name	energy_volatility
Output path	`domain=energy/signal=energy_volatility/`
Sources	EIA (single-source)
Frequency	Daily
ETL	`etl/external/energy_volatility_gold.py`

Input Series

Series	Source	Description	Frequency
henry_hub_gas	EIA	Henry Hub Natural Gas Spot Price ($/MMBtu)	daily
wti_crude	EIA	WTI Crude Oil Spot Price ($/barrel)	daily
brent_crude	EIA	Brent Crude Oil Spot Price ($/barrel)	daily

Output Schema

Column	Type	Description
date	timestamp	Trading date
henry_hub_price	decimal	Natural gas spot price
wti_price	decimal	WTI crude spot price
brent_price	decimal	Brent crude spot price
gas_return_1d	decimal	Daily log return on gas
oil_return_1d	decimal	Daily log return on WTI
gas_vol_20d	decimal	20-day realized volatility (gas, annualized)
gas_vol_60d	decimal	60-day realized volatility (gas, annualized)
oil_vol_20d	decimal	20-day realized volatility (oil, annualized)
oil_vol_60d	decimal	60-day realized volatility (oil, annualized)
gas_oil_ratio	decimal	Henry Hub / WTI price ratio
gas_oil_ratio_zscore	decimal	Z-score of gas/oil ratio (252d window)
gas_momentum_20d	decimal	20-day % change in gas price
oil_momentum_20d	decimal	20-day % change in oil price
energy_vol_score	decimal	Composite energy volatility score
energy_vol_smoothed	decimal	5-day smoothed score
energy_regime	string	“spike” / “elevated” / “normal” / “suppressed”

Interpretation

energy_vol_score > 1.5 → extreme energy volatility event (spike)
energy_regime = “spike” historically coincides with equity drawdowns, particularly in energy-exposed sectors
gas_oil_ratio divergence (high z-score) signals energy substitution dynamics or supply dislocations
Validation target: XLE (Energy Select SPDR) — energy vol spikes predict negative XLE forward returns
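
Annualized realized volatility from daily log returns could be computed as below. This is a sketch, assuming a 252-day year and the sample standard deviation; neither convention is confirmed by the schema, and the helper names are hypothetical.

```python
import math
from statistics import stdev


def log_returns(prices):
    """Daily log returns from a list of consecutive prices."""
    return [math.log(curr / prev) for prev, curr in zip(prices, prices[1:])]


def realized_vol(returns, window, periods_per_year=252):
    """Annualized std dev of the trailing `window` returns; None until enough history."""
    out = []
    for i in range(len(returns)):
        if i + 1 < window:
            out.append(None)
        else:
            trailing = returns[i + 1 - window : i + 1]
            out.append(stdev(trailing) * math.sqrt(periods_per_year))
    return out
```

A call such as `realized_vol(log_returns(henry_hub_prices), 20)` would correspond to `gas_vol_20d`, where `henry_hub_prices` is a placeholder for the price column.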

Signal: Insurance Risk

Property	Value
Domain	insurance
Signal name	insurance_risk
Output path	`domain=insurance/signal=insurance_risk/`
Sources	FRED + FEMA (cross-source)
Frequency	Daily (monthly components forward-filled)
ETL	`etl/external/insurance_risk_gold.py`

Input Series

Series	Source	Description	Frequency
DGS10	FRED	10-Year Treasury Constant Maturity Rate	daily
MORTGAGE30US	FRED	30-Year Fixed Rate Mortgage Average	weekly
CPIAUCSL	FRED	CPI All Urban Consumers	monthly
all_disasters	FEMA	Disaster declarations (all types)	event-driven

Output Schema

Column	Type	Description
date	timestamp	Trading date
DGS10	decimal	10-year Treasury rate
MORTGAGE30US	decimal	30-year mortgage rate (forward-filled)
CPIAUCSL	decimal	CPI level (forward-filled)
disaster_count_monthly	int	Monthly disaster declarations (forward-filled)
disaster_count_3m	int	Trailing 3-month disaster count
disaster_count_12m	int	Trailing 12-month disaster count
cpi_yoy_pct	decimal	CPI year-over-year % change
disaster_frequency_zscore	decimal	Z-score of 12m disaster count
rate_environment_score	decimal	Z-score of DGS10 (high = good for insurers)
inflation_score	decimal	Z-score of CPI YoY (high = bad for insurers)
insurance_risk_score	decimal	Composite (positive = favorable for insurers)
insurance_risk_smoothed	decimal	5-day smoothed composite
insurance_regime	string	“favorable” / “neutral” / “stressed”

Interpretation

insurance_regime = “favorable” → rising rates + low catastrophe frequency + stable inflation (bullish insurers)
insurance_regime = “stressed” → falling rates + elevated disasters + rising inflation (bearish insurers)
Insurers’ earnings are mechanically driven by these factors: investment income (rates), claims (catastrophes), reserve adequacy (inflation)
Validation target: KIE (S&P Insurance ETF) — composite should predict KIE forward returns at 20-60d horizons

Cross-Source Join Notes

FRED daily rates define the output calendar
MORTGAGE30US (weekly) and CPIAUCSL (monthly) are forward-filled to daily
FEMA disaster counts are aggregated to monthly, then forward-filled to daily
All components z-scored over trailing 252 days before composite aggregation
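
The monthly aggregation of FEMA declarations and the trailing counts might be sketched like this. It assumes the trailing window includes the current month; the function names and sample dates are hypothetical.

```python
from collections import Counter
from datetime import date


def monthly_counts(declaration_dates):
    """Count disaster declarations per (year, month)."""
    return Counter((d.year, d.month) for d in declaration_dates)


def trailing_count(counts, year, month, n_months):
    """Sum counts over the n_months ending at (year, month), inclusive."""
    total = 0
    for _ in range(n_months):
        total += counts.get((year, month), 0)
        month -= 1
        if month == 0:
            year, month = year - 1, 12  # roll back across the year boundary
    return total


# Illustrative declaration dates (made up).
declarations = [date(2025, 12, 5), date(2026, 1, 10), date(2026, 1, 20), date(2026, 2, 1)]
counts = monthly_counts(declarations)
disaster_count_3m = trailing_count(counts, 2026, 2, 3)
```

The monthly values would then be forward-filled onto the daily FRED calendar, as described in the join notes.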

Signal: Institutional Flow

Property	Value
Domain	institutional
Signal name	institutional_flow
Output path	`domain=institutional/signal=institutional_flow/`
Sources	SEC EDGAR 13F (single-source)
Frequency	Quarterly (updated weekly as new filings arrive)
ETL	`etl/external/institutional_flow_gold.py`

Input

Source	Description	Frequency
SEC 13F-HR filings	Holdings of institutional managers with >$100M AUM	Quarterly (45-day lag)

Output Schema

Column	Type	Description
cusip	string	Security identifier
issuer_name	string	Security name
filing_year	int	Filing year
filing_quarter	int	Filing quarter (1-4)
filing_period	string	E.g., “2026-Q1”
holder_count	int	Number of institutional holders
new_entries	int	Managers initiating new positions this quarter
full_exits	int	Managers fully exiting this quarter
significant_increases	int	Managers increasing >25%
significant_decreases	int	Managers decreasing >25%
total_institutional_shares	long	Aggregate shares held
total_institutional_value_thousands	long	Aggregate market value ($K)
net_shares_delta	long	Net share change vs. prior quarter
net_value_delta_thousands	long	Net value change ($K)
avg_portfolio_weight_pct	decimal	Mean portfolio weight across holders
max_portfolio_weight_pct	decimal	Highest single-manager weight
net_buyer_ratio	decimal	0-1 (1 = all buying, 0 = all selling)
herding_score	decimal	0-1 (1 = completely one-sided flow)
conviction_score	decimal	Weight × holder count (higher = more concentrated interest)
holder_count_delta	int	Change in number of holders vs. prior quarter
flow_zscore	decimal	Z-score of net_value_delta within the quarter
institutional_flow_score	decimal	Composite flow signal
flow_regime	string	“strong_accumulation” / “mild_accumulation” / “neutral” / “mild_distribution” / “strong_distribution”

Interpretation

flow_regime = “strong_accumulation” → many managers buying simultaneously (herding + net buying). Historically signals informed capital allocation.
herding_score > 0.7 → highly one-sided institutional flow (either strongly buying or selling). Directional conviction is high.
holder_count_delta increasing + net_buyer_ratio > 0.6 → broadening institutional interest (new managers entering)
flow_zscore > 2 → security receiving outsized capital relative to peers this quarter
conviction_score identifies securities where managers are making concentrated bets (high weight × many holders)
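
One plausible way to derive the ratio-style columns is sketched below. The formulas are assumptions chosen to match the documented ranges (for instance, herding as the distance of `net_buyer_ratio` from 0.5, rescaled to 0-1); the actual definitions live in the ETL and may differ.

```python
from statistics import mean, pstdev


def net_buyer_ratio(buyers, sellers):
    """Share of active managers that are buying: 1 = all buying, 0 = all selling."""
    total = buyers + sellers
    return None if total == 0 else buyers / total


def herding_score(ratio):
    """Assumed form: 0 when flow is balanced, 1 when completely one-sided."""
    return None if ratio is None else abs(2 * ratio - 1)


def cross_sectional_zscore(values):
    """Z-score each security's value against all securities in the same quarter."""
    mu, sd = mean(values), pstdev(values)
    return [0.0 if sd == 0 else (v - mu) / sd for v in values]
```

Under these assumptions, 8 buyers against 2 sellers gives a `net_buyer_ratio` of 0.8 and a `herding_score` of about 0.6.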

Caveats

13F data has a 45-day reporting lag. By the time a filing is public, the trade is 1.5-4.5 months old.
The signal’s value comes from (a) identifying persistent trends that continue past the disclosure date, and (b) detecting herd behavior early in the filing window when only some managers have reported.
Not all position changes are directional bets — some are index rebalancing, tax-loss harvesting, or fund inflows/outflows.

Signal: COT Positioning

Property	Value
Domain	positioning
Signal name	cot_positioning
Output path	`domain=positioning/signal=cot_positioning/`
Source	CFTC COT silver (single-source)
Frequency	Weekly (Friday)
ETL	`etl/external/cot_positioning_gold.py`

Input

CFTC Disaggregated Futures-Only report — all ~270 reported contracts. Key positioning categories: managed money (hedge funds/CTAs), producer/merchant (commercials), swap dealers, other reportables.

Output Schema

Column	Type	Description
report_date	date	Tuesday as-of date for the report
market_and_exchange_names	string	Contract name + exchange (e.g., “GOLD - COMMODITY EXCHANGE INC.”)
cftc_contract_market_code	string	Unique contract identifier
open_interest_all	int	Total open interest
m_money_long_all	int	Managed money gross long contracts
m_money_short_all	int	Managed money gross short contracts
m_money_spread_all	int	Managed money spread contracts
m_money_net_all	int	Managed money net (long - short)
m_money_net_pct_oi	double	Net managed money as % of open interest
commercial_net_all	int	Producer/merchant net position
commercial_net_pct_oi	double	Commercial net as % of open interest
spec_commercial_divergence	double	Spec net% - commercial net% (contrarian signal)
positioning_zscore_26w	double	Z-score of net managed money (26-week lookback)
positioning_zscore_52w	double	Z-score of net managed money (52-week lookback)
net_change_1w	int	Week-over-week change in net managed money
net_change_4w	int	4-week change in net managed money
positioning_momentum	string	accelerating_long/short, reversing_to_long/short, flat
crowding_score	double	Top-4 trader concentration asymmetry
positioning_regime	string	extreme_long/stretched_long/neutral/stretched_short/extreme_short

Interpretation

positioning_zscore_26w > 2.0 → “extreme_long” — trade is maximally crowded on the long side. Historically mean-reverting. Contrarian sell signal.
positioning_zscore_26w < -2.0 → “extreme_short” — specs are max short. Mean-reverting. Contrarian buy signal.
spec_commercial_divergence — when specs and commercials disagree strongly, commercials (who have physical exposure and structural information edge) tend to be right over 2-8 week horizons.
positioning_momentum = “reversing_to_long/short” — early detection of a positioning unwind. The weekly change direction has flipped vs the 4-week trend.
crowding_score — high values mean the top 4 traders dominate one side, making the market vulnerable to forced unwinds.
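
The percentage-of-open-interest columns and the regime label can be sketched as follows. The ±2.0 thresholds for the extreme labels come from the interpretation above; the ±1.0 cut-off for the "stretched" labels is an assumption, as are the function names.

```python
def net_pct_oi(long_contracts, short_contracts, open_interest):
    """Net position as a percentage of total open interest."""
    if open_interest == 0:
        return None
    return 100.0 * (long_contracts - short_contracts) / open_interest


def spec_commercial_divergence(spec_net_pct, commercial_net_pct):
    """Speculator net % minus commercial net %, as defined in the schema."""
    return spec_net_pct - commercial_net_pct


def positioning_regime(zscore, extreme=2.0, stretched=1.0):
    """Bucket a positioning z-score; `stretched` is a hypothetical threshold."""
    if zscore > extreme:
        return "extreme_long"
    if zscore > stretched:
        return "stretched_long"
    if zscore < -extreme:
        return "extreme_short"
    if zscore < -stretched:
        return "stretched_short"
    return "neutral"
```

For example, 150 long and 50 short contracts on 1,000 of open interest gives `net_pct_oi` of 10.0.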

Key Contracts for Macro Trading

Contract Code	Market	Signal Relevance
088691	Gold (COMEX)	Rates/inflation hedge positioning
043602	10-Year T-Note (CBOT)	Rates/duration positioning
13874+	E-mini S&P 500 (CME)	Equity risk appetite
067651	Crude Oil WTI (NYMEX)	Energy/inflation positioning
023651	Nat Gas (NYMEX)	Energy supply/demand
099741	EUR/USD (CME)	Dollar strength
1170E1	VIX (CBOE)	Volatility/tail-risk positioning

Caveats

COT data has a 3-day lag: published Friday, as-of Tuesday. Intraweek positioning changes are not captured.
Managed money category includes both trend-followers (CTAs) and discretionary macro funds — these may trade opposite directions.
Extreme positioning can persist for weeks before mean-reverting. Z-score > 2 is a necessary but not sufficient condition for reversal.
The signal works best at the 2-8 week horizon. Not useful for intraday or next-day trading.

Signal: Short Squeeze Potential

Property	Value
Domain	positioning
Signal name	short_squeeze
Output path	`domain=positioning/signal=short_squeeze/`
Source	FINRA Consolidated Short Interest silver
Frequency	Twice monthly (15th and end of month)
ETL	`etl/external/short_squeeze_gold.py`

Output Schema

Column	Type	Description
settlement_date	date	FINRA settlement date
symbol_code	string	Ticker symbol
issue_name	string	Company name
current_short_position	long	Total shares sold short
previous_short_position	long	Prior settlement’s short position
avg_daily_volume	long	20-day average daily volume
days_to_cover	double	Short position / avg daily volume
si_change	long	Absolute change in short position
si_change_pct	double	Percentage change in SI
dtc_zscore	double	Z-score of days-to-cover (6-month lookback)
squeeze_score	double	Composite squeeze potential score
squeeze_regime	string	high_squeeze_potential/building_pressure/covering_in_progress/low_risk/neutral

Interpretation

days_to_cover > 5 + si_change_pct > 10% → “high_squeeze_potential” — heavily shorted, getting worse, would take many days to cover
si_change_pct < -10% + days_to_cover > 3 → “covering_in_progress” — shorts are exiting, forced buying likely accelerating
dtc_zscore > 2 — historically extreme short interest relative to the stock’s own history
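
The two documented regime rules translate directly into code. This sketch covers only the rules stated above; the conditions for `building_pressure` and `low_risk` are not documented here, so everything else falls through to `neutral`. Function names are hypothetical.

```python
def days_to_cover(short_position, avg_daily_volume):
    """Shares sold short divided by average daily volume."""
    return None if avg_daily_volume == 0 else short_position / avg_daily_volume


def si_change_pct(current, previous):
    """Percentage change in short interest versus the prior settlement."""
    return None if previous == 0 else 100.0 * (current - previous) / previous


def squeeze_regime(dtc, change_pct):
    """Apply the two documented rules; other labels are left out of this sketch."""
    if dtc is None or change_pct is None:
        return "neutral"
    if dtc > 5 and change_pct > 10:
        return "high_squeeze_potential"
    if change_pct < -10 and dtc > 3:
        return "covering_in_progress"
    return "neutral"
```

A stock with 6 days to cover and a 20% rise in short interest would be labeled `high_squeeze_potential`.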

Signal: ETF Rotation

Property	Value
Domain	positioning
Signal name	etf_rotation
Output path	`domain=positioning/signal=etf_rotation/`
Source	Yahoo Finance silver (sector ETFs)
Frequency	Daily
ETL	`etl/external/etf_rotation_gold.py`

Input ETFs

Category	Tickers	Purpose
Cyclical	XLK, XLY, XLI, XLF, XLB	Risk-on sectors
Defensive	XLU, XLP, XLV, XLRE	Risk-off/safety sectors
Risk assets	SPY, QQQ, IWM, HYG	Broad risk appetite
Safe havens	TLT, IEF, SHY, GLD	Flight-to-safety flow

Output Schema

Column	Type	Description
date	date	Trading date
ticker	string	ETF symbol
dollar_volume	double	Close price x volume (daily dollar flow proxy)
relative_volume	double	Dollar volume / 20-day average (>1 = above-average flow)
return_5d/20d/60d	double	Multi-horizon momentum
risk_appetite_rvol	double	Cyclical avg relative volume - defensive avg relative volume
risk_appetite_momentum	double	Risk asset 5d return - safe haven 5d return
rotation_regime	string	strong_risk_on/mild_risk_on/neutral/mild_risk_off/strong_risk_off

Interpretation

risk_appetite_rvol > 0.3 + risk_appetite_momentum > 1.0 → “strong_risk_on” — institutional money flowing aggressively into cyclicals
risk_appetite_rvol < -0.3 + risk_appetite_momentum < -1.0 → “strong_risk_off” — flight to safety in progress
High relative_volume (>2.0) on individual sector ETFs signals large institutional position building in that sector
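
A sketch of the cross-sector aggregation, using the ticker groups from the input table. Only the two "strong" rules are documented above, so the mild labels are omitted; the thresholds are taken as given, and the unit of `risk_appetite_momentum` (assumed here to be percentage points) is not confirmed.

```python
from statistics import mean

CYCLICAL = ["XLK", "XLY", "XLI", "XLF", "XLB"]
DEFENSIVE = ["XLU", "XLP", "XLV", "XLRE"]


def risk_appetite_rvol(relative_volume):
    """Cyclical average relative volume minus defensive average relative volume."""
    cyclical = mean(relative_volume[t] for t in CYCLICAL)
    defensive = mean(relative_volume[t] for t in DEFENSIVE)
    return cyclical - defensive


def rotation_regime(rvol, momentum):
    """Apply the documented strong_risk_on / strong_risk_off rules only."""
    if rvol > 0.3 and momentum > 1.0:
        return "strong_risk_on"
    if rvol < -0.3 and momentum < -1.0:
        return "strong_risk_off"
    return "neutral"


# Illustrative relative-volume snapshot (made-up values).
snapshot = {t: 1.5 for t in CYCLICAL}
snapshot.update({t: 1.0 for t in DEFENSIVE})
rvol = risk_appetite_rvol(snapshot)
```

Both conditions must hold for a strong label, so elevated cyclical volume without supporting momentum stays `neutral` in this sketch.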

Signal: Yahoo Price Analytics

Property	Value
Domain	prices
Signal name	yahoo_prices
Output path	`domain=prices/signal=yahoo_prices/`
Glue table	`stratum_gold_{env}.yahoo_prices`
Source	Yahoo silver (silver_yahoo_prices)
Frequency	Daily, partitioned by `ticker, year`
ETL	`etl/external/yahoo_prices_gold.py`

Purpose

Per-ticker technical indicators computed from OHLCV silver data. This is a price-analytics table, not a strategy signal — its columns are reusable features that downstream signals (or notebooks) can consume.

Output Schema

Column	Type	Description
date	date	Trading date
ticker	string	Ticker symbol (partition key)
open, high, low, close	double	OHLC pass-through from silver
volume	bigint	Daily volume
trade_year, trade_month, trade_dow	int	Calendar dimensions
is_month_end	boolean	True if `date == last_day(date)`
daily_return_pct	double	`(close - prev_close) / prev_close`
log_return	double	`ln(close / prev_close)`
sma_50, sma_200	double	Simple moving averages
true_range	double	`high - low` (simple daily range; gaps versus the prior close are not included)
rolling_volatility_20d, rolling_volatility_30d	double	Standard deviation of daily returns
avg_volume_20d	double	20-day rolling mean volume
relative_volume	double	`volume / avg_volume_20d`
price_vs_200dma_pct	double	`(close - sma_200) / sma_200`
rsi_14	double	Relative Strength Index, 14-day
vol_zscore	double	Z-score of 30d realized vol against 1y rolling history
vpt_signal	double	`daily_return_pct * 20d-mean(relative_volume)`
cycle_score	double	Howard Marks-inspired greed/fear composite, clamped to `[-1, +1]`
cycle_score_smoothed	double	63-day average of `cycle_score`
record_pk	string	`md5(ticker
year	int	Year (partition key)

Interpretation

log_return is the primary input for portfolio-level signal aggregation (additive across time, symmetric for long/short).
cycle_score → +1 suggests peak greed (mean-reversion candidate); → −1 suggests deep fear (accumulation candidate). Built from price-vs-200dma, RSI, vol regime, and volume-price trend, each z-scored against its own 1-year history.
relative_volume > 2.0 with positive daily_return_pct flags institutional accumulation (used by etf_rotation signal).
vol_zscore > 1.5 signals an expanded vol regime (typically risk-off); vol_zscore < −1.0 suggests complacency.
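
The additivity of log returns, and two of the simpler derived columns, can be illustrated as follows. This is a sketch with hypothetical helper names and made-up prices; the production job presumably computes these columns in Spark.

```python
import math


def log_return(close, prev_close):
    """ln(close / prev_close), as in the schema."""
    return math.log(close / prev_close)


def sma(values, window):
    """Simple moving average; None until `window` observations are available."""
    return [
        None if i + 1 < window else sum(values[i + 1 - window : i + 1]) / window
        for i in range(len(values))
    ]


def price_vs_sma_pct(close, sma_value):
    """(close - sma) / sma, matching the price_vs_200dma_pct formula."""
    return (close - sma_value) / sma_value


closes = [100.0, 105.0, 99.0, 110.0]
daily = [log_return(c, p) for p, c in zip(closes, closes[1:])]

# Log returns add across time: their sum equals the log return of the full period.
total = sum(daily)
```

This additivity is why `log_return`, rather than `daily_return_pct`, is the natural input when aggregating over multi-day horizons.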

Athena Queries

-- Latest analytics per ticker
SELECT * FROM stratum_gold_sandbox.yahoo_prices
WHERE ticker = 'SPY' AND year = 2026
ORDER BY date DESC LIMIT 30;

-- Cycle score across all ETFs
SELECT date, ticker, close, cycle_score, cycle_score_smoothed
FROM stratum_gold_sandbox.yahoo_prices
WHERE ticker IN ('SPY','QQQ','IWM','TLT','GLD') AND year = 2026
ORDER BY ticker, date;

Signal: Newsletter Inbox (LLM-Enriched)

Property	Value
Domain	email
Signal name	newsletter_inbox
Output path	`domain=email/signal=newsletter_inbox/`
Glue table	`stratum_gold_email_{env}.newsletter_inbox` (note: separate Glue catalog DB, region-pinned to eu-west-1)
Source	`silver_email` ⟕ DynamoDB enrichment cache (`{project}-{env}-email-enrichment`). Silver itself is fed by SES inbound mail and the hourly RSS / Atom poller (Reuters, Fed/ECB/BoE press, FT, CNBC, etc. — see `docs/tech/ingestion-scheduling.md#rss-ingestion-phase-2`). RSS items land with `alias='rss'` and `source_type='rss'`.
Frequency	Daily, 00:30 UTC, partitioned by `alias, received_year, received_month`
ETL	`etl/external/email_gold.py`
Enrichment	`lambda/email_enricher/handler.py` — Bedrock (Nova Micro by default) + Titan Text Embeddings v2

Purpose

Each inbound newsletter (or, in upcoming phases, RSS / Reuters item) is parsed by an LLM into a typed feature row suitable for ML or signal generation: sentiment, ticker mentions, named entities, claim type, time horizon, plus a 1024-dim embedding for clustering and similarity search. The regex-based v1 (topic_tags, is_newsletter_likely, body_summary_preview) is fully replaced — see git history if you need to reference it.

Architecture

SES inbound → email_parser (Lambda) → silver_email (Glue Spark)
                                          │
                                          ▼
                          ┌─────────────────────────────┐
                          │  enrichment_pending_lister  │
                          │  Athena over last 7 days,   │
                          │  filter cache hits          │
                          └────────────┬────────────────┘
                                       │ batches of 25
                                       ▼
                          ┌─────────────────────────────┐
                          │  Step Functions Map         │
                          │  maxConcurrency=10          │
                          └────────────┬────────────────┘
                                       │
                                       ▼
                          ┌─────────────────────────────┐
                          │  email_enricher (Lambda)    │
                          │  Bedrock Converse           │
                          │  + Titan embeddings         │
                          │  → DynamoDB cache           │
                          └────────────┬────────────────┘
                                       │
                                       ▼
                              email_gold (Glue Spark)
                              silver ⟕ enrichment cache

Schema (selected columns; full list in `lib/email_enrichment_schema.py`)

Column	Type	Description
content_hash	string	`sha256(subject|raw_object_key|sender_email)`; stable id, joins to enrichment cache
received_at	timestamp	Best-effort parse of the Date: header
alias	string	Recipient bucket (newsletter / signals / research) — partition
sender, sender_email, sender_domain	string	Routing metadata
subject, link_count, links	various	Surface fields from silver
summary_short	string (≤200)	Human-readable one-liner
summary_long	string (≤800)	Detailed summary; embedding source
topic_primary	string (enum)	`equities, rates, fx, credit, commodities, crypto, macro, geopolitics, corporate, other`
topic_secondary	array	Free-text refinement, ≤8 entries
tickers	array	Uppercased equity / ETF / crypto tickers — direct join key to market data
entities	array<struct<type,name>>	Typed entities (`company, person, central_bank, country, sector, commodity, currency`)
sentiment	string (enum)	`bullish, bearish, neutral, mixed`
sentiment_score	double	Continuous in `[-1, 1]`
confidence	double	Author hedging level in `[0, 1]` — distinct from sentiment
time_horizon	string (enum)	`intraday, days, weeks, months, quarters, years, unspecified`
claim_type	string (enum)	`forecast, recommendation, observation, news_report, analysis, marketing, confirmation`
numerical_claims	array<struct<value,unit,context>>	Quantitative assertions extracted from the body
is_actionable	boolean	True iff the email contains a tradable view
is_confirmation	boolean	True iff this is a subscription-confirmation email
is_marketing	boolean	True iff primarily promotional
noise_score	double	`0=original analyst content, 1=boilerplate/aggregated news`
language	string	ISO-639-1
embedding	array (1024)	L2-normalised Titan v2 vector over `summary_long` — use cosine similarity
enrichment_status	string	`ok, truncated, parse_error, model_error, pending` (filter on this)
enrichment_model	string	Bedrock model id (cost / model-version provenance)
enrichment_version	int	Bumped when prompt/schema changes; older rows eligible for re-enrichment
enriched_at, tokens_in, tokens_out	various	Provenance / cost attribution

Interpretation

enrichment_status = 'ok' is the default filter for any analytics query. pending means the silver row hasn’t been enriched yet (possible during catch-up); other statuses indicate a model or parse failure.
tickers are the primary join key to yahoo_prices and any market-data table. They’re uppercased and deduped at validation time; trust them as exact join keys.
is_actionable=true AND noise_score < 0.3 narrows to “this is a real analyst view, not a news aggregator”. Useful as an ML label.
embedding + cosine similarity is how the upcoming digest builder will dedupe stories across newsletters/RSS — see Phase 3.
Sentiment vs. confidence: a bullish view at confidence=0.4 is hedged; a neutral view at confidence=0.9 is a confident “no edge here”. Don’t collapse them.
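
The filtering and similarity guidance above can be sketched in a few lines. Because the stored embeddings are L2-normalised, cosine similarity reduces to a dot product. The row dictionaries and function names below are hypothetical stand-ins for query results.

```python
import math


def l2_normalise(vec):
    """Scale a vector to unit length (stored embeddings already have this property)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def cosine_similarity(a, b):
    """Dot product; equals cosine similarity when both inputs are unit length."""
    return sum(x * y for x, y in zip(a, b))


def is_analyst_view(row):
    """Filter from the interpretation notes: enriched, actionable, low noise."""
    return (
        row["enrichment_status"] == "ok"
        and row["is_actionable"]
        and row["noise_score"] < 0.3
    )


# Tiny 2-dimensional stand-ins for the 1024-dimensional Titan vectors.
a = l2_normalise([3.0, 4.0])
b = l2_normalise([4.0, 3.0])
similarity = cosine_similarity(a, b)
```

A deduplication pass would compare `similarity` against a threshold; the right threshold is not specified in this document and would need tuning.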

Versioning

ENRICHMENT_VERSION (in lib/email_enrichment_schema.py) is the contract version between the prompt + schema and what’s persisted. Bumping it makes the pending-list Lambda re-enrich rows below the new version on the next scheduled run (within the 7-day lookback window). For a full backfill on a bump, run the lister manually with LOOKBACK_DAYS widened.

Downstream consumers

The newsletter_inbox table feeds the daily digest builder — see docs/finance/digest.md. Phase 3 reads last-24h rows where enrichment_status='ok', clusters by embedding similarity, scores against a watchlist, and delivers a Markdown briefing daily at 07:00 UTC.

Cost (rough estimate)

Athena Queries

-- Most actionable views in the last 7 days
SELECT received_at, sender_email, summary_short, sentiment_score, confidence, tickers
FROM stratum_gold_email_sandbox.newsletter_inbox
WHERE received_at >= current_timestamp - INTERVAL '7' DAY
  AND enrichment_status = 'ok'
  AND is_actionable
  AND noise_score < 0.3
ORDER BY confidence DESC;

-- Sentiment shift on a specific ticker
SELECT received_at, sender_domain, sentiment, sentiment_score, summary_short
FROM stratum_gold_email_sandbox.newsletter_inbox
WHERE enrichment_status = 'ok'
  AND contains(tickers, 'NVDA')
ORDER BY received_at DESC LIMIT 50;

-- Confirmation links pending click
SELECT received_at, subject, sender, links
FROM stratum_gold_email_sandbox.newsletter_inbox
WHERE enrichment_status = 'ok' AND is_confirmation;

Planned Signals

Signal	Domain	Status	Sources
dealer_gamma	positioning	Planned (needs SpotGamma or SqueezeMetrics subscription)	CBOE options data
macro_regime	macro	Planned	FRED (GDP, INDPRO, PAYEMS, UNRATE, UMCSENT)
inflation_liquidity	monetary	Planned	FRED (CPIAUCSL, M2SL, FEDFUNDS)
global_liquidity	monetary	Planned	FRED + ECB + BIS

Planned Research

Component	Status	GitHub Issue
ML signal combination (meta-model)	Planned	#1
Strategy backtesting framework	Planned	#2
