
Gold Layer Signals

Last updated: 2026-05-19


Overview

Gold signals are pre-computed, opinionated quantitative datasets built from one or more silver sources. They are designed for direct consumption in quant notebooks, backtesting engines, and the external API.

All gold signals are:

  • Partitioned by date (Parquet format)
  • Registered in Glue Catalog (stratum_gold_{env})
  • Lineage-tracked with cryptographic provenance
  • Tier-gated via Lake Formation

Signal: Yield Curve & Rates

| Property | Value |
| --- | --- |
| Domain | rates |
| Signal name | yield_curve |
| Output path | domain=rates/signal=yield_curve/ |
| Source | FRED silver (single-source) |
| Frequency | Daily |
| ETL | etl/external/rates_gold.py |

Input Series

| Series | Description | Frequency |
| --- | --- | --- |
| DGS30 | 30-Year Treasury Constant Maturity Rate | daily |
| DGS10 | 10-Year Treasury Constant Maturity Rate | daily |
| DGS2 | 2-Year Treasury Constant Maturity Rate | daily |
| T10Y2Y | 10Y minus 2Y Treasury (yield curve spread) | daily |
| DFF | Federal Funds Effective Rate | daily |
| FEDFUNDS | Federal Funds Rate | monthly |

Output Schema

| Column | Type | Description |
| --- | --- | --- |
| date | timestamp | Observation date |
| DGS30 | decimal | 30-year yield |
| DGS10 | decimal | 10-year yield |
| DGS2 | decimal | 2-year yield |
| T10Y2Y | decimal | 10Y-2Y spread (raw FRED) |
| DFF | decimal | Fed funds effective rate |
| FEDFUNDS | decimal | Fed funds rate (monthly, forward-filled) |
| dgs10_dgs2_spread | decimal | Computed 10Y-2Y spread |
| dgs30_dgs10_spread | decimal | 30Y-10Y long-end term premium |
| is_inverted | boolean | True when T10Y2Y < 0 |
| t10y2y_ma20 | decimal | 20-day moving average of T10Y2Y |
| t10y2y_ma60 | decimal | 60-day moving average of T10Y2Y |
| dgs10_momentum_20d | decimal | 20-day rate change in DGS10 |
| dgs10_momentum_60d | decimal | 60-day rate change in DGS10 |
| dgs30_momentum_20d | decimal | 20-day rate change in DGS30 |
| dgs30_momentum_60d | decimal | 60-day rate change in DGS30 |
| dff_delta_5d | decimal | 5-day change in fed funds rate |
| fed_cycle | string | “hiking” / “cutting” / “hold” |

Interpretation

  • is_inverted = True historically precedes recessions by 6-18 months
  • fed_cycle tracks Fed tightening/easing direction (useful for duration positioning)
  • dgs30_dgs10_spread widening indicates market repricing long-end duration risk (term premium increase)
  • dgs30_momentum_60d > 50bps signals accelerating long-term borrowing costs — historically precedes forced deleveraging in margin-heavy environments
  • momentum columns indicate speed of rate moves (convexity risk)
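The derived columns in this signal are simple transforms of the input series. The sketch below illustrates the spread, inversion flag, trailing moving average, and momentum calculations in plain Python. The helper names, the toy yields, and the shortened windows (3 and 2 rows instead of 20 and 60) are illustrative assumptions; the actual logic lives in etl/external/rates_gold.py and may differ. The rule behind fed_cycle is not documented here, so it is left out.

```python
from statistics import mean


def rolling_mean(values, window):
    """Trailing mean; None until a full window of observations is available."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(mean(values[i + 1 - window : i + 1]))
    return out


def momentum(values, lag):
    """Change versus the value `lag` rows earlier; None where history is short."""
    return [None if i < lag else values[i] - values[i - lag] for i in range(len(values))]


# Toy yields in percent (hypothetical values, not real FRED data).
dgs10 = [4.20, 4.25, 4.10, 4.05, 4.00]
dgs2 = [4.10, 4.20, 4.15, 4.12, 4.05]

# dgs10_dgs2_spread: computed 10Y-2Y spread.
spread = [round(a - b, 4) for a, b in zip(dgs10, dgs2)]

# The schema derives is_inverted from T10Y2Y; the computed spread is a stand-in here.
is_inverted = [s < 0 for s in spread]

spread_ma3 = rolling_mean(spread, 3)  # stand-in for t10y2y_ma20
dgs10_mom2 = momentum(dgs10, 2)       # stand-in for dgs10_momentum_20d
```

Returning None for rows without enough history keeps short-history rows distinguishable from genuine zeros; whether the ETL emits nulls or partial-window values is not stated in this page.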

Signal: Risk Sentiment Composite

| Property | Value |
| --- | --- |
| Domain | sentiment |
| Signal name | risk_composite |
| Output path | domain=sentiment/signal=risk_composite/ |
| Sources | FRED + Yahoo Finance (cross-source) |
| Frequency | Daily |
| ETL | etl/external/risk_sentiment_gold.py |

Input Series

| Series | Source | Description | Frequency |
| --- | --- | --- | --- |
| VIXCLS | FRED | CBOE Volatility Index | daily |
| DTWEXBGS | FRED | Trade Weighted US Dollar Index | daily |
| UMCSENT | FRED | U. Michigan Consumer Sentiment | monthly |
| SPY | Yahoo | S&P 500 ETF price | daily |
| HYG | Yahoo | High Yield Corporate Bond ETF price | daily |

Output Schema

| Column | Type | Description |
| --- | --- | --- |
| date | timestamp | Trading date |
| VIXCLS | decimal | VIX level |
| DTWEXBGS | decimal | Dollar index level |
| UMCSENT | decimal | Consumer sentiment (forward-filled) |
| SPY | decimal | S&P 500 ETF close |
| HYG | decimal | High yield bond ETF close |
| spy_momentum_20d | decimal | SPY 20-day return |
| hyg_spy_ratio | decimal | HYG/SPY price ratio |
| VIXCLS_zscore | decimal | VIX rolling 1Y z-score |
| DTWEXBGS_zscore | decimal | Dollar rolling 1Y z-score |
| UMCSENT_zscore | decimal | Sentiment rolling 1Y z-score |
| spy_momentum_20d_zscore | decimal | SPY momentum z-score |
| hyg_spy_ratio_zscore | decimal | Credit appetite z-score |
| risk_score | decimal | Composite (-1 to +1 range typical) |
| risk_score_smoothed | decimal | 5-day smoothed composite |
| regime | string | “risk_on” / “neutral” / “risk_off” |

Interpretation

  • risk_score > 0.5 → broad risk appetite (equities favored, vol suppressed, credit tight)
  • risk_score < -0.5 → defensive positioning (elevated vol, credit stress, weak momentum)
  • regime provides a discrete label for rule-based strategies
  • Use risk_score_smoothed to filter out single-day noise

Cross-Source Join Notes

See Cross-Source Joins for the alignment methodology. Key points:

  • Yahoo (SPY) defines the output calendar (NYSE trading days)
  • UMCSENT (monthly) is forward-filled to daily
  • All components z-scored over trailing 252 days before combining
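As a rough illustration of the "z-score, then combine" step, the sketch below computes a trailing z-score and averages sign-adjusted components into a composite. The sign convention (VIX and the dollar index inverted), the equal weighting, the use of the sample standard deviation, and the mapping of the ±0.5 thresholds onto regime labels are all assumptions made for the example; the page does not specify them, and etl/external/risk_sentiment_gold.py is the authority.

```python
from statistics import mean, stdev


def trailing_zscore(values, window=252):
    """Z-score of each value against its own trailing window (inclusive).

    Returns None until a full window exists or when the window has zero variance.
    """
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
            continue
        hist = values[i + 1 - window : i + 1]
        sd = stdev(hist)
        out.append(None if sd == 0 else (values[i] - mean(hist)) / sd)
    return out


# Assumed orientation: +1 means "higher is risk-on", -1 means "higher is risk-off".
SIGNS = {
    "VIXCLS": -1,
    "DTWEXBGS": -1,
    "UMCSENT": 1,
    "spy_momentum_20d": 1,
    "hyg_spy_ratio": 1,
}


def risk_score(zscores):
    """Equal-weighted mean of sign-adjusted component z-scores (assumed weighting)."""
    return mean(SIGNS[name] * z for name, z in zscores.items())


def regime(score, threshold=0.5):
    """Discrete label using the ±0.5 levels quoted in the interpretation notes."""
    if score > threshold:
        return "risk_on"
    if score < -threshold:
        return "risk_off"
    return "neutral"
```

For example, `regime(risk_score({"VIXCLS": -1.0, "DTWEXBGS": -0.5, "UMCSENT": 0.5, "spy_momentum_20d": 1.0, "hyg_spy_ratio": 1.0}))` evaluates to `"risk_on"` under these assumptions.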

Signal: Credit & Leverage

| Property | Value |
| --- | --- |
| Domain | credit |
| Signal name | credit_leverage |
| Output path | domain=credit/signal=credit_leverage/ |
| Sources | FRED + FINRA + TreasuryDirect (cross-source) |
| Frequency | Daily (margin debt forward-filled from monthly, auctions from event dates) |
| ETL | etl/external/credit_leverage_gold.py |

Input Series

| Series | Source | Description | Frequency |
| --- | --- | --- | --- |
| BAMLH0A0HYM2 | FRED | ICE BofA US High Yield OAS | daily |
| BAMLC0A4CBBB | FRED | ICE BofA BBB US Corporate OAS | daily |
| margin_debt | FINRA | Total debit balances in margin accounts | monthly |
| free_credit_cash | FINRA | Free credit in cash accounts | monthly |
| free_credit_margin | FINRA | Free credit in margin accounts | monthly |
| Bond auctions | TreasuryDirect | 20-30yr Treasury auction results (bid-to-cover) | event-driven |

Output Schema

| Column | Type | Description |
| --- | --- | --- |
| date | timestamp | Trading date |
| BAMLH0A0HYM2 | decimal | High yield OAS (bps) |
| BAMLC0A4CBBB | decimal | BBB corporate OAS (bps) |
| hy_bbb_spread_diff | decimal | HY minus BBB spread (flight-from-junk indicator) |
| margin_debt | decimal | Total margin debt (millions USD, forward-filled) |
| margin_debt_mom_1m | decimal | 1-month margin debt change |
| margin_debt_mom_3m | decimal | 3-month margin debt change |
| margin_debt_pct_chg_3m | decimal | 3-month margin debt % change |
| net_credit_balance | decimal | Free credit minus margin debt |
| bond_bid_to_cover | decimal | Latest 30yr auction bid-to-cover ratio |
| bond_bid_to_cover_ma5 | decimal | 5-auction moving avg bid-to-cover |
| BAMLH0A0HYM2_zscore | decimal | HY spread rolling 1Y z-score |
| BAMLC0A4CBBB_zscore | decimal | BBB spread rolling 1Y z-score |
| hy_bbb_spread_diff_zscore | decimal | HY-BBB differential z-score |
| margin_debt_pct_chg_3m_zscore | decimal | Margin growth z-score |
| credit_stress_score | decimal | Composite stress score (avg of z-scores) |
| credit_stress_smoothed | decimal | 5-day smoothed stress score |
| credit_regime | string | “stress” / “elevated” / “neutral” / “benign” |
| hy_spread_momentum_20d | decimal | 20-day HY spread change |
| bbb_spread_momentum_20d | decimal | 20-day BBB spread change |

Interpretation

  • credit_stress_score > 1.0 → credit markets in stress (spreads widening + leverage growing = reflexive risk)
  • credit_regime = “stress” → historically precedes equity drawdowns by 1-4 weeks
  • hy_bbb_spread_diff widening → flight from junk, risk appetite collapsing (investors moving up the quality ladder)
  • margin_debt_pct_chg_3m > 5% combined with dgs30_momentum_60d > 50bps (from yield_curve signal) → the leverage-meets-rising-rates scenario where forced deleveraging becomes likely
  • bond_bid_to_cover falling → supply pressure on long-end Treasuries, market struggling to absorb issuance

Cross-Source Join Notes

  • Corporate spreads (FRED) define the output calendar (daily business days)
  • FINRA margin data (monthly) is forward-filled to daily
  • Treasury auction data is joined on auction date; between auctions the last value carries forward
  • All z-scored over trailing 252 days before composite aggregation
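The forward-fill step described above can be sketched as an as-of lookup: for each date on the output calendar, take the most recent lower-frequency observation dated on or before it. The function name and the toy margin-debt figures are hypothetical. The sketch keys on the observation date; whether the pipeline instead keys on the publication date (which matters for avoiding lookahead in backtests) is not stated on this page.

```python
from bisect import bisect_right
from datetime import date


def forward_fill(calendar, observations):
    """Map each calendar date to the latest observation dated on or before it.

    `observations` is a dict of {observation_date: value}; dates with no prior
    observation map to None.
    """
    obs_dates = sorted(observations)
    filled = {}
    for day in calendar:
        idx = bisect_right(obs_dates, day)  # count of observations dated <= day
        filled[day] = observations[obs_dates[idx - 1]] if idx else None
    return filled


# Hypothetical monthly margin-debt prints (millions USD) on a daily calendar.
margin_debt = {date(2025, 12, 31): 900_000.0, date(2026, 1, 31): 950_000.0}
calendar = [date(2026, 1, 29), date(2026, 1, 30), date(2026, 2, 2)]
daily_margin_debt = forward_fill(calendar, margin_debt)
```

The same lookup covers the auction case: between auctions, the last bid-to-cover value carries forward.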

Research Layer: Signal Validation

| Property | Value |
| --- | --- |
| Output path | validation/signal={signal_name}/ and validation/scorecard/ |
| Sources | All gold signals + Yahoo silver (SPY) |
| Frequency | Weekly (Sunday 08:00 UTC) |
| ETL | etl/research/walk_forward_validation.py |

What It Does

Runs expanding-window walk-forward validation on every gold signal column against SPY forward returns (5d, 20d, 60d horizons). Answers: “does this signal actually predict anything, and is it still working?”

Methodology

  1. Expanding window: Starts with 252 days of history, tests on next 63 days, slides forward by 21 days
  2. Information Coefficient (IC): Spearman rank correlation between signal value and realized forward return
  3. Hit Rate: % of days where signal direction matched return direction
  4. t-statistic: Statistical significance of IC (threshold: |t| > 1.96)
  5. Regime-conditional: Splits IC computation by regime column (fed_cycle, credit_regime, risk regime) to detect “this signal only works in hiking cycles”
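The windowing and IC steps above can be sketched in plain Python as follows. Function names are hypothetical, and the t-statistic shown is the standard one for a single correlation coefficient with n − 2 degrees of freedom; the page does not say whether etl/research/walk_forward_validation.py uses that form or a t-statistic of the mean IC across windows, so treat it as an assumption.

```python
import math
from statistics import mean


def walk_forward_windows(n_obs, min_train=252, test_len=63, step=21):
    """Yield (train, test) index ranges; train always starts at 0 (expanding)."""
    train_end = min_train
    while train_end + test_len <= n_obs:
        yield (0, train_end), (train_end, train_end + test_len)
        train_end += step


def average_ranks(values):
    """1-based ranks, with ties sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def spearman_ic(signal, forward_returns):
    """Spearman rank correlation, computed as the Pearson correlation of ranks."""
    x, y = average_ranks(signal), average_ranks(forward_returns)
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")  # a constant series has no defined rank correlation
    return cov / math.sqrt(var_x * var_y)


def ic_t_stat(ic, n_obs):
    """t-statistic of a correlation coefficient (assumed form, n_obs - 2 dof)."""
    if abs(ic) >= 1.0:
        return math.copysign(math.inf, ic)
    return ic * math.sqrt((n_obs - 2) / (1.0 - ic * ic))
```

With the documented defaults, 357 observations produce three test windows: rows 252-314, 273-335, and 294-356. Note that with a 63-day test window and a 21-day step, consecutive test windows overlap, so per-window ICs are not independent samples.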

Output: Signal Scorecard

| Column | Description |
| --- | --- |
| signal_name | Signal column being validated |
| horizon_days | Forward return horizon (5, 20, or 60 days) |
| latest_ic | IC in most recent test window |
| latest_hit_rate | Hit rate in most recent window |
| avg_ic | Average IC across all historical windows |
| ic_volatility | Std dev of IC across windows (stability) |
| n_windows | Number of test windows evaluated |
| signal_health | “active” (IC > 0.03 and significant) / “weakening” (avg IC > 0.02 but latest not significant) / “dead” |

Interpretation

  • signal_health = “active” → signal is currently predictive, safe to use in strategies
  • signal_health = “weakening” → historically worked but losing edge; monitor closely
  • signal_health = “dead” → no evidence of predictive power; do not trade on this signal
  • Regime-conditional IC reveals whether a signal only works in specific environments (e.g., credit_stress_score may be highly predictive in hiking cycles but useless during holds)
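The signal_health labels can be read as a small decision rule. The sketch below is one plausible reading of the scorecard definition, assuming that "IC" in the "active" rule refers to latest_ic and that significance means |t| > 1.96 on the latest window; the function name and argument list are hypothetical.

```python
def signal_health(latest_ic, latest_t, avg_ic, t_threshold=1.96):
    """Classify a signal using the thresholds quoted in the scorecard table."""
    significant = abs(latest_t) > t_threshold
    if latest_ic > 0.03 and significant:
        return "active"
    if avg_ic > 0.02 and not significant:
        return "weakening"
    return "dead"
```

Under this reading, a signal with a large and significant negative IC is labelled "dead" even though it carries information with the opposite sign, so inverse relationships need to be checked separately.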

Signal: Energy Volatility

| Property | Value |
| --- | --- |
| Domain | energy |
| Signal name | energy_volatility |
| Output path | domain=energy/signal=energy_volatility/ |
| Sources | EIA (single-source) |
| Frequency | Daily |
| ETL | etl/external/energy_volatility_gold.py |

Input Series

| Series | Source | Description | Frequency |
| --- | --- | --- | --- |
| henry_hub_gas | EIA | Henry Hub Natural Gas Spot Price ($/MMBtu) | daily |
| wti_crude | EIA | WTI Crude Oil Spot Price ($/barrel) | daily |
| brent_crude | EIA | Brent Crude Oil Spot Price ($/barrel) | daily |

Output Schema

| Column | Type | Description |
| --- | --- | --- |
| date | timestamp | Trading date |
| henry_hub_price | decimal | Natural gas spot price |
| wti_price | decimal | WTI crude spot price |
| brent_price | decimal | Brent crude spot price |
| gas_return_1d | decimal | Daily log return on gas |
| oil_return_1d | decimal | Daily log return on WTI |
| gas_vol_20d | decimal | 20-day realized volatility (gas, annualized) |
| gas_vol_60d | decimal | 60-day realized volatility (gas, annualized) |
| oil_vol_20d | decimal | 20-day realized volatility (oil, annualized) |
| oil_vol_60d | decimal | 60-day realized volatility (oil, annualized) |
| gas_oil_ratio | decimal | Henry Hub / WTI price ratio |
| gas_oil_ratio_zscore | decimal | Z-score of gas/oil ratio (252d window) |
| gas_momentum_20d | decimal | 20-day % change in gas price |
| oil_momentum_20d | decimal | 20-day % change in oil price |
| energy_vol_score | decimal | Composite energy volatility score |
| energy_vol_smoothed | decimal | 5-day smoothed score |
| energy_regime | string | “spike” / “elevated” / “normal” / “suppressed” |

Interpretation

  • energy_vol_score > 1.5 → extreme energy volatility event (spike)
  • energy_regime = “spike” historically coincides with equity drawdowns, particularly in energy-exposed sectors
  • gas_oil_ratio divergence (high z-score) signals energy substitution dynamics or supply dislocations
  • Validation target: XLE (Energy Select SPDR) — energy vol spikes predict negative XLE forward returns
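For reference, annualized realized volatility is conventionally the standard deviation of daily log returns over the window, scaled by the square root of the number of trading periods per year. The sketch below follows that convention; the 252-day annualization factor, the sample standard deviation, and the function names are assumptions, since the page does not specify how etl/external/energy_volatility_gold.py annualizes.

```python
import math
from statistics import stdev


def log_returns(prices):
    """Daily log returns ln(P_t / P_{t-1}); one element shorter than `prices`."""
    return [math.log(b / a) for a, b in zip(prices, prices[1:])]


def realized_vol(returns, window=20, periods_per_year=252):
    """Trailing annualized volatility; None until a full window is available."""
    out = []
    for i in range(len(returns)):
        if i + 1 < window:
            out.append(None)
        else:
            sample = returns[i + 1 - window : i + 1]
            out.append(stdev(sample) * math.sqrt(periods_per_year))
    return out
```

Spot energy prices can be zero or negative on rare occasions (as WTI was in April 2020), in which case a log return is undefined; how the ETL handles such rows is not documented here.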

Signal: Insurance Risk

| Property | Value |
| --- | --- |
| Domain | insurance |
| Signal name | insurance_risk |
| Output path | domain=insurance/signal=insurance_risk/ |
| Sources | FRED + FEMA (cross-source) |
| Frequency | Daily (monthly components forward-filled) |
| ETL | etl/external/insurance_risk_gold.py |

Input Series

| Series | Source | Description | Frequency |
| --- | --- | --- | --- |
| DGS10 | FRED | 10-Year Treasury Constant Maturity Rate | daily |
| MORTGAGE30US | FRED | 30-Year Fixed Rate Mortgage Average | weekly |
| CPIAUCSL | FRED | CPI All Urban Consumers | monthly |
| all_disasters | FEMA | Disaster declarations (all types) | event-driven |

Output Schema

| Column | Type | Description |
| --- | --- | --- |
| date | timestamp | Trading date |
| DGS10 | decimal | 10-year Treasury rate |
| MORTGAGE30US | decimal | 30-year mortgage rate (forward-filled) |
| CPIAUCSL | decimal | CPI level (forward-filled) |
| disaster_count_monthly | int | Monthly disaster declarations (forward-filled) |
| disaster_count_3m | int | Trailing 3-month disaster count |
| disaster_count_12m | int | Trailing 12-month disaster count |
| cpi_yoy_pct | decimal | CPI year-over-year % change |
| disaster_frequency_zscore | decimal | Z-score of 12m disaster count |
| rate_environment_score | decimal | Z-score of DGS10 (high = good for insurers) |
| inflation_score | decimal | Z-score of CPI YoY (high = bad for insurers) |
| insurance_risk_score | decimal | Composite (positive = favorable for insurers) |
| insurance_risk_smoothed | decimal | 5-day smoothed composite |
| insurance_regime | string | “favorable” / “neutral” / “stressed” |

Interpretation

  • insurance_regime = “favorable” → rising rates + low catastrophe frequency + stable inflation (bullish insurers)
  • insurance_regime = “stressed” → falling rates + elevated disasters + rising inflation (bearish insurers)
  • Insurers’ earnings are mechanically driven by these factors: investment income (rates), claims (catastrophes), reserve adequacy (inflation)
  • Validation target: KIE (S&P Insurance ETF) — composite should predict KIE forward returns at 20-60d horizons

Cross-Source Join Notes

  • FRED daily rates define the output calendar
  • MORTGAGE30US (weekly) and CPIAUCSL (monthly) are forward-filled to daily
  • FEMA disaster counts are aggregated to monthly, then forward-filled to daily
  • All components z-scored over trailing 252 days before composite aggregation

Signal: Institutional Flow

| Property | Value |
| --- | --- |
| Domain | institutional |
| Signal name | institutional_flow |
| Output path | domain=institutional/signal=institutional_flow/ |
| Sources | SEC EDGAR 13F (single-source) |
| Frequency | Quarterly (updated weekly as new filings arrive) |
| ETL | etl/external/institutional_flow_gold.py |

Input

| Source | Description | Frequency |
| --- | --- | --- |
| SEC 13F-HR filings | Holdings of institutional managers with >$100M AUM | Quarterly (45-day lag) |

Output Schema

| Column | Type | Description |
| --- | --- | --- |
| cusip | string | Security identifier |
| issuer_name | string | Security name |
| filing_year | int | Filing year |
| filing_quarter | int | Filing quarter (1-4) |
| filing_period | string | E.g., “2026-Q1” |
| holder_count | int | Number of institutional holders |
| new_entries | int | Managers initiating new positions this quarter |
| full_exits | int | Managers fully exiting this quarter |
| significant_increases | int | Managers increasing >25% |
| significant_decreases | int | Managers decreasing >25% |
| total_institutional_shares | long | Aggregate shares held |
| total_institutional_value_thousands | long | Aggregate market value ($K) |
| net_shares_delta | long | Net share change vs. prior quarter |
| net_value_delta_thousands | long | Net value change ($K) |
| avg_portfolio_weight_pct | decimal | Mean portfolio weight across holders |
| max_portfolio_weight_pct | decimal | Highest single-manager weight |
| net_buyer_ratio | decimal | 0-1 (1 = all buying, 0 = all selling) |
| herding_score | decimal | 0-1 (1 = completely one-sided flow) |
| conviction_score | decimal | Weight × holder count (higher = more concentrated interest) |
| holder_count_delta | int | Change in number of holders vs. prior quarter |
| flow_zscore | decimal | Z-score of net_value_delta within the quarter |
| institutional_flow_score | decimal | Composite flow signal |
| flow_regime | string | “strong_accumulation” / “mild_accumulation” / “neutral” / “mild_distribution” / “strong_distribution” |

Interpretation

  • flow_regime = “strong_accumulation” → many managers buying simultaneously (herding + net buying). Historically signals informed capital allocation.
  • herding_score > 0.7 → highly one-sided institutional flow (either strongly buying or selling). Directional conviction is high.
  • holder_count_delta increasing + net_buyer_ratio > 0.6 → broadening institutional interest (new managers entering)
  • flow_zscore > 2 → security receiving outsized capital relative to peers this quarter
  • conviction_score identifies securities where managers are making concentrated bets (high weight × many holders)

Caveats

  • 13F data has a 45-day reporting lag. By the time a filing is public, the trade is 1.5-4.5 months old.
  • The signal’s value comes from (a) identifying persistent trends that continue past the disclosure date, and (b) detecting herd behavior early in the filing window when only some managers have reported.
  • Not all position changes are directional bets — some are index rebalancing, tax-loss harvesting, or fund inflows/outflows.
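To make the flow columns concrete, the sketch below shows formulas that are consistent with the documented ranges of net_buyer_ratio and herding_score, plus a cross-sectional z-score of the kind described for flow_zscore. These formulas are assumptions chosen to match the schema descriptions (for instance, herding as the distance of the buyer ratio from an even split); they are not taken from etl/external/institutional_flow_gold.py.

```python
from statistics import mean, stdev


def net_buyer_ratio(n_buyers, n_sellers):
    """Share of active managers that were buyers: 1 = all buying, 0 = all selling."""
    total = n_buyers + n_sellers
    return None if total == 0 else n_buyers / total


def herding_score(ratio):
    """Assumed form: 0 for an even buyer/seller split, 1 for fully one-sided flow."""
    return abs(2.0 * ratio - 1.0)


def cross_sectional_zscore(values):
    """Z-score each security's value against all securities in the same quarter."""
    mu, sd = mean(values), stdev(values)
    return [0.0 if sd == 0 else (v - mu) / sd for v in values]
```

Note that herding_score is symmetric by construction in this form: a ratio of 0.1 and a ratio of 0.9 both score 0.8, which matches the interpretation note that a high score can mean either strong buying or strong selling.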

Signal: COT Positioning

| Property | Value |
| --- | --- |
| Domain | positioning |
| Signal name | cot_positioning |
| Output path | domain=positioning/signal=cot_positioning/ |
| Source | CFTC COT silver (single-source) |
| Frequency | Weekly (Friday) |
| ETL | etl/external/cot_positioning_gold.py |

Input

CFTC Disaggregated Futures-Only report — all ~270 reported contracts. Key positioning categories: managed money (hedge funds/CTAs), producer/merchant (commercials), swap dealers, other reportables.

Output Schema

| Column | Type | Description |
| --- | --- | --- |
| report_date | date | Tuesday as-of date for the report |
| market_and_exchange_names | string | Contract name + exchange (e.g., “GOLD - COMMODITY EXCHANGE INC.”) |
| cftc_contract_market_code | string | Unique contract identifier |
| open_interest_all | int | Total open interest |
| m_money_long_all | int | Managed money gross long contracts |
| m_money_short_all | int | Managed money gross short contracts |
| m_money_spread_all | int | Managed money spread contracts |
| m_money_net_all | int | Managed money net (long - short) |
| m_money_net_pct_oi | double | Net managed money as % of open interest |
| commercial_net_all | int | Producer/merchant net position |
| commercial_net_pct_oi | double | Commercial net as % of open interest |
| spec_commercial_divergence | double | Spec net% - commercial net% (contrarian signal) |
| positioning_zscore_26w | double | Z-score of net managed money (26-week lookback) |
| positioning_zscore_52w | double | Z-score of net managed money (52-week lookback) |
| net_change_1w | int | Week-over-week change in net managed money |
| net_change_4w | int | 4-week change in net managed money |
| positioning_momentum | string | accelerating_long/short, reversing_to_long/short, flat |
| crowding_score | double | Top-4 trader concentration asymmetry |
| positioning_regime | string | extreme_long/stretched_long/neutral/stretched_short/extreme_short |

Interpretation

  • positioning_zscore_26w > 2.0 → “extreme_long” — trade is maximally crowded on the long side. Historically mean-reverting. Contrarian sell signal.
  • positioning_zscore_26w < -2.0 → “extreme_short” — specs are max short. Mean-reverting. Contrarian buy signal.
  • spec_commercial_divergence — when specs and commercials disagree strongly, commercials (who have physical exposure and structural information edge) tend to be right over 2-8 week horizons.
  • positioning_momentum = “reversing_to_long/short” — early detection of a positioning unwind. The weekly change direction has flipped vs the 4-week trend.
  • crowding_score — high values mean the top 4 traders dominate one side, making the market vulnerable to forced unwinds.

Key Contracts for Macro Trading

| Contract Code | Market | Signal Relevance |
| --- | --- | --- |
| 088691 | Gold (COMEX) | Rates/inflation hedge positioning |
| 06765T | 10-Year T-Note (CBOT) | Rates/duration positioning |
| 13874+ | E-mini S&P 500 (CME) | Equity risk appetite |
| 023651 | Crude Oil WTI (NYMEX) | Energy/inflation positioning |
| 023391 | Nat Gas (NYMEX) | Energy supply/demand |
| 096742 | EUR/USD (CME) | Dollar strength |
| 099741 | VIX (CBOE) | Volatility/tail-risk positioning |

Caveats

  • COT data has a 3-day lag: published Friday, as-of Tuesday. Intraweek positioning changes are not captured.
  • Managed money category includes both trend-followers (CTAs) and discretionary macro funds — these may trade opposite directions.
  • Extreme positioning can persist for weeks before mean-reverting. Z-score > 2 is a necessary but not sufficient condition for reversal.
  • The signal works best at the 2-8 week horizon. Not useful for intraday or next-day trading.
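The positioning columns reduce to a few arithmetic steps, sketched below. The ±2.0 extreme thresholds come from the interpretation notes; the ±1.0 cut-off for the "stretched" labels and the function names are assumptions, since this page does not document them.

```python
def net_pct_of_oi(long_contracts, short_contracts, open_interest):
    """Net position (long - short) as a percentage of total open interest."""
    return 100.0 * (long_contracts - short_contracts) / open_interest


def spec_commercial_divergence(spec_net_pct, commercial_net_pct):
    """Spec net% minus commercial net%, as described in the schema."""
    return spec_net_pct - commercial_net_pct


def positioning_regime(zscore, extreme=2.0, stretched=1.0):
    """Map a positioning z-score to a regime label.

    `extreme` follows the documented ±2.0 levels; `stretched` is an assumed value.
    """
    if zscore > extreme:
        return "extreme_long"
    if zscore > stretched:
        return "stretched_long"
    if zscore < -extreme:
        return "extreme_short"
    if zscore < -stretched:
        return "stretched_short"
    return "neutral"
```

For example, 150,000 long and 50,000 short managed-money contracts on 400,000 contracts of open interest give `net_pct_of_oi(150_000, 50_000, 400_000) == 25.0`.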

Signal: Short Squeeze Potential

| Property | Value |
| --- | --- |
| Domain | positioning |
| Signal name | short_squeeze |
| Output path | domain=positioning/signal=short_squeeze/ |
| Source | FINRA Consolidated Short Interest silver |
| Frequency | Twice monthly (15th and end of month) |
| ETL | etl/external/short_squeeze_gold.py |

Output Schema

| Column | Type | Description |
| --- | --- | --- |
| settlement_date | date | FINRA settlement date |
| symbol_code | string | Ticker symbol |
| issue_name | string | Company name |
| current_short_position | long | Total shares sold short |
| previous_short_position | long | Prior settlement’s short position |
| avg_daily_volume | long | 20-day average daily volume |
| days_to_cover | double | Short position / avg daily volume |
| si_change | long | Absolute change in short position |
| si_change_pct | double | Percentage change in SI |
| dtc_zscore | double | Z-score of days-to-cover (6-month lookback) |
| squeeze_score | double | Composite squeeze potential score |
| squeeze_regime | string | high_squeeze_potential/building_pressure/covering_in_progress/low_risk/neutral |

Interpretation

  • days_to_cover > 5 + si_change_pct > 10% → “high_squeeze_potential” — heavily shorted, getting worse, would take many days to cover
  • si_change_pct < -10% + days_to_cover > 3 → “covering_in_progress” — shorts are exiting, forced buying likely accelerating
  • dtc_zscore > 2 — historically extreme short interest relative to the stock’s own history
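The two regime rules spelled out above translate directly into code. The sketch below implements only those two rules; the conditions for building_pressure, low_risk, and neutral are not documented on this page, so the function returns None for anything else rather than guessing. Function names are hypothetical.

```python
def days_to_cover(short_position, avg_daily_volume):
    """Shares sold short divided by average daily volume."""
    return short_position / avg_daily_volume


def si_change_pct(current_short, previous_short):
    """Percentage change in short interest versus the prior settlement."""
    return 100.0 * (current_short - previous_short) / previous_short


def squeeze_regime(dtc, change_pct):
    """Apply the two documented rules; other regimes are left unclassified."""
    if dtc > 5 and change_pct > 10:
        return "high_squeeze_potential"
    if change_pct < -10 and dtc > 3:
        return "covering_in_progress"
    return None  # remaining regime rules are not specified in this page
```

A stock with 6,000,000 shares short (up from 5,000,000) and 1,000,000 shares of average daily volume has 6.0 days to cover and a 20% increase in short interest, which satisfies the first rule.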

Signal: ETF Rotation

| Property | Value |
| --- | --- |
| Domain | positioning |
| Signal name | etf_rotation |
| Output path | domain=positioning/signal=etf_rotation/ |
| Source | Yahoo Finance silver (sector ETFs) |
| Frequency | Daily |
| ETL | etl/external/etf_rotation_gold.py |

Input ETFs

| Category | Tickers | Purpose |
| --- | --- | --- |
| Cyclical | XLK, XLY, XLI, XLF, XLB | Risk-on sectors |
| Defensive | XLU, XLP, XLV, XLRE | Risk-off/safety sectors |
| Risk assets | SPY, QQQ, IWM, HYG | Broad risk appetite |
| Safe havens | TLT, IEF, SHY, GLD | Flight-to-safety flow |

Output Schema

| Column | Type | Description |
| --- | --- | --- |
| date | date | Trading date |
| ticker | string | ETF symbol |
| dollar_volume | double | Close price x volume (daily dollar flow proxy) |
| relative_volume | double | Dollar volume / 20-day average (>1 = above-average flow) |
| return_5d/20d/60d | double | Multi-horizon momentum |
| risk_appetite_rvol | double | Cyclical avg relative volume - defensive avg relative volume |
| risk_appetite_momentum | double | Risk asset 5d return - safe haven 5d return |
| rotation_regime | string | strong_risk_on/mild_risk_on/neutral/mild_risk_off/strong_risk_off |

Interpretation

  • risk_appetite_rvol > 0.3 + risk_appetite_momentum > 1.0 → “strong_risk_on” — institutional money flowing aggressively into cyclicals
  • risk_appetite_rvol < -0.3 + risk_appetite_momentum < -1.0 → “strong_risk_off” — flight to safety in progress
  • High relative_volume (>2.0) on individual sector ETFs signals large institutional position building in that sector
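The rotation metrics can be sketched as below: relative volume per ETF, the cyclical-minus-defensive difference, and the two "strong" regime rules quoted above. The ticker groupings come from the Input ETFs table. Whether the 20-day average includes the current day, and the thresholds for the mild_* and neutral regimes, are not documented, so the first is an assumption and the second is left unclassified.

```python
from statistics import mean

CYCLICAL = ["XLK", "XLY", "XLI", "XLF", "XLB"]
DEFENSIVE = ["XLU", "XLP", "XLV", "XLRE"]


def relative_volume(dollar_volumes, window=20):
    """Latest dollar volume over the mean of the trailing window (current day included)."""
    recent = dollar_volumes[-window:]
    return dollar_volumes[-1] / mean(recent)


def risk_appetite_rvol(rvol_by_ticker):
    """Cyclical average relative volume minus defensive average relative volume."""
    cyclical = mean(rvol_by_ticker[t] for t in CYCLICAL)
    defensive = mean(rvol_by_ticker[t] for t in DEFENSIVE)
    return cyclical - defensive


def rotation_regime(rvol_diff, momentum_diff):
    """Apply the two documented 'strong' rules; other regimes are left unclassified."""
    if rvol_diff > 0.3 and momentum_diff > 1.0:
        return "strong_risk_on"
    if rvol_diff < -0.3 and momentum_diff < -1.0:
        return "strong_risk_off"
    return None  # mild_* and neutral thresholds are not specified in this page
```

The momentum threshold of 1.0 suggests risk_appetite_momentum is expressed in percentage points rather than as a decimal fraction, though the page does not state the unit.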

Signal: Yahoo Price Analytics

| Property | Value |
| --- | --- |
| Domain | prices |
| Signal name | yahoo_prices |
| Output path | domain=prices/signal=yahoo_prices/ |
| Glue table | stratum_gold_{env}.yahoo_prices |
| Source | Yahoo silver (silver_yahoo_prices) |
| Frequency | Daily, partitioned by ticker, year |
| ETL | etl/external/yahoo_prices_gold.py |

Purpose

Per-ticker technical indicators computed from OHLCV silver data. This is a price-analytics table, not a strategy signal — its columns are reusable features that downstream signals (or notebooks) can consume.

Output Schema

| Column | Type | Description |
| --- | --- | --- |
| date | date | Trading date |
| ticker | string | Ticker symbol (partition key) |
| open, high, low, close | double | OHLC pass-through from silver |
| volume | bigint | Daily volume |
| trade_year, trade_month, trade_dow | int | Calendar dimensions |
| is_month_end | boolean | True if date == last_day(date) |
| daily_return_pct | double | (close - prev_close) / prev_close |
| log_return | double | ln(close / prev_close) |
| sma_50, sma_200 | double | Simple moving averages |
| true_range | double | high - low |
| rolling_volatility_20d, rolling_volatility_30d | double | Standard deviation of daily returns |
| avg_volume_20d | double | 20-day rolling mean volume |
| relative_volume | double | volume / avg_volume_20d |
| price_vs_200dma_pct | double | (close - sma_200) / sma_200 |
| rsi_14 | double | Relative Strength Index, 14-day |
| vol_zscore | double | Z-score of 30d realized vol against 1y rolling history |
| vpt_signal | double | daily_return_pct * 20d-mean(relative_volume) |
| cycle_score | double | Howard Marks-inspired greed/fear composite, clamped to [-1, +1] |
| cycle_score_smoothed | double | 63-day average of cycle_score |
| record_pk | string | `md5(ticker |
| year | int | Year (partition key) |

Interpretation

  • log_return is the primary input for portfolio-level signal aggregation (additive across time, symmetric for long/short).
  • cycle_score → +1 suggests peak greed (mean-reversion candidate); → −1 suggests deep fear (accumulation candidate). Built from price-vs-200dma, RSI, vol regime, and volume-price trend, each z-scored against its own 1-year history.
  • relative_volume > 2.0 with positive daily_return_pct flags institutional accumulation (used by etf_rotation signal).
  • vol_zscore > 1.5 signals an expanded vol regime, typically risk-off; vol_zscore < −1.0 suggests complacency.

Athena Queries

-- Latest analytics per ticker
SELECT * FROM stratum_gold_sandbox.yahoo_prices
WHERE ticker = 'SPY' AND year = 2026
ORDER BY date DESC LIMIT 30;

-- Cycle score across all ETFs
SELECT date, ticker, close, cycle_score, cycle_score_smoothed
FROM stratum_gold_sandbox.yahoo_prices
WHERE ticker IN ('SPY','QQQ','IWM','TLT','GLD') AND year = 2026
ORDER BY ticker, date;

Signal: Newsletter Inbox (LLM-Enriched)

| Property | Value |
| --- | --- |
| Domain | email |
| Signal name | newsletter_inbox |
| Output path | domain=email/signal=newsletter_inbox/ |
| Glue table | stratum_gold_email_{env}.newsletter_inbox (note: separate Glue catalog DB, region-pinned to eu-west-1) |
| Source | silver_email ⟕ DynamoDB enrichment cache ({project}-{env}-email-enrichment). Silver itself is fed by SES inbound mail and the hourly RSS / Atom poller (Reuters, Fed/ECB/BoE press, FT, CNBC, etc.; see docs/tech/ingestion-scheduling.md#rss-ingestion-phase-2). RSS items land with alias='rss' and source_type='rss'. |
| Frequency | Daily, 00:30 UTC, partitioned by alias, received_year, received_month |
| ETL | etl/external/email_gold.py |
| Enrichment | lambda/email_enricher/handler.py: Bedrock (Nova Micro by default) + Titan Text Embeddings v2 |

Purpose

Each inbound newsletter (or, in upcoming phases, RSS / Reuters item) is parsed by an LLM into a typed feature row suitable for ML or signal generation: sentiment, ticker mentions, named entities, claim type, time horizon, plus a 1024-dim embedding for clustering and similarity search. The regex-based v1 (topic_tags, is_newsletter_likely, body_summary_preview) is fully replaced — see git history if you need to reference it.

Architecture

SES inbound → email_parser (Lambda) → silver_email (Glue Spark)
             │
             ▼
┌─────────────────────────────┐
│ enrichment_pending_lister   │
│ Athena over last 7 days,    │
│ filter cache hits           │
└────────────┬────────────────┘
             │ batches of 25
             ▼
┌─────────────────────────────┐
│ Step Functions Map          │
│ maxConcurrency=10           │
└────────────┬────────────────┘
             │
             ▼
┌─────────────────────────────┐
│ email_enricher (Lambda)     │
│ Bedrock Converse            │
│ + Titan embeddings          │
│ → DynamoDB cache            │
└────────────┬────────────────┘
             │
             ▼
email_gold (Glue Spark)
silver ⟕ enrichment cache

Schema (selected columns; full list in lib/email_enrichment_schema.py)

| Column | Type | Description |
| --- | --- | --- |
| content_hash | string | sha256(subject\|raw_object_key\|sender_email); stable id, joins to enrichment cache |
| received_at | timestamp | Best-effort parse of the Date: header |
| alias | string | Recipient bucket (newsletter / signals / research); partition |
| sender, sender_email, sender_domain | string | Routing metadata |
| subject, link_count, links | various | Surface fields from silver |
| summary_short | string (≤200) | Human-readable one-liner |
| summary_long | string (≤800) | Detailed summary; embedding source |
| topic_primary | string (enum) | equities, rates, fx, credit, commodities, crypto, macro, geopolitics, corporate, other |
| topic_secondary | array | Free-text refinement, ≤8 entries |
| tickers | array | Uppercased equity / ETF / crypto tickers; direct join key to market data |
| entities | array<struct<type,name>> | Typed entities (company, person, central_bank, country, sector, commodity, currency) |
| sentiment | string (enum) | bullish, bearish, neutral, mixed |
| sentiment_score | double | Continuous in [-1, 1] |
| confidence | double | Author hedging level in [0, 1]; distinct from sentiment |
| time_horizon | string (enum) | intraday, days, weeks, months, quarters, years, unspecified |
| claim_type | string (enum) | forecast, recommendation, observation, news_report, analysis, marketing, confirmation |
| numerical_claims | array<struct<value,unit,context>> | Quantitative assertions extracted from the body |
| is_actionable | boolean | True iff the email contains a tradable view |
| is_confirmation | boolean | True iff this is a subscription-confirmation email |
| is_marketing | boolean | True iff primarily promotional |
| noise_score | double | 0=original analyst content, 1=boilerplate/aggregated news |
| language | string | ISO-639-1 |
| embedding | array (1024) | L2-normalised Titan v2 vector over summary_long; use cosine similarity |
| enrichment_status | string | ok, truncated, parse_error, model_error, pending (filter on this) |
| enrichment_model | string | Bedrock model id (cost / model-version provenance) |
| enrichment_version | int | Bumped when prompt/schema changes; older rows eligible for re-enrichment |
| enriched_at, tokens_in, tokens_out | various | Provenance / cost attribution |

Interpretation

  • enrichment_status = 'ok' is the default filter for any analytics query. pending means the silver row hasn’t been enriched yet (possible during catch-up); other statuses indicate a model or parse failure.
  • tickers are the primary join key to yahoo_prices and any market-data table. They’re uppercased and deduped at validation time; trust them as exact join keys.
  • is_actionable=true AND noise_score < 0.3 narrows to “this is a real analyst view, not a news aggregator”. Useful as an ML label.
  • embedding + cosine similarity is how the upcoming digest builder will dedupe stories across newsletters/RSS — see Phase 3.
  • Sentiment vs. confidence: a bullish view at confidence=0.4 is hedged; a neutral view at confidence=0.9 is a confident “no edge here”. Don’t collapse them.
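Because the embedding column is L2-normalised, cosine similarity between two rows reduces to a plain dot product. The sketch below demonstrates that equivalence on toy 2-dimensional vectors (the real vectors have 1024 dimensions); the function names are illustrative and not part of the pipeline.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def l2_normalise(vec):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(dot(vec, vec))
    return [x / norm for x in vec]


def cosine_similarity(a, b):
    """General cosine similarity; works for unnormalised vectors too."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


# Toy vectors standing in for two article embeddings.
a, b = [3.0, 4.0], [4.0, 3.0]
unit_a, unit_b = l2_normalise(a), l2_normalise(b)

# For unit vectors the dot product alone gives the cosine similarity.
similarity = dot(unit_a, unit_b)
```

In practice this means a similarity search over stored embeddings only needs the dot product, which is cheaper than recomputing norms for every pair.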

Versioning

ENRICHMENT_VERSION (in lib/email_enrichment_schema.py) is the contract version between the prompt + schema and what’s persisted. Bumping it makes the pending-list Lambda re-enrich rows below the new version on the next scheduled run (within the 7-day lookback window). For a full backfill on a bump, run the lister manually with LOOKBACK_DAYS widened.
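The re-enrichment rule described above can be sketched as a predicate over a silver row and its cache entry. The constant value, function name, and argument shapes below are hypothetical; the real selection happens in the pending-list Lambda via Athena and the DynamoDB cache.

```python
from datetime import datetime, timedelta, timezone

ENRICHMENT_VERSION = 3  # hypothetical value; the real constant lives in the schema module


def needs_enrichment(received_at, cached_version, now, lookback_days=7):
    """Return True if a row should be (re-)enriched on this run.

    `cached_version` is None when the row has no cache entry yet.
    """
    if received_at < now - timedelta(days=lookback_days):
        return False  # outside the lookback window; needs a manual backfill
    return cached_version is None or cached_version < ENRICHMENT_VERSION


now = datetime(2026, 5, 19, tzinfo=timezone.utc)
yesterday = now - timedelta(days=1)
ten_days_ago = now - timedelta(days=10)
```

Widening `lookback_days` in this sketch plays the role of running the lister manually with a larger LOOKBACK_DAYS: `needs_enrichment(ten_days_ago, None, now)` is False with the default window but True with `lookback_days=30`.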

Downstream consumers

The newsletter_inbox table feeds the daily digest builder — see docs/finance/digest.md. Phase 3 reads last-24h rows where enrichment_status='ok', clusters by embedding similarity, scores against a watchlist, and delivers a Markdown briefing daily at 07:00 UTC.

Cost (rough estimate)

Athena Queries

-- Most actionable views in the last 7 days
SELECT received_at, sender_email, summary_short, sentiment_score, confidence, tickers
FROM stratum_gold_email_sandbox.newsletter_inbox
WHERE received_at >= current_timestamp - INTERVAL '7' DAY
  AND enrichment_status = 'ok'
  AND is_actionable
  AND noise_score < 0.3
ORDER BY confidence DESC;

-- Sentiment shift on a specific ticker
SELECT received_at, sender_domain, sentiment, sentiment_score, summary_short
FROM stratum_gold_email_sandbox.newsletter_inbox
WHERE enrichment_status = 'ok'
  AND contains(tickers, 'NVDA')
ORDER BY received_at DESC LIMIT 50;

-- Confirmation links pending click
SELECT received_at, subject, sender, links
FROM stratum_gold_email_sandbox.newsletter_inbox
WHERE enrichment_status = 'ok' AND is_confirmation;

Planned Signals

| Signal | Domain | Status | Sources |
| --- | --- | --- | --- |
| dealer_gamma | positioning | Planned (needs SpotGamma or SqueezeMetrics subscription) | CBOE options data |
| macro_regime | macro | Planned | FRED (GDP, INDPRO, PAYEMS, UNRATE, UMCSENT) |
| inflation_liquidity | monetary | Planned | FRED (CPIAUCSL, M2SL, FEDFUNDS) |
| global_liquidity | monetary | Planned | FRED + ECB + BIS |

Planned Research

| Component | Status | GitHub Issue |
| --- | --- | --- |
| ML signal combination (meta-model) | Planned | #1 |
| Strategy backtesting framework | Planned | #2 |