How Stratum works

Stratum turns raw external data into trustworthy quantitative signals. It does this with a medallion architecture — three refinement layers — plus end-to-end provenance so every number can be traced back to the data that produced it.

This page is a public overview. It deliberately omits deployment, account, and security implementation detail.

The three layers

Bronze — raw

Source responses are stored exactly as received (JSON, XML, or CSV depending on the source), alongside fetch metadata: the fetch timestamp, the HTTP status, the source version, and a run identifier. Nothing is interpreted at this stage; bronze is the audit trail.
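As a rough illustration of what a bronze record might hold, the sketch below wraps a raw response body with the metadata listed above. The class name, field names, and the content hash are assumptions made for this example, not Stratum's actual schema.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class BronzeRecord:
    """One raw fetch, stored verbatim with its metadata (hypothetical schema)."""
    source: str
    source_version: str
    run_id: str
    fetched_at: str       # ISO 8601 timestamp, UTC
    http_status: int
    payload: bytes        # response body, byte-for-byte as received
    payload_sha256: str   # assumption: a content hash to detect later changes


def make_bronze_record(source, source_version, run_id, http_status, payload,
                       fetched_at=None):
    """Wrap a raw response without parsing or interpreting it."""
    fetched_at = fetched_at or datetime.now(timezone.utc)
    return BronzeRecord(
        source=source,
        source_version=source_version,
        run_id=run_id,
        fetched_at=fetched_at.isoformat(),
        http_status=http_status,
        payload=payload,
        payload_sha256=hashlib.sha256(payload).hexdigest(),
    )
```

The payload is kept as bytes on purpose: decoding or parsing it would already be an interpretation, which belongs to the next layer.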

Silver — normalized

Bronze is cleaned, typed, and deduplicated into columnar Parquet, partitioned for efficient querying and versioned for point-in-time reproducibility. Silver tables are catalogued and queryable with standard SQL.
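The cleaning, typing, and deduplication step can be sketched in a few lines of plain Python. The field names (series_id, date, value) and the "last occurrence wins" rule are assumptions for illustration; writing the result to partitioned Parquet is left out to keep the sketch dependency-free.

```python
from datetime import date
from decimal import Decimal


def normalize(raw_rows):
    """Clean, type, and deduplicate raw rows (hypothetical field names)."""
    seen = {}
    for row in raw_rows:
        value = (row.get("value") or "").strip()
        if not value:
            continue  # drop rows that carry no observation
        typed = {
            "series_id": row["series_id"].strip().upper(),
            "obs_date": date.fromisoformat(row["date"].strip()),
            "value": Decimal(value),
        }
        # Deduplicate on the natural key; the last occurrence wins.
        seen[(typed["series_id"], typed["obs_date"])] = typed
    # Sort so the output order does not depend on input order.
    return sorted(seen.values(), key=lambda r: (r["series_id"], r["obs_date"]))
```

Keying on (series_id, obs_date) means a row fetched twice collapses into a single silver record.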

Gold — signals

Silver tables are joined across sources and scored into domain signals — the actual product. Gold is where, say, yield-curve data and macro series become a single interpretable indicator.
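To make the join-then-score idea concrete, here is a toy sketch that joins two silver series on date and derives a yield-curve spread with an inversion flag. The function names, the input shape, and the choice of signal are illustrative assumptions; this is not Stratum's actual scoring logic.

```python
def join_on_date(left, right):
    """Inner-join two {date: value} mappings on the dates they share."""
    return {d: (left[d], right[d]) for d in sorted(left.keys() & right.keys())}


def curve_spread_signal(long_yields, short_yields):
    """Return {date: {"spread", "inverted"}} for dates present in both inputs."""
    joined = join_on_date(long_yields, short_yields)
    return {
        d: {
            "spread": round(long_y - short_y, 4),  # long minus short, in percent
            "inverted": long_y < short_y,          # True when the curve inverts
        }
        for d, (long_y, short_y) in joined.items()
    }
```

Dates missing from either input are dropped by the inner join, so the signal is only emitted where both sources have data.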

30+ sources ──▶ Bronze (raw) ──▶ Silver (normalized) ──▶ Gold (signals)
                audit trail      SQL-queryable           the product

Provenance by design

Every refined record carries a lineage back to the exact source fetch that produced it. This is what makes Stratum’s output auditable: a signal isn’t just a number, it’s a number you can explain.
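One way to picture lineage is as a mapping from each record to the records it was derived from, walked back until only source fetches remain. The identifier format and the dictionary representation below are assumptions for this sketch, not Stratum's lineage model.

```python
def trace(record_id, lineage):
    """Return the set of source fetch IDs a record ultimately derives from.

    `lineage` maps a record ID to the IDs of its direct parents; a record
    with no parents recorded is treated as a source fetch.
    """
    parents = lineage.get(record_id)
    if not parents:
        return {record_id}
    fetches = set()
    for parent in parents:
        fetches |= trace(parent, lineage)
    return fetches


# Hypothetical lineage: one gold signal built from two silver rows,
# each of which came from a single bronze fetch.
lineage = {
    "gold:curve:2024-01-02": ["silver:long:2024-01-02", "silver:short:2024-01-02"],
    "silver:long:2024-01-02": ["bronze:fetch-a"],
    "silver:short:2024-01-02": ["bronze:fetch-b"],
}
sources = trace("gold:curve:2024-01-02", lineage)
```

Here `sources` contains both bronze fetch IDs, which is the information needed to explain where the gold number came from.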

Reproducibility & idempotency

Each transformation is deterministic and idempotent — re-running a stage on the same input produces the same output, with no duplicate records. That property is what lets the platform be re-run, backfilled, and validated with confidence.
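A common way to get this property is to write each output record under a key derived only from its content, so a re-run overwrites rather than appends. The sketch below assumes a dictionary as a stand-in for a table and hypothetical field names; it shows the general technique, not Stratum's implementation.

```python
import hashlib


def record_key(series_id, obs_date):
    """Deterministic key: the same inputs always hash to the same key."""
    return hashlib.sha256(f"{series_id}|{obs_date}".encode()).hexdigest()


def run_stage(inputs, table):
    """Upsert each row under its deterministic key; re-runs never append."""
    for row in inputs:
        table[record_key(row["series_id"], row["obs_date"])] = dict(row)
    return table


rows = [
    {"series_id": "RATE_10Y", "obs_date": "2024-01-02", "value": "3.95"},
    {"series_id": "RATE_10Y", "obs_date": "2024-01-03", "value": "3.91"},
]
table = {}
run_stage(rows, table)
first_run = dict(table)
run_stage(rows, table)  # second run on the same input changes nothing
```

After the second run the table is identical to `first_run`, which is what makes backfills and repeated runs safe in this scheme.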

Built on AWS

Stratum is defined entirely as infrastructure-as-code and deployed through a CI/CD pipeline across isolated AWS accounts, with security guardrails enforced at the organization level. Ingestion runs on serverless compute; transformations run as managed Spark jobs; signals are queryable through a managed SQL engine.

What’s next

The pages alongside this one go deeper on the signals themselves (Gold signals), the validation methodology (Research layer), and how data from different sources is combined (Cross-source joins).