Lakehouse Architecture

Medallion Architecture Migration Guide

Implement medallion architecture (Bronze, Silver, Gold) in 2-4 weeks and organize lakehouse data with clear lineage. Expected benefits: up to 90% faster analytics, incremental processing, and built-in data quality checks.

  • 90% faster analytics
  • 2-4 weeks implementation time
  • 3 layers: Bronze, Silver, Gold
  • 100% data lineage

What is Medallion Architecture?

Medallion architecture is a data design pattern used to logically organize data in a lakehouse, with the goal of incrementally improving the structure and quality of data as it flows through each layer (Bronze → Silver → Gold).

Bronze Layer (Raw)

Raw data ingested from source systems with minimal transformation. Preserves original data for auditing and reprocessing.

  • Purpose: Data ingestion and historical archive
  • Format: As-is from source (JSON, CSV, Parquet, Avro)
  • Schema: Schema-on-read, flexible structure
  • Quality: No validation, may contain duplicates/errors
  • Users: Data engineers, data scientists (exploratory)
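
A minimal sketch of Bronze ingestion in plain Python, assuming a Python list stands in for an append-only Bronze table and that the source sends JSON. The metadata field names _ingested_at and _source are hypothetical. The payload is stored untouched and parsed only at read time (schema-on-read), so malformed rows stay in Bronze for later inspection instead of being lost at ingestion.

```python
import json
from datetime import datetime, timezone


def to_bronze_record(raw_payload: str, source: str) -> dict:
    """Wrap a raw payload with ingestion metadata; the payload itself is not modified."""
    return {
        "_ingested_at": datetime.now(timezone.utc).isoformat(),  # hypothetical column
        "_source": source,                                       # hypothetical column
        "raw": raw_payload,  # stored as-is; parsing is deferred (schema-on-read)
    }


def read_bronze(records: list[dict]) -> list[dict]:
    """Schema-on-read: parse payloads only when querying, skipping malformed rows."""
    parsed = []
    for rec in records:
        try:
            parsed.append(json.loads(rec["raw"]))
        except json.JSONDecodeError:
            continue  # the malformed row is still preserved in Bronze
    return parsed


bronze: list[dict] = []  # append-only stand-in for a Bronze table
bronze.append(to_bronze_record('{"order_id": 1, "amount": "19.99"}', "orders_api"))
bronze.append(to_bronze_record('{"order_id": 2, "amount": ', "orders_api"))  # truncated payload
print(len(bronze), read_bronze(bronze))
```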

Silver Layer (Refined)

Cleaned, validated, and enriched data. Standardized formats with quality checks applied.

  • Purpose: Data quality and standardization
  • Format: Open table formats (Delta Lake, Apache Iceberg, Apache Hudi)
  • Schema: Enforced schema with data types
  • Quality: Validated, deduplicated, null handling
  • Users: Data scientists, ML engineers, analysts
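
The Silver step can be sketched as a pure function over parsed Bronze rows. This assumes a hypothetical orders feed with order_id, amount, country, and order_date fields; it enforces types, normalizes missing values, and keeps the first copy of each order_id. In a real lakehouse the same logic would run in an engine such as Spark, but the rules are the same.

```python
from datetime import date
from decimal import Decimal


def to_silver(bronze_rows: list[dict]) -> list[dict]:
    """Standardize types and nulls, and drop duplicate order_ids (first one wins)."""
    seen: set[int] = set()
    silver = []
    for row in bronze_rows:
        order_id = int(row["order_id"])  # enforce an integer key
        if order_id in seen:
            continue  # e.g. a duplicate caused by a source retry
        seen.add(order_id)
        silver.append({
            "order_id": order_id,
            "amount": Decimal(str(row["amount"])),  # exact decimal instead of float
            # missing, empty, or blank country -> "UNKNOWN"; otherwise trimmed upper case
            "country": (row.get("country") or "").strip().upper() or "UNKNOWN",
            "order_date": date.fromisoformat(row["order_date"]),
        })
    return silver


raw = [
    {"order_id": "1", "amount": "19.99", "country": " us", "order_date": "2024-05-01"},
    {"order_id": "1", "amount": "19.99", "country": " us", "order_date": "2024-05-01"},
    {"order_id": "2", "amount": 5, "country": None, "order_date": "2024-05-02"},
]
print(to_silver(raw))
```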

Gold Layer (Curated)

Business-ready aggregates, features, and metrics optimized for consumption by BI tools and applications.

  • Purpose: Business analytics and reporting
  • Format: Optimized tables (star schema, denormalized)
  • Schema: Business-friendly column names and structure
  • Quality: Production-grade, SLA-backed
  • Users: Business analysts, executives, BI tools
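
A Gold table is typically a pre-computed aggregate over Silver. The sketch below assumes the hypothetical Silver orders schema (order_id, amount, country, order_date) and builds a daily revenue table with business-friendly column names, one row per (order_date, country).

```python
from collections import defaultdict
from datetime import date
from decimal import Decimal


def gold_daily_revenue(silver_rows: list[dict]) -> list[dict]:
    """Pre-aggregate orders into one row per (order_date, country)."""
    totals = defaultdict(lambda: {"order_count": 0, "revenue": Decimal("0")})
    for row in silver_rows:
        bucket = totals[(row["order_date"], row["country"])]
        bucket["order_count"] += 1
        bucket["revenue"] += row["amount"]
    # Sorted by (order_date, country) for stable, report-friendly output.
    return [
        {"order_date": d, "country": c, **metrics}
        for (d, c), metrics in sorted(totals.items(), key=lambda kv: kv[0])
    ]


silver = [
    {"order_id": 1, "amount": Decimal("19.99"), "country": "US", "order_date": date(2024, 5, 1)},
    {"order_id": 2, "amount": Decimal("5.01"), "country": "US", "order_date": date(2024, 5, 1)},
    {"order_id": 3, "amount": Decimal("7.50"), "country": "DE", "order_date": date(2024, 5, 1)},
]
for row in gold_daily_revenue(silver):
    print(row)
```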

4-Phase Implementation Process

Phase 1: Design Layer Structure (Days 1-3)
  • Map source systems to Bronze tables
  • Define Silver layer transformations and quality rules
  • Design Gold layer aggregates and business metrics
  • Plan incremental processing strategy

Phase 2: Build Bronze Layer (Week 1)
  • Set up ingestion pipelines from source systems
  • Implement change data capture (CDC) where needed
  • Store raw data with metadata (ingestion timestamp, source)
  • Validate data arrival and completeness
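
One way to validate arrival and completeness, sketched under two assumptions: the source publishes control totals (expected row counts per batch), and Bronze row counts per batch are easy to query. Batch IDs, counts, and the one-hour freshness threshold below are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def check_arrival(expected_counts: dict[str, int], bronze_counts: dict[str, int],
                  last_ingested_at: datetime, now: datetime,
                  max_lag: timedelta = timedelta(hours=1)) -> list[str]:
    """Return human-readable issues; an empty list means batches look complete and fresh."""
    issues = []
    for batch_id, expected in expected_counts.items():
        received = bronze_counts.get(batch_id, 0)
        if received < expected:
            issues.append(f"batch {batch_id}: {received}/{expected} rows arrived")
    if now - last_ingested_at > max_lag:
        issues.append(f"stale: last ingestion at {last_ingested_at.isoformat()}")
    return issues


now = datetime(2024, 5, 1, 12, 0, tzinfo=timezone.utc)
print(check_arrival(
    expected_counts={"2024-05-01T10": 1000, "2024-05-01T11": 1200},  # source control totals
    bronze_counts={"2024-05-01T10": 1000, "2024-05-01T11": 950},
    last_ingested_at=datetime(2024, 5, 1, 11, 30, tzinfo=timezone.utc),
    now=now,
))
```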

Phase 3: Build Silver Layer (Weeks 2-3)
  • Apply data quality rules (deduplication, null handling)
  • Standardize data types and formats
  • Enrich data with lookups and joins
  • Implement incremental processing (merge/upsert)
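
The merge/upsert step can be illustrated with a dict keyed by primary key standing in for a Silver table. This sketch assumes each change row carries an updated_at value in a uniform ISO-8601 format (so string comparison matches time order); the column names are hypothetical. It behaves roughly like a MERGE INTO statement with "when matched and newer, update" and "when not matched, insert" clauses, which table formats such as Delta Lake expose in SQL.

```python
def merge_upsert(target: dict, updates: list[dict],
                 key: str = "order_id", version: str = "updated_at") -> dict:
    """Insert unseen keys; update existing keys only when the incoming row is newer."""
    for row in updates:
        current = target.get(row[key])
        if current is None or row[version] > current[version]:
            target[row[key]] = row  # not matched -> insert; matched and newer -> update
        # otherwise the row is stale or replayed and is ignored, so re-running is idempotent
    return target


silver_orders = {1: {"order_id": 1, "status": "placed", "updated_at": "2024-05-01T10:00:00Z"}}
changes = [
    {"order_id": 1, "status": "shipped", "updated_at": "2024-05-02T08:00:00Z"},
    {"order_id": 2, "status": "placed", "updated_at": "2024-05-02T09:00:00Z"},
    {"order_id": 1, "status": "placed", "updated_at": "2024-05-01T10:00:00Z"},  # replayed old event
]
merge_upsert(silver_orders, changes)
print({k: v["status"] for k, v in silver_orders.items()})
```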

Phase 4: Build Gold Layer (Week 4)
  • Create business aggregates and metrics
  • Build star schema or denormalized tables
  • Optimize for BI tool performance (partitioning, indexing)
  • Connect BI tools and validate reports
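
Building a star schema mostly means splitting flat records into dimension tables with surrogate keys and a fact table that references them. The sketch below assumes hypothetical columns (customer_email as the natural key, customer_name, amount) and assigns sequential integer surrogate keys; production systems often use identity columns or hashes instead.

```python
def build_star(orders: list[dict]) -> tuple[list[dict], list[dict]]:
    """Split flat orders into dim_customer and fact_orders linked by a surrogate key."""
    key_by_email: dict[str, int] = {}  # natural key -> surrogate key
    dim_customer, fact_orders = [], []
    for o in orders:
        email = o["customer_email"]
        if email not in key_by_email:
            key_by_email[email] = len(key_by_email) + 1
            dim_customer.append({"customer_key": key_by_email[email],
                                 "email": email, "customer_name": o["customer_name"]})
        fact_orders.append({"order_id": o["order_id"],
                            "customer_key": key_by_email[email],
                            "amount": o["amount"]})
    return dim_customer, fact_orders


orders = [
    {"order_id": 1, "customer_email": "a@example.com", "customer_name": "Ana", "amount": 20},
    {"order_id": 2, "customer_email": "b@example.com", "customer_name": "Ben", "amount": 35},
    {"order_id": 3, "customer_email": "a@example.com", "customer_name": "Ana", "amount": 15},
]
dims, facts = build_star(orders)
print(dims)
print(facts)
```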

Medallion Architecture Best Practices

Use Incremental Processing

Process only new or changed data in each layer using watermarks, CDC, or merge operations. This can cut processing time from hours to minutes and makes near-real-time analytics practical.
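
A minimal watermark sketch, assuming every row has an updated_at column in a uniform ISO-8601 format and that the pipeline persists the watermark between runs (here it is just a returned value). Each run reads only rows newer than the last watermark. One caveat of this simple form: rows that arrive late with a timestamp at or below the watermark are skipped, which is why production pipelines often add a lookback window or rely on CDC or streaming checkpoints instead.

```python
def incremental_batch(source_rows: list[dict], watermark: str,
                      ts_col: str = "updated_at") -> tuple[list[dict], str]:
    """Return rows newer than the watermark plus the advanced watermark."""
    new_rows = [r for r in source_rows if r[ts_col] > watermark]
    new_watermark = max((r[ts_col] for r in new_rows), default=watermark)
    return new_rows, new_watermark


source = [
    {"id": 1, "updated_at": "2024-05-01T10:00:00Z"},
    {"id": 2, "updated_at": "2024-05-01T11:00:00Z"},
]
batch, wm = incremental_batch(source, watermark="")  # first run: everything is new
source.append({"id": 3, "updated_at": "2024-05-01T12:00:00Z"})
batch2, wm2 = incremental_batch(source, watermark=wm)  # next run: only id 3
print([r["id"] for r in batch], wm)
print([r["id"] for r in batch2], wm2)
```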

Preserve Raw Data in Bronze

Treat Bronze as append-only: do not update or delete records except where retention or compliance policies require it. Bronze serves as your source of truth for reprocessing when Silver/Gold logic changes or data quality issues are discovered.

Implement Data Quality Checks

Add quality checks at the Silver layer covering completeness, accuracy, consistency, uniqueness, validity, and timeliness. Quarantine bad records instead of failing the whole pipeline.
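
Quarantining can be sketched as row-level rules that route failing rows to a separate output tagged with the rules they broke, while passing rows continue. The rule names and thresholds below are hypothetical examples of completeness, validity, and consistency checks; uniqueness and timeliness usually need set-level checks, which this sketch does not cover.

```python
from typing import Callable

# Hypothetical rules: readable name -> row-level predicate.
RULES: dict[str, Callable[[dict], bool]] = {
    "order_id present (completeness)": lambda r: r.get("order_id") is not None,
    "amount >= 0 (validity)": lambda r: isinstance(r.get("amount"), (int, float)) and r["amount"] >= 0,
    "country is 2 letters (consistency)": lambda r: (
        isinstance(r.get("country"), str) and len(r["country"]) == 2 and r["country"].isalpha()
    ),
}


def apply_quality_rules(rows: list[dict], rules: dict = RULES) -> tuple[list[dict], list[dict]]:
    """Split rows into (passed, quarantined); quarantined rows record which rules failed."""
    passed, quarantined = [], []
    for row in rows:
        failures = [name for name, check in rules.items() if not check(row)]
        if failures:
            quarantined.append({**row, "_failed_rules": failures})
        else:
            passed.append(row)
    return passed, quarantined


rows = [
    {"order_id": 1, "amount": 10.0, "country": "US"},
    {"order_id": None, "amount": -3, "country": "USA"},
]
good, bad = apply_quality_rules(rows)
print(len(good), bad[0]["_failed_rules"])
```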

Use Consistent Naming Conventions

Prefix tables with the layer name (bronze_*, silver_*, gold_*) and use descriptive names. This makes data lineage clear and helps prevent accidental cross-layer queries.
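
Naming alone does not stop a job from reading the wrong layer, but the prefixes make it easy to check. The sketch below assumes one possible policy (a job may read only from its own layer or the layer directly upstream, so Gold may not read Bronze); it could run in CI against each job's declared inputs and outputs.

```python
LAYER_ORDER = {"bronze": 0, "silver": 1, "gold": 2}


def layer_of(table: str) -> str:
    """Return the layer encoded in a bronze_/silver_/gold_ table name."""
    prefix, sep, rest = table.partition("_")
    if not sep or not rest or prefix not in LAYER_ORDER:
        raise ValueError(f"{table!r} does not follow the bronze_/silver_/gold_ convention")
    return prefix


def is_allowed_read(source_table: str, target_table: str) -> bool:
    """Allow a job writing target_table to read from the same layer or the one directly upstream."""
    step = LAYER_ORDER[layer_of(target_table)] - LAYER_ORDER[layer_of(source_table)]
    return step in (0, 1)


print(is_allowed_read("silver_orders", "gold_daily_revenue"))  # True
print(is_allowed_read("bronze_orders", "gold_daily_revenue"))  # False: skips Silver
```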

Optimize Gold for Consumption

Denormalize Gold tables for BI tool performance. Pre-aggregate common metrics. Use partitioning and Z-ordering for fast queries. Gold should be optimized for reads, not writes.
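
Z-ordering clusters rows along a space-filling curve so that rows close in several columns end up in the same files, which lets engines skip files using per-file min/max statistics. The core idea is a Morton code that interleaves the bits of the column values. The sketch below assumes small non-negative integer columns; real engines first map values to ranks or ranges, and Delta Lake's OPTIMIZE ... ZORDER BY handles this internally, so this only illustrates the principle.

```python
def z_order_key(x: int, y: int, bits: int = 16) -> int:
    """Morton code: interleave the low `bits` bits of x (even positions) and y (odd positions)."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)
        key |= ((y >> i) & 1) << (2 * i + 1)
    return key


# Pretend each "file" holds 4 rows; sort a 4x4 grid of (customer_id, product_id) by Z-value.
rows = [(x, y) for x in range(4) for y in range(4)]
ordered = sorted(rows, key=lambda r: z_order_key(*r))
files = [ordered[i:i + 4] for i in range(0, len(ordered), 4)]
for f in files:
    xs, ys = [r[0] for r in f], [r[1] for r in f]
    print(f"x in [{min(xs)}, {max(xs)}], y in [{min(ys)}, {max(ys)}]")
```

Each file covers a 2x2 block of the grid, so a filter such as "x <= 1 and y <= 1" touches one file out of four. Sorting by x alone would put one x value per file, and a filter on y would then touch every file.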

Track Data Lineage

Document transformations between layers. Use metadata tables to track source → Bronze → Silver → Gold lineage. This enables impact analysis and troubleshooting.
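
A lineage metadata table is essentially a list of (upstream, downstream) edges. This sketch assumes an in-memory dict stands in for that table and shows impact analysis as a breadth-first walk that finds every table downstream of a change. Table names are hypothetical; a real metadata table would typically also record the transformation and a timestamp per edge.

```python
from collections import defaultdict, deque

# Hypothetical in-memory lineage store: upstream table -> set of downstream tables.
lineage = defaultdict(set)


def record_lineage(source: str, target: str) -> None:
    lineage[source].add(target)


def downstream_of(table: str) -> set[str]:
    """Breadth-first walk: every table that may need a rebuild if `table` changes."""
    affected, queue = set(), deque([table])
    while queue:
        for child in lineage[queue.popleft()]:
            if child not in affected:
                affected.add(child)
                queue.append(child)
    return affected


record_lineage("orders_api", "bronze_orders")
record_lineage("bronze_orders", "silver_orders")
record_lineage("silver_orders", "gold_daily_revenue")
record_lineage("silver_customers", "gold_daily_revenue")
print(sorted(downstream_of("bronze_orders")))  # ['gold_daily_revenue', 'silver_orders']
```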

People Also Ask

Do I need all three layers (Bronze, Silver, Gold)?

Not always. For simple use cases, you might skip Bronze and ingest directly to Silver. However, Bronze provides valuable benefits: data recovery if transformations fail, ability to reprocess with new logic, and audit trail of raw data. For production systems, all three layers are recommended.

Can I have multiple Gold layers for different use cases?

Yes. It's common to have multiple Gold layers optimized for different consumers: gold_bi for BI tools, gold_ml for machine learning features, gold_api for application APIs. Each can have different aggregation levels and optimization strategies.

How do I handle slowly changing dimensions (SCD) in medallion architecture?

Implement SCD in the Silver layer. For SCD Type 2 (historical tracking), add effective_date, end_date, and is_current columns. Bronze stores raw snapshots, Silver maintains history, and Gold presents either the current state or historical views, depending on business needs.
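
An SCD Type 2 update can be sketched as: find the current version for the key; if tracked attributes changed, close it (set end_date, clear is_current) and append a new current row. This assumes a list of dicts stands in for the Silver dimension, a hypothetical customer_id natural key, and an exclusive end_date convention (some teams use the day before instead).

```python
from datetime import date


def apply_scd2(history: list[dict], customer_id: str, attrs: dict, effective: date) -> None:
    """Type 2 update: close the current version if attributes changed, then add a new current row."""
    current = next((r for r in history
                    if r["customer_id"] == customer_id and r["is_current"]), None)
    if current is not None:
        if all(current.get(k) == v for k, v in attrs.items()):
            return  # nothing changed: keep the current version as-is
        current["end_date"] = effective  # exclusive end date
        current["is_current"] = False
    history.append({"customer_id": customer_id, **attrs,
                    "effective_date": effective, "end_date": None, "is_current": True})


dim: list[dict] = []
apply_scd2(dim, "C1", {"city": "Austin"}, date(2024, 1, 1))
apply_scd2(dim, "C1", {"city": "Austin"}, date(2024, 2, 1))  # no change, no new row
apply_scd2(dim, "C1", {"city": "Denver"}, date(2024, 3, 1))
for row in dim:
    print(row)
```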

What's the difference between medallion and lambda architecture?

Lambda architecture separates batch and streaming into different paths (batch layer + speed layer + serving layer). Medallion architecture can run batch and streaming through a single path (Bronze → Silver → Gold) using table formats like Delta Lake that support both. With one codebase instead of two, medallion is generally simpler to maintain.

How long should I retain data in each layer?

Typical starting points: Bronze: retain indefinitely or as long as compliance requires (e.g., 7+ years). Silver: retain 1-3 years for analysis and ML training. Gold: retain 6-12 months for active reporting, then archive older data. Use lifecycle policies to move cold data to cheaper storage tiers automatically.
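
The decision a lifecycle policy makes can be modeled as a small function over partition age. The thresholds below are assumptions loosely based on the ranges above (including a hypothetical 90-day hot window for Bronze), not recommendations. In practice these rules are usually configured as object-storage lifecycle rules, and for transactional table formats such as Delta Lake, data should be removed through the table (DELETE, then VACUUM) rather than by deleting files directly.

```python
from datetime import date

# Hypothetical thresholds in days: (move to cold storage after, delete after); None = never.
POLICY = {
    "bronze": (90, None),      # keep indefinitely, but off the hot tier
    "silver": (365, 3 * 365),  # 1 year hot, delete after ~3 years
    "gold": (365, None),       # ~12 months active, then archive
}


def lifecycle_action(layer: str, partition_date: date, today: date) -> str:
    """Return 'keep_hot', 'move_to_cold', or 'delete' for a partition of the given layer."""
    cold_after, delete_after = POLICY[layer]
    age_days = (today - partition_date).days
    if delete_after is not None and age_days > delete_after:
        return "delete"
    if age_days > cold_after:
        return "move_to_cold"
    return "keep_hot"


today = date(2024, 6, 1)
print(lifecycle_action("gold", date(2023, 1, 1), today))    # move_to_cold
print(lifecycle_action("silver", date(2020, 1, 1), today))  # delete
print(lifecycle_action("bronze", date(2024, 5, 1), today))  # keep_hot
```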
