ETL Optimization → Data Platform

A PostgreSQL car-listings ETL took 16 hours to backfill one large brand against a 4-hour SLA. The real problem wasn't the engine — it was running an OLAP workload on an OLTP store. A primer on the option landscape and a two-horizon path from quick win to platform.

The problem

A PostgreSQL ETL populating an 80-column ads_<make> car-listings mart took 16 hours to backfill a single large brand (VW, BMW) against a sub-4-hour SLA. The daily incremental run (200K–5M rows) was already fast — so the instinct to “just make the pipeline faster” pointed at the wrong layer entirely.

Root cause: an OLAP workload on an OLTP engine

The mart is a wide, denormalized analytical table queried with aggregations across tens of millions of rows. PostgreSQL is row-oriented and built for OLTP — small, concurrent transactional writes. Running analytical scans on it reads roughly 80× more data than needed, and ON CONFLICT DO UPDATE upserts of ~5M rows per run generate dead tuples (MVCC bloat) that quietly degrade every later query.

The option landscape

Fourteen options, grouped by how far each moves from the status quo:

Horizon	Options	Buys you
Stay in Postgres	Surgical rewrite, vertical scaling, Citus, columnar extensions	Fastest path to the SLA, lowest risk
OLTP / OLAP split	Postgres + ClickHouse, Doris / StarRocks, TimescaleDB	Analytics on a columnar engine
Lakehouse / cloud	S3 + dbt + DuckDB / Spark, Iceberg medallion, BigQuery / Snowflake	Scale plus AI/ML readiness
Event backbone	Debezium + Kafka, Materialize / RisingWave	Real-time downstream, replay

Recommendation: two horizons

Horizon 1 — hit the SLA now. A surgical PostgreSQL rewrite (set-based backfill, batching, and an index/VACUUM strategy) clears the sub-4-hour target with no new infrastructure and no migration risk.

Horizon 2 — build the platform. Split analytics onto a columnar store and move toward a lakehouse so the wide mart lives where it belongs — unlocking compression, faster analytics, and reusable AI/ML features.

Sequencing this way ships the deadline-critical win first, then earns the platform investment with a quantified business case instead of a speculative big-bang rewrite.

What I traded away

Choosing the Postgres quick win first means temporarily keeping an OLAP workload on an OLTP engine. That trade was made deliberately: hitting the SLA buys the runway to do the platform migration properly, rather than under fire.