From 24-hour batch delays to real-time data
How we transformed a logistics company's data infrastructure with CDC streaming, eliminating day-old data and untracked schema changes — without rebuilding their entire stack.
< 60s
Data latency
down from 24 hours
150+
Tables streamed
via CDC in real-time
100%
Data change visibility
with SCD2 tables
Client
Logistics Company
A digital freight platform connecting shippers and carriers across Europe, processing thousands of shipments daily and serving operational teams that depend on fresh data for real-time logistics decisions.
Industry
Logistics & Supply Chain
Cloud
Microsoft Azure
Stack
"Perigon played a key role in transforming our data infrastructure, integrating real-time streaming and analytics to enhance our insights."
Wenjia Tang
Global Head of Data
A solid foundation with a critical gap
The client had invested in a modern data stack — Snowflake, dbt, Airflow, Power BI — but the ingestion layer was still batch-oriented, creating a 24-hour blindspot for a business that operates in real time.
24-hour data delay
Azure Data Factory batch exports ran once per day. Operational teams made decisions on yesterday's data — unacceptable for a real-time logistics business.
Silent schema changes
Product teams frequently altered SQL Server schemas — adding columns, changing types, renaming tables — without notifying the data team, breaking downstream pipelines.
Fragile batch pipeline
The ADF-to-Snowflake pipeline was brittle and hard to maintain. Any source-side change could corrupt data or cause silent failures that went undetected for days.
Stale dashboards
Power BI reports showed data that was up to 24 hours old. By the time issues were spotted in dashboards, the operational window to act had already closed.
Before & After
Toggle between the original batch architecture and the real-time streaming platform we built.
Production databases
Scheduled batch export
Staging files (CSV/Parquet)
Cloud data warehouse
SQL transforms (basic model)
Daily schedule (every 24h)
Dashboards (stale data)
What we built
Four key engineering decisions that transformed a daily batch pipeline into a real-time data platform.
CDC Streaming with Debezium & Kafka
We replaced the batch-oriented Azure Data Factory pipeline with Change Data Capture (CDC) using Debezium on Kafka Connect. Every insert, update, and delete in the source SQL Server databases is captured as an event and streamed through Apache Kafka — hosted on Aiven for fully managed operations.
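A minimal sketch of what registering one such connector with Kafka Connect might look like. Hostnames, database, table names, and credential references are illustrative, and property names assume Debezium 2.x:

```json
{
  "name": "freight-cdc-connector",
  "config": {
    "connector.class": "io.debezium.connector.sqlserver.SqlServerConnector",
    "database.hostname": "sqlserver.internal.example",
    "database.port": "1433",
    "database.user": "cdc_reader",
    "database.password": "${secrets:sqlserver/cdc_reader}",
    "database.names": "freight",
    "topic.prefix": "freight",
    "table.include.list": "dbo.shipments,dbo.carriers",
    "schema.history.internal.kafka.bootstrap.servers": "kafka.aiven.example:9092",
    "schema.history.internal.kafka.topic": "schema-history.freight"
  }
}
```

Debezium records schema history alongside the data topics, which is what makes DDL changes in the source databases visible downstream rather than silent.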
Streams, Tasks & Dynamic Tables in Snowflake
On the Snowflake side, we built a layered processing model using native Snowflake features to transform raw CDC events into clean, queryable tables — both for current state and full history.
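The pattern can be sketched with a stream feeding a task for the current-state layer, and a dynamic table for derived views. Schema, table, and warehouse names here are illustrative, not the client's actual objects:

```sql
-- A stream tracks new CDC events landing in the raw layer.
CREATE OR REPLACE STREAM raw.shipments_cdc_stream ON TABLE raw.shipments_cdc;

-- A task consumes the stream and merges changes into the current-state table,
-- running only when the stream actually has new data.
CREATE OR REPLACE TASK staging.apply_shipments_changes
  WAREHOUSE = ingest_wh
  SCHEDULE = '1 MINUTE'
  WHEN SYSTEM$STREAM_HAS_DATA('raw.shipments_cdc_stream')
AS
  MERGE INTO staging.shipments s
  USING (SELECT * FROM raw.shipments_cdc_stream) c
    ON s.shipment_id = c.shipment_id
  WHEN MATCHED AND c.op = 'd' THEN DELETE
  WHEN MATCHED THEN UPDATE SET s.status = c.status, s.updated_at = c.updated_at
  WHEN NOT MATCHED THEN INSERT (shipment_id, status, updated_at)
    VALUES (c.shipment_id, c.status, c.updated_at);

-- A dynamic table keeps a derived view fresh declaratively, with no explicit task.
CREATE OR REPLACE DYNAMIC TABLE marts.active_shipments
  TARGET_LAG = '1 minute'
  WAREHOUSE = transform_wh
AS
  SELECT shipment_id, status, updated_at
  FROM staging.shipments
  WHERE status <> 'DELIVERED';
```

The same raw events also feed the SCD2 history tables, so every past state of a row remains queryable, not just the latest one.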
Lambda dbt Model for Cost-effective Real-time
The existing dbt model was rebuilt using a lambda architecture pattern — combining a real-time stream layer with the periodic batch layer. This ensures fresh data is always available while maintaining the cost efficiency and data quality of batch processing.
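In dbt terms, the pattern boils down to unioning a trusted batch relation with a thin stream relation covering only the records newer than the batch high-water mark. Model and column names below are assumptions for illustration:

```sql
-- models/marts/shipments_lambda.sql (illustrative sketch of the lambda pattern)
with batch as (
    -- Batch layer: fully tested, materialized on the periodic schedule.
    select * from {{ ref('shipments_batch') }}
),
stream as (
    -- Stream layer: lightweight view over CDC events that arrived
    -- after the batch layer's high-water mark.
    select * from {{ ref('shipments_stream') }}
    where loaded_at > (select max(loaded_at) from {{ ref('shipments_batch') }})
)
select * from batch
union all
select * from stream
```

Queries always see fresh rows from the stream layer, while the heavy transformations and tests run only on the cheaper batch schedule.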
YAML-driven Configuration & Code Generation
Instead of manually configuring each of the 150+ tables and their CDC connectors, we built a declarative configuration system. A single repository of YAML files defines every ingestion table, its schema mapping, and transformation rules.
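A single table's definition might look like the following. The field names and layout are assumptions for illustration; from files like this, the code generator emits the Kafka Connect config, Snowflake DDL, and dbt sources:

```yaml
# tables/shipments.yaml -- illustrative; field names are assumptions
source:
  database: freight
  schema: dbo
  table: shipments
cdc:
  topic: freight.dbo.shipments
  primary_key: shipment_id
snowflake:
  target_schema: staging
  scd2: true            # also generate a full-history table
columns:
  - {name: shipment_id, type: number}
  - {name: status,      type: varchar}
  - {name: updated_at,  type: timestamp_ntz}
```

Onboarding a new source table then reduces to adding one such file and pushing, which is what makes the "minutes, not days" onboarding claim below possible.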
The impact
Measurable improvements across latency, reliability, developer velocity, and business outcomes.
Perigon played a key role in transforming our data infrastructure, integrating real-time streaming and analytics to enhance our insights. The transition from batch to streaming was seamless, and the new YAML-driven configuration approach has made our pipeline dramatically easier to manage.
< 60s
End-to-end latency
From database change to queryable data in Snowflake, down from 24 hours.
150+
Tables in real-time
All critical business tables streamed via CDC with automatic schema evolution.
Live
Power BI dashboards
Operational teams now see real-time data, enabling immediate action.
100%
Schema change visibility
Debezium captures all DDL changes — the data team is never caught off guard.
~80%
Less ingestion code
YAML-driven generation replaced hundreds of manually written connector configs.
Minutes
New table onboarding
Adding a new source table to the pipeline: write YAML, push, done.
Facing similar challenges?
Whether you're stuck on batch pipelines, dealing with silent schema changes, or need to bring real-time data to your operational teams — let's talk about what a modern streaming architecture could look like for your business.
No commitment required. Let's start with a conversation about your challenges and see if we're a good fit.

