Case Study — Data Engineering

From 24-hour batch delays to real-time data

How we transformed a logistics company's data infrastructure with CDC streaming, eliminating day-old data and untracked schema changes — without rebuilding their entire stack.

< 60s

Data latency

down from 24 hours

150+

Tables streamed

via CDC in real time

100%

Data change visibility

with SCD2 tables

Client

Logistics Company

A digital freight platform connecting shippers and carriers across Europe, processing thousands of shipments daily and serving operational teams that depend on fresh data for real-time logistics decisions.

Industry

Logistics & Supply Chain

Cloud

Microsoft Azure

Stack

Snowflake · Kafka · Debezium · dbt · Airflow · SQL Server · Power BI
"Perigon played a key role in transforming our data infrastructure, integrating real-time streaming and analytics to enhance our insights."

Wenjia Tang

Global Head of Data

01 The Challenge

A solid foundation with a critical gap

The client had invested in a modern data stack — Snowflake, dbt, Airflow, Power BI — but the ingestion layer was still batch-oriented, creating a 24-hour blind spot for a business that operates in real time.

24-hour data delay

Azure Data Factory batch exports ran once per day. Operational teams made decisions on yesterday's data — unacceptable for a real-time logistics business.

Silent schema changes

Product teams frequently altered SQL Server schemas — adding columns, changing types, renaming tables — without notifying the data team, breaking downstream pipelines.

Fragile batch pipeline

The ADF-to-Snowflake pipeline was brittle and hard to maintain. Any source-side change could corrupt data or cause silent failures that went undetected for days.

Stale dashboards

Power BI reports showed data that was up to 24 hours old. By the time issues were spotted in dashboards, the operational window to act had already closed.

02 Architecture

Before & After

The original batch architecture is shown below; section 03 describes the real-time streaming platform we built to replace it.

Latency: ~24 hours
Batch Architecture (7 components)
Source
SQL Server

Production databases

Ingestion
Azure Data Factory

Scheduled batch export

Azure Blob Storage

Staging files (CSV/Parquet)

Warehouse & Transform
Snowflake

Cloud data warehouse

dbt

SQL transforms (basic model)

Orchestration & Analytics
Airflow

Daily schedule (every 24h)

Power BI

Dashboards (stale data)

03 The Solution

What we built

Four key engineering decisions that transformed a daily batch pipeline into a real-time data platform.

01

CDC Streaming with Debezium & Kafka

We replaced the batch-oriented Azure Data Factory pipeline with Change Data Capture (CDC) using Debezium on Kafka Connect. Every insert, update, and delete in the source SQL Server databases is captured as an event and streamed through Apache Kafka — hosted on Aiven for fully managed operations.

Debezium SQL Server connectors capture row-level changes in real time
Kafka on Aiven provides durable, scalable event streaming with zero ops overhead
Snowflake Sink Connector with Snowpipe Streaming delivers data in under 60 seconds
Schema changes are automatically captured — no more silent breakage
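As a sketch of this setup, here is roughly what registering a Debezium SQL Server connector on Kafka Connect might look like. All names (databases, tables, hosts, secrets) are illustrative assumptions, not the client's actual configuration:

```json
{
  "name": "shipments-cdc-connector",
  "config": {
    "connector.class": "io.debezium.connector.sqlserver.SqlServerConnector",
    "database.hostname": "sqlserver.internal",
    "database.port": "1433",
    "database.user": "cdc_user",
    "database.password": "${secrets:sqlserver/cdc_user}",
    "database.names": "freight",
    "topic.prefix": "freight",
    "table.include.list": "dbo.shipments,dbo.carriers",
    "snapshot.mode": "initial",
    "schema.history.internal.kafka.topic": "schema-history.freight",
    "schema.history.internal.kafka.bootstrap.servers": "kafka:9092"
  }
}
```

Posting this JSON to the Kafka Connect REST API starts an initial snapshot, after which the connector tails the SQL Server transaction log and emits one Kafka event per row-level change.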
02

Streams, Tasks & Dynamic Tables in Snowflake

On the Snowflake side, we built a layered processing model using native Snowflake features to transform raw CDC events into clean, queryable tables — both for current state and full history.

Streams & Tasks produce latest-state tables for 150+ source tables — chosen over Dynamic Tables for this layer for better cost/performance at the client's scale
Dynamic Tables generate SCD Type 2 history — dramatically simpler than hand-written merge logic
Together, the two layers give teams current data and full audit trails simultaneously
03

Lambda dbt Model for Cost-effective Real-time

The existing dbt model was rebuilt using a lambda architecture pattern — combining a real-time stream layer with the periodic batch layer. This ensures fresh data is always available while maintaining the cost efficiency and data quality of batch processing.

Real-time stream layer provides immediate data for operational dashboards
Batch layer periodically reconciles and provides the complete, validated dataset
Lambda merge tables combine both, giving downstream consumers the best of both worlds
Power BI dashboards now show live data instead of yesterday's numbers
04

YAML-driven Configuration & Code Generation

Instead of manually configuring each of the 150+ tables and their CDC connectors, we built a declarative configuration system. A single repository of YAML files defines every ingestion table, its schema mapping, and transformation rules.

Each table is defined once in YAML — connector config, schema, transforms, and tests
Debezium connector configs and Snowflake DDL are auto-generated from these definitions
Adding a new table to the pipeline takes minutes instead of hours of manual coding
The YAML repo serves as a living documentation of the entire ingestion landscape
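As a hypothetical example of the declarative approach (the keys and layout are illustrative; the actual schema was client-specific), one source table's definition might look like:

```yaml
# ingestion/tables/shipments.yaml
table: dbo.shipments
source_database: freight
primary_key: [shipment_id]
cdc:
  connector: debezium-sqlserver
  snapshot_mode: initial
snowflake:
  latest_state: curated.shipments    # maintained via Stream + Task
  history: history.shipments_scd2    # maintained via Dynamic Table
tests:
  - not_null: shipment_id
  - unique: shipment_id
```

A code-generation step expands each file into the Debezium connector JSON, Snowflake DDL, and dbt tests for that table, so onboarding a table is a single small pull request.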
04 Results

The impact

Measurable improvements across latency, reliability, developer velocity, and business outcomes.

Perigon played a key role in transforming our data infrastructure, integrating real-time streaming and analytics to enhance our insights. The transition from batch to streaming was seamless, and the new YAML-driven configuration approach has made our pipeline dramatically easier to manage.

— Wenjia Tang, Global Head of Data

< 60s

End-to-end latency

From database change to queryable data in Snowflake, down from 24 hours.

150+

Tables in real time

All critical business tables streamed via CDC with automatic schema evolution.

Live

Power BI dashboards

Operational teams now see real-time data, enabling immediate action.

100%

Schema change visibility

Debezium captures all DDL changes — the data team is never caught off guard.

~80%

Less ingestion code

YAML-driven generation replaced hundreds of manually written connector configs.

Minutes

New table onboarding

Adding a new source table to the pipeline: write YAML, push, done.

05 Get in Touch

Facing similar challenges?

Whether you're stuck on batch pipelines, dealing with silent schema changes, or need to bring real-time data to your operational teams — let's talk about what a modern streaming architecture could look like for your business.

No commitment required. Let's start with a conversation about your challenges and see if we're a good fit.