Streaming Data Pipelines with Apache Kafka and Delta Live Tables

Frank Munz
2 min read · Aug 19, 2022

How to extract features and prep your ML data with autoscaling, declarative, and low-latency data pipelines

[Image: Data Pipelines with Kafka]

Delta Live Tables (DLT) is the first ETL framework that uses a simple declarative approach for creating reliable data pipelines, while fully managing the underlying infrastructure at scale for batch and streaming data. Many use cases require actionable insights derived from near real-time data. Delta Live Tables enables low-latency streaming data pipelines to support such use cases by directly ingesting data from event buses like Apache Kafka, AWS Kinesis, Confluent Cloud, Amazon MSK, or Azure Event Hubs.
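
To illustrate the pattern, here is a minimal sketch of a DLT streaming table that ingests raw records from Kafka through Spark Structured Streaming's built-in Kafka source. The broker addresses, topic name, and table name are placeholders, not values from this article:

```python
import dlt

# Placeholders -- replace with your own Kafka connection details.
KAFKA_BOOTSTRAP_SERVERS = "host1:9092,host2:9092"
TOPIC = "tracker-events"

@dlt.table(
    comment="Raw events ingested from Kafka as a streaming bronze table."
)
def kafka_bronze():
    # `spark` is provided by the DLT runtime. DLT manages the stream's
    # checkpointing, retries, and autoscaling for you.
    return (
        spark.readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", KAFKA_BOOTSTRAP_SERVERS)
        .option("subscribe", TOPIC)
        .option("startingOffsets", "latest")
        .load()
    )
```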

This article walks through using DLT with Apache Kafka and provides the Python code required to ingest streams. It explains the recommended system architecture and explores related DLT settings worth considering along the way.
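
Building on the bronze table above, a second declarative table can decode Kafka's binary value payload into typed columns ready for feature extraction. The event schema below is a hypothetical example; substitute the fields of your actual payload:

```python
import dlt
from pyspark.sql.functions import from_json, col
from pyspark.sql.types import (
    StructType, StructField, StringType, DoubleType, TimestampType
)

# Hypothetical event schema -- adjust to match your actual messages.
event_schema = StructType([
    StructField("user_id", StringType(), True),
    StructField("event_type", StringType(), True),
    StructField("value", DoubleType(), True),
    StructField("event_time", TimestampType(), True),
])

@dlt.table(
    comment="Parsed events: Kafka's binary value decoded into typed columns."
)
def kafka_silver():
    return (
        # Read the bronze table as a stream; DLT wires the dependency.
        dlt.read_stream("kafka_bronze")
        # Kafka delivers the payload as binary, so cast before parsing JSON.
        .select(from_json(col("value").cast("string"), event_schema).alias("event"))
        .select("event.*")
    )
```

Because both functions are plain declarations, DLT discovers them, builds the dependency graph from the `dlt.read_stream` call, and handles orchestration and scaling itself.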

Did you enjoy reading this article? Give it 13 claps (they are free and bring me joy!) and follow me here on Medium. If you'd like more cloud-based data science, data engineering, and AI/ML content, feel free to follow me on Twitter (or LinkedIn).

