Ingestion Methods—CDC
Learn about a real-time ingestion method: change data capture (CDC).
Real-time data ingestion is important for businesses in many industries. In e-commerce and retail, it enables faster, more accurate demand forecasts and quick pricing adjustments. In IoT, it can deliver sensor alerts as events occur, helping companies reduce downtime and optimize product performance.
Let’s look at a (near) real-time data ingestion method, change data capture (CDC), and its three different approaches.
Change data capture
Change data capture (CDC) is the process of identifying and ingesting changes made to a source database. It provides real-time or near real-time data movement by delivering changes continuously as new database events occur. Because only the changes are transferred rather than full copies of the data, CDC is an efficient way to move data across a wide area network, which makes it a good fit for the cloud. There are many use cases for CDC. Here are a few examples:
Load real-time data into a data warehouse. Operational databases are not well suited to heavy analytical workloads, so operational data is usually moved to a data warehouse for analysis. Traditional batch-based ETL introduces latency because changes arrive only with the next batch run. With CDC, we can capture source data changes as they occur and deliver them to the data warehouse in real time.
Load real-time data into real-time frameworks. Database events can be delivered to streaming platforms and stream processing engines such as Apache Kafka and Apache Flink, where transformations are applied to provide real-time insights.
Data replication/synchronization. The source database might be located on an on-premises server. We can use CDC to capture and propagate every data change to the cloud. It can also be used to sync servers within the cloud.
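To make the replication use case concrete, here is a minimal sketch of how a consumer might apply a stream of change events to keep a target copy in sync. The `ChangeEvent` shape, the operation names, and the `apply_change` helper are all hypothetical and chosen for illustration; real CDC tools define their own event formats.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class ChangeEvent:
    """Hypothetical change event; real CDC tools use their own formats."""
    op: str                               # "insert", "update", or "delete"
    key: int                              # primary key of the affected row
    row: Optional[Dict[str, Any]] = None  # new row image; None for deletes


def apply_change(target: Dict[int, Dict[str, Any]], event: ChangeEvent) -> None:
    """Apply a single change event to an in-memory copy of the table."""
    if event.op in ("insert", "update"):
        if event.row is None:
            raise ValueError(f"{event.op} event needs a row image")
        target[event.key] = dict(event.row)   # upsert the new row image
    elif event.op == "delete":
        target.pop(event.key, None)           # ignore keys that are already gone
    else:
        raise ValueError(f"unknown operation: {event.op}")


# Changes captured from the source, in the order they occurred.
events = [
    ChangeEvent("insert", 1, {"name": "widget", "price": 10}),
    ChangeEvent("insert", 2, {"name": "gadget", "price": 25}),
    ChangeEvent("update", 1, {"name": "widget", "price": 12}),
    ChangeEvent("delete", 2),
]

replica: Dict[int, Dict[str, Any]] = {}
for event in events:
    apply_change(replica, event)

print(replica)
```

Only four small events cross the network here, yet the replica ends up matching the source. The events must be applied in the order they occurred, otherwise the update could be overwritten by the earlier insert.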