Handling Incremental Loads and Change Data Capture
Learn how to handle incremental loads and change data capture in ADF.
Data is generated in large volumes and at a rapid pace, so data pipelines need to process it efficiently. One of the most important design decisions for a pipeline is how to handle incremental loads and change data capture. An incremental load writes only new or updated data to a target system, as opposed to a full refresh of all data. Change data capture is the process of identifying and capturing changes made to a data source over time. In this lesson, we’ll explore how Azure Data Factory (ADF) can be configured to handle these requirements.
Incremental data load
An incremental load transfers only the changes made to the data source since the last time the data was loaded. Instead of reloading the entire dataset on every run, the pipeline identifies the new or changed rows and loads only those. This saves time and resources, especially when dealing with large datasets. To do this, the pipeline needs a reliable way to tell which rows have changed, for example a last-modified timestamp or an incrementing key.
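A common way to find "the changes since the last load" is to remember a high-water mark (often called a watermark), such as the largest last-modified timestamp seen so far, and to read only rows newer than it. The following is a minimal sketch of that idea in plain Python, not ADF code. The function name incremental_load, the last_modified column, and the sample rows are all assumptions made for illustration.

```python
from datetime import datetime


def incremental_load(source_rows, target_rows, last_watermark):
    """Append rows modified after last_watermark and return the new watermark.

    Hypothetical helper: source_rows and target_rows are lists of dicts,
    and every row carries an assumed "last_modified" timestamp column.
    """
    # Select only the rows that changed since the previous run.
    changed = [r for r in source_rows if r["last_modified"] > last_watermark]
    target_rows.extend(changed)
    # Advance the watermark; keep the old one if nothing changed.
    return max((r["last_modified"] for r in changed), default=last_watermark)


# Illustrative sample data (not from any real system).
source = [
    {"id": 1, "name": "alpha", "last_modified": datetime(2024, 1, 1)},
    {"id": 2, "name": "beta", "last_modified": datetime(2024, 1, 5)},
    {"id": 3, "name": "gamma", "last_modified": datetime(2024, 1, 9)},
]
target = []

# The previous run is assumed to have finished at 2024-01-03.
watermark = datetime(2024, 1, 3)
watermark = incremental_load(source, target, watermark)

print([row["id"] for row in target])  # ids of the rows that were loaded
print(watermark)                      # watermark to store for the next run
```

Running the function again with the returned watermark loads nothing, because no source row is newer than it. Note that this sketch only appends rows. Merging updated rows into existing target rows would additionally need a key-based comparison.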
Types of incremental loads in Azure Data Factory
Incremental loads can be performed in several ways in ADF:
Full load with delta detection: In this approach, the entire data is loaded initially, and then only the changes are detected and loaded in subsequent runs. This is useful when the size of the data source is relatively small.
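Append-only: In this approach, new rows are appended to the existing data in the destination, and rows that were already loaded are left unchanged. This is useful when the data source is too large to be loaded entirely every time and existing records are not modified after they are written, as with logs or events.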
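Copy activity with an incremental filter: The Copy activity can be configured to copy only new or updated data from the source system to the target system. A common way to do this is to filter the source query on a watermark column, such as a last-modified timestamp, so that only rows changed since the previous run are read. Source and target rows can also be compared on a specified key column: rows that already exist unchanged in the target are skipped, and rows with no match are copied to the target system. For sinks that support upsert, such as Azure SQL Database, rows with a matching key can be updated instead of skipped.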
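Lookup activity: This activity retrieves a value or a small result set from a data store so that later activities in the pipeline can use it. In incremental pipelines, it is commonly used to read the last stored watermark value, or to check whether a record already exists in the target system and so determine whether an insert or update operation should be performed.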
Stored procedures: Stored procedures can be used to update data incrementally by performing delta updates. This involves updating only the changed rows in the target system based on a specified key column.
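Several of the approaches above rely on the same core step: compare source and target rows on a key column, then insert, update, or skip each row. The sketch below illustrates that logic in plain Python. It is not ADF code or a real stored procedure. The function name upsert_by_key, the id key column, and the sample rows are assumptions chosen for the example.

```python
def upsert_by_key(source_rows, target, key="id"):
    """Merge source_rows into target and count what happened to each row.

    Hypothetical helper: target is a dict that maps key values to rows,
    standing in for a target table with a primary key.
    """
    counts = {"inserted": 0, "updated": 0, "skipped": 0}
    for row in source_rows:
        key_value = row[key]
        if key_value not in target:
            # No match on the key column: insert the row.
            target[key_value] = dict(row)
            counts["inserted"] += 1
        elif target[key_value] != row:
            # Key matches but the values differ: update the row.
            target[key_value] = dict(row)
            counts["updated"] += 1
        else:
            # Key matches and nothing changed: skip the row.
            counts["skipped"] += 1
    return counts


# Illustrative sample data (not from any real system).
target_table = {
    1: {"id": 1, "city": "Oslo"},
    2: {"id": 2, "city": "Lima"},
}
incoming_rows = [
    {"id": 1, "city": "Oslo"},   # unchanged
    {"id": 2, "city": "Quito"},  # changed
    {"id": 3, "city": "Hanoi"},  # new
]

result = upsert_by_key(incoming_rows, target_table)
print(result)
```

In a database, the same decision is usually pushed down to the engine, for example with a MERGE statement inside a stored procedure, so that the comparison runs close to the data.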
Change data capture (CDC)
Change data capture (CDC) is a technique for recording the inserts, updates, and deletes made to a data source. A CDC-based pipeline identifies the changes made to the source since the last time the data was loaded and extracts only the changed data, thus reducing the amount of data to be processed. Because every change is captured, CDC is also useful when data needs to be tracked and audited.
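To make the idea concrete, the sketch below replays a small change log against a target copy, applying only the changes recorded after the last processed position. It is a simplified illustration in plain Python and does not use any ADF or database CDC API. The record layout (seq, op, key, data) and the function name apply_changes are assumptions.

```python
def apply_changes(change_log, target, last_seq):
    """Apply change records newer than last_seq and return the new position.

    Hypothetical helper: each change record is a dict with an assumed
    sequence number "seq", an operation "op", a row "key", and row "data".
    """
    # Extract only the changes made since the last load, in order.
    pending = sorted(
        (c for c in change_log if c["seq"] > last_seq),
        key=lambda c: c["seq"],
    )
    for change in pending:
        if change["op"] == "delete":
            target.pop(change["key"], None)
        else:
            # "insert" and "update" both write the latest row image.
            target[change["key"]] = change["data"]
        last_seq = change["seq"]
    return last_seq


# Illustrative change log (not from any real system).
change_log = [
    {"seq": 1, "op": "insert", "key": 10, "data": {"status": "new"}},
    {"seq": 2, "op": "update", "key": 10, "data": {"status": "paid"}},
    {"seq": 3, "op": "insert", "key": 11, "data": {"status": "new"}},
    {"seq": 4, "op": "delete", "key": 11, "data": None},
]

replica = {}
checkpoint = apply_changes(change_log, replica, last_seq=0)
print(replica)     # state of the target after replaying the log
print(checkpoint)  # position to resume from on the next run
```

Storing the returned position after each run means the next run starts from the first unprocessed change, so the same change is not applied twice.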