Data Profiling and Quality
Learn about data profiling and data quality in Azure Data Factory.
Data profiling and data quality are essential aspects of any data integration process, and Azure Data Factory provides tools to ensure that data is of high quality before it is processed. Azure Data Factory can profile data during the copying process and provide insights into the quality of the data. The service also allows for custom rules to be defined to validate data against business requirements. With the integration of Azure Databricks, Data Factory can leverage the power of big data processing to perform in-depth data profiling and quality analysis.
Data quality
Data quality is a critical factor in ensuring that data analytics and business intelligence applications provide accurate and reliable insights. Data profiling is the process of analyzing and assessing the quality and structure of data to ensure it is suitable for its intended purpose.
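To make the idea of profiling concrete, here is a minimal sketch in plain Python, independent of ADF, that computes a few common profile statistics per column: row count, missing values, and distinct values. The function name `profile_csv` and the sample rows are hypothetical and for illustration only; they are not taken from moviesDB.csv.

```python
import csv
import io


def profile_csv(text):
    """Return per-column row, missing, and distinct counts for CSV text."""
    reader = csv.DictReader(io.StringIO(text))
    columns = reader.fieldnames or []
    stats = {c: {"rows": 0, "missing": 0, "values": set()} for c in columns}

    for row in reader:
        for c in columns:
            value = (row.get(c) or "").strip()
            stats[c]["rows"] += 1
            if value == "":
                # Empty cells are counted as missing values.
                stats[c]["missing"] += 1
            else:
                stats[c]["values"].add(value)

    return {
        c: {"rows": s["rows"], "missing": s["missing"], "distinct": len(s["values"])}
        for c, s in stats.items()
    }


# Made-up sample data (an assumption, not the real file contents).
SAMPLE = """title,year,rating
Movie A,1999,7
Movie B,,8
Movie A,2001,
"""

if __name__ == "__main__":
    for column, summary in profile_csv(SAMPLE).items():
        print(column, summary)
```

A profile like this is the kind of summary that tells you whether a column has gaps or unexpected duplicates before the data is used downstream.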
In this lesson, we discuss how data quality and profiling can be achieved in Azure Data Factory (ADF). As a prerequisite to adding data profiling, we first need data to work with.
Uploading data to Azure Blob
Before getting started with data quality, let’s quickly add data to the Azure Blob container we created in an earlier lesson. We’ll use the moviesDB.csv file for this lesson. Follow the steps below to upload the file to Blob Storage:
Log in to the Azure portal and search for “Storage accounts.”
The storage account and corresponding container that we created in an earlier lesson should be available under the “Container” tab within storage accounts.
Select the adftutorial container.
Click the “Upload” button inside the container to upload a new file.
Now, select the moviesDB.csv file and complete the upload.
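The portal steps above can also be scripted. The following is a minimal sketch, assuming the `azure-storage-blob` Python package is installed and that a connection string for the storage account is available in an `AZURE_STORAGE_CONNECTION_STRING` environment variable (both are assumptions, not part of this lesson). The helper names `upload_file` and `upload_from_env` are hypothetical.

```python
import os
from pathlib import Path


def upload_file(service_client, container_name, local_path, overwrite=True):
    """Upload a local file to a blob container and return the blob name.

    `service_client` is expected to behave like
    azure.storage.blob.BlobServiceClient.
    """
    local_path = Path(local_path)
    # Use the file name as the blob name inside the container.
    blob_client = service_client.get_blob_client(
        container=container_name, blob=local_path.name
    )
    with local_path.open("rb") as handle:
        blob_client.upload_blob(handle, overwrite=overwrite)
    return local_path.name


def upload_from_env(container_name="adftutorial", local_path="moviesDB.csv"):
    """Build a client from the environment and upload the file."""
    # Imported lazily so the helper above can be used without the SDK.
    from azure.storage.blob import BlobServiceClient

    # Assumption: the connection string is stored in this variable.
    connection_string = os.environ["AZURE_STORAGE_CONNECTION_STRING"]
    service_client = BlobServiceClient.from_connection_string(connection_string)
    return upload_file(service_client, container_name, local_path)
```

Calling `upload_from_env()` from a directory that contains moviesDB.csv should have the same effect as the portal upload, provided the adftutorial container already exists.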
Below is a glimpse of all the tasks listed above:
With this upload, the moviesDB.csv file should be present in the adftutorial blob container of the storage account. Now, we will use this data file to perform data ...