Data Profiling and Quality

Learn about data profiling and data quality in Azure Data Factory.

Data profiling and data quality are essential to any data integration process, and Azure Data Factory provides tools for checking that data is of high quality before it is processed further. Data Factory can profile data as it is copied and surface insights into its quality. It also allows custom rules to be defined so that data can be validated against business requirements. Through its integration with Azure Databricks, Data Factory can use big data processing to perform in-depth data profiling and quality analysis.

Data quality

Data quality is a critical factor in ensuring that data analytics and business intelligence applications provide accurate and reliable insights. Data profiling is the process of analyzing and assessing the quality and structure of data to ensure it is suitable for its intended purpose.

Importance of data profiling in assisting with clean data
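
To make the idea of profiling concrete, here is a minimal sketch in plain Python, independent of Azure Data Factory. It counts missing values, distinct values, and the most frequent value for each column of a CSV. The function name `profile_csv` and the sample columns (`title`, `year`, `rating`) are made up for illustration and are not taken from the moviesDB.csv file.

```python
import csv
import io
from collections import Counter


def profile_csv(text):
    """Return basic per-column statistics for CSV text with a header row."""
    reader = csv.DictReader(io.StringIO(text))
    fieldnames = reader.fieldnames or []  # empty input has no header
    stats = {name: {"missing": 0, "values": Counter()} for name in fieldnames}
    rows = 0
    for row in reader:
        rows += 1
        for name in fieldnames:
            # Treat absent or whitespace-only cells as missing.
            value = (row.get(name) or "").strip()
            if value == "":
                stats[name]["missing"] += 1
            else:
                stats[name]["values"][value] += 1
    return {
        name: {
            "rows": rows,
            "missing": s["missing"],
            "distinct": len(s["values"]),
            "most_common": s["values"].most_common(1)[0][0] if s["values"] else None,
        }
        for name, s in stats.items()
    }


# Hypothetical sample data, used only to exercise the function.
SAMPLE = "title,year,rating\nAlpha,1999,7\nBeta,,8\nGamma,1999,\n"
profile = profile_csv(SAMPLE)
```

Here `profile["year"]` reports one missing value out of three rows, which is the kind of signal a profiling step is meant to surface before the data is used downstream.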

In this lesson, we discuss how data quality and profiling can be achieved in Azure Data Factory (ADF). As a prerequisite to adding data profiling, we first need data to work with.

Uploading data to Azure Blob Storage

Before getting started with data quality, let’s add some data to the Azure Blob Storage container that we created in an earlier lesson. We’ll use the moviesDB.csv file for this lesson. Follow the steps below to upload the file to Blob Storage:

  1. Log in to the Azure portal and search for “Storage accounts.”

  2. Open the storage account that we created in an earlier lesson. Its container should be listed under “Containers” within the storage account.

  3. Select the adftutorial container.

  4. Click the “Upload” button inside the container to upload a new file.

  5. Now, select the moviesDB.csv file and complete the upload.

Below is a glimpse of all the tasks listed above:

Search for storage accounts on the Azure portal UI
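
The portal steps above can also be scripted. The following is a minimal sketch, assuming the `azure-storage-blob` Python package is installed and that a connection string for the storage account is available in an environment variable named `AZURE_STORAGE_CONNECTION_STRING`. The variable name and the helper names `upload_file` and `connect_container` are assumptions for illustration, not part of the lesson.

```python
import os
from pathlib import Path


def upload_file(container_client, local_path, blob_name=None, overwrite=True):
    """Upload a local file to a blob container and return the blob name used."""
    path = Path(local_path)
    name = blob_name or path.name  # default to the local file name
    with path.open("rb") as data:
        container_client.upload_blob(name=name, data=data, overwrite=overwrite)
    return name


def connect_container(container_name="adftutorial"):
    """Build a container client from a connection string (assumed env variable)."""
    # Imported here so the rest of the module works without the Azure SDK installed.
    from azure.storage.blob import BlobServiceClient

    conn_str = os.environ["AZURE_STORAGE_CONNECTION_STRING"]
    service = BlobServiceClient.from_connection_string(conn_str)
    return service.get_container_client(container_name)
```

With valid credentials, calling `upload_file(connect_container(), "moviesDB.csv")` would be the scripted equivalent of steps 3 to 5. Passing the container client in as a parameter keeps the upload logic easy to test without a live storage account.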

With this upload, the moviesDB.csv file should be present in the adftutorial blob container of the storage account. Now, we will use this data file to perform data ...