Data Transformations with ADF: Power Query

Learn how to use Power Query in Azure Data Factory to build data pipeline for efficient data manipulation and processing.

Data processing and manipulation is a crucial part of modern businesses. As companies deal with increasingly larger datasets, data processing becomes more complex and requires efficient tools. One such tool is Power Query, a data transformation and cleaning tool available in ADF. This lesson will discuss how Power Query can be used as a data pipeline in Azure Data Factory and how it helps in data manipulation and processing.

Power Query in Azure Data Factory

Power Query is a powerful data transformation and cleaning tool provided by Microsoft. It is available as an add-in for Excel and can be used with various other Microsoft products, including Azure Data Factory. Power Query enables users to perform various data transformations, such as filtering, merging, and pivoting. It provides a user-friendly interface that allows users to create modifications using a drag-and-drop approach.

Importance of Power Query in data manipulation and processing

Data cleaning and transformation are essential steps in data processing. Raw data often contains errors, inconsistencies, and missing values, making it difficult to analyze and draw meaningful insights. Power Query helps to overcome these challenges by providing a range of data transformation capabilities. Some of the benefits of using Power Query in data manipulation and processing include:

Microsoft Learn documentation provides details on how power queries are a derived form of calculations done on other familiar platforms like Power BI, Excel and Access.

  • Data consistency: Power Query allows users to transform raw data into a consistent format, making it easier to analyze and draw meaningful insights.

  • Time-saving: Power Query’s drag-and-drop interface and its ability to create repeatable transformations, help to save time in data cleaning and transformation.

  • Data integration: Power Query can be used to integrate data from multiple sources, such as databases, spreadsheets, and cloud-based data services, making it easier to analyze and draw insights from various data sources.

Running Power Query as a data pipeline in ADF

Power Query can be used as a data pipeline in Azure Data Factory to perform various data transformations and cleaning operations. Below are the steps to create a power query data pipeline in Azure Data Factory.

Power Query data pipeline

In ADF, Power Query can be used as a data pipeline activity to perform data transformation tasks. The Power Query data pipeline activity in Azure Data Factory provides a visual interface that allows users to build data transformation workflows using a drag-and-drop approach. Users can select the input data source, apply transformations, and specify the target data destination using a graphical user interface.

The Power Query data pipeline activity in Azure Data Factory is based on the Power Query language used to express data transformation tasks. The language provides a rich set of data transformation functions, such as filtering, sorting, grouping, aggregating, and pivoting. The Power Query data pipeline activity in Azure Data Factory can transform data from a wide range of data sources, including Excel, CSV, JSON, XML, SQL Server, Oracle, and other sources. It also provides the ability to connect to cloud-based data stores such as Azure Blob Storage, Azure Data Lake Storage, and Azure SQL Database.

Once a Power Query data pipeline activity is defined in Azure Data Factory, it can be executed as part of a pipeline run. The pipeline run orchestrates the execution of the activity and manages the flow of data between the input and output data stores. Overall, the Power Query data pipeline activity in Azure Data Factory provides a powerful and flexible mechanism for data transformation tasks and allows users to build complex data transformation workflows simply and intuitively.

Step 1: Prepare the source data

In this step, we’ll prepare the source data for our power query operations. To ensure the source data is set up, follow the steps below:

  1. Sign in to ADF, select “Power Query” from the left navigation pane, and create a “New power query.”

  2. Select the PII sample dataset created earlier from the dataset options in ADF. We’ll be using this data for our Power Query transformations.

  3. The data preview shows a glimpse of the source dataset.

The illustrations below can be used for reference on the steps explained above.

Get hands-on with 1300+ tech skills courses.