Link Azure Storage to Azure Data Factory

Learn how to connect storage accounts to ADF, which is crucial for storing raw source and sink files processed by data factory operations.

Microsoft Azure offers Azure Storage, a highly scalable and secure cloud storage service. It provides several storage services:

  • Blob Storage, for unstructured data: Azure Blob Storage is Microsoft's object storage solution for the cloud. It is optimized for storing massive amounts of unstructured data, that is, data that doesn't adhere to a particular data model or definition, such as text or binary data.

  • File Storage, for shared file access: Azure Files is a cloud storage service designed for sharing files, development or debugging tools, and applications that rely on native file systems.

  • Queue Storage, for dependable messaging: Azure Queue Storage is a service for storing large numbers of messages.

  • Table Storage, for NoSQL data: Azure Table storage is a service that stores non-relational structured data (also known as structured NoSQL data) in the cloud, providing a key/attribute store with a schemaless design.

Azure Storage is essential to Azure Data Factory (ADF) because it serves as a main data source or destination for data migration and transformation activities. During pipeline runs, ADF reads data from and writes data to Azure Storage. ADF's integration with Azure Storage and other data sources and sinks lets users ingest, process, and store data at scale. Azure Storage's reliability, durability, and global accessibility make it a core component for building robust and scalable data integration solutions with Azure Data Factory.

Before creating and linking storage instances to the data factory, ensure that the following prerequisites are in place in the Azure environment:

  • An active Azure subscription

  • An active resource group

  • An Azure Data Factory (ADF) instance

We have detailed the instructions for creating an Azure account and a data factory instance in previous lessons; be sure to complete them!

Creating and linking storage instances in Azure

Creating and linking storage instances in Azure is a fundamental step in setting up a data storage solution. The storage options within Azure Storage are available across different regions and are designed to meet different performance, scalability, and cost requirements. To create a storage instance in Azure, specify the storage type, storage account name, replication type, and storage region.

After a storage instance has been created, it can be linked with other Azure services or external applications using either of the following:

  • Access keys: When you create a storage account, Azure generates two 512-bit storage account access keys for that account. These keys can be used to authorize access to data in your storage account via Shared Key authorization, or via SAS tokens that are signed with the shared key.

  • Shared access signatures (SAS): A shared access signature is a URI that grants restricted access rights to Azure Storage resources. You can provide a shared access signature to clients who shouldn't be trusted with your storage account key but who need access to certain storage account resources.

Linked storage instances can be used for data archiving, backup, recovery, analytics, and machine learning applications, among other things.
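The creation settings and the access-key option described above can be sketched in a few lines of Python. This is a minimal illustration, not an official client: the `StorageAccountSettings` class, the `connection_string` helper, and all sample values are hypothetical, and the request body only mirrors the general shape of a storage account definition (kind, SKU, location) as we understand it.

```python
from dataclasses import dataclass


@dataclass
class StorageAccountSettings:
    """The four creation settings named in the text (hypothetical helper)."""
    name: str      # storage account name
    kind: str      # storage type, e.g. "StorageV2"
    sku_name: str  # performance tier plus replication type, e.g. "Standard_LRS"
    location: str  # storage region, e.g. "eastus"

    def to_request_body(self) -> dict:
        # Assumed shape: the account name identifies the resource,
        # so it is not repeated inside the body.
        return {
            "kind": self.kind,
            "sku": {"name": self.sku_name},
            "location": self.location,
        }


def connection_string(account_name: str, account_key: str) -> str:
    """Assemble a Shared Key connection string from one of the two access keys."""
    return (
        "DefaultEndpointsProtocol=https;"
        f"AccountName={account_name};"
        f"AccountKey={account_key};"
        "EndpointSuffix=core.windows.net"
    )


settings = StorageAccountSettings(
    name="examplestorage001",  # placeholder; real names must be globally unique
    kind="StorageV2",
    sku_name="Standard_LRS",
    location="eastus",
)
print(settings.to_request_body())
print(connection_string(settings.name, "<access-key>"))
```

The access key is left as a placeholder on purpose: a real key grants full access to the account, so it should be kept in a secret store, not in source code.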

Azure Data Factory's linked services define connections between the platform and external data sources or destinations, such as cloud services, file systems, and databases. They hold the authentication information, connection string, and other properties required to access the data. Linked services function as a bridge, enabling data transfer between Azure Data Factory and the linked data sources or destinations. They are essential for creating and maintaining the connections that data integration pipelines rely on.
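To make the idea concrete, here is a minimal sketch of what a linked service definition for Blob Storage might look like, built as a Python dictionary and printed as JSON. The helper `blob_linked_service`, the name `ExampleBlobStorageLS`, and the placeholder connection string are assumptions for illustration; the `AzureBlobStorage` type and `connectionString` property follow the linked service JSON format as we understand it.

```python
import json


def blob_linked_service(name: str, connection_string: str) -> dict:
    """Build a linked service definition for Azure Blob Storage (hypothetical helper)."""
    return {
        "name": name,
        "properties": {
            "type": "AzureBlobStorage",
            # Authentication details and endpoint are carried in the connection string
            "typeProperties": {"connectionString": connection_string},
        },
    }


linked_service = blob_linked_service(
    "ExampleBlobStorageLS",  # hypothetical linked service name
    "DefaultEndpointsProtocol=https;AccountName=<account-name>;"
    "AccountKey=<access-key>;EndpointSuffix=core.windows.net",
)
print(json.dumps(linked_service, indent=2))
```

The printed JSON is similar in shape to what the ADF authoring UI shows in its code view for a linked service, although the UI may add further properties.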

Linked services offer the advantage of connection reuse across pipelines, saving time and effort. In larger organizations, using a single linked service avoids creating duplicate connections for the same data source or destination. Because the necessary properties and authentication details are defined once, in one place, linked services also simplify connection management.
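Connection reuse can be pictured as several datasets pointing at one linked service by name. The sketch below assumes a hypothetical linked service called `ExampleBlobStorageLS` and hypothetical container and file names; the dataset layout (a `DelimitedText` dataset with a `LinkedServiceReference`) follows the ADF dataset JSON format as we understand it.

```python
def blob_dataset(name: str, linked_service_name: str, container: str, file_name: str) -> dict:
    """Build a delimited-text dataset that refers to a linked service by name."""
    return {
        "name": name,
        "properties": {
            "type": "DelimitedText",
            # The dataset stores no credentials; it only references the linked service
            "linkedServiceName": {
                "referenceName": linked_service_name,
                "type": "LinkedServiceReference",
            },
            "typeProperties": {
                "location": {
                    "type": "AzureBlobStorageLocation",
                    "container": container,
                    "fileName": file_name,
                }
            },
        },
    }


# Two datasets (a source and a sink) share the same connection definition
raw = blob_dataset("RawSales", "ExampleBlobStorageLS", "raw", "sales.csv")
processed = blob_dataset("ProcessedSales", "ExampleBlobStorageLS", "processed", "sales_clean.csv")

shared = {d["properties"]["linkedServiceName"]["referenceName"] for d in (raw, processed)}
print(shared)
```

If the access key is rotated, only the linked service needs updating; neither dataset changes.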

Step 1: Create an Azure Storage instance

In the earlier lessons, we created an Azure Data Factory instance and an Azure Storage instance. In this lesson, we will link the two together so that we have a storage layer for the data processing. Azure Storage will ensure there is a location where all raw, processed, and output files can stay. Let’s start by recapping the creation of an Azure Data Factory instance:

Note: The names of the Azure Data Factory (ADF) and Azure Storage instances must be globally unique; therefore, you will be unable to reuse the names used below.
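Before choosing a storage account name, it can help to check its format locally. The sketch below assumes the commonly documented rule for storage account names (3 to 24 characters, lowercase letters and digits only); the function name and sample names are hypothetical. A local check cannot confirm global uniqueness, which only Azure can verify at creation time.

```python
import re

# Assumed rule: 3-24 characters, lowercase letters and digits only
STORAGE_NAME_PATTERN = re.compile(r"[a-z0-9]{3,24}")


def is_valid_storage_account_name(name: str) -> bool:
    """Check the name format only; uniqueness must be verified by Azure."""
    return STORAGE_NAME_PATTERN.fullmatch(name) is not None


for candidate in ["examplestorage001", "My-Storage", "ab"]:
    print(candidate, is_valid_storage_account_name(candidate))
```

A name that passes this check can still be rejected if another Azure customer already uses it, so adding a short project-specific suffix is a common way to reduce collisions.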
