In this Cloud Lab, we will learn to create a data pipeline to efficiently gather, process, and securely store industrial data. We will start by creating the IAM roles and the S3 bucket required for the pipeline. Next, we will connect our device to IoT Core by creating the required IoT Core infrastructure and then using the security credentials associated with that infrastructure to establish a connection between the device and IoT Core. This will conclude the first part of the pipeline: sending the device data to IoT Core.
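As a preview of the device-side step, here is a minimal sketch of publishing telemetry to IoT Core using the AWS IoT Device SDK v2 for Python. The endpoint, client ID, topic, and certificate paths are placeholders; substitute the values you obtain while creating the IoT Core infrastructure in this lab.

```python
import json
import time

from awscrt import mqtt
from awsiot import mqtt_connection_builder

# Placeholder values; replace with the endpoint, client ID, and
# credentials generated when you create the thing and certificate.
ENDPOINT = "xxxxxxxxxxxxxx-ats.iot.us-east-1.amazonaws.com"
CLIENT_ID = "industrial-device-01"
TOPIC = "industrial/telemetry"

# Build an mTLS connection using the device certificate and private key
# downloaded from IoT Core, plus the Amazon root CA.
connection = mqtt_connection_builder.mtls_from_path(
    endpoint=ENDPOINT,
    cert_filepath="certs/device.pem.crt",
    pri_key_filepath="certs/private.pem.key",
    ca_filepath="certs/AmazonRootCA1.pem",
    client_id=CLIENT_ID,
    clean_session=False,
    keep_alive_secs=30,
)
connection.connect().result()  # block until the MQTT connection is up

# Publish a few sample sensor readings, one per second.
for reading in range(5):
    payload = {"device_id": CLIENT_ID, "temperature": 20.0 + reading, "ts": time.time()}
    connection.publish(
        topic=TOPIC,
        payload=json.dumps(payload),
        qos=mqtt.QoS.AT_LEAST_ONCE,
    )
    time.sleep(1)

connection.disconnect().result()
```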
The next step is to create a Kinesis data stream, which will act as a data conduit, allowing data from IoT Core to flow through it efficiently. It will also set the stage for the effective and secure transfer of data to Amazon S3 for long-term storage. Once the data stream is created, we will integrate it with IoT Core and test whether the integration works.
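A hedged sketch of that wiring with boto3 is shown below. The stream name, rule name, topic filter, and role ARN are illustrative; the role must already exist and allow IoT Core to call `kinesis:PutRecord`.

```python
import boto3

kinesis = boto3.client("kinesis")
iot = boto3.client("iot")

STREAM_NAME = "industrial-data-stream"  # illustrative name

# Create the data stream that will carry records from IoT Core toward S3.
kinesis.create_stream(StreamName=STREAM_NAME, ShardCount=1)
kinesis.get_waiter("stream_exists").wait(StreamName=STREAM_NAME)

# Create an IoT topic rule that forwards every message published on the
# device topic into the Kinesis stream. The role ARN is a placeholder for
# a role that IoT Core can assume with kinesis:PutRecord permission.
iot.create_topic_rule(
    ruleName="ForwardTelemetryToKinesis",
    topicRulePayload={
        "sql": "SELECT * FROM 'industrial/telemetry'",
        "actions": [
            {
                "kinesis": {
                    "roleArn": "arn:aws:iam::123456789012:role/iot-to-kinesis-role",
                    "streamName": STREAM_NAME,
                    "partitionKey": "${newuuid()}",
                }
            }
        ],
        "ruleDisabled": False,
    },
)
```

To test the integration, you can publish a message on the device topic and read it back from the stream, for example with `kinesis.get_shard_iterator` followed by `kinesis.get_records`.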
Next, we will set up an ETL (Extract, Transform, Load) job. This job is tasked with extracting data from the Kinesis data stream and efficiently transferring it to an Amazon S3 bucket for long-term storage. To do that, we will first create a catalog table with our Kinesis data stream as its source and then write an ETL job that uses this catalog table to transfer the data from the stream to an Amazon S3 bucket.
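The Glue streaming script for such a job might look roughly like the sketch below. The database name (`industrial_db`), catalog table name (`kinesis_telemetry`), and S3 paths are all assumed placeholders, not values defined by this lab.

```python
import sys

from awsglue.context import GlueContext
from awsglue.dynamicframe import DynamicFrame
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read the stream through the catalog table created for the Kinesis source.
source_frame = glue_context.create_data_frame.from_catalog(
    database="industrial_db",        # placeholder database name
    table_name="kinesis_telemetry",  # placeholder catalog table name
    additional_options={"startingPosition": "TRIM_HORIZON", "inferSchema": "true"},
)

def write_batch(data_frame, batch_id):
    """Write each micro-batch of stream records to S3 as JSON."""
    if data_frame.count() > 0:
        dynamic_frame = DynamicFrame.fromDF(data_frame, glue_context, "batch")
        glue_context.write_dynamic_frame.from_options(
            frame=dynamic_frame,
            connection_type="s3",
            connection_options={"path": "s3://industrial-data-lake/telemetry/"},
            format="json",
        )

# Process the stream in 100-second windows, checkpointing to S3.
glue_context.forEachBatch(
    frame=source_frame,
    batch_function=write_batch,
    options={
        "windowSize": "100 seconds",
        "checkpointLocation": "s3://industrial-data-lake/checkpoints/",
    },
)
job.commit()
```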
After completing this Cloud Lab, you will have gained hands-on experience in setting up an end-to-end data pipeline for IoT data using AWS IoT Core, Kinesis Data Streams, and Amazon S3.
Here is a high-level architecture diagram of the infrastructure that you will create in this Cloud Lab: