Automating Data Processing with AWS Glue DataBrew

In this Cloud Lab, you’ll learn how to process data with AWS Glue DataBrew. You’ll also learn how to automate data processing using an AWS Lambda function.

10 Tasks

intermediate

1hr 30m

Certificate of Completion

Desktop Only

No Setup Required

Amazon Web Services

Learning Objectives

Working knowledge of integrating AWS Glue DataBrew with Amazon S3

The ability to define and apply data processing steps (a recipe) in AWS Glue DataBrew

Thorough understanding of jobs in AWS Glue DataBrew

Hands-on experience creating a trigger and invoking AWS Glue DataBrew jobs with a Lambda function

Technologies

Lambda

Glue

Labs Rules Apply

Stay within resource usage requirements.

Do not engage in cryptocurrency mining.

Do not engage in or encourage activity that is illegal.

Cloud Lab Overview

AWS Glue DataBrew is a fully managed service offered by Amazon Web Services (AWS). It is a data processing tool that can be used to clean, transform, and prepare large volumes of data for data analytics and machine learning tasks. Because it provides a visual interface, AWS Glue DataBrew is accessible to both technical and non-technical users.

In this Cloud Lab, you’ll create an S3 bucket and upload the data to be processed. Next, you’ll create execution roles that grant AWS Glue DataBrew and AWS Lambda the permissions they need. You’ll then create a dataset in AWS Glue DataBrew and connect it to the S3 bucket, create an AWS Glue DataBrew project, and define the data processing steps (the recipe) to apply to the dataset. After that, you’ll create a job that runs the project’s recipe and saves the results to the S3 bucket. Finally, you’ll automate the whole process with a Lambda function that is triggered every time a new file is uploaded to the S3 bucket and runs the job.
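The final automation step can be sketched as a small Python Lambda handler. This is a minimal illustration, not the lab’s actual code: the job name, the `DATABREW_JOB_NAME` and `INPUT_PREFIX` environment variables, and the `input/` prefix are all assumptions made for the example. The handler reads the object keys from the S3 event notification and calls the DataBrew `start_job_run` API through boto3.

```python
import os
import urllib.parse

# Hypothetical names: set these to match your own job and bucket layout.
JOB_NAME = os.environ.get("DATABREW_JOB_NAME", "my-databrew-job")
INPUT_PREFIX = os.environ.get("INPUT_PREFIX", "input/")


def uploaded_keys(event):
    """Return the decoded object keys from an S3 event notification."""
    keys = []
    for record in event.get("Records", []):
        raw_key = record["s3"]["object"]["key"]
        # S3 URL-encodes keys in event payloads (spaces arrive as '+').
        keys.append(urllib.parse.unquote_plus(raw_key))
    return keys


def lambda_handler(event, context, databrew=None):
    """Start the DataBrew job when a new input file lands in the bucket."""
    if databrew is None:
        import boto3  # imported lazily so the handler can be tested offline

        databrew = boto3.client("databrew")

    # Assumption: inputs live under INPUT_PREFIX. Ignoring other keys keeps
    # the job's own output, written to the same bucket, from retriggering
    # this function.
    new_inputs = [k for k in uploaded_keys(event) if k.startswith(INPUT_PREFIX)]
    if not new_inputs:
        return {"started": False, "keys": []}

    response = databrew.start_job_run(Name=JOB_NAME)
    return {"started": True, "keys": new_inputs, "run_id": response["RunId"]}
```

The prefix check is a design choice rather than a requirement. Because the job writes its results back to the same bucket, an unfiltered trigger could start a new run for every output file. The function’s execution role would also need permission to start DataBrew job runs.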

The following is the high-level architecture diagram of the infrastructure you’ll create in this Cloud Lab:

Cloud Lab Tasks

1. Introduction

Getting Started

2. Create an S3 Bucket and the Execution Roles

Create an S3 Bucket

Create the Execution Roles

3. Set Up an AWS Glue DataBrew Job

Set Up a Glue DataBrew Project

Set Up a Glue DataBrew Job

4. Automate Data Processing

Create a Lambda Function

Configure the Lambda Function

Test the Automation

5. Conclusion

Clean Up

Wrap Up
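For readers who prefer scripting to the console, the dataset and job setup in task 3 could be approximated with boto3. The sketch below is only a sketch: every resource name, the role ARN, and the output prefix are placeholders, and it assumes the DataBrew project and its recipe already exist. It builds the request parameters for the `create_dataset` and `create_recipe_job` calls and sends them through a client passed in by the caller.

```python
def dataset_request(name, bucket, key):
    """Parameters for create_dataset: point a DataBrew dataset at an S3 object."""
    return {
        "Name": name,
        "Input": {"S3InputDefinition": {"Bucket": bucket, "Key": key}},
    }


def recipe_job_request(name, role_arn, project_name, bucket, output_prefix):
    """Parameters for create_recipe_job: run a project's recipe, write CSV to S3."""
    return {
        "Name": name,
        "RoleArn": role_arn,
        "ProjectName": project_name,
        "Outputs": [
            {"Location": {"Bucket": bucket, "Key": output_prefix}, "Format": "CSV"}
        ],
    }


def create_resources(databrew, dataset_kwargs, job_kwargs):
    """Create the dataset and the job; `databrew` is a boto3 DataBrew client."""
    databrew.create_dataset(**dataset_kwargs)
    databrew.create_recipe_job(**job_kwargs)
    return dataset_kwargs["Name"], job_kwargs["Name"]
```

With real credentials, the client would come from `boto3.client("databrew")`. In practice the project is created between these two calls, since the job refers to it by name.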
