AWS Glue DataBrew is a fully managed, visual data preparation service from Amazon Web Services (AWS). You can use it to clean, transform, and prepare large volumes of data for analytics and machine learning. Because it lets you build these transformations through a visual interface instead of writing code, it is accessible to both technical and non-technical users.
In this Cloud Lab, you’ll create an S3 bucket and upload the data to be processed. Next, you’ll create execution roles that grant AWS Glue DataBrew and AWS Lambda the permissions they need. You’ll then create a dataset in AWS Glue DataBrew that connects to the S3 bucket, along with an AWS Glue DataBrew project in which you define a recipe: the sequence of data processing steps to perform on the dataset. After that, you’ll create a job that runs the project’s recipe and saves the results to the S3 bucket. Finally, you’ll automate the whole process with a Lambda function that is triggered every time a new file is uploaded to the S3 bucket and starts the job.
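As a rough sketch of that last step, the Python handler below reads the bucket and object key from each S3 event record and starts the DataBrew job through boto3’s `start_job_run` call. It assumes the job already exists and that its name is supplied through a hypothetical `DATABREW_JOB_NAME` environment variable; `INPUT_PREFIX` is an equally hypothetical optional filter. The Lambda execution role would also need permission for the `databrew:StartJobRun` action. This is an illustration under those assumptions, not the lab’s exact function.

```python
import json
import os
import urllib.parse

# Hypothetical configuration: the DataBrew job name and an optional key prefix
# to watch are read from environment variables set on the Lambda function.
JOB_NAME = os.environ.get("DATABREW_JOB_NAME", "my-databrew-job")
INPUT_PREFIX = os.environ.get("INPUT_PREFIX", "")


def get_databrew_client():
    # boto3 is included in the AWS Lambda Python runtime; it is imported lazily
    # so the handler logic can also be exercised locally with a stub client.
    import boto3
    return boto3.client("databrew")


def uploaded_objects(event):
    """Return (bucket, key) pairs for S3 records whose key starts with INPUT_PREFIX."""
    objects = []
    for record in event.get("Records", []):
        if record.get("eventSource") != "aws:s3":
            continue
        bucket = record["s3"]["bucket"]["name"]
        # S3 delivers object keys URL-encoded (for example, spaces arrive as '+').
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        if key.startswith(INPUT_PREFIX):
            objects.append((bucket, key))
    return objects


def lambda_handler(event, context, client=None):
    objects = uploaded_objects(event)
    if not objects:
        # Nothing relevant was uploaded, so the job is not started.
        return {"statusCode": 200, "body": json.dumps({"started": False})}

    databrew = client or get_databrew_client()
    # The job is already tied to its dataset and recipe, so only its name is passed.
    response = databrew.start_job_run(Name=JOB_NAME)
    return {
        "statusCode": 200,
        "body": json.dumps({
            "started": True,
            "runId": response["RunId"],
            "objects": [f"s3://{bucket}/{key}" for bucket, key in objects],
        }),
    }
```

If the job writes its results back to the same bucket, restricting the S3 trigger (or `INPUT_PREFIX`) to an input prefix keeps the job’s own output files from invoking the function again in a loop.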
The following is the high-level architecture diagram of the infrastructure you’ll create in this Cloud Lab: