AWS Lake Formation
Explore how AWS Lake Formation can be used to set up a data lake.
Announced in 2018 and made generally available in 2019, AWS Lake Formation is designed to help teams set up a data lake on AWS more quickly (“in days instead of months”).
Amazon S3 serves as the central storage location for the data lake and is the destination into which AWS Lake Formation ingests data. Query engines such as Amazon Athena can then be used to generate insights from the data lake.
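As a rough illustration of the querying step, the sketch below builds the parameters for an Athena query and submits it with boto3. The database name, table name, and results bucket used in the usage note are hypothetical placeholders, not values from this walkthrough, and the helper function names are our own.

```python
def build_athena_query_request(database, table, output_location, limit=10):
    """Build keyword arguments for Athena's StartQueryExecution call.

    All names passed in are assumptions supplied by the caller; nothing
    here is specific to a particular data lake.
    """
    return {
        "QueryString": f'SELECT * FROM "{table}" LIMIT {int(limit)}',
        "QueryExecutionContext": {"Database": database},
        # Athena writes query results to this S3 location.
        "ResultConfiguration": {"OutputLocation": output_location},
    }


def run_query(request):
    """Submit the query; requires boto3 and valid AWS credentials."""
    import boto3  # imported lazily so the builder works without boto3

    athena = boto3.client("athena")
    response = athena.start_query_execution(**request)
    # The query runs asynchronously; this ID is used to poll for results.
    return response["QueryExecutionId"]
```

A call such as run_query(build_athena_query_request("sales_lake", "orders", "s3://example-athena-results/")) would start the query, assuming those hypothetical resources exist.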
Lake Formation has additional features to clean and classify the ingested data and to set up security permissions for accessing the data.
Note: The vast majority of Amazon S3 use cases don’t require AWS Lake Formation. This is because S3 already provides many ways of ingesting data into buckets, including APIs that can be called on a schedule or whenever new data arrives.
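For comparison, here is a minimal sketch of ingesting a file into S3 directly through the API, without Lake Formation. The bucket name, prefix, and date-partitioned key layout are assumptions chosen for illustration; any key scheme would work.

```python
from datetime import date


def build_object_key(prefix, filename, day):
    """Return a date-partitioned S3 key (a hypothetical layout)."""
    return f"{prefix.strip('/')}/dt={day.isoformat()}/{filename}"


def upload_csv(local_path, bucket, key):
    """Upload one file to S3; requires boto3 and valid AWS credentials."""
    import boto3  # imported lazily so the key builder works without boto3

    s3 = boto3.client("s3")
    s3.upload_file(local_path, bucket, key)
```

For example, upload_csv("orders.csv", "example-data-lake-bucket", build_object_key("raw/orders", "orders.csv", date.today())) could be run from a scheduled job each time a new file is produced.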
Creating a data lake
In practice, it’s not a quick process to set up a data lake using AWS Lake Formation. There are many required configurations, and it’s not always obvious which configurations best fit a project’s needs. This complexity may be the reason why, instead of using AWS Lake Formation, many teams find it faster to aggregate data into S3 or data warehouses using alternative tools, including custom code.
For the purpose of demonstrating AWS Lake Formation, let’s walk through an example of setting up an S3 data lake with data in a CSV file.
Set up an IAM user
First, we need to designate an AWS Identity and Access Management (IAM) user to be the administrator for the data lake. AWS Lake Formation doesn’t allow the AWS root user to be this administrator and will generate error messages if we try to access the lake as the root user.
To proceed, we can designate an existing IAM administrator to be the Lake Formation administrator. For example, our AWS account already has a user “xke” with administrator permissions. We confirm this by going to the “Users” page of the IAM console and clicking the username.
Because the AWS managed policy “AdministratorAccess” is broad, it includes all the permissions that we need.
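The same check and designation can also be sketched programmatically with boto3. This is only an illustration under stated assumptions: the helper names are our own, the account ID in the usage note is a placeholder, and the policy check covers only policies attached directly to the user (not those inherited through groups).

```python
ADMIN_POLICY_ARN = "arn:aws:iam::aws:policy/AdministratorAccess"


def has_administrator_access(attached_policies):
    """True if the AWS managed AdministratorAccess policy is in the list."""
    return any(p.get("PolicyArn") == ADMIN_POLICY_ARN for p in attached_policies)


def fetch_attached_policies(user_name):
    """List policies attached directly to an IAM user (needs boto3)."""
    import boto3  # imported lazily so the pure helpers work without boto3

    iam = boto3.client("iam")
    paginator = iam.get_paginator("list_attached_user_policies")
    policies = []
    for page in paginator.paginate(UserName=user_name):
        policies.extend(page["AttachedPolicies"])
    return policies


def data_lake_admin_settings(user_arn):
    """Build a DataLakeSettings payload naming one administrator."""
    return {"DataLakeAdmins": [{"DataLakePrincipalIdentifier": user_arn}]}


def set_data_lake_admin(user_arn):
    """Designate the administrator (needs boto3 and AWS credentials).

    Caution: PutDataLakeSettings replaces the existing settings, so in a
    real account one would typically start from get_data_lake_settings.
    """
    import boto3

    lakeformation = boto3.client("lakeformation")
    lakeformation.put_data_lake_settings(
        DataLakeSettings=data_lake_admin_settings(user_arn)
    )
```

With a placeholder account ID, has_administrator_access(fetch_attached_policies("xke")) would confirm the policy, and set_data_lake_admin("arn:aws:iam::123456789012:user/xke") would register the user as the data lake administrator.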
We go to the AWS Lake Formation area of the AWS Console. On the left side menu in the “Permissions” section, we open the page for “Administrative ...