Amazon Athena

Understand the basics of Amazon Athena and how to integrate it with AWS Glue to query data in Amazon S3.

Amazon Athena is a SQL query service for data stored in Amazon S3. Launched in 2016, Athena is based on Presto, an open-source SQL query engine.

Athena doesn’t require loading data outside of S3, though there is some schema setup required to properly query the data (similar to Redshift Spectrum).

Using Amazon Athena with AWS Glue

One of the faster ways to use Amazon Athena is through its integration with AWS Glue. Specifically, Athena can query databases and tables that have schemas (metadata definitions) stored in the AWS Glue Data Catalog.

Note: If you already have tables in the AWS Glue Data Catalog, you can jump ahead to the section Opening Athena from AWS Glue.

Setting up AWS Glue in our S3 account

Below is our dwarf_activities.csv example file that we’ll upload to our S3 account.

Press + to interact
full_name,current_activity,height_in_cm
Grumpy Dwarf,watching television,89
Sleepy Dwarf,sleeping in a hammock,82
Sneezy Dwarf,drinking soup,84
Doc Dwarf,eating an apple,77
Bashful Dwarf,hiding at home,81
Dopey Dwarf,reading a magazine,79
Happy Dwarf,walking outside,88

We want to upload this CSV file to a uniquely named Amazon S3 bucket.

Our AWS account has a bucket “demo-s3-data-lake-bucket.” Within this bucket, we create a folder “dwarf_activities.” We upload the dwarf_activities.csv file into this folder.

Press + to interact
S3 folder where we uploaded the example CSV file
S3 folder where we uploaded the example CSV file

We then navigate to the “AWS Glue” area of the AWS Console and click the “Crawlers” page (under the “Data Catalog” section of the left-side menu). Click the “Create crawler” button to create a new crawler for a data source.

Press + to interact
The AWS Glue page with a button to create a new crawler for a data source
The AWS Glue page with a button to create a new crawler for a data source

Here are our example crawler configurations:

  • Step 1: Set crawler properties.

    • Name: “dwarf_activities”

  • Step 2: ...