Amazon Redshift Spectrum

Amazon Redshift Spectrum lets analysts use Amazon Redshift to query data stored in Amazon S3, either on its own or together with data already in Redshift tables. Launched in 2017, Redshift Spectrum uses the same SQL as Amazon Redshift, so we can query and analyze data in S3 without first loading it into a Redshift data warehouse.

It makes sense to try Redshift Spectrum if we already have an Amazon Redshift data warehouse and want to query data that already resides in S3. If our data is only in S3 (and not in Redshift), we can use Amazon Athena instead. If our data is only in Redshift (and not in S3), we can submit SQL queries through the standard Redshift Query Editor without any additional setup.
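The rule of thumb above amounts to a small lookup on where the data lives. The sketch below only restates this lesson's guidance in code; the function name `choose_query_tool` is hypothetical, and the mapping is a simplification rather than an official AWS decision guide.

```python
def choose_query_tool(data_in_s3: bool, data_in_redshift: bool) -> str:
    """Return the tool this lesson suggests, based on where the data lives.

    Hypothetical helper that mirrors the lesson's rule of thumb; it is not
    an official AWS decision guide.
    """
    if data_in_s3 and data_in_redshift:
        # Existing Redshift warehouse plus data in S3: query S3 from Redshift.
        return "Redshift Spectrum"
    if data_in_s3:
        # Data only in S3: Athena queries S3 without a warehouse.
        return "Amazon Athena"
    if data_in_redshift:
        # Data only in Redshift: plain SQL in the Redshift Query Editor.
        return "Redshift Query Editor"
    raise ValueError("at least one data location must be True")


if __name__ == "__main__":
    for s3, rs in [(True, True), (True, False), (False, True)]:
        print(f"S3={s3}, Redshift={rs} -> {choose_query_tool(s3, rs)}")
```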

Using Redshift Spectrum

Redshift Spectrum doesn’t have its own area within the AWS Console. Rather, it’s a set of features within Amazon Redshift.

Let’s assume we already have an Amazon Redshift data warehouse. If you don’t have one, please visit the Amazon Redshift lesson for guidance on setting up Redshift Serverless. The Serverless option is relatively cost-effective, and we’ll use it for our example.

Note: Our sample S3 data resides in the “us-east-1” AWS region. Redshift Spectrum requires the S3 bucket and the Redshift data warehouse to be in the same region, so our Redshift setup must also be in “us-east-1.” While not covered in this lesson, it’s possible to copy S3 data between regions or to load the S3 data into Redshift (using non-Spectrum features).
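Before wiring anything up, it can help to confirm that the bucket and the Redshift deployment share a region. The sketch below assumes a boto3 S3 client, whose `get_bucket_location` call reports a `LocationConstraint` of `None` for buckets in “us-east-1” (and, for some legacy buckets, `"EU"` for eu-west-1). The helper names `bucket_region` and `check_same_region`, and the bucket name in the usage comment, are hypothetical.

```python
# Legacy buckets may report "EU" instead of "eu-west-1".
_LEGACY_REGIONS = {"EU": "eu-west-1"}


def bucket_region(s3_client, bucket: str) -> str:
    """Return the AWS region of an S3 bucket.

    s3_client is assumed to be a boto3 S3 client. A None LocationConstraint
    means the bucket lives in us-east-1.
    """
    response = s3_client.get_bucket_location(Bucket=bucket)
    constraint = response.get("LocationConstraint") or "us-east-1"
    return _LEGACY_REGIONS.get(constraint, constraint)


def check_same_region(s3_client, bucket: str, redshift_region: str) -> None:
    """Raise RuntimeError if the bucket and Redshift are in different regions."""
    region = bucket_region(s3_client, bucket)
    if region != redshift_region:
        raise RuntimeError(
            f"Bucket {bucket!r} is in {region}, but Redshift is in "
            f"{redshift_region}; Spectrum needs both in the same region."
        )


# Usage (requires AWS credentials; "my-sample-bucket" is hypothetical):
# import boto3
# check_same_region(boto3.client("s3"), "my-sample-bucket", "us-east-1")
```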

Creating an IAM role for Redshift

The purpose of Redshift Spectrum is to access data that’s already in Amazon S3 without loading it into the Redshift data warehouse. To do this, Redshift needs permission to read that data, so we’ll set up an IAM role that Redshift can assume to obtain the necessary data access permissions.
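An IAM role has two parts: a trust policy saying who may assume the role, and permission policies saying what the role may do. The sketch below builds only the trust policy, assuming the standard Amazon Redshift service principal `redshift.amazonaws.com`; check the Redshift Serverless documentation for whether your workgroup needs additional principals. The function name `redshift_trust_policy` is hypothetical.

```python
import json


def redshift_trust_policy(services=("redshift.amazonaws.com",)) -> dict:
    """Build an IAM trust policy that lets the given AWS services assume a role.

    The default principal is the Amazon Redshift service; pass extra service
    principals if your setup requires them (an assumption to verify).
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # Which AWS service(s) may assume this role.
                "Principal": {"Service": list(services)},
                # The action that lets the service obtain temporary credentials.
                "Action": "sts:AssumeRole",
            }
        ],
    }


if __name__ == "__main__":
    # Print the JSON document as it would be pasted into the IAM console.
    print(json.dumps(redshift_trust_policy(), indent=2))
```

The permission side (read access to the S3 data, plus access to a data catalog) is attached separately, as managed or custom policies, once the role exists.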

Navigate to the “Identity and Access Management (IAM)” section of the AWS Console, and select the “Roles” page. Click the “Create role” button on the page.
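The console steps have a programmatic equivalent in the IAM API. The sketch below assumes a boto3 IAM client and uses its `create_role` and `attach_role_policy` calls; the helper name `create_spectrum_role` and the role name in the usage comment are hypothetical. The AWS managed policies `AmazonS3ReadOnlyAccess` and `AWSGlueConsoleFullAccess` are one common choice for Spectrum, but a narrower custom policy is often preferable.

```python
import json


def create_spectrum_role(iam, role_name, trust_policy, policy_arns):
    """Create an IAM role, attach the given policies, and return the role ARN.

    `iam` is assumed to be a boto3 IAM client, e.g. boto3.client("iam").
    `trust_policy` is a dict such as an sts:AssumeRole policy for Redshift.
    """
    response = iam.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps(trust_policy),
        Description="Lets Amazon Redshift read S3 data for Redshift Spectrum",
    )
    # Attach each permission policy to the newly created role.
    for arn in policy_arns:
        iam.attach_role_policy(RoleName=role_name, PolicyArn=arn)
    return response["Role"]["Arn"]


# Usage (requires AWS credentials; "MySpectrumRole" is hypothetical):
# import boto3
# role_arn = create_spectrum_role(
#     boto3.client("iam"),
#     "MySpectrumRole",
#     {"Version": "2012-10-17",
#      "Statement": [{"Effect": "Allow",
#                     "Principal": {"Service": "redshift.amazonaws.com"},
#                     "Action": "sts:AssumeRole"}]},
#     ["arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess",
#      "arn:aws:iam::aws:policy/AWSGlueConsoleFullAccess"],
# )
```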
