Analytics on AWS

Get introduced to the course and learn about data, data analysis, and Amazon Web Services (AWS).

Amazon Web Services (AWS) has a suite of analytics services designed to enable organizations of all sizes and industries to reinvent their business with data.

This course describes these services according to the functionality that each service provides and includes interactive opportunities to learn more.

This course is designed for intermediate-level engineers and data analysts interested in learning more about AWS services. Takeaway skills for this course include:

  • An understanding of the range of data analytics tools available through AWS.

  • Hands-on experience with data lakes and how to architect them on AWS.

  • Hands-on experience with AWS tools for analytics and insights, including Amazon QuickSight, Amazon SageMaker, AWS AI services, and Amazon Athena.

  • Familiarity with the various places that data can come from.

  • Working knowledge of different ways in which data can be migrated and AWS tools that facilitate this process.

With these skills, you can decide, for a specific use case, whether data analytics on AWS is the "fastest way to get answers from all your data to all your users."

What is data?

In its broadest sense, data is any information that can be stored or utilized for any future purpose. Data can be in the form of numbers, also known as statistics. Data can also be nonnumerical and can take any form of representation—for example, written text from customers.

While data is certainly not a new concept, the development of computing capabilities has made it increasingly common to store big data (large amounts of data) in digital form.

Consider the email inbox. The expanding availability of the internet starting in the 1990s promoted the use of electronic mail in place of physical letters. The initial storage space available for web-based email was very limited (i.e., a few MB), so people downloaded emails onto their computers. When Google first announced Gmail in 2004, the idea of a web-based email service with 1 GB (1000 MB) of free storage seemed amazing. Today, at least 15 GB of free email storage is common.

Amazon.com was founded in 1994, during the earlier days of the internet. Since then, each of its more than 300 million e-commerce customers has generated a history of online shopping purchases. The total amount of data that Amazon.com stores is massive and is estimated to be over 1,000,000,000 GB (more than 1 exabyte).

Big Data Sizes

1,000,000,000 gigabytes (GB) stored by Amazon.com

= 1,000,000 terabytes (TB)

= 1,000 petabytes (PB)

= 1 exabyte (EB)

To get a better sense of how data sizes scale from smaller to larger units, take a number of gigabytes (e.g., 15 GB) and convert it into megabytes, terabytes, and petabytes. The conversion for 1 GB is:

Data Sizes: Converting 1 GB

1 gigabyte (GB)

= 1,000 megabytes (MB)

= 0.001 terabytes (TB)

= 0.000001 petabytes (PB)

Note: These conversions use decimal units, the current standard from the International Electrotechnical Commission (IEC).
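The same conversions can be sketched in a few lines of Python. This is only an illustration, assuming decimal units in which each step up (MB, GB, TB, PB, EB) is a factor of 1,000; the names BYTES_PER_UNIT, convert, and from_gigabytes are hypothetical and are not part of any AWS tool.

```python
# Illustrative sketch of decimal data-size conversion.
# All names here (BYTES_PER_UNIT, convert, from_gigabytes) are hypothetical.

# Number of bytes in one of each unit, using decimal (powers of 1,000) prefixes.
BYTES_PER_UNIT = {
    "MB": 10**6,
    "GB": 10**9,
    "TB": 10**12,
    "PB": 10**15,
    "EB": 10**18,
}


def convert(value, from_unit, to_unit):
    """Convert a data size from one decimal unit to another."""
    # Scale to bytes first, then divide by the size of the target unit.
    return value * BYTES_PER_UNIT[from_unit] / BYTES_PER_UNIT[to_unit]


def from_gigabytes(gigabytes):
    """Return the given number of gigabytes expressed in MB, TB, and PB."""
    return {unit: convert(gigabytes, "GB", unit) for unit in ("MB", "TB", "PB")}


# 15 GB of email storage expressed in the other units.
print(from_gigabytes(15))

# The 1,000,000,000 GB estimate from the text, expressed in exabytes.
print(convert(1_000_000_000, "GB", "EB"))  # prints 1.0
```

Working in bytes as the common base keeps the table of factors small: adding another unit only requires one new entry rather than a conversion rule for every pair of units.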

What is data analytics?

In work and in life, we frequently need to analyze the data that we encounter. Looking at the weather forecast, we might decide to pack an umbrella or a poncho for tomorrow. Noticing that temperatures have been rising over the years, we might take steps to mitigate climate change.

In this course, we define the term data analytics more broadly than in other contexts. We assume that people who wish to analyze data might become interested in using any of the tools available today that can help them discover, interpret, and communicate insights. Some terms related to the concept of data analytics are:

  • Data engineering: refers to the idea that engineering and coding skills are often required to set up systems that enable data to be effectively analyzed.

  • Data science: refers to the scientific rigor (e.g., math and statistics) that can be required to discover insights from data (e.g., to make predictions about the future from existing data).

The main reason our definition of data analytics is broad is that the tools we’ll be describing in this course encompass data engineering and data science as well. In this course, we’re much less concerned with specific job roles than we are with how various services can help us gain insights from data.

Use cases for data analytics within businesses are plentiful. For example, in the business of sports, analytics has become an important part of how teams can increase their chances of winning. This idea was popularized in the movie Moneyball (2011), featuring Brad Pitt in the role of Billy Beane, the general manager who fielded a highly competitive baseball team on a relatively low budget. According to the movie and the book it is based on (Moneyball: The Art of Winning an Unfair Game), a nontraditional and data-driven approach to evaluating baseball players enabled the 2002 Oakland Athletics team to deliver results that dramatically exceeded expectations.

What is AWS?

AWS is a division of Amazon.com that provides developers with tools that can be used to create and manage web-based applications, including applications that involve data of all types and sizes.

While Amazon started as an online shopping site (initially for buying paper books) in 1994, its Web Services offering for developers became more profitable for the company than any other offering. Many businesses use AWS under the hood, including Netflix and Apple. Amazon’s consumer-facing shopping sites also run on AWS, having migrated from previous computing infrastructure before 2010.

AWS competes in the cloud computing market with Microsoft Azure, Google Cloud Platform (GCP), and others. Amazon founder Jeff Bezos compared AWS and cloud computing to electricity grids. He said that in the early 1900s, factories needing electricity would build their own electric power plants. However, when electric power grids came online, factories could buy power more efficiently from the grid.

From our perspective, using AWS effectively is more challenging than turning on a light bulb or buying something we like from Amazon.com.

This course serves as a guide to the data analytics services available on AWS. It’s structured based on an AWS recommendation for architecting systems that can derive insights from data, as shown in the figure below.