Why Sequence Data Analysis?

Before discussing sequence data analysis, we’ll first compare sequence data with aggregate data to see the advantages and disadvantages of using sequence data in our analysis. We’ll then introduce the dataset used in this chapter and the data formats that the chapter’s algorithms require. Each algorithm expects its input in a different format, so we’ll need to develop a script that transforms the data into the format each algorithm accepts.

Why does sequence data take more space?

Sequence data usually takes up more space than aggregated data because it records every individual action a player took, in chronological order, rather than a single total count for each type of action. As a result, working with sequence data typically means working with large datasets.

Handling large, complex data is an active area of research that’s still in its infancy, so we expect more resources for handling such data to become available as the field grows. We will not discuss methods for handling big distributed data in this course. Interested learners should consult other resources on big data storage and processing.
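
To make the size difference concrete, here is a minimal sketch in Python. The action names and the event log are hypothetical and are not taken from the dataset used in this chapter. They only illustrate how a sequence stores one entry per action, while an aggregate stores one count per action type.

```python
from collections import Counter

# Hypothetical event log for a single player, in chronological order.
# The action names are illustrative assumptions, not the chapter's dataset.
sequence = ["move", "attack", "move", "use_ability", "attack", "attack", "move"]

# Aggregating keeps one count per action type and discards the ordering.
aggregate = Counter(sequence)

print(len(sequence))    # number of stored entries in the sequence form
print(len(aggregate))   # number of stored entries in the aggregate form
print(dict(aggregate))  # per-action totals
```

The sequence grows with every action the player takes, whereas the aggregate grows only when a new type of action appears. The aggregate is smaller, but it can no longer tell us in what order the actions happened.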

Understanding sequence data

To help us understand the differences between sequence data and aggregate data, we’ll use an example of data collected from the multiplayer game Dota 2, shown in the table below.
