

Overview of Dataset

Become familiar with the dataset used in this course.

Structure of product dataset

First, let’s take a look at the metadata of the Amazon “Toys and Games” dataset:

{
  "asin": "0000031852",
  "title": "Girls Ballet Tutu Zebra Hot Pink",
  "feature": ["Botiquecutie Trademark exclusive Brand",
              "Hot Pink Layered Zebra Print Tutu"],
  "description": "This tutu is great ...",
  "price": 3.17,
  "imageURL": "http://...",
  "imageURLHighRes": "http://...",
  "also_buy": ["B00JHONN1S", "B002BZX8Z6", "..."],
  "salesRank": {"Toys and Games": 211836},
  "brand": "Coxlures",
  "categories": ["Sports & Outdoors",
                 "Other Sports"]
}
Structure of product dataset in JSON format

Here is the explanation of the metadata of the “Toys and Games” dataset.

| Column Name     | Column Detail                                                                    |
|-----------------|----------------------------------------------------------------------------------|
| asin            | ID of the product, for example, 0000031852                                       |
| title           | Name of the product                                                              |
| feature         | Features of the product in bullet point format                                   |
| description     | Description of the product                                                       |
| price           | Price in US dollars (at the time of crawl)                                       |
| imageURL        | URL of the product image                                                         |
| imageURLHighRes | URL of the high-resolution product image                                         |
| related         | Related products (also bought, also viewed, bought together, buy after viewing)  |
| salesRank       | Sales rank information                                                           |
| brand           | Brand name                                                                       |
| categories      | List of categories the product belongs to                                        |
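
To make the field descriptions concrete, here is a minimal Python sketch that parses a single product record and reads a few of its fields. The record is a trimmed copy of the sample shown earlier, and the helper name `summarize_product` is hypothetical (it is not part of the dataset or the course code). It assumes that a field may be absent from a record, so it reads fields with `dict.get`.

```python
import json

# Trimmed copy of the sample product record shown earlier (illustrative only).
raw_record = """
{
  "asin": "0000031852",
  "title": "Girls Ballet Tutu Zebra Hot Pink",
  "price": 3.17,
  "salesRank": {"Toys and Games": 211836},
  "brand": "Coxlures",
  "categories": ["Sports & Outdoors", "Other Sports"]
}
"""


def summarize_product(record: dict) -> dict:
    """Hypothetical helper: pick a few fields, using None when one is missing."""
    return {
        "asin": record.get("asin"),
        "title": record.get("title"),
        "price": record.get("price"),
        "brand": record.get("brand"),
        # Count the categories; an absent field counts as zero.
        "n_categories": len(record.get("categories", [])),
    }


product = json.loads(raw_record)
summary = summarize_product(product)
print(summary)
```

Reading fields with `get` rather than direct indexing is a defensive choice here: the sketch then keeps working on records that lack, say, a price.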

Structure of review dataset

We’ll use the Amazon review dataset on “Toys and Games”. Its details can be found under the “Amazon Review Data (2018)” lesson in the Appendix.

{
  "image": ["https://..."],
  "overall": 5.0,
  "vote": "2",
  "verified": true,
  "reviewTime": "01 1, 2018",
  "reviewerID": "AUI6WTTT0QZYS",
  "asin": "5120053084",
  "style": {
    "Size:": "Large",
    "Color:": "Charcoal"
  },
  "reviewerName": "Abbey",
  "reviewText": "I now have 4 ... ",
  "summary": "Comfy, flattering, ...!",
  "unixReviewTime": 1514764800
}
Structure of review dataset in JSON format

Below is the explanation of the “Toys and Games” review dataset:

| Column Name    | Column Detail                                                  |
|----------------|----------------------------------------------------------------|
| image          | Images that users post after they have received the product   |
| overall        | Rating of the product                                          |
| vote           | Helpful votes of the review                                    |
| reviewTime     | Time of the review (raw)                                       |
| reviewerID     | ID of the reviewer, e.g., AUI6WTTT0QZYS                        |
| asin           | ID of the product, e.g., 5120053084                            |
| style          | A dictionary of the product metadata, e.g., “Size” is “Large”  |
| reviewerName   | Name of the reviewer                                           |
| reviewText     | Text of the review                                             |
| summary        | Summary of the review                                          |
| unixReviewTime | Time of the review (unix time)                                 |
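
The sample review stores `vote` as a string and gives the review time twice, once as raw text and once as a Unix timestamp. The following Python sketch shows one way to turn those fields into typed values. The helper name `normalize_review` is hypothetical, and two things are assumptions rather than documented behavior: that a missing `vote` means zero helpful votes, and that larger vote counts may contain thousands separators.

```python
import json
from datetime import datetime, timezone

# Trimmed copy of the sample review record shown earlier (illustrative only).
raw_review = """
{
  "overall": 5.0,
  "vote": "2",
  "verified": true,
  "reviewTime": "01 1, 2018",
  "reviewerID": "AUI6WTTT0QZYS",
  "asin": "5120053084",
  "unixReviewTime": 1514764800
}
"""


def normalize_review(review: dict) -> dict:
    """Hypothetical helper: convert string and timestamp fields to typed values."""
    # Assumption: an absent "vote" means zero votes, and counts may
    # contain thousands separators such as "1,234".
    vote = int(str(review.get("vote", "0")).replace(",", ""))
    # Interpret the Unix timestamp as seconds since the epoch, in UTC.
    reviewed_at = datetime.fromtimestamp(review["unixReviewTime"], tz=timezone.utc)
    return {
        "asin": review["asin"],
        "reviewerID": review["reviewerID"],
        "rating": float(review["overall"]),
        "vote": vote,
        "reviewed_at": reviewed_at,
    }


normalized = normalize_review(json.loads(raw_review))
print(normalized["vote"], normalized["reviewed_at"].isoformat())
```

For this record, the timestamp 1514764800 corresponds to midnight UTC on January 1, 2018, which agrees with the raw `reviewTime` value "01 1, 2018".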

Extract the dataset

The dataset is downloaded as a gzip-compressed file. Decompress it with the following command:

gzip -d Toys_and_Games_5.json.gz

Try the command in the terminal

  • Copy the above command into the terminal and execute it.
  • Run the ls command afterward to check that the extracted file is present.
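
As an alternative to decompressing the file first, Python's standard `gzip` module can read the compressed file directly. The sketch below assumes the file stores one JSON object per line, which is an assumption about the file layout rather than something stated in this lesson. The function name `read_json_lines` is hypothetical, and the demo runs on a tiny made-up stand-in file instead of the real Toys_and_Games_5.json.gz.

```python
import gzip
import json
import os
import tempfile


def read_json_lines(path: str):
    """Hypothetical helper: yield one dict per non-empty line.

    Assumes one JSON object per line. Files ending in .gz are read
    through gzip; anything else is opened as plain text.
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


# Build a tiny stand-in file so the sketch runs without the real dataset.
# These rows are made up for the demo and are not real review data.
sample_rows = [
    {"asin": "5120053084", "overall": 5.0},
    {"asin": "0000031852", "overall": 4.0},
]
demo_path = os.path.join(tempfile.mkdtemp(), "sample_reviews.json.gz")
with gzip.open(demo_path, "wt", encoding="utf-8") as handle:
    for row in sample_rows:
        handle.write(json.dumps(row) + "\n")

reviews = list(read_json_lines(demo_path))
print(len(reviews), reviews[0]["asin"])
```

With the real dataset, the call would be `read_json_lines("Toys_and_Games_5.json.gz")`, or the same call with the extracted `.json` file name. Because the function is a generator, it does not need to hold the whole file in memory at once.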