Overview of Dataset
Become familiar with the dataset used in this course.
Structure of product dataset
First, let’s take a look at the metadata of the Amazon “Toys and Games” dataset:
{
  "asin": "0000031852",
  "title": "Girls Ballet Tutu Zebra Hot Pink",
  "feature": ["Botiquecutie Trademark exclusive Brand", "Hot Pink Layered Zebra Print Tutu"],
  "description": "This tutu is great ...",
  "price": 3.17,
  "imageURL": "http://...",
  "imageURLHighRes": "http://...",
  "also_buy": ["B00JHONN1S", "B002BZX8Z6", "..."],
  "salesRank": {"Toys and Games": 211836},
  "brand": "Coxlures",
  "categories": [["Sports & Outdoors", "Other Sports", "Dance"]]
}
Structure of product dataset in JSON format
The following table explains each field in the metadata of the “Toys and Games” dataset.
Column Name | Column Detail |
---|---|
asin | ID of the product, for example, 0000031852 |
title | Name of the product |
feature | Features of the product in bullet point format |
description | Description of the product |
price | Price in US dollars (at the time of crawl) |
imageURL | URL of the product image |
imageURLHighRes | URL of the high-resolution product image |
related | Related products (also bought, also viewed, bought together, buy after viewing) |
salesRank | Sales rank information |
brand | Brand name |
categories | List of categories the product belongs to |
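To make the field layout concrete, here is a minimal Python sketch that parses one metadata record and pulls out a few of the fields described above. The record is a trimmed copy of the sample shown earlier, and the helper name `summarize_product` is hypothetical, not part of the dataset or the course code.

```python
import json

# Trimmed copy of the sample metadata record from this lesson.
raw_product = (
    '{"asin": "0000031852", "title": "Girls Ballet Tutu Zebra Hot Pink", '
    '"price": 3.17, "salesRank": {"Toys and Games": 211836}, '
    '"brand": "Coxlures", '
    '"categories": [["Sports & Outdoors", "Other Sports", "Dance"]]}'
)


def summarize_product(line):
    """Hypothetical helper: pick a few fields from one metadata record."""
    record = json.loads(line)
    # "categories" is a list of category paths; join each path for display.
    paths = [" > ".join(path) for path in record.get("categories", [])]
    return {
        "asin": record["asin"],
        "title": record.get("title"),
        "price": record.get("price"),  # assumed optional, so use .get()
        "category_paths": paths,
        "sales_rank": record.get("salesRank", {}),
    }


product = summarize_product(raw_product)
print(product["asin"], product["price"])
print(product["category_paths"])
```

Using `.get()` for everything except `asin` is a defensive choice: it assumes that some records may omit optional fields such as `price`.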
Structure of review dataset
We’ll use the Amazon review dataset on “Toys and Games”. Its details can be found under the “Amazon Review Data (2018)” lesson in the Appendix.
{
  "image": ["https://..."],
  "overall": 5.0,
  "vote": "2",
  "verified": true,
  "reviewTime": "01 1, 2018",
  "reviewerID": "AUI6WTTT0QZYS",
  "asin": "5120053084",
  "style": {"Size:": "Large", "Color:": "Charcoal"},
  "reviewerName": "Abbey",
  "reviewText": "I now have 4 ... ",
  "summary": "Comfy, flattering, ...!",
  "unixReviewTime": 1514764800
}
Structure of review dataset in JSON format
Below is the explanation of the “Toys and Games” review dataset:
Column Name | Column Detail |
---|---|
image | Images that users post after they have received the product |
overall | Rating of the product |
vote | Helpful votes of the review |
reviewTime | Time of the review (raw) |
reviewerID | ID of the reviewer, e.g., AUI6WTTT0QZYS |
asin | ID of the product, e.g., 5120053084 |
style | A dictionary of the product metadata, e.g., “Size” is “Large” |
reviewerName | Name of the reviewer |
reviewText | Text of the review |
summary | Summary of the review |
unixReviewTime | Time of the review (unix time) |
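Several review fields need light conversion before analysis: `vote` is stored as a string, `unixReviewTime` is seconds since the epoch, and the `style` keys in the sample carry a trailing colon. The sketch below shows one way to normalize them. The helper name `normalize_review` is hypothetical, and the handling of a missing `vote` or of thousands separators in it is an assumption rather than something stated in this lesson.

```python
import json
from datetime import datetime, timezone

# Trimmed copy of the sample review record from this lesson.
raw_review = (
    '{"overall": 5.0, "vote": "2", "reviewTime": "01 1, 2018", '
    '"reviewerID": "AUI6WTTT0QZYS", "asin": "5120053084", '
    '"style": {"Size:": "Large", "Color:": "Charcoal"}, '
    '"reviewerName": "Abbey", "unixReviewTime": 1514764800}'
)


def normalize_review(line):
    """Hypothetical helper: convert raw review fields to convenient types."""
    record = json.loads(line)
    # Assumption: "vote" may be absent or formatted like "1,234".
    votes = int(str(record.get("vote", "0")).replace(",", ""))
    # Interpret the unix timestamp as UTC to get a timezone-independent date.
    reviewed_at = datetime.fromtimestamp(record["unixReviewTime"], tz=timezone.utc)
    # Drop the trailing colon from style keys such as "Size:".
    style = {
        key.rstrip(":").strip(): value.strip()
        for key, value in record.get("style", {}).items()
    }
    return {
        "asin": record["asin"],
        "reviewer_id": record["reviewerID"],
        "rating": record["overall"],
        "votes": votes,
        "reviewed_on": reviewed_at.date().isoformat(),
        "style": style,
    }


review = normalize_review(raw_review)
print(review["votes"], review["reviewed_on"], review["style"])
```

For the sample record, the converted date agrees with the raw `reviewTime` value of "01 1, 2018".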
Extract the dataset
The dataset is downloaded as a gzip-compressed file. We need to decompress it using the following command:
gzip -d Toys_and_Games_5.json.gz
Try the command in the terminal
- Copy the above command into the terminal and execute it.
- Run the ls command afterward to check for the extracted file.
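Once the file is extracted, or even while it is still compressed, the records can be read from Python. The sketch below is a minimal reader that assumes the file stores one JSON object per line, which this lesson does not state explicitly. The helper name `iter_records` and the demo file are hypothetical; the demo writes a tiny stand-in file so the code runs without the real download.

```python
import gzip
import json
import os
import tempfile


def iter_records(path):
    """Yield one dict per non-empty line from a .json or .json.gz file."""
    # Assumption: the file holds one JSON object per line.
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


# Demo on a small stand-in file, not the real dataset.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo_reviews.json.gz")
    with gzip.open(demo_path, "wt", encoding="utf-8") as handle:
        handle.write('{"asin": "5120053084", "overall": 5.0}\n')
        handle.write('{"asin": "5120053084", "overall": 4.0}\n')
    demo_records = list(iter_records(demo_path))

print(len(demo_records), "records read")
```

With the real data, the same helper would be called as `iter_records("Toys_and_Games_5.json")` after running the gzip command above. Reading lazily with a generator avoids loading the whole file into memory at once.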