Loading the Data
Get an introduction to another dataset and load the related data.
We'll cover the following...
We’re going to look at the LastFm dataset, the official song tag, and the song similarity dataset of the Million Song Dataset (MSD). Quoting from the link above:
“The MSD team is proud to partner with Last.fm in order to bring you the largest research collection of song-level tags and precomputed song-level similarity. All the data is associated with MSD tracks, which makes it easy to link it to other MSD resources: audio features, artist data, lyrics, etc.”
Importing the dataset
First, we need to import this dataset into a PostgreSQL database. The dataset is offered both as a SQLite database and a JSON file. Loading the SQLite database is easy thanks to pgloader:
$ curl -L -o /tmp/lastfm_tags.db
http://labrosa.ee.columbia.edu/millionsong/sites/default/files/lastfm/lastfm_tags.db
$ pgloader /tmp/lastfm_tags.db pgsql://appdev@localhost/appdev
We get the following output, meaning the data is now available in our PostgreSQL database for further indexing:
table name errors read imported bytes total time
----------------------- --------- --------- --------- --------- --------------
fetch 0 0 0 0.000s
fetch meta data 0 8 8 0.028s
Create Schemas 0 0 0 0.000s
Create SQL Types 0 0 0 0.006s
Create tables 0 6 6 0.031s
Set Table OIDs 0 3 3
...