...

Loading the Data

Get an introduction to another dataset and load the related data.

We'll cover the following...

Importing the dataset
Querying the data
The JSON resource
Try it yourself

We’re going to look at the LastFm dataset, the official song tag, and the song similarity dataset of the Million Song Dataset (MSD). Quoting from the link above:

“The MSD team is proud to partner with Last.fm in order to bring you the largest research collection of song-level tags and precomputed song-level similarity. All the data is associated with MSD tracks, which makes it easy to link it to other MSD resources: audio features, artist data, lyrics, etc.”

Importing the dataset

First, we need to import this dataset into a PostgreSQL database. The dataset is offered both as a SQLite database and a JSON file. Loading the SQLite database is easy thanks to pgloader:

$ curl -L -o /tmp/lastfm_tags.db
       http://labrosa.ee.columbia.edu/millionsong/sites/default/files/lastfm/lastfm_tags.db

$ pgloader /tmp/lastfm_tags.db pgsql://appdev@localhost/appdev

We get the following output, meaning the data is now available in our PostgreSQL database for further indexing:

             table name     errors       read   imported      bytes      total time
-----------------------  ---------  ---------  ---------  ---------  --------------
                  fetch          0          0          0                     0.000s
        fetch meta data          0          8          8                     0.028s
         Create Schemas          0          0          0                     0.000s
       Create SQL Types          0          0          0                     0.006s
          Create tables          0          6          6                     0.031s
         Set Table OIDs          0          3          3

...