Finding Correlation Between the Movie Ratings

Let’s look at how to find correlations between different datasets.

We’ve generated some random data for a few movie ratings. Let’s have a look at it.

{
    'Terminator': {
        'Tom': 4.0,
        'Jack': 1.5,
        'Lisa': 3.0,
        'Sally': 2.0,
    },
    'Terminator 2': {
        'Tom': 5.0,
        'Jack': 1.0,
        'Lisa': 3.5,
        'Sally': 2.0,
    },
    'It happened one night': {
        'Tom': 3.5,
        'Jack': 3.5,
        'Tiger': 4.0,
        'Lisa': 5.0,
        'Michele': 3.0,
        'Sally': 4.0,
    },
    '27 Dresses': {
        'Tom': 3.0,
        'Jack': 3.5,
        'Tiger': 3.0,
        'Lisa': 5.0,
        'Michele': 4.0,
        'Sally': 4.0,
    },
    'Poirot': {
        'Tom': 4.0,
        'Jack': 3.0,
        'Tiger': 5.0,
        'Lisa': 4.0,
        'Michele': 3.5,
        'Sally': 3.0,
    },
    'Sherlock Holmes': {
        'Tom': 4.0,
        'Jack': 3.0,
        'Tiger': 3.5,
        'Lisa': 3.5,
        'Sally': 2.0,
    },
}

The movie data is stored as a dictionary keyed by movie title. Each title maps to a sub-dictionary that pairs each reviewer's name with the rating they gave. Let’s look at the first movie:

'Terminator': {'Tom': 4.0,
  'Jack': 1.5,
  'Lisa': 3.0,
  'Sally': 2.0},

The movie Terminator has been rated by four people: Tom has given it a score of 4.0, while Jack has given it 1.5, and so on. These numbers are random.
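As a quick illustration of this nested structure, the sketch below looks up a single rating and lists the people who rated a movie. It uses a two-movie subset of the data above; the variable names `movies`, `tom_rating`, and `reviewers` are chosen only for this example.

```python
# A two-movie subset of the ratings data shown above.
movies = {
    'Terminator': {'Tom': 4.0, 'Jack': 1.5, 'Lisa': 3.0, 'Sally': 2.0},
    'Terminator 2': {'Tom': 5.0, 'Jack': 1.0, 'Lisa': 3.5, 'Sally': 2.0},
}

# The outer key selects the movie, the inner key selects the reviewer.
tom_rating = movies['Terminator']['Tom']
print(tom_rating)  # 4.0

# The keys of the sub-dictionary are the people who rated the movie.
reviewers = sorted(movies['Terminator'])
print(reviewers)  # ['Jack', 'Lisa', 'Sally', 'Tom']
```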

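Notice that not everyone has rated every movie. For example, Tiger and Michele have not rated Terminator. We need to take this into account when calculating the correlation, because two movies can only be compared on the ratings of people who rated both.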
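One way to handle the missing ratings, sketched here as an assumption rather than as the lesson's own method, is to keep only the reviewers that two movies have in common. The helper name `common_reviewers` is hypothetical.

```python
def common_reviewers(ratings_a, ratings_b):
    """Return the sorted names of people who rated both movies."""
    return sorted(set(ratings_a) & set(ratings_b))


# Ratings copied from the data above.
terminator = {'Tom': 4.0, 'Jack': 1.5, 'Lisa': 3.0, 'Sally': 2.0}
poirot = {'Tom': 4.0, 'Jack': 3.0, 'Tiger': 5.0,
          'Lisa': 4.0, 'Michele': 3.5, 'Sally': 3.0}

shared = common_reviewers(terminator, poirot)
print(shared)  # ['Jack', 'Lisa', 'Sally', 'Tom']

# Pair up the ratings reviewer by reviewer, so both lists line up.
pairs = [(terminator[name], poirot[name]) for name in shared]
print(pairs)  # [(1.5, 3.0), (3.0, 4.0), (2.0, 3.0), (4.0, 4.0)]
```

Tiger and Michele drop out because they rated only one of the two movies, which leaves two equal-length lists of ratings to compare.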

Let’s see how we can calculate the correlation. The script begins by checking that a data file was given on the command line:

import sys

# The data file path must be given as the first command-line argument.
if len(sys.argv) < 2:
    print("Usage: python calc_correlation.py <data file.py>")
    sys.exit(1)

We want to give the script a data file to calculate the correlation on. If the file is not provided, we’ll print the usage and exit.
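To make the check easier to try out without running the script from a terminal, the same logic can be sketched as a function that takes an argv-style list. The function name `data_file_from_args` and the file name `movies.py` are made up for this illustration.

```python
def data_file_from_args(argv):
    """Return the data file path from an argv-style list, or None if it is missing."""
    # argv[0] is the script name, so the first real argument is argv[1].
    if len(argv) < 2:
        return None
    return argv[1]


# No data file given: the script would print its usage message and exit here.
print(data_file_from_args(['calc_correlation.py']))  # None

# Data file given: this is the path the script would go on to open.
print(data_file_from_args(['calc_correlation.py', 'movies.py']))  # movies.py
```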

import ast

# Read the whole data file and parse its text into a Python dictionary.
with open(sys.argv[1], 'r') as f:
    temp = f.read()
    movies_list = ast.literal_eval(temp)
    print(movies_list)

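Normally, after opening a file, we have to close it ourselves, even if an error occurs while we are reading it. The with statement does that for us: it opens the file and guarantees that the file is closed when the block ends, whether or not an error was raised. It does not suppress the error itself, which still propagates once the file has been closed.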
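The following sketch compares the manual open/close pattern with the with statement, then parses the text with ast.literal_eval as the script does. The temporary file and its one-movie contents are stand-ins for a real data file, created only so the example is self-contained.

```python
import ast
import os
import tempfile

# Write a tiny ratings dictionary to a temporary file to stand in for the data file.
with tempfile.NamedTemporaryFile('w', suffix='.py', delete=False) as tmp:
    tmp.write("{'Terminator': {'Tom': 4.0, 'Jack': 1.5}}")
    path = tmp.name

# Manual version: we must close the file ourselves, even if reading fails.
f = open(path, 'r')
try:
    text_manual = f.read()
finally:
    f.close()

# Equivalent version using with: the file is closed when the block ends.
with open(path, 'r') as f:
    text_with = f.read()

print(f.closed)  # True

# ast.literal_eval turns the text of a Python literal into the actual object.
parsed = ast.literal_eval(text_with)
print(parsed['Terminator']['Tom'])  # 4.0

# Clean up the temporary file.
os.remove(path)
```

Both versions read the same text; the with form is shorter and makes it harder to forget the cleanup step.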

Let’s go through the code line by line.

with open(sys.argv[1], 'r') as f:

We open the file whose path was passed as the first command-line argument, sys.argv[1], in read-only mode. (sys.argv[0] holds the name of the script itself.)

temp =
...