Challenge 2: Merge with Missing Values (Medium)
This challenge asks you to merge two DataFrames whose combined result has missing values.
Problem definition
Your music analyst now has two datasets with complementary information.
The first dataset contains information about bands and their associated data. However, it refers to countries by their IDs instead of their names. Here are five sample rows:
artist | country | plays | genre | fans |
---|---|---|---|---|
The Beatles | 1 | 150 | rock | 50 |
Iron Maiden | 1 | 20000 | metal | 3500 |
Judas Priest | 1 | 5000 | metal | 1000 |
Leprous | 5 | 1000 | metal | 500 |
Rush | 6 | 3000 | rock | 500 |
The second is a simple mapping between the ID of a country and its name.
country_id | name |
---|---|
1 | UK |
2 | US |
3 | Egypt |
4 | Finland |
This is very similar to the dataset from Challenge 1, with one difference: some artists have a country ID that doesn’t exist in the countries table. In the sample rows above, IDs 5 and 6 have no matching entry. This might be because the data was corrupted or inserted incorrectly.
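One way to see which rows are affected is a left merge, which keeps every artist and marks the rows that found no match. The following is a minimal sketch that rebuilds the sample rows by hand; the variable names `artists` and `countries` are assumptions, not names given by the challenge.

```python
import pandas as pd

# Sample rows from the tables above (variable names are illustrative).
artists = pd.DataFrame({
    "artist": ["The Beatles", "Iron Maiden", "Judas Priest", "Leprous", "Rush"],
    "country": [1, 1, 1, 5, 6],
    "plays": [150, 20000, 5000, 1000, 3000],
    "genre": ["rock", "metal", "metal", "metal", "rock"],
    "fans": [50, 3500, 1000, 500, 500],
})
countries = pd.DataFrame({
    "country_id": [1, 2, 3, 4],
    "name": ["UK", "US", "Egypt", "Finland"],
})

# A left merge keeps all artists; indicator=True adds a "_merge" column
# whose value is "left_only" for rows with no matching country_id.
merged = artists.merge(
    countries,
    how="left",
    left_on="country",
    right_on="country_id",
    indicator=True,
)

unmatched = merged.loc[merged["_merge"] == "left_only", "artist"].tolist()
print(unmatched)  # ['Leprous', 'Rush']
```

For these rows, the `name` column is `NaN` wherever `_merge` is `left_only`, so either column can be used to find the affected artists.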
Your music analyst would like to know how many plays were affected by this error. Can you help the analyst find the sum of plays of all the artists without a country name associated with them?
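One possible approach, sketched here under assumptions rather than as the reference solution, is to left-merge the two tables and sum `plays` over the rows where `name` is missing. The function name `plays_without_country` and the DataFrame names are hypothetical; the column names come from the sample tables.

```python
import pandas as pd


def plays_without_country(artists: pd.DataFrame, countries: pd.DataFrame) -> int:
    """Sum the plays of artists whose country ID has no name in `countries`."""
    # Keep every artist; unmatched country IDs get NaN in the "name" column.
    merged = artists.merge(
        countries, how="left", left_on="country", right_on="country_id"
    )
    # Select rows with a missing name and add up their plays.
    return int(merged.loc[merged["name"].isna(), "plays"].sum())


# Sample rows from the problem statement (only the columns the sketch needs).
artists = pd.DataFrame({
    "artist": ["The Beatles", "Iron Maiden", "Judas Priest", "Leprous", "Rush"],
    "country": [1, 1, 1, 5, 6],
    "plays": [150, 20000, 5000, 1000, 3000],
})
countries = pd.DataFrame({
    "country_id": [1, 2, 3, 4],
    "name": ["UK", "US", "Egypt", "Finland"],
})

total = plays_without_country(artists, countries)
print(total)  # 4000 (Leprous: 1000 + Rush: 3000)
```

Note that this sketch also counts an artist whose country ID exists but whose `name` value is itself missing, which matches the wording "without a country name".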