Solution: Merging Datasets

Let’s review the solution to the merging datasets exercise.

The solution below merges the two datasets, handles the missing values in the merged result, and prints the first rows of the final dataset.

Solution

# Importing libraries
import pandas as pd
# Loading data
data = pd.read_csv('../data/PovStatsData.csv')
# With na_values='' and keep_default_na=False, only empty cells are read as missing,
# so literal strings such as 'NA' are kept as text
country = pd.read_csv('../data/PovStatsCountry.csv', na_values='', keep_default_na=False)
data = data.drop('Unnamed: 50', axis=1)
# Melting the DataFrame
# Assumption: id_vars lists the identifier (non-year) columns of PovStatsData.csv
id_vars = ['Country Name', 'Country Code', 'Indicator Name', 'Indicator Code']
data_melt = pd.melt(data, id_vars=id_vars, var_name='year').dropna(subset=['value'])
data_melt['year'] = data_melt['year'].astype(int)
# Creating the is_country column: True where a region is recorded
country['is_country'] = country['Region'].notna()
# Pivoting the melted DataFrame so that each indicator becomes a column
data_pivot = data_melt.pivot(index=['Country Name', 'Country Code', 'year'],
                             columns='Indicator Name',
                             values='value').reset_index()
# Merging the pivoted data with the country metadata
poverty = pd.merge(data_pivot, country, left_on='Country Code', right_on='Country Code', how='left')
# "High Income" is NA so we fill it with False values, as it is not a country
poverty['is_country'] = poverty['is_country'].fillna(False)
print(poverty.head())
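
To see the same steps on a small scale, the sketch below runs the melt, pivot, merge, and fill sequence on made-up data. The entity names, codes, indicator names, and values are all hypothetical and are not taken from the PovStats files. Only the order of operations mirrors the solution above. The final astype(bool) is an addition that makes the column type explicit.

```python
import pandas as pd

# Hypothetical wide-format data: one row per (entity, indicator), one column per year
data = pd.DataFrame({
    'Country Name': ['Freedonia', 'Freedonia', 'World', 'High income'],
    'Country Code': ['FRD', 'FRD', 'WLD', 'HIC'],
    'Indicator Name': ['Indicator A', 'Indicator B', 'Indicator A', 'Indicator A'],
    '2000': [1.0, 10.0, 2.0, 3.0],
    '2001': [1.5, None, 2.5, None],
})

# Hypothetical metadata: 'World' has no region, and 'High income' is missing entirely
country = pd.DataFrame({
    'Country Code': ['FRD', 'WLD'],
    'Region': ['Europe', None],
})
country['is_country'] = country['Region'].notna()

# Wide to long: one row per (entity, indicator, year), dropping empty values
id_vars = ['Country Name', 'Country Code', 'Indicator Name']
data_melt = pd.melt(data, id_vars=id_vars, var_name='year').dropna(subset=['value'])
data_melt['year'] = data_melt['year'].astype(int)

# Long to tidy: one row per (entity, year), one column per indicator
data_pivot = data_melt.pivot(index=['Country Name', 'Country Code', 'year'],
                             columns='Indicator Name',
                             values='value').reset_index()

# A left merge keeps every row of data_pivot, even when the code has no metadata
poverty = pd.merge(data_pivot, country, on='Country Code', how='left')

# Rows without a match ('HIC' here) get NaN in is_country, so fill them with False
poverty['is_country'] = poverty['is_country'].fillna(False).astype(bool)
print(poverty)
```

In this toy data, is_country ends up False for two different reasons: 'WLD' is in the metadata but has no region, while 'HIC' has no metadata row at all and only becomes False through the fill step.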

Explanation

...