Solution: Merging Datasets
Let’s review the solution to the merging datasets exercise.
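The solution melts and pivots the indicator data, dropping rows whose value is
missing, and then merges the result with the country dataset. The merge leaves
missing is_country values for entries that have no match in the country dataset,
so these are filled with False. Finally, the first rows of the merged dataset
are printed, as shown below.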
Solution
## Importing libraries
import pandas as pd

## Loading data
data = pd.read_csv('../data/PovStatsData.csv')
country = pd.read_csv('../data/PovStatsCountry.csv', na_values='', keep_default_na=False)
data = data.drop('Unnamed: 50', axis=1)

## Melting DataFrames
# id_vars is not defined in the original snippet; the identifier columns
# below are assumed from the usual layout of PovStatsData.csv
id_vars = ['Country Name', 'Country Code', 'Indicator Name', 'Indicator Code']
data_melt = pd.melt(data, id_vars=id_vars, var_name='year').dropna(subset=['value'])
data_melt['year'] = data_melt['year'].astype(int)

## Creating the is_country column
country['is_country'] = country['Region'].notna()

## Pivoting the melted DataFrame
data_pivot = data_melt.pivot(index=['Country Name', 'Country Code', 'year'],
                             columns='Indicator Name',
                             values='value').reset_index()

## Merging the pivoted data with the country data
poverty = pd.merge(data_pivot, country,
                   left_on='Country Code', right_on='Country Code', how='left')

# "High Income" is NA after the merge, so we fill it with False, as it is not a country
poverty['is_country'] = poverty['is_country'].fillna(False)

print(poverty.head())
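To see what each step does without the CSV files, here is a minimal sketch that runs the same melt, pivot, merge, and fill sequence on a tiny hand-made dataset. The rows, values, and country codes are made up for illustration, and the final `.astype(bool)` is an addition that is not part of the solution above; it only keeps the column dtype boolean.

```python
import pandas as pd

# Hypothetical stand-in for PovStatsData.csv: one column per year (wide format)
data = pd.DataFrame({
    'Country Name': ['Albania', 'Albania', 'World', 'High income'],
    'Country Code': ['ALB', 'ALB', 'WLD', 'HIC'],
    'Indicator Name': ['GINI index', 'Population', 'Population', 'Population'],
    '2000': [31.7, 3.1, 6100.0, 1000.0],
    '2001': [float('nan'), 3.0, 6200.0, 1010.0],
})

# Hypothetical stand-in for PovStatsCountry.csv: 'WLD' has no Region,
# and 'HIC' is missing from the table altogether
country = pd.DataFrame({
    'Country Code': ['ALB', 'WLD'],
    'Region': ['Europe & Central Asia', None],
})

# Wide to long: one row per (country, indicator, year); drop missing values
id_vars = ['Country Name', 'Country Code', 'Indicator Name']
data_melt = pd.melt(data, id_vars=id_vars, var_name='year').dropna(subset=['value'])
data_melt['year'] = data_melt['year'].astype(int)

# An entry counts as a country only if it has a Region
country['is_country'] = country['Region'].notna()

# Long to wide again, but now with one column per indicator
data_pivot = data_melt.pivot(index=['Country Name', 'Country Code', 'year'],
                             columns='Indicator Name',
                             values='value').reset_index()

# Left merge keeps every row of data_pivot; unmatched codes get NaN
poverty = pd.merge(data_pivot, country, on='Country Code', how='left')

# Unmatched rows ('HIC') have NaN in is_country, so mark them as non-countries
poverty['is_country'] = poverty['is_country'].fillna(False).astype(bool)

print(poverty)
```

In this toy data, the Albania rows end up with is_country set to True, the World rows get False because their Region is missing, and the High income rows get False only after the fill, because the left merge found no matching row for them in the country table.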