An Introduction to pandas
Explore the basics of the pandas library.
In data science tasks in Python, we often use the pandas library for data manipulation, understanding, and analysis. Here, we give a brief introduction to the fundamentals of pandas before getting into the practice challenges.
The DataFrame data structure
A DataFrame is a 2D table of data with rows and columns, like a spreadsheet. Each column in a DataFrame is a Series, and each row has a label; together, the row labels form the index. Let's see how we can define a DataFrame using a dictionary.
import pandas as pd
import numpy as np

# first we will create a dictionary
data = {
    'ids': ["STD_2_145", "STD_2_236", "STD_2_390", "STD_2_487", "STD_2_569",
            "STD_2_672", "STD_2_789", "STD_2_812", "STD_2_951", "STD_2_603"],
    'science': [70, 78, 82, np.nan, 82, 76, 71, 67, 95, 79],
    'english': [64, 75, 43, 76, 42, 77, 88, 56, 87, 90],
    'math': [87, 56, 68, 94, 76, 71, 64, 60, 89, 93],
    'previous result': ['78%', '75%', '59%', '85%', '70%',
                        '75%', '60%', '', '76%', '70%'],
}

# then we build the DataFrame, listing the columns in the order we want
result = pd.DataFrame(
    data,
    columns=['ids', 'science', 'english', 'math', 'previous result'],
)

print(type(data))
print(type(result))
print("----------------------")
print(result.dtypes)
This code uses the pandas library in Python to create a DataFrame named result from a dictionary named data. The dictionary has five keys, each representing a column in the DataFrame: ids, science, english, math, and previous result. This dictionary encapsulates student information such as their IDs, grades in science, English, and math courses, along with their previous overall results.
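To make the relationship between a DataFrame, its columns, and its index concrete, here is a minimal sketch. The scores table and its values are hypothetical, chosen only to keep the example short; they are not part of the student data above.

```python
import pandas as pd

# Hypothetical two-row table, used only for illustration
scores = pd.DataFrame({"science": [70, 78], "math": [87, 56]})

# Selecting a single column by its key returns a Series
column = scores["science"]
print(type(column).__name__)   # Series

# With no index supplied, pandas labels the rows 0, 1, ...
print(list(scores.index))      # [0, 1]

# The dictionary keys become the column names
print(list(scores.columns))    # ['science', 'math']
```

Because no index was passed, the rows get default integer labels starting at 0.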
The type() function is used to check the types of data and result. The dtypes attribute is then used to view the data type of each column in the DataFrame. ...
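As a rough illustration of what dtypes can reveal, the sketch below builds a smaller, hypothetical table (marks) with the same kinds of values as the student data: a numeric column containing a missing value, a column of whole numbers, and a column of percentage strings. The exact label printed for text columns may differ between pandas versions, so the checks use the helpers in pd.api.types instead of comparing dtype names.

```python
import numpy as np
import pandas as pd

# Hypothetical three-row table that mirrors the kinds of values used above
marks = pd.DataFrame({
    "science": [70, np.nan, 82],             # contains a missing value
    "english": [64, 75, 43],                 # whole numbers only
    "previous result": ["78%", "", "59%"],   # text, including an empty string
})

print(marks.dtypes)

# NaN is a floating-point value, so the column is stored as floats
science_is_float = pd.api.types.is_float_dtype(marks["science"])

# A column of whole numbers with nothing missing stays an integer column
english_is_int = pd.api.types.is_integer_dtype(marks["english"])

# Percentage strings are text, so the column is not numeric
result_is_numeric = pd.api.types.is_numeric_dtype(marks["previous result"])

# An empty string is not treated as a missing value
missing_results = int(marks["previous result"].isna().sum())

print(science_is_float, english_is_int, result_is_numeric, missing_results)
```

Checking dtypes early helps catch columns like previous result, which look numeric to a reader but would need cleaning before any arithmetic.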