Introduction to pandas
Learn the fundamentals of Python's pandas library.
Why pandas?
pandas is an open-source Python library that provides efficient tools for data manipulation and analysis. It offers a variety of data structures and operations, supports many data formats, and is built on top of the NumPy library.
The following key features explain why pandas is so widely used:
- Fast, efficient performance, including on large datasets.
- Support for various data formats such as CSV files, JSON, XML, and SQL databases (to name a few).
- Data cleaning and support for handling missing values.
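To make the last two features concrete, here is a minimal sketch that reads CSV data and handles a missing value. The column names and values are made up for illustration, and the CSV is read from an in-memory string rather than a file so that the example is self-contained.

```python
import io

import pandas as pd

# Hypothetical CSV content; in practice this would usually be a file path.
csv_text = "name,score\nAda,90\nBen,\nCy,75\n"

df = pd.read_csv(io.StringIO(csv_text))

# The empty score for Ben is parsed as NaN, which is how pandas marks a missing value.
missing_count = int(df["score"].isna().sum())

# Two common ways to handle missing values:
dropped = df.dropna(subset=["score"])              # drop rows with a missing score
filled = df.fillna({"score": df["score"].mean()})  # or fill with the column mean

print(missing_count)
print(filled)
```

Whether to drop or fill missing values depends on the dataset. Filling with the mean is only one possible choice here.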
pandas data structures
Python's pandas library provides support for the following two data structures:
- pandas Series
- pandas DataFrame
The pandas Series
The pandas Series object is a one-dimensional labeled array. We can populate a Series object with any Python data type, such as integers, strings, floats, and so on. We can think of the Series object as a column in a spreadsheet. All Series objects are indexed by default, meaning that every Series element has an index.
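The default index can be seen in a small sketch like the one below. The values and the custom labels are hypothetical and chosen only for illustration. The `index` argument is shown as one way to replace the default labels.

```python
import pandas as pd

# Hypothetical values; with no index given, pandas assigns labels 0, 1, 2, ...
s = pd.Series([3.5, 7.0, 10.0])
print(s.index.tolist())  # the default integer labels
print(s.loc[0])          # look up an element by its label

# The same kind of data with explicit (hypothetical) labels instead of the default.
labeled = pd.Series(["red", "green"], index=["first", "second"])
print(labeled.loc["second"])
```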
We can create a Series object from an array, a dictionary, a list, or a scalar. As illustrated in the figure below, we can make the scores Series object by passing the student_score list to the pd.Series() function. Each element is indexed, and we can ...
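A minimal sketch of these ways to create a Series might look like the following. Only the names `student_score` and `scores` come from the text. All values and the other variable names are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical scores; the actual values in the lesson's figure are not known here.
student_score = [88, 92, 79]
scores = pd.Series(student_score)  # from a list: default index 0, 1, 2

from_dict = pd.Series({"Ada": 88, "Ben": 92})      # dictionary keys become the labels
from_array = pd.Series(np.array([1.5, 2.5]))       # from a NumPy array
from_scalar = pd.Series(5, index=["a", "b", "c"])  # scalar repeated for each label

print(scores)
print(from_dict)
print(from_scalar)
```

When a scalar is passed, an index is needed to get more than one element, since pandas repeats the value once per label.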