Introduction to pandas

Learn the fundamentals of Python's pandas library.

Why pandas?

pandas is an open-source Python library that provides efficient data manipulation and analysis tools. It offers labeled data structures and functions for working with them, along with support for various data formats. It’s built on top of Python’s NumPy library.

The following key features make pandas a popular library:

  • Fast, efficient performance, including on large datasets that fit in memory.
  • Support for various data formats such as CSV files, JSON, XML, and SQL databases (to name a few).
  • Data cleaning and support for handling missing values.
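As a minimal sketch of the last two features, the example below reads CSV data and then handles a missing value. The CSV content, column names, and variable names are hypothetical and chosen only for illustration; an in-memory string stands in for a real file.

```python
import io

import pandas as pd

# Hypothetical CSV content; names and values are made up for illustration.
csv_text = """name,score
Alice,90
Bob,
Carol,85
"""

# read_csv accepts a file path or a file-like object such as StringIO.
df = pd.read_csv(io.StringIO(csv_text))

# Bob's empty score is parsed as NaN, pandas' missing-value marker.
missing_count = int(df["score"].isna().sum())

# Two common ways to handle missing values: drop the rows or fill them in.
dropped = df.dropna()
filled = df.fillna({"score": 0})

print(missing_count)
print(dropped)
print(filled)
```

Whether to drop or fill depends on the data: dropping loses Bob's row entirely, while filling with 0 keeps the row but changes any average computed from the column.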

pandas data structures

Python’s pandas library provides the following two primary data structures:

  • pandas Series
  • pandas DataFrame

The pandas Series

The pandas Series object is a one-dimensional labeled array. We can populate a Series object with any Python data type, such as integers, strings, or floats. We can think of the Series object as a column in a spreadsheet. Every Series has an index: each element carries a label, and if we don't supply labels, pandas assigns integer labels starting at 0.
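The short sketch below illustrates the index. The values and label names are hypothetical; the point is only to show the default integer labels and how explicit labels replace them.

```python
import pandas as pd

# Hypothetical values, made up for illustration.
readings = pd.Series([88, 92, 79])

# No index was given, so pandas assigns the default integer labels 0, 1, 2.
default_labels = list(readings.index)

# An explicit index replaces the default labels (label names are hypothetical).
named_readings = pd.Series([88, 92, 79], index=["a", "b", "c"])

# .loc looks up an element by its label.
first_value = readings.loc[0]
b_value = named_readings.loc["b"]

print(default_labels)
print(first_value, b_value)
```

Using `.loc` makes it explicit that the lookup is by label rather than by position, which matters once the labels are no longer the default integers.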

We can create a Series object from an array, a dictionary, a list, or a scalar. As illustrated in the figure below, we can make the scores Series object by passing the student_score list to the pd.Series() function. Each element is indexed, and we can ...