Now that you’ve mastered pandas Series, take it to the next level by exploring pandas DataFrames and integrating Series into real-world data analysis projects, such as visualizing geospatial data.
Key takeaways:
A pandas Series is a one-dimensional labeled array that handles diverse data types.
You can create Series from lists, dictionaries, NumPy arrays, and scalars.
The Series provides flexible indexing with default integer indices or custom labels.
It offers powerful methods for mathematical operations, handling missing data, and filtering values efficiently.
The pandas Series is a one-dimensional data structure in the pandas library for Python. It primarily holds data of a single data type; when you pass mixed types, pandas upcasts them to a more general type. While it shares similarities with NumPy arrays, one key distinction is that each element in a Series is associated with an index label, which can be customized as needed. A Series is also flexible: you can add or remove elements as required. Let’s look at the details of the pandas Series, explore its parameters and methods, and learn how to use it effectively in code.
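To make the upcasting and add/remove behavior concrete, here is a minimal sketch. The variable names and sample values are illustrative assumptions, not taken from a real dataset.

```python
import pandas as pd

# Mixing integers and floats upcasts every element to float64.
mixed_numbers = pd.Series([1, 2, 3.5])
print(mixed_numbers.dtype)

# Mixing numbers and strings falls back to the generic object dtype.
mixed_types = pd.Series([1, "two", 3.0])
print(mixed_types.dtype)

# Add an element by assigning to a new label, then remove one with drop().
colors = pd.Series(["maroon", "navy"], index=["a", "b"])
colors["c"] = "gray"        # enlarges the Series in place
colors = colors.drop("a")   # drop() returns a new Series without label 'a'
print(colors)
```

Note that drop() does not modify the Series in place, which is why the result is assigned back to the same name.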
The full signature of the Series constructor is shown below:
pandas.Series(data=None, index=None, dtype=None, name=None, copy=False, fastpath=False)
All parameters are optional. The data parameter defaults to None, although in practice you’ll almost always pass it.
Parameter | Explanation |
data | Data to store (e.g., array, list, dictionary, scalar) |
index | Optional labels for data. Default: integer labels |
dtype | Optional data type. Inferred by default |
name | Name of the Series. Default: None |
copy | Whether to copy the input data. Default: False |
Note: The fastpath parameter is for internal use only and helps speed up Series creation. You don’t need to use or modify it directly during Series initialization.
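As a rough illustration of how the commonly used parameters fit together, the sketch below passes data, index, dtype, and name explicitly. The temperature values and label names are made up for the example.

```python
import pandas as pd

# Hypothetical readings; the labels and the name are illustrative only.
temperatures = pd.Series(
    data=[21, 23, 19],
    index=["mon", "tue", "wed"],
    dtype="float64",        # force floats instead of the inferred int64
    name="temperature_c",
)

print(temperatures)
print(temperatures.name, temperatures.dtype)
```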
As an illustration, consider a pandas Series with the following labels and values:
labels = ['a', 'b', 'c', 'd']
values = ['maroon', 'navy', 'gray', 'teal']
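One way these labels and values could be combined into a Series is sketched below. The name color_series is an assumption chosen for the example.

```python
import pandas as pd

labels = ['a', 'b', 'c', 'd']
values = ['maroon', 'navy', 'gray', 'teal']

# Each value is paired with the label at the same position.
color_series = pd.Series(values, index=labels)

print(color_series)
print("Value at label 'c':", color_series['c'])
```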
There are several ways to create a pandas Series in Python.
Here’s a quick summary:
Method | Example | Use Case |
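From a list | pd.Series(list_of_numbers) | When working with lists or arrays |
From NumPy | pd.Series(array_of_colors) | When starting with NumPy data |
From dict | pd.Series(dict_of_cars) | When you have labeled key-value pairs |
Scalar value | pd.Series(5, index=['a', 'b', 'c', 'd']) | When repeating a single value |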
Let’s discuss each of these in detail.
One way to create a pandas Series is by using a Python list.
import pandas as pd

list_of_numbers = [4, 10, 15, 6, 21, 19]
series_list = pd.Series(list_of_numbers)
print("Series made using a list: \n")
print(series_list)
Another way to create a pandas Series is by using a NumPy array.
import pandas as pd
import numpy as np

array_of_colors = np.array(['black', 'maroon', 'white', 'navy'])
series_numpy = pd.Series(array_of_colors)
print("Series made using an array: \n")
print(series_numpy)
You can also create a pandas Series using a Python dictionary, where the keys become indices, and the values form the data.
import pandas as pd

dict_of_cars = {"1": "Bugatti Veyron", "2": "Aston Martin", "3": "Ferrari"}
series_dict = pd.Series(dict_of_cars)
print("Series made using a dictionary: \n")
print(series_dict)
You can also create a pandas Series using a scalar value, which is repeated across all specified indices.
import pandas as pd

series_scalar = pd.Series(5, index=['a', 'b', 'c', 'd'])
print("Series made using a scalar: \n")
print(series_scalar)
You can create an empty pandas Series by not providing any data. Later, you can assign values to it using its indices. It’s recommended to explicitly define the dtype (e.g., dtype='float64') to avoid potential issues or warnings in future versions of pandas.
import pandas as pd

empty_series = pd.Series(dtype='float64')
print(empty_series)
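Assigning values to an empty Series afterward might look like the sketch below. The labels 'first' and 'second' are hypothetical, and assignment through .loc is one of several ways to do this.

```python
import pandas as pd

# Start empty with an explicit dtype, then add entries by label.
readings = pd.Series(dtype='float64')
readings.loc['first'] = 1.5    # assigning to a missing label adds a new entry
readings.loc['second'] = 2.5

print(readings)
```

For large amounts of data, collecting the values in a list or dictionary first and building the Series once is usually the more efficient choice than growing it one element at a time.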
In a pandas Series, labels are crucial in identifying and accessing data. They provide a human-readable way to reference specific elements, making working with and interpreting the data easier. Below are the common types of labels used in the pandas Series:
If we don’t specify an index during creation, pandas assigns default integer labels starting from 0.
import pandas as pd

default_labels_series = pd.Series([1, 2, 3, 4])
print(default_labels_series)
We can assign custom labels to our Series either during creation or afterward.
import pandas as pd

custom_labels_series = pd.Series([10, 20, 30, 40], index=['label_a', 'label_b', 'label_c', 'label_d'])
print(custom_labels_series)
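The example above sets labels during creation. As a sketch of the other option, assigning labels afterward, the index attribute can be overwritten with a list of the same length. The name relabeled_series is an assumption for this example.

```python
import pandas as pd

# Created with default integer labels 0..3.
relabeled_series = pd.Series([10, 20, 30, 40])

# Replace the labels afterward; the new list must match the Series length.
relabeled_series.index = ['label_a', 'label_b', 'label_c', 'label_d']

print(relabeled_series)
```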
Locating elements is almost always crucial for any data. Indexing in Series involves selecting particular elements based on labels or positions. Below are a few handy techniques for Series indexing:
Note: The index position starts from 0.
Default indexing refers to automatically assigning integer labels to access elements in a pandas Series. With default indexing, elements can be accessed using integer positions like an array.
import pandas as pd

# Creating a Series with default integer labels
default_index_series = pd.Series([10, 20, 30, 40])

# Accessing elements using integer-based indexing
print("Element at index 1:", default_index_series[1])
With custom indexing, elements can be accessed using user-defined labels instead of default integer labels. This feature enhances the data’s readability and manageability, especially with more descriptive labels like strings.
import pandas as pd

# Creating a Series with custom labels
custom_index_series = pd.Series([10, 20, 30, 40], index=['a', 'b', 'c', 'd'])

# Accessing elements using custom labels
print("Element with label 'b':", custom_index_series['b'])
In pandas, loc and iloc are two attributes used for indexing and selecting data from a DataFrame or Series.
loc is used for label-based indexing in pandas, allowing you to access elements using their labels or with a boolean array for conditional selection.
iloc is used for integer-location-based indexing, where we specify the location by integer position.
For Series, the usage is relatively simple as the data is only one-dimensional. Let’s go through the code examples.
import pandas as pd

# Creating a Series with custom labels
custom_labels_series = pd.Series([10, 20, 30, 40], index=['A', 'B', 'C', 'D'])
print("Series with custom labels:")
print(custom_labels_series)

# Label-based indexing using .loc
loc_example = custom_labels_series.loc['B']
# Position-based indexing using .iloc
iloc_example = custom_labels_series.iloc[1]
print("\nValue retrieved using loc (label-based indexing, label='B'):", loc_example)
print("Value retrieved using iloc (position-based indexing, position=1):", iloc_example)

# Slicing using label-based indexing (the end label is included)
label_slice = custom_labels_series.loc['B':'D']
print("\nSlice using loc (labels 'B' to 'D'):")
print(label_slice)

# Slicing using position-based indexing (the end position is excluded)
position_slice = custom_labels_series.iloc[1:3]
print("\nSlice using iloc (positions 1 to 2):")
print(position_slice)
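The text also mentions that loc accepts a boolean array for conditional selection. A minimal sketch of that usage follows; the threshold of 30 and the variable names are arbitrary choices for illustration.

```python
import pandas as pd

scores = pd.Series([10, 20, 30, 40], index=['A', 'B', 'C', 'D'])

# A comparison produces a boolean Series aligned with the original labels.
mask = scores >= 30
selected = scores.loc[mask]
print(selected)

# A plain list of booleans of the same length also works.
first_and_last = scores.loc[[True, False, False, True]]
print(first_and_last)
```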
Boolean indexing allows you to filter pandas Series elements based on conditions. Only elements corresponding to True values are selected.
import pandas as pd

default_labels_series = pd.Series([1, 2, 3, 4])
filtered_series = default_labels_series[default_labels_series > 2]
print(filtered_series)
Accessing elements in a pandas Series is essential for data manipulation and analysis. It involves retrieving specific values or subsets of data using various techniques, such as referencing by label or position. Below are a few commonly used methods for accessing elements in a pandas Series:
A single element can be accessed by using the [] operator and specifying the label or index position inside of it.
import pandas as pd

# Creating a Series with default integer labels
default_labels_series = pd.Series([1, 2, 3, 4])

# Single element indexing using the position
single_element_position = default_labels_series[1]
print("Element at position 1:", single_element_position)

# Creating a Series with custom labels
labeled_series = pd.Series([10, 20, 30, 40], index=['a', 'b', 'c', 'd'])

# Single element indexing using the label in labeled Series
single_element_label = labeled_series['b']
print("Element with label 'b':", single_element_label)
Using series[start:stop], we can obtain a slice of elements between the start and stop indices. With integer positions, the element at stop is excluded, as in standard Python slicing; with labels, the element at stop is included.
import pandas as pd

# Creating a Series with default integer labels
default_labels_series = pd.Series([1, 2, 3, 4])

# Slicing elements using integer positions (position 3 is excluded)
sliced_elements = default_labels_series[1:3]
print("Sliced elements (default labels):")
print(sliced_elements)

# Creating a Series with custom labels
labeled_series = pd.Series([10, 20, 30, 40], index=['a', 'b', 'c', 'd'])

# Slicing elements using custom labels (label 'd' is included)
sliced_series = labeled_series['b':'d']
print("\nSliced elements (using labels):")
print(sliced_series)
The pandas Series has various attributes that can provide information about the Series. Let’s go through some of them, along with a code example:
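Attribute | Description |
at | The at attribute accesses a single value by its label |
dtype | The dtype attribute gives the data type of the values in the Series |
is_unique | The is_unique attribute is True if all values in the Series are unique |
shape | The shape attribute is a tuple with the dimensions of the Series, e.g., (4,) |
size | The size attribute gives the number of elements in the Series |
values | The values attribute returns the Series data as a NumPy array (or array-like, depending on the dtype) |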
import pandas as pd

custom_labels_series = pd.Series([10, 20, 30, 40], index=['label_a', 'label_b', 'label_c', 'label_d'])
print(custom_labels_series)

# The at attribute
at_attribute = custom_labels_series.at['label_a']
# The dtype attribute
dtype_attribute = custom_labels_series.dtype
# The is_unique attribute
is_unique_attribute = custom_labels_series.is_unique
# The shape attribute
shape_attribute = custom_labels_series.shape
# The size attribute
size_attribute = custom_labels_series.size
# The values attribute
values_attribute = custom_labels_series.values

print("At attribute: ", at_attribute)
print("Dtype attribute: ", dtype_attribute)
print("Is unique attribute: ", is_unique_attribute)
print("Shape attribute: ", shape_attribute)
print("Size attribute: ", size_attribute)
print("Values attribute: ", values_attribute)
Note: iloc and loc are also Series attributes.
The data stored in Series can be manipulated using built-in methods offered by pandas for Series. We’ve already seen how to create and access elements in a Series, so now let’s cover some other highly useful Series methods below. We’ll demonstrate their use through practical examples, but check the official documentation for a comprehensive list.
series.head(n): Returns the first n elements.
series.tail(n): Returns the last n elements.
import pandas as pd

custom_labels_series = pd.Series([10, 20, 30, 40], index=['label_a', 'label_b', 'label_c', 'label_d'])
print(custom_labels_series)

# Obtaining the first or last 2 elements
head_elements = custom_labels_series.head(2)
tail_elements = custom_labels_series.tail(2)
print("Head elements: \n", head_elements)
print("Tail elements: \n", tail_elements)
series.sum(): Sum of all elements
series.mean(): Mean of all elements
series.median(): Median of all elements
series.min(), series.max(): Minimum and maximum values
series.std(), series.var(): Standard deviation and variance
import pandas as pd

custom_labels_series = pd.Series([10, 20, 30, 40], index=['label_a', 'label_b', 'label_c', 'label_d'])
print(custom_labels_series)

# Descriptive statistics
sum_elements = custom_labels_series.sum()
mean_element = custom_labels_series.mean()
median_element = custom_labels_series.median()
min_element = custom_labels_series.min()
max_element = custom_labels_series.max()
std_dev = custom_labels_series.std()
variance = custom_labels_series.var()

print("Sum: \n", sum_elements)
print("Mean: \n", mean_element)
print("Median: \n", median_element)
print("Min element: \n", min_element)
print("Max element: \n", max_element)
print("Standard Deviation: \n", std_dev)
print("Variance: \n", variance)
series.dropna(): Remove missing (NaN) values.
series.fillna(value): Replace missing values with a specified value.
import pandas as pd
import numpy as np

custom_labels_series = pd.Series([10, np.nan, 30, 40], index=['label_a', 'label_b', 'label_c', 'label_d'])
print(custom_labels_series)

# Handling missing values
dropna_series = custom_labels_series.dropna()
fillna_series = custom_labels_series.fillna(0)
print("Dropna: \n", dropna_series, "\n")
print("Fillna: \n", fillna_series)
series.sort_values(): Sort elements by the values.
series.sort_index(): Sort elements by the index.
import pandas as pd

custom_labels_series = pd.Series([10, 40, 30, 20], index=['label_d', 'label_c', 'label_b', 'label_a'])
print(custom_labels_series, "\n")

# Sorting Series
sorted_values = custom_labels_series.sort_values()
sorted_index = custom_labels_series.sort_index()
print("Sorted values: \n", sorted_values, "\n")
print("Sorted index: \n", sorted_index)
series.unique(): Return unique values.
series.value_counts(): Count occurrences of each unique value.
import pandas as pd

custom_labels_series = pd.Series([10, 20, 20, 30], index=['label_a', 'label_b', 'label_c', 'label_d'])
print(custom_labels_series, "\n")

# Obtaining unique values
unique_values = custom_labels_series.unique()
value_counts = custom_labels_series.value_counts()
print("Unique values: \n", unique_values, "\n")
print("Value counts: \n", value_counts)
series.describe(): Generate descriptive statistics.
series.info(): Display information about the Series.
import pandas as pd

custom_labels_series = pd.Series([10, 20, 30, 40], index=['label_a', 'label_b', 'label_c', 'label_d'])
print(custom_labels_series, "\n")

# Informational methods
description = custom_labels_series.describe()
print("Description: \n", description, "\n")
print("Info:")
custom_labels_series.info()
With the pandas Series, we can also efficiently perform mathematical operations on all elements directly.
series + value: Add a constant to each element.
series - value: Subtract a constant from each element.
series * value: Multiply each element by a constant.
series / value: Divide each element by a constant.
series1 + series2: Element-wise addition of two Series.
import pandas as pd

custom_labels_series = pd.Series([10, 20, 30, 40], index=['label_a', 'label_b', 'label_c', 'label_d'])
print(custom_labels_series)

# Element-wise mathematical operations
addition = custom_labels_series + 5
multiplication = custom_labels_series * 2
print("Addition: \n", addition)
print("Multiplication: \n", multiplication)
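The example above covers operations with a constant. Adding two Series works a little differently, because pandas aligns the operands by label rather than by position. The following sketch uses two small Series with partly overlapping labels, chosen only for illustration.

```python
import pandas as pd

first = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
second = pd.Series([10, 20, 30], index=['b', 'c', 'd'])

# Labels present in only one Series produce NaN in the result.
total = first + second
print(total)

# add() with fill_value treats a missing label as 0 instead.
total_filled = first.add(second, fill_value=0)
print(total_filled)
```

Because the result contains NaN, the first sum is stored as floats even though both inputs hold integers.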
Aspect | Series | DataFrame |
Dimensions | 1D | 2D (rows and columns) |
Data access | By index or position | By rows, columns, or both |
Use case | Labeled array for simple data | Tabular data for complex relationships |
Using the to_frame() method, we can convert a Series to a DataFrame. This conversion may be necessary for more complex operations or for adding multidimensional data.
import pandas as pd

custom_labels_series = pd.Series([10, 20, 30, 40], index=['label_a', 'label_b', 'label_c', 'label_d'])
print(custom_labels_series)

# Converting a Series to a DataFrame
series_to_dataframe = custom_labels_series.to_frame()
print(series_to_dataframe)
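When the Series has no name, the resulting column is labeled 0 by default. As a small sketch, the name argument of to_frame() can be used to give the column a readable label; the column name 'value' here is an arbitrary choice.

```python
import pandas as pd

custom_labels_series = pd.Series([10, 20, 30, 40], index=['label_a', 'label_b', 'label_c', 'label_d'])

# The Series index becomes the row index; the values become one named column.
named_frame = custom_labels_series.to_frame(name='value')

print(named_frame)
print("Shape:", named_frame.shape)
```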
Let’s briefly discuss the benefits of using the pandas Series.
Easy data handling
Labeled data for better organization
Automatic data alignment
Missing data handling (NaN values)
Built-in statistical and mathematical functions
Integration with NumPy for numerical operations
Specialized support for time series data
Versatility in input formats (lists, dictionaries, etc.)
The pandas Series is a versatile tool for one-dimensional labeled data, well suited to tasks such as tracking temperatures, analyzing stock prices, or managing feedback. With missing value handling, statistical functions, and time-series support, it simplifies real-world data analysis and is a foundational step toward proficiency in data science and analytics.
What types of data can be stored in a pandas Series?
How is a pandas Series different from a pandas DataFrame or a NumPy array?
What happens if I don’t provide an index while creating a Series?
How can I access specific elements in a pandas Series?
How are at and loc different when working with the pandas Series?
Can I update or modify a pandas Series?