
Pandas Series in Python

izza ahmad
Feb 01, 2024
5 min read

A Pandas series is a one-dimensional labeled array provided by the Pandas library for Python. It holds data of a single data type (e.g., strings, integers, etc.), and its values can be modified, extended, or removed after creation. Because of these characteristics, a Pandas series is commonly compared to a NumPy array. What distinguishes a series is that each element carries an index label, which can be customized as required. Let’s delve into the details of Pandas series, their parameters, and methods, and learn how to use them effectively in our code.

Series syntax#

The complete syntax of a series object is mentioned below.

pandas.Series(data=None, index=None, dtype=None, name=None, copy=False, fastpath=False)

As the default values in the signature show, all of the parameters are optional, although in practice we almost always pass data.

Parameter | Explanation
data | The data to be stored in the series. It can be an array, list, dictionary, or scalar value.
index | The labels for the data. If it’s not given, a default integer index starting from 0 is used.
dtype | The data type of the series. If it’s not provided, the type is inferred from the data.
name | The name of the series. It is “None” by default.
copy | Specifies whether or not to copy the input data. It is “False” by default.

Note: The fastpath parameter is for internal use only.
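To make the parameters concrete, here is a minimal sketch that passes data, index, dtype, and name together. The temperature readings and the name temperature_c are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data: three temperature readings.
temperatures = pd.Series(
    data=[21.5, 23.0, 19.8],       # the values to store
    index=["mon", "tue", "wed"],   # one label per value
    dtype="float64",               # explicit data type
    name="temperature_c",          # name attached to the series
)

print(temperatures)
print(temperatures.name)
print(temperatures.dtype)
```

The name shows up at the bottom of the printed series and becomes the column name if the series is later converted to a dataframe.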

The diagram below depicts a Pandas series with the following labels and values.

  • labels = ['a', 'b', 'c', 'd']

  • values = ['maroon', 'navy', 'gray', 'teal']

Diagrammatic representation of pandas series

Creating a Pandas series#

There are several ways to create a Pandas series in Python. Let’s discuss each of them in detail.

Method 1: Using a list#

One way to create a Pandas series is by using a Python list.

import pandas as pd
import numpy as np
list_of_numbers = [4, 10, 15, 6, 21, 19,]
series_list = pd.Series(list_of_numbers)
print("Series made using a list: \n")
print(series_list)

Method 2: Using a NumPy array#

Another way to create a Pandas series is by using a NumPy array.

import pandas as pd
import numpy as np
array_of_colors = np.array(['black', 'maroon', 'white', 'navy'])
series_numpy = pd.Series(array_of_colors)
print("Series made using an array: \n")
print(series_numpy)

Method 3: Using a dictionary#

You can also create a Pandas series by using a dictionary.

import pandas as pd
import numpy as np
dict_of_cars = {"1": "Bugatti Veyron", "2": "Aston Martin", "3": "Ferrari"}
series_dict = pd.Series(dict_of_cars)
print("Series made using a dictionary: \n")
print(series_dict)
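When a dictionary is used, its keys become the index labels. The sketch below, which uses a made-up prices dictionary, also shows what happens if an index is passed alongside the dictionary: only the matching keys are picked, in the requested order, and labels without a matching key get a missing value.

```python
import pandas as pd

# Hypothetical data: fruit prices keyed by name.
prices = {"apple": 1.2, "banana": 0.5, "cherry": 3.0}

# "durian" is not a key of the dictionary, so its value is NaN.
selected = pd.Series(prices, index=["cherry", "apple", "durian"])
print(selected)
```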

Method 4: Using a scalar#

We can also create a Pandas series by using a scalar value.

import pandas as pd
import numpy as np
series_scalar = pd.Series(5, index=['a', 'b', 'c', 'd'])
print("Series made using a scalar: \n")
print(series_scalar)

Method 5: Using an empty series#

We can create an empty Pandas series by not providing any data. Additions can be made afterwards too.

import pandas as pd
import numpy as np
empty_series = pd.Series()
print(empty_series)
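As a rough sketch of adding elements afterwards, one option is to assign values to new labels. The names and scores here are invented, and the dtype is passed explicitly so that the type of the empty series is not left to the library default.

```python
import pandas as pd

# Start from an empty series with an explicit data type.
scores = pd.Series(dtype="float64")

# Assigning to a label that does not exist yet appends a new element.
scores["alice"] = 91.5
scores["bob"] = 78.0

print(scores)
```

Growing a series one element at a time is convenient for small examples. For larger amounts of data, it is usually simpler to collect the values in a list or dictionary first and build the series once.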

Labels#

In a series, labels are crucial for both identifying and accessing data. They give each element a human-readable name so that we can easily refer to specific elements when needed.

Default labeling#

If we don’t specify an index during creation, Pandas assigns default integer labels starting from 0.

import pandas as pd
import numpy as np
default_labels_series = pd.Series([1, 2, 3, 4])
print(default_labels_series)

Custom labeling#

We can assign custom labels to our series either during creation or afterwards.

import pandas as pd
import numpy as np
custom_labels_series = pd.Series([10, 20, 30, 40], index=['label_a', 'label_b', 'label_c', 'label_d'])
print(custom_labels_series)
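The example above sets the labels during creation. To assign them afterwards, one approach is to overwrite the index attribute, as in this small sketch (the variable name readings is arbitrary):

```python
import pandas as pd

# Created with the default integer labels 0..3.
readings = pd.Series([10, 20, 30, 40])

# Replace the labels after creation; the new list must have one label per element.
readings.index = ["label_a", "label_b", "label_c", "label_d"]

print(readings)
print(readings["label_c"])
```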

Indexing#

For any data, locating elements is almost always crucial. Indexing in series involves selecting particular elements based on labels or positions. A few handy techniques and methods for series indexing are mentioned below.

Indexing a single element#

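A single element can be accessed by using the [] operator and specifying its label inside of it. With the default integer index used in the example below, the label of an element is the same as its position. For explicit positional access, iloc (covered later) is the safer choice.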

import pandas as pd
import numpy as np
default_labels_series = pd.Series([1, 2, 3, 4])
print(default_labels_series)
single_element = default_labels_series[1]  # label 1 holds the value 2
print(single_element)

Slicing elements#

Using series[start:stop] with integer positions, we can obtain the slice of elements from start up to, but not including, stop. Slicing by labels (for example, with loc) is different: the element at stop is included as well.

import pandas as pd
import numpy as np
default_labels_series = pd.Series([1, 2, 3, 4])
print(default_labels_series)
sliced_elements = default_labels_series[1:3]  # positions 1 and 2; position 3 is excluded
print(sliced_elements)

Note: The index position starts from 0.
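The difference between positional and label-based slicing is easiest to see on a series with string labels. The following sketch uses a small made-up series called letters:

```python
import pandas as pd

letters = pd.Series([1, 2, 3, 4], index=["a", "b", "c", "d"])

# Positional slice: positions 1 and 2 only, the stop position is excluded.
by_position = letters.iloc[1:3]

# Label slice: labels "b", "c", and "d", the stop label is included.
by_label = letters.loc["b":"d"]

print(by_position)
print(by_label)
```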

loc and iloc#

In Pandas, loc and iloc are two indexers, available on both series and dataframes, that are used for selecting data. They are accessed with square brackets rather than called like functions: loc performs label-based indexing, while iloc performs positional indexing.

  1. loc is label-based indexing, which means we can use it to access a group of rows and columns by labels or a boolean array.

  2. iloc is used for integer-location based indexing, where we can specify the location by integer indices.

For series, the usage is relatively simple since the data is only one-dimensional. Let’s go through the code examples.

import pandas as pd
import numpy as np
default_labels_series = pd.Series([1, 2, 3, 4])
print(default_labels_series)
loc_example = default_labels_series.loc[1]
iloc_example = default_labels_series.iloc[1]
print(loc_example)
print(iloc_example)
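With the default integer labels, loc and iloc return the same element, which hides the difference between them. Here is a small sketch with integer labels that are deliberately out of order (the values are placeholders):

```python
import pandas as pd

# The labels 2, 0, 1 do not match the positions 0, 1, 2.
ordered = pd.Series(["first", "second", "third"], index=[2, 0, 1])

by_label = ordered.loc[0]      # element whose label is 0
by_position = ordered.iloc[0]  # element at position 0

print(by_label)     # second
print(by_position)  # first
```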

Series attributes#

Pandas series come with various attributes that can provide us with information about the series. Let’s go through some of them along with code examples:

The at attribute#

The at attribute returns the single value stored at a given label. (For a dataframe, it takes a row / column label pair.)

The dtype attribute#

The dtype attribute returns the data type of the series data.

The is_unique attribute#

The is_unique attribute returns True if the elements of the series are unique.

The shape attribute#

The shape attribute returns a tuple containing the shape of the series data.

The size attribute#

The size attribute returns the total number of elements in the series data.

The values attribute#

The values attribute returns the elements of the series as an ndarray (or an ndarray-like object, depending on the data type).

import pandas as pd
import numpy as np
custom_labels_series = pd.Series([10, 20, 30, 40], index=['label_a', 'label_b', 'label_c', 'label_d'])
print(custom_labels_series)
# The at attribute
at_attribute = custom_labels_series.at['label_a']
# The dtype attribute
dtype_attribute = custom_labels_series.dtype
# The is_unique attribute
is_unique_attribute = custom_labels_series.is_unique
# The shape attribute
shape_attribute = custom_labels_series.shape
# The size attribute
size_attribute = custom_labels_series.size
# The values attribute
values_attribute = custom_labels_series.values
print("At attribute: ", at_attribute)
print("Dtype attribute: ", dtype_attribute)
print("Is unique attribute: ", is_unique_attribute)
print("Shape attribute: ", shape_attribute)
print("Size attribute: ", size_attribute)
print("Values attribute: ", values_attribute)

Note: iloc and loc are also series attributes.

Series methods#

The data stored in series can be manipulated in different ways using built-in methods offered by Pandas for series. We’ve already seen how to create and access elements in a series, so now let’s cover some other highly useful series methods below.

Obtaining first or last n elements#

series.head(n): Return the first n elements.

series.tail(n): Return the last n elements.

Element-wise mathematical operations#

series + value: Add a constant to each element.

series - value: Subtract a constant from each element.

series * value: Multiply each element by a constant.

series / value: Divide each element by a constant.

series1 + series2: Element-wise addition of two series.
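When two series are added, the elements are matched by label rather than by position. The sketch below uses two made-up series whose labels only partly overlap. It also shows the add method with fill_value, which is one way to treat a missing label as 0 instead of producing NaN.

```python
import pandas as pd

first = pd.Series([1, 2, 3], index=["a", "b", "c"])
second = pd.Series([10, 20, 30], index=["b", "c", "d"])

# Labels present in only one series ("a" and "d") produce NaN.
total = first + second

# add() with fill_value=0 substitutes 0 for the missing side before adding.
total_filled = first.add(second, fill_value=0)

print(total)
print(total_filled)
```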

Descriptive statistics#

series.sum(): Sum of all elements.

series.mean(): Mean of all elements.

series.median(): Median of all elements.

series.min(), series.max(): Minimum and maximum values.

series.std(), series.var(): Standard deviation and variance.

Handling missing values#

series.dropna(): Remove missing NaN values.

series.fillna(value): Replace missing values with a specified value.
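These methods only have a visible effect when the series actually contains missing values. A minimal sketch with two NaN entries, using invented numbers:

```python
import numpy as np
import pandas as pd

measurements = pd.Series([10.0, np.nan, 30.0, np.nan])

missing_count = measurements.isna().sum()  # number of missing values
dropped = measurements.dropna()            # keeps only the non-missing elements
filled = measurements.fillna(0)            # replaces each NaN with 0

print(missing_count)
print(dropped)
print(filled)
```

Both dropna() and fillna() return a new series here and leave measurements unchanged.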

Sorting Series#

series.sort_values(): Sort elements by the values.

series.sort_index(): Sort elements by the index.
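Sorting is easier to follow on data that is not already in order. The sketch below uses a small unordered series and the ascending parameter of sort_values() to sort in descending order:

```python
import pandas as pd

unordered = pd.Series([30, 10, 20], index=["c", "a", "b"])

descending = unordered.sort_values(ascending=False)  # largest value first
by_index = unordered.sort_index()                    # labels in alphabetical order

print(descending)
print(by_index)
```

Both calls return a new series, so unordered keeps its original order.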

Boolean indexing#

series[condition]: Filter elements based on a certain condition.
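A condition on a series produces a boolean series, often called a mask, and several conditions can be combined with & (and) or | (or). Each condition needs its own parentheses. A short sketch with a hypothetical amounts series:

```python
import pandas as pd

amounts = pd.Series([10, 20, 30, 40], index=["a", "b", "c", "d"])

# Boolean mask: True where the value is greater than 15 and less than 40.
mask = (amounts > 15) & (amounts < 40)

filtered = amounts[mask]

print(mask)
print(filtered)
```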

Obtaining unique values#

series.unique(): Return unique values.

series.value_counts(): Count occurrences of each unique value.
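These two methods are most informative on data with repeated values. Here is a small sketch with a made-up series of color names:

```python
import pandas as pd

colors = pd.Series(["navy", "teal", "navy", "gray", "navy", "teal"])

distinct = colors.unique()      # distinct values in order of first appearance
counts = colors.value_counts()  # occurrences per value, most frequent first

print(distinct)
print(counts)
```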

Informational methods#

series.describe(): Generate descriptive statistics.

series.info(): Display information about the series.

The complete list of attributes and methods is extensive and can be found in the official documentation for pandas.Series. Let’s see how the methods discussed above are used in Python code.

import pandas as pd
import numpy as np
custom_labels_series = pd.Series([10, 20, 30, 40], index=['label_a', 'label_b', 'label_c', 'label_d'])
print(custom_labels_series)
# Obtaining first or last n elements
head_elements = custom_labels_series.head(2)
tail_elements = custom_labels_series.tail(2)
# Element-wise mathematical operations
addition = custom_labels_series + 5
multiplication = custom_labels_series * 2
# Descriptive statistics
sum_elements = custom_labels_series.sum()
mean_element = custom_labels_series.mean()
median_element = custom_labels_series.median()
min_element = custom_labels_series.min()
max_element = custom_labels_series.max()
std_dev = custom_labels_series.std()
variance = custom_labels_series.var()
# Handling missing values
dropna_series = custom_labels_series.dropna()
fillna_series = custom_labels_series.fillna(0)
# Sorting Series
sorted_values = custom_labels_series.sort_values()
sorted_index = custom_labels_series.sort_index()
# Boolean indexing
condition_filtered = custom_labels_series[custom_labels_series > 15]
# Obtaining unique values
unique_values = custom_labels_series.unique()
value_counts = custom_labels_series.value_counts()
# Informational methods
description = custom_labels_series.describe()
# info() prints its summary directly and returns None, so it is called after the other output
print("Head elements: \n", head_elements)
print("Tail elements: \n", tail_elements)
print("Addition: \n", addition)
print("Multiplication: \n", multiplication)
print("Sum: \n", sum_elements)
print("Mean: \n", mean_element)
print("Median: \n", median_element)
print("Min element: \n", min_element)
print("Max element: \n", max_element)
print("Standard Deviation: \n", std_dev)
print("Variance: \n", variance)
print("Dropna: \n", dropna_series)
print("Fillna: \n", fillna_series)
print("Sorted values: \n", sorted_values)
print("Sorted index: \n", sorted_index)
print("Boolean index: \n", condition_filtered)
print("Unique values: \n", unique_values)
print("Value counts: \n", value_counts)
print("Description: \n", description)
print("Info:")
custom_labels_series.info()

Series vs dataframes#

Understanding the difference between a series and a dataframe is crucial for efficient data manipulation. While a series is one-dimensional, a dataframe is a two-dimensional labeled data structure. In many situations, we need to convert one into the other.

Converting a series to a dataframe#

We can convert a series to a dataframe using the to_frame() method. Such a conversion may be necessary for more complex operations or for adding multidimensional data.

import pandas as pd
import numpy as np
custom_labels_series = pd.Series([10, 20, 30, 40], index=['label_a', 'label_b', 'label_c', 'label_d'])
print(custom_labels_series)
# Converting a series to a dataframe
series_to_dataframe = custom_labels_series.to_frame()
print(series_to_dataframe)
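The conversion also works in the other direction, since selecting a single column of a dataframe gives back a series. The sketch below names the column through the name parameter of to_frame(); the column name score is an arbitrary choice for illustration.

```python
import pandas as pd

points = pd.Series([10, 20, 30], index=["a", "b", "c"])

# Series to dataframe, with an explicit column name.
points_frame = points.to_frame(name="score")

# Dataframe column back to a series.
points_again = points_frame["score"]

print(points_frame)
print(type(points_again))
```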

Benefits of Pandas series#

Let’s briefly review the benefits of using a Pandas series.

  • Easy data handling.

  • Labeled data with an index.

  • Automatic data alignment.

  • Missing data handling (NaN values).

  • Built-in statistical and mathematical functions.

  • Integration with NumPy for numerical operations.

  • Specialized support for time series data.

  • Versatility in input formats (lists, dictionaries, etc.)

Congratulations! We’ve covered the most important concepts needed to work with Pandas series. By practicing the code examples yourself, you’ll be well on your way to mastering them. Happy coding!


  
