A Pandas series is a one-dimensional labeled array structure in the pandas library for Python. It typically holds data of a single data type (e.g., strings, integers, etc.), and elements can be added or removed after creation. Based on its characteristics, a Pandas series is commonly compared to a NumPy array. However, what distinguishes a Pandas series is that it carries an index label for each element, which can be customized as required. Let’s delve into the details of Pandas series, their parameters, and methods, and learn how we can use them effectively in our code.
The full signature of the Series constructor is shown below.
pandas.Series(data=None, index=None, dtype=None, name=None, copy=False, fastpath=False)
All parameters are optional. The data parameter defaults to None, which produces an empty series, but in practice it is almost always provided.
Parameter | Explanation |
data | It represents the data to be stored in the series. This parameter can be an array, list, dictionary, or scalar value. |
index | This parameter is optional and represents the labels for the data. If it’s not given, a default index of integers will be used. |
dtype | This parameter is also optional and represents the data type of the series. It will automatically infer the type if it’s not provided. |
name | This parameter is also optional and represents the name for the series. It is “None” by default. |
copy | This parameter is optional and specifies whether or not to copy the data. It is “False” by default. |
Note: The fastpath parameter is for internal use only.
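As a minimal sketch of how the parameters in the table fit together, the snippet below passes data, index, dtype, and name in a single call. The temperature readings and the variable name temperatures are made up for illustration.

```python
import pandas as pd

# Hypothetical data: three temperature readings labeled by day.
temperatures = pd.Series(
    data=[21.5, 23.0, 19.8],       # values to store
    index=["mon", "tue", "wed"],   # one label per value
    dtype="float64",               # explicit data type instead of inference
    name="temperature_c",          # name attached to the series
)

print(temperatures)
print(temperatures.name)   # temperature_c
print(temperatures.dtype)  # float64
```

Leaving out index, dtype, or name falls back to the defaults described in the table.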
The following diagram represents a pandas series with the labels and values listed below.
labels = ['a', 'b', 'c', 'd']
values = ['maroon', 'navy', 'gray', 'teal']
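The labels and values listed here map directly onto a constructor call. A minimal sketch, where the variable name color_series is chosen only for illustration:

```python
import pandas as pd

labels = ['a', 'b', 'c', 'd']
values = ['maroon', 'navy', 'gray', 'teal']

# Each value is paired with the label at the same position.
color_series = pd.Series(values, index=labels)
print(color_series)

# An element can then be looked up by its label.
print(color_series['b'])  # navy
```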
A Pandas series can be created from a Python list.
import pandas as pd

list_of_numbers = [4, 10, 15, 6, 21, 19]
series_list = pd.Series(list_of_numbers)

print("Series made using a list: \n")
print(series_list)
Another way to create a Pandas series is by using a NumPy array.
import pandas as pd
import numpy as np

array_of_colors = np.array(['black', 'maroon', 'white', 'navy'])
series_numpy = pd.Series(array_of_colors)

print("Series made using an array: \n")
print(series_numpy)
You can also create a Pandas series by using a dictionary.
import pandas as pd

# The dictionary keys become the index labels of the series.
dict_of_cars = {"1": "Bugatti Veyron", "2": "Aston Martin", "3": "Ferrari"}
series_dict = pd.Series(dict_of_cars)

print("Series made using a dictionary: \n")
print(series_dict)
We can also create a Pandas series by using a scalar value.
import pandas as pd

# The scalar value is repeated once for every label in the index.
series_scalar = pd.Series(5, index=['a', 'b', 'c', 'd'])

print("Series made using a scalar: \n")
print(series_scalar)
We can create an empty Pandas series by not providing any data. Additions can be made afterwards too.
import pandas as pd

# Passing dtype explicitly avoids relying on the default dtype of an empty series.
empty_series = pd.Series(dtype="object")
print(empty_series)
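One way to add elements to an empty series afterwards is to assign to labels that do not exist yet. The sketch below assumes a float64 series and uses made-up labels; it is not the only approach.

```python
import pandas as pd

# Start with an empty series; the dtype is given explicitly.
growing_series = pd.Series(dtype="float64")

# Assigning to a label that does not exist yet appends a new element.
growing_series["first"] = 1.5
growing_series["second"] = 2.5

print(growing_series)
print(len(growing_series))  # 2
```

For adding many elements at once, building a list or dictionary first and creating the series from it in one call is usually the simpler choice.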
In a series, labels are crucial for both identifying and accessing data. They give each element a human-readable name so that we can easily refer to specific elements when needed.
If we don’t specify an index during creation, Pandas assigns default integer labels starting from 0.
import pandas as pd

# No index is given, so the labels default to 0, 1, 2, 3.
default_labels_series = pd.Series([1, 2, 3, 4])
print(default_labels_series)
We can assign custom labels to our series either during creation or afterwards.
import pandas as pd

custom_labels_series = pd.Series([10, 20, 30, 40], index=['label_a', 'label_b', 'label_c', 'label_d'])
print(custom_labels_series)
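Labels can also be assigned after creation. The sketch below shows two options: replacing the whole index through the index attribute, and changing selected labels with rename(). The variable names scores and renamed are illustrative.

```python
import pandas as pd

scores = pd.Series([10, 20, 30, 40])

# Replace the default 0..3 labels by assigning a new list to the index attribute.
scores.index = ['label_a', 'label_b', 'label_c', 'label_d']
print(scores)

# rename() returns a new series with the selected labels changed.
renamed = scores.rename({'label_a': 'first'})
print(renamed.index.tolist())  # ['first', 'label_b', 'label_c', 'label_d']
```

The new index must have the same length as the series, otherwise pandas raises an error.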
For any data, locating elements is almost always crucial. Indexing in series involves selecting particular elements based on labels or positions. A few handy techniques and methods for series indexing are mentioned below.
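A single element can be accessed by using the [] operator and specifying its label inside it. With the default integer index, the label coincides with the position. Recent pandas versions deprecate using [] with an integer position on a series whose labels are not integers, so iloc is the explicit way to select by position.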
import pandas as pd

default_labels_series = pd.Series([1, 2, 3, 4])
print(default_labels_series)

# With the default integer index, label 1 is also position 1.
single_element = default_labels_series[1]
print(single_element)
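Using series[start:stop] with integer positions, we can obtain the slice of elements from start up to, but not including, stop. When start and stop are labels instead (for example, with loc), the element at stop is included as well.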
import pandas as pd

default_labels_series = pd.Series([1, 2, 3, 4])
print(default_labels_series)

# Positions 1 and 2 are returned; position 3 is excluded.
sliced_elements = default_labels_series[1:3]
print(sliced_elements)
Note: The index position starts from 0.
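The difference between slicing by position and slicing by label is easy to miss, so here is a small sketch that uses the same endpoints both ways. The series letters and its labels are made up for illustration.

```python
import pandas as pd

letters = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])

# Positional slice: starts at position 1 and stops before position 3.
by_position = letters.iloc[1:3]
print(by_position.tolist())  # [2, 3]

# Label slice: starts at 'b' (position 1) and includes 'd' (position 3).
by_label = letters.loc['b':'d']
print(by_label.tolist())  # [2, 3, 4]
```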
In Pandas, loc and iloc are two indexers, exposed as attributes, that are used for selecting data from a dataframe or series. loc is used for label-based indexing, while iloc is used for positional indexing.
loc: Label-based indexing, which means we can use it to access a group of rows and columns by labels or a boolean array.
iloc: Integer-location based indexing, where we specify the location by integer positions.
For series, the usage is relatively simple since the data is only one-dimensional. Let’s go through the code examples.
import pandas as pd

default_labels_series = pd.Series([1, 2, 3, 4])
print(default_labels_series)

# With the default index, label 1 and position 1 refer to the same element.
loc_example = default_labels_series.loc[1]
iloc_example = default_labels_series.iloc[1]

print(loc_example)
print(iloc_example)
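With the default index, loc and iloc return the same element, which hides the difference between them. The following sketch uses integer labels that deliberately do not match the positions; the series reversed_labels is a made-up example.

```python
import pandas as pd

# Labels run 3, 2, 1, 0 while positions run 0, 1, 2, 3.
reversed_labels = pd.Series([1, 2, 3, 4], index=[3, 2, 1, 0])

print(reversed_labels.loc[1])   # element with label 1 -> 3
print(reversed_labels.iloc[1])  # element at position 1 -> 2

# loc also accepts a boolean array for filtering.
print(reversed_labels.loc[reversed_labels > 2].tolist())  # [3, 4]
```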
Pandas series come with various attributes that can provide us with information about the series. Let’s go through some of them along with code examples:
The at attribute returns the single value stored under a given label.
The dtype attribute returns the data type of the series data.
The is_unique attribute returns True if the elements of the series are unique.
The shape attribute returns a tuple containing the shape of the series data. For a series, this is a one-element tuple holding its length.
The size attribute returns the total number of elements in the series data.
The values attribute returns the elements of the series as an ndarray object.
import pandas as pd

custom_labels_series = pd.Series([10, 20, 30, 40], index=['label_a', 'label_b', 'label_c', 'label_d'])
print(custom_labels_series)

# The at attribute
at_attribute = custom_labels_series.at['label_a']
# The dtype attribute
dtype_attribute = custom_labels_series.dtype
# The is_unique attribute
is_unique_attribute = custom_labels_series.is_unique
# The shape attribute
shape_attribute = custom_labels_series.shape
# The size attribute
size_attribute = custom_labels_series.size
# The values attribute
values_attribute = custom_labels_series.values

print("At attribute: ", at_attribute)
print("Dtype attribute: ", dtype_attribute)
print("Is unique attribute: ", is_unique_attribute)
print("Shape attribute: ", shape_attribute)
print("Size attribute: ", size_attribute)
print("Values attribute: ", values_attribute)
Note: iloc and loc are also series attributes.
The data stored in series can be manipulated in different ways using built-in methods offered by Pandas for series. We’ve already seen how to create and access elements in a series, so now let’s cover some other highly useful series methods below.
series.head(n): Return the first n elements.
series.tail(n): Return the last n elements.
series + value: Add a constant to each element.
series - value: Subtract a constant from each element.
series * value: Multiply each element by a constant.
series / value: Divide each element by a constant.
series1 + series2: Element-wise addition of two series.
series.sum(): Sum of all elements.
series.mean(): Mean of all elements.
series.median(): Median of all elements.
series.min(), series.max(): Minimum and maximum values.
series.std(), series.var(): Standard deviation and variance.
series.dropna(): Remove missing NaN values.
series.fillna(value): Replace missing values with a specified value.
series.sort_values(): Sort elements by the values.
series.sort_index(): Sort elements by the index.
series[condition]: Filter elements based on a certain condition.
series.unique(): Return unique values.
series.value_counts(): Count occurrences of each unique value.
series.describe(): Generate descriptive statistics.
series.info(): Display information about the series.
The complete list of attributes and methods is extensive and can be found in the official documentation for pandas.Series. Let’s learn more about implementing the discussed methods through Python code.
import pandas as pd

custom_labels_series = pd.Series([10, 20, 30, 40], index=['label_a', 'label_b', 'label_c', 'label_d'])
print(custom_labels_series)

# Obtaining first or last n elements
head_elements = custom_labels_series.head(2)
tail_elements = custom_labels_series.tail(2)

# Element-wise mathematical operations
addition = custom_labels_series + 5
multiplication = custom_labels_series * 2

# Descriptive statistics
sum_elements = custom_labels_series.sum()
mean_element = custom_labels_series.mean()
median_element = custom_labels_series.median()
min_element = custom_labels_series.min()
max_element = custom_labels_series.max()
std_dev = custom_labels_series.std()
variance = custom_labels_series.var()

# Handling missing values
dropna_series = custom_labels_series.dropna()
fillna_series = custom_labels_series.fillna(0)

# Sorting Series
sorted_values = custom_labels_series.sort_values()
sorted_index = custom_labels_series.sort_index()

# Boolean indexing
condition_filtered = custom_labels_series[custom_labels_series > 15]

# Obtaining unique values
unique_values = custom_labels_series.unique()
value_counts = custom_labels_series.value_counts()

# Informational methods
description = custom_labels_series.describe()

print("Head elements: \n", head_elements)
print("Tail elements: \n", tail_elements)
print("Addition: \n", addition)
print("Multiplication: \n", multiplication)
print("Sum: \n", sum_elements)
print("Mean: \n", mean_element)
print("Median: \n", median_element)
print("Min element: \n", min_element)
print("Max element: \n", max_element)
print("Standard Deviation: \n", std_dev)
print("Variance: \n", variance)
print("Dropna: \n", dropna_series)
print("Fillna: \n", fillna_series)
print("Sorted values: \n", sorted_values)
print("Sorted index: \n", sorted_index)
print("Boolean index: \n", condition_filtered)
print("Unique values: \n", unique_values)
print("Value counts: \n", value_counts)
print("Description: \n", description)

# info() prints its summary directly and returns None, so it is called on its own.
print("Info: ")
custom_labels_series.info()
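The series used above contains no missing or repeated values, so dropna(), fillna(), and value_counts() have little visible effect there. The sketch below uses a made-up series named readings that does contain them, and also adds two series whose labels only partly overlap.

```python
import numpy as np
import pandas as pd

readings = pd.Series([1.0, np.nan, 3.0, 3.0], index=['a', 'b', 'c', 'd'])

# Missing-value handling
print(readings.dropna().tolist())   # [1.0, 3.0, 3.0]
print(readings.fillna(0).tolist())  # [1.0, 0.0, 3.0, 3.0]

# value_counts() ignores NaN by default, so only 1.0 and 3.0 are counted.
print(readings.value_counts())

# Addition aligns on labels; a label missing from either series yields NaN.
other = pd.Series([10.0, 20.0], index=['a', 'e'])
total = readings + other
print(total)
```

Only label 'a' holds a number in both series, so it is the only label with a numeric result in total.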
Understanding the difference between a series and a dataframe is crucial for efficient data manipulation. While a series is one-dimensional, a dataframe is a two-dimensional labeled data structure. There are many occurrences where we would need to convert one into the other.
We can convert a series to a dataframe using the to_frame() method. Such a conversion may be necessary for more complex operations or for adding multidimensional data.
import pandas as pd

custom_labels_series = pd.Series([10, 20, 30, 40], index=['label_a', 'label_b', 'label_c', 'label_d'])
print(custom_labels_series)

# Converting a series to a dataframe
series_to_dataframe = custom_labels_series.to_frame()
print(series_to_dataframe)
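The conversion also works in the other direction. As a minimal sketch, the snippet below names the column while converting to a dataframe and then selects that column to get a series back. The names prices and 'price' are hypothetical.

```python
import pandas as pd

prices = pd.Series([10, 20, 30], index=['x', 'y', 'z'])

# name sets the column label of the resulting single-column dataframe.
prices_frame = prices.to_frame(name='price')
print(prices_frame.shape)  # (3, 1)

# Selecting the column by its label returns a series again.
back_to_series = prices_frame['price']
print(type(back_to_series).__name__)  # Series
```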
Let’s briefly discuss the benefits of using the Pandas series.
Easy data handling.
Labeled data with an index.
Automatic data alignment.
Missing data handling (NaN values).
Built-in statistical and mathematical functions.
Integration with NumPy for numerical operations.
Specialized support for time series data.
Versatility in input formats (lists, dictionaries, etc.).
Congratulations, we’ve covered the most important concepts behind Pandas series. By practicing the code examples yourself, you’ll be well on your way to mastering Pandas series. Happy coding!