pandas is a popular Python-based data analysis toolkit that can be imported using:
import pandas as pd
It offers a diverse range of utilities, from parsing multiple file formats to converting an entire data table into a NumPy array. This versatility makes pandas a trusted ally in data science and machine learning.
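As a minimal sketch of the two ends of that range, the snippet below parses CSV text into a DataFrame and then converts the table to a NumPy array. The CSV content is made up for illustration and is not part of the original example.

```python
import io

import pandas as pd

# Hypothetical CSV content, used only for illustration.
csv_text = "length,width\n5,9\n8,3\n9,5\n"

# read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

# to_numpy() returns the whole table as a 2-D NumPy array.
arr = df.to_numpy()
print(arr.shape)
```

In practice the StringIO object would usually be replaced by a path to a file on disk.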
pandas can also create several types of data analysis graphs. One such example is the scatter plot.
A scatter plot is used to compare large numbers of data points without regard to time. It is a very powerful chart type that can be deployed to show the relationship between two variables (e.g., the height and weight of a person).
The signature of the scatter plot method is:
DataFrame.plot.scatter(x, y, s=None, c=None, **kwargs)
x : int or string - The column name or position to be used as the horizontal coordinate for each point.
y : int or string - The column name or position to be used as the vertical coordinate for each point.
s : str, scalar, or array-like - The size of each point, given as a column name, a single scalar applied to all points, or a sequence of scalars.
c : str, array-like, int - The color of each point. Possible values are:
- A single color string, given by name or as an RGB or RGBA code, used for all points.
- An array of color strings, given by name or as RGB or RGBA codes, used for the points recursively.
- A column name or position whose values are mapped to colors according to a colormap.
**kwargs : Keyword arguments to pass on to DataFrame.plot().
# import library
import pandas as pd

# build a DataFrame from a dictionary of columns
df = pd.DataFrame({'length': [5, 8, 9, 8, 2, 3, 9, 9, 2, 3], 'width': [9, 3, 5, 2, 1, 3, 4, 5, 7, 5]})

# create scatter plot with every point drawn in red
scatter = df.plot.scatter(x='length', y='width', c='red')
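The s and c parameters can also take per-point values rather than a single size or color. The sketch below reuses the same data and adds an 'area' column, which is an assumption introduced here purely for illustration. It passes an array of sizes to s and a column name to c so that the points are colored through a colormap. The output file name is likewise hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display

import pandas as pd

df = pd.DataFrame({
    'length': [5, 8, 9, 8, 2, 3, 9, 9, 2, 3],
    'width': [9, 3, 5, 2, 1, 3, 4, 5, 7, 5],
})

# Hypothetical derived column, not part of the original example.
df['area'] = df['length'] * df['width']

ax = df.plot.scatter(
    x='length',
    y='width',
    s=df['area'] * 5,     # array-like: one marker size per point
    c='area',             # column name: values mapped to colors via the colormap
    colormap='viridis',   # keyword argument forwarded to DataFrame.plot()
)

# Save the figure to a file (hypothetical file name).
ax.figure.savefig('scatter.png')
```

In a notebook, the matplotlib.use("Agg") line and the savefig call can be dropped, since the plot is rendered inline.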