Scikit-learn is a popular machine learning library for Python. It implements many of the fundamental algorithms used in supervised and unsupervised learning.
To use scikit-learn, we need to import the library under its package name, sklearn, as shown below.
import sklearn
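A quick way to confirm that the import worked is to print the installed version. This is only a minimal check; the version string you see depends on your own environment.

```python
import sklearn

# __version__ holds the installed scikit-learn version as a string
print(sklearn.__version__)
```

Knowing the version is handy later on, because some parameters only exist in newer releases.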
The Iris dataset is one of the most popular datasets in data science. It is considered the ‘Hello World’ of machine learning and can be used to learn classification algorithms.
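The Iris dataset contains 150 samples from 3 species of Iris flowers (setosa, versicolor, and virginica). Each sample records four characteristics (sepal length, sepal width, petal length, and petal width) together with its species label.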
The ‘scikit-learn’ package already comes with the Iris dataset preloaded.
Use the following statement to import the datasets module from sklearn. This gives us access to other datasets as well.
# this imports the 'datasets' module from sklearn
from sklearn import datasets
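To illustrate the "other datasets" mentioned above, here is a short sketch that loads a few of the other small datasets bundled with scikit-learn. The choice of load_wine, load_digits, and load_breast_cancer is ours, not something the article prescribes.

```python
from sklearn import datasets

# Each loader returns (features, target) when return_X_y=True
wine_X, wine_y = datasets.load_wine(return_X_y=True)
digits_X, digits_y = datasets.load_digits(return_X_y=True)
cancer_X, cancer_y = datasets.load_breast_cancer(return_X_y=True)

# Print the (rows, columns) shape of each feature array
print("wine:", wine_X.shape)
print("digits:", digits_X.shape)
print("breast cancer:", cancer_X.shape)
```

All of these loaders follow the same calling pattern as the Iris loader used in the rest of this article.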
To load the iris data as NumPy arrays, set the return_X_y parameter to True.
from sklearn import datasets
# loads the dataset as NumPy arrays: features (iris_X) and target (iris_y)
iris_X, iris_y = datasets.load_iris(return_X_y=True)
# to view the iris_X array
print(iris_X)
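Printing the whole array is not always the easiest way to see what was loaded. As a small sketch, the shapes and class counts can be checked like this (the use of np.unique here is our addition):

```python
import numpy as np
from sklearn import datasets

iris_X, iris_y = datasets.load_iris(return_X_y=True)

# 150 samples, each with 4 measurements
print(iris_X.shape)  # (150, 4)
print(iris_y.shape)  # (150,)

# The target encodes the three species as the integers 0, 1 and 2
classes, counts = np.unique(iris_y, return_counts=True)
print(classes)  # [0 1 2]
print(counts)   # [50 50 50]
```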
To load the features (X) as a pandas DataFrame and the target (y) as a pandas Series, also set the as_frame parameter to True.
from sklearn import datasets
# the X, y data is returned as a DataFrame and a Series respectively
iris_X, iris_y = datasets.load_iris(return_X_y=True, as_frame=True)
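Once the data is in pandas form, the usual DataFrame and Series tools apply. The following sketch assumes pandas is installed and a scikit-learn version that supports as_frame; the inspection calls are our own choice.

```python
from sklearn import datasets

iris_X, iris_y = datasets.load_iris(return_X_y=True, as_frame=True)

# Confirm the returned types
print(type(iris_X).__name__)  # DataFrame
print(type(iris_y).__name__)  # Series

# The column names are the four measurements
print(list(iris_X.columns))

# First five rows of the features, and the number of samples per class
print(iris_X.head())
print(iris_y.value_counts())
```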
The as_frame parameter is not available in sklearn version 0.22 and older, so if you run into an error (such as a TypeError reporting an unexpected keyword argument 'as_frame'), you can upgrade your sklearn library using this code:
!pip install scikit-learn==0.24
in your Jupyter notebook, or
pip install --upgrade scikit-learn
in your terminal.
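If upgrading is not an option, one possible workaround is to build the DataFrame and Series by hand. The sketch below is our own suggestion rather than part of the original steps: it tries as_frame first and falls back to constructing the pandas objects from the arrays when the parameter is not recognized.

```python
import pandas as pd
from sklearn import datasets

try:
    # Works on scikit-learn versions that support as_frame
    iris_X, iris_y = datasets.load_iris(return_X_y=True, as_frame=True)
except TypeError:
    # Older versions reject the as_frame keyword, so convert manually
    iris = datasets.load_iris()
    iris_X = pd.DataFrame(iris.data, columns=iris.feature_names)
    iris_y = pd.Series(iris.target, name="target")

print(iris_X.shape)
print(iris_y.shape)
```

Either branch leaves you with a DataFrame of features and a Series of labels, so the rest of your code does not need to care which version is installed.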