Importing Data with Scikit-Learn
Learn how to import a dataset to use with scikit-learn.
There are three main ways to obtain data when using scikit-learn:
Using the toy datasets that come with it.
Generating synthetic data.
Importing data from external sources, such as CSV files.
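As a rough illustration of these three routes, the sketch below loads a toy dataset, generates a small synthetic one, and parses CSV data. The sample sizes, the CSV contents, and the column names are made up for this example. An in-memory string stands in for a real file path, and NumPy's loadtxt() is used here only to keep the sketch dependency-light; pandas.read_csv() is a common alternative.

```python
import io

import numpy as np
from sklearn.datasets import load_iris, make_classification

# 1. Toy dataset bundled with scikit-learn
X_toy, y_toy = load_iris(return_X_y=True)

# 2. Synthetic data (sizes are arbitrary choices for illustration)
X_syn, y_syn = make_classification(n_samples=100, n_features=4, random_state=0)

# 3. External data: a tiny in-memory CSV stands in for a real file path
csv_text = "feature_a,feature_b,label\n1.0,2.0,0\n3.0,4.0,1\n"
data = np.loadtxt(io.StringIO(csv_text), delimiter=",", skiprows=1)
X_ext = data[:, :2]              # first two columns are the features
y_ext = data[:, 2].astype(int)   # last column is the label

print(X_toy.shape, X_syn.shape, X_ext.shape)
```

In all three cases we end up with a feature matrix and a label vector, which is the form scikit-learn estimators expect.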
Loading toy datasets from scikit-learn
The scikit-learn library provides several toy datasets that we can use for experimenting with ML algorithms. One of the most commonly used is the iris dataset, which contains measurements of iris flowers (sepal length and width, petal length and width) along with each flower's species. It is a classic toy dataset, often used in tutorials because its data is relatively clean and it suits multiclass classification tasks.
The following code demonstrates how to load the iris dataset into our Python environment and print its first five rows:
from sklearn.datasets import load_iris

# Load the iris dataset
iris = load_iris()

# Print the first 5 rows of the dataset
print(iris.data[:5])
Line 1: We import load_iris() from scikit-learn.
Line 4: We load the iris dataset.
In the output above, we can see the first five rows of the dataset. Each row lists the values of the four features in order: sepal length, sepal width, petal length, and petal width.
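The object returned by load_iris() carries more than the raw feature values. As a minimal sketch, the snippet below inspects the feature names, the species names, and the shape of the data, then pairs the first five rows with their species. The loop at the end is just one possible way to display the rows alongside their labels.

```python
from sklearn.datasets import load_iris

iris = load_iris()

# Column names for the four features, in the same order as iris.data
print(iris.feature_names)

# Species names; iris.target stores them as integer codes 0, 1, 2
print(iris.target_names)

# Number of samples and number of features per sample
print(iris.data.shape)

# Pair the first five rows with their species labels
for row, label in zip(iris.data[:5], iris.target[:5]):
    print(row, iris.target_names[label])
```

The integer codes in iris.target index into iris.target_names, which is why the species can be looked up with iris.target_names[label].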