Load built-in datasets
In this lesson, we'll see how to load the built-in datasets and create data based on certain distributions.
This lesson focuses on datasets. A dataset is an essential part of any machine learning project because it is the project's starting point.
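The scikit-learn library ships with many built-in datasets, some of them well known and widely used. For example, it includes the iris dataset and a small handwritten digits dataset (similar in spirit to MNIST) for classification, and the diabetes dataset for regression. The Boston house price dataset was also bundled in older releases, but it was removed in scikit-learn 1.2. In addition to these predefined datasets, scikit-learn provides functions that generate synthetic data following certain distributions.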
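The loaders for the bundled datasets need no download. The sketch below loads iris, digits, and diabetes (diabetes covers the regression case) and prints the shape of each feature matrix. The loaders return a dict-like `Bunch` by default. With `return_X_y=True`, they return the feature matrix and target array directly.

```python
# Load a few of scikit-learn's bundled datasets (shipped with the library, no download).
from sklearn.datasets import load_iris, load_digits, load_diabetes

# By default each loader returns a Bunch: a dict-like object with attribute access.
iris = load_iris()
print(iris.data.shape)    # (150, 4): 150 samples, 4 features
print(iris.target.shape)  # (150,): one class label (0, 1, or 2) per sample
print(iris.target_names)  # names of the three iris species

# return_X_y=True skips the Bunch and returns (features, target) directly.
X_digits, y_digits = load_digits(return_X_y=True)
print(X_digits.shape)     # (1797, 64): 8x8 images flattened to 64 pixel values

# A regression dataset: the target is a continuous value, not a class label.
X_diab, y_diab = load_diabetes(return_X_y=True)
print(X_diab.shape)       # (442, 10)
```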
Scikit-learn also provides functions that download real-world datasets from the internet, such as the 20 newsgroups (`fetch_20newsgroups`), LFW, Labeled Faces in the Wild (`fetch_lfw_people`), and KDD Cup '99 (`fetch_kddcup99`) datasets.
All of these datasets and functions live in the `sklearn.datasets` module. Import this module at the beginning of your Python file as follows:
import sklearn.datasets as datasets
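With the module imported under the alias `datasets`, the data-generating functions are available as attributes of it. The sketch below uses three of them: `make_blobs` for Gaussian clusters, `make_classification` for a synthetic classification problem, and `make_regression` for a noisy linear regression problem. The sample counts, feature counts, noise level, and `random_state` values are arbitrary choices made for illustration.

```python
import sklearn.datasets as datasets

# Three isotropic Gaussian clusters in 2-D, handy for clustering demos.
X_blobs, y_blobs = datasets.make_blobs(
    n_samples=300, centers=3, n_features=2, random_state=0
)

# A synthetic binary classification problem: 10 features, 5 of them informative.
X_clf, y_clf = datasets.make_classification(
    n_samples=200, n_features=10, n_informative=5, n_classes=2, random_state=0
)

# A linear regression problem with Gaussian noise (std 10.0) added to the target.
X_reg, y_reg = datasets.make_regression(
    n_samples=100, n_features=3, noise=10.0, random_state=0
)

print(X_blobs.shape, X_clf.shape, X_reg.shape)  # (300, 2) (200, 10) (100, 3)
```

Fixing `random_state` makes each call return the same data on every run, which keeps experiments reproducible.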