Initializing the PyCaret environment

After we complete the EDA, we need to initialize the PyCaret environment. We can do this with the setup() function, which prepares the data for model training. This function has numerous parameters and lets us build a complete data preprocessing pipeline. To keep things simple, however, we'll only examine its most important functionality. Please check the PyCaret Classification module documentation page for more details.

After we run setup(), we get a table of useful information about its settings and parameters. Let’s see it in detail below.
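As a rough illustration, a call that would produce settings like these might look like the sketch below. It assumes a pandas DataFrame holding the Iris data with a Species column; the helper name init_environment is hypothetical, and reusing the session_id value from the table as a fixed seed is an assumption made for reproducibility, not something the lesson specifies.

```python
def init_environment(data, seed=3934):
    """Initialize PyCaret for the Iris classification task (illustrative sketch)."""
    # Imported inside the function so the sketch can be read and loaded
    # even where PyCaret is not installed.
    from pycaret.classification import setup

    return setup(
        data=data,
        target="Species",  # target column, as listed in the settings table
        train_size=0.8,    # 80% of the rows go to the training set
        session_id=seed,   # assumed fixed seed so the split is reproducible
    )
```

Calling init_environment(df) on the loaded DataFrame should display a settings table similar to the one in this lesson, although the exact rows depend on the installed PyCaret version.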

Identifying numeric and categorical features

PyCaret automatically infers whether each feature is numeric or categorical. If the inference is wrong, we can override it in the setup() function with the categorical_features and numeric_features parameters. In this case, all four features are correctly identified as numeric, so no changes are needed.
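If an override were needed, it could be passed roughly as in the following sketch. The helper name init_with_numeric_features is hypothetical, and treating every non-target column as numeric is an assumption that happens to fit the Iris data; for other datasets the lists would have to be written out by hand.

```python
def init_with_numeric_features(data, target="Species"):
    """Call setup() with every non-target column declared numeric (sketch)."""
    # Imported inside the function so the sketch loads without PyCaret installed.
    from pycaret.classification import setup

    # Assumption: every column except the target is a numeric measurement.
    numeric_cols = [col for col in data.columns if col != target]

    return setup(
        data=data,
        target=target,
        numeric_features=numeric_cols,  # overrides the automatic inference
        # categorical_features=[...] would be passed the same way if needed
    )
```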

Train/Test split

Splitting a dataset into train and test subsets is standard practice in machine learning. We set the train_size parameter to $0.8$. This means that we'll train the machine learning model on $80$% of the original ...

	Description	Value
0	session_id	3934
1	Target	Species
2	Target Type	Multiclass
3	Label Encoded	Iris-setosa:0,Iris-versicolor:1,Iris-virginica:2
4	Original Data	(150,5)
5	Missing Values	False
6	Numeric Features	4
7	Categorical Features	0
8	Ordinal Features	False
9	High Cardinality Features	False
10	High Cardinality Method	None
11	Transformed Train Set	(120,4)
12	Transformed Test Set	(30,4)
13	Shuffle Train-Test	True
14	Stratify Train-Test	False
15	Fold Generator	StratifiedKFold
16	Fold Number	10
17	CPU Jobs	-1
18	Use GPU	False
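
The Transformed Train Set and Transformed Test Set rows follow directly from the train_size setting and the 150 rows of the original data. The arithmetic can be checked with a small sketch; the helper name split_sizes is hypothetical, and rounding to the nearest integer is an assumption, since the library may round differently when the split does not divide evenly.

```python
def split_sizes(n_rows, train_size):
    """Return (train_rows, test_rows) for a fractional train_size."""
    n_train = round(n_rows * train_size)  # assumed rounding rule
    return n_train, n_rows - n_train


n_train, n_test = split_sizes(150, 0.8)
print(n_train, n_test)  # 120 30
```

The column count drops from 5 in the original data to 4 in the transformed sets because the Species target is separated from the four features.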
