Getting Familiar with the PyCaret Environment

Let’s familiarize ourselves with the PyCaret environment setup in the regression task.

We'll cover the following...

Initializing the PyCaret environment
Identifying numeric and categorical features
Train/Test split
Normalization of numeric features
Transformation of numeric features and target
One-hot encoding categorical features
Printing the preprocessed features
Comparing regression models

Now we’ll examine the preprocessing pipeline applied to the dataset.

Identifying numeric and categorical features

PyCaret automatically infers whether each feature is numeric or categorical. We can also override that inference in the setup() function by listing features explicitly in the categorical_features and numeric_features parameters, as we did in this case.
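As a minimal sketch of what such a call might look like, the snippet below collects the setup() arguments in one place. The column names are assumptions for illustration only: the lesson states that the target is charges and that there are three numeric and three categorical features, but it does not list them. The helper init_environment is likewise hypothetical.

```python
# Column names are assumed for illustration; the lesson only states that the
# target is "charges" and that there are 3 numeric and 3 categorical features.
NUMERIC_FEATURES = ["age", "bmi", "children"]
CATEGORICAL_FEATURES = ["sex", "smoker", "region"]

# Keyword arguments passed to PyCaret's regression setup().
SETUP_KWARGS = {
    "target": "charges",
    "train_size": 0.8,
    "numeric_features": NUMERIC_FEATURES,
    "categorical_features": CATEGORICAL_FEATURES,
}


def init_environment(data):
    """Run PyCaret's regression setup() on a pandas DataFrame (hypothetical helper)."""
    # Imported lazily so the configuration above can be inspected
    # even where PyCaret is not installed.
    from pycaret.regression import setup

    return setup(data=data, **SETUP_KWARGS)
```

Listing the features explicitly is mostly useful when the automatic inference guesses wrong, for example when a small integer column should be treated as a category.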

Train/Test split

Splitting a dataset into train and test subsets is standard practice in machine learning because model performance must be evaluated on data that the model has not seen before. In this case, we set the train_size parameter to $0.80$. This means that the machine learning model will be trained on $80$% of the original data, while the remaining $20$% will be used for testing purposes.
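To make the arithmetic concrete, here is a small pure-Python sketch of a shuffled split. It is not PyCaret's internal implementation; the function name train_test_split_indices is hypothetical, and the seed simply reuses the session_id shown in the setup output. With 1338 rows and a train size of 0.80, it yields 1070 training rows and 268 test rows, which matches the row counts reported for the transformed train and test sets.

```python
import random


def train_test_split_indices(n_rows, train_size=0.8, seed=7402):
    """Shuffle row indices and split them into train and test subsets.

    Illustrative only: this is not how PyCaret performs the split internally.
    """
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # seeded so the split is reproducible
    n_train = int(n_rows * train_size)    # round down to a whole number of rows
    return indices[:n_train], indices[n_train:]


train_idx, test_idx = train_test_split_indices(1338)
print(len(train_idx), len(test_idx))  # 1070 268
```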

Normalization of numeric features

Some regression models require numeric features to be normalized by having mean $\mu = 0$ ...

	Description	Value
0	session_id	7402
1	Target	charges
2	Original Data	(1338,7)
3	Missing Values	False
4	Numeric Features	3
5	Categorical Features	3
6	Ordinal Features	False
7	High Cardinality Features	False
8	High Cardinality Method	None
9	Transformed Train Set	(1070,9)
10	Transformed Test Set	(268,9)
11	Shuffle Train-Test	True
12	Stratify Train-Test	False
13	Fold Generator	KFold
