
Getting Familiar with the PyCaret Environment


Let’s familiarize ourselves with the PyCaret environment setup in the regression task.

Initializing the PyCaret environment

After the EDA part of the project is complete, the next step is to initialize the PyCaret environment. We do this with the setup() function, which prepares the data for model training and deployment. The function has many parameters, but we’ll focus only on the most important ones. To delve deeper, refer to the documentation page of the PyCaret Regression module.

Running the setup() function prints a table of its parameters and settings, as shown below.

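# PyCaret 2.x regression module; `data`, `numeric`, and `categorical` come from the EDA step.
from pycaret.regression import setup

# Set up the PyCaret environment: these setup() parameters prepare the data for model training and deployment.
# numeric[:-1] drops the last numeric column, assumed here to be the target 'charges'.
reg = setup(data=data, target='charges', train_size=0.8, session_id=7402,
            numeric_features=numeric[:-1], categorical_features=categorical,
            transformation=True, silent=True, normalize=True, transform_target=True)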

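Note: Setting the silent parameter to True skips the interactive prompt that asks you to confirm the inferred data types, so code execution is not paused. This parameter exists in PyCaret 2.x; PyCaret 3.x removed it because setup() no longer prompts.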

PyCaret environment setup

|    | Description               | Value       |
|----|---------------------------|-------------|
| 0  | session_id                | 7402        |
| 1  | Target                    | charges     |
| 2  | Original Data             | (1338, 7)   |
| 3  | Missing Values            | False       |
| 4  | Numeric Features          | 3           |
| 5  | Categorical Features      | 3           |
| 6  | Ordinal Features          | False       |
| 7  | High Cardinality Features | False       |
| 8  | High Cardinality Method   | None        |
| 9  | Transformed Train Set     | (1070, 9)   |
| 10 | Transformed Test Set      | (268, 9)    |
| 11 | Shuffle Train-Test        | True        |
| 12 | Stratify Train-Test       | False       |
| 13 | Fold Generator            | KFold       |
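The train and test row counts in the table follow from the dataset's 1,338 rows and train_size=0.8: 0.8 × 1338 = 1070.4, so rounding the train count down gives 1,070 rows and the remaining 268 rows go to the test set. The sketch below reproduces that arithmetic and a seeded shuffle in plain Python. It assumes a scikit-learn-style rounding convention (train count floored, remainder to test); split_sizes and shuffled_split are hypothetical helpers, not PyCaret functions, and although the fixed seed plays the role of session_id, it will not reproduce the exact rows PyCaret selects.

```python
import math
import random

def split_sizes(n_samples, train_size):
    """Hypothetical helper: train rows = floor(train_size * n), test = the rest
    (assumed scikit-learn-style rounding)."""
    n_train = math.floor(train_size * n_samples)
    return n_train, n_samples - n_train

def shuffled_split(rows, train_size, seed):
    """Hypothetical helper: shuffle row indices with a fixed seed, then cut."""
    n_train, _ = split_sizes(len(rows), train_size)
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)  # same seed -> same shuffle every run
    train = [rows[i] for i in indices[:n_train]]
    test = [rows[i] for i in indices[n_train:]]
    return train, test

print(split_sizes(1338, 0.8))  # (1070, 268), matching the table's row counts
train, test = shuffled_split(list(range(1338)), 0.8, seed=7402)
print(len(train), len(test))  # 1070 268
```

Because the shuffle is seeded, calling shuffled_split again with the same seed returns the same partition. A fixed session_id gives you the same reproducibility.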

Now we’ll examine the preprocessing pipeline applied to the dataset.

Identifying numeric and categorical features

PyCaret can automatically infer whether each feature is numeric or categorical. We can also state this explicitly in the setup() function through the numeric_features and categorical_features parameters, as we did here.
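To make the idea of inference with explicit overrides concrete, here is a simplified sketch. infer_feature_types is a hypothetical helper, not part of PyCaret. Its rule (a column whose values are all numbers is numeric, anything else is categorical) is far cruder than PyCaret's own type inference. The column names are assumptions and the row values are made up purely for illustration.

```python
def infer_feature_types(rows, target, numeric_features=(), categorical_features=()):
    """Hypothetical helper (not PyCaret's API): split columns into numeric and
    categorical, letting explicit lists override a crude inference rule."""
    numeric, categorical = [], []
    for col in rows[0]:
        if col == target:
            continue  # the target is not a feature
        if col in numeric_features:
            numeric.append(col)  # explicit override wins
        elif col in categorical_features:
            categorical.append(col)
        elif all(isinstance(r[col], (int, float)) and not isinstance(r[col], bool) for r in rows):
            numeric.append(col)  # every value is a number -> numeric
        else:
            categorical.append(col)  # anything else (e.g. strings) -> categorical
    return numeric, categorical

# Made-up rows; the column names are assumptions for illustration.
rows = [
    {"age": 30, "sex": "female", "bmi": 25.0, "children": 1,
     "smoker": "no", "region": "northeast", "charges": 5000.0},
    {"age": 45, "sex": "male", "bmi": 31.2, "children": 0,
     "smoker": "yes", "region": "southwest", "charges": 21000.0},
]

print(infer_feature_types(rows, target="charges"))
# (['age', 'bmi', 'children'], ['sex', 'smoker', 'region'])
print(infer_feature_types(rows, target="charges", categorical_features=["children"]))
# (['age', 'bmi'], ['sex', 'children', 'smoker', 'region'])
```

Passing categorical_features=["children"] moves an integer column into the categorical group. Explicit lists let you make that kind of decision yourself instead of relying on inference.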

Train/Test split

Splitting a dataset into training and test subsets is standard practice in machine learning, because a model's performance should be evaluated on data it has not seen before. In this case, we set the train_size parameter to 0.80. This means that the machine learning model will be trained on 80 ...