Getting Familiar with the PyCaret Environment
Let’s familiarize ourselves with the PyCaret environment setup in the regression task.
Initializing the PyCaret environment
After the EDA part of the project is complete, the next step is to initialize the PyCaret environment. We can accomplish that by using the setup() function, which prepares the data for model training and deployment. This function has numerous parameters, but we’ll focus only on the most important ones. If you want to delve deeper, you can refer to the documentation page of the PyCaret Regression module.
After running the setup() function, a table with its parameters and settings is printed, as seen below.
# PyCaret environment setup. Setting different parameters in setup() function
# to prepare model training and deployment data.
reg = setup(data=data, target='charges', train_size=0.8, session_id=7402,
            numeric_features=numeric[:-1], categorical_features=categorical,
            transformation=True, silent=True, normalize=True, transform_target=True)
Note: Setting the silent parameter to True prevents any prompts from stopping code execution.
PyCaret environment setup
 | Description | Value |
0 | session_id | 7402 |
1 | Target | charges |
2 | Original Data | (1338,7) |
3 | Missing Values | False |
4 | Numeric Features | 3 |
5 | Categorical Features | 3 |
6 | Ordinal Features | False |
7 | High Cardinality Features | False |
8 | High Cardinality Method | None |
9 | Transformed Train Set | (1070,9) |
10 | Transformed Test Set | (268,9) |
11 | Shuffle Train-Test | True |
12 | Stratify Train-Test | False |
13 | Fold Generator | KFold |
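The row counts of the transformed train and test sets in the table above follow from the train_size value passed to setup(). The sketch below reproduces those counts with the standard library only. It is an illustration, not PyCaret's internal code: the split_indices helper is a hypothetical name, and it assumes the training count is the number of rows times train_size, rounded down, with the remaining rows held out for testing. The shuffle will not reproduce the exact rows PyCaret selects.

```python
import math
import random


def split_indices(n_rows, train_size, seed):
    """Shuffle row indices and split them into train and test index lists.

    Hypothetical helper for illustration; PyCaret performs its own split.
    """
    indices = list(range(n_rows))
    # A seeded generator makes the shuffle repeatable, like session_id does.
    random.Random(seed).shuffle(indices)
    # Assumption: training rows = floor(n_rows * train_size).
    n_train = math.floor(n_rows * train_size)
    return indices[:n_train], indices[n_train:]


train_idx, test_idx = split_indices(n_rows=1338, train_size=0.8, seed=7402)
print(len(train_idx), len(test_idx))  # 1070 268
```

The two counts match the 1070 and 268 rows reported for the transformed train and test sets.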
Now we’ll examine the preprocessing pipeline applied to the dataset.
Identifying numeric and categorical features
PyCaret can automatically infer whether a feature is numeric or categorical. We can also set the types explicitly in the setup() function using the categorical_features and numeric_features parameters, as we did in this case.
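To make the idea of type inference concrete, here is a minimal sketch that sorts columns into numeric and categorical lists based on the values they hold. It is not how PyCaret infers types internally. The infer_feature_types helper, the column names other than charges, and the two toy rows are all hypothetical and exist only for illustration.

```python
from numbers import Number


def infer_feature_types(rows, target):
    """Split column names into numeric and categorical lists by value type.

    Hypothetical helper; a simplified stand-in for automatic inference.
    """
    numeric, categorical = [], []
    for column in rows[0]:
        if column == target:
            continue  # the target is not a feature
        values = [row[column] for row in rows]
        # Treat a column as numeric only if every value is a non-boolean number.
        if all(isinstance(v, Number) and not isinstance(v, bool) for v in values):
            numeric.append(column)
        else:
            categorical.append(column)
    return numeric, categorical


# Toy rows with made-up values, used only to exercise the helper.
rows = [
    {"age": 19, "bmi": 27.9, "smoker": "yes", "region": "southwest", "charges": 16884.92},
    {"age": 18, "bmi": 33.77, "smoker": "no", "region": "southeast", "charges": 1725.55},
]

numeric, categorical = infer_feature_types(rows, target="charges")
print(numeric)      # ['age', 'bmi']
print(categorical)  # ['smoker', 'region']
```

Lists built this way could be reviewed and corrected by hand before being passed to numeric_features and categorical_features, which is useful when a numeric-looking column actually encodes categories.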
Train/Test split
Splitting a dataset into train and test subsets is standard practice in machine learning because it is important to evaluate model performance on data that the model has not seen before. In this case, we set the train_size parameter to 0.8. This means that the machine learning model will be trained on ...