...
Getting Familiar with the PyCaret Environment
Get familiar with the PyCaret environment setup for the classification task.
Initializing the PyCaret environment
After we complete the EDA, we need to initialize the PyCaret environment. We can accomplish this easily with the setup() function, which prepares the data for model training. This function has numerous parameters and lets us create a complete data preprocessing pipeline. To keep things simple, we’ll examine only the most important functionality. Please check the PyCaret Classification module documentation page for more details.
After we run setup(), we get a table of useful information about its settings and parameters. Let’s see it in detail below.
from pycaret.classification import setup

# PyCaret environment setup: set parameters in the setup() function
# to prepare the data for model training and deployment.
classf = setup(data=data, target='species', train_size=0.8,
               normalize=True, silent=True, session_id=3934)
PyCaret environment setup
| Description | Value |
0 | session_id | 3934 |
1 | Target | Species |
2 | Target Type | Multiclass |
3 | Label Encoded | Iris-setosa:0,Iris-versicolor:1,Iris-virginica:2 |
4 | Original Data | (150,5) |
5 | Missing Values | False |
6 | Numeric Features | 4 |
7 | Categorical Features | 0 |
8 | Ordinal Features | False |
9 | High Cardinality Features | False |
10 | High Cardinality Method | None |
11 | Transformed Train Set | (120,4) |
12 | Transformed Test Set | (30,4) |
13 | Shuffle Train-Test | True |
14 | Stratify Train-Test | False |
15 | Fold Generator | StratifiedKFold |
16 | Fold Number | 10 |
17 | CPU Jobs | -1 |
18 | Use GPU | False |
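The shapes reported in the table follow from the arguments passed to setup(). As a rough check, the sketch below reproduces them with plain arithmetic. The function name expected_split_shapes is hypothetical, and the simple rounding rule is an assumption made for illustration, not a description of PyCaret's internal logic.

```python
def expected_split_shapes(n_rows, n_cols, train_size):
    """Estimate the (rows, features) shapes of the train and test sets."""
    # One column is the target, so it is not counted as a feature.
    n_features = n_cols - 1
    # Assumption: the train row count is the rounded fraction of all rows.
    n_train = round(n_rows * train_size)
    n_test = n_rows - n_train
    return (n_train, n_features), (n_test, n_features)


# Original data is (150, 5) and train_size is 0.8, as in the table above.
train_shape, test_shape = expected_split_shapes(150, 5, 0.8)
print(train_shape, test_shape)
```

With 150 rows and a train_size of 0.8, this gives 120 training rows and 30 test rows, each with 4 features, which matches the Transformed Train Set and Transformed Test Set entries.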
Identifying numeric and categorical features
PyCaret can automatically infer whether a feature is numeric or categorical. If the inference is wrong, we can override it in the setup() function by specifying which features are categorical or numeric using the categorical_features and numeric_features parameters. In this case, all the features are correctly identified as numeric, so there is no need for any changes.
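If the inferred types ever needed correcting, the override might look like the minimal sketch below. The column names and the helpers build_setup_kwargs and run_setup are hypothetical and only serve the illustration; numeric_features and categorical_features are the setup() parameters named above.

```python
# Hypothetical feature column names; replace them with the columns of your DataFrame.
NUMERIC_COLUMNS = ["sepal_length", "sepal_width", "petal_length", "petal_width"]


def build_setup_kwargs(data, numeric=None, categorical=None):
    """Collect keyword arguments for setup(), adding type overrides only when given."""
    kwargs = {"data": data, "target": "species", "session_id": 3934}
    if numeric:
        kwargs["numeric_features"] = list(numeric)
    if categorical:
        kwargs["categorical_features"] = list(categorical)
    return kwargs


def run_setup(data):
    """Call PyCaret's setup() with every feature explicitly marked as numeric."""
    # Imported here so the helper above can be used without PyCaret installed.
    from pycaret.classification import setup
    return setup(**build_setup_kwargs(data, numeric=NUMERIC_COLUMNS))
```

Calling run_setup(data) would then initialize the environment with the listed columns treated as numeric, regardless of what PyCaret infers on its own.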
Train/Test split
Splitting a dataset into a train and test subset is ...