Getting Familiar with the PyCaret Environment

Get familiar with the PyCaret environment setup for the classification task.

Initializing the PyCaret environment

After we complete the EDA, we need to initialize the PyCaret environment. We can do this with the setup() function, which prepares the data for model training. This function has numerous parameters and lets us build a complete data preprocessing pipeline. To keep things simple, however, we’ll only examine the most important functionality. Please check the PyCaret Classification module documentation page for more details.

After we run setup(), we get a table of useful information about its settings and parameters. Let’s see it in detail below.

# Set up the PyCaret environment. The parameters passed to setup()
# prepare the data for model training and deployment.
from pycaret.classification import setup

classf = setup(data = data, target = 'species', train_size = 0.8,
               normalize = True, silent = True, session_id = 3934)

PyCaret environment setup


|    | Description               | Value                                                 |
|----|---------------------------|-------------------------------------------------------|
| 0  | session_id                | 3934                                                  |
| 1  | Target                    | Species                                               |
| 2  | Target Type               | Multiclass                                            |
| 3  | Label Encoded             | Iris-setosa: 0, Iris-versicolor: 1, Iris-virginica: 2 |
| 4  | Original Data             | (150, 5)                                              |
| 5  | Missing Values            | False                                                 |
| 6  | Numeric Features          | 4                                                     |
| 7  | Categorical Features      | 0                                                     |
| 8  | Ordinal Features          | False                                                 |
| 9  | High Cardinality Features | False                                                 |
| 10 | High Cardinality Method   | None                                                  |
| 11 | Transformed Train Set     | (120, 4)                                              |
| 12 | Transformed Test Set      | (30, 4)                                               |
| 13 | Shuffle Train-Test        | True                                                  |
| 14 | Stratify Train-Test       | False                                                 |
| 15 | Fold Generator            | StratifiedKFold                                       |
| 16 | Fold Number               | 10                                                    |
| 17 | CPU Jobs                  | -1                                                    |
| 18 | Use GPU                   | False                                                 |
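The train and test set shapes reported in the table are consistent with the train_size = 0.8 argument passed to setup(). The short sketch below reproduces that arithmetic in plain Python. The helper name split_shapes and its rounding rule are assumptions made for illustration; this is not PyCaret's own code.

```python
import math


def split_shapes(n_rows, n_cols, train_size):
    """Return ((train_rows, n_features), (test_rows, n_features)).

    Assumptions (not stated in the lesson): the train set gets
    floor(n_rows * train_size) rows, the test set gets the remaining
    rows, and exactly one column (the target) is removed from the
    feature matrix.
    """
    n_train = math.floor(n_rows * train_size)
    n_test = n_rows - n_train
    n_features = n_cols - 1
    return (n_train, n_features), (n_test, n_features)


# Original data shape from the table: 150 rows, 5 columns.
train_shape, test_shape = split_shapes(n_rows=150, n_cols=5, train_size=0.8)
print(train_shape, test_shape)  # (120, 4) (30, 4)
```

With 150 rows and an 80% train share, 120 rows go to training and 30 to testing, and dropping the target column leaves 4 features, which matches the Transformed Train Set and Transformed Test Set rows.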

Identifying numeric and categorical features

PyCaret automatically infers whether each feature is numeric or categorical. If the inference is wrong, we can override it in the setup() function by passing lists of column names to the categorical_features and numeric_features parameters. In this case, all four features are correctly identified as numeric, so no changes are needed.
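If the inferred types ever needed correcting, the override could look roughly like the sketch below. The column names and the helpers build_setup_kwargs and run_setup are hypothetical and exist only for illustration; numeric_features and categorical_features are the setup() parameters named above. PyCaret is imported inside run_setup so that the rest of the sketch runs without it.

```python
def build_setup_kwargs(target, numeric_features, categorical_features):
    """Collect keyword arguments for setup() with explicit feature types."""
    overlap = set(numeric_features) & set(categorical_features)
    if overlap:
        raise ValueError(
            f"Columns listed as both numeric and categorical: {sorted(overlap)}"
        )
    if target in numeric_features or target in categorical_features:
        raise ValueError("The target column must not be listed as a feature.")

    kwargs = {"target": target, "train_size": 0.8,
              "normalize": True, "session_id": 3934}
    # Only pass a type list when it is non-empty, so PyCaret keeps its
    # automatic inference for everything we did not specify.
    if numeric_features:
        kwargs["numeric_features"] = list(numeric_features)
    if categorical_features:
        kwargs["categorical_features"] = list(categorical_features)
    return kwargs


# Hypothetical column names; use the names in your own DataFrame.
setup_kwargs = build_setup_kwargs(
    target="species",
    numeric_features=["sepal_length", "sepal_width",
                      "petal_length", "petal_width"],
    categorical_features=[],
)


def run_setup(data):
    """Call PyCaret's setup() on a DataFrame with the explicit feature types."""
    from pycaret.classification import setup  # requires PyCaret to be installed
    # Add other arguments from the lesson's call (for example silent=True) as needed.
    return setup(data=data, **setup_kwargs)
```

Calling run_setup(data) with the lesson's DataFrame would then initialize the environment with those feature types fixed rather than inferred.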

Train/Test split

Splitting a dataset into a train and test subset is ...