Classification with PyCaret

Learn how to import necessary libraries and datasets for classification with PyCaret.

Classification is one of the fundamental supervised learning tasks. Its goal is to predict a categorical variable known as the class label. The task is called binary classification when there are only two classes (0 and 1), and multiclass classification when there are more. One of the most widely used binary classification models is logistic regression. It is defined by the following equation:

$$\log\left(\frac{p_n}{1-p_n}\right)=\beta_0+\beta_1 x_{n1}+\cdots+\beta_p x_{np}=\beta^T X_n$$

  • $\log\left(\frac{p_n}{1-p_n}\right)$ is the natural logarithm of the odds, known as the logit function.
  • $x_{n1}$ to $x_{np}$ are the feature variables of the $n$-th instance.
  • $\beta_0$ is the intercept term.
  • $\beta_1$ to $\beta_p$ are the coefficients of the feature variables.
  • $\beta^T X_n$ is the vectorized form of the equation, with a leading 1 in $X_n$ so that $\beta_0$ is included.

Our goal is to calculate $p_n$, the probability that the $n$-th instance of the given dataset belongs to class 1. The logistic function $\sigma(z)$ is the inverse of the logit (or log-odds) function, so applying it to both sides gives the desired result:

$$p_n=\sigma(\beta^T X_n)=\frac{1}{1+\exp(-\beta^T X_n)}$$
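
To make the relationship between the logit and the logistic function concrete, here is a minimal sketch in plain Python. It is not PyCaret code: the helper names `logit`, `sigmoid`, and `predict_proba`, as well as the coefficient and feature values, are assumptions chosen only for illustration.

```python
import math

def logit(p):
    """Log-odds of a probability p, valid for 0 < p < 1."""
    return math.log(p / (1.0 - p))

def sigmoid(z):
    """Logistic function, the inverse of logit."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(beta, x):
    """Probability that one instance belongs to class 1.

    beta = [beta_0, beta_1, ..., beta_p], x = [x_1, ..., x_p].
    A leading 1 is prepended to x so that beta_0 acts as the intercept.
    """
    z = sum(b * v for b, v in zip(beta, [1.0] + list(x)))
    return sigmoid(z)

# Hypothetical coefficients and feature values (illustration only).
beta = [-1.0, 2.0, 0.5]
x = [0.5, 0.0]  # linear term: -1 + 2*0.5 + 0.5*0 = 0, so p = 0.5

p = predict_proba(beta, x)
print(p)         # 0.5
print(logit(p))  # 0.0, the linear term recovered from the probability
```

Applying `logit` to the output of `sigmoid` returns the original linear term, which is what it means for the two functions to be inverses. This version is not hardened for very large negative inputs, where `math.exp` can overflow.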
