Dummy Estimators and Handling Imbalance Class Problem

In this lesson, you will learn about dummy estimators and how to handle imbalanced class problems. Dummy estimators provide baseline models for classification. Class imbalance is a common problem, and there are several techniques for dealing with it.

We'll cover the following...

- Dummy Estimators
- Imbalance class problem
  - Accuracy Paradox
  - Handling Imbalance Class Problem
    - Oversampling
    - Undersampling
      - Random Undersampling
      - Tomek Links Undersampling

Dummy Estimators

Dummy estimators help us define a baseline model for the problem at hand. We also saw them for regression problems. For classification, scikit-learn's DummyClassifier supports the following strategies:

stratified: Predicts class labels at random, following the class distribution of the training set.
most_frequent: Always predicts the most frequent label in the training set.
prior: Always predicts the class that maximizes the class prior.
uniform: Predicts class labels uniformly at random.
constant: Always predicts a constant label provided by the user.

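Note that prior and most_frequent always predict the same label. They differ only in predict_proba: most_frequent returns a one-hot vector for the predicted class, while prior returns the class prior, that is, the class distribution observed in the training set.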
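To make the difference between most_frequent and prior concrete, here is a minimal sketch on a toy dataset. The four labels and the all-zero feature matrix are made up for illustration; DummyClassifier ignores the feature values, so only the labels matter.

```python
import numpy as np
from sklearn.dummy import DummyClassifier

# Hypothetical toy data: class 0 appears three times, class 1 once.
# The feature values are placeholders; DummyClassifier ignores them.
X = np.zeros((4, 1))
y = np.array([0, 0, 0, 1])

most_frequent = DummyClassifier(strategy="most_frequent").fit(X, y)
prior = DummyClassifier(strategy="prior").fit(X, y)

# Both strategies predict the majority class for every sample.
pred_most_frequent = most_frequent.predict(X)
pred_prior = prior.predict(X)
print(pred_most_frequent, pred_prior)

# The predicted probabilities differ: a one-hot vector for most_frequent,
# the training class distribution for prior.
proba_most_frequent = most_frequent.predict_proba(X[:1])
proba_prior = prior.predict_proba(X[:1])
print(proba_most_frequent, proba_prior)
```

The distinction matters mainly when the baseline is evaluated with a probability-based metric, where prior usually gives the more informative reference point.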

from sklearn.datasets import load_iris
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Load the iris dataset and hold out 30% of the samples for testing
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0, train_size=0.7)

# Fit the baseline: always predict the most frequent training label
baseline = DummyClassifier(strategy='most_frequent')
baseline.fit(X_train, y_train)
print("The accuracy (DummyClassifier) on test set is {0:.2f}".format(baseline.score(X_test, y_test)))

# Fit a linear support vector machine for comparison
svm = SVC(kernel='linear', C=1).fit(X_train, y_train)
print("The accuracy (SVM) on test set is {0:.2f}".format(svm.score(X_test, y_test)))
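The same comparison can be extended to all five strategies. The sketch below reuses the iris split from above; the constant label 0 and the random_state values are arbitrary choices made here for reproducibility, not part of the lesson.

```python
from sklearn.datasets import load_iris
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0, train_size=0.7)

# Extra keyword arguments per strategy. The random strategies get a fixed
# seed, and 'constant' needs the label to predict (0 is an arbitrary pick).
strategies = {
    "stratified": {"random_state": 0},
    "most_frequent": {},
    "prior": {},
    "uniform": {"random_state": 0},
    "constant": {"constant": 0},
}

scores = {}
for name, extra in strategies.items():
    clf = DummyClassifier(strategy=name, **extra).fit(X_train, y_train)
    scores[name] = clf.score(X_test, y_test)
    print("The accuracy ({0}) on test set is {1:.2f}".format(name, scores[name]))
```

Since most_frequent and prior always predict the same label, their accuracies are identical. A real model should clearly beat every one of these baselines before it is considered useful.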
