Definition: Data classification

Classification is a classic data mining technique based on machine learning. It assigns categories to a collection of data to aid in more accurate predictions and analysis.

The classification method makes use of mathematical techniques such as decision trees, linear programming, neural networks, and statistics. In classification, we develop software that can learn how to classify data items into groups.

The data classification process includes two steps:

  1. Building the classifier or model
  2. Using the classifier for classification
svg viewer

Example use case

Suppose we want to predict whether or not an employee will leave the company in the next five years; we would use the past records of all employees who left the company to build a classifier. We would then use this classifier to predict the employees who will probably leave the company in a future period. In our particular case, the classifier will group the employees into two classes: those who will “leave” and those who will “stay.”

Copyright ©2024 Educative, Inc. All rights reserved