Data Clustering

Build on the fundamentals of data mining for TSP by learning clustering techniques.

Data visualizations are used to extract knowledge and insights from data sets. Besides showing the location, markers on a map can carry additional key figures. For example, the color of a location marker can indicate the level of sales, and even time series diagrams can be plotted over a location. Often, it is only when looking at the data points in a graphical representation that we recognize particular patterns or the proverbial needle in the haystack. (In data analysis and statistics, a data point is a piece of information that describes a unit of observation at a particular point in time.)

Data mining refers to the process of discovering patterns and extracting useful information from (large) datasets. Clustering is one of the techniques used in data mining and involves grouping similar data points together into clusters based on their similarities or dissimilarities. The goal of clustering is to identify groups of data points that are similar to each other and different from data points in other clusters. This can help in identifying patterns in the data that can be useful in a variety of applications, such as market segmentation and anomaly detection.

Market segmentation is a marketing strategy that involves dividing a larger and heterogeneous market into smaller, more manageable segments based on shared characteristics and needs. The goal of market segmentation is to better understand and target specific customer groups with tailored marketing strategies.
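To make the idea of grouping similar data points concrete, here is a minimal sketch of k-means, one common clustering algorithm. The lesson does not say which algorithm it uses, so k-means is an assumption here, as are the function name `k_means`, the sample points, and the choice of the first `k` points as starting centroids.

```python
from math import dist


def k_means(points, k, iterations=10):
    """Group 2-D points into k clusters with a basic k-means loop."""
    # Assumption: start from the first k points so the result is deterministic.
    centroids = [points[i] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: dist(p, centroids[c])) for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(
                    sum(axis) / len(members) for axis in zip(*members)
                )
    return labels, centroids


# Hypothetical data: two visibly separate groups of points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
labels, centroids = k_means(points, k=2)
print(labels)
```

Points that share a label are closer to their own centroid than to the other one, which is the "similar within, different between" property described above. In practice a library implementation with better initialization would usually be preferable to this hand-written loop.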

Anomaly detection is a technique used in data analysis, statistics, and machine learning to identify unusual patterns or observations in a dataset that do not conform to expected behavior.

Data mining

Our boss asked us to figure out the shortest total distance between stores. But as proactive data scientists, we want to deliver more than what was asked of us. After all, much of the appeal of our job lies in combining different data. Thanks to our strong connection with the sales department, we can easily obtain the daily sales data of the stores.

Sample Extract

SalesDate     SalesValue    Store
31.01.2020    39.0          1
31.01.2020    2560.0        2
31.01.2020    4476.0        3

Note: The sales dataset identifies stores by number. Store 1 corresponds to StoreA, Store 2 corresponds to StoreB, and so on. This kind of mismatch often occurs in reality when we merge master data from different systems.
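The renaming described in the note could be sketched as follows. This is only an illustration: the helper `store_name` and the field names are hypothetical, and it assumes that "and so on" means the letters simply continue alphabetically (Store 4 is StoreD, and so forth). The dates are parsed as day.month.year, matching the sample extract.

```python
from datetime import datetime


def store_name(store_number):
    """Map 1 -> 'StoreA', 2 -> 'StoreB', ... (assumed alphabetical scheme)."""
    if not 1 <= store_number <= 26:
        raise ValueError(f"no letter for store number {store_number}")
    return "Store" + chr(ord("A") + store_number - 1)


# Rows copied from the sample extract: (SalesDate, SalesValue, Store).
raw_rows = [
    ("31.01.2020", 39.0, 1),
    ("31.01.2020", 2560.0, 2),
    ("31.01.2020", 4476.0, 3),
]

# Normalize each row: parse the date and replace the number with the name.
sales = [
    {
        "date": datetime.strptime(sales_date, "%d.%m.%Y").date(),
        "value": value,
        "store": store_name(store),
    }
    for sales_date, value, store in raw_rows
]

for row in sales:
    print(row["date"], row["store"], row["value"])
```

Doing the conversion once, right after loading, keeps the store key consistent, so later joins against data that uses the StoreA-style names do not silently drop rows.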

RFM clustering

Recency, Frequency, and Monetary value (RFM) clustering is an effective customer segmentation technique. It can help our sales colleagues to make better strategic decisions because we can quickly ...