What kind of patterns can be mined in data mining?

Class/concept description: Data entries are associated with labels or classes. For instance, in a library, borrowed items fall into classes such as books and research journals, while customers fall into concepts such as registered members and unregistered members. Summaries of such classes or concepts are called class or concept descriptions.

Frequent patterns: These are patterns that occur frequently in the dataset. There are several kinds of frequent patterns, such as frequent itemsets, frequent subsequences, and frequent substructures.

Associations: These capture relationships between items that tend to occur together, expressed as association rules. For instance, a shopkeeper may find that 70% of the time a football is sold, a kit is bought alongside it. This co-occurrence can be expressed as an association rule linking the two items.
Correlations: This analysis measures the statistical correlation between two variables to determine whether their relationship is positive, negative, or absent.

Clusters: Clustering groups similar data points together. The points within a cluster are similar to one another but differ markedly from the points in other clusters.
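
The association and correlation ideas above can be made concrete with a small sketch. The transactions below are made-up illustration data, and `support`, `confidence`, and `pearson` are hypothetical helper names rather than functions from any library. It assumes the usual definitions: support is the fraction of transactions containing an itemset, and the confidence of a rule A -> B is support(A and B together) divided by support(A).

```python
from math import sqrt

# Hypothetical transactions, invented for illustration only.
transactions = [
    {"football", "kit"},
    {"football", "kit", "socks"},
    {"football"},
    {"kit"},
    {"football", "kit"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimate how often consequent appears when antecedent is present."""
    both = support(antecedent | consequent, transactions)
    return both / support(antecedent, transactions)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


rule_support = support({"football", "kit"}, transactions)
rule_confidence = confidence({"football"}, {"kit"}, transactions)
print(f"support = {rule_support:.2f}, confidence = {rule_confidence:.2f}")

# A value near +1 suggests the two series rise together, near -1 that they
# move in opposite directions, and near 0 that there is no linear relationship.
print(pearson([1, 2, 3, 4], [2, 4, 6, 8]))
```

In this toy data, the rule "football -> kit" holds in three of the four transactions that contain a football, which is the kind of figure the shopkeeper example refers to.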

Let's look at a practical implementation of clustering in code. Clustering is a fundamental technique for discovering patterns within data.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.cluster import DBSCAN

X, _ = make_classification(
    n_samples=1000,
    n_features=2,
    n_informative=2,
    n_redundant=0,
    n_clusters_per_class=1,
    random_state=4
)

df = pd.DataFrame(X)
print(df.shape)

# Define the model
dbscan_model = DBSCAN(eps=0.35, min_samples=16)

# Train the model
dbscan_model.fit(df)

# Visualize the clusters
plt.figure(figsize=(10, 10))
plt.scatter(df[0], df[1], c=dbscan_model.labels_, s=15)
plt.title('DBSCAN Clustering', fontsize=20)
plt.xlabel('Feature 1', fontsize=14)
plt.ylabel('Feature 2', fontsize=14)
plt.show()

Note: To read more about the DBSCAN algorithm, check out this answer.

Lines 1–5: We import the necessary libraries.
Lines 7–14: We create a synthetic dataset with 1000 samples and 2 features.
Lines 16–17: We convert the dataset output X into a data frame and print its shape.
Line 20: We initialize the DBSCAN model with eps=0.35 and min_samples=16. Both parameters need to be tuned to obtain a sensible set of clusters and to detect noise well.
Line 23: We fit the model to the dataset and generate clusters.
Lines 26–31: We visualize the clusters using a scatter plot.
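
When tuning eps and min_samples, it helps to count how many clusters and noise points a given setting produces. scikit-learn's DBSCAN marks noise points with the label -1 in `labels_`. The helper below, `summarize_labels`, is a hypothetical name for a minimal sketch that works on any sequence of integer labels, and the sample labels are made up.

```python
from collections import Counter


def summarize_labels(labels):
    """Return (number of clusters, number of noise points, size of each cluster)."""
    counts = Counter(int(label) for label in labels)
    n_noise = counts.pop(-1, 0)  # DBSCAN marks noise points with the label -1
    return len(counts), n_noise, dict(counts)


# Made-up labels standing in for dbscan_model.labels_
example_labels = [0, 0, 1, -1, 1, 1, -1, 0]
n_clusters, n_noise, sizes = summarize_labels(example_labels)
print(n_clusters, n_noise, sizes)
```

With the fitted model, calling `summarize_labels(dbscan_model.labels_)` for a few candidate eps values gives a quick view of how each setting trades clusters against noise.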

Predictive patterns

Predictive mining forecasts future values by analyzing patterns in historical data and their outcomes. It can also help estimate missing values in the data.

Predictive patterns fall into the following categories.

Classification: It predicts the label of unknown data points with the help of known, labeled data points. For instance, if we have a dataset of X-rays of patients screened for cancer, the possible labels would be cancer patient and not cancer patient. These classes can be obtained by data characterization or by data discrimination.
Regression: Unlike classification, regression estimates numeric values rather than labels. It can fill in missing numeric values in the dataset and predict future ones. For instance, we can forecast next year's sales from the past twenty years' sales by finding the relationship in the data.
Outlier analysis: Not all data points in a dataset follow the same behavior. Data points that deviate from the usual behavior are called outliers, and the analysis of these points is called outlier analysis. Outliers are often set aside as noise before further processing, although in some applications, such as fraud detection, they are the main object of interest.
Evolution analysis: As the name suggests, it describes how the behavior and trends of data points change over time.
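
As an illustration of outlier analysis, here is a minimal sketch that flags values lying far outside the interquartile range (IQR). The 1.5 multiplier is the common Tukey convention, chosen here as an assumption and not something the text prescribes. `iqr_outliers` is a hypothetical helper name, and the sales figures are made up.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Made-up daily sales figures; 95 is deliberately far from the rest.
daily_sales = [10, 12, 11, 13, 12, 11, 95, 10, 12, 13]
print(iqr_outliers(daily_sales))
```

Whether a flagged value is then discarded or investigated depends on the application.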

Among the predictive patterns, let's look at a practical implementation of regression in code. Regression is an essential predictive technique used to understand the relationship between variables and make predictions based on observed data.

import os
import matplotlib.pyplot as plt
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# load_boston was removed in scikit-learn 1.2, so the bundled diabetes
# regression dataset is used here instead.
data = load_diabetes()
X = data.data
Y = data.target

# Hold out 20% of the samples for testing
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.2, random_state=42)

# Fit a linear regression model and predict on the test set
model = LinearRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

mse = mean_squared_error(y_test, y_pred)
print("Mean Squared Error:", mse)

# Plot predicted values against true values
plt.scatter(y_test, y_pred)
# Reference line y = x, where predictions would equal the true values
plt.plot([min(y_test), max(y_test)], [min(y_test), max(y_test)], linestyle='--', color='red', linewidth=2)
plt.xlabel("True Values")
plt.ylabel("Predicted Values")
plt.title("True vs. Predicted Values")
os.makedirs('./output', exist_ok=True)  # make sure the output directory exists
plt.savefig('./output/plot.png')
plt.show()
