What is t-SNE?

t-distributed stochastic neighbor embedding (t-SNE) is a machine learning algorithm that helps us see and understand high-dimensional data. It was introduced by Laurens van der Maaten and Geoffrey Hinton in 2008. The algorithm maps high-dimensional data to a simpler picture, usually in 2D or 3D. The goal is to make the data visually simple while keeping the important relationships between the points, especially which points are close neighbors.

How does it work?

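The algorithm first converts the distances between pairs of points in the high-dimensional data into probabilities that express how similar the two points are: close neighbors get a high probability, distant points a very low one. Each data point is then placed like a dot on a map in the simple picture, where similarity between dots is measured with a heavy-tailed Student's t-distribution. The algorithm moves the dots until the similarities in the simple picture match the similarities in the original data as closely as possible, which it measures with the Kullback-Leibler divergence between the two sets of probabilities. Because this matching emphasizes close neighbors, distances between far-apart groups in the picture are not reliable, but we can look at the data more simply and still see the important groups or patterns.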
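The idea above can be sketched in a few lines of NumPy. This is a simplified illustration, not the full algorithm: it assumes a single fixed Gaussian bandwidth `sigma` for every point, whereas t-SNE tunes a bandwidth per point to match the chosen perplexity, and it only evaluates the cost for two given layouts instead of optimizing the layout by gradient descent. The function names are hypothetical and chosen for this example.

```python
import numpy as np


def pairwise_sq_dists(Z):
    """Squared Euclidean distances between all rows of Z."""
    sq = np.sum(Z ** 2, axis=1)
    D = sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T
    return np.maximum(D, 0.0)  # clip tiny negatives caused by rounding


def high_dim_affinities(X, sigma=1.0):
    """Symmetric Gaussian similarities P for the original data.

    Assumption: one fixed sigma for all points (a simplification).
    """
    logits = -pairwise_sq_dists(X) / (2.0 * sigma ** 2)
    np.fill_diagonal(logits, -np.inf)  # a point is not its own neighbor
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    P_cond = np.exp(logits)
    P_cond /= P_cond.sum(axis=1, keepdims=True)  # row i holds p(j | i)
    return (P_cond + P_cond.T) / (2.0 * X.shape[0])


def low_dim_affinities(Y):
    """Student's t (one degree of freedom) similarities Q for the map."""
    W = 1.0 / (1.0 + pairwise_sq_dists(Y))
    np.fill_diagonal(W, 0.0)
    return W / W.sum()


def kl_divergence(P, Q, eps=1e-12):
    """KL(P || Q): the mismatch that t-SNE tries to minimize."""
    mask = P > 0
    return float(np.sum(P[mask] * np.log(P[mask] / np.maximum(Q[mask], eps))))


# Toy data: two well-separated clusters of 20 points each in 10 dimensions.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.5, (20, 10)), rng.normal(5.0, 0.5, (20, 10))])

# A layout that keeps the clusters apart, and one that mixes them up
# by assigning map positions to the wrong points.
Y_good = X[:, :2]
Y_bad = np.concatenate([Y_good[0::2], Y_good[1::2]])

P = high_dim_affinities(X)
kl_good = kl_divergence(P, low_dim_affinities(Y_good))
kl_bad = kl_divergence(P, low_dim_affinities(Y_bad))
print(f"KL for cluster-preserving layout: {kl_good:.3f}")
print(f"KL for mixed-up layout:           {kl_bad:.3f}")
```

The layout that keeps neighbors together should give the lower divergence, which is the signal the real algorithm follows when it moves the dots.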

Example

import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler

# Load the Iris dataset
iris = datasets.load_iris()
X = iris.data
y = iris.target

# Standardize the feature matrix
X_std = StandardScaler().fit_transform(X)

# Apply t-SNE to reduce the data to two components
tsne = TSNE(n_components=2, random_state=42)
X_tsne = tsne.fit_transform(X_std)

# Plot the results
plt.figure(figsize=(8, 6))

# Scatter plot with different colors for each class
for i in range(len(np.unique(y))):
    plt.scatter(X_tsne[y == i, 0], X_tsne[y == i, 1], label=f'Class {i}')

plt.title('t-SNE Visualization of Iris Dataset')
plt.xlabel('t-SNE Component 1')
plt.ylabel('t-SNE Component 2')
plt.legend()
plt.show()

Explanation

Lines 1–5: We import the required libraries.
Line 8: We load the Iris dataset, which is a part of the sklearn library.
Lines 9–10: We assign the feature matrix to X and the class labels to y.
Line 13: We standardize the feature matrix so that each feature has zero mean and unit variance.
Line 16: We create a t-SNE model with two components (dimensions) in the lower-dimensional space. The random_state parameter fixes the random seed so that the results are reproducible.
Line 17: We fit the t-SNE model to the standardized data (X_std) and transform it into the lower-dimensional space (X_tsne).
Lines 23–24: We plot the data points in the lower-dimensional space (X_tsne), one class at a time. Each class gets its own color from matplotlib's default color cycle and a label for the legend.
Lines 26–30: We add the title, axis labels, and legend, and display the plot of the high-dimensional data in the lower-dimensional space. The results can vary with different random seeds and perplexity values.
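To see how much the perplexity matters, the short sketch below reruns t-SNE on the same standardized Iris data with a few perplexity values. The values 5, 30, and 50 are arbitrary choices for illustration, and the two scores reported (the final KL divergence stored by scikit-learn in kl_divergence_, and the trustworthiness score from sklearn.manifold) are only one possible way to compare runs; looking at the plots themselves is usually just as informative.

```python
from sklearn import datasets
from sklearn.manifold import TSNE, trustworthiness
from sklearn.preprocessing import StandardScaler

# Same preprocessing as in the example above
X_std = StandardScaler().fit_transform(datasets.load_iris().data)

results = {}
for perplexity in (5, 30, 50):  # illustrative values, not recommendations
    tsne = TSNE(n_components=2, perplexity=perplexity, random_state=42)
    X_embedded = tsne.fit_transform(X_std)
    results[perplexity] = {
        # Final value of the cost function after optimization
        "kl_divergence": float(tsne.kl_divergence_),
        # Fraction-like score in [0, 1]: how well local neighborhoods are kept
        "trustworthiness": float(
            trustworthiness(X_std, X_embedded, n_neighbors=5)
        ),
    }

for perplexity, scores in results.items():
    print(
        f"perplexity={perplexity:>2}  "
        f"KL={scores['kl_divergence']:.3f}  "
        f"trustworthiness={scores['trustworthiness']:.3f}"
    )
```

Note that KL divergence values from runs with different perplexities are not directly comparable, because the perplexity changes the high-dimensional probabilities being matched. Changing random_state while keeping the perplexity fixed is a similar experiment for checking how stable the layout is.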
