
Top three machine learning models

14 min read
Mar 01, 2024

Machine learning, an increasingly influential force in the IT sector, has witnessed significant growth in recent years. Its applications span various domains, from recommendation systems to the development of autonomous vehicles. As technology continues to advance, acquiring knowledge about machine learning becomes imperative for professionals in the software and data science industries. Embracing the principles and applications of machine learning is not just beneficial; it is becoming a fundamental requirement for staying abreast of industry developments and fostering innovation. In this blog, we will explore the top three machine learning models, delving into their intricacies and applications to gain a comprehensive understanding of their significance in the field.

So, what exactly is machine learning? Machine learning is a field of artificial intelligence (AI) that focuses on the development of algorithms and models that enable computers to learn from and make predictions or decisions based on data without being explicitly programmed for specific tasks. In essence, machine learning algorithms allow computers to identify patterns in data and improve their performance over time through experience.

Before we explore the algorithms, let’s have a brief look at the different machine learning types.

Types of machine learning#

There are three primary categories of machine learning:

  • Supervised learning

  • Unsupervised learning

  • Reinforcement learning

Categories of machine learning

Supervised learning is a category of machine learning in which the algorithm leverages labeled training data to formulate predictions or decisions. In this framework, the algorithm undergoes training on a dataset that comprises input data along with associated desired outputs. The objective is to grasp a mapping function from the input to the output, enabling the algorithm to predict outcomes for unseen data. In essence, the algorithm is guided by the provided dataset and supervised to discern patterns and relationships within the data. Supervised learning algorithms can further be classified as classification algorithms and regression algorithms.
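
To make this concrete, here is a minimal sketch of supervised learning with sklearn. The toy data, the labels, and the choice of a k-nearest-neighbors classifier are our own, purely for illustration:

from sklearn.neighbors import KNeighborsClassifier

# Toy labeled data: each row is [height_cm, weight_kg]; the labels are made up
X_train = [[150, 50], [160, 60], [180, 85], [175, 80]]
y_train = [0, 0, 1, 1]  # 0 and 1 are two arbitrary classes

# The algorithm learns a mapping from inputs to labels...
model = KNeighborsClassifier(n_neighbors=3)
model.fit(X_train, y_train)

# ...and can then predict the label of unseen data
print(model.predict([[170, 75]]))  # e.g., [1]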

Unsupervised learning, on the other hand, has no target or class attribute. Its techniques aim to uncover inherent patterns within the data and are commonly employed for exploratory data analysis. In unsupervised learning, the algorithms focus on the data’s features; the objective is to identify relationships within the data and group data points based on a similarity measure.

Unsupervised vs. supervised learning
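
Clustering is one of the most common unsupervised techniques: it groups unlabeled points purely by how similar they are. Here is a minimal sketch using sklearn's KMeans; the toy data is invented for illustration:

import numpy as np
from sklearn.cluster import KMeans

# Unlabeled toy data: two loose groups of 2D points
X = np.array([[1.0, 1.1], [1.2, 0.9], [0.8, 1.0],
              [8.0, 8.2], [7.9, 8.1], [8.3, 7.8]])

# No labels are provided; KMeans groups the points by similarity alone
kmeans = KMeans(n_clusters=2, n_init=10, random_state=42)
labels = kmeans.fit_predict(X)
print(labels)  # e.g., [0 0 0 1 1 1], i.e., two discovered clusters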

Reinforcement learning focuses on teaching agents to make decisions by drawing on their experiences in a dynamic environment. At its core, the reinforcement learning paradigm is built on the idea that an agent can develop decision-making abilities through active engagement with its environment. As the agent makes decisions aimed at maximizing cumulative rewards, it learns to successfully execute tasks in an unpredictable and sometimes complex setting.

Reinforcement learning has five basic components: agent, environment, actions, state, and rewards. The figure below shows the action-reward feedback loop of a generic reinforcement learning model.

A basic diagram of reinforcement learning

The loop process shown above can be summarized as follows (a minimal code skeleton follows the list):

  1. Initialization: The environment is initialized, and the agent begins in a starting state.

  2. Observation: The agent observes the current state of the environment.

  3. Action selection: Based on the observed state and its internal policy, the agent selects an action (At) to take.

  4. Interaction: The agent takes the selected action, and it interacts with the environment.

  5. Feedback: The environment provides feedback to the agent in the form of a reward signal (Rt), indicating the quality of the action taken.

  6. State transition: The environment transitions to a new state based on the action taken by the agent.

  7. Learning: Steps 2–6 are repeated iteratively, with the agent continuously observing states, selecting actions, receiving rewards, and causing state transitions. Through this iterative process, the agent learns which actions lead to favorable outcomes (higher rewards) and adjusts its behavior accordingly to maximize cumulative rewards over time.

  8. Termination: The loop process will terminate after a predefined number of iterations or when certain conditions are met, such as achieving a specific goal or reaching a terminal state.
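
To connect these steps to code, below is a minimal, generic skeleton of the loop, not a complete algorithm. The env object and the select_action and update_policy functions are hypothetical placeholders for a real environment and a real learning rule (such as Q-learning):

# A generic skeleton of the action-reward feedback loop described above.
# `env`, `select_action`, and `update_policy` are hypothetical placeholders.
def run_episode(env, select_action, update_policy, max_steps=1000):
    state = env.reset()                                   # 1. Initialization
    for t in range(max_steps):                            # Repeat steps 2-6
        action = select_action(state)                     # 2-3. Observe the state, pick an action (At)
        next_state, reward, done = env.step(action)       # 4-6. Interact, receive the reward (Rt), transition
        update_policy(state, action, reward, next_state)  # 7. Learn from the feedback
        state = next_state
        if done:                                          # 8. Termination
            break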

Now that we have covered the basics, let’s take a look at the top three machine learning models.

Linear regression#

This is one of the simplest and most popular machine learning algorithms. Linear regression is a supervised learning algorithm that aims to find the straight line that best fits the scattered data points on a graph. Its goal is to model the relationship between the independent variables (the x values) and a numerical outcome (the y values) by fitting the equation of a line to the given data. The resulting line can then be used to make predictions for future values.

The line that best fits the data is referred to as the regression line, and it’s expressed through a linear equation:

Y = aX + b

Where:

  • Y is the dependent variable, also known as the outcome or response.

  • X is the independent, or predictor, variable.

  • a represents the slope (steepness) of the line.

  • b is the y-intercept.

Linear regression

Note: In the equation above, we’re assuming that there is only one independent variable to keep things simple.

In linear regression, various techniques and methodologies can be employed to enhance or adapt the model to different scenarios. Some of these include the following:

  • Ordinary least squares (OLS) aims to minimize the sum of squared differences between observed and predicted values (see the sketch after this list for how OLS, ridge, and lasso compare in code).

  • Lasso regression, also known as L1 regularization, incorporates the absolute values of the coefficients into the cost function as a penalty term.

  • Ridge regression (L2 regularization) introduces the squared values of the coefficients into the cost function as a penalty term.

  • Stepwise regression iteratively adds or removes predictors based on statistical criteria (forward selection, backward elimination, or both).
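
OLS, lasso, and ridge regression all have direct sklearn implementations, so comparing them amounts to swapping the estimator. Below is a minimal sketch on synthetic data; the data and the alpha values are arbitrary choices for illustration:

import numpy as np
from sklearn.linear_model import LinearRegression, Ridge, Lasso

# Synthetic data: y depends only on the first feature; the rest are noise
rng = np.random.RandomState(42)
X = rng.randn(100, 5)
y = 3.0 * X[:, 0] + rng.randn(100) * 0.5

# OLS vs. the L2- and L1-regularized variants (alpha sets the penalty strength)
for name, model in [("OLS", LinearRegression()),
                    ("Ridge", Ridge(alpha=1.0)),
                    ("Lasso", Lasso(alpha=0.1))]:
    model.fit(X, y)
    print(name, np.round(model.coef_, 2))  # lasso tends to zero out the noise features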

Linear regression models are commonly evaluated using the mean squared error (MSE), which is the average squared difference between the actual and predicted values.

Linear regression finds applications across various domains due to its simplicity and versatility. Some common applications include:

  • Stock price prediction: Linear regression can be used to model the relationship between financial indicators and stock prices, assisting in predicting future stock values.

  • Route and pricing optimization: Ride-sharing platforms, such as Uber, leverage regression analysis to enhance dynamic pricing and optimize routes. By analyzing historical data, evaluating real-time traffic conditions, and considering user preferences, these companies can refine their route planning and implement effective pricing strategies.

  • Real estate valuation: Real estate agents, sellers, and buyers frequently use regression methods to assess property values. These techniques enable them to calculate property prices based on factors such as amenities, size, and location, incorporating historical data like market values and sales patterns.

Below, we have a simple example of linear regression using Python and the popular machine learning library sklearn. We’ll use the well-known Boston Housing dataset, which is included in older versions of sklearn (it was removed in scikit-learn 1.2).

Note: The following code will output an image and the values of a, b, and MSE. Click the “>” button in the output window to see the values.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.datasets import load_boston
from sklearn.metrics import mean_squared_error

# Loading the Boston Housing dataset (removed in scikit-learn 1.2, so this requires an older version)
boston = load_boston()
X = boston.data[:, np.newaxis, 5] # Using only one feature for simplicity (average number of rooms per dwelling)
y = boston.target

# Splitting the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Creating a linear regression model, training, and getting the prediction
lin_reg = LinearRegression()
lin_reg.fit(X_train, y_train)
y_pred = lin_reg.predict(X_test)

# Printing the model coefficients and intercept
print("Coefficients:", lin_reg.coef_)
print("Intercept:", lin_reg.intercept_)

# Evaluating the model performance
mse = mean_squared_error(y_test, y_pred)
print("Mean Squared Error:", mse)

# Plotting the training data and regression line
plt.scatter(X_train, y_train, color='blue', label='Training Data')
plt.scatter(X_test, y_test, color='red', label='Testing Data')
plt.plot(X_test, y_pred, color='green', linewidth=3, label='Regression Line')
plt.xlabel('Average Number of Rooms')
plt.ylabel('House Price')
plt.legend()
plt.show()

Code explanation

  • Lines 1–6: We import the required libraries: NumPy for numerical operations, Matplotlib for plotting, and sklearn for machine learning-related functions, specifically the LinearRegression class for linear regression modeling.

  • Lines 9–11: We load the Boston Housing dataset from sklearn and extract only one feature (the average number of rooms per dwelling) for simplicity. X represents this input feature, and y represents the target variable (house prices).

  • Line 14: We divide the dataset into training and testing sets through the train_test_split function. In this process, 80% of the data is allocated for training, while the remaining 20% is reserved for testing.

  • Lines 17–19: We utilize sklearn’s LinearRegression class to construct a linear regression model. Subsequently, we proceed to train this model with the training data. Once trained, we apply the model to make predictions on the testing set.

  • Lines 22–23: We display the linear regression model’s coefficients and intercept.

  • Lines 26–27: We evaluate the model performance using MSE and print the result.

  • Lines 30–36: We plot the training data, the testing data, and the regression line to visualize the model’s predictions, then add a legend and display the plot.

Logistic regression#

Don’t let the name mislead you! Despite its name, logistic regression is actually a classification algorithm rather than a regression algorithm. Logistic regression is used to predict binary outcomes: it determines the likelihood of an event occurring by fitting the data to a logistic function. Because it predicts probabilities, its output values fall within the range of 0 to 1.

The logistic regression model uses the logistic function to describe the likelihood of a binary outcome. The logistic function is an S-shaped curve that converts any real number into a value between 0 and 1, and it is defined as:

P = 1 / (1 + e^-(a + b₁X₁ + b₂X₂ + … + bₙXₙ))

Where:

  • P is the probability value.
  • e is the base of the natural logarithm (about 2.718).
  • a is the y-intercept.
  • bᵢ is the coefficient associated with the predictor variable Xᵢ. Note that the value of i is between 1 and n.
  • Xᵢ is the independent variable.
Logistic regression
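
To make the S-shaped curve concrete, here is a minimal sketch of the logistic function in plain NumPy; the intercept and coefficient values are made up for illustration:

import numpy as np

def logistic(z):
    """The logistic (sigmoid) function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# A hypothetical model with intercept a and a single coefficient b1
a, b1 = -3.0, 1.5
X1 = np.array([0.0, 1.0, 2.0, 3.0, 4.0])

P = logistic(a + b1 * X1)  # P = 1 / (1 + e^-(a + b1*X1))
print(np.round(P, 3))      # the probabilities rise along the S-curve toward 1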

Logistic regression models are evaluated using various metrics that assess the model’s performance in predicting the outcomes, including accuracy, F1-score, and area under the curve (AUC).

Some of the common applications that can use the logistic regression technique include:

  • Marketing and customer analytics: Logistic regression can be used to predict the probability of a customer making a purchase or responding to a marketing campaign based on demographic and behavioral data.

  • Natural language processing (NLP): Text sentiment analysis can use the logistic regression method to classify text as positive or negative based on the sentiment expressed.

  • Epidemiology: Logistic regression can be used to study the risk factors for a particular disease and to predict the likelihood of an individual developing it.

Below, we have a simple example of logistic regression code using Python and the popular machine learning library, sklearn. We'll use the famous Breast Cancer Wisconsin dataset included in sklearn.

Note: The following code will output an image and the values of accuracy, the confusion matrix, the classification report, the model intercept, and different coefficients. Click the “>” button in the output window to see the values.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report, roc_curve, auc

# Loading the Breast Cancer Wisconsin dataset (binary classification)
cancer = load_breast_cancer()
X = cancer.data
y = cancer.target

# Splitting the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardizing features (important for logistic regression)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Creating a logistic regression model, training the model, and making predictions
model = LogisticRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Evaluating the model performance
accuracy = accuracy_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)
classification_rep = classification_report(y_test, y_pred)

# Printing the results
print(f"Accuracy: {accuracy:.2f}")
print("Confusion Matrix:\n", conf_matrix)
print("Classification Report:\n", classification_rep)

# The coefficients and feature importance
print("Model Coefficients:\n", model.coef_)
print("Model Intercept:\n", model.intercept_)

# Computing ROC curve and ROC area
fpr, tpr, _ = roc_curve(y_test, model.predict_proba(X_test)[:, 1])
roc_auc = auc(fpr, tpr)

# Plotting ROC curve
plt.figure(figsize=(8, 6))
plt.plot(fpr, tpr, color='darkorange', lw=2, label='ROC curve (AUC = {:.2f})'.format(roc_auc))
plt.plot([0, 1], [0, 1], color='navy', lw=2, linestyle=':')
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Receiver Operating Characteristic (ROC) Curve')
plt.legend(loc="lower right")
plt.show()

Code explanation

  • Lines 1–7: We import required libraries for data manipulation, visualization, and machine learning model evaluation.

  • Lines 10–12: We load the Breast Cancer Wisconsin dataset, which is a binary classification dataset, and assign features to X and target labels to y.

  • Line 15: We divide the dataset into training and testing sets through the train_test_split function. In this process, 80% of the data is allocated for training, while the remaining 20% is reserved for testing.

  • Lines 18–20: We standardize the features using StandardScaler to ensure that features have mean=0 and variance=1. This is important for logistic regression.

  • Lines 23–25: We then create a logistic regression model, train it on the training data (X_train, y_train), and make predictions on the test data (X_test).

  • Lines 28–35: We evaluate the model performance using accuracy, the confusion matrix, and the classification report, and we print the results.

  • Lines 38–39: We print the coefficients and intercept of the logistic regression model.

  • Lines 42–43: We then compute the ROC curve and AUC for the model.

  • Lines 46–55: After computing the ROC and AUC, we plot the ROC curve to visualize the performance of the logistic regression model.

Decision tree#

Another popular machine learning algorithm, the decision tree, can be used for both classification and regression tasks. The objective is to construct a model capable of predicting the value of a target variable by learning simple decision rules inferred from the features in the data. The technique recursively partitions the data based on its features, creating a tree-like structure in which each node represents a decision made on a feature. These decision rules then guide the model when it generates predictions for previously unseen data.

Decision tree

Decision tree algorithms employ different approaches to create decision trees. Two common splitting criteria for classification trees are:

  • Gini impurity (Gini index) minimizes the Gini impurity at each node. Gini impurity quantifies the probability of incorrectly classifying a randomly selected element within the dataset. A lower Gini index indicates a better split.

  • Information gain (entropy) maximizes information gain, which measures the reduction in entropy (uncertainty or disorder) after a dataset is split. Lower entropy is usually preferable in decision trees.

For regression trees, MSE is typically used as the splitting criterion instead (a small code sketch of the two classification criteria follows).
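
Both classification criteria can be computed directly from the class proportions at a node. Here is a small sketch of the two formulas; the example proportions are arbitrary:

import numpy as np

def gini(p):
    """Gini impurity: the probability of misclassifying a random element."""
    p = np.asarray(p)
    return 1.0 - np.sum(p ** 2)

def entropy(p):
    """Shannon entropy in bits: the uncertainty of the class distribution."""
    p = np.asarray(p)
    p = p[p > 0]  # avoid log2(0)
    return np.sum(p * np.log2(1 / p))

pure = [1.0, 0.0]   # all samples in one class
mixed = [0.5, 0.5]  # an evenly split node
print(gini(pure), gini(mixed))        # 0.0 vs. 0.5: lower means a better split
print(entropy(pure), entropy(mixed))  # 0.0 vs. 1.0: splits aim to reduce this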

Classification decision tree models are evaluated using various metrics that assess the model’s performance in predicting the outcomes, including accuracy, F1-score, and AUC. Regression trees are evaluated with mean absolute error (MAE) and R-squared (the coefficient of determination): R-squared measures the proportion of the dependent variable’s variance that can be predicted from the independent variables, and MAE is the average of the absolute differences between the predicted and actual values.
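
Both regression metrics are available in sklearn.metrics. Below is a minimal sketch that fits a regression tree on synthetic data and reports MAE and R-squared; the data and the tree depth are arbitrary choices for illustration:

import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error, r2_score

# Synthetic regression data: a noisy sine wave
rng = np.random.RandomState(42)
X = np.sort(rng.uniform(0, 6, 200)).reshape(-1, 1)
y = np.sin(X).ravel() + rng.randn(200) * 0.1

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# A shallow tree to limit overfitting
reg = DecisionTreeRegressor(max_depth=3, random_state=42)
reg.fit(X_train, y_train)
y_pred = reg.predict(X_test)

print("MAE:", mean_absolute_error(y_test, y_pred))  # average absolute error
print("R^2:", r2_score(y_test, y_pred))             # variance explained (1.0 is perfect)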

Some of the common applications that use decision tree models include:

  • Credit scoring: Decision trees are used to assess creditworthiness by analyzing factors such as income, credit history, and debt.

  • Customer churn prediction: Decision trees predict whether a customer is likely to churn based on factors like usage patterns, customer service interactions, and feedback.

  • Energy consumption forecasting: Decision trees forecast energy consumption by analyzing historical usage patterns, weather data, and other relevant factors.

  • Demand forecasting: In retail, decision trees predict product demand based on historical sales data, promotions, and external factors.

Below, we have an implementation for classification trees. We’ll use the Breast Cancer Wisconsin dataset here as well.

Note: The following code will output an image and the values of accuracy, the confusion matrix, and the classification report for the decision tree model. Click the “>” button in the output window to see the values.

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, plot_tree
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
import matplotlib.pyplot as plt

# Loading the Breast Cancer Wisconsin dataset
cancer = load_breast_cancer()
X = cancer.data
y = cancer.target

# Splitting the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Creating a decision tree classifier
dt_classifier = DecisionTreeClassifier(random_state=50)

# Training the model on the training set and making predictions on the test set
dt_classifier.fit(X_train, y_train)
y_pred = dt_classifier.predict(X_test)

# Evaluating the model performance
accuracy = accuracy_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)
classification_rep = classification_report(y_test, y_pred)

# Printing the results
print(f"Accuracy: {accuracy:.2f}")
print("Confusion Matrix:\n", conf_matrix)
print("Classification Report:\n", classification_rep)

# Visualizing the decision tree
plt.figure(figsize=(20, 10))
plot_tree(dt_classifier, feature_names=cancer.feature_names, class_names=cancer.target_names, filled=True)
plt.show()

Code explanation

  • Lines 1–5: We import required libraries for data manipulation, visualization, and machine learning model evaluation.

  • Lines 8–10: We load the Breast Cancer Wisconsin dataset, which is a binary classification dataset, and assign features to X and target labels to y.

  • Line 13: As in the previous examples, we divide the data into training and testing sets.

  • Line 16: We create an instance of the DecisionTreeClassifier class, which is used to build decision tree models for classification.

  • Lines 19–20: Next, we train the decision tree classifier on the training data (X_train and y_train); the model learns a mapping from the input features to the target variable. We then use the trained model to make predictions on the testing set (X_test); the variable y_pred contains the predicted labels.

  • Lines 23–30: We evaluate the model performance using accuracy, the confusion matrix, and the classification report, and we print the results.

  • Lines 33–35: Finally, we visualize the decision tree using the plot_tree function from sklearn. This visualization helps in understanding how the decision tree makes decisions based on the features. The tree is displayed with feature names and class names, and nodes are filled with colors to represent class distribution.

Comparison of models#

The table below gives a brief comparison of the three algorithms. We have included some benefits and drawbacks of using these algorithms. 

Model Comparison

| Algorithm | Advantages | Disadvantages | Use Cases |
|---|---|---|---|
| Linear regression | • Simple to implement • Interpretable results • Works well with linearly related data | • Assumes a linear relationship • Sensitive to outliers | • Predictive analysis • Trend forecasting |
| Logistic regression | • Simple and efficient • Provides probabilities • Efficient for binary classification | • Assumes a linear relationship • Sensitive to outliers • Prone to overfitting • Not ideal for imbalanced datasets | • Email spam detection • Credit risk analysis |
| Decision tree | • Easy to interpret • Handles numerical and categorical data • Minimal data preparation • Works for both classification and regression tasks | • Prone to overfitting • Complex trees can be difficult to interpret • Not suitable for linear relationships | • Customer churn prediction • Risk assessment |

Key takeaways and next steps#

In this blog, we’ve familiarized ourselves with the top three machine learning algorithms widely employed in the industry. We explored the advantages and disadvantages of each algorithm and walked through their basic implementations in Python.

This comprehensive overview equips us with the knowledge needed to make informed choices when selecting and implementing machine learning algorithms for diverse applications. Additionally, understanding the intricacies of implementation in Python enhances our versatility in addressing real-world data science challenges across different environments and scenarios.

Don’t stop here! You can explore and practice different techniques and libraries to build more accurate and robust models. We encourage you to check out the following courses on Educative:

Fundamentals of Machine Learning: A Pythonic Introduction


This course focuses on core concepts, algorithms, and machine learning techniques. It explores the fundamentals, implements algorithms from scratch, and compares the results with scikit-learn, the Python machine learning library. This course contains examples, theoretical knowledge, and codes for various ML algorithms. You’ll start by learning the essentials of machine learning and its applications. Then, you’ll learn about supervised learning, clustering, and constructing a bag of visual words project, followed by generalized linear regression, support vector machines, logistic regression, ensemble learning, and principal component analysis. You’ll also learn about autoencoders and variational autoencoders and end with three exciting projects. By the end, you’ll have a solid understanding of machine learning and its algorithms, hands-on experience implementing such algorithms and applying them to different problems, and an understanding of how each algorithm works with the provided examples.

14hrs
Beginner
148 Playgrounds
21 Quizzes

Hands-on Machine Learning with Scikit-Learn


Scikit-Learn is a powerful library that provides a handful of supervised and unsupervised learning algorithms. If you’re serious about having a career in machine learning, then scikit-learn is a must know. In this course, you will start by learning the various built-in datasets that scikit-learn offers, such as iris and mnist. You will then learn about feature engineering and more specifically, feature selection, feature extraction, and dimension reduction. In the latter half of the course, you will dive into linear and logistic regression where you’ll work through a few challenges to test your understanding. Lastly, you will focus on unsupervised learning and deep learning where you’ll get into k-means clustering and neural networks. By the end of this course, you will have a great new skill to add to your resume, and you’ll be ready to start working on your own projects that will utilize scikit-learn.

5hrs
Intermediate
5 Challenges
2 Quizzes

Written By:
Kamran Lodhi
 