
Introduction to machine learning (2024)

izza ahmad
Dec 11, 2023
16 min read


Once a budding field, machine learning (ML) has since become an integral part of our daily lives. It has been seamlessly integrated into various applications like automated translation and self-driving cars. At its core, machine learning is a subfield of artificial intelligence (AI). It focuses on building systems that learn and improve from experience. Let’s break down the complexities of machine learning into more understandable segments.

If you’ve been hearing a lot lately about “Machine Learning” but feel a bit lost in the tech jargon, you’re not alone. This blog has been made just for you! It will provide you with all the necessary details you need to get equipped as a beginner, including types of machine learning, common machine learning algorithms, the complete steps in machine learning, applications like pattern recognition, regression analysis, and automation, and finally Python code to solve an ML problem. By the end of this blog, you’ll be solving real-world machine-learning problems as well!

What is machine learning?#

Machine learning (ML) lets computers learn from data and improve over time without specific human instructions. It's not like traditional programming, where engineers write step-by-step instructions. Instead, ML allows computers to solve problems by learning from past data.

This learning happens through algorithms that analyze labeled data (training data). These algorithms enable the computer to recognize patterns and make predictions on its own. For example, after showing a machine many pictures of apples and pears, it can learn to identify them.

Why does ML stand out? It can process volumes of data that would be impractical for humans to review, and on well-defined tasks it can match or exceed human accuracy.

How does machine learning work?#

ML enables computers to mimic human learning. This approach focuses on analyzing data, recognizing patterns, and improving from past experience, all with minimal human input.

For example, you would start by feeding the computer pictures of different vegetables, each labeled with its name. The ML model examines these photos, identifying patterns in the shape, size, and color of the vegetables.

This process is called data labeling or tagging. For example, consider sentiment analysis. Here, customer feedback is labeled as positive, neutral, or negative. The model then learns to link these tags to data characteristics, such as specific words in text or colors in images.

The model is tested with new, unseen data after training. If it accurately identifies and categorizes this data, it's ready for real-world tasks. Otherwise, it needs more training. Over time, as it encounters more data, the model keeps learning and improving its accuracy. That's the crux of how ML works — it's a continuous process of learning, testing, and refining.

What are the types of machine learning?#

There are three main types of ML: supervised, unsupervised, and reinforcement learning. Each guides machines toward decisions in a different way. Let's explore them in turn.

Types of Machine Learning

Supervised learning#

Supervised learning involves training models with labeled data. Each example in the dataset pairs an input with the correct output, which guides the model in recognizing patterns and making predictions. For instance, to learn how to identify different types of vehicles, a model is fed images labeled 'car,' 'truck,' or 'bike.' The model then learns to classify new, unlabeled images into these categories. This method is crucial for many real-life prediction tasks, such as estimating house prices from historical sales data or classifying emails as spam or non-spam. The process is straightforward. First, the model is trained with labeled examples, and then it applies what it has learned to new, unseen data. Its accuracy and reliability typically improve as it is trained on more representative examples.
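As a rough illustration of the train-then-predict flow described above, here is a minimal sketch using scikit-learn's DecisionTreeClassifier. The tiny vehicle dataset (number of wheels and weight in kilograms) is made up for this example and stands in for the labeled images mentioned in the text.

```python
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data: [number of wheels, weight in kg] for each vehicle.
X_train = [
    [2, 10], [2, 14], [2, 12],             # bikes
    [4, 1200], [4, 1500], [4, 1350],       # cars
    [6, 8000], [6, 12000], [10, 15000],    # trucks
]
y_train = ["bike", "bike", "bike", "car", "car", "car", "truck", "truck", "truck"]

# Training: the model sees inputs paired with the correct labels.
model = DecisionTreeClassifier(random_state=0)
model.fit(X_train, y_train)

# Prediction: the model labels new, unseen examples.
X_new = [[2, 15], [4, 1400], [6, 9000]]
predictions = model.predict(X_new).tolist()
print(predictions)
```

Any classifier with fit and predict methods could be swapped in here; a decision tree is used only because it needs no feature scaling for such a small example.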

Unsupervised learning#

Unsupervised learning uses algorithms to find patterns in unlabeled data without set outcomes. It's crucial for analyzing abundant, unlabeled data. Clustering is a common method that groups similar data for insights. For example, banks might use unsupervised learning for customer segmentation. This is based on credit behavior and helps enable tailored services.

This technique also aids in feature learning and anomaly detection, making it effective for categorizing complex data. E-commerce sites can use browsing habits for personalized product recommendations. Unsupervised learning operates without direct guidance, mimicking human pattern recognition.
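The customer segmentation idea can be sketched with k-means clustering from scikit-learn. The customer figures below (monthly card spend and number of transactions) are invented for illustration, and the choice of two clusters is an assumption rather than something the data dictates.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical customers: [monthly card spend, transactions per month].
customers = np.array([
    [200, 1], [220, 2], [210, 1],         # low-activity customers
    [5000, 20], [5200, 22], [5100, 21],   # high-activity customers
], dtype=float)

# No labels are supplied; the algorithm groups similar rows on its own.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(customers)

for row, label in zip(customers, labels):
    print(f"spend={row[0]:.0f}, transactions={row[1]:.0f} -> cluster {label}")
```

The cluster numbers themselves are arbitrary. What matters is which customers end up grouped together.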

Reinforcement learning#

Reinforcement learning (RL) is an ML method in which a software agent learns to make decisions by trial and error. The aim is to maximize rewards. Unlike supervised learning, this approach does not rely on a labeled training dataset. Instead, the agent learns from the consequences of its own actions, improving over time. This approach is significant in robotics and gaming, where actions directly correlate with outcomes, such as scores in video games.

In RL, the agent, through repeated attempts, discovers which actions yield the highest rewards. Imagine teaching a dog a trick. It learns not from instructions but from rewards or corrections. Similarly, an RL agent starts with clumsy attempts but eventually evolves into an adept entity.

A simplified RL scenario involves asking the system to identify a fruit. If the model incorrectly identifies an apple as an orange, it receives corrective feedback, such as a penalty. Learning from this, it is more likely to identify the apple correctly the next time.
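To make the trial-and-error loop concrete, here is a minimal sketch of an epsilon-greedy agent on a three-armed bandit, one of the simplest RL settings. The reward probabilities, the function name run_bandit, and the exploration rate are all assumptions chosen for this illustration.

```python
import random

def run_bandit(reward_probs, steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent: mostly exploit the best-known action, sometimes explore."""
    rng = random.Random(seed)
    n_actions = len(reward_probs)
    counts = [0] * n_actions      # how often each action was tried
    values = [0.0] * n_actions    # running average reward per action
    total_reward = 0
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)                          # explore
        else:
            action = max(range(n_actions), key=lambda a: values[a])    # exploit
        # The environment returns a reward of 1 with the action's hidden probability.
        reward = 1 if rng.random() < reward_probs[action] else 0
        counts[action] += 1
        # Incremental update of the average reward for the chosen action.
        values[action] += (reward - values[action]) / counts[action]
        total_reward += reward
    return values, counts, total_reward

# Hypothetical environment: action 2 pays off most often.
values, counts, total_reward = run_bandit([0.2, 0.5, 0.8])
best_action = max(range(len(values)), key=lambda a: values[a])
print("Estimated values:", [round(v, 2) for v in values])
print("Best action found:", best_action)
```

The agent is never told which action is best. It discovers this from rewards alone, much like the dog learning a trick.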

The machine learning process#

The gist of machine learning can be explained by understanding the steps we take when solving a machine learning problem.

Steps in Machine Learning

Step 1: Dataset selection#

Solving any problem requires a sufficient dataset that accurately represents the knowledge required for solving the problem statement. Incorrect, irrelevant, or outdated data may result in suboptimal results. For instance, if we’re solving a problem that requires predicting lung cancer, it would be wise to obtain a dataset containing information on people's ages, smoking habits, health history, fatigue, finger coloration (in case of yellowing from smoking), and difficulty breathing.

Step 2: Data preprocessing#

Once a dataset has been selected, it’s first preprocessed before proceeding to the next step. Preprocessing mainly involves cleaning the raw data, removing discrepancies, and making it more suitable for the machine learning algorithm.

Here are a few data cleaning techniques:

  1. Handling missing values: The original data may contain missing values in some columns. These gaps can be filled with the most frequent value, imputed from the nearest neighbors, or removed by deleting the affected rows or columns. Learn more on the best techniques for how to handle missing values in machine learning.

  2. Outlier detection: Outliers are data points that differ drastically from the rest of the data, and this deviation can make results less accurate. Thankfully, outliers can be handled in various ways, such as replacing them with the mean or excluding them entirely. Learn more on how to handle outliers in machine learning.

  3. Feature scaling: Feature scaling involves transforming the data so that all features fall within a comparable range, for example by rescaling values to between 0 and 1 (min-max normalization) or to zero mean and unit variance (standardization).

  4. One-hot encoding: One-hot encoding is a technique for converting categorical data into numerical data by making a column for each category and assigning 1 to the category of that row and 0 to all other category columns. This helps the model learn easily due to the binary nature of the new columns.

Step 3: Feature selection and extraction#

Feature selection involves selecting existing variables that are useful for helping the model learn and make accurate conclusions. Learning patient names, for example, will not help the model predict the incidence of lung cancer—but learning about a patient’s smoking history certainly will. On the other hand, feature extraction refers to deriving new data from existing information, such as the texture obtained from a lung scan image. You should opt for feature selection when the existing data is relevant, and you should opt for feature extraction when the existing data is redundant—or when combining the data to form new variables would be more helpful than relying on standalone variables. The two main benefits of these techniques are that they reduce the input size of the data and get rid of noise.
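The difference between the two techniques can be seen in a short sketch on scikit-learn's built-in Iris dataset (used here only because it is readily available). SelectKBest keeps two of the original columns, while PCA, one possible extraction method, builds two new columns as combinations of all four.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_iris(return_X_y=True)   # 150 samples, 4 features

# Feature selection: keep the 2 existing features most related to the label.
selector = SelectKBest(score_func=f_classif, k=2)
X_selected = selector.fit_transform(X, y)
kept_columns = selector.get_support(indices=True)

# Feature extraction: derive 2 new features as combinations of the originals.
pca = PCA(n_components=2)
X_extracted = pca.fit_transform(X)
explained = pca.explained_variance_ratio_.sum()

print("Original shape:", X.shape)
print("Selected columns:", kept_columns, "->", X_selected.shape)
print("Extracted shape:", X_extracted.shape)
print(f"Variance retained by PCA: {explained:.1%}")
```

Both approaches reduce the input from four columns to two. Selection keeps the columns interpretable, whereas extraction may retain more information at the cost of less readable features.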

Step 4: Model selection#

Choosing the correct model is essential to obtaining optimal results for the given problem, especially because different machine learning models target different business needs. Therefore, it’s crucial to select the model most applicable to the problem statement. We’ll learn more about some important models later in this blog.

Step 5: Model training#

A model learns from the dataset during its training phase. It makes sense of the variables and information we provide to it, and it then tries to infer appropriate findings based on this information. We could use any type of machine learning depending on the problem and dataset. For instance, in our cancer dataset scenario, alongside the indicator variables (like smoking, fatigue, etc.), the dataset also contains a column indicating whether or not the person has cancer. This way, the model can learn the mapping between inputs and outputs and, in turn, predict or classify new data. This is an example of supervised learning.

The machine learning process also consists of data separated for various purposes, as described below:

  • Training set: The model is initially trained on a training dataset in order to learn patterns in the data.  

  • Validation set: Overfitting, where the model memorizes the training data instead of learning patterns that generalize, is a problem that can occur during training. To detect it, we use a validation set that is separate from the training set to see how well the model works on data it was not trained on. Based on this assessment, we adjust the model, for example by tuning its hyperparameters or stopping training early.

  • Testing set: Once the model has trained itself, an entirely separate dataset can be used for a final evaluation to gauge how well the trained model performs on real-world data. 
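One way to produce the three sets is to call scikit-learn's train_test_split twice. The sketch below assumes a 60/20/20 split of the Iris dataset and uses the validation set to choose k for a KNN classifier. The proportions and candidate values of k are illustrative choices.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# First hold out 20% as the test set, then carve 25% of the rest for validation.
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=0.25, stratify=y_rest, random_state=0)

# Use the validation set to compare candidate settings.
candidate_k = [1, 3, 5, 7]
val_scores = {}
for k in candidate_k:
    model = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    val_scores[k] = model.score(X_val, y_val)

best_k = max(val_scores, key=val_scores.get)

# Touch the test set only once, for the final evaluation.
final_model = KNeighborsClassifier(n_neighbors=best_k).fit(X_train, y_train)
test_accuracy = final_model.score(X_test, y_test)

print("Set sizes:", len(X_train), len(X_val), len(X_test))
print("Validation accuracy per k:", val_scores)
print(f"Chosen k={best_k}, test accuracy={test_accuracy:.3f}")
```

Keeping the test set out of every tuning decision is what makes the final score a fair estimate of performance on new data.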

Loss function#

We use a loss function in machine learning to calculate the difference between the predicted and actual values. This helps guide the model in improving its predictions. For instance, mean squared error (MSE) is a common loss function that calculates the average of the squared difference between predicted and actual values and is often used in regression problems.
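As a small worked example, MSE can be computed by hand in plain Python. The house-price-style numbers below are made up for illustration.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between actual and predicted values."""
    squared_errors = [(actual - predicted) ** 2 for actual, predicted in zip(y_true, y_pred)]
    return sum(squared_errors) / len(squared_errors)

# Hypothetical actual vs. predicted values.
y_true = [3.0, 5.0, 2.5, 7.0]
y_pred = [2.5, 5.0, 4.0, 8.0]

# Squared errors: 0.25, 0.0, 2.25, 1.0 -> mean = 3.5 / 4
mse = mean_squared_error(y_true, y_pred)
print(mse)  # 0.875
```

Because the errors are squared, the single prediction that is off by 1.5 contributes more to the loss than the other three combined, which is why MSE penalizes large mistakes heavily.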

Step 6: Model evaluation#

The quality of the model can be judged by evaluating it against various metrics. This is crucial in order to ensure its effectiveness in making accurate predictions. A few key evaluators are discussed below:

Accuracy: Determines the proportion of all predictions that are correct.

Precision: Determines the proportion of instances correctly predicted as positive out of all instances predicted as positive.

Recall / Sensitivity: Determines the proportion of instances correctly predicted as positive out of all actually positive instances.

Specificity: Determines the proportion of instances correctly predicted as negative out of all actually negative instances.

F1 score: Balances precision and recall by taking their harmonic mean.

Confusion matrix: A grid visualization of true/false positives and true/false negatives.

ROC Curve and AUC-ROC: A visualization that illustrates the trade-off between true and false positive rates across classification thresholds, summarized by the area under the curve.
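The first five metrics all derive from the four cells of a binary confusion matrix. Here is a minimal sketch in plain Python, with made-up labels where 1 is the positive class. The helper name binary_metrics is hypothetical.

```python
def binary_metrics(y_true, y_pred):
    """Compute common metrics from true/false positives and negatives.

    Assumes both classes appear in the data, so no denominator is zero.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "specificity": tn / (tn + fp),
        "f1": 2 * precision * recall / (precision + recall),
    }

# Hypothetical labels: 4 actual positives, 6 actual negatives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

metrics = binary_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Here the model makes one false negative and one false positive, so accuracy is 0.8 while precision and recall are both 0.75.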

Step 7: Hyperparameter tuning#

Based on the results obtained after evaluating the model, we can work on improving its performance. Hyperparameter tuning involves adjusting various external model parameters and assessing their impact on the model’s performance. This process aims to identify the optimal values for these external configurations in order to improve the performance of the machine learning model.
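A common way to automate this search is scikit-learn's GridSearchCV, which tries every combination in a grid and scores each with cross-validation. The sketch below tunes a KNN classifier on the Iris dataset. The grid values are arbitrary choices for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Hyperparameters are set before training, not learned from the data.
param_grid = {
    "n_neighbors": [1, 3, 5, 7, 9],
    "weights": ["uniform", "distance"],
}

# Each of the 10 combinations is evaluated with 5-fold cross-validation.
search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5, scoring="accuracy")
search.fit(X, y)

print("Best hyperparameters:", search.best_params_)
print(f"Best cross-validated accuracy: {search.best_score_:.3f}")
```

Grid search is exhaustive, so it becomes slow as the grid grows. Randomized search is a frequently used alternative for larger search spaces.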

Step 8: Predictions#

Once the model has been fully trained and optimized, it can finally be used to solve the required problem on unseen data. For instance, a trained model on house price prediction can now efficiently predict the price of a new house using the provided input parameters.

Machine learning algorithms#

The world is full of immense data, and this data can be used to solve various problems by inferring useful analytics from it. Let’s go through a few common machine learning algorithms and see how they work.

Model

Explanation

Linear regression

We use linear regression to predict a response variable by utilizing one or more given variables. We achieve this by establishing connections between different variables and using this information to determine the best line for predicting new values. For instance, we can use temperature to predict ice cream sales.

Logistic regression

We also use logistic regression to make predictions, but it does not yield continuous output. Instead, it estimates the probability that an input belongs to a class and maps that probability to either 0 or 1. It is therefore a classification algorithm, as we group inputs into two categories. For example, we predict whether a student will pass an exam based on past grades.

Decision trees

We deploy decision trees to guide decision-making by segregating data using a series of yes/no questions. Each branch of a tree represents one of the possible conditions that is true for the data, and following a path leads to a certain outcome. For example, for house price predictions, the first branch could be the number of rooms, and the second could be whether or not the rooms contain air conditioners.

Random forest

We use random forest as an ensemble of many decision trees. This results in less overfitting and more accurate results, as we average the predictions of the individual trees (or take a majority vote for classification). An example could be predicting the outcome of a medical diagnosis by incorporating insights from various decision trees.

Support vector machine

We use a support vector machine (SVM) to efficiently segregate data into distinct categories by establishing the best possible separation boundary. For instance, we use SVMs for email classification, determining whether or not an email is spam based on various features like content or sender.

K-nearest neighbors

We use k-nearest neighbors (KNN) as a supervised machine learning algorithm for both classification and regression tasks. In simple terms, we predict the label or value of a new data point based on the majority class or average value of its nearest labeled neighbors. We choose the value of k in advance; for instance, k = 5 means the five nearest neighbors are consulted. An application example includes predicting customer preferences by examining the purchasing patterns of similar customers.

K-means clustering

We use k-means clustering as an unsupervised machine learning technique. We apply it to group data, and it’s valuable for identifying patterns or relationships in data. For example, we group customers based on buying behaviors in order to tailor marketing strategies for each cluster.

Naive Bayes

We use naive Bayes, a probabilistic classifier based on Bayes’ theorem. It relies on conditional probability, that is, the probability of an event given that another event has already occurred, and it “naively” assumes that features are independent of one another. For instance, we use naive Bayes for sentiment analysis of customer reviews, classifying them as positive or negative based on certain keywords.

Gradient boosting

We use gradient boosting as a sequential process for gradually correcting prediction errors. We initiate it with an initial model, and upon obtaining results, another model is applied to address poor predictions. We then merge the outcomes of both models, refining the errors of the preceding model. For instance, we can apply gradient boosting to predict stock prices based on previous trends by iteratively improving prediction accuracy.
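Most of the classifiers in the table share the same fit/predict interface in scikit-learn, so they are easy to compare side by side. The sketch below runs 5-fold cross-validation on the Iris dataset for the classification models listed above. The dataset and the mostly default settings are assumptions made for brevity, and the scores say nothing about how these models rank on other problems.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

models = {
    "Logistic regression": LogisticRegression(max_iter=1000),
    "Decision tree": DecisionTreeClassifier(random_state=0),
    "Random forest": RandomForestClassifier(random_state=0),
    "Support vector machine": SVC(),
    "K-nearest neighbors": KNeighborsClassifier(n_neighbors=5),
    "Naive Bayes": GaussianNB(),
    "Gradient boosting": GradientBoostingClassifier(random_state=0),
}

# Mean accuracy over 5 folds for each model.
scores = {name: cross_val_score(model, X, y, cv=5).mean() for name, model in models.items()}

for name, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:<24} {score:.3f}")
```

Linear regression and k-means are left out because they solve different kinds of problems (predicting continuous values and unsupervised grouping, respectively).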

Classification example in Python#

Classification is one of the most prominent problems in machine learning. To understand the complete process of machine learning, let’s try to solve a classification problem in Python where we’ll classify a flower’s species according to its various features.

First, we begin by importing the required libraries for data visualization and machine learning.

  • The matplotlib.pyplot and seaborn libraries are imported for creating plots and visualizations.

  • The sklearn machine learning library is used to import the datasets, KNN classifier, train-test split, and performance metrics.

import matplotlib.pyplot as plt
import seaborn as sns
from sklearn import datasets
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, recall_score, precision_score, roc_auc_score, confusion_matrix

Next, we load the Iris dataset, a commonly used dataset in machine learning. We separate the dataset into feature variables (X) and target labels (y) i.e. the species class.

iris_dataset = datasets.load_iris()
X = iris_dataset.data
y = iris_dataset.target

We then split the dataset into training and testing sets, allocating 80% of the data for training the machine learning model.

X_train, X_test, y_train, y_test = train_test_split(X, y, train_size = 0.8)

Here we make the important choice of model selection. Since this is a multi-class classification problem, we create a KNN model with n_neighbors=3 and train it using the training data.

k_nearest_neighbour_model = KNeighborsClassifier(n_neighbors = 3)
k_nearest_neighbour_model.fit(X_train, y_train)

After training the model, we use it to make predictions on the test set.

y_prediction = k_nearest_neighbour_model.predict(X_test)

To evaluate the model, we calculate and store various performance metrics like accuracy, precision, recall, ROC AUC, and the confusion matrix.

accuracy = accuracy_score(y_test, y_prediction)
precision = precision_score(y_test, y_prediction, average = "weighted")
recall = recall_score(y_test, y_prediction, average = "weighted")
roc_auc = roc_auc_score(y_test, k_nearest_neighbour_model.predict_proba(X_test), multi_class="ovr")
conf_matrix = confusion_matrix(y_test, y_prediction)

To visualize the confusion matrix, we use the seaborn library to create a heatmap.

sns.heatmap(conf_matrix, annot = True, fmt = "d", cmap = "Blues", xticklabels = iris_dataset.target_names, yticklabels = iris_dataset.target_names)
plt.xlabel('Predicted')
plt.ylabel('Actual')
plt.title('Confusion Matrix')
plt.show()

Finally, we collect the calculated metrics into a dictionary and display them using a bar chart.

metrics = {'Accuracy': accuracy, 'Precision': precision, 'Recall': recall, 'ROC AUC': roc_auc}
metric_names, metric_values = zip(*metrics.items())
fig, ax = plt.subplots()
bars = ax.bar(metric_names, metric_values, color = ["#212129", "#380000", "#330033", "#003333"])
# Write each metric's value just above its bar
for bar, value in zip(bars, metric_values):
    ax.text(bar.get_x() + bar.get_width() / 2 - 0.1, bar.get_height() + 0.01, f'{value:.3f}', color='black', fontsize=10)
plt.ylabel('Score')
plt.title('Performance')
plt.show()

Analyse machine learning plots#

Let’s understand the results plotted by the model below.

The dark blue boxes along the diagonal of the confusion matrix show the samples the model predicted correctly (e.g., an Iris setosa predicted as setosa, an Iris versicolor predicted as versicolor, and an Iris virginica predicted as virginica). The other values can be read using the labels on the x- and y-axes (e.g., the “1” in the 2nd column of the 3rd row depicts a virginica sample incorrectly predicted as versicolor).

Confusion Matrix Plot

The performance metrics below represent the accuracy, precision, recall, and ROC AUC values of the trained model.

Performance Metrics Plot

Congratulations, you solved your first machine learning problem! Although it was a beginner-friendly dataset and solution, these are the major steps involved in any machine learning problem, paired with the other steps we discussed earlier in the blog where applicable. With your newfound knowledge, you can start solving similar problems on your own and gain hands-on experience!

What are some common uses of machine learning?#

Here's a list of some of the most common uses of machine learning:

Pattern recognition#

Pattern recognition uses ML to identify patterns automatically. It observes various data types, including text, image, and sound data. These systems swiftly and precisely detect familiar patterns, streamlining data analysis. Common applications of pattern recognition are as follows:

  • Image recognition

  • Fingerprint scanning

  • Seismic activity analysis

Regression analysis#

Regression models predict a continuous outcome variable (y) based on one or more predictor variables (x). Regression analysis is a fundamental technique in ML.

Automation#

In software testing, for example, ML recognizes patterns in the system under test and autonomously generates test cases. This reduces reliance on manual test creation, enhancing productivity and expediting processes by 48%.

Data mining#

Data mining involves detecting patterns, forecasting outcomes, and deriving significant insights from extensive datasets.

What is the best way to learn machine learning?#

Machine learning uses algorithms like decision trees and logistic regression to analyze data, predict outcomes, and automate processes, revolutionizing areas from pattern recognition to data mining. ML includes supervised, unsupervised, and reinforcement learning. Each contributes to understanding and making use of vast datasets.

Pick a language like Python or R for machine learning. To start learning machine learning skills, explore the "Fundamentals of Machine Learning for Software Engineers" course. It's perfect for beginners seeking hands-on experience with real-world datasets and the intricacies of AI and deep learning. The deep learning vs. machine learning blog gives a comprehensive explanation of the two approaches. For beginners wanting to take a complete skill path, the Become a Machine Learning Engineer path is the way to go.

Test your newfound ML knowledge!

Question

A problem where resulting labels are not provided and the model has to learn itself by finding similar patterns is called: _____
