The XOR problem in neural networks

The XOR ("exclusive OR") problem is a classic problem in artificial intelligence and machine learning. A single-layer perceptron cannot solve it, so it requires a network with at least one hidden layer, such as a multi-layer perceptron.

This Answer explains the XOR problem and shows how a neural network can solve it. First, let's look at what the XOR function is.

The XOR function

The XOR function is a binary function that takes two binary inputs and returns a binary output. The output is true if the number of true inputs is odd, and false otherwise. In other words, it returns true if exactly one of the inputs is true, and false otherwise.

The following table shows the truth table for the XOR function:

Input A | Input B | Output (A XOR B)
--------|---------|-----------------
0       | 0       | 0
0       | 1       | 1
1       | 0       | 1
1       | 1       | 0
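As a quick check of the truth table, the sketch below computes XOR for every input pair in plain Python. The function name `xor` is our own choice for illustration; Python's built-in `^` operator does the actual work on integers.

```python
from itertools import product


def xor(a: int, b: int) -> int:
    """Return 1 if exactly one of the two binary inputs is 1, else 0."""
    return a ^ b


# Build the truth table for all four input combinations.
truth_table = [(a, b, xor(a, b)) for a, b in product([0, 1], repeat=2)]

for a, b, out in truth_table:
    print(a, b, out)
```

Each printed row lists Input A, Input B, and the XOR output, in the same order as the table.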

The perceptron and its limitations

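A perceptron is a type of artificial neuron used in machine learning. It takes multiple inputs, applies weights to them, sums them up, and passes the sum through an activation function to produce an output. The mathematical representation of a perceptron is:

$$y = f\left(\sum_{i=1}^{n} w_i x_i + b\right)$$

Here, $x_i$ are the inputs, $w_i$ are the corresponding weights, $b$ is a bias term, and $f$ is the activation function.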

Despite its simplicity, a single-layer perceptron has a significant limitation: it can only model linearly separable functions. This means it can only classify data points that can be separated by a straight line in a two-dimensional space, or a hyperplane in higher dimensions.
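To make the idea concrete, here is a minimal sketch of a single perceptron in plain Python. It assumes a step (threshold) activation and a bias term; the weights and biases are hand-picked for illustration rather than learned. With these values, one perceptron reproduces the AND and OR gates, both of which are linearly separable.

```python
def step(z: float) -> int:
    """Threshold activation: fire (1) only when the weighted sum is positive."""
    return 1 if z > 0 else 0


def perceptron(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias, passed through the step activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return step(z)


points = [(0, 0), (0, 1), (1, 0), (1, 1)]

# Hand-picked (illustrative) parameters: each gate is one straight-line boundary.
and_outputs = [perceptron(p, weights=(1, 1), bias=-1.5) for p in points]
or_outputs = [perceptron(p, weights=(1, 1), bias=-0.5) for p in points]

print("AND:", and_outputs)
print("OR: ", or_outputs)
```

Only the bias differs between the two gates: changing it shifts the same line so that it cuts off a different set of points.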

Linear separability of points

Linear separability is a concept in machine learning that refers to the ability to distinguish between classes of data points using a straight line (in two dimensions) or a hyperplane (in higher dimensions). If two classes of points can be perfectly separated by such a line or hyperplane, they are considered linearly separable. This concept is fundamental to understanding the limitations of single-layer perceptrons, which can only model linearly separable functions.

Consider the data points below:

Figure: Linearly separable data points
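When the classes are linearly separable, the classic perceptron learning rule is guaranteed to find a separating line. The sketch below illustrates this on the AND gate, which is our own choice of example. The function name `train_perceptron`, the learning rate, and the epoch limit are assumptions made for this illustration.

```python
def step(z):
    """Threshold activation: 1 when the weighted sum is positive, else 0."""
    return 1 if z > 0 else 0


def train_perceptron(samples, targets, max_epochs=100, lr=1.0):
    """Classic perceptron learning rule; stops once an epoch has no mistakes."""
    weights, bias = [0.0, 0.0], 0.0
    for epoch in range(1, max_epochs + 1):
        mistakes = 0
        for (x1, x2), target in zip(samples, targets):
            prediction = step(weights[0] * x1 + weights[1] * x2 + bias)
            error = target - prediction
            if error != 0:
                # Nudge the boundary toward the misclassified point.
                weights[0] += lr * error * x1
                weights[1] += lr * error * x2
                bias += lr * error
                mistakes += 1
        if mistakes == 0:
            return weights, bias, epoch
    return weights, bias, None  # did not converge within max_epochs


samples = [(0, 0), (0, 1), (1, 0), (1, 1)]
and_targets = [0, 0, 0, 1]

weights, bias, converged_epoch = train_perceptron(samples, and_targets)
predictions = [step(weights[0] * x1 + weights[1] * x2 + bias) for x1, x2 in samples]
print("converged:", converged_epoch is not None, "predictions:", predictions)
```

The learned weights and bias describe one line that puts (1, 1) on one side and the other three points on the other.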

The XOR problem

The XOR function is not linearly separable, which means we cannot draw a single straight line to separate the inputs that yield different outputs.

This is illustrated in the following diagram:

Figure: XOR function not linearly separable

In the illustration above, a circle marks the points where the inputs x and y are the same, and a diamond marks the points where they differ. As the figure shows, no single line can separate the circles from the diamonds. Hence, the XOR function is not linearly separable.

This is where the XOR problem in neural networks arises. A single-layer perceptron, due to its linear nature, fails to model the XOR function.
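One way to see this failure numerically is to try many candidate lines and count how many XOR outputs each one gets right. The sketch below does this with a small brute-force search over a grid of weights and biases for a step-activation perceptron. The grid range and step size are arbitrary choices made for illustration.

```python
from itertools import product


def step(z):
    """Threshold activation: 1 when the weighted sum is positive, else 0."""
    return 1 if z > 0 else 0


samples = [(0, 0), (0, 1), (1, 0), (1, 1)]
xor_targets = [0, 1, 1, 0]

# Candidate values for w1, w2, and the bias: -2.0, -1.5, ..., 2.0
grid = [i / 2 for i in range(-4, 5)]

best_correct = 0
for w1, w2, b in product(grid, repeat=3):
    correct = sum(
        step(w1 * x1 + w2 * x2 + b) == target
        for (x1, x2), target in zip(samples, xor_targets)
    )
    best_correct = max(best_correct, correct)

print("best number of correct XOR outputs:", best_correct, "out of 4")
```

No candidate classifies all four points correctly; the best any single line can do is three. A finer grid would not help, because the four required inequalities contradict each other: the bias must be at most 0, both w1 + b and w2 + b must be positive, and yet w1 + w2 + b must be at most 0.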

Overcoming the XOR problem

The XOR problem can be overcome by using a multi-layer perceptron (MLP), a type of feedforward neural network. An MLP consists of multiple layers of neurons with non-linear activation functions, allowing it to model more complex, non-linear functions.

The structure of an MLP solving the XOR problem might look like this:

Figure: Structure of multi-layer perceptron

In this structure, the first layer is the input layer. The second layer (hidden layer) transforms the original non-linearly separable problem into a linearly separable one, which the third layer (output layer) can then solve.
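The role of the hidden layer can be sketched with a tiny hand-wired network. This is an illustrative construction with step activations and hand-picked weights, not the result of training: one hidden unit behaves like OR, the other like AND, and the output unit fires when the first is on and the second is off.

```python
def step(z):
    """Threshold activation: 1 when the weighted sum is positive, else 0."""
    return 1 if z > 0 else 0


def hidden_layer(x1, x2):
    """Two hidden units with hand-picked weights: h1 acts as OR, h2 as AND."""
    h1 = step(x1 + x2 - 0.5)
    h2 = step(x1 + x2 - 1.5)
    return h1, h2


def output_layer(h1, h2):
    """A single unit that fires when h1 is on and h2 is off."""
    return step(h1 - h2 - 0.5)


samples = [(0, 0), (0, 1), (1, 0), (1, 1)]
hidden = [hidden_layer(x1, x2) for x1, x2 in samples]
outputs = [output_layer(h1, h2) for h1, h2 in hidden]

for sample, h, y in zip(samples, hidden, outputs):
    print(sample, "-> hidden", h, "-> output", y)
```

In the hidden space, the two XOR-true inputs both land on the point (1, 0), while the XOR-false inputs land on (0, 0) and (1, 1). A single line can separate these, which is why the output unit succeeds where a lone perceptron fails.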

Solving the XOR problem with a multi-layer neural network

The following is a Python code snippet that uses the Keras library to create a multi-layer perceptron for the XOR function:

import numpy as np
from keras.models import Sequential
from keras.layers import Dense

# XOR inputs and outputs
inputs = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
outputs = np.array([[0], [1], [1], [0]])

# Create a sequential model
model = Sequential()

# Add layers
model.add(Dense(2, input_dim=2, activation='relu'))
model.add(Dense(1, activation='sigmoid'))

# Compile the model
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

# Train the model
model.fit(inputs, outputs, epochs=5000, verbose=0)

# Evaluate the model
print(model.evaluate(inputs, outputs))

Code explanation

Let’s break down the code:

  • Lines 1-3: These lines import the necessary libraries for the code.

  • Lines 6-7: These lines define the inputs and outputs for the XOR function. The inputs are all possible pairs of binary values, and the outputs are the results of the XOR function for each pair.

  • Line 10: Here, a Sequential model is created using Keras. A Sequential model is a linear stack of layers that can be used to create a neural network.

  • Lines 13-14: These lines add layers to the neural network. The first layer is a Dense (fully connected) layer with two neurons, and it uses the relu activation function. The second layer is also a Dense layer, but it has only one neuron and uses the sigmoid activation function.

  • Line 17: In this line, the model is compiled. This step involves specifying the loss function and the optimizer. The loss function used here is binary cross-entropy, which is suitable for binary classification problems. The optimizer used is adam, which is an algorithm for first-order gradient-based optimization of stochastic objective functions.

  • Line 20: Here, the model is trained on the XOR inputs and outputs for 5000 epochs. An epoch is one complete pass through the entire training dataset.

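  • Line 23: Finally, the model's performance is evaluated on the same inputs and outputs. The evaluate function returns the loss value followed by the metric values (here, accuracy). Because the weights are initialized randomly and the hidden layer has only two relu neurons, some runs may not reach an accuracy of 1.0; rerunning the training typically gives a different result.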
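To see that this architecture (two relu units followed by one sigmoid unit) is able to represent XOR, the sketch below evaluates it in plain Python with hand-picked weights and computes the binary cross-entropy by hand. The weights are an illustrative assumption, not the values Keras learns; training may arrive at a different solution.

```python
import math


def relu(z):
    return max(0.0, z)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x1, x2):
    """Same shape as the Keras model: 2 relu units, then 1 sigmoid unit."""
    # Hand-picked hidden weights (illustrative, not learned values).
    h1 = relu(x1 + x2)        # number of active inputs: 0, 1, or 2
    h2 = relu(x1 + x2 - 1.0)  # positive only when both inputs are 1
    # h1 - 2*h2 is 1 for XOR-true inputs and 0 otherwise; scale it into a logit.
    logit = 10.0 * (h1 - 2.0 * h2) - 5.0
    return sigmoid(logit)


def binary_crossentropy(targets, probabilities):
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over all samples."""
    losses = [
        -(y * math.log(p) + (1 - y) * math.log(1 - p))
        for y, p in zip(targets, probabilities)
    ]
    return sum(losses) / len(losses)


inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
targets = [0, 1, 1, 0]

probabilities = [forward(x1, x2) for x1, x2 in inputs]
predictions = [1 if p > 0.5 else 0 for p in probabilities]
loss = binary_crossentropy(targets, probabilities)

print("predictions:", predictions)
print("loss:", round(loss, 4))
```

The predicted probabilities sit close to 0 or 1, so the loss is small. During training, the optimizer adjusts the weights to push the loss toward such a value.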

Conclusion

The XOR problem is a classic problem in artificial intelligence and machine learning that illustrates the limitations of single-layer perceptrons and the power of multi-layer perceptrons. By using a neural network with at least one hidden layer, it is possible to model complex, non-linear functions like XOR. This makes neural networks a powerful tool for various machine learning tasks.

Test your knowledge

Match each term with its description.

Terms:

  • XOR operation

  • Perceptron

  • Linear separability

  • XOR problem

Descriptions:

  • Difficulty of a single-layer perceptron with XOR

  • Ability to distinguish data points with a line

  • An artificial neuron

  • A logical operation with odd true inputs


Copyright ©2024 Educative, Inc. All rights reserved