The XOR, or "exclusive OR", problem is a classic problem in the field of artificial intelligence and machine learning. It is a problem that cannot be solved by a single layer perceptron, and therefore requires a multi-layer perceptron or a deep learning model.
This Answer aims to provide a comprehensive understanding of the XOR problem and how it can be solved using a neural network. First of all lets get to know what is XOR function?
The XOR function is a binary function that takes two binary inputs and returns a binary output. The output is true if exactly one of the inputs is true, and false otherwise; equivalently, it is true when the number of true inputs is odd.
The following table shows the truth table for the XOR function:
| Input A | Input B | Output (A XOR B) |
| ------- | ------- | ---------------- |
| 0       | 0       | 0                |
| 0       | 1       | 1                |
| 1       | 0       | 1                |
| 1       | 1       | 0                |
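As a quick sanity check, the same truth table can be reproduced with Python's built-in bitwise XOR operator `^`:

```python
# Print the XOR truth table using Python's bitwise XOR operator
for a in (0, 1):
    for b in (0, 1):
        print(a, b, a ^ b)  # rows: (0,0)->0, (0,1)->1, (1,0)->1, (1,1)->0
```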
A perceptron is a type of artificial neuron used in machine learning. It takes multiple inputs, applies weights to them, sums them up, and passes them through an activation function to produce an output. The mathematical representation of a perceptron is:
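$$y = f\left(\sum_{i=1}^{n} w_i x_i + b\right)$$

Here, $x_i$ are the inputs, $w_i$ the corresponding weights, $b$ the bias, and $f$ the activation function (in the classic perceptron, a step function that outputs 1 when the weighted sum is positive and 0 otherwise).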
Despite its simplicity, a single-layer perceptron has a significant limitation: it can only model linearly separable functions. This means it can only classify data points that can be separated by a straight line in a two-dimensional space, or a hyperplane in higher dimensions.
Linear separability refers to the ability to distinguish between classes of data points using a straight line (in two dimensions) or a hyperplane (in higher dimensions). If two classes of points can be perfectly separated by such a line or hyperplane, they are considered linearly separable. This concept is fundamental to understanding the limitations of single-layer perceptrons.
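As a point of contrast, the AND function is linearly separable, so a single perceptron can compute it. The sketch below uses hand-picked, illustrative values (weights of 1 and a bias of -1.5, i.e., the separating line $x_1 + x_2 = 1.5$); any line that isolates the point (1, 1) from the other three would work equally well:

```python
import numpy as np

def perceptron(x, w, b):
    """A single perceptron with a step activation function."""
    return int(np.dot(w, x) + b > 0)

# Hand-picked weights and bias (illustrative values) that separate
# AND's classes with the line x1 + x2 = 1.5
w, b = np.array([1.0, 1.0]), -1.5
for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, perceptron(np.array(x), w, b))  # 0, 0, 0, 1 -- matches AND
```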
Consider the data points of the XOR truth table plotted on a plane:
The XOR function is not linearly separable, which means we cannot draw a single straight line to separate the inputs that yield different outputs.
This is illustrated in the following diagram:
In the above illustration, a circle is drawn when both x and y are the same, and a diamond when they are different. As the figure shows, the circles and diamonds cannot be separated by drawing a single straight line. Hence, the XOR function is not linearly separable.
This is where the XOR problem in neural networks arises. A single-layer perceptron, due to its linear nature, fails to model the XOR function.
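To see why, suppose a single perceptron with weights $w_1, w_2$ and bias $b$ could compute XOR, outputting 1 exactly when $w_1 x_1 + w_2 x_2 + b > 0$. The four rows of the truth table would then require:

$$b \le 0, \qquad w_2 + b > 0, \qquad w_1 + b > 0, \qquad w_1 + w_2 + b \le 0$$

Adding the two middle inequalities gives $w_1 + w_2 + 2b > 0$, and since $b \le 0$, it follows that $w_1 + w_2 + b > 0$, contradicting the last inequality. No choice of weights and bias can satisfy all four rows.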
The XOR problem can be overcome by using a multi-layer perceptron (MLP), a type of feedforward neural network. An MLP consists of multiple layers of perceptrons, allowing it to model more complex, non-linear functions.
The structure of an MLP solving the XOR problem might look like this:
In this structure, the first layer is the input layer. The second layer (hidden layer) transforms the original non-linearly separable problem into a linearly separable one, which the third layer (output layer) can then solve.
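To make this concrete, here is a minimal sketch of such a network with hand-picked (rather than learned) weights. The hidden layer computes OR and NAND of the inputs, illustrative choices that map the four points into a linearly separable arrangement, and the output neuron finishes the job by computing AND of the two hidden activations, since XOR(A, B) = AND(OR(A, B), NAND(A, B)):

```python
import numpy as np

def step(z):
    """Step activation: 1 where the weighted sum is positive, else 0."""
    return (z > 0).astype(int)

# Hidden layer (hand-picked weights): neuron 1 computes OR, neuron 2 computes NAND
W_hidden = np.array([[1.0, 1.0],     # OR:   fires when x1 + x2 - 0.5 > 0
                     [-1.0, -1.0]])  # NAND: fires when -x1 - x2 + 1.5 > 0
b_hidden = np.array([-0.5, 1.5])

# Output layer: AND of the two hidden activations
w_out, b_out = np.array([1.0, 1.0]), -1.5

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    h = step(W_hidden @ np.array(x) + b_hidden)  # hidden-layer activations
    y = step(w_out @ h + b_out)                  # output neuron
    print(x, y)  # prints 0, 1, 1, 0 -- the XOR truth table
```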
The following is a Python code snippet that uses the Keras library to create a multi-layer perceptron for the XOR function:
```python
import numpy as np
from keras.models import Sequential
from keras.layers import Dense

# XOR inputs and outputs
inputs = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
outputs = np.array([[0], [1], [1], [0]])

# Create a sequential model
model = Sequential()

# Add layers
model.add(Dense(2, input_dim=2, activation='relu'))
model.add(Dense(1, activation='sigmoid'))

# Compile the model
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

# Train the model
model.fit(inputs, outputs, epochs=5000, verbose=0)

# Evaluate the model
print(model.evaluate(inputs, outputs))
```
Let’s break down the code:
Lines 1–3: These lines import the necessary libraries for the code.
Lines 6–7: These lines define the inputs and outputs for the XOR function. The inputs are all possible pairs of binary values, and the outputs are the results of the XOR function for each pair.
Line 10: Here, a `Sequential` model is created using Keras. A `Sequential` model is a linear stack of layers that can be used to create a neural network.
Lines 13–14: These lines add layers to the neural network. The first layer is a `Dense` (fully connected) layer with two neurons, and it uses the `relu` activation function. The second layer is also a `Dense` layer, but it has only one neuron and uses the `sigmoid` activation function.
Line 17: In this line, the model is compiled. This step involves specifying the loss function and the optimizer. The loss function used here is `binary_crossentropy`, which is suitable for binary classification problems. The optimizer used is `adam`, an algorithm for first-order gradient-based optimization of stochastic objective functions.
Line 20: Here, the model is trained on the XOR inputs and outputs for 5000 epochs. An epoch is one complete pass through the entire training dataset.
Line 23: Finally, the model's performance is evaluated on the same inputs and outputs. The `evaluate` function returns the loss value and metrics values for the model.
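After training, you can also inspect the network's predictions directly. The exact probabilities vary from run to run because the initial weights are random, but after 5000 epochs, rounding them typically recovers the XOR outputs:

```python
# Inspect the trained model's predictions; rounding recovers the XOR outputs
predictions = model.predict(inputs)
print(predictions.round().astype(int).flatten())  # typically: [0 1 1 0]
```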
The XOR problem is a classic problem in artificial intelligence and machine learning that illustrates the limitations of single-layer perceptrons and the power of multi-layer perceptrons. By using a neural network with at least one hidden layer, it is possible to model complex, non-linear functions like XOR. This makes neural networks a powerful tool for various machine learning tasks.