What is entropy?

In information theory, entropy is a measure of the average amount of information, or uncertainty, in the outcome of a random event. The theory was proposed and developed by Claude Shannon at Bell Labs.

Information theory

Let's say that we have a coin toss whose outcome is either heads or tails. For a fair coin, we know that there is a 50/50 chance of getting either one. However, let's suppose now that we have a person who states that the probability of getting heads on a specific toss is 95%. This is highly unlikely for an ordinary coin but, at the same time, intriguing.

Suppose the claim turns out to be true, and heads comes up on almost every toss this person makes. Because we can predict the result in advance, there is little uncertainty and hence little information to receive from each toss. But if some other person tosses a fair 50/50 coin and ends up getting heads thrice in a row (a much less likely outcome, with probability 1/8), then there is plenty of information to get from this result.

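Hence, if the probability of an event is low, the information gained from observing it is high. This can be stated by the function defined below:

$$E(x) = \log_b \frac{1}{p(x)} = -\log_b p(x)$$

Here, $E(x)$ describes the information $E$ gained from the event $x$, $p(x)$ is the probability of $x$, and $b$ is the base of the logarithm.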

In our short example, a fair coin toss will never have a large information gain, because each outcome (heads or tails) has probability 1/2 and so carries exactly $\log_2 2 = 1$ bit. In comparison, take an event that has a smaller chance of occurring, such as picking the red marble from a bag of uniquely colored marbles, let's say with a probability of 1/9. Here the information gain is $\log_2 9 \approx 3.17$ bits, which is greater than the one obtained in the typical coin toss.
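The comparison between the coin and the marble can be checked with a few lines of Python. This is a minimal sketch, assuming information is measured as the negative logarithm of the probability; the function name `self_information` is a hypothetical helper chosen for illustration.

```python
import math


def self_information(p: float, base: float = 2.0) -> float:
    """Information gained from observing an event of probability p.

    Hypothetical helper: returns -log_base(p), so rarer events score higher.
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in the interval (0, 1]")
    return -math.log(p, base)


# Fair coin: each outcome has probability 1/2.
fair_coin_heads = self_information(0.5)

# Red marble drawn with probability 1/9.
red_marble = self_information(1 / 9)

print(f"fair coin heads: {fair_coin_heads:.3f} bits")
print(f"red marble:      {red_marble:.3f} bits")
```

With base 2 the result is in bits; passing `base=math.e` would give the same quantity in nats.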

Applying this idea to discrete variables, we can state that the entropy $H(X)$ of a discrete random variable $X$ with distribution $p: X \rightarrow [0, 1]$ is the expected information over all of its outcomes:

$$H(X) = -\sum_{x \in X} p(x) \log_b p(x)$$

In the equation above, $b$ is the base of the logarithm, which is usually $2$ (giving the entropy in bits), and $x$ ranges over the set $X$ of values that the random variable can take.
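As a rough illustration, the entropy of a discrete distribution can be computed directly from its list of probabilities. The sketch below assumes the standard Shannon form (the negative sum of p times log p) and the usual convention that outcomes with zero probability contribute nothing; the function name `entropy` and the example distributions are assumptions made for this example.

```python
import math
from typing import Sequence


def entropy(probs: Sequence[float], base: float = 2.0) -> float:
    """Shannon entropy of a discrete distribution given as probabilities."""
    if any(p < 0 for p in probs) or not math.isclose(sum(probs), 1.0):
        raise ValueError("probs must be non-negative and sum to 1")
    # Zero-probability outcomes are skipped (0 * log 0 is taken as 0).
    return -sum(p * math.log(p, base) for p in probs if p > 0)


# A fair six-sided die: six equally likely outcomes.
fair_die = entropy([1 / 6] * 6)

# The same fair coin measured in bits (base 2) and in nats (base e).
coin_bits = entropy([0.5, 0.5])
coin_nats = entropy([0.5, 0.5], base=math.e)

print(f"fair die:  {fair_die:.3f} bits")
print(f"fair coin: {coin_bits:.3f} bits = {coin_nats:.3f} nats")
```

Changing the base only rescales the result, which is why the choice of $b$ is a matter of units rather than substance.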

Another way to define entropy comes from measure theory (the generalization of geometric measures to notions such as mass and the probability of events). The information content of a single event is:

$$\sigma_\mu(A) = -\ln \mu(A)$$

Where,

  • $A$ is the event whose information content we want to measure.

  • $\sigma_\mu(A)$ denotes the surprisal of $A$; the entropy is the expected value of this quantity.

  • $(X, \Sigma, \mu)$ is the probability space, with $X$ denoting the sample space, $\Sigma$ being the collection of events (subsets of $X$) from which the event $A$ is chosen, and $\mu$ being the probability measure in consideration.

Now it's time to go ahead and take a look at an example.

Example

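Let's take our coin toss example that we discussed at the start, where heads had a 95% chance of occurring (and tails, therefore, 5%). Then we can find the entropy as follows:

$$H(X) = -(0.95 \log_2 0.95 + 0.05 \log_2 0.05) \approx 0.286 \text{ bits}$$

However, for the 50/50 toss, the entropy would be:

$$H(X) = -(0.5 \log_2 0.5 + 0.5 \log_2 0.5) = 1 \text{ bit}$$

So the entropy is greater when the outcome is uncertain (the fair coin) than when heads is almost certain to occur.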
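The two coin calculations can be reproduced numerically. This is a small sketch, assuming entropy is measured in bits; `binary_entropy` is a hypothetical helper name, and the extra probabilities 0.75 and 1.0 are added only to show the trend.

```python
import math


def binary_entropy(p_heads: float) -> float:
    """Entropy in bits of a coin that lands heads with probability p_heads."""
    total = 0.0
    for p in (p_heads, 1.0 - p_heads):
        if p > 0:  # a zero-probability outcome contributes nothing
            total -= p * math.log2(p)
    return total


biased = binary_entropy(0.95)  # the 95% heads coin
fair = binary_entropy(0.5)     # the 50/50 coin

for p in (0.5, 0.75, 0.95, 1.0):
    print(f"P(heads) = {p:.2f} -> H = {binary_entropy(p):.3f} bits")
```

The entropy peaks at 1 bit for the fair coin and falls toward 0 as the coin becomes more predictable.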

Properties

Some of the properties of entropy are stated below:

  • Adding or removing a zero-probability event will not change the entropy.

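  • The entropy of evaluating two variables $(X,Y)$ together is equal to the entropy of evaluating them one after the other, first $Y$ and then $X$ given $Y$: $H(X,Y) = H(X \mid Y) + H(Y) = H(Y \mid X) + H(X)$.

  • The entropy of a variable can only decrease, or stay the same, when the variable is passed through a function: $H(f(X)) \le H(X)$.

  • If $X$ and $Y$ are two independent random variables, then knowing $Y$ tells us nothing about $X$: $H(X \mid Y) = H(X)$, and therefore $H(X,Y) = H(X) + H(Y)$.

  • Entropy is concave in the probability mass function $p$: $H(\lambda p_1 + (1-\lambda) p_2) \ge \lambda H(p_1) + (1-\lambda) H(p_2)$ for all $0 \le \lambda \le 1$.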
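Several of the properties listed above can be spot-checked numerically. The sketch below is only an illustration on hand-picked distributions, not a proof. It assumes the standard facts that independent variables have additive joint entropy, that applying a function cannot increase entropy, and that entropy is concave; all variable names and probability values are invented for this example.

```python
import math
from itertools import product


def entropy(probs):
    """Shannon entropy in bits; zero-probability outcomes are skipped."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


# Adding a zero-probability outcome leaves the entropy unchanged.
p = [0.5, 0.3, 0.2]
zero_ok = math.isclose(entropy(p), entropy(p + [0.0]))

# Independent X and Y: the joint distribution is the product of the
# marginals, and H(X, Y) = H(X) + H(Y).
px, py = [0.5, 0.5], [0.7, 0.2, 0.1]
joint = [a * b for a, b in product(px, py)]
additive_ok = math.isclose(entropy(joint), entropy(px) + entropy(py))

# Passing X through a function merges outcomes, so H(f(X)) <= H(X).
# Here X takes values 0..3 and f(x) = x % 2.
px4 = [0.4, 0.3, 0.2, 0.1]
f_probs = [px4[0] + px4[2], px4[1] + px4[3]]
function_ok = entropy(f_probs) <= entropy(px4)

# Concavity: the entropy of a mixture is at least the mixture of entropies.
lam = 0.3
p1, p2 = [0.9, 0.1], [0.2, 0.8]
mix = [lam * a + (1 - lam) * b for a, b in zip(p1, p2)]
concave_ok = entropy(mix) >= lam * entropy(p1) + (1 - lam) * entropy(p2)

print(zero_ok, additive_ok, function_ok, concave_ok)
```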

Applications

Entropy is used in:

  • Combinatorics (Loomis–Whitney inequality and binomial coefficient approximation).

  • Machine learning (decision trees, Bayesian inference).

Copyright ©2025 Educative, Inc. All rights reserved