Entropy, in information theory, is a measure of the amount of information contained in an event. The theory was proposed and developed by Claude Shannon at Bell Labs.
Let's say we have a coin toss whose outcome is either heads or tails. For a fair coin, we know there is a 50/50 chance of getting either. Now suppose a person claims that the probability of getting heads on a specific toss is 95%. This is an unusual claim but, at the same time, an intriguing one.
If that claim holds, then almost every time the coin is tossed, heads will come up. There is very little uncertainty about the outcome, and hence very little information to be gained from observing it. On the other hand, if someone else tosses a fair coin with 50/50 probabilities and gets heads three times in a row (a much less likely outcome), then observing that result conveys far more information.
Hence, if the probability of an event is low, the information gained from observing it is high. This can be stated by the function defined below:

$$I(x) = -\log_2 p(x)$$

Here, $I(x)$ is the information content (also called the self-information) of the event $x$, and $p(x)$ is the probability of that event; using a base-2 logarithm means the result is measured in bits.
In our short example, a single fair coin toss never yields a large information gain, because both outcomes (heads and tails) share the same 50/50 probability. By comparison, consider an event with a smaller chance of occurring, such as picking the one red marble from a bag of uniquely colored marbles, say with a probability of 1/9. Observing that event gives an information gain greater than the one obtained from the typical coin toss.
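To make the comparison concrete, here is a minimal Python sketch (the `information_content` helper and the example probabilities are purely illustrative, not part of any particular library) that evaluates $I(x) = -\log_2 p(x)$ for both events:

```python
import math

def information_content(p: float, base: float = 2) -> float:
    """Self-information I(x) = -log_b(p(x)) of an outcome with probability p."""
    return -math.log(p, base)

# Fair coin toss: p(heads) = 0.5
print(information_content(0.5))    # 1.0 bit

# Picking the red marble: p(red) = 1/9
print(information_content(1 / 9))  # ~3.17 bits
```

The rarer event carries roughly three times as much information as a single fair coin flip.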
Applying this idea to a discrete random variable $X$ with possible outcomes $x_1, \dots, x_n$, we can state the entropy as:

$$H(X) = -\sum_{i=1}^{n} p(x_i)\,\log_b p(x_i)$$

In the equation above, $b$ is the base of the logarithm, which is usually 2 (giving entropy in bits), $e$ (nats), or 10 (hartleys).
Another way to define entropy, in the context of a random variable $X$, is as the expected value of the information content:

$$H(X) = \mathbb{E}\big[I(X)\big] = \mathbb{E}\big[-\log_b p(X)\big]$$

Where $\mathbb{E}$ denotes the expectation operator and $I(X)$ is the information content of $X$.
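As a rough sketch of this expected-value view (the helper names below are hypothetical and simply mirror the formulas above), the entropy is the probability-weighted average of each outcome's information content:

```python
import math

def self_information(p: float, base: float = 2) -> float:
    """Information content I(x) = -log_b(p(x)) of a single outcome."""
    return -math.log(p, base)

def entropy(probabilities, base: float = 2) -> float:
    """H(X) = E[I(X)]: average information content, weighted by probability."""
    return sum(p * self_information(p, base) for p in probabilities if p > 0)
```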
Now it's time to go ahead and take a look at an example.
Let's take our coin toss example that we discussed at the start, where heads had a 95% chance of occurring. Then we can find the entropy as follows:

$$H(X) = -\big(0.95\,\log_2 0.95 + 0.05\,\log_2 0.05\big) \approx 0.286 \text{ bits}$$

However, for the fair 50/50 toss, the entropy would be:

$$H(X) = -\big(0.5\,\log_2 0.5 + 0.5\,\log_2 0.5\big) = 1 \text{ bit}$$

So the entropy is greater when the outcome is uncertain than when heads is nearly certain to occur.
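These two values can be checked with a few lines of Python; the sketch below uses `scipy.stats.entropy`, which computes the same Shannon entropy when given a list of probabilities:

```python
from scipy.stats import entropy

# Near-certain heads: very little surprise on average
print(entropy([0.95, 0.05], base=2))  # ~0.286 bits

# Fair coin: maximum uncertainty for two outcomes
print(entropy([0.5, 0.5], base=2))    # 1.0 bit
```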
Some of the properties of entropy are stated below:
Adding or removing a zero-probability event will not change the entropy.
The entropy of two simultaneous events is at most the sum of the entropies of the individual events, i.e. $H(X, Y) \le H(X) + H(Y)$, with equality when the two events are independent.
The entropy of a variable can only decrease (or stay the same) when the variable is passed through a function, i.e. $H(f(X)) \le H(X)$ (a small numerical check of this appears after the list of properties).
If $X$ and $Y$ are independent random variables, then knowing the value of $Y$ does not influence our knowledge of $X$, i.e. $H(X \mid Y) = H(X)$.
Entropy is concave in the probability mass function $p$.
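As a quick numerical sanity check of the function property above (the distribution and the mapping $f(x) = x \bmod 2$ are made up purely for illustration), passing a uniform four-valued variable through $f$ cannot increase its entropy:

```python
import math
from collections import defaultdict

def entropy(probabilities, base: float = 2) -> float:
    """Shannon entropy, ignoring zero-probability outcomes."""
    return -sum(p * math.log(p, base) for p in probabilities if p > 0)

# X is uniform over four values, so H(X) = 2 bits.
p_x = {0: 0.25, 1: 0.25, 2: 0.25, 3: 0.25}

# Apply f(x) = x % 2 and collect the induced distribution of f(X).
p_fx = defaultdict(float)
for x, p in p_x.items():
    p_fx[x % 2] += p

print(entropy(p_x.values()))   # 2.0 bits
print(entropy(p_fx.values()))  # 1.0 bit -- H(f(X)) <= H(X), as stated above
```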
Entropy is used in a number of fields, including:
Combinatorics (Loomis–Whitney inequality and binomial coefficient approximation).
Machine Learning (decision trees, Bayesian inference).