Not So Big Data Blog Ramblings of a data engineer

The (simple) intuition of the Bernoulli distribution's variance

Hi again!

Today is going to be a short post that briefly discusses a simple intuition that I stumbled across after plotting the variance of the Bernoulli distribution. I thought it neatly captured some intuition, and I feel it’s worth sharing.

I’ll walk you through the steps we need to get to this intuition (it’s not bad, I promise). So, from the top, then.

Let’s start with a random variable, $X$, that can take on a binary value, i.e. $X \in \{0, 1\}$. The probability that $X$ takes on a value of 0 or 1 is:

$$P(X = 1) = p, \qquad P(X = 0) = 1 - p$$

where $p$ is a parameter we can control and is in the range $[0, 1]$. Put differently, we can think of $p$ as a measure of how biased a coin is. The larger the value of $p$, the greater the probability of observing a 1, and the smaller the probability of observing a 0.

We can encode this logic in a single expression with a little bit of creativity:

$$P(X = x) = p^{x} (1 - p)^{1 - x}, \qquad x \in \{0, 1\}$$

This is identical to our earlier expression above (feel free to plug in $x = 0$ and $x = 1$ to verify) but contained in a single, neat expression.
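If you’d rather let a computer do the plugging in, here is a minimal sketch that compares the two-case form against the single expression. The function names and the symbol `p` for the bias parameter are my own labels for illustration, not anything standard.

```python
import math


def bernoulli_pmf_piecewise(x: int, p: float) -> float:
    """Two-case form: probability p of a 1, probability 1 - p of a 0."""
    if x not in (0, 1):
        raise ValueError("x must be 0 or 1")
    return p if x == 1 else 1.0 - p


def bernoulli_pmf_compact(x: int, p: float) -> float:
    """Single-expression form: p**x * (1 - p)**(1 - x)."""
    if x not in (0, 1):
        raise ValueError("x must be 0 or 1")
    return p**x * (1.0 - p) ** (1 - x)


# Check that both forms agree for a handful of bias values,
# including the fully biased edge cases 0 and 1.
for p in (0.0, 0.25, 0.5, 0.9, 1.0):
    for x in (0, 1):
        assert math.isclose(
            bernoulli_pmf_compact(x, p), bernoulli_pmf_piecewise(x, p)
        )
```

The trick is that one of the two exponents is always zero, so one factor collapses to 1 and the other survives.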

The Expected Value

The following is a stock-standard affair, but we’ll go through the steps for the sake of thoroughness.

The expected value of a (discrete) random variable $X$ is defined as:

$$\mathbb{E}[X] = \sum_{x} x \, P(X = x)$$

(i.e. the sum of each outcome multiplied by the probability of observing that outcome [1])

For the Bernoulli distribution, that gives us the following:

$$\mathbb{E}[X] = 0 \cdot (1 - p) + 1 \cdot p = p$$

Ah-ha! Our expected value of a random variable with a Bernoulli distribution is our bias parameter, $p$. [2]
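As a quick sanity check, the sketch below computes the expected value straight from the definition and compares it with the average of simulated coin flips. The names `expected_value` and `sample_mean`, the symbol `p` for the bias parameter, and the choice of seed are all assumptions made for illustration.

```python
import random
from fractions import Fraction


def expected_value(p):
    """Sum of each outcome times its probability, for outcomes 0 and 1."""
    probabilities = {0: 1 - p, 1: p}
    return sum(x * prob for x, prob in probabilities.items())


def sample_mean(p: float, n: int, seed: int = 0) -> float:
    """Average of n simulated flips of a coin that lands 1 with probability p."""
    rng = random.Random(seed)
    flips = [1 if rng.random() < p else 0 for _ in range(n)]
    return sum(flips) / n


# Using Fraction keeps the arithmetic exact, so the result is exactly p.
print(expected_value(Fraction(3, 10)))  # 3/10

# The simulated average should land close to p, but not exactly on it.
print(sample_mean(0.3, 100_000))
```

The exact calculation returns the bias parameter itself, while the simulated average only hovers near it, which is about what you’d expect from a finite number of flips.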

Variance

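Same steps as before, but this time we’ll be calculating the variance. The variance of a (discrete) random variable $X$ is defined as:

$$\mathrm{Var}[X] = \sum_{x} (x - \mathbb{E}[X])^2 \, P(X = x)$$

So, given our parameters, the variance for the Bernoulli distribution can be expressed as:

$$\mathrm{Var}[X] = (0 - \mathbb{E}[X])^2 (1 - p) + (1 - \mathbb{E}[X])^2 \, p$$

Next, we can plug in our previously calculated mean, $\mathbb{E}[X] = p$:

$$\mathrm{Var}[X] = (0 - p)^2 (1 - p) + (1 - p)^2 \, p$$

and away we go…

$$\mathrm{Var}[X] = p^2 (1 - p) + p (1 - p)^2 = p (1 - p) \big(p + (1 - p)\big) = p (1 - p)$$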
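To double-check the algebra, here is a small sketch that computes the variance directly from the definition and compares it with the simplified closed form, the bias parameter times one minus the bias parameter. The function names and the symbol `p` are hypothetical labels of mine; exact fractions are used so the comparison isn’t muddied by floating-point rounding.

```python
from fractions import Fraction


def bernoulli_variance_from_definition(p):
    """Probability-weighted squared distance of each outcome from the mean."""
    probabilities = {0: 1 - p, 1: p}
    mean = sum(x * prob for x, prob in probabilities.items())
    return sum((x - mean) ** 2 * prob for x, prob in probabilities.items())


def bernoulli_variance_closed_form(p):
    """Simplified result of the derivation: p * (1 - p)."""
    return p * (1 - p)


# The two should agree exactly for any bias value between 0 and 1.
for p in (Fraction(0), Fraction(1, 10), Fraction(1, 2), Fraction(3, 4), Fraction(1)):
    assert bernoulli_variance_from_definition(p) == bernoulli_variance_closed_form(p)

print(bernoulli_variance_from_definition(Fraction(3, 10)))  # 21/100
```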

The magic

At this point, this may seem kind of arbitrary. But let’s plot our calculated variance of the Bernoulli distribution (remember: the domain of $p$ is $[0, 1]$).

That’s rather interesting, right? No, not the fact that it’s a parabola (the equation for the variance should’ve given that away). But rather that the intuition regarding a biased coin is captured entirely by this single curve!

Let me explain. $p$ controls the bias of our coin, as we know from earlier. If we know the coin is entirely biased to one side (e.g. $p = 0$ or $p = 1$), then there is no uncertainty about the outcome - hence the variance is zero. Conversely, if we have an unbiased coin (when $p = 0.5$), we know the least about the potential outcome, and so the variance is at its maximum.
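If you want to poke at the curve yourself, the sketch below evaluates the variance over a grid of bias values, confirms where the peak and the zeros fall, and prints a rough text version of the parabola. The grid spacing, the bar scaling, and the name `bernoulli_variance` are arbitrary choices of mine for illustration.

```python
def bernoulli_variance(p: float) -> float:
    """Variance of a coin that lands 1 with probability p."""
    return p * (1.0 - p)


# Evaluate the variance at p = 0.00, 0.01, ..., 1.00.
grid = [i / 100 for i in range(101)]
peak_p = max(grid, key=bernoulli_variance)

print(peak_p, bernoulli_variance(peak_p))  # 0.5 0.25
print(bernoulli_variance(0.0), bernoulli_variance(1.0))  # 0.0 0.0

# Rough text plot: one row per bias value, bar length proportional to variance.
for i in range(11):
    p = i / 10
    bar = "#" * round(bernoulli_variance(p) * 100)
    print(f"p={p:.1f} {bar}")
```

The bars grow towards the fair coin in the middle and shrink to nothing at either fully biased end, which is the same story the plot tells.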

This might not be rocket science [3] to people who’re very familiar with statistics, but the intuition encoded in that variance plot was an unexpectedly delightful observation that captures more intuition about the behaviour of a Bernoulli distribution than simply looking at the maths. And that’s always a win in my book.

Till next time.

Footnotes

  1. This is a rather neat intuition in and of itself. 

  2. This may not have been immediately apparent upon first inspection. It certainly wasn’t for me, even though in hindsight (after computing the math), it appears rather obvious. Intuition in the probability domain remains a curious thing. 

  3. or brain surgery, for that matter.