Follow A Bayesian, Or People Die!#

This is a write-up of some thoughts I had at the beginning of the pandemic, looking back to some aspects of the Frequentist vs. Bayesian debate. I’m not in the position to add anything new to the topic, but I will use the news of that time to give an example where Frequentist and Bayesian give different answers. It is a classic conditional statistical inference, easy if you have a deep understanding of the differences between Frequentist and Bayesian statistics, but also easy to go wrong if you don’t.

The title is inspired by a quote from Col. Jessup (played by Jack Nicholson) in the movie A Few Good Men.

“We follow orders, son.
We follow orders or people die.
It’s that simple.”

A truly legendary performance!

Disclaimer

Despite the provocative title, and despite how the Frequentist approach to this kind of problem is sometimes misrepresented in many accounts you find on the web (too many of them, unfortunately, of low quality; try to Google it!), I do not advocate the view that Frequentist is bad and you should always use Bayesian!

The usual advice is to always choose wisely (obviously!).
My goal is to persuade you that in order to do so you must understand both well! (that’s the real trick!).

Disclaimer 2

For a deep, practical and very well done discussion of this topic, I strongly suggest having a look at what Jake VanderPlas has written on his blog. Here I follow his approach in presenting this kind of problem.

Issues With Early Covid-19 Samples#

At the beginning of the pandemic (Feb-Mar 2020) we knew very little about this new disease and, above all, about how widely it had spread and how deadly it was. The only available information, repeated tirelessly by the media, was the number and fraction of people who tested positive, how many of them needed hospitalization and, unfortunately, the number of deaths.

What enraged people like me, who digest data for a living, even more was the misuse and misinterpretation of those numbers and figures, both by traditional media and, exponentially magnified, by the totally uncontrolled arena of social media.

In fact, it was clear to anybody with a minimal statistical background that all the extrapolations about the spread, seriousness, death rate, etc. of the disease (statistical inference in our jargon) from the daily infection-rate announcements (the number of positives over the total number of tests done) were completely meaningless and totally biased.

What was needed, as many pointed out, was controlled testing on a randomly selected sample of people. Unfortunately, at that time this was not possible: the limited supply of testing kits was prioritized for key workers and people with symptoms. That was totally understandable.

But the inability to produce unbiased data was not a justification to use biased data as an alternative!!

However, there was an important exception to this lack of unbiased data: the Diamond Princess cruise ship (and others later).

Because of the isolated environment and the relatively low number of people on board, all passengers were tested, regardless of whether they reported symptoms or not.

For some time, the data from cruise ships were the only unbiased ones available. Many scientists used these data for early studies of Covid-19 (e.g. “What the cruise-ship outbreaks reveal about COVID-19”, published in Nature on 26 March 2020).

A Statistical Problem on Conditional Inference#

Those dramatic events at the early stage of the pandemic reminded me of examples of conditional inference, an advanced statistical concept. They are fairly well known, with the first examples dating back to the Reverend Thomas Bayes, whose paper, published in 1763, discussed a “billiard problem”.

Almost any book on Statistics has a version of this kind of problem, and for a good reason! They really challenge your understanding of statistical inference, and make you appreciate the subtleties behind the Frequentist and Bayesian approaches to probability.

It is only after a full understanding of these subtleties that you can really be in a position to “choose wisely”!

A Covid-19 Infected Cruise Ship Problem

A cruise ship has docked at the port you are responsible for. It has reported people with Covid-19 symptoms. All passengers have been tested, and you have to dispatch ambulances to take the people who tested positive to dedicated hubs for treatment.

You assess the situation as follows:

So far M=35 passengers have received their Covid-19 test results, with Q=7 testing positive.
They have already been disembarked and taken care of accordingly.
There are still N=70 people on board, with their Covid-19 test results still pending.
You only have K=22 ambulances left.
One ambulance can carry only one positive passenger.

At this point you get a call from the crisis center headquarters asking whether you need more ambulances. They explain that ambulances are needed everywhere and this will be your only chance to get more. However, you should only ask for what is deemed necessary: many people's lives are at stake.

Clearly to be 100% safe, you should have 70 ambulances; you will be covered no matter what.

But ambulances are scarce, so you decide to take a reasonable risk: you will ask for more ambulances only if the probability that 22 ambulances are not enough is higher than 10%.

What should you answer then?

Will the number of positive tests on the remaining 70 passengers be greater than 22 with a probability higher than 10% ?

Some possible answers#

There are many ways to reason about this problem. I will discuss four of them:

Frequentist Approach: This is how a Frequentist might answer
Bayesian Approach: This is how a Bayesian will probably answer
Professional Approach: This is how a Professional statistician may answer
Caveman Approach: This is how a person knowing nothing, or not enough, about conditional probability, maximum likelihood, nuisance parameters, the Beta-Binomial distribution, etc., but with very good coding skills, might answer. A DevOps or MLOps engineer, for example.

In the following we assume the first 35 people were randomly selected, i.e. their rate of infection is the same as that of the remaining 70 passengers (never ever ever give priority to women, children or the elderly as a statistician, it will totally bias your sample!!).

Frequentist

A frequentist approach could go like this:

The actual (true) infection rate is \(\rho\). We don’t know it, but we can infer our best guess from the measurements we have.
For any given passenger already tested, we can calculate the probability (binomial) distribution of the observation (positive or not), given \(\rho\).
As we have \(M=35\) observations, we can calculate the likelihood \(\mathcal{L}(\rho)\) of observing \(Q=7\) positives as the joint probability of \(M\) independent observations.
Finally we calculate which value of \(\rho\) maximizes the likelihood \(\mathcal{L}(\rho)\) of observing the measurements we observed. That result, \(\hat{\rho}\), is our best guess for \(\rho\).

After all those steps and calculations we find that our best guess \(\hat{\rho}\) is (surprise!):

\[ \hat{\rho} = \frac{Q}{M} = \frac{7}{35} \]
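As a quick numerical sanity check of the maximum-likelihood step, the sketch below minimizes the negative log-likelihood directly. It is only an illustration: the helper name `neg_log_likelihood` is my own and not part of the code later in this post, and the binomial coefficient is dropped because it does not depend on \(\rho\).

```python
import numpy as np
from scipy.optimize import minimize_scalar

M, Q = 35, 7  # tested passengers and positives, from the problem statement

def neg_log_likelihood(rho):
    # Negative log-likelihood of Q positives in M independent tests
    # (constant terms that do not depend on rho are omitted)
    return -(Q * np.log(rho) + (M - Q) * np.log(1.0 - rho))

# Bounded scalar minimization over the open interval (0, 1)
result = minimize_scalar(neg_log_likelihood, bounds=(1e-9, 1 - 1e-9), method="bounded")
rho_hat = result.x

print(f"numerical rho_hat = {rho_hat:.4f}, Q/M = {Q / M:.4f}")
```

The numerical optimum should agree with \(Q/M\) up to the optimizer's tolerance.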

Since the infection rate is the same for all passengers, the probability of \(K\) positive cases out of the remaining \(N\) passengers is easy to calculate (binomial probability):

\[ P(K|N,M,Q) = {N \choose K} \hat{\rho}^K (1-\hat{\rho})^{N-K} = {N \choose K} \bigg(\frac{Q}{M}\bigg)^K \bigg(1-\frac{Q}{M}\bigg)^{N-K} = {N \choose K} \frac{Q^K (M-Q)^{N-K}}{M^N} \]

In particular we want to know the overall probability \(\mathcal{P}(K_{max})\) that the number of positive cases is above \(K_{max} = 22\):

\[\mathcal{P}(K_{max}) = \sum_{K = K_{max}+1}^N P(K|N,M,Q) \]

If \(\mathcal{P}(K_{max})\) is above 10% the risk is too high, and we need more ambulances.

Bayesian

A Bayesian will treat this problem as a simple case of conditional probability with a nuisance parameter to marginalize.

Let’s define for clarity:

A: there are \(K\) positive cases among the \(N=70\) passengers on board
D: the data we have, i.e. there are \(Q=7\) positive cases among the \(M=35\) passengers already tested
\(\rho\): the unknown infection rate

What we want is \(P(A|D)\). Since \(\rho\) is unknown, we obtain it by marginalizing the joint probability \(P(A,\rho|D)\) over \(\rho\):

\[ P(A|D) = \int{P(A,\rho|D)d\rho} \]

The trick now is to manipulate this expression until we get something we know how to calculate. Using the law of conditional probability (\(P(A\cap B) = P(A|B) \cdot P(B)\)) and Bayes’ theorem we have:

\[\begin{split} P(A,\rho|D) = P(A|\rho, D)\cdot P(\rho|D) \\ P(\rho|D) = \frac{P(D|\rho) \cdot P(\rho)}{P(D)}\\ P(D) = \int{P(D|\rho) \cdot P(\rho)\ d\rho} \end{split}\]

Using the binomial probability, we also have:

\[\begin{split} P(A|\rho, D) = {N \choose K} {\rho}^K (1-{\rho})^{N-K} \\ P(D|\rho) = {M \choose Q} {\rho}^Q (1-{\rho})^{M-Q} \end{split}\]

The last bit is what to put for \(P(\rho)\), the prior probability distribution of \(\rho\). All we can say is that it is equally likely to be anything between 0 and 1 (flat distribution, \(P(\rho)=c\)). The constant \(c\) and the coefficient \({M \choose Q}\) cancel between numerator and denominator. Putting everything together:

\[\begin{split} \begin{align} P(A|D) & = \int{P(A,\rho|D)d\rho} = \int{P(A|\rho, D) \cdot P(\rho|D) \ d\rho} \\ & = \int{P(A|\rho, D) \cdot \frac{P(D|\rho) \cdot P(\rho)}{P(D)} \ d\rho} \\ & = \frac{\int{P(A|\rho, D) \cdot P(D|\rho) \cdot P(\rho) \ d\rho}}{\int{P(D|\rho) \cdot P(\rho) \ d\rho}} \\ & = {N \choose K} \frac{\int_0^1{\rho^K (1-\rho)^{N-K} \cdot \rho^Q (1-\rho)^{M-Q} d\rho}}{\int_0^1{{\rho}^Q (1-{\rho})^{M-Q} \ d\rho}} \end{align} \end{split}\]

After calculating these simple integrals, we get the conditional probability to observe \(K\) positives on the remaining passengers:

\[ P(A|D) = P(K|N,M,Q) = {N \choose K} \frac{(K+Q)!}{Q!} \cdot \frac{(N-K+(M-Q))! }{(N+M+1)!} \cdot \frac{(M+1)!}{(M-Q)!} \]
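For readers who want the intermediate step, both integrals are instances of the Beta integral. For non-negative integers \(a\) and \(b\) (generic exponents introduced here only for illustration):

\[ \int_0^1 \rho^{a} (1-\rho)^{b} \ d\rho = \frac{a! \ b!}{(a+b+1)!} \]

Applying it with \(a = K+Q,\ b = N-K+M-Q\) in the numerator and \(a = Q,\ b = M-Q\) in the denominator gives

\[ \frac{\int_0^1{\rho^{K+Q} (1-\rho)^{N-K+M-Q} \ d\rho}}{\int_0^1{\rho^{Q} (1-\rho)^{M-Q} \ d\rho}} = \frac{(K+Q)! \ (N-K+M-Q)!}{(N+M+1)!} \cdot \frac{(M+1)!}{Q! \ (M-Q)!} \]

which is the factor multiplying the binomial coefficient in the closed form.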

We can get the overall probability \(\mathcal{P}(K_{max})\) that the number of positive cases is above \(K_{max} = 22\) from:

\[\mathcal{P}(K_{max}) = \sum_{K = K_{max}+1}^N P(K|N,M,Q) \]

Professional

A Professional would recognize that the probability of \(a\) positives in \(A\) passengers randomly selected from all the \(B\) passengers, among which there are exactly \(b\) positives, follows a Hypergeometric distribution, i.e.:

\[ p(a|b,B,A) = \mathcal{HG}(a|b,B,A) \]

In this problem, however, we know \(a\), and we want the distribution of the number of still-unknown positives (\(b-a\)) among the still-to-be-tested passengers (\(B-A\)).

It is convenient to choose a Beta-Binomial distribution as the prior for \(b\), as it is a conjugate prior of the Hypergeometric distribution:

\[ b \sim \mathcal{BB}(b|B,\alpha,\beta) \]

This implies that the unknown number of positives \(b-a\) also follows a Beta-Binomial distribution (posterior):

\[ b-a \sim \mathcal{BB}(b-a|B-A,\alpha+a,\beta+(A-a)) \]

where the hyperparameters \(\alpha,\beta\) of the prior are added to the observed numbers of positive and negative passengers, \(a\) and \(A-a\).

Using the notation of the problem (\(A=M, a=Q, B=N+M, b=K+Q\)) and choosing a uniform prior (\(\alpha = \beta = 1\)):

\[ P(K|N,M,Q) = \mathcal{BB}(K|N, Q+1, M-Q+1) \]

We can get the overall probability \(\mathcal{P}(K_{max})\) that the number of positive cases is above \(K_{max} = 22\) from:
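The Bayesian closed form and the Beta-Binomial expression should describe the same distribution. The sketch below checks this numerically for the numbers of the problem; `bayes_closed_form` is a hypothetical helper written for this check (using exact integer factorials), while `scipy.stats.betabinom` is the same SciPy distribution used in the Professional code of this post.

```python
from math import comb, factorial

import numpy as np
from scipy.stats import betabinom

N, M, Q, K_MAX = 70, 35, 7, 22  # numbers from the problem statement

def bayes_closed_form(k, n, m, q):
    # C(n,k) * (k+q)! (n-k+m-q)! (m+1)! / ( q! (m-q)! (n+m+1)! ), with exact integers
    num = comb(n, k) * factorial(k + q) * factorial(n - k + m - q) * factorial(m + 1)
    den = factorial(q) * factorial(m - q) * factorial(n + m + 1)
    return num / den

ks = np.arange(N + 1)
closed = np.array([bayes_closed_form(int(k), N, M, Q) for k in ks])
bb = betabinom.pmf(ks, N, Q + 1, M - Q + 1)

max_diff = np.abs(closed - bb).max()
tail_closed = closed[K_MAX + 1:].sum()                 # P(K > K_MAX), closed form
tail_bb = betabinom.sf(K_MAX, N, Q + 1, M - Q + 1)     # P(K > K_MAX), Beta-Binomial

print(f"max |closed - betabinom| = {max_diff:.2e}")
print(f"tail (closed form) = {tail_closed:.6f}, tail (betabinom) = {tail_bb:.6f}")
```

If the two columns agree to numerical precision, the Bayesian and Professional approaches are the same calculation written in two notations.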

\[\mathcal{P}(K_{max}) = \sum_{K = K_{max}+1}^N P(K|N,M,Q) \]

Caveman

A caveman would just write a simple toy Monte Carlo simulation, for example as follows:

Assume a random infection rate,
Create a random, fake scenario of infected passengers and keep it only if it agrees with the given observations,
Simply annotate (yes or no) whether that scenario has an “unwanted” outcome, i.e. the number of positive passengers among those still to be tested is more than 22,
Generate zillions of scenarios.

In the end, the fraction of the accepted scenarios with an “unwanted” outcome will give the probability required. No statistical knowledge needed, just a ratio of two integers.

Python Code

Frequentist

from scipy.stats import binom
import numpy as np

## P(K|N,M,Q)
def FreqProbability(k,n,m,q):
    
    p_hat = q/m
    prob = binom.pmf(k,n,p_hat)
    return prob

## P(K_max)
def ProbabilitySum(func, k_max ,n,m,q):
    prob_v = np.array([func(i,n,m,q) for i in range(k_max+1,n+1)])
    #print(prob_v)
    return prob_v.sum()

k_max = 22
N = 70
M = 35
Q = 7

p_max = ProbabilitySum(FreqProbability, k_max, N, M, Q)

print(f'Total probability for K_max={k_max}: {p_max:.6f} ')

Bayesian

from scipy.special import factorial as f_
from scipy.special import comb
import numpy as np


## value of the integral of rho^a * (1-rho)^b over [0,1] (Beta integral)
def ProbIntegralValue(a,b):
    return f_(a)*f_(b)/f_(a+b+1)


def BayesianProbability(k,n,m,q):
    
    a = q
    b = m-a
    den = ProbIntegralValue(a,b)
    
    a = k+q
    b = n+m-a
    num = comb(n,k)*ProbIntegralValue(a,b)
    
    return num/den

## P(K_max)
def ProbabilitySum(func, k_max ,n,m,q):
    prob_v = np.array([func(i,n,m,q) for i in range(k_max+1,n+1)])
    #print(prob_v)
    return prob_v.sum()


k_max = 22
N = 70
M = 35
Q = 7

p_max = ProbabilitySum(BayesianProbability, k_max, N, M, Q)

print(f'Total probability for K_max={k_max}: {p_max:.6f} ')

Professional

from scipy.stats import betabinom

##- Input
m=35
q=7
n=70
k=22

## Use the sf = 1-cdf to have the overall probability
p = betabinom.sf(k,n,q+1,m-q+1)

print(f'Total probability for K_max={k}: {p:.6f} ')

Caveman

import numpy as np
import random

def MakeExperiment(N, p):
    ## make a random scenario of N tosses with probability p
    rnd = np.random.rand(N)
    return (rnd<p).astype(int) ## 1 with probability p, 0 with probability (1-p)
    
def CheckExperiment(experiment, observed_tosses, positive_observed, check_type = 'measurement'):
    ## check if a given experiment satisfy requirements:
    ## check_type = 'measurement'
    ##          Requirement: having 'positive_observed' in the first 'observed_tosses' tosses
    ## check_type = 'expectation'
    ##          Requirement: having 'positive_observed' in the tosses after the first 'observed_tosses'
    #print(experiment, observed_tosses, positive_observed, check_type)
    assert(len(experiment)> observed_tosses)
    assert(observed_tosses >= positive_observed)

    test_passed = False
    
    ## positives in observed
    if (check_type == 'measurement'):
        positives = experiment[:observed_tosses].sum()
        if positives == positive_observed: test_passed = True
    
    ## positive in not-observed
    if (check_type == 'expectation'):
        positives = experiment[observed_tosses:].sum()
        if positives > positive_observed: test_passed = True
        
    if test_passed:
         return experiment
    
    return []

def RunSimulation(Tot_number_experiments, N, m, k, q):
    N_experiment_accepted = 0
    N_experiment_sucessed = 0
    for i in range(Tot_number_experiments):
        
        p = random.random() ## random p    
        experiment = CheckExperiment(MakeExperiment(N+m,p), m, q) # Make experiment compatible with observation
        if len(experiment) == 0: continue # if not passing the check, move on
        N_experiment_accepted += 1
        good_experiment = CheckExperiment(experiment, m, k, check_type = 'expectation') ## Now check if compatible with expectations
        if len(good_experiment): N_experiment_sucessed += 1
    
    print("... experiment done")
    return np.array([N_experiment_accepted, N_experiment_sucessed])
    


## Observed 
observed_tosses = 35 
positive_in_observed = 7
## To predict
future_tosses = 70
positive_in_future = 22
## Number of simulated experiments
Tot_number_experiments = 10000000

###-----
n_bunches = 10 ## split in bunches to estimate the error
N_in_bunch = int(Tot_number_experiments/n_bunches)

result = np.array([RunSimulation(N_in_bunch, future_tosses, observed_tosses, positive_in_future, positive_in_observed) for i in range(n_bunches)])

## With error estimate
prob = result[:,1]/result[:,0]
print("Probability of more than {} positives in next {} tosses after observing {} positives in {} tosses: ({:.5f} +- {:.5f}) %".format(
    positive_in_future, future_tosses, positive_in_observed, observed_tosses, prob.mean()*100, prob.std()*100))
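The loop above tosses every passenger one by one, which is slow in pure Python. As a possible alternative (a sketch, not the code used for the results in this post), the same accept/reject idea can be vectorized by drawing binomial counts instead of individual tosses. The function name `caveman_vectorized` and the fixed seed are assumptions made for this illustration.

```python
import numpy as np

def caveman_vectorized(n_sim, n, m, q, k_max, seed=0):
    """Accept/reject Monte Carlo using binomial counts instead of single tosses."""
    rng = np.random.default_rng(seed)
    p = rng.random(n_sim)              # random infection rate, flat in [0, 1)
    observed = rng.binomial(m, p)      # simulated positives among the m tested passengers
    keep = observed == q               # keep only scenarios matching the observation
    future = rng.binomial(n, p[keep])  # positives among the n passengers still on board
    n_accepted = int(keep.sum())
    prob = float((future > k_max).mean())
    return prob, n_accepted

prob, n_accepted = caveman_vectorized(2_000_000, n=70, m=35, q=7, k_max=22)
print(f"accepted scenarios: {n_accepted}, P(more than 22 positives) ~ {prob * 100:.2f} %")
```

Drawing counts rather than tosses relies on the passengers being exchangeable, which is the same random-selection assumption made throughout the post.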

Results#

Running the Python code we can report the final answers:

Will the number of positive tests on the remaining 70 passengers be higher than 22 with 10% (or more) probability?

Frequentist: NO. The probability is only 0.8%
Bayesian: YES. The probability is 10.92%
Professional: YES. The probability is 10.92%
Caveman: YES. The probability is (10.92 \(\pm\) 0.14) %

The Caveman result carries a statistical error due to the finite number of toy Monte Carlo scenarios generated, while the other three approaches are analytical results.

But these are just details.

The main point is that the Frequentist approach gives the wrong answer, and as a consequence people die! (well, to be precise, there is a 10.92% probability that 22 ambulances are not enough and people die)

Discussion and Take Away Message#

OK, what’s going on here?? Weren’t all those endless frequentist vs. Bayesian debates basically nothing but philosophical banter? Some pedantic interpretations of the concept of probability, but basically giving identical answers in all practical cases? Evidently not, and this is why it is important not only to know both approaches, but also to understand them well.

Important

Frequentist and Bayesian approaches never give the same answer (though they may give the same numerical result), for the simple reason that they don’t answer the same question!

Is the frequentistic statistical inference all wrong?#

Is this a counterexample showing that the frequentist approach is wrong? Obviously not!

Any person with a good statistical background will have realized from the start that the frequentist approach I showed above is not what a frequentist would do.

Explaining why in detail is beyond the scope of this post (this is also the reason for the immense literature on these topics!). But I will list here a few points that are important to have crystal clear for a proper application of statistical inference, whatever approach one might use:

A Frequentist would already have objected to the question asked. The objection is to the meaning of “90% probability that at most 22 passengers on board will test positive”. For a frequentist there is no such probability. Either it will happen or it will not. That’s the truth. We don’t know what it will be, but there is no 11% or 0.8% or whatever probability. It is yes or it is no.
In any case, a Frequentist would probably have framed the problem in terms of hypothesis testing and/or confidence intervals. But a confidence interval IS NOT the same as its Bayesian counterpart (often called a credible region). They are NOT answering the same question!
Frequentist inference provides procedures for statistical problems. The observations enter the calculation of the confidence interval, but the statistical interpretation applies to the procedure: the procedure comes with a frequency guarantee that the true number of positives lies within the confidence interval 95% of the time (or whatever threshold is used). The guarantee is not about any particular confidence interval.
If the frequentist procedure is applied to 100 cruise ships in the same situation, for about 95 ships the calculated confidence interval contains the true value of positive passengers. If you consider only 1 ship, though, it may be one of those 95 or one of the remaining 5. In the latter case the calculated confidence interval DOES NOT contain the true value. In the frequentist spirit, the truth either is or is not in the calculated interval; there is no probability.
In this sense conditional inference is not Frequentist. Confidence intervals are unconditional.
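The “many cruise ships” reading of a confidence interval can be illustrated with a small simulation. This is a sketch under assumptions of my own: the true infection rate is fixed at a hypothetical 0.2, the interval is built for the infection rate (not for the number of positives) from the 35 tested passengers, and the Clopper-Pearson construction is used as one possible frequentist procedure.

```python
import numpy as np
from scipy.stats import beta

def clopper_pearson(x, n, level=0.95):
    # Exact (conservative) confidence interval for a binomial proportion
    alpha = 1.0 - level
    lo = beta.ppf(alpha / 2, x, n - x + 1) if x > 0 else 0.0
    hi = beta.ppf(1 - alpha / 2, x + 1, n - x) if x < n else 1.0
    return lo, hi

TRUE_RHO = 0.2     # hypothetical true infection rate, unknown in real life
M = 35             # passengers tested on each ship
N_SHIPS = 20_000   # number of simulated ships

# Pre-compute the interval for every possible number of positives 0..M
intervals = np.array([clopper_pearson(x, M) for x in range(M + 1)])

rng = np.random.default_rng(1)
positives = rng.binomial(M, TRUE_RHO, size=N_SHIPS)  # one observation per ship
lo, hi = intervals[positives, 0], intervals[positives, 1]
coverage = np.mean((lo <= TRUE_RHO) & (TRUE_RHO <= hi))

print(f"fraction of ships whose interval contains the true rate: {coverage:.3f}")
```

The printed fraction is a property of the procedure over many ships; for any single ship, the interval either contains the true rate or it does not.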

As a final note, it seems that in this particular case the Bayesian approach is a better (in the sense of “more intuitive”) choice. But I still remember a warning I once found somewhere while reading about these topics:

To those who attach themselves to either camp: remember, there is plenty of ammunition in terms of counterexamples on BOTH sides!
