An Exploration of Using Bayesian Principles to Generalize the T-Test Procedure¶
Standard T-Test¶
Consider the two-sample t-test for equality of means. Using the standard pooled-variance procedure:
test_equality_means <- function(norm_vec1, norm_vec2) {
  sx <- var(norm_vec1)
  sy <- var(norm_vec2)
  xbar <- mean(norm_vec1)
  ybar <- mean(norm_vec2)
  n <- length(norm_vec1)
  m <- length(norm_vec2)
  ## Pooled-variance t statistic with n + m - 2 degrees of freedom
  tstat <- (xbar - ybar)/sqrt((1/n + 1/m)*((n - 1)*sx + (m - 1)*sy)/(n + m - 2))
  ## Lower-tail probability P(T <= tstat) under the null hypothesis
  p_tstat <- pt(tstat, n + m - 2)
  return(list(tstat = tstat, prob = p_tstat))
}
vec1 <- c(120,107,110,116,114,111,113,117,114,112)
vec2 <- c(110,111,107,108,110,105,107,106,111,111)
t_test_res = test_equality_means(vec1, vec2)
t_test_res
We can see that the t-test gives us a significant result: under the null hypothesis, the two sample means would differ by less than the observed difference with probability t_test_res$prob (approximately 0.999). We would therefore reject the null hypothesis at the 95% confidence level (or even at 99%).
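As a sanity check (not part of the original analysis), R's built-in t.test with var.equal = TRUE uses the same pooled-variance statistic, so its t value should match t_test_res$tstat and its two-sided p-value should equal 2*(1 - t_test_res$prob):
## Built-in pooled-variance t-test for comparison
t.test(vec1, vec2, var.equal = TRUE)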
Bayesian Approach¶
Now let's try to solve this problem with our Bayesian hats on! We model the data as $x \mid \mu, \sigma^2 \sim N(\mu, \sigma^2)$ and want the posterior $f(\mu \mid x_1, x_2, \ldots, x_n, \sigma^2)$. Assuming the conjugate prior $\mu \mid \sigma^2 \sim N\left(\beta, \frac{\sigma^2}{n}\right)$ and $\frac{S}{\sigma^2} \sim \chi^2(k)$ (here $n$ plays the role of a prior pseudo-sample size), we start with prior parameters $\beta_0, n_0, S_0, k_0$ and update them for each observed value of $x$.
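For a single observation $x$, these conjugate updates have a closed form (a standard normal/scaled-inverse-$\chi^2$ result, written here to match the code below):

$$n_1 = n_0 + 1, \qquad \beta_1 = \frac{n_0 \beta_0 + x}{n_1}, \qquad k_1 = k_0 + 1, \qquad S_1 = S_0 + \frac{n_0 (x - \beta_0)^2}{n_1},$$

where the last expression is algebraically identical to the form used in the code, $S_0 + n_0 \beta_0^2 + x^2 - n_1 \beta_1^2$.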
simulatepostprob <- function(vec, prior_params = c(n = 0, S = 0, k = -1, B = 0)) {
  nnew <- prior_params['n']
  Snew <- prior_params['S']
  knew <- prior_params['k']
  Bnew <- prior_params['B']
  lst <- list()
  for (i in 1:length(vec)) {
    obs <- vec[i]
    ninit <- nnew
    Binit <- Bnew
    Sinit <- Snew
    kinit <- knew
    ## Conjugate updates for a single observation
    knew <- kinit + 1
    nnew <- ninit + 1
    Bnew <- (ninit*Binit + obs)/nnew
    Snew <- Sinit + ninit*Binit^2 + obs^2 - nnew*Bnew^2
    if (knew == 0) {
      ## Under the uninformative prior the first observation determines mu exactly
      sampchi <- 0
      sigma <- 0
    } else {
      ## We randomly sample a value for sigma | x first: S/sigma^2 ~ chi^2(knew),
      ## simulated here as a sum of squared standard normals
      sampchi <- sum(rnorm(knew, 0, 1)^2)
      sigma <- sqrt(Snew/sampchi)
    }
    ## Using sigma, we sample mu | x, sigma ~ N(Bnew, sigma^2/nnew)
    mu <- rnorm(1, Bnew, sigma/sqrt(nnew))
    lst[[i]] <- mu
  }
  return(unlist(lst))
}
rpost <- function(n, vec, prior_params = c(n = 0, S = 0, k = -1, B = 0)) {
  ## Draw n posterior samples of mu, each conditioned on the full vector
  replicate(n, expr = simulatepostprob(vec, prior_params)[length(vec)])
}
simulatepostprob(vec = vec1) - simulatepostprob(vec = vec2)
By iterating over each observation, we arrive at posterior distributions for $\mu \mid \sigma^2$ and $\frac{S}{\sigma^2}$, and we then draw randomly from these theoretical distributions to obtain samples from the posterior. Replicating this many times lets us estimate the difference between the posterior distributions of $\mu$ for the two vectors.
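For example, we can summarize the Monte Carlo draws of $\mu_1 - \mu_2$ with a 95% credible interval (a minimal sketch using the functions above; the name diffs is mine, and the endpoints will vary from run to run):
set.seed(1)
diffs <- rpost(1000, vec1) - rpost(1000, vec2)
## Central 95% credible interval for mu1 - mu2
quantile(diffs, c(0.025, 0.975))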
Using an uninformative prior gives results similar to the frequentist t-test:
sum(rpost(1000,vec1) > rpost(1000,vec2))/1000
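To see how separated the two posteriors actually are, we can overlay histograms of their draws (a quick base-graphics sketch; draws1 and draws2 are my own names):
## Overlay histograms of posterior draws of mu for the two samples
draws1 <- rpost(1000, vec1)
draws2 <- rpost(1000, vec2)
hist(draws1, col = rgb(0, 0, 1, 0.4), xlim = range(c(draws1, draws2)),
     main = "Posterior draws of mu", xlab = "mu")
hist(draws2, col = rgb(1, 0, 0, 0.4), add = TRUE)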
We can then use prior information about the sample to produce a blended, updated result. For instance, with prior parameters:
prior_params = c(n=10, S=2, k=9, B=30)
We obtain the posterior probability that $\mu_1 > \mu_2$:
sum(rpost(1000, vec1, prior_params) > rpost(1000, vec2, prior_params))/1000
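Because this prior centers both groups at $\beta = 30$ with the weight of ten pseudo-observations, it pulls both posteriors toward 30 and shrinks the apparent difference between the groups. A quick way to see how the conclusion moves with the prior's weight (the settings below, including tying $k$ to $n - 1$, are hypothetical choices for illustration only):
## Posterior P(mu1 > mu2) under increasingly heavy priors centered at B = 30
## (hypothetical prior settings, for illustration)
set.seed(2)
for (w in c(1, 5, 10, 20)) {
  pp <- c(n = w, S = 2, k = w - 1, B = 30)
  cat("prior n =", w, "=>", mean(rpost(1000, vec1, pp) > rpost(1000, vec2, pp)), "\n")
}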