Stats Works
  • About This Website

To T-Test or Not to T-Test A Bayesian Approach

An Exploration of Using Bayesian Principles to Generalize the T-Test Procedure¶

Standard T-Test¶

Consider the t-test of equality of means. Using the standard procedure for a t-test:

In [ ]:
test_equality_means <- function(norm_vec1, norm_vec2){
  sx <- var(norm_vec1)
  sy <- var(norm_vec2)
  xbar <- mean(norm_vec1)
  ybar <- mean(norm_vec2)
  n = length(norm_vec1)
  m = length(norm_vec2)

  tstat <- (xbar - ybar)/(sqrt((1/n + 1/m)*((n - 1)*sx + (m - 1)*sy)/(n + m - 2)))
  p_tstat <- pt(tstat,n + m - 2)

  return(list(tstat = tstat, prob = p_tstat))
}
In [2]:
vec1 <- c(120,107,110,116,114,111,113,117,114,112)
vec2 <- c(110,111,107,108,110,105,107,106,111,111)

t_test_res = test_equality_means(vec1, vec2)
t_test_res
$tstat
3.48432421316996
$prob
0.998676366290831

We can see that the t-test gives us a significant result -- the two means would have a difference smaller than the observed difference with probability r t_test_res$prob. We then would reject the null hypothesis with 95% confidence (or even 99% confidence.)

Bayesian Approach¶

Now let's try to solve this problem with our Bayesian hats! Assuming that we have a prior distribution of $\mu$ and $\sigma^2$ such that $x \mid \mu, \sigma^2 \sim N(\mu, \sigma^2)$ , we want the posterior $f(\mu \mid x_1, x_2, ..., x_n, \sigma^2)$ . Assuming a prior of $\mu \mid \sigma^2 \sim N(\beta, \frac{\sigma}{\sqrt{n}})$ and $\frac{S}{\sigma^2} \sim \chi^2(k)$, we start with prior parameters $\beta_0, n, S_0, k$ and update for each value of x.

In [1]:
simulatepostprob <- function(vec, prior_params=c(n = 0,S = 0,k = -1,B = 0)){
  nnew = prior_params['n']
  Snew = prior_params['S']
  knew = prior_params['k']
  Bnew = prior_params['B']

  lst <- list()
  for (i in 1:length(vec)) {

    obs <- vec[i]

    ninit = nnew
    Binit = Bnew
    Sinit = Snew
    kinit = knew

    knew = kinit + 1
    nnew = ninit + 1
    Bnew <- (ninit*Binit + obs)/nnew
    Snew <- Sinit + ninit*Binit^2 + obs^2 - nnew*Bnew^2

    if (knew == 0) {
      sampchi <- 0
      sigma <- 0
    }else{
      ## We randomly sample a value for sigma|x first
      sampchi <- sum(rnorm(knew,0,1)^2)
      sigma <- sqrt(Snew/sampchi)
    }
    ## Using sigma, we sample for mu|x, sigma.
    mu <- rnorm(1,obs + Bnew, sigma / sqrt(nnew))
    lst[[i]] <- mu
  }
  return(unlist(lst))
}

rpost <- function(n,vec, prior_params=c(n = 0,S = 0,k = -1,B = 0)){
  replicate(n,expr = simulatepostprob(vec)[length(vec)])
}
In [3]:
simulatepostprob(vec = vec1) - simulatepostprob(vec = vec2)
  1. 20
  2. -0.503416712324565
  3. 1.45988251782074
  4. 17.6073011837844
  5. 7.22926620488519
  6. 9.24411515038176
  7. 8.35747543643981
  8. 15.4506594648444
  9. 8.61721035342939
  10. 6.22213384120991

By iterating for each observation, we arrive at a posterior distribution for $\mu \mid \sigma^2$ and $\frac{S}{\sigma^2}$. We then sample randomly from these theoretical distributions to sample from the posterior distributions. If we replicate this many times, we will be able to estimate the differences between the posterior distributions of the two vectors.

Using uninformative prior gives similar results to frequentist t-test method

In [4]:
sum(rpost(1000,vec1) > rpost(1000,vec2))/1000
0.999

We can then use our prior information about the sample to produce a mixed, updated result. For instance, with prior parameters:

In [5]:
prior_params = c(n=10, S=2, k=9, B=30)

We can see our posterior:

In [6]:
sum(rpost(1000, vec1, prior_params) > rpost(1000, vec2, prior_params))/1000
0.999
Sean Ammirati - creator of Stats Works. He can be reached on Github, LinkedIn and email.
Comments
comments powered by Disqus

Articles in Bayesian Analysis

  • « Bayesian Analysis: Poisson Discrete

All Articles

  • « Bayesian Analysis: Poisson Discrete
  • MANOVA »

Published

Nov 3, 2018

Category

Bayesian Analysis

Tags

  • alternative methodologies 1
  • bayesian modelling 1
  • t test 2

Contact

  • Stats Works - Making Statistics Work for You