Here we will be doing MANOVA for the dataset below. We want to know if the means of tear, gloss and opacity are statistically significantly different from one another depending on the rate groups -- Low or High.

Data from manova example: Krzanowski (1998, p. 381)

In evaluating MANOVA (Multivariate Analysis of Variance), we can consider a few different approaches to estimating the "size" of deviance from the between (the matrix $H$) and the within (the matrix $E$) distances. Each matrix represents something similar to the univariate ANOVA example, but we are now comparing the differences in matrices rather than in vectors. This makes the task more difficult, as we cannot simply use the F-distribution as a comparison of squared differences -- our results will also look like matrices, not single numbers.

There have been four proposed solutions to this problem, named after Wilks, Roys, Pillai and Hotelling respectively. Each of these attempts to quantify the magnitude of the between matrix divided by the within matrix using the determinant or the eigenvalues of the product of these two matrices. If this value is greater than a certain threshold, we conclude that the groups are different, via a similar intuition with ANOVA. However, unlike ANOVA, no such simple distribution exists to describe the two error matrices.

We also implement here the F approximations for each of these statistics to quantify our results.

In [2]:

return_MANOVA_results <- function(dataset, group_column) {
  F_dists <- find_approx_F_dist(dataset, group_column)
  results <- 1 - sapply(X = F_dists[c('Wilks','Roy','Pillai','Hotelling')], FUN = function(obj) pf(obj[1], obj[2], obj[3]))
  results <- rbind(p_f = results, statistic = F_dists$untransformed[1:4],
                   data.frame(F_dists[c('Wilks','Roy','Pillai','Hotelling')],row.names = c('F Approx.', 'df1', 'df2')))
  return(results)
}

find_approx_F_dist <- function(dataset, group_column) {
  # dataset (dataframe): dataset with groupings as integer values. 
  # group_column (string): string name of column with group entries, integer values. 
  MANOVA_stat_lst <- find_MANOVA_statistics(dataset,group_column)
  
  Vh <- MANOVA_stat_lst['n_groups'] - 1
  p <- ncol(dataset) - 1
  Ve <- nrow(dataset) - Vh - 1
  s <- min(Vh, p) 
  
  Wilks_approx_F <- FforWilks(Lambda = MANOVA_stat_lst['Wilks'], Vh = Vh, p = p, Ve = Ve)
  Roy_approx_F <- FBoundforRoy(R = MANOVA_stat_lst['Roys'], Vh = Vh, p = p, Ve = Ve)
  Pillai_approx_F <- FapproxforPillai(P = MANOVA_stat_lst['Pillai'], Vh = Vh, p = p, Ve = Ve, s = s)
  Hotelling_approx_F <- FapproxforHotelling(H = MANOVA_stat_lst['Hotelling'], Vh = Vh, p = p, Ve = Ve)

  return(list(untransformed = MANOVA_stat_lst, Wilks = Wilks_approx_F,Roy = Roy_approx_F, Pillai = Pillai_approx_F, Hotelling = Hotelling_approx_F))
}

find_MANOVA_statistics <- function(dataset, group_column){
  # dataset (dataframe): dataset with groupings as integer values. 
  # group_column (string): string name of column with group entries, integer values. 
  
  grouped_ds_lst <- lapply(1:length(unique(dataset[,group_column])), function(i) dataset[dataset[,group_column] == i,!names(dataset) %in% c(group_column)])
  H <- Reduce("+", lapply(grouped_ds_lst,function(g) hypothesis_iv(g, dataset)))
  E <- Reduce("+", lapply(grouped_ds_lst,function(g) error_iv(g, dataset)))
  
  Einv <- solve(E)
  HoverE <- H %*% Einv
  eigen_det <- eigen(HoverE)
  ev <- eigen_det$values
  
  Wilks <- 1/(prod(1 + ev))
  Roys <- (max(ev))/(1 + max(ev))
  Pillai <- sum(ev/(1 + ev))
  Hotelling <- sum(ev)
  
  return(c(Wilks = Wilks,Roys = Roys,Pillai = Pillai,Hotelling = Hotelling, n_groups = length(grouped_ds_lst)))
}

hypothesis_iv <- function(group,dataset) {
  A <- sapply(group,mean)
  B <- sapply(dataset[, 1:(ncol(dataset) - 1)],mean)
  dif <- A - B
  return((dif %*% t(dif))*nrow(group))
}

error_iv <- function(group,dataset) {
  A <- sapply(group,mean)
 
  mattimestranspose <- function(A)
    return(A %*% t(A))
  
  mylist <- lapply(1:nrow(group), function(i) mattimestranspose(t(as.matrix(group[i,] - A))))
  a1 <- Reduce('+',mylist)
  
  return(a1)
}

FforWilks <- function(Lambda, Vh, p, Ve) {
  t <- (((p^2)*(Vh^2) - 4)/((p^2) + (Vh^2) - 5))^.5
  df1 <- p*Vh
  w <- Ve + Vh - ((p + Vh + 1)/2)
  df2 <- w*t - (((p*Vh) - 2)/2)
  Fapprox <- ((1 - (Lambda^(1/t)))/(Lambda^(1/t)))*(df2/df1)
  return(c(Fapprox = Fapprox, df1 = df1, df2 = df2))
}

FBoundforRoy <- function(R, Vh, p, Ve) {
  v1 <- (Ve - p - 1)/2
  v2 <- (p - Vh - 1)/2
  
  return(c(Fapprox = R*v1/v2, df1 = 2*v1 + 2, df2 = 2*v2 + 2))
}

FapproxforPillai <- function(P, Vh, p, Ve, s) {
  N <- (Ve - Vh + s)/2
  m <- (abs(Vh - p) - 1)/2
  num <- (2*N + s + 1)*P
  den <- (2*m + s + 1)*(s - P)
  Fapprox <- num/den
  df1 <- s*(2*m + s + 1)
  df2 <- s*(2*N + s + 1)
  return(c(Fapprox = Fapprox, df1 = df1, df2 = df2))
}

FapproxforHotelling <- function(H, Vh, p, Ve) {
  a <- p*Vh
  B <- ((Ve + Vh - p - 1)*(Ve - 1))/((Ve - p - 3)*(Ve - p))
  b <- 4 + ((a + 2)/(B - 1))
  cnum <- a*(b - 2)
  cden <- b*(Ve - p + 1)
  Fapprox <- (H*cden)/cnum
  return(c(Fapprox = Fapprox, df1 = a, df2 = b))
}

In [3]:

tear <- c(6.5, 6.2, 5.8, 6.5, 6.5, 6.9, 7.2, 6.9, 6.1, 6.3,
          6.7, 6.6, 7.2, 7.1, 6.8, 7.1, 7.0, 7.2, 7.5, 7.6)
gloss <- c(9.5, 9.9, 9.6, 9.6, 9.2, 9.1, 10.0, 9.9, 9.5, 9.4,
           9.1, 9.3, 8.3, 8.4, 8.5, 9.2, 8.8, 9.7, 10.1, 9.2)
opacity <- c(4.4, 6.4, 3.0, 4.1, 0.8, 5.7, 2.0, 3.9, 1.9, 5.7,
             2.8, 4.1, 3.8, 1.6, 3.4, 8.4, 5.2, 6.9, 2.7, 1.9)
rate     <- gl(2,10, labels = c("Low", "High"))

Y <- data.frame(cbind(tear,gloss,opacity,rate))

return_MANOVA_results(Y, 'rate')

	Wilks	Roy	Pillai	Hotelling
p_f	0.002273044	0.05409917	4.282064e-04	0.001220981
statistic	0.413619230	0.58638077	5.863808e-01	1.417682561
F Approx.	7.560973658	8.20933078	9.451217e+00	8.641112752
df1	3.000000000	16.00000000	3.000000e+00	3.000000000
df2	16.000000000	3.00000000	2.000000e+01	16.000000000

We see that all four methods approximately agree that this would be unlikely if the two groups followed the same Multivariate Normal Distribution.

Sean Ammirati - creator of Stats Works. He can be reached on Github, LinkedIn and email.

Comments

MANOVA

Published

Category

Tags

Contact