
Heteroskedasticity with Weighted Least Squares

An Approach to Solving Heteroskedasticity When the Variance Differs by a Multiplicative Constant

One of the most widely known assumptions of linear regression is homoskedasticity: the variance of the errors is constant across all observations (it does not depend on the particular observation). Weighted least squares is a way of dealing with violations of this assumption. Below, we detail the intuition and reasoning behind why it works.

Consider a scale used to weigh two objects, where each measurement carries some random error. We weigh each object alone, and then both objects together.

That is, consider the following model:

$$y_{1} = \alpha_{1} + \varepsilon_{1}$$

$$y_{2} = \alpha_{2} + \varepsilon_{2}$$

$$y_{3} = \alpha_{1} + \alpha_{2} + \varepsilon_{3}$$

This is like a regression model, in that we can solve for the alphas (the true weights) by the method of least squares.

$$\mathbf{y} = \mathbf{X}\boldsymbol{\alpha} + \boldsymbol{\varepsilon}, \qquad \mathbf{X} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 1 & 1 \\ \end{bmatrix}$$

$$\mathbf{X}^{T}\mathbf{X} = \begin{bmatrix} 2 & 1 \\ 1 & 2 \\ \end{bmatrix}$$

$$(\mathbf{X}^{T}\mathbf{X})^{-1} = \frac{1}{3}\begin{bmatrix} 2 & -1 \\ -1 & 2 \\ \end{bmatrix}$$

$$(\mathbf{X}^{T}\mathbf{X})^{-1}\mathbf{X}^{T} = \frac{1}{3}\begin{bmatrix} 2 & -1 & 1 \\ -1 & 2 & 1 \\ \end{bmatrix}$$

$$\widehat{\boldsymbol{\alpha}} = \frac{1}{3}\begin{bmatrix} 2y_{1} - y_{2} + y_{3} \\ -y_{1} + 2y_{2} + y_{3} \\ \end{bmatrix}$$
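As a quick numerical check, the following R sketch (with made-up measurements chosen only for illustration) confirms that the closed-form solution above matches the direct matrix computation:

# Hypothetical measurements: object 1 alone, object 2 alone, both together
y <- c(10, 20, 31)
X <- matrix(c(1, 0, 1,
              0, 1, 1), nrow = 3)

# Direct least squares: (X'X)^(-1) X'y
solve(t(X) %*% X) %*% t(X) %*% y

# Closed form derived above
c(2 * y[1] - y[2] + y[3], -y[1] + 2 * y[2] + y[3]) / 3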

Now suppose the random error is not equally variable across the measurements -- specifically, the error for the combined weighing carries some multiplicative factor $k > 1$ on its standard deviation. That is,

$$\begin{bmatrix} \varepsilon_{1} \\ \varepsilon_{2} \\ \varepsilon_{3} \\ \end{bmatrix} \sim \mathrm{MVN}(\mathbf{0}, \Sigma) \quad \text{where } \Sigma = \begin{bmatrix} \sigma^{2} & 0 & 0 \\ 0 & \sigma^{2} & 0 \\ 0 & 0 & k^{2}\sigma^{2} \\ \end{bmatrix}$$

Note that this violates the ordinary least squares assumption of constant error variance. Since $k$ is a known numeric value $> 0$, one way to fix the issue is to transform the variables so that the errors have equal variance.

Note that we can transform the third variable as follows:

$$\frac{y_{3}}{k} = \frac{\alpha_{1}}{k} + \frac{\alpha_{2}}{k}\ + \ \frac{\varepsilon_{3}}{k}$$

Now, $\frac{\varepsilon_{3}}{k}$ has variance $\sigma^{2}$.
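As a quick sanity check, here is a tiny R simulation (with an arbitrary $\sigma$ and $k$ chosen only for illustration) showing that dividing the third error by $k$ brings its variance back down to $\sigma^{2}$:

set.seed(1)
sigma <- 2     # assumed baseline error standard deviation
k <- 3         # assumed inflation factor for the combined weighing
eps3 <- rnorm(1e5, mean = 0, sd = k * sigma)
var(eps3)      # approximately k^2 * sigma^2 = 36
var(eps3 / k)  # approximately sigma^2 = 4, same as eps1 and eps2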

Let $$\mathbf{y}^{\mathbf{*}} = \begin{bmatrix} y_{1} \\ y_{2} \\ y_{3}/k \\ \end{bmatrix} $$

$$\mathbf{X}^{*} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 1/k & 1/k \\ \end{bmatrix}, \qquad \boldsymbol{\alpha}^{*} = \begin{bmatrix} \alpha_{1} \\ \alpha_{2} \\ \end{bmatrix}, \qquad \boldsymbol{\varepsilon}^{*} = \begin{bmatrix} \varepsilon_{1} \\ \varepsilon_{2} \\ \varepsilon_{3}/k \\ \end{bmatrix}$$

Then we can solve as in the case of OLS:

$$\widehat{\boldsymbol{\alpha}}^{*} = \left( \mathbf{X}^{*T}\mathbf{X}^{*} \right)^{-1}\mathbf{X}^{*T}\mathbf{y}^{*}$$

$$\mathbf{X}^{*T}\mathbf{X}^{*} = \begin{bmatrix} \frac{k^{2}+1}{k^{2}} & \frac{1}{k^{2}} \\ \frac{1}{k^{2}} & \frac{k^{2}+1}{k^{2}} \\ \end{bmatrix}$$

$$\left( \mathbf{X}^{*T}\mathbf{X}^{*} \right)^{-1} = \frac{k^{4}}{k^{4}+2k^{2}}\begin{bmatrix} \frac{k^{2}+1}{k^{2}} & \frac{-1}{k^{2}} \\ \frac{-1}{k^{2}} & \frac{k^{2}+1}{k^{2}} \\ \end{bmatrix}$$

$$\left( \mathbf{X}^{*T}\mathbf{X}^{*} \right)^{-1}\mathbf{X}^{*T} = \frac{k^{4}}{k^{4}+2k^{2}}\begin{bmatrix} \frac{k^{2}+1}{k^{2}} & \frac{-1}{k^{2}} & \frac{1}{k} \\ \frac{-1}{k^{2}} & \frac{k^{2}+1}{k^{2}} & \frac{1}{k} \\ \end{bmatrix}$$

$$\widehat{\boldsymbol{\alpha}}^{*} = \frac{k^{4}}{k^{4}+2k^{2}}\begin{bmatrix} \frac{k^{2}+1}{k^{2}}y_{1} - \frac{y_{2}}{k^{2}} + \frac{y_{3}}{k^{2}} \\ -\frac{y_{1}}{k^{2}} + \frac{k^{2}+1}{k^{2}}y_{2} + \frac{y_{3}}{k^{2}} \\ \end{bmatrix}$$

$$\widehat{\boldsymbol{\alpha}}^{*} = \frac{k^{2}}{k^{4}+2k^{2}}\begin{bmatrix} (k^{2}+1)y_{1} - y_{2} + y_{3} \\ -y_{1} + (k^{2}+1)y_{2} + y_{3} \\ \end{bmatrix}$$
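Since the algebra is easy to slip on, here is a short R check (using an arbitrary $k$ and hypothetical measurements) that this closed form agrees with running OLS on the transformed variables:

k <- 2                      # assumed variance-inflation factor, k > 0
y <- c(10, 20, 33)          # hypothetical measurements
ystar <- c(y[1], y[2], y[3] / k)
Xstar <- matrix(c(1, 0, 1 / k,
                  0, 1, 1 / k), nrow = 3)

# OLS on the transformed variables
solve(t(Xstar) %*% Xstar) %*% t(Xstar) %*% ystar

# Simplified closed form from the last display above
k^2 / (k^4 + 2 * k^2) *
  c((k^2 + 1) * y[1] - y[2] + y[3],
    -y[1] + (k^2 + 1) * y[2] + y[3])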

However, this is not a general solution. Consider instead the following:

$$\begin{bmatrix} \varepsilon_{1} \\ \varepsilon_{2} \\ \varepsilon_{3} \\ \end{bmatrix} \sim \mathrm{MVN}(\mathbf{0}, \Sigma) \quad \text{where } \Sigma = \begin{bmatrix} a^{2}\sigma^{2} & 0 & 0 \\ 0 & b^{2}\sigma^{2} & 0 \\ 0 & 0 & c^{2}\sigma^{2} \\ \end{bmatrix}$$

We could attempt the same procedure, dividing $y_{1}$ (and the corresponding row of $\mathbf{X}$) by $a$, $y_{2}$ by $b$, and $y_{3}$ by $c$. However, this is arduous, and it would be more convenient to express the whole transformation with a matrix of weights.

Let $\mathbf{W}$ be the diagonal matrix whose entries are the reciprocals of these variance factors, i.e. $\mathbf{W} = \mathrm{diag}(1/a^{2},\ 1/b^{2},\ 1/c^{2})$. Using the same procedure as above, we can express $\mathbf{y}^{*}$ as $\mathbf{W}^{1/2}\mathbf{y}$ and $\mathbf{X}^{*}$ as $\mathbf{W}^{1/2}\mathbf{X}$. Then our result is:

$$\widehat{\boldsymbol{\alpha}}^{*} = \left( \mathbf{X}^{*T}\mathbf{X}^{*} \right)^{-1}\mathbf{X}^{*T}\mathbf{y}^{*}$$

$$\widehat{\boldsymbol{\alpha}}^{*} = \left( \mathbf{X}^{T}\mathbf{W}\mathbf{X} \right)^{-1}\mathbf{X}^{T}\mathbf{W}\mathbf{y}$$

We can see that if $\mathbf{W} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1/k^{2} \\ \end{bmatrix}$, we get the same result as earlier.

Code Implementation

In [3]:
ols_preds <- function(X, y) {
  # Ordinary least squares estimates: (X'X)^(-1) X'y
  return(solve(t(X) %*% X) %*% t(X) %*% y)
}

weighted_ols_preds <- function(X, y, W) {
  # Weighted least squares estimates: (X'WX)^(-1) X'Wy
  return(solve(t(X) %*% W %*% X) %*% t(X) %*% W %*% y)
}
In [6]:
y <- c(41,53,97)
X <- matrix(c(1,0,1,0,1,1),nrow = 3)
# (X'X)^(-1) X' applied to y gives the least squares estimates
alphahat <- solve(t(X) %*% X) %*% t(X)
alphahat %*% y
42
54

This calculates the estimates of the weights in the first (equal-variance) case.

In [7]:
k <- 1.25
ystar <- y
ystar[3] <- y[3]/k
xstar <- matrix(c(1,0,1/k,0,1,1/k),nrow = 3)
alphahat <- solve(t(xstar) %*% xstar) %*% t(xstar)
alphahat %*% ystar
41.84211
53.84211

This calculates the estimates of the weights in the second case, where the combined measurement has larger variance.

Using the functions we get the same results:

In [8]:
ols_preds(X,y)
W <- diag(c(rep(1,2), 1/k^2))
weighted_ols_preds(X, y, W)
42
54
41.84211
53.84211
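
The weighted function also covers the general case, where each measurement has its own variance factor. As a sketch (with factors $a$, $b$, $c$ chosen purely for illustration), the weight-matrix solution agrees with OLS on the manually rescaled variables:

# Hypothetical per-observation standard deviation factors
a <- 1.5; b <- 0.8; cc <- 2
W <- diag(1 / c(a, b, cc)^2)

# WLS via the weight matrix
weighted_ols_preds(X, y, W)

# Equivalent: rescale each observation by its standard deviation factor
Whalf <- diag(1 / c(a, b, cc))
ols_preds(Whalf %*% X, Whalf %*% y)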
Sean Ammirati - creator of Stats Works. He can be reached on Github, LinkedIn and email.