Stats Works

Bootstrapping Exercise: An Example Using the GLM Parameters

This is a simple example of using the boot function in R to produce bootstrapped estimates of the bias and standard error of the parameters of a logistic regression (GLM) fit.

In [2]:
install.packages('ISLR')
In [3]:
library(ISLR)
library(boot)
attach(Default)
head(Default)
glm.fit <- glm(default ~ income + balance, family = "binomial", data = Default)
summary(glm.fit)
boot.fn <- function(data, index) {
  return(glm(default ~ income + balance, family = "binomial",
             data = data, subset = index)$coefficients)
}

boot.fn(Default, 1:100)
boot(Default, boot.fn, 1000)
summary(glm.fit)
  default student   balance    income
1      No      No  729.5265 44361.625
2      No     Yes  817.1804 12106.135
3      No      No 1073.5492 31767.139
4      No      No  529.2506 35704.494
5      No      No  785.6559 38463.496
6      No     Yes  919.5885  7491.559

Call:
glm(formula = default ~ income + balance, family = "binomial", 
    data = Default)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-2.4725  -0.1444  -0.0574  -0.0211   3.7245  

Coefficients:
              Estimate Std. Error z value Pr(>|z|)    
(Intercept) -1.154e+01  4.348e-01 -26.545  < 2e-16 ***
income       2.081e-05  4.985e-06   4.174 2.99e-05 ***
balance      5.647e-03  2.274e-04  24.836  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 2920.6  on 9999  degrees of freedom
Residual deviance: 1579.0  on 9997  degrees of freedom
AIC: 1585

Number of Fisher Scoring iterations: 8
  (Intercept)        income       balance 
-2.656607e+01  9.017423e-20 -3.218878e-20 

ORDINARY NONPARAMETRIC BOOTSTRAP


Call:
boot(data = Default, statistic = boot.fn, R = 1000)


Bootstrap Statistics :
         original        bias     std. error
t1* -1.154047e+01 -4.009952e-02 4.329822e-01
t2*  2.080898e-05  1.740632e-07 4.721751e-06
t3*  5.647103e-03  1.964705e-05 2.331222e-04

Call:
glm(formula = default ~ income + balance, family = "binomial", 
    data = Default)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-2.4725  -0.1444  -0.0574  -0.0211   3.7245  

Coefficients:
              Estimate Std. Error z value Pr(>|z|)    
(Intercept) -1.154e+01  4.348e-01 -26.545  < 2e-16 ***
income       2.081e-05  4.985e-06   4.174 2.99e-05 ***
balance      5.647e-03  2.274e-04  24.836  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 2920.6  on 9999  degrees of freedom
Residual deviance: 1579.0  on 9997  degrees of freedom
AIC: 1585

Number of Fisher Scoring iterations: 8

As we can see above, the standard errors estimated by the boot function differ somewhat from those reported by the summary function. This makes sense: the standard errors from summary are formula-based and depend on model assumptions -- in particular, on the estimated variance of the error terms and on the model being correctly specified. The bootstrap makes no such assumptions: it simply resamples the observations repeatedly, refits the model on each resample, and uses the variability of the resulting coefficient estimates to measure the standard error. So the bootstrap estimates are likely closer to the true standard errors, since the summary values depend on the assumed form of the model while the bootstrap does not.
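To make explicit what boot is computing: it reports the bias as the mean of the bootstrap replicates minus the original estimate, and the standard error as the standard deviation of the replicates. Below is a minimal hand-rolled sketch of the same nonparametric bootstrap (illustration only -- the real boot function also supports stratification, weights, and parallel execution):

```r
library(ISLR)
set.seed(1)

# Same statistic as above: refit the GLM on a set of row indices
boot.fn <- function(data, index) {
  coef(glm(default ~ income + balance, family = "binomial",
           data = data, subset = index))
}

B <- 1000
n <- nrow(Default)

# Each bootstrap replicate resamples n rows with replacement and refits
reps <- t(replicate(B, boot.fn(Default, sample(n, n, replace = TRUE))))

original <- boot.fn(Default, 1:n)
bias     <- colMeans(reps) - original   # bootstrap bias: mean(t*) - t0
std.err  <- apply(reps, 2, sd)          # bootstrap standard error
rbind(original, bias, std.err)
```

The printed values will closely match the "original", "bias", and "std. error" columns of the Bootstrap Statistics table above, up to Monte Carlo variation from the random resampling.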

Sean Ammirati - creator of Stats Works. He can be reached on GitHub, LinkedIn, and by email.


Published

Mar 29, 2018

Category

Model Validation

Tags

  • bootstrapping 1
  • generalized linear models 3
  • glm 1
  • linear regression 6
  • parameters 1
