Stats Works

Bootstrapping Exercise: An Example Using the GLM Parameters

This is a simple example of using the boot function in R to produce bootstrapped estimates of the bias and standard error of the parameters of a logistic regression (GLM) fit.

In [2]:
install.packages('ISLR')
In [3]:
library(ISLR)
library(boot)
attach(Default)
head(Default)
glm.fit <- glm(default ~ income + balance, family = "binomial", data = Default)
summary(glm.fit)
boot.fn <- function(data, index) {
  return(glm(default ~ income + balance, family = "binomial",
             data = data, subset = index)$coefficients)
}

boot.fn(Default, 1:100)
boot(Default, boot.fn, 1000)
summary(glm.fit)
  default student   balance    income
1      No      No  729.5265 44361.625
2      No     Yes  817.1804 12106.135
3      No      No 1073.5492 31767.139
4      No      No  529.2506 35704.494
5      No      No  785.6559 38463.496
6      No     Yes  919.5885  7491.559

Call:
glm(formula = default ~ income + balance, family = "binomial", 
    data = Default)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-2.4725  -0.1444  -0.0574  -0.0211   3.7245  

Coefficients:
              Estimate Std. Error z value Pr(>|z|)    
(Intercept) -1.154e+01  4.348e-01 -26.545  < 2e-16 ***
income       2.081e-05  4.985e-06   4.174 2.99e-05 ***
balance      5.647e-03  2.274e-04  24.836  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 2920.6  on 9999  degrees of freedom
Residual deviance: 1579.0  on 9997  degrees of freedom
AIC: 1585

Number of Fisher Scoring iterations: 8
  (Intercept)        income       balance 
-2.656607e+01  9.017423e-20 -3.218878e-20 

ORDINARY NONPARAMETRIC BOOTSTRAP


Call:
boot(data = Default, statistic = boot.fn, R = 1000)


Bootstrap Statistics :
         original        bias     std. error
t1* -1.154047e+01 -4.009952e-02 4.329822e-01
t2*  2.080898e-05  1.740632e-07 4.721751e-06
t3*  5.647103e-03  1.964705e-05 2.331222e-04

Call:
glm(formula = default ~ income + balance, family = "binomial", 
    data = Default)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-2.4725  -0.1444  -0.0574  -0.0211   3.7245  

Coefficients:
              Estimate Std. Error z value Pr(>|z|)    
(Intercept) -1.154e+01  4.348e-01 -26.545  < 2e-16 ***
income       2.081e-05  4.985e-06   4.174 2.99e-05 ***
balance      5.647e-03  2.274e-04  24.836  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 2920.6  on 9999  degrees of freedom
Residual deviance: 1579.0  on 9997  degrees of freedom
AIC: 1585

Number of Fisher Scoring iterations: 8

As we can see above, the standard errors estimated by the boot function differ somewhat from those reported by the summary function. This makes sense: the standard errors from summary are formula-based and depend on model assumptions -- in particular, on the estimated variance of the error terms and on the model being correctly specified. The bootstrap makes no such assumptions: it simply resamples the observations repeatedly, refits the model on each resample, and uses the variability of the resulting coefficient estimates to measure the standard error. So the bootstrap estimates are likely closer to the true standard errors, since the summary values depend on the assumed form of the model while the bootstrap does not.
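To make explicit what boot is computing: it reports the bias as the mean of the bootstrap replicates minus the original estimate, and the standard error as the standard deviation of the replicates. Below is a minimal hand-rolled sketch of the same nonparametric bootstrap (illustration only -- the real boot function also supports stratification, weights, and parallel execution):

```r
library(ISLR)
set.seed(1)

# Same statistic as above: refit the GLM on a set of row indices
boot.fn <- function(data, index) {
  coef(glm(default ~ income + balance, family = "binomial",
           data = data, subset = index))
}

B <- 1000
n <- nrow(Default)

# Each bootstrap replicate resamples n rows with replacement and refits
reps <- t(replicate(B, boot.fn(Default, sample(n, n, replace = TRUE))))

original <- boot.fn(Default, 1:n)
bias     <- colMeans(reps) - original   # bootstrap bias: mean(t*) - t0
std.err  <- apply(reps, 2, sd)          # bootstrap standard error
rbind(original, bias, std.err)
```

The printed values will closely match the "original", "bias", and "std. error" columns of the Bootstrap Statistics table above, up to Monte Carlo variation from the random resampling.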

Sean Ammirati - creator of Stats Works. He can be reached on GitHub, LinkedIn, and by email.


Published

Mar 29, 2018

Category

Model Validation

Tags

  • bootstrapping 1
  • generalized linear models 3
  • glm 1
  • linear regression 6
  • parameters 1
