How to plot logistic glm predicted values and confidence interval in R -


i have binomial glm of presence/absence response variable , factor variable 9 levels this:

data$y<-factor(data$y,levels=c(0,1),labels=c("absent","present")) table(data$y,data$site_name)            andulay antulang basak dauin poblacion district 1 guinsuan kookoo's nest lutoban pier lutoban south malatapay pier   absent        4        4     1                          0        3             1            5             5              2   present       2        2     5                          6        3             5            1             1              4  model <- glm(y~site_name,data=data,binomial) 

just skipping model inference , validation brevity's sake, how plot per site probability of getting "present" in boxplot confidence interval? kind of shown in plot predicted probabilities , confidence intervals in r show boxplot, regression variable site_name factor 9 levels, not continuous variable.

i think can calculate necessary values follows (but not 100% sure correctness):

function convert model coefficients probabilities of success:

calc_val <- function(x){return(round(1/(1+1/(exp(x))),3))} 

predicted probabilities based on model:

prob <- tapply(predict(model,type="response"),data$site_name,function(x){round(mean(x),3)}) means <- as.data.frame(prob) 

75% , 95% confidence intervals predicted probabilities:

ci <- cbind(confint(model,level=0.9),confint(model,level=0.5)) rownames(ci) <- gsub("site_name","",rownames(ci)) ci <- t(apply(ci,1,calc_val)) 

join in 1 table

ci<-cbind(means,ci) ci                             prob   5 %  95 %  25 %  75 %   pr(>|z|) stderr andulay                    0.333 0.091 0.663 0.214 0.469 0.42349216  0.192 antulang                   0.333 0.112 0.888 0.304 0.696 1.00000000  0.192 basak                      0.833 0.548 0.993 0.802 0.964 0.09916496  0.152 dauin poblacion district 1 1.000 0.000    na 0.000 1.000 0.99097988  0.000 guinsuan                   0.500 0.223 0.940 0.474 0.819 0.56032414  0.204 kookoo's nest              0.833 0.548 0.993 0.802 0.964 0.09916496  0.152 lutoban pier               0.167 0.028 0.788 0.130 0.501 0.51171512  0.152 lutoban south              0.167 0.028 0.788 0.130 0.501 0.51171512  0.152 malatapay pier             0.667 0.364 0.972 0.640 0.903 0.25767454  0.192 

so questions twofold:

  1. is calculation of probability , confidence interval correct?
  2. how plot in bloxplot (box , whiskers plot)?

edit here sample data via dput (which modified tables above match data):

# dput(data[c("y", "site_name")]) data <- structure(list(y = structure(c(1l, 1l, 1l, 1l, 2l, 2l, 1l, 2l, 2l, 2l, 1l, 2l, 2l, 2l, 2l, 2l, 2l, 2l, 1l, 1l, 1l, 1l, 1l, 2l, 1l, 1l, 1l, 1l, 1l, 2l, 2l, 2l, 2l, 1l, 1l, 1l, 1l, 1l, 1l, 2l, 1l, 2l, 2l, 1l, 2l, 2l, 2l, 2l, 1l, 2l, 2l, 2l, 2l, 2l), .label = c("absent", "present"), class = "factor"), site_name = structure(c(2l, 2l, 2l, 2l, 2l, 2l, 9l, 9l, 9l, 9l, 9l, 9l, 4l, 4l, 4l, 4l, 4l, 4l, 8l, 8l, 8l, 8l, 8l, 8l, 7l, 7l, 7l, 7l, 7l, 7l, 5l, 5l, 5l, 5l, 5l, 5l, 1l, 1l, 1l, 1l, 1l, 1l, 3l, 3l, 3l, 3l, 3l, 3l, 6l, 6l, 6l, 6l, 6l, 6l), .label = c("andulay", "antulang", "basak", "dauin poblacion district 1", "guinsuan", "kookoo's nest", "lutoban pier", "lutoban south", "malatapay pier"), class = "factor")), .names = c("y", "site_name"), row.names = c(125l, 123l, 126l, 124l, 128l, 127l, 154l, 159l, 157l, 158l, 156l, 155l, 111l, 114l, 116l, 115l, 112l, 113l, 152l, 151l, 148l, 150l, 153l, 149l, 143l, 146l, 144l, 147l, 142l, 145l, 164l, 165l, 161l, 163l, 160l, 162l, 120l, 122l, 121l, 117l, 118l, 119l, 137l, 136l, 139l, 141l, 140l, 138l, 129l, 134l, 131l, 135l, 133l, 130l), class = "data.frame") # 

this lowest-common-denominator, base-package-only, solution.

fit model:

mm <- glm(y~site_name,data=dd,family=binomial) 

make prediction frame site names:

pframe <- data.frame(site_name=unique(dd$site_name)) 

predict (on logit/linear-predictor scale), standard errors

pp <- predict(mm,newdata=pframe,se.fit=true) linkinv <- family(mm)$linkinv ## inverse-link function 

put prediction, lower , upper bounds, , back-transform probability scale:

pframe$pred0 <- pp$fit pframe$pred <- linkinv(pp$fit) alpha <- 0.95 sc <- abs(qnorm((1-alpha)/2))  ## normal approx. likelihood alpha2 <- 0.5 sc2 <- abs(qnorm((1-alpha2)/2))  ## normal approx. likelihood pframe <- transform(pframe,                     lwr=linkinv(pred0-sc*pp$se.fit),                     upr=linkinv(pred0+sc*pp$se.fit),                     lwr2=linkinv(pred0-sc2*pp$se.fit),                     upr2=linkinv(pred0+sc2*pp$se.fit)) 

plot.

with(pframe, {     plot(site_name,pred,ylim=c(0,1))     arrows(as.numeric(site_name),lwr,as.numeric(site_name),upr,            angle=90,code=3,length=0.1) }) 

as boxplot:

with(pframe, {     bxp(list(stats=rbind(lwr,lwr2,pred,upr2,upr),              n = rep(1,nrow(pframe)),              conf = na,              out = null,              group = null,              names=as.character(site_name))) }) 

there lots of other ways this; recommend

library("ggplot2") ggplot(pframe,aes(site_name,pred))+      geom_pointrange(aes(ymin=lwr,ymax=upr))+      geom_linerange(aes(ymin=lwr2,ymax=upr2),lwd=1.5)+      coord_flip() 

an alternative solution fit model via y~site_name-1, in case assign separate parameter probability of each site, , use profile()/confint() find confidence intervals; more accurate relying on normality of sampling distributions of parameters/predictions done in answer above.


Comments

Popular posts from this blog

sublimetext3 - what keyboard shortcut is to comment/uncomment for this script tag in sublime -

java - No use of nillable="0" in SOAP Webservice -

ubuntu - Laravel 5.2 quickstart guide gives Not Found Error -