How to plot logistic glm predicted values and confidence interval in R -
i have binomial glm
of presence/absence response variable , factor variable 9 levels this:
data$y<-factor(data$y,levels=c(0,1),labels=c("absent","present")) table(data$y,data$site_name) andulay antulang basak dauin poblacion district 1 guinsuan kookoo's nest lutoban pier lutoban south malatapay pier absent 4 4 1 0 3 1 5 5 2 present 2 2 5 6 3 5 1 1 4 model <- glm(y~site_name,data=data,binomial)
just skipping model inference , validation brevity's sake, how plot per site probability of getting "present" in boxplot confidence interval? kind of shown in plot predicted probabilities , confidence intervals in r show boxplot, regression variable site_name factor 9 levels, not continuous variable.
i think can calculate necessary values follows (but not 100% sure correctness):
function convert model coefficients probabilities of success:
calc_val <- function(x){return(round(1/(1+1/(exp(x))),3))}
predicted probabilities based on model:
prob <- tapply(predict(model,type="response"),data$site_name,function(x){round(mean(x),3)}) means <- as.data.frame(prob)
75% , 95% confidence intervals predicted probabilities:
ci <- cbind(confint(model,level=0.9),confint(model,level=0.5)) rownames(ci) <- gsub("site_name","",rownames(ci)) ci <- t(apply(ci,1,calc_val))
join in 1 table
ci<-cbind(means,ci) ci prob 5 % 95 % 25 % 75 % pr(>|z|) stderr andulay 0.333 0.091 0.663 0.214 0.469 0.42349216 0.192 antulang 0.333 0.112 0.888 0.304 0.696 1.00000000 0.192 basak 0.833 0.548 0.993 0.802 0.964 0.09916496 0.152 dauin poblacion district 1 1.000 0.000 na 0.000 1.000 0.99097988 0.000 guinsuan 0.500 0.223 0.940 0.474 0.819 0.56032414 0.204 kookoo's nest 0.833 0.548 0.993 0.802 0.964 0.09916496 0.152 lutoban pier 0.167 0.028 0.788 0.130 0.501 0.51171512 0.152 lutoban south 0.167 0.028 0.788 0.130 0.501 0.51171512 0.152 malatapay pier 0.667 0.364 0.972 0.640 0.903 0.25767454 0.192
so questions twofold:
- is calculation of probability , confidence interval correct?
- how plot in bloxplot (box , whiskers plot)?
edit here sample data via dput
(which modified tables above match data):
# dput(data[c("y", "site_name")]) data <- structure(list(y = structure(c(1l, 1l, 1l, 1l, 2l, 2l, 1l, 2l, 2l, 2l, 1l, 2l, 2l, 2l, 2l, 2l, 2l, 2l, 1l, 1l, 1l, 1l, 1l, 2l, 1l, 1l, 1l, 1l, 1l, 2l, 2l, 2l, 2l, 1l, 1l, 1l, 1l, 1l, 1l, 2l, 1l, 2l, 2l, 1l, 2l, 2l, 2l, 2l, 1l, 2l, 2l, 2l, 2l, 2l), .label = c("absent", "present"), class = "factor"), site_name = structure(c(2l, 2l, 2l, 2l, 2l, 2l, 9l, 9l, 9l, 9l, 9l, 9l, 4l, 4l, 4l, 4l, 4l, 4l, 8l, 8l, 8l, 8l, 8l, 8l, 7l, 7l, 7l, 7l, 7l, 7l, 5l, 5l, 5l, 5l, 5l, 5l, 1l, 1l, 1l, 1l, 1l, 1l, 3l, 3l, 3l, 3l, 3l, 3l, 6l, 6l, 6l, 6l, 6l, 6l), .label = c("andulay", "antulang", "basak", "dauin poblacion district 1", "guinsuan", "kookoo's nest", "lutoban pier", "lutoban south", "malatapay pier"), class = "factor")), .names = c("y", "site_name"), row.names = c(125l, 123l, 126l, 124l, 128l, 127l, 154l, 159l, 157l, 158l, 156l, 155l, 111l, 114l, 116l, 115l, 112l, 113l, 152l, 151l, 148l, 150l, 153l, 149l, 143l, 146l, 144l, 147l, 142l, 145l, 164l, 165l, 161l, 163l, 160l, 162l, 120l, 122l, 121l, 117l, 118l, 119l, 137l, 136l, 139l, 141l, 140l, 138l, 129l, 134l, 131l, 135l, 133l, 130l), class = "data.frame") #
this lowest-common-denominator, base-package-only, solution.
fit model:
mm <- glm(y~site_name,data=dd,family=binomial)
make prediction frame site names:
pframe <- data.frame(site_name=unique(dd$site_name))
predict (on logit/linear-predictor scale), standard errors
pp <- predict(mm,newdata=pframe,se.fit=true) linkinv <- family(mm)$linkinv ## inverse-link function
put prediction, lower , upper bounds, , back-transform probability scale:
pframe$pred0 <- pp$fit pframe$pred <- linkinv(pp$fit) alpha <- 0.95 sc <- abs(qnorm((1-alpha)/2)) ## normal approx. likelihood alpha2 <- 0.5 sc2 <- abs(qnorm((1-alpha2)/2)) ## normal approx. likelihood pframe <- transform(pframe, lwr=linkinv(pred0-sc*pp$se.fit), upr=linkinv(pred0+sc*pp$se.fit), lwr2=linkinv(pred0-sc2*pp$se.fit), upr2=linkinv(pred0+sc2*pp$se.fit))
plot.
with(pframe, { plot(site_name,pred,ylim=c(0,1)) arrows(as.numeric(site_name),lwr,as.numeric(site_name),upr, angle=90,code=3,length=0.1) })
as boxplot:
with(pframe, { bxp(list(stats=rbind(lwr,lwr2,pred,upr2,upr), n = rep(1,nrow(pframe)), conf = na, out = null, group = null, names=as.character(site_name))) })
there lots of other ways this; recommend
library("ggplot2") ggplot(pframe,aes(site_name,pred))+ geom_pointrange(aes(ymin=lwr,ymax=upr))+ geom_linerange(aes(ymin=lwr2,ymax=upr2),lwd=1.5)+ coord_flip()
an alternative solution fit model via y~site_name-1
, in case assign separate parameter probability of each site, , use profile()
/confint()
find confidence intervals; more accurate relying on normality of sampling distributions of parameters/predictions done in answer above.
Comments
Post a Comment