cdfquantreg: IPCC data example

Yiyun Shou, Michael Smithson

IPCC Study

The fourth Intergovernmental Panel on Climate Change (IPCC) report utilizes verbal phrases such as “likely” and “unlikely” to describe uncertainties in climate science (e.g., “The Greenland ice sheet and other Arctic ice fields likely contributed no more than 4 m of the observed sea level rise.”). The IPCC report also provided guidelines to enable readers to interpret these phrases as numerical intervals (e.g., “likely” was characterized as referring to probabilities between .66 and 1).

Budescu, Broomell, and Por (2009) conducted an experimental study of lay interpretations of these phrases, using 13 sentences from the IPCC report. They asked participants to provide lower, “best”, and upper numerical estimates of the probabilities to which they believed each sentence referred. They found that participants’ “best” estimates were nearer to the middle of the [0, 1] interval than the IPCC guidelines specified. In a reanalysis of their data using beta regression, Smithson et al. (2012) reported that this tendency was stronger for negatively worded phrases (e.g., “unlikely”) than for positively worded phrases. Moreover, they found greater dispersion of responses (i.e., less consensus) for negative than for positive phrases.

About the data

The IPCC dataset comprises the lower, best, and upper estimates for the phrases “likely” and “unlikely” in six IPCC report sentences. Each of the 223 participants contributes 18 observations: a lower, a best, and an upper estimate for each of the 6 sentences. The Question variable identifies the sentence: the “likely” sentences are Q4, Q5, and Q6, and the “unlikely” sentences are Q8, Q9, and Q10. A variable named valence takes a value of 1 for “likely” and 0 for “unlikely”. Lower, best, and upper estimates are identified by the variables “mid” and “high”, such that both are 0 for the lower estimates, mid = 1 and high = 0 for the best estimates, and mid = 1 and high = 1 for the upper estimates.
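The indicator coding can be illustrated on a few made-up rows. This is only a sketch: the `toy` data frame and its prob values are hypothetical and are not taken from the IPCC data, and any row with high = 1 is simply treated as an upper estimate.

```r
# Toy rows that follow the mid/high coding described above
# (illustrative values only, not real IPCC observations)
toy <- data.frame(
  mid  = c(0, 1, 1),
  high = c(0, 0, 1),
  prob = c(0.40, 0.55, 0.70)
)

# Label each row: high = 1 marks an upper estimate; otherwise mid
# separates best (1) from lower (0) estimates
toy$estimate <- ifelse(toy$high == 1, "upper",
                       ifelse(toy$mid == 1, "best", "lower"))

# Keep only the best estimates, using the same condition as the analysis code
best_only <- subset(toy, mid == 1 & high == 0)
print(toy)
print(best_only)
```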

The raw estimates themselves are the variable named prob, and probm is a transformation that shifts prob away from the boundary values of 0 and 1. Thus, probm is the appropriate dependent variable for a cdfquantreg model.
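The source does not state which transformation produced probm. The values printed in the data overview are, however, consistent with the common "squeeze" formula (y(n - 1) + 0.5)/n with n = 223, the number of participants. The sketch below checks this; the helper name `squeeze` is hypothetical, and the choice of formula and of n is an inference from the displayed values, not something the package documentation is quoted as confirming.

```r
# Hypothetical helper (not part of cdfquantreg): compress values in [0, 1]
# into the open interval (0, 1)
squeeze <- function(y, n) {
  stopifnot(all(y >= 0 & y <= 1), n > 1)
  (y * (n - 1) + 0.5) / n
}

# prob values from the first six best-estimate rows shown in the data overview
prob <- c(0.56, 0.51, 0.52, 0.35, 0.42, 0.90)

# n = 223 is an assumption inferred from the printed probm values
probm_check <- squeeze(prob, n = 223)
print(round(probm_check, 7))

# The boundary values 0 and 1 are moved strictly inside the unit interval
boundary_check <- squeeze(c(0, 1), n = 223)
print(boundary_check)
```

If the printed values agree with the probm column, the inferred formula is at least compatible with the data shown here.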

The remaining three variables (treat, narrow, and wide) represent the experimental conditions in the Budescu et al. (2009) study. The “treat” variable codes two conditions: treat = 0 if participants were given a table containing the IPCC guidelines, and treat = 1 if the IPCC guideline was embedded in the sentence itself. Budescu et al. (2009) reported that embedding the guideline in the sentence made respondents’ estimates less regressive and closer to the IPCC guidelines.

library(cdfquantreg)
data(cdfqrExampleData)
ipcc_mid <- subset(IPCC, mid == 1 & high == 0)

# Overview of the data
knitr::kable(head(ipcc_mid), row.names = FALSE)
subj  treat  prob  probm      mid  high  Question  valence
1     1      0.56  0.5597309  1    0     Q4        1
1     1      0.51  0.5099552  1    0     Q5        1
1     1      0.52  0.5199103  1    0     Q6        1
1     1      0.35  0.3506726  1    0     Q8        0
1     1      0.42  0.4203587  1    0     Q9        0
1     1      0.90  0.8982063  1    0     Q10       0
# Distribution of the data
MASS::truehist(ipcc_mid$probm)

# Choice of CDF distribution: finite tailed
cdfqrFamily(shape='FT')
## Overview cdfquantreg distributions:
Distributions     fd       sd       shape
ArcSinh-ArcSinh   arcsinh  arcsinh  Finite-tailed
ArcSinh-Cauchy    arcsinh  cauchy   Finite-tailed
Cauchit-ArcSinh   cauchit  arcsinh  Finite-tailed
Cauchit-Cauchy    cauchit  cauchy   Finite-tailed
T2-T2             T2       T2       Finite-tailed

Model fit

# We use the T2-T2 distribution
fd <- "t2"
sd <- "t2"

# Fit the null model
fit_null <- cdfquantreg(probm ~ 1 | 1, fd, sd, data = ipcc_mid)

# Fit the target model
fit <- cdfquantreg(probm ~ valence | valence, fd, sd, data = ipcc_mid)

# Obtain the statistics for the target model
summary(fit)
## Family:  t2 t2 
## Call:  cdfquantreg(formula = probm ~ valence | valence, data = ipcc_mid,  
##     fd = fd, sd = sd) 
## 
## Mu coefficients (Location submodel)
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  0.79843    0.03436  23.240  < 2e-16 ***
## valence     -0.18599    0.04120  -4.514 6.37e-06 ***
## 
## Sigma coefficients (Dispersion submodel)
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -0.36790    0.04500  -8.176 2.22e-16 ***
## valence     -0.42062    0.06228  -6.754 1.44e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Converge:  successful completion
## Log-Likelihood:  435.2941 
## 
## Gradient:  -0.0376 -0.0387 0.0129 -0.0011
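The dispersion coefficients are easier to read on the scale of sigma itself. The sketch below assumes the dispersion submodel uses a log link, so that sigma is the exponential of the linear predictor; the coefficient values are copied from the summary output above, and the variable names are illustrative.

```r
# Dispersion-submodel coefficients copied from the summary output
delta_intercept <- -0.36790
delta_valence   <- -0.42062

# Assumed log link: sigma = exp(linear predictor)
sigma_unlikely <- exp(delta_intercept)                  # valence = 0
sigma_likely   <- exp(delta_intercept + delta_valence)  # valence = 1

# The ratio of the two dispersion parameters equals exp(delta_valence)
sigma_ratio <- sigma_likely / sigma_unlikely

print(round(c(unlikely = sigma_unlikely,
              likely   = sigma_likely,
              ratio    = sigma_ratio), 3))
```

Under that assumption, a negative valence coefficient means a smaller sigma for “likely” than for “unlikely” sentences, which is in line with the greater dispersion for negative phrases reported by Smithson et al. (2012).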

Model diagnosis

# Compare the empirical distribution and the fitted values distribution
plot(fit)

# Plot the fitted values
plot(fitted(fit, "full"))

# Check Residuals
plot(residuals(fit, "raw"))
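As a numerical complement to the plots, the null and target models fitted earlier can be compared with a likelihood-ratio test. The sketch below is a generic implementation, not the package's own comparison routine: `lr_test` is a hypothetical helper, and the final step assumes that `logLik()` works on the fitted cdfquantreg objects. The target model adds valence to both submodels, hence 2 degrees of freedom.

```r
# Hypothetical helper (not part of cdfquantreg): likelihood-ratio test
# for two nested models, given their log-likelihoods
lr_test <- function(loglik_null, loglik_full, df) {
  statistic <- 2 * (loglik_full - loglik_null)
  p_value <- pchisq(statistic, df = df, lower.tail = FALSE)
  list(statistic = statistic, df = df, p_value = p_value)
}

# Run the comparison only if the fitted objects from this example exist
# in the workspace. Assumption: logLik() is available for these fits.
if (exists("fit_null", inherits = FALSE) && exists("fit", inherits = FALSE)) {
  print(lr_test(as.numeric(logLik(fit_null)),
                as.numeric(logLik(fit)),
                df = 2))
}
```

A small p-value would indicate that including valence improves on the intercept-only model.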

References