The **pcFactorStan** package for **R**
provides convenience functions and pre-programmed **Stan**
models related to analysis of paired comparison data. Its purpose is to
make fitting models using Stan easy and easy to understand.
**pcFactorStan** relies on the **rstan**
package, which should be installed first. See
here for instructions on installing **rstan**.

One situation where a factor model might be useful is when there are people that play in tournaments of more than one game. For example, the computer player AlphaZero (Silver et al. 2018) has trained to play chess, shogi, and Go. We can take the tournament match outcome data for each of these games and find rankings among the players. We may also suspect that there is a latent board game skill that accounts for some proportion of the variance in the per-board game rankings. This proportion can be recovered by the factor model.

Our goal may be to fit a factor model, but it is necessary to build up the model step-by-step. There are essentially three models: ‘unidim’, ‘correlation’, and ‘factor’. ‘unidim’ analyzes a single item. ‘correlation’ is suitable for two or more items. Once you have vetted your items with the ‘unidim’ and ‘correlation’ models, then you can try the ‘factor’ model. There is also a special model ‘unidim_adapt’. Except for this model, the other models require scaling constants. To find appropriate scaling constants, we will fit ‘unidim_adapt’ to each item separately.

The R code below first loads **rstan** and
**pcFactorStan**. We load **loo** for extra
diagnostics, and **qgraph** and **ggplot2**
for visualization.

```
library(rstan)
library(pcFactorStan)
library(loo)
library(qgraph)
library(ggplot2)
library(Matrix)
```

Next we take a peek at the data.

`head(phyActFlowPropensity)`

pa1 | pa2 | skill | predict | novelty | creative | complex | goal1 | feedback1 | chatter | waiting | body | control | present | spont | stakes | evaluated | reward |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|

mountain biking | tennis | 1 | -1 | -2 | 1 | 1 | 1 | 1 | -2 | 1 | 1 | 1 | 1 | 1 | 1 | 2 | 0 |

mountain biking | tennis | 1 | 2 | -1 | -1 | -1 | 0 | 2 | 1 | 2 | 0 | 1 | 0 | 0 | 1 | 2 | -1 |

ice skating | running | -2 | 1 | -1 | -2 | -1 | 1 | 1 | -2 | -2 | -1 | 0 | 0 | -1 | -1 | -1 | 0 |

climbing | rowing | -2 | 2 | -2 | -2 | -2 | 0 | -1 | -1 | -1 | -1 | -1 | -1 | 1 | 0 | 0 | 0 |

card game | gardening | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | -2 | 2 | 1 | 0 | 0 | 2 | -2 | 2 |

dance | table tennis | 0 | -2 | -1 | -1 | 0 | -1 | -1 | -1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |

These data consist of paired comparisons of 87 physical activities on
16 flow-related facets. Participants submitted two activities using
free-form input. These activities were substituted into item templates.
For example, Item *predict* consisted of the prompt, “How
predictable is the action?” with response options:

`A1`

is much more predictable than`A2`

.`A1`

is somewhat more predictable than`A2`

.- Both offer roughly equal predictability.
`A2`

is somewhat more predictable than`A1`

.`A2`

is much more predictable than`A1`

.

If the participant selected ‘golf’ and ‘running’ for activities then
‘golf’ was substituted into `A1`

and ‘running’ into
`A2`

. Duly prepared, the item was presented and the
participant asked to select the most plausible statement.

A *somewhat more* response is scored 1 or -1 and *much
more* scored 2 or -2. A tie (i.e. *roughly equal*) is scored
as zero. A negative value indicates > (greater than) and positive
value indicates > (less than). For example, if `A1`

is
*golf*, `A2`

is *running*, and the observed
response is 2 then the endorsement is “*golf* is much less
predictable than *running*.”

We will need to analyze each item separately before we analyze them
together. Therefore, we will start with Item *skill*. Data must
be fed into **Stan** in a partially digested form. The next
block of code demonstrates how a suitable data list may be constructed
using the `prepData()`

function. This function automatically
determines the number of threshold parameters based on the range
observed in your data. One thing it does not do is pick a
`varCorrection`

factor. The `varCorrection`

determines the degree of adaption in the model. Usually some choice
between 2.0 to 4.0 will obtain optimal results.

```
dl <- prepData(phyActFlowPropensity[,c(paste0('pa',1:2), 'skill')])
dl$varCorrection <- 5.0
```

Next we fit the model using the `pcStan()`

function, which
is a wrapper for `stan()`

from **rstan**. We
also choose the number of chains. As is customary **Stan**
procedure, the first half of each chain is used to estimate the
sampler’s weight matrix (i.e. warm up) and excluded from inference.

`fit1 <- pcStan("unidim_adapt", data=dl)`

A variety of diagnostics are available to check whether the sampler ran into trouble.

```
check_hmc_diagnostics(fit1)
#>
#> Divergences:
#> 0 of 4000 iterations ended with a divergence.
#>
#> Tree depth:
#> 0 of 4000 iterations saturated the maximum tree depth of 10.
#>
#> Energy:
#> E-BFMI indicated no pathological behavior.
```

Everything looks good, but there are a few more things to check. We want \(\widehat R\) < 1.015 and effective sample size greater than 100 times the number of chains (Vehtari et al., 2019).

```
allPars <- summary(fit1, probs=c())$summary
print(min(allPars[,'n_eff']))
#> [1] 593.8
print(max(allPars[,'Rhat']))
#> [1] 1.007
```

Again, everything looks good. If the target values were not reached then we would sample the model again with more iterations. Time for a plot,

```
theta <- summary(fit1, pars=c("theta"), probs=c())$summary[,'mean']
ggplot(data.frame(x=theta, activity=dl$nameInfo$pa, y=0.47)) +
geom_point(aes(x=x),y=0) +
geom_text(aes(label=activity, x=x, y=y),
angle=85, hjust=0, size=2,
position = position_jitter(width = 0, height = 0.4)) + ylim(0,1) +
theme(legend.position="none",
axis.title.x=element_blank(),
axis.title.y=element_blank(),
axis.text.y=element_blank(),
axis.ticks.y=element_blank())
```

Intuitively, this seems like a fairly reasonable ranking for skill. As pretty as the plot is, the main reason that we fit this model was to find a scaling constant to produce a score variance close to 1.0,

```
s50 <- summary(fit1, pars=c("scale"), probs=c(.5))$summary[,'50%']
print(s50)
#> [1] 0.7181
```

We use the median instead of the mean because `scale`

is
not likely to have a symmetric marginal posterior distribution. We
obtained 0.7181, but that value is just for one item. We have to perform
the same procedure for every item. Wow, that would be really tedious …
if we did not have a function to do it for us! Fortunately,
`calibrateItems`

takes care of it and produces a table of the
pertinent data,

`result <- calibrateItems(phyActFlowPropensity, iter=1000L)`

`print(result)`

item | iter | divergent | treedepth | low_bfmi | n_eff | Rhat | scale | thetaVar |
---|---|---|---|---|---|---|---|---|

skill | 1500 | 0 | 0 | 0 | 434.3 | 1.011 | 0.7170 | 0.8754 |

predict | 1500 | 0 | 0 | 0 | 545.1 | 1.012 | 0.7030 | 0.8685 |

novelty | 1000 | 0 | 0 | 0 | 418.4 | 1.012 | 0.5647 | 0.7957 |

creative | 1000 | 0 | 0 | 0 | 439.8 | 1.007 | 0.5689 | 0.7980 |

complex | 1500 | 0 | 0 | 0 | 578.7 | 1.010 | 0.6459 | 0.8396 |

goal1 | 1500 | 0 | 0 | 0 | 491.8 | 1.008 | 0.0819 | 0.3676 |

feedback1 | 1500 | 0 | 0 | 0 | 449.8 | 1.012 | 0.1756 | 0.4986 |

chatter | 1000 | 0 | 0 | 0 | 497.1 | 1.003 | 0.2818 | 0.6026 |

waiting | 1500 | 0 | 0 | 0 | 756.0 | 1.009 | 0.5762 | 0.8021 |

body | 1000 | 0 | 0 | 0 | 492.2 | 1.006 | 0.4143 | 0.7030 |

control | 1000 | 0 | 0 | 0 | 430.9 | 1.003 | 0.3453 | 0.6535 |

present | 1000 | 0 | 0 | 0 | 533.0 | 1.004 | 0.2573 | 0.5810 |

spont | 1000 | 0 | 0 | 0 | 567.8 | 1.008 | 0.2894 | 0.6090 |

stakes | 1500 | 0 | 0 | 0 | 600.0 | 1.010 | 0.3002 | 0.6180 |

evaluated | 2250 | 0 | 0 | 0 | 949.6 | 1.006 | 0.5190 | 0.7693 |

reward | 1500 | 0 | 0 | 0 | 656.1 | 1.005 | 0.2137 | 0.5395 |

Items *goal1* and *feedback1* are prone to failure.
This happens when there is no clear ranking between objects. For
example, if we observe that `A<B`

, `B<C`

,
and `C<A`

then the only sensible interpretation is that
`A=B=C`

which can only have close to zero variance. We
exclude these two items with the smallest `scale`

. I
requested `iter=1000L`

to demonstrate how
`calibrateItems`

will resample the model until
`n_eff`

is large enough and `Rhat`

small enough.
As demonstrated in the *iter* column, some items needed more than
1000 samples to converge.

Next we will fit the correlation model. We exclude parameters that
start with the prefix `raw`

. These parameters are needed by
the model, but should not be interpreted.

```
pafp <- phyActFlowPropensity
excl <- match(c('goal1','feedback1'), colnames(pafp))
pafp <- pafp[,-excl]
dl <- prepData(pafp)
dl$scale <- result[match(dl$nameInfo$item, result$item), 'scale']
```

`fit2 <- pcStan("correlation", data=dl, include=FALSE, pars=c('rawTheta', 'rawThetaCorChol'))`

```
check_hmc_diagnostics(fit2)
#>
#> Divergences:
#> 0 of 4000 iterations ended with a divergence.
#>
#> Tree depth:
#> 0 of 4000 iterations saturated the maximum tree depth of 10.
#>
#> Energy:
#> E-BFMI indicated no pathological behavior.
allPars <- summary(fit2, probs=0.5)$summary
print(min(allPars[,'n_eff']))
#> [1] NaN
print(max(allPars[,'Rhat']))
#> [1] NaN
```

The HMC diagnostics look good, but … oh dear! Something is wrong with
the `n_eff`

and \(\widehat
R\). Let us look more carefully,

```
head(allPars[order(allPars[,'sd']),])
#> mean se_mean sd 50% n_eff Rhat
#> thetaCor[1,1] 1 NaN 0.000e+00 1 NaN NaN
#> thetaCor[2,2] 1 9.391e-19 5.862e-17 1 3897.1 0.999
#> thetaCor[3,3] 1 1.083e-18 6.177e-17 1 3253.6 0.999
#> thetaCor[4,4] 1 4.465e-18 6.671e-17 1 223.2 0.999
#> thetaCor[5,5] 1 1.329e-18 6.743e-17 1 2574.7 0.999
#> thetaCor[7,7] 1 1.448e-18 7.711e-17 1 2837.5 0.999
```

Ah ha! It looks like all the entries of the correlation matrix are reported, including the entries that are not stochastic but are fixed to constant values. We need to filter those out to get sensible results.

```
allPars <- allPars[allPars[,'sd'] > 1e-6,]
print(min(allPars[,'n_eff']))
#> [1] 560.5
print(max(allPars[,'Rhat']))
#> [1] 1.008
```

Ah, much better. Now we can inspect the correlation matrix. There are
many ways to visualize a correlation matrix. One of my favorite ways is
to plot it using the **qgraph** package,

```
corItemNames <- dl$nameInfo$item
tc <- summary(fit2, pars=c("thetaCor"), probs=c(.1,.5,.9))$summary
tcor <- matrix(tc, length(corItemNames), length(corItemNames))
#> Warning in matrix(tc, length(corItemNames), length(corItemNames)): data length
#> differs from size of matrix: [1568 != 14 x 14]
tcor[sign(tc[,'10%']) != sign(tc[,'90%'])] <- 0 # delete faint edges
dimnames(tcor) <- list(corItemNames, corItemNames)
tcor <- nearPD(tcor, corr=TRUE)$mat
qgraph(tcor, layout = "spring", graph = "cor", labels=colnames(tcor),
legend.cex = 0.3,
cut = 0.3, maximum = 1, minimum = 0, esize = 20,
vsize = 7, repulsion = 0.8, negDashed=TRUE, theme="colorblind")
```