\documentclass[nojss]{jss}
\usepackage{dsfont}
\usepackage{bbm}
\usepackage{bm}
\usepackage{amssymb}
\usepackage{amsmath}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% declarations for jss.cls
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%% just as usual
\author{P. M. E. Altham\\University of Cambridge \And
        Robin K. S. Hankin\\University of Stirling}
\title{Multivariate Generalizations of the Multiplicative Binomial Distribution: Introducing the \pkg{MM} Package}
%\VignetteIndexEntry{The Multiplicative Multinomial distribution}

%% for pretty printing and a nice hypersummary also set:
\Plainauthor{P. M. E. Altham and Robin K. S. Hankin}
\Plaintitle{Multivariate Generalizations of the Multiplicative Binomial Distribution: Introducing the MM Package}
\Shorttitle{The MM distribution}

%% an abstract and keywords
\Abstract{
  We present two natural generalizations of the multinomial and
  multivariate binomial distributions, which arise from the
  multiplicative binomial distribution of~\citet{altham1978}.  The
  resulting two distributions are discussed, and we introduce an
  \proglang{R} package, \pkg{MM}, which includes associated
  functionality.  This vignette is based on~\cite{altham2012}.
}
\Keywords{Generalized Linear Models, Multiplicative Binomial,
  Overdispersion, Overdispersed Binomial, Categorical Exponential
  Family, Multiplicative Multinomial Distribution, \proglang{R}}
\Plainkeywords{Generalized Linear Models, Multiplicative Binomial,
  Overdispersion, Overdispersed Binomial, Categorical Exponential
  Family, Multiplicative Multinomial Distribution, R}

%% publication information
%% NOTE: This needs to be filled out ONLY IF THE PAPER WAS ACCEPTED.
%% If it was not (yet) accepted, leave them commented.
%% \Volume{13}
%% \Issue{9}
%% \Month{September}
%% \Year{2004}
%% \Submitdate{2004-09-29}
%% \Acceptdate{2004-09-29}

%% The address of (at least) one author should be given
%% in the following format:
\Address{
  P. M. E. Altham\\
  Statistical Laboratory\\
  Centre for Mathematical Sciences\\
  Wilberforce Road\\
  Cambridge CB3 0WB\\
  United Kingdom\\
  E-mail: \email{P.M.E.Altham@statslab.cam.ac.uk}

  Robin K. S. Hankin\\
  University of Stirling\\
  Scotland\\
  E-mail: \email{hankin.robin@gmail.com}
}
%% It is also possible to add a telephone and fax number
%% before the e-mail in the following format:
%% Telephone: +43/1/31336-5053
%% Fax: +43/1/31336-734

%% for those who use Sweave please include the following line (with % symbols):
%% need no \usepackage{Sweave.sty}

%% end of declarations %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\SweaveOpts{echo=FALSE}

\begin{document}

\newcommand{\Prob}{\operatorname{Prob}}
\newcommand{\boldp}{\mathbf{p}}
\newcommand{\boldy}{\mathbf{y}}
\newcommand{\boldtheta}{\bm{\theta}}
\newcommand{\bn}{\mathbf{n}}

\section{Introduction}

<<>>=
options("show.signif.stars" = FALSE)
options(prompt = "R> ", continue = "+ ", width = 70, useFancyQuotes = FALSE)
@

The uses of the binomial and multinomial distributions in statistical
modelling are very well understood, with a huge variety of
applications and appropriate software, but there are plenty of
real-life examples where these simple models are inadequate.  In the
current paper, we first remind the reader of the two-parameter
exponential family generalization of the binomial distribution
introduced by~\citet{altham1978} to allow for over- or
under-dispersion:
\begin{equation}\label{altham.eqn9}
\Prob\left(Z=j\right) = \left.
{n \choose j}p^jq^{n-j}\theta^{j(n-j)}\right/ f(p,\theta,n)\qquad
j=0,\ldots,n
\end{equation}
where
\begin{equation}\label{altham.eqn9half}
f(p,\theta,n)=\sum_{j=0}^n{n\choose j} p^jq^{n-j}\theta^{j(n-j)}.
\end{equation}

Here, $0\leqslant p\leqslant 1$ is a probability, $p+q=1$,
and~$\theta>0$ is the new parameter which controls the shape of the
distribution; the standard binomial~$\mathsf{Bi}(n,p)$ is recovered
when $\theta=1$.  \citeauthor{altham1978} points out that this
distribution is more sharply peaked than the binomial if~$\theta>1$,
and more diffuse if~$\theta<1$.  As far as we are aware, no other
generalization has this type of flexibility; for example, the
beta-binomial distribution~\citep{johnson2005} only allows for
{\em over-}dispersion relative to the corresponding binomial
distribution.

We then introduce two different generalizations, both of which are of
exponential family form.  We call these:
\begin{description}
\item[The multiplicative multinomial distribution]\hfill\\
Take non-negative integers $y_1, \ldots, y_k \geqslant 0$ with $\sum
y_i = y$, a fixed integer.  Then suppose that the probability mass
function of~$Y_1, \ldots, Y_k$ is
\begin{equation}\label{mult_mult}
\Prob(y_1, \ldots, y_k) = C^{-1}
{y \choose y_1\ldots y_k}
\prod_{i=1}^{k} p_i^{y_i}
\prod_{1\leqslant i<j\leqslant k}\theta_{ij}^{y_iy_j}
\end{equation}
where $p_i\geqslant 0$ with $\sum p_i=1$, and $\theta_{ij}>0$ for
$1\leqslant i<j\leqslant k$; here $C$ is a normalization constant.
The standard multinomial is recovered when $\theta_{ij}=1$ for
all~$i,j$.
\item[The multivariate multiplicative binomial distribution]\hfill\\
For simplicity of notation, we restrict attention to the bivariate
case.  The proposed probability mass function of~$X_1,X_2$ is
\begin{equation}\label{fishpat}
\Prob\left(X_1=x_1,X_2=x_2\right)=C^{-1}
{m_1 \choose x_1\, z_1} p_1^{x_1} q_1^{z_1} \theta_1^{x_1z_1}
{m_2 \choose x_2\, z_2} p_2^{x_2} q_2^{z_2} \theta_2^{x_2z_2}
\phi^{x_1x_2}
\end{equation}
where $x_i+z_i=m_i$ and $p_i+q_i=1$ for $i=1,2$, with
$\theta_1,\theta_2,\phi>0$; again $C$ is a normalization constant.
The new parameter~$\phi$ controls the dependence between~$X_1$
and~$X_2$: the correlation is positive if~$\phi>1$, negative
if~$\phi<1$.  The conditional distribution is again of multiplicative
binomial form, since we can write
\begin{equation}\label{cond_fish}
\Prob\left(X_1=x_1 |X_2=x_2\right)\propto
{m_1 \choose x_1\, z_1}
\left(p_1 \phi^{x_2}\right)^{x_1} q_1 ^{z_1} \theta_1 ^{x_1z_1}.
\end{equation}
\end{description}

\section[The MM package]{The \pkg{MM} package}

The \pkg{MM} package associated with this article provides
\proglang{R} functionality for assessing the multiplicative
multinomial and multivariate binomial.  We have provided
user-friendly wrappers to expedite use of the distributions in a data
analysis setting.

The \pkg{MM} package uses an object-oriented approach: the set of
free parameters (one vector, one upper-triangular matrix) is not a
standard \proglang{R} object, and is defined to be an object of
\proglang{S4} class \code{paras}.  The objects thus defined are
user-transparent, and a number of manipulation methods are provided
in the package.

For example, consider Equation~\ref{mult_mult} with~$k=5$,
$p_i=\frac{1}{5}$ for $1\leqslant i\leqslant 5$, and~$\theta_{ij}=2$
for~$1\leqslant i<j\leqslant 5$:

<<>>=
options('digits'=5)
@
<<echo=TRUE>>=
library("MM")
pm1 <- paras(5,pnames=letters[1:5])
theta(pm1) <- 2
pm1
@

Now we may sample repeatedly from the distribution (sampling is quick
because it does not require evaluation of the normalization
constant).  Consider~$y=20$:

<<echo=TRUE>>=
set.seed(0)
(sample1 <- rMM(n=10, Y=20, paras=pm1))
@

See how closely clustered the sample is around its mean
of~$(4,4,4,4,4)$; compare the wider dispersion of the multinomial:

<<echo=TRUE>>=
pm2 <- pm1
theta(pm2) <- 1   # revert to classical multinomial
(sample2 <- rMM(n=10, Y=20, paras=pm2))
@

Thus \code{sample2} is drawn from the classical multinomial.
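To make the normalization in Equation~\ref{mult_mult} concrete, here
is a minimal base-\proglang{R} sketch (not executed here) which
evaluates the mass function for $k=3$ by brute-force enumeration.
The function \code{mm_unnorm()} and the other object names are ours,
for illustration only; the package itself provides equivalent
functionality far more efficiently:

\begin{Schunk}
\begin{Sinput}
## unnormalized multiplicative multinomial mass; th has unit diagonal
## (the pair ordering of upper.tri() matches combn() only for k=3)
mm_unnorm <- function(y, p, th){
  exp(lfactorial(sum(y)) - sum(lfactorial(y)) + sum(y*log(p)) +
      sum(log(th[upper.tri(th)]) * combn(y, 2, prod)))
}
y <- 4; p <- c(0.2, 0.3, 0.5)
th <- matrix(1, 3, 3); th[1,2] <- th[2,1] <- 2    # theta_12 = 2
grid <- expand.grid(y1 = 0:y, y2 = 0:y)
grid <- grid[grid$y1 + grid$y2 <= y, ]            # cells with y1+y2 <= y
C <- sum(apply(cbind(grid, y3 = y - grid$y1 - grid$y2), 1,
               mm_unnorm, p = p, th = th))        # normalization constant
mm_unnorm(c(2, 1, 1), p, th)/C                    # Prob(Y = (2,1,1))
\end{Sinput}
\end{Schunk}

The expensive step is the sum defining \code{C}, which ranges over
every composition of~$y$ into~$k$ parts; this is the cost that the
sampling routine \code{rMM()} avoids.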
It is then straightforward to perform a likelihood ratio test on,
say, \code{sample1}:

<<echo=TRUE>>=
support1 <- MM_allsamesum(sample1, paras=pm1)
support2 <- MM_allsamesum(sample1, paras=pm2)
@
<<echo=TRUE>>=
support1-support2
@

Function \code{MM_allsamesum()} calculates the log-likelihood for a
specific parameter object (here \code{pm1} and \code{pm2}
respectively) and we see that, for \code{sample1}, hypothesis
\code{pm1} is preferable to~\code{pm2} on the grounds of a likelihood
ratio of
about~$\Lambda=\Sexpr{round(exp(-support1+support2)*1e6,2)}\times
10^{-6}$, corresponding to \Sexpr{round(support1-support2,2)} units
of support.  This exceeds the two units of support criterion
suggested by~\cite{edwards1992}, so we may reject~\code{pm2}.
Alternatively, we could observe that~$-2\log\Lambda$ is in the tail
region of its asymptotic distribution, $\chi^2_1$.

The package includes a comprehensive suite of functionality which is
documented through the \proglang{R} help system and accessible by
typing \code{library(help="MM")} at the command prompt.

\section{The package in use}

The package comes with a number of datasets, four of which are
illustrated here.  We begin with a small synthetic dataset, then
consider data taken from the social sciences, previously analyzed
by~\cite{wilson1989}; analyze some pollen counts considered
by~\cite{mosimann1962} in the context of palaeoclimatology; and
finally assess a marketing science dataset.

\subsection{Synthetic voting dataset}
\label{voting_dataset}

We begin with a small synthetic dataset which is simple enough to
illustrate the salient aspects of the multiplicative multinomial
distribution, and the \pkg{MM} package.  This dataset arises from~96
households, each of size~4, in which each member of the household is
recorded as voting \code{Lib}, \code{Con} or \code{Lab}.  We
take~$n(\cdot,\cdot,\cdot)$ as the voting tally; thus~$n(0,0,4)=5$
[the first line] means that there are exactly~5 households in which
all~4 members vote Labour; similarly~$n(0,1,3)=8$ means that there
are exactly~8 households in which~1 member votes Conservative and the
remaining~3 vote Labour.

<<echo=TRUE>>=
data("voting")
cbind(voting,voting_tally)
@

One natural hypothesis is that the data are drawn from a multinomial
distribution (alternative hypotheses might recognize that individuals
within a given household may be non-independent of each other in
their voting).  The multinomial hypothesis may be assessed using
\code{glm()} following~\cite{lindsey1992}, but without the
interaction terms:

<<>>=
Off <- -rowSums(lfactorial(voting))
jj <- glm(voting_tally~-1+(.)+offset(Off),
          data=as.data.frame(voting), family=poisson)
@
\begin{Schunk}
\begin{Sinput}
R> Off <- -rowSums(lfactorial(voting))
R> summary(glm(voting_tally ~ -1 + (.) + offset(Off),
+    data = as.data.frame(voting), family = poisson))
\end{Sinput}
\end{Schunk}
<<>>=
Off <- -rowSums(lfactorial(voting))
summary(glm(voting_tally~-1+(.)+offset(Off),
            data=as.data.frame(voting), family=poisson))
@

Thus the model fails to fit, the residual deviance
of~\Sexpr{round(jj$deviance,3)} %$
being much larger than the corresponding degrees of freedom, 12.
This is because the observed frequencies of the cells in which all
members of the household vote for the same party (namely rows~1, 5
and~15 of the data) greatly exceed the corresponding expected numbers
under the simple multinomial model.
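As a quick sanity check on \citeauthor{lindsey1992}'s Poisson device,
the fitted cell probabilities of the no-interaction model should
coincide with the raw sample proportions for the three parties.  A
sketch (not executed), using the object \code{jj} fitted above; the
name \code{p_glm} is ours:

\begin{Schunk}
\begin{Sinput}
R> p_glm <- exp(coef(jj))   # coefficients are log-probabilities
R> p_glm/sum(p_glm)         # normalized: fitted category probabilities
R> colSums(voting*voting_tally)/sum(voting*voting_tally)  # sample proportions
\end{Sinput}
\end{Schunk}

The two displayed vectors agree because, under the plain multinomial,
the maximum likelihood estimate of~$p$ is the vector of sample
proportions, and the Poisson log-linear fit reproduces the
multinomial MLE.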
The next step is to take account of the fact that individuals within
a given household may be non-independent of each other in their
voting intentions (and may indeed tend to disagree with each other
rather than all vote the same way).  Positive dependence between
individuals in a household could be modelled by the
Dirichlet-multinomial distribution~\citep{mosimann1962}, but the
multiplicative multinomial introduced here allows the dependence
between individuals in a household to be positive or negative.

The MM parameters may be estimated, again
following~\cite{lindsey1992} but this time admitting first-order
interaction, using the bespoke function~\code{Lindsey()}:

<<echo=TRUE>>=
Lindsey(voting, voting_tally, give_fit = TRUE)
@
<<>>=
jjvoting <- Lindsey(voting, voting_tally, give_fit=TRUE)
p.fit <- jjvoting$fit$coefficients[1:3]
p.mle <- p(jjvoting$MLE)
@

Observe that the MLEs of~$p$ [viz~$(\Sexpr{round(p.mle[1],3)},
\Sexpr{round(p.mle[2],3)}, \Sexpr{round(p.mle[3],3)})$] are
proportional to the exponentials of the estimated regression
coefficients: $(e^{\Sexpr{round(p.fit[1],3)}},
e^{\Sexpr{round(p.fit[2],3)}}, e^{\Sexpr{round(p.fit[3],3)}})$,
normalized to sum to 1.  This model is quite a good fit in the sense
that the residual deviance~$\Sexpr{round(jjvoting$fit$deviance,1)}$
is not in the tail region of~$\chi^2_9$, its null distribution; it
can be seen that all~3 interaction parameters are significant and,
for example, $\hat{\theta}_{23}
=\Sexpr{round(theta(jjvoting$MLE)[2,3],3)}
=\exp\left(\Sexpr{round(log(theta(jjvoting$MLE))[2,3],3)}\right)$.
The corresponding conditional cross-ratios are all significantly
greater than~$1$; for example
\begin{equation}\label{conditionalcrossratios}
\frac{\hat{\Prob}_{11hl}\hat{\Prob}_{22hl}}{\hat{\Prob}_{12hl}\hat{\Prob}_{21hl}}
= \frac{1}{\hat{\theta}_{12}^2} = \frac{1}{0.6735116^2} = 2.2045.
\end{equation}

\subsection{Housing satisfaction data}

We now consider a small dataset taken from Table~1
of~\citet{wilson1989}, who analyzed the dataset in the context of
overdispersion (\citeauthor{wilson1989} himself took the dataset from
Table~1 of~\citet{brier1980}).  In a non-metropolitan area, there
were~18 independent neighbourhoods, each of~5 households, and each
household reported its satisfaction with its home.  The allowable
responses were `unsatisfied' (\code{US}), `satisfied' (\code{S}), and
`very satisfied' (\code{VS}).

<<echo=TRUE>>=
data("wilson")
head(non_met)
@

Thus the first neighbourhood had three households responding
\code{US}, two responding \code{S}, and zero responding \code{VS};
the second neighbourhood had the same reporting pattern.  Observe
that the~5 households within a neighbourhood may not be independent
in their responses.

The first step is to recast the dataset into a table format; the
package provides a function \code{gunter()} [named for the
\proglang{R}-lister who suggested the elegant and fast computational
method]:

<<echo=TRUE>>=
wilson <- gunter(non_met)
wilson
@

Thus~1 neighbourhood reported \code{c(5,0,0)}, and~5 neighbourhoods
reported~\code{c(4,1,0)} [because \code{d[1]=1} and
\code{tbl[1,]=c(5,0,0)}; and \code{d[2]=5} and
\code{tbl[2,]=c(4,1,0)} respectively].
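In essence, \code{gunter()} tabulates repeated rows.  A minimal
base-\proglang{R} analogue of this step is shown below (the object
\code{key} is ours; the real function additionally enumerates the
unobserved compositions, which receive a count of zero, and fixes
their ordering):

\begin{Schunk}
\begin{Sinput}
R> key <- apply(non_met, 1, paste, collapse = ",") # one string per neighbourhood
R> table(key)                 # counts of each distinct response pattern
\end{Sinput}
\end{Schunk}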
The hypothesis that the data are drawn from a multinomial
distribution may again be assessed using~\citeauthor{lindsey1992}'s
technique:

<<>>=
wilson$tbl <- as.data.frame(wilson$tbl)
Off <- -rowSums(lfactorial(wilson$tbl))
jj <- glm(wilson$d~-1+(.)+offset(Off), data=wilson$tbl, family=poisson)
@
<<echo=TRUE>>=
attach(wilson)
Off <- -rowSums(lfactorial(tbl))
summary(glm(d~-1+(.)+offset(Off), data=tbl, family=poisson))
@
<<>>=
detach(wilson)
@

Thus the multinomial model is a reasonable fit, in the sense that the
residual deviance of~\Sexpr{round(jj$deviance,3)} %$
is consistent with its null distribution, $\chi^2_{18}$.  The
slightly elevated value arises because the observed frequencies for
neighbourhoods in agreement (that is, either perfect
agreement---\code{c(5,0,0)} or \code{c(0,5,0)} or
\code{c(0,0,5)}---or near-perfect, as in \code{c(4,1,0)}) exceed the
corresponding expected numbers under the simple multinomial model.

The MM parameters may be estimated, again
following~\cite{lindsey1992} but this time admitting first-order
interaction:

<<echo=TRUE>>=
Lindsey(wilson, give_fit=TRUE)
@

Thus in this dataset, only the first interaction parameter,
\code{US:S}, is significant.  This might be interpreted as an absence
of \code{VS} responses coupled with a broader-than-expected split
between \code{US} and \code{S}.  Note that the residual deviance is
now less than the corresponding degrees of freedom.

In this case, the three categories \code{US}, \code{S}, and \code{VS}
are ordered, a feature which is not used in the present approach.  It
is not clear at this stage how best to include information about such
ordering in our analysis.

We now check agreement of the observed and expected sufficient
statistics:

<<echo=TRUE>>=
summary(suffstats(wilson))
@

The \code{summary()} method gives normalized statistics so that, for
example, the \code{row_sums} total~$y=5$.  This may be compared with
the expectation of the maximum likelihood MM distribution:

<<>>=
L <- Lindsey(wilson)
@
<<echo=TRUE>>=
expected_suffstats(L,5)
@

showing agreement to within numerical precision.

\subsection[Mosimann's forest pollen dataset]{\citeauthor{mosimann1962}'s forest pollen dataset}
\label{mfpd}

Palynology offers a unique perspective on palaeoclimate; pollen is
durable, easily identified, and informative about
climate~\citep{faegri1992}.  We now consider a dataset collected in
the context of palaeoclimate~\citep{sears1955,clisby1955}, and
further analyzed by~\cite{mosimann1962}.  The data are taken from the
Bellas Artes core from the Valley of
Mexico~\cite[Table~2]{clisby1955}; details of the site are given
by~\cite{foreman1955}.  The dataset comprises a matrix with~$N=73$
observations, each representing a depth in the core, and~$k=4$
columns, each representing a different type of pollen.  We
follow~\citeauthor{mosimann1962} in assuming that the~$73$
observations are independent, and in restricting the analysis to
depths at which a full complement of~100 grains was identified.

<<echo=TRUE>>=
data("pollen")
pollen <- as.data.frame(pollen)
head(pollen)
@

Thus each row is constructed to sum to~$100$, and there are~$4$
distinct types of pollen; hence in our notation~$y=100$ and~$k=4$.
Observe that this dataset, in common with the housing satisfaction
data considered above, has to be coerced to histogram form; but this
time the numbers are larger.
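Indeed, the size of the resulting histogram is available in closed
form: by the standard stars-and-bars argument, the non-negative
integer solutions of $y_1+y_2+y_3+y_4=100$ number
$\binom{100+4-1}{4-1}=\binom{103}{3}$:

\begin{Schunk}
\begin{Sinput}
R> choose(100 + 4 - 1, 4 - 1)
\end{Sinput}
\begin{Soutput}
[1] 176851
\end{Soutput}
\end{Schunk}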
The \pkg{partitions} package~\citep{hankin2006} uses generating
functions to determine that there are exactly

<<echo=TRUE>>=
library("partitions")
S(rep(100,4),100)
@

possible non-negative integer solutions to~$y_1+y_2+y_3+y_4=100$
(most of these have zero observed count).  Each of these solutions
must be generated, and this is achieved using the
\code{compositions()} function of the \pkg{partitions}
package~\citep{hankin2007}.

<<>>=
options('digits'=3)
@

First we repeat some of~\citeauthor{mosimann1962}'s calculations as a
check.  Using the ordinary multinomial
distribution~$\mathsf{Mn}(y,p)$, we find~$\hat{p}$ to be

<<echo=TRUE>>=
(p.hat <- colSums(pollen)/sum(pollen))
@

The observed sample variances for the counts are

<<echo=TRUE>>=
apply(pollen,2,var)
@

but if the ordinary multinomial model held, we would expect these
variances to be

<<echo=TRUE>>=
100*p.hat*(1-p.hat)
@

respectively.  This shows that the dataset has pronounced
over-dispersion compared to the ordinary multinomial.  Furthermore,
the sample correlation matrix is not what we would expect from the
ordinary multinomial.  As~\citeauthor{mosimann1962} points out, the
sample correlation matrix is

<<echo=TRUE>>=
cor(pollen)
@

while the correlation matrix for the multinomial corresponding
to~$\hat{p}$ is actually

<<echo=TRUE>>=
jj <- -outer(p.hat,p.hat)/sqrt(outer(p.hat,1-p.hat)*outer(1-p.hat,p.hat))
diag(jj) <- 1
jj
@

(We have corrected what seems to be a small typo in the 4th column of
this matrix in \citeauthor{mosimann1962}'s Table~2.)  It is
particularly striking that the data show \emph{positive} correlations
for~3 entries.  Such positive correlations could never arise from the
Dirichlet-multinomial distribution, but they are exactly matched by
our new multiplicative multinomial model.  The full sample covariance
matrix for the dataset is

<<echo=TRUE>>=
var(pollen)
@

which is precisely the covariance of the multiplicative multinomial
distribution at the maximum likelihood (ML) parameters.

<<>>=
jj <- gunter(pollen)
n <- jj$d
y <- jj$tbl
nm <- colnames(pollen)
lind_poll <- Lindsey(pollen, give_fit=TRUE)
fvm <- lind_poll$fit$fitted.values/nrow(pollen)
@

\begin{figure}[htbp]
\begin{center}
<<fig=TRUE>>=
par(mfrow=c(2,2))
par(mai=c(0.2,0.2,0.2,0.2)*2)
f <- function(yin,xlimits,ylimits,pine){
  mfm <- tapply(fvm,yin,sum)
  p1 <- sum(n*yin)/sum(100*n)
  i <- 0:100
  fvb <- dbinom(i, 100, p1)
  matplot(i, cbind(fvb, mfm), xlab="", ylab="probability",
          main=pine, type="b", pch=16, xlim=xlimits, ylim=ylimits)
}
f(y[,1],c(70,100),c(0,0.12),nm[1])
f(y[,2],c( 0, 10),c(0,0.35),nm[2])
legend("topright",lty=1:2,pch=16,col=1:2,legend=c('multinomial','MM'))
f(y[,3],c( 0, 20),c(0,0.14),nm[3])
f(y[,4],c( 0, 10),c(0,0.25),nm[4])
@
\caption{Marginal frequency distributions for numbers of each of four
  pollen types, based on the multinomial distribution (black) and the
  multiplicative multinomial (red).\label{fittedprobs}}
\end{center}
\end{figure}

Calculating the normalizing constant for the MM distribution is
computationally expensive; \code{NormC()} takes over~60 seconds to
execute on a~$2.66\,\mathrm{GHz}$ Intel PC running Linux.  For direct
maximization of the log-likelihood function, for example by \pkg{MM}
function \code{optimizer()}, one would have to call \code{NormC()}
many times.
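The expense is easy to understand: each evaluation of the
normalization constant entails summing the unnormalized mass function
over every cell of the histogram, all of which must first be
enumerated.  A sketch (the name \code{xx} is ours):

\begin{Schunk}
\begin{Sinput}
R> xx <- compositions(100, 4)  # one column per cell of the histogram
R> ncol(xx)                    # number of summands per call to NormC()
\end{Sinput}
\begin{Soutput}
[1] 176851
\end{Soutput}
\end{Schunk}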
Thus function \code{Lindsey()} represents, in this case, a
considerable saving of time in maximizing the log-likelihood (the
call below took under 15 seconds elapsed time):

\begin{Schunk}
\begin{Sinput}
Lindsey(pollen, give_fit=TRUE)
\end{Sinput}
\end{Schunk}
<<>>=
lind_poll
@

Thus we arrive at an apparently rather symmetrical set of parameter
estimates (in the sense of the elements of~$\hat{p}$ being close to
one another, and the elements of~$\hat{\theta}$ being close to
unity).

For this dataset, we observe that the asymptotic distribution of the
residual deviance, $\chi^2_{176841}$, is not a good approximation for
its actual distribution.  This is because the frequency data are
overwhelmingly zeros: only~66 of the~176851 compositions have nonzero
observed frequency.  Function \code{glm()} issues a warning to this
effect.

\subsection{Marketing science: An example of the multivariate
  multiplicative binomial}

We now illustrate the multivariate multiplicative binomial with an
example drawn from the field of marketing science.
\cite{danaher2005} considered a dataset obtained from a sample
of~$N=548$ households over four consecutive store trips.  For each
household, they counted the total number of egg purchases in their
four eligible shopping trips, and the total number of bacon purchases
for the same trips; the hypothesis was that egg consumption was
correlated with bacon consumption.  The dataset is provided with the
\pkg{MM} package:

<<echo=TRUE>>=
data("danaher")
danaher
@
<<>>=
set.seed(1)
jjfish <- fisher.test(as.array(danaher), sim=TRUE, B=1e6)
jjv <- round(jjfish$p.value*1e6,3)
@

Thus~16 households purchased eggs twice and bacon once (this
is~\citeauthor{danaher2005}'s Table~1).  The purchases of eggs and
bacon are not independent\footnote{Fisher's exact test gives a
$p$~value of~$\Sexpr{jjv}\times 10^{-6}$ with~$10^6$ replicates.} and
we suggest fitting these data to the distribution given in
Equation~\ref{fishpat}; here~$m_1=m_2=4$.  The Poisson device of
\citeauthor{lindsey1992} is again applicable:

<<echo=TRUE>>=
fit <- Lindsey_MB(danaher)
summary(fit)
@

and \code{glm()} gives a good fit in the sense that the residual
deviance of~\Sexpr{round(fit$deviance,3)} %$
is compatible with its asymptotic null
distribution~$\chi^2_{\Sexpr{fit$df.residual}}$. %$

<<>>=
jj <- coefficients(fit)[6]
@

The \code{bacon:eggs} coefficient (i.e.,~$\log\hat{\phi} =
\Sexpr{round(jj,4)}$)
gives~$\hat{\phi}=e^{\Sexpr{round(jj,4)}}=\Sexpr{round(exp(jj),4)}$,
showing strong positive association.

<<>>=
rm(n)
@
<<>>=
N <- sum(danaher)
@

We can now verify that the expected (marginal) numbers of egg
purchases and bacon purchases under the ML distribution match the
observed values.  The first step is to create \code{M}, the expected
contingency matrix:

<<echo=TRUE>>=
M <- danaher
M[] <- fitted.values(fit)
M
@

Then we may verify, for example, that the fitted sum of bacon
purchases matches its observed value:

<<>>=
bacon <- slice.index(danaher,1)
eggs  <- slice.index(danaher,2)
@
<<echo=TRUE>>=
sum(bacon*danaher)  # Observed number of bacon purchases
sum(bacon*M)        # Expectation; matches
@

As a final check, we can verify that the sample covariance matches
the distribution's covariance at the MLE:

<<echo=TRUE>>=
sum(bacon*eggs*danaher)/N - sum(bacon*danaher)*sum(eggs*danaher)/N^2
sum(bacon*eggs*M)/N       - sum(bacon*M)*sum(eggs*M)/N^2
@

again showing agreement to within numerical precision.
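A further check on the interpretation of~$\phi$ is available: taking
logarithms in Equation~\ref{fishpat}, every term except
$x_1x_2\log\phi$ depends on $x_1$ or $x_2$ alone, and so cancels from
any local log cross-ratio of the fitted table; each such cross-ratio
therefore equals $\log\hat{\phi}$ exactly.  A sketch (any adjacent
$2\times 2$ block of \code{M} gives the same value):

\begin{Schunk}
\begin{Sinput}
R> log(M[1,1]*M[2,2]/(M[1,2]*M[2,1]))  # local log cross-ratio of the fit
R> coefficients(fit)["bacon:eggs"]     # equals log(phi-hat)
\end{Sinput}
\end{Schunk}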
\section{Suggestions for further work}

The multiplicative multinomial is readily generalized to a
distribution with~$2^k-1$ parameters:
\begin{equation}\label{mm_gen}
\Prob\left(y_1,\ldots,y_k\right)=
{y\choose y_1\ldots y_k}
\prod_{\mathcal{S}\subseteq\left[k\right]}
\left(\Theta_\mathcal{S}\right)^{\prod_{i\in\mathcal S}y_i}
\end{equation}
where~$[k]=\left\{1,2,\ldots,k\right\}$ is the set of all strictly
positive integers not exceeding~$k$.  Here, the parameters are
indexed by the subsets of~$[k]$; it is interesting to note
that~$\Theta_\varnothing$ formally represents the normalization
constant~$C$.  In this notation, Equation~\ref{mult_mult} becomes
\begin{equation}\label{mm_gen2}
{y\choose y_1\ldots y_k}
\prod_{\substack{\mathcal{S}\subseteq\left[k\right]\\
    \left|\mathcal{S}\right|\leqslant 2}}
\left(\Theta_\mathcal{S}\right)^{\prod_{i\in\mathcal S}y_i}.
\end{equation}
Equation~\ref{mm_gen} leads to a distribution of the exponential
family type, but interpretation of the parameters is difficult, and
further work would be needed to establish the usefulness of this
extension.

Further, Equation~\ref{fishpat} generalizes to
\begin{equation}\label{bingen}
\prod_{i=1}^t
{m_i\choose x_i\, z_i}p_i^{x_i}q_i^{z_i}\theta_i^{x_iz_i}
\prod_{i<j} \phi_{ij}^{x_ix_j}
\end{equation}
where $x_i+z_i=m_i$ and $p_i+q_i=1$ for $1\leqslant i\leqslant t$,
and the parameters~$\phi_{ij}>0$ control the pairwise dependence
between~$X_i$ and~$X_j$.

\end{document}