# BLE_Categorical

library(BayesSampling)

# Application of the BLE to categorical data

### (From Section 4 of “Gonçalves, Moura and Migon: Bayes linear estimation for finite population with emphasis on categorical data”)

When the population can be divided into distinct, mutually exclusive categories, we can calculate the Bayes Linear Estimator for the proportion of individuals in each category with the BLE_Categorical() function, which receives the following parameters:

• $$y_s$$ - $$k$$-vector of sample proportions, one for each category;
• $$n$$ - sample size;
• $$N$$ - total size of the population;
• $$m$$ - $$k$$-vector with the prior proportion of each category. If NULL, the sample proportion of each category will be used (non-informative prior);
• $$rho$$ - matrix with the prior correlation coefficients between two different units within categories. It must be a symmetric square matrix of dimension $$k$$ (or $$k-1$$). If NULL, non-informative prior will be used (see below).
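
The inputs listed above can be prepared from raw category counts. The following is a minimal sketch in base R. The counts are hypothetical (not taken from the article), and the checks only mirror the requirements stated in the parameter list; they are not the package's own validation.

```r
# Hypothetical category counts observed in a sample of 100 units.
counts <- c(A = 20, B = 50, C = 30)

n  <- sum(counts)               # sample size
ys <- as.numeric(counts / n)    # k-vector of sample proportions
k  <- length(ys)                # number of categories

# Prior proportions: one value per category, summing to 1.
m <- c(0.4, 0.1, 0.5)

# Prior correlation matrix: symmetric and square, here of dimension k.
rho <- matrix(0.1, nrow = k, ncol = k)
diag(rho) <- c(0.4, 0.2, 0.6)

# Sanity checks reflecting the parameter descriptions above.
stopifnot(
  length(m) == k,
  isTRUE(all.equal(sum(ys), 1)),
  isTRUE(all.equal(sum(m), 1)),
  isSymmetric(rho),
  nrow(rho) %in% c(k, k - 1)
)
```

If any check fails, `stopifnot()` raises an error naming the failing condition, which makes it easier to locate a malformed input before calling BLE_Categorical().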

### Vague Prior Distribution

Letting $$\rho_{ii} \to 1$$, that is, assuming prior ignorance, the resulting point estimate will be the same as the one seen in the design-based context for categorical data.

This can be achieved with the BLE_Categorical() function by omitting the prior proportions, the parameter rho, or both, that is:

• $$m =$$ NULL - sample proportions in each category will be used
• $$rho =$$ NULL - $$\rho_{ii} \to 1$$ and $$\rho_{ij} = 0, i \neq j$$
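
As a rough illustration of the vague prior, the sketch below passes NULL for both m and rho and prints the estimate next to the sample proportions. The input values are illustrative. The call is guarded so that it only runs when the BayesSampling package is installed, and the function may emit messages or warnings about the non-informative prior.

```r
# Illustrative inputs: sample proportions, sample size, population size.
ys <- c(0.2, 0.5, 0.3)
n  <- 100
N  <- 10000

# Run the estimator only if the package is available (guard is an assumption
# made for this sketch, not something the package requires).
if (requireNamespace("BayesSampling", quietly = TRUE)) {
  vague <- BayesSampling::BLE_Categorical(ys, n, N, m = NULL, rho = NULL)

  # With m = NULL the prior proportions are the sample proportions, so the
  # point estimate is expected to agree with ys.
  print(vague$est.prop)
  print(ys)
}
```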

### R and Vs Matrices

If the calculation of matrices R and Vs results in non-positive definite matrices, a warning will be displayed. In general, this does not affect the proportion estimates, but it can produce incorrect or inconsistent values for their associated variance. In that case, review the prior correlation coefficients (parameter rho).
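
One way to review a matrix for positive definiteness is to inspect its eigenvalues. The sketch below uses a hypothetical helper, `is_positive_definite()`, which is not part of BayesSampling; it is applied to an example rho matrix and to a deliberately non-positive definite matrix. The tolerance value is an arbitrary choice.

```r
# Hypothetical helper (not part of BayesSampling): TRUE when a symmetric
# matrix has only strictly positive eigenvalues.
is_positive_definite <- function(mat, tol = 1e-8) {
  if (!isSymmetric(mat)) {
    return(FALSE)
  }
  eig <- eigen(mat, symmetric = TRUE, only.values = TRUE)$values
  all(eig > tol)
}

# A symmetric 3 x 3 prior correlation matrix.
rho_ok <- matrix(c(0.4, 0.1, 0.1,
                   0.1, 0.2, 0.1,
                   0.1, 0.1, 0.6), nrow = 3, byrow = TRUE)
is_positive_definite(rho_ok)   # TRUE

# A symmetric matrix with eigenvalues 3 and -1.
not_pd <- matrix(c(1, 2,
                   2, 1), nrow = 2, byrow = TRUE)
is_positive_definite(not_pd)   # FALSE
```

The same helper can be applied to any symmetric matrix of interest, so it may help when adjusting rho after the warning appears.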

### Examples

1. Example presented in the article cited above (2 categories)
ys <- c(0.2614, 0.7386)
n <- 153
N <- 15288
m <- c(0.7, 0.3)
rho <- matrix(0.1, 1)
Estimator <- BLE_Categorical(ys,n,N,m,rho)

Estimator$est.prop
#> [1] 0.2855228 0.7144772
Estimator$Vest.prop
#>              [,1]         [,2]
#> [1,]  0.001155671 -0.001155671
#> [2,] -0.001155671  0.001155671

Below we can see that the greater the correlation coefficient, the closer the estimate gets to the sample proportions.

ys <- c(0.2614, 0.7386)
n <- 153
N <- 15288
m <- c(0.7, 0.3)
rho <- matrix(0.5, 1)
Estimator <- BLE_Categorical(ys,n,N,m,rho)

Estimator$est.prop
#> [1] 0.2642195 0.7357805
Estimator$Vest.prop
#>               [,1]          [,2]
#> [1,]  0.0006750388 -0.0006750388
#> [2,] -0.0006750388  0.0006750388
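
The effect of the correlation coefficient can be quantified from the two runs above. The sketch below computes, for the first category, how far the estimate has moved from the prior proportion toward the sample proportion. The helper `implied_weight()` is hypothetical and simply rearranges the reported numbers; it is not a formula taken from the package.

```r
ys <- c(0.2614, 0.7386)   # sample proportions
m  <- c(0.7, 0.3)         # prior proportions

# Point estimates for the first category, as reported in the two runs above.
est_rho_01 <- 0.2855228   # rho = 0.1
est_rho_05 <- 0.2642195   # rho = 0.5

# Hypothetical helper: fraction of the distance from the prior proportion
# to the sample proportion covered by the estimate (0 = prior, 1 = sample).
implied_weight <- function(est, prior, sample_prop) {
  (est - prior) / (sample_prop - prior)
}

w <- c(rho_0.1 = implied_weight(est_rho_01, m[1], ys[1]),
       rho_0.5 = implied_weight(est_rho_05, m[1], ys[1]))
round(w, 4)
```

The weight is about 0.945 for rho = 0.1 and about 0.994 for rho = 0.5, so the larger coefficient leaves the estimate almost entirely determined by the sample.
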

2. Example from the help page (3 categories)
ys <- c(0.2, 0.5, 0.3)
n <- 100
N <- 10000
m <- c(0.4, 0.1, 0.5)
mat <- c(0.4, 0.1, 0.1, 0.1, 0.2, 0.1, 0.1, 0.1, 0.6)
rho <- matrix(mat, 3, 3)

Estimator <- BLE_Categorical(ys,n,N,m,rho)

Estimator$est.prop
#> [1] 0.2221967 0.4785131 0.2992902
Estimator$Vest.prop
#>               [,1]          [,2]          [,3]
#> [1,]  0.0013711226 -0.0004980297 -0.0008730929
#> [2,] -0.0004980297  0.0006722052 -0.0001741755
#> [3,] -0.0008730929 -0.0001741755  0.0010472684

The same example, but with no prior correlation coefficients specified (non-informative prior):

ys <- c(0.2, 0.5, 0.3)
n <- 100
N <- 10000
m <- c(0.4, 0.1, 0.5)

Estimator <- BLE_Categorical(ys,n,N,m,rho=NULL)
#> parameter 'rho' not informed, non informative prior correlation coefficients used in estimations
#> Warning in BLE_Categorical(ys, n, N, m, rho = NULL): 'Vest.prop' should have
#> only positive diagonal values. Review prior specification and verify calculated
#> matrices 'R' and 'Vs'.

Estimator$est.prop
#> [1] 0.2017585 0.4996729 0.2985685