---
title: "Sequential testing of hypotheses about population density with the `sequential.pops` package"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Seq-vignette}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
set.seed(000)
```

## Introduction

The `sequential.pops` package is designed to simplify the development, evaluation and analysis of sequential designs for testing hypotheses about population density. Sequential analyses are particularly useful for decision-making in situations where estimation is not the goal, but where critical actions should be triggered if population sizes reach, exceed or fall below predefined levels. These situations include deciding whether a pest control action should be applied, whether an area should be declared free from an invasive species, or whether early capture numbers of a pest are consistent with an outbreak. Sequential analyses can also be useful in fields outside ecology and pest management, such as quality control, fraud detection and adaptive clinical trials, but those applications are outside the scope of this package.

Sequential data arrive in sampling bouts over time and are processed as they come, in order to evaluate hypotheses on which practical decisions are made once sufficient evidence has been collected. Sequential designs are typically more cost-efficient than conventional fixed-sample-size approaches, and can be purely sequential (one-at-a-time) or group sequential (`n > 1` per sampling bout). The analysis of biological population densities is often carried out by collecting count data from traps or field observations, or binomial records from structured clusters of sampling units. The `sequential.pops` package focuses on count and binomial data, with or without overdispersion, to test hypotheses about non-negative population sizes.

There are other R packages that deal with sequential designs. Most existing packages, such as `GroupSeq`, `Grpseq`, `gsbDesign` and `seqmon`, focus on adaptive clinical trials for normally distributed variables. Others, such as `gscounts`, `gsDesign`, `ASSISTant`, `Sequential` and `BayesOrdDesign`, can take non-normal variables but are also focused on clinical trials and can hardly be adapted to process count or binomial data from ecological or agricultural settings and evaluate hypotheses about population densities. Other packages with a more general scope, such as `MSPRT` and `sprtt`, include a limited selection of probability distributions and lack functions to evaluate tests' operating characteristics. The `sequential.pops` package is the first R package devoted to testing hypotheses about biological population densities. Two approaches are included: the Sequential Test of Bayesian Posterior Probabilities (STBP) (Rincon et al. 2025), and the Sequential Probability Ratio Test (SPRT) (Wald 1945). The STBP and the SPRT each have their own pros and cons, but they are the most popular and state-of-the-art approaches for sequential analyses in ecology and pest management. This tutorial walks you through the development, evaluation and analysis of sequential designs with STBP and SPRT using the `sequential.pops` package.

## 1. The Sequential Test of Bayesian Posterior Probabilities

This test is specifically designed to test hypotheses about population densities, although it may have applications in other fields. It works by sequentially updating the conditional probability of the hypothesis being tested as new data are processed.
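In schematic form, the test is Bayes' rule applied recursively to H and its complement $\bar{\mathrm{H}}$. If $P_{t-1}$ is the probability assigned to H before bout $t$, and $x_t$ is the data collected in that bout, then

$$
P_t = \frac{P(x_t \mid \mathrm{H})\,P_{t-1}}{P(x_t \mid \mathrm{H})\,P_{t-1} + P(x_t \mid \bar{\mathrm{H}})\,(1 - P_{t-1})},
$$

where the likelihoods under the composite hypotheses are obtained by marginalizing over the relevant regions of the parameter space (see Rincon et al. 2025 for the formal development). The posterior from one bout becomes the prior for the next, and sampling stops once $P_t$ crosses a pre-set decision criterion.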
The two main benefits of the STBP are that it is distribution-agnostic (it works with virtually any probability distribution), and that it is one of the few sequential analyses that are "likelihood-ratio-free", meaning that it can test a single hypothesis without having to specify a non-complementary alternative. As we'll see later, most sequential tests, such as the SPRT (section 2), require two hypotheses, rather than one, because they are based on likelihood ratios. Another advantage of the STBP is that it can process purely sequential (one-at-a-time) or group sequential (in batches of samples) data, balanced or unbalanced, and static (single-value) or dynamic (population-model) hypotheses without major modifications. The main disadvantage of the STBP is that it is so new that it has not been as thoroughly tested as its counterparts in the rough streets of real datasets. The STBP is also quite sensitive to sample size within sampling bouts. In theory, it outperforms other sequential analyses as long as several samples are collected within each sampling bout, but the type I error rate (accepting H when it is false) deteriorates in purely sequential designs.

### Example 1: Testing species absence

Let's start with a simple case in which one wants to test the hypothesis that a certain species is absent from a sampled area. As Carl Sagan said, "the absence of evidence is not evidence of absence". In other words, we can never be sure about the absence of a species in an area, but we can calculate the *probability* of the species being absent *given* the collected data. Most small populations, like those of recently introduced species, are well described by Poisson distributions. So, in this case, we want to test H: `mu = 0`, assuming random sampling and counts following a Poisson distribution. If we are lucky and can collect, say, 30 random samples each time, and they all result in absences (zeros), we should be able to track the (posterior) probability of H being true after each sampling bout of 30.

The data structure in `sequential.pops` for counts is based on matrices, where columns represent time (sampling bouts) and rows represent samples within bouts. So, for example, if we have collected 4 sampling bouts each with 30 samples, the data should be arranged in a `30 x 4` matrix. In this case, we have all zeros:

```{r setup}
a30 <- matrix(rep(0, 120), 30, 4)
head(a30, 3) # only first three rows out of 30 are shown
```

To run the test, we use `stbp_simple`, which is specially designed to test hypotheses about species absence, so we do not need to specify the hypothesis. We do need to provide the matrix with the `data`, the distribution family (as a character string) in `density_func`, the initial `prior` probability (often 0.5), and `upper_criterion` and `lower_criterion`, the upper and lower criteria to stop sampling and make a decision. The argument `upper_bnd` is almost always `Inf`, since it represents the upper limit of the parameter space for `mu`.

```{r}
library(sequential.pops)

test30 <- stbp_simple(data = a30,
                      density_func = "poisson",
                      prior = 0.5,
                      upper_bnd = Inf,
                      lower_criterion = 0,
                      upper_criterion = 0.9999)
test30
```

Given that all samples are zero and assuming a completely efficient detection rate, we can say that after 3 sampling bouts of 30 samples there is a pretty high chance that H: `mu = 0` is true.
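As a side note, a single detection is enough to collapse the posterior for H: `mu = 0` to zero, since a nonzero count is impossible under a Poisson with `mu = 0`. A minimal sketch with hypothetical data (one individual found in the third bout):

```{r}
a_det <- a30
a_det[5, 3] <- 1 # hypothetical: one individual detected in bout 3

test_det <- stbp_simple(data = a_det,
                        density_func = "poisson",
                        prior = 0.5,
                        upper_bnd = Inf,
                        lower_criterion = 0,
                        upper_criterion = 0.9999)
test_det
```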
How far apart in time sampling bouts should be depends on the species being sampled and on what we consider a *different* sampling time. Notice that `upper_criterion` and `lower_criterion` are set close to 1 and 0, respectively. In reality, to test species absence, only `upper_criterion` matters for minimizing type II error, since, as illustrated above, the posterior probability of H becomes 0 as soon as we get the first detection (`data > 0`).

If sample size within sampling bouts is reduced, we should expect to lose power and require more bouts to collect enough evidence to achieve similar levels of certainty that the species is absent (in case we keep getting zeros, of course). So, if we can only collect 3 samples per sampling bout, and we get all zeros, the matrix with the data should be specified as:

```{r}
a3 <- matrix(rep(0, 30), 3, 10)
a3
```

for 10 sampling bouts. After running the test with:

```{r}
test3 <- stbp_simple(data = a3,
                     density_func = "poisson",
                     prior = 0.5,
                     upper_bnd = Inf,
                     lower_criterion = 0,
                     upper_criterion = 0.9999)
test3
```

we can see that we now require 9 sampling bouts to achieve similar levels of confidence to declare that our species is absent from the sampled area. The number of all-zero bouts required to accept H keeps increasing as sample size within bouts is reduced. In fact, when sample size is one (purely sequential), the test loses power and the posterior remains unchanged regardless of the number of bouts. You can check the progression of the posterior probabilities for different sample sizes by calling `plot`. For the sequential design with 30 samples:

```{r}
plot(test30)
```

and for the sequential design with 3 samples:

```{r}
plot(test3)
```

### Example 2: Testing if a population is above a static threshold

The STBP can also assist in situations where one wants to test if a given population is above or below a relevant threshold for management. This is useful to determine, for example, if a pest population has reached an economic threshold or if an endangered species is below the Allee threshold. For the remainder of this tutorial, we will focus on hypotheses about populations being *above* a threshold, but the procedure is similar for hypotheses in the opposite direction. In fact, the direction is specified through a single argument in the function `stbp_composite`.

For this example, we will use the case study of the tomato leafminer in greenhouse tomatoes (Rincon et al. 2021). The tomato leafminer is a significant pest of tomatoes, but it mostly prefers to feed on leaves, where damage is minimal, and will only turn to fruits when population density increases. The economic threshold varies depending on whether tomato fruits are present or not, but for now let's assume the threshold is static (constant) at 9 larvae per plant, and that we want to test through sequential sampling if a population has exceeded that threshold.

When population densities are not so small, chances are that the Poisson distribution is no longer appropriate to describe counts and that a probability distribution that allows for overdispersion (or aggregation) is required. As populations increase, very often they aggregate in clusters, which increases the dispersion of the data to levels that are not accounted for by the Poisson. This is the case for the tomato leafminer, so the STBP should be defined with a negative binomial distribution and some specification for the dispersion, which is obtained from studies of the counts observed in the field. For more details about how to conduct these studies, the reader is referred to textbooks like Binns et al. (2000).
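As a refresher (standard negative binomial algebra, not specific to this package), the negative binomial variance is $\sigma^2 = \mu + \mu^2/k$, so the dispersion parameter can always be recovered from a variance-mean relationship as

$$
k = \frac{\mu^2}{\sigma^2 - \mu}.
$$

Plugging in an empirical variance-mean model, such as Taylor's Power Law ($\sigma^2 = a\mu^b$), yields the form used in the `estimate_k` function defined below.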
Let's start by generating some negative binomial data in a purely sequential design (`n = 1` per sampling bout):

```{r}
counts1 <- rnbinom(20, mu = 5, size = 4.5)
counts1
```

These are 20 random counts from a population with a density of 5 larvae per sampling unit (plant), following a negative binomial distribution with an overdispersion parameter of 4.5. We can now build a test for the hypothesis H: `mu > 9` larvae per plant with:

```{r}
test_counts1 <- stbp_composite(data = counts1,
                               greater_than = TRUE,
                               hypothesis = 9,
                               density_func = "negative binomial",
                               overdispersion = 4.5,
                               prior = 0.5,
                               lower_bnd = 0,
                               upper_bnd = Inf,
                               lower_criterion = 0.01,
                               upper_criterion = 0.99)
test_counts1
```

Notice that this time we use the function `stbp_composite`, which is designed to test threshold-style hypotheses. The direction of the hypothesis (whether it is "greater than" or "less than") is controlled by the argument `greater_than`, and the threshold is included in the argument `hypothesis`. `lower_bnd` and `upper_bnd` are the parameter space boundaries for the hypothesis, `mu`, and are almost always 0 and `Inf`, respectively. `lower_criterion` and `upper_criterion` are the posterior probabilities below or above which a decision is made.

Many times, however, overdispersion is not this simple and should be specified as a function of the mean, rather than as a constant. This is the case for most animal and plant populations, because overdispersion is tied to the variance, and the variance of population counts tends to increase as a power of the mean. There are several ways to empirically describe variance (and overdispersion) as a function of the mean, but the most popular one is Taylor's Power Law. In the `sequential.pops` package, you can use whatever function you want; it just must be specified before running the test. For example, the Taylor's Power Law parameters that describe the variance-mean relationship of tomato leafminer counts are `a = 1.83` and `b = 1.21`, so overdispersion, `k`, is given by:

```{r}
estimate_k <- function(mean) {
  a = 1.83
  b = 1.21
  (mean^2) / ((a * mean^(b)) - mean)
}
```

We can also incorporate more realistic values of overdispersion into the generated counts by including `estimate_k` in the count generation function:

```{r}
counts1a <- rnbinom(20, mu = 5, size = estimate_k(5))
counts1a
```

and the test can be specified as:

```{r}
test_counts1a <- stbp_composite(data = counts1a,
                                greater_than = TRUE,
                                hypothesis = 9,
                                density_func = "negative binomial",
                                overdispersion = "estimate_k",
                                prior = 0.5,
                                lower_bnd = 0,
                                upper_bnd = Inf,
                                lower_criterion = 0.01,
                                upper_criterion = 0.99)
test_counts1a
```

The overdispersion function is specified as a character string with the function name, and the function must be present in the R environment before running the test. Although it is not the case for this instance of randomly generated numbers, there are noticeable gains in efficiency (a lower average number of sampling bouts to make a decision) and accuracy (fewer bad decisions) when an appropriate overdispersion function is included as part of a STBP (Rincon et al. 2025).
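To get a feel for how strongly the dispersion parameter depends on density under this model, we can evaluate `estimate_k` over a few means (a quick illustrative check, not part of the test itself):

```{r}
# k under Taylor's Power Law for densities of 1, 5, 9 and 15 larvae per plant
sapply(c(1, 5, 9, 15), estimate_k)
```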
As with `stbp_simple`, you can call `plot` to visualize the change in posterior probabilities across sequential sampling bouts for `stbp_composite` tests. For the test with a constant overdispersion:

```{r}
plot(test_counts1)
```

and with overdispersion as a function of the mean:

```{r}
plot(test_counts1a)
```

Sometimes data on populations do not come as counts but as binary outcomes, when sampling units are evaluated as presence/absence or infested/uninfested, for example. For those cases, `stbp_composite` is equipped with `binomial` and `beta-binomial` options for the argument `density_func`. As with count data, the selection of one or the other depends on whether there is overdispersion in the data, with `beta-binomial` being the option for overdispersed binomial variables.

For this kind of data, two pieces of information are required from each sampling unit: the number of successes (e.g., present or infested) and the total number of samples from which the observations were made. For this reason, data for binomial variables should be structured as a `list()` of `n x 2` matrices, where each matrix corresponds to a sampling bout, and `n` is the number of sampling units (clusters of samples examined). The number of successes must be included in column 1 and the number of samples in column 2 of each matrix. For example, if a sequential sampling is designed to collect 3 clusters (sampling units), each made of 10 samples, and 7 sampling bouts have been collected, the data should be introduced like:

```{r}
library(emdbook) # for rbetabinom()

binom1 <- list()
for(i in 1:7) {
  binom1[[i]] <- matrix(c(emdbook::rbetabinom(3, prob = 0.2,
                                              size = 10,
                                              theta = 6.5),
                          rep(10, 3)), 3, 2)
}
head(binom1, 3) # only first three sampling bouts out of seven are shown.
```

The data generated in the list `binom1` are overdispersed, with a constant overdispersion parameter, `theta = 6.5`, and a mean incidence of `prob = 0.2`. All the required support to work with the beta-binomial distribution is in the `emdbook` package, so it must be loaded before running any sequential tests that involve this distribution. The STBP can be run on the data `binom1` with:

```{r}
test_bin1 <- stbp_composite(data = binom1,
                            greater_than = TRUE,
                            hypothesis = 0.15,
                            density_func = "beta-binomial",
                            overdispersion = 6.5,
                            prior = 0.5,
                            lower_bnd = 0,
                            upper_bnd = 1,
                            lower_criterion = 0.001,
                            upper_criterion = 0.999)
test_bin1
```

where the STBP correctly infers that the data in `binom1` are consistent with H: `mu > 0.15`.

Like overdispersed count data, overdispersion for binomial data can also be included as a function of the mean proportion (sometimes called incidence). Again, the overdispersion model is derived from the variance-mean (variance-incidence) relationship, and the most popular form is the one provided by Madden et al. (2007), although the `sequential.pops` package allows you to pick whatever function you want. The function that describes overdispersion, `theta`, as a function of the mean proportion for the tomato leafminer is defined as (Madden et al. 2007):

```{r}
estimate_theta <- function(mean) {
  A = 780.72
  b = 1.61
  R = 10
  (1 / (R - 1)) * (((A * R^(1 - b)) / ((mean * (1 - mean))^(1 - b))) - 1)
}
```

where `A` and `b` are parameters from the variance-incidence model and `R` is the number of samples in each sampling unit (cluster). We can generate a new binomial dataset that includes this form of overdispersion:

```{r}
binom2 <- list()
for(i in 1:7) {
  binom2[[i]] <- matrix(c(rbetabinom(3, prob = 0.2,
                                     size = 10,
                                     theta = estimate_theta(0.2)),
                          rep(10, 3)), 3, 2)
}
head(binom2, 3) # only first three sampling bouts out of seven are shown.
```

and then we can specify and run a STBP as:

```{r}
test_bin2 <- stbp_composite(data = binom2,
                            greater_than = TRUE,
                            hypothesis = 0.15,
                            density_func = "beta-binomial",
                            overdispersion = "estimate_theta",
                            prior = 0.5,
                            lower_bnd = 0,
                            upper_bnd = 1,
                            lower_criterion = 0.001,
                            upper_criterion = 0.999)
test_bin2
```

Notice that the argument `upper_bnd` in the tests `test_bin1` and `test_bin2` is set to 1, rather than `Inf` as in the tests that processed counts, `test_counts1` and `test_counts1a`. This is because the parameter space of population density, `mu`, for binomial data is `[0, 1]`, whereas it is `[0, Inf]` for count data.

#### Test evaluation

The `sequential.pops` package is equipped with tools to evaluate the accuracy and efficiency of specifications for the STBP. For a perfectly accurate sequential test, the acceptance rate of H: `mu > psi` is zero when the true population density is below `psi` and 1 when it is above `psi`, with a minimum (efficient) average number of samples. Thus, the evaluation of a STBP involves analyzing the proportion of times H is accepted across a range of true population densities, along with the corresponding number of sampling bouts required to make a decision. These analyses are sometimes called "operating characteristics". All you need to run an evaluation of a STBP is a `STBP` object, which is created every time a test is run through `stbp_composite`, an evaluation range, which is a sequence of population densities over which the evaluation is to be performed, and the sample size within bouts. You can add additional levels of sophistication by playing with the initial prior probability or by adding stochasticity to the overdispersion parameter, in case you are using one.

To demonstrate the evaluation of STBP we will use the `STBP` objects created above. The simplest one is `test_counts1`, a purely sequential design that tests H: `mu > 9` with a constant overdispersion of `k = 4.5`. For the evaluation, the only relevant information extracted from `test_counts1` is the test specification; the data are not really used. To run the evaluation of `test_counts1` we call the function `STBP.eval`:

```{r}
eval1 <- STBP.eval(test_counts1,
                   eval.range = seq(3, 12),
                   n = 1,
                   prior = 0.5,
                   N = 20)
eval1
```

Notice that the evaluation range of true population densities (`eval.range`) includes `psi`, the threshold (hypothesis), because we are particularly interested in examining the behavior of the test when the population size is close to the threshold. Note also that the purely sequential approach was kept for this analysis (`n = 1`), but we could also try other sample sizes. The argument `N` is the number of simulations run for each true population size. In this case it is 20, so a total of 200 simulations were run (`20 * length(seq(3, 12))`), which is pretty small for a formal analysis. You should run at least 1000 simulations per population size in a more formal test. The output of `STBP.eval` is a `list()` with two vectors: `$AvgSamples` and `$AcceptRate`. Both have as many elements as `length(eval.range)`; the first is the mean number of sampling bouts required to make a decision about H, and the second is the proportion of simulation runs that resulted in acceptance of H.
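For a quick tabular view, the two output vectors can be combined with the evaluated densities using plain base R (just for inspection; this is not a package feature):

```{r}
data.frame(true_density = seq(3, 12),
           avg_bouts = eval1$AvgSamples,
           accept_rate = eval1$AcceptRate)
```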
Both vectors can be visualized across the evaluated range of true population densities to see the test performance around the threshold of 9:

```{r}
plot(seq(3, 12), eval1$AvgSamples, type = "o",
     xlab = "True population size", ylab = "Average number of bouts")
abline(v = 9, lty = 2)

plot(seq(3, 12), eval1$AcceptRate, type = "o",
     xlab = "True population size", ylab = "Acceptance rate of H")
abline(v = 9, lty = 2)
```

There is an argument in the `STBP.eval` function called `overdispersion.sim` that was omitted in `eval1`. Through `overdispersion.sim`, you can specify overdispersion functions or values different from those specified in the test. So, in `eval1`, the overdispersion used to run the simulations was the same as in the test `test_counts1`. However, more realistic test evaluations can be performed if (1) varying overdispersion is included and (2) stochasticity in the variance-mean relationship is considered. The first can either be specified in the model (as in `test_counts1a` or in `test_bin2`) or directly in `STBP.eval` through `overdispersion.sim`, in case it is not in the model. To add stochasticity, however, you should declare a new overdispersion function with allowance for variability. This new function is similar to the one used for the models but includes a random normal variable with `mean = 0` and standard deviation `sd` as a new factor. Conventionally, `sd` is the square root of the mean square error from the regression used to fit the variance-mean model: Taylor's Power Law for count data, and Madden's model for binomial data (Binns et al. 2000). For example, if we want to add stochasticity to the evaluation of model `test_counts1a`, we should incorporate a normal random variable in the function `estimate_k` by:

```{r}
estimate_k_stoch <- function(mean) {
  a = 1.83
  b = 1.21
  RMSE = 0.32
  (mean^2) /
    ((a * mean^(b) * exp(truncdist::rtrunc(1, "norm",
                                           a = log(1 / (a * mean^(b - 1))),
                                           b = Inf,
                                           mean = 0,
                                           sd = RMSE))) - mean)
}
```

Notice that `estimate_k_stoch` is similar to `estimate_k`, except that the former includes the exponential of a normal random variable as a factor in the denominator. Here we use a truncated normal distribution, from the `truncdist` package, to prevent zero denominators or negative values for `k`: the lower truncation point, `log(1 / (a * mean^(b - 1)))`, is exactly the value at which the denominator would hit zero, so draws above it keep `k` positive. You can also use `rnorm` with a different safeguard to stay away from undefined or negative values of the overdispersion parameter. To run the evaluation including stochasticity, we just specify `estimate_k_stoch` as a character string in the argument `overdispersion.sim`:

```{r}
eval1a <- STBP.eval(test_counts1,
                    eval.range = seq(3, 12),
                    n = 1,
                    prior = 0.5,
                    overdispersion.sim = "estimate_k_stoch",
                    N = 20)
eval1a
```

Test evaluation is a useful tool to compare different sequential analysis approaches, but it can also be used to calibrate sample sizes and decision criteria, or to assess the impact of (incorrect) initial priors. For example, you can compare both the accuracy and the efficiency of tests with different sample sizes within bouts:

```{r}
eval3a <- STBP.eval(test_counts1,
                    eval.range = seq(3, 12),
                    n = 3,
                    prior = 0.5,
                    overdispersion.sim = "estimate_k_stoch",
                    N = 20)

eval6a <- STBP.eval(test_counts1,
                    eval.range = seq(3, 12),
                    n = 6,
                    prior = 0.5,
                    overdispersion.sim = "estimate_k_stoch",
                    N = 20)
```

Here `eval3a` includes not 1 but 3 samples per sampling bout, and `eval6a` includes 6 samples.
We can now visualize how sample size within bouts affects both accuracy and efficiency:

```{r}
plot(seq(3, 12), eval1a$AvgSamples, type = "o",
     xlab = "True population size", ylab = "Average number of bouts")
points(seq(3, 12), eval3a$AvgSamples, type = "o", pch = 19)
points(seq(3, 12), eval6a$AvgSamples, type = "o", pch = 0)
abline(v = 9, lty = 2)

plot(seq(3, 12), eval1a$AcceptRate, type = "o",
     xlab = "True population size", ylab = "Acceptance rate of H")
points(seq(3, 12), eval3a$AcceptRate, type = "o", pch = 19)
points(seq(3, 12), eval6a$AcceptRate, type = "o", pch = 0)
abline(v = 9, lty = 2)
```

This very limited analysis (only 20 simulations per population density) shows that sample sizes of 3 (filled circles) and 6 (empty squares) outperform a purely sequential design (empty circles) in both accuracy and efficiency. Some additional gains in accuracy can be achieved by increasing sample size from 3 to 6.

Test evaluation is similar for binomial data, except that the evaluation range is on the `[0, 1]` interval. The analog of `estimate_k_stoch` for the binomial world, for the tomato leafminer example, is defined as (Binns et al. 2000):

```{r}
estimate_theta_stoch <- function(mean) {
  A = 780.72
  b = 1.61
  R = 10
  RMSE = 0.41
  ((1 / (R - 1)) *
     (((A * R^(1 - b) * exp(rnorm(1, mean = 0, sd = RMSE))) /
         ((mean * (1 - mean))^(1 - b))) - 1))
}
```

Now we can run a similar analysis to compare sample sizes for a binomial design. One important difference between binomial and count sampling is that binomial designs typically require more samples, although each sample is quicker and easier to examine. Simulations within `STBP.eval` can only consider binomial sampling with sampling units or clusters made of a single observation (`size = 1`). So, within `STBP.eval`, `n = 1` means a single presence/absence (or infested/uninfested) observation. To compensate, the argument `n` in `STBP.eval` should reflect the total number of observations across all the clusters. For example, if you plan a binomial sequential sampling with three clusters, each of 10 observations (samples), the sample size for evaluation should be set to 30, `n = 30`. Here we test 10, 20 and 30 total observations:

```{r}
evalB10 <- STBP.eval(test_bin2,
                     eval.range = seq(0.01, 0.25, 0.02),
                     n = 10,
                     prior = 0.5,
                     overdispersion.sim = "estimate_theta_stoch",
                     N = 20)

evalB20 <- STBP.eval(test_bin2,
                     eval.range = seq(0.01, 0.25, 0.02),
                     n = 20,
                     prior = 0.5,
                     overdispersion.sim = "estimate_theta_stoch",
                     N = 20)

evalB30 <- STBP.eval(test_bin2,
                     eval.range = seq(0.01, 0.25, 0.02),
                     n = 30,
                     prior = 0.5,
                     overdispersion.sim = "estimate_theta_stoch",
                     N = 20)
```

Notice that the evaluation range consists of proportions on `[0.01, 0.25]`, excluding 0 to avoid warnings and chosen considering that H was set to H: `mu > 0.15` in `test_bin2`.
We can also visualize the acceptance rate of H and the average number of bouts across true incidences:

```{r}
plot(seq(0.01, 0.25, 0.02), evalB10$AvgSamples, type = "o",
     xlab = "True incidence", ylab = "Average number of bouts")
points(seq(0.01, 0.25, 0.02), evalB20$AvgSamples, type = "o", pch = 19)
points(seq(0.01, 0.25, 0.02), evalB30$AvgSamples, type = "o", pch = 0)
abline(v = 0.15, lty = 2)

plot(seq(0.01, 0.25, 0.02), evalB10$AcceptRate, type = "o",
     xlab = "True incidence", ylab = "Acceptance rate of H")
points(seq(0.01, 0.25, 0.02), evalB20$AcceptRate, type = "o", pch = 19)
points(seq(0.01, 0.25, 0.02), evalB30$AcceptRate, type = "o", pch = 0)
abline(v = 0.15, lty = 2)
```

Again, this is a very limited analysis made of only 20 simulations per true incidence. However, it shows how the number of bouts required to make a decision decreases with sample size, with designs of 10 samples (empty circles) struggling to make quick decisions even when the incidence is far from the threshold of 0.15, compared with sample sizes of 20 (filled circles) and 30 (empty squares). Accuracy is also affected by sample size, with `n = 20` and `n = 30` producing fewer incorrect decisions, particularly when the true incidence is below the threshold.

### Example 3: Testing if a population is above a dynamic threshold

Sometimes hypotheses about population densities are not single numbers, but entire population trajectories. This is the case when monitoring programs are designed to track a target population over time and decide as early as possible whether a management action is necessary to prevent an outbreak. Population models often assist the prediction of densities over time based on current densities and growth rates, so the profile of population dynamics that results in problematic population sizes is often known. An example with real data is presented by Rincon et al. (2025). Here we use a logistic population growth model, `pop.outB`, to make up an outbreak trajectory of 20 weekly population densities that results in densities that cannot be tolerated. We will call this trajectory `OB_pop <- pop.outB(t = seq(1, 20))`, where `t` is given in weeks.

```{r}
pop.outB <- function(t) {
  N0 = 15
  K = 90
  r = 0.25
  K / (1 + (K / N0 - 1) * exp(-r * t))
}

OB_pop <- pop.outB(t = seq(1, 20))

plot(seq(1, 20), OB_pop, xlab = "Time (weeks)",
     ylab = "Population density", lwd = 2, type = "o", pch = 19)
```

The goal is to determine, with the fewest possible sampling bouts collected over time, whether a sampled population is consistent with one equivalent to or greater than `pop.outB`. One complication is that the threshold here is a moving target, and the differences between outbreak and non-outbreak population configurations during the first few weeks (when we want to make an informed decision) are often tiny, but tend to amplify over time. Thus, the test should be able to detect those early tiny differences and produce correct early warnings. Every cycle/season the population may behave differently, but we can use the outbreak configuration as a reference to quantify how severe each one is. For example, non-outbreaks are a portion of the outbreak, and very bad seasons/cycles could even reach larger population sizes than the reference outbreak.
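For reference, `pop.outB` implements the standard logistic growth solution

$$
N(t) = \frac{K}{1 + \left(\frac{K}{N_0} - 1\right)e^{-rt}},
$$

with initial density $N_0 = 15$, carrying capacity $K = 90$ and growth rate $r = 0.25$ per week, exactly as coded above. Any other model that generates a density trajectory could be used in its place, since the test only consumes the resulting vector of densities.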
We can produce several variants of population trajectories based on the outbreak:

```{r}
plot(seq(1, 20), OB_pop, ylim = c(0, 95), type = "o", lwd = 2,
     ylab = "Population size", xlab = "Time (weeks)", pch = 19)
points(seq(1, 20), OB_pop * 0.9, type = "o")
points(seq(1, 20), OB_pop * 0.7, type = "o", pch = 0)
points(seq(1, 20), OB_pop * 0.5, type = "o", pch = 2)
points(seq(1, 20), OB_pop * 1.1, type = "o", pch = 5)
```

The trajectory in bold is the outbreak population configuration, `OB_pop`, and the rest are outbreak and non-outbreak variants. The triangles trace a configuration of 50% of an outbreak, the squares 70%, the circles 90%, and the diamonds one that surpasses the outbreak by 10%. Modeling variants using the outbreak as a reference (with proportions) is more realistic than varying single parameters in the `pop.outB` function, since there are multiple mechanisms that simultaneously drive outbreak severity. As we'll see below, the evaluation range used to evaluate STBP for dynamic thresholds in `STBP.eval` is provided as factors of the threshold trajectory.

As with static hypotheses, data for dynamic hypotheses come in sequential bouts of counts or proportions, with or without overdispersion, and the data are structured in the same way. The only difference is that the true density varies over time according to the population model. For example, data collected sequentially in bouts of 10 samples each, on a population that is 70% of an outbreak, can be generated as:

```{r}
pop70 <- OB_pop * 0.7

count70 <- matrix(NA, 10, 20)
for(i in 1:20){
  count70[, i] <- rnbinom(10, mu = pop70[i], size = estimate_k(pop70[i]))
}
head(count70, 3) # only first three samples out of ten are shown
```

Notice that we are using the same negative binomial specification as in example 2, with varying overdispersion described by the function `estimate_k`, but a constant overdispersion could also be used. We can visualize the sample means per sampling bout (means across columns of `count70`) along with the outbreak trajectory, `OB_pop`:

```{r}
plot(seq(1, 20), OB_pop, ylim = c(0, 95), type = "o", lwd = 2,
     ylab = "Population size", xlab = "Time (weeks)", pch = 19)
points(seq(1, 20), colMeans(count70), type = "o")
```

To run the test on `count70`, we only have to specify the outbreak trajectory (not the function) in the `hypothesis` argument of `stbp_composite`:

```{r}
test_dyn <- stbp_composite(data = count70,
                           greater_than = TRUE,
                           hypothesis = OB_pop,
                           density_func = "negative binomial",
                           overdispersion = "estimate_k",
                           prior = 0.5,
                           lower_bnd = 0,
                           upper_bnd = Inf,
                           lower_criterion = 0.001,
                           upper_criterion = 0.999)
test_dyn
```

The STBP rejects H: `trajectory mu > trajectory OB_pop` after only two bouts. The procedure to test dynamic hypotheses with binomial data is similar, except that trajectories should be provided as proportions (incidence) and the data should be structured as a `list()` of matrices, with successes in column 1 and samples in column 2.

#### Test evaluation

STBP for dynamic hypotheses can be evaluated the same way as tests for static hypotheses. The only difference is that `eval.range` is provided as factors of the threshold trajectory. For example, for `test_dyn`, the evaluation is run on data generated for a range of multiples of `OB_pop = pop.outB(seq(1, 20))`, such as `OB_pop * 0.5` for 50% of an outbreak, `OB_pop * 0.7` for 70% of an outbreak, or `OB_pop * 1.2` for 20% *above* the outbreak. We can refer to these factors as "outbreak intensities".
```{r}
eval_dyn <- STBP.eval(test_dyn,
                      eval.range = seq(0.2, 1.5, 0.2),
                      n = 10,
                      prior = 0.5,
                      N = 20)
```

We can visualize the average number of bouts required to make a decision and the acceptance rate of H across true outbreak intensities:

```{r}
plot(seq(0.2, 1.5, 0.2), eval_dyn$AvgSamples, type = "o",
     xlab = "True outbreak intensity", ylab = "Average number of bouts")
abline(v = 1, lty = 2)

plot(seq(0.2, 1.5, 0.2), eval_dyn$AcceptRate, type = "o",
     xlab = "True outbreak intensity", ylab = "Acceptance rate of H")
abline(v = 1, lty = 2)
```

The pattern is similar to the one shown for static hypotheses (example 2). The average number of sampling bouts peaks at the threshold (outbreak intensity = 1), where decision-making is toughest and more samples are required to distinguish outbreaks from non-outbreaks. The acceptance rate of H is 0 at low outbreak intensities and reaches 1 above the threshold (outbreak intensity = 1). Again, the value of this analysis stems from the ability to check the balance between sampling effort and accuracy that results from varying the decision criteria (arguments `lower_criterion` and `upper_criterion`) and the sample size within bouts.

## 2. The Sequential Probability Ratio Test

The SPRT was invented in the context of quality control back in the 1940s, but it is widely applied in many fields, including population management and clinical trials. In fact, there are other R packages that implement the SPRT, particularly in the field of clinical trials (see the introduction for a list). The SPRT is popular among practitioners of Integrated Pest Management, who are often familiar with charts with two parallel lines on which cumulative pest counts are overlaid to decide whether sampling should continue or whether there is enough evidence to stop and decide in favor of or against the hypothesis being tested. The SPRT has been extensively tested and has been shown to be optimal within a restricted range of contexts. It can be easily implemented, even without a calculator, just by comparing data with stop lines "on the fly".

The main difference between STBP and SPRT is philosophical. While the STBP provides recommendations about H based on Bayesian probabilities, the SPRT does so based on likelihood ratios. Bayesian probabilities are understood as the credibility of H being true *given* the data, whereas likelihood ratios rely on the frequency with which observations occur under two competing models. For this reason, the decision criteria for the STBP (i.e., arguments `lower_criterion` and `upper_criterion`) are not directly associated with type I and type II error rates, and the `stbp_simple` and `stbp_composite` outputs include the final posterior probability (credibility) for H. In the SPRT, tolerable type I and type II error rates are explicitly set as part of the test specification, and the final likelihood ratio is not as relevant, since the "significance" of the result is stated from the beginning, provided that one of the two stop lines has been reached.

One important drawback of the SPRT is that it requires the specification of two unrelated hypotheses. You can't test a single hypothesis against its complement, as in the STBP, because the SPRT requires two simple hypotheses to produce a probability ratio each time new data arrive. Also, the SPRT can't process group sequential data (unless you reduce each bout to its mean, median or mode), and in its conventional form it cannot be used to test dynamic hypotheses. If you want to use the SPRT for dynamic hypotheses, you must use a variation called the Time-Sequential Probability Ratio Test, which is not yet implemented in the `sequential.pops` package.
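In its classical form (Wald 1945), the SPRT tracks the likelihood ratio of the data under the two simple hypotheses,

$$
\Lambda_t = \prod_{i=1}^{t} \frac{f(x_i \mid \mu_1)}{f(x_i \mid \mu_0)},
$$

and sampling continues while $\frac{\beta}{1-\alpha} < \Lambda_t < \frac{1-\beta}{\alpha}$; crossing the upper bound accepts H1, and crossing the lower bound accepts H0. The familiar parallel stop lines are essentially this rule rewritten, after taking logarithms, on the scale of cumulative counts.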
### Example 4: Testing if a population is above a threshold

Like other sequential tests, the SPRT assists in situations where one wants to test whether a given population is above or below a threshold. The main difference from the STBP is that the SPRT is specified with two hypotheses instead of one. There are no strict rules on how those hypotheses should be stated when you only want to test a single threshold-style hypothesis. Conventionally, they are formulated equidistant from the threshold density, using confidence intervals for the threshold, but there are other approaches.

For this example, we will use once again the case study of the tomato leafminer in greenhouse tomatoes. Let's assume again that the economic threshold is static (constant) at 9 larvae per plant, and that we want to test, through sequential sampling, if a population has exceeded that threshold.

In the `sequential.pops` package, the same probability distributions available for the STBP are also available for the SPRT, except for the beta-binomial, for which stop lines have not been derived yet. Similar to the STBP, distributions without overdispersion (Poisson and binomial) are used to test and sample small population sizes. But when populations grow, they start to aggregate, and overdispersion becomes a nuisance that must be accounted for by using appropriate distributions (negative binomial). However, to specify a SPRT, the overdispersion parameter is only needed at the threshold density to calculate the stop lines. If there is a function to obtain reliable values of the overdispersion parameter based on the mean, it can be used during test evaluation to generate more realistic data.

Unlike the STBP, you can specify a SPRT without data, in which case you'll get a chart with the corresponding stop lines as output. For example, if we want to build a test specification for H: `mu > 9` larvae per plant, we have to provide two hypotheses that allow for the sequential calculation of probability ratios, say H0: `mu = 8` and H1: `mu = 10`. We also must provide a probability distribution, an overdispersion parameter value (in case the distribution is negative binomial), and tolerable type I (`alpha`) and type II (`beta`) error rates:

```{r}
testPR10 <- sprt(mu0 = 8,
                 mu1 = 10,
                 density_func = "negative binomial",
                 overdispersion = estimate_k(9),
                 alpha = 0.1,
                 beta = 0.1)
```

Notice that no data have to be provided, and that `overdispersion` is not a function but a value, obtained here by evaluating `estimate_k` at the threshold density of 9 larvae per plant. A chart with the stop lines, ready to contrast with cumulative counts, is returned by calling `plot`:

```{r}
plot(testPR10)
```

and the coefficients for the stop lines by calling the object:

```{r}
testPR10
```

Now let's generate some data with the same specifications as in example 2, that is, from a negative binomial distribution using the function `estimate_k` to obtain overdispersion based on the mean:

```{r}
counts1a <- rnbinom(20, mu = 5, size = estimate_k(5))
counts1a
```

These are 20 random counts from a population with a density of 5 larvae per sampling unit (plant). If we run the function `sprt` with data, a SPRT is performed and a new `"SPRT"` object is created.
```{r}
testPR1 <- sprt(data = counts1a,
                mu0 = 8,
                mu1 = 10,
                density_func = "negative binomial",
                overdispersion = estimate_k(9),
                alpha = 0.1,
                beta = 0.1)
testPR1
```

The output is similar to the one generated by the functions `stbp_simple` and `stbp_composite`. In this case, H0 is accepted: the evidence is compatible with a sampled population below the threshold of 9 individuals per plant. In contrast to `stbp_simple` and `stbp_composite`, the output from `sprt` does not provide a probability, and the precision of the final decision relies on the selected values for `alpha` and `beta`. Once again, note that overdispersion is provided as a value through a function. There is also a `plot` method for `"SPRT"` objects, which overlays the stop lines on the cumulative counts:

```{r}
plot(testPR1)
```

In this case, the whole sequence of 20 counts is displayed, but sampling should have been terminated after the 9th bout, when the cumulative counts touched the area below the lower stop line.

The procedure for binary variables is similar, but cluster sampling (i.e., the ability to collect clusters of samples, as in the STBP) is not allowed. Instead, only 1 and 0 values can be introduced, and decisions are based on the cumulative number of 1s across sampling bouts. Binary data collected sequentially for a SPRT are therefore structured as a vector that can only contain 1s and 0s. For example, we can generate some binary data for a population with a mean incidence of `prob = 0.5`, without cluster sampling (`size = 1`):

```{r}
binPR <- rbinom(40, prob = 0.5, size = 1)
binPR
```

Again, we could get the stop lines in a chart and the corresponding coefficients to set up a sampling plan for H: `mu > 0.4`, just by specifying the test as H0: `mu = 0.35` vs. H1: `mu = 0.45` and omitting the argument `data` in the function `sprt`:

```{r}
testPR20 <- sprt(mu0 = 0.35,
                 mu1 = 0.45,
                 density_func = "binomial",
                 alpha = 0.1,
                 beta = 0.1)
plot(testPR20)
```

and obtain the stop-line coefficients with:

```{r}
testPR20
```

Now, we can run a SPRT on the data in `binPR` with:

```{r}
testPR2 <- sprt(data = binPR,
                mu0 = 0.35,
                mu1 = 0.45,
                density_func = "binomial",
                alpha = 0.1,
                beta = 0.1)
testPR2
```

In this case, 34 binomial sampling bouts, each made of a single observation, were required to make a decision between H0 and H1. The object `testPR2` can also be used to produce a chart that compares the stop lines with the cumulative data:

```{r}
plot(testPR2)
```

#### Test evaluation

As with the STBP, the goal of evaluating a SPRT is to check specifications and adjust them to achieve a good balance between sampling effort and precision. The procedure, requirements and outputs are similar. For test evaluation, overdispersion for the negative binomial can be specified as a function, with or without stochasticity. The evaluation with a constant overdispersion (as in the model `testPR1`) is performed with:

```{r}
evalPR1 <- SPRT.eval(testPR1, eval.range = seq(4, 12), N = 20)
```

Notice that there is no need to specify the overdispersion again when it is a constant, because the same value used for the model is used by default.
If we wanted to add a function that describes overdispersion with added stochasticity, the script would be:

```{r}
evalPR1a <- SPRT.eval(testPR1,
                      eval.range = seq(4, 12),
                      overdispersion.sim = "estimate_k_stoch",
                      N = 20)
```

Now we can compare the two evaluations, with and without a functional, stochastic description of overdispersion:

```{r}
plot(seq(4, 12), evalPR1a$AvgSamples, xlab = "True population size",
     ylab = "Average number of bouts", type = "o")
points(seq(4, 12), evalPR1$AvgSamples, type = "o", pch = 2)
abline(v = 9, lty = 2)

plot(seq(4, 12), evalPR1a$AcceptRate, xlab = "True population size",
     ylab = "Acceptance rate of H1", type = "o")
points(seq(4, 12), evalPR1$AcceptRate, type = "o", pch = 2)
abline(v = 9, lty = 2)
```

Notice that the evaluation for the SPRT is performed on the acceptance rate of H1. As mentioned above, adding realistic descriptions of overdispersion across count or binomial data provides a more realistic expectation of how the test will behave when applied to real data. In the figures produced above, triangles denote the evaluation assuming constant overdispersion, and circles the evaluation with varying overdispersion and added stochasticity. Although this is based on far too few simulations (`N = 20`), it shows that simulations that omit the varying nature of overdispersion underestimate both the number of sampling bouts required to make a decision and the rate of incorrect decisions, especially when the true population size is above the threshold.

Evaluation of tests with binomial data is similar:

```{r}
evalPR2 <- SPRT.eval(testPR2, eval.range = seq(0.1, 0.8, 0.1), N = 20)

plot(seq(0.1, 0.8, 0.1), evalPR2$AvgSamples, xlab = "True incidence",
     ylab = "Average number of bouts", type = "o")
abline(v = 0.4, lty = 2)

plot(seq(0.1, 0.8, 0.1), evalPR2$AcceptRate, xlab = "True incidence",
     ylab = "Acceptance rate of H1", type = "o")
abline(v = 0.4, lty = 2)
```

Although binomial sampling is quicker and easier, it requires significantly more samples than count sampling, and precision is often compromised too. However, binomial sampling can be a good alternative when the threshold population density is small [see Binns et al. (2000) for a discussion].

## Wishlist for future versions

- Implement cluster sampling in the eval functions for both STBP and SPRT
- Include the time-sequential probability ratio test for dynamic thresholds
- Implement cluster sampling for the SPRT
- Add a protocol to run the eval functions faster with parallel computing

## Cited literature

Binns, M.R., Nyrop, J.P. & Werf, W.v.d. (2000) Sampling and monitoring in crop protection: the theoretical basis for developing practical decision guides. CABI Publishing, Wallingford, Oxon, UK; New York, NY (USA). 284 pp.

Madden, L.V., Hughes, G. & Bosch, F.v.d. (2007) The study of plant disease epidemics. American Phytopathological Society, St. Paul, MN (USA). 421 pp.

Rincon, D.F., McCabe, I. & Crowder, D.W. (2025) Sequential testing of complementary hypotheses about population density. Methods in Ecology and Evolution.

Rincon, D.F., Rivera-Trujillo, H.F., Mojica-Ramos, L. & Borrero-Echeverry, F. (2021) Sampling plans promoting farmers' memory provide decision support in *Tuta absoluta* management. Agronomy for Sustainable Development, 41: 33.

Wald, A. (1945) Sequential tests of statistical hypotheses. The Annals of Mathematical Statistics, 16(2): 117-186.