# Multiplicity Adjustment for Group Sequential Designs

## Background

Consider a phase 3 study with three arms: Arm A (drug A + drug B), Arm B (drug A placebo + drug B), and Arm C (drug A placebo + drug B placebo). Consider two endpoints: PFS and OS. There are six comparisons of interest corresponding to the combinations of two endpoints by three pairwise treatment comparisons.

The overall type I error rate is to be controlled at 1-sided $$\alpha=2.5\%$$. The total alpha is split into 20% for PFS and 80% for OS. The following graph depicts the initial alpha allocation and propagation. Figure 1: Graphical procedure for six hypotheses

This strategy ensures that the A/B comparison will be performed for a given endpoint only if both the A/C and B/C comparisons are significant for the endpoint. This assumes that B/C comparison is likely to be effective and we are interested in assessing the contribution of components (CoC) of drug A in the drug combination A+B.

The weights along the edges are chosen to ensure that (1) about 20% of alpha will be allocated to PFS and 80% of alpha will be allocated to OS once the two hypotheses in the preceding family are rejected, and (2) if the PFS hypothesis is rejected, the majority of alpha will be passed to the OS hypothesis in the same family. This can be verified using the code below:

library(lrstat)

alpha = 0.025
x = list(
w = c(0.2, 0.8, 0, 0, 0, 0),
G = matrix(c(0, 0.8, 0.2, 0, 0, 0,
0.5, 0, 0, 0.5, 0, 0,
0, 0.8, 0, 0, 0.2, 0,
0.72, 0, 0, 0, 0, 0.28,
0, 1, 0, 0, 0, 0,
1, 0, 0, 0, 0, 0),
nrow=6, ncol=6, byrow = TRUE),
I = seq(1, 6))

print(alpha*x$w, digits=4) ##  0.005 0.020 0.000 0.000 0.000 0.000 for (j in 1:6) { x = updateGraph(x$w, x$G, x$I, j)
print(alpha*x$w, digits=4) } ##  0.000 0.024 0.001 0.000 0.000 0.000 ##  0.000 0.000 0.005 0.020 0.000 0.000 ##  0.000000 0.000000 0.000000 0.023846 0.001154 0.000000 ##  0.000000 0.000000 0.000000 0.000000 0.005092 0.019908 ##  0.000 0.000 0.000 0.000 0.000 0.025 ##  0 0 0 0 0 0 ## Assumptions We assume that the enrollment rate is 10 patients per month for the first 8 months and 28 patients per month thereafter. We target to enroll 700 patients in about 30 months. The randomization ratio is 2:2:1 for Arms A:B:C. The table below displays the assumptions for the event process by endpoint and by treatment arm along with the hazard ratios for Arm A or Arm B compared with Arm C. Endpoint Parameter Arm A Arm B Arm C PFS Median, mos 22 17 12 HR vs. C 0.55 0.70 NA OS Median, mos 50 40 30 HR vs. C 0.60 0.75 NA Table 1: Assumptions for the event process ## Group sequential design We assume that the study has two interim looks and one final look. The comparison in PFS for Arm A vs. C will be performed at Looks 1 and 2 only, while the other five comparisons will be performed at all three looks. An HSD(-4) alpha spending function will be used for testing each of the six hypotheses. The timing of Look 1 and Look 2 is based on the combined PFS events for Arm A and Arm C, and Look 1 has 75% PFS events relative to Look 2. In addition, we target 97% power for PFS A/C at 1-sided alpha of 0.5%. The following code calculates the sample size for the PFS A/C comparison. Of note, the accrual intensity is 3/5 of the overall population due to the 2:2:1 randomization ratio and the fact that only Arms A and C are included for the A/C comparison. The accrual duration of 30.143 months is used to enroll 700 patients for Arms A, B and C combined. (lr1 <- lrsamplesize( beta = 0.03, kMax = 2, informationRates = c(0.75,1), alpha = 0.005, typeAlphaSpending = "sfHSD", parameterAlphaSpending = -4, allocationRatioPlanned = 2, accrualTime = c(0,8), accrualIntensity = c(10,28)*3/5, lambda1 = log(2)/22, lambda2 = log(2)/12, accrualDuration = 30.143, followupTime = NA, typeOfComputation = "Schoenfeld")$resultsUnderH1)
##
## Group-sequential trial with 2 stages
## Overall power: 0.971, overall significance level (1-sided): 0.005
## Maximum # events: 248, expected # events: 196.1
## Maximum # dropouts: 0, expected # dropouts: 0
## Maximum # subjects: 423, expected # subjects: 423
## Total study duration: 41.5, expected study duration: 34.8
## Accrual duration: 30.3, follow-up duration: 11.1, fixed follow-up: FALSE
##
##                              Stage 1 Stage 2
## Information rate             0.750   1.000
## Efficacy boundary (Z-scale)  2.915   2.629
## Cumulative rejection         0.837   0.971
## Cumulative alpha spent       0.0018  0.0050
## Number of events             186.0   248.0
## Number of dropouts           0.0     0.0
## Number of subjects           423.0   423.0
## Analysis time                33.4    41.5
## Efficacy boundary (HR-scale) 0.635   0.702
## Efficacy boundary (p-scale)  0.0018  0.0043
## Information                  41.33   55.11
## HR                           0.545   0.545

It can be seen that Look 1 and Look 2 are expected to take place at months 33.4 and 41.5, respectively, with the target number of events for Arm A and Arm C combined of 186 and 248 at the two looks, respectively.

We can now calculate the expected number of OS events for Arm A and Arm C combined at Look 1 and Look 2, respectively.

(lr2a <- lrstat(
time = lr1$byStageResults$analysisTime,
allocationRatioPlanned = 2,
accrualTime = c(0,8), accrualIntensity = c(10,28)*3/5,
lambda1 = log(2)/50, lambda2 = log(2)/30,
accrualDuration = 30.143, followupTime = 100,
predictEventOnly = 1))
##       time subjects  nevents nevents1 nevents2 ndropouts ndropouts1 ndropouts2
## 1 33.44355 420.0024  96.6643 54.71521 41.94909         0          0          0
## 2 41.46301 420.0024 136.9521 78.41901 58.53314         0          0          0

Look 3 is planned to have 90% power for the OS A/C comparison at 1-sided alpha of 2%, which can be done as follows:

(d_os <- ceiling(uniroot(function(d) {
lr <- lrsamplesize(
beta = 0.1, kMax = 3,
informationRates = c(lr2a$nevents, d)/d, alpha = 0.020, typeAlphaSpending = "sfHSD", parameterAlphaSpending = -4, allocationRatioPlanned = 2, accrualTime = c(0,8), accrualIntensity = c(10,28)*3/5, lambda1 = log(2)/50, lambda2 = log(2)/30, accrualDuration = 30.143, followupTime = NA, typeOfComputation = "Schoenfeld", rounding = 0)$resultsUnderH1
lr$overallResults$numberOfEvents - d},
c(137,300))$root)) ##  196 (studyDuration = caltime( nevents = d_os, allocationRatioPlanned = 2, accrualTime = c(0,8), accrualIntensity = c(10,28)*3/5, lambda1 = log(2)/50, lambda2 = log(2)/30, accrualDuration = 30.143, followupTime = 1000)) ##  55.72894 In summary, the OS A/C power calculation is as follows: (lr2 <- lrpower( kMax = 3, informationRates = c(lr2a$nevents, d_os)/d_os,
alpha = 0.020, typeAlphaSpending = "sfHSD",
parameterAlphaSpending = -4,
allocationRatioPlanned = 2,
accrualTime = c(0,8),
accrualIntensity = c(10,28)*3/5,
lambda1 = log(2)/50, lambda2 = log(2)/30,
accrualDuration = 30.143, followupTime = 25.586,
typeOfComputation = "Schoenfeld"))
##
## Group-sequential trial with 3 stages
## Overall power: 0.901, overall significance level (1-sided): 0.02
## Maximum # events: 196, expected # events: 147.8
## Maximum # dropouts: 0, expected # dropouts: 0
## Maximum # subjects: 420, expected # subjects: 420
## Total study duration: 55.7, expected study duration: 44.6
## Accrual duration: 30.1, follow-up duration: 25.6, fixed follow-up: FALSE
##
##                              Stage 1 Stage 2 Stage 3
## Information rate             0.493   0.699   1.000
## Efficacy boundary (Z-scale)  2.832   2.605   2.099
## Cumulative rejection         0.321   0.597   0.901
## Cumulative alpha spent       0.0023  0.0057  0.0200
## Number of events             96.7    137.0   196.0
## Number of dropouts           0.0     0.0     0.0
## Number of subjects           420.0   420.0   420.0
## Analysis time                33.4    41.5    55.7
## Efficacy boundary (HR-scale) 0.543   0.624   0.728
## Efficacy boundary (p-scale)  0.0023  0.0046  0.0179
## Information                  21.48   30.43   43.56
## HR                           0.600   0.600   0.600

Therefore, Look 3 is expected to take place at month 55.7. Based on the timing of the three looks, we obtain the expected number of events for the other four comparisons:

(lr3 <- lrstat(
time = lr2$byStageResults$analysisTime,
allocationRatioPlanned = 2,
accrualTime = c(0,8), accrualIntensity = c(10,28)*3/5,
lambda1 = log(2)/17, lambda2 = log(2)/12,
accrualDuration = 30.143, followupTime = 1000,
predictEventOnly = 1))
##       time subjects  nevents nevents1  nevents2 ndropouts ndropouts1 ndropouts2
## 1 33.44358 420.0024 206.7956 127.5644  79.23122         0          0          0
## 2 41.46304 420.0024 271.8408 170.0795 101.76138         0          0          0
## 3 55.72900 420.0024 341.7859 218.5591 123.22675         0          0          0
(lr4 <- lrstat(
time = lr2$byStageResults$analysisTime,
allocationRatioPlanned = 2,
accrualTime = c(0,8), accrualIntensity = c(10,28)*3/5,
lambda1 = log(2)/40, lambda2 = log(2)/30,
accrualDuration = 30.143, followupTime = 1000,
predictEventOnly = 1))
##       time subjects  nevents  nevents1 nevents2 ndropouts ndropouts1 ndropouts2
## 1 33.44358 420.0024 108.2079  66.25878 41.94915         0          0          0
## 2 41.46304 420.0024 152.5236  93.99041 58.53320         0          0          0
## 3 55.72900 420.0024 216.1404 134.73124 81.40917         0          0          0
(lr5 <- lrstat(
time = lr2$byStageResults$analysisTime,
allocationRatioPlanned = 1,
accrualTime = c(0,8), accrualIntensity = c(10,28)*4/5,
lambda1 = log(2)/22, lambda2 = log(2)/17,
accrualDuration = 30.143, followupTime = 1000,
predictEventOnly = 1))
##       time subjects  nevents nevents1 nevents2 ndropouts ndropouts1 ndropouts2
## 1 33.44358 560.0032 233.9718 106.4074 127.5644         0          0          0
## 2 41.46304 560.0032 315.2459 145.1664 170.0795         0          0          0
## 3 55.72900 560.0032 412.5407 193.9816 218.5591         0          0          0
(lr6 <- lrstat(
time = lr2$byStageResults$analysisTime,
allocationRatioPlanned = 1,
accrualTime = c(0,8), accrualIntensity = c(10,28)*4/5,
lambda1 = log(2)/50, lambda2 = log(2)/40,
accrualDuration = 30.143, followupTime = 1000,
predictEventOnly = 1))
##       time subjects  nevents  nevents1  nevents2 ndropouts ndropouts1
## 1 33.44358 560.0032 120.9741  54.71529  66.25878         0          0
## 2 41.46304 560.0032 172.4095  78.41910  93.99041         0          0
## 3 55.72900 560.0032 249.3223 114.59105 134.73124         0          0
##   ndropouts2
## 1          0
## 2          0
## 3          0

The power calculation for PFS A/C and OS A/C assumes no alpha recycling between the two hypotheses. In the presence of alpha propagation among the six hypotheses depicted in Figure 1, we resort to simulations to obtain the empirical power with multiplicity adjustment for the six hypotheses.

## Simulation and empirical power

First we simulate 500 trials, with the first 2 looks planned at 186 and 248 events for endpoint 1 and the final look planned at 196 events for endpoint 2, i.e., kMaxe1 = 2 and plannedEvents = c(186, 248, 196).

sim1 = lrsim2e3a(
kMax = 3,
kMaxe1 = 2,
allocation1 = 2,
allocation2 = 2,
allocation3 = 1,
accrualTime = c(0, 8),
accrualIntensity = c(10, 28),
lambda1e1 = log(2)/22,
lambda2e1 = log(2)/17,
lambda3e1 = log(2)/12,
lambda1e2 = log(2)/50,
lambda2e2 = log(2)/40,
lambda3e2 = log(2)/30,
accrualDuration = 30.143,
plannedEvents = c(186, 248, 196),
maxNumberOfIterations = 500,
maxNumberOfRawDatasetsPerStage = 1,
seed = 314159)

From the summary data for each simulated data set, we obtain the pairwise total number of events and the raw p-values.

library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
##     filter, lag
## The following objects are masked from 'package:base':
##
##     intersect, setdiff, setequal, union
library(tidyr)

df <- sim1$sumdata %>% mutate(events13 = events1 + events3, events23 = events2 + events3, events12 = events1 + events2, p13 = pnorm(logRankStatistic13), p23 = pnorm(logRankStatistic23), p12 = pnorm(logRankStatistic12)) %>% select(iterationNumber, stageNumber, endpoint, events13, events23, events12, p13, p23, p12) dfcomp <- tibble(comparison = c("13", "23", "12"), order = c(1, 2, 3)) dfinfo <- df %>% arrange(iterationNumber, endpoint) %>% pivot_longer( c("events13", "events23", "events12"), names_to = "c_events", values_to = "events") %>% mutate(comparison = substring(c_events, 7)) %>% left_join(dfcomp, by = c("comparison")) %>% arrange(iterationNumber, order, endpoint, stageNumber) dfp <- df %>% arrange(iterationNumber, endpoint) %>% pivot_longer( c("p13", "p23", "p12"), names_to = "c_p", values_to = "p") %>% mutate(comparison = substring(c_p, 2)) %>% left_join(dfcomp, by = c("comparison")) %>% arrange(iterationNumber, order, endpoint, stageNumber) Now we use specified alpha-spending function along with the target total number of events to adjust the critical values based on the current alpha for the hypothesis. We compare the p-values to the critical values to determine whether to reject the hypothesis at the current look. The empirical power for a hypothesis is estimated by the proportion of simulations which reject the hypothesis. w = c(0.2, 0.8, 0, 0, 0, 0) G = matrix(c(0, 0.8, 0.2, 0, 0, 0, 0.5, 0, 0, 0.5, 0, 0, 0, 0.8, 0, 0, 0.2, 0, 0.72, 0, 0, 0, 0, 0.28, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0), nrow=6, ncol=6, byrow=TRUE) rejStage = fseqbon( w = w, G = G, alpha = 0.025, kMax = 3, typeAlphaSpending = rep("sfHSD", 6), parameterAlphaSpending = rep(-4, 6), incidenceMatrix = matrix( c(1,1,0, 1,1,1, 1,1,1, 1,1,1, 1,1,1, 1,1,1), nrow=6, ncol=3, byrow=TRUE), maxInformation = c(248, 196, 342, 216, 413, 249), p = matrix(dfp$p, 3000, 3, byrow=TRUE),
information = matrix(dfinfo\$events, 3000, 3, byrow=TRUE))

reject2a = matrix(rejStage>0, nrow=500, ncol=6, byrow=TRUE)
(power2 = apply(reject2a, 2, mean))
##  0.992 0.942 0.722 0.530 0.286 0.136