Early termination in single-parameter model phase II clinical trial designs using decreasingly informative priors

Background: To exchange the subjectivity of Bayesian prior selection for assumptions more directly related to statistical decision making in clinical studies and trials, the decreasingly informative prior (DIP) is considered. We expand standard Bayesian early termination methods in one-parameter statistical models for Phase II clinical trials to include the DIP. These priors are designed to reduce the chance of erroneously adapting trials too early by parameterizing skepticism in an amount always equal to the unobserved sample size. Method: We show how to parameterize these priors based on effective prior sample size and provide examples for common single-parameter models, including Bernoulli, Poisson, and Gaussian distributions. We use a simulation study to search through possible values of total sample sizes and termination thresholds to find the smallest total sample size (N) among admissible designs, which we define as having at least 80% power and no greater than 5% type I error rate. Results: For Bernoulli, Poisson, and Gaussian distributions, the DIP approach requires fewer patients when admissible designs are achieved. In situations where type I error or power are not admissible, the DIP approach yields similar power and better-controlled type I error with comparable or fewer patients than other Bayesian priors by Thall and Simon. Conclusions: The DIP helps control type I error rates with comparable or fewer patients, especially in instances when increased type I error rates arise from erroneous termination early in a trial.


INTRODUCTION
Phase II clinical studies typically focus on determining whether a treatment has sufficient evidence of preliminary efficacy to warrant further investigation, such as in phase III trials, or whether the investigation should be discontinued due to a lack of efficacy or safety. These studies tend to be small, and data monitoring tends to occur as subjects are accrued so that decisions on whether to stop the study early (for efficacy, safety, or futility) can be made as soon as possible, even before the planned end of the study.
While traditional frequentist methods (e.g., Pocock group sequential designs, the O'Brien-Fleming alpha-spending function, etc.) provide stopping rules or termination guidance in phase II trials, Bayesian methods allow the inclusion of prior or historical information, which may help to improve decision making. 1,2 Bayesian approaches are also more amenable to adaptive designs and complex modeling. 3 Despite these benefits, the Bayesian approach can be subject to inflated type I error rates.
Further, prior selection in the Bayesian approach is crucial because it is possible to generate posterior distributions that are strongly influenced by the priors, which is not desirable. In practice, the Bayesian approach can be contentious when prior information is based mainly on subject matter experts. 3 To exchange this type of subjectivity for assumptions more directly related to statistical decision making in clinical studies and trials, the DIP is considered, where null skepticism is elicited into the prior in a manner that decreases its prior effective sample size (ESS) as subjects are accrued. [4][5][6] In this way, the posterior distribution is increasingly informed by observed data and less by the prior information as the trial progresses.
The goal of this paper is to develop and present the DIP approach based on the effective sample size for single-parameter models, including Bernoulli, Poisson, and Gaussian, and to compare the DIP approach to Thall and Simon's Bayesian approaches. 1,2 The net effect of this DIP formulation is that it restricts response-based adaptation early in a trial, gradually permitting more adaptation as the overall Bayesian model transfers the total effective sample size from the prior to the likelihood. If applied to designs featuring early termination processes, this decreasingly informative prior could help control type I error rates, especially in instances when increased type I error rates arise from erroneous termination early in a trial. This paper presents an alternative Bayesian approach to early termination in phase II trials using the DIP in single-parameter statistical models. Following a description of standard Bayesian early termination phase II trial designs in Section 2.1, the rationale of the DIP approach and the general model is detailed in Section 2.2. Examples of one-sample models, including Bernoulli, Poisson, and Gaussian distributions, are presented in Section 2.3. Simulation studies (Section 3) are used to compare the performance of the DIP approach with the standard Bayesian model, focusing on identifying admissible designs (those with at least 80% power and no more than 5% type I error rates) and the minimum sample size that yields such designs. Section 4 concludes the paper with a discussion.

Standard Bayesian early termination phase II trials
In single-group phase II studies and two-group phase II trials, we often need to know if an experimental treatment is sufficiently efficacious relative to some threshold or the other treatment. Suppose we have a likelihood function f(y|θ) and prior distribution π(θ) for outcome vector y and scalar parameter θ. Let θ1 be the parameter value representing efficacy in a new treatment, while θ2 reflects either some null level representing the boundary between an efficacious and non-efficacious treatment or the efficacy parameter in a comparison group. Then, the hypotheses we are testing are H0: θ1 ≤ θ2 + δ0, H1: θ1 > θ2 + δ0, where δ0 is a fixed targeted improvement for the new treatment to achieve (which could be 0). Note that these hypotheses assume that larger values of θ1 are reflective of greater efficacy; we could simply switch the directions of the inequalities if lower values imply greater efficacy. We also set upper and lower boundaries for the posterior probability, denoted as p_s and p_f respectively, representing the probabilistic thresholds that must be met in order to terminate the trial for superiority or futility. Throughout the trial, we can decide to terminate for efficacy if the evidence is promising (P(θ1 > θ2 + δ0 | y) ≥ p_s), terminate for futility if the evidence is unpromising (P(θ1 > θ2 + δ0 | y) ≤ p_f), or continue the trial and enroll additional subjects if the evidence is inconclusive (p_f < P(θ1 > θ2 + δ0 | y) < p_s). These probabilities can be estimated and the resulting decisions made after each new subject is enrolled and observed, until the new treatment is determined to be either efficacious or futile, or until the predetermined total number of subjects has been recruited. Note that posterior probabilities could be calculated after cohorts of patients are accrued and observed, though we will not investigate that possibility here.
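The sequential decision rule above can be sketched in a few lines. This is a minimal illustration, not code from the paper; the threshold values 0.95 and 0.05 are placeholders for p_s and p_f, which Section 3 searches over a grid.

```python
def monitor(post_prob, p_s=0.95, p_f=0.05):
    """Sequential decision rule of Section 2.1.

    post_prob is P(theta_1 > theta_2 + delta_0 | y), the posterior
    probability of efficacy given the data observed so far; p_s and p_f
    are the superiority and futility thresholds (illustrative values).
    """
    if post_prob >= p_s:
        return "stop: efficacy"
    if post_prob <= p_f:
        return "stop: futility"
    return "continue"
```

The rule is applied after each enrolled subject; any posterior probability strictly between the two boundaries leaves the trial running.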

Decreasingly informative prior
A DIP is a skeptical prior that decreases in ESS as a trial progresses. To that end, it incorporates both the predetermined total sample size and the current observed sample size in such a way that the unobserved sample size N−n is made explicitly or approximately equal to the prior ESS in the prior distribution. The DIP is also parameterized in a way that centers the prior distribution at some value or values that would reflect conditions of the null hypothesis (i.e., the new therapy is not efficacious).
The basic steps for constructing a DIP are as follows: (1) determine the prior ESS for a statistical model; (2) functionalize the prior in terms of the observed sample size n and the planned sample size N (often through N−n, the unobserved sample size) so that the prior ESS is N at the beginning of the trial and 0 at the end of the trial; and (3) center the prior distribution at some value reflecting the null hypothesis, which could come from a hyperprior.
Though several approaches are available, ESS can be determined using the expected local-information-ratio approach. 7,8 For example, given binary outcomes with response rate p and a beta(a, b) prior, the mode of the prior is (a−1)/(a+b−2) and the prior ESS is a+b. If we want the mode of the prior centered at p0, the value from the null hypothesis, then we can set the prior ESS a+b=N−n and (a−1)/(a+b−2)=p0, and solve to get a=1+p0(N−n−2) and b=1+(1−p0)(N−n−2). We could slightly alter the parameterization of a and b to a=1+p0(N−n) and b=1+(1−p0)(N−n) so that the prior is non-informative when the unobserved sample size is 0 at the end of the trial. Similarly, if we want the prior mean centered at a null value p0, then we let the prior ESS a+b=N−n and a/(a+b)=p0, and derive a=p0(N−n) and b=(1−p0)(N−n); we again set a=1+p0(N−n) and b=1+(1−p0)(N−n) to ensure we have at least a non-informative prior when n=N in the trial.
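The mean-centered parameterization above is simple enough to verify numerically. The sketch below (ours, not the paper's code) computes the DIP beta parameters and shows the prior ESS shrinking from N+2 at the start of the trial to 2, i.e., the non-informative beta(1, 1), once all N subjects are observed.

```python
def dip_beta_params(p0, N, n):
    """Mean-centered DIP beta(a, b) prior from Section 2.2:
    a = 1 + p0*(N - n), b = 1 + (1 - p0)*(N - n), so the prior ESS
    a + b equals (N - n) + 2 and the prior reduces to the
    non-informative beta(1, 1) when n = N."""
    a = 1 + p0 * (N - n)
    b = 1 + (1 - p0) * (N - n)
    return a, b

# Prior ESS shrinks as the trial accrues subjects
a0, b0 = dip_beta_params(0.3, 50, 0)    # start of trial: ESS = 52
aN, bN = dip_beta_params(0.3, 50, 50)   # end of trial: beta(1, 1)
```

The prior mean (1 + p0(N−n)) / (2 + N−n) sits just above p0 because of the added non-informative baseline, approaching p0 for large N−n.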
In clinical trials, the number of accrued subjects n is small at the beginning of a trial relative to the planned sample size N, making the prior ESS large (e.g., 2+N−n) and the DIP informative. Since the prior is skeptical and parameterized to reflect the conditions stated in the null hypothesis (with mean or mode set to θ0), the resulting posterior distribution, with a low effective sample size n in the likelihood function, will be restricted from providing evidence in the form of posterior probabilities that would favor termination of the trial. As the trial progresses, and the prior ESS is "transferred" to the likelihood via the increased observed sample size, the posterior distribution becomes increasingly more sensitive to the likelihood, and terminating the trial (if evidence for concluding as such is present) becomes more likely.

Examples
Bernoulli data with a beta prior-Thall and Simon evaluate the efficacy of a new treatment based on Bernoulli outcomes, where the parameter of interest is the response rate. 1 In this case, we assume θ=p and each subject's outcome has likelihood f(yi|p) = p^yi (1−p)^(1−yi). In the one-sample case, we temporarily assume a non-informative prior distribution π(θ)=π(p) ~ beta(1, 1); we will relax this assumption in subsequent paragraphs. Letting θ=p denote the response rate of the new treatment and θ0=p0 the null response rate (which could be taken as the standard or current rate), we can derive the posterior distribution of p from the conjugate nature of the beta-binomial pairing: p|y ~ beta(1+y, 1+n−y), where y = ∑ yi is the total number of successes among the n observed subjects in the trial.
Instead of assuming a non-informative prior, we could elicit an informative prior-as suggested by Thall and Simon-by setting the prior mean equal to p0+δ0/2 and selecting a value for the concentration parameter c_e. 1 Thus, we can reparameterize the beta(a, b) prior using a=c_e(p0+δ0/2) and b=c_e[1−(p0+δ0/2)]. Thall and Simon discuss several possible values, including low values of c_e (e.g., 2) representing a sparse prior distribution, and larger values of c_e (e.g., 10) representing an informative prior distribution localized around its mean. 1 The posterior distribution of p for an informative prior is then given by p|y ~ beta(a+y, b+n−y).
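The Thall-Simon reparameterization and its conjugate update can be sketched as follows; this is our minimal rendering of the formulas above, not the authors' implementation.

```python
def thall_simon_beta_prior(p0, delta0, c_e):
    """Thall-Simon informative beta(a, b) prior with mean p0 + delta0/2
    and concentration (prior ESS) c_e."""
    m = p0 + delta0 / 2
    return c_e * m, c_e * (1 - m)

def beta_posterior(a, b, y, n):
    """Conjugate beta-binomial update: p | y ~ beta(a + y, b + n - y)."""
    return a + y, b + n - y

# e.g., p0 = 0.2, delta0 = 0.2, c_e = 10 gives a beta(3, 7) prior
a, b = thall_simon_beta_prior(0.2, 0.2, 10)
```

Note that c_e plays the same role as the prior ESS a+b, so c_e = 2 and c_e = 10 correspond to 2 and 10 "prior subjects" worth of information.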
With a binary outcome and conjugate beta prior distribution, an informative and skeptical DIP can be specified as beta(1+p0(N−n), 1+(1−p0)(N−n)), as discussed in Section 2.2.
At the beginning of the trial, n and y are small and the posterior distribution of p remains centered near the prior mode p0. As n and y grow, the accrued data become increasingly important while the prior information decreases in importance.
Poisson data with a gamma prior-If the outcome in a clinical trial is the number of events per subject, then a Poisson likelihood with a gamma prior is a plausible choice. One choice of standard Bayesian prior distribution is the Jeffreys' non-informative gamma prior (limiting case) λ ~ Gamma(0.5, 0.001), with posterior λ|y ~ Gamma(0.5+y, 0.001+n), where y = ∑ yi is the total number of events. To apply a DIP in the one-sample case with Poisson outcomes with mean event rate θ=λ and a Gamma(a, b) prior, we note that the prior mean is a/b and the prior ESS is b; centering the prior at the null mean λ0 with ESS N−n gives a=λ0(N−n) and b=N−n. In the DIP approach, as more subjects are accrued in the trial, the shape of the posterior distribution depends more on the observed data and less on the prior information.
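A sketch of the Poisson-gamma DIP, under one explicit assumption: we retain the Jeffreys baseline (0.5, 0.001) inside the DIP so the prior is non-informative at n = N, mirroring the "+1" adjustment in the beta case; the exact baseline constants are our choice, not stated in the text above.

```python
def dip_gamma_params(lam0, N, n):
    """DIP Gamma(shape, rate) prior for a Poisson rate: mean ~ lam0,
    prior ESS ~ N - n.  The (0.5, 0.001) baseline is an assumption
    mirroring the Jeffreys prior, so the prior is (nearly)
    non-informative once n = N."""
    return 0.5 + lam0 * (N - n), 0.001 + (N - n)

def gamma_posterior(shape, rate, total_events, n):
    """Conjugate Poisson-gamma update: lambda | y ~ Gamma(shape + sum(y_i), rate + n)."""
    return shape + total_events, rate + n

# Start of a hypothetical trial: null rate 2.0 events/subject, N = 30
s0, r0 = dip_gamma_params(2.0, 30, 0)   # prior mean s0/r0 is close to 2.0
```

As n grows, the rate parameter of the posterior shifts from the skeptical N−n term to the observed n, exactly the ESS "transfer" described in Section 2.2.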
Normal data with known variance-For outcomes that could be modeled with a normal distribution with variance s² known, we have the likelihood function f(y|θ) ~ N(θ, s²) with a normal prior θ ~ N(θ0, τ²). In this case, the prior ESS is s²/τ². 7 In a one-sample clinical trial, we assume θ=μ is the new treatment mean and τ²=s²/n0, where n0 is the prior ESS with null mean μ0. For the given likelihood y|μ ~ N(μ, s²) and prior μ ~ N(μ0, s²/n0), the posterior distribution of μ can be written as

μ | s², y ~ N( (n0/(n0+n)) μ0 + (n/(n0+n)) ȳ, s²/(n0+n) ).

The prior parameter n0 determines the level of information contained in the prior and the contribution of the null mean. If n0 is small and s²/n0 is large, the prior distribution is dispersed and less informative; when n0 is larger, the prior distribution is more tightly centered around the null mean and more informative.
For the DIP model, we set a skeptical prior (centered at μ0) as initially informative with n0=N−n. This formulation allows the information contained within the posterior distribution of μ to shift from the skeptical prior at the beginning of the trial to the likelihood function as subjects are accrued. The DIP posterior distribution of μ is

μ | s², y ~ N( ((N−n)/N) μ0 + (n/N) ȳ, s²/N ).

Because the posterior mean in the Bayesian model is a weighted average of the prior mean μ0 and the sample mean ȳ, this DIP formulation causes the posterior mean to approximate the prior mean μ0 early in a trial and to increasingly approximate ȳ as subjects are accrued.
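The DIP posterior for the normal case is available in closed form, so the monitoring probability needs no simulation. A minimal sketch with illustrative numbers (μ0 = 100, lower is better, halfway through a hypothetical N = 40 trial):

```python
from statistics import NormalDist

def dip_normal_posterior(mu0, ybar, n, N, s):
    """DIP posterior for a normal mean with known variance s**2:
    N(((N - n)/N)*mu0 + (n/N)*ybar, s**2 / N)."""
    mean = (N - n) / N * mu0 + (n / N) * ybar
    return NormalDist(mean, s / N ** 0.5)

# n = 20 of N = 40 subjects observed, sample mean 92, s = 15
post = dip_normal_posterior(mu0=100, ybar=92, n=20, N=40, s=15)
prob_better = post.cdf(100)   # P(mu < mu0 | y), since lower is better
```

Note the posterior variance is s²/N throughout the trial: the total effective sample size is fixed at N, with the split between prior and likelihood shifting as n grows.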

Simulation templates
The goals of our simulation studies are to identify the smallest possible sample size N among admissible designs and to compare the DIP approach with other Bayesian approaches. We define an admissible design as having at least 80% power and no greater than 5% type I error rate. If there are no admissible designs based on power, we default to selecting the combination of parameters that yields the highest power and best-controlled type I error. If there are no admissible designs based on type I error, we default to selecting the combination of parameters that yields the lowest type I error and at least 80% power. We explore Bernoulli-, Poisson-, and normal-distributed outcomes.
In these simulations, the observed outcome yi for each subject in each trial is randomly simulated from the probability density or mass function f(yi|θ), where θ is based on the population-level values assumed for that trial. Each subsequent subject is recruited until the trial is stopped (for futility or efficacy) or the planned sample size N is reached. In all one-sample cases, the upper (efficacy) and lower (futility) decision boundaries are searched over p_s ∈ (0.80, 0.99) and p_f ∈ (0.01, 0.10), respectively, and the total sample size over N ∈ (10, 100).
For simplicity, we assume the target threshold δ0 equals 0. We simulate the observed data and estimate the power and type I error for each combination of p_s, p_f, and the planned total sample size N. We then select the smallest total sample size among the admissible combinations. Type I error is measured as the proportion of trials in which the null hypothesis is rejected under the null hypothesis (e.g., θ1=θ0 for the one-sample case), while power is measured as the proportion of trials in which the null hypothesis is rejected under the alternative hypothesis (e.g., θ1>θ0). Each parameter setting is repeated in 1000 simulated trials. All simulations are coded using R 1.4.1717. 9 The random samples are generated with the same seed.
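The simulation loop described above can be sketched for the Bernoulli DIP case. This is an illustrative Python rendering, not the paper's R code: the thresholds, N, replication counts, and Monte Carlo posterior-probability estimate are placeholders, and we assume a trial reaching N without crossing a boundary is counted as not rejecting H0.

```python
import random

def simulate_trial(p_true, p0, N, p_s, p_f, rng, draws=500):
    """Simulate one sequentially monitored one-sample Bernoulli trial
    under the DIP; returns True if the trial concludes efficacy."""
    y = 0
    for n in range(1, N + 1):
        y += rng.random() < p_true
        # DIP posterior: beta(1 + p0*(N-n) + y, 1 + (1-p0)*(N-n) + n - y)
        a = 1 + p0 * (N - n) + y
        b = 1 + (1 - p0) * (N - n) + (n - y)
        # Monte Carlo estimate of the monitoring probability P(p > p0 | y)
        prob = sum(rng.betavariate(a, b) > p0 for _ in range(draws)) / draws
        if prob >= p_s:
            return True      # stop early for efficacy
        if prob <= p_f:
            return False     # stop early for futility
    return False             # inconclusive at N: H0 not rejected (assumption)

rng = random.Random(2024)
# Empirical type I error (p_true = p0) and power (p_true > p0),
# with small replication counts for illustration only
t1 = sum(simulate_trial(0.3, 0.3, 30, 0.95, 0.05, rng) for _ in range(100)) / 100
power = sum(simulate_trial(0.5, 0.3, 30, 0.95, 0.05, rng) for _ in range(100)) / 100
```

The full study repeats this over a grid of (p_s, p_f, N) with 1000 trials per setting and picks the smallest admissible N.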
For the Bernoulli outcome where θ=p, we assume the treatment group with the higher response rate is more efficacious, and consider several models: a non-informative beta(1, 1) prior, informative beta(a, b) priors with several choices of prior information a+b=2, 6, or 10, and a DIP skeptical prior, as illustrated in Section 2.3.1.
In the one-sample case, we consider null response rates of p0=0.1, 0.3, 0.5, or 0.7, with the actual response rate for the new treatment set at p1=p0+δ, where δ ∈ {0, 0.05, 0.10, 0.15, 0.2}, while the target improvement is set at δ0=0 for simplicity. The outcome for each subject is randomly generated from Bernoulli(p1).
For the normal cases where θ=μ with known variance, we consider Bayesian models with n0=2, 6, or 10 for Equation 2, as well as a DIP case. We study low-variance and high-variance cases reflecting our assumptions about the known variability. For each template, we set the null mean as μ0=100 and expect lower values to imply improved efficacy; thus, the new treatment mean is defined as μ1=μ0−δ, where we set δ=0, 5, or 10 and set δ0=0 for simplicity; we consider low variability with s=15 and high variability with s=30. Each subject's outcome is randomly generated from N(μ1, s²).

RESULTS
Table 1 shows the simulation results for one-sample Bernoulli cases with a low response rate (p0=0.1). Compared with the different standard Bayesian approaches, the DIP approach always has better-controlled type I error, with a comparable or lower sample size. For some cases in which the standard Bayesian approach cannot achieve an admissible design on type I error, such as the case p1=0.2, the DIP approach not only controlled type I error but also had the smallest planned sample size. Results are similar for the other cases (p0=0.3, 0.5, and 0.7) (see Tables S.1).

The simulation results for the one-sample Poisson cases are shown in Table 2. Compared to the non-informative Bayesian approach, the DIP approach performs better in controlling type I error. Additionally, when the effect size is large, the DIP approach has a lower sample size and a better-controlled type I error rate.

Table 3 shows the simulation results for one-sample normal cases. In both low and high variability settings, compared with the different standard Bayesian approaches, the DIP approach has a lower or comparable sample size when type I error is controlled (0.05). When the admissible type I error rate cannot be achieved, the DIP approach has better-controlled type I error and a comparable sample size.

DISCUSSION
In summary, we introduced the rationale of the DIP approach, applied the DIP to three formulations of early termination phase II trial designs (Poisson, Bernoulli, and Normal), and compared the performance to the Bayesian approaches of Thall and Simon using simulation studies. 1,2 The results show that, for the three distributions and across all one-sample settings, compared to the traditional Bayesian approaches of Thall and Simon, the DIP approach requires fewer patients when admissible designs are achieved. 1,2 In the designs where type I error or power are not admissible, the DIP approach yields similar power and better-controlled type I error with comparable or fewer patients than Thall and Simon's Bayesian approaches. 1,2 We also extend the one-sample case to two-sample cases, and the results are presented in the supplemental material. For the two-sample cases, we conclude that the DIP approach performed better than Thall and Simon's Bayesian approaches for moderate to large response rates, but performed poorly with low response rates and low effect sizes. 1,2 It should be noted that the focus of this study is on identifying the smallest sample size to achieve an admissible design, defined by the commonly used thresholds of at least 80% power and at most 5% type I error. Changing the minimum power and maximum type I error rate might change our findings and conclusions, though these values are conventional. We also ignored admissible designs with a larger sample size, forfeiting designs with possibly higher power or lower type I error. While our choices for the predetermined sample size (N) are limited to the admissible designs having at least 80% power and no greater than 5% type I error, the parameter settings in the simulations are broadly and comprehensively considered to reflect realistic scenarios.
For each parameter set, we also investigated a non-informative model and three informative models (κ0=2, 6, and 10) in comparison with the DIP model.
While we elicited the DIP in a way that is not based on any historical or optimistic prior, researchers can still explore other subjective priors at the end of the trial to determine the robustness of their findings. We also motivated the DIP approach using conjugate examples: Poisson-gamma, beta-binomial, and normal-normal models. We can extend this to other prior-likelihood combinations, particularly those that lead to non-conjugate or intractable posterior distributions, using MCMC approaches. The key to the DIP approach with a non-conjugate prior is to parameterize the prior so that its effective sample size equals N−n, which may require numerical or simulation-based determination. 7,8 In future work, we plan to extend the single-parameter DIP model to cases with two or more parameters.

Supplementary Material
Refer to Web version on PubMed Central for supplementary material.

Funding:
The study was partially supported by the Biostatistics, Epidemiology and Research Design (BERD) core of the C.

Table 2. Simulation results for Poisson cases.
Table 3. Simulation results for normal cases with known variance. Type I error is calculated under the null hypothesis μ1=μ0.