Testing for baseline differences in clinical trials
Reporting statistical tests for baseline measures in clinical trials is not informative, because statistical significance depends on sample size: a large trial can find a difference statistically significant that a small trial, observing the same difference, does not. We use three published trials that report the same baseline measures to illustrate the relationship between trial sample size and p value. In trial 1, the sequential organ failure assessment (SOFA) score was 10.4±3.4 vs. 9.6±3.2 (difference=0.8, p=0.01), and vasopressor use was 83.0% vs. 72.6% (p=0.007). In trial 2, the SOFA score was 11±3 vs. 12±3 (difference=1, p=0.42). In trial 3, vasopressor use was 73% vs. 83% (p=0.21). Taking the trial 2 values (SOFA score with a mean of 12 and an SD of 3 in the supine group, and a mean of 11 and an SD of 3 in the prone group) and varying only the sample size, the p values are 0.29850, 0.09877, 0.01940, 0.00094, 0.00005, and <0.00001 when n (per arm) is 20, 50, 100, 200, 300, and 400, respectively. Taking the trial 3 values (vasopressor use of 73.0% in the supine group vs. 83.0% in the prone group), the p values are 0.4452, 0.2274, 0.0878, 0.0158, 0.0031, and 0.0006 when n (per arm) is 20, 50, 100, 200, 300, and 400, respectively. For the same baseline difference, small trials therefore yield larger p values than large trials. Imbalance in baseline measures cannot be defined on the basis of these p values alone. There is no statistical basis for advocating tests of baseline differences.
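The dependence of the p value on n can be checked with a short calculation. The sketch below is illustrative only: the abstract does not state which tests were used, so it assumes a pooled two-sample t test for the SOFA score and a pooled two-proportion z test (equivalent to a Pearson chi-square test without continuity correction) for vasopressor use, with equal arm sizes. Under those assumptions the results appear consistent with the p values listed above. The function names are hypothetical, and the t distribution tail is approximated by numerical integration so that only the Python standard library is needed.

```python
import math


def t_two_sided_p(t, df, steps=2000):
    """Two-sided p value for Student's t, using Simpson's rule on the density.

    steps must be even; the integral runs from 0 to |t|.
    """
    # Log of the normalising constant of the t density.
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    upper = abs(t)
    h = upper / steps
    total = pdf(0.0) + pdf(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3  # probability mass between 0 and |t|
    return max(0.0, 1.0 - 2.0 * area)


def sofa_p(n, mean_diff=1.0, sd=3.0):
    """Assumed test: pooled two-sample t test, same SD and n in each arm."""
    t = mean_diff / (sd * math.sqrt(2.0 / n))
    return t_two_sided_p(t, df=2 * n - 2)


def vasopressor_p(n, p1=0.73, p2=0.83):
    """Assumed test: pooled two-proportion z test, n per arm, two-sided."""
    pooled = (p1 + p2) / 2  # equal arm sizes
    se = math.sqrt(pooled * (1 - pooled) * 2.0 / n)
    z = abs(p1 - p2) / se
    return math.erfc(z / math.sqrt(2))  # two-sided normal tail


# Same baseline difference, increasing sample size per arm.
for n in (20, 50, 100, 200, 300, 400):
    print(f"n={n:3d}  SOFA p={sofa_p(n):.5f}  "
          f"vasopressors p={vasopressor_p(n):.4f}")
```

The mean difference, SD, and percentages are held fixed in every call; only n changes, which is why the p values fall steadily as the trial grows.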
Moher D, Schulz S, Gotzsche P, Egger M. CONSORT 2010 explanation and elaboration: Updated guideline for reporting parallel group randomized trials. BMJ. 2010;340:c869.
Senn S. Testing for baseline balance in clinical trials. Stat Med. 1994;13(17):1715-26.
Altman D, Doré C. Randomisation and baseline comparisons in clinical trials. Lancet. 1990;335(8682):149-53.
Assmann SF, Pocock SJ, Enos LE, Kasten LE. Subgroup analysis and other (mis)uses of baseline data in clinical trials. Lancet. 2000;355(9209):1064-9.
Wright N, Ivers N, Eldridge S, Taljaard M, Bremner S. A review of the use of covariates in cluster randomized trials uncovers marked discrepancies between guidance and practice. J Clin Epidemiol. 2015;68(6):603-9.
Schulz KF. Assessing the quality of randomization from reports of controlled trials published in obstetrics and gynecology journals. JAMA. 1994;272(2):125-8.
Knol M, Groenwold R, Grobbee D. P-values in baseline tables of randomised controlled trials are inappropriate but still common in high impact journals. Eur J Prev Cardiol. 2012;19(2):231-2.
Peterson RL, Tran M, Koffel J, Stovitz SD. Statistical testing of baseline differences in sports medicine RCTs: a systematic evaluation. BMJ Open Sport Exerc Med. 2017;3(1):e000228.
de Boer MR, Waterlander WE, Kuijper LD, Steenhuis IH, Twisk JW. Testing for baseline differences in randomized controlled trials: an unhealthy research behavior that is hard to eradicate. Int J Behav Nutr Phys Act. 2015;12:4.
Zhao W, Berger V. Imbalance control in clinical trial subject randomization—from philosophy to strategy. J Clin Epidemiol. 2018;101:116-8.
Altman DG. Comparability of Randomised Groups. Statistician. 1985;34(1):125.
Pocock SJ, Assmann SE, Enos LE, Kasten LE. Subgroup analysis, covariate adjustment and baseline comparisons in clinical trial reporting: current practice and problems. Stat Med. 2002;21(19):2917-30.
Mutz DC, Pemantle R, Pham P. The perils of balance testing in experimental design: Messy analyses of clean data. Am Stat. 2018;73(1):32-42.
Sedgwick P. Randomised controlled trials: balance in baseline characteristics. BMJ. 2014;349:g5721.
Roberts C, Torgerson DJ. Understanding controlled trials: Baseline imbalance in randomised controlled trials. BMJ. 1999;319:185.
Voggenreiter G, Aufmkolk M, Stiletto RJ, Baacke MG, Waydhas C, Ose C, et al. Prone positioning improves oxygenation in post-traumatic lung injury - A prospective randomized trial. J Trauma. 2005;59(2):333-43.
Mancebo J, Fernández R, Blanch L, Rialp G, Gordo F, Ferrer M, et al. A multicenter trial of prolonged prone ventilation in severe acute respiratory distress syndrome. Am J Respir Crit Care Med. 2006;173(11):1233-9.
Guerin C, Reignier J, Richard J, Beuret P, Gacouin A, Boulain T, et al. Prone positioning in severe acute respiratory distress syndrome. N Engl J Med. 2013;368(23):2159-68.
Wang W, Ma Y, Huang Y, Chen H. Generalizability analysis for clinical trials: a simulation study. Stat Med. 2017;36:1523-31.