Reporting on Statistical Methods To Adjust for Confounding:
A Cross-Sectional Survey
Marcus Müllner, MD; Hugh Matthews, BSc, MBBS; and Douglas G. Altman, DSc

Background: The use of complex statistical models to adjust for confounding is common in medical research.

Objective: To determine the frequency and adequacy of adjustment for confounding in medical articles.

Design: Cross-sectional survey.

Setting: 34 scientific medical journals with a high impact factor.

Measurements: Frequency of reporting on methods used to adjust for confounding in 537 original research articles published in January 1998.

Results: Of the 537 articles, 169 specified that adjustment for confounding was used. In 1 paper in 10, it was unclear which statistical method was used or for which variables adjustment was made. In 45% of papers, it was not clear how multicategory or continuous variables were treated in the analysis. Inadequate reporting was less frequent if an author was affiliated with a department of statistics, epidemiology, or public health and if articles were published in journals with a high impact factor.

Conclusions: Details of methods used to adjust for confounding are frequently not reported in original research articles.

Ann Intern Med. 2002;136:122-126.
Brief Communication
15 January 2002 Annals of Internal Medicine Volume 136 • Number 2
For author affiliations, current addresses, and contributions, see end of text.

Discovering the determinants of disease is often not straightforward, particularly if the disease or the risk factor is rare or not easily recognized (1). In case–control, cohort, and other nonrandomized studies, the groups being compared are likely to vary with respect to several demographic, clinical, and other characteristics. In randomized trials, authors often try to demonstrate that the observed treatment effect is not explained by some difference in baseline characteristics (2). By definition, confounding is present when there are alternative explanations for an observed association between a risk factor and a health outcome. This may occur when one or more of these demographic or clinical characteristics are associated with one another and with the outcome of interest (3). Social class, for example, is known to be associated with cardiovascular risk factors, such as smoking, serum cholesterol level, and leisure physical activity, as well as with mortality. This difference in demographic and clinical characteristics may account for about half of the excess coronary and all-cause mortality in blue-collar workers compared with white-collar workers (4).

If confounding cannot be avoided at the design stage of a study, disentangling the web of causation is often difficult, and statistical methods of varying complexity are needed. Readers of published scientific articles need to know whether and how the authors appropriately adjusted for confounding (5).

In this cross-sectional study, we sought to determine how frequently adjustment is reported in medical scientific articles and whether reporting is sufficiently detailed.

METHODS

Journal Selection
Scientific medical journals were included if they were published in English, were available in the British Medical Association's library, and had an impact factor (6) that placed them in the highest 20% of journals within their medical specialty. We excluded review journals and journals specializing in statistics, epidemiology, and […]

Article Selection
Two of the authors independently assessed all January 1998 issues of the selected journals to identify full-length original research articles. Short reports, scientific letters, case reports, review articles, and animal studies […]

Data Collection
We assessed whether adjustment for baseline variables or confounding was performed and whether the paper specified which method was used, the variables for which adjustment was made, and how these variables were handled in the analysis (that is, whether it was specified how continuous and multicategory variables were entered into the statistical model). Inappropriate reporting was defined as not reporting or insufficiently reporting on one or more of these points. For simplicity, we assumed that all multiple regression analyses were done to adjust for confounders, even though in some cases they were done to identify prognostic variables.

Each paper was independently assessed by two of the authors. Agreement between the two authors was good to very good (κ = 0.61 to 0.96). Disagreement was usually caused by oversight rather than differing opinions. In case of disagreement or uncertainty, the third author (a senior statistician) was consulted.

Statistical Analysis
The univariate association between inappropriate reporting and several variables (for example, impact factor [quartiles]; at least one of the authors being affiliated with a department of statistics, epidemiology, or public health; and number of authors [1 or 2, 3 to 5, 6 to 10, or >10]) was assessed by using chi-square tests or chi-square tests for trend. We used multiple logistic regression to investigate simultaneously variables that showed an association with inappropriate reporting. Data were processed by using Excel 97 software (Microsoft Corp., Redmond, Washington) and Stata, release 6 (Stata Corp., College Station, Texas).

RESULTS

Thirty-four journals met our inclusion criteria (Appendix). The median impact factor of these journals in 1996 was 4.26 (interquartile range, 2.64 to 5.74). We identified 537 articles that fulfilled our inclusion criteria (Figure), of which 169 (32%) reported adjustment for confounding or baseline differences.

Figure. Articles assessed for reporting of adjustment for confounding or baseline differences and whether adjustment was reported in the methods or results section.

Reporting of Methods
Of the 169 articles, 152 (90%) appropriately reported methods to adjust for confounding (Figure), 7 reported no method, and 10 mentioned but did not
adequately specify a method. In these 10 articles, the authors used the phrases “multiple regression” or “mul[…]

Of articles that specified the method, multiple logistic regression analysis was the most frequently used (n = 76), followed by multiple Cox proportional hazards models (n = 43); multiple linear regression (n = 33); and other methods (n = 31), including stratified analysis, partial correlation, direct and indirect standardization, mixed-effect modeling, and age-adjusted z-score. Some papers included more than one method of adjusted analysis, but each paper was counted only once in […]

Reporting of Variables
Of the 169 articles that reported adjustment for confounding, 154 (91%) clearly specified variables for which adjustment was made, and 93 (55%) clearly stated how variables were handled (Figure). Only 93 articles (55%) met criteria for appropriate reporting: 51 articles had one inadequacy, 17 articles had two inadequacies, and 8 articles had three inadequacies.

Table. Association between Inadequate Reporting of Adjustment for Confounders and Study Characteristics in 169 Original Research Articles
[Table body not recoverable. Column headings: Studies with Inadequate Reporting, n/n (%); Relative Risk (95% CI). Row shown: At least one author affiliated with department of statistics, epidemiology, or public health.]
* Chi-square test for comparing two proportions or test for trend.

Examples of Inappropriate Reporting
A randomized, controlled trial investigated whether surfactant administered to preterm infants reduces the incidence of severe complications. The authors provided great detail to demonstrate that treatment and control groups were comparable at baseline. In a table, the authors report “baseline adjusted incidence of pulmonary complications.” However, the reader is not told why adjustment was needed to present a main end point, how this adjustment was performed, or which of the 14 or more baseline variables were included (7).

A randomized, controlled trial compared two antibiotics for the treatment of gonorrhea. The authors stated that “After correction for baseline abnormalities there was no significant difference in laboratory abnormalities.” However, they did not indicate which baseline abnormalities were meant and how this correction was […]

A randomized, controlled trial compared balsalazide with mesalamine in patients with acute ulcerative colitis. The authors stated that “Logistic regression techniques were used to identify prognostic factors significantly associated with remission.” The reader is told which variables were finally significantly associated with the outcome but not which variables were entered into the […]

Breathing patterns and respiratory muscle performance measures during weaning were examined in a
case series of 17 patients receiving prolonged mechanical ventilation. In the methods section, the authors state that “adjusted means were calculated for the variables presenting a group effect,” but the reader is not told how or where this calculation was performed (10).

Determinants of Inappropriate Reporting
In 64 of the 169 articles (38%), at least one author was a methodologist (that is, he or she was affiliated with a department of statistics, epidemiology, or public health). Among papers with a methodologist-author, the rate of any inappropriate reporting was about half that of papers without a methodologist-author (Table). The rate of inappropriate reporting tended to decrease as the journal impact factor increased, but this effect was largely due to a lower rate of inappropriate reporting in the journals in the highest quartile of impact factor. The number of authors was not associated with inadequate […]

A multiple logistic regression model with any inappropriate reporting as the dependent variable and methodologist-author and impact factor (quartiles) as predictor variables showed that these two effects were largely independent (results not shown). Among the journals in the lower three quartiles of impact factor, 12 of 42 (29%) articles with a methodologist-author and 56 of 90 (62%) without a methodologist-author had inappropriate reporting. In contrast, among the journals in the top quartile of impact factor, the rate of inappropriate reporting was similar among articles with and without a methodologist-author (3 of 15 [20%] articles vs. 5 of 22 [23%] articles, respectively). However, because the number of papers was small and this split by impact factor was not prespecified, P values are not presented.

DISCUSSION

Statistical methods are often misused, and presenting them poorly leaves the reader unable to critically interpret the findings of an original research study (11, 12). Some studies use a selection procedure to reduce the number of variables for which adjustment is needed to those that are statistically significant. In such cases, authors should report all variables considered in addition to those for which adjustment was actually made.

Not reporting whether variables are treated as continuous or as categorical data matters, because this choice may make a difference in the interpretation of the results. For continuous variables, transformation to fulfill the assumption of a normal distribution may or may not have been performed (13). Likewise, for categorical variables, the definition and number of categories are needed. Assuming a linear association when it is nonlinear may mask an association (14), having too few and broad categories may lead to considerable residual confounding, and having too many categories may reduce […]

The reasons for these shortcomings may be manifold. It is possible that all the necessary information was present before peer review and was omitted during the publication process. However, errors and omissions are more likely the consequence of a system failure at many levels (17), including that of the authors, reviewers, stat[…]

Although data analysis may be correct despite inappropriate reporting, such reporting leaves readers unable to assess whether the data were processed appropriately. Having a methodologist as an author seems to have a “protective” effect, which is in accordance with the findings of an earlier study (15). Why articles published in journals with a very high impact factor have a lower rate of inappropriate reporting remains a matter of speculation. It may relate to the fact that these journals use statistical reviewers more frequently than do lower-rank[…]

We suggest that readers, authors, referees, and editors try to assess whether original articles state which statistical method was used to adjust for confounders, for which variables adjustment was performed, and the way in which the variables were handled in the analysis.

APPENDIX

The following journals were included in the study: American Journal of Cardiology, American Journal of Medicine, American Journal of Obstetrics and Gynecology, Anaesthesia, Anesthesiology, Annals of Internal Medicine, Archives of Dermatology, Archives of Internal Medicine, BMJ, Blood, Brain, British Journal of Anaesthesia, British Journal of Cancer, British Journal of Dermatology, British Journal of Obstetrics and Gynaecology, British Journal of Surgery, Circulation, Critical Care Medicine, Gastroenterology, Gut, JAMA, Journal of the American College of Cardiology, Journal of Gerontology, Journal of the National Cancer Institute, Journal of Pediatrics, Journal of Rheumatology, Kidney International, The Lancet, New England Journal of Medicine, Neurology, Pediatrics, Thorax, Thrombosis and Haemostasis, and Transplantation.
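The Methods rely on a small set of standard statistics: κ for agreement between the two assessors, chi-square tests comparing two proportions or testing for a trend across ordered groups, and relative risks (Table). A minimal Python sketch of that arithmetic follows. It is not the authors' analysis, which was run in Stata, release 6. The function names are hypothetical, and the specific variants are assumptions: Pearson chi-square without continuity correction, the Cochran-Armitage form of the trend test, a log-scale Wald interval for the risk ratio, and κ computed from a square agreement table.

```python
# Minimal sketch of the simple statistics named in the Methods.
# Assumptions (not from the paper): all function names, Pearson chi-square
# without continuity correction, the Cochran-Armitage trend test, and a
# log-scale Wald interval for the risk ratio. The authors used Stata, release 6.
import math


def cohen_kappa(table):
    """Cohen's kappa; table[i][j] counts papers rated i by assessor 1 and j by assessor 2."""
    k = len(table)
    n = sum(sum(row) for row in table)
    observed = sum(table[i][i] for i in range(k)) / n
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    # Agreement expected by chance from the marginal totals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2
    return (observed - expected) / (1 - expected)


def chi2_1df_pvalue(stat):
    """Upper-tail P value for a chi-square statistic with 1 degree of freedom."""
    # P(chi2_1 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2))
    return math.erfc(math.sqrt(stat / 2))


def chi_square_two_proportions(a, n1, b, n0):
    """Pearson chi-square comparing a/n1 with b/n0 (2 x 2 table, 1 df)."""
    c, d = n1 - a, n0 - b  # non-events in each group
    n = n1 + n0
    stat = n * (a * d - b * c) ** 2 / (n1 * n0 * (a + b) * (c + d))
    return stat, chi2_1df_pvalue(stat)


def chi_square_trend(events, totals, scores):
    """Cochran-Armitage chi-square for a linear trend in proportions across ordered groups."""
    big_n = sum(totals)
    p_bar = sum(events) / big_n
    t = sum(s * (r - m * p_bar) for r, m, s in zip(events, totals, scores))
    sum_ms = sum(m * s for m, s in zip(totals, scores))
    sum_ms2 = sum(m * s * s for m, s in zip(totals, scores))
    var_t = p_bar * (1 - p_bar) * (sum_ms2 - sum_ms ** 2 / big_n)
    stat = t * t / var_t
    return stat, chi2_1df_pvalue(stat)


def risk_ratio(a, n1, b, n0, z=1.96):
    """Risk ratio (a/n1)/(b/n0) with a Wald confidence interval on the log scale."""
    rr = (a / n1) / (b / n0)
    se_log = math.sqrt(1 / a - 1 / n1 + 1 / b - 1 / n0)
    return rr, math.exp(math.log(rr) - z * se_log), math.exp(math.log(rr) + z * se_log)


if __name__ == "__main__":
    # Lower three impact-factor quartiles (Results): inappropriate reporting in
    # 12 of 42 articles with a methodologist-author vs. 56 of 90 without.
    rr, _, _ = risk_ratio(12, 42, 56, 90)
    print(f"risk ratio: {rr:.2f}")
```

Run as a script, the sketch prints a risk ratio of about 0.46 for the lower three impact-factor quartiles, in line with the roughly halved rate of inappropriate reporting described for papers with a methodologist-author. Because the authors chose not to present P values for that post hoc split, the usage example stops at the point estimate. With two groups scored 0 and 1, the trend test gives the same statistic as the two-proportion chi-square, which is one quick way to check the implementation.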
From BMJ, London, and ICRF Medical Statistics Group, Oxford, […]

Acknowledgments: The authors thank the BMJ staff, particularly Richard Smith, for providing the environment that enabled this research […]

Requests for Single Reprints: Marcus Müllner, MD, Universitätsklinik für Notfallmedizin, Allgemeines Krankenhaus Wien, Währinger Gürtel 18-20/6D, A-1090 Vienna, Austria; e-mail, [email protected]

Current Author Addresses: Dr. Müllner: Universitätsklinik für Notfallmedizin, Allgemeines Krankenhaus Wien, Währinger Gürtel 18-20/6D, A-1090 Vienna, Austria.
Mr. Matthews: Sandbanks, Graveney, Faversham, Kent ME13 9DJ, […]
Dr. Altman: ICRF Medical Statistics Group, Centre for Statistics in Medicine, Institute of Health Sciences, Old Road, Headington, Oxford […]

Author Contributions: Conception and design: M. Müllner, H. Matthews […]
Analysis and interpretation of the data: M. Müllner, D.G. Altman.
Drafting of the article: M. Müllner, H. Matthews, D.G. Altman.
Critical revision of the article for important intellectual content: M. Müllner, H. Matthews, D.G. Altman.
Final approval of the article: M. Müllner, H. Matthews, D.G. Altman.
Statistical expertise: M. Müllner, D.G. Altman.
Administrative, technical, or logistic support: M. Müllner.
Collection and assembly of data: M. Müllner, H. Matthews.

References
1. Hill AB. The environment and disease: association or causation? Journal of the Royal Society of Medicine. 1965;58:295-300.
2. Altman DG. Adjustment for covariate imbalance. In: Armitage P, Colton T, eds. Encyclopaedia of Biostatistics. New York: Wiley; 1998:1000-5.
3. Hennekens CH, Buring JE, Mayrent SH. Epidemiology in Medicine. Boston: […]
4. Pekkanen J, Tuomilehto J, Uutela A, Vartiainen E, Nissinen A. Social class, health behaviour, and mortality among men and women in eastern Finland. […]
5. Uniform requirements for manuscripts submitted to biomedical journals. International Committee of Medical Journal Editors. Ann Intern Med. 1997;126:34-47. [PMID: 8992922]
6. Garfield E. How can impact factors be improved? BMJ. 1996;313:411-3.
7. Lotze A, Mitchell BR, Bulas DI, Zola EM, Shalwitz RA, Gunkel JH. Multicenter study of surfactant (beractant) use in the treatment of term infants with severe respiratory failure. Survanta in Term Infants Study Group. J Pediatr. 1998; […]
8. Jones RB, Schwebke J, Thorpe EM Jr, Dalu ZA, Leone P, Johnson RB. Randomized trial of trovafloxacin and ofloxacin for single-dose therapy of gonorrhea. Trovafloxacin Gonorrhea Study Group. Am J Med. 1998;104:28-32.
9. Green JR, Lobo AJ, Holdsworth CD, Leicester RJ, Gibson JA, Kerr GD, et al. Balsalazide is more effective and better tolerated than mesalamine in the treatment of acute ulcerative colitis. The Abacus Investigator Group. Gastroenterology. […]
10. Capdevila X, Perrigault PF, Ramonatxo M, Roustan JP, Peray P, d'Athis F, et al. Changes in breathing pattern and respiratory muscle performance parameters during difficult weaning. Crit Care Med. 1998;26:79-87. [PMID: 9428547]
11. Bender R, Grouven U. Logistic regression models used in medical research are poorly presented [Letter]. BMJ. 1996;313:628. [PMID: 8806274]
12. Khan KS, Chien PF, Dwarakanath LS. Logistic regression models in obstetrics and gynecology literature. Obstet Gynecol. 1999;93:1014-20. [PMID: …]
13. Bland JM, Altman DG. Transforming data. BMJ. 1996;312:770. [PMID: …]
14. Katz MH. Multivariable Analysis: A Practical Guide for Clinicians. New […]
15. Altman DG, De Stavola BL, Love SB, Stepniewska KA. Review of survival analyses published in cancer journals. Br J Cancer. 1995;72:511-8. [PMID: …]
16. Brenner H. A potential pitfall in control of covariates in epidemiologic studies. Epidemiology. 1998;9:68-71. [PMID: 9430271]
17. Reason J. Human error: models and management. BMJ. 2000;320:768-70. [PMID: 10720363]
18. Godlee F, Gale CR, Martyn CN. Effect on the quality of peer review of blinding reviewers and asking them to sign their reports: a randomized controlled trial. JAMA. 1998;280:237-40. [PMID: 9676667]
19. Goodman SN, Altman DG, George SL. Statistical reviewing policies of medical journals: caveat lector? J Gen Intern Med. 1998;13:753-6. [PMID: …]

2002 American College of Physicians–American Society of Internal Medicine.


