Abstract
Background: The interpretation of the results of randomized controlled trials (RCTs) has traditionally emphasized statistical significance rather than clinical importance. Our aim was to assess the quality of reporting of factors related to clinical importance in a sample of published RCTs.
Methods: A random sample of 27 (of a total of 266) RCTs published in 5 major medical journals over a 1-year period was reviewed by 4 independent reviewers for factors considered important in interpreting the clinical importance of study results: identification of a clearly defined primary outcome, reporting of the expected difference between groups used in the calculation of sample size (the delta value) and whether it was based on the minimal clinically important difference of the intervention, the statistical significance of the results, presentation of pertinent confidence intervals, and the authors' interpretation of the clinical importance of the results.
Results: Twenty-two of 27 (81%) articles explicitly reported a single primary outcome. Of the 20 articles that included a sample size calculation, 18 (90%) reported a delta value. Two of the 18 (11%) articles explicitly stated that the delta value was chosen to reflect the minimal clinically important difference of the intervention. For the primary outcomes, confidence intervals surrounding the point estimates of the efficacy of the interventions were reported in 11 of 27 (41%) studies. The study results were interpreted from the perspective of clinical importance in 20 of 27 (74%) articles. Of these 20 reports, 5 (25%) provided justification for their clinical interpretation of the results.
Interpretation: Authors of RCTs published in major general medical and internal medicine journals do not consistently provide their own interpretation of the clinical importance of their results, and they often do not provide sufficient information to allow readers to make their own interpretation.
The interpretation of the results of randomized controlled trials (RCTs) has emphasized statistical significance rather than clinical importance. For example, the revised CONSORT statement,1 a widely adopted and recently updated series of recommendations designed to improve the quality of reporting of RCTs, did not specifically recommend that authors discuss the clinical importance of their results. This lack of emphasis on clinical importance has led to frequent misconceptions and disagreement about the interpretation of clinical trial results, and to a tendency to equate statistical significance with clinical importance. In some instances, statistically significant results may not be clinically important and, conversely, statistically nonsignificant results do not completely rule out the possibility of clinically important effects.2,3
The minimal clinically important difference (MCID) of a therapy is defined as the smallest treatment effect that would result in a change in patient management, given its side effects, costs and inconveniences.4 It is a key concept in the design of clinical trials. Sample size calculations for prospective trials require determination of the magnitude of difference in outcomes between treatment groups that the study can reliably detect (often called the delta value). In order for clinical trials to have the best chance of detecting clinically important effect sizes, their delta values should reflect the MCIDs of the study interventions.
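The dependence of sample size on the delta value can be sketched with a short calculation. The example below is an illustration only: it assumes a binary outcome, equal group sizes and the standard normal-approximation formula for comparing two proportions, which is not necessarily the method used in any of the trials discussed here. The function name, the 20% control event rate and the two delta values are hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_per_group(control_rate, delta, alpha=0.05, power=0.80):
    """Approximate number of patients per group needed to detect an
    absolute risk reduction `delta` from `control_rate`
    (two-sided test, normal approximation, equal group sizes)."""
    if not 0 < delta < control_rate < 1:
        raise ValueError("require 0 < delta < control_rate < 1")
    treated_rate = control_rate - delta
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the two-sided test
    z_beta = NormalDist().inv_cdf(power)           # quantile corresponding to the desired power
    variance = (control_rate * (1 - control_rate)
                + treated_rate * (1 - treated_rate))
    return math.ceil((z_alpha + z_beta) ** 2 * variance / delta ** 2)


# Hypothetical scenario: 20% event rate in the control group.
for delta in (0.04, 0.02):
    n = sample_size_per_group(control_rate=0.20, delta=delta)
    print(f"delta = {delta:.0%}: about {n} patients per group")
```

Because the required sample size grows roughly with the inverse square of delta, halving the delta in this sketch roughly quadruples the number of patients needed, which is why the choice of delta has such a large effect on trial feasibility.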
The MCID is also a key concept in the interpretation of clinical trial results. For example, in individuals without previous myocardial infarction (MI) or stroke, the regular use of acetylsalicylic acid (ASA) will reduce the incidence of MI by 0.2% per year (from a baseline rate of 0.7%/year to 0.5%/year, which is a relative risk reduction of about 29%), but this benefit is possibly offset by a concomitant absolute increase in the chance of stroke of 0.02% per year (from 0.30%/year to 0.32%/year, which is a relative increase of about 7%) and of gastrointestinal bleeding of about 1% per year (from about 1%/year to 2%/year).5 After weighing the advantages and disadvantages of ASA use in this clinical situation, an influential expert panel did not recommend its use, reasoning that the efficacy of ASA in preventing MI was not sufficient to overcome the increased incidence of stroke and gastrointestinal bleeding in this low-risk group.6 That is, the efficacy of ASA in this clinical situation was insufficient to meet or exceed its MCID.
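The trade-off in the ASA example can be made concrete by expressing each change as events per 1000 person-years. The sketch below uses only the annual rates quoted above; the function and variable names are hypothetical, and the conversion simply treats a rate of x% per year as 10x events per 1000 person-years.

```python
# Annual event rates (% per year) quoted in the text: (without ASA, with ASA).
annual_rates_pct = {
    "myocardial infarction": (0.7, 0.5),
    "stroke": (0.30, 0.32),
    "gastrointestinal bleeding": (1.0, 2.0),
}


def risk_changes(baseline_pct, treated_pct):
    """Return the absolute change (percentage points per year), the
    relative change (fraction of baseline) and the change in events
    per 1000 person-years. Negative values mean fewer events."""
    absolute = treated_pct - baseline_pct
    relative = absolute / baseline_pct
    per_1000 = absolute * 10  # x% per year equals 10x events per 1000 person-years
    return absolute, relative, per_1000


for outcome, (baseline, treated) in annual_rates_pct.items():
    absolute, relative, per_1000 = risk_changes(baseline, treated)
    print(f"{outcome}: {absolute:+.2f} points/year, "
          f"{relative:+.0%} relative, {per_1000:+.1f} per 1000 person-years")
```

Laid out this way, the small absolute reduction in MI can be set directly against the absolute increases in stroke and gastrointestinal bleeding, which is the kind of weighing the MCID concept calls for.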
The comparison of actual study results (including the point estimates and the accompanying confidence intervals) with MCID values can also give an indication of the clinical importance of the study results.7 If the MCID estimate is less than the value of the lower limit of the 95% confidence interval, the study results are statistically significant and very likely to be clinically important. Alternatively, if the MCID value is greater than the upper limit of the 95% confidence interval, the results of the study are very likely to be clinically unimportant. Finally, for study results in which the MCID values are contained within the 95% confidence intervals, their clinical importance is less clear.
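The three cases described above can be written as a simple decision rule. This is a minimal sketch, assuming the confidence limits and the MCID are expressed on the same absolute scale and that larger values indicate greater benefit; the function name and the numbers in the usage lines are hypothetical.

```python
def classify_clinical_importance(ci_lower, ci_upper, mcid):
    """Compare a 95% confidence interval for the treatment effect with
    the MCID. All three values must be on the same absolute scale, with
    larger values indicating greater benefit."""
    if ci_lower > ci_upper:
        raise ValueError("ci_lower must not exceed ci_upper")
    if mcid < ci_lower:
        # The whole interval lies above the MCID.
        return "very likely clinically important"
    if mcid > ci_upper:
        # The whole interval lies below the MCID.
        return "very likely clinically unimportant"
    # The MCID falls inside the interval.
    return "clinical importance unclear"


# Hypothetical absolute risk reductions (percentage points) with an MCID of 2.
print(classify_clinical_importance(3.0, 7.0, mcid=2.0))
print(classify_clinical_importance(0.5, 1.5, mcid=2.0))
print(classify_clinical_importance(1.0, 4.0, mcid=2.0))
```

The rule needs both confidence limits and an explicit MCID, so a report that omits either leaves readers unable to apply it.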
To allow readers to interpret the clinical importance of trial results from their own perspective, specific information must be reported, including a clearly defined primary outcome, a delta value or MCID, statistical significance and confidence intervals around the point estimates of the efficacy of the intervention. A previous review in 1987 found that such information was often underreported.8 The objective of this study was to assess the quality of reporting with respect to the concept of clinical importance in a random sample of RCTs recently published in high-impact general medical and internal medicine journals.
Methods
For the period from Dec. 1, 1998, to Nov. 30, 1999, 5 high-impact general medical and internal medicine journals (Annals of Internal Medicine, BMJ, JAMA, The New England Journal of Medicine and the Lancet) were manually searched by 2 independent reviewers for RCTs. From each journal, a random 10% sample of the identified articles was selected for review. These articles were evaluated by 4 independent reviewers using a standardized data collection sheet developed to allow assessment of key components in the interpretation of study results from the perspective of clinical importance (Table 1). Any disagreements were resolved by collaborative review. The standardized data collection sheet evaluated the reporting of 5 factors:
1. Identification of a clearly defined primary outcome. The Methods section of each report was reviewed for an explicitly stated primary outcome. The primary outcome was defined as the one upon which the sample size calculation was based. If no sample size calculation was reported, the rest of the report was reviewed for an explicitly stated primary outcome. If an explicitly stated primary outcome could not be identified, the outcome that was deemed to be of greatest clinical relevance was chosen.
2. Reporting of the sample size delta value. Sample size calculations were evaluated for the reporting of a delta value, that is, the magnitude of difference in outcomes between treatment groups that the trial was attempting to detect. It was recorded whether the delta value was chosen to reflect the authors' perception of the MCID of the study intervention, and whether it was reported in absolute or relative terms. In order to prevent MCID values from changing with the baseline rate in the control groups (which will occur if MCID values are reported as relative risk reductions), we believe that MCID values should always be stated in absolute terms.
3. Documentation of the statistical significance of the results. With respect to the primary outcome, the statistical significance (p < 0.05) of the efficacy of the study intervention was recorded.
4. Presentation of the pertinent confidence intervals. For the primary outcome, we recorded whether the 95% confidence intervals surrounding the point estimates of the efficacy of the interventions were reported.
5. Authors' interpretation of the clinical importance of their results. The Discussion sections were reviewed to determine whether the clinical importance of the study results was discussed. If this was discussed, we judged whether the discussion of clinical importance was explicit or implicit. The discussion was classified as explicit if there was direct comment on the clinical importance of the study results. Indirect references to clinical importance were classified as implicit. We also evaluated how well the authors justified their interpretation of the clinical importance of the study results. A rating system that graded the levels of strength of justification was developed by reviewer consensus. Levels of justification (Table 2) ranged from level 1 (“explicitly discusses the clinical importance of the primary study result in relation to previous empirical work done to determine the MCID of the therapy”) to level 4 (“no accompanying justification”). Discussion of the clinical importance of study results was also reviewed for 2 other factors: justification of the magnitude of the chosen MCID value, and commentary on the relation between the MCID and the confidence intervals around the primary outcome.
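The rationale given under factor 2 for stating MCID values in absolute terms can be illustrated with a short calculation. In the sketch below, a fixed relative risk reduction is converted to an absolute difference at several control-group event rates; the function name, the 25% relative reduction and the baseline rates are all hypothetical.

```python
def absolute_difference(baseline_rate, relative_risk_reduction):
    """Absolute risk difference implied by a relative risk reduction
    at a given control-group event rate (both given as fractions)."""
    return baseline_rate * relative_risk_reduction


# Hypothetical MCID stated as a 25% relative risk reduction.
relative_mcid = 0.25
for baseline in (0.05, 0.10, 0.20):
    absolute = absolute_difference(baseline, relative_mcid)
    print(f"baseline {baseline:.0%}: absolute difference "
          f"{absolute * 100:.2f} percentage points")
```

The same relative value corresponds to a 4-fold range of absolute differences across these baseline rates, whereas an MCID stated in absolute terms stays fixed whatever the control-group rate turns out to be.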
Results
From the 5 journals, 266 eligible RCTs were identified. Two reports were excluded because they presented long-term follow-up data from previously published trials. Based on a random 10% sampling of each journal's published RCTs, a total of 27 trials were selected for review.11–37 The overall results of our study are shown in Table 3.
A total of 22 of 27 (81%) articles explicitly reported a single primary outcome. The remaining 5 articles measured multiple outcomes without identifying one as the primary outcome. For these articles, the primary outcome was chosen to be the one we judged most clinically relevant.
Twenty of 27 (74%) articles included a sample size calculation. Of these 20, a delta value was reported in 18 (90%). Two of the 18 (11%) articles explicitly stated that the delta value was chosen to reflect the MCID of the intervention. The delta values were reported as absolute values in 13 of 18 (72%) studies. In 2 of the 5 studies that did not report absolute delta values, the anticipated baseline event rates in the control groups were given and, therefore, absolute delta values could be calculated.
Of the 27 RCTs, 17 (63%) studies had primary outcomes that were reported as being statistically significant (p < 0.05).
For the primary outcome, confidence intervals surrounding the point estimates of the efficacy of the intervention were reported in 11 of 27 (41%) studies.
The study results were interpreted from the perspective of clinical importance in 20 of 27 (74%) articles, with 10 of 20 (50%) articles having an explicit discussion and 10 of 20 (50%) articles having an implicit discussion. Most of the reports (15 of 20, 75%) provided no justification for the authors' interpretation of the clinical importance of their results (level 4). In only one article was it judged that the authors discussed the clinical importance of their study results in relation to a reported sample size MCID (level 2). No study authors justified their interpretation of clinical importance in relation to empirically determined or consensus-based MCID values (level 1). No study authors provided justification for the magnitude of their MCID value or commented on the relation of the MCID with the confidence intervals surrounding the point estimates.
Interpretation
Our results show that many reports of RCTs do not discuss the clinical importance of the trial results. This agrees with the findings of a previous review of 102 RCTs with statistically nonsignificant results, in which only 20% (20/102) of studies made any statement about clinical importance.38
Although clinical importance was discussed in 74% (20/27) of articles in our study, we were liberal in our categorization. Several studies had only one sentence that implied clinical importance (e.g., “our findings suggest that lorazepam is safe and effective for these patients”27). For studies that did provide a discussion of clinical importance, most provided little or no justification for their clinical interpretation. The one report that ranked highest on our levels of justification (level 2) was a trial comparing adenoidectomy or tonsillectomy, or both, with medical therapy in the treatment of recurrent acute otitis media.11 The sample size calculation was based on a delta value that the authors believed would represent a clinically important difference for the intervention(s). In the discussion, the authors stated, “efficacy [did not] reach the level we initially posed as necessary to justify surgery.”11 This article was the only one that clearly compared the study result with an a priori MCID.
In addition to the authors' interpretation of the clinical importance of their trial results, readers should be provided with sufficient information to judge this for themselves. The study results demonstrate that many reports of RCTs did not provide the necessary information.
Eighty-one percent of the studies reviewed reported a clearly defined primary outcome. This is an improvement on a 1987 review of 45 RCTs published in 3 major medical journals, in which the primary outcome measures were clearly specified in only 27% of the reports.8 A similar improvement was apparent in the reporting of confidence intervals: 41% of the studies we reviewed reported pertinent confidence intervals, compared with 13% in the previous review.8
Many studies (26%) did not report a sample size calculation or a delta value. Of those that did, many reported the delta value in relative instead of absolute terms, with an even smaller number explicitly identifying the delta value as an MCID. Furthermore, in many instances, the delta value did not appear to reflect the MCID of the intervention. For example, a study17 that assessed the effect of dietary supplementation with polyunsaturated fats demonstrated an absolute decrease of 1.3% (95% confidence interval 0.1%–2.6%) in the primary outcome (combined end point of death, nonfatal MI and stroke). In the sample size calculation, the delta value was a 4% absolute difference between groups over a 3.5-year period. Thus, the efficacy of the intervention found by the study was substantially smaller than the sample size delta: even the upper limit of the confidence interval (2.6%) fell below the 4% delta. If the sample size delta truly reflected the MCID of the intervention, then the efficacy of the intervention was not clinically important. However, the authors concluded that their study result provided both “a clinically important and statistically significant benefit.” This issue is probably not unique to this study, because sample size calculations are often based on feasibility issues (e.g., choosing a large delta in order to reduce sample size) rather than on powering the trial to have the best chance of detecting clinically important differences.39 To change this practice so that sample sizes are strictly calculated from MCIDs, feasibility issues will need to be addressed. For example, funding agencies will have to agree to provide increased support so that larger trials can be performed.
Interestingly, one trial appeared overpowered to detect the authors' perception of a clinically important benefit.15 In this trial that assessed the effect of low-dose hydrocortisone on chronic fatigue syndrome, a 9-point reduction on a fatigue scale was deemed to be clinically important. However, the delta for the sample size calculation was reported as a 4-point reduction on the same scale. Thus, the authors reported that the study result, a statistically significant 4.5-point reduction on the fatigue scale when comparing the intervention group with the control group, was not clinically important.15
If delta values are not representative of MCIDs and there is little discussion of the clinical importance of the study results, it is difficult to determine whether the authors perceive the results of their study to be clinically important. Previous work done by Burback and colleagues40 clearly illustrates this. The clinical importance of the results of several RCTs that assessed the efficacy of tacrine in the treatment of Alzheimer's disease was unclear, because the reports provided no MCID values and the authors generally failed to comment on the clinical importance of their results. Therefore, Burback and colleagues40 surveyed a group of practising physicians to determine the MCID for this intervention. The MCID value was then compared with the results of trials under review. Of the 2 of 12 studies that found a statistically significant difference in favour of tacrine therapy, neither study result could be deemed clinically important when compared with the MCID value.
One limitation of our study was that we had access only to the final published reports of the trials we reviewed. Thus, it is possible that pertinent information regarding sample size calculations, authors' perceptions of MCID values of the interventions and interpretation of clinical importance did not survive the editorial process. In addition, our assessment of the adequacy of discussion with respect to clinical importance was subjective. Some disagreements did occur among the reviewers. However, consensus was eventually reached in all cases, with a deliberate tendency to err toward accepting ambiguous statements as indicating the presence of a discussion regarding clinical importance. Thus, our study possibly overestimated the percentage of trials that adequately addressed the issue of clinical importance. Another possible limitation is the small number of journals we reviewed.
An argument can be made that the clinical interpretation of RCTs using surrogate outcomes is problematic. However, we believe that the onus is on the authors of such trials to connect their trial results with the potential clinical implications, because not doing so leaves readers with no guidance regarding the clinical relevance of the results. With the increasing prevalence of published meta-analyses, it may be argued that the need for interpretation of the clinical importance of primary study results is superseded once a relevant meta-analysis is published. However, at the time of publication of an RCT with primary data, it is usually not known whether a meta-analysis on a similar topic will subsequently be published. Thus, the need for primary studies to publish the elements shown in Table 1 remains important. Furthermore, publication of these elements can help provide the basis with which meta-analysts (and their readers) can judge the clinical interpretation of meta-analytic results.
Our findings demonstrate that the reports of RCTs published in major general medical and internal medicine journals do not consistently provide the authors' interpretation of the clinical importance of their results. Moreover, they do not contain adequate information to allow readers to assess the clinical importance of the results from their own perspective. Ideally, each published RCT should report an explicit primary outcome, a sample size delta value that reflects the authors' perception of the MCID of their intervention, the statistical significance of the primary outcome, confidence intervals surrounding the point estimates for the primary outcome and an explicit discussion of the clinical importance of the study results. Many of these criteria are already included in the revised CONSORT statement.1 However, we also recommend that authors report the MCID of the intervention, justify its magnitude and explicitly discuss the clinical importance of the results in relation to the MCID.
Footnotes
This article has been peer reviewed.
Competing interests: None declared.