Grading evidence

Gerald J.M. Tevaarwerk

doi:10.1503/cmaj.1031694

Holger Schünemann and his colleagues in the GRADE Working Group1 have taken an important first step in developing a universally acceptable grading system for denoting the quality of evidence and associated recommendations. Current systems use alphabetic, numeric or colour-coded nominals or ordinals, which represent discontinuous, qualitative (or at most semiquantitative) and hence imprecise categories.2,3 In many instances the resulting imprecision exceeds that of the information it attempts to convey. I would like to suggest an alternative.

The most informative and sophisticated form of measurement uses a continuous scale with consistent intervals.2 Interval scales are common in clinical medicine, being used for measurements of blood pressure, temperature, heart rate and weight, and for nearly all laboratory measurements. The consistent intervals allow values to be combined as averages and deviations. If the scale starts at zero, it becomes a ratio scale, which allows ratio statements such as “twice as big” or “half as much.”2

My suggestion is to implement a system already in common use throughout the world: the 100-interval ratio scale, which is widely used for currencies and for grading performance and which is based on the most common counting practice, the decimal system.

The use of this scale to grade the quality of scientific measurements would not be new. Statistical confidence limits around a point estimate are expressed as percentages, as are the sensitivity, specificity and predictive values of diagnostic interventions.2,4 Probabilities and likelihoods may be expressed on the 100-interval scale or can be readily converted to it, while utility, the relative value of alternative choices, is also often expressed as a value out of 100.4 Perhaps most importantly, Bayes' factors, the mathematical expression of how disease indicants modify diagnostic hypotheses, fit well with the 100-interval scale and may also be used for therapeutic interventions.4,5,6 This scale even facilitates the use of odds, as odds to the base 100 are equivalent to percentages.6
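The arithmetic behind this paragraph can be sketched in a few lines of Python: a probability stated on the 100-interval scale is converted to odds, multiplied by a likelihood ratio (Bayes' factor) and converted back to a percentage. This is a minimal illustration only. The function names and the sample values (a pretest probability of 20% and a likelihood ratio of 4) are assumptions chosen for the example and are not taken from the cited references.

```python
def percent_to_odds(percent: float) -> float:
    """Convert a probability on the 0-100 scale to odds (p : 1 - p)."""
    if not 0 <= percent < 100:
        raise ValueError("percent must be in the range [0, 100)")
    p = percent / 100
    return p / (1 - p)


def odds_to_percent(odds: float) -> float:
    """Convert odds back to a probability on the 0-100 scale."""
    if odds < 0:
        raise ValueError("odds must be non-negative")
    return 100 * odds / (1 + odds)


def update_percent(pretest_percent: float, likelihood_ratio: float) -> float:
    """Apply Bayes' rule in odds form: post-test odds = pretest odds x LR."""
    posttest_odds = percent_to_odds(pretest_percent) * likelihood_ratio
    return odds_to_percent(posttest_odds)


if __name__ == "__main__":
    # Hypothetical values: 20% pretest probability, likelihood ratio of 4.
    print(f"{update_percent(20, 4):.1f}%")
```

With these assumed inputs, pretest odds of 1:4 become post-test odds of 1:1, so the belief moves from 20 to 50 on the 100-interval scale.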

The disadvantage of the system is that it may give a sense of precision that does not exist. For example, clinicians primarily use subjective degrees of belief in diagnostic reasoning. Indeed, even objective observations are expressed with confidence intervals, not just as point estimates. However, a 100-interval ratio scale seems preferable to a system of only 4 grades that do not fit either the Bayesian type of reasoning used in clinical practice or the clinical decision analysis that is increasingly recommended for use in complex clinical and policy problems.7

Gerald J.M. Tevaarwerk
Professor Emeritus
University of Western Ontario
Zutphen, the Netherlands

References

1. Schünemann HJ, Best D, Vist G, Oxman AD, for the GRADE Working Group. Letters, numbers, symbols and words: how to communicate grades of evidence and recommendations [editorial]. CMAJ 2003;169(7):677-80.
2. Hassard TH. Understanding biostatistics. St. Louis: Mosby Year Book; 1991.
3. Upshur REG. Are all evidence-based practices alike? Problems in the ranking of evidence [editorial]. CMAJ 2003;169(7):672-3.
4. Dawson-Saunders B, Trapp RG. Basic and clinical biostatistics. Norwalk (CT): Appleton & Lange; 1994.
5. Tevaarwerk GJM. Measuring the efficacy and cost-effectiveness of laboratory tests. Ann R Coll Phys Surg Can 1995;28(6):217-20.
6. Tevaarwerk GJM. A Bayesian approach to treatment efficacy: therapeutic likelihood ratios and odds change values. Ann R Coll Phys Surg Can 1998;31(7):319-26.
7. Lusted LB. Introduction to medical decision-making. Springfield (IL): CC Thomas; 1968.