1. Were the patients randomized, and was the randomization
concealed? Randomization is the only way to equalize study
groups for both known and unknown confounding factors. A cohort
study with carefully matched groups might balance the known factors
but not the unknown ones.
2. Were treatment and control patients similar at the
start of the study (ie, did randomization work)? Most studies
provide a table of demographic data and other potentially relevant
characteristics of the individuals in the study groups. Comparability
of the groups at baseline is needed to confirm that randomization was effective.
3. Were patients analyzed in the groups to which they
were randomized (intent to treat)? What should be done
with participants who do not follow the assigned protocol? If 10% of
children who were supposed to get thickened formula received regular
formula instead, should they be counted in the treatment group?
In the control group? Dropped from the study? The third option might
seem a solution, but it has two problems: in real practice, noncompliance
is common, and the clinically important result is how a potential
treatment performs in real-world use. Moving nonadherent participants into
a different group might reduce the benefit of randomization, since
a factor relevant to the study’s outcome also might affect
compliance with the intervention (eg, children with the worst reflux
may have given up on the thickened formula and tried something else).
4. Were clinicians, patients, and study personnel blinded
to the group assignment? If clinicians or patients know
which intervention is being received, they might introduce bias
into treatment plans or assessments and either exaggerate or minimize
the true treatment effect.
5. How many patients were lost to follow-up? Was the
length of follow-up reasonable? If a large number of patients
drop out of a study, it is difficult to assess the outcome in each
group fairly (especially if, for instance, the patients dropped
out because the new therapy was not working or had intolerable side
effects). Some loss to follow-up is unavoidable. One way of assessing
whether the loss is acceptable is by using a worst-case scenario
analysis. The authors could assume the worst case, that all of the
children lost to follow-up did poorly, and recalculate the results
using that assumption. If the results still indicate a benefit,
the conclusion is strengthened. As a rule of thumb, loss to follow-up
of more than 20% significantly reduces the validity of
the study.
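The worst-case recalculation described above can be sketched in a few lines of code. All counts here are invented for illustration; they do not come from any real trial.

```python
# Hypothetical worst-case (sensitivity) analysis for loss to follow-up.
# Every participant lost to follow-up is assumed to have done poorly.

def worst_case_rate(successes: int, completed: int, lost: int) -> float:
    """Recompute a group's success rate counting all patients
    lost to follow-up as failures."""
    return successes / (completed + lost)

# Treatment arm (illustrative): 40 of 50 completers improved, 10 lost.
observed = 40 / 50                        # 0.80 as reported
worst = worst_case_rate(40, 50, 10)       # 40/60, about 0.67

# Control arm (illustrative): 25 of 50 completers improved, 5 lost.
control_worst = worst_case_rate(25, 50, 5)   # 25/55, about 0.45

# If the treatment still outperforms control under the worst case,
# the study's conclusion is strengthened.
print(worst > control_worst)
```

With these made-up numbers the benefit survives the worst-case assumption; if it did not, the reported result would rest on the missing patients.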
6. Is it possible that the results could have been due
to chance (P > .05)? Another threat to the validity of
a study is that the difference between groups is not real but is due
simply to chance (a type I error). The probability of observing
a difference at least as large as the one found, when no true difference
exists, is expressed as the P value. By convention, up to a 5% likelihood
of the results occurring by chance (P ≤ .05) is accepted.
7. Is it possible that results could have been skewed
by a small sample size? (When does n matter?) In
a “positive” study (ie, a difference is demonstrated
between intervention and control groups), statistical significance
(P ≤ .05) assures that the sample size (n)
was adequate. In a “negative” study (ie, no statistically
significant difference between groups), it is possible that a real
difference between the study groups was missed (a type II error,
or β). To decide if the sample size was large enough to
avoid this error, the clinician must examine the methods section
of the study for an explanation of how the sample size was determined.
The authors should specify their sample size calculation based on the
difference in response they want to detect, the estimated success
rate in the control group, and the minimum acceptable probability
of correctly finding a difference (1 – β,
also known as power and generally set at 80% by convention).
Using these assumptions, the authors can calculate—and
should state—the sample size needed to demonstrate a difference.
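The calculation the authors should report can be sketched with the standard two-proportion formula. The function and the 30%-versus-50% example are assumptions for illustration, not figures from any study.

```python
# Sketch of a sample size calculation for comparing two proportions
# (normal approximation). Inputs mirror the three quantities named above:
# the control success rate, the difference to detect, and the power.
from math import ceil, sqrt
from statistics import NormalDist

def sample_size_per_group(p_control: float, p_treatment: float,
                          alpha: float = 0.05, power: float = 0.80) -> int:
    """Participants needed per group to detect the difference between
    two success rates at the given alpha and power."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = .05
    z_b = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    p_bar = (p_control + p_treatment) / 2
    numerator = (z_a * sqrt(2 * p_bar * (1 - p_bar))
                 + z_b * sqrt(p_control * (1 - p_control)
                              + p_treatment * (1 - p_treatment))) ** 2
    return ceil(numerator / (p_control - p_treatment) ** 2)

# e.g. to detect an improvement from a 30% to a 50% success rate:
n = sample_size_per_group(0.30, 0.50)
```

Note the trade-offs the formula makes visible: a smaller difference to detect, or a higher required power, both drive the needed sample size up.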
Additional measures are often useful beyond statistical significance:
relative risk, odds ratio, absolute risk reduction, and the number
needed to treat (the inverse of absolute risk reduction).
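These measures follow directly from the event rates in each group. The 20%-versus-10% event rates below are invented for illustration.

```python
# Illustrative calculation of the effect measures named above,
# from a control event rate and a treated event rate.

def effect_measures(control_event_rate: float, treated_event_rate: float):
    arr = control_event_rate - treated_event_rate    # absolute risk reduction
    rr = treated_event_rate / control_event_rate     # relative risk
    odds_ratio = ((treated_event_rate / (1 - treated_event_rate))
                  / (control_event_rate / (1 - control_event_rate)))
    nnt = 1 / arr                                    # number needed to treat
    return arr, rr, odds_ratio, nnt

# If 20% of controls but only 10% of treated patients have the bad outcome:
arr, rr, odds_ratio, nnt = effect_measures(0.20, 0.10)
# ARR = 0.10, RR = 0.5, OR about 0.44, NNT = 10: treat ten patients
# to prevent one additional bad outcome.
```

Note that the odds ratio (0.44) and the relative risk (0.5) diverge even in this simple example; they coincide only when the outcome is rare.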