Tufts OpenCourseware

1. Introduction:

As noted in lecture one, when critiquing an article in the medical literature the reader must consider both internal and external validity issues. This lecture will focus on the types of problems that can creep into a study and lead to an incorrect study conclusion. These “mistakes” of internal validity can either overestimate or underestimate the true relationship (association) between exposure and outcome.

There are three general categories of threats to internal validity that must be considered: chance, bias and confounding. Evaluating the role of chance is the subject of lectures appearing later in the course. You will learn that there is no 100% guarantee that the study results are or are not due to chance alone.

2. Bias

Gordis defines bias as “any systematic error in the design, conduct or analysis of a study that results in a mistaken estimate of an exposure’s effect on the risk of disease.” There are many types of biases, some studies being particularly prone to one type or another. Epidemiology texts sometimes define specific biases in different ways. This is not a big deal. What matters most is that you can spot a “systematic error” in a study, not the various ways the concept is defined.

2.1. Information Bias

It should be obvious that if information from a study is erroneously gathered, the conclusions from the study might be wrong. This is the “garbage in, garbage out” analogy from the computer world. There are various kinds of information bias.

2.2. Misclassification Bias

Misclassifying exposure or outcome status can bias a study’s results. Suppose investigators conduct a prospective cohort study with four groups of subjects: those who exercise vigorously, moderately, minimally or not at all. Such groupings might be based on questionnaires completed by subjects and/or interviews with the subjects. If these methods lead to misclassifications of exposure status, bias may have been introduced into the study.

How this bias might affect the study’s outcome depends on how the misclassification occurred. For example, were all groups misclassified to the same degree and how much misclassification likely occurred? Misclassifying the exposure status of two out of 1,000 subjects is unlikely to lead to an erroneous study conclusion, but misclassifying the exposure status of 100 of 1,000 subjects might be a significant problem. This concept of misclassifying exposure status is contained in the first analysis point using the so-called McMaster’s method.

Outcomes can also be misclassified. For example, incorrectly concluding that some subjects had an MI when they did not can lead to bias. How much bias depends on the type of misclassification and how much misclassification likely occurred.

Misclassification that occurs in the same proportion in each group is called nondifferential or random misclassification. For example, suppose the outcome in a study of exercisers and non-exercisers is MI. If 20% of exercisers were erroneously classified as having an MI, and 20% of the non-exercisers were erroneously classified as having an MI, nondifferential misclassification of the outcome occurred.

Misclassification that occurs in different proportions in each group is called differential or non-random misclassification. Using the same example in the paragraph above, if 20% of exercisers were erroneously classified as having an MI, and 5% of the non-exercisers were erroneously classified as having an MI, differential misclassification of the outcome occurred.

Said another way, if misclassification of the outcome occurs to the same degree in each exposure group there is random misclassification; conversely, if the misclassification of the outcome occurs differently between the exposure groups there is non-random misclassification.

If misclassification of the exposure status occurs to the same degree in each outcome group there is random misclassification; conversely, if the misclassification of the exposure status occurs differently between the outcome groups there is non-random misclassification.

3. Effects of Random vs. Non-Random Misclassification Bias

The effects of random and non-random misclassification are different.

3.1. Random Misclassification

Random misclassification generally results in an underestimate of the strength of the true association. For example, the true relative risk in a study might be 3.0 but random misclassification might lead to a reported relative risk of 2.0. Or, the true relative risk might be 0.1 but random misclassification might lead to a reported relative risk of 0.6. In both cases the reported value is closer to 1.0 than the truth. As noted by Gordis, “we are less likely to detect an association even if one really exists” when random misclassification occurs.

Said another way, random misclassification pushes the study results away from truth towards the null hypothesis. (You will learn about the null hypothesis in the statistics lectures.)
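The pull toward the null can be illustrated with a small calculation. The sketch below is not from the lecture: the risks and the 10% error rates are hypothetical, and it assumes the same false-positive and false-negative rates for the outcome in both exposure groups.

```python
def observed_risk(true_risk, false_pos, false_neg):
    """Risk that would be recorded when outcomes are misclassified at the given rates."""
    return true_risk * (1 - false_neg) + (1 - true_risk) * false_pos


def observed_rr(risk_exposed, risk_unexposed, false_pos, false_neg):
    """Relative risk when BOTH groups share the same error rates (random misclassification)."""
    return (observed_risk(risk_exposed, false_pos, false_neg)
            / observed_risk(risk_unexposed, false_pos, false_neg))


# Hypothetical harmful exposure: true risks of 30% vs. 10%.
harmful_true = observed_rr(0.30, 0.10, 0.0, 0.0)
harmful_seen = observed_rr(0.30, 0.10, 0.10, 0.10)

# Hypothetical protective exposure: true risks of 2% vs. 20%.
protective_true = observed_rr(0.02, 0.20, 0.0, 0.0)
protective_seen = observed_rr(0.02, 0.20, 0.10, 0.10)

print(f"harmful: true RR {harmful_true:.2f}, observed RR {harmful_seen:.2f}")
print(f"protective: true RR {protective_true:.2f}, observed RR {protective_seen:.2f}")
```

In both hypothetical cases the observed relative risk lies between the true value and 1.0.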

3.1.1. Avoiding Random Misclassification:

This can be tough. However, excellent methods to measure and/or diagnose an exposure status or outcome are helpful.

3.2. Non-Random Misclassification

Differential misclassification can either overestimate or underestimate the true association, depending on the situation. For example, the true relative risk in a study might be 2.0 but differential misclassification might lead to a reported relative risk of 2.5 or 1.5, depending on the situation. Or, the true relative risk in a study might be 0.5 but differential misclassification could lead to a reported relative risk of 0.2 or 0.8, depending on the situation.

As noted by Gordis, “differential misclassification bias can lead either to an apparent association even if one does not really exist or to an apparent lack of association when one does exist.”

Said another way, differential misclassification either pushes the study results away from truth toward the null hypothesis or away from truth in the opposite direction of the null hypothesis, depending on the situation.
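A short sketch can show both directions. It reuses the 20% and 5% error rates from the exerciser example; the cohort sizes and true MI counts are hypothetical, and only false-positive MI diagnoses are modeled.

```python
def observed_rr(n_exp, cases_exp, n_unexp, cases_unexp, fp_exp, fp_unexp):
    """Relative risk after a fraction of non-cases in each group is wrongly recorded as cases."""
    seen_exp = cases_exp + fp_exp * (n_exp - cases_exp)
    seen_unexp = cases_unexp + fp_unexp * (n_unexp - cases_unexp)
    return (seen_exp / n_exp) / (seen_unexp / n_unexp)


# Hypothetical cohorts: 1,000 exercisers with 50 true MIs, 1,000 non-exercisers with 100 true MIs.
cohorts = (1000, 50, 1000, 100)

rr_true = observed_rr(*cohorts, fp_exp=0.0, fp_unexp=0.0)
# More false MIs among exercisers (20% vs. 5%).
rr_errors_in_exercisers = observed_rr(*cohorts, fp_exp=0.20, fp_unexp=0.05)
# More false MIs among non-exercisers (5% vs. 20%).
rr_errors_in_non_exercisers = observed_rr(*cohorts, fp_exp=0.05, fp_unexp=0.20)

print(f"true RR: {rr_true:.2f}")
print(f"20% vs. 5% false MIs: {rr_errors_in_exercisers:.2f}")
print(f"5% vs. 20% false MIs: {rr_errors_in_non_exercisers:.2f}")
```

With these made-up numbers, one error pattern makes a protective exposure look harmful, while the reverse pattern exaggerates the protective effect.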

3.2.1. Avoiding Differential Misclassification:

Blinding subjects and/or researchers helps avoid differential misclassification of outcomes.

3.3. Surveillance Bias

Surveillance bias, what some texts call detection bias, occurs when one group is followed more closely than the other group. This could lead to an outcome being diagnosed more often in the more closely followed group, but not because it truly occurred more often in that group. This is a type of non-random misclassification bias.

For example, physicians may be concerned that subjects on Med A will develop guaiac positive stools - blood in the stools - that can be asymptomatic but detected with guaiac solution. No similar concern exists for subjects on Med B. The study protocol allows patients to be followed as determined by their physicians. The concern about guaiac positive stools might lead physicians to follow subjects on Med A more closely than those on Med B. This could result in more guaiac tests for patients on Med A vs. Med B, leading to a potential erroneous conclusion that guaiac positive stools occur more frequently in subjects on Med A vs. Med B.
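The guaiac example can be sketched numerically. Every number below is hypothetical: both groups are given the same true frequency of guaiac positive stools, each test is assumed to detect a true positive with the same probability, and repeat tests are assumed to be independent.

```python
def detected_fraction(true_fraction, sensitivity, n_tests):
    """Fraction of a group found positive when each subject is tested n_tests times."""
    # A true positive is missed only if every one of the tests misses it.
    return true_fraction * (1 - (1 - sensitivity) ** n_tests)


TRUE_FRACTION = 0.10   # same in both groups (assumption)
SENSITIVITY = 0.50     # per-test detection probability (assumption)

seen_med_a = detected_fraction(TRUE_FRACTION, SENSITIVITY, n_tests=4)  # followed closely
seen_med_b = detected_fraction(TRUE_FRACTION, SENSITIVITY, n_tests=1)  # followed less often
apparent_rr = seen_med_a / seen_med_b

print(f"detected on Med A: {seen_med_a:.4f}")
print(f"detected on Med B: {seen_med_b:.4f}")
print(f"apparent RR: {apparent_rr:.2f} (true RR is 1.0 by construction)")
```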

3.3.1. Avoiding Surveillance Bias:

If possible, study protocols should specify that subjects will be followed at the same time intervals with identical tests performed at each visit.

3.4. Recall Bias

This is a bias unique to case control studies that rely on information provided by the subjects, and it results in non-random misclassification bias. The notion is that because subjects are aware of their health status as cases or controls, such knowledge might lead to a differential recall of an exposure status. “Thus a certain piece of information, such as a potentially relevant exposure, may be recalled by a case but forgotten by a control,” notes Gordis. Say Hennekens and Buring in their text, “Individuals who have experienced a disease or other adverse health outcome tend to think about the possible ‘causes’ of their illness and thus are likely to remember their exposure histories differently from those who are unaffected by the disease.”

3.4.1. Avoiding Recall Bias:

This can be tough. Researchers need to recognize this potential bias when interpreting study results. Gordis says there are few actual examples demonstrating that recall bias has been a major problem leading to erroneous study associations. However, he adds, “the potential problem cannot be disregarded, and the possibility for such bias must always be kept in mind.”

3.5. Reporting Bias

For a variety of reasons, including issues of social desirability or sensitivity, subjects may not be willing to report an exposure accurately. When researchers gather baseline characteristic data, subjects may underestimate the amount of alcohol they drink, cigarettes they smoke, illicit drugs they use, etc. If this occurs equally in both groups, to a material degree, it leads to random misclassification bias.

3.5.1. Avoiding Reporting Bias:

Sometimes other sources can verify information.

3.6. Interviewer Bias

This occurs when data collection methods differ between groups resulting in non-random misclassification bias. In a case control study, for example, an interviewer might ask more probing questions of cases than controls, and such probing could lead to an overestimate or underestimate of a true exposure status.

3.6.1. Avoiding Interviewer Bias:

Blinding the interviewer to the status of the case or control will avoid differential probing. In addition, interviews should be standardized such that the interviewer cannot deviate from the script.

3.7. Selection Bias

Selection bias occurs in a case control study when the method by which cases and controls are selected is associated with exposure status. To avoid selection bias, the selection of cases and controls should be such that they are representative of the population from which they are selected.

Suppose a case control study is conducted to assess a possible association between Disease X and drinking whiskey. If 20% of patients with Disease X in the population drink whiskey, then 20% of Disease X subjects in the study should also drink whiskey. Conversely, if 5% of controls in the population drink whiskey, then 5% of the selected controls should drink whiskey. In reality, of course, one does not know the population data, which is why the study is being conducted. The notion, however, is that the process of selecting cases and controls should be such that cases and controls represent their actual exposure in the population from which they were drawn.
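The whiskey example can be turned into a rough calculation. The 20% and 5% exposure figures come from the paragraph above; the idea that whiskey-drinking cases are three times as likely to be enrolled as other cases is a hypothetical assumption added for illustration.

```python
def odds_ratio(frac_exposed_cases, frac_exposed_controls):
    """Exposure odds ratio from the exposed fraction among cases and among controls."""
    odds_cases = frac_exposed_cases / (1 - frac_exposed_cases)
    odds_controls = frac_exposed_controls / (1 - frac_exposed_controls)
    return odds_cases / odds_controls


def sampled_exposed_fraction(frac_exposed, pick_exposed, pick_unexposed):
    """Exposed fraction in the sample when enrollment chances depend on exposure."""
    exposed = frac_exposed * pick_exposed
    unexposed = (1 - frac_exposed) * pick_unexposed
    return exposed / (exposed + unexposed)


# Population values from the text: 20% of cases and 5% of controls drink whiskey.
true_or = odds_ratio(0.20, 0.05)

# Hypothetical biased enrollment: exposed cases are 3 times as likely to enter the study.
biased_case_fraction = sampled_exposed_fraction(0.20, pick_exposed=3.0, pick_unexposed=1.0)
biased_or = odds_ratio(biased_case_fraction, 0.05)

print(f"odds ratio with representative sampling: {true_or:.2f}")
print(f"odds ratio with biased case selection:   {biased_or:.2f}")
```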

An example of selection bias follows. Suppose a researcher wants to assess if there is an association between rheumatoid arthritis (RA) and smoking. The researcher announces the purpose of her study to colleagues who care for RA patients.

The researcher randomly selects 200 controls from outpatient department records. Cases enter the study as follows. A colleague of the researcher sees an RA patient in his office and says, “I note from your medical history that you smoke. I know a physician doing a study on RA and smoking and I’d like you to enter her study.” The patient enters the study, as do many more patients under the same scenario.

The researcher then notes that 100% of the 200 RA subjects smoke while only 10% of the controls smoke. This selection bias leads to an overestimate of the true odds ratio between RA and smoking.

Selection bias can also occur in retrospective cohort studies if the way subjects are selected is associated with the disease (outcome). Selection bias cannot occur in RCTs and prospective cohort studies.

3.7.1. Avoiding Selection Bias:

The best way to avoid selection bias is to properly plan the study’s design trying to anticipate the ways selection bias could enter the study, with resulting plans to avoid the bias.

3.8. Loss-To-Follow-Up Bias

This topic pertains to one of the McMaster’s internal validity criteria discussed in Lecture 1 - Critiquing a Randomized Controlled Trial. Suppose an RCT has two arms, Med A and Med B, with the outcome being MI. Suppose 30% of the subjects assigned to Med A are lost to follow-up while only 10% of the subjects assigned to Med B are lost to follow-up. Subjects assigned to Med A might be lost to follow-up because they developed warning symptoms of an MI (unstable angina) and therefore left the study to seek treatment elsewhere.

If this occurs, the Med A arm is left with the “cream of the crop” subjects for analysis, i.e. the subjects who are less likely to develop an MI than those who left the study. The subjects remaining in the Med A arm for analysis are therefore not representative of all the subjects originally assigned to the arm. This could introduce bias into the study results.

Loss-to-follow-up bias is especially a concern if there is a difference in loss-to-follow-up between the treatment arms. If each arm loses the same percentage of subjects, and the subjects leave for the same reasons, the study will not be affected in the same manner as noted in the example above.
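The Med A vs. Med B scenario can be sketched with made-up numbers. The 30% and 10% losses come from the example; the arm sizes, the true MI risk (set equal in both arms) and the number of MIs among subjects who left are hypothetical assumptions.

```python
def observed_risk(n_assigned, n_events, n_lost, events_among_lost):
    """Risk computed only from subjects who remained in the study."""
    return (n_events - events_among_lost) / (n_assigned - n_lost)


# Both arms: 1,000 subjects and 100 MIs in truth, so the true relative risk is 1.0.
# Med A loses 300 subjects, many of whom left because of warning symptoms (80 of the 100 MIs).
risk_med_a = observed_risk(n_assigned=1000, n_events=100, n_lost=300, events_among_lost=80)
# Med B loses 100 subjects for reasons unrelated to the outcome (10 of the 100 MIs).
risk_med_b = observed_risk(n_assigned=1000, n_events=100, n_lost=100, events_among_lost=10)

observed_rr = risk_med_a / risk_med_b
print(f"observed risk on Med A: {risk_med_a:.3f}")
print(f"observed risk on Med B: {risk_med_b:.3f}")
print(f"observed RR: {observed_rr:.2f} (true RR is 1.0 by construction)")
```

Under these assumptions Med A appears protective only because its high-risk subjects left before their outcomes could be counted.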

3.8.1. Avoiding Loss-To-Follow-Up Bias:

Recruiting subjects who are likely to adhere to the study’s protocols is important. In addition, researchers should make efforts to determine why subjects left the study and whether or not they developed the outcome of interest. This information can then be considered in the study’s analysis.

4. Confounding

Suppose a researcher conducts a prospective cohort study to determine if there is an association between MIs and drinking alcohol. One group consists of subjects who drink alcohol and the other consists of non-drinkers. Both groups are followed for 10 years. The crude data indicate that those who drink alcohol are twice as likely to have an MI vs. those who do not drink alcohol.

This was an unexpected result. The researcher expected to see a protective effect in the alcohol cohort given recent literature reports suggesting alcohol can reduce the incidence of heart disease.

After further review of the data, the researcher notes that 50% of the subjects were smokers in the alcohol arm while only 10% of the subjects in the non-alcohol arm were smokers. The researcher has therefore identified an association between alcohol use and smoking, i.e. alcohol drinkers were more likely to smoke than non-drinkers. This is a potential problem for the study as smoking is an independent risk factor for the outcome, MI. Maybe the drinkers had a larger incidence of MIs because they smoke and not because they drink alcohol.

After the researcher made the appropriate statistical adjustments to address the uneven presence of smokers in each arm, he reported that the smoking-adjusted relative risk was 0.9 for drinkers vs. non-drinkers.

In this example, smoking led to an overestimate in the crude relative risk before the appropriate adjustment for the unequal presence of smokers in each cohort was made.
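The numbers in this example can be reproduced with a small stratified table. The lecture does not say which adjustment the researcher used; the sketch below uses a Mantel-Haenszel pooled relative risk as one common choice. The cohort sizes and MI counts are hypothetical, chosen so that 50% of drinkers and 10% of non-drinkers smoke and the relative risk is 0.9 within each smoking stratum.

```python
# Each stratum: (MIs in drinkers, drinkers, MIs in non-drinkers, non-drinkers)
strata = {
    "smokers":     (243, 1000, 54, 200),
    "non-smokers": (45, 1000, 90, 1800),
}


def crude_rr(tables):
    """Relative risk ignoring smoking: pool all subjects into one table."""
    cases_exp = sum(t[0] for t in tables)
    n_exp = sum(t[1] for t in tables)
    cases_unexp = sum(t[2] for t in tables)
    n_unexp = sum(t[3] for t in tables)
    return (cases_exp / n_exp) / (cases_unexp / n_unexp)


def mantel_haenszel_rr(tables):
    """Mantel-Haenszel pooled relative risk across strata."""
    numerator = sum(a * n0 / (n1 + n0) for a, n1, c, n0 in tables)
    denominator = sum(c * n1 / (n1 + n0) for a, n1, c, n0 in tables)
    return numerator / denominator


crude = crude_rr(strata.values())
adjusted = mantel_haenszel_rr(strata.values())

print(f"crude RR: {crude:.2f}")
print(f"smoking-adjusted RR: {adjusted:.2f}")
```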

So, confounding occurs when two conditions are met: (1) There must be an association between a variable (third factor) and the exposure status. As noted in the Gordis text, the confounding variable is associated with the exposure but it is not the result of the exposure. In the diagram below, smoking is associated with drinking alcohol but it is not the result of drinking alcohol. (2) The variable must also be an independent risk factor for the outcome. An additional caveat is that the variable cannot be an intermediate between the exposure and the disease, e.g. cigarette smoking to pre-cancerous lung cells to cancerous lung cells. In this case, the presence of pre-cancerous lung cells is an intermediate, not a confounder.

[Diagram: Confounding]

5. Not Confounding:

When considering the potential effect of confounding on study results, first consider how much confounding actually occurred. For the example above, if 90% of the drinkers were smokers and only 5% of the non-drinkers were smokers, there would be more confounding than in a study having 15% of the drinkers as smokers and 5% of the non-drinkers as smokers. That’s logical.

In addition, one should also consider how strongly the confounder is associated with the outcome. Smoking is a significant risk factor for MIs, i.e. it is strongly associated with MIs. As having a type A personality is weakly associated with MIs, an unequal distribution (association) of type A personalities between drinkers and non-drinkers might not have a significant effect on the study results.
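Both considerations can be combined in one rough calculation. The sketch below assumes a single yes/no confounder whose relative risk for the outcome is the same in exposed and unexposed subjects. The smoker percentages come from the text; the confounder-outcome relative risks of 5.0 (strong) and 1.2 (weak) are hypothetical.

```python
def confounding_factor(frac_in_exposed, frac_in_unexposed, confounder_rr):
    """Ratio of crude RR to adjusted RR produced by one binary confounder."""
    return ((frac_in_exposed * (confounder_rr - 1) + 1)
            / (frac_in_unexposed * (confounder_rr - 1) + 1))


# Strong confounder, very uneven distribution (90% vs. 5%).
strong_uneven = confounding_factor(0.90, 0.05, confounder_rr=5.0)
# Strong confounder, mildly uneven distribution (15% vs. 5%).
strong_mild = confounding_factor(0.15, 0.05, confounder_rr=5.0)
# Weak confounder, very uneven distribution (90% vs. 5%).
weak_uneven = confounding_factor(0.90, 0.05, confounder_rr=1.2)

print(f"strong confounder, 90% vs. 5%: crude RR inflated {strong_uneven:.2f}-fold")
print(f"strong confounder, 15% vs. 5%: crude RR inflated {strong_mild:.2f}-fold")
print(f"weak confounder,   90% vs. 5%: crude RR inflated {weak_uneven:.2f}-fold")
```

A factor of 1.0 would mean no distortion; the further the factor is from 1.0, the more the crude relative risk misstates the adjusted one.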

6. Avoiding and Addressing Confounding

When planning a study, researchers should think about methods to avoid confounding. If confounding does occur, then it must be addressed to avoid potential erroneous study conclusions. “Failure to take confounding into account in interpreting the results of a study is indeed an error in the conduct of the study and can bias the conclusions of the study,” Gordis notes. There are methods in study design to avoid confounding, and there are analytical methods to address confounding when it occurs.

6.1. Study Design:

6.1.1. Restriction:

Subjects with known potential confounders are not allowed to participate in the study. This method can be used in case control studies, cohort studies, and RCTs. For example, cigarette smokers and subjects with high cholesterol levels might be restricted from participating in a study with heart disease as an endpoint. The drawback to restriction is that the potential pool of subjects is reduced, and issues of generalizability of the study results to restricted patients can be raised.

6.1.2. Matching:

For each subject with a potential confounder in one study arm, a subject with the same potential confounder is selected for the other study arm. For example, a case control study might have thrombocytopenia (low platelets) as the disease status, while the exposure is a recent heparin treatment.

Since alcohol might itself be associated with thrombocytopenia, for each case subject who drinks alcohol, a control who drinks alcohol will also be selected. The study will therefore be matched for alcohol.

Matching can be used in case control and cohort studies. It cannot be used in RCTs where subjects by definition are not hand-selected. It is difficult to match for all the potential confounders that might exist within a study. In addition, there are problems that arise with so-called overmatching, matching for a factor that is not a true confounder.

6.1.3. Conduct an RCT:

If it is ethically and reasonably possible, conducting an RCT will tend to avoid confounding because of randomization. However, there is no guarantee that chance will work.

6.2. Study Analysis:

6.2.1. Adjustment:

Pagano notes that a “crude rate is a single number computed as a summary measure ... It disregards differences caused by age, gender, race and other characteristics.” These differences are potential confounders that could lead to erroneous study conclusions if they are not considered and addressed in an analysis.

One method of adjustment is the mathematical tool of multivariate analysis, which will be discussed later in the course. This method can yield adjusted rates for one or more confounders.

Sometimes authors perform multivariate analysis to determine if a factor is a confounder in the study. So, for example, they might report a crude relative risk as 1.5 for a study, and then say that after performing multivariate analysis, the age-gender adjusted relative risk was also 1.5. As the relative risk did not change, age and gender were not confounders.

In contrast, another author might report a crude relative risk of 3.0, and an age-adjusted relative risk of 5.0. As the relative risk changed after adjustment, age was a confounder in the study. (Assume statistical significance.)

6.2.2. Stratification:

Gordis cites an example of urbanization and air pollution as a possible confounder in a study assessing an association between lung cancer and cigarette smoking. To address this issue, the lung cancer rates of smokers vs. non-smokers are assessed for individual strata: no urbanization, slight urbanization, and town and city urbanization.

Says Gordis, “If the relationship of lung cancer to smoking is due to smoking, and not to the confounding effect of pollution and/or urbanization, then in each stratum of urbanization the incidence of lung cancer should be higher in smokers than in nonsmokers. It would then be clear that the observed association of smoking and lung cancer could not be due to degree of urbanization.”

There are also methods called direct and indirect adjustments, the former requiring a knowledge of strata specific rates while the latter does not.
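Direct adjustment can be sketched as applying each group's strata specific rates to one shared standard population. All rates and age mixes below are hypothetical, and a 50/50 standard population is an arbitrary choice made for illustration.

```python
def weighted_rate(stratum_rates, stratum_weights):
    """Overall rate: strata specific rates weighted by each stratum's share of the population."""
    return sum(stratum_rates[s] * stratum_weights[s] for s in stratum_rates)


# Hypothetical strata specific disease rates, identical in the two towns.
rates_town_a = {"young": 0.01, "old": 0.05}
rates_town_b = {"young": 0.01, "old": 0.05}

# The towns differ only in age mix.
mix_town_a = {"young": 0.20, "old": 0.80}
mix_town_b = {"young": 0.80, "old": 0.20}
standard_mix = {"young": 0.50, "old": 0.50}   # shared standard population (assumption)

crude_a = weighted_rate(rates_town_a, mix_town_a)
crude_b = weighted_rate(rates_town_b, mix_town_b)
adjusted_a = weighted_rate(rates_town_a, standard_mix)
adjusted_b = weighted_rate(rates_town_b, standard_mix)

print(f"crude rates:        A {crude_a:.3f}, B {crude_b:.3f}")
print(f"age-adjusted rates: A {adjusted_a:.3f}, B {adjusted_b:.3f}")
```

In this made-up case the crude rates differ only because of age, and the difference disappears once both towns are given the same age mix.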

7. Interaction

Whereas the previous discussions have focused on threats to internal validity, interaction, also called effect modification, is an external validity issue. Interaction does not invalidate study results. Rather, a failure to recognize and report interaction, if it exists, is a missed opportunity to note differences in outcomes, or associations, between subgroups in the study.

This notion was briefly introduced in Lecture 1 - Critiquing a Randomized Controlled Trial. The concept is that although an overall relative risk can be reported, various strata within the study might have different relative risks. In this case, the overall relative risk is actually a weighted average of the relative risks in each stratum. Knowing that the relative risks vary in different strata might be important for a clinician to know.

Interaction answers the question, “Is the relationship between the exposure and the outcome the same or different across various strata, or subgroups, within the study?” For example, suppose the relative risk for subjects assigned to Med A vs. Med B in the prevention of MIs is 0.8 over a decade. This means, of course, that subjects assigned to Med A had 80% of the risk, or 20% less risk, of having an MI during the decade than subjects assigned to Med B. (Assume statistical significance.)

Now assume that the authors looked at the relative risk by a stratum, such as men vs. women. The authors reported that the relative risk of men on Med A having an MI over the decade was 1.3 compared to men on Med B. The relative risk for women on Med A having an MI over the decade was 0.4 compared to women on Med B.

Notice that the relative risk for Med A vs. Med B was different by a so-called third factor, this being gender. Women had a lower relative risk than men, 0.4 vs. 1.3. (Assume statistical significance.) This is an example of interaction. The overall relative risk of 0.8 was a weighted average of the relative risks for men and women, weighted by the number of each in the study.
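The relative risks of 1.3, 0.4 and 0.8 can be reproduced with a small table. The counts below are hypothetical; they assume each arm has 400 men and 500 women and that men and women on Med B share the same 10% risk of MI, which is what makes the overall figure a simple weighted average.

```python
# Hypothetical counts: (MIs on Med A, subjects on Med A, MIs on Med B, subjects on Med B)
strata = {
    "men":   (52, 400, 40, 400),
    "women": (20, 500, 50, 500),
}


def relative_risk(cases_a, n_a, cases_b, n_b):
    """Risk on Med A divided by risk on Med B."""
    return (cases_a / n_a) / (cases_b / n_b)


rr_men = relative_risk(*strata["men"])
rr_women = relative_risk(*strata["women"])

# Overall relative risk: pool men and women within each arm.
totals = [sum(column) for column in zip(*strata.values())]
rr_overall = relative_risk(*totals)

print(f"men:     RR {rr_men:.2f}")
print(f"women:   RR {rr_women:.2f}")
print(f"overall: RR {rr_overall:.2f}")
```

Reporting only the overall figure would hide the fact that, in this made-up data, Med A looks harmful in men and protective in women.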

7.1. How Do Interaction and Confounding Differ?

Key statements from the Hennekens and Buring text:

  • Confounding is a nuisance effect resulting in a distortion of the true relationship between the exposure and risk of disease.
  • The aim is to control confounding and eliminate its effects.
  • Effect modification is to be described and reported, not controlled.

7.1.1. Consider the following Table 1 showing baseline characteristics for a study of Med X vs. Med Y in the prevention of MIs:

Table 1
Characteristic              Med X   Med Y
Mean Age                    60      60
Mean Cholesterol Level      220     220
% Smokers                   10      10
% Hypertensive              20      20
% Female                    40      40
% Who Exercise Regularly    5       5

The authors report that after a decade of study, the relative risk of developing an MI in subjects assigned to Med X vs. Med Y was 1.8. The authors did not comment on confounding or adjustment issues for these six characteristics.

Question: Could this study be confounded by any or all of the 6 characteristics listed in Table 1?
Answer: No. All of these characteristics, potential confounders, were perfectly distributed in Med X and Med Y subjects. If one did a multivariate analysis adjusted for any one or all of these factors, the relative risk would remain at 1.8.
Question: Is it possible that there could be interaction by age?
Answer: Yes. Suppose the authors analyzed the relative risks for the following strata: age 40-50; age 50-60; and age 60-70. Further, assume that the other five baseline characteristics were the same in the Med X and Med Y arms for each stratum, thus eliminating the possibility of confounding in the strata by these characteristics. If the authors find a relative risk of 1.0 in the age 40-50 stratum; 1.6 in the 50-60 stratum; and 2.3 in the 60-70 stratum, then interaction exists. (Assume statistical significance.) The association between the exposure (Med) and outcome (MI) was affected by a third factor, decade of life. These observations should be reported in the study.

Consider the following Table 2 showing baseline characteristics of Med S vs. Med T in the prevention of strokes:

Table 2
Characteristic    Med S   Med T
Mean Age          55      55
% Females         60      60
% Smokers         15      15
% Hypertensive    10      30

The authors report that after a decade of study, the relative risk of a stroke in subjects assigned to Med S vs. Med T was 0.6. The authors did not comment on confounding or adjustment issues.

Question: Could this study be confounded by age?
Answer: No. The mean age was the same in each study arm.
Question: Could this study be confounded by age and smoking status?
Answer: No. Both factors are equally distributed into each arm.
Question: Could this study be confounded?
Answer: Yes. There is an unequal distribution of hypertensives, a potential confounder, in each study arm. The authors should report a hypertension-adjusted relative risk. If the hypertension-adjusted relative risk remains at 0.6, there was no confounding by hypertension. If the hypertension-adjusted relative risk is not 0.6, the study was confounded by hypertension. (Assume statistical significance.)
Question: Is it possible that there could be interaction by gender?
Answer: Yes. Suppose the authors analyzed the data by gender and determined that the hypertension-adjusted relative risk for women on Med S vs. Med T was 0.4 and the hypertension-adjusted relative risk for men on Med S vs. Med T was 0.7. In this case, there was interaction by gender. Note that the relative risk for women was below 0.6 and the relative risk for men was above 0.6. Hence, the overall reported relative risk of 0.6 was a weighted average of the relative risks for men and women, weighted by the number of each in the study.
Question: Can a variable be both a confounder and an effect modifier in a given study?
Answer: There is disagreement among epidemiologists on this so we will not address this “debate” in our course.

Information for this lecture was gathered from sources including the following: Epidemiology by Leon Gordis; Principles of Biostatistics by Pagano and Gauvreau; Epidemiology in Medicine by Hennekens and Buring.

8. Ancillary Material

8.1. Readings

8.1.1. Required

  • Read Chapter 14, More on Causal Inferences: Bias, Confounding, and Interaction, Gordis Text

8.1.2. Recommended

  • Epidemiology in medicine / Charles H. Hennekens, Julie E. Buring ; edited by Sherry L. Mayrent ; foreword by Sir Richard Doll. Boston : Little, Brown, c1987.
  • Principles of biostatistics / Marcello Pagano, Kimberlee Gauvreau. Australia ; Pacific Grove, CA : Duxbury, c2000.