
Tufts OpenCourseware
Author: Janet E.A. Forrester, Ph.D.

1. Samples vs. populations

In order to understand statistical inference as it applies to the medical literature, you must first understand the difference between populations and samples.

Every 10 years the US government conducts a census of the population, in which an effort is made to count and collect basic information from every person in the country. When resources are too scarce for a census, we can do a sample survey, in which we select a sub-group of individuals who are representative of the population, collect data from the sample, and make inferences (educated guesses) about those measures in the population. Of course, if we did not measure the whole population, there will always be some error in our estimates. We call this error "sampling error".

When we do medical research, we take a sample of people representative of the sorts of persons we wish to study - perhaps candidates for a new drug treatment. Any results we get from our research are only estimates of the true effect that the medication will have in the population. The participants in a clinical trial are, statistically speaking, a sample. The population to which we are inferring the results of our clinical trial is the population of "patients like these" - people who might get the new drug once the FDA has approved it and it is on the market. We evaluate the efficacy and safety of a new drug on a sample of people. Since we know our results are just estimates, we must be cautious about how we interpret the results of our study for the public, who want to know whether we are confident that the new drug is really effective and safe based on a limited sample.

So that we do not get confused about whether we are talking about the characteristics of a population (called parameters) or the same characteristics estimated from a sample (called statistics), we use different symbols to express parameters and statistics: Greek letters are used for features of a population, and Roman symbols are used for the characteristics of a sample. Note that a parameter has no sampling error associated with it, because it is a measure of the whole population. A statistic has error associated with it, because it is taken from a sample. So, when you hear on the news a report of the percentage of the US population under 20 from the census data, there will be no error range reported. However, should the news describe a poll on voter preferences, estimated from a sample, you will often hear a reported error range of plus or minus so many percentage points.

1.1. The easiest type of sample is a truly "simple random sample" taken from the whole population of interest

Example: You want to know what percentage of people with tuberculosis (TB) in Massachusetts smoke cigarettes. Massachusetts keeps a registry of all the thousands of TB cases in the state (TB is a mandatory reportable disease). So, using a computer-generated random-number program, you randomly select 200 people with TB from the registry, and try to establish the smoking history of each person. From this you can calculate the percentage of smokers among people with TB.
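The selection step above can be sketched in a few lines of Python. The registry here is a made-up list of case IDs (the real registry and its size are not reproduced); the point is only that `random.sample` gives every case an equal chance of selection, drawn without replacement.

```python
import random

# Hypothetical registry: one ID per reported TB case (IDs and count are made up)
registry = [f"case-{i}" for i in range(1, 5001)]

random.seed(42)  # fixed seed so the draw is reproducible
sample = random.sample(registry, 200)  # 200 distinct cases, each equally likely

print(len(sample))  # 200 people selected for follow-up
```

Each selected person would then be contacted to establish their smoking history.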

1.2. If you are interested in different subpopulations (e.g. men and women separately), to make sure they are both well represented, you can create a "stratified sample".

Example: You want to know if a greater percentage of men diagnosed in Massachusetts with TB smoke cigarettes than women diagnosed with TB. Since there are many fewer women with TB in the state, if you took a simple random sample, you would probably not get many women in your sample of the registry. To get a stratified sample of men and women, you separate the men's and women's lists and then you take a simple random sample of 100 men with TB and a simple random sample of 100 women with TB from the separate lists. Then you try to establish the smoking history of each person selected.
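A stratified sample is just a simple random sample within each stratum. The sketch below assumes hypothetical registry lists split by sex (the counts are invented, chosen only so that men outnumber women, as in the example).

```python
import random

random.seed(0)
# Hypothetical registry entries, already split into strata (counts are made up)
men = [f"m-{i}" for i in range(1, 4001)]    # many more men with TB
women = [f"w-{i}" for i in range(1, 601)]   # far fewer women with TB

# Stratified sample: a simple random sample of 100 from EACH stratum,
# guaranteeing both groups are well represented
sample_men = random.sample(men, 100)
sample_women = random.sample(women, 100)

print(len(sample_men), len(sample_women))  # 100 100
```

A simple random sample of 200 from the combined list would instead yield women in proportion to their share of the registry, leaving too few for a reliable comparison.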

1.3. The most common samples used in medical research are not random samples, but are based on the availability of people to participate in your research study. These are called "convenience samples".

  • People who find out about and refer themselves to enroll in a study.
  • Patients who are seen in your office; people who died in your Health Department region, etc.

We assume that the people in the convenience sample we are using to make inferences about a population of people really do represent the population we wish to infer to. As you read medical studies, you should ask yourself to what extent that is the case in each study, and how that might affect how you would infer the results of the study to your own patients.

Because samples have error in them, we can never be sure whether the results of our study (say, a difference in cholesterol levels between treatment and placebo groups in an RCT) are due to sampling error or whether the new treatment really had an effect on reducing cholesterol. In order to answer that question we need to know something about the expected error range in the statistics we calculate from our study samples. The Central Limit Theorem (CLT) tells us about the relation between the true population value and the amount of error in the sample statistics. The CLT is key to truly understanding inferential statistics.

2. The Central Limit Theorem

In order to understand the confidence intervals and p values that are reported in the medical literature, you must first understand the concepts expressed in the central limit theorem (CLT). In class, we will use the example of the mean of a sample, x̄, to illustrate the CLT. Our example is height data from a "population" of Mexican children. (In reality, the "population" is itself a sample of 500 children.) The students in the medical class that graduated in 2002 were each asked to draw a random sample of size 5 and another random sample of size 10 from this "population" of 500 children and calculate the mean height of each sample. We then plotted the mean heights from all the students' samples to illustrate the CLT in action.

This plot of mean heights from all the samples has a special name: it is called a "sampling distribution", because it is the distribution of a statistic (the mean) derived from a series of samples. The CLT tells us that if we take many random samples from a population, calculate the mean of each sample, and plot the means, the plot will have the following characteristics:

  1. The plot of the sample means will tend to be Gaussian in shape, even if the population characteristic did not have a Gaussian distribution. (The characteristic in this example is the heights of young children in a village in Mexico.)
  2. The mean of the sample means is equal to the population mean.
  3. The standard deviation of the sampling distribution depends on both the standard deviation of the population distribution and the size of the sample (here 5 vs. 10).

The larger the sample size, the smaller the standard deviation of the sampling distribution.
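The classroom exercise can be re-created in code. The sketch below draws many samples of size 5 and of size 10 and collects their means; the 500 "population" heights are synthetic stand-ins (the real class data are not reproduced here), generated with roughly the mean reported later in this lecture.

```python
import random
import statistics

random.seed(1)
# Synthetic stand-in for the "population" of 500 children's heights, in cm
# (values are simulated, not the real data; mean chosen near 112.3 cm)
population = [random.gauss(112.3, 5.0) for _ in range(500)]

def sampling_distribution(n, draws=2000):
    """Means of many random samples of size n: an empirical sampling distribution."""
    return [statistics.mean(random.sample(population, n)) for _ in range(draws)]

means_5 = sampling_distribution(5)
means_10 = sampling_distribution(10)

# The mean of the sample means sits near the population mean, and the spread
# of the sampling distribution shrinks as the sample size grows (5 vs. 10).
print(statistics.mean(means_5), statistics.stdev(means_5))
print(statistics.mean(means_10), statistics.stdev(means_10))
```

Plotting `means_5` and `means_10` as histograms would reproduce the two sampling distributions the students drew by hand.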

The standard deviation of the sampling distribution has a special name: It is called the "standard error of the mean" (SEM or SE). "Standard error" is a term reserved to describe only the standard deviation of sampling distributions, and therefore, it is not interchangeable with the term standard deviation - which can refer to a measure of the spread of any distribution. In other words, a standard error is a special type of standard deviation - σ from a sampling distribution of a statistic.

As we said, the standard error of the sampling distribution is dependent on both the standard deviation of the population distribution and the size of the sample. The formula describing the relationship between the population standard deviation, σ, the size of the sample, N, and the standard error of the mean is:

SE = σ / √N

From this formula you can see that there will tend to be more error in estimates made from small samples than in estimates made from large samples.
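The formula is easy to check directly. The value of σ below is illustrative only (a plausible spread for children's heights in cm, not the real figure), chosen to show how the SE shrinks from N = 5 to N = 10.

```python
import math

sigma = 5.0  # illustrative population standard deviation, in cm (assumed value)

# SE = sigma / sqrt(N), for the two sample sizes used in the class exercise
se_5 = sigma / math.sqrt(5)
se_10 = sigma / math.sqrt(10)

print(round(se_5, 2), round(se_10, 2))  # the SE for N=10 is smaller than for N=5
```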

The data calculated by the medical students show that there were many different sample means, reflecting sampling error. However, most of the sample means fell on or close to the true mean height of all 500 children (112.3 cm). It was unusual for a sample mean to fall very far away from the true, population mean. The standard error of the plot of sample means was smaller for the means calculated from samples of size 10 than for the means calculated from samples of size 5.

The distribution of the sample means was more Gaussian in shape than the population heights. Since the sampling distribution of the mean is Gaussian, it must adhere to the 68-95-99.7% rule. Thus, 95% of all the medical students' sample means would be expected to fall within 2 SE of the true population mean (i.e. the mean height of all the 500 children, 112.3 cm). This statement is very powerful, if you pause to think about it, because it implies that any one sample mean would be expected to fall within 2 SE of the true population mean about 95% of the time, and only about 5% of sample means would be expected to fall more than 2 SE away from the true mean.

If we turn this statement around, we can say that the true population mean will, therefore, fall within 2 SE of the sample mean about 95% of the time. We use this information to calculate a confidence interval around our statistic - in this example, our sample mean.

3. The interpretation of a confidence interval (CI)

The confidence interval, like the CLT, is interpreted by imagining many samples being drawn, many means being calculated and many confidence intervals calculated around those means.

The interpretation of the 95% confidence interval is that 95% of confidence intervals calculated will have the true population parameter included in the range bounded by the maximum and minimum value in the CI.

While your text book, Gordis, does not agree with us on this point, we believe that for practical purposes you can think of this as meaning that you can be 95% sure that the true population parameter lies within the confidence interval you calculated around your sample statistic.

In the example used here, if I had asked the medical students to calculate a confidence interval around each of their sample means, we would expect that 95% of those confidence intervals would have contained the mean height of the population of 500 children, 112.3 cm. We also would have expected 5% of the CI not to include the mean of 112.3.
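This coverage property can be demonstrated by simulation. The sketch below repeats the exercise many times with synthetic heights (stand-ins for the real 500 children), building a mean ± 2 SE interval for each sample and counting how often the interval contains the true population mean. For simplicity it uses the known population σ in the SE, as in the formula above; in a real study σ would itself be estimated from the sample.

```python
import math
import random
import statistics

random.seed(7)
# Synthetic stand-in for the 500 children's heights, in cm (not the real data)
population = [random.gauss(112.3, 5.0) for _ in range(500)]
true_mean = statistics.mean(population)
sigma = statistics.pstdev(population)

def ci_covers(n=10):
    """Draw one sample of size n; does mean +/- 2*SE contain the true mean?"""
    sample = random.sample(population, n)
    m = statistics.mean(sample)
    se = sigma / math.sqrt(n)
    return (m - 2 * se) <= true_mean <= (m + 2 * se)

# Fraction of 2000 repeated samples whose CI covers the population mean
coverage = sum(ci_covers() for _ in range(2000)) / 2000
print(coverage)  # close to 0.95
```

About 5% of the intervals miss the true mean entirely, just as the lecture's interpretation predicts.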

Before we continue, you should realize that this lecture has demonstrated the theoretical underpinnings of statistical inference. We don't repeat studies in exactly the same way that we did above. We only used repeated samples above to illustrate the CLT. In reality, any one study, as expensive as it may be, is going to boil down to one sample statistic - just one. That statistic might be the difference in mean cholesterol between a new treatment group and a group receiving placebo, or it might be the difference in the proportion of patients alive after 5 years following new vs. conventional treatment for cancer. However, the CLT tells us that, despite the random error inherent in our one sample, we would expect to be within about 2 SE of the true answer in about 95% of the clinical studies we conduct. It is this ability to quantify our margin of error that allows our one study to be of some use to the public and allows us to say, cautiously, whether a new drug is really effective and safe based on one study.

4. Ancillary Material

4.1. Readings

4.1.1. Required

  • Read Pagano 8.1, 8.2
  • Read Pagano 9.1
  • Read Pagano 22.1 (Note: Pagano's definition of selection bias here differs from ours. See Gordis 204-206 for our definition of selection bias)