Tufts OpenCourseware
Author: Alice Tang, Ph.D.
Objectives
  • Describe the Chi-square test for nominal data
  • Describe the Paired t-test
  • Describe the Two-Sample t-test
  • Describe the One-sample t-test
  • Discuss the Student’s t-distribution
  • Review the Central Limit Theorem

1. The Student’s t-distribution

1.1. Review of the Central Limit Theorem (CLT):

If we were to take a sample of size n from a population and calculate its mean, then repeat this many times and graph all of those means on one curve, we would get a “sampling distribution” of the mean. For large n, this sampling distribution will be approximately normally distributed, with mean = µ and standard deviation = σ/√n, where σ is the standard deviation of the population.
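The repeated-sampling idea can be tried out with a small simulation. This is only a sketch: the exponential population (mean 10, standard deviation 10), the sample size of 50, and the function name `simulate_sample_means` are illustrative assumptions, not part of the lecture.

```python
import math
import random
import statistics

def simulate_sample_means(pop_mean, n, repeats, seed=0):
    """Draw `repeats` samples of size n from a skewed (exponential)
    population and return the list of sample means."""
    rng = random.Random(seed)
    means = []
    for _ in range(repeats):
        sample = [rng.expovariate(1.0 / pop_mean) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return means

# Hypothetical population: exponential, so mean = SD = 10 (clearly not normal).
mu = sigma = 10.0
n = 50
means = simulate_sample_means(mu, n, repeats=5000)

print("mean of the sample means:", round(statistics.fmean(means), 2))
print("SD of the sample means:  ", round(statistics.stdev(means), 2))
print("sigma / sqrt(n):         ", round(sigma / math.sqrt(n), 2))
```

Even though the population itself is skewed, the sample means should centre near µ with a spread close to σ/√n.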

The CLT is very powerful, but it has two limitations: 1) it depends on a large sample size, and 2) to use it, we need to know the standard deviation of the population.

In reality, we usually don’t know the standard deviation of the population so we use the standard deviation of our sample (denoted as ‘s’) as an estimate.

Standard error (SE) = s/√n

Since we are estimating the standard deviation using our sample, the sampling distribution will not be normal (even though it appears bell-shaped). It is a little shorter and wider than a normal distribution, and it’s called a t-distribution. The t-distribution is actually a family of distributions: there is a different distribution for each value of n-1 (the degrees of freedom). The shape of t depends on the size of the sample…the larger the sample size, the more confident we can be that ‘s’ is near ‘σ’, and the closer t gets to Z.

Because it is not normal, the t-distribution does not follow the 68-95-99.7 rule, but we can use t-tables or computer programs to find the area-under-the-curve (probability) associated with a specific t-score, t = (x̄ - µ) / SE, where x̄ is the sample mean, and a specific sample size.

[Figure: Z-distribution compared with t-distribution]

Again, the t-distribution approaches the normal distribution as n approaches infinity.
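To see how the tails of t shrink toward those of Z, the tail area can be computed numerically. The sketch below integrates the t density with Simpson's rule instead of using a table; the helper names (`t_pdf`, `t_two_sided_p`, `normal_two_sided_p`) and the choice of degrees of freedom are illustrative assumptions.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_two_sided_p(t, df, steps=2000):
    """Area in both tails beyond |t|, using Simpson's rule on [0, |t|]."""
    t = abs(t)
    if t == 0:
        return 1.0
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_0_to_t = total * h / 3
    return 1.0 - 2.0 * area_0_to_t

def normal_two_sided_p(z):
    """Area in both tails of the standard normal beyond |z|."""
    return math.erfc(abs(z) / math.sqrt(2))

# Tail area beyond +/-1.96 for a few degrees of freedom, then for Z.
for df in (4, 24, 198):
    print(f"df = {df:3d}: {t_two_sided_p(1.96, df):.4f}")
print(f"normal  : {normal_two_sided_p(1.96):.4f}")
```

The printed areas should decrease toward the normal value of about 0.05 as the degrees of freedom increase.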

2. Statistical tests for Continuous data

2.1. One-sample t-test:

How do we decide if a continuous measure taken on a sample of people is significantly more extreme than we might find by chance alone?

Remember that if we had taken a sample many times, the means collectively would form a curve around the true value that follows the t-distribution. So, whenever we are testing a sample mean, we use a t-statistic with SE in the denominator. Because SE shrinks as n grows, the larger the sample size, the more tightly the sample means cluster around the true value.

To test a sample of normal continuous data, we need:

  1. An expected value = the population or true mean
  2. An observed mean = the average of your sample
  3. A measure of spread: standard error
  4. Degrees of freedom (df) = n-1 (number of values used to calculate SD or SE)

Then, we can calculate a test statistic to be compared to a known distribution. In the case of continuous, normal data, it’s the t-statistic and the t-distribution.

t = (observed mean - expected mean) / SE

*Notice that t is a measure of the difference between your data and what you expect to see, in units of standard error. This is a common theme of testing continuous variables.

An example: You would like to see whether your clinic of HIV-positive men has more extreme testosterone levels than you would expect by chance. The lab tells you that, among healthy men, 1) Testosterone levels are normally distributed; 2) the average population testosterone level is 600 ng/dl.

  1. Null hypothesis: Testosterone levels (your clinic) = Testosterone levels (general population)=600.
  2. Alternative hypothesis: Testosterone levels (clinic) ≠ 600; 2-sided
  3. Set alpha=0.05
  4. Sample your patients: 25 men who happen to visit in July. The results return with a mean testosterone = 500 ng/dl in your patients, SD=200 ng/dl. The average seems pretty good to you; it’s close to 600. You calculate SE = SD / √n = 200/√25 = 200/5 = 40.
    t = (500 - 600) / 40 = -2.5

    Your results are 2.5 standard error units below the expected value. The degrees of freedom are n-1 = 25-1 = 24.

  5. You use a computer program or a statistical table (see textbook Table A.4) to look up the t-distribution with 24 degrees of freedom. A t of 2.5 (positive and negative values are handled the same because the curve is symmetric) cuts off an area of about 0.01 in each tail. Because you’re doing a two-tailed test, you need to consider both tails, or 2 x 0.01 (again because the t curve is symmetric). In this case, p=0.02. Under the assumption that the true mean testosterone value of these patients is 600, the probability of getting a sample mean of 500, or one more extreme in either direction from 600, by random sampling alone is only 2%.

    Note: If the sample size is small (less than about 30), a t-statistic larger than approximately 2 in absolute value will be needed to achieve a p less than 0.05 (for example, about 2.06 with 24 degrees of freedom). If you set alpha at 0.01, an even larger t will be needed to achieve statistical significance. The t-value needed to achieve statistical significance with a given alpha and a given sample size is called the critical value. With large samples (>30), a two-sided test, and alpha=0.05, the critical value for t is near 2.0 (because the t-distribution is then close to the normal distribution, whose critical value is 1.96).
  6. P is less than or equal to alpha, so you reject the null hypothesis. You conclude that the average you saw was unlikely to have occurred by chance alone, and that your patients’ testosterone levels are lower on average than a healthy population.
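The arithmetic of the testosterone example can be checked with a few lines of code. This is a minimal sketch: the function name `one_sample_t` is hypothetical, and the critical value 2.064 (two-sided, alpha = 0.05, 24 degrees of freedom) is taken from a standard t-table rather than computed.

```python
import math

def one_sample_t(observed_mean, expected_mean, sd, n):
    """Return (t, SE, df) for a one-sample t-test from summary statistics."""
    se = sd / math.sqrt(n)
    t = (observed_mean - expected_mean) / se
    return t, se, n - 1

# Numbers from the clinic example: mean 500, SD 200, n = 25, expected 600.
t, se, df = one_sample_t(500, 600, 200, 25)

# Assumed table value: two-sided critical t for alpha = 0.05 and 24 df.
T_CRITICAL_24_DF = 2.064
reject_null = abs(t) > T_CRITICAL_24_DF

print(f"SE = {se}, t = {t}, df = {df}, reject H0: {reject_null}")
```

Comparing |t| with the critical value leads to the same decision as comparing p with alpha.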

2.2. Two-sample t-test:

You can also use the t-test to compare two different groups of continuous data as the outcome. Let’s say that you’ve just completed a randomized clinical trial comparing the diastolic blood pressures of hypertensive people treated with either a new drug (n = 100) or placebo (n = 100) for 10 weeks.

H0: DBP (drug) = DBP (placebo), alpha=0.05

After treatment:
Group 1 has mean DBP of 90 (SD=10) and group 2 has mean DBP of 100 (SD=11)

In this case we measure the difference between the DBPs of the two groups. Under the null hypothesis, we propose that this difference equals 0.

Observed difference = DBP (drug) - DBP (placebo) = 90 - 100 = -10
Expected difference = 0

We can calculate an estimate of the SE of this difference from our data. (The textbook gives the general formula; with two groups of equal size it reduces to the one below):

SE (difference) = √(SD1²/n1 + SD2²/n2) = √(10²/100 + 11²/100) = √2.21 ≈ 1.49

t = (Observed difference - Expected difference) / SE (difference) = (-10 - 0) / 1.49 ≈ -6.7

Degrees of freedom = (n1-1)+(n2-1)=(100-1)+(100-1) = 198
This result is about 6.7 SE units below the expected result under H0. Using the t-distribution table with 198 degrees of freedom, the corresponding p-value is far below 0.05 (p <<< 0.05), so we reject H0.
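As a cross-check, the SE of the difference can be computed directly from the summary statistics. The sketch below assumes the textbook formula is the usual pooled-variance one; the function name `two_sample_t` is hypothetical. For these data (means 90 and 100, SDs 10 and 11, n = 100 per group) it gives an SE of about 1.49 and a t of about -6.7.

```python
import math

def two_sample_t(mean1, sd1, n1, mean2, sd2, n2):
    """Pooled-variance two-sample t-test from summary statistics.

    Returns (t, SE of the difference, degrees of freedom).
    """
    df = (n1 - 1) + (n2 - 1)
    # Weighted average of the two sample variances.
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    expected_diff = 0  # under the null hypothesis
    t = ((mean1 - mean2) - expected_diff) / se_diff
    return t, se_diff, df

# Drug group vs. placebo group from the blood pressure example.
t, se_diff, df = two_sample_t(90, 10, 100, 100, 11, 100)
print(f"SE(difference) = {se_diff:.2f}, t = {t:.2f}, df = {df}")
```

Pooling assumes the two groups have similar variances, which seems reasonable here since the SDs are 10 and 11.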

2.3. Paired t-test:

Sometimes data are paired, for example, if you want to know whether diabetics have the same blood sugar after a particular treatment as they did before the treatment. In this case, the “before” and “after” are not independent: they are taken from the same person. What you are testing is the change in the same individual. When your data are paired, you basically create one set of data by calculating each person’s change, then do a one-sample t-test on those changes.

Observed change: Average of the changes in each individual
Expected change: 0
SE = SE of the change
Degrees of freedom = (number of pairs)-1
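The "calculate each person's change, then do a one-sample t-test" recipe looks like this in code. The five before/after blood sugar values are made up purely for illustration; they are not data from the course.

```python
import math
import statistics

# Hypothetical blood sugar values (mg/dL) for five patients; not real data.
before = [180, 200, 170, 190, 210]
after = [170, 185, 172, 180, 195]

# One change per person turns paired data into a single sample.
changes = [a - b for a, b in zip(after, before)]

n_pairs = len(changes)
mean_change = statistics.fmean(changes)                      # observed change
se_change = statistics.stdev(changes) / math.sqrt(n_pairs)   # SE of the change
expected_change = 0                                          # under H0
t = (mean_change - expected_change) / se_change
df = n_pairs - 1

print(f"mean change = {mean_change:.1f}, SE = {se_change:.2f}, "
      f"t = {t:.2f}, df = {df}")
```

The resulting t would then be looked up in the t-distribution with (number of pairs) - 1 degrees of freedom, exactly as in the one-sample test.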

Testing the means of 3 or more groups of continuous, normally distributed data to see if they are all equal to one another: For this we would use an entirely different test called the analysis of variance, commonly referred to as ANOVA. It also gives us a p-value.

2.4. Testing continuous, non-normal data

  1. All of the above tests assume that your data are normally or approximately normally distributed, or your sample size is large enough to apply the properties of the central limit theorem. But sometimes your data are not normal and your sample size is relatively small. You can try to mathematically transform the data into a normal distribution (for example by taking the square root, or the logarithm of all the values). If you can make them normal, you can use the t-tests or ANOVA.
  2. If the data are still not normally distributed, we use a different class of tests known as “non-parametric” tests, e.g. the Mann-Whitney U test. These tests are based on the ranking or ordering of the data, rather than their numerical values.
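To illustrate what "based on the ordering of the data" means, here is a sketch of how the Mann-Whitney U statistic can be computed by pairwise comparison. The two small groups are invented for illustration, the function name `mann_whitney_u` is hypothetical, and the p-value lookup (from a U table or software) is left out.

```python
def mann_whitney_u(group_a, group_b):
    """Return (U_a, U_b): U_a counts the (a, b) pairs in which a > b,
    with ties counted as one half; U_b is the complementary count."""
    u_a = 0.0
    for a in group_a:
        for b in group_b:
            if a > b:
                u_a += 1.0
            elif a == b:
                u_a += 0.5
    u_b = len(group_a) * len(group_b) - u_a
    return u_a, u_b

# Hypothetical skewed data: one extreme value in the first group.
group_a = [1.1, 2.3, 2.9, 40.0]
group_b = [3.5, 4.2, 5.0, 6.1]
print(mann_whitney_u(group_a, group_b))

# Making the extreme value ten times larger leaves U unchanged,
# because only the ordering of the values matters.
print(mann_whitney_u([1.1, 2.3, 2.9, 400.0], group_b))
```

This insensitivity to how extreme an outlier is explains why rank-based tests suit non-normal data.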

2.5. Statistical Test for Nominal Data:

Categorical or nominal data is usually tested with the Chi-square test statistic. Here’s an example:

  • Null hypothesis: Cigarette use does not affect the risk of lung cancer in men; or Proportion of smokers who get lung Ca = Proportion of nonsmokers who get lung Ca
  • Alternative hypothesis: The two proportions are not equal (two-sided test)
  • Set alpha = 0.05
  • Study Design: 20-year cohort study of 210 men, ages 30-50 living in Garrett County, MD (convenience sample). After 20 years, we OBSERVE:
                  Lung Ca    No Ca     Total
    Smokers       25 (A)     75 (B)    100
    Nonsmokers    17 (C)     93 (D)    110
    Total         42         168       210

    Smokers and nonsmokers are the two groups being compared. The data of interest is the rate of lung cancer, which is a categorical variable (yes/no). This is a 2x2 table; it has 4 cells; each is arbitrarily named A-D.

    For categorical data, use a Chi-square test statistic:

    χ² = Σ [ (Observed - Expected)² / Expected ]

    We calculate EXPECTED values under the null hypothesis of no difference between the two groups (smokers/nonsmokers) using the rate of cancer in the whole group of 20% (42/210):

                  Lung Ca             No Ca               Total
    Smokers       100 x 20% = 20      100 x 80% = 80      100
    Nonsmokers    110 x 20% = 22      110 x 80% = 88      110
    Total         42                  168                 210

    Then, we can calculate the chi-square test statistic:

    Cell    Observed    Expected    O-E    (O-E)²    (O-E)²/E
    A       25          20           5     25        25/20 = 1.25
    B       75          80          -5     25        25/80 = 0.31
    C       17          22          -5     25        25/22 = 1.14
    D       93          88           5     25        25/88 = 0.28

    χ² = Σ[(O-E)²/E] = 2.98

    Getting a p-value:


    • Calculate the “degrees of freedom” (df) = (# rows - 1) * (# columns - 1)
    • For example, a 2x2 table always has: (2 - 1) * (2 - 1)= 1*1 = 1 df

    The probability of seeing any particular chi-square value by chance alone, for any number of degrees of freedom, has been calculated under the chi-square distribution. These probabilities can be found in χ² tables or computer programs. So, we look up the probability of getting this value of 2.98 (or one more extreme) with 1 degree of freedom by chance alone…p≈0.08. (By the way, although we’re doing a two-tailed test, we don’t double the proportions in the tails from the table…by design, the chi-square test is always 2-sided).

    P>alpha, so we cannot reject our null hypothesis.

    Conclude: The difference we observed between cigarette smokers and non-smokers in the rate of lung cancer could have occurred by chance alone.

    Note: If there are too few data in a single cell of an r x c table (an expected count of less than 5 in a cell), the chi-square test is not accurate. You then need to use a special test called Fisher’s Exact test.
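The chi-square calculation for the smoking example can be reproduced in code. In this sketch the function name `chi_square_2x2` is hypothetical, and the p-value uses a shortcut that is valid only for 1 degree of freedom (the upper-tail area equals `erfc(sqrt(χ²/2))`); larger tables would need a χ² table or a statistics package.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Chi-square test for a 2x2 table with cells A, B (row 1) and C, D (row 2).

    Returns (chi-square statistic, degrees of freedom, p-value).
    """
    observed = ((a, b), (c, d))
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    n = a + b + c + d

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count under H0: row total x overall column proportion.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected

    df = (2 - 1) * (2 - 1)
    p = math.erfc(math.sqrt(chi2 / 2))  # upper-tail area; 1 df only
    return chi2, df, p

# Observed counts from the cohort study: A = 25, B = 75, C = 17, D = 93.
chi2, df, p = chi_square_2x2(25, 75, 17, 93)
print(f"chi-square = {chi2:.2f}, df = {df}, p = {p:.3f}")
```

The p-value is above alpha = 0.05, in line with the conclusion that the null hypothesis cannot be rejected.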

3. SUMMARY OF STATISTICAL TESTS:

CATEGORICAL DATA         Enough data       Too little data (<5 in a cell)
Any r x c table          Chi-square        Fisher's Exact

CONTINUOUS DATA          Normal (even if transformed to normal) or large n      Not normal (non-parametric tests)
One (group) sample       1-sample t-test                                        Kolmogorov-Smirnov
Two samples              2-sample t-test                                        Mann-Whitney U or Rank Sum
Paired data              1-sample t-test on paired differences (paired t-test)  Wilcoxon Signed-Rank
Three or more samples    Analysis of variance (ANOVA)                           Kruskal-Wallis

* You are not responsible for memorizing the last (non-parametric) column of the continuous data table, but you should know that it's there and what it's for.

4. Ancillary Material

4.1. Readings

4.1.1. Required

  • Chapter 9, Section 9.3
  • Chapter 10, Sections 10.1 and 10.2
  • Chapter 11, Sections 11.1 and 11.2
  • Chapter 13, Introduction and Section 13.4
  • Chapter 15, Section 15.1