Introduction


Consider two examples in which samples are to be used to estimate some parameter in a population:

  1. Suppose I wish to estimate the mean weight of the freshman class entering Boston University in the fall, and I select the first five freshmen who agree to be weighed. Their mean weight is 153 pounds. Is this an accurate estimate of the mean value for the entire freshman class? Intuitively, you know that the estimate might be off by a considerable amount, because the sample is very small and may not be representative of the entire class. In addition, if I were to repeat this process, taking multiple samples of five students and computing the mean for each sample, I would likely find that the estimates varied from one another by quite a bit. This also implies that some of the estimates are very inaccurate, i.e., far from the true mean for the class.
  2. Suppose I have a box of marbles that are either blue or yellow, and I want you to estimate the proportion of blue marbles without looking into the box. I shake up the box and allow you to select 4 marbles and examine them to compute the proportion of blue marbles in your sample. Again, you know intuitively that the estimate might be very inaccurate, because the sample size is so small. If you were to repeat this process and take multiple samples of 4 marbles to estimate the proportion of blue marbles, you would likely find that the estimates varied from one another by quite a bit, and many of the estimates would be very inaccurate.

The variables underlying the two estimates differ in type. In the first example, body weight is a measurement variable, which can take any of an infinite number of values on a continuous scale, and the parameter being estimated is a mean. In the second example, marble color is a discrete variable (one that can take only a limited number of values, here blue or yellow), and the proportion of blue marbles in each sample was used to estimate the proportion of blue marbles in the entire box. Nevertheless, while these variables are of different types, both examples illustrate the problem of random error when using a sample to estimate a parameter in a population.
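The two examples can be mimicked with a short simulation. The sketch below is illustrative only: the class size (4,000), the weight distribution (mean 150 pounds, standard deviation 25), the box contents (1,000 marbles, 40% blue), the larger comparison sample sizes, and the function name `repeated_estimates` are all assumptions made for this illustration, not values from the examples above. It draws 1,000 repeated samples for each scenario and reports how widely the estimates scatter around the true value.

```python
import random
import statistics


def repeated_estimates(population, sample_size, n_repeats, rng):
    """Return the sample mean from each of n_repeats simple random samples."""
    return [
        statistics.mean(rng.sample(population, sample_size))
        for _ in range(n_repeats)
    ]


rng = random.Random(2024)  # fixed seed so the run is repeatable

# Hypothetical freshman class: 4,000 weights, roughly normal (mean 150 lb, SD 25).
weights = [rng.gauss(150, 25) for _ in range(4000)]

# Hypothetical box: 1,000 marbles, 400 blue (coded 1) and 600 yellow (coded 0).
# The mean of the 0/1 codes in a sample is the sample proportion of blue marbles.
marbles = [1] * 400 + [0] * 600

scenarios = [
    ("mean weight, n=5", weights, 5),
    ("mean weight, n=50", weights, 50),
    ("proportion blue, n=4", marbles, 4),
    ("proportion blue, n=40", marbles, 40),
]

results = {}
for label, population, n in scenarios:
    estimates = repeated_estimates(population, n, 1000, rng)
    results[label] = estimates
    print(
        f"{label:>22}: true value = {statistics.mean(population):.2f}, "
        f"estimates range from {min(estimates):.2f} to {max(estimates):.2f}, "
        f"SD of estimates = {statistics.stdev(estimates):.2f}"
    )
```

Under these assumptions, the estimates from the very small samples should scatter much more widely around the true value than those from the larger samples, which is the random error described above.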

The problem of random error also arises in epidemiologic investigations. The basic goals of epidemiologic studies are a) to measure a disease frequency or b) to compare measurements of disease frequency in two exposure groups in order to measure the extent to which the exposure is associated with a health outcome. However, both of these estimates might be inaccurate because of random error.

Essential Questions

  1. How do we distinguish differences that are real from those that are just due to chance?
  2. How do we assess the uncertainty from samples?

Examples of Where This is Leading:

Learning Objectives

After completing this section, you will be able to: