I want to talk very briefly about estimating the proportion in a single group and then using a 95% confidence interval in order to assess the precision of that estimate.
In my hypothetical example here, imagine that there may in fact be a fairly large number of humans who have contracted bird flu.
We don't know how many, and many of these may have escaped the attention of medical personnel.
But let's say that at a very early stage we have identified 8 individuals who had confirmed bird flu, and we obtained detailed information on these people. There might be a number of things that we would like to know about them,
but certainly one of the things we would like to know is what percentage of them, i.e., what proportion of them, died as a result of the bird flu.
In this hypothetical example I have eight individuals who have confirmed bird flu, and you can see that I've indicated that four of them died, so the case-fatality rate is 50%. But this is a small sample, and therefore it does not have a great deal of precision. We're not really sure if this would really apply to the entire population of humans who might contract bird flu.
But we can get a handle on the precision by computing the 95% confidence interval for that single proportion.
So, we're not making any comparisons here; we just have a single group. As you can see, I used my Epi_Tools spreadsheet, specifically the worksheet that says "CI One Group," meaning confidence interval in one group. In this instance I have entered a numerator of 4 and a denominator of 8, and it computes an estimate of the proportion, which is 50%. What I'm really interested in is the 95% confidence interval, which you can see ranges from about 21.5% to about 78.5%.
This is a broad confidence interval. We don't have a lot of precision, but we would interpret this by saying the estimated case-fatality rate was 50%. This is our point estimate, and the 95% confidence interval ranges from 21.5% to about 78.5%. With 95% confidence the true proportion lies within this range.
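For anyone who wants to reproduce this calculation outside the spreadsheet, here is a minimal sketch in Python. It assumes a Wilson score interval, which happens to reproduce the 21.5% to 78.5% figures quoted here; the method the Epi_Tools worksheet actually uses is not stated in this talk, so treat that as an assumption. The function name `wilson_interval` is hypothetical and only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(events, n, confidence=0.95):
    """Wilson score interval for a single proportion (events out of n).

    Assumption: this is one common method for a one-group interval;
    it is not confirmed to be the spreadsheet's own formula.
    """
    if n <= 0 or not 0 <= events <= n:
        raise ValueError("need n > 0 and 0 <= events <= n")
    # Two-sided critical value, about 1.96 for 95% confidence
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p_hat = events / n  # point estimate
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


# 4 deaths among 8 confirmed cases
low, high = wilson_interval(4, 8)
print(f"point estimate: {4 / 8:.1%}")
print(f"95% CI: {low:.1%} to {high:.1%}")
```

With a numerator of 4 and a denominator of 8, this gives an interval of roughly 21.5% to 78.5%, in line with the worksheet.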
So again we are acknowledging that there is a certain degree of imprecision in this estimate because of the small sample size.
But what would have happened if we had, in fact, obtained a larger sample?
This next spreadsheet shows a series of four calculations, and each of them shows a progressively larger sample size shown by the denominator in column B.
This was the original one in row 9. It shows the 4 out of 8 that we originally examined, and the 95% confidence interval was 21.5 to 78.5%.
If we were to get a larger sample, say 16 people, we wouldn't necessarily get the same proportion. It doesn't have to be 50% again, but let's suppose for the sake of argument, just for comparison, that in fact 8 of the 16 died. In that case, the case-fatality rate would again be 50%, but now notice that the confidence interval has narrowed to about 28 to 72%.
And if I had a sample of 100 individuals and I were to find that 50% of them died, the confidence interval would then be from about 40% to about 60%. And finally, with 1000 individuals, 50% of whom died, the 95% confidence interval narrows further to about 47 to 53%. So, you can see that as my sample size has gotten larger, the confidence interval has gotten narrower, indicating that I have greater precision in my estimate. I have more confidence in estimates obtained with large samples.
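The same comparison can be sketched in Python by holding the proportion at 50% and letting the denominator grow. As before, this assumes a Wilson score interval (an assumption that happens to agree with the figures quoted here, not a confirmed description of the spreadsheet), and `wilson_interval` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(events, n, confidence=0.95):
    """Wilson score interval for events out of n (assumed method)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    p_hat = events / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


# (deaths, cases) pairs from the comparison: always 50%, growing sample size
samples = [(4, 8), (8, 16), (50, 100), (500, 1000)]

results = []
for events, n in samples:
    low, high = wilson_interval(events, n)
    results.append((n, low, high, high - low))
    print(f"n={n:>4}  estimate={events / n:.0%}  "
          f"95% CI: {low:.1%} to {high:.1%}  width={high - low:.1%}")
```

Each row keeps the same 50% point estimate, and the printed width shrinks from row to row, which is the pattern the four spreadsheet rows show.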
So the precision of these estimates is going to be very dependent on sample size.
The greater the sample size is, the narrower the confidence interval is, and the more precise the estimate is.
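One way to see why sample size matters so much is the simple large-sample approximation for a proportion, sketched below. This is the textbook normal approximation, not necessarily the formula behind the spreadsheet, and it is rough for very small samples such as 8; here \(\hat{p}\) is the observed proportion and \(n\) is the sample size.

```latex
\[
\hat{p} \;\pm\; 1.96\,\sqrt{\frac{\hat{p}\,(1-\hat{p})}{n}},
\qquad
\text{width} \;\approx\; 2 \times 1.96\,\sqrt{\frac{\hat{p}\,(1-\hat{p})}{n}}
\;\propto\; \frac{1}{\sqrt{n}}
\]
```

With \(\hat{p} = 0.5\), this gives a width of about 0.196 for \(n = 100\) (roughly 40% to 60%) and about 0.062 for \(n = 1000\) (roughly 47% to 53%). Because the width falls with the square root of \(n\), you need about four times the sample size to cut the width in half.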