Using Spreadsheets in Public Health
Excel (PCs) and Numbers (Mac & iPad)

Introduction


Spreadsheets are very useful tools in public health because they are widely available, and can be used for collecting data, statistical analysis, constructing graphs and tables which can be exported into other applications or converted into image files. The basic function of Excel spreadsheets (for PCs) and Numbers (for Mac and iPad) is similar; they all consist of a matrix of rows (1, 2, 3, 4, etc.) and columns (A, B, C, ... Z, AA, AB, AB, etc.) which can be used to store text or numeric data and to perform a wide variety of mathematical and other functions that greatly facilitate analysis, organization, and reporting.

However, there are differences between Excel and Numbers in the user interface and the menus, and there are also differences among different versions of Excel. Nevertheless, the basic layout and functions within all of these are quite similar, and it is also noteworthy that Excel spreadsheets will open and run in Numbers, and Numbers spreadsheets can be "Saved as" Excel spreadsheets.

This tutorial will provide videos that illustrate the user interface for various versions of Excel and for Numbers '09. These will introduce users to whichever version of Excel or Numbers they have. The examples that we have developed to illustrate the application of spreadsheets in public health practice have been created in Excel, but the example spreadsheets can be downloaded and opened in whichever spreadsheet application you are using, and the examples are more or less generic and should be easily adapted to the spreadsheet you are using.

 

 

Excel 2010


Basics

The figure below shows and empty spreadsheet in Excel 2010. The top rows show the main menu elements (File, Home, Insert, etc.), and beneath that is a toolbar which provides quick access to a number of commonly used tools. The bottom of the figure shows the matrix of columns and rows that comprise the actual spreadsheet. This window shows just a tiny fraction of the available spreadsheet. The columns are labeled with letters, and if we scrolled to the right, we would see that once one gets to column "Z", the columns continue with "AA", "AB", etc. to provide an enormous number of columns all the way to "XFD". Similarly, there are many available rows numbered 1 through 1,048,756. This large area makes it possible to story large amounts of data in tabular form.

Orientation and Navigation

This video provides a short orientation to the Excel 2010 spreadsheet how to move around within it.. Topics include:

The menus in previous versions will be different from these, but navigation and other design aspects will be the same.

Entering Data

 This video illustrates:

To view the video in full screen mode, click on the icon at the very lower right, next to "You Tube".

Exercise I - Basics


 

Name

Age

Weight (lbs)

Height (in)

Weight (kg)

Height (meters)

BMI (kg/m2)

Joe

45

272

72

 

 

 

Jane

37

165

65

 

 

 

Sarah

55

223

64

 

 

 

Tom

61

201

62

 

 

 

Ayesha

48

193

68

 

 

 

Sally

41

289

63

 

 

 

Dorothy

29

213

70

 

 

 

James

38

188

66

 

 

 

 

The worksheets at the bottom of your screen have titles. The default names are "Sheet 1", "Sheet 2," etc. You can change the names by double-clicking on the worksheet title to select it, and then typing in a new name.

  1. Use the scroll bar at the right side of the window to scroll down. Note that the column titles disappear.
  2. In Excel 2010 you can freeze the column titles at the top, so they are always visible. First place your cursor in cell A2 to indicate that you want the cells in the first row to freeze. Then click on "View", then select "Freeze Panes". Note that when you do this the line between rows 1 & 2 becomes a little bit darker.
  3. Now scroll down and up again, noting that you can see the column titles at all times.

=D2*2.54/100

This should give a result of 1.8288 meters.

=E2/(F2^2)

You should get a BMI of 36.89. Note also that the cell is already formatted to two decimal places.

Note also that the parentheses around (F2^2) are critical, since they indicate that the value of F2 (meters) should first be squared (i.e., raised to the second power), and the result should then be divided into E2.

In cells A11 through A15 enter the following labels: Count, Sum, Average, Std. Deviation, Median.

In cell B11 enter the function "=COUNT(B2:B9)".

In cell B12 enter the function "=SUM(B2:B9)"

In cell B13 enter "=AVERAGE(B2:B9)".

In cell B14 enter "=STDEV.S(B2:B9)".

In cell B15 enter "=MEDIAN(B2:B9)".

Now replicate these functions to the right in order to apply them to columns C through G.

 

Now select rows 12-15 and format them to one decimal place.

 

Exercise 2 - Graphing


In the handouts on the right side of the page there is a link to a spreadsheet called "MAVEN data set." This contains de-identified data from MAVEN (the Massachusetts Virtual Epidemiologic Network), which is used for surveillance of infectious diseases.

The lower portion of the first worksheet lists partial data for over 1,500 cases of infectious disease reported to MAVEN. (For simplicity, the data set you see contains some of the more common infectious diseases). Note that I have already sorted the data by type of infection.

In the uppermost portion of the spreadsheet I have created a summary table that uses the "COUNT" function to count the number of cases of each type of infection. 

Your charts should look something like the ones below:

Creating an Epidemic Curve

In the handouts on the right side of this page, the Excel file called "Epi_Tools.XLS" has two worksheets, each demonstrating a simple way to create an epidemic curve when you have dates of onset for a series of cases. The first method, simply sorting the cases by date of onset and then counting the number of cases occurring at, say, two day intervals, is illustrated in the worksheet called "Epi Curve 1". The second method, using pivot tables, is illustrated in the worksheet called "Epi Curve 2".

Exercise 3 - Solving an Outbreak of Hepatitis A (Case-Control Study)


In the handouts on the right side, open the Excel file call "Hepatitis Outbreak". Save this file to your local computer or thumb drive.

This file contains actual data from investigation of a hepatitis outbreak that occurred in Marshfield, MA in 2004. There were 20 cases reported within a short period of time. The Massachusetts Department of Public Health was asked to investigate the outbreak. Descriptive epidemiology, including hypothesis-generating interviews, suggested that the source was most likely a food handler at one of several possible food establishments.

The investigators conducted a case-control study. They created a questionnaire and collected data from 19 cases and 38 controls. The controls were matched to the cases by age group and by neighborhood of residence. The data is in the first worksheet in the Excel file above. Note that the data indicating whether the subjects were cases or control are inconsistent, with a variety of responses indicating yes or no. The same is true for the data indicating which food establishments they were exposed to.

 You now have the information you need to compute odds ratios and 95% confidence intervals for each of the food establishments. 

 

Exercise 4 - Analysis of a Cohort Study


Is there an association between High Density Lipoprotein Cholesterol (HDLC) levels and myocardial infarction?

In Handouts on the right-hand side of the spreadsheet module there is an Excel file with actual data from the Framingham Heart Study. In this data set there a variable called labeled HDLC in column R. This is the subject's value for high density lipoprotein cholesterol at the beginning of the study (baseline). HDLC is sometimes called the 'good' cholesterol, because higher levels of HDLC seem to be associated with a lower risk of atherosclerotic heart disease. [For more on lipoproteins and heart disease see the discussion of this topic in the PH709 module on Atherosclerosis]

In this analysis you will examine the association between low HDLC (i.e. values < 40) and risk of myocardial infarction. In the spreadsheet with the Framingham data set there is a variable called MI_FCHD in column W which indicates whether the subject was hospitalized for a myocardial infarction or died of a myocardial infarction during this phase of the study. This is the outcome of interest for this particular analysis. TIMEMIFC in column AD indicates the disease free observation time (in days) for having a fatal or non-fatal myocardial infarction (heart attack). This time represents the number of days elapsed from baseline until one of three things happened: 1) they had a myocardial infarction, 2) they were lost to follow-up, or 3) the subject reached the last date for this phase of the study without an event. Note that many subjects have 8,766 days of TIMEMIFC; these are all subjects who had the maximum number of observation days because they were observed from baseline until the end of this phase of the study (8.766) and did not have a myocardial infarction during that entire period.

Example

This is a 21 minute video which walks you through a similar analysis comparing the risk of myocardial infarction between subjects with high LDL (> 130) and low LDL (<130). This will provide detailed instructions on performing an analysis in a cohort study.

Detailed Instructions:

Open the Framingham data set in the Handouts section of the online module and save it to your computer using "File", "Save as". Then analyze the association between low HDLC (HDLC <40) and myocardial infarction (MI_FCHD). Compute BOTH the cumulative incidence and the incidence rates in the exposed and unexposed groups. Use these results to compute the Risk Ratio (using cumulative incidence) and the Rate Ratio (using the incidence rates).

Note: if there are individuals who do not have a recorded value for HDLC, these subjects should be excluded from the analysis.]

You will need to compute:

  1. The number of 'events' (MI_FCHD) in exposed and unexposed groups.
  2. The number of subjects in exposed (HDLC<40) and unexposed (HDLC>40) groups.
  3. The person-time (in years) for the appropriate groups.
  4. The incidence rate in person-years for each group.
  5. The cumulative incidence for each group.

With this information complete your analysis by using the Cohort Study worksheet in Epi_Tools.xls to compute the

  1. Risk ratio (using cumulative incidences), 95% confidence interval for the risk ratio, and the p-value for this comparison.
  2. Rate ratio (using incidence rates), 95% confidence interval for the rate ratio, and the p-value for this comparison.

Finally, using the Excel data file that you used to sort and tabulate the data, insert a text box that summarizes the results of your analysis, and interpret your findings in a few sentences. Label the summary worksheet tab as "Summary."

 

Excel 2010 - Smart Method Video Course


The Smart Method - Free Online Video Course with 302 video tutorials

http://www.youtube.com/user/TheSmartMethod?feature=watch

 

 

More Tutorials for Excel 2010


Excel 2010 Tutorial For Beginners #1 - Overview (Microsoft Excel)

(From YouTube.com; http://www.youtube.com/watch?v=eI_7oc-E3h0

 

Excel 2010: Simple Formulas

(From YouTube;

http://www.youtube.com/watch?v=JJPjs-lITAw

Excel 2010: Functions (AVERAGE, MAX, MIN, COUNT)

From YouTube;

 

Excel 2010: Functions (AutoSum, Sum, Max, Min, Average, Large, Small, ABS, Power)

(From YouTube;

http://www.youtube.com/watch?v=8r4Cs4Iay3o

 

Excel 2010: The IF function: what it is, and how to use it

(From Microsoft.com)

The IF function checks to see if a condition you specify is true, or false. If true, one thing happens; if false, something else happens. For example, if you use the IF function to see if amounts spent are under or over budget, the result for True could be "Within budget," while the result for False could be "Over budget."

Download  | Watch online

 

Excel 2010: The AND, OR, and NOT functions

(From YouTube; http://www.youtube.com/watch?v=SNbxFSMRw6g

Excel 2010: Sparklines

From YouTubel

http://www.youtube.com/watch?v=iNrhjFixC28

 

 

How to create a basic chart in Excel 2010

(From Microsoft.com)

Charts make data visual. With a chart you can transform spreadsheet data to show comparisons, patterns, and trends.

Download  | Watch online

Video Modules for Excel 2003


Basics of Excel Worksheets

Module 1: Basics of Excel Worksheets

Formatting an Excel Worksheet

Module 2: Formatting an Excel Worksheet Example Spreadsheet:  Beach Inspections Form

(Note Before watching the video, download the Beach Inspections spreadsheet and save it to your own computer. Right-click on this Link to Example Spreadsheet, and select 'Save target as'.)        

Simple Math Formulas

Module 3a: Simple Math Formulas Example Spreadsheet:  IgG Dose Calculations

(Note: Before watching the video, download the IgG Dose Calculation spreadsheet and save it to your own computer. Right-click on this Link to example spreadsheet, and select 'Save target as'.            

Additional Formatting Tips            

Using the "IF" Function & Sorting Data

Module 3b: The "IF" Function and Sorting Example Spreadsheet: Tracking Pneumococcal Vaccinations

(Note: Before watching the video, download the PneumoVaccines spreadsheet and save it to your own computer. Right-click on this Link to example spreadsheet, and select 'Save target as'.            

Creating Form Letters in Excel

Module 4: Creating Form Letters Example Spreadsheet:  Food Establishment Inspections

(Note: Before watching the video, download the GreaseTraps spreadsheet and save it to your own computer. Right-click on this Link to example spreadsheet, and select 'Save target as'.            

Password Protecting Excel Files

Module 5: Password Protecting Files Start Time Example:  Tracking Pneumococcal Vaccinations
Note:

(Note: Before watching the video, download the PneumoVaccines spreadsheet and save it to your own computer. Right-click on this Link to example spreadsheet, and select 'Save target as'.            

Using "Find", "Replace", "Count", "Sum", and "Average" 

Module 8a: An Outbreak Investigation:
Using Find, Replace, Count, Sum, Average
Start Time Example:  Hepatitis A Outbreak
Note:

(Note: Before watching the video, download the Hepatitis Outbreak spreadsheet and save it to your own computer. Right-click on this Link to example spreadsheet, and select 'Save target as'.            

Microsoft Tutorials for Excel 2007


Introductory Tutorials from Microsoft.com

Introduction to EXCEL 2007

Description of the layout How to accomplish everyday tasks in Excel 2007, and a bit about the XML file formats. Get to know Excel 2007: Create your first workbook
How to create a workbook, enter and edit text and numbers, and add rows or columns Get to know Excel 2007: Enter formulas
How to enter simple formulas into worksheets, and how to make formulas update their results automatically Learn how to figure out dates using formulas in Excel 2007
How to find the number of days between dates, or the date after a number of workdays, or the date after a number of days, months, or years Charts I: How to create a chart in Excel 2007
How to create a chart using the new Excel 2007 commands and make changes to a chart after you create it Pivottables in EXCEL 2007
How pivottable reports organize, summarize, and analyze data. how to create a pivottable

 

Getting up to speed with Excel 2007

The layout of the 2007 spreadsheet, everyday tasks, and an explanation of the XML file formats Get to know Excel 2007: Create your first workbook
How to create a workbook, enter and edit text and numbers, and add rows or columns Get to know Excel 2007: Enter formulas
How to enter simple formulas into worksheets, and how to make formulas update their results automatically Learn how to figure out dates using formulas in Excel 2007
How to find the number of days between dates, or the date after a number of workdays, or the date after a number of days, months, or years

 

Charts I: How to create a chart in Excel 2007

How to create a chart using the new Excel 2007 commands and make changes to a chart after you create it Pivot Tables I:
How pivot tables organize, summarize and analyze data PivotTable II: Filter PivotTable report data in Excel 2007
How to filter to hide and display selected data in PivotTable reports PivotTable III: Calculate data in PivotTable reports in Excel 2007
Power tools: how to summarize data by using summary functions other than SUM, such as COUNT and MAX. How to show data as a percentage of the total by using a custom calculation. How to create your own formulas in PivotTable reports.

"Numbers" on Apple Mac


"Numbers" is the spreadsheet application for Apple devices. It functions much like Excel, but there are some differences. Note also that Excel spreadsheets will open and run in Numbers, and Numbers spreadsheets can be "Saved as" Excel spreadsheets.

Here is a list of hyperlinks to tutorials on specific aspects of Numbers

Numbers Help

Get started in Numbers

Numbers at a glance

Create a new spreadsheet

Open an existing spreadsheet

Replace template placeholders

Organize a spreadsheet with sheets

Customize your app

Undo or redo changes

Use Handoff with Numbers

 

Add and edit tables

Add or delete tables

Work with rows and columns

Add content to table cells

Format table cells

Create a custom cell format

Add controls to cells

Merge or unmerge cells

Add a comment to a cell

Add conditional highlighting to cells

Change the look of a table

Change the look of table text

Save a table as a new style

Move, resize, and lock a table

Format tables for bidirectional text

Sort data in a table

Filter data

 

Enter formulas and functions

Calculations

Types of arguments and values

Ways to use the string operator and wildcards

Functions that accept conditions and wildcards as arguments

 

Add and edit charts

Add or delete a chart

Scatter charts and bubble charts

Interactive charts

Move, resize, and rotate a chart

Modify chart data

Adjust a chart's markings and labels

Change a chart's type

Change the look of a chart

Save and organize chart styles

 

Add images, shapes, and media

Objects overview

Add and edit images

Add and edit shapes

Add video and audio

Change the look of an object

Create object styles

Resize, rotate, and flip an object

Align and position objects

Layer, group, and lock objects

 

Add, edit, and format text

Add text

Change the look of text

Use paragraph styles

Align text

Format lists

Adjust character spacing and formatting

Format punctuation

Add color or a border to a text box

Check spelling

Find and replace text

Add comments and highlight text

Use bidirectional text

Format Chinese, Japanese, or Korean text

 

Manage spreadsheets

Save and rename a spreadsheet

Locate a spreadsheet

Save a spreadsheet in another format

Save a spreadsheet in package or single-file format

Lock a spreadsheet

Password-protect a spreadsheet

Add custom templates

Move a spreadsheet

Delete a spreadsheet

 

Share and print

Use iCloud with Numbers

Share and edit a spreadsheet with others

Send a copy of a spreadsheet

Print a spreadsheet

Use iTunes to transfer files

Share files over a remote server

 

 Formulas and Functions Help

Formulas

Functions

Date and time functions

Duration functions

Engineering functions

Financial functions

Logical and information functions

Numeric functions

Reference functions

Statistical functions

Text functions

Trigonometric functions

 

There are also many You Tube video tutorials for Numbers.

 

Biostatistics Applications


This page provides links to Excel files which contain multiple worksheets that enable you to perform a wide variety of statistical functions and tests. To use these, you must open them and then save them on your local computer.

Summarizing Data

Generates summary statistics on a continuous variable

Confidence Interval Estimates

Worksheets compute confidence intervals for means, proportions, the mean difference in matched or paired samples, and the difference in means and difference in proportions in two independent samples and accept either raw data or summary statistics.

Hypothesis Testing Procedures

Worksheets conduct tests of hypothesis for means, proportions, the mean difference in matched or paired samples, and the difference in means and difference in proportions in two independent samples and accept either raw data or summary statistics.

ANOVA

Worksheets conduct analysis of variance tests of equality of k independent means and accept either raw data or summary statistics.

Chi-Square Tests

Worksheets conduct the chi-square goodness of fit and chi-square test of independence based on observed frequencies.  

Sample Size Determination

Worksheets estimate the sample size(s) required to ensure adequate precision in confidence intervals or high power in tests of hypothesis for means, proportions, the mean difference in matched or paired samples, and the difference in means and difference in proportions in two independent samples.