Internal and External Validity
Now let's take a deeper look at the common threats to internal validity. Familiarity with these threats will help you choose an evaluation design that minimizes them within the limits of your available resources.
Instrumentation
Changes observed between observation points (i.e., pretest and posttest) may be due to changes in the testing procedure. This could include changes to the content, the mode of administration, or the method of data collection.
- Keep an eye out for this threat if your study has multiple observation or test points.
- Go for consistency. Instrumentation threats can be reduced or eliminated by making every effort to keep each observation point consistent. This includes the instrument (questionnaire, type of testing kit, etc.), the administrators, and the method of administration (paper, telephone, etc.).
Regression to the Mean
Regression to the mean is the tendency of extreme pretest scores to move toward the population mean when measured again, because part of any extreme score reflects chance variation that is unlikely to recur. When individuals are selected for program participation based on extreme pretest results, their posttest scores will tend to shift toward the mean regardless of the program's efficacy.
- Avoid selecting participants based on extreme performance or scores.
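Regression to the mean is easier to see in a small simulation. The following Python sketch is an illustration added for this discussion, not part of the original material; the score distributions, parameter values, and function names (`one_replication`, `average_gains`) are assumptions. Each simulated student has a stable "true" score, each test adds independent random measurement error, and no program is delivered at all. Selecting the 20 lowest pretest scorers out of 200, as in Scenario 1, should still produce an apparent improvement at posttest, while the average for the whole class barely changes.

```python
import random
import statistics


def one_replication(rng, n_students=200, n_selected=20,
                    true_mean=70.0, true_sd=10.0, noise_sd=8.0):
    """Simulate one pretest/posttest cycle with NO program effect.

    Assumption: observed score = stable true score + independent noise.
    Returns (gain for the selected low scorers, gain for everyone).
    """
    true_scores = [rng.gauss(true_mean, true_sd) for _ in range(n_students)]
    pretest = [t + rng.gauss(0, noise_sd) for t in true_scores]
    # Posttest: same true score, fresh noise, and zero program effect.
    posttest = [t + rng.gauss(0, noise_sd) for t in true_scores]

    # Select the students with the lowest pretest scores.
    order = sorted(range(n_students), key=lambda i: pretest[i])
    selected = order[:n_selected]

    gain_selected = (statistics.mean(posttest[i] for i in selected)
                     - statistics.mean(pretest[i] for i in selected))
    gain_everyone = statistics.mean(posttest) - statistics.mean(pretest)
    return gain_selected, gain_everyone


def average_gains(reps=500, seed=1):
    """Average both gains over many simulated classes."""
    rng = random.Random(seed)
    results = [one_replication(rng) for _ in range(reps)]
    avg_selected = statistics.mean(r[0] for r in results)
    avg_everyone = statistics.mean(r[1] for r in results)
    return avg_selected, avg_everyone


if __name__ == "__main__":
    sel, everyone = average_gains()
    print(f"Average 'gain' for the 20 lowest scorers: {sel:.1f} points")
    print(f"Average 'gain' for all 200 students:      {everyone:.1f} points")
```

Under these assumed parameters, the selected group's average "gain" comes out at several points even though the true program effect is zero. Shrinking `noise_sd` shrinks the spurious gain, since less of an extreme pretest score is then due to chance.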
Maturation
This threat is internal to the individual participant: mental or physical changes within the participants themselves could account for the evaluation results. In general, the longer the time from the beginning to the end of a program, the greater the maturation threat.
- If feasible within your evaluation questions, reducing the amount of time between the pretest and posttest can limit maturation threats.
- Be particularly mindful of this threat when working with children, who are going through a great deal of mental and physical change.
Testing
Administering a pretest before the program may itself convey knowledge to the participants. This threat can either overstate or understate your program effect.
- Keep an eye out for this threat whenever there is a pretest-posttest design and no comparison group to help control for the learning curve of taking the pretest.
History
Observed program results may be explained by external events or experiences that affect participants between program participation and follow-up.
- Evaluators should do their best to identify any external events or changes that may affect their program results (media coverage, policies, major events, etc.).
- As with maturation threats, history threats can be reduced by limiting follow-up time.
Selection
Whenever you compare an exposure group with a nonequivalent comparison group, the difference in their posttest scores could be due to pre-existing differences between the groups rather than the impact of the program itself. This is of particular concern when the two groups differ substantially in their characteristics.
- Be alert for this potential threat if you are working with a nonequivalent comparison group.
Interactions with Selection Threats
- Selection - History: Selection differences between the participants in the intervention and comparison groups lead to differences in exposure to, or impact of, historical events.
- Selection - Maturation: Selection differences between the participants in the intervention and comparison groups lead to differences in maturation effects.
- Selection - Instrumentation: Selection differences between the participants in the intervention and comparison groups lead to differences in instrument scores.
Other threats may come into play in the course of implementing your program evaluation design. Both randomized and non-randomized designs may be at risk whenever there is a control (randomized) or comparison (non-randomized) group:
- Diffusion or imitation: Can occur when individuals in the intervention group interact with those in the control/comparison group. Such cross-contamination via sharing of information can lessen the differences between the intervention and control/comparison group.
- Compensatory equalization of treatments: Administrators often view their program as beneficial. As a result, they may find it difficult to accept that some people receive the intervention while those in the control or comparison group do not have the opportunity. To correct this, program administrators may begin offering all or part of the program to those in the control/comparison group, thus eliminating any difference in program effect between the two groups.
- Compensatory rivalry by people receiving less desirable treatments (John Henry effect): The intervention may be seen as more desirable than what the control/comparison group receives. If those in the control/comparison group are aware they are receiving less desirable services, they may try to compensate for this difference by attempting to outperform the intervention group.
- Resentful demoralization of respondents receiving less desirable treatments: Again, if desirable services are not provided to those in the control/comparison group, rather than compensating they may become unmotivated or less cooperative.
Internal Validity Scenarios
Below are examples of health program evaluations, each highlighting a specific threat to internal validity. For each scenario, determine the most pressing threat to internal validity.
Scenario 1: A middle school has a new afterschool program for eighth graders targeted at increasing media literacy surrounding alcohol. The goal is to make them "savvy" consumers of advertising, and reduce the impact of such ads on alcohol consumption. Unfortunately, the program can only support 20 students. The evaluator decides to administer a pretest to all 200 eighth graders in the school, and take the 10% with the lowest test scores. Instrumentation, regression, testing, maturation, or history threat?
Scenario 2: The evaluators administer the pretest for an evaluation as a pen-and-paper survey, and then decide to adapt the survey to an online version for the posttest.
Scenario 3: The Heart Healthy program is a one-day seminar targeted at education surrounding healthy food choices and cooking skills to reduce risk of major cardiac events. A pretest is given to see how knowledgeable participants are regarding heart healthy foods and how best to prepare them. The day after the seminar a posttest is circulated to discern program impact.
Scenario 4: A new family planning consultation is implemented at a local community health clinic that sees a high percentage of women reporting unwanted pregnancies. Peer educators are used to discuss family planning and contraceptive use with women using the clinic. The evaluators follow up with the participants one year later to record contraceptive use and whether any unwanted pregnancies occurred. There is no comparison group, so the evaluators are looking only at the group that received the counseling. They find that, since the counseling, the rate of birth control use has increased significantly. However, the evaluators are aware that last year a regulation was put in place providing free birth control to women regardless of insurance status.
Scenario 5: A program targeted at promoting physical fitness and healthy eating among pre-teens was conducted in 2010 in one public middle school in Charlestown. It is now 2015, and the team is interested in following up to see the current activity level and BMI of the participants and how these compare with their pre-program scores, relative to a public middle school located in the South End.
External Validity
The applicability of evaluation results to other populations, settings, and time periods is often a question to be answered once internal validity threats have been eliminated or minimized.
Below is a selection of threats to external validity that can help guide your conclusions about the generalizability of your evaluation results:
- Interaction of Selection and Treatment: Does the program's impact only apply to this particular group, or is it also applicable to other individuals with different characteristics?
- Interaction of Testing and Treatment: If your design included a pretest, would your results be the same if implemented without a pretest?
- Interaction of Setting and Treatment: How much of your results are impacted by the setting of your program, and could you apply this program within a different setting and see similar results?
- Interaction of History and Treatment: Put simply, how "timeless" is this program? Could you obtain the same results in a future setting, or was there something specific to this point in time (perhaps a major event) that influenced its impact?
- Multiple Treatment Threats: The program may exist in an ecosystem that includes other programs. Can the results seen with the program be generalized to other settings without the same program-filled environment?
Case Study Reflection
Thinking back to the Health Bucks case study:
- Can you identify any threats to external validity that the evaluators should be cognizant of?
- Pick one and describe why it might apply to the Health Bucks situation.