Statistical Conclusion Validity

Is there a relationship between the two variables?

1. Low Statistical Power. The lower the power of the statistical test, the lower the likelihood of detecting an effect that does in fact exist.

2. Violated Assumptions of Statistical Tests. The particular assumptions of a statistical test must be met if the analysis results are to be meaningfully interpreted.

3. Fishing and the Error Rate Problem. The probability of making a Type I error on a particular comparison in a given experiment increases with the number of comparisons to be made in that experiment.

4. The Reliability of Measures. Measures of low reliability may not register true changes.

5. The Reliability of Treatment Implementation. When treatments are not administered in a standard fashion (e.g., different administrators and/or the same administrator behaving differently on different occasions), error variance will increase and the chance of detecting true differences will decrease.

6. Random Irrelevancies in the Experimental Setting. Setting variables may divert respondents' attention from the treatment and/or introduce error variance, thus washing out treatment effects.

7. Random Heterogeneity of Respondents. When respondent variables do not interact with treatment but are related to outcome, error variance will be high (unless this relationship is captured by blocking, covariance, or the use of within-subject designs).

Internal Validity

Given that there is a relationship, is it plausibly causal from one operational variable to another?

1. History. The purported treatment effects may in fact be due to nontreatment events occurring between pre- and post-testing.

2. Maturation. The purported treatment effects may in fact be due to processes within the respondents themselves (growing older, wiser, stronger, more experienced, or more fatigued) between pre- and post-testing.

3. Testing. Improved scores on the second administration of a test can be expected even in the absence of treatment.

4. Instrumentation. Changes in the calibration of the measuring instrument over time or changes in personnel making ratings may result in spurious criterion differences that masquerade as treatment effects.

5. Statistical Regression. Individuals selected on the basis of extreme scores, high or low, on a particular test will regress toward the mean on a second test administration. Thus a group of low-scoring individuals will "improve" without treatment. Conversely, high-scoring individuals might deteriorate in spite of it.

6. Selection. Unless experimental and control groups are formed through random assignment, differences on outcome measures may be due to the groups per se rather than to treatment.

7. Mortality. If experimental and control treatments produce differential drop-out rates, the foregoing selection artifact becomes operative in spite of random assignment.

8. Interactions With Selection. Selection can interact with history, maturation, instrumentation, etc., as for example, when the local history of the experimental group differs from that of the control group.

9. Ambiguity About the Direction of Causal Inference. This is a salient threat in simple correlational designs, but not in most experiments where the temporal ordering of independent and dependent variables is clear.

10. Diffusion or Imitation of Treatments. If experimental and control subjects can and do communicate with each other about their respective treatments, there may in effect be no differences between these treatments.

11. Compensatory Equalization of Treatments. If administrators in applied settings bestow benefits on the control group in amounts equal to the experimental group, differences between these conditions may break down.

12. Compensatory Rivalry by Respondents Receiving Less Desirable Treatments. If control subjects upon learning of their "underdog" condition become motivated to perform better, then real differences between experimental and control treatments may erode.

13. Resentful Demoralization of Respondents Receiving Less Desirable Treatments. If control subjects upon learning of their "deprived" condition become discouraged or angry and do worse, then differences between experimental and control treatments may be an artifact of differential motivation to perform.
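Threat 5 (statistical regression) follows directly from measurement unreliability and is easy to demonstrate by simulation. The sketch below (all distribution parameters are invented) models each observed score as a stable true score plus independent noise, then selects the lowest scorers on a first test:

```python
import random

random.seed(0)

# Each observed score = stable true score + independent measurement noise.
# Selecting the lowest scorers on test 1 selects, in part, unlucky noise,
# so the same people score higher on test 2 with no treatment at all.
true_scores = [random.gauss(100, 10) for _ in range(10_000)]
test1 = [t + random.gauss(0, 10) for t in true_scores]
test2 = [t + random.gauss(0, 10) for t in true_scores]

# Take the bottom 10% on the first administration.
cutoff = sorted(test1)[len(test1) // 10]
low_idx = [i for i, s in enumerate(test1) if s <= cutoff]

mean1 = sum(test1[i] for i in low_idx) / len(low_idx)
mean2 = sum(test2[i] for i in low_idx) / len(low_idx)
print(f"low group, test 1: {mean1:.1f}")  # well below 100
print(f"low group, test 2: {mean2:.1f}")  # closer to 100, yet untreated
```

The selected group's second-test mean moves back toward 100 with no treatment at all; the size of the shift depends on the test's reliability (here 0.5, since true-score and noise variances are equal).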

Construct Validity of Putative Causes and Effects

Given that the relationship is plausibly causal, what are the particular cause and effect constructs involved in the relationship?

1. Inadequate Preoperational Explication of Constructs. A precise explication of constructs is vital for the linkage between treatments and outcomes. For example, attitudes are usually defined in terms of stable predispositions to respond. Thus a self-report scale administered on a single occasion may be an inadequate operational definition.

2. Mono-Operation Bias. Single operational definitions of causes and/or effects (e.g., one counselor administering treatment and/or one outcome measure) both under-represent the constructs and contain irrelevancies.

3. Mono-Method Bias. Multiple operational definitions of causes and/or effects may still contain irrelevancies or preclude generalization, if single methods are employed (e.g., videotaped young, male, WASP counselors administering treatment, and self-report devices exclusively representing outcome).

4. Hypothesis Guessing Within Experimental Conditions. If subjects are aware of the hypotheses, the effects of a treatment may be confounded with subjects' desire to conform to those hypotheses.

5. Evaluation Apprehension. Apprehension about being evaluated may result in attempts by respondents to depict themselves as more competent or psychologically healthy than is in fact the case.

6. Experimenter Expectancies. The data in an experiment may be susceptible to bias in the direction of the experimenter's expectations.

7. Confounding Constructs and Levels of Constructs. When the level (or amount) of an independent variable is not linearly related to the dependent variable along the whole range of that variable, and when only one level of that variable is manipulated, erroneous conclusions about its impact are easily drawn.

8. Interaction of Different Treatments. If respondents receive more than one treatment, one can neither generalize to other subjects receiving single treatments nor isolate the effects of the treatment from the effects of the context of several treatments.

9. Interaction of Testing and Treatment. It is not possible to determine a) if the treatment would have had an impact if the pretest had been omitted or b) if the follow-up effect would have endured if the posttest had been omitted, unless one employs independent experimental groups at each delayed-test condition.

10. Restricted Generalizability Across Constructs. Sometimes treatments will affect dependent variables quite differently, implying a positive effect on some construct and an unintended negative effect on another. Unless multiple constructs are assessed, such relationships remain unknown.

External Validity

Given that there is probably a causal relationship from construct A to construct B, how generalizable is this relationship across persons, settings, and times?

1. Interaction of Selection and Treatment. People who agree to participate in a particular experiment may differ substantially from those who refuse; thus, results obtained on the former may not be generalizable to the latter.

2. Interaction of Setting and Treatment. Results obtained in one setting may not be obtained in another (e.g., factory, military camp, university, etc.).

3. Interaction of History and Treatment. Causal relationships obtained on a particular day (December 7, 1941 as an extreme example) may not hold up under more mundane circumstances.

What is the Hawthorne Effect?

The Hawthorne Effect refers to the behavior of interest being caused by subjects being at the center of the experimental stage, i.e., having a great deal of attention focused on them. It usually manifests as a spurt or elevation in the performance or physical phenomenon being measured. Although the Hawthorne Effect is much more frequently seen in behavioral research, it is also present in medical research on human subjects. The problem is dealt with by having a control group that is subject to the same conditions as the treatment groups and administering a placebo to the control group. The study is termed a blind experiment when the subject does not know whether he or she is receiving the treatment or a placebo, and a double-blind experiment when neither the subject nor the person administering the treatment/placebo knows which is being administered.
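The blinding procedure just described amounts to an allocation schedule held by a third party. In the hypothetical sketch below (all subject names, kit codes, and group sizes are invented), identical-looking kits carry only codes, so neither subjects nor administrators can tell treatment from placebo:

```python
import random

random.seed(7)

# Hypothetical double-blind allocation: 8 subjects, half to each arm.
subjects = [f"S{i:02d}" for i in range(1, 9)]
arms = ["treatment"] * 4 + ["placebo"] * 4
random.shuffle(arms)

# Identical-looking kits get arbitrary codes; only the third party
# holds the key mapping codes back to arms.
codes = [f"KIT-{n}" for n in random.sample(range(1000, 10000), len(subjects))]
unblinding_key = dict(zip(codes, arms))   # locked away until analysis
assignments = dict(zip(subjects, codes))  # all that subjects/administrators see

print(assignments)
```

Because `assignments` carries no arm information, expectancy effects on both sides of the interaction are held constant across conditions.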

What would be the Hawthorne Effect in Usability?

Basics of Experimental Design

Factor

Variable

Interaction

Main Effect

Simple Effects

Randomly sampling web pages
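The terms above can be made concrete with a hypothetical 2×2 example in the usability spirit of these notes: factor A is page layout (old vs. new), factor B is user type (novice vs. expert), and the cell entries are invented mean task scores.

```python
# Hypothetical cell means for a 2x2 factorial design; all numbers invented.
means = {("old", "novice"): 40, ("old", "expert"): 60,
         ("new", "novice"): 70, ("new", "expert"): 50}

# Main effect of layout: difference between the marginal means,
# averaging over user type.
old_mean = (means[("old", "novice")] + means[("old", "expert")]) / 2  # 50.0
new_mean = (means[("new", "novice")] + means[("new", "expert")]) / 2  # 60.0
main_effect_layout = new_mean - old_mean                              # 10.0

# Simple effects of layout: its effect at each level of user type.
simple_novice = means[("new", "novice")] - means[("old", "novice")]   # +30
simple_expert = means[("new", "expert")] - means[("old", "expert")]   # -10

# Interaction: the simple effects differ, so the effect of layout
# depends on user type.
interaction = simple_novice - simple_expert                           # 40

print(main_effect_layout, simple_novice, simple_expert, interaction)
```

Here the new layout helps novices (+30) but hurts experts (−10): the nonzero interaction means the main effect of layout (a 10-point average gain) misrepresents both simple effects.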