Handout on Experimentation

Q1: What is an experiment?

An experiment is a special type of research study conducted to determine whether there is a causal relationship between variables, not just a correlation. In an experiment, one or more variables are manipulated (changed) in order to see what effect, if any, the change has on other variables. The variables that may or may not react to the changes are called dependent variables, because their values depend on the values of the other variables. The variables that are changed are called independent or manipulated variables. They are sometimes called independent variables because their values don’t depend on anything except what the experimenter does to them.

With non-experimental research, we can tell if two variables are correlated with each other but we cannot determine whether changes in one variable cause changes in another. The relationship could be just a coincidence, or changes in both variables might be the result of a change in a third variable.

For example, assume that a manufacturer of snack crackers increased the size of the box the crackers are packaged in, even though the number of ounces stayed the same. A product manager analyzing sales figures for the crackers might discover that sales for the cracker went up after the box size was increased.

Does this mean that making the box bigger caused sales to increase?

The answer, of course, is not necessarily. There could be many reasons why sales went up that had nothing to do with the increase in the size of the box. Maybe the demand for the crackers increased because of economic or competitive factors that happened to coincide with the increase in box size. Maybe the new package was more attractive to retailers so many of them gave it a more desirable position on the shelves and this is what caused sales to increase. Maybe the price of the crackers changed or a coupon was dropped and this is what caused sales to increase. The point is, from simply looking at the sales data, we cannot be sure that sales increased because people prefer to buy the larger box.

If we could rule out all of the other possible explanations for an increase in sales, so that we knew the only thing that changed was the size of the box, we would be pretty sure that the change in the size of the box was the reason that more people bought the product. With non-experimental research techniques, we might try to account for most other possible explanations statistically (this will be discussed later in the course). With experimental research techniques, we try to make sure that these possible explanations never occur in the first place. We’ll talk more about how we do this below.

Q2. What are laboratory experiments?

Some experiments are conducted in a “laboratory.” This doesn’t have to be a real laboratory, like you would expect a chemist to have. It is just any special office or observation room that the experimenter brings people to and has control over. This is in contrast to a “field experiment” that is conducted in a more real life setting.

Laboratory experiments (or lab experiments) are called lab experiments because they mimic a traditional experiment in the hard sciences (e.g., chemistry or physics) in several ways.

Assume that you were conducting a chemistry experiment to see how the boiling point of a chemical changed when a second chemical was added to it. How would you do this?

First, you might carefully measure out some of the first chemical. Then, you would heat it up while carefully monitoring the temperature to determine the boiling point. You might then carefully measure out the same amount of the first chemical, add the second chemical, heat it up while carefully monitoring the temperature, and determine the new boiling point.

Of course, you would need to make sure that everything other than the addition of the second chemical was the same between trials. You couldn’t use a different thermometer, for example, because it might be a few degrees “off.” You couldn’t heat the chemicals in a different sized container because the heat might spread out less evenly. If you didn’t keep everything the same, you couldn’t be sure that it was adding the second chemical that really caused any changes.

When we conduct a lab experiment, we are trying to make it pretty much like a chemistry experiment, in that we want to make sure we have a great deal of control over what, who, when and how we are measuring. Of course, experiments in the social sciences (e.g., marketing, psychology, economics) face some special challenges compared to experiments in the hard sciences. People are more difficult to deal with than chemicals. One ounce of chemical A from batch #13456 is pretty much identical to any other ounce of chemical A from batch #13456. The chemicals never lie to you to make themselves look good, refuse to come back to be measured a second time, or try to figure out what you are really measuring them for. Many of the complications involved in experimental design are due to the attempt to minimize these and other potential problems.

Q3: What are field experiments?

A field experiment is conducted in a “real life” setting rather than in a laboratory. A test market is an example of a field experiment because we are manipulating the availability of a product in particular areas and measuring sales. We could test our box size hypothesis in a field setting by placing the smaller size on the shelves in some stores and the larger size on the shelves in other stores and measuring the response. We could test our ads in a field setting by having each version shown before various movies and asking patrons about the ad they saw as they left the theater. A field experiment has one primary advantage over a lab experiment: it is more realistic.

Q4. What is Random Assignment?

How do you decide which participants receive which treatment? For the purposes of illustration, let’s assume that we were an advertising agency and we wanted to test two different versions of a television commercial. We have recruited 60 consumers with demographic characteristics similar to the target market and will show each ad to 30 of them and measure their attitudes toward the ad. We have two very similar rooms – Room A and Room B – with very similar equipment. We will have 30 consumers go into Room A to view Ad #1 and 30 consumers go into Room B to view Ad #2.

How do we decide which consumers are assigned to Room A and which are assigned to Room B?

Maybe we could put the first 30 consumers who show up for our study in Room A and the last 30 in Room B. But some people are habitually early and some are habitually late. What happens if most of the people we put in Room A (the first 30 to arrive) are punctual, organized, on-time sort of people and most of the people we put in Room B (the last 30 to arrive) are….um…not?

If this happens, we have a problem similar to using different sized containers to heat our chemicals in our chemistry experiment. There would be a systematic difference between the consumers in Room A and Room B other than the different ads being tested. If we saw a difference in evaluations of the ads, we would not be sure whether the difference was due to the ad the participants were shown or to some aspect of the consumers’ propensities for timeliness.

How else could you assign the consumers? You might assign the first consumer to arrive to Room A, the second to Room B, the third to Room A, the fourth to Room B, and so on. This would be much better than our previous method, but it still carries a risk of systematic differences between the groups. What if the participants arrived in pairs? Whoever was the more outgoing and “take charge” of the pair would walk in first and be assigned to Room A. So, you might end up with an overrepresentation of outgoing, take-charge types in Room A compared to Room B.

The best way to assign participants to groups is randomly. Flip a coin for each (heads, Room A, tails, Room B) or use a random number table or generator to assign a number to each participant (the 30 lowest random numbers in Room A, the 30 highest random numbers in Room B).
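The shuffle-and-split procedure above can be sketched in a few lines of Python. The participant labels and group sizes here are hypothetical; shuffling the list and splitting it in half is equivalent to giving each person a random number and sending the 30 lowest to Room A:

```python
import random

# 60 hypothetical recruited consumers
participants = [f"P{i:02d}" for i in range(1, 61)]

# Shuffle into a random order, then split the list in half.
random.shuffle(participants)
room_a = participants[:30]   # will view Ad #1
room_b = participants[30:]   # will view Ad #2
```

Because every ordering of the list is equally likely, arrival time, punctuality, and assertiveness have no influence on which room a given consumer ends up in.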

Q5. What are Between-subjects versus Within-subjects designs?

The experiments we have discussed up to now have been “between-subjects” experiments because the dependent variables were measured and compared across treatment groups: 30 people see Ad #1, 30 people see Ad #2, and the group means are compared. As long as participants were randomly assigned, a difference between the group means can be attributed to the ads themselves (subject to the usual statistical tests of whether the difference is larger than chance alone would produce).
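As a sketch, the between-subjects comparison amounts to computing the two group means and asking whether the difference is larger than chance assignment alone would produce. The ratings below are made-up illustrative data, and Welch’s t statistic is one common way to scale the difference:

```python
from statistics import mean, stdev
from math import sqrt

# Hypothetical attitude ratings (1-9 scale), one rating per participant
ratings_ad1 = [6, 7, 5, 8, 6, 7, 7, 5, 6, 8]   # Room A (Ad #1)
ratings_ad2 = [4, 5, 6, 5, 4, 6, 5, 5, 4, 6]   # Room B (Ad #2)

diff = mean(ratings_ad1) - mean(ratings_ad2)

# Welch's t statistic: the mean difference divided by its standard error
se = sqrt(stdev(ratings_ad1)**2 / len(ratings_ad1)
          + stdev(ratings_ad2)**2 / len(ratings_ad2))
t = diff / se
print(f"mean difference = {diff:.2f}, t = {t:.2f}")
```

A large t (conventionally, beyond roughly ±2 for samples of this size) suggests the difference is unlikely to be an accident of which people happened to land in which room.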

In a within-subjects (or within-groups) design, each participant receives more than one treatment and provides more than one response to the dependent measure. For example, we could show all of our participants Ad #1, some other material as a distraction, and then Ad #2. This creates some special problems (e.g., learning and carryover effects) because exposure to one ad might affect the response to the other.
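One common remedy for order effects is counterbalancing: a random half of the participants see the ads in one order and the other half in the reverse order, so any learning or carryover effect is spread evenly across both ads. A minimal sketch, with hypothetical participant labels:

```python
import random

participants = [f"P{i:02d}" for i in range(1, 61)]
random.shuffle(participants)

# Counterbalance presentation order: a random half sees Ad #1 first,
# the other half sees Ad #2 first.
order = {}
for p in participants[:30]:
    order[p] = ("Ad #1", "Ad #2")
for p in participants[30:]:
    order[p] = ("Ad #2", "Ad #1")
```

Counterbalancing does not eliminate order effects; it balances them so they cannot masquerade as a difference between the ads.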

Q6. What is internal validity? Why do we care?

The term internal validity refers to how sure we are that we have controlled all of the potential differences between treatment groups. The term includes the word “internal” because we are referring to issues within the experiment itself. It includes the word “validity” because we are concerned with how valid, or true, our findings will be in regard to the issue of causation. Will we really be able to say that our independent factors caused changes in our dependent measures? The stronger our internal validity, the more sure we are of causation.

Threats to internal validity in experiments include:

1. Selection bias. The subjects or participants assigned to some experimental groups may be different from the subjects assigned to other groups, and subsequent differences among the groups may be caused by these subject differences rather than the experimental manipulations. For example, in a wall color experiment (painting the produce departments of some grocery stores green and others yellow), stores assigned to the green condition might be smaller or older than stores assigned to the yellow condition, with the result that green departments get worse results than they deserve in the experiment.

2. Treatment effects. The very act of conducting an experiment may affect the dependent variable in a way that alters or masks the effects of the manipulations.

One type of treatment effect is called a Hawthorne effect, in which all experimental groups react to being in the experimental spotlight. In our wall color experiment, customers and employees might simply react to the fact that the produce departments are being painted. Another type of treatment effect is demand effects. People who participate in experiments tend to act the way they think they should act; that is, they try to comply with the demands (or “demand characteristics”) of the situation. This can affect their responses to the experimental stimuli. Consider, for example, the magazine cover experiment in which a magazine company showed various cover designs to its employees and asked which design they preferred. In an experiment of this type, the possibility that some designs are better than others for grabbing people’s attention will not be captured, because subjects will carefully examine all of the covers. They will do this because they assume that careful attention is desired in the experiment.

Subjects also may try to guess the purpose of an experiment and "help" the researcher get good results.

3. Testing effects. The process of measuring a dependent variable may influence the way that subjects respond to experimental manipulations, and/or to later measures of the dependent variable.

Say, for example, that an ad agency compares two different television commercials for a restaurant with the following procedure: subjects are given a pre-test questionnaire to measure their prior attitudes toward the restaurant, then are shown one of the experimental commercials, then are given a post-test to measure their subsequent attitudes. This study is likely to have a testing effect. As soon as subjects see a commercial for the restaurant, they will remember the pre-test measures and view the commercial in the context of those measures. This may affect their responses to the commercial.

4. History effects. History effects arise when some outside event that affects the dependent variable happens during the experiment. For example, in our wall color experiment, if a competing grocery chain offers very low produce prices during the experiment, then this might create an impression that painting the walls causes sales to decline. History effects are particularly troublesome when they affect treatment groups disproportionately. For example, if the competitor that offers low prices during our wall color experiment has stores which compete with 9 of the green stores but only 5 of the yellow stores, then this might create an appearance that yellow is a better color because the yellow stores have higher sales.

5. Maturation effects. Maturation effects occur when some change that is unrelated to the manipulations, but affects the dependent variable, occurs within subjects. For example, in our wall color experiment, three of the stores might lose good produce managers during the experiment, and subsequently suffer from lower sales. This would lower the average produce sales in those stores’ treatment groups.

6. Mortality effects. Some subjects may fail to complete the experiment. For example, in our grocery experiment, one of the stores may have a fire and be dropped from the study. If the subjects who leave the study are different in some way from the subjects who stay, then this mortality will influence the observed effects. As with all threats to internal validity, mortality is a particular problem when it affects different treatment groups disproportionately.

Threats / Causes

Selection Bias / Innate differences in subjects
Treatment Effects:
    Hawthorne Effect / Reaction to being in the spotlight
    Demand Effects / Subjects’ desire to please
Testing Effects / Sensitization to measurement
History Effects / Events external to the experiment
Maturation Effects / Internal changes in subjects
Mortality Effects / Dropouts from the experiment
Instrument Variation / Changes in the measurement instrument

Q7: What is external validity?

An experiment is internally valid to the extent that observed differences among treatment groups are valid effects of the manipulations. An experiment is externally valid to the extent that effects that occur in the experiment will occur in the actual market. External validity also is sometimes called "generalizability," because it refers to the extent to which experimental effects will generalize to the marketplace.