
Bias in Evaluating the Required Methods Course
Joseph F. Fletcher and Michael Painter-Main
University of Toronto
Prepared for Presentation at the American Political Science Association Teaching and Learning Conference, Washington DC, February 17, 2012.


Abstract

Undergraduate Political Science programs often require students to take a research methods course, and such courses are typically among the most poorly rated. This is due, in part, to the way in which such courses are evaluated. At our university and elsewhere, students are asked whether or not they would retake a course, “disregarding…program or degree requirements.” This formulation artificially inflates the number of negative responses and thus substantially lowers the “retake rate” widely used by students, instructors and administrators to assess a course. By adding a question to the standard course evaluation in our required methods course, we found evidence that posing the question in this way introduces considerable bias into the retake rate. Our alternative question asks students, in deciding whether they would retake the course, to consider its value in preparing them for further study and future work. The results show markedly fewer negative responses, resulting in a more positive overall evaluation. We locate these results in the framing and course evaluation literatures and show through multivariate analyses that asking students to disregard requirement status actually cues them to take it into consideration.

1. Introduction

Course evaluations are a common part of contemporary university and college life. Near the end of each term students are typically asked to evaluate the course on a standardized form. To reduce bias in the responses, the instructor is asked to leave the room while the students complete the evaluation. The results are generally published prior to the subsequent term, ostensibly to aid students in the selection of their courses. While a great deal of information is collected on the evaluation forms, in practice students, professors and administrators tend to focus upon a single item, colloquially referred to as the retake rate. This is the percentage of students indicating that in light of their experience they would still take the course. This figure is prominently displayed in the summaries made available to students at the beginning of each subsequent term. Moreover, departmental chairs, such as our own, often refer to this retake rate in their annual evaluation of the teaching staff. Promotion and tenure committees do likewise. Accordingly, professors are highly interested in this figure.

Having worked for many years as instructors in a required research methods course, we naturally became interested in the retake rate. This course has consistently received one of the lowest retake rates in the department, and we have heard about this in our annual reviews. Our efforts over the years to increase the retake rate for the course have met with only marginal success. When we complained of this to one of our colleagues, he mentioned that research methods courses are often among the most poorly rated across the disciplines. And indeed we consoled ourselves for several years by comparing our course’s retake rates to those of similar courses in other disciplines.

A few years back we decided to look closely at the question behind the retake rate that appears on the standardized form used across the Faculty of Arts and Science at our University. It reads: “Considering your experience with this course, and disregarding your need for it to meet program or degree requirements, would you still have taken this course?” It occurred to us that this item was constructed, likely with the best of intentions, in at least partial awareness that required courses are likely regarded somewhat differently by students than are elective ones. No doubt the authors of this question were seeking to allow for this fact by inserting the parenthetical phrase “disregarding your need for it to meet program or degree requirements.” Presumably, it was thought this instruction would aid students in disregarding the status of the course as either required or elective in making their evaluations. Unfortunately, even the best of intentions can go awry and have unintended consequences.

2. Literature Review

Even a passing awareness of the framing literature suggests that different responses can be elicited by different question wordings. As the work of Kahneman (2011) reminds us, questions can highlight different aspects of the same situation, with considerable impact upon the answers provided. In the current situation, the phrasing designed, no doubt, to get students to discount their feelings about the required status of the course may well prime, or increase the salience or availability of, those same feelings. Higgins (1996) tells us, “priming occurs when a stimulus sub-consciously activates existing memories.” In the present instance the mere mention of a requirement may enhance the accessibility of feelings about required courses, inadvertently priming the very considerations that the question is presumably meant to avoid. Faced with similar problems, survey researchers have become particularly attentive to framing effects which may lead to priming (see for example Zaller 1992; Kinder 2003; Nelson and Kinder 1996; Chong and Druckman 2007; Druckman 2004; Johnson 2011; Gamson and Modigliani 1987).

There is, of course, also a considerable literature on course evaluations. It has focused on three overarching themes: the teacher, the student and the course. Perhaps the most studied aspect of course evaluations is student evaluations of teaching. Factors underlying teaching evaluations include the experience of the teacher, the instructional tools the teacher uses in the course, as well as the teacher’s personality and behaviour toward the students (Marsh and Dunkin 1992; Simpson and Siguaw 2000; Clayson 1999). Student features largely concern personal identity or traits. These include gender (Tatro 1995), part-time or full-time status, enthusiasm going into a class, and the length of the student’s university experience (Frey, Leonard and Beatty 1975). Finally, attention has been given to the nature of the course itself, involving issues of course difficulty and expected grades (Pounder 2007; Braskamp and Ory 1994; Marsh 1987) and whether the class is required or an elective, with required courses seen as harder and producing lower expected grades, leading to lower course evaluations (Wachtel 1998; Scherr and Scherr 1990; Petchers and Chow 1988; Darby 2006; Lovell and Haner 1955; Pohlmann 1975; Landis 1977; Pounder 2007).

Research focusing on required courses suggests that students bring prior attitudes toward the class to bear in their course evaluations (Barke, Tollefson and Tracy 1983). In particular, prior interest has been found to be a good predictor of overall evaluation of the class (Barth 2008; Francis 2011). A second prior consideration is the degree of initial enthusiasm a student brings to a course, with dread and resentment being common responses to having to take a required course (Coleman and Conrad 2007). Such feelings are often linked to preconceived notions of the course (Heise 1979; 2002). Perhaps the poster child for such feelings is the required methods or statistics course, particularly in the social sciences, as many students are reluctant to take quantitative methodology courses (Alvarez 1992). Lewis-Beck (2001) surmises that this is a result of students lacking an inclination for mathematics, as well as possessing little prior experience in the subject matter. Such prior negative attitudes and beliefs greatly affect how the course is evaluated (Gal and Ginsburg 1994). Hence statistics-oriented courses tend to produce poorer student ratings as compared to non-statistics courses (Coleman and Conrad 2007).

3. Methodology

The data for this study come from University of Toronto students taking a full-year quantitative methods course in Political Science between 2009 and 2011.[1] The course is required for specialists, a designation applied to those whose primary focus is the study of politics.[2] In each instance students voluntarily filled out course evaluations near the end of term.

Insofar as the questions on the standardized course evaluations were not under our control, we could not experiment with their wording. So we took advantage of an opportunity to pose additional questions at the end of the standard evaluation. The additional items appeared on a separate sheet, and students were asked to enter their responses on the standardized form in a special area designated for “additional statements or questions which may be supplied in class.” One of the additional items we included asked: “Considering the value of this course in preparing for future study and future work, would you still have taken this course?” The options of yes and no were available, though some students chose to leave the item blank. In selecting this particular wording we did not claim to arrive at a perfectly unbiased question. Instead we sought to provide students with an alternative frame for the retake item, one that primed different considerations than the traditional question. We decided that a reasonable alternative frame for evaluating a course might cue students to consider their possible futures in study and work.

Data analyses are conducted using cross-tabulation and logistic regression. Our goal is not to consider all of the potential factors that may affect course evaluations; others have done this. Rather, our purpose is to determine why the two versions of the question produce such different results. For this we look at factors such as whether the course is required for the student, initial enthusiasm, perceived course difficulty, grade expectations, and the value of the course as a learning experience. Our analysis enables us to consider how these factors differentially predict responses to the two forms of the retake question.

4. Findings

Table 1 presents the results for the traditional retake question as well as for the revised version we introduced in recent classes. As the left-hand column shows, the traditional question produces a roughly 40-60 split, with most students indicating that they would not retake the course. In the right-hand column, however, we see a complete reversal of this distribution: with the revised question, nearly 60% of respondents favor retaking the course. The chi-square test indicates that a difference of this size would arise only rarely by chance (p = .004) if the two questions were measuring the same thing.

Evidently, how one frames the retake question makes a difference. And this difference is not merely statistical. Depending upon which question is used, the majority flips from negative to positive. In particular, framing the retake question in the traditional way produces a majority who say they are unwilling to retake the course. Alternatively, framing the retake question in a way that highlights future considerations produces a majority saying they are willing to retake the course. The practical implications of this are considerable insofar as students, professors and administrators are likely to draw quite different conclusions about the course depending upon which question wording is used. Of course, the traditional question has been the standard, and we have been evaluated accordingly.

Table 1. Willingness to Retake Required Methods Course by Question Type

Retake Course?    Traditional Question    Revised Question
Yes               42.8%                   59.4%
No                57.2                    40.6
N                 (152)                   (155)

Source: University of Toronto Student Evaluations for POL242 Research Methods 2009-2011.

Chi2 = 8.46 with 1 df; p = .004
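
As a cross-check, the Table 1 test is easy to reproduce. In the sketch below the cell counts are our reconstructions from the reported percentages and column Ns (42.8%/57.2% of 152; 59.4%/40.6% of 155), not figures taken directly from the paper; an uncorrected chi-square test then recovers the reported statistic.

```python
# Minimal sketch reproducing the Table 1 chi-square test.
# Counts are reconstructed from the reported percentages and Ns,
# so they are approximate.
from scipy.stats import chi2_contingency

#        Traditional  Revised
table = [[65, 92],    # Yes
         [87, 63]]    # No

chi2, p, df, _ = chi2_contingency(table, correction=False)
print(f"Chi2 = {chi2:.2f} with {df} df; p = {p:.3f}")  # ~8.45 with 1 df; p ~ .004
```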

Since both the traditional and the revised questions were asked of all respondents, we are able to cross-tabulate the answers given to the two questions. The results are presented in Table 2. The resulting pattern is striking. As can be seen in the first column, every student who indicated that he or she would retake the course using the traditional question also indicated a willingness to retake using the revised question. In other words, there were no false positives created by the traditional frame. Moving to the second column of Table 2, however, suggests a considerable number of false negatives. Over one quarter of those who said they would not retake the course using the traditional question answered that they would retake the very same course using the new question. The chi-square test indicates that the two columns differ significantly (p = .000). The McNemar test, specifically designed to determine whether the same respondents differ in their answers on two measures, produces an exact significance of .000.

Table 2. Response to Revised Retake Question by Traditional Question Response (Yes=1; No=0)

Revised Retake Question Response    Traditional: Yes    Traditional: No
Yes                                 100%                28.4%
No                                  0                   71.6
N                                   (63)                (81)

Source: University of Toronto Student Evaluations for POL242 Research Methods 2009-2011.

Chi2 = 75.6 with 1 df; p = .000; McNemar exact significance = .000.
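
The McNemar result is equally easy to verify. A minimal sketch, assuming cell counts reconstructed from the Table 2 column percentages (63 and 0 in the first column; 23 and 58 in the second), uses the exact binomial form of the test from statsmodels:

```python
# Minimal sketch of the McNemar test for the paired Table 2 data.
# Cell counts are reconstructed from the reported column percentages
# (100% of 63; 28.4% and 71.6% of 81), so they are approximate.
from statsmodels.stats.contingency_tables import mcnemar

# Rows: revised answer (Yes, No); columns: traditional answer (Yes, No)
paired = [[63, 23],   # revised Yes
          [0,  58]]   # revised No

result = mcnemar(paired, exact=True)  # exact binomial test on the discordant cells
print(f"McNemar exact p = {result.pvalue:.7f}")  # ~.0000002, i.e., the reported .000
```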

Clearly there is a difference between the responses obtained using the two versions of the retake question. But why is there such a difference? To investigate, we look at whether different factors predict responses to the two versions of the question. This requires a multivariate approach, and since the response options are in each instance dichotomous (1 = yes; 0 = no), we use logistic regression.
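
For readers who want the mechanics, a minimal sketch of this kind of model follows. The data frame, variable names, and codings are hypothetical stand-ins for the evaluation items, and the data are synthetic; only the general form of the model follows the text.

```python
# Sketch of a logistic regression of the Table 3 form, run on synthetic data.
# All variable names and codings here are hypothetical illustrations.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(42)
n = 150
evaluations = pd.DataFrame({
    "retake": rng.integers(0, 2, n),          # 1 = yes, 0 = no
    "required": rng.integers(0, 2, n),        # 1 = course required for the student
    "enthusiasm": rng.integers(1, 6, n),      # enthusiasm at time of registration
    "difficulty": rng.integers(1, 6, n),      # perceived level of difficulty
    "expected_grade": rng.integers(1, 6, n),  # expected grade category
    "learning": rng.integers(1, 6, n),        # value as a learning experience
})

model = smf.logit("retake ~ required + enthusiasm + difficulty"
                  " + expected_grade + learning", data=evaluations).fit()
print(model.summary())  # coefficients, standard errors, and log-likelihoods
```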

Table 3. Predictors of Traditional & Revised Retake Questions (Logistic Regression)

                                       Traditional Question   Revised Question
Pre-Course Factors
  Required Status                      -.90 (.45)*            -.58 (.47)
  Enthusiasm at Time of Registration    .70 (.35)*             .49 (.41)
In-Course Factors
  Level of Difficulty                  -.41 (.25)             -.47 (.26)
  Expected Grade                        .39 (.29)              .76 (.31)**
  Learning Experience                  1.05 (.24)***          1.23 (.26)***
Constant                              -2.73 (2.76)           -4.59 (2.96)
Summary Measures
  -2 LL (initial; model)              (152.7; 107.5)         (150.9; 96.6)
  RL2; Cox & Snell; Nagelkerke        (.30; .33; .45)        (.36; .38; .52)
  % of Cases Correctly Classified      75.2                   78.6
N                                      (113)                  (112)

Source: University of Toronto Student Evaluations for POL242 Research Methods 2009-2011.

*p<.05  **p<.01  ***p<.001

We pursue this notion of differential predictors using paired analyses.[3] The results are presented in Table 3. Logistic regression allows us to estimate the effectiveness of a number of independent variables in predicting the (logged) odds of a dichotomous dependent variable. In this case the dependent variable is the probability of answering “yes” rather than “no” to the retake question. Separate equations using the same predictors are presented for both the traditional and revised versions of the retake question.

Looking first at the summary measures presented at the bottom of the table, we see that the overall fit of both models is very good. The initial -2 log likelihood (-2LL), analogous to the total sum of squares in ordinary regression, represents the error variation of the model with only the intercept included. The model -2LL, like the error sum of squares in linear analyses, is the variation remaining after the predictors are included. The difference between these two, divided by the initial -2LL, can be interpreted as the proportional reduction in the absolute value of the -2LL. Menard (2010: 54) considers this to be the best measure of model fit. Like R2, it provides a quantitative indication of the extent to which the combination of independent variables in the model allows us to predict responses on the dependent variable. It is therefore sometimes referred to as a pseudo-R2. Adjusted versions of this coefficient by Cox and Snell and by Nagelkerke are also presented. On all of these measures the independent variables in the equation markedly assist in explaining variation in the dependent variables. Similarly, the percent of cases correctly classified shows the extent to which respondents are correctly classified into the discrete yes and no categories of the dependent variable. For both equations the percentages show a marked improvement over the roughly 58% possible using the modal category of each dependent variable, as seen in the frequency distributions of Table 1. So both qualitative and quantitative prediction are robust. The R2 and case classification measures both suggest the fit of the model for the revised question is marginally better than that for the traditional question, but the difference is modest at best.
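
In equation form, this proportional-reduction measure, plugged with the -2LL values reported in Table 3, works out to the RL2 figures shown there:

```latex
R_L^2 = \frac{-2LL_{\text{initial}} - \left(-2LL_{\text{model}}\right)}{-2LL_{\text{initial}}},
\qquad
\frac{152.7 - 107.5}{152.7} \approx .30,
\qquad
\frac{150.9 - 96.6}{150.9} \approx .36
```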

The upper portion of Table 3 presents unstandardized logistic regression coefficients (and their standard errors) which are used in the model to predict the dependent variable.[4] These coefficients closely parallel the unstandardized coefficients presented in an ordinary least squares regression; however, logistic regression coefficients are expressed as estimates of a variable’s influence on the log odds of the dependent variable (or logit). Reading and writing about a variable’s influence on the log odds of retaking a statistics course is undoubtedly awkward, but it accurately expresses the logged units of the dependent variable. In this particular instance, comparison of variables across models is somewhat facilitated by their being derived from the same data set and hence having roughly comparable standard errors. Nevertheless, because each of the independent variables may be measured in different units, comparing the absolute size of their respective coefficients within a particular equation can be misleading; they must be viewed within the context of their respective standard errors. Accordingly, it is particularly useful to attend to the statistical significance of the independent variables.
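
For reference, the underlying model can be written in the usual way, with each coefficient giving the change in the log odds of a “yes” response for a one-unit change in its predictor, other predictors held constant:

```latex
\ln\!\left(\frac{p}{1-p}\right) = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_k x_k
```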

Looking across the first line of the table, the negative coefficients for required status indicate that, controlling for the other variables in the model, the retake rate is lower among those for whom the course is required. The sign on the coefficient is the same for both equations, but their magnitudes differ markedly. Referring to the left-hand column of Table 3, in the equation predicting answers on the traditional retake question, we see that the required status of the course results in a .90 unit decrease in the log odds of students’ willingness to retake the course.[5] Moving to the second column, we see that required status produces a decrease of only .58 units using the revised question. Viewed within the context of their respective standard errors, the effect of the course being required is substantially greater using the traditional measure than with the revised measure. And insofar as the standard errors are roughly equivalent, one might observe that the effect of the course’s required status is reduced by about a third using the revised wording. Moreover, the effect is statistically significant in the left-hand column but not in the right. In other words, the required status of the course is a significant predictor of the retake rate using the traditional question but not the revised question.
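
One common way to make these log-odds magnitudes more intuitive (not necessarily the conversion used in the paper’s footnote) is to exponentiate them into odds ratios:

```latex
e^{-0.90} \approx 0.41 \qquad\text{versus}\qquad e^{-0.58} \approx 0.56
```

That is, under the traditional wording, required status multiplies the odds of a “yes” response by roughly .41 (a 59% drop in the odds), while under the revised wording it multiplies them by roughly .56 (a 44% drop).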