Preventing Off-Task Gaming Behavior within Intelligent Tutoring Systems
Jason A. Walonoski Neil T. Heffernan
The MITRE Corporation
Worcester Polytechnic Institute
Abstract. The positive learning effects of Intelligent Tutoring Systems may be negated by off-task student behavior, especially gaming behavior. Gaming the system has been defined as “exploiting properties of the system rather than by learning the material and trying to use that knowledge to answer correctly” [1]. The goal of this research was to develop a passive visual indicator that deters and prevents gaming behavior without active intervention, via graphical feedback to students and teachers. Traditional active intervention approaches were also constructed for comparison purposes. Our passive graphical intervention has been well received by teachers, and results suggest that this technique is effective at reducing off-task gaming behavior.
Keywords: Off-task Gaming Behavior, Intelligent Tutoring Systems, Prevention
1 Introduction
Intelligent Tutoring Systems (ITS) have been shown to have a positive effect on student learning [2]; however, these effects may be negated by a lack of student motivation or by student misuse, particularly “gaming the system” [3]. Gaming the system has been defined as “exploiting properties of the system rather than by learning the material and trying to use that knowledge to answer correctly” [1]. Two common types of gaming behavior in ITS are rapid-fire guessing-and-checking and hint/help abuse [3]. Student gaming has been correlated with substantially less learning [3], so understanding and remediating it is particularly important for maximizing tutor effectiveness.
Within ITS, a variety of approaches have been used to remediate gaming behavior; most are active interventions focused on combating gaming, and few focus on prevention [1]. We classify some “preventative” approaches, such as those that actively tweak help features [4, 5], alert the student, or add extra work, as “combative,” since they actively intervene and often have side effects for non-gaming students. Our definition of prevention therefore differs from these previous works: we distinguish active interference and neutralization attempts from methods that do not explicitly interfere with the student or label behavior as “bad” or “gaming.”
This research explored a more comprehensive approach within the ASSISTments mathematics ITS [6], combining active interventions that combat gaming with a passive method that prevents it. Our passive intervention is a “student tracker” that visually summarizes everything a student has done in the last twenty minutes; a teacher can use it to identify students who are “gaming the system” or to detect other patterns of behavior.
2 Combating Gaming versus Prevention of Gaming
Gaming behavior has been targeted with various methods [3]. Some interventions combat gaming the way firefighters fight fires, addressing the symptoms after they appear, while others try to prevent gaming by stopping it before it begins [1].
We categorize interventions along two dimensions: active versus passive, and static versus dynamic. An intervention is active if it combats behavior by altering the system's interface (e.g., disabling buttons or popping up windows with text) or its behavior (e.g., introducing time delays that forcibly slow down student actions, or adding extra scaffolding). An intervention is passive if it does not alter the interface or behavior of the environment at run time in response to student actions, yet still effectively alters the behavior of the students. Static interventions apply to all students and are inherent in the design of the tutor (e.g., fixed hint delays), while dynamic interventions are invoked only when triggered by some mechanism (e.g., disabling hints when gaming behavior is detected). We surmise that any intervention can be plotted along these two axes.
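As a minimal sketch, the two-axis classification can be written as a small Python data structure. The class name `Intervention` and its two boolean fields are hypothetical labels for the criteria defined above (whether an intervention alters the tutor's interface or behavior, and whether it is invoked only when some mechanism triggers it). The two example entries are the ones named in the text. Because fixed hint delays alter the tutor's behavior for every student, these definitions place them in the active/static cell.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Intervention:
    """Hypothetical record placing an intervention on the two axes described above."""
    name: str
    alters_tutor: bool   # changes the tutor's interface or behavior (active)
    needs_trigger: bool  # invoked only when some mechanism fires (dynamic)

    def quadrant(self) -> Tuple[str, str]:
        mode = "active" if self.alters_tutor else "passive"
        timing = "dynamic" if self.needs_trigger else "static"
        return mode, timing


# The two examples given in the text.
fixed_delay = Intervention("fixed hint delays", alters_tutor=True, needs_trigger=False)
hint_lockout = Intervention("disable hints when gaming is detected",
                            alters_tutor=True, needs_trigger=True)

for item in (fixed_delay, hint_lockout):
    print(item.name, "->", item.quadrant())
```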
Active interventions have the potential to fuel gaming arms races, in which gaming students adapt to the interventions and even attempt to game the anti-gaming interventions themselves [5]. Inappropriate invocation of active interventions can also unfairly penalize non-gaming students and inhibit their legitimate learning efforts [3], for example by preventing a student from asking for hints when they are truly needed. Because students who engage in off-task gaming behavior make up only a small percentage of all users (estimated at between 5% and 15% in the ASSISTments system), the unfairness resulting from improper application of active interventions can discourage the majority of students, who never game the system [3].
3 Hypothesis
We hypothesize that dynamic passive interventions can prevent gaming, eliminate the gaming arms race, and avoid unfairly penalizing on-task students through improper invocation. The ideal intervention should be dynamically linked to the students’ behavior or actions within the tutor, while not actively altering the tutoring environment’s interface or behavior for non-gaming students. A similar approach was taken with “Scooter” [1]; however, instead of additional exercises we employ simple messages (which alter the tutor’s behavior less), and instead of emotive cartoon images we use a student tracker with informational overlays (mouse-overs). Our student tracker is a dynamic passive intervention: a graphical visualization of student actions and progress that summarizes and identifies student behavior through emergent visual patterns. With these patterns featured prominently on-screen for easy viewing by students and teachers, gaming behavior might be prevented through a Panopticon-like effect (in which the possibility of being watched, without knowing whether one is being watched at any given moment, causes self-corrective behavior) [7] or through some other psychological mechanism. Additionally, such a visualization should provide a starting point for teacher intervention wherever gaming behavior is identified or student misunderstandings are revealed.
4 Intervention Design
Our comprehensive approach pairs active interventions that combat gaming with a passive intervention that prevents gaming within the ASSISTments mathematics ITS. We developed three dynamic gaming interventions: two traditional active interventions and one passive intervention. The interventions were deployed and evaluated experimentally.
Two active interventions were used to respond separately to the two hallmark types of gaming behavior: rapid-fire guessing-and-checking and hint/help abuse. These interventions were triggered by a knowledge-engineered gaming detection model developed in [8], which marked a student as guessing-and-checking or abusing hints based on the appropriate surface-level characteristics (no latent variables, such as student knowledge level or problem difficulty, are used). If a student asked for three consecutive hints or answered incorrectly three consecutive times, they were suspected of possibly gaming that problem, and a possibly-hinting or possibly-guessing index was increased by one, depending on which type of gaming they were suspected of. If an entire problem was completed without any individual step being suspected of gaming, one or both of the possibly-gaming indexes were reduced by one (with a minimum of zero). If at any time either possibly-gaming index was above three, any further suspected actions increased the student’s guessing score or hinting score by one. Summing a student’s guessing score and hinting score yields their total gaming score [8]. In effect, if a student repeatedly asked for hints until all help was exhausted, or repeatedly answered a question incorrectly without pausing, seeking help, or taking any other action, the model began to suspect them of gaming. Using prior evidence that gaming behavior usually occurs in clusters [9], the model marked a student as definitely gaming only if they were suspected of gaming on three separate yet consecutive problems. This suspicion mechanism allowed the model to gracefully accommodate students who occasionally displayed these gaming behaviors but who were actually on task.
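The detection rules above can be sketched as a small state machine. This is a minimal Python sketch under stated assumptions, not the model from [8]. Several details are interpretations of the prose: the action codes (`"hint"`, `"wrong"`, `"right"`), counting each run of three as one suspicion, decrementing both indexes after a clean problem, and scoring a suspicion only if an index was already above three. Every name is hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical action codes; the log format of the original model is not described.
HINT, WRONG, RIGHT = "hint", "wrong", "right"

RUN_LENGTH = 3            # three consecutive hints or wrong answers => suspicion
INDEX_THRESHOLD = 3       # once an index is above three, suspicions are scored
CONSECUTIVE_PROBLEMS = 3  # suspected on three consecutive problems => definitely gaming


@dataclass
class GamingDetector:
    possibly_hinting: int = 0
    possibly_guessing: int = 0
    hint_score: int = 0
    guess_score: int = 0
    suspected_streak: int = 0  # consecutive problems containing a suspicion

    @property
    def total_score(self) -> int:
        return self.hint_score + self.guess_score

    @property
    def definitely_gaming(self) -> bool:
        return self.suspected_streak >= CONSECUTIVE_PROBLEMS

    def _suspect(self, kind: str) -> None:
        # Assumption: score this suspicion only if an index was already above three.
        if max(self.possibly_hinting, self.possibly_guessing) > INDEX_THRESHOLD:
            if kind == HINT:
                self.hint_score += 1
            else:
                self.guess_score += 1
        if kind == HINT:
            self.possibly_hinting += 1
        else:
            self.possibly_guessing += 1

    def process_problem(self, actions: List[str]) -> None:
        """Update the detector with the ordered actions of one completed problem."""
        suspected = False
        run_kind: Optional[str] = None
        run_len = 0
        for action in actions:
            if action in (HINT, WRONG):
                run_len = run_len + 1 if action == run_kind else 1
                run_kind = action
                if run_len == RUN_LENGTH:
                    self._suspect(action)
                    suspected = True
                    run_len = 0  # assumption: each run of three is one suspicion
            else:
                run_kind, run_len = None, 0  # any other action breaks the run
        if suspected:
            self.suspected_streak += 1
        else:
            # A clean problem lowers both indexes, never below zero.
            self.suspected_streak = 0
            self.possibly_hinting = max(0, self.possibly_hinting - 1)
            self.possibly_guessing = max(0, self.possibly_guessing - 1)
```

For example, three problems in a row answered as `["wrong", "wrong", "wrong", "right"]` leave `possibly_guessing` at 3 and `definitely_gaming` true while the total gaming score is still 0. Only later suspicions, after the index has passed three, begin adding to the guessing score.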
When a student was marked as gaming, they received one of the two active interventions, worded as carefully as possible so as not to insult the student or unjustly accuse them of doing anything improper. A screen capture of the active interventions appears in Figure 1.
Figure 1. Dynamic Active Interventions, Guess-and-Check & Hint-Abuse
The passive intervention sought to prevent gaming by plotting student actions, giving visual feedback on what the student had done and how they were progressing. It had no triggering mechanism and was continuously featured prominently on-screen for easy viewing by students and teachers. All recorded student actions (such as problem attempts, hint requests, and bottom-out hints) were plotted on a horizontal timeline. On mouse-over, each action shows summary text that identifies the action and gives its relevant details and results: the question or problem text, the student action, the student’s answer or response (if they made an attempt), and the student’s response time (in milliseconds). The horizontal distance between points reflects the time elapsed between actions. The vertical height of an action is based on its outcome (correct actions are plotted higher than incorrect actions). Throughout the design, the ubiquitous traffic-light color conventions of modern Western society are used: green is implicitly “good” or “correct,” yellow is “caution,” and red is “bad” or “incorrect.” As a summary estimate of the student’s performance, the background color of the graphic changes along a gradient from white to black based on the percentage of attempts that are correct (white above 90% correct, black below 10% correct). An example of the component after a few minutes of on-task use is shown in Figure 2.
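One way to compute the chart's geometry is sketched below in Python. The text fixes only the qualitative rules: time on the horizontal axis, correct actions plotted higher than incorrect ones, traffic-light colors, and a white-to-black background driven by percent correct. The specific vertical levels, the choice to plot hint requests at a middle level in yellow and bottom-out hints low in red (inferred from the descriptions of Figures 3 and 4), the linear gradient between 10% and 90%, and all names are assumptions.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Assumed vertical levels and traffic-light colors for each action kind.
LEVEL = {"correct": 1.0, "hint": 0.5, "bottom_out": 0.0, "incorrect": 0.0}
COLOR = {"correct": "green", "hint": "yellow", "bottom_out": "red", "incorrect": "red"}


@dataclass
class Action:
    kind: str          # one of the LEVEL keys
    timestamp_ms: int  # when the action was logged


def plot_points(actions: List[Action]) -> List[Tuple[float, float, str]]:
    """Return (seconds since first action, vertical level, color) for each action."""
    if not actions:
        return []
    start = actions[0].timestamp_ms
    return [((a.timestamp_ms - start) / 1000.0, LEVEL[a.kind], COLOR[a.kind])
            for a in actions]


def background_gray(actions: List[Action]) -> float:
    """Background gray level from percent correct: 1.0 is white, 0.0 is black."""
    attempts = [a for a in actions if a.kind in ("correct", "incorrect")]
    if not attempts:
        return 1.0  # assumption: no attempts yet shows a white background
    pct = sum(a.kind == "correct" for a in attempts) / len(attempts)
    if pct > 0.9:
        return 1.0
    if pct < 0.1:
        return 0.0
    return (pct - 0.1) / 0.8  # linear ramp between the two thresholds


def to_hex(gray: float) -> str:
    """Convert a gray level to an RGB hex color string."""
    level = round(gray * 255)
    return f"#{level:02x}{level:02x}{level:02x}"
```

One option for drawing is to color each line segment by its later action. Under these assumptions, that would reproduce the yellow, then red, then green signature described for hint abuse.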
Figure 2. Dynamic Passive Intervention Example, On-Task
As designed, the passive intervention displays a summary of user actions over time that clearly classifies a student’s behavior to an observer (such as a teacher) through emerging patterns of indicators (see Figures 3 and 4 for examples of charts capturing gaming behavior). The nature of the patterns (and the mouse-over drill-down detail, not shown) also allows a teacher to ask, “What did you do here?” and prompt an investigation into the student’s actions, which could reveal a lack of motivation or even a fundamental gap in the student’s knowledge. By helping teachers identify student weaknesses or misunderstandings through a trace of actions across the student’s session, the chart can support a more sophisticated teacher-student dialogue than Scooter [1] or other previous responses to gaming.
Both of these functions, classifying gaming status by an emergent pattern of indicators and identifying student weaknesses, are captured in Figure 3, which illustrates how a typical guessing-and-checking student’s actions plot: a long series of rapid incorrect attempts is occasionally interrupted by a lone correct attempt (perhaps a lucky guess) or a move to the next problem, only to be followed by a new stream of incorrect actions. This chart admits only two interpretations: either the student is engaging in off-task guessing-and-checking gaming behavior, or they have a fundamental lack of knowledge relating to the problem and are making many genuine errors without seeking help. Either way, teacher intervention is probably appropriate.
Figure 3. Dynamic Passive Intervention Example, Guessing-and-Checking
Similarly, Figure 4 shows the graph produced by help-abuse gaming: a series of rapid hint requests (the yellow lines vertically near the center), followed immediately by a steep red drop (the bottom-out hint request) and then an upward green line (a correct answer entered after the bottom-out hint supplied it directly). This pattern occasionally appears in on-task usage, but when systematically repeated it is a clear indicator of off-task help-abuse gaming behavior.
Figure 4. Dynamic Passive Intervention Example, Abusing-Help
Figures 3 and 4 both show plots of students who were engaged in off-task gaming behavior after only a few minutes of a tutoring session. Because the graphical component scales as time progresses, gaming looks slightly different over a long session, but it is even more readily apparent.
Figure 5. Dynamic Passive Intervention Example, Long Session with Partial Gaming
Figure 5 shows a plot of actions from a student who was relatively on task but completed one problem through off-task help-abuse gaming. Even to an uninformed observer, this portion of the plot is clearly different from the rest. The graphical component not only plots actions in a systematic manner, but does so in such a way that conspicuous patterns emerge in the case of off-task gaming behavior, especially as the tutoring session grows longer.
5 Experiment
Once all three dynamic interventions were designed and implemented, we conducted an experiment to test their effectiveness within the ASSISTments system with eighth-grade students from Worcester, Massachusetts, none of whom had used the system before. One group of 70 students (group 1) received both the active and passive interventions, while a second group of 57 students (group 2) received no interventions. Both groups used the tutoring system for an average of 3 class periods (approximately 45 minutes each), with the rate of gaming in each session measured by our knowledge-engineered gaming detection model [8]. We then swapped the conditions, so that group 1 no longer received interventions while group 2 began to encounter them. The students used the tutoring system for another class period, and the rate of gaming was compared before and after the swap.
Before the swap, group 1 had an average rate of gaming about 40% lower than that of group 2 (3.62 occurrences of gaming per session compared to 6.235), suggesting that the combination of active and passive interventions was having some effect. However, to show that this difference was not due to chance, the conditions were swapped. After the swap, both groups gamed less. Group 1 reduced gaming by an average of 2.82 occurrences per session, while group 2 reduced gaming by an average of 4.45 occurrences per session. One-sided t-tests were performed on each group’s change in average gaming score to test whether it differed significantly from zero; in both cases it did (p < 0.0001 for both tests). To determine whether enabling the interventions (group 2) had a bigger impact than disabling them (group 1), we conducted an analysis of variance (ANOVA); the result (F(1, 125) = 3.02, p = 0.08) is only marginally significant, but suggests that enabling the interventions makes a bigger impact on gaming than disabling them.
Metric / Before Change / After Change / Gaming Change / Gaming Std. Dev.
Group 1 (experimental followed by control)
Users (students) / 70 / 70
Avg. Hint Score / 2.257 / 0.829
Avg. Guess Score / 6.657 / 0.057
Avg. Total Gaming Score / 8.914 / 0.886
Avg. Gaming Score Per Day / 3.62 / 0.8 / -2.82 / 3.70
Group 2 (control followed by interventions)
Users (students) / 57 / 57
Avg. Hint Score / 8.912 / 1.754
Avg. Guess Score / 11.61 / 0.053
Avg. Total Gaming Score / 20.53 / 1.807
Avg. Gaming Score Per Day / 6.235 / 1.781 / -4.45 / 6.71
Table 1. Raw Metrics from the Intervention Experiment
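The reported statistics can be cross-checked against Table 1 with a short Python sketch that uses only the standard library. It assumes that the “Gaming Change” and “Gaming Std. Dev.” columns give the mean and standard deviation of each student’s per-day change. With two groups, the one-way ANOVA F statistic equals the square of the pooled-variance two-sample t statistic, with 1 and n1 + n2 - 2 = 125 degrees of freedom, which matches the reported F(1, 125).

```python
import math

# Per-day gaming change (after minus before) from Table 1.
# Assumption: "Gaming Std. Dev." is the SD of each student's change.
GROUP1 = {"n": 70, "mean": -2.82, "sd": 3.70}  # interventions, then control
GROUP2 = {"n": 57, "mean": -4.45, "sd": 6.71}  # control, then interventions


def one_sample_t(g):
    """t statistic for H0: the group's mean change is zero (df = n - 1)."""
    return g["mean"] / (g["sd"] / math.sqrt(g["n"]))


def two_group_anova_f(g1, g2):
    """One-way ANOVA F for two groups, computed as the pooled-variance t squared."""
    df_within = g1["n"] + g2["n"] - 2
    pooled_var = ((g1["n"] - 1) * g1["sd"] ** 2
                  + (g2["n"] - 1) * g2["sd"] ** 2) / df_within
    se = math.sqrt(pooled_var * (1 / g1["n"] + 1 / g2["n"]))
    t = (g1["mean"] - g2["mean"]) / se
    return t * t, (1, df_within)


print("t, group 1:", round(one_sample_t(GROUP1), 2))
print("t, group 2:", round(one_sample_t(GROUP2), 2))
f_stat, dfs = two_group_anova_f(GROUP1, GROUP2)
print("F", dfs, "=", round(f_stat, 2))
```

With these inputs, F comes out at about 3.01. That is consistent with the reported 3.02 once rounding in the table is taken into account. The one-sample t statistics are roughly -6.4 and -5.0 on 69 and 56 degrees of freedom, far beyond the p < 0.0001 threshold. If SciPy is available, `scipy.stats.f.sf(f_stat, 1, 125)` gives the upper-tail p-value, which should land close to the reported 0.08.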
One possible explanation of these results is that students learn not to game while the interventions are enabled, and simply continue not to game once the interventions are disabled. Alternatively, the decrease in gaming across both groups in the second half of the study may merely be related to time: any rebelliousness or novelty associated with gaming wears off, and the offending students resign themselves to doing actual work. Further analysis might reveal whether actually receiving the active interventions is correlated with this decrease in gaming, as opposed to the mere possibility of receiving them (a student who never gamed would never have seen the active interventions while they were enabled). If not, we might conclude that the decrease in gaming was due more to the passive intervention, or perhaps to other factors. Either way, these results suggest that the combination of dynamic active and dynamic passive interventions is effective in reducing off-task gaming behavior. We leave the identification of the particular effect of each factor for future work.
6 Passive Intervention Teacher Evaluation
A teacher survey was used to gather anecdotal evaluations of the passive intervention and to collect suggestions for improving the graphical chart. The survey had 5 Likert-scale questions rated on a scale of 1 to 5 (1 = strongly disagree, 2 = disagree, 3 = not sure, 4 = agree, 5 = strongly agree) and one open-response question requesting suggestions and feedback. It was administered to 10 teachers who had used the system both with and without the dynamic passive intervention. A summary of the questions and responses appears below in Table 2.
Question / Strongly Disagree / Disagree / Not Sure / Agree / Strongly Agree
Do you think that the new graphical chart will aid teachers in assessing the progress and performance of their students? / 0 / 0 / 1 / 1 / 8
Do you think that the new graphical chart will aid students in self-assessing their progress and performance? / 0 / 0 / 1 / 5 / 4
Do you think that the new graphical chart will provide students with additional performance-based motivation? / 0 / 0 / 0 / 5 / 5
Do you think that the new graphical chart will provide students with additional learning-based motivation? / 0 / 0 / 2 / 5 / 3
Do you think that the new graphical chart will decrease off-task student behavior (talking, inactivity, excessive or unnecessary hinting or guessing)? / 0 / 0 / 2 / 3 / 5
Table 2. Summary of Teacher Survey Results
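Table 2 can also be condensed into per-question summaries. The Python sketch below computes a mean rating and the share of teachers who agreed or strongly agreed. The short question labels are our own abbreviations. Computing a mean treats the 1 to 5 Likert codes as interval data, which is an assumption, so the agreement share is the more conservative summary.

```python
# Response counts from Table 2, ordered from 1 (strongly disagree) to 5 (strongly agree).
SURVEY = {
    "aids teacher assessment": [0, 0, 1, 1, 8],
    "aids student self-assessment": [0, 0, 1, 5, 4],
    "performance-based motivation": [0, 0, 0, 5, 5],
    "learning-based motivation": [0, 0, 2, 5, 3],
    "decreases off-task behavior": [0, 0, 2, 3, 5],
}


def likert_summary(counts):
    """Return (mean rating, fraction answering agree or strongly agree)."""
    n = sum(counts)
    mean = sum(score * c for score, c in zip(range(1, 6), counts)) / n
    agree = (counts[3] + counts[4]) / n
    return mean, agree


for question, counts in SURVEY.items():
    mean, agree = likert_summary(counts)
    print(f"{question}: mean {mean:.1f}, agree {agree:.0%}")
```

On these counts, the means range from 4.1 (learning-based motivation) to 4.7 (aiding teacher assessment), and at least 8 of the 10 teachers agreed with every statement.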