Tracking bouncing balls:
Default linear motion extrapolation and bounce anticipation
Jeroen Atsma
Rob van Lier
Arno Koning
Donders Institute for Brain, Cognition and Behaviour
Centre for Cognition
Radboud University Nijmegen, The Netherlands
Jeroen Atsma
Student number: 0513172
MSc Thesis Cognitive Neuroscience
July, 2011
Abstract
We investigated motion extrapolation in object tracking in two experiments. In Experiment 1, we used a Multiple-Object-Tracking (MOT) task (3 targets, 3 distractors) combined with a probe detection task to map the attentional spread around a target object. We showed an increased probe detection rate at locations where a target is heading. In Experiment 2, we introduced a black line ('wall') in the centre of the screen and manipulated the objects' motion with respect to this wall block-wise: in one condition objects realistically bounced against the wall, whereas in the other condition objects passed through it. Just before a target object reached the wall, a probe could appear either on the bounce path or on the straight path. In addition to MOT, we included a Single-Object-Tracking (SOT) task (1 target, 5 distractors) for control purposes. We found that in both tasks, straight-path probes were detected more often than bounce-path probes in both motion conditions. This supports the idea that (retinal) linear motion extrapolation, which was also found in Experiment 1, is an automatic, stimulus-driven process. In SOT, bounce-path probes were detected more often in the bounce-motion condition than in the straight-motion condition. In MOT, this bounce anticipation was not observed, presumably because of the high attentional load involved in tracking multiple objects. We conclude that tracking mechanisms are primarily stimulus-driven and that top-down predictions are used only when attentional load is low.
Keywords: multiple object tracking (MOT), visual attention, prediction, anticipation, spatiotemporal information, attentional allocation
Our ability to visually track multiple objects is crucial in daily life. For example, when driving a car, you have to attend to other vehicles, cyclists, crossing pedestrians, and so on, to avoid collisions. To do this, it is likely helpful not only to encode the objects' current positions (e.g. those of other cars), but also to predict where the objects will be in the near future. Predicting where an object will be in the near future based on its motion information is called motion extrapolation. Motion extrapolation is used when catching a ball: you have to predict when and where the ball will reach you. When someone throws a ball against a wall, you additionally have to anticipate the bounce in order to catch it. Motion extrapolation in such a situation is more complex, but adults appear to be able to do it almost flawlessly. Here we investigate (anticipatory) motion extrapolation in a situation where multiple objects have to be tracked.
Tracking an object requires selective attention. The attention system is characterized by a few core aspects: it has a limited amount of resources, and only with some effort can these resources be distributed over a limited number of stimuli (Pashler, 1998). Stimuli can be selected intentionally (top-down) or automatically (bottom-up). In order to maintain one's focus on a moving object, the object should maintain its identity as the same persisting individual, even when it temporarily becomes occluded or changes shape. In other words, the visual system has to determine object persistence. It seems that the visual system determines object persistence by assuming that physical objects cannot instantly relocate from one place to another, but exist continuously throughout space and time. In other words, the visual system expects spatiotemporal continuity (e.g. Scholl, 2001; Spelke, 1990). For example, when an object disappears into a 'tunnel' and reappears along a consistent trajectory, the object is perceived as one enduring entity (Burke, 1952; Michotte, Thinès, & Crabbé, 1964/1991). This happens even when the object’s size, colour or shape has changed dramatically when it reappears (Flombaum & Scholl, 2006; Flombaum, Kundey, Santos, & Scholl, 2004).
When more than one object is relocated, the brain has to figure out which object moved where. Two heuristics seem to be used in particular: the law of proximity (e.g. Kolers, 1972; Navon, 1976; Ullman, 1980; Dawson, 1991) and the law of good continuation (e.g. Wertheimer, 1923/1950; Ramachandran & Anstis, 1983). Consider two objects that are separated by a visual angle of ten degrees. When both objects are suddenly relocated by one degree in some direction, object persistence will be determined by the law of proximity. The visual system assumes that the objects have not swapped positions (which would imply that each travelled between nine and eleven degrees), but instead have each travelled only one degree. The law of good continuation refers to the heuristic that objects in motion do not change direction abruptly. Ramachandran and Anstis (1983) showed that when presenting ambiguous apparent motion of dots, perception can become strongly biased (even overriding proximity effects) by introducing interactions with earlier dots. The authors postulated that “If an object has once been seen moving in one direction, there is a strong tendency to continue seeing motion in that direction” (Ramachandran & Anstis, 1983, p. 84). This suggests that the visual system may also use motion extrapolation for individuating and tracking moving objects.
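To make the proximity argument explicit, the distances in this example work out as follows (a worked sketch; the labels A and B and the primed notation for the new positions are our own):

```latex
% Objects A and B are 10 deg of visual angle apart; each is
% displaced by 1 deg in an arbitrary direction (primes mark the
% new positions).
\[
\|A'-A\| = \|B'-B\| = 1^{\circ}, \qquad \|B-A\| = 10^{\circ}.
\]
% Under the swapped correspondence, A would have travelled to B's
% new location. By the triangle inequality:
\[
10^{\circ}-1^{\circ} \;\le\; \|B'-A\| \;\le\; 10^{\circ}+1^{\circ},
\]
% i.e. a travel distance of 9 to 11 deg, versus 1 deg under the
% non-swapped correspondence; the law of proximity favours the latter.
```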
We will distinguish two forms of motion extrapolation: anticipatory (top-down) and retinal (bottom-up). In the example with the bouncing ball, top-down knowledge about kinetics must be used to correctly extrapolate the ball's motion path. In the study of Ramachandran and Anstis, by contrast, the apparent motion of the dots was biased at the retinal level: motion extrapolation was determined by the law of good continuation without the involvement of top-down expectations. Verghese and McKee (2002) investigated retinal motion extrapolation using a trajectory detection task. Humans appear to be good at detecting a trajectory in visual noise. When surrounded by hundreds of randomly moving dots, a single dot moving on a straight path for only 100 ms already draws attention (Nakayama & Silverman, 1984; van Doorn & Koenderink, 1984; Snowden & Braddick, 1989). Verghese and McKee showed that when unidirectional movement is detected within a time window of 70-100 ms, attention is automatically allocated to the subsequent segments of the trajectory, leading to enhanced detectability.
The presence of retinal motion extrapolation is also shown in a phenomenon called 'representational momentum' (Freyd & Finke, 1984): when a moving object suddenly disappears and an observer is asked to localize the object’s final position, the observer's estimate is typically shifted in the direction of motion (Hubbard, 1995; Kerzel, 2003a; Kerzel, 2003b; Iordanescu, Grabowecky, & Suzuki, 2009). However, motion extrapolation in representational momentum is not an automatic process like the one described in the study of Verghese and McKee. That is, when attention is drawn away by distracting stimuli between the object's disappearance and the response, the forward displacement effect (the shift) disappears (Kerzel, 2003a). This suggests that motion extrapolation in representational momentum requires sustained attention.
In the last couple of decades, the functioning of the attention system in object tracking has been extensively studied using a paradigm known as Multiple Object Tracking (MOT) (Pylyshyn & Storm, 1988; see Scholl, 2007, for a review). In MOT, a subset (the targets) of identical moving objects on a computer screen has to be tracked simultaneously. Often the number of non-targets (distractors) is identical to the number of targets. Because all objects are identical, only spatiotemporal information can be used to separate the targets from the distractors. It has been found that humans can track up to five targets (Pylyshyn & Storm, 1988). Beyond this number, tracking performance declines rapidly (e.g. Oksama & Hyönä, 2004).
Representational momentum has also been found in MOT. Iordanescu, Grabowecky, and Suzuki (2009) showed that target localization becomes more precise when a target disappears at a location close to other objects. Such a situation is prone to target-distractor confusion and therefore requires extra attentional resources. This heightening of attention reduces the perceived shift in the direction of motion (i.e. the forward displacement effect). According to the authors, when a target is less prone to target-distractor confusion, the visual system seems to rely more on retinal extrapolation. It is also possible that retinal motion extrapolation breaks down in such situations because several motion cues are located close to each other at the retinal level.
When an object temporarily becomes occluded, the object's motion information is no longer present at the retinal level. To extrapolate the object's motion in such a situation, top-down expectancies have to be used (i.e. anticipatory motion extrapolation). Several studies suggest that, in MOT, such expectancies are not used. For example, Franconeri, Pylyshyn and Scholl (2005) looked at tracking performance in MOT when targets temporarily become occluded. They showed that tracking was unaffected when a target object changed its movement direction under occlusion. However, tracking performance dropped when the distance between the locations of occlusion and disocclusion was increased experimentally. This suggests that in situations of occlusion, the law of proximity is primarily used to determine object persistence. Furthermore, a MOT study by Keane and Pylyshyn (2006) showed that when all objects abruptly disappear for a brief period of time (e.g. 200-400 ms), tracking was best when the objects reappeared not at their extrapolated locations, but at the locations of disappearance or even at locations opposite to their direction of motion. In sum, the visual system seems to (automatically) extrapolate an object's motion path (e.g. Verghese & McKee, 2002), but does not seem to use this information for tracking when occlusion occurs (e.g. Keane & Pylyshyn, 2006).
Apparently, when targets become occluded, top-down expectations (i.e. anticipation) are not used. Is motion extrapolation in MOT, then, purely retinal? When you watch a ball that bounces in a physically plausible way, you may anticipate the ball’s new movement direction just before it actually bounces. Intuitively, this would benefit tracking. Here, we investigate whether this form of anticipatory motion extrapolation is used in MOT. Based on the study by Verghese and McKee, we expect that attention will be automatically allocated to the regions ahead of the moving targets (i.e. retinal motion extrapolation). In Experiment 1 we will map the spread of attention around randomly moving targets in order to find this retinal motion extrapolation. In Experiment 2, we will let the objects bounce in a physically plausible way. We hypothesize that just before a target bounces, attention increases at regions along the target's bounce path. In other words, we expect to find anticipatory motion extrapolation expressed in the distribution of attention.
The distribution of attention will be measured using a task in which briefly presented dots (probes) have to be detected. Since attention increases visual resolution and therefore visual sensitivity (Handy, Kingstone, & Mangun, 1996; Yeshurun & Carrasco, 1998), detection of probes at specific locations can be used as a measure of the amount of attentional resources at those locations. MOT combined with a probe detection task enables us to map the distribution of attention while the spatiotemporal information of the moving objects can be manipulated experimentally (e.g. Alvarez & Scholl, 2005; Flombaum, Scholl, & Pylyshyn, 2008). For example, it has been shown that attention is concentrated on targets while distractors are actively inhibited (Pylyshyn, 2006; Flombaum, Scholl, & Pylyshyn, 2008). For Experiment 1, we hypothesize that probes ahead of a moving target will be detected more often than probes at other angles. For Experiment 2, we hypothesize that probes at a certain angle (from the target's centre) will be detected more often when a target is about to bounce in that direction. We will investigate bounce anticipation in MOT and also in Single Object Tracking (SOT). Because attentional load is relatively low in SOT, we hypothesize that top-down anticipation will be more pronounced in SOT than in MOT.
Experiment 1: Examining the spread of attention around a moving MOT target
The purpose of Experiment 1 was to map the spread of attention with respect to a target's movement direction. In MOT, open space is not attended homogeneously: some locations in open space receive more attentional resources than others, based on the objects' spatiotemporal information (e.g. Alvarez & Scholl, 2005; Matsubara, Shioiri, & Yaguchi, 2007; Vrins, Atsma, Koning, & van Lier, submitted). We used a probe detection task (briefly appearing dots) to measure the spread of attention at eight angles relative to the target's movement direction. We hypothesized that probes ahead of a target would be detected most often, reflecting retinal motion extrapolation. Four distances (from the target's centre) were used to determine at which distance the target's motion has the most influence on the spread of attention.
Method
Participants
Twenty-two undergraduate students of the Radboud University Nijmegen took part in the experiment and received course credits for participation. Three participants were excluded from analysis because of poor probe detection performance (fewer than 10% of probes detected). The nineteen analyzed participants, eighteen women and one man, were aged between 17 and 23 years (M = 19.2, SD = 1.8). All reported normal or corrected-to-normal vision and were naive about the purpose of the experiment.
Stimuli and Design
Each trial involved three target and three distractor objects. The objects were all black circular outlines presented on a white background; each circle subtended 2.2° of the visual field (96 pixels). Each trial started with a four-second target-designation phase during which the targets blinked four times. After this phase, all objects started to move along straight paths in random directions (using random 2D vectors). All objects were continuously visible and did not occlude each other. When an object intersected with the edge of the screen or another object, a new (pseudo-)random direction resolved the intersection, giving rise to an (unrealistic) bounce.
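As an illustration, the direction-resolution logic could look roughly like the following. This is a minimal sketch in C# (the language of the custom trajectory software described below); all identifiers and the retry scheme are our own assumptions, not the actual implementation.

```csharp
using System;

// Sketch of the bounce logic: when an object touches the screen edge or
// another object, draw new pseudo-random directions until the next step
// no longer intersects anything. The exit angle is unrelated to the
// entry angle, which is what makes the resulting bounce unrealistic.
class ObjectMotion
{
    static readonly Random Rng = new Random();

    // Unit-length 2D direction vector with a uniformly random angle.
    static (double dx, double dy) RandomDirection()
    {
        double angle = Rng.NextDouble() * 2.0 * Math.PI;
        return (Math.Cos(angle), Math.Sin(angle));
    }

    // intersects(x, y) is assumed to report whether a position collides
    // with the screen edge or another object.
    static (double dx, double dy) ResolveIntersection(
        Func<double, double, bool> intersects, double x, double y, double step)
    {
        while (true)
        {
            var (dx, dy) = RandomDirection();
            if (!intersects(x + dx * step, y + dy * step))
                return (dx, dy);
        }
    }
}
```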
Object trajectories and probe locations were generated beforehand using custom-made software written in C#. Object speed was kept constant at 7.0°/s (300 pixels/s). A probe could appear near a target only when that target was more than 300 pixels away from all other objects. In order to map the spread correctly, probes were not stationary dots, but moved along with the associated target object, keeping their angle and distance to the target's centre constant.
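The co-moving probe placement amounts to a small piece of geometry: the probe sits at a fixed distance from the target's centre, at a fixed angle relative to the target's movement direction. The following sketch shows this under our own naming; treating the heading as the reference axis is our reading of "angle relative to the movement direction", not a detail taken from the actual software.

```csharp
using System;

// Sketch: position a probe at a fixed distance and a fixed angle
// relative to the target's movement direction.
static class ProbeGeometry
{
    // targetX/targetY: target centre in pixels; headingRad: direction of
    // motion; probeAngleRad: tested angle relative to the heading
    // (0 = straight ahead); distancePx: probe distance from the centre.
    public static (double x, double y) ProbePosition(
        double targetX, double targetY,
        double headingRad, double probeAngleRad, double distancePx)
    {
        double angle = headingRad + probeAngleRad;
        return (targetX + distancePx * Math.Cos(angle),
                targetY + distancePx * Math.Sin(angle));
    }
}
```

Because the target moves at a constant 300 pixels/s, a position computed this way would be updated every frame from the target's current centre and heading, which is what keeps the probe's angle and distance constant while it co-moves with the target.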