Visual Motion Processing and Human Tracking Behavior - Modelling - Biologically Inspired Computer Vision (2015)


Part III

Chapter 12
Visual Motion Processing and Human Tracking Behavior

Anna Montagnini, Laurent U. Perrinet and Guillaume S. Masson

12.1 Introduction

Vision is our primary source of information about the 3D layout of our environment and the occurrence of events around us. In nature, visual motion is abundant and is generated by a large set of sources, such as the movement of another animal, be it predator or prey, or our own movements. Primates possess a high-performance system for motion processing, one that is closely linked to the reflexive sensorimotor circuitry of ocular tracking. Two important observations support this assumption: first, it is not possible to generate smooth pursuit at will across a stationary scene, and, second, it is not possible to fully suppress pursuit in a scene consisting solely of moving targets [2, 3].

In nonhuman and human primates, visual motion information is extracted through a cascade of cortical areas spanning the dorsal stream of the cortical visual pathways (for reviews, see Ref. [4]). Direction- and speed-selective cells are found in abundance in two key areas lying at the junction between the occipital and parietal lobes. The middle temporal (MT) and medio-superior temporal (MST) areas can decode local motion information of a single object at multiple scales, isolate it from its visual background, and reconstruct its trajectory [5, 6]. Interestingly, neuronal activities in MT and MST areas have been strongly related to the initiation and maintenance of tracking eye movements [7]. Dozens of experimental studies have shown that local direction and speed information for pursuit are encoded by MT populations [8] and transmitted to MST populations where non-retinal information is integrated to form an internal model of object motion [9]. Such a model is then forwarded to both frontal areas involved in pursuit control, such as the frontal eye fields (FEF) and supplementary eye fields (SEF), as well as to the brainstem and cerebellar oculomotor system (see Ref. [10] for a review).

What can we learn about visual motion processing by investigating smooth pursuit responses to moving objects? Since the pioneering work of Hassenstein and Reichardt [11], behavioral studies of tracking responses have been highly influential on theoretical approaches to motion detection mechanisms (see Chapter 17, this book), as illustrated by the fly vision literature (see Refs [12, 13] for recent reviews) and its application in bioinspired vision hardware (e.g., Refs [14, 15]). Because of their strong stimulus-driven nature, tracking eye movements have been used similarly in primates to probe fundamental aspects of motion processing, from detection to pattern motion integration (see a series of recent reviews in Refs [2, 16, 17]) and the coupling between the active pursuit of motion and motion perception [18]. One particular interest of tracking responses is that they are time-continuous, smooth, measurable movements that reflect the temporal dynamics of sensory processing in the changes of eye velocity [16, 19]. Here, we would like to focus on a particular aspect of motion processing in the context of sensorimotor transformation: uncertainty. Uncertainty arises both from random processes, such as the noise reflecting unpredictable fluctuations in the velocity signal, and from nonstochastic processes such as ambiguity when reconstructing global motion from local information. We will show that both sensory noise and ambiguities impact the sensorimotor transformation, as seen from the variability of eye movements and their course toward a steady, optimal solution.

Understanding the effects of various sources of noise and ambiguities can change our views on the two faces of the sensorimotor transformation. First, we can better understand how visual motion information is encoded in neural populations and the relationships between these population activities and behaviors [17, 20]. Second, it may change our view about how the brain controls eye movements and open the door to new theoretical approaches based on inference rather than linear control systems. Albeit still within the framework of stimulus-driven motor control, such a new point of view can help us elucidate how higher-level cognitive processes, such as prediction, can dynamically interact with sensory processing to produce a versatile, adaptive behavior by which we can catch a flying ball despite its complex and sometimes partially occluded trajectory.

This chapter is divided into four parts. First, we examine how noise and ambiguity both affect pursuit initiation. In particular, we focus on the temporal dynamics of uncertainty processing and the estimation of the optimal solution for motion tracking. Second, we summarize how nonsensory, predictive signals can help maintain good performance when sensory evidence becomes highly unreliable and, conversely, when future sensory inputs become highly predictable. Herein, we illustrate these aspects with behavioral results gathered in both human and nonhuman primates. Third, we show that these different results on visually guided and predictive smooth pursuit dynamics can be reconciled within a single Bayesian framework. Last, we propose a biologically plausible architecture implementing a hierarchical inference network for a closed-loop, visuomotor control of tracking eye movements.

12.2 Pursuit Initiation: Facing Uncertainties

The ultimate goal of pursuit eye movements is to reduce the retinal slip of image motion down to nearly zero such that fine details of the moving pattern can be analyzed by spatial vision mechanisms. Overall, the pursuit system acts as a negative feedback loop where the eye velocity matches target velocity (such that the pursuit gain is close to 1) to cancel the image motion on the retina. However, because of the delays due to both sensory and motor processing, the initial rising phase of eye velocity, known as pursuit initiation, is open-loop. This means that during this short period of time (less than about 100 ms), no information about eye movements is available to the system and the eye velocity depends only on the properties of the target motion presented to the subject. This short temporal window is ideal for probing how visual motion information is processed and transformed into an eye movement command [17, 21]. It becomes possible to map the different phases of visual motion processing to the changes in initial eye velocity and therefore to dissect out the contribution of spatial and temporal mechanisms of direction and speed decoding [19]. However, this picture becomes more complicated as soon as one considers more naturalistic and noisy conditions for motion tracking, whereby, for instance, the motion of a complex-shaped extended object has to be estimated, or when several objects move in different directions. We will not focus here on the latter problem, which involves the complex and time-demanding computational task of object segmentation [16, 22]. In the next subsections, we focus instead on the nature of the noise affecting pursuit initiation and the uncertainty related to limitations in processing spatially localized motion information.
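The open-loop logic described above can be illustrated with a toy simulation in which eye acceleration is driven by retinal slip seen through a fixed sensorimotor delay. This is a minimal sketch, not a fitted oculomotor model; all parameter values (gain, time constant, delay) are illustrative assumptions.

```python
import numpy as np

def simulate_pursuit(target_vel=10.0, delay_ms=100, gain=0.3,
                     tau_ms=50.0, dt_ms=1.0, t_end_ms=1000):
    """Toy pursuit loop: eye acceleration is driven by *delayed* retinal slip.

    Before `delay_ms` has elapsed the loop is effectively open: the eye
    cannot yet see the consequences of its own movement. Parameter
    values are illustrative, not fitted to data.
    """
    n = int(t_end_ms / dt_ms)
    d = int(delay_ms / dt_ms)
    eye = np.zeros(n)  # eye velocity (deg/s), one sample per ms
    for t in range(1, n):
        # The controller only sees eye velocity from `delay_ms` ago.
        seen_eye = eye[t - d] if t >= d else 0.0
        slip = target_vel - seen_eye
        eye[t] = eye[t - 1] + gain * slip * dt_ms / tau_ms
    return eye

eye = simulate_pursuit()
# Eye velocity ramps up during the open-loop phase, overshoots slightly
# because of the delay, then settles near target velocity (pursuit gain
# close to 1).
```

The damped overshoot at loop closure is a generic signature of delayed feedback: the longer the delay (or the higher the loop gain), the more oscillatory the transition to steady-state tracking.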

12.2.1 Where Is the Noise? Motion-Tracking Precision and Accuracy

Human motion tracking is variable across repeated trials. The possibility of using tracking behavior (at least during the initiation phase) to characterize visual motion processing across time and across all kinds of physical properties of the moving stimuli therefore relies strongly on the assumption that oculomotor noise does not override the details of visual motion processing. In order to characterize the variability of macaques' pursuit responses to visual motion signals, Ref. [23] analyzed the monkeys' pursuit eye movements during the initiation phase – here, the first c12-math-0001 ms after pursuit onset. On the basis of a principal component analysis of the pursuit covariance matrix, they concluded that pursuit variability was mostly due to sensory fluctuations in estimating target motion parameters such as onset time, direction, and speed, accounting for around c12-math-0002 of the pursuit variability. In a follow-up study, they estimated the time course of the pursuit system's sensitivity to small changes in target direction, speed, and onset time [24]. This analysis was based on pursuit variability during the first c12-math-0003 ms after target motion onset. Discrimination thresholds (the inverse of sensitivity) decreased rapidly during open-loop pursuit and, in the case of motion direction, followed a time course similar to the one obtained from the analysis of neuronal activity in the MT area [25].
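The logic of that covariance analysis can be sketched on synthetic data: if trial-to-trial variability stems from a small number of sensory parameters (here, onset time and speed gain), then a principal component analysis of the trace covariance should concentrate almost all variance in a correspondingly small number of components. This is an illustrative reconstruction of the method's logic, not the published analysis; the trace shape and noise magnitudes are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(0, 300)  # ms after target motion onset

def trial(onset_jitter, speed_gain):
    """Synthetic eye-speed trace: a smooth rise whose onset and gain vary."""
    rise = 1.0 / (1.0 + np.exp(-(t - (120 + onset_jitter)) / 20.0))
    return 10.0 * speed_gain * rise  # nominal target speed 10 deg/s

# 200 trials whose only variability comes from two latent sensory sources.
trials = np.array([trial(rng.normal(0, 10), rng.normal(1, 0.05))
                   for _ in range(200)])
X = trials - trials.mean(axis=0)       # remove the mean trajectory
cov = X.T @ X / (len(X) - 1)           # time-by-time covariance matrix
evals = np.linalg.eigvalsh(cov)[::-1]  # principal-component variances
explained = evals[:2].sum() / evals.sum()
# With only onset-time and gain fluctuations as sources, the top two
# components capture nearly all of the trial-to-trial variance.
```

In the real analysis, comparing the measured covariance against such parameterized sensory-noise predictions is what allows motor noise to be bounded from above.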

Noise inherent to the kinematic parameters of the moving target is not the only source of uncertainty for visual motion processing. A very well-known and puzzling finding in visual motion psychophysics is that the speed of low-contrast moving stimuli is most often underestimated as compared to high-contrast stimuli moving with exactly the same motion properties (see Chapter 9). In parallel with the perceptual misjudgment, previous studies have shown that tracking quality is systematically degraded (with longer onset latencies, lower acceleration at initiation, and lower pursuit gain) with low-contrast stimuli [26]. This reduction of motion estimation and tracking accuracy when decreasing the luminance-contrast of the moving stimulus has been interpreted as evidence in favor of the fact that, when visual information is corrupted, the motion-tracking system relies more on internal models of motion, or motion priors [27].
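The contrast-dependent underestimation described above falls out of a simple Gaussian sketch of the prior-likelihood trade-off: a low-speed prior pulls the estimate toward zero, and the pull grows as the likelihood broadens (low contrast). This is a minimal illustration in the spirit of Ref. [27], with assumed standard deviations, not a fitted model.

```python
def map_speed(true_speed, likelihood_sd, prior_sd=2.0):
    """MAP speed estimate under a zero-mean low-speed prior (Gaussian sketch).

    For a Gaussian prior N(0, prior_sd^2) and Gaussian likelihood
    N(true_speed, likelihood_sd^2), the posterior mean (= MAP) is the
    true speed shrunk by the ratio of prior variance to total variance.
    """
    w = prior_sd ** 2 / (prior_sd ** 2 + likelihood_sd ** 2)
    return w * true_speed

# Narrow likelihood (high contrast): estimate stays near veridical.
high_contrast = map_speed(10.0, likelihood_sd=0.5)
# Broad likelihood (low contrast): strong underestimation.
low_contrast = map_speed(10.0, likelihood_sd=3.0)
```

The same shrinkage factor governs the slower, lower-gain pursuit initiation reported for low-contrast targets: a slower internal speed estimate yields a weaker motor command.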

12.2.2 Where Is the Target Really Going?

Noise is not the only source of uncertainty that the external world imposes on sensory processing: another major one is ambiguity, since a single retinal image can correspond to many different physical arrangements of the objects in the environment. In the motion domain, a well-known example of such input ambiguity is called “the aperture problem.” When seen through a small aperture, the motion of an elongated edge (i.e., a one-dimensional (1D) change in luminance, see Figure 12.1(a), middle panel) is highly ambiguous: the same local motion signal can be generated by an infinite number of physical translations of the edge. Hans Wallach (see Ref. [28] for an English translation of the original publication in German) was the first psychologist to recognize this problem and to propose a solution for it. A spatial integration of the motion information provided by edges with different orientations can be used to recover the true velocity of the pattern. Moreover, two-dimensional (2D) features such as corners or line-endings (where luminance varies along two dimensions, see Figure 12.1(a), right panel, for an example) can also be extracted through the same small aperture, as their motion is no longer ambiguous. Again, 1D and 2D motion signals can be integrated to reconstruct the two-dimensional velocity vector of the moving pattern. After several decades of intensive research at both the physiological and behavioral levels, it remains largely unclear which computational rules are used by the brain to solve the 2D motion integration problem (see Ref. [29] for a collection of review articles).


Figure 12.1 Smooth pursuit's account for the dynamic solution of motion ambiguity and motion prediction. (a) A tilted bar translating horizontally in time (left panel) carries both ambiguous 1D motion cues (middle panel), and nonambiguous 2D motion cues (rightmost panel). (b) Example of average horizontal (c12-math-0004) and vertical (c12-math-0005) smooth pursuit eye velocity while tracking a vertical (left) or a tilted bar (right) translating horizontally, either to the right (red curves) or to the left (green curves). Velocity curves are aligned on smooth pursuit onset. (c) Schematic description of a trial in the experiment on anticipatory smooth pursuit: after a fixation display, a fixed duration blank precedes the motion onset of a tilted line moving rightward (with probability c12-math-0006) or leftward (with probability c12-math-0007). (d) Example of average horizontal (c12-math-0008) and vertical (c12-math-0009) smooth pursuit eye velocity in the anticipation experiment for two predictability conditions, c12-math-0010 (unpredictable, black curves) and c12-math-0011 (completely predictable, gray curves).

Indeed, several computational rules for motion integration have been proposed over the last 40 years (see Ref. [5] for a review). In a limited number of cases, a simple vector average of the velocity vectors corresponding to the different 1D edge motions can be sufficient. A more generic solution, the intersection-of-constraints (IOC), is a geometrical construction that can always recover the exact global velocity vector from at least two moving edges with different orientations [30, 31]. However, the fact that perceived direction does not always correspond to the IOC solution (for instance, at very short stimulus durations [32] or when a single 1D motion signal is present [33, 34]) has supported a role for local 2D features in motion integration.
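The IOC rule has a direct numerical form: each edge constrains only the velocity component along its unit normal, v · nᵢ = sᵢ, so two or more non-parallel constraints pin down the global velocity. The sketch below, with an assumed pattern velocity and edge orientations, also shows why the vector average can fail.

```python
import numpy as np

def intersection_of_constraints(normals, normal_speeds):
    """Recover the global 2D velocity from >= 2 ambiguous 1D (edge) motions.

    Each edge i only constrains the velocity component along its unit
    normal: v . n_i = s_i. With two non-parallel edges the constraint
    lines intersect at a single point in velocity space (the IOC
    solution), obtained here by least squares.
    """
    N = np.asarray(normals, dtype=float)
    s = np.asarray(normal_speeds, dtype=float)
    v, *_ = np.linalg.lstsq(N, s, rcond=None)
    return v

# A pattern truly moving at (3, 0) deg/s, seen through two edges tilted
# +/-45 deg: each edge reports only the normal component of that motion.
true_v = np.array([3.0, 0.0])
n1 = np.array([np.cos(np.pi / 4), np.sin(np.pi / 4)])
n2 = np.array([np.cos(-np.pi / 4), np.sin(-np.pi / 4)])
v_ioc = intersection_of_constraints([n1, n2], [n1 @ true_v, n2 @ true_v])
vector_average = 0.5 * ((n1 @ true_v) * n1 + (n2 @ true_v) * n2)
# v_ioc recovers (3, 0); the vector average points the right way in this
# symmetric case but has only half the speed, illustrating why averaging
# is not a generic solution.
```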

Several feedforward computational models have been proposed to implement these different rules [35, 36]. All these feedforward models share the same architecture: motion integration is seen as a two-stage computation. The first stage, corresponding to cortical area V1 in primates, extracts local motion information through a set of oriented spatiotemporal filters. This corresponds to the fact that most V1 neurons respond to the direction orthogonal to the orientation of an edge drifting across their receptive field [37]. The local motion analyzers feed a second, integrative stage where pattern motion direction is computed. This integrative stage is thought to correspond to the extra-striate cortical MT area in primates. MT neurons have large receptive fields, are strongly direction-selective, and a large fraction of them can unambiguously signal the pattern motion direction, regardless of the orientation of its 1D components [37, 38]. Different nonlinear combinations of local 1D motion signals can be used to extract either local 2D motion cues or global 2D motion velocity vectors. Another solution, proposed by Perrinet and Masson [39], is to consider that local motion analyzers are modulated by motion coherency [40]; this theoretical model shows the emergence of similar 2D motion detectors. These two-stage frameworks can be integrated into more complex models where local motion information is diffused across retinotopic maps.

12.2.3 Human Smooth Pursuit as Dynamic Readout of the Neural Solution to the Aperture Problem

Behavioral measures cannot capture the detailed temporal dynamics of the neuronal activity underlying motion estimation. However, smooth pursuit recordings do still carry the signature of the dynamic transition between the initial motion estimate, dominated by the vector average of local 1D cues, and the later estimate of global object motion. In other terms, human tracking data provide a continuous (delayed and low-pass filtered) dynamic readout of the neuronal solution to the aperture problem. Experiments by our group and others [41–43] have consistently demonstrated, in both humans and monkeys, that tracking is transiently biased at initiation toward the direction orthogonal to the moving edge (or toward the vector average if multiple moving edges are present) whenever that direction does not coincide with the global motion direction. After some time (typically 200–300 ms), this bias is extinguished and the tracking direction converges to the object's global motion. In the example illustrated in Figure 12.1, a tilted bar translates horizontally, thereby carrying locally ambiguous edge-related information (middle panel of part (a)). A transient nonzero vertical smooth pursuit velocity (lower right panel of Figure 12.1(b)) reflects the initial aperture-induced bias, in contrast with the case where local and global motion are coherent (as for the pursuit of a horizontally moving vertical bar, see Figure 12.1(b), leftmost panels).

The size of the transient directional tracking bias and the time needed to converge to the global motion solution depend on several properties of the visual moving stimulus [42], including stimulus luminance contrast [44, 45]. In Section 12.4.1, we will see that these tracking dynamics are consistent with a simple Bayesian recurrent model (or, equivalently, a Kalman filter [44]), which takes into account the uncertainty associated with the visual moving stimulus and combines it with prior knowledge about visual motion (see also Chapter 9).

12.3 Predicting Future and On-Going Target Motion

Smooth pursuit eye movements can also rely on a prediction of target movement to accurately follow the target despite possibly major disruptions of the sensory evidence. Prediction also helps compensate for processing delays, an unavoidable problem of sensory-to-motor transformations.

12.3.1 Anticipatory Smooth Tracking

The exploitation of statistical regularities in the sensory world and/or of cognitive information ahead of a sensory event is a common trait of adaptive and efficient cognitive systems that can, on the basis of such predictive information, anticipate choices and actions. It is a well-established fact that humans cannot generate smooth eye movements at will: for instance, it is impossible to smoothly track an imaginary target with the eyes, except in the special condition in which an unseen target is self-moved through the smooth displacement of one's own finger [46]. In addition, smooth pursuit eye movements necessarily lag unpredictable visual target motion by a (short) time delay. In spite of this, it was already known many years ago that, when tracking regular periodic motion, the pursuit sensorimotor delay can be nulled and a perfect synchronicity between target and eye motion is possible (see Refs [3, 47] for detailed reviews). Furthermore, when the direction of motion of a moving target is known in advance (for instance, because motion properties are the same across many repeated experimental trials), anticipatory smooth eye movements are observed in advance of the target motion onset [48], as illustrated in the upper panel of Figure 12.1(d). Interestingly, relatively complicated motion patterns such as piecewise linear trajectories [49] or accelerated motion [50] can also be anticipated. Finally, probabilistic knowledge about target motion direction or speed [51], and even subjectively experienced regularities extracted from the previous few trials [52], can modulate anticipatory smooth pursuit in a systematic way. Recently, several researchers have tested the role of higher-level cognitive cues for anticipatory smooth pursuit, leading to a rather diverse set of results.
Although verbal or pictorial abstract cues indicating the direction of the upcoming motion seem to have a rather weak (though not entirely absent) influence on anticipatory smooth pursuit, other cues are more easily and immediately interpreted and used for motion anticipation [3]. For instance, a barrier blocking one of the two branches of a line drawing of an inverted-y-shaped tube along which the visual target was about to move [53] leads to robust anticipatory smooth tracking in the direction of the other, unblocked branch.

12.3.2 If You Don't See It, You Can Still Predict (and Track) It

While walking on a busy street downtown, we may track a moving car with our gaze and, even when it is hidden behind a truck driving in front of it, we can still closely follow its motion and have our gaze close to the car's position at its reappearance. In the lab, researchers have shown [54] that during the transient disappearance of a moving target, human subjects are capable of continuing to track the hidden motion with their gaze, although with a lower gain (see Figure 12.2(a) and (b)). Indeed, during blanking, after an initial drop, eye velocity can be steadily maintained, typically at about 70% of the pre-blanking target velocity, although higher eye speeds can be achieved with training [55]. In addition, when the blank duration is fixed, an anticipatory reacceleration of the eyes is observed ahead of target reappearance [56]. Extra-retinal, predictive information is clearly called into play to drive ocular tracking in the absence of a visual target. The true nature of the drive for such predictive eye velocity is still debated (see Ref. [2] for a review). Previous studies have proposed that it could either be a copy of the oculomotor command (an efference copy) serving as positive feedback [55, 57] or a sample of visual motion held in working memory [56]. In either case, a rather implausible “switch-like” mechanism was assumed in order to account for the change of regime between visually driven and prediction-driven tracking.


Figure 12.2 Examples of human smooth pursuit traces (one different participant on each column, a naive one on the left and a non-naive one on the right side) during horizontal motion of a tilted bar which is transiently blanked during steady-state pursuit. (a) and (b): Average horizontal (c12-math-0012) and vertical (c12-math-0013) eye velocity. Different blanking conditions are depicted by different colors, as from the figure legend. The vertical dashed line indicates the blank onset; vertical full colored lines indicate the end of the blanking epoch for each blanking duration represented. (c) and (e) Zoom on the aperture-induced bias of vertical eye velocity at target motion onset, for all blanking conditions. (d) and (f) Zoom on the aperture-induced bias of vertical eye velocity at target reappearance after blank (time is shifted so that 0 corresponds to blank offset), for all blanking conditions.

While the phenomenology of human smooth pursuit during the transient absence of a visual target is well investigated (see, e.g., Refs [56, 58, 59]), less is known about its functional characterization and about how the extra-retinal signals implicated in motion tracking without visual input interact with retinal signals across time. In particular, as motion estimation for tracking is affected by sensory noise and computational limitations (see Sections 12.2.1 and 12.2.2), do we rely on extra-retinal predictive information in a way that depends on sensory uncertainty? In the past two decades, the literature on optimal cue combination in multisensory integration has provided evidence for a weighted sum of different sources of information (such as visual and auditory [60], or visual and haptic cues [61]), whereby each source is weighted according to its reliability (defined as the inverse variance). Recently, Ref. [62] showed that a predictive term enforcing the smoothness of the trajectory is sufficient to account for motion extrapolation; however, that model lacked a mechanism to weight retinal and extra-retinal information. In another recent study, we tested the hypothesis that visual and predictive information for motion tracking are weighted according to their reliability and dynamically integrated to produce the observed oculomotor command [63].
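The reliability-weighting rule invoked above has a compact closed form. The sketch below shows the standard inverse-variance fusion of two Gaussian estimates; the numbers are purely illustrative.

```python
def combine(mu_a, var_a, mu_b, var_b):
    """Reliability-weighted (inverse-variance) fusion of two cues.

    This is the standard optimal-cue-combination rule: each estimate is
    weighted by its reliability 1/variance, and the fused variance is
    lower than that of either input alone.
    """
    w_a, w_b = 1.0 / var_a, 1.0 / var_b
    mu = (w_a * mu_a + w_b * mu_b) / (w_a + w_b)
    var = 1.0 / (w_a + w_b)
    return mu, var

# A reliable retinal cue (10 deg/s, var 1) fused with a less reliable
# extra-retinal prediction (6 deg/s, var 4), illustrative values:
mu, var = combine(10.0, 1.0, 6.0, 4.0)
# The fused estimate (9.2, var 0.8) sits closer to the more reliable cue.
```

The same formula, applied with variances that evolve over time, is all that is needed to let control shift gradually between retinal and extra-retinal signals.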

In order to do so, we analyzed human smooth pursuit during the tracking of a horizontally moving tilted bar that could be transiently hidden at different moments (blanking paradigm), either early, during pursuit initiation, or late, during steady-state tracking. By comparing the early and late blanking conditions, we found two interesting results: first, the perturbation of pursuit velocity caused by the disappearance of the target was more dramatic for the late than for the early blanking, both in terms of the relative velocity drop and of the presence of an anticipatory acceleration before target reappearance. Second, a small but significant aperture-induced tracking bias (as described in Section 12.2.3) was observed at target reappearance after late but not early blanking. Interestingly, these two measures (the size of the tracking velocity reduction after blank onset and the size of the aperture bias at target reappearance) turned out to be significantly correlated across subjects for the late blanking conditions.

We interpreted these results together as evidence in favor of a dynamic optimal integration of visual and predictive information: at pursuit initiation, sensory variability is strong and predictive cues related to target or gaze motion dominate, leading to a relative reduction of both the effects of target blanking and of the aperture bias. On the contrary, later on, sensory information becomes more reliable and the sudden disappearance of a visible moving target leads to a more dramatic disruption of motion tracking; consistently with this, the (re)estimation of motion is more strongly affected by the inherent ambiguity of the stimulus (a tilted bar). Finally, the observed correlation of these two quantities across different human observers indicates that the same computational mechanism (i.e., optimal dynamic integration of visual and predictive information) is scaled at the individual level, in such a way that some people rely more strongly than others on predictive cues rather than on intrinsically noisy sensory evidence. Incidentally, in our sample of human volunteers, the expert subjects seemed to rely more on predictive information than the naive ones.

In Section 12.4.2, we illustrate a model that is based on hierarchical Bayesian inference and is capable of qualitatively capturing the human behavior in our blanking paradigm. A second important question is whether and how predictive information is itself affected by uncertainty. We start to address this question in the next section.

12.4 Dynamic Integration of Retinal and Extra-Retinal Motion Information: Computational Models

12.4.1 A Bayesian Approach for Open-Loop Motion Tracking

Visual image noise and motion ambiguity, the two sources of uncertainty for motion estimation described in Sections 12.2.1 and 12.2.2, can be well integrated within a Bayesian [27, 44, 64, 65] or, equivalently, a Kalman-filtering framework [66, 67], whereby the estimated motion is the solution of a dynamical statistical inference problem [68]. In these models, the information from different visual cues (such as local 1D and 2D motions) is represented as probability distributions by their likelihood functions. Bayesian models also allow the inclusion of prior constraints related to experience, expectancy biases, and all possible sources of extrasensory information. On the grounds of the statistical predominance of static or slowly moving objects in nature, the most common assumption used in models of motion perception is a preference for slow speeds, typically referred to as a low-speed prior (represented in Figure 12.3(a)). The effects of priors are especially salient when signal uncertainty is high (see Chapter 9).


Figure 12.3 A Bayesian recurrent module for the aperture problem and its dynamic solution. (a) The prior and the two independent 1D (b) and 2D (c) likelihood functions (for a tilted line moving rightward at 5°/s) in velocity space are multiplied to obtain the posterior velocity distribution (d). The inferred image motion is estimated as the velocity corresponding to the posterior maximum (MAP). Probability density functions are color-coded, such that dark red corresponds to the highest probability and dark blue to the lowest one.

The sensory likelihood functions can be derived for simple objects with the help of a few reasonable assumptions. For instance, the motion cue associated with a nonambiguous 2D feature can be approximated by a Gaussian likelihood centered on the true stimulus velocity and with a variance proportional to the visual noise (e.g., inversely related to its visibility, see Figure 12.3(c)). On the other hand, edge-related ambiguous information is represented by an elongated velocity distribution parallel to the orientation of the moving edge, with an infinite variance along the edge direction reflecting the aperture ambiguity (Figure 12.3(b)). Weiss and colleagues [27] have shown that, by combining a low-speed prior with an elongated velocity likelihood distribution parallel to the orientation of the moving line, it is possible to predict the aperture-induced bias (as illustrated in Figure 12.3(d)). By introducing the independent contribution of the nonambiguous 2D likelihood, as well as a recurrent optimal update of the prior (with a feedback from the posterior [44], see Figure 12.3), and cascading this recurrent network with a realistic model of smooth pursuit generation [45], we have managed to reproduce the complete dynamics of the solution of the aperture problem for motion tracking, as observed in human smooth pursuit traces.
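The recurrent computation just described can be sketched numerically on a velocity grid. The following is a simplified reconstruction of the scheme, not the published model: the 45° tilt, the 5 deg/s speed, and all standard deviations are illustrative assumptions, and the 1D likelihood's "infinite" variance along the edge is realized by simply not constraining that direction.

```python
import numpy as np

# Velocity grid (deg/s). Stimulus: a 45-deg tilted line translating
# rightward at 5 deg/s. The 1D (edge) cue constrains only the velocity
# component along the edge normal; a broad 2D (line-ending) cue and a
# low-speed prior complete the picture.
v = np.linspace(-10, 10, 201)
VX, VY = np.meshgrid(v, v)

n = np.array([1.0, -1.0]) / np.sqrt(2)   # unit normal to the tilted edge
s_n = 5.0 * n[0]                          # normal speed of true v = (5, 0)
like_1d = np.exp(-0.5 * ((VX * n[0] + VY * n[1] - s_n) / 1.0) ** 2)
like_2d = np.exp(-0.5 * (((VX - 5.0) / 3.0) ** 2 + (VY / 3.0) ** 2))
prior = np.exp(-0.5 * ((VX / 2.0) ** 2 + (VY / 2.0) ** 2))  # low-speed prior

track = []
for step in range(10):
    post = prior * like_1d * like_2d
    post /= post.sum()
    iy, ix = np.unravel_index(post.argmax(), post.shape)
    track.append((v[ix], v[iy]))          # MAP estimate (vx, vy)
    prior = post                          # recurrent prior update

# Early MAP estimates carry a clear vertical bias (the aperture effect:
# the low-speed prior drags the estimate along the 1D constraint line);
# over iterations vy shrinks toward 0 and vx approaches 5.
```

Cascading `track` through an oculomotor plant (as in Ref. [45]) would turn this decaying bias into the transient vertical eye velocity seen in Figure 12.1(b).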

12.4.2 Bayesian (or Kalman-Filtering) Approach for Smooth Pursuit: Hierarchical Models

Beyond the inferential processing of visual uncertainties, which mostly affect smooth pursuit initiation, we have seen in Section 12.3.2 that predictive cues can efficiently drive motion tracking when the visual information is deteriorated or missing. This flexible control of motion tracking has traditionally been modeled [55–57, 59] in terms of two independent modules, one processing visual motion and the other maintaining an internal memory of target motion. The weak point of these classical models is that they did not provide a biologically plausible mechanism for the interaction or alternation between the two types of control: a somewhat ad hoc switch was usually assumed for this purpose.

Inference is very likely to occur at different spatial, temporal, and neuro-functional scales. Sources of uncertainty can indeed affect different steps of the sensorimotor process in the brain. Recent work in our group [63] and other groups [67] has attempted to model several aspects of human motion tracking within a single framework, that of Bayesian inference, by postulating the existence of multiple inferential modules organized in a functional hierarchy and interacting according to the rules of optimal cue combination [68, 69]. Here we outline an example of this modeling approach applied to the processing of visual ambiguous motion information under different conditions of target visibility (in the blanking experiment).

In order to explain the data summarized in Section 12.3.2 for the transient blanking of a translating tilted bar, we designed a two-module hierarchical Bayesian recurrent model, illustrated in Figure 12.4. The first module, the retinal recurrent network (Figure 12.4(a)), implements the dynamic inferential process that is responsible for solving the ambiguity at pursuit initiation (see Section 12.2.3); it differs from the model scheme in Figure 12.3 only by the introduction of processing delays estimated from the monkey electrophysiology literature [70]. The second module (Figure 12.4(b)), the extra-retinal recurrent network, implements a dynamic short-term memory buffer for motion tracking. Crucially, the respective outputs of the retinal and extra-retinal recurrent modules are optimally combined in the Bayesian sense, each mean being weighted by its reliability (inverse variance) before the combination. By cascading the two Bayesian modules with a standard model [71] for the transformation of the target velocity estimate into eye velocity (Figure 12.4(c)) and adding some feedback connections (also standard in models of smooth pursuit, to mimic the closed-loop phase), the model is capable, with few free parameters, of simulating motion tracking curves that qualitatively resemble the ones observed for human subjects both during visual pursuit and during blanking. Note that one single crucial free parameter, representing a scaling factor for the rapidity with which the sensory variance increases during the target blank, is responsible for the main effects described in Section 12.3.2 on the tracking behavior during the blank and immediately after the end of it.
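The role of that variance-scaling parameter can be illustrated with a one-line consequence of inverse-variance weighting: if the retinal variance inflates during the blank, the weight given to the extra-retinal module rises smoothly, with no explicit switch. This is a conceptual sketch with assumed values (the rate `alpha` and the two variances are illustrative, analogous in spirit to the model's free parameter).

```python
import numpy as np

# Assumed linear inflation of the retinal likelihood variance during a
# blank, at rate `alpha`; the extra-retinal variance is held constant.
alpha = 0.05                        # variance growth rate (per ms)
var_retinal0, var_pred = 1.0, 4.0   # variances at blank onset
t = np.arange(0, 400)               # ms since blank onset

var_retinal = var_retinal0 * (1.0 + alpha * t)
w_pred = (1.0 / var_pred) / (1.0 / var_pred + 1.0 / var_retinal)
# The weight on the extra-retinal prediction rises from 0.2 toward 1 as
# the retinal signal loses reliability: control hands over gradually from
# visual to predictive signals.
```

A larger `alpha` makes the handover faster, which is exactly how a single scaling factor can govern both the velocity drop during the blank and the strength of the aperture bias at reappearance.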


Figure 12.4 Two-stage hierarchical Bayesian model for human smooth pursuit in the blanking experiment. The retinal recurrent loop (a) is the same as in Figure 12.3, with the additional inclusion of physiological delays. The posterior from the retinal recurrent loop and the prior from the extra-retinal Bayesian network (b) are combined to form the postsensory output (c12-math-0015). The maximum a posteriori of the probability (c12-math-0016) of target velocity in space serves as an input to both the positive feedback system and the oculomotor plant (c). The output of the oculomotor plant is subtracted from the target velocity to form the image's retinal velocity (physical feedback loop shown as a broken line). During the transient blank, when there is no target on the retina, the physical feedback loop is not functional, so that the retinal recurrent block does not decode any motion. The output of the positive feedback system (shown by the broken line) is added to the postsensory output (c12-math-0017) only when the physical feedback loop is functional. The probability distribution of target velocity in space (c12-math-0018) is provided as an input to the extra-retinal recurrent Bayesian network, where it is combined with a prior to obtain a posterior, which is in turn used to update the prior.

Orban de Xivry et al. [67] have proposed an integrated Kalman filter model based on two filters, the first extracting a motion estimate from noisy visual motion input, similar to a slightly simplified version of the Bayesian retinal recurrent module described above. The second filter (referred to as the predictive pathway) provides a prediction of the upcoming target velocity on the basis of long-term experience (i.e., from previous trials). Importantly, the implementation of a long-term memory for a dynamic representation of target motion (always associated with its uncertainty) allows the model to reproduce the observed phenomenon of anticipatory tracking when target motion properties are repeated across trials (see Section 12.3.1). However, in Section 12.6 we mention some results that challenge the current integrated models of hierarchical inference for motion tracking.
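The gist of such a filtering scheme can be conveyed with a toy scalar Kalman filter in which the measurement update is simply skipped while the target is blanked. This is a minimal sketch under simplified assumptions (random-walk velocity model, hand-picked noise variances), not the implementation of Ref. [67]:

```python
import numpy as np

rng = np.random.default_rng(0)

true_v = 10.0            # target velocity (deg/s)
q, r = 0.05, 1.0         # process and measurement noise variances
v_hat, p = 0.0, 10.0     # initial estimate and its variance

history = []
for t in range(60):                  # time steps of roughly 10 ms
    p = p + q                        # predict: random-walk velocity model
    if not (20 <= t < 40):           # retinal input absent during the blank
        z = true_v + rng.normal(0.0, np.sqrt(r))  # noisy velocity measurement
        k = p / (p + r)              # Kalman gain
        v_hat += k * (z - v_hat)     # measurement update
        p *= (1.0 - k)
    history.append((v_hat, p))

# The estimate's variance shrinks while the target is visible, then grows
# steadily during the blank, mirroring the loss of reliability of the
# retinal signal.
```

In a two-filter model, a predictive pathway with its own (slowly learned) estimate would take over precisely when this visual variance grows.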

12.4.3 A Bayesian Approach for Smooth Pursuit: Dealing with Delays

Recently, we considered optimal motor control and the particular problems caused by the inevitable delay between the emission of motor commands and their sensory consequences [72]. This is a generic problem that we illustrate within the context of oculomotor control, where it is particularly insightful (see, for instance, Ref. [73] for a review). Although focusing on oculomotor control, the more general contribution of this work is to treat motor control as a pure inference problem. This allows us to use standard (Bayesian filtering) schemes to resolve the problem of sensorimotor delays by absorbing them into a generative (or forward) model, that is, a set of parameterized equations describing our knowledge about the dynamics of the environment. Furthermore, this principled and generic solution has some degree of biological plausibility because the resulting active (Bayesian) filtering is formally identical to predictive coding, which has become an established metaphor for neuronal message passing in the brain (see Ref. [74], for instance). We use oculomotor control as a vehicle to illustrate the basic idea with a series of generative models of eye movements that address increasingly complicated aspects of oculomotor control. In short, we offer a general solution to the problem of sensorimotor delays in motor control, using established models of message passing in the brain, and demonstrate its implications in the particular setting of oculomotor control.

Specifically, we considered delays in the visuo-oculomotor loop and their implications for active inference. Active inference uses a generalization of Kalman filtering to provide Bayes-optimal estimates of hidden states and action (such that our model is a particular hidden Markov model) in generalized coordinates of motion. Representing hidden states in generalized coordinates provides a simple way of compensating for both sensory and oculomotor delays. The efficacy of this scheme is illustrated using numerical simulations of pursuit initiation responses, with and without compensation. We then considered an extension of the generative model to simulate smooth pursuit eye movements, in which the visuo-oculomotor system believes that both the target and its center of gaze are attracted to a (hidden) point moving in the visual field, similar to what was proposed above in Section 12.4.2. Finally, the generative model is equipped with a hierarchical structure, so that it can recognize and remember unseen (occluded) trajectories and emit anticipatory responses (see Section 12.4.2).
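The compensation idea can be illustrated with a small numerical sketch: a state represented in generalized coordinates (position, velocity, acceleration, ...) can be advanced by the delay with a truncated Taylor series, which amounts to applying a simple matrix (the delay operator) to the generalized state. The values below are illustrative, not taken from the simulations of Ref. [72]:

```python
import numpy as np
from math import factorial

def extrapolate(gen_coords, tau):
    """Advance a state in generalized coordinates by tau seconds using a
    truncated Taylor series (a matrix form of the 'delay operator')."""
    n = len(gen_coords)
    T = np.zeros((n, n))
    for i in range(n):
        for j in range(i, n):
            # Entry (i, j) maps the j-th derivative onto the i-th one.
            T[i, j] = tau ** (j - i) / factorial(j - i)
    return T @ gen_coords

# Target position, velocity, and acceleration as sensed 50 ms ago.
delayed = np.array([1.0, 10.0, 0.0])   # deg, deg/s, deg/s^2
now = extrapolate(delayed, 0.05)
# Position is advanced by velocity * delay: 1.0 + 10.0 * 0.05 = 1.5 deg.
```

Compensating for an outgoing (motor) delay works the same way, by applying the operator with the appropriate sign to the predicted consequences of action.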

We show in Figure 12.5 the results of this model for a two-layered hierarchical generative model. The hidden causes are informed by the dynamics of hidden states at the second level: these hidden states model underlying periodic dynamics using a simple periodic attractor that produces sinusoidal fluctuations of arbitrary amplitude or phase, with a frequency determined by a second-level hidden cause with a prior expectation of c12-math-0019 (in Hz). This is somewhat similar to a control system model that attempts to achieve zero-latency target tracking by fitting the trajectory to a (known) periodic signal [75]. Our formulation ensures a Bayes-optimal estimate of periodic motion in terms of a posterior belief about its frequency. In these simulations, we used a fixed Gaussian prior centered on the correct frequency, with a period of c12-math-0020. This prior reproduces a typical experimental setting in which the oscillatory nature of the trajectory is known, but its amplitude and phase (onset) are unknown. Indeed, it has been shown that anticipatory responses are confounded when the intercycle interval is randomized [54]. In principle, we could have considered many other forms of generative model, such as models with prior beliefs about continuous acceleration [76]. With this addition, the improvement in pursuit accuracy apparent at the onset of the second cycle is consistent with what was observed empirically [77].
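A deliberately simplified stand-in for this setting: if the oscillation frequency is known (as with the fixed prior above) but amplitude and phase are not, they can be estimated online from noisy samples and the fit extrapolated ahead by the sensorimotor delay, yielding near zero-lag tracking after the first cycle. This is a least-squares sketch with illustrative parameters, not the active inference scheme itself:

```python
import numpy as np

omega = 2 * np.pi * 1.0      # known oscillation frequency (1 Hz)
dt, delay = 0.01, 0.05       # simulation step and sensorimotor delay (s)
rng = np.random.default_rng(1)

t = np.arange(0.0, 2.0, dt)
target = 5.0 * np.sin(omega * t + 0.7)   # amplitude and phase unknown a priori

# Online least-squares fit of y = a*sin(wt) + b*cos(wt) from noisy samples,
# then extrapolation `delay` seconds ahead to cancel the sensorimotor lag.
A, c = np.zeros((2, 2)), np.zeros(2)
pred = np.zeros_like(target)
for i, ti in enumerate(t):
    phi = np.array([np.sin(omega * ti), np.cos(omega * ti)])
    y = target[i] + rng.normal(0.0, 0.1)      # noisy observation
    A += np.outer(phi, phi)
    c += phi * y
    if i > 10:                                # wait until the fit is constrained
        a_hat, b_hat = np.linalg.solve(A, c)
        tf = ti + delay                       # predict ahead of the delay
        pred[i] = a_hat * np.sin(omega * tf) + b_hat * np.cos(omega * tf)

# By the second cycle, the prediction tracks the future target with
# essentially zero lag.
```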


Figure 12.5 This figure reports the simulation of smooth pursuit when the target motion is hemi-sinusoidal, as would happen for a pendulum that is stopped at each half cycle left of the vertical (broken black lines in panel (d)). We report the horizontal excursions of oculomotor angle in retinal space (a, b) and the angular position of the target in an intrinsic frame of reference (visual space) (c, d). Panel (d) shows the true value of the displacement in visual space (broken black lines) and the action (blue line) that is responsible for oculomotor displacements. Panel (a) shows, in retinal space, the predicted sensory input (colored lines) and sensory prediction errors (dotted red lines) along with the true values (broken black lines). The latter is effectively the distance of the target from the center of gaze and reports the spatial lag of the target being followed (solid red line). One can clearly see the initial displacement of the target, which is suppressed after a few hundred milliseconds. The sensory predictions are based upon the conditional expectations of hidden oculomotor (blue line) and target (red line) angular displacements shown in panel (b). The gray regions correspond to 90% Bayesian confidence intervals, and the broken lines show the true values of these hidden states. The generative model used here has been equipped with a second hierarchical level that contains hidden states modeling latent periodic behavior of the (hidden) causes of target motion (states not shown here). The hidden cause of these displacements is shown with its conditional expectation in panel (c). The true cause and action are shown in panel (d); the action (blue line) is responsible for oculomotor displacements and is driven by the proprioceptive prediction errors.

This is because the model has an internal representation of latent causes of target motion that can be called upon even when these causes are not expressed explicitly in the target trajectory. These simulations speak to a straightforward and neurobiologically plausible solution to the generic problem of integrating information from different sources with different temporal delays and the particular difficulties encountered when a system – such as the oculomotor system – tries to control its environment with delayed signals. Neurobiologically, the application of delay operators just means changing synaptic connection strengths to take different mixtures of generalized sensations and their prediction errors.

12.5 Reacting, Inferring, Predicting: A Neural Workspace

We have proposed herein a hierarchical inference network that can both estimate the direction and speed of a moving object despite the inherent ambiguities present in the images and predict the target trajectory from accumulated retinal and extra-retinal evidence. What could be the biologically plausible implementation of such a hierarchy? What are its advantages for a living organism?

The idea that the pursuit system can be separated into two distinct blocks has been proposed by many others (see Refs [17, 47, 78, 79] for recent reviews). Such a structure is rooted in the need to mix retinal and extra-retinal information to ensure the stability of pursuit, as originally proposed by Ref. [80]. Elaborations of this concept have formed the basis of a number of models based on a negative feedback control system [81, 82]. However, the fact that a simple efference copy feedback loop can account neither for anticipatory responses during target blanking, nor for the role of expectations about future target trajectory [47], nor for reinforcement learning during blanking [55] has called into question the validity of this simplistic approach (see Ref. [79] for a recent review). This has led to more complex models in which an internal model of target motion is reconstructed from an early sampling and storage of target velocity and an efference copy of the eye velocity signal. Several memory components have been proposed to serve different aspects of cognitive control of smooth pursuit and anticipatory responses [79, 83]. The hierarchical Bayesian model presented above (see Section 12.4.2) follows the same structure, with two main differences. First, the sensory processing itself is seen as a dynamical network, whereas most of the models cited in this section oversimplify the target velocity representation. Second, we collapse all the different internal blocks and loops into a single inference loop representing the perceived target motion.

We have proposed earlier that these two inference loops might be implemented by two large-scale brain networks [78]. The dynamical visual inference loop is based on the properties of primate visual areas V1 and MT, where local and global motion signals, respectively, have been clearly identified (see Ref. [29] for a series of recent review articles). MT neurons solve the aperture problem with a slow temporal dynamics. When presented with a set of elongated, tilted bars, their initial preferred direction matches the motion direction orthogonal to the bar orientation. From there, the preferred direction gradually rotates toward the true, 2D translation of the bar, so that after about c12-math-0021 ms the MT population signals the correct pattern motion direction, independently of the bar orientation [70]. Several models have proposed that such dynamics is due to recurrent interactions between the V1 and MT cortical areas (e.g., [84–86]), and we have shown that the dynamical Bayesian model presented above gives a good description of this neuronal dynamics and its perceptual and behavioral counterparts [44] (see also Chapter 10). Such a recurrent network can be upscaled to include other cortical visual areas involved in shape processing (e.g., areas V2, V3, V4) to further improve form–motion integration and to select one target among several distractors or from the visual background [84]. The V1–MT loop exhibits, however, two fundamental properties with respect to the tracking of the object's motion. First, neuronal responses stop immediately when the retinal input disappears, as during object blanking. Second, the loop remains largely immune to higher, cognitive inputs. For instance, the slow-speed prior used in our Bayesian model can hardly be changed by training in human observers [48].
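The initial aperture-induced bias can be reproduced with a toy Bayesian estimate in the spirit of Ref. [27]: a single edge cue constrains only the velocity component normal to the bar, and a zero-mean slow-speed prior then selects the slowest compatible velocity, which lies along the bar normal rather than along the true translation. All parameters below are illustrative:

```python
import numpy as np

# Velocity grid (deg/s).
vx, vy = np.meshgrid(np.linspace(-20, 20, 401), np.linspace(-20, 20, 401))

theta = np.deg2rad(45.0)                       # bar tilted 45 deg; edge cue only
n = np.array([np.cos(theta), np.sin(theta)])   # unit normal to the bar
v_true = np.array([10.0, 0.0])                 # true (horizontal) translation
s = v_true @ n                                 # measured speed along the normal

sigma_l, sigma_p = 1.0, 8.0                    # likelihood and slow-speed prior widths
log_post = (-(vx * n[0] + vy * n[1] - s) ** 2 / (2 * sigma_l ** 2)
            - (vx ** 2 + vy ** 2) / (2 * sigma_p ** 2))

iy, ix = np.unravel_index(np.argmax(log_post), log_post.shape)
v_map = np.array([vx[iy, ix], vy[iy, ix]])
# The MAP estimate points along the bar normal (45 deg), not horizontally,
# and is slower than the true translation.
direction = np.rad2deg(np.arctan2(v_map[1], v_map[0]))
```

In the recurrent models discussed above, the unambiguous 2D cues carried by the bar's line endings progressively sharpen the likelihood and rotate this estimate toward the true direction.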

In nonhuman primates, the medial superior temporal (MST) cortical area is essential for pursuit. It receives MT inputs about the direction and speed of the visual pattern and represents the target's velocity vector. In the context of smooth pursuit control, neuronal activities in the MST area show several interesting features. First, if the target is blanked during pursuit, the neuronal activity is maintained throughout the course of the blanking. This is clearly different from the MT area, where neurons stop firing upon even a brief disappearance of the target [9]. Second, both monkeys and humans can track imaginary large line-drawing targets whose central foveal part is absent [87]. MST neurons, but not MT cells, can signal the motion direction of these parafoveal targets, despite the fact that the visual edges fall outside their receptive fields. Thus, in the MST area, neurons are found whose activities do not differ during pursuit of real (i.e., complete) or imaginary (i.e., parafoveal) targets [29, 88]. Lastly, many MST neurons can encode target motion veridically during eye movements, in contrast to MT cells [89]. The above evidence strongly suggests that MST neurons integrate both retinal and extra-retinal information to reconstruct the perceived motion of the target. However, although the MT and MST areas are strongly and recurrently connected, the marked difference between MT and MST neuronal responses during pursuit seems to indicate that extra-retinal signals are not back-propagated to early visual processing areas. Interestingly, several recent models have articulated their two-stage computational approach with this architecture [90, 91] in order to model the dynamics of primate smooth pursuit.

From MST, output signals are sent in two directions. One signal reaches the brainstem oculomotor structures through the pontine nuclei, the cerebellar floccular region, and the vestibular nuclei [92]. This cortico-subcortical pathway conveys the visual drive needed for pursuit initiation and maintenance. The second signal reaches the frontal cortex, including the caudal parts of the FEF and the SEF (see Refs [10, 79] for recent reviews). FEF neurons share many properties of MST cells. In particular, they integrate both retinal and extra-retinal information during pursuit, so that responses remain sustained during blanking or when stimulated with imaginary targets [78, 88]. Moreover, both FEF and MST cells show a buildup of activity during anticipatory pursuit [93, 94]. Thus, FEF and MST appear to be strongly coupled to build an internal representation of target motion that can be used during steady-state tracking as well as during the early phases of pursuit. Moreover, the FEF issues pursuit commands that are sent to the brainstem nucleus reticularis tegmenti pontis (NRTP) and the cerebellar vermis lobules before reaching the pursuit oculomotor structures. Several authors have proposed that such parieto-frontal loops might implement the predictive component of the pursuit responses (see Ref. [78] for a review). Others have proposed to restrict its role to the computation of the perceived motion signal that drives the pursuit response [79], while higher signals related to prediction might be computed in more anterior areas such as the SEF and the prefrontal cortex (PFC).

Prediction is influenced by many cognitive inputs (cues, working memory, target selection) [47]. Accordingly, prediction-related neuronal responses during pursuit have been reported in the SEF area [95] and the caudal part of the FEF [94]. Moreover, SEF activity facilitates anticipatory pursuit responses to highly predictable targets [96]. The group of Fukushima has identified several subpopulations of neurons in both areas that can encode a directional visual motion memory, independently of movement preparation signals (see Ref. [79] for a complete review). However, FEF neurons more often mix predictive and motor preparation signals, while SEF cells more specifically encode a visual motion memory signal. This is consistent with the fact that many neurons in the PFC have been linked to the temporal storage of sensory signals [97]. Thus, a working memory of target motion might be formed in the SEF area by integrating multiple inputs from parietal (MST) and prefrontal (FEF, PFC) cortical areas. Fukushima et al. [79] proposed that a recurrent network made of these areas (MST, FEF, SEF, PFC) might signal future target motion using prediction, timing, and expectation, as well as experience gained over trials.

All these studies define a neuronal workspace for our hierarchical inference model, as illustrated in Figure 12.6. Two main loops seem to be at work. A first loop estimates the optimal target motion direction and speed from sensory evidence (image motion computation, in red). It uses sensory priors, such as the “smooth and slow motion” prior used for both perception and pursuit initiation, that are largely immune to higher influences. By doing so, the sensory loop preserves its ability to react quickly to a new sensory event and avoids the inertia of prediction systems. This loop would correspond to the reactive pathway of the pursuit models proposed by Barnes [47] and Fukushima et al. [79]. On the grounds of some behavioral evidence [48], we believe that target motion prediction cannot easily overcome the aperture problem (see also the open questions discussed in Section 12.6), providing a strong indication that this sensory loop is largely impenetrable to cognitive influences. The second loop involves the MST and FEF areas to compute and store online target motion, taking into account both sensory and efference copy signals (object motion computation, in green). A key aspect is that MST must act as a gear, preventing predictive or memory-related signals from backpropagating to the sensory loop. We propose to distinguish between online prediction, involving these two areas during an ongoing event such as target blanking, and off-line prediction. The latter is based on a memory of target motion that spans trials and might be used either to trigger anticipatory responses or to drive responses based on cues. It most likely involves a dense, recurrent prefrontal network articulated around the SEF, which offers a critical interface with the cognitive processes interfering with pursuit control (object memory loop, in blue).


Figure 12.6 A lateral view of the macaque cortex. The neural network corresponding to our hierarchical Bayesian model of smooth pursuit is made of three main corticocortical loops. The first loop, between the primary visual cortex (V1) and the mediotemporal (MT) area, computes image motion and infers the optimal low-level solution for object motion direction and speed. Its main output is the medio-superior temporal (MST) area, which acts as a gear between the sensory loop and the object motion computation loop. Retinal and extra-retinal signals are integrated in both the MST and FEF areas. Such dynamical integration computes the perceived trajectory of the moving object and implements an online prediction that can be used during the course of a tracking eye movement to compensate for target perturbations such as transient blanking. FEF and MST signals are sent to the supplementary eye field (SEF) and the interconnected prefrontal cortical areas. This third loop can elaborate a motion memory of the target trajectory and is interfaced with higher cognitive processes such as cue instruction or reinforcement learning. It also implements off-line predictions that can be used across trials, in particular to drive anticipatory responses to highly predictable targets.

This architecture presents many advantages. First, it preserves the brain's ability to react quickly to a new event, such as an abrupt change in target motion direction. Second, it ensures that pursuit is maintained with good stability in a wide variety of conditions and, by constructing an internal model of target motion, it allows a tight coordination between pursuit and saccades [98]. Third, it provides an interface with higher cognitive aspects of sensorimotor transformation. Several questions remain, however, unanswered. Because of the strong changes seen in prediction under different behavioral contexts, several models, including the one presented here, postulate the existence of hard switches that can turn on or off the contribution of a particular model component. We need a better theoretical account of decision making between these different loops. The Bayesian approach proposed here, similar to the Kalman filter models, opens the door to a better understanding of these transitions. It proposes that each signal (sensory evidence, motion memory, prediction) can be weighted by its reliability. Such a unifying theoretical approach can then be used to design new behavioral and physiological experiments.

12.6 Conclusion

Recent experimental evidence points to the need to revise our view of the primate smooth pursuit system. Rather than a reflexive velocity-matching, negative-feedback loop, the human motion tracking system seems to be grounded in a complex set of dynamic functions that subserve quick and accurate adaptive behavior even in visually challenging situations. Many of these notions could not have clearly emerged from the analysis of tracking eye movements produced with a single, simple, and highly visible moving target; it is now clear that testing more naturalistic visual motion contexts, and carefully taking into account the sources of uncertainty at different scales, is a crucial step toward understanding biological motion tracking. The approach highlighted here opens the door to several questions.

First, we have focused herein on luminance-based motion processing mechanisms. Such inputs can be well extracted by a bank of filters sensitive to motion energy at multiple scales. The human visual motion system is, however, more versatile, and psychophysical studies have demonstrated that motion perception is based on cues that can be defined in many different ways. These empirical studies have led to the three-system theory of human visual motion perception of Lu and Sperling [99]. Besides the first-order system that responds to moving luminance patterns, a second-order system responds to moving modulations of feature types (i.e., stimuli where the luminance is the same everywhere but an area of higher contrast or of flicker moves). A third-order system slowly computes the motion of marked locations in a “salience map” in which locations of important visual features (i.e., a figure) are highlighted relative to their “background” (see Chapter 11). The contribution of first-order motion to the initiation of tracking responses has been extensively investigated (see Refs [17, 19] for reviews). More recent studies have pointed out that feature tracking mechanisms (i.e., a second-order system) are critical for finely adjusting this initial eye acceleration to object speed when reaching steady-state tracking velocity [100]. Whether and how the third-order motion system is involved in the attentional modulation of tracking is currently under investigation by many research groups. Altogether, these psychophysical and theoretical studies point toward the need for more complex front-end layers in visuomotor models, so that artificial systems can become more versatile and adapt to complex, ambiguous environments.
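The first-order front end mentioned above can be caricatured with an opponent motion-energy computation, here drastically simplified to full-field quadrature sinusoids at a single scale instead of localized Gabor filters at multiple scales:

```python
import numpy as np

# Space-time grid: 64 spatial samples over 4 deg, 64 temporal over 0.5 s.
x = np.linspace(-2, 2, 64)
t = np.linspace(0, 0.5, 64)
X, T = np.meshgrid(x, t)

def drifting_grating(speed, sf=1.0):
    """Luminance pattern drifting at `speed` deg/s (spatial frequency sf c/deg)."""
    return np.cos(2 * np.pi * sf * (X - speed * T))

def motion_energy(stim, sf=1.0, tf=2.0):
    """Opponent motion energy from a quadrature pair of spatiotemporal
    filters: rightward-tuned energy minus leftward-tuned energy."""
    def energy(direction):
        phase = 2 * np.pi * (sf * X - direction * tf * T)
        even = np.sum(stim * np.cos(phase))   # even-symmetric filter response
        odd = np.sum(stim * np.sin(phase))    # odd-symmetric (quadrature) response
        return even ** 2 + odd ** 2           # phase-invariant energy
    return energy(+1) - energy(-1)

# Rightward drift yields positive opponent energy; leftward drift, negative.
right = motion_energy(drifting_grating(+2.0))
left = motion_energy(drifting_grating(-2.0))
```

A realistic bank would replicate this computation over local windows, orientations, and spatiotemporal scales; second- and third-order systems would require additional nonlinear preprocessing stages.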

Second, although the need for hierarchical, multiscale inferential models is now apparent, current models will have to meet the challenge of explaining a rich and complex set of behavioral data. To detail just one example: somewhat unexpectedly, we have found that predicting 2D motion trajectories does not help to solve the aperture problem [48]. Indeed, when human subjects are exposed to repeated motion conditions for a horizontally translating tilted bar across several experimental trials, they develop anticipatory smooth pursuit in the expected direction, yet their pursuit traces still reflect the aperture-induced bias at initiation, as illustrated in Figure 12.1(d). It is important to note that the robust optimal cue combination of the outputs of the sensory and predictive motion processing modules postulated in the previous section [63, 67] is not capable, at this stage, of explaining this phenomenon. Other crucial issues deserve to be better understood and modeled in this new framework, for instance, the dynamic interaction between global motion estimation and object segmentation, the precise functional and behavioral relationship between smooth tracking and discrete jump-like saccadic movements, or the role of high-level cognitive cues in modulating motion tracking [3].

12.6.1 Interest for Computer Vision

Machine-based motion tracking, like human motion tracking, needs to be robust to visual perturbations. As we have outlined above, including some form of predictive information might be extremely helpful to stabilize object tracking, in much the same way as what happens during transient target blanking for human smooth pursuit. Importantly, human tracking seems to be equipped with more complete and advanced machinery for motion tracking, namely, one that allows us (i) to exploit the learned regularities of motion trajectories to anticipate and compensate for sensorimotor delays and the consequent mismatch between gaze and target position in the event of a complex trajectory; (ii) to keep tracking an object temporarily occluded by other objects; and (iii) to start tracking in the dark when the future motion of an object is predictable, or even only partly predictable. Some of these predictive cues might depend upon a high-level cognitive representation of kinematic rules or arbitrary stimulus–response associations. This apparatus is probably much too advanced to be implemented in a common automatic motion tracking device, but it may still be a source of inspiration for future developments in machine vision and tracking.


The authors were supported by EC IP project FP7-269921, “BrainScaleS”, and the French ANR-BSHS2-2013-006, ‘Speed’. They are grateful to Amarender Bogadhi, Laurent Madelain, Frédéric Chavane, and all the colleagues in the InVibe team for many enriching discussions. L.U.P. wishes to thank Karl Friston and Rick Adams and the Wellcome Trust Centre for Neuroimaging, University College London, for their essential contribution to the closed-loop delayed model. The chapter was written when G.S.M. was an invited fellow of the CONICYT program, Chile. Correspondence and requests for materials should be addressed to A.M. (


1. 1. Rashbass, C. (1961) The relationship between saccadic and smooth tracking eye movements. J. Physiol., 159, 326–338.

2. 2. Kowler, E. (2011) Eye movements: the past 25years. Vision Res., 51 (13), 1457–1483, doi: 10.1016/j.visres.2010.12.014

3. 3. Kowler, E., Aitkin, C.D., Ross, N.M., Santos, E.M., and Zhao, M. (2014) Davida teller award lecture 2013: the importance of prediction and anticipation in the control of smooth pursuit eye movements. J. Vis., 14 (5), 10, doi: 10.1167/14.5.10

4. 4. Maunsell, J.H. and Newsome, W.T. (1987) Visual processing in monkey extrastriate cortex. Annu. Rev. Neurosci., 10, 363–401, doi: 10.1146/

5. 5. Bradley, D.C. and Goyal, M.S. (2008) Velocity computation in the primate visual system. Nat. Rev. Neurosci., 9 (9), 686–695, doi: 10.1038/nrn2472

6. 6. Born, R.T. and Bradley, D.C. (2005) Structure and function of visual area MT. Annu. Rev. Neurosci., 28, 157–189, doi: 10.1146/annurev.neuro.26.041002.131052

7. 7. Newsome, W.T. and Wurtz, R.H. (1988) Probing visual cortical function with discrete chemical lesions. Trends Neurosci., 11 (9), 394–400.

8. 8. Lisberger, S.G. and Movshon, J.A. (1999) Visual motion analysis for pursuit eye movements in area MT of macaque monkeys. J. Neurosci., 19 (6), 2224–2246.

9. 9. Newsome, W.T., Wurtz, R.H., and Komatsu, H. (1988) Relation of cortical areas MT and MST to pursuit eye movements. II. Differentiation of retinal from extraretinal inputs. J. Neurophysiol., 60 (2), 604–620.

10.10. Krauzlis, R.J. (2004) Recasting the smooth pursuit eye movement system. J. Neurophysiol., 91 (2), 591–603, doi: 10.1152/jn.00801.2003

11.11. Hassenstein, B. and Reichardt, W. (1956) Systemtheoretische analyze der zeit-, reihenfolgen- und vorzeichenauswer- tung bei der bewegungsperzeption des russelkafers chlorophanus. Z. Naturforsch., 11b, 513–524.

12.12. Borst, A. (2014) Fly visual course control: behavior, algorithms and circuits. Nat. Rev. Neurosci., 15, 590–599.

13.13. Borst, A., Haag, J., and Reiff, D. (2010) Fly motion vision. Annu. Rev. Neurosci., 33, 49–70.

14.14. Harrison, R. and Koch, C. (2000) A robust analog VLSI Reichardt motion sensor. Analog Integr. Circ. Signal Process., 24, 213–229.

15.15. Kohler, T., Röchter, F., Lindemann, J., and Möller, R. (2009) Bio-inspired motion detection in a FPGA-based smart camera. Bioinspiration Biomimetics, 4, 015 008.

16.16. Masson, G.S. (2004) From 1D to 2D via 3D: dynamics of surface motion segmentation for ocular tracking in primates. J. Physiol. Paris, 98 (1-3), 35–52, doi: 10.1016/j.jphysparis.2004.03.017

17.17. Lisberger, S.G. (2010) Visual guidance of smooth-pursuit eye movements: sensation, action, and what happens in between. Neuron, 66 (4), 477–491, doi: 10.1016/j.neuron.2010.03.027

18.18. Spering, M. and Montagnini, A. (2011) Do we track what we see? Common versus independent processing for motion perception and smooth pursuit eye movements: a review. Vision Res., 51 (8), 836–852, doi: 10.1016/j.visres.2010.10.017

19.19. Masson, G.S. and Perrinet, L.U. (2012) The behavioral receptive field underlying motion integration for primate tracking eye movements. Neurosci. Biobehav. Rev., 36 (1), 1–25, doi: 10.1016/j.neubiorev.2011.03.009

20.20. Osborne, L.C. (2011) Computation and physiology of sensory-motor processing in eye movements. Curr. Opin. Neurobiol., 21 (4), 623–628, doi: 10.1016/j.conb.2011.05.023

21.21. Lisberger, S.G., Morris, E.J., and Tychsen, L. (1987) Visual motion processing and sensory-motor integration for smooth pursuit eye movements. Annu. Rev. Neurosci., 10, doi: 10.1146/

22.22. Schütz, A.C., Braun, D.I., Movshon, J.A., and Gegenfurtner, K.R. (2010) Does the noise matter? Effects of different kinematogram types on smooth pursuit eye movements and perception. J. Vis., 10(13), 26, doi: 10.1167/10.13.26

23.23. Osborne, L.C., Lisberger, S.G., and Bialek, W. (2005) A sensory source for motor variation. Nature, 437 (7057), 412–416, doi: 10.1038/nature03961

24.24. Osborne, L.C., Hohl, S.S., Bialek, W., and Lisberger, S.G. (2007) Time course of precision in smooth-pursuit eye movements of monkeys. J. Neurosci., 27 (11), 2987–2998, doi: 10.1523/JNEUROSCI.5072-06.2007

25.25. Osborne, L.C., Bialek, W., and Lisberger, S.G. (2004) Time course of information about motion direction in visual area MT of macaque monkeys. J. Neurosci., 24 (13), 3210–3222, doi: 10.1523/JNEUROSCI.5305-03.2004

26.26. Spering, M., Kerzel, D., Braun, D.I., Hawken, M., and Gegenfurtner, K. (2005) Effects of contrast on smooth pursuit eye movements. J. Vis., 20 (5), 455–465,

27. Weiss, Y., Simoncelli, E.P., and Adelson, E.H. (2002) Motion illusions as optimal percepts. Nat. Neurosci., 5 (6), 598–604, doi: 10.1038/nn858

28. Wuerger, S., Shapley, R., and Rubin, N. (1996) “On the visually perceived direction of motion” by Hans Wallach: 60 years later. Perception, 25 (11), 1317–1367, doi: 10.1068/p251317

29. Masson, G.S. and Ilg, U.J. (eds) (2010) Dynamics of Visual Motion Processing: Neuronal, Behavioral and Computational Approaches, Springer-Verlag.

30. Fennema, C. and Thompson, W. (1979) Velocity determination in scenes containing several moving images. Comput. Graph. Image Process., 9, 301–315.

31. Adelson, E.H. and Movshon, J.A. (1982) Phenomenal coherence of moving visual patterns. Nature, 300 (5892), 523–525.

32. Yo, C. and Wilson, H.R. (1992) Perceived direction of moving two-dimensional patterns depends on duration, contrast and eccentricity. Vision Res., 32 (1), 135–147.

33. Lorenceau, J., Shiffrar, M., Wells, N., and Castet, E. (1993) Different motion sensitive units are involved in recovering the direction of moving lines. Vision Res., 33 (9), 1207–1217.

34. Gorea, A. and Lorenceau, J. (1991) Directional performances with moving plaids: component-related and plaid-related processing modes coexist. Spat. Vis., 5 (4), 231–252.

35. Wilson, H.R., Ferrera, V.P., and Yo, C. (1992) A psychophysically motivated model for two-dimensional motion perception. Visual Neurosci., 9 (1), 79–97.

36. Löffler, G. and Orbach, H.S. (1999) Computing feature motion without feature detectors: a model for terminator motion without end-stopped cells. Vision Res., 39 (4), 859–871.

37. Albright, T.D. (1984) Direction and orientation selectivity of neurons in visual area MT of the macaque. J. Neurophysiol., 52 (6), 1106–1130.

38. Movshon, J.A., Adelson, E.H., Gizzi, M.S., and Newsome, W.T. (1985) The analysis of moving visual patterns, in Pattern Recognition Mechanisms, vol. 54 (eds C. Chagas, R. Gattass, and C. Gross), Vatican Press, Rome, pp. 117–151.

39. Perrinet, L.U. and Masson, G.S. (2012) Motion-based prediction is sufficient to solve the aperture problem. Neural Comput., 24 (10), 2726–2750, doi: 10.1162/NECO_a_00332

40. Burgi, P.Y., Yuille, A.L., and Grzywacz, N.M. (2000) Probabilistic motion estimation based on temporal coherence. Neural Comput., 12 (8), 1839–1867.

41. Masson, G.S. and Stone, L.S. (2002) From following edges to pursuing objects. J. Neurophysiol., 88 (5), 2869–2873, doi: 10.1152/jn.00987.2001

42. Wallace, J.M., Stone, L.S., and Masson, G.S. (2005) Object motion computation for the initiation of smooth pursuit eye movements in humans. J. Neurophysiol., 93 (4), 2279–2293, doi: 10.1152/jn.01042.2004

43. Born, R.T., Pack, C.C., Ponce, C.R., and Yi, S. (2006) Temporal evolution of 2-dimensional direction signals used to guide eye movements. J. Neurophysiol., 95 (1), 284–300, doi: 10.1152/jn.01329.2005

44. Montagnini, A., Mamassian, P., Perrinet, L.U., Castet, E., and Masson, G.S. (2007) Bayesian modeling of dynamic motion integration. J. Physiol. Paris, 101 (1-3), 64–77, doi: 10.1016/j.jphysparis.2007.10.013

45. Bogadhi, A.R., Montagnini, A., Mamassian, P., Perrinet, L.U., and Masson, G.S. (2011) Pursuing motion illusions: a realistic oculomotor framework for Bayesian inference. Vision Res., 51 (8), 867–880, doi: 10.1016/j.visres.2010.10.021

46. Gauthier, G.M. and Hofferer, J.M. (1976) Eye tracking of self-moved targets in the absence of vision. Exp. Brain Res., 26 (2), 121–139.

47. Barnes, G.R. (2008) Cognitive processes involved in smooth pursuit eye movements. Brain Cogn., 68 (3), 309–326, doi: 10.1016/j.bandc.2008.08.020

48. Montagnini, A., Spering, M., and Masson, G.S. (2006) Predicting 2D target velocity cannot help 2D motion integration for smooth pursuit initiation. J. Neurophysiol., 96 (6), 3545–3550, doi: 10.1152/jn.00563.2006

49. Barnes, G.R. and Schmid, A.M. (2002) Sequence learning in human ocular smooth pursuit. Exp. Brain Res., 144 (3), 322–335, doi: 10.1007/s00221-002-1050-8

50. Bennett, S.J., Orban de Xivry, J.J., Barnes, G.R., and Lefèvre, P. (2007) Target acceleration can be extracted and represented within the predictive drive to ocular pursuit. J. Neurophysiol., 98 (3), 1405–1414, doi: 10.1152/jn.00132.2007

51. Montagnini, A., Souto, D., and Masson, G.S. (2010) Anticipatory eye-movements under uncertainty: a window onto the internal representation of a visuomotor prior. J. Vis., 10 (7), 554.

52. Kowler, E., Martins, A.J., and Pavel, M. (1984) The effect of expectations on slow oculomotor control–IV. Anticipatory smooth eye movements depend on prior target motions. Vision Res., 24 (3), 197–210.

53. Kowler, E. (1989) Cognitive expectations, not habits, control anticipatory smooth oculomotor pursuit. Vision Res., 29 (9), 1049–1057.

54. Becker, W. and Fuchs, A.F. (1985) Prediction in the oculomotor system: smooth pursuit during transient disappearance of a visual target. Exp. Brain Res., 57 (3), 562–575, doi: 10.1007/BF00237843

55. Madelain, L. and Krauzlis, R.J. (2003) Effects of learning on smooth pursuit during transient disappearance of a visual target. J. Neurophysiol., 90 (2), 972–982, doi: 10.1152/jn.00869.2002

56. Bennett, S.J. and Barnes, G.R. (2003) Human ocular pursuit during the transient disappearance of a visual target. J. Neurophysiol., 90 (4), 2504–2520, doi: 10.1152/jn.01145.2002

57. Churchland, M.M., Chou, I.H.H., and Lisberger, S.G. (2003) Evidence for object permanence in the smooth-pursuit eye movements of monkeys. J. Neurophysiol., 90 (4), 2205–2218, doi: 10.1152/jn.01056.2002

58. Orban de Xivry, J.J., Bennett, S.J., Lefèvre, P., and Barnes, G.R. (2006) Evidence for synergy between saccades and smooth pursuit during transient target disappearance. J. Neurophysiol., 95 (1), 418–427, doi: 10.1152/jn.00596.2005

59. Orban de Xivry, J.J., Missal, M., and Lefèvre, P. (2008) A dynamic representation of target motion drives predictive smooth pursuit during target blanking. J. Vis., 8 (15), 6.1–13, doi: 10.1167/8.15.6

60. Alais, D. and Burr, D. (2004) The ventriloquist effect results from near-optimal bimodal integration. Curr. Biol., 14 (3), 257–262, doi: 10.1016/j.cub.2004.01.029

61. Ernst, M.O. and Banks, M.S. (2002) Humans integrate visual and haptic information in a statistically optimal fashion. Nature, 415 (6870), 429–433, doi: 10.1038/415429a

62. Khoei, M., Masson, G., and Perrinet, L.U. (2013) Motion-based prediction explains the role of tracking in motion extrapolation. J. Physiol. Paris, 107 (5), 409–420, doi: 10.1016/j.jphysparis.2013.08.001

63. Bogadhi, A., Montagnini, A., and Masson, G. (2013) Dynamic interaction between retinal and extraretinal signals in motion integration for smooth pursuit. J. Vis., 13 (13), 5, doi: 10.1167/13.13.5

64. Stocker, A.A. and Simoncelli, E.P. (2006) Noise characteristics and prior expectations in human visual speed perception. Nat. Neurosci., 9 (4), 578–585, doi: 10.1038/nn1669

65. Perrinet, L.U. and Masson, G.S. (2007) Modeling spatial integration in the ocular following response using a probabilistic framework. J. Physiol. Paris, 101 (1-3), 46–55, doi: 10.1016/j.jphysparis.2007.10.011

66. Dimova, K. and Denham, M. (2009) A neurally plausible model of the dynamics of motion integration in smooth eye pursuit based on recursive Bayesian estimation. Biol. Cybern., 100 (3), 185–201, doi: 10.1007/s00422-009-0291-z

67. Orban de Xivry, J.J., Coppe, S., Blohm, G., and Lefèvre, P. (2013) Kalman filtering naturally accounts for visually guided and predictive smooth pursuit dynamics. J. Neurosci., 33 (44), 17301–17313, doi: 10.1523/JNEUROSCI.2321-13.2013

68. Kalman, R.E. (1960) A new approach to linear filtering and prediction problems. Trans. ASME–J. Basic Eng., 82 (Series D), 35–45.

69. Fetsch, C.R., Pouget, A., DeAngelis, G.C., and Angelaki, D.E. (2012) Neural correlates of reliability-based cue weighting during multisensory integration. Nat. Neurosci., 15 (1), 146–154, doi: 10.1038/nn.2983

70. Pack, C.C. and Born, R.T. (2001) Temporal dynamics of a neural solution to the aperture problem in visual area MT of macaque brain. Nature, 409, 1040–1042.

71. Goldreich, D., Krauzlis, R.J., and Lisberger, S.G. (1992) Effect of changing feedback delay on spontaneous oscillations in smooth pursuit eye movements of monkeys. J. Neurophysiol., 67 (3), 625–638.

72. Perrinet, L.U., Adams, R.A., and Friston, K.J. (2014) Active inference, eye movements and oculomotor delays. Biol. Cybern., 108 (6), 777–801, doi: 10.1007/s00422-014-0620-8

73. Nijhawan, R. (2008) Visual prediction: psychophysics and neurophysiology of compensation for time delays. Behav. Brain Sci., 31 (02), 179–198, doi: 10.1017/s0140525x08003804

74. Bastos, A.M., Usrey, W.M., Adams, R.A., Mangun, G.R., Fries, P., and Friston, K.J. (2012) Canonical microcircuits for predictive coding. Neuron, 76 (4), 695–711, doi: 10.1016/j.neuron.2012.10.038

75. Bahill, A.T. and McDonald, J.D. (1983) Model emulates human smooth pursuit system producing zero-latency target tracking. Biol. Cybern., 48 (3), 213–222.

76. Bennett, S.J., Orban de Xivry, J.J., Lefèvre, P., and Barnes, G.R. (2010) Oculomotor prediction of accelerative target motion during occlusion: long-term and short-term effects. Exp. Brain Res., 204 (4), 493–504, doi: 10.1007/s00221-010-2313-4

77. Barnes, G.R., Barnes, D.M., and Chakraborti, S.R. (2000) Ocular pursuit responses to repeated, single-cycle sinusoids reveal behavior compatible with predictive pursuit. J. Neurophysiol., 84 (5), 2340–2355.

78. Masson, G.S., Montagnini, A., and Ilg, U.J. (2010) When the brain meets the eye: tracking object motion, Biological Motion Processing, vol. 8 (eds G.S. Masson and U.J. Ilg), Springer-Verlag, pp. 161–188.

79. Fukushima, K., Fukushima, J., Warabi, T., and Barnes, G.R. (2013) Cognitive processes involved in smooth pursuit eye movements: behavioral evidence, neural substrate and clinical correlation. Front. Syst. Neurosci., 7 (4), 1–28.

80. Yasui, S. and Young, L. (1975) Perceived visual motion as effective stimulus to pursuit eye movement system. Science, 190, 906–908.

81. Robinson, D.A., Gordon, J.L., and Gordon, S.E. (1986) A model of the smooth pursuit eye movement system. Biol. Cybern., 55 (1), 43–57, doi: 10.1007/bf00363977

82. Krauzlis, R.J. and Lisberger, S.G. (1994) A model of visually-guided smooth pursuit eye movements based on behavioral observations. J. Comput. Neurosci., 1, 265–283.

83. Barnes, G.R. and Collins, C. (2011) The influence of cues and stimulus history on the non-linear frequency characteristics of the pursuit response to randomized target motion. Exp. Brain Res., 212, 225–240.

84. Tlapale, E., Masson, G.S., and Kornprobst, P. (2010) Modelling the dynamics of motion integration with a new luminance-gated diffusion mechanism. Vision Res., 50 (17), 1676–1692, doi: 10.1016/j.visres.2010.05.022

85. Bayerl, P. and Neumann, H. (2004) Disambiguating visual motion through contextual feedback modulation. Neural Comput., 16, 2041–2066.

86. Berzhanskaya, J., Grossberg, S., and Mingolla, E. (2007) Laminar cortical dynamics of visual form and motion interactions during coherent object motion perception. Spat. Vis., 20, 337–395.

87. Ilg, U.J. and Thier, P. (1999) Eye movements of rhesus monkeys directed towards imaginary targets. Vision Res., 39, 2143–2150.

88. Ilg, U.J. and Thier, P. (2003) Visual tracking neurons in primate area MST are activated by smooth-pursuit eye movements of an “imaginary” target. J. Neurophysiol., 90 (3), 1489–1502, doi: 10.1152/jn.00272.2003

89. Chukoskie, L. and Movshon, J.A. (2008) Modulation of visual signals in macaque MT and MST neurons during pursuit eye movements. J. Neurophysiol., 102, 3225–3233.

90. Pack, C., Grossberg, S., and Mingolla, E. (2001) A neural model of smooth pursuit control and motion. J. Cogn. Neurosci., 13 (1), 102–120.

91. Grossberg, S., Srihasam, K., and Bullock, D. (2012) Neural dynamics of saccadic and smooth pursuit coordination during visual tracking of unpredictably moving targets. Neural Netw., 27, 1–20.

92. Leigh, R. and Zee, D. (2006) The Neurology of Eye Movements, 4th edn, Oxford University Press, New York.

93. Ilg, U.J. (2003) Visual tracking neurons in area MST are activated during anticipatory pursuit eye movements. Neuroreport, 14, 2219–2223.

94. Fukushima, K., Yamanobe, T., Shinmei, Y., and Fukushima, J. (2002) Predictive responses of periarcuate pursuit neurons to visual target motion. Exp. Brain Res., 145, 104–120.

95. Heinen, S. (1995) Single neuron activity in the dorsolateral frontal cortex during smooth pursuit eye movements. Exp. Brain Res., 104, 357–361.

96. Missal, M. and Heinen, S. (2004) Supplementary eye fields stimulation facilitates anticipatory pursuit. J. Neurophysiol., 92 (2), 1257–1262.

97. Goldman-Rakic, P. (1995) Cellular basis of working memory. Neuron, 14, 477–485.

98. Orban de Xivry, J.J. and Lefèvre, P. (2007) Saccades and pursuit: two outcomes of a single sensorimotor process. J. Physiol., 584, 11–23.

99. Lu, Z.L. and Sperling, G. (2001) Three-systems theory of human visual motion perception: review and update. J. Opt. Soc. Am. A, 18, 2331–2370.

100. Wilmer, J. and Nakayama, K. (2007) Two distinct visual motion mechanisms for smooth pursuit: evidence from individual differences. Neuron, 54, 987–1000.