Fourth International Conference on Cognitive and Neural Systems

May 25-27, 2000 - Boston (Massachusetts, USA)

 
Fast processing of natural scenes: detection, categorization and the role of top-down knowledge
 
Michèle Fabre-Thorpe, Arnaud Delorme, Guillaume Rousselet & Simon Thorpe
 
Centre de Recherche Cerveau et Cognition (CNRS-Université Paul Sabatier), 133 route de Narbonne, 31062 Toulouse, France.
mft@cerco.ups-tlse.fr, arno@cerco.ups-tlse.fr, guillau@cerco.ups-tlse.fr, thorpe@cerco.ups-tlse.fr

 
    Humans are highly efficient at detecting animals in briefly flashed (20 ms) natural scenes that they have never seen before. Accuracy commonly reaches 94% correct, and mean reaction times (RTs) typically range between 400 and 450 ms. Moreover, frontal ERPs differ sharply between target and non-target trials from 150 ms after stimulus onset (Thorpe et al., 1996); this differential brain activity has been shown to be related to the task decision and independent of category (Van Rullen & Thorpe, submitted). Such data impose considerable constraints on models of visual processing, because the speed of categorization leaves little time for anything more than a straight feed-forward pass through the visual system. Here we report the results of two further experiments that were specifically aimed at assessing the role of top-down influences in such tasks. The first study asked to what extent processing speed could be increased by training. The second directly compared rapid visual categorization with a simpler visual task in which subjects had to respond each time a particular image was presented.
    For the first experiment, 14 human subjects were presented with 1200 previously unseen images mixed at random with 1200 presentations of highly familiar images (200 images seen daily for three weeks, each presented 6 times when mixed with the new stimuli); ERPs were recorded simultaneously. As in the previous studies, subjects had to perform a go/no-go task, releasing a button when they detected an animal. The results showed that although mean RTs were shorter for familiar stimuli than for novel ones (424 vs 444 ms), this effect was entirely due to the elimination of long-latency responses occurring for a relatively small group of "difficult" novel targets. In contrast, for the majority of images, there was no evidence for an increase in processing speed: the RT distributions for the fastest responses were identical for familiar and novel images, and the ERP data showed that the onset of the differential activity at around 150 ms post-stimulus was unaffected by familiarity. The results imply that this sort of rapid categorization can be performed in the absence of specific knowledge about particular images.
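    The distinction drawn above, between a shift in mean RT and a shift in the fastest responses, can be illustrated with a minimal sketch. The reaction times below are synthetic values chosen for illustration only and are not data from the study; the function names and the choice of the fastest 25% of responses are assumptions, not the analysis actually used.

```python
from statistics import mean


def fastest_fraction(rts, fraction=0.25):
    """Return the fastest `fraction` of reaction times, sorted ascending."""
    ordered = sorted(rts)
    n = max(1, int(len(ordered) * fraction))
    return ordered[:n]


def compare_rts(familiar, novel, fraction=0.25):
    """Compare overall mean RTs and the mean RTs of the fastest responses."""
    fast_familiar = fastest_fraction(familiar, fraction)
    fast_novel = fastest_fraction(novel, fraction)
    return {
        "mean_diff": mean(novel) - mean(familiar),
        "fast_diff": mean(fast_novel) - mean(fast_familiar),
    }


# Synthetic reaction times in ms (illustrative only, NOT data from the study).
familiar = [340, 360, 380, 400, 420, 440, 460, 480]
# Same values, except that the two slowest responses are much longer,
# mimicking a small group of "difficult" novel targets.
novel = [340, 360, 380, 400, 420, 440, 540, 640]

result = compare_rts(familiar, novel)
print(result["mean_diff"])  # difference in overall mean RT (ms)
print(result["fast_diff"])  # difference in mean RT of the fastest responses (ms)
```

    In this toy example the overall mean differs between the two sets while the fastest responses do not, which is the pattern one would expect if familiarity removes long-latency responses without reducing minimal processing time.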
    Given the failure of the first experiment to demonstrate an effect of top-down knowledge on minimal processing time, the second experiment was designed to measure processing speed in a situation where the use of top-down knowledge was maximized. Fourteen subjects alternated between the control "animal/non-animal" task and a "single target" task. In the latter task, for each series of 100 trials, they were first asked to study a particular photograph and subsequently had to detect this target image when it was flashed at random 50 times among 50 other natural scenes that they had never seen before. In this "detection" task, subjects could "preset" their visual system in order to make use of any low-level feature of the target image. The simple "detection" task was clearly easier than the categorization task: accuracy was higher (98.7% vs 93.4%), and mean RTs were shorter (341 vs 409 ms). This shift towards shorter latencies was observed across the full range of behavioral responses, with a 40 ms latency decrease for the fastest ones. Moreover, the differential ERP activity started about 25 ms earlier than in the control task. Such data indicate that the minimal processing time for categorization is only 25-40 ms longer than for simple detection. These results can be interpreted in a variety of ways. One is that the threshold for responding can be considerably reduced when the target is completely predictable; another is that the decision could be made on the basis of lower-level features (as suggested by the pattern of false positives in the second task), thus bypassing some higher-level processing stages.