8th Joint Symposium on Neural Computation

May 19, 2001 - La Jolla (California, USA)

 
Face identification using one spike per neuron
 
(1)Arnaud Delorme & (2)Simon J. Thorpe
 
(1)CNL, Salk Institute, 10010 Torrey Pines Road, La Jolla, CA 92037, USA
(2)CERCO, 133, route de Narbonne, 31062 Toulouse, France
arno@salk.edu, thorpe@cerco.ups-tlse.fr

 
    In humans, 150 ms of processing is sufficient to detect the presence of a target in briefly flashed photographs of natural scenes (Thorpe et al, 1996, Nature, 381, 520), and in monkeys, processing time is probably even shorter (Fabre-Thorpe et al, 1998, Neuroreport, 9, 303). Here, we propose a neural model for object recognition compatible with this sort of rapid processing. We hypothesize that, in a wave of spikes, information is encoded by the order in which cells fire. Starting with the retina, the earliest firing cells are those with the strongest inputs, and at subsequent stages, more complex receptive field properties are produced by a desensitization mechanism which makes neurons sensitive to the order in which their inputs fire. Fast feed-forward shunting inhibition could implement such a mechanism, and recent experiments have shown that such inhibition might be present at the very beginning of spike integration in V1 pyramidal neurons (Borg-Graham et al, 1998, Nature, 393, 369).
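    A minimal sketch of this order-based desensitization (in Python; the modulation factor, the linear summation, and all parameter values are illustrative assumptions, not quantities taken from the model): each incoming spike reduces the neuron's sensitivity, so earlier-firing inputs contribute more than later ones.

import numpy as np

# Each incoming spike lowers the neuron's sensitivity by a fixed modulation
# factor, so the contribution of an input depends on its rank in the wave.
def order_based_activation(spike_order, weights, modulation=0.9):
    """spike_order: input indices sorted by firing time (earliest first)."""
    activation = 0.0
    sensitivity = 1.0
    for input_idx in spike_order:
        activation += sensitivity * weights[input_idx]
        sensitivity *= modulation  # shunting-like desensitization
    return activation

# Example: three inputs firing in the order 2, 0, 1.
print(order_based_activation([2, 0, 1], np.array([0.5, 0.2, 0.8])))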
    Modeling millions of integrate-and-fire neurons in retinotopically organized maps using a specially designed neural network simulator called SpikeNET (Delorme et al, 1999, Neurocomputing, 26-27, 989), we implemented a relatively simple feed-forward architecture composed of three layers. The first layer corresponded to the retina, the second one represented groups of V1 orientation-selective neurons, and the final processing layer contained maps of neurons that were trained to respond to various views of 40 test faces. Lateral competitive inhibitory mechanisms in the last layer meant that, once one unit had fired, activating other units with receptive fields in the same part of visual space became increasingly difficult. Note that neurons never fired more than once, so conventional rate-coding schemes could not operate.
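    The lateral competition in the output layer can be sketched as follows (again a simplified illustration, not the SpikeNET implementation; the threshold and inhibition values are assumptions): each time a face map fires at a given retinotopic location, the effective threshold of the competing maps at that location is raised, and no unit fires more than once.

import numpy as np

# One-spike competition among face maps at a single retinotopic location:
# units are visited in order of accumulated activation (a proxy for firing
# order), and each emitted spike raises the threshold of the remaining units.
def run_output_layer(activations, threshold=1.0, inhibition=0.5):
    fired = []
    extra_threshold = 0.0
    for idx in np.argsort(-activations):  # most activated fires first
        if activations[idx] >= threshold + extra_threshold:
            fired.append(int(idx))
            extra_threshold += inhibition  # later units need more drive
    return fired  # face maps that emitted their single spike, in firing order

print(run_output_layer(np.array([1.4, 0.9, 1.2, 0.3])))  # -> [0]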
    During the learning phase, photographs of one of the 40 individuals were shown to the network and a supervised learning rule was used to adjust the synaptic weights of the neurons located at the appropriate position in one of the 40 output-layer maps. The algorithm used a Hebbian-like learning rule that adjusted the synaptic weights in such a way that the final weights converged to values depending on the average firing rank of each input: inputs that were consistently activated first were given high weights, whereas inputs that systematically fired towards the end of the propagation were assigned low weights.
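    A hedged sketch of such a rank-based update (the exponential target, learning rate, and modulation factor are assumptions made here for illustration; the exact rule may differ): each presentation pulls the weight of an input toward a value that decreases with that input's firing rank, so repeated presentations drive the weights toward a function of the average rank.

import numpy as np

# Rank-based Hebbian-like update: the target weight of an input decreases
# with its firing rank, so consistently early inputs end up with high weights.
def update_weights(weights, spike_order, lr=0.05, modulation=0.9):
    target = np.zeros_like(weights)
    for rank, input_idx in enumerate(spike_order):
        target[input_idx] = modulation ** rank
    return weights + lr * (target - weights)

weights = np.zeros(4)
for _ in range(200):                      # repeated presentations of one view
    weights = update_weights(weights, [3, 1, 0, 2])
print(np.round(weights, 3))               # approx. [0.81, 0.9, 0.729, 1.0]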
    We showed that, when the network was presented with novel views of the learnt faces, the responses of the output neurons were remarkably reliable and the network categorized the faces accurately. We also showed that changing the image contrast or adding noise produced reductions in network performance comparable to those of the human visual system (see figure; random responses would lead to 2.5% accuracy; noisy images correspond to a weighted mixture of the original images with random-value images). Thus, despite the simplicity of the model, it provides a robust basis for object recognition that is compatible both with the temporal constraints and with the performance of the human brain.
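    The noise manipulation described above can be written as a simple pixel-wise mixture (sketch only; the mixing weight and the choice of a uniform distribution for the random-value image are assumptions):

import numpy as np

# Weighted mixture of the original image with a random-value image.
def add_noise(image, noise_level, rng=np.random.default_rng(0)):
    noise = rng.uniform(image.min(), image.max(), size=image.shape)
    return (1.0 - noise_level) * image + noise_level * noise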