One key research project in our lab uses complex natural-scene images to study saliency, attention, learning, and memory. In our previous studies, we annotated more than 5,000 objects in 700 well-characterized images and recorded eye movements while participants viewed these images (Wang et al., Neuron, 2015). We will extend the same task to single-neuron recordings to investigate the neural correlates of saliency. We will also add three important components to this free-viewing task: (1) we will repeat the images once or twice to explore a repetition effect (cf. Jutras et al., PNAS, 2013), (2) we will ask neurosurgical patients to memorize the images during the first session (learning session) and test memory on the next day (recognition session) to explore a memory effect, and (3) we will explore memory encoding with overnight recordings. Moreover, to probe the neural basis of altered saliency representation in autism (cf. Wang et al., Neuron, 2015), we will analyze whether neurons are tuned to different saliency values and whether AQ/SRS scores correlate with firing rates during fixations on semantic attributes such as faces.
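
As a concrete illustration of the last two analyses, the sketch below tests whether a neuron's firing rate varies across fixations binned by saliency value, and correlates AQ/SRS scores with firing rates during fixations on faces. The quantile binning, the choice of a one-way ANOVA and Spearman correlation, and the variable names are illustrative assumptions, not the finalized analysis.

```python
"""Sketch of the proposed saliency-tuning and AQ/SRS correlation analyses.
All names and statistical choices are illustrative assumptions."""
import numpy as np
from scipy import stats

def saliency_tuning(fixation_saliency, fixation_rates, n_bins=4):
    """One-way ANOVA of a neuron's firing rate across saliency quantile bins.

    fixation_saliency : (n_fixations,) saliency value at each fixated location
    fixation_rates    : (n_fixations,) firing rate around each fixation
    """
    edges = np.quantile(fixation_saliency, np.linspace(0, 1, n_bins + 1))
    bins = np.digitize(fixation_saliency, edges[1:-1])       # bin index 0..n_bins-1
    groups = [fixation_rates[bins == b] for b in range(n_bins)]
    return stats.f_oneway(*groups)                            # F statistic, p value

def aq_face_rate_correlation(aq_scores, face_fixation_rates):
    """Spearman correlation of AQ (or SRS) scores with the mean firing rate
    during fixations on faces, one value per participant."""
    return stats.spearmanr(aq_scores, face_fixation_rates)
```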

One highlight of this project is the construction of a “neuronal saliency map”. A saliency model can be constructed for each single neuron as well as for populations of neurons. By replacing the eye-movement fixation density map with a neuronal firing-rate density map, saliency weights can be calculated for neurons. This neuronal saliency map will reflect the tuning of a single neuron or a population of neurons when multiple saliency factors are considered simultaneously. Notably, the saliency weights can be easily compared between brain areas (e.g., OFC vs. amygdala) and between groups (e.g., ASD vs. controls; see [10] as an example of fixation saliency weights). The distribution of saliency weights across neurons can reflect the population coding scheme of a brain area. Lastly, such neuronal saliency maps can be readily validated: we can use the saliency map to predict the location of the next fixation and thereby quantify prediction accuracy. A real-time decoder can then be constructed.
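
One straightforward way to realize this idea is to weight each fixation by the neuron's firing rate, estimate per-factor weights by regression, and then combine the feature maps with those weights to predict the next fixation. The sketch below illustrates this; the regression model (ridge), the firing-rate window, and the function names are assumptions for illustration, not the finalized method.

```python
"""Sketch of a per-neuron saliency map: regress firing rate on the feature
values at each fixation, then use the weights to predict the next fixation.
All names and model choices are illustrative assumptions."""
import numpy as np
from sklearn.linear_model import Ridge

def neuronal_saliency_weights(fixation_features, fixation_rates):
    """
    fixation_features : (n_fixations, n_factors) feature values at fixated locations
    fixation_rates    : (n_fixations,) firing rate of one neuron in a window
                        around each fixation (e.g., 0-200 ms after fixation onset)
    Returns the per-factor weights defining this neuron's saliency map.
    """
    model = Ridge(alpha=1.0)
    model.fit(fixation_features, fixation_rates)
    return model.coef_

def predict_next_fixation(feature_map, weights):
    """Combine feature maps with neuronal weights and return the peak location.

    feature_map : (H, W, n_factors) feature maps for the current image
    weights     : (n_factors,) neuronal saliency weights
    """
    saliency = feature_map @ weights                # weighted sum across factors
    return np.unravel_index(np.argmax(saliency), saliency.shape)
```

Prediction accuracy can then be quantified, for example, by how often the observed next fixation falls within a small radius of the predicted peak, which is the validation step described above.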

Model-based eye tracking. We apply a linear support vector machine (SVM) classifier to evaluate the contribution of five general factors to gaze allocation. Feature maps are extracted from the input images and comprise three levels of features (pixel-, object-, and semantic-level) together with the image center and the background. We randomly sample image locations to collect training data and train the classifier on the ground-truth fixation data. The classifier outputs are the saliency weights, which quantify the relative importance of each factor in predicting gaze allocation.
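
Below is a minimal sketch of how such saliency weights can be estimated with scikit-learn: fixated locations serve as positive samples, randomly sampled locations as negatives, and the linear SVM weights give the relative importance of each factor. The feature names, array shapes, negative-sampling scheme, and use of LinearSVC are illustrative assumptions rather than the exact pipeline of our previous work.

```python
"""Sketch of SVM-based estimation of saliency weights from fixation data.
Feature maps are assumed to be precomputed, one channel per factor."""
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

FACTORS = ["pixel", "object", "semantic", "center", "background"]  # five general factors

def build_dataset(feature_maps, fixation_maps, n_negative_per_image=200, seed=0):
    """Positive samples at fixated pixels, negatives at randomly sampled pixels.

    feature_maps : list of arrays, each (H, W, 5) -- one channel per factor
    fixation_maps: list of boolean arrays (H, W), True where a fixation landed
    """
    rng = np.random.default_rng(seed)
    X, y = [], []
    for fmap, fix in zip(feature_maps, fixation_maps):
        ys, xs = np.nonzero(fix)
        X.append(fmap[ys, xs, :])                 # features at fixated locations
        y.append(np.ones(len(ys)))
        ry = rng.integers(0, fmap.shape[0], n_negative_per_image)
        rx = rng.integers(0, fmap.shape[1], n_negative_per_image)
        X.append(fmap[ry, rx, :])                 # features at random locations
        y.append(np.zeros(n_negative_per_image))
    return np.vstack(X), np.concatenate(y)

def saliency_weights(feature_maps, fixation_maps):
    """Train a linear SVM and return one weight per factor."""
    X, y = build_dataset(feature_maps, fixation_maps)
    model = make_pipeline(StandardScaler(), LinearSVC(C=1.0, max_iter=10000))
    model.fit(X, y)
    return dict(zip(FACTORS, model.named_steps["linearsvc"].coef_.ravel()))
```

The same routine can be run on a neuronal firing-rate density map in place of the fixation map to obtain the per-neuron saliency weights discussed above.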