How the brain encodes different face identities is one of the most fundamental questions in neuroscience. Two competing hypotheses currently dominate: (1) the exemplar-based model proposes that neurons respond in a highly selective and abstract manner to particular persons or objects, whereas (2) the axis-based model (also known as the feature-based model) posits that neurons encode facial features along specific axes (e.g., shape and skin color) in face space. A third, under-explored scheme, manifold-based coding, may also exist, in which neurons encode the perceptual distance (i.e., similarity) between faces at a macro level, regardless of the individual features that distinguish those faces at a micro level. Our lab aims to conduct one of the first studies of face representation and coding in the human medial temporal lobe (MTL) at the single-neuron level. To the best of our knowledge, this will be the first study to directly compare these hypothesized neural coding schemes in the human MTL at the single-neuron level, and the first to employ deep learning to study single-neuron responses in humans. Our single-neuron recordings will enable us to construct, validate, and explain neural face models and thereby derive a general neural representation of faces. We will then use a deep neural network to evaluate each of the neural face models above and identify the predominant coding scheme in the human MTL. Together, our human single-neuron recordings, combined with state-of-the-art image processing tools, will provide the most comprehensive and detailed analysis of neural face representations in humans to date, at the highest spatial and temporal resolution currently achievable.
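To make the contrast between these coding schemes concrete, the sketch below fits each of the three models to a single neuron's responses. It is a minimal illustration with random stand-in data, assuming only that face embeddings from a pretrained face-recognition network and trial-averaged firing rates are available as NumPy arrays; the three fits (ridge regression on feature axes, exemplar-distance tuning, and a distance-to-distance correlation) are one plausible operationalization of the models, not the study's actual analysis pipeline.

```python
# Minimal sketch: comparing three hypothesized coding schemes for one neuron.
# Assumptions (not from the study itself): `features` are face embeddings from
# a pretrained face-recognition network; `rates` are trial-averaged firing
# rates, one value per face image. Both are random stand-ins here.
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_faces, n_dims = 500, 128
features = rng.normal(size=(n_faces, n_dims))       # stand-in DNN embeddings
rates = rng.poisson(5, size=n_faces).astype(float)  # stand-in firing rates

# (1) Axis-based model: firing rate is a linear projection onto feature axes.
axis_score = cross_val_score(RidgeCV(), features, rates, cv=5).mean()

# (2) Exemplar-based model: tuning falls off with distance from one preferred
#     exemplar (here, the face that evoked the strongest response).
best = features[np.argmax(rates)]
dist_to_best = np.linalg.norm(features - best, axis=1)
exemplar_score, _ = spearmanr(-dist_to_best, rates)  # closer -> higher rate?

# (3) Manifold-based model: pairwise response differences track pairwise
#     perceptual distances, irrespective of the underlying feature axes.
feat_dm = pdist(features)           # feature-space pairwise distances
rate_dm = pdist(rates[:, None])     # response-space pairwise distances
manifold_score, _ = spearmanr(feat_dm, rate_dm)

print(f"axis-based R^2: {axis_score:.3f}")
print(f"exemplar-based rho: {exemplar_score:.3f}")
print(f"manifold-based rho: {manifold_score:.3f}")
```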

Our recent data have shown that some MTL neurons are selective for multiple face identities that share visual features, forming clusters in the representation of a deep neural network trained to recognize faces. Contrary to prevailing views, we found that these neurons represent an individual's face through feature-based encoding rather than through association with concepts. The responses of feature neurons did not depend on face identity, race, gender, or familiarity, and the region of feature space to which a neuron was tuned predicted its response to new face stimuli. Our results provide critical evidence bridging the perception-driven representation of facial features in the higher visual cortex and the memory-driven representation of semantics in the MTL, which may form the basis for declarative memory.
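The clustering test implied by this finding can be sketched as follows: for a neuron selective to several identities, ask whether those identities lie closer together in the network's feature space than random identity subsets do. The arrays below are hypothetical stand-ins, and the statistic (mean pairwise embedding distance against a permutation null) is one simple way to formalize "visually similar identities", not necessarily the exact test used in the study.

```python
# Minimal sketch of a feature-based coding test: do the identities a
# multiple-identity (MI) neuron responds to form a cluster in the feature
# space of a face-recognition network? All arrays are hypothetical stand-ins.
import numpy as np
from scipy.spatial.distance import pdist

rng = np.random.default_rng(1)
n_identities, n_dims = 50, 128
identity_feats = rng.normal(size=(n_identities, n_dims))  # one embedding per identity
selective = np.array([3, 11, 12, 30])                     # identities the neuron fires to

# Observed statistic: mean pairwise feature distance among selective identities.
observed = pdist(identity_feats[selective]).mean()

# Null distribution: the same statistic for random identity subsets.
null = np.array([
    pdist(identity_feats[rng.choice(n_identities, selective.size, replace=False)]).mean()
    for _ in range(1000)
])

# A small p-value would indicate the selective identities sit closer together
# in feature space than chance, i.e., feature-based (visual-similarity) coding.
p_value = (null <= observed).mean()
print(f"observed mean distance: {observed:.3f}, p = {p_value:.3f}")
```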

Feature-based neuronal coding of face identities. (A) Task. We employed a one-back task, in which patients responded whenever an identical famous face was repeated. Each face was presented for 1 s, followed by a jittered inter-stimulus interval (ISI) of 0.5 to 0.75 s. (B) Percentage of single-identity (SI) and multiple-identity (MI) neurons in the entire neuronal population. The stacked bar shows MI neurons that encoded visually similar identities (i.e., demonstrating feature-based coding; red) or not (blue). (C, D) Population decoding of face identity. (C) Decoding performance was primarily driven by identity neurons. The shaded area denotes ±SEM across bootstraps. The horizontal dotted gray line indicates the chance level (2%). The top bars illustrate the time points with significant above-chance decoding performance (bootstrap, P < 0.05, corrected by FDR for Q < 0.05). (D) MI neurons had significantly better decoding performance than SI neurons. The top bar illustrates the time points with a significant difference between MI and SI neurons (bootstrap, P < 0.05, corrected by FDR for Q < 0.05). (E) Web-association score for MI neurons. For each neuron, we calculated a mean association score between pairs of stimuli to which the neuron was selective (S-S), and between pairs in which the neuron was selective to one stimulus but not selective (NS) to the other (S-NS). Error bars denote ±SEM across neurons. Left: MI neurons that encoded visually similar identities (i.e., with feature-based coding). Right: MI neurons that did not show feature-based coding. In neither case did MI neurons encode conceptually related identities. (F-M) Two example neurons that encoded visually similar identities. (F, J) Neuronal responses to 500 faces (50 identities). Trials are aligned to face stimulus onset (gray line) and grouped by individual identity. (G, K) Projection of the firing rate onto the feature space. Each color represents a different identity (names shown in the legend). The size of each dot indicates the firing rate. (H, L) Estimate of the spike density in the feature space. By comparing observed (upper) vs. permuted (lower) responses, we identified the region of feature space in which the observed neuronal response was significantly elevated. This region was defined as the tuning region of the neuron. (I, M) The tuning region of each neuron in the feature space (delineated by the red outline).
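For readers unfamiliar with population decoding as in panels (C, D), the following is a minimal sketch: a cross-validated linear classifier predicts face identity from a trials-by-neurons firing-rate matrix, and comparing the full population against a neuron subset parallels how panel (C) attributes performance to identity neurons. All data below are random stand-ins, and the classifier choice (a linear SVM) is an assumption; with 50 identities, chance is 1/50 = 2%, matching the dotted line in (C).

```python
# Minimal sketch of population decoding of face identity, assuming a
# pseudo-population matrix of trial-wise firing rates (trials x neurons)
# and one identity label per trial. Data are random stand-ins.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import LinearSVC

rng = np.random.default_rng(2)
n_identities, trials_per_id, n_neurons = 50, 10, 80
labels = np.repeat(np.arange(n_identities), trials_per_id)
rates = rng.poisson(5, size=(labels.size, n_neurons)).astype(float)

# Cross-validated decoding accuracy for the full population ...
full_acc = cross_val_score(LinearSVC(dual=False), rates, labels, cv=5).mean()

# ... and for a hypothetical neuron subset (e.g., only identity-selective
# neurons), to ask which neurons drive the decoding performance.
subset = rng.choice(n_neurons, 20, replace=False)
subset_acc = cross_val_score(LinearSVC(dual=False), rates[:, subset], labels, cv=5).mean()

print(f"all neurons: {full_acc:.3f}, subset: {subset_acc:.3f}, chance: {1 / n_identities:.3f}")
```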

We have also found a neuronal social trait space for first impressions in the human amygdala and hippocampus, which may have behavioral consequences and may be involved in the abnormal processing of social information in autism. Our results suggest that a neuronal population code for a comprehensive social trait representation exists in the human amygdala and hippocampus and underlies spontaneous first impressions.

Neuronal social trait space. (A) Task. We employed a simple one-back task, in which patients responded whenever an identical face stimulus was repeated. Each face was presented for 1 s, followed by a jittered inter-stimulus interval (ISI) of 0.5 to 0.75 s. (B) Distribution of face images in the social trait space based on their consensus social trait ratings, after dimensionality reduction using t-distributed stochastic neighbor embedding (t-SNE). (C) Correlation between dissimilarity matrices (DMs). The social trait DM (left matrix) was correlated with the neural response DM (right matrix). Color coding shows dissimilarity values. (D-H) Observed vs. permuted correlation coefficients between DMs. The correspondence between DMs was assessed using permutation tests. The magenta line indicates the observed correlation coefficient between DMs; the null distribution of correlation coefficients (gray histogram) was obtained by shuffling the face identities (1000 runs). (D) All face-responsive neurons (n = 74). (E) Amygdala face-responsive neurons (n = 36). (F) Hippocampal face-responsive neurons (n = 38). (G) Social trait space constructed using Caucasian faces only (n = 74 neurons). (H) Social trait space constructed using African American faces only (n = 74 neurons). (I) Temporal dynamics of the correlation between DMs. Bin size is 500 ms and step size is 50 ms. The first bin spans −500 ms to 0 ms (bin center: −250 ms) relative to stimulus onset, and the last bin spans 1000 ms to 1500 ms (bin center: 1250 ms) after stimulus onset. Dotted horizontal lines indicate the chance level, and dashed horizontal lines indicate the ±standard deviation (SD) of the null distribution. The top asterisks illustrate the time points with a significant correlation between DMs (permutation test against the null distribution, P < 0.05, corrected by false discovery rate (FDR) for Q < 0.05).
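The DM correlation and permutation test in panels (C-H) can be sketched compactly: build one dissimilarity matrix from consensus trait ratings and another from mean neural responses, correlate them, and compare against a null distribution obtained by shuffling face identities. The arrays and the use of Spearman correlation below are assumptions for illustration, not the study's data or exact parameter choices.

```python
# Minimal sketch of the representational similarity analysis in (C-H),
# assuming per-face consensus trait ratings and per-face mean neural
# responses (faces x neurons). All arrays are random stand-ins.
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.stats import spearmanr

rng = np.random.default_rng(3)
n_faces, n_traits, n_neurons = 100, 8, 74
trait_ratings = rng.normal(size=(n_faces, n_traits))
neural_resp = rng.normal(size=(n_faces, n_neurons))

trait_dm = pdist(trait_ratings)   # social trait DM (condensed form)
neural_dm = pdist(neural_resp)    # neural response DM

observed, _ = spearmanr(trait_dm, neural_dm)

# Null distribution: shuffle face identities in one DM and re-correlate.
neural_dm_sq = squareform(neural_dm)
null = np.empty(1000)
for i in range(1000):
    perm = rng.permutation(n_faces)
    null[i] = spearmanr(trait_dm, squareform(neural_dm_sq[np.ix_(perm, perm)]))[0]

p_value = (null >= observed).mean()
print(f"observed rho = {observed:.3f}, p = {p_value:.3f}")
```

The same machinery extends to the temporal analysis in (I) by recomputing the neural DM within each sliding firing-rate window and repeating the permutation test per bin.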