Cortical integration from V1 into V4 along the object-processing hierarchy

Human and non-human primates effortlessly see both the global and local features of objects in great detail. However, how the cortex integrates local visual cues into global representations along the visual hierarchy remains mysterious. This is especially puzzling given a long-standing paradox in vision: as the complexity of what neurons encode increases along the visual hierarchy, the known acuity, or resolving power, decreases dramatically. Put simply, how do we recognize the face of our child while still resolving the individual hairs of her or his eyelashes? Many models of visual processing, including cutting-edge deep learning models, assume that low-level resolution and position information are discarded to yield high-level representations. These themes are fundamental to understanding how the brain performs sensory transformation.

We combined large-scale, high-spatial-resolution imaging, which records the transformation of information across three visual areas (V1, V2, and V4) simultaneously, with multi-site laminar electrophysiological recordings. With this approach, we found a bottom-up cascade of cortical integration of local visual cues that acts as a general cortical mechanism for global representations in both the ventral and dorsal streams of the primate visual system. The integrated neural responses depend on the size and preferences of the neurons' receptive fields. More recently, we revealed an unexpected neural clustering that preserves visual acuity from V1 into V4, enabling a detailed spatiotemporal separation of local and global features along the object-processing hierarchy. This suggests that high acuity is retained into later stages, where more detailed cognitive behaviour occurs. The study reinforces the point that neurons in V4 (and most likely also in inferotemporal cortex) need not have only low visual acuity, which may begin to resolve the long-standing paradox concerning fine visual discrimination. Our research should therefore prompt further studies of how the preservation of low-level information supports higher-level vision, and provide new ideas to inspire the next generation of deep neural network architectures.