In edge computing scenarios, response speed, compactness, and power efficiency have become critical challenges for visual systems. Traditional vision architectures that separate sensing from computing suffer from high latency, excessive power consumption, and potential privacy leakage, all caused by the data transmission between the two stages. To address these issues, vision chips inspired by the human visual system have emerged as a promising solution. By integrating image acquisition and information processing on a single hardware platform, such chips enable a sensing–computation co-processing paradigm, supporting efficient visual perception and computation directly at the edge. Developing high-speed vision chips is an inherently interdisciplinary task that bridges physics, electronics, and information science, addressing key problems in device fabrication, circuit design, and intelligent algorithm integration. This paper systematically reviews recent advances in the core components of high-speed vision chips.
For high-speed sensor devices, this paper analyzes the physical mechanisms, structural innovations, and performance limitations of complementary metal oxide semiconductor (CMOS) image sensors (CISs), dynamic vision sensors (DVSs), and single-photon image sensors. High-speed CIS devices enhance temporal response by optimizing two fundamental aspects: charge transfer velocity and transfer path length. Gradient doping is employed to induce high-speed drift motion during charge transfer, while structural optimization based on physical device modeling shortens the transfer path, thereby enabling fast response. In contrast, the DVS performs event-triggered readout when light intensity changes exceed a predefined threshold. This event-driven mechanism effectively eliminates static redundant information and generates only spike-based data reflecting brightness changes, achieving low latency and high temporal resolution. For single-photon detection, research on CIS-based quantum image sensors investigates the sources and physical mechanisms of noise, achieving ultra-low noise and extremely high conversion gain. Image sensors based on single-photon avalanche diodes (SPADs) leverage the avalanche effect to directly convert incident photons into pulse outputs, realizing high-speed and high-sensitivity single-photon detection. Furthermore, electric-field modulation enhances photogenerated charge collection and reduces temporal jitter, thereby improving timing precision in SPADs.
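The event-triggered readout described above can be illustrated with a minimal behavioral model. The sketch below is a simplified simulation, not a circuit description: each pixel keeps a log-intensity reference and emits an ON/OFF event when the change since the last event exceeds a contrast threshold (the `threshold` value of 0.2 is an illustrative assumption).

```python
import numpy as np

def dvs_events(frames, threshold=0.2):
    """Simplified DVS model: emit (t, x, y, polarity) events when the
    log-intensity change at a pixel exceeds the contrast threshold."""
    log_ref = np.log(frames[0].astype(np.float64) + 1e-6)  # per-pixel reference
    events = []
    for t, frame in enumerate(frames[1:], start=1):
        log_i = np.log(frame.astype(np.float64) + 1e-6)
        diff = log_i - log_ref
        on = diff >= threshold       # brightness increase -> ON event
        off = diff <= -threshold     # brightness decrease -> OFF event
        for y, x in zip(*np.nonzero(on | off)):
            events.append((t, x, y, 1 if on[y, x] else -1))
        # reset the reference only at pixels that fired (event-driven update)
        fired = on | off
        log_ref[fired] = log_i[fired]
    return events
```

Static pixels produce no output at all, which is exactly how the DVS suppresses redundant data: a pixel whose brightness doubles crosses the log threshold and fires once, while unchanged pixels stay silent.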
In terms of readout circuits, this paper reviews the architectures and optimization strategies for high-speed analog-to-digital converters (ADCs), address-event encoding, and time-correlated single-photon counting. To enhance conversion efficiency while minimizing chip area and power consumption, various ADC architectures have been developed. The successive approximation register (SAR) ADC has become a foundational solution due to its high integration and low power consumption. Hybrid architectures such as SAR/single-slope (SS) and pipeline–SAR combine the strengths of different schemes, thereby effectively overcoming the area–resolution trade-offs inherent in traditional SAR ADCs. For DVSs, the address-event representation (AER) readout mechanism detects brightness variations in real time and outputs them as asynchronous events, which greatly enhances image-processing throughput while reducing storage and transmission demands. In SPAD-based sensors, on-chip integration of counting and histogram computation effectively alleviates the data throughput bottleneck associated with large-scale single-photon detection. These readout strategies, each tailored to the characteristics of its corresponding sensing mechanism, collectively improve data conversion and transmission efficiency in high-speed imaging scenarios.
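The successive-approximation principle behind the SAR ADC is a binary search over DAC codes. The following behavioral sketch (parameter names are illustrative, and it ignores comparator noise, DAC mismatch, and settling time) shows how each comparator decision resolves one bit, from MSB to LSB:

```python
def sar_adc(vin, vref=1.0, bits=8):
    """Behavioral SAR conversion: binary-search the DAC code whose
    output voltage best approximates the sampled input vin."""
    code = 0
    for bit in range(bits - 1, -1, -1):
        trial = code | (1 << bit)           # tentatively set the current bit
        vdac = vref * trial / (1 << bits)   # DAC output for the trial code
        if vin >= vdac:                     # comparator decision
            code = trial                    # keep the bit, else clear it
    return code
```

An N-bit conversion therefore takes exactly N comparator cycles, which is why SAR ADCs achieve high conversion rates with a single comparator and low power, at the cost of resolution being limited by DAC matching and area.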
For intelligent processing, the primary objective is to efficiently extract information from sensor data and enable algorithmic intelligence. This process generally involves two stages: a reconstruction stage, which recovers high-quality image sequences from sparse spike streams, and an intelligent processing stage, which achieves high-speed semantic understanding through real-valued or spike-based computational architectures. By deeply integrating reconstruction and cognition at both the algorithmic and hardware levels, end-to-end intelligent vision systems can simultaneously achieve high speed, low power consumption, and high accuracy. With ongoing technological convergence, multimodal vision chips integrating CIS, DVS, and SPAD architectures combine the advantages of different sensor modalities, providing more comprehensive perceptual capabilities for next-generation machine vision systems. Looking ahead, the continuous advancement of semiconductor manufacturing technologies and novel materials, combined with the deep integration of multimodal sensing and heterogeneous computing paradigms, is expected to drive the development of high-performance, low-power, and intelligent vision chips.
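One simple instance of the reconstruction stage is interval-based intensity estimation for spike streams: since a brighter pixel accumulates charge and fires faster, its instantaneous intensity can be approximated by the reciprocal of the inter-spike interval. The sketch below is a minimal, assumed implementation of this idea (a binary spike tensor of shape (T, H, W) and uniform time steps are illustrative assumptions, not a specific chip's format):

```python
import numpy as np

def reconstruct_from_spikes(spike_stream, t):
    """Estimate a grayscale image at time t from a binary spike stream
    of shape (T, H, W): intensity ~ 1 / local inter-spike interval."""
    T, H, W = spike_stream.shape
    img = np.zeros((H, W))
    for y in range(H):
        for x in range(W):
            times = np.nonzero(spike_stream[:, y, x])[0]
            if len(times) < 2:
                continue  # too few spikes to estimate an interval
            # nearest spikes around t bound the local firing interval
            idx = np.searchsorted(times, t)
            lo = times[max(idx - 1, 0)]
            hi = times[min(idx, len(times) - 1)]
            interval = max(hi - lo, 1)
            img[y, x] = 1.0 / interval  # faster firing -> brighter pixel
    return img
```

A pixel that spikes every 2 time steps reconstructs to twice the intensity of one spiking every 4 steps; learned reconstruction networks refine this basic cue but start from the same spike-timing information.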