The rapid advancement of large language models (LLMs) such as ChatGPT has imposed unprecedented demands on hardware in terms of computational power, memory capacity, and energy efficiency. Compute-in-Memory (CIM) technology, which integrates computation directly into memory arrays, has emerged as a promising solution to the data-movement bottleneck of traditional von Neumann architectures, significantly reducing power consumption and enabling large-scale parallel processing. Among the non-volatile memory candidates, 3D NAND flash stands out for its mature manufacturing process, ultrahigh density, and cost-effectiveness, making it a strong contender for commercial CIM deployment and local inference of large models.
Despite these advantages, most existing research on 3D NAND-based CIM remains at the academic level, focusing on theoretical designs or small-scale prototypes, with little attention to system-level architecture design and functional validation using product-grade 3D NAND chips for LLM applications. To address this gap, we propose a novel CIM architecture based on 3D NAND flash, leveraging a Source Line (SL) slicing technique to partition the array for parallel computation at minimal manufacturing cost. This architecture is complemented by an efficient mapping algorithm and pipelined dataflow, enabling system-level simulation and rapid industrial iteration.
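The SL slicing idea described above amounts to tiling a weight matrix so that each tile fits one source-line slice of the array. As a minimal sketch (the function name, `slice_cols`, and `block_rows` are illustrative assumptions, not parameters from the paper), a mapper can zero-pad the matrix to whole tiles and report the padding overhead, which corresponds to the array wastage the architecture tries to keep small:

```python
import numpy as np

def map_weights_to_slices(W, slice_cols, block_rows):
    """Partition a weight matrix into (row-block, SL-slice) tiles.

    Hypothetical parameters: slice_cols is the number of bit lines served
    by one SL slice; block_rows is the number of word lines per block.
    The matrix is zero-padded so every tile is uniform, mirroring how a
    mapper must round up to whole slices.
    """
    rows, cols = W.shape
    n_row_blocks = -(-rows // block_rows)   # ceiling division
    n_slices = -(-cols // slice_cols)
    padded = np.zeros((n_row_blocks * block_rows, n_slices * slice_cols),
                      dtype=W.dtype)
    padded[:rows, :cols] = W
    # Reshape so tiles are indexed as [row_block, slice, row, col].
    tiles = (padded
             .reshape(n_row_blocks, block_rows, n_slices, slice_cols)
             .transpose(0, 2, 1, 3))
    waste = 1.0 - (rows * cols) / padded.size  # fraction of padded cells
    return tiles, waste
```

With this framing, the mapping algorithm's job is to choose tile boundaries (and model-layer assignments) so that the reported `waste` stays below the ~3% figure the architecture targets.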
We develop a PyTorch-based behavioral simulator for LLM inference on the proposed hardware, evaluating the impact of current distribution and quantization on system performance. Our design supports INT4/INT8 quantization and employs dynamic weight storage logic to minimize voltage switching overhead, further optimized through hierarchical pipelining to maximize throughput under hardware constraints.
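Two of the effects the behavioral simulator must capture are weight quantization and the spread of cell read currents. A minimal sketch of both (the LSB-to-current scale `i_lsb_nA` and the additive-Gaussian noise model are assumptions for illustration, not the paper's calibrated device model) could look like:

```python
import numpy as np

def int8_quantize(w):
    """Symmetric per-tensor INT8 quantization (illustrative)."""
    scale = np.max(np.abs(w)) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def noisy_mvm(q_w, scale, x, sigma_nA=2.5, i_lsb_nA=50.0, rng=None):
    """Matrix-vector product with Gaussian read-current noise.

    Assumed model: each weight LSB corresponds to i_lsb_nA of cell
    current, and the open-state current spread sigma_nA (sigma < 2.5 nA
    in the reported results) appears as additive Gaussian noise on each
    analog partial sum before digitization.
    """
    rng = np.random.default_rng(rng)
    ideal = q_w.astype(np.float64) @ x            # ideal digital result
    noise = rng.normal(0.0, sigma_nA / i_lsb_nA, size=ideal.shape)
    return (ideal + noise) * scale
```

Sweeping `sigma_nA` in such a model is how one can separate device-variation error from quantization error, which is the comparison behind the INT8 accuracy conclusion below.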
Simulation results show that the proposed 3D NAND compute-in-memory design reaches generation speeds of 20 tokens/s with an energy efficiency of 5.93 TOPS/W on GPT-2-124M, and 8.5 tokens/s with 7.17 TOPS/W on GPT-2-355M, while maintaining system-level reliability as long as the open-state current distribution has σ < 2.5 nA; in INT8 mode, quantization error, rather than device variation, is the dominant accuracy bottleneck.
Compared with previous CIM solutions, our architecture supports larger models, higher computational precision, and significantly lower power consumption, as comprehensive benchmarking shows. The SL slicing technique keeps array wastage below 3%, while hybrid wafer bonding integrates high-density ADC/TIA circuits to improve hardware resource utilization.
This work represents the first system-level simulation of LLM inference on product-grade 3D NAND CIM hardware, providing a standardized and scalable reference for future commercialization. The complete simulation framework is released on GitHub to facilitate further research and development. Future work will focus on device-level optimization of 3D NAND and iterative improvement of the simulator algorithms.
Keywords:
- 3D NAND
- Compute-in-Memory (CIM)
- Hardware Acceleration