The rapid advancement of large language models (LLMs) such as ChatGPT has imposed unprecedented demands on hardware in terms of computational power, memory capacity, and energy efficiency. Compute-in-Memory (CIM) technology, which integrates computation directly into memory arrays, has emerged as a promising solution to the data-movement bottleneck of traditional von Neumann architectures, significantly reducing power consumption and enabling large-scale parallel processing. Among the non-volatile memory candidates, 3D NAND flash stands out for its mature manufacturing process, ultrahigh density, and cost-effectiveness, making it a strong contender for commercial CIM deployment and local inference of large models.
Despite these advantages, most existing research on 3D NAND-based CIM remains at the academic level, focusing on theoretical designs or small-scale prototypes, with little attention to system-level architecture design and functional validation using product-grade 3D NAND chips for LLM applications. To address this gap, we propose a novel CIM architecture based on 3D NAND flash, leveraging a Source Line (SL) slicing technique to partition the array for parallel computation at minimal manufacturing cost. This architecture is complemented by an efficient mapping algorithm and pipelined dataflow, enabling system-level simulation and rapid industrial iteration.
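The SL slicing idea described above amounts to tiling a weight matrix so that each tile fits one source-line slice of the array. As a minimal sketch (the function name, `slice_cols`, and `block_rows` are illustrative assumptions, not parameters from the paper), a mapper can zero-pad the matrix to whole tiles and report the padding overhead, which corresponds to the array wastage the architecture tries to keep small:

```python
import numpy as np

def map_weights_to_slices(W, slice_cols, block_rows):
    """Partition a weight matrix into (row-block, SL-slice) tiles.

    Hypothetical parameters: slice_cols is the number of bit lines served
    by one SL slice; block_rows is the number of word lines per block.
    The matrix is zero-padded so every tile is uniform, mirroring how a
    mapper must round up to whole slices.
    """
    rows, cols = W.shape
    n_row_blocks = -(-rows // block_rows)   # ceiling division
    n_slices = -(-cols // slice_cols)
    padded = np.zeros((n_row_blocks * block_rows, n_slices * slice_cols),
                      dtype=W.dtype)
    padded[:rows, :cols] = W
    # Reshape so tiles are indexed as [row_block, slice, row, col].
    tiles = (padded
             .reshape(n_row_blocks, block_rows, n_slices, slice_cols)
             .transpose(0, 2, 1, 3))
    waste = 1.0 - (rows * cols) / padded.size  # fraction of padded cells
    return tiles, waste
```

With this framing, the mapping algorithm's job is to choose tile boundaries (and model-layer assignments) so that the reported `waste` stays below the ~3% figure the architecture targets.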
We develop a PyTorch-based behavioral simulator for LLM inference on the proposed hardware, evaluating the impact of current distribution and quantization on system performance. Our design supports INT4/INT8 quantization and employs dynamic weight storage logic to minimize voltage switching overhead, further optimized through hierarchical pipelining to maximize throughput under hardware constraints.
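Two of the effects the behavioral simulator must capture are weight quantization and the spread of cell read currents. A minimal sketch of both (the LSB-to-current scale `i_lsb_nA` and the additive-Gaussian noise model are assumptions for illustration, not the paper's calibrated device model) could look like:

```python
import numpy as np

def int8_quantize(w):
    """Symmetric per-tensor INT8 quantization (illustrative)."""
    scale = np.max(np.abs(w)) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def noisy_mvm(q_w, scale, x, sigma_nA=2.5, i_lsb_nA=50.0, rng=None):
    """Matrix-vector product with Gaussian read-current noise.

    Assumed model: each weight LSB corresponds to i_lsb_nA of cell
    current, and the open-state current spread sigma_nA (sigma < 2.5 nA
    in the reported results) appears as additive Gaussian noise on each
    analog partial sum before digitization.
    """
    rng = np.random.default_rng(rng)
    ideal = q_w.astype(np.float64) @ x            # ideal digital result
    noise = rng.normal(0.0, sigma_nA / i_lsb_nA, size=ideal.shape)
    return (ideal + noise) * scale
```

Sweeping `sigma_nA` in such a model is how one can separate device-variation error from quantization error, which is the comparison behind the INT8 accuracy conclusion below.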
Simulation results show that the proposed 3D NAND compute-in-memory design reaches generation speeds of 20 tokens/s with an energy efficiency of 5.93 TOPS/W on GPT-2-124M, and 8.5 tokens/s with 7.17 TOPS/W on GPT-2-355M, while maintaining system-level reliability as long as the open-state current distribution has σ < 2.5 nA; in INT8 mode, quantization error, rather than device variation, is the dominant accuracy bottleneck.
Compared with previous CIM solutions, our architecture supports larger models, higher computational precision, and significantly lower power consumption, as comprehensive benchmarking shows. The SL slicing technique keeps array wastage below 3%, while hybrid wafer bonding integrates high-density ADC/TIA circuits to improve hardware resource utilization.
This work represents the first system-level simulation of LLM inference on product-grade 3D NAND CIM hardware, providing a standardized and scalable reference for future commercialization. The complete simulation framework is released on GitHub to facilitate further research and development. Future work will focus on device-level optimization of 3D NAND and iterative improvement of the simulator algorithms.
Keywords:
- 3D NAND
- Compute-in-Memory (CIM)
- Hardware Acceleration