With the rapid development of artificial intelligence, large language models (LLMs) have become a core driving force behind a paradigm shift in materials science research. This review examines the role of LLMs in accelerating materials design across the entire research lifecycle, from knowledge mining to intelligent design. It emphasizes how LLMs can leverage their strengths in information retrieval, cross-modal data integration, and reasoning to address challenges in traditional materials research, such as data fragmentation, high experimental costs, and limited reasoning capabilities. Key methods include applying LLMs to knowledge discovery through techniques such as retrieval-augmented generation (RAG), multimodal information retrieval, and knowledge graph construction, which can efficiently extract and organize materials data from the vast body of scientific literature and experimental records. In addition, LLMs are integrated with automated experimental platforms to optimize workflows ranging from natural-language-driven experiment design to high-throughput iterative testing. The reviewed studies show that LLMs substantially improve the efficiency and accuracy of materials research. For instance, in knowledge mining, LLMs improve information retrieval accuracy by up to 29.4% in tasks such as predicting material synthesis conditions. In materials design, LLMs can accelerate computational modeling, structure and property prediction, and inverse design, reducing experimental trial-and-error cycles. Notably, LLMs perform well in cross-scale knowledge integration, linking material composition, processing parameters, and performance metrics to guide innovative synthesis pathways. However, challenges remain, including dependence on high-quality data, the “black-box” nature of LLMs, and limitations in handling complex material systems.
Future directions include improving data quality through multi-source integration, enhancing model explainability through visualization tools, deepening interdisciplinary collaboration, and bridging the gap between AI and domain-specific expertise. In summary, LLMs are reshaping materials science by enabling a data-driven, knowledge-intensive research paradigm. Their ability to integrate vast datasets, predict material properties, and automate experimental workflows makes them indispensable tools for accelerating materials discovery and innovation. As LLMs continue to develop, their synergy with physical constraints and experimental platforms is expected to open new frontiers in materials design.
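To make the retrieval step of RAG concrete, the following minimal sketch ranks passages by bag-of-words cosine similarity and prepends the best matches to the prompt sent to an LLM. It is a toy under stated assumptions: real RAG systems use learned dense embeddings and a vector index, and the function names and corpus sentences here are hypothetical illustrations, not taken from any reviewed system.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase alphanumeric tokens; a stand-in for a real tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Return the k passages most similar to the query (the 'retrieval' in RAG)."""
    q = Counter(tokenize(query))
    ranked = sorted(corpus, key=lambda doc: cosine(q, Counter(tokenize(doc))), reverse=True)
    return ranked[:k]


def build_prompt(query: str, passages: list[str]) -> str:
    """Prepend retrieved evidence so the LLM answers from the literature (the 'augmentation')."""
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return f"Answer using only the context below.\nContext:\n{context}\nQuestion: {query}"


# Hypothetical mini-corpus standing in for literature passages.
corpus = [
    "LiFePO4 was synthesized by solid-state reaction at 700 C for 10 h under argon.",
    "The band gap of TiO2 anatase is about 3.2 eV.",
    "LiCoO2 powders were calcined at 900 C in air.",
]
passages = retrieve("LiFePO4 synthesis temperature", corpus, k=1)
print(build_prompt("What is the synthesis temperature of LiFePO4?", passages))
```

The generation step (sending the prompt to an LLM) is deliberately left out, since it depends on whichever model API is used.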
Keywords:
- large language models
- material design
- intelligent tasks
- synthesis and property prediction
Figure 2. (a) The CLAIRify framework, an LLM-based NLP module. The LLM takes the input ① together with structured-language definitions and resource constraints and generates unverified structured language ②; the output is checked by a verifier, and errors are fed back to the LLM ③; outputs generated by the LLM pass through the verifier ④, and the correct output ⑤ is passed to the task and motion planning module ⑥, which generates robot trajectories ⑦[39]. (b) Workflow of the systems in autonomous mobile robot research and the function of each module[40]. (c) Chemspyd communicates with AutoSuite via the Executor, which reads and writes shared CSV files; this provides a standardized, human-readable communication method supported by both Python and AutoSuite[42].
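The generate-verify-feedback loop in Fig. 2(a) can be sketched as follows. This is a minimal illustration, not CLAIRify's implementation: the action vocabulary, inventory, plan format, and the `fake_llm` stub are all hypothetical, and a real system would call an LLM and validate a full structured language rather than a list of tuples.

```python
from typing import Callable

# Hypothetical action vocabulary and lab inventory used as verifier constraints.
ALLOWED_ACTIONS = {"Add", "Stir", "Heat", "Filter"}
AVAILABLE = {"water", "NaCl", "ethanol"}

Plan = list[tuple[str, str]]  # (action, target) pairs; a stand-in for structured language


def verify(plan: Plan) -> list[str]:
    """Return error messages for the plan; an empty list means it is valid."""
    errors = []
    for step, (action, target) in enumerate(plan):
        if action not in ALLOWED_ACTIONS:
            errors.append(f"step {step}: unknown action '{action}'")
        if action == "Add" and target not in AVAILABLE:
            errors.append(f"step {step}: reagent '{target}' is not available")
    return errors


def generate_with_verifier(
    generate: Callable[[str, list[str]], Plan], task: str, max_iters: int = 3
) -> Plan:
    """Loop generate -> verify -> feed errors back until the verifier accepts a plan."""
    feedback: list[str] = []
    for _ in range(max_iters):
        plan = generate(task, feedback)
        feedback = verify(plan)
        if not feedback:
            return plan  # only verified output is handed on to task and motion planning
    raise RuntimeError(f"no valid plan after {max_iters} attempts: {feedback}")


def fake_llm(task: str, feedback: list[str]) -> Plan:
    """Hypothetical stand-in for an LLM call: corrects its plan once it sees feedback."""
    if not feedback:
        return [("Add", "NaCl"), ("Add", "acetone"), ("Mix", "")]
    return [("Add", "NaCl"), ("Add", "water"), ("Stir", "")]


print(generate_with_verifier(fake_llm, "dissolve salt in a solvent"))
```

Bounding the loop with `max_iters` keeps a model that never produces a valid plan from stalling the robot pipeline indefinitely.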
Figure 3. (a) Integration of various reaction modules and online analytical tools in a flow chemistry system, forming a continuous synthetic pathway[54]. (b) The upper part illustrates the DiZyme workflow: from formulating scientific tasks to discovering new nanozyme materials. The lower part shows the expansion of the database with new descriptors obtained using the pubchempy and rdkit libraries, representing molecular features of organic coatings and material compositions[57]. (c) The top panel is a schematic diagram of the standard text mining process: the left part involves expert annotation to construct a baseline corpus; the middle part extracts key information from literature texts and builds an extended corpus; the right part stores the data in a database for future data mining. The bottom panel provides an example of converting synthesis sentences into action sequences. Key components of the action sequences, such as starting and target materials, synthesis steps, and their conditions, are identified and extracted from paragraphs using different text mining algorithms[58].
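The conversion of a synthesis sentence into an action sequence in Fig. 3(c) can be illustrated with a deliberately simple rule-based sketch. The verb-to-action mapping, action labels, and regular expressions below are hypothetical; the cited pipeline relies on text-mining algorithms trained on annotated corpora rather than on hand-written rules like these.

```python
import re

# Hypothetical mapping from synthesis verbs to action labels.
ACTION_VERBS = {
    "mixed": "Mix", "ground": "Grind", "dried": "Dry",
    "heated": "Heat", "calcined": "Heat", "sintered": "Heat", "annealed": "Heat",
}
TEMP = re.compile(r"(\d+(?:\.\d+)?)\s*°?C\b")        # e.g. "700 °C"
TIME = re.compile(r"(\d+(?:\.\d+)?)\s*(h|min)\b")    # e.g. "10 h", "30 min"


def to_actions(sentence: str) -> list[dict]:
    """Split a synthesis sentence into clauses and map each clause to an action with conditions."""
    actions = []
    for clause in re.split(r",\s*(?:and\s+)?|\s+and\s+|\s+then\s+", sentence):
        words = re.findall(r"[a-z]+", clause.lower())
        action = next((ACTION_VERBS[w] for w in words if w in ACTION_VERBS), None)
        if action is None:
            continue  # clause names materials only, or uses a verb outside the mapping
        step = {"action": action}
        if (m := TEMP.search(clause)):
            step["temperature_C"] = float(m.group(1))
        if (m := TIME.search(clause)):
            step["time"] = f"{m.group(1)} {m.group(2)}"
        actions.append(step)
    return actions


sentence = ("Li2CO3 and FePO4 were ground for 30 min, dried at 80 °C for 12 h, "
            "and then calcined at 700 °C for 10 h.")
for step in to_actions(sentence):
    print(step)
```

Rules like these break quickly on real literature (passive constructions, ranges, ramp rates), which is why the reviewed work turns to learned extractors and LLMs.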
Figure 4. (a) Generative AI tools can enhance hypothesis generation in materials science[59]; (b) schematic diagram of a customized workflow, from known alloy compositions to the discovery of new metallic glasses[60]; (c) a metallic glass model[61]; (d) the basic architecture of the MgBERT model[59].
Figure 5. (a) Workflow of BERT[69]; (b) the materials language processing (MLP) pipeline, which consists of five steps: data collection, preprocessing, text classification, information extraction, and data mining[70]; (c) illustration of the digitization process used to construct a database for ANN modeling[63].
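The five MLP steps in Fig. 5(b) can be chained as a small pipeline. The sketch below is a toy under explicit assumptions: the abstracts are invented, a keyword rule stands in for the trained classifier or LLM such pipelines normally use, and regular expressions stand in for the extraction model. All function names are hypothetical.

```python
import re
from statistics import mean
from typing import Optional


def collect() -> list[str]:
    """Step 1, data collection: hypothetical abstracts standing in for downloaded papers."""
    return [
        "The measured band gap of ZnO is 3.3 eV.",
        "We review funding policies for national laboratories.",
        "Thin   films of ZnO showed a band gap of 3.2 eV.",
        "GaN has a band gap of 3.4 eV at room temperature.",
    ]


def preprocess(texts: list[str]) -> list[str]:
    """Step 2, preprocessing: normalize whitespace (real pipelines also strip markup, etc.)."""
    return [" ".join(t.split()) for t in texts]


def is_relevant(text: str) -> bool:
    """Step 3, text classification: a keyword rule standing in for a trained model."""
    return "band gap" in text.lower()


FORMULA = re.compile(r"\b(?:[A-Z][a-z]?\d*){2,}\b")  # crude formula pattern, e.g. ZnO, GaN
VALUE = re.compile(r"(\d+(?:\.\d+)?)\s*eV\b")


def extract(text: str) -> Optional[tuple[str, float]]:
    """Step 4, information extraction: pull a (material, band gap in eV) pair."""
    formula, value = FORMULA.search(text), VALUE.search(text)
    if formula and value:
        return formula.group(0), float(value.group(1))
    return None


def mine(records: list[tuple[str, float]]) -> dict[str, float]:
    """Step 5, data mining: average the reported values per material."""
    by_material: dict[str, list[float]] = {}
    for material, gap in records:
        by_material.setdefault(material, []).append(gap)
    return {m: round(mean(v), 2) for m, v in by_material.items()}


def run_pipeline() -> dict[str, float]:
    texts = preprocess(collect())
    relevant = [t for t in texts if is_relevant(t)]
    records = [r for r in map(extract, relevant) if r is not None]
    return mine(records)


print(run_pipeline())
```

Keeping each step as a separate function mirrors the modular structure of the figure: any single stage (for example, the classifier) can be swapped for an LLM without touching the others.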
[1] Lewis P, Perez E, Piktus A, Petroni F, Karpukhin V, Goyal N, Kuttler H, Lewis M, Yin W, Rocktaschel T, Riedel S, Kiela D 2020 34th Conference on Neural Information Processing Systems (NeurIPS) Canada, December 6-12 2020, p16792
[2] Shi L, Liu Z M, Yang Y, Wu W Z, Zhang Y Y, Zhang H B, Lin J, Wu S Y, Chen Z H, Li R M, Wang N, Liu Z P, Tan H B, Gao H Y, Zhang Y, Wang G 2024 arXiv: 2408.04665v2 [cs.CL]
[3] Luu R K, Buehler M J 2024 Adv. Sci. 11 e2306724
[4] Li J T, Liu Y Q, Fan W Q, Wei X Y, Liu H, Tang J L, Li Q 2024 IEEE Trans. Knowl. Data Eng. 36 6071
[5] Park N H, Callahan T J, Hedrick J L, Erdmann T, Capponi S 2024 arXiv: 2408.11793v2 [cs.AI]
[6] Chiang Y, Hsieh E, Chou C H, Riebesell J 2024 arXiv: 2401.17244v3 [cs.CL]
[7] Tang Y H, Xu W B, Cao J, Ma J Z, Gao W L, Farrell S, Erichson B, Mahoney M W, Nonaka A, Yao Z 2025 arXiv: 2502.13107v2 [cs.AI]
[8] Li C T, Han X, Jiang R H, Yun P W, Hu P F, Ban X J 2023 Chin. J. Eng. Sci. 46 290
[9] Tanishq G, Mohd Z, Anoop Krishnan N M, Mausam 2022 npj Comput. Mater. 8 940
[10] Lai N S, Tew Y S, Zhong X L, Yin J, Li J L, Yan B H, Wang X N 2023 Ind. Eng. Chem. Res. 62 17835
[11] Thway M, Low A K Y, Khetan S, Dai H W, Recatala J, Chen A P, Hippalgaonkar K 2024 Digital Discovery 3 328
[12] Yu S L, Ran N, Liu J J 2024 Artif. Intell. Chem. 2 100076
[13] Durmaz A R, Thomas A, Mishra L, Murthy R N, Straub T 2024 Sci. Data 11 1112
[14] Lei G, Docherty R, Cooper S J 2024 Digital Discovery 3 1257
[15] Weston L, Tshitoyan V, Dagdelen J, Kononova O, Trewartha A, Persson K A, Ceder G 2019 J. Chem. Inf. Model. 59 3692
[16] Zeng Z N, Yin B C, Wang S P, Liu J R, Yang C, Yao H S, Sun X Z, Sun M S, Xie G T, Liu Z Y 2024 Bioinformatics 40 btae534
[17] Jia X, Aziz A, Hashimoto Y, Li H 2024 Sci. China Mater. 67 1173
[18] Polak M P, Morgan D 2024 Nat. Commun. 15 1569
[19] Foppiano L, Lambard G, Amagasa T, Ishii M 2024 Sci. Technol. Adv. Mater. Methods 4 2356506
[20] Dagdelen J, Dunn A, Lee S, Walker N, Rosen A S, Ceder G, Persson K A, Jain A 2024 Nat. Commun. 15 1418
[21] Zheng Z L, Zhang O F, Borgs C, Chayes J T, Yaghi O M 2023 J. Am. Chem. Soc. 145 18048
[22] Yang J M, Walker K C, Bekar A A, Hao B, Bhadelia N, Joseph D, Paschalidis I C 2024 Int. J. Med. Inform. 189 105500
[23] Shi Z B, Zhu L Y, Le X Q 2024 Data Anal. Knowl. Discovery 8 23
[24] Buehler M J 2023 J. Mech. Phys. Solids 181 105454
[25] Zia G A J, Valdestilhas A, Torres B J M, Kruschwitz S 2024 1st International Workshop on Semantic Materials Science: Harnessing the Power of Semantic Web Technologies in Materials Science, SeMatS 2024 Amsterdam, the Netherlands, September 17-19 2024, p101
[26] Corlatescu D G, Watanabe M, Ruseti S, Dascalu M, McNamara D S 2024 Comput. Hum. Behav. 154 108154
[27] Bai X F, He S, Li Y, Xie Y B, Zhang X, Du W L, Li J R 2025 npj Comput. Mater. 11 1
[28] He Q, Yang X Q, Xu Y 2018 Fiber Reinf. Plast./Compos. 4 62
[29] Chen H L 2017 M. S. Thesis (Shijiazhuang: Hebei University of Science and Technology)
[30] Ye Y P, Ren J, Wang S Z, Wan Y W, Wang H F, Razzak I, Hoex B, Xie T, Zhang W J 2024 arXiv: 2404.03080v3 [cs.CL]
[31] Yang F L, Egon C, Xue J, Ryuhei S, Kazuaki K, Yusuke H, Shin O, Li H 2023 Nano Mater. Sci. 6 256
[32] Songshan Lake Materials Laboratory, Chinese Academy of Sciences Institute of Physics https://news.qq.com/rain/a/20250211A03D5O00 [2025-2-11]
[33] Buehler M J 2024 ACS Eng. Au 4 241
[34] Cai J M, Yuan Y J, Sui X P, Lin Y Z, Zhuang K, Xu Y, Zhang Q, Ukrainczyk N, Xie T Y 2024 Constr. Build. Mater. 425 135965
[35] Ren H Y, Liu J P, Wang J, Gu X X, Chen X, Zhang Y, Zhao C X 2025 Comput. Eng. Appl. 61 1
[36] Bran A M, Cox S, Schilter O, Baldassari C, White A D, Schwaller P 2024 Nat. Mach. Intell. 6 525
[37] Ansari M, Watchorn J, Brown C E, Brown J S 2024 arXiv: 2410.03963v1 [physics.chem-ph]
[38] Zheng Z L, Rampal N, Inizan T J, Borgs C, Chayes J T, Yaghi O M 2025 Nat. Rev. Mater. 10 369
[39] Yoshikawa N, Skreta M, Darvish K, Arellano S, Zhi J, Kristensen L B, Li A Z, Zhao Y C, Xu H P, Kuramshin A, Aspuru-Guzik A, Shkurti F, Garg A 2023 Auton. Robot. 47 1057
[40] Zhu Q, Zhang F, Huang Y, Xiao H Y, Zhao L Y, Zhang X C, Song T, Tang X S, Li X, He G, Chong B C, Zhou J Y, Zhang Y H, Zhang B C, Cao J Q, Luo M, Wang S, Ye G L, Zhang W J, Chen X, Cong S, Zhou D L, Li H R, Li J L, Zou G, Shang W W, Jiang J, Luo Y 2022 Natl. Sci. Rev. 9 nwac190
[41] Dai T W, Vijayakrishnan S, Szczypiński F T, Ayme J F, Simaei E, Fellowes T, Clowes R, Kotopanov L, Shields C E, Zhou Z X, Ward J W, Cooper A I 2024 Nature 635 8040
[42] Seifrid M, Strieth-Kalthoff F, Haddadnia M, Wu T C, Alca E, Bodo L, Arellano S, Yoshikawa N, Skreta M, Keunen R, Aspuru-Guzik A 2024 Digital Discovery 3 1319
[43] Zheng Z L, Florit F, Jin B K, Wu H Y, Li S C, Nandiwale K Y, Salazar C A, Mustakis J G, Green W H, Jensen K F 2024 Angew. Chem. Int. Ed. 64 e202418074
[44] Ruan Y X, Lu C Y, Xu N, He Y C, Chen Y X, Zhang J, Xuan J, Pan J Z, Fang Q, Gao H Y, Shen X D, Ye N, Zhang Q, Mo Y M 2024 Nat. Commun. 15 10160
[45] Hatakeyama K, Ishikawa H, Takaishi S, Igarashi Y, Nabae Y, Hayakawa T 2024 Polym. J. 56 997
[46] Zhou J Y, Luo M, Chen L J, Zhu Q, Jiang S, Zhang F, Shang W W, Jiang J 2025 Digital Discovery 4 636
[47] Antunes L M, Butler K T, Grau-Crespo R 2024 Nat. Commun. 15 10570
[48] Szymanski N J, Rendy B, Fei Y X, Kumar R E, He T J, Milsted D, McDermott M J, Gallant M, Cubuk E D, Merchant A, Kim H, Jain A, Bartel C J, Persson K, Zeng Y, Ceder G 2023 Nature 624 86
[49] Sriram A, Miller B K, Chen R T Q, Wood B M 2024 arXiv: 2410.23405v1 [cs.LG]
[50] Jia S Y, Zhang C, Fung V 2024 arXiv: 2406.13163v1 [cond-mat.mtrl-sci]
[51] Liu H, Zheng H, Jia Z H, Zhou B H, Liu Y, Chen X L, Feng Y J, Li W, Yang W J, Li H 2023 Front. Chem. Sci. Eng. 17 2156
[52] Zhang C Y, Wang X Y, Wang Z Y 2024 Chin. J. Catal. 59 7
[53] Slautin B N, Liu Y T, Liu Y, Emery R, Hong S, Dubey A, Shvartsman V V, Lupascu D C, Sanchez S L, Ahmadi M, Kim Y, Strelcov E, Brown K A, Rack P D, Kalinin S V 2025 arXiv: 2501.02503v1 [cond-mat.mtrl-sci]
[54] Su Y M, Wang X, Ye Y X, Xie Y B, Xu Y J, Jiang Y B, Wang C 2024 Chem. Sci. 15 12200
[55] Chen Z Y, Xie F K, Wan M, Yuan Y, Liu M, Wang Z G, Meng S, Wang Y L 2023 Chin. Phys. B 32 118104
[56] Zhang D, Li H 2024 arXiv: 10.26434 [cs.DL]
[57] Razlivina J, Dmitrenko A, Vinogradov V 2024 J. Phys. Chem. Lett. 15 5804
[58] Chen X Q, Gao Y, Wang L D, Cui W J, Huang J M, Du Y, Wang B 2024 Sci. Data 11 347
[59] Liu S Y, Wen T Q, Pattamatta A S L S, Srolovitz D J 2024 Mater. Today 80 240
[60] Chen C, Maqsood A, Zhang Z, Wang X B, Duan L R, Wang H H, Chen T Y, Liu S Y, Li Q T, Luo J S, Jacobsson T J 2024 Cell Rep. Phys. Sci. 5 102058
[61] Wang J Y, Liu X J, Wu Y, Wang H, Ma D, Lu Z P 2023 Acta Mater. 261 119386
[62] Wang Z W, Han M, Jin B 2024 Environ. Chem. 43 69
[63] Ding R, Wang X B, Tan A, Li J, Liu J G 2023 ACS Catal. 13 13267
[64] Unni R, Zhou M Y, Wiecha P R, Zheng Y B 2024 Curr. Opin. Solid State Mater. Sci. 30 101157
[65] Han X Q, Wang X D, Xu M Y, Feng Z, Yao B W, Guo P J, Gao Z F, Lu Z Y 2025 Chin. Phys. Lett. 42 027403
[66] Choudhary K 2024 J. Phys. Chem. Lett. 15 6909
[67] Oliveira O N Jr, Christino L, Oliveira M C F, Paulovich F V 2023 J. Chem. Inf. Model. 63 7605
[68] Liu G, Sun M, Wojciech M, Jiang M, Chen J 2024 arXiv: 2410.04223v1 [cs.LG]
[69] Zhang C W, Zhai Y S, Gong Z Y, Duan H L, She Y B, Yang Y F, Su A 2024 J. Cheminformatics 16 89
[70] Choi J, Lee B 2024 Commun. Mater. 5 13
[71] Wang Y X, Li Y, Tang Z C, Li H, Yuan Z L, Tao H G, Zou N L, Bao T, Liang X H, Chen Z Z, Xu S H, Bian C, Xu Z M, Wang C, Si C, Duan W H, Xu Y 2024 Sci. Bull. 69 2514
[72] Liu Y, Yang Z W, Yu Z Y, Liu Z T, Liu D H, Lin H L, Li M Q, Ma S C, Avdeev M, Shi S Q 2023 J. Materiomics 9 798
[73] Buehler M J 2024 Adv. Funct. Mater. 34 9531
[74] Wu T, Shen J Z, Jia Z X, Wang Y X, Zheng Z L 2025 arXiv: 2502.18890v1 [cs.CL]
[75] Obuchi K, Funaya K, Toyama K 2024 NEC Tech. J. 17 46
[76] Li R C, Patel T, Wang Q Y, Du X Y 2024 arXiv: 2408.14033v2 [cs.AI]
[77] Luo Z M, Yang Z L, Xu Z X, Yang W, Du X Y 2025 arXiv: 2501.04306v1 [cs.CL]
[78] Tang Z C, Li H, Lin P Z, Gong X X, Jin G, He L X, Jiang H, Ren X G, Duan W H, Xu Y 2024 Nat. Commun. 15 8815