A high-quality dataset construction method for text mining in materials science

Liu Yue, Liu Da-Hui, Ge Xian-Yuan, Yang Zheng-Wei, Ma Shu-Chang, Zou Zhe-Yi, Shi Si-Qi

A large amount of data and knowledge that matters for materials research and development is generated and stored as text in the peer-reviewed scientific literature. Text mining can extract this information automatically, but the difficulty of acquiring high-quality textual data has prevented its general application in materials science. Herein, we systematically analyze the issues of textual data and the related research from the perspectives of data quality and data quantity. We then propose a pipeline for constructing high-quality datasets for text mining in materials science. In this pipeline, a traceable automatic literature-acquisition scheme ensures the traceability of the textual data. A data processing method driven by downstream tasks then generates high-quality pre-annotated corpora conditioned on the characteristics of materials texts. On this basis, we define a general annotation scheme derived from the materials science tetrahedron to complete high-quality annotation. Finally, a conditional data augmentation model incorporating materials domain knowledge (cDA-DK) is constructed to increase the data quantity. Experimental results on datasets covering various material systems demonstrate that our method effectively improves the accuracy of downstream models; the F1-score on the named entity recognition task for NASICON-type solid electrolyte materials reaches 84%. This study provides important insight into the general application of text mining in materials science and is expected to advance materials design and discovery driven bidirectionally by data and knowledge.

Corresponding author: Shi Si-Qi, sqshi@shu.edu.cn
Funds: Project supported by the National Key Research and Development Program of China (Grant No. 2021YFB3802101) and the National Natural Science Foundation of China (Grant Nos. 92270124, 52073169, 52102313).
[1] Gupta T, Zaki M, Krishnan N M A, Mausam 2022 npj Comput. Mater. 8 102
[2] Olivetti E A, Cole J M, Kim E, Kononova O, Ceder G, Han T Y J, Hiszpanski A M 2020 Appl. Phys. Rev. 7 041317
[3] Venugopal V, Sahoo S, Zaki M, Agarwal M, Gosvami N N, Krishnan N M A 2021 Patterns 2 100290
[4] Kononova O, He T, Huo H, Trewartha A, Olivetti E A, Ceder G 2021 iScience 24 102155
[5] Kim E, Huang K, Saunders A, McCallum A, Ceder G, Olivetti E 2017 Chem. Mater. 29 9436
[6] Mysore S, Jensen Z, Kim E, Huang K, Chang H S, Strubell E, Flanigan J, McCallum A, Olivetti E 2019 Proceedings of the 13th Linguistic Annotation Workshop Florence, Italy, August 1, 2019 p56
[7] Tshitoyan V, Dagdelen J, Weston L, Dunn A, Rong Z, Kononova O, Persson K A, Ceder G, Jain A 2019 Nature 571 95
[8] Vaucher A C, Zipoli F, Geluykens J, Nair V H, Schwaller P, Laino T 2020 Nat. Commun. 11 3601
[9] Nie Z, Zheng S, Liu Y, Chen Z, Li S, Lei K, Pan F 2022 Adv. Funct. Mater. 32 2201437
[10] Wang W R, Jiang X, Tian S H, Liu P, Dang D P, Su Y J, Lookman T, Xie J X 2022 npj Comput. Mater. 8 9
[11] Weston L, Tshitoyan V, Dagdelen J, Kononova O, Trewartha A, Persson K A, Ceder G, Jain A 2019 J. Chem. Inf. Model. 59 3692
[12] Friedrich A, Adel H, Tomazic F, Hingerl J, Benteau R, Maruscyk A, Lange L 2020 Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics Seattle, Washington, July 5–10, 2020 p1255
[13] He T, Sun W, Huo H, Kononova O, Rong Z, Tshitoyan V, Botari T, Ceder G 2020 Chem. Mater. 32 7861
[14] Beal M S, Hayden B E, Le Gall T, Lee C E, Lu X, Mirsaneh M, Mormiche C, Pasero D, Smith D C, Weld A, Yada C, Yokoishi S 2011 ACS Comb. Sci. 13 375
[15] Rajan A C, Mishra A, Satsangi S, Vaish R, Mizuseki H, Lee K R, Singh A K 2018 Chem. Mater. 30 4031
[16] Liu Y, Zou X X, Yang Z W, Shi S Q 2022 J. Chin. Ceram. Soc. 50 863 (in Chinese)
[17] Zhao K L, Jin X L, Wang Y Z 2021 J. Software 32 349 (in Chinese)
[18] Wei J, Zou K 2019 Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing Hong Kong, China, November 3–7, 2019 p6382
[19] Morris J X, Lifland E, Yoo J Y, Grigsby J, Jin D, Qi Y 2020 Proceedings of the 2020 EMNLP (Systems Demonstrations) Punta Cana, Dominican Republic, November 16–20, 2020 p119
[20] Malandrakis N, Shen M, Goyal A, Gao S, Sethi A, Metallinou A 2019 Proceedings of the 3rd Workshop on Neural Generation and Translation (WNGT 2019) Hong Kong, China, November 4, 2019 p90
[21] Wu X, Lü S W, Zang L J, Han J Z, Hu S L 2019 Computational Science–ICCS 2019 (Cham: Springer Nature Switzerland AG) p84
[22] Kumar V, Choudhary A, Cho E 2021 arXiv: 2003.02245 [cs.CL]
[23] Xu X, Lei Y, Li Z 2020 IEEE Trans. Ind. Electron. 67 2326
[24] Shinyama Y https://euske.github.io/pdfminer/ [2022-11-20]
[25] Jessop D M, Adams S E, Willighagen E L, Hawizy L, Murray-Rust P 2011 J. Cheminf. 3 41
[26] Hawizy L, Jessop D M, Adams N, Murray-Rust P 2011 J. Cheminf. 3 17
[27] Swain M C, Cole J M 2016 J. Chem. Inf. Model. 56 1894
[28] Sun C C 2009 J. Pharm. Sci. 98 1671
[29] Armstrong S, Church K, Isabelle P, Manzi S, Tzoukermann E, Yarowsky D 1999 Natural Language Processing Using Very Large Corpora (Berlin: Springer) pp157–176
[30] Liu Y, Ott M, Goyal N, Du J, Joshi M, Chen D, Levy O, Lewis M, Zettlemoyer L, Stoyanov V 2019 arXiv: 1907.11692 [cs.CL]
[31] Chen S, Wu C, Shen L, Zhu C, Huang Y, Xi K, Maier J, Yu Y 2017 Adv. Mater. 29 1700431
[32] Xiao R J, Li H, Chen L Q 2018 Acta Phys. Sin. 67 128801 (in Chinese)
[33] Liu Y, Ge X Y, Yang Z W, Sun S Y, Liu D H, Avdeev M, Shi S Q 2022 J. Power Sources 545 231946

Figure 1.  Pipeline for constructing high-quality datasets for materials text mining.

Figure 2.  Illustration of the traceability of literature data and processes.

Figure 3.  Workflow for annotating entities and relations.

Figure 4.  Augmentation of materials textual data based on cDA-DK.

Figure 5.  Comparison of sample statistics of the two datasets: (a) distribution of the number of triplets; (b) distribution of sentence length.

Figure 6.  Confusion matrices of the NER model on different datasets: (a) confusion matrix on Dataset 1; (b) confusion matrix on Dataset 2.

Figure 7.  Training and validation loss curves of MatBERT-BiLSTM-CRF on different datasets: (a) loss curve on Dataset 1; (b) loss curve on Dataset 2; (c) loss curve on Dataset 4; (d) loss curve on Dataset 5.

Figure 8.  Some of the descriptors that are critical for predicting activation energy; dotted lines indicate potential descriptors that have not yet been studied[33].

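Table 1.  Comparison of acquisition methods for materials science text corpora.

Acquisition method | Database | Document types | Access | Number of documents | Reference
Indexing database API | CAplus | Papers, patents, reports | Subscription | | www.cas.org/support/documentation/references
Indexing database API | DOAJ | Papers | Partial subscription | | doaj.org
Indexing database API | PubMed Central | Papers | Open access | Fewer | www.ncbi.nlm.nih.gov/pmc
Indexing database API | Science Direct | Papers | Subscription | | dev.elsevier.com/api_docs.html
Indexing database API | Scopus | Abstracts | Open access | Fewer |
Indexing database API | Springer Nature | Papers, books | Subscription | | dev.springernature.com/
Web crawler | Web pages | Papers, patents, reports, books | Open access | | requests.readthedocs.io, crummy.com/software/BeautifulSoup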

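Table 2.  Common natural language processing tools in chemistry and materials science.

Name | Scope | Open source | Version iteration | Functional completeness | Difficulty | User-friendliness
OSCAR4[25] | Chemical reactions and biochemistry | | | | Average |
ChemicalTagger[26] | Chemical synthesis actions and conditions | | | | Average |
ChemDataExtractor[27] | General chemistry and materials science | | | | Easy |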

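Table 3.  Comparison of entity label definitions in previous materials text mining research.

Source | Objective | Number of labels | Label categories | Applicable domain | Application examples
Weston et al.[11] | Link the latest research results in the materials field with the historical literature | 7 | Inorganic material, phase structure, descriptor, property, application, synthesis method, characterization method | Inorganic materials | Target material retrieval, literature search and summarization, meta-information analysis
He et al.[13] | Mine precursor information from the literature on inorganic solid-state synthesis reactions | 3 | Material, synthesis precursor, target compound | Inorganic solid-state synthesis reactions | Data mining of solid-state synthesis precursors, meta-information analysis
Friedrich et al.[12] | Annotate information related to SOFC experiments in scientific publications | 4 (SOFC), 17 (SOFC-slot) | Experiment, material, value, application, etc. | Battery materials | Construction of an SOFC scientific corpus used for multiple experimental information extraction tasks
Wang et al.[10] | Automatically mine from the literature the high-quality, reliable data required by data-driven materials design models | 6 | Element, alloy named entity, composition content, property descriptor, property value, other | Alloy materials | Prediction of the $\gamma'$ phase solvus temperature of cobalt-based single-crystal superalloys
Nie et al.[9] | Build a semantic representation framework to explore potential lithium-ion battery cathode materials | 3 | Inorganic material, lithium-ion battery cathode material, property descriptor | Battery materials | Design and optimization of new lithium-ion battery cathode materials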

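Table 4.  Definition of materials entity types for the general domain.

Entity label | Definition | Examples
Composition | Content related to chemical formulas; content describing amounts within the material, etc. | NaCl, CaCl2; Na concentration, Electrons charge carriers.
Structure | Crystal structure; phase; names used to characterize the crystal structure, etc. | Fcc, Phase; Bottleneck, Channel, Path.
Property | Measurable quantities with units; qualitative properties or phenomena exhibited by the material; nouns describing the physical/chemical behavior or physical/chemical mechanisms of the material, etc. | Conductivity, Activation, Radius; Ferroelectric, Metallic; Phase transition, Ionic reaction.
Processing | Material synthesis techniques or processing methods; means of material modification, etc. | Solid state reaction, Annealing; Doping.
Characterization | Any experiment, theory, model, or formula used to characterize the material, etc. | XRD, STM, Photoluminescence, DFT; Bethe-Salpeter equation.
Application | Any high-level application; any specific device, system, etc. | Cathode, Photovoltaics; Battery Management System.
Feature | Specific descriptions of sample type or shape. | Single crystal, Bulk, nanotube, Quantum dot.
Condition | The environment or external conditions of the material. | 980 $^{\circ}{\rm C}$, 1000 MPa.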

Table 5.  Definition of materials relation types for the general domain.

Relation label (A to B) | Definition | Entity-type pairs that may hold this relation
Cause-Effect | A affects B | Property-Property, Composition-Structure, Structure-Property, ...
Component-Whole | A is a part of B | Composition-Composition, ...
Feature-Of | A is a feature of B | Feature-Composition, Feature-Application, ...
Located-Of | A occupies the position B | Composition-Structure, ...
Instance-Of | A is an instance of B | Composition-Composition, Structure-Structure, Property-Property, ...
Condition-On | The condition of A is B | Processing-Condition, ...
Method-Of | The characterization method of A is B | Property-Characterization, ...
Other | A and B have a relation other than the types above |
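The annotation scheme in Tables 4 and 5 can be read as a small schema: eight entity types plus a set of relation labels, each with some explicitly listed entity-type pairs. The following is a minimal sketch of how such a schema might be encoded and used to sanity-check an annotated triplet. The names `ENTITY_TYPES`, `LISTED_PAIRS`, `Triplet`, and `check_triplet` are hypothetical and not from the paper, and because Table 5 ends most rows with "...", unlisted pairs are flagged rather than rejected.

```python
from typing import Dict, NamedTuple, Set, Tuple

# Entity labels from Table 4.
ENTITY_TYPES: Set[str] = {
    "Composition", "Structure", "Property", "Processing",
    "Characterization", "Application", "Feature", "Condition",
}

# Only the (A, B) pairs written out in Table 5. The table's "..." means
# further pairs are possible, so this mapping is deliberately incomplete.
LISTED_PAIRS: Dict[str, Set[Tuple[str, str]]] = {
    "Cause-Effect": {("Property", "Property"), ("Composition", "Structure"),
                     ("Structure", "Property")},
    "Component-Whole": {("Composition", "Composition")},
    "Feature-Of": {("Feature", "Composition"), ("Feature", "Application")},
    "Located-Of": {("Composition", "Structure")},
    "Instance-Of": {("Composition", "Composition"), ("Structure", "Structure"),
                    ("Property", "Property")},
    "Condition-On": {("Processing", "Condition")},
    "Method-Of": {("Property", "Characterization")},
    "Other": set(),
}


class Triplet(NamedTuple):
    """Entity types and relation label of one annotated (A, relation, B) triplet."""
    head_type: str
    relation: str
    tail_type: str


def check_triplet(triplet: Triplet) -> str:
    """Return 'invalid', 'listed', or 'unlisted' for a triplet (illustrative helper)."""
    if (triplet.head_type not in ENTITY_TYPES
            or triplet.tail_type not in ENTITY_TYPES
            or triplet.relation not in LISTED_PAIRS):
        return "invalid"
    if (triplet.head_type, triplet.tail_type) in LISTED_PAIRS[triplet.relation]:
        return "listed"
    # Permitted by the "..." in Table 5, but not one of the pairs shown there.
    return "unlisted"


status = check_triplet(Triplet("Processing", "Condition-On", "Condition"))
```

A check of this kind could be run over pre-annotated corpora to surface labels outside the scheme before manual review; treating unlisted pairs as a warning keeps the check compatible with the open-ended lists in Table 5.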

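Table 6.  Comparison of common tools for text annotation.

Annotation tool | Suited task | Text requirements | Role and permission management | Difficulty | User-friendliness | Extensibility | Reference
Label Studio | Multimodal information annotation | Strict | Incomplete | Average | | | labelstud.io
Brat | Relation annotation | Moderate | Complete | Average | | | github.com/nlplab/brat
Doccano | Text classification | Strict | Fairly complete | Average | | | github.com/doccano
EasyData | Entity and relation annotation | Moderate | Complete | Easy | | | ai.baidu.com/easydata/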
Algorithm 1  Data augmentation method cDA-DK
Input: original dataset $ D_{\rm train} = \{(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)\} $
   pre-trained language model $ P_{\rm DistilRoBERTa} $
   materials domain dictionary $ C = \{w_1, w_2, \dots, w_m\} $
Output: augmented dataset $ D_{\rm synthetic} $
1: begin
2: for $ w_i \in C $ do
3:   add $ w_i $ to the vocabulary of $ P_{\rm DistilRoBERTa} $ and train its word embedding
4: fine-tune $ P_{\rm DistilRoBERTa} $ on the downstream text data augmentation task to obtain $ F_{\rm DistilRoBERTa} $
5: initialize $ D_{\rm synthetic} = \{\} $
6: for $ (x_i, y_i) \in D_{\rm train} $ do
7:   $ (\widehat{x}_i, \widehat{y}_i) = F_{\rm DistilRoBERTa}(x_i, y_i) $ // generate a new sample
8:   $ D_{\rm synthetic} = D_{\rm synthetic} \cup \{(\widehat{x}_i, \widehat{y}_i)\} $ // add the generated sample to the augmented dataset
9: end
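The generation loop of Algorithm 1 (steps 5 to 8) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the fine-tuned model $ F_{\rm DistilRoBERTa} $ is replaced by a hypothetical stand-in, `toy_generator`, which swaps the first token of each entity for a dictionary word of the same entity type. It is not the paper's conditional language model, and steps 2 to 4 (vocabulary extension and fine-tuning) are not reproduced. The names `Sample`, `toy_generator`, and `augment` are invented for this example.

```python
import random
from typing import Callable, Dict, List, Tuple

Sample = Tuple[List[str], List[str]]  # (tokens x_i, BIO labels y_i)


def toy_generator(domain_words: Dict[str, List[str]],
                  seed: int = 0) -> Callable[[Sample], Sample]:
    """Hypothetical stand-in for the fine-tuned model F (NOT the real cDA-DK model).

    Replaces every token labelled 'B-<Type>' with a word drawn from
    domain_words['<Type>'] and leaves the label sequence unchanged.
    """
    rng = random.Random(seed)

    def generate(sample: Sample) -> Sample:
        tokens, labels = sample
        new_tokens = []
        for token, label in zip(tokens, labels):
            candidates = domain_words.get(label[2:], []) if label.startswith("B-") else []
            new_tokens.append(rng.choice(candidates) if candidates else token)
        return new_tokens, list(labels)

    return generate


def augment(d_train: List[Sample],
            generate: Callable[[Sample], Sample]) -> List[Sample]:
    """Steps 5-8 of Algorithm 1: one generated sample per original sample."""
    d_synthetic: List[Sample] = []          # step 5: initialize D_synthetic
    for sample in d_train:                  # step 6: iterate over D_train
        new_sample = generate(sample)       # step 7: generate a new sample
        d_synthetic.append(new_sample)      # step 8: add it to D_synthetic
    return d_synthetic


d_train = [(
    ["The", "ionic", "conductivity", "decreases", "with",
     "increasing", "activation", "energy", "."],
    ["O", "B-Property", "I-Property", "O", "O",
     "O", "B-Property", "I-Property", "O"],
)]
generate = toy_generator({"Property": ["electrode", "electric"]})
d_synthetic = augment(d_train, generate)
```

Passing the generator as a callable keeps the loop independent of how new samples are produced, so a real fine-tuned model could be substituted for the toy function without changing `augment`.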

Table 7.  Comparison of the NASICON entity-relation dataset with the CoNLL-2004 dataset.

Dataset | Samples | Entity types | Entities | Relation types | Relations
CoNLL-2004 | 1,441 | 4 | 5,347 | 5 | 2,020
NASICON | 2,434 | 8 | 4,857 | 8 | 2,297

Table 8.  Comparison of data samples from the NASICON entity-relation dataset before and after augmentation.

Dataset | Samples | Entities | Relations | Example
Original dataset | 2,434 | 4,857 | 2,297 | The (O) ionic (B-Property) conductivity (I-Property) decreases (O) with (O) increasing (O) activation (B-Property) energy (I-Property) . (O)
cDA-DK augmented dataset | 4,846 | 9,714 | 4,594 | The (O) electrode (B-Property) conductivity (I-Property) decreases (O) with (O) increasing (O) electric (B-Property) energy (I-Property) . (O)
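The examples in Table 8 use BIO tagging, where B-Property opens a Property entity, I-Property continues it, and O marks tokens outside any entity. As a minimal sketch of how such a tagged sentence maps back to entity mentions, the function below groups tokens into (text, type) spans. The name `bio_to_spans` is hypothetical, and dropping an I- tag that does not continue an open entity of the same type is a simplifying assumption of this example, not something stated in the source.

```python
from typing import List, Optional, Tuple


def bio_to_spans(tokens: List[str], labels: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-labelled tokens into (entity text, entity type) pairs."""
    spans: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_type: Optional[str] = None

    def close_span() -> None:
        # Emit the entity collected so far, if any.
        if current_tokens and current_type is not None:
            spans.append((" ".join(current_tokens), current_type))

    for token, label in zip(tokens, labels):
        if label.startswith("B-"):
            close_span()
            current_tokens, current_type = [token], label[2:]
        elif label.startswith("I-") and current_type == label[2:]:
            current_tokens.append(token)
        else:
            # 'O', or an I- tag that does not continue the open entity.
            close_span()
            current_tokens, current_type = [], None
    close_span()
    return spans


tokens = ["The", "ionic", "conductivity", "decreases", "with",
          "increasing", "activation", "energy", "."]
labels = ["O", "B-Property", "I-Property", "O", "O",
          "O", "B-Property", "I-Property", "O"]
spans = bio_to_spans(tokens, labels)
# spans == [("ionic conductivity", "Property"), ("activation energy", "Property")]
```

Applied to the augmented example in Table 8, the same function would return "electrode conductivity" and "electric energy" as Property entities, which shows that the augmentation changed the entity text while keeping the label sequence.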

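Table 9.  Details of the experimental datasets.

Dataset name | Application domain | Renamed as | Sample size | Corpus size | Source
NASICON NER dataset | NASICON-type solid electrolytes | Dataset 1 | 2,434 | 55 papers | Annotated by domain experts
NASICON NER dataset | NASICON-type solid electrolytes | Dataset 2 | 2,434 | | Data augmentation
NASICON NER dataset | NASICON-type solid electrolytes | Dataset 3 | 305 | 35 papers | Annotated by non-specialists
Matscholar[11] | Inorganic materials | Dataset 4 | 5,459 | 800 abstracts | Annotated by domain experts
Matscholar[11] | Inorganic materials | Dataset 5 | 5,459 | | Data augmentation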

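Table 10.  Results of the NER model on different materials datasets.

Dataset | Material category | Sample size | Precision | Recall | F1-score
Dataset 1 | NASICON-type solid electrolytes | 2,434 | 0.78 | 0.83 | 0.80
Dataset 2 | NASICON-type solid electrolytes | 2,434 | 0.68 | 0.72 | 0.70
Dataset 2+3 | NASICON-type solid electrolytes | 2,739 | 0.83 | 0.85 | 0.84
Dataset 4 | Inorganic materials | 5,459 | 0.86 | 0.90 | 0.88
Dataset 5 | Inorganic materials | 5,459 | 0.75 | 0.78 | 0.77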
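The F1-score column in Table 10 is the harmonic mean of precision and recall, $ F_1 = 2PR/(P+R) $. The short sketch below recomputes it from the tabulated precision and recall as a consistency check. It assumes the standard F1 definition, which the source does not spell out, and because the table reports values rounded to two decimals, a recomputed score may differ from the reported one in the last digit. The names `f1_score` and `table_10` are illustrative only.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) as given in Table 10.
table_10 = {
    "Dataset 1": (0.78, 0.83, 0.80),
    "Dataset 2": (0.68, 0.72, 0.70),
    "Dataset 2+3": (0.83, 0.85, 0.84),
    "Dataset 4": (0.86, 0.90, 0.88),
    "Dataset 5": (0.75, 0.78, 0.77),
}

for name, (precision, recall, reported) in table_10.items():
    recomputed = f1_score(precision, recall)
    print(f"{name}: recomputed F1 = {recomputed:.3f}, reported F1 = {reported:.2f}")
```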
Supplementary material: 补充材料-7-20222316-070701.pdf
Publishing process
  • Received Date:  05 December 2022
  • Accepted Date:  07 February 2023
  • Available Online:  09 February 2023
  • Published Online:  05 April 2023
