Computer and Modernization (计算机与现代化), 2025, Vol. 0, Issue (02): 52-57. doi: 10.3969/j.issn.1006-2475.2025.02.007

• Artificial Intelligence •

A Triple Joint Extraction Model for Talent Resume Information

  (1. College of Computer Science and Technology, Xinjiang Normal University, Urumqi 830054, China;
    2. Xinjiang Electronic Research Institute, Urumqi 830013, China)
  • Online: 2025-02-28 Published: 2025-02-28
  • Funding: Key Research and Development Program of the Xinjiang Autonomous Region (2022B01007-2)

Abstract: Professional title evaluation of talent draws on a large amount of resume information. This information usually exists as free-form natural language, which makes it hard to extract the key facts that serve as the basis for title evaluation. To address this problem, this paper models entity extraction and relation extraction jointly and builds a triple joint extraction model for talent resume information (RLAC). First, the Chinese pre-trained language model RoBERTa-wwm encodes the resume text. Second, an LSTM network and an attention mechanism are added to strengthen the extraction of contextual semantic features from this encoding, which eases the difficulty of recognizing head entities in resume text. Third, the encoded representation is fed into a head entity tagger to obtain the head entities. Finally, each head entity is concatenated with the encoded resume text and fed into a tagger that labels tail entities together with their relations. This alleviates the relation overlap problem and yields the triples. Experiments on a talent resume dataset show that, compared with the baseline models, the proposed model improves precision, recall, and F1 score, indicating good triple extraction ability.

Key words: entity recognition, triple extraction, joint extraction, talent resume information, relation overlap
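The head-then-tail design described in the abstract resembles cascade binary tagging, as popularized by CasRel. In that scheme, one start/end tagger finds head entities. Then, for each head, a separate start/end tagger per relation finds the tail entities, so one head, or even one entity pair, can appear in several triples. The abstract does not give RLAC's exact decoding rules. The following is therefore only a minimal pure-Python sketch of that decoding step and of triple-level precision/recall/F1, built on several assumptions:

- The taggers already emit per-token start/end probabilities.
- A 0.5 threshold is applied.
- Each start is paired with the nearest end at or after it.

The function names, the relation labels `graduated_from` and `employed_by`, and `toy_tail_tagger` are all hypothetical. `toy_tail_tagger` stands in for the network that would condition on the head entity concatenated with the encoded resume. English word tokens keep the printout simple; Chinese resume text would typically be tokenized per character and joined with an empty separator.

```python
from typing import Callable, Dict, List, Set, Tuple

Span = Tuple[int, int]                       # inclusive (start, end) token indices
Triple = Tuple[str, str, str]                # (head text, relation, tail text)
TaggerOut = Tuple[List[float], List[float]]  # per-token (start probs, end probs)


def decode_spans(start_probs: List[float], end_probs: List[float],
                 threshold: float = 0.5) -> List[Span]:
    """Pair every start above the threshold with the nearest end at or after it."""
    starts = [i for i, p in enumerate(start_probs) if p > threshold]
    ends = [i for i, p in enumerate(end_probs) if p > threshold]
    spans = []
    for s in starts:
        later_ends = [e for e in ends if e >= s]
        if later_ends:
            spans.append((s, later_ends[0]))
    return spans


def extract_triples(tokens: List[str], head_probs: TaggerOut,
                    tail_tagger: Callable[[Span], Dict[str, TaggerOut]],
                    threshold: float = 0.5, sep: str = " ") -> Set[Triple]:
    """Stage 1 tags head entities; stage 2 runs one start/end tagger per
    relation for each head, so one head can take part in several triples."""
    triples: Set[Triple] = set()
    for hs, he in decode_spans(head_probs[0], head_probs[1], threshold):
        head_text = sep.join(tokens[hs:he + 1])
        for relation, (tail_start, tail_end) in tail_tagger((hs, he)).items():
            for ts, te in decode_spans(tail_start, tail_end, threshold):
                triples.add((head_text, relation, sep.join(tokens[ts:te + 1])))
    return triples


def prf1(predicted: Set[Triple], gold: Set[Triple]) -> Tuple[float, float, float]:
    """Triple-level scores: a triple counts only if head, relation and tail all match."""
    correct = len(predicted & gold)
    p = correct / len(predicted) if predicted else 0.0
    r = correct / len(gold) if gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1


# ---- Toy usage (hypothetical data) ----
tokens = "Zhang San graduated from Xinjiang Normal University and stayed on to teach".split()
n = len(tokens)


def probs(hot: List[int]) -> List[float]:
    """Fake tagger output: 0.9 at the given positions, 0.1 elsewhere."""
    return [0.9 if i in hot else 0.1 for i in range(n)]


def toy_tail_tagger(head: Span) -> Dict[str, TaggerOut]:
    # Stand-in for the network: a real tagger would score tails from the
    # head representation concatenated with the encoded resume text.
    if head == (0, 1):                            # head entity "Zhang San"
        university = (probs([4]), probs([6]))     # tail span "Xinjiang Normal University"
        return {"graduated_from": university, "employed_by": university}
    return {}


pred = extract_triples(tokens, (probs([0]), probs([1])), toy_tail_tagger)
print(sorted(pred))
# [('Zhang San', 'employed_by', 'Xinjiang Normal University'), ('Zhang San', 'graduated_from', 'Xinjiang Normal University')]
print(prf1(pred, {("Zhang San", "graduated_from", "Xinjiang Normal University")}))
# (0.5, 1.0, 0.6666666666666666)
```

In the toy sentence, both triples share the same head and tail and differ only in relation. A plain one-tag-per-token sequence labeling scheme cannot express this kind of overlap. Per-relation tail taggers represent it directly.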
