Knowledge Portrait Model for Research Project Documents

Computer and Modernization, 2020, Vol. 0, Issue (02): 60-. doi: 10.3969/j.issn.1006-2475.2020.02.013

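(North China Institute of Computing Technology, Beijing 100083, China)

Received: 2019-08-22 Online: 2020-03-03 Published: 2020-03-03
About the authors: WU Di (1995-), male, from Hohhot, Inner Mongolia, master's student, research interests: big data analysis, natural language processing, E-mail: wd369123@126.com; AI Zhong-liang (1971-), male, research fellow, research interests: big data analysis technology, artificial intelligence technology, E-mail: aizl067@sina.com; LIU Zhong-lin (1984-), male, engineer, research interests: big data analysis, E-mail: 2933462260@qq.com; LI Chang-bao (1980-), male, senior engineer, Ph.D., research interests: big data analysis, natural language processing, E-mail: lichangbao_1@163.com.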


Abstract

Abstract: To improve the accurate, intelligent identification and retrieval of knowledge points in research project documents produced by scientific research activities, this paper analyzes the writing structure of such documents and proposes a method for building a document knowledge portrait. A multi-level knowledge portrait that closely fits the structure of research project documents is designed. It automatically identifies the knowledge points of a document and extracts them at multiple granularities by semantic paragraph. Knowledge expression accuracy is used to evaluate the precision of the model's knowledge extraction. The experimental results show that the model describes document knowledge more accurately than traditional methods and can be used in practical work.

Key words: research project document, knowledge portrait, knowledge extraction, document description model, structural elements

CLC number: TP391.12

WU Di, AI Zhong-liang, LIU Zhong-lin, LI Chang-bao. Knowledge Portrait Model for Research Project Documents[J]. Computer and Modernization, 2020, 0(02): 60-.

