Computer and Modernization ›› 2020, Vol. 0 ›› Issue (12): 72-77.

• Database and Data Mining •

Entity Recognition Method for Electric Power Engineering Bid Documents Based on Conditional Random Fields

  

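  (1. School of Computer and Information Engineering, Jiangxi Normal University, Nanchang, Jiangxi 330022; 2. Jiangxi Electric Power Company, Nanchang, Jiangxi 330077;
    3. Jiangxi Academy of Sciences, Nanchang, Jiangxi 330096; 4. State Grid Gansu Electric Power Company, Information & Telecommunication Company, Lanzhou, Gansu 730000)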
  • Published: 2021-01-07 Online: 2021-01-07
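  • About the authors:
    SHAO Shiyun (born 1996), female, from Ningbo, Zhejiang, master's degree, research interests: data science and natural language processing, E-mail: shiyunssy@163.com;
    ZHOU Yu (born 1966), male, senior engineer, master's degree, research interest: electric power informatization, E-mail: zhouyu1966@163.com;
    YANG Lei (born 1993), female, assistant engineer, master's degree, research interests: energy economics, energy and environmental systems engineering, E-mail: 594410749@qq.com;
    Corresponding author: ZHONG Maosheng (born 1974), male, professor, Ph.D., research interests: artificial intelligence and natural language processing, E-mail: zhongmaosheng@sina.com;
    DAI Rui (born 1988), female, engineer, master's degree, research interest: electric power informatization, E-mail: smiledairy@126.com.cn;
    ZHAO Jiale (born 1996), male, master's student, research interests: secure data retrieval and workflow optimization, E-mail: zhaojiale0415@163.com.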
  • Funding:
    Key Research and Development Program of Jiangxi Province (20181A50029)

An Efficient Entity Identification Method for Electric Bidding Documents Based on Conditional Random Field

  (1. School of Computer and Information Engineering, Jiangxi Normal University, Nanchang 330022, China;
    2. Jiangxi Electric Power Company, Nanchang 330077, China; 3. Jiangxi Academy of Sciences, Nanchang 330096, China;
    4. State Grid Gansu Information & Telecommunication Company, Lanzhou 730000, China)
  • Online: 2021-01-07 Published: 2021-01-07

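Summary: In recent years, with the rapid growth of the national economy, investment in power construction projects has risen quickly, and the number of bid documents for these projects, along with the workload of reviewing them, has increased sharply. Traditional, purely manual bid review is time-consuming, labor-intensive, and slow. Automating bid review by machine requires automatic extraction and summarization of the key content of bid documents, and entity recognition in the bid text is a key step. Because bid documents contain many uncommon word combinations, existing techniques do not recognize entities such as place names in them well. To address these problems, this paper proposes and designs an entity recognition scheme for power engineering bid documents based on conditional random fields, which enables automated, rapid machine processing of bid documents and supports electronic evaluation of key projects and data sharing. The effectiveness of the method has been confirmed experimentally, and it has been applied to automated document processing in the power sector.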

Keywords: electric power bid documents, conditional random field model, entity recognition, automatic processing, supervised learning

Abstract: In recent years, with the rapid development of the national economy, investment in power construction projects has increased rapidly, and both the number of associated tenders and the workload of evaluating them have soared. Conventional manual assessment is time-consuming, costly, and inefficient. Automatic or semi-automatic analysis can improve the efficiency of bid review and reduce the related costs. In machine-assisted review, entity identification in the tender text plays an essential role in information extraction and text summarization. Because tender texts contain many complex, hybrid word combinations, for example in location names, existing recognition techniques do not perform well. In this paper, we propose an application-friendly critical information extraction method based on conditional random fields (CRF), which enables automatic and rapid processing of tenders and accelerates the re-assessment of various engineering construction projects and data sharing. The efficiency of the proposed method has been verified experimentally, and it has been applied to automated document processing in the power sector.

Key words: electric bidding documents, conditional random field model, entity identification, automatic processing, supervised learning
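
To make the idea of CRF-based entity identification concrete, the following is a minimal sketch of how a linear-chain CRF assigns entity labels to a token sequence by Viterbi decoding. It is not the authors' implementation: the BIO label set, the `emission` feature scores, the `TRANSITION` and `START` weights, and the example tokens are all hypothetical and hand-set for illustration, whereas a trained CRF would learn such weights from annotated bid documents.

```python
from typing import Dict, List, Tuple

# Hypothetical BIO label set for location entities (an assumption, not from the paper).
LABELS: List[str] = ["O", "B-LOC", "I-LOC"]

# Hand-set start and transition scores; a real CRF learns these from labeled data.
START: Dict[str, float] = {"I-LOC": -10.0}  # an entity cannot start with I-LOC
TRANSITION: Dict[Tuple[str, str], float] = {
    ("O", "I-LOC"): -10.0,     # I-LOC must follow B-LOC or I-LOC
    ("B-LOC", "I-LOC"): 1.0,
    ("I-LOC", "I-LOC"): 0.5,
}

# Toy gazetteer standing in for learned lexical features.
PLACE_WORDS = {"Jiangxi", "Nanchang"}


def emission(token: str, label: str) -> float:
    """Toy per-token score: place words favor LOC labels, other words favor O."""
    if token in PLACE_WORDS:
        return 2.0 if label != "O" else 0.0
    return 1.0 if label == "O" else 0.0


def viterbi(tokens: List[str]) -> List[str]:
    """Return the highest-scoring label sequence under the toy scores above."""
    if not tokens:
        return []
    # best[i][y]: score of the best path that ends at position i with label y.
    best = [{y: START.get(y, 0.0) + emission(tokens[0], y) for y in LABELS}]
    back: List[Dict[str, str]] = []
    for tok in tokens[1:]:
        scores: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for y in LABELS:
            prev = max(LABELS, key=lambda p: best[-1][p] + TRANSITION.get((p, y), 0.0))
            scores[y] = best[-1][prev] + TRANSITION.get((prev, y), 0.0) + emission(tok, y)
            pointers[y] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final label.
    path = [max(LABELS, key=lambda y: best[-1][y])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


if __name__ == "__main__":
    print(viterbi(["project", "in", "Jiangxi", "Nanchang"]))
```

The transition scores are what distinguish a CRF from labeling each token independently: they let the model prefer label sequences that form well-shaped entities, which matters when a place name spans several tokens.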