Computer and Modernization ›› 2022, Vol. 0 ›› Issue (05): 75-81.


Hierarchical Representation of Power Text Named Entity Recognition and Project-expert Matching

  

  (1. Electric Power Research Institute of Yunnan Power Grid Co., Ltd., Kunming 650217, China;
    2. School of Mathematics and Statistics, Xi’an Jiaotong University, Xi’an 710049, China)
  • Online: 2022-06-08 Published: 2022-06-08

Abstract: To address the project-expert matching problem in the evaluation of science and technology project applications in the power field, this paper proposes a novel hierarchical word representation model (Attention-RoBerta-BiLSTM-CRF, ARBC) for named entity recognition in power texts, together with a project-expert matching algorithm based on a semantic and pictorial double feature space mapping strategy. The ARBC model consists of a hierarchical word embedding module, a Bi-directional Long Short-Term Memory (BiLSTM) module and a Conditional Random Field (CRF) module. The hierarchical word embedding module uses word-level, sentence-level and document-level information from the power text. Specifically, the word embedding vector is first extracted from the pre-trained RoBERTa model. Then, the contextual representation of each sentence is enhanced by an attention mechanism based on term frequency-inverse document frequency (TF-IDF) values computed at the document level. Finally, the word embedding and the sentence embedding are linearly weighted and fused to form the hierarchical representation vector of a given word. Once the named entities in the power texts have been recognized by the ARBC model, effective and accurate entity matching between power projects and experts is achieved through the semantic and pictorial double feature space mapping strategy. In experiments on a set of 2000 power project abstract texts, the ARBC model achieves an F1 value of 83% on the named entity recognition task, which is significantly higher than that of widely used pre-trained models such as BERT and RoBERTa. In addition, the entity matching strategy based on double feature space mapping achieves 85% accuracy on the power text-expert matching task.
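
The hierarchical embedding step described in the abstract (frequency-based attention over the words of a sentence, followed by a linear fusion of word and sentence embeddings) can be illustrated with a minimal sketch. This is not the authors' implementation: the smoothed IDF variant, the softmax normalization of the attention scores, the fusion weight `alpha`, and the function names are all assumptions made for illustration, and the toy two-dimensional vectors stand in for the pre-trained word embeddings.

```python
import math
from collections import Counter


def tfidf_weights(sentence, documents):
    """TF-IDF score for each token of `sentence`.

    `documents` is a list of token lists. The smoothed IDF used here is an
    assumed variant; the abstract does not specify one.
    """
    tf = Counter(sentence)
    n_docs = len(documents)
    scores = []
    for tok in sentence:
        df = sum(1 for doc in documents if tok in doc)
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0
        scores.append(tf[tok] / len(sentence) * idf)
    return scores


def softmax(xs):
    """Normalize scores into attention weights that sum to 1 (assumed choice)."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def hierarchical_vectors(word_vecs, scores, alpha=0.7):
    """Fuse each word vector with an attention-pooled sentence vector.

    `alpha` is a hypothetical fusion weight: alpha * word + (1 - alpha) * sentence.
    """
    attn = softmax(scores)
    dim = len(word_vecs[0])
    # Sentence embedding: attention-weighted sum of the word vectors.
    sent = [sum(a * v[d] for a, v in zip(attn, word_vecs)) for d in range(dim)]
    return [[alpha * v[d] + (1 - alpha) * sent[d] for d in range(dim)]
            for v in word_vecs]


# Toy corpus and toy vectors standing in for pre-trained embeddings.
docs = [["transformer", "fault", "diagnosis"],
        ["grid", "load", "forecast"],
        ["transformer", "oil", "test"]]
vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
scores = tfidf_weights(docs[0], docs)
fused = hierarchical_vectors(vecs, scores, alpha=0.7)
```

In this sketch, words that are rarer across the corpus receive larger attention weights, so they contribute more to the sentence vector that is then mixed into every word representation. In the full model the fused vectors would be passed on to the BiLSTM and CRF modules.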

Key words: hierarchical representation, named entity recognition, expert matching, power text