Computer and Modernization


Research on Chinese Word Segmentation Technology for the Military Field

  

  1. (Simulation Training Center, Army Engineering University, Shijiazhuang 050000, China)
  • Received: 2018-04-03 Online: 2018-11-22 Published: 2018-11-23

Abstract: The performance of a word segmentation model drops significantly when it is applied across domains. Because annotating a corpus of the army's legacy system development documents is complex, this paper proposes a domain adaptation method for Chinese word segmentation that combines n-gram features with a domain dictionary. The method extracts n-gram features from the target corpus and uses them to adapt the word segmentation model to the target domain at the feature level. The domain dictionary is then used to correct the segmentation results by reverse maximum matching. Experimental results show that, on a corpus of documents related to the army's legacy systems, the word segmentation model trained with this method improves the F-measure by 12.4%.
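
As an illustration of the dictionary step named in the abstract, the following is a minimal sketch of plain reverse maximum matching against a word list. It is not the paper's implementation: the function name `reverse_maximum_match`, the `max_len` parameter, and the toy dictionary are assumptions made for this example, and the abstract does not specify how the matching is combined with the model's output.

```python
def reverse_maximum_match(text, dictionary, max_len=None):
    """Segment `text` by scanning right to left and greedily taking the
    longest dictionary word that ends at the current position.

    `dictionary` is any container supporting `in` (a set is fastest).
    Characters not covered by any dictionary word become single tokens.
    """
    if max_len is None:
        # Longest dictionary entry bounds the candidate window (1 if empty).
        max_len = max((len(word) for word in dictionary), default=1)

    tokens = []
    end = len(text)
    while end > 0:
        # Try the longest candidate first, shrinking one character at a time.
        for size in range(min(max_len, end), 0, -1):
            candidate = text[end - size:end]
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                end -= size
                break

    tokens.reverse()  # tokens were collected from the end of the text
    return tokens


# Toy dictionary (hypothetical; a real domain dictionary would hold
# military and system-development terms).
domain_dictionary = {"研究", "研究生", "生命", "起源"}
print(reverse_maximum_match("研究生命起源", domain_dictionary))
# → ['研究', '生命', '起源']
```

Scanning from the right is what separates this from forward maximum matching: on the same input a forward scan would first take 研究生 and then be left with the stray character 命.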

Key words: n-gram features; domain dictionary
