Computer and Modernization (计算机与现代化), 2022, Vol. 0, Issue (09): 13-18.

• Database and Data Mining •

Chinese Event Detection Based on Multi-lexicon Feature Augmentation
(Original title: 基于多词汇特征增强的中文事件检测方法)

  1. (School of Computer, South China Normal University, Guangzhou 510631, China)
  • Online: 2022-09-22 Published: 2022-09-22
  • About the authors: MIAO Zijing (缪梓敬, born 1997), male, from Shanwei, Guangdong; master's student; research interests: event extraction and event logic graph construction; E-mail: 1040058330@qq.com. MEI Xin (梅欣, born 1996), male, from Fuzhou, Jiangxi; master's student; research interest: multimodal learning; E-mail: 571459387@qq.com.
  • Funding: Key-Area Research and Development Program of Guangdong Province (2019B111101001)

Abstract: Event detection aims to automatically identify event triggers in unstructured text and to classify each trigger into its correct event type. Unlike English, Chinese has no natural delimiters between words, so word boundaries must be determined by segmentation before lexical information can be used. Chinese event detection also suffers from a word-trigger mismatch problem, in which segmented words do not always align with trigger boundaries. To address these characteristics of the Chinese language and of the event detection task, this paper proposes a Chinese event detection model based on multi-lexicon feature augmentation. An external dictionary supplies a character-level model with a word set containing multiple candidate words, so that lexical information from multiple possible segmentations can be exploited. Static word-frequency statistics from text and an automatic word segmentation tool then jointly determine the weight of each word in the set, yielding more accurate lexical semantics. In experiments on the ACE2005 Chinese dataset, the proposed method achieves the best performance among the compared models, which supports its effectiveness for Chinese event detection.

Key words: Chinese event detection, feature augmentation, multi-lexicon feature, lexicon weight determination
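
The two ideas in the abstract, collecting every dictionary word that covers a character and weighting those words using both static frequency and a segmenter's output, can be illustrated with a small sketch. It is not the authors' model: the function names, the toy lexicon and frequency table, the `seg_bonus` parameter, and the additive scoring rule are all assumptions made for illustration, since the abstract does not give the actual weighting formula.

```python
from collections import defaultdict


def match_words(sentence, lexicon, max_len=4):
    """For each character index, list the lexicon words (with spans) covering it."""
    covering = defaultdict(list)
    n = len(sentence)
    for start in range(n):
        for end in range(start + 1, min(n, start + max_len) + 1):
            word = sentence[start:end]
            if word in lexicon:
                for i in range(start, end):
                    covering[i].append((start, end, word))
    return covering


def segment_spans(segmented_words):
    """Turn a segmenter's word list into a set of (start, end) character spans."""
    spans, pos = set(), 0
    for word in segmented_words:
        spans.add((pos, pos + len(word)))
        pos += len(word)
    return spans


def word_weights(sentence, lexicon, freq, segmented_words, seg_bonus=1.0, max_len=4):
    """Per character, return normalized weights over the covering lexicon words.

    Assumed scoring rule (hypothetical, not from the paper):
    score = static frequency + seg_bonus if the segmenter produced that exact span.
    """
    spans = segment_spans(segmented_words)
    covering = match_words(sentence, lexicon, max_len)
    result = []
    for i in range(len(sentence)):
        scores = {}
        for start, end, word in covering.get(i, []):
            bonus = seg_bonus if (start, end) in spans else 0.0
            scores[word] = scores.get(word, 0.0) + freq.get(word, 0) + bonus
        total = sum(scores.values())
        if total > 0:
            weights = {w: s / total for w, s in scores.items()}
        else:
            # No evidence at all: fall back to uniform weights (empty if no match).
            weights = {w: 1.0 / len(scores) for w in scores}
        result.append(weights)
    return result


# Toy data (made up for illustration).
sentence = "警察击毙了歹徒"
lexicon = {"警察", "击毙", "击", "毙", "歹徒"}
freq = {"警察": 5, "击毙": 3, "击": 1, "毙": 1, "歹徒": 2}
segmented = ["警察", "击毙", "了", "歹徒"]  # stand-in for a segmenter's output

weights = word_weights(sentence, lexicon, freq, segmented)
print(weights[2])  # weights of the words covering the character "击"
```

With this toy data the character "击" is covered by both "击" and "击毙", and the call prints `{'击': 0.2, '击毙': 0.8}`. The single-character candidate stays available to the model even though the segmenter chose "击毙", which is the kind of case the word-trigger mismatch problem describes.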