Computer and Modernization

• Databases and Data Mining •

An Automatic Text Summarization Model Construction Method Based on BERT Embedding

  1. (North China Institute of Computing Technology, Beijing 100083, China)
  • Received: 2019-07-15  Online: 2020-02-13  Published: 2020-02-13
  • About the authors: YUE Yifeng (1994-), male, born in Xinxiang, Henan; master's degree candidate; research interest: natural language processing; E-mail: 228941230@qq.com. HUANG Wei (1972-), female, research fellow with a master's degree; research interests: big data processing, integration, and mining analysis. REN Xianghui (1979-), male, research fellow with a master's degree; research interests: system architecture and big data analysis.
  • Funding:
    National Key Research and Development Program of China (2016YFB0801400)

Abstract: Traditional static word vectors cannot effectively represent polysemous words in text summarization, which reduces the accuracy and readability of the generated summaries. To address this problem, this paper proposes an automatic text summarization model construction method based on BERT (Bidirectional Encoder Representations from Transformers) embedding. The method introduces the BERT pre-trained language model to enhance the semantic representation of word vectors; the resulting word vectors are fed into a Seq2Seq model for training, yielding an automatic text summarization model that generates summaries quickly. Experimental results show that the model effectively improves the accuracy and readability of generated summaries on the Gigaword dataset and can be used for automatic text summarization tasks.
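The abstract describes a two-stage pipeline: BERT supplies context-sensitive word vectors (so a polysemous word is represented differently in different sentences), and a Seq2Seq model with attention is trained on those vectors to generate summaries. Below is a minimal PyTorch sketch of that pipeline, assuming Hugging Face `transformers`; the checkpoint name `bert-base-uncased`, the frozen-BERT setup, the single-layer GRU decoder, and the dot-product attention variant are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizer

class BertSeq2Seq(nn.Module):
    """BERT encoder (contextual word vectors) + GRU decoder with attention."""
    def __init__(self, bert_name="bert-base-uncased", hidden=768):
        super().__init__()
        # BERT replaces static embeddings with contextual word vectors.
        self.bert = BertModel.from_pretrained(bert_name)
        for p in self.bert.parameters():
            p.requires_grad = False  # assumption: BERT used as a fixed embedder
        vocab = self.bert.config.vocab_size
        self.tgt_embed = nn.Embedding(vocab, hidden)
        self.decoder = nn.GRU(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden * 2, vocab)

    def forward(self, src_ids, src_mask, tgt_ids):
        # 1. Encode the source document: (batch, src_len, hidden).
        enc = self.bert(input_ids=src_ids, attention_mask=src_mask).last_hidden_state
        # 2. Run the decoder over the (teacher-forced) summary tokens.
        dec, _ = self.decoder(self.tgt_embed(tgt_ids))      # (batch, tgt_len, hidden)
        # 3. Dot-product attention over encoder states, masking source padding.
        scores = dec @ enc.transpose(1, 2)                  # (batch, tgt_len, src_len)
        scores = scores.masked_fill(src_mask[:, None, :] == 0, float("-inf"))
        ctx = scores.softmax(dim=-1) @ enc                  # (batch, tgt_len, hidden)
        # 4. Predict each next summary token from decoder state + context.
        return self.out(torch.cat([dec, ctx], dim=-1))      # (batch, tgt_len, vocab)

tok = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertSeq2Seq()
src = tok(["the quick brown fox jumps over the lazy dog"], return_tensors="pt", padding=True)
tgt = tok(["fox jumps dog"], return_tensors="pt")["input_ids"]
print(model(src["input_ids"], src["attention_mask"], tgt).shape)  # (1, tgt_len, 30522)
```

The sketch only shows the forward pass; in training, the logits would be compared against the shifted target tokens with cross-entropy loss, as is standard for Seq2Seq models.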

Key words: text summarization, BERT model, attention mechanism, Sequence-to-Sequence (Seq2Seq) model

CLC number: