Computer and Modernization ›› 2022, Vol. 0 ›› Issue (06): 56-66.

• Artificial Intelligence •

Research Progress of Text Summarization Models

  ZHANG Ziyun, WANG Wenfa, MA Lerong, DING Cangfeng

  1. (School of Mathematics and Computer Science, Yan’an University, Yan’an 716000, China)
  • Online: 2022-06-23  Published: 2022-06-23
  • About the authors: ZHANG Ziyun (b. 1995), female, from Baoji, Shaanxi, master's student, research interests: algorithm design and analysis, E-mail: 1048735602@qq.com; WANG Wenfa (b. 1968), male, from Zhidan, Shaanxi, professor and master's supervisor, research interests: algorithm analysis and design, E-mail: wwf@13010360; corresponding author: MA Lerong (b. 1974), male, from Shenmu, Shaanxi, professor, master's supervisor, Ph.D., research interests: natural language processing, E-mail: mlr@yau.edu.cn; DING Cangfeng (b. 1978), male, from Tanghe, Henan, associate professor, Ph.D., research interests: natural language processing, E-mail: dcf@yau.edu.cn.
  • Funding:
    Supported by the National Natural Science Foundation of China (61866038, 62041212, 61763046) and the Natural Science Basic Research Program of Shaanxi Province (2020JM-548)


Abstract: As the Internet generates ever more text data, the problem of text information overload grows increasingly serious, making it essential to "reduce the dimensionality" of texts of all kinds. Text summarization is one of the most important means to this end, and it is also one of the hot and difficult topics in artificial intelligence research. Text summarization aims to transform a text, or a collection of texts, into a short summary that retains the key information. In recent years, the pre-training of language models has raised the state of the art on many natural language processing tasks, including sentiment analysis, question answering, natural language inference, named entity recognition, text similarity, and text summarization. This paper reviews both the classic text summarization methods of the past and the pre-training-based methods of recent years, organizes the datasets and evaluation methods used for text summarization, and concludes by summarizing the challenges the field currently faces and its development trends.

Key words: datasets, text summarization, pre-trained models

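To make the survey's two central threads concrete (summarization with a pre-trained model, and ROUGE-style evaluation), the following is a minimal illustrative sketch, not code from the paper. It assumes the Hugging Face transformers library and the facebook/bart-large-cnn checkpoint, neither of which the paper prescribes, and the rouge_1_recall function is a deliberately simplified stand-in for the full ROUGE metric family the survey discusses.

```python
# Minimal sketch (assumptions: Hugging Face transformers installed,
# facebook/bart-large-cnn chosen as an example pre-trained summarizer).
from collections import Counter

from transformers import pipeline


def rouge_1_recall(candidate: str, reference: str) -> float:
    """Toy ROUGE-1 recall: fraction of reference unigrams covered by the candidate."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum(min(cand[w], ref[w]) for w in ref)
    return overlap / max(sum(ref.values()), 1)


# A pre-trained encoder-decoder model fine-tuned for abstractive summarization.
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

document = (
    "Text summarization aims to condense a document or a collection of "
    "documents into a short summary that preserves the key information. "
    "Pre-trained language models have recently improved the state of the "
    "art on many natural language processing tasks, including summarization."
)
reference = "Text summarization condenses documents into short summaries."

summary = summarizer(document, max_length=40, min_length=10, do_sample=False)[0]["summary_text"]
print("Summary:", summary)
print("ROUGE-1 recall:", round(rouge_1_recall(summary, reference), 3))
```

In practice, evaluation would use a maintained ROUGE implementation (for example, the rouge-score package) and report ROUGE-1, ROUGE-2, and ROUGE-L rather than this single toy score.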