Computer and Modernization, 2025, Vol. 0, Issue (11): 32-40. doi: 10.3969/j.issn.1006-2475.2025.11.004

• Artificial Intelligence •

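Few-Shot Patent Classification Method Integrating Multi-dimensional Prompts and Multi-level Label Expansion

  1. (Beijing Key Laboratory of Internet Culture and Digital Dissemination Research, Beijing Information Science and Technology University, Beijing 100101, China)
  • Online: 2025-11-20 Published: 2025-11-24
  • About the authors:
    YOU Xindong (游新冬, b. 1979), female, born in Yongding, Fujian; professor, Ph.D.; research interests: Chinese information processing, multimedia information processing; E-mail: youxindong@bistu.edu.cn
    ZHAO Yuxian (赵玉贤, b. 2000), male, born in Ganzhou, Jiangxi; master's student; research interest: natural language processing; E-mail: zhaoyuxian1024@163.com
    Corresponding author: LÜ Xueqiang (吕学强, b. 1970), male, born in Fushun, Liaoning; professor, Ph.D.; research interests: patent knowledge mining, natural language processing; E-mail: icddtxyx@163.com
    LIU Boshan (刘勃杉, b. 1998), male, born in Shijiazhuang, Hebei; master's student; research interest: natural language processing; E-mail: kylin520s@163.com
  • Funding:
    National Natural Science Foundation of China (62171043); Beijing Natural Science Foundation (4232025); Qinghai Province Innovation Platform Construction Special Project (2022-ZJ-T02); General Science and Technology Project of the Beijing Municipal Education Commission Scientific Research Program (KM202311232003, KM202311232002)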

Abstract: Classifying university patents by industrial demand is necessary for promoting the integration of industry, academia, and research and for advancing emerging and future industries. However, patent classification resources for emerging industries are scarce, and data annotation is costly. This paper therefore proposes a few-shot patent classification method for emerging industries that integrates multi-dimensional prompts with multi-level label word expansion. The method uses BERTopic topic clustering to obtain topic keywords from patent texts and GLM-4 to extract technical terms from them, helping the model understand patents along multiple dimensions at both the macro and micro levels. It uses Masked Language Modeling (MLM) and ChatGPT to expand the label word space at multiple levels, giving the prompt learning model a richer and semantically deeper set of label words. Experiments on the constructed few-shot patent classification dataset show that the method consistently outperforms the baseline models as well as the GLM-4 large language model, verifying its effectiveness for few-shot patent classification.

Key words: prompt learning, prompt engineering, answer engineering, patent classification
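
The two ideas named in the abstract, a prompt enriched with topic keywords and technical terms, and an expanded set of label words per class, can be illustrated with a minimal sketch. This is not the authors' implementation: the template wording, the class names, the label words, and the probabilities below are all hypothetical, and the BERTopic, GLM-4, MLM, and ChatGPT steps are replaced by hand-written inputs. Averaging label-word probabilities is one common way to map words to classes in prompt learning; the abstract does not say which aggregation the paper uses.

```python
from statistics import mean

MASK = "[MASK]"


def build_prompt(text, topic_keywords, terms):
    """Wrap a patent text with topic keywords (macro view) and
    technical terms (micro view), ending in a masked slot.
    The template wording is an assumption for illustration."""
    return (
        f"Topic keywords: {', '.join(topic_keywords)}. "
        f"Technical terms: {', '.join(terms)}. "
        f"{text} This patent belongs to the {MASK} industry."
    )


def classify(mask_word_probs, verbalizer):
    """Score each class as the mean probability of its label words
    at the masked position, then pick the highest-scoring class."""
    scores = {
        label: mean(mask_word_probs.get(word, 0.0) for word in words)
        for label, words in verbalizer.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical expanded label-word sets (stand-ins for MLM/ChatGPT output).
verbalizer = {
    "new energy": ["battery", "photovoltaic", "hydrogen"],
    "biomedicine": ["drug", "vaccine", "antibody"],
}

# Made-up probabilities standing in for a masked language model's
# predictions at the [MASK] position.
mask_word_probs = {
    "battery": 0.30,
    "photovoltaic": 0.10,
    "hydrogen": 0.05,
    "drug": 0.02,
    "vaccine": 0.01,
}

prompt = build_prompt(
    "A lithium-ion cell with a silicon-carbon composite anode.",
    topic_keywords=["energy storage", "electrode"],  # stand-in for BERTopic output
    terms=["silicon-carbon anode"],                  # stand-in for GLM-4 output
)
label, scores = classify(mask_word_probs, verbalizer)
print(prompt)
print(label, scores)
```

In a real pipeline the probability table would come from a masked language model applied to the prompt, and expanding each class from a single label word to several is what lets related predictions such as "photovoltaic" still count toward the right class.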

CLC number: