Computer and Modernization ›› 2023, Vol. 0 ›› Issue (03): 71-78.

• Database and Data Mining •

Classification Algorithm for Goods Names Based on Enhanced Semantic Model

  1. (1. College of Economics and Management, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, Jiangsu, China;
    2. School of Management, Hefei University of Technology, Hefei 230009, Anhui, China)
  • Online: 2023-04-17 Published: 2023-04-17
  • About the authors: Li Xiaofeng (李晓峰, born 1979), male, from Hengshui, Hebei, PhD candidate, research interests: natural language processing, data mining, E-mail: lxf0895@nuaa.edu.cn; corresponding author: Ma Jing (马静, born 1966), female, from Chongqing, professor, doctoral supervisor, PhD, research interests: data mining, natural language processing, complex networks, E-mail: majing5525@126.com; Zhou Yan (周琰, born 2000), male, from Nanjing, Jiangsu, research interests: big data management and applications.
  • Funding:
    General Program of the National Natural Science Foundation of China (72174086)

Abstract: Customs declaration is the process by which the owner of imported or exported goods completes entry and exit formalities with customs. It mainly comprises filling in the customs declaration form, document inspection, and cargo inspection. The goods names on declaration forms are currently filled in manually, which leads to high declaration costs, low efficiency, and unstable accuracy. To address these problems, this paper takes the short-text goods descriptions from customs declarations as its corpus and extracts word-frequency features with the TF-IDF model and semantic features with the BERT model. Based on the characteristics of the corpus, the semantic features are then enhanced with the word-frequency features. Next, the ViT model extracts image features of the goods, which are fused with the text features under a cross-attention mechanism. Finally, a multi-grained cascade forest classifier classifies the goods names, so that goods names are obtained accurately. The experimental results show a precision of 0.92, a recall of 0.90, and an F1-score of 0.91, indicating that the proposed algorithm is a sound and effective approach to classifying goods names in customs declarations and helps address the problems above.

Key words: goods description, goods name, goods name classification, enhanced semantics
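
The abstract does not specify how the word-frequency features enhance the semantic features. One plausible reading, shown here only as a minimal sketch, is to use TF-IDF weights when pooling token vectors into a description vector. Everything in the block is an assumption made for illustration: the function names, the toy corpus, and `toy_embedding`, which is a hypothetical stand-in for BERT token vectors. The ViT image features, the cross-attention fusion, and the cascade forest classifier are not covered.

```python
import math
import random
from collections import Counter


def tfidf_weights(docs):
    """Return one {token: tf-idf weight} dict per tokenized, non-empty document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each token.
    df = Counter(tok for doc in docs for tok in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({
            # Relative term frequency times a smoothed idf (always >= 1).
            tok: (count / len(doc)) * (math.log((1 + n_docs) / (1 + df[tok])) + 1.0)
            for tok, count in tf.items()
        })
    return weights


def toy_embedding(token, dim=8):
    """Hypothetical stand-in for a BERT token vector (deterministic per token)."""
    rng = random.Random(token)
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


def enhanced_vector(doc, weights, dim=8):
    """Pool token vectors with normalized TF-IDF weights instead of a plain mean."""
    total = sum(weights[tok] for tok in doc)
    vec = [0.0] * dim
    for tok in doc:
        emb = toy_embedding(tok, dim)
        share = weights[tok] / total  # shares sum to 1 over the document
        for i in range(dim):
            vec[i] += share * emb[i]
    return vec


# Toy goods descriptions (invented for this sketch, not from the paper's corpus).
docs = [
    ["stainless", "steel", "kitchen", "knife"],
    ["steel", "pipe", "fitting"],
    ["cotton", "knitted", "t-shirt"],
]
weights = tfidf_weights(docs)
features = [enhanced_vector(doc, w) for doc, w in zip(docs, weights)]
```

In this toy corpus "steel" occurs in two descriptions and "stainless" in one, so "stainless" receives the larger weight and contributes more to the pooled vector of the first description. In a real pipeline the pooled vectors would be built from actual BERT outputs before any fusion with image features.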