Computer and Modernization

• Applications and Development •

A Method of Handwriting Texts and Shapes Separation

HU Xing-hong1,2, SHI Da-peng1,2, FENG Gui-huan1,2

  1. State Key Laboratory for Novel Software Technology, Nanjing 210093, China;
  2. Software Institute, Nanjing University, Nanjing 210093, China
  • Received: 2013-09-09 Revised: 1900-01-01 Online: 2013-12-18 Published: 2013-12-18

Abstract: Handwriting recognition is becoming increasingly important as a technology for improving human-computer interaction, and a large body of work has studied handwritten text and hand-drawn shapes. Separating shapes from text, although an important part of handwriting recognition, has not received enough attention. In this paper, we design and implement a handwritten text and shape separation method based on Weka, an open-source data mining tool. Experiments with three classifiers, LogitBoost, Random Forest and LADTree, show that LogitBoost gives the best overall classification performance. Combining the three classifiers allows shapes to be identified accurately, while the classification of text is limited by the weakest of the three classifiers. In addition, the influence of different features on text-shape classification is analyzed based on information gain evaluation results.

Key words: sketch recognition, data mining, text-shape separation, classification model
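
The information gain evaluation mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' code and does not use Weka. It assumes each feature has already been discretized, and the feature names (`stroke_length`, `pen_pressure`) and the four toy strokes are hypothetical values made up for illustration. The gain of a feature is the class entropy minus the class entropy that remains once the feature value is known.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(feature_values, labels):
    """H(class) - H(class | feature) for one discrete feature."""
    groups = defaultdict(list)
    for value, label in zip(feature_values, labels):
        groups[value].append(label)
    total = len(labels)
    # Weighted average of the class entropy inside each feature-value group.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


# Hypothetical toy data: four strokes with two made-up discretized features.
labels = ["text", "text", "shape", "shape"]
stroke_length = ["short", "short", "long", "long"]  # splits the classes cleanly
pen_pressure = ["low", "high", "low", "high"]       # unrelated to the class

gains = {
    "stroke_length": information_gain(stroke_length, labels),
    "pen_pressure": information_gain(pen_pressure, labels),
}
# Rank features from most to least informative.
ranking = sorted(gains.items(), key=lambda item: item[1], reverse=True)
```

On this toy data the feature that splits text from shapes cleanly ranks first, and the feature unrelated to the class ranks last. Real stroke features are usually numeric, so they would need to be discretized before a gain of this form can be computed.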