A Method of Handwriting Texts and Shapes Separation

doi:10.3969/j.issn.1006-2475.2013.12.034

Computer and Modernization, 2013, Vol. 12, Issue (12): 145-148.

HU Xing-hong1,2, SHI Da-peng1,2, FENG Gui-huan1,2

1. State Key Laboratory for Novel Software Technology, Nanjing 210093, Jiangsu, China;
2. Software Institute, Nanjing University, Nanjing 210093, Jiangsu, China

Received: 2013-09-09  Revised: 1900-01-01  Online: 2013-12-18  Published: 2013-12-18

Abstract

Handwriting recognition has become increasingly important as a technique for improving human-computer interaction, and a large body of work addresses handwritten text and hand-drawn shapes. However, separating text from shapes, itself an important part of handwriting recognition, has not received enough attention. This paper designs and implements a handwriting text and shape separation method based on the open-source data mining tool Weka. Experiments with three classifiers, LogitBoost, Random Forest and LADTree, show that LogitBoost gives the best overall classification performance. Combining the three classifiers identifies shapes accurately, while the classification of text is limited by the weakest of the three classifiers. The influence of individual features on text-shape classification is also analyzed, based on information gain evaluation.

Key words: sketch recognition, data mining, text-shape separation, classification model
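The information gain evaluation mentioned in the abstract scores a feature by how much knowing its value reduces uncertainty about the class: IG = H(class) - H(class | feature). The sketch below illustrates that standard definition in plain Python on a toy data set. The feature names (`stroke_length`, `pen_speed`), their discretized values, and the eight labeled strokes are hypothetical and are not taken from the paper; the paper itself works in Weka, which ships an information gain attribute evaluator (`InfoGainAttributeEval`), though the exact configuration the authors used is not stated on this page.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a non-empty sequence of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(feature_values, labels):
    """Return H(class) - H(class | feature) for one discrete feature."""
    # Group the class labels by the feature value they co-occur with.
    groups = defaultdict(list)
    for value, label in zip(feature_values, labels):
        groups[value].append(label)
    total = len(labels)
    # Conditional entropy: entropy of each group, weighted by group size.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


# Hypothetical toy strokes; names and values are illustrative assumptions.
labels = ["text", "text", "text", "text", "shape", "shape", "shape", "shape"]
features = {
    "stroke_length": ["short"] * 4 + ["long"] * 4,
    "pen_speed": ["slow", "fast", "slow", "fast", "slow", "fast", "slow", "fast"],
}

# Rank features from most to least informative.
ranking = sorted(
    ((name, information_gain(values, labels)) for name, values in features.items()),
    key=lambda item: item[1],
    reverse=True,
)
for name, gain in ranking:
    print(f"{name}: {gain:.3f} bits")
```

In this toy set, `stroke_length` separates the two classes perfectly and therefore ranks first, while `pen_speed` is spread evenly over both classes and contributes no information. Continuous stroke features would need to be discretized before a calculation like this applies.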

胡兴鸿1，2，施大鹏1，2，冯桂焕1，2 . 一种手写图文分离方法[J]. 计算机与现代化, 2013, 12(12): 145-148.

HU Xing-hong1，2， SHI Da-peng1，2， FENG Gui-huan1，2 . A Method of Handwriting Texts and Shapes Separation[J]. Computer and Modernization, 2013, 12(12): 145-148.

