A Balance Algorithm Between Handwriting Error and User Effort

Computer and Modernization, 2015, Vol. 0, Issue (9): 50-56.

新疆工程学院计算机工程系，新疆维吾尔自治区乌鲁木齐830011

收稿日期:2015-05-11 出版日期:2015-09-21 发布日期:2015-09-24
作者简介:尚雪莲（1977-），女，甘肃武威人，新疆工程学院计算机工程系讲师，硕士，研究方向：模式识别，图像处理；梁传君（1980-），女，讲师，硕士，研究方向：模式识别，图形图像处理。
基金资助:
新疆维吾尔自治区自然科学基金资助项目(2013211A031)；新疆工程学院基金资助项目(2014030415)

Abstract:

To address the low efficiency of current computer-assisted transcription of handwritten text documents, this paper proposes an algorithm that predicts the error rate in a block of automatically recognized words and estimates how much user effort is required to correct the transcription down to a user-defined error rate. First, the main problems of traditional error-estimation methods are analyzed. Then, error estimation is performed over a whole block of words in order to improve estimation accuracy. Finally, the best-performing techniques from previous work are combined to form the proposed method. The method is embedded in an interactive approach to transcribing handwritten text documents, which makes efficient use of user interactions through active learning and semi-supervised learning. Experiments on two real handwritten documents report the trade-off between user effort and transcription accuracy and demonstrate the effectiveness of the proposed algorithm.
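The balance between residual error and user effort can be illustrated with a minimal sketch. It is not the authors' estimator: it assumes, for illustration only, that each recognized word carries a calibrated confidence score approximating the probability that the word is correct. Under that assumption the expected error rate of a block is the mean of (1 - confidence), and the effort needed to reach a user-defined error rate is estimated as the smallest number of least-confident words the user must correct. The names `Word`, `expected_block_error_rate` and `plan_supervision` are hypothetical.

```python
from dataclasses import dataclass

# Small tolerance so that floating-point round-off does not force an extra
# supervision step when the expected error rate sits exactly on the target.
_EPS = 1e-12


@dataclass
class Word:
    """A recognized word (hypothetical structure used only for illustration)."""
    text: str
    confidence: float  # assumed calibrated estimate of P(word is correct), in [0, 1]


def expected_block_error_rate(words: list[Word]) -> float:
    """Expected error rate of a block: the mean of (1 - confidence) over its words."""
    if not words:
        return 0.0
    return sum(1.0 - w.confidence for w in words) / len(words)


def plan_supervision(
    words: list[Word], target_error_rate: float
) -> tuple[list[int], list[int]]:
    """Choose the fewest words to show the user so that the block's expected
    error rate falls to target_error_rate or below.

    Assumption: every word the user supervises becomes correct, so only the
    unsupervised words still contribute expected errors. Returns
    (indices to supervise, indices accepted automatically). The first list
    acts like active-learning queries; the second could be reused as
    automatically labelled data for semi-supervised retraining.
    """
    n = len(words)
    if n == 0:
        return [], []
    # Least-confident words first: each one removes the most expected error,
    # so this greedy order minimizes the number of supervised words.
    order = sorted(range(n), key=lambda i: words[i].confidence)
    remaining_errors = sum(1.0 - w.confidence for w in words)
    supervised: list[int] = []
    for i in order:
        if remaining_errors / n <= target_error_rate + _EPS:
            break
        supervised.append(i)
        remaining_errors -= 1.0 - words[i].confidence
    chosen = set(supervised)
    accepted = [i for i in range(n) if i not in chosen]
    return supervised, accepted


if __name__ == "__main__":
    block = [Word("the", 0.95), Word("quick", 0.40), Word("brown", 0.85), Word("fox", 0.60)]
    print(f"expected error rate: {expected_block_error_rate(block):.2f}")
    to_check, auto = plan_supervision(block, target_error_rate=0.10)
    print("user corrects:", [block[i].text for i in to_check])
    print("accepted:", [block[i].text for i in auto])
```

With the sample block, the expected error rate is 0.30, and a 10% target sends "quick" and "fox" to the user while "the" and "brown" are accepted automatically. Summing (1 - confidence) treats word errors as independent events, which is itself an assumption; the block-level estimator in the paper may be built differently.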

Key words: computer-assisted annotation, handwriting recognition, user effort, balance, text transcription, error estimation

SHANG Xue-lian, LIANG Chuan-jun. A Balance Algorithm Between Handwriting Error and User Effort[J]. Computer and Modernization, 2015, 0(9): 50-56.
