A Large Margin Nearest Neighbor Algorithm of Large-scale Text Classification

doi:10.3969/j.issn.1006-2475.2016.06.015

Abstract

The large margin nearest neighbor (LMNN) algorithm has strong learning and generalization ability and is widely used for classification. However, when LMNN is applied to large-scale text classification, the scale of its semidefinite programming (SDP) problem grows rapidly as the amount of data increases, which makes the problem difficult to solve. To address this, we introduce the Huber loss function and decompose the semidefinite optimization model of LMNN into two low-level continuous optimization sub-models. This reduces the computational complexity of the algorithm and improves its efficiency. Experimental results on a public opinion classification data set show that, compared with the traditional LMNN algorithm, the proposed algorithm improves precision by 4.5% and reduces classification time by 47.1%. The results also indicate that using low-level decomposition to reduce the problem is a feasible way to improve LMNN performance and makes the algorithm better suited to large-scale text classification.
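The abstract does not give the model equations, so the following Python sketch is only an illustration under stated assumptions. It implements the standard LMNN objective, a pull term (1 - mu) times the sum of d_M(x_i, x_j) over target neighbours plus a push term mu times the sum of [1 + d_M(x_i, x_j) - d_M(x_i, x_l)]_+ over impostors, with the hinge replaced by a Huber-smoothed hinge so that the objective is continuously differentiable. It also parametrizes M = L^T L so that M stays positive semidefinite without an SDP solver and runs plain gradient descent. The factorization, the choice of every same-class point as a target neighbour, the parameters `mu`, `delta`, `lr`, and all function names are hypothetical; this is not the paper's two-sub-model decomposition.

```python
"""Toy sketch: LMNN-style loss with a Huber-smoothed hinge.

Assumptions (not from the paper): M = L^T L factorization, every same-class
point is a target neighbour, every other-class point is a candidate impostor,
and plain gradient descent replaces any SDP or sub-model solver.
"""
from itertools import product


def huber_hinge(z, delta=0.5):
    """Huber-smoothed hinge: 0 for z <= 0, z^2/(2*delta) on (0, delta],
    z - delta/2 beyond. Returns (value, derivative); both are continuous."""
    if z <= 0.0:
        return 0.0, 0.0
    if z <= delta:
        return z * z / (2.0 * delta), z / delta
    return z - delta / 2.0, 1.0


def sq_dist(L, u, v):
    """d_M(u, v) = (u - v)^T M (u - v) with M = L^T L, i.e. ||L (u - v)||^2.
    Also returns the difference vector and L @ diff for the gradient."""
    diff = [a - b for a, b in zip(u, v)]
    Ld = [sum(row[k] * diff[k] for k in range(len(diff))) for row in L]
    return sum(t * t for t in Ld), diff, Ld


def lmnn_huber_loss(L, X, y, mu=0.5, delta=0.5):
    """Return (loss, gradient with respect to L) for the smoothed objective."""
    d = len(L)
    grad = [[0.0] * d for _ in range(d)]
    loss = 0.0

    def add_outer(scale, Ld, diff):
        # d/dL ||L v||^2 = 2 (L v) v^T
        for r, c in product(range(d), range(d)):
            grad[r][c] += scale * 2.0 * Ld[r] * diff[c]

    n = len(X)
    for i in range(n):
        for j in range(n):
            if j == i or y[j] != y[i]:
                continue
            dij, diff_ij, Ld_ij = sq_dist(L, X[i], X[j])
            loss += (1.0 - mu) * dij  # pull target neighbour j towards i
            add_outer(1.0 - mu, Ld_ij, diff_ij)
            for l in range(n):
                if y[l] == y[i]:
                    continue
                dil, diff_il, Ld_il = sq_dist(L, X[i], X[l])
                h, dh = huber_hinge(1.0 + dij - dil, delta)  # push impostor l
                loss += mu * h
                if dh:
                    add_outer(mu * dh, Ld_ij, diff_ij)
                    add_outer(-mu * dh, Ld_il, diff_il)
    return loss, grad


def fit(X, y, steps=200, lr=0.01, **kw):
    """Gradient descent on L starting from the identity; M = L^T L is PSD."""
    d = len(X[0])
    L = [[1.0 if r == c else 0.0 for c in range(d)] for r in range(d)]
    history = []
    for _ in range(steps):
        loss, g = lmnn_huber_loss(L, X, y, **kw)
        history.append(loss)
        L = [[L[r][c] - lr * g[r][c] for c in range(d)] for r in range(d)]
    return L, history
```

With the toy data `X = [[0, 0], [0, 2], [1, 0], [1, 2]]` and `y = [0, 0, 1, 1]`, the loss starts at 15.5 and falls toward zero as the learned L shrinks the within-class (second) axis and slightly stretches the first. The Huber smoothing matters because it gives a gradient that is continuous at the hinge point, which is what lets smooth unconstrained optimizers be used instead of a semidefinite program.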


Key words: semidefinite programming, large margin nearest neighbor, Huber loss function, large-scale text classification, generalization


CLC Number: TP311.13

ZHU Qian1, QIN Hua1, FENG Zhi-xin2, CHEN Chen1. A Large Margin Nearest Neighbor Algorithm of Large-scale Text Classification[J]. Computer and Modernization, 2016, 0(6): 68-72.
