Application of the Synthetic Minority Over-sampling Technique Filter Method to Used-car Recommendation

doi:10.3969/j.issn.1006-2475.2016.07.025

Computer and Modernization, 2016, Vol. 251, Issue (07): 118-123. doi: 10.3969/j.issn.1006-2475.2016.07.025

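1. College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing, Jiangsu 210016, China; 2. Unmanned Aerial Vehicle Research Institute, Nanjing University of Aeronautics and Astronautics, Nanjing, Jiangsu 210016, China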

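Received: 2016-01-08; Online: 2016-07-21; Published: 2016-07-22
About the authors: QIU Hai-bo (b. 1989), male, from Funing, Jiangsu, master's student at the College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics; research interests: intelligent systems and data mining. QIAN Zhong-min (b. 1971), male, associate professor, Ph.D.; research interests: information systems and information security. QIAN Mo-shu (b. 1978), male, associate research fellow at the Unmanned Aerial Vehicle Research Institute, Nanjing University of Aeronautics and Astronautics, Ph.D.; research interests: flight control systems and their fault-tolerant control.
Funding: National Natural Science Foundation of China (61403195); Natural Science Foundation of Jiangsu Province (SBK2014042586)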

Used-car Recommendation Based on Synthetic Minority Over-sampling Technique Filter

1. College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China; 
2. Unmanned Aerial Vehicle Research Institute, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China

Received:2016-01-08 Online:2016-07-21 Published:2016-07-22

Abstract

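Abstract: Because the data sets used for used-car recommendation are imbalanced, used-car recommendation can be treated as an imbalanced classification problem and addressed with methods developed for such problems. This paper studies data set reconstruction for imbalanced data classification. By analyzing the characteristics and shortcomings of the Synthetic Minority Over-sampling Technique (SMOTE), it proposes the Synthetic Minority Over-sampling Technique Filter (SmoteFilter), which filters the samples synthesized by SMOTE to reduce the noise among them and improve the quality of the training samples. A support vector machine is used to compare, experimentally, data synthesized by SMOTE with data synthesized by SmoteFilter. The results show that, compared with conventional SMOTE over-sampling, SmoteFilter improves the prediction accuracy of the minority class in used-car recommendation and raises the overall prediction performance.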

Keywords: used-car recommendation, classification, imbalanced data, over-sampling, support vector machine

Abstract: Because used-car data are imbalanced, used-car recommendation reduces to an imbalanced data classification problem and can be solved with imbalanced classification methods. Focusing on reconstruction of the training data set, and after analyzing the characteristics and deficiencies of the SMOTE over-sampling method, we propose the Synthetic Minority Over-sampling Technique Filter, or SmoteFilter for short. It filters the data generated by SMOTE over-sampling to reduce the noise in the generated data. Experiments with a support vector machine trained on data generated by SMOTE and by SmoteFilter show that SmoteFilter predicts the minority class more accurately than SMOTE, improving the prediction performance of used-car recommendation.

Key words: used-car recommendation, classification, imbalanced dataset, over-sampling, support vector machine
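
The abstract describes the pipeline only at a high level: SMOTE synthesizes minority-class samples, and a filter then discards the noisy ones. The sketch below illustrates that idea in Python with NumPy. The `smote` function follows the standard interpolation scheme of Chawla et al. The filtering rule in `filter_synthetic` (keep a synthetic point only if most of its nearest real samples are minority-class) is an assumption made for illustration, since the abstract does not state the criterion SmoteFilter actually uses. The function names, parameters, and toy data are likewise hypothetical.

```python
import numpy as np


def smote(minority, n_new, k=5, rng=None):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority-class neighbours.
    Requires at least two minority samples."""
    rng = np.random.default_rng(rng)
    minority = np.asarray(minority, dtype=float)
    n = len(minority)
    k = min(k, n - 1)
    # Pairwise distances inside the minority class; a point is not its own neighbour.
    dist = np.linalg.norm(minority[:, None, :] - minority[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)
    neighbours = np.argsort(dist, axis=1)[:, :k]

    base = rng.integers(0, n, size=n_new)                    # seed sample per new point
    picked = neighbours[base, rng.integers(0, k, size=n_new)]  # one random neighbour each
    gap = rng.random((n_new, 1))                             # interpolation factor in [0, 1)
    return minority[base] + gap * (minority[picked] - minority[base])


def filter_synthetic(synthetic, minority, majority, k=3):
    """ASSUMED filter rule (not taken from the paper): keep a synthetic point
    only if more than half of its k nearest real samples are minority-class."""
    real = np.vstack([minority, majority])
    is_minority = np.r_[np.ones(len(minority), dtype=bool),
                        np.zeros(len(majority), dtype=bool)]
    dist = np.linalg.norm(synthetic[:, None, :] - real[None, :, :], axis=2)
    nearest = np.argsort(dist, axis=1)[:, :k]
    keep = is_minority[nearest].mean(axis=1) > 0.5
    return synthetic[keep]


# Toy data standing in for an imbalanced used-car data set (hypothetical).
rng = np.random.default_rng(0)
majority = rng.normal(0.0, 1.0, size=(200, 2))
minority = rng.normal(2.0, 1.0, size=(20, 2))
synthetic = smote(minority, n_new=180, rng=rng)
kept = filter_synthetic(synthetic, minority, majority)
print(len(synthetic), "synthetic samples,", len(kept), "kept after filtering")
```

The filtered samples would then be appended to the original minority class before training the classifier, which in the paper's experiments is a support vector machine.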

邱海波1，钱忠民1，钱默抒2. 合成少数类过采样过滤器方法在二手车推荐中的应用[J]. 计算机与现代化, 2016, 251(07): 118-123.

QIU Hai-bo1， QIAN Zhong-min1， QIAN Mo-shu2. Used-car Recommendation Based on Synthetic Minority Over-sampling Technique Filter[J]. Computer and Modernization, 2016, 251(07): 118-123.
