Search Rank Model Based on Random Forests and LambdaMART

Computer and Modernization, 2017, Vol. 0, Issue (3): 54-. doi: 10.3969/j.issn.1006-2475.2017.03.012

1. Wuhan Research Institute of Posts and Telecommunications, Wuhan 430074, China;
2. Nanjing R&D Department, FiberHome Telecommunication Technologies Co., Ltd., Nanjing 210019, China

Received: 2016-08-03  Online: 2017-03-29  Published: 2017-03-30

About the authors: LEI Wu (1992-), male, from Wuhan, Hubei, master's student at Wuhan Research Institute of Posts and Telecommunications and the Nanjing R&D Department of FiberHome Telecommunication Technologies Co., Ltd.; research interests: massive data mining, information retrieval. LIAO Wenjian (1970-), male, professor-level senior engineer, Ph.D.; research interests: information security, massive data mining, network behavior analysis. PENG Yanbing (1974-), male, senior engineer, Ph.D.; research interests: network behavior analysis, massive data mining.

Abstract:

Recent studies have shown that Boosting provides excellent predictive performance across a wide variety of tasks. In learning to rank, boosted models such as RankBoost and LambdaMART are among the best-performing methods in evaluations on public data sets. In this paper, we study Random Forests (RF) and LambdaMART and combine the two algorithms: RF serves as the base model and first learns a ranking function, and the output of that function is then used as the initial function for LambdaMART, which produces the final ranking model. We evaluate the model on public learning-to-rank data sets using two metrics, ERR and NDCG. The results show that the new ranking model consistently outperforms both original algorithms.
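The initialization idea described in the abstract can be illustrated with a small, self-contained sketch. It is not the authors' implementation: bagged regression stumps stand in for Random Forests, and a stump-based gradient-boosting loop on LambdaRank-style pseudo-gradients stands in for LambdaMART (which normally uses deeper regression trees and Newton-step leaf values). All function names, the toy data, and the hyperparameters are assumptions made for illustration only.

```python
import math
import random

def fit_stump(X, y):
    """Fit a depth-1 regression tree (one feature, one threshold) by least squares."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[f] <= t]
            right = [yi for row, yi in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            ml, mr = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, f, t, ml, mr)
    if best is None:  # no valid split (all rows identical): predict a constant
        m = sum(y) / len(y)
        return (0, float("inf"), m, m)
    return best[1:]

def predict_stump(stump, row):
    f, t, ml, mr = stump
    return ml if row[f] <= t else mr

def fit_forest(X, y, n_trees=20, seed=0):
    """Bagged stumps on bootstrap samples: a toy stand-in for Random Forests."""
    rng = random.Random(seed)
    n = len(X)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        trees.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return trees

def predict_forest(trees, row):
    return sum(predict_stump(t, row) for t in trees) / len(trees)

def dcg_gain(label, rank):
    return (2 ** label - 1) / math.log2(rank + 2)  # rank is 0-based

def lambdas_for_query(scores, labels):
    """LambdaRank-style pseudo-gradients weighted by |delta NDCG| for one query."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: -scores[i])
    rank = {doc: r for r, doc in enumerate(order)}
    ideal = sum(dcg_gain(l, r) for r, l in enumerate(sorted(labels, reverse=True)))
    lam = [0.0] * n
    if ideal == 0:
        return lam
    for i in range(n):
        for j in range(n):
            if labels[i] <= labels[j]:
                continue
            # |change in NDCG| if documents i and j swapped rank positions
            delta = abs((2 ** labels[i] - 2 ** labels[j]) *
                        (1 / math.log2(rank[i] + 2) - 1 / math.log2(rank[j] + 2))) / ideal
            rho = 1.0 / (1.0 + math.exp(scores[i] - scores[j]))
            lam[i] += rho * delta  # push the more relevant document up
            lam[j] -= rho * delta  # push the less relevant document down
    return lam

def boost_from_init(X, y, qids, init_scores, n_rounds=30, lr=0.1):
    """Boost on lambda gradients, starting from init_scores instead of zeros."""
    scores = list(init_scores)
    groups = {}
    for i, q in enumerate(qids):
        groups.setdefault(q, []).append(i)
    stumps = []
    for _ in range(n_rounds):
        lam = [0.0] * len(X)
        for idx in groups.values():
            q_lam = lambdas_for_query([scores[i] for i in idx], [y[i] for i in idx])
            for i, v in zip(idx, q_lam):
                lam[i] = v
        stump = fit_stump(X, lam)
        stumps.append(stump)
        scores = [s + lr * predict_stump(stump, row) for s, row in zip(scores, X)]
    return stumps, scores

# Toy data (hypothetical): two queries with three documents each.
X = [[3.0, 0.2], [1.0, 0.9], [2.0, 0.1], [0.5, 0.4], [2.5, 0.8], [0.2, 0.3]]
y = [2, 0, 1, 0, 2, 0]
qids = [1, 1, 1, 2, 2, 2]
forest = fit_forest(X, y)
init = [predict_forest(forest, row) for row in X]  # base-model scores
stumps, final_scores = boost_from_init(X, y, qids, init)
```

The only step specific to the combined model is the first line of `boost_from_init`: the boosted scores start from the base model's predictions rather than from zero, so the boosting rounds refine an already reasonable ranking.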

Key words: learning to rank; Random Forests; LambdaMART; ensemble learning; ranking model
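For readers unfamiliar with the two evaluation metrics named in the abstract, the following minimal sketch computes NDCG@k and ERR@k for a single query using their standard definitions. The function names and the `max_grade=4` default (five relevance levels, 0 to 4) are assumptions for illustration; the abstract does not state the grading scale or cutoff used in the paper.

```python
import math

def dcg_at_k(labels, k):
    """Discounted cumulative gain of the first k graded labels, in ranked order."""
    return sum((2 ** l - 1) / math.log2(i + 2) for i, l in enumerate(labels[:k]))

def ndcg_at_k(labels, k):
    """NDCG@k: DCG of the given ranking divided by the DCG of the ideal ranking."""
    ideal = dcg_at_k(sorted(labels, reverse=True), k)
    return dcg_at_k(labels, k) / ideal if ideal > 0 else 0.0

def err_at_k(labels, k, max_grade=4):
    """Expected Reciprocal Rank under the cascade user model.

    max_grade is an assumed top relevance grade, not a value from the paper.
    """
    err, p_continue = 0.0, 1.0
    for r, l in enumerate(labels[:k], start=1):
        stop = (2 ** l - 1) / 2 ** max_grade  # probability the user is satisfied here
        err += p_continue * stop / r
        p_continue *= 1 - stop
    return err

# Labels are listed in the order the model ranked the documents.
perfect = ndcg_at_k([2, 1, 0], 3)
reversed_order = ndcg_at_k([0, 1, 2], 3)
top_hit = err_at_k([4, 0, 0], 3)
```

Both metrics reward placing highly relevant documents near the top; ERR additionally discounts later documents according to how likely earlier ones were to have satisfied the user.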

CLC number: TP181

雷武1,2，廖闻剑2，彭艳兵2. 基于随机森林与LambdaMART的搜索排序模型[J]. 计算机与现代化, 2017, 0(3): 54-.

LEI Wu1,2， LIAO Wenjian2， PENG Yanbing2. Search Rank Model Based on Random Forests and LambdaMART[J]. Computer and Modernization, 2017, 0(3): 54-.

