Computer and Modernization (计算机与现代化)

• Artificial Intelligence •

User Interest Mining via Multivariate Probit Regression

  1. (Department of Information Technology, Shanghai Center for Student Affairs, Shanghai 200235, China)
  • Received: 2016-11-09 Online: 2017-06-23 Published: 2017-06-23
  • About the author: SHEN Qianghua (申强华, born 1969), male, from Xiangtan, Hunan; engineer in the Department of Information Technology, Shanghai Center for Student Affairs; master's degree; research interests: data analysis and data mining.

Abstract: Mining user interests is a fundamental problem in many fields, such as recommender systems, personalized retrieval, and online advertising. A user's historical actions on the Web or in the real world reflect that user's interests. However, when a user is on the Web for the first time, few or no historical actions are available, so it is difficult for a system to learn the user's interests. To address this cold-start problem, we propose a user interest mining method based on a variant of the multivariate Probit model, which learns a prior over the user's interests from the user's attributes. The attributes may include sign-up location, sign-up time, and other registration information. The posterior distribution of the model is simulated with a Markov chain Monte Carlo (MCMC) method to estimate the expectation of the user's interests. We evaluate the model on synthetic data and on a dataset collected from Douban on movie stars and their movies. The experiments show that the prior information effectively improves interest prediction for cold-start users.

Key words: user interest, multivariate Probit, MCMC method

CLC number:
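
For readers unfamiliar with the model family named in the abstract, the following is a minimal sketch of the textbook multivariate Probit form, not the variant proposed in the paper. It assumes a latent vector z = Bx + e with e ~ N(0, Sigma) and observed binary interests y_j = 1[z_j > 0]. The coefficient matrix `B`, the covariance `sigma`, the attribute vector `x`, and all function names are hypothetical values and names chosen for illustration. It estimates interest probabilities for a user with no history by plain Monte Carlo simulation; it does not implement the MCMC posterior inference described in the abstract.

```python
import math
import random


def cholesky(matrix):
    """Lower-triangular Cholesky factor of a small symmetric positive-definite matrix."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - s)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return lower


def sample_interests(x, B, chol, rng):
    """Draw one binary interest vector y, where y_j = 1 if z_j > 0 and z = B x + e."""
    n = len(B)
    g = [rng.gauss(0.0, 1.0) for _ in range(n)]  # independent standard normals
    y = []
    for j in range(n):
        mean = sum(B[j][k] * x[k] for k in range(len(x)))
        noise = sum(chol[j][k] * g[k] for k in range(j + 1))  # correlated noise e_j
        y.append(1 if mean + noise > 0 else 0)
    return y


def estimate_interest_probs(x, B, sigma, n_draws=20000, seed=0):
    """Monte Carlo estimate of P(y_j = 1 | x) for each interest j."""
    rng = random.Random(seed)
    chol = cholesky(sigma)
    counts = [0] * len(B)
    for _ in range(n_draws):
        for j, value in enumerate(sample_interests(x, B, chol, rng)):
            counts[j] += value
    return [c / n_draws for c in counts]


if __name__ == "__main__":
    # Hypothetical cold-start user described by two numeric registration attributes.
    x = [1.0, 0.5]
    B = [[0.8, -0.4], [-0.2, 0.6]]      # assumed attribute-to-interest weights
    sigma = [[1.0, 0.5], [0.5, 1.0]]    # assumed correlation between two interests
    print(estimate_interest_probs(x, B, sigma))
```

Because the assumed noise has unit variance on the diagonal of `sigma`, each estimated marginal should be close to the standard normal CDF evaluated at the corresponding entry of Bx. The off-diagonal entries of `sigma` affect only how the interests co-occur, which is what distinguishes a multivariate Probit from a set of independent Probit regressions.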