CTR Prediction Model Combining Attention Mechanism and Graph Neural Network

Abstract

Most CTR prediction algorithms initialize every feature embedding with the same fixed dimension, ignoring the fact that long-tail item features have low popularity. Giving them embeddings of the same length as those of head items leads to unbalanced model training and degrades the final prediction. To address this, this paper first uses an end-to-end differentiable framework that automatically selects a different embedding dimension for each feature according to its popularity. Second, it introduces a squeeze-and-excitation network mechanism and a multi-head self-attention mechanism with residual connections, which respectively learn feature importance dynamically and identify important feature combinations from different perspectives; a graph neural network then replaces the traditional inner product and Hadamard product to explicitly model second-order feature interactions. Finally, to further improve performance, a DNN component is combined with the shallow model to form a deep model, and a Bayesian optimization algorithm selects a set of hyperparameters for the deep model, avoiding a complex manual tuning process. Experiments on two benchmark datasets verify the effectiveness of the model.

Key words: CTR prediction, automatic embedding search, squeeze-and-excitation network, multi-head self-attention mechanism, graph neural network, Bayesian optimization
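The two attention components named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract gives no layer details, so the mean-pooling squeeze step, the two-layer ReLU excitation, the ReLU after the residual sum, and every name and shape below (`senet_reweight`, `self_attention_residual`, `W1`, `W2`, `Wq`, `Wk`, `Wv`, `Wr`) are assumptions borrowed from the common SENet and self-attention formulations used in CTR models.

```python
import numpy as np

def senet_reweight(E, W1, W2):
    """Squeeze-and-excitation over field embeddings (assumed formulation).

    E:  (batch, fields, dim) field embeddings
    W1: (fields, reduced) and W2: (reduced, fields) excitation weights
    """
    z = E.mean(axis=2)                # squeeze: one scalar per field
    a = np.maximum(z @ W1, 0.0)       # excitation, layer 1 with ReLU
    a = np.maximum(a @ W2, 0.0)       # excitation, layer 2: one weight per field
    return E * a[:, :, None]          # rescale each field embedding by its weight

def self_attention_residual(E, Wq, Wk, Wv, Wr, heads):
    """Multi-head self-attention across fields with a residual projection.

    E: (batch, fields, dim); Wq, Wk, Wv, Wr: (dim, dim); dim divisible by heads.
    """
    b, f, _ = E.shape

    def split(X):
        # (batch, fields, dim) -> (batch, heads, fields, dim // heads)
        return X.reshape(b, f, heads, -1).transpose(0, 2, 1, 3)

    Q, K, V = split(E @ Wq), split(E @ Wk), split(E @ Wv)
    scores = Q @ K.transpose(0, 1, 3, 2) / np.sqrt(Q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)       # numerically stable softmax
    w = np.exp(scores)
    w /= w.sum(axis=-1, keepdims=True)                 # attention over fields
    out = (w @ V).transpose(0, 2, 1, 3).reshape(b, f, -1)  # concatenate heads
    return np.maximum(out + E @ Wr, 0.0)               # residual connection + ReLU
```

In this sketch the first function produces one importance weight per feature field, while the second lets each field attend to all others, which is one way to read "learning feature importance" and "identifying important feature combinations" as separate steps.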

XIA Yi-chun, LI Wang-gen, LI Dou-dou, GE Ying-kui, WANG Zhi-ge. CTR Prediction Model Combining Attention Mechanism and Graph Neural Network[J]. Computer and Modernization, 2023, 0(03): 29-37.

