Person-post Matching Adj-LightGBM Algorithm Based on SMOTE and Bayesian Optimization

Abstract

Over the past two years, the COVID-19 pandemic has hit every industry hard and made traditional recruitment difficult to carry out: employers face large talent gaps, while job seekers cannot apply in person. Online recruitment has brought some convenience to both job seekers and employers, but person-post matching remains inefficient and imbalanced, so completing it accurately and quickly has become an urgent problem. To address this, a person-post matching algorithm, Adj-LightGBM, based on SMOTE and Bayesian optimization is proposed. First, the person-post data set is preprocessed. Second, SMOTE is used to oversample the successfully matched (positive) samples, giving a positive-to-negative sample ratio of 1:3 after processing. Then, Bayesian optimization is used on the validation set to find the optimal LightGBM model. Finally, the model is tested and evaluated, yielding an F1-score of 0.974 and an AUC of 0.971. Compared with support vector machine, random forest, and XGBoost, the proposed Adj-LightGBM algorithm not only achieves higher accuracy in person-post matching prediction but also has a significant advantage in model training efficiency.

Key words: person-post matching, imbalanced data, SMOTE, Bayesian optimization, LightGBM
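The oversampling step mentioned in the abstract can be illustrated with a minimal, standard-library-only sketch of the SMOTE idea: each synthetic positive sample is placed at a random point on the segment between a minority sample and one of its k nearest minority neighbours, and enough samples are generated to reach a 1:3 positive-to-negative ratio. The function names `smote` and `oversample_to_ratio`, the toy two-dimensional data, and the choice k=5 are assumptions made for illustration; this is not the authors' implementation, which this page does not show.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points interpolated between minority neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        # Pick a minority sample at random (a simplification of the original
        # scheme, which loops over every minority sample in turn).
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        # k nearest minority neighbours of the chosen sample.
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


def oversample_to_ratio(positives, negatives, neg_per_pos=3, k=5, seed=0):
    """Add synthetic positives until there is one positive per neg_per_pos negatives."""
    target = math.ceil(len(negatives) / neg_per_pos)
    n_new = max(0, target - len(positives))
    return positives + smote(positives, n_new, k=k, seed=seed)


# Toy data (hypothetical): 4 matched pairs versus 60 unmatched pairs.
positives = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
data_rng = random.Random(42)
negatives = [(data_rng.uniform(3, 6), data_rng.uniform(3, 6)) for _ in range(60)]

balanced_pos = oversample_to_ratio(positives, negatives)
print(len(balanced_pos), len(negatives))  # prints "20 60", i.e. a 1:3 ratio
```

Oversampling of this kind is normally applied to the training split only, so that validation and test scores are computed on real samples. In practice a library implementation, such as the SMOTE class in imbalanced-learn, would likely be used instead of hand-written code.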

LIU Fu-qian, QIN Hua-ni, LAI Hui-hui. Person-post Matching Adj-LightGBM Algorithm Based on SMOTE and Bayesian Optimization[J]. Computer and Modernization, 2023(3): 90-95.

