计算机与现代化 ›› 2023, Vol. 0 ›› Issue (01): 88-94.

• 算法设计与分析 • 上一篇    下一篇

基于模拟退火的扩展孤立森林异常检测算法

  

  1. (1.常熟理工学院计算机科学与工程学院,江苏 常熟 215500; 2.常熟市医学检验所,江苏 常熟 215500)
  • 出版日期:2023-03-02 发布日期:2023-03-02
  • 作者简介:王诗愉(2000—),男,四川江油人,本科生,研究方向:机器学习与异常检测,E-mail: 834772398@qq.com; 肖利东(2000—),男,福建晋江人,本科生,研究方向:机器学习与数据挖掘,E-mail:1328344592@qq.com; 严心淳(1980—),男,江苏常熟人,高级工程师,硕士,研究方向:医学数据分析挖掘,E-mail: 32598601@qq.com; 通信作者:应文豪(1979—),男,江苏常熟人,副教授,硕士生导师,博士,研究方向:数据挖掘与智能计算,E-mail: ywh@cslg.edu.cn。
  • 基金资助:
    国家重点研发计划项目(2018YFB1004901); 教育部人文社科基金资助项目(18YJCZH229); 国家自然科学基金资助项目(61972059); 江苏省教育科学“十三五”规划课题(X-a/2018/08); 江苏省大学生创新创业训练计划项目(202110333006)

Extended Isolated Forest Anomaly Detection Algorithm Based on Simulated Annealing

  1. (1. School of Computer Science and Engineering, Changshu Institute of Technology, Changshu 215500, China; 
    2. Changshu Medicine Examination Institute, Changshu 215500, China)
  • Online:2023-03-02 Published:2023-03-02

摘要: 扩展孤立森林(Extended Isolation Forest, EIF)有效解决了孤立森林(Isolation Forest, iForest)对局部异常点不敏感的问题,但EIF将轴平行的孤立条件更替为使用随机斜率的超平面,导致算法模型损失了一部分泛化能力,并由于大量的向量点乘运算增加了时间开销。针对上述情况,提出一种基于模拟退火的扩展孤立森林算法(Extended Isolation Forest based on Simulated Annealing, SA-EIF)。该算法根据每棵孤立树(Isolation Tree, iTree)对于数据集的预测结果计算每棵iTree的精度值和差异值,并基于此构建适应度函数,最终利用模拟退火算法筛选数棵检测性能较优的iTree构建集成学习模型。在ODDS 异常检测数据集中进行K折交叉验证的实验结果表明:SA-EIF算法对局部异常点敏感,较现有的EIF算法减少约20%~40% 的时间开销,提高约5%~10%的检测精度。

关键词: 扩展孤立森林, 孤立森林, 模拟退火, 异常检测

Abstract: Extended Isolation Forest (EIF) effectively solves the problem that Isolation Forest(iForest) is not sensitive to local abnormal points, but EIF replaces the isolated condition of axis-parallel with a hyperplane with random slope, which causes the algorithm model to lose part of the generalization ability, and increases time cost due to a large number of vector dot multiplication operations. In response to the above situation, an Extended Isolation Forest based on Simulated Annealing (SA-EIF) is proposed. The algorithm calculates the accuracy value and the difference value of each iTree (Isolation Tree) according to the prediction result of each iTree for the data set, then builds fitness function based on this. Finally, the iTree with better detection performance is selected by the simulated annealing algorithm to construct integrative learning model. The experimental results of K-fold cross-validation in the ODDS anomaly detection dataset indicate that the SA-EIF algorithm is sensitive to local anomalies, reducing the time cost by 20%~40% compared with EIF, and the recognition accuracy is about 5%~10% higher than EIF.

Key words: extended isolation forest, isolation forest, simulated annealing, anomaly detection