Computer and Modernization (计算机与现代化) ›› 2024, Vol. 0 ›› Issue (09): 74-81. doi: 10.3969/j.issn.1006-2475.2024.09.013

• Database and Data Mining •

Water Supply Pipeline Burr Data Detection Based on Improved LightGBM by Focal Loss

  1. (College of Economics and Management, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, Jiangsu, China)
  • Online: 2024-09-27 Published: 2024-09-29
  • Supported by:
    National Natural Science Foundation of China, General Program (72174086)

Abstract: To address the low recall caused by data imbalance when detecting burr data in water supply pipeline networks, this paper proposes a burr data detection method that improves LightGBM with Focal Loss. First, neighborhood-related features are constructed to reflect the characteristics of pipeline burr data. Second, the Focal Loss function is introduced into LightGBM to increase the weight the model places on hard-to-detect burr samples, and different Focal Loss parameter values are tested to balance precision and recall. Finally, models trained with different Focal Loss parameters are fused to further improve detection performance on imbalanced burr data. Experiments on real data from a city's water supply pipeline network show that, compared with a single model based on the cross-entropy loss function, the fused Focal Loss model improves recall and F1 score on burr data by 33.3 and 18 percentage points, respectively, although precision on burr data still needs further improvement. By starting from the loss function and dynamically adjusting the weights of hard and easy samples, the proposed method effectively improves burr data detection on imbalanced data.
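To make the reweighting idea in the abstract concrete, the following is a minimal sketch of the binary Focal Loss together with its first and second derivatives with respect to the raw (pre-sigmoid) score, which is the form a gradient-boosting library typically needs from a custom objective. It is an illustration only, not the authors' implementation: the function names, the parameter values `alpha=0.25` and `gamma=2.0`, and the per-sample scalar interface are all assumptions, and the wiring into LightGBM is not shown.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def _pt_and_alpha(z: float, y: int, alpha: float, eps: float = 1e-12):
    """Return (p_t, alpha_t): probability and class weight of the true class."""
    p = min(max(sigmoid(z), eps), 1.0 - eps)  # clamp to keep log() finite
    if y == 1:
        return p, alpha
    return 1.0 - p, 1.0 - alpha


def focal_loss(z: float, y: int, alpha: float = 0.25, gamma: float = 2.0) -> float:
    """FL = -alpha_t * (1 - p_t)^gamma * log(p_t) for one sample.

    alpha and gamma are illustrative defaults (assumption), not values
    reported in the abstract.
    """
    p_t, a_t = _pt_and_alpha(z, y, alpha)
    return -a_t * (1.0 - p_t) ** gamma * math.log(p_t)


def focal_grad_hess(z: float, y: int, alpha: float = 0.25, gamma: float = 2.0):
    """Analytic first and second derivatives of focal_loss w.r.t. the raw score z."""
    p_t, a_t = _pt_and_alpha(z, y, alpha)
    sign = 1.0 if y == 1 else -1.0      # dp_t/dz = sign * p_t * (1 - p_t)
    log_pt = math.log(p_t)
    mod = (1.0 - p_t) ** gamma          # modulating factor: small for easy samples
    u = gamma * p_t * log_pt + p_t - 1.0
    grad = sign * a_t * mod * u
    hess = a_t * p_t * mod * (
        (1.0 - p_t) * (gamma * log_pt + gamma + 1.0) - gamma * u
    )
    return grad, hess


if __name__ == "__main__":
    # A positive (burr) sample scored confidently right (z=3) vs. wrong (z=-3):
    # the hard sample keeps a far larger loss than the easy one.
    for z in (3.0, -3.0):
        g, h = focal_grad_hess(z, 1)
        print(f"z={z:+.1f} loss={focal_loss(z, 1):.6f} grad={g:.6f} hess={h:.6f}")
```

With `gamma=0` and `alpha=0.5` the expressions reduce to half the usual cross-entropy gradient and Hessian, which is a convenient sanity check. Note that for `gamma > 0` the second derivative can become negative for very poorly classified samples, so a practical custom objective would probably need to clip or otherwise regularize it; how the paper handles this is not stated in the abstract.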

Key words: anomaly detection, Focal Loss, LightGBM, imbalanced data, burr data

CLC number: