Cross-project Software Defect Number Prediction Method Based on Stacked Denoising Autoencoders

Abstract

Abstract: In practical software defect prediction, the project to be predicted may be brand new, or its historical data may be insufficient. One solution is to build a model from a project with sufficient data (the source project) and use it to predict defects in the new project (the target project). Existing approaches mainly rely on traditional machine learning methods to perform feature transfer learning between the source and target projects. However, data distributions differ substantially across projects, and the features learned by traditional machine learning methods have weak representational power, which leads to poor defect prediction performance. To address this problem, a cross-project defect prediction method based on stacked denoising autoencoders is proposed from a deep learning perspective. The method combines stacked denoising autoencoders with the maximum mean discrepancy (MMD) distance to extract transferable deep feature representations of the source and target projects, on which an effective defect number prediction model can then be trained. Experimental results on the Relink and AEEEM datasets show that, compared with the classical cross-project defect prediction methods Burak filter, Peters filter, TCA, and TCA+, the proposed method achieves the best prediction results in most cases.

Key words: cross-project software defect prediction, stacked denoising autoencoders, maximum mean discrepancy distance, deep feature representation
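
The maximum mean discrepancy named in the abstract and keywords measures how far apart two samples are in distribution. The following is a minimal sketch of that quantity only, not the authors' implementation: it assumes a Gaussian (RBF) kernel and the biased estimator, neither of which is specified in the abstract. The function names, the bandwidth `sigma`, and the synthetic feature matrices are all hypothetical and chosen purely for illustration.

```python
import numpy as np


def gaussian_kernel(a, b, sigma):
    """Gaussian (RBF) kernel matrix between the rows of a and the rows of b."""
    sq_dists = (
        np.sum(a ** 2, axis=1)[:, None]
        + np.sum(b ** 2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * sigma ** 2))


def mmd_squared(source, target, sigma=1.0):
    """Biased estimate of the squared MMD between two samples (assumed RBF kernel)."""
    k_ss = gaussian_kernel(source, source, sigma)
    k_tt = gaussian_kernel(target, target, sigma)
    k_st = gaussian_kernel(source, target, sigma)
    return k_ss.mean() + k_tt.mean() - 2.0 * k_st.mean()


# Synthetic stand-ins for source-project and target-project feature vectors.
rng = np.random.default_rng(0)
source_features = rng.normal(0.0, 1.0, size=(200, 8))
shifted_target = rng.normal(1.5, 1.0, size=(200, 8))  # different distribution
similar_target = rng.normal(0.0, 1.0, size=(200, 8))  # same distribution

mmd_shifted = mmd_squared(source_features, shifted_target, sigma=2.0)
mmd_similar = mmd_squared(source_features, similar_target, sigma=2.0)
print(f"shifted target: {mmd_shifted:.4f}, similar target: {mmd_similar:.4f}")
```

In a setup like the one the abstract describes, a term of this kind would plausibly be evaluated on the encoded features of the two projects, so that a smaller value indicates representations that transfer more readily. How the paper weights or combines it with the autoencoder objective is not stated in the abstract.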

LIU Luyao, HAN Peisheng. Cross-project Software Defect Number Prediction Method Based on Stacked Denoising Autoencoders[J]. Computer and Modernization, 2023, 0(04): 32-38.
