Computer and Modernization (计算机与现代化)

• Algorithm Design and Analysis •

A Heuristic Linear Regression Loss Function Selection Method

  

  1. Department of Mechanical, Electrical and Information Engineering, Ya’an Polytechnic College, Ya’an 625000, Sichuan, China
  • Received: 2017-02-28  Online: 2017-08-31  Published: 2017-09-01
  • About the author: ZHANG Yi (张祎) (1975-), female, from Ya’an, Sichuan; lecturer, M.S., Department of Mechanical, Electrical and Information Engineering, Ya’an Polytechnic College; research interests: computer applications and software engineering.




Abstract: The loss function quantifies the degree of information loss and error in regression analysis; it is the objective function that a machine learning algorithm minimizes. This paper addresses loss function selection for linear regression on finite data sets. For a given noise density, an optimal loss function exists under an asymptotic setting, e.g. squared loss is optimal for Gaussian noise. In real-life applications, however, the noise density is usually unknown and the training samples are finite. Robust statistics provides ways to select the loss function from statistical information about the noise density, but it rests on asymptotic assumptions and may not apply well to finite sample sets. For such practical problems, we draw on Vapnik’s ε-insensitive loss function and propose a heuristic method that sets the value of ε as a function of the number of samples and the noise variance. Experimental comparisons on linear regression problems show that the proposed loss function is more robust and yields higher prediction accuracy than the popular squared loss and Huber’s least-modulus loss.
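The abstract contrasts three losses: squared loss, Huber’s least-modulus loss, and Vapnik’s ε-insensitive loss with ε chosen from the sample size and noise variance. A minimal NumPy sketch of the three losses follows; note that `heuristic_eps` implements the well-known Cherkassky–Ma rule, which has the same general shape described in the abstract (ε shrinking with sample size and scaling with noise level), and is not necessarily the exact rule proposed in this paper.

```python
import numpy as np

def squared_loss(r):
    # Classic L2 loss on a residual r: asymptotically optimal for Gaussian noise.
    return r ** 2

def huber_loss(r, delta=1.0):
    # Huber's robust loss: quadratic near zero, linear in the tails,
    # so large outliers contribute less than under squared loss.
    a = np.abs(r)
    return np.where(a <= delta, 0.5 * r ** 2, delta * (a - 0.5 * delta))

def eps_insensitive_loss(r, eps):
    # Vapnik's ε-insensitive loss: residuals within the ±ε tube cost nothing.
    return np.maximum(np.abs(r) - eps, 0.0)

def heuristic_eps(noise_std, n):
    # Cherkassky & Ma's published heuristic of the same general shape as the
    # one described in the abstract: ε grows with the noise level and shrinks
    # as the number of training samples n grows. The paper's exact rule may differ.
    return 3.0 * noise_std * np.sqrt(np.log(n) / n)
```

For example, a residual of 0.5 inside a ±1 tube incurs zero ε-insensitive loss while still incurring positive squared and Huber loss, which is what makes the tube width ε the critical tuning parameter.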

Key words: loss function, support vector machine, squared loss function, parameter selection, VC dimension
