Computer and Modernization ›› 2022, Vol. 0 ›› Issue (08): 25-29.

• Algorithm Design and Analysis •

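An Improved KNN Medical Classification Algorithm Based on FLANN

  1. (School of Information Engineering, East China Institute of Technology, Nanchang 330013, China)
  • Online: 2022-08-22 Published: 2022-08-22
  • About the authors: GUO Kai (born 1999), male, from Fuzhou, Jiangxi, master's student; research interests: big data analysis and cloud computing, computer vision, image processing; E-mail: 2162602386@qq.com. Corresponding author: AI Jumei (born 1966), female, from Nanchang, Jiangxi, professor, master's degree; research interests: big data analysis and cloud computing, machine learning, information security; E-mail: 412221434@qq.com.
  • Funding:
    Open Fund of the Jiangxi Engineering Laboratory on Radioactive Geoscience and Big Data Technology (JELRGBDT201805)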

Abstract: This paper studies the use of the KNN (k-nearest neighbor) algorithm in disease prediction, identifies two shortcomings of KNN, and proposes the F_KNN (cyclic nearest neighbor search) algorithm to address them: 1) To address KNN's heavy computation and low efficiency, FLANN (fast nearest neighbor search) is used to search cyclically for the points closest to the sample under test. A number of these nearest neighbors are recorded as a nearest-neighbor subset, and this subset replaces the full training set when the sample is classified, which reduces the amount of computation and greatly improves the efficiency of KNN. 2) To address KNN's difficulty in classifying high-dimensional data sets, AHP (analytic hierarchy process) is used to study the relevance of the samples' feature attributes, and weights are assigned with suitable parameters, which improves the accuracy of KNN. The optimized algorithm is tested on a stroke data set, and the experimental results show that F_KNN reaches an accuracy of 96.2%. Compared with traditional KNN, F_KNN improves classification performance and greatly improves efficiency. F_KNN shows clear advantages on large, high-dimensional data sets and has good application prospects.

Key words: KNN, F_KNN, FLANN, AHP, stroke, disease prediction
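
The two ideas summarized in the abstract (classifying against a nearest-neighbor subset instead of the full set, and weighting features with AHP) can be illustrated with a small sketch. This is not the authors' implementation, and several parts are assumptions: exact brute-force search stands in for the FLANN query, the AHP weights use the row geometric-mean approximation, the way the two steps are combined (unweighted candidate search, then weighted re-ranking) is a guess, and all function names, the pairwise matrix, and the toy data are hypothetical.

```python
import math
from collections import Counter


def ahp_weights(pairwise):
    """Approximate AHP priority weights with the row geometric-mean method.

    pairwise[i][j] says how much more important feature i is than feature j.
    The matrix is a hypothetical input; the abstract does not give one.
    """
    n = len(pairwise)
    geo_means = [math.prod(row) ** (1.0 / n) for row in pairwise]
    total = sum(geo_means)
    return [g / total for g in geo_means]


def squared_distance(a, b, weights=None):
    """Squared Euclidean distance, optionally weighted per feature."""
    if weights is None:
        weights = [1.0] * len(a)
    return sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b))


def candidate_subset(train_x, query, subset_size):
    """Indices of the subset_size training points closest to query.

    Assumption: exact brute-force search stands in for the FLANN query.
    """
    order = sorted(range(len(train_x)),
                   key=lambda i: squared_distance(train_x[i], query))
    return order[:subset_size]


def subset_knn_predict(train_x, train_y, query, weights, k=3, subset_size=10):
    """Majority vote among the k weighted-nearest points of the subset."""
    subset = candidate_subset(train_x, query, subset_size)
    # Only the subset is re-ranked with the AHP-weighted distance.
    ranked = sorted(subset,
                    key=lambda i: squared_distance(train_x[i], query, weights))
    votes = Counter(train_y[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy data for illustration only (not the stroke data set from the paper).
toy_x = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
toy_y = [0, 0, 0, 1, 1, 1]
toy_weights = ahp_weights([[1, 3], [1 / 3, 1]])  # feature 0 judged 3x as important
print(subset_knn_predict(toy_x, toy_y, [0.2, 0.1], toy_weights, k=3, subset_size=4))
```

In a real setting the candidate search would be delegated to an approximate nearest-neighbor index, so that only the final weighted ranking over the small subset is computed exactly.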