Study of Text Classification Method Based on Rough Set and Improved KNN Algorithm

Computer and Modernization, 2012, Vol. 198, Issue (2): 86-89. doi: 10.3969/j.issn.1006-2475.2012.02.023

SHAO Li

Teaching Affairs Office, Aba Teachers College, Wenchuan, Sichuan 623000, China

Received: 2011-09-13  Revised: 1900-01-01  Online: 2012-02-24  Published: 2012-02-24

Abstract

Abstract: The KNN algorithm is widely used in automatic text classification and achieves high accuracy on low-dimensional text. On large collections of high-dimensional text, however, traditional KNN must compare each document against a large number of training samples, which increases the cost of similarity computation and lowers classification efficiency. To address this problem, this paper first applies rough set attribute reduction to the decision table of high-dimensional text features, removing redundant attributes, and then classifies the text with an improved cluster-based KNN algorithm. Simulation experiments show that the method improves the precision and accuracy of text classification.

Key words: rough set, improved KNN algorithm, text classification
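The two-stage pipeline summarized in the abstract (attribute reduction, then cluster-based KNN) can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not specify the reduction heuristic, the clustering scheme, or the similarity measure, so the greedy positive-region reduction, the leader-style clustering within each class, the cosine similarity, the `threshold` and `k` values, the toy data, and all function names below are assumptions chosen for illustration.

```python
from collections import defaultdict
import math


def positive_region_size(rows, labels, attrs):
    """Count rows whose equivalence class under `attrs` carries a single label."""
    groups = defaultdict(list)
    for row, y in zip(rows, labels):
        groups[tuple(row[a] for a in attrs)].append(y)
    return sum(len(ys) for ys in groups.values() if len(set(ys)) == 1)


def reduce_attributes(rows, labels):
    """Greedy backward elimination (assumed heuristic): drop an attribute
    whenever the positive region of the decision table is unchanged."""
    attrs = list(range(len(rows[0])))
    full = positive_region_size(rows, labels, attrs)
    for a in list(attrs):
        trial = [b for b in attrs if b != a]
        if trial and positive_region_size(rows, labels, trial) == full:
            attrs = trial
    return attrs


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def build_clusters(vectors, labels, threshold=0.8):
    """Leader-style clustering within each class (assumed scheme).
    Returns a list of (centroid, label) pairs."""
    clusters = []  # each entry: [sum_vector, count, label]
    for vec, y in zip(vectors, labels):
        for c in clusters:
            if c[2] != y:
                continue
            centroid = [s / c[1] for s in c[0]]
            if cosine(vec, centroid) >= threshold:
                c[0] = [s + x for s, x in zip(c[0], vec)]
                c[1] += 1
                break
        else:  # no sufficiently similar cluster of this class: start a new one
            clusters.append([list(vec), 1, y])
    return [([s / n for s in total], y) for total, n, y in clusters]


def classify(centroids, vec, k=3):
    """Similarity-weighted vote among the k most similar cluster centroids."""
    nearest = sorted(centroids, key=lambda c: cosine(vec, c[0]), reverse=True)[:k]
    votes = defaultdict(float)
    for centroid, y in nearest:
        votes[y] += cosine(vec, centroid)
    return max(votes, key=votes.get)


# Toy decision table: four binary term-presence attributes, three classes.
# Attribute 2 is noise and attribute 3 is constant, so both are redundant.
rows = [(1, 0, 1, 0), (1, 0, 0, 0), (0, 1, 1, 0),
        (0, 1, 0, 0), (1, 1, 1, 0), (1, 1, 0, 0)]
labels = ["A", "A", "B", "B", "C", "C"]

reduct = reduce_attributes(rows, labels)
projected = [[row[a] for a in reduct] for row in rows]
centroids = build_clusters(projected, labels)
prediction = classify(centroids, [1, 1], k=2)
print(reduct, prediction)  # [0, 1] C
```

In this sketch a test document is compared against three cluster centroids rather than six training rows, which is the source of the efficiency gain the abstract attributes to the cluster-based variant; on real data the saving depends on how tightly the documents of each class cluster.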

CLC number: TP392

SHAO Li. Study of Text Classification Method Based on Rough Set and Improved KNN Algorithm[J]. Computer and Modernization, 2012, 198(2): 86-89.

