Computer and Modernization

• Artificial Intelligence •

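A Biclustering Algorithm Based on PA Index in Microarray Gene Expression Data

  1. (1. School of Information Engineering, Guangdong Medical College, Dongguan 523808, China;
    2. School of Public Health, Guangdong Medical College, Dongguan 523808, China)
  • Received: 2014-09-24 Online: 2014-12-22 Published: 2014-12-22
  • About the authors: Lin Qin (林勤, born 1987), male, from Jieyang, Guangdong; assistant experimentalist at the School of Information Engineering, Guangdong Medical College; master's degree; research interests: data mining, parallel computing. Lin Sida (林斯达, born 1992), male, from Shantou, Guangdong; undergraduate student at the School of Public Health, Guangdong Medical College; research interests: data mining, bioinformatics. Zhu Wenmin (朱文敏, born 1992), male, from Heyuan, Guangdong; undergraduate student; research interests: cloud computing, data mining.
  • Funding:
    General Program of Guangdong Medical College (XK1330); Key Project (Invention Category) of the Guangdong Medical College Undergraduate Research Program (2012FZDI004); General Project (Social Sciences Category) of the Guangdong Medical College Undergraduate Research Program (2013SYDG009)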

Abstract: Existing biclustering algorithms rarely consider the overall partition quality of the clustering results they produce. To address this, a biclustering algorithm based on the PA index is proposed. The algorithm uses the PA index, which measures the partition quality of all clusters taken together, to construct the bicluster model. Following a heuristic greedy strategy, it iteratively inserts and deletes rows and columns to mine several biclusters with high partition quality. The algorithm is compared with the CC and FLOC algorithms in an experiment on a real dataset. The results show that the proposed algorithm obtains better results, which suggests that it is better able to detect local patterns that are both statistically and biologically significant.

Key words: PA index, biclustering, GO analysis, microarray gene expression data

CLC number:
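
The greedy insertion and deletion of rows and columns described in the abstract can be illustrated with a small sketch. The abstract does not define the PA index, so the sketch below does not implement it. As a stand-in score it uses the mean squared residue associated with the Cheng and Church (CC) algorithm, and the function names, the `delta` threshold, and the minimum-size parameters are all hypothetical choices made for illustration only.

```python
def mean_squared_residue(data, rows, cols):
    """Mean squared residue of the submatrix data[rows][cols].

    Stand-in quality score (NOT the paper's PA index): it is 0 when every
    entry equals a row effect plus a column effect.
    """
    row_mean = {i: sum(data[i][j] for j in cols) / len(cols) for i in rows}
    col_mean = {j: sum(data[i][j] for i in rows) / len(rows) for j in cols}
    all_mean = sum(row_mean.values()) / len(rows)
    total = sum(
        (data[i][j] - row_mean[i] - col_mean[j] + all_mean) ** 2
        for i in rows
        for j in cols
    )
    return total / (len(rows) * len(cols))


def greedy_bicluster(data, delta, min_rows=2, min_cols=2):
    """Greedy search for one bicluster whose score is at most delta.

    Hypothetical simplification: start from the full matrix, delete the
    single row or column that lowers the score most, then re-insert any
    row or column that keeps the score within delta.
    """
    rows = set(range(len(data)))
    cols = set(range(len(data[0])))
    score = mean_squared_residue(data, rows, cols)

    # Deletion phase: remove one row or column per iteration.
    while score > delta:
        candidates = []
        if len(rows) > min_rows:
            candidates += [
                (mean_squared_residue(data, rows - {i}, cols), "row", i)
                for i in sorted(rows)
            ]
        if len(cols) > min_cols:
            candidates += [
                (mean_squared_residue(data, rows, cols - {j}), "col", j)
                for j in sorted(cols)
            ]
        if not candidates:
            break  # size limits reached
        best_score, kind, index = min(candidates)
        if best_score >= score:
            break  # no single deletion improves the score
        (rows if kind == "row" else cols).discard(index)
        score = best_score

    # Insertion phase: add back rows, then columns, that stay within delta.
    for i in range(len(data)):
        if i not in rows:
            trial = mean_squared_residue(data, rows | {i}, cols)
            if trial <= delta:
                rows.add(i)
                score = trial
    for j in range(len(data[0])):
        if j not in cols:
            trial = mean_squared_residue(data, rows, cols | {j})
            if trial <= delta:
                cols.add(j)
                score = trial

    return sorted(rows), sorted(cols), score


# Toy matrix: rows 0-2 follow an additive pattern, row 3 does not.
toy = [
    [1, 2, 3],
    [2, 3, 4],
    [3, 4, 5],
    [9, 0, 6],
]
print(greedy_bicluster(toy, delta=0.01))  # ([0, 1, 2], [0, 1, 2], 0.0)
```

On the toy matrix, the only single deletion that yields a perfectly additive submatrix is the removal of row 3, so the search stops after one step. An index that scores all clusters jointly, as the abstract says the PA index does, would replace `mean_squared_residue` and would be evaluated over several biclusters at once rather than one.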