Computer and Modernization, 2022, Vol. 0, Issue (08): 43-49.

• Artificial Intelligence •

Association Analysis of Image Genetic Data Based on Group Sparse Joint Learning

  

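  (1. Department of Mathematics Teaching, Xi'an Jiaotong University City College, Xi'an, Shaanxi 710018, China; 2. School of Mathematics and Statistics, Xi'an Jiaotong University, Xi'an, Shaanxi 710049, China)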
  • Published: 2022-08-22 Online: 2022-08-22
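  • About the authors: ZHAO Yingli (b. 1995), female, from Xianyang, Shaanxi; teaching assistant and master's student; research interests: machine learning and data mining; E-mail: z_yingli@126.com. ZHU Xu (b. 1964), male, professor, Ph.D.; research interests: big data algorithms and analysis; E-mail: 1643760185@qq.com.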
  • Funding:
    Special Fund for New Faculty, Xi'an Jiaotong University City College (202002X10)

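Abstract: The development of imaging genetics has greatly advanced research on mental disorders. Its main task is to analyze and mine multimodal data to uncover the pathogenic mechanisms associated with a disease. However, the features of such data are usually correlated in groups or across multiple features, so traditional methods have difficulty identifying correlated disease mechanisms and tend to produce overly sparse solutions. To address these problems, this paper introduces the l1,2 norm, a regularization term that yields intra-group sparsity and inter-group smoothness, and combines it with the l2,1 norm, which yields inter-group sparsity and intra-group smoothness, to jointly penalize canonical correlation analysis (CCA). By optimizing the correlation between the two modalities, the method performs feature selection on two-modal data sets, selecting both correlated groups of features and correlated features within groups. Simulation experiments show that the method estimates the correlation coefficient between the two data sets fairly accurately while selecting the correlated inter-group and intra-group features. On a real schizophrenia data set, the method identifies more schizophrenia-related susceptibility genes and risk brain regions.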
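The abstract does not give the objective function or the solver, so the following is only a minimal NumPy sketch of the idea under stated assumptions, not the authors' implementation. It assumes (1) the l2,1 norm is the sum of the groups' Euclidean norms, (2) the l1,2 norm takes the exclusive-lasso form, the sum of the squared l1 norms of the groups, and (3) both penalties enter as proximal steps inside alternating updates in the style of the penalized matrix decomposition approach to sparse CCA, which treats X'X and Y'Y as identity matrices. Applying the two proximal operators one after the other is a heuristic, not necessarily the exact proximal operator of their sum. All function names, the group layout, and the λ values are hypothetical.

```python
import numpy as np


def l21_norm(w, groups):
    """l2,1 norm: sum of the Euclidean norms of the groups."""
    return sum(np.linalg.norm(w[g]) for g in groups)


def l12_norm(w, groups):
    """l1,2 norm, assumed exclusive-lasso form: sum of the squared l1 norms of the groups."""
    return sum(np.abs(w[g]).sum() ** 2 for g in groups)


def prox_l21(v, groups, lam):
    """Prox of lam * l2,1 (block soft-thresholding): weak groups are zeroed as a whole."""
    x = np.zeros_like(v, dtype=float)
    for g in groups:
        norm = np.linalg.norm(v[g])
        if norm > lam:
            x[g] = (1.0 - lam / norm) * v[g]
    return x


def prox_exclusive(v, lam):
    """Prox of lam * ||x||_1**2 for one group: soft-thresholding with a data-dependent threshold."""
    a = np.sort(np.abs(v))[::-1]  # magnitudes in descending order
    tau = 0.0
    for k in range(1, a.size + 1):
        t = 2.0 * lam * a[:k].sum() / (1.0 + 2.0 * lam * k)
        if a[k - 1] <= t:
            break  # the k-th largest entry would be thresholded to zero: support found
        tau = t
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)


def prox_l12(v, groups, lam):
    """Prox of lam * l1,2: the penalty separates over groups, so threshold each group on its own."""
    x = np.zeros_like(v, dtype=float)
    for g in groups:
        x[g] = prox_exclusive(v[g], lam)
    return x


def penalize(w, groups, lam21, lam12):
    """Hypothetical joint step: l1,2 prox then l2,1 prox (a heuristic, not the exact prox of the sum)."""
    return prox_l21(prox_l12(w, groups, lam12), groups, lam21)


def unit(w):
    """Scale to unit Euclidean norm; an all-zero vector is returned unchanged."""
    norm = np.linalg.norm(w)
    return w / norm if norm > 0 else w


def group_sparse_cca(X, Y, gx, gy, lam21=0.5, lam12=0.05, n_iter=50):
    """Alternating penalized updates; X'X and Y'Y are treated as identity (a common sparse-CCA simplification)."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    C = X.T @ Y / X.shape[0]  # sample cross-covariance matrix
    v = np.linalg.svd(C)[2][0]  # start from the leading right singular vector
    for _ in range(n_iter):
        u = unit(penalize(C @ v, gx, lam21, lam12))
        v = unit(penalize(C.T @ u, gy, lam21, lam12))
    r = np.corrcoef(X @ u, Y @ v)[0, 1]  # estimated canonical correlation
    return u, v, r


groups = [np.arange(0, 3), np.arange(3, 6)]
w = np.array([3.0, 2.5, 0.2, 0.3, 0.2, 0.1])
print(prox_l21(w, groups, 1.0))  # second group zeroed as a whole, first group stays dense
print(prox_l12(w, groups, 0.5))  # both groups kept, smallest entry inside each group zeroed

rng = np.random.default_rng(0)
n = 2000
z = rng.standard_normal(n)  # shared latent factor linking the two synthetic modalities
a = np.array([2.0, 1.5, 0.0, 0.0, 0.0, 0.0])  # planted X weights: group 0, one zero inside it
b = np.array([0.0, 0.0, 0.0, 2.0, 0.0, 1.5])  # planted Y weights: group 1, one zero inside it
X = np.outer(z, a) + rng.standard_normal((n, 6))
Y = np.outer(z, b) + rng.standard_normal((n, 6))
u, v, r = group_sparse_cca(X, Y, groups, groups)
print(np.round(u, 3), np.round(v, 3), round(r, 3))
```

On the toy vector, the l2,1 step removes the weak second group entirely while keeping every entry of the first group, whereas the l1,2 step keeps both groups but zeroes the smallest entry in each; these are the complementary sparsity patterns the abstract attributes to the two norms. On the synthetic two-modality data, the recovered weights should concentrate on the planted groups, with the planted within-group zero removed.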

关键词: l1,2范数, l2,1范数, 相关性, 组稀疏典型相关分析, 特征选择
