Computer and Modernization ›› 2025, Vol. 0 ›› Issue (03): 12-21. doi: 10.3969/j.issn.1006-2475.2025.03.003

• Artificial Intelligence •

 Multilevel Joint Graph Embedding for Lipophilic Molecular Classification

  1. (School of Mathematics and Computer Science, Yan’an University, Yan’an 716000, China)
  • Online: 2025-03-28 Published: 2025-03-28
  • Supported by:
    National Natural Science Foundation of China (62262067); Shaanxi Province Talent Project (YAU202213065, CXY202107); Yan’an University 14th Five-Year Plan Major Scientific Research Project (2021ZCQ012); Yan’an University Graduate Education Innovation Program (YCX2023006)

Abstract: Lipophilic molecule classification is an important research direction in bioinformatics and chemistry. Its goal is to classify molecules by lipophilicity on the basis of their chemical structure and functional characteristics. However, because the properties of lipophilic molecules are complex and diverse, traditional graph neural network classification methods neither extract the hierarchical relationships within a molecule effectively nor take its structural information fully into account, which leads to the loss of key-atom information and a lack of global structural information. To address these problems, a multilevel joint graph embedding network, Mul-JoG, is proposed. Mul-JoG combines Graph Transformer with graph pooling strategies to construct its network layers. The outputs of different layers are concatenated so that each layer fuses the information from all preceding layers, forming a multilevel joint graph embedding network. By obtaining the topological structure of molecules from multiple perspectives, the network captures their global information and interaction relationships, models complex molecular structure effectively, and classifies lipophilic molecules accurately. Experimental results on a drug molecule dataset show that Mul-JoG achieves 97.96% AUC and 92.94% ACC. Compared with the baseline method, AUC improves by 1.53 percentage points and ACC by 3.07 percentage points, indicating that Mul-JoG classifies lipophilic molecules more accurately.

Key words: lipophilicity classification, molecular representation learning, graph neural network, graph pooling strategy

CLC Number:
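
The layer-wise concatenation described in the abstract, where each layer fuses the outputs of all earlier layers, can be illustrated with a minimal pure-Python sketch. It is not the authors' Mul-JoG implementation. Neighbour averaging stands in for a Graph Transformer layer, mean pooling stands in for the paper's graph pooling strategy, and the names `aggregate`, `readout`, and `multilevel_embedding` are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def aggregate(adj: Matrix, feats: Matrix) -> Matrix:
    """Assumed stand-in for a Graph Transformer layer:
    average each atom's features with those of its bonded neighbours."""
    dim = len(feats[0])
    out = []
    for i, row in enumerate(adj):
        idx = [i] + [j for j, bonded in enumerate(row) if bonded and j != i]
        out.append([sum(feats[j][d] for j in idx) / len(idx) for d in range(dim)])
    return out


def readout(feats: Matrix) -> List[float]:
    """Assumed stand-in for graph pooling: mean over all atoms."""
    n = len(feats)
    return [sum(col) / n for col in zip(*feats)]


def multilevel_embedding(adj: Matrix, feats: Matrix, num_levels: int = 3) -> List[float]:
    """Build a graph-level vector from several levels, where every level
    receives the concatenation of all earlier levels' node features."""
    level_outputs = [feats]  # level 0 holds the raw atom features
    graph_vectors = []
    for _ in range(num_levels):
        # Concatenate, per atom, the outputs of all previous levels.
        joint = [sum((lvl[i] for lvl in level_outputs), []) for i in range(len(feats))]
        hidden = aggregate(adj, joint)
        level_outputs.append(hidden)
        graph_vectors.append(readout(hidden))
    # Join the per-level graph vectors into one multilevel embedding.
    return [x for vec in graph_vectors for x in vec]


# Toy three-atom chain (hypothetical data): atoms 0-1 and 1-2 are bonded.
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
embedding = multilevel_embedding(adj, feats, num_levels=3)
print(len(embedding))  # 14 = 2 + 4 + 8
```

Because this sketch has no learned projection, the feature width doubles at every level. A trained model would normally map each level back to a fixed hidden size before concatenating, then pass the joint embedding to a classifier.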