Computer and Modernization ›› 2023, Vol. 0 ›› Issue (07): 13-19. doi: 10.3969/j.issn.1006-2475.2023.07.003

• Artificial Intelligence •

Insertion/Deletion Genomic Variation Detection Method Based on Classification of Read Segments Within a Region

  

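  1. 1. School of Computer Science and Information Engineering, Harbin Normal University, Harbin 150025, Heilongjiang, China; 2. School of Computer and Control Engineering, Yantai University, Yantai 264005, Shandong, China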
  • Online: 2023-07-26  Published: 2023-07-27
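  • About the authors: Li Lanlan (李兰兰; born 1998), female, from Xuzhou, Jiangsu, master's student; research interests: intelligent software technology, variation detection; E-mail: lilanlan_hb@163.com. Gao Jianlong (高建龙; born 1996), male, from Siyang, Jiangsu, master's student; research interest: bioinformatics; E-mail: jlgao852@yeah.net. Corresponding author: Zhu Xiao (朱晓; born 1984), male, from Rizhao, Shandong, lecturer, Ph.D.; research interests: intelligent software technology, bioinformatics; E-mail: xzhu@ytu.edu.cn. Mu Peizheng (穆培政; born 1998), male, from Tai'an, Shandong, master's student; research interest: variation detection; E-mail: mupeizheng@163.com.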
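  • Supported by:
    National Natural Science Foundation of China (61902094); Natural Science Foundation of Heilongjiang Province (QC2018082); University Nursing Program for Young Scholars with Creative Talents in Heilongjiang Province (UNPYSCT-2018183)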

Insertion/Deletion Genomic Variations Detection Method Based on Regional Read Segments Classification

  1. 1. School of Computer Science and Information Engineering, Harbin Normal University, Harbin 150025, China;
    2. School of Computer and Control Engineering, Yantai University, Yantai 264005, China
  • Online: 2023-07-26  Published: 2023-07-27

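Abstract: Long reads produced by long-read sequencing technologies, particularly accurate long reads, provide a solid data foundation for variation detection. Insertions and deletions are among the more common genomic variations and an important source of pathogenic variation. Owing to the diploid nature and highly repetitive structure of the human genome, some complex forms of heterozygous insertion/deletion variation remain difficult to detect, and the sensitivity and precision of variation detection can still be improved. To address the poor performance of existing methods on complex heterozygous insertions/deletions, this paper proposes an insertion/deletion genomic variation detection method based on the classification of read segments within a region. Working from accurate long reads, the method applies a read segment classification algorithm based on pairwise sequence alignment to divide the read segments in a region into at most 2 groups, reflecting the diploid nature of the human genome, and thereby detects insertion/deletion variations more precisely. The method is compared with 5 other common variation detection methods on 2 simulated datasets and 1 real dataset. The experimental results show that the method improves the sensitivity of complex heterozygous insertion/deletion variation detection and achieves good insertion/deletion detection performance.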

Key words: long reads, read segments, classification, insertion variation, deletion variation, variation detection

Abstract: Long reads produced by long-read sequencing technology, especially accurate long reads, provide a good data basis for genome variation detection. Insertion/deletion variation is a common genomic variation and an important source of pathogenic variation. The diploid characteristics and highly repetitive structure of the human genome make some complex forms of heterozygous insertion/deletion variation difficult to detect, and there is still room for improvement in the sensitivity and accuracy of variation detection. To address the poor performance of previous methods on complex forms of heterozygous insertion/deletion variation, an insertion/deletion genomic variation detection method based on regional read segment classification is proposed. The method is based on accurate long reads. A read segment classification algorithm based on pairwise alignment divides the read segments in a region into at most two groups, according to the diploid characteristics of the human genome, so as to detect insertion/deletion variations more accurately. The proposed method is compared with five other common variation detection methods on two simulated datasets and one real dataset. Experimental results show that the method improves the sensitivity of complex heterozygous insertion/deletion variation detection and performs well in insertion/deletion variation detection.

Key words: long-read, read segments, classification, insertion variation, deletion variation, variation detection
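
The abstract states only that read segments in a region are divided into at most two groups using pairwise alignment; it does not give the algorithm itself. The following is a minimal illustrative sketch of that general idea, not the authors' implementation. It assumes a unit-cost edit distance as a stand-in for the pairwise alignment score, seeds the two groups with the most dissimilar pair of segments, and uses a hypothetical threshold `min_divergence` to decide whether the region looks homozygous (one group) or heterozygous (two groups). The function names are likewise hypothetical.

```python
from typing import List


def edit_distance(a: str, b: str) -> int:
    """Unit-cost global alignment (Levenshtein) distance between two sequences."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # gap in b (deletion)
                            curr[j - 1] + 1,            # gap in a (insertion)
                            prev[j - 1] + (ca != cb)))  # match or mismatch
        prev = curr
    return prev[-1]


def split_into_haplotype_groups(segments: List[str],
                                min_divergence: float = 0.05) -> List[List[str]]:
    """Split the read segments of one region into at most two groups.

    min_divergence is an assumed, illustrative threshold: if even the most
    dissimilar pair differs by less than this fraction of its length, all
    segments are kept in a single group.
    """
    if not segments:
        return []
    if len(segments) == 1:
        return [list(segments)]

    n = len(segments)
    dist = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dist[i][j] = dist[j][i] = edit_distance(segments[i], segments[j])

    # Use the most dissimilar pair of segments as the two group seeds.
    a, b = max(((i, j) for i in range(n) for j in range(i + 1, n)),
               key=lambda p: dist[p[0]][p[1]])
    longest = max(len(segments[a]), len(segments[b]), 1)
    if dist[a][b] / longest < min_divergence:
        return [list(segments)]

    # Assign every segment to the seed it aligns to more closely.
    groups: List[List[str]] = [[], []]
    for k, seg in enumerate(segments):
        groups[0 if dist[k][a] <= dist[k][b] else 1].append(seg)
    return groups
```

In this sketch, a region covered by three reference-like segments and two segments carrying a 6-base insertion would be split into one group per allele, after which each group could be compared against the reference separately. A real caller would also need to handle sequencing errors, segment boundaries, and coverage, none of which are modelled here.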
