Computer and Modernization ›› 2023, Vol. 0 ›› Issue (07): 13-19. doi: 10.3969/j.issn.1006-2475.2023.07.003


Insertion/Deletion Genomic Variations Detection Method Based on Regional Read Segments Classification

  

  1. School of Computer Science and Information Engineering, Harbin Normal University, Harbin 150025, China;
    2. School of Computer and Control Engineering, Yantai University, Yantai 264005, China
  • Online: 2023-07-26 Published: 2023-07-27

Abstract:  Long-read sequencing data, especially accurate long reads, provide a good basis for genomic variation detection. Insertion/deletion variations are a common type of genomic variation and an important source of pathogenic variation. The diploid nature and highly repetitive structure of the human genome make some complex forms of heterozygous insertion/deletion variations difficult to detect, and there is still room for improvement in the sensitivity and accuracy of variation detection. To address the poor performance of previous methods on complex heterozygous insertion/deletion variations, an insertion/deletion genomic variation detection method based on regional read segment classification is proposed. The method takes accurate long reads as input. A read segment classification algorithm based on pairwise alignment divides the read segments in a region into at most two groups, in line with the diploid nature of the human genome, so that insertion/deletion variations can be detected more accurately. The proposed method is compared with five other commonly used variation detection methods on two simulated datasets and one real dataset. Experimental results show that the method improves the sensitivity of detecting complex heterozygous insertion/deletion variations and performs well in insertion/deletion variation detection overall.

Key words: long-read, read segments, classification, insertion variation, deletion variation, variation detection
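
The classification step described in the abstract can be illustrated with a small sketch. The abstract does not give the algorithm's details, so everything below is an assumption made for illustration: the function names (`edit_distance`, `classify_segments`, `call_indel`), the seed-based grouping rule, the `min_split_distance` threshold, and the use of plain edit distance as a stand-in for pairwise alignment. It is not the authors' implementation; it only shows how read segments from one region might be split into at most two groups before each group is compared with the reference.

```python
from itertools import combinations


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, used here as a simple pairwise-alignment score."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # match or substitute
            ))
        prev = curr
    return prev[-1]


def classify_segments(segments, min_split_distance=3):
    """Split read segments into at most two groups (hypothetical rule).

    The two most dissimilar segments act as seeds; every segment joins
    the nearer seed. If even the seeds are closer than
    min_split_distance, the region is treated as a single group.
    """
    if len(segments) < 2:
        return [list(segments)]
    (i, j), farthest = max(
        (((x, y), edit_distance(segments[x], segments[y]))
         for x, y in combinations(range(len(segments)), 2)),
        key=lambda item: item[1],
    )
    if farthest < min_split_distance:
        return [list(segments)]
    groups = [[], []]
    for seg in segments:
        d0 = edit_distance(seg, segments[i])
        d1 = edit_distance(seg, segments[j])
        groups[0 if d0 <= d1 else 1].append(seg)
    return groups


def call_indel(group, reference):
    """Report a group's net length change relative to the reference."""
    lengths = sorted(len(s) for s in group)
    delta = lengths[len(lengths) // 2] - len(reference)  # median length
    if delta > 0:
        return ("INS", delta)
    if delta < 0:
        return ("DEL", -delta)
    return ("REF", 0)


if __name__ == "__main__":
    # Toy region: one haplotype carries an insertion, the other a deletion.
    reference = "ACGTACGTAC"
    reads = ["ACGTATTTTCGTAC", "ACGTATTTTCGTAC", "ACGTTAC",
             "ACGTTAC", "ACGTATTTTCGTAC"]
    for group in classify_segments(reads):
        print(len(group), call_indel(group, reference))
```

Capping the number of groups at two mirrors the diploid assumption: a region carries at most two haplotypes, so a heterozygous site with a different variant on each haplotype yields one call per group instead of a single blended signal.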
