Computer and Modernization


Realization of Accelerating Gene Big Data Analysis by Grid Computing

  

  (1. School of Electronics Engineering and Computer Science, Peking University, Beijing 100871, China;
   2. Beijing Chigene Translational Medical Research Center Co. Ltd., Beijing 100176, China)
  • Received: 2019-04-03  Online: 2019-08-15  Published: 2019-08-16

Abstract: Gene sequencing produces large volumes of data, analysis takes a long time, FPGA and GPU computing platforms are expensive to build, and computing software is insufficiently compatible. To address these problems, this paper applies distributed computing ideas to design a high-throughput sequencing data analysis architecture called Sequence Grid (SeqGrid). The architecture runs the open-source CentOS operating system on ordinary CPUs, mechanical hard disks, and SSDs, and uses the grid engine Sun Grid Engine (SGE) to dispatch bioinformatics software such as bwa and GATK concurrently. The results show that the analysis time for the 30 GB of data from a single whole-exome sequencing sample is shortened from 15 hours to 1 hour, a 15-fold speedup over the serial process, which effectively improves the efficiency of data analysis.
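
The concurrent dispatch described in the abstract can be illustrated with a minimal sketch. It is not the SeqGrid implementation, which the abstract does not detail. It assumes the input reads have already been split into paired FASTQ chunks, that the site defines an SGE parallel environment named `smp`, and that a script `merge_chunks.sh` performs the gather step; the file names, the function names `qsub_command` and `plan_alignment`, and that script are all hypothetical. The code only builds `qsub` command lines and does not submit anything.

```python
import shlex


def qsub_command(name, command, slots=1, stdout=None, hold_for=None, pe="smp"):
    """Build (not run) the argument list for one SGE qsub submission."""
    # -N names the job, -cwd runs it in the current directory,
    # -b y treats the command as a binary rather than a job script.
    args = ["qsub", "-N", name, "-cwd", "-b", "y"]
    if slots > 1:
        # The parallel environment name is site-specific; "smp" is an assumption.
        args += ["-pe", pe, str(slots)]
    if stdout:
        # Send the job's standard output to this file.
        args += ["-o", stdout]
    if hold_for:
        # Hold this job until the named jobs have finished.
        args += ["-hold_jid", ",".join(hold_for)]
    return args + list(command)


def plan_alignment(sample, chunks, reference, threads=4):
    """Return one bwa mem job per FASTQ chunk plus a dependent gather job."""
    jobs, names = [], []
    for i, (r1, r2) in enumerate(chunks):
        name = f"{sample}_bwa_{i}"
        # bwa mem writes SAM to standard output, captured here via qsub -o.
        bwa = ["bwa", "mem", "-t", str(threads), reference, r1, r2]
        jobs.append(
            qsub_command(name, bwa, slots=threads, stdout=f"{sample}.{i}.sam")
        )
        names.append(name)
    # merge_chunks.sh is a hypothetical placeholder for the gather step.
    gather = ["./merge_chunks.sh", sample]
    jobs.append(qsub_command(f"{sample}_merge", gather, hold_for=names))
    return jobs


if __name__ == "__main__":
    chunks = [(f"s1.{i}.R1.fq.gz", f"s1.{i}.R2.fq.gz") for i in range(3)]
    for job in plan_alignment("s1", chunks, "ref.fa"):
        print(shlex.join(job))
    # Speedup implied by the reported timings: 15 h serial vs 1 h on the grid.
    print("speedup:", 15 / 1)
```

The scatter jobs are independent, so the scheduler can run them on whichever nodes have free slots; the gather job is held with `-hold_jid` until all of them finish. The same pattern would apply to later stages such as GATK variant calling, for example by splitting the work per genomic interval.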

Key words: grid computing, high performance computing cluster, bioinformatics, high-throughput sequencing
