Computer and Modernization (计算机与现代化)

• Applications and Development •

Prediction of Protein-DNA Binding Sites Based on Combined Sequence and Structural Features

  

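  1. School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, Jiangsu 210094, China
  • Received: 2015-10-19 Online: 2016-01-22 Published: 2016-01-26
  • About the author: Yang Ji (born 1990), male, from Hefei, Anhui; master's student at the School of Computer Science and Engineering, Nanjing University of Science and Technology; research interests: pattern recognition and bioinformatics.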

Prediction of DNA-protein Binding Sites Based on Combining Sequence with Structure Information

  1. School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China
  • Received: 2015-10-19 Online: 2016-01-22 Published: 2016-01-26

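Abstract: Most current research on DNA-protein binding site prediction, in China and abroad, computes from either protein sequence information alone or protein structure information alone, while combining the two has so far given poor prediction results. This paper proposes a DNA-protein binding site prediction method that starts from the position-specific scoring matrix (PSSM) sequence feature of a protein and adds several residue-level structural features that combine well with it: solvent accessible surface area, relative solvent accessibility, depth, and protrusion index. Random under-sampling is used to address the class imbalance of the training set, and a support vector machine makes the final prediction. Experimental results show that the method has good predictive ability.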

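Keywords: position-specific scoring matrix, accessible surface area, relative solvent accessibility, depth index and protrusion index, random under-sampling, support vector machine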

Abstract: Most existing studies of DNA-protein binding site prediction, at home and abroad, rely on either protein sequence information alone or protein structure information alone, and combining the two kinds of information has so far yielded poor results. To address this problem, we combine protein structure features (accessible surface area, relative solvent accessibility, depth index, and protrusion index) with a protein sequence feature, the position-specific scoring matrix, to predict DNA-protein binding sites. We then apply random under-sampling to correct the class imbalance of the training dataset. Finally, we use a support vector machine to make the prediction. Experimental results show that the proposed method achieves good prediction performance.
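The feature combination described in the abstract can be pictured as a per-residue concatenation of the PSSM scores with the four structural descriptors. The following is only a minimal sketch under stated assumptions, not the authors' implementation: the function name `combine_features`, the 20-column PSSM row, the ordering of the structural values, and all numbers are hypothetical, and any normalisation or sliding-window step the paper may use is left out.

```python
from typing import List, Sequence


def combine_features(
    pssm_rows: Sequence[Sequence[float]],
    structure_rows: Sequence[Sequence[float]],
) -> List[List[float]]:
    """Concatenate each residue's PSSM row with its structural descriptors.

    Assumption: both tables list the residues of one protein in the same order.
    """
    if len(pssm_rows) != len(structure_rows):
        raise ValueError("PSSM and structure tables must cover the same residues")
    return [list(p) + list(s) for p, s in zip(pssm_rows, structure_rows)]


# Toy protein with three residues; every value below is made up for illustration.
pssm = [[0.0] * 20, [1.0] * 20, [-2.0] * 20]  # one 20-score PSSM row per residue
structure = [
    # assumed order: accessible surface area, relative solvent accessibility,
    # depth index, protrusion index
    [35.2, 0.31, 4.1, 0.62],
    [80.7, 0.74, 3.2, 0.85],
    [5.0, 0.04, 7.9, 0.20],
]

features = combine_features(pssm, structure)
print(len(features), len(features[0]))
```

Each residue then carries one vector of 24 values (20 sequence scores plus 4 structural values), which is the kind of input a classifier such as a support vector machine expects.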

Key words: position-specific scoring matrix, accessible surface area, relative solvent accessibility, depth index and protrusion index, random under-sampling, support vector machine
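Random under-sampling, named in the abstract as the remedy for the imbalanced training set, can be sketched with the Python standard library alone. This is an illustrative sketch under assumptions rather than the paper's procedure: the function name `random_undersample`, the 1:1 class ratio, the fixed seed, and the toy data are all hypothetical, since the abstract does not state the sampling ratio.

```python
import random
from typing import List, Sequence, Tuple


def random_undersample(
    samples: Sequence[Sequence[float]],
    labels: Sequence[int],
    seed: int = 0,
) -> Tuple[List[Sequence[float]], List[int]]:
    """Randomly drop majority-class samples until both classes have equal size.

    Assumption: labels are 1 (binding residue) and 0 (non-binding residue).
    """
    rng = random.Random(seed)
    positives = [i for i, y in enumerate(labels) if y == 1]
    negatives = [i for i, y in enumerate(labels) if y == 0]
    if len(positives) <= len(negatives):
        minority, majority = positives, negatives
    else:
        minority, majority = negatives, positives
    # Keep every minority sample and an equally sized random subset of the majority.
    kept = minority + rng.sample(majority, len(minority))
    rng.shuffle(kept)
    return [samples[i] for i in kept], [labels[i] for i in kept]


# Toy data: 3 binding residues against 17 non-binding residues (made-up values).
toy_samples = [[float(i)] for i in range(20)]
toy_labels = [1] * 3 + [0] * 17

balanced_x, balanced_y = random_undersample(toy_samples, toy_labels)
print(len(balanced_x), sum(balanced_y))
```

The balanced set would then be handed to a support vector machine for training; with scikit-learn, for example, that could be an `SVC` instance, although the abstract does not say which SVM implementation or kernel was used.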

CLC number: