Computer and Modernization, 2020, Vol. 0, Issue (06): 28-.

• Information Security

Reliability Analysis Method for Repairable Distributed Systems in Cloud Computing Environment

  

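  1. (1. College of Computer Science and Technology, Guizhou University, Guiyang, Guizhou 550025, China; 2. Guizhou Provincial Key Laboratory of Public Big Data, Guiyang, Guizhou 550025, China)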
  • Received: 2020-02-06 Online: 2020-06-24 Published: 2020-06-28
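  • About the authors: YANG Muchuan (1995-), male, born in Guangyuan, Sichuan, master's student; research interest: cloud computing; E-mail: morchyang@qq.com. LÜ Xiaodan (1970-), male, born in Shanghai, associate professor, M.S.; research interests: cloud computing, data analysis; E-mail: lvxiaodan111@126.com. JIANG Chaohui (1965-), male, born in Guang'an, Sichuan, professor, M.S.; research interests: cloud computing, information security, software engineering; E-mail: jiangchaohui@126.com.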
  • Funding:
    Science and Technology Program of Guizhou Province (Qiankehe Jichu [2017]1051)

Reliability Analysis Method for Repairable Distributed Systems in Cloud Computing Environment

  1. (1. College of Computer Science and Technology, Guizhou University, Guiyang 550025, China;
    2. Guizhou Provincial Key Laboratory of Public Big Data, Guiyang 550025, China)
  • Received:2020-02-06 Online:2020-06-24 Published:2020-06-28

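Abstract: With the further development of cloud computing technology, more and more application systems are hosted on cloud computing platforms, which places higher demands on the reliability of the many distributed systems that make up those platforms. Traditional analysis methods have difficulty analyzing the reliability of repairable distributed systems once the system grows large. To improve service quality and reduce the economic losses caused by violations of service level agreements (SLAs), this paper proposes a Markov-model-based reliability analysis method for repairable distributed systems. The state space of the system is first simplified, and its software and hardware states are sampled while the system is running. By analyzing the failure and repair processes of the distributed system, the node state transition matrix is computed from the failure probability sequence and the repair probability sequence over a given period, and the steady-state vector of this Markov matrix is obtained. This steady-state vector is then analyzed further, according to the characteristics of the specific distributed system, to derive the final reliability metric of the system. Finally, experiments verify the usability and effectiveness of the method.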
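The abstract does not give the paper's exact formulas, so the following is only a minimal sketch under stated assumptions: each node is treated as a discrete-time two-state (up/down) Markov chain, and its per-step failure probability p and repair probability r are estimated by simply averaging the sampled failure and repair probability sequences (the averaging step is an assumption, not the authors' procedure). The steady-state vector π is then the solution of πP = π with the entries of π summing to 1. The function names `node_matrix` and `steady_state` are hypothetical.

```python
from statistics import mean


def node_matrix(failure_probs, repair_probs):
    """Two-state node chain: state 0 = up, state 1 = down.

    Assumption: the per-step failure probability p and repair probability r
    are estimated by averaging the sampled probability sequences.
    """
    p = mean(failure_probs)
    r = mean(repair_probs)
    return [[1 - p, p],
            [r, 1 - r]]


def steady_state(P, tol=1e-12):
    """Solve pi @ P = pi with sum(pi) == 1 by Gaussian elimination."""
    n = len(P)
    # Row i encodes sum_j pi_j * P[j][i] - pi_i = 0; the last (redundant)
    # balance equation is replaced by the normalisation sum(pi) = 1.
    A = [[P[j][i] - (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    A[-1] = [1.0] * n
    b = [0.0] * (n - 1) + [1.0]
    # Forward elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda row: abs(A[row][col]))
        if abs(A[pivot][col]) < tol:
            raise ValueError("no unique steady state (chain not irreducible?)")
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = A[row][col] / A[col][col]
            for c in range(col, n):
                A[row][c] -= factor * A[col][c]
            b[row] -= factor * b[col]
    # Back substitution.
    pi = [0.0] * n
    for row in range(n - 1, -1, -1):
        s = sum(A[row][c] * pi[c] for c in range(row + 1, n))
        pi[row] = (b[row] - s) / A[row][row]
    return pi


if __name__ == "__main__":
    P = node_matrix([0.01, 0.02, 0.015], [0.5, 0.4, 0.6])
    pi = steady_state(P)
    print(f"steady-state P(up) = {pi[0]:.4f}")  # r / (p + r) = 0.5 / 0.515 ≈ 0.9709
```

For a two-state chain the result can be checked against the closed form π_up = r / (p + r); the linear-system solver itself accepts any finite transition matrix with a unique steady state.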

Keywords: reliability, distributed system, Markov model

Abstract: With the further development of cloud computing technology, more and more application systems are hosted on cloud computing platforms, which places higher demands on the reliability of the many distributed systems that make up such platforms. Traditional analysis methods have difficulty analyzing the reliability of repairable distributed systems when the system is large and dynamic. To improve service quality and reduce the economic losses caused by violations of service level agreements (SLAs), this paper proposes a Markov-model-based reliability analysis method for repairable distributed systems. After the state space of the system is simplified, its software and hardware states are sampled during operation, and its failure and repair processes are analyzed. From the failure probability sequence and the repair probability sequence over a given period, the node state transition matrix of the distributed system is calculated, and the steady-state vector of this Markov matrix is obtained. The steady-state vector is then analyzed further, according to the characteristics of the specific distributed system, to obtain the final reliability metric of the system. Finally, experiments verify the usability and effectiveness of the method.
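One possible reading of "simplifying the state space" and "according to the characteristics of the distributed system" is sketched below. It rests on assumptions the abstract does not state: n identical nodes that fail and are repaired independently with per-step probabilities p and r; a state space reduced from 2^n up/down combinations to k, the number of failed nodes (0 to n); and a majority-quorum system (as used by Raft or Paxos) counted as available when at most ⌊(n − 1)/2⌋ nodes are down. The steady-state vector is found here by power iteration rather than by solving the linear system directly. All function names are hypothetical.

```python
from math import comb


def lumped_matrix(n, p, r):
    """Transition matrix over k = number of failed nodes, k = 0..n.

    Assumes n identical nodes that fail and are repaired independently,
    each with per-step failure probability p and repair probability r.
    """
    P = [[0.0] * (n + 1) for _ in range(n + 1)]
    for k in range(n + 1):
        up, down = n - k, k
        for i in range(up + 1):  # i of the up nodes fail this step
            p_fail = comb(up, i) * p**i * (1 - p) ** (up - i)
            for j in range(down + 1):  # j of the down nodes are repaired
                p_rep = comb(down, j) * r**j * (1 - r) ** (down - j)
                P[k][k + i - j] += p_fail * p_rep
    return P


def power_iteration(P, tol=1e-13, max_iter=100_000):
    """Steady-state vector by repeatedly applying pi <- pi @ P."""
    m = len(P)
    pi = [1.0 / m] * m
    for _ in range(max_iter):
        nxt = [sum(pi[i] * P[i][j] for i in range(m)) for j in range(m)]
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    raise RuntimeError("power iteration did not converge")


def majority_availability(pi):
    """P(at most floor((n-1)/2) nodes are down), i.e. a strict majority is up."""
    n = len(pi) - 1
    return sum(pi[: (n - 1) // 2 + 1])


if __name__ == "__main__":
    n, p, r = 5, 0.015, 0.5
    pi = power_iteration(lumped_matrix(n, p, r))
    print("steady-state vector:", [round(x, 6) for x in pi])
    print(f"majority-quorum availability: {majority_availability(pi):.6f}")
```

Because the nodes are assumed independent, the steady-state vector should match a binomial distribution with per-node down probability p / (p + r), which makes a convenient sanity check. Correlated failures would break that shortcut, whereas the power-iteration step works for any irreducible, aperiodic transition matrix, for example one estimated from sampled system states.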

Key words: reliability, distributed system, Markov model
