Computer and Modernization (计算机与现代化), 2013, Vol. 1, Issue (2): 130-133. doi: 10.3969/j.issn.1006-2475.2013.02.032

• Application and Development •

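Design and Implementation of Task Scheduling Middleware for GPU Cluster

CHEN Chun-lei

  School of Automation, Northwestern Polytechnical University, Xi'an 710072, China
  • Received: 2012-12-21 Online: 2013-02-27 Published: 2013-02-27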

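Abstract: Because GPUs work as co-processors, a static task scheduling policy in a computer cluster leads to an unbalanced allocation of computing capacity. To address this problem, this paper proposes a weight-based dynamic task scheduling policy and applies it to GPU clusters in the form of middleware. The policy treats all GPUs in the cluster as a single pool but does not depend on global information: each cluster node decides whether to use a local or a remote GPU solely from GPU weights that it maintains locally. As the carrier of the policy, the middleware keeps the policy transparent to users. It consists of three parts: an API library, a resource management daemon, and a GPU execution daemon. Experimental results on a two-node validation platform show that the policy achieves a GPU utilization 16% higher than a static scheduling policy and 45% higher than another dynamic policy that relies on global information (the global-queue-based policy).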

Key words: GPU cluster, dynamic task scheduling, middleware
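The abstract states only that each node chooses between local and remote GPUs using weights it maintains itself, without global information; it does not say how the weights are defined or updated. The Python sketch below is one possible reading under explicit assumptions, not the paper's method: every GPU (local or remote) sits in one per-node table, a GPU's weight is a hypothetical throughput estimate in tasks per second, the node discounts that weight by its own in-flight tasks, and it refreshes the weight from locally observed latency with an exponential moving average. The names `GpuEntry`, `WeightedScheduler`, `alpha` and the `node:gpu` ID format are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class GpuEntry:
    """One GPU as seen by a single node. Local and remote GPUs share one table."""
    gpu_id: str           # hypothetical naming scheme, e.g. "node1:gpu0"
    is_local: bool
    weight: float         # assumption: estimated throughput in tasks/second
    outstanding: int = 0  # tasks this node dispatched that have not completed yet

    def effective_weight(self) -> float:
        # Assumption: discount the weight by this node's own in-flight tasks,
        # so no information from other nodes is needed.
        return self.weight / (1 + self.outstanding)


class WeightedScheduler:
    """Per-node scheduler that consults only locally maintained weights."""

    def __init__(self, entries: Iterable[GpuEntry], alpha: float = 0.5):
        self.entries: Dict[str, GpuEntry] = {e.gpu_id: e for e in entries}
        self.alpha = alpha  # hypothetical smoothing factor for weight updates

    def dispatch(self) -> str:
        # Choose the GPU with the largest effective weight; on a tie, prefer a local GPU.
        best = max(self.entries.values(),
                   key=lambda e: (e.effective_weight(), e.is_local))
        best.outstanding += 1
        return best.gpu_id

    def complete(self, gpu_id: str, elapsed_s: float) -> None:
        # Update the weight from locally observed latency (exponential moving average).
        entry = self.entries[gpu_id]
        entry.outstanding = max(0, entry.outstanding - 1)
        observed = 1.0 / elapsed_s
        entry.weight = (1 - self.alpha) * entry.weight + self.alpha * observed


if __name__ == "__main__":
    sched = WeightedScheduler([
        GpuEntry("node0:gpu0", is_local=True, weight=1.0),
        GpuEntry("node1:gpu0", is_local=False, weight=0.8),  # assumed remote penalty
    ])
    print([sched.dispatch() for _ in range(4)])
    # ['node0:gpu0', 'node1:gpu0', 'node0:gpu0', 'node1:gpu0']
```

In this sketch the node fills its local GPU first and spills work to the remote GPU only once the local GPU's discounted weight falls below the remote one. This is the kind of load-dependent local/remote choice that a static assignment cannot make. Because the decision uses only the node's own table, it needs no shared global queue.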