Computer and Modernization

• Operating Systems

File Scheduling Algorithm Based on a Magneto-optical Virtual Storage System

  1. (School of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, Jiangsu, China)
  • Received: 2018-10-17 Online: 2019-05-14 Published: 2019-05-14
  • About the authors: WANG Zixuan (b. 1993), male, from Fuyang, Anhui, master's student; research interest: big data storage technology; E-mail: 965565284@qq.com. WEI Li, from Wuxi, Jiangsu, master's student; research interests: machine learning, big data analysis. ZHANG Yuping (b. 1959), male, associate professor; research interests: software engineering, components.
  • Funding:
    National Natural Science Foundation of China (61402225); Natural Science Foundation of Jiangsu Province (BK20140832); China Postdoctoral Science Foundation (2013M540447)

Abstract: A Hadoop Distributed File System deployed on an optical disc library (HDFS optical disc library) suits current big data storage requirements well in terms of unit storage cost, data security, and service life, but HDFS is poorly suited to storing large numbers of small files and to real-time data reads. To make the HDFS optical disc library applicable to more big data storage scenarios, this paper proposes a Magneto-optical Virtual Storage System (MOVS). The system places a disk cache between the HDFS optical disc library and its users. Within the disk cache, file tag classification, virtual storage, and small file merging combine small files into large files suited to storage on the HDFS optical disc library, which raises the system's data transfer speed. The system also applies file scheduling algorithms, including file pre-fetching and cache replacement, to update the files in the disk cache dynamically and reduce the number of times users access the HDFS optical disc library. Experimental results show that MOVS substantially improves response time and data transfer speed relative to the HDFS optical disc library alone.

Key words: disk cache, virtual storage, file pre-fetching, cache replacement, small file merging
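
The abstract names small file merging by file tag but gives no implementation details. The following is a minimal Python sketch of one way such a step could work, not the authors' method: small files sharing a tag are packed into larger containers, and an index records where each file sits. The names `IndexEntry`, `merge_small_files`, `read_file`, and the `block_size` parameter are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class IndexEntry:
    """Location of one small file inside a merged container (hypothetical layout)."""
    container: str
    offset: int
    length: int


def merge_small_files(files, tags, block_size=64):
    """Pack small files that share a tag into containers of about block_size bytes.

    files: dict mapping file name -> bytes
    tags:  dict mapping file name -> tag string
    Returns (containers, index): container id -> bytes, file name -> IndexEntry.
    A single file larger than block_size gets a container of its own.
    """
    by_tag = defaultdict(list)
    for name in sorted(files):
        by_tag[tags[name]].append(name)

    containers = {}
    index = {}
    for tag, names in by_tag.items():
        seq = 0
        buf = bytearray()
        for name in names:
            data = files[name]
            # Close the current container if this file would overflow it.
            if buf and len(buf) + len(data) > block_size:
                containers[f"{tag}-{seq}"] = bytes(buf)
                seq += 1
                buf = bytearray()
            index[name] = IndexEntry(f"{tag}-{seq}", len(buf), len(data))
            buf.extend(data)
        if buf:
            containers[f"{tag}-{seq}"] = bytes(buf)
    return containers, index


def read_file(containers, index, name):
    """Recover one small file from its container using the index."""
    entry = index[name]
    blob = containers[entry.container]
    return blob[entry.offset:entry.offset + entry.length]
```

In a setup like the one the abstract describes, the containers would presumably be what gets written to the HDFS optical disc library, while the index stays in the disk cache so a read can locate a small file without scanning a container.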
