To improve data availability and cluster performance, HDFS currently adopts a uniform data replication policy. However, files differ in popularity, sometimes
enormously, and concentrated access to highly popular data can hurt job performance. To address this problem, we propose a dynamic replication strategy based on predicted
popularity. Using the recent popularity history of each file, we first forecast popularity with a grey prediction model and then apply a Markov prediction model to correct
the deviation caused by burst and shifting access patterns, obtaining an accurate predicted popularity for the file. A finite-channel service model built on the predicted
popularity is then used to calculate the minimum number of replicas that meets user demand. Experimental results show that, compared with default data replication, our
strategy more effectively avoids contention, reduces job completion time, and alleviates network traffic.
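
As a rough illustration of the two computational steps named above, the following Python sketch fits a standard GM(1,1) grey model to a short access history and then sizes the replica count from the forecast. Several things here are assumptions rather than details taken from the abstract: the function names gm11_forecast and min_replicas are hypothetical, the Markov correction step is omitted, and the finite-channel service model is approximated by the Erlang B loss formula with each replica treated as one service channel. The sample access counts, service time, and blocking target are made up for illustration.

```python
import math


def gm11_forecast(history):
    """One-step-ahead forecast from a standard GM(1,1) grey model."""
    n = len(history)
    if n < 4:
        raise ValueError("GM(1,1) is usually fitted on at least 4 points")
    # Accumulated generating operation: running sum of the raw series.
    acc, total = [], 0.0
    for x in history:
        total += x
        acc.append(total)
    # Negated background values (means of consecutive accumulated points).
    u = [-0.5 * (acc[k] + acc[k - 1]) for k in range(1, n)]
    y = list(history[1:])
    m = n - 1
    # Least-squares fit of y = a * u + b via the 2x2 normal equations.
    s_uu = sum(v * v for v in u)
    s_u = sum(u)
    s_uy = sum(v * w for v, w in zip(u, y))
    s_y = sum(y)
    det = m * s_uu - s_u * s_u
    a = (m * s_uy - s_u * s_y) / det
    b = (s_uu * s_y - s_u * s_uy) / det
    if abs(a) < 1e-12:
        return b  # flat series: the model degenerates to a constant

    def acc_hat(k):
        # Time-response function of the accumulated series (0-indexed k).
        return (history[0] - b / a) * math.exp(-a * k) + b / a

    # Undo the accumulation to get the next raw value.
    return acc_hat(n) - acc_hat(n - 1)


def min_replicas(offered_load, max_blocking, cap=1000):
    """Smallest channel count whose Erlang B blocking is <= max_blocking.

    ASSUMPTION: one replica = one service channel, and requests that find
    all channels busy are lost. The abstract does not specify this model.
    """
    if offered_load < 0 or not 0 < max_blocking < 1:
        raise ValueError("need offered_load >= 0 and 0 < max_blocking < 1")
    blocking = 1.0  # blocking probability with zero channels
    for channels in range(1, cap + 1):
        # Standard Erlang B recursion on the number of channels.
        blocking = offered_load * blocking / (channels + offered_load * blocking)
        if blocking <= max_blocking:
            return channels
    raise ValueError("demand cannot be met within the replica cap")


if __name__ == "__main__":
    accesses = [40, 46, 55, 63, 74]      # hypothetical accesses per interval
    predicted = gm11_forecast(accesses)  # forecast for the next interval
    load = predicted * 0.05              # hypothetical mean service time factor
    print(predicted, min_replicas(load, 0.01))
```

In this sketch the offered load is the predicted access rate multiplied by a mean service time, so a hotter file yields a larger load and therefore more replicas. A fuller version would adjust the grey forecast with the Markov state correction before sizing.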