Computer and Modernization ›› 2021, Vol. 0 ›› Issue (10): 49-56.

A Spark Streaming Parameter Optimization Method Based on Deep Reinforcement Learning

  

  (1. College of Computer Science and Technology, Guizhou University, Guiyang 550025, China;
    2. Guizhou Provincial Key Laboratory of Software Engineering and Information Security, Guiyang 550025, China;
    3. Iflytek Co., Ltd., Hefei 230011, China)
  • Online: 2021-10-14 Published: 2021-10-14

Abstract: Spark Streaming is a mainstream open-source distributed stream analysis framework, and optimizing its performance is an active research topic. Tuning configuration parameters for a given business scenario is an important factor in improving that performance. Spark Streaming exposes more than 200 configurable parameters, so tuning them demands considerable experience, and a poorly chosen configuration degrades the execution performance of streaming jobs. To address this parameter configuration problem, a Spark Streaming parameter optimization method based on deep reinforcement learning (DQN-SSPO) is proposed. It recasts parameter configuration optimization as the problem of obtaining the maximum return when training a deep reinforcement learning model, and it introduces a weighted state space transfer method to increase the probability of receiving high-reward feedback during model training. Experiments on three typical streaming analysis tasks show that, after parameter optimization, streaming jobs on Spark Streaming reduce total scheduling time by 27.93% and total processing time by 42%.

Key words: Spark Streaming, performance optimization, deep reinforcement learning, parameter tuning
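
The formulation in the abstract, where a parameter configuration is a state, a parameter adjustment is an action, and a reduction in job time is the reward, can be illustrated with a small sketch. This is not the DQN-SSPO implementation: it swaps the deep Q-network for a tabular Q-learning update to stay short, the parameter names and value grids are hypothetical placeholders rather than real Spark keys, `simulated_job_time` is a made-up stand-in for actually running a streaming job, and the reward-weighted exploration in `choose_action` is only one assumed reading of "weighted" state transitions, since the abstract does not specify the method.

```python
import random

# Hypothetical tunable parameters and candidate values (not real Spark keys).
PARAM_GRID = {
    "executor_memory_gb": [1, 2, 4, 8],
    "executor_cores": [1, 2, 4],
    "parallelism": [8, 16, 32, 64],
}
NAMES = list(PARAM_GRID)

# An action moves one parameter one step down or up in its value list.
ACTIONS = [(p, d) for p in range(len(NAMES)) for d in (-1, 1)]


def simulated_job_time(state):
    """Made-up cost model standing in for a measured streaming job time."""
    mem, cores, par = (PARAM_GRID[n][i] for n, i in zip(NAMES, state))
    return 100.0 / (mem * cores) + abs(par - 32) * 0.5 + 0.2 * mem * cores


def step(state, action):
    """Apply an action (clamped to the grid); reward is the time saved."""
    p, d = action
    idx = min(max(state[p] + d, 0), len(PARAM_GRID[NAMES[p]]) - 1)
    nxt = state[:p] + (idx,) + state[p + 1:]
    return nxt, simulated_job_time(state) - simulated_job_time(nxt)


def choose_action(q_row, epsilon, rng):
    """Greedy choice, with exploration weighted toward higher-valued actions.

    The weighting is an assumption for illustration, not the paper's method.
    """
    if rng.random() < epsilon:
        low = min(q_row)
        weights = [q - low + 0.1 for q in q_row]  # strictly positive weights
        return rng.choices(range(len(ACTIONS)), weights=weights, k=1)[0]
    return max(range(len(ACTIONS)), key=q_row.__getitem__)


def train(episodes=300, steps=20, alpha=0.5, gamma=0.9, epsilon=0.3, seed=0):
    """Tabular Q-learning over configurations; returns the best state seen."""
    rng = random.Random(seed)
    q = {}
    best_state, best_time = None, float("inf")
    for _ in range(episodes):
        state = tuple(rng.randrange(len(PARAM_GRID[n])) for n in NAMES)
        for _ in range(steps):
            row = q.setdefault(state, [0.0] * len(ACTIONS))
            a = choose_action(row, epsilon, rng)
            nxt, reward = step(state, ACTIONS[a])
            nxt_row = q.setdefault(nxt, [0.0] * len(ACTIONS))
            # Standard Q-learning update toward reward plus discounted return.
            row[a] += alpha * (reward + gamma * max(nxt_row) - row[a])
            state = nxt
            t = simulated_job_time(state)
            if t < best_time:
                best_state, best_time = state, t
    return best_state, best_time
```

Calling `best_state, best_time = train()` returns the index tuple of the fastest configuration visited under the toy cost model, together with its simulated time. In a real setting the cost model would be replaced by measured scheduling and processing times from submitted jobs, and the table by a neural network, as the method's name suggests.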