Loop Skewing Optimization on Global Data Regrouping

Computer and Modernization, 2017, Vol. 0, Issue (6): 45-49. doi: 10.3969/j.issn.1006-2475.2017.06.009

(1. China South Power Grid International Co., Ltd., Guangzhou 510080, China;
2. 5th Lab, Electric Power Information Security Classified Protection Test and Evaluation Center, Guangzhou 510080, China;
3. State Key Laboratory of Mathematical Engineering and Advanced Computing, Zhengzhou 450002, China)

Received: 2016-10-01; Online: 2017-06-23; Published: 2017-06-23

Abstract

Loop skewing is a loop transformation used in program optimization. It changes the shape of the iteration space so that iterations linked by loop-carried dependences can be handled with conventional parallelization, which allows the loop to be computed in parallel. After loop skewing, however, the data processed in parallel is no longer contiguous in memory, and the number of iterations executed differs from one pass of the skewed loop to the next. To make full use of SIMD extensions, this paper presents a loop skewing optimization based on global data regrouping. First, we analyze loop skewing and regroup the array data to address the non-contiguous data problem; this improves data locality and makes vector operations straightforward. Second, we implement non-full vector operations to address the varying iteration counts, so that the tail loop can also be executed in vectorized form. Finally, we evaluate the approach on a wavefront program. After optimization, the program's execution speed increases by 10.73 times on average.
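The ideas named in the abstract can be illustrated with a small sketch. It is not the paper's implementation: the recurrence, the per-diagonal buffer layout, the function names, and the vector width `VL` are all assumptions chosen for illustration, and the "vector" lanes are simulated with an ordinary Python loop. The sketch skews a wavefront loop nest onto anti-diagonals `d = i + j`, regroups the array so each diagonal is stored contiguously, and processes each diagonal in chunks of `VL` elements, where the last chunk may be narrower (standing in for a non-full vector operation).

```python
VL = 4  # hypothetical SIMD width in elements (assumption, not from the paper)


def wavefront_reference(n):
    """Original loop nest: a[i][j] depends on a[i-1][j] and a[i][j-1]."""
    a = [[1] * n for _ in range(n)]
    for i in range(1, n):
        for j in range(1, n):
            a[i][j] = a[i - 1][j] + a[i][j - 1]
    return a


def wavefront_skewed_regrouped(n):
    """Skewed traversal over anti-diagonals with one contiguous buffer per diagonal."""

    def lo(d):
        # Smallest row index i that lies on anti-diagonal d.
        return max(0, d - (n - 1))

    def hi(d):
        # Largest row index i that lies on anti-diagonal d.
        return min(n - 1, d)

    # Regrouped layout: diag[d][k] holds a[i][d - i] with i = lo(d) + k.
    # Boundary cells (i == 0 or j == 0) keep their initial value 1.
    diag = [[1] * (hi(d) - lo(d) + 1) for d in range(2 * n - 1)]

    for d in range(2, 2 * n - 1):       # diagonals must run in order
        first = max(1, lo(d))           # skip the i == 0 boundary
        last = min(n - 1, d - 1)        # skip the j == 0 boundary
        prev, cur = diag[d - 1], diag[d]
        base_prev, base_cur = lo(d - 1), lo(d)
        for start in range(first, last + 1, VL):
            width = min(VL, last + 1 - start)   # < VL only in the tail chunk
            # Cells of one diagonal are independent, so these lanes could
            # run as a single (possibly partial) vector operation.
            for i in range(start, start + width):
                up = prev[(i - 1) - base_prev]   # a[i-1][j]
                left = prev[i - base_prev]       # a[i][j-1]
                cur[i - base_cur] = up + left

    # Scatter the regrouped data back to the original row-major layout.
    return [[diag[i + j][i - lo(i + j)] for j in range(n)] for i in range(n)]
```

In this layout both operands of a chunk are read from consecutive positions of the previous diagonal's buffer, whereas in the original row-major array neighbouring cells of one anti-diagonal are `n - 1` elements apart.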

Key words: SIMD, loop skewing, data regrouping, non-full vector operation

CLC Number: TP312

CHEN Hua-jun1,2, WANG Qi3, HONG Chao1,2, FANG Meng1,2. Loop Skewing Optimization on Global Data Regrouping[J]. Computer and Modernization, 2017, 0(6): 45-49.

