[2] Pennycook S J, Hughes C J, Smelyanskiy M, et al. Exploring SIMD for molecular dynamics, using Intel® Xeon® processors and Intel® Xeon Phi™ coprocessors[C]// Proceedings of the 2013 IEEE 27th International Symposium on Parallel and Distributed Processing. 2013:1085-1097.
[3] Liu Xing, Smelyanskiy M, Chow E, et al. Efficient sparse matrix-vector multiplication on x86-based many-core processors[C]// Proceedings of the 27th International ACM Conference on Supercomputing. 2013:273-282.
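[4] Gao Wei, Zhao Rongcai, Han Lin, et al. Overview of SIMD auto-vectorization compiler optimization[J]. Journal of Software, 2015,26(6):1265-1284. (in Chinese)
[5] Chen Huowang, Liu Chunlin, Tan Qingping, et al. Programming Languages: Principles of Compilation[M]. 3rd ed. Beijing: National Defense Industry Press, 2000. (in Chinese)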
[6] Zhao Yuan, Kennedy K. Scalarization using loop alignment and loop skewing[J]. Journal of Supercomputing, 2005,31(1):5-46.
[7] Sarkar V. Optimized unrolling of nested loops[C]// Proceedings of the 14th ACM International Conference on Supercomputing. 2000:153-166.
[8] Liu Jun, Zhang Yuanrui, Jang O, et al. A compiler framework for extracting superword level parallelism[C]// Proceedings of the 33rd ACM SIGPLAN Conference on Programming Language Design and Implementation. 2012:347-358.
[9] Liu Peng, Zhao Rongcai, Gao Wei, et al. A new algorithm to exploit superword level parallelism[C]// Proceedings of the 2013 IEEE 11th International Conference on Dependable, Autonomic and Secure Computing. 2013:521-527.
[10] Dragomir O S, Bertels K. K-loops: Loop skewing for reconfigurable architectures[C]// Proceedings of the 2009 IEEE International Conference on Field-Programmable Technology. 2009:199-206.
[11] Sinkarovs A, Scholz S B. Data layout inference for code vectorisation[C]// Proceedings of the 2013 IEEE International Conference on High Performance Computing and Simulation. 2013:527-534.
[12] Mei Gang, Tian Hong. Performance Impact of Data Layout on the GPU-accelerated IDW Interpolation[DB/OL]. https://arxiv.org/pdf/1402.4986v1.pdf, 2014-02-20.