An effective implementation of Strassen's algorithm using AVX intrinsics for a multicore architecture
DOI
Creator 1. Nwe Zin Oo
2. Panyayot Chaikan
Title An effective implementation of Strassen's algorithm using AVX intrinsics for a multicore architecture
Publisher Research and Development Office, Prince of Songkla University
Publication Year 2020 (2563 BE)
Journal Title Songklanakarin Journal of Science and Technology
Journal Vol. 42
Journal No. 6
Page no. 1368-1376
Keyword advanced vector extension, AVX, AVX-2, matrix-matrix multiplication, FMA, Strassen's algorithm
URL Website https://rdo.psu.ac.th/sjstweb/index.php
ISSN 0125-3395
Abstract This paper proposes an effective implementation of Strassen's algorithm with AVX intrinsics to augment matrix-matrix multiplication in a multicore system. AVX-2 and FMA3 intrinsic functions are utilized, along with OpenMP, to implement the multiplication kernel of Strassen's algorithm. Loop tiling and unrolling techniques are also utilized to increase the cache utilization. A systematic method is proposed for determining the best stop condition for the recursion to achieve maximum performance on specific matrix sizes. In addition, an analysis method makes fine-tuning possible when our algorithm is adapted to another machine with a different hardware configuration. Performance comparisons between our algorithm and the latest versions of two well-known open-source libraries have been carried out. Our algorithm is, on average, 1.52 and 1.87 times faster than the Eigen and the OpenBLAS libraries, respectively, and can be scaled efficiently when the matrix becomes larger.
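
The recursion with a stop condition that the abstract describes can be illustrated with a minimal sketch. This is not the authors' code: it is plain Python rather than AVX-2/FMA3 intrinsics with OpenMP, the function names are hypothetical, and the default `cutoff` of 64 is an arbitrary placeholder rather than a value from the paper. The classical triple loop stands in for the vectorized, tiled multiplication kernel that the recursion would fall back to.

```python
def naive_matmul(A, B):
    """Classical O(n^3) product; a stand-in for a vectorized base-case kernel."""
    n, m, p = len(A), len(B), len(B[0])
    C = [[0.0] * p for _ in range(n)]
    for i in range(n):
        for k in range(m):
            a = A[i][k]
            for j in range(p):
                C[i][j] += a * B[k][j]
    return C


def add(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def sub(X, Y):
    return [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def split(M):
    """Return the four quadrants of an even-sized square matrix."""
    h = len(M) // 2
    return ([r[:h] for r in M[:h]], [r[h:] for r in M[:h]],
            [r[:h] for r in M[h:]], [r[h:] for r in M[h:]])


def strassen(A, B, cutoff=64):
    """Multiply square matrices; odd or small sizes use the classical kernel."""
    n = len(A)
    if n <= cutoff or n % 2:  # stop condition for the recursion
        return naive_matmul(A, B)
    A11, A12, A21, A22 = split(A)
    B11, B12, B21, B22 = split(B)
    # Seven recursive products instead of the eight of the classical scheme.
    M1 = strassen(add(A11, A22), add(B11, B22), cutoff)
    M2 = strassen(add(A21, A22), B11, cutoff)
    M3 = strassen(A11, sub(B12, B22), cutoff)
    M4 = strassen(A22, sub(B21, B11), cutoff)
    M5 = strassen(add(A11, A12), B22, cutoff)
    M6 = strassen(sub(A21, A11), add(B11, B12), cutoff)
    M7 = strassen(sub(A12, A22), add(B21, B22), cutoff)
    C11 = add(sub(add(M1, M4), M5), M7)
    C12 = add(M3, M5)
    C21 = add(M2, M4)
    C22 = add(sub(add(M1, M3), M2), M6)
    # Stitch the quadrants back into one matrix.
    return ([l + r for l, r in zip(C11, C12)] +
            [l + r for l, r in zip(C21, C22)])
```

Raising `cutoff` shifts more work to the base-case kernel and lowering it deepens the recursion; the abstract's point is that the best value depends on the matrix size and the machine, so it would be tuned rather than fixed.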
