Performance optimization of sparse deep neural network based on GPU

Application of Electronic Technique

Shi Yucheng, Huang Jianqiang, Bian Haodong, Wu Li, Jia Jinfang, Wang Xiaoying

Department of Computer Technology and Application, Qinghai University, Xining 810016, China

Abstract: As neural networks grow deeper, sparse deep neural networks offer advantages in both computation and storage, but their performance still leaves room for optimization. This paper therefore proposes a GPU-based performance optimization method for sparse deep neural networks. The method adjusts the order of computation to increase data reuse, and it combines the GPU's architecture with CUDA programming techniques, such as prefetching, to improve performance further. On the official GraphChallenge dataset, it achieves a speedup of up to 2.5x over the corresponding cuSPARSE library functions.

Key words: deep neural network; sparsification; heterogeneous platform; sparse matrix-matrix multiplication

Citation: Shi Yucheng, Huang Jianqiang, Bian Haodong, et al. Performance optimization of sparse deep neural network based on GPU[J]. Application of Electronic Technique, 2023, 49(12): 14-19.

0 Introduction

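As theoretical research on neural networks deepens and computing power steadily grows, more and more deep neural networks are emerging. In natural language processing [1], for example, Google proposed the Transformer [2] model. Its advantages, including its handling of the vanishing gradient problem and its support for parallel training, have made large models increasingly popular, and ChatGPT [3] was also trained on this basis. However, very large deep neural networks make it harder for model applications to respond in a timely manner. Because of the "memory wall" [4] and the "power wall" [5], sparse deep neural networks [6-7] have come into research focus, and combining GPU devices with sparse deep neural networks takes training speed to a new level.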
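For readers unfamiliar with the workload, one layer of a sparse deep neural network can be viewed as a sparse matrix-matrix product followed by a bias and a ReLU. The sketch below is a minimal CPU reference in Python and is not the GPU method proposed in the paper. The CSR layout for the weights, the function name `sparse_layer`, the bias value, and the toy matrices are all assumptions chosen for illustration.

```python
def sparse_layer(y_rows, w_indptr, w_indices, w_data, n_out, bias):
    """Compute relu(Y @ W + bias) for a sparse Y and a CSR-stored W.

    y_rows is a list of sparse rows, each a {column: value} dict
    (hypothetical layout chosen for readability, not for speed).
    """
    out = []
    for row in y_rows:
        acc = [0.0] * n_out
        # Each nonzero Y[i, k] scales row k of W, which CSR stores contiguously.
        for k, y_val in row.items():
            for p in range(w_indptr[k], w_indptr[k + 1]):
                acc[w_indices[p]] += y_val * w_data[p]
        # Add the bias, apply ReLU, and keep only the positive activations.
        new_row = {}
        for j, v in enumerate(acc):
            v += bias
            if v > 0.0:
                new_row[j] = v
        out.append(new_row)
    return out


# Toy 3x3 weight matrix [[1, 0, 2], [0, 3, 0], [4, 0, 0]] in CSR form.
w_indptr = [0, 2, 3, 4]
w_indices = [0, 2, 1, 0]
w_data = [1.0, 2.0, 3.0, 4.0]

# Two input rows: [1, 0, 1] and [0, 0.125, 0].
y_rows = [{0: 1.0, 2: 1.0}, {1: 0.125}]

result = sparse_layer(y_rows, w_indptr, w_indices, w_data, n_out=3, bias=-0.5)
print(result)
```

In this toy run the negative bias removes every activation in the second row, so the output stays sparse from layer to layer. That sparsity is what makes the order in which nonzeros are visited, and how often a row of W is reused, matter for performance.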

The full text of this article can be downloaded at: http://m.jysgc.com/resource/share/2000005799

Original content notice: This content is original to the AET website; reproduction without authorization is prohibited.
