Conference

ExTensor: An Accelerator for Sparse Tensor Algebra

Authors: Kartik Hegde, Hadi Asghari-Moghaddam, Michael Pellauer, Neal Crago, Aamer Jaleel

DOI: 10.1145/3352460.3358275

Description: Generalized tensor algebra is a prime candidate for acceleration via customized ASICs. Modern tensors feature a wide range of data sparsity, with the density of non-zero elements ranging from 10^-6% to 50%. This paper proposes a novel approach to accelerate tensor kernels based on the principle of hierarchical elimination of computation in the presence of sparsity. This approach relies on rapidly finding intersections (situations where both operands of a multiplication are non-zero), enabling new data fetching mechanisms and avoiding the memory latency overheads associated with sparse kernels implemented in software. We propose the ExTensor accelerator, which builds these novel ideas on handling sparsity into hardware to enable better bandwidth utilization and compute throughput. We evaluate ExTensor on several kernels relative to industry libraries (Intel MKL) and state-of-the-art tensor algebra compilers (TACO). When bandwidth normalized, we demonstrate an average speedup of 3.4×, 1.3×, 2.8×, 24.9×, and 2.7× on SpMSpM, SpMM, TTV, TTM, and SDDMM kernels respectively over a server class CPU.
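The intersection idea in the description can be illustrated in software. The following is a minimal Python sketch of a sparse dot product that multiplies only where both operands are non-zero. It is not the ExTensor hardware design: the function name `intersect_dot`, the sorted coordinate-list layout, and the two-pointer scan are assumptions chosen for illustration.

```python
def intersect_dot(a_coords, a_vals, b_coords, b_vals):
    """Sparse dot product over two vectors stored as sorted coordinate lists.

    a_coords / b_coords hold the positions of the non-zero elements in
    ascending order; a_vals / b_vals hold the matching values.
    Returns (dot_product, number_of_multiplications_performed).
    """
    i = j = 0
    total = 0.0
    multiplies = 0
    # Walk both coordinate lists in lockstep (a two-pointer scan).
    while i < len(a_coords) and j < len(b_coords):
        if a_coords[i] == b_coords[j]:
            # Intersection: both operands are non-zero, so this is the
            # only case where a multiplication is actually needed.
            total += a_vals[i] * b_vals[j]
            multiplies += 1
            i += 1
            j += 1
        elif a_coords[i] < b_coords[j]:
            # a has a non-zero where b is zero: skip it without computing.
            i += 1
        else:
            # b has a non-zero where a is zero: skip it without computing.
            j += 1
    return total, multiplies


# Hypothetical example data: two sparse vectors that overlap at
# coordinates 3 and 7 only.
result, work = intersect_dot([0, 3, 7], [1.0, 2.0, 3.0],
                             [3, 5, 7], [4.0, 5.0, 6.0])
```

In this example only two multiplications are performed (at coordinates 3 and 7), regardless of the logical length of the vectors. The scan itself is sequential in software, which is the kind of overhead the description attributes to sparse kernels implemented on CPUs.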
