Comments (6)
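What exactly is DianNaoYu, the deep learning instruction set that the Chinese Academy of Sciences talks about? ShiDianNao part - Yang Jun's answer - Zhihu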
from papernotes.
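1. On-chip buffer design principles
Based on the characteristics of CNN computation (a set of input neurons passes through one layer of computation to produce a set of output neurons), three kinds of SRAM storage are designed:
NBin: stores the input neurons.
NBout: stores the output neurons.
SB: stores the complete model parameters.
Of these three, SB must hold all of the model's parameters, while NBin/NBout must hold the complete input/output neurons of one layer of the network. The reason is that the model parameters are reused repeatedly, so they are kept in SRAM to avoid the time cost of loading them from DRAM. The raw data of a particular image or video frame fed to the CNN, by contrast, is not reused once the model has processed it, so keeping the input/output neurons needed by each layer's computation in SRAM is enough to meet the performance requirement.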
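Author: Yang Jun
Link: https://www.zhihu.com/question/41216802/answer/124409366
Source: Zhihu
Copyright belongs to the author. For commercial reprints, contact the author for authorization; for non-commercial reprints, credit the source.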
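The sizing rule above (SB holds every parameter, NBin/NBout hold one layer's neurons) can be sketched as a simple capacity check. This is only an illustration: the 16-bit value width, the LeNet-like layer shapes, and the buffer capacities passed in are assumptions, not figures from the quoted answer, and `Layer` and `fits_on_chip` are hypothetical names.

```python
from dataclasses import dataclass

BYTES_PER_VALUE = 2  # assumption: 16-bit neurons and weights


@dataclass
class Layer:
    name: str
    n_inputs: int   # number of input neurons
    n_outputs: int  # number of output neurons
    n_weights: int  # number of parameters


def fits_on_chip(layers, nbin_bytes, nbout_bytes, sb_bytes):
    """SB must hold all weights; NBin/NBout must hold the largest single layer."""
    total_weights = sum(layer.n_weights for layer in layers) * BYTES_PER_VALUE
    max_in = max(layer.n_inputs for layer in layers) * BYTES_PER_VALUE
    max_out = max(layer.n_outputs for layer in layers) * BYTES_PER_VALUE
    return {
        "SB": total_weights <= sb_bytes,
        "NBin": max_in <= nbin_bytes,
        "NBout": max_out <= nbout_bytes,
    }


# Hypothetical LeNet-like network, for illustration only.
layers = [
    Layer("conv1", 32 * 32, 6 * 28 * 28, 6 * 5 * 5),
    Layer("pool1", 6 * 28 * 28, 6 * 14 * 14, 0),
    Layer("fc", 6 * 14 * 14, 10, 6 * 14 * 14 * 10),
]
result = fits_on_chip(layers, nbin_bytes=64 * 1024, nbout_bytes=64 * 1024,
                      sb_bytes=128 * 1024)
print(result)
```

Note that the weights are summed over the whole network while the neuron buffers only need the per-layer maximum, which is exactly the asymmetry the answer describes.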
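2. In the NFU's PE array design, the support for Inter-PE data propagation is worth noting.
The NFU (Neural Functional Unit) is a compute array made up of a number of PEs. Each PE contains one multiplier, one adder, several registers, two sets of FIFOs for exchanging data along the horizontal and vertical directions of the PE array, and some auxiliary control logic.
The NFU's results are output to an ALU, which finally writes them to NBout. The ALU implements operations that need less parallelism, such as the division used by average pooling and the hardware implementation of non-linear activation functions. The activation functions are approximated by piecewise interpolation [8], trading a small loss of accuracy for gains in power and performance.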
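As a rough software analogue of the piecewise interpolation mentioned for the activation functions, the sketch below approximates `tanh` with straight-line segments. The input range, the segment count, and the names `build_table` and `approx` are assumptions chosen for illustration; the real ALU's table format is not described in these notes.

```python
import math


def build_table(fn, lo, hi, segments):
    """Precompute (slope, intercept) for each equal-width segment of fn on [lo, hi]."""
    step = (hi - lo) / segments
    table = []
    for i in range(segments):
        x0 = lo + i * step
        x1 = x0 + step
        slope = (fn(x1) - fn(x0)) / step
        intercept = fn(x0) - slope * x0
        table.append((slope, intercept))
    return table


def approx(x, table, lo, hi, fn):
    """Evaluate the piecewise-linear approximation; saturate outside [lo, hi]."""
    if x <= lo:
        return fn(lo)
    if x >= hi:
        return fn(hi)
    step = (hi - lo) / len(table)
    idx = min(int((x - lo) / step), len(table) - 1)
    slope, intercept = table[idx]
    return slope * x + intercept  # one multiply and one add per evaluation


LO, HI, SEGMENTS = -8.0, 8.0, 16  # hypothetical range and segment count
TANH_TABLE = build_table(math.tanh, LO, HI, SEGMENTS)

max_err = max(
    abs(approx(x / 100.0, TANH_TABLE, LO, HI, math.tanh) - math.tanh(x / 100.0))
    for x in range(-1000, 1001)
)
print(f"max abs error with {SEGMENTS} segments: {max_err:.4f}")
```

Each evaluation costs a table lookup, one multiplication, and one addition, which is why this kind of approximation is cheap in hardware; more segments reduce the error at the cost of a larger table.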
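The motivation for supporting Inter-PE data propagation is to reduce the amount of data exchanged between the NFU and SRAM. Recall how a convolutional layer's feature map is computed: as long as the stride is smaller than the kernel size, different output neurons in the same feature map have partially overlapping inputs. By passing the overlapping input neurons directly between PEs instead of re-reading them, the design accesses SRAM less often and gains in both performance and power. The gain becomes more pronounced as the convolution kernel grows.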
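The overlap argument can be made concrete by counting input-neuron reads for one tile of output neurons, once with every PE fetching its own window and once with each distinct input neuron fetched only once. This is an idealized count, not a model of the actual FIFO traffic; the 8 x 8 PE array and the function name `sram_reads` are assumptions for illustration.

```python
def sram_reads(pe_rows, pe_cols, kernel, stride):
    """Return (reads without sharing, reads with ideal sharing) for one output tile."""
    touched = set()   # distinct input neurons covered by the tile
    independent = 0   # reads if every PE fetches its whole window itself
    for r in range(pe_rows):
        for c in range(pe_cols):
            for kr in range(kernel):
                for kc in range(kernel):
                    touched.add((r * stride + kr, c * stride + kc))
                    independent += 1
    return independent, len(touched)


for k in (3, 5, 7):
    independent, shared = sram_reads(8, 8, kernel=k, stride=1)
    print(f"kernel {k}x{k}: {independent} reads vs {shared} with sharing "
          f"({independent / shared:.1f}x fewer)")
```

With stride 1 the ratio grows with the kernel size, matching the claim above. When the stride equals the kernel size the windows no longer overlap and the two counts are identical.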
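As the figure shows, each PE computes the final result of one 3*3 convolution, that is, the result of a series of multiply-accumulate operations. The example has four PEs, so the four PE results are the four convolution results for the four differently colored regions. This is similar to the PEs in the TPU's systolic array: each one computes the result of a series of multiply-accumulates, rather than four PEs each computing one multiplication and the products of three PEs then being accumulated into the fourth.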
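A small simulation may make this mapping clearer. In the sketch below each of four PEs keeps its own accumulator and, over nine steps, builds up one complete 3*3 convolution result. The loop structure, the function names, and the sample data are assumptions made for illustration and do not model the FIFO-based data movement.

```python
def conv_output_stationary(image, kernel, pe_rows=2, pe_cols=2):
    """Each PE (r, c) accumulates one output neuron; one kernel weight per step."""
    k = len(kernel)
    acc = [[0] * pe_cols for _ in range(pe_rows)]  # one accumulator per PE
    for kr in range(k):
        for kc in range(k):                # one step: all PEs share this weight
            w = kernel[kr][kc]
            for r in range(pe_rows):
                for c in range(pe_cols):
                    acc[r][c] += w * image[r + kr][c + kc]  # one multiply-add per PE
    return acc


def conv_direct(image, kernel, out_rows, out_cols):
    """Reference: compute each output neuron independently."""
    k = len(kernel)
    return [
        [
            sum(kernel[i][j] * image[r + i][c + j] for i in range(k) for j in range(k))
            for c in range(out_cols)
        ]
        for r in range(out_rows)
    ]


image = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
kernel = [[1, 0, -1], [1, 0, -1], [1, 0, -1]]
result = conv_output_stationary(image, kernel)
print(result)
```

No partial products cross from one PE to another: the accumulation happens entirely inside each PE, which is the point the comment makes.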
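3. The Buffer Controller provides multiple data access modes
Modes (a)/(b)/(e) mainly serve data reads for convolutional layers: each input neuron that is read corresponds to one output neuron. (Note: in ShiDianNao, input neurons that the Buffer Controller has already read into the PEs are subsequently passed along by the Inter-PE data propagation mechanism, which saves SRAM access bandwidth.) Mode (e) covers the case where the convolution kernel's step size is > 1. Mode (d) serves fully connected layers: one input neuron is read and used as the input to multiple output neurons.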
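The difference between the convolution-style reads and the fully connected read can be sketched with two small helpers. The function names, the 2 x 2 PE array, and the list-based buffers are hypothetical; the sketch only mirrors the behavior described above (one neuron per PE, optionally with a step, versus one neuron sent to all PEs) and does not cover the other modes.

```python
def read_tile(feature_map, row0, col0, pe_rows, pe_cols, step=1):
    """Conv-style read: PE (r, c) gets its own input neuron.

    step > 1 skips neurons between PEs, in the spirit of mode (e).
    """
    return [
        [feature_map[row0 + r * step][col0 + c * step] for c in range(pe_cols)]
        for r in range(pe_rows)
    ]


def read_broadcast(neurons, index, pe_rows, pe_cols):
    """FC-style read, in the spirit of mode (d): one neuron goes to every PE."""
    return [[neurons[index]] * pe_cols for _ in range(pe_rows)]


feature_map = [[r * 4 + c for c in range(4)] for r in range(4)]
print(read_tile(feature_map, 0, 0, 2, 2))          # adjacent neurons
print(read_tile(feature_map, 0, 0, 2, 2, step=2))  # every second neuron
print(read_broadcast([7, 8, 9], 1, 2, 2))          # same neuron for all PEs
```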
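4. Based on the functional classification of the different neural network layers, a two-level Hierarchical Finite State Machine (HFSM) is proposed
- Each first-level state is specified by one 61-bit instruction. The instruction carries the control signals and the necessary parameters; after decoding, it executes over multiple cycles, driving the transitions among its second-level states.
- A single neural network training run jumps back and forth among the first-level states many times.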
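The two-level idea can be sketched as an outer state selected by each instruction and an inner sequence of states that the controller steps through on its own. Everything concrete here is an assumption: the state names, the `tiles` parameter, and the one-cycle-per-inner-state simplification are invented for illustration and are not the actual encoding of the 61-bit instruction.

```python
from enum import Enum, auto


class Level1(Enum):  # hypothetical first-level states, one per layer type
    CONV = auto()
    POOL = auto()
    CLASSIFIER = auto()


# Hypothetical second-level states for each first-level state.
LEVEL2 = {
    Level1.CONV: ["LOAD", "COMPUTE", "STORE"],
    Level1.POOL: ["LOAD", "REDUCE", "STORE"],
    Level1.CLASSIFIER: ["LOAD", "COMPUTE", "STORE"],
}


def execute(instruction):
    """Expand one instruction (first-level state plus parameters) into inner steps."""
    state, params = instruction
    trace = []
    for _ in range(params["tiles"]):  # parameters decide how often the inner FSM loops
        for sub in LEVEL2[state]:
            trace.append((state.name, sub))  # simplification: one cycle per inner state
    return trace


program = [
    (Level1.CONV, {"tiles": 2}),
    (Level1.POOL, {"tiles": 1}),
    (Level1.CLASSIFIER, {"tiles": 1}),
]
trace = [step for instr in program for step in execute(instr)]
print(len(program), "instructions ->", len(trace), "cycles")
```

The point of the hierarchy is that a short program of coarse instructions expands into many cycles of fine-grained control without one instruction per cycle.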