Comments (8)
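Overall architecture
1. This work extends the instruction set of the RISC-V CPU model in gem5 with a tightly coupled linear algebra accelerator.
2. LACfg is used to obtain scalar data and to configure the fetch mode of the INPUT/OUTPUT LAMemUnits (data is fetched from the LACache; DMA is not used).
3. LAExecUnit is the compute unit. It uses FIFOs to decouple computation from memory access. It takes three inputs, A, B, and C, and supports the following kinds of operations:
4. The Private Scratchpad stores intermediate results.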
from papernotes.
The paper discusses how it differs from gem5-aladdin.
Differences from vector processors
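Three output modes are supported; the multi-stream output mode is the interesting one.
The first two are common. The third can be abstracted from matrix multiplication: one row has to be multiply-accumulated against several columns, so the row can be repeated several times and concatenated into one long vector, and the columns can be concatenated into another long vector. This matches the multi-stream output described above. An example is shown below: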
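The row-against-many-columns case described above can be sketched in plain Python. This is only an illustration of the idea, not the accelerator's hardware or ISA: the function name `multi_stream_dot` and the parameter `segment_len` are hypothetical, and it assumes that a multi-stream output emits one reduced value for every fixed-length segment of the two long input streams.

```python
def multi_stream_dot(stream_a, stream_b, segment_len):
    """Multiply two long streams element-wise and emit one sum per segment.

    Assumption: "multi-stream output" means one reduced value is produced
    for every `segment_len` consecutive input elements.
    """
    if len(stream_a) != len(stream_b):
        raise ValueError("streams must have the same length")
    if len(stream_a) % segment_len != 0:
        raise ValueError("stream length must be a multiple of segment_len")
    out = []
    for start in range(0, len(stream_a), segment_len):
        acc = 0
        for a, b in zip(stream_a[start:start + segment_len],
                        stream_b[start:start + segment_len]):
            acc += a * b
        out.append(acc)
    return out


# One row of A against every column of B (row vector times matrix).
row = [1, 2, 3]
cols = [[1, 0, 2], [0, 1, 1], [4, 5, 6]]      # each inner list is one column of B

stream_a = row * len(cols)                    # the row repeated once per column
stream_b = [x for col in cols for x in col]   # the columns concatenated end to end

result = multi_stream_dot(stream_a, stream_b, len(row))
print(result)  # [7, 5, 32]
```

Each output element is the dot product of the row with one column, so a single pass over the two long streams yields a full row of the product matrix.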
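Fetch-configuration register file, memory units, and compute unit (corresponding to the custom LAcore ISA)
0. The LAcore ISA is essentially the RISC-V ISA extended with a set of custom instructions.
1. LACfg is the fetch-configuration register. It describes a memory access pattern (source, dest, stride, skip, count) and is set by configuration instructions.
2. LAExecUnit is the compute unit and is driven by compute instructions. A compute instruction names the LACfg registers to use (for its inputs and output). This drives the LAMemUnits to read the access pattern from those LACfg registers and fetch data into the FIFOs, while the compute unit consumes data from the FIFOs at the same time (so execution and memory access can be decoupled).
3. In addition, there are data movement instructions.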
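The interaction between a configuration register, a memory unit, and the compute unit can be sketched as follows. Everything here is a hypothetical model rather than the real design: the class `LACfgSketch`, its field names, and in particular the meaning given to `stride`, `count`, and `skip` (advance by `stride` per element, then add `skip` after every `count` elements) are assumptions, since the notes list the fields without defining them.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class LACfgSketch:
    """Hypothetical stand-in for one LACfg register (not the real encoding)."""
    base: int      # start index into memory (the source or dest)
    stride: int    # distance between consecutive elements
    count: int     # elements fetched before a skip is applied
    skip: int      # extra distance added after every `count` elements
    total: int     # total number of elements in the stream


def addresses(cfg):
    """Yield the element indices described by cfg, under the assumed semantics."""
    addr = cfg.base
    for i in range(cfg.total):
        yield addr
        addr += cfg.stride
        if (i + 1) % cfg.count == 0:
            addr += cfg.skip


def mem_unit(memory, cfg, fifo):
    """Fetch side: walk the access pattern and push values into the FIFO."""
    for addr in addresses(cfg):
        fifo.append(memory[addr])


def exec_unit(fifo_a, fifo_b):
    """Compute side: pop operands from the FIFOs and accumulate a dot product."""
    acc = 0
    while fifo_a and fifo_b:
        acc += fifo_a.popleft() * fifo_b.popleft()
    return acc


memory = list(range(16))   # a 4x4 row-major matrix holding 0..15
row1 = LACfgSketch(base=4, stride=1, count=4, skip=0, total=4)   # row 1
col2 = LACfgSketch(base=2, stride=4, count=4, skip=0, total=4)   # column 2

fifo_a, fifo_b = deque(), deque()
mem_unit(memory, row1, fifo_a)
mem_unit(memory, col2, fifo_b)
dot = exec_unit(fifo_a, fifo_b)
print(dot)  # 196
```

In this sketch the fetch side runs to completion before the compute side starts. In hardware the two sides would run concurrently, with the FIFOs absorbing the difference in rate; the point of the sketch is only that the compute code never sees an address, just a stream of operands.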
LACache optimization
Question: can the cache really return multiple values from a single cache line in one access?
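One way to read the question is that consecutive requests of a strided stream that fall into the same cache line could be served by a single line access. The sketch below only illustrates that grouping; it is not a confirmed description of how the cache in the paper works. The function name `group_by_cache_line`, the 64-byte line size, and the 8-byte element size are all assumptions.

```python
def group_by_cache_line(byte_addrs, line_bytes=64):
    """Group consecutive requests that hit the same cache line.

    Returns a list of (line_base, [offsets]) pairs; each pair models one
    cache access that returns several elements at once.
    """
    groups = []
    for addr in byte_addrs:
        line_base = addr - (addr % line_bytes)
        if groups and groups[-1][0] == line_base:
            groups[-1][1].append(addr - line_base)
        else:
            groups.append((line_base, [addr - line_base]))
    return groups


# Eight 8-byte elements with a stride of two elements (16 bytes).
requests = [i * 16 for i in range(8)]
accesses = group_by_cache_line(requests)
print(len(requests), "requests ->", len(accesses), "line accesses")
```

Under these assumptions the eight element requests collapse into two line accesses, each returning four elements.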
Scratchpad fetch optimization