daadaada / turingas
Assembler for NVIDIA Volta and Turing GPUs
License: MIT License
I was able to successfully install turingas. But when I try to run it with the copy_element example, I get the following error. I wonder if it has something to do with the latest commit.
File "<frozen importlib._bootstrap_external>", line 544, in spec_from_file_location
File "/usr/local/lib/python3.6/dist-packages/turingas-0.1-py3.6.egg/turingas/grammar.py", line 171
'HADD2' : [{'code' : 0x230, 'rule' : rf'HADD2 {rd}, {rs0}, {rs1};'}, 'lat' : 7],
Update: it works fine if I comment out the three half-precision instructions.
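The quoted line most likely fails to parse because 'lat' : 7 sits outside the dictionary braces, which makes it an invalid list element. Below is a minimal sketch of that reading. The operand values and the corrected form are assumptions for illustration, not the author's actual fix.

```python
# Hypothetical stand-ins for the operand patterns that grammar.py interpolates.
rd, rs0, rs1 = "R0", "R1", "R2"

# The entry as quoted in the traceback: 'lat' follows the closing brace.
broken = "{'HADD2' : [{'code' : 0x230, 'rule' : rf'HADD2 {rd}, {rs0}, {rs1};'}, 'lat' : 7]}"

# One plausible correction (an assumption): move 'lat' inside the dictionary.
fixed = "{'HADD2' : [{'code' : 0x230, 'rule' : rf'HADD2 {rd}, {rs0}, {rs1};', 'lat' : 7}]}"


def parses(source):
    """Return True if the text compiles as a Python expression."""
    try:
        compile(source, "<grammar>", "eval")
        return True
    except SyntaxError:
        return False


print(parses(broken))  # the quoted form is rejected by the parser
print(parses(fixed))   # the rearranged form is accepted
```

Whether the latency really belongs inside the dictionary depends on how the other entries in grammar.py are laid out, so check the neighboring instructions before changing the file.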
I just attended your presentation. Is there anywhere I can find the Winograd code described in your paper? Much appreciated.
Like maxas, users can insert a code snippet between <SCHEDULE_BLOCK> and </SCHEDULE_BLOCK>, and the scheduler will return an optimized instruction sequence. Users then do not need to care about the stall counts of instructions inside schedule blocks. This may be useful when writing SASS code.
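As a rough illustration of the first step such a feature would need, the sketch below only locates the blocks and collects their instructions. It assumes the tag is spelled SCHEDULE_BLOCK on both sides. The function name and the sample instructions are hypothetical, and no actual scheduling is performed.

```python
import re

# Assumed tag spelling; non-greedy so that each block is matched separately.
BLOCK_RE = re.compile(r"<SCHEDULE_BLOCK>(.*?)</SCHEDULE_BLOCK>", re.DOTALL)


def schedule_blocks(source):
    """Return one list of instruction lines per schedule block."""
    blocks = []
    for match in BLOCK_RE.finditer(source):
        lines = [line.strip() for line in match.group(1).splitlines()]
        blocks.append([line for line in lines if line])
    return blocks


# Illustrative input only; the instructions carry no control codes.
example = """
MOV R0, R1;
<SCHEDULE_BLOCK>
FADD R2, R0, R0;
FMUL R3, R2, R2;
</SCHEDULE_BLOCK>
EXIT;
"""

print(schedule_blocks(example))
```

A real scheduler would then reorder each returned list and assign stall counts and barriers, which is the part the request is actually about.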
Hi,
Title says it all..
The project seems to be dead for newer GPUs. Is it recommended to use https://github.com/cloudcores/CuAssembler in that case?
In the paper, you mentioned that the implementation can be ported to an fp16 version.
So, have you succeeded in implementing an fp16 Winograd with tensor cores that beats the performance of cuDNN?
I found that cuDNN doesn't have an fp16 Winograd 3x3 convolution, only an fp16 GEMM 3x3 convolution. I have no idea why NVIDIA doesn't implement one.
Hi. I think there might be a small typo in grammar.py related to the read and write barrier indices.
At line 389, the original code is ctrl_re = r'(?P<ctrl>[0-9a-fA-F\-]{2}:[1-6\-]:[1-6\-]:[\-yY]:[0-9a-fA-F\-])', which suggests that the barriers are indexed from 1 to 6. However, I found that there is no corresponding wait barrier mask for barrier index 6.
After checking some SASS code returned by cuobjdump, I observed that barrier index 0 is commonly used. So maybe it's better to replace line 389 with ctrl_re = r'(?P<ctrl>[0-9a-fA-F\-]{2}:[0-5\-]:[0-5\-]:[\-yY]:[0-9a-fA-F\-])'.
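The difference between the two patterns can be checked directly. The sketch below compiles both regular expressions exactly as quoted and tests a few control-code strings. The sample strings and the helper name are made up for illustration.

```python
import re

# Pattern currently in grammar.py, as quoted in this issue.
ORIGINAL = re.compile(
    r'(?P<ctrl>[0-9a-fA-F\-]{2}:[1-6\-]:[1-6\-]:[\-yY]:[0-9a-fA-F\-])')

# Pattern proposed in this issue (barrier indices 0 to 5).
PROPOSED = re.compile(
    r'(?P<ctrl>[0-9a-fA-F\-]{2}:[0-5\-]:[0-5\-]:[\-yY]:[0-9a-fA-F\-])')


def accepts(pattern, ctrl):
    """Return True if the whole control-code string matches the pattern."""
    return pattern.fullmatch(ctrl) is not None


# Hypothetical control codes: wait mask, read barrier, write barrier, yield, stall.
for ctrl in ("--:-:0:-:1", "--:6:-:Y:4", "01:-:-:Y:f"):
    print(ctrl, accepts(ORIGINAL, ctrl), accepts(PROPOSED, ctrl))
```

Only the first two samples distinguish the patterns: a write barrier of 0 is rejected by the original and accepted by the proposal, while a read barrier of 6 behaves the other way around.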
How can I generate a .sass file using disasm.py under the tools directory?
I tried "python disasm.py test.out > test.sass". However, the resulting test.sass cannot be used by turingas.
I just read your papers on GEMM optimization (IPDPS) and Winograd optimization (PPoPP), and I am very interested in turingas.
But the lack of sample code makes it hard for me to get started.
So, could you please upload more sample code?
First of all, could you please upload sample code for LDS and STS (shared memory load and store)?
Thx.
Hello @daadaada ,
I wanted to write you an email, but couldn't find an address anywhere so I'm filing an issue instead.
I'm the author of eyalroz/cuda-api-wrappers. I'd like users of my wrapper library to be able to check which targets a cubin file contains code for. Unfortunately, the CUDA Driver API itself does not have functions to check this.
Now, it seems, especially when leafing through your code here, that cubins are basically ELF-like. But they're not properly ELF, in the sense that readelf refuses to read them and give information about them.
I'm not an ELF expert (quite the opposite), and the code in cubin.py has a lot of magic numbers that I can't make heads or tails of. I would like to ask for your help in understanding how to programmatically open a cubin file and determine which kernels it holds for which targets/architectures. Or at least, what format this information is stored in, relative to ELF in general.
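As a possible starting point, the sketch below reads only the standard 64-bit ELF file header with Python's struct module. It relies on two assumptions that should be verified against real files: that a cubin reports the e_machine value 190 (registered as EM_CUDA), and that the low 8 bits of e_flags carry the SM version, which appears to hold for older toolchains and may differ for newer ones. The function name and the returned keys are hypothetical.

```python
import struct

# Standard little-endian ELF64 file header layout (64 bytes).
ELF64_HEADER = struct.Struct("<16sHHIQQQIHHHHHH")
EM_CUDA = 190  # e_machine value registered for NVIDIA CUDA


def read_cubin_header(data):
    """Parse the ELF64 header and guess the target from e_flags."""
    if len(data) < ELF64_HEADER.size or data[:4] != b"\x7fELF":
        raise ValueError("not an ELF image")
    if data[4] != 2 or data[5] != 1:
        raise ValueError("expected a 64-bit little-endian ELF image")
    fields = ELF64_HEADER.unpack_from(data)
    e_machine, e_flags = fields[2], fields[7]
    return {
        "is_cuda": e_machine == EM_CUDA,
        "e_flags": e_flags,
        # Assumption: the low byte of e_flags is the SM version (e.g. 75).
        "sm_guess": e_flags & 0xFF,
    }
```

Usage would look like read_cubin_header(open("kernel.cubin", "rb").read()), where kernel.cubin is a placeholder file name. Listing individual kernels would additionally require walking the section header table, which this sketch does not attempt.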
Hi Yan,
I found this assembler when I tried to modify and profile some CUDA codes.
Thanks for sharing such an exciting tool.
Is it possible to read a CUBIN file into turingas rather than write SASS code from scratch?
I want to write a tool that reads a CUBIN file and converts it to a TuringAS-format SASS file. Is there a document that describes the input format of TuringAS, as used in this example (https://github.com/daadaada/turingas/blob/master/examples/copy_element/copy_element.sass)?
Do you have any suggestions? Thanks!