turingas's Issues

assembler breaks on HADD2

I was able to successfully install turingas. But when I try to run it with the copy_element example, I get the following error. I wonder if it has something to do with the latest commit.

  File "<frozen importlib._bootstrap_external>", line 544, in spec_from_file_location
  File "/usr/local/lib/python3.6/dist-packages/turingas-0.1-py3.6.egg/turingas/grammar.py", line 171
    'HADD2' : [{'code' : 0x230, 'rule' : rf'HADD2 {rd}, {rs0}, {rs1};'}, 'lat' : 7],

Update: it works fine if I comment out the three half-precision instructions.
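For anyone hitting the same error: in the quoted line the dictionary's closing brace comes before 'lat', so 'lat' : 7 ends up as a bare key/value pair inside a list, which Python rejects. Below is a minimal sketch that reproduces this. It assumes the intended form keeps 'lat' inside the dictionary; that is a guess, not a confirmed fix from the repository. The operand patterns rd, rs0 and rs1 are hypothetical stand-ins, not the real definitions from grammar.py.

```python
# Hypothetical stand-ins for the operand patterns defined in grammar.py.
rd = r'(?P<rd>R\d+)'
rs0 = r'(?P<rs0>R\d+)'
rs1 = r'(?P<rs1>R\d+)'

# The entry as quoted in the traceback: '}' closes the dict before 'lat'.
broken_src = "entry = [{'code' : 0x230, 'rule' : rf'HADD2 {rd}, {rs0}, {rs1};'}, 'lat' : 7]"

# Assumed intent: 'lat' is one more key of the same dict.
fixed_src = "entry = [{'code' : 0x230, 'rule' : rf'HADD2 {rd}, {rs0}, {rs1};', 'lat' : 7}]"


def parses(source):
    """Return True if the source text compiles, False on a SyntaxError."""
    try:
        compile(source, '<grammar entry>', 'exec')
        return True
    except SyntaxError:
        return False


print('as quoted :', parses(broken_src))
print('brace moved:', parses(fixed_src))

# Build the assumed-correct entry for real and inspect it.
entry = [{'code': 0x230, 'rule': rf'HADD2 {rd}, {rs0}, {rs1};', 'lat': 7}]
print(entry[0]['lat'])
```

If the other two half-precision entries have the same misplaced brace, the same one-character move would apply to them.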

Is Winograd code available?

I just attended your presentation. Is there anywhere I can find the Winograd code described in your paper? Much appreciated.

Is there any plan to develop an instruction scheduler for turingas?

As in maxas, the user could wrap a code snippet in <SCHEDULE_BLOCK> ... </SCHEDULE_BLOCK>, and the scheduler would return an optimized instruction sequence. Users would then not need to care about the stall counts of instructions inside schedule blocks. This could be useful when writing SASS code.

fp16 winograd

In the paper, you mentioned that the implementation can be ported to an fp16 version.
So, have you succeeded in implementing fp16 Winograd with tensor cores and beating the performance of cuDNN?

I found that cuDNN doesn't have an fp16 Winograd 3x3 convolution, only an fp16 GEMM-based 3x3 convolution. I have no idea why Nvidia doesn't implement one.

Read barrier index and Write barrier index should be 0~5 rather than 1~6

Hi. I think there might be a small typo in grammar.py related to the read and write barrier indices.

At line 389, the original code is ctrl_re = r'(?P<ctrl>[0-9a-fA-F\-]{2}:[1-6\-]:[1-6\-]:[\-yY]:[0-9a-fA-F\-])', which suggests that the barriers are indexed by 1~6. However, I found that there is no corresponding Wait barrier mask for barrier index 6.

After checking some SASS code returned by cuobjdump, I observed that barrier index 0 is commonly used. So maybe it's better to replace line 389 with ctrl_re = r'(?P<ctrl>[0-9a-fA-F\-]{2}:[0-5\-]:[0-5\-]:[\-yY]:[0-9a-fA-F\-])'.
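To make the difference concrete, here is a small sketch that runs both patterns from this issue against a few control-code strings. The two regexes are copied from the text above; the sample strings are made up for illustration and are not taken from real cuobjdump output.

```python
import re

# Pattern currently at line 389 of grammar.py (barrier fields accept 1-6).
original = r'(?P<ctrl>[0-9a-fA-F\-]{2}:[1-6\-]:[1-6\-]:[\-yY]:[0-9a-fA-F\-])'
# Pattern proposed in this issue (barrier fields accept 0-5).
proposed = r'(?P<ctrl>[0-9a-fA-F\-]{2}:[0-5\-]:[0-5\-]:[\-yY]:[0-9a-fA-F\-])'

# Hypothetical control codes: one uses barrier 0, one uses barrier 6,
# and one stays within the range both patterns share.
samples = ['--:-:0:Y:1', '--:-:6:-:2', '01:5:-:-:4']


def accepted(pattern, text):
    """Return True if the whole control-code string matches the pattern."""
    return re.fullmatch(pattern, text) is not None


for s in samples:
    print(f'{s}  original={accepted(original, s)}  proposed={accepted(proposed, s)}')
```

With the current pattern, a control code that uses barrier 0 is rejected, while the proposed pattern accepts it and rejects barrier 6 instead.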

How to generate the .sass file?

How do I generate the .sass file using disasm.py in the tools directory?
I tried "python disasm.py test.out > test.sass". However, the resulting test.sass cannot be used by turingas.

Shared Memory Sample Code

I just read your papers on GEMM optimization (IPDPS) and Winograd optimization (PPoPP), and I am very interested in turingas.
But the lack of sample code makes it hard for me to get started.

So, could you please upload more sample code?
To begin with, could you please upload sample code that uses LDS and STS for shared memory?

Thx.

Request for help regarding the cubin format

Hello @daadaada ,

I wanted to write you an email, but couldn't find an address anywhere so I'm filing an issue instead.

I'm the author of eyalroz/cuda-api-wrappers. I'm interested in allowing users of my wrappers library to be able to check which targets cubin files contain code for. Unfortunately, the CUDA Driver API itself does not have API functions to check this.

Now, it seems - especially when leafing through your code here - that cubins are basically ELF-like. But they're not proper ELF, in the sense that readelf refuses to read them and give information about them.

I'm not an ELF expert (quite the opposite), and the code in cubin.py has a lot of magic numbers that I can't make heads or tails of. I would like to ask for your help in understanding how to programmatically open a cubin file and determine which kernels it holds for which targets/architectures. Or, at least, how this information is stored relative to ELF in general.
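As a starting point, here is a rough sketch of reading the ELF header of a cubin with nothing but the standard library. It rests on assumptions that should be checked against real files: that the cubin uses the standard 64-bit little-endian ELF header layout, that e_machine is EM_CUDA (190 in elf.h), and that the low byte of e_flags holds the SM version. The last point matches what older assemblers did, but it is not documented API and may not hold for every toolkit version. The function name describe_cubin_header is made up for this example and is not part of turingas.

```python
import struct

# Standard ELF64 header, little-endian: e_ident, e_type, e_machine,
# e_version, e_entry, e_phoff, e_shoff, e_flags, then six 16-bit fields.
ELF64_HEADER = struct.Struct('<16sHHIQQQIHHHHHH')
EM_CUDA = 190  # machine ID listed for NVIDIA CUDA in elf.h


def describe_cubin_header(data):
    """Decode the ELF header fields that hint at the cubin's target."""
    if len(data) < ELF64_HEADER.size or data[:4] != b'\x7fELF':
        raise ValueError('not an ELF image')
    if data[4] != 2 or data[5] != 1:
        raise ValueError('this sketch only handles 64-bit little-endian ELF')
    fields = ELF64_HEADER.unpack_from(data)
    e_machine, e_flags = fields[2], fields[7]
    return {
        'is_cuda': e_machine == EM_CUDA,
        'e_flags': e_flags,
        # ASSUMPTION: low byte of e_flags is the SM version (e.g. 75).
        'sm_guess': e_flags & 0xFF,
    }


# Synthetic 64-byte header for demonstration, not a real cubin.
ident = b'\x7fELF' + bytes([2, 1, 1]) + bytes(9)
fake = ELF64_HEADER.pack(ident, 2, EM_CUDA, 1, 0, 0, 0, 75, 64, 0, 0, 0, 0, 0)
print(describe_cubin_header(fake))

# With a real file (the path is a placeholder):
# with open('kernel.cubin', 'rb') as f:
#     print(describe_cubin_header(f.read(ELF64_HEADER.size)))
```

This only covers the file-level header; finding the individual kernels would additionally require walking the section and symbol tables.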

Is it possible to read sass from cubin/elf?

Hi Yan,
I found this assembler while trying to modify and profile some CUDA code.
Thanks for sharing such an exciting tool.

Is it possible to load a CUBIN file into turingas rather than writing SASS code from scratch?

I want to write a tool that reads a CUBIN file and converts it to a TuringAS-format SASS file. Is there a document describing the TuringAS input format, as used in this example (https://github.com/daadaada/turingas/blob/master/examples/copy_element/copy_element.sass)?

Do you have any suggestions? Thanks!
