spcl / dace

DaCe - Data Centric Parallel Programming

License: BSD 3-Clause "New" or "Revised" License

Python 78.25% CMake 0.28% C++ 1.89% Cuda 0.93% C 0.03% HTML 0.03% Shell 0.21% MATLAB 0.01% Jupyter Notebook 18.37% Tcl 0.01%

high-performance-computing programming-language cuda fpga high-level-synthesis vivado-hls

dace's Issues

Polybench samples fail to open in DIODE because of an include of a local file

DIODE working with a folder

DIODE should work with more than one file and fit into a Python workflow that saves files to disk.

Remove "saved projects"
Save SDFGs directly to folder
Save files on ctrl-s
Pane to view files in folder / subfolders
Save pane layout in folder
Use local configuration
View multiple SDFGs? One at a time?
Reload file if changed on disk

Running multiple SDFGs with the same name using CUDA fails with DuplicateDLLError

Problem

When compiling and running multiple SDFGs with the same name from the same Python executable, the SDFG is usually deleted in between (I assume by garbage collection?), which unloads the dynamic libraries.

However, when CUDA is involved, it seems that the SDFGs are not deleted even when they are no longer referenced by Python, so the loaded library stays in memory. In practice, this can result in a DuplicateDLLError when trying to load a new SDFG with the same name.

Expected behavior

SDFGs should be cleaned up when they are no longer referenced, and it should be possible to run multiple SDFGs with the same name from the same Python executable without explicitly deleting them.

To reproduce

Run test tests/library/blas_dot.py on the library_nodes branch, but change the initialization of dace.SDFG to always have the same name, then run the test including the cuBLAS runs.

Workaround

Name each SDFG differently, or explicitly call del my_sdfg between executions.
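A minimal sketch of the first workaround, assuming a small helper that generates a distinct name per SDFG. The helper `unique_sdfg_name` is hypothetical and not part of DaCe; the DaCe call is left as a comment so the sketch runs on its own.

```python
import itertools

# Hypothetical helper (not part of DaCe): appends a process-wide counter
# to a base name so that no two SDFGs built in one interpreter share a name.
_name_counter = itertools.count()


def unique_sdfg_name(base: str) -> str:
    """Return `base` with a monotonically increasing numeric suffix."""
    return f"{base}_{next(_name_counter)}"


first = unique_sdfg_name("dot")
second = unique_sdfg_name("dot")
print(first, second)

# Intended use with DaCe (commented out so the sketch runs without it):
# sdfg = dace.SDFG(unique_sdfg_name("dot"))
```

This only sidesteps the name clash; it does not address why the libraries stay loaded when CUDA is involved.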

Semantically-Accurate Memlets

Memlets should not have subset and other_subset, but src_subset and dst_subset or subset/reindex
Conversion code from relative offset to absolute offsets (a variant of memlet propagation)
Two presentations on the renderer - relative offsets / local view (internal representation) and absolute offsets (current one we present)
- In the local view mode - access nodes have two numbers: the sum of num_accesses going into the node (number of inputs) and the sum of num_accesses going out of the node (outputs)
Connectors should have types and shapes, just like arrays
Include reference nodes (indirect access to arrays of CSR matrices, for instance)

After that:

Transformations should be rewritten to support this new representation
Memlet propagation is only done for the "absolute offset" view

Cyclic dependency when importing library nodes

Describe the bug
When importing DaCe, there's a cyclic dependency in dace.libraries.blas -> dace.library -> dace -> dace.frontend.python.newast -> dace.libraries.blas. This is causing some weird errors in unexpected places, and we need to find a resolution to this.
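The import chain above can be modeled as a small graph to make the cycle explicit. This is only an illustrative sketch: the edges are copied from the report, and `find_cycle` is a hypothetical helper, not a DaCe utility.

```python
from typing import Dict, List, Optional

# Import edges exactly as listed in the report above.
IMPORTS: Dict[str, List[str]] = {
    "dace.libraries.blas": ["dace.library"],
    "dace.library": ["dace"],
    "dace": ["dace.frontend.python.newast"],
    "dace.frontend.python.newast": ["dace.libraries.blas"],
}


def find_cycle(graph: Dict[str, List[str]], start: str) -> Optional[List[str]]:
    """Depth-first search returning the first import cycle reachable from `start`."""
    path: List[str] = []
    on_path = set()
    finished = set()

    def visit(module: str) -> Optional[List[str]]:
        if module in on_path:
            # Found a back edge: slice the current path from the repeated module.
            return path[path.index(module):] + [module]
        if module in finished:
            return None
        on_path.add(module)
        path.append(module)
        for dependency in graph.get(module, []):
            cycle = visit(dependency)
            if cycle:
                return cycle
        on_path.discard(module)
        path.pop()
        finished.add(module)
        return None

    return visit(start)


cycle = find_cycle(IMPORTS, "dace.libraries.blas")
print(" -> ".join(cycle))
```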

To Reproduce
No minimal test case has been produced yet.

Compile multiple SDFGs with the same name

Describe the bug
If two different SDFGs with the same name are loaded inside the same .py or .ipynb script:

sdfg1 = dace.SDFG('unique_name')
x = sdfg1.compile(optimizer=False)
sdfg2 = dace.SDFG('unique_name')
y = sdfg2.compile(optimizer=False)

An error appears:

... dace/codegen/compiler.py:104: UserWarning: Library ... already loaded, renaming file self._library_filename)
Segmentation fault (core dumped)

Reproduce
Sometimes more than two SDFGs with the same name are needed to reproduce the error.

Upgrade Xilinx backend to support Vitis

We need to upgrade the Xilinx compilation flow to target Vitis, which is the rebranded/repackaged version of SDx/SDAccel. This requires:

Updating hlslib to the newest version, which supports Vitis
Updating CMake variable names to reflect changes in the Find-script
Possibly changing some compiler flags to reflect the change from xocc to v++

Runtime-defined map range doesn't work

Describe the bug
I am trying to use an input variable (not a symbolic value) to define the range of map iterations. It doesn't work (probably due to incorrect memlet propagation).

To Reproduce

import dace
import numpy as np

N = dace.symbol('N')

@dace.program
def plus_1(X_in: dace.float32[N], num: dace.float32[1], X_out: dace.float32[N]):
    @dace.map
    def p1(i : _[0:num[0]]):
        x_in << X_in[i]
        x_out >> X_out[i]

        x_out = x_in + 1

X = np.random.rand(10).astype(np.float32)
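Y = np.zeros(10, dtype=np.float32)  # match the dace.float32 annotation of X_out
num = np.zeros(1, dtype=np.float32)  # match the dace.float32 annotation of num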
num[0] = 7

plus_1(X_in=X, num=num, X_out=Y, N=10)

print(Y)

It gives an error: KeyError: 'Missing program argument "__p1_e0"'

Expected behavior
The first 7 elements of Y should be filled with the corresponding values of X plus 1; the remaining elements should stay zero.
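The expected result can be written down as a plain-Python reference, assuming the map is meant to iterate over `range(num)`. The function `plus_1_reference` is a hypothetical stand-in that uses lists instead of NumPy arrays and does not involve DaCe.

```python
def plus_1_reference(x_in, num, x_out):
    """Plain-Python stand-in for the map: only the first `num` elements are written."""
    for i in range(int(num)):
        x_out[i] = x_in[i] + 1


x = [0.5] * 10
y = [0.0] * 10
plus_1_reference(x, 7, y)
print(y)
```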

Screenshots

(another problem: can't show all text on memlets simultaneously, this is why there are two screenshots)

DIODE does not transform in latest master

Transformations broke since the merge of the serialization cleanup

Pane to new window opens infinite "Properties" panes

Numpy interface passes copies of array subsets instead of the subsets themselves

Describe the bug
The NumPy interface's semantics for passing arrays as function arguments do not match Python semantics.

To Reproduce

import numpy as np
import dace

# dace semantics

M = dace.symbol('M')
K = dace.symbol('K')

@dace.program
def sdfg_transpose(A : dace.float32[M, K], B : dace.float32[K, M]):
    for i, j in dace.map[0:M, 0:K]:
        B[j, i] = A[i, j]
        
@dace.program
def transpose_test_fail(C : dace.float32[20, 20], D : dace.float32[20, 20]):
    sdfg_transpose(C[:], D[:])
    
@dace.program
def transpose_test_success(C : dace.float32[20, 20], D : dace.float32[20, 20]):
    sdfg_transpose(C[:], D)

c = np.random.rand(20, 20).astype(np.float32)
d = np.zeros((20, 20), dtype=np.float32)
e = np.zeros((20, 20), dtype=np.float32)

transpose_test_fail(c, d, K=20, M=20)
transpose_test_success(c, e, K=20, M=20)

print('dace 1', np.linalg.norm(c.transpose() - d))
print('dace 2', np.linalg.norm(c.transpose() - e))

# python semantics

c = np.random.rand(20, 20).astype(np.float32)
d = np.zeros((20, 20), dtype=np.float32)
e = np.zeros((20, 20), dtype=np.float32)

def transpose(a, b):
    b[:] = a[:].transpose()
    
transpose(c[:], d)
transpose(c[:], e[:])

print('python 1', np.linalg.norm(c.transpose() - d))
print('python 2', np.linalg.norm(c.transpose() - e))

Output

dace 1 11.521441
dace 2 0.0
python 1 0.0
python 2 0.0

Expected output

dace 1 0.0
dace 2 0.0
python 1 0.0
python 2 0.0

Additional context
A possible cause is that D[:] creates a copy, which is then passed into the function sdfg_transpose. This makes it difficult to assign to a subset of an array, because any subsetting operation (like D[3:7]) creates a copy.
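The difference between passing a copy and passing a view can be shown with a standard-library analogy. In NumPy, basic slicing such as `D[:]` returns a view, so writes through it reach the original array. The sketch below uses `array.array` (whose slices are copies) and `memoryview` (whose slices are views) purely as stand-ins so that it runs without NumPy or DaCe.

```python
from array import array

data = array("f", [0.0] * 4)

# A slice of an array.array is a copy: writes to it never reach `data`.
copied = data[:]
copied[0] = 1.0

# A memoryview slice is a view: writes go through to `data`.
view = memoryview(data)[:]
view[1] = 2.0

print(list(data))
```

Only the write through the view is visible in `data`, which mirrors the "dace 1" result above: the transposed values were written into a copy of D and then discarded.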

Handle streams in fpga_transform_state

Currently, streams are not properly handled in fpga_transform_state.
As a result, code generation fails when one tries to convert DaCe programs that contain streams (e.g. samples/simple/filter.py) to FPGA.

In the case of transient streams, this could be fixed by changing the storage class (+ I think some changes in sdfg_nesting.py).

Variable shadowing issue after applying FPGA transform in implicit notation

Running this code:

import dace
import numpy as np


n = dace.symbol("n")

@dace.program
def dot(x: dace.float32[n], y: dace.float32[n], result: dace.float32[1]):

    @dace.map(_[0:n])
    def product(i):
        x_in << x[i]
        y_in << y[i]

        result_out >> result(1, lambda a, b: a + b)
        result_out = x_in * y_in

# ----------
# MAIN
# ----------
if __name__== "__main__":
    a = np.array([1,2,3,4,5,6], dtype=np.float32)
    b = np.array([1,2,3,4,5,6], dtype=np.float32)
    c = np.array([0], dtype=np.float32)

    dot_sdfg = dot.to_sdfg()

    dot_sdfg(x=a, y=b, result=c, n=a.shape[0])
    print("Vec a: ", a)
    print("Vec b: ", b)
    print(c)

After applying "FPGATransformSDFG", the tasklet's input connector and the source memlet of the inner state have a name clash, i.e., they produce a shadowing issue. See also the attached image of the SDFG generated by the code after applying the FPGA transformation.

Last lines of error output:

  File "/home/burgerm/dace/dace/codegen/targets/cpu.py", line 464, in _emit_copy
    "    " + self.memlet_definition(sdfg, memlet, False, vconn),
  File "/home/burgerm/dace/dace/codegen/targets/cpu.py", line 975, in memlet_definition
    allow_shadowing=allow_shadowing)
  File "/home/burgerm/dace/dace/codegen/targets/target.py", line 226, in add
    raise dace.codegen.codegen.CodegenError(err_str)
dace.codegen.codegen.CodegenError: Shadowing variable x_in from type DefinedType.Pointer to DefinedType.Scalar

not-strictly-transformed sdfg does not compile

Describe the bug
Compiling this SDFG produces an error.
Applying strict transformations first and then compiling, however, produces the desired output.

To Reproduce
Steps to reproduce the error:

Load SDFG from file k_storage_expanded.zip
sdfg.compile(optimizer="")

Steps to reproduce the workaround:

Load SDFG from file k_storage_expanded.zip
sdfg.apply_strict_transformations()
sdfg.compile(optimizer="")

Expected behavior
Get programs with the same output with and without sdfg.apply_strict_transformations().

Desktop

Ubuntu 18.04.4 LTS
#137

Floor division inside an index makes compilation fail

Describe the bug
When floor division (//) instead of division (/) is used inside the index, compilation fails.

To Reproduce
Try to run this program:

import dace

N = dace.symbol('N')

@dace.program(dace.float64[N], dace.float64[N])
def floor_div(Input, Output):
    @dace.map(_[0:N])
    def div(i):
        inp << Input[i//N]
        out >> Output[i]
        out = inp
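For reference, this is how the two operators behave in plain Python for the index expression used above. It is a sketch of the language semantics only, with an arbitrary `N = 4`, and does not involve DaCe.

```python
N = 4

# Floor division keeps the index an integer; for 0 <= i < N it is always 0.
floor_indices = [i // N for i in range(N)]

# True division produces floats, which Python itself rejects as sequence indices.
true_div_values = [i / N for i in range(N)]

print(floor_indices)
print(true_div_values)
```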

Python frontend: Names are not passed along to nested SDFGs

To Reproduce
Try to compile:

@dace.program
def linear(x: dace.float32[N, N, N], w: dace.float32[N, N]):
    out = np.ndarray(x.shape, x.dtype)
    for i in dace.map[0:N]:
        out[i] = x[i] @ w
    return out

Compiling this, however, works:

@dace.program
def linear(x: dace.float32[N, N, N], w: dace.float32[N, N]):
    out = np.ndarray(x.shape, x.dtype)
    for i in dace.map[0:N]:
        out[i] = x[i] @ w[:]
    return out

Possible issue: map symbols do not register as locals

In the Python frontend, a global variable with the same name as a map variable will override it.
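For comparison, this is the scoping that plain Python applies to a loop variable inside a function, which is presumably what the frontend should mirror for map variables. The names below are illustrative only.

```python
i = 100  # module-level name that happens to match the loop variable


def sum_indices(n):
    total = 0
    for i in range(n):  # binds a local `i`; the global is not consulted
        total += i
    return total


result = sum_indices(4)
print(result, i)
```

The loop sees its own `i`, and the global keeps its value after the call.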

Immaterial storage does not properly pass on to nested SDFGs

To test, run immaterial_test.py or immaterial_range_test.py without strict transformations.

Broken Jupyter support

Describe the bug
SDFGs no longer render in Jupyter notebooks.

To Reproduce
Steps to reproduce the behavior:

Open Jupyter notebook
Import dace
Write some example and try to evaluate the SDFG.

Expected behavior
The SDFG appears in the notebook.

Screenshots

Invalid types for array subscript with strict transformations disabled

Describe the bug
When I disable strict transformations, generated code doesn't compile.

error: invalid types ‘dace::vec<float, 1> {aka float}[int]’ for array subscript
                 X_out[0] = __tmpout;

To Reproduce
Steps to reproduce the behavior:

Disable strict transformations: automatic_strict_transformations: false in ~/.dace.conf
Execute:

import numpy as np
import dace

N = dace.symbol('N')

@dace.program
def dace_sum(X_in: dace.float32[N], X_out: dace.float32[1]):
    dace.reduce(lambda a, b: a + b, X_in, X_out, identity=0)
    
@dace.program
def dace_max(X_in: dace.float32[N], X_out: dace.float32[1]):
    dace.reduce(lambda a, b: max(a, b), X_in, X_out)

@dace.program
def dace_softmax(X_in : dace.float32[N], X_out : dace.float32[N]):
    
    tmp_max = dace.define_local([1], dtype=dace.float32)
    tmp_sum = dace.define_local([1], dtype=dace.float32)
    
    dace_max(X_in, tmp_max)
        
    @dace.map
    def softmax_tasklet_sub(i : _[0:N]):
        x_in << X_in[i]
        x_max << tmp_max
        x_out >> X_out[i]

        x_out = exp(x_in - x_max)
        
    dace_sum(X_out, tmp_sum)
    
    @dace.map
    def softmax_tasklet_div(i : _[0:N]):
        x_in << X_out[i]
        x_sum << tmp_sum
        x_out >> X_out[i]
        
        x_out = x_in / x_sum

X = np.array([1,2,3,4,5], dtype=np.float32)
Y = np.zeros(X.shape, dtype=np.float32)

dace_softmax(X_in=X, X_out=Y, N=X.shape[0])

Expected behavior
The program should compile and run as it does with automatic_strict_transformations: true in ~/.dace.conf.

Renderer improvements break some visualizations

Try to run "tutorials/sdfg_api.ipynb" in jupyter:

Before renderer improvements #58 (f2c358f)

After renderer improvements #58 (5bdc38e)

Make Javascript dependencies submodules

histogram_declarative sample sporadically failing

nest_state_subgraph does not work well with scalars

Memlets that become scalars do not generate proper nested SDFGs. This usually happens with maps of size 1.

C++ interpolation is misquoted in the output

Test-case: call_sdfg_test.py

Focus on the 'printf("hello world %f\\n", i)'

dace/tests/call_sdfg_test.py

Lines 1 to 20 in ed882c4

import dace
import numpy as np

sdfg = dace.SDFG('internal')
sdfg.add_array('inp', [2], dace.float32)
state = sdfg.add_state()
t = state.add_tasklet('p', {'i'}, set(), 'printf("hello world %f\\n", i)')
r = state.add_read('inp')
state.add_edge(r, None, t, 'i', dace.Memlet.simple('inp', '1'))


@dace.program
def caller(A: dace.float32[4]):
    sdfg(inp=A[1:3])


if __name__ == '__main__':
    A = np.random.rand(4).astype(np.float32)
    caller(A)
    print('Should print', A[2])

This is misgenerated as

[ 25%] Building CXX object CMakeFiles/caller.dir....//dace/.dacecache/caller/src/cpu/caller.cpp.o
....//dace/.dacecache/caller/src/cpu/caller.cpp:21:20: warning: character constant too long for its type
   21 |             printf('hello world %f\n', i);
      |                    ^~~~~~~~~~~~~~~~~~
....//dace/.dacecache/caller/src/cpu/caller.cpp: In function ‘void __program_caller_internal(float*)’:
....//dace/.dacecache/caller/src/cpu/caller.cpp:10:51: warning: ‘new’ of type ‘float’ with extended alignment 64 [-Waligned-new=]
   10 |         float *__tmp0 = new float DACE_ALIGN(64)[2];
      |                                                   ^
....//dace/.dacecache/caller/src/cpu/caller.cpp:10:51: note: uses ‘void* operator new [](std::size_t)’, which does not have an alignment parameter
....//dace/.dacecache/caller/src/cpu/caller.cpp:10:51: note: use ‘-faligned-new’ to enable C++17 over-aligned new support
....//dace/.dacecache/caller/src/cpu/caller.cpp:21:20: error: invalid conversion from ‘int’ to ‘const char*’ [-fpermissive]
   21 |             printf('hello world %f\n', i);
      |                    ^~~~~~~~~~~~~~~~~~
      |                    |
      |                    int
In file included from /usr/include/c++/9.2.0/cstdio:42,
                 from /usr/lib/python3.8/site-packages/dace/codegen/../runtime/include/dace/dace.h:5,
                 from ....//dace/.dacecache/caller/src/cpu/caller.cpp:2:
/usr/include/stdio.h:332:43: note:   initializing argument 1 of ‘int printf(const char*, ...)’
  332 | extern int printf (const char *__restrict __format, ...);
      |                    ~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~
make[2]: *** [CMakeFiles/caller.dir/build.make:63: CMakeFiles/caller.dir....//dace/.dacecache/caller/src/cpu/caller.cpp.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:78: CMakeFiles/caller.dir/all] Error 2
make: *** [Makefile:84: all] Error 2

The double quotes (") of the string literal became single quotes ('), which C++ parses as a character constant.
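One plausible mechanism, not confirmed by the report, is that the tasklet code is parsed and then printed back as Python source, and Python's own string formatting prefers single quotes. The sketch below shows that preference with `repr`, and uses `json.dumps` only as a convenient way to produce a double-quoted literal for this particular string; it is not a general C string escaper.

```python
import json

# The format string as the tasklet author intended it.
fmt = "hello world %f\n"

# Python-style literal: single quotes, which C++ reads as a character constant.
python_literal = repr(fmt)

# Double-quoted literal with the same escape sequence, valid as a C string here.
c_style_literal = json.dumps(fmt)

print(python_literal)
print(c_style_literal)
```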

On Python 3.8.1 and python-astunparse 1.6.2

Library nodes: Add "Expand node" and "Expand all nodes" buttons to DIODE

Currently DIODE relies on automatic library node expansion to work. The workflow can be improved by having the buttons to expand individual library nodes for further transformation right in the UI. This should be part of the transformation chain as well, so that it can be undone and saved as part of the DIODE workspace.

"Expand" button within library node properties
"Expand all library nodes" button in SDFG properties

WCR appears on CPU instead of GPU after GPUTransformSDFG

Describe the bug
Segfault caused by an illegal CPU access to memory allocated with cudaMalloc. Code generation emits the WCR code for the CPU instead of the GPU.

To Reproduce

Create SDFG with WCR
Apply GPUTransformSDFG
Execute SDFG

Impossible to reuse function with different symbolic instantiations

Issue

import numpy as np
import dace

M = dace.symbol('M')
K = dace.symbol('K')

@dace.program
def sdfg_transpose(A : dace.float32[M, K], B : dace.float32[K, M]):
    for i, j in dace.map[0:M, 0:K]:
        B[j, i] = A[i, j]

@dace.program
def transpose_test(C : dace.float32[20, 20], D : dace.float32[5, 5], E : dace.float32[10, 10]):
    sdfg_transpose(C[0:5,0:5], D)
    sdfg_transpose(C[0:10,0:10], E)
    
c = np.random.rand(20, 20).astype(np.float32)
d = np.zeros((5, 5), dtype=np.float32)
e = np.zeros((10, 10), dtype=np.float32)

transpose_test(c, d, e, K=???, M=???) # what K and M I should use here?

print(np.linalg.norm(c[0:5,0:5].transpose() - d))
print(np.linalg.norm(c[0:10,0:10].transpose() - e))

Proposed solution 1
Automatic derivation of symbolic values

import numpy as np
import dace

M = dace.symbol('M')
K = dace.symbol('K')

@dace.program
def sdfg_transpose(A : dace.float32[M, K], B : dace.float32[K, M]):
    for i, j in dace.map[0:M, 0:K]:
        B[j, i] = A[i, j]

@dace.program
def transpose_test(C : dace.float32[20, 20], D : dace.float32[5, 5], E : dace.float32[10, 10]):
    sdfg_transpose(C[0:5,0:5], D)
    sdfg_transpose(C[0:10,0:10], E)
    
c = np.random.rand(20, 20).astype(np.float32)
d = np.zeros((5, 5), dtype=np.float32)
e = np.zeros((10, 10), dtype=np.float32)

transpose_test(c, d, e) # <<< THIS

print(np.linalg.norm(c[0:5,0:5].transpose() - d))
print(np.linalg.norm(c[0:10,0:10].transpose() - e))
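A rough sketch of how such a derivation could work, assuming symbols are matched position by position against the shapes of the actual arguments. The function `infer_symbols` and its tuple-of-names representation of annotations are hypothetical and unrelated to DaCe's internals.

```python
def infer_symbols(annotated_shapes, actual_shapes):
    """Bind each symbolic dimension name to the size seen in the actual arguments."""
    symbols = {}
    for annotated, actual in zip(annotated_shapes, actual_shapes):
        if len(annotated) != len(actual):
            raise ValueError("rank mismatch between annotation and argument")
        for dim, size in zip(annotated, actual):
            if isinstance(dim, int):
                # Constant dimension: must match exactly.
                if dim != size:
                    raise ValueError(f"expected size {dim}, got {size}")
            elif symbols.setdefault(dim, size) != size:
                # Symbol seen before with a different value.
                raise ValueError(f"conflicting values for symbol {dim}")
    return symbols


# Shapes of sdfg_transpose(A: [M, K], B: [K, M]) for the two calls above.
signature = [("M", "K"), ("K", "M")]
first_call = infer_symbols(signature, [(5, 5), (5, 5)])
second_call = infer_symbols(signature, [(10, 10), (10, 10)])
print(first_call, second_call)
```

Each call site gets its own binding, so the same function could be reused with 5x5 and 10x10 subsets without the caller passing K and M.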

Proposed solution 2

import numpy as np
import dace

M = dace.symbol('M')
K = dace.symbol('K')

@dace.program
def sdfg_transpose(A : dace.float32[M, K], B : dace.float32[K, M]):
    for i, j in dace.map[0:M, 0:K]:
        B[j, i] = A[i, j]

@dace.program
def transpose_test(C : dace.float32[20, 20], D : dace.float32[5, 5], E : dace.float32[10, 10]):
    sdfg_transpose(C[0:5,0:5], D, K=5, M=5) # <<< THIS
    sdfg_transpose(C[0:10,0:10], E, K=10, M=10) # <<< THIS
    
c = np.random.rand(20, 20).astype(np.float32)
d = np.zeros((5, 5), dtype=np.float32)
e = np.zeros((10, 10), dtype=np.float32)

transpose_test(c, d, e) 

print(np.linalg.norm(c[0:5,0:5].transpose() - d))
print(np.linalg.norm(c[0:10,0:10].transpose() - e))

SDFG validation fails when loading from file with unexpanded library nodes

Describe the bug
SDFG.from_file calls SDFG.validate, which fails if there are unexpanded library nodes.

To Reproduce
Load an SDFG from file that has unexpanded library nodes.

Expected behavior
This failure should only happen when we're doing/about to do code generation. We need to move this validation somewhere else, or distinguish between the two cases when calling SDFG.validate.

sdfg.apply_strict_transformations() transforms compiling sdfg to non-compiling sdfg.

Describe the bug
Applying strict transformations and then compiling this SDFG produces an error.
Compiling without the strict transformations, however, produces the desired output.

To Reproduce
Steps to reproduce the error:

Load SDFG from file coriolis_expanded.zip
sdfg.apply_strict_transformations()
sdfg.compile(optimizer="")

Steps to reproduce the working equivalent program:

Load SDFG from file coriolis_expanded.zip
sdfg.compile(optimizer="")

Expected behavior
Get programs that produce the same output, with and without sdfg.apply_strict_transformations().

Desktop

Ubuntu 18.04.4 LTS
#137

Error log

-- Configuring done
-- Generating done
-- Build files have been written to: /home/dominic/work/dawn2dace/.dacecache/coriolis_stencil/build

Scanning dependencies of target coriolis_stencil
[ 25%] Building CXX object CMakeFiles/coriolis_stencil.dir/home/dominic/work/dawn2dace/.dacecache/coriolis_stencil/src/cpu/coriolis_stencil.cpp.o
/home/dominic/work/dawn2dace/.dacecache/coriolis_stencil/src/cpu/coriolis_stencil.cpp: In function ‘void __program_coriolis_stencil_internal(double*, double*, double*, double*, double*, int, int, int, int)’:
/home/dominic/work/dawn2dace/.dacecache/coriolis_stencil/src/cpu/coriolis_stencil.cpp:17:230: error: no matching function for call to ‘dace::ArrayViewIn<double, 2, 1, 0>::ArrayViewIn(int, int, int)’
                                     auto __v_in = dace::ArrayViewIn<double, 2, 1, dace::NA_RUNTIME> (v + ((w + (((8 * (K + 1)) * (u - 1)) * int_ceil(I, 8))) + ((8 * (k + v)) * int_ceil(I, 8))), ((8 * (K + 1)) * int_ceil(I, 8)), 1);
                                                                                                                                                                                                                                      ^
In file included from /home/dominic/work/dace/dace/codegen/../runtime/include/dace/dace.h:20,
                 from /home/dominic/work/dawn2dace/.dacecache/coriolis_stencil/src/cpu/coriolis_stencil.cpp:2:
/home/dominic/work/dace/dace/codegen/../runtime/include/dace/view.h:45:28: note: candidate: ‘template<class ... Dim> dace::ArrayViewIn<T, DIMS, VECTOR_LEN, NUM_ACCESSES, ALIGNED, OffsetT>::ArrayViewIn(const T*, const Dim& ...)’
         explicit DACE_HDFI ArrayViewIn(T const* ptr, const Dim&... strides) :
                            ^~~~~~~~~~~
/home/dominic/work/dace/dace/codegen/../runtime/include/dace/view.h:45:28: note:   template argument deduction/substitution failed:
/home/dominic/work/dawn2dace/.dacecache/coriolis_stencil/src/cpu/coriolis_stencil.cpp:17:104: note:   cannot convert ‘(v + ((w + (((8 * (K + 1)) * (u - 1)) * int_ceil<int, int>(I, 8))) + ((8 * (k + v)) * int_ceil<int, int>(I, 8))))’ (type ‘int’) to type ‘const double*’
                                     auto __v_in = dace::ArrayViewIn<double, 2, 1, dace::NA_RUNTIME> (v + ((w + (((8 * (K + 1)) * (u - 1)) * int_ceil(I, 8))) + ((8 * (k + v)) * int_ceil(I, 8))), ((8 * (K + 1)) * int_ceil(I, 8)), 1);
                                                                                                      ~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /home/dominic/work/dace/dace/codegen/../runtime/include/dace/dace.h:20,
                 from /home/dominic/work/dawn2dace/.dacecache/coriolis_stencil/src/cpu/coriolis_stencil.cpp:2:
/home/dominic/work/dace/dace/codegen/../runtime/include/dace/view.h:24:11: note: candidate: ‘constexpr dace::ArrayViewIn<double, 2, 1, 0>::ArrayViewIn(const dace::ArrayViewIn<double, 2, 1, 0>&)’
     class ArrayViewIn
           ^~~~~~~~~~~
/home/dominic/work/dace/dace/codegen/../runtime/include/dace/view.h:24:11: note:   candidate expects 1 argument, 3 provided
/home/dominic/work/dace/dace/codegen/../runtime/include/dace/view.h:24:11: note: candidate: ‘constexpr dace::ArrayViewIn<double, 2, 1, 0>::ArrayViewIn(dace::ArrayViewIn<double, 2, 1, 0>&&)’
/home/dominic/work/dace/dace/codegen/../runtime/include/dace/view.h:24:11: note:   candidate expects 1 argument, 3 provided
/home/dominic/work/dawn2dace/.dacecache/coriolis_stencil/src/cpu/coriolis_stencil.cpp:18:64: error: expected primary-expression before ‘)’ token
                                     auto *v_in = __v_in.ptr<1>();
                                                                ^
/home/dominic/work/dawn2dace/.dacecache/coriolis_stencil/src/cpu/coriolis_stencil.cpp:20:43: warning: unused variable ‘fc_in’ [-Wunused-variable]
                                     auto *fc_in = __fc_in.ptr<1>();
                                           ^~~~~
/home/dominic/work/dawn2dace/.dacecache/coriolis_stencil/src/cpu/coriolis_stencil.cpp:45:230: error: no matching function for call to ‘dace::ArrayViewIn<double, 2, 1, 0>::ArrayViewIn(int, int, int)’
                                     auto __u_in = dace::ArrayViewIn<double, 2, 1, dace::NA_RUNTIME> (u + ((((((8 * u) * (K + 1)) * int_ceil(I, 8)) + w) + ((8 * (k + v)) * int_ceil(I, 8))) - 1), ((8 * (K + 1)) * int_ceil(I, 8)), 1);
                                                                                                                                                                                                                                      ^
In file included from /home/dominic/work/dace/dace/codegen/../runtime/include/dace/dace.h:20,
                 from /home/dominic/work/dawn2dace/.dacecache/coriolis_stencil/src/cpu/coriolis_stencil.cpp:2:
/home/dominic/work/dace/dace/codegen/../runtime/include/dace/view.h:45:28: note: candidate: ‘template<class ... Dim> dace::ArrayViewIn<T, DIMS, VECTOR_LEN, NUM_ACCESSES, ALIGNED, OffsetT>::ArrayViewIn(const T*, const Dim& ...)’
         explicit DACE_HDFI ArrayViewIn(T const* ptr, const Dim&... strides) :
                            ^~~~~~~~~~~
/home/dominic/work/dace/dace/codegen/../runtime/include/dace/view.h:45:28: note:   template argument deduction/substitution failed:
/home/dominic/work/dawn2dace/.dacecache/coriolis_stencil/src/cpu/coriolis_stencil.cpp:45:104: note:   cannot convert ‘(u + ((((((8 * u) * (K + 1)) * int_ceil<int, int>(I, 8)) + w) + ((8 * (k + v)) * int_ceil<int, int>(I, 8))) - 1))’ (type ‘int’) to type ‘const double*’
                                     auto __u_in = dace::ArrayViewIn<double, 2, 1, dace::NA_RUNTIME> (u + ((((((8 * u) * (K + 1)) * int_ceil(I, 8)) + w) + ((8 * (k + v)) * int_ceil(I, 8))) - 1), ((8 * (K + 1)) * int_ceil(I, 8)), 1);
                                                                                                      ~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /home/dominic/work/dace/dace/codegen/../runtime/include/dace/dace.h:20,
                 from /home/dominic/work/dawn2dace/.dacecache/coriolis_stencil/src/cpu/coriolis_stencil.cpp:2:
/home/dominic/work/dace/dace/codegen/../runtime/include/dace/view.h:24:11: note: candidate: ‘constexpr dace::ArrayViewIn<double, 2, 1, 0>::ArrayViewIn(const dace::ArrayViewIn<double, 2, 1, 0>&)’
     class ArrayViewIn
           ^~~~~~~~~~~
/home/dominic/work/dace/dace/codegen/../runtime/include/dace/view.h:24:11: note:   candidate expects 1 argument, 3 provided
/home/dominic/work/dace/dace/codegen/../runtime/include/dace/view.h:24:11: note: candidate: ‘constexpr dace::ArrayViewIn<double, 2, 1, 0>::ArrayViewIn(dace::ArrayViewIn<double, 2, 1, 0>&&)’
/home/dominic/work/dace/dace/codegen/../runtime/include/dace/view.h:24:11: note:   candidate expects 1 argument, 3 provided
/home/dominic/work/dawn2dace/.dacecache/coriolis_stencil/src/cpu/coriolis_stencil.cpp:46:64: error: expected primary-expression before ‘)’ token
                                     auto *u_in = __u_in.ptr<1>();
                                                                ^
/home/dominic/work/dawn2dace/.dacecache/coriolis_stencil/src/cpu/coriolis_stencil.cpp:48:43: warning: unused variable ‘fc_in’ [-Wunused-variable]
                                     auto *fc_in = __fc_in.ptr<1>();
                                           ^~~~~
In file included from /home/dominic/work/dace/dace/codegen/../runtime/include/dace/dace.h:16,
                 from /home/dominic/work/dawn2dace/.dacecache/coriolis_stencil/src/cpu/coriolis_stencil.cpp:2:
/home/dominic/work/dace/dace/codegen/../runtime/include/dace/math.h: At global scope:
/home/dominic/work/dace/dace/codegen/../runtime/include/dace/math.h:108:43: warning: ‘dace::math::pi’ defined but not used [-Wunused-variable]
         static DACE_CONSTEXPR typeless_pi pi{};
                                           ^~
CMakeFiles/coriolis_stencil.dir/build.make:62: recipe for target 'CMakeFiles/coriolis_stencil.dir/home/dominic/work/dawn2dace/.dacecache/coriolis_stencil/src/cpu/coriolis_stencil.cpp.o' failed
make[2]: *** [CMakeFiles/coriolis_stencil.dir/home/dominic/work/dawn2dace/.dacecache/coriolis_stencil/src/cpu/coriolis_stencil.cpp.o] Error 1
CMakeFiles/Makefile2:77: recipe for target 'CMakeFiles/coriolis_stencil.dir/all' failed
make[1]: *** [CMakeFiles/coriolis_stencil.dir/all] Error 2
Makefile:83: recipe for target 'all' failed
make: *** [all] Error 2

DIODE broken after serialization cleanup

Describe the bug
DIODE does not interact well with SDFGs following the merge of the serialization_cleanup branch.

Among broken features:

Add new symbols/arrays does not work
Properties of graph elements and transformations do not show up
Symbol types may change during serialization/deserialization and disallow transformation/running (Example: filter.py)

To Reproduce
Steps to reproduce the behavior:

Run DIODE
Open any file (gemm.py, filter.py for the third broken feature)
For each problem:
- Problem 1: In the properties window, type a name and click add symbol / add array. An exception from Python is raised.
- Problem 2: Click any graph element (state, node). No properties show up in the properties pane.
- Problem 3: Run filter.py, apply any transformation and revert. Python exceptions are raised due to a conversion from float to int during serialization/deserialization.

SDFG.name should be a property

Is your feature request related to a problem? Please describe.
Currently SDFG.name is implemented as a Python property, with an advanced setter and validation. This requires it to be handled manually during serialization, which causes weird issues.

Describe the solution you'd like
SDFG.name is used as a property, so it should be one.

Change how FPGA compiler is specified in the DaCe config

Right now, we are specifying the compiler with an executable name (e.g., "xocc"). Instead, we should pass the root folder of the installation (e.g., /opt/Xilinx/Vitis/2019.2). Furthermore, we should default to not setting this in the config, and instead letting the CMake script find the local installation.

Assignment in numpy interface doesn't compile

Describe the bug

error: cannot convert ‘dace::ArrayViewIn<float, 0, 1, 1>’ to ‘float*’ in assignment
         __tmp0 = dace::ArrayViewIn<float, 0, 1, 1> (a + 0);

To Reproduce

import numpy as np
import dace

@dace.program
def foo123(a : dace.float32[2], b : dace.float32[2]):
    b[0] = a[0]
    
A = np.array([1,2], dtype=np.float32)
B = np.array([3,4], dtype=np.float32)

foo123(A, B)

print(A)
print(B)

Diode: No SDFG found

Describe the bug
When opening a DaCe program with DIODE, the SDFG is not drawn and the following error appears: "ValueError: No SDFGs found in file. SDFGs are only recognized when @dace.programs or SDFG objects are found in the global scope"

This happens with some of the polybench samples (not all of them)

To Reproduce
Steps to reproduce the behavior:

Start Diode
Open a Dace program such as adi, cholesky or correlation

Some operators in Python don't compile

Describe the bug
Using operators in element-wise statements, or using augmented assignment, fails in the frontend.

To Reproduce
Run the program below:

import dace
M = dace.symbol('M')
K = dace.symbol('K')

@dace.program
def transpose(A: dace.float32[M, K], B: dace.float32[K, M]):
    for i, j in dace.map[0:M, 0:K]:
        B[j, i] = A[i, j] + 1

Segfaulting program takes DIODE with it

When a program run through DIODE segfaults, DIODE crashes along with it.

Instead, DIODE should run the program in a separate process, detect that the process crashed, and report this to the user.
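
A minimal sketch of the proposed isolation, using only the Python standard library. It is not DIODE's implementation: `run_isolated` is a hypothetical helper, and the program is assumed to be given as a Python source string. On POSIX systems a child killed by a signal (such as SIGSEGV) is reported with a negative return code, which is what the sketch checks for.

```python
import signal
import subprocess
import sys


def run_isolated(code, timeout=60.0):
    """Run `code` in a child interpreter and return (ok, message).

    Hypothetical helper: a crash in the child cannot take the caller down,
    because the child is a separate process.
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return False, f"program timed out after {timeout} s"

    if proc.returncode == 0:
        return True, proc.stdout

    if proc.returncode < 0:
        # POSIX convention: -N means the child was terminated by signal N.
        signum = -proc.returncode
        try:
            name = signal.Signals(signum).name
        except ValueError:
            name = f"signal {signum}"
        return False, f"program crashed ({name})"

    return False, f"program exited with code {proc.returncode}"
```

The caller can then show the returned message in the UI instead of dying. On Windows, crashes surface as large positive exit codes rather than negative ones, so they would land in the last branch.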

Python frontend: constants / symbols do not appear in generated code

To Reproduce

import dace

@dace.program
def subrange_of_subrange(A: dace.float32[2, 3, 4, 5], B: dace.float32[4]):
    i = 0
    j = 0
    k = 0
    B[:] = A[:, i, :, j][k, :]

i, j, and k do not appear in the generated code
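
For reference, the result such a program would be expected to produce can be checked in plain NumPy. This sketch assumes the DaCe program is meant to follow NumPy indexing semantics; the input values are made up for illustration.

```python
import numpy as np

# Illustrative input: each element holds its own flat index.
A = np.arange(2 * 3 * 4 * 5, dtype=np.float32).reshape(2, 3, 4, 5)
B = np.zeros(4, dtype=np.float32)

i = 0
j = 0
k = 0
# A[:, i, :, j] has shape (2, 4); taking [k, :] leaves a vector of length 4.
B[:] = A[:, i, :, j][k, :]

# The chained subranges collapse to a single index expression.
same_as_direct = np.array_equal(B, A[k, i, :, j])
```

The equivalence `A[:, i, :, j][k, :] == A[k, i, :, j]` shows that all three indices have to reach the generated code for the copy to select the right elements.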

MPI test fails sporadically

When multiple tests run concurrently in Jenkins, the MPI test can fail sporadically. This shows up as spurious failures for commits/pull requests that don't actually contain any new bugs.

/opt/mpich3.2.11/bin/mpirun
Running python3
Traceback (most recent call last):
  File "/var/lib/jenkins/workspace/dace_intel_fpga/tests/../tests/immaterial_test.py", line 4, in <module>
    import dace
  File "/var/lib/jenkins/workspace/dace_intel_fpga/dace/__init__.py", line 4, in <module>
    from .frontend.python.decorators import *
  File "/var/lib/jenkins/workspace/dace_intel_fpga/dace/frontend/python/decorators.py", line 7, in <module>
    from dace.frontend.python import parser
  File "/var/lib/jenkins/workspace/dace_intel_fpga/dace/frontend/python/parser.py", line 8, in <module>
    from dace.config import Config
  File "/var/lib/jenkins/workspace/dace_intel_fpga/dace/config.py", line 266, in <module>
    Config.initialize()
  File "/var/lib/jenkins/workspace/dace_intel_fpga/dace/config.py", line 89, in initialize
    Config.load()
  File "/var/lib/jenkins/workspace/dace_intel_fpga/dace/config.py", line 111, in load
    Config._config_metadata['required'])
  File "/var/lib/jenkins/workspace/dace_intel_fpga/dace/config.py", line 24, in _add_defaults
    if k not in config:
TypeError: argument of type 'NoneType' is not iterable
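
The traceback shows the loaded configuration being `None`. One plausible cause, not confirmed by this report, is that one process reads the configuration file while another is still writing it, so the reader sees an empty file. The sketch below illustrates two generic defenses under that assumption: writing through a temporary file with an atomic rename, and treating an empty file as "no overrides". The function names are hypothetical, and JSON stands in for the real configuration format to keep the example dependency-free.

```python
import json
import os
import tempfile


def save_config_atomically(path, config):
    """Write to a temporary file, then rename it over `path` so that
    concurrent readers see either the old or the new file, never a partial one."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(config, f)
        os.replace(tmp_path, path)  # atomic within a single filesystem
    except BaseException:
        os.unlink(tmp_path)
        raise


def load_config(path, defaults):
    """Load overrides from `path`; a missing or empty file yields the defaults."""
    try:
        with open(path) as f:
            text = f.read()
    except FileNotFoundError:
        text = ""
    loaded = json.loads(text) if text.strip() else None
    config = dict(defaults)
    config.update(loaded or {})  # `None` is treated as an empty mapping
    return config
```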

More DIODE client tests and Selenium

We are missing many tests (e.g., running code, transformations) in DIODE, which could be tested in two ways:

Through diode_client, sending HTTP requests to the server
Through the browser (including Javascript), using something like Selenium

Dimensionality mismatch between src/dst subsets

import numpy as np
import dace

@dace.program
def foo123(a : dace.float32[2,3], b : dace.float32[2,3]):
    b[0,:] = a[0,:]
    
A = np.full((2,3), 3, dtype=np.float32)
B = np.full((2,3), 4, dtype=np.float32)

foo123(A, B)

print(A)
print(B)

Error:

InvalidSDFGEdgeError: Dimensionality mismatch between src/dst subsets (at state assign_6_4, edge b[0, 0:3] -> [0:2] (__tmp0:None -> b:None))

Expected output:

If you replace b[0,:] = a[0,:] by b[0] = a[0] everything works as expected.
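
For comparison, both spellings of the assignment are equivalent in plain NumPy, which is presumably the behavior the DaCe program should reproduce. The sketch below uses the same inputs as the report and does not involve DaCe.

```python
import numpy as np

A = np.full((2, 3), 3, dtype=np.float32)

# Row assignment written with an explicit slice.
B_slice = np.full((2, 3), 4, dtype=np.float32)
B_slice[0, :] = A[0, :]

# Row assignment written with a single index.
B_index = np.full((2, 3), 4, dtype=np.float32)
B_index[0] = A[0]

# Both forms copy the first row of A and leave the second row untouched.
expected = np.array([[3, 3, 3], [4, 4, 4]], dtype=np.float32)
```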

VSCode: Transformations and UI

      |  source code    |      | xforms | history
files +-----------------+ SDFG +------------------
      |  generated code |      |    properties

Rendering: Minimize edge-crossings when laying out connectors

This is samples/simple/spmv.py, there seem to be unnecessary edge crossings due to the order of connectors.

SMI Integration

Integrate a minimal set of SMI functionality (only point-to-point communication for the time being).

Possible solution

Introduce the concept of remote streams

Technical details

Whether SMI is used is determined during code generation by checking whether the SDFG uses remote streams.

CMake Integration

The use of SMI is detected in the code-generation phase. When it is detected, the appropriate Make targets are created for compiling and emulating SMI-based programs.

For emulation, this requires defining a topology file that contains the mapping program <-> rank. In this first implementation the file carries little information, but it will be required for full SMI integration.

Codegen object

A target_name field is defined on the codegen object. It can be set when returning a codegen object for which the field should be initialized.

Generated code

To keep the emulation toolchain simple, the generated host code assumes that the following attributes are present:

smi_rank: current rank (int)
smi_num_ranks: total number of ranks (int)
smi_device: device used (int, useful for running on Noctua)

These must be defined by specializing the SDFG.

TODO: this must be cleaned up

DuplicateDLLError in jupyter notebook

Describe the bug
If the first run of a DaCe program fails in a Jupyter notebook, the second run complains that the shared library is already loaded.

To Reproduce
Steps to reproduce the behavior (see screenshot):

In cell 3 make some mistake: for example, forget arguments K and M
Try to execute cell 4

Expected behavior
Cell 4 should be executed without any problems.

Update semantics w.r.t. symbol namespace

Symbols are now per-SDFG

Demo for running double buffering in Python.

Requesting a demo or usage example for running double buffering in Python.

import dace
import numpy as np

sdfg = dace.SDFG('internal')
sdfg.add_array('inp', [2], dace.float32)
state = sdfg.add_state()
t = state.add_tasklet('p', {'i'}, set(), 'printf("hello world %f\\n", i)')
r = state.add_read('inp')
state.add_edge(r, None, t, 'i', dace.Memlet.simple('inp', '1'))


@dace.program
def caller(A: dace.float32[4]):
    sdfg(inp=A[1:3])


if __name__ == '__main__':
    A = np.random.rand(4).astype(np.float32)
    caller(A)
    print('Should print', A[2])
