drhagen / tensora
Sparse/dense tensor library for Python
Home Page: http://tensora.drhagen.com/
License: MIT License
Right now, the compiled binaries are generated by writing the C code to disk, invoking a C compiler on the .c file, and loading the resulting shared object with CFFI. We could avoid the disk entirely by using LLVM to create the binaries in memory. The place to start would be llvmlite, which provides Python bindings to LLVM.
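As a hedged sketch of the llvmlite route (the function compiled here is a trivial stand-in, not a Tensora kernel), the in-memory flow would look roughly like this: build LLVM IR, JIT-compile it with MCJIT, and call the resulting address through ctypes, with no .c file or shared object touching the disk:

```python
import ctypes

from llvmlite import binding, ir

# One-time LLVM setup
binding.initialize()
binding.initialize_native_target()
binding.initialize_native_asmprinter()

# Build IR for a stand-in kernel: double add(double a, double b)
module = ir.Module(name="demo")
fnty = ir.FunctionType(ir.DoubleType(), [ir.DoubleType(), ir.DoubleType()])
fn = ir.Function(module, fnty, name="add")
builder = ir.IRBuilder(fn.append_basic_block("entry"))
builder.ret(builder.fadd(fn.args[0], fn.args[1]))

# JIT-compile in memory instead of invoking a C compiler on disk
target_machine = binding.Target.from_default_triple().create_target_machine()
engine = binding.create_mcjit_compiler(
    binding.parse_assembly(str(module)), target_machine
)
engine.finalize_object()

# Call the compiled function through ctypes
add = ctypes.CFUNCTYPE(ctypes.c_double, ctypes.c_double, ctypes.c_double)(
    engine.get_function_address("add")
)
result = add(2.0, 3.0)
```

Tensora's generated kernels would need their C emitted as IR (or lowered via clang's APIs) rather than as source text, which is the bulk of the work here.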
The output_format parameter should be optional in evaluate and default to all dense. This is a common enough choice, and we default to all dense in other places.
With #33 closed, full scalar support is deferred indefinitely. The existing code that partially supports scalars should be removed for now.
Currently, all the logic for checking that the formats match the variables in the assignment is done in tensora.function.PureTensorMethod. This code is not run by the CLI. This logic should be centralized into a class tensora.desugar.Problem that encapsulates the desugared assignment and the validated format dictionary. Then this logic should be used upstream of code generation for both evaluate and the CLI.
Tensora's use of CFFI does not work on Windows. The following issues need to be fixed:
- FFI.dlopen(None) is used to grab a generic C library, but there is no generic C library on Windows. There are several possible solutions.
- free is treated as just a generic function, but free is not the same function in all Windows libraries. This could be made a Unix-specific feature because it is only used by take_ownership_of_tensor, which is not used internally by Tensora.
Right now, we simply fall off the end of the solution generator with a StopIteration. This should be caught so that the user gets a nice error message when no consistent iteration order can be found.
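A sketch of the fix, assuming a generator of candidate iteration orders (the names here are illustrative, not Tensora's actual API): convert the exhausted generator into a descriptive, user-facing exception instead of letting StopIteration escape.

```python
class NoIterationOrderError(Exception):
    """Raised when no consistent iteration order can be found."""

def first_iteration_order(candidate_orders):
    # Instead of calling next() bare and leaking StopIteration to the
    # caller, catch it and raise a descriptive error.
    iterator = iter(candidate_orders)
    try:
        return next(iterator)
    except StopIteration:
        raise NoIterationOrderError(
            "No consistent iteration order could be found for this "
            "combination of tensor formats."
        ) from None
```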
Right now, loops inside a Sum node are supposed to be fused if they start with the same index. However, this only happens if they are adjacent. If a loop with a different index is placed between two fusible loops in the Sum, they are not fused. This is both inefficient and causes a crash (because the variables are redeclared).
Fused:
tensora 'a(i) = b(i) + c(i) + d(j,i)'
Not fused:
tensora 'a(i) = b(i) + c(j,i) + d(i)'
Right now, a point-like bucket looks like this:
double* restrict bucket_y_0 = y_vals + p_y_0_0;
int32_t i_bucket_y_0 = 0;
while (i_bucket_y_0 < 1) {
bucket_y_0[i_bucket_y_0] = 0;
i_bucket_y_0++;
}
// *** Computation of expression ***
bucket_y_0[0] += A_vals[p_A_1_1] * x_vals[p_x_2_0];
This is not wrong, and it is probably all optimized down by LLVM, but it would be fairly simple to emit this instead, which looks a lot nicer:
double* restrict bucket_y_0 = y_vals + p_y_0_0;
*bucket_y_0 = 0;
// *** Computation of expression ***
*bucket_y_0 += A_vals[p_A_1_1] * x_vals[p_x_2_0];
Right now, assembly does the exact same thing as evaluate. It should not emit any of the computation.
There is already the taco_structure_to_cffi function in compile that takes ownership of the data in indices and vals. If a taco_tensor_t is created in some C code outside Tensora, it would be nice to be able to take ownership of all the data in the structure. This will involve taking ownership of the struct pointer itself, then taking ownership of the dimensions, mode_ordering, and mode_types pointers, and finally calling taco_structure_to_cffi to take ownership of the rest.
The following problem results in an empty assemble function:
tensora 'A(i,j) = B(i,j) + C(i,k) * D(k,j)' -f A:ds -f B:ds -f C:ds -f D:ds -t assemble
This operator implies in-place mutation, which Tensora does not support.
This generates a valid kernel:
tensora 'A(i,m,j) = B(i,k)*C(m,j,k)' -f A1:dss -f S1:ds -f D1:sss
while this change to the mode ordering does not:
tensora 'A(i,m,j) = B(i,k)*C(k,m,j)' -f A1:dss -f S1:ds -f D1:s2s0s1
These kernels should be identical, so the handling of mode ordering is wrong somewhere.
Tensora cannot find a solution for the following problem, when there is really no reason it shouldn't be able to:
tensora 'A(i,j) = B(i,j) + C(i,j)' -f A:dd -f B:ds -f C:d1s0
We may need to change the compiler to generate scratch space, which is then fused with the output.
A naive implementation could loop over Tensor.items() and assign the entries to an array of zeros. There cannot be duplicate items.
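A sketch of that naive conversion, assuming Tensor.items() yields (coordinate, value) pairs with no duplicate coordinates and rank of at least one (pure-Python nested lists stand in for the real array type):

```python
def to_dense(items, dimensions):
    """Build a nested-list dense array of zeros, then fill in the entries."""
    def zeros(dims):
        if not dims:
            return 0.0
        return [zeros(dims[1:]) for _ in range(dims[0])]

    dense = zeros(list(dimensions))
    for coordinate, value in items:
        # Walk down to the innermost list, then assign the value
        target = dense
        for index in coordinate[:-1]:
            target = target[index]
        target[coordinate[-1]] = value
    return dense
```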
Right now, in a given iteration, if the input is sparse while the output is dense, the iteration loop is dense and the loop writes each zero in between the sparse values. This is particularly inefficient when doing compute on an already assembled output. It also complicates the code, which always has to write to the output even when the input is sparse.
When the output is dense and the input is sparse, Tensora should preallocate the output and the iteration should be sparse. Care needs to be taken when subsequent layers are sparse. This might be less efficient when doing evaluate, but perhaps not, depending on how optimized the processor is at writing blocks of zeros to memory. This preallocation only needs to be done during assembly, not during compute.
CFFI CData objects are not picklable. In order to pickle a Tensor object, it will have to be converted to something that can be pickled, probably by putting dimensions, modes, indices, and vals into a dictionary as the pickle state.
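A hypothetical sketch of that pickle state, assuming the CFFI-backed attributes can each be converted to plain Python lists (the simplified Tensor here is a stand-in; the real class wraps CFFI buffers and would rebuild them in __setstate__):

```python
import pickle

class Tensor:
    def __init__(self, dimensions, modes, indices, vals):
        self.dimensions = dimensions
        self.modes = modes
        self.indices = indices
        self.vals = vals

    def __getstate__(self):
        # Convert any CData-backed buffers to plain lists, since CFFI
        # CData objects themselves cannot be pickled.
        return {
            "dimensions": list(self.dimensions),
            "modes": list(self.modes),
            "indices": [[list(array) for array in level] for level in self.indices],
            "vals": list(self.vals),
        }

    def __setstate__(self, state):
        # A real implementation would reallocate the CFFI buffers here.
        self.__init__(**state)
```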
We should validate the inputs to the leaves of tensor expressions so that various errors occur earlier rather than later:
- Integer is non-negative. Alternatively, we could modify the parser to allow negative numbers.
- Float is non-negative and finite. Alternatively, we could modify the parser to allow inf, nan, and negative numbers.
- Tensor names follow this regex: [A-Za-z][A-Za-z0-9]*
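A sketch of those three checks (the leaf names Integer, Float, and Tensor mirror the list above; the regex is the one quoted there):

```python
import math
import re

TENSOR_NAME = re.compile(r"[A-Za-z][A-Za-z0-9]*\Z")

def validate_integer(value: int) -> None:
    if value < 0:
        raise ValueError(f"Integer literal must be non-negative: {value}")

def validate_float(value: float) -> None:
    # nan fails isfinite; -inf fails both checks
    if not math.isfinite(value) or value < 0:
        raise ValueError(f"Float literal must be non-negative and finite: {value}")

def validate_tensor_name(name: str) -> None:
    if TENSOR_NAME.match(name) is None:
        raise ValueError(f"Invalid tensor name: {name!r}")
```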
Fuzz testing found that a(i) = b(i) * (c(i) + 1) crashes. It appears we do not handle the case where addition happens inside multiplication.
Scalars in tensor expressions should be interpreted as floats. The Tensora compiler could generate code that accepts doubles, but TACO will not. We could shim the TACO code so that evaluate_taco still accepts floats, but that sounds like a source of ambiguity.
Here are the steps needed to implement this:
- Make generate_c_code_tensora generate C code that takes doubles.
- Make generate_c_code_taco fail on any scalar format.
- Make PureTensorMethod.__call__ handle inputs and outputs that are scalar arguments. It should allow int and float as inputs and should return a Python float as an output.
It is legal to have an index in the target that is not mentioned in the right-hand side of the assignment, for example A(i,j) = b(i). This is interpreted as a broadcast along that dimension. Unfortunately, there is no way to determine the size of this dimension when using evaluate or TensorMethod. The sizes are normally determined from the sizes of the inputs, which is not possible when broadcasting to an output dimension because the output tensor is not an input to those porcelain functions. These should error in TensorMethod, while continuing to be valid kernels to generate.