drhagen / tensora
Sparse/dense tensor library for Python
Home Page: http://tensora.drhagen.com/
License: MIT License
Right now, the compiled binaries are generated by writing the C code to disk, invoking a C compiler on the .c file, and loading the resulting shared object with CFFI. We could avoid the disk entirely by using LLVM to create the binaries in memory. The place to start would be llvmlite, which provides Python bindings to LLVM.
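As a hedged sketch of the llvmlite route (the function compiled here is a trivial stand-in, not a Tensora kernel), the in-memory flow would look roughly like this: build LLVM IR, JIT-compile it with MCJIT, and call the resulting address through ctypes, with no .c file or shared object touching the disk:

```python
import ctypes

from llvmlite import binding, ir

# One-time LLVM setup
binding.initialize()
binding.initialize_native_target()
binding.initialize_native_asmprinter()

# Build IR for a stand-in kernel: double add(double a, double b)
module = ir.Module(name="demo")
fnty = ir.FunctionType(ir.DoubleType(), [ir.DoubleType(), ir.DoubleType()])
fn = ir.Function(module, fnty, name="add")
builder = ir.IRBuilder(fn.append_basic_block("entry"))
builder.ret(builder.fadd(fn.args[0], fn.args[1]))

# JIT-compile in memory instead of invoking a C compiler on disk
target_machine = binding.Target.from_default_triple().create_target_machine()
engine = binding.create_mcjit_compiler(
    binding.parse_assembly(str(module)), target_machine
)
engine.finalize_object()

# Call the compiled function through ctypes
add = ctypes.CFUNCTYPE(ctypes.c_double, ctypes.c_double, ctypes.c_double)(
    engine.get_function_address("add")
)
result = add(2.0, 3.0)
```

Tensora's generated kernels would need their C emitted as IR (or lowered via clang's APIs) rather than as source text, which is the bulk of the work here.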
The output_format parameter should be optional in evaluate and default to all dense. This is a common enough choice, and we default to all dense in other places.
With #33 closed, full scalar support is deferred indefinitely. The existing code that partially supports scalars should be removed for now.
Currently, all the logic for checking that the formats match the variables in the assignment is done in tensora.function.PureTensorMethod. This code is not run by the CLI. This logic should be centralized into a class tensora.desugar.Problem that encapsulates the desugared assignment and the validated format dictionary. Then this logic should be used upstream of code generation for both evaluate and the CLI.
Tensora's use of CFFI does not work on Windows. The following issues need to be fixed:
- FFI.dlopen(None) is used to grab a generic C library, but there is no generic C library on Windows. There are several possible solutions.
- free is treated as just a generic function, but free is not the same function in all Windows libraries. This could be made a Unix-specific feature because it is only used by take_ownership_of_tensor, which is not used internally by Tensora.
Right now, we simply fall off the end of the solution generator with a StopIteration. This should be caught so that the user gets a nice error message when no consistent iteration order can be found.
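A sketch of the fix, assuming a generator of candidate iteration orders (the names here are illustrative, not Tensora's actual API): convert the exhausted generator into a descriptive, user-facing exception instead of letting StopIteration escape.

```python
class NoIterationOrderError(Exception):
    """Raised when no consistent iteration order can be found."""

def first_iteration_order(candidate_orders):
    # Instead of calling next() bare and leaking StopIteration to the
    # caller, catch it and raise a descriptive error.
    iterator = iter(candidate_orders)
    try:
        return next(iterator)
    except StopIteration:
        raise NoIterationOrderError(
            "No consistent iteration order could be found for this "
            "combination of tensor formats."
        ) from None
```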
Right now, loops inside a Sum node are supposed to be fused if they start with the same index. However, this only happens if they are adjacent. If a loop with a different index is placed between two fusible loops in the Sum, they are not fused. This is both inefficient and causes a crash (because the variables are redeclared).
Fused:
tensora 'a(i) = b(i) + c(i) + d(j,i)'
Not fused:
tensora 'a(i) = b(i) + c(j,i) + d(i)'
Right now, a point-like bucket looks like this:
double* restrict bucket_y_0 = y_vals + p_y_0_0;
int32_t i_bucket_y_0 = 0;
while (i_bucket_y_0 < 1) {
bucket_y_0[i_bucket_y_0] = 0;
i_bucket_y_0++;
}
// *** Computation of expression ***
bucket_y_0[0] += A_vals[p_A_1_1] * x_vals[p_x_2_0];
This is not wrong, and it is probably all optimized down by LLVM, but it would be fairly simple to emit this instead, which looks a lot nicer:
double* restrict bucket_y_0 = y_vals + p_y_0_0;
*bucket_y_0 = 0;
// *** Computation of expression ***
*bucket_y_0 += A_vals[p_A_1_1] * x_vals[p_x_2_0];
Right now, assembly does the exact same thing as evaluate. It should not emit any of the computation.
There is already the taco_structure_to_cffi function in compile that takes ownership of the data in indices and vals. If a taco_tensor_t is created in some C code outside Tensora, it would be nice to be able to take ownership of all the data in the structure. This will involve taking ownership of the struct pointer itself, then taking ownership of the dimensions, mode_ordering, and mode_types pointers, and finally calling taco_structure_to_cffi to take ownership of the rest.
The following problem results in an empty assemble function:
tensora 'A(i,j) = B(i,j) + C(i,k) * D(k,j)' -f A:ds -f B:ds -f C:ds -f D:ds -t assemble
This operator implies in-place mutation, which Tensora does not support.
This generates a valid kernel:
tensora 'A(i,m,j) = B(i,k)*C(m,j,k)' -f A1:dss -f S1:ds -f D1:sss
while this change to the mode ordering does not:
tensora 'A(i,m,j) = B(i,k)*C(k,m,j)' -f A1:dss -f S1:ds -f D1:s2s0s1
These kernels should be identical, so the handling of mode ordering is wrong somewhere.
Tensora cannot find a solution for the following problem, when there is really no reason it shouldn't be able to:
tensora 'A(i,j) = B(i,j) + C(i,j)' -f A:dd -f B:ds -f C:d1s0
We may need to change the compiler to generate scratch space, which is then fused with the output.
A naive implementation could loop over Tensor.items() and assign the entries to an array of zeros. There cannot be duplicate items.
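A sketch of that naive conversion, assuming Tensor.items() yields (coordinate, value) pairs with no duplicate coordinates and rank of at least one (pure-Python nested lists stand in for the real array type):

```python
def to_dense(items, dimensions):
    """Build a nested-list dense array of zeros, then fill in the entries."""
    def zeros(dims):
        if not dims:
            return 0.0
        return [zeros(dims[1:]) for _ in range(dims[0])]

    dense = zeros(list(dimensions))
    for coordinate, value in items:
        # Walk down to the innermost list, then assign the value
        target = dense
        for index in coordinate[:-1]:
            target = target[index]
        target[coordinate[-1]] = value
    return dense
```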
Right now, in a given iteration, if the input is sparse while the output is dense, the iteration loop is dense and the loop writes each zero in between the sparse values. This is particularly inefficient when doing compute on an already assembled output. It also complicates the code, which always has to write to the output even when the input is sparse.
When the output is dense and the input is sparse, Tensora should preallocate the output and the iteration should be sparse. Care needs to be taken when subsequent layers are sparse. This might be less efficient when doing evaluate, but perhaps not, depending on how optimized the processor is at writing blocks of zeros to memory. This preallocation only needs to be done during assembly, not during compute.
CFFI CData objects are not picklable. In order to pickle a Tensor object, it will have to be converted to something that can be pickled, probably by putting dimensions, modes, indices, and vals into a dictionary as the pickle state.
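A hypothetical sketch of that pickle state, assuming the CFFI-backed attributes can each be converted to plain Python lists (the simplified Tensor here is a stand-in; the real class wraps CFFI buffers and would rebuild them in __setstate__):

```python
import pickle

class Tensor:
    def __init__(self, dimensions, modes, indices, vals):
        self.dimensions = dimensions
        self.modes = modes
        self.indices = indices
        self.vals = vals

    def __getstate__(self):
        # Convert any CData-backed buffers to plain lists, since CFFI
        # CData objects themselves cannot be pickled.
        return {
            "dimensions": list(self.dimensions),
            "modes": list(self.modes),
            "indices": [[list(array) for array in level] for level in self.indices],
            "vals": list(self.vals),
        }

    def __setstate__(self, state):
        # A real implementation would reallocate the CFFI buffers here.
        self.__init__(**state)
```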
We should validate the inputs to the leaves of tensor expressions so that various errors occur earlier rather than later:
- Integer is non-negative. Alternatively, we could modify the parser to allow negative numbers.
- Float is non-negative and finite. Alternatively, we could modify the parser to allow inf, nan, and negative numbers.
- Tensor names follow this regex: [A-Za-z][A-Za-z0-9]*
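A sketch of those three checks (the leaf names Integer, Float, and Tensor mirror the list above; the regex is the one quoted there):

```python
import math
import re

TENSOR_NAME = re.compile(r"[A-Za-z][A-Za-z0-9]*\Z")

def validate_integer(value: int) -> None:
    if value < 0:
        raise ValueError(f"Integer literal must be non-negative: {value}")

def validate_float(value: float) -> None:
    # nan fails isfinite; -inf fails both checks
    if not math.isfinite(value) or value < 0:
        raise ValueError(f"Float literal must be non-negative and finite: {value}")

def validate_tensor_name(name: str) -> None:
    if TENSOR_NAME.match(name) is None:
        raise ValueError(f"Invalid tensor name: {name!r}")
```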
Fuzz testing found that a(i) = b(i) * (c(i) + 1) crashes. It appears we do not handle the case where addition happens inside multiplication.
Scalars in tensor expressions should be interpreted as floats. The Tensora compiler could generate code that accepts doubles, but TACO will not. We could shim the TACO code so that evaluate_taco still accepts floats, but that sounds like a source of ambiguity.
Here are the steps needed to implement this:
- Make generate_c_code_tensora generate C code that takes doubles.
- Make generate_c_code_taco fail on any scalar format.
- Make PureTensorMethod.__call__ handle inputs and outputs that are scalar arguments. It should allow int and float as inputs and should return a Python float as an output.
It is legal to have an index in the target that is not mentioned in the right-hand side of the assignment, for example A(i,j) = b(i). This is interpreted as a broadcast along that dimension. Unfortunately, there is no way to determine the size of this dimension when using evaluate or TensorMethod. The sizes are normally determined from the sizes of the inputs, which is not possible when broadcasting to an output dimension because the output tensor is not an input to those porcelain functions. These should error in TensorMethod, while continuing to be valid kernels to generate.