
TF KB-NUFFT

GitHub | Build Status

Simple installation from pypi:

pip install tfkbnufft

About

This package is a very early-stage and modest TensorFlow adaptation of the torchkbnufft package, written by Matthew Muckley for PyTorch. Please cite his work appropriately if you use this package.

Computation speed

The computation speeds are given in seconds, for a 256x256 image with a spokelength of 512 and 405 spokes. These numbers are not to be directly compared to those of torchkbnufft, since the computation is not the same. They are just to give a sense of the time required for computation.

Operation CPU GPU
Forward NUFFT 0.1676 0.0626
Adjoint NUFFT 0.7005 0.0635

To obtain these numbers for your machine, run the following commands, after installing this package:

pip install scikit-image Pillow
python profile_tfkbnufft.py

These numbers were obtained with a Quadro P5000.

Gradients

w.r.t. trajectory

This is currently experimental and a work in progress, so please be cautious. It is tested in CI against results from the NDFT, but the mathematical backing for some aspects of applying the chain rule is still being worked out.
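To illustrate what such an NDFT-based check looks like, here is a minimal NumPy sketch (1D for brevity, all names illustrative and not part of the package): an explicit non-uniform DFT matrix gives exact reference outputs, and finite differences on the trajectory give a cheap reference for analytic trajectory gradients.

```python
import numpy as np

def ndft_forward(image, ktraj):
    """Naive non-uniform DFT (1D): y_j = sum_n x_n exp(-i k_j n).

    Exact but O(M * N); useful as ground truth when checking NUFFT
    outputs and gradients w.r.t. the trajectory.
    """
    n = np.arange(image.shape[0]) - image.shape[0] // 2  # centered grid
    return np.exp(-1j * np.outer(ktraj, n)) @ image

# Finite differences on one trajectory point: a cheap reference value
# for the analytic gradient of a scalar loss w.r.t. that point.
rng = np.random.default_rng(0)
x = rng.standard_normal(16) + 1j * rng.standard_normal(16)
k = np.linspace(-np.pi, np.pi, 32, endpoint=False)
loss = lambda traj: float(np.sum(np.abs(ndft_forward(x, traj)) ** 2))
eps = 1e-6
k_pert = k.copy()
k_pert[3] += eps
fd_grad = (loss(k_pert) - loss(k)) / eps  # compare to autodiff gradient
```

A delta image at the grid center should map to an all-ones k-space signal, which makes a convenient first sanity check.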

References

  1. Fessler, J. A., & Sutton, B. P. (2003). Nonuniform fast Fourier transforms using min-max interpolation. IEEE Transactions on Signal Processing, 51(2), 560-574.

  2. Beatty, P. J., Nishimura, D. G., & Pauly, J. M. (2005). Rapid gridding reconstruction with a minimal oversampling ratio. IEEE Transactions on Medical Imaging, 24(6), 799-808.

  3. Feichtinger, H. G., Gröchenig, K., & Strohmer, T. (1995). Efficient numerical methods in non-uniform sampling theory. Numerische Mathematik, 69(4), 423-440.

Citation

If you want to cite the package, you can use any of the following:

@conference{muckley:20:tah,
  author = {M. J. Muckley and R. Stern and T. Murrell and F. Knoll},
  title = {{TorchKbNufft}: A High-Level, Hardware-Agnostic Non-Uniform Fast Fourier Transform},
  booktitle = {ISMRM Workshop on Data Sampling \& Image Reconstruction},
  year = 2020
}

@misc{Muckley2019,
  author = {Muckley, M.J. et al.},
  title = {Torch KB-NUFFT},
  year = {2019},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/mmuckley/torchkbnufft}}
}

tfkbnufft's People

Contributors

chaithyagr, mmuckley, zaccharieramzi


tfkbnufft's Issues

Enhancing CPU performance with tf map

Currently, the FFT from Tensorflow suffers from slowness on CPU.

However, this trick may help in reducing that.

It is needed, for example when running the NUFFT as part of an input pipeline (therefore on CPU).

Basic KB-NUFFT example error

Hello, thank you so much for your code.
I have an issue when I try to run the example in the notebook.
Even though it seems to run well in the printed execution output, when I run it myself I face the following error:

[screenshot of the error omitted]

Thank you.

tfkbnufft not working in graph mode for tf 2.2

With tf 2.2 in tests I get the following error:

E         ValueError: in user code:
E         
E             /home/zaccharie/workspace/tfkbnufft/tfkbnufft/nufft/interp_functions.py:211 kbinterp  *
E                 for b in tf.range(tf.shape(x)[0]):
E             /home/zaccharie/workspace/fastmri-reproducible-benchmark/venv/lib/python3.6/site-packages/tensorflow/python/autograph/operators/control_flow.py:344 for_stmt
E                 iter_, extra_test, body, get_state, set_state, symbol_names, opts)
E             /home/zaccharie/workspace/fastmri-reproducible-benchmark/venv/lib/python3.6/site-packages/tensorflow/python/autograph/operators/control_flow.py:532 _tf_range_for_stmt
E                 opts)
E             /home/zaccharie/workspace/fastmri-reproducible-benchmark/venv/lib/python3.6/site-packages/tensorflow/python/autograph/operators/control_flow.py:862 _tf_while_stmt
E                 _verify_loop_init_vars(init_vars, symbol_names)
E             /home/zaccharie/workspace/fastmri-reproducible-benchmark/venv/lib/python3.6/site-packages/tensorflow/python/autograph/operators/control_flow.py:111 _verify_loop_init_vars
E                 raise ValueError('"{}" may not be None before the loop.'.format(name))
E         
E             ValueError: "params[dims]" may not be None before the loop.

This doesn't happen in tf 2.1 and doesn't happen when eager execution for tf.functions is turned on.

Gradients of ops are causing a weird tf issue

The error I get when trying to obtain gradients from the tfkbnufft ops is the following, and it only happens on GPU:

2020-06-17 17:37:06.446898: W tensorflow/core/grappler/utils/graph_view.cc:832] No registered 'BroadcastTo' OpKernel for GPU devices compatible with node {{node add_2}}
	 (OpKernel was found, but attributes didn't match) Requested Attributes: T=DT_INT64, Tidx=DT_INT32, _device="/job:localhost/replica:0/task:0/device:GPU:0"
	.  Registered:  device='XLA_GPU'; Tidx in [DT_INT32, DT_INT64]; T in [DT_FLOAT, DT_DOUBLE, DT_INT32, DT_UINT8, DT_INT16, ..., DT_UINT16, DT_COMPLEX128, DT_HALF, DT_UINT32, DT_UINT64]
  device='XLA_CPU'; Tidx in [DT_INT32, DT_INT64]; T in [DT_FLOAT, DT_DOUBLE, DT_INT32, DT_UINT8, DT_INT16, ..., DT_UINT16, DT_COMPLEX128, DT_HALF, DT_UINT32, DT_UINT64]
  device='XLA_CPU_JIT'; Tidx in [DT_INT32, DT_INT64]; T in [DT_FLOAT, DT_DOUBLE, DT_INT32, DT_UINT8, DT_INT16, ..., DT_UINT16, DT_COMPLEX128, DT_HALF, DT_UINT32, DT_UINT64]
  device='XLA_GPU_JIT'; Tidx in [DT_INT32, DT_INT64]; T in [DT_FLOAT, DT_DOUBLE, DT_INT32, DT_UINT8, DT_INT16, ..., DT_UINT16, DT_COMPLEX128, DT_HALF, DT_UINT32, DT_UINT64]
  device='CPU'; T in [DT_VARIANT]
  device='CPU'; T in [DT_RESOURCE]
  device='CPU'; T in [DT_STRING]
  device='CPU'; T in [DT_BOOL]
  device='CPU'; T in [DT_COMPLEX128]
  device='CPU'; T in [DT_COMPLEX64]
  device='CPU'; T in [DT_DOUBLE]
  device='CPU'; T in [DT_FLOAT]
  device='CPU'; T in [DT_BFLOAT16]
  device='CPU'; T in [DT_HALF]
  device='CPU'; T in [DT_INT8]
  device='CPU'; T in [DT_UINT8]
  device='CPU'; T in [DT_INT16]
  device='CPU'; T in [DT_UINT16]
  device='CPU'; T in [DT_INT32]
  device='CPU'; T in [DT_INT64]
  device='GPU'; T in [DT_INT32]
  device='GPU'; T in [DT_COMPLEX128]
  device='GPU'; T in [DT_COMPLEX64]
  device='GPU'; T in [DT_BOOL]
  device='GPU'; T in [DT_DOUBLE]
  device='GPU'; T in [DT_FLOAT]
  device='GPU'; T in [DT_HALF]

Benchmark speed

A benchmark of the speed on CPU and on GPU would be welcome to see if the performance matches that of torchkbnufft.

Some performance loss on CPU might be explainable by this fft issue in tensorflow.

Density compensation

We might want to provide a way to perform density compensation.
I would also like the density compensation component to be potentially learnable in some sense.

Probably the best place for this component is right after the gridding, since that avoids interpolating and gathering the correct coefficients.

We might also want a parameter setting the precision of the density compensation.
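For reference, a minimal NumPy sketch of a Pipe & Menon-style fixed point for density compensation weights, with an explicit NDFT matrix standing in for the forward/adjoint NUFFT pair (everything here is illustrative, not part of the package):

```python
import numpy as np

# Pipe & Menon-style fixed point:  w <- w / |E E^H w|
# At convergence, w approximates the inverse local sampling density.
rng = np.random.default_rng(0)
N, M = 32, 256
n = np.arange(N) - N // 2
# A 1D stand-in for radial sampling: heavily oversampled near k = 0.
k = np.pi * np.sign(rng.uniform(-1, 1, M)) * rng.uniform(0, 1, M) ** 2
E = np.exp(-1j * np.outer(k, n)) / np.sqrt(N)  # explicit NDFT matrix

w = np.ones(M)
for _ in range(30):
    w = w / np.abs(E @ (E.conj().T @ w))

# The density-compensated adjoint is then E^H (w * y) for k-space data y.
```

In the package the explicit matrix products would be the forward and adjoint NUFFT calls; a precision parameter would control the number of fixed-point iterations.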

Correct tensor shapes in the documentation

Right now the tensor shapes in the documentation are still the old ones from PyTorch, featuring the real/imaginary dimension, which is no longer needed in TensorFlow.

These should be updated to avoid confusion.

scaling/weird results

I have been doing some further tests, and I notice the scaling may be off, or (more likely) I am implementing something wrong. I took radial k-space data, transformed it to image space, then transformed it back to k-space. The maximum of the absolute value of the k-space estimate is two orders of magnitude off compared to my original k-space data. I am also having trouble with simple gradient descent experiments, where I fail to get convergence. This is with norm set to ortho; results are worse with norm turned off. Any idea what might be going wrong?
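For context, a toy NumPy sketch (independent of tfkbnufft) showing that even with 'ortho'-style scaling, a non-uniform forward/adjoint round trip without density compensation is far from the identity, so large amplitude mismatches are expected rather than necessarily a bug:

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 64, 512  # grid size, number of non-uniform samples
x = rng.standard_normal(N) + 1j * rng.standard_normal(N)
n = np.arange(N) - N // 2

# Radial-like 1D sampling, heavily oversampled near the k-space center.
k = np.pi * np.sign(rng.uniform(-1, 1, M)) * rng.uniform(0, 1, M) ** 3
E = np.exp(-1j * np.outer(k, n)) / np.sqrt(N)  # 'ortho'-style 1/sqrt(N)

# adjoint(forward(x)) is far from x: E^H E is not the identity for
# non-uniform sampling, so amplitudes can be off by large factors.
roundtrip = E.conj().T @ (E @ x)
scale = np.linalg.norm(roundtrip) / np.linalg.norm(x)
```

Density compensation (or solving the inverse problem iteratively) is what brings the round trip back to the right amplitude range.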

Update publish action

Last release gave the following warning:

Node.js 12 actions are deprecated. Please update the following actions to use Node.js 16: actions/checkout@v2, actions/setup-python@v2. For more information see: https://github.blog/changelog/2022-09-22-github-actions-all-actions-will-begin-running-on-node16-instead-of-node12/.

Handle multiple batches in a more generic way

Currently the way we handle multiple batches is to send repeated trajectories as input.

This is memory-consuming and might also degrade performance, since deep inside we use tf.map_fn, which runs in O(elem.shape[0]). So effectively, by sending repeated trajectories, we are not in a great place. (On the contrary, using parallel_iterations will help here, though probably not by a lot; I still need to run some simple benchmark tests.)

However, we still need support for multiple trajectories for distributed training and for training in a reconstruction pipeline that runs on multiple trajectories.

We can still come up with a generic solution that handles batches better even if we don't send repeated trajectories. That way, we can carry out the computations more easily in a vectorized fashion!

Tasks :

  • I will add a colab notebook here to understand whether there are any issues and what the impact is.
  • If the impact is strong, we can work across the coil and batch dimensions internally as long as the trajectory is the same.
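To illustrate the looped-versus-vectorized trade-off, a small NumPy sketch with explicit per-batch NDFT matrices (illustrative only; inside the package the loop corresponds to tf.map_fn over the batch dimension):

```python
import numpy as np

rng = np.random.default_rng(0)
B, N, M = 4, 32, 64
images = rng.standard_normal((B, N)) + 1j * rng.standard_normal((B, N))
ktraj = rng.uniform(-np.pi, np.pi, (B, M))  # one trajectory per batch element
n = np.arange(N) - N // 2

# Per-batch NDFT matrices exp(-i k n), shape (B, M, N).
A = np.exp(-1j * ktraj[..., None] * n)

# A map over the batch dimension (what tf.map_fn effectively does)...
looped = np.stack([A[b] @ images[b] for b in range(B)])

# ...versus a single vectorized contraction over the whole batch.
vectorized = np.einsum('bmn,bn->bm', A, images)
```

When the trajectory is shared across the batch (or across coils), the matrix A collapses to a single (M, N) operator and the contraction becomes one matrix product, which is where most of the savings would come from.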

graph mode error when using kbinterp

When using kbinterp for example in a dataset pipeline, we get the following error:

tensorflow.python.framework.errors_impl.OperatorNotAllowedInGraphError: iterating over `tf.Tensor` is not allowed: AutoGraph is disabled in this function. Try decorating it directly with @tf.function.

This is because of for b in tf.range(tf.shape(x)[0]) or for b in tf.range(tf.shape(y)[0]) in kbinterp and adjkbinterp.

Probably using tf.while_loop will solve this.

Explicit examples

The examples are right now a bit hard to find.
We should have 2 links in the readme:

  • to the notebooks for eager-mode usage examples
  • to fastmri-reproducible-benchmark for in-model usage

Add better testing for decompensator estimation

Currently, the density compensation estimator is only checked for a successful run, not for any semantics.
Ideally we could generate a VDS trajectory and ensure that the density-compensated adjoint is close to the gridding solution (at least up to scaling).

I will try to get at this in free time.

Allow dynamically shaped input

To handle fastMRI data, tfkbnufft needs to be able to handle dynamically shaped input.

Currently, this is not doable because you have to pass in the image shape to KbNufftModule.

What we need is to determine what is affected by having the shape available only at inference time.
Of course, there should still be a way to specify the shape beforehand for a computation speedup.

function retracing in tensorflow 2.3 with forward nufft

First, thank you so much for this implementation. I have been getting the following warning with the forward nufft in tf 2.3:

WARNING:tensorflow:11 out of the last 11 calls to <tensorflow.python.ops.custom_gradient.Bind object at 0x7fb3044ab450> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings could be due to (1) creating @tf.function repeatedly in a loop, (2) passing tensors with different shapes, (3) passing Python objects instead of tensors. For (1), please define your @tf.function outside of the loop. For (2), @tf.function has experimental_relax_shapes=True option that relaxes argument shapes that can avoid unnecessary retracing. For (3), please refer to https://www.tensorflow.org/tutorials/customization/performance#python_or_tensor_args and https://www.tensorflow.org/api_docs/python/tf/function for  more details.

Have you been getting this warning when you run this? I am happy to try and debug it myself, but I need to figure out where in the code the function call is happening.

Implement Toeplitz operation

Implementing the Toeplitz version of the forward-adjoint compound operation would enable lightning-fast per-trajectory density compensation that can be integrated in a model without further computational cost.
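A minimal 1D NumPy sketch of the idea, with an explicit NDFT matrix standing in for the NUFFT operator (all names illustrative): E^H E is Toeplitz, so it can be embedded in a circulant of twice the size and applied with two FFTs, with no NUFFT call at apply time.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 32, 128
n = np.arange(N) - N // 2
k = rng.uniform(-np.pi, np.pi, M)
E = np.exp(-1j * np.outer(k, n)) / np.sqrt(N)  # explicit NDFT stand-in

# E^H E is Toeplitz: entry (p, q) depends only on the lag p - q, with
#   T(d) = (1/N) * sum_j exp(i k_j d).
lags = np.arange(-(N - 1), N)
T = np.exp(1j * np.outer(k, lags)).sum(axis=0) / N

# Embed T into a circulant of size 2N so it can be applied with FFTs.
c = np.zeros(2 * N, dtype=complex)
c[:N] = T[N - 1:]      # lags 0 .. N-1
c[N + 1:] = T[:N - 1]  # lags -(N-1) .. -1
kernel_f = np.fft.fft(c)  # precomputed once per trajectory

x = rng.standard_normal(N) + 1j * rng.standard_normal(N)
fast = np.fft.ifft(kernel_f * np.fft.fft(x, 2 * N))[:N]  # two FFTs
direct = E.conj().T @ (E @ x)                            # explicit E^H E x
```

The kernel precomputation is the only per-trajectory cost; each subsequent application of the compound operation is just zero-padded FFT, pointwise multiply, inverse FFT, crop.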

Change CI status in Readme

Currently the CI status badge still points to Travis, which we no longer use since it stopped being free.
It should point to GitHub Actions.
