Slow execution of the generator, about flaport/inverse_design

Comments (37)

flaport commented on August 26, 2024 3

I went a little bit crazy today and re-wrote the generator in rust (using arrayfire). It should be a drop-in replacement (after compilation). It's about 10 times faster for a 30x30 grid but gets progressively faster (relatively speaking) the larger the grid becomes.

Code-wise it's still highly unoptimized, I think (I basically just translated my crappy Python code), but it's pretty cool to see an actual substantial performance increase. Feel free to check it out... :)

from inverse_design.

lucasgrjn commented on August 26, 2024 2

Hi!

I took a look at the inverse_design repo this summer after @flaport shared it. Unfortunately, I was very busy with some nanofabrication.
Now, I am more available and determined to make it work. So @Jan-David-Black I hope to be able to help you.

I just finished setting up my env to have Jax-GPU enabled on my desktop. If you want to do some testing, I can already do it. I may also have an idea to speed up the convolutions, I'll do some tests tomorrow.

Regards,
Lucas :)

jan-david-fischbach commented on August 26, 2024 2

> one way to enforce symmetry is to add your transformed latent matrix with a symmetric version of itself.

I would assume that adding it to its transpose then leads to a symmetry along a diagonal?
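
The assumption about the transpose can be checked numerically. The following is only a minimal sketch: the array `latent_t` is a hypothetical stand-in for the transformed latent matrix, and it has to be square for the transpose to have the same shape.

```python
import numpy as np

# Hypothetical square latent array standing in for the transformed latent matrix.
rng = np.random.default_rng(0)
latent_t = rng.normal(size=(4, 4))

# Add the array to its transpose.
sym = latent_t + latent_t.T

# The result is symmetric about the main diagonal: sym[i, j] == sym[j, i].
is_diagonal_symmetric = np.allclose(sym, sym.T)
print(is_diagonal_symmetric)
```

Transposing the sum only swaps its two terms, which is why the result is always symmetric about the main diagonal.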

flaport commented on August 26, 2024 2

I was able to improve the rust algorithm performance even more:

It's nice to see a linear behavior between # pixels and computation time :)

flaport commented on August 26, 2024 1

Yeah, for me it took about 10 seconds, but let's be fair... that's still too long.

That's why today I chose to completely rewrite the generator from scratch (still in Rust, but without arrayfire). The new version is 'blazingly fast': 128x128 takes about 200 ms.

It's a new implementation which only works in the vicinity of the brush and does not use any convolutions, and it therefore scales a lot better with the size of the grid.

Moreover, I no longer use arrayfire and hence it should be easier to compile for you :)

The new version was force-pushed to master (sorry if you pulled already!); the old version is still in the rust-v1 branch.
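
To illustrate why working "in the vicinity of the brush" scales better than a global convolution, here is a minimal NumPy sketch of a local dilation. It is not the Rust implementation; the function name `stamp_brush` and the clipping behaviour at the grid border are assumptions made for illustration.

```python
import numpy as np

def stamp_brush(pixels, brush, i, j):
    """OR the brush into `pixels`, centered at (i, j), clipped at the borders.

    Only a brush-sized window of the grid is touched, so the cost depends on
    the brush area and not on the grid area.
    """
    m, n = pixels.shape
    bm, bn = brush.shape
    ci, cj = bm // 2, bn // 2
    # Window of the grid covered by the brush, clipped to the grid.
    i0, i1 = max(i - ci, 0), min(i - ci + bm, m)
    j0, j1 = max(j - cj, 0), min(j - cj + bn, n)
    # Matching window inside the brush itself.
    bi0, bj0 = i0 - (i - ci), j0 - (j - cj)
    pixels[i0:i1, j0:j1] |= brush[bi0:bi0 + (i1 - i0), bj0:bj0 + (j1 - j0)]
    return pixels

brush = np.ones((3, 3), dtype=bool)
pixels = np.zeros((128, 128), dtype=bool)
stamp_brush(pixels, brush, 0, 5)  # a touch on the top border of the grid
```

A global dilation would visit all 128x128 pixels for every touch, whereas this update only writes to the (at most) 3x3 window around the touch.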

flaport commented on August 26, 2024 1

Yeah, install rust and cargo first with a tool like https://rustup.rs. Then you can build the repo with a 'cargo build' when you're inside the rust folder. That build command creates the .so file needed for importing in Python too (it should appear somewhere in the 'target' folder created by cargo).

But once you've confirmed cargo works, you can just run 'make lib' in the root of the repo. That will move the .so file to a more convenient location.

That said, on Windows the file Rust builds might have a different extension, so we might have to make the make commands more cross-platform.
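
As a rough sketch of the cross-platform concern: assuming the Rust crate is built as a `cdylib` (an assumption, the thread does not show the Cargo configuration), cargo names the output differently per platform, and Python expects a `.pyd` extension module on Windows. The helper names and the crate name `my_generator` below are hypothetical.

```python
import sys

def cdylib_filename(crate_name, platform=sys.platform):
    """File name cargo gives a `cdylib` build output on the given platform."""
    if platform.startswith("linux"):
        return f"lib{crate_name}.so"
    if platform == "darwin":
        return f"lib{crate_name}.dylib"
    if platform == "win32":
        return f"{crate_name}.dll"
    raise ValueError(f"unhandled platform: {platform}")

def python_module_filename(module_name, platform=sys.platform):
    """File name Python can import as an extension module on the given platform."""
    suffix = ".pyd" if platform == "win32" else ".so"
    return module_name + suffix

# Hypothetical crate/module name, for illustration only.
for plat in ("linux", "darwin", "win32"):
    print(plat, cdylib_filename("my_generator", plat), "->",
          python_module_filename("my_generator", plat))
```

A copy step like 'make lib' would then rename the first file to the second instead of hard-coding `.so`.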

jan-david-fischbach commented on August 26, 2024 1

I did it exactly the same way 👍

flaport commented on August 26, 2024 1

Here is an initial performance graph

I don't like how the algorithm scales with brush size at all, but I'm sure there are tons of optimizations possible still. Not to mention that everything works on a single thread currently. Maybe there are parallelizations possible too.

jan-david-fischbach commented on August 26, 2024

For reference, I end up with 7 seconds for 128x128, when using a naive numpy and scipy.ndimage.convolve implementation

flaport commented on August 26, 2024

Hey @Jan-David-Black, that's pretty awesome! It's been a while since I worked on this so I don't fully remember the details, but I do remember that the generator took too long (although I don't think it took 15 min for me). I was gonna give it another shot some other day but clearly that didn't happen...

Would you mind if I replace my implementation by yours in this repo? If you agree, maybe you can add the notebook you linked as a PR for proper attribution?

Anyway, thanks for pointing to a better solution :)

jan-david-fischbach commented on August 26, 2024

@flaport I don't quite understand your code in 07_inverse_design.ipynb. You seem to define a forward function; however, it is not used in the optimization scheme. When I try to naively "plug it in" where it seems to belong, I run into issues with autograd and jax not playing nice.

flaport commented on August 26, 2024

Yep, this repository is one of my many unfinished projects 😅

jan-david-fischbach commented on August 26, 2024

I seem to have gotten it to work: https://jan-david-black.github.io/inverse_design_strict_fabrication/notebooks/inverse_design_local.html It is still quite rough around the edges: because the outer regions are just masked away, the fabrication constraints are violated in some parts around the design region border. I mostly struggled with differing versions of nbdev :/ The generator speed is still impacting the overall runtime quite heavily, especially as it scales worse than linearly with pixel count at the moment (most parts do scale linearly already, just some bits are missing).

flaport commented on August 26, 2024

Yeah, this repository uses nbdev<2. The more recent version is incompatible with the old one...

Thanks for getting it to work! I'll check it out soon :)

Are you using your generator or mine? Maybe if I ever find some time I might rewrite the generator in some faster language.

jan-david-fischbach commented on August 26, 2024

I am running my generator, which I have extended to use local dilations, wherever easily possible. I believe that Jax might be used to speed up the local dilations using GPU acceleration. I cannot test that at the moment, however, as I do not have a suitable GPU at hand.
I am still struggling a bit with finding resolving touches after free touches have been applied (I think it makes sense to just calculate it once after all free touches, but I have to implement finding the extent for the local dilation still, as it depends on the positions of all free touches.) Apart from that I still have a crappy implementation of the heatmap for selecting the next touch (currently involving a convolution in every step). Here is some timing analysis of the current implementation:

The lower graph is summarized by its sum as "linear_ops" in the upper one. Resolving in the upper graph stands for calculating the resolving touches after free touches. It coincides almost perfectly with the time needed for global dilations. A brush of 9-diameter is used.

jan-david-fischbach commented on August 26, 2024

I have published a small package here: https://pypi.org/project/javiche/ that provides a wrapper decorator to easily use ceviche autograd differentiable functions with jax. I would like to make it a dependency going forward, to beautify/clean up the interaction between jax and ceviche in the inverse design notebooks a bit. Would that be ok with you @flaport ?

flaport commented on August 26, 2024

Sure! Go ahead :)

jan-david-fischbach commented on August 26, 2024

Wow really cool!! Still need to try it out. As far as I can see you still use global dilations, correct? So we might speed it up further using local dilations?

jan-david-fischbach commented on August 26, 2024

I have been trying to get your Rust implementation to work on my M1 Mac. Somehow I can only get the arrayfire binaries to work from within an x86-compiled program. Therefore I have to run it under Rosetta, which I believe severely degrades the performance in this case: for an example 128x128 map the Rust generator takes approximately 77 s. Could you maybe report the time it takes on your system?

lucasgrjn commented on August 26, 2024

Awesome!

I have actually never used Rust. To give it a try, do I simply need to run a make build within the Rust part of the repo?

jan-david-fischbach commented on August 26, 2024

How about using maturin?

lucasgrjn commented on August 26, 2024

I took some time to really dig deeply into your code @flaport.

I just have a question about the Python part which can be a bottleneck:

inverse_design/inverse_design/design.py

Lines 97 to 109 in 5abf24d

    # Internal Cell
    @jax.jit
    def _find_free_touches(touches_mask, pixels_mask, brush):
        r = jnp.zeros_like(touches_mask, dtype=bool)
        m, n = r.shape
        i, j = jnp.arange(m), jnp.arange(n)
        I, J = [idxs.ravel() for idxs in jnp.meshgrid(i, j)]
        K = jnp.arange(m * n)
        R = jnp.broadcast_to(r[None, :, :], (m * n, m, n)).at[K, I, J].set(True)
        Rb = batch_conv2d(R, brush[None]) | pixels_mask
        free_idxs = (Rb == pixels_mask).all((1, 2))
        free_touches_mask = jnp.where(free_idxs[:, None, None], R, 0).sum(0, dtype=bool)
        return free_touches_mask ^ touches_mask

Why do you not use a bool mask instead of this function?
I am asking just in case, but since you were able to cut the time so drastically, I don't think it will be useful (except if you use a similar function in Rust!).
(I also took a look at that part. But... I am not really familiar with the language, so I prefer not to comment on it.)
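
For context on the bottleneck: as far as the quoted snippet shows, `R` has shape `(m * n, m, n)`, i.e. one full grid per candidate touch, which for a 128x128 grid is 128^4 (about 268 million) entries. A touch seems to count as free when stamping the brush there adds nothing outside `pixels_mask`. Under that reading, the same result can be sketched with plain boolean masks, as below. This is a NumPy sketch, not the repository's code: the function name, the brush orientation (which only matters for asymmetric brushes), and the choice to ignore brush pixels that fall outside the grid are all assumptions.

```python
import numpy as np

def find_free_touches(touches_mask, pixels_mask, brush):
    """Free touches: positions where the whole brush fits inside pixels_mask."""
    m, n = pixels_mask.shape
    bm, bn = brush.shape
    ci, cj = bm // 2, bn // 2
    # Pad with True so brush pixels outside the grid are ignored (assumption).
    padded = np.pad(pixels_mask.astype(bool),
                    ((ci, bm - 1 - ci), (cj, bn - 1 - cj)),
                    constant_values=True)
    free = np.ones((m, n), dtype=bool)
    # One shifted AND per brush pixel instead of one full grid per candidate.
    for di, dj in zip(*np.nonzero(brush)):
        free &= padded[di:di + m, dj:dj + n]
    # Mirror the last line of the original: drop touches that already exist.
    return free ^ touches_mask

pixels = np.zeros((5, 5), dtype=bool)
pixels[1:4, 1:4] = True                      # a 3x3 block of set pixels
brush = np.ones((3, 3), dtype=bool)
touches = np.zeros((5, 5), dtype=bool)
free = find_free_touches(touches, pixels, brush)
```

In this example the only position where the 3x3 brush fits inside the 3x3 block is its center, so `free` is set only at `(2, 2)`. The memory use is one grid-sized mask, and the work is proportional to grid size times brush size.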

flaport commented on August 26, 2024

> How about using maturin?

Cool! I will have to look into this :)

> Why do you not use a bool mask instead of this function?

Yeah... that function is horrible. I even forgot what I was doing there...

I think (apart from that function) the Python code is OK if you want to understand what's going on, but performance-wise the implementation is pretty horrible. Too many global convolutions (dilations) and re-calculations.

The new implementation is much better... it does not re-compute anything, and when it needs to dilate anything it stays close to the brush touch (I think that's what @Jan-David-Black meant with local dilations?).

That said, I understand it's a lot more difficult to understand what's going on in an unfamiliar language, but Rust is pretty cool... re-writing parts of existing Python code in Rust is my preferred way to learn it better :)

jan-david-fischbach commented on August 26, 2024

> The new implementation is much better... it does not re-compute anything and when it needs to dilate anything it stays close to the brush touch (I think that's what @Jan-David-Black meant with local dilations?).

Yep, exactly.
It is also what I had started in my "local generator" in Python, which takes ~2 s for 128x128 even though I still use global dilations in some places...

I got your new Rust generator to work much more easily (no more arrayfire 🎉). I'll open a pull request to share my maturin configuration, which also makes adding paths and copying files around obsolete :)

flaport commented on August 26, 2024

> I got your new rust-generator to work much more easily (no more arrayfire 🎉). I'll open a pull request to share my maturin configuration, which also makes adding paths and copying around files obsolete :)

Awesome!

One caveat is that the current version of my algorithm scales pretty badly with brush size. But at least it's better than scaling with the grid size I think.

lucasgrjn commented on August 26, 2024

Maybe we could make a small benchmark to see how the runtime evolves as a function of the size (object and brush?)
TBH we really needed these speed improvements! You rock :)
It took me less than 20 s to generate a latent design of the Figure 6 size (640 px with a circular_brush of 10 px).

> That said, I understand it's a lot more difficult to understand what's going on in an unfamiliar language, but rust is pretty cool... re-writing parts of existing python code in Rust is my preferred way to learn it better :)

Learning Rust was on my TODO list, but you just showed me how cool it is!

jan-david-fischbach commented on August 26, 2024

I have also added a notebook that uses ceviche_challenges in the optimization loop here
So far the optimization is quite bad...
Two things we might add to the generator:

* Initial touches: I use a set of solid and void touches at the start of the generation process to ensure the fabrication constraints are also obeyed at the borders and waveguide ports (e.g. here)
* Symmetry constraints: I still need to figure out how to do this best

lucasgrjn commented on August 26, 2024

@flaport if you have some spare time: do you plan to add some comments to your Rust code to explain the main idea of the different parts? (I think you use a big brush and then a very big brush on the voxels of the big brush.)
-> It is more to understand the algorithm and its possible caveats than the syntax.

flaport commented on August 26, 2024

Yes, I'll try to add some comments soon.

To answer your question related to the brushes:

* the brush is used to dilate touches
* the big brush (the brush convolved with itself) is used to find invalid regions
* the very big brush (ideally the big brush convolved by the brush) is used as a search area for free and resolving touches.
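
The three brushes in the list above can be sketched as follows. This is only an illustration of "brush convolved with itself" on boolean masks, written with NumPy; the helper name `dilate_full` and the small plus-shaped brush are hypothetical and not taken from the Rust code.

```python
import numpy as np

def dilate_full(a, b):
    """Boolean 'full' convolution of two masks (their Minkowski sum)."""
    am, an = a.shape
    bm, bn = b.shape
    out = np.zeros((am + bm - 1, an + bn - 1), dtype=bool)
    # Place a copy of `a` at every set pixel of `b` and OR them together.
    for di, dj in zip(*np.nonzero(b)):
        out[di:di + am, dj:dj + an] |= a
    return out

# Hypothetical 3x3 plus-shaped brush.
brush = np.array([[0, 1, 0],
                  [1, 1, 1],
                  [0, 1, 0]], dtype=bool)

big_brush = dilate_full(brush, brush)           # brush convolved with itself
very_big_brush = dilate_full(big_brush, brush)  # big brush convolved with the brush

print(big_brush.shape, very_big_brush.shape)
```

For a brush of width d, the big brush has width 2d - 1 and the very big brush has width 3d - 2, so the search area around a touch grows with the brush and not with the grid.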

lucasgrjn commented on August 26, 2024

> I have also added a notebook that uses ceviche_challenges in the optimization loop here
> So far the optimization is quite bad...
> Two things we might add to the generator:

I will try to make an example tomorrow, also using ceviche_challenges.

> * Initial touches: I use a set of solid and void touches at the start of the generation process to ensure the fabrication constraints are also obeyed at the borders and waveguide ports ([e.g. here](https://jan-david-black.github.io/inverse_design_strict_fabrication/notebooks/inverse_design_local.html))
> * Symmetry constraints: I still need to figure out how to do this best

Cool to see you've already done what you explained to me on Friday!
For the symmetries, it won't be easy...

The idea of forcing touches on the symmetry boundary won't work, since it will either avoid or force one type...
With a symmetry, a feature of half a brush size on the boundary would be allowed, but it won't be permitted when optimizing only half of the design...

lucasgrjn commented on August 26, 2024

Thanks for the explanations!

> * the very big brush (ideally the big brush convolved by the brush) is used as a search area for free and resolving touches.

This is the point where I think it may be possible to make use of a mask, I will try to investigate.

jan-david-fischbach commented on August 26, 2024

I currently have the following problem. Maybe you know a quick fix:
My fork has diverged from the current state of this repo quite a bit. I would like to make a pull request for some small changes to incorporate maturin. Unfortunately I cannot just make a second clean fork... What do I do?

flaport commented on August 26, 2024

Yeah, that was my mistake. This is a public repo and I treated it as one of my private ones by force pushing something.

Maybe just make the PR anyway and tell me which files you're interested in. I might try to solve it that way.

flaport commented on August 26, 2024

> Symmetry constraints: I still need to figure out how to do this best

One way to enforce symmetry is to add your transformed latent matrix to a symmetric (mirrored) version of itself.

For example, using this

    latent_t = latent_t + latent_t[::-1]

yields a symmetric design.
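
A small sketch of the mirroring trick, generalized to either axis. The helper name `mirror_symmetrize` and the toy array are hypothetical; `latent_t[::-1]` from the comment corresponds to `axis=0` here.

```python
import numpy as np

def mirror_symmetrize(latent_t, axis=0):
    """Add the array to a copy of itself flipped along `axis`."""
    return latent_t + np.flip(latent_t, axis=axis)

# Toy stand-in for the transformed latent array.
latent_t = np.arange(12, dtype=float).reshape(4, 3)

sym_rows = mirror_symmetrize(latent_t, axis=0)   # same as latent_t + latent_t[::-1]
sym_both = mirror_symmetrize(sym_rows, axis=1)   # additionally mirror left-right

print(np.array_equal(sym_rows, sym_rows[::-1]))
```

Flipping the sum only swaps its two terms, so the result is unchanged by the flip. The array keeps its full size, so this enforces the symmetry without reducing the amount of work the generator has to do.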

lucasgrjn commented on August 26, 2024

> Symmetry constraints: I still need to figure out how to do this best

> one way to enforce symmetry is to add your transformed latent matrix with a symmetric version of itself.

I fully agree, but in this case we won't gain any time, unfortunately.

jan-david-fischbach commented on August 26, 2024

As we have a quite fast generator implementation by now I will close this Issue.
Feel free to open other issues for the other suggestions made here.

lucasgrjn commented on August 26, 2024

Very nice improvements !
