
Comments (18)

lucidrains avatar lucidrains commented on June 6, 2024 1

pulling in @rom1504 because he might know 😂

from retro-pytorch.

ncoop57 avatar ncoop57 commented on June 6, 2024 1

@lucidrains fixed the issue with this PR, using the guidance from @rom1504: #16. As he alluded to, we lose the memmapping ability :(. I did find a few more errors after that in the generator function, but I'll open a separate issue to discuss them and the fixes.

from retro-pytorch.

rom1504 avatar rom1504 commented on June 6, 2024 1

I recommend never using faiss gpu

from retro-pytorch.

rom1504 avatar rom1504 commented on June 6, 2024 1

anyway here is the official recommendation https://github.com/facebookresearch/faiss/wiki/Comparing-GPU-vs-CPU

but in practice I've never seen a case where it's beneficial to use GPU for faiss

faiss cpu is very fast

from retro-pytorch.

lucidrains avatar lucidrains commented on June 6, 2024

@ncoop57 ohh interesting, i wonder if this is an issue with autofaiss 🤔

from retro-pytorch.

ncoop57 avatar ncoop57 commented on June 6, 2024

@ncoop57 ohh interesting, i wonder if this is an issue with autofaiss 🤔

@lucidrains I looked into the issues in autofaiss and no one has brought it up. It might be due to faiss itself, as I found this issue: facebookresearch/faiss#709

Is there a specific environment setup that you used to get it working?

from retro-pytorch.

rom1504 avatar rom1504 commented on June 6, 2024

Hey,
The answer is exactly what the error says.
Some indexes do not implement reconstructing embeddings; that's not a bug.

Why do you need to reconstruct?

Anyway you have these choices:

  1. Don't reconstruct
  2. Use the reconstruct_n method
  3. Use index.make_direct_map(); then you can use reconstruct. If you're using autofaiss, there is an option of the same name

Now for the technical explanation: reconstructing an embedding using an index and an id requires having a way to go from the id to the embedding's location in the index.
If the index is a flat index, i.e. a simple embedding matrix, the solution is obvious: simply slice the matrix with the id.
However, if your index is for example an IVF index, which organizes the embeddings as N clusters, then there is no obvious way to do it.
If you call reconstruct_n with an IVF index without make_direct_map, faiss will scan the whole index looking for the id you want.
If you do call make_direct_map beforehand, faiss will prebuild a mapping from embedding id to position in the index (i.e. cluster id + position within the cluster).
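
For reference, here is a minimal sketch of options 2 and 3 in Python; the index path and ids are illustrative, and it assumes an IVF index loaded from disk:

```python
import faiss

# Assumes an IVF index saved to disk (path is illustrative).
index = faiss.read_index("knn.index")

# Option 2: reconstruct a contiguous range of ids (works without a direct map,
# but faiss may have to scan the inverted lists to find them).
vectors = index.reconstruct_n(0, 100)   # numpy array of shape (100, d)

# Option 3: build the id -> position mapping once, then reconstruct by id.
faiss.extract_index_ivf(index).make_direct_map()
one_vector = index.reconstruct(42)      # numpy array of shape (d,)
```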

Hope that clears things up!

from retro-pytorch.

rom1504 avatar rom1504 commented on June 6, 2024

Note a small drawback of index.make_direct_map() : it does not currently support memory mapping so the mapping will go to ram. That's about 12 * number of embeddings in ram. Eg 1.2GB for 100M embeddings
Could clearly be memory mapped as well but faiss didn't implement it
If you find that necessary, I'd gladly guide anyone wanting to implement that either outside of fais in python or inside of faiss in c++
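
As a quick back-of-the-envelope check of that estimate (the ~12 bytes per embedding figure is the one quoted above):

```python
num_embeddings = 100_000_000          # 100M embeddings
bytes_per_embedding = 12              # rough per-id cost of the direct map
ram_gb = num_embeddings * bytes_per_embedding / 1e9
print(ram_gb)                         # ~1.2 GB
```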

from retro-pytorch.

rom1504 avatar rom1504 commented on June 6, 2024

Ah yeah, I forgot the last and best option: if your use case is to reconstruct embeddings that are coming out of a search, then don't use reconstruct at all!
Instead, use search_and_reconstruct, which gives you the embeddings directly. No need for the direct map in that case.
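
A minimal sketch, assuming query vectors with the same dimension d as the index (the path, queries, and k are illustrative):

```python
import faiss
import numpy as np

index = faiss.read_index("knn.index")                    # illustrative path
queries = np.random.rand(4, index.d).astype("float32")   # 4 dummy queries

# D: distances, I: neighbor ids, R: reconstructed neighbor embeddings
# with shape (n_queries, k, d) -- no make_direct_map needed here.
D, I, R = index.search_and_reconstruct(queries, 10)
```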

That's what I use, e.g., in clip-retrieval.

from retro-pytorch.

lucidrains avatar lucidrains commented on June 6, 2024

oh I see, thanks for the lengthy explanation

do you know which setting for the build index function would disable the reconstructing? https://github.com/lucidrains/RETRO-pytorch/blob/main/retro_pytorch/retrieval.py#L288 I don't believe I'm using it anywhere else in the app and the error is being emitted at this step

from retro-pytorch.

rom1504 avatar rom1504 commented on June 6, 2024

do you know which setting for the build index function would disable the reconstructing? https://github.com/lucidrains/RETRO-pytorch/blob/main/retro_pytorch/retrieval.py#L288 I don't believe I'm using it anywhere else in the app and the error is being emitted at this step

That doesn't make sense, no.
This error would be emitted only if index.reconstruct is called.

from retro-pytorch.

ncoop57 avatar ncoop57 commented on June 6, 2024

do you know which setting for the build index function would disable the reconstructing? https://github.com/lucidrains/RETRO-pytorch/blob/main/retro_pytorch/retrieval.py#L288 I don't believe I'm using it anywhere else in the app and the error is being emitted at this step

that doesn't make sense no.
This error would be emitted only if index.reconstruct is called

I can confirm that this error is arising from the build_index function, specifically when the should_be_mem_mappable flag is True @rom1504

from retro-pytorch.

rom1504 avatar rom1504 commented on June 6, 2024

OK, that is wrong; can you share the whole stack trace?

from retro-pytorch.

rom1504 avatar rom1504 commented on June 6, 2024

So the bug is only that faiss-gpu is installed instead of faiss-cpu.
Just change that line in the colab and things work.

from retro-pytorch.

ncoop57 avatar ncoop57 commented on June 6, 2024

So the bug is only that faiss-gpu is installed instead of faiss-cpu.
Just change that line in the colab and things work.

Oh, is there any speed difference in creating the index if we use faiss-cpu vs. faiss-gpu? My assumption was that it would be faster if we accelerated it using the GPU. And I think I was getting a separate error when I didn't install faiss-gpu, something like "faiss has no 'StandardGPU...' attribute".

from retro-pytorch.

ncoop57 avatar ncoop57 commented on June 6, 2024

@rom1504 in build_index, the flag use_gpu = torch.cuda.is_available() is enabled. Do you recommend turning that off when building the index and wanting it to be mem mappable?
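
For example, if this ends up calling autofaiss, something like the following should keep everything on CPU and memory-mappable; the parameter names follow autofaiss's build_index as I understand it, so double-check them against the installed version:

```python
from autofaiss import build_index

# Build the index on CPU only and keep it memory-mappable
# (paths are illustrative; parameter names assume autofaiss's build_index).
build_index(
    embeddings="embeddings/",            # folder of .npy embedding shards
    index_path="knn.index",
    index_infos_path="index_infos.json",
    use_gpu=False,                       # faiss-cpu only, as recommended above
    should_be_memory_mappable=True,
)
```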

from retro-pytorch.

rom1504 avatar rom1504 commented on June 6, 2024

Oh is there any speed difference in creating the index if we used faiss-cpu vs. GPU?

yes, it's usually slower on GPU

from retro-pytorch.

ncoop57 avatar ncoop57 commented on June 6, 2024

@lucidrains I can make an additional PR reverting the changes and removing the GPU flag so it only uses the CPU, if you'd like.

from retro-pytorch.
