Comments (7)

lw commented on August 18, 2024

Most of what you are asking is explained in the documentation so, rather than copy-pasting it here, I suggest you read up there and let me know if anything is unclear.

The one thing that I believe we did not explain in the docs is what we mean by "nearest neighbor" search. You are right that, to properly compute how "close" (similar) two entities are, one should apply the proper operators and take the dot product. However, it turns out that once the embeddings are fully trained, their distance in L2 space already captures some semantic similarity and can thus be used to get a rough sense of the neighbors. This is an approximation we made in that example and should have explained better.

If you want a more exact search, there are a few options. I believe you can tell FAISS to use the dot product instead of the L2 distance, although not all indices support it. You cannot tell FAISS to apply the operator for you, but you can apply the operator to your query yourself before searching in the un-transformed embeddings (if you use "standard" relations, this only allows you to query the nearest left-hand side neighbors of a right-hand side entity; if you use dynamic relations you can do it on either side). If for some reason this doesn't work for you, you can drop FAISS entirely and do a slower but more correct evaluation of the scores between an entity and all other ones, similar to what is done when ranking.
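To make the "apply the operator to the query, then search the un-transformed embeddings" option concrete, here is a minimal NumPy sketch of the slower, exhaustive variant. It is only an illustration under stated assumptions: the operator is taken to be an element-wise ("diagonal") one, the embeddings and operator parameters are random stand-ins, and `top_k_lhs_neighbors` is a hypothetical helper, not part of PBG or FAISS.

```python
import numpy as np


def top_k_lhs_neighbors(embeddings, rhs_index, operator_diag, k=5):
    """Rank all entities as left-hand side candidates for one right-hand side entity.

    Assumption: the relation operator is element-wise multiplication by
    `operator_diag`. Other operator types would need a different transform here.
    """
    # Apply the operator to the query only; the stored embeddings stay un-transformed.
    query = embeddings[rhs_index] * operator_diag
    # Dot-product score between the transformed query and every entity.
    scores = embeddings @ query
    # Exclude the query entity itself from its own neighbor list.
    scores[rhs_index] = -np.inf
    # Indices of the k highest-scoring entities, best first.
    top = np.argsort(-scores)[:k]
    return top, scores[top]


# Random stand-ins for trained embeddings and relation parameters (hypothetical data).
rng = np.random.default_rng(0)
embeddings = rng.standard_normal((1000, 16)).astype(np.float32)
operator_diag = rng.standard_normal(16).astype(np.float32)

neighbors, neighbor_scores = top_k_lhs_neighbors(
    embeddings, rhs_index=42, operator_diag=operator_diag, k=5
)
print(neighbors, neighbor_scores)
```

The same transformed `query` vector could instead be handed to an inner-product index if exhaustive scoring is too slow; the brute-force version is mainly useful as a reference to check an approximate index against.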

As mentioned in the page about the Wikidata embeddings, the TSV is in almost the same format as the output of the export_to_tsv command, which is explained in the README. The parameters of the operator of each relation type are at the end of the TSV file. They are not pre-applied to the embeddings (this would be impossible with more than one relation type, each with different parameters).

You will also find in the doc that, in addition to TSV, there's a machine-readable format for these embeddings (i.e., .npy). There we also explain how to load it.
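As a rough illustration of working with a large .npy matrix, the sketch below memory-maps the file with NumPy instead of reading it all into RAM. The file name and the tiny random matrix are stand-ins created only so the snippet runs on its own; the documentation remains the reference for the actual files and how their rows map to entities.

```python
import os
import tempfile

import numpy as np

# Stand-in for the real file (hypothetical name): a small random matrix saved as .npy.
demo_path = os.path.join(tempfile.mkdtemp(), "demo_embeddings.npy")
np.save(demo_path, np.random.default_rng(0).standard_normal((100, 8)).astype(np.float32))

# mmap_mode="r" maps the file read-only, so rows are read from disk on access
# rather than loading the whole matrix up front.
embeddings = np.load(demo_path, mmap_mode="r")
print(embeddings.shape, embeddings.dtype)

# Copy a single row into ordinary memory when you need to work with it.
row = np.asarray(embeddings[3])
```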

Dynamic relations are explained here. The parameters for the left-hand side operators also appear in the TSV file, with a _reverse_relation suffix.

Then, the FAISS example should work just the same for the Wikidata embeddings. Due to their size you may want to use a different index type for better performance, but that depends on your application and you should turn to the FAISS developers for help with tuning.

from pytorch-biggraph.

kadimaolivier commented on August 18, 2024

Hello, I would like to work with PyTorch-BigGraph. Starting from a graph dataset (my data are in RDF format), I want to find similarities between entities, including entities that have numerical attributes. Finally, how can I apply the TransEA model with numerical attributes after determining the similarity between entities?

lw commented on August 18, 2024

Your questions are very broad and they are basically about how to design a full ML pipeline, which is something that is up to you, rather than how to employ PBG as one block of it, which is what we're here to help with. I advise you to check out the README and documentation and get back to us if you have specific issues.

kadimaolivier commented on August 18, 2024

Thank you very much for your advice and your feedback. Could you please guide me, since I am a newbie in this field? Is it possible to get vectors from an RDF dataset using the PBG tool? If yes, which steps should I follow? Secondly, after getting the vectors I would like to compare them and find which entities' vectors are similar; is it also possible to do that with PBG? Once again, thanks in advance for your guidance and orientation.

lw commented on August 18, 2024

PBG, by itself, doesn't read RDF. You need to convert it either to the native format (explained here) or to TSV (tab-separated values), for which there already is an importer (i.e., torchbiggraph_from_tsv). The N-Triples format of RDF is somewhat similar to TSV, so that may be easiest. Once you have your data in the right format, the docs explain how to train embeddings for its entities.
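A possible starting point for the N-Triples to TSV conversion is sketched below. It is a simplified illustration, not an RDF parser: it keeps only triples whose subject, predicate and object are all IRIs and skips literals (such as numerical attributes), blank nodes and comments. The helper name `ntriples_to_tsv`, the sample data and the subject/predicate/object column order are assumptions to adapt to how you configure the importer.

```python
import csv
import io
import re

# Matches "<subject> <predicate> <object> ." where all three terms are IRIs.
# Lines with literal or blank-node terms do not match and are skipped.
TRIPLE_RE = re.compile(r"^\s*<([^>]*)>\s+<([^>]*)>\s+<([^>]*)>\s*\.\s*$")


def ntriples_to_tsv(lines, out):
    """Write one tab-separated (subject, predicate, object) row per IRI-only triple."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    written = 0
    for line in lines:
        match = TRIPLE_RE.match(line)
        if match is None:
            continue  # literal object, blank node, comment, or malformed line
        writer.writerow(match.groups())
        written += 1
    return written


# Hypothetical sample data; the second triple has a numeric literal and is skipped.
sample = [
    "<http://example.org/alice> <http://example.org/knows> <http://example.org/bob> .",
    '<http://example.org/alice> <http://example.org/age> "42"^^<http://www.w3.org/2001/XMLSchema#integer> .',
    "# a comment line",
    "<http://example.org/bob> <http://example.org/knows> <http://example.org/carol> .",
]

buffer = io.StringIO()
count = ntriples_to_tsv(sample, buffer)
print(buffer.getvalue())
```

For real datasets a dedicated RDF library would handle escaping and edge cases more robustly; the regular expression here only covers the simplest form of a triple.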

kadimaolivier commented on August 18, 2024

Thank you very much for the details...

lw commented on August 18, 2024

Closing this as I think everything has been answered and there were no follow-ups. If I missed something or new questions arise, please reopen or create a new issue.
