Comments (5)
Hi @Albert-Ma,
If you want to get phrase representations from documents, please refer here.
The code that extracts phrases is https://github.com/princeton-nlp/DensePhrases/blob/main/generate_phrase_vecs.py; also see [link missing], which is used in generate_phrase_vecs.py.
Hi @jhyuklee,
I am looking for how to get phrases from raw documents such as Wikipedia or SQuAD.
This is the very first step of phrase retrieval, and I think it may happen before training the model.
Or does phrase retrieval not extract phrases from documents explicitly, and instead do it on the fly?
I'll check generate_phrase_vecs.py.
Thanks.
Phrase retrieval is trained on QA datasets that contain phrase-level answer annotations, so we don't need to explicitly extract phrases before training. After training, generate_phrase_vecs.py filters out irrelevant phrases (i.e., start/end tokens) and stores only the relevant phrases that could be used for downstream tasks. This filtering model was also trained on QA datasets so that these phrases serve as answer candidates. In embed_utils.py, there is a function that applies this filtering to the metadata of each document.
Here, metadata means the phrase-vector-related outputs for each document (phrase start/end vectors, start2end mapper, etc.).
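To make the filtering idea concrete, here is a minimal sketch of thresholding start/end tokens. It is not the actual code from embed_utils.py: the function name filter_phrase_tokens, the dictionary keys (filter_start, filter_end, start_vecs, end_vecs), and the threshold value are all hypothetical, chosen only to illustrate how per-token filter scores could decide which start/end vectors get stored.

```python
from typing import Dict, List


def filter_phrase_tokens(metadata: Dict[str, List], threshold: float) -> Dict[str, List]:
    """Keep only tokens whose start/end filter score exceeds the threshold.

    `metadata` is a hypothetical per-document dict; the key names are
    assumptions for this sketch, not the real DensePhrases schema.
    """
    start_keep = [i for i, s in enumerate(metadata["filter_start"]) if s > threshold]
    end_keep = [i for i, s in enumerate(metadata["filter_end"]) if s > threshold]
    return {
        "start_idxs": start_keep,
        "end_idxs": end_keep,
        "start_vecs": [metadata["start_vecs"][i] for i in start_keep],
        "end_vecs": [metadata["end_vecs"][i] for i in end_keep],
    }


# Toy document with 4 tokens and 2-dimensional vectors (made-up numbers).
metadata = {
    "start_vecs": [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]],
    "end_vecs": [[1.0, 1.1], [1.2, 1.3], [1.4, 1.5], [1.6, 1.7]],
    "filter_start": [-2.0, 0.5, 1.5, -0.3],
    "filter_end": [0.2, -1.0, 2.0, -4.0],
}

filtered = filter_phrase_tokens(metadata, threshold=0.0)
print(filtered["start_idxs"])  # [1, 2]
print(filtered["end_idxs"])    # [0, 2]
```

In this toy case only tokens 1 and 2 survive as phrase starts and tokens 0 and 2 as phrase ends, so far fewer vectors need to be stored than if every token were kept.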
Got it, thanks
You can also check this issue! #17
I think it's related.