cgraywang / deepex
Code repo for EMNLP21 paper "Zero-Shot Information Extraction as a Unified Text-to-Triple Translation"
Home Page: https://arxiv.org/pdf/2109.11171.pdf
License: Apache License 2.0
Hello, thanks for your contribution to Open Information Extraction research.
I'm currently using your repo to create triples from raw Wikipedia text.
bash tasks/OIE_2016.sh
I changed data_dir to point at my own data and then ran the command above.
The code works fine and produced an output file called search_res.json.
The problem is that I can't find a description of this file's format.
In particular, I can't tell what each field in a value stands for.
{"deduplicated:": {"Another Temporary Gallery [SEP] gallery [SEP] The Museum": [8, 0.7081716619431973, [[0, 25], [65, 75]], 27, 0] ...
For example, I would like to understand this list:
[8, 0.7081716619431973, [[0, 25], [65, 75]], 27, 0]
Could you briefly explain what each of these values means?
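One part of the layout can be read off the example itself: each key under `deduplicated:` is three strings joined by ` [SEP] `. Below is a minimal Python sketch for unpacking one entry, assuming the three parts are head, relation and tail. The fields of the value list are left unnamed because nothing in this thread documents them, and `split_triple` is a hypothetical helper, not part of deepex.

```python
import json

# One entry copied from the search_res.json excerpt in this issue.
raw = (
    '{"deduplicated:": {"Another Temporary Gallery [SEP] gallery [SEP] The Museum": '
    "[8, 0.7081716619431973, [[0, 25], [65, 75]], 27, 0]}}"
)


def split_triple(key):
    """Split a 'head [SEP] relation [SEP] tail' key into its three parts.

    Assumption: the key always contains exactly two ' [SEP] ' separators.
    """
    head, relation, tail = key.split(" [SEP] ")
    return head, relation, tail


# Note the trailing colon inside the top-level key name.
entries = json.loads(raw)["deduplicated:"]
for key, value in entries.items():
    print(split_triple(key), value)
```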
Which file implements the loss in Equation (1) of the paper? I could only find that the Reranking function computes the L1 norm of the sentence embedding and the triple embedding.
Thank you for the great work on zero-shot IE. Where can I find the script used to pre-train Magolor/deepex-ranking-model?
Hi, thank you for your good work. I ran OIE2016 with your default parameters on one V100 GPU and got the following results:
Did I make a mistake somewhere, given that I cannot reach the 0.72 F1?
Also, it seems the contrastive pre-training code for the deep-ranking-model has not been released and the current code is inference only. Testing takes about 3 hours. Am I right?
Hoping for your reply!
PyTorch 1.7.1 with CUDA 10.1, single GPU.
After running
git clone --recursive git@github.com:cgraywang/deepex.git
cd ./deepex
conda create --name deepex python=3.7 -y
conda activate deepex
pip install -r requirements.txt
pip install -e .
bash tasks/OIE_2016.sh
I am getting
FileNotFoundError: [Errno 2] No such file or directory: 'result/OIE_2016.bert-large-cased.np.d2048.b6.sorted/P0_result.json'
I suspect this is because the output file was never generated. Digging through the trace log, I found:
RuntimeError
: RuntimeErrorRuntimeError
NCCL error in: /opt/conda/conda-bld/pytorch_1607370141920/work/torch/lib/c10d/ProcessGroupNCCL.cpp:784, invalid usage, NCCL version 2.7.8: RuntimeError:
NCCL error in: /opt/conda/conda-bld/pytorch_1607370141920/work/torch/lib/c10d/ProcessGroupNCCL.cpp:784, invalid usage, NCCL version 2.7.8: NCCL error in: /opt/conda/conda-bld/pytorch_1607370141920/work/torch/lib/c10d/ProcessGroupNCCL.cpp:784, invalid usage, NCCL version 2.7.8RuntimeError
And some other NCCL errors.
I am not sure where to start debugging.
Thank you in advance.
I am trying to do a simple text-to-triple inference task (openIE) and I thought that this would be the way to start.
Hi
Could you describe the steps to reproduce the results in your paper for OIE 2016 and the other systems? You used the supervised-oie repo and the benchmark included there. There is also a related repo from the same author, oie-benchmark, with the same evaluation. However, the two differ slightly and have both had updates, and neither seems to work out of the box when running the eval.sh script, which should produce the PR curve plots (part of what I was interested in comparing).
Looking at them, it's unclear which version of the benchmark corpus is being used and whether you picked the test (or dev) split or used the whole dataset. The default script seems to use all the data, but the tasks you provide seem to use the test split.
Thanks
Tony
An evaluation on BenchIE would be great, since it is probably the best available dataset.
Link to Dataset: https://paperswithcode.com/dataset/benchie
When downloading with 'git clone --recursive git@github.com:cgraywang/deepex.git', I get:
Cloning into 'deepex'...
Permission denied (publickey).
fatal: Could not read from remote repository.
Please make sure you have the correct access rights
and the repository exists.
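A possible workaround, assuming the failure comes from this machine having no SSH key registered with GitHub (the usual cause of `Permission denied (publickey)`), is to clone over HTTPS instead. The Python sketch below only rewrites the SSH-style URL and prints the resulting command; `ssh_to_https` is a hypothetical helper. If the repository's submodules are also configured with SSH URLs, they may need the same change.

```python
import re


def ssh_to_https(url):
    """Rewrite 'git@host:owner/repo.git' as 'https://host/owner/repo.git'."""
    match = re.fullmatch(r"git@([^:]+):(.+)", url)
    if match is None:
        return url  # not an scp-style SSH URL; leave it unchanged
    host, path = match.groups()
    return f"https://{host}/{path}"


https_url = ssh_to_https("git@github.com:cgraywang/deepex.git")
# Print the clone command rather than running it, so the sketch has no side effects.
print("git clone --recursive", https_url)
```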
Hi,
Did you release a script for "Magolor/deepex-ranking-model"?
I don't know how to use it.
I followed the README and successfully ran the OpenIE16 benchmark. Then I modified the OIE_2016.json file to point to my directory with a test.txt file containing just one line: Julia owns two cats and one dog.
The output, however, is really poor: the expected triples [Julia, owns, two cats] and [Julia, owns, one dog] have low scores, and there are many other (ill-formed) triples, sometimes with even higher scores (complete output attached below).
Is this the normal behavior of the model? Why is the performance so poor? Is there a systematic issue with how I am doing this?
['$input_txt:$ Julia owns two cats and one dog',
{'deduplicated:': {'Julia [SEP] owns two [SEP] One Dog': [2,
0.15572701767086983,
[[0, 5], [24, 31]],
8,
0],
'Julia [SEP] owns two cats [SEP] One Dog': [5,
0.38142501655966043,
[[0, 5], [24, 31]],
21,
0],
'Julia [SEP] cats [SEP] One Dog': [2,
0.09584893216378987,
[[0, 5], [24, 31]],
6,
0],
'Two Cats [SEP] cats [SEP] One Dog': [2,
0.09491016250103712,
[[11, 19], [24, 31]],
6,
0],
'Julia [SEP] owns two cats and [SEP] One Dog': [6,
0.4070159122347832,
[[0, 5], [24, 31]],
26,
0],
'Julia [SEP] cats and [SEP] Two Cats': [2,
0.14055378548800945,
[[0, 5], [11, 19]],
9,
0],
'Julia [SEP] two cats [SEP] One Dog': [1,
0.06196947582066059,
[[0, 5], [24, 31]],
4,
0],
'Julia [SEP] one [SEP] Two Cats': [1,
0.06188515014946461,
[[0, 5], [11, 19]],
4,
0],
'Julia [SEP] owns [SEP] Two Cats': [6,
0.2982198027893901,
[[0, 5], [11, 19]],
20,
0],
'Julia [SEP] owns [SEP] One Dog': [4,
0.17877793312072754,
[[0, 5], [24, 31]],
12,
0],
'Julia [SEP] two [SEP] One Dog': [3,
0.1447404371574521,
[[0, 5], [24, 31]],
10,
0],
'Julia [SEP] and one [SEP] Two Cats': [5,
0.32297115167602897,
[[0, 5], [11, 19]],
23,
0],
'Two Cats [SEP] owns [SEP] One Dog': [8,
0.44091942673549056,
[[11, 19], [24, 31]],
32,
0],
'Julia [SEP] and [SEP] One Dog': [1,
0.04122000187635422,
[[0, 5], [24, 31]],
3,
0],
'Julia [SEP] cats and one [SEP] Two Cats': [2,
0.10967723815701902,
[[0, 5], [11, 19]],
8,
0],
'Two Cats [SEP] one [SEP] One Dog': [2,
0.08130411058664322,
[[11, 19], [24, 31]],
6,
0],
'Two Cats [SEP] owns two [SEP] One Dog': [8,
0.4609282175078988,
[[11, 19], [24, 31]],
36,
0],
'Julia [SEP] and [SEP] Two Cats': [3,
0.14550211280584335,
[[0, 5], [11, 19]],
12,
0],
'Two Cats [SEP] two [SEP] One Dog': [4,
0.19086267473176122,
[[11, 19], [24, 31]],
16,
0],
'Two Cats [SEP] and [SEP] One Dog': [6,
0.1381131475791335,
[[11, 19], [24, 31]],
21,
0]}}]
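One thing that can be checked directly against this output: the third field of each value holds character offsets into the input sentence for the head and tail mentions (`[0, 5]` is `Julia`, `[24, 31]` is `one dog`), while the keys show a capitalized form of those mentions. The Python sketch below verifies this on three of the entries and ranks them by the second field, on the assumption that this float is the score the post refers to. `mention_text` and `ranked` are hypothetical names, not deepex functions.

```python
sentence = "Julia owns two cats and one dog"

# Three entries copied from the output above.
deduplicated = {
    "Julia [SEP] owns [SEP] Two Cats": [6, 0.2982198027893901, [[0, 5], [11, 19]], 20, 0],
    "Julia [SEP] owns [SEP] One Dog": [4, 0.17877793312072754, [[0, 5], [24, 31]], 12, 0],
    "Two Cats [SEP] owns two [SEP] One Dog": [8, 0.4609282175078988, [[11, 19], [24, 31]], 36, 0],
}


def mention_text(text, span):
    """Return the substring covered by a [start, end) character span."""
    start, end = span
    return text[start:end]


def ranked(entries):
    """Sort entries by the float at index 1 (assumed to be the score), highest first."""
    return sorted(entries.items(), key=lambda item: item[1][1], reverse=True)


for key, value in ranked(deduplicated):
    head_span, tail_span = value[2]
    head = mention_text(sentence, head_span)
    tail = mention_text(sentence, tail_span)
    print(f"{value[1]:.3f}  {key}  head={head!r}  tail={tail!r}")
```

Among these three entries, the ill-formed `Two Cats [SEP] owns two [SEP] One Dog` ranks above both expected triples, which matches the behavior described in the post.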
Hi,
I am unable to figure out how to run OIE inference using the deepex models. Can you please help answering:
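Using the example from the paper, "Born in Glasgow, Fisher is a graduate of the London Opera Centre.", the triples extracted in the first step (the generator) look rather messy, and I am not sure whether this is normal (reproducing the OIE2016 results worked fine, top-3 F1 = 72.6). The model is the default bert-large-cased.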
search_res.json.txt
The P0_result.json file is empty.
Hi,
I am unable to figure out how to use the deepex model. I want to input a sentence and get the triplets generated from it. Is there any code snippet/script in the repo that can help me do that?
Thanks
Thank you for your great work on zero-shot IE. I am very impressed with the results, and I am curious what the 'task agnostic corpus' is. Is it an open dataset?