ad-freiburg / elevant
Entity linking evaluation and analysis tool
Home Page: https://elevant.cs.uni-freiburg.de/
License: Apache License 2.0
Hello,
I am trying to add a new benchmark and run an experiment on it. I ran all of the scripts as I have multiple times before, and I don't think there were any errors. But when I upload the new benchmark and experiment, the experiment shows what seem like plausible numbers, yet clicking on it does not display the documents; the page just spins forever.
In the console I can see that an error occurs while the app is trying to read or format the first article. It happens at line 1814:
Uncaught (in promise) TypeError: Cannot read properties of undefined (reading 'type')
if ("true_entity" in mention) {
    // Use the type of the parent entity because this is the type that counts in the evaluation.
    let curr_label_id = mention.true_entity.id;
    while (curr_label_id in child_label_to_parent) {
        curr_label_id = child_label_to_parent[curr_label_id];
    }
    gt_annotation.gt_entity_type = label_id_to_label[curr_label_id].type;  // <--- error
    // Get text of parent span
    if (curr_label_id !== mention.true_entity.id) {
        let parent_span = label_id_to_label[curr_label_id].span;
        const articles = (example_modal) ? window.articles_example_benchmark : window.benchmark_articles[benchmark];
        gt_annotation.parent_text = articles[article_index].text.substring(parent_span[0], parent_span[1]);
    }
}
I added a try block, and it looks like label_id_to_label is empty. The mention itself looks fine.
I have spent some time looking for any obvious problems in the data files and haven't found any.
If you would like to see it for yourself, you can do so here.
Did something go wrong with the data generation, or is something wrong with the app? I would appreciate any assistance.
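In case it helps with debugging, this is the kind of consistency check I would run over the benchmark .jsonl file (a sketch; check_labels is my own helper name, and it only assumes the id/parent/type label fields of the benchmark schema, since the frontend resolves each label's parent chain before reading its type):

```python
import json

def check_labels(articles):
    """Return messages for labels whose parent id is missing or that lack a type."""
    problems = []
    for i, article in enumerate(articles):
        labels = {lbl["id"]: lbl for lbl in article.get("labels", [])}
        for lbl in labels.values():
            parent = lbl.get("parent")
            if parent is not None and parent not in labels:
                problems.append(f"article {i}: label {lbl['id']} has unknown parent {parent}")
            if "type" not in lbl:
                problems.append(f"article {i}: label {lbl['id']} is missing a type")
    return problems

# Hypothetical usage on a benchmark file:
# with open("benchmarks/my-benchmark.benchmark.jsonl") as f:
#     for problem in check_labels(json.loads(line) for line in f):
#         print(problem)
```

If this prints nothing, the benchmark file itself is probably internally consistent and the problem is more likely on the app side.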
Hello,
I got GENRE set up from your repo, but when I try to run it, I get this:
root@dec09fef21fa:/GENRE# python3 main.py --yago -i agolo-110823.benchmark.jsonl -o out.jsonl --split_iter --mention_trie data/mention_trie.pkl --mention_to_candidates_dict data/mention_to_candidates_dict.pkl
Traceback (most recent call last):
File "main.py", line 3, in <module>
from model import Model
File "/GENRE/model.py", line 6, in <module>
from genre.fairseq_model import GENRE
File "/GENRE/genre/fairseq_model.py", line 14, in <module>
from fairseq import search, utils
File "/GENRE/fairseq/fairseq/utils.py", line 20, in <module>
from fairseq.modules.multihead_attention import MultiheadAttention
File "/GENRE/fairseq/fairseq/modules/__init__.py", line 10, in <module>
from .character_token_embedder import CharacterTokenEmbedder
File "/GENRE/fairseq/fairseq/modules/character_token_embedder.py", line 11, in <module>
from fairseq.data import Dictionary
File "/GENRE/fairseq/fairseq/data/__init__.py", line 23, in <module>
from .indexed_dataset import (
File "/GENRE/fairseq/fairseq/data/indexed_dataset.py", line 112, in <module>
6: np.float,
File "/usr/local/lib/python3.8/site-packages/numpy/__init__.py", line 305, in __getattr__
raise AttributeError(__former_attrs__[attr])
AttributeError: module 'numpy' has no attribute 'float'.
`np.float` was a deprecated alias for the builtin `float`. To avoid this error in existing code, use `float` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.float64` here.
The aliases was originally deprecated in NumPy 1.20; for more details and guidance see the original release note at:
https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
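For anyone hitting this: np.float was removed in NumPy 1.24, so older fairseq checkouts fail to import. Pinning numpy below 1.24 avoids the error; alternatively, the vendored sources can be patched. A small sketch of such a patch helper (fix_numpy_aliases is my own name, not part of fairseq or GENRE):

```python
import re

# Removed NumPy aliases (np.float, np.int, np.bool, np.object) and their
# builtin replacements. The \b boundaries keep np.float64, np.int32,
# np.bool_ etc. untouched.
ALIAS_PATTERN = re.compile(r"\bnp\.(float|int|bool|object)\b")

def fix_numpy_aliases(source: str) -> str:
    """Rewrite removed np.<builtin> aliases in a source string."""
    return ALIAS_PATTERN.sub(lambda m: m.group(1), source)
```

Applied to /GENRE/fairseq/fairseq/data/indexed_dataset.py, this would turn the failing `6: np.float,` line into `6: float,`.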
Hello,
I am trying to build the Docker container for Elevant on macOS. I have cloned the repo and run docker build -t elevant . from the repo root.
The output is below.
[+] Building 83.3s (11/28) docker:desktop-linux
=> [internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 1.54kB 0.0s
=> [internal] load .dockerignore 0.0s
=> => transferring context: 259B 0.0s
=> [internal] load metadata for docker.io/library/ubuntu:20.04 1.9s
=> [internal] load build context 0.1s
=> => transferring context: 1.95MB 0.0s
=> [ 1/24] FROM docker.io/library/ubuntu:20.04@sha256:ed4a42283d9943135ed87d4ee34e542f7f5ad9ecf2f244870e23122f703f91c2 3.7s
=> => resolve docker.io/library/ubuntu:20.04@sha256:ed4a42283d9943135ed87d4ee34e542f7f5ad9ecf2f244870e23122f703f91c2 0.0s
=> => sha256:a80d11b67ef30474bcccab048020ee25aee659c4caaca70794867deba5d392b6 424B / 424B 0.0s
=> => sha256:0341906bdafc976cd73b05ea0e3df2e4884c6b6816197a2ffbd2367061c19acf 2.32kB / 2.32kB 0.0s
=> => sha256:915eebb74587f0e5d3919cb77720c143be9a85a8d2d5cd44675d84c8c3a2b74a 25.97MB / 25.97MB 2.8s
=> => sha256:ed4a42283d9943135ed87d4ee34e542f7f5ad9ecf2f244870e23122f703f91c2 1.13kB / 1.13kB 0.0s
=> => extracting sha256:915eebb74587f0e5d3919cb77720c143be9a85a8d2d5cd44675d84c8c3a2b74a 0.7s
=> [ 2/24] WORKDIR /home/ 0.1s
=> [ 3/24] RUN apt-get update 7.6s
=> [ 4/24] RUN apt-get install -y python3 python3-pip git wget vim curl python3-gdbm 45.0s
=> [ 5/24] RUN git clone https://github.com/huggingface/neuralcoref.git 13.2s
=> [ 6/24] RUN python3 -m pip install -r neuralcoref/requirements.txt 10.7s
=> ERROR [ 7/24] RUN python3 -m pip install -e neuralcoref 1.1s
------
> [ 7/24] RUN python3 -m pip install -e neuralcoref:
0.654 Obtaining file:///home/neuralcoref
1.082 ERROR: Command errored out with exit status 1:
1.082 command: /usr/bin/python3 -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/home/neuralcoref/setup.py'"'"'; __file__='"'"'/home/neuralcoref/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info
1.082 cwd: /home/neuralcoref/
1.082 Complete output (39 lines):
1.082 /usr/local/lib/python3.8/dist-packages/Cython/Compiler/Main.py:381: FutureWarning: Cython directive 'language_level' not set, using '3str' for now (Py3). This has changed from earlier releases! File: /home/neuralcoref/neuralcoref/neuralcoref.pxd
1.082 tree = Parsing.p_module(s, pxd, full_module_name)
1.082
1.082 Error compiling Cython file:
1.082 ------------------------------------------------------------
1.082 ...
1.082 int length
1.082
1.082
1.082 cdef class Vocab:
1.082 cdef Pool mem
1.082 cpdef readonly StringStore strings
1.082 ^
1.082 ------------------------------------------------------------
1.082
1.082 /usr/local/lib/python3.8/dist-packages/spacy/vocab.pxd:29:10: Variables cannot be declared with 'cpdef'. Use 'cdef' instead.
1.082 Processing neuralcoref.pyx
1.082 Traceback (most recent call last):
1.082 File "/home/neuralcoref/bin/cythonize.py", line 168, in <module>
1.082 run(args.root)
1.082 File "/home/neuralcoref/bin/cythonize.py", line 157, in run
1.082 process(base, filename, db)
1.082 File "/home/neuralcoref/bin/cythonize.py", line 123, in process
1.082 preserve_cwd(base, process_pyx, root + ".pyx", root + ".cpp")
1.082 File "/home/neuralcoref/bin/cythonize.py", line 86, in preserve_cwd
1.082 func(*args)
1.082 File "/home/neuralcoref/bin/cythonize.py", line 62, in process_pyx
1.082 raise Exception("Cython failed")
1.082 Exception: Cython failed
1.082 Traceback (most recent call last):
1.082 File "<string>", line 1, in <module>
1.082 File "/home/neuralcoref/setup.py", line 239, in <module>
1.082 setup_package()
1.082 File "/home/neuralcoref/setup.py", line 174, in setup_package
1.082 generate_cython(root, 'neuralcoref')
1.082 File "/home/neuralcoref/setup.py", line 163, in generate_cython
1.082 raise RuntimeError('Running cythonize failed')
1.082 RuntimeError: Running cythonize failed
1.082 Cythonizing sources
1.082 ----------------------------------------
1.083 ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.
------
Dockerfile:8
--------------------
6 | RUN git clone https://github.com/huggingface/neuralcoref.git
7 | RUN python3 -m pip install -r neuralcoref/requirements.txt
8 | >>> RUN python3 -m pip install -e neuralcoref
9 | COPY requirements.txt requirements.txt
10 | RUN python3 -m pip install -r requirements.txt
--------------------
ERROR: failed to solve: process "/bin/sh -c python3 -m pip install -e neuralcoref" did not complete successfully: exit code: 1
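In case it helps: this looks like the known incompatibility between Cython 3.x and the `cpdef readonly` declarations in older spaCy .pxd files. One workaround I have seen suggested (a sketch, not verified against this Dockerfile) is to pin Cython below 3.0 before building neuralcoref and to disable build isolation so the pin actually takes effect:

```dockerfile
# Sketch of a possible fix around Dockerfile lines 6-8: pin Cython < 3.0,
# since Cython 3.x rejects `cpdef readonly` in older spaCy .pxd files.
RUN git clone https://github.com/huggingface/neuralcoref.git
RUN python3 -m pip install "cython<3.0"
RUN python3 -m pip install -r neuralcoref/requirements.txt
RUN python3 -m pip install -e neuralcoref --no-build-isolation
```

The --no-build-isolation flag makes pip build against the already-installed (pinned) Cython instead of fetching the newest release into an isolated build environment.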
Hi!
I am not sure, but I think there might be model version issues with spaCy.
First, the easy part: the data directory vars in setup are messed up. I fixed those manually.
Then I get the error below. This is not using Docker: I have tried everything I can think of to get Docker running on my Mac with zero luck, so it is currently not an option. Some searches suggest that similar problems have arisen from mismatched spaCy versions, but I have tried loading different model versions and such without any luck so far. These are the versions I have (output of pip freeze):
spacy==3.4.4
spacy-conll==3.3.0
spacy-legacy==3.0.12
spacy-loggers==1.0.5
The error:
python3 link_benchmark_entities.py Spacy -l spacy -b agolo-110823
2023-11-17 11:40:32 [INFO]: Loading config file configs/spacy.config.json for linker spacy.
2023-11-17 11:40:32 [INFO]: Initializing linker spacy with config parameters {'linker_name': 'Spacy', 'model_name': 'wikipedia', 'kb': 'wikipedia', 'experiment_description': 'Using a knowledge base and model derived from Wikipedia.'} ...
2023-11-17 11:40:35 [INFO]: Loading linker model...
Traceback (most recent call last):
File "/Users/alan/repos/agolo/elevant/link_benchmark_entities.py", line 164, in <module>
main(cmdl_args)
File "/Users/alan/repos/agolo/elevant/link_benchmark_entities.py", line 40, in main
linking_system = LinkingSystem(args.linker_name,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/alan/repos/agolo/elevant/src/linkers/linking_system.py", line 43, in __init__
self._initialize_linker(linker_name, prediction_file, prediction_format)
File "/Users/alan/repos/agolo/elevant/src/linkers/linking_system.py", line 97, in _initialize_linker
self.linker = SpacyLinker(self.linker_config)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/alan/repos/agolo/elevant/src/linkers/spacy_linker.py", line 23, in __init__
self.model = EntityLinkerLoader.load_trained_linker(model_name, kb_name=kb_name)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/alan/repos/agolo/elevant/src/helpers/entity_linker_loader.py", line 28, in load_trained_linker
model.from_bytes(model_bytes)
File "/opt/homebrew/lib/python3.11/site-packages/spacy/language.py", line 2202, in from_bytes
util.from_bytes(bytes_data, deserializers, exclude)
File "/opt/homebrew/lib/python3.11/site-packages/spacy/util.py", line 1302, in from_bytes
return from_dict(srsly.msgpack_loads(bytes_data), setters, exclude) # type: ignore[return-value]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/lib/python3.11/site-packages/spacy/util.py", line 1324, in from_dict
setter(msg[key])
File "/opt/homebrew/lib/python3.11/site-packages/spacy/language.py", line 2191, in <lambda>
deserializers["tokenizer"] = lambda b: self.tokenizer.from_bytes( # type: ignore[union-attr]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "spacy/tokenizer.pyx", line 838, in spacy.tokenizer.Tokenizer.from_bytes
File "spacy/tokenizer.pyx", line 127, in spacy.tokenizer.Tokenizer.rules.__set__
File "spacy/tokenizer.pyx", line 574, in spacy.tokenizer.Tokenizer._load_special_cases
File "spacy/tokenizer.pyx", line 604, in spacy.tokenizer.Tokenizer.add_special_case
File "spacy/tokenizer.pyx", line 592, in spacy.tokenizer.Tokenizer._validate_special_case
ValueError: [E1005] Unable to set attribute 'POS' in tokenizer exception for ' '. Tokenizer exceptions are only allowed to specify ORTH and NORM.
I am trying to install without Docker by running the Dockerfile commands manually, as instructed.
There is apparently a conflict between these packages. I managed to get it to run by removing the version requirements, but I don't know whether that will actually work.
python3 -m pip install -r requirements.txt
INFO: pip is looking at multiple versions of xrenner to determine which version is compatible with other requirements. This could take a while.
ERROR: Cannot install -r requirements.txt (line 11) and -r requirements.txt (line 12) because these package versions have conflicting dependencies.
The conflict is caused by:
radboud-el 0.0.1 depends on flair>=0.11
xrenner 2.2.0.0 depends on flair==0.6.1
Hello,
I'd like to create a benchmark dataset for Elevant that includes coreference mentions. I assume the coref mentions need to be labeled as such somehow, so that coref can be turned on and off during evaluation. However, looking at the JSON schema, I do not see a field for that in the ground truth labels. I do see that for predictions, "coreference" is stored in the "recognized_by" field, but I don't see anything similar for the ground truth labels. How should I mark coref ground truth mentions in a benchmark dataset?
Hello!
Happy to say I managed to run a full eval and view it in the GUI. Overall things are looking great.
There's just one thing, which might be an issue with my data but nevertheless might be worth looking into.
Most documents display properly, but a few contain things like what is shown below (text and image). Below that I have included the corresponding line from the benchmark.jsonl file.
Any idea what is causing this? Thanks!
1
A helicopter accident in northeastern
class="annotation gt unknown lowlight beginning annotation_id_0_1_2">Syria over the weeken
Groundtruth: [UNKNOWN]
Note: Entity not found in the knowledge base
d left 22
American
service member
s injured, the U.S.
mili
n class="annotation gt unknown lowlight beginning annotation_id_0_1_7">tary said Tuesday.
The milit
Groundtruth: [UNKNOWN]
Note: Entity not found in the knowledge base
ary statement said tha
t the cause of the accident was under investigatio
n and that no enemy fire involved.“A helicopter mishap in northeastern
Syria
resulted in the injuries of various degrees of 22 U.S.
service members,” US Central Command
said. “No enemy fire was reported.”
"The service members are receiving treatment for their injuries and 10 have been evacuated to higher care facilities," Centcom
added in a statement.
A spokesman for the U.S.
-backed Syria
n Kurdish
forces did not immediately respond to an Associated Press
request for comment.
{"id": 1, "title": "1", "text": "A helicopter accident in northeastern Syria over the weekend left 22 American service members injured, the U.S. military said Tuesday.\n\n\nThe military statement said that the cause of the accident was under investigation and that no enemy fire involved.\u201cA helicopter mishap in northeastern Syria resulted in the injuries of various degrees of 22 U.S. service members,\u201d US Central Command said. \u201cNo enemy fire was reported.\u201d\n\n\n\"The service members are receiving treatment for their injuries and 10 have been evacuated to higher care facilities,\" Centcom added in a statement.\n\n\nA spokesman for the U.S.-backed Syrian Kurdish forces did not immediately respond to an Associated Press request for comment.\n\n\nThere are at least 900 U.S. forces in Syria on average, along with an undisclosed number of contractors. U.S. special operations forces also move in and out of the country, but are usually in small teams and are not included in the official count.\n\n\nU.S. forces have been in Syria since 2015 to assist the Kurdish-led Syrian Forces in the fight against the militant Islamic State group. Since the extremist group was defeated in Syria in March 2019, U.S. troops have been trying to prevent any comeback by IS, which swept through Iraq and Syria in 2014, taking control of large swaths of territory.\n\n\nHowever, IS sleeper cells remain a threat. There are also about 10,000 IS fighters held in detention facilities in Syria and tens of thousands of their family members living in two refugee camps in the country's northeast.\n\n\nOver the past years, U.S. troops have been subjected to attacks carried out by IS members and Iran-backed fighters there. In late March, a drone attack on a U.S. base killed a contractor and wounded five American troops and another contractor. In retaliation, U.S. fighter jets struck several locations around the eastern province of Deir el-Zour, which borders Iraq.\n\n\nU.S. 
Defense Secretary Lloyd Austin said at the time that the strikes were a response to the drone attack as well as a series of recent attacks against U.S.-led coalition forces in Syria by groups affiliated with Iran\u2019s Revolutionary Guard.\n\n\nIn a related development, Syrian Kurdish-led authorities announced Saturday that hundreds of IS fighters held in prisons around the region will be put on trial after their home countries refused to repatriate them.", "evaluation_span": [0, 2357], "labels": [{"id": 0, "span": [25, 43], "entity_id": "Q858", "name": "Syria", "parent": null, "children": [3], "optional": false, "type": "Q27096213|Q43229"}, {"id": 1, "span": [276, 294], "entity_id": "Q858", "name": "Syria", "parent": null, "children": [16], "optional": false, "type": "Q27096213|Q43229"}, {"id": 2, "span": [43, 48], "entity_id": "Q858", "name": "Syria", "parent": 21, "children": [], "optional": false, "type": "Q27096213|Q43229"}, {"id": 3, "span": [30, 35], "entity_id": "Q858", "name": "Syria", "parent": 0, "children": [], "optional": false, "type": "Q27096213|Q43229"}, {"id": 4, "span": [608, 613], "entity_id": "Q858", "name": "Syria", "parent": null, "children": [], "optional": false, "type": "Q27096213|Q43229"}, {"id": 5, "span": [742, 747], "entity_id": "Q858", "name": "Syria", "parent": null, "children": [], "optional": false, "type": "Q27096213|Q43229"}, {"id": 6, "span": [979, 984], "entity_id": "Q858", "name": "Syria", "parent": null, "children": [], "optional": false, "type": "Q27096213|Q43229"}, {"id": 7, "span": [1022, 1027], "entity_id": "Q858", "name": "Syria", "parent": null, "children": [], "optional": false, "type": "Q27096213|Q43229"}, {"id": 8, "span": [107, 120], "entity_id": "Q11211", "name": "United States Armed Forces", "parent": null, "children": [], "optional": false, "type": "Q43229"}, {"id": 9, "span": [727, 738], "entity_id": "Q11211", "name": "United States Armed Forces", "parent": null, "children": [], "optional": false, "type": 
"Q43229"}, {"id": 10, "span": [954, 965], "entity_id": "Q11211", "name": "United States Armed Forces", "parent": null, "children": [], "optional": false, "type": "Q43229"}, {"id": 11, "span": [1154, 1165], "entity_id": "Q11211", "name": "United States Armed Forces", "parent": null, "children": [], "optional": false, "type": "Q43229"}, {"id": 12, "span": [1551, 1562], "entity_id": "Q11211", "name": "United States Armed Forces", "parent": null, "children": [], "optional": false, "type": "Q43229"}, {"id": 13, "span": [214, 218], "entity_id": "Q30", "name": "United States of America", "parent": null, "children": [], "optional": false, "type": "Q27096213|Q43229"}, {"id": 14, "span": [345, 349], "entity_id": "Q30", "name": "United States of America", "parent": null, "children": [], "optional": false, "type": "Q27096213|Q43229"}, {"id": 15, "span": [596, 600], "entity_id": "Q30", "name": "United States of America", "parent": null, "children": [], "optional": false, "type": "Q27096213|Q43229"}, {"id": 16, "span": [282, 286], "entity_id": "Q30", "name": "United States of America", "parent": 1, "children": [], "optional": false, "type": "Q27096213|Q43229"}, {"id": 17, "span": [809, 813], "entity_id": "Q30", "name": "United States of America", "parent": null, "children": [], "optional": false, "type": "Q27096213|Q43229"}, {"id": 18, "span": [164, 168], "entity_id": "Q30", "name": "United States of America", "parent": null, "children": [], "optional": false, "type": "Q27096213|Q43229"}, {"id": 19, "span": [368, 386], "entity_id": "Q1476046", "name": "United States Central Command", "parent": null, "children": [], "optional": false, "type": "Q43229"}, {"id": 20, "span": [544, 551], "entity_id": "Q1476046", "name": "United States Central Command", "parent": null, "children": [], "optional": false, "type": "Q43229"}, {"id": 21, "span": [38, 59], "entity_id": "Unknown", "name": "UnknownNoMapping", "parent": null, "children": [2], "optional": false, "type": "OTHER"}, {"id": 22, 
"span": [664, 680], "entity_id": "Q40469", "name": "Associated Press", "parent": null, "children": [], "optional": false, "type": "Q27096213|Q43229"}, {"id": 23, "span": [116, 146], "entity_id": "Unknown", "name": "UnknownNoMapping", "parent": null, "children": [], "optional": false, "type": "OTHER"}, {"id": 24, "span": [67, 92], "entity_id": "Unknown", "name": "UnknownNoMapping", "parent": null, "children": [], "optional": false, "type": "OTHER"}, {"id": 25, "span": [2169, 2199], "entity_id": "Unknown", "name": "UnknownNoMapping", "parent": null, "children": [], "optional": false, "type": "OTHER"}, {"id": 26, "span": [1070, 1089], "entity_id": "Q2429253", "name": "Islamic State", "parent": null, "children": [], "optional": false, "type": "Q27096213|Q43229"}, {"id": 27, "span": [1210, 1212], "entity_id": "Q2429253", "name": "Islamic State", "parent": null, "children": [], "optional": false, "type": "Q27096213|Q43229"}, {"id": 28, "span": [1314, 1316], "entity_id": "Q2429253", "name": "Islamic State", "parent": null, "children": [], "optional": false, "type": "Q27096213|Q43229"}, {"id": 29, "span": [1376, 1378], "entity_id": "Q2429253", "name": "Islamic State", "parent": null, "children": [], "optional": false, "type": "Q27096213|Q43229"}, {"id": 30, "span": [1609, 1611], "entity_id": "Q2429253", "name": "Islamic State", "parent": null, "children": [], "optional": false, "type": "Q27096213|Q43229"}, {"id": 31, "span": [2236, 2238], "entity_id": "Q2429253", "name": "Islamic State", "parent": null, "children": [], "optional": false, "type": "Q27096213|Q43229"}, {"id": 32, "span": [1234, 1238], "entity_id": "Q796", "name": "Iraq", "parent": null, "children": [], "optional": false, "type": "Q27096213|Q43229"}, {"id": 33, "span": [1892, 1896], "entity_id": "Q796", "name": "Iraq", "parent": null, "children": [], "optional": false, "type": "Q27096213|Q43229"}, {"id": 34, "span": [1624, 1628], "entity_id": "Q794", "name": "Iran", "parent": null, "children": [], "optional": 
false, "type": "Q27096213|Q43229"}, {"id": 35, "span": [1864, 1876], "entity_id": "Q239097", "name": "Deir ez-Zor", "parent": null, "children": [], "optional": false, "type": "Q27096213"}, {"id": 36, "span": [1923, 1935], "entity_id": "Q941013", "name": "Lloyd Austin", "parent": null, "children": [], "optional": false, "type": "Q215627"}, {"id": 37, "span": [2113, 2139], "entity_id": "Q271110", "name": "Islamic Revolutionary Guard Corps", "parent": null, "children": [], "optional": false, "type": "Q43229"}]}
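For what it's worth, a quick way to compare each label span against the article text is something like the following (a sketch; check_spans is my own helper, assuming only the text/labels/span/name fields of the .benchmark.jsonl schema shown above):

```python
import json

def check_spans(article: dict):
    """Return (label_id, name, surface) triples so span/text mismatches stand out."""
    text = article["text"]
    return [(lbl["id"], lbl["name"], text[lbl["span"][0]:lbl["span"][1]])
            for lbl in article["labels"]]

# Hypothetical usage on each line of a .benchmark.jsonl file:
# with open("my-benchmark.benchmark.jsonl") as f:
#     for line in f:
#         for lid, name, surface in check_spans(json.loads(line)):
#             print(lid, name, "->", repr(surface))
```

Note that some spans intentionally cover a parent mention (e.g. "northeastern Syria" for the label named "Syria"), so a surface string longer than the entity name is not necessarily an error.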
Hi, it's great work!
I would like to know if it is necessary to ensure that the ground truth is in the candidate set when evaluating Entity Linking, particularly for the REL method.
I look forward to your response.
I ran a command as part of an attempt to debug another issue. It used the experiment name "test".
Now I have an experiment showing up in the UI named "test" that I cannot get rid of.
I have searched within all files in the repo for the word "test" and have deleted all of the files with "test" in the name. "test" still shows up in the web app.
I have deleted all the experiment files that I could find and re-run the commands with different names. An experiment with the name "test" is still there, and the new one with the new name cannot be found.
I have also changed the metadata file to have the correct name.
How can I get rid of this stale experiment in the web app and successfully upload an experiment with the title I want it to have?
Hello!
I have a new benchmark based on a small dataset I've put together. It exists natively in JSON files. I ran a script of my own to convert it to NIF format. I spot-checked the NIF file and it looks correct: specifically, all the entities are linked to the correct texts.
I then ran:
python3 add_benchmark.py "agolo-e2e-eval" -bfile agolo_e2e_eval_dataset.nif -bformat nif
After checking the resulting file, it appears that the entities are paired with the wrong texts. For instance, this is doc16 in the NIF file:
<http://example.org/doc16> a nif:Context,
nif:OffsetBasedString ;
nif:beginIndex "0"^^xsd:nonNegativeInteger ;
nif:endIndex "447"^^xsd:nonNegativeInteger ;
nif:isString """Honduras President Xiomara Castro
...and here is the entry for the entity "Honduras":
<http://example.org/doc16#offset_0_447_14> a nif:OffsetBasedString,
nif:Phrase ;
nif:anchorOf "Honduras " ;
nif:beginIndex "0"^^xsd:nonNegativeInteger ;
nif:endIndex "9"^^xsd:nonNegativeInteger ;
nif:referenceContext <http://example.org/doc16> ;
itsrdf:taIdentRef <Q30081060> .
However, in the resulting file agolo-e2e-eval.benchmark.jsonl I see the entity:
"evaluation_span": [0, 2357], "labels": [{"id": 0, "span": [0, 8], "entity_id": "Q783", "name": "Honduras", "parent": null, "children": [1], "optional": false, "type": "Q27096213|Q43229"},
in the same record (on the same line) as this text:
{"id": 21, "title": "http://example.org/doc7", "text": "A helicopter accident in...
So somehow the conversion script appears to be mixing up the texts and the entities that go with them. This seems to be consistent throughout the resulting file.
I have attached the input and output files.
Any idea what could be going wrong?
Thanks!
Hi all!
Thanks for building such a convenient platform. I am wondering whether you have plans to add more EL benchmarks, such as Der, OKE15 and OKE16. Adding these would definitely make your platform more popular in the EL community and help retire the old GERBIL benchmark...
Hello!
On the same new benchmark dataset that I attached to another issue, I tried running an experiment with REL.
I realize, as outlined in the previous issue, that this dataset for some reason didn't convert properly, but this seems to be a different problem.
I ran: python3 link_benchmark_entities.py Rel -l rel -b agolo-e2e-eval
This is what happens:
python3 link_benchmark_entities.py Rel -l rel -b agolo-e2e-eval
2023-11-06 13:29:05 [INFO]: Loading config file configs/rel.config.json for linker rel.
2023-11-06 13:29:05 [INFO]: Initializing linker rel with config parameters {'linker_name': 'REL', 'wiki_version': 'wiki_2014', 'ner_model': 'ner-fast', 'use_api': False, 'api_url': 'https://rel.cs.ru.nl/api', 'experiment_description': 'Using the Wiki 2014 version and Flair for NER.'} ...
2023-11-06 13:29:09 [INFO]: Loading Wikipedia to Wikidata database from ./data/wikidata_mappings/wikipedia_name_to_qid.db ...
2023-11-06 13:29:11 [INFO]: -> 9279408 Wikipedia-Wikidata mappings loaded.
2023-11-06 13:29:11 [INFO]: Loading redirects database from ./data/wikipedia_mappings/redirects.db ...
2023-11-06 13:29:13 [INFO]: -> 10914101 redirects loaded.
2023-11-06 13:29:13 [INFO]: Creating directory ./data/linker_files/rel/
2023-11-06 13:29:13 [INFO]: Downloading and extracting http://gem.cs.ru.nl/generic.tar.gz
2023-11-06 13:35:51 [INFO]: Saved file at ./data/linker_files/rel/generic
2023-11-06 13:35:51 [INFO]: Downloading and extracting http://gem.cs.ru.nl/wiki_2014.tar.gz
2023-11-06 13:57:41 [INFO]: Saved file at ./data/linker_files/rel/wiki_2014
2023-11-06 13:57:42,285 https://nlp.informatik.hu-berlin.de/resources/models/ner-fast/en-ner-fast-conll03-v0.4.pt not found in cache, downloading to /var/folders/_s/ph0bp1tn74s52yx13q0c_tq40000gn/T/tmp184af80d
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 256774339/256774339 [00:28<00:00, 9069899.45B/s]
2023-11-06 13:58:11,249 copying /var/folders/_s/ph0bp1tn74s52yx13q0c_tq40000gn/T/tmp184af80d to cache at /Users/alan/.flair/models/en-ner-fast-conll03-v0.4.pt
2023-11-06 13:58:11,839 removing temp file /var/folders/_s/ph0bp1tn74s52yx13q0c_tq40000gn/T/tmp184af80d
2023-11-06 13:58:12,045 loading file /Users/alan/.flair/models/en-ner-fast-conll03-v0.4.pt
Traceback (most recent call last):
File "/Users/alan/repos/agolo/elevant/link_benchmark_entities.py", line 164, in <module>
main(cmdl_args)
File "/Users/alan/repos/agolo/elevant/link_benchmark_entities.py", line 40, in main
linking_system = LinkingSystem(args.linker_name,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/alan/repos/agolo/elevant/src/linkers/linking_system.py", line 43, in __init__
self._initialize_linker(linker_name, prediction_file, prediction_format)
File "/Users/alan/repos/agolo/elevant/src/linkers/linking_system.py", line 174, in _initialize_linker
self.linker = RelLinker(self.entity_db, self.linker_config)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/alan/repos/agolo/elevant/src/linkers/rel_linker.py", line 80, in __init__
self.ner_tagger = load_flair_ner(ner_model)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/alan/repos/agolo/elevant/venv/lib/python3.11/site-packages/REL/ner/flair_wrapper.py", line 12, in load_flair_ner
return SequenceTagger.load(fetch_model(path_or_url, cache_root))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/alan/repos/agolo/elevant/venv/lib/python3.11/site-packages/REL/utils.py", line 18, in fetch_model
return get_from_cache(path_or_url, cache_dir)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/alan/repos/agolo/elevant/venv/lib/python3.11/site-packages/flair/file_utils.py", line 215, in get_from_cache
response = requests.head(url, headers={"User-Agent": "Flair"}, allow_redirects=True)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/alan/repos/agolo/elevant/venv/lib/python3.11/site-packages/requests/api.py", line 100, in head
return request("head", url, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/alan/repos/agolo/elevant/venv/lib/python3.11/site-packages/requests/api.py", line 59, in request
return session.request(method=method, url=url, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/alan/repos/agolo/elevant/venv/lib/python3.11/site-packages/requests/sessions.py", line 575, in request
prep = self.prepare_request(req)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/alan/repos/agolo/elevant/venv/lib/python3.11/site-packages/requests/sessions.py", line 486, in prepare_request
p.prepare(
File "/Users/alan/repos/agolo/elevant/venv/lib/python3.11/site-packages/requests/models.py", line 368, in prepare
self.prepare_url(url, params)
File "/Users/alan/repos/agolo/elevant/venv/lib/python3.11/site-packages/requests/models.py", line 439, in prepare_url
raise MissingSchema(
requests.exceptions.MissingSchema: Invalid URL 'ner-fast': No scheme supplied. Perhaps you meant https://ner-fast?
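In case it helps with triage: the traceback shows REL's fetch_model passing the bare model name 'ner-fast' to flair's get_from_cache, which expects a URL. One thing I would try (a guess, not verified) is putting the full model URL, which appears in the download log above, into configs/rel.config.json:

```json
{
    "linker_name": "REL",
    "wiki_version": "wiki_2014",
    "ner_model": "https://nlp.informatik.hu-berlin.de/resources/models/ner-fast/en-ner-fast-conll03-v0.4.pt",
    "use_api": false,
    "api_url": "https://rel.cs.ru.nl/api",
    "experiment_description": "Using the Wiki 2014 version and Flair for NER."
}
```

With a real URL, get_from_cache should either download the file or reuse the copy already in the flair cache.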
Hello again!
I keep seeing errors here and there, but they all look like they might be easy fixes. I hope so: I was really hoping to have some experiments run and showing up in the UI in the next few days. I really appreciate any help you can offer.
In a previous issue I reported an error running an experiment with REL.
I tried doing the same with ReFinED, and the linking finished with no errors. So then I ran the evaluation script and got the following. The name of that file (./data/wikidata_mappings/alias_to_qids.db.back) looks suspicious. Could it be a simple error in the name of the file it is looking for? I have checked, and I DO have ./data/wikidata_mappings/alias_to_qids.db.
python3 evaluate_linking_results.py evaluation-results/refined/refined.agolo-e2e-eval.linked_articles.jsonl
2023-11-06 14:53:49 [INFO]: Evaluating linking results from ['evaluation-results/refined/refined.agolo-e2e-eval.linked_articles.jsonl'] ...
2023-11-06 14:53:49 [INFO]: Loading whitelist types from small-data-files/whitelist_types.tsv ...
2023-11-06 14:53:49 [INFO]: Loading whitelist type adjustments from small-data-files/type_adjustments.txt ...
2023-11-06 14:53:49 [INFO]: -> Whitelist type adjustments loaded.
2023-11-06 14:53:49 [INFO]: -> 29 whitelist types loaded.
2023-11-06 14:53:49 [INFO]: Initializing entity database for evaluation ...
2023-11-06 14:53:49 [INFO]: Loading entity ID to name database from ./data/wikidata_mappings/qid_to_label.db ...
2023-11-06 14:55:06 [INFO]: -> 86599035 entity ID to name mappings loaded.
2023-11-06 14:55:06 [INFO]: Loading entity ID to whitelist types database from ./data/wikidata_mappings/qid_to_whitelist_types.db ...
2023-11-06 14:56:14 [INFO]: -> 77316268 entity ID to whitelist types mappings loaded.
2023-11-06 14:56:14 [INFO]: Loading whitelist type adjustments from small-data-files/type_adjustments.txt ...
2023-11-06 14:56:14 [INFO]: -> Whitelist type adjustments loaded.
2023-11-06 14:56:14 [INFO]: Loading Wikipedia to Wikidata database from ./data/wikidata_mappings/wikipedia_name_to_qid.db ...
2023-11-06 14:56:17 [INFO]: -> 9279408 Wikipedia-Wikidata mappings loaded.
2023-11-06 14:56:17 [INFO]: Loading redirects database from ./data/wikipedia_mappings/redirects.db ...
2023-11-06 14:56:21 [INFO]: -> 10914101 redirects loaded.
2023-11-06 14:56:21 [INFO]: Loading demonyms from ./data/wikidata_mappings/qid_to_demonym.tsv ...
2023-11-06 14:56:21 [INFO]: -> 1646 demonyms loaded
2023-11-06 14:56:21 [INFO]: Loading real numbers from ./data/wikidata_mappings/quantity.tsv ...
2023-11-06 14:56:21 [INFO]: -> 136867 real numbers loaded.
2023-11-06 14:56:21 [INFO]: Loading points in time from ./data/wikidata_mappings/datetime.tsv ...
2023-11-06 14:56:21 [INFO]: -> 194619 points in time loaded.
2023-11-06 14:56:21 [INFO]: Loading family name aliases into entity database ...
2023-11-06 14:56:21 [INFO]: Yielding given name mapping from ./data/wikidata_mappings/qid_to_name.tsv ...
2023-11-06 14:56:31 [INFO]: -> 1573195 family name aliases loaded into entity database.
2023-11-06 14:56:31 [INFO]: Loading alias to entity IDs database from ./data/wikidata_mappings/alias_to_qids.db.back ...
Traceback (most recent call last):
File "/Users/alan/repos/agolo/elevant/evaluate_linking_results.py", line 178, in <module>
main(cmdl_args)
File "/Users/alan/repos/agolo/elevant/evaluate_linking_results.py", line 43, in main
evaluator = Evaluator(type_mapping_file, whitelist_file=whitelist_file, contains_unknowns=not args.no_unknowns,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/alan/repos/agolo/elevant/src/evaluation/evaluator.py", line 81, in __init__
self.entity_db = load_evaluation_entities(type_mapping_file, custom_kb)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/alan/repos/agolo/elevant/src/evaluation/evaluator.py", line 30, in load_evaluation_entities
entity_db.load_alias_to_entities()
File "/Users/alan/repos/agolo/elevant/src/models/entity_database.py", line 204, in load_alias_to_entities
self.alias_to_entities_db = EntityDatabaseReader.get_alias_to_entities_db()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/alan/repos/agolo/elevant/src/helpers/entity_database_reader.py", line 278, in get_alias_to_entities_db
aliases_db = EntityDatabaseReader.read_from_dbm(filename, value_type=set)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/alan/repos/agolo/elevant/src/helpers/entity_database_reader.py", line 225, in read_from_dbm
dbm_db = dbm.open(db_file, "r")
^^^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/Cellar/[email protected]/3.11.6/Frameworks/Python.framework/Versions/3.11/lib/python3.11/dbm/__init__.py", line 85, in open
raise error[0]("db file doesn't exist; "
dbm.error: db file doesn't exist; use 'c' or 'n' flag to create a new db
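The traceback bottoms out in dbm.open(db_file, "r"), which raises exactly this error whenever the file does not exist, so the .back suffix in the path is almost certainly the problem. A small local check, plus a stopgap symlink workaround (my assumption, only sensible if the .back name is expected to point at the same data as the existing .db file):

```python
import dbm
import os

db_path = "./data/wikidata_mappings/alias_to_qids.db"
back_path = db_path + ".back"  # the name the evaluation code tried to open

# dbm.open(..., "r") raises dbm.error when the file is missing,
# reproducing the traceback above:
try:
    dbm.open(back_path, "r")
except dbm.error as e:
    print(e)  # db file doesn't exist; use 'c' or 'n' flag to create a new db

# Stopgap: point the expected name at the file that actually exists,
# until the filename in the code (or the data download) is corrected.
if os.path.exists(db_path) and not os.path.exists(back_path):
    os.symlink(os.path.abspath(db_path), back_path)
```

Whether the bug is in EntityDatabaseReader.get_alias_to_entities_db() or in the published data archive, only the maintainers can say; the symlink just unblocks a local run.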
I have gotten this error a few times in a row while trying to install without Docker:
make download_all
wget https://ad-research.cs.uni-freiburg.de/data/entity-linking/wikidata_mappings.tar.gz
--2023-10-30 13:20:22-- https://ad-research.cs.uni-freiburg.de/data/entity-linking/wikidata_mappings.tar.gz
Resolving ad-research.cs.uni-freiburg.de (ad-research.cs.uni-freiburg.de)... 132.230.150.101
Connecting to ad-research.cs.uni-freiburg.de (ad-research.cs.uni-freiburg.de)|132.230.150.101|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 10473195010 (9.8G) [application/x-gzip]
Saving to: ‘wikidata_mappings.tar.gz.2’
wikidata_mappings.tar.gz.2 100%[==============================================================================================>] 9.75G 16.7MB/s in 10m 2s
2023-10-30 13:30:25 (16.6 MB/s) - ‘wikidata_mappings.tar.gz.2’ saved [10473195010/10473195010]
tar -xvzf wikidata_mappings.tar.gz -C ./data/wikidata_mappings/
x alias_to_qids.db: truncated gzip input
tar: Error exit delayed from previous errors.
make: *** [download_wikidata_mappings] Error 1
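The "truncated gzip input" from tar, together with wget saving to wikidata_mappings.tar.gz.2 (wget appends .1, .2, ... rather than overwriting), suggests an earlier interrupted download left a corrupt copy behind and tar is extracting the wrong file. A sketch of how to verify the archive before extracting, under the assumption that the original partial download is the culprit:

```shell
ARCHIVE=wikidata_mappings.tar.gz

# gzip -t runs the archive's CRC check without extracting anything;
# a truncated or interrupted download fails this check.
if gzip -t "$ARCHIVE" 2>/dev/null; then
    echo "archive OK"
else
    echo "archive corrupt or missing"
    # Remove stale wget duplicates (.1, .2, ...) and resume with -c, e.g.:
    #   rm -f "$ARCHIVE" "$ARCHIVE".*
    #   wget -c https://ad-research.cs.uni-freiburg.de/data/entity-linking/wikidata_mappings.tar.gz
fi
```

Using wget -c resumes a partial file instead of starting a new numbered copy, so a dropped connection at ~10 GB doesn't silently leave a truncated archive for tar to choke on.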