
elevant's People

Contributors

amundfr, flackbash, hannahbast, hertelm


Forkers

jadsitt

elevant's Issues

Accidental experiment title cannot be expunged

I ran a command as part of an attempt to debug another issue. It used the experiment name "test".

Now I have an experiment showing up in the UI named "test" that I cannot get rid of.

I have searched within all files in the repo for the word "test" and have deleted all of the files with "test" in the name. "test" still shows up in the web app.

I have deleted all the experiment files that I could find and re-run the commands with different names. An experiment with the name "test" is still there, and the new one with the new name cannot be found.

I have also changed the metadata file to have the correct name.

How can I remove this stale experiment from the web app and successfully upload an experiment with the title I want it to have?
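Until the maintainers weigh in, a brute-force way to hunt down leftover artifacts is to search the checkout for anything still named after, or containing, the experiment name. This is a hedged sketch: the directory layout (e.g. an evaluation-results tree) is an assumption based on paths seen in other issues here, not documented ELEVANT behavior.

```shell
# Hedged sketch: find every file that is named after, or still contains,
# the experiment name. Layout is assumed, not documented ELEVANT behavior.
NAME="test"
find . -type f -name "*${NAME}*" 2>/dev/null
grep -rl "\"${NAME}\"" . --include="*.jsonl" 2>/dev/null || true
# If the experiment still shows up afterwards, force-reload the web app
# (Shift+Reload), since the experiment list may be cached by the browser.
```

If both commands come back empty and the entry still appears, the web app is likely serving it from some cache or index rather than from the files on disk.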

Error while running evaluate_linking_results.py on new results file

Hello again!

I keep seeing errors here and there, but they all look superficially like they might be easy fixes. I hope so, as I was really hoping to have some experiments run and showing up in the UI in the next few days. I really appreciate any help you can offer.

In a previous issue I reported an error running an experiment with REL.

I tried doing the same with ReFinED, and the linking finished with no errors. I then ran the evaluation script and got the output below. The name of the file it is looking for (./data/wikidata_mappings/alias_to_qids.db.back) looks suspicious; could it be a simple error in the file name? I have checked, and I DO have ./data/wikidata_mappings/alias_to_qids.db.

python3 evaluate_linking_results.py evaluation-results/refined/refined.agolo-e2e-eval.linked_articles.jsonl
2023-11-06 14:53:49 [INFO]: Evaluating linking results from ['evaluation-results/refined/refined.agolo-e2e-eval.linked_articles.jsonl'] ...
2023-11-06 14:53:49 [INFO]: Loading whitelist types from small-data-files/whitelist_types.tsv ...
2023-11-06 14:53:49 [INFO]: Loading whitelist type adjustments from small-data-files/type_adjustments.txt ...
2023-11-06 14:53:49 [INFO]: -> Whitelist type adjustments loaded.
2023-11-06 14:53:49 [INFO]: -> 29 whitelist types loaded.
2023-11-06 14:53:49 [INFO]: Initializing entity database for evaluation ...
2023-11-06 14:53:49 [INFO]: Loading entity ID to name database from ./data/wikidata_mappings/qid_to_label.db ...
2023-11-06 14:55:06 [INFO]: -> 86599035 entity ID to name mappings loaded.
2023-11-06 14:55:06 [INFO]: Loading entity ID to whitelist types database from ./data/wikidata_mappings/qid_to_whitelist_types.db ...
2023-11-06 14:56:14 [INFO]: -> 77316268 entity ID to whitelist types mappings loaded.
2023-11-06 14:56:14 [INFO]: Loading whitelist type adjustments from small-data-files/type_adjustments.txt ...
2023-11-06 14:56:14 [INFO]: -> Whitelist type adjustments loaded.
2023-11-06 14:56:14 [INFO]: Loading Wikipedia to Wikidata database from ./data/wikidata_mappings/wikipedia_name_to_qid.db ...
2023-11-06 14:56:17 [INFO]: -> 9279408 Wikipedia-Wikidata mappings loaded.
2023-11-06 14:56:17 [INFO]: Loading redirects database from ./data/wikipedia_mappings/redirects.db ...
2023-11-06 14:56:21 [INFO]: -> 10914101 redirects loaded.
2023-11-06 14:56:21 [INFO]: Loading demonyms from ./data/wikidata_mappings/qid_to_demonym.tsv ...
2023-11-06 14:56:21 [INFO]: -> 1646 demonyms loaded
2023-11-06 14:56:21 [INFO]: Loading real numbers from ./data/wikidata_mappings/quantity.tsv ...
2023-11-06 14:56:21 [INFO]: -> 136867 real numbers loaded.
2023-11-06 14:56:21 [INFO]: Loading points in time from ./data/wikidata_mappings/datetime.tsv ...
2023-11-06 14:56:21 [INFO]: -> 194619 points in time loaded.
2023-11-06 14:56:21 [INFO]: Loading family name aliases into entity database ...
2023-11-06 14:56:21 [INFO]: Yielding given name mapping from ./data/wikidata_mappings/qid_to_name.tsv ...
2023-11-06 14:56:31 [INFO]: -> 1573195 family name aliases loaded into entity database.
2023-11-06 14:56:31 [INFO]: Loading alias to entity IDs database from ./data/wikidata_mappings/alias_to_qids.db.back ...
Traceback (most recent call last):
  File "/Users/alan/repos/agolo/elevant/evaluate_linking_results.py", line 178, in <module>
    main(cmdl_args)
  File "/Users/alan/repos/agolo/elevant/evaluate_linking_results.py", line 43, in main
    evaluator = Evaluator(type_mapping_file, whitelist_file=whitelist_file, contains_unknowns=not args.no_unknowns,
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alan/repos/agolo/elevant/src/evaluation/evaluator.py", line 81, in __init__
    self.entity_db = load_evaluation_entities(type_mapping_file, custom_kb)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alan/repos/agolo/elevant/src/evaluation/evaluator.py", line 30, in load_evaluation_entities
    entity_db.load_alias_to_entities()
  File "/Users/alan/repos/agolo/elevant/src/models/entity_database.py", line 204, in load_alias_to_entities
    self.alias_to_entities_db = EntityDatabaseReader.get_alias_to_entities_db()
                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alan/repos/agolo/elevant/src/helpers/entity_database_reader.py", line 278, in get_alias_to_entities_db
    aliases_db = EntityDatabaseReader.read_from_dbm(filename, value_type=set)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alan/repos/agolo/elevant/src/helpers/entity_database_reader.py", line 225, in read_from_dbm
    dbm_db = dbm.open(db_file, "r")
             ^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/[email protected]/3.11.6/Frameworks/Python.framework/Versions/3.11/lib/python3.11/dbm/__init__.py", line 85, in open
    raise error[0]("db file doesn't exist; "
dbm.error: db file doesn't exist; use 'c' or 'n' flag to create a new db
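The last frame explains the message: `dbm.open(path, "r")` raises `dbm.error` whenever the file does not exist, since only the "c"/"n" flags create a new database. A minimal stdlib reproduction, plus a guess (not confirmed ELEVANT behavior) that the reader is hard-coded to the ".db.back" name, in which case copying or symlinking the existing alias_to_qids.db to alias_to_qids.db.back might get past this step:

```python
import dbm
import os
import tempfile

# Reproduce the failure mode: opening a nonexistent dbm file read-only
# raises dbm.error, exactly as in entity_database_reader.py line 225.
missing = os.path.join(tempfile.mkdtemp(), "alias_to_qids.db.back")
reproduced = False
try:
    dbm.open(missing, "r")
except dbm.error:
    reproduced = True
print("reproduced missing-db error:", reproduced)
```

So the question for the maintainers is whether the ".db.back" suffix in get_alias_to_entities_db() is intentional or a leftover from a rename.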

Any plans to add popular EL benchmarks?

Hi all!
Thanks for building such a convenient platform. I am wondering if you have plans to add more EL benchmarks like Der, OKE15 and OKE16. Adding these would definitely make your platform better known in the EL community and help move on from the old GERBIL benchmarks...

Spacy model version issues?

Hi!

I am not sure, but I think there might be model version issues with Spacy.

First, the easy part: the data directory variables in the setup are wrong. I fixed those manually.

Then I get the error below. This is not using Docker; I have tried everything I can think of to get Docker running on my Mac with zero luck, so it is currently not an option. Some searches suggest that similar problems have arisen from mismatched spaCy versions, but I have tried loading different model versions and such without any luck so far. These are the versions I have (output of pip freeze):

spacy==3.4.4
spacy-conll==3.3.0
spacy-legacy==3.0.12
spacy-loggers==1.0.5

The error:

python3 link_benchmark_entities.py Spacy -l spacy -b agolo-110823
2023-11-17 11:40:32 [INFO]: Loading config file configs/spacy.config.json for linker spacy.
2023-11-17 11:40:32 [INFO]: Initializing linker spacy with config parameters {'linker_name': 'Spacy', 'model_name': 'wikipedia', 'kb': 'wikipedia', 'experiment_description': 'Using a knowledge base and model derived from Wikipedia.'} ...
2023-11-17 11:40:35 [INFO]: Loading linker model...
Traceback (most recent call last):
  File "/Users/alan/repos/agolo/elevant/link_benchmark_entities.py", line 164, in <module>
    main(cmdl_args)
  File "/Users/alan/repos/agolo/elevant/link_benchmark_entities.py", line 40, in main
    linking_system = LinkingSystem(args.linker_name,
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alan/repos/agolo/elevant/src/linkers/linking_system.py", line 43, in __init__
    self._initialize_linker(linker_name, prediction_file, prediction_format)
  File "/Users/alan/repos/agolo/elevant/src/linkers/linking_system.py", line 97, in _initialize_linker
    self.linker = SpacyLinker(self.linker_config)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alan/repos/agolo/elevant/src/linkers/spacy_linker.py", line 23, in __init__
    self.model = EntityLinkerLoader.load_trained_linker(model_name, kb_name=kb_name)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alan/repos/agolo/elevant/src/helpers/entity_linker_loader.py", line 28, in load_trained_linker
    model.from_bytes(model_bytes)
  File "/opt/homebrew/lib/python3.11/site-packages/spacy/language.py", line 2202, in from_bytes
    util.from_bytes(bytes_data, deserializers, exclude)
  File "/opt/homebrew/lib/python3.11/site-packages/spacy/util.py", line 1302, in from_bytes
    return from_dict(srsly.msgpack_loads(bytes_data), setters, exclude)  # type: ignore[return-value]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/spacy/util.py", line 1324, in from_dict
    setter(msg[key])
  File "/opt/homebrew/lib/python3.11/site-packages/spacy/language.py", line 2191, in <lambda>
    deserializers["tokenizer"] = lambda b: self.tokenizer.from_bytes(  # type: ignore[union-attr]
                                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "spacy/tokenizer.pyx", line 838, in spacy.tokenizer.Tokenizer.from_bytes
  File "spacy/tokenizer.pyx", line 127, in spacy.tokenizer.Tokenizer.rules.__set__
  File "spacy/tokenizer.pyx", line 574, in spacy.tokenizer.Tokenizer._load_special_cases
  File "spacy/tokenizer.pyx", line 604, in spacy.tokenizer.Tokenizer.add_special_case
  File "spacy/tokenizer.pyx", line 592, in spacy.tokenizer.Tokenizer._validate_special_case
ValueError: [E1005] Unable to set attribute 'POS' in tokenizer exception for '  '. Tokenizer exceptions are only allowed to specify ORTH and NORM.

GENRE uses a deprecated NumPy attribute

Hello,

I got GENRE set up from your repo, but when I try to run it, I get this:

root@dec09fef21fa:/GENRE# python3 main.py --yago -i agolo-110823.benchmark.jsonl  -o out.jsonl --split_iter --mention_trie data/mention_trie.pkl  --mention_to_candidates_dict data/mention_to_candidates_dict.pkl
Traceback (most recent call last):
  File "main.py", line 3, in <module>
    from model import Model
  File "/GENRE/model.py", line 6, in <module>
    from genre.fairseq_model import GENRE
  File "/GENRE/genre/fairseq_model.py", line 14, in <module>
    from fairseq import search, utils
  File "/GENRE/fairseq/fairseq/utils.py", line 20, in <module>
    from fairseq.modules.multihead_attention import MultiheadAttention
  File "/GENRE/fairseq/fairseq/modules/__init__.py", line 10, in <module>
    from .character_token_embedder import CharacterTokenEmbedder
  File "/GENRE/fairseq/fairseq/modules/character_token_embedder.py", line 11, in <module>
    from fairseq.data import Dictionary
  File "/GENRE/fairseq/fairseq/data/__init__.py", line 23, in <module>
    from .indexed_dataset import (
  File "/GENRE/fairseq/fairseq/data/indexed_dataset.py", line 112, in <module>
    6: np.float,
  File "/usr/local/lib/python3.8/site-packages/numpy/__init__.py", line 305, in __getattr__
    raise AttributeError(__former_attrs__[attr])
AttributeError: module 'numpy' has no attribute 'float'.
`np.float` was a deprecated alias for the builtin `float`. To avoid this error in existing code, use `float` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.float64` here.
The aliases was originally deprecated in NumPy 1.20; for more details and guidance see the original release note at:
    https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
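The vendored fairseq copy uses `np.float`, which NumPy removed in 1.24. Rather than downgrading NumPy, one option is to patch the offending line in place. A hedged sketch; the path comes from the traceback above, and `\b` is GNU sed syntax (the word boundary leaves `np.float32`/`np.float64` untouched):

```shell
# Hedged sketch: replace the removed np.float alias with the builtin float
# in the vendored fairseq file named in the traceback above.
FILE=fairseq/fairseq/data/indexed_dataset.py
if [ -f "$FILE" ]; then
    sed -i.bak 's/np\.float\b/float/g' "$FILE"
else
    echo "file not found: $FILE"
fi
```

There may be other files in the fairseq tree with the same alias, so running the search repo-wide could be necessary.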

Benchmark conversion from NIF format producing incorrect results

Hello!

I have a new benchmark based on a small dataset I've put together. It exists in json files natively. I ran a script of my own to convert it to NIF format. I spot checked the NIF file and it looks correct. Specifically, all the entities are linked to the correct texts.

I then ran:

python3 add_benchmark.py "agolo-e2e-eval" -bfile agolo_e2e_eval_dataset.nif -bformat nif

After checking the resulting file, it appears that the entities are appearing with the wrong texts. For instance, this is doc16 in the NIF file:

<http://example.org/doc16> a nif:Context,
        nif:OffsetBasedString ;
    nif:beginIndex "0"^^xsd:nonNegativeInteger ;
    nif:endIndex "447"^^xsd:nonNegativeInteger ;
    nif:isString """Honduras President Xiomara Castro

...and here is the entry for the entity "Honduras":

<http://example.org/doc16#offset_0_447_14> a nif:OffsetBasedString,
        nif:Phrase ;
    nif:anchorOf "Honduras " ;
    nif:beginIndex "0"^^xsd:nonNegativeInteger ;
    nif:endIndex "9"^^xsd:nonNegativeInteger ;
    nif:referenceContext <http://example.org/doc16> ;
    itsrdf:taIdentRef <Q30081060> .

However, in the resulting file agolo-e2e-eval.benchmark.jsonl I see the entity:

"evaluation_span": [0, 2357], "labels": [{"id": 0, "span": [0, 8], "entity_id": "Q783", "name": "Honduras", "parent": null, "children": [1], "optional": false, "type": "Q27096213|Q43229"},

in the same record (on the same line) as this text:

{"id": 21, "title": "http://example.org/doc7", "text": "A helicopter accident in...

So somehow it appears the conversion script is mixing up the texts and the entities that go with them. This seems to be consistent across the resulting file.

I have attached the input and output files.

Any idea what could be going wrong?
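In the meantime, a quick sanity check for the converted file is to slice each record's text at each label span and compare it with the entity name. This is a hedged sketch: field names ("text", "labels", "span", "name") are taken from the benchmark.jsonl excerpt in this issue, and aliases can legitimately differ from the canonical name, so hits are hints to inspect, not proof of corruption.

```python
def suspicious_labels(record):
    """Return (label_id, span_text, name) where the span text and the
    entity name share no overlap at all."""
    text = record["text"]
    hits = []
    for label in record["labels"]:
        start, end = label["span"]
        surface = text[start:end].strip()
        name = label["name"]
        if surface != name and surface not in name and name not in surface:
            hits.append((label["id"], surface, name))
    return hits

# The mixed-up case from this issue: span [0, 8] labeled "Honduras"
# against a text that actually starts with "A helicopter accident ...".
rec = {"text": "A helicopter accident in northeastern Syria over the weekend",
       "labels": [{"id": 0, "span": [0, 8], "name": "Honduras"}]}
print(suspicious_labels(rec))
```

Run over every line of the output jsonl, this would show whether the text/label mix-up is systematic (e.g. every record offset by a constant number of documents) or random.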

Thanks!

Experiment working, but some parsing errors in the document display in GUI

Hello!

Happy to say I managed to run a full eval and view it in the GUI. Overall things are looking great.

There's just one thing, which might be an issue with my data but nevertheless might be worth looking into.

Most documents display properly, but a few have things like what is shown below (text and image).

Below that I have included the corresponding line from the benchmark.jsonl file.

Any idea what is causing this? Thanks!

(Screenshot 2023-11-08 at 10 27 47 AM: document "1" as rendered in the GUI; the leaked markup is reproduced as text below.)
A helicopter accident in northeastern 

class="annotation gt unknown lowlight beginning annotation_id_0_1_2">Syria over the weeken
Groundtruth: [UNKNOWN]
Note: Entity not found in the knowledge base
d left 22 

American

 service member

s injured, the U.S.

 mili

n class="annotation gt unknown lowlight beginning annotation_id_0_1_7">tary said Tuesday.

The milit
Groundtruth: [UNKNOWN]
Note: Entity not found in the knowledge base
ary statement said tha

t the cause of the accident was under investigatio

n and that no enemy fire involved.“A helicopter mishap in northeastern 

Syria

 resulted in the injuries of various degrees of 22 U.S.

 service members,” US Central Command

 said. “No enemy fire was reported.”

"The service members are receiving treatment for their injuries and 10 have been evacuated to higher care facilities," Centcom

 added in a statement.

A spokesman for the U.S.

-backed Syria

n Kurdish

 forces did not immediately respond to an Associated Press

 request for comment.
{"id": 1, "title": "1", "text": "A helicopter accident in northeastern Syria over the weekend left 22 American service members injured, the U.S. military said Tuesday.\n\n\nThe military statement said that the cause of the accident was under investigation and that no enemy fire involved.\u201cA helicopter mishap in northeastern Syria resulted in the injuries of various degrees of 22 U.S. service members,\u201d US Central Command said. \u201cNo enemy fire was reported.\u201d\n\n\n\"The service members are receiving treatment for their injuries and 10 have been evacuated to higher care facilities,\" Centcom added in a statement.\n\n\nA spokesman for the U.S.-backed Syrian Kurdish forces did not immediately respond to an Associated Press request for comment.\n\n\nThere are at least 900 U.S. forces in Syria on average, along with an undisclosed number of contractors. U.S. special operations forces also move in and out of the country, but are usually in small teams and are not included in the official count.\n\n\nU.S. forces have been in Syria since 2015 to assist the Kurdish-led Syrian Forces in the fight against the militant Islamic State group. Since the extremist group was defeated in Syria in March 2019, U.S. troops have been trying to prevent any comeback by IS, which swept through Iraq and Syria in 2014, taking control of large swaths of territory.\n\n\nHowever, IS sleeper cells remain a threat. There are also about 10,000 IS fighters held in detention facilities in Syria and tens of thousands of their family members living in two refugee camps in the country's northeast.\n\n\nOver the past years, U.S. troops have been subjected to attacks carried out by IS members and Iran-backed fighters there. In late March, a drone attack on a U.S. base killed a contractor and wounded five American troops and another contractor. In retaliation, U.S. fighter jets struck several locations around the eastern province of Deir el-Zour, which borders Iraq.\n\n\nU.S. 
Defense Secretary Lloyd Austin said at the time that the strikes were a response to the drone attack as well as a series of recent attacks against U.S.-led coalition forces in Syria by groups affiliated with Iran\u2019s Revolutionary Guard.\n\n\nIn a related development, Syrian Kurdish-led authorities announced Saturday that hundreds of IS fighters held in prisons around the region will be put on trial after their home countries refused to repatriate them.", "evaluation_span": [0, 2357], "labels": [{"id": 0, "span": [25, 43], "entity_id": "Q858", "name": "Syria", "parent": null, "children": [3], "optional": false, "type": "Q27096213|Q43229"}, {"id": 1, "span": [276, 294], "entity_id": "Q858", "name": "Syria", "parent": null, "children": [16], "optional": false, "type": "Q27096213|Q43229"}, {"id": 2, "span": [43, 48], "entity_id": "Q858", "name": "Syria", "parent": 21, "children": [], "optional": false, "type": "Q27096213|Q43229"}, {"id": 3, "span": [30, 35], "entity_id": "Q858", "name": "Syria", "parent": 0, "children": [], "optional": false, "type": "Q27096213|Q43229"}, {"id": 4, "span": [608, 613], "entity_id": "Q858", "name": "Syria", "parent": null, "children": [], "optional": false, "type": "Q27096213|Q43229"}, {"id": 5, "span": [742, 747], "entity_id": "Q858", "name": "Syria", "parent": null, "children": [], "optional": false, "type": "Q27096213|Q43229"}, {"id": 6, "span": [979, 984], "entity_id": "Q858", "name": "Syria", "parent": null, "children": [], "optional": false, "type": "Q27096213|Q43229"}, {"id": 7, "span": [1022, 1027], "entity_id": "Q858", "name": "Syria", "parent": null, "children": [], "optional": false, "type": "Q27096213|Q43229"}, {"id": 8, "span": [107, 120], "entity_id": "Q11211", "name": "United States Armed Forces", "parent": null, "children": [], "optional": false, "type": "Q43229"}, {"id": 9, "span": [727, 738], "entity_id": "Q11211", "name": "United States Armed Forces", "parent": null, "children": [], "optional": false, "type": 
"Q43229"}, {"id": 10, "span": [954, 965], "entity_id": "Q11211", "name": "United States Armed Forces", "parent": null, "children": [], "optional": false, "type": "Q43229"}, {"id": 11, "span": [1154, 1165], "entity_id": "Q11211", "name": "United States Armed Forces", "parent": null, "children": [], "optional": false, "type": "Q43229"}, {"id": 12, "span": [1551, 1562], "entity_id": "Q11211", "name": "United States Armed Forces", "parent": null, "children": [], "optional": false, "type": "Q43229"}, {"id": 13, "span": [214, 218], "entity_id": "Q30", "name": "United States of America", "parent": null, "children": [], "optional": false, "type": "Q27096213|Q43229"}, {"id": 14, "span": [345, 349], "entity_id": "Q30", "name": "United States of America", "parent": null, "children": [], "optional": false, "type": "Q27096213|Q43229"}, {"id": 15, "span": [596, 600], "entity_id": "Q30", "name": "United States of America", "parent": null, "children": [], "optional": false, "type": "Q27096213|Q43229"}, {"id": 16, "span": [282, 286], "entity_id": "Q30", "name": "United States of America", "parent": 1, "children": [], "optional": false, "type": "Q27096213|Q43229"}, {"id": 17, "span": [809, 813], "entity_id": "Q30", "name": "United States of America", "parent": null, "children": [], "optional": false, "type": "Q27096213|Q43229"}, {"id": 18, "span": [164, 168], "entity_id": "Q30", "name": "United States of America", "parent": null, "children": [], "optional": false, "type": "Q27096213|Q43229"}, {"id": 19, "span": [368, 386], "entity_id": "Q1476046", "name": "United States Central Command", "parent": null, "children": [], "optional": false, "type": "Q43229"}, {"id": 20, "span": [544, 551], "entity_id": "Q1476046", "name": "United States Central Command", "parent": null, "children": [], "optional": false, "type": "Q43229"}, {"id": 21, "span": [38, 59], "entity_id": "Unknown", "name": "UnknownNoMapping", "parent": null, "children": [2], "optional": false, "type": "OTHER"}, {"id": 22, 
"span": [664, 680], "entity_id": "Q40469", "name": "Associated Press", "parent": null, "children": [], "optional": false, "type": "Q27096213|Q43229"}, {"id": 23, "span": [116, 146], "entity_id": "Unknown", "name": "UnknownNoMapping", "parent": null, "children": [], "optional": false, "type": "OTHER"}, {"id": 24, "span": [67, 92], "entity_id": "Unknown", "name": "UnknownNoMapping", "parent": null, "children": [], "optional": false, "type": "OTHER"}, {"id": 25, "span": [2169, 2199], "entity_id": "Unknown", "name": "UnknownNoMapping", "parent": null, "children": [], "optional": false, "type": "OTHER"}, {"id": 26, "span": [1070, 1089], "entity_id": "Q2429253", "name": "Islamic State", "parent": null, "children": [], "optional": false, "type": "Q27096213|Q43229"}, {"id": 27, "span": [1210, 1212], "entity_id": "Q2429253", "name": "Islamic State", "parent": null, "children": [], "optional": false, "type": "Q27096213|Q43229"}, {"id": 28, "span": [1314, 1316], "entity_id": "Q2429253", "name": "Islamic State", "parent": null, "children": [], "optional": false, "type": "Q27096213|Q43229"}, {"id": 29, "span": [1376, 1378], "entity_id": "Q2429253", "name": "Islamic State", "parent": null, "children": [], "optional": false, "type": "Q27096213|Q43229"}, {"id": 30, "span": [1609, 1611], "entity_id": "Q2429253", "name": "Islamic State", "parent": null, "children": [], "optional": false, "type": "Q27096213|Q43229"}, {"id": 31, "span": [2236, 2238], "entity_id": "Q2429253", "name": "Islamic State", "parent": null, "children": [], "optional": false, "type": "Q27096213|Q43229"}, {"id": 32, "span": [1234, 1238], "entity_id": "Q796", "name": "Iraq", "parent": null, "children": [], "optional": false, "type": "Q27096213|Q43229"}, {"id": 33, "span": [1892, 1896], "entity_id": "Q796", "name": "Iraq", "parent": null, "children": [], "optional": false, "type": "Q27096213|Q43229"}, {"id": 34, "span": [1624, 1628], "entity_id": "Q794", "name": "Iran", "parent": null, "children": [], "optional": 
false, "type": "Q27096213|Q43229"}, {"id": 35, "span": [1864, 1876], "entity_id": "Q239097", "name": "Deir ez-Zor", "parent": null, "children": [], "optional": false, "type": "Q27096213"}, {"id": 36, "span": [1923, 1935], "entity_id": "Q941013", "name": "Lloyd Austin", "parent": null, "children": [], "optional": false, "type": "Q215627"}, {"id": 37, "span": [2113, 2139], "entity_id": "Q271110", "name": "Islamic Revolutionary Guard Corps", "parent": null, "children": [], "optional": false, "type": "Q43229"}]}

make download_all: alias_to_qids.db: truncated gzip input

I have gotten this error a few times in a row while trying to install without Docker:

make download_all
wget https://ad-research.cs.uni-freiburg.de/data/entity-linking/wikidata_mappings.tar.gz
--2023-10-30 13:20:22--  https://ad-research.cs.uni-freiburg.de/data/entity-linking/wikidata_mappings.tar.gz
Resolving ad-research.cs.uni-freiburg.de (ad-research.cs.uni-freiburg.de)... 132.230.150.101
Connecting to ad-research.cs.uni-freiburg.de (ad-research.cs.uni-freiburg.de)|132.230.150.101|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 10473195010 (9.8G) [application/x-gzip]
Saving to: ‘wikidata_mappings.tar.gz.2’

wikidata_mappings.tar.gz.2                    100%[==============================================================================================>]   9.75G  16.7MB/s    in 10m 2s  

2023-10-30 13:30:25 (16.6 MB/s) - ‘wikidata_mappings.tar.gz.2’ saved [10473195010/10473195010]

tar -xvzf wikidata_mappings.tar.gz -C ./data/wikidata_mappings/
x alias_to_qids.db: truncated gzip input
tar: Error exit delayed from previous errors.
make: *** [download_wikidata_mappings] Error 1
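One detail in the log worth noting: wget saved the fresh download as wikidata_mappings.tar.gz.2 (because the original name already existed), while tar extracted wikidata_mappings.tar.gz, so the "truncated gzip input" may come from an older, stale download. A hedged sketch for checking the archive before extracting:

```shell
# Hedged sketch: verify gzip integrity before re-running the extraction.
ARCHIVE=wikidata_mappings.tar.gz
if gzip -t "$ARCHIVE" 2>/dev/null; then
    echo "archive OK"
else
    echo "archive corrupt or missing: $ARCHIVE"
fi
# To resume a partial download instead of starting over:
# wget -c https://ad-research.cs.uni-freiburg.de/data/entity-linking/wikidata_mappings.tar.gz
```

If the ".2" copy passes the integrity check, renaming it over the original and re-running the extraction step should work.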

Error installing neuralcoref during docker build: Variables cannot be declared with 'cpdef'. Use 'cdef' instead.

Hello,

I am trying to build the Docker container for ELEVANT. I have cloned the repo and run docker build -t elevant . (the trailing dot is part of the command).

This is on MacOS.

The output is below.


[+] Building 83.3s (11/28)                                                                                                                                            docker:desktop-linux
 => [internal] load build definition from Dockerfile                                                                                                                                  0.0s
 => => transferring dockerfile: 1.54kB                                                                                                                                                0.0s
 => [internal] load .dockerignore                                                                                                                                                     0.0s
 => => transferring context: 259B                                                                                                                                                     0.0s
 => [internal] load metadata for docker.io/library/ubuntu:20.04                                                                                                                       1.9s
 => [internal] load build context                                                                                                                                                     0.1s
 => => transferring context: 1.95MB                                                                                                                                                   0.0s
 => [ 1/24] FROM docker.io/library/ubuntu:20.04@sha256:ed4a42283d9943135ed87d4ee34e542f7f5ad9ecf2f244870e23122f703f91c2                                                               3.7s
 => => resolve docker.io/library/ubuntu:20.04@sha256:ed4a42283d9943135ed87d4ee34e542f7f5ad9ecf2f244870e23122f703f91c2                                                                 0.0s
 => => sha256:a80d11b67ef30474bcccab048020ee25aee659c4caaca70794867deba5d392b6 424B / 424B                                                                                            0.0s
 => => sha256:0341906bdafc976cd73b05ea0e3df2e4884c6b6816197a2ffbd2367061c19acf 2.32kB / 2.32kB                                                                                        0.0s
 => => sha256:915eebb74587f0e5d3919cb77720c143be9a85a8d2d5cd44675d84c8c3a2b74a 25.97MB / 25.97MB                                                                                      2.8s
 => => sha256:ed4a42283d9943135ed87d4ee34e542f7f5ad9ecf2f244870e23122f703f91c2 1.13kB / 1.13kB                                                                                        0.0s
 => => extracting sha256:915eebb74587f0e5d3919cb77720c143be9a85a8d2d5cd44675d84c8c3a2b74a                                                                                             0.7s
 => [ 2/24] WORKDIR /home/                                                                                                                                                            0.1s
 => [ 3/24] RUN apt-get update                                                                                                                                                        7.6s
 => [ 4/24] RUN apt-get install -y python3 python3-pip git wget vim curl python3-gdbm                                                                                                45.0s
 => [ 5/24] RUN git clone https://github.com/huggingface/neuralcoref.git                                                                                                             13.2s
 => [ 6/24] RUN python3 -m pip install -r neuralcoref/requirements.txt                                                                                                               10.7s
 => ERROR [ 7/24] RUN python3 -m pip install -e neuralcoref                                                                                                                           1.1s
------
 > [ 7/24] RUN python3 -m pip install -e neuralcoref:
0.654 Obtaining file:///home/neuralcoref
1.082     ERROR: Command errored out with exit status 1:
1.082      command: /usr/bin/python3 -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/home/neuralcoref/setup.py'"'"'; __file__='"'"'/home/neuralcoref/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info
1.082          cwd: /home/neuralcoref/
1.082     Complete output (39 lines):
1.082     /usr/local/lib/python3.8/dist-packages/Cython/Compiler/Main.py:381: FutureWarning: Cython directive 'language_level' not set, using '3str' for now (Py3). This has changed from earlier releases! File: /home/neuralcoref/neuralcoref/neuralcoref.pxd
1.082       tree = Parsing.p_module(s, pxd, full_module_name)
1.082
1.082     Error compiling Cython file:
1.082     ------------------------------------------------------------
1.082     ...
1.082         int length
1.082
1.082
1.082     cdef class Vocab:
1.082         cdef Pool mem
1.082         cpdef readonly StringStore strings
1.082               ^
1.082     ------------------------------------------------------------
1.082
1.082     /usr/local/lib/python3.8/dist-packages/spacy/vocab.pxd:29:10: Variables cannot be declared with 'cpdef'. Use 'cdef' instead.
1.082     Processing neuralcoref.pyx
1.082     Traceback (most recent call last):
1.082       File "/home/neuralcoref/bin/cythonize.py", line 168, in <module>
1.082         run(args.root)
1.082       File "/home/neuralcoref/bin/cythonize.py", line 157, in run
1.082         process(base, filename, db)
1.082       File "/home/neuralcoref/bin/cythonize.py", line 123, in process
1.082         preserve_cwd(base, process_pyx, root + ".pyx", root + ".cpp")
1.082       File "/home/neuralcoref/bin/cythonize.py", line 86, in preserve_cwd
1.082         func(*args)
1.082       File "/home/neuralcoref/bin/cythonize.py", line 62, in process_pyx
1.082         raise Exception("Cython failed")
1.082     Exception: Cython failed
1.082     Traceback (most recent call last):
1.082       File "<string>", line 1, in <module>
1.082       File "/home/neuralcoref/setup.py", line 239, in <module>
1.082         setup_package()
1.082       File "/home/neuralcoref/setup.py", line 174, in setup_package
1.082         generate_cython(root, 'neuralcoref')
1.082       File "/home/neuralcoref/setup.py", line 163, in generate_cython
1.082         raise RuntimeError('Running cythonize failed')
1.082     RuntimeError: Running cythonize failed
1.082     Cythonizing sources
1.082     ----------------------------------------
1.083 ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.
------
Dockerfile:8
--------------------
   6 |     RUN git clone https://github.com/huggingface/neuralcoref.git
   7 |     RUN python3 -m pip install -r neuralcoref/requirements.txt
   8 | >>> RUN python3 -m pip install -e neuralcoref
   9 |     COPY requirements.txt requirements.txt
  10 |     RUN python3 -m pip install -r requirements.txt
--------------------
ERROR: failed to solve: process "/bin/sh -c python3 -m pip install -e neuralcoref" did not complete successfully: exit code: 1

Dependency conflict between radboud-el and xrenner

I am trying to install without Docker by running the Dockerfile commands manually, as instructed.

There is apparently a version conflict between these two packages. I managed to get the install to run by removing the version pins, but I don't know whether the resulting environment will actually work.

python3 -m pip install -r requirements.txt

INFO: pip is looking at multiple versions of xrenner to determine which version is compatible with other requirements. This could take a while.
ERROR: Cannot install -r requirements.txt (line 11) and -r requirements.txt (line 12) because these package versions have conflicting dependencies.

The conflict is caused by:
    radboud-el 0.0.1 depends on flair>=0.11
    xrenner 2.2.0.0 depends on flair==0.6.1
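Just to convince myself the constraint set really is empty, here is a small sketch using plain version-tuple comparison (the version numbers are taken from the pip output above):

```python
# Sketch: show that no single flair version can satisfy both requirements
# reported by pip (flair==0.6.1 from xrenner, flair>=0.11 from radboud-el).
def parse(version):
    return tuple(int(part) for part in version.split("."))

PINNED = parse("0.6.1")    # xrenner 2.2.0.0 requires flair==0.6.1
MINIMUM = parse("0.11")    # radboud-el 0.0.1 requires flair>=0.11

def satisfies_both(version):
    v = parse(version)
    return v == PINNED and v >= MINIMUM

# The only version matching the pin is 0.6.1, and (0, 6, 1) < (0, 11),
# so no version can satisfy both specifiers at once.
print(any(satisfies_both(v) for v in ["0.6.1", "0.11", "0.12.2"]))  # False
```

So simply removing the pins picks some flair that at most one of the two packages was tested against, which is why I'm unsure it will actually work.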

Marking coref mentions in benchmark dataset

Hello,

I'd like to create a benchmark dataset for Elevant that includes coref mentions. I assume the coref mentions need to be labeled as such somehow, so that coreference can be turned on and off during evaluation. However, looking at the JSON schema, I don't see a field for this in the ground truth labels. I do see that for predictions, "coreference" is stored in the "recognized_by" field, but I don't see anything similar for the ground truth labels. How should I mark coref ground truth mentions in a benchmark dataset?
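To illustrate what I mean, here is a minimal sketch. The "recognized_by" field is the one I described above for predictions; the surrounding structure (a JSONL file with an "entity_mentions" list per article) is an assumption based on my own result files and may not match the schema exactly:

```python
import json

def count_coref_predictions(linked_articles_path):
    # Count predicted mentions recognized by the coreference stage.
    # ASSUMPTION: each JSONL line is an article with an "entity_mentions"
    # list, and each mention may carry a "recognized_by" field.
    count = 0
    with open(linked_articles_path) as file:
        for line in file:
            article = json.loads(line)
            for mention in article.get("entity_mentions", []):
                if mention.get("recognized_by") == "coreference":
                    count += 1
    return count
```

There is no analogous field I can attach to a ground truth label, which is the gap I'm asking about.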

Article results not being displayed

Hello,

I am trying to use a new benchmark and run an experiment over it. I ran all of the scripts as I have multiple times before, and I don't think there were any errors. But when I upload the new benchmark and experiment, although the UI shows plausible numbers, clicking on the experiment does not display the documents; it just spins forever.

In the console I can see that an error occurs while the app is trying to read or format the first article. It happens at line 1814:

Uncaught (in promise) TypeError: Cannot read properties of undefined (reading 'type')

            if ("true_entity" in mention) {
                // Use the type of the parent entity because this is the type that counts in the evaluation.
                let curr_label_id = mention.true_entity.id;
                while (curr_label_id in child_label_to_parent) {
                    curr_label_id = child_label_to_parent[curr_label_id];
                }
                gt_annotation.gt_entity_type = label_id_to_label[curr_label_id].type;  <--- error
                // Get text of parent span
                if (curr_label_id !== mention.true_entity.id) {
                    let parent_span = label_id_to_label[curr_label_id].span;
                    const articles = (example_modal) ? window.articles_example_benchmark : window.benchmark_articles[benchmark];
                    gt_annotation.parent_text = articles[article_index].text.substring(parent_span[0], parent_span[1]);
                }

I added a try block, and it looks like label_id_to_label is empty. The mention itself looks fine.

I have spent some time looking for any obvious problems in the datafiles and haven't found any.
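For reference, this is roughly the check I ran over the benchmark file, as a sketch. The field names ("labels", "id", "parent") are assumptions from my reading of the data files, so treat this as illustrative only; the idea is to catch any id in a child-to-parent chain that no label defines, since the label_id_to_label lookup in the UI code above must cover every id in the chain:

```python
import json

def find_dangling_parent_ids(benchmark_jsonl_path):
    # ASSUMPTION: each article line has a "labels" list where every label
    # has an "id", and child labels reference their parent label by id.
    # Returns (line_number, label_id, missing_parent_id) triples.
    problems = []
    with open(benchmark_jsonl_path) as file:
        for line_no, line in enumerate(file, start=1):
            article = json.loads(line)
            labels = article.get("labels", [])
            known_ids = {label["id"] for label in labels if "id" in label}
            for label in labels:
                parent = label.get("parent")
                if parent is not None and parent not in known_ids:
                    problems.append((line_no, label.get("id"), parent))
    return problems
```

This check came back clean for me, so either my assumptions about the format are wrong or the problem is elsewhere.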

If you would like to see it for yourself, you can do so here

Did something go wrong with the data generation, or is something wrong with the app? I would appreciate any assistance.

Candidate set

Hi, this is great work!

I would like to know whether it is necessary to ensure that the ground truth entity is in the candidate set when evaluating entity linking, particularly for the REL method.

I look forward to your response.

Experiment with REL failed on new benchmark dataset

Hello!

On the same new benchmark dataset that I attached to another issue, I tried running an experiment with REL.

I realize, as outlined in the previous issue, that this dataset for some reason didn't convert properly, but this seems to be a different problem.

I ran: python3 link_benchmark_entities.py Rel -l rel -b agolo-e2e-eval

This is what happens:

python3 link_benchmark_entities.py Rel -l rel -b agolo-e2e-eval
2023-11-06 13:29:05 [INFO]: Loading config file configs/rel.config.json for linker rel.
2023-11-06 13:29:05 [INFO]: Initializing linker rel with config parameters {'linker_name': 'REL', 'wiki_version': 'wiki_2014', 'ner_model': 'ner-fast', 'use_api': False, 'api_url': 'https://rel.cs.ru.nl/api', 'experiment_description': 'Using the Wiki 2014 version and Flair for NER.'} ...
2023-11-06 13:29:09 [INFO]: Loading Wikipedia to Wikidata database from ./data/wikidata_mappings/wikipedia_name_to_qid.db ...
2023-11-06 13:29:11 [INFO]: -> 9279408 Wikipedia-Wikidata mappings loaded.
2023-11-06 13:29:11 [INFO]: Loading redirects database from ./data/wikipedia_mappings/redirects.db ...
2023-11-06 13:29:13 [INFO]: -> 10914101 redirects loaded.
2023-11-06 13:29:13 [INFO]: Creating directory ./data/linker_files/rel/
2023-11-06 13:29:13 [INFO]: Downloading and extracting http://gem.cs.ru.nl/generic.tar.gz
2023-11-06 13:35:51 [INFO]: Saved file at ./data/linker_files/rel/generic
2023-11-06 13:35:51 [INFO]: Downloading and extracting http://gem.cs.ru.nl/wiki_2014.tar.gz
2023-11-06 13:57:41 [INFO]: Saved file at ./data/linker_files/rel/wiki_2014
2023-11-06 13:57:42,285 https://nlp.informatik.hu-berlin.de/resources/models/ner-fast/en-ner-fast-conll03-v0.4.pt not found in cache, downloading to /var/folders/_s/ph0bp1tn74s52yx13q0c_tq40000gn/T/tmp184af80d
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 256774339/256774339 [00:28<00:00, 9069899.45B/s]
2023-11-06 13:58:11,249 copying /var/folders/_s/ph0bp1tn74s52yx13q0c_tq40000gn/T/tmp184af80d to cache at /Users/alan/.flair/models/en-ner-fast-conll03-v0.4.pt
2023-11-06 13:58:11,839 removing temp file /var/folders/_s/ph0bp1tn74s52yx13q0c_tq40000gn/T/tmp184af80d
2023-11-06 13:58:12,045 loading file /Users/alan/.flair/models/en-ner-fast-conll03-v0.4.pt
Traceback (most recent call last):
  File "/Users/alan/repos/agolo/elevant/link_benchmark_entities.py", line 164, in <module>
    main(cmdl_args)
  File "/Users/alan/repos/agolo/elevant/link_benchmark_entities.py", line 40, in main
    linking_system = LinkingSystem(args.linker_name,
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alan/repos/agolo/elevant/src/linkers/linking_system.py", line 43, in __init__
    self._initialize_linker(linker_name, prediction_file, prediction_format)
  File "/Users/alan/repos/agolo/elevant/src/linkers/linking_system.py", line 174, in _initialize_linker
    self.linker = RelLinker(self.entity_db, self.linker_config)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alan/repos/agolo/elevant/src/linkers/rel_linker.py", line 80, in __init__
    self.ner_tagger = load_flair_ner(ner_model)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alan/repos/agolo/elevant/venv/lib/python3.11/site-packages/REL/ner/flair_wrapper.py", line 12, in load_flair_ner
    return SequenceTagger.load(fetch_model(path_or_url, cache_root))
                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alan/repos/agolo/elevant/venv/lib/python3.11/site-packages/REL/utils.py", line 18, in fetch_model
    return get_from_cache(path_or_url, cache_dir)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alan/repos/agolo/elevant/venv/lib/python3.11/site-packages/flair/file_utils.py", line 215, in get_from_cache
    response = requests.head(url, headers={"User-Agent": "Flair"}, allow_redirects=True)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alan/repos/agolo/elevant/venv/lib/python3.11/site-packages/requests/api.py", line 100, in head
    return request("head", url, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alan/repos/agolo/elevant/venv/lib/python3.11/site-packages/requests/api.py", line 59, in request
    return session.request(method=method, url=url, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alan/repos/agolo/elevant/venv/lib/python3.11/site-packages/requests/sessions.py", line 575, in request
    prep = self.prepare_request(req)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alan/repos/agolo/elevant/venv/lib/python3.11/site-packages/requests/sessions.py", line 486, in prepare_request
    p.prepare(
  File "/Users/alan/repos/agolo/elevant/venv/lib/python3.11/site-packages/requests/models.py", line 368, in prepare
    self.prepare_url(url, params)
  File "/Users/alan/repos/agolo/elevant/venv/lib/python3.11/site-packages/requests/models.py", line 439, in prepare_url
    raise MissingSchema(
requests.exceptions.MissingSchema: Invalid URL 'ner-fast': No scheme supplied. Perhaps you meant https://ner-fast?
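The last frame suggests the bare model name "ner-fast" is being passed straight through to requests as if it were a URL. A minimal reproduction of just that final step (no REL needed; requests raises this while preparing the URL, before any network I/O):

```python
import requests

# Reproduce only the final error from the traceback: requests rejects a
# URL without a scheme and raises MissingSchema during URL preparation.
try:
    requests.head("ner-fast", headers={"User-Agent": "Flair"}, allow_redirects=True)
except requests.exceptions.MissingSchema as error:
    print(type(error).__name__)  # MissingSchema
```

So the question is why REL/flair treats "ner-fast" as a URL here instead of resolving it to a local model name or download path. A flair version mismatch is my guess, but that is only a guess.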
