Dear colleagues, thank you for your amazing work on rethinking of th

Float in words_indexes (wire57), closed

RuslanAbliazov commented on September 10, 2024

Float in words_indexes

Comments (2)

rali-udem commented on September 10, 2024

Thank you for your interest.

You are right. In the dataset we provide, a token usually has a word index that is a plain int: the position of the token in the tokenized source sentence. Occasionally, however, the index is a list [int, float] instead of an int.
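If you load the data yourself, it can help to tell the two shapes apart explicitly. Below is a minimal sketch, assuming the word indexes have already been read into a Python list whose entries are either an int or an [int, float] list. The names `WordIndex`, `is_plain_index` and `reliable_positions` are hypothetical helpers, not part of the wire57 code. It keeps plain positions and maps list entries to `None`, so downstream code cannot silently treat a guessed position as a real one (the maintainers' reply explains why those lists are unreliable).

```python
from typing import List, Optional, Union

# Hypothetical type: an entry is either a plain int (position in the
# tokenized source sentence) or an [int, float] list produced for repeated tokens.
WordIndex = Union[int, List[Union[int, float]]]


def is_plain_index(entry: WordIndex) -> bool:
    """Return True if the entry is a plain integer position."""
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(entry, int) and not isinstance(entry, bool)


def reliable_positions(word_indexes: List[WordIndex]) -> List[Optional[int]]:
    """Keep plain int positions; replace [int, float] entries with None."""
    return [entry if is_plain_index(entry) else None for entry in word_indexes]


if __name__ == "__main__":
    # Illustrative values only, not taken from the dataset.
    sample = [0, 1, [4, 0.5], 3]
    print(reliable_positions(sample))  # [0, 1, None, 3]
```

Using `None` rather than dropping the entry keeps the output aligned one-to-one with the input tokens.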

This list form is an undocumented feature of the dataset and is not usable as is. It appears when a token occurs more than once in the original sentence, e.g., "Tokyo" in a source sentence that contains "Tokyo" several times. In that case, we resorted to a heuristic that tries to guess which occurrence of "Tokyo" is meant and produces a pair (probable_token_index, confidence_score). Neither value can be used directly. Worse, both can be unreliable, i.e., they may point to the wrong word.

These list indices are not used during evaluation, so they do not affect the reported results. If you need such indices, you are welcome to finish implementing this feature properly.
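To see which tokens are affected, it is enough to find the tokens that occur more than once in a tokenized sentence. The sketch below does only that. It does not reproduce the maintainers' heuristic, which has not been released. It assumes whitespace tokenization and exact string matching, and the function names `token_positions` and `ambiguous_tokens` are hypothetical.

```python
from typing import Dict, List


def token_positions(tokens: List[str]) -> Dict[str, List[int]]:
    """Map each token string to every position where it occurs."""
    positions: Dict[str, List[int]] = {}
    for i, tok in enumerate(tokens):
        positions.setdefault(tok, []).append(i)
    return positions


def ambiguous_tokens(tokens: List[str]) -> Dict[str, List[int]]:
    """Return only tokens that occur more than once, with all their positions."""
    return {tok: pos for tok, pos in token_positions(tokens).items() if len(pos) > 1}


if __name__ == "__main__":
    # Illustrative sentence, not taken from WiRe57.
    sentence = "Tokyo hosted the games and Tokyo celebrated".split()
    print(ambiguous_tokens(sentence))  # {'Tokyo': [0, 5]}
```

Each token reported here has several candidate positions, so the token string alone cannot determine a single int index.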

As for the code referenced by googledoc_manual_OIE_loader.load_WiRe_annotations(), unfortunately we cannot release it.

Thank you for pointing this out.

RuslanAbliazov commented on September 10, 2024

Thank you for your response!
