There are SEVERAL issues while predicting output using LSTM model. Can anyone help??

Unable to predict using LSTM model about cliner HOT 10 CLOSED

text-machine-lab commented on July 29, 2024

Unable to predict using LSTM model

from cliner.

Comments (10)

Ketki-Savle commented on July 29, 2024

Hey Tom

Happy to merge some of my changes, with comments explaining them. Also happy to answer others' issues as much as I can, if you don't mind. Truly appreciate this.

Thanks & Regards,

Ketki

tompollard commented on July 29, 2024

@Ketki-Savle please be kind. Supporting open source tools is a difficult task and being overly critical or demanding will just discourage people from sharing in future. Outlining what steps you have taken to try to fix the bugs would be a helpful first step.

wboag commented on July 29, 2024

Thanks @tompollard, though in fairness to Ketki, I think there's some legitimate frustration. I'm not sure there is anyone actively being the lead developer on this. I've tried training UROPs to take over, but so far nothing has stuck, and I can't keep maintaining the project I started as an undergrad 6 years ago.

I've suggested to Anna that maybe we should consider taking the project down if it's not being maintained, but she points to the benefits it can have (especially the CRF, which isn't broken and is easy enough to start running).

If anyone would like to be the lead developer of CliNER, I think Tristan, Anna, and I would be very grateful & would do what we can to help you get up to speed. The project hasn't been developed in a while.

tompollard commented on July 29, 2024

Thanks @wboag, I agree with Anna that it's worth keeping available. I think GitHub now has an archive state, so maybe that would work?

Ketki-Savle commented on July 29, 2024

Hey guys,

Please excuse my frustration, but I spent the last 72 hours trying to understand what is broken and have had no luck so far! This is such fantastic work, and it is really sad if we can't leverage it, you know. :(

Moving on, I have been able to train the LSTM on i2b2 data, and the only remaining issue is that the predict functionality is not working. Below is the trace:

Traceback (most recent call last):
  File "cliner", line 60, in <module>
    main()
  File "cliner", line 52, in main
    predict.main()
  File "/Users/ag33366/smi/CliNER/code/predict.py", line 79, in main
    predict(files, args.model, args.output, format=format)
  File "/Users/ag33366/smi/CliNER/code/predict.py", line 182, in predict
    output = note.write(labels)
  File "/Users/ag33366/smi/CliNER/code/notes/documents.py", line 85, in write
    concept_tuples = tok_labels_to_concepts(self._tok_sents, token_labels)
  File "/Users/ag33366/smi/CliNER/code/notes/documents.py", line 383, in tok_labels_to_concepts
    test_tok_labels = tok_concepts_to_labels(tokenized_sents, concepts)
  File "/Users/ag33366/smi/CliNER/code/notes/documents.py", line 242, in tok_concepts_to_labels
    labels[lineno-1][start_tok] = 'B-%s' % label
IndexError: list assignment index out of range

Another question: in model.py, the predict_generic function has 2 sets of args, and I am not sure which one is best to use for the LSTM. So far I am using the one with all args (that's the one which is commented out). If you could share your thoughts, it would be really helpful :)

I resolved a bunch of other issues prior to this, and I am happy to take this discussion to Gmail if that's OK. Thank you.

tompollard commented on July 29, 2024

@Ketki-Savle glad to hear that you're making progress! Please could you try dropping a breakpoint() (or import pdb; pdb.set_trace() if < Python 3.7) before line 242 and post the contents of tok_concepts?

e.g. maybe replace:

labels[lineno-1][start_tok] = 'B-%s' % label

with

try:
    labels[lineno-1][start_tok] = 'B-%s' % label
except IndexError:
    breakpoint()

then print(tok_concepts)

(if running from the command line, then you may need to add python -m pdb ... as the first argument to enter the debugger).
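As an alternative to stepping through the debugger, the offending tuples can be listed directly. The following is a minimal sketch, not CliNER code: `find_out_of_range_concepts` is a hypothetical helper, and it assumes that each concept is a `(label, lineno, start_tok, end_tok)` tuple with a 1-based line number and 0-based, inclusive token indices, which is what the indexing in the traceback suggests.

```python
def find_out_of_range_concepts(tokenized_sents, tok_concepts):
    """Return (concept, reason) pairs for concepts that do not fit the sentences.

    Assumed layout (not confirmed by the thread): lineno is 1-based,
    start_tok and end_tok are 0-based and inclusive.
    """
    bad = []
    for concept in tok_concepts:
        label, lineno, start_tok, end_tok = concept
        # The line number must point at an existing sentence.
        if not 1 <= lineno <= len(tokenized_sents):
            bad.append((concept, "line number out of range"))
            continue
        # The token span must fit inside that sentence.
        n_tokens = len(tokenized_sents[lineno - 1])
        if not 0 <= start_tok <= end_tok < n_tokens:
            bad.append((concept, "token span exceeds %d tokens" % n_tokens))
    return bad


# Toy data for illustration only.
sents = [["Patient", "denies", "chest", "pain"], ["No", "fever"]]
concepts = [("problem", 1, 2, 3), ("problem", 2, 49, 52), ("test", 3, 0, 0)]
for concept, reason in find_out_of_range_concepts(sents, concepts):
    print(concept, "->", reason)
```

Calling it just before the failing assignment would show whether the line numbers, the token offsets, or both are inconsistent with the tokenized sentences.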

Ketki-Savle commented on July 29, 2024

Hello,

Here is the updated trace. Please feel free to correct me if we are missing out on something.

/Users/ag33366/smi/CliNER/code/notes/documents.py(243)tok_concepts_to_labels()
-> label,lineno,start_tok,end_tok = concept
(Pdb) continue
/Users/ag33366/smi/CliNER/code/notes/documents.py(249)tok_concepts_to_labels()
-> print("Tok concept is ", tok_concepts)
(Pdb) continue
Tok concept is [('problem', 1, 49, 52), ('problem', 1, 56, 61), ('problem', 1, 63, 68), ('problem', 1, 71, 76), ('problem', 1, 88, 89), ('problem', 1, 93, 94), ('problem', 1, 117, 119), ('problem', 1, 121, 121), ('problem', 1, 248, 249), ('problem', 1, 256, 257), ('problem', 1, 272, 274), ('problem', 1, 281, 283), ('problem', 1, 300, 302), ('problem', 1, 313, 315), ('problem', 1, 320, 321), ('problem', 1, 327, 329), ('problem', 1, 332, 333), ('problem', 1, 338, 339), ('problem', 1, 346, 347), ('problem', 1, 373, 375), ('problem', 1, 385, 387), ('test', 1, 516, 516), ('test', 1, 517, 517), ('problem', 1, 565, 568), ('problem', 1, 655, 656), ('problem', 1, 733, 734), ('problem', 1, 1254, 1261), ('problem', 1, 1265, 1267), ('problem', 1, 1291, 1293), ('problem', 1, 1421, 1422), ('problem', 2, 49, 52), ('problem', 2, 56, 61), ('problem', 2, 63, 68), ('problem', 2, 71, 76), ('problem', 2, 88, 89), ('problem', 2, 93, 94), ('problem', 2, 117, 119), ('problem', 2, 121, 121), ('problem', 2, 248, 249), ('problem', 2, 256, 257), ('problem', 2, 272, 274), ('problem', 2, 281, 283), ('problem', 2, 300, 302), ('problem', 2, 313, 315), ('problem', 2, 320, 321), ('problem', 2, 327, 329), ('problem', 2, 332, 333), ('problem', 2, 338, 339), ('problem', 2, 346, 347), ('problem', 2, 373, 375), ('problem', 2, 385, 387), ('test', 2, 516, 516), ('test', 2, 517, 517), ('problem', 2, 565, 568), ('problem', 2, 655, 656), ('problem', 2, 733, 734), ('problem', 2, 1254, 1261), ('problem', 2, 1265, 1267), ('problem', 2, 1291, 1293), ('problem', 2, 1421, 1422), ('problem', 3, 49, 52), ('problem', 3, 56, 61), ('problem', 3, 63, 68), ('problem', 3, 71, 76), ('problem', 3, 88, 89), ('problem', 3, 93, 94),........{trying not to paste all the labels for error demo}...('problem', 138, 300, 302), ('problem', 138, 313, 315), ('problem', 138, 320, 321), ('problem', 138, 327, 329), ('problem', 138, 332, 333), ('problem', 138, 338, 339), ('problem', 138, 346, 347), ('problem', 138, 373, 375), ('problem', 
138, 385, 387), ('test', 138, 516, 516), ('test', 138, 517, 517), ('problem', 138, 565, 568), ('problem', 138, 655, 656), ('problem', 138, 733, 734), ('problem', 138, 1254, 1261), ('problem', 138, 1265, 1267), ('problem', 138, 1291, 1293), ('problem', 138, 1421, 1422)]
Traceback (most recent call last):
  File "cliner", line 60, in <module>
    main()
  File "cliner", line 52, in main
    predict.main()
  File "/Users/ag33366/smi/CliNER/code/predict.py", line 79, in main
    predict(files, args.model, args.output, format=format)
  File "/Users/ag33366/smi/CliNER/code/predict.py", line 182, in predict
    output = note.write(labels)
  File "/Users/ag33366/smi/CliNER/code/notes/documents.py", line 86, in write
    concept_tuples = tok_labels_to_concepts(self._tok_sents, token_labels)
  File "/Users/ag33366/smi/CliNER/code/notes/documents.py", line 390, in tok_labels_to_concepts
    test_tok_labels = tok_concepts_to_labels(tokenized_sents, concepts)
  File "/Users/ag33366/smi/CliNER/code/notes/documents.py", line 249, in tok_concepts_to_labels
    print("Tok concept is ", tok_concepts)
IndexError: list assignment index out of range

Ketki-Savle commented on July 29, 2024

Hey @tompollard,

Forgot to mention another thing here. For the CRF model we use a 90/10 train/dev split, and for the LSTM we have a train/test/val split. I also changed the split, because the CRF split left 0 data points in the test set when using the LSTM. Below is the updated code from model.py (in the generic_train function):

    # Slice val and test from the full list before train_sents is reassigned.
    val_sents = train_sents[:ind]
    test_sents = train_sents[ind:2*ind]
    train_sents = train_sents[2*ind:]

    val_labels = train_labels[:ind]
    test_labels = train_labels[ind:2*ind]
    train_labels = train_labels[2*ind:]

This takes care of another index error, raised at:

print (Datasets_tokens['valid'][0])
print (Datasets_tokens['test'][0])
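One thing to watch with a split like this is that every slice has to be taken from the full list; once `train_sents` has been reassigned, a later slice of it no longer refers to the original data. A minimal, self-contained sketch of the three-way split is below. The function name `split_train_val_test` and the `holdout_fraction` parameter are hypothetical; the thread does not show how `ind` is computed in model.py.

```python
def split_train_val_test(sents, labels, holdout_fraction=0.1):
    """Split parallel lists into train, validation and test portions.

    The first `ind` items go to validation, the next `ind` to test and
    the remainder to train. `holdout_fraction` is an assumed parameter.
    """
    if len(sents) != len(labels):
        raise ValueError("sents and labels must have the same length")
    ind = int(len(sents) * holdout_fraction)
    if ind == 0:
        raise ValueError("not enough data for a validation and test split")

    # All slices are taken from the untouched input lists.
    val = (sents[:ind], labels[:ind])
    test = (sents[ind:2 * ind], labels[ind:2 * ind])
    train = (sents[2 * ind:], labels[2 * ind:])
    return train, val, test


# Toy data: 20 one-token "sentences" with matching labels.
toy_sents = [["tok%d" % i] for i in range(20)]
toy_labels = [["O"] for _ in range(20)]
train, val, test = split_train_val_test(toy_sents, toy_labels)
print(len(train[0]), len(val[0]), len(test[0]))
```

Raising an error when `ind` is zero makes an empty validation or test set fail early, rather than later as an index error on the first element.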

Thank you.

-Ketki

Ketki-Savle commented on July 29, 2024

Hello,

If you could update documents.py as follows:

At line 86:

concept_tuples = tok_labels_to_concepts(self._tok_sents, token_labels[0])

This will solve the problem.
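The thread does not explain why indexing with `[0]` helps. One plausible reading, offered here only as an assumption, is that the LSTM prediction path returns the labels wrapped in one extra list (for example, a batch containing a single document), so the outer list has length 1 instead of one entry per sentence. The sketch below uses a hypothetical helper, `unwrap_single_document`, to show how such an extra nesting level could be detected and removed without affecting labels that already have the expected shape.

```python
def unwrap_single_document(token_labels, n_sents):
    """Remove one extra level of nesting if the labels look batch-wrapped.

    Expected shape: one list of label strings per sentence.
    Wrapped shape (assumed): a single-element list containing that list.
    """
    looks_wrapped = (
        len(token_labels) == 1
        and isinstance(token_labels[0], list)
        and len(token_labels[0]) == n_sents
        and all(isinstance(sent, list) for sent in token_labels[0])
    )
    return token_labels[0] if looks_wrapped else token_labels


# Toy labels for a two-sentence document.
flat = [["O", "B-problem"], ["O"]]
wrapped = [flat]

print(unwrap_single_document(wrapped, n_sents=2))
print(unwrap_single_document(flat, n_sents=2))
```

A guard like this would keep the CRF path unchanged if it already returns per-sentence labels, whereas an unconditional `[0]` would pass only the first sentence's labels in that case.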

Thank you.

-Ketki

tompollard commented on July 29, 2024

Hi @Ketki-Savle, great to hear that you got to the bottom of this, and thanks for the posts. It would be good to try to merge your fixes into the repo, so that we can solve the problems for other people too. Are you comfortable making a pull request with your changes, or would you prefer if someone else took this on? Thanks! Tom
