
Comments (10)

Ketki-Savle commented on July 29, 2024

Hey Tom

Happy to merge some of my changes, with comments explaining them. Also happy to answer others' issues as much as I can, if you don't mind. Truly appreciate this.

Thanks & Regards,

Ketki

from cliner.

tompollard commented on July 29, 2024

@Ketki-Savle please be kind. Supporting open source tools is a difficult task and being overly critical or demanding will just discourage people from sharing in future. Outlining what steps you have taken to try to fix the bugs would be a helpful first step.

wboag commented on July 29, 2024

Thanks @tompollard, though in fairness to Ketki, I think there's some legitimate frustration. I'm not sure there is anyone actively acting as the lead developer on this. I've tried training UROPs to take over, but so far nothing has stuck, and I can't keep maintaining the project I started as an undergrad 6 years ago.

I've suggested to Anna that maybe we should consider taking the project down if it's not being maintained, but she points to the benefits it can have (especially the CRF, which isn't broken and is easy enough to start running).

If anyone would like to be the lead developer of CliNER, I think Tristan, Anna, and I would be very grateful & would do what we can to help you get up to speed. The project hasn't been developed in a while.

tompollard commented on July 29, 2024

Thanks @wboag I agree with Anna that it's worth keeping available. Think GitHub now has an archive state, so maybe that would work?

Ketki-Savle commented on July 29, 2024

Hey guys,

Please excuse my frustration, but I spent the last 72 hours trying to understand what is broken and have had no luck so far! This is such fantastic work, and it would be really sad if we couldn't leverage it. :(

Moving on, I have been able to train the LSTM on i2b2 data, and the only remaining issue is that the predict functionality is not working. Below is the trace:

Traceback (most recent call last):
  File "cliner", line 60, in <module>
    main()
  File "cliner", line 52, in main
    predict.main()
  File "/Users/ag33366/smi/CliNER/code/predict.py", line 79, in main
    predict(files, args.model, args.output, format=format)
  File "/Users/ag33366/smi/CliNER/code/predict.py", line 182, in predict
    output = note.write(labels)
  File "/Users/ag33366/smi/CliNER/code/notes/documents.py", line 85, in write
    concept_tuples = tok_labels_to_concepts(self._tok_sents, token_labels)
  File "/Users/ag33366/smi/CliNER/code/notes/documents.py", line 383, in tok_labels_to_concepts
    test_tok_labels = tok_concepts_to_labels(tokenized_sents, concepts)
  File "/Users/ag33366/smi/CliNER/code/notes/documents.py", line 242, in tok_concepts_to_labels
    labels[lineno-1][start_tok] = 'B-%s' % label
IndexError: list assignment index out of range

Another question: in model.py, the predict_generic function has two sets of args, and I am not sure which one is best to use for the LSTM. So far I am using the one with all args (that's the one which is commented). If you could share your thoughts, it would be really helpful :)

I was able to resolve a bunch of other issues prior to this, and I am happy to take this discussion to Gmail if that's OK. Thank you.

tompollard commented on July 29, 2024

@Ketki-Savle glad to hear that you're making progress! Please could you try dropping a breakpoint() (or import pdb; pdb.set_trace() if < Python 3.7) before line 242 and post the contents of tok_concepts?

e.g. maybe replace:

labels[lineno-1][start_tok] = 'B-%s' % label

with

try:
    labels[lineno-1][start_tok] = 'B-%s' % label
except IndexError:
    breakpoint()

then print(tok_concepts)

(if running from the command line, you may need to launch the script with python -m pdb ... to enter the debugger).
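
As a complement to stepping through the debugger, the offending tuples can be listed directly. The following is a minimal sketch, not CliNER code: find_out_of_range is a hypothetical helper, and it assumes that each concept is a (label, lineno, start_tok, end_tok) tuple with a 1-based lineno (as the labels[lineno-1] indexing suggests), a 0-based start_tok, and an inclusive end_tok.

```python
def find_out_of_range(tokenized_sents, tok_concepts):
    """Return (concept, reason) pairs for concepts that do not fit the sentences.

    Hypothetical helper for debugging; assumes 1-based line numbers and
    0-based, inclusive token spans.
    """
    bad = []
    for concept in tok_concepts:
        label, lineno, start_tok, end_tok = concept
        if not 1 <= lineno <= len(tokenized_sents):
            # labels[lineno-1] would fail: there is no such sentence.
            bad.append((concept, "no such line"))
        elif not 0 <= start_tok <= end_tok < len(tokenized_sents[lineno - 1]):
            # labels[lineno-1][start_tok] would fail: span overruns the sentence.
            bad.append((concept, "token span outside line"))
    return bad


# Toy data, made up for illustration only.
sents = [["chest", "pain", "noted"], ["no", "fever"]]
concepts = [("problem", 1, 0, 1), ("problem", 2, 4, 5), ("test", 3, 0, 0)]

for concept, reason in find_out_of_range(sents, concepts):
    print(concept, reason)
```

Calling this just before the failing assignment, with the real tokenized_sents and tok_concepts, would show whether the line numbers or the token offsets are the ones running past the end.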

Ketki-Savle commented on July 29, 2024

Hello,

Here is the updated trace. Please feel free to correct me if we are missing out on something.

> /Users/ag33366/smi/CliNER/code/notes/documents.py(243)tok_concepts_to_labels()
-> label,lineno,start_tok,end_tok = concept
(Pdb) continue
> /Users/ag33366/smi/CliNER/code/notes/documents.py(249)tok_concepts_to_labels()
-> print("Tok concept is ", tok_concepts)
(Pdb) continue
Tok concept is [('problem', 1, 49, 52), ('problem', 1, 56, 61), ('problem', 1, 63, 68), ('problem', 1, 71, 76), ('problem', 1, 88, 89), ('problem', 1, 93, 94), ('problem', 1, 117, 119), ('problem', 1, 121, 121), ('problem', 1, 248, 249), ('problem', 1, 256, 257), ('problem', 1, 272, 274), ('problem', 1, 281, 283), ('problem', 1, 300, 302), ('problem', 1, 313, 315), ('problem', 1, 320, 321), ('problem', 1, 327, 329), ('problem', 1, 332, 333), ('problem', 1, 338, 339), ('problem', 1, 346, 347), ('problem', 1, 373, 375), ('problem', 1, 385, 387), ('test', 1, 516, 516), ('test', 1, 517, 517), ('problem', 1, 565, 568), ('problem', 1, 655, 656), ('problem', 1, 733, 734), ('problem', 1, 1254, 1261), ('problem', 1, 1265, 1267), ('problem', 1, 1291, 1293), ('problem', 1, 1421, 1422), ('problem', 2, 49, 52), ('problem', 2, 56, 61), ('problem', 2, 63, 68), ('problem', 2, 71, 76), ('problem', 2, 88, 89), ('problem', 2, 93, 94), ('problem', 2, 117, 119), ('problem', 2, 121, 121), ('problem', 2, 248, 249), ('problem', 2, 256, 257), ('problem', 2, 272, 274), ('problem', 2, 281, 283), ('problem', 2, 300, 302), ('problem', 2, 313, 315), ('problem', 2, 320, 321), ('problem', 2, 327, 329), ('problem', 2, 332, 333), ('problem', 2, 338, 339), ('problem', 2, 346, 347), ('problem', 2, 373, 375), ('problem', 2, 385, 387), ('test', 2, 516, 516), ('test', 2, 517, 517), ('problem', 2, 565, 568), ('problem', 2, 655, 656), ('problem', 2, 733, 734), ('problem', 2, 1254, 1261), ('problem', 2, 1265, 1267), ('problem', 2, 1291, 1293), ('problem', 2, 1421, 1422), ('problem', 3, 49, 52), ('problem', 3, 56, 61), ('problem', 3, 63, 68), ('problem', 3, 71, 76), ('problem', 3, 88, 89), ('problem', 3, 93, 94),........{trying not to paste all the labels for error demo}...('problem', 138, 300, 302), ('problem', 138, 313, 315), ('problem', 138, 320, 321), ('problem', 138, 327, 329), ('problem', 138, 332, 333), ('problem', 138, 338, 339), ('problem', 138, 346, 347), ('problem', 138, 373, 375), ('problem', 138, 385, 387), ('test', 138, 516, 516), ('test', 138, 517, 517), ('problem', 138, 565, 568), ('problem', 138, 655, 656), ('problem', 138, 733, 734), ('problem', 138, 1254, 1261), ('problem', 138, 1265, 1267), ('problem', 138, 1291, 1293), ('problem', 138, 1421, 1422)]
Traceback (most recent call last):
  File "cliner", line 60, in <module>
    main()
  File "cliner", line 52, in main
    predict.main()
  File "/Users/ag33366/smi/CliNER/code/predict.py", line 79, in main
    predict(files, args.model, args.output, format=format)
  File "/Users/ag33366/smi/CliNER/code/predict.py", line 182, in predict
    output = note.write(labels)
  File "/Users/ag33366/smi/CliNER/code/notes/documents.py", line 86, in write
    concept_tuples = tok_labels_to_concepts(self._tok_sents, token_labels)
  File "/Users/ag33366/smi/CliNER/code/notes/documents.py", line 390, in tok_labels_to_concepts
    test_tok_labels = tok_concepts_to_labels(tokenized_sents, concepts)
  File "/Users/ag33366/smi/CliNER/code/notes/documents.py", line 249, in tok_concepts_to_labels
    print("Tok concept is ", tok_concepts)
IndexError: list assignment index out of range

Ketki-Savle commented on July 29, 2024

Hey @tompollard,

Forgot to mention another thing here. For the CRF model we use a 90/10 train/dev split, while for the LSTM we need a train/test/val split. I changed the split as well, because the CRF split left 0 data points in the test set when using the LSTM. Below is the updated code from model.py (in the generic_train function):

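    # Slice val and test first; reassign train last so that the
    # test slice is taken from the full, original list.
    val_sents = train_sents[:ind]
    test_sents = train_sents[ind:2*ind]
    train_sents = train_sents[2*ind:]

    val_labels = train_labels[:ind]
    test_labels = train_labels[ind:2*ind]
    train_labels = train_labels[2*ind:]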

This takes care of another index error, raised at:

print (Datasets_tokens['valid'][0])
print (Datasets_tokens['test'][0])
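
A three-way split of this kind can be wrapped in a small function so that the three slices are always taken from the same, unmodified list. This is a minimal sketch under assumptions: three_way_split and its parts parameter are hypothetical names, not part of model.py, and the thread does not show how ind is actually computed there.

```python
def three_way_split(items, parts=10):
    """Split items into (train, val, test) without overlap.

    Hypothetical helper: val and test each get roughly 1/parts of the
    data (at least one item each), and train gets the rest.
    """
    ind = max(1, len(items) // parts)
    val = items[:ind]
    test = items[ind:2 * ind]
    train = items[2 * ind:]
    return train, val, test


# Toy data standing in for sentences and their label sequences.
sents = ["sent%d" % i for i in range(20)]
labels = ["lab%d" % i for i in range(20)]

train_sents, val_sents, test_sents = three_way_split(sents)
train_labels, val_labels, test_labels = three_way_split(labels)

print(len(train_sents), len(val_sents), len(test_sents))
```

Applying the same function to the sentences and to the labels keeps the two aligned, and because val and test are never empty for inputs with at least two items, indexing the first validation or test example does not fail.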

Thank you.

-Ketki

Ketki-Savle commented on July 29, 2024

Hello,

If you could update documents.py as follows, at line 86:

concept_tuples = tok_labels_to_concepts(self._tok_sents, token_labels[0])

This will solve the problem.
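
The thread does not explain why indexing with [0] helps. One plausible reading, offered here only as an assumption, is that the LSTM prediction path returns the per-sentence label lists wrapped in one extra outer list. Under that assumption, a guarded unwrap is a little safer than an unconditional [0], since it leaves already-flat output alone. The helper unwrap_labels below is hypothetical and not part of CliNER.

```python
def unwrap_labels(token_labels, tok_sents):
    """Strip one extra outer list from token_labels if it is present.

    Hypothetical helper. Expected shape is one list of label strings per
    sentence; the wrapped shape is that same structure inside one more list.
    """
    if (len(token_labels) == 1
            and isinstance(token_labels[0], list)
            and len(token_labels[0]) == len(tok_sents)
            and all(isinstance(x, list) for x in token_labels[0])):
        return token_labels[0]
    return token_labels


# Toy data, made up for illustration only.
tok_sents = [["chest", "pain"], ["no", "fever"]]
flat = [["B-problem", "I-problem"], ["O", "B-problem"]]
wrapped = [flat]

print(unwrap_labels(wrapped, tok_sents) == flat)
print(unwrap_labels(flat, tok_sents) == flat)
```

If the CRF path already returns the flat shape, a check like this would let both models share the same call into tok_labels_to_concepts.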

Thank you.

-Ketki

tompollard commented on July 29, 2024

Hi @Ketki-Savle, great to hear that you got to the bottom of this, and thanks for the posts. It would be good to try to merge your fixes into the repo, so that we can solve the problems for other people too. Are you comfortable making a pull request with your changes, or would you prefer if someone else took this on? Thanks! Tom
