Comments (10)
Hey Tom
Happy to merge some of my changes, with comments explaining them. I'm also happy to help answer others' issues as much as I can, if you don't mind. Truly appreciate this.
Thanks & Regards,
Ketki
from cliner.
@Ketki-Savle please be kind. Supporting open source tools is a difficult task and being overly critical or demanding will just discourage people from sharing in future. Outlining what steps you have taken to try to fix the bugs would be a helpful first step.
from cliner.
Thanks @tompollard, though in fairness to Ketki, I think there's some legitimate frustration. I'm not sure anyone is actively acting as lead developer on this. I've tried training UROPs to take over, but so far nothing has stuck, and I can't keep maintaining the project I started as an undergrad 6 years ago.
I've suggested to Anna that maybe we should consider taking the project down if it's not being maintained, but she points to the benefits it can have (especially the CRF, which isn't broken and is easy enough to start running).
If anyone would like to be the lead developer of CliNER, I think Tristan, Anna, and I would be very grateful & would do what we can to help you get up to speed. The project hasn't been developed in a while.
from cliner.
Thanks @wboag, I agree with Anna that it's worth keeping available. I think GitHub now has an archive state, so maybe that would work?
from cliner.
Hey guys,
Please excuse my frustration, but I spent the last 72 hours trying to understand what is broken and have had no luck so far! This is such fantastic work, and it would be really sad if we couldn't leverage it. :(
Moving on, I have been able to train the LSTM on i2b2 data, and the only remaining issue is that the predict functionality is not working. Below is the trace:
Traceback (most recent call last):
  File "cliner", line 60, in <module>
    main()
  File "cliner", line 52, in main
    predict.main()
  File "/Users/ag33366/smi/CliNER/code/predict.py", line 79, in main
    predict(files, args.model, args.output, format=format)
  File "/Users/ag33366/smi/CliNER/code/predict.py", line 182, in predict
    output = note.write(labels)
  File "/Users/ag33366/smi/CliNER/code/notes/documents.py", line 85, in write
    concept_tuples = tok_labels_to_concepts(self._tok_sents, token_labels)
  File "/Users/ag33366/smi/CliNER/code/notes/documents.py", line 383, in tok_labels_to_concepts
    test_tok_labels = tok_concepts_to_labels(tokenized_sents, concepts)
  File "/Users/ag33366/smi/CliNER/code/notes/documents.py", line 242, in tok_concepts_to_labels
    labels[lineno-1][start_tok] = 'B-%s' % label
IndexError: list assignment index out of range
Another question: in model.py, the predict_generic function has two sets of args, and I'm not sure which is best to use for the LSTM. So far I am using the one with all the args (that's the one which is commented). If you could share your thoughts, it would be really helpful :)
I resolved a number of other issues before this one, and I am happy to take this discussion over Gmail if that's OK. Thank you.
from cliner.
@Ketki-Savle glad to hear that you're making progress! Could you please try dropping a breakpoint() (or import pdb; pdb.set_trace() if < Python 3.7) before line 242 and posting the contents of tok_concepts?
e.g. maybe replace:
    labels[lineno-1][start_tok] = 'B-%s' % label
with:
    try:
        labels[lineno-1][start_tok] = 'B-%s' % label
    except IndexError:
        breakpoint()
then run print(tok_concepts) at the debugger prompt (if running from the command line, you may need to add python -m pdb ... as the first argument to enter the debugger).
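As an alternative to stepping through the debugger, the out-of-range entries could be listed directly. The following is only a minimal sketch: the helper name find_out_of_range_concepts is hypothetical (it is not part of CliNER), and it assumes each concept is a (label, lineno, start_tok, end_tok) tuple with a 1-based lineno and an inclusive end_tok, which is how the failing line appears to index labels.

```python
def find_out_of_range_concepts(tokenized_sents, tok_concepts):
    """Return the concept tuples whose indices fall outside the tokenized sentences.

    Hypothetical helper, not part of CliNER. Assumes 1-based line numbers and
    0-based, inclusive token offsets.
    """
    bad = []
    for concept in tok_concepts:
        label, lineno, start_tok, end_tok = concept
        # The line number must point at an existing sentence.
        if not 1 <= lineno <= len(tokenized_sents):
            bad.append(concept)
            continue
        n_tokens = len(tokenized_sents[lineno - 1])
        # Both token offsets must fall inside that sentence.
        if not (0 <= start_tok <= end_tok < n_tokens):
            bad.append(concept)
    return bad


# Toy data for illustration only.
sents = [["no", "acute", "distress"], ["chest", "pain"]]
concepts = [("problem", 1, 1, 2), ("problem", 2, 0, 1), ("problem", 2, 49, 52)]
print(find_out_of_range_concepts(sents, concepts))
```

Calling this just before the failing assignment would show which tuples cannot be written into labels, without having to paste the full list.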
from cliner.
Hello,
Here is the updated trace. Please feel free to correct me if we are missing out on something.
/Users/ag33366/smi/CliNER/code/notes/documents.py(243)tok_concepts_to_labels()
-> label,lineno,start_tok,end_tok = concept
(Pdb) continue
/Users/ag33366/smi/CliNER/code/notes/documents.py(249)tok_concepts_to_labels()
-> print("Tok concept is ", tok_concepts)
(Pdb) continue
Tok concept is [('problem', 1, 49, 52), ('problem', 1, 56, 61), ('problem', 1, 63, 68), ('problem', 1, 71, 76), ('problem', 1, 88, 89), ('problem', 1, 93, 94), ('problem', 1, 117, 119), ('problem', 1, 121, 121), ('problem', 1, 248, 249), ('problem', 1, 256, 257), ('problem', 1, 272, 274), ('problem', 1, 281, 283), ('problem', 1, 300, 302), ('problem', 1, 313, 315), ('problem', 1, 320, 321), ('problem', 1, 327, 329), ('problem', 1, 332, 333), ('problem', 1, 338, 339), ('problem', 1, 346, 347), ('problem', 1, 373, 375), ('problem', 1, 385, 387), ('test', 1, 516, 516), ('test', 1, 517, 517), ('problem', 1, 565, 568), ('problem', 1, 655, 656), ('problem', 1, 733, 734), ('problem', 1, 1254, 1261), ('problem', 1, 1265, 1267), ('problem', 1, 1291, 1293), ('problem', 1, 1421, 1422), ('problem', 2, 49, 52), ('problem', 2, 56, 61), ('problem', 2, 63, 68), ('problem', 2, 71, 76), ('problem', 2, 88, 89), ('problem', 2, 93, 94), ('problem', 2, 117, 119), ('problem', 2, 121, 121), ('problem', 2, 248, 249), ('problem', 2, 256, 257), ('problem', 2, 272, 274), ('problem', 2, 281, 283), ('problem', 2, 300, 302), ('problem', 2, 313, 315), ('problem', 2, 320, 321), ('problem', 2, 327, 329), ('problem', 2, 332, 333), ('problem', 2, 338, 339), ('problem', 2, 346, 347), ('problem', 2, 373, 375), ('problem', 2, 385, 387), ('test', 2, 516, 516), ('test', 2, 517, 517), ('problem', 2, 565, 568), ('problem', 2, 655, 656), ('problem', 2, 733, 734), ('problem', 2, 1254, 1261), ('problem', 2, 1265, 1267), ('problem', 2, 1291, 1293), ('problem', 2, 1421, 1422), ('problem', 3, 49, 52), ('problem', 3, 56, 61), ('problem', 3, 63, 68), ('problem', 3, 71, 76), ('problem', 3, 88, 89), ('problem', 3, 93, 94),........{trying not to paste all the labels for error demo}...('problem', 138, 300, 302), ('problem', 138, 313, 315), ('problem', 138, 320, 321), ('problem', 138, 327, 329), ('problem', 138, 332, 333), ('problem', 138, 338, 339), ('problem', 138, 346, 347), ('problem', 138, 373, 375), ('problem', 
138, 385, 387), ('test', 138, 516, 516), ('test', 138, 517, 517), ('problem', 138, 565, 568), ('problem', 138, 655, 656), ('problem', 138, 733, 734), ('problem', 138, 1254, 1261), ('problem', 138, 1265, 1267), ('problem', 138, 1291, 1293), ('problem', 138, 1421, 1422)]
Traceback (most recent call last):
  File "cliner", line 60, in <module>
    main()
  File "cliner", line 52, in main
    predict.main()
  File "/Users/ag33366/smi/CliNER/code/predict.py", line 79, in main
    predict(files, args.model, args.output, format=format)
  File "/Users/ag33366/smi/CliNER/code/predict.py", line 182, in predict
    output = note.write(labels)
  File "/Users/ag33366/smi/CliNER/code/notes/documents.py", line 86, in write
    concept_tuples = tok_labels_to_concepts(self._tok_sents, token_labels)
  File "/Users/ag33366/smi/CliNER/code/notes/documents.py", line 390, in tok_labels_to_concepts
    test_tok_labels = tok_concepts_to_labels(tokenized_sents, concepts)
  File "/Users/ag33366/smi/CliNER/code/notes/documents.py", line 249, in tok_concepts_to_labels
    print("Tok concept is ", tok_concepts)
IndexError: list assignment index out of range
from cliner.
Hey @tompollard,
I forgot to mention another thing. For the CRF model we use a 90/10 train/dev split, and for the LSTM we use a train/test/val split. I also changed the split, because the CRF split left the test set with 0 data points when using the LSTM. Below is the updated code from model.py (in the generic_train function):
# Slice val and test first; reassign train last so every slice sees the full list.
val_sents = train_sents[:ind]
test_sents = train_sents[ind:2*ind]
train_sents = train_sents[2*ind:]
val_labels = train_labels[:ind]
test_labels = train_labels[ind:2*ind]
train_labels = train_labels[2*ind:]
This takes care of another IndexError, raised at:
print (Datasets_tokens['valid'][0])
print (Datasets_tokens['test'][0])
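The same three-way split can also be sketched as a small standalone helper. This is an illustration under assumptions, not the code in model.py: the name three_way_split and the frac parameter are hypothetical, and how ind is actually computed in generic_train is not shown in this thread. The helper takes all three slices from the original lists and fails early if the validation or test set would be empty, which is the situation that makes Datasets_tokens['test'][0] raise.

```python
def three_way_split(sents, labels, frac=0.1):
    """Split parallel lists into (train, val, test) pairs without overlap.

    Hypothetical helper; frac is the assumed share given to each of val and test.
    """
    if len(sents) != len(labels):
        raise ValueError("sents and labels must have the same length")
    ind = int(len(sents) * frac)
    if ind == 0:
        raise ValueError("too few sentences for a non-empty val/test split")
    # All slices are taken from the original lists, so none depends on another.
    val = (sents[:ind], labels[:ind])
    test = (sents[ind:2 * ind], labels[ind:2 * ind])
    train = (sents[2 * ind:], labels[2 * ind:])
    return train, val, test


# Toy data for illustration only.
sents = [["tok%d" % i] for i in range(20)]
labels = [["O"] for _ in range(20)]
train, val, test = three_way_split(sents, labels)
print(len(train[0]), len(val[0]), len(test[0]))
```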
Thank you.
-Ketki
from cliner.
Hello,
If you could update documents.py at line 86 to this:
concept_tuples = tok_labels_to_concepts(self._tok_sents, token_labels[0])
This will solve the problem.
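The thread does not say why indexing with [0] helps. One plausible reading, stated here as an assumption, is that the LSTM path returns the labels wrapped in an extra outer list, so they no longer line up with the tokenized sentences. A minimal sketch of a shape check that would catch this (the helper name labels_match_sents is hypothetical, not part of CliNER):

```python
def labels_match_sents(tok_sents, token_labels):
    """Return True if token_labels holds one label per token, sentence by sentence."""
    if len(tok_sents) != len(token_labels):
        return False
    return all(len(sent) == len(labs) for sent, labs in zip(tok_sents, token_labels))


# Toy data for illustration only.
tok_sents = [["chest", "pain"], ["no", "fever"]]
flat = [["B-problem", "I-problem"], ["O", "B-problem"]]
wrapped = [flat]  # assumed shape: one extra outer list around the labels

print(labels_match_sents(tok_sents, flat))
print(labels_match_sents(tok_sents, wrapped))
print(labels_match_sents(tok_sents, wrapped[0]))
```

Running a check like this inside write, before the call to tok_labels_to_concepts, would confirm whether the nesting is really the cause for a given model.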
Thank you.
-Ketki
from cliner.
Hi @Ketki-Savle, great to hear that you got to the bottom of this, and thanks for the posts. It would be good to try to merge your fixes into the repo, so that we can solve the problems for other people too. Are you comfortable making a pull request with your changes, or would you prefer if someone else took this on? Thanks! Tom
from cliner.