neubig / nn4nlp-code

Code Samples from Neural Networks for NLP

License: Other

Languages: Jupyter Notebook 2.94%, Python 94.64%, Shell 0.14%, Perl 2.27%

nn4nlp-code's Issues

about input file

Thank you for your work. Can you share the original data file with us? Thanks a lot!

03-wordemb-pytorch/wordemb-cbow.py: softmax missing?

Shouldn't a softmax be added to the model output?
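
A minimal sketch of why an explicit softmax layer may be unnecessary. It assumes (the issue does not say) that the script trains with a cross-entropy loss that normalizes internally, as PyTorch's `torch.nn.CrossEntropyLoss` does by combining log-softmax with negative log-likelihood. The helper names below are hypothetical and written in plain Python so the arithmetic is visible.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of raw scores."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def cross_entropy_from_logits(logits, target):
    """Cross-entropy computed directly from unnormalized scores."""
    return -log_softmax(logits)[target]

logits = [2.0, -1.0, 0.5]          # raw model outputs, no softmax applied
probs = [math.exp(lp) for lp in log_softmax(logits)]
loss = cross_entropy_from_logits(logits, target=0)

# The loss already equals -log(softmax(logits)[target]) ...
same_loss = -math.log(probs[0])
# ... and softmax never changes which word scores highest.
best_from_logits = max(range(len(logits)), key=logits.__getitem__)
best_from_probs = max(range(len(probs)), key=probs.__getitem__)
```

Under that assumption, adding a softmax to the model output would normalize twice during training; it is only needed when actual probabilities have to be reported.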

MST decoder in biaffine parser

I find the comments in the MST decoder confusing. The MST decoder does not run a full Chu-Liu-Edmonds MST; it is based on the heuristics used by the Stanford CoNLL 2017 parser. Moreover, Stanford's code assumes the scores are normalized by softmax (hence multiplicative scoring), whereas here the code passes unnormalized scores, which are basically log-probabilities (ignoring the softmax denominator, which is fine because it affects every candidate parent of a node uniformly). I think all division operations in the MST decoder should be replaced by subtraction, and 0s should be replaced by -inf.
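
The proposed change can be sketched as follows. This is not the decoder from the repo or from the Stanford parser; the function names and the toy numbers are hypothetical. It only illustrates that a ratio comparison on probabilities and a difference comparison on log-scores select the same candidate, and that a probability of 0 corresponds to a score of -inf. The per-node offsets stand in for the dropped softmax denominators.

```python
import math

NEG_INF = float("-inf")

def to_log_score(prob, offset):
    """Map a probability to an unnormalized log-score; 0 becomes -inf."""
    return math.log(prob) + offset if prob > 0.0 else NEG_INF

def pick_by_ratio(old_probs, new_probs):
    """Probability space: choose the node whose head change keeps the largest ratio."""
    ratios = [new / old for old, new in zip(old_probs, new_probs)]
    return max(range(len(ratios)), key=ratios.__getitem__)

def pick_by_difference(old_scores, new_scores):
    """Log space: the same choice, with division replaced by subtraction."""
    diffs = [new - old for old, new in zip(old_scores, new_scores)]
    return max(range(len(diffs)), key=diffs.__getitem__)

old_probs = [0.6, 0.5, 0.9]
new_probs = [0.3, 0.4, 0.0]        # 0.0 marks a disallowed head
offsets = [1.7, -0.4, 2.2]         # one arbitrary constant per node

old_scores = [to_log_score(p, c) for p, c in zip(old_probs, offsets)]
new_scores = [to_log_score(p, c) for p, c in zip(new_probs, offsets)]

chosen_prob = pick_by_ratio(old_probs, new_probs)
chosen_log = pick_by_difference(old_scores, new_scores)
```

Because both scores of a node share the same offset, the offset cancels in the subtraction, which is why the missing denominator does not matter here.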

Questions about the code in nn4nlp-code/16-reinforce/bilstm-tagger.py

I have a few questions about the code, listed below. I hope someone can help.

1) In line 166, what is "XENT"? This variable appears only once in the code.
2) In line 106, why is the probability distribution calculated that way? (By the way, my compiler says the as_array() part is not right.)
3) In line 109, "len(x)" is not right. It keeps raising "TypeError: object of type '_dynet.Expression' has no len()". I also don't understand why the samples are picked this way.

The performance of the code under 01-intro-pytorch is very poor.

Has anyone run those scripts? The performance is really bad. Did anyone verify the code before it was uploaded to the repo?

Updating PyTorch code

I would like to know whether there are any plans to complete the PyTorch code, and whether you would accept merge requests from non-class members.

Add generation to 06-rnn LM examples

Both language model examples in 06-rnn are lacking generation capability and usage examples. I think those should be added, and the output format changed to be closer to that of the 02-lm examples.
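
A minimal sketch of what such a generation routine could look like. Everything here is hypothetical: `next_word_probs` stands in for the trained RNN's softmax output given the words so far, and the `</s>` end symbol and `max_len` cutoff are assumptions, not the conventions of the existing examples.

```python
import random

def generate_sentence(next_word_probs, eos="</s>", max_len=20, rng=None):
    """Sample words one at a time until the end symbol or the length limit."""
    rng = rng or random.Random()
    sentence = []
    while len(sentence) < max_len:
        dist = next_word_probs(sentence)   # dict: word -> probability
        words = list(dist)
        word = rng.choices(words, weights=[dist[w] for w in words], k=1)[0]
        if word == eos:
            break
        sentence.append(word)
    return sentence

def toy_lm(history):
    """Stand-in for a trained model: a tiny hand-written distribution."""
    if not history:
        return {"the": 1.0}
    if history[-1] == "the":
        return {"cat": 0.5, "dog": 0.5}
    return {"</s>": 1.0}

sample = generate_sentence(toy_lm, rng=random.Random(0))
print(" ".join(sample))
```

In a real script the distribution would come from running the RNN one step on the previously sampled word, and a few samples could be printed after each epoch.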

Errors while executing

I have a few questions about the code in "16-reinforce", listed below. I hope you can help.

1) line 106 throws an error: '_dynet.Expression' object has no attribute 'as_array'
2) line 109 throws an error: '_dynet.Expression' object has no attribute 'len(x)'

Can you please help me find a solution?

enc_dec.py def generate(sent):

There are some errors in the function generate(sent).

Question regarding bilstm-tagger.py

Hi, thanks for sharing the source code; I learned a lot. I have a quick question about the REINFORCE score: why is it computed as score * reward, as shown here?

line 126 of bilstm-tagger.py

    # then calculate the reinforce scores using reinforce
    reinforce_scores = [r_s*score for r_s, score in zip(rewards_over_baseline, scores)]
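
One way to see where the product comes from is the score-function identity behind REINFORCE: the gradient of the expected reward equals the expectation of (reward minus baseline) times the gradient of the log-probability of the sampled action. The sketch below checks this numerically for a small softmax policy. It is an illustration only; all names are hypothetical, and whether `scores` in the script holds log-probabilities or negative log-probabilities (which flips the sign) is not visible from the excerpt.

```python
import math

def softmax(theta):
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    z = sum(exps)
    return [e / z for e in exps]

def expected_reward(theta, rewards):
    return sum(p * r for p, r in zip(softmax(theta), rewards))

def score_function_gradient(theta, rewards, baseline=0.0):
    """E_a[(R_a - b) * d log p(a) / d theta], summed exactly over all actions."""
    probs = softmax(theta)
    grad = [0.0] * len(theta)
    for a, (p_a, r_a) in enumerate(zip(probs, rewards)):
        for k in range(len(theta)):
            dlogp = (1.0 if a == k else 0.0) - probs[k]
            grad[k] += p_a * (r_a - baseline) * dlogp
    return grad

def finite_difference_gradient(theta, rewards, eps=1e-6):
    """Direct numerical gradient of the expected reward, for comparison."""
    grad = []
    for k in range(len(theta)):
        up, down = list(theta), list(theta)
        up[k] += eps
        down[k] -= eps
        grad.append((expected_reward(up, rewards) - expected_reward(down, rewards)) / (2 * eps))
    return grad

theta = [0.2, -0.5, 1.0]
rewards = [1.0, 0.0, 0.5]
g_score = score_function_gradient(theta, rewards)
g_base = score_function_gradient(theta, rewards, baseline=0.5)
g_fd = finite_difference_gradient(theta, rewards)
```

The three gradients agree up to numerical error, which also shows that subtracting a baseline leaves the expected gradient unchanged; in practice it is used to reduce the variance of the sampled estimate.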

The version of Python

Which version of Python does the course use?

Unnecessary parameter() calls in dynet examples

Nearly all of the DyNet examples contain unnecessary parameter(parameter_name) calls that convert parameters to expressions. These are no longer needed, because the newest version of the DyNet Python bindings does the conversion automatically.

Model() should also be renamed to ParameterCollection.

Computing the number of words

Most files share similar data reading code, like

nn4nlp-code/01-intro/cbow.py

Lines 18 to 22 in a9e8be5

    train = list(read_dataset("../data/classes/train.txt"))
    w2i = defaultdict(lambda: UNK, w2i)
    dev = list(read_dataset("../data/classes/test.txt"))
    nwords = len(w2i)
    ntags = len(t2i)

In most of the examples, the variable nwords is used as the effective vocabulary size, for instance when allocating parameters for the embedding matrix.

nn4nlp-code/01-intro/cbow.py

Line 30 in a9e8be5

W_emb = model.add_lookup_parameters((nwords, EMB_SIZE)) # Word embeddings

However, the dev/test set likely contains many words that were not seen in training. Looking them up in the defaultdict adds them to w2i: their values are mapped to UNK, but they are still counted in len(w2i), which is likely not intended. Often this overcounting does not change the results, but it can be problematic in some cases.
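
A small self-contained sketch of the overcounting and one possible fix: record the vocabulary size before any dev/test data is read. The helper `build_vocab` and the toy sentences are hypothetical and not taken from the repo.

```python
from collections import defaultdict

def build_vocab(train_sentences, unk_token="<unk>"):
    """Assign ids from training data only and fix the vocabulary size."""
    w2i = defaultdict(lambda: len(w2i))
    unk = w2i[unk_token]
    for sent in train_sentences:
        for word in sent:
            w2i[word]                      # lookup assigns the next free id
    nwords = len(w2i)                      # measured before dev/test is read
    frozen = defaultdict(lambda: unk, w2i) # unseen words now map to UNK
    return frozen, nwords

train = [["the", "cat", "sat"], ["the", "dog"]]
dev = [["the", "bird", "sat"]]

w2i, nwords = build_vocab(train)
dev_ids = [[w2i[w] for w in sent] for sent in dev]

# "bird" was inserted as a key by the lookup, so len(w2i) has grown,
# while nwords still reflects the training vocabulary.
overcounted = len(w2i)
```

Every id produced for the dev set stays below `nwords`, so an embedding matrix of that size is sufficient; using `w2i.get(word, unk)` instead of a defaultdict would avoid the insertion altogether.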

Updating during dev evaluation in loglin-lm.py

This is a bug and should be removed:
https://github.com/neubig/nn4nlp-code/blob/master/02-lm/loglin-lm.py#L97

Other files should be checked as well.
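
A sketch of the intended behaviour, assuming the linked line performs a parameter update inside the dev loop (the line itself is not quoted here). The toy model and the `update` flag are hypothetical; the point is only that evaluation computes the loss without changing the parameters.

```python
class ToyModel:
    """Hypothetical one-parameter model y = w * x with squared-error loss."""

    def __init__(self):
        self.w = 0.0

    def loss_and_grad(self, x, y):
        err = self.w * x - y
        return err * err, 2.0 * err * x

    def sgd_step(self, grad, lr=0.1):
        self.w -= lr * grad

def run_epoch(model, data, update):
    """Return the average loss; change parameters only when update is True."""
    total = 0.0
    for x, y in data:
        loss, grad = model.loss_and_grad(x, y)
        total += loss
        if update:
            model.sgd_step(grad)
    return total / len(data)

model = ToyModel()
train_data = [(1.0, 2.0), (2.0, 4.0)]
dev_data = [(3.0, 6.0)]

train_loss = run_epoch(model, train_data, update=True)
w_before = model.w
dev_loss = run_epoch(model, dev_data, update=False)
w_after = model.w
```

With the flag in place the dev loss measures generalization only, since the model never takes a gradient step on dev examples.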

Doubt regarding the code

I have a few questions about the code in "16-reinforce", listed below. I hope you can help.

1) line 106 throws an error: '_dynet.Expression' object has no attribute 'as_array'
2) line 109 throws an error: '_dynet.Expression' object has no attribute 'len(x)'

Can you please help me find a solution?
