Comments (4)
Note that the optimization objective is not BLEU score:
https://github.com/kyunghyuncho/dl4mt-material/blob/master/session3/nmt.py#L649-L659
This paper discusses some issues with the discrepancy between training and
inference in this kind of seq2seq model http://arxiv.org/pdf/1506.03099.pdf
On Thu, Jan 28, 2016 at 11:39 AM, Jencir Lee [email protected] wrote:

1. I'm training session3/nmt.py with attention on the Europarl corpus, with a 5000-term vocabulary, 250-dimensional word vectors, and a 500-dimensional internal representation. A full epoch takes a couple of days on an AWS GPU instance (Nvidia K40) with 4 GB of GPU memory. I'm just wondering if there's any known more basic parallel corpus (e.g. understandable by a 10-year-old) for training.
2. The BLEU metric could make it anodyne for the translation to omit some pivotal words, e.g. if the correct translation is "I believe, that xxxx" and the machine translation omitted "believe". Is there any idea for an MT approach that could better conserve the structural/compositional information?

Reply to this email directly or view it on GitHub: #37.
from dl4mt-tutorial.
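To make the distinction concrete, here is a minimal, hypothetical sketch (not code from this repo) of the per-token cross-entropy objective that seq2seq models of this kind typically minimize during training. BLEU never appears in it, which is one source of the training/inference discrepancy the linked paper discusses:

```python
import numpy as np

def cross_entropy_loss(probs, target_ids):
    """Mean negative log-likelihood of the reference tokens.

    probs: array of shape (timesteps, vocab_size); each row holds the
           model's predicted next-token distribution at that step.
    target_ids: list of reference token ids, one per timestep.
    """
    # Pick out the probability the model assigned to each reference token,
    # then average the negative log-probabilities.
    nll = -np.log([probs[t, tok] for t, tok in enumerate(target_ids)])
    return nll.mean()

# Toy example: a 4-token vocabulary and two decoding steps.
probs = np.array([[0.7, 0.1, 0.1, 0.1],
                  [0.2, 0.6, 0.1, 0.1]])
target = [0, 1]
loss = cross_entropy_loss(probs, target)  # mean of -log(0.7) and -log(0.6)
```

Note that the loss rewards putting probability mass on the exact reference token at each step; two translations with identical BLEU can have very different cross-entropy, and vice versa.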
Also this one http://arxiv.org/pdf/1511.06456.pdf
As always, thanks a lot @chrishokamp. I may further add these two:
http://arxiv.org/abs/1512.02433
http://research.microsoft.com/apps/pubs/default.aspx?id=217163
But note that none of these ideas is implemented in this repo.
@jli05 For your first question, maybe you can try using TED talks (check IWSLT) or OpenSubtitles (OPUS); I'm not sure, though.
Sorry, I just meant that the metric generally gives equal emphasis to each term/n-gram (apart from the NIST score), so it doesn't reflect well that some errors (e.g. missing the verb of the entire sentence) are graver than others. Is it possible for the translation engine to conserve some structural information of the source text, or at least for the metric to reflect this? (I think S. Bowman did a study concluding that LSTMs actually preserve some tree-like information.)
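The concern about BLEU's equal weighting can be demonstrated with a small sketch of modified n-gram precision, the core quantity inside BLEU. The sentences and helper below are illustrative, not taken from the repo or any corpus:

```python
from collections import Counter

def ngram_precision(candidate, reference, n):
    """Modified n-gram precision: fraction of candidate n-grams that also
    occur in the reference, with counts clipped to the reference counts."""
    cand = [tuple(candidate[i:i + n]) for i in range(len(candidate) - n + 1)]
    ref = Counter(tuple(reference[i:i + n]) for i in range(len(reference) - n + 1))
    if not cand:
        return 0.0
    clipped = Counter()
    for g in cand:
        if clipped[g] < ref[g]:  # clip matches at the reference count
            clipped[g] += 1
    return sum(clipped.values()) / len(cand)

ref = "i believe that this is true".split()
drop_verb = "i that this is true".split()        # omits the pivotal "believe"
drop_filler = "i believe that this true".split() # omits the minor "is"

p1_verb = ngram_precision(drop_verb, ref, 1)     # unigram precision
p1_filler = ngram_precision(drop_filler, ref, 1)
p2_verb = ngram_precision(drop_verb, ref, 2)     # bigram precision
p2_filler = ngram_precision(drop_filler, ref, 2)
```

Here both truncated candidates score identically at the unigram (1.0) and bigram (0.75) levels, even though one dropped the main verb: the n-gram counting is blind to which word went missing, which is exactly the insensitivity described above.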
Thanks for the corpus suggestions. A side question: when someone sets out to compile a corpus for their NLP training tasks, how much corpus is enough? Is there any practical rule of thumb? I ask because NLP is different from images; if we were working on an image set, we could tell ourselves: "OK, we've got 20 giraffe pictures. We're roughly fine."