Comments (7)

orhanf commented on August 18, 2024

@jli05 would you mind stating the pros and cons of this in the context of NMT and the dl4mt repo here? Thanks a lot.

from dl4mt-tutorial.

jli05 commented on August 18, 2024

Basically, I hope this and future repos achieve more than merely serving as a tool for academic research.

Cons:

  1. Time spent refactoring the current software/data systems to work with Python 3.4+. The amount of effort is proportional to the size of the current systems.
  2. A small learning curve, half a day at most.

Pros:

  1. Evolving with time. Python 2.7 reaches end of life in 2020 (PEP 373), and Ubuntu is phasing out installing Python 2.7 by default. It's not hard to imagine newer releases of RHEL adopting similar policies.
  2. Allows contributions from other parties who develop with a modern choice of languages.
  3. Allows constant maintenance of the repos, so that it's a thriving experience for whoever maintains them.

Theano is one of the rare academic repos that is constantly maintained and streamlined, and it can rival corporate offerings in certain respects. If we google for BlinkDB, the last commit was two years ago -- it remained a good concept that earned the author a paper. I believe software has a wider and deeper impact than publications today. I know this repo was initially put together as a tutorial; however, I'd wish more for it and all future sequels: to be a point of reference for NMT that showcases excellence in both algorithms and brick-and-mortar code.


kyunghyuncho commented on August 18, 2024

@jli05 Thanks for sharing your view! I think this definitely makes sense.

@orhanf How about we don't fully validate the changes (as in training a full model and verifying that we get the same BLEU scores for all the models and language pairs we've tried), but simply make it runnable with Python 3?

@jli05 Would you kindly help us move toward Python 3 by making a PR for, say, one of the sessions?
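
For the "runnable with Python 3" goal above, here is a minimal sketch of the kind of changes such a port typically involves. It is not an audit of the dl4mt-tutorial code: the function names `count_words` and `roundtrip` are hypothetical and only illustrate idioms that run unchanged on both Python 2.7 and Python 3.

```python
from __future__ import print_function  # print() as a function; no-op on Python 3

import io
import pickle  # on Python 3, plain pickle replaces Python 2's cPickle


def count_words(lines):
    """Count whitespace-separated tokens (hypothetical helper)."""
    counts = {}
    for line in lines:
        for word in line.split():
            counts[word] = counts.get(word, 0) + 1
    return counts


def roundtrip(obj):
    """Pickle and unpickle an object through an in-memory binary buffer."""
    buf = io.BytesIO()
    # Protocol 2 can be read by both Python 2.7 and Python 3.
    pickle.dump(obj, buf, protocol=2)
    buf.seek(0)
    return pickle.load(buf)


counts = count_words(["a b a", "b c"])

# dict.items() exists on both versions; dict.iteritems() is Python 2 only.
for word, n in sorted(counts.items()):
    print(word, n)

# range() exists on both versions; xrange() is Python 2 only.
total = sum(counts[w] for w in counts)
```

Files opened for pickling need binary mode (`"rb"` / `"wb"`) on Python 3, which is why the sketch uses a bytes buffer.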


jli05 commented on August 18, 2024

Sure, I'll do it in the coming 1-2 weeks. Thanks @kyunghyuncho @orhanf!


jli05 commented on August 18, 2024

Could we confirm the exact workflow in data/?

My understanding is that for the simplest setup, we could run setup_local_env.sh. Is that sufficient? It makes no call to preprocess.sh?

All the sessions take in the tokenised wiki dump and its associated dictionary from wiki.tok.txt.gz and wiki.tok.txt.gz.pkl. Currently we don't have scripts for generating them?
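
To make the question concrete, here is a rough sketch of how a dictionary like wiki.tok.txt.gz.pkl could be produced from a tokenised, gzipped text file under Python 3. Everything about the layout is an assumption on my part, not something confirmed in this thread: the function name `build_dictionary` is hypothetical, as are the frequency-ordered ids and the reserved `eos`/`UNK` entries.

```python
import gzip
import pickle
from collections import Counter


def build_dictionary(tok_path, dict_path):
    """Build a word -> integer id mapping from a gzipped tokenised file.

    Assumed layout (hypothetical): ids 0 and 1 are reserved, and the
    remaining words get ids in order of descending frequency.
    """
    freqs = Counter()
    with gzip.open(tok_path, "rt", encoding="utf-8") as f:
        for line in f:
            freqs.update(line.split())

    # Most frequent words first; ties broken alphabetically so the
    # result is deterministic.
    ordered = sorted(freqs, key=lambda w: (-freqs[w], w))

    worddict = {"eos": 0, "UNK": 1}  # assumed reserved entries
    for idx, word in enumerate(ordered, start=2):
        worddict[word] = idx

    # Binary mode is required for pickle on Python 3.
    with open(dict_path, "wb") as f:
        pickle.dump(worddict, f, protocol=2)
    return worddict
```

It would be called as `build_dictionary("wiki.tok.txt.gz", "wiki.tok.txt.gz.pkl")`. Whether the sessions expect exactly this layout would need checking against the scripts in data/.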


kyunghyuncho commented on August 18, 2024

@orhanf can you answer this?


orhanf commented on August 18, 2024

@jli05, a detailed description is provided in #53, but let me clarify this further here.

setup_local_env.sh was the initial script provided in the repo, intended to download example data and preprocess it (tokenization only). Later on, in order to use subword units (BPE), another script was added (preprocess.sh), which preprocesses existing data (tokenize, learn BPE, apply BPE, and shuffle).

We have now merged both functionalities into setup_local_env.sh and made it call preprocess.sh optionally (with the -b flag) when you want to use BPE.

