
gpt2's Introduction

Hi there 👋

Badges: Gmail · GitHub User's stars · Hits · Kaggle

This is an overview of my GitHub activities. For more details, check out my CV.

βš’οΈ Projects

  • GPT2 - A GPT-2 PyTorch implementation. [github] [demo] [demo-korean]
  • canrevan - A fast and efficient Naver news crawler. [github] [pypi]
  • Inverse DALL-E - Optical Character Recognition with Autoregressive Image Token Generation. [github]
  • Coverist-AI-Research - Generate book cover images with DALL-E. [github]
  • ALREADYME.md AI - Generate README.md with GPT-3 few-shot learning. [research repo] [serving repo]
  • Deploy KoGPT - A tutorial on deploying a large-scale language model with NVIDIA Triton and FasterTransformer. [github]
  • polyglot-jax-inference - Jax/Flax inference code for the GPT-NeoX model on TPU. [github]
  • starcode-jax - Jax/Flax inference code for the StarCoder model on TPU. [github]
  • deit3-jax - A Jax/Flax implementation of DeiT and DeiT-III. [github]

πŸ“ Contributions

  • 🤗 Transformers - Change DataCollatorForSeq2Seq to pad labels to a multiple of pad_to_multiple_of (see the sketch after this list). [github] [pr]
  • 🤗 Transformers - Pix2Struct: fix wrong broadcast axis of attention mask in visual encoder. [github] [pr]
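
For context on the first contribution, here is a minimal sketch of the behavior it enables in 🤗 Transformers (the tokenizer checkpoint and token ids are illustrative choices, not from the PR itself):

    from transformers import AutoTokenizer, DataCollatorForSeq2Seq

    tokenizer = AutoTokenizer.from_pretrained("t5-small")
    collator = DataCollatorForSeq2Seq(tokenizer, pad_to_multiple_of=8)

    features = [
        {"input_ids": [1, 2, 3], "labels": [4, 5]},
        {"input_ids": [1, 2], "labels": [6, 7, 8]},
    ]
    batch = collator(features)

    # With the linked PR, `labels` (padded with -100) is padded to a multiple
    # of 8 just like `input_ids`, which keeps tensor shapes friendly to fp16
    # Tensor Core kernels.
    print(batch["labels"].shape)  # torch.Size([2, 8])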

πŸ† Kaggle competitions

Check out my profile for more details.

  • πŸŽ–οΈ Google - American Sign Language Fingerspelling Recognition - solo gold medal & money prize (5/1315) [overview] [github]
  • πŸŽ–οΈ Feedback Prize - Evaluating Student Writing - solo gold medal & money prize (4/2060) [overview] [github]
  • πŸ₯ˆ Learning Equality - Curriculum Recommendations - top 1% (13/1057) [overview]
  • πŸ₯ˆ G2Net Detecting Continuous Gravitational Waves - top 1% (12/936) [overview] [github]
  • πŸ₯ˆ CommonLit Readability Prize - top 1% (42/3633) [overview] [github]
  • πŸ₯ˆ Feedback Prize - Predicting Effective Arguments - top 2% (40/1567) [overview] [github]
  • πŸ₯ˆ Riiid Answer Correctness Prediction - top 2% (78/3395) [overview] [github]
  • πŸ₯ˆ Google AI4Code – Understand Code in Python Notebooks - top 3% (39/1135) [overview] [github]
  • πŸ₯ˆ Benetech - Making Graphs Accessible - top 4% (26/619) [overview]
  • πŸ₯ˆ Bristol-Myers Squibb – Molecular Translation - top 5% (50/874) [overview] [github]
  • πŸ₯‰ Quick, Draw! Doodle Recognition Challenge - top 5% (74/1309) [overview]
  • πŸ₯‰ Humpback Whale Identification - top 8% (189/2120) [overview]

💬 Contact

Please check out the badges above to contact me. You can also open an issue in the corresponding repository or tag me (@affjljoo3581) in issues/PRs/commits on GitHub.


gpt2's Issues

Multi GPU mode is stuck at the beginning

Hi @affjljoo3581,
Thank you very much for your work. When I run the demo it gets stuck, but without --gpus it works well (only on my first GPU):
[root@gpu02]:~/kb/src# python -m gpt2 train --train_corpus ../build/corpus.train.txt \

                 --eval_corpus            ../build/corpus.test.txt \
                 --vocab_path             ../build/vocab.txt \
                 --dims                   1024 \
                 --batch_train            128 \
                 --batch_eval             128 \
                 --seq_len                64 \
                 --total_steps            3000 \
                 --eval_steps             500 \
                 --save_steps             3000 \
                 --gpus                   4 \
                 --save_checkpoint_path   ckpt-gpt2.pth \
                 --save_model_path        gpt2-pretrained.pth

Train GPT-2 model: 0%| | 0/3000 [00:00<?, ?it/s]
How can I fix this so that training proceeds?

Is Apex useful for GPT-2?

Hi, does using Apex reduce the size of the GPT-2 model, and does it make inference faster?
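
For reference, a minimal sketch of how Apex mixed precision is typically wired in (assuming NVIDIA Apex and a CUDA GPU are available; this is illustrative, not this repo's code). Mixed precision mainly cuts activation/optimizer memory and speeds up compute on Tensor Core GPUs; the saved checkpoint does not shrink unless you export FP16 weights yourself:

    import torch
    from apex import amp  # assumes NVIDIA Apex is installed

    model = torch.nn.Linear(1024, 1024).cuda()
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

    # O2 keeps FP16 model weights with FP32 master weights in the optimizer.
    model, optimizer = amp.initialize(model, optimizer, opt_level="O2")

    loss = model(torch.randn(8, 1024, device="cuda")).mean()
    with amp.scale_loss(loss, optimizer) as scaled_loss:
        scaled_loss.backward()
    optimizer.step()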

Activation Function

The paper Improving Language Understanding by Generative Pre-Training (GPT-1) says that GELU was used as the activation function.
Which activation function is used in this code?

Also, can you tell me the reason for adding Swish?
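
For reference, the two activations being discussed, written out in PyTorch (a sketch of the standard definitions, not necessarily this repo's exact code):

    import math
    import torch

    def gelu(x):
        # Exact GELU via the Gaussian CDF; equivalent to torch.nn.functional.gelu.
        return 0.5 * x * (1.0 + torch.erf(x / math.sqrt(2.0)))

    def swish(x):
        # Swish (a.k.a. SiLU): x * sigmoid(x).
        return x * torch.sigmoid(x)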

Which kind of tokenizer do you use? It looks like WordPiece, not BPE.

OpenAI's GPT-2 implementation uses BPE for its tokenizer, which needs two files: a .json file containing the vocabulary and a .txt file containing the merges.
Your implementation uses only one vocab.txt file, and some vocabulary entries start with '##', as your tokenization.py implies.
So do you use WordPiece, not BPE?
(I'm not a native English speaker; sorry for my poor English.)
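
To illustrate the distinction being asked about: WordPiece needs only one vocabulary file because word-internal pieces are marked with a leading '##', while GPT-2's byte-level BPE additionally needs merges.txt to know how to combine symbols. A minimal sketch of the '##' convention:

    def wordpiece_detokenize(pieces):
        # Pieces starting with '##' continue the previous word;
        # everything else starts a new word.
        words = []
        for piece in pieces:
            if piece.startswith("##") and words:
                words[-1] += piece[2:]
            else:
                words.append(piece)
        return " ".join(words)

    print(wordpiece_detokenize(["token", "##ization", "is", "fun"]))
    # -> "tokenization is fun"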

Bidirectional training in GPT-2

Thanks for sharing the code. May I ask whether there is a way to fine-tune the pre-trained GPT-2 with bidirectional training, or with a hybrid of uni- and bidirectional training together? Thanks for any tips.
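
One common way to frame this question is through the attention mask: unidirectional, bidirectional, and prefix-LM hybrids differ only in which positions may attend to which. A minimal sketch (illustrative; not from this repo):

    import torch

    seq_len = 8
    causal = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))  # GPT-style
    bidirectional = torch.ones(seq_len, seq_len, dtype=torch.bool)       # BERT-style

    # Prefix-LM hybrid (UniLM-style): every position attends to the first
    # k prefix tokens bidirectionally, and causally elsewhere.
    k = 3
    hybrid = causal.clone()
    hybrid[:, :k] = True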

Training spec #2

Could you share more details about the training run from the issue comment that reports a loss of 3.2398?

For example, things such as the scheduler, optimizer beta1/beta2, dropout probability, gradient clipping, learning rate, warmup steps, layer normalization, etc.

I would just like to learn some tips for configuring these parameters.

Thank you.
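
To make the question concrete, here is the kind of configuration being asked about, sketched in PyTorch with illustrative placeholder values (these are not the settings behind the reported 3.2398 loss):

    import torch

    model = torch.nn.Linear(768, 768)  # stand-in for the Transformer
    optimizer = torch.optim.AdamW(model.parameters(), lr=2.5e-4,
                                  betas=(0.9, 0.999), weight_decay=0.01)

    # Linear warmup followed by linear decay.
    warmup_steps, total_steps = 1000, 100000
    scheduler = torch.optim.lr_scheduler.LambdaLR(
        optimizer,
        lambda step: min((step + 1) / warmup_steps,
                         max(0.0, (total_steps - step) / (total_steps - warmup_steps))))

    # Gradient clipping, applied each step before optimizer.step().
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)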

Confusion about Usage

Hi! I'm new to GPT-2 and to this project. Thanks for sharing this awesome project! I ran into problems when trying to run the code by following the usage section.

After preparing the datasets, you can train GPT-2 as follows:
$ python -m gpt2 train --train_corpus build/corpus.train.txt \ ...

Here you use gpt2 as a Python module, which is not mentioned in the earlier usage section. I want to know what I should do to run this code and pretrain the GPT-2 model. Looking forward to your reply!
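
For anyone hitting the same confusion: `python -m gpt2` executes the gpt2 package's __main__ module, so the package only has to be importable, e.g. by running the command from the repository directory that contains the gpt2 package (the multi-GPU issue above runs it from a src/ directory). A sketch of the equivalent in plain Python (the checkout path is hypothetical):

    import runpy
    import sys

    # Make the gpt2 package importable; adjust to your checkout location.
    sys.path.insert(0, "/path/to/GPT2/src")  # hypothetical path

    # Equivalent to: python -m gpt2 train --train_corpus build/corpus.train.txt
    sys.argv = ["gpt2", "train", "--train_corpus", "build/corpus.train.txt"]
    runpy.run_module("gpt2", run_name="__main__")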

Training spec

I have a question about the training spec of your model. I would like to know the sequence length, batch size, training time, GPU type, number of GPUs, number of training samples, and the final loss.
You seem to have achieved a loss of 3.7. Could you describe the training parameters used to reach that performance?

def add_subparser(subparsers: argparse._SubParsersAction):

Are these the parameters used to get that loss?
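
For context, `add_subparser` is the usual argparse pattern for registering a subcommand and its flags; the hyperparameters parsed there are indeed what the training run uses. A minimal sketch (the flag names here are illustrative, not the repo's full set):

    import argparse

    def add_subparser(subparsers: argparse._SubParsersAction):
        # Register the `train` subcommand; the flags it declares become
        # the run's hyperparameters.
        parser = subparsers.add_parser("train")
        parser.add_argument("--train_corpus", required=True)
        parser.add_argument("--total_steps", type=int, default=3000)

    parser = argparse.ArgumentParser(prog="gpt2")
    add_subparser(parser.add_subparsers(dest="command"))
    args = parser.parse_args(["train", "--train_corpus", "corpus.txt"])
    print(args.total_steps)  # 3000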

GPT-2 implementation problem

"Hi, I am reading the GPT-2 paper and encountering a problem with the following phrase related to implementation:

'A modified initialization method is used to account for the accumulation on the residual path with model depth. We scale the weights of residual layers at initialization by a factor of 1/√N, where N is the number of residual layers.'

My problem is that we normalize after accumulation (addition then normalization). So, why do we need to scale weights? Aren't we doing this to reduce the impact of accumulation?"
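
One way to see why both are needed: in GPT-2's pre-LN blocks, LayerNorm is applied to the input of each sublayer, but the residual stream itself remains a raw sum of N sublayer outputs, so its variance still grows with depth at initialization; scaling each residual branch's output projection by 1/√N keeps that variance roughly constant. A minimal sketch of the scaled init (the 0.02 base std and the 2·n_layer count follow OpenAI's released GPT-2 code; treat the sizes as illustrative):

    import math
    import torch.nn as nn

    n_layer = 12                 # transformer blocks
    n_residual = 2 * n_layer     # each block adds two residual branches (attn + MLP)

    # Output projection that feeds the residual sum.
    proj = nn.Linear(768, 768)
    nn.init.normal_(proj.weight, mean=0.0, std=0.02 / math.sqrt(n_residual))
    nn.init.zeros_(proj.bias)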

A question about the dataset

Hello. First of all, thank you very much for releasing the Korean GPT-2 pretrained model. It is a really wonderful project. I have one question: among the datasets you mentioned on Facebook, could you tell me exactly what the 'web social data' refers to?
