
gpt-2-pytorch's Introduction

GPT2-Pytorch with Text-Generator

Better Language Models and Their Implications

Our model, called GPT-2 (a successor to GPT), was trained simply to predict the next word in 40GB of Internet text. Due to our concerns about malicious applications of the technology, we are not releasing the trained model. As an experiment in responsible disclosure, we are instead releasing a much smaller model for researchers to experiment with, as well as a technical paper. (from the OpenAI blog)

This repository is a simple PyTorch implementation of the GPT-2 text generator, with the original code condensed into a compact form.

Quick Start

  1. Download the GPT-2 pre-trained PyTorch model that huggingface/pytorch-pretrained-BERT has already converted. (Thanks for sharing! It solved my problem of transferring the TensorFlow checkpoint (ckpt) file to a PyTorch model.)
$ git clone https://github.com/graykode/gpt-2-Pytorch && cd gpt-2-Pytorch
# download huggingface's pytorch model 
$ curl --output gpt2-pytorch_model.bin https://s3.amazonaws.com/models.huggingface.co/bert/gpt2-pytorch_model.bin
# setup requirements; if using macOS, run the additional setup described below
$ pip install -r requirements.txt
  2. Now you can run it like this:
  • Text from the book 1984 by George Orwell:
$ python main.py --text "It was a bright cold day in April, and the clocks were striking thirteen. Winston Smith, his chin nuzzled into his breast in an effort to escape the vile wind, slipped quickly through the glass doors of Victory Mansions, though not quickly enough to prevent a swirl of gritty dust from entering along with him."
  3. You can also quick-start in Google Colab.

Options

  • --text : the prompt sentence to begin generation with.
  • --quiet : suppress extraneous output such as the "================" separators.
  • --nsamples : number of samples drawn per batch when multinomial sampling is used.
  • --unconditional : if set, generate unconditionally (without a prompt).
  • --batch_size : batch size.
  • --length : number of tokens to generate (must be less than the context size).
  • --temperature : sampling temperature applied to the output distribution (default 0.7).
  • --top_k : sample only from the k most likely tokens at each step (default 40).

See here for more detail about the temperature and top_k options; a minimal sketch of how they affect sampling follows below.
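For illustration, here is a minimal sketch (not the repository's exact code) of how temperature and top-k filtering are typically applied to the next-token logits before sampling:

import torch
import torch.nn.functional as F

def sample_next_token(logits, temperature=0.7, top_k=40):
    # logits: 1-D tensor of vocabulary scores for the next token
    logits = logits / temperature                    # <1 sharpens, >1 flattens the distribution
    if top_k > 0:
        values, _ = torch.topk(logits, top_k)        # values are sorted in descending order
        cutoff = values[-1]                          # k-th largest logit
        logits[logits < cutoff] = -float("inf")      # mask everything below the cutoff
    probs = F.softmax(logits, dim=-1)
    return torch.multinomial(probs, num_samples=1)   # draw one token id

Lower temperatures and smaller top_k make the output more conservative; higher values make it more diverse.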

Dependencies

  • PyTorch 0.4.1+
  • regex 2017.4.5

Mac OS Setup

$ python3 -m venv venv
$ source venv/bin/activate
$ pip install torch tqdm
# libomp provides the OpenMP runtime that PyTorch needs on macOS
$ brew install libomp
# use a UTF-8 locale so text encoding/decoding works correctly
$ export LC_ALL=en_US.UTF-8
$ export LANG=en_US.UTF-8
$ pip install -r requirements.txt

Author

License

  • OpenAI/GPT2 follows the MIT license; huggingface/pytorch-pretrained-BERT follows the Apache license.
  • I follow the MIT license, in line with the original GPT2 repository.

Acknowledgement

Thanks to Jeff Wu (@WuTheFWasThat) and Thomas Wolf (@thomwolf) for allowing their code to be referenced.

gpt-2-pytorch's People

Contributors

graykode, raveenb


gpt-2-pytorch's Issues

Use my finetuned model?

I would very much like to know how I can use my own fine-tuned model that I trained using Colab to generate text. I have a bunch of checkpoints but I am uncertain how to proceed from here and (re)produce a bin file.
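One possible route, sketched here under the assumption that the checkpoints are ordinary PyTorch state dicts (the checkpoint path and wrapper key below are hypothetical), is to re-save just the model weights as the single .bin file that main.py loads:

import torch

# Load a fine-tuning checkpoint (path is hypothetical) and extract the weights.
checkpoint = torch.load("checkpoints/finetune-step-1000.pt", map_location="cpu")

# Some trainers wrap the weights under a key such as "model_state_dict";
# fall back to the object itself if it is already a plain state dict.
state_dict = checkpoint.get("model_state_dict", checkpoint)

# Save in the single-file format expected by main.py.
torch.save(state_dict, "gpt2-pytorch_model.bin")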

Cannot recognize <|endoftext|>

Thank you for this project! It is very helpful for understanding how GPT-2 synthesizes text.

I also noticed that GPT2/encoder.py does not implement the ability to recognize special tokens, as the HuggingFace tokenizer does.

The part of source code in HuggingFace's repo is at https://github.com/huggingface/transformers/blob/c836f77266be9ace47bff472f63caf71c0d11333/src/transformers/tokenization_utils.py#L516-L520

I understand that it is not critical, because only one special token, <|endoftext|>, is in use (wangkuiyi/huggingface-tokenizer-in-cxx#11).

So, just saying.
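For anyone who needs it, a small workaround sketch (assuming an encoder object that exposes encode(), such as the BPE encoder from GPT2/encoder.py or the HuggingFace tokenizer) is to split on the special token yourself and splice its known id, 50256, back in between the segments:

ENDOFTEXT = "<|endoftext|>"
ENDOFTEXT_ID = 50256  # id of <|endoftext|> in the standard GPT-2 vocabulary

def encode_with_endoftext(enc, text):
    # Split the text on the special token, encode each segment normally,
    # and insert the special token's id between segments.
    ids = []
    for i, segment in enumerate(text.split(ENDOFTEXT)):
        if i > 0:
            ids.append(ENDOFTEXT_ID)
        if segment:
            ids.extend(enc.encode(segment))
    return ids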

Missing requirements

It needs these packages as well, so I guess they need to go into requirements.txt:

torch
tqdm

Can we use transfer learning on GPT2?

Hi, I am new to this field. Can we do transfer learning with a new dataset that contains domain-specific content (food, electronics, and so on) and train the model?
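This repository focuses on text generation (inference), but a rough fine-tuning sketch using the separate Hugging Face transformers library (an external package, not part of this repo; domain_texts below is a placeholder for your own corpus) could look like this:

import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

# Placeholder corpus; replace with your own domain text (food, electronics, ...).
domain_texts = ["An example sentence about food.", "An example sentence about electronics."]

model.train()
for epoch in range(3):
    for text in domain_texts:
        input_ids = tokenizer.encode(text, return_tensors="pt")
        loss = model(input_ids, labels=input_ids)[0]  # language-modeling loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

model.save_pretrained("gpt2-finetuned")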

Discrepancy in Parameter Size of Smallest Model

I have been using an implementation of GPT-2 from your repository and noticed that the size of the smallest GPT-2 model available in the repository differs from the smallest model mentioned in the original GPT-2 paper.
Specifically, the parameter count of the smallest model in the repository is about 124M, but the smallest model in the original paper is listed as 117M.

I am curious to know why there is this difference.
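As an independent check, the Hugging Face transformers library (an external package, not this repository's code) reports roughly 124M parameters for the small model, of which the token-embedding matrix alone (50257 x 768) accounts for about 38.6M:

from transformers import GPT2LMHeadModel

model = GPT2LMHeadModel.from_pretrained("gpt2")
n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params / 1e6:.1f}M parameters")  # about 124M for the small model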

Invalid Syntax

I installed Python 2 and followed the instructions in the readme, but I'm getting an 'Invalid Syntax' error at the closing quote of the following command. I retyped the command in case of a copy/paste artifact, but I get the same error.

main.py --text "It was a bright cold day in April, and the clocks were striking thirteen. Winston Smith, his chin nuzzled into his breast in an effort to escape the vile wind, slipped quickly through the glass doors of Victory Mansions, though not quickly enough to prevent a swirl of gritty dust from entering along with him."

GPT-2 implementation problem

"Hi, I am reading the GPT-2 paper and encountering a problem with the following phrase related to implementation:

'A modified initialization method is used to account for the accumulation on the residual path with model depth. We scale the weights of residual layers at initialization by a factor of 1/√N, where N is the number of residual layers.'

My problem is that we normalize after accumulation (addition then normalization). So, why do we need to scale weights? Aren't we doing this to reduce the impact of accumulation?"
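For context: GPT-2 uses pre-layer normalization (the LayerNorm sits at the input of each sub-block), so the residual stream itself is never renormalized between blocks and its variance grows with every residual addition; the 1/√N scaling of the residual projections at initialization compensates for that growth. A hedged PyTorch sketch of the idea (the c_proj name follows GPT-2's layer naming, and N = 2 * n_layer because each block adds two residual branches, attention and MLP):

import math
import torch.nn as nn

def init_gpt2_weights(model, n_layer, base_std=0.02):
    # Standard normal init everywhere, then rescale the residual-path output
    # projections (c_proj) by 1/sqrt(2 * n_layer), as described in the paper.
    for name, param in model.named_parameters():
        if name.endswith("c_proj.weight"):
            nn.init.normal_(param, mean=0.0, std=base_std / math.sqrt(2 * n_layer))
        elif name.endswith("weight") and param.dim() >= 2:
            nn.init.normal_(param, mean=0.0, std=base_std)
        elif name.endswith("bias"):
            nn.init.zeros_(param)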

Help Increasing the amount of training/fine-tuning text to about 10k words

Hello,
I am trying to train/fine-tune the GPT-2 model using your wrapper. I have successfully gotten it to train from a text file, but I would like to train the model on a larger amount of text, around 10,000 words on a specific topic/domain, and have it generate 500-1000 words. However, I keep getting a strange error when I try this.
How do I increase the amount of training/fine-tuning text from the current limit to about 10,000 words?
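One common cause of errors with long inputs is exceeding GPT-2's 1024-token context window; a simple workaround sketch is to tokenize the whole corpus once and split it into context-sized blocks before training (the tokenizer here is any object with an encode() method, e.g. GPT2Tokenizer from the transformers library or this repo's BPE encoder):

def make_training_blocks(tokenizer, text, block_size=1024):
    # Encode the whole corpus once, then slice it into non-overlapping blocks
    # no longer than GPT-2's 1024-token context window.
    ids = tokenizer.encode(text)
    return [ids[i:i + block_size] for i in range(0, len(ids), block_size)]

# Usage sketch:
# blocks = make_training_blocks(tokenizer, open("corpus.txt").read())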

training

Is there any way to train GPT-2 using my own text corpus?
