
How to run Large Language Models FLAN-T5 and GPT locally

Hello everyone, today we are going to run Large Language Models (LLMs) locally: Google's FLAN-T5 and a GPT-2-like model, GPT-Neo.

If you are building new applications with LLMs and need a local development environment, this tutorial explains how to set one up.

Introduction

FLAN-T5

FLAN-T5 is a Large Language Model open-sourced by Google under the Apache 2.0 license at the end of 2022. It is available in several sizes; see the model card:

  • google/flan-t5-small: 80M parameters; 300 MB download
  • google/flan-t5-base: 250M parameters
  • google/flan-t5-large: 780M parameters; 1 GB download
  • google/flan-t5-xl: 3B parameters; 12 GB download
  • google/flan-t5-xxl: 11B parameters

FLAN-T5 builds on the following model and technique (see the sketch after this list):

  • The pretrained model T5 (Text-to-Text Transfer Transformer)
  • The FLAN (Finetuning Language Models) collection, used to fine-tune the model on multiple tasks
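
Both ingredients show up directly in how the model is used: every task is plain text in, plain text out. Here is a minimal sketch of that text-to-text interface (the translation prefix is just example instruction text, not a fixed API):

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# The task is expressed entirely in the input string: text in, text out
tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-small")
inputs = tokenizer("translate English to German: How old are you?", return_tensors="pt")
print(tokenizer.batch_decode(model.generate(**inputs), skip_special_tokens=True))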

GPTNeo

The GPTNeo model was released in the EleutherAI/gpt-neo repository by Sid Black, Stella Biderman, Leo Gao, Phil Wang and Connor Leahy. It is a GPT-2-like causal language model trained on the Pile dataset.
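
As a quick preview before we set up the environment, here is a minimal sketch of sampling text from the smallest GPT-Neo checkpoint with the transformers pipeline API (we will do the same thing more explicitly later in the notebook):

from transformers import pipeline

# A causal LM simply continues the prompt left to right
generator = pipeline("text-generation", model="EleutherAI/gpt-neo-125M")
print(generator("The Pile is a dataset", max_length=30, do_sample=True)[0]["generated_text"])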

Step 1. Installation of Conda

First, install Anaconda from this link, for example into C:\Anaconda3, and then check that your terminal recognizes conda:

C:\>conda --version
conda 23.1.0

Step 2. Environment creation

The environment I will use is based on Python 3.10.

I will create an environment called LLM, but you can choose any name you like.

conda create -n LLM python=3.10

then we activate it:

conda activate LLM

then in your terminal type the following commands:

python -m pip install --upgrade pip
conda install ipykernel notebook

then

python -m ipykernel install --user --name LLM --display-name "Python (LLM)"

then we can install JupyterLab:

pip install jupyterlab

then we install the libraries needed for this project:

pip install transformers
pip install transformers[torch]

If your machine has CUDA capability, we are going to use CUDA 11.3 (matching the cu113 wheels below), which you can download here:

pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu113

otherwise, install the CPU-only build (either the pip or the conda command alone is enough):

pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cpu
conda install pytorch torchvision torchaudio cpuonly -c pytorch


pip install chardet
# PyMuPDF provides the "fitz" module; the separate "fitz" package on PyPI is unrelated
pip install PyMuPDF
conda install -c conda-forge ipywidgets
jupyter nbextension enable --py widgetsnbextension

If you have a different CUDA version installed, you can find the torch build that best fits your machine here.
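
To confirm which PyTorch build you ended up with, here is a quick check from Python:

import torch

print(torch.__version__)          # e.g. 1.12.1+cu113 for the CUDA build
print(torch.version.cuda)         # CUDA version of the build, or None for CPU-only
print(torch.cuda.is_available())  # True if a usable GPU is detected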

Next, we create a folder to work in:

mkdir workspace && cd workspace

then we open JupyterLab:

jupyter lab

then create a new notebook using the Python (LLM) kernel.


Google FLAN-T5

There are several FLAN-T5 checkpoints available. Let us try two of them:

  • google/flan-t5-small

  • google/flan-t5-large

Large Language Models FLAN-T5 and GPT locally

In this notebook we are going to run different versions of FLAN-T5 and GPT.

We define the following prompt:

prompt = "A step by step recipe to make bolognese pasta:"

FLAN-T5-small

Here we download about 300 MB of model weights:

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-small")
tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-small")
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
['Pour a cup of bolognese into a large bowl and add the pasta']
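
The recipe above is cut short because generate() produces only about 20 new tokens by default. Passing max_new_tokens gives the model room to finish; a minimal sketch:

outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))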

FLAN-T5-large

Here we download about 3 GB of model weights:

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-large")
tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-large")
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
['Toss the pasta with the sauce, then add the meat and toss again.']

CUDA Capability

import torch

is_cuda = torch.cuda.is_available()
if is_cuda:
    print("This computer uses CUDA")
else:
    print("This computer uses CPU")
This computer uses CUDA
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-large")
model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-large").to(device)
inputs = tokenizer(prompt, return_tensors="pt").to(device)
outputs = model.generate(**inputs)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
['Toss the pasta with the sauce, then add the meat and toss again.']
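
If GPU memory is tight for the larger checkpoints, you can load the weights in half precision; a sketch (torch_dtype is a standard from_pretrained argument):

import torch
from transformers import AutoModelForSeq2SeqLM

model = AutoModelForSeq2SeqLM.from_pretrained(
    "google/flan-t5-large",
    torch_dtype=torch.float16,  # halves the GPU memory needed for the weights
).to("cuda")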

GPT-Neo-125M

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-neo-125M")
model.to(device)
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neo-125M")
input_ids = tokenizer(prompt, return_tensors="pt").to(device) 
#outputs = model.generate(**input_ids)
outputs = model.generate(**input_ids, do_sample=True, max_length=400)
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
generation = tokenizer.batch_decode(outputs, skip_special_tokens=False)
print('\n'.join(generation))
A step by step recipe to make bolognese pasta:
Formalizing the ingredients and sauces
Wipe up the ingredients
Wipe the sauce without over or overstitching the ingredients.

FRENCH TENDER

I make a bolognese pasta for a wedding reception, using what I have learned from many other things but one thing about bolognese pasta is that I want a traditional Italian recipe for a wedding. This recipe has an Italian twist:

1/4 ounce Italian sausage cheese
1/4 ounce Italian sausage salt
2/3 ounce Italian sausage fresh ramekins
1/2 ounce Italian sausage chopped onions
1/2 ounce Italian sausage chopped parsley

In a shallow saucepan, melt butter and sauté onion
2/3 ounce Italian sausage with and sauce
salt and pepper
2/3 ounce Italian sausage chopped parsley

Thoroughly sauté onion until it softens
2/3 ounce Italian sausage with and sauce
salt and pepper

Preheat the oven to 325 degrees

Bring a large kettle to 50 to 60°

Sauté onion over medium-high heat
2/3 ounce Italian sausage with and sauce
salt and pepper

In a skillet lightly butter a small pot with 2/3 ounce Italian sauce
salt and pepper

Add 2/3 ounce Italian sauce seasoned with salt, pepper and mix well.
Sift together sausage, seasoning and garlic
Add enough water to bring to boil

Slowly pour in ingredients while stirring
Add enough water to bring to boil

Gravfry the pasta to lightly coat with oil
Place the pasta on the bottom rack with foil

Heat oil in skillet
Oil a small stovetop dish with hot brown sugar
Smoke the sauce. Place your fingers into the meat
sauté onion over hot oven(s)/heat
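
Because do_sample=True draws tokens at random, the text above will differ on every run. If you want reproducible or more conservative output, here is a sketch with a fixed seed and standard sampling parameters:

import torch

torch.manual_seed(42)  # fix the seed so sampling is repeatable
outputs = model.generate(**input_ids, do_sample=True, temperature=0.7, top_p=0.9, max_length=200)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True)[0])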

Hugging Face Environment

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
import torch
import gradio as gr
import re
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-large")
model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-large").to(device)
class GUI:
    def query(self, query, tokens=30):
        options = ""
        tok_len = tokens
        t5query = f"""Question: "{query}" Context: {options}"""
        inputs = tokenizer(t5query, return_tensors="pt").to(device)
        outputs = model.generate(**inputs, max_new_tokens=tok_len)
        return tokenizer.batch_decode(outputs, skip_special_tokens=True)

    def begin(self, question, tokens):
        results = self.query(question, tokens)
        return results

app = GUI()
title = "Get answers to questions with Flan-T5"
description = "Results will show up in a few seconds."
css = """.output_image, .input_image {height: 600px !important}"""

iface = gr.Interface(fn=app.begin, 
                     inputs=[ gr.Textbox(label="Question"),
                             gr.Slider(30, 100, value=30, step = 1)
                            ],
                     outputs = gr.Text(label="Answer Summary"),
                     title=title,
                     description=description,
                     #article=article,
                     css=css,
                     analytics_enabled = True, enable_queue=True)
iface.launch(inline=False, share=False, debug=False)
C:\Users\rusla\AppData\Local\Temp\ipykernel_4780\898455749.py:27: GradioDeprecationWarning: `enable_queue` is deprecated in `Interface()`, please use it within `launch()` instead.
  iface = gr.Interface(fn=app.begin,


Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
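
To expose the demo outside your machine, follow the hint above and pass share=True to launch(), which creates a temporary public Gradio URL:

iface.launch(share=True)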

FLAN-T5 vs GPT-Neo

Now let us bring all three models together and compare their outputs.

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, AutoModelForCausalLM
import torch
import gradio as gr
import re
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
class GUI:
    def query(self, query, modelo="flan-t5-small", tokens=100):
        options = ""
        tok_len = tokens
        t5query = f"""Question: "{query}" Context: {options}"""
        if modelo == "flan-t5-small" or modelo == "flan-t5-large":
            tokenizer = AutoTokenizer.from_pretrained("google/{}".format(modelo))
            model = AutoModelForSeq2SeqLM.from_pretrained("google/{}".format(modelo)).to(device)
            inputs = tokenizer(t5query, return_tensors="pt").to(device)
            outputs = model.generate(**inputs, max_new_tokens=tok_len)
        else:
            model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-neo-125M").to(device)
            tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neo-125M")
            input_ids = tokenizer(t5query, return_tensors="pt").to(device)
            outputs = model.generate(**input_ids, do_sample=True, max_length=tok_len)
        generation = tokenizer.batch_decode(outputs, skip_special_tokens=True)

        return '\n'.join(generation)

    def begin(self, question, modelo, tokens):
        results = self.query(question, modelo, tokens)
        return results

app = GUI()
title = "Get answers to questions with Flan-T5"
description = "Results will show up in a few seconds."
article="More info  <a href='https://ruslanmv.com/'>ruslanmv.com</a><br>" 
css = """.output_image, .input_image {height: 600px !important}"""

iface = gr.Interface(fn=app.begin, 
                     inputs=[ gr.Textbox(label="Question"),
                     gr.Radio(["flan-t5-small", "flan-t5-large","gpt-neo-125M"],label="Model",value="flan-t5-small"),
                     gr.Slider(30, 200, value=100, step = 1,label="Max Tokens"),],
                     outputs = gr.Text(label="Answer Summary"),
                     title=title,
                     description=description,
                     article=article,
                     css=css,
                     analytics_enabled = True
                    ,enable_queue=True)
iface.launch(inline=False, share=False, debug=False)
Running on local URL:  http://127.0.0.1:7860
To create a public link, set `share=True` in `launch()`.
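
You can also compare the models directly, without the UI; a quick sketch using the class defined above (each call loads the model it needs):

for m in ["flan-t5-small", "flan-t5-large", "gpt-neo-125M"]:
    print(m, "->", app.query("What is the capital of France?", m, 50))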

You can check out this program in the following link:

https://huggingface.co/spaces/ruslanmv/FLAN-T5-GPT


Uninstall Environment

If you have issues with the installation of the kernel, you can remove the kernel and the environment and try again.

To list the currently installed kernels, execute

jupyter kernelspec list

To remove the kernel, execute

jupyter kernelspec remove LLM

and to remove the environment

conda remove -n LLM --all

Only CPU

pip install -q git+https://github.com/huggingface/transformers.git
pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cpu
# PyMuPDF provides the "fitz" module; the separate "fitz" package on PyPI is unrelated
pip install PyMuPDF
