Simple phi-2 chatbot

This repository contains a simple chatbot using the microsoft/phi-2 large language model.

Here you can see a GIF showcasing a short conversation.

GIF with chatbot demo

Quickstart

If you want to quickly run the code, you have three options:

  1. The easiest way is to run the Simple phi-2 chatbot notebook on Google Colab.
  2. Download the code and run the simple_phi2_chatbot.ipynb notebook (first install the dependencies listed in requirements.txt).
  3. Download the code and run gradio app.py (first install the dependencies listed in requirements.txt).

⚠️ Warning: Inference is much faster on a GPU than on a CPU.

How it was done

This chatbot demo was done in the following steps:

Get familiar with the model

The first step was to read up on the microsoft/phi-2 model and get familiar with how it performs: its advantages, issues and flaws. For that, I experimented with it in a dedicated notebook (experiments.ipynb). Have a look at it: there you can find the first experiments I performed and some of the conclusions I reached.

Search for a chatbot UI in Gradio

Gradio is a nice way to build AI demos, such as chatbots. I searched for sample code that uses Gradio to build chatbots. Fortunately, there is a ChatInterface class that allows developers to quickly create a chatbot-like panel. Moreover, there is sample code showing how to integrate that class with your own language model.
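As a minimal sketch of that integration, assuming a recent Gradio release: gr.ChatInterface takes a function that receives the new message and the chat history and returns the reply. The echo_reply and build_demo names are hypothetical, and the echo logic is only a placeholder for a real call to the language model.

```python
def echo_reply(message, history):
    """Placeholder reply function; a real chatbot would query the model here.

    Gradio calls this with the latest user message and the previous turns.
    """
    return f"You said: {message}"


def build_demo():
    """Create the chat panel (hypothetical helper, not from the repository)."""
    # Imported lazily so echo_reply can be tried without Gradio installed.
    import gradio as gr

    return gr.ChatInterface(fn=echo_reply)
```

Calling build_demo().launch() would start the local web UI.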

Adapt the interface to work with phi-2

The original code used a different model (togethercomputer/RedPajama-INCITE-Chat-3B-v1), so I had to adapt it to work with phi-2. Besides the obvious changes, such as changing the name when loading the tokenizer and model, other changes were needed. Only the following are highlighted here:

  • Add some context to the chat (see the CONTEXT variable) and change the chat format from <user>: to just User:.
  • Change the stopping criteria in StopOnTokens. This class needed to be updated to use the id of the <|endoftext|> token of the phi-2 tokenizer. The newline token \n was later added as well to prevent the model from hallucinating weird text, although this hurts the model when it needs to output legitimate multi-line content.
  • Create a new stopping criterion class called StopOnNames. It prevents the model from hallucinating future turns of the conversation by stopping generation when it finds a token sequence that represents a new line followed by a speaker name (e.g. \nUser:).

    Note: This became obsolete after the newline character was added as a stopping criterion. However, it was intentionally left in the code in case it is useful for a more refined stopping mechanism.
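The prompt format and the stopping checks described above can be sketched roughly as follows. This is not the repository's code: the CONTEXT wording, the Assistant: speaker label, the history format (a list of user/bot pairs) and the helper names build_prompt, ends_with_any and make_stopping_criteria are all assumptions made for illustration. The stop check is written as a plain function over token ids so it can be tried without loading a model.

```python
# Hypothetical context text; the repository's CONTEXT variable may differ.
CONTEXT = "This is a conversation between a user and a helpful assistant."


def build_prompt(message, history, context=CONTEXT):
    """Format the conversation as plain 'User:' / 'Assistant:' turns."""
    lines = [context]
    for user_text, bot_text in history:
        lines.append(f"User: {user_text}")
        lines.append(f"Assistant: {bot_text}")
    lines.append(f"User: {message}")
    lines.append("Assistant:")  # the model continues from here
    return "\n".join(lines)


def ends_with_any(token_ids, stop_sequences):
    """Return True if token_ids ends with any of the given id sequences."""
    for stop in stop_sequences:
        n = len(stop)
        if n and len(token_ids) >= n and list(token_ids[-n:]) == list(stop):
            return True
    return False


def make_stopping_criteria(stop_sequences):
    """Wrap ends_with_any in a transformers StoppingCriteria.

    transformers is imported lazily so the helpers above work without it.
    """
    from transformers import StoppingCriteria, StoppingCriteriaList

    class StopOnSequences(StoppingCriteria):
        def __call__(self, input_ids, scores, **kwargs):
            # Only the first sequence in the batch is checked.
            return ends_with_any(input_ids[0].tolist(), stop_sequences)

    return StoppingCriteriaList([StopOnSequences()])
```

With a loaded tokenizer, single-token stops such as [tokenizer.eos_token_id] and tokenizer.encode("\n") play the role of StopOnTokens, while a multi-token sequence such as tokenizer.encode("\nUser:") plays the role of StopOnNames. The resulting list would be passed to model.generate through its stopping_criteria argument.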

Add GPU and CPU support

The sample code on the microsoft/phi-2 model page uses the GPU by default and did not work when running on a CPU. CPU support was added with a device variable that stores the device the model should run on, and by changing the load parameters depending on that device.
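One way such device handling could look is sketched below. The source does not say which load parameters change, so the choice of half precision on GPU and full precision on CPU is an assumption, as are the helper names pick_device, load_settings and load_model.

```python
def pick_device(cuda_available):
    """Return the device name to run on."""
    return "cuda" if cuda_available else "cpu"


def load_settings(device):
    """Choose load parameters per device (assumed: only the dtype differs)."""
    if device == "cuda":
        return {"dtype_name": "float16"}  # half precision saves GPU memory
    return {"dtype_name": "float32"}  # half precision is poorly supported on CPU


def load_model(model_name="microsoft/phi-2"):
    """Load tokenizer and model on the available device (lazy heavy imports)."""
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    device = pick_device(torch.cuda.is_available())
    dtype = getattr(torch, load_settings(device)["dtype_name"])
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=dtype)
    model.to(device)
    return tokenizer, model, device
```

Keeping the device decision in small pure functions makes it easy to check both branches on a machine that has no GPU.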

Add multi language support

As the specifications explain, phi-2 is an English model:

Language Limitations: The model is primarily designed to understand standard English. Informal English, slang, or any other languages might pose challenges to its comprehension, leading to potential misinterpretations or errors in response.

However, it is still somewhat capable of speaking other languages. To try that, a LANG variable was created (allowed values are EN for English, the default, and ES for Spanish). Setting this variable changes the language of the CONTEXT conditioning, which pushes the model to speak in that language. The model's performance in Spanish is limited.
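A language switch of this kind could be sketched as a lookup from the LANG value to the context text. The context sentences and the get_context helper below are hypothetical; only the LANG values EN and ES come from the description above.

```python
# Hypothetical context texts, one per supported language code.
CONTEXTS = {
    "EN": "This is a conversation between a user and a helpful assistant.",
    "ES": "Esta es una conversación entre un usuario y un asistente servicial.",
}

LANG = "EN"  # default language; set to "ES" for Spanish


def get_context(lang=LANG):
    """Return the conditioning text for the given language code."""
    try:
        return CONTEXTS[lang]
    except KeyError:
        raise ValueError(f"Unsupported LANG value: {lang!r}") from None
```

Rejecting unknown codes explicitly avoids silently falling back to English when LANG is mistyped.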

Adapt the code for different run configurations

Finally, the code was adapted to run in different configurations: using the gradio command, in a local Jupyter notebook, or on Google Colab. See the Quickstart section.

Contributors

gnuevo
