
opengpt's Introduction

OpenGPT

A framework for creating grounded, instruction-based datasets and training conversational domain-expert Large Language Models (LLMs).

Learn more in our blog: AI for Healthcare | Introducing OpenGPT.

NHS-LLM

A conversational model for healthcare trained using OpenGPT. All the medical datasets used to train this model were created using OpenGPT and are available below.

Available datasets

  • NHS UK Q/A: 24,665 question and answer pairs. Prompt used: f53cf99826. Generated via OpenGPT using data available on the NHS UK website. Download here
  • NHS UK Conversations: 2,354 unique conversations. Prompt used: f4df95ec69. Generated via OpenGPT using data available on the NHS UK website. Download here
  • Medical Task/Solution: 4,688 pairs. Prompt used: 5755564c19. Generated via OpenGPT using GPT-4. Download here

All datasets are in the /data folder.

Installation

pip install opengpt

If you are working with LLaMA models, you will also need some extra requirements:

pip install -r ./llama_train_requirements.txt

Tutorials

How to

  1. Start by collecting a base dataset in a certain domain. For example, collect definitions of all diseases (e.g. from NHS UK). You can find a small sample dataset here. The collected dataset must have a column named text, where each row of the CSV holds one disease definition.

  2. Find a prompt matching your use case in the prompt database, or create a new prompt using the Prompt Creation Notebook. The prompt will be used to generate tasks/solutions based on the context (the dataset collected in step 1).

  3. Edit the config file for dataset generation and add the appropriate prompts and datasets (example config file).
  4. Run the Dataset generation notebook.
  5. Edit the train_config file and add the datasets you want to use for training.
  6. Use the train notebook or run the training scripts to train a model on the new dataset you created.
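
As a rough illustration of step 1, the sketch below writes a base dataset as a CSV with a single text column and checks that column before the data is passed on to dataset generation. It uses only the Python standard library and is not part of OpenGPT; the helper names and the sample definitions are hypothetical placeholders.

```python
import csv
import io

REQUIRED_COLUMN = "text"  # column name required by step 1


def write_base_dataset(definitions, handle):
    """Write one definition per row under a single 'text' column."""
    writer = csv.DictWriter(handle, fieldnames=[REQUIRED_COLUMN])
    writer.writeheader()
    for definition in definitions:
        writer.writerow({REQUIRED_COLUMN: definition.strip()})


def validate_base_dataset(handle):
    """Return the row count; raise if 'text' is missing or a row is empty."""
    reader = csv.DictReader(handle)
    if REQUIRED_COLUMN not in (reader.fieldnames or []):
        raise ValueError("base dataset needs a column named 'text'")
    count = 0
    for row in reader:
        if not (row[REQUIRED_COLUMN] or "").strip():
            raise ValueError("empty definition in data row %d" % (count + 1))
        count += 1
    return count


# Hypothetical placeholder rows, not real NHS UK content.
sample_definitions = [
    "Placeholder definition of condition A.",
    "Placeholder definition of condition B.",
]

buffer = io.StringIO()  # stands in for a file opened with newline=""
write_base_dataset(sample_definitions, buffer)
buffer.seek(0)
row_count = validate_base_dataset(buffer)
```

In practice the in-memory buffer would be replaced by a real file, and the resulting CSV path would go into the dataset generation config from step 3.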

If you have any questions, please check out our Discourse forum.

More Examples

opengpt's People

Contributors

linglongqian, w-is-h

opengpt's Issues

wget example_train_config.yaml points to wrong URL

!wget https://github.com/CogStack/OpenGPT/blob/main/configs/example_train_config.yaml ended up downloading the GitHub HTML page instead of the YAML file (and subsequently throws an error, of course).

I think you want to use
!wget https://raw.githubusercontent.com/CogStack/OpenGPT/main/configs/example_train_config.yaml instead
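
The URL change suggested above can be sketched as a small helper that converts a GitHub "blob" page URL into its raw-content equivalent. This is only an illustration: the function name is hypothetical, and it assumes the branch or tag name contains no slash (true for main here).

```python
from urllib.parse import urlparse


def github_blob_to_raw(url):
    """Convert a github.com 'blob' URL to a raw.githubusercontent.com URL.

    URLs that do not look like owner/repo/blob/ref/path are returned unchanged.
    Assumes the ref (branch or tag) has no '/' in its name.
    """
    parsed = urlparse(url)
    parts = parsed.path.strip("/").split("/")
    if parsed.netloc != "github.com" or len(parts) < 5 or parts[2] != "blob":
        return url
    owner, repo, _, ref, *file_path = parts
    return "https://raw.githubusercontent.com/" + "/".join(
        [owner, repo, ref, *file_path]
    )


blob_url = (
    "https://github.com/CogStack/OpenGPT/blob/main/configs/example_train_config.yaml"
)
raw_url = github_blob_to_raw(blob_url)
```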

ERROR: Cannot determine archive format when running pip install -r ./llama_train_requirements.txt in venv (Resolved)

OS: Windows 11
Virtual environment: venv
Python version: 3.11

Issue:
Any time I ran pip install -r ./llama_train_requirements.txt, the install failed with "ERROR: Cannot determine archive format (trail to pip)".

Resolution:
I fixed it by adding "git+" to the transformers git address in the /llama_train_requirements.txt file.

protobuf==3.20.3
accelerate
git+https://github.com/huggingface/transformers
sentencepiece

Afterwards it ran without issue.
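
The same fix could be applied programmatically. The sketch below prefixes bare GitHub repository URLs in a requirements list with git+, which tells pip to clone the repository rather than treat the URL as an archive. The helper name is hypothetical, and the "before" list is an assumption reconstructed from this issue, not a copy of the original file.

```python
GITHUB_PREFIXES = ("https://github.com/", "http://github.com/")
ARCHIVE_SUFFIXES = (".zip", ".tar.gz", ".tgz", ".whl")


def add_git_prefix(line):
    """Prefix a bare GitHub repository URL with 'git+'; leave other lines as-is."""
    requirement = line.strip()
    is_bare_repo_url = requirement.startswith(
        GITHUB_PREFIXES
    ) and not requirement.endswith(ARCHIVE_SUFFIXES)
    return "git+" + requirement if is_bare_repo_url else requirement


# Assumed contents of the file before the fix (inferred from the issue text).
original_lines = [
    "protobuf==3.20.3",
    "accelerate",
    "https://github.com/huggingface/transformers",
    "sentencepiece",
]
fixed_lines = [add_git_prefix(line) for line in original_lines]
```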

Model weights available?

Thanks for your brilliant work! I'm wondering if the model weights are available for us to download. Cheers.

Processed NHS articles for all URLs mentioned in the complete data

I'm looking at nhs_conditions_small_sample/original_data.csv and wondering whether similar data exists for all the URLs referenced in the processed QA and Conversation datasets. If not, is there any code to extract and process these from the NHS website?

Price for dataset generation

Hi team,

Thanks for this great project. I'd like to know how much you paid to complete the dataset generation task.

Excellent Project

I was thinking of exactly this sort of method just a day or two ago! I'm glad to see it formalized and proven out.

Are you all interested in working on more? If you'd like to collaborate, let me know!
