
onlyphantom / llm-python

Large Language Models (LLMs) tutorials & sample scripts, ft. langchain, openai, llamaindex, gpt, chromadb & pinecone

Home Page: https://www.youtube.com/playlist?list=PLXsFtK46HZxUQERRbOmuGoqbMD-KWLkOS

License: MIT License

Python 40.13% Jupyter Notebook 59.87%
langchain chromadb gpt-3 langchain-python llamaindex openai-api llm llmops pinecone tutorial

llm-python's Introduction

llm-python

A set of instructional materials, code samples, and Python scripts featuring LLMs (GPT, etc.) through interfaces like LlamaIndex, LangChain, Chroma (ChromaDB), and Pinecone. Mainly used to store reference code for my LangChain tutorials on YouTube.

LangChain YouTube tutorials

Learn LangChain from my YouTube channel (~8 hours of hands-on LLM building tutorials). Each lesson is accompanied by the corresponding code in this repo and is designed to be self-contained, while still focusing on some key concepts in LLM (large language model) development and tooling.

Feel free to pick and choose your starting point based on your learning goals:

| Part | LLM Tutorial | Link | Video Duration |
|------|--------------|------|----------------|
| 1 | OpenAI tutorial and video walkthrough | Tutorial Video | 26:56 |
| 2 | LangChain + OpenAI tutorial: Building a Q&A system w/ own text data | Tutorial Video | 20:00 |
| 3 | LangChain + OpenAI to chat w/ (query) own Database / CSV | Tutorial Video | 19:30 |
| 4 | LangChain + HuggingFace's Inference API (no OpenAI credits required!) | Tutorial Video | 24:36 |
| 5 | Understanding Embeddings in LLMs | Tutorial Video | 29:22 |
| 6 | Query any website with LlamaIndex + GPT-3 (ft. ChromaDB, Trafilatura) | Tutorial Video | 11:11 |
| 7 | Locally-hosted, offline LLM w/ LlamaIndex + OPT (open-source, instruction-tuned LLM) | Tutorial Video | 32:27 |
| 8 | Building an AI Language Tutor: Pinecone + LlamaIndex + GPT-3 + BeautifulSoup | Tutorial Video | 51:08 |
| 9 | Building a queryable journal 💬 w/ OpenAI, markdown & LlamaIndex 🦙 | Tutorial Video | 40:29 |
| 10 | Making a Sci-Fi game w/ Cohere LLM + Stability.ai: Generative AI tutorial | Tutorial Video | 1:02:20 |
| 11 | GPT builds entire party invitation app from prompt (ft. SMOL Developer) | Tutorial Video | 41:33 |
| 12 | A language for LLM prompt design: Guidance | Tutorial Video | 43:15 |
| 13 | You should use LangChain's Caching! | Tutorial Video | 25:37 |
| 14 | Build Chat AI apps with Streamlit + LangChain | Tutorial Video | 32:11 |

The full lesson playlist can be found here.

Quick Start

  1. Clone this repo
  2. Install requirements: pip install -r requirements.txt
  3. Some sample data is provided in the news folder, but you can use your own data by replacing its contents (or adding to them) with your own text files.
  4. Create a .env file which contains your OpenAI API key. You can get one from here. HUGGINGFACEHUB_API_TOKEN and PINECONE_API_KEY are optional, but they are used in some of the lessons.
    • Lesson 10 uses Cohere and Stability AI, both of which offer a free tier (no credit card required). You can add the respective keys as COHERE_API_KEY and STABILITY_API_KEY in the .env file.

The .env file should look like this:

OPENAI_API_KEY=your_api_key_here

# optionals (not required for most of the series)
HUGGINGFACEHUB_API_TOKEN=your_api_token_here
PINECONE_API_KEY=your_api_key_here

HuggingFace and Pinecone are optional but recommended if you want to use the Inference API and explore models outside of the OpenAI ecosystem. This is demonstrated in Part 3 of the tutorial series.

  5. Run the examples in any order you want. For example, python 6_team.py will run the website Q&A example, which uses GPT-3 to answer questions about a company and the team of people working at Supertype.ai. Watch the corresponding video to follow along with each of the examples.
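The lessons read these keys from the environment (typically via python-dotenv's load_dotenv). If you want to see what that loading amounts to, here is a minimal stdlib-only stand-in — a sketch for illustration, not the loader the repo actually uses:

```python
import os

def load_env_file(path=".env"):
    """Parse simple KEY=value lines from a .env-style file into os.environ.

    Blank lines, comments ('#'), and lines without '=' are skipped.
    Variables already present in the environment are not overwritten.
    """
    if not os.path.exists(path):
        return
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip())

load_env_file()
print("OPENAI_API_KEY set:", "OPENAI_API_KEY" in os.environ)
```

In practice, `from dotenv import load_dotenv; load_dotenv()` does the same job with better edge-case handling (quoting, export prefixes), so prefer it when python-dotenv is installed.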

Dependencies

💡 Thanks to the work of @VanillaMacchiato, this project is updated as of 2023-06-30 to use the latest version of LlamaIndex (0.6.31) and LangChain (0.0.209). Installing the dependencies should be as simple as pip install -r requirements.txt. If you encounter any issues, please let me know.

If you're watching the LLM video tutorials, they may have very minor differences (typically 1-2 lines of code that need to be changed) from the code in this repo, since the videos were released with the library versions current at the time of recording (LlamaIndex 0.5.7 and LangChain 0.0.157). Please refer to this repo for the latest version of the code.

I will try to keep this repo up to date with the latest version of the libraries, but if you encounter any issues, please: (1) raise a discussion through Issues or (2) volunteer a PR to update the code.

NOTE: the triton package is supported only on the x86_64 architecture. If you have problems installing it (errors like ERROR: Could not find a version that satisfies the requirement triton (from versions: none) or ERROR: No matching distribution found for triton), see the triton compatibility guide. uname -p should give you the processor's name.
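If uname is unavailable (e.g. on Windows), the standard library's platform module reports the same information portably:

```python
import platform

# x86_64 (reported as AMD64 on Windows) is the architecture triton supports;
# arm64/aarch64 machines will need to skip or comment out triton.
print(platform.machine())
```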

Mentorship and Support

I run a mentorship program under Supertype Fellowship. The program is self-paced and free, with a community of other learners and practitioners around the world (English-speaking). You can optionally book a 1-on-1 session with my team of mentors to help you through video tutoring and code reviews.

License

MIT © Supertype 2024

llm-python's People

Contributors

gin, nicholas-camarda, onlyphantom, somsub, vanillamacchiato


llm-python's Issues

Query execution hangs for 07_custom.py

Hi there, I was trying to get the 07_custom.py program to run with the facebook/opt-iml-1.3b model. I can see it loads the cache correctly, and I put in enough print statements to see that it was also able to get the LLMPredictor, create the service context, and load the index from disk. However, when it tries to call execute_query, the program seemingly hangs. I can see my RAM usage spike for an extended period of time, but no matter how long I wait (20 minutes?) I don't get a response from the model. Note that I am running with an AMD GPU, so when creating the pipeline I removed the CUDA device specification because, as far as I can tell, CUDA is not supported on AMD GPUs. Do I need a more powerful computer or CUDA to run this?

Here are my specifications:

OS: Windows 11
Processor AMD Ryzen 7 5800H with Radeon Graphics 3.20 GHz
Installed RAM 16.0 GB (13.9 GB usable)
Device ID XXXXXXXXXXXXXX
Product ID 00342-20715-34612-AAOEM
System type 64-bit operating system, x64-based processor
GPU 0: AMD Radeon RX 6600M
GPU 1: AMD Radeon(TM) Graphics
Pen and touch Pen support

Thanks for your help!

ChromaDb doesn't work in 01_qna.py

ChromaDB doesn't work whenever I try to pass it to RetrievalQA.from_chain_type as a retriever. It gives me this error:
'NoneType' object has no attribute 'info'

Although it is the same code as yours, only the imports differ due to different LangChain versions.
Thanks in advance

llama_index PromptHelper Issue in custom script

I am facing these 2 errors; kindly help me figure out what I am missing.

_script.py", line 47, in <module>
    prompt_helper = PromptHelper(
TypeError: PromptHelper.__init__() got an unexpected keyword argument 'max_input_size'

_script.py", line 47, in <module>
    prompt_helper = PromptHelper(
TypeError: PromptHelper.__init__() got an unexpected keyword argument 'max_chunk_overlap'

Code:

prompt_helper = PromptHelper(
# maximum input size
max_input_size=2048,
# number of output tokens
num_output=256,
# the maximum overlap between chunks.
max_chunk_overlap=20,
)
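Those errors typically mean a newer LlamaIndex is installed than the one the script was written for. Around LlamaIndex 0.6.x, PromptHelper's constructor arguments were renamed: to my knowledge, max_input_size became context_window and max_chunk_overlap became chunk_overlap_ratio (now a fraction of the chunk size rather than a token count). A sketch of the equivalent call under that assumption:

```python
from llama_index import PromptHelper

prompt_helper = PromptHelper(
    context_window=2048,      # formerly max_input_size
    num_output=256,           # number of output tokens (unchanged)
    chunk_overlap_ratio=0.1,  # formerly max_chunk_overlap, now a fraction
)
```

Alternatively, pinning the versions in this repo's requirements.txt sidesteps the rename entirely.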

create_collection no data

https://docs.trychroma.com/embeddings

# create a Chroma vector store, by default operating purely in-memory
chroma_client = chromadb.Client()

# create a collection
chroma_collection = chroma_client.create_collection("newspieces")

# https://docs.trychroma.com/api-reference
print(chroma_collection.count())

documents = SimpleDirectoryReader('news').load_data()

index = GPTVectorStoreIndex.from_documents(documents, chroma_collection=chroma_collection)
print(chroma_collection.count())
print(chroma_collection.get()['documents'])
print(chroma_collection.get()['metadatas'])

Output:
0
0
[]
[]
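A likely cause: in the LlamaIndex versions this repo targets, from_documents does not take a chroma_collection keyword directly, and the unrecognized argument is silently ignored, so nothing is ever written to the collection. The collection has to be wired in through a ChromaVectorStore and a StorageContext. A sketch under the assumption of LlamaIndex ~0.6.x:

```python
import chromadb
from llama_index import GPTVectorStoreIndex, SimpleDirectoryReader, StorageContext
from llama_index.vector_stores import ChromaVectorStore

chroma_client = chromadb.Client()
chroma_collection = chroma_client.create_collection("newspieces")

# route the index's embeddings into the Chroma collection
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

documents = SimpleDirectoryReader("news").load_data()
index = GPTVectorStoreIndex.from_documents(documents, storage_context=storage_context)

print(chroma_collection.count())  # should now be non-zero
```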

Problems with requirements.txt

On a fresh install (day-old computer, Windows 11, VS Code 1.81.1, Python 3.11), creating a virtual environment for this repo using requirements.txt failed on triton==2.0.0 and uvloop==0.17.0. I commented these out, hoping that they are not critical for every tutorial (the descriptions do not suggest as much).

Additionally, this answer from Stack Overflow was required to install C++ Build Tools.

index.storage_context.persist() not working as expected

index.storage_context.persist() is not storing the vector_store and creating the vector_store.json file.


When I try to load from disk and run sc2 = StorageContext.from_defaults(persist_dir='./storage'), i get the following error:

No existing llama_index.vector_stores.simple found at ./storage/vector_store.json, skipping load.

I only have 1 document in my documents directory... Your example had 2. I wonder if that has something to do with the issue?

Full Code:

with open('KPMGOutlook/kpmgoutlook.text', 'w') as file:
    file.write(kpmg_text)

documents = SimpleDirectoryReader('KPMGOutlook').load_data()

vector_store = ChromaVectorStore(chroma_collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store,persist_dir='storage')

index = GPTVectorStoreIndex.from_documents(documents, storage_context=storage_context)

index.storage_context.persist()

query_engine = index.as_query_engine()

# Querying the document; this works fine
r = query_engine.query("Which economy has the most positive outlook?")
print(r)

# This line gives me the error
sc2 = StorageContext.from_defaults(persist_dir='./storage')
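One plausible explanation: when the index is backed by an external vector store like Chroma, the embeddings live in Chroma itself rather than in ./storage/vector_store.json, so StorageContext.from_defaults(persist_dir='./storage') finds nothing to load there. To my understanding, the same vector store has to be passed back in when reloading. A sketch, assuming LlamaIndex ~0.6.x and that chroma_collection is the same collection used when building the index:

```python
from llama_index import StorageContext, load_index_from_storage
from llama_index.vector_stores import ChromaVectorStore

# reattach the external vector store before loading from disk
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
sc2 = StorageContext.from_defaults(vector_store=vector_store, persist_dir="./storage")
index = load_index_from_storage(sc2)
```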

pip install -r requirements.txt has some dependency/version issues

I have seen this error with other projects (not yours) when the Python version was mismatched. I currently have 3.9.6 (errors were reported with 3.11.x)... so I'm not sure what may be going on here.


Collecting uvloop==0.17.0 (from -r requirements.txt (line 112))
Using cached uvloop-0.17.0.tar.gz (2.3 MB)
Preparing metadata (setup.py) ... error
error: subprocess-exited-with-error

× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> [6 lines of output]
Traceback (most recent call last):
  File "<string>", line 2, in <module>
  File "<string>", line 34, in <module>
  File "C:\Users\patbh\AppData\Local\Temp\pip-install-qa53rwky\uvloop_94d9148e502f4a8689747849ae1f0a57\setup.py", line 8, in <module>
    raise RuntimeError('uvloop does not support Windows at the moment')
RuntimeError: uvloop does not support Windows at the moment
[end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
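uvloop has no Windows support, as the traceback says. pip supports PEP 508 environment markers in requirements files, so platform-specific packages can be skipped automatically. A hedged example of how such lines could look (not the repo's actual requirements.txt):

```
uvloop==0.17.0; sys_platform != "win32"
triton==2.0.0; platform_machine == "x86_64"
```

With markers like these, pip simply ignores the package on platforms where the condition is false, instead of failing the whole install.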

Request for License Definition

Hello,

I love all of your YouTube content. Upon reviewing your repo, I've noticed that this repository currently doesn't have a license.

It would be very helpful if you could add a license to this repository. If you're unsure about which license to choose, GitHub has a guide here.

Thank you

'ListIndex' object has no attribute 'query'

When using the following code to create an index, I got errors such as 'ListIndex' object has no attribute 'query' and AttributeError: 'ListIndex' object has no attribute 'save_to_disk'.

@timeit()
def create_index():
    print("Creating index")
    # Wrapper around an LLMChain from LangChain
    llm = LLMPredictor(llm=LocalOPT())
    # Service Context: a container for your llamaindex index and query
    # https://gpt-index.readthedocs.io/en/latest/reference/service_context.html
    service_context = ServiceContext.from_defaults(
        llm_predictor=llm, prompt_helper=prompt_helper
    )
    docs = SimpleDirectoryReader("news").load_data()
    index = GPTListIndex.from_documents(docs, service_context=service_context)
    print("Done creating index", index)
    return index
  

File "demo9.py", line 101, in execute_query
response = index.query(
AttributeError: 'ListIndex' object has no attribute 'query'
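These attributes were removed around LlamaIndex 0.6: as far as I know, querying now goes through a query engine, and persistence goes through the index's storage context. A sketch of the updated calls, assuming LlamaIndex ~0.6.x and the index object built above:

```python
# querying: replaces index.query(...)
query_engine = index.as_query_engine()
response = query_engine.query("What does the article say?")

# persistence: replaces index.save_to_disk(...)
index.storage_context.persist(persist_dir="./storage")
```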
