ThinkGPT 🧠🤖

ThinkGPT is a Python library that implements Chain of Thought for Large Language Models (LLMs), prompting the model to think, reason, and create generative agents. The library aims to help with the following:

  • overcome limited context windows with long-term memory and compressed knowledge
  • enhance LLMs' one-shot reasoning with higher-order reasoning primitives
  • add intelligent decisions to your code base

Key Features ✨

  • Thinking building blocks 🧱:
    • Memory 🧠: GPTs that can remember experience
    • Self-refinement 🔧: Improve model-generated content by addressing critiques
    • Compress knowledge 🌐: Compress knowledge so it fits in the LLM's context, either by abstracting rules out of observations or by summarizing large content
    • Inference 💡: Make educated guesses based on available information
    • Natural Language Conditions 📝: Easily express choices and conditions in natural language
  • Efficient and measurable GPT context length 📏
  • Extremely easy setup and pythonic API 🎯 thanks to DocArray

Installation 💻

You can install ThinkGPT using pip:

pip install git+https://github.com/alaeddine-13/thinkgpt.git

API Documentation 📚

Basic usage:

from thinkgpt.llm import ThinkGPT
llm = ThinkGPT(model_name="gpt-3.5-turbo")
# Make the llm object learn new concepts
llm.memorize(['DocArray is a library for representing, sending and storing multi-modal data.'])
llm.predict('what is DocArray ?', remember=llm.remember('DocArray definition'))

Memorizing and Remembering information

llm.memorize([
    'DocArray allows you to send your data, in an ML-native way.',
    'This means there is native support for Protobuf and gRPC, on top of HTTP and serialization to JSON, JSONSchema, Base64, and Bytes.',
])

print(llm.remember('Sending data with DocArray', limit=1))
['DocArray allows you to send your data, in an ML-native way.']

Use the limit parameter to specify the maximum number of documents to retrieve. In case you want to fit documents into a certain context size, you can also use the max_tokens parameter to specify the maximum number of tokens to retrieve. For instance:

from examples.knowledge_base import knowledge
from thinkgpt.helper import get_n_tokens

llm.memorize(knowledge)
results = llm.remember('hello', max_tokens=1000, limit=1000)
print(get_n_tokens(''.join(results)))
1000

However, keep in mind that concatenating documents with a separator will add more tokens to the final result. The remember method does not account for those tokens.
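If you need a strict token budget, you can account for the separator yourself. The following is a minimal sketch, not part of thinkgpt's API; `count_tokens` is a rough word-based stand-in for a real tokenizer such as tiktoken:

```python
def count_tokens(text: str) -> int:
    # Stand-in tokenizer: real code would use tiktoken or a similar library.
    return len(text.split())

def join_within_budget(docs, separator="\n", budget=1000):
    """Concatenate docs with `separator`, dropping trailing docs
    that would push the total past `budget` tokens."""
    kept, used = [], 0
    sep_cost = count_tokens(separator)  # separators are not free
    for doc in docs:
        cost = count_tokens(doc) + (sep_cost if kept else 0)
        if used + cost > budget:
            break
        kept.append(doc)
        used += cost
    return separator.join(kept)
```

This keeps the concatenated result, separators included, within the budget, at the cost of possibly retrieving fewer documents than `remember` returned.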

Predicting with context from long memory

from examples.knowledge_base import knowledge
llm.memorize(knowledge)
llm.predict('Implement a DocArray schema with 2 fields: image and TorchTensor', remember=llm.remember('DocArray schemas and types'))

Self-refinement

print(llm.refine(
    content="""
import re
    print('hello world')
        """,
    critics=[
        'File "/Users/user/PyCharm2022.3/scratches/scratch_166.py", line 2',
        "  print('hello world')",
        'IndentationError: unexpected indent'
    ],
    instruction_hint="Fix the code snippet based on the error provided. Only provide the fixed code snippet between `` and nothing else."))
import re
print('hello world')

One of the applications is self-healing code generation, implemented by projects like gptdeploy and wolverine.

Compressing knowledge

In case you want your knowledge to fit into the LLM's context, you can use the following techniques to compress it:

Summarize content

Summarize content using the LLM itself. We offer two methods:

  1. One-shot summarization using the LLM
llm.summarize(
  large_content,
  max_tokens=1000,
  instruction_hint='Pay attention to code snippets, links and scientific terms.'
)

Since this technique relies on summarizing using a single LLM call, you can only pass content that does not exceed the LLM's context length.
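In practice, you may want to check the content size first and fall back to chunked summarization when the content is too long. A minimal sketch of that decision follows; the word-based `estimate_tokens` is only a rough stand-in for `thinkgpt.helper.get_n_tokens`:

```python
def estimate_tokens(text: str) -> int:
    # Rough stand-in; real code would use thinkgpt.helper.get_n_tokens
    # or a proper tokenizer.
    return len(text.split())

def pick_summarizer(text: str, context_limit: int = 4096) -> str:
    """Return which summarization method fits the content size."""
    if estimate_tokens(text) <= context_limit:
        return "summarize"
    return "chunked_summarize"
```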

  2. Chunked summarization
llm.chunked_summarize(
  very_large_content,
  max_tokens=4096,
  instruction_hint='Pay attention to code snippets, links and scientific terms.'
)

This technique relies on splitting the content into different chunks, summarizing each of those chunks and then combining them all together using an LLM.
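The idea can be sketched as follows; `summarize_fn` stands in for a single LLM summarization call, and the actual thinkgpt implementation may differ in how it splits and combines chunks:

```python
def chunk_text(text, chunk_size):
    # Split on whitespace into chunks of roughly `chunk_size` words.
    words = text.split()
    return [" ".join(words[i:i + chunk_size])
            for i in range(0, len(words), chunk_size)]

def chunked_summarize(text, summarize_fn, chunk_size=2000):
    """Map-reduce style summarization: summarize each chunk,
    then summarize the concatenated chunk summaries."""
    chunks = chunk_text(text, chunk_size)
    if len(chunks) == 1:
        return summarize_fn(chunks[0])
    partial = [summarize_fn(chunk) for chunk in chunks]
    return summarize_fn(" ".join(partial))
```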

Induce rules from observations

Abstract higher-level, more general rules from current observations:

llm.abstract(observations=[
    "in tunisian, I did not eat is \"ma khditech\"",
    "I did not work is \"ma khdemtech\"",
    "I did not go is \"ma mchitech\"",
])
['Negation in Tunisian Arabic uses "ma" + verb + "tech" where "ma" means "not" and "tech" at the end indicates the negation in the past tense.']

This can help you end up with compressed knowledge that fits better within the limited context length of LLMs. For instance, instead of trying to fit code examples into the LLM's context, use this to prompt it to learn high-level rules and fit those in the context.

Natural language condition

Introduce intelligent conditions into your code and let the LLM make decisions:

llm.condition('Does this represent an error message? "IndentationError: unexpected indent"')
True

Natural language select

Alternatively, let the LLM choose among a list of options:

llm.select(
    question="Which animal is the king of the jungle?",
    options=["Lion", "Elephant", "Tiger", "Giraffe"]
)
['Lion']

You can also prompt the LLM to choose an exact number of answers using num_choices. By default, it is set to None, which means the LLM will select as many answers as it thinks are correct.
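To illustrate what enforcing num_choices means, here is a hypothetical post-processing step that validates the model's picks against the options and the requested count. This is a sketch, not thinkgpt's actual implementation:

```python
def validate_selection(picks, options, num_choices=None):
    """Keep only picks that are valid options; if num_choices is set,
    require exactly that many valid picks."""
    valid = [pick for pick in picks if pick in options]
    if num_choices is not None and len(valid) != num_choices:
        raise ValueError(f"expected {num_choices} choices, got {len(valid)}")
    return valid
```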

Use Cases 🚀

Below are example demos you can build with thinkgpt.

Teaching ThinkGPT a new language

from thinkgpt.llm import ThinkGPT

llm = ThinkGPT(model_name="gpt-3.5-turbo")

rules = llm.abstract(observations=[
    "in tunisian, I did not eat is \"ma khditech\"",
    "I did not work is \"ma khdemtech\"",
    "I did not go is \"ma mchitech\"",
], instruction_hint="output the rule in french")
llm.memorize(rules)

llm.memorize("in tunisian, I studied is \"9rit\"")

task = "translate to Tunisian: I didn't study"
llm.predict(task, remember=llm.remember(task))
The translation of "I didn't study" to Tunisian language would be "ma 9ritech".

Teaching ThinkGPT how to code with thinkgpt library

from thinkgpt.llm import ThinkGPT
from examples.knowledge_base import knowledge

llm = ThinkGPT(model_name="gpt-3.5-turbo")

llm.memorize(knowledge)

task = 'Implement python code that uses thinkgpt to learn about docarray v2 code and then predict with remembered information about docarray v2. Only give the code between `` and nothing else'
print(llm.predict(task, remember=llm.remember(task, limit=10, sort_by_order=True)))

Code generated by the LLM:

from thinkgpt.llm import ThinkGPT
from docarray import BaseDoc
from docarray.typing import TorchTensor, ImageUrl

llm = ThinkGPT(model_name="gpt-3.5-turbo")

# Memorize information
llm.memorize('DocArray V2 allows you to represent your data, in an ML-native way')


# Predict with the memory
memory = llm.remember('DocArray V2')
llm.predict('write python code about DocArray v2', remember=memory)

Replay Agent memory and infer new observations

Refer to the following script for an example of an Agent that replays its memory and induces new observations. This concept was introduced in the Generative Agents: Interactive Simulacra of Human Behavior paper.

python -m examples.replay_expand_memory
new thoughts:
Klaus Mueller is interested in multiple topics
Klaus Mueller may have a diverse range of interests and hobbies

Replay Agent memory, criticize and refine the knowledge in memory

Refer to the following script for an example of an Agent that replays its memory, performs self-criticism and adjusts its memory knowledge based on the criticism.

python -m examples.replay_criticize_refine
refined "the second number in Fibonacci sequence is 2" into "Observation: The second number in the Fibonacci sequence is actually 1, not 2, and the sequence starts with 0, 1."
...

This technique was mainly introduced in the Self-Refine: Iterative Refinement with Self-Feedback paper.

For more detailed usage and code examples check ./examples.

Contributors

alaeddine-13, hanxiao, eltociear
