jxnl / instructor Goto Github PK

5.0K 41.0 391.0 64.98 MB

structured outputs for llms

Home Page: https://python.useinstructor.com/

License: MIT License

Python 96.66% TypeScript 3.34%

openai python pydantic-v2 openai-functions validation openai-function-calli

instructor's Introduction

💥 whats up?

Currently working as an independent consultant. I use my expertise in recommendation systems to help fast-growing startups build out their RAG applications. I am also the creator of Instructor and Flight, and an ML and data science educator.

Support

If you want to support me, you can sponsor me on GitHub. I don't want to start a Substack, but I do want to write more, so this will fund my morning coffee and tea.

Creator
- Instructor
- Instructor-JS
- Youtube Chapters & Journal ~ 6k MAU
Sabbatical @ South Park Commons - 2023 - Present
Staff Machine Learning Engineer @ Stitchfix — 2016, 2018-2023
Prev, Meta, ActionIQ, NYU, Meltwater - 2013-2018
Computational Mathematics and Statistics @ University of Waterloo

instructor's People

Contributors

Stargazers

Watchers

Forkers

jphme youminxue dattgoswami daveokpare samvas-codes shresht8 codeaudit bluntworks tomchapin d3287t328 neuroscigeek77 techthiyanes zeno1408 shaunwei sambosis pursuityp know-it-all-marketing-llc amikos-tech guyschlider gmh5225 madhatter92 project-hero-tech cristobalcl jfontestad kingsframe gerarduffy mharris717 anamhira47 zshancs nirantk apollohuang1 adriangalilea awtkns marcosmagallanes atemaguer phiweger michaltorma almyai bllchmbrs haikuoxin mz0in lecole rayfernando1337 svats2k dnonline koljab samiur paxhumana-prime raymow97 stjordanis aiexanderdicke thinker007 kamilnowakflyps neilneuwirth hieutrluu lliwcwill phodaie realsrisri haoyitedaniu dhruv-anand-aintech jordanmaneval johnnysands zhiyu-01 kennethcassel corticalstack aresti amorriscode jlondonobo atbe alteredentropy nish1001 touristshaun f901107 krrishdholakia clsta j-94 lan2720 hbcbh1999 ivanleomk jeff3071 mboyanna w0lveri9 tonywhite11 swappybizz zboyles jeromyjsmith icdev2dev rgbkrk pablopalafox rogervaas maxjeblick ohadrubin python-popular-repos joshdey daaniyaan quito96 klyap lemmaleisa torobirgitte rkp64

instructor's Issues

create func of ChatCompletion does not return completion if self.function is None

in dsl/completion.py shouldn't create return completion?

def create(self):
    """
    Create a chat response from the OpenAI API

    Returns:
        response (OpenAISchema): The response from the OpenAI API
    """
    kwargs = self.kwargs
    completion = openai.ChatCompletion.create(**kwargs)
    if self.function:
        return self.function.from_response(completion)
    return completion  # proposed addition

Clarify docs - is there a difference between calling a ChatCompletion with response_model parameter vs. using model.openai_schema and then using a function_call in the ChatCompletion?

https://jxnl.github.io/instructor/#section-2-adding-additional-prompting Both options are shown, but it's unclear whether they can be used interchangeably. Or can an OpenAISchema only be used with the function_call parameter of ChatCompletion?

support for completions endpoint

Is your feature request related to a problem? Please describe.
The recent -instruct models are instruction tuned rather than dialogue tuned and should be very useful for most use cases of this library.

class UserDetail(BaseModel):
    name: str
    age: int

user: UserDetail = openai.Completion.create(
    model="gpt-3.5-turbo-instruct",
    response_model=UserDetail,
    messages=[
        {"role": "user", "content": "Extract Jason is 25 years old"},
    ]
)

This should work.

Describe the solution you'd like
Patch should also patch the openai.Completion.create method.

The multi classification does not actually work as intended.

Describe the bug
The multi classification does not actually work as intended.

To Reproduce
I copied and pasted the example for multi prediction, and the output always predicts every label. No matter which classes are declared or which prompt is used, the result is the same: all classes are predicted.

examples/classification/multi_prediction.py

Small error in `openai_function`

Was getting error:
AttributeError: 'openai_function' object has no attribute 'schema'

Fixed by changing line 30 to:
assert message["function_call"]["name"] == self.openai_schema["name"], "Function name does not match"

Thanks for putting this up, this code is super useful.

openai.error.InvalidRequestError: Unrecognized request argument supplied: functions

Run example in Azure openai, following error occurs:
openai.error.InvalidRequestError: Unrecognized request argument supplied: functions

Can someone give some opinions on this? Thanks in advance.

Weird usecase where pydantic model has field that represents code but gets invalid json characters, failing model_validate_json

Is your feature request related to a problem? Please describe.
I have a weirdish use case, where one of the fields of the pydantic model represents code.
The code is often returned with a bunch of invalid json characters in it, like control characters (\u0000-\u001F).

This makes instructor fail on errors like this:
File "/opt/homebrew/lib/python3.11/site-packages/pydantic/main.py", line 530, in model_validate_json
    return cls.__pydantic_validator__.validate_json(json_data, strict=strict, context=context)
pydantic_core._pydantic_core.ValidationError: 1 validation error for RustCode
  Invalid JSON: control character (\u0000-\u001F) found while parsing a string at line 4 column 0 [type=json_invalid, input_value='\n{\n"generated_code": "...xample_output": "37"\n}', input_type=str]
    For further information visit https://errors.pydantic.dev/2.4/v/json_invalid

Describe the solution you'd like
Maybe one could handle cases like this with some form of "pre-validators" that could, for example, run base64 encoding on those non-JSON-compatible strings? Not sure how it would fit in exactly.
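
One possible pre-processing step, sketched with the standard library only: Python's `json.loads` accepts `strict=False`, which permits raw control characters inside string values. The resulting dict could then be handed to a pydantic model (for example through `model_validate`), which is not shown here. The helper name `loads_lenient` and the sample payload are illustrative assumptions, not part of instructor.

```python
import json

def loads_lenient(raw: str) -> dict:
    """Parse JSON while tolerating raw control characters inside strings."""
    # strict=False allows characters such as a literal newline or tab
    # to appear unescaped inside a JSON string value.
    return json.loads(raw, strict=False)

# Illustrative payload: the code field contains literal newlines,
# which strict JSON parsing rejects.
raw = '{"generated_code": "fn main() {\n    println!(\\"hi\\");\n}", "example_output": "37"}'

data = loads_lenient(raw)
print(data["example_output"])
```

This keeps the model definition unchanged and moves the leniency into a single parsing step, at the cost of accepting input that is not strictly valid JSON.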

Additional context
Instructor is nice.

Base example doesnt work?

Hi Jason, I watched your Pydantic talk and thought I'd check it out. It seems like a fantastic idea, but with openai==1.1.0 and instructor==0.3.0 it raises a TypeError. This does not happen when using the "unpatched" openai client and sending the request without the response_model kwarg.

user = client.chat.completions.create(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: 'classmethod' object is not callable

thanks! and great talk

Add some basic ci for pytest

Just to know stuff isn't breaking

Changing Patch behavior

I think there are a few ways to add the response_model and other capabilities.

Monkey Patch Global

import instructor
import openai

instructor.patch()

resp = openai.ChatCompletion.create(..., response_model=Model)
assert isinstance(resp, Model)

Monkey Patch Context

with instructor.patch():
    resp = openai.ChatCompletion.create(..., response_model=Model)
    assert isinstance(resp, Model)

Import Custom SDK

from instructor import client  # as openai

resp = client.ChatCompletion.create(..., response_model=Model)
assert isinstance(resp, Model)

I think we need to be conscious of how other tools also patch the client.
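
A minimal sketch of how the context-scoped option could restore the original function on exit, so that it coexists with other tools that patch the client. Everything here is an assumption for illustration: `fake_sdk` stands in for the real SDK namespace, and `patched` is a hypothetical helper, not instructor's actual API.

```python
import contextlib
import types

# Stand-in for the SDK namespace so the sketch runs without network access.
fake_sdk = types.SimpleNamespace(create=lambda **kwargs: {"raw": kwargs})

@contextlib.contextmanager
def patched(sdk):
    """Temporarily wrap sdk.create so it understands a response_model kwarg."""
    original = sdk.create

    def create(*, response_model=None, **kwargs):
        response = original(**kwargs)
        # Hypothetical post-processing: hand the raw response to the model.
        return response_model(response) if response_model else response

    sdk.create = create
    try:
        yield sdk
    finally:
        # Always restore the original, even if the body raised.
        sdk.create = original

with patched(fake_sdk):
    result = fake_sdk.create(response_model=lambda r: ("parsed", r), model="x")
print(result)
```

Restoring in `finally` means the patch cannot leak out of the `with` block, which is the main advantage over the global variant.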

Wrong package name in README

MY BAD :I(

Doc improvement: why would one use distillation?

I was reading through the docs and saw https://jxnl.github.io/instructor/distillation/ . The page explains the "what" and the "how", but not the "why" - I assume this feature caters to some usecases, but it's not clear to me at all what those would be? The examples given seem like a ridiculously bad idea - replacing instantaneous, deterministic on-device calculations with slow, prone to hallucination api calls? Why would I ever want to use an LLM to perform simple math? I get they're just examples, but maybe it would be nice to have a paragraph explaining real usecases for this.

Decoupling the llm backend

Is your feature request related to a problem? Please describe.
I see the library is tightly coupled with OpenAI function calling, but it would be good to decouple the model from the pydantic way of doing things and allow any model (for example, LLMs from langchain). That way we can experiment with smaller, self-hosted, or other cloud models.

Describe the solution you'd like
The ability to pass pydantic structures to any LLM and get results back. For example, something like langchain tools, where function calling is isolated from the LLM.

Describe alternatives you've considered
custom tools in langchain implementation for function calling

Additional context
Not sure whether it's already possible. I haven't experimented yet, but it looks coupled based on the repo subtitle and examples.

Function does not obey Enums

Describe the bug
I set an enum for one of the function inputs.
I have a pydantic class that refers to the enum.
The output args show that the enum is not followed.

Expected behavior
I would expect that the generated args obey the enum I set for that field.

Bug: openai_schema removes properties named title

openai_schema removes properties/fields named title from json schema

Example:

class Author(OpenAISchema):
  """Class representing an author. 
  This class is used to extract author's name and
  poem's title from a text"""
  name: str = Field(..., description="Name of the author")
  title: str = Field(..., description="Title of the article")

Author.openai_schema

# output:
"""
{'name': 'Author',
 'description': "Class representing an author. \nThis class is used to extract author's name and\npoem's title from a text",
 'parameters': {'type': 'object',
  'properties': {'name': {'description': 'Name of the author',
    'type': 'string'}},
  'required': ['name']}}
"""
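
A sketch of one way the schema clean-up could drop the `title` metadata that pydantic adds while keeping a field that happens to be named `title`. The function name `strip_titles` and the sample schema are assumptions for illustration; this is not the library's implementation.

```python
def strip_titles(node, inside_properties=False):
    """Remove schema-level "title" metadata but keep fields named title."""
    if isinstance(node, dict):
        cleaned = {}
        for key, value in node.items():
            if key == "title" and not inside_properties:
                continue  # schema metadata, not a field name
            # Keys directly under a schema's "properties" are field names.
            child_is_properties = key == "properties" and not inside_properties
            cleaned[key] = strip_titles(value, child_is_properties)
        return cleaned
    if isinstance(node, list):
        return [strip_titles(item) for item in node]
    return node

# Illustrative schema shaped like pydantic's JSON schema output.
schema = {
    "title": "Author",
    "type": "object",
    "properties": {
        "name": {"title": "Name", "type": "string"},
        "title": {"title": "Title", "type": "string"},
    },
    "required": ["name", "title"],
}
cleaned = strip_titles(schema)
print(sorted(cleaned["properties"]))
```

The distinction rests on position: a `title` key directly under `properties` names a field, while a `title` key anywhere else is metadata.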

Exclude properties with defaults from required

Suggestion:
parameters["required"] = sorted(k for k, v in parameters.get("properties", {}).items() if "default" not in v)
instead of
parameters["required"] = sorted(parameters["properties"])

That would allow us to:
data: Any = Field(None, description="Optional data attached")
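
To make the effect of the suggestion concrete, here is a small sketch applied to a hand-written schema dict. The helper name `required_fields` and the sample `parameters` are illustrative assumptions.

```python
def required_fields(parameters: dict) -> list:
    """Return the sorted names of properties that declare no default."""
    properties = parameters.get("properties", {})
    return sorted(name for name, spec in properties.items() if "default" not in spec)

# Illustrative schema: "data" has a default, so it should be optional.
parameters = {
    "type": "object",
    "properties": {
        "location": {"type": "string"},
        "data": {"default": None, "description": "Optional data attached"},
    },
}
parameters["required"] = required_fields(parameters)
print(parameters["required"])
```

Checking for the presence of the `default` key, rather than its truthiness, keeps fields whose default is `None` out of the required list.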

Upcoming openai-python 1.0.0 release

Hello. Thanks for your great work on Instructor. Really appreciate that it's thoughtfully constructed for use in production.

I wanted to check what your plans are for the upcoming openai-python 1.0.0 release (openai/openai-python#631). Instructor currently has a dependency on <0.28.

Thanks!

Documentation: add some details on prompting other models using ```json and other tricks.

Add support for mkdocs

So we can get started on building our examples and dsl docs.

Help: Add links to the advanced usage

adding the links from files to the readme would be helpful, also adding more code snippets.

Default parameters to pydantic model

Is your feature request related to a problem? Please describe.
I'm always frustrated when I need to send default parameters to a pydantic response_model.

Describe the solution you'd like
I want to pass, for example, a default sex to the model (rather than extracting it with ChatCompletion), because I already know Jason's sex 😄 :

class UserDetail(BaseModel, sex):
    weight: int
    sex: str
    def is_obese(self):
        if self.sex=='female' and self.weight>100:
            return True
        if self.sex=='male' and self.weight>120:
            return True
        return False


user: UserDetail = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    response_model=UserDetail,
    parameters={'sex': 'male'},
    messages=[
        {"role": "user", "content": "Extract Jason 200kg"},
    ]
)

Describe alternatives you've considered
I considered creating another pydantic class to complete the user's properties. But that is not the right approach, because UserDetail should hold all user properties: some extracted by ChatCompletion and others supplied by me.

Maybe I have missed something. I'm not an expert with pydantic. If you can share another option, I would be grateful.

Install and leverage erdantic

It would allow us to make a diagram per example and show how modeling gets us these nice-to-haves.

Support Headers in Chat completion api

For those who need to attach additional information (for example, for tools like LLMObs), it would be good to be able to add headers.

Bugs in `Example 2: Schema Extraction`

There are two bugs in Example 2: Schema Extraction.

There's a missing comma character after functions=[UserDetails.openai_schema]
Missing import, from pydantic import Field

Example link 404

Describe the bug
Link to examples in README is currently broken.

To Reproduce
Steps to reproduce the behavior:

Click on "To see more examples of how we can create interesting models check out some examples." link in the README
Links to https://github.com/jxnl/instructor/blob/main/examples/index.md

Expected behavior
Links to https://jxnl.github.io/instructor/examples/

Change openai base schema to a decorator so models only inherit from base model.

upgrade to pydantic v2

We should upgrade to pydantic v2

OpenDNS considers onrender.com a security threat

re tweet at https://twitter.com/jxnlco/status/1677907692122259456?s=20

A number of people on Cisco-administered networks may see some version of this error when attempting to access:

openai-function-call.onrender.com is classified as a potential security risk and access is restricted

May want to publish to a different domain/github pages?

JsonDecoderError at the specific place

Describe the bug
When using instructor, some inputs raise a JSON decode error.

To Reproduce
Steps to reproduce the behavior:

Go to '...'
Click on '....'
Scroll down to '....'
See error

Expected behavior
A way to fix the bug

Screenshots

Desktop (please complete the following information):
LSB Version: :core-4.1-amd64:core-4.1-noarch:cxx-4.1-amd64:cxx-4.1-noarch:desktop-4.1-amd64:desktop-4.1-noarch:languages-4.1-amd64:languages-4.1-noarch:printing-4.1-amd64:printing-4.1-noarch
Distributor ID: CentOS
Description: CentOS Linux release 7.9.2009 (Core)
Release: 7.9.2009
Codename: Core

Request: Trees of thought implemented as function calls w/ a search loop!

I think the 24 game or a crossword example would be awesome.

Request: Pydantic types from openapi.json to make restful agent

there should be some tools that can get pydantic from openapi.json

would love to see an example like

endpoints = Endpoints.from("www.website.com/openapi.json")

completion = openai.ChatCompletion(
   function_call=endpoints
   ...
   )

Logic error in ChatCompletion __or__

for class ChatCompletion(BaseModel):

def __or__(self, other: Union[Message, OpenAISchema]) -> "ChatCompletion":
    if isinstance(other, Message):
        if isinstance(other, SystemMessage):
            if self.system_message:
                self.system_message.content += "\n\n" + other.content
            self.system_message = other

should be

        if isinstance(other, SystemMessage):
            if self.system_message:
                self.system_message.content += "\n\n" + other.content
            else:
                self.system_message = other

Automate doc building process

We should add some automation in Actions to run 'mkdocs gh-deploy'. I want a new tag when we publish a new version.

Does instructor support Azure OpenAI API ?

When I use Azure OpenAI, I often encounter errors, but occasionally it succeeds. I am not sure if the current instructor can use the Azure OpenAI API. Below is the function and frequent error message.

new_updates = openai.ChatCompletion.create(
        response_model=Report,
        deployment_id= dep.GPT_4,
        max_retries=2,
        messages=[
                {
                    "role": "system",
                    "content": SYSTEM_PROMPT_KG_SYT
                },
                {
                    "role": "user",
                    "content": f"""Extract any new events from the following:
                    # Part {i}/{num_iterations} of the input:

                    {inp}"""
                },
                {
                    "role": "user",
                    "content": f"""Here is the current state of the report:
                    {cur_state.model_dump_json(indent=2)}"""
                }
            ],
        
    )  # type: ignore

Describe the bug
openai.error.InvalidRequestError: 'content' is a required property - 'messages.3'

To Reproduce
Steps to reproduce the behavior:

Go to '...'
Click on '....'
Scroll down to '....'
See error

Expected behavior
A clear and concise description of what you expected to happen.

Screenshots
Traceback (most recent call last):
  File "C:\Users\yubo.he\Desktop\LLM_AE_Extrator\run.py", line 92, in <module>
    ade_report: Report = generate_report(text_chunks)
  File "C:\Users\yubo.he\Desktop\LLM_AE_Extrator\run.py", line 47, in generate_report
    new_updates = openai.ChatCompletion.create(
  File "C:\Users\yubo.he\AppData\Local\Continuum\anaconda3\envs\syngenta\lib\site-packages\instructor\patch.py", line 162, in new_chatcompletion_sync
    response, error = retry_sync(
  File "C:\Users\yubo.he\AppData\Local\Continuum\anaconda3\envs\syngenta\lib\site-packages\instructor\patch.py", line 117, in retry_sync
    response = func(*args, **kwargs)
  File "C:\Users\yubo.he\AppData\Local\Continuum\anaconda3\envs\syngenta\lib\site-packages\openai\api_resources\chat_completion.py", line 25, in create
    return super().create(*args, **kwargs)
  File "C:\Users\yubo.he\AppData\Local\Continuum\anaconda3\envs\syngenta\lib\site-packages\openai\api_resources\abstract\engine_api_resource.py", line 155, in create
    response, _, api_key = requestor.request(
  File "C:\Users\yubo.he\AppData\Local\Continuum\anaconda3\envs\syngenta\lib\site-packages\openai\api_requestor.py", line 299, in request
    resp, got_stream = self._interpret_response(result, stream)
  File "C:\Users\yubo.he\AppData\Local\Continuum\anaconda3\envs\syngenta\lib\site-packages\openai\api_requestor.py", line 710, in _interpret_response
    self._interpret_response_line(
  File "C:\Users\yubo.he\AppData\Local\Continuum\anaconda3\envs\syngenta\lib\site-packages\openai\api_requestor.py", line 775, in _interpret_response_line
    raise self.handle_error_response(
openai.error.InvalidRequestError: 'content' is a required property - 'messages.3'

Desktop (please complete the following information):

OS: Windows

Additional context
Azure OpenAI version : 2023-08-01-preview

Async might not be properly handled in latest instructor/openai versions?

I am using instructor = "^0.3.1" and openai = "^1.2.0".

I initialize my client as:

client = instructor.patch(AsyncOpenAI(
    api_key=OPENAI_API_KEY,
))

And then call it as:

async def myfunc():
    ...
                response = await client.chat.completions.create(
                    model=model_name,
                    messages=messages,
                    response_model=response_model, # type: ignore
                    max_retries=2
                )

This gives me an error: Error in getting response from model: 'coroutine' object has no attribute 'choices'.
I stepped through the code in a debugger and it seems like wrap_chatcomplete wraps AsyncOpenAI().chat.completions.create as a sync function, not an async one?
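
The symptom described above (a coroutine reaching code that expects a response object) is what happens when an async function is wrapped by a sync wrapper. A minimal sketch of a wrapper that picks the right variant with `inspect.iscoroutinefunction`; the names `wrap_create`, `fake_async_create`, and `fake_sync_create` are hypothetical, and whether instructor detects async clients this way is not confirmed by the issue.

```python
import asyncio
import functools
import inspect

def wrap_create(func, postprocess):
    """Wrap func so postprocess runs on its result, for sync or async func."""
    if inspect.iscoroutinefunction(func):
        @functools.wraps(func)
        async def async_wrapper(*args, **kwargs):
            # Await first; otherwise postprocess receives a coroutine object.
            response = await func(*args, **kwargs)
            return postprocess(response)
        return async_wrapper

    @functools.wraps(func)
    def sync_wrapper(*args, **kwargs):
        return postprocess(func(*args, **kwargs))
    return sync_wrapper

# Hypothetical stand-ins for the client's create methods.
async def fake_async_create(**kwargs):
    return {"choices": [kwargs]}

def fake_sync_create(**kwargs):
    return {"choices": [kwargs]}

def first_choice(response):
    return response["choices"][0]

print(asyncio.run(wrap_create(fake_async_create, first_choice)(model="m")))
```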

pip install instructor has dependency conflicts in Colab

Describe the bug

Running !pip install instructor in Colab creates the following dependency conflicts:

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
lida 0.0.10 requires fastapi, which is not installed.
lida 0.0.10 requires kaleido, which is not installed.
lida 0.0.10 requires python-multipart, which is not installed.
lida 0.0.10 requires uvicorn, which is not installed.
llmx 0.0.15a0 requires cohere, which is not installed.
llmx 0.0.15a0 requires tiktoken, which is not installed.
tensorflow-probability 0.22.0 requires typing-extensions<4.6.0, but you have typing-extensions 4.8.0 which is incompatible.

To Reproduce
Steps to reproduce the behavior:

Create a new colab
Run !pip install instructor

Expected behavior
Clean install without dependency conflicts.

Support parameters docstring for `@openai_function` annotation

Is your feature request related to a problem? Please describe.
I want to define a plain Python function and use it for both the schema and execution. But if I want to add descriptions to the parameters, I currently have to use a class definition. This could be solved by parsing the standard parameters section of the docstring.

Describe the solution you'd like
E.g., this should work:

@openai_function
def get_current_weather(
    location: str, format: Literal["celsius", "fahrenheit"] = "celsius"
) -> WeatherReturn:
    """
    Gets the current weather in a given location, use this function for any questions related to the weather

    Parameters
    ----------
    location
        The city to get the weather, e.g. San Francisco. Guess the location from user messages

    format
        A string with the full content of what the given role said
    """

    return WeatherReturn(
        location=location,
        forecast="sunny",
        temperature="25 C" if format == "celsius" else "77 F",
    )

But right now the description of the parameters goes into the function description, not into the parameters description.

How it is right now:

{
    'name': 'get_current_weather',
    'description': '\n    Gets the current weather in a given location, use this function for any questions related to the weather\n\n    Parameters\n    ----------\n    location\n        The city to get the weather, e.g. San Francisco. Guess the location from user messages\n\n    format\n        A string with the full content of what the given role said\n    ',
    'parameters': {
        'properties': {
            'location': {'type': 'string'},
            'format': {
                'default': 'celsius',
                'enum': ['celsius', 'fahrenheit'],
                'type': 'string'
            }
        },
        'required': ['format', 'location'],
        'type': 'object'
    }
}

How I expect it:

{
  'name': 'get_current_weather',
  'description': 'Gets the current weather in a given location, use this function for any questions related to the weather',
  'parameters': {
      'properties': {
          'location': {
              'description': 'The city to get the weather, e.g. San Francisco. Guess the location from user messages',
              'type': 'string'
          },
          'format': {
              'description': 'A string with the full content of what the given role said',
              'default': 'celsius',
              'enum': ['celsius', 'fahrenheit'],
              'type': 'string'
          }
      },
      'required': ['location'],
      'type': 'object'
  }
}
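
A rough sketch of how the requested behavior might be approached: split a NumPy-style docstring into a summary and per-parameter descriptions, which could then be merged into the schema. The function `split_docstring` and the sample docstring are illustrative assumptions; it handles only the simple layout shown in this issue (an underlined "Parameters" header, unindented names, indented descriptions).

```python
import inspect

def split_docstring(doc: str):
    """Split a NumPy-style docstring into (summary, {param: description})."""
    lines = inspect.cleandoc(doc).splitlines()
    summary, params = [], {}
    current = None
    in_params = False
    i = 0
    while i < len(lines):
        line = lines[i]
        next_line = lines[i + 1].strip() if i + 1 < len(lines) else ""
        # A "Parameters" header is followed by a dashed underline.
        if line.strip() == "Parameters" and set(next_line) == {"-"}:
            in_params = True
            i += 2
            continue
        if not in_params:
            summary.append(line.strip())
        elif line and not line[0].isspace():
            # Unindented line inside the section: a parameter name.
            current = line.split(":")[0].strip()
            params[current] = ""
        elif line.strip() and current:
            # Indented line: part of the current parameter's description.
            params[current] = (params[current] + " " + line.strip()).strip()
        i += 1
    return " ".join(part for part in summary if part), params

# Illustrative docstring modeled on the example in this issue.
doc = """
    Gets the current weather in a given location

    Parameters
    ----------
    location
        The city to get the weather for
    format
        The temperature unit to use
    """
summary, params = split_docstring(doc)
print(summary, params)
```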

Adding a lightweight prompt abstraction to the SchemaClass

Here's the updated proposal, where PromptConfig declares the model with a default value and treats all other attributes as optional. The default model is set to "gpt-3.5-turbo-0613":

import openai
from pydantic import BaseModel
from typing import Optional

class OpenAISchema(BaseModel):
    class PromptConfig:
        model: str = "gpt-3.5-turbo-0613"
        # Defaults of None keep attribute access safe when a value is omitted.
        system: Optional[str] = None
        message: Optional[str] = None
        temperature: Optional[float] = None
        max_tokens: Optional[int] = None

    @classmethod
    def from_response(cls, response):
        # Implementation based on the actual response format.
        raise NotImplementedError

    @classmethod
    def create(cls, message=None, *args, force_function=False, **kwargs):
        messages = kwargs.get("messages", [])

        if not messages and hasattr(cls, "PromptConfig"):
            if cls.PromptConfig.system:
                messages.append({
                    "role": "system",
                    "content": cls.PromptConfig.system
                })
            if cls.PromptConfig.message:
                messages.append({
                    "role": "user",
                    "content": cls.PromptConfig.message
                })

        if message:
            messages.append({
                "role": "user",
                "content": message
            })

        if force_function:
            kwargs['function_call'] = {"name": cls.openai_schema["name"]}

        kwargs['messages'] = messages

        if hasattr(cls, "PromptConfig"):
            kwargs.setdefault('model', cls.PromptConfig.model)
            kwargs.setdefault('temperature', cls.PromptConfig.temperature)
            kwargs.setdefault('max_tokens', cls.PromptConfig.max_tokens)

        completion = openai.ChatCompletion.create(
            functions=[cls.openai_schema],
            **kwargs
        )
        return cls.from_response(completion)

class Search(OpenAISchema):
    ...  # Implementation remains the same

class MultiSearch(OpenAISchema):
    class PromptConfig:
        system = "You are a capable algorithm designed to correctly segment search requests."
        message = "Correctly segment the following search request"
        model = "gpt-3.5-turbo-0613"
        temperature = 0.5
        max_tokens = 1000

    ...  # Implementation remains the same

# Example of usage:
queries = MultiSearch.create(
    "Please send me the video from last week about the investment case study and also documents about your GDPR policy."
)
queries.execute()

This revision makes the PromptConfig more flexible and easier to use with the default model set and all other parameters as optional. This configuration can be overridden on a per-class basis, as shown in the MultiSearch.PromptConfig example.

Where to inject the few shot examples?

Where should I put the few shot examples into the prompt to improve accuracy? Should I put it in the model docstring or somewhere else? Can you provide an example?

Thanks.

json.decoder.JSONDecodeError: Invalid control character at: line 2 column 16 (char 17)

I am using openai_function_call to generate code and ran into this error. Opening an issue for it, but I can't identify a repro yet.

Compatibility with Langchain

Is your feature request related to a problem? Please describe.
Would like to resolve dependency incompatibility between langchain and openai_function_call

Describe the solution you'd like
langchain and openai_function_call to be compatible

Describe alternatives you've considered
None

Additional context

  Because no versions of openai-function-call match >0.2.0,<0.3.0
   and openai-function-call (0.2.0) depends on pydantic (>=2.0.2,<3.0.0), openai-function-call (>=0.2.0,<0.3.0) requires pydantic (>=2.0.2,<3.0.0).
  And because langchain (0.0.238) depends on pydantic (>=1,<2)
   and no versions of langchain match >0.0.238,<0.0.239, openai-function-call (>=0.2.0,<0.3.0) is incompatible with langchain (>=0.0.238,<0.0.239).
  So, because nira-ai depends on both langchain (^0.0.238) and openai-function-call (^0.2.0), version solving failed.

Create a FastAPI example to demonstrate a shared type

It would be cool to create an example where FastAPI and a function call share the same object type, with a standardized way of allowing OpenAI to call the endpoint as code.

[Bounty] Instructor finetuning CLI needs to support validation_file and hyperparameters

Is your feature request related to a problem? Please describe.

We need to be able to pass in the hyperparameters and validation file here:
https://github.com/jxnl/instructor/blob/main/instructor/cli/jobs.py#L135

It should basically look like: https://platform.openai.com/docs/api-reference/fine-tuning/create#fine-tuning-create-hyperparameters

Describe the solution you'd like

make a PR to add it into the cli
update the documentation in the finetune docs page here: https://github.com/jxnl/instructor/blob/main/docs/cli/finetune.md

Default description for generated schema

When getting the .openai_schema from an OpenAISchema (BaseModel) class, if the class has a docstring, then that is used as the description. If there is no docstring, one is automatically added. The current default description (no docstring) is this - a description about the extraction process rather than a description of the object.

For example, if I define an Address as

class Address(City):
    country: str
    state: str
    city: str
    street: str

Then the .openai_schema is

{'name': 'Address',
 'description': 'Correctly extracted `Address` with all the required parameters with correct types',
 'parameters': {'properties': {'country': {'type': 'string'},
   'state': {'type': 'string'},
   'city': {'type': 'string'},
   'street': {'type': 'string'}},
  'required': ['city', 'country', 'state', 'street'],
  'type': 'object'}}

However, if I add a docstring to the type, like

class Address(City):
    """An address"""
    country: str
    state: str
    city: str
    street: str

then the .openai_schema is

{'name': 'Address',
 'description': 'An address',
 'parameters': {'properties': {'country': {'type': 'string'},
   'state': {'type': 'string'},
   'city': {'type': 'string'},
   'street': {'type': 'string'}},
  'required': ['city', 'country', 'state', 'street'],
  'type': 'object'}}

The current default string doesn't really have the same use case as the description when a docstring is present.

I think a better default description would be the empty string ("") or maybe just the class name. In most cases, I think it would be preferable that the language model is given no description of the type than one about the schema generation process.
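
A small sketch of the suggested fallback, using plain classes rather than the library's schema machinery. The function name `schema_description` is hypothetical, and pydantic models may populate `__doc__` differently, so treat this as an illustration of the rule only.

```python
def schema_description(cls) -> str:
    """Use the class's own docstring as the description, else an empty string."""
    # A plain class without its own docstring has __doc__ set to None;
    # docstrings are not inherited through this attribute.
    doc = cls.__doc__
    return doc.strip() if doc else ""

class Address:
    country: str
    city: str

class DocumentedAddress:
    """An address"""
    country: str
    city: str

print(repr(schema_description(Address)), repr(schema_description(DocumentedAddress)))
```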

Typer version too old

Describe the bug
Is there a reason why Typer version ^0.4.0 is used while the latest version is 0.9.0?
It might conflict with other packages that require a more recent version of Typer.

Bounty: Streaming function calls

To be considered, check out: https://replit.com/bounties/@jxnl/streaming-json-parse

I'd like to be able to parse function calls as they stream out for MultiTask when doing streaming function calls. You can use any existing Python library. It must work for nested and deep objects.

Below is some code that won't work, since there's no good way of doing this:

from pydantic import BaseModel

class Task(BaseModel):
    id: int
    title: str

# This is your existing generator that yields chunks of JSON string
def json_chunks(json_string):
    for i in range(0, len(json_string), 5):  # replace 5 with the chunk size you want
         chunk = json_string[i:i+5]
         print("yield chunk:", chunk)
         yield chunk

def tasks_from_chunks(json_chunks: Generator[str, None, None]):
     # do something to get a single task_json
     task = Task.parse_raw(**task_json)
     print("yield task", task)
     yield task
     
json_string = '{"tasks":[{"id":1,"title":"task1"},{"id":2,"title":"task2"},{"id":3,"title":"task3"}]}'

for task in tasks_from_chunks(json_chunks(json_string)):
     print(task)

Success criteria

Tasks are yielded as soon as they are parsed; therefore task 1 should be yielded before all JSON chunks have been yielded.
must contain a few examples to show it works correctly.
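
One possible approach to the first criterion, sketched with the standard library: buffer incoming chunks and repeatedly try `json.JSONDecoder.raw_decode` on the buffer, yielding each array element once it parses. The function `iter_array_items` is hypothetical. It assumes compact JSON (no whitespace between the key and the opening bracket) and that every array element is a JSON object, so an incomplete element never parses early; nested objects inside each element are handled by the decoder.

```python
import json
from typing import Iterable, Iterator

def iter_array_items(chunks: Iterable[str], key: str = "tasks") -> Iterator[dict]:
    """Yield each object of the array stored under key as soon as it is complete."""
    decoder = json.JSONDecoder()
    buffer = ""
    started = False
    for chunk in chunks:
        buffer += chunk
        if not started:
            # Wait until the opening bracket of the target array has arrived.
            marker = buffer.find('"%s":[' % key)
            if marker == -1:
                continue
            buffer = buffer[marker + len(key) + 4:]
            started = True
        while True:
            buffer = buffer.lstrip(" ,\n\t")
            if not buffer or buffer[0] == "]":
                break
            try:
                item, end = decoder.raw_decode(buffer)
            except json.JSONDecodeError:
                break  # element still incomplete; wait for more chunks
            yield item
            buffer = buffer[end:]

def json_chunks(json_string: str, size: int = 5) -> Iterator[str]:
    for i in range(0, len(json_string), size):
        yield json_string[i:i + size]

json_string = '{"tasks":[{"id":1,"title":"task1"},{"id":2,"title":"task2"},{"id":3,"title":"task3"}]}'
for task in iter_array_items(json_chunks(json_string)):
    print(task)
```

Each yielded dict could then be validated into a `Task` model. The sketch re-parses the buffer from the start of the current element on every chunk, which is simple but not optimal for very large elements.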

Add LLM based citation

It would be nice to have Facts generated with semantic citations (not the regex-based ones in the cookbook). We can do this with a custom validation function that invokes an LLM call.

Bug - cannot import name 'FieldValidationInfo' from 'pydantic'

Describe the bug
I get the following error: cannot import name 'FieldValidationInfo' from 'pydantic'.

When doing:

from instructor import OpenAISchema

To Reproduce

from instructor import OpenAISchema

Expected behavior

Expected it to not crash

Screenshots

Desktop (please complete the following information):
Version 0.2.8
Macbook Pro - Intel
Chrome

Help: Reorganize module structure

It would be nice to have a structure with a directory per example, so we can have a readme.md for each example and a list of evals to run.
