
Comments (15)

drbh commented on September 21, 2024

Hi @puppetm4st3r, thank you for noting this issue. In order to open a PR you'll need to fork the repo and open a PR from your fork to this repo.

In order to run TGI locally you'll need to build everything and run the text-generation-launcher binary. Please see the local installation instructions in the readme: https://github.com/huggingface/text-generation-inference?tab=readme-ov-file#local-install

joumenharzli commented on September 21, 2024

If anyone is constantly getting "tools" as the function name, it's not related to the model you are using; there is a bug here:

name: "tools".to_string(),

puppetm4st3r commented on September 21, 2024

I solved it (I think), but it's my first time with Rust and I couldn't get the code to work in my local virtual env (the instructions from the readme.md didn't work), so I modified the code and ran it through the Docker build, and the container worked according to the OpenAI specification. Could you guide me (@drbh) on how to proceed: the regular pipeline to be able to run things locally, and the tests, so I can open the PR the appropriate way?

Now the output for:

from openai import OpenAI
tools = [
      {
          "type": "function",
          "function": {
              "name": "get_current_weather",
              "description": "Get the current weather",
              "parameters": {
                  "type": "object",
                  "properties": {
                      "location": {
                          "type": "string",
                          "description": "The city and state, e.g. San Francisco, CA",
                      },
                      "format": {
                          "type": "string",
                          "enum": ["celsius", "fahrenheit"],
                          "description": "The temperature unit to use. Infer this from the users location.",
                      },
                  },
                  "required": ["location", "format"],
              },
          },
      },
      {
          "type": "function",
          "function": {
              "name": "get_n_day_weather_forecast",
              "description": "Get an N-day weather forecast",
              "parameters": {
                  "type": "object",
                  "properties": {
                      "location": {
                          "type": "string",
                          "description": "The city and state, e.g. San Francisco, CA",
                      },
                      "format": {
                          "type": "string",
                          "enum": ["celsius", "fahrenheit"],
                          "description": "The temperature unit to use. Infer this from the users location.",
                      },
                      "num_days": {
                          "type": "integer",
                          "description": "The number of days to forecast",
                      },
                  },
                  "required": ["location", "format", "num_days"],
              },
          },
      }
  ]
# Initialize the client, pointing it to one of the available models
client = OpenAI(
    base_url="http://llm_server:3000/v1",
    api_key="_"
)

# NOTE: tools are defined above

chat_completion = client.chat.completions.create(
    model="tgi",
    messages=[
        {
            "role": "system",
            "content": "Don't make assumptions about what values to plug into functions. Ask for clarification if a user request is ambiguous.",
        },
        {
            "role": "user",
            "content": "What's the weather like the next 3 days in San Francisco, CA?",
        },
    ],
    tools=tools,
    tool_choice="auto",  # tool selected by model
    max_tokens=500,
)


called = chat_completion.choices[0].message.tool_calls
print(called)

Code output:
[ChatCompletionMessageToolCall(id='0', function=Function(arguments='{"format":"fahrenheit","location":"San Francisco, CA","num_days":3}', name='get_n_day_weather_forecast'), type='function')]

OpenAI spec output from the docs:
[ChatCompletionMessageToolCall(id='call_ujD1NwPxzeOSCbgw2NOabOin', function=Function(arguments='{\n "location": "Glasgow, Scotland",\n "format": "celsius",\n "num_days": 5\n}', name='get_n_day_weather_forecast'), type='function')]

LLM raw output from TGI debug tracing:

{'id': 0, 'type': 'function', 'function': {'name': 'get_n_day_weather_forecast', 'arguments': {'format': 'celsius', 'location': 'San Francisco, CA', 'num_days': 3}}}
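
Note the two gaps versus the OpenAI spec: TGI emits a numeric id and (in the raw trace) a JSON object for arguments, while OpenAI emits a "call_..." id and a JSON-encoded string. Until the fix lands, a client can normalize the arguments defensively. A minimal sketch reusing the chat_completion object from the snippet above (parse_tool_arguments is a hypothetical helper, not part of any PR):

import json

def parse_tool_arguments(function) -> dict:
    """Return tool-call arguments as a dict, whether the server sent them
    as a JSON-encoded string (OpenAI spec) or as an already-decoded object
    (pre-fix TGI)."""
    args = function.arguments
    return json.loads(args) if isinstance(args, str) else args

called = chat_completion.choices[0].message.tool_calls
args = parse_tool_arguments(called[0].function)
print(args["location"], args["num_days"])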

I did not touch the streaming methods or the ChatCompletionChunk and ChatCompletionDelta objects; my Rust understanding is still pretty basic.

puppetm4st3r commented on September 21, 2024

Thanks, I will try! This is my first attempt at a contribution on GitHub. Regards!

puppetm4st3r commented on September 21, 2024

I'm fine-tuning some details after sending the PR, but I have realized that for more complex functions (in a production environment) the models are very sensitive to the prompt engineering of both the descriptions in the tool's JSON schema and the tooling_prompt. I have tried many models of different sizes: at 7B it is a disaster and practically does not work (a lot of hallucinations in the selected tool's parameters), the Mixtral 8x7B flavors didn't work well, at 34B it works with some errors, and with a 34Bx2 MoE the quality is already acceptable enough to bring to a production environment. I also ran some experiments to mimic the OpenAI behaviour more closely, having the LLM respond in natural language when there is no need to call a function, but it was a mess: a lot of confusion for the LLM, and it did not work even on a 70B. Maybe it would with a bigger model, but I don't have access to more VRAM, so my limit is 70B.

The conclusion: to mimic OpenAI function calling 100%, there is a lot of work to be done and many challenges to solve!

@maziyarpanahi for my production solution I'm planning to use TGI with 2 models: one for NLP and another for function calling without guidance, something like gorilla-llm/gorilla-openfunctions-v2.

puppetm4st3r commented on September 21, 2024

@maziyarpanahi this could be helpful for you: https://medium.com/@prudant/enabling-function-calling-with-gorilla-llm-gorilla-openfunctions-v2-using-the-openai-protocol-355492d0587d , my latest simple but efficient implementation of local function calling.

puppetm4st3r commented on September 21, 2024

I'm trying to solve the problem, making sure to get as close as possible to OpenAI's response schema in the tools/function-calling API. It's my first time with Rust, but I already managed to compile the solution with my changes. Now I'll try it out and report back, and hopefully you can guide me on the steps to follow to make a PR :)

puppetm4st3r commented on September 21, 2024

Since forcing the LLM output does not allow the LLM to give conversational feedback, I additionally added the construction of a function that lets the LLM report an error when trying to select a tool, either because required parameters are missing or because no tool can fulfill the user's request. This allows controlling the precise execution of the tools, or not executing them when it isn't possible. It was implemented by modifying the default tools prompt, moving the JSON schema and tool-selection instructions presented in the prompt into the last user-role message (to better guide the LLM).

The new instruction prompt is:
Instructions for Tool Selection and Execution:\n1) Tools definitions: You will be presented with a JSON schema representing a set of tools and their execution constraints, intended for responding to user requests.\n2) Direct Matching Required: Select a tool that matches the user's request based on explicitly provided information. Avoid making assumptions about the user's intentions. The selected tool must directly address the request as specified, without inferring additional user intentions.\n3) Handling Incomplete Requests: If the user's request lacks sufficient detail to make a clear tool selection:\n - Do not guess or infer missing parameters.\n - Notify the situation with an error message detailing what specific information is missing.\n4) Error Reporting: If it's determined that no available tools can appropriately respond to the user's request due to missing or mismatched information, report this with an error message explaining in detail the discrepancy and why tool execution isn't possible.\n\nJSON Schema:\n

And the final prompt, after applying the chat template (with ChatML), looks like this:

<|im_start|>system
Please resolve the user's request, if it is not possible to resolve the request then report an error.<|im_end|>
<|im_start|>user
User request: Paris temperature today

---------------------------
Instructions for Tool Selection and Execution:
1) Tools definitions: You will be presented with a JSON schema representing a set of tools and their execution constraints, intended for responding to user requests.
2) Direct Matching Required: Select a tool that matches the user's request based on explicitly provided information. Avoid making assumptions about the user's intentions. The selected tool must directly address the request as specified, without inferring additional user intentions.
3) Handling Incomplete Requests: If the user's request lacks sufficient detail to make a clear tool selection:
   - Do not guess or infer missing parameters.
   - Notify the situation with an error message detailing what specific information is missing.
4) Error Reporting: If it's determined that no available tools can appropriately respond to the user's request due to missing or mismatched information, report this with an error message explaining in detail the discrepancy and why tool execution isn't possible.
JSON Schema:
{"$functions":{"get_current_weather_by_city":{"description":"Given a city gets the current weather from","properties":{"format":{"description":"The temperature unit to use. Infer this from city","enum":["celsius","fahrenheit"],"type":"string"},"location":{"description":"The city name from a valid country (Only city names are valid inputs).","type":"string"},"name":{"const":"get_current_weather_by_city","description":"The name of the function","type":"string"}},"required":["city","format"],"type":"object"},"get_n_day_weather_forecast_by_city":{"description":"Given a city gets an N-day weather forecast","properties":{"format":{"description":"The temperature unit to use. Infer this from the city","enum":["celsius","fahrenheit"],"type":"string"},"location":{"description":"The city name from a valid country (Only city names are valid inputs)","type":"string"},"name":{"const":"get_n_day_weather_forecast_by_city","description":"The name of the function","type":"string"},"num_days":{"description":"The number of days to forecast","type":"integer"}},"required":["city","format","num_days"],"type":"object"},"notify_error":{"description":"Useful to notify when a tool can not be called.","properties":{"error":{"description":"The error or issue to notify","type":"string"}},"required":["error","language"],"type":"object"}},"properties":{"function":{"anyOf":[{"$ref":"#/$functions/get_current_weather_by_city"},{"$ref":"#/$functions/get_n_day_weather_forecast_by_city"},{"$ref":"#/$functions/notify_error"}]}},"required":["function"]}
---------------------------<|im_end|>
<|im_start|>assistant

The LLM response is:

{
  "function": {
    "format": "celsius",
    "location": "Paris",
    "name": "get_current_weather_by_city"
  }
}

If I ask for the temperature on the moon, for example, the LLM response is:

{
  "function": {
    "error": "The request cannot be resolved with the available tools. The request requests the temperature on the moon, but the available tools can only provide weather information for cities on Earth. Please try again with a city on the earth."
  }
}
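
Since the grammar can now yield either a real tool call or the notify_error escape hatch, the caller has to dispatch on the shape of the response. A minimal sketch using the function names from the schema above (the HANDLERS registry and its lambdas are purely illustrative):

import json

# Illustrative registry mapping tool names from the schema above
# to local implementations.
HANDLERS = {
    "get_current_weather_by_city":
        lambda **kw: f"current weather in {kw['location']}",
    "get_n_day_weather_forecast_by_city":
        lambda **kw: f"{kw['num_days']}-day forecast for {kw['location']}",
}

def dispatch(raw_llm_output: str) -> str:
    call = json.loads(raw_llm_output)["function"]
    if "error" in call:
        # The model selected the notify_error pseudo-tool instead of a real tool.
        raise RuntimeError(call["error"])
    name = call.pop("name")
    return HANDLERS[name](**call)

# The Paris response above resolves to a real tool call:
print(dispatch('{"function": {"format": "celsius", '
               '"location": "Paris", "name": "get_current_weather_by_city"}}'))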

maziyarpanahi commented on September 21, 2024

Thanks @puppetm4st3r for your great work! I've been using the grammar feature in Llama.cpp for function calling since the beginning, and I can't wait to integrate this into my TGI setup. Appreciate your contributions here!

maziyarpanahi commented on September 21, 2024

@maziyarpanahi for my production solution I'm planning to use TGI with 2 models: one for NLP and another for function calling without guidance, something like gorilla-llm/gorilla-openfunctions-v2.

Thanks @puppetm4st3r for the detailed reply. Are there examples that failed badly when they shouldn't have? Is the failure due to the model's weakness, or to a bug in enforcing the grammar? (Does the test pass on another serving platform, like Llama.cpp with a JSON grammar?)

puppetm4st3r commented on September 21, 2024

I think it's the way of enforcing. Small models did not work well even with direct use of guidance frameworks, so in my experience, for my use cases, a 7B model fine-tuned for function calling is better than a 7B model with forced grammar.

On the other side, a 7B fine-tuned for function calling lacks enough reasoning for complex tasks, so you can maybe use it for simple tasks like sending an email or querying a simple SQL table.

My best results in terms of quality/cost came from forcing grammars with a good 34B model; anything lower didn't work well for me. I also tried a 72B, but only with a 2k context length because my setup has 48 GB. I'm waiting for the new 4-bit KV cache from ExLlamaV2 to be included in the inference servers; that would allow us to run larger models with larger contexts on consumer GPUs.

For now I'm building a GPTQ quant to try gorilla-llm/gorilla-openfunctions-v2 at 4 bits for function calling and a 34B for inference, with 2 TGI instances.

The bad performance of the 7B models shows up as a lot of hallucination and guessing of the function parameters; they respect the forced grammar in terms of structure, but the content is a mess unless your prompt is a poem of perfect request definition.
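
For anyone who wants to reproduce this kind of grammar forcing outside the tools endpoint, TGI exposes it through the grammar parameter of the /generate route. A minimal sketch, assuming a TGI instance on localhost:3000 and a schema shaped like the weather examples above (see the TGI guidance docs for the exact parameter shape):

import requests

# Illustrative JSON schema to constrain the model's output to.
schema = {
    "type": "object",
    "properties": {
        "location": {"type": "string"},
        "format": {"type": "string", "enum": ["celsius", "fahrenheit"]},
    },
    "required": ["location", "format"],
}

response = requests.post(
    "http://localhost:3000/generate",  # placeholder endpoint
    json={
        "inputs": "Paris temperature today",
        "parameters": {
            "max_new_tokens": 200,
            # Constrain generation to the schema above.
            "grammar": {"type": "json", "value": schema},
        },
    },
)
print(response.json()["generated_text"])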

maziyarpanahi commented on September 21, 2024

@maziyarpanahi this could be helpful for you: https://medium.com/@prudant/enabling-function-calling-with-gorilla-llm-gorilla-openfunctions-v2-using-the-openai-protocol-355492d0587d , my latest simple but efficient implementation of local function calling.

Thanks @puppetm4st3r for sharing that post. I will start using TGI with grammar this week and compare it with Llama.cpp for function calling. (I mainly use 70B in 16-bit)

jphme commented on September 21, 2024

Maybe relevant here as well; I just commented under the PR (#1587 (comment)):

To make smaller models useful, it would be very beneficial to add proper documentation for the function definition and function-call format (when serialized to strings / in the prompt). Model creators could use this format for fine-tuning; currently it's a huge issue that there is no standardized format and everyone does their own (and each needs additional wrappers, so code doesn't work with OpenAI-compatible libs out of the box, as the standard inference stacks don't support it).

Just yesterday we had a discussion with Teknium about that, as Nous released a new model with a custom function format (as we did at DiscoResearch in the past).

vibhorag101 commented on September 21, 2024

I am facing the same issue. The tool format being incorrect per the OpenAI spec breaks compatibility with the instructor package. I hope the PR fixes this.

drbh commented on September 21, 2024

Update

Hi @puppetm4st3r, the tool response type has been updated in this PR: #1650

Further discussion

If anyone is constantly getting "tools" as the function name, it's not related to the model you are using; there is a bug here:

name: "tools".to_string(),

Regarding the tool name, this is due to how functions are constrained in TGI; a larger discussion has been opened here: #1657
