Comments (8)
@qrdlgit It's essentially a RAG-based method, so the APIs don't need to be contained in the training set.
from gorilla.
Heh! This is my contribution I'm afraid. g'luck.
Yes, but the paper claims superior performance to GPT4.
I have consistently found that GPT4 hallucinates less on data that it has been trained on. When you add vector retrieval, it does an even better job.
For APIs that were added after the cutoff date, it wouldn't be surprising that GPT4 hallucinations would increase.
This might explain why Gorilla can outperform GPT4.
This is not a complaint. The Gorilla paper was really great and has lots of fantastic ideas.
I didn't see any discussion of this in the paper. If there was and I missed it, please let me know. I just want to understand.
If you compare performance between Gorilla and GPT4 on APIs that were added after the cutoff date versus ones that came before, what would it look like?
They used APIs that have been quite stable for a while, and I believe not much has changed since the cutoff of GPT4 pre-training, so the benchmark seems fair enough to me.
Don't take this personally, but I'm not sure you are familiar with these details.
E.g., in https://github.com/ShishirPatil/gorilla/blob/main/data/apibench/huggingface_train.json
I found microsoft/xclip-base-patch16-zero-shot, which had its initial commit in the last 9 months.
@qrdlgit Thank you for your comments! One thing we need to clarify:
We don't require GPT-4 to output exactly the same API here. As long as the API in GPT-4's output has the same functionality, we count it as correct. See the script here: https://github.com/ShishirPatil/gorilla/blob/main/eval/eval-scripts/ast_eval_hf.py. This has been consistent from the very beginning.
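To make "same functionality counts as correct" concrete, here is a rough sketch of one way such a check could work. It is not the logic of ast_eval_hf.py: the helper names and the idea of a hand-built set of functionally equivalent model IDs (`accepted_models`) are assumptions made purely for illustration.

```python
import ast

def called_names(code):
    """Return the dotted names of every function called in `code`."""
    names = set()
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, ast.Call):
            names.add(ast.unparse(node.func))  # requires Python 3.9+
    return names

def string_args(code):
    """Return every string literal passed as a call argument in `code`."""
    strings = set()
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, ast.Call):
            values = list(node.args) + [kw.value for kw in node.keywords]
            for value in values:
                if isinstance(value, ast.Constant) and isinstance(value.value, str):
                    strings.add(value.value)
    return strings

def matches_reference(output, ref_call, accepted_models):
    """True if `output` calls `ref_call` and passes any accepted model ID.

    `accepted_models` is a hypothetical set of IDs treated as equivalent.
    """
    return ref_call in called_names(output) and bool(
        string_args(output) & accepted_models
    )

# Placeholder model IDs, not entries from the benchmark.
accepted = {"org/model-a", "org/model-b"}
good = 'model = AutoModel.from_pretrained("org/model-b")'
bad = 'model = AutoModel.from_pretrained("org/unrelated")'
print(matches_reference(good, "AutoModel.from_pretrained", accepted))
print(matches_reference(bad, "AutoModel.from_pretrained", accepted))
```

The point of matching on the parsed call rather than on the raw string is that formatting, variable names, and surrounding code do not affect the result.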
That answers the question, but not in the way you probably intended, i.e., evals were not done with API dates in mind.
Again, Gorilla is still a great idea and paper. A lot of good takeaways for sure.
However, in the future you probably want to be more careful about data leakage / data contamination issues. This is a problem I'm seeing in a lot of papers coming out recently.
One thing you might want to try is evaluating post-cutoff APIs alone. GPT4's lack of fine-tuning capability, together with its cutoff date, is a significant Achilles' heel, at least for the moment.
If Gorilla's advantage is even larger there, that would be a great example of how open-source LLMs can be superior for certain use cases. GPT4 really is an (obsolete) jack of all trades, master of none.
Thank you for your question and insightful discussion @qrdlgit and @fritzprix! When it comes to the issue of data contamination, we are completely aligned. We have been cautious to ensure that Gorilla doesn't encounter any of the test set data during its training phase. However, we are unable to provide any comment on the training/test data for models that are closed-source.
Your point about splitting APIs before and after 09/2021 is well taken. As @fritzprix pointed out, we would ideally like to believe that an oracle retriever can address the issue concerning the cut-off date as effectively as possible.
To validate this hypothesis, you can conduct a straightforward experiment: given that our training and evaluation datasets are open-sourced, it should be relatively simple to separate out the APIs published after 09/2021 and evaluate on them alone. If you do end up doing it, please feel free to share the results. We would certainly appreciate such a contribution!
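A minimal sketch of that split, for anyone who wants to try it. The record fields (`published`, `correct`), the exact cutoff day, and the sample rows are all assumptions for illustration; the publication date for each API would have to be looked up separately (for example from its first commit), and the rows below are made up, not results.

```python
from datetime import date

# Assumed cutoff, based on the 09/2021 date mentioned in this thread.
CUTOFF = date(2021, 9, 30)

def split_by_cutoff(records, cutoff=CUTOFF):
    """Partition records into (pre, post) by their 'published' date."""
    pre = [r for r in records if r["published"] <= cutoff]
    post = [r for r in records if r["published"] > cutoff]
    return pre, post

def accuracy(records):
    """Fraction of records marked correct; None when the split is empty."""
    if not records:
        return None
    return sum(r["correct"] for r in records) / len(records)

# Made-up placeholder rows: one per evaluated API call.
records = [
    {"api": "model-a", "published": date(2020, 5, 1), "correct": True},
    {"api": "model-b", "published": date(2021, 1, 15), "correct": False},
    {"api": "model-c", "published": date(2022, 11, 3), "correct": True},
    {"api": "model-d", "published": date(2023, 2, 20), "correct": False},
]

pre, post = split_by_cutoff(records)
print("pre-cutoff:", len(pre), accuracy(pre))
print("post-cutoff:", len(post), accuracy(post))
```

Running the same split over each model's eval outputs would give a pre-cutoff and post-cutoff accuracy per model, which is the comparison asked about earlier in the thread.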