Comments (10)
Might be an interesting eval to verify if a model is capable of knowing about itself without additional context.
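A minimal sketch of what samples for such a self-identification eval might look like. The `input`/`ideal` layout is assumed to mirror the chat-style JSONL samples used by basic match-style evals; the prompts, the expected string, and the helper names (`build_samples`, `to_jsonl`) are hypothetical and chosen only for illustration.

```python
import json


def build_samples(expected_model: str, prompts: list[str]) -> list[dict]:
    """Pair each self-identification prompt with the expected model name."""
    return [
        {"input": [{"role": "user", "content": prompt}], "ideal": expected_model}
        for prompt in prompts
    ]


def to_jsonl(samples: list[dict]) -> str:
    """Serialize the samples as JSON Lines, one sample per line."""
    return "\n".join(json.dumps(sample) for sample in samples)


# Hypothetical prompts; no system message is included, so the model
# gets no additional context about what it is.
PROMPTS = [
    "What is your model number?",
    "Which OpenAI model are you?",
]

samples = build_samples("GPT-4", PROMPTS)
jsonl_text = to_jsonl(samples)
print(jsonl_text)
```

Leaving out a system message is deliberate here: the point of the eval would be to see what the model says about itself when nothing in the prompt tells it.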
But the account is charged for GPT-4.
I believe ChatGPT with model gpt-4 (within the browser) is using GPT-4 with additional context.
I also believe the Chat API (not within the browser) with model gpt-4 (like in the playground) is also using GPT-4.
In short: It's incorrect that it reports using GPT-3 when it really uses some version of GPT-4.
My justification is that I am running a bunch of evals and gpt-4 outperforms gpt-3.5-turbo.
And also my experience in using ChatGPT in the browser: The GPT-4 configuration outperforms the GPT-3 configuration and outputs better (more correct) responses.
My only concern and question mark is the training data cutoff date. As an end user, when I install or use a consumer-facing app that claims to be GPT-4 but answers as GPT-3, that is a UX problem.
You would think that when the platform was written, it would include the ID of the updated iteration. Not so much self-awareness as an identifier embedded in the program.
Better comparison would be the OpenAI Playground, since ChatGPT (Plus) is pre-fed some system context we don't know.
I tried it on the OpenAI Playground as well.
And the training data cutoff date it reports is 2020 (the same as GPT-3), not 2021.
The issue appears to be: When GPT-4 is configured, is GPT-4 used when the output says it is not GPT-4?
I think the best analogy I can make is: If I trained a parrot to bark like a dog, it does not mean that it is not a parrot.
If the configuration is set to GPT-4, GPT-4 is used even though the output states that it is not GPT-4.
For example, in your query:
ozgur@Ozgurs-MacBook-Pro ~ % curl https://api.openai.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer [REDACTED]" \
  -d '{
    "model": "gpt-4",
    "messages": [{"role": "user", "content": "What is your model number!"}]
  }'
{"id":"chatcmpl-6wVWzn95O6iDc1Z9W1kJNIZZtZZLh","object":"chat.completion","created":1679402249,"model":"gpt-4-0314","usage":{"prompt_tokens":12,"completion_tokens":45,"total_tokens":57},"choices":[{"message":{"role":"assistant","content":"As an AI language model, I do not have a model number like a physical device or product would. I am powered by OpenAI's GPT-3, which stands for Generative Pre-trained Transformer 3."},"finish_reason":"stop","index":0}]}
The query is set to use gpt-4:
"model": "gpt-4"
and the prompt is:
"What is your model number!"
and the output is:
"I am powered by OpenAI's GPT-3, which stands for Generative Pre-trained Transformer 3."
Here, there appears to be a factual error: the output says it is powered by GPT-3, but in fact it is powered by GPT-4. Although the output is inaccurate, the model producing it is still GPT-4. A model can answer questions about its own identity incorrectly, but that does not change which model is being used. In the GPT-4 Technical Report, OpenAI states:
Here, OpenAI acknowledges that GPT-4 "is not fully reliable" and "makes reasoning errors". In this case, it made a reasoning error about what model it is.
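The distinction between the model that served the request and the model the text claims to be can be checked from the response itself. The following is a rough sketch, not an official method: it uses a trimmed copy of the response shown above, and the helpers `served_family` and `claimed_families` are hypothetical names introduced only for this illustration.

```python
import re

# Trimmed copy of the API response shown above (only the fields used here).
response = {
    "model": "gpt-4-0314",
    "choices": [
        {
            "message": {
                "role": "assistant",
                "content": (
                    "As an AI language model, I do not have a model number like "
                    "a physical device or product would. I am powered by OpenAI's "
                    "GPT-3, which stands for Generative Pre-trained Transformer 3."
                ),
            }
        }
    ],
}


def served_family(model_id: str) -> str:
    """Reduce a model ID such as 'gpt-4-0314' to its family, 'gpt-4'."""
    match = re.match(r"gpt-\d+(?:\.\d+)?", model_id.lower())
    return match.group(0) if match else model_id.lower()


def claimed_families(text: str) -> set[str]:
    """Collect every 'GPT-<n>' mention in the generated text, lowercased."""
    mentions = re.findall(r"GPT-\d+(?:\.\d+)?", text, flags=re.IGNORECASE)
    return {mention.lower() for mention in mentions}


# The "model" field is metadata set by the API; the content is generated text.
served = served_family(response["model"])
claimed = claimed_families(response["choices"][0]["message"]["content"])
mismatch = bool(claimed) and served not in claimed

print(f"served={served} claimed={sorted(claimed)} mismatch={mismatch}")
```

The `model` field comes from the API rather than from the generated text, so it is the more trustworthy indicator of which model handled the request.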
OpenAI does not make any guarantees or promises about accuracy. When you use ChatGPT or OpenAI's API, you agree to OpenAI's Terms of use:
By using our Services, you agree to these Terms.
In Section 7(b) of the Terms of use, the Disclaimer states that they do not warrant that their services will be accurate or error free:
(b) Disclaimer. THE SERVICES ARE PROVIDED “AS IS.” EXCEPT TO THE EXTENT PROHIBITED BY LAW, WE AND OUR AFFILIATES AND LICENSORS MAKE NO WARRANTIES (EXPRESS, IMPLIED, STATUTORY OR OTHERWISE) WITH RESPECT TO THE SERVICES, AND DISCLAIM ALL WARRANTIES INCLUDING BUT NOT LIMITED TO WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, SATISFACTORY QUALITY, NON-INFRINGEMENT, AND QUIET ENJOYMENT, AND ANY WARRANTIES ARISING OUT OF ANY COURSE OF DEALING OR TRADE USAGE. WE DO NOT WARRANT THAT THE SERVICES WILL BE UNINTERRUPTED, ACCURATE OR ERROR FREE, OR THAT ANY CONTENT WILL BE SECURE OR NOT LOST OR ALTERED.
As mentioned by @jonathanagustin, our models tend to hallucinate facts and this case was most likely an instance of hallucination. However, it may be possible (though the issue you noticed doesn't point to it) that the UI is actually wrong! If you suspect that might be the case, please contact our support team.