Comments (4)
from deepseek-llm.
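For TriviaQA we evaluate on the web subset. In the actual evaluation, the few-shot examples for each sample are drawn at random from the train split; the tech report shows only one such example.
The 7-point gap in your results may also come from a difference in answer post-processing. Here is the post-processing script we use, for reference: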
import re
import string

def normalize_answer(s):
    """Lower text and remove punctuation, articles and extra whitespace."""
    def remove_articles(text):
        return re.sub(r"\b(a|an|the)\b", " ", text)

    def white_space_fix(text):
        return " ".join(text.split())

    def handle_punc(text):
        # Treat ASCII punctuation plus a few quote-like characters as spaces.
        exclude = set(string.punctuation + "".join(["‘", "’", "´", "`"]))
        return "".join(ch if ch not in exclude else " " for ch in text)

    def lower(text):
        return text.lower()

    def replace_underscore(text):
        return text.replace("_", " ")

    return white_space_fix(remove_articles(handle_punc(lower(replace_underscore(s))))).strip()
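As a rough illustration of why post-processing can move the score, here is a minimal sketch of how a normalizer like the one above might be used in scoring. It assumes an exact-match metric against a list of accepted answer aliases, which this thread does not confirm; `exact_match` and `gold_aliases` are hypothetical names, and `normalize_answer` is restated compactly so the snippet runs on its own.

```python
import re
import string


def normalize_answer(s):
    """Compact restatement of the normalization steps shared above."""
    s = s.replace("_", " ").lower()
    exclude = set(string.punctuation + "‘’´`")
    s = "".join(ch if ch not in exclude else " " for ch in s)
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())


def exact_match(prediction, gold_aliases):
    """Hypothetical scorer: True if the normalized prediction equals any
    normalized alias. The metric itself is an assumption, not taken from
    the thread."""
    pred = normalize_answer(prediction)
    return any(pred == normalize_answer(alias) for alias in gold_aliases)


# A raw string comparison would reject this prediction; after normalization it matches.
print(exact_match("The Beatles.", ["beatles", "Beatles (band)"]))
```

Under this assumption, a scorer that skips article or punctuation removal would count such predictions as wrong, which is one way a gap of several points could arise.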
from deepseek-llm.
Where can I find your evaluation script? I'd like to try reproducing the results.
from deepseek-llm.
The evaluation script has not been open-sourced yet.
from deepseek-llm.