microsofttranslator / gemba


GEMBA — GPT Estimation Metric Based Assessment

License: Creative Commons Attribution Share Alike 4.0 International

Python 100.00%

gemba's Introduction

GEMBA-MQM and GEMBA-DA

Setup

Install the required packages (Python >= 3.8):

pip install -r requirements.txt

Set up secrets for either the Azure API (first two variables) or the OpenAI API (last variable):

export OPENAI_AZURE_ENDPOINT=
export OPENAI_AZURE_KEY=

export OPENAI_API_KEY=
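
Before running the scorer, it can help to confirm which set of secrets is actually visible to Python. The following is a minimal sketch, not part of GEMBA: the helper name `detect_backend` is hypothetical, and checking Azure before OpenAI is an assumption about precedence rather than documented behavior.

```python
import os


def detect_backend(env=None):
    """Return 'azure', 'openai', or None depending on which secrets are set.

    Hypothetical helper; the Azure-first precedence is an assumption.
    """
    env = os.environ if env is None else env
    # Azure needs both the endpoint and the key to be non-empty.
    if env.get("OPENAI_AZURE_ENDPOINT") and env.get("OPENAI_AZURE_KEY"):
        return "azure"
    # OpenAI only needs the API key.
    if env.get("OPENAI_API_KEY"):
        return "openai"
    return None


if __name__ == "__main__":
    backend = detect_backend()
    print(backend if backend else "No API secrets found in the environment")
```

An `export` with an empty value (as in the template above) counts as unset here, which catches the case where the placeholder lines were copied without filling them in.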

Scoring with GEMBA

The scorer expects two files, a source file and a hypothesis file, with the same number of lines. It prints the score for each line pair:

python main.py --source=source.txt --hypothesis=hypothesis.txt --source_lang=English --target_lang=Czech --method="GEMBA-MQM" --model="gpt-4"

The main recommended methods are GEMBA-MQM and GEMBA-DA, used with the gpt-4 model.
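
Because the scorer assumes the two input files are line-aligned, a quick pre-check can avoid spending API calls on mismatched data. This is a minimal sketch under that assumption; `read_line_pairs` is a hypothetical helper and is not part of main.py.

```python
from pathlib import Path


def read_line_pairs(source_path, hypothesis_path):
    """Read both files and return (source, hypothesis) pairs.

    Raises ValueError when the files do not have the same number of lines.
    """
    src = Path(source_path).read_text(encoding="utf-8").splitlines()
    hyp = Path(hypothesis_path).read_text(encoding="utf-8").splitlines()
    if len(src) != len(hyp):
        raise ValueError(
            f"Line count mismatch: {len(src)} source vs {len(hyp)} hypothesis lines"
        )
    return list(zip(src, hyp))
```

Calling `read_line_pairs("source.txt", "hypothesis.txt")` before the command above fails fast if the files have drifted out of alignment.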

Collecting and evaluating experiments for GEMBA-DA

Get mt-metrics-eval and download the resources:

git clone https://github.com/google-research/mt-metrics-eval.git
cd mt-metrics-eval
pip install .
alias mtme='python3 -m mt_metrics_eval.mtme'
mtme --download
cd ..
mv ~/.mt-metrics-eval/mt-metrics-eval-v2 mt-metrics-eval-v2
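
After the move, the scripts look for the resources relative to the working directory; one of the issues further down reports a FileNotFoundError for 'mt-metrics-eval-v2/wmt22/sources/zh-en.txt'. A small check like the following can confirm the directory is in place. This is a sketch only: `missing_resources` is a hypothetical helper, and the example path is taken from that issue rather than from a documented list of required files.

```python
from pathlib import Path


def missing_resources(root, relative_paths):
    """Return the relative paths that do not exist under root."""
    root = Path(root)
    return [p for p in relative_paths if not (root / p).exists()]


if __name__ == "__main__":
    # Example path taken from the FileNotFoundError reported in the issues.
    missing = missing_resources("mt-metrics-eval-v2", ["wmt22/sources/zh-en.txt"])
    print("Missing:", missing if missing else "nothing")
```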

Collect data and run the scorer

python gemba_da.py 

export PYTHONPATH=mt-metrics-eval:$PYTHONPATH
python evaluate.py

License

GEMBA code and data are released under the CC BY-SA 4.0 license.

Paper

You can read more about GEMBA-DA in our arXiv paper or GEMBA-MQM in our arXiv paper.

How to Cite

GEMBA-MQM

@inproceedings{kocmi-federmann-2023-gemba-mqm,
    title = {GEMBA-MQM: Detecting Translation Quality Error Spans with GPT-4},
    author = {Kocmi, Tom  and Federmann, Christian},
    booktitle = "Proceedings of the Eighth Conference on Machine Translation",
    month = dec,
    year = "2023",
    address = "Singapore",
    publisher = "Association for Computational Linguistics",
}

GEMBA-DA

@inproceedings{kocmi-federmann-2023-large,
    title = "Large Language Models Are State-of-the-Art Evaluators of Translation Quality",
    author = "Kocmi, Tom and Federmann, Christian",
    booktitle = "Proceedings of the 24th Annual Conference of the European Association for Machine Translation",
    month = jun,
    year = "2023",
    address = "Tampere, Finland",
    publisher = "European Association for Machine Translation",
    url = "https://aclanthology.org/2023.eamt-1.19",
    pages = "193--203",
}


gemba's Issues

CREDENTIALS

Hi,

Thank you for sharing the code to your metric. I am trying to run:
python gemba_mqm.py --source=source.txt --hypothesis=hypothesis.txt --source_lang=English --target_lang=Czech

I added my OpenAI key in the CREDENTIALS.py in the api_key value. But I get this error:

"Error, retrying...'gpt-4'".
I am running my experiment in COLAB.

I am not sure if there is anything else I need to modify in the CREDENTIALS.py file besides the api_key. I am using my account in OpenAI API. Could you please help with this issue?

Running main raises FileNotFoundError: [Errno 2] No such file or directory: 'mt-metrics-eval-v2/wmt22/sources/zh-en.txt'

This is not an issue; it was my mistake.

Can the software also use GPT4T?

I tried to use a GPT4T model, but was not successful (GPT4 worked).

Maybe this is related to the old-API issue?

Connection Error after using own OpenAI API key

Dear Microsoft,

Hi, I keep getting this connection error after configuring the CREDENTIALS.py file. I have no idea what is going on and no clue how to debug this problem. Could you please elaborate on how to solve this issue?

Thank you!

P.S. Here is my CREDENTIALS.py config. I am not using Azure.

credentials = {
    "deployments": {"gpt-4": "gpt-4-turbo"},
    "api_base": "https://.openai.azure.com/",
    "api_key": "**************************",
    "requests_per_second_limit": 1
}
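
One thing worth checking in a config like this: the gemba/gpt_api.py excerpt quoted in the dependency issue further down branches on whether "api_base" is present in the credentials and treats that case as Azure access. The sketch below mirrors only that branch; `detect_api_type` is a hypothetical name, and whether removing "api_base" resolves the connection error reported here is an assumption, not something confirmed in this thread.

```python
def detect_api_type(credentials):
    """Mirror the quoted branch: an "api_base" key selects Azure access."""
    return "azure" if "api_base" in credentials else "openai"


# Config as posted above (key masked, URL left as given).
posted = {
    "deployments": {"gpt-4": "gpt-4-turbo"},
    "api_base": "https://.openai.azure.com/",
    "api_key": "**************************",
    "requests_per_second_limit": 1,
}

# Same config without the Azure endpoint entry.
without_api_base = {k: v for k, v in posted.items() if k != "api_base"}

print(detect_api_type(posted))            # azure
print(detect_api_type(without_api_base))  # openai
```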

[Dependency changes] OpenAI changes API

When using GEMBA as-is with gpt-4 as the "deployment"/model and openai==, it throws an error:

You tried to access openai.ChatCompletion, but this is no longer supported in openai>=1.0.0 - see the README at https://github.com/openai/openai-python for the API.

You can run `openai migrate` to automatically upgrade your codebase to use the 1.0.0 interface. 

Alternatively, you can pin your installation to the old version, e.g. `pip install openai==0.28`

A detailed migration guide is available here: https://github.com/openai/openai-python/discussions/742

Running openai migrate shows:

GEMBA % openai migrate
Retrieving Grit CLI metadata from https://api.keygen.sh/v1/accounts/custodian-dev/artifacts/marzano-macos-x64
Your working tree currently has untracked changes and Grit will rewrite files in place. Do you want to proceed? yes

./gemba/gpt_api.py
     
             if "api_base" in credentials:
                 # Azure API access
    -            openai.api_type = "azure"
    -            openai.api_version = "2023-05-15"
    -            openai.api_base = credentials["api_base"]
    -            openai.api_key = credentials["api_key"]
                 self.api_type = "azure"
             else:
                 # OpenAI API access
    -            openai.api_key = credentials["api_key"]
                 self.api_type = "openai"
     
             # limit the number of requests per second

Even after the migration, the text completion does not work, so users have to re-clone the GEMBA repo and run:

python3 -m pip install -r requirements.txt
python3 -m pip install openai==0.28

before doing

python gemba_mqm.py --source=source.txt --hypothesis=hypothesis.txt \
  --source_lang=German --target_lang=English
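
To confirm that the pin took effect before running the scorer, the installed openai version can be inspected from Python. This is a minimal sketch using the standard library's importlib.metadata; the helper names are hypothetical, and treating any 0.x release as compatible is an assumption based on the error message above, which only states that openai.ChatCompletion is unsupported in openai>=1.0.0.

```python
from importlib import metadata


def is_legacy_openai(version):
    """Return True if the version string is below 1.0.0 (major version 0)."""
    major = int(version.split(".")[0])
    return major < 1


def installed_openai_is_legacy():
    """Return True/False for the installed package, or None if not installed."""
    try:
        return is_legacy_openai(metadata.version("openai"))
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    status = installed_openai_is_legacy()
    if status is None:
        print("openai is not installed")
    elif status:
        print("openai < 1.0.0: the legacy openai.ChatCompletion interface is available")
    else:
        print("openai >= 1.0.0: consider pinning with pip install openai==0.28")
```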