
automatic_prompt_engineer's People

Contributors

andreimuresanu, keirp, michaelrzhang, yongchaoz



automatic_prompt_engineer's Issues

"echo" when using new model

When I use gpt-3.5-turbo-instruct, the error says:
Setting 'echo' and 'logprobs' at the same time is not supported for this model.

When I set 'echo' to false, the downstream code logic breaks and a new error appears.
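One way to work around the conflict is to strip the `echo` flag from the request config before calling the completion API. This is a minimal sketch of such a helper (the function name and config shape are assumptions, not part of the repository); note that any downstream code expecting the echoed prompt in the response would still need adjusting, which is likely the "logic problem" described above.

```python
def sanitize_completion_config(config):
    """Drop the 'echo' flag when 'logprobs' is also set, since newer
    completion models (e.g. gpt-3.5-turbo-instruct) reject the
    combination. Returns a new dict; the original is left untouched."""
    if config.get("echo") and config.get("logprobs") is not None:
        config = {k: v for k, v in config.items() if k != "echo"}
    return config

# Example: the conflicting 'echo' key is removed, other keys survive.
cfg = {"model": "gpt-3.5-turbo-instruct", "logprobs": 1,
       "echo": True, "max_tokens": 0}
cfg = sanitize_completion_config(cfg)
```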

Question: Can I generate prompts without OUTPUT?

This project fits very well in one of my use cases where I need to generate prompts for DALLE from a set of image classes.
Example: classes = ["backpack", "mobile-phone"]

However, I don't have any outputs. Do you think that's something I can achieve?

Thanks.

Question about the searching process in Algorithm 1 Line 2-9.

Hi Keiran, thanks for the really great work! I have a question about the implementation of run_instruction_induction.py. In the paper, Algorithm 1 (lines 2-9) describes a process of iteratively keeping the top k% of prompts and re-evaluating them on random training subsets. Maybe I didn't read carefully enough, but is this process implemented in the code? Also, how is convergence judged, and what value of k is used? Looking forward to your reply!
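For reference, the filtering loop being asked about can be sketched roughly like this. This is a reading of Algorithm 1, not the repository's actual implementation; `score_fn` stands in for the (expensive) log-likelihood evaluation, and the convergence criterion here (stop when one prompt remains or the round budget is spent) is an assumption.

```python
import random

def filter_prompts(prompts, score_fn, data, k=0.25, subset_size=20, rounds=3):
    """Sketch of Algorithm 1, lines 2-9: repeatedly score the surviving
    prompts on a fresh random subset of the training data and keep the
    top k fraction."""
    survivors = list(prompts)
    for _ in range(rounds):
        if len(survivors) <= 1:
            break  # nothing left to filter
        subset = random.sample(data, min(subset_size, len(data)))
        scored = sorted(survivors, key=lambda p: score_fn(p, subset),
                        reverse=True)
        survivors = scored[:max(1, int(len(scored) * k))]
    return survivors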

how to access `curie:ft-uoft-ml-group-2022-09-26-01-07-30`?

Hi! Thanks for the open-source and readable code! However, when I run python experiments/run_truthful_qa.py, it raises a "does not exist" error. Should I use "curie" instead of "curie:ft-uoft-ml-group-2022-09-26-01-07-30"? Thanks in advance.

Possible source of the error:

model_names = {
    "judge": "curie:ft-uoft-ml-group-2022-09-26-01-07-30",
    "info": "curie:ft-uoft-ml-group-2022-09-27-13-35-15"
}

error

The model: `curie:ft-uoft-ml-group-2022-09-26-01-07-30` does not exist

Can I use other models for generation instead?

Hello. When I tried the 'gpt-3.5-turbo' model for generation, I got an error.

This is the error message: This is a chat model and not supported in the v1/completions endpoint. Did you mean to use v1/chat/completions?

try:
--> 160                 response = openai.Completion.create(
    161                     **config, prompt=prompt)

Can I use other models?
How do I change the endpoint?
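The error happens because chat models like gpt-3.5-turbo are served by /v1/chat/completions, while the code calls openai.Completion.create, which hits /v1/completions. A minimal routing sketch is below; the model lists are illustrative assumptions, not an official registry, and note that gpt-3.5-turbo-instruct is a completion model despite sharing the gpt-3.5-turbo prefix.

```python
# Illustrative routing table, not an official OpenAI model list.
COMPLETION_MODELS = ("gpt-3.5-turbo-instruct", "text-davinci-002")
CHAT_ONLY_PREFIXES = ("gpt-3.5-turbo", "gpt-4")

def endpoint_for(model):
    """Return which OpenAI endpoint a model name should be sent to.
    Completion models are checked first so that the instruct variant
    is not misrouted by its chat-like prefix."""
    if model in COMPLETION_MODELS:
        return "/v1/completions"
    if model.startswith(CHAT_ONLY_PREFIXES):
        return "/v1/chat/completions"
    return "/v1/completions"
```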

another domain

Can I randomly generate datasets to apply APE to other domains?

Why use F1 for evaluation on common_concept?

Hi, thanks for your great work. I have read the paper and reviewed the code, and I see that F1 is used as the metric for the common_concept dataset. However, the evaluation function utility.get_multi_answer_f1 seems buggy in this case: it receives two strings like "[Yes, Yes]" as the prediction and ground truth when computing F1, rather than word lists like "[[Yes, Yes, No], [No, Yes, No]]". I don't think it returns the correct F1, because each test sample is scored separately. Could you clarify? Thanks!

In addition, it seems you use EA as the metric for the common_concept dataset in Figure 4, which does not match the code (screenshots omitted).
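For context, a multi-answer F1 is usually computed SQuAD-style: token-level F1 between the prediction and each gold answer, taking the maximum. This sketch shows that convention; it is one plausible reading of what utility.get_multi_answer_f1 is meant to do, not a copy of the repository's code.

```python
from collections import Counter

def token_f1(prediction, ground_truth):
    """Token-overlap F1 between two strings (SQuAD-style)."""
    pred, gold = prediction.split(), ground_truth.split()
    common = Counter(pred) & Counter(gold)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)

def multi_answer_f1(prediction, answers):
    """Best token F1 of the prediction against any gold answer."""
    return max(token_f1(prediction, a) for a in answers)
```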

Error when using new model - gpt-3.5-turbo-instruct-0914

First I encountered the error:
"Too many parallel completions requested. You submitted..."

This was fixed by lowering the batch size to 3 in both the generation and the evaluation:
(prompt_gen_batch_size=3, eval_batch_size=3)

After that, I'm getting a division-by-zero error from these lines:
87 prompt_log_probs[-1].append(sum(lps) / len(lps))
88 i += 1
89 return prompt_log_probs
ZeroDivisionError: division by zero

Taken from the function _compute_avg_likelihood in the likelihood file.

What can I do?
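The crash happens when a completion returns no token log-probs, so `lps` is empty and `len(lps)` is zero. A minimal guard, sketched here as a standalone helper rather than a patch to the repository's `_compute_avg_likelihood`: returning negative infinity simply ranks such a prompt last (skipping the prompt entirely is an alternative design choice).

```python
import math

def safe_avg_likelihood(lps):
    """Average the token log-probs for one prompt, guarding against
    the empty list that triggers the ZeroDivisionError. -inf ranks a
    prompt with no usable logprobs last."""
    if not lps:
        return -math.inf
    return sum(lps) / len(lps)
```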

Demo gradio error

----> 2 demo = get_demo()
      3 demo.launch(debug=True)

1 frames
/usr/local/lib/python3.10/dist-packages/gradio/component_meta.py in wrapper(*args, **kwargs)
    153             return None
    154         else:
--> 155             return fn(self, **kwargs)
    156 
    157     return wrapper

TypeError: Textbox.__init__() got an unexpected keyword argument 'disabled'
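Newer Gradio versions dropped the `disabled=` keyword on components; a read-only textbox is instead created with `interactive=False`. A small translation helper is sketched below; the helper itself is illustrative and not part of Gradio.

```python
def modernize_textbox_kwargs(kwargs):
    """Translate the legacy `disabled=` keyword, which newer Gradio
    Textbox versions reject, into `interactive=` (its inverse).
    Illustrative helper, not part of Gradio itself."""
    kwargs = dict(kwargs)
    if "disabled" in kwargs:
        kwargs["interactive"] = not kwargs.pop("disabled")
    return kwargs

# Usage sketch: gr.Textbox(**modernize_textbox_kwargs(
#     {"label": "Prompt", "disabled": True}))
```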

credit loss

Hello! We tried your demo and found it very helpful.
However, while using it we found that paid API keys burn through credits very quickly, which did not happen with free accounts. We would very much like to keep using this tool; how can we fix this?

text-davinci-002 is deprecated

I'm trying to generate prompts, but the model you are using is deprecated; you should replace it with gpt-3.5-turbo-instruct.
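One low-touch way to handle the deprecation is to map old model names to the suggested replacement wherever the config is built. The mapping below only encodes the swap this issue proposes; the helper itself is an illustrative sketch, not code from the repository.

```python
# Replacement suggested in this issue; the helper is illustrative.
DEPRECATED_MODELS = {
    "text-davinci-002": "gpt-3.5-turbo-instruct",
}

def resolve_model(name):
    """Swap a deprecated completion model for its replacement, if known."""
    return DEPRECATED_MODELS.get(name, name)
```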

Error: Unexpected end of JSON input

Hi, this looks very promising!

Unfortunately, I am facing an issue when trying to use my own dataset. The error does not occur when I load one of the preset tasks.

The console log:

Failed to load resource: the server responded with a status of 500 ()      index-8bb1e421.js:4
Uncaught (in promise) Error: Unexpected end of JSON input at index-8bb1e421.js:4:10593
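"Unexpected end of JSON input" on the frontend usually means a truncated or malformed JSON payload (consistent with the server's 500). It can help to validate a custom dataset locally before uploading it; in this sketch the expected top-level shape (a dict with an "examples" key) is an assumption about the demo's dataset format.

```python
import json

def validate_dataset_json(text):
    """Check that a custom dataset string is complete, parseable JSON
    before handing it to the demo. The expected top-level shape
    ({'examples': [...]}) is an assumption about the format."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as e:
        return False, f"invalid JSON: {e}"
    if not isinstance(data, dict) or "examples" not in data:
        return False, "missing top-level 'examples' key"
    return True, "ok"
```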

Chat model support

Hey there, I'd like to know if there will be a release with the gpt-3.5 model? If not, how easy would it be to integrate it into the code?

Question about the set of prompts pre evaluation

Hi Keiran,

thank you for your contribution to prompt research! My seminar partner and I are trying to understand your code and test the evaluation step separately with different LLMs as part of a university course. To do this, we would like to extract the generated prompts and evaluate/rank them separately with different LLMs.

Could you provide a JSON (or similar) file with the prompts per task type, in case you used more than the limit of 50 prompts per run when testing your code and saved a corresponding file? (Our resources do not allow us to do this for every task with davinci.)

Thanks in advance and have a nice day!

Louis
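Until such a file is published, prompts generated in a local run can be dumped per task with something like the sketch below, so they can be re-scored later with a different LLM. The function name and file layout are assumptions, not a format the repository defines.

```python
import json

def save_prompts(prompts, task_name, path=None):
    """Dump a list of generated prompt strings to a JSON file so they
    can be re-scored later with a different LLM. Illustrative layout."""
    path = path or f"{task_name}_prompts.json"
    with open(path, "w") as f:
        json.dump({"task": task_name, "prompts": prompts}, f, indent=2)
    return path
```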
