ysymyth / react


[ICLR 2023] ReAct: Synergizing Reasoning and Acting in Language Models

License: MIT License

Jupyter Notebook 99.37% Python 0.63%

decision-making large-language-models llm prompting reasoning

react's Introduction

ReAct Prompting

GPT-3 prompting code for ICLR 2023 paper ReAct: Synergizing Reasoning and Acting in Language Models.

To use ReAct for more tasks, consider trying LangChain's zero-shot ReAct Agent.

Setup

You need to first have an OpenAI API key and store it in the environment variable OPENAI_API_KEY (see here).

Package requirement: openai, and install alfworld following instructions here.
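Before launching a notebook, it can help to confirm that the key is actually visible to Python. The following is a minimal sketch, not part of the repository; the helper name `get_openai_api_key` is hypothetical, and only the variable name OPENAI_API_KEY comes from the setup text above.

```python
import os


def get_openai_api_key(environ=os.environ):
    """Return the key stored in OPENAI_API_KEY, or raise a clear error.

    `environ` defaults to the process environment; passing a plain dict
    makes the helper easy to test. (Hypothetical helper, not repo code.)
    """
    key = environ.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; export it before running the notebooks."
        )
    return key
```

Calling `get_openai_api_key()` at the top of a notebook fails early with a readable message instead of failing on the first API call.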

Experiments

Run {hotpotqa,fever,alfworld,webshop}.ipynb. As HotpotQA and FEVER have large validation sets, we only run 500 random examples (see notebooks). We find PaLM and GPT-3 are better at different tasks.

Model	HotpotQA (500 random dev, EM)	FEVER (500 random dev, EM)	AlfWorld (success rate)	WebShop (success rate)
PaLM-540B (paper)	29.4	62.2	70.9	40
GPT-3 (davinci-002)	30.4	54	78.4	35.8
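The 500-example subsampling mentioned above can be sketched as follows. This is an illustration only: the function name, the seed value, and the dataset size used in the usage note are assumptions, so check the notebooks for the exact values before comparing numbers.

```python
import random


def sample_dev_indices(n_total, n_samples=500, seed=233):
    """Pick a fixed random subset of validation indices.

    A seeded shuffle keeps the subset identical across runs, so results
    on the subset are comparable. The seed here is an assumed value.
    """
    idxs = list(range(n_total))
    random.Random(seed).shuffle(idxs)
    return idxs[:n_samples]
```

For example, `sample_dev_indices(7405)` would return 500 distinct indices into a validation set of 7405 items (an assumed size), and the same 500 on every call.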

Citation

@inproceedings{yao2023react,
  title = {{ReAct}: Synergizing Reasoning and Acting in Language Models},
  author = {Yao, Shunyu and Zhao, Jeffrey and Yu, Dian and Du, Nan and Shafran, Izhak and Narasimhan, Karthik and Cao, Yuan},
  booktitle = {International Conference on Learning Representations (ICLR) },
  year = {2023},
  html = {https://arxiv.org/abs/2210.03629},
}


react's Issues

Webshop experiment details for numbers in paper

Hi,

For webshop env, what was the number of retrieved items displayed per page?
As per the code, it seems item names indexed after 3 are purposefully omitted, which does not seem to be clarified in the actual paper.

Could you please explicitly clarify this setting just so that I am clear whether this was a small change for visualization in code or was it done for all results reported in the paper?

I was looking through the earlier issues in the repo and couldn't find this resolved in the closed issues.

Thanks!

Alfworld GPT-3 Results

Hi,
I wondered if you had more details or numbers from your GPT-3 results on Alfworld? For instance, do you have the splits of accuracy across the different subtasks (as in Table 3 in the paper)?

I would try to reproduce it, but I reckon the total cost would be > $100 and would like to avoid it if possible.

Old or New openai version

I used the code as it is for the hotpotqa.ipynb and found the following error:

APIRemovedInV1 Traceback (most recent call last)
Cell In[53], line 10
8 old_time = time.time()
9 for i in idxs[:500]:
---> 10 r, info = webthink(i, to_print=True)
11 rs.append(info['em'])
12 infos.append(info)

Cell In[47], line 26
24 for i in range(1, 8):
25 n_calls += 1
---> 26 thought_action = llm(prompt + f"Thought {i}:", stop=[f"\nObservation {i}:"])
27 try:
28 thought, action = thought_action.strip().split(f"\nAction {i}: ")

Cell In[52], line 10
9 def llm(prompt, stop=["\n"]):
---> 10 response = openai.Completion.create(
11 model="text-davinci-002",
12 prompt=prompt,
13 temperature=0,
14 max_tokens=100,
15 top_p=1,
16 frequency_penalty=0.0,
17 presence_penalty=0.0,
18 stop=stop
19 )
20 return response["choices"][0]["text"]

File c:\Users\fattoh.alqershi\ReAct\ReAct\myvenv\lib\site-packages\openai\lib_old_api.py:39, in APIRemovedInV1Proxy.call(self, *_args, **_kwargs)
38 def call(self, *_args: Any, **_kwargs: Any) -> Any:
---> 39 raise APIRemovedInV1(symbol=self._symbol)

APIRemovedInV1:

You tried to access openai.Completion, but this is no longer supported in openai>=1.0.0 - see the README at https://github.com/openai/openai-python for the API.

You can run openai migrate to automatically upgrade your codebase to use the 1.0.0 interface.

Alternatively, you can pin your installation to the old version, e.g. pip install openai==0.28

The error seems to be caused by the openai version. When I went back to openai==0.28, it raised a different error related to the client parameters: it expected (messages, and other .....), but there is no messages argument in the code.
Please help.

Thanks.
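For readers hitting the same APIRemovedInV1 error, here is a hedged sketch of how the `llm` helper from the traceback could be written against the openai>=1.0 client. It is not the repository's code. The model name is an assumption (text-davinci-002 has been retired, so a different completion-style model must be substituted), and the optional `client` parameter is added here only so the function can be exercised without network access.

```python
def llm(prompt, stop=("\n",), client=None, model="gpt-3.5-turbo-instruct"):
    """Completion-style call using the openai>=1.0 interface.

    `model` is an assumed replacement for the retired text-davinci-002.
    `client` is a hypothetical injection point; when omitted, a real
    OpenAI client is created and reads OPENAI_API_KEY from the environment.
    """
    if client is None:
        from openai import OpenAI  # requires openai>=1.0

        client = OpenAI()
    response = client.completions.create(
        model=model,
        prompt=prompt,
        temperature=0,
        max_tokens=100,
        top_p=1,
        frequency_penalty=0.0,
        presence_penalty=0.0,
        stop=list(stop),
    )
    # In openai>=1.0 the response is an object, not a dict.
    return response.choices[0].text
```

The sampling parameters mirror the ones shown in the traceback. A "messages" error usually means a chat endpoint or chat-only model is being called, which needs a list of messages rather than a single prompt string.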

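Some problems with WEBSHOP_URL = "http://3.83.245.205:3000"

A question about the WebShop experiment:
The web page address is: WEBSHOP_URL = "http://3.83.245.205:3000"
How should I replace this page with my own? Is there code for building the site?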

Potential Implementation error on Webshop

Hi, I'm trying to reproduce your ReAct results on Webshop using some LLM APIs. However, I sometimes encountered the following errors.

Basically, sometimes, after you select some specific options and then click[Buy Now], it's going to show the error below:

Traceback (most recent call last):
  File "/home/user/anaconda3/envs/webshop/lib/python3.8/site-packages/flask/app.py", line 2095, in __call__
    return self.wsgi_app(environ, start_response)
  File "/home/user/anaconda3/envs/webshop/lib/python3.8/site-packages/flask/app.py", line 2080, in wsgi_app
    response = self.handle_exception(e)
  File "/home/user/anaconda3/envs/webshop/lib/python3.8/site-packages/flask/app.py", line 2077, in wsgi_app
    response = self.full_dispatch_request()
  File "/home/user/anaconda3/envs/webshop/lib/python3.8/site-packages/flask/app.py", line 1525, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/home/user/anaconda3/envs/webshop/lib/python3.8/site-packages/flask/app.py", line 1523, in full_dispatch_request
    rv = self.dispatch_request()
  File "/home/user/anaconda3/envs/webshop/lib/python3.8/site-packages/flask/app.py", line 1509, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**req.view_args)
  File "/home/user/webshop/web_agent_site/app.py", line 221, in done
    options = literal_eval(options)
  File "/home/user/anaconda3/envs/webshop/lib/python3.8/ast.py", line 59, in literal_eval
    node_or_string = parse(node_or_string, mode='eval')
  File "/home/user/anaconda3/envs/webshop/lib/python3.8/ast.py", line 47, in parse
    return compile(source, filename, mode, flags,
  File "<unknown>", line 1
    {'color': '2
               ^
SyntaxError: EOL while scanning string literal
 


To reproduce the error, you can try this:
In the ipython file of ReAct webshop, select the task id 83: i need a slim fit gray colored coat that has long sleeves. it should be in x-large size, and price lower than 40.00 dollars. Then do the following actions:

search[slim fit gray coat long sleeves x-large]
click[B09FF97YGV]
click[2#gray]
click[x-large]
click[Buy Now]

Then the error occurs. When doing these actions directly on the website, there is no such error. Therefore there may be something wrong when passing the argument to the environment.
(The errors I notice all come when an option that has '#' inside it is selected, maybe that's useful. )
Could you please help check that? Thank you so much!
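One plausible explanation, consistent with the traceback ending at `{'color': '2`, is that the options dictionary is placed in the URL unescaped, so everything after `#` is treated as a URL fragment and never reaches the `done` route. The sketch below reproduces that truncation with the standard library and shows percent-encoding as a possible workaround. The base URL and helper names are hypothetical, and this has not been verified against the WebShop server.

```python
from ast import literal_eval
from urllib.parse import quote, unquote, urlsplit

# Hypothetical base URL for the /done/<session_id>/<asin>/<options> route.
BASE = "http://localhost:3000/done/fixed_83/B09FF97YGV/"


def last_path_segment(url):
    """Return the final path segment; anything after '#' is a fragment."""
    return urlsplit(url).path.rsplit("/", 1)[-1]


options = {"color": "2#gray", "size": "x-large"}

# Unescaped: the '#' starts a URL fragment, so the path is cut short.
truncated = last_path_segment(BASE + str(options))

# Percent-encoded: the whole dictionary survives as one path segment.
encoded = last_path_segment(BASE + quote(str(options), safe=""))
recovered = literal_eval(unquote(encoded))
```

Here `truncated` is the same incomplete string that `literal_eval` fails on in the traceback, while `recovered` equals the original dictionary.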

Where is the fine-tuning dataset?

FEVER and WebShop code

Hello @ysymyth, thanks for sharing your code, excellent work! Is there any plan to release the code of FEVER and WebShop? Thank you!

Get low accuracy with GPT-3.5.

Hi, I'm trying to run ReAct with GPT-3.5-Turbo on the HotpotQA dataset using the provided Jupyter notebook, but I only get 0.182 accuracy. Is that a reasonable result? It is much lower than the result shown in the paper.

How can I install ReAct?

Don't give me links to Alfworld! The installations there don't work, the support is nonexistent.
How can I install ReAct on my Ubuntu 22.04?

How do you ask the LLM to generate the ReAct format?

Did you use a prompt like https://github.com/hwchase17/langchain/blob/bc2ed93b77cf9c40920ca5bf96968c90bb3e322e/langchain/agents/react/textworld_prompt.py#L4-L45 to ask GPT-3 to generate results in the ReAct format?

Or did you create many examples and fine-tune the model so that it generates this format? If so, does it only work with your fine-tuned model and not with GPT-3 or GPT-4?

I'd like to know whether the method in LangChain is actually correct and works.
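For context, the traceback in the openai version issue on this page shows the notebook asking for a "Thought i" and splitting the completion on "Action i", with "Observation i" as the stop string, which points to few-shot prompting rather than a fine-tuned model. Below is a minimal sketch of such a loop. The names `react_loop`, `step`, and `few_shot_prompt` are hypothetical, the fallback branch is an assumption, and the `llm` and `step` callables must be supplied by the caller.

```python
def react_loop(question, llm, step, few_shot_prompt="", max_steps=7):
    """Interleave Thought / Action / Observation turns via prompting.

    llm(prompt, stop) returns the model's text completion.
    step(action) returns (observation, done) from the environment.
    Both are caller-supplied stand-ins (assumptions of this sketch).
    """
    prompt = few_shot_prompt + f"Question: {question}\n"
    for i in range(1, max_steps + 1):
        # Ask for a thought and an action; stop before the observation,
        # which must come from the environment, not from the model.
        reply = llm(prompt + f"Thought {i}:", stop=[f"\nObservation {i}:"])
        try:
            thought, action = reply.strip().split(f"\nAction {i}: ")
        except ValueError:
            # Assumed fallback: keep the first line as the thought and
            # ask again for the action alone.
            thought = reply.strip().split("\n")[0]
            action = llm(
                prompt + f"Thought {i}: {thought}\nAction {i}:", stop=["\n"]
            ).strip()
        observation, done = step(action)
        prompt += (
            f"Thought {i}: {thought}\nAction {i}: {action}\n"
            f"Observation {i}: {observation}\n"
        )
        if done:
            return action, prompt
    return None, prompt
```

The few-shot examples in `few_shot_prompt` are what teach the format; the loop itself only enforces the alternation and feeds real observations back in.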

question on alfworld and textworld version.

When I run alfworld.ipynb, it returns:
Initializing AlfredTWEnv...
Checking for solvable games...
Overall we have 134 games
Evaluating with 134 games
Traceback (most recent call last):
File "/home/ict/ReAct/react.py", line 55, in
env = env.init_env(batch_size=1)
File "/home/ict/miniconda3/envs/react1/lib/python3.9/site-packages/alfworld/agents/environment/alfred_tw_env.py", line 224, in init_env
infos = textworld.EnvInfos(won=True, admissible_commands=True, expert_type=expert_type, expert_plan=expert_plan, extras=["gamefile"])
File "/home/ict/miniconda3/envs/react1/lib/python3.9/site-packages/textworld/core.py", line 109, in init
raise ValueError(msg)
ValueError: Unknown information requested: ['expert_plan', 'expert_type']. Available information are: ['admissible_commands', 'command_templates', 'description', 'entities', 'extras', 'facts', 'fail_facts', 'feedback', 'game', 'intermediate_reward', 'inventory', 'last_action', 'last_command', 'location', 'lost', 'max_score', 'moves', 'objective', 'policy_commands', 'score', 'verbs', 'win_facts', 'won']
It seems that the installed textworld no longer works with alfworld.
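An error like this usually comes from a mismatch between the installed alfworld and textworld versions, so reporting both versions makes the problem easier to diagnose. A small sketch using only the standard library follows; the helper name is hypothetical, and which version pair is compatible is not stated on this page.

```python
from importlib import metadata


def installed_versions(packages=("alfworld", "textworld")):
    """Map each package name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


print(installed_versions())
```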

Davinci-002

Is davinci-002 referring to text-davinci-002 or davinci-002 (not-finetuned model)?

Could you provide text-davinci-002 log on HotpotQA 500 (30.8EM)?

Hi Shunyu,

Could you provide text-davinci-002 trajectory on HotpotQA 500 (30.8EM in Table 5 of A.1 GPT-3 Experiments)?

Thank you!

How did you go about finetuning?

Hi there, I cannot seem to find any information on the fine-tuning process in your paper and this repository.

A snippet from your paper:

However, when finetuned with just 3,000 examples, ReAct becomes the best method among the four, with PaLM-8B finetuned ReAct outperforming all PaLM-62B prompting methods, and PaLM-62B finetuned ReAct outperforming all 540B prompting methods. In contrast, finetuning Standard or CoT is significantly worse than finetuning ReAct or Act for both PaLM-8/62B, as the former essentially teaches models to memorize (potentially hallucinated) knowledge facts, and the latter teaches models how to (reason and) act to access information from Wikipedia, a more generalizable skill for knowledge reasoning.

Paper, table2

I am impressed with your research. Thank you for your good research.

But I have a question.

According to Table 2 of the paper, the cases are divided into success modes and failure modes.

What is the definition of a success mode and a failure mode?
If a success mode is a successful case, it should not include false positives, because a false positive predicts something wrong as right.
Yet hallucinated reasoning traces or facts appear in both the success modes and the failure modes. Why is that?

Thanks!

Question about webshopEnv

Hi! I'm replicating the ReAct results on WebShop, and I have several questions about webshopEnv in the Jupyter notebook.

It seems you set the environment to output only the top 3 products (instead of the full 10):

if prod_cnt >= 3:
    processed_t = ''

Is this also what you used in the paper?

There's also assert False when the button Next or Prev is clicked. Is this also intentional?

Also, I have got results of ReAct on WebShop with session id fixed_{1-500}, which I believe is the same setup as the paper, using this environment (did not modify it) but with different llm (not PaLM-540B):

gpt-turbo-3.5
Act - Score: 64.99 Success Rate: 34.0
ReAct - Score: 59.9 Success Rate: 30.0

code-davinci-002
Act - Score: 64.99 Success Rate: 34.0
ReAct - Score: 65.60 Success Rate: 38.8

Is this to be expected? I wonder if you have any thoughts on it. After some research, I found people saying that chain-of-thought might not be as effective for models trained with RLHF, like ChatGPT. But I have no explanation for why I don't see a performance boost from Act to ReAct with Codex (code-davinci-002).

Thank you in advance! Love the simplicity of your work and I'm trying to come up with new ideas based off of this paper :)

Have you considered renaming this project?

Hello, thank you for this important work and project!
I'm already seeing many references to the paradigm. The problem is that there was already a massively popular project named React. This makes searches for ReAct somewhat difficult.

The power of the reason & action pattern has been demonstrated in AutoGPT

I was wondering whether AutoGPT was inspired by your ideas. Anyway, thanks for your great efforts.

[Reproducing Results] on Alfworld

Dear Authors,

Thank you for the great work on introducing ReAct.

Since text-davinci-002, the model you originally used, is deprecated on OpenAI, the two closest alternatives are gpt-3.5-turbo and davinci-002. The best performance we get on, e.g., the first 10 environments is 0.3, while the reported result on the first 10 envs of Alfworld is 0.7.

Could you share the traces, your latest scores on this environment, or advice on how to reproduce your score of 0.7? @ysymyth @john-b-yang @descrip

Thanks.

cot->react & react->cot

Hello, I would like to ask whether there is a code implementation of the cot->react and react->cot methods mentioned in the paper.

Could you please tell me how to access the URL in webshop.ipynb: http://3.83.245.205:3000 ?

Thank you for your code. However, I cannot access the WebShop URL in your Jupyter notebook. Do I have to launch another service?

I got zero score running webshop.ipynb

I tried to run webshop.ipynb, and here are some of the outputs:

Observation: Invalid action!

Action: click[Add to Cart]
Observation: 

Action: click[Add to Cart]
Observation: 

Action: click[Checkout]
Observation: 

Action: click[Checkout]
Observation: 

Action: click[Proceed to checkout]
Observation: 

Action: click[Proceed to checkout]
Observation: 

Action: fill out form[name: John Doe, email: [email protected], phone: 555-555-1212, address: 123 Main St, city: Anytown, state: CA, zip: 99999]
Observation: Invalid action!

Action: fill out form[name: John Doe, email: [email protected], phone: 555-555-1212, address: 123 Main St, city: Anytown, state: CA, zip: 99999]
Observation: Invalid action!

Action: fill out form[name: John Doe, email: [email protected], phone: 555-555-1212, address: 123 Main St, city: Anytown, state: CA, zip: 99999]
Observation: Invalid action!

Action: fill out form[name: John Doe, email: [email protected], phone: 555-555-1212, address: 123 Main St, city: Anytown, state: CA, zip: 99999]
Observation: Invalid action!

Action: fill out form[name: John Doe, email: [email protected], phone: 555-555-1212, address: 123 Main St, city: Anytown, state: CA, zip: 99999]
Observation: Invalid action!

Action: fill out form[name: John Doe, email: [email protected], phone: 555-555-1212, address: 123 Main St, city: Anytown, state: CA, zip: 99999]
Observation: Invalid action!

Action: fill out form[name: John Doe, email: [email protected], phone: 555-555-1212, address: 123 Main St, city: Anytown, state: CA, zip: 99999]
Observation: Invalid action!

1 0.0 0.0 0.0
-------------
-----------------
1
Action: reset
Observation: 

Action: click[Buy Now]
Observation: Invalid action!

Action: click[Add to Cart]
Observation: 

Action: click[Add to Cart]
Observation: 

Action: click[Checkout]
Observation: 

Action: click[Checkout]
Observation: 

Action: click[Proceed to checkout]
Observation: 

Action: click[Proceed to checkout]
Observation: 

Action: fill out form[name: John Doe, email: [email protected], phone: 555-555-1212, address: 123 Main St, city: Anytown, state: CA, zip: 99999]
Observation: Invalid action!

Action: fill out form[name: John Doe, email: [email protected], phone: 555-555-1212, address: 123 Main St, city: Anytown, state: CA, zip: 99999]
Observation: Invalid action!

Action: fill out form[name: John Doe, email: [email protected], phone: 555-555-1212, address: 123 Main St, city: Anytown, state: CA, zip: 99999]
Observation: Invalid action!

Action: fill out form[name: John Doe, email: [email protected], phone: 555-555-1212, address: 123 Main St, city: Anytown, state: CA, zip: 99999]
Observation: Invalid action!

Action: fill out form[name: John Doe, email: [email protected], phone: 555-555-1212, address: 123 Main St, city: Anytown, state: CA, zip: 99999]
Observation: Invalid action!

Action: fill out form[name: John Doe, email: [email protected], phone: 555-555-1212, address: 123 Main St, city: Anytown, state: CA, zip: 99999]
Observation: Invalid action!

Action: fill out form[name: John Doe, email: [email protected], phone: 555-555-1212, address: 123 Main St, city: Anytown, state: CA, zip: 99999]
Observation: Invalid action!

2 0.0 0.0 0.0
-------------
-----------------

How can I get the right score? Thank you
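In the log above, every observation is either empty or "Invalid action!", and the model keeps emitting actions such as `fill out form[...]` that do not match the `search[...]` and `click[...]` forms used elsewhere on this page. A small pre-check like the one below can make such failures visible before an action is sent to the environment. The pattern and function name are assumptions for illustration, and the real environment may accept other action types.

```python
import re

# Only the two action forms that appear on this page: search[...] and click[...].
ACTION_PATTERN = re.compile(r"^(search|click)\[(.+)\]$")


def parse_action(action):
    """Return (kind, argument) for a well-formed action, else None."""
    match = ACTION_PATTERN.match(action.strip())
    if match is None:
        return None
    return match.group(1), match.group(2)
```

For instance, `parse_action("click[Buy Now]")` yields a `("click", "Buy Now")` pair, while the `fill out form[...]` lines from the log are rejected. Empty observations on every step may also indicate that the server at WEBSHOP_URL is not reachable, which is worth checking first.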

How to finetune the small REACT model

Hi, I was wondering how we could fine-tune the small ReAct model given the prompts generated using the prompted LLM.

Should we use LoRA or P-Tuning for the fine-tuning step?
How should the prompt data be used?
(1) Let all the actions and thoughts be the input and the final action (answer) be the output
(2) Parse the whole ReAct process and use the previous in-context info as input and the current action as output
(3) Or some other way that you used?

Really appreciate your help.
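To make option (2) concrete, the sketch below turns one trajectory into (input, target) pairs, where each input is the context so far and each target is the next thought and action. This only illustrates the option as phrased in the question; it is not the authors' fine-tuning recipe, which this page does not describe, and the function name and data layout are hypothetical.

```python
def to_training_pairs(question, steps):
    """Build per-step (context, target) pairs from one trajectory.

    `steps` is a list of (thought, action, observation) tuples.
    Each target is the next thought plus action; each context is
    everything that preceded it, including earlier observations.
    """
    pairs = []
    context = f"Question: {question}\n"
    for i, (thought, action, observation) in enumerate(steps, start=1):
        target = f"Thought {i}: {thought}\nAction {i}: {action}"
        pairs.append((context, target))
        context += f"{target}\nObservation {i}: {observation}\n"
    return pairs
```

A trajectory with n steps produces n pairs, so the model is trained on every intermediate decision rather than only the final answer, which is the difference from option (1).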

In ReAct/wrappers.py, line 162 in 6bdb3a1 (the bare except: in the code below):

  def reset(self, seed=None, return_info=False, options=None, idx=None):
    self.env.reset(seed=seed, return_info=return_info, options=options)
    try:
      self.env.step('')
    except:
      pass
    self.env.reset(seed=seed, return_info=return_info, options=options)
    self.data_idx = int(np.random.randint(len(self.data))) if idx is None else idx
    observation = f"Claim: {self.data[self.data_idx][0]}"
    info = self._get_info()
    return (observation, info) if return_info else observation

I cannot figure out why we need this try-except code; it seems to do nothing. The second self.env.reset will reset the env, so there is no need for the first reset.

Also in the reset code, the return_info argument always seems to be False, so I think it can be dropped. Besides, the options and seed arguments are never used in WikiEnv.reset or FeverWrapper.reset, nor in the third reset listed below:

ReAct/wikienv.py, line 44 in 6bdb3a1:

def reset(self, seed=None, return_info=False, options=None):

ReAct/wrappers.py, line 158 in 6bdb3a1:

def reset(self, seed=None, return_info=False, options=None, idx=None):

ReAct/wrappers.py, line 214 in 6bdb3a1:

def reset(self, seed=None, return_info=False, options=None, idx=None):

In WikiEnv.step, the reward is never changed after it is initialized and is never used, so why do we need this variable? Besides, in FeverWrapper.step, the reward is obtained from self.get_reward, not from WikiEnv.step. In ReAct/wrappers.py, line 188 in 6bdb3a1:

obs, _, done, info = self.env.step(action)

you use _ to receive the reward from WikiEnv.step, which also shows that the reward in WikiEnv.step is not useful.

Thanks for your patience ~

Jupyter output on HotpotQA

@ysymyth Thanks for your good work!

Can you attach the output of HotpotQA (hotpotqa.ipynb), like those in (FEVER.ipynb)? Thank you!