
visualwebarena's People

Contributors

danielkornev, kohjingyu, ljang0, robert1003, shuyanzhou, tonystark262


visualwebarena's Issues

about CogVLM

hi,

When there are many images on a webpage, how do you handle the input length limit of CogVLM?

Thanks!

Trouble with Checkboxes

Hi, I tried your demo agent on my web application and for most interactions it does well. However it seems to have trouble with identifying/clicking on checkboxes. Is there some way to perhaps improve or resolve that?

(screenshot attached)

Blank screenshot when running GPT-4V + SOM

The screenshot seems problematic when I run the GPT-4V + SoM agent with the following flags:

python run.py \
  --instruction_path agent/prompts/jsons/p_som_cot_id_actree_3s.json \
  --test_start_idx 0 \
  --test_end_idx 1 \
  --result_dir <your_result_dir> \
  --test_config_base_dir=config_files/test_shopping \
  --model gpt-4-vision-preview \
  --action_set_tag som  --observation_type image_som

Here is part of the render_0.html:
(screenshot of render_0.html attached)
The GPT response also shows that the image sent was empty.

PyTest fail on test_click_open_new_tab

Hi There,

I'm running into test failures when I run the pytest test suite.

Here is my error:


tests/test_browser_env/test_script_browser_env.py s.s.......F

============================================================ FAILURES ============================================================
____________________________________________________ test_click_open_new_tab _____________________________________________________

accessibility_tree_current_viewport_script_browser_env = <browser_env.envs.ScriptBrowserEnv object at 0x7f1e5af406d0>

    def test_click_open_new_tab(
        accessibility_tree_current_viewport_script_browser_env: ScriptBrowserEnv,
    ) -> None:
        env = accessibility_tree_current_viewport_script_browser_env
        env.reset()
        env.step(
            create_playwright_action(
                "page.goto('https://www.w3schools.com/jsref/tryit.asp?filename=tryjsref_win_open')"
            )
        )
        obs, *_, info = env.step(
            create_playwright_action(
                'page.frame_locator("iframe[name=\\"iframeResult\\"]").get_by_role("button", name="Try it").click()'
            )
        )
        print("TP")
        print(info["page"].url)
>       assert info["page"].url == "https://www.w3schools.com/"
E       AssertionError: assert 'https://www....sref_win_open' == 'https://www.w3schools.com/'
E         - https://www.w3schools.com/
E         + https://www.w3schools.com/jsref/tryit.asp?filename=tryjsref_win_open

tests/test_browser_env/test_script_browser_env.py:293: AssertionError
------------------------------------------------------ Captured stdout call ------------------------------------------------------
TP
https://www.w3schools.com/jsref/tryit.asp?filename=tryjsref_win_open

I see that there has been some recent activity, and that this is actually a new test from #23.

How can I resolve this?

Thanks

Issue with URL links in classifieds tasks

Following the instructions you provided, the URL for classifieds is set to "http://localhost:9980/". The home page loads correctly, but the assets (images, stylesheets, and links) are requested over "https://" instead of "http://", which results in ERR_CONNECTION_CLOSED. As a result, we get only a blank page because the resources are not retrieved, causing the agent to fail the tasks.
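A quick way to confirm a mixed-scheme problem like this is to scan the served HTML for asset URLs whose scheme differs from the deployment's base scheme. The sketch below is a hypothetical diagnostic (naive regex, not part of the repo) that flags such URLs:

```python
import re


def mixed_scheme_assets(html: str, base_scheme: str = "http") -> list[str]:
    """Return asset/link URLs whose scheme differs from base_scheme.

    Naive check: scans src/href attributes with a regex, which is enough
    for a quick diagnosis but not a full HTML parser.
    """
    urls = re.findall(r'(?:src|href)="(https?://[^"]+)"', html)
    return [u for u in urls if not u.startswith(base_scheme + "://")]
```

If this returns a long list of `https://` URLs on a plain-HTTP deployment, the fix is usually to correct the site's configured base URL (e.g. the CLASSIFIEDS value in the compose file) rather than the client.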

AttributeError("'Page' object has no attribute 'client'")

While running the evaluation for Classifieds (also for Reddit), I get the error 'Page' object has no attribute 'client'. The stack trace is shown below (it also happens for config_files/test_classifieds/211.json). Did you face this issue? Any suggestion to fix this is highly appreciated.

[Config file]: config_files/test_classifieds/117.json
[Unhandled Error] AttributeError("'Page' object has no attribute 'client'")
Traceback (most recent call last):
  File "/home/pahuja.9/visualwebarena/run.py", line 396, in test
    obs, _, terminated, _, info = env.step(action)
  File "/home/pahuja.9/visualwebarena/browser_env/envs.py", line 307, in step
    observation = self._get_obs()
  File "<@beartype(browser_env.envs.ScriptBrowserEnv._get_obs) at 0x7f1a3d76e200>", line 10, in _get_obs
  File "/home/pahuja.9/visualwebarena/browser_env/envs.py", line 226, in _get_obs
    self.page, self.get_page_client(self.page)
  File "<@beartype(browser_env.envs.ScriptBrowserEnv.get_page_client) at 0x7f1a3d76e050>", line 33, in get_page_client
  File "/home/pahuja.9/visualwebarena/browser_env/envs.py", line 221, in get_page_client
    return page.client  # type: ignore
AttributeError: 'Page' object has no attribute 'client'
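One likely cause is that the environment attaches the CDP session to the Page object as a `client` attribute, and that attribute is lost when the page changes (e.g. after navigation or a new tab). A defensive sketch (a hypothetical helper, not the repo's code) that falls back to the public `BrowserContext.new_cdp_session` API when the attribute is missing:

```python
def get_page_client(page):
    """Return the page's CDP session, creating one if `page.client` is absent.

    `page.client` is an attribute the harness attaches itself, not a
    Playwright API; the supported way to obtain a CDP session is
    BrowserContext.new_cdp_session(page).
    """
    client = getattr(page, "client", None)
    if client is None:
        client = page.context.new_cdp_session(page)
        page.client = client  # cache for subsequent calls
    return client
```

This is a sketch under the assumption that the AttributeError comes from a page created outside the harness's setup path; the maintainers may prefer a different fix.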

Assertion error in LLM-based fuzzy match

For config_files/test_reddit/69.json, I get the following error in the LLM-based fuzzy match metric.

[Unhandled Error] AssertionError('n/a')
Traceback (most recent call last):
  File "/home/pahuja.9/visualwebarena/run.py", line 412, in test
    score = evaluator(
  File "/home/pahuja.9/visualwebarena/evaluation_harness/evaluators.py", line 626, in __call__
    cur_score = evaluator(trajectory, config_file, page, client)
  File "<@beartype(evaluation_harness.evaluators.HTMLContentExactEvaluator.__call__) at 0x7f992c464790>", line 115, in __call__
  File "/home/pahuja.9/visualwebarena/evaluation_harness/evaluators.py", line 472, in __call__
    StringEvaluator.fuzzy_match(
  File "<@beartype(evaluation_harness.evaluators.StringEvaluator.fuzzy_match) at 0x7f992c453e20>", line 69, in fuzzy_match
  File "/home/pahuja.9/visualwebarena/evaluation_harness/evaluators.py", line 197, in fuzzy_match
    return llm_fuzzy_match(pred, ref, intent)
  File "<@beartype(evaluation_harness.helper_functions.llm_fuzzy_match) at 0x7f992c452cb0>", line 69, in llm_fuzzy_match
  File "/home/pahuja.9/visualwebarena/evaluation_harness/helper_functions.py", line 609, in llm_fuzzy_match
    assert "correct" in response, response
AssertionError: n/a

I am using the same LLM for fuzzy match as in the original code.
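The assertion fires because the judge model replied "n/a" rather than a verdict containing "correct". One defensive option (a sketch, not the repo's code) is to map a missing verdict to "incorrect" instead of asserting. Note that a plain `"correct" in response` check also matches "incorrect", so the negative case must be tested first:

```python
def parse_fuzzy_match(response: str) -> bool:
    """Lenient verdict parser for an LLM judge reply.

    Checks 'incorrect' before 'correct' (the former contains the latter),
    and treats any reply without an explicit verdict (e.g. 'n/a') as a
    failed match instead of raising AssertionError.
    """
    resp = response.lower()
    if "incorrect" in resp:
        return False
    if "correct" in resp:
        return True
    return False  # no verdict, e.g. 'n/a'
```

Whether a non-verdict reply should count as a retry rather than a failure is a judgment call; this sketch just avoids the crash.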

Pytest code is not working

Hi team,

Thanks for releasing this interesting work.

I have a question about the unit test file (test_action_functionalities.py).

Ideally, it should parse some text like `textbox 'Full name'`.
(screenshot attached)

But actually, this is what we have from create_playwright_action.
(screenshot attached)

This is the HTML content corresponding to this part.
(screenshot attached)

I followed the README for the environment setup. I am using Ubuntu 22.04, Playwright 1.37.0, and Python 3.10.

Do you have any suggestions on this issue? Is it a Playwright version problem or a browser version problem?

Thanks a lot for your help

Reproducing open-source model results

Hello,

I'm looking to reproduce some of the open-source model results from the VWA paper:
(1) Mixtral-8x7B model as the LLM backbone for Caption-augmented model
(2) CogVLM for the Multimodal Model.

Could someone share the flags/commands or instructions to set up these configurations for eval?

Archive.org Download Links Not Working

Hello,

I hope this message finds you well. I encountered an issue with the download links for the WebArena environment images hosted on Archive.org. The links appear to be broken and display an error message regarding metadata issues.

Affected Links:

Shopping Website Image
Wikipedia Website Image
Could you please look into this and provide updated links or fix the current ones?

Thank you very much for your assistance!

(screenshot attached)

Errors in annotation

I found some errors in the annotations.
In the classifieds_10:
sites: ['classifieds']
task_id: 10
require_login: True
storage_state: ./.auth/classifieds_state.json
start_url: http://localhost:9980
geolocation: None
intent_template: What is the {{attribute}} of {{item}}?
intent: What is the seat height in inches of the smaller piece of furniture on this page?
image: None
instantiation_dict: {'attribute': 'seat height in inches', 'item': 'the smaller piece of furniture on this page'}
require_reset: False
eval: {'eval_types': ['string_match'], 'reference_answers': {'exact_match': '21'}, 'reference_url': 'http://localhost:9980/index.php?page=item&id=43887', 'program_html': [], 'string_note': '', 'reference_answer_raw_annotation': ''}
reasoning_difficulty: easy
visual_difficulty: easy
overall_difficulty: easy
comments:
intent_template_id: 5

The output is 21 inches, which I think is correct.

In classifieds 142, the agent found the wrong things in the GPT-4V trace, but it was evaluated as correct.

Elasticsearch / Opensearch issues on Shopping Website

Hello!

I'm having issues with the shopping website where items won't display in search or catalog. From admin panel, I believe this is due to the fact that one of the indexers is invalid, which in turn is due to opensearch / elasticsearch not working.

Testing elasticsearch in the admin panel throws the error "Class "" does not exist" even when localhost is running elasticsearch on 9200, or 'No Alive Nodes Found' when either elasticsearch or opensearch is used. I was thus hoping you could provide more information about how the search feature is configured within visualwebarena, as well as how one might link the two together.

Osclass Error

Hi, I ran the following commands in the environment readme to install classifieds environment, but encountered an OSClass Error:

unzip classifieds_docker_compose.zip
cd classifieds_docker_compose
vi classifieds_docker_compose/docker-compose.yml  # Set CLASSIFIEDS to your site url `http://<your-server-hostname>:9980/`, and change the reset token if required
docker compose up --build -d
# Wait for compose up to finish. This may take a while on the first launch as it downloads several large images from dockerhub.
docker exec classifieds_db mysql -u root -ppassword osclass -e 'source docker-entrypoint-initdb.d/osclass_craigslist.sql'  # Populate DB with content

Screenshot:

(screenshot attached)

However, when I ran docker exec classifieds_db mysql -u root -ppassword osclass -e "SHOW TABLES;" to query the database tables, it seems good.

(screenshot attached)

Could you help me resolve this? Thanks!

Configurations Setup for all Model Types

Hello,

Could you please share some of the configuration settings to reproduce the various model types?

I tried to reproduce the caption-augmented setup (Acc Tre + Caps) but my value was closer to the Multimodal result that had the Image Screenshot also as an input. Hoping I could get more clarification on how to switch between the 4 modes.

Here is my configurations

(1) Text-Only
observation_type: accessibility_tree
action_set_tag: id_accessibility_tree

(2) Caption-Augmented
observation_type: accessibility_tree_with_captioner
action_set_tag: id_accessibility_tree

(3) Multimodal
observation_type: ???
action_set_tag: id_accessibility_tree

(4) Multimodal (SoM)
observation_type: image_som
action_set_tag: som

Website Files Missing

Hello,

I was wondering if all of the website pages were included in the Google Drive downloads for the VisualWebArena environment setup? For One Stop Shop, it only displays a total of 24 items (and no items under the category tab), at least for me.

MySQL error with Classifieds website

Hi, I am trying to setup the classifieds website as outlined here https://github.com/web-arena-x/visualwebarena/blob/main/environment_docker/README.md#classifieds-website

When I execute docker exec classifieds_db mysql -u root -ppassword osclass -e 'source docker-entrypoint-initdb.d/osclass_craigslist.sql' # Populate DB with content, I get the error ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (2).

Since Docker is isolated from my local environment, it should ideally run regardless. Kindly help me resolve this.

My docker_compose.yml is given below

version: '3.1'
  
services:
  web:
    image: jykoh/classifieds:latest
    ports:
      - "9980:9980"
    depends_on:
      - db
    container_name: classifieds
    environment:
      - CLASSIFIEDS=http://127.0.0.1:9980/
      - RESET_TOKEN=4b61655535e7ed388f0d40a93600254c
  db:
    image: mysql:8.1
    restart: always
    environment:
      MYSQL_ROOT_PASSWORD: password
      MYSQL_DATABASE: osclass
    volumes:
      - ./mysql:/docker-entrypoint-initdb.d
      - db_data:/var/lib/mysql
    container_name: classifieds_db

volumes:
  db_data: {}
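A common cause of the "Can't connect to local MySQL server through socket" error is running the SQL import before the MySQL container finishes initializing on first launch. A minimal retry sketch (a hypothetical helper; the container name and root password below match the compose file above) that waits for the server before populating the DB:

```shell
# wait_for <attempts> <command...>: retry until the command succeeds
# or the attempts run out, sleeping 1s between tries.
wait_for() {
  local attempts=$1; shift
  local i
  for ((i = 1; i <= attempts; i++)); do
    "$@" && return 0
    sleep 1
  done
  return 1
}

# Example usage (assumes the classifieds_db container from the compose file):
# wait_for 30 docker exec classifieds_db mysqladmin ping -u root -ppassword --silent \
#   && docker exec classifieds_db mysql -u root -ppassword osclass \
#        -e 'source docker-entrypoint-initdb.d/osclass_craigslist.sql'
```

Alternatively, a `healthcheck` on the db service plus `depends_on: condition: service_healthy` in the compose file achieves the same thing declaratively.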

Some examples report an error when running run.py

Processing config_files/test_classifieds\5.json
2024-03-28 22:25:05,874 - INFO - [Config file]: config_files/test_classifieds\5.json
2024-03-28 22:25:05,875 - INFO - [Intent]: Navigate to my listing of the white car and delete it.
2024-03-28 22:25:06,181 - INFO - [Unhandled Error] InvalidSchema("No connection adapters were found for '127.0.0.1:9980/index.php?page=reset'")]
Processing config_files/test_classifieds\6.json
2024-03-28 22:25:06,182 - INFO - [Config file]: config_files/test_classifieds\6.json
2024-03-28 22:25:06,183 - INFO - [Intent]: Return the links of the 3 most recent motorcycles within $1000 to $2000 that are not orange.
Start testing config_files/test_classifieds\6.json
Finish testing config_files/test_classifieds\6.json
2024-03-28 22:27:15,567 - INFO - [Unhandled Error] LookupError("\n**********************************************************************\n Resource \x1b[93mpunkt\x1b[0m not found.\n Please use the NLTK Downloader to obtain the resource:\n\n \x1b[31m>>> import nltk\n >>> nltk.download('punkt')\n \x1b[0m\n For more information see: https://www.nltk.org/data.html\n\n Attempted to load \x1b[93mtokenizers/punkt/english.pickle\x1b[0m\n\n Searched in:\n - 'C:\\Users\\PS/nltk_data'\n - 'C:\\Users\\PS\\Desktop\\visualwebarena\\venv\\nltk_data'\n - 'C:\\Users\\PS\\Desktop\\visualwebarena\\venv\\share\\nltk_data'\n - 'C:\\Users\\PS\\Desktop\\visualwebarena\\venv\\lib\\nltk_data'\n - 'C:\\Users\\PS\\AppData\\Roaming\\nltk_data'\n - 'C:\\nltk_data'\n - 'D:\\nltk_data'\n - 'E:\\nltk_data'\n - ''\n**********************************************************************\n")]
Processing config_files/test_classifieds\7.json
2024-03-28 22:27:15,570 - INFO - [Config file]: config_files/test_classifieds\7.json
2024-03-28 22:27:15,570 - INFO - [Intent]: Return the links of the 2 most recent items in the "Cell phones" category within $300 to $600 that are white in color.
Start testing config_files/test_classifieds\7.json
Finish testing config_files/test_classifieds\7.json
2024-03-28 22:28:47,585 - INFO - [Unhandled Error] LookupError("\n**********************************************************************\n Resource \x1b[93mpunkt\x1b[0m not found.\n Please use the NLTK Downloader to obtain the resource:\n\n \x1b[31m>>> import nltk\n >>> nltk.download('punkt')\n \x1b[0m\n For more information see: https://www.nltk.org/data.html\n\n Attempted to load \x1b[93mtokenizers/punkt/english.pickle\x1b[0m\n\n Searched in:\n - 'C:\\Users\\PS/nltk_data'\n - 'C:\\Users\\PS\\Desktop\\visualwebarena\\venv\\nltk_data'\n - 'C:\\Users\\PS\\Desktop\\visualwebarena\\venv\\share\\nltk_data'\n - 'C:\\Users\\PS\\Desktop\\visualwebarena\\venv\\lib\\nltk_data'\n - 'C:\\Users\\PS\\AppData\\Roaming\\nltk_data'\n - 'C:\\nltk_data'\n - 'D:\\nltk_data'\n - 'E:\\nltk_data'\n - ''\n**********************************************************************\n")]

AssertionError: Cookie ./.auth/reddit_state.json expired.

Thank you for your work. I reset Reddit with
bash ./scripts/reset_reddit.sh
Then I ran prepare.sh, but I always get the following error:

Traceback (most recent call last):
  File "code/vwa/browser_env/auto_login.py", line 182, in <module>
    main()
  File "code/vwa/browser_env/auto_login.py", line 173, in main
    assert not future.result(), f"Cookie {cookie_files[i]} expired."
AssertionError: Cookie ./.auth/reddit_state.json expired.

How to open trace files without display server?

I am trying to view one of the trace files, 463.trace.zip. This is the command I am using:

unzip 463.trace.zip -d 463_trace
xvfb-run playwright show-trace 463_trace

It has been sitting for several hours. Is this expected, or is there a better way to extract the trace?
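Hanging is expected here: `playwright show-trace` opens a GUI viewer, so under `xvfb-run` it draws into a virtual display nobody ever sees. Two alternatives are to copy the zip to a machine with a display, or to open it in a browser at Playwright's hosted trace viewer (trace.playwright.dev). As a quick headless sanity check, you can also list what the archive actually contains; a small sketch (assuming the trace is a standard zip):

```python
import zipfile


def list_trace(path: str) -> list[str]:
    """Return the file names inside a Playwright trace archive."""
    with zipfile.ZipFile(path) as zf:
        return zf.namelist()
```

Playwright trace zips typically contain the action/event stream plus captured screenshots and network resources, so listing them confirms the trace recorded anything at all.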
Thanks!

Shopping Website Issues with setup:store-config:set

I've been able to host the shopping website successfully but noticed that running scripts/reset_shopping.sh causes the shopping website to clear all items not on the homepage, i.e. all the categories no longer have items. Specifically, it appears the command
docker exec $CONTAINER_NAME /var/www/magento2/bin/magento setup:store-config:set --base-url="http://localhost:7770" # no trailing slash
is causing this issue.

I was wondering what this command does in the context of the repo, and if it's safe to remove? If not, do you know what about this command could be causing the issue? I am currently hosting the shopping website on http://127.0.0.1:7770.

Multiple unreachable image URLs

On the Classifieds and Reddit tasks, there are multiple image links that do not exist. An example of such errors is as follows:
L616 WARNING: cannot identify image file <_io.BytesIO object at 0x7f8b5c2f1ee0>
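That warning is PIL failing to parse the downloaded bytes (a 404 page or empty response instead of an image). A defensive sketch (a hypothetical wrapper, not the repo's code) that degrades a broken link to None so the harness can skip it rather than crash:

```python
from io import BytesIO

from PIL import Image


def load_image_safe(data: bytes):
    """Return a PIL Image, or None when the bytes are not a valid image."""
    try:
        return Image.open(BytesIO(data))
    except Exception:  # PIL raises UnidentifiedImageError (an OSError subclass)
        return None
```

Whether skipping is acceptable depends on the task: if the missing image is the subject of the intent, the task is unanswerable and the config likely needs a fixed URL instead.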

Why do you have to remove the duplicated content?

Why do you have to remove the duplicated content? Does this filter out cases where two elements have the same text?

if content in text_content_text:
    # Remove text_content_elements with content
    text_content_elements = [
        element
        for element in text_content_elements
        if element.strip() != content
    ]
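To make the question concrete, here is a toy reproduction of the snippet's behavior (a standalone sketch, not the repo's function): when `content` already appears in the aggregated text, every element whose stripped text equals it is dropped, so repeated static text is emitted at most once.

```python
def drop_duplicate(text_content_elements, content, text_content_text):
    """Toy version of the filter: drop elements equal to `content`
    (after stripping) once `content` is already in the aggregated text."""
    if content in text_content_text:
        text_content_elements = [
            e for e in text_content_elements if e.strip() != content
        ]
    return text_content_elements
```

So yes, two distinct elements carrying the same text would both be removed, which seems to be the behavior the question is asking about.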
