[ ] find a button
[x] click a button
[ ] click the right button of many
[x] drag a slider
[ ] click a slider
[x] press keyboard button
[ ] draw a figure
[ ] track a target with the mouse
[ ] press a button with timing
[ ] move a cursor object with keyboard
[ ] move the viewport with the keyboard
[ ] move the viewport with the mouse
[ ] drag select multiple objects with mouse
[ ] avoid object with cursor/mouse
[ ] copy text
[ ] write into textfield
[ ] paste text
[ ] drag the viewport with mouse
[ ] let the mouse get captured in a field
[ ] exit the mouse capture in a field
[ ] navigate through elements with keyboard
it's implemented as a gym environment, so you can use the standard gym.make, step and reset calls with the usual observations and actions
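a minimal usage sketch; the environment id "UiTasks-v0" is hypothetical and the gym>=0.26 / gymnasium-style reset/step signatures are assumed:

```python
import gym

# "UiTasks-v0" is a hypothetical id; the real id depends on how the package registers its environments
env = gym.make("UiTasks-v0")

obs, info = env.reset()            # assumes the gym>=0.26 / gymnasium-style API
print(obs["task_description"])     # textual task description for the agent / vlm
print(obs["screen"].shape)         # rgb pixels, e.g. (height, width, 3)

for _ in range(100):
    action = env.action_space.sample()   # random placeholder policy
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        obs, info = env.reset()

env.close()
```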
the observation space is:
"screen": rgb pixels, with an adjustable resolution
"task_description": a textual description of the task, or subtask with optionally textual hints/prompts for a vlm
the action space is:
"keyboard": printable ascii keys
"mouse_buttons": left, right, scroll-click
"mouse_rel_move": relative mouse movement
"mouse_abs_move": absolute mouse movement (set mouse position in px); note: absolute mouse movement is ignored for now
"mouse_scroll": relative mouse scroll-wheel movement
"semantic space": is an image of size screen, that represents semantic classes of ui elements: "text", "button", "image", ...
maybe we could use some kind of "ui-element-graph" to compress the screen semantics into a much more compact form
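a speculative sketch of such a ui-element-graph; the node/edge structure shown here is only one possible design:

```python
from dataclasses import dataclass, field

@dataclass
class UiNode:
    """One ui element as a graph node."""
    node_id: int
    cls: str                      # semantic class, e.g. "button", "text"
    bbox: tuple                   # (x, y, width, height) in screen pixels
    text: str = ""                # visible text, if any
    children: list = field(default_factory=list)  # ids of contained elements

# example: a window containing one button with a label
graph = {
    0: UiNode(0, "window", (0, 0, 640, 480), children=[1]),
    1: UiNode(1, "button", (50, 200, 100, 30), text="OK"),
}
```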
we also need a format for a "screen-action-trace" that records the actions taken at every frame
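a sketch of one possible per-frame trace record, serialized as json lines; the field names are only a suggestion:

```python
import json

# one record per frame: what the agent saw (by reference) and what it did
trace_record = {
    "frame": 1234,                         # frame index within the episode
    "timestamp": 41.2,                     # seconds since episode start
    "screen_ref": "frames/001234.png",     # path or key of the stored screen image
    "action": {
        "keyboard": [],                    # printable ascii keys pressed this frame
        "mouse_buttons": ["left"],
        "mouse_rel_move": [5, 0],
        "mouse_abs_move": None,            # ignored for now
        "mouse_scroll": 0,
    },
}

# a "screen-action-trace" is then one such record per frame, e.g. appended as jsonl
with open("trace.jsonl", "a") as f:
    f.write(json.dumps(trace_record) + "\n")
```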
OpenAI VPT: https://arxiv.org/pdf/2206.11795.pdf
https://github.com/mlfoundations/open_clip
https://github.com/mlfoundations/open_flamingo/tree/main
this project was initially generated with cookiecutter from the https://github.com/flowpoint/pyproject template