noviscl / design2code

License: MIT License

Python 0.85% Jupyter Notebook 99.14% Shell 0.01%

design2code's Issues

better eval interface

Improve the API design of the eval code so that users can provide a single filename, a single directory, or multiple directories when running eval. We also need to benchmark eval speed for people's reference.

visual_eval_v3_multi() should take HTML files as input

visual_eval_v3_multi(input_list, debug=False) currently requires screenshots of the generated webpages as input, which shouldn't be necessary.
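One way the three input shapes could be normalized is sketched below. This is only an illustration: `collect_html_files` is a hypothetical helper, not a function in the repo, and it assumes eval inputs are `.html` files sitting directly inside each directory.

```python
from pathlib import Path
from typing import Iterable, List, Union

PathLike = Union[str, Path]


def collect_html_files(inputs: Union[PathLike, Iterable[PathLike]]) -> List[Path]:
    """Normalize a file, a directory, or several of either into a list of .html paths.

    Hypothetical helper: the name and the ".html only" rule are assumptions.
    """
    # A single path is wrapped so the loop below handles every case uniformly.
    if isinstance(inputs, (str, Path)):
        inputs = [inputs]

    found: List[Path] = []
    for item in inputs:
        path = Path(item)
        if path.is_dir():
            # Sort so results are deterministic across platforms.
            found.extend(sorted(path.glob("*.html")))
        elif path.is_file() and path.suffix == ".html":
            found.append(path)
        else:
            raise FileNotFoundError(f"Not an HTML file or directory: {path}")
    return found
```

An eval entry point could call this once and then loop over the returned paths, which keeps the single-file and multi-directory cases on the same code path.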

Error when multithreading

Fatal Python error: init_sys_streams: can't initialize sys standard streams
Python runtime state: core initialized
OSError: [Errno 9] Bad file descriptor

Better Scoring Function

I looked through some examples and it seems that the scoring can still be improved, especially for cases where certain elements are entirely missing from the generation.

Metric Improvement Ideas

Include image matching (size, positions, etc.). Also consider missing / extra image files.
Consider visual similarity of bounding box elements: colors, fonts (for text), etc.
Generate a more explainable multi-dimensional report apart from the final aggregate score. E.g., text element score & image element score / content score & layout score. We can ask humans to judge along the same dimensions.
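A multi-dimensional report of this kind could be sketched as follows. The `EvalReport` class, the dimension names, and the weighted-mean aggregate are all assumptions made for illustration; the source does not say how the final aggregate score is computed.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class EvalReport:
    """Per-dimension scores plus an aggregate (hypothetical structure)."""

    # e.g. {"text": 0.8, "image": 0.4}; dimension names are illustrative.
    scores: Dict[str, float]
    # Optional per-dimension weights; a missing weight defaults to 1.0.
    weights: Dict[str, float] = field(default_factory=dict)

    def aggregate(self) -> float:
        """Weighted mean of the per-dimension scores (an assumed aggregation rule)."""
        total_weight = sum(self.weights.get(name, 1.0) for name in self.scores)
        if total_weight == 0:
            return 0.0
        weighted_sum = sum(
            score * self.weights.get(name, 1.0) for name, score in self.scores.items()
        )
        return weighted_sum / total_weight
```

Keeping the per-dimension scores alongside the aggregate lets human judgments collected along the same dimensions be compared one dimension at a time.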

Fix evaluation score

Need to fix the overlapping and merging issue of text blocks (the right one below):

DATA LICENSE

We need to check what would be an appropriate license for our test data. Also, are we OK with releasing the data as part of the repo directly (e.g., should we somehow keep it from becoming part of future GPT training data)?

Preprocessing details

@NoviScl
Can we remove statements like

@import url("http://fonts.googleapis.com/css?family=Open+Sans");

during preprocessing? I find that they can sometimes cause rendering to fail while taking a screenshot.
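A minimal sketch of such a preprocessing step is shown here. `strip_remote_font_imports` is a hypothetical name, not a function in the repo, and the regex only targets `@import url(...)` statements pointing at fonts.googleapis.com, as in the example above.

```python
import re

# Matches @import url(...) statements that load Google Fonts over http(s),
# with or without quotes around the URL and with an optional trailing semicolon.
FONT_IMPORT_RE = re.compile(
    r"""@import\s+url\(\s*["']?https?://fonts\.googleapis\.com/[^)]*\)\s*;?""",
    re.IGNORECASE,
)


def strip_remote_font_imports(html: str) -> str:
    """Remove Google Fonts @import statements from an HTML/CSS string."""
    return FONT_IMPORT_RE.sub("", html)


sample = '<style>@import url("http://fonts.googleapis.com/css?family=Open+Sans");body{}</style>'
print(strip_remote_font_imports(sample))
```

Removing the import means the page falls back to a locally available font, so the rendered screenshot may differ slightly from the original design.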

Inquiry about usage

After taking a screenshot (python3 data_utils/screenshot.py), how do I input that screenshot to the model and get the generated code?

Why did you choose CLIP embeddings to calculate high-level similarity instead of other options? Were there any particular considerations?

Minimize code duplicates

There are some duplicates right now (e.g., screenshot code is also in the metrics modules; image rescaling code is copied over in the GPT-4V module). Would be nice to refactor the code to avoid these.

Potential Bug in Metric v3

Example 11625.png in gpt4v_visual_revision_prompting, error below:

Traceback (most recent call last):
 File "eval.py", line 46, in <module>
  matched, final_score, multi_score = visual_eval_v3(os.path.join(predictions_dir, filename.replace(".html", ".png")), os.path.join(reference_dir, filename.replace(".html", ".png")))
 File "/Users/clsi/Desktop/Pix2Code/Pix2Code/metrics/visual_score.py", line 963, in visual_eval_v3
  blocks1 = get_blocks_ocr_free(gpt_img)
 File "/Users/clsi/Desktop/Pix2Code/Pix2Code/metrics/ocr_free_utils.py", line 229, in get_blocks_ocr_free
  different_pixels = find_different_pixels(p_png, p_png_1)
 File "/Users/clsi/Desktop/Pix2Code/Pix2Code/metrics/ocr_free_utils.py", line 74, in find_different_pixels
  raise ValueError("Images are not the same size")
ValueError: Images are not the same size

@StevenyzZhang

We need high resolution AND a flexible aspect ratio.
1. playwright: flexible aspect ratio (full-page option) but low resolution(?)
2. html2image: can be set to high resolution, but the aspect ratio must be set explicitly.
Default background colors can make words unrecognizable to OCR.
1. --default-background-color flag in browser settings
2. Color differences pose challenges to OCR (the dark footnote in 1390.png)
