Comments (16)

simonw commented on June 7, 2024

I can also increase the resolution of the images when I do this, since they will no longer need to be kept small to save space.

I can use the til.simonwillison.net S3 bucket for this.

from til.

simonw commented on June 7, 2024

Images are currently generated by shot-scraper run from this Python script:

def png_for_path(path):
    page_html = str(TMP_PATH / "generate-screenshots-page.html")
    # Use datasette to generate HTML
    proc = subprocess.run(["datasette", ".", "--get", path], capture_output=True)
    open(page_html, "wb").write(proc.stdout)
    # Now use shot-scraper to generate a PNG
    proc2 = subprocess.run(
        [
            "shot-scraper",
            "shot",
            page_html,
            "-w",
            "800",
            "-h",
            "400",
            "-o",
            "-",
        ],
        capture_output=True,
    )
    png_bytes = proc2.stdout
    return png_bytes

simonw commented on June 7, 2024

Huh... those are PNGs. I bet they'd be a lot smaller if they were JPEGs, and even retina JPEGs might be smaller while still displaying well.

simonw commented on June 7, 2024

Ran this locally:

datasette . --get /sqlite/multiple-indexes > generate.html

Then:

shot-scraper shot generate.html -w 800 -h 400 --retina

Got this 216KB image:

generate-html

Tried a JPEG too - quality 80 was almost as big, but this got a smaller image (159KB):

shot-scraper shot generate.html -w 800 -h 400 --retina --quality 60

generate-html 1
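As a rough sketch, the JPEG settings tried above could be folded into the argument list that the screenshot script passes to shot-scraper. The helper name `shot_scraper_args` and its defaults are hypothetical; the flags are the ones used in the commands above, plus the `-o -` from the existing script.

```python
def shot_scraper_args(page_html, width=800, height=400, retina=True, quality=60):
    """Build a shot-scraper command line (hypothetical helper, not from the original script)."""
    args = ["shot-scraper", "shot", page_html, "-w", str(width), "-h", str(height)]
    if retina:
        # Render at double pixel density
        args.append("--retina")
    if quality is not None:
        # Passing a quality makes shot-scraper save a JPEG instead of a PNG
        args.extend(["--quality", str(quality)])
    # Write the image bytes to stdout so the caller can capture them
    args.extend(["-o", "-"])
    return args
```

The resulting list could be handed to `subprocess.run(..., capture_output=True)` in the same way `png_for_path` does.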

simonw commented on June 7, 2024

Biggest question to decide is how to tell if an image has been created in S3 or not.

I'm tempted to do it based on the filename: use the shot hash as that name, do a quick list-files operation to see what files exist already, create the ones that don't.
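A minimal sketch of that filename-based check, assuming the bucket listing comes from `s3-credentials list-bucket` (which prints a JSON array of objects with a `Key` field, as shown later in this thread) and that images are stored as `<shot hash>.jpg`. The function names are hypothetical.

```python
import json
import subprocess


def existing_keys(bucket):
    """Return the set of object keys in the bucket, via the s3-credentials CLI."""
    proc = subprocess.run(
        ["s3-credentials", "list-bucket", bucket],
        capture_output=True,
        check=True,
    )
    return {item["Key"] for item in json.loads(proc.stdout)}


def hashes_to_generate(shot_hashes, keys, suffix=".jpg"):
    """Return the shot hashes with no matching object yet, preserving input order."""
    return [h for h in shot_hashes if h + suffix not in keys]
```

With this shape, one list operation per run is enough: anything already in the bucket is skipped and only the missing hashes are rendered and uploaded.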

simonw commented on June 7, 2024

That should run in GitHub Actions and generate JPEGs for every post and upload them to S3.

https://github.com/simonw/til/actions/runs/4842339363/jobs/8629221973

simonw commented on June 7, 2024

It's working...

% s3-credentials list-bucket til.simonwillison.net
[
  {
    "Key": "0cf1e455f161435a4aea07480c27da89.jpg",
    "LastModified": "2023-04-30 03:54:06+00:00",
    "ETag": "\"c1ef69673fda4ebf1cd1cfa41d8dc255\"",
    "Size": 90039,
    "StorageClass": "STANDARD"
  },
  {
    "Key": "1447c8cdd4caa68e5514a1bb5b9f9f49.jpg",
    "LastModified": "2023-04-30 03:54:12+00:00",
    "ETag": "\"4adfdd03def8e54c651451f5b56e43b9\"",
    "Size": 111841,
    "StorageClass": "STANDARD"
  },
  {
    "Key": "14e4b902d5511a639a6c8d1e91d3dabb.jpg",
    "LastModified": "2023-04-30 03:54:35+00:00",
    "ETag": "\"2d3e29f3eaca62ba688c04a82d923fba\"",
    "Size": 118002,
    "StorageClass": "STANDARD"
  },

simonw commented on June 7, 2024

Generated image example: http://s3.amazonaws.com/til.simonwillison.net/f19a4a99ca28b20786ed7e35d8f9a8e7.jpg

simonw commented on June 7, 2024

To see how many are done:

% s3-credentials list-bucket til.simonwillison.net | jq length
43

Out of 410 total.

simonw commented on June 7, 2024

Partial logs from that GitHub Actions run:

Stored 96126 byte JPEG for github-actions_grep-tests.md shot hash 3e71efb58ec2d72ce37d6c93d7ace74e
Stored 70990 byte JPEG for github-actions_commit-if-file-changed.md shot hash 3b4a2012993962434fc8f5853cf5396b
Stored 72935 byte JPEG for bash_loop-over-csv.md shot hash d06963c31326ae773a8e7face614668c

simonw commented on June 7, 2024

It finished. All 410 images should be there now.

simonw commented on June 7, 2024

This query shows all the images on one page:

select
  json_object(
    'img_src',
    'https://s3.amazonaws.com/til.simonwillison.net/' || shot_hash || '.jpg',
    'width',
    400
  ) as img
from
  til

https://til.simonwillison.net/tils

I scrolled through and they all look good. This one was a favourite: https://s3.amazonaws.com/til.simonwillison.net/990ce33b65e40356be0035f185b3484c.jpg

simonw commented on June 7, 2024

Last steps:

  • Remove the datasette-media plugin and configuration
  • Delete the old cached images
  • Update the template to reference the new ones (oh no! That's going to require regenerating them all since the template hash will change)
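The reason a template edit forces regeneration can be illustrated with a sketch like this one, assuming the shot hash is derived from the template sources as well as the page content. The function name, the choice of MD5 and the exact inputs are assumptions for illustration, not the actual `generate_screenshots.py` code.

```python
import hashlib


def shot_hash(page_html, template_sources):
    """Hash the page content together with the templates that render it (illustrative only)."""
    h = hashlib.md5()
    for source in template_sources:
        # Any change to a template changes the digest for every page
        h.update(source.encode("utf-8"))
    h.update(page_html.encode("utf-8"))
    return h.hexdigest()
```

Because the digest doubles as the S3 filename, a changed template produces new names for all pages, so none of the previously uploaded images match any more.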

simonw commented on June 7, 2024

Oops, broke it:

Traceback (most recent call last):
  File "generate_screenshots.py", line 92, in <module>
    generate_screenshots(root)
  File "generate_screenshots.py", line 55, in generate_screenshots
    shot_html_hash.update(filepath.read_text().encode("utf-8"))
  File "/opt/hostedtoolcache/Python/3.8.16/x64/lib/python3.8/pathlib.py", line 1236, in read_text
    with self.open(mode='r', encoding=encoding, errors=errors) as f:
  File "/opt/hostedtoolcache/Python/3.8.16/x64/lib/python3.8/pathlib.py", line 1222, in open
    return io.open(self, mode, buffering, encoding, errors, newline,
  File "/opt/hostedtoolcache/Python/3.8.16/x64/lib/python3.8/pathlib.py", line 1078, in _opener
    return self._accessor.open(self, flags, mode)
FileNotFoundError: [Errno 2] No such file or directory: '/home/runner/work/til/til/main/templates/row.html'

simonw commented on June 7, 2024

That's deployed now.

https://developers.facebook.com/tools/debug/?q=https%3A%2F%2Ftil.simonwillison.net%2Fllms%2Ftraining-nanogpt-on-my-blog shows this:

image

simonw commented on June 7, 2024

Wrote this up as a TIL: https://til.simonwillison.net/shot-scraper/social-media-cards
