
Comments (9)

virgil commented on June 7, 2024

Update: For what it's worth, on onion.link we simply put a Varnish cache in front of tor2web and we cache contents using the various Varnish policies. This was the best solution we found and it's compatible with CDNs like Fastly (if someone else wanted to do that).

I strongly discourage tor2web from implementing its own caching system. It would be a lot of work, as well as reinventing the wheel. Just use Varnish/Squid/whatever.

For the curious, onion.link actually caches pretty aggressively. It caches anything that doesn't explicitly say "don't cache me" via one of the various HTTP headers. I chose this because (1) most sites don't set the cache headers and (2) if their site breaks, it will encourage the onion-site maintainer to start using the appropriate cache headers.
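
A minimal sketch of the "cache unless told otherwise" policy described above, written in Python for illustration only. It is not onion.link's Varnish configuration. The function name `is_cacheable` and the set of headers treated as "don't cache me" (`no-store`, `no-cache`, `private`, and `Pragma: no-cache`) are assumptions.

```python
from typing import Mapping

# Assumption: these Cache-Control directives count as "don't cache me".
OPT_OUT_DIRECTIVES = {"no-store", "no-cache", "private"}


def is_cacheable(headers: Mapping[str, str]) -> bool:
    """Cache by default; refuse only when the response explicitly opts out."""
    lowered = {name.lower(): value.lower() for name, value in headers.items()}

    # Split "public, max-age=60" into bare directive names: {"public", "max-age"}.
    directives = {
        part.strip().split("=")[0]
        for part in lowered.get("cache-control", "").split(",")
    }
    if directives & OPT_OUT_DIRECTIVES:
        return False

    # Legacy HTTP/1.0 opt-out.
    if "no-cache" in lowered.get("pragma", ""):
        return False

    # No headers at all, or only permissive ones: cache it.
    return True
```

A response with no caching headers at all is cached under this policy, which is exactly the case that covers most onion sites.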

from tor2web.

DRSDavidSoft commented on June 7, 2024

@virgil on a side note, thank you for the wonderful onion.link. With its 'aggressive caching' feature it is the best Tor2Web implementation that exists, and I use it on a daily basis. Most Tor sites are either slow or broken at times, and this caching pretty much keeps those problems from blocking access to the content.
Also, I agree with the caching policies you've set up. In my opinion, any tor2web setup should have the same caching policies for the reasons you stated.


fpietrosanti commented on June 7, 2024

Some considerations on caching:

  • Caching breaks dynamic websites
    An internet forum whose HTML pages are cached for 1h will be unusable, because it will effectively have a forced refresh interval of 1h.
    Caching must be as non-invasive as possible.
  • Caching may bring additional responsibilities to a Tor2web node administrator
    It means keeping unwanted files on the tor2web operator's server filesystem.
    So the only good caching is "in-memory" caching, which limits how much we can cache.
  • Caching does not specifically address latency issues
    Providing a fully offline cache is almost impossible without controlling the backend content, at least without breaking it.
    For that reason caching may only act on specific kinds of content (to avoid breaking t2w websites).
    For that reason the "latency improvements" provided by caching will be relatively low.
  • Caching can provide bandwidth savings for high-traffic websites
    This is not currently an issue for Tor2web.
    It may become an issue for high-traffic websites, whose requests may overload the Tor Hidden Services infrastructure (for example the rendezvous point / introduction point).
    To mitigate this "possible" Tor limitation and improve performance, we may apply a specific caching strategy to "high-traffic resources" in order to save Tor bandwidth.

My proposal is to apply caching only under the following conditions:

  • Only in-memory caching (NO DISK WRITE)
  • Only high-traffic resources (cache only when it is relevant due to resource constraints)
  • Only static resources (cache only resources that do not include dynamic content)
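
The three conditions above can be sketched as a small in-memory cache in Python. This is an illustration under stated assumptions, not tor2web code: the class `MemoryCache`, the list of static content types, the request threshold, the TTL, and the size cap are all hypothetical values picked for the example.

```python
import time
from collections import Counter
from typing import Callable, Dict, Optional, Tuple

# All values below are assumptions chosen for illustration.
STATIC_TYPES = ("image/", "font/", "text/css", "application/javascript")
MIN_REQUESTS = 3        # a URL counts as "high-traffic" after this many lookups
TTL_SECONDS = 3600.0    # how long an entry stays valid


class MemoryCache:
    """In-memory only: nothing here ever touches the filesystem."""

    def __init__(self, max_bytes: int = 64 * 1024 * 1024,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._store: Dict[str, Tuple[float, bytes]] = {}
        self._requests: Counter = Counter()
        self._size = 0
        self._max_bytes = max_bytes
        self._clock = clock

    def get(self, url: str) -> Optional[bytes]:
        """Return the cached body, or None on a miss or an expired entry."""
        self._requests[url] += 1
        entry = self._store.get(url)
        if entry is None:
            return None
        expires_at, body = entry
        if self._clock() >= expires_at:
            del self._store[url]
            self._size -= len(body)
            return None
        return body

    def offer(self, url: str, content_type: str, body: bytes) -> bool:
        """Store a fetched response only if it meets all three conditions."""
        if not content_type.lower().startswith(STATIC_TYPES):
            return False                      # not a static resource
        if self._requests[url] < MIN_REQUESTS:
            return False                      # not high-traffic (yet)
        existing = self._store.get(url)
        freed = len(existing[1]) if existing else 0
        if self._size - freed + len(body) > self._max_bytes:
            return False                      # would exceed the memory budget
        self._store[url] = (self._clock() + TTL_SECONDS, body)
        self._size += len(body) - freed
        return True
```

A proxy would call `get()` before fetching over Tor and `offer()` after a successful fetch. Because the size cap is enforced on insert, the memory limit mentioned earlier in the thread is explicit rather than accidental.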


fpietrosanti commented on June 7, 2024

Caching seems to be supported by the Twisted Web Client: http://twistedmatrix.com/trac/ticket/5126

We may enable this for specific static objects, for the top-accessed websites detected by #13, and/or for all websites while keeping those objects cached only for a short time, for example a few hours (it's a memory-only cache)?


fpietrosanti commented on June 7, 2024

Hellais brought up the idea that the TorHS server should also be able to "influence" the caching behaviour of the Tor2web proxy by setting specific caching-related headers.

That way for example a TorHS node may specify, with appropriate caching headers, to provide a static cache of /index.html and the resources required to quickly display the website without waiting to connect to TorHS.

Such a cache should have a maximum expiry time.
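
One way to read "maximum expiry time" is as a proxy-side cap on whatever lifetime the hidden service asks for. A minimal Python sketch, assuming the lifetime comes from the standard `max-age` directive of `Cache-Control`; the function name `effective_ttl` and the 6-hour cap are hypothetical. It does not check for `no-store` or similar directives, which would need a separate test.

```python
import re
from typing import Optional

MAX_TTL_SECONDS = 6 * 3600  # assumed proxy-side cap


def effective_ttl(cache_control: str,
                  max_ttl: int = MAX_TTL_SECONDS) -> Optional[int]:
    """Return the lifetime in seconds, clamped to max_ttl.

    Returns None when the header carries no max-age directive.
    """
    # Match "max-age=N" as a whole directive (start of header or after a comma).
    match = re.search(r"(?:^|,)\s*max-age\s*=\s*(\d+)", cache_control.lower())
    if match is None:
        return None
    return min(int(match.group(1)), max_ttl)
```

With this cap, a hidden service that sends `Cache-Control: max-age=31536000` for /index.html still gets the entry refreshed after at most `MAX_TTL_SECONDS`.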


fpietrosanti commented on June 7, 2024

/cc @virgil


virgil commented on June 7, 2024

This is already done. The TorHS can dictate caching behavior with the "Cache-Control:" HTTP headers.

Bluntly, caching is complicated. In my work I've simply used Fastly.com, which does a DNS man-in-the-middle and caches the content as your backend responds. This is the same model that CloudFlare uses.

TL;DR: there are already specific caching-related headers. Incorporating caching support into tor2web directly would be welcome, but given the existing solutions I argue other issues (e.g., client-side rewriting, generalizing to .i2p, etc.) have higher priority.


NSkelsey commented on June 7, 2024

@virgil with the Varnish setup on onion.link, have you disabled paging? Do you have a config file you are willing to share?

I vote we close this topic. There are tools that solve this problem.


DRSDavidSoft commented on June 7, 2024

I think using an external caching solution has many benefits. For example, having an already complete and stable platform instead of reinventing the wheel is obviously a major advantage.

However, I also think that if such a solution is going to be used, there should be proper documentation and example configuration files, so it can be used easily in each setup. Tor2Web should offer easy integration with a Varnish/Squid setup.

@virgil, I use Varnish on a regular basis, but as @NSkelsey has pointed out, sharing an example config file would be much appreciated.

