Comments (6)
Is there a solution for this?
I use(d) grab-site to create large, single WARC files (with an uncompressed .cdx) and just noticed that ReplayWeb is not able to open these. Previously I had only tested with smaller archives.
Is there a tool to convert my existing WARC archive to a WACZ collection without scraping again?
edit: currently reading the linked repo. Will try this
edit: There's now a tool for such a conversion in the linked repo. Great!
from replayweb.page.
Yes, I would recommend using WACZ for larger WARCs, because browsers have to load the entire WARC file into memory at once. Firefox, as far as I know, still seems to lock up randomly when streaming a larger file.
Perhaps grab-site could have an option to generate WACZ. I'll suggest that there.
from replayweb.page.
Yeah, at this scale (30GB+), it won't be able to load it all with just the WARC.
It needs to be converted into a WACZ collection with a compressed index, and then it can load the collection on-demand.
I don't quite have the tools ready to do this, but the idea is that it could take a pywb collection and create a compressed .wacz file as per: https://github.com/webrecorder/web-archive-collection-format
I'll try to have a skeleton of a tool that does this (for the current spec) fairly soon, if you want to try it out.
from replayweb.page.
Ah, sorry, I should have clarified: I packaged the collection as WACZ before attempting to load it into replayweb.page.
from replayweb.page.
Ah ok! But just using the plain, uncompressed .cdx, right? WACZ supports both compressed and uncompressed indexes.
The compression part is rather straightforward; I just need to have a script that does it. I've been using the one in webarchive-indexing, but it's a bit old.
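To illustrate why a compressed index allows on-demand loading, here is a minimal Python sketch of the general idea: sorted CDX lines are compressed in independent gzip blocks, and a small secondary index records the first key, offset, and length of each block. This is an assumption-laden illustration, not the exact on-disk format defined by the WACZ spec and not the webarchive-indexing script; the function names `compress_cdx` and `lookup` and the `block_size` parameter are hypothetical.

```python
import bisect
import gzip
import io


def compress_cdx(lines, block_size=3000):
    """Compress sorted CDX lines into independent gzip blocks.

    `lines` must be sorted by key and each line must end with a newline.
    Returns (data, index): `data` is the concatenated gzip blocks and
    `index` is a list of (first_key, offset, length), one per block.
    """
    data = io.BytesIO()
    index = []
    for start in range(0, len(lines), block_size):
        block = lines[start:start + block_size]
        payload = gzip.compress("".join(block).encode("utf-8"))
        # The key is assumed to be the first space-separated field.
        first_key = block[0].split(" ", 1)[0]
        index.append((first_key, data.tell(), len(payload)))
        data.write(payload)
    return data.getvalue(), index


def lookup(data, index, key):
    """Return the CDX lines for `key`, decompressing only candidate blocks."""
    first_keys = [entry[0] for entry in index]
    # The last block that starts strictly before `key` may still contain it.
    start = max(bisect.bisect_left(first_keys, key) - 1, 0)
    matches = []
    for first_key, offset, length in index[start:]:
        if first_key > key:
            break
        block = gzip.decompress(data[offset:offset + length]).decode("utf-8")
        matches.extend(
            line for line in block.splitlines()
            if line.split(" ", 1)[0] == key
        )
    return matches
```

With a layout like this, a reader only needs the small secondary index up front and can then fetch and decompress the one or two blocks relevant to a given URL, instead of holding the whole index (or the whole WARC) in memory.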
from replayweb.page.
The latest version (1.3.11) includes some fixes to prevent timing out when loading large WARCs, hopefully even on Firefox, so I will close this for now. WACZ is still recommended for large WARCs, but I think this is now working as well as possible, given that the entire WARC must be loaded.
from replayweb.page.