Topic: web-archiving

👇 Here are 93 public repositories matching this topic...

akamhy / waybackpy

web-archiving,Wayback Machine API interface & a command-line tool

Home Page: https://pypi.org/project/waybackpy/

internet-archive wayback-machine internet-archiving archive-webpage archive-webpages wayback-machine-api cdx-api wayback-machine-python savepagenow web-archiving

archivebox / archivebox

web-archiving,🗃 Open source self-hosted web archiving. Takes URLs/browser history/bookmarks/Pocket/Pinboard/etc., saves HTML, JS, PDFs, media, and more...

Organization: archivebox

Home Page: https://archivebox.io

pocket wget browser-bookmarks pinboard chromium firefox backups rss web-archiving python

archivebox / archivebox-browser-extension

web-archiving,Official ArchiveBox browser extension: automatically/manually preserve your browsing history using ArchiveBox.

Organization: archivebox

Home Page: https://chromewebstore.google.com/detail/archivebox-exporter/habonpimjphpdnmcfkaockjnffodikoj

archivebox chrome-extension firefox-extension svelte archiving browser-extension digipres digital-preservation internet-archiving web-archiving

archivebox / debian-archivebox

web-archiving,Home of the official apt/deb package for Ubuntu/Debian-based systems.

Organization: archivebox

Home Page: https://launchpad.net/~archivebox/+archive/ubuntu/archivebox

archivebox debian apt package internet-archiving stdeb web-archiving digipres aptitude ubuntu

archivebox / digestbox

web-archiving,DigestBox takes any webpage URL (news article, video link, comment thread, etc.) and gives you just the raw content. It's powered by ArchiveBox.io under the hood.

Organization: archivebox

Home Page: https://DigestBox.io

archivebox backups digipres headless-browser internet-archiving warc web-archiving

archivebox / docs

web-archiving,Source for the GitHub Wiki / ReadTheDocs documentation for ArchiveBox, the self-hosted internet archiving solution.

Organization: archivebox

Home Page: https://docs.archivebox.io

archivebox sphinx python rest cli ui documentation wiki usage community digipres web-archiving internet-archiving

archivebox / electron-archivebox

web-archiving,Desktop Electron app for ArchiveBox internet archiver. (ALPHA: not ready for general use)

Organization: archivebox

Home Page: https://archivebox.io

archivebox electron docker internet-archiving digipres web-archiving desktop desktop-electron macos windows

archivebox / homebrew-archivebox

web-archiving,Homebrew formula for the ArchiveBox self-hosted internet archiving solution.

Organization: archivebox

Home Page: https://archivebox.io

archivebox homebrew macos package linuxbrew brew-tap internet-archiving web-archiving digipres

archivebox / pip-archivebox

web-archiving,Official Python package for ArchiveBox, the self-hosted internet archiving solution.

Organization: archivebox

Home Page: https://pypi.org/project/archivebox/

archivebox python pip pypi internet-archiving web-archiving digipres setuptools sdist wheel

bellingcat / auto-archiver

web-archiving,Automatically archive links to videos, images, and social media content from Google Sheets (and more).

Organization: bellingcat

Home Page: https://pypi.org/project/auto-archiver/

archive docker open-source-research python service scraping web-archiving

cocrawler / cdx_toolkit

web-archiving,A toolkit for CDX indices such as Common Crawl and the Internet Archive's Wayback Machine

Organization: cocrawler

web-archiving web-archives warc cdx cdx-api commoncrawl python

dbeley / archiveboxmatic

web-archiving,ArchiveBoxMatic: configure ArchiveBox with the simplicity of a yaml file.

User: dbeley

archivebox archiving web-archiving

florents-tselai / warcdb

web-archiving,WarcDB: Web crawl data as SQLite databases.

User: florents-tselai

Home Page: https://WarcDB.tselai.com

crawling sqlite warc cli web-data database web-archiving

gildas-lormeau / single-file-cli

web-archiving,CLI tool for saving a faithful copy of a complete web page in a single HTML file (based on SingleFile)

User: gildas-lormeau

cli deno nodejs single-file web-archiving web-scraper web-scraping

gwu-libraries / sfm-ui

web-archiving,Social Feed Manager user interface application.

Organization: gwu-libraries

Home Page: http://gwu-libraries.github.io/sfm-ui

code4lib social-feed-manager social-media web-archiving

harvard-lil / perma

web-archiving,Indelible links

Organization: harvard-lil

web-archiving libraries

helgeho / archivespark

web-archiving,An Apache Spark framework for easy data processing, extraction, and derivation for web archives and archival collections, developed at Internet Archive.

User: helgeho

archivespark spark-framework spark web-archiving webarchive internet-archive warc

internetarchive / fatcat

web-archiving,Perpetual Access To The Scholarly Record

Organization: internetarchive

Home Page: https://guide.fatcat.wiki

rust web-archiving scholarly-communication digital-library python open-access postgresql

internetarchive / sandcrawler

web-archiving,Backend, IA-specific tools for crawling and processing the scholarly web. Content ends up in https://fatcat.wiki

Organization: internetarchive

web-archiving

internetarchive / scrapy-warcio

web-archiving,Support for writing WARC files with Scrapy

Organization: internetarchive

warc web-archiving python scrapy

machawk1 / wail

web-archiving,🐋 Web Archiving Integration Layer: One-Click User Instigated Preservation

User: machawk1

Home Page: https://matkelly.com/wail

web-archiving wayback python heritrix gui warc openwayback pyinstaller

machawk1 / warcreate

web-archiving,Chrome extension to "Create WARC files from any webpage"

User: machawk1

Home Page: https://warcreate.com

chrome-extension warc web-archiving

maxcountryman / warc-parquet

web-archiving,🗄️ A simple CLI for converting WARC to Parquet.

User: maxcountryman

crawling duckdb parquet warc web-archiving

n0tan3rd / node-warc

web-archiving,Parse and create Web ARChive (WARC) files with Node.js

User: n0tan3rd

webarchive webarchiving web-archives warc-files warc web-archiving pupeteer chrome-remote-interface

nla / chronicrawl

web-archiving,Experimental continuous web crawler for web archiving

Organization: nla

web-archiving

nla / chropro

web-archiving,Chrome debugging protocol client for Java

Organization: nla

chrome chrome-debugging-protocol chrome-devtools java web-archiving

nla / httrack2warc

web-archiving,Converts HTTrack crawls to WARC files

Organization: nla

web-archiving

nla / outbackcdx

web-archiving,Web archive index server based on RocksDB

Organization: nla

web-archiving wayback

oduwsdl / archivenow

web-archiving,A Tool To Push Web Resources Into Web Archives

Organization: oduwsdl

internet-archive web-archiving

oduwsdl / ipwb

web-archiving,InterPlanetary Wayback: A distributed and persistent archive replay system using IPFS

Organization: oduwsdl

ipfs warc wayback web-archiving python service-worker memento memento-rfc docker

oduwsdl / memgator

web-archiving,A Memento Aggregator CLI and Server in Go

Organization: oduwsdl

Home Page: https://memgator.cs.odu.edu/api.html

memento memento-rfc timemap web-archiving

oduwsdl / warrick

web-archiving,Recover lost websites from the Web Infrastructure

Organization: oduwsdl

Home Page: https://code.google.com/p/warrick

web-archiving memento-rfc memento recovery

web-archiving,A suite of tools for mirroring and hoarding web pages you visit for later offline viewing, i.e. your own personal Wayback Machine that can also archive HTTP POST requests and responses, as well as most other HTTP-level data. It follows an "archive everything now, figure out what to do with it later" philosophy.

Organization: own-data-privateer

Home Page: https://oxij.org/software/pwebarc/

archive backups internet internet-archiving self-hosted wayback-machine web-archiving

pirate / internet-archiving-talk

web-archiving,🎭 An introduction to the Internet Archiving ecosystem, tooling, and some of the ethical dilemmas that the community faces.

User: pirate

Home Page: https://pirate.github.io/internet-archiving-talk/

internet-archiving talks slideshow web-archiving wget warc archivebox censorship ethics

programminghistorian / ph-submissions

web-archiving,The repository and website hosting the peer review process for new Programming Historian lessons

Organization: programminghistorian

Home Page: http://programminghistorian.github.io/ph-submissions

api data-management dh digital-history digital-humanities distant-reading linked-open-data mapping multi-lingual network-analysis

rahiel / archiveror

web-archiving,Archiveror will help you preserve the webpages you love. 💾

User: rahiel

Home Page: https://www.rahielkasim.com/archiveror/

archiving webextension linkrot mhtml browser-extension web-archiving firefox-extension chrome-extension javascript bookmark

rhizome-conifer / conifer

web-archiving,Collect and revisit web pages.

Organization: rhizome-conifer

Home Page: https://conifer.rhizome.org

webrecorder web-archiving archives pywb python docker wayback warc

rhizome-conifer / conifer-deploy

web-archiving,Conifer setup and deployment via Ansible

Organization: rhizome-conifer

ansible-playbook web-archiving webrecorder

ukwa / ukwa-manage

web-archiving,Shepherding our web archives from crawl to access.

Organization: ukwa

warc cdx hdfs wayback webarchive web-archiving

webrecorder / archiveweb.page

web-archiving,A High-Fidelity Web Archiving Extension for Chrome and Chromium based browsers!

Organization: webrecorder

Home Page: https://chrome.google.com/webstore/detail/webrecorder/fpeoodllldobpkbkabpblcfaogecpndd

chromium extension web-archiving webrecorder archiving wacz

webrecorder / browsertrix

web-archiving,Browsertrix is the hosted, high-fidelity, browser-based crawling service from Webrecorder designed to make web archiving easier and more accessible for all!

Organization: webrecorder

Home Page: https://browsertrix.com

archiving cloud warc web-archive web-archiving webrecorder wacz kubernetes