
browserexport

PyPI version | Python 3.8–3.12 | PRs Welcome

This:

  • locates and backs up browser history by copying the underlying database files to some directory you specify
  • can identify and parse the resulting database files into some common schema:
Visit:
  url: the url
  dt: datetime (when you went to this page)
  metadata:
    title: the <title> for this page
    description: the <meta description> tag from this page
    preview_image: 'main image' for this page, often opengraph/favicon
    duration: how long you were on this page

metadata is dependent on the data available in the browser (e.g. firefox has preview images, chrome has duration, but not vice versa)
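For example, once parsed (see Library Usage below), visits can be consumed like this. A minimal sketch, assuming the fields above map directly onto attributes of the parsed Visit objects, and that metadata may be None; the backup filename is just an example:

from browserexport.merge import read_and_merge

# print the date, url and title (if present) of every visit in a backup
for v in read_and_merge(["firefox-20220202181022.sqlite"]):
    print(v.dt.isoformat(), v.url)
    if v.metadata is not None:
        print("  title:", v.metadata.title)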

Supported Browsers

This currently supports: Firefox (and forks like Waterfox, LibreWolf, Floorp, and Palemoon), Chrome (and other Chromium-based browsers like Chromium, Brave, Vivaldi, Opera, Edge, and Arc), and Safari -- the full list matches the --browser choices shown below.

This can probably extract visits from other Firefox/Chromium-based browsers as well, but it doesn't know how to locate their databases in order to save them

Install

python3 -m pip install --user browserexport

Requires python3.8+

Usage

save

Usage: browserexport save [OPTIONS]

  Backs up a current browser database file

Options:
  -b, --browser
      [chrome | firefox | opera | safari | brave | waterfox |
      librewolf | floorp | chromium | vivaldi | palemoon | arc |
      edge | edgedev]
                                  Browser name to backup history for
  --pattern TEXT                  Pattern for the resulting timestamped filename, should include an
                                  str.format replacement placeholder for the date [default:
                                  browser_name-{}.extension]
  -p, --profile TEXT              Use to pick the correct profile to back up. If unspecified, will assume a
                                  single profile  [default: *]
  --path FILE                     Specify a direct path to a database to back up
  -t, --to DIRECTORY              Directory to store backup to. Pass '-' to print database to STDOUT
                                  [required]
  -h, --help                      Show this message and exit.

Must specify one of --browser or --path

Browsers typically remove old history over time once it reaches a certain size, so I'd recommend backing up your history periodically, like:

$ browserexport save -b firefox --to ~/data/browsing
$ browserexport save -b chrome --to ~/data/browsing
$ browserexport save -b safari --to ~/data/browsing

That copies the sqlite databases which contain your history --to some backup directory.

If a browser you want to back up is Firefox/Chrome-like (so this would be able to parse it), but this doesn't support locating it yet, you can back it up directly with the --path flag:

$ browserexport save --path ~/.somebrowser/profile/places.sqlite \
  --to ~/data/browsing

The --pattern argument can be used to change the resulting filename for the browser, e.g. --pattern 'places-{}.sqlite' or --pattern "$(uname)-{}.sqlite". The {} is replaced by the current date/timestamp.

Feel free to create an issue/contribute a browser file to locate the browser if this doesn't support some browser you use.

You can pass the --debug flag to show sqlite_backup logs:

$ browserexport --debug save -b firefox --to .
[D 220202 10:10:22 common:87] Glob /home/sean/.mozilla/firefox with */places.sqlite (non recursive) matched [PosixPath('/home/sean/.mozilla/firefox/ew9cqpqe.dev-edition-default/places.sqlite')]
[I 220202 10:10:22 save:18] backing up /home/sean/.mozilla/firefox/ew9cqpqe.dev-edition-default/places.sqlite to /home/sean/Repos/browserexport/firefox-20220202181022.sqlite
[D 220202 10:10:22 core:110] Source database files: '['/tmp/tmpcn6gpj1v/places.sqlite', '/tmp/tmpcn6gpj1v/places.sqlite-wal']'
[D 220202 10:10:22 core:111] Temporary Destination database files: '['/tmp/tmpcn6gpj1v/places.sqlite', '/tmp/tmpcn6gpj1v/places.sqlite-wal']'
[D 220202 10:10:22 core:64] Copied from '/home/sean/.mozilla/firefox/ew9cqpqe.dev-edition-default/places.sqlite' to '/tmp/tmpcn6gpj1v/places.sqlite' successfully; copied without file changing: True
[D 220202 10:10:22 core:64] Copied from '/home/sean/.mozilla/firefox/ew9cqpqe.dev-edition-default/places.sqlite-wal' to '/tmp/tmpcn6gpj1v/places.sqlite-wal' successfully; copied without file changing: True
[D 220202 10:10:22 core:230] Running backup, from '/tmp/tmpcn6gpj1v/places.sqlite' to '/home/sean/Repos/browserexport/firefox-20220202181022.sqlite'
[D 220202 10:10:22 save:14] Copied 1840 of 1840 database pages...
[D 220202 10:10:22 core:246] Executing 'wal_checkpoint(TRUNCATE)' on destination '/home/sean/Repos/browserexport/firefox-20220202181022.sqlite'

For Firefox Android Fenix, the database has to be manually backed up (probably from a rooted phone using termux) from data/data/org.mozilla.fenix/files/places.sqlite.

inspect/merge

These work very similarly: inspect is for a single database, while merge is for multiple databases.

Usage: browserexport merge [OPTIONS] SQLITE_DB...

  Extracts visits from multiple sqlite databases

  Provide multiple sqlite databases as positional arguments, e.g.:
  browserexport merge ~/data/firefox/*.sqlite

  Drops you into a REPL to access the data

  Pass '-' to read from STDIN

Options:
  -s, --stream  Stream JSON objects instead of printing a JSON list
  -j, --json    Print result to STDOUT as JSON
  -h, --help    Show this message and exit.

As an example:

browserexport --debug merge ~/data/firefox/* ~/data/chrome/*
[D 210417 21:12:18 merge:38] merging information from 24 sources...
[D 210417 21:12:18 parse:19] Reading visits from /home/sean/data/firefox/places-20200828223058.sqlite...
[D 210417 21:12:18 common:40] Chrome: Running detector query 'SELECT * FROM keyword_search_terms'
[D 210417 21:12:18 common:40] Firefox: Running detector query 'SELECT * FROM moz_meta'
[D 210417 21:12:18 parse:22] Detected as Firefox
[D 210417 21:12:19 parse:19] Reading visits from /home/sean/data/firefox/places-20201010031025.sqlite...
[D 210417 21:12:19 common:40] Chrome: Running detector query 'SELECT * FROM keyword_search_terms'
....
[D 210417 21:12:48 common:40] Firefox: Running detector query 'SELECT * FROM moz_meta'
[D 210417 21:12:48 common:40] Safari: Running detector query 'SELECT * FROM history_tombstones'
[D 210417 21:12:48 parse:22] Detected as Safari
[D 210417 21:12:48 merge:51] Summary: removed 3001879 duplicates...
[D 210417 21:12:48 merge:52] Summary: returning 334490 visit entries...

Use vis to interact with the data

[1] ...

You can also read from STDIN, so this can be used in conjunction with save, to merge databases you've backed up and combine your current browser history:

browserexport save -b firefox -t - | browserexport merge --json --stream - ~/data/browsing/* >all.jsonl

Or, use process substitution to save multiple dbs in parallel and then merge them:

$ browserexport merge <(browserexport save -b firefox -t -) <(browserexport save -b chrome -t -)

Logs are hidden by default. To show the debug logs, set export BROWSEREXPORT_LOGS=10 (uses python logging levels) or pass the --debug flag.

JSON

To dump all that info to JSON:

$ browserexport merge --json ~/data/browsing/*.sqlite > ./history.json
$ du -h history.json
67M     history.json

Or, to create a quick searchable interface, using jq and fzf:

browserexport merge -j --stream ~/data/browsing/*.sqlite | jq '"\(.url)|\(.metadata.description)"' | awk '!seen[$0]++' | fzf

Merged files like history.json can also be used as input files themselves; this reads them by mapping the JSON back onto the Visit schema directly.

In addition to .json files, this can parse .jsonl (JSON lines) files, which contain one JSON object per line. That allows parsing the objects one at a time, instead of loading the entire file into memory. A .jsonl file can be generated with the --stream flag:

browserexport merge --stream --json ~/data/browsing/*.sqlite > ./history.jsonl

Additionally, this can parse compressed JSON/JSONL files (using kompress): .xz, .zip, .lz4, .zstd, .zst, .tar.gz, .gz

For example, you could do:

browserexport merge --stream --json ~/data/browsing/*.sqlite | gzip --best > ./history.jsonl.gz
# test parsing the compressed file
browserexport --debug inspect ./history.jsonl.gz

If you don't care about keeping the raw databases for other auxiliary info (like form data, bookmark data, or from_visit info) and just want the URL, visit date, and metadata, you could use merge to periodically merge the bulky .sqlite files into a gzipped JSONL dump, reducing storage space and improving parsing speed:

# backup databases
rsync -Pavh ~/data/browsing ~/.cache/browsing
# merge all sqlite databases into a single compressed, jsonl file
browserexport --debug merge --json --stream ~/data/browsing/* > '/tmp/browsing.jsonl'
gzip '/tmp/browsing.jsonl'
# test reading gzipped file
browserexport --debug inspect '/tmp/browsing.jsonl.gz'
# remove all old datafiles
rm ~/data/browsing/*
# move merged data to database directory
mv /tmp/browsing.jsonl.gz ~/data/browsing

I do this every couple of months with a script here, and then sync my old databases to a hard drive for more long-term storage.

Shell Completion

This uses click, which supports shell completion for bash, zsh, and fish. To generate the completion on startup, put one of the following in your shell init file (.bashrc/.zshrc etc.):

eval "$(_BROWSEREXPORT_COMPLETE=bash_source browserexport)" # bash
eval "$(_BROWSEREXPORT_COMPLETE=zsh_source browserexport)" # zsh
_BROWSEREXPORT_COMPLETE=fish_source browserexport | source  # fish

Instead of eval-ing on every startup, you could of course save the generated completion to a file and/or lazily load it in your shell config; see the bash completion docs, zsh functions, and fish completion docs. For example, for zsh that might look like:

mkdir -p ~/.config/zsh/functions/
_BROWSEREXPORT_COMPLETE=zsh_source browserexport > ~/.config/zsh/functions/_browserexport
# in your ~/.zshrc
# update fpath to include the directory you saved the completion file to
fpath=(~/.config/zsh/functions $fpath)
autoload -Uz compinit && compinit

HPI

If you want to cache the merged results, this has a module in HPI which handles locating/caching and querying the results. See setup and module setup.

That uses cachew to automatically cache the merged results, recomputing whenever you back up new databases.

As a few examples:

✅ OK  : my.browser.all
✅     - stats: {'history': {'count': 1091091, 'last': datetime.datetime(2023, 2, 11, 1, 12, 37, 302883, tzinfo=datetime.timezone.utc)}}
✅ OK  : my.browser.export
✅     - stats: {'history': {'count': 1090850, 'last': datetime.datetime(2023, 2, 11, 4, 34, 12, 985488, tzinfo=datetime.timezone.utc)}}
✅ OK  : my.browser.active_browser
✅     - stats: {'history': {'count': 270363, 'last': datetime.datetime(2023, 2, 11, 22, 26, 24, 887722, tzinfo=datetime.timezone.utc)}}
# supports arbitrary queries, e.g. how many visits did I have in January 2022?
$ hpi query my.browser.all --order-type datetime --after '2022-01-01 00:00:00' --before '2022-01-31 23:59:59' | jq length
50432
# how many github URLs in the past month
$ hpi query my.browser.all --recent 4w -s | jq .url | grep 'github.com' -c
16357

Library Usage

To save databases:

from browserexport.save import backup_history
backup_history("firefox", "~/data/backups")
# or, pass a Browser implementation
from browserexport.browsers.all import Firefox
backup_history(Firefox, "~/data/backups")

To merge/read visits from databases:

from browserexport.merge import read_and_merge
read_and_merge(["/path/to/database", "/path/to/second/database", "..."])
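Building on that, a small sketch filtering the merged stream down to a single domain. The Visit attribute names are assumed from the schema above, and read_and_merge is assumed to accept pathlib.Path objects as well as strings:

from pathlib import Path
from urllib.parse import urlsplit

from browserexport.merge import read_and_merge

# count visits to github.com across all backed-up databases
paths = sorted(Path("~/data/browsing").expanduser().glob("*.sqlite"))
github_visits = [v for v in read_and_merge(paths) if urlsplit(v.url).netloc == "github.com"]
print(len(github_visits))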

You can also use sqlite_backup to copy your current browser history into an in-memory sqlite database, as a sqlite3.Connection:

from browserexport.browsers.all import Firefox
from browserexport.parse import read_visits
from sqlite_backup import sqlite_backup

db_in_memory = sqlite_backup(Firefox.locate_database())
visits = list(read_visits(db_in_memory))

# to merge those with other saved files
from browserexport.merge import merge_visits, read_and_merge
merged = list(merge_visits([
    visits,
    read_and_merge(["/path/to/another/database.sqlite", "..."]),
]))

If this doesn't support a browser and you wish to quickly extend it without maintaining a fork (or contributing back to this repo), you can pass a Browser implementation (see browsers/all.py and browsers/common.py for more info) to browserexport.parse.read_visits, or programmatically override/add your own browsers as part of the browserexport.browsers namespace package.
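As a rough illustration only -- the real base-class contract lives in browsers/common.py; the method name and platform-keyed return value below are assumptions mirroring the style of the built-in browsers, not a confirmed API:

from browserexport.browsers.all import Firefox
from browserexport.save import backup_history

# hypothetical: a Firefox fork whose profile lives under ~/.somebrowser/ --
# inherit Firefox's schema/detection and only change where to look for profiles
class SomeBrowser(Firefox):
    @classmethod
    def data_directories(cls):
        return {
            "linux": "~/.somebrowser/",
            "darwin": "~/Library/Application Support/SomeBrowser/",
        }

backup_history(SomeBrowser, "~/data/backups")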

Comparisons with Promnesia

A lot of the initial queries/ideas here were taken from promnesia and its browser_history.py script, but creating a package here allows it to be more extensible, e.g. letting you override/locate additional databases.

TLDR on promnesia: it lets you explore your browsing history in context, i.e. where you encountered a link: in chat, on Twitter, on Reddit, or just in one of the text files on your computer. This is unlike most modern browsers, where you can only see when you visited the link.

Since promnesia #375, browserexport is used in promnesia's browser.py file (to read any of the supported databases here from disk); see the setup and browser source quickstart in the instructions for more.

Contributing

Clone the repository and [optionally] create a virtual environment to do your work in.

git clone https://github.com/seanbreckenridge/browserexport
cd ./browserexport
# create a virtual environment to prevent possible package dependency conflicts
python -m virtualenv .venv  # python3 -m pip install virtualenv if missing
source .venv/bin/activate

Development

To install, run:

python3 -m pip install '.[testing]'

If running in a virtual environment, pip will automatically install dependencies into it. If running browserexport happens to invoke the globally installed copy instead, you can use python3 -m browserexport to ensure it's using the version in your virtual environment.

After making changes to the code, reinstall by running pip install ., and then test with browserexport or python3 -m browserexport.

Testing

While developing, you can run tests with:

pytest
flake8 ./browserexport
mypy ./browserexport
# to autoformat code
python3 -m pip install black
find browserexport tests -name '*.py' -exec python3 -m black {} +

browserexport's People

Contributors

aluhrs13, andrewsb, apatel762, karlicoss, seanbreckenridge


browserexport's Issues

tests?

could probably create an example database by copying a current db and dropping a bunch of the sensitive data, just to make sure everything's still working
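For instance, a hypothetical scrubbing sketch, assuming a Firefox places.sqlite; the row/column choices here are illustrative:

import shutil
import sqlite3

# copy a real database, keep only a handful of visits, and replace
# anything sensitive with dummy values
shutil.copy("places.sqlite", "example.sqlite")
conn = sqlite3.connect("example.sqlite")
conn.execute("DELETE FROM moz_historyvisits WHERE id > 10")
conn.execute("UPDATE moz_places SET url = 'https://example.com/' || id, title = NULL, description = NULL")
conn.commit()
conn.execute("VACUUM")  # reclaim deleted pages so dropped data isn't recoverable
conn.close()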

firefox/chrome: support `from_visit` field

Might be interesting for Promnesia, although it's also possible that the utility is very marginal and traversing history by timestamps is good enough.

  • firefox: moz_historyvisits.from_visit. On mobile the field is present but seems to always be NULL
  • chrome: visits.from_visit

Perhaps this belongs in metadata, but that also means we'd need to keep the original visit ID from the sqlite database to match the visit. And even worse, when all visits from different historic exports are merged into a single stream, the ids don't make sense anymore (they are 'internal' to a specific export). Although this would be possible to work around if we somehow remapped the ids in browserexport itself
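For reference, a sketch of what resolving from_visit looks like against a Firefox places.sqlite; the table/column names are from the stock Firefox schema:

import sqlite3

# for each visit, look up the URL of the visit it came from via the
# from_visit self-reference on moz_historyvisits
conn = sqlite3.connect("places.sqlite")
query = """
SELECT p.url, src_p.url
FROM moz_historyvisits v
JOIN moz_places p ON p.id = v.place_id
LEFT JOIN moz_historyvisits src_v ON src_v.id = v.from_visit
LEFT JOIN moz_places src_p ON src_p.id = src_v.place_id
"""
for url, from_url in conn.execute(query):
    print(url, "<-", from_url)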

Convert merged data back into sqlite databases browsers can read

As discussed here, it's possible to take the merged data and convert it back into a new database to use, but supporting every browser would probably be a lot of code

Maybe support Chrome/Firefox, so at least you could merge data from browsers you use less into your 'main' (but this also promotes a silo, and pushes you towards firefox/chrome)

However,

  • I expect this would become a lot of code
  • you'd lose some data, since this doesn't extract all attributes from every table; for data we don't have, we'd probably end up writing dummy values, which then pollute the database even more for future exports...
  • re-normalizing all the data is an annoying task for every different schema.
  • the browser databases remove history over time anyways, so there's a limit; you can't just merge millions of entries into your database to have history going back for years -- this is the whole reason browserexport exists, to save your old data.

Relatively low priority as I can interact with this programmatically by writing my own promnesia sources, but I can see this feature being useful for others...

Feels like a pretty difficult problem to solve, and it adds much more complexity to browserexport, when otherwise this is more functional -- taking databases, merging them, and extracting visits

deprecate form history

it was an experiment and it hasn't really panned out, so we can add a deprecation notice and remove it in a year or so with 0.4.0

Support Firefox Mobile

promnesia supports firefoxmobile

Apparently you can use termux to back up that database periodically; may require root?

Not sure, haven't been able to figure it out myself yet, pretty new to termux -- if anyone has any info on this, would appreciate a comment.

how to backup list link plugins?

Hi. I have a bunch of Cent Browser plugins, a lot of them, and soon I am moving to a new Gmail account.
I need to make a backup of them in the form of links: for example, just a list of my extensions as links to the Chrome store.


parse input history from database

would be nice to re-use the detection/parse/serialization mechanisms

would rather duck-type stuff with some protocols than do a bunch of classes

allow reading from merged JSON files

over time, the amount of databases/space needed increases, and it may get harder to remove 'redundant' backups, like described by bleanser in #26

allowing this to read from JSON files (the output from running browserexport merge ~/data/browsing/* --json >/tmp/dump.json) can end up compressing the data quite a lot

save support on windows

Essentially, just need to install these browsers on Windows and see where they typically install to, then add the corresponding paths to each Browser file in the data_directories function

support jsonl, json.gz and jsonl.gz file extensions

json lines would make reading from large JSON exports more memory efficient, and allow it to start yielding lines before parsing the whole (sometimes half a gig) file

supporting the gzipped versions of both of these would be nice as well, since it drastically reduces the size of the JSON files, as they're just text with a lot of repeated property names

add integration test script

probably would just be run on my machine

  • opens browser application in the background (and possibly goes to some URL?)
  • saves the browser database to a temp directory
  • parses the browser history

reports if the parse failed (or if the recent browser URL or database could not be found)

merge optimization

use an emitted set instead of reading all the data into memory concurrently
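A minimal sketch of the idea; the (url, dt) dedupe key is an assumption based on how duplicates are described above:

from typing import Iterable, Iterator

def merge_streams(sources: Iterable[Iterable]) -> Iterator:
    # yield visits as they stream in from each source, tracking only the
    # keys already emitted instead of materializing everything in memory
    emitted = set()
    for source in sources:
        for visit in source:
            key = (visit.url, visit.dt)
            if key not in emitted:
                emitted.add(key)
                yield visit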

Create searchable interface

As mentioned in #16

may be nice to offer a basic interface to search the dumped JSON data; not everyone is able to write a list comprehension in python

Currently you can dump it to JSON, and that's machine consumable, but end users of this don't have a great way to search it... could create a basic HTML file which searches the JSON using some javascript?

could be in JS or Python, doesn't really matter

Parse moz_inputhistory

These are the search queries you type into the URL bar; each row has an input, a use count, and a place ID.
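For reference, a sketch of reading that table directly; the column names are from the stock Firefox schema:

import sqlite3

# typed inputs, how often they were used, and the URL they resolved to
conn = sqlite3.connect("places.sqlite")
query = """
SELECT h.input, h.use_count, p.url
FROM moz_inputhistory h
JOIN moz_places p ON p.id = h.place_id
ORDER BY h.use_count DESC
"""
for inp, use_count, url in conn.execute(query):
    print(use_count, inp, url)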

for most items this would be null, so maybe store this differently than in Visits? Or merge it differently into the interactive portion: provide one merge library function which merges individual moz_inputhistory rows with moz_historyvisits from one database, and then another 'merge' lib function which merges those together.

gives access to these in both inspect and merge

This is low priority; it doesn't really add anything that valuable to the Visit model, and it's quite distinct. Not personally that interested in the data myself.

support exporting history with snap installs

Seems that in ubuntu 22.04 the default firefox installation is via Snap, so the profile dir is in ~/snap/firefox/common/.mozilla/firefox 👀

I might fix it myself later, just leaving it here in case someone else is looking -- likely needs a fix somewhere around this line, e.g. it could take a sequence of paths and go through them:

"linux": "~/.mozilla/firefox/",

Btw IIRC it's been similar for chromium for a while, but haven't used chrome for ages, so not sure

Add cachew?

Doesn't seem that it'd be useful here, since we're already reading from a database (the firefox history database); caching that info to another cachew database wouldn't make much sense.

Can't cache the live firefox history file because that keeps changing, so the only place cachew would improve any performance would be if we were spending a long time in merge_visits. But that doesn't even do any IO, it's just a loop with a set, so doubtful.

For reference:

[ ~ ] $ time sh -c  'HPI_LOGS=debug python3 -c "from my.browsing import history; x = list(history())"'
[DEBUG   2020-09-05 03:07:21,267 my.browsing __init__.py:681] using inferred type <class 'ffexport.model.Visit'>
[D 200905 03:07:21 save_hist:66] backing up /home/sean/.mozilla/firefox/lsinsptf.dev-edition-default/places.sqlite to /tmp/tmpxvxci5yl/places-20200905100721.sqlite
[D 200905 03:07:21 save_hist:70] done!
[D 200905 03:07:21 merge_db:48] merging information from 2 databases...
[DEBUG   2020-09-05 03:07:21,303 my.browsing __init__.py:728] using /tmp/browser-cachw/homeseandatafirefoxdbsplaces-20200828223058.sqlite for db cache
[DEBUG   2020-09-05 03:07:21,303 my.browsing __init__.py:734] new hash: cachew: 0.7.0, schema: {'url': <class 'str'>, 'visit_date': <class 'datetime.datetime'>, 'visit_type': <class 'int'>, 'title': typing.Union[str, NoneType], 'description': typing.Union[str, NoneType], 'preview_image': typing.Union[str, NoneType]}, hash: 1598653858
[DEBUG   2020-09-05 03:07:21,310 my.browsing __init__.py:761] old hash: cachew: 0.7.0, schema: {'url': <class 'str'>, 'visit_date': <class 'datetime.datetime'>, 'visit_type': <class 'int'>, 'title': typing.Union[str, NoneType], 'description': typing.Union[str, NoneType], 'preview_image': typing.Union[str, NoneType]}, hash: 1598653858
[DEBUG   2020-09-05 03:07:21,310 my.browsing __init__.py:764] hash matched: loading from cache
[DEBUG   2020-09-05 03:07:22,083 my.browsing __init__.py:728] using /tmp/browser-cachw/tmptmpxvxci5ylplaces-20200905100721.sqlite for db cache
[DEBUG   2020-09-05 03:07:22,083 my.browsing __init__.py:734] new hash: cachew: 0.7.0, schema: {'url': <class 'str'>, 'visit_date': <class 'datetime.datetime'>, 'visit_type': <class 'int'>, 'title': typing.Union[str, NoneType], 'description': typing.Union[str, NoneType], 'preview_image': typing.Union[str, NoneType]}, hash: 1599300441
[DEBUG   2020-09-05 03:07:22,085 my.browsing __init__.py:761] old hash: None
[DEBUG   2020-09-05 03:07:22,085 my.browsing __init__.py:770] hash mismatch: computing data and writing to db
[D 200905 03:07:22 parse_db:69] Parsing visits from /tmp/tmpxvxci5yl/places-20200905100721.sqlite...
[D 200905 03:07:22 parse_db:88] Parsing sitedata from /tmp/tmpxvxci5yl/places-20200905100721.sqlite...
[D 200905 03:07:28 merge_db:60] Summary: removed 91,787 duplicates...
[D 200905 03:07:28 merge_db:61] Summary: returning 98,609 visit entries...
sh -c   7.46s user 0.19s system 99% cpu 7.711 total
[ ~ ] $ time sh -c 'HPI_LOGS=debug python3 -c "from my.browsing import history; x = list(history())"'
[D 200905 03:07:48 save_hist:66] backing up /home/sean/.mozilla/firefox/lsinsptf.dev-edition-default/places.sqlite to /tmp/tmpsvri7hr8/places-20200905100748.sqlite
[D 200905 03:07:48 save_hist:70] done!
[D 200905 03:07:48 merge_db:48] merging information from 2 databases...
[D 200905 03:07:48 parse_db:69] Parsing visits from /home/sean/data/firefox/dbs/places-20200828223058.sqlite...
[D 200905 03:07:48 parse_db:88] Parsing sitedata from /home/sean/data/firefox/dbs/places-20200828223058.sqlite...
[D 200905 03:07:49 parse_db:69] Parsing visits from /tmp/tmpsvri7hr8/places-20200905100748.sqlite...
[D 200905 03:07:49 parse_db:88] Parsing sitedata from /tmp/tmpsvri7hr8/places-20200905100748.sqlite...
[D 200905 03:07:50 merge_db:60] Summary: removed 91,787 duplicates...
[D 200905 03:07:50 merge_db:61] Summary: returning 98,609 visit entries...
sh -c   1.65s user 0.10s system 99% cpu 1.759 total

The first run takes 7 seconds, with a cachew cache hit for the backed-up database. The second reads from both of them directly, which takes 1.6 seconds.

For reference, this is how I modified my.browsing from HPI:

diff --git a/my/browsing.py b/my/browsing.py
index 9f44322..af66530 100644
--- a/my/browsing.py
+++ b/my/browsing.py
@@ -25,17 +25,25 @@ import tempfile
 from pathlib import Path
 from typing import Iterator, Sequence
 
-from .core.common import listify, get_files
+from .core.common import listify, get_files, mcachew
 
 
+from .kython.klogging import LazyLogger, mklevel
 # monkey patch ffexport logs
 if "HPI_LOGS" in os.environ:
-    from .kython.klogging import mklevel
     os.environ["FFEXPORT_LOGS"] = str(mklevel(os.environ["HPI_LOGS"]))
 
+logger = LazyLogger(__name__, level="info")
 
-from ffexport import read_and_merge, Visit
+CACHEW_PATH = "/tmp/browser-cachw"
+
+# create cache path
+os.makedirs(CACHEW_PATH, exist_ok=True)
+
+from ffexport import Visit
 from ffexport.save_hist import backup_history
+from ffexport.parse_db import read_visits
+from ffexport.merge_db import merge_visits
 
 @listify
 def inputs() -> Sequence[Path]:
@@ -60,7 +68,20 @@ def history(from_paths=inputs) -> Results:
     import my.browsing
     visits = list(my.browsing.history())
     """
-    yield from read_and_merge(*from_paths())
+    # only load items that are in the config.export path using cachew
+    # the 'live_file' is always going to be uncached
+    db_paths = list(from_paths())
+    tmp_path = db_paths.pop()
+    yield from merge_visits(*map(_read_history, db_paths), _read_history(tmp_path))
+
+
+def _browser_mtime(p: Path) -> int:
+    return int(p.stat().st_mtime)
+
+@mcachew(hashf=_browser_mtime, logger=logger, cache_path=lambda db_path: f"{CACHEW_PATH}/{str(db_path).replace('/','')}")
+def _read_history(db: Path) -> Iterator[Visit]:
+    yield from read_visits(db)
+
 
 def stats():
     from .core import stat

simplify locate_database calls

create an enum/some helper function to handle expanding the path (and sending warnings when the current platform isn't present) given a dispatch dict like:

{
  "linux": "~/something",
  "mac": "~/something",
  ...
}

Most of that function's body would just be the dict and the call.
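A hypothetical sketch of such a helper; the names here are illustrative, not the actual API:

import sys
import warnings
from pathlib import Path
from typing import Dict, Optional

def expand_platform_path(paths: Dict[str, str]) -> Optional[Path]:
    # pick the entry for the current platform, warn when the browser has
    # no known location there, and expand the user home directory
    key = "mac" if sys.platform == "darwin" else "linux"
    if key not in paths:
        warnings.warn(f"no known data directory for platform {sys.platform!r}")
        return None
    return Path(paths[key]).expanduser()

expand_platform_path({"linux": "~/.mozilla/firefox/", "mac": "~/Library/Application Support/Firefox/Profiles/"})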

new browsers - firefox fork addresses in Windows 10

librewolf
\AppData\Roaming\librewolf\Profiles\

floorp
\AppData\Roaming\Floorp\Profiles\

firefox - in the program the directory has a dot (.mozilla), but here it's without it; maybe it changed already
\AppData\Roaming\Mozilla\Firefox\Profiles\

and '*/places.sqlite' // like in firefox
