
history-report's Introduction

History Report

Create page and domain CSV reports from your Chrome browsing history

Google Chrome lets you view and search your browsing history, but with limited metadata and little ability to filter or sort. Though you can export your data to an XML file from the browser or get a JSON download from your Google account, those raw data files are still inconvenient to use.

Therefore this project provides a Python 3.6 tool to convert your downloaded browsing-history JSON into two easy-to-use CSV reports, which you can search, filter and sort in a CSV editor.

  • Page Report: a list of URLs in the history events, including the last visit time, domain and page title. Sorted by URL.
  • Domain Report: a summary of unique domains and a count of pages associated with each. Sorted by domain, but easily re-sortable by page count.
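The domain report boils down to grouping page URLs by domain and counting. A minimal sketch of that aggregation (the function name and input format here are illustrative, not the project's actual code):

```python
from collections import Counter
from urllib.parse import urlparse

def domain_report(urls):
    """Count pages per domain, sorted by domain name."""
    counts = Counter(urlparse(url).netloc for url in urls)
    return sorted(counts.items())

pages = [
    "https://example.com/a",
    "https://example.com/b",
    "https://docs.python.org/3/",
]
print(domain_report(pages))
# → [('docs.python.org', 1), ('example.com', 2)]
```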

Example usage

CLI

$ ./historyreport.py [OPTIONS]

  -h             Show a help message and exit.
  -e, --exclude  If provided, read the configured exclusions CSV and exclude any URLs in the file before writing the CSV reports.
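The flags above could be wired up with argparse along these lines (a sketch of the interface, not the script's actual implementation):

```python
import argparse

def build_parser():
    """Build a CLI parser matching the documented flags."""
    parser = argparse.ArgumentParser(
        description="Create page and domain CSV reports from Chrome history."
    )
    parser.add_argument(
        "-e", "--exclude",
        action="store_true",
        help="Read the configured exclusions CSV and drop matching URLs"
             " before writing the reports.",
    )
    return parser

args = build_parser().parse_args(["--exclude"])
print(args.exclude)
# → True
```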

Samples

Sample input files:

Sample output files:

Exclusions

When creating the reports, certain domains or URLs are excluded. This is controlled by a config containing app defaults and user-defined items. For example, the app ignores www.facebook.com by default, and you might decide to also exclude a domain such as gmail.com.

This project cares about browsing events, so irrelevant event types are ignored; these are defined in IGNORE_EVENTS in historyreport.py.

In that script, only http and https URLs are kept, filtering out items like local file paths (file:///) and FTP URLs (ftp://).
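That scheme filter amounts to a simple check on the parsed URL, roughly:

```python
from urllib.parse import urlparse

ALLOWED_SCHEMES = {"http", "https"}

def is_web_url(url):
    """Keep only http/https URLs, dropping file://, ftp://, etc."""
    return urlparse(url).scheme in ALLOWED_SCHEMES

urls = ["https://example.com", "file:///tmp/notes.txt", "ftp://host/file"]
print([u for u in urls if is_web_url(u)])
# → ['https://example.com']
```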

Documentation

Set up and run the application using the instructions in the docs directory.

Privacy notice

Your browsing history is kept totally private when using this project:

  • This project does not need internet access or your Google credentials to run.
  • No data is sent outside of this application - the only output is local CSV files in the project's unversioned var directory.

License

Released under MIT.


history-report's Issues

Go through notes

From plan.txt Dec 2018

Aim:
    Recover URLs which I want to bookmark
        Requires manually reading the page
            Some can be ignored if too much repetition on area or don't need
        Could add to bookmarks to avoid duplication and use some manual sorting into folder
    Make them easy to find
    Read them
    Generate once off report as CSV
        Need domains and pages together
        But also group by visited in periods - column for page to filter by
        Count instead of actual dates

Using frozen dump to recover tab from past year. Afterwards things are sent to bookmarks.


Parsing
    from urllib.parse import urlparse, ParseResult

    urlparse('')
    => ParseResult(scheme='', netloc='', path='', params='', query='', fragment='')

    x = ParseResult(*('scheme', 'netloc', 'path', 'params', 'query', 'fragment'))
    x.geturl()
    => 'scheme://netloc/path;params?query#fragment'



Unicode
    Errors were just in VC maybe? PyCharm is fine.

    TODO: Find out what encoding is used to make use of unicode characters which appear in URLs (such as equals sign) and possibly emojis or at least show emojis as ASCII.
    Some titles contain emojis. Normal unicode characters can only be parsed after emojis are replaced.
    Some URLs are broken
    Check for which sites
    URL can be found using title and domain search.

    https://stackoverflow.com/questions/33485255/python-decoding-a-string-that-consists-of-both-unicode-code-points-and-unicode
    Input either of
        import codecs
        codecs.decode('\\u002d', 'unicode_escape')
    or
        '\u002d'
    Gives
        '-'

Categories from transitions
{
    'LINK': 11626,
    'TYPED': 731,
    'AUTO_BOOKMARK': 127,
    'RELOAD': 2313,        # Worth keeping just in case.
    'GENERATED': 588,      # Google searches.
    'AUTO_TOPLEVEL': 579,  # Chrome native - ignore.
    'FORM_SUBMIT': 528,
}
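Counts like those above can be tallied from the history events with collections.Counter; a sketch assuming each event is a dict with a 'page_transition' key (verify the key name against your own Takeout export):

```python
from collections import Counter

def transition_counts(events):
    """Tally history events by their transition type."""
    return Counter(e["page_transition"] for e in events)

events = [
    {"page_transition": "LINK", "url": "https://example.com"},
    {"page_transition": "LINK", "url": "https://example.com/a"},
    {"page_transition": "TYPED", "url": "https://news.example"},
]
print(transition_counts(events))
# → Counter({'LINK': 2, 'TYPED': 1})
```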


Firebase could be a backend accessible from anywhere, but a frontend would still need to be set up on the work and home laptops, and be reachable by cellphone if using that.
