
history-report's Introduction

History Report

Create page and domain CSV reports from your Chrome browsing history

Google Chrome lets you view and search your browsing history, but with limited metadata and little ability to filter or sort. Though you can export your data to an XML file from the browser or get a JSON download from your Google account, those raw data files are still inconvenient to use.

Therefore this project provides a Python 3.6 tool to convert your downloaded browsing-history JSON into two easy-to-use CSV reports, which you can search, filter and sort in a CSV editor.

  • Page Report: a list of URLs in the history events, including the last visit time, domain and page title. Sorted by URL.
  • Domain Report: a summary of unique domains and a count of pages associated with each. Sorted by domain, but easily re-sortable by page count.
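The domain report boils down to grouping page URLs by domain and counting. A minimal sketch of that aggregation (the function name and input format here are illustrative, not the project's actual code):

```python
from collections import Counter
from urllib.parse import urlparse

def domain_report(urls):
    """Count pages per domain, sorted by domain name."""
    counts = Counter(urlparse(url).netloc for url in urls)
    return sorted(counts.items())

pages = [
    "https://example.com/a",
    "https://example.com/b",
    "https://docs.python.org/3/",
]
print(domain_report(pages))
# → [('docs.python.org', 1), ('example.com', 2)]
```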

Example usage

CLI

$ ./historyreport.py [OPTIONS]

  -h             Show a help message and exit.
  -e, --exclude  If provided, read the configured exclusions CSV and exclude any URLs in the file before writing the CSV reports.
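The flags above could be wired up with argparse along these lines (a sketch of the interface, not the script's actual implementation):

```python
import argparse

def build_parser():
    """Build a CLI parser matching the documented flags."""
    parser = argparse.ArgumentParser(
        description="Create page and domain CSV reports from Chrome history."
    )
    parser.add_argument(
        "-e", "--exclude",
        action="store_true",
        help="Read the configured exclusions CSV and drop matching URLs"
             " before writing the reports.",
    )
    return parser

args = build_parser().parse_args(["--exclude"])
print(args.exclude)
# → True
```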

Samples

Sample input files:

Sample output files:

Exclusions

When creating the reports, certain domains or URLs are excluded. This is controlled by a config containing app defaults and user-defined items. For example, the app ignores www.facebook.com by default, and you might decide to also exclude a domain such as gmail.com.

This project cares about browsing events, so irrelevant event types are ignored; these are defined in IGNORE_EVENTS in historyreport.py.

In that script, only http and https URLs are kept, filtering out items like local file paths (file:///) and FTP URLs (ftp://).
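That scheme filter amounts to a simple check on the parsed URL, roughly:

```python
from urllib.parse import urlparse

ALLOWED_SCHEMES = {"http", "https"}

def is_web_url(url):
    """Keep only http/https URLs, dropping file://, ftp://, etc."""
    return urlparse(url).scheme in ALLOWED_SCHEMES

urls = ["https://example.com", "file:///tmp/notes.txt", "ftp://host/file"]
print([u for u in urls if is_web_url(u)])
# → ['https://example.com']
```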

Documentation

Set up and run the application using the instructions in the docs directory.

Privacy notice

Your browsing history is kept totally private when using this project:

  • This project does not need internet access or your Google credentials to run.
  • No data is sent outside of this application - the only output is local CSV files in the project's unversioned var directory.

License

Released under MIT.


history-report's Issues

Go through notes

From plan.txt Dec 2018

Aim:
    Recover URLs which I want to bookmark
        Requires manually reading the page
            Some can be ignored if too much repetition on area or don't need
        Could add to bookmarks to avoid duplication and use some manual sorting into folder
    Make them easy to find
    Read them
    Generate once off report as CSV
        Need domains and pages together
        But also group by visited in periods - column for page to filter by
        Count instead of actual dates

Using frozen dump to recover tab from past year. Afterwards things are sent to bookmarks.


Parsing
    from urllib.parse import urlparse, ParseResult

    urlparse('')
    => ParseResult(scheme='', netloc='', path='', params='', query='', fragment='')

    x = ParseResult(*('scheme', 'netloc', 'path', 'params', 'query', 'fragment'))
    x.geturl()
    => 'scheme://netloc/path;params?query#fragment'



Unicode
    Errors were just in VC maybe? PyCharm is fine.

    TODO: Find out what encoding is used to make use of unicode characters which appear in URLs (such as equals sign) and possibly emojis or at least show emojis as ASCII.
    Some titles contain emojis. Normal unicode characters can only be parsed after emojis are replaced.
    Some URLs are broken
    Check for which sites
    URL can be found using title and domain search.

    https://stackoverflow.com/questions/33485255/python-decoding-a-string-that-consists-of-both-unicode-code-points-and-unicode
    Input either of
        import codecs
        codecs.decode('\\u002d', 'unicode_escape')
    or
        '\u002d'
    Gives
        '-'

Categories from transitions
{
    'LINK': 11626,
    'TYPED': 731,
    'AUTO_BOOKMARK': 127,
    'RELOAD': 2313,        # Worth keeping just in case.
    'GENERATED': 588,      # Google searches.
    'AUTO_TOPLEVEL': 579,  # Chrome native - ignore.
    'FORM_SUBMIT': 528,
}
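Counts like those above can be tallied from the history events with collections.Counter; a sketch assuming each event is a dict with a 'page_transition' key (verify the key name against your own Takeout export):

```python
from collections import Counter

def transition_counts(events):
    """Tally history events by their transition type."""
    return Counter(e["page_transition"] for e in events)

events = [
    {"page_transition": "LINK", "url": "https://example.com"},
    {"page_transition": "LINK", "url": "https://example.com/a"},
    {"page_transition": "TYPED", "url": "https://news.example"},
]
print(transition_counts(events))
# → Counter({'LINK': 2, 'TYPED': 1})
```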


Firebase could be a backend accessible from anywhere, but a frontend would still need to be set up on the work and home laptops, and be reachable by cellphone if using that.
