Treasure Audit

A simple web spider with a desktop interface

Treasure Audit is a multi-platform desktop web spider that lets users crawl and search websites directly, without having to worry about out-of-date content caused by caching.

Features

  • Import a list of websites to crawl from a .txt file
  • Export a list of matched pages
  • Add a virtually unlimited number of matching criteria to filter for content
  • View pages with an HTML text viewer or HTML renderer
  • Highlight matches within the HTML view
  • Flexible interface optimized for webmasters and people building out sites
  • Real-time auditing – no more caching from Google
  • Linux, Mac, and Windows compatibility
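As a rough illustration of the import and export features, the sketch below reads a list of sites from a .txt file and writes matched pages back out. It assumes one URL per line with blank lines ignored; the README does not specify the file layout, and the function names `load_urls` and `export_matches` are hypothetical, not taken from Treasure Audit's source.

```python
from pathlib import Path


def load_urls(path):
    """Read URLs from a text file, one per line (assumed layout), skipping blanks."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


def export_matches(urls, path):
    """Write the matched page URLs to a text file, one per line."""
    Path(path).write_text("\n".join(urls) + "\n", encoding="utf-8")
```

A file written by `export_matches` can be read back with `load_urls`, so an exported list could be reused as the input for a later crawl.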

Binaries

You can download the compiled binaries from GitHub for Linux, Mac, and Windows: https://github.com/marceloclubhouse/treasure-audit/releases/tag/v0.1.3

Manual Installation

Download and install Python from

https://www.python.org/downloads/

Install Requests, BeautifulSoup, and PyQt5

pip3 install requests bs4 pyqt5

Download Treasure Audit from this page and execute run.py

python3 run.py

Usage

Choose a schema, enter the URL of the website you want to crawl, then click Crawl.

Add your criteria to narrow down the list of matched pages.
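To make the idea of narrowing by criteria concrete, here is a minimal sketch of one way such filtering could work. It assumes a page is kept only when every criterion appears in its HTML, compared case-insensitively; the README does not say how Treasure Audit combines criteria, and `matches_all` and `filter_pages` are hypothetical names.

```python
def matches_all(html, criteria):
    """Return True if every criterion occurs in the page HTML (case-insensitive)."""
    text = html.lower()
    return all(criterion.lower() in text for criterion in criteria)


def filter_pages(pages, criteria):
    """Return the URLs whose HTML satisfies every criterion.

    `pages` maps URL -> HTML string; in practice the HTML would come from a crawl.
    """
    return [url for url, html in pages.items() if matches_all(html, criteria)]
```

Under this assumption, each added criterion can only shrink the list of matched pages, which is consistent with "narrowing down" the results.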

After the pages are crawled, you can click on the individual pages to view their HTML, or you can choose to render the pages in a boxed web browser (without any assets like CSS, JS, or images).

If you want to see where your criteria match within the HTML, you can enable highlighting by going to View > Highlight Matches, which will highlight your matches in green within the HTML viewer. You can then scroll through the HTML viewer to find each match.
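The highlighting step can be sketched roughly as follows. This is an assumption about how such a feature might be built, not Treasure Audit's actual code: the page source is escaped for display, and every case-insensitive occurrence of a criterion is wrapped in a span with a green background. The function name `highlight` and the inline-style markup are hypothetical.

```python
import html
import re


def highlight(source, criteria, color="green"):
    """Return display-safe HTML with each criterion match wrapped in a colored span."""
    # Longest terms first so that overlapping criteria prefer the longer match.
    terms = sorted((re.escape(c) for c in criteria if c), key=len, reverse=True)
    if not terms:
        return html.escape(source)
    pattern = re.compile("|".join(terms), re.IGNORECASE)
    parts, last = [], 0
    for match in pattern.finditer(source):
        # Escape the text between matches so the tags display as literal source.
        parts.append(html.escape(source[last:match.start()]))
        parts.append(
            f'<span style="background-color: {color}">'
            f"{html.escape(match.group())}</span>"
        )
        last = match.end()
    parts.append(html.escape(source[last:]))
    return "".join(parts)
```

Escaping the pieces between matches separately keeps the inserted span tags intact while the page's own tags are shown as text.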

If you find a match that you don’t want to include, you can copy and paste it into the criterion box and choose to ignore it.
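One possible reading of ignoring a match is that occurrences of the ignored text no longer count toward a criterion. The sketch below illustrates that reading only; how Treasure Audit actually applies ignored criteria is not described here, and `count_matches` is a hypothetical helper. It assumes a non-empty criterion and case-insensitive comparison.

```python
def count_matches(html, criterion, ignored=()):
    """Count hits for `criterion`, excluding hits that fall inside ignored phrases."""
    text = html.lower()
    for phrase in ignored:
        if phrase:
            # Replace ignored text with a NUL sentinel so it cannot form new matches.
            text = text.replace(phrase.lower(), "\0")
    return text.count(criterion.lower())
```

For example, a criterion of `cache` would otherwise also hit `Cache-Control`; passing that header name in `ignored` leaves only the remaining occurrences.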

You can open the page you’re viewing in an external web browser by going to Menu > Edit > Open Page in Web Browser, which will open the page in your default browser.

Credits

  • Logo – Shamash Teran
  • Icons – Cole Bemis
  • GUI and Software – Marcelo Cubillos

Treasure Audit is provided under the GPL v3 License. See LICENSE.txt for more information.
