

Sketchy

Overview

What is Sketchy?

Sketchy is a task-based API for taking screenshots and scraping text from websites.

What is the Output of Sketchy?

Sketchy's capture model contains all of the information associated with screenshotting, scraping, and storing HTML files from a provided URL. Screenshots (sketches), text scrapes, and HTML files can be stored either locally or in an S3 bucket. Optionally, token auth can be configured for creating and retrieving captures. Sketchy can also perform callbacks if required.
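As a rough illustration of requesting a capture, the sketch below builds (without sending) the HTTP requests a client might issue. The base URL `http://localhost:8000`, the `api/v1.0/capture` POST path, and the `url`/`callback` JSON field names are assumptions for illustration only; the `api/v1.0/capture/last` path comes from the 1.1 release notes. Consult the Wiki for the actual request format and for how token auth is passed.

```python
import json
import urllib.request

# Hypothetical base URL for a local Sketchy deployment; adjust to your own host/port.
BASE_URL = "http://localhost:8000"


def build_capture_request(target_url, callback=None, base_url=BASE_URL):
    """Build (but do not send) a POST request asking Sketchy to capture target_url.

    The endpoint path and JSON field names are assumptions for illustration.
    """
    payload = {"url": target_url}
    if callback is not None:
        # Assumed field name for the URL Sketchy would notify when the capture finishes.
        payload["callback"] = callback
    return urllib.request.Request(
        f"{base_url}/api/v1.0/capture",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_last_capture_request(base_url=BASE_URL):
    """Build a GET request for the most recent capture (api/v1.0/capture/last)."""
    return urllib.request.Request(f"{base_url}/api/v1.0/capture/last", method="GET")
```

Against a running instance, either request could be sent with `urllib.request.urlopen(req)`; keeping construction separate from sending makes the payload easy to inspect before anything goes over the wire.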

How Does Sketchy Do It?

Sketchy uses PhantomJS with lazy rendering to ensure that Ajax-heavy sites are captured correctly. Sketchy also uses the Celery task management system, which lets users scale Sketchy as needed and manage time-intensive captures.
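The benefit of a task queue is that each capture runs as an independent job, so throughput grows with the number of workers. The sketch below uses Python's `concurrent.futures` as a local stand-in for Celery workers; it is not Sketchy's implementation, and `capture` is a hypothetical stub in place of the real PhantomJS call.

```python
from concurrent.futures import ThreadPoolExecutor


def capture(url):
    """Hypothetical stand-in for a time-intensive capture job (screenshot + scrape).

    A real worker would drive PhantomJS here; this stub just returns a record.
    """
    return {"url": url, "status": "done"}


def run_captures(urls, workers=4):
    """Submit each URL as an independent task and collect results in submission order.

    Raising `workers` is the local analogue of adding more Celery workers.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(capture, u) for u in urls]
        return [f.result() for f in futures]
```

With Celery the same shape holds, except that tasks go through a message broker and workers can run on separate machines.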

Release History

Version 1.1 - December 4, 2014

A number of improvements and bug fixes have been made:

  • A new model and API endpoint called "Static" was created. This allows users to send Sketchy a static HTML file for text scraping and screenshotting. See the Wiki for usage information.
  • A new PhantomJS script, 'static.js', creates screenshots of static HTML files.
  • New endpoint: api/v1.0/capture/last, which shows the most recent capture.
  • New endpoint: api/v1.0/static/last, which shows the most recent static capture.
  • The API list view is now sorted in reverse order, so the most recent capture is listed at the top of the page.
  • For callback requests, the capture status is now updated.
  • Task retry has been optimized to retry only on ConnectionErrors. This should make tasks fail faster on errors that would never succeed on a retry.
  • A new configuration setting, "SSL_HOST_VALIDATION", can be set to scrape/screenshot webpages that have SSL errors.
  • A new configuration setting, "CAPTURE_ERRORS", can be used to scrape/screenshot webpages that return 4xx or 5xx HTTP status codes.
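Two of the 1.1 behaviors above are easy to state as code. The sketch below is a hypothetical illustration, not Sketchy's source: `run_with_retry` retries only on Python's built-in `ConnectionError` and lets every other exception fail immediately, and `should_capture` mirrors the idea behind `CAPTURE_ERRORS` by skipping 4xx/5xx responses unless the flag is enabled. The function names, the retry count, and the assumption that error pages are skipped by default are illustrative.

```python
import time


def run_with_retry(task, max_retries=3, delay=0.0):
    """Call task(), retrying only when it raises ConnectionError.

    Any other exception propagates immediately, since retrying would not help.
    """
    attempt = 0
    while True:
        try:
            return task()
        except ConnectionError:
            attempt += 1
            if attempt > max_retries:
                raise  # out of retries: surface the last ConnectionError
            time.sleep(delay)


def should_capture(status_code, capture_errors=False):
    """Decide whether to capture a page given its HTTP status code.

    4xx/5xx pages are skipped unless capture_errors (cf. CAPTURE_ERRORS) is True.
    """
    if 400 <= status_code < 600:
        return capture_errors
    return True
```

Restricting retries to connection failures means a page that raises, say, a parsing error fails on the first attempt instead of burning through every retry.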

Documentation

Documentation is maintained in the GitHub Wiki.

Docker

Sketchy is also available as a Docker container.

Contributors

sbehrens, sherzberg

