Local ScraperWiki Python Library

This library aims to be a drop-in replacement for the Python scraperwiki library for use locally. That is, functions will work the same way, and data will go into a local SQLite database; a targeted bombing of ScraperWiki's servers will not stop this local library from working, unless you happen to be running it on one of ScraperWiki's servers.

Installing

This will soon be in PyPI, but for now you can just install from the git repository.

Documentation

Read the standard ScraperWiki Python library's documentation, then look below for some quirks about the local version.

Quirks

The local library aims to be a drop-in replacement. In reality, the local version sometimes works better, though not all of the features have been implemented.

Differences

Datastore differences

The local scraperwiki.sqlite is powered by DumpTruck, so some things work a bit differently.

Data is stored in a local SQLite database named scraperwiki.sqlite.
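Because the datastore is an ordinary SQLite file, it can be inspected with Python's standard sqlite3 module. The following is a minimal sketch, not part of the library; list_tables is a hypothetical helper, and it assumes the file sits in the current working directory.

```python
import sqlite3


def list_tables(db_path="scraperwiki.sqlite"):
    """Return the names of all tables in the SQLite file at db_path.

    Hypothetical helper for illustration. Note that sqlite3.connect
    creates an empty file if db_path does not exist yet.
    """
    connection = sqlite3.connect(db_path)
    try:
        rows = connection.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        connection.close()
    # Each row is a 1-tuple holding the table name.
    return [name for (name,) in rows]
```

Calling list_tables() after a scraper has saved some data should show which tables were created.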

Bizarre table and column names are supported.
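For a sense of what "bizarre" can mean, SQLite itself accepts almost any name once it is wrapped in double quotes with embedded quotes doubled. The sketch below uses only the standard sqlite3 module; quote_identifier and demo are hypothetical names, and this is not necessarily how DumpTruck handles quoting internally.

```python
import sqlite3


def quote_identifier(name):
    """Quote a table or column name for SQLite, doubling embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


def demo():
    """Create and query a table whose names contain spaces and quotes."""
    connection = sqlite3.connect(":memory:")
    table = quote_identifier("my table; really")
    column = quote_identifier('a "quoted" column')
    connection.execute("CREATE TABLE {} ({} TEXT)".format(table, column))
    # Values still go through placeholders; only identifiers are quoted.
    connection.execute("INSERT INTO {} VALUES (?)".format(table), ("hello",))
    rows = connection.execute(
        "SELECT {} FROM {}".format(column, table)
    ).fetchall()
    connection.close()
    return rows
```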

scraperwiki.sqlite.execute will return an empty list of keys on an empty select statement result.
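Code that pairs column names with rows may want to tolerate the empty-keys case. This is a minimal sketch under an assumption not stated above: that the result is a dictionary with "keys" and "data" entries. rows_as_dicts is a hypothetical helper, not a library function.

```python
def rows_as_dicts(result):
    """Turn an execute-style result into a list of dictionaries.

    Assumes `result` looks like {"keys": [...], "data": [[...], ...]}.
    An empty list of keys is treated as "no rows".
    """
    keys = result.get("keys", [])
    data = result.get("data", [])
    if not keys:
        return []
    return [dict(zip(keys, row)) for row in data]
```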

scraperwiki.sqlite.attach downloads the whole datastore from ScraperWiki the first time it runs; after that, it uses the cached database.
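The download-once-then-reuse behaviour is a plain file cache. The sketch below shows the general pattern only; cached_path, its fetch argument, and the ".sqlite" file naming are all assumptions made for illustration, not the library's actual implementation.

```python
import os


def cached_path(name, fetch, cache_dir="."):
    """Return the path of a local copy of `name`, fetching it only once.

    `fetch` is a hypothetical callable that takes the datastore name and
    returns its contents as bytes; it is called only when no cached file
    exists yet.
    """
    path = os.path.join(cache_dir, name + ".sqlite")
    if not os.path.exists(path):
        content = fetch(name)
        with open(path, "wb") as handle:
            handle.write(content)
    return path
```

One consequence of this pattern is that the local copy goes stale: deleting the cached file is the simplest way to force a fresh download.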

Other Differences

Status of implementation

In general, features that have not been implemented raise a NotImplementedError.
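A scraper that should run both locally and on ScraperWiki can guard calls to features that may be missing. This is a minimal sketch; call_or_default is a hypothetical helper, not part of the library.

```python
def call_or_default(func, *args, default=None, **kwargs):
    """Call func, returning `default` if it raises NotImplementedError."""
    try:
        return func(*args, **kwargs)
    except NotImplementedError:
        # The feature is not available in this environment.
        return default
```

Any other exception still propagates, so real bugs are not hidden by the fallback.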

Datastore

scraperwiki.sqlite is missing the following features.

  • All of the verbose keyword arguments (these control what is printed in the ScraperWiki code editor)

Geo

The UK geocoding helpers (scraperwiki.geo) documented on scraperwiki.com have been implemented. They partially depend on scraperwiki.com being available.

Utils

scraperwiki.utils is implemented, as well as the following functions.

  • scraperwiki.log
  • scraperwiki.scrape
  • scraperwiki.pdftoxml
  • scraperwiki.swimport

Deprecated

These submodules are deprecated and thus will not be implemented.

  • scraperwiki.apiwrapper
  • scraperwiki.datastore
  • scraperwiki.jsqlite
  • scraperwiki.metadata
  • scraperwiki.newsql

Development

Run tests with ./runtests; this small wrapper cleans up after itself.

Specs

Here are some ScraperWiki scrapers that demonstrate the non-local library's quirks.

  • https://scraperwiki.com/scrapers/scraperwiki_local/
  • https://scraperwiki.com/scrapers/cast/
  • https://scraperwiki.com/scrapers/things_happen_when_you_do_not_commit/
  • https://scraperwiki.com/scrapers/what_does_show_tables_return/
  • https://scraperwiki.com/scrapers/on_conflict/
  • https://scraperwiki.com/scrapers/spaces_in_table_names/
  • https://scraperwiki.com/scrapers/spaces_in_table_names_1/

Contributors

tlevine, aidanhs, drj11, teajaymars
