This project forked from brighton36/libcraigscrape

Home Page: http://www.derosetechnologies.com/community/libcraigscrape

License: GNU Lesser General Public License v3.0

= libcraigscrape - A craigslist URL-scraping support Library

An easy library to do the heavy lifting between you and Craigslist's posting database. Given a URL, libcraigscrape will
follow links, scrape fields, and make ruby-sense out of the raw html from craigslist's servers.

For more information, head to the {craigslist monitoring}[http://www.derosetechnologies.com/community/libcraigscrape] help section of our website.

== craigwatch
libcraigscrape was primarily developed to support the included craigwatch[link:files/bin/craigwatch.html] script. See the included craigwatch script for
examples of libcraigscrape in action, and (hopefully) to serve an immediate craigscraping need.

== Installation

Install via RubyGems:

  sudo gem install libcraigscrape

== Usage

=== Scrape Craigslist Listings since Sep 10

On the 'miami.craigslist.org' site, using the query "search/sss?query=apple"

  require 'rubygems'
  require 'libcraigscrape'
  require 'time'
  require 'pp'
  
  miami_cl = CraigScrape.new 'us/fl/miami'
  miami_cl.posts_since(Time.parse('Sep 10'), 'search/sss?query=apple').each do |post|
    pp post  
  end
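
The cutoff in the example above is built with Time.parse. The following is a minimal sketch using only Ruby's standard library (no libcraigscrape calls and no network access). It shows how a year-less string such as 'Sep 10' is resolved against the current year, and how a cutoff comparison might look on stand-in data. The post_struct type and the sample entries are hypothetical, and the >= comparison is an assumption about how a "since" filter could behave, not a description of posts_since itself.

```ruby
require 'time'

# Time.parse fills the missing year from the current date and
# sets the missing hour, minute, and second to zero.
cutoff = Time.parse('Sep 10')

# Hypothetical stand-in for scraped posts; not the library's Posting class.
post_struct = Struct.new(:title, :post_time)
posts = [
  post_struct.new('older listing', cutoff - 86_400), # one day before the cutoff
  post_struct.new('newer listing', cutoff + 86_400)  # one day after the cutoff
]

# Keep only the posts at or after the cutoff.
recent_titles = posts.select { |post| post.post_time >= cutoff }.map(&:title)
puts recent_titles.inspect
```

Because the year is taken from the current date, a string like 'Sep 10' can resolve to a date in the future when the script runs earlier in the year.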

=== Scrape Last 225 Craigslist Listings

On the 'miami.craigslist.org' site, under the 'apa' category

  require 'rubygems'
  require 'libcraigscrape'
  require 'pp'
  
  i = 1
  CraigScrape.new('us/fl/miami').each_post('apa') do |post|
    break if i > 225
    i += 1
    pp post
  end
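
The counter-and-break pattern above can also be expressed with an Enumerator. The sketch below runs against a hypothetical stand-in object rather than CraigScrape, and it assumes only that each_post yields posts to its block, as the example above suggests. Whether the real method behaves identically under enum_for has not been verified here.

```ruby
# Hypothetical stand-in for a scraper; it yields fake posts forever.
scraper = Object.new
def scraper.each_post(section)
  n = 0
  loop { yield "#{section} post #{n += 1}" }
end

# Object#enum_for wraps a block-taking method in an Enumerator, so
# Enumerable#first can stop the iteration after the requested count.
latest = scraper.enum_for(:each_post, 'apa').first(3)
puts latest.inspect
```

The limit of 3 keeps the output short; the same call with 225 would mirror the example above.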

=== Enumerate posts across multiple sites and multiple sections/searches

Across all Florida sites except 'miami.craigslist.org' and 'keys.craigslist.org', output each post in
the 'crg' category and each result of the search 'artist needed'

  require 'rubygems'
  require 'libcraigscrape'
  require 'pp'
  
  non_sfl_sites = CraigScrape.new('us/fl', '- us/fl/miami', '- us/fl/keys')
  non_sfl_sites.each_post('crg', 'search/sss?query=artist+needed') do |post|
    pp post
  end
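
To illustrate the idea behind include and exclude specifications such as 'us/fl' and '- us/fl/miami', here is a minimal sketch of one way such specs could be resolved against a list of site paths. It is not the library's implementation: the all_sites list, the under helper, and the resolve_sites lambda are all hypothetical names introduced for this example.

```ruby
# Hypothetical site paths, for illustration only.
all_sites = %w[us/fl/miami us/fl/keys us/fl/tampa us/fl/orlando us/ga/atlanta]

# True when site equals prefix or sits beneath it in the path hierarchy.
under = ->(site, prefix) { site == prefix || site.start_with?("#{prefix}/") }

# Resolve specs such as 'us/fl' (include) and '- us/fl/miami' (exclude).
resolve_sites = lambda do |*specs|
  excludes, includes = specs.partition { |spec| spec.start_with?('-') }
  excludes = excludes.map { |spec| spec.sub(/\A-\s*/, '') }

  all_sites.select do |site|
    includes.any? { |prefix| under.call(site, prefix) } &&
      excludes.none? { |prefix| under.call(site, prefix) }
  end
end

selected = resolve_sites.call('us/fl', '- us/fl/miami', '- us/fl/keys')
puts selected.inspect
```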

=== Scrape Single Craigslist Posting

This grabs the full details of the specific post at http://miami.craigslist.org/mdc/sys/1140808860.html

  require 'rubygems'
  require 'libcraigscrape'
  
  post = CraigScrape::Posting.new 'http://miami.craigslist.org/mdc/sys/1140808860.html'
  puts "(%s) %s:\n %s" % [ post.post_time.strftime('%b %d'), post.title, post.contents_as_plain ]
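
The output line above combines String#% with Time#strftime. As a small offline sketch, the same format string is applied below to stand-in values. The time, title, and body are made up for illustration and do not come from a real posting.

```ruby
# Stand-in values; a real post would supply these via its accessors.
post_time = Time.new(2009, 4, 27)
title     = 'Sample title'
body      = 'Sample body'

# '%b %d' renders the abbreviated month name and the zero-padded day.
summary = "(%s) %s:\n %s" % [post_time.strftime('%b %d'), title, body]
puts summary
```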

=== Scrape Single Craigslist Listing

This grabs the post summaries from the single listing page at http://miami.craigslist.org/search/sss?query=laptop

  require 'rubygems'
  require 'libcraigscrape'
  
  listing = CraigScrape::Listings.new 'http://miami.craigslist.org/search/sss?query=laptop'
  puts 'Found %d posts for the search "laptop" on this page' % listing.posts.length

== Author
- Chris DeRose ([email protected])
- DeRose Technologies, Inc. http://www.derosetechnologies.com

== License

See COPYING[link:files/COPYING.html]
