= Ariel release 0.1.0

== About - Ariel: A Ruby Information Extraction Library
Ariel is a library that allows you to extract information from semi-structured
documents (such as websites). Unlike existing tools, which expect the developer
to write the extraction rules by hand, Ariel uses a small number of labeled
examples to generate and learn effective extraction rules. It is developed by
Alex Bradbury and released under the MIT license. Ariel was started as a Google
Summer of Code project mentored by Austin Ziegler in 2006.

== Install
gem install ariel

== Announcement

I'm happy to announce the release of Ariel 0.1.0, the result of my Summer of
Code work. This release should be easy to use, very functional, and hopefully
useful, so it's worth trying out. I've put a lot of effort into writing clear
and straightforward documentation to get you started, so take a look at the
docs available at http://ariel.rubyforge.org. In particular, flick through the
tutorial and quick start guide. If you're interested, you may also want to take
a look at the theory page, where I've made a good start on describing the method
Ariel uses to learn extraction rules. If you have any problems or find any bugs,
just send me an email or add them to the issue tracker (see link below). Enjoy.
See the FAQ for a vim snippet to make labeling examples a little easier.

== Quickstart/Basic usage

* @require 'ariel'@
* Define a structure for the information you wish to extract: 
    structure = Ariel::Node::Structure.new do |r|
      r.item :title
      r.item :body
      r.list :comments do |c|
        c.list_item :comment do |d|
          d.item :author
          d.item :body
        end
      end
    end
* Collect a few examples of the sort of document you wish to extract information
  from (pages from the same website for instance).
* Label each example with tags such as <l:title>, <l:comment> and so on in the
  relevant places.
*  Ariel.learn structure, labeled_file1, labeled_file2, labeled_file3
* Find the documents you want to extract information from.
*  extractions = Ariel.extract structure, unlabeled_file1,
  unlabeled_file2
*  extractions[0].search('comments/*/body').each {|e| puts e.extracted_text} =>
  "Great stuff, loving it", "I love life", .....
*  extractions[0].at('comments/34') => nil (there is no 34th comment, #at
  returns the first result rather than an array of matches).
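
The difference between a search that returns every match and an #at that
returns only the first one (or nil) can be illustrated with a toy model. The
sketch below does not use Ariel at all: plain Hashes and Arrays stand in for
the extraction tree, and toy_search, toy_children and toy_at are hypothetical
helpers written for this example, not part of Ariel's API. It also assumes
that list items are numbered from 1, which the source does not state.

```ruby
# Toy model only: plain Hashes and Arrays stand in for an extraction tree.
# toy_search, toy_children and toy_at are hypothetical, not Ariel methods.

# Return every node reached by following +path+ from +node+.
# A '*' segment matches all children; other segments match one key or index.
def toy_search(node, path)
  path.split('/').reduce([node]) do |current, segment|
    current.flat_map { |n| toy_children(n, segment) }
  end
end

# Children of +node+ selected by a single path segment.
def toy_children(node, segment)
  case node
  when Hash
    return node.values if segment == '*'
    node.key?(segment) ? [node[segment]] : []
  when Array
    return node if segment == '*'
    return [] unless segment.match?(/\A\d+\z/)
    # Assumption: list items are numbered from 1 in this toy model.
    index = segment.to_i
    index >= 1 && index <= node.size ? [node[index - 1]] : []
  else
    []
  end
end

# Like toy_search, but returns only the first match, or nil if there is none.
def toy_at(node, path)
  toy_search(node, path).first
end

doc = {
  'title' => 'Hello',
  'body' => 'First post',
  'comments' => [
    { 'author' => 'ann', 'body' => 'Great stuff, loving it' },
    { 'author' => 'bob', 'body' => 'I love life' }
  ]
}

# Every comment body, in document order.
toy_search(doc, 'comments/*/body').each { |text| puts text }

# No 34th comment exists, so the first-match lookup yields nil.
p toy_at(doc, 'comments/34')
```

Returning an empty array from a search and nil from a first-match lookup lets
callers chain .each safely on the former while testing the latter with a
simple nil check.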


== Credits
Ariel is developed by Alex Bradbury as a Google Summer of Code project under the
mentoring of Austin Ziegler.

== Links
SVN Repository: http://rubyforge.org/projects/ariel
Issue tracker: http://code.google.com/p/ariel/issues/
Documentation/homepage: http://ariel.rubyforge.org
RDoc: http://ariel.rubyforge.org/rdoc/
