jashmenn / ariel
Fork of A Ruby Information Extraction Library
Home Page: http://rubyforge.org/projects/ariel
License: MIT License
= Ariel release 0.1.0

== About - Ariel: A Ruby Information Extraction Library

Ariel is a library that allows you to extract information from semi-structured documents (such as websites). It differs from existing tools because, rather than expecting the developer to write rules to extract the desired information, Ariel uses a small number of labeled examples to generate and learn effective extraction rules. It is developed by Alex Bradbury and released under the MIT license. Ariel was started as a Google Summer of Code project mentored by Austin Ziegler in 2006.

== Install

  gem install ariel

== Announcement

I'm happy to announce the release of Ariel 0.1.0, the result of my Summer of Code work. This release should be easy to use, very functional, and hopefully useful, so it's worth trying out. I've put a lot of effort into writing clear and straightforward documentation to get you started, so take a look at the docs available at http://ariel.rubyforge.org. In particular, flick through the tutorial and quick start guide. If you're interested, you may also want to take a look at the theory page, where I've made a good start on describing the method Ariel uses to learn extraction rules. If you have any problems or find any bugs, just send me an email or add them to the issue tracker (see link below). Enjoy.

See the FAQ for a vim snippet to make labeling examples a little easier.

== Quickstart/Basic usage

* <tt>require 'ariel'</tt>
* Define a structure for the information you wish to extract:
    structure = Ariel::Node::Structure.new do |r|
      r.item :title
      r.item :body
      r.list :comments do |c|
        c.list_item :comment do |d|
          d.item :author
          d.item :body
        end
      end
    end
* Collect a few examples of the sort of document you wish to extract information from (pages from the same website, for instance).
* Label each example with tags such as <l:title>, <l:comment> and so on in the relevant places.
* <tt>Ariel.learn structure, labeled_file1, labeled_file2, labeled_file3</tt>
* Find the documents you want to extract information from.
* <tt>extractions = Ariel.extract structure, unlabeled_file1, unlabeled_file2</tt>
* <tt>extractions[0].search('comments/*/body').each {|e| puts e.extracted_text} => "Great stuff, loving it", "I love life", .....</tt>
* <tt>extractions[0].at('comments/34') => nil</tt> (there is no 34th comment; #at returns the first result rather than an array of matches).

== Credits

Ariel is developed by Alex Bradbury as a Google Summer of Code project under the mentoring of Austin Ziegler.

== Links

* SVN Repository: http://rubyforge.org/projects/ariel
* Issue tracker: http://code.google.com/p/ariel/issues/
* Documentation/homepage: http://ariel.rubyforge.org
* RDoc: http://ariel.rubyforge.org/rdoc/
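
== Example: checking labels before learning (sketch)

The labeling step in the Quickstart is easy to get wrong by hand, so it can help to check that every field named in the structure appears as a label in each example before learning from it. The sketch below does not use Ariel at all. +label_names+ and +missing_labels+ are hypothetical helpers, the sample document is made up, and the closing-tag form <tt></l:title></tt> and the nesting of labels are assumptions based on the <l:title> style shown above.

```ruby
# Hypothetical helpers, NOT part of Ariel's API.
# Assumption: labels are written as <l:name>...</l:name> pairs.
LABEL_OPEN = /<l:([a-z_]+)>/

# Returns the distinct label names found in a labeled example, in order
# of first appearance. Closing tags start with "</" and are not matched.
def label_names(labeled_text)
  labeled_text.scan(LABEL_OPEN).flatten.uniq
end

# Returns the expected field names that never appear as a label.
def missing_labels(labeled_text, expected)
  expected.map(&:to_s) - label_names(labeled_text)
end

# Made-up sample page, labeled to match the Quickstart structure.
LABELED_EXAMPLE = <<~HTML
  <html><body>
  <h1><l:title>Ariel released</l:title></h1>
  <div><l:body>The first release is out.</l:body></div>
  <l:comments>
  <p><l:comment><b><l:author>Sam</l:author></b>: <l:body>Great stuff, loving it</l:body></l:comment></p>
  </l:comments>
  </body></html>
HTML

# Field names taken from the Quickstart structure definition.
EXPECTED = %i[title body comments comment author]

puts label_names(LABELED_EXAMPLE).inspect
puts missing_labels(LABELED_EXAMPLE, EXPECTED).inspect
```

An empty result from +missing_labels+ only shows that each label name occurs at least once. It does not check that tags are balanced or correctly nested, which Ariel's own loader may treat differently.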