Giter VIP home page Giter VIP logo

agio's Introduction

Agio

Description

Agio is a library and a tool for converting HTML to Markdown.

About the Name

The name was chosen because agio is “a premium on money in exchange”, sort of the opposite of a markdown. It comes from the Italian aggio (premium), not from the Italian agio (ease), although the hope is that there is an ease in use of this library.

Why Agio?

  1. Agio is well tested.

  2. Agio is pure Ruby.

  3. Agio is MIT licensed.

Agio is Well Tested

With this release, there are 274 unit tests (RSpec examples) covering almost everything. The only things not covered are Agio itself (which mostly performs coordination duties) and Agio::Bourse (the parsed HTML-to-Markdown translator). These tests cover 85%–95% of the code known to be tested, and the modules currently missing test will have those tests completed prior to the release of Agio 1.0.

In addition to the unit tests, more than a hundred manual tests have been run to verify the quality of output for basic HTML. These manual tests have taken one of two forms:

  1. Markdown input converted to HTML with rdiscount, and then converted back to Markdown with Agio. In all cases, the resulting Markdown is either identical to the original or the differences can be attributed to style (how Agio writes emphasized text versus the hand-written original, or how Agio represents links by default). This tests Agio’s round-trip capability.

  2. HTML input converted to Markdown with either Pandoc or html2txt.py, converted back to HTML with rdiscount, and then converted once again to Markdown with Agio. This tests Agio’s ability to output data that is syntactically similar to those of better-known and presumably better-tested tools.

Agio will likely have bugs, especially before version 1.0, and not all features are yet implemented or exposed to the user. Syntactic support is also incomplete, as the goal is to support many of the syntax extensions found in Markdown Extra or other popular modules, such as Github-flavoured Markdown.

Where

Synopsis

Install Agio with:

gem install agio

Run Agio against HTML with:

agio input.html > output.markdown

History

Why I Wrote Agio

Agio is the result of some yak-shaving as I was looking to convert my blog content from WordPress to Jekyll. The Jekyll wiki points to Thomas Frőssman’s Exitwp Python script as a reliable conversion mechanism, but I found that it couldn’t handle the data in my WordPress export file. So, I ported Exitwp from Python to Ruby as Poole.

Like Exitwp, Poole depends on Pandoc. While it’s an amazing tool, it took the better part of 45 minutes to compile the Haskell Platform with Homebrew and Pandoc with Cabal. Looking around the Ruby community, there wasn’t a single Ruby-based HTML-to-Markdown converter that I felt I could trust to get everything right that was also licensed to my liking (I prefer the MIT license). While Kramdown is impressive, it’s GPL-licensed. I didn’t want Poole (which is MIT-licensed) to depend on anything that provided any less freedom for any purpose.

Armed with this plan, I started the process of deeply understanding how Aaron Swartz’s html2txt.py script works. This included an early version of Agio that was a more-or-less straightforward port, but produced output that was worse because of differences between Python’s textwrap module and Ruby’s Text::Format that could not be cleanly resolved by tweaking Aaron’s algorithm.

:include: License.rdoc

agio's People

Stargazers

Angus H. avatar Michel avatar Turadg Aleahmad avatar Austin Ziegler avatar

Watchers

Austin Ziegler avatar Turadg Aleahmad avatar

Forkers

akaibo gotomypc

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.