html2text_ruby's Introduction

html2text

html2text is a very simple gem that uses DOM methods to convert HTML into a format similar to what would be rendered by a browser - perfect for places where you need a quick text representation. For example:

<html>
<title>Ignored Title</title>
<body>
  <h1>Hello, World!</h1>

  <p>This is some e-mail content.
  Even though it has whitespace and newlines, the e-mail converter
  will handle it correctly.

  <p>Even mismatched tags.</p>

  <div>A div</div>
  <div>Another div</div>
  <div>A div<div>within a div</div></div>

  <a href="https://foo.com">A link</a>

</body>
</html>

Will be converted into:

Hello, World!

This is some e-mail content. Even though it has whitespace and newlines, the e-mail converter will handle it correctly.

Even mismatched tags.

A div
Another div
A div
within a div

[A link](https://foo.com)

See the original blog post or the related StackOverflow answer.

Installing

Add the gem to your Gemfile and run bundle install:

gem 'html2text'

Then you can:

require 'html2text'

text = Html2Text.convert(html)

Tests

See all of the test cases defined in spec/examples/. These can be run with bundle exec rake.

License

html2text is licensed under MIT.

Other versions

  1. html2text, the original PHP implementation.
  2. actionmailer-html2text, automatically generate text parts for HTML emails sent with ActionMailer.

html2text_ruby's People

Contributors

aried3r, bobjflong, dependabot[bot], mscrivo, soundasleep, splattael

html2text_ruby's Issues

Looking for maintainers

Hi everyone! I'm no longer a full-time Ruby dev and I've run out of capacity to maintain this project, so I'm looking for some maintainers going forward. Alternatively I can archive the project as read-only.

Ideal criteria:

  • You have at least one project on GitHub
  • You have experience releasing components to rubygems

Other than that I'm happy for maintainers to take this project into whatever direction it needs to go! :)

For the future of this project, I'd suggest the most critical tasks are:

  • Move CI from Travis CI to GitHub Actions
  • Set up CI to automatically publish to RubyGems
  • Update Gemfile/.lock as necessary, e.g. #16

Html2Text.convert() blows out on non-string

How to reproduce?

[5] pry(main)> Html2Text.convert('<b>ab</b>cd')
=> "abcd"
[6] pry(main)> Html2Text.convert(nil)
NoMethodError: undefined method `gsub' for nil:NilClass
from /Users/vpithart/.rbenv/versions/2.3.1/lib/ruby/gems/2.3.0/gems/html2text-0.2.0/lib/html2text.rb:22:in `replace_entities'
[9] pry(main)> Html2Text.convert(0)
NoMethodError: undefined method `gsub' for 0:Fixnum
from /Users/vpithart/.rbenv/versions/2.3.1/lib/ruby/gems/2.3.0/gems/html2text-0.2.0/lib/html2text.rb:22:in `replace_entities'

Desired behaviour

convert() should internally call .to_s on whatever input it receives.
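
Until something like that lands in the gem, a caller-side wrapper is one possible workaround. The sketch below is only an illustration: convert_any and STAND_IN_CONVERTER are hypothetical names, and the stand-in lambda is a crude tag stripper used so the example runs without the gem installed. In a real application you would presumably pass Html2Text.method(:convert) as the converter instead.

```ruby
# Hypothetical stand-in for Html2Text.convert, so this sketch runs without the gem.
# Like the method in the report above, it calls gsub on its argument,
# so it also raises NoMethodError when handed nil or an Integer directly.
STAND_IN_CONVERTER = ->(html) { html.gsub(/<[^>]+>/, "") }

# Hypothetical wrapper: coerce any input to a String before converting.
# nil becomes "", numbers become their decimal representation.
def convert_any(input, converter: STAND_IN_CONVERTER)
  converter.call(input.to_s)
end

puts convert_any("<b>ab</b>cd").inspect
puts convert_any(nil).inspect
puts convert_any(0).inspect
```

With the real gem, the call would look like convert_any(value, converter: Html2Text.method(:convert)), assuming the gem is loaded.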

Options for URL matching and output

The code considers the href and text to be the same for both these cases:

(A) <a href="https://a.b.c">a.b.c</a>
(B) <a href="http://a.b.c">a.b.c</a>

Both are rendered as 'a.b.c'

However it does not consider the following to be the same:

(C) <a href="http://a.b.c/">a.b.c</a>

So the output is rendered as '[a.b.c](http://a.b.c/)'

I think the href in (A) should not be considered equal, whereas I expect (C) to render as 'a.b.c'.

For some circumstances, it might also be useful to keep the '[text](url)' format regardless of equality.

Maybe it would be useful to do the URL matching in a separate function that could optionally be provided by callers?

Additionally perhaps allow the output format to be varied, e.g. as
'text <url>' which seems to be common in text portions of mime mails.
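
A rough sketch of what such a pluggable matcher and output format could look like follows. Nothing here is part of html2text: LOOSE_MATCH, NEVER_MATCH, render_link and the :angle format name are all hypothetical, and the loose matcher simply ignores the scheme, a trailing slash and letter case, which is one of several reasonable policies.

```ruby
# Hypothetical matcher: treats href and text as the same target when they
# differ only by scheme, a trailing slash, or letter case.
LOOSE_MATCH = lambda do |href, text|
  strip = ->(s) { s.strip.sub(%r{\A[a-z][a-z0-9+.-]*://}i, "").chomp("/").downcase }
  strip.call(href) == strip.call(text)
end

# Hypothetical matcher: never treats them as equal, so the URL is always shown.
NEVER_MATCH = ->(_href, _text) { false }

# Hypothetical renderer: the caller supplies the matcher and picks a format.
#   :markdown -> "[text](url)"
#   :angle    -> "text <url>"
def render_link(href, text, matcher: LOOSE_MATCH, format: :markdown)
  return text if matcher.call(href, text)

  case format
  when :angle then "#{text} <#{href}>"
  else "[#{text}](#{href})"
  end
end

puts render_link("http://a.b.c/", "a.b.c")
puts render_link("http://a.b.c/", "a.b.c", matcher: NEVER_MATCH)
puts render_link("https://foo.com", "A link", format: :angle)
```

Passing the matcher as a callable keeps the equality policy out of the converter itself, so a stricter rule (for example, one that treats http and https as different) can be swapped in without changing the rendering code.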
