nono / html-truncator

Want to truncate an HTML string properly? This Ruby gem is for you.

Home Page: http://rubygems.org/gems/html_truncator

License: MIT License

Ruby 100.00%

html-truncator's Issues

Messing up Nokogiri

Am I wrong in assuming that you are modifying the way that Nokogiri's built-in classes work for all code that is calling into Nokogiri? Thus making this unusable in large projects that already depend on Nokogiri?

Truncations after tag boundary

For html like the following:

<p>
  five words in this paragraph
</p>
<p>
  some more text which will be truncated
</p>

Given the above html is stored in the html var:

HTML_Truncator.truncate(html, 5)

returns html as follows:

<p>
  five words in this paragraph
</p>
<p>
  ...
</p>

Ideally I would expect the ... to be appended to the first <p> block, with the second block removed, like so:

<p>
  five words in this paragraph...
</p>

This looks like it could be a little bit of a pain to implement, sorry :3

HTML encoded strings are decoded

Just stumbled upon this:

HTML_Truncator.truncate('12345678901', 10, length_in_chars: true)
=> "1234567890…" # good

HTML_Truncator.truncate('<br>12345678901', 10, length_in_chars: true)
=> "<br>1234567890…" # good

HTML_Truncator.truncate('<br>&lt;br&gt;12345678901', 10, length_in_chars: true)
=> "<br><br>123456…" # bad, second '<br>' is decoded!

HTML_Truncator.truncate('&lt;br&gt;', 10, length_in_chars: true)
=> "&lt;br&gt;" # inconsistent: if length is shorter, it is not decoded as opposed to example before

I think the method should never decode strings. Each encoded character could count as length 1 (&lt; counts as 1, etc.), so:

HTML_Truncator.truncate('<br>&lt;br&gt;12345678901', 10, length_in_chars: true)
=> "<br>&lt;br&gt;123456…"

This would make most sense I think.
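
A minimal sketch of the counting rule proposed above, assuming the text part has already been separated from the real tags. The helper name truncate_escaped is hypothetical and not part of html_truncator; it only illustrates treating each entity as one character without decoding it.

```ruby
# Hypothetical helper (not part of html_truncator): truncate escaped text to
# `limit` visible characters, counting each HTML entity as one character and
# never decoding it.
def truncate_escaped(text, limit, ellipsis: "…")
  # Each token is either a whole entity (&lt; &#60; &#x3c;) or a single character.
  tokens = text.scan(/&(?:[a-zA-Z][a-zA-Z0-9]*|#\d+|#x\h+);|./m)
  return text.dup if tokens.length <= limit

  tokens.first(limit).join + ellipsis
end

short_result = truncate_escaped("&lt;br&gt;", 10)
long_result  = truncate_escaped("&lt;br&gt;12345678901", 10)
```

With this rule, the short input stays untouched and the long one keeps its entities, which matches the behaviour requested for the text following the real <br> tag.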

Comma after text

I truncated some text that contains a comma and received this result: 'some long text,...'
So I would like the comma right before the ellipsis to be removed too.
Can you add this feature?
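
Until something like this exists in the gem, one possible workaround is to post-process the result. The sketch below is an assumption, not gem behaviour: drop_punctuation_before_ellipsis is a hypothetical helper that removes commas, semicolons and colons sitting directly before the ellipsis.

```ruby
# Hypothetical post-processing helper (not part of html_truncator): remove
# trailing punctuation that sits directly before the ellipsis.
def drop_punctuation_before_ellipsis(text, ellipsis = "…")
  # The lookahead keeps the ellipsis itself and anything after it (e.g. </p>).
  text.sub(/[,;:]+\s*(?=#{Regexp.escape(ellipsis)})/, "")
end

cleaned = drop_punctuation_before_ellipsis("some long text,...", "...")
```

Commas elsewhere in the text are left alone, because only punctuation immediately followed by the ellipsis matches.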

Pattern in words scan

To find and count the words in a text node, you use a pattern like this:

HTML-Truncator/lib/html_truncator.rb

Line 78 in 45382ea

words = content.scan(/\s*\S+/)

but it doesn't cover special characters such as NO-BREAK SPACE.
I think the pattern /[[:space:]]*[[[:punct:]][[:word:]]]+/ is better because it also covers non-ASCII characters in UTF-8.
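
The difference between the two patterns can be sketched on a string that contains a NO-BREAK SPACE (U+00A0). The sample text is made up for illustration; in Ruby, \s only matches ASCII whitespace, while the POSIX bracket [[:space:]] is Unicode-aware.

```ruby
# Sample text (made up): "Lorem" and "ipsum" are separated by U+00A0.
text = "Lorem\u00A0ipsum dolor"

# Current pattern: \s is ASCII-only, so the NO-BREAK SPACE is treated as
# part of a word and "Lorem\u00A0ipsum" is counted as a single word.
ascii_words = text.scan(/\s*\S+/)

# Proposed pattern: [[:space:]] also matches U+00A0, so the word boundary
# is detected.
unicode_words = text.scan(/[[:space:]]*[[[:punct:]][[:word:]]]+/)

puts ascii_words.length
puts unicode_words.length
```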

Strips script and style

HTML_Truncator.truncate("<style>Lorem ipsum dolor sit amet.</style>", 3)
HTML_Truncator.truncate("<script>Lorem ipsum dolor sit amet.</script>", 3)

results in

<style></style>…

Some tags should not be touched at all.

Not work with russian language

HTML_Truncator.truncate("<p>Русский текст.</p>", 5, length_in_chars: true) 
# => "<p>…</p>"

For speed, allow Nokogiri nodes to be passed in

I am already using Nokogiri for processing HTML, and it would be more efficient if I could just pass in a node to your sanitizer, instead of outputting to HTML, then having your library re-convert it to a node.

Removing img tag

When I truncate my text, it just removes the img tag:

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Sed sed rhoncus mauris. Pellentesque tempus, sapien sit amet volutpat tristique, felis lectus rhoncus sem, ut laoreet velit nisi ac turpis.

Strips iframe tags on ruby 2.1

@nono

2.0.0-p247 :061 > text = "<iframe width=640 height=360 src=//www.youtube.com/embed/WLIfmnlSkQ4?feature=player_detailpage frameborder=0 allowfullscreen></iframe>"
 => "<iframe width=640 height=360 src=//www.youtube.com/embed/WLIfmnlSkQ4?feature=player_detailpage frameborder=0 allowfullscreen></iframe>"
2.0.0-p247 :062 > HTML_Truncator.truncate(text, 2)
 => "<iframe width=\"640\" height=\"360\" src=\"//www.youtube.com/embed/WLIfmnlSkQ4?feature=player_detailpage\" frameborder=\"0\" allowfullscreen></iframe>"


2.1.0 :001 > text = "<iframe width=640 height=360 src=//www.youtube.com/embed/WLIfmnlSkQ4?feature=player_detailpage frameborder=0 allowfullscreen></iframe>"
 => "<iframe width=640 height=360 src=//www.youtube.com/embed/WLIfmnlSkQ4?feature=player_detailpage frameborder=0 allowfullscreen></iframe>"
2.1.0 :002 > HTML_Truncator.truncate(text, 2)
 => ""
2.1.0 :003 >

The ellipsis should not be put in code tags

See https://github.com/nono/linuxfr.org/issues#issue/105

number of words vs number of chars

Any interest in making the API similar to the Rails truncate helper? You tell it the max number of chars instead of the max number of words. You can also tell it the separator to use, for instance ' ', to make sure it truncates on a word boundary: it will truncate at the last separator before the limit.

My concern with truncating on the number of words is that if the input has a really long 'word', hundreds of chars without any spaces, nothing will get truncated.
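
A rough sketch of the character-limit-plus-separator idea on plain text (no HTML handling), written in plain Ruby rather than calling Rails. The method name truncate_chars and its keyword arguments are assumptions for illustration, not an existing API of this gem.

```ruby
# Hypothetical helper (not part of html_truncator): limit by characters, and
# optionally cut at the last separator found before the limit.
def truncate_chars(text, limit, separator: nil, omission: "…")
  return text.dup if text.length <= limit

  # Leave room for the omission so the result never exceeds `limit`.
  room = [limit - omission.length, 0].max
  stop = separator ? (text.rindex(separator, room) || room) : room
  text[0, stop] + omission
end

on_word_boundary = truncate_chars("Lorem ipsum dolor sit amet", 15, separator: " ")
long_word        = truncate_chars("a" * 300, 10, separator: " ")
```

When no separator is found before the limit, the sketch falls back to a hard cut, so a very long 'word' still gets truncated.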

Is there any way to ignore specified tags?

Suppose we have a blog with images and text. We want to show a truncated version of each post as a preview in the blog list, but without the images.

It seems the gem doesn't support this?

Thanks for any help.

Current implementation creates a Singleton class and prevents marshaling

This line:
eval "class <<str; def html_truncated?; #{opts[:was_truncated]} end end"

causes a singleton class to be returned which prevents the field from being serialized using Marshal (and therefore prevents it from being cached using dalli/memcached).
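
The problem can be reproduced without the gem, and one possible alternative is to carry the flag on a String subclass instead of a singleton method. The class name TruncatedString below is an assumption for illustration, not something the gem provides.

```ruby
# Reproduction: a string with a singleton method cannot be marshaled.
plain = String.new("Lorem ipsum…")
def plain.html_truncated?; true; end

singleton_dump_failed =
  begin
    Marshal.dump(plain)
    false
  rescue TypeError
    true
  end

# Possible alternative (hypothetical class, not part of html_truncator):
# keep the flag in an instance variable of a named String subclass.
class TruncatedString < String
  def initialize(value, was_truncated)
    super(value)
    @was_truncated = was_truncated
  end

  def html_truncated?
    @was_truncated
  end
end

result   = TruncatedString.new("Lorem ipsum…", true)
restored = Marshal.load(Marshal.dump(result))
```

Because the class is named and the flag is an ordinary instance variable, the round trip through Marshal keeps both the text and the flag. The class has to be loaded wherever the cached value is read back.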

Characters instead of Words

First of all thanks for this awesome gem, I'm really happy with it! It's just what I needed. But I found a case where I need a fixed character length. Would it be possible for you to consider adding an option for character length as an extra option?
