django-softhyphen's Introduction

         ___ _       _           _
 ___ ___|  _| |_ ___| |_ _ _ ___| |_ ___ ___
|_ -| . |  _|  _|___|   | | | . |   | -_|   |
|___|___|_| |_|     |_|_|_  |  _|_|_|___|_|_|
                        |___|_|              

A Python library for hyphenating HTML in your Django project

Repurposed from Filipe Fortes' excellent AppEngine app.

Features

  • Use the &shy; HTML entity to hyphenate text. Works well with text-align: justify;
  • Can be called as a function from inside Python code or as a filter in the Django template
  • Supports more than 25 languages
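To make the first point concrete, here is a minimal sketch of what inserting soft hyphens amounts to once the break points are known. The break positions are hard-coded for illustration. django-softhyphen works them out itself from per-language hyphenation rules, and `insert_soft_hyphens` is a hypothetical helper, not part of its API.

```python
def insert_soft_hyphens(word, breaks, marker="&shy;"):
    """Insert `marker` at each index in `breaks` (positions between letters).

    Hypothetical helper: the break positions must be supplied by the caller.
    """
    pieces = []
    start = 0
    for pos in sorted(breaks):
        pieces.append(word[start:pos])
        start = pos
    pieces.append(word[start:])
    return marker.join(pieces)


# "hy|phen|a|tion": breaks after the 2nd, 6th and 7th letters.
print(insert_soft_hyphens("hyphenation", [2, 6, 7]))  # hy&shy;phen&shy;a&shy;tion
```

The browser shows none of these markers unless a line actually breaks at one of them. That is why the result pairs well with justified text.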

Getting started

Install it.

$ pip install django-softhyphen

Add it to the INSTALLED_APPS in your settings.py

INSTALLED_APPS = (
    ...
    'softhyphen',
    ...
)

Use it as a function.

>>> from softhyphen.html import hyphenate
>>> hyphenate("<h1>I love hyphenation</h1>")
"<h1>I love hy&shy;phen&shy;a&shy;tion</h1>"
>>> # It is English by default, but you can provide another language.
>>> hyphenate("<h1>Me encanta guiones</h1>", language="es-es")
"<h1>Me en&shy;can&shy;ta gu&shy;io&shy;nes</h1>"

Or use it as a template filter.

{% load softhyphen_tags %}
{{ text|softhyphen }}
{# You can specify another language as an argument. English is the default #}
{{ text|softhyphen:"es-es" }}

(Warning! Because of its overhead, the filter is not recommended in production if it needs to run each time the page loads.)
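A common way to keep that overhead off the request path is to hyphenate once and reuse the result. The sketch below memoizes any callable shaped like `hyphenate(html, language=...)` with `functools.lru_cache`. `make_cached_hyphenator` and `fake_hyphenate` are hypothetical names. The stand-in lets the example run without django-softhyphen installed.

```python
from functools import lru_cache


def make_cached_hyphenator(hyphenate, maxsize=1024):
    """Wrap hyphenate(html, **kwargs) so identical calls are computed only once."""

    @lru_cache(maxsize=maxsize)
    def cached(html, **kwargs):
        return hyphenate(html, **kwargs)

    return cached


# Stand-in for softhyphen.html.hyphenate that records how often it runs.
calls = []


def fake_hyphenate(html, language="en"):
    calls.append((html, language))
    return html.replace("hyphenation", "hy&shy;phen&shy;a&shy;tion")


hyphenate_cached = make_cached_hyphenator(fake_hyphenate)
hyphenate_cached("<h1>I love hyphenation</h1>")
hyphenate_cached("<h1>I love hyphenation</h1>")  # served from the cache
print(len(calls))  # 1
```

In a real project you could pass `softhyphen.html.hyphenate` instead of the stand-in. An in-process cache is per worker and holds one entry per distinct input. On larger sites it may be better to store the hyphenated text next to the original, for example when the model is saved.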

django-softhyphen's People

Contributors

dnx, epicserve, jrief, palewire, streeter, v-alexeev, vdboor

django-softhyphen's Issues

Library mutates DOM structure of HTML

When I use django-softhyphen to insert hyphens into a Python string containing HTML, it returns the string with hyphens, but the DOM structure of the HTML has changed.

Here is the example:

from softhyphen.html import hyphenate

input_html = '''<p> </p> 

<div class="flexslider small-indent">
<div class="popup-gallery slides">
<figure><a href="/media/tmp/a85bb122-3022-11e3-921f-002710a783d4.jpg"><img src="/media/tmp/a85bb122-3022-11e3-921f-002710a783d4.jpg" /></a><figcaption class="flex-caption">
<p>Test text</p>
</figcaption></figure>

<figure><a href="/media/tmp/aa147a44-3022-11e3-9d92-002710a783d4.jpg"><img src="/media/tmp/aa147a44-3022-11e3-9d92-002710a783d4.jpg" /></a><figcaption class="flex-caption">
<p>Another test text</p>
</figcaption></figure>
</div>
</div>
'''
print(hyphenate(input_html))

Output:

<p> </p>
<div class="flexslider small-indent">
<div class="popup-gallery slides">
<figure><a href="/media/tmp/a85bb122-3022-11e3-921f-002710a783d4.jpg"><img src="/media/tmp/a85bb122-3022-11e3-921f-002710a783d4.jpg" /></a><figcaption class="flex-caption">
</figcaption></figure><p>Test text</p>

<figure><a href="/media/tmp/aa147a44-3022-11e3-9d92-002710a783d4.jpg"><img src="/media/tmp/aa147a44-3022-11e3-9d92-002710a783d4.jpg" /></a><figcaption class="flex-caption">
</figcaption></figure><p>An&shy;oth&shy;er test text</p>

</div>
</div>

The <p> element was moved out of the <figcaption> element. How can I prevent this?
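The report does not say which parser produced this output. One way to sidestep tree "repairs" entirely is to avoid building a tree at all: stream the markup, copy every tag through verbatim, and transform only the text between tags. The sketch below does this with the standard library's `html.parser.HTMLParser`. It is an alternative technique, not how django-softhyphen works. `TextOnlyRewriter` and `rewrite_text_nodes` are hypothetical, and `transform` stands for whatever per-text hyphenator you plug in.

```python
from html.parser import HTMLParser


class TextOnlyRewriter(HTMLParser):
    """Re-emit markup as written and pass only text nodes through `transform`."""

    RAW_TEXT = {"script", "style"}  # never hyphenate code or CSS

    def __init__(self, transform):
        super().__init__(convert_charrefs=False)  # keep entities as written
        self.transform = transform
        self.out = []
        self._raw = False

    def handle_starttag(self, tag, attrs):
        self.out.append(self.get_starttag_text())  # original tag text, attributes intact
        self._raw = tag in self.RAW_TEXT

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())  # e.g. <img ... />

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")  # note: tag name is lower-cased by the parser
        self._raw = False

    def handle_data(self, data):
        self.out.append(data if self._raw else self.transform(data))

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")

    def handle_pi(self, data):
        self.out.append(f"<?{data}>")

    def unknown_decl(self, data):
        self.out.append(f"<![{data}]>")


def rewrite_text_nodes(html, transform):
    parser = TextOnlyRewriter(transform)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


snippet = '<figcaption class="flex-caption">\n<p>Another test text</p>\n</figcaption>'
print(rewrite_text_nodes(snippet, lambda text: text.replace("Another", "An&shy;oth&shy;er")))
```

Because nothing is re-nested, the <p> stays inside <figcaption>. The trade-off is that the rewriter never fixes genuinely broken markup.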

Warn about risks of hyphenation?

Hello, and thanks for this module,

I activated it on my django-cms blog, but later noticed problems when copying and pasting text into other editors, or when copying a partially obfuscated e-mail address into my mail client. The soft hyphens either turn into visible dashes or stay invisible but corrupt later processing (e.g. sending the mail fails).

Maybe the README should mention these edge cases for people who don't know the side effects of this hyphenation processing?
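Until such a warning exists, one mitigation is to remove soft hyphens from any text that is reused outside the rendered page, such as an e-mail address or a plain-text mail body. This is a minimal sketch. `strip_soft_hyphens` is a hypothetical helper, and it assumes the soft hyphens appear either as the literal U+00AD character or as one of its HTML spellings.

```python
import re

# The soft hyphen as a literal character (U+00AD) or as an HTML entity/reference.
SOFT_HYPHEN_RE = re.compile(r"\u00ad|&shy;|&#173;|&#x0*ad;", re.IGNORECASE)


def strip_soft_hyphens(text):
    """Remove soft hyphens before text is copied, e-mailed, or otherwise reused."""
    return SOFT_HYPHEN_RE.sub("", text)


print(strip_soft_hyphens("jo&shy;hn.do\u00ade@exam&#173;ple.com"))  # john.doe@example.com
```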

Prevent hyphenation of short words in Russian

I've seen this example in the source code:

Short words are not hyphenated

>>> hyphenate("<p>The brave men, living and dead.</p>")
u'<p>The brave men, liv&shy;ing and dead.</p>'

This doesn't hold for Russian, where five-letter words get hyphenated. How can I control this behavior?
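The report does not say whether django-softhyphen exposes a minimum word length. One option is to enforce it yourself by hyphenating word by word and skipping short words. This is a minimal sketch for plain text. `hyphenate_long_words`, `min_length` and the stand-in per-word hyphenator are hypothetical. In HTML you would apply it to text nodes only.

```python
import re

WORD_RE = re.compile(r"\w+")  # Unicode-aware, so it matches Cyrillic words too


def hyphenate_long_words(text, hyphenate_word, min_length=6):
    """Hyphenate only words that have at least `min_length` characters."""

    def replace(match):
        word = match.group(0)
        return hyphenate_word(word) if len(word) >= min_length else word

    return WORD_RE.sub(replace, text)


# Stand-in per-word hyphenator: one soft hyphen after the second letter.
def fake_hyphenate_word(word):
    return word[:2] + "&shy;" + word[2:]


print(hyphenate_long_words("книга и библиотека", fake_hyphenate_word))
# книга и би&shy;блиотека
```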

HTML escaping problems when used as a template filter

I’ve tried django-softhyphen as a template filter with Django 1.9.1, Python 3.4

When I leave autoescape on (the default), all the &shy; entities get escaped, so they are displayed literally as &shy; on the web page. I therefore have to turn off autoescape for the fields where I want hyphenation. That might be a security problem, and it breaks text fields containing & or <, which are then interpreted as HTML syntax. One company name contained an & with no space after it, and it was displayed as a strange special character. Putting a space after the & avoids this, but it’s still invalid HTML.
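The conflict comes from returning the &shy; entity as markup. A possible workaround is sketched here as an assumption, not as django-softhyphen's behaviour: insert the literal U+00AD character into plain text instead. HTML escaping (Python's `html.escape`, and Django's autoescaping) only rewrites &, <, > and quotes. The soft hyphens therefore survive with autoescape left on, and an & in a company name is still escaped correctly. `add_soft_hyphens` and its syllable map are hypothetical stand-ins for a real hyphenator.

```python
from html import escape

SOFT_HYPHEN = "\u00ad"  # the character that the &shy; entity stands for


def add_soft_hyphens(text, syllables_by_word):
    """Stand-in hyphenator: join each known word's syllables with U+00AD."""
    for word, syllables in syllables_by_word.items():
        text = text.replace(word, SOFT_HYPHEN.join(syllables))
    return text


plain = add_soft_hyphens(
    "Smith&Co loves hyphenation",
    {"hyphenation": ["hy", "phen", "a", "tion"]},
)
# Escaping rewrites &, <, >, and quotes but leaves U+00AD alone.
safe = escape(plain)
print(safe.count(SOFT_HYPHEN))  # 3
```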

Don't put fixed version numbers in setup.py

Upgrading django-softhyphen will downgrade six at the moment. Please use six>=1.5.1 in setup.py instead of putting it on a fixed version. The same also applies to the beautifulsoup4 dependency.

Example situation:

$ pip install -U django-softhyphen

Downloading/unpacking django-softhyphen from https://pypi.python.org/packages/source/d/django-softhyphen/django-softhyphen-1.0.0.tar.gz#md5=6dc76efb26cb3ed95b6f74cd18b4e0bd
  Downloading django-softhyphen-1.0.0.tar.gz (1.0MB): 1.0MB downloaded
  Storing download in cache at /Users/diederik/Library/Caches/pip-downloads/https%3A%2F%2Fpypi.python.org%2Fpackages%2Fsource%2Fd%2Fdjango-softhyphen%2Fdjango-softhyphen-1.0.0.tar.gz
  Running setup.py (path:/Users/diederik/Sites/virtualenvs/wakawaka/build/django-softhyphen/setup.py) egg_info for package django-softhyphen

    warning: no files found matching 'README.textile'
Downloading/unpacking beautifulsoup4==4.3.2 (from django-softhyphen)
  Downloading beautifulsoup4-4.3.2.tar.gz (143kB): 143kB downloaded
  Storing download in cache at /Users/diederik/Library/Caches/pip-downloads/https%3A%2F%2Fpypi.python.org%2Fpackages%2Fsource%2Fb%2Fbeautifulsoup4%2Fbeautifulsoup4-4.3.2.tar.gz
  Running setup.py (path:/Users/diederik/Sites/virtualenvs/wakawaka/build/beautifulsoup4/setup.py) egg_info for package beautifulsoup4

Downloading/unpacking six==1.5.1 (from django-softhyphen)
  Downloading six-1.5.1-py2.py3-none-any.whl
  Storing download in cache at /Users/diederik/Library/Caches/pip-downloads/https%3A%2F%2Fpypi.python.org%2Fpackages%2F3.3%2Fs%2Fsix%2Fsix-1.5.1-py2.py3-none-any.whl
Installing collected packages: django-softhyphen, beautifulsoup4, six
  Found existing installation: django-softhyphen 0.15
    Uninstalling django-softhyphen:
      Successfully uninstalled django-softhyphen
  Running setup.py install for django-softhyphen

    warning: no files found matching 'README.textile'
  Running setup.py install for beautifulsoup4

  Found existing installation: six 1.7.3
    Uninstalling six:
      Successfully uninstalled six
Successfully installed django-softhyphen beautifulsoup4 six
Cleaning up...
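The requested change could look like this in setup.py, using lower bounds instead of exact pins. The bounds are the versions shown in the log above. Whether they are the true minimum supported versions is an assumption.

```python
# setup.py (fragment). Lower bounds instead of exact pins, so installing
# django-softhyphen does not downgrade packages the user already has.
INSTALL_REQUIRES = [
    "six>=1.5.1",
    "beautifulsoup4>=4.3.2",
]

# setup(
#     name="django-softhyphen",
#     install_requires=INSTALL_REQUIRES,
#     # other arguments unchanged
# )
```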

Fix typo in README.textile

README.textile, "Use it in as a function" block.

Replace:

from soft_hyphen.html import hyphenate_html

with:

from softhyphen.html import hyphenate_html

Stripping whitespace should not be default

Why does django-softhyphen always strip whitespace? In my case, the paragraph I'm hyphenating has an <em> tag around its first word. The space that follows that word, whether it sits at the end of the text inside the <em> tag or right after the </em> tag, gets stripped out.

This is an example of what the text looks like with the space inside the <em> tag:

<p><em>Test. </em>This is a test paragraph.</p>

This is an example of what the text looks like with the space after the <em> tag:

<p><em>Test.</em> This is a test paragraph.</p>

The result after hyphenation in either case is:

<p><em>Test.</em>This is a test paragraph.</p>

I can fix this locally by changing STRIP_WHITESPACE.sub(...) to re.sub(...), but it would be nice to be able to choose whether whitespace is stripped. Is there a reason it is always stripped? Could stripping stay the default, with an option to override it?
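The issue does not quote STRIP_WHITESPACE itself, so this is only a sketch of the requested behaviour. It collapses runs of whitespace inside each text node and keeps trimming as the default, but lets the caller turn trimming off so the space next to </em> survives. `normalize_whitespace` and its `strip` flag are hypothetical.

```python
import re

WHITESPACE_RUN = re.compile(r"\s+")


def normalize_whitespace(text, strip=True):
    """Collapse whitespace runs to one space; trim the ends unless strip=False."""
    collapsed = WHITESPACE_RUN.sub(" ", text)
    return collapsed.strip() if strip else collapsed


# Text node from <p><em>Test. </em>This is a test paragraph.</p>
print(repr(normalize_whitespace("Test. ")))               # 'Test.'
print(repr(normalize_whitespace("Test. ", strip=False)))  # 'Test. '
```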

Hyphenator should specify a parser for BeautifulSoup

django-softhyphen works perfectly if html5lib is not installed. However, with html5lib in your Python search path, the given example code

>>> from softhyphen.html import hyphenate
>>> hyphenate("<h1>I love hyphenation</h1>")
u'<html><body><h1>I love hy&shy;phen&shy;a&shy;tion</h1></body></html>'

returns the result wrapped in <html><body>..., which is not what we want.

This can be overridden by monkey-patching BeautifulSoup.DEFAULT_BUILDER_FEATURES = ['html.parser'], but that might cause other unwanted side effects. A better approach would be to add a configuration setting to django-softhyphen and use it in

html.py (line 54)

soup = BeautifulSoup(html, features=BEAUTIFULSOUP_BUILDER_FEATURES)

where BEAUTIFULSOUP_BUILDER_FEATURES defaults to ['html.parser'].

If you accept this feature request, I'll send a PR.
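Here is a sketch of the proposed setting, using the name and default from the issue. Whether django-softhyphen adopted it is not stated, so treat `BEAUTIFULSOUP_BUILDER_FEATURES` and the `builder_features` helper as hypothetical. A `SimpleNamespace` stands in for `django.conf.settings` so the example runs without Django.

```python
from types import SimpleNamespace

# Default proposed in the issue: always use Python's built-in parser.
DEFAULT_BUILDER_FEATURES = ["html.parser"]


def builder_features(settings):
    """Return the parser features to pass as BeautifulSoup(html, features=...)."""
    return list(getattr(settings, "BEAUTIFULSOUP_BUILDER_FEATURES", DEFAULT_BUILDER_FEATURES))


# In html.py the soup would then be built as, e.g.:
#     from django.conf import settings
#     soup = BeautifulSoup(html, features=builder_features(settings))

print(builder_features(SimpleNamespace()))  # ['html.parser']
print(builder_features(SimpleNamespace(BEAUTIFULSOUP_BUILDER_FEATURES=["lxml"])))  # ['lxml']
```

Choosing the parser per call this way avoids html5lib's document wrapping without changing BeautifulSoup's defaults for every other library in the project.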
