jolitypo's Introduction

JoliTypo – Web Microtypography fixer

Finally a tool for typography nerds.

JoliTypo is a tool fixing Microtypography glitches inside your HTML content.

use JoliTypo\Fixer;

$fixer = new Fixer(['Ellipsis', 'Dash', 'SmartQuotes', 'CurlyQuote', 'Hyphen']);
$fixedContent = $fixer->fix('<p>"Tell me Mr. Anderson... what good is a phone call... if you\'re unable to speak?" -- Agent Smith, <em>Matrix</em>.</p>');
<p>&ldquo;Tell me Mr. Ander&shy;son&hellip; what good is a phone call&hellip; if you&rsquo;re unable to speak?&rdquo;&mdash;Agent Smith, <em>Matrix</em>.</p>

“Tell me Mr. Anderson… what good is a phone call… if you’re unable to speak?”—Agent Smith, Matrix.

It's designed to be:

  • language agnostic (you can fix fr_FR, fr_CA, en_US... You tell JoliTypo what to fix);
  • easy to integrate into modern PHP projects (composer and autoload);
  • robust (makes use of \DOMDocument instead of parsing HTML with naive regular expressions);
  • smart enough to avoid Javascript, Code, CSS processing... (configurable protected tags list);
  • fully tested;
  • fully open and usable in any project (MIT License).

You can try it with the online demo!

Quick usage

Just tell the Fixer class which Fixers you want to run on your content, then call fix():

use JoliTypo\Fixer;

$fixer = new Fixer(["SmartQuotes", "FrenchNoBreakSpace"]);
$fixer->setLocale('fr_FR');
$fixedContent = $fixer->fix('<p>Je suis "très content" de t\'avoir invité sur <a href="http://jolicode.com/">Jolicode.com</a> !</p>');

For convenience, ready-to-use lists of Fixers per language are given in the "Fixer recommendations by locale" section. Micro-typography is nothing like a standard or a law; what really matters is consistency, so feel free to use your own lists.

Please be advised that JoliTypo works best on HTML content; it will also work on plain text, but will be less smart about smart quotes. When fixing a complete HTML document, potential <head>, <html> and <body> tags may be removed.

To fix non HTML content, use the fixString() method:

use JoliTypo\Fixer;

$fixer = new Fixer(["Trademark", "SmartQuotes"]);
$fixedContent = $fixer->fixString('Here is a "protip(c)"!'); // Here is a “protip©”!

CLI usage

You can run a standalone version of JoliTypo by downloading the PHAR version.

Run jolitypo --help to know how to configure the Fixer.

Installation

Requirements are handled by Composer (libxml and mbstring are required).

composer require jolicode/jolitypo

Usage outside composer is also possible, just add the src/ directory to any PSR-0 compatible autoloader.

Integrations

Available Fixers

Dash

Replaces the simple dash - with an en dash (&ndash;) between numbers (date ranges...) and the double -- with an em dash (&mdash;).
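To make the two rules concrete, here is a minimal sketch using only PHP's standard library. It is a simplified approximation, not JoliTypo's actual implementation: the function name `dashSketch` is ours, and the sketch does not try to reproduce how the real fixer treats the spaces around a dash.

```php
<?php
// Simplified sketch of the two Dash rules; NOT JoliTypo's actual code.
function dashSketch(string $text): string
{
    // A single hyphen directly between two digits becomes an en dash (U+2013).
    $text = preg_replace('/(?<=\d)-(?=\d)/', "\u{2013}", $text);

    // A double hyphen becomes an em dash (U+2014).
    return str_replace('--', "\u{2014}", $text);
}

echo dashSketch('1939-1945 -- a long war'), PHP_EOL;
```

Hyphens inside ordinary words (for example `well-known`) are left alone because the lookarounds require a digit on both sides.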

Dimension

Replaces the letter x between numbers (12 x 123) by a times entity (×, the real mathematical symbol).

Ellipsis

Replaces the three dots ... with an ellipsis (…).

SmartQuotes

Converts dumb quotes " " to all kinds of smart style quotation marks (“ ”, « », „ “...). Handles a good variety of locales, like English, Arabic, French, Italian, Spanish, Irish, German...

See the code for more details, and do not forget to specify a locale on the Fixer instance.

This Fixer replaces legacy EnglishQuotes, FrenchQuotes and GermanQuotes.

FrenchNoBreakSpace

Replaces some regular spaces with non-breaking spaces, following the French typographic code. A no-break space is placed before :, and a thin no-break space before ;, ! and ?.

NoSpaceBeforeComma

Removes space before , and makes sure there is only one space after.

Hyphen (automatic hyphenation)

Makes use of org_heigl/hyphenator, a tool enabling word-hyphenation in PHP. This Hyphenator uses the pattern-files from OpenOffice which are based on the pattern-files created for TeX.

There are only some locales available for this fixer: af_ZA, ca, da_DK, de_AT, de_CH, de_DE, en_GB, en_UK, et_EE, fr, hr_HR, hu_HU, it_IT, lt_LT, nb_NO, nn_NO, nl_NL, pl_PL, pt_BR, ro_RO, ru_RU, sk_SK, sl_SI, sr, zu_ZA.

You can read more about this fixer on the official github repository.

This Fixer requires a locale to be set on the Fixer with $fixer->setLocale('fr_FR');. It defaults to en_GB.

Proper hyphenation is mandatory in justified text and you should avoid word breaking in titles with this line of CSS: hyphens:none;.

⚠ Be aware that current screen readers are unable to correctly read words containing &shy; entities. The Hyphen fixer should therefore be used with caution, or you might reduce your website's accessibility.

CurlyQuote (Smart Quote)

Replaces straight quotes ' with curly ones (’). There is one exception to consider: foot and inch marks (minute and second marks). Purists use the prime (′); this fixer uses straight quotes for compatibility. Read more about Curly quotes.

Trademark

Handles the trademark symbol ™, the registered trademark symbol ® and the copyright symbol ©. This fixer replaces commonly used approximations: (r), (c) and (TM). A non-breaking space is put between numbers and copyright symbols too.
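The replacement of the three approximations can be sketched with a single standard-library call. This is an illustration only: `trademarkSketch` is a name invented here, the case-insensitive matching is our choice, and the non-breaking space between numbers and copyright symbols is not covered.

```php
<?php
// Illustrative sketch; NOT JoliTypo's actual Trademark fixer.
function trademarkSketch(string $text): string
{
    // str_ireplace() is case-insensitive, so "(TM)" and "(tm)" are both matched.
    return str_ireplace(
        ['(tm)', '(r)', '(c)'],
        ["\u{2122}", "\u{00AE}", "\u{00A9}"], // ™, ®, ©
        $text
    );
}

echo trademarkSketch('Here is a "protip(c)"!'), PHP_EOL;
```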

Unit (formerly Numeric)

Adds a non-breaking space between a numeral and its unit. Like this: 12_h, 42_฿ or 88_%. It was named Numeric before release 1.0.2, but BC is kept for now.

It is really easy to make your own Fixers, feel free to extend the provided ones if they do not fit your typographic rules.

Fixer recommendations by locale

en_GB

$fixer = new Fixer(['Ellipsis', 'Dimension', 'Unit', 'Dash', 'SmartQuotes', 'NoSpaceBeforeComma', 'CurlyQuote', 'Hyphen', 'Trademark']);
$fixer->setLocale('en_GB');

fr_FR

These rules follow most of the recommendations of "Abrégé du code typographique à l'usage de la presse", ISBN: 9782351130667.

$fixer = new Fixer(['Ellipsis', 'Dimension', 'Unit', 'Dash', 'SmartQuotes', 'FrenchNoBreakSpace', 'NoSpaceBeforeComma', 'CurlyQuote', 'Hyphen', 'Trademark']);
$fixer->setLocale('fr_FR');

fr_CA

Mostly the same as fr_FR, but the space before punctuation points is not mandatory.

$fixer = new Fixer(['Ellipsis', 'Dimension', 'Unit', 'Dash', 'SmartQuotes', 'NoSpaceBeforeComma', 'CurlyQuote', 'Hyphen', 'Trademark']);
$fixer->setLocale('fr_CA');

de_DE

Mostly the same as en_GB, according to Typefacts and Wikipedia.

$fixer = new Fixer(['Ellipsis', 'Dimension', 'Unit', 'Dash', 'SmartQuotes', 'NoSpaceBeforeComma', 'CurlyQuote', 'Hyphen', 'Trademark']);
$fixer->setLocale('de_DE');

More to come (contributions welcome!).

Documentation

Default usage

$fixer        = new Fixer(['Ellipsis', 'Dimension', 'Dash', 'SmartQuotes', 'CurlyQuote', 'Hyphen']);
$fixedContent = $fixer->fix("<p>Some user contributed HTML which does not use proper glyphs.</p>");

$fixer->setRules(['CurlyQuote']);
$fixedContent = $fixer->fix("<p>I'm only replacing single quotes.</p>");

$fixer->setRules(['Hyphen']);
$fixer->setLocale('en_GB'); // I tell which locale to use for Hyphenation and SmartQuotes
$fixedContent = $fixer->fix("<p>Very long words like Antidisestablishmentarianism.</p>");

Define your own Fixer

If you want to add your own Fixer to the list, you have to implement JoliTypo\FixerInterface. Then just give JoliTypo its fully qualified class name, or even an instance:

// by FQN
$fixer        = new Fixer(['Ellipsis', 'Acme\\YourOwn\\TypoFixer']);
$fixedContent = $fixer->fix("<p>Content fixed by the 2 fixers.</p>");

// or instances, or both
$fixer        = new Fixer(['Ellipsis', 'Acme\\YourOwn\\TypoFixer', new Acme\YourOwn\PonyFixer("Some parameter")]);
$fixedContent = $fixer->fix("<p>Content fixed by the 3 fixers.</p>");
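As a rough idea of what such a class can look like, here is a hypothetical fixer. The class name `ArrowFixer` and its rule are invented for illustration. The method shape (text in, text out, with an optional state bag as second argument) mirrors the fix() signature visible in the Numeric fixer excerpt quoted in the issues further down. In a real project the class would declare `implements \JoliTypo\FixerInterface`; that clause is left out here so the sketch runs without the library installed.

```php
<?php
// Hypothetical fixer, for illustration only. In a real project it would
// declare: final class ArrowFixer implements \JoliTypo\FixerInterface
final class ArrowFixer
{
    // Receives a piece of plain text (no HTML) and returns the fixed text.
    // The second argument stands in for the optional StateBag parameter.
    public function fix($content, $stateBag = null)
    {
        // Replace the ASCII arrow "->" with the arrow glyph (U+2192).
        return str_replace('->', "\u{2192}", $content);
    }
}

echo (new ArrowFixer())->fix('input -> output'), PHP_EOL;
```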

Configure the protected tags

Protected tags is a list of HTML tag names that the DOM parser must avoid. Nothing in those tags will be fixed.

$fixer        = new Fixer(['Ellipsis']);
$fixer->setProtectedTags(['pre', 'a']);
$fixedContent = $fixer->fix("<p>Fixed...</p> <pre>Not fixed...</pre> <p>Fixed... <a>Not Fixed...</a>.</p>");
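The idea behind protected tags can be sketched with plain \DOMDocument: walk every text node and skip those that have a protected ancestor. The helper below is hypothetical (its name, its wrapper element and its callable parameter are assumptions of this sketch) and is not how JoliTypo is necessarily implemented.

```php
<?php
// Hypothetical helper, NOT JoliTypo's code: runs $fix on every text node
// that has no ancestor listed in $protectedTags.
function fixOutsideProtectedTags(string $html, array $protectedTags, callable $fix): string
{
    $previous = libxml_use_internal_errors(true); // keep parser warnings quiet
    $doc = new DOMDocument();
    // The leading XML declaration is a common trick to make loadHTML() read UTF-8.
    $doc->loadHTML('<?xml encoding="UTF-8"><div id="sketch-root">' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    foreach ($xpath->query('//text()') as $textNode) {
        for ($parent = $textNode->parentNode; $parent !== null; $parent = $parent->parentNode) {
            if (in_array($parent->nodeName, $protectedTags, true)) {
                continue 2; // inside a protected tag: leave the text untouched
            }
        }
        $textNode->nodeValue = $fix($textNode->nodeValue);
    }

    // Serialize only the children of the wrapper element.
    $out = '';
    foreach ($xpath->query('//div[@id="sketch-root"]')->item(0)->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

$ellipsis = fn (string $text): string => str_replace('...', "\u{2026}", $text);
echo fixOutsideProtectedTags('<p>Fixed...</p> <pre>Not fixed...</pre>', ['pre'], $ellipsis), PHP_EOL;
```

Because the callable only ever sees text nodes, markup and attribute values are never touched by the typographic rule.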

Add your own Fixer / Contribute a Fixer

  • Write tests;
  • A Fixer is run on a piece of text, no HTML to deal with;
  • Implement JoliTypo\FixerInterface;
  • Send your Pull request.

Contribution guidelines

  • You MUST write code in English;
  • you MUST follow PSR2 and Symfony coding standard (run composer cs on your branch);
  • you MUST run the tests (run composer test);
  • you MUST comply with the MIT license;
  • you SHOULD write documentation.

If you add a new Fixer, please provide sources and references about the typographic rule you want to fix.

Compatibility & OS support restrictions

  • Windows XP: Thin No-Break Space can't be used; all other spaces are ignored, but they do not look bad (they render as normal spaces).
  • Mac OS Snow Leopard: no support for no-break space, half no-break space, ems and en-dash, but the result doesn't look bad (normal space).

BUT if you use a font (@font-face maybe) that contains all those glyphs, there will be no issues.

There is a known issue preventing JoliTypo from working correctly with APC versions older than 3.1.11.

What can you do to help?

We need to be able to use this tool everywhere, you can help by providing:

  • WordPress plugin (to replace or complete wptexturize)
  • Dotclear plugin ...

Also, there is a Todo list 😙

License

This piece of code is under MIT License. See the LICENSE file.

Alternatives and other implementations

There is already quite a bunch of tools like this one (including good ones). Sadly, some are only for one language, some are running regexp on the whole HTML code (which is bad), some are not tested, some are bundled inside a CMS or a Library, some are not using proper auto-loading, some do not have an open bug tracker... Have a look by yourself:

Glossary & References

Thanks to these online resources for helping a developer understand typography:

jolitypo's People

Contributors

alexislefebvre, amranich, clairecoloma, damienalexandre, fvaysh, hedicguibert, j0k3r, jenswittmann, joelwurtz, ker0x, killerwolf, lyrixx, marionleherisson, mdarse, pborreli, peter279k, pyrech, remyvanlerberghe, ternel, welcomattic


jolitypo's Issues

Release a standalone version (.phar) with external config file

In order to use JoliTypo in other contexts, it could be useful to release a .phar version, which can be configured with a .jolitypo configuration file.

It could be used with all static site generators, after build phase to fix the generated HTML files.

It looks like Box could help us generate the .phar.

NoSpaceBeforeComma fixer adds an unwanted space within numbers

Hi !

When using the French decimal notation (e.g. 1,2), the NoSpaceBeforeComma fixer adds an unwanted space after the comma.

[...] « seule » 1,7 million de personnes [...]

is converted to :

[...] &laquo; seule &raquo; 1, 7&nbsp;million de personnes [...]

This has been noticed using fr_FR locale and the following options :

'Ellipsis', 'Dimension', 'Numeric', 'Dash',  'SmartQuotes', 'FrenchNoBreakSpace', 'NoSpaceBeforeComma',  'CurlyQuote', 'Hyphen', 'Trademark'

But it seems not to be a conflict between fixers nor a locale trouble (tested both in my own project and using the demo page.)
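One possible direction, sketched here with standard PCRE only, is to leave a comma alone when it sits directly between two digits. This is a hypothetical variant written for this discussion (the function name and both regular expressions are ours); it is not the fixer's current code and not necessarily the fix the maintainers chose.

```php
<?php
// Hypothetical variant of the rule; NOT the library's implementation.
function noSpaceBeforeCommaSketch(string $content): string
{
    // 1. Remove horizontal whitespace placed before a comma.
    $content = preg_replace('/\h+,/u', ',', $content);

    // 2. Force exactly one space after a comma, unless the comma sits
    //    directly between two digits (decimal notation such as 1,7).
    return preg_replace('/,(?!(?<=\d,)\d)\h*(?=\S)/u', ', ', $content);
}

echo noSpaceBeforeCommaSketch('seule 1,7 million'), PHP_EOL; // decimal comma kept as is
echo noSpaceBeforeCommaSketch('un ,deux,  trois'), PHP_EOL;  // spacing normalized
```

The trade-off is that a genuine list of numbers typed without spaces (`1,2,3`) would also be left untouched, so the choice depends on which case matters more for the content being fixed.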

Harmonize spacing

When reformatting in a given language, have the ability to harmonize spacing around punctuation:
word ! Word? Word! Word ?
To
Word! Word? Word! Word?
Or
Word ! Word ? Word ! Word ?

Nested quotes

When converting the following example using JoliTypo, nested quotes (quotes inside quotes) are not parsed and the closing quote is mistaken for an apostrophe.

Input

"This 'magic' piece of code fixes dump quotes and apostrophes."

Output

“This 'magic’ piece of code fixes dump quotes and apostrophes.” 
------^

Expected

“This ‘magic’ piece of code fixes dump quotes and apostrophes.”
------^ 

should not fix a specific text

Example:
"The man was 5'6" and 120 lbs." returns “The man was 5'6” and 120 lbs."
Should return “The man was 5'6" and 120 lbs.”

We must have something more specific than protectedTag

(cf. reddit comment)

[Insight] PHP 7 reserved words should not be used as class, interfaces or traits names - in src/JoliTypo/Fixer/Numeric.php, line 19

in src/JoliTypo/Fixer/Numeric.php, line 19

This name is a reserved word since PHP 7 and should not be used as class, interface or trait name.

use JoliTypo\StateBag;

/**
 * Add nbsp between numeric and units.
 */
class Numeric implements FixerInterface
{
    public function fix($content, StateBag $stateBag = null)
    {
        // Support a wide range of currencies
        $content = preg_replace('@([\dº])('.Fixer::ALL_SPACES.')+([º°%Ω฿₵¢₡$₫֏€ƒ₲₴₭£₤₺₦₨₱៛₹$₪৳₸₮₩¥\w]{1})@', '$1'.Fixer::NO_BREAK_SPACE.'$3', $content);

Posted from SensioLabsInsight

FrenchNoBreakSpace doesn't force a space before [:;!\?]

in src/JoliTypo/Fixer/FrenchNoBreakSpace.php#L27-L28

Maybe there is a reason to use "+" instead of "*"? With "+", a no-break space cannot be forced before a ":" when there is no space at all.

// Current code (requires at least one existing space):
$content = preg_replace('@['.Fixer::ALL_SPACES.']+(:)@mu', Fixer::NO_BREAK_SPACE.'$1', $content);
$content = preg_replace('@['.Fixer::ALL_SPACES.']+([;!\?])@mu', Fixer::NO_BREAK_THIN_SPACE.'$1', $content);
// Suggested change (also matches when no space is present):
$content = preg_replace('@['.Fixer::ALL_SPACES.']*(:)@mu', Fixer::NO_BREAK_SPACE.'$1', $content);
$content = preg_replace('@['.Fixer::ALL_SPACES.']*([;!\?])@mu', Fixer::NO_BREAK_THIN_SPACE.'$1', $content);
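The practical difference between the two quantifiers can be tried in isolation. The sketch below uses stand-in constants (their names, values and the list of spaces are assumptions, not the library's definitions) and suggests one plausible reason for keeping "+": with "*", every colon is affected, including the one in a URL scheme.

```php
<?php
// Stand-ins for the Fixer constants used in the issue; names, values and
// the list of spaces are assumptions made for this sketch.
const SKETCH_NBSP = "\u{00A0}";      // no-break space
const SKETCH_THIN_NBSP = "\u{202F}"; // narrow no-break space
const SKETCH_SPACES = ' \x{00A0}\x{202F}';

function frenchSpacingSketch(string $content, string $quantifier): string
{
    $content = preg_replace('@[' . SKETCH_SPACES . ']' . $quantifier . '(:)@mu', SKETCH_NBSP . '$1', $content);

    return preg_replace('@[' . SKETCH_SPACES . ']' . $quantifier . '([;!\?])@mu', SKETCH_THIN_NBSP . '$1', $content);
}

// "+": only the existing space before ":" is replaced; "non!" is untouched.
$strict = frenchSpacingSketch('Oui : non!', '+');
// "*": a thin no-break space is also inserted before "!".
$forced = frenchSpacingSketch('Oui : non!', '*');
// "*": the colon of the URL scheme receives a no-break space as well.
$url = frenchSpacingSketch('http://jolicode.com/', '*');

var_dump($strict === "Oui\u{00A0}: non!");
var_dump($forced === "Oui\u{00A0}: non\u{202F}!");
var_dump($url === "http\u{00A0}://jolicode.com/");
```

All three comparisons evaluate to true, which shows that forcing the space would also require a way to exclude URLs, times such as 12:30 and similar strings.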

Rework how we build the phar

At the moment, the phar is super heavy! (34.7 MB)

  • Make it much lighter (drop everything not needed)
  • use castor to build it (update the CI too)

Quotes for Switzerland are not correct

Hi,

the quotes for the de-CH locale are not set correctly. The quotes for Germany and Austria are applied, which is not correct for Switzerland. In Switzerland they use French-style angle quotation marks, see: https://en.wikipedia.org/wiki/Quotation_mark#German

In Switzerland, however, the French-style angle quotation mark sets are also used for German printed text: «A ‹B›?»
Andreas fragte mich: «Hast du den Artikel ‹EU-Erweiterung› gelesen?»
Andreas asked me: ‘Have you read the “EU Expansion” article?’
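The request boils down to a per-locale choice of quote pair. A minimal sketch of such a lookup follows; the table, the function name and the naive open/close toggle are all assumptions made for illustration and say nothing about how SmartQuotes is actually organized. Nested single quotes are not handled.

```php
<?php
// Hypothetical lookup table; keys and structure are illustrative only.
const SKETCH_QUOTE_PAIRS = [
    'de_DE' => ["\u{201E}", "\u{201C}"], // low-9 opening, high closing
    'de_CH' => ["\u{00AB}", "\u{00BB}"], // French-style guillemets
];

function quoteSketch(string $text, string $locale): string
{
    // Unknown locales fall back to straight quotes (text left unchanged).
    [$open, $close] = SKETCH_QUOTE_PAIRS[$locale] ?? ['"', '"'];
    $isOpen = false;

    // Naive toggle: the 1st, 3rd, ... straight quote opens, the others close.
    return preg_replace_callback('/"/', function () use (&$isOpen, $open, $close) {
        $isOpen = !$isOpen;

        return $isOpen ? $open : $close;
    }, $text);
}

echo quoteSketch('Andreas fragte mich: "Hast du den Artikel gelesen?"', 'de_CH'), PHP_EOL;
```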

Deprecation notice message

When running composer test, I get the following deprecation message:

Remaining indirect deprecation notices (1)

  1x: Since symfony/framework-bundle 5.1: Not setting the "framework.router.utf8" configuration option is deprecated, it will default to "true" in version 6.0.
    1x in FunctionalTest::testRenderTwigViaFilter from JoliTypo\Tests\Bridge

Legacy deprecation notices (1)

UTF8 encoded extra characters

Hi !

I'm using this library to save user provided content to a database (utf8mb4 encoded field). The content is added by the user to a textarea that is then converted from markdown text to html using the markdown-it library.
I've been struggling to find out why the images included in my html didn't show up in my browser when their urls seemed to be right. Here's what I found out :

  • using JoliTypo, the content in the database, converted to ISO 8859-1 looks like this :
<p><a href="http://pubpeer.dev/stor­age/image-1492678775687.jpg" target="_self"><img src="http://pubpeer.dev/stor­age/image-1492678775687.jpg" alt="file"></a></p>
  • when I don't use JoliTypo :
<p><a href="http://pubpeer.dev/stor­age/image-1492678775687.jpg" target="_self"><img src="http://pubpeer.dev/stor­age/image-1492678775687.jpg" alt="file"></a></p>

So it seems that JoliTypo is adding UTF... characters in my content. Am I doing something wrong or is there a bug somewhere ?

Thanks,

Xavier

Fix autoload of tests classes with composer

With version 1.0.5 of JoliTypo, Composer gives the following notice, which seems to mainly concern test classes.

Deprecation Notice: Class JoliTypo\Fixer\FrenchQuotes located in ./vendor/jolicode/jolitypo/src/JoliTypo/Fixer/FrenchQuotes.php does not comply with psr-4 autoloading standard. It will not autoload anymore in Composer v2.0. in phar:///var/www/PROJECT/composer.phar/src/Composer/Autoload/ClassMapGenerator.php:185

Issue decoding special character

I get a strange result when a string contains the character œ: the string seems to be returned in a different encoding.
Currently I can reproduce the issue on a server but not locally. I think the issue comes from the PHP configuration, but I can't find what's wrong yet.

I can also reproduce the issue on the demo at https://jolitypo.jolicode.com/, using the French locale, applying all fixes and pasting the string <p>des œuvres d'art.</p>

Need help with JoliTypo and encoding errors

Hi, I've been trying to use JoliTypo for personal use on http://borisschapira.com/ but it causes encoding errors for accented characters. Here is an example of what I give to the JoliTypo fixer (with the encoding determined via mb_detect_encoding) and what JoliTypo returns:

Mentions Légales (UTF-8)
Mentions L&Atilde;&copy;gales (ASCII) 

Here is my (pretty simple) code :

```php
use JoliTypo\Fixer;

function typofr($text)
{
    static $fixer;
    if (!isset($fixer)) {
        $fixer = new Fixer(array('Trademark'));
        $fixer->setLocale('fr_FR');
    }
    $fixed = $fixer->fix($text);

    // Debug output: log the input and the fixed text with their detected encodings.
    return $text."<script>console && console.log('-------');console && console.log('".$text." (".mb_detect_encoding($text).")'); console && console.log('".$fixed." (".mb_detect_encoding($fixed).")')</script>";
}
```

And you can temporarily see the result here, in the console : http://borisschapira.com/
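The reported output has the typical shape of UTF-8 bytes being read as ISO-8859-1 before entity conversion. The snippet below reproduces that symptom with the standard library only. It is meant to show the mechanism, not to claim where in the stack (PHP configuration, libxml or JoliTypo) the wrong charset was actually applied.

```php
<?php
// "é" is the two bytes 0xC3 0xA9 in UTF-8. Read as ISO-8859-1, those bytes
// are the two characters "Ã" and "©", hence the two entities below.
$utf8 = "Mentions L\u{00E9}gales";

$misread = htmlentities($utf8, ENT_QUOTES, 'ISO-8859-1');
$correct = htmlentities($utf8, ENT_QUOTES, 'UTF-8');

echo $misread, PHP_EOL; // Mentions L&Atilde;&copy;gales
echo $correct, PHP_EOL; // Mentions L&eacute;gales
```

The string is also detected as ASCII afterwards, as in the report, because once everything is converted to entities only ASCII bytes remain.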

Prevent shy space from being added multiple times

I sometimes use JoliTypo to fix a text which has already been fixed before. If my text contains Personnalisation, each save adds one more &shy; at every hyphenation point, which after multiple saves ends up giving me this kind of content:

Person&shy;&shy;&shy;&shy;&shy;&shy;&shy;&shy;na&shy;&shy;&shy;&shy;&shy;&shy;&shy;&shy;li&shy;&shy;&shy;&shy;&shy;&shy;&shy;&shy;sa&shy;&shy;&shy;&shy;&shy;&shy;&shy;&shy;tion

A lazy solution would be to replace (with regexp) &shy;&shy; with &shy;.

Todo :

  • Check if this is happening with other fixers
  • Check if this is due to JoliTypo or org_heigl/hyphenator
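The "lazy solution" mentioned above, plus a slightly stricter alternative, can be sketched as follows. Both helpers are hypothetical and their names are ours: the first collapses runs of soft hyphens after the fact, the second removes all soft hyphens so that a text can be hyphenated again from a clean state.

```php
<?php
// Hypothetical helpers, not part of JoliTypo.

// Collapse runs of the &shy; entity, and runs of the raw U+00AD character.
function collapseSoftHyphens(string $text): string
{
    $text = preg_replace('/(?:&shy;){2,}/', '&shy;', $text);

    return preg_replace('/\x{00AD}{2,}/u', "\u{00AD}", $text);
}

// Remove every soft hyphen before running the Hyphen fixer again.
function stripSoftHyphens(string $text): string
{
    return str_replace(['&shy;', "\u{00AD}"], '', $text);
}

echo collapseSoftHyphens('Person&shy;&shy;&shy;na&shy;&shy;li&shy;sa&shy;tion'), PHP_EOL;
echo stripSoftHyphens('Person&shy;na&shy;li&shy;sa&shy;tion'), PHP_EOL;
```

Stripping first makes repeated fixing idempotent, at the cost of also removing soft hyphens an author may have placed by hand.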

Distribute a PHAR version of JoliTypo

It could be useful to make available a standalone binary of JoliTypo, to make it usable outside of the PHP ecosystem (for all JS/Go static site generators for example).

Maybe Box can help us craft the PHAR version.

Add an "unfix" method

If, for some reason, I want to get the ugly keyboard typed text back, then I would like to use something like

$fixer->unfix($fixedText);

Using this method, I could even "unfix" some parts of the fixed text and keep the rest. (Like removing the hyphenation while keeping the Smart Quotes)
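A first approximation of such a method is a reverse character mapping. The sketch below is hypothetical (JoliTypo offers no unfix() in the text above, and `unfixSketch` is a name invented here). It is also lossy by nature: a curly apostrophe and a closing single quote both map back to the same straight quote, and hand-typed glyphs cannot be told apart from fixed ones.

```php
<?php
// Hypothetical reverse mapping; lossy, and NOT a JoliTypo feature.
function unfixSketch(string $text): string
{
    return strtr($text, [
        "\u{2026}" => '...', // ellipsis
        "\u{201C}" => '"',   // left double quote
        "\u{201D}" => '"',   // right double quote
        "\u{2019}" => "'",   // curly apostrophe / right single quote
        "\u{2014}" => '--',  // em dash
        "\u{00AD}" => '',    // soft hyphen (hyphenation point)
        "\u{00A0}" => ' ',   // no-break space
    ]);
}

echo unfixSketch("\u{201C}you\u{2019}re un\u{00AD}able\u{2026}\u{201D}"), PHP_EOL;
```

Undoing only some fixes, as the issue suggests, would amount to passing a subset of this table, for example only the soft-hyphen entry to drop hyphenation while keeping smart quotes.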

Add units and currencies

Congratulations on this project! It would be nice to automatically add a non-breaking space between a number and its unit, following the example of php-typography.

Unit fixer issue: \n characters are removed

I have an issue with some fixers: \n characters are removed.
Example with this text :

## Verrückt und Harakiri 2
Am Tag nach dem

The fixer changes the text like this:

## Verrückt und Harakiri 2 Am Tag

Here is the preg_match output of the fixer's regex:
[image not preserved]

I found that the problem is that \s matches \n.
So to fix this we should replace \s with its equivalent minus \n.

I found this in the Python documentation of the re library:

When the UNICODE flag is not specified, it matches any whitespace character, this is equivalent to the set [ \t\n\r\f\v]. The LOCALE flag has no extra effect on matching of the space. If UNICODE is set, this will match the characters [ \t\n\r\f\v] plus whatever is classified as space in the Unicode character properties database.

So replacing \s with [ \t\r\f\v] could be a solution.
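The behaviour is easy to reproduce outside the library. The sketch below uses a simplified stand-in for the Unit regex (the pattern and constant name are ours, not the fixer's) and compares `\s` with PCRE's `\h`, which matches horizontal whitespace only and therefore never matches a newline. Whether `\h` or an explicit character class suits the project better is left open.

```php
<?php
// Simplified stand-in for the Unit rule; NOT the fixer's actual regex.
const SKETCH_UNIT_NBSP = "\u{00A0}";

$text = "## Verrückt und Harakiri 2\nAm Tag nach dem";

// \s also matches "\n", so the line break after "2" is swallowed.
$withS = preg_replace('/(\d)\s+(\w)/u', '$1' . SKETCH_UNIT_NBSP . '$2', $text);

// \h matches horizontal whitespace only, so the line break survives.
$withH = preg_replace('/(\d)\h+(\w)/u', '$1' . SKETCH_UNIT_NBSP . '$2', $text);

var_dump(strpos($withS, "\n") !== false); // bool(false)
var_dump(strpos($withH, "\n") !== false); // bool(true)
```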

German uses both „quotes“ and »quotes«

While curly quotes are common in German, especially in handwriting, angular quote marks (reversed French guillemets, or »Möwchen«) are more common in printed German text, books and newspapers. Please note that they are used in the reversed direction compared to French or Swiss German:

This is an »example«.
And this is an »example with another ›single quote‹ inside«.

Please provide an option for these quote marks.

Excluding shortcodes (going further : excluding regex) ?

Hi,

I'm applying JoliTypo to WordPress content, but sometimes this content contains shortcodes like [[ poney ]]. Is there a way to tell JoliTypo not to replace content inside these shortcodes, with a regex exclusion pattern for example?
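Until such an option exists, one workaround is to split the text around the shortcodes and fix only the pieces in between. The helper below is a hypothetical sketch: its name, its default pattern and the callable parameter are assumptions, and in practice the callable would wrap a call such as $fixer->fixString(). Note that splitting hides context from the fixer, so a pair of quotes that straddles a shortcode would not be matched up.

```php
<?php
// Hypothetical helper, not part of JoliTypo. $pattern must contain exactly
// one capturing group that wraps the whole excluded span.
function fixOutsideShortcodes(string $text, callable $fix, string $pattern = '/(\[\[.*?\]\])/s'): string
{
    // With PREG_SPLIT_DELIM_CAPTURE, even indexes hold ordinary text and
    // odd indexes hold the captured shortcodes.
    $parts = preg_split($pattern, $text, -1, PREG_SPLIT_DELIM_CAPTURE);

    foreach ($parts as $index => $part) {
        if ($index % 2 === 0) {
            $parts[$index] = $fix($part);
        }
    }

    return implode('', $parts);
}

$ellipsis = fn (string $t): string => str_replace('...', "\u{2026}", $t);
echo fixOutsideShortcodes('Wait... [[ poney... ]] done...', $ellipsis), PHP_EOL;
```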

Expose Hyphen options

Hi,

It would be great to be able to control the options of the Org\Heigl\Hyphenator\Hyphenator instance, like wordMin, LeftMin and so on. I created my own fixer to do this, but I think it would be nice to have an interface in JoliTypo to control this.

DOMDocument throws a warning on HTML5 tags

PHP Warning:  DOMDocument::loadHTML(): Tag section invalid in Entity

DOMDocument doesn't know any HTML5 tag, even if the Doctype is provided.
The end result is still OK (the unknown tags are not removed) but the warning is annoying.
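A common way to keep such warnings out of the output is to let libxml collect errors internally while the HTML is loaded. The sketch below uses only standard PHP functions; the helper name `loadHtmlQuietly` is ours, and whether JoliTypo should discard these errors or expose them is a design decision this sketch does not settle.

```php
<?php
// Hypothetical helper: loads HTML without emitting PHP warnings.
function loadHtmlQuietly(string $html): DOMDocument
{
    // Collect libxml errors internally instead of raising warnings.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $doc->loadHTML($html);

    // Drop the collected errors and restore the caller's setting.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $doc;
}

$doc = loadHtmlQuietly('<section><p>Hello</p></section>');
echo $doc->saveHTML();
```

Restoring the previous setting matters in library code, since the host application may rely on its own libxml error handling.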
