txtmark's Introduction

Txtmark - Java markdown processor

Copyright (C) 2011-2015 René Jeschke
See LICENSE.txt for licensing information.


Txtmark is yet another markdown processor for the JVM.

  • It is easy to use:

    String result = txtmark.Processor.process("This is ***TXTMARK***");
    
  • It is fast (see below)
    ... well, it is the fastest markdown processor on the JVM right now. (This might be outdated, but txtmark is still flippin' fast.)

  • It does not depend on other libraries, so adding txtmark.jar to the classpath is sufficient to use Txtmark in your project.

For an in-depth explanation of markdown have a look at the original Markdown Syntax.


Maven repository

Txtmark is available on maven central.


Txtmark extensions

To enable Txtmark's extended markdown parsing you can use the $PROFILE$ mechanism:

[$PROFILE$]: extended

This seemed to me the easiest and safest way to enable different behaviours. Just put this line into your Txtmark file, the same way you would write a reference link definition.
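As a rough illustration of the mechanism, the marker line can be prepended programmatically before the text is handed to the processor. The class and method names below are hypothetical and not part of Txtmark; the sketch only builds the string, which would then be passed to Processor.process as in the introduction.

```java
// Hypothetical helper, not part of Txtmark: ensures the extended-profile
// marker line is present in a markdown document.
final class ExtendedProfile {
    // Marker line quoted from the README.
    static final String MARKER = "[$PROFILE$]: extended";

    /** Returns the input unchanged if the marker already sits on a line of its own, else prepends it. */
    static String enable(String markdown) {
        for (String line : markdown.split("\n", -1)) {
            if (line.trim().equals(MARKER)) {
                return markdown;
            }
        }
        // A blank line separates the marker from the first paragraph.
        return MARKER + "\n\n" + markdown;
    }

    public static void main(String[] args) {
        System.out.println(enable("This is a paragraph\n* and this is a list"));
    }
}
```

Calling the helper twice is harmless, since the second call finds the marker and returns the text as-is.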

Behavior changes when using [$PROFILE$]: extended

  • Lists and code blocks end a paragraph

    In normal markdown the following:

    This is a paragraph
    * and this is not a list
    

    Will produce:

    <p>This is a paragraph
    * and this is not a list</p>
    

    When using Txtmark extensions this changes to:

    <p>This is a paragraph</p>
    <ul>
    <li>and this is not a list</li>
    </ul>
    
  • Text anchors

    Headlines and list items may receive an ID which you can refer to using links.

    ## Headline with ID ##     {#headid}
    
    Another headline with ID   {#headid2}
    ------------------------
    
    * List with ID             {#listid}
    
    Links: [Foo] (#headid)
    

    this will produce:

    <h2 id="headid">Headline with ID</h2>
    <h2 id="headid2">Another headline with ID</h2>
    <ul>
    <li id="listid">List with ID</li>
    </ul>
    <p>Links: <a href="#headid">Foo</a></p>
    

    The ID must be the last thing on the first line.

    All spaces before {# get removed, so you can't use an ID and a manual line break in the same line.

  • Auto HTML entities

    • (C) becomes &copy; - ©
    • (R) becomes &reg; - ®
    • (TM) becomes &trade; - ™
    • -- becomes &ndash; - –
    • --- becomes &mdash; - —
    • ... becomes &hellip; - …
    • << becomes &laquo; - «
    • >> becomes &raquo; - »
    • "Hello" becomes &ldquo;Hello&rdquo; - “Hello”
  • Underscores (Emphasis)

    Underscores in the middle of a word don't result in emphasis.

    Con_cat_this
    

    normally produces this:

    Con<em>cat</em>this
    
  • Superscript

    You can use ^ to mark a span as superscript.

    2^2^ = 4
    

    turns into

    2<sup>2</sup> = 4
    
  • Abbreviations

    Abbreviations are defined like reference links, but using a * instead of a link and must be single-line only.

    [Git]: * "Fast distributed revision control system"
    

    and used like this

    This is [Git]!
    

    which will produce

    This is <abbr title="Fast distributed revision control system">Git</abbr>!
    
  • Fenced code blocks

    ```
    This is code!
    ```
    
    ~~~
    Another code block
    ~~~
    
    ~~~
    You can also mix flavours
    ```
    

    Fenced code block delimiter lines start with at least three backticks (`) or tildes (~).

    It is possible to add metadata to the opening line. Everything trailing the backticks or tildes is then considered metadata. These are all valid meta lines:

    ```python
    ~ ~ ~ ~ ~java
    ``` ``` ``` this is even more meta
    

    The meta information that you provide here can be used with a BlockEmitter to include e.g. syntax highlighted code blocks. Here's an example:

    public class CodeBlockEmitter implements BlockEmitter
    {
        private static void append(StringBuilder out, List<String> lines)
        {
            out.append("<pre class=\"pre_no_hl\">");
            for (final String l : lines)
            {
                Utils.escapedAdd(out, l);
                out.append('\n');
            }
            out.append("</pre>");
        }
    
        @Override
        public void emitBlock(StringBuilder out, List<String> lines, String meta)
        {
            if (Strings.isEmpty(meta))
            {
                append(out, lines);
            }
            else
            {
                try
                {
                    // Utils#highlight(...) is not included with txtmark, its sole purpose
                    // is to show what the meta can be used for
                    out.append(Utils.highlight(lines, meta));
                    out.append('\n');
                }
                catch (final IOException e)
                {
                    // Ignore or do something, still, pump out the lines
                    append(out, lines);
                }
            }
        }
    }
    

    You can then set the BlockEmitter in the txtmark Configuration using Configuration.Builder#setCodeBlockEmitter(BlockEmitter emitter).
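A smaller emitter that needs no external highlighter might look like the sketch below. It assumes only the emitBlock signature shown above; the BlockEmitter interface is redeclared locally as a stand-in so the example compiles without txtmark.jar, and LanguageClassEmitter is a hypothetical name. It turns the first word of the meta line into a CSS class that a client-side highlighter could pick up.

```java
import java.util.List;

// Stand-in with the same method signature as in the example above, declared
// here only so the sketch compiles on its own. In a real project, implement
// Txtmark's BlockEmitter instead.
interface BlockEmitter {
    void emitBlock(StringBuilder out, List<String> lines, String meta);
}

// Hypothetical emitter: the first word of the meta line becomes a CSS class.
final class LanguageClassEmitter implements BlockEmitter {
    @Override
    public void emitBlock(StringBuilder out, List<String> lines, String meta) {
        // An absent or blank meta line yields an empty language name.
        String lang = (meta == null) ? "" : meta.trim().split("\\s+")[0];
        out.append("<pre><code");
        if (!lang.isEmpty()) {
            out.append(" class=\"language-").append(escape(lang)).append('"');
        }
        out.append('>');
        for (String line : lines) {
            out.append(escape(line)).append('\n');
        }
        out.append("</code></pre>\n");
    }

    // Minimal HTML escaping; '&' must be replaced first.
    private static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }
}
```

The "language-" class prefix is a common convention of browser-side highlighters, not something Txtmark prescribes.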


Markdown conformity

Txtmark passes all tests inside MarkdownTest_1.0_2007-05-09 except two:

  1. Images.text

    Fails because Txtmark doesn't produce empty 'title' image attributes.
    (IMHO: Images ... OK)

  2. Literal quotes in titles.text

    What the frell ... this test will continue to FAIL.
    Sorry, but using unescaped " in a title which should be surrounded by " is unacceptable for me ;)

    Change:

    Foo [bar](/url/ "Title with "quotes" inside").
    [bar]: /url/ "Title with "quotes" inside"
    

    to:

    Foo [bar](/url/ "Title with \"quotes\" inside").
    [bar]: /url/ "Title with \"quotes\" inside"
    

    and Txtmark will produce the correct result.
    (IMHO: Literal quotes in titles ... OK)
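If titles come from user input, the escaping shown above can be applied before the link is assembled. The following is a minimal sketch with hypothetical names (TitleQuotes, escape, inlineLink); it treats a quote as already escaped when the character directly before it is a backslash and does not try to handle escaped backslashes.

```java
// Hypothetical helper, not part of Txtmark.
final class TitleQuotes {
    /** Puts a backslash in front of every double quote that is not already preceded by one. */
    static String escape(String rawTitle) {
        StringBuilder out = new StringBuilder(rawTitle.length() + 8);
        for (int i = 0; i < rawTitle.length(); i++) {
            char c = rawTitle.charAt(i);
            if (c == '"' && (i == 0 || rawTitle.charAt(i - 1) != '\\')) {
                out.append('\\');
            }
            out.append(c);
        }
        return out.toString();
    }

    /** Builds an inline link of the form [text](url "title") with an escaped title. */
    static String inlineLink(String text, String url, String rawTitle) {
        return "[" + text + "](" + url + " \"" + escape(rawTitle) + "\")";
    }

    public static void main(String[] args) {
        System.out.println(inlineLink("bar", "/url/", "Title with \"quotes\" inside"));
    }
}
```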


Where Txtmark is not like Markdown

  • Txtmark does not produce empty title attributes in link and image tags.

  • Unescaped " in link titles starting with " are not recognized and result in unexpected behaviour.

  • Due to a different list parsing approach some things get interpreted differently:

    * List
    > Quote
    

    will produce when processed with Markdown:

    <p><ul>
    <li>List</p>
    
    <blockquote>
     <p>Quote</li>
    </ul></p>
    </blockquote>
    

    and this when processed with Txtmark:

    <ul>
    <li>List<blockquote><p>Quote</p>
    </blockquote>
    </li>
    </ul>
    

    Another one:

    * List
    ====
    

    will produce when processed with Markdown:

    <h1>* List</h1>
    

    and this when processed with Txtmark:

    <ul>
    <li><h1>List</h1>
    </li>
    </ul>
    
  • List of escapable characters:

    \   [   ]   (   )   {   }   #
    "   '   .   <   >   +   -   _
    !   `   ^
    

Performance comparison of markdown processors for the JVM

Remarks: These benchmarks are too old to be of any value. I leave them here as a reference, though.

Based on this benchmark suite.

Excerpt from the original post concerning this benchmark suite:

Most of these tests are of course unrealistic: Who would write a text where each word is a link? Yet they serve an important use: It makes it possible for the developer to pinpoint the parts of the parser where there is most room for improvement. Also, it explains why certain texts might render much faster in one Processor than in another.

Benchmark system:

  • Ubuntu Linux 10.04 32 Bit
  • Intel(R) Core(TM) 2 Duo T7500 @ 2.2GHz
  • Java(TM) SE Runtime Environment (build 1.6.0_24-b07)
  • Java HotSpot(TM) Server VM (build 19.1-b02, mixed mode)
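
| Test | Actuarius 1st Run (ms) | Actuarius 2nd Run (ms) | PegDown 1st Run (ms) | PegDown 2nd Run (ms) | Knockoff 1st Run (ms) | Knockoff 2nd Run (ms) | Txtmark 1st Run (ms) | Txtmark 2nd Run (ms) |
|---|---|---|---|---|---|---|---|---|
| Plain Paragraphs | 1127 | 577 | 1273 | 1037 | 740 | 400 | 157 | 64 |
| Every Word Emphasized | 1562 | 1001 | 1523 | 1513 | 13982 | 13221 | 54 | 46 |
| Every Word Strong | 1125 | 997 | 1115 | 1114 | 9543 | 9647 | 44 | 41 |
| Every Word Inline Code | 382 | 277 | 1058 | 1052 | 9116 | 9074 | 51 | 39 |
| Every Word a Fast Link | 2257 | 1600 | 537 | 531 | 3980 | 3410 | 109 | 55 |
| Every Word Consisting of Special XML Chars | 4045 | 4270 | 2985 | 3044 | 312 | 377 | 778 | 775 |
| Every Word wrapped in manual HTML tags | 333 | 429 | 1990 | 1896 | 386 | 337 | 367 | 362 |
| Every Line with a manual line break | 510 | 588 | 1445 | 1440 | 1527 | 1130 | 56 | 56 |
| Every word with a full link | 452 | 246 | 1045 | 996 | 1884 | 1819 | 86 | 55 |
| Every word with a full image | 268 | 150 | 1140 | 1132 | 1985 | 1908 | 38 | 36 |
| Every word with a reference link | 9847 | 9082 | 18956 | 18719 | 121136 | 115416 | 1525 | 1380 |
| Every block a quote | 445 | 206 | 1312 | 1301 | 478 | 457 | 50 | 45 |
| Every block a codeblock | 70 | 87 | 373 | 376 | 161 | 175 | 60 | 22 |
| Every block a list | 920 | 912 | 1720 | 1725 | 622 | 651 | 55 | 55 |
| All tests together | 3281 | 2885 | 5184 | 5196 | 10130 | 10460 | 206 | 196 |
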
Benchmarked versions:

Actuarius version: 0.2
PegDown version: 0.8.5.4
Knockoff version: 0.7.3-15


Mentioned/related projects

Markdown is Copyright (C) 2004 by John Gruber
SmartyPants is Copyright (C) 2003 by John Gruber
Actuarius is Copyright (C) 2010 by Christoph Henkelmann
Knockoff is Copyright (C) 2009-2011 by Tristan Juricek
PegDown is Copyright (C) 2010 by Mathias Doenitz
PHP Markdown & Extra is Copyright (C) 2009 Michel Fortin


Project link: https://github.com/rjeschke/txtmark

txtmark's Issues

CommonMark

Are you planning to support CommonMark?

support kbd tags

Like Stack Overflow and GitHub (the Stack Overflow ones are prettier).

html link replace issue

There is an extra space in the markdown processing result. I fixed it by changing Utils.java; the patch is below :)

<a href="http://test.com">test</a>

to

<p><a href="http: //test.com">test</a></p>
diff --git a/src/main/java/com/github/rjeschke/txtmark/Utils.java b/src/main/java/com/github/rjeschke/txtmark/Utils.java
index 2e19c08..5339893 100644
--- a/src/main/java/com/github/rjeschke/txtmark/Utils.java
+++ b/src/main/java/com/github/rjeschke/txtmark/Utils.java
@@ -530,7 +530,7 @@
            pos = readRawUntil(out, in, pos, '/', '>');
            if(in.charAt(pos) == '/')
            {
-               out.append(" /");
+               out.append("/"); // a 'space' prefix will cause 'http://test.com' become 'http: //test.com'
                pos = readRawUntil(out, in, pos + 1, '>');
                if(pos == -1)
                    return -1;

convert HTML to Markdown

Is there any way to convert HTML to markdown? Basically the reverse of Processor.process(str), which converts markdown to HTML.

target="_blank" for links

Hi,
First, congrats on your simple and efficient tool!
Is it possible to generate all links with target="_blank"? Is there any configuration trick to achieve this?

Thanks !
Mathieu
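
One possible workaround that does not rely on any Txtmark configuration option is to post-process the generated HTML. The sketch below is an assumption-laden illustration, not a Txtmark feature: BlankTargets and addBlankTarget are hypothetical names, and the regex only handles anchor tags as a markdown processor typically emits them.

```java
import java.util.regex.Pattern;

// Hypothetical post-processor, not part of Txtmark.
final class BlankTargets {
    // Matches the "<a" of an anchor tag that does not yet carry a target attribute.
    // The whitespace lookahead keeps tags such as <abbr> from matching.
    private static final Pattern ANCHOR =
            Pattern.compile("<a(?=\\s)(?![^>]*\\starget=)", Pattern.CASE_INSENSITIVE);

    /** Adds target="_blank" (plus rel="noopener") to every anchor without a target. */
    static String addBlankTarget(String html) {
        return ANCHOR.matcher(html).replaceAll("<a target=\"_blank\" rel=\"noopener\"");
    }

    public static void main(String[] args) {
        System.out.println(addBlankTarget("<p><a href=\"http://test.com\">test</a></p>"));
    }
}
```

The rel="noopener" attribute is added because opening untrusted pages in a new tab without it is a known risk; drop it if it is not wanted.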

Safe mode is not enough to be really safe

It currently allows users to "escape" the place they are supposed to be at with for example the following HTML template:

<p>
 This contains the text: 
 <div id="user-content">${mdContent}</div>
</p>

One can then create Markdown content such as:

In the box
</div></p>
<p>
 Outside the box, looks like part of the site now 
 <a href="malicious-link">go there to reset your password please</a>
</p>

I need a way to escape every unescaped < character from the input.
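
A blunt way to get that behaviour is to escape the input before it reaches the processor. The sketch below uses hypothetical names and makes no claim about Txtmark internals. It has known costs: autolinks written as <http://...> stop working, and a '<' inside a code block may be escaped a second time by the processor and show up as the literal text &lt;.

```java
// Hypothetical pre-processor, not part of Txtmark.
final class AngleBracketEscaper {
    /** Replaces every '<' with the entity &lt; so that no raw HTML tag can start in user input. */
    static String escapeLessThan(String userMarkdown) {
        return userMarkdown.replace("<", "&lt;");
    }

    public static void main(String[] args) {
        System.out.println(escapeLessThan("In the box\n</div></p>"));
    }
}
```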

problem with backticks in inline code at the start of a line

Compare the result of the following markdown (use copy+paste – there’s a difference between 2. and 3.):

1.

foo
`` `class Object` `` is a *meta literal*.
bar

2.

foo
 `` `class Object` `` is a *meta literal*.
bar

3.

foo
 `` `class Object` `` is a *meta literal*.
bar

(that’s a No-Break Space at the beginning of the second line)

4.

foo
other `` `class Object` `` is a *meta literal*.
bar

Here’s how the markdown is rendered by Github:

foo
class Object is a meta literal.
bar

foo
class Object is a meta literal.
bar

foo
class Object is a meta literal.
bar

foo
other class Object is a meta literal.
bar

(The line breaks are incorrect because GFM always inserts line breaks for newlines.)

And here’s how the markdown is rendered by txtmark (as used by ceylon doc):

foo
bar

foo
bar

foo class Object is a meta literal. bar

foo other class Object is a meta literal. bar

That is, if a line starts with nothing other than spaces (meaning the identity of U+0020, not the Unicode category “Separator, Space [Zs]”, which also contains the No-Break Space U+00A0 as seen in 3.) and then a multi-backtick code block containing backticks `code` with backticks`, then strange behavior occurs.

(originally reported as ceylon/ceylon-compiler#1553)

Get Markdown from HTML?

Hi,
Any plans for having a way to get the markdown from a given HTML code snippet?
Thanks

Please put artifacts in Maven Central

First, thanks for your hard work. I was surprised you haven't pushed it to Maven Central however... Putting your artifact in Maven Central will extend the reach of your project and make it easier for people to download your work.

It's not incredibly difficult to do either, and Sonatype will help you through the process if you get stuck.
http://maven.apache.org/guides/mini/guide-central-repository-upload.html#Other_Projects
https://docs.sonatype.org/display/Repository/Sonatype+OSS+Maven+Repository+Usage+Guide

How to strip all markdown to get pure text ?

Hello !

Is there a way I can strip all the markdown to get just pure text (not HTML)?
I'd like this feature to do a simple substringed-preview, the one with the "read more" link.

Thank you,
Albert
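
One way to approximate this without special support in the processor is to render to HTML first and then strip the tags. The sketch below assumes the HTML is the kind of simple output a markdown processor produces. PlainTextPreview and its methods are hypothetical names, and only the four basic entities are unescaped.

```java
// Hypothetical helper, not part of Txtmark.
final class PlainTextPreview {
    /** Removes tags from simple HTML and collapses whitespace. */
    static String toPlainText(String html) {
        // Block-level closing tags and line breaks become spaces so words do not run together.
        String text = html.replaceAll("(?i)</(p|h[1-6]|li|blockquote|pre|div)>|<br\\s*/?>", " ");
        // All remaining tags are dropped.
        text = text.replaceAll("<[^>]*>", "");
        // Unescape the basic entities; '&amp;' goes last to avoid double unescaping.
        text = text.replace("&lt;", "<").replace("&gt;", ">")
                   .replace("&quot;", "\"").replace("&amp;", "&");
        return text.replaceAll("\\s+", " ").trim();
    }

    /** Cuts the plain text at the last space at or before maxChars and appends "...". */
    static String preview(String html, int maxChars) {
        String text = toPlainText(html);
        if (text.length() <= maxChars) {
            return text;
        }
        int cut = text.lastIndexOf(' ', maxChars);
        if (cut <= 0) {
            cut = maxChars; // no space found: hard cut
        }
        return text.substring(0, cut) + "...";
    }
}
```

A "read more" link can then be appended by the caller after the returned preview text.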

There's a code injection vulnerability of `com.github.rjeschke.txtmark.cmd.HlUtils.highlight`

com.github.rjeschke.txtmark.cmd.HlUtils.highlight is designed to highlight code blocks. However, passing an unchecked argument to this API can lead to the execution of arbitrary commands. For instance, we first create an instance of CodeBlockEmitter and set its program parameter to "calc.exe":

Configuration.Builder builder = Configuration.builder();
Class clazz = Class.forName("com.github.rjeschke.txtmark.cmd.CodeBlockEmitter");
Constructor constructor = clazz.getDeclaredConstructors()[0];
constructor.setAccessible(true);
Object cb = constructor.newInstance("UTF-8", "calc.exe");

Second, we set CodeBlockEmitter to the instance that we just created.

builder.setCodeBlockEmitter((BlockEmitter) cb);
builder.forceExtentedProfile();
Configuration config = builder.build();
System.out.println(Processor.process("```java\n123\n```\n", config));

Finally, the malicious program "calc.exe" is executed.

Line break results in /> instead of <br/>

Configuration configuration = Configuration.builder()
    .enableSafeMode()
    .build();
String markdown = "Hello **world**  \nThis should come after newline";
String result = Processor.process(markdown, configuration);
System.out.println(result);

The output is this :

<p>Hello <strong>world</strong>  />
This should come after newline</p>

The expected result is this :

<p>Hello <strong>world</strong>  <br/>
This should come after newline</p>

auto link links...

I'm using this great library (thanks!) in conjunction with another library... http://code.google.com/p/pagedown/

One difference in the implementation is that pagedown will automatically make a regular url into a link. Heck, that is even similar to the way that the Github flavored markdown works (as you can see, the link above is made clickable).

Would it be possible to get an extension to txtmark to allow that as well? It just seems natural to allow people to write urls like that.
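
Until such an extension exists, bare URLs can be wrapped in angle brackets before processing, which turns them into standard Markdown automatic links. The sketch below uses hypothetical names and a deliberately simple regex. It skips URLs that directly follow <, (, [, a quote or =, and it also rewrites URLs inside code blocks, which may not be wanted.

```java
import java.util.regex.Pattern;

// Hypothetical pre-processor, not part of Txtmark.
final class AutoLinker {
    // A bare http(s) URL: not directly preceded by < ( [ " ' or =,
    // and not ending in sentence punctuation.
    private static final Pattern BARE_URL = Pattern.compile(
            "(?<![<(\\[\"'=])\\bhttps?://[^\\s<>()\\[\\]\"']*[^\\s<>()\\[\\]\"'.,;:!?]");

    /** Wraps each bare URL in angle brackets, the Markdown autolink syntax. */
    static String wrapBareUrls(String markdown) {
        return BARE_URL.matcher(markdown).replaceAll("<$0>");
    }

    public static void main(String[] args) {
        System.out.println(wrapBareUrls("see http://code.google.com/p/pagedown/ for details."));
    }
}
```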

StringIndexOutOfBoundsException

when a line contains only a '<' character.

I'm not able to submit a pull request, but here's a patch:

$ git diff 108cee6b209a362843729c032c2d982625d12d98
diff --git a/src/main/java/com/github/rjeschke/txtmark/Line.java b/src/main/java/com/github/rjeschke/txtmark/Line.java
index 4ef1c43..2afcbc0 100644
--- a/src/main/java/com/github/rjeschke/txtmark/Line.java
+++ b/src/main/java/com/github/rjeschke/txtmark/Line.java
@@ -517,7 +517,7 @@ class Line
         final LinkedList<String> tags = new LinkedList<String>();
         final StringBuilder temp = new StringBuilder();
         int pos = this.leading;
-        if (this.value.charAt(this.leading + 1) == '!')
+        if (this.leading + 1 < this.value.length() && this.value.charAt(this.leading + 1) == '!')
         {
             if (this.readXMLComment(this, this.leading) > 0)
             {

This looks wrong

private int readXMLComment(final Line firstLine, final int start)
    {
        Line line = firstLine;
        if (start + 3 < line.value.length())
        {
            if (line.value.charAt(2) == '-' && line.value.charAt(3) == '-')
            {
...

I think that should be

            if (line.value.charAt(start + 2) == '-' && line.value.charAt(start + 3) == '-')

Adding a safe_mode parameter

Hi!

I tested your markdown processor but I couldn't find a way to escape HTML.
I know that the markdown documentation specifies that HTML is allowed, but it would be great to have a "security" parameter in case the markdown is written by other users (such as in comments).

Moreover, I think you would be the first markdown processor in Java that includes a safe mode parameter :p

Configuration setCodeBlockEmitter not working properly

Configuration.Builder#setCodeBlockEmitter(BlockEmitter emitter)

Example code:

Configuration.Builder c = Configuration.builder();
c.setCodeBlockEmitter(new CodeBlockEmitter());
c.setAllowSpacesInFencedCodeBlockDelimiters(true);
String res = Processor.process(markdown, c.build());

The configuration does not seem to be adding or executing the CodeBlockEmitter() class. The code for it is similar to the example code.

conditional statement disappear

So I have this statement

~ ~ ~ ~ ~
if (a > 3) {
  moveShip(5 * gravity, DOWN);
}

What I get from Processor.process(..) is
a > 3) {
veShip(5 * gravity, DOWN);


Which is not the desired result.

Cannot include in Android projects

UNEXPECTED TOP-LEVEL EXCEPTION:
com.android.dex.DexException: Multiple dex files define Lcom/github/rjeschke/txtmark/Block$1;
    at com.android.dx.merge.DexMerger.readSortableTypes(DexMerger.java:596)
    at com.android.dx.merge.DexMerger.getSortedTypes(DexMerger.java:554)
    at com.android.dx.merge.DexMerger.mergeClassDefs(DexMerger.java:535)
    at com.android.dx.merge.DexMerger.mergeDexes(DexMerger.java:171)
    at com.android.dx.merge.DexMerger.merge(DexMerger.java:189)
    at com.android.dx.command.dexer.Main.mergeLibraryDexBuffers(Main.java:454)
    at com.android.dx.command.dexer.Main.runMonoDex(Main.java:303)
    at com.android.dx.command.dexer.Main.run(Main.java:246)
    at com.android.dx.command.dexer.Main.main(Main.java:215)
    at com.android.dx.command.Main.main(Main.java:106)

Link references should not be processed inside HTML elements

Link references are being processed in the wrong place. The following Markdown:

## Example

<pre>
[1]: blah
</pre>

Is converted into this:

<h2>Example</h2>
<pre>
</pre>

When it should be:

<h2>Example</h2>
<pre>
[1]: blah
</pre>

This may explain why your Markdown parser is faster than others!

Question on Lists

I have the following html:

  • one <br />
  • two <br />
  • three <br />

... but this is not interpreted as a list. Is it because of the br tags?

David

Incorrect rendering of pre-formatted text

Triple backticks are rendered as <pre> when I set forceExtended in the config. However there are a couple of issues with the text.

  1. Everything from the block appears on the same line. (This could be intended, but I'm not certain.)
  2. The first few characters on every line disappear.

Example

```
go test -cover // shows you what percentage of the code is covered by tests
go tool cover -html=coverage.out //opens a web browser which shows you which lines aren't covered
```

random text

```
func main() {
    return nil
}
```

renders as

forceExtendedProfile = True

<p>From <a href="http://blog.golang.org/cover">a post</a> on the Go Blog</p>
    <pre><code>est -cover // shows you what percentage of the code is covered by tests
    ool cover -html=coverage.out //opens a web browser which shows you which lines aren't covered
    </code></pre>
    <p>random text</p>
    <pre><code> main() {
    return nil
    </code></pre>

forceExtendedProfile = False

<p>From <a href="http://blog.golang.org/cover">a post</a> on the Go Blog</p>
    <p><code>`
    go test -cover // shows you what percentage of the code is covered by tests
    go tool cover -html=coverage.out //opens a web browser which shows you which lines aren't covered
    </code>`</p>
    <p>random text</p>
    <p><code>`
    func main() {
    return nil
    }
    </code>`</p>

Back ticks fail to terminate a meta named fenced codeblock.

Added a custom block emitter to process blocks like


``` dot

digraph M1{ 
	node[shape=box width=1.1]
	dot[label="Graphviz\nDOT"]
	zestCode[label="Zest\ngraph"]
	zestVis[label="SWT\napp"]
	image[label="Image\nfile"]
	
	dot->image//[label=" Graphviz"]
	dot->zestCode[constraint=false color=black style=dashed label="            " dir=both]
	zestCode->zestVis//[label=" Zest"]	
}

~~~

The problem is that when the trailing delimiter of the code block is three back-ticks (```), the list of lines delivered to the custom BlockEmitter#emitBlock includes every line from the first line of fenced code to the end of the document.

If the trailing delimiter is changed to ~~~, as shown above, TxtMark behaves correctly.

If the custom block emitter is not used, the problem remains when using the trailing back-tick delimiter.

Suggests that something in this Dot content is messing with TxtMark. Just don't see what it is.

Question: profile: extended

Hi,

How does [$PROFILE$]: extended work? I put it on top of a string and ran it through markdown.geHtmlContent(myString), and it prints [$PROFILE$]: extended without executing it.

Triple backtick (```) code blocks

Is there a plan to support GitHub-style triple back ticks for code blocks? It seems that it does not work at the moment.

I'd be happy to add it if you could give me some guidance on how to start.
Thanks

maven not up to date

The version in the maven repository is two years old.

Are you not publishing there anymore, or are you just not publishing snapshots? Both would be highly appreciated.

How to use it?

txtmark.Processor.process(post.content)

where is the reference to txtmark?
