doclayout's Issues

Wrong length for curly apostrophe

Prelude Text.DocLayout Data.List> literal "a’s"
Text 2 "a\8217s"

Should be length 3. I found this after noting a bunch of wrapping-related test failures in pandoc from the new doclayout release.

@Xitian9 can you see the problem? I believe this is due to your changes in real length calculation code.

Spacing combining characters should still increase width

Some combining characters, namely those in the Unicode general category Mc (Mark, spacing combining), should add to the length of the text even though they combine with the preceding character. These characters are common in abugidas such as Devanagari.
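
The distinction can be sketched with base's Data.Char alone. This is an illustration under assumptions, not doclayout's code: combiningWidth is a hypothetical helper, and it ignores wide characters entirely (everything that is not a mark gets width 1).

```haskell
import Data.Char (GeneralCategory (..), generalCategory)

-- Hypothetical helper, not part of doclayout's API.
-- Only the mark categories are handled; wide characters are ignored.
combiningWidth :: Char -> Int
combiningWidth c = case generalCategory c of
  NonSpacingMark       -> 0  -- Mn: drawn over the base character
  EnclosingMark        -> 0  -- Me: drawn around the base character
  SpacingCombiningMark -> 1  -- Mc: occupies its own column
  _                    -> 1

main :: IO ()
main =
  -- U+0915 DEVANAGARI LETTER KA (Lo), U+093E DEVANAGARI VOWEL SIGN AA (Mc),
  -- U+0301 COMBINING ACUTE ACCENT (Mn)
  print (map combiningWidth "\x0915\x093E\x0301")  -- prints [1,1,0]
```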

charWidth gives incorrect result for emoji

Emoji are supposed to be displayed as 2 characters wide, apparently since Unicode 9. However, here they are treated as 1 character wide.

Here is a list of emoji in Unicode 14 (https://unicode.org/emoji/charts/full-emoji-list.html). Things can get pretty ugly with zero-width combiners, but we can probably improve on the current situation.

cf. https://bugs.launchpad.net/ubuntu/+source/gnome-terminal/+bug/1665140
https://gitlab.freedesktop.org/terminal-wg/specifications/-/issues/9
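
As a rough sketch of what an improvement might look like, the function below gives width 2 to one emoji block and width 0 to the joiner and variation selectors. The name emojiAwareWidth is hypothetical, and the ranges are deliberately partial: only the Emoticons block (U+1F600 to U+1F64F) is covered, and sequences are still measured one code point at a time.

```haskell
-- Hypothetical helper, not doclayout's charWidth.
emojiAwareWidth :: Char -> Int
emojiAwareWidth c
  | c == '\x200D'                    = 0  -- zero-width joiner
  | c >= '\xFE00' && c <= '\xFE0F'   = 0  -- variation selectors
  | c >= '\x1F600' && c <= '\x1F64F' = 2  -- Emoticons block only
  | otherwise                        = 1

main :: IO ()
main = print (map emojiAwareWidth "a\x1F600")  -- prints [1,2]
```

Summing these widths over a joined sequence would still overcount, because several emoji joined by U+200D are usually drawn as a single glyph.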

Performance could be improved when determining character width

I've been looking into improving the performance of realLength, and reducing our reliance on shortcuts. I have sought advice and benchmarked a few different approaches on stackexchange.

The long and the short of it is that it seems we can do better, but we also have some choices to make. By far the best approach is making a giant unboxed array the size of all unicode points, and performing an array lookup. This will improve performance for all characters (except for ASCII control characters), but involves a significant memory and set-up cost. On my system it requires about 368MiB of memory, and about 150ms set-up cost.

Is this worthwhile? If you're working on ASCII the set-up cost is paid off after about 150 million lookups, while for text without shortcuts the payoff will come after only about 6 million lookups. But we would get a huge savings in code complexity, with no more shortcuts needed at all.

There are other improvements that can be made as well, in particular writing the binary search tree directly, allowing it to be specialised for our use case. This would not give as dramatic a speedup, but may allow us to maintain ASCII performance and get away with fewer shortcuts.
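
A minimal sketch of the array-lookup idea, assuming the array package that ships with GHC. The names widthRanges, widthTable and tableWidth are hypothetical, the range table is a tiny stand-in for one generated from the Unicode data files, and Word8 is used as the element type here, which is not necessarily the representation that was benchmarked.

```haskell
import Data.Array.Unboxed (UArray, accumArray, (!))
import Data.Word (Word8)

-- Hypothetical, heavily abbreviated range table: (first, last, width).
widthRanges :: [(Char, Char, Word8)]
widthRanges =
  [ ('\x0300', '\x036F', 0)  -- combining diacritical marks
  , ('\x3041', '\x3096', 2)  -- hiragana letters
  , ('\x4E00', '\x9FFF', 2)  -- CJK unified ideographs
  ]

-- One entry per code point; anything not listed defaults to width 1.
widthTable :: UArray Char Word8
widthTable = accumArray (\_ w -> w) 1 (minBound, maxBound)
  [ (c, w) | (lo, hi, w) <- widthRanges, c <- [lo .. hi] ]

-- Every character takes the same path: a single array index, no shortcuts.
tableWidth :: Char -> Int
tableWidth c = fromIntegral (widthTable ! c)

main :: IO ()
main = print (map tableWidth "a\x0301\x3042")  -- prints [1,0,2]
```

The table is built lazily on first use, so the set-up cost is paid once per process rather than per lookup.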

Rendering bug with certain inputs

See jgm/pandoc#8711

ghci> render Nothing $ mconcat [Block 71 ["a","","b"]]
*** Exception: renderList encountered [Empty,CarriageReturn,Text 1 "b"]
CallStack (from HasCallStack):
  error, called at src/Text/DocLayout.hs:453:21 in doclayout-0.4-inplace:Text.DocLayout

Support indexed and 24 bit colors

The merged-in color support is limited to the 8-color ANSI palette. There should be rendering support and a combinator API for coloring text with the 256-color (indexed) palette and 24 bit/true color.
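
One possible shape for such an API, sketched under assumptions: the Color type and the fgParams, sgr and colored functions are hypothetical names, not doclayout's. The escape sequences themselves are the standard SGR forms, 38;5;n for the indexed palette and 38;2;r;g;b for 24-bit color.

```haskell
import Data.List (intercalate)
import Data.Word (Word8)

-- Hypothetical color type; none of these names are doclayout's API.
data Color
  = Basic Int                    -- 8-color ANSI palette, index 0..7
  | Indexed Word8                -- 256-color palette
  | TrueColor Word8 Word8 Word8  -- 24-bit red, green, blue

-- SGR parameters that select a foreground color.
fgParams :: Color -> [Int]
fgParams (Basic n)         = [30 + n]
fgParams (Indexed i)       = [38, 5, fromIntegral i]
fgParams (TrueColor r g b) = [38, 2, fromIntegral r, fromIntegral g, fromIntegral b]

-- Build a complete escape sequence from SGR parameters.
sgr :: [Int] -> String
sgr ps = "\ESC[" ++ intercalate ";" (map show ps) ++ "m"

-- Wrap text in a foreground color, then reset all attributes.
colored :: Color -> String -> String
colored c s = sgr (fgParams c) ++ s ++ sgr [0]

main :: IO ()
main = mapM_ (putStrLn . flip colored "sample")
  [Basic 1, Indexed 208, TrueColor 255 128 0]
```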

`Styled` documents interact poorly with line breaking.

The inner document of a Styled can be a Concat, but as written, unfoldD won't unfold that document. The ultimate effect, via the definition of offsetOf, is that Styled text will exceed the line length when output because renderList (BreakingSpace : xs) can't correctly measure the offset of a Styled following a BreakingSpace.

It's not readily apparent what the right adaptation is here. Sprinkling cases around like unfoldD (Styled f x) = Styled f <$> unfoldD x and offsetOf (Styled _ x) = offsetOf x goes some way towards addressing the line-breaking issue, but it then breaks how nested styles are flattened when outputting attributed text. That suggests we need some further intermediate step, but I'd have to think pretty hard about a good way of doing that.
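
To make the two extra cases concrete, here is a toy model. It is not doclayout's real Doc type: the constructors are cut down, the style is a plain String, and offsetOf is simplified. It only shows how pushing Styled through unfoldD exposes the pieces so that each can be measured, and it does nothing about the style-flattening problem described above.

```haskell
-- Toy stand-in for the library's document type (hypothetical, simplified).
data Doc
  = Empty
  | Text Int String
  | BreakingSpace
  | Concat Doc Doc
  | Styled String Doc
  deriving (Eq, Show)

-- Flatten nested Concats, distributing a style over the unfolded pieces.
unfoldD :: Doc -> [Doc]
unfoldD Empty        = []
unfoldD (Concat x y) = unfoldD x ++ unfoldD y
unfoldD (Styled f x) = Styled f <$> unfoldD x
unfoldD x            = [x]

-- Simplified width measurement that looks through Styled.
offsetOf :: Doc -> Int
offsetOf (Text n _)   = n
offsetOf (Styled _ x) = offsetOf x
offsetOf (Concat x y) = offsetOf x + offsetOf y
offsetOf _            = 0

main :: IO ()
main = do
  let doc = Styled "bold" (Concat (Text 3 "foo") (Concat BreakingSpace (Text 3 "bar")))
  print (unfoldD doc)
  print (map offsetOf (unfoldD doc))  -- prints [3,0,3]
```

Note that the breaking space comes out wrapped as Styled "bold" BreakingSpace, so a renderer that pattern-matches on a bare BreakingSpace would still need adapting.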

Wrong character width in full-width symbol

This is my source Markdown:

+---------+---------+---------+
|         | column1 | column2 |
+:========+:=======:+:=======:+
| row1    | x       | a       |
+---------+---------+---------+
| row2    | ◯      | a       |
+---------+---------+---------+
| row3    | ✕      | a       |
+---------+---------+---------+
| row4    | あ      | a       |
+---------+---------+---------+

I got the following result:

<table style="width:42%;">
<colgroup>
<col style="width: 13%" />
<col style="width: 13%" />
<col style="width: 13%" />
</colgroup>
<thead>
<tr class="header">
<th style="text-align: left;"></th>
<th style="text-align: center;">column1</th>
<th style="text-align: center;">column2</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td style="text-align: left;">row1</td>
<td style="text-align: center;">x</td>
<td style="text-align: center;">a</td>
</tr>
<tr class="even">
<td style="text-align: left;">row2</td>
<td style="text-align: center;">◯ |</td>
<td style="text-align: center;">a</td>
</tr>
<tr class="odd">
<td style="text-align: left;">row3</td>
<td style="text-align: center;">✕ |</td>
<td style="text-align: center;">a</td>
</tr>
<tr class="even">
<td style="text-align: left;">row4</td>
<td style="text-align: center;">あ</td>
<td style="text-align: center;">a</td>
</tr>
</tbody>
</table>

The problem is in the following lines:

<td style="text-align: center;">◯ |</td>

and

<td style="text-align: center;">✕ |</td>

These cells include the | character.

I can modify the source markdown to get the expected result as follows.

+---------+---------+---------+
|         | column1 | column2 |
+:========+:=======:+:=======:+
| row1    | x       | a       |
+---------+---------+---------+
| row2    | ◯       | a       |
+---------+---------+---------+
| row3    | ✕       | a       |
+---------+---------+---------+
| row4    | あ      | a       |
+---------+---------+---------+

However, this does not look good.

I think half-width and full-width characters are being misjudged.
◯ and ✕ are full-width characters, as is あ.

Command line

sudo docker run --rm --mount type=bind,source=$(pwd),destination=/data pandoc/core -o out.html src.md

Version

# pandoc --version
pandoc 2.14.2
Compiled with pandoc-types 1.22, texmath 0.12.3.1, skylighting 0.11,
citeproc 0.5, ipynb 0.1.0.1
User data directory: /root/.local/share/pandoc
Copyright (C) 2006-2021 John MacFarlane. Web:  https://pandoc.org
This is free software; see the source for copying conditions. There is no
warranty, not even for merchantability or fitness for a particular purpose.
