
Comments (4)

faho commented on June 2, 2024

Fish uses our widecharwidth script that parses the unicode data files to come up with a width representation.

Notably this includes "EastAsianWidth.txt", which lists:

2600..2604;N # So [5] BLACK SUN WITH RAYS..COMET

That means anything from U+2600 to U+2604 is classified as "N", which stands for "Neutral", which means they are "narrow" and occupy one cell. (tbh I don't really understand why "neutral" even exists, and TR11 is unhelpful here)
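As an illustration of how a line like the one above can be read, here is a minimal Python sketch. It is not the actual widecharwidth code; `parse_width_line` is a hypothetical helper, and it only assumes the `first..last;class # comment` layout shown in the quoted line (tolerating optional spaces around the semicolon).

```python
from typing import Optional, Tuple


def parse_width_line(line: str) -> Optional[Tuple[int, int, str]]:
    """Parse one EastAsianWidth.txt-style line into (first, last, width_class).

    Returns None for blank lines and lines that contain only a comment.
    Hypothetical helper for illustration, not taken from widecharwidth.
    """
    # Everything after '#' is a human-readable comment.
    data = line.split("#", 1)[0].strip()
    if not data:
        return None
    codepoints, width_class = (field.strip() for field in data.split(";", 1))
    # A line covers either a single codepoint ("00A1") or a range ("2600..2604").
    first, _, last = codepoints.partition("..")
    return int(first, 16), int(last or first, 16), width_class


first, last, width_class = parse_width_line(
    "2600..2604;N # So [5] BLACK SUN WITH RAYS..COMET"
)
# U+2601 falls inside the range, so it inherits the class "N".
print(f"U+{first:04X}..U+{last:04X} -> {width_class}", first <= 0x2601 <= last)
```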

U+2601 is also listed as "Emoji" in emoji-data.txt, but not as "Emoji_Presentation". That means you can use the U+FE0F emoji variation selector to change it from text presentation (narrow because EastAsianWidth.txt applies) to emoji presentation (wide).

Fish will treat this combination as being two cells wide. This is also what I get when I copy your ☁️ here from the browser into my terminal - not just U+2601, but U+2601 followed by U+FE0F. This is also what I get from starship's default config (but again via a browser, it's always possible the U+FE0F was inserted there).

You can see what your fish thinks the width is with string length -V:

string length -V ☁️ \U2601 \U2601\UFE0F

This should print "2", "1" and "2" - if that cloud glyph is U+2601 U+FE0F. If it prints "1" "1" "2" the glyph is just U+2601.

This seems correct to me from what I know of Unicode and TR11 specifically. It's not a great document, and it's not meant for terminals specifically, but it is the best we have and fish's interpretation seems reasonable to me.

Fish's width handling isn't perfect either (notably it does not handle full grapheme clusters; it works codepoint-by-codepoint, with some hacks for things like variation selectors), but the easy and medium difficulty cases usually work, and I would call this a medium case.
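To make the "codepoint-by-codepoint with a variation selector hack" idea concrete, here is a rough Python sketch. It is not fish's implementation: `cell_width` is a hypothetical function, it uses Python's bundled `unicodedata` tables rather than widecharwidth, it counts ambiguous codepoints as one cell, and it widens any narrow base followed by U+FE0F without checking that the pair is a listed emoji variation sequence.

```python
import unicodedata

VS16 = "\ufe0f"  # U+FE0F VARIATION SELECTOR-16 (emoji presentation selector)


def cell_width(text: str) -> int:
    """Estimate terminal cells for text, one codepoint at a time (sketch only)."""
    total = 0
    prev = 0  # width already counted for the most recent base character
    for ch in text:
        if ch == VS16:
            # The hack: a narrow base followed by VS16 becomes two cells wide.
            if prev == 1:
                total += 1
                prev = 2
            continue
        if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
            # Combining marks and format characters take no cell of their own.
            continue
        width = 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
        total += width
        prev = width
    return total


for sample in ("\u2601", "\u2601\ufe0f"):
    print([f"U+{ord(c):04X}" for c in sample], cell_width(sample))
```

Under these assumptions, U+2601 alone counts as one cell and U+2601 U+FE0F as two, which matches the behavior described above.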


Note: The names of $fish_emoji_width and $fish_ambiguous_width are perhaps too simple. They affect very specific things:

  • fish_ambiguous_width affects all codepoints EastAsianWidth.txt classifies as "Ambiguous" width
  • fish_emoji_width concerns a change made in Unicode 9, which declared that all codepoints with "Emoji_Presentation" should default to wide. $fish_emoji_width tells fish whether to honor that (that is, whether the terminal does). It therefore only affects codepoints that have Emoji_Presentation (as listed in emoji-data.txt) and were introduced before Unicode 9, because a program without Unicode 9 support won't know about anything newer anyway. It is not of much use anymore because Unicode 9 is pretty old by now.

Neither applies to U+2601 because it is classified as neutral and not Emoji_Presentation.
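A small Python sketch of why an ambiguous-width setting leaves U+2601 alone. The `ambiguous_width` parameter is a hypothetical stand-in for the role $fish_ambiguous_width plays, and the classes come from Python's bundled `unicodedata` tables (whose Unicode version depends on the interpreter), not from fish.

```python
import unicodedata


def width_with_ambiguous(ch: str, ambiguous_width: int = 1) -> int:
    """Width of one codepoint; only the "A" (Ambiguous) class is configurable."""
    eaw = unicodedata.east_asian_width(ch)
    if eaw == "A":
        return ambiguous_width
    return 2 if eaw in ("W", "F") else 1


# U+00A1 is classified Ambiguous, so the setting changes its width.
print(width_with_ambiguous("\u00a1", 1), width_with_ambiguous("\u00a1", 2))
# U+2601 is classified Neutral, so the setting has no effect on it.
print(width_with_ambiguous("\u2601", 1), width_with_ambiguous("\u2601", 2))
```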


So, in summary:

Terminal.app and iTerm obviously misrender the text. They show ☁️ (which I assume is U+2601 with the emoji selector) as being two cells wide but then don't use the second cell.

Warp apparently ignores the emoji selector and draws U+2601 with the emoji selector as occupying one cell. I believe that is wrong and would therefore call this a bug in Warp.

Note that the zsh comparison doesn't help much because zsh doesn't care about the width at all here, it just prints the prompt without repositioning. The issue comes up because fish repositions the cursor to do syntax highlighting, suggestions, right prompt handling, etc.

Our guidance: Please draw U+2601 alone into one cell, and U+2601 U+FE0F into two cells.

from fish-shell.

faho commented on June 2, 2024

So, since I just tested this in 6 more terminals and 5 failed the same test (with various failure modes), here are some quotes from Unicode TRs to support our reading:

From TR51:

default emoji presentation character — A character that, by default, should appear with an emoji presentation, rather than a text presentation.
[...]
These characters have the Emoji_Presentation property.

[...]

default text presentation character — A character that, by default, should appear with a text presentation, rather than an emoji presentation.
[...]
These characters do not have the Emoji_Presentation property; that is, their Emoji_Presentation property value is No.

So a character either has default emoji or text presentation. U+2601 does not have Emoji_Presentation so it defaults to text presentation.

emoji presentation selector — The character U+FE0F VARIATION SELECTOR-16 (VS16), used to request an emoji presentation for an emoji character.

So U+FE0F can be used to get emoji presentation for something that otherwise has text presentation.

emoji presentation sequence — A variation sequence consisting of an emoji character followed by a emoji presentation selector.

So U+2601 U+FE0F is an emoji presentation sequence (it's also listed in the corresponding file).

From TR11:

East Asian Wide (W): [...] This category includes [...] characters that have the [UTS51] property Emoji_Presentation, with the exception of characters that have the [UCD] property Regional_Indicator

So anything that has Emoji_Presentation is "wide".

emoji presentation sequences behave as though they were East Asian Wide, regardless of their assigned East_Asian_Width property value.

So U+2601 U+FE0F is wide.

Neutral (Not East Asian): All other characters. Neutral characters do not occur in legacy East Asian character sets.

So, since U+2601 is text presentation by default, the emoji stuff doesn't apply, so the "Neutral" from EastAsianWidth.txt applies.
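The chain of reasoning from the quotes above can be written down as a tiny decision function. This is only a sketch of the reading given here: `resolved_width` is a hypothetical name, the property values are passed in by hand rather than looked up in emoji-data.txt, the Regional_Indicator exception is not modeled, and it does not check that a sequence is actually listed as an emoji variation sequence.

```python
def resolved_width(
    east_asian_width: str,
    emoji: bool,
    emoji_presentation: bool,
    followed_by_vs16: bool,
) -> int:
    """Return 1 or 2 cells for a base character under the reading above."""
    # Emoji presentation sequence: emoji character + U+FE0F behaves as Wide,
    # regardless of the base character's East_Asian_Width.
    if emoji and followed_by_vs16:
        return 2
    # Default emoji presentation characters are Wide on their own.
    if emoji_presentation:
        return 2
    # Otherwise the emoji rules do not apply and EastAsianWidth.txt decides.
    return 2 if east_asian_width in ("W", "F") else 1


# U+2601: East_Asian_Width "N", Emoji=Yes, Emoji_Presentation=No.
print(resolved_width("N", emoji=True, emoji_presentation=False, followed_by_vs16=False))
print(resolved_width("N", emoji=True, emoji_presentation=False, followed_by_vs16=True))
```

With the property values for U+2601 taken from the discussion above, this gives one cell for the bare character and two for the sequence with U+FE0F.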


I think that should answer your question?


Advait-M commented on June 2, 2024

Updates:

  • Got a better handle on how variation selectors work 👍
  • unicode-width, the crate we use for helping calculate widths for Unicode characters, very recently added emoji presentation support (literally 3 days ago - unicode-rs/unicode-width#41 haha) 🔥.
    • Not in a release cut yet but I confirmed this fixes the width calculation for "\u{2601}\u{FE0F}" (previously 1, now gives correct result of 2).
  • Figured out a path forward on the Warp-side with how we should handle the full-width character followed by zero-width character to ultimately lead to a double-width character (on the rendering + spacing calculations side). We've got a rough prototype working here - I'm gonna continue working on this, which should actually help us better support Unicode emojis more broadly in Warp (outside of the fish use case too) 🙌
    • There's both the rendering + width calculations piece of this for us to tackle correctly, as you mentioned.

Really appreciate your help on this @faho and I might have some follow-ups!

Also, curious: which terminal passed when you tested this in the 6 terminals mentioned above, haha? Hopefully Warp will be another one to add soon 😄 !!

EDIT: for anyone wondering, I believe the terminal that handles this correctly is Kitty! But Warp is coming soon!


Advait-M commented on June 2, 2024

Awesome - thank you @faho, and thanks for digging into this! Just wanted to ack these comments in the interim - I'm working through reading up on this world and understanding this more deeply 😅.

I'll likely have some follow-up questions, but this is super helpful! 🙌
