Hey @dterei !
As the title suggests, this is yet another attempt to generalise pretty
to support alternative textual representations, such that it would pave the way for #1 .
Now, I know that a lot has been written in the past, but I'm giving it another go mainly as a means for me to write down an experience report of two days of tinkering which led nowhere.
Introduction
The bigger goal is to replace GHC's copy of pretty
with this library, but to the best of my knowledge there are two main blockers:
-
GHC uses some ad-hoc textual representations for performance reasons. This doesn't go along well with pretty
's user-facing API, which exposes combinators like text
(see below).
-
There are a couple of commits by @thomie which have been reverted due to some allocation regression in the compiler.
I think 2. is orthogonal to the scope of this ticket (but certainly relevant to the bigger goal), so I'm going to focus on 1. only.
In this ticket, @bgamari summarise the issue quite neatly:
The other related issue is the inability to make use of FastString with the upstream pretty since the text combinator has type String -> Doc. This is actually a very poor interface as text needs the length of the string, which is an O(n) operation. Really it should at least allow usage of a more efficient string representation. niteria and I discussed this a few months ago and agreed that this could be resolved by simply generalizing text over its string type with a typeclass providing an unpack and length function.
Starting from this ticket, I have started exploring different options, but none of my designs seemed to lead me somewhere. Therefore I'm seeking help from you fine hackers about how to proceed. Hereby follows a couple of failed attempts with a streams of thoughts of what made me pick certain decisions over something else. I'm relying on memory here, so please do not hold anything against me if some of these ramblings are incomplete or imprecise 😉
Attempt 1: Start top-down from the combinators
I started by doing exactly what Ben suggested; I created a type class like this (naming is hard, I picked this name just to avoid confusion with other existing textual types):
class RuneSequence a where
len :: a -> Int
unpack :: a -> String
This of course allows us to generalise things like text
like the following:
text :: RuneSequence r => r -> Doc a
text s = case len s of {sl -> textBeside_ (NoAnnot (Str s) sl) Empty}
This won't compile though as it calls internally textBeside_
which relies on TextDetails
and the latter "leaks" its internals, by which I mean the Str
constructor, which expects a String
. One could argue I could then use unpack
to write the following:
text :: RuneSequence r => r -> Doc a
text s = case len s of {sl -> textBeside_ (NoAnnot (Str $ unpack s) sl) Empty}
But this left a sour taste in my mouth:
-
unpack
can be costly, especially if we do lots of it. I really don't want to make the performance of the library any worse;
-
It really doesn't solve the crux of the problem: If we compare GHC's TextDetails
, we will see it's defined like this:
data TextDetails = Chr {-# UNPACK #-} !Char -- ^ A single Char fragment
| Str String -- ^ A whole String fragment
| PStr FastString -- a hashed string
| ZStr FastZString -- a z-encoded string
| LStr {-# UNPACK #-} !LitString {-#UNPACK #-} !Int
-- a '\0'-terminated array of bytes
In order to have a chance of unifying the two libraries, one possibility I saw was to abstract away TextDetails
, which led me to the second bikeshed:
Attempt 2: Polymorphic AnnotDetails
I asked myself "Can I abstract away TextDetails altogether, so that user code can plug its own TextDetails"? In brief, write the following:
data AnnotDetails r a = AnnotStart
| NoAnnot !r {-# UNPACK #-} !Int
| AnnotEnd a
deriving (Show,Eq)
I have one big problem with this and with all its variations: it will make the r
parameter to bubble up in the Doc
type, to the point which it will become Doc r a
. You might argue this could read like "This is a Doc node annotated with an annotation of type a and an inner textual representation of type r". This though introduces a massive breaking change in the API (no good) and I suspect we'll still need to be rely on type classes anyway (like RuneSequence
or similar) as most of the combinators are using the constructors of TextDetails
anyway. So, up to the next.
Attempt 3: Add a new constructor to TextDetails
In brief, write the following:
data TextDetails r = Chr {-# UNPACK #-} !Char
| Str String
| ...
| Runes r -- user provided
This might reconcile the two worlds, but it has several problems as well:
- It leaks a new type param
r
like solution 2, so it's a no-go
- Won't save us from using type classes anyway
I have tried to remove the need for the extra type param with a RankNType
annotation like this:
data TextDetails r = Chr {-# UNPACK #-} !Char
| Str String
| ...
| Runes (forall r. RuneSequence r => r) -- user provided
Although this might work (but I doubt) I couldn't use it as TextDetails needs to derive Show
, Eq
and Generic
. The first two can be derived via StandaloneDeriving
, but the third cannot, and we cannot write it manually by hand due to the Safe
constraint we have on the package. Argh!
At this point, I felt like I have reached an impasse. Thanks for reading up to this point, and hopefully somebody can chime in and tell me if there is a simple solution to this riddle 😉
Alfredo