haskell / pretty

Haskell Pretty-printer library

License: Other

pretty's Introduction

Pretty : A Haskell Pretty-printer library

Pretty is a pretty-printing library: a set of APIs that provide a way to easily print out text in a consistent format of your choosing. This is useful for compilers and related tools.

It is based on the pretty-printer outlined in the paper 'The Design of a Pretty-printing Library' by John Hughes in Advanced Functional Programming, 1995. It can be found here.
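As a quick orientation, here is a minimal sketch of the combinator style, using only functions exported by Text.PrettyPrint. The record being printed and the name `person` are made up for illustration and are not part of the library.

```haskell
import Text.PrettyPrint (Doc, text, int, nest, render, ($+$), (<+>))

-- A hypothetical two-field record: a header line followed by the
-- fields, each on its own line and indented by two columns.
person :: String -> Int -> Doc
person name age = text "Person" $+$ nest 2 fields
  where
    fields = (text "name:" <+> text name) $+$ (text "age:" <+> int age)
```

Calling `render (person "Ada" 36)` in GHCi turns the document into a String with the header on the first line and the two indented fields below it.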

Licensing

This library is BSD-licensed.

Building

The library uses the Cabal build system, so building is simply a matter of running:

cabal sandbox init
cabal install "QuickCheck >= 2.5 && < 3"
cabal install --only-dependencies
cabal configure --enable-tests
cabal build
cabal test

We have to install QuickCheck manually as otherwise Cabal currently throws an error due to the cyclic dependency between pretty and QuickCheck.

If cabal test freezes, then run cabal test --show-details=streaming instead. This is due to a bug in certain versions of Cabal.

Get involved!

We are happy to receive bug reports, fixes, documentation enhancements, and other improvements.

Please report bugs via the github issue tracker.

Master git repository:

git clone git://github.com/haskell/pretty.git

Authors

This library is maintained by David Terei. It was originally designed by John Hughes and has since been heavily modified by Simon Peyton Jones.

pretty's Issues

Benchmark Suite

It's hard to iterate on Pretty: it's a fairly performance-critical library, but there is no benchmark suite for it. Add one. There is some code in the branch for this, but it needs to be put into a good state so it can be run easily. Also look at this simple benchmark as a candidate for improvement:

http://hackage.haskell.org/trac/ghc/ticket/3339#comment:22

Doc 'Eq' instance fails Substitutivity

Consider the following code:

module Main (main) where
import Prelude hiding ((<>))
import Text.PrettyPrint

main :: IO ()
main = do
  let s1 = text "if:" $+$ nest 4 (text "pass")
  let s2 = text "if:\n    pass"
  let f = (text "x" <>)

  --not following Eq's Substitutivity 'law':
  --first is True but second is False
  print $   s1 ==   s2
  print $ f s1 == f s2

  --examine the values visually:
  print $   s1
  print $   s2
  print $ f s1
  print $ f s2
  --outputs the following (note extra space in the 3rd):
  -- if:
  --     pass
  -- if:
  --     pass
  -- xif:
  --      pass
  -- xif:
  --     pass

If this is not the intended behavior of the various functions involved (<>, $+$, nest, ==), then there is a bug that should be fixed. If this is the intended behavior, then the documentation on Eq should alert the reader to the fact that the Substitutivity Law is not followed.
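A smaller reproduction may help pin down where the difference comes from. As far as I can tell, `<>` shifts every later line of its right operand by the width of the left operand, but a newline hidden inside a single `text` fragment is invisible to the layout engine and so is not shifted. The names `structured`, `flat` and `prefixed` below are only for illustration.

```haskell
import Prelude hiding ((<>))
import Text.PrettyPrint (Doc, text, nest, render, ($+$), (<>))

-- Two documents that render to the same string on their own.
structured, flat :: Doc
structured = text "if:" $+$ nest 4 (text "pass")  -- two layout lines
flat       = text "if:\n    pass"                 -- one fragment with an embedded newline

-- Place a one-character prefix beside a document and render the result.
prefixed :: Doc -> String
prefixed d = render (text "x" <> d)
```

Comparing `prefixed structured` with `prefixed flat` shows the extra column of indentation reported above.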

pretty-1.1.2.0 can't compile its test suite

A complete build log is at http://hydra.cryp.to/build/471422/nixlog/1/raw. The relevant part is:

Preprocessing test suite 'test-pretty' for pretty-1.1.2.0...

tests/TestStructures.hs:15:8:
    Could not find module ‘PrettyTestVersion’
    Use -v to see a list of the files searched for.

tests/UnitT3911.hs:5:8:
    Could not find module ‘TestUtils’
    Use -v to see a list of the files searched for.

pretty printing type class

Can we add a pretty printing type class similar to the Pretty class in ansi-wl-pprint? Then we wouldn't need separate packages like prettyclass and pretty-class. A class provided by pretty itself would also get wider adoption.
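To make the request concrete, here is a rough sketch of what such a class could look like. The class name `PrettySketch` and the method `pp` are hypothetical, chosen to avoid suggesting any existing API; only the combinators imported from Text.PrettyPrint are real.

```haskell
import Text.PrettyPrint
  (Doc, text, int, brackets, hsep, punctuate, comma, render, (<+>))

-- Hypothetical class; name and method are illustrative only.
class PrettySketch a where
  pp :: a -> Doc

instance PrettySketch Int where
  pp = int

instance PrettySketch Bool where
  pp = text . show

-- A document is already pretty, so the instance is the identity.
instance PrettySketch Doc where
  pp = id

-- Lists print as a bracketed, comma-separated sequence.
instance PrettySketch a => PrettySketch [a] where
  pp = brackets . hsep . punctuate comma . map pp

instance PrettySketch a => PrettySketch (Maybe a) where
  pp Nothing  = text "Nothing"
  pp (Just x) = text "Just" <+> pp x
```

A real design would also have to decide how to handle precedence and strings, which this sketch deliberately leaves out.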

`fsep` and `fcat` are confusing as well. Does anyone know what "Paragraph fill" means in this context?

fsep and fcat are confusing as well. Does anyone know what "Paragraph fill" means in this context?

"Paragraph fill" version of sep.

"Paragraph fill" version of cat.

Originally posted by @banacorn in #48 (comment)
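My reading of "paragraph fill" is that fsep packs as many items onto each line as will fit and then wraps, the way words fill a paragraph, whereas sep is all-or-nothing. A small experiment along those lines, with the helper name `narrow` and the width of 10 columns picked arbitrarily for the demonstration:

```haskell
import Text.PrettyPrint (Doc, text, sep, fsep, renderStyle, style, Style(..))

-- Four three-letter words; all of them on one line would need 15 columns.
wordsDoc :: [Doc]
wordsDoc = map text ["aaa", "bbb", "ccc", "ddd"]

-- Render with a 10-column line; ribbonsPerLine = 1 makes the ribbon
-- as wide as the line, so only the line length matters here.
narrow :: Doc -> String
narrow = renderStyle style { lineLength = 10, ribbonsPerLine = 1 }
```

With these inputs, `narrow (sep wordsDoc)` puts every word on its own line, while `narrow (fsep wordsDoc)` fits two words per line.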

Support alternative textual representations

Hey @dterei !

As the title suggests, this is yet another attempt to generalise pretty to support alternative textual representations, such that it would pave the way for #1.

Now, I know that a lot has been written in the past, but I'm giving it another go mainly as a means for me to write down an experience report of two days of tinkering which led nowhere.

Introduction

The bigger goal is to replace GHC's copy of pretty with this library, but to the best of my knowledge there are two main blockers:

1. GHC uses some ad-hoc textual representations for performance reasons. This doesn't go along well with pretty's user-facing API, which exposes combinators like text (see below).
2. There are a couple of commits by @thomie which have been reverted due to some allocation regression in the compiler.

I think 2. is orthogonal to the scope of this ticket (but certainly relevant to the bigger goal), so I'm going to focus on 1. only.

In this ticket, @bgamari summarises the issue quite neatly:

The other related issue is the inability to make use of FastString with the upstream pretty since the text combinator has type String -> Doc. This is actually a very poor interface as text needs the length of the string, which is an O(n) operation. Really it should at least allow usage of a more efficient string representation. niteria and I discussed this a few months ago and agreed that this could be resolved by simply generalizing text over its string type with a typeclass providing an unpack and length function.

Starting from this ticket, I have explored different options, but none of my designs seemed to lead anywhere. Therefore I'm seeking help from you fine hackers about how to proceed. What follows is a couple of failed attempts, with a stream of thoughts on what made me pick certain decisions over others. I'm relying on memory here, so please do not hold anything against me if some of these ramblings are incomplete or imprecise 😉

Attempt 1: Start top-down from the combinators

I started by doing exactly what Ben suggested; I created a type class like this (naming is hard, I picked this name just to avoid confusion with other existing textual types):

class RuneSequence a where
    len :: a -> Int
    unpack :: a -> String

This of course allows us to generalise things like text like the following:

text :: RuneSequence r => r -> Doc a
text s = case len s of {sl -> textBeside_ (NoAnnot (Str s) sl) Empty}

This won't compile though as it calls internally textBeside_ which relies on TextDetails and the latter "leaks" its internals, by which I mean the Str constructor, which expects a String. One could argue I could then use unpack to write the following:

text :: RuneSequence r => r -> Doc a
text s = case len s of {sl -> textBeside_ (NoAnnot (Str $ unpack s) sl) Empty}

But this left a sour taste in my mouth:

unpack can be costly, especially if we do lots of it. I really don't want to make the performance of the library any worse;
It really doesn't solve the crux of the problem: If we compare GHC's TextDetails, we will see it's defined like this:

data TextDetails = Chr  {-# UNPACK #-} !Char -- ^ A single Char fragment
                 | Str  String -- ^ A whole String fragment
                 | PStr FastString                      -- a hashed string
                 | ZStr FastZString                     -- a z-encoded string
                 | LStr {-# UNPACK #-} !LitString {-#UNPACK #-} !Int
                   -- a '\0'-terminated array of bytes
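For anyone who wants to experiment without touching the library internals, the unpack-based variant of Attempt 1 can be approximated against the public API with `sizedText`, which takes a precomputed width. The `Sized` type, `mkSized` and `textR` are hypothetical names for this sketch; `Sized` merely stands in for a representation whose length is O(1).

```haskell
import Text.PrettyPrint (Doc, sizedText, render)

-- The class from Attempt 1, repeated so the sketch stands alone.
class RuneSequence a where
  len    :: a -> Int
  unpack :: a -> String

-- Hypothetical carrier that stores its length next to the characters.
data Sized = Sized !Int String

-- The length is computed once, at construction time.
mkSized :: String -> Sized
mkSized s = Sized (length s) s

instance RuneSequence Sized where
  len    (Sized n _) = n
  unpack (Sized _ s) = s

-- Generalised 'text': the width comes from 'len', but the payload
-- still goes through 'unpack', so it shares the cost discussed above.
textR :: RuneSequence r => r -> Doc
textR r = sizedText (len r) (unpack r)
```

This avoids recomputing the length but does nothing about the String-only payload in TextDetails, which is the real obstacle.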

In order to have a chance of unifying the two libraries, one possibility I saw was to abstract away TextDetails, which led me to the second bikeshed:

Attempt 2: Polymorphic `AnnotDetails`

I asked myself "Can I abstract away TextDetails altogether, so that user code can plug its own TextDetails"? In brief, write the following:

data AnnotDetails r a = AnnotStart
                    | NoAnnot !r {-# UNPACK #-} !Int
                    | AnnotEnd a
                      deriving (Show,Eq)

I have one big problem with this and with all its variations: it makes the r parameter bubble up into the Doc type, to the point where it becomes Doc r a. You might argue this could read like "This is a Doc node annotated with an annotation of type a and an inner textual representation of type r". However, this introduces a massive breaking change in the API (no good), and I suspect we'll still need to rely on type classes anyway (like RuneSequence or similar), as most of the combinators use the constructors of TextDetails. So, on to the next.

Attempt 3: Add a new constructor to TextDetails

In brief, write the following:

data TextDetails r = Chr  {-# UNPACK #-} !Char
                 | Str  String
                 | ...
                 | Runes r                     -- user provided

This might reconcile the two worlds, but it has several problems as well:

It leaks a new type param r like solution 2, so it's a no-go
Won't save us from using type classes anyway

I have tried to remove the need for the extra type param with a RankNTypes annotation like this:

data TextDetails r = Chr  {-# UNPACK #-} !Char
                 | Str  String
                 | ...
                 | Runes (forall r. RuneSequence r => r)                    -- user provided

Although this might work (though I doubt it), I couldn't use it, as TextDetails needs to derive Show, Eq and Generic. The first two can be derived via StandaloneDeriving, but the third cannot, and we cannot write it by hand due to the Safe constraint we have on the package. Argh!

At this point, I felt I had reached an impasse. Thanks for reading up to this point, and hopefully somebody can chime in and tell me if there is a simple solution to this riddle 😉

Alfredo

Linked paper gives 403

The link to the paper mentioned in the README.md http://www.cs.chalmers.se/~rjmh/Papers/pretty.ps gives me a "403 Forbidden" (right now).

Test Suite

There is no test suite for Pretty. Add one. There is some testing code in the repo, but it needs fixing up and getting into an easily callable state.

The `testLargeDoc` doesn't test anything

@Peaker added the testLargeDoc test in #9, to test for stack overflows when running pretty on large documents.

Ever since ghc-7.8.4, GHC sets the default stack size to 80% of physical memory, instead of the previous 8MB. This makes the test useless as it currently is.

It would be better to compile the test into a separate executable, and run it with something like +RTS -K10 (very small stack size works, nice!). Then, the document can be made smaller as well, which makes the test much faster to run.
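A rough sketch of what the body of such a standalone test could look like. The names `largeDoc` and `renderedSize` are placeholders, and the shape of the document is an assumption rather than a copy of the existing test.

```haskell
import Text.PrettyPrint (Doc, text, vcat, render)

-- A document with n one-word lines; n is the knob that can be turned
-- down once the stack limit is small.
largeDoc :: Int -> Doc
largeDoc n = vcat (replicate n (text "Hello"))

-- Force the whole rendering by measuring the length of the output.
renderedSize :: Int -> Int
renderedSize = length . render . largeDoc
```

Wrapped in a `main` and built as its own executable with -rtsopts, a check like this could then be run under a reduced stack limit via +RTS -K, as suggested above.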

NFData instance

Would it be feasible to add an NFData instance for the Doc type, etc.? This would probably be needed for #2. It does add a new dependency, but both deepseq and pretty are shipped with GHC so I don't think it's an unreasonable dependency.

Bugfix: overlap and f?(cat|sep)

The pretty source code currently contains two TODOs:

    -- XXX: TODO: PRETTY: Used to use True here (but GHC used False...)
     nilAboveNest False k (reduceDoc (vcat ys))

    -- XXX: TODO: PRETTY: Used to use True here (but GHC used False...)
     `mkUnion` nilAboveNest False k (fill g (y:ys))

I think we should go back to using True. From https://mail.haskell.org/pipermail/libraries/2008-June/009991.html (commit 1e50748):

2) Bugfix: overlap and f?(cat|sep)

The specification for cat/sep:
  * oneLiner (hcat/hsep ps)
   `union`
    vcat ps [*]

But currently cat, sep, fcat and fsep attempt to overlap the second  
line with the first one, i.e. they use
`foldr ($$) empty ps' instead of `foldr ($+$) empty ps' [*]. I assume  
this is a mistake.

This bug can lead to situations, where the line in the right argument  
of Union is actually longer:

 > prettyDoc$ cat [ text "a", nest 2 ( text "b") ]
 >> text "a"; union
 >>           (text "b"; empty)
 >>           (nilabove; nest 1; text "b"; empty)

 > renderStyle (Style PageMode 1 1) $ cat [ text "a", nest 2 ( text "b") ]
 >> "a b"

In the implementation, we call `nilAbove False' instead of `nilAbove  
True' (see patch).
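For readers less familiar with the overlap behaviour being discussed, the difference between `$$` and `$+$` can be seen in isolation with the example from the library's own documentation. The names `overlapping` and `stacked` are just labels for this sketch.

```haskell
import Text.PrettyPrint (Doc, text, nest, render, ($$), ($+$))

overlapping, stacked :: Doc
-- '$$' may place the second document on the first line when the first
-- line ends before the column where the second one starts.
overlapping = text "hi" $$  nest 5 (text "there")
-- '$+$' always starts a new line.
stacked     = text "hi" $+$ nest 5 (text "there")
```

Rendering the two shows `overlapping` on a single line and `stacked` on two, which is the distinction the `foldr ($$)` versus `foldr ($+$)` remark above relies on.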

Use `Semigroup((<>))` and define Monoid/Semigroup instances

Ideally this needs to happen in time for GHC 8.0 as pretty is bundled with GHC 8

See ekmett/ansi-wl-pprint@dd40c61 for how the (<>) part can be handled

The Semigroup part is a bit tricky but doable as well. I'll probably prepare a PR as that's easier than to explain how to do it =)
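From the user's side, the end state would look roughly like the sketch below. It assumes a pretty release that ships Semigroup and Monoid instances for Doc and a base whose Prelude exports `<>`; on older setups it will not compile. The names `joined` and `joinedAll` are illustrative.

```haskell
-- Hide pretty's own operator and use the Semigroup one from Prelude.
import Text.PrettyPrint hiding ((<>))

joined :: Doc
joined = text "a" <> text "b"

-- 'mempty' is expected to behave like 'empty'.
joinedAll :: Doc
joinedAll = mconcat [text "a", mempty, text "b", text "c"]

-- Note: pretty declares its (<>) as infixl 6 while the Semigroup (<>)
-- is infixr 6, so mixing it with (<+>) needs explicit parentheses.
spaced :: Doc
spaced = (text "a" <> text "b") <+> text "c"
```

The fixity mismatch noted in the comment may be part of what makes the migration tricky.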

/cc @ekmett

Cut a release for GHC 8.4

The GHC 8.4.1 release is quickly approaching and I would like to have all of the submodules finalized by the next alpha. For this we'll need a new release.

Fails to compile with GHC 7.0.4 and GHC 7.2.2 since 1.1.2.0

Here are the build errors that resulted:

for GHC 7.0

cabal: The package pretty-1.1.2.1 requires the following language extensions
 which are not supported by ghc-7.0.4: DeriveGeneric
 cabal: Error: some packages failed to install:
 pretty-1.1.2.1 failed during the configure step.

and for GHC 7.2

 src/Text/PrettyPrint/HughesPJ.hs:78:1:
     deepseq-1.4.0.0:Control.DeepSeq can't be safely imported! The module itself isn't safe.
 cabal: Error: some packages failed to install:
 pretty-1.1.1.3 failed during the building phase.

This breaks build plans for GHC 7.0 and GHC 7.2; the easiest way to fix this would be to add a base >= 4.5 constraint to future releases (I've already fixed up the metadata on Hackage for the past two releases) and effectively drop support for those GHCs.

Support ByteString in TextDetails

As the title suggests. Could also consider supporting Text.

Optimized rendering function for infinite band width case

In the case of rendering for an infinite band width we needn't do any layout (e.g. backtracking) at all. Providing a rendering function which optimizes for this case should improve performance in these cases considerably. In particular GHC would benefit greatly from this as it uses pretty to produce large quantities of assembler code.

exports and (cereal) Serialize instance

Hi,

For a program I am developing, I need to serialize a data type which contains Doc's in it. I am using cereal, but obviously can't give Doc a Serialize instance because it is an abstract type (i.e. constructors not exported). I am currently using the following instance:

instance Serialize Doc where
    put = put . show
    get = liftM text get

This instance goes through the Serialize instance of String and it clearly loses some information (like nesting and optimal width, if the Doc is used within another Doc after serialisation/deserialisation).

One easy solution is exporting the constructors of Doc from a .Internal module. Failing that, what about deriving a GHC.Generics.Generic instance for Doc so I can use generic deriving mechanisms to get the Serialize instance?

Either way, I am happy to prepare a patch. Just let me know if you want one.

Thanks, Ozgur.

Modify structure of an existing Doc

Currently, the Doc constructors are not exported. It’s therefore not possible to reorganise the structure of an existing Doc.

Sometimes it could be handy. In my case, I'm parsing an AST to SQL. A query with a join clause is generally written like this:

SELECT *
FROM Table1
INNER JOIN Table2
ON Table1.table2Id = Table2.table2Id

The join starts from "Table1". So, the Haskell code will look something like this (this isn't real code, just an example to illustrate what happens with regard to pretty-printing):

parseFrom :: From -> Doc
parseFrom (From join) = "FROM" <+> parseJoin join

parseJoin :: Join -> Doc
parseJoin (Join table1 table2 clause) =
    parseTable table1 $+$ ("INNER JOIN" <+> parseTable table2) $+$ parseClause clause

This code will generate the following (when included in the whole):

SELECT *
FROM Table1
     INNER JOIN Table2
     ON Table1.table2Id = Table2.table2Id

To get the inner join aligned as desired, a first approach would be to change the above code.
However, (correct me if I’ve missed something!) I don't see another solution than parsing the join directly in the parseFrom function, which would reduce the modularity (imagine we need to parse a join somewhere else than in a FROM).
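The indentation can be reproduced without the AST types. In the sketch below, `joinDoc` is a stand-in for the result of parseJoin, and the other two names are illustrative; `<+>` keeps its whole right operand to the right of "FROM", so every later line is shifted by five columns.

```haskell
import Text.PrettyPrint (Doc, text, render, ($+$), (<+>))

-- Stand-in for the issue's 'parseJoin' result: a two-line document.
joinDoc :: Doc
joinDoc = text "Table1" $+$ text "INNER JOIN Table2"

-- What the modular code produces: the second line is indented.
besideFrom :: Doc
besideFrom = text "FROM" <+> joinDoc

-- The desired layout, which needs access to the pieces of the join.
flatFrom :: Doc
flatFrom = (text "FROM" <+> text "Table1") $+$ text "INNER JOIN Table2"
```

The second definition only works because the two lines of the join are available separately, which is exactly what is lost once parseJoin has returned a single Doc.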

If it would be possible to have access to the Doc constructors one could do something like this:

-- | Return the first document contained in a document.
--   
--   It can be used in combination with 'docTail' to modify the composition
--   of a document.
docHead :: Doc -> Doc
docHead Empty                = Empty
docHead (NilAbove d)         = d
docHead d@(TextBeside _ _ _) = d
docHead (Nest _ d)           = d
docHead (Union _ d)          = d
docHead NoDoc                = NoDoc
docHead (Beside d _ _)       = d
docHead (Above d _ _)        = d 

-- | Return the second document contained in a document if existing.
--   Otherwise, return 'Nothing'.
--   
--   It can be used in combination with 'docHead' to modify the composition
--   of a document.
docTail :: Doc -> Maybe Doc
docTail (Union _ d)        = Just d
docTail (Beside _ _ d)     = Just d
docTail (Above _ _ d)      = Just d
docTail _                  = Nothing

Now, to get the desired result the parseFrom function becomes:

parseFrom :: From -> Doc
parseFrom (From join) =
    "FROM" <+> docHead doc $+$ fromMaybe empty (docTail doc)
  where
    doc = parseJoin join

Of course, rather than providing access to the Doc constructor, an alternative would be to include functions like docHead and docTail in the PrettyPrint library.

Support more instances of Pretty typeclass

Should support:

Int[8,16,32,64]
Word
Word[8,16,32,64]
ByteString - Lazy & Strict
Text?
(Doc a)

Next release?

Just wondering when the next release will be, so that I can start depending on the annotation stuff :)

Document the word "ribbon"

The word "ribbon" does not appear in the linked paper, and it is not defined anywhere in the documentation or the source code. Seems important tho...

Replace GHC copy

GHC has an internal copy of pretty. I've gone and merged any improvements from GHC into pretty, so we can be confident that the code in pretty is the best code. However, GHC uses some different base types for performance. We need to offer better base types than String, say ByteString and Text, and also offer a builder approach to the render function for performance. After that improvement we should be able to change GHC to use the pretty library.

prettyclass should be marked deprecated in favor of pretty

http://hackage.haskell.org/package/prettyclass-1.0.0.0

Missing Pretty instance for Doc itself

should be a straightforward pPrint = id. I'm wondering if there is a particular reason behind not implementing this instance?

Missing exportations in the Text.PrettyPrint module

The documentation for Text.PrettyPrint says:

This module should be used as opposed to the HughesPJ module. Both are equivalent though as this module simply re-exports the other.

However, for example the function maybeParens defined in Text.PrettyPrint.HughesPJ is not exported by Text.PrettyPrint:

import Text.PrettyPrint

foo :: Bool -> Doc -> Doc
foo = maybeParens

-- Test.hs:5:7: error:
--    Variable not in scope: maybeParens :: Bool -> Doc -> Doc

Note also that in the above documentation, the module name 'HughesPJ' should be 'Text.PrettyPrint.HughesPJ'.

pretty version: 1.1.3.3.
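Until the export lists agree, a workaround that should behave the same is to import the function from the module that defines it, as in this small sketch (the name `foo` is taken from the example above):

```haskell
import Text.PrettyPrint (Doc, text, render)
import Text.PrettyPrint.HughesPJ (maybeParens)

-- Same definition as above, now in scope via the explicit import.
foo :: Bool -> Doc -> Doc
foo = maybeParens
```

Both modules expose the same Doc type, so the two imports can be mixed freely.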

Adding emptyLine document

Empty lines are useful to separate blocks in printed programs, but unless I'm getting the API wrong, currently the only way to produce an empty line is to use text "". empty doesn't work, because it's the unit of $$ and $+$.

The problem with text "" is that when combined with nest, it produces trailing white space. For example, this:

nest 4 (text "" $+$ text "block")

Generates this:

    $
    block$
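The trailing whitespace is easy to reproduce, and one blunt stopgap is to clean up the rendered string afterwards. The helper `stripTrailing` below is hypothetical and not part of pretty; it is post-processing rather than a fix to the layout itself.

```haskell
import Data.Char (isSpace)
import Data.List (dropWhileEnd, intercalate)
import Text.PrettyPrint (Doc, text, nest, render, ($+$))

-- The document from the example above.
block :: Doc
block = nest 4 (text "" $+$ text "block")

-- Hypothetical helper: drop whitespace at the end of every rendered
-- line. A final trailing newline in the input, if any, is not kept.
stripTrailing :: String -> String
stripTrailing = intercalate "\n" . map (dropWhileEnd isSpace) . lines
```

This costs an extra pass over the output and cannot distinguish intentional trailing spaces, so it is no substitute for a proper empty-line primitive.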

I propose implementing an emptyLine document, like this:

emptyLine :: Doc a
emptyLine = NilAbove Empty

This works fine, for example, this program:

print $ nest 4 $ emptyLine $$ text "blah"
print $ nest 4 $ text "blah" $+$ emptyLine

prints this:

$
    blah$
    blah$
$
$

Which is not bad, but now it's printing an extra new line at the end.

So I think there are two problems:

This use of NilAbove is actually invalidating some internal invariants. Even though I couldn't manage to produce a broken example with this new doc, in the code it's mentioned that:
```
1) The argument of NilAbove is never Empty. Therefore
   a NilAbove occupies at least two lines.
```
Which this change clearly invalidates.
We need to make some changes in some of the combinators to handle extra newline printed in the case of text "blah" $+$ emptyLine. I think it's printed because of this invalidated invariant.

So at this point I'm hoping to get some comments and ideas. Do you think this is a correct way of doing this? Is there any other way to produce new lines without producing trailing white spaces?

Also, some help with changing internals would be great.

Thanks.

UPDATE: I just tried this program:

print $ nest 4 $ emptyLine $$ (nest 4 $ text "blah")
print $ nest 4 $ (nest 4 $ text "blah") $+$ emptyLine

And it produced this:

$
        blah$
        blah$
$
    $

It's good to see that the next line after blah is not indented. The last indented empty line should be fixed when we remove that line.

UPDATE 2: Here's a broken example:

print $ nest 4 $ emptyLine <> (nest 4 $ text "blah")
print $ nest 4 $ (nest 4 $ text "blah") <> emptyLine

Output:

$
    blah$
        blah$
            $

An emptyLine should never be indented; that's the whole point of it. The documentation sometimes mentions "height" and "width", so maybe we should be able to say "emptyLine has height 1 and width 0".

Build issues with tests?

Any ideas how to compile the package?

➜  pretty git:(master) ✗ cabal install --enable-tests                                                           
Resolving dependencies...
cabal: internal error: could not construct a valid install plan.
The proposed (invalid) plan contained the following problems:
The following packages are involved in a dependency cycle QuickCheck-2.7.6, template-haskell-2.9.0.0, pretty-1.1.3.1
Proposed plan:
Configured QuickCheck-2.7.6 (.fake.QuickCheck-2.7.6)
Configured pretty-1.1.3.1 (.fake.pretty-1.1.3.1)
Configured primitive-0.5.4.0 (.fake.primitive-0.5.4.0)
Configured random-1.1 (.fake.random-1.1)
Configured template-haskell-2.9.0.0 (.fake.template-haskell-2.9.0.0)
Configured tf-random-0.5 (.fake.tf-random-0.5)
PreExisting array-0.5.0.0 (array-0.5.0.0-b8a3e03cc1fe2faa719c34f245086f0e)
PreExisting base-4.7.0.1 (base-4.7.0.1-1a55ebc8256b39ccbff004d48b3eb834)
PreExisting rts-1.0 (builtin_rts)
PreExisting containers-0.5.5.1 (containers-0.5.5.1-0d8db9193d3e3371e0142bcc8a4a0721)
PreExisting deepseq-1.3.0.2 (deepseq-1.3.0.2-8f63133c1b77f3b3190f04893cf340e4)
PreExisting ghc-prim-0.3.1.0 (ghc-prim-0.3.1.0-954cb57749cf319beafdc89b3415422c)
PreExisting integer-gmp-0.5.1.0 (integer-gmp-0.5.1.0-d42e6a7874a019e6a0d1c7305ebc83c4)
PreExisting old-locale-1.0.0.6 (old-locale-1.0.0.6-09baf1dbc5be8338e5eba7c5bb515505)
PreExisting time-1.4.2 (time-1.4.2-d6766dce59812a4b19375d9595549a8b)
PreExisting transformers-0.3.0.0 (transformers-0.3.0.0-16a97696ae672940f1523b262e566ba5)

[Documentation] ribbonsPerLine

Currently ribbonsPerLine is described as

Ratio of ribbon length to line length

I think the correct description should be:

Ratio of line length to ribbon length

or equivalently:

Number of ribbons that fit on a line
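A small calculation supports the proposed wording, assuming the renderer derives the ribbon length as the line length divided by ribbonsPerLine and rounded, which is how I read fullRender. The function name `ribbonLength` is made up for this sketch.

```haskell
import Text.PrettyPrint (Style(..), style)

-- Ribbon length implied by a style, under the assumption stated above:
-- line length divided by ribbonsPerLine, rounded to the nearest Int.
ribbonLength :: Style -> Int
ribbonLength s = round (fromIntegral (lineLength s) / ribbonsPerLine s)
```

For the default style (line length 100, ribbonsPerLine 1.5) this gives 67, so a larger ribbonsPerLine means a shorter ribbon, consistent with "number of ribbons that fit on a line".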

Relax upper bound on `deepseq` to allow `deepseq-1.4`

Add "depending on what fits" after "either X or Y", or similar

I was rather puzzled by these Haddock lines until I read the CHANGELOG and it was clear that it means "one or the other, depending on what fits". Could we please either

add "depending on what fits", or perhaps even better
use "X if it fits, Y otherwise"

Thanks

pretty/src/Text/PrettyPrint/Annotated/HughesPJ.hs, line 723 in e646037:

-- | Either 'hsep' or 'vcat'.

pretty/src/Text/PrettyPrint/Annotated/HughesPJ.hs, line 727 in e646037:

-- | Either 'hcat' or 'vcat'.

pretty/src/Text/PrettyPrint/HughesPJ.hs, line 390 in e646037:

-- | Either 'hsep' or 'vcat'.

pretty/src/Text/PrettyPrint/HughesPJ.hs, line 395 in e646037:

-- | Either 'hcat' or 'vcat'.
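A short experiment illustrates the "X if it fits, Y otherwise" reading for sep and cat. The helper `renderAt` and the two widths are arbitrary choices for the demonstration.

```haskell
import Text.PrettyPrint (Doc, text, sep, cat, renderStyle, style, Style(..))

-- Three three-letter fragments.
parts :: [Doc]
parts = map text ["aaa", "bbb", "ccc"]

-- Render at a given line length; ribbonsPerLine = 1 makes the ribbon
-- as wide as the line, so only the line length decides what fits.
renderAt :: Int -> Doc -> String
renderAt w = renderStyle style { lineLength = w, ribbonsPerLine = 1 }
```

At width 80 both combinators choose the horizontal layout (hsep for sep, hcat for cat); at width 5 both fall back to the vertical one.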

pretty-1.1.3.1 fails Haddock phase

Citing from http://hydra.cryp.to/build/618012/nixlog/2/raw:

Running Haddock for pretty-1.1.3.1...
Running hscolour for pretty-1.1.3.1...
Preprocessing library pretty-1.1.3.1...
Preprocessing test suite 'test-pretty' for pretty-1.1.3.1...
Preprocessing library pretty-1.1.3.1...

src/Text/PrettyPrint/Annotated/HughesPJ.hs:1082:31:
    parse error on input ‘-- ^ this seems wrong, but remember that it's
                              -- working backwards at this point’

Quadratic performance issues

Try this:

import Text.PrettyPrint
import System.Environment
main = do
  [ s ] <- getArgs
  print $ iterate ( \ x -> fsep [ text "a" , x <+> text "b" ] ) empty !! read s

on my machine:

input | runtime in sec

100   |  0.1
200   |  1.1
300   |  4.4
400   | 11.4

This testcase is simplified from https://ghc.haskell.org/trac/ghc/ticket/7666
I think this bug (gut feeling - some quadratic behaviour) has been sitting there for a long time.
I do think this is serious. I think it also hurts haddock.

pretty-1.1.3.2 for ghc-8.0.0.20160111 and pretty-1.1.2.0 for ghc-7.10.3