kowainik / tomland

🏝 Bidirectional TOML serialization

Home Page: https://kowainik.github.io/posts/2019-01-14-tomland

License: Mozilla Public License 2.0

Haskell 99.97% Nix 0.03%

haskell toml-parser toml configuration profunctors bidirectional hacktoberfest

tomland's Issues

Implement type-safe version of TOML ast

Currently, the TOML Value data type allows values of different types inside an Array. I propose solving this problem by introducing a new TValue data type (or renaming the current Value to UValue, i.e. untyped value) that uses GADTs and the DataKinds extension. Basically:

data TomlType = TBool | ...

data Value (t :: TomlType) where
    Bool :: Bool -> Value TBool
    ...

The approach starting from this slide should be used:

http://slides.com/fp-ctd/lecture-15#/8

I guess we should have UValue and typeCheck :: UValue -> Maybe Value function. Not sure though...
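A minimal sketch of how the typed/untyped split and typeCheck could look. Everything beyond the names TomlType, Value, UValue and typeCheck is an assumption made for illustration (in particular AnyValue as the result wrapper, sameType, and the reduced set of constructors); it is not the library's actual definition.

```haskell
{-# LANGUAGE DataKinds      #-}
{-# LANGUAGE GADTs          #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE TypeOperators  #-}

import Data.Type.Equality ((:~:) (Refl))

-- | Type tags, promoted to the kind level by DataKinds (reduced set).
data TomlType = TBool | TInt | TString | TArray

-- | Untyped value as a parser would produce it: arrays may be heterogeneous.
data UValue
    = UBool Bool
    | UInt Integer
    | UString String
    | UArray [UValue]

-- | Typed value: all elements of an 'Array' share the same type index.
data Value (t :: TomlType) where
    Bool   :: Bool      -> Value 'TBool
    Int    :: Integer   -> Value 'TInt
    String :: String    -> Value 'TString
    Array  :: [Value t] -> Value 'TArray

-- | Hypothetical wrapper that hides the type index.
data AnyValue where
    AnyValue :: Value t -> AnyValue

-- | Witness that two typed values carry the same type index.
sameType :: Value a -> Value b -> Maybe (a :~: b)
sameType (Bool _)   (Bool _)   = Just Refl
sameType (Int _)    (Int _)    = Just Refl
sameType (String _) (String _) = Just Refl
sameType (Array _)  (Array _)  = Just Refl
sameType _          _          = Nothing

-- | Returns Nothing when an array mixes element types.
typeCheck :: UValue -> Maybe AnyValue
typeCheck (UBool b)   = Just (AnyValue (Bool b))
typeCheck (UInt n)    = Just (AnyValue (Int n))
typeCheck (UString s) = Just (AnyValue (String s))
typeCheck (UArray us) = do
    vs <- traverse typeCheck us
    case vs of
        []                -> Just (AnyValue (Array ([] :: [Value 'TBool])))
        AnyValue v : rest -> do
            typed <- traverse (cast v) rest
            Just (AnyValue (Array (v : typed)))
  where
    -- Coerce a wrapped value to the type of the first array element.
    cast :: Value a -> AnyValue -> Maybe (Value a)
    cast v (AnyValue w) = case sameType w v of
        Just Refl -> Just w
        Nothing   -> Nothing
```

Because the Array index does not record the element type in this sketch, nested arrays with different inner types still type check; only a flat mix such as ["one", 123] is rejected.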

Also, several problems need to be solved. For example, the following type should also have typed and untyped versions, but it's not obvious how to modify it... Let's discuss and see whether it makes sense!

data TOML = TOML
    { tomlPairs       :: HashMap Key     Value
    , tomlTables      :: HashMap TableId TOML

Support array parsing in bidirectional conversion

Introduce unidirectional decoder (and probably encoder)

Sometimes you only need one direction, and a plain parser or printer is usually easier to write than a bidirectional converter. So to make things easier it would be better to also have unidirectional encoders and decoders. It's not clear, though, how to support the encode and decode functions for both the bidirectional and unidirectional versions...

Non-bijectional 'dimapBijection'

dimapBijection works well for things like Num and newtypes, but if you have functions like:

parseMyType :: Text -> Maybe MyType
showMyType  :: MyType -> Text

you will have problems dimapping just this pair of functions over the str converter 😞

So a new combinator should be created. Also, since dimapBijection is used very often, let's give it a shorter name (just dimap) and call the new function mdimap (which stands for maybe dimap).
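To make the difference between the two combinators concrete, here is a rough sketch over a deliberately simplified converter type. Conv, str, Color and the helper functions are hypothetical stand-ins; the real converter type in the library is more general, so treat the signatures as assumptions.

```haskell
-- | Simplified converter: reading may fail, writing always succeeds.
data Conv c a = Conv
    { convRead  :: String -> Maybe a
    , convWrite :: c -> String
    }

-- | Total mapping in both directions (fits newtypes and numeric casts).
dimap :: (c -> d) -> (a -> b) -> Conv d a -> Conv c b
dimap f g (Conv r w) = Conv (fmap g . r) (w . f)

-- | Like 'dimap', but the read-side function is allowed to fail.
mdimap :: (c -> d) -> (a -> Maybe b) -> Conv d a -> Conv c b
mdimap f g (Conv r w) = Conv (\s -> r s >>= g) (w . f)

-- | Converter for plain strings.
str :: Conv String String
str = Conv Just id

data Color = Red | Green
    deriving (Eq, Show)

showColor :: Color -> String
showColor = show

parseColor :: String -> Maybe Color
parseColor "Red"   = Just Red
parseColor "Green" = Just Green
parseColor _       = Nothing

-- | Built from an existing show/parse pair, no bijection required.
color :: Conv Color Color
color = mdimap showColor parseColor str
```

With this shape, convRead color returns Nothing for unknown input instead of forcing parseColor to be total.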

Report proper type checking error during parsing

For example, if I have TOML like this:

list = ["one", "two"]

[table.name.2]
  2Inner = 42

I see no problem. But with the following:

list = ["one", 123]

[table.name.2]
  2Inner = 42

I get a message that is not really helpful:

3:1:
  |
3 | [table.name.2]
  | ^
Can't type check value!

What I want to see is something like this:

1:15-17:
  |
1 | list = ["one", 123]
  |                ^^^
Can't type check value! Expected type String, actual type Int

Depends on #21.

Implement our own 'PrefixTree'

After thinking a little more about how to store Key components, I came to the conclusion that we really want to store a single component on each level of the Toml data type instead of storing the whole key.

Currently we have the following data types:

newtype Key = Key { unKey :: NonEmpty Text }

data TOML = TOML
    { tomlPairs  :: HashMap Key AnyValue
    , tomlTables :: HashMap Key TOML
    }

So each level can contain a whole key. This is inconvenient: if you want to insert a key into such a map, you need to scan the whole HashMap in order to split by the greatest common prefix.

On the other hand, storing only single key component per level simplifies things a lot!

So we actually want to have something like this:

data TOML = TOML
    { tomlPairs  :: HashMap Key AnyValue
    , tomlTables :: HashMap Text TOML  -- here 'Text' is only single component of 'Key'
    }

But with such a scheme it's harder to implement pretty-printing for tables like this:

[foo.bar]
baz = 3

So we would have to analyze nested Tomls to avoid printing the table this way:

[foo]
  [bar]  # we don't want this
    baz = 3

Also, during parsing we would need to split every table key by component and insert the components into this HashMap in some smart way. This is also undesirable.

Currently I'm having hard problems with these issues:

So let's implement our own data structure! Basically, what we want is something like a HashMap that is monomorphic in Key but joins nodes by common prefix.

I think that data structure can look like this:

data PrefixTree a
    = Leaf (NonEmpty Text) a
    | Branch (NonEmpty Text) (PrefixTree a) (PrefixTree a)

type PrefixMap a = Maybe (PrefixTree a)

And we need to have the following basic functions and instances:

single :: Key -> a -> PrefixTree a
insert :: Key -> a -> PrefixTree a -> PrefixTree a
lookup :: Key -> PrefixTree a -> Maybe a
delete :: Key -> PrefixTree a -> Maybe (PrefixTree a)

instance Semigroup (PrefixTree a)  -- or PartialSemigroup

Support tables in bidirectional conversion

Add parsing of _ in numbers

TOML allows you to specify numbers like 1_000_000. This is not supported currently. But probably should be.
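As a rough illustration of the rule (each underscore must sit between two digits), here is a standalone sketch that uses only base rather than the project's parser combinators. The function name parseUnderscored is made up for this example, and the sketch does not check for leading zeros.

```haskell
import Data.Char (isDigit)

-- | Parse a decimal integer that may contain underscores between digits,
-- e.g. "1_000_000". Leading, trailing and doubled underscores are rejected.
parseUnderscored :: String -> Maybe Integer
parseUnderscored s = case s of
    '-' : rest -> negate <$> digits rest
    '+' : rest -> digits rest
    _          -> digits s
  where
    digits str
        | null str                                 = Nothing
        | all isDigit stripped, validUnderscores str = Just (read stripped)
        | otherwise                                = Nothing
      where
        stripped = filter (/= '_') str

    -- Pair every character with its left and right neighbour; the padding
    -- underscores make a leading or trailing underscore fail the check.
    validUnderscores str =
        and (zipWith3 ok ('_' : str) str (drop 1 str ++ "_"))

    ok prev c next = c /= '_' || (isDigit prev && isDigit next)
```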

Property-based tests for encoder and decoder

What this test should look like:

Create big data type with all possible types supported by tomland.
Write converter for this type.
Ensure decode . encode == id.

Introduce consistent renaming + some refactoring

The current naming is neither perfect nor consistent. At the beginning I tried to name things according to the TOML specification, but now I see that it's better to stay closer to Haskell. This leads to the following naming convention:

Constructors of Value and ValueType change from Int, String, Float to Integer, Text, Double. ValueType is renamed to TValue.
Each parser should have a P suffix.
Each prism should be capitalised and start with an underscore.
Each bidirectional converter should be named after its constructor, without any suffix. When the name clashes with a Prelude name, it should have a T suffix (the convention for all BiToml configurations).

Also, I want to change module structure a bit to make it more modular:

Split the Toml.Types module into Toml.Types.Value, Toml.Types.AnyValue, Toml.Types.TOML and Toml.Types.UValue. This change is also motivated by the fact that the Toml.Types module is becoming very big.

Let me know what you think!

Write 'tomland' tutorial

This tutorial should cover:

What is bidirectional conversion?
How to write bidirectional converter for simple data type with fields Integer, Text and [Double]?
How to use specification value to encode/decode toml?
How to parse tables?
How to parse nested objects?
How to parse list of objects (using array of tables)?

Something else we need to cover?..

Introduce converter for 'Maybe'

It's currently not possible to parse an absent field. A Maybe converter should solve this problem.

Add 'parse . prettyToml ≡ id' property test

Things needed to be done:

Generate arbitrary AnyValue
Generate arbitrary valid TOML

Improve 'Date' generator for property-based tests

Add IO functions to parse toml

Enhance type checking to get exact type mismatch error

Currently if types mismatch we get Nothing. What we actually want is a pair of expected type (type of first element in array) and actual type (first mismatch).
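A small sketch of the kind of result this asks for, written against a simplified untyped value type. TypeMismatch, typeOf and checkArray are hypothetical names for this example, not existing library functions.

```haskell
data TomlType = TBool | TInteger | TDouble | TText | TArray
    deriving (Eq, Show)

-- | Simplified untyped value (assumption; dates are omitted).
data UValue
    = UBool Bool
    | UInteger Integer
    | UDouble Double
    | UText String
    | UArray [UValue]

-- | Expected type (taken from the first element) and first mismatching type.
data TypeMismatch = TypeMismatch
    { expected :: TomlType
    , actual   :: TomlType
    } deriving (Eq, Show)

typeOf :: UValue -> TomlType
typeOf (UBool _)    = TBool
typeOf (UInteger _) = TInteger
typeOf (UDouble _)  = TDouble
typeOf (UText _)    = TText
typeOf (UArray _)   = TArray

-- | Check that all elements have the type of the first one,
-- reporting the first element type that differs.
checkArray :: [UValue] -> Either TypeMismatch ()
checkArray []       = Right ()
checkArray (x : xs) = case filter (/= expectedT) (map typeOf xs) of
    []      -> Right ()
    bad : _ -> Left (TypeMismatch expectedT bad)
  where
    expectedT = typeOf x
```

For the array ["one", 123] this yields Left (TypeMismatch TText TInteger), which is enough information to render a message like "Expected type String, actual type Int".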

Support sum types in bidirectional conversion

Let's say, we have a type like this:

data Either l r = Left l | Right r

It should be possible to write a bidirectional converter for such a type, ideally with the help of the Alternative type class. I think in TOML this should look like this:

foo.Left = 42
bar.Right = true

Specifying both constructors should result in parsing error.
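Here is a decoding-only sketch of that behaviour. It assumes a flat list of dotted keys with raw values in Haskell Read syntax (so True rather than TOML's true); Table, field and decodeEither are invented for this example. The Alternative instance of Maybe picks a constructor, while the "both keys present" case is checked separately because plain <|> would silently prefer the left branch.

```haskell
import Control.Applicative ((<|>))
import Text.Read (readMaybe)

-- | Hypothetical flat view of a TOML document: dotted key to raw value.
type Table = [(String, String)]

-- | Look up a key and parse its raw value.
field :: Read a => String -> Table -> Maybe a
field key table = lookup key table >>= readMaybe

-- | Decode 'Either' from the keys "<key>.Left" and "<key>.Right".
decodeEither :: (Read l, Read r) => String -> Table -> Either String (Either l r)
decodeEither key table
    | hasLeft && hasRight = Left ("both constructors given for " ++ key)
    | otherwise           = maybe (Left ("no valid constructor for " ++ key)) Right choice
  where
    leftKey  = key ++ ".Left"
    rightKey = key ++ ".Right"
    hasLeft  = leftKey  `elem` map fst table
    hasRight = rightKey `elem` map fst table
    choice   = (Left <$> field leftKey table) <|> (Right <$> field rightKey table)
```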

Add parsing of hex, oct and bin integer numbers

TOML allows specifying numbers with 0x, 0o and 0b prefixes. megaparsec has primitives for such things. We should enhance our integer parser.
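For reference, the digit handling behind the three prefixes can be sketched with base alone. This does not use megaparsec; parsePrefixed and fromBase are names made up for this example, and underscores or signs are out of scope here.

```haskell
import Data.Char (digitToInt, isHexDigit, isOctDigit)
import Data.List (foldl')

-- | Parse an integer written with a 0x, 0o or 0b prefix.
parsePrefixed :: String -> Maybe Integer
parsePrefixed ('0' : 'x' : ds) = fromBase 16 isHexDigit ds
parsePrefixed ('0' : 'o' : ds) = fromBase 8 isOctDigit ds
parsePrefixed ('0' : 'b' : ds) = fromBase 2 (`elem` "01") ds
parsePrefixed _                = Nothing

-- | Fold digits in the given base; fails on empty input or invalid digits.
fromBase :: Integer -> (Char -> Bool) -> String -> Maybe Integer
fromBase base isValid ds
    | not (null ds) && all isValid ds = Just (foldl' step 0 ds)
    | otherwise                       = Nothing
  where
    step acc c = acc * base + fromIntegral (digitToInt c)
```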

Improve 'String' generator for property-based tests

Implement TOML validator

Not everything can be guaranteed statically. And some errors are not parsing errors. So we should write some validating function which checks for the following things:

Defining a key multiple times is invalid.

# DO NOT DO THIS
name = "Tom"
name = "Pradyun"

A key already defined may not be appended

# THIS IS INVALID
a.b = 1
a.b.c = 2

Can't have key and table with the same name.

# DO NOT DO THIS
a = 2

[a]
x = 2

Can't have duplicating table names

# DO NOT DO THIS

[a]
b = 1

[a]
c = 2

Can't have nested same table and keys

# DO NOT DO THIS EITHER

[a]
b = 1

[a.b]
c = 2

Attempting to append to a statically defined array, even if that array is empty or of compatible type, must produce an error at parse time.

# INVALID TOML DOC
fruit = []

[[fruit]] # Not allowed

Attempting to define a normal table with the same name as an already established array must produce an error at parse time.

# INVALID TOML DOC
[[fruit]]
  name = "apple"

  [[fruit.variety]]
    name = "red delicious"

  # This table conflicts with the previous table
  [fruit.variety]
    name = "granny smith"
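A partial sketch of such a validating function, covering only the first two rules (duplicate keys, and appending to a key that already holds a value). It works on the sequence of dotted keys in document order; Key as a list of components, KeyError and validateKeys are assumptions for this example, and tables and arrays of tables are not handled.

```haskell
import Data.List (isPrefixOf)

-- | A dotted key split into components, e.g. a.b.c is ["a", "b", "c"].
type Key = [String]

data KeyError
    = DuplicateKey Key        -- ^ the same key is defined twice
    | AppendToValue Key Key   -- ^ an earlier key is a strict prefix of a later one
    deriving (Eq, Show)

-- | Walk over keys in document order and collect violations.
validateKeys :: [Key] -> [KeyError]
validateKeys = go []
  where
    go _    []       = []
    go seen (k : ks) = errs ++ go (k : seen) ks
      where
        errs =
            [DuplicateKey k | k `elem` seen]
            ++ [AppendToValue p k | p <- seen, p /= k, p `isPrefixOf` k]
```

The remaining rules need table headers in the input as well, so a fuller version would probably validate a list of tagged entries rather than bare keys.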

Create EDSL for manually specifying TOML data types

Sometimes you don't want to specify TOML in a file; you want the configuration embedded in code. Currently you can specify everything manually using HashMaps and constructors:

myToml :: TOML
myToml = TOML (fromList
    [ (Key "a", Bool True)
    , (Key "list", Array [String "one", String "two"])
    , (Key "time", Array [Date $ Day (fromGregorian 2018 3 29)])
    ] ) myInnerToml mempty

myInnerToml :: HashMap TableId TOML
myInnerToml = fromList $ [ ( TableId (NonEmpty.fromList ["table", "name", "1"])
                           , TOML (fromList [ (Key "aInner", Int 1)
                                            , (Key "listInner", Array [Bool True, Bool False])
                                            ]) myInnerInnerToml mempty
                           )
                         , ( TableId (NonEmpty.fromList ["table", "name", "2"])
                           , TOML (fromList [ (Key "2Inner", Int 42)
                                            ]) mempty mempty
                           )
                         ]

But this is ugly and inconvenient. It would be great to have an EDSL so that the above can be written like this:

myToml :: TOML
myToml = configure $ do
  "a"    :=: True
  "list" :=: ["one", "two"]
  "time" :=: Day (fromGregorian 2018 3 29) -- maybe something more convenient here
  table "table.name.1" $ do
    "aInner"    :=: 1
    "listInner" :=: [True, False]
    ...

The last one looks more convenient and simpler to me.

Do not allow leading zeros in the decimal representation of integer values

As per TOML specifications, leading zeros are not allowed in the decimal representation of integer values.

Might be easier to implement after #17 is implemented.
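The check itself is small. A standalone sketch on plain strings (validDecimal is a made-up name, and underscores are not considered here):

```haskell
import Data.Char (isDigit)

-- | True when the string is a decimal integer without leading zeros.
-- A lone "0" (optionally signed) is allowed.
validDecimal :: String -> Bool
validDecimal s = case dropSign s of
    "0"     -> True
    '0' : _ -> False
    ds      -> not (null ds) && all isDigit ds
  where
    dropSign ('+' : r) = r
    dropSign ('-' : r) = r
    dropSign r         = r
```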

Add parsing of multiline strings

Multiline strings are quite useful. But it's not easy to parse them...

Rename 'Valuer' to 'SimplePrism'

And add more combinators (for something like Either if needed).

Rename `Bi` data types

I don't quite like names Bijection, Bi, BiToml and BiMap altogether because they all start with Bi prefix and from these type names it's not really clear what they mean and what is the difference (and the names are not precise enough). So I propose the following renaming:

I like name codec for what we previously called bidirectional converter.

Rename Bijection to Codec.

data Bijection r w c a = Bijection
    { biRead  :: r a
    , biWrite :: c -> w a
    }

Rename Bi to BiCodec.

type Bi r w a = Bijection r w a a

Rename BiToml to TomlCodec.

type BiToml a = Bi Env St a

Move BiMap under Bi/ directory and let's keep BiMap name for now.

data BiMap a b = BiMap
    { forward  :: a -> Maybe b
    , backward :: b -> Maybe a
    }

I think this naming scheme is good. The name TomlCodec gives hint to the user that this is a special case of Codec and every function that works with Codec should work with TomlCodec.

Implement mini-framework for unit testing TOML parsing

Pretty-printing doesn't require separate testing beyond the single round-trip property. But we need a convenient way to test parsing of the different parts of the TOML language. It would be great if we could parse everything from the TOML repository. We need to think about how to do this in a convenient way...

Here are the things we need to test currently:

Add parsing of literal strings

Current string parser should be implemented more properly. And we should add literal strings as well. Details are described here: #12 (comment)

Add proper parsing of floating point numbers

Or ensure we have things we need:

Add dates parsing

Parsing dates is hard 😞 But we really need them.

Add easier way to write custom `BiMap` for values

If I have custom data type with couple functions like this one:

data GhcVer = ...

showGhcVer  :: GhcVer -> Text
parseGhcVer :: Text -> Maybe GhcVer

Then writing custom BiMap for this data type is a little bit tedious:

    _GhcVer :: BiMap AnyValue GhcVer
    _GhcVer = BiMap
        { forward = \(AnyValue t) -> Toml.matchText t >>= parseGhcVer
        , backward = Just . AnyValue . Toml.Text . showGhcVer
        }

I want easier way to write such prisms.
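One possible shape for such a helper, sketched against simplified stand-ins for AnyValue and matchText (the real types are GADT-based and use Text, so this is an assumption about the idea, not the library API). mkTextBiMap and the GhcVer constructors are hypothetical.

```haskell
-- | Simplified stand-in for the library's AnyValue.
data AnyValue
    = AText String
    | AInteger Integer
    deriving (Eq, Show)

data BiMap a b = BiMap
    { forward  :: a -> Maybe b
    , backward :: b -> Maybe a
    }

-- | Extract the text payload, if the value is a text value.
matchText :: AnyValue -> Maybe String
matchText (AText t) = Just t
matchText _         = Nothing

-- | Build a value prism from a show/parse pair.
mkTextBiMap :: (a -> String) -> (String -> Maybe a) -> BiMap AnyValue a
mkTextBiMap toText fromText = BiMap
    { forward  = \v -> matchText v >>= fromText
    , backward = Just . AText . toText
    }

data GhcVer = Ghc802 | Ghc822
    deriving (Eq, Show)

showGhcVer :: GhcVer -> String
showGhcVer Ghc802 = "8.0.2"
showGhcVer Ghc822 = "8.2.2"

parseGhcVer :: String -> Maybe GhcVer
parseGhcVer "8.0.2" = Just Ghc802
parseGhcVer "8.2.2" = Just Ghc822
parseGhcVer _       = Nothing

-- | The tedious record from the issue collapses to one line.
_GhcVer :: BiMap AnyValue GhcVer
_GhcVer = mkTextBiMap showGhcVer parseGhcVer
```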

Improve 'AnyValue' generator for property-based tests

I'm talking about this generator:

tomland/test/Test/Toml/Gen.hs

Lines 52 to 64 in 6c60b40

-- | Generates random value of 'AnyValue' type.
genAnyValue :: MonadGen m => m AnyValue
genAnyValue = do
    let randB = Gen.bool
    let randI = toInteger <$> Gen.int (Range.constantBounded @Int)
    let randD = Gen.double $ Range.constant @Double (-1000000.0) 1000000.0
    let randT = Gen.text (Range.constant 0 256) Gen.alphaNum
    Gen.choice
        [ AnyValue . Bool    <$> randB
        , AnyValue . Integer <$> randI
        , AnyValue . Double  <$> randD
        , AnyValue . Text    <$> randT
        ]

Currently it doesn't generate everything we can parse. This generator should be expanded with the following:

nan and inf to double generator
Arrays
Different types of text after supporting string parser
Dates (after they're supported)

Add helpful bi combinators

Add utility converters for the following types:

Implement ADT which specifies TOML

According to this document:

https://github.com/toml-lang/toml

Add ability to convert between 'Toml' and user custom data types

What we want is to have some way to describe correspondence between user data types and TOML representation. Consider this Haskell data type:

data Node = Node { root :: Bool, labels :: [Int] }

Would be great to have an ability to specify correspondence in some way like this:

nodeToml :: TomlM Node
nodeToml = do
    root   <- bool "root"
    labels <- array @Int "labels"
    pure Node{..}

And then we can have two functions:

decode :: Toml -> TomlM a -> Either DecodeError a
encode :: a -> TomlM a -> Toml

Difficult questions

How this data type should be encoded in TOML?

Like this:

node.root = true
node.labels = [ 1, 2, 3 ]

or like this:

[node]
root = true
labels = [ 1, 2, 3 ]

I think we should give an ability to specify which way user want.

How to encode sum types?

I propose the following strategy: if you have foo :: Either Int String then it should be either

foo.Left = 3

or

foo.Right = "bar"

If you have both keys, this should be a type error. And, again, give ability to specify foo as table. And this scheme easily can be extended if constructors have multiple fields.

What to do with arrays of different values?

In TOML it's possible to have:

foo = [ [1, 2, 3], [true, false] ]

But Haskell doesn't really have a standard data structure for such case... So should we care about supporting this case or not?..

Property tests for 'PrefixTree' and 'PrefixMap'

Our PrefixTree is not the simplest data structure... It would be great to have some property tests for it. Specifically:

PrefixMap generator. Ideally using recursive from hedgehog library.
InsertLookup: lookup k (insert k v t) ≡ Just v
InsertInsert: insert x a . insert x b ≡ insert x a (can be tested as lookup x (insert x a $ insert x b t) === Just a)

Implement pretty 'Show' instance for bi-exceptions

Support full-featured string parser

Relates to #16

See details here: #12 (comment)

Support tagged 'Codec' (add annotations to 'Codec')

Currently we have this:

-- | Parser for integer values.
integer :: Key -> TomlCodec Integer
integer = match matchInteger Int

word32 :: Key -> TomlCodec Word32
word32 = dimapNum . integer

The problem with word32 is that in case of error it will print type tag "Integer" where we actually want "Word32". But we also want to reuse our written parsers. So BiToml should be extended with some sort of annotations where I can write just:

word32 :: Key -> TomlCodec Word32
word32 = dimapA fromInteger toIntegral  . integer

Also useful for newtypes.

Support build with 'cabal' on Travis

I think this might be useful.

Build with GHC-8.4.2 on CI

It's even in nightly package. So might be quite useful.

Support array of tables

This is an important issue since an array of tables is probably the only reasonable way to specify data structures like List and Map in TOML. It's hard to support them, though...

Add 'PrintOptions' for 'tomlToText' function

Remove `maybeT` in favour of `dioptional`

See here:

https://github.com/kowainik/summoner/blob/master/src/Summoner/Config.hs#L165-L169

Implement basic pretty-printer

Implement TOML -> Text function which renders TOML data type into textual representation.

Implement basic parser

Very basic. Without:

Inline tables.
Multiline strings.
Arrays of tables.
Underscores in numbers.
Dates.
Literal strings.
nan and inf.

Just to make this package usable for basic things.

Add 'Semigroup' and 'Monoid' instances for 'TOML' type

Use `binary` from megaparsec when tomland starts using megaparsec-7.0.0

This will make the integer parser code shorter.

The current integer parser code:

tomland/src/Toml/Parser.hs

Lines 159 to 168 in 00a2cac

integerP :: Parser Integer
integerP = lexeme $ binary <|> octal <|> hexadecimal <|> decimal
  where
    decimal      = L.signed sc L.decimal
    binary       = try (char '0' >> char 'b') >> mkNum 2 <$> (some binDigitChar)
    octal        = try (char '0' >> char 'o') >> L.octal
    hexadecimal  = try (char '0' >> char 'x') >> L.hexadecimal
    binDigitChar = oneOf ['0', '1']
    mkNum b      = foldl' (step b) 0
    step b a c   = a * b + fromIntegral (digitToInt c)

binary from megaparsec: https://github.com/mrkkrp/megaparsec/blob/f3e26a42231dfac748045325a7f45ba07a04f069/Text/Megaparsec/Char/Lexer.hs#L442-L452

Add parsing of inline tables

This might be convenient with #13.

Replace 'HashMap' in 'TOML' with our 'PrefixTree'

This involves:

Change TOML type
Add insert and lookup functions for PrefixMap (basically just refactoring current implementation)
Change parser (removing merging, using insert from PrefixMap)
Change renderer

Write 'doctest' tests

Okay, so I want to put every testcase from TOML specification into our documentation and test them all with doctest. Once we figure out how to do it in neat way, we can implement those tests and use doctest for them.
