kowainik / tomland
Bidirectional TOML serialization
Home Page: https://kowainik.github.io/posts/2019-01-14-tomland
License: Mozilla Public License 2.0
So, currently the TOML Value data type allows values of different types inside an Array. I propose to solve this problem by introducing a new TValue data type (or renaming the current Value to UValue, aka untyped value) with the GADTs and DataKinds features. Basically:
data TomlType = TBool | ...
data Value (t :: TomlType) where
Bool :: Bool -> Value TBool
...
Basically, the approach starting from this slide should be used:
I guess we should have UValue and a typeCheck :: UValue -> Maybe Value function. Not sure though...
Also, several problems need to be solved. For example, the following type should also have typed and untyped versions, but it's not obvious how to modify it... Let's discuss and see whether it makes sense!
data TOML = TOML
{ tomlPairs :: HashMap Key Value
, tomlTables :: HashMap TableId TOML
}
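To make the proposal a bit more concrete, here is a minimal sketch of how a typed Value and a typeCheck function could fit together. All names except Value and UValue (TInt, TArray, AnyValue, sameType, checkLike, wellTyped) are assumptions made for illustration, and only three value kinds are covered; the result is wrapped in an existential because the type tag is only known at runtime.

```haskell
{-# LANGUAGE DataKinds      #-}
{-# LANGUAGE GADTs          #-}
{-# LANGUAGE KindSignatures #-}

-- | Type tags, promoted to the kind level by DataKinds.
data TomlType = TBool | TInt | TArray

-- | Untyped value, roughly what a parser would produce.
data UValue
    = UBool Bool
    | UInt Integer
    | UArray [UValue]

-- | Typed value: all elements of an 'Array' share one tag @t@.
data Value (t :: TomlType) where
    Bool  :: Bool      -> Value 'TBool
    Int   :: Integer   -> Value 'TInt
    Array :: [Value t] -> Value 'TArray

-- | Existential wrapper (hypothetical), hides the tag of a checked value.
data AnyValue where
    AnyValue :: Value t -> AnyValue

-- | Return the second value at the first value's type if the tags agree.
sameType :: Value a -> Value b -> Maybe (Value a)
sameType (Bool _)  v@(Bool _)  = Just v
sameType (Int _)   v@(Int _)   = Just v
sameType (Array _) v@(Array _) = Just v
sameType _         _           = Nothing

-- | Type check an untyped value against the type of a reference value.
checkLike :: Value a -> UValue -> Maybe (Value a)
checkLike v u = case typeCheck u of
    Just (AnyValue w) -> sameType v w
    Nothing           -> Nothing

-- | Fails with 'Nothing' when an array mixes element types.
typeCheck :: UValue -> Maybe AnyValue
typeCheck (UBool b)         = Just (AnyValue (Bool b))
typeCheck (UInt n)          = Just (AnyValue (Int n))
typeCheck (UArray [])       = Just (AnyValue (Array []))
typeCheck (UArray (x : xs)) = do
    AnyValue v <- typeCheck x
    rest       <- mapM (checkLike v) xs
    Just (AnyValue (Array (v : rest)))

-- | Convenience predicate for trying things out.
wellTyped :: UValue -> Bool
wellTyped u = case typeCheck u of
    Just _  -> True
    Nothing -> False
```

With this sketch, wellTyped (UArray [UInt 1, UBool True]) is False, while an array of arrays with different element types still passes, because both inner arrays carry the same TArray tag.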
Sometimes you don't want to write parsers or showers because they are usually harder to write. So to make things easier, it's better to have unidirectional encoders and decoders. Though it's not clear how to support encode and decode functions for both bi and uni...
dimapBijection works well for things like Num and newtypes, but if you have functions like:
parseMyType :: Text -> Maybe MyType
showMyType :: MyType -> Text
you will have some problems with dimapping just a pair of functions for the str converter.
So a new combinator should be created. Also, since dimapBijection is used very often, let's make its name shorter (just dimap) and call the new function mdimap (stands for maybe dimap).
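A rough sketch of what the two combinators could look like. The Bijection record is the one from this repository's discussion; the exact signature of mdimap, the Alternative-based failure, and the helpers liftMaybe, raw, GhcVer, showVer and parseVer are all assumptions for illustration, not the library's API.

```haskell
import Control.Applicative (Alternative, empty)

data Bijection r w c a = Bijection
    { biRead  :: r a
    , biWrite :: c -> w a
    }

-- | Plain dimap: both conversions are total.
dimap :: (Functor r, Functor w)
      => (c -> d) -> (a -> b) -> Bijection r w d a -> Bijection r w c b
dimap f g bi = Bijection
    { biRead  = g <$> biRead bi
    , biWrite = fmap g . biWrite bi . f
    }

-- | Turn 'Nothing' into a failure of the surrounding context.
liftMaybe :: Alternative f => Maybe a -> f a
liftMaybe = maybe empty pure

-- | "Maybe dimap": the conversion after reading is allowed to fail.
mdimap :: (Monad r, Alternative r, Monad w, Alternative w)
       => (c -> d) -> (a -> Maybe b) -> Bijection r w d a -> Bijection r w c b
mdimap f g bi = Bijection
    { biRead  = biRead bi >>= liftMaybe . g
    , biWrite = \c -> biWrite bi (f c) >>= liftMaybe . g
    }

-- Hypothetical example type with a show/parse pair.
data GhcVer = Ghc84 | Ghc86 deriving (Eq, Show)

showVer :: GhcVer -> String
showVer Ghc84 = "8.4"
showVer Ghc86 = "8.6"

parseVer :: String -> Maybe GhcVer
parseVer "8.4" = Just Ghc84
parseVer "8.6" = Just Ghc86
parseVer _     = Nothing

-- | Toy codec that "reads" a fixed input and writes its argument unchanged.
raw :: String -> Bijection Maybe Maybe String String
raw input = Bijection { biRead = Just input, biWrite = Just }

verCodec :: String -> Bijection Maybe Maybe GhcVer GhcVer
verCodec = mdimap showVer parseVer . raw
```

Here biRead (verCodec "8.6") gives Just Ghc86 and biRead (verCodec "bogus") gives Nothing, which is the behaviour that a plain dimap over a pair of total functions cannot express.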
For example, if I have TOML like this:
list = ["one", "two"]
[table.name.2]
2Inner = 42
I see no problem. But with the following:
list = ["one", 123]
[table.name.2]
2Inner = 42
I get a not really helpful message:
3:1:
|
3 | [table.name.2]
| ^
Can't type check value!
What I want to see instead is something like this:
1:15-17:
|
1 | list = ["one", 123]
| ^^^
Can't type check value! Expected type String, actual type Int
Depends on #21.
After thinking a little bit more about how to store Key components, I came to the conclusion that we really want to store a single component on each level of the Toml data type instead of storing the whole key.
Currently we have the following data types:
newtype Key = Key { unKey :: NonEmpty Text }
data TOML = TOML
{ tomlPairs :: HashMap Key AnyValue
, tomlTables :: HashMap Key TOML
}
So each level can contain a whole key. This is not really convenient... Because if you want to insert some key into such a map, you need to scan the whole HashMap in order to split by greatest common prefix. This is sad and not really convenient...
On the other hand, storing only single key component per level simplifies things a lot!
So we actually want to have something like this:
data TOML = TOML
{ tomlPairs :: HashMap Key AnyValue
, tomlTables :: HashMap Text TOML -- here 'Text' is only single component of 'Key'
}
But with such scheme it's harder to implement pretty-printing for tables like this:
[foo.bar]
baz = 3
So we should analyze nested Tomls so as not to print the table in this way:
[foo]
[bar] # we don't want this
baz = 3
Also, during parsing we need to split every table key by component and insert the components into this HashMap in some smart way. This is also undesirable.
Currently I'm having hard problems with these issues:
So let's implement our own data structure! Basically, what we want is something like a HashMap that is monomorphic in Key but joins nodes by common prefix.
I think that data structure can look like this:
data PrefixTree a
= Leaf (NonEmpty Text) a
| Branch (NonEmpty Text) (PrefixTree a) (PrefixTree a)
type PrefixMap a = Maybe (PrefixTree a)
And we need to have the following basic functions and instances:
single :: Key -> a -> PrefixTree a
insert :: Key -> a -> PrefixTree a -> PrefixTree a
lookup :: Key -> PrefixTree a -> Maybe a
delete :: Key -> PrefixTree a -> Maybe (PrefixTree a)
instance Semigroup (PrefixTree a) -- or PartialSemigroup
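Both insert and lookup for such a tree need to split two keys by their greatest common prefix. A minimal sketch of that helper, using plain lists of String instead of NonEmpty Text so it needs nothing beyond base (the name commonPrefix is an assumption):

```haskell
-- | Split two keys into (common prefix, rest of first key, rest of second key).
commonPrefix :: [String] -> [String] -> ([String], [String], [String])
commonPrefix (x : xs) (y : ys)
    | x == y =
        let (prefix, restX, restY) = commonPrefix xs ys
        in  (x : prefix, restX, restY)
-- Reached when either key is exhausted or the heads differ.
commonPrefix xs ys = ([], xs, ys)
```

For the keys table.name.1 and table.name.2 this yields the shared part ["table", "name"] plus the two remainders ["1"] and ["2"], which is where a Branch node would be created. An empty common prefix means the keys share nothing at this level.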
TOML allows you to specify numbers like 1_000_000. This is not supported currently, but probably should be.
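As a rough illustration of the rule (every underscore must have at least one digit on each side), here is a base-only sketch for unsigned decimal literals. It is not written against the project's megaparsec parser, the names splitOnUnderscore and parseUnderscored are assumptions, and it ignores signs and the leading-zero restriction.

```haskell
import Data.Char (isDigit)

-- | Split a string on underscores, keeping empty chunks.
splitOnUnderscore :: String -> [String]
splitOnUnderscore s = case break (== '_') s of
    (chunk, [])       -> [chunk]
    (chunk, _ : rest) -> chunk : splitOnUnderscore rest

-- | Parse an unsigned decimal literal such as "1_000_000".
-- Every chunk between underscores must be a non-empty run of digits.
parseUnderscored :: String -> Maybe Integer
parseUnderscored s
    | all validChunk chunks = Just (read (concat chunks))
    | otherwise             = Nothing
  where
    chunks       = splitOnUnderscore s
    validChunk c = not (null c) && all isDigit c
```

Inputs like "1__000", "_1" and "1_" are rejected because they produce an empty chunk.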
What this test should look like:
tomland
decode . encode == id
The current naming is not perfect and not consistent. At the beginning I tried to give names according to the TOML specification. But now I see that it's better to be closer to Haskell, which brings us to the following naming convention:
Value and ValueType changes to Integer, Text, Double from Int, String, Float. And rename ValueType to TValue.
P suffix.
Prelude name it should have T suffix (convention for all BiToml configurations).
Also, I want to change the module structure a bit to make it more modular:
Split the Toml.Types module into Toml.Types.Value, Toml.Types.AnyValue, Toml.Types.TOML, Toml.Types.UValue. This change is also motivated by the fact that the Toml.Types module has become very big.
Let me know, what do you think!
This tutorial should cover:
Integer, Text and [Double]?
Something else we need to cover?..
It's not possible now to parse an absent field. A Maybe converter should solve this problem.
Things needed to be done:
AnyValue
TOML
Currently, if types mismatch we get Nothing. What we actually want is a pair of the expected type (the type of the first element in the array) and the actual type (the type of the first mismatching element).
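A small sketch of the richer result, on simplified stand-in types. TValue here is just an enumeration of tags, and TypeMismatch, typeOf and checkArray are hypothetical names, not the library's definitions.

```haskell
-- | Type tags (simplified).
data TValue = TBool | TInteger | TText
    deriving (Eq, Show)

-- | Untyped values (simplified).
data UValue = UBool Bool | UInteger Integer | UText String

-- | What went wrong: the type the array started with, and the first other type.
data TypeMismatch = TypeMismatch
    { expected :: TValue
    , actual   :: TValue
    } deriving (Eq, Show)

typeOf :: UValue -> TValue
typeOf (UBool _)    = TBool
typeOf (UInteger _) = TInteger
typeOf (UText _)    = TText

-- | Check that all elements have the type of the first element.
checkArray :: [UValue] -> Either TypeMismatch ()
checkArray []       = Right ()
checkArray (x : xs) = case filter (/= firstType) (map typeOf xs) of
    []          -> Right ()
    (other : _) -> Left (TypeMismatch firstType other)
  where
    firstType = typeOf x
```

For the array ["one", 123] this returns Left (TypeMismatch TText TInteger), which is enough information to render an "expected type, actual type" message.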
Let's say, we have a type like this:
data Either l r = Left l | Right r
It should be possible to write a bidirectional converter for such a type, ideally with the help of the Alternative type class. I think in TOML this should look like this:
foo.Left = 42
bar.Right = true
Specifying both constructors should result in parsing error.
TOML allows specifying numbers with 0x, 0o and 0b prefixes. megaparsec has primitives for such things. We should enhance our integer parser.
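To show the intended behaviour without depending on megaparsec, here is a base-only sketch that uses readHex and readOct from Numeric and a manual fold for binary. The names parsePrefixed and fullParse are assumptions; a real implementation would presumably use the megaparsec primitives instead.

```haskell
import Data.Char (digitToInt)
import Numeric (readHex, readOct)

-- | Accept a ReadS result only if it consumed the whole input.
fullParse :: [(Integer, String)] -> Maybe Integer
fullParse [(n, "")] = Just n
fullParse _         = Nothing

-- | Parse an integer literal with a 0x, 0o or 0b prefix.
parsePrefixed :: String -> Maybe Integer
parsePrefixed ('0' : 'x' : ds) = fullParse (readHex ds)
parsePrefixed ('0' : 'o' : ds) = fullParse (readOct ds)
parsePrefixed ('0' : 'b' : ds)
    | not (null ds) && all (`elem` "01") ds =
        Just (foldl (\acc d -> acc * 2 + toInteger (digitToInt d)) 0 ds)
-- Anything else, including unprefixed numbers, is rejected here.
parsePrefixed _ = Nothing
```

This sketch deliberately handles only the prefixed forms; plain decimal literals would still go through the existing integer parser.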
Not everything can be guaranteed statically. And some errors are not parsing errors. So we should write some validating function which checks for the following things:
# DO NOT DO THIS
name = "Tom"
name = "Pradyun"
# THIS IS INVALID
a.b = 1
a.b.c = 2
# DO NOT DO THIS
a = 2
[a]
x = 2
# DO NOT DO THIS
[a]
b = 1
[a]
c = 2
# DO NOT DO THIS EITHER
[a]
b = 1
[a.b]
c = 2
# INVALID TOML DOC
fruit = []
[[fruit]] # Not allowed
# INVALID TOML DOC
[[fruit]]
name = "apple"
[[fruit.variety]]
name = "red delicious"
# This table conflicts with the previous table
[fruit.variety]
name = "granny smith"
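As a starting point, the first check (the same key defined twice in one table) could be sketched as below. It works on a plain association list as a stand-in for the parsed key-value pairs; duplicateKeys is a hypothetical name, and the table-related checks from the other examples are not covered.

```haskell
import Data.List (group, sort)

-- | Keys that occur more than once, each reported a single time, in sorted order.
duplicateKeys :: [(String, v)] -> [String]
duplicateKeys pairs = [k | (k : _ : _) <- group (sort (map fst pairs))]
```

An empty result means this particular check passed; a validating function could run several such checks and collect all reported problems.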
Sometimes you don't want to specify TOML in a file; you want to have the configuration embedded in code. Currently you can specify everything manually using HashMaps and constructors:
myToml :: TOML
myToml = TOML (fromList
[ (Key "a", Bool True)
, (Key "list", Array [String "one", String "two"])
, (Key "time", Array [Date $ Day (fromGregorian 2018 3 29)])
] ) myInnerToml mempty
myInnerToml :: HashMap TableId TOML
myInnerToml = fromList $ [ ( TableId (NonEmpty.fromList ["table", "name", "1"])
, TOML (fromList [ (Key "aInner", Int 1)
, (Key "listInner", Array [Bool True, Bool False])
]) myInnerInnerToml mempty
)
, ( TableId (NonEmpty.fromList ["table", "name", "2"])
, TOML (fromList [ (Key "2Inner", Int 42)
]) mempty mempty
)
]
But this is ugly and not convenient. What would be great is to have some EDSL so the above can be configured like this:
myToml :: TOML
myToml = configure $ do
"a" :=: True
"list" :=: ["one", "two"]
"time" :=: Day (fromGregorian 2018 3 29) -- maybe something more convenient here
table "table.name.1" $ do
"aInner" :=: 1
"listInner" :=: [True, False]
...
The last one looks more convenient and simpler to me.
As per TOML specifications, leading zeros are not allowed in the decimal representation of integer values.
Might be easier to implement after #17 is implemented.
Multiline strings are quite useful. But it's not easy to parse them...
And add more combinators (for something like Either if needed).
I don't quite like the names Bijection, Bi, BiToml and BiMap altogether, because they all start with the Bi prefix, and from these type names it's not really clear what they mean and what the difference is (and the names are not precise enough). So I propose the following renaming:
I like name codec for what we previously called bidirectional converter.
Bijection to Codec.
data Bijection r w c a = Bijection
{ biRead :: r a
, biWrite :: c -> w a
}
Bi to BiCodec.
type Bi r w a = Bijection r w a a
BiToml to TomlCodec.
type BiToml a = Bi Env St a
BiMap under the Bi/ directory, and let's keep the BiMap name for now.
data BiMap a b = BiMap
{ forward :: a -> Maybe b
, backward :: b -> Maybe a
}
I think this naming scheme is good. The name TomlCodec gives a hint to the user that this is a special case of Codec and that every function that works with Codec should work with TomlCodec.
Pretty-printing doesn't require separate testing; the single round-trip property is enough. But we need to have some convenient way to test different parsing parts of the TOML language. It would be great if we could parse everything from the TOML repository. Need to think about how to do this in a convenient way...
Here are the things we need to test currently:
Bool
Integer
Double
Array
Current string parser should be implemented more properly. And we should add literal strings as well. Details are described here: #12 (comment)
Or ensure we have things we need:
[-|+]nan
[-|+]inf
Parsing dates is hard. But we really need them.
If I have custom data type with couple functions like this one:
data GhcVer = ...
showGhcVer :: GhcVer -> Text
parseGhcVer :: Text -> Maybe GhcVer
Then writing a custom BiMap for this data type is a little bit tedious:
_GhcVer :: BiMap AnyValue GhcVer
_GhcVer = BiMap
{ forward = \(AnyValue t) -> Toml.matchText t >>= parseGhcVer
, backward = Just . AnyValue . Toml.Text . showGhcVer
}
I want an easier way to write such prisms.
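One possible shape for such a helper, sketched on simplified stand-ins: AnyValue here is a small sum type rather than the real existential wrapper, String replaces Text, and the name textBy is an assumption. The helper just packages the show/parse pair together with the Text matching.

```haskell
data BiMap a b = BiMap
    { forward  :: a -> Maybe b
    , backward :: b -> Maybe a
    }

-- | Simplified stand-in for the library's AnyValue.
data AnyValue = AText String | AInteger Integer
    deriving (Eq, Show)

matchText :: AnyValue -> Maybe String
matchText (AText t) = Just t
matchText _         = Nothing

-- | Build a BiMap from a total "show" and a partial "parse" function.
textBy :: (a -> String) -> (String -> Maybe a) -> BiMap AnyValue a
textBy toText fromText = BiMap
    { forward  = \v -> matchText v >>= fromText
    , backward = Just . AText . toText
    }

-- Hypothetical example type.
data GhcVer = Ghc822 | Ghc843 deriving (Eq, Show)

showGhcVer :: GhcVer -> String
showGhcVer Ghc822 = "8.2.2"
showGhcVer Ghc843 = "8.4.3"

parseGhcVer :: String -> Maybe GhcVer
parseGhcVer "8.2.2" = Just Ghc822
parseGhcVer "8.4.3" = Just Ghc843
parseGhcVer _       = Nothing

-- | The tedious definition shrinks to one line.
_GhcVer :: BiMap AnyValue GhcVer
_GhcVer = textBy showGhcVer parseGhcVer
```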
I'm talking about this generator:
Lines 52 to 64 in 6c60b40
Currently it doesn't generate everything we can parse. This generator should be expanded with the following:
nan and inf to double generator
Add utility converters for the following types:
Set
HashSet
String (we have only for Text but sometimes you want String)
ByteString
Natural
Float
Word
Read/Show via show/readMaybe
According to this document:
What we want is to have some way to describe correspondence between user data types and TOML representation. Consider this Haskell data type:
data Node = Node { root :: Bool, labels :: [Int] }
Would be great to have an ability to specify correspondence in some way like this:
nodeToml :: TomlM Node
nodeToml = do
root <- bool "root"
labels <- array @Int "labels"
pure Node{..}
And then we can have two functions:
decode :: Toml -> TomlM a -> Either DecodeError a
encode :: a -> TomlM a -> Toml
Like this:
node.root = true
node.labels = [ 1, 2, 3 ]
or like this:
[node]
root = true
labels = [ 1, 2, 3 ]
I think we should give the user the ability to specify which way they want.
I propose the following strategy: if you have foo :: Either Int String then it should be either
foo.Left = 3
or
foo.Right = "bar"
If you have both keys, this should be a type error. And, again, give the ability to specify foo as a table. This scheme can easily be extended if constructors have multiple fields.
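A sketch of this strategy on a toy representation, where a TOML document is just an association list from dotted keys to values. AnyValue, Toml, decodeEither and encodeEither are hypothetical names. Note that a plain Alternative-style "try Left, else Right" would silently accept both keys, so this sketch inspects both lookups explicitly.

```haskell
-- | Simplified stand-in for TOML values.
data AnyValue = VInt Integer | VText String
    deriving (Eq, Show)

-- | Toy document: dotted key to value.
type Toml = [(String, AnyValue)]

-- | Decode @key.Left@ or @key.Right@; having both (or neither) is an error.
decodeEither :: String -> Toml -> Either String (Either Integer String)
decodeEither key toml =
    case (lookup (key ++ ".Left") toml, lookup (key ++ ".Right") toml) of
        (Just _, Just _)          -> Left "both Left and Right are specified"
        (Just (VInt n), Nothing)  -> Right (Left n)
        (Nothing, Just (VText s)) -> Right (Right s)
        (Nothing, Nothing)        -> Left "neither Left nor Right is specified"
        _                         -> Left "value has the wrong type"

-- | Encode into exactly one of the two keys.
encodeEither :: String -> Either Integer String -> Toml
encodeEither key (Left n)  = [(key ++ ".Left", VInt n)]
encodeEither key (Right s) = [(key ++ ".Right", VText s)]
```

Decoding the output of encodeEither gives back the original value, which is the round-trip property a bidirectional converter for Either should satisfy.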
In TOML it's possible to have:
foo = [ [1, 2, 3], [true, false] ]
But Haskell doesn't really have a standard data structure for such case... So should we care about supporting this case or not?..
Our PrefixTree is not really the simplest data structure... It would be great to have some property tests for it. Specifically:
PrefixMap generator. Ideally using recursive from the hedgehog library.
lookup k (insert k v t) ≡ Just v
insert x a . insert x b ≡ insert x a (can be tested as lookup x (insert x a $ insert x b t) === Just a)
Relates to #16
See details here: #12 (comment)
Currently we have this:
-- | Parser for integer values.
integer :: Key -> TomlCodec Integer
integer = match matchInteger Int
word32 :: Key -> TomlCodec Word32
word32 = dimapNum . integer
The problem with word32 is that in case of an error it will print the type tag "Integer" where we actually want "Word32". But we also want to reuse the parsers we've already written. So BiToml should be extended with some sort of annotations where I can write just:
word32 :: Key -> TomlCodec Word32
word32 = dimapA fromInteger toIntegral . integer
Also useful for newtypes.
I think this might be useful.
It's even in nightly package. So might be quite useful.
This is an important issue, since an array of tables is probably the only reasonable way to specify data structures like List and Map in TOML. Though, it's hard to support them...
Implement a TOML -> Text function which renders the TOML data type into a textual representation.
Very basic. Without:
nan and inf.
Just to make this package usable for basic things.
This will make the integer parser code shorter.
The current integer parser code:
Lines 159 to 168 in 00a2cac
binary from megaparsec: https://github.com/mrkkrp/megaparsec/blob/f3e26a42231dfac748045325a7f45ba07a04f069/Text/Megaparsec/Char/Lexer.hs#L442-L452
This might be convenient with #13.
This involves:
TOML type
insert and lookup functions for PrefixMap (basically just refactoring current implementation)
insert from PrefixMap)
Okay, so I want to put every test case from the TOML specification into our documentation and test them all with doctest. Once we figure out how to do it in a neat way, we can implement those tests and use doctest for them.