
tomland's People

Contributors

alistairb, astynax, chshersh, cocreature, cronokirby, crtschin, danburton, dariodsa, dependabot[bot], everythingfunctional, felixonmars, gabrielelana, gahag, ghallak, jiegillet, kutyel, mxxo, nimor111, ramanshah, sanchayanmaity, sjakobi, tmcgilchrist, tomjaguarpaw, tomphp, totallynotchase, vrom911, willbasky


tomland's Issues

Implement type-safe version of TOML ast

So, currently the TOML Value data type allows values of different types inside an Array. I propose to solve this problem by introducing a new TValue data type (or renaming the current Value to UValue, aka untyped value) using the GADTs and DataKinds extensions. Basically:

data TomlType = TBool | ...

data Value (t :: TomlType) where
    Bool :: Bool -> Value TBool
    ...

Basically, approach starting from this slide should be used:

I guess we should have UValue and typeCheck :: UValue -> Maybe Value function. Not sure though...
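A minimal sketch of how such a typeCheck could look, assuming only three value kinds. The names UBool, UInt, UArray, AnyValue, sameType and castTo are hypothetical and exist only for this illustration; typeCheck returns an existential wrapper here because the type tag is not known before checking.

```haskell
{-# LANGUAGE DataKinds      #-}
{-# LANGUAGE GADTs          #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE TypeOperators  #-}

import Data.Type.Equality ((:~:) (..))

-- Type-level tags; only three are shown in this sketch.
data TomlType = TBool | TInt | TArray

-- Untyped value, as a parser would produce it (hypothetical constructors).
data UValue
    = UBool Bool
    | UInt Integer
    | UArray [UValue]

-- Typed value: all elements of an 'Array' share one tag.
data Value (t :: TomlType) where
    Bool  :: Bool -> Value 'TBool
    Int   :: Integer -> Value 'TInt
    Array :: [Value t] -> Value 'TArray

-- Hypothetical wrapper that hides the tag, so 'typeCheck' has one result type.
data AnyValue where
    AnyValue :: Value t -> AnyValue

-- Evidence that two values carry the same tag.
sameType :: Value a -> Value b -> Maybe (a :~: b)
sameType (Bool _)  (Bool _)  = Just Refl
sameType (Int _)   (Int _)   = Just Refl
sameType (Array _) (Array _) = Just Refl
sameType _         _         = Nothing

-- Recover a value at the tag of the first argument, if the tags agree.
castTo :: Value a -> AnyValue -> Maybe (Value a)
castTo v (AnyValue w) = case sameType w v of
    Just Refl -> Just w
    Nothing   -> Nothing

-- Fails with 'Nothing' when an array mixes element types.
typeCheck :: UValue -> Maybe AnyValue
typeCheck (UBool b)   = Just (AnyValue (Bool b))
typeCheck (UInt n)    = Just (AnyValue (Int n))
typeCheck (UArray us) = do
    vs <- traverse typeCheck us
    case vs of
        []                  -> Just (AnyValue (Array ([] :: [Value 'TBool])))
        (AnyValue v : rest) -> do
            typed <- traverse (castTo v) rest
            Just (AnyValue (Array (v : typed)))
```

With this sketch, an array of arrays with different inner element types still type checks, because every inner array has the tag TArray.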

Also, several problems need to be solved. For example, this type should also have typed and untyped versions, but it's not obvious how to modify it... Let's discuss and see whether it makes sense!

data TOML = TOML
    { tomlPairs       :: HashMap Key     Value
    , tomlTables      :: HashMap TableId TOML

Introduce unidirectional decoder (and probably encoder)

Sometimes you don't want to write both parsers and printers because they are usually harder to write. So to make things easier it's better to also have unidirectional encoders and decoders. Though it's not clear how to support the encode and decode functions for both bidirectional and unidirectional converters...

Non-bijectional 'dimapBijection'

dimapBijection works well for things like Num and newtypes, but if you have functions like:

parseMyType :: Text -> Maybe MyType
showMyType  :: MyType -> Text

you will have some problems with dimapping just a pair of functions for the str converter 😞

So a new combinator should be created. Also, since dimapBijection is used very often, let's make its name shorter (just dimap) and call the new function mdimap (stands for maybe dimap).
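A rough sketch of the difference between the two combinators, under heavy simplification: the Codec type below is a hypothetical stand-in that reads from and writes to a plain String (instead of Text and the real converter type), and the MyType constructors are made up for the example.

```haskell
-- Simplified stand-in for a bidirectional converter (assumption, not the library type).
data Codec a = Codec
    { codecRead  :: String -> Maybe a
    , codecWrite :: a -> String
    }

-- Total in both directions: enough for newtypes and numeric conversions.
dimap :: (b -> a) -> (a -> b) -> Codec a -> Codec b
dimap toA toB codec = Codec
    { codecRead  = fmap toB . codecRead codec
    , codecWrite = codecWrite codec . toA
    }

-- Partial when reading: converting from the underlying type may fail.
mdimap :: (b -> a) -> (a -> Maybe b) -> Codec a -> Codec b
mdimap toA toMaybeB codec = Codec
    { codecRead  = \input -> codecRead codec input >>= toMaybeB
    , codecWrite = codecWrite codec . toA
    }

-- Trivial string converter, playing the role of 'str'.
str :: Codec String
str = Codec { codecRead = Just, codecWrite = id }

-- Hypothetical user type with a partial parser and a total printer.
data MyType = Foo | Bar
    deriving (Eq, Show)

parseMyType :: String -> Maybe MyType
parseMyType "Foo" = Just Foo
parseMyType "Bar" = Just Bar
parseMyType _     = Nothing

showMyType :: MyType -> String
showMyType = show

myType :: Codec MyType
myType = mdimap showMyType parseMyType str
```

The only difference from dimap is that the read direction goes through Maybe, so a failed parseMyType turns into a failed read instead of requiring a total function.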

Report proper type checking error during parsing

For example, if I have TOML like this:

list = ["one", "two"]

[table.name.2]
  2Inner = 42

I see no problem. But with the following:

list = ["one", 123]

[table.name.2]
  2Inner = 42

I see a not really helpful message:

3:1:
  |
3 | [table.name.2]
  | ^
Can't type check value!

When what I want to see is something like this

1:15-17:
  |
1 | list = ["one", 123]
  |                ^^^
Can't type check value! Expected type String, actual type Int

Depends on #21.

Implement our own 'PrefixTree'

After thinking a little bit more about how to store Key components, I came to the conclusion that we really want to store a single component on each level of the Toml data type instead of storing the whole key.

Currently we have the following data types:

newtype Key = Key { unKey :: NonEmpty Text }

data TOML = TOML
    { tomlPairs  :: HashMap Key AnyValue
    , tomlTables :: HashMap Key TOML
    }

So each level can contain a whole key. This is not really convenient... If you want to insert some key into such a map, you need to scan the whole HashMap in order to split by the greatest common prefix.

On the other hand, storing only single key component per level simplifies things a lot!

So we actually want to have something like this:

data TOML = TOML
    { tomlPairs  :: HashMap Key AnyValue
    , tomlTables :: HashMap Text TOML  -- here 'Text' is only single component of 'Key'
    }

But with such a scheme it's harder to implement pretty-printing for tables like this:

[foo.bar]
baz = 3

So we should analyze nested Tomls to avoid printing the table in this way:

[foo]
  [bar]  # we don't want this
    baz = 3

Also, during parsing we need to split every table key by component and insert them into this HashMap in some smart way. This is also undesirable.

Currently I'm having hard problems with these issues:

So let's implement our own data structure! Basically, what we want is something like a HashMap that is monomorphic in Key but joins nodes by common prefix.

I think that data structure can look like this:

data PrefixTree a
    = Leaf (NonEmpty Text) a
    | Branch (NonEmpty Text) (PrefixTree a) (PrefixTree a)

type PrefixMap a = Maybe (PrefixTree a)

And we need to have the following basic functions and instances:

single :: Key -> a -> PrefixTree a
insert :: Key -> a -> PrefixTree a -> PrefixTree a
lookup :: Key -> PrefixTree a -> Maybe a
delete :: Key -> PrefixTree a -> Maybe (PrefixTree a)

instance Semigroup (PrefixTree a)  -- or PartialSemigroup
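The "split by greatest common prefix" step that insert and lookup would rely on can be sketched on plain lists. This is only an illustration: splitCommonPrefix is a hypothetical helper name, and key components are ordinary list elements here rather than NonEmpty Text.

```haskell
-- Split two keys into (common prefix, rest of first key, rest of second key).
splitCommonPrefix :: Eq a => [a] -> [a] -> ([a], [a], [a])
splitCommonPrefix (x : xs) (y : ys)
    | x == y =
        let (common, restX, restY) = splitCommonPrefix xs ys
        in  (x : common, restX, restY)
-- Reached when either key is exhausted or the heads differ.
splitCommonPrefix xs ys = ([], xs, ys)
```

For the keys table.name.1 and table.name.2 this gives the shared part table.name plus the two remainders, which is the information a Branch node needs when a new key is inserted next to an existing Leaf.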

Add parsing of _ in numbers

TOML allows you to specify numbers like 1_000_000. This is not supported currently. But probably should be.
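A small sketch of the validation involved, written against plain String rather than a parser combinator library. The function names are hypothetical; it handles only unsigned decimal digits and relies on the TOML rule that every underscore must be surrounded by digits.

```haskell
import Data.Char (isDigit)

-- Split the input on underscores, keeping empty chunks so they can be rejected.
splitOnUnderscore :: String -> [String]
splitOnUnderscore s = case break (== '_') s of
    (chunk, [])       -> [chunk]
    (chunk, _ : rest) -> chunk : splitOnUnderscore rest

-- Every chunk must be a non-empty run of digits; this rejects leading,
-- trailing and doubled underscores.
stripUnderscores :: String -> Maybe String
stripUnderscores s
    | all validChunk chunks = Just (concat chunks)
    | otherwise             = Nothing
  where
    chunks = splitOnUnderscore s
    validChunk c = not (null c) && all isDigit c

-- Unsigned decimal only; signs and other bases are left out of this sketch.
parseDecimal :: String -> Maybe Integer
parseDecimal s = read <$> stripUnderscores s
```

In the real parser the same check would sit inside the digit parser, so that 1_000_000 is accepted while 1__000 and _1 are reported as errors.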

Introduce consistent renaming + some refactoring

The current naming is neither perfect nor consistent. At the beginning I tried to give names according to the TOML specification. But now I see that it's better to be closer to Haskell, which leads to the following naming convention:

  1. Constructors of Value and ValueType change from Int, String, Float to Integer, Text, Double. ValueType is renamed to TValue.
  2. Each parser should have a P suffix.
  3. Each prism should be capitalised and start with an underscore.
  4. Each bidirectional converter should be named after its constructor without any suffixes. When the name clashes with a Prelude name, it should have a T suffix (convention for all BiToml configurations).

Also, I want to change module structure a bit to make it more modular:

Split the Toml.Types module into Toml.Types.Value, Toml.Types.AnyValue, Toml.Types.TOML, Toml.Types.UValue. This change is also motivated by the fact that the Toml.Types module is becoming very big.

Let me know what you think!

Write 'tomland' tutorial

This tutorial should cover:

  1. What is bidirectional conversion?
  2. How to write bidirectional converter for simple data type with fields Integer, Text and [Double]?
  3. How to use specification value to encode/decode toml?
  4. How to parse tables?
  5. How to parse nested objects?
  6. How to parse list of objects (using array of tables)?

Something else we need to cover?..

Support sum types in bidirectional conversion

Let's say, we have a type like this:

data Either l r = Left l | Right r

It should be possible to write a bidirectional converter for such a type, ideally with the help of the Alternative type class. I think in TOML this should look like this:

foo.Left = 42
bar.Right = true

Specifying both constructors should result in a parsing error.

Implement TOML validator

Not everything can be guaranteed statically. And some errors are not parsing errors. So we should write some validating function which checks for the following things:

  • Defining a key multiple times is invalid.
# DO NOT DO THIS
name = "Tom"
name = "Pradyun"
  • A key that is already defined may not be appended to.
# THIS IS INVALID
a.b = 1
a.b.c = 2
  • Can't have a key and a table with the same name.
# DO NOT DO THIS
a = 2

[a]
x = 2
  • Can't have duplicate table names.
# DO NOT DO THIS

[a]
b = 1

[a]
c = 2
  • Can't have a nested table with the same name as a key.
# DO NOT DO THIS EITHER

[a]
b = 1

[a.b]
c = 2
  • Attempting to append to a statically defined array, even if that array is empty or of compatible type, must produce an error at parse time.
# INVALID TOML DOC
fruit = []

[[fruit]] # Not allowed
  • Attempting to define a normal table with the same name as an already established array must produce an error at parse time.
# INVALID TOML DOC
[[fruit]]
  name = "apple"

  [[fruit.variety]]
    name = "red delicious"

  # This table conflicts with the previous table
  [fruit.variety]
    name = "granny smith"
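Two of these checks can be sketched over a flat list of fully qualified keys. This is only an illustration under assumptions: keys are lists of String components, and duplicateKeys and appendedKeys are hypothetical names, not part of the library.

```haskell
import Data.List (group, isPrefixOf, sort)

-- Keys defined more than once, e.g. two definitions of "name".
duplicateKeys :: [[String]] -> [[String]]
duplicateKeys keys = [k | (k : _ : _) <- group (sort keys)]

-- Pairs (defined key, longer key) where the longer key extends an
-- already defined one, e.g. "a.b" followed by "a.b.c".
appendedKeys :: [[String]] -> [([String], [String])]
appendedKeys keys =
    [ (p, k)
    | p <- keys
    , k <- keys
    , p /= k
    , p `isPrefixOf` k
    ]
```

A validator built this way would collect all keys after parsing and report every violation at once, instead of failing on the first one.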

Create EDSL for manually specifying TOML data types

Sometimes you don't want to specify TOML in a file; you want to have the configuration embedded in code. Currently you can specify everything manually using HashMaps and constructors:

myToml :: TOML
myToml = TOML (fromList
    [ (Key "a", Bool True)
    , (Key "list", Array [String "one", String "two"])
    , (Key "time", Array [Date $ Day (fromGregorian 2018 3 29)])
    ] ) myInnerToml mempty

myInnerToml :: HashMap TableId TOML
myInnerToml = fromList $ [ ( TableId (NonEmpty.fromList ["table", "name", "1"])
                           , TOML (fromList [ (Key "aInner", Int 1)
                                            , (Key "listInner", Array [Bool True, Bool False])
                                            ]) myInnerInnerToml mempty
                           )
                         , ( TableId (NonEmpty.fromList ["table", "name", "2"])
                           , TOML (fromList [ (Key "2Inner", Int 42)
                                            ]) mempty mempty
                           )
                         ]

But this is ugly and inconvenient. It would be great to have an EDSL so the above can be configured like this:

myToml :: TOML
myToml = configure $ do
  "a"    :=: True
  "list" :=: ["one", "two"]
  "time" :=: Day (fromGregorian 2018 3 29) -- maybe something more convenient here
  table "table.name.1" $ do
    "aInner"    :=: 1
    "listInner" :=: [True, False]
    ...

The last one looks more convenient and simpler to me.

Rename `Bi` data types

I don't quite like the names Bijection, Bi, BiToml and BiMap together because they all start with the Bi prefix, and from these type names it's not really clear what they mean or how they differ (and the names are not precise enough). So I propose the following renaming:

I like the name codec for what we previously called a bidirectional converter.

  1. Rename Bijection to Codec.
data Bijection r w c a = Bijection
    { biRead  :: r a
    , biWrite :: c -> w a
    }
  2. Rename Bi to BiCodec.
type Bi r w a = Bijection r w a a
  3. Rename BiToml to TomlCodec.
type BiToml a = Bi Env St a
  4. Move BiMap under Bi/ directory and let's keep BiMap name for now.
data BiMap a b = BiMap
    { forward  :: a -> Maybe b
    , backward :: b -> Maybe a
    }

I think this naming scheme is good. The name TomlCodec gives a hint to the user that this is a special case of Codec and that every function that works with Codec should work with TomlCodec.

Implement mini-framework for unit testing TOML parsing

Pretty-printing doesn't require much testing: only a single property test. But we need some convenient way to test the different parsing parts of the TOML language. It would be great if we could parse everything from the TOML repository. Need to think about how to do this in a convenient way...

Here are the things we need to test currently:

  • Bool
  • Integer
    • Binary
    • Octal
    • Decimal
    • Hexadecimal
  • Double
  • Array
  • Keys
  • Key-value
  • Table
  • Some small complete toml

Add easier way to write custom `BiMap` for values

If I have a custom data type with a couple of functions like these:

data GhcVer = ...

showGhcVer  :: GhcVer -> Text
parseGhcVer :: Text -> Maybe GhcVer

Then writing a custom BiMap for this data type is a little bit tedious:

    _GhcVer :: BiMap AnyValue GhcVer
    _GhcVer = BiMap
        { forward = \(AnyValue t) -> Toml.matchText t >>= parseGhcVer
        , backward = Just . AnyValue . Toml.Text . showGhcVer
        }

I want an easier way to write such prisms.
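One possible shape for such a helper, sketched with simplified stand-ins: AnyValue here is a plain sum type with String payloads (not the library's AnyValue), textBiMap is a hypothetical name, and the GhcVer constructors are invented for the example.

```haskell
data BiMap a b = BiMap
    { forward  :: a -> Maybe b
    , backward :: b -> Maybe a
    }

-- Simplified stand-in for the library's 'AnyValue' (assumption).
data AnyValue
    = VText String
    | VInteger Integer
    deriving (Eq, Show)

matchText :: AnyValue -> Maybe String
matchText (VText t) = Just t
matchText _         = Nothing

-- Hypothetical helper: build a value-level BiMap from a printer and a parser.
textBiMap :: (a -> String) -> (String -> Maybe a) -> BiMap AnyValue a
textBiMap showA parseA = BiMap
    { forward  = \v -> matchText v >>= parseA
    , backward = Just . VText . showA
    }

-- Hypothetical user type.
data GhcVer = Ghc822 | Ghc843
    deriving (Eq, Show)

showGhcVer :: GhcVer -> String
showGhcVer Ghc822 = "8.2.2"
showGhcVer Ghc843 = "8.4.3"

parseGhcVer :: String -> Maybe GhcVer
parseGhcVer "8.2.2" = Just Ghc822
parseGhcVer "8.4.3" = Just Ghc843
parseGhcVer _       = Nothing

-- The tedious definition collapses to one line.
_GhcVer :: BiMap AnyValue GhcVer
_GhcVer = textBiMap showGhcVer parseGhcVer
```

The helper keeps the matching and wrapping of the text value in one place, so each new custom type only supplies its own show and parse functions.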

Improve 'AnyValue' generator for property-based tests

I'm talking about this generator:

-- | Generates random value of 'AnyValue' type.
genAnyValue :: MonadGen m => m AnyValue
genAnyValue = do
    let randB = Gen.bool
    let randI = toInteger <$> Gen.int (Range.constantBounded @Int)
    let randD = Gen.double $ Range.constant @Double (-1000000.0) 1000000.0
    let randT = Gen.text (Range.constant 0 256) Gen.alphaNum
    Gen.choice
        [ AnyValue . Bool    <$> randB
        , AnyValue . Integer <$> randI
        , AnyValue . Double  <$> randD
        , AnyValue . Text    <$> randT
        ]

Currently it doesn't generate everything we can parse. This generator should be expanded with the following:

  • nan and inf to double generator
  • Arrays
  • Different types of text after supporting string parser
  • Dates (after they're supported)

Add helpful bi combinators

Add utility converters for the following types:

  • [ ] Set
  • [ ] HashSet
  • String (we have only for Text but sometimes you want String)
  • ByteString
  • Natural
  • Float
  • Word
  • Read/Show via show/readMaybe

Add ability to convert between 'Toml' and user custom data types

What we want is some way to describe the correspondence between user data types and the TOML representation. Consider this Haskell data type:

data Node = Node { root :: Bool, labels :: [Int] }

It would be great to be able to specify the correspondence in some way like this:

nodeToml :: TomlM Node
nodeToml = do
    root   <- bool "root"
    labels <- array @Int "labels"
    pure Node{..}

And then we can have two functions:

decode :: Toml -> TomlM a -> Either DecodeError a
encode :: a -> TomlM a -> Toml

Difficult questions

  1. How should this data type be encoded in TOML?

Like this:

node.root = true
node.labels = [ 1, 2, 3 ]

or like this:

[node]
root = true
labels = [ 1, 2, 3 ]

I think we should give the user the ability to specify which way they want.

  2. How to encode sum types?

I propose the following strategy: if you have foo :: Either Int String then it should be either

foo.Left = 3

or

foo.Right = "bar"

If you have both keys, this should be a type error. And, again, give the ability to specify foo as a table. This scheme can easily be extended if constructors have multiple fields.

  3. What to do with arrays of different values?

In TOML it's possible to have:

foo = [ [1, 2, 3], [true, false] ]

But Haskell doesn't really have a standard data structure for such case... So should we care about supporting this case or not?..

Property tests for 'PrefixTree' and 'PrefixMap'

Our PrefixTree is not the simplest data structure... It would be great to have some property tests for it. Specifically:

  • PrefixMap generator. Ideally using recursive from the hedgehog library.
  • InsertLookup: lookup k (insert k v t) ≡ Just v
  • InsertInsert: insert x a . insert x b ≡ insert x a (can be tested as lookup x (insert x a $ insert x b t) === Just a)

Support tagged 'Codec' (add annotations to 'Codec')

Currently we have this:

-- | Parser for integer values.
integer :: Key -> TomlCodec Integer
integer = match matchInteger Int

word32 :: Key -> TomlCodec Word32
word32 = dimapNum . integer

The problem with word32 is that in case of an error it will print the type tag "Integer" where we actually want "Word32". But we also want to reuse the parsers we have already written. So BiToml should be extended with some sort of annotations so that I can write just:

word32 :: Key -> TomlCodec Word32
word32 = dimapA fromInteger toIntegral  . integer

Also useful for newtypes.

Support array of tables

This is an important issue since an array of tables is probably the only reasonable way to specify data structures like List and Map in TOML. Though, they're hard to support...

Implement basic parser

Very basic. Without:

  1. Inline tables.
  2. Multiline strings.
  3. Arrays of tables.
  4. Underscores in numbers.
  5. Dates.
  6. Literal strings.
  7. nan and inf.

Just to make this package usable for basic things.

Use `binary` from megaparsec when tomland starts using megaparsec-7.0.0

This will make the integer parser code shorter.

The current integer parser code:

tomland/src/Toml/Parser.hs

Lines 159 to 168 in 00a2cac

integerP :: Parser Integer
integerP = lexeme $ binary <|> octal <|> hexadecimal <|> decimal
  where
    decimal      = L.signed sc L.decimal
    binary       = try (char '0' >> char 'b') >> mkNum 2 <$> (some binDigitChar)
    octal        = try (char '0' >> char 'o') >> L.octal
    hexadecimal  = try (char '0' >> char 'x') >> L.hexadecimal
    binDigitChar = oneOf ['0', '1']
    mkNum b      = foldl' (step b) 0
    step b a c   = a * b + fromIntegral (digitToInt c)

binary from megaparsec: https://github.com/mrkkrp/megaparsec/blob/f3e26a42231dfac748045325a7f45ba07a04f069/Text/Megaparsec/Char/Lexer.hs#L442-L452

Replace 'HashMap' in 'TOML' with our 'PrefixTree'

This involves:

  • Change TOML type
  • Add insert and lookup functions for PrefixMap (basically just refactoring current implementation)
  • Change parser (removing merging, using insert from PrefixMap)
  • Change renderer

Write 'doctest' tests

Okay, so I want to put every test case from the TOML specification into our documentation and test them all with doctest. Once we figure out how to do it in a neat way, we can implement those tests and use doctest for them.
