data_schema's People

Contributors

adzz, ivor, jekku, linusdm, seif

data_schema's Issues

Experiment with producing arbitrary data types from the schemas

Currently the library is focused on emitting a struct from some arbitrary input data.
This is fine, but the way we do that is with a reduce, which makes me wonder if it's possible to have schemas reduce into any arbitrary data type. It would be fun to explore this.

A use case could be using data_schema to parse API responses, but then also wanting to serialise requests from a struct, for example. So the flow would be:

API response (xml) => Struct => biz logic happens 
                                  ||
API request (xml) <============ Struct

An early idea is for schemas to have another attribute, like `@accumulator %MyStruct{}` or `@serialises_to ""`, that we can provide as the accumulator.
We'd probably end up making another function that `to_struct` calls to do the reduce.

This could possibly clean up the whole "reducing to a map" thing in the runtime schemas...
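A rough sketch of how such an accumulator-driven reduce could look (the `@accumulator` idea above; the reduce shape and `field/2` accessor callback here are assumptions, not the library's current internals):

```elixir
defmodule AccumulatorSketch do
  # Hypothetical: reduce schema fields into whatever accumulator is provided,
  # rather than always into a struct.
  def reduce_into(fields, input, accumulator, accessor) do
    Enum.reduce(fields, accumulator, fn {key, path, cast_fn}, acc ->
      {:ok, value} = cast_fn.(accessor.field(input, path))
      # Something (a protocol, or just Map.put for maps and structs) would
      # decide how each cast value gets added to the accumulator.
      Map.put(acc, key, value)
    end)
  end
end
```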

Let default be an atomic value too

Currently it's a 0-arity fn, which is good for something like `Date.utc_today/0` but verbose for `[]`, as it has to be `fn -> [] end`.

We can probably support both with a simple check.
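Supporting both shapes could be as simple as an arity check when applying the default (a sketch):

```elixir
defmodule DefaultSketch do
  # If the default is a 0-arity fun, call it; otherwise use the value as-is.
  def apply_default(default) when is_function(default, 0), do: default.()
  def apply_default(default), do: default
end

DefaultSketch.apply_default(fn -> [] end)      # => []
DefaultSketch.apply_default([])                # => []
DefaultSketch.apply_default(&Date.utc_today/0) # => today's date
```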

Fix documentation

I think the docs around `rescue` aren't valid Elixir 🙈 check!

Implement collect_errors option when casting

This would attempt to continue casting on other fields, collecting all of the errors as it goes.

This is potentially a bit tricky and less performant, so we should probably have separate functions.
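The error-collecting variant could reduce over all fields and accumulate errors instead of halting on the first one (a sketch with a simplified field shape):

```elixir
defmodule CollectErrorsSketch do
  # Keep casting every field; gather {field, reason} pairs as we go.
  def cast_all(fields, input) do
    fields
    |> Enum.reduce({%{}, []}, fn {key, path, cast_fn}, {acc, errors} ->
      case cast_fn.(Map.get(input, path)) do
        {:ok, value} -> {Map.put(acc, key, value), errors}
        {:error, reason} -> {acc, [{key, reason} | errors]}
      end
    end)
    |> case do
      {acc, []} -> {:ok, acc}
      {_acc, errors} -> {:error, Enum.reverse(errors)}
    end
  end
end
```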

Two Way Mappings? Isomorphisms? Round Trips?

Right now Data Schemas allow for creating a struct from XML for example. What about taking the struct and creating XML from the schema?

This feels possible, but if your schema selects a subset of the XML then obvioulsy if you round tripped you'd lose info.

However the round tripping isn't always neccessary, instead could we define a schema that serializes to XML?

Here casting doesn't make much sense (it will just become strings AFAICT, though casting fns would allow you to transform input data before putting into the XML)

field: {"./Path/To/Node/text()", :map_or_struct_key}
# or 
field: {"./Path/To/Node/text()", :map_or_struct_key, String}

Add option to treat empty lists as nil for the purpose of the optional check

For has_many and list_of fields, if we return an empty list from a data accessor and the field is not optional, we don't error.

Arguably we should, because it is empty. The problem is that for some input types (like maps/JSON) there may be a difference between "the value was provided and it was []" and "the value does not exist in the input data". This is the difference between %{preferences: []} and %{}. We don't want to lose this distinction.

In Ecto there is an option when casting, :empty_values, which lets the user specify which values should be considered empty. If a field has any of those values returned from the data accessor when casting, we treat it as nil for the purpose of the optionality check.

I wonder if a similar thing here could be useful...
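If we mirrored Ecto's approach, the normalisation step before the optionality check might look like this (the option name and placement are assumptions):

```elixir
defmodule EmptyValuesSketch do
  # Treat any configured "empty" value as nil for the optional? check only.
  # Only applied to values the accessor actually returned, so the
  # %{preferences: []} vs %{} distinction is preserved.
  def normalise(value, empty_values) do
    if value in empty_values, do: nil, else: value
  end
end

EmptyValuesSketch.normalise([], [[], ""])      # => nil
EmptyValuesSketch.normalise([:a], [[], ""])    # => [:a]
```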

Integrate with Phoenix Forms

We should investigate how best to do this, or whether it is desirable / possible. It seems we would have to implement something that implements the Phoenix.HTML.FormData protocol. Effectively very similar to changesets at that point.

We should also think about how, if this is used as a form-validation layer, it would integrate with Ecto further down the stack.
We could look here for inspiration: https://github.com/phoenixframework/phoenix_ecto/blob/master/lib/phoenix_ecto/html.ex

Enums aka discriminated unions ... ?

Should we add this capability? If so, how? Some options:

  1. a new field type
  2. handle it in the cast fns (not good).

If we have them we also presumably need to allow list of enums.

Add Telemetry

It might be that the functions run too quickly for telemetry to be viable...
But it could be good to be able to measure, for example, the time taken to complete each cast fn.
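One way to do this is `:telemetry.span/3` around each cast fn, letting handlers measure duration (a sketch; the event name and metadata are assumptions):

```elixir
defmodule TelemetrySketch do
  # Wrap a cast fn in a telemetry span so attached handlers receive
  # start/stop events (with duration) for each cast.
  def timed_cast(field, cast_fn, value) do
    :telemetry.span([:data_schema, :cast], %{field: field}, fn ->
      result = cast_fn.(value)
      {result, %{field: field}}
    end)
  end
end
```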

Feature: Generate Schemas From OpenAPI Schema

https://spec.openapis.org/oas/v3.1.0#fixed-fields-11

I think we should be able to go some way towards generating schemas based on an OpenAPI schema file.

There are a lot of questions here, and supporting it in the general case might be very tricky. But we can start simple.
It's possible this should be another repo though that you can optionally add... But we can start here and see where it gets us.

The idea is to get at least 80% of the way to written schemas (inline or not?). That would make creating API clients extremely easy by letting us ingest the schema, create schemas, and programmatically create a client for any API!

Feature: Better error messages

When a cast function raises unexpectedly it can be tricky to see what field actually caused it. This can be helped by unit testing the cast fns (so you are sure they handle the cases you expect) and then possibly IO.inspecting the value as it comes in. But it would be nicer to be able to see which field caused the error just straight off the bat.

The obvious thing to do would be to wrap the cast fns in a try/rescue, but I don't like that at all...
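For contrast, the try/rescue wrapping the issue is lukewarm on might look like this (a sketch):

```elixir
defmodule WrapCastSketch do
  # Re-raise with the field name attached, so the error immediately
  # tells you which field's cast fn blew up.
  def safe_cast(field, cast_fn, value) do
    cast_fn.(value)
  rescue
    error ->
      reraise "cast fn for field #{inspect(field)} raised: #{Exception.message(error)}",
              __STACKTRACE__
  end
end
```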

Benchmark

  • vs Ecto embedded_schemas
  • vs manually creating the struct
  • vs Map.puts?
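A Benchee run comparing the approaches could look like this (a sketch; `input`, the `Blog` schema module, and the manual builders are assumed to exist):

```elixir
# Sketch: assumes `input` (a map), a `Blog` data_schema module,
# and equivalent manual code are already defined.
Benchee.run(%{
  "data_schema to_struct" => fn -> DataSchema.to_struct(input, Blog) end,
  "manual struct" => fn -> %Blog{name: input["name"]} end,
  "Map.put" => fn -> Map.put(%{}, :name, input["name"]) end
})
```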

Should cast_fns be able to receive options...

Example use case is something like:

```elixir
field: {:sid,
  {["Flight"], {:attr, "FlightKey"}},
  {StringType, trim: true, blank_as_nil: true},
  optional?: true
}
```

OR

```elixir
field: {:sid,
  {["Flight"], {:attr, "FlightKey"}},
  StringType,
  optional?: true,
  cast_options: [trim: true, blank_as_nil: true]
}
```

This would be a breaking change, so we'd need to think about how to roll it out, especially given that most users probably won't need the functionality.

Do we have two behaviours for cast_fns?

Feature: Extract Paths

Sometimes it can be useful to see all of the paths in a given schema.
We should be able to do this both recursively (including all nested schemas) and one level deep, and we should spit out the field (the struct key) and the path to the data.
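A recursive walk over the schema's field list could do it (a sketch; the `__data_schema_fields/0` reflection function and the field tuple shapes used here are assumptions):

```elixir
defmodule ExtractPathsSketch do
  # Recursively collect {struct_key, path} pairs from a schema module.
  # Assumes schemas expose their fields via __data_schema_fields/0.
  def paths(schema) do
    Enum.flat_map(schema.__data_schema_fields(), fn
      {:field, {key, path, _cast}} -> [{key, path}]
      {:list_of, {key, path, _cast}} -> [{key, path}]
      {:has_one, {key, path, child}} -> [{key, path} | paths(child)]
      {:has_many, {key, path, child}} -> [{key, path} | paths(child)]
    end)
  end
end
```

A one-level-deep variant would simply skip the recursion into `child`.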

Fuzzing

Just an early thought, but:

It would be great if you could easily generate either valid schemas, or valid input data that could be consumed by a schema.
Even better if this is random in some way.

So one way we could do that is to add another (optional) callback to the cast fn that lets you define generators for input to the cast fns.

If that is implemented then you could easily generate example data from it without needing factories or anything like that.
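With StreamData, the hypothetical generator callback might be used like this (a sketch; `generator/0` is the proposed addition, not an existing callback):

```elixir
# Sketch: a cast module with the hypothetical generator/0 callback.
defmodule StringType do
  def cast(value), do: {:ok, to_string(value)}

  # Hypothetical extra callback: a StreamData generator of valid inputs.
  def generator, do: StreamData.string(:alphanumeric)
end

# Generating example input data for a field then becomes:
StringType.generator() |> Enum.take(5)
```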

Allow MFA as casting fns?

We should be aware that local captures seem to have a performance cost in Erlang: beam-telemetry/telemetry#43

We should at the minimum mention that in the docs. We should also see how feasible it is to allow MFA syntax. I guess we would prepend the value onto the argument list which means you could add literals to the fn like so:

```elixir
field: {:thing, "thing", {MyMod, :my_fun, [1, 2]}}
```

That would get called like this: `MyMod.my_fun(input["thing"], 1, 2)`
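Dispatching on the cast fn's shape could then be a small case (a sketch):

```elixir
defmodule CastDispatchSketch do
  # Support plain 1-arity funs, modules exporting cast/1, and
  # {Module, fun, args} tuples with the value prepended to the args.
  def run_cast(cast, value) do
    case cast do
      fun when is_function(fun, 1) -> fun.(value)
      {mod, fun, args} -> apply(mod, fun, [value | args])
      mod when is_atom(mod) -> mod.cast(value)
    end
  end
end
```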

The use case is partially handled by allowing a Module to be provided as a cast fn, but we can't pass in extra arguments that way.

The downside is the potential for the schemas to get quite wide and unreadable. They risk looking a bit scruffy...

Schemaless Schemas

Allow for schemaless schemas - ie schemas that are just the tuple based representation.

This should be relatively simple.

The tricky bit is how to allow for inline schemas that have has_many / has_one. The syntax is not particularly nice, and we need a way to provide a data accessor. Arguably it should use the data accessor of the "parent", i.e. the accessor that is passed in when calling to_struct initially.

Lots of branching options here... because the number of ways you can provide a struct is... a lot.

I like the idea of being able to turn some data into an existing struct without having to add code into that struct's module - like a data_accessor. That would allow for one struct to work in many different schemas.
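Schemaless usage might look roughly like this (a sketch; the exact to_struct arity and argument order for schemaless fields is an assumption):

```elixir
# Sketch: pass the tuple-based field list directly instead of a schema module.
fields = [
  field: {:name, "name", fn value -> {:ok, to_string(value)} end}
]

DataSchema.to_struct(%{"name" => "Schemaless!"}, %{}, fields, DataSchema.MapAccessor)
```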
