yardl's People

Contributors

dependabot[bot], johnstairs, microsoftopensource, naegelejd

yardl's Issues

Support for column-major layout in binary format?

The C++ types for multi-dimensional arrays (yardl::FixedNDArray, yardl::NDArray, and yardl::DynamicNDArray) are all locked to row-major layout at compile-time.

We will eventually support languages that default to column-major ordering. HDF5 requires data to be written in row-major order, so we will need to convert. For the binary format, we could do the same, or we could prefix each array with a byte indicating the layout. This could avoid expensive permutations if readers and writers are both working with column-major ordering.
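To make the trade-off concrete, here is a minimal sketch in plain Python of the same 2x3 array flattened in both orders, with a one-byte layout prefix. The tag values (0 for row-major, 1 for column-major) and the function names are hypothetical; nothing like this exists in the current binary format.

```python
ROW_MAJOR = 0     # hypothetical layout tag
COLUMN_MAJOR = 1  # hypothetical layout tag


def flatten(rows, layout):
    """Flatten a rectangular list of lists in the requested order."""
    if layout == ROW_MAJOR:
        return [v for row in rows for v in row]
    return [rows[r][c] for c in range(len(rows[0])) for r in range(len(rows))]


def write_array(rows, layout):
    """Prefix the flattened values with a layout byte (values must fit in one byte here)."""
    return bytes([layout]) + bytes(flatten(rows, layout))


def read_array(data, n_rows, n_cols):
    """Rebuild the rows; index arithmetic differs depending on the stored layout."""
    layout, flat = data[0], list(data[1:])
    if layout == ROW_MAJOR:
        return [flat[r * n_cols:(r + 1) * n_cols] for r in range(n_rows)]
    return [[flat[c * n_rows + r] for c in range(n_cols)] for r in range(n_rows)]


a = [[1, 2, 3], [4, 5, 6]]
```

A column-major writer and a column-major reader could both skip the permutation entirely; only mixed pairs would pay for the reordering.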

Invalid Python generated for records containing optional generic fields

Using yardl 2d61ba3 with the following minimal model:

MyRecord: !record
  fields:
    myField: RecordWithGenericOptional<string>

RecordWithGenericOptional<T>: !record
  fields:
    value: T?

The generated types.py does not properly initialize the inner record class. See generated classes below:

class RecordWithGenericOptional(typing.Generic[T]):
    value: typing.Optional[T]

    def __init__(self, *,
        value: typing.Optional[T],
    ):
        self.value = value

    ...


class MyRecord:
    my_field: RecordWithGenericOptional[str]

    def __init__(self, *,
        my_field: typing.Optional[RecordWithGenericOptional[str]] = None,
    ):
        self.my_field = my_field if my_field is not None else RecordWithGenericOptional()

    ...

Python throws a TypeError when creating an instance of MyRecord:

In [1]: import combined

In [2]: combined.MyRecord()
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[2], line 1
----> 1 combined.MyRecord()

File /workspaces/yardl/joe/issue-switch-over-union/python/combined/types.py:45, in MyRecord.__init__(self, my_field)
     42 def __init__(self, *,
     43     my_field: typing.Optional[RecordWithGenericOptional[str]] = None,
     44 ):
---> 45     self.my_field = my_field if my_field is not None else RecordWithGenericOptional()

TypeError: RecordWithGenericOptional.__init__() missing 1 required keyword-only argument: 'value'

In RecordWithGenericOptional.__init__, value should be instantiated with a default value of None because it is Optional.

However, yardl currently omits default values for all generic types:

if dsl.ContainsGenericTypeParameter(f.Type) {
    // cannot default generic type parameters
    // because they don't really exist at runtime
    defaultExpressionKind = defaultValueKindNone
} else {
    defaultExpression, defaultExpressionKind = typeDefault(f.Type, rec.Namespace, "", st)
}
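For reference, this is a hand-written sketch of what the generated classes might look like if optional generic fields defaulted to None. It is not actual yardl output, only an illustration of the expected behaviour described in this issue.

```python
import typing

T = typing.TypeVar("T")


class RecordWithGenericOptional(typing.Generic[T]):
    value: typing.Optional[T]

    def __init__(self, *, value: typing.Optional[T] = None):
        # None is a valid default for an optional field, whatever T turns out to be.
        self.value = value


class MyRecord:
    my_field: RecordWithGenericOptional[str]

    def __init__(
        self, *, my_field: typing.Optional[RecordWithGenericOptional[str]] = None
    ):
        # The inner constructor can now be called without arguments.
        self.my_field = (
            my_field if my_field is not None else RecordWithGenericOptional()
        )


r = MyRecord()
```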

Unable to create switch expression of type string

Using yardl v0.4.0, I expect to be able to use a computed field switch expression to produce a string value:

RecordWithComputedFields: !record
  fields:
    myField: [null, string, float]
  computedFields:
    myResult:
      !switch myField:
        null: "null"
        string: "string"
        float: "float"

But yardl complains with:

❌ /workspaces/yardl/joe/switch-case-string/model/model.yml:6:14: there is no variable in scope with the name 'null' nor does the record 'RecordWithComputedFields' does not have a field or computed field named 'null'
❌ /workspaces/yardl/joe/switch-case-string/model/model.yml:7:18: there is no variable in scope with the name 'string' nor does the record 'RecordWithComputedFields' does not have a field or computed field named 'string'
❌ /workspaces/yardl/joe/switch-case-string/model/model.yml:8:17: there is no variable in scope with the name 'float' nor does the record 'RecordWithComputedFields' does not have a field or computed field named 'float'

exported CMakeLists.txt problems

  • should set CMAKE_CXX_STANDARD (what's the current minimum?)
  • should set target_include_directories
  • do we really need to depend on the C HDF5 libraries?
  • why is this code here, and if it's needed, why is it before the find_package(HDF5)? (it'll be overwritten, no?)
    if(VCPKG_TARGET_TRIPLET)
      set(HDF5_CXX_LIBRARIES hdf5::hdf5_cpp-shared)
    else()
      set(HDF5_CXX_LIBRARIES hdf5::hdf5_cpp)
    endif()

xtensor required version undocumented

For whatever reason, my conda install got xtensor=0.21.10. The generated code fails to compile, though, because xtensor_container doesn't have the flat member. xtensor-stack/xtensor@50e3d42 says this means at least 0.23.10 is required.

Ideally, this minimum version should be added to the generated CMakeLists.txt.

Of course, the same holds for other dependencies.

Vectors of bools broken

Vectors of booleans:

V: !vector
  items: bool

are not handled because the .data() method is deleted from the std::vector<bool> specialization. Additionally, binary serialization should write out a bitstream where each value is a bit rather than a byte.
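As an illustration of the bitstream idea, here is a minimal Python sketch that packs one boolean per bit. The bit order (least significant bit first) is an assumption made for this example; the issue does not specify one.

```python
def pack_bools(values):
    """Pack booleans into bytes, one bit per value, least significant bit first."""
    out = bytearray((len(values) + 7) // 8)
    for i, v in enumerate(values):
        if v:
            out[i // 8] |= 1 << (i % 8)
    return bytes(out)


def unpack_bools(data, count):
    """Inverse of pack_bools; count is needed because the last byte may be padded."""
    return [bool((data[i // 8] >> (i % 8)) & 1) for i in range(count)]
```

The element count has to be stored separately (as it already is for vectors of unknown length), since the final byte can contain padding bits.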

Add JSON serialization target

The ability to write a protocol out as JSON could be useful for debugging, even if it is not well-suited for large streams of scientific data.

C++ optional build of HDF5/NDJSON support

I suggest adding

option(${prefix}_HDF5_SUPPORT "Add HDF5 protocol" ON)

or similar, and the same for NDJSON. These could be advanced options. This would allow advanced users to switch off something that they don't need.

Choice of C++ ndarray type

Creating a separate issue based on #20 opened by @KrisThielemans

Also, somewhere in the doc we'll need a description of mappings between yardl types and C++ and other target languages. In particular, I believe you generate your own multi-dim array type as there still doesn't seem to be an std container sadly.

It could be useful to support a few existing multi-dim arrays to avoid copies in client-code (Boost.MultiArray and https://amypad.github.io/CuVec/ come to mind), but I can see that becoming very difficult. (If a mapping to a flat array is exposed somewhere, it'd need to be stated if row-major or column-major order is used).

These are good points. We currently use xtensor types for multidimensional arrays that we alias here. These have a .data() method that exposes the raw flat array.

I think we have some choices for this problem:

  1. We implement our own ndarray types that provide the minimum API surface and aim to make interop with other libraries "easy".
  2. We support a number of different libraries and generate different code depending on a setting in the _package.yaml.
  3. Implement both of the above, since they are not mutually exclusive, with (1) being the default.

Related problem: in some instances, perhaps the memory should be allocated on the GPU. Should this be a property on the !array in yardl?

Python ndjson error reading aliased nullable union with value None

Using yardl v0.4.0, if you add an aliased nullable union to a Protocol sequence, the NDJSON reader will crash if the value of that step is None.

Example:

GenericNullableUnion2<T1, T2>: [null, T1, T2]

RecordWithUnions: !record
  fields:
    value: [null, int, string]
    aliasedValue: GenericNullableUnion2<int, string>

Then, using the following code to convert an instance of RecordWithUnions to json and back again:

import yay

converter = yay.ndjson.RecordWithUnionsConverter()

json = converter.to_json(yay.RecordWithUnions())

r = converter.from_json(json)

The last line throws:

Traceback (most recent call last):
  File "/workspaces/yardl/joe/issue-#113/python/test.py", line 7, in <module>
    r = converter.from_json(json)
        ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspaces/yardl/joe/issue-#113/python/yay/ndjson.py", line 58, in from_json
    aliased_value=self._aliased_value_converter.from_json(json_object["aliasedValue"],),
                                                          ~~~~~~~~~~~^^^^^^^^^^^^^^^^
KeyError: 'aliasedValue'
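The traceback suggests that the key is absent from the JSON object when the value is None, while the reader indexes it unconditionally. A minimal sketch of the tolerant pattern, assuming that reading of the failure, follows. The function names and the flat JSON shape are hypothetical simplifications and do not reproduce the generated converters.

```python
import json


def record_to_json(value, aliased_value):
    """Mimic a writer that omits fields whose value is None."""
    obj = {}
    if value is not None:
        obj["value"] = value
    if aliased_value is not None:
        obj["aliasedValue"] = aliased_value
    return json.dumps(obj)


def record_from_json(text):
    """Read nullable fields with dict.get so that a missing key maps to None."""
    obj = json.loads(text)
    return {
        "value": obj.get("value"),
        "aliased_value": obj.get("aliasedValue"),
    }
```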

Error reading empty stream if it is the last step in a Protocol

If:

  1. the last step in a Protocol is a stream, and
  2. the user is batch reading the stream, and
  3. the stream is empty,

then the Reader.Close() method throws an incorrect error.

Using 28aa4af and the following model:

EmptyTest: !protocol
  sequence:
    strings: !stream
      items: string

and the following demonstration program:

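#include <cassert>
#include <string>
#include <vector>
// plus the generated header that declares ::binary::EmptyTestWriter and ::binary::EmptyTestReader

int main(void) {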
  ::binary::EmptyTestWriter w("test.bin");

  std::vector<std::string> strings;
  w.WriteStrings(strings);
  w.EndStrings();
  w.Close();

  ::binary::EmptyTestReader r("test.bin");

  int count = 0;
  strings.reserve(10);
  while (r.ReadStrings(strings)) {
    for (auto const& s : strings) {
      (void)(s);
      count++;
    }
  }
  assert(count == 0);

  r.Close();

  return 0;
}

the call to r.Close() throws the following error:

terminate called after throwing an instance of 'std::runtime_error'
  what():  Expected call to ReadStrings() but received call to Close() instead.

Wrong error message in Python get_dtype for generic type missing type arguments

This is a very minor bug, but it led to a necessary review of how the generated Python get_dtype function should work.

Given a model with an aliased, generic type:

GenericRecord<T>: !record
  fields:
    v: T

AliasedRecord<T>: GenericRecord<T>

If I call get_dtype on GenericRecord without specifying type arguments, I get a useful error message:

m.get_dtype(m.GenericRecord)
...
RuntimeError: Generic type arguments not provided for <class 'm.types.GenericRecord'>

But if I do the same for the aliased type, I do not get the same, expected error message:

m.get_dtype(m.AliasedRecord)
...
RuntimeError: Cannot find dtype for ~T

The user does not know what ~T is.
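One possible way to produce the same message for both cases is to check for unresolved type variables before looking anything up. The sketch below uses only the standard typing module; check_type_args is a hypothetical helper, not a function in the generated code.

```python
import typing

T = typing.TypeVar("T")


class GenericRecord(typing.Generic[T]):
    pass


# An open alias still carries the TypeVar as its argument.
AliasedRecord = GenericRecord[T]


def check_type_args(t):
    """Raise a readable error when a generic type lacks concrete type arguments."""
    origin = typing.get_origin(t) or t
    args = typing.get_args(t)
    # __parameters__ lists the type variables that are still unbound.
    is_open_generic = bool(getattr(t, "__parameters__", ()))
    if is_open_generic or any(isinstance(a, typing.TypeVar) for a in args):
        raise RuntimeError(f"Generic type arguments not provided for {origin}")
    return origin, args
```

With this check, both GenericRecord and AliasedRecord are rejected with a message that names the record class rather than ~T.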

Python RecursionError with aliased generics

Using yardl commit ae9b826 with the following model:

GenericRecord<T>: !record
  fields:
    v: T

AliasedRecord<T>: GenericRecord<T>

AliasedOpenGeneric<T>: AliasedRecord<T>
AliasedClosedGeneric: AliasedRecord<string>

To reproduce, generate Python for this model, then import the generated Python module.
The module won't import, failing with the following error:

Traceback (most recent call last):
  File "/workspaces/yardl/joe/models/bug/python/test.py", line 1, in <module>
    import bug
  File "/workspaces/yardl/joe/models/bug/python/bug/__init__.py", line 21, in <module>
    from .types import (
  File "/workspaces/yardl/joe/models/bug/python/bug/types.py", line 56, in <module>
    get_dtype = _mk_get_dtype()
                ^^^^^^^^^^^^^^^
  File "/workspaces/yardl/joe/models/bug/python/bug/types.py", line 52, in _mk_get_dtype
    dtype_map[AliasedClosedGeneric] = get_dtype(types.GenericAlias(AliasedRecord, (str,)))
                                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspaces/yardl/joe/models/bug/python/bug/_dtypes.py", line 107, in <lambda>
    return lambda t: get_dtype_impl(dtype_map, t)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspaces/yardl/joe/models/bug/python/bug/_dtypes.py", line 90, in get_dtype_impl
    return res(get_args(t))
           ^^^^^^^^^^^^^^^^
  File "/workspaces/yardl/joe/models/bug/python/bug/types.py", line 51, in <lambda>
    dtype_map[AliasedOpenGeneric] = lambda type_args: get_dtype(types.GenericAlias(AliasedRecord, (type_args[0],)))
                                                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
...
RecursionError: maximum recursion depth exceeded in comparison

For this particular model, reordering the aliases as shown below can eliminate the error, but this is a confusing limitation for the user.

AliasedClosedGeneric: AliasedRecord<string>
AliasedOpenGeneric<T>: AliasedRecord<T>

Invalid Python codegen when record contains aliased nullable union type

Given the following model on commit 42be458:

X: [null, int, float]

MyRec: !record
  fields:
    a: X

We get the following exception when importing the generated code:

Traceback (most recent call last):
  File "/workspaces/yardl/python/run_sandbox.py", line 5, in <module>
    import sandbox
  File "/workspaces/yardl/python/sandbox/__init__.py", line 21, in <module>
    from .types import (
  File "/workspaces/yardl/python/sandbox/types.py", line 121, in <module>
    get_dtype = _mk_get_dtype()
                ^^^^^^^^^^^^^^^
  File "/workspaces/yardl/python/sandbox/types.py", line 117, in _mk_get_dtype
    dtype_map.setdefault(MyRec, np.dtype([('a', get_dtype(typing.Optional[X]))], align=True))
                                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspaces/yardl/python/sandbox/_dtypes.py", line 87, in <lambda>
    return lambda t: get_dtype_impl(dtype_map, t)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspaces/yardl/python/sandbox/_dtypes.py", line 60, in get_dtype_impl
    return _get_union_dtype(get_args(t))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspaces/yardl/python/sandbox/_dtypes.py", line 81, in _get_union_dtype
    inner_type = get_dtype_impl(dtype_map, args[0])
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspaces/yardl/python/sandbox/_dtypes.py", line 76, in get_dtype_impl
    raise RuntimeError(f"Cannot find dtype for {t}")
RuntimeError: Cannot find dtype for <class 'sandbox.types.X'>

Another problem is that the dtype for [null, int, float] should be np.object_, instead of {"has_value": np.bool, "value": np.object_}

Another problem is that the generated union classes that do not have a named type (e.g. Int32OrString) are not recognized by get_dtype() and throw.

binary format doc on alignment

By using varints, strings, etc., it's possible that data is not aligned to a 32-bit (or any other) boundary. It doesn't seem to be documented whether the binary format fills in the gaps or not. This certainly needs to be documented for Records and Streams.
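To illustrate why alignment is lost, here is a small Python sketch. It assumes a LEB128-style unsigned varint (7 data bits per byte) and a varint length prefix for UTF-8 strings; these are assumptions for the example and should be checked against the binary format documentation.

```python
def encode_varint(n):
    """Encode a non-negative integer using 7 data bits per byte, low bits first."""
    if n < 0:
        raise ValueError("only non-negative integers are handled in this sketch")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_string(s):
    """Varint byte count followed by the UTF-8 bytes (assumed layout)."""
    raw = s.encode("utf-8")
    return encode_varint(len(raw)) + raw


# A two-byte varint followed by a three-character string occupies 6 bytes,
# so whatever comes next does not start on a 4-byte boundary.
record = encode_varint(300) + encode_string("abc")
```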

multi-dim arrays

https://github.com/microsoft/yardl/blob/main/docs/docs.md#computed-fields states

MyRec: !record
  fields:
    arrayField: !array
        items: int
        dimensions: [x, y]
  computedFields:
    accessArrayElementByName: arrayField[y:1, x:0]

This swaps the order between dimensions and access. Is this intentional? It'd be very confusing!
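If the named form is meant to be order-independent (an assumption; the documentation excerpt above does not say so explicitly), its semantics could be sketched in Python as follows. access_by_name is a hypothetical helper written for this illustration.

```python
def access_by_name(array, dimensions, **indices):
    """Index a nested list by dimension name; the argument order does not matter."""
    element = array
    for name in dimensions:  # always walk the dimensions in declared order
        element = element[indices[name]]
    return element


# dimensions: [x, y], so array_field[x][y]
array_field = [[10, 11], [20, 21]]
dims = ["x", "y"]
```

Under that reading, arrayField[y:1, x:0] and arrayField[x:0, y:1] would refer to the same element.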

Also, somewhere in the doc we'll need a description of mappings between yardl types and C++ and other target languages. In particular, I believe you generate your own multi-dim array type as there still doesn't seem to be an std container sadly.

It could be useful to support a few existing multi-dim arrays to avoid copies in client-code (Boost.MultiArray and https://amypad.github.io/CuVec/ come to mind), but I can see that becoming very difficult. (If a mapping to a flat array is exposed somewhere, it'd need to be stated if row-major or column-major order is used).

environment.yml is overcomplete (and Linux specific)

On Windows from Powershell.

mamba env create  --file environment.yml

Looking for: ['bash-completion=2.11', 'ccache=4.5.1', 'clang-format=14.0.4', 'cmake=3.21.3', 'fmt=8.1.1', 'gcc_linux-64]

Could not solve for environment specs
Encountered problems while solving:
  - nothing provides requested bash-completion 2.11**
  - nothing provides requested gcc_linux-64 11.2.0**
  - nothing provides requested gdb 11.2**
  - nothing provides requested gxx_linux-64 11.2.0**
  - nothing provides requested valgrind 3.18.1**

The environment can't be solved, aborting the operation

I guess we should remove valgrind and gdb? Even bash_completion and ccache. Maybe even clang-format.

Of course, the justfile is bash/Linux specific as well and I guess Windows support is for later.

PS: Is pinning the compiler version etc best practice? Maybe some of these could be >=?

Union of aliased type compiler error for C++ NDJSON

Using 28aa4af and the following model:

UnionOfAlias: !protocol
  sequence:
    variant: [int, string]
    variantAlias: [AliasedInt, string]

produces the following compiler error for the C++ NDJSON serialization:

/workspaces/yardl/joe/quickcheck/cpp/generated/ndjson/protocols.cc:33:8: error: redefinition of 'struct nlohmann::json_abi_v3_11_2::adl_serializer<std::variant<int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >'
   33 | struct adl_serializer<std::variant<check::AliasedInt, std::string>> {
      |        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/workspaces/yardl/joe/quickcheck/cpp/generated/ndjson/protocols.cc:14:8: note: previous definition of 'struct nlohmann::json_abi_v3_11_2::adl_serializer<std::variant<int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >'
   14 | struct adl_serializer<std::variant<int32_t, std::string>> {
      |        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

licensing of generated code

From @hansenms's email:

  1. The yardl tool is MIT license.
  2. During code generation, some hand-crafted files are copied alongside the generated code. They are MIT-licensed too, and we should put that license on them.
  3. The yardl specification that the user writes can be whichever license they want. We would recommend MIT.
  4. The code that is truly generated can have any license that the user wants. We recommend MIT.
  5. We should add a feature to yardl, that allows you to put a license file in the _package directory, it should be emitted with the generated code. We may have to think about how that is transparent in the generated directory which license applies to what. There could be a mix of licenses in there.

User-defined alias types are not checked for compatibility with existing types in a union

When defining a union type field within a record, there is validation which ensures all type cases in a union are distinct. User-defined aliases are not checked. This leads to issues compiling generated code due to variants containing multiples of the same underlying type. There also appears to be a similar conflict with 'size' and 'uint64'.

MyIntType: uint64

MyRecord: !record
  fields:
    one: [uint64, MyIntType]

MyRecord: !record
  fields:
    one: [uint64, size]

Records defined as such succeed in code generation, but that code cannot be compiled.
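A validation pass could resolve aliases before comparing union cases. The following Python sketch shows the idea; the alias table is hypothetical, and treating size as equivalent to uint64 is an assumption based on the conflict reported above.

```python
# Hypothetical alias table: name -> underlying type.
ALIASES = {"MyIntType": "uint64", "size": "uint64"}


def resolve(type_name):
    """Follow alias links until a non-alias type is reached."""
    seen = set()
    while type_name in ALIASES and type_name not in seen:
        seen.add(type_name)  # guards against cyclic alias definitions
        type_name = ALIASES[type_name]
    return type_name


def find_duplicate_cases(union_cases):
    """Return pairs of union cases that resolve to the same underlying type."""
    first_seen = {}
    duplicates = []
    for case in union_cases:
        underlying = resolve(case)
        if underlying in first_seen:
            duplicates.append((first_seen[underlying], case))
        else:
            first_seen[underlying] = case
    return duplicates
```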

mypy generates lots of warnings on generated code

For instance, on https://github.com/ETSInitiative/PRDdefinition/tree/main/python

$ mypy prd_generator.py 
prd/yardl_types.py:270: error: No overload variant of "zip" matches argument types "void", "void"  [call-overload]
prd/yardl_types.py:270: note: Possible overload variants:
prd/yardl_types.py:270: note:     def [_T_co, _T1] __new__(cls, Iterable[_T1], /, *, strict: bool = ...) -> zip[tuple[_T1]]
prd/yardl_types.py:270: note:     def [_T_co, _T1, _T2] __new__(cls, Iterable[_T1], Iterable[_T2], /, *, strict: bool = ...) -> zip[tuple[_T1, _T2]]
prd/yardl_types.py:270: note:     def [_T_co, _T1, _T2, _T3] __new__(cls, Iterable[_T1], Iterable[_T2], Iterable[_T3], /, *, strict: bool = ...) -> zip[tuple[_T1, _T2, _T3]]
prd/yardl_types.py:270: note:     def [_T_co, _T1, _T2, _T3, _T4] __new__(cls, Iterable[_T1], Iterable[_T2], Iterable[_T3], Iterable[_T4], /, *, strict: bool = ...) -> zip[tuple[_T1, _T2, _T3, _T4]]
prd/yardl_types.py:270: note:     def [_T_co, _T1, _T2, _T3, _T4, _T5] __new__(cls, Iterable[_T1], Iterable[_T2], Iterable[_T3], Iterable[_T4], Iterable[_T5], /, *, strict: bool = ...) -> zip[tuple[_T1, _T2, _T3, _T4, _T5]]
prd/yardl_types.py:270: note:     def [_T_co] __new__(cls, Iterable[Any], Iterable[Any], Iterable[Any], Iterable[Any], Iterable[Any], Iterable[Any], /, *iterables: Iterable[Any], strict: bool = ...) -> zip[tuple[Any, ...]]
prd/yardl_types.py:299: error: "object" has no attribute "value"  [attr-defined]
prd/_ndjson.py:48: error: Incompatible types in assignment (expression has type "TextIO", variable has type "TextIOWrapper")  [assignment]
prd/_ndjson.py:86: error: Incompatible types in assignment (expression has type "BufferedReader | TextIO", variable has type "TextIOWrapper")  [assignment]
prd/_ndjson.py:940: error: <nothing> has no attribute "to_json"  [attr-defined]
prd/_ndjson.py:958: error: <nothing> has no attribute "from_json"  [attr-defined]
prd/_ndjson.py:993: error: Incompatible types in assignment (expression has type "None", variable has type "tuple[int, ...]")  [assignment]
prd/_ndjson.py:1024: error: Need type annotation for "result"  [var-annotated]
prd/_binary.py:1071: error: Incompatible types in assignment (expression has type "None", variable has type "tuple[int, ...]")  [assignment]
prd/_binary.py:1076: error: <nothing> has no attribute "_element_serializer"  [attr-defined]
prd/_binary.py:1115: error: Need type annotation for "result"  [var-annotated]

Allow specifying HDF5 group path

When writing a protocol to an HDF5 file, we create a group with the protocol's name. If you wanted to store multiple experiments with the same protocol in the same file, we could have an optional path parameter that specifies the group to put the protocol in.

Add RelativeTime?

It seems useful to have a RelativeTime, i.e. offset w.r.t. some defined DateTime, such as a scan start. This would be quite useful in de-identifying some data. In some cases, the time of scan needs to be removed from the data, but it'd be painful to have to adjust all times in the file.
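The idea can be sketched with Python's datetime module. The dates below are made up for illustration, and to_relative and to_absolute are hypothetical names, not proposed yardl API.

```python
from datetime import datetime, timedelta, timezone


def to_relative(t, reference):
    """Express an absolute time as an offset from the reference."""
    return t - reference


def to_absolute(offset, reference):
    """Recover an absolute time from an offset and a reference."""
    return reference + offset


scan_start = datetime(2023, 5, 1, 9, 30, tzinfo=timezone.utc)
event = datetime(2023, 5, 1, 9, 30, 12, tzinfo=timezone.utc)
offset = to_relative(event, scan_start)

# De-identification only replaces the reference; the stored offsets stay valid.
anonymous_start = datetime(1970, 1, 1, tzinfo=timezone.utc)
shifted_event = to_absolute(offset, anonymous_start)
```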

Implement `switch` expression in expression language, not YAML

Computed fields are currently an embedded expression language within a YAML file. switch expressions (to work with unions and optional types) are not expressed in this language, but rather as YAML nodes:

optionalNamedArrayLength: # YAML
  !switch optionalNamedArray: # YAML
    NamedNDArray arr: size(arr) # YAML-type-expression hybrid
    null: 0 # YAML-type-expression hybrid

This does not allow switch expressions to be used as part of larger expressions (type conversions, a function call argument, etc).

Instead, we should consider making switch part of the expression language. The example above might then look like:

optionalNamedArrayLength: |
  switch(optionalNamedArray) {
    NamedNDArray arr: size(arr)
    null: 0
  }

On the other hand, this syntax introduces curly braces within a YAML document, where indentation is usually favoured.

Python union classes generated with duplicate type parameters

Using the following model:

GenericUnion<T>: !union
  t: T
  tv: T*
  tvf: T[]

yardl v0.4.0 generates invalid Python:

class GenericUnion(typing.Generic[T, T_NP, T, T_NP, T, T_NP]):

Error message on import:

TypeError: Parameters to Generic[...] must all be unique
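A fix on the generator side would presumably need to deduplicate the collected type parameters while keeping their order. In Python the pattern looks like this; generic_base is a hypothetical helper that only formats the base-class text.

```python
def unique_type_parameters(params):
    """Drop repeated type parameter names while keeping first-seen order."""
    return list(dict.fromkeys(params))


def generic_base(params):
    """Format the typing.Generic[...] base for a class declaration."""
    return f"typing.Generic[{', '.join(unique_type_parameters(params))}]"
```

For the model above, the parameters collected from the three union cases would collapse to T and T_NP.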

Support flags enums

We should have a special kind of enum for flags that are meant to be bitwise ORed together:

!flags
  values:
    - none
    - red
    - green
    - blue

The first value will always be 0.

As with enums, you can specify the base type and integer values:

!flags
  values:
    none: 0
    red: 1
    green: 2
    blue: 4
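For comparison, Python's standard library already has a matching concept in enum.IntFlag, which could serve as a target for generated code. The class name Color is hypothetical, since the model above does not name the type.

```python
import enum


class Color(enum.IntFlag):
    none = 0
    red = 1
    green = 2
    blue = 4


# Flags combine with bitwise OR and can be tested with "in" or bitwise AND.
yellow = Color.red | Color.green
```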

confusing naming of functions/members

There is some renaming of members going on in the generated code, but it is not consistent:

ScannerInformation: !record
  fields:
    tofBinEdges: !array
  computedFields:
    numberOfTOFBins: size(tofBinEdges)-1

This leads to a tof_bin_edges member in both C++ and Python, but NumberOfTOFBins() (note the capital N) in C++ and number_of_tof_bins() in Python.

Personally I'd try to avoid any renaming, but maybe that is difficult when covering multiple languages. We could enforce naming in the yardl model?
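For illustration, the two conversions that appear to be at play can be written in a few lines of Python. These functions merely reproduce the names reported above; they are not yardl's actual renaming rules.

```python
import re

# Split before an uppercase letter that follows a lowercase letter or digit,
# and before the last uppercase letter of an acronym that is followed by lowercase.
_WORD_BOUNDARY = re.compile(r"(?<=[a-z0-9])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])")


def to_snake_case(name):
    """camelCase to snake_case, keeping acronyms such as TOF together."""
    return _WORD_BOUNDARY.sub("_", name).lower()


def to_pascal_case(name):
    """camelCase to PascalCase by upper-casing the first character only."""
    return name[:1].upper() + name[1:]
```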

basic algebra on computed fields

It'd be nice to be able to do some basic manipulations for a computed field, e.g. subtracting 1

ScannerInformation: !record
  fields:
    # edge information for TOF bins in mm (e.g. start,edge1, ... end)
    tofBinEdges: float*
  computedFields:
    numberOfTOFBins: size(tofBinEdges)-1

container-independent access

At present, I believe the user has to know if the stored data is binary or HDF5, and instantiate the corresponding class. That's efficient but also very inconvenient. It would certainly be nice to be able to write some client-code that does not depend on the container-type. (Edit: I see that there are abstract classes in protocols.h already, so possibly the only thing that's necessary is a factory that determines the container-type given a filename)
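Such a factory could look at the first bytes of the file. The sketch below, written in Python for brevity, recognises the standard 8-byte HDF5 signature and treats everything else as the binary format; that fallback, the function names, and the readers mapping are all assumptions for this example.

```python
# First 8 bytes of an HDF5 file that has no user block.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def detect_container(header):
    """Guess the container type from the leading bytes of a file."""
    if header.startswith(HDF5_SIGNATURE):
        return "hdf5"
    return "binary"  # assumption: anything else is the yardl binary format


def open_reader(path, readers):
    """Instantiate the reader class registered for the detected container.

    readers is a hypothetical mapping such as {"hdf5": ..., "binary": ...}.
    """
    with open(path, "rb") as f:
        kind = detect_container(f.read(len(HDF5_SIGNATURE)))
    return readers[kind](path)
```

Client code would then depend only on the abstract reader interface and never name a concrete container class.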

Python TypeError instantiating record that contains aliased generic field

Using yardl commit ab1e2b with the following model:

GenericRecord<T>: !record
  fields:
    v: T

AliasedRecord<T>: GenericRecord<T>

MyRecord: !record
  fields:
    myField: AliasedRecord<int>

To reproduce, generate Python for this model, then import the generated Python module and create an instance of MyRecord with no arguments.
Python will complain that MyRecord.__init__() is missing the keyword argument for my_field:

Python 3.11.3 | packaged by conda-forge | (main, Apr  6 2023, 08:57:19) [GCC 11.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import issue_082
>>> r = issue_082.MyRecord()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: MyRecord.__init__() missing 1 required keyword-only argument: 'my_field'

The generated code for MyRecord looks like this:

class MyRecord:
    my_field: AliasedRecord[yardl.Int32]

    def __init__(self, *,
        my_field: AliasedRecord[yardl.Int32],
    ):
        self.my_field = my_field

If I remove the AliasedRecord from the model and use GenericRecord directly, I get the expected class definition for MyRecord, and it works:

class MyRecord:
    my_field: GenericRecord[yardl.Int32]

    def __init__(self, *,
        my_field: typing.Optional[GenericRecord[yardl.Int32]] = None,
    ):
        self.my_field = my_field if my_field is not None else GenericRecord(v=0)

The relevant code is

var defaultExpression string
var defaultExpressionKind defaultValueKind
if dsl.ContainsGenericTypeParameter(f.Type) {
    // cannot default generic type parameters
    // because they don't really exist at runtime
    defaultExpressionKind = defaultValueKindNone
} else {
    defaultExpression, defaultExpressionKind = typeDefault(f.Type, rec.Namespace, st)
}
switch defaultExpressionKind {
case defaultValueKindNone:
    w.WriteString(fieldTypeSyntax)
case defaultValueKindImmutable:
    fmt.Fprintf(w, "%s = %s", fieldTypeSyntax, defaultExpression)
case defaultValueKindMutable:
    fmt.Fprintf(w, "typing.Optional[%s] = None", fieldTypeSyntax)
}

time zone handling in documentation

https://microsoft.github.io/yardl/reference/binary.html#dates-times-and-datetimes isn't clear on time zone handling. The description of DateTimes is clear enough (although a reference to what "since epoch" means would be good), but the documentation on Dates and Times should presumably say that these are in UTC. That might be undesirable or confusing though, so maybe it's better to support specifying a time zone.

Also, I'm assuming the types are actually singular, e.g. DateTime. I think a different font needs to be used for the actual type name (like you use for float etc).

Support shorthand for vectors and arrays

We could support some syntactic sugar for !vector and !array. Perhaps something like:

int* # a vector of int of unknown length
int*3 # a vector of ints of length 3

int[] # an array of ints with an unknown number of dimensions
int[,] # an array of ints with two dimensions
int[x,y] # an array of ints with two named dimensions
int[3,4] # an array of ints with two fixed dimensions
int[x:3, y:4] # an array of ints with two named and fixed dimensions

Best practices for HDF5 version management?

Does anyone have any suggestions for how to handle HDF5 versions in CMake? I have built STIR with a particular version of HDF5, and my yardl stuff accidentally with another version. Result: crash at start-up time.

Python `get_dtype` does not work for vectors, arrays, and maps

Using the current test model, get_dtype raises "RuntimeError: Cannot find dtype" on each of the following assertions, all of which are expected to hold:

import numpy as np
import test_model as tm

assert tm.get_dtype(tm.AliasedGenericVector[int]) == np.object_

assert tm.get_dtype(tm.AliasedGenericFixedVector[int]) == np.int32

assert tm.get_dtype(tm.AliasedGenericDynamicArray[int]) == np.object_

assert tm.get_dtype(tm.AliasedGenericFixedArray[int]) == np.int32

assert tm.get_dtype(tm.basic_types.AliasedMap[str, int]) == np.object_

use / support for python / numpy array api

Dear Yardl developers,

would it be (in principle) possible to use numpy.array_api instead of numpy as array backend for the generated python code?
Note that numpy.array_api is a reference implementation of the array API standard.

By doing so, Yardl would be compliant with the python array api and as such more agnostic to the specific array backend which would potentially allow using other compliant array backends (e.g. cupy or pytorch) in the future.

Georg

PS: @johnstairs @hansenms thanks for the support of the 1st ETSI Hackathon

Python syntax/type errors when default union type is list or dict

Given the following model:

GenericUnionsRecord<T, U>: !record
  fields:
    a: !union
      tv: T*
      t: T
    b: !union
      tm: T->U
      t: T

yardl v0.4.0 generates invalid constructor code for both inner unions:

class GenericUnionsRecord(typing.Generic[T, T_NP, U]):
    ...

    def __init__(self, *, ...):
        self.a = a if a is not None else TvOrT.Tv([]())
        self.b = b if b is not None else TmOrT.Tm({}())

Warnings on import (these are TypeErrors at runtime):

/workspaces/yardl/joe/issue-#112/python/odd/types.py:58: SyntaxWarning: 'list' object is not callable; perhaps you missed a comma?
  self.a = a if a is not None else TvOrT.Tv([]())
/workspaces/yardl/joe/issue-#112/python/odd/types.py:59: SyntaxWarning: 'dict' object is not callable; perhaps you missed a comma?
  self.b = b if b is not None else TmOrT.Tm({}())

This issue occurs when the first type in the union resolves to a Python list or dict.
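The generator presumably appends "()" to a default type name without checking whether the default is already a literal. A sketch of the distinction in Python follows; the lookup table and function names are hypothetical and only show the intended output text.

```python
# Hypothetical table of types whose default is written as a literal.
LITERAL_DEFAULTS = {"list": "[]", "dict": "{}"}


def default_expression(python_type_name):
    """Return source text for a default value of the named type."""
    literal = LITERAL_DEFAULTS.get(python_type_name)
    if literal is not None:
        return literal  # already a value; appending "()" would try to call it
    return f"{python_type_name}()"


def union_default(case_class, python_type_name):
    """Wrap the default value in the union case constructor."""
    return f"{case_class}({default_expression(python_type_name)})"
```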

Support maps

We should support maps/dictionaries as a first-class datatype. Syntax could be something like:

x: !map
  keys: string
  values: int

Keys can only be primitive scalar types.

Shorthand syntax could look like:

string->int

We should also make sure maps can be used in computed fields.
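As a rough illustration of the shorthand and the key restriction, here is a small Python sketch. The set of primitive scalar names is an illustrative subset and parse_map_shorthand is a hypothetical function; neither reflects the real parser.

```python
# Illustrative subset of primitive scalar type names (assumption).
PRIMITIVE_SCALARS = {"bool", "int", "uint64", "float", "string"}


def parse_map_shorthand(text):
    """Expand 'K->V' into the long form with explicit keys and values."""
    key, sep, value = text.partition("->")
    if not sep:
        raise ValueError(f"not a map shorthand: {text!r}")
    key, value = key.strip(), value.strip()
    if key not in PRIMITIVE_SCALARS:
        raise ValueError(f"map keys must be primitive scalar types, got {key!r}")
    return {"keys": key, "values": value}
```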
