ome-types's Introduction

ome-types

A pure-python implementation of the OME data model

ome_types provides a set of python dataclasses and utility functions for parsing the OME-XML format into fully-typed python objects for interactive or programmatic access in python. It can also take these python objects and output them into valid OME-XML. ome_types is a pure python library and does not require a Java virtual machine.

Note: The generated python code can be seen in the built branch. (Read the code generation section for details).

📖   documentation

Installation

from pip

pip install ome-types

With all optional dependencies:

# lxml => if you ...
#           - want to use lxml as the XML parser
#           - want to validate XML against the ome.xsd schema
#           - want to use XML documents older than the 2016-06 schema
# pint      => if you want to use object.<field>_quantity properties
# xmlschema => if you want to validate XML but DON'T want lxml

pip install ome-types[lxml,pint]

from conda

conda install -c conda-forge ome-types

from github (bleeding edge dev version)

pip install git+https://github.com/tlambert03/ome-types.git

Usage

convert an XML string or filepath into an instance of ome_types.model.OME

(The XML string/file will be validated against the ome.xsd schema)

from ome_types import from_xml

ome = from_xml('tests/data/hcs.ome.xml')

extract OME metadata from an OME-TIFF

from ome_types import from_tiff

ome2 = from_tiff('tests/data/ome.tiff')

manipulate the metadata via python objects

Both from_xml and from_tiff return an instance of ome_types.model.OME. All classes in ome_types.model follow the naming conventions of the OME data model, but use snake_case attribute names instead of CamelCase, to be consistent with the python ecosystem.

In [2]: ome = from_xml('tests/data/hcs.ome.xml')

In [3]: ome
Out[3]:
OME(
    images=[<1 Images>],
    plates=[<1 Plates>],
)

In [4]: ome.plates[0]
Out[4]:
Plate(
    id='Plate:1',
    name='Control Plate',
    column_naming_convention='letter',
    columns=12,
    row_naming_convention='number',
    rows=8,
    wells=[<1 Wells>],
)


In [5]: ome.images[0]
Out[5]:
Image(
    id='Image:0',
    name='Series 1',
    pixels=Pixels(
        id='Pixels:0',
        dimension_order='XYCZT',
        size_c=3,
        size_t=16,
        size_x=1024,
        size_y=1024,
        size_z=1,
        type='uint16',
        bin_data=[<1 Bin_Data>],
        channels=[<3 Channels>],
        physical_size_x=0.207,
        physical_size_y=0.207,
        time_increment=120.1302,
    ),
    acquisition_date=datetime.fromisoformat('2008-02-06T13:43:19'),
    description='An example OME compliant file, based on Olympus.oib',
)
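
The snake_case naming described above can be illustrated with a small converter. This is only a sketch of the convention, not the function ome_types itself uses; `camel_to_snake` is a hypothetical helper name.

```python
import re


def camel_to_snake(name: str) -> str:
    """Convert an OME-XML CamelCase name into a snake_case attribute name."""
    # Put an underscore between a lowercase letter (or digit) and a capital.
    name = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", name)
    # Split a run of capitals from a following capitalized word,
    # e.g. "XUnit" -> "X_Unit".
    name = re.sub(r"([A-Z]+)([A-Z][a-z])", r"\1_\2", name)
    return name.lower()


for xml_name in ("PhysicalSizeX", "PhysicalSizeXUnit", "DimensionOrder", "ID"):
    print(xml_name, "->", camel_to_snake(xml_name))
```

The attribute names printed here match the ones shown in the session above (for example `physical_size_x` and `dimension_order`).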

Objects can be removed, or changed

In [6]: from ome_types.model.simple_types import UnitsLength

In [7]: from ome_types.model.channel import AcquisitionMode

In [8]: ome.images[0].description = "This is the new description."

In [9]: ome.images[0].pixels.physical_size_x = 350.0

In [10]: ome.images[0].pixels.physical_size_x_unit = UnitsLength.NANOMETER

In [11]: for c in ome.images[0].pixels.channels:
             c.acquisition_mode = AcquisitionMode.SPINNING_DISK_CONFOCAL

Elements can be added by constructing new OME model objects

In [12]: from ome_types.model import Instrument, Microscope, Objective, InstrumentRef

In [13]: microscope_mk4 = Microscope(
             manufacturer='OME Instruments',
             model='Lab Mk4',
             serial_number='L4-5678',
         )

In [14]: objective_40x = Objective(
             manufacturer='OME Objectives',
             model='40xAir',
             nominal_magnification=40.0,
         )

In [15]: instrument = Instrument(
             microscope=microscope_mk4,
             objectives=[objective_40x],
         )

In [16]: ome.instruments.append(instrument)

In [17]: ome.images[0].instrument_ref = InstrumentRef(id=instrument.id)

In [18]: ome.instruments
Out[18]:
[Instrument(
    id='Instrument:1',
    microscope=Microscope(
       manufacturer='OME Instruments',
       model='Lab Mk4',
       serial_number='L4-5678',
    ),
    objectives=[<1 Objectives>],
 )]

export to an OME-XML string

Finally, you can generate the OME-XML representation of the OME model object, for writing to a standalone .ome.xml file or inserting into the header of an OME-TIFF file:

In [19]: from ome_types import to_xml

In [20]: print(to_xml(ome))
<OME ...>
    <Plate ColumnNamingConvention="letter" Columns="12" ID="Plate:1" ...>
        ...
    </Plate>
    <Instrument ID="Instrument:1">
        <Microscope Manufacturer="OME Instruments" Model="Lab Mk4" SerialNumber="L4-5678" />
        <Objective Manufacturer="OME Objectives" Model="40xAir" ID="Objective:1"
        NominalMagnification="40.0" />
    </Instrument>
    <Image ID="Image:0" Name="Series 1">
        <AcquisitionDate>2008-02-06T13:43:19</AcquisitionDate>
        <Description>This is the new description.</Description>
        <InstrumentRef ID="Instrument:1" />
        <Pixels ... PhysicalSizeX="350.0" PhysicalSizeXUnit="nm" ...>
            <Channel AcquisitionMode="SpinningDiskConfocal" ...>
             ...
        </Pixels>
    </Image>
</OME>

Code generation

The bulk of this library (namely, modules inside ome_types._autogenerated) is generated at install time, and is therefore not checked into source (or visible in the main branch of this repo).

You can see the code generated by the main branch in the built branch

The package at src/ome_autogen converts the ome.xsd schema into valid python code. To run the code generation script in a development environment, clone this repository and run:

python -m src.ome_autogen

The documentation and types for the full model can be found in the API Reference.

Contributing

To clone and install this repository locally:

git clone https://github.com/tlambert03/ome-types.git
cd ome-types
pip install -e .[test,dev]

We use pre-commit to run various code-quality checks during continuous integration. If you'd like to make sure that your code will pass these checks before you commit your code, you should install pre-commit after cloning this repository:

pre-commit install

regenerating the models

If you modify anything in src/ome_autogen, you may need to regenerate the model with:

python -m src.ome_autogen

Running tests

To run tests:

pytest

ome-types's People

Contributors

ap--, dependabot[bot], jmuhlich, joshmoore, nicholas-schaub, pre-commit-ci[bot], tlambert03

ome-types's Issues

Unable to not specify Channel.Color

Unable to not specify Image->Pixel->Channel->Color.
Not specifying it will create a default value of -1, which will be interpreted by QuPath/Bioformats as white.
If you have 3 separate channels without color specification QuPath/Bioformats will interpret it as RGB fine, but with color=-1 it will have 3 white channels.

Creating Channel() without color results in the default color=-1. Setting color=None results in an error, as internally it's cast to int.
I would like the ability to create a Channel without a color parameter in the resulting OME-XML.
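
To see why -1 is rendered as white, it helps to unpack the value. The OME schema encodes Color as a signed 32-bit integer holding RGBA bytes, so -1 is 0xFFFFFFFF, i.e. opaque white. The sketch below decodes such values; `decode_ome_color` is a hypothetical helper, not part of ome_types.

```python
from typing import Tuple


def decode_ome_color(value: int) -> Tuple[int, int, int, int]:
    """Unpack a signed 32-bit OME Color integer into (red, green, blue, alpha)."""
    unsigned = value & 0xFFFFFFFF  # reinterpret the signed value as unsigned
    red = (unsigned >> 24) & 0xFF
    green = (unsigned >> 16) & 0xFF
    blue = (unsigned >> 8) & 0xFF
    alpha = unsigned & 0xFF
    return red, green, blue, alpha


print(decode_ome_color(-1))        # (255, 255, 255, 255): opaque white
print(decode_ome_color(16711935))  # (0, 255, 0, 255): opaque green
```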

Improve Performance

We recently switched our bfio utility from OmeXml.py to ome-types. We spent a significant amount of time optimizing our OME Tiff reader/writer and OME Zarr reader/writers. However, one thing we have noticed is that our readers have slowed down by more than an order of magnitude after switching to ome-types.

I did some profiling, and for a 2048x2048 16-bit image, our read time from initial loading to data in memory went from 90ms to >3s. Further analysis shows that >95% of the time is being spent in ome_types.schema.to_dict, specifically in how the schema is being parsed by xmlschema. I'll dump the profile in a separate comment so I don't take up too much space here.

With this kind of performance, this is a deal breaker for us, but at the same time ome_types is clearly the best option in terms of properly processing OME XML. We can process terabytes of data a day, so a 30x or more slowdown creates a significant problem, and maybe being a little more sloppy with metadata is okay as long as we can still parse it out properly.

I'm not willing to give up on ome-types though, so if you're open to a discussion about optimization, I am happy to contribute (either discussion or code).

The first sets of questions I have are:

  1. How tightly integrated is xmlschema with the repository? Would it be easy or difficult to switch to something else?
  2. Is it possible we could create a lighter weight xml parser or figure out a way to do lazy parsing? Maybe we could use some alternative to xmlschema when we are reasonably certain that the metadata will pass schema checks.

Based on what I've seen so far, it looks like all XML data is parsed to a dictionary and then passed to other things. So it at least seems possible to change over to something else as long as the input is xml and the output is dictionary that uses the parsing definitions found in schema.OMEConverter.
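
As a rough illustration of the "lighter weight parser" idea, the sketch below converts XML to nested dictionaries with the standard library only and performs no schema validation. It does not reproduce the conventions of schema.OMEConverter: all values stay strings, every child element is stored in a list, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def local_name(name: str) -> str:
    """Drop a leading '{namespace-uri}' from an ElementTree tag or attribute name."""
    return name.rsplit("}", 1)[-1]


def element_to_dict(elem: ET.Element) -> dict:
    """Recursively convert an element into a plain dict, without validation."""
    # Attributes become plain string-valued keys. A name clash between an
    # attribute and a child element is not handled by this sketch.
    result = {local_name(key): value for key, value in elem.attrib.items()}
    for child in elem:
        # Every child goes into a list, because only the schema (not the
        # parser) knows which elements may repeat.
        result.setdefault(local_name(child.tag), []).append(element_to_dict(child))
    text = (elem.text or "").strip()
    if text:
        result["value"] = text
    return result


xml_text = (
    '<OME xmlns="http://www.openmicroscopy.org/Schemas/OME/2016-06">'
    '<Image ID="Image:0" Name="Series 1"><Description>demo</Description></Image>'
    "</OME>"
)
print(element_to_dict(ET.fromstring(xml_text)))
```

A fast path like this trades correctness guarantees for speed, so it would only be appropriate when the metadata is already known to pass schema checks.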

Error when validating XML with namespace prefixes

Hello!

I was attempting to validate the XML attached below, and got this error:

Traceback (most recent call last):
  File "/home/matteb/code/czi-to-ome-xslt/ome_types_test.py", line 49, in <module>
    from_xml(str(ome))
  File "/home/matteb/virtualenvs/czi-to-ome-xslt/lib/python3.9/site-packages/ome_types/__init__.py", line 42, in from_xml
    return OME(**d)  # type: ignore
TypeError: __init__() got an unexpected keyword argument 'ome:Experimenter'

From what I can gather, ome-types expects to find Experimenter, but is stumbling on the namespace prefix in ome:Experimenter. However, using namespace prefixes like this is valid XML, and I have validated this file against the OME schema using several other methods (xmlschema, lxml, and xmlvalid).

Anything I'm missing here?
produced.ome.zip
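
For reference, a namespace-aware parser treats a prefixed document and a default-namespace document identically: both resolve to the same qualified names. The standard-library sketch below shows this with two tiny hand-written documents (they are illustrations, not the attached file).

```python
import xml.etree.ElementTree as ET

NS = "http://www.openmicroscopy.org/Schemas/OME/2016-06"

# The same content written with an explicit prefix and with a default namespace.
prefixed = f'<ome:OME xmlns:ome="{NS}"><ome:Experimenter ID="Experimenter:0"/></ome:OME>'
default = f'<OME xmlns="{NS}"><Experimenter ID="Experimenter:0"/></OME>'


def child_tags(xml_text: str) -> list:
    """Return the fully qualified tag of each child of the root element."""
    return [child.tag for child in ET.fromstring(xml_text)]


print(child_tags(prefixed))
print(child_tags(default))
```

Both calls print the same `{namespace}Experimenter` name, which suggests that keying on the qualified or local name, rather than on the literal `ome:Experimenter` string, would avoid the error.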

Unexpected child with tag 'OME:Plane'

Hello,

I created an OME-ZARR from a CZI file with bioformats2raw with 2 pyramid scales.
Then, I am trying to read the METADATA.ome.xml with from_xml and I get the following error:

XMLSchemaChildrenValidationError: failed validating <Element '{http://www.openmicroscopy.org/Schemas/OME/2016-06}Pixels' at 0x7fc7b75d4720> with XsdGroup(model='sequence', occurs=[1, 1]):

Reason: Unexpected child with tag 'OME:Plane' at position 5. Tag ('OME:BinData' | 'OME:TiffData' | 'OME:MetadataOnly') expected.

Schema:

  <xsd:complexType xmlns:xsd="http://www.w3.org/2001/XMLSchema">
    <xsd:sequence>
      <xsd:element ref="Channel" minOccurs="0" maxOccurs="unbounded">
        <xsd:annotation>
          <xsd:appinfo>
            <xsdfu>
              <ordered />
            </xsdfu>
          </xsd:appinfo>
        </xsd:annotation>
      </xsd:element>
      <xsd:choice minOccurs="1" maxOccurs="1">
        <xsd:element ref="BinData" minOccurs="1" maxOccurs="unbounded" />
        <xsd:element ref="TiffData" minOccurs="1" maxOccurs="unbounded" />
        <xsd:element ref="MetadataOnly" minOccurs="1" maxOccurs="1" />
      </xsd:choice>
      <xsd:element ref="Plane" minOccurs="0" maxOccurs="unbounded" />
    </xsd:sequence>
    <xsd:attribute name="ID" use="required" type="PixelsID" />
    <xsd:attribute name="DimensionOrder" use="required">
    ...
    ...
  </xsd:complexType>

Instance:

  <Pixels xmlns="http://www.openmicroscopy.org/Schemas/OME/2016-06" BigEndian="true" DimensionOrder="XYZCT" ID="Pixels:0" Interleaved="false" PhysicalSizeX="0.35143526946023074" PhysicalSizeXUnit="µm" PhysicalSizeY="0.35143526946023074" PhysicalSizeYUnit="µm" PhysicalSizeZ="2.0103026803688864" PhysicalSizeZUnit="µm" SignificantBits="16" SizeC="4" SizeT="1" SizeX="9536" SizeY="6944" SizeZ="1" Type="uint16"><Channel AcquisitionMode="LaserScanningConfocalMicroscopy" Color="16711935" EmissionWavelength="526.2136399999998" EmissionWavelengthUnit="nm" ExcitationWavelength="488.00000000000006" ExcitationWavelengthUnit="nm" Fluor="FITC" ID="Channel:0:0" IlluminationType="Epifluorescence" Name="ChS1-T1" SamplesPerPixel="1"><DetectorSettings Binning="1x1" ID="Detector:0:0" /><LightPath /></Channel><Channel AcquisitionMode="LaserScanningConfocalMicroscopy" Color="-1" EmissionWavelength="690.5" EmissionWavelengthUnit="nm" ExcitationWavelength="633.0" ExcitationWavelengthUnit="nm" Fluor="APC" ID="Channel:0:1" IlluminationType="Epifluorescence" Name="Ch2-T1" SamplesPerPixel="1"><DetectorSettings Binning="1x1" ID="Detector:0:1" /><LightPath /></Channel><Channel AcquisitionMode="LaserScanningConfocalMicroscopy" Color="65535" EmissionWavelength="448.38500000000005" EmissionWavelengthUnit="nm" ExcitationWavelength="405.00000000000006" ExcitationWavelengthUnit="nm" Fluor="Pacific Blue" ID="Channel:0:2" IlluminationType="Epifluorescence" Name="Ch1-T2" SamplesPerPixel="1"><DetectorSettings Binning="1x1" ID="Detector:1:0" /><LightPath /></Channel><Channel AcquisitionMode="LaserScanningConfocalMicroscopy" Color="-16776961" EmissionWavelength="594.452559" EmissionWavelengthUnit="nm" ExcitationWavelength="561.0" ExcitationWavelengthUnit="nm" Fluor="R-PE" ID="Channel:0:3" IlluminationType="Epifluorescence" Name="ChS2-T2" SamplesPerPixel="1"><DetectorSettings Binning="1x1" ID="Detector:1:1" /><LightPath /></Channel><Plane DeltaT="16416.086416242422" DeltaTUnit="s" PositionX="16628.8" 
PositionXUnit="µm" PositionY="-787.25" PositionYUnit="µm" PositionZ="34.93" PositionZUnit="µm" TheC="0" TheT="0" TheZ="0" /><Plane DeltaT="16416.086416242422" DeltaTUnit="s" PositionX="16628.8" PositionXUnit="µm" PositionY="-787.25" PositionYUnit="µm" PositionZ="34.93" PositionZUnit="µm" TheC="1" TheT="0" TheZ="0" /><Plane DeltaT="16416.086416242422" DeltaTUnit="s" PositionX="16628.8" PositionXUnit="µm" PositionY="-787.25" PositionYUnit="µm" PositionZ="34.93" PositionZUnit="µm" TheC="2" TheT="0" TheZ="0" /><Plane DeltaT="16416.086416242422" DeltaTUnit="s" PositionX="16628.8" PositionXUnit="µm" PositionY="-787.25" PositionYUnit="µm" PositionZ="34.93" PositionZUnit="µm" TheC="3" TheT="0" TheZ="0" /></Pixels>
[METADATA.ome.xml.txt](https://github.com/tlambert03/ome-types/files/8329415/METADATA.ome.xml.txt)

Path: /OME/Image/Pixels

Here the whole XML:
METADATA.ome.xml.txt

Can I skip the validation somehow? I have version 0.2.10.
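
The schema fragment in the error requires Pixels to contain, in order: any number of Channel elements, then one of BinData, TiffData or MetadataOnly, then any number of Plane elements. The instance goes straight from Channel to Plane. A quick standard-library pre-check for that specific problem could look like the sketch below; the function name is hypothetical, and it only tests for presence of the data element, not for full schema validity.

```python
import xml.etree.ElementTree as ET

NS = "{http://www.openmicroscopy.org/Schemas/OME/2016-06}"
DATA_TAGS = {"BinData", "TiffData", "MetadataOnly"}


def pixels_missing_data_element(xml_text: str) -> list:
    """Return IDs of Pixels elements lacking BinData, TiffData and MetadataOnly."""
    root = ET.fromstring(xml_text)
    missing = []
    for pixels in root.iter(f"{NS}Pixels"):
        child_names = {child.tag.replace(NS, "") for child in pixels}
        if not child_names & DATA_TAGS:
            missing.append(pixels.get("ID"))
    return missing


doc = (
    '<OME xmlns="http://www.openmicroscopy.org/Schemas/OME/2016-06">'
    '<Image ID="Image:0"><Pixels ID="Pixels:0">'
    '<Channel ID="Channel:0:0"/><Plane TheC="0" TheT="0" TheZ="0"/>'
    "</Pixels></Image></OME>"
)
print(pixels_missing_data_element(doc))
```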

Cannot set the InstrumentRef

I am trying to set an instrument ref but I get an error when setting it.
Here is a demo to test:

import tifffile
import ome_types
import numpy as np

number_of_channel = 3
voxel_size_um = 0.35
voxel_unit = 'µm'

output_filename = 'test.ome.tif'

channels_info = [
    ['MyChannel1', [0, 0, 255]],
    ['MyChannel2', [0, 255, 0]],
    ['MyChannel3', [255, 0, 0]]
]

data = (
    np
        .arange(1024 * 1024 * number_of_channel, dtype=np.uint16)
        .reshape((number_of_channel, 1024, 1024))
)


def per_channel(img, tile=(256, 256)):
    for c in range(img.shape[0]):
        for y in range(0, img.shape[1], tile[0]):
            for x in range(0, img.shape[2], tile[1]):
                yield img[c, y: y + tile[0], x: x + tile[1]]


with tifffile.TiffWriter(output_filename, bigtiff=True) as tif:
    tif.write(
        data=per_channel(data),
        shape=data.shape,
        dtype=np.uint16,
        tile=(256, 256),
        subifds=2
    )
    tif.write(
        data=per_channel(data[:, ::2, ::2]),
        shape=data[:, ::2, ::2].shape,
        dtype=np.uint16,
        tile=(256, 256),
        subfiletype=1
    )
    tif.write(
        data=per_channel(data[:, ::4, ::4]),
        shape=data[:, ::4, ::4].shape,
        dtype=np.uint16,
        tile=(256, 256),
        subfiletype=1
    )


# Set the color and channel names
ome_xml = tifffile.tiffcomment(output_filename)
print(ome_xml)
ome = ome_types.from_xml(ome_xml)

# Set the channel metadatas
for i in range(number_of_channel):
    # Get channel name and color from original .nd2 file metadata.
    name, color = channels_info[i]
    print(f'Updating channel {i} info')
    # Update the OME channel name and color
    channel = ome.images[0].pixels.channels[i]
    channel.name = name
    channel.color = ome_types.model.simple_types.Color(color)
    ome.images[0].pixels.channels[i] = channel

# Set the microscopy type and magnification
# Let's add a new instrument
instrument = ome_types.model.Instrument()
# Define the microscope
microscope_slide_scanner = ome_types.model.Microscope(
    manufacturer='Zeiss',
    model='Axioscan Z.1',
    # serial_number='xx-xx-xx-xx',
)
# Define the objective
objective_20x = ome_types.model.Objective(
    model='Plan_apochromal 20x/0.8 M27',
    nominal_magnification=20.0,
    calibrated_magnification=20.0,
    immersion='Air',
    lens_na='0.8'
)
# Define the instrument
instrument = ome_types.model.Instrument(
    microscope=microscope_slide_scanner,
    objectives=[objective_20x],
)
# Add the instrument metadata to the ome
ome.instruments.append(instrument)

#Set the pixel size and unit
ome.images[0].pixels.physical_size_x=voxel_size_um
ome.images[0].pixels.physical_size_y=voxel_size_um
ome.images[0].pixels.physical_size_x_unit =voxel_unit
ome.images[0].pixels.physical_size_y_unit=voxel_unit

#Add instrument ref
instrument_ref = ome_types.model.InstrumentRef(instrument.id) #Not working
ome.images[0].instrument_ref = instrument_ref 
# Write back OME tags to the TIFF file.
ome_xml = ome.to_xml()
tifffile.tiffcomment(output_filename, ome_xml)

When I run it, I get the following error:

Traceback (most recent call last):
  File "convert_to_pyramid/instrumentref_issue.py", line 103, in <module>
    instrument_ref = ome_types.model.InstrumentRef(instrument.id)
TypeError: __init__() takes 1 positional argument but 2 were given
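
The traceback is what Python produces when a constructor that only accepts keyword arguments is called positionally. The README example passes the value by keyword, `InstrumentRef(id=instrument.id)`. The stand-in class below reproduces the behaviour without ome_types installed; `InstrumentRefSketch` is hypothetical and is not the real model class.

```python
class InstrumentRefSketch:
    """Hypothetical stand-in for a model class whose fields are keyword-only."""

    def __init__(self, *, id: str) -> None:
        self.id = id


try:
    InstrumentRefSketch("Instrument:0")  # positional, as in the report
except TypeError as exc:
    print("positional call failed:", exc)

ref = InstrumentRefSketch(id="Instrument:0")  # keyword, as in the README
print(ref.id)
```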

error converting `xsd:list` members back to string

#48 fixes parsing xsd:list in model generation (#35), but exposed a new problem when writing XML:

xmlschema.validators.exceptions.XMLSchemaEncodeError:
failed validating <Type.TIME_LAPSE: 'TimeLapse'> with XsdAtomicBuiltin(name='xs:string')

Reason: <Type.TIME_LAPSE: 'TimeLapse'> is not an instance of <class 'str'>.

I'm not certain yet whether the failure to convert these Enums back to str has to do with them coming from an xsd:list.

_________________________________________________________________________ test_roundtrip[tubhiswt] __________________________________________________________________________

xml = '/Users/talley/Dropbox (HMS)/Python/ome-types/testing/data/tubhiswt.ome.xml'

    @pytest.mark.parametrize("xml", xml_roundtrip, ids=true_stem)
    def test_roundtrip(xml):
        """Ensure we can losslessly round-trip XML through the model and back."""
        xml = str(xml)
        schema = get_schema(xml)

        def canonicalize(xml, strip_empty):
            d = schema.decode(xml, use_defaults=True)
            # Strip extra whitespace in the schemaLocation value.
            d["@xsi:schemaLocation"] = re.sub(r"\s+", " ", d["@xsi:schemaLocation"])
            root = schema.encode(d, path=NS_OME + "OME", use_defaults=True)
            # These are the tags that appear in the example files with empty
            # content. Since our round-trip will drop empty elements, we'll need to
            # strip them from the "original" documents before comparison.
            if strip_empty:
                for tag in ("Description", "LightPath", "Map"):
                    for e in root.findall(f".//{NS_OME}{tag}[.='']..."):
                        e.remove(e.find(f"{NS_OME}{tag}"))
            # ET.canonicalize can't handle an empty namespace so we need to
            # re-register the OME namespace with an actual name before calling
            # tostring.
            ElementTree.register_namespace("ome", URI_OME)
            xml_out = ElementTree.tostring(root, "unicode")
            xml_out = util.canonicalize(xml_out, strip_text=True)
            xml_out = minidom.parseString(xml_out).toprettyxml(indent="  ")
            return xml_out

        original = canonicalize(xml, True)
>       ours = canonicalize(to_xml(from_xml(xml)), False)

testing/test_model.py:111:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
src/ome_types/schema.py:283: in to_xml
    root = to_xml_element(ome)
src/ome_types/schema.py:269: in to_xml_element
    root = schema.encode(
../../../miniconda3/envs/ome/lib/python3.8/site-packages/xmlschema/validators/schema.py:1638: in encode
    for result in self.iter_encode(obj, path, validation, *args, **kwargs):
../../../miniconda3/envs/ome/lib/python3.8/site-packages/xmlschema/validators/schema.py:1625: in iter_encode
    yield from xsd_element.iter_encode(obj, validation, converter=converter,
../../../miniconda3/envs/ome/lib/python3.8/site-packages/xmlschema/validators/elements.py:864: in iter_encode
    for result in xsd_type.content.iter_encode(element_data, validation, **kwargs):
../../../miniconda3/envs/ome/lib/python3.8/site-packages/xmlschema/validators/groups.py:794: in iter_encode
    for result in xsd_element.iter_encode(value, validation, **kwargs):
../../../miniconda3/envs/ome/lib/python3.8/site-packages/xmlschema/validators/elements.py:874: in iter_encode
    yield self.validation_error(validation, e, elem, **kwargs)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = XsdElement(ref='OME:Experiment', occurs=[0, None]), validation = 'strict'
error = XMLSchemaEncodeError(reason="<Type.TIME_LAPSE: 'TimeLapse'> is not an instance of <class 'str'>.")
obj = <Element '{http://www.openmicroscopy.org/Schemas/OME/2016-06}Experiment' at 0x113a7def0>, source = None, namespaces = None
_kwargs = {'converter': <ome_types.schema.OMEConverter object at 0x113bef940>, 'level': 1, 'unordered': False, 'use_defaults': False}

    def validation_error(self, validation, error, obj=None,
                         source=None, namespaces=None, **_kwargs):
        """
        Helper method for generating and updating validation errors. If validation
        mode is 'lax' or 'skip' returns the error, otherwise raises the error.

        :param validation: an error-compatible validation mode: can be 'lax' or 'strict'.
        :param error: an error instance or the detailed reason of failed validation.
        :param obj: the instance related to the error.
        :param source: the XML resource related to the validation process.
        :param namespaces: is an optional mapping from namespace prefix to URI.
        :param _kwargs: keyword arguments of the validation process that are not used.
        """
        check_validation_mode(validation)
        if isinstance(error, XMLSchemaValidationError):
            if error.namespaces is None and namespaces is not None:
                error.namespaces = namespaces
            if error.source is None and source is not None:
                error.source = source
            if error.obj is None and obj is not None:
                error.obj = obj
            if error.elem is None and is_etree_element(obj):
                error.elem = obj
        elif isinstance(error, Exception):
            error = XMLSchemaValidationError(self, obj, str(error), source, namespaces)
        else:
            error = XMLSchemaValidationError(self, obj, error, source, namespaces)

        if validation == 'strict' and error.elem is not None:
>           raise error
E           xmlschema.validators.exceptions.XMLSchemaEncodeError: failed validating <Type.TIME_LAPSE: 'TimeLapse'> with XsdAtomicBuiltin(name='xs:string'):
E
E           Reason: <Type.TIME_LAPSE: 'TimeLapse'> is not an instance of <class 'str'>.
E
E           Schema:
E
E             <xs:simpleType xmlns:hfp="http://www.w3.org/2001/XMLSchema-hasFacetAndProperty" xmlns:xs="http://www.w3.org/2001/XMLSchema" name="string" id="string">
E               <xs:annotation>
E                 <xs:appinfo>
E                   <hfp:hasFacet name="length" />
E                   <hfp:hasFacet name="minLength" />
E                   <hfp:hasFacet name="maxLength" />
E                   <hfp:hasFacet name="pattern" />
E                   <hfp:hasFacet name="enumeration" />
E                   <hfp:hasFacet name="whiteSpace" />
E                   <hfp:hasProperty name="ordered" value="false" />
E                   <hfp:hasProperty name="bounded" value="false" />
E                   <hfp:hasProperty name="cardinality" value="countably infinite" />
E                   <hfp:hasProperty name="numeric" value="false" />
E                 </xs:appinfo>
E                 <xs:documentation source="http://www.w3.org/TR/xmlschema-2/#string" />
E               </xs:annotation>
E               <xs:restriction base="xs:anySimpleType">
E                 <xs:whiteSpace value="preserve" id="string.preserve" />
E               </xs:restriction>
E             </xs:simpleType>
E
E           Instance:
E
E             <ome:Experiment xmlns:ome="http://www.openmicroscopy.org/Schemas/OME/2016-06" ID="urn:lsid:loci.wisc.edu:Experiment:OWS350" Type="">
E                 <ome:Description>4 Cell Embryo</ome:Description>
E                 <ome:ExperimenterRef ID="urn:lsid:loci.wisc.edu:Experimenter:116" />
E             </ome:Experiment>

../../../miniconda3/envs/ome/lib/python3.8/site-packages/xmlschema/validators/xsdbase.py:906: XMLSchemaEncodeError
========================================================================== short test summary info ==========================================================================
FAILED testing/test_model.py::test_roundtrip[tubhiswt] - xmlschema.validators.exceptions.XMLSchemaEncodeError: failed validating <Type.TIME_LAPSE: 'TimeLapse'> with XsdAt...
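
The core of the failure can be reproduced without xmlschema: a member of a plain Enum is not a str instance, so an encoder expecting xs:string rejects it. The sketch below shows two possible remedies, converting to `.value` before encoding or mixing `str` into the Enum. The class and function names are illustrative and are not the generated model code.

```python
from enum import Enum


class Type(Enum):
    """Plain Enum: members are not str instances."""
    TIME_LAPSE = "TimeLapse"
    OTHER = "Other"


class StrType(str, Enum):
    """Enum with a str mixin: members are str instances."""
    TIME_LAPSE = "TimeLapse"


def to_xml_value(value):
    """Convert Enum members to raw values; join lists with spaces (xsd:list)."""
    if isinstance(value, Enum):
        return value.value
    if isinstance(value, (list, tuple)):
        return " ".join(str(to_xml_value(item)) for item in value)
    return value


print(isinstance(Type.TIME_LAPSE, str))     # False
print(isinstance(StrType.TIME_LAPSE, str))  # True
print(to_xml_value([Type.TIME_LAPSE, Type.OTHER]))
```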

need help with abstract groups

Looking at one of the failing test files (spim.xml) I see that we're getting the error:

pydantic.error_wrappers.ValidationError: 2 validation errors for OME
instrument -> 0
  __init__() got an unexpected keyword argument 'laser' (type=type_error)

spim.ome.xml has an <Instrument> with a child <Laser> (which I gather is one of the concrete implementations of a LightSourceGroup?) ... whereas the generated model is looking for the literal key light_source_group...

@jmuhlich, Code generation aside for the moment, what should the behavior be? should it be that, any children of Instrument that are concrete subtypes of LightSource should be added to the light_source_group list during __init__? If I have that correct, I can probably make it happen somehow... but curious to hear your thoughts
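
One way the proposed behaviour could work is to fold the concrete light source keys into `light_source_group` before the Instrument is constructed. The sketch below does this on plain dictionaries. The key list assumes the concrete LightSource subtypes are Laser, Arc, Filament, LightEmittingDiode and GenericExcitationSource, and the function name is hypothetical.

```python
# Assumed snake_case keys for the concrete LightSource subtypes.
LIGHT_SOURCE_KEYS = (
    "laser",
    "arc",
    "filament",
    "light_emitting_diode",
    "generic_excitation_source",
)


def collect_light_sources(instrument_kwargs: dict) -> dict:
    """Move concrete light source entries into a single light_source_group list."""
    data = dict(instrument_kwargs)  # leave the caller's dict untouched
    group = list(data.pop("light_source_group", []))
    for key in LIGHT_SOURCE_KEYS:
        value = data.pop(key, None)
        if value is None:
            continue
        # Accept either a single item or a list of items.
        group.extend(value if isinstance(value, list) else [value])
    if group:
        data["light_source_group"] = group
    return data


print(collect_light_sources({"id": "Instrument:0", "laser": [{"id": "LightSource:0"}]}))
```

In a real model each item would need to keep its concrete type (for example as an instance of Laser) so the right element name can be written back out.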

Improve __repr__

  • leave out properties that are set to the default
  • summarize containers rather than recursively include their contents.

possible example repr(OME):

OME(uuid='urn:uuid:afa03d8b-30a8-4c9b-b016-9992c3d73fef', image=<1 image>)

e.g. Pixels

   Pixels(
     dimension_order=DimensionOrder.XYZCT, id='Pixels:0',
     size_c=8, size_t=1, size_x=7218, size_y=4078, size_z=1,
     type=PixelType.UINT16,    physical_size_x=0.32499998807907104,
     physical_size_x_unit=UnitsLength.MICROMETER,
     physical_size_y=0.32499998807907104,
     physical_size_y_unit=UnitsLength.MICROMETER,
     channel=<8 Channels>, tiff_data=<1 TiffData>, plane=<8 Planes>,
   )
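
A minimal sketch of both ideas, using standard-library dataclasses rather than the generated model classes: fields equal to their default are skipped, and lists are summarized by length. `compact_repr` and `PixelsSketch` are hypothetical names.

```python
from dataclasses import MISSING, dataclass, field, fields
from typing import List


def compact_repr(obj) -> str:
    """Build a repr that omits default-valued fields and summarizes lists."""
    parts = []
    for f in fields(obj):
        value = getattr(obj, f.name)
        # Work out the declared default, if the field has one.
        if f.default is not MISSING:
            default = f.default
        elif f.default_factory is not MISSING:
            default = f.default_factory()
        else:
            default = MISSING
        if default is not MISSING and value == default:
            continue  # leave out properties that are set to the default
        if isinstance(value, list):
            shown = f"<{len(value)} {f.name.title()}>"  # summarize containers
        else:
            shown = repr(value)
        parts.append(f"{f.name}={shown}")
    return f"{type(obj).__name__}({', '.join(parts)})"


@dataclass(repr=False)
class PixelsSketch:
    id: str
    size_x: int
    size_y: int
    big_endian: bool = False
    channels: List[str] = field(default_factory=list)

    def __repr__(self) -> str:
        return compact_repr(self)


print(PixelsSketch(id="Pixels:0", size_x=1024, size_y=1024, channels=["a", "b", "c"]))
```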

BinData can't be instantiated

I don't think dataclasses can usefully subclass a non-dataclass or really any non-cooperative class. For example our BinData ultimately subclasses str and I discovered there's no way to actually instantiate one due to errors from either the str constructor or the pydantic validator. How would the core string value actually get passed in here, anyway? The other subclasses of ConstrainedStr seem OK since they aren't also dataclasses. Maybe BinData should be a top-level class with an attribute value: base64Binary rather than inheriting.
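
The "top-level class with a value attribute" alternative might look roughly like the sketch below, which uses composition instead of subclassing str. The field names other than `value` are assumptions based on the BinData attributes in the OME schema, and the class name is hypothetical.

```python
import base64
import binascii
from dataclasses import dataclass


@dataclass
class BinDataSketch:
    """Holds the base64 payload as an attribute instead of inheriting from str."""

    value: str                  # base64-encoded pixel data
    length: int                 # assumed field: byte length of the decoded data
    big_endian: bool = False    # assumed field
    compression: str = "none"   # assumed field

    def __post_init__(self) -> None:
        # Reject text that is not valid base64.
        try:
            base64.b64decode(self.value, validate=True)
        except binascii.Error as exc:
            raise ValueError("value must be base64-encoded text") from exc


data = BinDataSketch(value="AAEC", length=3)
print(data)
```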

Whole Slide OMETIFF pyramid levels

Hi, Thanks for creating such a useful tool!

I cannot seem to find metadata corresponding to the number of subresolution pyramid levels for whole slide OME-TIFFs. Does the ome-types API support this query?

schema caching needs xmlschema version

Just so I don't forget to fix it: when testing #17 locally, a few tests failed that weren't failing on CI, so I updated xmlschema to check... and they all failed. I realized that it was ultimately due to the pickled schema cache being stale. So that cache key should minimally have the xmlschema version in it, but maybe we don't cache at all.
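
A sketch of a cache key that changes whenever the library version changes is shown below. The function name and file naming scheme are hypothetical. In practice the version string could be obtained with `importlib.metadata.version("xmlschema")`, but it is passed in here so the example runs without xmlschema installed.

```python
import hashlib
from pathlib import Path


def schema_cache_path(cache_dir: str, schema_text: str, library_version: str) -> Path:
    """Derive a cache file path from the schema content and the library version."""
    digest = hashlib.sha256()
    digest.update(library_version.encode("utf-8"))
    digest.update(b"\0")  # separator so version and schema text cannot run together
    digest.update(schema_text.encode("utf-8"))
    return Path(cache_dir) / f"schema-{digest.hexdigest()[:16]}.pkl"


old = schema_cache_path("cache", "<xs:schema/>", "1.4.1")
new = schema_cache_path("cache", "<xs:schema/>", "1.5.0")
print(old.name, new.name, old != new)
```

With a key like this, upgrading the library leaves the stale pickle unused instead of loading it.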

from_xml (string) raises path too long for Windows

I'm using Python 3.7.5, ome-types 0.2.1, and on Windows, trying to pass my xml string in to from_xml:

File "d:\src\aics\cellbrowser-tools\cellbrowser_tools\fov_processing.py", line 295, in add_segs_to_img
self.omexml = from_xml(description)
File "C:\Users\danielt\AppData\Local\Continuum\anaconda3\envs\cellbrowser-tools\lib\site-packages\ome_types\__init__.py", line 37, in from_xml
d = to_dict(xml)
File "C:\Users\danielt\AppData\Local\Continuum\anaconda3\envs\cellbrowser-tools\lib\site-packages\ome_types\schema.py", line 265, in to_dict
_xml = xml if os.path.exists(xml) else StringIO(xml)
File "C:\Users\danielt\AppData\Local\Continuum\anaconda3\envs\cellbrowser-tools\lib\genericpath.py", line 19, in exists
os.stat(path)
ValueError: stat: path too long for Windows

It seems the code is using os.path.exists on the raw xml string to see if it's a file or not?
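
One possible way to avoid the problem is to check whether the input looks like XML before touching the filesystem, and to guard the path check against errors. This is a sketch only; `open_xml_source` is a hypothetical helper, not the library's code.

```python
import os
from io import StringIO


def open_xml_source(source: str):
    """Return a path for an existing file, or a StringIO for raw XML text."""
    # XML text starts with '<' (after optional whitespace); paths do not.
    if source.lstrip().startswith("<"):
        return StringIO(source)
    try:
        if os.path.exists(source):
            return source
    except (OSError, ValueError):
        # Some platform and Python version combinations raise on very long
        # or otherwise invalid path strings.
        pass
    raise FileNotFoundError("input is neither XML text nor an existing path")


long_xml = "<OME>" + "x" * 100_000 + "</OME>"
print(type(open_xml_source(long_xml)).__name__)
```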

Add CRON to build infra to catch dependency breakages

xmlschema released a new version (1.5.0) on Friday (5 Feb 2021) that moved around some of the objects / functions in the library.

c:\users\dmt\anaconda3\envs\aicsimageio\lib\site-packages\ome_types\__init__.py:18: in <module>
    from .schema import to_dict, to_xml, validate  # isort:skip
c:\users\dmt\anaconda3\envs\aicsimageio\lib\site-packages\ome_types\schema.py:13: in <module>
    from xmlschema.converters import ElementData, XMLSchemaConverter
E   ImportError: cannot import name 'ElementData' from 'xmlschema.converters' (c:\users\dmt\anaconda3\envs\aicsimageio\lib\site-packages\xmlschema\converters\__init__.py)

This is just to alert that any sort of build systems that install ome-types (and xmlschema as a dependency) will likely fail today :)

@tlambert03 I will make a PR to resolve this later today probably. Alright if I also in that PR include a change to the build GitHub workflow to add a CRON job for this?

We have CRON jobs on aicsimageio build for these types of dependency catching: example

NotImplementedError: OMEPyramidStore

While reading xml from a czi file, we received this message:

NotImplementedError: Encountered a combination of schema element and data type that is not yet supported. Please submit a bug report with the information below:
    element: XsdElement(name='OME:OME', occurs=[1, 1])
    data type: <java class 'loci.formats.ome.OMEPyramidStore'>

We are in the process of replacing the OmeXml.py model with ome_types, and this error is a blocker for us adopting ome_types into our Bioformats utility.

xmlschema fails to validate keyrefs that refer to LightSourceIDKey

The ome schema exposes an implementation deficiency in xmlschema:

  • The xpath selector for LightSourceIDKey is a union that explicitly enumerates all the extensions of LightSource (Instrument/Laser | Instrument/Arc | ... to paraphrase)
  • The actual definition of the Instrument element references LightSourceGroup which is an abstract element serving as the SubstitutionGroup head for all the different LightSource extensions.
  • When xmlschema parses a keyref like ImagePixelsChannelLightSourceSettingsLightSourceIDKeyRef, which ensures that a LightSourceSettings references an actual LightSource that exists in the file, it doesn't understand the link between Instrument/LightSourceGroup and all of the extensions. This in turn causes it to see all LightSourceSettings as having dangling LightSource references and the validation fails.

Only the (few) keyrefs to LightSourceIDKey trigger this issue. The only other type with the problematic pattern is Shape but there are actually no keyrefs to ShapeIDKey -- the keyref is to ROIIDKey which is a container for Shapes and is not itself a SubstitutionGroup head.

I don't see a great way to solve this short of getting it fixed in xmlschema. I'm looking into that now but it's been slow going. I've traced it down to xmlschema.validators.identities.XsdIdentity:build where it creates self.elements, a cache of the schema elements the key's selector can potentially match. The code looks for Instrument/Laser etc. which don't directly exist so .elements ends up empty, causing the selector to never match anything during validation of an XML document. Maybe we can monkey-patch the .elements entry after constructing the XmlSchema object? Sounds dicey but I'll take a look.

Add example XML that can be directly round-tripped, with corresponding test

@tlambert03 I am considering reworking testing/data/example.ome.xml to use more (all?) of the schema elements and make sure all the float values etc. are round-trippable without tricks, then use that in a new test. Is that OK or do you think I should make a separate file? That file is ours, not from the ome-model test suite (it's example output from Ashlar).

Discuss deleting generic schema support code

Should we streamline the schema loading code to explicitly load the local OME .xsd and drop all remaining pretense that we might support other schemas? I think this would make it easier to also accept ElementTree objects in addition to paths and strings.

Date with excess zeros fails to parse

I encountered an OME-TIFF with an AcquisitionDate of 2020-09-08T17:26:16.7690000. This triggered a parsing error in xmlschema/elementpath: OverflowError: Invalid value 7690000 for microsecond. bioformats handles it fine so we should check what they're doing. Round-tripping back to OME-XML with showinf ends up just truncating the trailing zeros, which might be a hint.
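One possible workaround, sketched with the standard library only: clamp the fractional seconds to six digits before handing the string to a datetime parser. This assumes truncating to microsecond precision is acceptable, and `clamp_fraction` is a hypothetical helper name, not something in ome-types or xmlschema.

```python
import re
from datetime import datetime

# A dot followed by six digits, then any surplus digits we want to drop.
_FRACTION = re.compile(r"(\.\d{6})\d+")


def clamp_fraction(timestamp: str) -> str:
    """Truncate fractional seconds to at most six digits (microseconds)."""
    return _FRACTION.sub(r"\1", timestamp, count=1)


raw = "2020-09-08T17:26:16.7690000"
fixed = clamp_fraction(raw)
parsed = datetime.fromisoformat(fixed)
print(fixed, parsed.microsecond)
```

Strings with six or fewer fractional digits pass through unchanged, so the helper could be applied unconditionally before parsing.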

`Ellipse`s being converted into `Point`s?

First of all, I love ome-types! I'm trying to generate an XML from an OMERO Project, and everything has been going well so far except for any Ellipse ROIs. Here's a code snippet:

print('kwargs:', kwargs)
roi = ROI(**kwargs)
print('roi:', roi)

And what that prints:

kwargs: {'id': 463, 'name': None, 'description': None, 'union': [Ellipse(
   id='Shape:1437',
   fill_color=Color('#fff0', rgb=(255, 255, 255, 0.0)),
   stroke_color=Color('yellow', rgb=(255, 255, 0)),
   the_t=0,
   the_z=0,
   radius_x=83.35868762069157,
   radius_y=11.176583815064788,
   x=8.692898522828159,
   y=13.194578115007033,
)]}
roi: id='ROI:463' annotation_ref=[] description=None name=None union=[Point(
   id='Shape:1437',
   fill_color=Color('#fff0', rgb=(255, 255, 255, 0.0)),
   stroke_color=Color('yellow', rgb=(255, 255, 0)),
   the_t=0,
   the_z=0,
   x=8.692898522828159,
   y=13.194578115007033,
)]

It seems that ROI() is internally converting the ellipse into a single point, and I have no idea why! Am I missing something obvious? Everything else seems to work as intended.

Pluralize container attributes

object.image (and similar containers) should be accessible at object.images ... probably as an alias, so as to remain consistent with the OME model

namespacing of Structured Annotation data?

I'm having a little trouble with how StructuredAnnotations are intended to be validated.
We have ome-tiffs with large Structured Annotation blocks that essentially are inlining other "arbitrary" xml as XMLAnnotations. (In this case, Zeiss CZI metadata has been stashed in there...please do not judge just yet!)

I can call from_xml and the xml properly is converted to ome-types without error. If I then call to_xml and from_xml again, I get an exception because of this:

Traceback (most recent call last):
  File "C:\Users\danielt\AppData\Local\Continuum\anaconda3\envs\cellbrowser-tools\lib\site-packages\xmlschema\validators\global_maps.py", line 127, in lookup
    obj = global_map[qname]
KeyError: '{http://www.openmicroscopy.org/Schemas/OME/2016-06}ImageDocument'

where I believe the OME schema namespace was added by ome-types. Does that sound like what is going on here?

I'm honestly not sure whether the initial xml should fail to validate or not, but I believe the intent of the schema is to allow arbitrary (non-OME-schema) xml inside structured XMLannotations.

References should provide direct access to the referenced object

All of the Reference types in the schema are effectively pointers to entities defined elsewhere in the document. It would be nice if the Python API gave direct access to those referenced objects.

Currently:

>>> filter_set.emission_filter_ref[0]
FilterRef(id='Filter:5')

Ideally:

>>> filter_set.emission_filters[0]
Filter(id='Filter:5', lot_number='J34', manufacturer='Ink Inc.', ...)

To avoid creating any circular reference chains, the implementation should probably store weakrefs internally and provide proxy containers that perform the dereferencing upon access. The root OME object can walk its children in post_init and set all of these weakrefs, but this only helps with read-only use cases. Developing a good write API will be challenging.
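A minimal sketch of the read-only half of this idea, using plain dataclasses as stand-ins for the real model classes. The `_target` field, the `ref` property and the `link_refs` helper are all hypothetical names chosen for illustration; the real implementation would have to walk the whole OME tree.

```python
import weakref
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Filter:
    id: str
    manufacturer: Optional[str] = None


@dataclass
class FilterRef:
    id: str
    # Weak reference to the target; filled in by the root after construction.
    _target: Optional[weakref.ReferenceType] = field(
        default=None, repr=False, compare=False
    )

    @property
    def ref(self) -> Optional[Filter]:
        """Dereference on access; None if unlinked or the target is gone."""
        return self._target() if self._target is not None else None


def link_refs(filters: List[Filter], refs: List[FilterRef]) -> None:
    """Point every reference at the object that carries the same id."""
    by_id = {f.id: f for f in filters}
    for r in refs:
        target = by_id.get(r.id)
        if target is not None:
            r._target = weakref.ref(target)


filters = [Filter("Filter:5", manufacturer="Ink Inc.")]
refs = [FilterRef("Filter:5"), FilterRef("Filter:9")]
link_refs(filters, refs)
print(refs[0].ref, refs[1].ref)
```

Because only a weak reference is stored, the reference objects do not keep their targets alive and no reference cycle is created. A dangling id simply dereferences to None here; whether that should be an error instead is a design decision this sketch does not make.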

Performance: Use Pydantic aliasing to handle snake and camel case

A significant amount of time is spent during parsing converting from camel case to snake case using regular expressions. An alternative is to use Pydantic's aliasing feature, which allows mapping multiple names to one attribute. Using aliasing, all camel case element names could be mapped to their snake case counterparts in the Pydantic classes. This would incur a performance hit when the Pydantic classes are loaded. However, once that is done Pydantic will directly map all camel case element names to snake case names, avoiding subsequent performance hits when mapping camel case to snake case. Shifting the cost to class loading rather than parsing is ideal for situations where parsing happens multiple times in the same Python session.

To implement this, the way Pydantic classes are generated would have to be modified to 1) include attribute aliasing code and 2) set the configuration such that Pydantic classes can set values using attribute aliases. First, rather than converting to snake case when the model classes are generated, keep the original model attribute names and include the following in class Config under the OMEType class.

class Config:

    allow_population_by_field_name = True    # This allows you to use CamelCase when initializing a class
    alias_generator = camel_to_snake         # This maps CamelCase to snake_case when the class is loaded

Then, copy the camel_to_snake function currently in ome_autogen.py into _base_type.py. This will cause all inheriting classes to auto-generate aliases.

The benefit to using aliases is that Pydantic uses aliases by default when serializing. So even if you initialize a class using PlateRef as a keyword argument, when using dict() or json() methods, the fields will show up as plate_ref.

This conversion may be made a little more complicated by the use of plural name conversions, but that might just result in a slightly more complex camel_to_snake function, which means even less burden on the parser to work out names properly.

Using aliasing should help to simplify the parsing logic and reduce the amount of code, in addition to improving performance globally for both schema.to_dict and lxml2dict.
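To illustrate the cost-shifting argument without depending on Pydantic, here is a rough sketch of computing the name mapping once, up front, instead of on every parse. The regular expression is my own illustrative version and is not the camel_to_snake in ome_autogen.py; `FIELD_NAMES` is a made-up sample of element names.

```python
import re
from functools import lru_cache

# Insert "_" before an upper-case letter that follows a lower-case letter or
# digit, and before the last capital of an acronym that starts a new word.
_CAMEL_BOUNDARY = re.compile(r"(?<=[a-z0-9])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])")


@lru_cache(maxsize=None)
def camel_to_snake(name: str) -> str:
    """Convert CamelCase to snake_case; results are cached per name."""
    return _CAMEL_BOUNDARY.sub("_", name).lower()


# Built once (for example at class-definition time), then reused for every parse.
FIELD_NAMES = ("PlateRef", "PhysicalSizeX", "ROIRef", "TheT")
ALIASES = {name: camel_to_snake(name) for name in FIELD_NAMES}
print(ALIASES)
```

With a table like this in place, the parser only performs dictionary lookups, which is the same trade the alias_generator proposal makes inside Pydantic.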

Write metadata as well as read.

This package would be particularly useful if it could be used to write XML metadata as well as read it into python dataclasses. Both @jmuhlich and @jacksonmaxfield have expressed an interest in that for their applications, so I just wanted to pin the topic here and connect you both.

Usage question: how to copy ome data or subsets of it?

Sorry if this is the wrong venue for usage questions -- please direct me to the right place if this is not it.

My question is about copying the ome object.
In particular, I am doing some image cropping and want to bring along as much metadata as accurately as possible.
I am starting with a valid ome object, and currently making a "copy" of it by calling to_xml and then from_xml again :)
Then I am removing planes and fixing up all the rest of the planes' indices manually. (Among other things).

Is there a best recommended way to copy these objects? (e.g. copy.deepcopy or something?)
It may also be useful to me to grab copies of planes out of the original ome object and stuff them into a new one, in this cropping example.

Thanks!
-Dan,

Removal of Pixels.PhysicalSizeXUnit and .PhysicalSizeYUnit

Hi,
I have generated an OME-TIFF using the following command:

bfconvert -tilex 512 -tiley 512 -pyramid-resolutions 6 -pyramid-scale 2 -compression LZW my_image.tiff my_image.ome.tif

The output of

~/software/bftools/tiffcomment my_image.ome.tif > tiffcomment_out.ome.txt

is tiffcomment_out.ome.txt

However, the output of

from ome_types import from_tiff, to_xml
with open("ome_types_out.ome.txt", "wt+") as f:
  f.write(to_xml(from_tiff("my_image.ome.tif")))

is ome_types_out.ome.txt

The ome_types output erases the properties Pixels.PhysicalSizeXUnit and Pixels.PhysicalSizeYUnit. I am using a downstream tool that is relying on these properties to be in the OME-XML.

Is this a bug or is this the expected behavior?

Remove pickle protocol methods on XMLAnnotation now that xmlschema 1.4.1 is out

xmlschema 1.4.1 includes my fix to the xml.etree module namespace manipulation that fixes Element pickling. This means we can remove our pickle protocol methods on XMLAnnotation that worked around the issue and update the minimum version on our xmlschema dependency. The __eq__ method on XMLAnnotation will also need to be refactored since it calls __getstate__ directly.

Validation errors with Slidebook ome-tiff export

I'm working with a dataset that was originally in slidebook format but was exported as an ome-tiff series. This gave the following alleged OME XML:

https://gist.github.com/jni/c4b09934715246c158397b24db7fbb3b

I tried to parse it with:

import ome_types

ome = ome_types.from_xml('ome-meta.xml', parser='lxml', validate=False)

which gives the error:

ValidationError: 624 validation errors for OME

(Full traceback at: https://gist.github.com/jni/e87f511c892475de72c880b83617e10d)

I fully expect that Slidebook is producing garbage, but I'm wondering if it's easily fixed garbage. At any rate I'm presently only after the pixel physical size, and potentially channel display colors and contrast limits, so any suggestions for grabbing that reliably from a junk xml will be appreciated. 😃

Can't read XMLAnnotations with lxml

Hi,
Using tifffile and ome-types, I construct an OME-TIFF containing, among other things, an XMLAnnotation.
The part of my code where I make the annotation looks like this :

xml_ann = ome_types.model.XMLAnnotation(
                    description="Some description",
                    value='<Data><Params A="1" B="2" C="3"/></Data>',
                )
metadata.structured_annotations.append(xml_ann)

Then, when I open that file with ome-types, using xmlschema as a parser :
metadata = ome_types.from_tiff(path_out, parser='xmlschema')
I get this, which is the expected behaviour :
image

But, when I use lxml, I get an empty element :
metadata = ome_types.from_tiff(path_out, parser='lxml')

image

Interestingly, this only happens when I include a description in the XMLAnnotation. If I just do xml_ann = ome_types.model.XMLAnnotation(value='<Data><Params A="1" B="2" C="3"/></Data>') , I get the correct behaviour when reading it with either parser.

Versions :

  • ome-types : 0.3.1
  • lxml : 4.8.0

Handle multi-Image / Scene, multi-Resolution metadata correctly

Hey @tlambert03, got an interesting one for ya.

Even though aicsimageio==4.0.0 isn't out yet, we have already started the planning for 4.1.0 which includes support for multi-resolution / pyramid files. We got some great test files from Seb over at Zeiss and I converted to OME-TIFF w/ bfconvert -noflat ..., and the resulting file can be downloaded here.

Describing the file + metadata:

The file has two scenes with the first scene having multiple resolutions. In total there are four resolutions in the first scene and one resolution for the second scene. When converted, because OME doesn't encapsulate resolution data into something of its own type, all four resolutions get converted to their own Image element in the metadata, resulting in the metadata having five total Image elements (1 scene * 4 resolutions + 1 scene * 1 resolution).

Now, the bug:

What is interesting is that if I crack the produced file open and parse the metadata w/ ome-types it produces this:

In [1]: from ome_types import from_tiff

In [2]: ome = from_tiff("aicsimageio/tests/resources/variable_scene_shape_first_scene_pyramid.ome.tiff")

In [3]: ome
Out[3]: 
OME(
   creator='OME Bio-Formats 6.6.0',
   experimenters=[<1 Experimenters>],
   images=[<2 Images>],
   instruments=[<1 Instruments>],
   structured_annotations=[<5194 Structured_Annotations>],
   uuid='urn:uuid:54bab916-61f0-451d-bfbe-251896f608fb',
)

It correctly parses that there are two images (the two scenes, regardless of the multi-resolution behavior).

But, if I check the metadata for the second scene:

In [4]: ome.images[1].pixels
Out[4]: 
Pixels(
   id='Pixels:1',
   dimension_order='XYCZT',
   size_c=1,
   size_t=1,
   size_x=422,
   size_y=2030,
   size_z=1,
   type='uint8',
   big_endian=False,
   channels=[<1 Channels>],
   interleaved=False,
   physical_size_x=0.9082107048835328,
   physical_size_y=0.9082107048835328,
   planes=[<1 Planes>],
   significant_bits=8,
   tiff_data_blocks=[<1 Tiff_Data_Blocks>],
)

Some of the metadata is correct and some is incorrect (when compared to the raw XML).
Correct:

  • dimension_order
  • size_c
  • size_t
  • size_x
  • size_y
  • size_z
  • type
  • big_endian
  • interleaved
  • planes
  • significant_bits
  • tiff_data_blocks

Incorrect:

  • pixels id (should be "Pixels:4")
  • channels (it is correct that there is only one channel but incorrect as to its name / id)

Unsure:

  • physical_size_x
  • physical_size_y

If you get around to handling this, great. If not, I may also be able to look more into it when I start work on AllenCellModeling/aicsimageio#140

`to_xml()` produces invalid datetimes.

We recently discovered that the to_xml() function provided by ome-types produces datetimes that look like 2016-03-11T10:23:44.925154UTC, which violate the XSD dateTime format. Instead, the datetime in this case should be 2016-03-11T10:23:44.9251548Z, which is what is in the source XML.

Steps to reproduce:

  1. Extract OverViewScan.ome.xml from this ZIP: OverViewScan.ome.xml.zip
  2. Load that XML into ome-types, output it to XML, and try to load it again, like this:
from pathlib import Path
from ome_types import from_xml, to_xml

if __name__ == '__main__':
    og_xml = Path("OverViewScan.ome.xml").read_text()
    ome = from_xml(og_xml)
    output_xml = to_xml(ome)
    from_xml(output_xml)
  3. Receive the following error:
xmlschema.validators.exceptions.XMLSchemaDecodeError: failed validating '2016-03-11T10:23:44.925154UTC' with XsdAtomicBuiltin(name='xs:dateTime'):

Reason: attribute StartTime='2016-03-11T10:23:44.925154UTC': Invalid datetime string '2016-03-11T10:23:44.925154UTC' for <class 'elementpath.datatypes.datetime.DateTime10'>
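For reference, a small sketch of producing a valid xs:dateTime string from a Python datetime using only the standard library. `to_xsd_datetime` is a hypothetical helper, not the function ome-types uses internally. Note that Python datetimes carry microseconds only, so the seventh fractional digit in the source XML cannot survive a round trip either way.

```python
from datetime import datetime, timezone


def to_xsd_datetime(value: datetime) -> str:
    """Format a datetime as an xs:dateTime lexical value."""
    text = value.isoformat()
    # isoformat() renders UTC as "+00:00"; "Z" is the equivalent short form.
    if text.endswith("+00:00"):
        text = text[:-6] + "Z"
    return text


stamp = datetime(2016, 3, 11, 10, 23, 44, 925154, tzinfo=timezone.utc)
print(to_xsd_datetime(stamp))
```

Both the numeric offset and the "Z" form are accepted by the XSD dateTime type; a zone name such as "UTC" appended to the string is not.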

Thanks for taking a look!

Issues with tiff-files exported from NIS elements using Export ND to Tiff

Below is a link to a sample file exported from NIS elements using the following dialog

image

https://www.dropbox.com/s/qiikd154xy4d9xx/seq0000xy01c1.tif?dl=0

I have been extracting metadata from this file using code in the following (messy) notebook (under the last heading):
https://github.com/VolkerH/PythonSnippets/blob/master/metadata_tifffile/tifffile%20metatada%20experiments.ipynb

I just tried this with ome_types.

If I try

from_tiff("seq0000xy01c1.tif")

I get a unicode error:

---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-11-52cd54144d4d> in <module>
----> 1 from_tiff("/home/hilsenst/Desktop/seq0000xy01c1.tif")

~/miniconda3/envs/napari_latest/lib/python3.9/site-packages/ome_types/_convenience.py in from_tiff(path)
     50         If the TIFF file has no OME metadata.
     51     """
---> 52     return from_xml(_tiff2xml(path))
     53 
     54 

~/miniconda3/envs/napari_latest/lib/python3.9/site-packages/ome_types/_convenience.py in _tiff2xml(path)
    102     if desc[-1] == 0:
    103         desc = desc[:-1]
--> 104     return desc.decode("utf-8")

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb0 in position 1746: invalid start byte
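Byte 0xb0 is the degree sign in Latin-1, so the description was probably written in a single-byte encoding. A possible workaround is sketched below; it assumes Latin-1 is a reasonable fallback for such files, and `decode_description` is a hypothetical stand-in for the decoding step in _tiff2xml, not the library's code.

```python
def decode_description(desc: bytes) -> str:
    """Decode a TIFF ImageDescription, tolerating non-UTF-8 bytes."""
    # Drop a single trailing NUL terminator if present.
    if desc.endswith(b"\x00"):
        desc = desc[:-1]
    try:
        return desc.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: the writer used a single-byte Western encoding.
        # Latin-1 maps every byte to a character, so this cannot fail.
        return desc.decode("latin-1")


sample = b'<Temp Unit="\xb0C"/>\x00'
print(decode_description(sample))
```

The fallback never raises, but it can silently produce wrong characters if the file was written in some other encoding, so it may be better offered as an opt-in.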

When I extract the XML string using tifffile as in my notebook and pass it to from_xml I get the following error which is potentially related to #86.

---------------------------------------------------------------------------
ValidationError                           Traceback (most recent call last)
<ipython-input-13-0e209f70ab29> in <module>
----> 1 from_xml(nd2totiff.ome_metadata)

~/miniconda3/envs/napari_latest/lib/python3.9/site-packages/ome_types/_convenience.py in from_xml(xml)
     27             d.pop(key)
     28 
---> 29     return OME(**d)  # type: ignore
     30 
     31 

~/miniconda3/envs/napari_latest/lib/python3.9/site-packages/ome_types/model/ome.py in __init__(self, **data)
    135 
    136     def __init__(self, **data: Any) -> None:
--> 137         super().__init__(**data)
    138         self._link_refs()
    139 

~/miniconda3/envs/napari_latest/lib/python3.9/site-packages/ome_types/_base_type.py in __init__(__pydantic_self__, **data)
     78         if "id" in __pydantic_self__.__fields__:
     79             data.setdefault("id", OMEType._AUTO_SEQUENCE)
---> 80         super().__init__(**data)
     81 
     82     # pydantic BaseModel configuration.

~/miniconda3/envs/napari_latest/lib/python3.9/site-packages/pydantic/main.cpython-39-x86_64-linux-gnu.so in pydantic.main.BaseModel.__init__()

ValidationError: 1 validation error for OME
images -> 0 -> acquisition_date
  invalid datetime format (type=value_error.datetime)

Performance: Allow to_xml to use lxml

#127 implemented lxml parsing, which significantly sped up parsing and validation. However, xmlschema is still the only option for converting OME to an xml string with the to_xml function. In testing, I have found that to_xml takes ~5x longer than both compressing and writing a multi-dimensional tiff image. Adding lxml as an xml string converter should significantly improve performance.

Remove or replace call to weakref / support serdes

Hey @tlambert03 !

Back again, one of the checks we do on aicsimageio is to ensure that the AICSImage / the base Reader object can be fully serialized. This is largely useful for dask worker transfer (while it's not recommended to transfer a whole image between workers, it is in some cases useful, i.e. it is decently efficient to transfer a not-yet-read image between workers for metadata parsing / some image handling).

Using the produced OME object from from_xml is awesome but unfortunately it can't be serialized due to this weakref call / attr creator.

I have gotten our tests working locally by simply removing this __post_init_post_parse__ function addition (or really just removing the entire OME object CLASS_OVERRIDE), but, it is there so I am here to ask: "Why?"

Would it be possible to remove this? I also know that __getstate__ and __setstate__ can be used to override the default pickle behavior. If that is an option, I could potentially work on a patch adding those functions to allow pickling and unpickling of the OME object.
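A rough sketch of what the __getstate__ / __setstate__ route could look like, using two toy classes in place of the real model. `Root`, `Child`, `_parent` and `_link` are hypothetical names; the real OME class would need to rebuild whatever its weakref-based linking creates.

```python
import copy
import weakref


class Child:
    def __init__(self, name):
        self.name = name
        self._parent = None  # weakref.ref to the owning Root, set by Root

    def __getstate__(self):
        state = self.__dict__.copy()
        state["_parent"] = None  # weakrefs cannot be pickled, so drop it
        return state


class Root:
    def __init__(self, children):
        self.children = list(children)
        self._link()

    def _link(self):
        """(Re)create the weak back-references from children to this root."""
        for child in self.children:
            child._parent = weakref.ref(self)

    def __setstate__(self, state):
        self.__dict__.update(state)
        self._link()  # restore the links that were dropped on serialization


root = Root([Child("a"), Child("b")])
clone = copy.deepcopy(root)  # deepcopy goes through the same state hooks
print(clone.children[0]._parent() is clone)
```

pickle.dumps and pickle.loads use the same two hooks, so the weak references are stripped on the way out and rebuilt against the new root on the way in.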

More forgiving parser

Multiple times I've gotten schema validation errors from ome_types on XML that BioFormats has no trouble with. These files always turn out to be third-party-generated, of course. I suspect people aren't running the schema validation, just testing that their generated files don't blatantly error out in showinf or ImageJ and that those tools show the correct metadata -- I know I've done just that in the past! 😳 It turns out BioFormats doesn't perform schema validation when opening files and it uses a bunch of ad-hoc DOM access for parsing, so a lot of technically invalid stuff like misordered elements is accepted. Do we have any reasonable solution here? I don't want to just send people away to fix their own tools (usually the person with the file is fairly removed from the tool developer, and at any rate they have this file in hand already), but I also don't want to start littering our code with fixes for every wrinkle we encounter in the wild.

schema 2011-06 failing to parse

The attached ome xml fails to parse in ome-types.
Here's the error I get:

Traceback (most recent call last):
  File "scratch.py", line 7, in <module>
    ome_obj = from_xml("./test_ome.xml")
  File "/Users/danielt/opt/anaconda3/envs/scratch/lib/python3.8/site-packages/ome_types/__init__.py", line 37, in from_xml
    d = to_dict(xml)
  File "/Users/danielt/opt/anaconda3/envs/scratch/lib/python3.8/site-packages/ome_types/schema.py", line 254, in to_dict
    schema = schema or get_schema(xml)
  File "/Users/danielt/opt/anaconda3/envs/scratch/lib/python3.8/site-packages/ome_types/schema.py", line 71, in get_schema
    return _build_schema(resource.namespace)
  File "/Users/danielt/opt/anaconda3/envs/scratch/lib/python3.8/site-packages/ome_types/schema.py", line 48, in _build_schema
    schema = xmlschema.XMLSchema(namespace)
  File "/Users/danielt/opt/anaconda3/envs/scratch/lib/python3.8/site-packages/xmlschema/validators/schema.py", line 429, in __init__
    self.parse_error(e.reason, elem=e.elem)
  File "/Users/danielt/opt/anaconda3/envs/scratch/lib/python3.8/site-packages/xmlschema/validators/xsdbase.py", line 169, in parse_error
    raise error
xmlschema.validators.exceptions.XMLSchemaParseError: <Element '{http://www.w3.org/1999/xhtml}html' at 0x7fed78517a90> is not an element of the schema:

Schema:

  <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
  <head>
          <title>Open Microscopy Environment OME Schema</title>
  </head>
  <body>
  <h1>Open Microscopy Environment OME Schema</h1>
  <div class="head">
  <p>June 2011</p>
  </div>
  <div id="toc">
  <h2>Table of contents</h2>
          <ol>
                  <li><a href="#intro">Introduction</a></li>
                  <li><a href="#status">Status</a></li>
                  <li><a href="#schema">Schema</a></li>
          </ol>
  </div>
  <div id="intro">
  <h2>Introduction</h2>
  <p>This document outlines the OME Schema created by the
  ...
  ...
  </html>

Path: /html

test_ome.xml.zip

xmlschema2dict(..., schema=None, validate=False) fails to use the cached, in-disk OME xml schema

In xmlschema2dict, at

if validate:
    schema = schema or get_schema(xml)
if _XMLSCHEMA_VERSION >= (2,):
    kwargs["validation"] = "strict" if validate else "lax"
result = xmlschema.to_dict(xml, schema=schema, converter=converter, **kwargs)

if validate=True (and schema=None, the default), get_schema will take care of using the cached OME xml schema provided with ome_types. If validate=False (and schema=None again), however, then xmlschema.to_dict will itself try to download the OME xml schema (as it doesn't know about the cached copy provided with ome_types), which will fail e.g. in the absence of a network connection.

From a very quick look I guess the (untested) fix may be to always run schema = schema or get_schema(xml) regardless of the value of validate?

xsd:list not handled in code generation

Elements defined as lists are skipped over during code generation. For example Experiment/Type:

<xsd:attribute name="Type" use="optional">
  ...
  <xsd:simpleType>
    <xsd:list>
      <xsd:simpleType>
        <xsd:restriction base="xsd:string">
          <xsd:enumeration value="FP"/>
          <xsd:enumeration value="FRET"/>
          ...   
        </xsd:restriction>
      </xsd:simpleType>
    </xsd:list>
  </xsd:simpleType>
</xsd:attribute>

The generated code ends up with type = None for that field, and no corresponding enum is produced.

Schemas can define lists containing any type, but the OME schema only has two cases: Experiment/Type and MicrobeamManipulation/Type. I think we just need a little extra logic in Member to both detect these as lists (maybe in .max_occurs) and also extract the nested type for further processing (e.g. enum generation). These Type attributes are optional so they might not appear in a given document, but I think we should normalize that to an empty list in our model classes rather than make it Optional. xmlschema to_dict does return a list of strings for an xsd:list (if the attribute is present) so we don't need to do anything extra there.
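As a rough illustration of the detection step, the following uses only xml.etree on a trimmed copy of the fragment above. `list_item_enum` and `parse_list_value` are hypothetical helpers and are not part of the code generator or its Member class.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional

XSD = "{http://www.w3.org/2001/XMLSchema}"

SCHEMA_FRAGMENT = """
<xsd:attribute xmlns:xsd="http://www.w3.org/2001/XMLSchema" name="Type" use="optional">
  <xsd:simpleType>
    <xsd:list>
      <xsd:simpleType>
        <xsd:restriction base="xsd:string">
          <xsd:enumeration value="FP"/>
          <xsd:enumeration value="FRET"/>
        </xsd:restriction>
      </xsd:simpleType>
    </xsd:list>
  </xsd:simpleType>
</xsd:attribute>
"""


def list_item_enum(attribute: ET.Element) -> Optional[List[str]]:
    """Return the enum values of an inline xsd:list attribute, else None."""
    item_type = attribute.find(f"{XSD}simpleType/{XSD}list/{XSD}simpleType")
    if item_type is None:
        return None  # not an inline xsd:list
    return [e.get("value") for e in item_type.iter(f"{XSD}enumeration")]


def parse_list_value(raw: Optional[str]) -> List[str]:
    """xsd:list values are whitespace-separated; a missing attribute becomes []."""
    return raw.split() if raw else []


values = list_item_enum(ET.fromstring(SCHEMA_FRAGMENT))
print(values, parse_list_value("FP FRET"), parse_list_value(None))
```

The second helper shows the normalization suggested above: an absent attribute maps to an empty list rather than None.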

consider mapping BinData to bytes

The BinData schema element is really just an XML-friendly encoding for a series of raw bytes, and the ideal Python type for that is bytes. If we make that change, the XML parsing process could perform the base64 decoding and any decompression, but I'm not sure what to do about the BigEndian flag. It would be instructive to look at BioFormats and OMERO code to see how BinData endianness is managed. Numpy users could be accommodated nicely with an __array__ method that returns the bytes object wrapped in np.frombuffer (zero-copy).

Also when we get to XML encoding, our API will need a way to declare what kind of compression the user wants (one setting for all BinData elements in the whole document is probably OK). The Length can be computed from the content, and we can pass along or arbitrarily choose the BigEndian value depending on how we manage it in the dataclass.
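A minimal sketch of the decode and encode steps with the standard library. The compression names ("none", "zlib", "bzip2") reflect my reading of the schema's Compression attribute and should be treated as an assumption; `decode_bindata` and `encode_bindata` are hypothetical names, and endianness and the Length attribute are deliberately left out.

```python
import base64
import bz2
import zlib

# Assumed compression names; identity function for uncompressed data.
_DECOMPRESS = {"none": lambda b: b, "zlib": zlib.decompress, "bzip2": bz2.decompress}
_COMPRESS = {"none": lambda b: b, "zlib": zlib.compress, "bzip2": bz2.compress}


def decode_bindata(text: str, compression: str = "none") -> bytes:
    """Base64-decode the element text, then undo any compression."""
    return _DECOMPRESS[compression](base64.b64decode(text))


def encode_bindata(data: bytes, compression: str = "none") -> str:
    """Compress if requested, then base64-encode for embedding in XML."""
    return base64.b64encode(_COMPRESS[compression](data)).decode("ascii")


payload = bytes(range(16)) * 4
for method in ("none", "zlib", "bzip2"):
    assert decode_bindata(encode_bindata(payload, method), method) == payload
print(encode_bindata(b"\x00\x01\x02"))
```

The resulting bytes object could then be wrapped by the caller according to the pixel type and the BigEndian flag, which is the part that still needs a decision.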

Feature: Fix common errors from OmeXml.py

Prior to ome-types, a common way to parse OME metadata was OmeXml.py. However, the data dumped from that library generally did not conform to the OME schema. Libraries such as AICSImageIO typically implement workarounds that fix some of the common errors, and it might be ideal to implement something similar in ome-types. This could be an option in from_xml.

For reference:
https://github.com/AllenCellModeling/aicsimageio/blob/c49a613dc54381d11237240ba36f0ef54603a7d6/aicsimageio/metadata/utils.py#L187
