FileFormats provides a library of file-format types implemented as Python classes. The file-format types are designed to be used in type validation during the construction of data workflows (e.g. Pydra, Fastr). They also provide some basic data handling methods (e.g. loading data to dictionaries) and, when the "extended" install option is provided, conversions between some equivalent types.
File-format types are typically identified by a combination of file extension and "magic numbers" where applicable. However, unlike many other file-type Python packages, FileFormats supports multi-file data formats ("file sets") often found in scientific workflows, e.g. with separate header/data files. FileFormats also provides a flexible framework for adding custom identification routines for exotic file formats, e.g. formats that require inspecting headers to locate data files, checking that directories contain certain file types, or peeking at metadata fields to define specific sub-types (e.g. a functional MRI DICOM file set).
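To illustrate what identifying a "file set" involves, here is a minimal standard-library sketch that checks a header file for its companion data file. It is not FileFormats' own implementation: the function name find_file_set and the ".hdr"/".img" extensions are assumptions chosen purely for illustration.

```python
import tempfile
from pathlib import Path


def find_file_set(header_path, header_ext=".hdr", data_ext=".img"):
    """Return (header, data) paths if both halves of the pair exist, else None.

    Hypothetical helper: the extensions are illustrative defaults only.
    """
    header = Path(header_path)
    if header.suffix.lower() != header_ext:
        return None
    # The data file is assumed to share the header's stem
    data = header.with_suffix(data_ext)
    if not (header.is_file() and data.is_file()):
        return None
    return header, data


# Demonstrate on a temporary directory with one complete and one incomplete pair
with tempfile.TemporaryDirectory() as tmp:
    tmp_dir = Path(tmp)
    (tmp_dir / "scan.hdr").write_bytes(b"header")
    (tmp_dir / "scan.img").write_bytes(b"data")
    (tmp_dir / "orphan.hdr").write_bytes(b"header")
    complete = find_file_set(tmp_dir / "scan.hdr")
    incomplete = find_file_set(tmp_dir / "orphan.hdr")

print(complete is not None, incomplete is None)
```

A header without its data file is rejected, which is the kind of check a single-file, extension-only test cannot express.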
See the extension template for instructions on how to design FileFormats extension modules that augment the standard file types implemented in the main repository with custom domain/vendor-specific file-format types.
Support for all non-vendor standard MIME types (i.e. ones not matching */vnd.* or */x-*) has been added to FileFormats by semi-automatically scraping the IANA MIME types website for file extensions and magic numbers. As such, many of the formats in the library have not been properly tested on real data and so should be treated with some caution. If you encounter any issues with an implemented file type, please raise an issue in the GitHub tracker.
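The exclusion rule above can be sketched with the standard library's fnmatch module. This only illustrates the two glob patterns quoted in the text; it is not the scraping code itself, and the names EXCLUDED_PATTERNS and is_standard_mime are hypothetical.

```python
from fnmatch import fnmatchcase

# Patterns quoted in the text: vendor and experimental MIME types are skipped
EXCLUDED_PATTERNS = ("*/vnd.*", "*/x-*")


def is_standard_mime(mime_type):
    """Return True if the MIME type matches neither excluded pattern."""
    return not any(fnmatchcase(mime_type, pattern) for pattern in EXCLUDED_PATTERNS)


candidates = [
    "image/png",
    "application/vnd.ms-excel",
    "audio/x-wav",
    "application/json",
]
standard = [m for m in candidates if is_standard_mime(m)]
print(standard)
```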
Adding support for vendor formats should be relatively straightforward. It just requires someone to manually curate the scraped data (a day's work or so). Please get in touch if you are interested in helping out with this.
FileFormats can be installed for Python >= 3.7 from PyPI with pip, e.g. pip install fileformats
Support for converter methods between a few select formats can be installed by passing the 'extended' install extra, e.g. pip install fileformats[extended]
Using the WithMagicNumber mixin class, the Png format can be defined concisely by declaring its file extension and magic number as class attributes.
Files can then be checked to see whether they are of PNG format by initialising the Png class with the file path, which will raise a FormatMismatchError if initialisation or validation fails. For a boolean check that does not raise, use the matches method.
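The pattern described above can be illustrated with a self-contained sketch that uses only the standard library. The classes below are stand-ins written for this example and are not FileFormats' actual classes or implementation. Only the names WithMagicNumber, Png, FormatMismatchError and matches come from the text. The File base class, the attribute names ext and magic_number, and the validate method are assumptions.

```python
import tempfile
from pathlib import Path


class FormatMismatchError(Exception):
    """Stand-in for the error named in the text."""


class File:
    """Stand-in base class (assumed): checks the extension on construction."""

    ext = ""

    def __init__(self, path):
        self.path = Path(path)
        self.validate()

    def validate(self):
        if not self.path.is_file():
            raise FormatMismatchError(f"{self.path} is not a file")
        if self.path.suffix != self.ext:
            raise FormatMismatchError(f"{self.path} does not end with '{self.ext}'")

    @classmethod
    def matches(cls, path):
        """Boolean counterpart to construction: never raises on a mismatch."""
        try:
            cls(path)
        except FormatMismatchError:
            return False
        return True


class WithMagicNumber:
    """Stand-in mixin: also compares the leading bytes of the file."""

    magic_number = b""

    def validate(self):
        super().validate()  # run the extension check first
        with open(self.path, "rb") as f:
            start = f.read(len(self.magic_number))
        if start != self.magic_number:
            raise FormatMismatchError(f"{self.path} lacks the expected magic number")


class Png(WithMagicNumber, File):
    ext = ".png"
    magic_number = b"\x89PNG\r\n\x1a\n"  # the 8-byte PNG signature


with tempfile.TemporaryDirectory() as tmp:
    real = Path(tmp) / "real.png"
    fake = Path(tmp) / "fake.png"
    real.write_bytes(Png.magic_number + b"rest of file")
    fake.write_bytes(b"not a png")
    real_ok = Png.matches(real)
    fake_ok = Png.matches(fake)
    try:
        Png(fake)
        raised = False
    except FormatMismatchError:
        raised = True

print(real_ok, fake_ok, raised)
```

Placing the mixin before the base class in the inheritance list lets its validate method extend the extension check through super(), so each format class only needs to declare its attributes.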
While not implemented in the main FileFormats package itself, FileFormats provides hooks for other packages to implement extra behaviour such as format conversion. The fileformats-extras package implements a number of converters between standard file-format types, e.g. archive types to/from generic files/directories, which, if installed, can be called using the convert() method.
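One common way to provide such hooks is a converter registry that a separate package populates. The sketch below shows that general pattern using only the standard library. It does not show how FileFormats or fileformats-extras actually implement their hooks, and every name in it (converter, convert, directory_to_zip, the string format keys) is hypothetical.

```python
import tempfile
import zipfile
from pathlib import Path

# Maps (source format, target format) to a converter function
_CONVERTERS = {}


def converter(source, target):
    """Decorator that registers a function as the source -> target converter."""
    def decorator(func):
        _CONVERTERS[(source, target)] = func
        return func
    return decorator


def convert(path, source, target):
    """Look up a registered converter and apply it to the given path."""
    try:
        func = _CONVERTERS[(source, target)]
    except KeyError:
        raise NotImplementedError(
            f"No converter registered from {source} to {target}"
        ) from None
    return func(Path(path))


# In this pattern, registrations like this would live in a separate add-on package
@converter("directory", "zip")
def directory_to_zip(directory):
    archive = directory.with_suffix(".zip")
    with zipfile.ZipFile(archive, "w") as zf:
        for item in sorted(directory.rglob("*")):
            if item.is_file():
                zf.write(item, item.relative_to(directory).as_posix())
    return archive


with tempfile.TemporaryDirectory() as tmp:
    data_dir = Path(tmp) / "data"
    data_dir.mkdir()
    (data_dir / "a.txt").write_text("hello")
    archive = convert(data_dir, "directory", "zip")
    archive_name = archive.name
    with zipfile.ZipFile(archive) as zf:
        names = zf.namelist()

print(archive_name, names)
```

Keeping the registry separate from the converters means the core package stays free of heavy dependencies, while a missing converter fails with a clear error instead of an import error.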
The converters are implemented in the Pydra dataflow framework, and can be linked into wider Pydra workflows by creating a converter task:
import pydra
from pydra.tasks.mypackage import MyTask
from fileformats.application import Json, Yaml

wf = pydra.Workflow(name="a_workflow", input_spec=["in_json"])
# Add a task that converts the workflow's JSON input to YAML
wf.add(
    Yaml.get_converter(Json, name="json2yaml", in_file=wf.lzin.in_json)
)
# Pass the converted file to a downstream task
wf.add(
    MyTask(
        name="my_task",
        in_file=wf.json2yaml.lzout.out_file,
    )
)
...
Alternatively, the conversion can be executed outside of a Pydra workflow by calling the convert() method directly.
This work is licensed under a Creative Commons Attribution 4.0 International License