pacsanini's Introduction

pacsanini

pacsanini 🎻 is a package designed to help with the collection of DICOM files and the extraction of DICOM tags (metadata) for structuring purposes.

pacsanini's functionalities come out of a desire to facilitate research in medical imaging by easing the process of data collection and structuring. The two main pain points it addresses are:

  • acquiring data from a PACS
  • extracting metadata from DICOM files in research-ready formats (eg: csv)

The project targets medical and research professionals who are not necessarily familiar with coding but wish to obtain data sets, as well as software engineers who wish to build applications with a certain level of abstraction.

Documentation

Check out the complete documentation on readthedocs. There you will find examples of how to use the pacsanini API from within your Python application and as a command line tool.

Contributing and Code of Conduct

All contributions to improve pacsanini are welcome and valued. For more information on how you can contribute, please read the Contributing document and make sure that you are familiar with our Code of Conduct.

You are also more than welcome to open a discussion on our GitHub discussions page.

Installation

To install a particular release version, check out the available versions of pacsanini on PyPI or simply run the following command to obtain the latest release:

pip install pacsanini

To obtain the cutting edge version of pacsanini, you can use pip or poetry in the following way:

pip install git+https://github.com/Therapixel/pacsanini.git
# or
poetry add git+https://github.com/Therapixel/pacsanini.git

For development

poetry is the only supported build tool for installing pacsanini in a development context. See the poetry documentation on how to install it.

git clone https://github.com/Therapixel/pacsanini.git
cd pacsanini
poetry install --no-root --no-dev  # install only the runtime dependencies
# or, to also install the development dependencies:
poetry install --no-root

Usage with docker

A docker image can be built locally to run pacsanini within an isolated environment.

docker image build -t pacsanini:latest .
docker run pacsanini --help

Roadmap

The main areas where pacsanini can improve as a library and a tool are detailed in the "Roadmap for pacsanini" issue below. Of course, these topics are up for discussion and such discussions are encouraged in the GitHub issues section.

pacsanini's People

Contributors

aachick, jhnstrk


pacsanini's Issues

Link project to PyPI and to GitHub discussions

Have you noticed outdated documentation?

Would you like to see better documentation on a particular area?
It would be nice to have a link in the README that indicates that people can install pacsanini from PyPI now.

A link to the project's discussions forum should also be available in order to promote discussion about new releases and more topics (which may not be issues).

Update notebook+pywin32 dependencies following security vulnerabilities

It has been noted that there are two security vulnerabilities relating to this project's dependencies:

  • notebook -> GHSA-4952-p58q-6crx
    • This is used as a dev dependency. However, there is no real need to use this even as a dev dependency. My opinion is that this listing should be removed.
  • pywin32 -> CVE-2021-32559
    • This is indirectly used by dependencies, and even by dependencies of dependencies (eg: pyinstaller).

Overall, I think that the following actions should be taken:

  1. Remove notebook from the list of dependencies.
  2. Update all packages in pyproject.toml (which should also update pywin32).
  3. Sort dependencies alphabetically in the pyproject.toml file (this is not necessary but could make future changes easier to track).

Use mypy for static type checking

Is your feature request related to a problem? Please describe.
It would be nice to use mypy in pacsanini in order to reap the benefits of static typing. This should help:

  • reduce coding errors by forcing static code checking
  • make the code easier to understand for developers (by showing what data types are being used)
  • make the code easier to understand for users (by exposing parameter types and return types)

Describe the solution you'd like
I would like to make sure that all of pacsanini's code is statically typed (this should already be the case for the most part).
A pre-commit hook should also be introduced in order to enforce static typing.
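
As a minimal illustration (a made-up example, not pacsanini code) of the kind of bug that mypy catches before runtime:

from typing import Dict

def study_count(counts: Dict[str, int]) -> int:
    return sum(counts.values())

# mypy flags this call: Dict[str, str] is not compatible with Dict[str, int].
study_count({"CT": "three"})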

Describe alternatives you've considered
The alternative option would be to trust that types are always the expected ones. I believe that this could lead to errors as the code base grows.

Additional context
https://github.com/python/mypy

Add method to dump database to CSV files

Is your feature request related to a problem? Please describe.
It would be good to be able to dump database backends into CSV format. This would allow for some flexibility in how results can be persisted and shared.

Describe the solution you'd like
Dumping the database could take the following form (a sketch of the dump logic follows the list):

  • a CLI command like pacsanini db dump where each table is dumped to a csv file.
  • a -o/--output option could be specified to indicate the output directory where these files should be stored.
  • optional flags corresponding to table names could be set to indicate that only specific tables should be dumped. The default would be all tables.
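
Purely as an illustration, the dump logic behind such a command could use sqlalchemy reflection along these lines (the function name and signature are assumptions, not an existing pacsanini API):

import csv
from pathlib import Path

from sqlalchemy import MetaData, create_engine, select

def dump_db_to_csv(db_uri, output_dir, tables=None):
    """Dump each (selected) table of the database to <output_dir>/<table>.csv."""
    engine = create_engine(db_uri)
    meta = MetaData()
    meta.reflect(bind=engine)  # discover the existing tables
    out_dir = Path(output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    with engine.connect() as conn:
        for name, table in meta.tables.items():
            if tables and name not in tables:
                continue  # only dump the requested tables
            result = conn.execute(select(table))
            with open(out_dir / f"{name}.csv", "w", newline="") as fp:
                writer = csv.writer(fp)
                writer.writerow(result.keys())
                writer.writerows(result)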

Describe alternatives you've considered
Having users develop their custom solution.

Additional context

Add a security guideline file

Is your feature request related to a problem? Please describe.
There is currently no explicit SECURITY.md file that outlines the security policy of the project. This should change.

Describe the solution you'd like
A simple SECURITY.md file at the root of the project should suffice.

Describe alternatives you've considered
The alternative would be to not have an explicit security policy which is not desirable.

Additional context

Add Dockerfile to project

Is your feature request related to a problem? Please describe.
It would be good to have a Dockerfile added to this project as this would allow for:

  • more portability for the project.
  • a clear example of how the project can be installed

Describe the solution you'd like
It would be good to have a Dockerfile that could be easily built. The Dockerfile's entrypoint should
correspond to the project's CLI entrypoint.

Describe alternatives you've considered
An alternative would be more explicit documentation (this solution is not exclusive). However,
actions speak louder than words.

Additional context

Enable default configuration file for CLI usage

Is your feature request related to a problem? Please describe.
It would be good to have a default configuration file location so that CLI commands can be run without the -f/--config option.

Describe the solution you'd like
The pacsanini.config module contains global variables that can be used to specify the location of a configuration file. Commands should look for this file when no -f/--config option is set.

Describe alternatives you've considered
The alternative would be to keep requiring the -f/--config option, but this can be simplified.

Additional context

Publish pacsanini documentation on the readthedocs

Is your feature request related to a problem? Please describe.
It would be good to host the pacsanini documentation in a way that makes it accessible to all without needing to build it.

Describe the solution you'd like
NA

Describe alternatives you've considered
NA

Additional context
NA

str2timedelta does not correctly handle fractions

  • [x] I have checked that this bug has not yet been reported
  • [x] I have checked that this bug exists in the latest version of pacsanini

Describe the bug
The DICOM specification dictates that the fractional component may contain 1 to 6 characters:

The FFFFFF component, if present, shall contain 1 to 6 digits. If FFFFFF is unspecified the preceding "." shall not be included.
http://dicom.nema.org/medical/dicom/current/output/html/part05.html#sect_6.2

However, str2timedelta rejects the value if FFFFFF is not 6 digits.

Code To Reproduce

    from datetime import timedelta

    from pacsanini import convert

    time5 = "045812.123"
    time5_expected = timedelta(hours=4, minutes=58, seconds=12, microseconds=123000)
    time5_result = convert.str2timedelta(time5)
    assert time5_result == time5_expected

Expected behavior
Parsing conforms to DICOM spec.
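
One possible fix, sketched under the assumption that the value always carries an HHMMSS time part and that the fractional component can simply be right-padded to six digits (this is illustrative, not pacsanini's actual implementation):

from datetime import timedelta

def str2timedelta(value: str) -> timedelta:
    """Parse a DICOM TM value such as '045812.123' into a timedelta."""
    time_part, _, frac = value.partition(".")
    hours = int(time_part[0:2])
    minutes = int(time_part[2:4])
    seconds = int(time_part[4:6])
    # The DICOM spec allows 1 to 6 fractional digits; pad up to microseconds.
    microseconds = int(frac.ljust(6, "0")) if frac else 0
    return timedelta(hours=hours, minutes=minutes, seconds=seconds, microseconds=microseconds)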

Additional context
Version fc21cfb.

Establish database schema for pacsanini models

Is your feature request related to a problem? Please describe.
The objective of this task is to implement a database structure in relation to what is described in #46.

Describe the solution you'd like
The implementation of this should be done using sqlalchemy. Utilities should be put into place in order to create the database as well.

Describe alternatives you've considered

Additional context

Have storescp server support plugins

Is your feature request related to a problem? Please describe.
It would be nice to have the storescp function support "plugin" type functions.
The idea would be to be able to pass functions or string imports to the StoreSCPServer class so that incoming DICOM events can be passed to the functions before the DICOM file gets persisted. This would allow for greater flexibility in how the code can be used.

Describe the solution you'd like
I would like to see an optional StoreSCPServer constructor parameter be introduced. This would allow passing a function or list of functions that would then be run over incoming Event objects. Passing functions to StoreSCPServer instances should also be possible via the configuration file.
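
A rough sketch of the idea (the callbacks parameter and the handler shown here are proposals, not the current StoreSCPServer API):

from typing import Callable, List, Optional

EventCallback = Callable[["Event"], None]  # "Event" being a pynetdicom event

class StoreSCPServer:
    """Illustrative subset: only the plugin-related pieces are sketched."""

    def __init__(self, callbacks: Optional[List[EventCallback]] = None):
        self.callbacks = callbacks or []

    def handle_store(self, event):
        # Run the user-supplied plugins over the incoming DICOM event
        # before the DICOM file gets persisted.
        for callback in self.callbacks:
            callback(event)
        self._persist(event)

    def _persist(self, event):
        ...  # write event.dataset to disk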

Describe alternatives you've considered
I have considered not proposing this feature and letting users implement their own features. However, this would probably result in a lot of boilerplate code.

Additional context

PyInstaller not properly configured

  • I have checked that this bug has not yet been reported
  • I have checked that this bug exists in the latest version of pacsanini

Describe the bug
The current make command for building an executable version of pacsanini is not working. This is most likely because it packages prefect, which needs a config.toml file to run; pyinstaller does not bundle that file by default. Changes should be made to add the config.toml file that prefect needs.

Code To Reproduce

$ make build-exe
$ ./dist/pacsanini/pacsanini --help
# Outputs:
Traceback (most recent call last):
  File "pacsanini/__main__.py", line 6, in <module>
  File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
  File "<frozen importlib._bootstrap>", line 986, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 680, in _load_unlocked
  File "PyInstaller/loader/pyimod03_importers.py", line 540, in exec_module
...
FileNotFoundError: [Errno 2] No such file or directory: '.../dist/pacsanini/prefect/config.toml'
[85201] Failed to execute script __main__

Expected behavior
The expected behavior should be a functional pacsanini executable.

Additional context

Improve usability and lower entry barriers to pacsanini

Is your feature request related to a problem? Please describe.
Despite the presence of documentation and some "HOW-TO" guides, pacsanini lacks some crucial elements that would make it developer/open source friendly.

The CONTRIBUTING guide should be available at the top-level directory of the project to make it easier for newcomers to see how they can contribute.

A CODE_OF_CONDUCT guide should be placed at the top-level directory of the project to better explain what values the project promotes and adheres to.

A release process document should be made to better explain how the project is released so that users and developers can easily see how the project can evolve. Although no releases exist so far, it would be better to anticipate that there will be in the future.

The license should change from GPLv3 to Apache 2.0. The GPLv3 license is too restrictive in my opinion as it prevents the code base from being used in any context other than purely open source. Apache 2.0 (https://choosealicense.com/licenses/apache-2.0/) is quite a popular license choice and would allow private use of this code without having to publish the calling code.

Furthermore, DCOs should be put into place in order to alleviate possible ambiguity as to whether code contributions are OK to use or not (cf. https://github.com/apps/dco).

I would also like to see more in-code examples and use-cases explaining how to use pacsanini and why it can be useful in the documentation. For now, it mostly contains examples from the command-line. However, the "code philosophy" of pacsanini is that whatever is being used as a CLI should also be able to be used inside native python code (which should also allow for more options).

Describe the solution you'd like
I would like the aforementioned documents to be added/modified.

Describe alternatives you've considered
The alternative would be to leave the project without such documents. I fear that this will not encourage external contributions and will limit the impact of the project on the overall community, which is obviously not the goal.

Additional context

Add email notification tasks in pipeline

Is your feature request related to a problem? Please describe.
For long-lasting pipeline executions, it would be good to have email notifications indicating the pipeline's overall status and whether there were failures.

Describe the solution you'd like
Email notifications should be sent to users when:

  • a task has failed (to notify that the pipeline will need to be rerun)
  • the find task has finished (the message should include the number of studies retrieved)
  • the move task has finished (the message could include the number of studies and images retrieved, the extraction duration, etc.)
  • the parsing task (if used) has finished

Describe alternatives you've considered
NA.

Additional context

Replace luigi with prefect

Is your feature request related to a problem? Please describe.
The luigi framework does not seem to be exactly what pacsanini needs. It is very target based: if a target exists (a file or a database for example), ETL tasks are considered to be successful. This is not always practical, for example when updating databases with new C-MOVE results. Workarounds are possible, but I feel like this is an indication that luigi is not the right tool for this task.

In contrast, prefect tasks are much more intuitive (to me at least). They are defined as simple functions which makes code readability better as well. Conditional flows are easily programmed which is quite useful and it seems that their documentation is more complete.

Describe the solution you'd like
I would like to perform the same actions that luigi does but with prefect.

Describe alternatives you've considered
I've considered continuing with luigi in #51 but this seems unnecessarily hard to do/maintain.

Additional context

Invalid project classifier

  • [x] I have checked that this bug has not yet been reported
  • [x] I have checked that this bug exists in the latest version of pacsanini

Describe the bug
The pyproject.toml file contains a classifier that is invalid (ie: "Topic :: Medical Imagery",). This should be removed so that the project can properly be released.

Code To Reproduce
NA

Expected behavior
The project should be able to be released.

Additional context
NA

Make luigi an extra

luigi is a relatively heavy framework to install. I think that it should only be set as an extra. Many uses of pacsanini may not need the orchestration/pipeline framework, so there is no need to force it.

This means that, for now, modules under pipeline should not be imported anywhere else in the code. To avoid import errors, the luigi import should be wrapped in a try/except ImportError block; if the error is raised, luigi should be set to None. Inside run_pacsanini_pipeline, if luigi is None, an error should be raised indicating that luigi should be installed with pip install pacsanini[luigi], as the sketch below shows.
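
A minimal sketch of the guarded import, assuming the layout described above:

try:
    import luigi
except ImportError:  # luigi is only installed via the extra
    luigi = None

def run_pacsanini_pipeline(*args, **kwargs):
    if luigi is None:
        raise ImportError(
            "luigi is required for pipeline features."
            " Install it with: pip install pacsanini[luigi]"
        )
    ...  # build and run the luigi tasks as before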

Add GitHub actions

Is your feature request related to a problem? Please describe.
As of now, no GitHub actions exist. This makes reviewing PRs (especially from forks) harder to do.

Describe the solution you'd like
I would like to introduce the following GitHub workflow:

  • pull request events should initiate checks for formatting (make lint and make format) and run the tests for each supported python version.

Describe alternatives you've considered
The alternative would be to clone forked repos and/or re-run tests on submitted branches. This might always be necessary but having pre-existing test results would be good.

Additional context

Link storescp server to application database

Is your feature request related to a problem? Please describe.
In issue #46, a database structure is put into place. CRUD operations are also defined that allow users to add images to the database and also add found studies. What links the studies table to the studies_find table is an optional foreign key on studies. Ideally, the storescp server should be able to persist the DICOM files it receives in the database and also update the studies table in a way that the newly inserted study record is linked to the studies_find record.

Describe the solution you'd like
I would opt either for a subclass of the current StoreSCPServer class that would have the database connection as an instance attribute, or for pre-defined callbacks that the storescp server can invoke after persisting the data.

Describe alternatives you've considered

Additional context
See related issues:

Update mkdocs to version>=1.2.3

Is your feature request related to a problem? Please describe.
A vulnerability was disclosed in the mkdocs project for versions < 1.2.3.

Details can be found here: GHSA-qh9q-34h6-hcv9

The mkdocs dependency should therefore be updated.

Update PyYaml and pydantic

  • [x] I have checked that this bug has not yet been reported
  • [x] I have checked that this bug exists in the latest version of pacsanini

Describe the bug
Dependabot alerts have been emitted for PyYaml (critical severity) and Pydantic (low severity). Although the CVEs don't relate to the specific ways that the dependencies are used in the project, the dependencies should still be updated to be safe.

Code To Reproduce

poetry add "pydantic>=1.7.4"
poetry add "PyYAML>=5.4"

Expected behavior
This should have no impact on the code. This is just a question of patching some of the dependencies.

Additional context
CVEs:

Prepare release 0.1.1

Release 0.1.0 had issues regarding its upload to PyPI due to an invalid classifier in the pyproject.toml file.
Now that this has been resolved with PR #21, the version numbers of the project in the pyproject.toml file and __version__.py file should be bumped to 0.1.1.

Have create tables be a kwarg in parse_dir2sql

  • I have checked that this bug exists in the latest version of pacsanini
  • I have checked that this bug has not yet been reported

Describe the bug
The pacsanini.db.parser::parse_dir2sql method raises an error when the database tables already exist. This is because the create_tables param of DBWrapper is hardcoded to True. This should instead be an optional param.

Code To Reproduce

pacsanini db init -f pacsanini_conf.yaml

And then:

pacsanini db parse -i tests/data/dicom-files -f pacsanini_conf.yaml --fmt sql

Expected behavior
I would like the user to have control over whether the database tables should be created when running parse_dir2sql. The default should be False.

Additional context

Roadmap for pacsanini

Roadmap/features

The pacsanini project now feels somewhat mature in terms of the functionalities it offers:

  • DICOM parsing in structured format
  • Emitting C-FIND requests to identify DICOM resources
  • Emitting C-MOVE requests to retrieve DICOM resources
  • Starting a storescp server to handle incoming DICOM data

The most interesting functionalities, and the ones with the most room for improvement, are probably the DICOM parsing, C-FIND operations, and the storescp server.

DICOM parsing

The parsing of DICOM files currently outputs results mainly in CSV format (and sqlite). CSV output serves research purposes well, but the next step would be to be able to parse DICOM files into a database. In this way, structured data can be accessed across multiple servers simultaneously, without needing to copy data between servers or set up an NFS share.

The tricky part of having a database is choosing the right type. I think that a SQL database (probably postgres) would be the best fit. SQL is a mature standard and, given the multiple engines that exist (postgres, mariadb, mysql, ...), users should have the choice to use whatever tool they prefer. Furthermore, libraries such as sqlalchemy provide great abstractions for implementing all of this.

Another tricky part will be defining a general schema for the database. This is tricky because DICOM attributes can vary between modalities. A general solution would therefore be to expose mainly mandatory attributes and store the DICOM file's full metadata as a JSON object (BLOB, JSON, or preferably JSONB).

The database structure should also make sense for storing data received from the storescp server (more on that later).

Overall, the database would have the following tables, which would fare well with the DICOM data representation model (a sketch follows the list):

  • patients (patient-level data - linked to studies)
  • studies (study-level data - linked to patients and series)
  • series (series-level data - linked to studies and images)
  • images (image-level data - linked to series)
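
An illustrative sqlalchemy sketch of that hierarchy (class and column names are assumptions made for this example, not pacsanini's actual models):

from sqlalchemy import JSON, Column, ForeignKey, Integer, String
from sqlalchemy.orm import declarative_base, relationship

Base = declarative_base()

class Patient(Base):
    __tablename__ = "patients"
    id = Column(Integer, primary_key=True)
    patient_id = Column(String, unique=True)
    studies = relationship("Study", back_populates="patient")

class Study(Base):
    __tablename__ = "studies"
    id = Column(Integer, primary_key=True)
    study_uid = Column(String, unique=True)
    patient_pk = Column(Integer, ForeignKey("patients.id"))
    patient = relationship("Patient", back_populates="studies")
    series = relationship("Series", back_populates="study")

class Series(Base):
    __tablename__ = "series"
    id = Column(Integer, primary_key=True)
    series_uid = Column(String, unique=True)
    study_pk = Column(Integer, ForeignKey("studies.id"))
    study = relationship("Study", back_populates="series")

class Image(Base):
    __tablename__ = "images"
    id = Column(Integer, primary_key=True)
    sop_uid = Column(String, unique=True)
    series_pk = Column(Integer, ForeignKey("series.id"))
    series = relationship("Series")
    meta = Column(JSON)  # full DICOM metadata dump (JSON/JSONB)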

C-FIND operations

To be able to give a bird's eye view of the data pipeline status, c-find results should be persisted in the database as well. A table named studies_find would be put into place that would store basic DICOM attributes of resources. In addition, two columns would be introduced: found_on and retrieved_on. found_on corresponds to the date at which the data was returned from a C-FIND request. retrieved_on corresponds to the date at which the data was retrieved from the PACS with a C-MOVE request.

storescp server

The storescp server was originally conceived to accept callbacks/plugins. This means that, in addition to persisting data, users can pass callbacks to perform additional actions on the data. One such action that should be facilitated is the parsing of DICOM metadata into the database. A system that updates the studies_find table and parses the DICOM file at the same time would be good.

Furthermore, the storescp server should be able to accept callbacks that will run before and after the data is persisted on disk.

  • pre-persistence callbacks should be put into place so that routines such as anonymization can run
  • post-persistence callbacks should be put into place so that routines that notify other systems or update the database can run

Improve coverage with pytest-docker and docker

Is your feature request related to a problem? Please describe.
Coverage of pacsanini is relatively good but falls short on some network aspects (especially the C-MOVE part) and on the pipeline aspect.

Testing the pipeline and the C-MOVE requests falls more under the "integration" category than the "unit" category. It therefore makes sense to set up a docker service that emulates a PACS for this (via pytest-docker).

Describe the solution you'd like
I would like a docker service to run an Orthanc instance.
Once running, we would then have to:

  1. Add a local pacsanini modality to the Orthanc configuration
  2. Upload some sample data to the Orthanc instance so that it can be queried

The test suite would then be able to use the Orthanc instance for DICOM networking tests.

Describe alternatives you've considered
The alternative would be to write a custom PACS server but this seems to be too tedious and would not necessarily emulate a real situation.

Additional context

Improve CRUD functionalities

Is your feature request related to a problem? Please describe.
The pacsanini.db.crud module currently has several CRUD methods defined that serve very specific purposes. Whilst functional, this does not let users easily access data in ways that would enable them to build more applications. For example, there is no pre-defined way to access image metadata for a particular image.

Describe the solution you'd like
I would like each database model to have an associated CRUD class that could perform:

  • simple getting (by ID or some other identifier such as SOPInstanceUID)
  • getting a set of data (eg: a group of study images)
  • updating data (the existing methods would be wrappers around this)
  • deleting data (if need be)

Describe alternatives you've considered
As the code-base grows, the alternative to this would be to add more unstructured methods but this seems hard to manage.

Additional context

Document DCO process

Have you noticed outdated documentation?
The project currently requires that developers/contributors sign off commits with a DCO (change applied since #16).
This is not documented and should be. The best place would be in the CONTRIBUTING.md file.

Would you like to see better documentation on a particular area?

Allow dcm2dict to accept bytes

Is your feature request related to a problem? Please describe.
I would like pacsanini.convert.dcm2dict to accept a bytes value for the dcm param. This should have very little impact on the underlying code as pydicom.dcmread already supports converting bytes to a dataset.

Describe the solution you'd like
Implement the following line:

if isinstance(dcm, (str, bytes)):
    dcm = dcmread(dcm, stop_before_pixels=not include_pixels)

Describe alternatives you've considered

Additional context

Don't move previously moved resources

Is your feature request related to a problem? Please describe.
Running pacsanini move multiple times using a CSV file as a way to obtain resources to retrieve is somewhat flawed.

When resources are retrieved via a SQL backend, a check is made to determine which resources have already been retrieved and which resources need to be retrieved. The move operations will then only be made for resources not already on disk.

With CSV, this is not the case. This is problematic because if the move process crashes or is stopped, users will need to re-move all resources regardless of whether they are already present or not.

Describe the solution you'd like
I would like to add a boolean flag to the pacsanini move command so that it is possible to indicate whether resources should be downloaded from scratch or not. This could take the form of --move-downloaded/--no-move-downloaded. The default would be --no-move-downloaded.

When --no-move-downloaded is true, a listing of the DICOM directory would be made to determine what is already downloaded, and only new resources would be moved (see the sketch below).
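
A rough sketch of that listing step, assuming DICOM files are stored under one directory per StudyInstanceUID (which may not match pacsanini's actual layout):

from pathlib import Path

def already_downloaded_studies(dicom_dir: str) -> set:
    """Return the StudyInstanceUIDs that already have data on disk."""
    return {path.name for path in Path(dicom_dir).iterdir() if path.is_dir()}

# Resources whose StudyInstanceUID is in this set would be skipped
# when --no-move-downloaded is set.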

Describe alternatives you've considered
The alternative would be for users to redo a csv file with a subset of resources to move. However, this can cause confusion/redundancy as multiple csv files could be created.

Additional context
none.

Callback is not applied when trying to parse nested tags

  • I have checked that this bug has not yet been reported
  • I have checked that this bug exists in the latest version of pacsanini

Describe the bug
When parsing nested tags, the function get_dicom_tag_value is recursively called, but the initial arguments (such as the callback) are not passed along.

Code To Reproduce

def callback(_):
    return "foo"

# dicom is a loaded pydicom Dataset.
get_dicom_tag_value(dicom, "ViewCodeSequence", callback=callback)
# -> 'foo'

get_dicom_tag_value(dicom, "ViewCodeSequence.CodeValue", callback=callback)
# -> 'R-10226' (the raw tag value: the callback was never applied)

Expected behavior
The callback should be applied even on nested tags.

Add SQL support to pacsanini

Is your feature request related to a problem? Please describe.
I think that the overall structure of pacsanini is becoming quite clear with regards to how it finds, collects, and parses DICOM data. I believe that one limitation this workflow currently has is the absence of a central database where C-FIND, C-MOVE, and DICOM parsing results can be stored. Such a centralized database would make it easier for other applications to integrate with pacsanini and would give users an easier overview of what has been done.

Describe the solution you'd like

I would like to define a very simple database structure that would contain two tables:

  • studies_find: This table would be populated with C-FIND queries. Each query will populate the table with basic C-FIND data such as patient ID, study instance UID, ... A "found_on" field would be added in order to track when the data was found from the PACS. A "retrieved" field would also exist. This field would be updated when emitting C-MOVE requests.
  • images: This table would be populated when parsing DICOM data. This could also be populated in the future by directly parsing DICOM data that is retrieved by C-MOVE requests (something that would be possible when #8 is mature enough).

It was originally not very clear in my mind how to represent DICOM file metadata in the database. I originally considered a system in which the configuration files that are used for parsing would define the table structure. However, such configuration files may change (rendering the existing table structures obsolete) and it would add an extra layer of complexity for users as they would have to define indexes, etc.

Because of this, I think that defining a simple images table with common/(mostly) mandatory DICOM tags is the best solution. So that the parsing of the DICOM data can be all found in the database, I would like to add a meta field in the table that would be a JSON field representing the result of a call to the pacsanini.convert.dcm2dict method. In this way, any DICOM tag can be queried (even if it is not the most obvious way to do it).

To be able to integrate the database features into the project's current configuration file structure easily, I think that checks should be added to the code to determine whether the storage.resources value (indicating where parsing results are stored) is a database URI or a filepath (see the sketch below). If the value is a filepath, then the previous behavior will be adopted. Otherwise, if the find, move, or parsing routines are invoked from the CLI, results will be stored and updated in the database.

In this issue, I would like to consider only the sqlite option for databases. This is by far the simplest to implement and test. If this issue is deemed mature enough, more SQL dialects (postgres, mysql, mariadb, ...) will be considered. With this consideration in mind, I think that sqlalchemy would be the most appropriate library to use for managing database related things in the project's code. It would then be possible to add dependency extras to the project so that different users can use their own databases.
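
The URI-versus-filepath check could be as simple as this illustrative helper:

def is_database_uri(resource: str) -> bool:
    """Heuristic: database URIs contain a scheme, filepaths do not."""
    return "://" in resource  # e.g. sqlite:///pacsanini.db, postgresql://...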

Describe alternatives you've considered
For a moment I had considered using a nosql alternative to store data in a database. This would have enabled usage of a more lax definition of data structures. However, such databases (eg: mongodb and couchdb) come with their complexities, especially for deployment.

Additional context

Show support for black and isort in README

Have you noticed outdated documentation?
This is not a problem of the documentation being outdated. However, it would be good to show support for black and isort, as they are used in the project's pre-commit hooks, by including their badges in the README.

Fix pipeline bug which doesn't support database interactions

  • I have checked that this bug has not yet been reported
  • I have checked that this bug exists in the latest version of pacsanini

Describe the bug

Since #50 was introduced in pacsanini, the luigi pipeline no longer works. This is because the pipeline is only configured to work with CSV or JSON files as its input.

Code To Reproduce

# pacsanini.yaml
storage:
  resources: sqlite:///sqlite.db

pacsanini orchestrate -f pacsanini.yaml

This gives a value error saying that sqlite:///sqlite.db does not exist as a file.

Expected behavior
Using a database as a backend and a CSV as a storage mechanism should work in the same way.

Additional context

Release version 0.2.0

With the latest modifications made to pacsanini, the minor version of the project should be bumped to 0.2.0 from 0.1.X.

For reference on the main changes, see #46 and #53

Create views for pacsanini database

Is your feature request related to a problem? Please describe.
With the data visualization dashboard in mind, it would be good to develop views that can be easily queried to obtain summary information about the data stored in the database.

Describe the solution you'd like
I would like to define the following two views:

  • a view that summarizes manufacturer data (grouping results in the images table by manufacturer)
  • a view that summarizes study data (grouping results by StudyInstanceUID and merging that with the image data)

Describe alternatives you've considered
The alternative would be to emit more complex queries each time in ways that may not be optimized.

Additional context

orchestrate flow not running when initialize database is False

  • I have checked that this bug has not yet been reported
  • I have checked that this bug exists in the latest version of pacsanini

Describe the bug
In pacsanini.pipeline.orchestra, the run_pacsanini_pipeline method will skip the find_dicom_resources task if the database initialization task is skipped. The solution would be to configure the find_dicom_resources task so that it is not skipped even when the upstream database initialization task is:

# above line 83
find_dicom_resources.skip_on_upstream_skip = False

Code To Reproduce

pacsanini orchestrate -f pacsanini_conf.yaml

Expected behavior
The prefect flow should continue even if the database initialization task is skipped.

Additional context

Change SQLAlchemy semantics

Is your feature request related to a problem? Please describe.
The StudyFind model is singular. The Patients, Studies, Series, and Images models are plural. It would make more sense for all tables to be referred to in the singular form when using these classes in python code.

I believe that:

session.query(Image).filter(Image.id == dbid)

is a more intuitive way to query data than:

session.query(Images).filter(Images.id == dbid)

Describe the solution you'd like
This would be a "simple" change of renaming the Patients, Studies, Series, and Images classes
to their singular form.

Describe alternatives you've considered
It would be simpler to leave the class names as they are. However, renaming the classes to their singular form and keeping naming conventions consistent feels like the right thing to do.

Additional context

dcm2dict is not recursive

  • I have checked that this bug has not yet been reported
  • I have checked that this bug exists in the latest version of pacsanini

Describe the bug
The pacsanini.dcm2dict method that converts pydicom.Dataset instances to JSON compatible dicts does not consider nested tags.

Code To Reproduce

import os

from pacsanini.convert import dcm2dict

# Run at the root of the project: pick the first DICOM test file found.
path = ""
for root, _, files in os.walk("tests/data/dicom-files"):
    if files:
        path = os.path.join(root, files[0])
        break

dcm_dict = dcm2dict(path)
vcs = None
for val in dcm_dict.values():
    if val.get("Name", "") == "ViewCodeSequence":  # a sequence tag with nested elements
        vcs = val
        break

for elem in vcs["Value"]:
    assert "Name" not in elem
# The asserts all pass, but they should not: nested tags never get a "Name" key.

Expected behavior
Each DICOM tag, including nested ones, should have a "Name" key added after the dataset is converted to dict format.

Additional context

Empty tag values trick `pacsanini.parse.get_tag_value`

  • I have checked that this bug has not yet been reported
  • I have checked that this bug exists in the latest version of pacsanini

Describe the bug
When trying multiple tags to get a value, the parsing can be fooled when a DICOM tag exists and contains an empty string value. This is because the get_tag_value method checks whether return values are None instead of whether they are truthy or falsy (if tag_val is not None:, line 103). This can easily be rectified by having: if tag_val:.

Code To Reproduce

from pydicom import Dataset
from pacsanini.parse import get_tag_value

ds = Dataset()
ds.Laterality = ""
ds.ImageLaterality = "R"

result = get_tag_value(ds, ["Laterality", "ImageLaterality"])
# result == ""

Expected behavior
In the above example, the expected result should be "R".

Additional context

Update license in pyproject.toml file

  • [x] I have checked that this bug has not yet been reported
  • [x] I have checked that this bug exists in the latest version of pacsanini

Describe the bug
After the PR #16 integration (see issue #15 for context), the license type in the pyproject.toml file no longer matches the actual project license. Furthermore, the "include" parameter lists the COPYRIGHT file, which no longer exists.

Code To Reproduce
NA

Expected behavior
NA

Additional context
NA

Access nested DICOM tags in a sequence element that is not the first sequence element

Is your feature request related to a problem? Please describe.
The code in get_dicom_tag_value handles nested tags through dot notation. However, this does not work for cases where the nested tag belongs to a sequence element indexed at a position other than 0. Consider the following example:

(0012,0064) SQ (Sequence with undefined length #=2)     # u/l, 1 DeidentificationMethodCodeSequence
  (fffe,e000) na (Item with undefined length #=3)         # u/l, 1 Item
    (0008,0100) SH [113100]                                 #   6, 1 CodeValue
    (0008,0102) SH [DCM]                                    #   4, 1 CodingSchemeDesignator
    (0008,0104) LO [Basic Application Confidentiality Profile] #  42, 1 CodeMeaning
  (fffe,e00d) na (ItemDelimitationItem)                   #   0, 0 ItemDelimitationItem
  (fffe,e000) na (Item with undefined length #=3)         # u/l, 1 Item
    (0008,0100) SH [113106]                                 #   6, 1 CodeValue
    (0008,0102) SH [DCM]                                    #   4, 1 CodingSchemeDesignator
    (0008,0104) LO [Retain Longitudinal Temporal Information Full Dates Option] #  58, 1 CodeMeaning
  (fffe,e00d) na (ItemDelimitationItem)                   #   0, 0 ItemDelimitationItem

Accessing the DICOM value with DeidentificationMethodCodeSequence.CodeValue would return 113100. However, it may be that the 113106 value is the one being searched for.

Describe the solution you'd like
It would be good to have some notation where you could specify the index of the sequence element to search. This could take the form of a bracket notation: sequence_tag[idx]nested_tag. In the example above, DeidentificationMethodCodeSequence[1]CodeValue should return 113106.
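
As an illustration, the proposed notation could be parsed with a helper along these lines (assuming a dotted variant such as DeidentificationMethodCodeSequence[1].CodeValue; this is not existing pacsanini code):

import re

_INDEXED_TAG = re.compile(r"^(?P<name>\w+)\[(?P<idx>\d+)\]$")

def split_tag_path(tag_path: str):
    """Split 'Seq[1].CodeValue' into [('Seq', 1), ('CodeValue', 0)]."""
    parts = []
    for part in tag_path.split("."):
        match = _INDEXED_TAG.match(part)
        if match:
            parts.append((match.group("name"), int(match.group("idx"))))
        else:
            parts.append((part, 0))  # default to the first sequence element
    return parts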

Describe alternatives you've considered
There are currently no alternatives for accessing such nested tags through the current framework, which is a limitation.

Additional context

Use default configuration files

Is your feature request related to a problem? Please describe.
In almost all pacsanini commands, the configuration option (-f/--config) must be supplied. To make commands even simpler to use, it would be nice if pacsanini could check whether a configuration file exists in the current directory. If one does, the command would source that file. Otherwise, an error would be raised.

Of course, documentation would need to be added to inform users of this feature.

Describe the solution you'd like
A single method could check whether a file named pacsaninirc.yaml exists in the current directory. This method would be called by CLI entry points when validating the -f/--config option.
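
A minimal sketch of that method (the helper name is illustrative):

from pathlib import Path
from typing import Optional

DEFAULT_CONFIG_NAME = "pacsaninirc.yaml"

def resolve_config_path(config: Optional[str]) -> str:
    """Return the -f/--config value, else the default file in the cwd."""
    if config:
        return config
    default = Path.cwd() / DEFAULT_CONFIG_NAME
    if default.exists():
        return str(default)
    raise FileNotFoundError(
        "No -f/--config option was given and no "
        f"{DEFAULT_CONFIG_NAME} file exists in the current directory."
    )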

Describe alternatives you've considered
The alternative would be not to do this. However, I believe that this option makes it easier to use pacsanini.

Additional context

Have parsing include file path by default

Is your feature request related to a problem? Please describe.
I think that, by default, the include_path parameter of all DICOM parsing functions (pacsanini.io.*:parse_dir*) should be set to True instead of False. Of course, the CLI tool should be adapted accordingly.

The reason for this is that, more often than not, when parsing multiple files, it is best to be able to immediately associate them with a file path. This allows code/applications using the parsing results to more easily get hold of the underlying files.

Describe the solution you'd like
I would like to modify the following methods by having include_path set to True by default:

  • pacsanini.io.base_parser:parse_dir
  • pacsanini.io.df_parser:parse_dir2df
  • pacsanini.io.io_parsers:parse_dir2csv
  • pacsanini.io.io_parsers:parse_dir2json

The pacsanini parse command would still keep the --include-path flag but also have the --exclude-path flag. The --include-path flag would be set by default.

Describe alternatives you've considered
The alternative would be to continuously pass the include_path parameter with a value of True. This works as well but I don't think that False is the appropriate default.

Additional context

Add dashboard to visualize collected data

Is your feature request related to a problem? Please describe.
It would be good to have a way to observe collected data via a dashboard with at least one graph.
Because the main goal of pacsanini is to perform well when collecting and structuring data, the dashboard should be based on an existing framework that does such things well out of the box. I am thinking of dash for this.

Describe the solution you'd like
I would like a dash app to be included in pacsanini. This would enable the visualization of patient, study, and image counts per year. Filters should be available to explore the data in more depth. These can be:

  • manufacturer
  • study dates
  • image count within a study
  • patient age

Added documentation would be needed to explain how to use the dashboard.

Describe alternatives you've considered
NA

Additional context
NA

Reduce boilerplate code when loading configuration

Is your feature request related to a problem? Please describe.
The same boilerplate code runs in every command option that reads the provided configuration file. Example:

ext = config.rsplit(".", 1)[-1].lower()
load_func = (
    PacsaniniConfig.from_json if ext == "json" else PacsaniniConfig.from_yaml
)
pacsanini_config = load_func(config)

Describe the solution you'd like
This could be avoided by having the @config_option decorator return a PacsaniniConfig instance
instead of its file path.
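
A hedged sketch of that change, assuming a click-based CLI (the import path and decorator internals are assumptions; from_json/from_yaml come from the snippet above):

import functools

import click

from pacsanini.config import PacsaniniConfig  # assumed import path

def config_option(func):
    """Load the config file and pass a PacsaniniConfig to the command."""

    @click.option("-f", "--config", required=True, type=click.Path(exists=True))
    @functools.wraps(func)
    def wrapper(*args, config, **kwargs):
        ext = config.rsplit(".", 1)[-1].lower()
        load = PacsaniniConfig.from_json if ext == "json" else PacsaniniConfig.from_yaml
        return func(*args, config=load(config), **kwargs)

    return wrapper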

Describe alternatives you've considered
Keep boilerplate code in the project.

Additional context

Refactor testing so that the only PACS server used is orthanc

Is your feature request related to a problem? Please describe.
In some tests (eg: for C-FIND), testing is still done using the custom "PACS"-like server that was written at the beginning of the project. To be consistent, tests should use the orthanc server spawned by docker-compose at the beginning of the test run.

Describe the solution you'd like
Rather than use the test_dicom_server fixture in tests/net/conftest.py, use the orthanc instance spawned at the beginning of the tests.

Describe alternatives you've considered
Have different servers serve as testing references: not good.

Additional context

Add tests for utils module

Is your feature request related to a problem? Please describe.
There is no problem so far but this issue tries to anticipate possible future problems.

Describe the solution you'd like
I would like to add tests for the utils.py module, as all modules should be tested at a minimum.

Describe alternatives you've considered
Not testing the module at all is not a good alternative.

Additional context

Add python examples and update installation process

Have you noticed outdated documentation?
The installation page in the docs is outdated as it does not mention how to install the project from PyPI.

Would you like to see better documentation on a particular area?
The project's roadmap mentions that documentation should be written to explain how pacsanini can be
used inside a python application.

Add c-store functionality

Is your feature request related to a problem? Please describe.
It would be good if an out of the box solution existed to send DICOM files using C-STORE. This could also be used to initialize the test DICOM server.

Describe the solution you'd like
A c_store module with a send_dicom method. This method would accept a single DICOM file or a directory, in which case the directory would be scanned recursively.
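
A minimal sketch of what send_dicom could look like with pynetdicom (only the signature follows the proposal above; the body is illustrative):

import os

from pydicom import dcmread
from pynetdicom import AE, StoragePresentationContexts

def send_dicom(dcm_path: str, host: str, port: int, ae_title: str = "PACSANINI"):
    """C-STORE a single DICOM file, or every file under a directory."""
    if os.path.isdir(dcm_path):
        paths = [
            os.path.join(root, name)
            for root, _, files in os.walk(dcm_path)
            for name in files
        ]
    else:
        paths = [dcm_path]

    ae = AE(ae_title=ae_title)
    ae.requested_contexts = StoragePresentationContexts
    assoc = ae.associate(host, port)
    if not assoc.is_established:
        raise ConnectionError("Could not associate with the remote DICOM node.")
    try:
        for path in paths:
            assoc.send_c_store(dcmread(path))
    finally:
        assoc.release()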

Describe alternatives you've considered
NA

Additional context

Consider adding migration scripts for database

Is your feature request related to a problem? Please describe.
With past and likely future changes to the database tables, it would be good to have versioned migration scripts. This would ensure that moving from one version to the next can be done smoothly.

Documentation should be written to explain how to do this.

Describe the solution you'd like
I would like to use alembic for this: https://alembic.sqlalchemy.org/en/latest/

Describe alternatives you've considered
Have everyone do their migrations in their corner: not good.

Additional context
