deid's Introduction

pydicom

pydicom is a pure Python package for working with DICOM files. It lets you read, modify and write DICOM data in an easy "pythonic" way. As a pure Python package, pydicom can run anywhere Python runs without any other requirements, although if you're working with Pixel Data then we recommend you also install NumPy.

Note that pydicom is a general-purpose DICOM framework concerned with reading and writing DICOM datasets. In order to keep the project manageable, it does not handle the specifics of individual SOP classes or other aspects of DICOM. Other libraries both inside and outside the pydicom organization are based on pydicom and provide support for other aspects of DICOM, and for more specific applications.

Examples are pynetdicom, which is a Python library for DICOM networking, and deid, which supports the anonymization of DICOM files.

Installation

Using pip:

pip install pydicom

Using conda:

conda install -c conda-forge pydicom

For more information, including installation instructions for the development version, see the installation guide.

Documentation

The pydicom user guide, tutorials, examples and API reference documentation are available for both the current release and the development version on GitHub Pages.

Pixel Data

Compressed and uncompressed Pixel Data is always available to be read, changed and written as bytes:

>>> from pydicom import dcmread
>>> from pydicom.data import get_testdata_file
>>> path = get_testdata_file("CT_small.dcm")
>>> ds = dcmread(path)
>>> type(ds.PixelData)
<class 'bytes'>
>>> len(ds.PixelData)
32768
>>> ds.PixelData[:2]
b'\xaf\x00'

If NumPy is installed, Pixel Data can be converted to an ndarray using the Dataset.pixel_array property:

>>> arr = ds.pixel_array
>>> arr.shape
(128, 128)
>>> arr
array([[175, 180, 166, ..., 203, 207, 216],
       [186, 183, 157, ..., 181, 190, 239],
       [184, 180, 171, ..., 152, 164, 235],
       ...,
       [906, 910, 923, ..., 922, 929, 927],
       [914, 954, 938, ..., 942, 925, 905],
       [959, 955, 916, ..., 911, 904, 909]], dtype=int16)

Decompressing Pixel Data

JPEG, JPEG-LS and JPEG 2000

Converting JPEG, JPEG-LS or JPEG 2000 compressed Pixel Data to an ndarray requires installing one or more additional Python libraries. For information on which libraries are required, see the pixel data handler documentation.

RLE

Decompressing RLE Pixel Data only requires NumPy; however, it can be quite slow. You may want to consider installing one or more additional Python libraries to speed up the process.

Compressing Pixel Data

Information on compressing Pixel Data using one of the formats below can be found in the corresponding encoding guides. These guides cover the specific requirements for each encoding method, and we recommend you familiarize yourself with them before performing image compression.

JPEG-LS, JPEG 2000

Compressing image data from an ndarray or bytes object to JPEG-LS or JPEG 2000 requires installing the following:

RLE

Compressing using RLE requires no additional packages but can be quite slow. It can be sped up by installing pylibjpeg with the pylibjpeg-rle plugin, or gdcm.

Examples

More examples are available in the documentation.

Change a patient's ID

from pydicom import dcmread

ds = dcmread("/path/to/file.dcm")
# Edit the (0010,0020) 'Patient ID' element
ds.PatientID = "12345678"
ds.save_as("/path/to/file_updated.dcm")

Display the Pixel Data

With NumPy and matplotlib

import matplotlib.pyplot as plt
from pydicom import dcmread
from pydicom.data import get_testdata_file

# The path to a pydicom test dataset
path = get_testdata_file("CT_small.dcm")
ds = dcmread(path)
# `arr` is a numpy.ndarray
arr = ds.pixel_array

plt.imshow(arr, cmap="gray")
plt.show()

Contributing

We are all volunteers working on pydicom in our free time. As our resources are limited, we very much value your contributions, be it bug fixes, new core features, or documentation improvements. For more information, please read our contribution guide.

deid's People

Contributors

briankolowitz, dimitripapadopoulos, fcossio, glebsts, howff, jjderidder, johannesu, jstorrs, kolowitzbj, mjcbello, nbelakovski-mssm, petkaze, robinfrcd, sjswerdloff, vsoch, wetzelj

deid's Issues

Allow for custom functions to be passed into deid recipes

I would want to be able to have this in a deid recipe:

%header
REPLACE PatientID func:generate_id

And the action would be to use a function in the global space called "generate_id" to provide the PatientID, and return the new value. This is appropriate for "on the fly" generation of values.
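A minimal sketch of how a `func:` value could be resolved. It assumes the functions are handed over in an explicit dictionary instead of being looked up in the global namespace, which keeps a recipe from calling arbitrary code. `parse_action`, `resolve_value` and the keyword arguments passed to the user function are hypothetical names for illustration, not deid's API:

```python
from typing import Any, Callable, Dict, Tuple


def parse_action(line: str) -> Tuple[str, str, str]:
    """Split a recipe line like 'REPLACE PatientID func:generate_id'."""
    action, field, value = line.split(maxsplit=2)
    return action, field, value


def resolve_value(spec: str, functions: Dict[str, Callable[..., Any]], **context: Any) -> Any:
    """Return the replacement value for a recipe value spec.

    Specs starting with 'func:' are looked up in `functions` and called
    with the context (e.g. field name and current value); anything else
    is returned unchanged as a literal.
    """
    prefix = "func:"
    if not spec.startswith(prefix):
        return spec
    name = spec[len(prefix):]
    if name not in functions:
        raise ValueError(f"No function named {name!r} was provided")
    return functions[name](**context)
```

Passing the dictionary explicitly also makes the available functions easy to see and test, at the cost of one extra argument compared with using `globals()`.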

Editing Dicom Preamble

Hello, I'll preface this by saying I've only been learning Python since March, but I have come a long way. I am using pydicom and deid to do some mass de-identification of DICOM files, and when I check the files, all of the header fields I wanted changed are changing, but the Media Storage SOP Class UID and Media Storage SOP Instance UID in the preamble are not. The SOP Class UID isn't a big deal because it's just an image type identifier, but more often than not the Media Storage SOP Instance UID is just a copy of the actual SOP Instance UID, which is PHI that needs to be removed. Is there a way to alter some code so that the deid process also changes fields in the preamble? Thank you in advance for any help or guidance you can provide.

add custom "endswith:" filter for fields

instead of the following:

FORMAT dicom

%header

REPLACE PatientID var:entity_id
REPLACE SOPInstanceUID var:item_id
ADD PatientIdentityRemoved Yes
JITTER InstanceCreationDate var:jitter
JITTER InstanceCreationTime var:jitter
JITTER StudyDate var:jitter
JITTER SeriesDate var:jitter
JITTER AcquisitionDate var:jitter
JITTER OverlayDate var:jitter

I should be able to do:

FORMAT dicom

%header

REPLACE PatientID var:entity_id
REPLACE SOPInstanceUID var:item_id
ADD PatientIdentityRemoved Yes
JITTER endswith:Date var:jitter
JITTER endswith:Time var:jitter

to apply the same filter over all fields that end with (and start with) the term of interest.

Include specific Dicom group on get_identifiers

Is there a way to ensure that specific tag groups are included? Currently, the get_identifiers function does not retrieve the 0051 group (which is essential for my image reconstruction...). Thanks!

Pixel Data with undefined length must start with an item tag

The following image:
IMG00001.dcm.zip

The error happens after the clean() method returns successfully (blanking out the supplied coordinates) and the save_dicom method is called:

With tag (7fe0, 0010) got exception: Pixel Data with undefined length must start with an item tag
Traceback (most recent call last):
  File "/data/anaconda3/lib/python3.6/site-packages/pydicom/tag.py", line 30, in tag_in_exception
    yield
  File "/data/anaconda3/lib/python3.6/site-packages/pydicom/filewriter.py", line 475, in write_dataset
    write_data_element(fp, dataset.get_item(tag), dataset_encoding)
  File "/data/anaconda3/lib/python3.6/site-packages/pydicom/filewriter.py", line 435, in write_data_element
    raise ValueError('Pixel Data with undefined length must '
ValueError: Pixel Data with undefined length must start with an item tag

interactive web interface for generating deid files

The user should be able to interactively generate a spec file that says how they want their de-identification task to be done. For the SOM, we can point users here to generate one, and then associate their spec files with the pipelines they have us run.

add logic for within line testing

e.g., a filter might have a check in parentheses:

if (Criteria 1 and Criteria 2)
OR Criteria 3

so the expression in parentheses needs to be evaluated first!
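A minimal recursive-descent sketch of this, assuming each criterion has already been evaluated to a boolean and that "and" binds tighter than "or". The names `C1`, `C2`, `C3` stand in for real filter criteria, and `tokenize`/`evaluate` are hypothetical, not deid functions:

```python
import re
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    # Parentheses become separate tokens; everything else splits on whitespace.
    return re.findall(r"\(|\)|[^\s()]+", text)


def evaluate(text: str, criteria: Dict[str, bool]) -> bool:
    """Evaluate e.g. '(C1 and C2) OR C3' against precomputed criteria."""
    tokens = tokenize(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take() -> str:
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def parse_or() -> bool:
        value = parse_and()
        while peek() is not None and peek().lower() == "or":
            take()
            rhs = parse_and()  # always parse, so the position advances
            value = value or rhs
        return value

    def parse_and() -> bool:
        value = parse_atom()
        while peek() is not None and peek().lower() == "and":
            take()
            rhs = parse_atom()
            value = value and rhs
        return value

    def parse_atom() -> bool:
        tok = take()
        if tok == "(":
            value = parse_or()  # inner group is fully evaluated first
            if peek() != ")":
                raise ValueError("missing closing parenthesis")
            take()
            return value
        return criteria[tok]

    result = parse_or()
    if pos != len(tokens):
        raise ValueError(f"unexpected token {tokens[pos]!r}")
    return result
```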

fresh build of Docker yields "ModuleNotFoundError: No module named 'pydicom'"

Although the Dockerfile does in fact run "pip install pydicom", after the Docker image is built, running "docker run pydicom/deid" with a command such as inspect fails with a package-not-found error.

This may be related to miniconda3 using Python 3.7 while "matplotlib=2.1.2" forces a downgrade to Python 3.6.6, which happens after the "pip install pydicom" command in the Dockerfile.

I switched the order of the conda install for matplotlib and pip install pydicom in Dockerfile and the problem went away, like this:

RUN apt-get update && apt-get install -y wget git pkg-config libfreetype6-dev
RUN /opt/conda/bin/conda install matplotlib==2.1.2
RUN pip install pydicom
RUN mkdir /code
ADD . /code
WORKDIR /code
RUN python /code/setup.py install

Could this be investigated and incorporated into the code base, if it makes sense? Thank you!

pydicom.read_file() --> _get_pixel_array() no longer exists

Attempting to use the DicomCleaner class to do pixel-level cleaning.

Looks like I keep getting an error in the clean.py file:

in
5 print(out)
6 if out['flagged']:
----> 7 client.clean()

/opt/conda/lib/python3.6/site-packages/deid-0.1.23-py3.6.egg/deid/dicom/pixels/clean.py in clean(self)
106
107 # We will set original image to image, cleaned to clean
--> 108 self.original = dicom._get_pixel_array()
109 self.cleaned = self.original.copy()
110

/opt/conda/lib/python3.6/site-packages/pydicom/dataset.py in __getattr__(self, name)
530 if tag is None: # name isn't a DICOM element keyword
531 # Try the base class attribute getter (fix for issue 332)
--> 532 return super(Dataset, self).__getattribute__(name)
533 tag = Tag(tag)
534 if tag not in self.tags: # DICOM DataElement not in the Dataset

AttributeError: 'FileDataset' object has no attribute '_get_pixel_array'

Tracing this issue back it looks like pydicom's FileDataset doesn't actually have a _get_pixel_array() function, as follows:

from pydicom import read_file
dicom = read_file(dicom_files[0])
dicom._get_pixel_array()

(dicom_files is a list of local paths to DICOM files). Gives the same error at the same location:

AttributeError Traceback (most recent call last)
in
1 from pydicom import read_file
2 dicom = read_file(dicom_files[0])
----> 3 dicom._get_pixel_array()

/opt/conda/lib/python3.6/site-packages/pydicom/dataset.py in __getattr__(self, name)
530 if tag is None: # name isn't a DICOM element keyword
531 # Try the base class attribute getter (fix for issue 332)
--> 532 return super(Dataset, self).__getattribute__(name)
533 tag = Tag(tag)
534 if tag not in self.tags: # DICOM DataElement not in the Dataset

AttributeError: 'FileDataset' object has no attribute '_get_pixel_array'

On the other hand, I can access the pixel data directly using dicom.pixel_array or, alternatively, dicom.__getattribute__("pixel_array").

Maybe pydicom changed the Dataset class at some point and broke the clean.py implementation?

ignoring user specified deid configuration

If I specify a deid file, why do you append the default 'dicom' file? It overrides my preferences. For instance, I'd like to specify my own PatientID in my deid file, but your default 'dicom' file removes it.

In my deid.dicom file I'd like the following to override your base configuration:

%header
REPLACE PatientID var:patient_id

my "fix" is in header.py

if deid is not None:
    # deid = load_combined_deid([deid])
    deid = get_deid(deid, load=True)
else:
    deid = get_deid('dicom', load=True)

possible resolutions seem to be
a) allow the user specified configuration to override your base configuration
b) add an option that allows the user to specify if they'd like to take your base configuration in addition to what they specify
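A sketch of resolution (a), assuming header actions are represented as simple `(ACTION, Field, value)` tuples; that representation and `merge_actions` are hypothetical, not deid's internal format. Any field the user mentions drops the base configuration's action for that field:

```python
from typing import List, Tuple

# (ACTION, FieldName, value), e.g. ("REPLACE", "PatientID", "var:patient_id")
Action = Tuple[str, str, str]


def merge_actions(base: List[Action], user: List[Action]) -> List[Action]:
    """Combine base and user header actions; the user's action wins per field."""
    user_fields = {field for _, field, _ in user}
    kept = [a for a in base if a[1] not in user_fields]
    return kept + user
```

Resolution (b) could then be a flag that decides whether `base` is passed in at all.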

items and lists

I find it a bit confusing that I can pass either an item or a list into many methods and always get a list back. This pattern is repeated many times throughout the code:

# validate.py
if not isinstance(dcm_files, list):
    dcm_files = [dcm_files]
# header.py
if not isinstance(dicom_files, list):
    dicom_files = [dicom_files]

etc. I think it would be cleaner if the methods only took lists as arguments.
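One likely reason for the wrapping is that a bare `str` is itself iterable, so looping over a single path would silently yield characters. A sketch of the list-only alternative, with a hypothetical `require_list` guard that fails loudly instead of wrapping:

```python
from typing import List


def require_list(value: object, name: str) -> List[str]:
    """Accept only a list of paths; reject a bare string or other types."""
    if not isinstance(value, list):
        raise TypeError(f"{name} must be a list of paths, got {type(value).__name__}")
    return value
```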

Example used with current source code gives ImportError

I checked out the current source tree, built the Docker image and then tried to use the data in the basic example. I get the exception detailed below when trying this, and it seems indeed like the import is missing in the code.

➜  code git clone git@github.com:pydicom/deid.git
Cloning into 'deid'...
➜  ~ cd deid 
➜  deid git:(master) docker build -t pydicom/deid .
➜  deid git:(master) docker run pydicom/deid inspect --deid /home/peter/code/deid/examples/deid/deid.dicom /home/peter/code/deid/deid/data/dicom-cookies --save 

Traceback (most recent call last):
  File "/opt/conda/bin/deid", line 11, in <module>
    load_entry_point('deid==0.1.11', 'console_scripts', 'deid')()
  File "/opt/conda/lib/python3.6/site-packages/deid-0.1.11-py3.6.egg/deid/main/__init__.py", line 157, in main
    from .inspect import main
  File "/opt/conda/lib/python3.6/site-packages/deid-0.1.11-py3.6.egg/deid/main/inspect.py", line 30, in <module>
    from deid.dicom import get_files
  File "/opt/conda/lib/python3.6/site-packages/deid-0.1.11-py3.6.egg/deid/dicom/__init__.py", line 1, in <module>
    from .header import (
  File "/opt/conda/lib/python3.6/site-packages/deid-0.1.11-py3.6.egg/deid/dicom/header.py", line 29, in <module>
    from .tags import (
  File "/opt/conda/lib/python3.6/site-packages/deid-0.1.11-py3.6.egg/deid/dicom/tags.py", line 28, in <module>
    from pydicom.tag import tag_in_exception
ImportError: cannot import name 'tag_in_exception'

Any ideas what might be wrong?

check out tag issue

This seems to be a common issue:

NotImplementedError: Invalid tag (403e, 3f62): Unknown Value Representation 'Á@' in tag (403e, 3f62)

DicomCleaner save_dicom

Hello,
Thanks a lot for all these very useful functions!
I'm having trouble generating a DICOM file after clean:
File "C:\deid\dicom\pixels\clean.py", line 181, in save_dicom
dicom.PixelData = self.clean.tostring()
AttributeError: 'function' object has no attribute 'tostring'

Can you help me?
Regards
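The message says `self.clean` is a function: it is the `clean()` method itself, while the cleaned array is stored in `self.cleaned` (see `self.cleaned = self.original.copy()` in the earlier traceback). A hypothetical stand-in class, not deid's DicomCleaner, showing the corrected attribute and `ndarray.tobytes()`, the non-deprecated spelling of `tostring()`:

```python
import numpy as np


class CleanerSketch:
    """Hypothetical stand-in illustrating the clean/cleaned mix-up."""

    def __init__(self, image: np.ndarray):
        self.original = image
        self.cleaned = None

    def clean(self) -> np.ndarray:
        self.cleaned = self.original.copy()
        self.cleaned[0:2, :] = 0  # e.g. black out the first two rows
        return self.cleaned

    def pixel_bytes(self) -> bytes:
        # self.clean is the bound method above, so self.clean.tostring()
        # raises AttributeError; the array lives in self.cleaned.
        if self.cleaned is None:
            raise RuntimeError("call clean() first")
        return self.cleaned.tobytes()
```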

identifier for image should be file

I'm not comfortable with the API returning an entity ID that differs from what goes in (without the original slash), as this is bound to lead to errors. We need to index the data on something that does not change until the end and that is unique per image; the file name is a reasonable candidate.

[priority] add CHANGELOG

We need to keep close track of changes, and make sure that changes to development coincide with particular versions.

Issue with full path names?

I updated to the latest version and my code broke. I'm using the full file path as the key for the ids dictionary.

if idx in ids: breaks because idx is the basename, but my keys are the full path

a) is this a bug?
b) (or) do i need to modify my code to remove the path from the ids keys

header.py

if recipe.deid is not None:
    if idx in ids:
        for action in deid.get_actions():
            dicom = perform_action(dicom=dicom,
                                   item=ids[idx],
                                   action=action)

Support for duplicate DICOM file names

In the function deid.dicom.get_identifiers the dicom files are identified only by the file name.

This leads to hard to find bugs if you are trying to deidentify several series at the same time.

E.g.
Suppose you have two series

dicom/seriesA/00001.DCM
dicom/seriesB/00001.DCM

And you wish to update the Series Instance UID.

Following the example code you may write something like this

ids = get_identifiers(dicom_files)
for image,fields in ids.items():    
    fields['instance_id'] = pydicom.uid.generate_uid(entropy_srcs=uid)
    updated_ids[image] = fields 

Then both images would get the same instance_id.
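The collision happens because both files share the basename `00001.DCM`, so they map to one dictionary entry. A sketch of keying by the normalized full path instead; `index_by_path` is a hypothetical helper, not deid's get_identifiers:

```python
import os
from typing import Dict, List


def index_by_path(dicom_files: List[str]) -> Dict[str, dict]:
    """Key per-file fields by absolute path so equal basenames stay distinct."""
    ids: Dict[str, dict] = {}
    for path in dicom_files:
        key = os.path.abspath(path)
        if key in ids:
            raise ValueError(f"duplicate file: {path}")
        ids[key] = {"source": path}
    return ids
```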

deid identifiers --action all fails with NoneType object has no attribute get_actions

I have pulled the current master branch and tried the following:

(deid) ➜  deid git:(master) ✗ cd examples/dicom 
(deid) ➜  dicom git:(master) ✗ python deid-dicom-example.py 
Found 1 valid dicom files
Found 1 valid dicom files
Found 1 valid dicom files
Found 1 valid dicom files
Found 1 valid dicom files
Found 1 valid dicom files
Found 1 valid dicom files
WARNING No specification, loading default base deid.dicom
WARNING No specification, loading default base deid.dicom
WARNING No specification, loading default base deid.dicom
Traceback (most recent call last):
  File "deid-dicom-example.py", line 214, in <module>
    output_folder='/home/vanessa/Desktop')
  File "/home/peter/code/deid/deid/dicom/header.py", line 272, in replace_identifiers
    for action in deid.get_actions():
AttributeError: 'NoneType' object has no attribute 'get_actions'

Some information about my environment:

(deid) ➜  dicom git:(master) ✗ python --version
Python 3.6.2
(deid) ➜  dicom git:(master) ✗ pip list
certifi (2018.4.16)
chardet (3.0.4)
cycler (0.10.0)
deid (0.1.13, /home/peter/code/deid)
idna (2.6)
kiwisolver (1.0.1)
matplotlib (2.2.2)
numpy (1.14.3)
pip (9.0.1)
pydicom (1.0.2)
Pygments (2.2.0)
pyparsing (2.2.0)
python-dateutil (2.7.3)
pytz (2018.4)
requests (2.18.4)
retrying (1.3.3)
setuptools (28.8.0)
simplejson (3.15.0)
six (1.11.0)
urllib3 (1.22)
validator.py (1.2.5)

Any ideas what might be wrong?

Add function to perform JITTER

Jittering a timestamp means shifting it by an amount taken from a variable; the same variable can be used to jitter one or more fields. For example:

JITTER InstanceCreationTime var:item_timestamp

Would say to find the field InstanceCreationTime and jitter it by the number in the variable item_timestamp
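A minimal sketch for DA (date) values, assuming the variable holds a whole number of days; that unit is an assumption. TM values such as InstanceCreationTime would need to be combined with a date so that shifts across midnight are handled. `jitter_date` is a hypothetical name:

```python
from datetime import datetime, timedelta


def jitter_date(value: str, days: int) -> str:
    """Shift a DICOM DA value (YYYYMMDD) by `days`, which may be negative."""
    shifted = datetime.strptime(value, "%Y%m%d") + timedelta(days=days)
    return shifted.strftime("%Y%m%d")
```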

deid version option has a bug

Running the container reveals:

(base) root@4abd1befe0c8:/code# which deid
/opt/conda/bin/deid
(base) root@4abd1befe0c8:/code# deid
Traceback (most recent call last):
  File "/opt/conda/bin/deid", line 11, in <module>
    load_entry_point('deid==0.1.19', 'console_scripts', 'deid')()
  File "/opt/conda/lib/python3.6/site-packages/deid-0.1.19-py3.6.egg/deid/main/__init__.py", line 141, in main
    if args.command == "version" or args.version is True:
AttributeError: 'Namespace' object has no attribute 'version'
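The traceback means the parser never defined a `--version` option, so the `Namespace` has no `version` attribute. A hypothetical sketch, not deid's actual parser, showing both fixes: define the flag, and read it with `getattr` and a default:

```python
import argparse


def get_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="deid")
    parser.add_argument("--version", action="store_true", help="show version and exit")
    sub = parser.add_subparsers(dest="command")
    sub.add_parser("version")
    sub.add_parser("inspect")
    return parser


def wants_version(args: argparse.Namespace) -> bool:
    # getattr with a default avoids AttributeError if --version is missing.
    return args.command == "version" or getattr(args, "version", False) is True
```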

AttributeError: 'function' object has no attribute 'tostring'

Hi,
Thanks for the opportunity to use this cool library 🙂

So, everything is going great until the .save_dicom() step.

client = DicomCleaner(output_folder='/output', deid=my_deid)
client.clean()

scrubbing happens:
Scrubbing /Users/me/dicoms/dicom.dcm.
Then,
client.save_dicom()
gives

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-77-35491d2123a0> in <module>()
----> 1 client.save_dicom()

/Users/me/anaconda3/envs/python_p27/lib/python2.7/site-packages/deid/dicom/pixels/clean.pyc in save_dicom(self, output_folder, image_type)
    179             dicom_name = self._get_clean_name(output_folder)
    180             dicom = read_file(self.dicom_file,force=True)
--> 181             dicom.PixelData = self.clean.tostring()
    182             dicom.save_as(dicom_name)
    183             return dicom_name

AttributeError: 'function' object has no attribute 'tostring'

It looks like the problem is with the read_file from pydicom...but I'm assuming the dicoms are readable since they were opened and read in the scrubbing step.

To experiment, I tried to save as a png:
client.save_png()
and got

Error in callback <function post_execute at 0x111179668> (for post_execute):
UsageError: Invalid GUI request 'pdf', valid ones are:[None, 'osx', 'widget', 'qt5', 'qt', 'nbagg', 'gtk', 'qt4', 'gtk3', 'notebook', 'tk', 'ipympl', 'inline', 'asyncio', 'wx']
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
/Users/me/anaconda3/envs/python_p27/lib/python2.7/site-packages/matplotlib/pyplot.py in post_execute()
    146 
    147             def post_execute():
--> 148                 if matplotlib.is_interactive():
    149                     draw_all()
    150 

AttributeError: 'NoneType' object has no attribute 'is_interactive'

Not sure why the error has to do with 'pdf'?
Thanks for taking a look

Applying whitelist and blacklist filters

Hi, I have a question regarding the filter section of my config and my source code. In my configuration https://github.com/BrianKolowitz/deid/blob/development/my_examples/deid/deid.dicom I specify a whitelist

%filter whitelist

LABEL Xray
  contains Modality CR|DX

In my code https://github.com/BrianKolowitz/deid/blob/development/my_examples/dicom/my_deid.py I specify the configuration:

cleaned_files = replace_identifiers(dicom_files=dicom_files,
                                    ids=updated_ids,
                                    deid=deid,
                                    config=config_file_path,
                                    remove_private=True,
                                    output_folder=output_path)

but I see images with modalities PR and RG in my output_folder.

Is this a bug, or am I not using the library properly?
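It is our understanding, not confirmed by the issue, that the %filter section is used for screening images rather than for excluding files inside replace_identifiers, so files would need to be screened before being passed in. A sketch of such a pre-filter, assuming "contains" means a regular-expression search on the field value; `whitelist` is a hypothetical helper working on plain header dictionaries:

```python
import re
from typing import Dict, Iterable, List


def whitelist(headers: Iterable[Dict[str, str]], field: str, pattern: str) -> List[Dict[str, str]]:
    """Keep only headers whose `field` matches `pattern` (regex search)."""
    regex = re.compile(pattern)
    return [h for h in headers if regex.search(h.get(field) or "")]
```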

clean method appears to be off

In clean.py:

Line 122:
self.cleaned[minr:maxr, minc:maxc] = 0 # should fill with black

For coordinates [0,0,800,59] it blacks out the vertical left section instead of horizontal upper section of the image.
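The symptom is consistent with the four values being unpacked as `[minr, minc, maxr, maxc]` while they were meant as `[xmin, ymin, xmax, ymax]`, where x indexes columns. Assuming that x-first order, a sketch of the corrected slice; `blackout` is a hypothetical helper:

```python
import numpy as np


def blackout(image: np.ndarray, coordinate) -> np.ndarray:
    """Zero a rectangle given as [xmin, ymin, xmax, ymax] on a copy.

    x indexes columns and y indexes rows, so the slice is
    [ymin:ymax, xmin:xmax].
    """
    minc, minr, maxc, maxr = coordinate
    cleaned = image.copy()
    cleaned[minr:maxr, minc:maxc] = 0  # fills with black
    return cleaned
```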

pip3 download question

For pretty much all Python packages, pip3 download results in a whl file being downloaded; for deid (e.g. pip3 download deid==0.1.18), however, the gz source is downloaded and I need to generate the whl file myself by running setup.py with the bdist_wheel option. Not a big deal, but is there a reason why deid behaves differently?
