
dicom's Introduction

dicom

High Performance Golang DICOM Medical Image Parser

👀 v1.0 just released!

This is a library and command-line tool to read, write, and generally work with DICOM medical image files in native Go. The goal is to build a full-featured, high-performance, and readable DICOM parser for the Go community.

After a fair bit of work, I've just released v1.0 of this library, which is essentially rewritten from the ground up to be more canonical Go and better tested, with new features, many bug fixes, and more (though there is always more to come on the roadmap).

Some notable features:

  • Parse multi-frame DICOM imagery (both encapsulated and native pixel data)
  • Channel-based streaming of Frames to a client as they are parsed out of the DICOM
  • Cleaner Go Element and Dataset representations (in the absence of Go generics)
  • Better support for icon image sets in addition to primary image sets
  • Write and encode Datasets back to DICOM files
  • Enhanced testing and benchmarking support
  • Modern, canonical Go.

Usage

To use this in your golang project, import github.com/suyashkumar/dicom. This repository supports Go modules, and regularly tags releases using semantic versioning. Typical usage is straightforward:

dataset, err := dicom.ParseFile("testdata/1.dcm", nil) // See also: dicom.Parse which has a generic io.Reader API.
if err != nil {
	log.Fatal(err)
}

// Dataset will nicely print the DICOM dataset data out of the box.
fmt.Println(dataset)

// Dataset is also JSON serializable out of the box.
j, err := json.Marshal(dataset)
if err != nil {
	log.Fatal(err)
}
fmt.Println(string(j)) // json.Marshal returns []byte; convert it to print readable JSON.

More details about the package (and additional examples and APIs) can be found in the godoc.

CLI Tool

A CLI tool that uses this package to parse imagery and metadata out of DICOMs is provided in the cmd/dicomutil package. This tool can take in a DICOM and dump all of its elements to STDOUT, in addition to writing out any imagery to the current working directory as either PNGs or JPEGs (note that it does not perform any automatic color rescaling by default).

Installation

You can download the prebuilt binaries from the releases tab, or use the following to download the binary at the command line using my getbin tool:

wget -qO- "https://getbin.io/suyashkumar/dicom" | tar xvz

(This attempts to infer your OS and 301-redirects wget to the latest GitHub release asset for your system. Downloads come from GitHub releases.)

Usage

dicomutil -path myfile.dcm

Note: for some DICOMs (with native pixel data) no automatic intensity scaling is applied yet (this is coming). You can apply this in your image viewer if needed (in Preview on macOS, go to Tools->Adjust Color).

Build manually

To build manually, ensure you have make and go installed. Clone (or go get) this repo into your $GOPATH and then simply run:

make

This builds the dicomutil binary and places it in a build/ folder in your current working directory.

You can also build it using Go directly:

go build -o dicomutil ./cmd/dicomutil

History

Here's a little more history on this repository for those who are interested!

v0

The v0 suyashkumar/dicom started off as a hard fork of go-dicom which was not being maintained actively anymore (with the original author being supportive of my fork--thank you!). I worked on adding several new capabilities, bug fixes, and general maintainability refactors (like multiframe support, streaming parsing, updated APIs, low-level parsing bug fixes, and more).

That represents the v0 history of the repository.

v1

For v1 I rewrote and redesigned the core library essentially from scratch, and added several new features and bug fixes that only live in v1. The architecture and APIs are completely different, as is some of the underlying parser logic (to be more efficient and correct). Most of the core rewrite work happened on the s/1.0-rewrite branch.

Acknowledgements

dicom's People

Contributors

amitbet, bench, bpeake-illuscio, courteouselk, dariocasas, dependabot[bot], dimitripapadopoulos, ducquangkstn, faustoespinal-philips, favadi, gillesdemey, hkethi002, ichocked, jabillings, jasperdenotter, jesslatimer, jstutters, kaxap, kristianvalind, marineotter, selimyoussry, suyashkumar, wkoszek


dicom's Issues

API Refactors

The current top-level dicom API is pretty crowded with many public functions, interfaces, and struct definitions (and has historically grown more crowded across forks). It is reasonable to group some of these into subpackages.

Output DICOM Metadata to Proto [convert pkg]

Proto might look something like

// Just a general idea: 

message DataSet {
   repeated Element elements = 1;
}

message Element {
   VR vr = 1;
   bytes tag = 2;
   oneof value {
       Sequence sequence = 3;
       int64 integer = 4;  // or maybe consider breaking out all the various possible int types
       repeated int64 integers = 5;
       float flt = 6;
       repeated float flts = 7;
       bytes bytes_value = 8;
       string string_value = 9;
       repeated string strings_list = 10;
   }
}

message Sequence ...
// more to follow, general idea here. 

Error: Encountered odd length

Another issue when scanning many DICOMs:

Encountered odd length (vl=13945) when reading explicit VR SQ for tag (6003,1010)[private] (file offset 1752)
Encountered odd length (vl=27975) when reading explicit VR SQ for tag (6003,1010)[private] (file offset 1752)
Encountered odd length (vl=286383071) when reading implicit VR 'UN' for tag (1111,11e3)[private] (file offset 5382)
Encountered odd length (vl=419395) when reading explicit VR OW for tag (6000,3000)[??] (file offset 13496)

dicomutil should be able to operate over directory trees of DICOMs

The CLI should be modified so that it can operate over a set of DICOMs that reside in a provided directory tree.

One could imagine using dicomutil to iterate over a set of DICOMs to process them in some way--say, converting them into tf.Examples or dumping their metadata to CSV, or writing out DICOM frames to disk as jpgs.

A deterministic output structure would be required--a possibility is dumping the outputs in a separate output folder that would mimic the same directory structure as the input folder.
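As a rough illustration of the mirrored output layout described above, here is a minimal sketch that maps an input file path to the same relative location under an output root. The helper name mirrorPath and its parameters are hypothetical and are not part of dicomutil; the sketch also assumes the output file simply swaps the .dcm extension for another one.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// mirrorPath maps a file under inRoot to the same relative location under
// outRoot, replacing its extension with newExt (for example ".json").
// Hypothetical helper for illustration; dicomutil does not provide it.
func mirrorPath(inRoot, outRoot, inPath, newExt string) (string, error) {
	rel, err := filepath.Rel(inRoot, inPath)
	if err != nil {
		return "", err
	}
	// Reject paths that escape the input root, so outputs stay under outRoot.
	if rel == ".." || strings.HasPrefix(rel, ".."+string(filepath.Separator)) {
		return "", fmt.Errorf("%s is not inside %s", inPath, inRoot)
	}
	ext := filepath.Ext(rel)
	return filepath.Join(outRoot, rel[:len(rel)-len(ext)]+newExt), nil
}

func main() {
	in := filepath.Join("in", "study1", "a.dcm")
	out, err := mirrorPath("in", "out", in, ".json")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(out)
}
```

A helper like this could be called from a filepath.WalkDir callback, so each input file always maps to the same output path regardless of traversal order.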

MetaElementGroupLength not found

The DICOMs provided by https://www.kaggle.com/c/rsna-intracranial-hemorrhage-detection are not standardized to have MetaElementGroupLength come first. The pydicom library, which the dataset was probably intended for use with, accounts for this by reading until the first non-0x0002 tag and then rewinding.

Example error output with the RSNA dataset (error is returned from dicom.NewParserFromBytes()):

2020/01/01 21:39:41 error processing stage_2_train/ID_014d9a502.dcm: failed to create dicom parser: MetaElementGroupLength not found; insteadfound (0002,0002) (file offset 166)
2020/01/01 21:39:41 error processing stage_2_train/ID_014ddb831.dcm: failed to create dicom parser: MetaElementGroupLength not found; insteadfound (0002,0002) (file offset 166)
2020/01/01 21:39:41 error processing stage_2_train/ID_014dfa44a.dcm: failed to create dicom parser: MetaElementGroupLength not found; insteadfound (0002,0002) (file offset 166)
2020/01/01 21:39:41 error processing stage_2_train/ID_014e0a593.dcm: failed to create dicom parser: MetaElementGroupLength not found; insteadfound (0002,0002) (file offset 166)
2020/01/01 21:39:41 error processing stage_2_train/ID_014e24cfc.dcm: failed to create dicom parser: MetaElementGroupLength not found; insteadfound (0002,0002) (file offset 166)
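One possible way to tolerate a missing MetaElementGroupLength is sketched below: keep reading elements while the next tag is in group 0x0002 and stop at the first tag that is not. This is not the library's parser. It assumes the reader is already positioned just after the "DICM" prefix, that the meta group is Explicit VR Little Endian, and it uses bufio's Peek in place of an explicit rewind. The type and function names are hypothetical.

```go
package main

import (
	"bufio"
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
	"io"
)

// metaElement is a hypothetical, minimal holder for one File Meta element.
type metaElement struct {
	Group, Elem uint16
	VR          string
	Value       []byte
}

// shortVR lists the explicit VRs that use a 2-byte length field; all other
// VRs are assumed to use 2 reserved bytes followed by a 4-byte length.
var shortVR = map[string]bool{
	"AE": true, "AS": true, "AT": true, "CS": true, "DA": true, "DS": true,
	"DT": true, "FL": true, "FD": true, "IS": true, "LO": true, "LT": true,
	"PN": true, "SH": true, "SL": true, "SS": true, "ST": true, "TM": true,
	"UI": true, "UL": true, "US": true,
}

// readMetaGroup consumes elements while the next tag belongs to group 0x0002.
// Peek inspects the group number without consuming it, so the first
// non-0x0002 tag is left unread for the caller.
func readMetaGroup(r *bufio.Reader) ([]metaElement, error) {
	var out []metaElement
	for {
		head, err := r.Peek(2)
		if err == io.EOF {
			return out, nil // input ended right after the meta group
		}
		if err != nil {
			return nil, err
		}
		if binary.LittleEndian.Uint16(head) != 0x0002 {
			return out, nil
		}
		hdr := make([]byte, 8)
		if _, err := io.ReadFull(r, hdr); err != nil {
			return nil, err
		}
		e := metaElement{
			Group: binary.LittleEndian.Uint16(hdr[0:2]),
			Elem:  binary.LittleEndian.Uint16(hdr[2:4]),
			VR:    string(hdr[4:6]),
		}
		length := uint32(binary.LittleEndian.Uint16(hdr[6:8]))
		if !shortVR[e.VR] {
			// hdr[6:8] were reserved bytes; the real length follows.
			var l [4]byte
			if _, err := io.ReadFull(r, l[:]); err != nil {
				return nil, err
			}
			length = binary.LittleEndian.Uint32(l[:])
		}
		if length == 0xFFFFFFFF {
			return nil, errors.New("undefined length is not handled in this sketch")
		}
		e.Value = make([]byte, length)
		if _, err := io.ReadFull(r, e.Value); err != nil {
			return nil, err
		}
		out = append(out, e)
	}
}

// parseMetaBytes is a convenience wrapper for in-memory data.
func parseMetaBytes(b []byte) ([]metaElement, error) {
	return readMetaGroup(bufio.NewReader(bytes.NewReader(b)))
}

func main() {
	data := []byte{
		0x02, 0x00, 0x02, 0x00, 'U', 'I', 0x04, 0x00, '1', '.', '2', 0x00, // (0002,0002)
		0x08, 0x00, 0x16, 0x00, 'U', 'I', 0x02, 0x00, '1', 0x00, // (0008,0016), left unread
	}
	els, err := parseMetaBytes(data)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, e := range els {
		fmt.Printf("(%04x,%04x) %s len=%d\n", e.Group, e.Elem, e.VR, len(e.Value))
	}
}
```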

ERROR ReadBytes

ERROR ReadBytes: requested 2, available 0 (file offset 65274)

I haven't had a chance to look into it yet, but I've attached 3 example files and will try to get more info sometime soon.

examples.zip

Data Citation:
Kinahan, Paul; Muzi, Mark; Bialecki, Brian; Coombs, Laura. (2017). Data from ACRIN-FLT-Breast. The Cancer Imaging Archive. http://doi.org/10.7937/K9/TCIA.2017.ol20zmxg

Unexpected EOF

Don't have more details right now (just repasting some error messages here from our last run over many Ms of DICOMs), but essentially some of our DICOMs can't be decoded b/c of the "EOF" message. Will dig deeper into the source code later.

Cleanup %decoder

At some point there seems to have been an inadvertent find and replace that snuck in to change %d to %decoder. Example:

dicom/parse.go

Line 103 in 668a92a

panic(fmt.Sprintf("ReadElement failed to consume data: %decoder %decoder: %v", startLen, p.decoder.Len(), p.decoder.Error()))
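To show why this matters, the sketch below compares the two format strings. Go's fmt treats %decoder as the %d verb followed by the literal text "ecoder", so the message still compiles and runs but reads oddly. The function names are hypothetical and exist only for the comparison.

```go
package main

import (
	"errors"
	"fmt"
)

// formatConsumed builds the message with the intended plain %d verbs.
func formatConsumed(startLen, endLen int, err error) string {
	return fmt.Sprintf("ReadElement failed to consume data: %d %d: %v", startLen, endLen, err)
}

// brokenFormat reproduces the find-and-replace damage: each %decoder is
// parsed as %d followed by the literal text "ecoder".
func brokenFormat(startLen, endLen int, err error) string {
	return fmt.Sprintf("ReadElement failed to consume data: %decoder %decoder: %v", startLen, endLen, err)
}

func main() {
	err := errors.New("example error")
	fmt.Println(formatConsumed(10, 10, err))
	fmt.Println(brokenFormat(10, 10, err))
}
```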

Allow disabling of VR validation during encoding

When encoding an element, a VR not matching what is specified in the standard triggers an error.

For tags where the standard is ambiguous or allows more than one VR type, this is problematic. For example, (0028,0120) Pixel Padding Value somewhat frequently appears as SS, rather than US, which will trigger an error.

One way to solve this would be to allow encoding of elements without VR validation, which also might be useful for situations where writing of non compliant DICOM files might be necessary.
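A rough sketch of how such an opt-out could look, using Go's functional options pattern. Everything here is hypothetical: the element type, the tiny expectedVR table, and the skipVRValidation option are stand-ins for illustration and are not the library's API.

```go
package main

import "fmt"

// element is a hypothetical, minimal element: a tag and the VR it carries.
type element struct {
	Tag string
	VR  string
}

// expectedVR is a stand-in for the standard's tag dictionary.
var expectedVR = map[string]string{"(0028,0120)": "US"}

// encodeConfig and encodeOption form a small functional-options setup.
type encodeConfig struct{ skipVRCheck bool }
type encodeOption func(*encodeConfig)

// skipVRValidation turns off the VR check for callers that need to write
// elements whose VR differs from the dictionary entry.
func skipVRValidation() encodeOption {
	return func(c *encodeConfig) { c.skipVRCheck = true }
}

// checkElement returns an error when the element's VR differs from the
// dictionary, unless validation has been disabled through an option.
func checkElement(e element, opts ...encodeOption) error {
	var cfg encodeConfig
	for _, opt := range opts {
		opt(&cfg)
	}
	if cfg.skipVRCheck {
		return nil
	}
	if want, ok := expectedVR[e.Tag]; ok && want != e.VR {
		return fmt.Errorf("tag %s: VR %s does not match expected %s", e.Tag, e.VR, want)
	}
	return nil
}

func main() {
	padding := element{Tag: "(0028,0120)", VR: "SS"}
	fmt.Println(checkElement(padding))                     // validation on: error
	fmt.Println(checkElement(padding, skipVRValidation())) // validation off: <nil>
}
```

Keeping validation on by default and making the opt-out explicit means compliant output stays the normal path.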

Wrong 1D to 2D array conversion?

i.SetGray16(j%n.Cols, j/n.Rows, color.Gray16{Y: uint16(n.Data[j][0])}) // for now, assume we're not overflowing uint16, assume gray image

I think it should be
i.SetGray16(j%n.Cols, j/n.Cols, color.Gray16{Y: uint16(n.Data[j][0])})

Is it right?
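For reference, here is a small self-contained sketch of the index arithmetic, assuming the samples are stored in row-major order (each row of Cols samples follows the previous one). Under that assumption the column is j%cols and the row is j/cols. The function name toGray16 and the flat []uint16 input are simplifications for illustration, not the library's types.

```go
package main

import (
	"fmt"
	"image"
	"image/color"
)

// toGray16 copies row-major samples into a Gray16 image.
// For flat index j, the column is j%cols and the row is j/cols.
func toGray16(data []uint16, rows, cols int) (*image.Gray16, error) {
	if rows <= 0 || cols <= 0 || rows*cols != len(data) {
		return nil, fmt.Errorf("got %d samples, want %d x %d", len(data), rows, cols)
	}
	img := image.NewGray16(image.Rect(0, 0, cols, rows))
	for j, v := range data {
		x, y := j%cols, j/cols
		img.SetGray16(x, y, color.Gray16{Y: v})
	}
	return img, nil
}

func main() {
	// A 2-row by 3-column image: dividing by rows instead of cols would put
	// sample 4 at y=2, which is outside the image.
	img, err := toGray16([]uint16{0, 1, 2, 3, 4, 5}, 2, 3)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(img.Gray16At(1, 1).Y)
}
```

The two expressions only agree for square images, which may be why the difference is easy to miss in testing.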

API to work with Sequence Elements

Provide an optional API that callers can use to work with nested Sequence items more easily. This will probably involve more allocations and copying than just iterating over the Value in the SQ element and using type assertion, but the caller has the option to do either (use the nicer API or go with the optimization).

Maybe I can also provide a callback-based helper iterator to help callers iterate more efficiently using recursion and type assertion (need to look more into how efficient type assertion from interface{} is, but my understanding is that it's pretty efficient in recent versions of Go).

Something like

func IterateOverSequence(root *element.Element, cb func(*element.Element)) {
    ...
}
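One way the body of such a helper might look is sketched here. The Element type below is a hypothetical stand-in for element.Element, with the assumption that a sequence stores its children as a slice of element pointers in Value; the real representation may differ.

```go
package main

import "fmt"

// Element is a hypothetical stand-in for element.Element: Value holds either
// a scalar payload or, for sequences, a slice of child elements.
type Element struct {
	Tag   string
	Value interface{}
}

// IterateOverSequence calls cb on root and then, depth first, on every
// element nested below it. A type assertion decides whether to recurse.
func IterateOverSequence(root *Element, cb func(*Element)) {
	if root == nil {
		return
	}
	cb(root)
	children, ok := root.Value.([]*Element)
	if !ok {
		return // not a sequence, nothing to descend into
	}
	for _, child := range children {
		IterateOverSequence(child, cb)
	}
}

func main() {
	root := &Element{Tag: "SQ", Value: []*Element{
		{Tag: "A", Value: "x"},
		{Tag: "B", Value: []*Element{{Tag: "C", Value: 1}}},
	}}
	IterateOverSequence(root, func(e *Element) { fmt.Println(e.Tag) })
}
```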

Refactor existing parse logic

Much of the high level parse logic comes from the upstream fork, but there are some ways this should be simplified (recursion instead of looping for sequence elements, more traditional golang error handling, a nicer interface, etc).

Fetch and parse DICOMs at URLs

API like:
dicomutil --extract-images https://suyashkumar.com/test.dcm

The user must trust the entity serving the URL (and cert).

ERROR: too many open files

This error appears after too many DICOM files (in my case 1024) have been opened and failed to parse.

The issue seems to result from the function NewParserFromFile() in Parse.go. Each time the function is called, it opens a new file, but that file is not closed inside the function or when the function returns. Instead, it is closed indirectly later by the code that runs after p.(*parser).file = file. As a result, files that hit an error are never closed because the function returns before that line, and once the maximum number of errored files are open, the program emits this error for every subsequent file.

I'm in the process of submitting a fix. I'll add a pull request soon.
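The general shape of a fix for this kind of leak is sketched below: close the source on every error path before returning, and only hand ownership to the parser on success. This is not the actual patch. The parser type, checkHeader, and closeRecorder are hypothetical stand-ins, and checkHeader is a trivial placeholder for real header parsing.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"os"
	"strings"
)

// parser is a hypothetical parser that owns its input once constructed.
type parser struct{ src io.ReadCloser }

// Close releases the underlying input.
func (p *parser) Close() error { return p.src.Close() }

// checkHeader is a placeholder for real header parsing: it only requires
// that at least one byte can be read.
func checkHeader(r io.Reader) error {
	var b [1]byte
	if _, err := io.ReadFull(r, b[:]); err != nil {
		return errors.New("input is empty or unreadable")
	}
	return nil
}

// newParser takes ownership of src. On failure it closes src before
// returning, so a failed parse never leaks a file descriptor.
func newParser(src io.ReadCloser) (*parser, error) {
	if err := checkHeader(src); err != nil {
		src.Close()
		return nil, err
	}
	return &parser{src: src}, nil
}

// newParserFromFile opens path and delegates cleanup to newParser.
func newParserFromFile(path string) (*parser, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	return newParser(f)
}

// closeRecorder is an in-memory input that records whether Close was called.
type closeRecorder struct {
	io.Reader
	closed bool
}

func (c *closeRecorder) Close() error { c.closed = true; return nil }

func newCloseRecorder(content string) *closeRecorder {
	return &closeRecorder{Reader: strings.NewReader(content)}
}

func main() {
	bad := newCloseRecorder("")
	_, err := newParser(bad)
	fmt.Println(err, "closed:", bad.closed)
}
```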

Consider creating a unified Frame interface (for Encapsulated and Native frames)

It would be nice to deal with a single unified Frame interface that wraps Encapsulated and Native frame data. The main thing it would do is abstract away a GetImage method that just returns a standard image.Image no matter what the underlying frame data is.

Something like

type Frame interface {
	// GetDefaultImage returns this frame as a standard golang image.Image
	GetDefaultImage() image.Image
	GetImage(opts frame.ImageOptions) image.Image
	// Below methods needed to access raw frame data?
	GetNativeFrame() (NativeFrame, error)
	GetEncapsulatedFrame() (EncapsulatedFrame, error)
	IsEncapsulated() bool
}

However there are many considerations:

  • What about window width and center? Should parameters for those be passed into GetImage (more relevant for Native Frames)? Should we just have a GetDefaultImage? Users can also manipulate the image directly, but assumptions will need to be made about bit depth and such at that point.
  • Users will want easy access to the underlying NativeFrame and EncapsulatedFrame objects if they want to do their own post-processing or image rendering

Unknown character set 'ISO_IR 6'. Assuming utf-8 (file offset 360)

I've hit this issue with many DICOM files coming from 2 different providers/countries.

PyDICOM works fine in this case.

Looks like pydicom considers it as a normal encoding. I think we should too.

python_encoding = {

    # default character set for DICOM
    '': default_encoding,

    # alias for latin_1 too (iso_ir_6 exists as an alias to 'ascii')
    'ISO_IR 6': default_encoding,
    'ISO_IR 13': 'shift_jis',
    'ISO_IR 100': 'latin_1',

(edit): Same issue with: Unknown character set 'ISO_IR 192'. Assuming utf-8 (file offset 348)
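A minimal sketch of what a lookup table on the Go side could look like. In DICOM, ISO_IR 6 names the default character repertoire (ASCII-compatible) and ISO_IR 192 names UTF-8, so both can be treated as known rather than unknown. The table, the encoding labels, and lookupCharset are hypothetical; the ISO_IR 13 and ISO_IR 100 entries simply follow the pydicom excerpt above.

```go
package main

import (
	"fmt"
	"strings"
)

// charsetNames maps DICOM Specific Character Set values to encoding labels.
// Hypothetical table for illustration; a real one would cover more entries.
var charsetNames = map[string]string{
	"":           "ascii", // no value: default repertoire
	"ISO_IR 6":   "ascii", // default repertoire, stated explicitly
	"ISO_IR 13":  "shift_jis",
	"ISO_IR 100": "latin_1",
	"ISO_IR 192": "utf-8",
}

// lookupCharset returns the encoding label for a Specific Character Set
// value, ignoring surrounding padding. ok is false for unknown values.
func lookupCharset(name string) (label string, ok bool) {
	label, ok = charsetNames[strings.TrimSpace(name)]
	return label, ok
}

func main() {
	for _, name := range []string{"ISO_IR 6", "ISO_IR 192", "ISO_IR 999"} {
		label, ok := lookupCharset(name)
		fmt.Printf("%q -> %q (known: %v)\n", name, label, ok)
	}
}
```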

Revisit package stutter (element.Element)

In #21, there were some major refactors, moving certain entities into standalone packages, which did introduce some stutter. While there are examples of this in the std lib (context.Context), if there's some way to group or name packages to allow this to read better, that would be preferable.

netdicom

Did you get rid of the implementations in github.com/grailbio/go-netdicom?
I would like to switch from grailbio, but don't want to use both.
Maybe I can implement a separate dcm server depending on your implementation.

Keyword 'DICM' not found in the header (file offset 132)

Problem

I have an XRay DICOM failing to initialize the parser b/c of:

Keyword 'DICM' not found in the header (file offset 132)

I can dump it fine with dcmtk [dcmdump]. Good news is that the competition fails as well:

Traceback (most recent call last):
  File "../bench/dicom_bench.py", line 29, in <module>
    ds = pydicom.dcmread(fn)
  File "/home/ubuntu/.local/lib/python3.6/site-packages/pydicom/filereader.py", line 870, in dcmread
    force=force, specific_tags=specific_tags)
  File "/home/ubuntu/.local/lib/python3.6/site-packages/pydicom/filereader.py", line 667, in read_partial
    preamble = read_preamble(fileobj, force)
  File "/home/ubuntu/.local/lib/python3.6/site-packages/pydicom/filereader.py", line 620, in read_preamble
    raise InvalidDicomError("File is missing DICOM File Meta Information "
pydicom.errors.InvalidDicomError: File is missing DICOM File Meta Information header or the 'DICM' prefix is missing from the header. Use force=True to force reading.
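For context, the check that fails here looks for a 128-byte preamble followed by the four bytes "DICM", which is why the error reports file offset 132. A minimal sketch of that detection follows, so a caller could decide whether to attempt some fallback for files without the prefix. The function names are hypothetical and this is not the library's implementation.

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// hasDICMPrefix reports whether header starts with a 128-byte preamble
// followed by the magic bytes "DICM".
func hasDICMPrefix(header []byte) bool {
	return len(header) >= 132 && string(header[128:132]) == "DICM"
}

// checkReader reads up to 132 bytes from r and applies hasDICMPrefix.
// Short inputs are reported as "no prefix" rather than as an error.
func checkReader(r io.Reader) (bool, error) {
	buf := make([]byte, 132)
	n, err := io.ReadFull(r, buf)
	if err != nil && err != io.EOF && err != io.ErrUnexpectedEOF {
		return false, err
	}
	return hasDICMPrefix(buf[:n]), nil
}

func main() {
	withPrefix := strings.NewReader(strings.Repeat("\x00", 128) + "DICM")
	ok, err := checkReader(withPrefix)
	fmt.Println(ok, err)

	ok, err = checkReader(strings.NewReader("not a DICOM file"))
	fmt.Println(ok, err)
}
```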

Error: Expect Item in pixeldata but found tag

Hit another issue when scanning DICOMs:

Expect Item in pixeldata but found tag (00fd,f9ff)[private] (file offset 5384)
Expect Item in pixeldata but found tag (1a1a,1a1a)[??] (file offset 5390)

Convert DICOMs to tf.Example ?

A potentially useful feature for the included CLI (and library) could be converting DICOMs (frames + metadata) into neat tf.Example protocol buffers for downstream use in TensorFlow.
