
Comments (31)

hdclark commented on July 23, 2024

I admire your enthusiasm and energy!!

Most of the functionality in DICOMautomaton revolves around the Drover class, which is essentially a shared wrapper around contours, images, surface meshes, and a few other things. All operations in DICOMautomaton expect a valid Drover object, and all files are loaded into a Drover. It will be a bit more complicated to get started with, but the Drover class is the public interface to loaded data / the current state -- internal functions like get_Contour_Data are not very stable and won't necessarily guarantee the data is valid/intact. Sticking with Drover rather than contours directly will hopefully also simplify the linking process by more closely following the existing translation unit boundaries.

The public counterpart to get_Contour_Data is Load_Files from src/File_Loader.cc, which essentially takes the same input (filenames and some other less relevant things) and places the contours in a Drover object:

bool
Load_Files( Drover &DICOM_data,
            std::map<std::string,std::string> &InvocationMetadata,
            const std::string &FilenameLex,
            std::list<OperationArgPkg> &Operations,
            std::list<std::filesystem::path> &Paths ){

The most important parts of the Drover class are defined here:

class Drover {
  public:
    std::shared_ptr<Contour_Data> contour_data; //Should I allow for multiple contour data? I think it would be logical...
    std::list<std::shared_ptr<Image_Array>> image_data; //In case we ever get more than one set of images (different modalities?)
    std::list<std::shared_ptr<Point_Cloud>> point_data;
    std::list<std::shared_ptr<Surface_Mesh>> smesh_data;
    std::list<std::shared_ptr<RTPlan>> rtplan_data;
    std::list<std::shared_ptr<Line_Sample>> lsamp_data;
    std::list<std::shared_ptr<Transform3>> trans_data;
    std::list<std::shared_ptr<Sparse_Table>> table_data;

    //Constructors.
    Drover();
    Drover(const Drover &in);

Contour_Data is defined here, but basically just wraps contour_collection:

class Contour_Data {
  public:
    std::list<contour_collection<double>> ccs; //Contour collections.

    //Constructors.
    Contour_Data();
    Contour_Data(const Contour_Data &in);

The contour_collection and the contours it contains are defined in Ygor:

https://github.com/hdclark/Ygor/blob/b9c557fc151e7a10970055dc7d581e10a44d0f34/src/YgorMath.h#L441-L448
and
https://github.com/hdclark/Ygor/blob/b9c557fc151e7a10970055dc7d581e10a44d0f34/src/YgorMath.h#L329-L337

Basically -- yep, it's turtles all the way down. :)

I guess the key point is that I can wrangle the Drover object to extract contours rather than ask you to dive into all of this.

If you just want to use DICOMautomaton parsing code to get contour data, it will probably be best to create a new, dedicated 'gateway' C++ function to do the conversion to and from Python. It would accept a filename for a DICOM file, internally parse the inputs into a Drover object, extract the contours, and then translate the data into the format you wrote:

Point = Tuple[float, float, float]  # (*)
Contour = List[Point]
Contours = List[Contour]
ContoursBySlice = List[Contours]
ContoursByROI = Dict[str, ContoursBySlice]
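
For illustration, here is a minimal sketch of what such a gateway could look like. It is hypothetical (Extract_Contours is an invented name), assumes the Drover and Contour_Data members shown above plus the Ygor members contours, points, and metadata with a "ROIName" metadata key, and it omits the per-slice grouping for brevity:

// Hypothetical gateway sketch -- not existing DICOMautomaton code.
#include <array>
#include <filesystem>
#include <list>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>
// ...plus the relevant DICOMautomaton/Ygor headers for Drover and Load_Files.

using Point   = std::array<double, 3>;
using Contour = std::vector<Point>;
using ContoursByROI = std::map<std::string, std::vector<Contour>>;

ContoursByROI Extract_Contours(const std::filesystem::path &p){
    // Load the file using the public interface shown above.
    Drover DICOM_data;
    std::map<std::string, std::string> InvocationMetadata;
    const std::string FilenameLex;
    std::list<OperationArgPkg> Operations;
    std::list<std::filesystem::path> Paths{ p };
    if(!Load_Files(DICOM_data, InvocationMetadata, FilenameLex, Operations, Paths)){
        throw std::runtime_error("Unable to load file");
    }

    ContoursByROI out;
    if(DICOM_data.contour_data == nullptr) return out; // No contours present.

    // Walk the turtles: Drover -> Contour_Data -> contour_collection -> contour -> vertex.
    for(const auto &cc : DICOM_data.contour_data->ccs){
        for(const auto &cop : cc.contours){
            const auto roi = (cop.metadata.count("ROIName") != 0)
                           ? cop.metadata.at("ROIName") : std::string("unknown");
            Contour c;
            for(const auto &v : cop.points) c.push_back({ v.x, v.y, v.z });
            out[roi].push_back(std::move(c));
        }
    }
    return out;
}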

If I understand correctly, this will allow using DICOMautomaton to extract contours for Python, but not do any processing with the data (because we don't have a function to convert the data back into Drover form). For the use-case of using Python to access DICOMautomaton functionality, we'll likely have to

  1. write some bidirectional Drover-to-Python code (in C++), so the Python code can make use of DICOMautomaton file loading and post-processing, but also still get idiomatic Python access to the data for further processing, and
  2. provide a way for Python to carry a Drover object around and pass it to and from DICOMautomaton operations -- maybe as some sort of opaque token or class containing a private Drover member that implements load()/store() to do the Python<->C++ conversion? (See the sketch after this list.)
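
On the C++ side, the opaque token from point 2 might boil down to something like this (all names hypothetical, reusing the ContoursByROI alias from the sketch above; a sketch under those assumptions, not a design):

// Hypothetical sketch of the opaque-token idea. Python would hold a
// DroverHandle without ever inspecting its contents; load()/store()
// perform the Python<->C++ conversion at the boundary.
class DroverHandle {
    Drover state; // Private: only DICOMautomaton operations touch this.

  public:
    // Convert the Python-facing representation (e.g., ContoursByROI) in.
    void load(const ContoursByROI &in);

    // Convert the current state back out for idiomatic Python access.
    ContoursByROI store() const;

    // Passed by reference into DICOMautomaton operation wrappers.
    Drover& get(){ return state; }
};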

I don't know the idiomatic Python way, so might be way off base here. If we're on the same page this will basically amount to writing {,de-}serialization and operation wrapper code in C++, and a class and some RPC-esque operation wrapper code in Python. Seems reasonable to me. Does this work for you?

hdclark commented on July 23, 2024

One other thing (*): I currently keep contour vertices as 3-vectors because it's not uncommon to get oblique slices (e.g., MR). We can extract two planar coordinates, but it's not always clear which way is 'up' (i.e., positive orientation) or even left-right/up-down without more info. I'd prefer to minimize the differences between the C++ and Python representations for debug-ability, but if this is a pain point on the Python end we can figure something out.
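
To make the oblique-slice point concrete, here is a small sketch of why planar coordinates are underdetermined. vec3 and Dot() come from Ygor's YgorMath.h; the slice basis vectors here are hypothetical inputs, not something the file itself provides:

#include <utility>
// ...plus the Ygor header providing vec3.

// Extracting 2D coordinates from a 3D contour vertex requires choosing an
// in-plane basis, and for oblique slices that choice is a convention.
std::pair<double,double> planar_coords(const vec3<double> &r,      // a contour vertex
                                       const vec3<double> &origin, // slice origin
                                       const vec3<double> &u,      // in-plane unit vector 1
                                       const vec3<double> &v){     // in-plane unit vector 2
    return { (r - origin).Dot(u),    // planar coordinate 1
             (r - origin).Dot(v) };  // planar coordinate 2
    // Negating u or v (or swapping them) yields equally valid coordinates:
    // 'up' and left-right are conventions the data alone doesn't fix, which
    // is why keeping the full 3-vector sidesteps the ambiguity.
}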

hdclark commented on July 23, 2024

Quick update: I'm still working on this. I keep flipping back and forth between pybind11 and RPC. I'm still leaning toward RPC for the reasons explained above, but there are lots of uncomfortable trade-offs because the popular RPC systems seem to require either intrusive data structures or writing explicit marshaling code.

SimonBiggs commented on July 23, 2024

Having a look at the following from the Apache Thrift page:

https://github.com/apache/thrift/blob/master/LANGUAGES.md

It is quite disconcerting that the maximum tested Python version is a version of Python that has been past end-of-life for the last 12 months:


https://endoflife.date/python


As a rule of thumb, when pulling in dependencies, I try to heavily weight the community size, as it tends to correlate with the likelihood that someone has hit my issue before and posted a solution online.

It does look like, for better or worse, gRPC has a much larger community than Thrift:

https://star-history.com/#apache/thrift&grpc/grpc&Date

hdclark commented on July 23, 2024

I fully agree that gRPC seems to be much more active, and also seems to have captured the lion's share of RPC users right now, at least for non-browser users. For transparency, here are the key points that led me toward Thrift:

  1. I wasn't able to compile the gRPC C++ example on any of the systems I regularly use to develop DICOMautomaton. Using the latest gRPC in a clean Docker image didn't work either. Support beyond Ubuntu was minimal, maybe even best-effort. The Thrift examples and tests compiled and ran without issue. This immediately made me nervous about relying on gRPC.

  2. gRPC pulls in a few dependencies that would be challenging to integrate into my current build process. Upstream uses a somewhat exotic build system (Bazel). It nominally supports CMake (the build tool DICOMautomaton uses), but possibly relies on pkg-config. I couldn't get the pkg-config examples to work; for example, they silently ignore static builds. I couldn't get a stripped-down hello world to compile using manual pkg-config either. gRPC also brings in abseil and a few other Google tools that are hard to work around, especially when cross-compiling. This basically boils down to trusting that breaking changes will be handled reasonably, and I trust Apache more than Google to serve users outside of Google. (No offense to Google!) Thrift explicitly favours limiting dependencies. I was able to build it from source and use it on x86 and arm so far.

  3. I found the gRPC documentation for C++ challenging. All documentation kept sending me back to the introductory tutorial, which wouldn't compile. The Thrift documentation was actually somehow worse, but there is a textbook available with C++, Java, and Python examples. I also managed to find some nice resources online to help grok what was happening.

  4. gRPC relies on HTTP/2, which limits access to load balancers. Thrift uses plain HTTP, which opens support for load balancers and proxies, and can be used in-browser without needing a translation layer. (HTTP/2 does bring streaming capabilities, but I don't think we can use that feature with the current DICOMautomaton architecture.)

  5. gRPC has a fixed serialization format, whereas Thrift supports multiple serialization formats, including JSON. I don't have immediate plans to use JSON, but using plain HTTP and JSON might help 'bridge the gap' to languages or libraries where Thrift is not explicitly supported.

  6. Something I'm still not 100% sure about for either gRPC or Thrift: I think the gRPC compiler needs to be run for every build, whereas the Thrift compiler only needs to be run when the IDL file is altered. Needing to run another compiler during cross-compilation -- one that runs on architecture X but must emit code for architecture Y -- requires a lot of effort.

  7. The gRPC and Thrift IDL languages are similar, so they can probably be ported back and forth if needed. I prefer the Thrift IDL specification, which supports lists/arrays directly without having to recreate them using structs + repeated fields. But overall the majority of the code I/we need to write is for serialization, which looks pretty similar in both Thrift and gRPC.

Overall I'm not thrilled with Thrift, but I still think it's currently a safer bet than gRPC in terms of anticipated headaches. Both gRPC's and Thrift's GitHub stars are growing. They are similar enough that we can probably port between them (or REST with Swagger, or GraphQL, etc.) if needed.

Of note: Thrift is developed inside the Apache ecosystem, so GitHub stars might inaccurately reflect market share.

SimonBiggs commented on July 23, 2024

Yup, sounds good to me 🙂

SimonBiggs commented on July 23, 2024

What are our next steps?

hdclark commented on July 23, 2024

Are you able to provide a class in Python that holds a private C++ Drover object? Exactly how to wire that up is something I'm unsure about. Or, rather, if I do it myself, it will reek of C++ style and be very un-Pythonic.

After we have that, I can provide supporting methods to load a Drover object from a file, invoke DICOMautomaton operations using the Drover, and convert contour data to/from Python data structures (or provide accessors to avoid copying). How those methods look and how to convert into sensible Python data structures can follow; I just need a class to get started.

SimonBiggs commented on July 23, 2024

Would you be open to utilising Boost.Python for this?

https://www.boost.org/doc/libs/1_80_0/libs/python/doc/html/index.html

Here's an example where someone details the use of Boost.Python to expose a C++ object to Python:

https://stackoverflow.com/a/8226377

If this is an approach you think would work well for Drover, would you be open to doing the C++ side of utilising Boost.Python? Once we have the Drover object exposed to Python via Boost.Python, I can then make a Pythonic wrapper.

hdclark commented on July 23, 2024

Both Boost.Python and pybind11 are highly regarded. I previously looked into using them, and was dissatisfied for some specific reason that I cannot recall. Maybe portability? I don't think it was easy to cross-compile and/or they weren't available for some of the architectures I needed. I will re-evaluate both now -- maybe x86_64 is enough?

SimonBiggs commented on July 23, 2024

Being able to support ARM would be helpful, I think, so losing it would be an unfortunate trade-off. Here is the array of build environments that I imagine we would aim to use:

https://cibuildwheel.readthedocs.io/en/stable/

A completely different approach would be to route all communication between C++ and Python through gRPC on a localhost server?

https://grpc.io/


The approach there would be to give the DICOMautomaton C++ library a gRPC server CLI command which accepts a hostname, port, and token (defaulting to localhost, a random port, and a random cryptographic secret token, respectively). Once the DICOMautomaton gRPC server has started, it prints a JSON representation of the hostname, port, and token to STDOUT.

When starting the server, C++ randomises the port selection and tries to host the gRPC server at that port; if that fails, it retries with a different random port, repeating up to a pre-defined number of retries.
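
A rough sketch of that startup sequence, where the server-hosting and token helpers are hypothetical placeholders rather than real APIs:

// Sketch of the proposed startup handshake (helper functions hypothetical).
#include <iostream>
#include <random>
#include <string>

int main(){
    std::mt19937 gen{ std::random_device{}() };
    std::uniform_int_distribution<int> port_dist(49152, 65535); // Ephemeral port range.

    const std::string host  = "localhost";
    const std::string token = generate_crypto_token(); // Hypothetical secret generator.

    const int max_retries = 10; // Pre-defined retry budget.
    for(int i = 0; i < max_retries; ++i){
        const int port = port_dist(gen);
        if(try_host_rpc_server(host, port, token)){ // Hypothetical; false if the port is taken.
            // Emit the handshake so the caller (e.g., Python) can connect.
            std::cout << R"({"hostname":")" << host
                      << R"(","port":)"     << port
                      << R"(,"token":")"    << token << R"("})" << std::endl;
            return 0;
        }
    }
    return 1; // No free port found within the retry budget.
}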

Then, Python (or any other language that gRPC supports, which is many) can call the DICOMautomaton CLI and listen to STDOUT. Once DICOMautomaton prints the port and token, Python connects to the gRPC server at that port and provides the token for authentication.

...from there, C++ and Python can communicate over a bi-directional streaming API.

And, if desired, the DICOMautomaton gRPC API architecture can be readily utilised for other languages and server/client configurations.

SimonBiggs commented on July 23, 2024

Another huge benefit of the gRPC approach: cross-compilation is unaffected. The compilation approach is unchanged, and C++ doesn't need to know about Python.

SimonBiggs commented on July 23, 2024

Hi @hdclark,

I have no huge preference either way. Happy for you to decide which is the best combination of tradeoffs.

My main preference among the trade-offs is for something that will work with Windows, Mac, and Linux (with Mac including Apple Silicon).

SimonBiggs commented on July 23, 2024

> Thanks for your patience and ideas. I'm trying to proceed cautiously so I can both maintain and build upon this code for many years...

Yup, more than happy to be patient on this front :)

SimonBiggs commented on July 23, 2024

As a thought: what if, as a first pass, a form of RPC (such as gRPC) was used where the only data sent across RPC were UID strings or hashes and other "lite" objects, and the bulk data transfer happened by sending the file via a socket in DICOM byte stream format?

Essentially, the user can call a Python wrapper function from the Python side with a pydicom dataset object; this is then serialised to a DICOM byte stream, a SHA-224 hash is taken of the file, and that hash is sent via RPC to the C++ code. Then, the C++ code will respond to the internal Python function with a request for the file, if that particular file hasn't already been sent to C++. File transfers between C++ and Python can be via sockets as DICOM byte streams. If that particular SHA-224 already exists on the C++ end, then those particular function calls will act on the already existing Drover object.

This approach assumes that the Drover object isn't being mutated, so that might not work. There is also the option of linking the two up via DICOM instance UIDs, which means no hashing is needed. But then if the user adjusts the DICOM file on the Python end without updating the UID, C++ will mistakenly assume the DICOM file is the same as before without requesting that the new DICOM file be sent.


I suspect the above needs quite a bit of work... And it is certainly not efficient. But... It might cover a lot of useful cases.
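
As a rough illustration of the C++-side caching, where the parsing and transfer helpers are hypothetical placeholders:

// Sketch of the digest-keyed cache idea (helper functions hypothetical).
#include <map>
#include <memory>
#include <string>
#include <vector>

// Maps SHA-224 hex digests to already-parsed Drover objects.
std::map<std::string, std::shared_ptr<Drover>> drover_cache;

// Called when an RPC request arrives carrying only a digest.
std::shared_ptr<Drover> resolve(const std::string &sha224_hex){
    const auto it = drover_cache.find(sha224_hex);
    if(it != drover_cache.end()) return it->second; // Already transferred and parsed.

    // Cache miss: pull the DICOM byte stream over the socket, parse, memoize.
    const std::vector<char> bytes = request_file_from_client(sha224_hex); // Hypothetical.
    auto d = std::make_shared<Drover>( parse_dicom_bytes(bytes) );        // Hypothetical.
    drover_cache[sha224_hex] = d;
    return d;
}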

SimonBiggs commented on July 23, 2024

Actually, a better option, which allows for Drover mutation, might be to have the Python user explicitly send a DICOM file to C++ and have C++ respond with a unique string identifier corresponding to the created Drover object. Then the Python user can pass that string to a range of Python functions which command certain Drover methods to be called. Finally, a "retrieve" Python function can be called, at which point the mutated Drover object is serialised to a DICOM byte stream and sent back. Whenever a new Drover object is created, the Python user is provided with a new unique string identifier. At any time those identifiers can be used to retrieve the Drover objects.

Maybe we can even have a hook on the Python identifiers so that when their reference counts go to 0, C++ is informed and deletes the corresponding Drover instance.


This could all be wrapped up, effectively, as a Drover class on the Python end... (see the sketch below).
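
On the C++ end this might boil down to a small registry; all helper names here are hypothetical:

// Sketch of the identifier-registry idea (helper functions hypothetical).
#include <map>
#include <memory>
#include <string>
#include <vector>

class DroverRegistry {
    std::map<std::string, std::shared_ptr<Drover>> held;

  public:
    // Ingest a DICOM byte stream; hand back an opaque identifier for Python.
    std::string create(const std::vector<char> &dicom_bytes){
        auto key = generate_unique_id(); // Hypothetical, e.g., a UUID string.
        held[key] = std::make_shared<Drover>( parse_dicom_bytes(dicom_bytes) ); // Hypothetical.
        return key;
    }

    // Serialize the (possibly mutated) Drover back to a DICOM byte stream.
    std::vector<char> retrieve(const std::string &key) const {
        return serialize_to_dicom_bytes( *held.at(key) ); // Hypothetical.
    }

    // Invoked when Python reports an identifier's refcount reached zero.
    void release(const std::string &key){ held.erase(key); }
};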

hdclark commented on July 23, 2024

I like the overall idea, which is basically holding a proxy object to avoid transmitting data back and forth. There might be some more 'gotchas' to think about with respect to memory leaks (e.g., does it require a long-running connection? how to handle timeouts? are tokens a means of authentication?).

Another approach to this could be to have Python code upfront send the data + a list of functions to perform serially. There is already a mechanism to handle building a computation graph in DICOMautomaton using scripts. The scripting facilities are barebones, but the important part is that I can compile scripts down to a tree which can be executed. Maybe we can come up with a way to specify a computation graph in Python, and then send it to DICOMautomaton? The batch computation approach avoids most of the complexity of the proxy approach (especially memory management), but still allows chaining operations without needless back-and-forth serialization between operations.
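
A sketch of the batch shape; everything here except Drover and OperationArgPkg is a hypothetical placeholder:

// Sketch of the batch idea: ship the data plus an ordered operation list in
// one request, execute serially, and return the result -- no per-operation
// round trips.
#include <list>
#include <vector>

struct BatchRequest {
    std::vector<char> dicom_bytes;         // Input data, serialized once.
    std::list<OperationArgPkg> Operations; // The computation graph, flattened.
};

Drover execute_batch(const BatchRequest &req){
    Drover d = parse_dicom_bytes(req.dicom_bytes); // Hypothetical parser.
    for(const auto &op : req.Operations){
        run_operation(d, op); // Hypothetical dispatch into DICOMautomaton operations.
    }
    return d; // Serialized back to the client by the RPC layer.
}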

After encountering a few speedbumps with other RPC tools, I've made pretty good progress with Apache Thrift. I can now serialize and deserialize contour data within a Drover in a forward-compatible way. I'm still working on the networking code, but I hope to have something workable this week. I'm slightly worried about running into bugs in Thrift; for example, there are known issues with 'oneway' functions that seem to be incompatible with HTTP, and it might be challenging to support different versions provided by different toolchains since the generated code sometimes uses std::shared_ptr and sometimes uses boost::shared_ptr. But overall it seems to work well.

I hope to commit something soon. For the moment, I'm just focusing on {de-,}serialization and supporting a minimal RPC mechanism. It will be relatively easy to add proxy or batch computation on top afterward.

SimonBiggs commented on July 23, 2024

Yup, that all makes sense.

One query:

> Of note: Thrift is developed inside the Apache ecosystem, so GitHub stars might inaccurately reflect market share.

I've been trying to find which parts of the Apache ecosystem utilise Thrift. I have found that Cassandra did, but it seems they removed it:

https://lists.apache.org/thread/69yl7nnfbvp1o40p5s3jobjxv63fr7bv

Does Apache use Thrift within its own projects?

It also seems like Thrift was only handed over to Apache in 2020; prior to then it was Facebook's baby:

https://en.wikipedia.org/wiki/Apache_Thrift

hdclark commented on July 23, 2024

I'm not sure about either. I know Facebook developed Thrift, but I assumed it was handed off to Apache earlier. If it was only handed off in 2020, how did GitHub stars for the apache/thrift repo start accruing prior to 2020? (Maybe the repo was just renamed??)

SimonBiggs commented on July 23, 2024

Another tool that I use to give an idea of community activity is the "awesome" lists topic within GitHub:

https://github.com/topics/awesome?q=thrift
https://github.com/topics/awesome?q=grpc

hdclark commented on July 23, 2024

I tend to lurk around https://news.ycombinator.com and get a sense of how folks discuss the topic. Thrift and gRPC both seem reasonably well-liked.

SimonBiggs commented on July 23, 2024

Yup, and here's something within the Apache ecosystem where they use Thrift:
https://cwiki.apache.org/confluence/display/Hive/Setting+Up+HiveServer2

Which is part of their popular Hadoop tooling:
https://hadoop.apache.org/

SimonBiggs commented on July 23, 2024

> I tend to lurk around https://news.ycombinator.com/ and get a sense of how folks discuss the topic. Thrift and gRPC both seem reasonably well-liked.

If it helps, interestingly, the top 5 results for Thrift are the following:

https://www.google.com/search?q=thrift+site%3Ahttps%3A%2F%2Fnews.ycombinator.com

And for gRPC:

https://www.google.com/search?q=grpc+site%3Ahttps%3A%2F%2Fnews.ycombinator.com

SimonBiggs commented on July 23, 2024

> 2. gRPC pulls in a few dependencies that would be challenging to integrate into my current build process. Upstream uses a somewhat exotic build system (Bazel)

If it helps, I am seriously considering utilising Bazel within the OpenSaMD repo, following the approach that TensorFlow uses.

hdclark commented on July 23, 2024

Happy holidays!

A bare-minimum Thrift-based RPC implementation was just added (f20e735), mostly to "test the waters" and see how CI handles the new dependencies. The commit touched a lot of code and might require a meta-compilation step (hopefully not!), so I suspect a fair amount of touch-up work will be required before it's ready.

I'm not sure if you're planning to use the RPC interface directly at this point. I definitely plan to forge ahead and build up internal support for distributed and heterogeneous computing, but I'm also happy to add a more black-box interface using the RPC mechanisms if you think this would be helpful. If Thrift isn't your cup of tea, which is completely fair, I eventually hope to extend the same interface to also support gRPC. This will give me the most build-time and run-time flexibility. However, I won't be able to get to it until I fill in the Thrift RPC model first.

SimonBiggs commented on July 23, 2024

Hi Hal :)

I have just enjoyed a beautiful three weeks of holiday. I haven't yet got back to this. I'm quite swamped at the moment though after being away for that long. So let me get back to you once I have my house back in order.

hdclark commented on July 23, 2024

Closing this for now.
