
concept-to-clinic's Issues

Travis: "Could not find repository"

When trying to see the Travis logs of a build, you end up on a Travis page stating "We couldn't find the repository concept-to-clinic/concept-to-clinic".

Expected Behavior

When trying to access the Travis logs of a build, you should be able to see the logs of the respective build.

Current Behavior

You see the described error page.

Possible Solution

Are the builds only visible to admins? If so, it might be helpful to change some access rights.

Steps to Reproduce

Check a build like this one

Context (Environment)

I want to see any problems that occurred when running a Travis build.

Detailed Description

Contributors should be able to access the Travis logs so they know what went wrong.

Checklist before submitting

  • I have confirmed this using the officially supported Docker Compose setup using the local.py configuration and ensured that I built the containers again and they reflect the most recent version of the project at the HEAD commit on the master branch
  • I have searched through the other currently open issues and am confident this is not a duplicate of an existing bug
  • I provided a minimal code snippet or list of steps that reproduces the bug.
  • I provided screenshots where appropriate
  • I filled out all the relevant sections of this template

Breadcrumb navigation

Overview

A "breadcrumb trail" is a type of secondary navigation scheme that reveals the user's location in a website or Web application. For example, "Open image -> Detect and select -> ...". We should add this navigation, based on the current stage in the analysis, to aid the end-user.

Expected Behavior

The current page should be highlighted:

The links do not need to be actual hyperlinks, merely text.

Technical details

We should avoid DRY (Don't Repeat Yourself) violations: the HTML should not be duplicated on every possible page with slightly different markup. The links do not need to be clickable.

Blocked by

Issues: #9

Document the Pierre Fillard (Therapixel) algorithm

Overview

Participants in the Data Science Bowl produced several algorithms that we would like to incorporate. To help facilitate this effort, we also want to add documentation so that contributors can make an educated decision when selecting an algorithm to incorporate.

Expected Behavior

This documentation should enable people to:

  • view the library dependencies and license
  • understand its pros/cons
  • evaluate its performance/accuracy
  • identify which areas of the codebase to target for improvement

Design doc reference: Detect and select

Algorithm info

key value
team Pierre Fillard (Therapixel)
rank 5
repo https://github.com/pfillard/tpx-kaggle-dsb2017
trained models https://github.com/pfillard/tpx-kaggle-dsb2017/tree/master/models
converted branch
ML engine
engine-version
ML backend Tensorflow
backend-version 1.1
training method
architecture
algorithm
OS Ubuntu
OS version 16.04
Python version 3.5
CUDA version 8
cuDNN version 5.1
notes https://github.com/pfillard/tensorflow/tree/r1.0_relu1

Technical details

  • This feature should be implemented as a new markdown file in the docs folder

Acceptance criteria

  • effective documentation for the above

NOTE: All PRs must follow the standard PR checklist.

Add default githooks for tests and flake8

Overview

Git hooks are useful tools that run commands before you commit

Expected Behavior

We should have a default githook script and instructions on how to install the git hook.

Technical details

  • Add a .githook-pre-commit file with a script that runs the tests (tests/test-docker.sh) and flake8 on both codebases.
  • Add instructions for the docs about how to copy this file to the right place and rename it so that the hooks execute before a commit.

Acceptance criteria

  • Working githooks with tests and flake8
  • Documentation for the githooks

NOTE: All PRs must follow the standard PR checklist.

Document the Julian de Wit algorithm

Overview

Participants in the Data Science Bowl produced several algorithms that we would like to incorporate. To help facilitate this effort, we also want to add documentation so that contributors can make an educated decision when selecting an algorithm to incorporate.

Expected Behavior

This documentation should enable people to:

  • view the library dependencies and license
  • understand its pros/cons
  • evaluate its performance/accuracy
  • identify which areas of the codebase to target for improvement

Design doc reference: Detect and select

Algorithm info

key value
team Julian de Wit
rank 2
repo https://github.com/juliandewit/kaggle_ndsb2017
trained models https://retinopaty.blob.core.windows.net/ndsb3/trained_models.rar
converted branch
ML engine Keras
engine-version
ML backend Tensorflow
backend-version
training method
architecture
algorithm
OS Windows
OS version
Python version 3.5
CUDA version
cuDNN version
notes

Technical details

  • This feature should be implemented as a new markdown file in the docs folder

Acceptance criteria

  • effective documentation for the above

NOTE: All PRs must follow the standard PR checklist.

Document the DL Munich algorithm

Overview

Participants in the Data Science Bowl produced several algorithms that we would like to incorporate. To help facilitate this effort, we also want to add documentation so that contributors can make an educated decision when selecting an algorithm to incorporate.

Expected Behavior

This documentation should enable people to:

  • view the library dependencies and license
  • understand its pros/cons
  • evaluate its performance/accuracy
  • identify which areas of the codebase to target for improvement

Design doc reference: Detect and select

Algorithm info

key value
team DL Munich
rank 7
repo https://github.com/NDKoehler/DataScienceBowl2017_7th_place
trained models requested
converted branch
ML engine
engine-version
ML backend Tensorflow
backend-version 1.0.1
training method
architecture
algorithm
OS Ubuntu
OS version 14.04
Python version
CUDA version
cuDNN version
notes

Technical details

  • This feature should be implemented as a new markdown file in the docs folder

Acceptance criteria

  • effective documentation for the above

NOTE: All PRs must follow the standard PR checklist.

External links in https://concepttoclinic.drivendata.org/documentation do not work

Expected Behavior

When viewing the documentation through https://concepttoclinic.drivendata.org/documentation, external links should take the browser to a new webpage.

Current Behavior

Clicking external links (at least in Firefox and Chrome) leads to a page with the "Concept to Clinic" heading that is otherwise blank.

Possible Solution

It looks like the documentation page uses an iframe to display a readthedocs page. I think this iframe might need to be reconfigured to allow redirecting the target of its parent window. Since concepttoclinic.drivendata.org isn't controlled by this public github repo, I think an admin will have to fix this.

Steps to Reproduce

  1. Open https://concepttoclinic.drivendata.org/documentation
  2. Go to "Getting Started"
  3. Click the "submitting a PR to the GitHub repository" link.
  4. The screen goes blank.

Add links to the project structure documentation

Overview

The project structure documentation is a useful first step, but could get much better. The first way to do that is to turn the references to the technologies into useful links to those technologies.

Current page: https://concept-to-clinic.readthedocs.io/en/latest/project-structure.html

Expected Behavior

The updated Project-structure page should link out from at least the following references to the project pages or documentation that is relevant for that reference:

  • Docker
  • docker-compose
  • sphinx
  • Django
  • Django Rest Framework
  • Vue.js
  • Django database migrations
  • requirements.txt (and pip)
  • DICOM
  • git-lfs
  • Flask
  • flake8
  • pytest

Technical details

Acceptance criteria

  • links to the projects mentioned above are added where relevant

NOTE: All PRs must follow the standard PR checklist.

Display metadata about an image prior to selection

Overview

When browsing for a test case to analyse, it would be very helpful for the end-user to be able to preview a particular case before proceeding to the next step. This would also provide a means to ensure and double-check that they are selecting the correct patient.

We should therefore display metadata about an image prior to selection.

Expected Behavior

The image metadata should be displayed in a panel on the right-hand side of the interface. For example:

Technical details

Upon clicking on an image file, its data will have to be loaded. A preview of the image will need to be displayed too, likely via a separate HTTP call, but it may be possible and cleaner to provide a preview within the same payload by base64-encoding the image data.
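The base64 idea could look roughly like this (a sketch; the payload field names are hypothetical, not the project's actual schema):

```python
import base64
import json

def build_image_payload(metadata, preview_bytes):
    """Bundle image metadata and an inline preview into one JSON payload.

    `metadata` is a dict of DICOM header fields; `preview_bytes` is the
    raw bytes of a small rendered preview (e.g. a PNG thumbnail).
    """
    payload = dict(metadata)
    # base64-encode so binary image data can travel inside JSON
    payload["preview"] = base64.b64encode(preview_bytes).decode("ascii")
    return json.dumps(payload)
```

The trade-off is payload size: base64 inflates the preview by roughly a third, which is fine for a small thumbnail but argues for a separate HTTP call if previews grow large.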

Acceptance criteria

A pane on the right-hand side of the open image will reveal all the relevant metadata for an image.

Blocked by

Issues: #9

Feature: Implement classification algorithm

Overview

Currently, there is just a placeholder in the algorithm that classifies nodules in scans. Nodules are areas of interest that might be cancerous. We need to adapt the Data Science Bowl algorithms to predict P(cancer) for a given set of centroids for nodules.

Expected Behavior

Given a model trained to perform this task, a DICOM image, and the nodule centroids, return the P(cancer) for each nodule.

Design doc reference:
Jobs to be done > Annotate > Prediction service
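The expected call shape might be sketched as follows (illustrative only; the function name, the stub scoring, and the return format are assumptions, not the project's actual API):

```python
def predict_cancer_probabilities(dicom_path, centroids, model=None):
    """Return P(cancer) for each nodule centroid in a DICOM image.

    centroids: iterable of dicts like {'x': 12, 'y': 34, 'z': 5}.
    Returns one dict per centroid with the probability attached.
    """
    results = []
    for centroid in centroids:
        # A real implementation would crop a patch around the centroid
        # and run it through the trained model; here we stub the score.
        p = model.score(dicom_path, centroid) if model else 0.5
        results.append({"centroid": centroid, "p_cancer": p})
    return results
```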

Technical details

Out of scope

This feature is a first-pass at getting a model that completes the task with the defined input and output. We are not yet judging the model based on its accuracy or computational performance.

Acceptance criteria

  • trained model
  • documentation for the trained model (e.g., cross validation performance, data used) and how to re-train it

NOTE: All PRs must follow the standard PR checklist.

Fix incorrect "developer documentation" link in the readme

The "developer documentation" under Getting Started doesn't link to the "Local development with Docker" section.

Expected Behavior

The link should point to https://concept-to-clinic.readthedocs.io/en/latest/developing-locally-docker.html

Current Behavior

The link points to the main documentation page https://concepttoclinic.drivendata.org/documentation

Possible Solution

Change the link or reword the text so that it is clear where the current link leads to.

Steps to Reproduce

visit https://github.com/concept-to-clinic/concept-to-clinic#getting-started
click the "developer documentation" link

Acceptance criteria

  • Readme documentation links to the correct location

Show "X candidates found" on the "Detect and select" page

Overview

In order to help the end-user get a quick overview of the scale of the case, we should show a "X candidates found" message on the "Detect and select" step of the identification/analysis.

Expected Behavior

For example, "7 candidates found" as displayed here:

Technical details

If the backend does not provide summary num_candidates metadata, the value can be calculated by counting the number of candidates returned by the API. The text displayed should be pluralised in a reasonably clean manner, so "1 candidate found" vs. "2 candidates found", etc.
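The pluralisation logic itself is trivial; sketched here in Python for illustration, though the real message would be rendered in the frontend:

```python
def candidates_found_message(num_candidates):
    """Return 'X candidate(s) found' with correct pluralisation."""
    noun = "candidate" if num_candidates == 1 else "candidates"
    return f"{num_candidates} {noun} found"
```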

Acceptance criteria

The "detect and select" page shows the number of candidates found via a "X candidates found" message.

Blocked by

Issues: #9

Show current Git version in top-level navigation

Overview

Interface should include the current Git version in the top-right corner. This is in order to ease reporting of issues; any screenshots will implicitly include the version, potentially saving wasted effort when debugging possibly-fixed bugs, etc.

Expected Behavior

In the interface, the truncated SHA of the Git version used to build the site should be displayed, eg. b1f2ad46.

Technical details

It should also work in production, where the .git/ metadata directory will not exist, so some change to the deployment scripts may be required to capture this data. Care should be taken with caching so as not to inflict a performance penalty on every page load.
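One possible approach (a sketch only; lru_cache stands in for whatever caching the deployment actually uses, and the fallback value is an assumption):

```python
import functools
import subprocess

@functools.lru_cache(maxsize=1)
def short_git_sha(fallback="unknown"):
    """Return the truncated SHA of HEAD, cached so the subprocess runs
    at most once per process (avoiding a per-request penalty).

    In production, where .git/ is absent, a deployment script could
    instead bake the SHA into an environment variable or a file; here
    we simply fall back to a placeholder.
    """
    try:
        sha = subprocess.check_output(["git", "rev-parse", "--short=8", "HEAD"])
        return sha.decode("ascii").strip()
    except (OSError, subprocess.CalledProcessError):
        return fallback
```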

Acceptance criteria

The top-right corner will display the SHA-1.

Blocked by

Issues: #9

Use git-lfs for test images

Overview

Currently test images are stored in the git repository. These files won't change much so we should try to save bandwidth and separate concerns by tracking them with git-lfs instead.

Expected Behavior

DICOM images in tests/assets are managed with git-lfs

Acceptance criteria

  • test images are managed by git-lfs
  • large image files are removed from the repository

NOTE: All PRs must follow the standard PR checklist.

Feature: Implement segmentation algorithm

Overview

Currently, there is just a placeholder in the algorithm that segments nodules in scans. Nodules are areas of interest that might be cancerous. We need to adapt the Data Science Bowl algorithms to predict nodule boundaries and descriptive statistics from an iterator of nodule centroids for an image.

Expected Behavior

Given a model trained to perform this task, a DICOM image, and an iterator of nodule centroids, save a file with boundaries (3D boolean mask with true values for voxels associated with that nodule), widest width, and volume to disk. Yield paths to the saved file for each nodule.

Design doc reference:
Jobs to be done > Segment > Prediction service
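The expected input/output described above could be sketched like this (illustrative only; `mask_factory` stands in for the trained model, and the stats computed here are a minimal subset of what the issue asks for):

```python
import os
import numpy as np

def segment_nodules(mask_factory, centroids, out_dir):
    """Save one boolean-mask file per nodule and yield its path plus
    simple descriptive stats (volume in voxels).

    `mask_factory(centroid)` is a stand-in for the real model call and
    must return a 3D boolean numpy array (true for nodule voxels).
    """
    os.makedirs(out_dir, exist_ok=True)
    for i, centroid in enumerate(centroids):
        mask = mask_factory(centroid)           # 3D boolean mask
        path = os.path.join(out_dir, f"nodule_{i}.npy")
        np.save(path, mask)                     # boundaries to disk
        yield {"file": path, "volume_voxels": int(mask.sum())}
```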

Technical details

Out of scope

This feature is a first-pass at getting a model that completes the task with the defined input and output. We are not yet judging the model based on its accuracy or computational performance.

Acceptance criteria

  • trained model
  • documentation for the trained model (e.g., cross validation performance, data used) and how to re-train it

NOTE: All PRs must follow the standard PR checklist.

Set up Sphinx autodoc for prediction and interface code

Overview

When docs are built (e.g. with make html run from docs directory), the prediction and interface projects should have their docstrings converted to entries in the documentation.

Technical details

  • Add one code.rst document that has its contents derived automatically from docstrings — don't worry about breaking this up into smaller files yet
  • Add the new document to the main toctree (table of contents) in index.rst
  • Important note: this project's chosen convention is to use Google-style docstrings rather than Sphinx style

Out of scope

You don't need to add to or fix existing docstrings as long as there is at least one that shows the autodoc is working.

Acceptance criteria

  • Docs build correctly with make html run from docs directory
  • Syntactically correct Google-style docstrings in prediction and interface pieces of the project can get built using autodoc when the appropriate entry is put into code.rst
  • At least one autodoc entry in code.rst demonstrates this is working

NOTE: All PRs must follow the standard PR checklist.

Create a template for algorithm documentation

Expected Behavior

Future documentation of detection algorithms (Issues #18, #19, #20, #21, #22, #23, #24, #25, #26, #27 and #28) should have a consistent structure and make it easy to compare the different algorithms.

Current Behavior

The issues mentioned above ask for documentation of algorithms from the Data Science Bowl. If addressed by several people, the documentation of each algorithm will be inconsistent and messy, making it unnecessarily hard to read and compare algorithms. Thus the advantage over just using the original documentation would be minimal.

Possible Solution

Create a template file specifying sections and content to be filled in as much as possible. This issue thread is also intended as a place to discuss which information about each algorithm should be included in the documentation.

Possible Implementation

Add a template_algorithms.md file containing the template to the docs/template-folder.

Provide a summary of all of the data from a case in a single JSON report

Overview

Whilst initially any summary will be fairly bare, by adding a summary of all of the data from a case into a single JSON report early on in the project, we can be sure that all data can be exported for completeness reasons, etc.

It will also likely help in debugging and development of backend components as it will avoid making manual queries to the database via SQL commands or via a Python shell such as IPython.

Expected Behavior

We should be able to include the general notes, the details for each nodule, and all other data about a case in a single JSON report.

Technical details

For now, this can be a simple view that takes a Case's data, generates the corresponding dict structure, and pretty-prints it using the pprint module within an HTML page.

The generation of the dict should be separate from the display so it can be reused and tested.
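Keeping the dict generation separate might look like this (a sketch; the Case and nodule attribute names are assumptions, not the project's actual models):

```python
import json

def case_to_dict(case):
    """Build a plain-dict summary of a case so it can be serialised,
    tested, and reused independently of any view code.

    `case` is assumed to expose .notes and .nodules (illustrative names).
    """
    return {
        "notes": case.notes,
        "nodules": [
            {"centroid": n.centroid, "concerning": n.concerning}
            for n in case.nodules
        ],
    }

def case_report_json(case):
    """Pretty-printed JSON report for download or display in a view."""
    return json.dumps(case_to_dict(case), indent=2, sort_keys=True)
```

Because `case_to_dict` returns plain data, a unit test can assert on its output directly without rendering any HTML.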

Acceptance criteria

The summary can be downloaded and the code used to generate it has at least one simple testcase to ensure lack of obvious regressions.

Show list of candidate nodules on "Detect and select" page

Overview

Before more data is added to this view, we will start by simply showing the list of candidate nodules that are returned by the backend. Therefore, the "Detect and select" view should show the list of candidate nodules.

No accept/reject support is required here; that will be covered elsewhere.

Expected Behavior

The list of nodule candidates should be displayed on the "Detect and select" stage of case investigation:

Technical details

An "accordion" interface is suggested. The predicted centroid metadata should be displayed as well, with any floating point numbers suitably truncated for human consumption.

Acceptance criteria

The view called "Detect and select" will show the list of candidates corresponding to the current case and any relevant metadata (eg. centroids, etc.) are displayed.

Blocked by

Issues: #9

Set up a basic npm vue/webpack setup

Overview

We want to be able to use .vue files, ES6, and so forth.

Expected Behavior

As a first pass, NdagiStanley/vue-django has a lot of what we want. At a minimum, we want to be able to build .vue files so that we can use ES6 and have our project JS compiled, including .vue files.

Design doc reference:

  • Design and architecture section, software architecture diagram

Technical details

  • There is some thinking to be done about how people should develop with this setup. One awkward aspect of the vue-django example is that there are two dev servers (I think?) -- one from Django's manage.py and one from the JS setup.
  • You don't have to handle hot JS reloading in this issue.

Acceptance criteria

  • One single command to build project .vue (and other JS) assets, the output of which can be served by the Django process
  • Docker setup updated so npm packages get installed

NOTE: All PRs must follow the standard PR checklist.

Feature: Adapt the grt123 model

Overview

Currently, there is just a placeholder in the algorithm that classifies nodules in scans. Nodules are areas of interest that might be cancerous. We need to adapt the Data Science Bowl (DSB) algorithms to predict P(cancer) given an iterator of nodule centroids for an image.

The top DSB algorithm (grt123) was written to run on a GPU for Python2. It would be nice to integrate this algorithm into the current structure and update it to run on Python3 (potentially on a CPU as well).

Expected Behavior

Given the grt123 model trained to perform this task, a DICOM image, and an iterator of nodule centroids, yield the P(cancer) for each nodule.

Design doc reference: Detect and select

Technical details

  • The majority of the Python3 and CPU conversion has been completed and is available in the conversion branch.

  • The forked model is available here (reads source DICOM images from S3).

  • One area that definitely needs review is the Py2/Py3 floor/true division conversions. Some calculations explicitly converted numbers to floats, and in those cases it was apparent that true division was desired. However, the remaining floor division calculations should be checked to ensure that true division is not actually the appropriate operation.

  • If you get a UnicodeDecodeError error while trying to load the serialized Torch model, use the torch_loader function in the utils module instead.

  • When running on the CPU, it isn't necessary to perform that much work. Just enough to obtain a plausible result in a reasonable amount of time.

  • This feature should be implemented in the prediction/classify/trained_model/predict method.
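To make the floor/true division point concrete (plain Python 3):

```python
# Under Python 2, `7 / 2` was floor division for ints (== 3); after a
# mechanical 2-to-3 conversion, code that relied on that now gets 3.5.
assert 7 / 2 == 3.5    # true division (the Python 3 default)
assert 7 // 2 == 3     # floor division, the old Python 2 int behaviour
assert -7 // 2 == -4   # note: floor, not truncation, for negatives
```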

Acceptance criteria

NOTE: All PRs must follow the standard PR checklist.

Feature: Implement identification algorithm

Overview

We need to adapt the Data Science Bowl algorithms to produce possible centroid locations for nodules within an image rather than just P(cancer) for the whole image.

Expected Behavior

Currently, there is just a placeholder in the algorithm that identifies nodules in scans. Nodules are areas of interest that might be cancerous (or might not be; the goal here is just to flag the potentially concerning areas). This must actually yield centroid locations of potential nodules (X voxels from left, Y voxels from top, Z slice number).

First we need to train a model to perform this task. Then, we need to serialize the model so that it can be loaded from disk and used to make predictions. This trained model should be added to the prediction/src/algorithms/identify/assets/ folder using git-lfs. Finally, we need to write the code in the predict method that will load the model from assets, take in a DICOM image, and yield nodule locations in the specified format.

Design doc reference:
Jobs to be done > Detect and select > Prediction service
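A sketch of the expected generator shape (illustrative only; `model.detect` and the dict format are assumptions standing in for the real trained model loaded from assets):

```python
def identify_nodules(dicom_path, model=None):
    """Yield candidate nodule centroids for a DICOM image as dicts of
    (x voxels from left, y voxels from top, z slice number).
    """
    # A real implementation would load the serialized model from the
    # assets/ folder and run the scan through it; stubbed here.
    candidates = model.detect(dicom_path) if model else [(10, 20, 3)]
    for x, y, z in candidates:
        yield {"x": x, "y": y, "z": z}
```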

Technical details

Out of scope

This feature is a first-pass at getting a model that completes the task with the defined input and output. We are not yet judging the model based on its accuracy or computational performance.

Acceptance criteria

  • trained model for identification
  • documentation for the trained model (e.g., cross validation performance, data used) and how to re-train it

NOTE: All PRs must follow the standard PR checklist.

Layout base Django template to match wireframes

Overview

Even though the backend pieces have yet to be written, it will be helpful to get a mocked out frontend version quickly. Small issues exist for many of the individual UI pieces, but first the overall layout has to be created.

We want DRY templates, so we need a base template that gives the rough Bootstrap layout (e.g. navbar, breadcrumbs, main div, left pane, main pane).

Expected Behavior

Closing this issue shouldn't implement the individual UI pieces.

In fact, many pieces of these pages will end up being refactored into individual Vue components in order to do dynamic front-endy things, so it's most important at this point to lay out some semantically correct HTML5 that others can build upon.

Design doc reference:
[1] Open imagery
[2] Detect and select
[3],[4] Annotate and segment
[5] Report and export

Technical details

  • Use Bootstrap v4 (alpha)
  • Create a base template that these will all inherit from and use one HTML page per wireframe image -- see the current stub at interface/frontend/index.html as a starting point
  • Use the breadcrumbs to link from each page to the others (for now)
  • URLs do not need to work, you can use href="#" for now

Acceptance criteria

  • Page more or less matches the wireframe, or uses sensible, idiomatic Bootstrap v4 UI elements
  • Smoke tests ensure that each page can be loaded

NOTE: All PRs must follow the standard PR checklist.

Document how to update documentation

Overview

We need documentation on how to update the documentation! Bootstrap this for us!

Current page: https://concept-to-clinic.readthedocs.io/en/latest/project-structure.html

Expected Behavior

The documentation is a sphinx project. We need documentation on how to edit, structure, build locally, and test the documentation for participants who aren't familiar with sphinx.

Technical details

  • Add a "How to edit this documentation" top level docs page

Acceptance criteria

  • "How to edit this documentation page"
  • Includes: how to install locally
  • Includes: how to build locally
  • Includes: how to update table of contents
  • Includes: links to how to write markdown for documentation

NOTE: All PRs must follow the standard PR checklist.

Create a smaller test image dataset

Overview

Currently test images are stored in the git repository. These images are formatted as directories of DICOM files. Each directory contains hundreds of files, and consumes ~100MB of space. We would like to reduce the number of files to the bare minimum needed to detect nodules and pass the tests.

Expected Behavior

The number of DICOM directories and images is reduced to the minimum amount necessary (1 - 3 directories, each with 5 - 10 DICOM files).

Technical details

  • Each directory contains an xml file with metadata about the included DICOM files. You may need to edit this file.

Acceptance criteria

  • number of test images is reduced as described above
  • the tests suite passes with the reduced image dataset

NOTE: All PRs must follow the standard PR checklist.

Document the Deep Breath algorithm

Overview

Participants in the Data Science Bowl produced several algorithms that we would like to incorporate. To help facilitate this effort, we also want to add documentation so that contributors can make an educated decision when selecting an algorithm to incorporate.

Expected Behavior

This documentation should enable people to:

  • view the library dependencies and license
  • understand its pros/cons
  • evaluate its performance/accuracy
  • identify which areas of the codebase to target for improvement

Design doc reference: Detect and select

Algorithm info

key value
team Deep Breath
rank 9
repo https://github.com/EliasVansteenkiste/dsb3
trained models not available
converted branch
ML engine Lasagne
engine-version 0.2.dev1
ML backend Theano
backend-version 0.9.0b1
training method CNN
architecture inception
algorithm Resnet
OS Ubuntu
OS version 16.04
Python version 2.7
CUDA version 8
cuDNN version 5.1
notes

Technical details

  • This feature should be implemented as a new markdown file in the docs folder

Acceptance criteria

  • effective documentation for the above

NOTE: All PRs must follow the standard PR checklist.

Update error handling in the prediction API

Overview

We need to adapt the Data Science Bowl algorithms to produce possible centroid locations for nodules within an image rather than just P(cancer) for the whole image.

Expected Behavior

Currently, we just catch all exceptions and ignore them. We want to return some useful information about what went wrong as part of the JSON payload. For example, if the request didn't have the right parameters, we say what was missing. If the DICOM image could not be found, we say the file was not where we expected it.

Technical details

  • This feature should be implemented in the prediction/src/views.py file
  • We probably need multiple except ExceptionType blocks for different kinds of expected errors. For example:
    try:
        ...
    except ImportError:
        ...  # one error message
    except ValueError:
        ...  # a different error message
  • We need a catch-all error message at the end

Out of scope

We don't need to catch every possible exception. This issue is to identify how some common errors might occur and then return useful error messages.

Acceptance criteria

  • returns useful error messages depending on the exception
  • tests for the failure cases that are handled

NOTE: All PRs must follow the standard PR checklist.

140MB fresh repo size (without much code or data, not using LFS)

Expected Behavior

After cloning using Git (and without LFS), after 30 commits and without test data, the repository should only be at most a few MB in size.

Current Behavior

It's over 140 MB when cloning the master.

Possible Solution

I guess the reason is that, in the beginning, even big files were pushed using plain Git. When they were later moved to Git LFS, they weren't removed from the "normal" Git history.

Steps to Reproduce

  1. Clone the repository
  2. Check its size (including hidden folders such as .git)

Possible Implementation

Would it be possible to remove those big files from the history using approaches like "Removing sensitive data from a repository" or "How to remove/delete a large file from commit history in Git repository"? Unfortunately, I'm not that familiar with LFS or history rewriting...

Checklist before submitting

  • I have confirmed this using the officially supported Docker Compose setup using the local.py configuration and ensured that I built the containers again and they reflect the most recent version of the project at the HEAD commit on the master branch
  • I have searched through the other currently open issues and am confident this is not a duplicate of an existing bug
  • I provided a minimal code snippet or list of steps that reproduces the bug.
  • I provided screenshots where appropriate
  • I filled out all the relevant sections of this template

Accept/reject nodule candidates interface

Overview

Some nodules will be identified as non-candidates by the end user. Therefore, we should provide the ability to accept/reject nodule candidates on the "Detect and select" view.

Expected Behavior

The user's choice via the "Dismiss" and "Mark concerning" buttons should be persisted:

The next nodule candidate is then displayed, until there are none left.

The user's decision should be sent back to the backend immediately upon selection for persistence, rather than aggregating the entire page's results first.

Blocked by

Issues: #9

Self-document the prediction API endpoint

Overview

A useful API will tell you how to use it. Update the API endpoints to be more helpful!

Expected Behavior

Currently, there is just a placeholder if you GET our prediction API endpoints. We expect this call to tell us what the endpoint does, what the required parameters are, and what it returns.

To do this, we'll need to update our response payload to have that information.
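One sketch of what the GET response could contain, using the detect endpoint as an example; all key names and wording here are illustrative assumptions:

```python
def describe_endpoint():
    """Self-description payload for one prediction endpoint (a sketch).

    A view's GET handler could simply return this dict as JSON so that
    querying the endpoint documents what it does, what it requires,
    and what it returns.
    """
    return {
        "description": "Detect nodule candidates in a DICOM image series.",
        "parameters": {
            "dicom_path": "Path to the DICOM series directory to analyse.",
        },
        "returns": "A list of candidate centroids, each with a probability.",
    }
```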

Technical details

Acceptance criteria

  • accurately documents all three prediction endpoints
  • has passing tests

NOTE: All PRs must follow the standard PR checklist.

Show directory tree of possible images

Overview

The first step of the analysis and identification process is to select an image file from the local disk. We should therefore show the user all the possible files available.

Expected Behavior

Upon loading the application, we should show a directory tree of the potential files:

Technical details

The files should be displayed in a tree, ideally using the Django storage framework. In the development environment this should be a directory within our project but ignored via a suitable .gitignore file.

No file format or name filtering is required; we can assume all files in the specified directory are valid. Sorting may need to be applied to the result of Django's listdir for deterministic ordering.
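One way to build such a tree over the storage API, sketched with the listdir callable injected so it works against any backend (the dict shape for nodes is an assumption):

```python
import posixpath

def build_tree(listdir, path=''):
    """Build a sorted tree from a Django-storage-style ``listdir``.

    ``listdir(path)`` must return a ``(directories, files)`` pair, as
    ``django.core.files.storage.Storage.listdir`` does. Storage backends
    make no ordering guarantee, so both levels are sorted here to get
    the deterministic ordering this issue asks for.
    """
    directories, files = listdir(path)
    tree = [
        {'name': name, 'type': 'directory',
         'children': build_tree(listdir, posixpath.join(path, name))}
        for name in sorted(directories)
    ]
    tree.extend({'name': name, 'type': 'file'} for name in sorted(files))
    return tree
```

The resulting structure serializes directly to JSON for the frontend to render.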

Acceptance criteria

All files are listed in their correct hierarchy, sorted alphabetically.

Blocked by

Issues: #9

Document git-lfs use for this project

Overview

There are a handful of large data sources for this project. We want to use git-lfs to manage those large files so they are not all in the repository. We need the documentation so that all the contributors understand how to use git-lfs.

Expected Behavior

At least the model asset folders (e.g., prediction/src/algorithms/classify/assets) and the test assets folder (tests/assets) have large files in them. Model assets are already tracked by git-lfs for people who have it installed (see .gitattributes).

Our documentation should enable people to:

  • install git-lfs
  • setup git-lfs with this project
  • decide to track files with git-lfs and remove those files from the repo
  • not use git-lfs (for example, to save on bandwidth) and pull the repo without the large files

Technical details

  • This feature should be implemented as a new section in the docs

Helpful links:
https://git-lfs.github.com/
https://help.github.com/articles/about-git-large-file-storage/

Acceptance criteria

  • effective documentation for the above

NOTE: All PRs must follow the standard PR checklist.

Add ability to set right/left lung for candidate nodule


Overview

The annotation and segmentation view should support the ability to save which lung (right or left) the nodule corresponds to; before we pass back any more detailed or complex information, we should simply provide a way to return this information so that adding further data in future is easier.

Expected Behavior

It should be possible to select and subsequently save the per-nodule lung orientation using the dropdown:

Note that only the left/right lung selection is part of this issue.

Technical details

The dropdown should not save until an "Accept" button has been pressed; see the design document for an example wireframe.

Acceptance criteria

Per-nodule lung orientation should be returned and persisted to the backend service, once "Accept" is pressed.

Blocked by

Issues: #9

Fix failing static.SmokeTest.test_landing test

Overview

The tests are currently failing on the master branch.

Expected Behavior

We should endeavour to keep the testsuite passing at all times.

Technical details

$ ./manage.py test
Creating test database for alias 'default'...
System check identified no issues (0 silenced).
..F...
======================================================================
FAIL: test_landing (backend.static.tests.SmokeTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/lamby/git/work/drivendata/challenge-application-test/interface/backend/static/tests.py", line 9, in test_landing
    self.assertContains(resp, 'Hello')
  File "/home/lamby/git/work/drivendata/challenge-application-test/interface/.venv/lib/python2.7/site-packages/django/test/testcases.py", line 393, in assertContains
    self.assertTrue(real_count != 0, msg_prefix + "Couldn't find %s in response" % text_repr)
AssertionError: Couldn't find 'Hello' in response

----------------------------------------------------------------------
Ran 6 tests in 0.049s

FAILED (failures=1)
Destroying test database for alias 'default'...

Acceptance criteria

The testsuite passes.

Document the Daniel Hammack algorithm

Overview

Participants in the Data Science Bowl produced several algorithms that we would like to incorporate. To help facilitate this effort, we also want to add documentation so that contributors can make an educated decision when selecting an algorithm to incorporate.

Expected Behavior

This documentation should enable people to:

  • view the library dependencies and license
  • understand its pros/cons
  • evaluate its performance/accuracy
  • identify which areas of the codebase to target for improvement

Design doc reference: Detect and select

Algorithm info

key value
team Daniel Hammack
rank 1
repo https://github.com/dhammack/DSB2017 & https://github.com/juliandewit/kaggle_ndsb2017
trained models https://retinopaty.blob.core.windows.net/ndsb3/trained_models.rar
converted branch
ML engine Keras
engine-version
ML backend Theano
backend-version
training method
architecture
algorithm
OS Windows
OS version 64bit
Python version 2.7 & 3.5
CUDA version
cuDNN version
notes

Technical details

  • This feature should be implemented as a new markdown file in the docs folder

Acceptance criteria

  • effective documentation for the above

NOTE: All PRs must follow the standard PR checklist.

Inspect a DICOM directory and create appropriate model instances

Overview

The interface backend has some model stubs for tracking metadata about DICOM imagery. The Case model tracks a radiologist's workflow examining an image series, so it's the heart of this application. But the Case has foreign keys to the ImageSeries model, which needs to know about the data.

Expected Behavior

We should be able to pass a directory URI to this method (helpful ref here) and expect that an ImageSeries object should be created if necessary, with metadata fields filled in.

Technical details

  • The DICOM standard may be helpful here
  • The pydicom library is excellent and probably what we want to use
  • This should likely live in the ImageSeries class, possibly as a classmethod; this could be like:
    uri = 'file:///path/to/project/tests/assets/LIDC-IDRI-0001/1.3.6.1.4.1.14519.5.2.1.6279.6001.298806137288633453246975630178/1.3.6.1.4.1.14519.5.2.1.6279.6001.179049373636438705059720603192'
    ImageSeries.get_or_create(uri)
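The metadata mapping could look like the following sketch; the field names are assumptions based on standard DICOM attributes, and the dataset would come from reading the first slice in the directory with pydicom:

```python
from urllib.parse import urlparse

def image_series_fields(uri, dataset):
    """Map a DICOM dataset to candidate ImageSeries fields (a sketch).

    ``dataset`` is what ``pydicom.dcmread(first_slice_path,
    stop_before_pixels=True)`` would return; only header attributes
    defined by the DICOM standard are read, so the pixel data never
    needs to be loaded.
    """
    return {
        "uri": uri,
        "path": urlparse(uri).path,
        "patient_id": dataset.PatientID,
        "series_instance_uid": dataset.SeriesInstanceUID,
    }
```

A classmethod could then pass these as defaults to Django's `objects.get_or_create`, keyed on the series instance UID, so repeated calls with the same URI are idempotent.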

Out of scope

  • Adding new fields to the models (let's get the ones we have working first)
  • Using or storing the array data; an issue already exists to pull in the DICOM image's numerical array (#12) for the prediction service, but that doesn't have anything to do with our DB models on the interface side of the house.

Acceptance criteria

  • Working implementation of the method described above
  • Tests added

NOTE: All PRs must follow the standard PR checklist.

Fix broken links in main documentation

Overview

A few links in the Getting Started section of the documentation are broken (the 2nd and 3rd bullets).

Expected Behavior

The links found at

  • Find an open issue you’re interested in working on
  • Get your local environment running with the developer documentation

should point to valid urls.

Acceptance criteria

  • the links no longer return 404 pages, and instead point to the intended pages.

NOTE: All PRs must follow the standard PR checklist.

Add function that loads DICOM images

Overview

All of our models need to take a path to a DICOM image (which is actually a directory of images and XML files) and then load that image into memory.

Expected Behavior

The function should take a path to a DICOM directory and load the data from that directory into a format that will be useful to the models, which it will then provide to its callers. For example, DICOM-numpy may be useful here.

This issue is for a first pass implementation. As the models evolve, we may need to update and change the format that this method provides to its callers.
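A sketch of the core stacking step, with the per-slice datasets (as pydicom would return them) passed in; sorting by the z component of ImagePositionPatient is assumed here to be more reliable than sorting by filename:

```python
import numpy as np

def load_ct_volume(slices):
    """Stack per-slice DICOM datasets into a 3D numpy volume (a sketch).

    ``slices`` are dataset objects such as those ``pydicom.dcmread``
    returns for each file in the series directory; they must expose
    ``ImagePositionPatient`` and ``pixel_array``. Slices are ordered by
    their z position, so the volume's axis order is (z, y, x).
    """
    ordered = sorted(slices, key=lambda s: float(s.ImagePositionPatient[2]))
    return np.stack([s.pixel_array for s in ordered])
```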

Technical details

Acceptance criteria

  • method to load DICOM images into memory
  • tests for the method

NOTE: All PRs must follow the standard PR checklist.

Document the Alex |Andre |Gilberto |Shize algorithm

Overview

Participants in the Data Science Bowl produced several algorithms that we would like to incorporate. To help facilitate this effort, we also want to add documentation so that contributors can make an educated decision when selecting an algorithm to incorporate.

Expected Behavior

This documentation should enable people to:

  • view the library dependencies and license
  • understand its pros/cons
  • evaluate its performance/accuracy
  • identify which areas of the codebase to target for improvement

Design doc reference: Detect and select

Algorithm info

key value
team Alex |Andre |Gilberto |Shize
rank 8
repo https://github.com/astoc/kaggle_dsb2017
trained models https://github.com/astoc/kaggle_dsb2017/tree/master/code/Andre/nodule_identifiers
converted branch
ML engine Keras
engine-version 1.2.2
ML backend Theano
backend-version
training method
architecture
algorithm
OS
OS version
Python version
CUDA version
cuDNN version
notes

Technical details

  • This feature should be implemented as a new markdown file in the docs folder

Acceptance criteria

  • effective documentation for the above

NOTE: All PRs must follow the standard PR checklist.

Document the qfpxfd algorithm

Overview

Participants in the Data Science Bowl produced several algorithms that we would like to incorporate. To help facilitate this effort, we also want to add documentation so that contributors can make an educated decision when selecting an algorithm to incorporate.

Expected Behavior

This documentation should enable people to:

  • view the library dependencies and license
  • understand its pros/cons
  • evaluate its performance/accuracy
  • identify which areas of the codebase to target for improvement

Design doc reference: Detect and select

Algorithm info

key value
team qfpxfd
rank 4
repo http://www.cis.pku.edu.cn/faculty/vision/wangliwei/software.html
trained models not available
converted branch
ML engine Keras
engine-version
ML backend Tensorflow
backend-version
training method CNN
architecture 3D VGG
algorithm
OS
OS version
Python version
CUDA version
cuDNN version
notes

Technical details

  • This feature should be implemented as a new markdown file in the docs folder

Acceptance criteria

  • effective documentation for the above

NOTE: All PRs must follow the standard PR checklist.

Document the Owkin Team algorithm

Overview

Participants in the Data Science Bowl produced several algorithms that we would like to incorporate. To help facilitate this effort, we also want to add documentation so that contributors can make an educated decision when selecting an algorithm to incorporate.

Expected Behavior

This documentation should enable people to:

  • view the library dependencies and license
  • understand its pros/cons
  • evaluate its performance/accuracy
  • identify which areas of the codebase to target for improvement

Design doc reference: Detect and select

Algorithm info

key value
team Owkin Team
rank 10
repo https://github.com/owkin/DSB2017
trained models https://github.com/owkin/DSB2017/blob/master/sje_scripts/Unet_X.hdf5
trained models https://github.com/owkin/DSB2017/blob/master/sje_scripts/Unet_Y.hdf5
trained models https://github.com/owkin/DSB2017/blob/master/sje_scripts/Unet_Z.hdf5
trained models https://github.com/owkin/DSB2017/blob/master/pic_scripts/model64x64x64_v5_rotate_v2.h5
trained models xgboost trees/features not provided
converted branch
ML engine Keras
engine-version 2
ML backend Tensorflow
backend-version 1
training method
architecture
algorithm
OS
OS version
Python version
CUDA version
cuDNN version
notes http://pyradiomics.readthedocs.io/en/latest/installation.html

Technical details

  • This feature should be implemented as a new markdown file in the docs folder

Acceptance criteria

  • effective documentation for the above

NOTE: All PRs must follow the standard PR checklist.

3rd level header links in dev docs have colliding titles

Visit https://concept-to-clinic.readthedocs.io/en/latest/design-doc.html#prediction-service and you will notice that the 3rd-level headings are the same in each section; this prevents you from navigating to any section other than the first.

Expected Behavior

Clicking on a heading (e.g., Interface API) should go to the appropriate section

Current Behavior

You can only navigate within the first section

Possible Solution

Use different heading names, or the :ref: directive

Acceptance criteria

  • Clicking on a heading (e.g., Interface API) results in a navigation to the appropriate section

Document the Aidence algorithm

Overview

Participants in the Data Science Bowl produced several algorithms that we would like to incorporate. To help facilitate this effort, we also want to add documentation so that contributors can make an educated decision when selecting an algorithm to incorporate.

Expected Behavior

This documentation should enable people to:

  • view the library dependencies and license
  • understand its pros/cons
  • evaluate its performance/accuracy
  • identify which areas of the codebase to target for improvement

Design doc reference: Detect and select

Algorithm info

key value
rank 3
repo https://bitbucket.org/aidence/kaggle-data-science-bowl-2017
trained models requested
converted branch
ML engine
engine-version
ML backend Tensorflow
backend-version 1.1
training method
architecture
algorithm Resnet
OS
OS version
Python version 3.4
CUDA version
cuDNN version
notes

Technical details

  • This feature should be implemented as a new markdown file in the docs folder

Acceptance criteria

  • effective documentation for the above

NOTE: All PRs must follow the standard PR checklist.

Document the grt123 algorithm

Overview

Participants in the Data Science Bowl produced several algorithms that we would like to incorporate. To help facilitate this effort, we also want to add documentation so that contributors can make an educated decision when selecting an algorithm to incorporate.

Expected Behavior

This documentation should enable people to:

  • view the library dependencies and license
  • understand its pros/cons
  • evaluate its performance/accuracy
  • identify which areas of the codebase to target for improvement

Design doc reference: Detect and select

Algorithm info

key value
team grt123
rank 1
repo https://github.com/lfz/DSB2017
trained models https://github.com/lfz/DSB2017/tree/master/model
converted branch https://github.com/concept-to-clinic/DSB2017
ML engine pytorch
engine-version 0.1.10+ac9245a
ML backend
backend-version
training method
architecture
algorithm
OS Ubuntu
OS version 14.04
Python version 2.7
CUDA version 8
cuDNN version 5.1
notes

Technical details

  • This feature should be implemented as a new markdown file in the docs folder

Acceptance criteria

  • effective documentation for the above

NOTE: All PRs must follow the standard PR checklist.

Calculate tumor volume from pixel masks

Overview

We want to provide clinicians with summary statistics about identified tumors. In the segment model, we output per-pixel binary masks that identify which pixels are likely to be cancer. From these masks, we want to calculate the volume of the tumor.

Expected Behavior

Take as input the path to the serialized masks (which is the output of the segment predict method). We also have the centroids that are passed into the predict method. For each centroid, calculate the volume of the tumor. DICOM provides slice thickness and pixel spacing, so the units should be real measurements (not pixels).

Design doc reference:
Jobs to be done > Segment

Technical details

  • This feature should be implemented in the trained_model.py file.
  • The output should be the volume for each tumor (from the centroid list) in a real unit
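One possible approach, sketched without any heavy dependencies: flood-fill the connected mask component containing each centroid and multiply the voxel count by the physical voxel volume. The 6-connectivity choice and the (z, y, x) indexing convention are assumptions for illustration:

```python
from collections import deque

def tumor_volume(mask, centroid, voxel_dims_mm):
    """Volume in mm^3 of the connected component containing ``centroid``.

    ``mask`` is a 3D boolean array indexable as ``mask[z][y][x]``,
    ``centroid`` a (z, y, x) index, and ``voxel_dims_mm`` the per-axis
    spacing taken from the DICOM headers (e.g. SliceThickness and
    PixelSpacing), which converts the voxel count to a real measurement.
    """
    voxel_volume = voxel_dims_mm[0] * voxel_dims_mm[1] * voxel_dims_mm[2]
    shape = (len(mask), len(mask[0]), len(mask[0][0]))
    seen = {tuple(centroid)}
    queue = deque(seen)
    count = 0
    while queue:
        z, y, x = queue.popleft()
        if not mask[z][y][x]:
            continue  # only grow the component through positive voxels
        count += 1
        for dz, dy, dx in ((1, 0, 0), (-1, 0, 0), (0, 1, 0),
                           (0, -1, 0), (0, 0, 1), (0, 0, -1)):
            neighbour = (z + dz, y + dy, x + dx)
            if all(0 <= n < s for n, s in zip(neighbour, shape)) \
                    and neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return count * voxel_volume
```

In practice a labelling routine such as scipy.ndimage.label would likely be faster for large masks; the explicit breadth-first search above just makes the idea concrete.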

Related issues

#XX is the work to add this output to the API endpoint.

Acceptance criteria

  • method that outputs volumes from a binary mask and list of centroids
  • tests for this method

NOTE: All PRs must follow the standard PR checklist.

Pass calculated summary statistics from segment model to API

Overview

Once we produce summary statistics for tumors, those need to be returned to the frontend through the API.

Expected Behavior

Currently, the segment endpoint only returns a path to a file which has a binary mask of the DICOM image. This is useful for displaying the nodules, but we also want to report summary statistics. This issue tracks taking both the summary statistic calculations and the path to the binary masks generated by segment.trained_model.predict and surfacing both of those pieces of information through the API.

Design doc reference:
Jobs to be done > Detect and select > Prediction service

Technical details

  • This feature should be implemented in views.py for the prediction application and wherever else down the stack is necessary to pass along the data
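The enriched payload might be shaped as follows; the key names are assumptions for illustration, not the project's settled API:

```python
def segment_response(mask_path, volumes):
    """Sketch of the enriched segment endpoint payload.

    Combines the existing binary-mask path with the per-centroid
    summary statistics so the frontend receives both in one response.
    """
    return {
        "binary_mask_path": mask_path,
        "volumes": volumes,  # one value per requested centroid, in mm^3
    }
```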

Related issues

Depends on #13, which tracks actually calculating the volumes. It's acceptable to create a stub for that work and implement this independently. You may need to read #13 to fully understand this issue.

Acceptance criteria

  • An API endpoint for nodule summary statistics
  • documentation for the summary statistics endpoint

NOTE: All PRs must follow the standard PR checklist.

Document the MDai algorithm

Overview

Participants in the Data Science Bowl produced several algorithms that we would like to incorporate. To help facilitate this effort, we also want to add documentation so that contributors can make an educated decision when selecting an algorithm to incorporate.

Expected Behavior

This documentation should enable people to:

  • view the library dependencies and license
  • understand its pros/cons
  • evaluate its performance/accuracy
  • identify which areas of the codebase to target for improvement

Design doc reference: Detect and select

Algorithm info

key value
team MDai
rank 6
repo https://github.com/mdai/kaggle-lung-cancer
trained models requested
converted branch
ML engine Keras
engine-version 1.2.2
ML backend Tensorflow
backend-version 1.0.0
training method
architecture
algorithm
OS Ubuntu
OS version 16.04
Python version 3.5
CUDA version 8
cuDNN version 5.1
notes https://github.com/pydicom/pydicom/tree/bbaa74e9d02596afc03b924fe8ffbe7b95b6ff55

Technical details

  • This feature should be implemented as a new markdown file in the docs folder

Acceptance criteria

  • effective documentation for the above

NOTE: All PRs must follow the standard PR checklist.
