
dsb2017's Introduction

2nd Place Solution To the 2017 National Data Science Bowl

This is my half of the 2017 NDSB 2nd place team solution. The other half is located here: https://github.com/juliandewit/kaggle_ndsb2017

For documentation and technical details, see the file here: https://github.com/dhammack/DSB2017/blob/master/dsb_2017_daniel_hammack.pdf.

scoring_code: location of code which replicates the submission for stage 2 of the competition. Trained models will eventually go here once I have checked that it is OK to upload them. You will not be able to run this code until the models have been uploaded.

training_code: location of the code to rebuild the models required for scoring my part of the solution.

NOTE

Most, if not all, Python scripts in this repo currently use absolute filepaths when referring to files. This was easier for me to write originally but will cause issues when you try to replicate. If you are trying to rebuild or rescore my solution, make sure to check the filepaths.
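One low-effort way to adapt the scripts is to collect the absolute paths behind a single configurable base directory. The `DSB_DATA_DIR` variable, `data_path` helper, and example paths below are my suggestion, not code from the repo:

```python
import os

# Hypothetical: point this at wherever you unpacked the data,
# e.g. export DSB_DATA_DIR=/mnt/lung before running the scripts.
BASE_DIR = os.environ.get("DSB_DATA_DIR", os.path.expanduser("~/dsb2017_data"))

def data_path(*parts):
    """Build a path under BASE_DIR instead of hardcoding absolute paths."""
    return os.path.join(BASE_DIR, *parts)

# e.g. a hardcoded r'E:\lung\data_raw\data' would become:
train_dir = data_path("data_raw", "data")
```

Replacing each hardcoded path with a `data_path(...)` call means a replication only has to set one environment variable.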

Also, I sometimes make modifications to my local Keras install to try out new things. I'm planning to go over my code to check for these, but I haven't done it yet. If you get a strange error where my code tries to use a Keras feature that doesn't exist, this is probably what happened. As far as I can recall, the only places this should happen are custom initializations (orthogonal and looks-linear are the two I may have done this with) and custom activations (I don't think I used any of these).

Also - I have noticed that a newer version of OpenCV can break some of my code. If you get OpenCV errors, change the following line:

contours, _ = cv2.findContours(img,cv2.RETR_LIST,cv2.CHAIN_APPROX_SIMPLE)

To:

_, contours, _ = cv2.findContours(img,cv2.RETR_LIST,cv2.CHAIN_APPROX_SIMPLE)
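If you need the code to run under several OpenCV versions, a version-agnostic helper is possible, since `cv2.findContours` returns two values in OpenCV 2.x and 4.x but three in OpenCV 3.x. This helper is my suggestion, not part of the repo:

```python
def grab_contours(result):
    """Return the contour list from cv2.findContours regardless of
    OpenCV version: 3.x returns (image, contours, hierarchy), while
    2.x and 4.x return (contours, hierarchy)."""
    if len(result) == 3:        # OpenCV 3.x
        return result[1]
    if len(result) == 2:        # OpenCV 2.x / 4.x
        return result[0]
    raise ValueError("Unexpected return value from cv2.findContours")

# usage:
# contours = grab_contours(
#     cv2.findContours(img, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE))
```

With this, neither of the two call forms above needs to be hand-edited per installation.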

REQUIRED DATA

For training:

LUNA16 - the neural net models were trained only on this data. Note that you will need the annotations_enhanced.csv file included in this repo, which contains LIDC radiologist annotations for the LUNA16 nodules.

NDSB 2017 stage1 data - used for training the final diagnosis model (not a neural network).

For scoring:

any dataset of DICOM files.

Actually using this

If you are interested in actually using this code in a real application rather than just replicating my work, please reach out. This code is unnecessarily complicated due to the cutthroat and very hasty nature of Kaggle competitions. It could be considerably simplified and sped up with no loss in performance. Furthermore, now that the competition is over and I have time to think clearly, I know of several ways to improve the performance of this system.

TODO: this readme.

dsb2017's People

Contributors

dhammack, wanqiyang


dsb2017's Issues

about the data

Hi Daniel,

When I run the file DSB2017-master/training_code/FLung_nodule_models/create_nodules_from_modelv29.py, I see line 230:

train_files = [f for f in os.listdir(r'E:\lung\data_raw\data')]

Is the data in the directory 'E:\lung\data_raw\data' the data provided by Kaggle?
If so, the stage1 data Kaggle provides is not in the same format as LUNA. Do I need to convert the .dcm files provided by Kaggle into .raw and .mhd files? How can I do that?

csv_file

Hello Daniel,
If I want to adapt your code to LUNA16, how can I create my own stage1_masses_predictions.csv?

Thanks

Start file?

Hi @dhammack ,

Thanks for uploading the code. I have seen several different scripts for nodule detection. However, which file should I start with? Or are they all parallel? Thanks

what is branching?

Could you elaborate on what you mean by "branching":

It was also found that ‘branching’ the model earlier produced better results when training with multiple objectives. If branching is done too late, the model outputs are too correlated (as they share too many parameters) and thus they provide overall less information in the next stage of the pipeline.

My understanding is that you run a few iterations, adding (?) blocks and checking whether the network has started predicting; is that correct?
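For what it's worth, my reading of the quoted passage (an outside interpretation, not the author's answer): "branching" means splitting the network into separate per-objective heads at some depth; splitting earlier leaves a shorter shared trunk, so the heads share fewer parameters and their outputs are less correlated. A framework-free sketch with invented layer sizes:

```python
import numpy as np

rng = np.random.default_rng(0)

def dense(x, w):
    """One fully connected layer with ReLU activation."""
    return np.maximum(x @ w, 0.0)

# Shared trunk: layers whose parameters ALL objectives share.
trunk = [rng.normal(size=(16, 16)) for _ in range(2)]

# Branch point: after the trunk, each objective gets its own head.
# Branching earlier = shorter trunk and longer independent heads,
# so the two outputs share fewer parameters and correlate less.
head_malignancy = [rng.normal(size=(16, 8)), rng.normal(size=(8, 1))]
head_diameter   = [rng.normal(size=(16, 8)), rng.normal(size=(8, 1))]

def forward(x):
    for w in trunk:
        x = dense(x, w)
    out_a = x
    for w in head_malignancy:
        out_a = dense(out_a, w)
    out_b = x
    for w in head_diameter:
        out_b = dense(out_b, w)
    return out_a, out_b

out_a, out_b = forward(rng.normal(size=(4, 16)))
```

Moving the branch point is then just a matter of how many layers go in `trunk` versus the heads.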

Generation of annotations_enhanced.csv

Hi Daniel,

Had a question regarding the generation of annotations_enhanced.csv. The first few columns are fairly straightforward (from the LUNA16 annotations.csv and the .mhd file itself). But how were the 'margin', 'lobulation', 'spiculation', and 'malignancy' values generated? In the LIDC XML these features have integer values from 1 to 5, but in annotations_enhanced.csv they have fractional parts. Can you please explain?

seriesuid 1.3.6.1.4.1.14519.5.2.1.6279.6001.100398138793540579077826395208
margin 3.66667
lobulation 1.33333
spiculation 1.33333
malignancy 2.66667

Thanks.
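A plausible reading (an assumption on my part, not confirmed in this thread): each LUNA16 nodule was rated by several LIDC radiologists, and the fractional CSV values are the per-nodule means of those integer ratings. The per-radiologist ratings below are invented to match the quoted numbers:

```python
from statistics import mean

# Hypothetical per-radiologist ratings for one nodule (1-5 scale):
margin_ratings = [4, 4, 3]        # three radiologists rated margin
malignancy_ratings = [3, 3, 2]    # and malignancy

margin = round(mean(margin_ratings), 5)          # 3.66667, as in the CSV row
malignancy = round(mean(malignancy_ratings), 5)  # 2.66667
```

Values like 3.66667 and 1.33333 (thirds) are consistent with averaging over three raters.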

DSB

Hi @dhammack,
Lung cancer detection is my final-year project, so I have to work seriously on it. Can you help me with it?

slice indices must be integers or None

When I run your code, I get the error "slice indices must be integers".
Details:
--> 174 voxel = img[z_start:z_end,y_start:y_end,x_start:x_end]
voxel = undefined
img = array([[[-2048, -2048, -2048, ..., -2048, -2048,... -2048, ..., -2048, -2048, -2048]]], dtype=int16)
z_start = 68.5
z_end = 81.5
y_start = 268.5
y_end = 311.5
x_start = 75.5
x_end = 118.5
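One likely fix (my suggestion, not from the repo): the start/end coordinates have become floats, probably after rescaling by voxel spacing, and NumPy slicing requires integer indices, so cast the bounds before indexing:

```python
import numpy as np

def crop_voxel(img, z_start, z_end, y_start, y_end, x_start, x_end):
    """Crop a sub-volume, truncating fractional bounds to integers first."""
    z0, z1, y0, y1, x0, x1 = (int(v) for v in
                              (z_start, z_end, y_start, y_end, x_start, x_end))
    return img[z0:z1, y0:y1, x0:x1]

# Reproduce the failing call from the traceback with int-cast bounds:
img = np.full((120, 400, 400), -2048, dtype=np.int16)
voxel = crop_voxel(img, 68.5, 81.5, 268.5, 311.5, 75.5, 118.5)
```

Whether to truncate, floor, or round depends on how the bounds were computed upstream, but any of them will make the slice valid.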

build_luna_model_v29.py

about the file DSB2017-master\scoring_code\create_preds_from_model_outputs.py???

Hi Daniel,

When I run the file DSB2017-master\scoring_code\create_preds_from_model_outputs.py, I see line 264:

labels = pd.read_csv(r"F:\Flung\stage2\stage1plus2_labels.csv")

In the DSB2017-master\scoring_code folder I cannot find the file stage1plus2_labels.csv; I can only find stage1plus2_masses_predictions.csv. When I use stage1plus2_masses_predictions.csv in its place, the code produces an error. The error seems to come from the file format being wrong, so I guess stage1plus2_masses_predictions.csv cannot substitute for the file above.

I hope you can give me some advice, thanks.

License for the project

Hi!
I would like to try out your project/reproduce it, but as it stands right now I have no right to do so since there is no license attached.
Would you mind adding one?

data generator failure

Hi Daniel,

Last time I sent you an email to report a problem. I tried to run the /DSB2017-master/training_code/aws/nodule_des_v37b.py file, but I get this error:
File "nodule_des_v37b.py", line 121, in get_generator_static
ixs1 = np.random.choice(range(X1.shape[0]),size=n1,replace=False)
ValueError: Cannot take a larger sample than population when 'replace=False'

You suggested that the data didn't generate correctly.
But in the file /DSB2017-master/training_code/aws/data_generator_fn3.py I only modified the corresponding directories, so I wonder whether one of the input files has a problem:
annotations_enhanced.csv - I use the file in DSB2017-master/training_code/DLung.
candidates_V2.csv - because I did not find this file in your folder, I used the candidates_V2.csv from the sources provided by Julian, and I don't know whether that is a problem.
LUNA - I just use the whole LUNA16 dataset.

Secondly, I have some confusion about these 'None' values (see the attached screenshot).

Thirdly, about the error:
File "nodule_des_v37b.py", line 121, in get_generator_static
ixs1 = np.random.choice(range(X1.shape[0]),size=n1,replace=False)
ValueError: Cannot take a larger sample than population when 'replace=False'

When I change it to replace=True, I get the following results (screenshot attached):
There are also some 'None' values and 'MetaImage: M_ReadElementsData: data not read completely' messages.
I don't know what's wrong.
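For reference (my note, not from the thread): that ValueError means `n1` exceeds `X1.shape[0]`, i.e. fewer examples were generated than the sampling code expects, so switching to `replace=True` only masks the real problem. A small guard that makes the failure explicit:

```python
import numpy as np

def sample_indices(n_available, n_wanted, rng=None):
    """Sample n_wanted distinct indices out of n_available,
    failing with a clear message if the pool is too small."""
    if n_wanted > n_available:
        raise ValueError(
            f"Requested {n_wanted} samples but only {n_available} examples "
            "are available - check the data generation step.")
    rng = rng or np.random.default_rng()
    return rng.choice(n_available, size=n_wanted, replace=False)

ixs1 = sample_indices(100, 10)
```

If the guard fires, the thing to debug is the generator output size, not the sampling call.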

trained network

Hi Daniel,

Would it be possible to get access to your already-trained networks without needing to redo the training from scratch?

Thank you!
Laleh

load_model

When I load your trained model, I get an error like:
ValueError: Improper config format: {u'l2': 9.999999747378752e-05, u'name': u'L1L2Regularizer', u'l1': 0.0}

I have already tried all the Keras versions and still can't load your model.
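One workaround that sometimes helps with this class of error (an assumption on my part; the config shown looks like it was serialized by an older Keras under the name `L1L2Regularizer`): define a shim class that accepts the legacy config and pass it to `load_model` via `custom_objects`. Only the plain-Python shim is shown runnable below; the commented `load_model` call assumes Keras is installed:

```python
class L1L2Regularizer:
    """Shim for a legacy Keras 'L1L2Regularizer' config entry."""
    def __init__(self, l1=0.0, l2=0.0, **kwargs):
        self.l1 = float(l1)
        self.l2 = float(l2)

    def __call__(self, x):
        # Penalty = l1 * sum|x| + l2 * sum(x^2); needs backend tensor
        # ops in real use, so it is left unimplemented in this sketch.
        raise NotImplementedError("plug in your backend's tensor ops here")

    @classmethod
    def from_config(cls, config):
        config = dict(config)
        config.pop("name", None)   # legacy configs carry a 'name' key
        return cls(**config)

# Hypothetical usage with Keras (untested here):
# from keras.models import load_model
# model = load_model("model.h5",
#                    custom_objects={"L1L2Regularizer": L1L2Regularizer})

reg = L1L2Regularizer.from_config(
    {u"l2": 9.999999747378752e-05, u"name": u"L1L2Regularizer", u"l1": 0.0})
```

Whether this is enough depends on how deeply the old config format differs, but it at least gets past the "Improper config format" deserialization step for the regularizer entry.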
