cell_organelle_classifer's Introduction

Cell Organelle Identifier Using Fluorescent Proteins

For my final project in Flatiron School's Data Immersive Course, I wanted to focus on image classification again. After my personal blog presentation on using SSIM to measure pixel similarity between pictures and Module 4's animal image classification, I found analysing unstructured data very satisfying. In addition, I studied biology as an undergrad and interned at a breast cancer research lab for a year before joining this bootcamp, so I wanted to combine these two passions. Unfortunately, labeled pictures of cancer cells are extremely difficult to obtain because they are proprietary data of each medical institute. So I focused on building a model that can identify cell organelles based on their structures. Each organelle has been tagged with a specific fluorescent protein. Each protein binds only to a specific organelle and, once bound, gives off a color (most commonly green, though others exist, such as DAPI staining for the nucleus).

Data

I worked with two different datasets. Neither dataset is in the GitHub repo, but I will link them below:

Dataset 1

The first dataset consists of labeled pictures of yeast cells from Chong et al. Yeast was used instead of human cells because the two have a 90% similarity in terms of cell organelle structures. The dataset had 11 organelles labeled, but I decided to work with only 6.

Issues with Dataset 1

The source stated that the pictures were of high resolution. So, using what I know about transfer learning with the VGG16 model, I built a few models and trained them on the images. Since there were many pictures for each label, this took a while. Afterward, I looked at basic evaluation metrics and saw that my models were not performing above baseline. Only then did I actually look at the pictures, and I saw that the resolution was in fact terrible. I still went ahead and kept modifying my model until I finally made one that predicts at double the baseline. That can be found in my Dataset 1 Final Notebook.
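To make "baseline" concrete, here is a minimal sketch of two common reference points for a classifier: always predicting the most frequent class, and guessing uniformly at random. The function names and the balanced example labels are hypothetical and are not taken from the notebook; the actual baseline used in the project may have been computed differently.

```python
from collections import Counter


def majority_baseline(labels):
    """Accuracy obtained by always predicting the most common class."""
    counts = Counter(labels)
    return max(counts.values()) / len(labels)


def chance_baseline(labels):
    """Expected accuracy of uniform random guessing over the observed classes."""
    return 1 / len(set(labels))


# Hypothetical, perfectly balanced label list with 6 organelle classes.
example_labels = [i % 6 for i in range(600)]

print("majority baseline:", majority_baseline(example_labels))
print("chance baseline:", chance_baseline(example_labels))
```

With 6 balanced classes both baselines come out to one sixth, so "double the baseline" would correspond to roughly one in three predictions being correct under that assumption.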

Results

Here are the training and validation accuracy graphs for the model:

screen shot 2019-02-11 at 5 53 49 pm

screen shot 2019-02-11 at 5 53 35 pm

Here are the classification metrics and confusion matrix for dataset 1:

screen shot 2019-02-11 at 5 38 33 pm

screen shot 2019-02-11 at 5 38 44 pm

Without higher image resolution, I was not able to make accurate predictions.
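For readers unfamiliar with the confusion matrix mentioned above, the following is a small, dependency-free sketch of how one is built and how per-class recall is read off it. The labels are made up for illustration and assume integer class ids; the notebook itself most likely relied on a library routine instead.

```python
def confusion_matrix(y_true, y_pred, n_classes):
    """Rows are true classes, columns are predicted classes."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for true_label, predicted_label in zip(y_true, y_pred):
        matrix[true_label][predicted_label] += 1
    return matrix


def per_class_recall(matrix):
    """Fraction of each true class that was predicted correctly."""
    recalls = []
    for class_id, row in enumerate(matrix):
        total = sum(row)
        recalls.append(row[class_id] / total if total else 0.0)
    return recalls


# Hypothetical labels for 3 classes, for illustration only.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]

matrix = confusion_matrix(y_true, y_pred, n_classes=3)
for row in matrix:
    print(row)
print(per_class_recall(matrix))
```

Off-diagonal counts show which organelles get mistaken for one another, which is often more informative than a single accuracy number when images are low resolution.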

Dataset 2

The second dataset is from the Human Protein Atlas, which published a dataset on Kaggle. The dataset can be found in HPA-Kaggle. It had higher resolution pictures, but they were unlabeled, with the labels stored separately in a CSV file. Using os.path.join, that was relatively easy to fix.
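A minimal sketch of that linking step is shown below. It assumes the CSV has an `Id` column and a `Target` column of space-separated label ids, and that each image file is named after its id plus a suffix such as `_green.png`. Those column names, the suffix, and the function name `link_labels` are assumptions for illustration, so check them against the actual download.

```python
import csv
import os


def link_labels(csv_path, image_dir, suffix="_green.png"):
    """Return (image_path, [label ids]) pairs, one per CSV row.

    Assumed layout: an "Id" column and a "Target" column holding
    space-separated integer labels.
    """
    pairs = []
    with open(csv_path, newline="") as handle:
        for row in csv.DictReader(handle):
            # Build the image path from the directory and the row's id.
            image_path = os.path.join(image_dir, row["Id"] + suffix)
            labels = [int(token) for token in row["Target"].split()]
            pairs.append((image_path, labels))
    return pairs
```

Calling `link_labels("train.csv", "train")` with hypothetical paths would return a list that can be fed to whatever image loader the model uses.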

The final notebook for dataset 2 can be found in my Dataset 2 Final Notebook. After linking the labels, here are the categories: image

After dropping categories with fewer than 200 samples, I ran the data through the model I built for dataset 1. After tweaking the model some more, I finally achieved a high accuracy result.
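The filtering step can be sketched as follows, reading it as removing every category that has fewer than 200 images. This is a simplified illustration that assumes one label per image; the function name `drop_rare_classes` and the example data are hypothetical, and the notebook may handle multi-label images differently.

```python
from collections import Counter


def drop_rare_classes(samples, min_count=200):
    """Keep only samples whose label occurs at least min_count times.

    samples is a list of (image_path, label) pairs.
    """
    counts = Counter(label for _, label in samples)
    return [(path, label) for path, label in samples if counts[label] >= min_count]


# Hypothetical data: class 0 has 250 images, class 1 has only 50.
samples = [(f"img_{i}.png", 0) for i in range(250)]
samples += [(f"rare_{i}.png", 1) for i in range(50)]

kept = drop_rare_classes(samples, min_count=200)
print(len(kept), {label for _, label in kept})
```

Removing sparse categories this way trades coverage for stability, since a class with very few images gives the model little to learn from.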

Results

screen shot 2019-02-11 at 5 51 43 pm

And here are the training and validation graphs: image

Here are some examples of its predictions: image

As you can see in the last picture, it was able to identify some aggresome cells, which are the first stages of cancer occurrence. However, I will need many more pictures of different variations of aggresome cells to predict them accurately.

Conclusion

Although I am confident in my algorithm's ability to train on and classify cell organelles, I was not able to achieve the high accuracy I wanted due to lower resolution pictures. My goal is to get high resolution pictures of cancer cells and run them through this model. This could potentially help detect cancer cells at the same level as a trained professional.

cell_organelle_classifer's People

Contributors

imamun93
