wayscience / cell-health-data
Data processing for the Cell Painting data from the Cell Health experiments
While attempting to normalize cells with the DeepProfiler tools from https://github.com/cytomining/pycytominer, @jenna-tomkinson and I ran out of memory. The Cell Health dataset is large enough that the single-cell dataframe PyCytominer attempts to compile and normalize does not fit into the 64 GB of memory on our machine. We are therefore pursuing a couple of different alterations to PyCytominer to make it capable of normalizing the dataset.
Our information about the choice of normalization population comes from "Data-analysis strategies for image-based cell profiling" by Caicedo et al.
Normalization by all samples:
Caicedo et al. say the following about choosing all samples as the normalization population:
"Ideally, features are normalized across an entire screen in which batch effects are absent"
In a perfect world, we would normalize across all single cells. This is not viable for us because the full single-cell dataframe does not fit in memory.
Normalization by plates:
Caicedo et al. say the following about choosing plates as the normalization population:
"normalization within plates is generally performed to correct for batch effects (described in 'Batch-effect correction')... all samples on a plate can be used as the normalizing population when negative controls are unavailable, too few, or unsuitable for some reason, and when samples on each plate are expected to not be enriched in dramatic phenotypes."
We are not attempting to correct for batch effects in the Cell Health data. The claim that our data is "not enriched in dramatic phenotypes" is somewhat subjective, which makes it a questionable basis for choosing this normalization method.
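For reference, a minimal sketch of what within-plate normalization would look like, assuming each cell is a plain dictionary with a hypothetical `plate` key and a numeric feature. It uses only the Python standard library and is not PyCytominer's API. The choice of population standard deviation is also an assumption.

```python
from collections import defaultdict
from statistics import fmean, pstdev


def normalize_within_plates(cells, feature):
    """Z-score one feature, using every cell on the same plate as the reference.

    `cells` is a list of dicts; the "plate" key and the feature name are
    hypothetical and chosen only for illustration.
    """
    # Group the raw feature values by plate.
    values_by_plate = defaultdict(list)
    for cell in cells:
        values_by_plate[cell["plate"]].append(cell[feature])

    # One (mean, population std) pair per plate.
    stats = {
        plate: (fmean(values), pstdev(values))
        for plate, values in values_by_plate.items()
    }

    normalized = []
    for cell in cells:
        mean, std = stats[cell["plate"]]
        # Guard against constant features, which have zero spread.
        scaled = (cell[feature] - mean) / std if std > 0 else 0.0
        normalized.append({**cell, feature: scaled})
    return normalized
```

Because each plate only needs its own cells, plates could be processed one at a time, which keeps peak memory at the size of the largest plate rather than the whole dataset.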
Normalization by controls:
Caicedo et al. say the following about choosing controls as the normalization population:
"When choosing the normalizing population, we suggest the use of control samples (assuming that they are present in sufficient quantity), because the presence of dramatic phenotypes may confound results. This procedure is good practice regardless of the normalization being performed within plates or across the screen."
Given the abundance of controls in Cell Health data, this seems like the most viable method of normalization.
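One way control-based normalization could avoid loading every cell at once is a two-pass approach: first stream through the data in chunks and accumulate the mean and standard deviation of the control cells only, then stream through again and scale every cell with those statistics. The sketch below is an assumption about how this might be done, not PyCytominer's implementation. The `is_control` predicate, the dictionary layout, and the use of population standard deviation are all hypothetical. `chunks` must be iterable twice, for example by re-reading files from disk.

```python
import math


class RunningStats:
    """Welford's online algorithm: mean and variance without storing values."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, value):
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (value - self.mean)

    @property
    def std(self):
        # Population standard deviation (an assumption of this sketch).
        return math.sqrt(self._m2 / self.count) if self.count else 0.0


def fit_on_controls(chunks, feature, is_control):
    """Pass 1: accumulate statistics from control cells only."""
    stats = RunningStats()
    for chunk in chunks:
        for cell in chunk:
            if is_control(cell):
                stats.update(cell[feature])
    return stats


def apply_normalization(chunks, feature, stats):
    """Pass 2: yield each chunk with the feature scaled by the control stats."""
    for chunk in chunks:
        scaled_chunk = []
        for cell in chunk:
            if stats.std > 0:
                scaled = (cell[feature] - stats.mean) / stats.std
            else:
                scaled = 0.0  # constant controls carry no scale information
            scaled_chunk.append({**cell, feature: scaled})
        yield scaled_chunk
```

Only one chunk and a few running totals per feature are held in memory at any time, so the peak footprint depends on the chunk size rather than on the total number of cells.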