nci-cgr / plco-analysis
Primary workflow for the PLCO "Atlas" project
The resources under annotations/ are large and currently tracked with git-lfs, which is fine for the moment; but I'd really like to just put the files somewhere they could be downloaded via wget in a pipeline. Please, Future Person, save me from myself. (I contacted IT and they said they couldn't help me, but would try to find a solution in the future... good luck with that!)
I drafted a variety of additional pipelines before we fixed the scope of the project, and those pipelines are in various states of disrepair after being left out of the primary dev process. need to flag the problematic ones and remove them.
when phenotypes share the same prefix, they are grouped together; e.g. j_lung_cancer and j_lung_cancer_current_smokers got merged together in the final file.
This either needs to be fixed in the Makefile, or the phenotypes need to have different prefixes, e.g. change j_lung_cancer_current_smokers to j_current_smokers_lung_cancer?
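A minimal sketch of the collision described above, in Python rather than make for readability. The filenames and the {phenotype}.{ancestry}.{method}.tsv layout are assumptions for illustration, not the pipeline's confirmed naming scheme. The point is only that a bare prefix match sweeps in the longer phenotype name, while a match anchored on the delimiter does not.

```python
import re

# Hypothetical result files, assuming a {phenotype}.{ancestry}.{method}.tsv layout.
FILES = [
    "j_lung_cancer.European.SAIGE.tsv",
    "j_lung_cancer.East_Asian.SAIGE.tsv",
    "j_lung_cancer_current_smokers.European.SAIGE.tsv",
]


def match_by_prefix(phenotype, names):
    """Bare prefix match: also picks up any longer phenotype sharing the prefix."""
    return [n for n in names if n.startswith(phenotype)]


def match_exact(phenotype, names):
    """Require the '.' delimiter immediately after the phenotype name."""
    pattern = re.compile(re.escape(phenotype) + r"\.")
    return [n for n in names if pattern.match(n)]
```

If the merge step in the Makefile collects files with something like a wildcard on the phenotype name, anchoring that wildcard on the delimiter would avoid having to rename phenotypes.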
I've started hearing rumors that the number of principal components used in association may be changed to some other number greater than the current 10. a nice gift to bequeath to my successor would be a configurable parameter for this somewhere in Makefile.config, and support for that increased ceiling in the model matrix constructor, so that they don't have to deal with an immediate extension buried within makefiles.
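As a rough sketch of what that could look like on the model matrix side: the column names PC1..PCn, the function name, and the n_pcs parameter are all hypothetical here, and the real constructor may be organized quite differently. The idea is just that the PC count arrives as one parameter instead of being enumerated by hand.

```python
def covariate_columns(n_pcs=10, extra=()):
    """Build the covariate column list for an association model.

    n_pcs is a hypothetical setting that would come from configuration;
    the PC1..PCn column naming is an assumption as well.
    """
    if n_pcs < 0:
        raise ValueError("n_pcs must be non-negative")
    return list(extra) + [f"PC{i}" for i in range(1, n_pcs + 1)]
```

Calling covariate_columns(12, extra=("age", "sex")) would then extend the model to twelve components without touching any rule bodies.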
which is bad, when ldsc/ldscores pipelines assume python2
existing method is a hack with expected anticonservative bias. bakeoff different scalable regression methods
similar to control-inclusion and control-exclusion, this is needed if we want to, say, examine lung cancer among never smokers. Another option is to add cohort inclusion/exclusion, but having both case and control inclusion/exclusion seems more general
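One possible reading of case inclusion, sketched under loud assumptions: cases that fail the inclusion rule are set to missing and controls are left alone. The field names, the 1/0/None coding, and the function itself are hypothetical and not taken from the pipeline.

```python
def apply_case_inclusion(subjects, phenotype, include):
    """Return copies of subjects where cases failing `include` are set to missing.

    Assumed coding: 1 = case, 0 = control, None = missing. Controls are untouched,
    which is what distinguishes this from a cohort-wide inclusion filter.
    """
    out = []
    for subject in subjects:
        subject = dict(subject)  # copy so the input records are not modified
        if subject[phenotype] == 1 and not include(subject):
            subject[phenotype] = None
        out.append(subject)
    return out
```

Pairing this with the existing control-side filter using the same rule would give the effect of a cohort-level filter, which is why the two-sided version seems more general.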
the first v1.0.0 release is literally the copy that was used for phenotype tranche 1. that does no one any good, as it has a bunch of baked in nonsense and no conda environment specification. this is obviously an urgently needed fix.
the milestone tracker on the README is ancient, fix that
so the pipelines (and configuration directory itself) have testing subdirectories that contain simple test scripts for integration with TAP/automake. you can generally run them from top-level by running make config-check (or generally make {rulename}-check). they are not nearly, not even close, as expansive as they should be; furthermore the tests for primary analysis are terribly slow since they are not set up in a way that allows parallelization.
please, Future Person, do a better job of this than the cobbled-together nonsense that's in there now!
this is so that we can run ldsc after extra steps
when the sample size of one of the comparisons is below the threshold, the pipeline does not produce the tracking files that inform the next steps, and then the pipeline breaks when running meta:
make[1]: *** No rule to make target '.../?.SAIGE.categorical-combined.tsv.success', needed by '/.../?.SAIGE.final-ids.tsv.success'. Stop.
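One way to avoid the missing-target failure, sketched under assumptions: always write a tracking file, and record in it whether the comparison actually ran. The tracker format, the status values, and both function names are hypothetical; the real tracking files may carry different content.

```python
from pathlib import Path


def write_tracker(path, n_samples, min_samples):
    """Always write a tracker; record whether the comparison met the threshold."""
    status = "run" if n_samples >= min_samples else "skipped"
    Path(path).write_text(f"{status}\t{n_samples}\n")
    return status


def comparisons_for_meta(tracker_paths):
    """Keep only comparisons whose tracker says they actually ran."""
    keep = []
    for path in tracker_paths:
        status = Path(path).read_text().split("\t")[0]
        if status == "run":
            keep.append(path)
    return keep
```

With this shape the downstream rule always finds its prerequisite, and the decision to drop an underpowered comparison moves into the meta step instead of surfacing as a make error.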
recently found an entire pipeline that hadn't been updated to the $(call log_handler) or $(call sub_handler) convention. need to go through all pipelines in all directories and ensure that everything that's in use has actually been updated
now that this is publicly available, host rst documentation for readthedocs
Makefile.config is one of the single oldest pieces of the pipeline. it's being used, no question; but it has an accumulated pile of garbage in it. of the various things to go through and clean up before departure, this is probably the highest priority. notably, it still has enumerated extension definitions in it when those are now being primarily handled in a yaml file along with https://github.com/NCI-CGR/initialize_output_directories, so it's actually quite bad.
this project was designed with the intention that it be used for the PLCO "atlas" project. later developments have indicated that there may be the need for this code to be used for other projects. some of the "PLCO" nomenclature is baked into the pipelines (this was initially targeted for removal during an early milestone for the project but was removed from the project plan after those milestones were scrapped by superiors).
Atlas investigators have requested chrX support in the pipeline. This is not too difficult but requires pulling in imputed files generated by someone else. Each downstream tool handles chrX differently, so support needs to be cooked into each individual association pipeline.
make check-config is broken due to separately added support for deeper level yaml structures in config/*.yaml. especially if outside people are going to be using this pipeline in the near future, they're going to need to rely heavily on that check code to catch their doubtless many yaml formatting errors.
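A sketch of how a check could handle nested structures, assuming the yaml has already been parsed into plain dicts. The schema keys below are invented for illustration and do not reflect the actual config/*.yaml layout; the recursion is the part that matters.

```python
# Hypothetical schema: leaf values are expected types, nested dicts are nested mappings.
EXAMPLE_SCHEMA = {
    "analysis_prefix": str,
    "models": {"covariates": list},
}


def check_config(config, schema, path=""):
    """Return a list of error strings; an empty list means the config matches."""
    errors = []
    for key, expected in schema.items():
        where = f"{path}/{key}"
        if key not in config:
            errors.append(f"{where}: missing")
        elif isinstance(expected, dict):
            if isinstance(config[key], dict):
                # Recurse so arbitrarily deep structures are checked the same way.
                errors.extend(check_config(config[key], expected, where))
            else:
                errors.append(f"{where}: expected mapping")
        elif not isinstance(config[key], expected):
            errors.append(f"{where}: expected {expected.__name__}")
    return errors
```

Collecting every error with its full path, instead of stopping at the first, should make it easier for outside users to fix a config in one pass.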