plco-analysis's People

Contributors

shukwong

plco-analysis's Issues

modernize/update the checks

so the pipelines (and the configuration directory itself) have testing subdirectories that contain simple test scripts for integration with TAP/automake. you can generally run them from top-level by running make config-check (or, more generally, make {rulename}-check). they are nowhere near as expansive as they should be; furthermore, the tests for primary analysis are terribly slow since they are not set up in a way that allows parallelization.

please, Future Person, do a better job of this than the cobbled-together nonsense that's in there now!

error in aggregating results files in globus.Makefile

when phenotypes share a prefix, their results files are grouped together,
e.g. j_lung_cancer and j_lung_cancer_current_smokers got merged together in the final file.
This either needs to be fixed in the Makefile, or the phenotypes need names where neither one is a prefix of the other, e.g. change j_lung_cancer_current_smokers to j_current_smokers_lung_cancer.
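To make the failure mode concrete, here is a minimal Python sketch of the two matching strategies. It is not the code in globus.Makefile; the file naming scheme (`<phenotype>.<ancestry>.<tool>.tsv`) and both function names are assumptions made up for illustration.

```python
from collections import defaultdict

# Hypothetical result file names: "<phenotype>.<ancestry>.<tool>.tsv".
# The real layout used by the pipeline may differ.
FILES = [
    "j_lung_cancer.European.saige.tsv",
    "j_lung_cancer_current_smokers.European.saige.tsv",
]
PHENOTYPES = ["j_lung_cancer", "j_lung_cancer_current_smokers"]


def group_by_prefix(files, phenotypes):
    """Buggy grouping: any file whose name starts with the phenotype matches."""
    groups = defaultdict(list)
    for pheno in phenotypes:
        for name in files:
            if name.startswith(pheno):
                groups[pheno].append(name)
    return dict(groups)


def group_by_exact_name(files, phenotypes):
    """Safer grouping: the phenotype must be followed by the field delimiter."""
    groups = defaultdict(list)
    for pheno in phenotypes:
        for name in files:
            if name.startswith(pheno + "."):
                groups[pheno].append(name)
    return dict(groups)
```

With the prefix version, j_lung_cancer picks up both files; anchoring the match on the delimiter keeps the two phenotypes apart without renaming anything. In a Makefile the equivalent would be to match on the phenotype name plus its trailing delimiter rather than on a bare wildcard.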

chrX support

Atlas investigators have requested chrX support in the pipeline. This is not too difficult but requires pulling in imputed files generated by someone else. Each downstream tool handles chrX differently, so support needs to be cooked into each individual association pipeline.

patch to installable version

the first v1.0.0 release is literally the copy that was used for phenotype tranche 1. that does no one any good, as it has a bunch of baked-in nonsense and no conda environment specification. this is obviously an urgently needed fix.

add case-inclusion and case-exclusion options

similar to control-inclusion and control-exclusion, this is needed if we want to, say, examine lung cancer among never smokers. Another option is to add cohort inclusion/exclusion, but having both case and control inclusion/exclusion seems more general.
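A rough sketch of how symmetric case and control filters could behave, assuming each filter is a mapping from a variable name to a set of values. The function name, the argument names, and the subject representation are all hypothetical; the existing control-inclusion/control-exclusion options may be structured differently.

```python
def assign_status(subject, phenotype, case_inclusion=None, case_exclusion=None,
                  control_inclusion=None, control_exclusion=None):
    """Return 1 (case), 0 (control), or None (dropped from the analysis)."""

    def passes(inclusion, exclusion):
        inclusion = inclusion or {}
        exclusion = exclusion or {}
        # Every inclusion variable must take one of its allowed values.
        included = all(subject.get(k) in allowed for k, allowed in inclusion.items())
        # No exclusion variable may take one of its banned values.
        excluded = any(subject.get(k) in banned for k, banned in exclusion.items())
        return included and not excluded

    value = subject.get(phenotype)
    if value is None:
        return None  # missing phenotype: never a case or a control
    if value == 1:
        return 1 if passes(case_inclusion, case_exclusion) else None
    return 0 if passes(control_inclusion, control_exclusion) else None
```

For the never-smoker example, passing `case_inclusion={"smoking": {"never"}}` keeps never-smoking cases and drops all other cases, while controls are left to the control filters.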

replace git-lfs hosting with something, anything, elsewhere

The resources under annotations/ are large and currently tracked with git-lfs, which is fine for the moment; but I'd really like to put the files somewhere they could just be downloaded via wget from within a pipeline. Please, Future Person, save me from myself. (I contacted IT and they said they couldn't help me, but would try to find a solution in the future... good luck with that!)

clean up Makefile.config

Makefile.config is one of the oldest pieces of the pipeline. it's being used, no question; but it has an accumulated pile of garbage in it. of the various things to go through and clean up before departure, this is probably the highest priority. notably, it still has enumerated extension definitions in it when those are now primarily handled in a yaml file along with https://github.com/NCI-CGR/initialize_output_directories, so it's actually quite bad.

cleanup of incomplete pipelines

I drafted a variety of additional pipelines before we fixed the scope of the project, and those pipelines are in various states of disrepair after not being included in the primary dev process. need to flag those problematic ones and remove them.

custom max number of principal components estimated/used

I've started hearing rumors that the number of principal components used in association may be changed to some other number greater than the current 10. a nice gift to bequeath to my successor would be a configurable parameter for this somewhere in Makefile.config, and support for that increased ceiling in the model matrix constructor, such that they don't have to deal with an immediate extension buried within makefiles.
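As a sketch of what the model matrix side could look like once the count is a parameter: the helper below builds the covariate names from a configurable number instead of a hardcoded 10. The function name, the `PC<n>` column naming, and the `available` argument are assumptions for illustration, not the current constructor's interface.

```python
DEFAULT_MAX_PCS = 10  # the current fixed value mentioned above


def pc_covariates(n_pcs=DEFAULT_MAX_PCS, available=None):
    """Return covariate names PC1..PCn, checked against how many were estimated."""
    if n_pcs < 0:
        raise ValueError("number of principal components must be non-negative")
    if available is not None and n_pcs > available:
        raise ValueError(f"requested {n_pcs} PCs but only {available} were estimated")
    return [f"PC{i}" for i in range(1, n_pcs + 1)]
```

The check against `available` matters because raising the number used in association only works if the estimation step has also been told to produce that many components.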

tracking inspection

recently found an entire pipeline that hadn't been updated to the $(call log_handler) or $(call sub_handler) convention. need to go through all pipelines in all directories and ensure that everything that's in use has actually been updated.
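One way to do that sweep without reading every file by hand is a small scanner. The sketch below assumes that the handlers are invoked from recipe lines and that a simple regular expression is good enough to find rule headers; both are assumptions, and the function name is made up. It will miss anything built through includes or generated rules.

```python
import re

# Text fragments that mark a recipe as using the tracking convention.
HANDLERS = ("$(call log_handler", "$(call sub_handler")
# A rule header: starts in column 0, has a ':' that is not part of ':='.
RULE_START = re.compile(r"^([^\t#\s][^:=]*):(?!=)")


def rules_missing_handlers(makefile_text):
    """Return targets whose recipes mention neither handler."""
    missing = []
    target, recipe = None, []

    def flush():
        # Report the rule collected so far if it has a recipe but no handler.
        if target is not None and recipe and not any(
            h in line for line in recipe for h in HANDLERS
        ):
            missing.append(target)

    for line in makefile_text.splitlines():
        if line.startswith("\t"):
            recipe.append(line)
            continue
        match = RULE_START.match(line)
        if match:
            flush()
            target, recipe = match.group(1).strip(), []
    flush()
    return missing
```

Running this over each pipeline's Makefile text and printing the non-empty results gives a starting list to review manually.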

fix config check

make check-config is broken due to separate added support for deeper level yaml structures in config/*.yaml. especially if outside people are going to be using this pipeline in the near future, they're going to need to rely heavily on that check code to catch their doubtless many yaml formatting errors.
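A minimal sketch of a check that tolerates nesting, written against an already-parsed config (for example the dictionary a YAML loader would return). The schema format here, nested dictionaries whose leaves are Python types, is an assumption for illustration and is not how the existing check code is organized.

```python
def check_config(config, schema, path=""):
    """Recursively compare a parsed config against a schema; return error strings."""
    errors = []
    for key, expected in schema.items():
        where = f"{path}.{key}" if path else key
        if key not in config:
            errors.append(f"{where}: missing")
        elif isinstance(expected, dict):
            # A nested schema means the config value must itself be a mapping.
            if isinstance(config[key], dict):
                errors.extend(check_config(config[key], expected, where))
            else:
                errors.append(f"{where}: expected a mapping")
        elif not isinstance(config[key], expected):
            errors.append(f"{where}: expected {expected.__name__}")
    return errors
```

Because each error carries the dotted path to the offending key, someone new to the pipeline can see where in the YAML hierarchy the formatting went wrong, no matter how deep the structure goes.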

project agnostic nomenclature/inputs

this project was designed with the intention that it be used for the PLCO "atlas" project. later developments have indicated that there may be the need for this code to be used for other projects. some of the "PLCO" nomenclature is baked into the pipelines (this was initially targeted for removal during an early milestone for the project but was removed from the project plan after those milestones were scrapped by superiors).
