scholl-lab / vcf-filtering
A collection of scripts for filtering annotated variant call format files
License: MIT License
When we convert the TSV values using replace_gt_with_sample.sh and the GT field contains only numbers separated by commas, Excel converts these to numbers.
Solution:
This should include SnpSift, awk, sed, tee and bash.
In the R script, add echo messages for the package versions and the R version running.
Similar to the output from "sed --version".
Also add script author information.
To make usage easier, the repository README should include instructions on how to install required tools and libraries, such as conda.
Use our final script "filter_phenotypes.sh" in "filter_variants.sh" to output the phenotypes of all filtered samples. If the xlsx flag is set, the output of "filter_phenotypes.sh" should go in a separate sheet. If the output is TSV (the default), there will be two files with the same basename, with ".phenotypes" appended to the basename of the "filter_phenotypes.sh" output. If the output is stdout, the phenotypes should not be printed. In general, there should be an optional flag to request phenotypes. Also, the location of the "filter_phenotypes.sh" script and its input-file argument should be added as arguments to this script.
TODO:
Add an option to filter by position or for specific variants instead of gene name; check the reference allele for input variants.
The script can produce unexpected output if the GT field does not contain VCF-style genotypes.
Implement a check for correct data format in this column.
The -V and -h options do not appear to be implemented in the argument parsing or in the long-to-short argument handling.
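A minimal sketch of such a check, assuming each value holds a single genotype made of allele indices or "." separated by "/" or "|" (for example "0/1" or "1|0"). The function names are hypothetical and not part of the repository. A field that packs several samples into one comma-separated value would need to be split before validation.

```shell
# Hypothetical helper: returns 0 if the value looks like a VCF-style genotype
# (allele indices or "." separated by "/" or "|"), and 1 otherwise.
is_vcf_genotype() {
    printf '%s\n' "$1" | grep -Eq '^(\.|[0-9]+)([/|](\.|[0-9]+))*$'
}

# Hypothetical helper: reads GT values (one per line) from stdin and
# reports the first malformed value on stderr.
check_gt_column() {
    line_no=0
    while IFS= read -r gt; do
        line_no=$((line_no + 1))
        if ! is_vcf_genotype "$gt"; then
            echo "Error: unexpected GT value '$gt' on line $line_no" >&2
            return 1
        fi
    done
    return 0
}
```

The GT column could be piped into the check, for instance with `cut -f N file.tsv | check_gt_column`, where N is the column index of the GT field.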
Make tsv_to_excel.R location an argument with defaults.
For conversion, the script "tsv_to_excel.R" should accept various delimiter-separated file formats (including CSV).
TODO:
Make the script handle CSV or TSV files with different delimiters. Add an option to specify the delimiter (output). This makes the script more flexible in dealing with various file formats.
Add a script that will generate screenshots of the variants' alignment in the IGV browser so that it can be validated. This should be done for all the variants passing filtering. The screenshot should include at least one control.
TODO:
Add semantic versioning and a version flag to replace_gt_with_sample.sh
Show the help message by default when no arguments are passed, to match the behaviour of the other scripts.
Currently all arguments for "replace_gt_with_sample.sh" are hardcoded. To allow more flexibility, there should be an argument to pass script-specific settings from the master script.
For reproducibility, it would be helpful to add meta information to the Excel output, including the filter settings and column documentation.
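One way the version flag and the default help message could be handled is sketched below. The version number, the help text and the function names are placeholders, not the script's actual interface; the -h and -V flag names follow the other scripts.

```shell
# Placeholder values; the real script would define its own.
SCRIPT_NAME="replace_gt_with_sample.sh"
SCRIPT_VERSION="0.1.0"

print_help() {
    echo "Usage: $SCRIPT_NAME [options]"
    echo "  -h, --help     Show this help message"
    echo "  -V, --version  Show the script version"
}

# Returns 0 if a basic flag was handled (the caller should then exit),
# and 1 if normal argument parsing should continue.
handle_basic_flags() {
    if [ "$#" -eq 0 ]; then
        print_help
        return 0
    fi
    case "$1" in
        -h|--help) print_help; return 0 ;;
        -V|--version) echo "$SCRIPT_NAME version $SCRIPT_VERSION"; return 0 ;;
    esac
    return 1
}
```

At the top of the script this could be called as `handle_basic_flags "$@" && exit 0`, before the regular option parsing.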
TODO:
Include in all scripts the requirements and software or package versions that have been tested.
The scripts should also echo their name and this should be set as a variable in each script.
Needed for #4
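As a rough sketch, each script could store its name in a variable and report whether its required tools are available. The script name shown and the function name are assumptions for illustration; recording the tested version of each tool would still have to be added per tool, since version flags differ between implementations.

```shell
# Assumed name for illustration; could also be set with: basename "$0"
SCRIPT_NAME="filter_variants.sh"

# Hypothetical helper: prints one line per tool given as an argument and
# returns 1 if any of them is not found on the PATH.
check_requirements() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "[$SCRIPT_NAME] found: $tool"
        else
            echo "[$SCRIPT_NAME] missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

A call such as `check_requirements awk sed tee bash || exit 1` near the top of a script would stop early when a tool is missing.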
Currently the default values overwrite the config values in the script. This is not the expected or intended behaviour.
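One possible fix is to apply defaults only after the config file has been read, and only to variables that are still unset or empty. The sketch below assumes the config is a shell file that is sourced first; the variable names and default values are hypothetical.

```shell
# Hypothetical settings. The ":=" expansion assigns the default only when
# the variable is unset or empty, so values from the config are kept.
apply_defaults() {
    : "${output_format:=tsv}"
    : "${sample_separator:=,}"
}
```

In a script, the order would be to source the config file, then call `apply_defaults`, then parse the command-line arguments so that they can override both.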
Add basic statistics for the filtered table in echo command.
This could include for example: