Giter VIP home page Giter VIP logo

evotum101's Introduction

Evotum101

Modelling tumor progression from a single VCF file

Evotum (demo) This perl script called ‘EvoTum_demo.pl’ is the accompanying code for the theoretical framework presented in “Estimating growth patterns and driver effects in tumor evolution from individual samples”, with authors Leonidas Salichos, William Meyerson, Jonathan Warrell and Mark Gerstein. Together, we provide 100 pseudo-VCF files as test play data. The files have been created based on real VCF files, but all personal and mutational information have been omitted or randomly changed. Mutational information about chromosome and position do not correspond to any true genomic coordinations.

Evotum dependencies For running EvoTum_demo.pl the user needs to install additional perl packages Statistics-R-0.33, Regexp-Common-0.01 and IPC-Run-0.80.

The following libraries need to be modified at script lines 3-5:

use lib “your-path-to/R/Statistics-R-0.33/lib/";

use lib “your-path-to/R/Regexp-Common-0.01/lib/";

use lib "your-path-to/R/IPC-Run-0.80/lib/";

Input files

The script accepts (pseudo)VCF as text files -see test files- where the user has to provide which tab denotes the chromosome, the reference and altered nucleotide and the VAF frequency. The script does not correct for ploidy and CNV. On line 66, the user can de: ######### Define VCF columns

$chrom_col=0; # Chromosome column

$pos_col=1; # Mutational position column

$vaf_col=7; # VAF column

$ref_nt_col=3; # Reference nucleotide column

$alt_nt_col=4; # Altered nucleotide column

$pat_id_def="provided-or-file-ID"; # Patient ID

Parameters As also described in the script, the user can run different modes or modify a number of parameters:

I) Modes (script line 25)

• $fit_expo=1; #Runs the main algorithm. Can be set to 0 for only generating the intermediate vcf files. Their format is ID_Chromosome_MutationalPosition_RefNT_AltNT_VAF.

• $resume_mode=0; # When running a list or multiple jobs, resume_mode=1 deters from running the same VCF again. To rerun delete corresponding $resume_file or set to 0.

• $resume_vcf_mode=0; # If equal to ‘1’, this mode restarts the algorithm from the last mutation it analyzed based on a log text file called "RESUME_temp_XXXX". Not recommended at this demo stage. Instead, the user can directly re-set the starting mutations in the command line (see examples).

• $sampling_r_mode=-1; # Leave '-1' for the DEFAULT approach with direct Λ-ambda calculation when estimating growth r.

• $ generation_mode=1; # DEFAULT measure of k. Script is optimizing for local generational time tg. Growth r is calculated independently with direct Λ-ambda calculation.

• $general_classic_mode=0; # Recommended for first or defining main clonal mutation. No optimization for generational time tg. Vulnerable to initial perturbation (e.g. CNV). Hitchiker generations are starting from generation 1 corresponding to hitchhiker 1 without calibration or offset.

• $reoptimize_r_mode=0; # Similar to $general_classic_mode, no optimization for generational time tg, but growth r is estimated through “Non Linear Squares” optimization and not Λ-ambda.

• $verbose_mode=1; # Print results on screen while running in addition to files; Helpful to immediately pick up abort-mutation when the algorithm exit due to bad optimization (see examples below).

Modes $generation_mode, $general_classic_mode and $reoptimize_r_mode are not exclusive and can be run at the same time. II) Parameters (line 37)

• $mut_pass_window=150; # The size of the sliding window for number of m=150 hitchhikers; No k and r values will be reported for the first m of the sample's mutations. In our analysis, we normally used m=100 or m=150. Too large will often result in reduced k vales and averaged growth. Too small will be more unstable to small changes and model optimization and parametrization.

• $coverage=0; # Resample VAF Frequency to simulate equal or lower than 1000x coverage. Use '0' for no resampling.

Useful script changes

  • Switching between VAF and frequency: Line 141, modify 2*VAF

  • Changing output directory: Line 46. Currently output is printed inside TESTFILES folder

  • Popping VAF frequency: Sometimes it might be hard to define the frequency-column in a pseudo vcf file. At the same time, VAF is often found as/at the last column. User can switch between lines 137 and 138 to pop the frequency, if at the last column of the text file.

Examples The output files can be found inside TESTFILES/OUTFILES/.

To run the algorithm the user needs to specify at least the file name. Using our play test data included in TESTFILES, the simplest form is:

perl EvoTum_demo.pl TESTFILES/108.psdVCF

This will run the script until ordered mutation number 517 or “Ordered mut.:517” where R will fail to optimize and exit. Consequently, the user can type:

perl EvoTum_demo.pl TESTFILES/108.psdVCF 518

This will restart the algorithm from mutation number 518. The user can superimpose a new sliding-window-size by providing an additional third argument as:

perl EvoTum_demo.pl TESTFILES/108.psdVCF 0 120, where 0 is the starting mutation and 120 the size of the sliding window.

File output When run as previously, the algorithm prints in new TAB text file TESTFILES/OUTFILES/108.psdVCF_cov0 with the following output:

1st Column/Ordered_mut.: the number of the ordered mutation.

2nd Column/genTime: Estimated generational time tg.

3rd Column/Independent_pos._growth_r: Independent calculation of positive growth (if r >0).

4th Column/ Independent_neg._growth_r: Independent calculation of negative growth (if r >0).

5th Column/ DEFAULT_scalar_k_with_tg_opt | scalar_k_with_classic_mode(no_tg_opt) | scalar_k_re-opt_with_dependent_r: scalar multiplier k. If more than one modes are used, different scalar ks are separated with space bar.

6th Column/ kr_growth: kr describes the driver’s new growth where scalar k is multiplied with hitchhiking growth to estimate new base-growth for fitness population.

7th Column/ DEFAULT_growth_r_with_tg_opt| Independent_growth_r_calc| growth_r_dependent_re-opt(no_tg): hitchhiker base growth r parameter similar to columns 3&4. This column is included in case the user selects mode reoptimize_r_mode which provides a new re-optimized growth rate r’ based on NLS optimization of m hitchhikers instead of a direct calculation through Λ-ambda.

8th Column/e.g. 108.psdVCF_16_18542407_G_A_0.808: Provides information on the specific mutations, including sample id (108.psdVCF), exact position (chrom+position), nucleotide change and mutation frequency.

evotum101's People

Contributors

ls926 avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

evotum101's Issues

Installation / script issues

Hi,
Thank you for developing evotum101. However, I seem to be unable to successfully run the script. Is the the most recent version?
After adding the locations of Statistics, IPC and Regexp perl libraries to the script, running it results in this output, and aside from the creation of the output folder, no files are written.

% perl EvoTum_demo.pl TESTFILES/108.psdVCF
Arguments given:
name-of-file:TESTFILES/108.psdVCF
nr-of-starting-mutation:
path given is:TESTFILES
fstline_name:	irst_name:
resume_vcf:TESTFILES/OUTFILES/RESUME_VCF_108.psdVCF
vcf_pseudo:TESTFILES/OUTFILES/VCF_intermed_1000/
first_name:
resume vcf is on. Starting from:1:
rm: TESTFILES/OUTFILES/RESUME_VCF_108.psdVCF: No such file or directory
rm: TESTFILES/OUTFILES/VCF_intermed_1000/: is a directory
Vaf 1 is  . CHECK for prevelance
clonal_fitness2 mode selected
full date is date8-49-22_4-15-120
Supercalifragilous

Larg-ish blocks of the code are commented out and some of the variables mentioned in the README don't exist in the code (i.e. $verbose_mode).
I am probably missing something here - any clue why I am failing?

TESTFILES no output

i set library path
use lib "/usr/lib64/perl5/R/Statistics-R-0.33/lib/";
use lib "/usr/lib64/perl5/R/Regexp-Common-0.01/lib/";
use lib "/usr/lib64/perl5/R/IPC-Run-0.80/lib/";
use lib "/usr/lib64/perl5/R/Regexp-Common-0.01/lib/Regexp/";
use lib "/usr/lib64/perl5/R/IPC-Run-0.80/lib/IPC";
use lib "/usr/lib64/perl5/R/Statistics-R-0.33/lib/Statistics/";

and run command line: perl EvoTum_demo.pl TESTFILES/108.psdVCF

output
Arguments given:
name-of-file:TESTFILES/108.psdVCF
nr-of-starting-mutation:
path given is:TESTFILES
fstline_name: irst_name:
resume_vcf:TESTFILES/OUTFILES/RESUME_VCF_108.psdVCF
vcf_pseudo:TESTFILES/OUTFILES/VCF_intermed_1000/
first_name:
resume vcf is on. Starting from:1:
rm: cannot remove ‘TESTFILES/OUTFILES/RESUME_VCF_108.psdVCF’: No such file or directory
rm: cannot remove ‘TESTFILES/OUTFILES/VCF_intermed_1000/’: Is a directory
Vaf 1 is . CHECK for prevelance
clonal_fitness2 mode selected
full date is date4-29-33_7-12-120
Supercalifragilous

However, there no output and also there are no RESUME_VCF_108.psdVCF at starting point.

what am I doing wrong?

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.