
drugz's Introduction

drugz

DrugZ software from the Hart Lab

DrugZ detects synergistic and suppressor drug-gene interactions in CRISPR screens.

usage: drugz.py [-h] [-i sgRNA_count.txt] [-o drugz-output.txt]
                [-f drugz-foldchange.txt] -c control samples -x drug samples
                [-r remove genes] [-p pseudocount] [-I INDEX_COLUMN]
                [--minobs minObs] [--half_window_size half_window_size] [-q]
                [-unpaired]

-i                  Readcount file, tab-delimited text (input)
-o                  DrugZ results file, tab-delimited text (output)
-f                  DrugZ Z-transformed fold change file (optional)
-c                  Control samples: comma-delimited list of column headers in readcount file
-x                  Treated samples: comma-delimited list of column headers in readcount file
-r                  Comma-delimited list of genes to remove before analysis
-p                  Pseudocount to add to all readcounts; prevents log(0) problems (default=5)
-I                  Index column (default=0)
--minobs            Ignore genes with fewer observations (gRNAs/gene x replicates) (default=1)
--half_window_size  Size of the first bin and half the size of the initial sample (window)
                    used to estimate std (default=500)
-unpaired           Unpaired approach: compares mean(treated samples) to mean(control samples) (default=False)
-q                  Quiet mode: suppress progress messages
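The -p option exists because fold changes are taken on log-transformed counts. The sketch below illustrates that step for paired replicates; it is not drugz.py's own code. It assumes per-column depth normalization, pairs the i-th control column with the i-th treated column, and adds the pseudocount before normalizing (that ordering is an assumption). It stops at raw log2 fold changes and omits DrugZ's windowed standard-deviation estimate and Z-transform.

```python
import numpy as np
import pandas as pd

def paired_log2_fold_change(counts, control, treated, pseudocount=5):
    """Per-guide log2(treated / control) for each control/treated column pair.

    Illustrative sketch: the pseudocount is added to every read count so that
    zero counts never reach log2(0); each column is then divided by its total
    so samples sequenced to different depths are comparable.
    """
    if len(control) != len(treated):
        raise ValueError("paired comparison needs equal numbers of control and treated columns")
    cols = list(control) + list(treated)
    padded = counts[cols] + pseudocount
    normed = padded / padded.sum()  # per-column fraction of total reads
    return pd.DataFrame(
        {f"fc_{k}": np.log2(normed[t] / normed[c])
         for k, (c, t) in enumerate(zip(control, treated), start=1)}
    )
```

For example, with a readcount table loaded via pd.read_csv(path, sep="\t", index_col=0), calling paired_log2_fold_change(reads, ["T15_A_control", "T15_B_control"], ["T15_A_olaparib", "T15_B_olaparib"]) returns one column per replicate pair (fc_1, fc_2), indexed by sgRNA.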

The input file should be a tab-delimited file with the following format:

sgRNA	Gene	T0	T15_A_control	T15_B_control	T15_C_control	T15_A_olaparib	T15_B_olaparib	T15_C_olaparib
A1BG_CACCTTCGAGCTGCTGCGCG	A1BG	313	235	47	337	428	115	340
A1BG_AAGAGCGCCTCGGTCCCAGC	A1BG	99	8	1	13	26	5	28
A1BG_TGGACTTCCAGCTACGGCGC	A1BG	650	336	74	185	392	193	304
A1BG_CACTGGCGCCATCGAGAGCC	A1BG	718	192	34	296	178	69	185
A1BG_GCTCGGGCTTGTCCACAGGA	A1BG	180	230	29	122	394	148	364
A1BG_CAAGAGAAAGACCACGAGCA	A1BG	428	300	158	294	366	184	489
A1CF_CGTGGCTATTTGGCATACAC	A1CF	677	452	74	423	585	446	434
A1CF_GGTATACTCTCCTTGCAGCA	A1CF	138	69	43	109	96	184	127
A1CF_GACATGGTATTGCAGTAGAC	A1CF	396	183	38	106	193	120	198
(etc)

Critically, the "gene" column must be the first non-index column in the file, and the column headers are used on the command line. For example, to execute DrugZ analyzing just the A and B replicates of this file, the command line would be:

drugz.py -i [input_file] -o drugz-output.txt -c T15_A_control,T15_B_control -x T15_A_olaparib,T15_B_olaparib

To save the intermediate gRNA-level raw and normalized fold changes for other analyses, add the -f flag:

drugz.py -i [input_file] -o drugz-output.txt -f drugz-foldchange.txt -c T15_A_control,T15_B_control -x T15_A_olaparib,T15_B_olaparib

To run DrugZ with the unpaired approach, add the -unpaired flag:

drugz.py -i [input_file] -o drugz-output.txt -f drugz-foldchange.txt -c T15_A_control,T15_B_control -x T15_A_olaparib,T15_B_olaparib -unpaired

To run a DrugZ analysis in a Jupyter notebook and save the output as a variable:

import drugz as dz  # assumes drugz.py is in the working directory or on the Python path

# Define an Args class to stand in for command-line arguments (more convenient,
# since argparse cannot read command-line arguments inside IPython/Jupyter).
# These are user-specified arguments:

# infile = input read count matrix
# drugz_output_file = name of the file in which the DrugZ results are written
# fc_outfile = name of the file in which the fold changes are written
# control_samples = names of the control samples (column names in infile)
# drug_samples = names of the drug-treated samples (column names in infile)
# remove_genes = genes to remove before analysis
# unpaired = unpaired approach: compares mean(treated samples) to mean(control samples)
# pseudocount = counts added to the observed read counts, default = 5
# half_window_size = size of the first bin and half the size of the initial sample (window)
#                    used to estimate std, default = 500 (for whole-genome screens)

class Args:
    infile = "./sgRNA_count.txt"
    drugz_output_file = "./drugz_results.txt"
    fc_outfile = "./fc_results.txt"
    control_samples = 'T15_A_control,T15_B_control,T15_C_control'
    drug_samples = 'T15_A_olaparib,T15_B_olaparib,T15_C_olaparib'
    remove_genes = 'LacZ,luciferase,EGFR'
    unpaired = False
    pseudocount = 5
    half_window_size = 5  # 5 because of the size of the test data set (sgRNA_count.txt = 9 guides, i.e. rows)

drugz_results = dz.drugZ_analysis(Args())

For more options, see drugZ_in_jupyter_notebook_tutorial.html.
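An alternative to a hand-written Args class is argparse.Namespace, the object type that ArgumentParser.parse_args() returns on the command line. A sketch with the same attributes as the notebook example:

```python
from argparse import Namespace

# Same settings as the Args class, built as an argparse.Namespace.
args = Namespace(
    infile="./sgRNA_count.txt",
    drugz_output_file="./drugz_results.txt",
    fc_outfile="./fc_results.txt",
    control_samples="T15_A_control,T15_B_control,T15_C_control",
    drug_samples="T15_A_olaparib,T15_B_olaparib,T15_C_olaparib",
    remove_genes="LacZ,luciferase,EGFR",
    unpaired=False,
    pseudocount=5,
    half_window_size=5,  # small window for the 9-guide test file
)

# Each Namespace is an independent object, so variants can coexist in one notebook.
low_pseudo = Namespace(**{**vars(args), "pseudocount": 1})

# drugz_results = dz.drugZ_analysis(args)  # requires drugz.py to be importable as dz
```

Unlike class attributes shared by every Args() instance, each Namespace holds its own values, which makes side-by-side parameter comparisons less error-prone.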

drugz's People

Contributors

mcolic, traverhart, usajusaj


drugz's Issues

Error messages while running drugz

Greetings,

I am trying to run DrugZ using the example provided here (I copied and pasted the example into a tab-delimited .txt file); however, I am getting an error:

python drugz.py -i drugz_trial_unix.txt -o drugz-output.txt -c T15_A_control,T15_B_control -x T15_A_olaparib,T15_B_olaparib

INFO:main:Initiating analysis
INFO:main:Loading the read count matrix
INFO:main:Normalizing read counts
INFO:main:Calculating raw fold change for replicate 1
Traceback (most recent call last):
  File "drugz.py", line 478, in <module>
    main()
  File "drugz.py", line 475, in main
    drugZ_analysis(args)
  File "drugz.py", line 450, in drugZ_analysis
    fc_zscore_id='zscore_fc_{replicate}'.format(replicate=i))
  File "drugz.py", line 187, in empirical_bayes
    results = fold_change.iloc[no_of_guides - (half_window_size + 1)][empirical_bayes_id]
  File "/home/annadv/.local/lib/python3.6/site-packages/pandas/core/indexing.py", line 879, in __getitem__
    return self._getitem_axis(maybe_callable, axis=axis)
  File "/home/annadv/.local/lib/python3.6/site-packages/pandas/core/indexing.py", line 1496, in _getitem_axis
    self._validate_integer(key, axis)
  File "/home/annadv/.local/lib/python3.6/site-packages/pandas/core/indexing.py", line 1437, in _validate_integer
    raise IndexError("single positional indexer is out-of-bounds")
IndexError: single positional indexer is out-of-bounds

I am attaching the input file I tried to use. I have tried both Unix (LF) and Windows (CR LF) line endings, but both led to the same error.

I will greatly appreciate any help or advice.

Thank you very much.

Regards,
Anna
drugz_trial_unix.txt
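The failing line looks up fold_change.iloc[no_of_guides - (half_window_size + 1)]. With the 9-guide example and the default --half_window_size of 500, that position is 9 - 501 = -492, outside a 9-row table, which would explain the IndexError; the notebook example in the README sets half_window_size = 5 for this test file. A hypothetical guard (not part of drugz.py) for that one lookup; other steps in drugz.py may need an even smaller window:

```python
import pandas as pd

def smoothing_tail_position(n_guides, half_window_size):
    """Position used by the tail lookup quoted in the traceback (hypothetical helper)."""
    pos = n_guides - (half_window_size + 1)
    if pos < 0:
        raise ValueError(
            f"--half_window_size={half_window_size} is too large for {n_guides} guides; "
            f"pick a value below {n_guides}"
        )
    return pos

fold_change = pd.DataFrame({"eb_std": range(9)})  # 9 guides, as in the example file
tail_value = fold_change.iloc[smoothing_tail_position(9, 5)]["eb_std"]  # position 3

try:
    fold_change.iloc[9 - (500 + 1)]  # default window: the same out-of-bounds lookup
except IndexError as err:
    print("IndexError:", err)
```

For a 9-guide file, passing --half_window_size 5 on the command line matches the notebook example.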

I loaded python/2.7 but got this error

INFO:main:Initiating analysis
INFO:main:Loading the read count matrix
INFO:main:Normalizing read counts
INFO:main:Calculating raw fold change for replicate 1
INFO:main:Caculating smoothed Epirical Bayes estimates of stdev for replicate 1
INFO:main:Caculating guide-level Zscores for replicate 1
Traceback (most recent call last):
  File "drugz.py", line 478, in <module>
    main()
  File "drugz.py", line 475, in main
    drugZ_analysis(args)
  File "drugz.py", line 458, in drugZ_analysis
    fold_change =pd.concat(fold_changes, axis=1, sort=False)
TypeError: concat() got an unexpected keyword argument 'sort'
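pandas.concat gained its sort keyword in pandas 0.23, so this TypeError most likely reflects an older pandas installed for Python 2.7 rather than Python 2.7 itself; upgrading pandas is the simplest fix. If the script must also run on old pandas, a fallback along these lines is one option (the helper name is hypothetical):

```python
import pandas as pd

def concat_columns(frames):
    """Column-wise concat that tolerates pandas versions without `sort`."""
    try:
        return pd.concat(frames, axis=1, sort=False)
    except TypeError:
        # pandas < 0.23 has no `sort` keyword. When all frames share the same
        # row index (assumed here, as for per-replicate fold-change tables),
        # nothing is reordered, so the result matches the sort=False call.
        return pd.concat(frames, axis=1)
```

Catching TypeError is broad; checking pd.__version__ up front would be stricter if other TypeErrors need to surface.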

Error while running drugz

Hi,

I am getting the following error while running drugz. Do you have any idea why it might be happening?

Traceback (most recent call last):
  File "/Users/bibaswan/Documents/programs/drugz/drugz.py", line 242, in <module>
    main()
  File "/Users/bibaswan/Documents/programs/drugz/drugz.py", line 238, in main
    args.fc_outfile, remove_genes, args.pseudocount, args.minObs, args.half_window_size, args.index_column, not args.quiet)
  File "/Users/bibaswan/Documents/programs/drugz/drugz.py", line 113, in drugz
    if (ebstd >= fc[eb_std_samplid][i-1]):
  File "/Users/bibaswan/anaconda/lib/python3.5/site-packages/pandas/core/series.py", line 623, in __getitem__
    result = self.index.get_value(self, key)
  File "/Users/bibaswan/anaconda/lib/python3.5/site-packages/pandas/core/indexes/base.py", line 2560, in get_value
    tz=getattr(series.dtype, 'tz', None))
  File "pandas/_libs/index.pyx", line 83, in pandas._libs.index.IndexEngine.get_value
  File "pandas/_libs/index.pyx", line 91, in pandas._libs.index.IndexEngine.get_value
  File "pandas/_libs/index.pyx", line 139, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 811, in pandas._libs.hashtable.Int64HashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 817, in pandas._libs.hashtable.Int64HashTable.get_item
KeyError: 499

Thanks,

Bibaswan
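KeyError: 499 is a label-lookup failure: on a Series with an integer index, fc[eb_std_samplid][i-1] asks for the row labelled 499, not the 500th row. Since 499 is one below the default half window of 500, the failure plausibly happens at the first smoothing window. One possible cause, not confirmed in this thread, is an integer index with gaps left by dropping rows without renumbering. A toy illustration of label versus positional lookup in pandas:

```python
import pandas as pd

# Toy fold-change column whose integer labels have a gap (label 2 is missing),
# as can happen after dropping rows without resetting the index.
fc = pd.DataFrame({"eb_std": [0.5, 0.6, 0.7, 0.8]}, index=[0, 1, 3, 4])
i = 3

try:
    fc["eb_std"].loc[i - 1]  # label-based: looks for the row labelled 2, which does not exist
except KeyError as err:
    print("KeyError:", err)

positional = fc["eb_std"].iloc[i - 1]  # position-based: the third row, whatever its label

renumbered = fc.reset_index(drop=True)  # labels now equal positions 0..3
relabelled = renumbered["eb_std"][i - 1]  # label 2 exists again
```

Either .iloc or a reset_index(drop=True) after filtering and sorting makes position-style loops behave as intended.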

Error while running drugZ_w_modules.py

Hi Medina,
When I ran drugZ_w_modules.py, I got the following error:

  File "drugZ_w_modules.py", line 347, in drugZ_analysis
    fold_change =pd.concat(fold_changes, axis=1, sort=False)
TypeError: concat() got an unexpected keyword argument 'sort'

It works fine when I use DrugZ.py.

Best,
Wenjun

Your FDR calculation is wrong

drugz/drugz.py

Line 169 in 015c7fd

drugz_minobs['fdr_synth'] = drugz_minobs['pval_synth']*numGenes/drugz_minobs['rank_synth']

Hi,

I'm a bioinformatician developer working at Horizon Discovery and I was recently asked by one of the scientists to explain why FDRs they'd calculated using your software were > 1.

After a bit of digging into the method you were using and the drugz code, I discovered that you're missing the part of the equation that corrects when p * (n/r) is > the previous p-value (in rank order).

I'm more than happy to provide you with some code to pull request which fixes this, it's a little bit more involved, but not overly so, if that's helpful to you.

Regardless, I thought you'd probably want to know.

I've verified my new values using other in-built FDR calculations e.g. R's and they're correct. I can provide the unit tests as well, if they're helpful.

Anyway, just wanted to let you know. Thanks for your time and for contributing so strongly to the open source community!

Dr John McGonigle
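The quoted line computes p * n / rank without the step-up correction. The standard Benjamini-Hochberg adjustment (what R's p.adjust(p, method = "BH") returns) additionally replaces each value with the minimum over all larger ranks and caps it at 1, so adjusted values never exceed 1 and never decrease as p grows. A minimal NumPy sketch, independent of drugz.py:

```python
import numpy as np

def bh_fdr(pvals):
    """Benjamini-Hochberg adjusted p-values (same result as R's p.adjust(p, "BH")).

    The raw estimate p * n / rank is made monotone by taking, for each rank,
    the minimum over all larger ranks, then capped at 1.
    """
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)  # ascending p-values -> ranks 1..n
    ranked = p[order] * n / np.arange(1, n + 1)
    # Walk from the largest rank down, keeping the running minimum.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    fdr = np.empty(n)
    fdr[order] = np.minimum(ranked, 1.0)  # map back to the input order
    return fdr
```

Applied to the table in the quoted line, this would read drugz_minobs['fdr_synth'] = bh_fdr(drugz_minobs['pval_synth']) (hypothetical usage); the function ranks the p-values itself, so rank_synth is not needed as input.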

Running drugz in drugz-mean mode

Hi there,
Thanks for releasing drugz, greatly appreciated.

I have some instances where, e.g., one of the treatments or controls is missing. In the paper you stated that "the paired-sample approach does not appear to offer significant benefits over an unpaired approach: when taking the mean fold change across experimental samples and comparing it to the mean fold change across control samples (Additional file 1: Figure S4A), the results are nearly identical to analysis of three paired samples."

I was wondering if there was a way to enable drugz-mean when the number of controls doesn't equal the number of treated samples, to keep my pipeline tidier (i.e. not having to resort to other algorithms)?

Best regards,
Miika
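Whether drugz.py's -unpaired mode accepts unequal group sizes is not stated in this thread. The sketch below only illustrates why a mean-versus-mean comparison needs no pairing: each group is averaged on its own. It averages log2 depth-normalized counts within each group (averaging in log space is an assumption), and it omits the Z-transformation and gene-level scoring that DrugZ performs.

```python
import numpy as np
import pandas as pd

def mean_vs_mean_log2fc(counts, control, treated, pseudocount=5):
    """Per-guide mean log2 abundance in treated minus control samples.

    Illustrative sketch: the two groups may contain different numbers of
    samples because each is averaged independently.
    """
    cols = list(control) + list(treated)
    padded = counts[cols] + pseudocount
    logged = np.log2(padded / padded.sum())  # log2 fraction of each sample's reads
    return logged[list(treated)].mean(axis=1) - logged[list(control)].mean(axis=1)
```

For example, two controls and a single treated sample: mean_vs_mean_log2fc(reads, ["T15_A_control", "T15_B_control"], ["T15_A_olaparib"]).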

foldchange report with no lines

Hi,

I ran the drugz software on readcount files downloaded from a published study. It runs with no errors and the drugz report is complete. However, the fold-change file does not contain any lines.

File IR.drugz-output.tsv contains 17943 lines. Here is the head of the file:

head IR.drugz-output.tsv
GENE sumZ numObs normZ pval_synth rank_synth fdr_synth pval_supp rank_supp fdr_supp
LIG4 -26.06 8 -8.96 1.6e-19 1 2.88e-15 1 17942 1
NHEJ1 -25.13 8 -8.64 2.86e-18 2 2.57e-14 1 17941 1
ATM -19.75 6 -7.83 2.45e-15 3 1.47e-11 1 17940 1
FAM35A -19.65 8 -6.73 8.74e-12 4 3.92e-08 1 17939 1
PNKP -16.08 8 -5.48 2.09e-08 5 7.48e-05 1 17938 1
AMBRA1 -15.48 8 -5.27 6.72e-08 6 0.000201 1 17937 1
C7orf49 -14.39 8 -4.89 4.93e-07 7 0.00126 1 17936 1
C20orf196 -14.25 8 -4.84 6.36e-07 8 0.00131 1 17935 1
RNF168 -14.23 8 -4.84 6.56e-07 9 0.00131 1 17934 1

File IR.drugz-foldchange.tsv contains 0 lines.

I called drugz the following way:
python /ip29/marechal_group/programs/drugz/drugz.py -i Dataset_S1_readcounts.txt -o IR.drugz-output.tsv -f IR.drugz-foldchange.tsv -c S08_NT_T18_A,S08_NT_T18_B -x S08_IR_T18_A,S08_IR_T18_B

Thanks a lot for your help,
JF
