ambj / mupexi Goto Github PK

MuPeXI: the mutant peptide extractor and informer, a tool for predicting neo-epitopes from tumor sequencing data.

License: Other

Python 100.00%

mupexi's Issues

surprising Permission denied

Is the author ambj still working at DTU?

broken link for test.vcf and expression_test.vcf

I'm trying to test the installation using the test files described on your main page, but the links to those files don't work. Would you look into this for me, please?

Skipping VCF compatibility subroutine

Hi there,

I am working with a VCF file produced by HaplotypeCaller in the GATK 4.0 suite. This type of file gets hung up in MuPeXI during the check of whether it's from MuTect or not, and making a "vep_compatible" VCF only outputs the header. However, this VCF does work fine directly in VEP using the --format vcf option.

To fix this and test out MuPeXI, I simply bypassed create_vep_compatible_vcf() in the setup and passed --format vcf into VEP. See here for the diff.

I'm wondering whether 1) this seems like a good idea, and 2) you would consider adding a flag to allow using the input VCF in VEP directly.

Thanks!

UnboundLocalError: local variable 'geneID' referenced before assignment

I'm trying to run MuPeXI pipeline with test data, but there is something wrong and I can't fix it.
The command line is:
./MuPeXI.py -v ./data/example.vcf -e data/example_expression.tsv -a HLA-A01:01,HLA-B08:01

Then I got:
Reading in data
Creating expression file dictionary
Creating proteome reference dictionary
Traceback (most recent call last):
File "./MuPeXI.py", line 1800, in <module>
main(sys.argv[1:])
File "./MuPeXI.py", line 44, in main
proteome_reference, sequence_count = build_proteome_reference(paths.proteome_ref_file, input_.webserver, species)
File "./MuPeXI.py", line 280, in build_proteome_reference
proteome_reference[geneID][transID] += line.strip()
UnboundLocalError: local variable 'geneID' referenced before assignment

How can I solve this?
Thanks in advance.
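
A possible cause, offered as a guess rather than a diagnosis: the traceback above shows `geneID` being read on a sequence line before any header line has assigned it, which would happen if the peptide FASTA headers do not match what the parser expects (for example, a differently formatted or non-Ensembl file). The Python 3 sketch below is not MuPeXI's code. The function `parse_proteome` and its regular expressions are hypothetical and assume Ensembl-style headers carrying `gene:` and `transcript:` fields; stripping the version suffix is likewise an assumption. It shows how such a parser can fail with a readable message instead of an UnboundLocalError.

```python
import re
from collections import defaultdict

# Assumed Ensembl-style header fields, e.g. "gene:ENSG00000223997.1".
GENE_RE = re.compile(r"gene:(\S+)")
TRANS_RE = re.compile(r"transcript:(\S+)")


def parse_proteome(lines):
    """Return {gene_id: {transcript_id: sequence}} from FASTA lines."""
    reference = defaultdict(dict)
    gene_id = trans_id = None
    for number, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            gene = GENE_RE.search(line)
            trans = TRANS_RE.search(line)
            if gene is None or trans is None:
                raise ValueError(
                    "line {}: header lacks gene:/transcript: fields: {!r}".format(number, line)
                )
            # Assumption: drop the version suffix (".1") from both IDs.
            gene_id = gene.group(1).split(".")[0]
            trans_id = trans.group(1).split(".")[0]
            reference[gene_id][trans_id] = ""
        else:
            if gene_id is None:
                raise ValueError("line {}: sequence found before any header".format(number))
            reference[gene_id][trans_id] += line
    return reference
```

Running a check like this over the first few lines of the configured pep file is a quick way to see whether the headers have the expected shape.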

NetMHCpan-4.0 needs selection criteria for EL, which MuPeXI does not mention. How should we select?

The NetMHCpan-4.0 paper clearly states its criteria for specificity, and its example results also mark weak and strong binders.
MuPeXI adds a similarity weight and introduces a new concept, the priority score, but does not give any criteria for how to select peptides.

NetMHCpan-4.0 also adds EL data, because the BA data seems less accurate. MuPeXI used 3.0 in its paper, so that output does not contain Norm_MHCrank_EL and Mut_MHCrank_EL.
I guess these values also refer to percentile rank, so we could select values below 2 (i.e. 2%). But now the normal peptide is included as well, so it is hard for me to choose.

Since the MuPeXI paper used 3.0 data rather than 4.0, yet the tool is compatible with 4.0, we are stuck and need help urgently.
What are the criteria for priority_score, and how do EL values map to priority_score?
We would like to see clear priority score categories like those in NetMHCpan-4.0, such as strong and weak.

AssertionError: amino acid sequence length (80) less than mutation position 81

I am getting the following error when I try and run MuPeXI on one of my vcf files.

Reading in data
Creating proteome reference dictionary
Creating genome reference dictionary
Creating cancer genes list

VEP: Starting process for running the Ensembl Variant Effect Predictor
Detecting variant caller
MuTect2
Change VCF to the VEP compatible
Extracting allele frequencies
Running VEP
Creating mutation information dictionary

MuPeX: Starting mutant peptide extraction
Extracting all possible peptides from reference
Peptides of 9 aa are being extracted
Peptide extraction begun
Traceback (most recent call last):
File "/home/arunimas/MuPeXI/MuPeXI.py", line 1807, in <module>
main(sys.argv[1:])
File "/home/arunimas/MuPeXI/MuPeXI.py", line 78, in main
peptide_info, peptide_counters, fasta_printout, pepmatch_file_names = peptide_extraction(peptide_length, vep_info, proteome_reference, genome_reference, reference_peptides, reference_peptide_file_names, input_.fasta_file_name, paths.peptide_match, tmp_dir, input_.webserver, input_.print_mismatch, input_.keep_temp, input_.prefix, input_.outdir, input_.num_mismatches)
File "/home/arunimas/MuPeXI/MuPeXI.py", line 730, in peptide_extraction
peptide_sequence_info = mutation_sequence_creation(mutation_info, proteome_reference, genome_reference, p_length)
File "/home/arunimas/MuPeXI/MuPeXI.py", line 763, in mutation_sequence_creation
peptide_sequence_info = insertion_peptide(proteome_reference, mutation_info, peptide_length, PeptideSequenceInfo)
File "/home/arunimas/MuPeXI/MuPeXI.py", line 789, in insertion_peptide
asserted_proteome = reference_assertion(proteome_reference, mutation_info, reference_type = 'proteome')
File "/home/arunimas/MuPeXI/MuPeXI.py", line 1073, in reference_assertion
assert len(seq) >= mutation_info.prot_pos, 'amino acid sequence length ({}) less than mutation position {}'.format(len(seq), mutation_info.prot_pos)
AssertionError: amino acid sequence length (80) less than mutation position 81

I run MuPeXI with

/home/arunimas/MuPeXI/MuPeXI.py -v header.vcf -a HLA-A01:01,HLA-A32:01,HLA-B08:01,HLA-B14:01,HLA-C07:01,HLA-C08:02 -c /home/arunimas/MuPeXI/config.ini -t

I've attached a minimal vcf file which reproduces this error
header.vcf.gz

Is there anything I can do to fix this myself?

Mouse reference sequences

Hi,

Thanks so much for making this package available; this is a brilliant resource, especially for neoantigen prediction in mice. We are trying to call neoantigens in a tumor derived from a BALB/c background, and this creates certain issues around reference sequences. I note that you recommend aligning to the BALB/c-specific reference genome from the Sanger. I believe that genome has different coordinates from the Mouse Genome Project SNP file that you recommend (ftp://ftp-mouse.sanger.ac.uk/current_snps/strain_specific_vcfs/BALB_cJ.mgp.v5.snps.dbSNP142.vcf.gz), as that file corresponds to BALB/c-specific mutations found when reads are aligned to the GRCm38 genome, so the two are incompatible (unless I am mistaken). To complicate matters further, in our experience the BALB/c reference has significant gaps even in coding regions, and indeed the Sanger paper in which these strain-specific assemblies were published alludes to a substantially higher error rate versus GRCm38 (https://www.nature.com/articles/s41588-018-0223-8#Sec2). We have come to the conclusion that we should use GRCm38 to align our BALB/c reads, especially as GRCm38 (cf. GRCm39) includes patches that correspond to strain-specific haplotypes. We use the pan-strain SNPs and indels from the Sanger Mouse Genome Project for base quality score recalibration and then call mutations using Strelka2.

I was wondering if you had any advice about neoantigen calling for BALB/c data as we are planning. My feeling is that the best universal approach is to align everything to GRCm38 and then use the cDNA and peptides derived from this reference (i.e. available here http://ftp.ensembl.org/pub/release-89/fasta/mus_musculus/), as this is designed to capture the majority of variation across most strains. I would really appreciate your thoughts on the question.

Kind regards,

Dr Sam Kleeman MD
PhD Student
Cold Spring Harbor Laboratory, NY

Transcripts do not correspond to the correct amino acid change

Hi,

MuPeXI and VEP files can be found here:
http://incpm-2.weizmann.ac.il/bioinfo/gil/mupexi_bug/

Specifically for the gene XBP1 ENSG00000100219,
I notice several transcripts in the VEP output, and they have different amino acid changes:
ENST00000344347 has S/X
ENST00000216037 has Q/X

Nonetheless, they both appear as Q/X in the mupexi file.

All the best,

Gil Hornung

Incorrect translation of an insertion mutation

Hi Anne-Mette,

I noticed a case in which MuPeXI placed an insertion in the wrong place, and caused the wrong amino acid to appear in a peptide.

As you can see in the IGV snapshot below, there is an insertion of a C in the sequence AGAGG, just after the AGA. This leads to the codons [AGA][CGG], which translate to the amino acid sequence RR. However, MuPeXI translated this sequence as RA, suggesting that the mutated sequence is [AGA][GCG].

You can find all the relevant files (vcf, vep and mupexi) in the link:
https://owncloud.incpm.weizmann.ac.il/owncloud/index.php/s/CzZS5bh53pHTxPP

All the best,

Gil

GATK 4.0.11.0 puts the normal sample first and the tumor sample second, so the allele frequency is now taken from the normal sample (a wrong value); the VCF differs from the example.vcf given by the author
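
One way to avoid depending on the order of the sample columns is to look the tumor sample up by name in the `#CHROM` header line. The Python 3 sketch below is an illustration, not MuPeXI's implementation: the helper names `sample_columns` and `tumor_allele_fraction` are hypothetical, the tumor sample name is assumed to be supplied by the caller, and the records are assumed to be tab-separated with an `AF` key in the FORMAT column.

```python
def sample_columns(header_line):
    """Map sample name -> column index from a '#CHROM ...' VCF header line."""
    fields = header_line.rstrip("\n").split("\t")
    # Columns 0-8 are the fixed VCF fields (CHROM .. FORMAT); samples follow.
    return {name: index for index, name in enumerate(fields) if index >= 9}


def tumor_allele_fraction(record_line, columns, tumor_name):
    """Return the AF value(s) of the named tumor sample as a list of floats."""
    fields = record_line.rstrip("\n").split("\t")
    format_keys = fields[8].split(":")
    sample_values = fields[columns[tumor_name]].split(":")
    af_text = dict(zip(format_keys, sample_values))["AF"]
    # Multi-allelic sites carry one comma-separated AF per alternate allele.
    return [float(value) for value in af_text.split(",")]
```

With this approach the result is the same whether the normal or the tumor column comes first, as long as the sample name is known.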

Let us talk about the monotonicity of the function calculating priority_score

Can you give some advice? Thanks a lot.

Gene symbols being split at hyphens '-'

I came across an issue with the gene MT-ND4, the MuPeXI output gene symbol is "MT".
I suspect that it is to do with some overreaching line.split('-') or something, hopefully an easy fix!
I attach an example VCF file with a mutation replicating this issue.

I should add that when I run this file through VEP externally, the gene symbol returns as expected.

mito_hyphengenesymbol_example.vcf.zip

Thanks
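
The suspicion above can be illustrated in isolation. This Python sketch does not reproduce MuPeXI's parsing, which is not shown in this issue; the input `MT-ND4-201` (an Ensembl-style transcript name) and both helper functions are assumptions chosen only to show how splitting on the first hyphen truncates a hyphenated gene symbol, and how splitting once from the right would keep it.

```python
def symbol_naive(field):
    # Keeps only the text before the first hyphen, so "MT-ND4-201" becomes "MT".
    return field.split("-")[0]


def symbol_keep_hyphens(field):
    # Split once from the right: only the final "-201" suffix is removed.
    return field.rsplit("-", 1)[0]
```

If the real field carries no suffix at all, the simplest fix would be not to split on the hyphen in the first place.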

error about vep output file empty

Hello, when I run the sample data with the following command: python MuPeXI.py -v home/jm/mupexi/data/example.vcf -a HLA-A01:01 -e expression.tsv -l 9 -c home/jm/mupexi/config.ini -o home/jm/mupexi/data/output
I get the error that the VEP output file is empty: can not use transcript reference sequences (--use_transcript_ref) without a FASTA file (see --fasta); you may wish to use --use_given_ref
VEP runs normally on its own. Can you tell me what the possible reasons are? Thank you!

What does an error message mean?

python ../../software/MuPeXI/MuPeXI.py -v data/example.vcf -a HLA-A01:01,HLA-B08:01 -c ../../software/MuPeXI/config.ini -e data/example_expression.tsv

Reading in data
Creating expression file dictionary
Creating proteome reference dictionary
Creating genome reference dictionary
Creating cancer genes list

MuPeX: Starting mutant peptide extraction
Extracting all possible peptides from reference
Peptides of 9 aa are being extracted
Peptide extraction begun
Running 9 aa normal peptide match
Traceback (most recent call last):
File "../../software/MuPeXI/MuPeXI.py", line 1785, in <module>
main(sys.argv[1:])
File "../../software/MuPeXI/MuPeXI.py", line 78, in main
peptide_info, peptide_counters, fasta_printout, pepmatch_file_names = peptide_extraction(peptide_length, vep_info, proteome_reference, genome_reference, reference_peptides, reference_peptide_file_names, input_.fasta_file_name, paths.peptide_match, tmp_dir, input_.webserver, input_.print_mismatch, input_.keep_temp, input_.prefix, input_.outdir, input_.num_mismatches)
File "../../software/MuPeXI/MuPeXI.py", line 729, in peptide_extraction
peptide_info, pepmatch_file_names = normal_peptide_correction(mutated_peptides_missing_normal, mutation_info, p_length, reference_peptide_file_names, peptide_info, peptide_match, tmp_dir, pepmatch_file_names, webserver, print_mismatch, num_mismatches)
File "../../software/MuPeXI/MuPeXI.py", line 949, in normal_peptide_correction
pepmatch_file = run_peptide_match(mutpeps_file, peptide_length, peptide_match, reference_peptide_file_names, mutation_info, tmp_dir, webserver, print_mismatch, num_mismatches)
File "../../software/MuPeXI/MuPeXI.py", line 974, in run_peptide_match
process_pepmatch = subprocess.Popen([peptide_match, '-thr' , str(num_mismatches), mutpeps_file.name, reference_peptide_file_name], stdout = pepmatch_file)
File "/gscmnt/gc2737/ding/hsun/software/miniconda2/lib/python2.7/subprocess.py", line 710, in __init__
errread, errwrite)
File "/gscmnt/gc2737/ding/hsun/software/miniconda2/lib/python2.7/subprocess.py", line 1335, in _execute_child
raise child_exception
OSError: [Errno 13] Permission denied
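
An `OSError: [Errno 13] Permission denied` raised from `subprocess.Popen` generally means the program path itself could not be executed, for example because the file lacks the execute bit or the path points at a directory. In the traceback above the program is the `peptide_match` value passed into `run_peptide_match`. The Python 3 sketch below is a hypothetical pre-flight check (`check_executable` is not part of MuPeXI) that reports the likely problem before the pipeline starts.

```python
import os
import shutil


def check_executable(path):
    """Return None if `path` can be launched, else a short problem description."""
    # Bare command names are looked up on PATH; anything with a separator is used as given.
    resolved = path if os.path.sep in path else shutil.which(path)
    if resolved is None:
        return "not found on PATH: {}".format(path)
    if not os.path.exists(resolved):
        return "no such file: {}".format(resolved)
    if os.path.isdir(resolved):
        return "is a directory, not a program: {}".format(resolved)
    if not os.access(resolved, os.X_OK):
        return "missing execute permission (try chmod +x): {}".format(resolved)
    return None
```

Calling it on the peptide-match path from config.ini and printing any non-None result would narrow down which of these cases applies.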

Unable to access the file http://www.cbs.dtu.dk/services/MuPeXI/SupplTable1.csv described at "MuPeXI: prediction of neo-epitopes from tumor sequencing data"

Dear MuPeXI developers,

I read your paper entitled "MuPeXI: prediction of neo-epitopes from tumor sequencing data" at https://doi.org/10.1007/s00262-017-2001-3. In Tables S1, it says that "The full table can be found at URL: http://www.cbs.dtu.dk/services/MuPeXI/SupplTable1.csv". Unfortunately, the URL does not work anymore. Do you know if there is any way to gain access to the full tables? More specifically, I would like to find the raw peptide-MHC pairs that were subject to the multimer-staining and/or T-cell reactivity assays.

Kind regards,
Xiaofei

Please note that this is an issue related to the data availability of the manuscript (so this is not an issue related to the code).

long FASTA output

Hi,
I have a question about the FASTA output from MuPeXI web server. I wonder that the FASTA lists contain only peptides from the prediction with HLA that I input or contain all non-synonymous mutations from input VCF file?

Best,
Phorutai

AssertionError: Allele "HLA-B*07:02" not stated in NetMHCpan output

I'm trying to run the MuPeXI pipeline with test data. There seems to be a problem with NetMHCpan, but I can't seem to fix it. I have checked the paths requested by the tool in its installation, as well as the MuPeXI config.ini file. I have also changed the permissions of the $NETMHCpan/data/version file, giving write and read permissions to other users. Moreover, I have also added the netmhcpan executable to my user $PATH.

I am working with a conda environment to be able to use Python 2.7 and the required versions of pandas, numpy, vep and biopython, since I am running on an HPC with Slurm as task manager. The NetMHCpan-4.0 tool is outside this environment, but works separately.

The data/allelenames file does contain the allele HLA-B07:02 it says it cannot find.

Any idea what might be happening?
Thanks in advance

(mupexi) [uscmgmfp@login209-18 data]$ ../MuPeXI.py -v example.vcf -c /mnt/lustre/scratch/nlsas/home/usc/mg/translational_oncology/1_tools/24_mupexi/MuPeXI/config.ini -e example_expression.tsv -a HLA-A*01:01,HLA-A*11:01,HLA-B*07:02,HLA-B*35:01 -t

Reading in data
	Creating expression file dictionary
	Creating proteome reference dictionary
	Creating genome reference dictionary
	Creating cancer genes list

VEP: Starting process for running the Ensembl Variant Effect Predictor
	Detecting variant caller
		MuTect2
	Change VCF to the VEP compatible
	Extracting allele frequencies
	Running VEP
	Creating mutation information dictionary

MuPeX: Starting mutant peptide extraction
	Extracting all possible peptides from reference
		Peptides of 9 aa are being extracted
	Peptide extraction begun
		Running 9 aa normal peptide match

MuPeI: Starting mutant peptide informer
	Writing temporary peptide file
	Running NetMHCpan eluted ligand prediction
Unable to open(r) file $NETMHCpan/data/version
	Creating NetMHCpan eluted ligand prediction file dictionary
	Writing output file
Traceback (most recent call last):
  File "../MuPeXI.py", line 1800, in <module>
    main(sys.argv[1:])
  File "../MuPeXI.py", line 101, in main
    output_file = write_output_file(peptide_info, expression, net_mhc_BA, net_mhc_EL, unique_alleles, cancer_genes, tmp_dir, input_.webserver, input_.print_mismatch, allele_fractions, input_.expression_type, transcript_info, reference_peptides, proteome_reference, protein_positions, version)
  File "../MuPeXI.py", line 1261, in write_output_file
    assert hla in net_mhc_EL, 'Allele "{}" not stated in NetMHCpan output'.format(hla)
AssertionError: Allele "HLA-B*07:02" not stated in NetMHCpan output
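
The assertion in the traceback tests whether each requested allele is a key of the parsed NetMHCpan output (`net_mhc_EL`). Two situations would trigger it: the output contains no predictions at all (the `Unable to open(r) file $NETMHCpan/data/version` message in the log hints that NetMHCpan itself may have failed), or the allele is written differently on the command line and in the output (`HLA-B*07:02` versus `HLA-B07:02`). Which one applies here is not known. The Python 3 sketch below uses hypothetical helpers, not MuPeXI functions, to list the missing alleles after normalising the notation.

```python
def normalize_allele(name):
    """Drop the '*' so 'HLA-B*07:02' and 'HLA-B07:02' compare equal."""
    return name.replace("*", "")


def missing_alleles(requested, predicted):
    """Return requested alleles absent from the prediction keys, in input order."""
    available = {normalize_allele(name) for name in predicted}
    return [name for name in requested if normalize_allele(name) not in available]
```

If every requested allele comes back as missing, the prediction step most likely produced no usable output, and the NetMHCpan installation is the place to look rather than the allele names.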

AssertionError: Allele "HLA-A02:01" not stated in NetMHCpan output

I am getting the following error running mupexi. I appreciate any help in this regard.

AssertionError: Allele "HLA-A02:01" not stated in NetMHCpan output

Command
/opt/MuPeXI/MuPeXI.py -t -n -v /data/Mupexi/114_FU/VCF/114_FU-mutect2-ensemble.vcf -c /opt/MuPeXI/config.ini -a HLA-A02:01 -F 30 -l 8-11 -d /data /Mupexi/114_FU/Mupexi_out/HLA-A02:01/
mupexi.log

Ensembl references and VEP

@ambj, you are here; I urgently need your help. No current version of VEP has variant_effect_predictor.pl or variant_effect_predictor. What should I do? This is really urgent.

From the issue, I know you said you have tested Ensembl release 87, so I tried to match your setup, but I do not have these files:
fasta = /your/path/to/references/human_GRCh38/GCA_000001405.15_GRCh38_full_analysis_set.fa
chain = /your/path/to/references/human_GRCh37/liftover/hg19ToHg38.over.chain
cDNA = /your/path/to/references/human_GRCh38/cDNA/Homo_sapiens.GRCh38.85.cdna.all.fa
pep = /your/path/to/references/human_GRCh38/pep/Homo_sapiens.GRCh38.85.pep.all.fa
cosmic = /your/path/to/references/cosmic/Census_allWed_Feb_17_09-33-40_2016.tsv

I tried to download the corresponding files, but I am afraid I got them wrong, so I am here to confirm with you.
http://hgdownload.soe.ucsc.edu/goldenPath/hg19/liftOver/hg19ToHg38.over.chain.gz for chain
What is worse, there are several files in Ensembl release 87. Which file should I choose for fasta, pep and cDNA? I list the links below and hope you can point out the right ones for me.

Worst of all, I know the COSMIC VCF file but have never used a COSMIC TSV file, so can you tell me where to download it?

ftp://ftp.ensembl.org/pub/release-87/fasta/homo_sapiens/pep/
ftp://ftp.ensembl.org/pub/release-87/fasta/homo_sapiens/cdna/
ftp://ftp.ensembl.org/pub/release-87/fasta/homo_sapiens/dna/.

I sent some emails to your address [email protected]. I do not know whether there is a problem with your school's email, but there has been no reply for many days.

I know my questions will take up much of your time; sorry to disturb you. I wish you many more CNS papers.

Originally posted by @2236529177 in #14 (comment)

Issues when running on the web server

Hi, I have a somatic mutation filtered VCF file generated by GATK haplotypeCaller. I used that file to run MuPeXI on the web server, but got the following errors. Could you please let me know what's wrong with this? Thank you!

Traceback (most recent call last):
File "/usr/cbs/bio/src/MuPeXI-1.1/MuPeXI/MuPeXI.py", line 1577, in <module>
main(sys.argv[1:])
File "/usr/cbs/bio/src/MuPeXI-1.1/MuPeXI/MuPeXI.py", line 78, in main
peptide_info, peptide_counters, fasta_printout, pepmatch_file_names = peptide_extraction(peptide_length, vep_info, proteome_reference, genome_reference, reference_peptides, reference_peptide_file_names, input_.fasta_file_name, paths.peptide_match, tmp_dir, input_.webserver, input_.print_mismatch, input_.keep_temp, input_.prefix, input_.outdir, input_.num_mismatches)
File "/usr/cbs/bio/src/MuPeXI-1.1/MuPeXI/MuPeXI.py", line 667, in peptide_extraction
peptide_info, pepmatch_file_names = normal_peptide_correction(mutated_peptides_missing_normal, mutation_info, p_length, reference_peptide_file_names, peptide_info, peptide_match, tmp_dir, pepmatch_file_names, webserver, print_mismatch, num_mismatches)
File "/usr/cbs/bio/src/MuPeXI-1.1/MuPeXI/MuPeXI.py", line 772, in normal_peptide_correction
assert mutated_peptide in peptide_info
AssertionError

Kevin

Issue Installing MuPeXI

I'm having issues installing MuPeXI, as it seems that Python 2.7 support has been dropped and Biopython now requires Python 3.6 or later.

Commands used:
conda create --name mupexi python=2.7
conda activate mupexi
pip install biopython

Error message:
DEPRECATION: Python 2.7 will reach the end of its life on January 1st, 2020. Please upgrade your Python as Python 2.7 won't be maintained after that date. A future version of pip will drop support for Python 2.7. More details about Python 2 support in pip, can be found at https://pip.pypa.io/en/latest/development/release-process/#python-2-support
Collecting biopython
Using cached https://files.pythonhosted.org/packages/3d/2f/d9df24de05d651c5e686ee8fea3afe3985c03ef9ca02f4cc1e7ea10aa31e/biopython-1.77.tar.gz
ERROR: Command errored out with exit status 1:
command: /home/ubuntu/anaconda3/envs/mupexi/bin/python -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-d0k_pY/biopython/setup.py'"'"'; file='"'"'/tmp/pip-install-d0k_pY/biopython/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(file);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, file, '"'"'exec'"'"'))' egg_info --egg-base /tmp/pip-install-d0k_pY/biopython/pip-egg-info
cwd: /tmp/pip-install-d0k_pY/biopython/
Complete output (1 lines):
Biopython requires Python 3.6 or later. Python 2.7 detected.
----------------------------------------
ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.

Allele_Frequency column is emitted empty

Hi,
I wonder why the "Allele_Frequency" field is empty, i.e. "-", in the *.mupexi output.
I am using vcf from GATK4-Mutect2.

Regards,
Raj

ERROR: VEP output file empty

Hi, I used the web server and got this error:
ERROR: VEP output file empty VEP Can't use an undefined value as a symbol reference at /home/tuba/shared/bin/ensembl-tools-release-85/scripts/variant_effect_predictor/variant_effect_predictor.pl line 2473, line 507.
I don't know what's wrong with my VCF. It was obtained from Mutect2. I noticed that it runs when I choose the submit Hg19... option, but I'm pretty sure my reference genome is hg38. Is it my format or something else that has gone wrong?
chrY 56734819 . C G . . DP=26;ECNT=2;NLOD=3.56;N_ART_LOD=-8.451e-01;POP_AF=5.000e-08;P_CONTAM=1.283e-05;P_GERMLINE=-5.786e+00;TLOD=6.19 GT:AD:AF:F1R2:F2R1:MBQ:MFRL:MMQ:MPOS:PGT:PID:SA_MAP_AF:SA_POST_PROB 0/0:12,0:9.553e-03:11,0:1,0:0:142,0:0:0:0|1:56734819_C_G 0/1:10,2:0.215:4,2:6,0:38:142,145:38:20:0|1:56734819_C_G:0.172,0.00,0.167:0.015,0.032,0.953
chrY 56734828 . T A . . DP=26;ECNT=2;NLOD=3.61;N_ART_LOD=-1.141e+00;POP_AF=5.000e-08;P_CONTAM=9.876e-06;P_GERMLINE=-5.645e+00;TLOD=6.17 GT:AD:AF:F1R2:F2R1:MBQ:MFRL:MMQ:MPOS:PGT:PID:SA_MAP_AF:SA_POST_PROB 0/0:12,0:0.071:11,0:1,0:0:142,0:0:0:0|1:56734819_C_G 0/1:9,2:0.209:4,2:5,0:37:134,145:38:29:0|1:56734819_C_G:0.182,0.00,0.182:0.017,0.030,0.953

ps

ps *100 and round and int
The author is really not being responsible. We do not know whether the priority_score behind the AUC in the article was an int or an unrounded float. The editor simply overlooked that, which is a real pity.

IndexError: list index out of range

Hi,
I ran MuPeXI with a VCF file called by Mutect2, but I got this error:

MuPeI: Starting mutant peptide informer
Writing temporary peptide file
Running NetMHCpan
Creating NetMHCpan file dictionary
Traceback (most recent call last):
File "/apps/MuPeXI/MuPeXI.py", line 1637, in <module>
main(sys.argv[1:])
File "/apps/MuPeXI/MuPeXI.py", line 98, in main
net_mhc = build_netMHC(netMHC_file, input_.webserver)
File "/apps/MuPeXI/MuPeXI.py", line 1092, in build_netMHC
rank = float(line[13])
IndexError: list index out of range

How should I fix this problem?
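
The failing statement, `rank = float(line[13])`, assumes every parsed line has at least 14 whitespace-separated fields. An `IndexError` there would occur if the NetMHCpan output contains shorter lines than expected, for instance error messages, or a column layout from a different NetMHCpan version; which of these applies here is not known. The Python 3 sketch below is a hypothetical defensive parser (`parse_ranks` is not a MuPeXI function) that skips such lines and returns them so they can be inspected.

```python
RANK_COLUMN = 13  # index taken from the traceback above; the layout may differ between versions


def parse_ranks(lines, rank_column=RANK_COLUMN):
    """Return (ranks, skipped): parsed rank values and the lines that were skipped."""
    ranks, skipped = [], []
    for raw in lines:
        fields = raw.split()
        if len(fields) <= rank_column:
            skipped.append(raw)  # too short to hold a rank column
            continue
        try:
            ranks.append(float(fields[rank_column]))
        except ValueError:
            skipped.append(raw)  # e.g. header or separator rows
    return ranks, skipped
```

Printing the skipped lines from the temporary NetMHCpan output (kept with the -t option used elsewhere in these issues) usually shows quickly what the tool actually wrote.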

1

thanks a lot

Transition to NetMHCpan 4.0 is not documented

Hi Anne-Mette,

I'm trying to work with the latest MuPeXI version (1.2)

I noticed that you started working with NetMHCpan 4, but the README and the config.ini file still refer to NetMHCpan 3.0.

Gil

MuPeXI finds peptide in a transcript which is labeled as "Nonsense mediated decay"

Dear Anne-Mette,

I came across an issue in MuPeXI.
I have a mutation at 3:48605804 (GRCh38, see below for the full VCF line), which occurs in the gene UQCRC1.
MuPeXI finds peptides in the ENST00000415995 transcript, which is actually a "Nonsense mediated decay" transcript, based on its ensembl annotation. Obviously, it should not be considered as a source for peptides.

All the best,

Gil

vcf line:

3       48605804        .       C       T       122.87  PASS    .       GT:AD:AF:ALT_F1R2:ALT_F2R1:FOXOG:QSS:REF_F1R2:REF_F2R1  0/1:179,62:0.257:.:.:.:.:.:.     0/0:121,0:0:.:.:.:.:.:.

one of the peptides in the mupexi file:

HLA-C06:02	.G.......	6.5456	0.025697	9511.3	3.6350	0.153380	MRQATFWSI	0.2283	0.408440	223.3	0.0797	0.500108	ENSG00000010256	ENST00000415995	G/R	0.257	1	2	3	48605804	43	M	UQCRC1	No	No	0.411875	9.998579e-01	1.346961e-10	0.4714048	12

where can I get the file 'reference_peptide_9.txt'?

Please specify the binary path to the references used (optional)

cDNA = your/path/to/human_GRCh38/cDNA/Homo_sapiens.GRCh38.78.cdna.all.fa
pep = your/path/to/human_GRCh38/pep/Homo_sapiens.GRCh38.78.pep.all.fa
cosmic = your/path/to/cosmic/Census_allWed_Feb_17_09-33-40_2016.tsv
pep9 = your/path/to/reference_peptide_9.txt

where can I get the file 'reference_peptide_9.txt' ?

A big bug? The file tmpClyJgf exists, but it is empty, so it cannot be read. How should I fix it?

Thanks a lot, I urgently need help.

Assert mutated_peptide in peptide_info failed

On some samples (it works on others) the Python script fails with this error:

File "MuPeXI.py", line 79, in main
peptide_info, peptide_counters, fasta_printout, pepmatch_file_names = peptide_extraction(peptide_length, vep_info, proteome_reference, genome_reference, reference_peptides, reference_peptide_file_names, input_.fasta_file_name, paths.peptide_match, tmp_dir, input_.webserver, input_.print_mismatch, input_.keep_temp, input_.prefix, input_.outdir, input_.num_mismatches)
File "MuPeXI.py", line 720, in peptide_extraction
peptide_info, pepmatch_file_names = normal_peptide_correction(mutated_peptides_missing_normal, mutation_info, p_length, reference_peptide_file_names, peptide_info, peptide_match, tmp_dir, pepmatch_file_names, webserver, print_mismatch, num_mismatches)
File "MuPeXI.py", line 825, in normal_peptide_correction
assert mutated_peptide in peptide_info

I’ve narrowed it down to one indel:
chr14 22482290 . T TCCG

and use the following command line:
python MuPeXI.py -v /data/DO/p2/final/2017-09-21_p2/batch2-ensemble-annotated_indel.vcf.recode.vcf -a HLA-A02:06 -c ./config.ini -d /data/DO/p2/final/2017-09-21_p2/mupexi/ -l 9

The value of mutated_peptide is “XIRQGAQKL”

and the value of peptide_info is (which doesn't contain the peptide):

defaultdict(<type 'dict'>, {'RQGAQKLVF': {'IQGAQKLVF': [mutation_info(gene_id='ENSG00000211836', trans_id='ENST00000390484',
mutation_consequence='inframe_insertion', chr='14', pos='22482290-22482291', cdna_pos='4-5', prot_pos=2, prot_pos_to=None, aa_normal='I', aa_mut='IR',
codon_normal='att', codon_mut='atCCGt', alt_allele='CCG', symbol='TRAJ54'), peptide_sequence_info(chop_normal_sequence='-----------',
mutation_sequence='XIRQGAQKLVF', normal_sequence='XIQGAQKLVF', mutation_position='2:4', consequence='I'), '1:2', pep_match_info(normal_peptide='IQGAQKLVF',
mismatch=1, mismatch_peptide='---------')]}, 'IRQGAQKLV': {'IRQGKAKLV': [mutation_info(gene_id='ENSG00000211836', trans_id='ENST00000390484',
mutation_consequence='inframe_insertion', chr='14', pos='22482290-22482291', cdna_pos='4-5', prot_pos=2, prot_pos_to=None, aa_normal='I', aa_mut='IR',
codon_normal='att', codon_mut='atCCGt', alt_allele='CCG', symbol='TRAJ54'), peptide_sequence_info(chop_normal_sequence='-----------',
mutation_sequence='XIRQGAQKLVF', normal_sequence='XIQGAQKLVF', mutation_position='2:4', consequence='I'), '1:3', pep_match_info(normal_peptide='IRQGKAKLV',
mismatch=2, mismatch_peptide='---------')]}})

Any thoughts?

Cheers,
Andrew

AF ignored - GATK4 MuTect2 case-change

MuPeXI/MuPeXI.py

Line 417 in 4430334

if 'ID=MuTect2,' in line:

MuPeXI does not recognise GATK4's new Mutect2-generated VCF files due to the case-change of the program name (MuTect2 -> Mutect2). The VCF files still report AF in the same way.

Case-change explained here:
https://gatkforums.broadinstitute.org/gatk/discussion/10911/differences-between-gatk3-mutect2-and-gatk4-mutect2
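
A case-insensitive comparison would accept both spellings. The Python sketch below shows the idea on a single header line; `is_mutect2_header` is a hypothetical helper, not a patch against the actual MuPeXI source, and the header strings in the usage note are illustrative.

```python
def is_mutect2_header(line):
    """True when a VCF header line names MuTect2 (GATK3) or Mutect2 (GATK4)."""
    # Lower-casing both sides makes the 'ID=MuTect2,' test independent of case.
    return "id=mutect2," in line.lower()
```

For example, `is_mutect2_header("##GATKCommandLine=<ID=Mutect2,CommandLine=...>")` is true, while a header line from another caller is not matched.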

Variant Effect Predictor version compatibility

The latest version of VEP as of 05/01/2018 is 92. No version requirement is stated on the main MuPeXI git page. MuPeXI doesn't seem immediately compatible with this version, as the program variant_effect_predictor.pl has been renamed to vep (plus other changes).

The MuPeXI user manual shows usage of version 85.

However, this version is no longer available.

Version 87 could possibly work, but there are (I believe) 27 releases (87.0-87.27). I am not sure whether that level matters.

What would help the most would be a statement of the latest version known to work well.

ERROR VEP output file empty

Hi, I got an error after I typed this command:
$ python MuPeXI.py -v ../Variant_calling/Strelka2_result/case1.strelka.snvs.vcf -a HLA-A02:01 -o case1_strelka -s human -c config.ini

ERROR:
Reading in data
Creating proteome reference dictionary
Creating genome reference dictionary
Creating cancer genes list

VEP: Starting process for running the Ensembl Variant Effect Predictor
Detecting variant caller
Variant caller not detected in VCF file.
NOTE: Genomic allele frequency is only taken into account
with variant calls from MuTect or MuTect2!
Change VCF to the VEP compatible
Extracting allele frequencies
Running VEP
ERROR: VEP output file empty
VEP
-------------------- EXCEPTION --------------------
MSG: ERROR: Cache directory /home/onnicha/Thesis/ensembl-vep/homo_sapiens not found

STACK Bio::EnsEMBL::VEP::CacheDir::dir /home/onnicha/Thesis/ensembl-vep/modules/Bio/EnsEMBL/VEP/CacheDir.pm:311
STACK Bio::EnsEMBL::VEP::CacheDir::init /home/onnicha/Thesis/ensembl-vep/modules/Bio/EnsEMBL/VEP/CacheDir.pm:227
STACK Bio::EnsEMBL::VEP::CacheDir::new /home/onnicha/Thesis/ensembl-vep/modules/Bio/EnsEMBL/VEP/CacheDir.pm:111
STACK Bio::EnsEMBL::VEP::AnnotationSourceAdaptor::get_all_from_cache /home/onnicha/Thesis/ensembl-vep/modules/Bio/EnsEMBL/VEP/AnnotationSourceAdaptor.pm:115
STACK Bio::EnsEMBL::VEP::AnnotationSourceAdaptor::get_all /home/onnicha/Thesis/ensembl-vep/modules/Bio/EnsEMBL/VEP/AnnotationSourceAdaptor.pm:91
STACK Bio::EnsEMBL::VEP::BaseRunner::get_all_AnnotationSources /home/onnicha/Thesis/ensembl-vep/modules/Bio/EnsEMBL/VEP/BaseRunner.pm:175
STACK Bio::EnsEMBL::VEP::Runner::init /home/onnicha/Thesis/ensembl-vep/modules/Bio/EnsEMBL/VEP/Runner.pm:123
STACK Bio::EnsEMBL::VEP::Runner::run /home/onnicha/Thesis/ensembl-vep/modules/Bio/EnsEMBL/VEP/Runner.pm:194
STACK toplevel /home/onnicha/Thesis/ensembl-vep/vep:224
Date (localtime) = Wed Aug 15 23:50:23 2018
Ensembl API version = 93

Thanks

Will MuPeXI become compatible with NetMHCpan 4.1?

I understand that MuPeXI only works with NetMHCpan 4.0. Will it be upgraded in the future, or left as it is?

Thanks!

config.ini file for mouse

Could you provide a config.ini file for mouse reference files to be included?

Reference Files for the given example VCF

Hi,

I am trying to run MuPeXI on the given example VCF (https://raw.githubusercontent.com/ambj/MuPeXI/master/data/example.vcf) using the command below, but end up with an error:

MuPeXI.py -v data/example.vcf -a HLA-A01:01,HLA-A02:02

I am using homo_sapiens_refseq_vep_104_GRCh38.tar.gz for VEP cache (http://ftp.ensembl.org/pub/release-104/variation/indexed_vep_cache/homo_sapiens_vep_104_GRCh38.tar.gz) and fasta files downloaded from http://ftp.ensembl.org/pub/release-95/fasta/homo_sapiens as the reference.

Please find the error message pasted below:

Reading in data
Creating proteome reference dictionary
Traceback (most recent call last):
File "MuPeXI/MuPeXI.py", line 1800, in <module>
main(sys.argv[1:])
File "MuPeXI/MuPeXI.py", line 44, in main
proteome_reference, sequence_count = build_proteome_reference(paths.proteome_ref_file, input_.webserver, species)
File "MuPeXI/MuPeXI.py", line 280, in build_proteome_reference
proteome_reference[geneID][transID] += line.strip()
UnboundLocalError: local variable 'geneID' referenced before assignment

Could I request you to confirm I am using the correct references to run the example VCF?

I appreciate your help!

Thanks in advance.

Regards,
Ashmitaa.

Incorrect translation of a complex mutation

Hi Anne-Mette,

I have a vcf with a complex mutation (a phasing of two substitutions in close proximity). I noticed that MuPeXI translates the wrong sequence from this mutation. For example, I see a peptide MQLMPFGSQL, but the sequence should actually be MQLMPFGSLL.

I am using MuPeXI 1.2

You can find all the relevant files in:
https://owncloud.incpm.weizmann.ac.il/owncloud/index.php/s/V86PPIQyQL2yBHp

All the best,

Gil
