
oncokb-annotator's Introduction

OncoKB Core

Repository for OncoKB, a precision oncology knowledge base.

The core of OncoKB Annotation service.

Info

Running Environment

Please confirm your running environment is:

  • Java version: 8
  • MySQL version: 5.7.28

Prepare properties files

cp -r core/src/main/resources/properties-EXAMPLE core/src/main/resources/properties

Properties file

  1. database.properties
    • jdbc.driverClassName: OncoKB uses MySQL as its database, so set this to com.mysql.jdbc.Driver
    • jdbc.url: Database url
    • jdbc.username & jdbc.password: MySQL user name and password
  2. config.properties

Build the WAR file

mvn clean install -P public -DskipTests=true

The WAR file is under /web/target/

Deploy with frontend

Choose one of the following profiles when building the WAR file:

  • curate - core + API + curation website
  • public - core + API + public website (deprecated)

You can find specific instructions in the curate or public repository.

Run with Docker containers

OncoKB™ is a precision oncology knowledge base developed at Memorial Sloan Kettering Cancer Center that contains biological and clinical information about genomic alterations in cancer. OncoKB uses Genome Nexus to convert genomic changes to protein changes using the transcripts OncoKB has picked. By default, API requests are sent to www.genomenexus.org for GRCh37 and grch38.genomenexus.org for GRCh38. To use a local installation of Genome Nexus instead, follow the instructions for Option A; otherwise, follow the instructions for Option B.

OncoKB docker compose file consists of the following services:

  • OncoKB: provides variant annotations

  • OncoKB Transcript: serves OncoKB metadata including gene, transcript, sequence, etc.

  • Genome Nexus: provides annotation and interpretation of genetic variants in cancer

    • GRCh37 (optional):
      • gn-spring-boot: the backend service responsible for aggregating variant annotations from various sources
      • gn-mongo: variants fetched from external resources and small static data are cached in the MongoDB database
      • gn-vep: a spring boot REST wrapper service for VEP using GRCh37 data
    • GRCh38 (optional):
      • gn-spring-boot-grch38: same as gn-spring-boot service, however the VEP URL points to gn-vep-grch38
      • gn-mongo-grch38: contains static data relevant to GRCh38
      • gn-vep-grch38: a spring boot REST wrapper service for VEP using GRCh38 data

Option A: With Local installation of Genome Nexus

For this option, you need to download the VEP cache, which is used in the gn-vep and gn-vep-grch38 services. We have pre-downloaded the VEP data and saved them to our AWS S3 Bucket. If interested, here are the instructions we followed to download the Genome Nexus VEP Cache.

  1. OncoKB requires a MySQL server and the oncokb and oncokb-transcript databases imported. This step must be completed before continuing the installation process. Reach out to [email protected] to get access to the data dump.

  2. Download the Genome Nexus VEP data from our AWS S3 Bucket.

    # The home directory is used to store the VEP cache in this tutorial, but this can be changed to your preferred download location.
    cd ~
    mkdir gn-vep-data && cd "$_"
    
    mkdir 98_GRCh37 && cd "$_"
    curl -o 98_GRCh37.tar https://oncokb.s3.amazonaws.com/gn-vep-data/98_GRCh37/98_GRCh37.tar
    tar xvf 98_GRCh37.tar
    
    cd ..
    mkdir 98_GRCh38 && cd "$_"
    curl -o 98_GRCh38.tar https://oncokb.s3.amazonaws.com/gn-vep-data/98_GRCh38/98_GRCh38.tar
    tar xvf 98_GRCh38.tar
    
  3. Set environment variable for the location of VEP caches

    # Update path if the VEP data was installed elsewhere
    export VEP_CACHE=~/gn-vep-data/98_GRCh37
    export VEP_GRCH38_CACHE=~/gn-vep-data/98_GRCh38
    
  4. Run docker-compose to create containers.

    docker-compose --profile genome-nexus up -d
    

    Note: The --profile argument is used as a way to selectively enable services. Services with the genome-nexus profile will only be spun up when the profile is specified.

Option B: Without local installation of Genome Nexus

  1. OncoKB requires a MySQL server and the oncokb and oncokb-transcript databases imported. This step must be completed before continuing the installation process. Reach out to [email protected] to get access to the data dump.
  2. Remove -Dgenome_nexus.grch37.url and -Dgenome_nexus.grch38.url properties from the oncokb service.
  3. Run docker-compose to spin up oncokb and oncokb-transcript services
    docker-compose up -d
    

Additional Information

Generating oncokb-transcript token

The docker compose file has a pre-generated oncokb-transcript JWT token, which is required to make API requests to the oncokb-transcript service. To generate the JWT token, go to the https://jwt.io/ website and follow these instructions:

  1. Add the auth key and set it to ROLE_ADMIN to grant roles. The payload section should look something like this:
    {
        "sub": "1234567890",
        "name": "John Doe",
        "auth":"ROLE_ADMIN",
        "iat": 1516239022
    }
    
  2. In the Verify Signature section, check the box secret base64 encoded. Copy and paste the oncokb-transcript base64 secret into the input box.
    • You can also change the default base64 secret used for encoding by generating a base64 string and adding the environment variable, JHIPSTER_SECURITY_AUTHENTICATION_JWT_BASE64_SECRET: <new-base64-string>, to oncokb-transcript.
  3. Replace -Doncokb_transcript.token with the JWT token you generated.
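
The jwt.io steps above can also be sketched in Python with only the standard library. This is a minimal illustration, not the project's own tooling: it assumes the token is signed with HS256 (the algorithm jwt.io preselects; adjust if your service expects another), the helper names b64url and make_token are hypothetical, and the secret below is a placeholder rather than the real oncokb-transcript secret.

```python
import base64
import hashlib
import hmac
import json


def b64url(data):
    """Base64url-encode bytes without padding, as the JWT format requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def make_token(payload, base64_secret):
    """Build an HS256-signed JWT (assumed algorithm) from a payload dict."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    # Mirrors the "secret base64 encoded" checkbox: decode the secret first.
    key = base64.b64decode(base64_secret)
    signature = hmac.new(key, signing_input.encode("ascii"), hashlib.sha256).digest()
    return signing_input + "." + b64url(signature)


# Placeholder secret for illustration only; substitute your own base64 string.
EXAMPLE_SECRET = base64.b64encode(b"not-the-real-secret").decode("ascii")
token = make_token(
    {"sub": "1234567890", "name": "John Doe", "auth": "ROLE_ADMIN", "iat": 1516239022},
    EXAMPLE_SECRET,
)
```

The resulting string has the usual three dot-separated parts and can be pasted into jwt.io to check the payload before it is used for -Doncokb_transcript.token.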

Generating new VEP data

OncoKB predownloads VEP data and saves it to an AWS S3 bucket. These steps are for OncoKB developers and show how to download new Ensembl VEP data and upload it to S3. However, you can follow along and save VEP data to your own S3 bucket.

  1. Change the Ensembl image in the genome-nexus-vep Dockerfile to the desired version.
  2. Follow the instructions to download VEP cache files and FASTA files for GRCh37 and GRCh38.
  3. After downloading, your directory should look like:

    VEP_CACHE/
    ├─ homo_sapiens/
    │  ├─ 98_GRCh37/
    │  ├─ 98_GRCh38/

  4. Archive the files:

    tar cf 98_GRCh37.tar homo_sapiens/98_GRCh37
    tar cf 98_GRCh38.tar homo_sapiens/98_GRCh38

  5. Go to the AWS S3 webpage and, under oncokb/gn-vep-data/, create two folders:

    98_GRCh37/
    98_GRCh38/

  6. Upload the tar files to the corresponding S3 folders.
  7. Make the two S3 folders (oncokb/gn-vep-data/98_GRCh37/ and oncokb/gn-vep-data/98_GRCh38/) publicly accessible.
  8. Update the gn-vep and gn-vep-grch38 services in docker-compose.yml by modifying the environment variable to point to the new FASTA file:

    gn-vep
    VEP_FASTAFILERELATIVEPATH=homo_sapiens/98_GRCh37/Homo_sapiens.GRCh37.75.dna.primary_assembly.fa.gz

    gn-vep-grch38
    VEP_FASTAFILERELATIVEPATH=homo_sapiens/98_GRCh38/Homo_sapiens.GRCh38.dna.toplevel.fa.gz

  9. Modify the Dockerfile line in genome-nexus-vep to use the new Ensembl VEP image. As of 4/28/2023, genome-nexus-vep uses ensemblorg/ensembl-vep:release_98.3.
  10. Push the new genome-nexus-vep image to DockerHub.
  11. Change the image for both gn-vep and gn-vep-grch38 to the image pushed in step 10.

Questions?

The best way is to send an email to [email protected] so all our team members can help.

oncokb-annotator's Issues

typo

In the help menu (python ~/git/oncokb-annotator/MafAnnotator.py -h), Hugo_Symbol appears instead of Amino_Acid_Change.

Issue with ClinicalDataAnnotator.py, with LEVEL_R2

$ python /data/MoCha/patidarr/oncokb-annotator-1.1.0/ClinicalDataAnnotator.py -i 283228/20170910/oncoKB/283228.clinical.txt -o 283228/20170910/oncoKB/283228.oncoKB.clinical.txt -a 283228/20170910/oncoKB/283228.maf,283228/20170910/oncoKB/283228.cnv
annotating 283228/20170910/oncoKB/283228.clinical.txt...
Traceback (most recent call last):
  File "/data/MoCha/patidarr/oncokb-annotator-1.1.0/ClinicalDataAnnotator.py", line 53, in <module>
    main(sys.argv[1:])
  File "/data/MoCha/patidarr/oncokb-annotator-1.1.0/ClinicalDataAnnotator.py", line 40, in main
    processclinicaldata(annotatedalterationfiles, inputclinicalfile, outputclinicalfile)
  File "/gpfs/gsfs6/users/MoCha/patidarr/oncokb-annotator-1.1.0/AnnotatorCore.py", line 495, in processclinicaldata
    il = headers[l]
KeyError: 'LEVEL_R2'

When I ran the same command with oncokb-annotator-1.0.3, it worked.
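
The KeyError above comes from looking up a level name in headers that the annotated file does not contain. A minimal sketch of a more tolerant lookup follows; it assumes headers maps column names to indexes (as the traceback suggests), and the function name and the level list in the example are hypothetical, not the actual AnnotatorCore.py code.

```python
def level_column_indexes(headers, levels):
    """Split the requested level names into those present in headers and those missing."""
    found = {}
    missing = []
    for level in levels:
        if level in headers:
            found[level] = headers[level]
        else:
            missing.append(level)
    return found, missing


# Example: a file annotated without a LEVEL_R2 column (illustrative header map).
found, missing = level_column_indexes(
    {"SAMPLE_ID": 0, "LEVEL_1": 1, "LEVEL_R1": 2},
    ["LEVEL_1", "LEVEL_R1", "LEVEL_R2"],
)
```

Reporting the missing names explicitly makes it easier to tell whether the input files were produced by an older annotator version.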

Overflow error in AnnotatorCore.py

This line fails when executed on Windows (including 64-bit Windows):

csv.field_size_limit(sys.maxsize) # for reading large files

The error is

OverflowError: Python int too large to convert to C long

caused by the Windows C long being 32 bits.
The following fixes this, and should be OK for Python 3 and Python 2, on any CPU / OS.

import ctypes as ct
csv.field_size_limit(int(ct.c_ulong(-1).value // 2))

See _csv.Error: field larger than field limit (131072)
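
An alternative that avoids ctypes is to retry with smaller values until one is accepted. This is a sketch of a commonly used idiom, not the fix adopted by the project; the function name is hypothetical.

```python
import csv
import sys


def set_max_field_size_limit():
    """Raise the csv field size limit as high as the platform's C long allows."""
    limit = sys.maxsize
    while True:
        try:
            csv.field_size_limit(limit)
            return limit
        except OverflowError:
            # The value does not fit in a C long on this platform; try a smaller one.
            limit //= 2


applied_limit = set_max_field_size_limit()
```

On platforms where sys.maxsize already fits in a C long, the first attempt succeeds and the loop exits immediately.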

Shifted column names

If the last column in a MAF is all empty strings (usually the result of cmo_maf2maf with Caller being the last column), the MafAnnotator.py output is shifted by one column.
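
One way such a shift can arise is when a line is stripped of all trailing whitespace before splitting, which also removes the trailing tab that marks the empty last field. Whether this is the actual cause in MafAnnotator.py is an assumption; the sketch below only shows the difference, using a hypothetical helper and a made-up three-column row.

```python
def split_maf_row(line):
    """Split one tab-delimited line, keeping empty trailing fields."""
    # Remove only the line ending, never trailing tabs.
    return line.rstrip("\r\n").split("\t")


header = split_maf_row("Hugo_Symbol\tHGVSp_Short\tCaller\n")
row = split_maf_row("BRAF\tp.V600K\t\n")

# For comparison: str.strip() also removes the trailing tab, losing the empty field.
stripped_row = "BRAF\tp.V600K\t\n".strip().split("\t")
```

With the helper, the row keeps the same number of fields as the header, so appended annotation columns line up.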

Gene symbols

Hi guys,

What is the source of the gene symbols you use?

I'm using http://oncokb.org/api/v1/genes to pull down genes and subset on tumor-suppressor genes. This is one of the entries:

{
    "entrezGeneId":84142,
    "hugoSymbol":"FAM175A",
    "name":"family with sequence similarity 175 member A",
    "oncogene":false,
    "curatedIsoform":"ENST00000321945",
    "curatedRefSeq":"NM_139076.2",
    "geneAliases":["ABRA1","CCDC98"],
    "tsg":true
}

The "official" symbol for this gene, however, seems to be ABRAXAS1: https://www.ncbi.nlm.nih.gov/gene/84142. That is also the primary symbol associated with the gene in the internal instance of the IMPACT series.

Errors for example dataset and command

With the latest version of the annotator, I got the following errors with the example code. My token has not expired (174 days left).

ERROR:AnnotatorCore:error when processing https://www.oncokb.org/api/v1/utils/allCuratedGenes.json 
reason: Forbidden
INFO:MafAnnotator:annotating data/example_maf.txt ...
INFO:MafAnnotator:done!

any suggestions?
Thanks

Support both get/post to oncokb apis

Change the pull_mutation_info arguments to a list of objects of a class that includes hugo, protein_change, consequence, start, end, cancer_type.
Add an additional method to support POST requests to the OncoKB API.

Append the annotation to the file

This is the step after getting the response back from the OncoKB API and processing the results of the queries.
At this point, you are able to make the call to OncoKB. After getting the processed result, we want to write the original rows together with the newly processed annotations to the file.

  • Store the original rows in a list. Similar to queries, let's create another list called rows. At the place where you push Query to queries, we should push row to rows
  • pull_mutation_info should return a list of annotations based on the list of queries you send in. Loop through the list of annotations and get each annotation from the list.
  • get the row from rows; the index of the row is the same as the index of the annotation
  • append the annotation after the row
  • write the row to the file
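
The steps above can be sketched as follows. This is a minimal illustration under assumptions: rows and annotations are parallel lists of field lists, the function name is hypothetical, and the example values are taken from the BRAF V600K response quoted elsewhere in these issues.

```python
import csv
import io


def write_annotated_rows(out, rows, annotations):
    """Write each original row followed by its annotation, matched by index."""
    if len(rows) != len(annotations):
        raise ValueError("rows and annotations must have the same length")
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for row, annotation in zip(rows, annotations):
        writer.writerow(list(row) + list(annotation))


# Example with an in-memory file standing in for the output file.
buffer = io.StringIO()
write_annotated_rows(buffer, [["BRAF", "V600K"]], [["Oncogenic", "LEVEL_1"]])
```

Checking the lengths up front guards against a batch response that silently drops or duplicates an entry.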

input to MafAnnotator.py

Hi,

I like this program. I am using it for a current project. I have a few questions about MafAnnotator.py.

  1. Should the input MAF be pre-filtered to include only "functional mutations"? I noticed that your example MAF only has functional mutations.
  2. For elements of the MAF that don't have annotated protein positions (e.g. intronic mutations), I seem to be getting the messages of the form "position wrong at lineN -/M" where M and N are integers. Why is this? I looked at the logic in the program. It seems to be checking the HGVSp columns against the Protein_position column which are both unpopulated in the case of mutations that don't cause an amino acid coding change.
  3. Should the read depth columns have any effect on the results?
  4. For a specific example related to all of these questions, consider the attached file which contains the output file where the input files only differ in that the columns t_depth (368), t_ref_count (164), t_alt_count (86), n_depth (359), n_ref_count (249) and n_alt_count (1) have NA in one file and they have integers in parentheses above in the other. Please note the sums don't add up because the allele-specific read counting was capped at 250 in the generating program.
  5. I was not expecting the KIT intronic mutation in the attachment to be actionable, so my sense is that the result with the allele counts populated is correct. But I would like to verify that this is the desired result. In other words, should this intronic KIT mutation be considered an actionable mutation?

Thanks,
Pete Vedell
Informatics Specialist
Mayo Clinic
[email protected]
example_unexpected_result.maf.transposed.txt

Use POST methods for annotation

Replace current GET method with POST, so we can decrease the workload
https://www.oncokb.org/swagger-ui/index.html

  • Understand how the functions makeoncokbrequest and pulloncokb work in AnnotatorCore.py

  • Understand why we want to change GET to POST

  • Use pull_mutation_info, which calls pulloncokb, as the example: we should call pull_mutation_info after reading 50 rows instead of calling it for every row.

  • Understand how the method processalterationevents works.
    It takes a file and reads the row on line 269.
    It processes the row and gathers all necessary information before calling pull_mutation_info

  • Change the pull_mutation_info arguments to a list of objects of a class that includes hugo, protein_change, consequence, start, end, cancer_type
    From the last step, you understand how processalterationevents works. Instead of reading one row and then calling pull_mutation_info, we want to read 50 rows and put the processed data into a list, then pass the list to pull_mutation_info.

  • Process 50 rows in pulloncokb, return the responses for those 50 rows, and append the results to the file

  • If the API call fails, we should resend the request. (Sometimes the system may be unstable under an overwhelming number of requests, but there should always be a service available.)
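
The batching and retry behaviour described above can be sketched as follows. This is an illustration only: chunked and post_with_retry are hypothetical names, send stands for whatever callable performs the POST, and treating OSError as the retryable failure is an assumption.

```python
import time


def chunked(items, size=50):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def post_with_retry(send, batch, attempts=3, delay_seconds=0.0):
    """Call send(batch), retrying on OSError up to `attempts` times in total."""
    for attempt in range(1, attempts + 1):
        try:
            return send(batch)
        except OSError:
            if attempt == attempts:
                raise  # Give up after the final attempt.
            time.sleep(delay_seconds)
```

A caller would loop over chunked(queries) and pass each batch to post_with_retry, appending the returned annotations to the output as they arrive.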

error while using local oncokB url

hi @jjgao,

I want to use a local version of oncokb to make sure the results I am generating today are what I will get down the road on a sample (if I reran a sample).
I set up OncoKB and can browse the site, but when I run oncokb-annotator I get errors.
python /data/MoCha/patidarr/oncokb-annotator-1.1.0/MafAnnotator.py -i tmp.maf -o tm2.maf -t MEL -u https://mocha-cbioportal.ncifcrf.gov/cbioportal-dev

annotating tmp.maf...
error when processing https://mocha-cbioportal.ncifcrf.gov/cbioportal-dev/annotate/mutations/byProteinChange?hugoSymbol=FANCD2&alteration=X426_splice&tumorType=MEL&consequence=splice_region_variant
error when processing https://mocha-cbioportal.ncifcrf.gov/cbioportal-dev/annotate/mutations/byProteinChange?hugoSymbol=OPA1&alteration=I24V&tumorType=MEL&consequence=missense_variant&proteinStart=24&proteinEnd=24
error when processing https://mocha-cbioportal.ncifcrf.gov/cbioportal-dev/annotate/mutations/byProteinChange?hugoSymbol=SDHA&alteration=L649Efs*4&tumorType=MEL&consequence=frameshift_variant
error when processing https://mocha-cbioportal.ncifcrf.gov/cbioportal-dev/annotate/mutations/byProteinChange?hugoSymbol=BRAF&alteration=V600K&tumorType=MEL&consequence=missense_variant&proteinStart=600&proteinEnd=600
error when processing https://mocha-cbioportal.ncifcrf.gov/cbioportal-dev/annotate/mutations/byProteinChange?hugoSymbol=PPP6C&alteration=T120Nfs*32&tumorType=MEL&consequence=frameshift_variant
done!

but when I paste the https://mocha-cbioportal.ncifcrf.gov/cbioportal-dev/annotate/mutations/byProteinChange?hugoSymbol=BRAF&alteration=V600K&tumorType=MEL&consequence=missense_variant&proteinStart=600&proteinEnd=600 in browser I get following:

{"query":{"id":null,"type":"regular","hugoSymbol":"BRAF","entrezGeneId":673,"alteration":"V600K","alterationType":null,"svType":null,"tumorType":"MEL","consequence":"missense_variant","proteinStart":600,"proteinEnd":600,"hgvs":null},"geneExist":true,"variantExist":true,"alleleExist":true,"oncogenic":"Oncogenic","mutationEffect":{"knownEffect":"Gain-of-function","description":"","citations":{"pmids":["25417114","20179705","23833300","22535154","26091043","26343582","30630828","23922205","25079552","28783719","19251651","15035987"],"abstracts":[]}},"highestSensitiveLevel":"LEVEL_1","highestResistanceLevel":null,"highestDiagnosticImplicationLevel":null,"highestPrognosticImplicationLevel":null,"otherSignificantSensitiveLevels":[],"otherSignificantResistanceLevels":[],"hotspot":true,"geneSummary":"BRAF, an intracellular kinase, is frequently mutated in melanoma, thyroid and lung cancers among others.","variantSummary":"The BRAF V600K mutation is known to be oncogenic.","tumorTypeSummary":"The RAF-inhibitors encorafenib, dabrafenib and vemurafenib alone or in combination with the MEK-inhibitors binimetinib, trametinib and cobimetinib, respectively, are FDA-approved for the treatment of patients with BRAF V600E/K mutant melanoma.","prognosticSummary":"","diagnosticSummary":"","diagnosticImplications":[],"prognosticImplications":[],"treatments":[{"drugs":[{"ncitCode":"C82386","drugName":"Dabrafenib","uuid":"939cd40b-b515-499d-b099-fd29027c0d17","synonyms":["DABRAFENIB","GSK-2118436","GSK-2118436A","BRAF Inhibitor GSK2118436","Dabrafenib","GSK2118436","Benzenesulfonamide, N-(3-(5-(2-amino-4-pyrimidinyl)-2-(1,1-dimethylethyl)-4-thiazolyl)-2-fluorophenyl)-2,6-difluoro-"]}],"approvedIndications":["Dabrafenib is FDA-approved for BRAF V600E mutant unresectable or metastatic 
melanoma."],"fdaApproved":null,"level":"LEVEL_1","pmids":["22608338","23051966","22735384"],"abstracts":[]},{"drugs":[{"ncitCode":"C82386","drugName":"Dabrafenib","uuid":"939cd40b-b515-499d-b099-fd29027c0d17","synonyms":["DABRAFENIB","GSK-2118436","GSK-2118436A","BRAF Inhibitor GSK2118436","Dabrafenib","GSK2118436","Benzenesulfonamide, N-(3-(5-(2-amino-4-pyrimidinyl)-2-(1,1-dimethylethyl)-4-thiazolyl)-2-fluorophenyl)-2,6-difluoro-"]},{"ncitCode":"C77908","drugName":"Trametinib","uuid":"fb2bb01c-c0ec-4641-abf7-87f486075022","synonyms":["Mekinist","TRAMETINIB","JTP-74057","MEK Inhibitor GSK1120212","N-(3-{3-cyclopropyl-5-[(2-fluoro-4-iodophenyl)amino]-6,8-dimethyl-2,4,7-trioxo-3,4,6,7-tetrahydropyrido[4,3-d]pyrimidin-1(2H)-yl}phenyl)acetamide","GSK1120212","Trametinib"]}],"approvedIndications":["Dabrafenib + Trametinib is FDA-approved for BRAF V600E or V600K mutant unresectable or metastatic melanoma"],"fdaApproved":null,"level":"LEVEL_1","pmids":["28891408","25265492","25287827","29361468","28991513","23020132","25399551"],"abstracts":[]},{"drugs":[{"ncitCode":"C98283","drugName":"Encorafenib","uuid":"001e534f-3e63-432f-90a6-d1af1759e4e2","synonyms":["LGX-818","Encorafenib","LGX818","LGX 818","ENCORAFENIB","Braftovi"]},{"ncitCode":"C84865","drugName":"Binimetinib","uuid":"feb9f4a3-e374-4c75-8a3b-0f1fbcdbf677","synonyms":["ARRY-438162","Mektovi","Binimetinib","ARRY-162","MEK162","BINIMETINIB"]}],"approvedIndications":["In combination for patients with unresectable or metastatic melanoma with a BRAF V600E or V600K mutation"],"fdaApproved":null,"level":"LEVEL_1","pmids":["29573941"],"abstracts":[]},{"drugs":[{"ncitCode":"C77908","drugName":"Trametinib","uuid":"fb2bb01c-c0ec-4641-abf7-87f486075022","synonyms":["Mekinist","TRAMETINIB","JTP-74057","MEK Inhibitor 
GSK1120212","N-(3-{3-cyclopropyl-5-[(2-fluoro-4-iodophenyl)amino]-6,8-dimethyl-2,4,7-trioxo-3,4,6,7-tetrahydropyrido[4,3-d]pyrimidin-1(2H)-yl}phenyl)acetamide","GSK1120212","Trametinib"]}],"approvedIndications":["Trametinib is FDA-approved for BRAF V600E or V600K mutant unresectable or metastatic melanoma"],"fdaApproved":null,"level":"LEVEL_1","pmids":["29361468","25399551","22663011","25265492"],"abstracts":[]},{"drugs":[{"ncitCode":"C64768","drugName":"Vemurafenib","uuid":"4e91da20-6cf0-4e07-995f-7f7db4c7c077","synonyms":["Vemurafenib","BRAF(V600E) Kinase Inhibitor RO5185426","RO 5185426","PLX4032","PLX-4032","1-propanesulfonamide, n-(3-((5-(4-chlorophenyl)-1h-pyrrolo(2,3-b)pyridin-3-yl)carbonyl)-2,4-difluorophenyl)-","RG 7204","Zelboraf","BRAF (V600E) kinase inhibitor RO5185426","RG7204","VEMURAFENIB"]}],"approvedIndications":["Vemurafenib is FDA-approved for BRAF V600E mutant unresectable or metastatic melanoma"],"fdaApproved":null,"level":"LEVEL_1","pmids":["28961848","24508103","25399551"],"abstracts":[]},{"drugs":[{"ncitCode":"C64768","drugName":"Vemurafenib","uuid":"4e91da20-6cf0-4e07-995f-7f7db4c7c077","synonyms":["Vemurafenib","BRAF(V600E) Kinase Inhibitor RO5185426","RO 5185426","PLX4032","PLX-4032","1-propanesulfonamide, n-(3-((5-(4-chlorophenyl)-1h-pyrrolo(2,3-b)pyridin-3-yl)carbonyl)-2,4-difluorophenyl)-","RG 7204","Zelboraf","BRAF (V600E) kinase inhibitor RO5185426","RG7204","VEMURAFENIB"]},{"ncitCode":"C68923","drugName":"Cobimetinib","uuid":"eb357145-3b18-4aca-b75a-5e18dd2bf4f9","synonyms":["Cotellic","Cobimetinib","XL518","COBIMETINIB","GDC-0973","MEK Inhibitor GDC-0973"]}],"approvedIndications":["Cobimetinib is FDA-approved for the treatment of patients with unresectable or metastatic melanoma with a BRAF V600E or V600K mutation, in combination with vemurafenib. 
Cobimetinib is not indicated for treatment of patients with wild-type BRAF melanoma."],"fdaApproved":null,"level":"LEVEL_1","pmids":["27480103","25265494"],"abstracts":[]},{"drugs":[{"ncitCode":"C82386","drugName":"Dabrafenib","uuid":"939cd40b-b515-499d-b099-fd29027c0d17","synonyms":["DABRAFENIB","GSK-2118436","GSK-2118436A","BRAF Inhibitor GSK2118436","Dabrafenib","GSK2118436","Benzenesulfonamide, N-(3-(5-(2-amino-4-pyrimidinyl)-2-(1,1-dimethylethyl)-4-thiazolyl)-2-fluorophenyl)-2,6-difluoro-"]},{"ncitCode":"C1857","drugName":"Panitumumab","uuid":"d7b1d12a-e942-4801-bb64-916c9bdfaaf3","synonyms":["ABX-EGF, Clone E7.6.3","PANITUMUMAB","Vectibix","Panitumumab","MoAb ABX-EGF","ABX-EGF","ABX-EGF Monoclonal Antibody","Monoclonal Antibody ABX-EGF","panitumumab"]},{"ncitCode":"C77908","drugName":"Trametinib","uuid":"fb2bb01c-c0ec-4641-abf7-87f486075022","synonyms":["Mekinist","TRAMETINIB","JTP-74057","MEK Inhibitor GSK1120212","N-(3-{3-cyclopropyl-5-[(2-fluoro-4-iodophenyl)amino]-6,8-dimethyl-2,4,7-trioxo-3,4,6,7-tetrahydropyrido[4,3-d]pyrimidin-1(2H)-yl}phenyl)acetamide","GSK1120212","Trametinib"]}],"approvedIndications":[],"fdaApproved":null,"level":"LEVEL_2B","pmids":["29431699"],"abstracts":[{"link":"http://ascopubs.org/doi/abs/10.1200/JCO.2018.36.4_suppl.627","abstract":"Cutsem et al. Abstract# 627, ASCO 2018"},{"link":"https://academic.oup.com/annonc/article/29/suppl_5/mdy149.026/5039436","abstract":"Cutsem et al. 
Abstract# O-027, ESMO 2018"}]},{"drugs":[{"ncitCode":"C98283","drugName":"Encorafenib","uuid":"001e534f-3e63-432f-90a6-d1af1759e4e2","synonyms":["LGX-818","Encorafenib","LGX818","LGX 818","ENCORAFENIB","Braftovi"]},{"ncitCode":"C1723","drugName":"Cetuximab","uuid":"5fce3074-e420-4c36-9603-2423daf20118","synonyms":["CETUXIMAB","Cetuximab Biosimilar CMAB009","IMC-C225","Chimeric Anti-EGFR Monoclonal Antibody","Cetuximab Biosimilar KL 140","Cetuximab Biosimilar CDP-1","cetuximab","Chimeric Monoclonal Antibody C225","Cetuximab","Chimeric MoAb C225","Erbitux"]},{"ncitCode":"C84865","drugName":"Binimetinib","uuid":"feb9f4a3-e374-4c75-8a3b-0f1fbcdbf677","synonyms":["ARRY-438162","Mektovi","Binimetinib","ARRY-162","MEK162","BINIMETINIB"]}],"approvedIndications":[],"fdaApproved":null,"level":"LEVEL_2B","pmids":["29431699"],"abstracts":[{"link":"http://ascopubs.org/doi/abs/10.1200/JCO.2018.36.4_suppl.627","abstract":"Cutsem et al. Abstract# 627, ASCO 2018"},{"link":"https://academic.oup.com/annonc/article/29/suppl_5/mdy149.026/5039436","abstract":"Cutsem et al. Abstract# O-027, ESMO 2018"}]}],"dataVersion":"","lastUpdate":"06/19/2019","vus":false}

Could you please let me know what I am doing wrong here?

Thanks,
@patidarr

How to handle MSI

Hi,

Is it possible to give MSI Status as input to OncoKBPlots.py?

Thanks,
Rajesh

Some MSK Archer fusion annotations are inconsistent with cBioPortal

Hi,
I ran oncokb-annotator on 04/03/2020 on a subset of the MSK IMPACT cohort and some of the archer fusion annotations are inconsistent with cBioPortal. For example the oncogenic column for P-0022485-T01-IM6 NR4A3-EWSR1 fusion - Archer was blank when I ran the annotator, but on the portal it shows this fusion as likely oncogenic. P-0024067-T01-IM6 FLI1-EWSR1 fusion - Archer is another example of this, the annotator result is blank but it's level 4 on the portal. No errors showed up when running FusionAnnotator.py and I cloned the repo right before running it. Attached is a list of the IMPACT samples with the discrepancy in my cohort and the output of the oncogenic column from the annotator. Adjacent is the annotation from the portal that I manually curated. Do you know what could be causing the discrepancy? I'd really appreciate any help.
Thanks!

Support 3-letter AA code for HGVSP column

Update pulloncokb method

Currently, the method pulloncokb both makes the call to OncoKB and processes the result. That should not be the case when you are trying to support GET and POST at the same time, considering we are only making changes for pull_mutation_info. Let's change the logic of pulloncokb to take the response from OncoKB instead.

  • rename pulloncokb to process_oncokb_annotation.
  • The parameter should be only one and named annotation. Remember, this is annotation for one query.
  • Process annotation. So the basic logic of this method would not change. The only difference is that instead of fetching oncokb and process, we only process the result.
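
A sketch of what the renamed method could look like is shown below. The field names come from the BRAF V600K response quoted in the "error while using local oncokB url" issue; which fields are returned, and in what order, is an assumption made for illustration.

```python
def process_oncokb_annotation(annotation):
    """Turn the annotation for one query into a flat list of output fields."""
    mutation_effect = annotation.get("mutationEffect") or {}
    return [
        annotation.get("oncogenic") or "",
        mutation_effect.get("knownEffect") or "",
        annotation.get("highestSensitiveLevel") or "",
        annotation.get("highestResistanceLevel") or "",
    ]


# Example using a trimmed-down version of the quoted response.
fields = process_oncokb_annotation({
    "oncogenic": "Oncogenic",
    "mutationEffect": {"knownEffect": "Gain-of-function"},
    "highestSensitiveLevel": "LEVEL_1",
    "highestResistanceLevel": None,
})
```

Because the function only reads a dictionary, it works the same whether the annotation came from a GET response or from one element of a POST response.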

example data doesn't seem correct

Hi,

I tried running your example data, and the results (example_maf.onckkb.txt) do not match what I see on the OncoKB website.

For example, the PIK3CA E542K variant was annotated as level 3B and 4, but here:

http://oncokb.org/#/gene/PIK3CA/variant/E542K

...it says 3A. None of the levels in the output match what I see online.

Am I doing something wrong?

thanks for your help,
Greg.

MafAnnotator.py run unsuccessfully with error of json decode

I am a Chinese user, and my system encoding is normally UTF-8.
When I run MafAnnotator.py, the error occurred as follows:

#####################################################################
########################### parting line ################################
annotating data/example_maf.txt...
Traceback (most recent call last):
  File "MafAnnotator.py", line 81, in <module>
    main(sys.argv[1:])
  File "MafAnnotator.py", line 68, in main
    processalterationevents(inputmaffile, outputmaffile, previousresultfile, defaultcancertype, cancertypemap, False)
  File "/Users/peiyuchen/Biosoft/oncokb-annotator-1.0.6/AnnotatorCore.py", line 129, in processalterationevents
    inithotspots()
  File "/Users/peiyuchen/Biosoft/oncokb-annotator-1.0.6/AnnotatorCore.py", line 120, in inithotspots
    missensesinglehotspots = gethotspots(cancerhotspotsbaseurl+"/api/hotspots/single", "single residue")
  File "/Users/peiyuchen/Biosoft/oncokb-annotator-1.0.6/AnnotatorCore.py", line 97, in gethotspots
    hotspotsjson = json.load(urllib.urlopen(url))
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/__init__.py", line 290, in load
    **kw)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/__init__.py", line 338, in loads
    return _default_decoder.decode(s)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/decoder.py", line 369, in decode
    raise ValueError(errmsg("Extra data", s, end, len(s)))
ValueError: Extra data: line 2 column 1 - line 33 column 1 (char 6 - 477818)
############################ parting line ####################################
#########################################################################

I wondered whether this resulted from my Chinese system encoding, so I switched my Mac to the English system. However, nothing changed; I got the same error as before.
I have only a little Python background, so I cannot solve this problem by myself. Could anyone help me figure out what is wrong with my MafAnnotator.py run?

Thanks a lot for helping me.

Error running MafAnnotator.py

Hi @jjgao,

I can't figure out what I am doing wrong to get this error:
[patidarr@cn3151 processedDATA]$ python /data/MoCha/patidarr/oncokb-annotator/MafAnnotator.py -i LG0520/20170910/LG0520.consensus.maf -o LG0520/20170910/oncoKB/LG0520.maf -t LUSC
annotating LG0520/20170910/LG0520.consensus.maf...
Traceback (most recent call last):
  File "/data/MoCha/patidarr/oncokb-annotator/MafAnnotator.py", line 81, in <module>
    main(sys.argv[1:])
  File "/data/MoCha/patidarr/oncokb-annotator/MafAnnotator.py", line 68, in main
    processalterationevents(inputmaffile, outputmaffile, previousresultfile, defaultcancertype, cancertypemap, False)
  File "/gpfs/gsfs6/users/MoCha/patidarr/oncokb-annotator/AnnotatorCore.py", line 127, in processalterationevents
    inithotspots()
  File "/gpfs/gsfs6/users/MoCha/patidarr/oncokb-annotator/AnnotatorCore.py", line 118, in inithotspots
    missensesinglehotspots = gethotspots(cancerhotspotsbaseurl+"/api/hotspots/single", "single residue")
  File "/gpfs/gsfs6/users/MoCha/patidarr/oncokb-annotator/AnnotatorCore.py", line 95, in gethotspots
    hotspotsjson = json.load(urllib.urlopen(url))
  File "/usr/local/Anaconda/envs/py2.7/lib/python2.7/json/__init__.py", line 291, in load
    **kw)
  File "/usr/local/Anaconda/envs/py2.7/lib/python2.7/json/__init__.py", line 339, in loads
    return _default_decoder.decode(s)
  File "/usr/local/Anaconda/envs/py2.7/lib/python2.7/json/decoder.py", line 364, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/local/Anaconda/envs/py2.7/lib/python2.7/json/decoder.py", line 382, in raw_decode
    raise ValueError("No JSON object could be decoded")
ValueError: No JSON object could be decoded

I used vcf2maf to generate the MAF file. A similar MAF file from another patient worked, but there are some 70 patients for which it does not work.

Any insight would be very helpful.

Thanks,
Rajash

support running the program from any directory

The code currently has to be run from inside the git/oncokb-annotator/ directory; otherwise it complains of "IOError: [Errno 2] No such file or directory: 'data/curated_genes.txt'"
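
A common way to remove this restriction is to resolve data files relative to the script's own location rather than the current working directory. The sketch below shows the idea; resource_path is a hypothetical helper, and the install path in the example is made up.

```python
import os


def resource_path(relative_path, script_file):
    """Return an absolute path to a data file located next to the given script."""
    base_dir = os.path.dirname(os.path.abspath(script_file))
    return os.path.join(base_dir, relative_path)


# Inside a module this would typically be called with __file__, for example:
#   resource_path("data/curated_genes.txt", __file__)
example = resource_path("data/curated_genes.txt", "/opt/oncokb-annotator/AnnotatorCore.py")
```

The returned path is absolute, so opening it works regardless of the directory from which the program is started.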

Handle failing api if the post fails

If, for some reason, the POST API call fails when fetching from OncoKB, we should fall back to GET calls and annotate the queries one by one so that we do not lose most of the data.
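
A minimal sketch of this fallback, assuming two hypothetical callables: post_batch, which annotates a whole list of queries in one request, and get_single, which annotates one query. Treating OSError as the failure signal and recording None for queries that still fail are also assumptions.

```python
def annotate_with_fallback(queries, post_batch, get_single):
    """Try one batch POST; on failure, annotate each query individually via GET."""
    try:
        return post_batch(queries)
    except OSError:
        results = []
        for query in queries:
            try:
                results.append(get_single(query))
            except OSError:
                # Keep the position so results stay aligned with queries.
                results.append(None)
        return results
```

Keeping one result per query, even when it is None, preserves the index alignment that the row-appending step relies on.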

Array joining issue

The annotator returns all original data.
But sometimes it raises the following error and stops the analysis:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 0: ordinal not in range(128)
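
This kind of error typically appears in Python 2 when byte strings containing non-ASCII data are joined with unicode strings, which triggers an implicit ASCII decode. A hedged sketch of one way to avoid it is to decode every byte string explicitly before joining; the helper names are hypothetical and UTF-8 input is an assumption.

```python
def to_text(value, encoding="utf-8"):
    """Decode byte strings to text; leave text values unchanged."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def join_fields(fields, separator=u"\t"):
    """Join fields after normalising each one to text."""
    return separator.join(to_text(field) for field in fields)


# 0xe2 0x80 0x99 is the UTF-8 encoding of a right single quotation mark.
joined = join_fields([b"\xe2\x80\x99", u"BRAF"])
```

Decoding at the point where data is read keeps the rest of the code working with text only.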

Failed run on "CnaAnnotator.py"

Hi, there,

I recently cloned this version of oncokb_annotator and ran "CnaAnnotator.py" using the following command and got the following error:

[gongy@luna BRAF_Fusions]$ which python
/opt/common/CentOS_6-dev/python/python-2.7.10/bin/python
[gongy@luna BRAF_Fusions]$ python ~/utilities/pipelines/oncokb-annotator/CnaAnnotator.py -i /home/jonssonp/res/dmp/mskimpact/data_CNA.txt -o test
annotating /home/jonssonp/res/dmp/mskimpact/mskimpact/data_CNA.txt...
Traceback (most recent call last):
  File "/home/gongy/utilities/pipelines/oncokb-annotator/CnaAnnotator.py", line 72, in <module>
    main(sys.argv[1:])
  File "/home/gongy/utilities/pipelines/oncokb-annotator/CnaAnnotator.py", line 59, in main
    processcnagisticdata(inputcnafile, outputcnafile, previousresultfile, defaultcancertype, cancertypemap, False)
  File "/home/gongy/utilities/pipelines/oncokb-annotator/AnnotatorCore.py", line 400, in processcnagisticdata
    outf.write(oncokbinfo)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 62: ordinal not in range(128)
[gongy@luna BRAF_Fusions]$ diff /home/bandlamc/ifs_work/git/oncokb-annotator/CnaAnnotator.py ~/utilities/pipelines/oncokb-annotator/CnaAnnotator.py

I tried to use an older version of "CnaAnnotator.py" from the following directory and it works well:

/home/bandlamc/ifs_work/git/oncokb-annotator/CnaAnnotator.py

Please let me know if you can't access any of these files.

Thanks.
Yixiao Gong
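
The UnicodeEncodeError in the traceback above is raised when text containing a non-ASCII character (here u'\xe9') is written to a file opened with the default ASCII encoding. One possible remedy, sketched below with a hypothetical helper and a temporary file, is to open the output with an explicit UTF-8 encoding; io.open accepts the encoding argument in both Python 2 and Python 3.

```python
import io
import os
import tempfile


def append_utf8_line(path, text):
    """Append one line of text to a file, encoding it as UTF-8 explicitly."""
    with io.open(path, "a", encoding="utf-8") as out:
        out.write(text + u"\n")


# Demonstration with a temporary file and a string containing u'\xe9'.
demo_path = os.path.join(tempfile.mkdtemp(), "annotated.txt")
append_utf8_line(demo_path, u"caf\xe9")
```

Whether outf in AnnotatorCore.py can simply be opened this way depends on how the rest of the module handles byte strings, so treat this as a starting point rather than a drop-in patch.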

Change annotator to use the new Annotation endpoints

... based on the data types, i.e. the copy number endpoint for copy number events and the SV endpoint for SVs. For mutations, if already annotated with a protein change, use byProteinChange by default, otherwise use byGenomicChange, and give an option to always use byGenomicChange.

Please compare to the current results.
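
The selection rule described above can be sketched as a small function. The data type keys ("cna", "sv", "mutation") and the labels returned for the copy number and SV cases are hypothetical placeholders; only byProteinChange and byGenomicChange are names taken from this issue.

```python
def choose_endpoint(data_type, has_protein_change=False, force_genomic=False):
    """Pick an annotation endpoint label from the data type and available fields."""
    if data_type == "cna":
        return "copy_number"
    if data_type == "sv":
        return "structural_variant"
    if data_type == "mutation":
        if has_protein_change and not force_genomic:
            return "byProteinChange"
        return "byGenomicChange"
    raise ValueError("unknown data type: %s" % data_type)
```

Keeping the decision in one place makes it straightforward to compare results between the old and new endpoints by toggling force_genomic.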
