hpocaseannotator's Introduction

Hpo Case Annotator

Hpo Case Annotator makes biocuration of case reports easier.

Most users should download the latest Hpo Case Annotator distribution ZIP file from the Releases page.

Please consult the Read the docs site for detailed documentation:

  • stable version describing the latest release on the Releases page, or
  • latest version summarizing the latest development on the development branch.

Issues?

Feel free to submit an issue to our tracker.

hpocaseannotator's Issues

Genome build/ Assembly could be an Enum

see

https://github.com/phenopackets/phenopacket-schema/blob/4ce66acabfd3cc0f66e0a33bc5199bb80c4b2c87/src/main/proto/org/phenopackets/schema/v1/core/base.proto#L509-L521

If you want the patch version, this won't be ideal, but the patch could be included in a compound type, e.g.

message GenomeAssemblyWithPatch {
 GenomeAssembly genome_assembly = 1;
 int32 patch = 2;
}

Malformed export of author name

In the phenopacket that gets exported from this publication:
1: Irfanullah, Umair M, Khan S, Ahmad W. Homozygous sequence variants in the NPR2
gene underlying Acromesomelic dysplasia Maroteaux type (AMDM) in consanguineous
families. Ann Hum Genet. 2015 Jul;79(4):238-44. doi: 10.1111/ahg.12116. Epub 2015
May 11. PubMed PMID: 25959430.

the export uses "author" instead of "Irfanullah"

"publication": {
    "authorList": "Irfanullah, Umair M, Khan S, Ahmad W",
    "title": "Homozygous sequence variants in the NPR2 gene underlying Acromesomelic dysplasia Maroteaux type (AMDM) in consanguineous families",
    "journal": "Ann Hum Genet",
    "year": "2015",
    "volume": "79(4)",
    "pages": "238-44",
    "pmid": "25959430"
  },

Cannot open annotation files...

Caused by: java.lang.ArrayIndexOutOfBoundsException: 0
at java.beans.XMLDecoder.readObject(XMLDecoder.java:250)
at org.monarchinitiative.hpo_case_annotator.io.XMLModelParser.loadDiseaseCaseModel(XMLModelParser.java:65)
at org.monarchinitiative.hpo_case_annotator.controllers.MainController.openMenuItemAction(MainController.java:109)
... 53 more

Get accession # doesn't do anything except get accession

Also in Hpo Case Annotator v1.0.11, when you hit the 'Get accession' button, you do accurately get the accession number(s). However, if you hit the accession number button, another window opens, and it seems that after adding the accession number and variant and hitting 'ok' it should lead somewhere. It doesn't. Is it supposed to go to VariantValidator?

Disease database

The disease/disease name field is inactivated. Do we need to download stuff to fill these fields? There is nothing in the Settings field for the disease data.

phenopacket id missing

I am progressing with the C++ validator. Now I think it is validating the entire RD phenopacket, and it seems that the only error is that there is no id for the phenopacket (see below). Probably we can give the id PMID_12345_patient_B or something like that.

Phenopacket at: ../Gebbia-1997-ZIC3-III-1.json

Phenopacket:
ID: III-1
Age: 7W
Sex: male
Arrhinencephaly [HP:0002139]
Single ventricle [HP:0001750]
Patent ductus arteriosus [HP:0001643]
Asplenia [HP:0001746]
Pulmonary artery hypoplasia [HP:0004971]
Complete atrioventricular canal defect [HP:0001674]
Transposition of the great arteries [HP:0001669]
Posteriorly placed anus [HP:0012890]
Ventricular septal defect [HP:0001629]
Abnormal ciliary motility [HP:0012262]
Pulmonary artery atresia [HP:0004935]
Abdominal situs inversus [HP:0003363]
Gene: ZIC3[ENTREZ:7547]
GRCh37: X:136649818C>T[]
Disease: HETEROTAXY, VISCERAL, 1, X-LINKED; HTX1 [OMIM:306955]
Metadata:
Hpo Case Annotator : 1.0.13-SNAPSHOT(1970-01-01T00:00:00Z)
human phenotype ontology: hp(HP;http://purl.obolibrary.org/obo/hp.owl;2018-03-08;http://purl.obolibrary.org/obo/HP_)
Phenotype And Trait Ontology: pato(PATO;http://purl.obolibrary.org/obo/pato.owl;2018-03-28;http://purl.obolibrary.org/obo/PATO_)
Genotype Ontology: geno(GENO;http://purl.obolibrary.org/obo/geno.owl;19-03-2018;http://purl.obolibrary.org/obo/GENO_)
NCBI organismal classification: ncbitaxon(NCBITaxon;http://purl.obolibrary.org/obo/ncbitaxon.owl;2018-03-02;)
Evidence and Conclusion Ontology: eco(ECO;http://purl.obolibrary.org/obo/eco.owl;2018-11-10;http://purl.obolibrary.org/obo/ECO_)
Online Mendelian Inheritance in Man: omim(OMIM;https://www.omim.org;;)

We identified 1 Q/C issue

[ERROR] phenopacket id missing
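
One way the suggested id could be assembled, as a minimal sketch: the class and method names here are hypothetical, and the only thing taken from the issue is the target shape PMID_12345_patient_B.

```java
final class PhenopacketIdSketch {

    /** Builds an id such as PMID_12345_patient_B from a PMID and the patient label used in the paper. */
    static String makeId(String pmid, String patientLabel) {
        // Collapse each run of whitespace into one underscore so the id contains no spaces.
        String label = patientLabel.trim().replaceAll("\\s+", "_");
        return "PMID_" + pmid.trim() + "_" + label;
    }

    public static void main(String[] args) {
        System.out.println(makeId("12345", "patient B"));
    }
}
```

Whether the label should also be restricted to a fixed character set is left open here.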

Null pointer exception

In line 123 of MainController.java, the following can cause a null pointer exception

ProtoJSONModelParser pp = new ProtoJSONModelParser(optionalResources.getDiseaseCaseDir().toPath());

The reason was that I had not yet set any of the settings. We should provide an error message that says "please initialize the settings before use" or something like that.
After I completed the settings, everything worked fine!
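
A guard along these lines might be enough, sketched under the assumption that getDiseaseCaseDir() returns a java.io.File that is null until the settings are filled in. The class and method names are hypothetical; in the GUI the exception message would presumably be shown in a dialog.

```java
import java.io.File;
import java.nio.file.Path;

final class SettingsGuardSketch {

    /** Returns the case directory as a Path, or fails with a readable message if it was never set. */
    static Path requireDiseaseCaseDir(File diseaseCaseDir) {
        if (diseaseCaseDir == null) {
            // Replaces the bare NullPointerException with something the user can act on.
            throw new IllegalStateException("Please initialize the settings before use");
        }
        return diseaseCaseDir.toPath();
    }

    public static void main(String[] args) {
        try {
            requireDiseaseCaseDir(null);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```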

Duplicate output phenotypes

I have seen a weird bug that happened twice (but not every time I use it). When I export a case to a phenopacket, somehow the phenotypes are output twice.

store data broken

HpoCaseAnnotator does not store data about the paths etc between program runs. The data should be written to the .hpo-case-annotator config file.
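
A minimal sketch of persisting the paths with java.util.Properties. The key name hpo.obo.path and the example file location are assumptions made for illustration; the app's real setting names may differ.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

final class SettingsStoreSketch {

    /** Writes all settings to the config file, replacing any previous content. */
    static void save(Properties settings, Path configFile) {
        try (Writer writer = Files.newBufferedWriter(configFile, StandardCharsets.UTF_8)) {
            settings.store(writer, "Hpo Case Annotator settings");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads the settings back; returns empty Properties on the first run when no file exists yet. */
    static Properties load(Path configFile) {
        Properties settings = new Properties();
        if (Files.isRegularFile(configFile)) {
            try (Reader reader = Files.newBufferedReader(configFile, StandardCharsets.UTF_8)) {
                settings.load(reader);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return settings;
    }

    public static void main(String[] args) {
        // Hypothetical location and key, for illustration only.
        Path config = Paths.get(System.getProperty("java.io.tmpdir"), "hca-settings-demo.properties");
        Properties settings = new Properties();
        settings.setProperty("hpo.obo.path", "/data/hp.obo");
        save(settings, config);
        System.out.println(load(config).getProperty("hpo.obo.path"));
    }
}
```

Calling save when the settings dialog closes and load at startup would cover the "between program runs" case.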

hostservices wrapper

@ielis Could you take a look at the new branch I just pushed, variantvalidator-update.
How do I get a reference to the HostServices from the Wrapper class? (there are two compile errors that result from the hostService)

Phenopacket export

When we export Phenopacket from the internal format, we have:

"diseases": [{
    "term": {
      "id": "OMIM:306400",
      "label": "GRANULOMATOUS DISEASE, CHRONIC, X-LINKED; CGD"
    }
  }]

for the disease, and

"genes": [{
    "id": "ENTREZ:1536",
    "symbol": "CYBB"
  }]

However, I do not know how to create appropriate Resource for these namespaces (OMIM, ENTREZ).

@pnrobinson do you have any suggestions for how to fix this? Otherwise the Phenopacket will not be valid, at least according to the code I wrote this morning.

+

When we send a variant like c.123+1G>C to VariantValidator, we need to escape the "+" sign. Probably we can use %2B.
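
The standard library can do the escaping, so nothing needs to be replaced by hand. A small sketch (the class and method names are made up; URLEncoder is the real JDK class). Note that URLEncoder applies form encoding, so it also escapes ">" and would turn a space into "+".

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class HgvsEscapeSketch {

    /** Percent-encodes an HGVS string so it can be used as a URL query value. */
    static String encodeForQuery(String hgvs) {
        try {
            return URLEncoder.encode(hgvs, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(encodeForQuery("c.123+1G>C")); // prints c.123%2B1G%3EC
    }
}
```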

Error when removing variant using the tool

Was helping @nicolevasilevsky curate a paper: Takagi-2006-WNK1.
We accidentally had an extra variant (it was held over from the previous paper we opened--I think). Anyways, it was incorrect, so I hit 'remove variant'. However, the Validate tool clearly thinks that an additional variant is expected, so it is throwing errors. The rest of the curation is correct, but I do not know how to get rid of the errors.

properties.getProperty("scigraph.mining.url")

The following (in HpoCaseAnnotatorModule) is throwing a null pointer exception when the "Add/remove HPO terms" button is clicked. I am trying to track down where the properties get initialized.

@Provides
@Singleton
@Named("scigraphMiningUrl")
public URL scigraphMiningUrl(Properties properties) throws MalformedURLException {
    return new URL(Objects.requireNonNull(properties.getProperty("scigraph.mining.url")));
}
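
Until the initialization is tracked down, the failure could at least name the missing key. A hedged sketch outside of Guice: requireUrl is a hypothetical helper, not part of HpoCaseAnnotatorModule, and the example URL is a placeholder.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;
import java.util.Properties;

final class PropertyGuardSketch {

    /** Looks up a URL-valued property and reports the key name when it is missing or malformed. */
    static URL requireUrl(Properties properties, String key) {
        String value = properties.getProperty(key);
        if (value == null) {
            throw new IllegalStateException("Missing property '" + key + "'");
        }
        try {
            return URI.create(value).toURL();
        } catch (IllegalArgumentException | MalformedURLException e) {
            throw new IllegalStateException("Property '" + key + "' is not a valid URL: " + value, e);
        }
    }

    public static void main(String[] args) {
        Properties properties = new Properties();
        properties.setProperty("scigraph.mining.url", "https://example.org/annotations"); // placeholder URL
        System.out.println(requireUrl(properties, "scigraph.mining.url"));
    }
}
```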

VariantValidator interaction

It would be excellent to allow the user to open up a Java Webview to check the mutation with VariantValidator. If the user only knows the chromosomal position, they can open up the window and check that everything is correct compared to the HGVS string, and then use VariantValidator to quickly get the snippet.

https://variantvalidator.org/variantvalidation/?variant=GRCh37:1:150550916:G:A

Alternatively, I think it is possible to use VariantValidator to go from HGVS to genome; this would also save an enormous amount of time. There is also a new API, and we could possibly do the latter programmatically.
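
For the WebView idea, the page address can be assembled from the fields the user has already entered. This sketch only mirrors the URL pattern shown above; the class and method names are hypothetical, and nothing here calls the VariantValidator API.

```java
final class VariantValidatorUrlSketch {

    private static final String BASE = "https://variantvalidator.org/variantvalidation/?variant=";

    /** Builds the validation page URL in the assembly:contig:position:ref:alt form used above. */
    static String url(String assembly, String contig, int position, String ref, String alt) {
        return BASE + String.join(":", assembly, contig, Integer.toString(position), ref, alt);
    }

    public static void main(String[] args) {
        System.out.println(url("GRCh37", "1", 150550916, "G", "A"));
    }
}
```

The resulting string could then be handed to a JavaFX WebEngine via its load method.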

Insertion/deletion snippets

This is a valid snippet for an insertion
GGACCTGACACTT[-/TT]ACAACA
but it is not being recognized.
This would be a valid snippet for a deletion of 2 bases
GGACCTGACACTT[TT/-]ACAACA
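
A sketch of a pattern that would accept both forms, with "-" standing for the empty allele. This is an assumption about what the validator should accept, not the app's current code; the class name and the returned labels are invented for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SnippetSketch {

    // prefix[ref/alt]suffix, where ref and alt are either bases or a single "-".
    private static final Pattern SNIPPET =
            Pattern.compile("([ACGT]+)\\[([ACGT]+|-)/([ACGT]+|-)\\]([ACGT]+)");

    /** Classifies a snippet by comparing allele lengths, counting "-" as length zero. */
    static String classify(String snippet) {
        Matcher m = SNIPPET.matcher(snippet);
        if (!m.matches()) {
            return "INVALID";
        }
        int refLength = m.group(2).equals("-") ? 0 : m.group(2).length();
        int altLength = m.group(3).equals("-") ? 0 : m.group(3).length();
        if (refLength == 0 && altLength == 0) {
            return "INVALID"; // "[-/-]" describes no change at all
        }
        if (refLength == altLength) {
            return "SUBSTITUTION";
        }
        return refLength < altLength ? "INSERTION" : "DELETION";
    }

    public static void main(String[] args) {
        System.out.println(classify("GGACCTGACACTT[-/TT]ACAACA")); // prints INSERTION
        System.out.println(classify("GGACCTGACACTT[TT/-]ACAACA")); // prints DELETION
    }
}
```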

Numbering of indels

I am wondering if the position check in the snippet is off by one.
I just added this case report
data/casereports/Unger-2008-CCNQ.json
It is from this variant
https://www.ncbi.nlm.nih.gov/clinvar/variation/10674/

chrX
pos: 152860131
ref: T
alt: TT
snippet: TTGGGT[T/TT]AAAGTACCT

If I run the validate function of HCA, I get "Ref sequence T does not match the sequence A observed at X:152860130-152860131".

I then tried to run Jannovar on this but do not get the same variant as in ClinVar

$ java -jar jannovar-cli-0.27.jar annotate-pos -d data/hg19_ucsc.ser -c 'chrX:152860131T>TT'
Options
JannovarAnnotatePosOptions [genomicChanges=[chrX:152860131T>TT], toString()=JannovarAnnotationOptions [useThreeLetterAminoAcidCode=false, nt3PrimeShifting=false, showAll=false, databaseFilePath=data/hg19_ucsc.ser, toString()=JannovarBaseOptions [reportProgress=true, httpProxy=null, httpsProxy=null, ftpProxy=null, verbosity=1]]]
Deserializing transcripts...
INFO Deserializing JannovarData from data/hg19_ucsc.ser
INFO Deserialization took 4.19 sec.
#change	effect	hgvs_annotation	messages
chrX:152860131T>TT	FRAMESHIFT_VARIANT	CCNQ:uc011myr.2:c.291dup:p.(L98Tfs*30)	INFO_REALIGN_3_PRIME

Something weird is going on, hopefully I am not making more than one dumb mistake at a time.

read the docs

Are we ready to make this repository public and create some read the docs?

Feature request: Bigger dropdown box for disease name

NOT URGENT.

Having issues with long named diseases and choosing between the different types.

Example: I wanted SPONDYLOEPIMETAPHYSEAL DYSPLASIA WITH JOINT LAXITY, TYPE 1, WITH OR WITHOUT FRACTURES; SEMDJL1

But I was typing it in from the paper--so I didn't exactly know what it was called. The first that came up was Type 2...and it was very difficult to search through the results to select what I wanted.

Set OMIM as default database

I tried to do this in DiseaseCaseDataController but it did not have any effect
(at about line 435, in the init function)

diseaseDatabaseComboBox.getSelectionModel().selectFirst(); // OMIM as the default

HPO text-mining terms do not identify "Not" or "No" for terms

When using the text mining tool, there is no identification of 'not'/ 'no' phenotypes.
Below is an example of what I added to the text-mining box. It does not matter if I put 'no' or 'not' in front of the term. It identifies them always as a present phenotype and as far as I know, from that box, I cannot further negate it.

Psychomotor retardation
Developmental regression (from age)
Seizures/ epilepsy
Lennox Gestaut
Myopathy
Abnormal basal ganglia
Cerebral atrophy
Elevated serum lactate
elevated lactate, elevated malate, elavated succinate

NOt Ataxia
Not Dystonia
Not Pyramidal tract involvement
Not Hepatomegaly
Not Cardiomyopathy
Not Leukodystrophy
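
As a stopgap on the client side, lines that start with "no" or "not" could be flagged before the text is sent for mining. This is only a line-prefix heuristic written for the example above, not how the text-mining service works; the class and method names are hypothetical.

```java
import java.util.Locale;

final class NegationSketch {

    /** True when the line starts with "no " or "not ", ignoring case and leading spaces. */
    static boolean isNegated(String line) {
        String lower = line.trim().toLowerCase(Locale.ROOT);
        return lower.startsWith("no ") || lower.startsWith("not ");
    }

    /** Removes a leading "no"/"not" so the remaining text can be mined as usual. */
    static String stripNegation(String line) {
        return line.trim().replaceFirst("(?i)^(not|no)\\s+", "");
    }

    public static void main(String[] args) {
        for (String line : new String[] {"Myopathy", "NOt Ataxia", "Not Hepatomegaly"}) {
            System.out.println(stripNegation(line) + " -> " + (isNegated(line) ? "excluded" : "present"));
        }
    }
}
```

It misses negation in running text ("without ataxia", "ataxia was absent"), so it would only help for list-style input like the one shown.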

Resource files management

Resource files (genome fasta file, hpo obo file) rarely change between different releases of Hpo Case Annotator.
At the moment, each new release of the app needs to have its own resource folder (e.g. $HOME/.hpo-case-annotator-1.0.11), and the user is forced to configure the app each time a new release is made.

It would be good to simplify installation of a new release by reusing the resource folder from the previous release (if there is one).
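
A sketch of how a new release could look for the newest existing folder, assuming the folder naming shown above (.hpo-case-annotator-&lt;version&gt; in the home directory). The class is hypothetical, and folders with non-numeric versions such as SNAPSHOT builds are skipped.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

final class ResourceDirSketch {

    private static final String PREFIX = ".hpo-case-annotator-";

    /** Compares dotted versions numerically, so that 1.0.11 sorts after 1.0.9. */
    static int compareVersions(String a, String b) {
        String[] x = a.split("\\.");
        String[] y = b.split("\\.");
        for (int i = 0; i < Math.max(x.length, y.length); i++) {
            int xi = i < x.length ? Integer.parseInt(x[i]) : 0;
            int yi = i < y.length ? Integer.parseInt(y[i]) : 0;
            if (xi != yi) {
                return Integer.compare(xi, yi);
            }
        }
        return 0;
    }

    /** Finds the resource folder with the highest version under the given home directory. */
    static Optional<Path> latestResourceDir(Path home) {
        Path best = null;
        String bestVersion = null;
        try (DirectoryStream<Path> candidates = Files.newDirectoryStream(home, PREFIX + "*")) {
            for (Path dir : candidates) {
                String version = dir.getFileName().toString().substring(PREFIX.length());
                if (!Files.isDirectory(dir) || !version.matches("\\d+(\\.\\d+)*")) {
                    continue; // not a release folder with a plain numeric version
                }
                if (bestVersion == null || compareVersions(version, bestVersion) > 0) {
                    best = dir;
                    bestVersion = version;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return Optional.ofNullable(best);
    }

    public static void main(String[] args) {
        System.out.println(latestResourceDir(Paths.get(System.getProperty("user.home"))));
    }
}
```

The new release could copy or point at the returned folder on first start and fall back to the normal setup dialog when the Optional is empty.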

Disease database

It does not make sense to have NCI being an option here, because the app is not designed for curating cancer cases. We can probably just remove the pull down menu, but we should at least make OMIM show up by default.

Update in PhenoPacketCodec

We have switched the genomeAssembly element in Phenopackets from an ENUM to a String, and so the following code no longer works. Essentially, the Phenopackets now just need Strings like "GRCh37".

private static String hcaGenomeAssemblyToPhenoPacketGenomeAssembly(org.monarchinitiative.hpo_case_annotator.model.proto.GenomeAssembly genomeAssembly) {
    switch (genomeAssembly) {
        case GRCH_37:
            return GenomeAssembly.GRCH_37.name();
        case GRCH_38:
            return GenomeAssembly.GRCH_38.name();
        case UNKNOWN_GENOME_ASSEMBLY:
        case UNRECOGNIZED:
            return GenomeAssembly.UNKNOWN_ASSEMBLY.name();
        default:
            LOGGER.warn("Unknown genome assembly: {}", genomeAssembly);
            return GenomeAssembly.UNKNOWN_ASSEMBLY.name();
    }
}
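
A sketch of what the String-based mapping might look like. The nested enum is a stand-in for the HCA proto enum so the example compiles on its own, and the "UNKNOWN" fallback is a placeholder chosen here, not something the Phenopacket schema prescribes.

```java
final class AssemblyNameSketch {

    /** Stand-in for the HCA proto enum; only the constants that appear in the issue are listed. */
    enum HcaGenomeAssembly { UNKNOWN_GENOME_ASSEMBLY, GRCH_37, GRCH_38, UNRECOGNIZED }

    /** Maps the HCA enum constant to the plain assembly name expected by Phenopackets. */
    static String toPhenopacketAssembly(HcaGenomeAssembly assembly) {
        switch (assembly) {
            case GRCH_37:
                return "GRCh37";
            case GRCH_38:
                return "GRCh38";
            default:
                return "UNKNOWN"; // placeholder for unknown or unrecognized assemblies
        }
    }

    public static void main(String[] args) {
        System.out.println(toPhenopacketAssembly(HcaGenomeAssembly.GRCH_37)); // prints GRCh37
    }
}
```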

1.0.7

Thanks! I can now open the 1.0.7 files. The family info and the genome build are not being read in correctly, although they are there in the Java bean file.

<void property="familyInfo">
   <void property="familyOrPatientID">
    <string>Patient 1</string>
   </void>
  </void>
  <void property="genomeBuild">
   <string>37</string>
  </void>

I want to try and experiment with code to input these files and output JSON, and also to finalize our new model. For now I have started a new repo
https://github.com/pnrobinson/beanjson
but we can merge that into this repo if it is working OK.

The data model structure

We should update the data model structure of the app. I would propose starting by editing the current model schema here.

String index out of range: -7

With Hpo Case Annotator v1.0.13

I get

Exception in thread "JavaFX Application Thread" java.lang.StringIndexOutOfBoundsException: String index out of range: -7
	at java.lang.String.substring(String.java:1967)
	at org.monarchinitiative.hpotextmining.gui.controller.Present.colorizeHTML4ciGraph(Present.java:226)
	at org.monarchinitiative.hpotextmining.gui.controller.Present.setResults(Present.java:441)
	at org.monarchinitiative.hpotextmining.gui.controller.HpoTextMining.lambda$new$0(HpoTextMining.java:88)
	at org.monarchinitiative.hpotextmining.gui.controller.Configure.lambda$analyzeButtonClicked$0(Configure.java:89)
	at com.sun.javafx.event.CompositeEventHandler.dispatchBubblingEvent(CompositeEventHandler.java:86)
(....)

This is the text:

Febrile seizures	HP:0002373HPOs: Dysarthria	HP:0001260HPOs: Loss of ability to walk	HP:0006957HPOs: Myoglobinuria	HP:0002913HPOs: Focal-onset seizure	HP:0007359HPOs: Apnea	HP:0002104HPOs: Elevated serum creatine kinase	HP:0003236HPOs: Hyperammonemia	HP:0001987HPOs: Hypoglycemia	HP:0001943HPOs: Myopathic facies	HP:0002058HPOs: Microcephaly	HP:0000252HPOs: Hyperactive deep tendon reflexes	HP:0006801HPOs: Babinski sign	HP:0003487HPOs: Exotropia	HP:0000577HPOs: Developmental regression	HP:0002376HPOs: Elevated serum creatine kinase	HP:0003236HPOs: Intellectual disability	HP:0001249HPOs: Rhabdomyolysis	HP:0003201HPOs: Microcephaly	HP:0000252HPOs Free Text: Urine myoglobin of 94 ng/ml (normal range 10–65 ng/ml), serum creatine phosphokinase (CPK) of 205,000 U/l (normal range 75–230 U/l), elevated aspartate aminotransferase (AST) of 1,618 U/l (normal range 15–50 U/l), alanine aminotransferase (ALT) of 571 U/l (normal range 10–25 U/l), ammonia of 122 μmol/l (normal range 22–48 μmol/l), and hypoglycemia (blood glucose 30 mg/dl; normal range 70–110 mg/dl)
Not HPOs: Arrhythmia	HP:0011675Not HPOs Free Text: -
Variants: NM_152906.6:c.460G>A (p.Gly154Arg)
ClinVar ID: 208823

FamilyInfo

I am concerned that code such as this is fragile:

   case 10: 

Can we explore how to do this using named getters/setters?
A lot of the code is now marked as deprecated as well.

Add version number to accession

I initially did not realize this but the eutils do provide a version number like this:

<Gene-commentary_accession>NM_000138</Gene-commentary_accession>
 <Gene-commentary_version>4</Gene-commentary_version>

Therefore, it will be an easy fix to add this -- figure out how to parse the XML in a more elegant way!
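
One possibly tidier way to read the two elements is the JDK's DOM parser. In this sketch the wrapping Gene-commentary element in the demo input is an assumption about the surrounding XML, and only the first accession/version pair in the document is read; real eutils output contains many such pairs, so the lookup would need to be scoped to the right parent element.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class AccessionVersionSketch {

    /** Returns accession.version (for example NM_000138.4) from the first matching elements. */
    static String versionedAccession(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            String accession = doc.getElementsByTagName("Gene-commentary_accession")
                    .item(0).getTextContent().trim();
            String version = doc.getElementsByTagName("Gene-commentary_version")
                    .item(0).getTextContent().trim();
            return accession + "." + version;
        } catch (Exception e) {
            // Covers malformed XML as well as a missing accession or version element.
            throw new IllegalArgumentException("Could not read accession and version", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<Gene-commentary>"
                + "<Gene-commentary_accession>NM_000138</Gene-commentary_accession>"
                + "<Gene-commentary_version>4</Gene-commentary_version>"
                + "</Gene-commentary>";
        System.out.println(versionedAccession(xml)); // prints NM_000138.4
    }
}
```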

LiftOver

Our variants are currently stored on GRCh37 (hg19), but the community is moving to GRCh38 (hg38). Consider a strategy for lifting over and revalidating the curated variants.

Output file name

We should make the output file name include the case ID, so that it is easy to make multiple case reports from one paper.
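
A sketch of a file-name builder that appends the case ID to the Author-Year-Gene pattern seen in existing file names such as Unger-2008-CCNQ.json. The class, the method, and the rule for replacing unsafe characters are assumptions.

```java
final class CaseFileNameSketch {

    /** Joins the parts with hyphens and replaces characters that are awkward in file names. */
    static String fileName(String firstAuthor, String year, String gene, String caseId) {
        String raw = String.join("-", firstAuthor.trim(), year.trim(), gene.trim(), caseId.trim());
        // Keep letters, digits, dot, underscore and hyphen; anything else becomes an underscore.
        return raw.replaceAll("[^A-Za-z0-9._-]+", "_") + ".json";
    }

    public static void main(String[] args) {
        System.out.println(fileName("Gebbia", "1997", "ZIC3", "III-1"));    // prints Gebbia-1997-ZIC3-III-1.json
        System.out.println(fileName("Unger", "2008", "CCNQ", "Patient 1")); // prints Unger-2008-CCNQ-Patient_1.json
    }
}
```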

Refreshing

When I first looked at one case and then opened another, the mutation data was not cleared.

PMID

It would be nice to show the PMID in the field if this has already been initialized.
