monarch-initiative / hpocaseannotator

Next-generation Biocuration App for annotating cases and PhenoPackets

Home Page: https://hpocaseannotator.readthedocs.io/en/latest/index.html

License: BSD 3-Clause "New" or "Revised" License

Java 96.12% CSS 1.19% HTML 2.21% Shell 0.44% Batchfile 0.04%

biocuration java javafx gui

hpocaseannotator's Introduction

Hpo Case Annotator

Hpo Case Annotator makes biocuration of case reports easier.

Most users should download the latest Hpo Case Annotator distribution ZIP file from the Releases page.

Please consult the Read the Docs site for detailed documentation:

the stable version, describing the latest release on the Releases page, or
the latest version, summarizing the latest development on the development branch.

Issues?

Feel free to submit an issue to our tracker.

hpocaseannotator's Issues

Genome build/assembly could be an enum

HpoCaseAnnotator/hpo-case-annotator-core/src/main/proto/model.proto

Line 9 in 41de168

string genome_build = 3;

See:

https://github.com/phenopackets/phenopacket-schema/blob/4ce66acabfd3cc0f66e0a33bc5199bb80c4b2c87/src/main/proto/org/phenopackets/schema/v1/core/base.proto#L509-L521

If you want the patch version, this won't be ideal, but it could be included in a compound type, e.g.:

message GenomeAssemblyWithPatch {
 GenomeAssembly genome_assembly = 1;
 int32 patch = 2;
}
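On the Java side, the proposed enum and compound type could be mirrored roughly as follows. This is a minimal sketch, not HCA code: the enum constants, the `label()` accessor and the `parse` helper are hypothetical, and it assumes assembly strings of the form `GRCh37` or `GRCh37.p13`.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical Java counterpart of a GenomeAssembly enum (names are illustrative). */
enum GenomeAssembly {
    GRCH_37("GRCh37"),
    GRCH_38("GRCh38");

    private final String label;

    GenomeAssembly(String label) {
        this.label = label;
    }

    /** Human-readable assembly name, e.g. "GRCh37". */
    String label() {
        return label;
    }
}

/** Mirrors GenomeAssemblyWithPatch; patch 0 is assumed to mean "no patch given". */
record GenomeAssemblyWithPatch(GenomeAssembly genomeAssembly, int patch) {

    // Matches "GRCh37", "GRCh38.p13" and similar, ignoring case.
    private static final Pattern ASSEMBLY = Pattern.compile("(?i)grch(37|38)(?:\\.p(\\d+))?");

    /** Parses strings such as "GRCh37.p13"; returns empty for anything else. */
    static Optional<GenomeAssemblyWithPatch> parse(String value) {
        Matcher m = ASSEMBLY.matcher(value.trim());
        if (!m.matches()) {
            return Optional.empty();
        }
        GenomeAssembly assembly = m.group(1).equals("37") ? GenomeAssembly.GRCH_37 : GenomeAssembly.GRCH_38;
        int patch = m.group(2) == null ? 0 : Integer.parseInt(m.group(2));
        return Optional.of(new GenomeAssemblyWithPatch(assembly, patch));
    }
}
```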

Text Mining

Consider an interim solution with SciGraph (https://monarchinitiative.org/annotate/text) while BioLink is down.

Malformed export of author name

In the phenopacket that gets exported from this publication:
1: Irfanullah, Umair M, Khan S, Ahmad W. Homozygous sequence variants in the NPR2
gene underlying Acromesomelic dysplasia Maroteaux type (AMDM) in consanguineous
families. Ann Hum Genet. 2015 Jul;79(4):238-44. doi: 10.1111/ahg.12116. Epub 2015
May 11. PubMed PMID: 25959430.

The export uses "author" instead of "Irfanullah":

"publication": {
    "authorList": "Irfanullah, Umair M, Khan S, Ahmad W",
    "title": "Homozygous sequence variants in the NPR2 gene underlying Acromesomelic dysplasia Maroteaux type (AMDM) in consanguineous families",
    "journal": "Ann Hum Genet",
    "year": "2015",
    "volume": "79(4)",
    "pages": "238-44",
    "pmid": "25959430"
  },
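A defensive parsing approach, sketched below under the assumption that citations follow the PubMed summary layout shown above (`N: Authors. Title. Journal...`), is to keep a single-token author such as `Irfanullah` as a complete name rather than expecting `Surname Initials`. The class and method names are hypothetical, and the splitting is deliberately simple (it would not handle multi-word surnames).

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper for the PubMed summary format "1: Authors. Title. Journal...". */
final class PubMedAuthors {

    private PubMedAuthors() {
    }

    /** Returns the text between the leading "N: " and the first ". ". */
    static String authorList(String summary) {
        String s = summary.replaceFirst("^\\d+:\\s*", "").trim();
        int end = s.indexOf(". ");
        return end < 0 ? s : s.substring(0, end);
    }

    /** Splits the author list on commas; mononymous authors like "Irfanullah" stay unchanged. */
    static List<String> authors(String summary) {
        return Arrays.asList(authorList(summary).split(",\\s*"));
    }

    /** First token of the first author, whether or not initials follow it. */
    static String firstAuthorSurname(String summary) {
        return authors(summary).get(0).split("\\s+")[0];
    }
}
```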

Cannot open annotation files...

Caused by: java.lang.ArrayIndexOutOfBoundsException: 0
at java.beans.XMLDecoder.readObject(XMLDecoder.java:250)
at org.monarchinitiative.hpo_case_annotator.io.XMLModelParser.loadDiseaseCaseModel(XMLModelParser.java:65)
at org.monarchinitiative.hpo_case_annotator.controllers.MainController.openMenuItemAction(MainController.java:109)
... 53 more
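The exception comes from `java.beans.XMLDecoder.readObject()`, which is documented to throw `ArrayIndexOutOfBoundsException` when the stream contains no (more) objects, for example when a file is empty or was not written by `XMLEncoder`. A hedged sketch of a loader that turns this into a readable error; `SafeXmlLoader` is a hypothetical name, not HCA's `XMLModelParser`.

```java
import java.beans.XMLDecoder;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical loader that reports unreadable bean files with a clear message. */
final class SafeXmlLoader {

    private SafeXmlLoader() {
    }

    static Object load(InputStream in) {
        // Collect parse problems instead of letting the default listener print them.
        List<Exception> parseErrors = new ArrayList<>();
        try (XMLDecoder decoder = new XMLDecoder(in, null, parseErrors::add)) {
            return decoder.readObject();
        } catch (ArrayIndexOutOfBoundsException e) {
            // readObject() throws this when the stream contains no (more) objects.
            IllegalArgumentException ex = new IllegalArgumentException(
                    "No object found: the file is empty or was not written by XMLEncoder", e);
            parseErrors.forEach(ex::addSuppressed);
            throw ex;
        }
    }
}
```

The caller (e.g. the open-file action) could then catch `IllegalArgumentException` and show its message in a dialog instead of a stack trace.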

Get accession # doesn't do anything except get accession

Also, in Hpo Case Annotator v1.0.11, when you hit the 'Get accession' button, you do correctly get the accession number(s). However, if you then hit the accession number button, another window opens, and it looks as if you could add the accession number and variant and hit 'OK' to go somewhere. It doesn't. Is it supposed to go to VariantValidator?

Disease database

The disease/disease name field is inactive. Do we need to download something to fill these fields? There is nothing in Settings for the disease data.

phenopacket id missing

I am making progress with the C++ validator. I think it now validates the entire RD phenopacket, and the only error seems to be that the phenopacket has no id (see below). We could probably use an id like PMID_12345_patient_B.

Phenopacket at: ../Gebbia-1997-ZIC3-III-1.json

Phenopacket:
ID: III-1
Age: 7W
Sex: male
Arrhinencephaly [HP:0002139]
Single ventricle [HP:0001750]
Patent ductus arteriosus [HP:0001643]
Asplenia [HP:0001746]
Pulmonary artery hypoplasia [HP:0004971]
Complete atrioventricular canal defect [HP:0001674]
Transposition of the great arteries [HP:0001669]
Posteriorly placed anus [HP:0012890]
Ventricular septal defect [HP:0001629]
Abnormal ciliary motility [HP:0012262]
Pulmonary artery atresia [HP:0004935]
Abdominal situs inversus [HP:0003363]
Gene: ZIC3[ENTREZ:7547]
GRCh37: X:136649818C>T[]
Disease: HETEROTAXY, VISCERAL, 1, X-LINKED; HTX1 [OMIM:306955]
Metadata:
Hpo Case Annotator : 1.0.13-SNAPSHOT(1970-01-01T00:00:00Z)
human phenotype ontology: hp(HP;http://purl.obolibrary.org/obo/hp.owl;2018-03-08;http://purl.obolibrary.org/obo/HP_)
Phenotype And Trait Ontology: pato(PATO;http://purl.obolibrary.org/obo/pato.owl;2018-03-28;http://purl.obolibrary.org/obo/PATO_)
Genotype Ontology: geno(GENO;http://purl.obolibrary.org/obo/geno.owl;19-03-2018;http://purl.obolibrary.org/obo/GENO_)
NCBI organismal classification: ncbitaxon(NCBITaxon;http://purl.obolibrary.org/obo/ncbitaxon.owl;2018-03-02;)
Evidence and Conclusion Ontology: eco(ECO;http://purl.obolibrary.org/obo/eco.owl;2018-11-10;http://purl.obolibrary.org/obo/ECO_)
Online Mendelian Inheritance in Man: omim(OMIM;https://www.omim.org;;)

We identified 1 Q/C issue

[ERROR] phenopacket id missing
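The id format suggested above could be generated from the PMID and the proband identifier. A minimal sketch; the `PMID_<pmid>_<proband>` layout follows the example in the issue, while the rule replacing spaces and other unusual characters with `_` is an assumption.

```java
/** Hypothetical helper that builds phenopacket ids such as "PMID_12345_patient_B". */
final class PhenopacketIds {

    private PhenopacketIds() {
    }

    static String phenopacketId(String pmid, String probandId) {
        // Each run of characters other than letters, digits, '_' and '-' becomes a single '_'.
        String proband = probandId.trim().replaceAll("[^A-Za-z0-9_-]+", "_");
        return "PMID_" + pmid.trim() + "_" + proband;
    }
}
```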

Null pointer exception

In line 123 of MainController.java, the following can cause a null pointer exception

ProtoJSONModelParser pp = new ProtoJSONModelParser(optionalResources.getDiseaseCaseDir().toPath());

The cause was that I had not yet set any of the settings. We should show an error message such as "please initialize the settings before use".
After I completed the settings, everything worked fine!
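A hedged sketch of the guard requested above: check the setting before constructing the parser and report a message instead of a `NullPointerException`. `OptionalResources` here is a simplified stand-in holding only the disease case directory, not HCA's actual class, and `SettingsGuard` is hypothetical.

```java
import java.io.File;
import java.nio.file.Path;
import java.util.Optional;

/** Simplified stand-in for HCA's OptionalResources (only the disease case directory). */
final class OptionalResources {
    private File diseaseCaseDir; // null until the user sets it in Settings

    File getDiseaseCaseDir() {
        return diseaseCaseDir;
    }

    void setDiseaseCaseDir(File dir) {
        this.diseaseCaseDir = dir;
    }
}

final class SettingsGuard {
    static final String MESSAGE = "Please initialize the settings before use.";

    private SettingsGuard() {
    }

    /** Returns the directory as a Path, or empty if the setting has not been configured yet. */
    static Optional<Path> diseaseCaseDir(OptionalResources resources) {
        return Optional.ofNullable(resources.getDiseaseCaseDir()).map(File::toPath);
    }
}
```

In the controller, the returned `Optional` could be consumed with `ifPresentOrElse`, creating `ProtoJSONModelParser` only when the directory is set and otherwise showing `SettingsGuard.MESSAGE` to the user.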

Exception in thread "JavaFX Application Thread" java.lang.StringIndexOutOfBoundsException: String index out of range: -5

This seems to happen when there is a match right at the end of the string.
An easy fix would be to pad the string with ten spaces.
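Rather than padding the text, the indices could be clamped to the string bounds before calling `substring`, so that a match at the very end cannot produce a negative length. A minimal sketch; the `context` helper and its `radius` parameter are hypothetical and not taken from the text-mining code.

```java
/** Hypothetical bounds-safe substring helpers. */
final class SafeSubstring {

    private SafeSubstring() {
    }

    /** Returns text[start, end) with both bounds clamped to [0, length] and end never below start. */
    static String clamp(String text, int start, int end) {
        int from = Math.max(0, Math.min(start, text.length()));
        int to = Math.max(from, Math.min(end, text.length()));
        return text.substring(from, to);
    }

    /** Context window of `radius` characters around [start, end), safe at either end of the text. */
    static String context(String text, int start, int end, int radius) {
        return clamp(text, start - radius, end + radius);
    }
}
```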

Is the genome assembly getting output in the wrong place?

For example...

 "vcfAllele": {
      "id": "GRCh37",
      "chr": "X",
      "pos": 136649818,
      "ref": "C",
      "alt": "T"
    },

Duplicate output phenotypes

I have seen a weird bug that happened twice (but not every time I use it): when I export a case to a phenopacket, the phenotypes are somehow output twice.
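Until the cause is found, duplicates could be filtered defensively at export time. A minimal sketch, assuming a phenotype is identified by its HPO term ID plus its negation flag; the `Phenotype` record is a hypothetical simplification of the HCA model. The first occurrence of each term is kept and the original order is preserved.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical, simplified phenotype: HPO term ID, label and whether it is excluded. */
record Phenotype(String termId, String label, boolean excluded) {
}

final class PhenotypeDedup {

    private PhenotypeDedup() {
    }

    static List<Phenotype> distinctByTerm(List<Phenotype> phenotypes) {
        Set<String> seen = new HashSet<>();
        List<Phenotype> result = new ArrayList<>();
        for (Phenotype p : phenotypes) {
            // Key includes the negation flag, so "X" and "not X" are both kept for the curator to see.
            if (seen.add(p.termId() + (p.excluded() ? ":excluded" : ""))) {
                result.add(p);
            }
        }
        return result;
    }
}
```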

store data broken

HpoCaseAnnotator does not store data such as the paths between program runs. The data should be written to the .hpo-case-annotator config file.
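The JDK's `java.util.Properties` is sufficient to persist such settings between runs. A minimal sketch; the class name and property keys are hypothetical, and the config file path is passed in (in the app it would presumably be `.hpo-case-annotator` in the user's home directory) so that the code can be tested.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

/** Hypothetical settings store backed by a properties file. */
final class SettingsStore {
    private final Path configFile;

    SettingsStore(Path configFile) {
        this.configFile = configFile;
    }

    /** Loads saved settings, or returns empty Properties on the first run. */
    Properties load() {
        Properties props = new Properties();
        if (Files.isRegularFile(configFile)) {
            try (Reader reader = Files.newBufferedReader(configFile, StandardCharsets.UTF_8)) {
                props.load(reader);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return props;
    }

    /** Writes the settings, e.g. when the application exits. */
    void save(Properties props) {
        try (Writer writer = Files.newBufferedWriter(configFile, StandardCharsets.UTF_8)) {
            props.store(writer, "Hpo Case Annotator settings");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```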

hostservices wrapper

@ielis, could you take a look at the new branch I just pushed, variantvalidator-update?
How do I get a reference to the HostServices from the Wrapper class? (There are two compile errors that result from the hostService.)
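One way to avoid needing `HostServices` inside the wrapper is to inject only the capability it uses, opening a URL, as a `Consumer<String>`. In the JavaFX `Application` subclass, `getHostServices()::showDocument` fits that type. A hedged sketch; `VariantValidatorWrapper` is a hypothetical stand-in for the Wrapper class on the branch.

```java
import java.util.Objects;
import java.util.function.Consumer;

/**
 * Hypothetical wrapper that opens VariantValidator pages without depending on JavaFX.
 * In the Application subclass: new VariantValidatorWrapper(getHostServices()::showDocument)
 */
final class VariantValidatorWrapper {
    private final Consumer<String> urlOpener;

    VariantValidatorWrapper(Consumer<String> urlOpener) {
        this.urlOpener = Objects.requireNonNull(urlOpener);
    }

    /** Delegates to whatever opener was injected, e.g. HostServices.showDocument. */
    void openInBrowser(String url) {
        urlOpener.accept(url);
    }
}
```

Keeping JavaFX types out of the wrapper also makes it testable without starting the JavaFX toolkit.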

Phenopacket export

When we export a Phenopacket from the internal format, we get:

"diseases": [{
    "term": {
      "id": "OMIM:306400",
      "label": "GRANULOMATOUS DISEASE, CHRONIC, X-LINKED; CGD"
    }
  }]

for the disease, and

"genes": [{
    "id": "ENTREZ:1536",
    "symbol": "CYBB"
  }]

However, I do not know how to create appropriate Resource for these namespaces (OMIM, ENTREZ).

@pnrobinson, do you have any suggestions on how to fix this? Otherwise the Phenopacket will not be valid, at least according to the code I wrote this morning.
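The validator output earlier in this list already describes OMIM as a resource with prefix `OMIM` and URL `https://www.omim.org`, so an ENTREZ entry could follow the same shape. A hedged sketch: the record fields follow our understanding of the phenopacket-schema v1 `Resource` message (id, name, namespace prefix, URL, version, IRI prefix) and should be checked against the schema; the ENTREZ values, the IRI prefixes and the empty versions are assumptions.

```java
/** Mirrors the assumed fields of a phenopacket-schema v1 Resource (verify against the schema). */
record Resource(String id, String name, String namespacePrefix, String url, String version, String iriPrefix) {

    /** Prefix and URL as in the validator output; the IRI prefix is an assumption. */
    static final Resource OMIM = new Resource("omim", "Online Mendelian Inheritance in Man", "OMIM",
            "https://www.omim.org", "", "https://omim.org/entry/");

    /** Hypothetical NCBI Gene entry for ENTREZ identifiers; all values are assumptions. */
    static final Resource ENTREZ = new Resource("ncbigene", "NCBI Gene", "ENTREZ",
            "https://www.ncbi.nlm.nih.gov/gene", "", "https://www.ncbi.nlm.nih.gov/gene/");

    /** Expands a CURIE such as "ENTREZ:1536" into an IRI using this resource's prefixes. */
    String toIri(String curie) {
        String prefix = namespacePrefix + ":";
        if (!curie.startsWith(prefix)) {
            throw new IllegalArgumentException("Not a " + namespacePrefix + " CURIE: " + curie);
        }
        return iriPrefix + curie.substring(prefix.length());
    }
}
```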

+

When we send a variant like c.123+1G>C to VariantValidator, we need to escape the "+" sign. We can probably use %2B.
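`java.net.URLEncoder` performs this escaping: it turns `+` into `%2B` (and `>` into `%3E`). A minimal sketch; note that `URLEncoder` implements HTML form encoding, so a space would become `+`, which is harmless here because HGVS strings contain no spaces. The class name is hypothetical.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper for placing HGVS strings in URL query parameters. */
final class HgvsUrlEncoding {

    private HgvsUrlEncoding() {
    }

    /** Percent-encodes an HGVS string, e.g. "c.123+1G>C" becomes "c.123%2B1G%3EC". */
    static String encode(String hgvs) {
        return URLEncoder.encode(hgvs, StandardCharsets.UTF_8);
    }
}
```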

Unusual PubMed string not parsable.

https://www.ncbi.nlm.nih.gov/pubmed/30249733

Error when removing variant using the tool

I was helping @nicolevasilevsky curate a paper: Takagi-2006-WNK1.
We accidentally had an extra variant (I think it was held over from the previous paper we opened). Anyway, it was incorrect, so I hit 'remove variant'. However, the Validate tool clearly still expects an additional variant, so it is throwing errors. The rest of the curation is correct, but I do not know how to get rid of the errors.

properties.getProperty("scigraph.mining.url")

The following (in HpoCaseAnnotatorModule) is throwing a null pointer exception when the "Add/remove HPO terms" button is clicked. I am trying to track down where the properties get initialized.

 @Provides
 @Singleton
 @Named("scigraphMiningUrl")
 public URL scigraphMiningUrl(Properties properties) throws MalformedURLException {
     return new URL(Objects.requireNonNull(properties.getProperty("scigraph.mining.url")));
 }

VariantValidator interaction

It would be excellent to let the user open a JavaFX WebView to check the mutation with VariantValidator. If the user only knows the chromosomal position, they can open the window, check that everything is consistent with the HGVS string, and then use VariantValidator to quickly get the snippet.

https://variantvalidator.org/variantvalidation/?variant=GRCh37:1:150550916:G:A

Alternatively, I think it is possible to use VariantValidator to go from HGVS to genomic coordinates, which would also save an enormous amount of time. There is also a new API, and we could possibly do the latter programmatically.
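The example link above puts `<assembly>:<chromosome>:<position>:<ref>:<alt>` into the `variant` query parameter. A minimal sketch that builds the link from the variant's own fields, so the chromosome always comes from the variant; the record is hypothetical, the pattern is inferred from this single example, and stripping a leading `chr` is an assumption.

```java
/** Hypothetical genomic variant in VCF-style coordinates (1-based position). */
record GenomicVariant(String assembly, String chrom, int pos, String ref, String alt) {

    private static final String BASE = "https://variantvalidator.org/variantvalidation/?variant=";

    /** Builds a link such as ...?variant=GRCh37:1:150550916:G:A from this variant's fields. */
    String variantValidatorUrl() {
        // Assumption: VariantValidator expects "X" rather than "chrX", as in the example URL.
        String chr = chrom.startsWith("chr") ? chrom.substring(3) : chrom;
        return BASE + String.join(":", assembly, chr, String.valueOf(pos), ref, alt);
    }
}
```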

Current release does not link to variant validator

When was that feature added? I will continue to curate, but I am not sure which release added it.
@pnrobinson

Insertion/deletion snippets

This is a valid snippet for an insertion
GGACCTGACACTT[-/TT]ACAACA
but it is not being recognized.
This would be a valid snippet for a deletion of 2 bases:
GGACCTGACACTT[TT/-]ACAACA
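A minimal sketch of a snippet pattern that accepts `-` as an empty allele on either side of the slash, as in the two examples above. The `Snippet` record is hypothetical, and it assumes only nucleotide letters (`ACGTN`, any case) appear outside the brackets.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical parsed snippet of the form upstream[ref/alt]downstream. */
record Snippet(String upstream, String ref, String alt, String downstream) {

    // "-" stands for an empty allele: insertion when ref is "-", deletion when alt is "-".
    private static final Pattern SNIPPET =
            Pattern.compile("([ACGTNacgtn]*)\\[([ACGTNacgtn]+|-)/([ACGTNacgtn]+|-)\\]([ACGTNacgtn]*)");

    static Optional<Snippet> parse(String text) {
        Matcher m = SNIPPET.matcher(text.trim());
        if (!m.matches() || (m.group(2).equals("-") && m.group(3).equals("-"))) {
            return Optional.empty(); // not a snippet, or both alleles empty
        }
        return Optional.of(new Snippet(m.group(1), m.group(2), m.group(3), m.group(4)));
    }

    boolean isInsertion() {
        return ref.equals("-");
    }

    boolean isDeletion() {
        return alt.equals("-");
    }
}
```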

Text mining not working in release Hpo Case Annotator v1.0.11

Text mining is not working at all in Hpo Case Annotator v1.0.11, but the HPO tree browser still does work.

Numbering of indels

I am wondering whether the position check in the snippet is off by one.
I just added this case report
data/casereports/Unger-2008-CCNQ.json
It is from this variant
https://www.ncbi.nlm.nih.gov/clinvar/variation/10674/

chrX
pos: 152860131
ref: T
alt: TT
snippet: TTGGGT[T/TT]AAAGTACCT

If I run the validate function of HCA, I get "Ref sequence T does not match the sequence A observed at X:152860130-152860131".

I then tried running Jannovar on this, but I do not get the same variant as in ClinVar:

$ java -jar jannovar-cli-0.27.jar annotate-pos -d data/hg19_ucsc.ser -c 'chrX:152860131T>TT'
Options
JannovarAnnotatePosOptions [genomicChanges=[chrX:152860131T>TT], toString()=JannovarAnnotationOptions [useThreeLetterAminoAcidCode=false, nt3PrimeShifting=false, showAll=false, databaseFilePath=data/hg19_ucsc.ser, toString()=JannovarBaseOptions [reportProgress=true, httpProxy=null, httpsProxy=null, ftpProxy=null, verbosity=1]]]
Deserializing transcripts...
INFO Deserializing JannovarData from data/hg19_ucsc.ser
INFO Deserialization took 4.19 sec.
#change	effect	hgvs_annotation	messages
chrX:152860131T>TT	FRAMESHIFT_VARIANT	CCNQ:uc011myr.2:c.291dup:p.(L98Tfs*30)	INFO_REALIGN_3_PRIME

Something weird is going on, hopefully I am not making more than one dumb mistake at a time.
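The interval in the error message (`X:152860130-152860131` for `pos: 152860131`) has the shape of a 0-based, half-open interval, a common way to convert a 1-based VCF position for FASTA lookups. Making that conversion explicit is one way to rule out an off-by-one in the check. A minimal sketch with hypothetical names; it does not claim to reproduce HCA's validator.

```java
/** Hypothetical 0-based, half-open interval [begin, end) on a contig. */
record Interval(String contig, int begin, int end) {

    /** Converts a 1-based VCF position and REF allele to the 0-based half-open interval REF spans. */
    static Interval fromVcf(String contig, int pos, String ref) {
        int begin = pos - 1;
        return new Interval(contig, begin, begin + ref.length());
    }

    @Override
    public String toString() {
        return contig + ":" + begin + "-" + end;
    }
}
```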

Make GRCh37 (hg19) the default build

Sorry, we are still old-fashioned

read the docs

Are we ready to make this repository public and create some read the docs?

Feature request: Bigger dropdown box for disease name

NOT URGENT.

I am having issues with long disease names and with choosing between the different types.

Example: I wanted SPONDYLOEPIMETAPHYSEAL DYSPLASIA WITH JOINT LAXITY, TYPE 1, WITH OR WITHOUT FRACTURES; SEMDJL1

But I was typing it in from the paper, so I didn't know exactly what it was called. The first result that came up was Type 2, and it was very difficult to search through the results to select the one I wanted.

automatically remove space from disease id entry

This saves time, since copy-paste always seems to pick up some whitespace that has to be removed manually.
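A minimal sketch of that clean-up, applied to the pasted text before it is stored; the class name is hypothetical. Besides ordinary whitespace it also strips the non-breaking space (U+00A0), which Java's `\s` does not match by default and which can appear in text copied from web pages.

```java
import java.util.regex.Pattern;

/** Hypothetical clean-up for pasted disease IDs. */
final class DiseaseIds {

    // \s covers ASCII whitespace only; the non-breaking space U+00A0 is listed explicitly.
    private static final Pattern SPACES = Pattern.compile("[\\s\\u00A0]+");

    private DiseaseIds() {
    }

    /** Removes all whitespace from a pasted ID, e.g. " OMIM: 306955" becomes "OMIM:306955". */
    static String clean(String pasted) {
        return SPACES.matcher(pasted).replaceAll("");
    }
}
```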

Variant Validator chromosome issue

The call to variant validator always seems to use chromosome 1 and not the actual chromosome.

Set OMIM as default database

I tried to do this in DiseaseCaseDataController (at about line 435, in the init function), but it did not have any effect:

diseaseDatabaseComboBox.getSelectionModel().selectFirst(); // OMIM as the default

HPO text-mining terms do not identify "Not" or "No" for terms

When using the text-mining tool, 'not'/'no' phenotypes are not identified.
Below is an example of what I entered in the text-mining box. It does not matter whether I put 'no' or 'not' in front of a term: it is always identified as a present phenotype, and as far as I know I cannot negate it from that box.

Psychomotor retardation
Developmental regression (from age)
Seizures/ epilepsy
Lennox Gestaut
Myopathy
Abnormal basal ganglia
Cerebral atrophy
Elevated serum lactate
elevated lactate, elevated malate, elavated succinate

NOt Ataxia
Not Dystonia
Not Pyramidal tract involvement
Not Hepatomegaly
Not Cardiomyopathy
Not Leukodystrophy
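A simple heuristic would be to treat a line starting with `no` or `not` (in any case, as in `NOt Ataxia` above) as an excluded phenotype and pass only the remainder to concept recognition. The sketch below is a hypothetical pre-processing step, not how the HPO text-mining module works, and it ignores negation anywhere other than at the start of a line.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical line of free text plus whether it was negated by a leading "no"/"not". */
record TextLine(String text, boolean negated) {

    // "not" or "no" as a whole word at the start of the line, in any case.
    private static final Pattern NEGATION = Pattern.compile("(?i)^\\s*(?:not|no)\\s+(.*)$");

    static TextLine parse(String line) {
        Matcher m = NEGATION.matcher(line);
        return m.matches() ? new TextLine(m.group(1).trim(), true) : new TextLine(line.trim(), false);
    }
}
```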

Resource files management

Resource files (genome fasta file, hpo obo file) rarely change between different releases of Hpo Case Annotator.
At the moment, each new release of the app needs its own resource folder (e.g. $HOME/.hpo-case-annotator-1.0.11), and the user is forced to configure the app each time a new release is made.

It would be good to simplify installation of a new release by reusing the resource folder from the previous release (if there is one).
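A hedged sketch of locating the newest existing resource folder, assuming folders follow the `$HOME/.hpo-case-annotator-<version>` naming shown above. Versions are compared numerically so that `1.0.11` ranks above `1.0.9` (a plain string comparison would get this wrong), and a suffix such as `-SNAPSHOT` is ignored. The class name is hypothetical.

```java
import java.io.File;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical lookup of the newest ".hpo-case-annotator-<version>" folder. */
final class ResourceFolders {
    private static final Pattern FOLDER = Pattern.compile("\\.hpo-case-annotator-(\\d+(?:\\.\\d+)*)(?:-.*)?");

    private ResourceFolders() {
    }

    /** Returns the folder in `home` with the highest version number, if any. */
    static Optional<File> latest(File home) {
        File[] children = home.listFiles(File::isDirectory);
        if (children == null) {
            return Optional.empty(); // home does not exist or is not readable
        }
        return Arrays.stream(children)
                .filter(f -> FOLDER.matcher(f.getName()).matches())
                .max(Comparator.comparing((File f) -> version(f.getName()), ResourceFolders::compareVersions));
    }

    private static int[] version(String name) {
        Matcher m = FOLDER.matcher(name);
        m.matches(); // always true here: names were filtered with the same pattern
        return Arrays.stream(m.group(1).split("\\.")).mapToInt(Integer::parseInt).toArray();
    }

    private static int compareVersions(int[] a, int[] b) {
        for (int i = 0; i < Math.max(a.length, b.length); i++) {
            int x = i < a.length ? a[i] : 0;
            int y = i < b.length ? b[i] : 0;
            if (x != y) {
                return Integer.compare(x, y);
            }
        }
        return 0;
    }
}
```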

Disease database

It does not make sense to have NCI as an option here, because the app is not designed for curating cancer cases. We could probably just remove the pull-down menu, but we should at least make OMIM show up by default.

Update in PhenoPacketCodec

We have switched the genomeAssembly element in Phenopackets from an enum to a String, so the following code no longer works. Essentially, Phenopackets now just needs Strings like "GRCh37".

 private static String hcaGenomeAssemblyToPhenoPacketGenomeAssembly(org.monarchinitiative.hpo_case_annotator.model.proto.GenomeAssembly genomeAssembly) {
     switch (genomeAssembly) {
         case GRCH_37:
             return GenomeAssembly.GRCH_37.name();
         case GRCH_38:
             return GenomeAssembly.GRCH_38.name();
         case UNKNOWN_GENOME_ASSEMBLY:
         case UNRECOGNIZED:
             return GenomeAssembly.UNKNOWN_ASSEMBLY.name();
         default:
             LOGGER.warn("Unknown genome assembly: {}", genomeAssembly);
             return GenomeAssembly.UNKNOWN_ASSEMBLY.name();
     }
 }
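With plain strings in Phenopackets, the method could simply return the assembly names. A hedged sketch: `HcaGenomeAssembly` only mirrors the constants of HCA's proto enum used above so that the example compiles on its own, a `switch` expression (Java 14+) replaces the statement form, and returning an empty string for an unknown assembly is an assumption, since the issue does not say what Phenopackets expects in that case.

```java
/** Stand-in mirroring the constants of HCA's proto GenomeAssembly enum used above. */
enum HcaGenomeAssembly { GRCH_37, GRCH_38, UNKNOWN_GENOME_ASSEMBLY, UNRECOGNIZED }

final class GenomeAssemblyLabels {

    private GenomeAssemblyLabels() {
    }

    /** Maps the HCA enum to the plain string Phenopackets now expects, e.g. "GRCh37". */
    static String hcaGenomeAssemblyToPhenoPacketGenomeAssembly(HcaGenomeAssembly genomeAssembly) {
        return switch (genomeAssembly) {
            case GRCH_37 -> "GRCh37";
            case GRCH_38 -> "GRCh38";
            // Assumption: an empty string when the assembly is unknown.
            case UNKNOWN_GENOME_ASSEMBLY, UNRECOGNIZED -> "";
        };
    }
}
```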

1.0.7

Thanks! I can now open the 1.0.7 files. The family info and the genome build are not being imported correctly, although they are present in the Java bean file.

<void property="familyInfo">
 <void property="familyOrPatientID">
  <string>Patient 1</string>
 </void>
</void>
<void property="genomeBuild">
 <string>37</string>
</void>

I want to experiment with code that reads these files and outputs JSON, and also to finalize our new model. For now, I have started a new repo
https://github.com/pnrobinson/beanjson
but we can merge that into this repo if it is working OK.
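In the 1.0.7 bean excerpt above, the build is stored as the bare string `37`. When converting these files to JSON, a small normalization step could map such legacy values onto assembly names. A minimal sketch; the accepted spellings (`37`, `hg19`, `GRCh37` and their 38 counterparts) are assumptions about what older files contain.

```java
import java.util.Locale;
import java.util.Optional;

/** Hypothetical normalization of legacy genomeBuild values. */
final class LegacyGenomeBuild {

    private LegacyGenomeBuild() {
    }

    /** Maps values such as "37" or "hg19" to "GRCh37"/"GRCh38"; anything else is empty. */
    static Optional<String> normalize(String legacy) {
        if (legacy == null) {
            return Optional.empty();
        }
        switch (legacy.trim().toLowerCase(Locale.ROOT)) {
            case "37": case "hg19": case "grch37":
                return Optional.of("GRCh37");
            case "38": case "hg38": case "grch38":
                return Optional.of("GRCh38");
            default:
                return Optional.empty();
        }
    }
}
```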

Add genome build to VariantValidator transcript URL

Add this to the URL we send to VV: &primary_assembly=GRCh37

The data model structure

We should update the data model structure of the app. I would propose starting by editing the current model schema here.

String index out of range: -7

With Hpo Case Annotator v1.0.13

I get

Exception in thread "JavaFX Application Thread" java.lang.StringIndexOutOfBoundsException: String index out of range: -7
	at java.lang.String.substring(String.java:1967)
	at org.monarchinitiative.hpotextmining.gui.controller.Present.colorizeHTML4ciGraph(Present.java:226)
	at org.monarchinitiative.hpotextmining.gui.controller.Present.setResults(Present.java:441)
	at org.monarchinitiative.hpotextmining.gui.controller.HpoTextMining.lambda$new$0(HpoTextMining.java:88)
	at org.monarchinitiative.hpotextmining.gui.controller.Configure.lambda$analyzeButtonClicked$0(Configure.java:89)
	at com.sun.javafx.event.CompositeEventHandler.dispatchBubblingEvent(CompositeEventHandler.java:86)
(....)

This is the text:

Febrile seizures	HP:0002373HPOs: Dysarthria	HP:0001260HPOs: Loss of ability to walk	HP:0006957HPOs: Myoglobinuria	HP:0002913HPOs: Focal-onset seizure	HP:0007359HPOs: Apnea	HP:0002104HPOs: Elevated serum creatine kinase	HP:0003236HPOs: Hyperammonemia	HP:0001987HPOs: Hypoglycemia	HP:0001943HPOs: Myopathic facies	HP:0002058HPOs: Microcephaly	HP:0000252HPOs: Hyperactive deep tendon reflexes	HP:0006801HPOs: Babinski sign	HP:0003487HPOs: Exotropia	HP:0000577HPOs: Developmental regression	HP:0002376HPOs: Elevated serum creatine kinase	HP:0003236HPOs: Intellectual disability	HP:0001249HPOs: Rhabdomyolysis	HP:0003201HPOs: Microcephaly	HP:0000252HPOs Free Text: Urine myoglobin of 94 ng/ml (normal range 10–65 ng/ml), serum creatine phosphokinase (CPK) of 205,000 U/l (normal range 75–230 U/l), elevated aspartate aminotransferase (AST) of 1,618 U/l (normal range 15–50 U/l), alanine aminotransferase (ALT) of 571 U/l (normal range 10–25 U/l), ammonia of 122 μmol/l (normal range 22–48 μmol/l), and hypoglycemia (blood glucose 30 mg/dl; normal range 70–110 mg/dl)
Not HPOs: Arrhythmia	HP:0011675Not HPOs Free Text: -
Variants: NM_152906.6:c.460G>A (p.Gly154Arg)
ClinVar ID: 208823

FamilyInfo

I am concerned that code such as this is fragile:

   case 10:

Can we explore how to do this using named getters/setters?
A lot of the code is now marked as deprecated as well.
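A hedged sketch of the named-accessor alternative: instead of dispatching on a position such as `case 10:`, each field is reached through a setter registered under a stable name, so reordering fields cannot silently change the mapping. `FamilyInfo` here is a hypothetical, simplified class; apart from `familyOrPatientID` from the bean excerpt, the field names are illustrative.

```java
import java.util.Map;
import java.util.function.BiConsumer;

/** Hypothetical, simplified family info bean with named accessors. */
final class FamilyInfo {
    private String familyOrPatientID;
    private String sex;
    private String age;

    String getFamilyOrPatientID() { return familyOrPatientID; }
    void setFamilyOrPatientID(String v) { familyOrPatientID = v; }
    String getSex() { return sex; }
    void setSex(String v) { sex = v; }
    String getAge() { return age; }
    void setAge(String v) { age = v; }

    /** Setters addressed by a stable field name rather than by position. */
    static final Map<String, BiConsumer<FamilyInfo, String>> SETTERS = Map.of(
            "familyOrPatientID", FamilyInfo::setFamilyOrPatientID,
            "sex", FamilyInfo::setSex,
            "age", FamilyInfo::setAge);

    void set(String field, String value) {
        BiConsumer<FamilyInfo, String> setter = SETTERS.get(field);
        if (setter == null) {
            throw new IllegalArgumentException("Unknown field: " + field);
        }
        setter.accept(this, value);
    }
}
```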

Add version number to accession

I initially did not realize this, but eutils does provide a version number, like this:

<Gene-commentary_accession>NM_000138</Gene-commentary_accession>
 <Gene-commentary_version>4</Gene-commentary_version>

Therefore, it should be an easy fix to add this. We should figure out how to parse the XML in a more elegant way!
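A minimal sketch using the JDK's DOM and XPath APIs to read both elements and join them into a versioned accession such as `NM_000138.4`. It assumes, as in the excerpt above, that each `Gene-commentary_version` is a sibling of its `Gene-commentary_accession`; the class name is hypothetical, and error handling is reduced to a single unchecked exception.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical parser for accession/version pairs in eutils gene XML. */
final class AccessionParser {

    private AccessionParser() {
    }

    /** Returns "accession.version" for each accession, or the bare accession if no version is present. */
    static List<String> versionedAccessions(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList accessions = (NodeList) xpath.evaluate("//Gene-commentary_accession", doc, XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < accessions.getLength(); i++) {
                Element accession = (Element) accessions.item(i);
                // The version, if any, is a sibling element under the same Gene-commentary.
                String version = xpath.evaluate("../Gene-commentary_version", accession).trim();
                String id = accession.getTextContent().trim();
                result.add(version.isEmpty() ? id : id + "." + version);
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse eutils XML", e);
        }
    }
}
```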

HpoCaseAnnotator/hpo-case-annotator-core/src/main/proto/model.proto

Line 20 in 41de168

repeated OntologyClass phenotype = 8;

is modeled like this:

https://github.com/phenopackets/phenopacket-schema/blob/4ce66acabfd3cc0f66e0a33bc5199bb80c4b2c87/src/main/proto/org/phenopackets/schema/v1/core/base.proto#L44-L106

Phenopacket export feature

HpoCaseAnnotator should be able to export the maximum possible amount of information from its data model into phenopackets.

Phenopacket schema v0.1.0 is used and the Phenopacket is saved in JSON format.