
Kourami: Graph-guided assembly for HLA alleles

License: BSD 3-Clause "New" or "Revised" License


kourami's Introduction

-hhy+.                o o       o o       o o o o       o o
.`           -syss:---.`        o     o o o     o o o         o o o     o o o
:+:`     .:/o+++++///ommy+`    o       _  __                               _
`yhs/..:osssooooo++++dmNNNdo`   o     | |/ /___  _   _ _ __ __ _ _ __ ___ (_)
 /syy///++++ooooooooodNMdNdmh: o      | ' // _ \| | | | '__/ _` | '_ ` _ \| |
 -do/` .://++++++++oodmmmmmmd-        | . \ (_) | |_| | | | (_| | | | | | | |
 .+:     `.://///+///ommmmdy-         |_|\_\___/ \__,_|_|  \__,_|_| |_| |_|_|
  .          -syo----..``          
            +y+.

Overview

Kourami is a graph-guided assembler for HLA haplotypes covering typing exons (exons 2 and 3 for Class I and exon 2 for Class II) using high-coverage whole genome sequencing data. Kourami constructs highly accurate haplotype sequences at 1-bp resolution by first encoding currently available HLA allelic sequences from the IPD-IMGT/HLA Database ( http://www.ebi.ac.uk/ipd/imgt/hla/ ) as partial-ordered graphs. Each database allele is naturally encoded as a path through the graph, and any detectable genetic variations (SNPs or indels) not captured by the known sequences are added to the graph by modifying it based on read alignments, so that the differences between novel alleles and known sequences are captured. Unlike previously available WGS-based HLA typing methods (database-matching techniques), Kourami directly assembles both haplotypes for each HLA gene (HLA-A, -B, -C, -DQA1, -DQB1, -DRB1). Starting with version 0.9.4, Kourami supports additional HLA loci. It also provides the typing result (6-digit 'G' resolution) by outputting the best-matching alleles among the known sequences whenever 'G' grouping information is available.

Release

The latest release, including both jar and source code, can be downloaded from here.

Support

Kourami is, and will continue to be, freely and actively supported on a best-effort basis.

If you need industrial-grade technical support, please consider the options at oceangenomics.com/support.

Installation

To install Kourami, you must have the following installed on your system:

JDK 1.8+
Apache Maven (3.3+) or Apache Ant (1.9+) (we recommend Maven for easy dependency downloads)
- OR you must have the dependencies downloaded and added to your CLASSPATH. Then you can compile using javac.
- To use Ant, you must download the dependencies and place the jars under an 'exjars' directory, which you must create.

A copy of the preformatted IMGT-HLA database (Kourami panel) can be obtained using a script. The panel sequence file must be bwa-indexed before use; the script now does this when it downloads the database. The script downloads and installs the database in the db directory under the Kourami installation directory. Run the download-and-index script from the Kourami installation directory:

scripts/download_panel.sh

[MAVEN USERS] To compile and generate a jar file run the following command from the kourami directory where pom.xml is located.

mvn install

[ANT USERS] To compile and generate a jar file run the following command from the kourami directory where build.xml is located.

ant compile jar

This will create a "target" directory and place a packaged jar file in it.

Usage

java -jar <PATH_TO>/Kourami.jar [options] <bam-1> ... <bam-n>

NOTE: The Kourami jar takes a bam aligned to the Kourami reference panel built from IMGT/HLA db (included in the preformatted IMGT-HLA database). Detailed notes on how to generate an input bam consisting of HLA loci reads aligned to known alleles are given in How to prepare input bam and HLA panel for Kourami.

Option Tag	Description
-h,--help	print this message
-d,--msaDirectory <path>	build HLAGraph from gen and nuc MSAs provided by IMGT/HLA DB from given directory (required). Can be downloaded by running `scripts/download_panel.sh`.
-o,--outfilePrefix <outfile>	use given outfile prefix for all output files (required)
-a,--additionalLoci	type additional loci (optional)

Output

<outfileprefix>.result contains the typing result and the columns are:
1: Allele
2: #BasesMatched
3: Identity (#BasesMatched/MaxLen(query, db_allele))
4: Length of the assembled allele
5: Length of the matched allele from IMGT/HLA DB
6: Combined bottleneck weights of both paths at a position. This is not necessarily the same as the sum of columns 7 and 8.
7: Weight of the bottleneck edge in path 1
8: Weight of the bottleneck edge in path 2

Note: Given a path, a bottleneck edge is an edge with the minimal weight. For an allele, two entries (lines) are always reported in the result file: path 1 first, then path 2 on the following line. Columns 6, 7, and 8 are identical on both lines.

<outfileprefix>.log contains the program log

Assembled allele sequences are written to files ending with .typed.fa (multi-FASTA format)
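
As a worked illustration of column 3, the sketch below computes Identity from the formula given in the column list, #BasesMatched/MaxLen(query, db_allele). It assumes that "query" is the assembled allele (column 4) and "db_allele" is the matched database allele (column 5); the class name, method name, and the numbers in main are hypothetical and are not taken from real Kourami output.

```java
public class IdentityExample {
    // Identity as described for column 3: #BasesMatched / MaxLen(query, db_allele).
    // Assumption: query length is column 4 and db_allele length is column 5.
    static double identity(int basesMatched, int assembledLen, int dbAlleleLen) {
        int maxLen = Math.max(assembledLen, dbAlleleLen);
        if (maxLen <= 0) {
            throw new IllegalArgumentException("allele lengths must be positive");
        }
        return (double) basesMatched / maxLen;
    }

    public static void main(String[] args) {
        // Hypothetical values chosen only to exercise the formula.
        System.out.println(identity(540, 546, 546)); // a few mismatches, equal lengths
        System.out.println(identity(546, 546, 549)); // all bases match, db allele is longer
    }
}
```

Because the denominator is the longer of the two lengths, a length difference lowers the identity even when every compared base matches.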

Dependencies

Dependencies can be easily downloaded by using the Maven install command.

In each release, the pre-compiled jar is distributed with all necessary jars for dependencies, and they are:

JGraphT 0.9.1 ( http://jgrapht.org/ )
Apache Commons CLI 1.4 ( https://commons.apache.org/proper/commons-cli/ )
fastutil 7.0.13 : Fast & compact type-specific collections for Java ( http://fastutil.di.unimi.it/ )

How to cite Kourami

Please cite our paper available on Genome Biology:

Lee, H., & Kingsford, C. Kourami: graph-guided assembly for novel human leukocyte antigen allele discovery. Genome Biology 19(16), 2018


kourami's Issues

Duplicate reads in BAM

Hi,

This tool is very useful in my workflow but I have a question about the preprocessing step.

The chr6 coordinates overlap, causing the final alignment to have duplicate primary alignments for the same read. The number of primary alignments matches the number of times the filter step used "samtools view chr6:START1-END1...chr6:STARTn-ENDn".

For example, if I use this process to filter reads, it duplicates reads in the first samtools view and then duplicates them a second time in the second samtools extraction. These four reads then have primary alignments to different contigs in the panel (each of which has a flag of 83 or 163).
https://github.com/Kingsford-Group/kourami/blob/master/scripts/alignAndExtract_hs38DH.sh

A00117:192:HFGFGDMXX:1:1339:16486:16830	83	DPB1*266:01	5085	0	76M	=	5042	-119		NM:i:0	MD:Z:76	AS:i:76	XS:i:76
A00117:192:HFGFGDMXX:1:1339:16486:16830	163	DPB1*266:01	5042	0	76M	=	5085	119		NM:i:0	MD:Z:76	AS:i:76	XS:i:76
A00117:192:HFGFGDMXX:1:1339:16486:16830	83	DPB1*379:01	5085	0	76M	=	5042	-119		NM:i:0	MD:Z:76	AS:i:76	XS:i:76
A00117:192:HFGFGDMXX:1:1339:16486:16830	163	DPB1*379:01	5042	0	76M	=	5085	119		NM:i:0	MD:Z:76	AS:i:76	XS:i:76
A00117:192:HFGFGDMXX:1:1339:16486:16830	83	DPB1*351:01	5085	0	76M	=	5042	-119		NM:i:0	MD:Z:76	AS:i:76	XS:i:76
A00117:192:HFGFGDMXX:1:1339:16486:16830	163	DPB1*351:01	5042	0	76M	=	5085	119		NM:i:0	MD:Z:76	AS:i:76	XS:i:76
A00117:192:HFGFGDMXX:1:1339:16486:16830	83	DPB1*20:01:02	5085	0	76M	=	5042	-119		NM:i:0	MD:Z:76	AS:i:76	XS:i:76
A00117:192:HFGFGDMXX:1:1339:16486:16830	163	DPB1*20:01:02	5042	0	76M	=	5085	119		NM:i:0	MD:Z:76	AS:i:76	XS:i:76

The number of alignments is arbitrarily large depending on how many times the read was included in the FASTQ. They can also align multiple times to the same contig. Should I actually be creating a BAM with all possible alignments or with one primary alignment only?
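
If overlapping region strings are indeed the source of the repeated reads, as this report suggests, one possible workaround is to merge the overlapping chr6 intervals before passing them to samtools view. The following is a minimal sketch of that idea and not part of Kourami's scripts; the class name and the coordinates in main are hypothetical. It does not address reads that span two separate, non-overlapping regions.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class MergeRegions {
    // Merge overlapping or directly adjacent [start, end] intervals
    // (1-based, inclusive) that all lie on the same contig.
    static List<int[]> merge(List<int[]> regions) {
        List<int[]> sorted = new ArrayList<>(regions);
        sorted.sort(Comparator.comparingInt((int[] r) -> r[0]));
        List<int[]> merged = new ArrayList<>();
        for (int[] r : sorted) {
            if (!merged.isEmpty() && r[0] <= merged.get(merged.size() - 1)[1] + 1) {
                // Extend the previous interval instead of starting a new one.
                int[] last = merged.get(merged.size() - 1);
                last[1] = Math.max(last[1], r[1]);
            } else {
                merged.add(new int[] {r[0], r[1]});
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        // Hypothetical coordinates, for illustration only.
        List<int[]> regions = new ArrayList<>();
        regions.add(new int[] {100, 200});
        regions.add(new int[] {150, 300});
        regions.add(new int[] {400, 500});
        for (int[] r : merge(regions)) {
            System.out.println("chr6:" + r[0] + "-" + r[1]);
        }
    }
}
```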

Null Pointer on Kourami call

I am getting a Null Pointer error while running Kourami on a custom built IMGT HLA alignments version 3.37.0 and using 1000 genomes NA12878 GRCh38 aligned bam file. I went through the steps explained in https://github.com/Kingsford-Group/kourami/blob/master/preprocessing.md to prepare the custom database and prepare the bam file. Both completed successfully but I am getting the following error message:

IMGT HLA 3.37.0 alignment files
https://github.com/ANHIG/IMGTHLA/tree/Latest/alignments

The 1000 genomes NA12878 GRCh38 aligned bam file
ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000_genomes_project/data/CEU/NA12878/alignment/NA12878.alt_bwamem_GRCh38DH.20150718.CEU.low_coverage.cram

>>>>>>>>>>>>>>>> extracting reads mapping to HLA loci and ALT contigs (38DH)
[bam_sort_core] merging from 0 files and 8 in-memory blocks...
>>>>>>>>>>>>>> indexing extracted bam (38DH)
>>>>>>>>>>>>>> bamUtil fastq extraction (38DH)
>>>>>>>>>>>>>> Mapping back to GRCh38_NoALT_wHLA (38DH_NoAlt)
[M::bwa_idx_load_from_disk] read 3171 ALT contigs
[M::process] read 491302 sequences (49621502 bp)...
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (0, 54, 0, 0)
[M::mem_pestat] skip orientation FF as there are not enough pairs
[M::mem_pestat] analyzing insert size distribution for orientation FR...
[M::mem_pestat] (25, 50, 75) percentile: (365, 385, 395)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (305, 455)
[M::mem_pestat] mean and std.dev: (379.52, 24.97)
[M::mem_pestat] low and high boundaries for proper pairs: (275, 485)
[M::mem_pestat] skip orientation RF as there are not enough pairs
[M::mem_pestat] skip orientation RR as there are not enough pairs
[M::mem_process_seqs] Processed 491302 reads in 257.462 CPU sec, 32.830 real sec
[main] Version: 0.7.15-r1140
[main] CMD: /usr/local/bin/bwa mem -t 8 /shares/hii/bioinfo/ref/gotcloud/hs38DH-db142-v1/hs38DH.fa test_tmp.extract_1.fq.gz test_tmp.extract_2.fq.gz
[main] Real time: 52.121 sec; CPU: 266.018 sec
[bam_sort_core] merging from 0 files and 8 in-memory blocks...
[bam_sort_core] merging from 0 files and 8 in-memory blocks...
>>>>>>>>>>>>>> indexing extracted bam (38DH_NoAlt)
>>>>>>>>>>>>>> bamUtil fastq extraction (38DH_NoAlt)
>>>>>>>>>>>>>> bwa mem to hla panel for Kourami
[M::bwa_idx_load_from_disk] read 0 ALT contigs
[M::process] read 31954 sequences (3227354 bp)...
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (0, 710, 0, 0)
[M::mem_pestat] skip orientation FF as there are not enough pairs
[M::mem_pestat] analyzing insert size distribution for orientation FR...
[M::mem_pestat] (25, 50, 75) percentile: (356, 383, 404)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (260, 500)
[M::mem_pestat] mean and std.dev: (380.44, 32.22)
[M::mem_pestat] low and high boundaries for proper pairs: (212, 548)
[M::mem_pestat] skip orientation RF as there are not enough pairs
[M::mem_pestat] skip orientation RR as there are not enough pairs
[M::mem_process_seqs] Processed 31954 reads in 63.380 CPU sec, 7.992 real sec
[main] Version: 0.7.15-r1140
[main] CMD: /usr/local/bin/bwa mem -t 8 /home/j/jmoreno8/dev/usf-hii/fac-parikhh-teddy-hla-imputation/tmp/kourami_db/3.37.0/All_FINAL_with_Decoy.fa.gz test_extract_1.fq.gz test_extract_2.fq.gz
[main] Real time: 9.389 sec; CPU: 63.959 sec
----------------REF GRAPH CONSTRUCTION--------------
java.lang.NullPointerException
        at HLASequence.<init>(HLASequence.java:17)
        at MergeMSFs.formDataBase(MergeMSFs.java:39)
        at HLA.loadGraphs(HLA.java:91)
        at HLA.<init>(HLA.java:62)
        at HLA.main(HLA.java:556)

Exception in thread "main" java.lang.OutOfMemoryError

I was trying to run the tool (on a 30x whole genome sequence) and it ran fine for a while, but then ended up with this error (at the READ LOADING step). The server that I run it on should have 256 GB of memory, which I was hoping would be enough. Is it possible that the memory problem is on the Java side and I can just increase the limit somehow, or do I need a bigger computer?
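
One thing worth checking, offered as a sketch and not as a diagnosis: the JVM caps its heap independently of the machine's physical memory, so a 256 GB server does not by itself give Java 256 GB. The cap is set with the standard -Xmx option (another report in this list runs java -Xmx30000m -jar ...). The small program below, with a hypothetical class name, prints the heap limit the JVM is actually running with. Whether raising the limit resolves this particular error depends on the full error message, which is not shown here.

```java
public class HeapCheck {
    // Maximum heap the current JVM will attempt to use, in MiB.
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap available to this JVM: " + maxHeapMiB() + " MiB");
    }
}
```

Running it once with and once without an explicit -Xmx value shows whether the default limit is far below the installed memory.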

test

How to type HLA beyond A, B, C, DQA1, DQB1 and DRB1

How can I type additional genes such as DPA1, DPB1, DRA, DRB3, DRB4, DRB5, and DM or DO?
Even if the quality is not that good for these genes, I would really like to see the possible call set.

Thanks

NullPointerException in MergeMSFs.java

When using IMGT/HLA 3.53.0, calling Kourami.jar to genotype an extracted bam will result in an NPE in MergeMSFs.java. This was traced to HLA*26:236 being present in hla_nom_g.txt but absent in alignments.

Might I suggest the following change in MergeMSFs.java?
Uncomment line 38.
Add after it:
if (s == null) System.err.println("The following may be present in hla_nom_g.txt but not in alignments: " + g.getFirstAllele());

It will then report the ID so it can be deleted from hla_nom_g.txt. Deleting the spurious ID makes it possible to use IMGT/HLA 3.53.0.
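
The same check can be sketched as a standalone pre-flight step that lists every ID named in the G-group file but missing from the alignments, instead of failing on the first one. This is an illustration only: the class, the method, the map from allele name to sequence, and the placeholder allele names are all hypothetical and do not reflect Kourami's internal data structures.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MissingAlleleCheck {
    // Return the allele names that appear in the G-group list but have
    // no sequence in the alignment-derived map.
    static List<String> findMissing(Collection<String> groupAlleles,
                                    Map<String, String> alleleToSequence) {
        List<String> missing = new ArrayList<>();
        for (String allele : groupAlleles) {
            if (alleleToSequence.get(allele) == null) {
                missing.add(allele);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Placeholder names and sequence, not real IMGT/HLA entries.
        Map<String, String> sequences = new HashMap<>();
        sequences.put("GENE*00:01", "ACGT");
        List<String> fromNomG = Arrays.asList("GENE*00:01", "GENE*00:02");
        for (String id : findMissing(fromNomG, sequences)) {
            System.err.println("Present in hla_nom_g.txt but not in alignments: " + id);
        }
    }
}
```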

Thank you for developing this software - it is pretty useful.

Hard-coded HLA Genes in Java Code

The software is hard-coded to only work on a few of the HLA genes, but there are many others with numerous alleles in IMGT/HLA (e.g. HLA-E, HLA-G, etc.). The partially-ordered graph method seems generic enough that it should work with any input genes - HLA isn't the only polymorphic gene family in the genome. Also, the software seems bloated because it copies and pastes code statements instead of using loops. For example,

    public void printWeights(){
        this.hlaName2Graph.get("A").traverseAndWeights();
        this.hlaName2Graph.get("B").traverseAndWeights();
        this.hlaName2Graph.get("C").traverseAndWeights();
        this.hlaName2Graph.get("DQA1").traverseAndWeights();
        this.hlaName2Graph.get("DQB1").traverseAndWeights();
        this.hlaName2Graph.get("DRB1").traverseAndWeights();
    }

seems wasteful because the gene names are defined elsewhere (String[] list = {"A" , "B" , "C" , "DQA1" , "DQB1" , "DRB1"};) and could simply be looped over.
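
A minimal sketch of the loop-based version follows. GraphStub is a hypothetical stand-in for Kourami's graph class, with only the traverseAndWeights() method named above stubbed out; the LociLoop class, the LOCI constant, and the returned count are likewise illustrative and not taken from the Kourami source.

```java
import java.util.HashMap;
import java.util.Map;

public class LociLoop {
    // Hypothetical stand-in for the per-gene graph; only the method
    // referenced in the issue is stubbed.
    static class GraphStub {
        final String gene;
        boolean visited = false;

        GraphStub(String gene) {
            this.gene = gene;
        }

        void traverseAndWeights() {
            visited = true;
            System.out.println("weights for " + gene);
        }
    }

    static final String[] LOCI = {"A", "B", "C", "DQA1", "DQB1", "DRB1"};

    // Visit every configured locus that has a graph; return how many were visited.
    static int printWeights(Map<String, GraphStub> hlaName2Graph) {
        int visited = 0;
        for (String locus : LOCI) {
            GraphStub graph = hlaName2Graph.get(locus);
            if (graph != null) {
                graph.traverseAndWeights();
                visited++;
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        Map<String, GraphStub> graphs = new HashMap<>();
        for (String locus : LOCI) {
            graphs.put(locus, new GraphStub(locus));
        }
        printWeights(graphs);
    }
}
```

With this shape, supporting another gene means adding one entry to the list of loci, and the null check avoids a NullPointerException when a locus has no graph.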

formatIMGT

Should this line in formatIMGT.sh
java -Xmx$jvm_memory -cp $SCRIPTD/../target/Kourami.jar FormatIMGT $input_msa $imgt_ver_num $db_base 2> $logfil

not rather read

java -Xmx$jvm_memory -cp $kourami FormatIMGT $input_msa $imgt_ver_num $db_base 2> $logfil

DRB5_gen wrong with IMGT/HLA Release 3.40.0 and HLA Release 3.41.0

Hi,
I have tried IMGT/HLA Release 3.40.0 and Release 3.41.0, and both fail on DRB5.
I noticed that Releases 3.40.0 and 3.41.0 only have a single DRB_nuc.txt, but four split gen files: DRB1_gen.txt, DRB3_gen.txt, DRB4_gen.txt, and DRB5_gen.txt. As a result, scripts/formatIMGT.sh cannot work with the new releases.
Can you address this problem?
Thank you

Cannot extract any read from IGSR High Coverage sample

Hi, I tried Kourami for HLA typing recently. The tool itself and the preprocessing script run fine on my own sample (aligned to hg38 + decoy + HLA + no ALT). However, when I tried it on an IGSR high-coverage sample (http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000G_2504_high_coverage/), the preprocessing script returned zero reads. As far as I know, the sample is aligned to hg38 + decoy + HLA + ALT. I tried all four scripts, and they all returned zero reads (I checked the files manually).

Have you ever run into this problem? Please give me some insight on this.

Here is the running log of the preprocessing script.

sample_name=HG02082.final
file=/home/nguyen/IGSR_HC_Data/CRAM/HG02082.final.cram

(base) nguyen@nguyen-ws:~$     $KOUDIR/scripts/alignAndExtract_hs38DH.sh $out_folder/${sample_name}/kourami/$sample_name $file
>>>>>>>>>>>>>>>> extracting reads mapping to HLA loci and ALT contigs (38DH)
[main_samview] random alignment retrieval only works for indexed BAM or CRAM files.
>>>>>>>>>>>>>> indexing extracted bam (38DH)
>>>>>>>>>>>>>> bamUtil fastq extraction (38DH)
>>>>>>>>>>>>>> Mapping back to GRCh38_NoALT_wHLA (38DH_NoAlt)
[M::bwa_idx_load_from_disk] read 0 ALT contigs
[main] Version: 0.7.17-r1188
[main] CMD: /home/nguyen/anaconda3/bin/bwa mem -t 8 /home/nguyen/Exec/kourami-0.9.6/scripts/../resources/hs38NoAltDH.fa /home/nguyen/IGSR_HC_Data/Result/HG02082.final/kourami/HG02082.final_tmp.extract_1.fq.gz /home/nguyen/IGSR_HC_Data/Result/HG02082.final/kourami/HG02082.final_tmp.extract_2.fq.gz
[main] Real time: 1.905 sec; CPU: 1.908 sec
>>>>>>>>>>>>>> indexing extracted bam (38DH_NoAlt)
>>>>>>>>>>>>>> bamUtil fastq extraction (38DH_NoAlt)
>>>>>>>>>>>>>> bwa mem to hla panel for Kourami
[M::bwa_idx_load_from_disk] read 0 ALT contigs
[main] Version: 0.7.17-r1188
[main] CMD: /home/nguyen/anaconda3/bin/bwa mem -t 8 /home/nguyen/Exec/kourami-0.9.6/scripts/../db/All_FINAL_with_Decoy.fa.gz /home/nguyen/IGSR_HC_Data/Result/HG02082.final/kourami/HG02082.final_extract_1.fq.gz /home/nguyen/IGSR_HC_Data/Result/HG02082.final/kourami/HG02082.final_extract_2.fq.gz
[main] Real time: 0.056 sec; CPU: 0.056 sec

(base) nguyen@nguyen-ws:~$     $KOUDIR/scripts/alignAndExtract_hs38DH_NoAlt.sh $out_folder/${sample_name}/kourami/$sample_name $file
>>>>>>>>>>>>>>>> extracting reads mapping to HLA loci and ALT contigs (38DH_NoAlt)
[main_samview] random alignment retrieval only works for indexed BAM or CRAM files.
>>>>>>>>>>>>>> indexing extracted bam (38DH_NoAlt)
>>>>>>>>>>>>>> bamUtil fastq extraction (38DH_NoAlt)
>>>>>>>>>>>>>> bwa mem to hla panel for Kourami
[M::bwa_idx_load_from_disk] read 0 ALT contigs
[main] Version: 0.7.17-r1188
[main] CMD: /home/nguyen/anaconda3/bin/bwa mem -t 8 /home/nguyen/Exec/kourami-0.9.6/scripts/../db/All_FINAL_with_Decoy.fa.gz /home/nguyen/IGSR_HC_Data/Result/HG02082.final/kourami/HG02082.final_extract_1.fq.gz /home/nguyen/IGSR_HC_Data/Result/HG02082.final/kourami/HG02082.final_extract_2.fq.gz
[main] Real time: 0.056 sec; CPU: 0.057 sec

(base) nguyen@nguyen-ws:~$     $KOUDIR/scripts/alignAndExtract_hs38Alt.sh $out_folder/${sample_name}/kourami/$sample_name $file
>>>>>>>>>>>>>>>> extracting reads mapping to HLA loci and ALT contigs (38Alt)
samtools view: Could not read file "/home/nguyen/Exec/kourami-0.9.6/scripts/../resources/hs38dh.initial.bed": No such file or directory
[main_samview] random alignment retrieval only works for indexed BAM or CRAM files.
>>>>>>>>>>>>>> indexing extracted bam (38Alt)
>>>>>>>>>>>>>> bamUtil fastq extraction (38Alt)
rm /home/nguyen/IGSR_HC_Data/Result/HG02082.final/kourami/HG02082.final.tmp.extract.bam /home/nguyen/IGSR_HC_Data/Result/HG02082.final/kourami/HG02082.final.tmp.extract.bam.bai
>>>>>>>>>>>>>> Mapping back to GRCh38_NoALT_wHLA (38DH_NoAlt)
[M::bwa_idx_load_from_disk] read 0 ALT contigs
[main] Version: 0.7.17-r1188
[main] CMD: /home/nguyen/anaconda3/bin/bwa mem -t 8 /home/nguyen/Exec/kourami-0.9.6/scripts/../resources/hs38NoAltDH.fa /home/nguyen/IGSR_HC_Data/Result/HG02082.final/kourami/HG02082.final_tmp.extract_1.fq.gz /home/nguyen/IGSR_HC_Data/Result/HG02082.final/kourami/HG02082.final_tmp.extract_2.fq.gz
[main] Real time: 34.010 sec; CPU: 8.334 sec
>>>>>>>>>>>>>> indexing extracted bam (38DH_NoAlt)
>>>>>>>>>>>>>> bamUtil fastq extraction (38DH_NoAlt)
>>>>>>>>>>>>>> bwa mem to hla panel for Kourami
[M::bwa_idx_load_from_disk] read 0 ALT contigs
[main] Version: 0.7.17-r1188
[main] CMD: /home/nguyen/anaconda3/bin/bwa mem -t 8 /home/nguyen/Exec/kourami-0.9.6/scripts/../db/All_FINAL_with_Decoy.fa.gz /home/nguyen/IGSR_HC_Data/Result/HG02082.final/kourami/HG02082.final_extract_1.fq.gz /home/nguyen/IGSR_HC_Data/Result/HG02082.final/kourami/HG02082.final_extract_2.fq.gz
[main] Real time: 0.068 sec; CPU: 0.069 sec

(base) nguyen@nguyen-ws:~$ $KOUDIR/scripts/alignAndExtract_hs38.sh $out_folder/${sample_name}/kourami/$sample_name $file
>>>>>>>>>>>>>>>> extracting reads mapping to HLA loci and ALT contigs (38)
[main_samview] random alignment retrieval only works for indexed BAM or CRAM files.
>>>>>>>>>>>>>> indexing extracted bam (38)
>>>>>>>>>>>>>> bamUtil fastq extraction (38)
rm /home/nguyen/IGSR_HC_Data/Result/HG02082.final/kourami/HG02082.final.tmp.extract.bam /home/nguyen/IGSR_HC_Data/Result/HG02082.final/kourami/HG02082.final.tmp.extract.bam.bai
>>>>>>>>>>>>>> Mapping back to GRCh38_NoALT_wHLA (38DH_NoAlt)
[M::bwa_idx_load_from_disk] read 0 ALT contigs
[main] Version: 0.7.17-r1188
[main] CMD: /home/nguyen/anaconda3/bin/bwa mem -t 8 /home/nguyen/Exec/kourami-0.9.6/scripts/../resources/hs38NoAltDH.fa /home/nguyen/IGSR_HC_Data/Result/HG02082.final/kourami/HG02082.final_tmp.extract_1.fq.gz /home/nguyen/IGSR_HC_Data/Result/HG02082.final/kourami/HG02082.final_tmp.extract_2.fq.gz
[main] Real time: 1.898 sec; CPU: 1.899 sec
>>>>>>>>>>>>>> indexing extracted bam (38DH_NoAlt)
>>>>>>>>>>>>>> bamUtil fastq extraction (38DH_NoAlt)
>>>>>>>>>>>>>> bwa mem to hla panel for Kourami
[M::bwa_idx_load_from_disk] read 0 ALT contigs
[main] Version: 0.7.17-r1188
[main] CMD: /home/nguyen/anaconda3/bin/bwa mem -t 8 /home/nguyen/Exec/kourami-0.9.6/scripts/../db/All_FINAL_with_Decoy.fa.gz /home/nguyen/IGSR_HC_Data/Result/HG02082.final/kourami/HG02082.final_extract_1.fq.gz /home/nguyen/IGSR_HC_Data/Result/HG02082.final/kourami/HG02082.final_extract_2.fq.gz
[main] Real time: 0.059 sec; CPU: 0.061 sec

how to simulate TGS data (Pacbio) for HLA typing test

Hello,

In order to test the performance of several software tools, how can I simulate TGS (PacBio) data for HLA typing tests?

Best regards!

Software Doesn't Validate Input Parameters

If the BAM file name is mistyped, Kourami doesn't check if the file exists and commences reference graph construction. Good software design means the input parameters are validated before any processing is done, so that the error occurs early.

$ java -jar Kourami.jar -d databases/kourami/ -o test /path/to/nonExistent.bam
----------------REF GRAPH CONSTRUCTION--------------
---------------- READ LOADING --------------
htsjdk.samtools.util.RuntimeIOException: java.io.FileNotFoundException: /path/to/nonExistent.bam (No such file or directory)

Also, the reference graph construction happens every time I run the command. Why is it repeated when it could be done once and saved to disk? I'm glad software such as bowtie only builds its index once (e.g. using bowtie-build) and allows it to be reused in future analyses.
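
A fail-fast check of the kind requested could look like the sketch below. It is a hedged illustration, not Kourami's code: the class and method names are hypothetical, and it only tests that each input path is a readable regular file before any expensive work begins.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class InputCheck {
    // Return the input paths that are not readable regular files.
    static List<String> unreadableInputs(List<String> bamPaths) {
        List<String> bad = new ArrayList<>();
        for (String p : bamPaths) {
            Path path = Paths.get(p);
            if (!Files.isRegularFile(path) || !Files.isReadable(path)) {
                bad.add(p);
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        List<String> bad = unreadableInputs(Arrays.asList(args));
        if (!bad.isEmpty()) {
            System.err.println("Input BAM not found or unreadable: " + bad);
            System.exit(1);
        }
        // Only at this point would reference graph construction begin.
        System.out.println("All inputs readable.");
    }
}
```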


alignAndExtract_hs38Alt uses NoAlt as reference?

Hi, I wanted to use hs38DH as the reference, so I used the download_grch38.sh script to download hs38DH and stored the files in the 'resources' folder.

However, when I tried to use the alignAndExtract_hs38DH.sh script, it didn't work. I looked in the bash script, and that's when I realized that all the alignAndExtract scripts point to the NoAlt reference in the code:

grch38_HLA_NoAlt=$resources_dir/hs38NoAltDH.fa
grch38_HLA_NoAlt_index=$resources_dir/hs38NoAltDH.fa.bwt
Why do we still need the NoAlt fasta?
Thanks!

novel alleles

How does Kourami tag novel alleles?
What does a result such as
"DRB1*03:01:01G;DRB1*03:01:19;DRB1*03:01:22;DRB1*03:18;DRB1*03:77" mean?

Select which reference fasta to be used by Kourami

How do I select whether Kourami should use hs38NoAltDH or hs38DH?
Also, would it be possible to pass the location of the fasta as a parameter, instead of requiring it to be under the resources directory?

Instructions For Updating Database Lacking

The database provided for downloading is over 1 year old and hundreds of new alleles are available in version 3.28.0. There is no script provided to produce a database folder of files as required by the software, so it'll be hard for end-users to use any version other than 3.24.0 of IMGT-HLA.

The last three columns in new output

Hi, after updating to the current release, I noticed there are 8 columns in the output result. Can you explain what the last three columns mean? Thanks

Numerous Software Dependencies of Scripts

The preprocessing scripts have a number of software dependencies, some of which seem unnecessary. For example, bamUtil is required by the script, but the same task can be done by samtools.

samtools fastq -1 aSample_R1.fastq -2 aSample_R2.fastq aSample.bam

Some of the progress messages are also wrong. For example, in the script alignAndExtract_hs38.sh, on line 69:

echo ">>>>>>>>>>>>>>>> extracting reads mapping to HLA loci and ALT contigs (38)"

The script is for when the reads were not mapped to HLA and ALT contigs.

It'd be convenient if it was available as a Docker container to make it easier to use.

Error: Could not find or load main class FormatIMGT

Hi,

This might be a stupid question but I don't know how to solve it. I have been trying to make my own reference but when I run the formatIMGT.sh I get the following error message:
An error has occurred while formating IMGT/HLA DB.

and when looking in the log file it says:
Error: Could not find or load main class FormatIMGT

These errors don't give much clue to how to fix this issue so what can I do to get the formatIMGT to work?

Thankful for help.
Kind regards,
Jessika

Java error: Null Pointer Exception

I have an issue with one of my 12 samples, in which I get the following Java error during read loading:

java.lang.NullPointerException
        at org.jgrapht.graph.AbstractGraph.assertVertexExist(AbstractGraph.java:156)
        at org.jgrapht.graph.AbstractBaseGraph.addEdge(AbstractBaseGraph.java:193)
        at HLAGraph.addAndIncrement(HLAGraph.java:274)
        at HLAGraph.incrementWeight(HLAGraph.java:297)
        at HLAGraph.addWeight(HLAGraph.java:460)
        at HLA.processRecord(HLA.java:233)
        at HLA.loadReads(HLA.java:188)
        at HLA.main(HLA.java:567)

source NULL <<<<<<<<<<<

The other 11 samples work, and all the files were generated using the same methodology:
Bam files unmapped using Picard > Unmapped bam files converted to fastq using Picard > fastq aligned using BWA-mem.

Error: Could not find or load main class HLA

I've prepared a Docker image following the given instructions (installed java, mvn, kourami 0.9.6, bwa, samtools, bamutil). Also, the database is prepared in the db directory with download_panel.sh, together with a bwa-indexed hs38NoAltDH.fa (using download_grch38.sh). Fastq files are aligned with bwa-mem on the same reference, producing WES_human_Illumina.bam.
When I run the command on a machine with 36 CPUs and 60 GB of RAM:
/bin/bash -c "/opt/kourami-0.9.6/scripts/alignAndExtract_hs38DH.sh SRS000638 WES_human_Illumina.bam" && java -Xmx30000m -jar /opt/kourami-0.9.6/target/Kourami.jar -o SRS000638_ -d /opt/kourami-0.9.6/db SRS000638_on_KouramiPanel.bam

I got the following output:

[main_samview] region "HLA-A*01:01:01:01:" specifies an unknown reference name. Continue anyway.
[main_samview] region "HLA-A*01:01:01:02N:" specifies an unknown reference name. Continue anyway.
[main_samview] region "HLA-A*01:01:38L:" specifies an unknown reference name. Continue anyway.
[main_samview] region "HLA-A*01:02:" specifies an unknown reference name. Continue anyway.
...
[main_samview] region "HLA-DRB1*15:01:01:03:" specifies an unknown reference name. Continue anyway.
[main_samview] region "HLA-DRB1*15:01:01:04:" specifies an unknown reference name. Continue anyway.
[main_samview] region "HLA-DRB1*15:02:01:" specifies an unknown reference name. Continue anyway.
[main_samview] region "HLA-DRB1*15:03:01:01:" specifies an unknown reference name. Continue anyway.
[main_samview] region "HLA-DRB1*15:03:01:02:" specifies an unknown reference name. Continue anyway.
[main_samview] region "HLA-DRB1*16:02:01:" specifies an unknown reference name. Continue anyway.
[bam_sort_core] merging from 0 files and 8 in-memory blocks...
[M::bwa_idx_load_from_disk] read 0 ALT contigs
[M::process] read 65404 sequences (4970592 bp)...
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (0, 25735, 0, 0)
[M::mem_pestat] skip orientation FF as there are not enough pairs
[M::mem_pestat] analyzing insert size distribution for orientation FR...
[M::mem_pestat] (25, 50, 75) percentile: (137, 176, 223)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 395)
[M::mem_pestat] mean and std.dev: (183.91, 61.56)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 481)
[M::mem_pestat] skip orientation RF as there are not enough pairs
[M::mem_pestat] skip orientation RR as there are not enough pairs
[M::mem_process_seqs] Processed 65404 reads in 8.344 CPU sec, 1.065 real sec
[main] Version: 0.7.15-r1140
[main] CMD: /bin/bwa mem -t 8 /opt/kourami-0.9.6/scripts/../resources/hs38NoAltDH.fa SRS000638_tmp.extract_1.fq.gz SRS000638_tmp.extract_2.fq.gz
[main] Real time: 3.764 sec; CPU: 10.476 sec
[bam_sort_core] merging from 0 files and 8 in-memory blocks...
[bam_sort_core] merging from 0 files and 8 in-memory blocks...
[M::bwa_idx_load_from_disk] read 0 ALT contigs
[M::process] read 7074 sequences (537624 bp)...
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (0, 58, 0, 0)
[M::mem_pestat] skip orientation FF as there are not enough pairs
[M::mem_pestat] analyzing insert size distribution for orientation FR...
[M::mem_pestat] (25, 50, 75) percentile: (123, 174, 234)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 456)
[M::mem_pestat] mean and std.dev: (188.41, 76.91)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 567)
[M::mem_pestat] skip orientation RF as there are not enough pairs
[M::mem_pestat] skip orientation RR as there are not enough pairs
[M::mem_process_seqs] Processed 7074 reads in 7.572 CPU sec, 0.977 real sec
[main] Version: 0.7.15-r1140
[main] CMD: /bin/bwa mem -t 8 /opt/kourami-0.9.6/scripts/../db/All_FINAL_with_Decoy.fa.gz SRS000638_extract_1.fq.gz SRS000638_extract_2.fq.gz
[main] Real time: 1.144 sec; CPU: 7.660 sec
Error: Could not find or load main class HLA

Any help?

formatIMGT.sh NullPointerException

Hi again,

I have downloaded Alignments_Rel_3330.zip from https://github.com/ANHIG/IMGTHLA and put hla_nom_g.txt in that folder. formatIMGT.sh seems to work fine at first while creating the new reference, but then it crashes. The last part of the log is as follows:

Processing [Y] <<<<<<<<<<
nucRefAl: Y*01:01 genRefAl: Y*01:01
refGeneName on nuc and gen are same
Wrting to : /proj/uppstore2018100/kourami-0.9.6/scripts/../custom_db/3.33.0/Y_gen.txt
Wrting to : /proj/uppstore2018100/kourami-0.9.6/scripts/../custom_db/3.33.0/Y_nuc.txt
REF SEQ names differs :
(nuc):Y*01:01
(gen):Y
java.lang.NullPointerException
        at Sequence.processBlock(Sequence.java:554)
        at Sequence.<init>(Sequence.java:498)
        at MergeMSFs.mergeAndAdd(MergeMSFs.java:383)
        at MergeMSFs.mergeAndAdd(MergeMSFs.java:372)
        at MergeMSFs.merge(MergeMSFs.java:298)
        at FormatIMGT.processGene(FormatIMGT.java:199)
        at FormatIMGT.main(FormatIMGT.java:100)

Any idea what the problem might be?
Thank you in advance for the help!

Typing of non-key exons for major HLA genes

Is it possible to receive complete exon sequences, and also to get an actual 6-digit result from Kourami without any G-grouping? As I understand from the corresponding article, Kourami uses all exon sequences available in IMGT-HLA, so why are only key exons included in the output?

typing additional loci

Hi,
When I use the -a option, it only types A, B, C, DQA1, DRB1. If I specify -a MICA, for instance, I get the message "Input bam : A DOES NOT exist. Please check the bam exists." I know the MICA reads are present in the KouramiPanel.bam. So what is the problem?
Also, is there an explanation of the numbers in the results file?

Thanks

Is this code still being maintained?

Hello,

I wanted to start using Kourami in one of my projects, and I am wondering if it is still being maintained.

Thanks,
Rashesh

IndexOutOfBoundsException in HLA.main

java -jar ~/tools/kourami/target/Kourami.jar -o test -d ~/tools/kourami/db sample2.bam

(where sample2 comprises reads mapped to the reference in db) results in the error:

java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
        at java.util.ArrayList.rangeCheck(ArrayList.java:653)
        at java.util.ArrayList.get(ArrayList.java:429)
        at HLAGraph.processBubbles(HLAGraph.java:929)
        at HLAGraph.countBubblesAndMerge(HLAGraph.java:869)
        at HLA.countBubblesAndMerge(HLA.java:372)
        at HLA.main(HLA.java:564)

The output contains a stretch of lines reading 'This should NOT HAPPEN', which makes me think that something just might possibly not be happening correctly.

Note that the BAM file was built by filtering chr6 and unmapped reads from a GRCh38 alignment, then realigning resulting fastq files to the db reference.

Output:
test.txt

BAM file statistics:
flagstats:
flagstat.txt

stats:
stats.txt

Error: Could not find or load main class HLA

I followed the instructions step by step, and mvn install completed successfully.

Why does the error message still show up when I run

java -jar target/Kourami.jar
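
One way to narrow this down is to check which class the jar's manifest names as its entry point, and whether that class is actually packaged in the jar. The following is only a minimal sketch: the helper name `main_class_from_manifest` is made up for illustration, and the jar path is the one quoted in this report.

```shell
#!/bin/sh
# Print the Main-Class value from JAR manifest text supplied on stdin.
# Manifest lines may end in CRLF, so carriage returns are stripped first.
main_class_from_manifest() {
    tr -d '\r' | sed -n 's/^Main-Class:[[:space:]]*//p'
}

# Possible usage against the jar produced by the build (paths are assumptions):
#   unzip -p target/Kourami.jar META-INF/MANIFEST.MF | main_class_from_manifest
#   unzip -l target/Kourami.jar | grep 'HLA.class'
```

If the manifest names a class that the second command does not list, the jar was probably packaged without its compiled classes, and rebuilding from a clean checkout may be worth trying.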

Need details for pre-processing

An error indicates:

The input BAM MUST be aligned to the set of IMGT/HLA alleles in /gne/home/matthejb/tools/kourami/db
Please use the recommended preprocessing steps explained on the github page:
https://github.com/Kingsford-Group/kourami

but there does not seem to be such an explanation. I believe I have figured this out, but explicit instructions would be welcome.
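
Until explicit instructions are available, one way to check whether a BAM was aligned against the panel in the db directory is to compare the sequence names in the BAM header with those in the panel FASTA. This is a rough sketch, not the project's own check: the helper names are hypothetical, and the FASTA file name is taken from another report on this page (All_FINAL_with_Decoy.fa.gz).

```shell
#!/bin/sh
# Print the reference sequence names (SN fields) from SAM header text on stdin.
sq_names_from_header() {
    awk -F '\t' '$1 == "@SQ" {
        for (i = 2; i <= NF; i++)
            if ($i ~ /^SN:/) print substr($i, 4)
    }'
}

# Print the sequence names from FASTA text on stdin (first word after ">").
fasta_names() {
    sed -n 's/^>\([^[:space:]]*\).*/\1/p'
}

# Possible usage (file names are assumptions):
#   samtools view -H sample.bam | sq_names_from_header | sort > bam.names
#   gzip -dc db/All_FINAL_with_Decoy.fa.gz | fasta_names | sort > db.names
#   comm -23 bam.names db.names   # names present in the BAM but not in the panel
```

An empty result from the last command would suggest that every reference in the BAM header comes from the panel. A long list would suggest the BAM is still aligned to a genome reference.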

java.lang.IndexOutOfBoundsException

I am running Kourami on several samples and ran into the following error with just one sample:

java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
	at java.util.ArrayList.rangeCheck(ArrayList.java:659)
	at java.util.ArrayList.get(ArrayList.java:435)
	at HLAGraph.pathAlign(HLAGraph.java:1296)
	at HLAGraph.pathAlign(HLAGraph.java:1249)
	at HLAGraph.processBubbles(HLAGraph.java:1043)
	at HLAGraph.countBubblesAndMerge(HLAGraph.java:890)
	at HLA.countBubblesAndMerge(HLA.java:357)
	at HLA.main(HLA.java:585)

This is the only sample out of 39 with this issue. Here is my script:

bwa mem -t 2 kourami/db/All_FINAL_with_Decoy.fa.gz sample.hla.1.fastq sample.hla.2.fastq | samtools view -Sb - > mysample.kourami.bam
java -jar /opt/kourami-0.9.6/target/Kourami.jar -d kourami/db/ -o mysample mysample.kourami.bam

Can I use Kourami on other species?

Hi,
I am thinking about assembling MHC-I of a non-human species using Kourami. I prepared the input BAM file and the reference directory following the guidelines for human HLA. Here is the error message I got:

java -jar $script/Kourami.jar -d $ref -o KouramiPred Kourami.bam

----------------REF GRAPH CONSTRUCTION--------------
java.lang.NullPointerException
at MergeMSFs.merge(MergeMSFs.java:264)
at HLA.loadGraphs(HLA.java:82)
at HLA.<init>(HLA.java:62)
at HLA.main(HLA.java:556)

Do you have any idea what might be the reason for the error? And is it doable for other species?

Thanks,
Yuan

java.lang.NullPointerException during Bubble Processing

Hi,

I was running Kourami on dozens of samples, and for a few of them I get this Java error:

Bubble Processing and Path Assembly for:        A
Bubble Processing and Path Assembly for:        B
Bubble Processing and Path Assembly for:        C
Bubble Processing and Path Assembly for:        DQA1
Bubble Processing and Path Assembly for:        DQB1
java.lang.NullPointerException
        at Bubble.removeUnsupported(Bubble.java:842)
        at Bubble.<init>(Bubble.java:379)
        at Bubble.<init>(Bubble.java:384)
        at HLAGraph.countBubbles(HLAGraph.java:2050)
        at HLAGraph.countBubblesAndMerge(HLAGraph.java:900)
        at HLA.countBubblesAndMerge(HLA.java:357)
        at HLA.main(HLA.java:585)

In this case it happened during the Bubble Processing of DQB1 but it sometimes happens for other genes.

I should specify that I slightly changed the preprocessing scripts to work on the hg19 reference and to include unmapped reads.
Any idea what might cause that?
Thank you for your help and amazing work on Kourami!

UPDATING error probabilities of each edge

I used Kourami with IMGTHLA version 3.49.0 (the latest update from https://github.com/ANHIG/IMGTHLA/). I ran it with four options for the HGV flavor: hs38, hs38D, hs38DH, and hs38NoAltDH.

However, all outputs are empty and the logs are identical:

 -d custom_db/v3.48.0-alpha-10-g50e92c6/3.49.0 -a -o output/tesths38 extracted_bam_files/VN_01_00_0016_01_01_on_KouramiPanel.bam
----------------REF GRAPH CONSTRUCTION--------------
Merging HLA sequences and building HLA graphs
processing HLA gene:	A
Traversing (7560)
DONE Traversing
processing HLA gene:	B
Traversing (8999)
DONE Traversing
processing HLA gene:	C
Traversing (7512)
DONE Traversing
processing HLA gene:	DQA1
Traversing (483)
DONE Traversing
processing HLA gene:	DQB1
Traversing (2278)
DONE Traversing
processing HLA gene:	DRB1
Traversing (3298)
DONE Traversing
processing HLA gene:	DOA
Traversing (15)
DONE Traversing
processing HLA gene:	DMA
Traversing (58)
DONE Traversing
processing HLA gene:	DMB
Traversing (71)
DONE Traversing
processing HLA gene:	DPA1
Traversing (455)
DONE Traversing
processing HLA gene:	DPB1
Traversing (2067)
DONE Traversing
processing HLA gene:	DRA
Traversing (32)
DONE Traversing
processing HLA gene:	DRB3
Traversing (436)
DONE Traversing
processing HLA gene:	DRB5
Traversing (181)
DONE Traversing
processing HLA gene:	F
Traversing (56)
DONE Traversing
processing HLA gene:	G
Traversing (104)
DONE Traversing
processing HLA gene:	H
Traversing (67)
DONE Traversing
processing HLA gene:	J
Traversing (32)
DONE Traversing
processing HLA gene:	L
Traversing (5)
DONE Traversing
Done building	19	graphs.
----------------     READ LOADING     --------------
Loading reads from:	VN_01_00_0016_01_01_on_KouramiPanel.bam
Loaded a total of 0 mapped reads.
A total of 0 bases
----------------    GRAPH CLEANING    --------------
A	>>>>> FLATTENED InsertionBubble:	0
B	>>>>> FLATTENED InsertionBubble:	0
C	>>>>> FLATTENED InsertionBubble:	0
DQA1	>>>>> FLATTENED InsertionBubble:	0
DQB1	>>>>> FLATTENED InsertionBubble:	0
DRB1	>>>>> FLATTENED InsertionBubble:	0
DOA	>>>>> FLATTENED InsertionBubble:	0
DMA	>>>>> FLATTENED InsertionBubble:	0
DMB	>>>>> FLATTENED InsertionBubble:	0
DPA1	>>>>> FLATTENED InsertionBubble:	0
DPB1	>>>>> FLATTENED InsertionBubble:	0
DRA	>>>>> FLATTENED InsertionBubble:	0
DRB3	>>>>> FLATTENED InsertionBubble:	0
DRB5	>>>>> FLATTENED InsertionBubble:	0
F	>>>>> FLATTENED InsertionBubble:	0
G	>>>>> FLATTENED InsertionBubble:	0
H	>>>>> FLATTENED InsertionBubble:	0
J	>>>>> FLATTENED InsertionBubble:	0
L	>>>>> FLATTENED InsertionBubble:	0
A	:removed	11634	Edges.
A	:removed	8124	Vertices.
B	:removed	12035	Edges.
B	:removed	8626	Vertices.
C	:removed	11945	Edges.
C	:removed	8596	Vertices.
DQA1	:removed	9912	Edges.
DQA1	:removed	8564	Vertices.
DQB1	:removed	14917	Edges.
DQB1	:removed	12440	Vertices.
DRB1	:removed	35249	Edges.
DRB1	:removed	30950	Vertices.
DOA	:removed	3714	Edges.
DOA	:removed	3682	Vertices.
DMA	:removed	5148	Edges.
DMA	:removed	5081	Vertices.
DMB	:removed	6905	Edges.
DMB	:removed	6823	Vertices.
DPA1	:removed	11529	Edges.
DPA1	:removed	10752	Vertices.
DPB1	:removed	14785	Edges.
DPB1	:removed	13300	Vertices.
DRA	:removed	5928	Edges.
DRA	:removed	5839	Vertices.
DRB3	:removed	15805	Edges.
DRB3	:removed	15172	Vertices.
DRB5	:removed	14126	Edges.
DRB5	:removed	13887	Vertices.
F	:removed	3714	Edges.
F	:removed	3647	Vertices.
G	:removed	3410	Edges.
G	:removed	3280	Vertices.
H	:removed	3925	Edges.
H	:removed	3741	Vertices.
J	:removed	3770	Edges.
J	:removed	3674	Vertices.
L	:removed	3931	Edges.
L	:removed	3887	Vertices.
A	:removed	[DE]:1	[UN]:1	[NumVertices]:2
B	:removed	[DE]:1	[UN]:1	[NumVertices]:2
C	:removed	[DE]:1	[UN]:1	[NumVertices]:2
DQA1	:removed	[DE]:1	[UN]:1	[NumVertices]:2
DQB1	:removed	[DE]:2	[UN]:1	[NumVertices]:3
DRB1	:removed	[DE]:2	[UN]:2	[NumVertices]:4
DOA	:removed	[DE]:1	[UN]:1	[NumVertices]:2
DMA	:removed	[DE]:1	[UN]:1	[NumVertices]:2
DMB	:removed	[DE]:1	[UN]:1	[NumVertices]:2
DPA1	:removed	[DE]:1	[UN]:2	[NumVertices]:3
DPB1	:removed	[DE]:1	[UN]:1	[NumVertices]:2
DRA	:removed	[DE]:1	[UN]:1	[NumVertices]:2
DRB3	:removed	[DE]:2	[UN]:2	[NumVertices]:4
DRB5	:removed	[DE]:1	[UN]:1	[NumVertices]:2
F	:removed	[DE]:1	[UN]:1	[NumVertices]:2
G	:removed	[DE]:1	[UN]:2	[NumVertices]:3
H	:removed	[DE]:1	[UN]:1	[NumVertices]:2
J	:removed	[DE]:1	[UN]:1	[NumVertices]:2
L	:removed	[DE]:1	[UN]:1	[NumVertices]:2
------------ UPDATING error probabilities of each edge ---------
------------     DONE UPDATING error probabilities     ---------
=========================
=  A
=========================
Disconnected Graph. Probably due to not enough coverage to fully assemble or check for any biases in sequencing libraries used.
CANNOT PROCEED for HLA gene:	A
=========================
=  B
=========================
Disconnected Graph. Probably due to not enough coverage to fully assemble or check for any biases in sequencing libraries used.
CANNOT PROCEED for HLA gene:	B
=========================
=  C
=========================
Disconnected Graph. Probably due to not enough coverage to fully assemble or check for any biases in sequencing libraries used.
CANNOT PROCEED for HLA gene:	C
=========================
=  DQA1
=========================
Disconnected Graph. Probably due to not enough coverage to fully assemble or check for any biases in sequencing libraries used.
CANNOT PROCEED for HLA gene:	DQA1
=========================
=  DQB1
=========================
Disconnected Graph. Probably due to not enough coverage to fully assemble or check for any biases in sequencing libraries used.
CANNOT PROCEED for HLA gene:	DQB1
=========================
=  DRB1
=========================
Disconnected Graph. Probably due to not enough coverage to fully assemble or check for any biases in sequencing libraries used.
CANNOT PROCEED for HLA gene:	DRB1
=========================
=  DOA
=========================
Disconnected Graph. Probably due to not enough coverage to fully assemble or check for any biases in sequencing libraries used.
CANNOT PROCEED for HLA gene:	DOA
=========================
=  DMA
=========================
Disconnected Graph. Probably due to not enough coverage to fully assemble or check for any biases in sequencing libraries used.
CANNOT PROCEED for HLA gene:	DMA
=========================
=  DMB
=========================
Disconnected Graph. Probably due to not enough coverage to fully assemble or check for any biases in sequencing libraries used.
CANNOT PROCEED for HLA gene:	DMB
=========================
=  DPA1
=========================
Disconnected Graph. Probably due to not enough coverage to fully assemble or check for any biases in sequencing libraries used.
CANNOT PROCEED for HLA gene:	DPA1
=========================
=  DPB1
=========================
Disconnected Graph. Probably due to not enough coverage to fully assemble or check for any biases in sequencing libraries used.
CANNOT PROCEED for HLA gene:	DPB1
=========================
=  DRA
=========================
Disconnected Graph. Probably due to not enough coverage to fully assemble or check for any biases in sequencing libraries used.
CANNOT PROCEED for HLA gene:	DRA
=========================
=  DRB3
=========================
Disconnected Graph. Probably due to not enough coverage to fully assemble or check for any biases in sequencing libraries used.
CANNOT PROCEED for HLA gene:	DRB3
=========================
=  DRB5
=========================
Disconnected Graph. Probably due to not enough coverage to fully assemble or check for any biases in sequencing libraries used.
CANNOT PROCEED for HLA gene:	DRB5
=========================
=  F
=========================
Disconnected Graph. Probably due to not enough coverage to fully assemble or check for any biases in sequencing libraries used.
CANNOT PROCEED for HLA gene:	F
=========================
=  G
=========================
Disconnected Graph. Probably due to not enough coverage to fully assemble or check for any biases in sequencing libraries used.
CANNOT PROCEED for HLA gene:	G
=========================
=  H
=========================
Disconnected Graph. Probably due to not enough coverage to fully assemble or check for any biases in sequencing libraries used.
CANNOT PROCEED for HLA gene:	H
=========================
=  J
=========================
Disconnected Graph. Probably due to not enough coverage to fully assemble or check for any biases in sequencing libraries used.
CANNOT PROCEED for HLA gene:	J
=========================
=  L
=========================
Disconnected Graph. Probably due to not enough coverage to fully assemble or check for any biases in sequencing libraries used.
CANNOT PROCEED for HLA gene:	L

Please help me to solve it. Thank you very much.
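
The log above contains the line "Loaded a total of 0 mapped reads.", so Kourami apparently received no usable alignments, which would be consistent with every gene ending in "Disconnected Graph". A small sketch for catching this early across many samples is shown below. The function names are hypothetical, and the pattern only matches the exact wording seen in this log.

```shell
#!/bin/sh
# Extract the mapped-read count that Kourami reports in its log (read from stdin).
loaded_reads_from_log() {
    sed -n 's/^Loaded a total of \([0-9][0-9]*\) mapped reads\.$/\1/p' | head -n 1
}

# Succeed only if the log on stdin reports at least one mapped read.
log_has_reads() {
    n=$(loaded_reads_from_log)
    [ -n "$n" ] && [ "$n" -gt 0 ]
}

# Possible usage (file names are assumptions):
#   log_has_reads < sample.kourami.log || echo "sample: no mapped reads loaded"
# The input BAM can also be checked directly; this counts alignment records
# that are not flagged as unmapped:
#   samtools view -c -F 4 sample_on_KouramiPanel.bam
```

If the samtools count is also zero, the problem probably lies in the read extraction or realignment step rather than in the assembly itself.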

Combined bottleneck weights cutoff

Hello,

I was wondering if you have any suggestions on what cutoff to use for the bottleneck weights. Most of the values from my data fall within the range of 15-40. But I also have some at 147 and 2.

I am running it on WGS samples sequenced on NovaSeq with mean coverages ranging from 15x-30x.

Best Regards,
Rashesh

Running Kourami with non human MHC

Is there a chance to run Kourami with my own list of non human MHC alleles?

Kourami stalls upon Bubble Processing and Path Assembly for DPB1

Hi!

I've encountered a problem using Kourami v0.9.6 with IMGTHLA v3.42.0 (preprocessed as per the instructions): a single human 30x WGS BAM file with 100 bp reads causes Kourami to stall at Bubble Processing and Path Assembly for DPB1. Kourami successfully processes all previous loci (A, B, C, DQ, DR, etc.). While stalled, Kourami still uses computational resources and its processes remain active. It has been stalled for 12 hours already and still does not proceed to the next step.
The issue does not occur with any other similar BAM files, which are processed successfully within 5 minutes.
I can provide the on_KouramiPanel.bam file if necessary.

@ckingsford Could you please tell me whether Kourami is still supported? It seems like all of the latest issues do not have any impact. Thank you.