Comments (19)
So the first error you see in chr1.alignments.log is normal, as genipe is trying to fix the strand misalignment between your dataset and the reference panel (as per issue #50 (comment)).
For the phasing issue, I'm guessing it's a memory issue, as the graphs are not building... What you can do is execute shapeit manually and see what happens. To do so, just use the shapeit binary with the options used for the analysis (second line of the file chr1.final.phased.log).
It should look something like this. Make sure you are in the same directory as the one from which you ran genipe.
"$PATH_TO_BINARY"/shapeit \
--thread 1 \
-B genipe/chr1/chr1.final \
-M "$PATH_TO_GENETIC_MAP"/genetic_map_chr1_combined_b37.txt \
-O genipe/chr1/chr1.final.phased \
-L genipe/chr1/chr1.final.phased.log
from genipe.
Please have a look at SHAPEIT's log file for chromosome 1, since this looks like a SHAPEIT issue.
from genipe.
In the files genipe_tutorial/genipe/chr1/chr1.alignments.log and genipe_tutorial/genipe/chr1/chr1.to_exclude.alignments.log I found the following lines at the end:
Reading SNPs in [.../genipe_tutorial/1000GP_Phase3/1000GP_Phase3_chr1.legend.gz]
* 149343 reference panel sites included
* 6351015 reference panel sites excluded
ERROR: Reference and Main panels are not well aligned:
* #Missing sites in reference panel = 443
* #Misaligned sites between panels = 56
* #Multiple alignments between panels = 0
The other log files genipe_tutorial/genipe/chr1/*.log seem to contain no remarkable entries.
Btw., I also got the error on a virtual machine with only 4 cores, so it seems to be independent of the number of available cores.
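To compare these counts across chromosomes or runs, the summary lines quoted above can be pulled out of the log text. The following is only a minimal sketch: the function name `parse_alignment_summary` is hypothetical, and the line format is assumed from the excerpt in this comment, not from SHAPEIT's documentation.

```python
import re

# Matches summary lines such as
# "* #Missing sites in reference panel = 443"
# (format assumed from the log excerpt quoted in this thread).
SUMMARY_RE = re.compile(r"#(?P<label>[^=]+?)\s*=\s*(?P<count>\d+)")


def parse_alignment_summary(log_text):
    """Return a dict mapping each summary label to its integer count."""
    counts = {}
    for line in log_text.splitlines():
        match = SUMMARY_RE.search(line)
        if match:
            counts[match.group("label").strip()] = int(match.group("count"))
    return counts


example = """\
ERROR: Reference and Main panels are not well aligned:
* #Missing sites in reference panel = 443
* #Misaligned sites between panels = 56
* #Multiple alignments between panels = 0
"""
summary = parse_alignment_summary(example)
print(summary)
```

The pattern ignores the leading bullet character, so it should also accept the `-` variant of these lines.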
from genipe.
This is a SHAPEIT issue. I manually tried SHAPEIT with 40 threads.
./bin/shapeit \
--thread 40 \
-B genipe/chr1/chr1.final \
-M /home/lemieuxl/genipe_tutorial/1000GP_Phase3/genetic_map_chr1_combined_b37.txt \
-O genipe/chr1/chr1.final.phased \
-L genipe/chr1/chr1.final.phased.log
I got the following error in the console.
shapeit: src/modes/phaser/phaser_algorithm.cpp:150: void phaser::phaseSegment(int): Assertion `conditional_index[segment].size() >= 2' failed.
Aborted (core dumped)
There is no usable information in the log file (chr1.final.phased.log) because of the failure.
According to SHAPEIT's documentation on the --thread option:
This option is recommended only if you have a large number of individuals in your dataset.
Note that the dataset used in the tutorial only has 90 samples.
from genipe.
I have now used genipe with a real dataset (2562 instances, about 840 thousand SNPs) and got the error lines (with other values) with 20 threads as well, and even without the options --shapeit-thread and --thread at all. In that case, however (with the value 20; I didn't get a result without that option because of the long running time), genipe didn't stop when the error was written to the log file but seemed to produce some results. After it ended (after some hours for chromosome 1 alone), I got error messages like the following:
[... WARNING] impute2_chr1_135000001_140000000: there are no SNPs in the imputation interval
[... ERROR] Task 'IMPUTE2 chr1 from 1 to 5000000': did not finish...
[... ERROR] Task 'IMPUTE2 chr1 from 5000001 to 10000000': did not finish...
[... INFO] Task 'IMPUTE2 chr1 from 10000001 to 15000000': performed in 16,726 seconds
[... ERROR] Task 'IMPUTE2 chr1 from 15000001 to 20000000': did not finish...
...
[... INFO] Task 'IMPUTE2 chr1 from 245000001 to 250000000': performed in 7,475 seconds
[... ERROR] the following task did not work: ['IMPUTE2 chr1 from 1 to 5000000', 'IMPUTE2 chr1 from 5000001 to 10000000', ...]
usage: genipe-launcher [-h] [-v] [--debug] [--thread THREAD] --bfile PREFIX
...
Please notice the INFO messages among the ERROR messages.
Could genipe be using an unfavorable parametrization for IMPUTE2? Could there be two independent problems (one that terminated genipe and one that produced the error messages in the log file)?
Does anyone know of a workaround that would let me use genipe with our real dataset?
from genipe.
genipe executes tasks in parallel (according to the --thread option). It will wait to run all the tasks of a specific step (in your case, imputation) before it stops because of an error. This explains the error messages among the information messages. If you rerun genipe, it will only redo the failed or incomplete tasks.
To investigate why IMPUTE2 failed, you need to have a look at the corresponding log file. Could it be a memory issue?
from genipe.
You said that in a rerun genipe will redo failed tasks. That is a very good general workaround (maybe you can add that to the documentation): "If errors were output, just try to rerun genipe." I thought that every rerun would give the same errors, but that seems not to be the case (see below).
You asked if there could be a memory issue (very good idea), and the answer is: yes. Directly after starting, most of the impute2 processes use less than one GB of memory. But after some time they use about 12 GB of memory each, and the system does not have 240 GB (20 threads times 12 GB) of memory. Therefore I will try to run it with a smaller number of threads (just started it).
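The arithmetic behind that estimate can be written down as a small helper. This is only a sketch: `max_parallel_processes` is a hypothetical name, the 12 GB per process is the figure observed above, and the 64 GB machine in the example is an assumption, since the actual amount of RAM is not stated in this thread.

```python
def max_parallel_processes(total_memory_gb, memory_per_process_gb, requested):
    """Cap the requested process count so that peak memory fits in RAM."""
    if memory_per_process_gb <= 0:
        raise ValueError("memory_per_process_gb must be positive")
    affordable = int(total_memory_gb // memory_per_process_gb)
    # Always allow at least one process, even if it may not fit.
    return max(1, min(requested, affordable))


# Figures from the comment above: 20 processes at about 12 GB each.
peak_gb = 20 * 12
print(peak_gb)  # 240

# Hypothetical machine with 64 GB of RAM (assumption, not from the thread).
print(max_parallel_processes(64, 12, requested=20))  # 5
```

The result would be the value to pass to --thread for the imputation step, leaving some headroom for the operating system.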
You pointed me to the log files, but they contain no special messages apart from those I cited above. However, for every finished impute2 task there are 5 files (*.impute2, *.impute2_info, *.impute2_info_by_sample, *.impute2_warnings, *.impute2_summary), and I noticed that none of these files exist for the tasks that reported an error (the tasks that did not finish).
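This observation suggests a quick way to list unfinished segments: check whether all five files exist for each segment. Below is a minimal sketch under the assumption that each segment's files share a common path prefix; the helper names and any concrete prefix are hypothetical and would need to be adapted to the actual output directory.

```python
from pathlib import Path

# The five output suffixes listed above for a finished IMPUTE2 task.
EXPECTED_SUFFIXES = (
    ".impute2",
    ".impute2_info",
    ".impute2_info_by_sample",
    ".impute2_warnings",
    ".impute2_summary",
)


def missing_outputs(prefix):
    """Return the expected suffixes for which prefix + suffix does not exist."""
    return [s for s in EXPECTED_SUFFIXES if not Path(str(prefix) + s).exists()]


def unfinished_segments(prefixes):
    """Return the prefixes that lack at least one expected output file."""
    return [p for p in prefixes if missing_outputs(p)]
```

Calling `unfinished_segments` with the list of segment prefixes returns those that still need to be rerun, in the same order as the input.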
Now I understand that there were three problems:
1. The main problem for the processing of our real dataset seems to be too little memory for the specified number of impute2 threads/processes. With a smaller number of impute2 threads it should work. Maybe you can add a warning to the documentation about choosing too large a number of threads relative to the available memory.
2. The other problem, which results in the error messages in the log files cited above, comes from shapeit. I get that for the test dataset even with --shapeit-thread 1. But I don't know if it is critical. At least it does not stop genipe.
3. And there is another serious problem with shapeit. When the value for --shapeit-thread is too large (probably with respect to the number of samples in the dataset), shapeit and genipe stop working (in a rerun genipe stops after a few seconds).
Suggestion: Add something like the following to the description of the option --thread: "That is the number of impute2 processes which are started in parallel."
from genipe.
I was successful in running genipe for our real dataset (the test was done only for chromosome 1). The problem was that the number of impute2 processes was too large for the available memory (problem 1 in the preceding comment).
Problem 2 in the preceding comment is still unsolved (if it is really a problem). Should I open a new issue for that?
from genipe.
I have the same issue:
genipe-launcher: error: the following task did not work: ['SHAPEIT phase chr1']
My error message looks like this:
ERROR: Reference and Main panels are not well aligned:
- #Missing sites in reference panel = 2994
- #Misaligned sites between panels = 525
- #Multiple alignments between panels = 0
Have you solved this issue?
Thank you.
from genipe.
Does this error message come from chr1.alignments*.log or chr1.final.phased.log?
If it comes from the former, this was referenced in the issue #50 (comment).
To get the log information for the phasing step of chromosome 1, make sure to look at the content of the file chr1.final.phased.log.
from genipe.
The error message
ERROR: Reference and Main panels are not well aligned:
- #Missing sites in reference panel = 2994
- #Misaligned sites between panels = 525
- #Multiple alignments between panels = 0
comes from chr1.alignments*.log.
I checked chr1.final.phased.log, and it shows the following:
Parameters :
- Seed : 1574885865
- Parallelisation: 1 threads
- Ref allele is NOT aligned on the reference genome
- MCMC: 35 iterations [7 B + 1 runs of 8 P + 20 M]
- Model: 100 states per window [100 H + 0 PM + 0 R + 0 COV ] / Windows of ~2.0 Mb / Ne = 15000
Reading site list in [genipe/chr1/chr1.final.bim]
- 72809 sites included
Reading sample list in [genipe/chr1/chr1.final.fam]
- 4321 samples included
- 4321 unrelateds / 0 duos / 0 trios in 4321 different families
Reading genotypes in [genipe/chr1/chr1.final.bed]
- Plink binary file SNP-major mode
Reading genetic map in [/data1/home/jungj7/genipe_tutorial/1000GP_Phase3/genetic_map_chr1_combined_b37.txt]
- 248796 genetic positions found
- #set=59644 / #interpolated=13165
- Physical map [0.09 Mb -> 249.21 Mb] / Genetic map [0.00 cM -> 293.39 cM]
Checking missingness and MAF...
- 0 individuals with high rates of missing data (>5%)
- 0 SNPs with high rates of missing data (>5%)
- 2637 monomorphic SNPs
- 10874 missing genotypes automatically imputed at monomorphic SNPs
- 494 singletons SNPs
Building graphs ...
Thank you.
from genipe.
I ran shapeit manually, and it seems to be working now.
It is currently building graphs: [859/4321].
Do I have to phase each chromosome manually?
Thank you.
from genipe.
I attached the current state of my shapeit run.
Why does it take so long to phase chr1 alone?
Is there any other way to phase each chromosome?
from genipe.
The phasing should be done by genipe. I asked you to run it manually for debugging purposes. Now we know that shapeit can phase chromosome 1.
Using genipe, how many chromosomes were you phasing at the same time? This would be the --thread option. If this value is too high and your computer doesn't have enough memory, it could explain why the task failed.
Also, rerunning genipe will redo the failed tasks and continue where it left off.
from genipe.
I used --thread 1.
I'm not sure why my computer cannot run it with only 1 thread. Do I need to use --shapeit-thread?
from genipe.
Use 1 in both cases, and rerun genipe, see what it does.
from genipe.
I used 1 in both cases, but genipe is still running.
Do I have to use shapeit? My goal is to use SKAT from the final imputed data set in genipe.
from genipe.
It looks like it's running now. I don't know how many threads you were using at first, but I'm pretty sure it was a memory issue... How much memory does your computer have?
If you already have imputed data (from IMPUTE2), you can use SKAT directly on those. Otherwise, you need to let genipe finish the imputation process.
from genipe.
I installed genipe on a CentOS system with 20 physical cores and 40 logical cores. I tested genipe as described in the documentation on http://pgxcentre.github.io/genipe/installation.html and executed genipe-tutorial. I added the option --shapeit-thread to the generated script and was able to execute it. But after I increased the value of that parameter, I got an error without an informative message about the reason.
In detail I did the following:
- Installed genipe and tested the installation
- Executed the following commands
cd
wget http://statgen.org/wp-content/uploads/Softwares/genipe/supp_files/hg19.tar.bz2
wget https://mathgen.stats.ox.ac.uk/impute/1000GP_Phase3.tgz
mkdir $HOME/genipe_tutorial
mkdir $HOME/genipe_tutorial/hg19
cd $HOME/genipe_tutorial/hg19
tar -jxf $HOME/hg19.tar.bz2
cd $HOME/genipe_tutorial
tar -zxf $HOME/1000GP_Phase3.tgz
touch 1000GP_Phase3/genipe_tut_done
cd
source genipe_pyvenv/bin/activate
genipe-tutorial
deactivate
In genipe_tutorial/execute.sh I made the following changes:
- Replaced --chrom autosomes by --chrom 1 (to impute only the SNPs on the first chromosome for a test)
- After the line with --thread I added a line with --shapeit-thread 20 \
Then I started the imputation:
source genipe_pyvenv/bin/activate
genipe_tutorial/execute.sh
deactivate
The imputation was successful.
Then I changed the value of --shapeit-thread in genipe_tutorial/execute.sh from 20 to 40, removed the generated directory and started the imputation again:
rm -r genipe_tutorial/genipe/
source genipe_pyvenv/bin/activate
genipe_tutorial/execute.sh
I got the following messages:
[... INFO] Phasing markers
[... ERROR] Task 'SHAPEIT phase chr1': did not finish...
[... ERROR] the following task did not work: ['SHAPEIT phase chr1']
usage: genipe-launcher [-h] [-v] [--debug] [--thread THREAD] --bfile PREFIX
                       [--reference FILE] [--chrom CHROM [CHROM ...]]
                       [--output-dir DIR] [--bgzip] [--use-drmaa]
                       [--drmaa-config FILE] [--preamble FILE]
                       [--shapeit-bin BINARY] [--shapeit-thread INT]
                       [--shapeit-extra OPTIONS] [--plink-bin BINARY]
                       [--hap-template TEMPLATE] [--legend-template TEMPLATE]
                       [--map-template TEMPLATE] --sample-file FILE
                       [--hap-nonPAR FILE] [--hap-PAR1 FILE] [--hap-PAR2 FILE]
                       [--legend-nonPAR FILE] [--legend-PAR1 FILE]
                       [--legend-PAR2 FILE] [--map-nonPAR FILE]
                       [--map-PAR1 FILE] [--map-PAR2 FILE]
                       [--impute2-bin BINARY] [--segment-length BP]
                       [--filtering-rules RULE [RULE ...]]
                       [--impute2-extra OPTIONS] [--probability FLOAT]
                       [--completion FLOAT] [--info FLOAT]
                       [--report-number NB] [--report-title TITLE]
                       [--report-author AUTHOR]
                       [--report-background BACKGROUND]
genipe-launcher: error: the following task did not work: ['SHAPEIT phase chr1']
When I ran shapeit on a VCF file consisting of 30 peanut samples, some chromosomes ran successfully while others did not. The error message was "Assertion `conditional_index[segment].size() >= 2' failed.".
I solved this problem in three steps after reading through messages on the mailing list https://www.jiscmail.ac.uk/cgi-bin/wa-jisc.exe?A2=ind1503&L=OXSTATGEN&P=R10095. I found this list via the contact section of the shapeit page ("To ask a question about SHAPEIT please subscribe to the OXSTATGEN mailing list"): https://mathgen.stats.ox.ac.uk/genetics_software/shapeit/shapeit.html#Contact.
- Included more SNPs in the VCF file. At first, I filtered my raw VCF file using --max-missing 1. Because one sample had a missing rate above 38%, the number of SNPs shrank to half of that in the raw VCF. I therefore removed this high-missingness sample and used --max-missing 0.95, which increased the number of SNPs.
- Set -T 1 in shapeit, as I only had 30 samples in the VCF files.
- Set --window 10 in shapeit. I guessed that the SNP density of my peanut samples was lower than that of human data, so I increased the window from the default value of 2 to 5, and then to 10. The value 10 finally succeeded.
from genipe.