shapeit4's Introduction

Segmented HAPlotype Estimation and Imputation Tools version 4 (SHAPEIT4)

VERSION 5

Warning: Stop using version 4. Use version 5 instead, available from there:

Introduction

SHAPEIT4 is a fast and accurate method for estimation of haplotypes (aka phasing) for SNP array and sequencing data. Version 4 is a refactored and improved version of the SHAPEIT algorithm with several key additional features:

  • It includes a Positional Burrows-Wheeler Transform (PBWT) based approach to quickly select a small set of informative conditioning haplotypes to be used when updating the phase of an individual.
  • We have changed the way in which phase information in sequencing reads is input into the model. We now recommend the use of the WhatsHap tool as a pre-processing step to extract phase information from a BAM file.
  • It accounts for sets of pre-phased genotypes (i.e. haplotype scaffold). The scaffold can be derived either from family data or large reference panels.
  • It reads and writes files using HTSlib for better I/O performance in either VCF or BCF formats.
  • The genotype graph and HMM routines have been re-implemented for better hardware usage and performance.
  • The source code is provided in an open source format (MIT license) on GitHub.

If you use SHAPEIT4 in your research work, please cite the following paper:

Delaneau O., et al. Accurate, scalable and integrative haplotype estimation. Nature Communications volume 10, Article number: 5436 (2019). https://www.nature.com/articles/s41467-019-13225-y

Documentation

https://odelaneau.github.io/shapeit4/

License

This project is licensed under the MIT License - see the LICENSE file for details

shapeit4's People

Contributors

23andme-jaredo, jaredo, odelaneau

shapeit4's Issues

unable to install shapeit

/home/dhwani/Documents/Phasing/extra_softwares/htslib-1.10.2/hfile_libcurl.c:297: undefined reference to `curl_global_cleanup'
collect2: error: ld returned 1 exit status
makefile:46: recipe for target 'bin/shapeit4' failed
make: *** [bin/shapeit4] Error 1

Question on recombination maps

Hi,
I noticed that the file chr20.b37.gmap.gz in the test folder is the same as the one supplied by Beagle, but it differs from the genetic_map_hg19.txt.gz file that comes with the Eagle download. Does anyone know how to interpret these different positions and cM values?
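When two genetic maps list different base-pair positions, one way to compare them is to linearly interpolate the cM value of one map at the positions of the other. The sketch below assumes each map has already been parsed into sorted (position, cM) lists. The function name, the toy values and the clamping at both ends are illustrative choices, not something taken from SHAPEIT4, Beagle or Eagle.

```python
from bisect import bisect_left


def interpolate_cm(positions, cms, query):
    """Linearly interpolate the genetic position (cM) at base-pair position `query`.

    `positions` must be sorted ascending and have the same length as `cms`.
    Queries outside the map are clamped to the first or last cM value.
    """
    if query <= positions[0]:
        return cms[0]
    if query >= positions[-1]:
        return cms[-1]
    i = bisect_left(positions, query)  # first index with positions[i] >= query
    if positions[i] == query:
        return cms[i]
    x0, x1 = positions[i - 1], positions[i]
    y0, y1 = cms[i - 1], cms[i]
    return y0 + (y1 - y0) * (query - x0) / (x1 - x0)


# Two toy maps covering the same region at different positions (made-up values).
map_a = ([1000, 2000, 4000], [0.0, 0.1, 0.5])
map_b = ([1500, 3000], [0.05, 0.3])

# Print map B's own cM next to map A's interpolated cM at the same position.
for pos, cm_b in zip(*map_b):
    print(pos, cm_b, interpolate_cm(map_a[0], map_a[1], pos))
```

If the interpolated values agree closely, the two files likely describe the same underlying map sampled at different positions; large differences would point to different source maps or genome builds.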

Error: Illegal instruction (core dumped)

Hi,

I'm trying to install shapeit4. I follow the instructions and it seems to compile successfully. But when I run
./shapeit4
I get
Illegal instruction (core dumped)

Below is some output I get from compiling. There are no obvious errors:

g++ -std=c++11 -O3 -mavx2 -mfma -c src/containers/genotype_set.cpp -o obj/genotype_set.o -Isrc -I/usr/local/include -I/usr/include
g++ -std=c++11 -O3 -mavx2 -mfma -c src/containers/haplotype_set.cpp -o obj/haplotype_set.o -Isrc -I/usr/local/include -I/usr/include
g++ -std=c++11 -O3 -mavx2 -mfma -c src/io/graph_writer.cpp -o obj/graph_writer.o -Isrc -I/usr/local/include -I/usr/include
g++ -std=c++11 -O3 -mavx2 -mfma -c src/io/genotype_reader1.cpp -o obj/genotype_reader1.o -Isrc -I/usr/local/include -I/usr/include
g++ -std=c++11 -O3 -mavx2 -mfma -c src/io/genotype_reader2.cpp -o obj/genotype_reader2.o -Isrc -I/usr/local/include -I/usr/include
g++ -std=c++11 -O3 -mavx2 -mfma -c src/io/haplotype_writer.cpp -o obj/haplotype_writer.o -Isrc -I/usr/local/include -I/usr/include
src/io/haplotype_writer.cpp: In member function 'void haplotype_writer::writeHaplotypes(std::string)':
src/io/haplotype_writer.cpp:57:15: warning: ignoring return value of 'int bcf_hdr_write(htsFile*, bcf_hdr_t*)', declared with attribute warn_unused_result [-Wunused-result]
   57 |  bcf_hdr_write(fp, hdr);
      |  ~~~~~~~~~~~~~^~~~~~~~~
g++ -std=c++11 -O3 -mavx2 -mfma -c src/io/gmap_reader.cpp -o obj/gmap_reader.o -Isrc -I/usr/local/include -I/usr/include
g++ -std=c++11 -O3 obj/compute_job.o obj/genotype_sweep.o obj/genotype_mask.o obj/genotype_prune.o obj/genotype_build.o obj/genotype_managment.o obj/hmm_parameters.o obj/variant.o obj/haplotype_segment_double.o obj/haplotype_segment_single.o obj/main.o obj/phaser_finalise.o obj/phaser_initialise.o obj/phaser_parameters.o obj/phaser_algorithm.o obj/phaser_management.o obj/pbwt_solver.o obj/builder.o obj/variant_map.o obj/genotype_set.o obj/haplotype_set.o obj/graph_writer.o obj/genotype_reader1.o obj/genotype_reader2.o obj/haplotype_writer.o obj/gmap_reader.o /usr/local/lib/libhts.a /usr/lib/x86_64-linux-gnu/libboost_iostreams.a /usr/lib/x86_64-linux-gnu/libboost_program_options.a -o bin/shapeit4 -lz -lbz2 -lm -lpthread -llzma

Thanks,
Pauline
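The compile lines in this report pass -mavx2 -mfma, and an "Illegal instruction" crash at startup is commonly what happens when a binary built for an instruction set runs on a CPU that lacks it. The sketch below is one way to check this on Linux, assuming the usual /proc/cpuinfo layout with a "flags" line. The function names are hypothetical helpers, not part of SHAPEIT4.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of CPU feature flags found in /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def missing_features(cpuinfo_text, required=("avx2", "fma")):
    """List the required features that the CPU does not advertise."""
    flags = cpu_flags(cpuinfo_text)
    return [name for name in required if name not in flags]


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as handle:  # Linux only
            text = handle.read()
    except OSError:
        # No /proc/cpuinfo (e.g. macOS): every feature will be reported as missing.
        text = ""
    print("missing:", missing_features(text))
```

If either feature is reported missing, rebuilding without -mavx2 -mfma (the makefile's "Portable version without avx2" setting quoted in a later report on this page) is one thing to try.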

Compiling with shared libraries

This isn't really an issue---just an FYI in case it's useful. I wasn't able to compile SHAPEIT 4.1 on our system (Centos 7) using the static htslib, but it worked with the dynamic libraries. Perhaps others will find this information useful. Below is the exact Makefile used. Feel free to incorporate this information into your installation instructions.

CXX=g++ -std=c++11

HTSLIB_ROOT=/project2/jnovembre/software/htslib-1.9
HTSLIB_INC=$(HTSLIB_ROOT)/include/htslib
HTSLIB_LIB=-L$(HTSLIB_ROOT)/lib -lhts

BOOST_INC=$(BOOST_ROOT)/include
BOOST_LIB_IO=-L$(BOOST_ROOT)/lib -lboost_iostreams
BOOST_LIB_PO=-L$(BOOST_ROOT)/lib -lboost_program_options

CXXFLAG=-O3 -march=native
LDFLAG=-O3

DYN_LIBS=-lz -lbz2 -lm -lpthread -llzma

BFILE=bin/shapeit4
HFILE=$(shell find src -name *.h)
CFILE=$(shell find src -name *.cpp)
OFILE=$(shell for file in `find src -name *.cpp`; do echo obj/$$(basename $$file .cpp).o; done)
VPATH=$(shell for file in `find src -name *.cpp`; do echo $$(dirname $$file); done)

all: $(BFILE)

$(BFILE): $(OFILE)
        $(CXX) $(LDFLAG) $^ $(HTSLIB_LIB) $(BOOST_LIB_IO) $(BOOST_LIB_PO) -o $@ $(DYN_LIBS)

obj/%.o: %.cpp $(HFILE)
        $(CXX) $(CXXFLAG) -c $< -o $@ -Isrc -I$(HTSLIB_INC) -I$(BOOST_INC)

clean:
        rm -f obj/*.o $(BFILE)

Not able to install SHAPEIT4 on Mac

I want to install SHAPEIT4 from the terminal. I downloaded HTSlib and Boost into my /Downloads folder and set the Boost paths to:
#BOOST IOSTREAM & PROGRAM_OPTION LIBRARIES [SPECIFY YOUR OWN PATHS]
BOOST_INC=/Users/ruyushi/Downloads/boost_1_73_0/boost # was /usr/include
BOOST_LIB_IO=/Users/ruyushi/Downloads/lib/libboost_iostreams.a # was /usr/lib/x86_64-linux-gnu/libboost_iostreams.a
BOOST_LIB_PO=/Users/ruyushi/Downloads/lib/libboost_program_options.a # was /usr/lib/x86_64-linux-gnu/libboost_program_options.a

When I run the command locate libboost_program_options.a libboost_iostreams.a libhts.a,
it works fine.

But when I run make, I get an error:

shapeit4 git:(master) ✗ make
g++ -std=c++11 -O3 -c src/io/haplotype_writer.cpp -o obj/haplotype_writer.o -Isrc -I/Users/ruyushi/Downloads/include/htslib -I/Users/ruyushi/Downloads/boost_1_73_0/boost
In file included from src/io/haplotype_writer.cpp:22:
In file included from src/io/haplotype_writer.h:25:
src/utils/otools.h:44:10: fatal error: 'boost/program_options.hpp' file not found
#include <boost/program_options.hpp>
^~~~~~~~~~~~~~~~~~~~~~~~~~~
1 error generated.
make: *** [obj/haplotype_writer.o] Error 1

When I look in my /Downloads/boost_1_73_0/boost folder, I can find the program_options.hpp file.

Can anyone help me with this? Thanks.
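One likely reading of this error: for `#include <boost/program_options.hpp>` the compiler appends the whole relative path to each -I directory, so with -I pointing at .../boost_1_73_0/boost it looks for .../boost_1_73_0/boost/boost/program_options.hpp. The sketch below mimics that lookup in a temporary directory. The helper name and the stand-in paths are hypothetical, and whether this is the only problem in the setup above is an assumption.

```python
import os
import tempfile


def resolve_include(header, include_dirs):
    """Return the first path where `header` exists under an -I directory, or None.

    The full relative header path is joined onto each include directory in turn,
    which is how an angle-bracket include is searched.
    """
    for directory in include_dirs:
        candidate = os.path.join(directory, header)
        if os.path.isfile(candidate):
            return candidate
    return None


# Recreate the layout from the report in a temporary directory (stand-in paths).
root = tempfile.mkdtemp()
boost_root = os.path.join(root, "boost_1_73_0")
os.makedirs(os.path.join(boost_root, "boost"))
open(os.path.join(boost_root, "boost", "program_options.hpp"), "w").close()

header = "boost/program_options.hpp"
print(resolve_include(header, [os.path.join(boost_root, "boost")]))  # None
print(resolve_include(header, [boost_root]) is not None)             # True
```

Under this reading, setting BOOST_INC to the directory that contains the boost folder (here boost_1_73_0) rather than the boost folder itself would let the header be found.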

ERROR: Parsing line 0 : incorrect number of columns,

Hi, I'm trying out SHAPEIT4 on my population vcf. My file is gzipped, and aligned to GRCh38. I want to use shapeit4 to impute the missing genotype and phase the genotypes.

My full command is:

shapeit4 --input $chr_file --map /home/nguyen/Exec/shapeit4/maps/genetic_maps.b38.tar.gz --output /mnt/Data/DGV4VN_Data/VCF_pop/DGV4VN.$chr.phased.vcf.gz --thread 14 --region $chr --sequencing
However, it ended with the error (with full log):


SHAPEIT
  * Author        : Olivier DELANEAU, University of Lausanne
  * Contact       : [email protected]
  * Version       : 4.1.3
  * Run date      : 22/08/2020 - 11:34:54

Files:
  * Input VCF     : [testMASH_504.chr3.vep.norm.vqsr.vcf.gz]
  * Genetic Map   : [/home/nguyen/Exec/shapeit4/maps/genetic_maps.b38.tar.gz]
  * Output VCF    : [/mnt/Data/DGV4VN_Data/VCF_pop/DGV4VN.chr3.phased.vcf.gz]

Parameters:
  * Seed    : 15052011
  * Threads : 14 threads
  * MCMC    : 15 iterations [5b + 1p + 1b + 1p + 1b + 1p + 5m]
  * PBWT    : Depth of PBWT neighbours to condition on: 4
  * PBWT    : Store indexes at variants [MAC>=2 / MDR<=0.5 / Dist=0.0005 cM]
  * HMM     : K is variable / min W is 2.50cM / Ne is 15000
  * HMM     : Recombination rates given by genetic map
  * HMM     : AVX2 optimization active
  * IBD2    : length>=3.00cM [N>=10000 / MAF>=0.000 / MDR<=0.500]

Initialization:
[W::hts_idx_load2] The index file is older than the data file: testMASH_504.chr3.vep.norm.vqsr.vcf.gz.tbi
[W::hts_idx_load2] The index file is older than the data file: testMASH_504.chr3.vep.norm.vqsr.vcf.gz.tbi
  * VCF/BCF scanning [N=504 / L=1870919 / Reg=chr3] (362.66s)
[W::hts_idx_load2] The index file is older than the data file: testMASH_504.chr3.vep.norm.vqsr.vcf.gz.tbi
[W::hts_idx_load2] The index file is older than the data file: testMASH_504.chr3.vep.norm.vqsr.vcf.gz.tbi
  * VCF/BCF parsing [Hom=84.7% / Het=9.1% / Mis=6.3%] (365.12s)

ERROR: Parsing line 0 : incorrect number of columns, observed: 1 expected: 3

What has gone wrong in my case? Thank you.
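The error says the map parser expected 3 columns and saw 1 on the first line, and the --map argument here is a .tar.gz archive rather than a per-chromosome map file such as the chr11.b38.gmap.gz used in another report on this page. A quick way to inspect a candidate map file is to count the fields on its first lines. This is a minimal sketch assuming a gzip-compressed, whitespace-delimited text file; the function names and toy rows are made up for illustration.

```python
import gzip
import io


def column_counts(lines, limit=5):
    """Return the number of whitespace-separated fields in the first `limit` lines."""
    counts = []
    for i, line in enumerate(lines):
        if i >= limit:
            break
        counts.append(len(line.split()))
    return counts


def check_map(path, expected=3, limit=5):
    """Return True if the first lines of a gzip-compressed file all have `expected` columns."""
    with gzip.open(path, "rt", errors="replace") as handle:
        return all(n == expected for n in column_counts(handle, limit))


# Toy content: two three-column rows versus a single-token line (values made up).
good = io.StringIO("100 20 0.0\n200 20 0.01\n")
bad = io.StringIO("single-token-line\n")
print(column_counts(good))  # [3, 3]
print(column_counts(bad))   # [1]
```

If `check_map` returns False for the file passed to --map, extracting the archive and pointing --map at the file for the chromosome being phased is worth trying.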

Not able to install shapeit4 on Mac

I've been trying to install shapeit4 on Mac (macOS Mojave, 10.14.6).

I installed HTSlib and BOOST (using BREW) and found all the files that I needed to put into the makefile.

#HTSLIB LIBRARY 
HTSLIB_INC=/usr/local/Cellar/htslib/1.10.2/include/htslib # was $(HOME)/Tools/htslib-1.9
HTSLIB_LIB=/usr/local/Cellar/htslib/1.10.2/lib/libhts.a # was $(HOME)/Tools/htslib-1.9/libhts.a

#BOOST IOSTREAM & PROGRAM_OPTION LIBRARIES 
BOOST_INC=/usr/local/Cellar/boost/1.72.0/include/boost # was /usr/include
BOOST_LIB_IO=/usr/local/Cellar/boost/1.72.0/lib/libboost_iostreams.a # was /usr/lib/x86_64-linux-gnu/libboost_iostreams.a
BOOST_LIB_PO=/usr/local/Cellar/boost/1.72.0/lib/libboost_program_options.a # was /usr/lib/x86_64-linux-gnu/libboost_program_options.a

I've also modified the CXXFLAG to run without avx2 (since I don't have that option)

#Portable version without avx2 (much slower)
CXXFLAG=-O3
LDFLAG=-O3
sysctl -a | grep AVX
machdep.cpu.features: FPU VME DE PSE TSC MSR PAE MCE CX8 APIC SEP MTRR PGE MCA CMOV PAT PSE36 CLFSH DS ACPI MMX FXSR SSE SSE2 SS HTT TM PBE SSE3 PCLMULQDQ DTES64 MON DSCPL VMX SMX EST TM2 SSSE3 CX16 TPR PDCM SSE4.1 SSE4.2 x2APIC POPCNT AES PCID XSAVE OSXSAVE TSCTMR **AVX1.0** RDRAND F16C

But when I use make, I always get this error:

g++ -std=c++11 -O3 -c src/io/haplotype_writer.cpp -o obj/haplotype_writer.o -Isrc -I/usr/local/Cellar/htslib/1.10.2/include/htslib  -I/usr/local/Cellar/boost/1.72.0/include/boost
In file included from src/io/haplotype_writer.cpp:22:
In file included from src/io/haplotype_writer.h:25:
In file included from src/utils/otools.h:63:
src/utils/string_utils.h:64:26: error: use of undeclared identifier 'setiosflags'; did you mean 'std::setiosflags'?
                if (prec >= 0) { ss << setiosflags( std::ios::fixed ); ss.precision(prec); }
                                       ^~~~~~~~~~~
                                       std::setiosflags
/Library/Developer/CommandLineTools/usr/include/c++/v1/iomanip:125:1: note: 'std::setiosflags' declared here
setiosflags(ios_base::fmtflags __mask)
^
In file included from src/io/haplotype_writer.cpp:22:
In file included from src/io/haplotype_writer.h:25:
In file included from src/utils/otools.h:63:
src/utils/string_utils.h:72:26: error: use of undeclared identifier 'setiosflags'; did you mean 'std::setiosflags'?
                if (prec >= 0) { ss << setiosflags( std::ios::fixed ); ss.precision(prec); }
                                       ^~~~~~~~~~~
                                       std::setiosflags
/Library/Developer/CommandLineTools/usr/include/c++/v1/iomanip:125:1: note: 'std::setiosflags' declared here
setiosflags(ios_base::fmtflags __mask)
^
src/io/haplotype_writer.cpp:86:3: warning: ignoring return value of function declared with 'warn_unused_result' attribute
      [-Wunused-result]
                bcf_write1(fp, hdr, rec);
                ^~~~~~~~~~~~~~~~~~~~~~~~
/usr/local/include/htslib/vcf.h:248:33: note: expanded from macro 'bcf_write1'
    #define bcf_write1(fp,h,v)  bcf_write((fp),(h),(v))
                                ^~~~~~~~~ ~~~~~~~~~~~~
src/io/haplotype_writer.cpp:57:2: warning: ignoring return value of function declared with 'warn_unused_result' attribute
      [-Wunused-result]
        bcf_hdr_write(fp, hdr);
        ^~~~~~~~~~~~~ ~~~~~~~
2 warnings and 2 errors generated.
make: *** [obj/haplotype_writer.o] Error 1

shapeit4 drops all format fields

Would it be possible for shapeit4 to avoid dropping variants that were not used for phasing and format fields originally present in the VCF file?

Furthermore, shapeit4 imputes missing genotypes. This seems okay for most applications, but when you have to rely on high quality genotypes for some delicate analyses, this can have unintended consequences. It would be nice to have an option to avoid imputing missing genotypes.

More generally, it would be nice if shapeit4 could output VCF files identical in all respects to the original VCF file, with the only difference being that the genotypes have changed. This would go a long way toward guaranteeing that users with diverse applications in mind would not have to go through the laborious additional steps of importing the GT information back into the original VCF. This would be much more in line with the HTSlib/BCFtools philosophy.

About including PS (phase set) information

Hi, I ran shapeit4 using '--use-PS' option. But, PS (phase set) information was not included in the output file of shapeit4.

I want to check GT (genotype) with PS (phase set), simultaneously.
If you have a solution, please let me know.
Otherwise, is there only one phase set in the output file (in other words, all phased variants are in the same phase set)?

Bests,

Sijae

ERROR: No variants to be phased in [/shapeit/out.recode.vcf.gz]

Hi !

I am trying to run SHAPEIT4 on a multi-sample VCF from WES via Docker:

docker run -v /Users/shapeit:/shapeit lifebitai/shapeit4 shapeit4 --input /shapeit/out.recode.vcf.gz --map /shapeit/genetic_maps.b38.tar.gz --region 2 --output /shapeit/phased_100_samples_glnexus.vcf.gz --sequencing --thread 12

But I get an error regarding the VCF:

SHAPEIT
  * Author        : Olivier DELANEAU, University of Lausanne
  * Contact       : [email protected]
  * Version       : 4.1.3
  * Run date      : 06/09/2020 - 13:17:59

Files:
  * Input VCF     : [/shapeit/out.recode.vcf.gz]
  * Genetic Map   : [/shapeit/genetic_maps.b38.tar.gz]
  * Output VCF    : [/shapeit/phased_100_samples_glnexus.vcf.gz]

Parameters:
  * Seed    : 15052011
  * Threads : 12 threads
  * MCMC    : 15 iterations [5b + 1p + 1b + 1p + 1b + 1p + 5m]
  * PBWT    : Depth of PBWT neighbours to condition on: 4
  * PBWT    : Store indexes at variants [MAC>=2 / MDR<=0.5 / Dist=0.0005 cM]
  * HMM     : K is variable / min W is 2.50cM / Ne is 15000
  * HMM     : Recombination rates given by genetic map
  * HMM     : AVX2 optimization active
  * IBD2    : length>=3.00cM [N>=10000 / MAF>=0.000 / MDR<=0.500]

Initialization:
  * VCF/BCF scanning ...
ERROR: No variants to be phased in [/shapeit/out.recode.vcf.gz]

The multi-sample VCF was made with GLnexus, and only variants genotyped in more than 80% of samples were kept.

A glimpse of how the vcf looks is attached.
out.recode.txt

Does the VCF require some changes?

Thanks !
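One thing worth ruling out for a "No variants to be phased" error is a mismatch between the --region value and the contig names declared in the VCF. The command above passes --region 2, and the report does not show whether the VCF names that chromosome 2 or chr2. The sketch below compares a region against the ##contig lines of a header. The helper names and the toy header are hypothetical; a real header can be printed with `bcftools view -h`.

```python
import re


def contig_ids(header_text):
    """Extract contig IDs from the ##contig lines of a VCF header."""
    return re.findall(r"^##contig=<ID=([^,>]+)", header_text, flags=re.MULTILINE)


def region_matches(header_text, region):
    """Return True if the chromosome part of `region` is a declared contig."""
    chrom = region.split(":", 1)[0]
    return chrom in contig_ids(header_text)


# Toy header with chr-prefixed contig names (made up for illustration).
header = (
    "##fileformat=VCFv4.2\n"
    "##contig=<ID=chr1,length=248956422>\n"
    "##contig=<ID=chr2,length=242193529>\n"
)
print(region_matches(header, "2"))     # False
print(region_matches(header, "chr2"))  # True
```

If the names do not match, passing the contig name exactly as it appears in the header is the simplest thing to test before changing the VCF itself.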

ligateHaplotypes/equivalent in shapeit 4

Hi there,

When phasing by region, shapeit2 came with functionality to "stitch" the outputs from different regions together into a single scaffold.

Although shapeit4 has the same region flag, I don't see any mention of post-processing these chunked outputs in the docs. Is this still possible/supported?

many thanks
Nick

shapeit fails with error message - Segmentation fault (core dumped)

I'm trying to phase a vcf with 983 exome samples.

The command I ran -

/mnt/exome/Softwares/shapeit4/bin/shapeit4.2 --input MOD_hg19.vcf.gz --map /mnt/exome/Softwares/shapeit4/maps/chr"$chr".b37.gmap.gz --region "$chr" --output chr"$chr".phased.vcf.gz

I am looping over all chromosomes, and the error seems to occur for every one of them.

The error -

SHAPEIT
  * Author        : Olivier DELANEAU, University of Lausanne
  * Contact       : [email protected]
  * Version       : 4.2.0
  * Run date      : 19/01/2021 - 17:25:51

Files:
  * Input VCF     : [MOD_hg19.vcf.gz]
  * Genetic Map   : [/mnt/exome/Softwares/shapeit4/maps/chr1.b37.gmap.gz]
  * Output VCF    : [chr1.phased.vcf.gz]

Parameters:
  * Seed    : 15052011
  * Threads : 10 threads
  * MCMC    : 15 iterations [5b + 1p + 1b + 1p + 1b + 1p + 5m]
  * PBWT    : Depth of PBWT neighbours to condition on: 4
  * PBWT    : Store indexes at variants [MAC>=2 / MDR<=0.5 / Dist=0.02 cM]
  * HMM     : K is variable / min W is 2.50cM / Ne is 15000
  * HMM     : Recombination rates given by genetic map
  * HMM     : AVX2 optimization active

Initialization:
  * VCF/BCF scanning [N=983 / L=175794 / Reg=1] (44.73s)
  * VCF/BCF parsing [Hom=53.5% / Het=3.2% / Mis=43.3%] (48.68s)
  * GMAP parsing [n=256895] (0.26s)
  * cM interpolation [s=16794 / i=159000] (0.03s)
  * Region length [249218770 bp / 286.3 cM]
  * HMM parameters [Ne=15000 / Error=0.0001 / #rare=45428]
  * PBWT indexing [l=3228] (0.01s)
  * HAP update (1.17s)
  * H2V transpose (0.23s)
  * PBWT phase sweep (12.61s)
  * Build genotype graphs [seg=1869070] (0.55s)

Burn-in iteration [1/5]
  * V2H transpose (0.36s)
  * PBWT selection (1.35s)
  * C2H transpose (0.14s)
  * HMM computations [K=185.689+/-115.639 / W=5.16Mb] (225.76s)
Segmentation fault (core dumped)

What could be the issue?

Typo in version number?

Hello, we installed shapeit4 v4.1.3 using the tarball of 4.1.3 and the compiled binary program reports version 4.1.2. Is this just a typo? Thanks!

Assertion generated at shapeit4: src/phaser/phaser_algorithm.cpp:50

Hi,

Thanks for writing such a good tool and making it MIT licensed.
We have been evaluating shapeit4 on low-density cattle SNP chip data, and a colleague has been hitting the following assertion.

shapeit4: src/phaser/phaser_algorithm.cpp:50: void phaser::phaseWindow(int, int): Assertion `threadData[id_worker].Kvec[w].size()>0' failed.
Aborted (core dumped)

The command line we have been using is

shapeit4 --really.simple.ped.phased.vcf.gz --map  1cmMb.map -O  out.vcf -R Chr5 --thread 4  -W 20000000 --effective-size 400  --pbwt-disable-init --mcmc-iterations  6b,1p,2b,1p,2b,20m

The input VCF has 1935 markers spread over Chr5, which (in cattle) is 121 Mb long.
So with -W 20,000,000 we have 6 windows with 400 markers per window.

By experimenting with the parameters, I have found that when -W is set to 200,000,000, i.e. forcing a single window over the whole chromosome, the command completes.

Is there a recommended minimum number of markers per window?
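The window arithmetic in this report can be checked with a few lines. This is a rough sketch assuming equal-length windows (chromosome length divided by -W, rounded down, which matches the 6 windows quoted above) and evenly spread markers. How shapeit4 actually places windows is not described here, and the function name is made up.

```python
def markers_per_window(n_markers, chrom_length_bp, window_bp):
    """Return (number of windows, average markers per window).

    Assumes equal-length windows and evenly spread markers, which is only
    a rough approximation of how a phasing tool splits a chromosome.
    """
    n_windows = max(1, chrom_length_bp // window_bp)
    return n_windows, n_markers / n_windows


# Figures quoted in the report: 1935 markers on a chromosome of about 121 Mb.
print(markers_per_window(1935, 121_000_000, 20_000_000))   # (6, 322.5)
print(markers_per_window(1935, 121_000_000, 200_000_000))  # (1, 1935.0)
```

Under these assumptions the average is closer to 320 markers per window than 400, and individual windows on a low-density chip could hold considerably fewer.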

Segmentation fault (core dumped)

Hi,

I tried using SHAPEIT4, but I got a "Segmentation fault (core dumped)" error!

Below are my command and log.

shapeit4 --input /home/test.vcf.gz --map /home/genetic_maps.b38/chr11.b38.gmap.gz --region 11 --output /home/phased_test.vcf.gz --log phased_test.log --thread 16

SHAPEIT
  * Author        : Olivier DELANEAU, University of Lausanne
  * Contact       : [email protected]
  * Version       : 4.1.3
  * Run date      : 30/11/2020 - 13:26:10

Files:
  * Input VCF     : [/home/test.vcf.gz]
  * Genetic Map   : [/home/genetic_maps.b38/chr11.b38.gmap.gz]
  * Output VCF    : [/home/phased_test.vcf.gz]
  * Output LOG    : [phased_test.log]

Parameters:
  * Seed    : 15052011
  * Threads : 16 threads
  * MCMC    : 15 iterations [5b + 1p + 1b + 1p + 1b + 1p + 5m]
  * PBWT    : Depth of PBWT neighbours to condition on: 4
  * PBWT    : Store indexes at variants [MAC>=2 / MDR<=0.5 / Dist=0.02 cM]
  * HMM     : K is variable / min W is 2.50cM / Ne is 15000
  * HMM     : Recombination rates given by genetic map
  * HMM     : AVX2 optimization active
  * IBD2    : length>=3.00cM [N>=150 / MAF>=0.010 / MDR<=0.500]

Initialization:
  * VCF/BCF scanning [N=1 / L=1425027 / Reg=chr11] (8.34s)
  * VCF/BCF parsing [Hom=86.0% / Het=8.3% / Mis=5.7%] (9.01s)
  * GMAP parsing [n=168609] (0.08s)
  * cM interpolation [s=158292 / i=1266735] (0.07s)
Segmentation fault (core dumped)

"Segmentation fault (core dumped)" is quite a common error, so I don't know how to solve this problem.
Please let me know if you have solutions.

Thank you.

Unable to phase chromosome X of a cohort VCF with ~2,000 samples

THIS ISSUE IS NOW OBSOLETE. PLEASE SEE ISSUE #52 INSTEAD.

=====

Hello,

I am trying to use SHAPEIT4 to phase whole genome sequencing data from a large cohort (~2,000 samples), and I'm running into issues when trying to phase the chromosome X VCF. As far as I know, the input VCF doesn't use any special representation for chromosome X, but SHAPEIT4 fails only on chrX, with the message ERROR: No variants to be phased in [...vcf.gz], while it works well for the other chromosomes.

This issue is reproducible with a public 30x WGS release of 1000 Genomes Project phase 3 by New York Genome Center. You can either download the variant calls by DeepVariant+GLnexus at this link in Google Cloud (+ .tbi), or the calls by GATK at this link in Google Cloud (+ .tbi) or official FTP, and run the following command (with the genetic map file included in SHAPEIT4):

$ shapeit4 \
  --input cohort-chrX.release.vcf.gz \
  --map chrX.b38.gmap.gz \
  --region chrX \
  --output cohort-chrX.release.phased.vcf.gz \
  --thread $(nproc) \
  --log shapeit4_output_dvglx_chrX.txt \
  --sequencing

The full output log can be found here. I'm running this on a Debian/Ubuntu machine with an Intel Xeon CPU. The SHAPEIT4 binary was compiled with htslib v1.9.

I have tried the following to fix this issue but all failed:

  1. Changing --pbwt-mdr value to a higher value.
  2. Changing the genetic map file to use chrX instead of X to match the chromosome name in VCF.
  3. Manually changing the chromosome name from chrX to chr1 in both VCF and the genetic map file.
  4. Converting all missing genotype calls (./.) to 0/0.
  5. Skipping the --map flag to use the default flat linkage structure.
  6. Restricting the number of samples to 1000.

I'm using SHAPEIT v4.1.2 for this run, but I also tried v4.2.0 and it failed with the same error message. Just as a reference, I also tried phasing the same VCF with Eagle v2.4.1 and it did work without an error.

I'm not sure if I missed anything obvious but I ran out of ideas to try - it'd be great if I can get some advice on where else I can look to investigate this issue.

Thank you very much for your work in developing this awesome software.

Best,
Ted

Paternal and maternal haplotypes

How can I combine SHAPEIT4-phased variants across different chromosomes? Could I run SHAPEIT4 on all chromosomes together? I'm using a pedigree-based phased VCF as scaffold. The paternal/maternal assignment in the scaffold is consistent across the chromosomes (paternal allele always in front of maternal allele).

Compilation problems / documentation

I'd love to try out shapeit4, but I'm having issues compiling it on my system. When I try compiling with GCC 4.3, I get this error:

g++ -std=c++0x -O3 -c src/phaser/phaser_initialise.cpp -o obj/phaser_initialise.o -Isrc -Iinclude -I/broad/software/free/Linux/redhat_6_x86_64/pkgs/boost_1.66.0/include
In file included from src/utils/otools.h:64:0,
                 from src/phaser/phaser_header.h:27,
                 from src/phaser/phaser_initialise.cpp:22:
src/utils/timer.h: In member function 'std::string timer::date()':
src/utils/timer.h:59:12: error: 'put_time' is not a member of 'std'
make: *** [obj/phaser_initialise.o] Error 1

Some googling made me think that this might be fixed with a newer GCC, but when I try with GCC 5.2, I get:

$ make
g++ -std=c++0x -O3 -c src/phaser/phaser_initialise.cpp -o obj/phaser_initialise.o -Isrc -Iinclude -I/broad/software/free/Linux/redhat_6_x86_64/pkgs/boost_1.66.0/include
In file included from src/phaser/phaser_header.h:29:0,
                 from src/phaser/phaser_initialise.cpp:22:
src/models/haplotype_segment.h: In member function 'bool haplotype_segment::TRANSH()':
src/models/haplotype_segment.h:403:25: error: call of overloaded 'isnan(double&)' is ambiguous
  return (isnan(sumHProbs) || sumHProbs < numeric_limits<double>::min());
                         ^
In file included from /usr/include/features.h:361:0,
                 from /broad/software/free/Linux/redhat_6_x86_64/pkgs/gcc_5.2.0/include/c++/5.2.0/x86_64-redhat-linux/bits/os_defines.h:39,
                 from /broad/software/free/Linux/redhat_6_x86_64/pkgs/gcc_5.2.0/include/c++/5.2.0/x86_64-redhat-linux/bits/c++config.h:482,
                 from /broad/software/free/Linux/redhat_6_x86_64/pkgs/gcc_5.2.0/include/c++/5.2.0/bits/stl_algobase.h:59,
                 from /broad/software/free/Linux/redhat_6_x86_64/pkgs/gcc_5.2.0/include/c++/5.2.0/vector:60,
                 from src/utils/otools.h:26,
                 from src/phaser/phaser_header.h:27,
                 from src/phaser/phaser_initialise.cpp:22:
/usr/include/bits/mathcalls.h:235:1: note: candidate: int isnan(double)
 __MATHDECL_1 (int,isnan,, (_Mdouble_ __value)) __attribute__ ((__const__));
 ^
In file included from /broad/software/free/Linux/redhat_6_x86_64/pkgs/gcc_5.2.0/include/c++/5.2.0/random:38:0,
                 from /broad/software/free/Linux/redhat_6_x86_64/pkgs/gcc_5.2.0/include/c++/5.2.0/bits/stl_algo.h:66,
                 from /broad/software/free/Linux/redhat_6_x86_64/pkgs/gcc_5.2.0/include/c++/5.2.0/algorithm:62,
                 from src/utils/otools.h:35,
                 from src/phaser/phaser_header.h:27,
                 from src/phaser/phaser_initialise.cpp:22:
/broad/software/free/Linux/redhat_6_x86_64/pkgs/gcc_5.2.0/include/c++/5.2.0/cmath:626:3: note: candidate: constexpr bool std::isnan(long double)
   isnan(long double __x)
   ^
/broad/software/free/Linux/redhat_6_x86_64/pkgs/gcc_5.2.0/include/c++/5.2.0/cmath:622:3: note: candidate: constexpr bool std::isnan(double)
   isnan(double __x)
   ^
/broad/software/free/Linux/redhat_6_x86_64/pkgs/gcc_5.2.0/include/c++/5.2.0/cmath:618:3: note: candidate: constexpr bool std::isnan(float)
   isnan(float __x)
   ^
In file included from src/phaser/phaser_header.h:29:0,
                 from src/phaser/phaser_initialise.cpp:22:
src/models/haplotype_segment.h: In member function 'bool haplotype_segment::TRANSD(int&)':
src/models/haplotype_segment.h:444:25: error: call of overloaded 'isnan(double&)' is ambiguous
  return (isnan(sumDProbs) || sumDProbs < numeric_limits<double>::min());
                         ^
In file included from /usr/include/features.h:361:0,
                 from /broad/software/free/Linux/redhat_6_x86_64/pkgs/gcc_5.2.0/include/c++/5.2.0/x86_64-redhat-linux/bits/os_defines.h:39,
                 from /broad/software/free/Linux/redhat_6_x86_64/pkgs/gcc_5.2.0/include/c++/5.2.0/x86_64-redhat-linux/bits/c++config.h:482,
                 from /broad/software/free/Linux/redhat_6_x86_64/pkgs/gcc_5.2.0/include/c++/5.2.0/bits/stl_algobase.h:59,
                 from /broad/software/free/Linux/redhat_6_x86_64/pkgs/gcc_5.2.0/include/c++/5.2.0/vector:60,
                 from src/utils/otools.h:26,
                 from src/phaser/phaser_header.h:27,
                 from src/phaser/phaser_initialise.cpp:22:
/usr/include/bits/mathcalls.h:235:1: note: candidate: int isnan(double)
 __MATHDECL_1 (int,isnan,, (_Mdouble_ __value)) __attribute__ ((__const__));
 ^
In file included from /broad/software/free/Linux/redhat_6_x86_64/pkgs/gcc_5.2.0/include/c++/5.2.0/random:38:0,
                 from /broad/software/free/Linux/redhat_6_x86_64/pkgs/gcc_5.2.0/include/c++/5.2.0/bits/stl_algo.h:66,
                 from /broad/software/free/Linux/redhat_6_x86_64/pkgs/gcc_5.2.0/include/c++/5.2.0/algorithm:62,
                 from src/utils/otools.h:35,
                 from src/phaser/phaser_header.h:27,
                 from src/phaser/phaser_initialise.cpp:22:
/broad/software/free/Linux/redhat_6_x86_64/pkgs/gcc_5.2.0/include/c++/5.2.0/cmath:626:3: note: candidate: constexpr bool std::isnan(long double)
   isnan(long double __x)
   ^
/broad/software/free/Linux/redhat_6_x86_64/pkgs/gcc_5.2.0/include/c++/5.2.0/cmath:622:3: note: candidate: constexpr bool std::isnan(double)
   isnan(double __x)
   ^
/broad/software/free/Linux/redhat_6_x86_64/pkgs/gcc_5.2.0/include/c++/5.2.0/cmath:618:3: note: candidate: constexpr bool std::isnan(float)
   isnan(float __x)
   ^
make: *** [obj/phaser_initialise.o] Error 1

What versions of GCC, Boost, and Samtools should I be using to compile?

Missing ;

You're missing a semi-colon:

10:13:35 BIOCONDA INFO (OUT) src/phaser/phaser_algorithm.cpp: In member function 'void phaser::phaseWindow(int, int)':
10:13:35 BIOCONDA INFO (OUT) src/phaser/phaser_algorithm.cpp:52:3: error: expected ';' before '}' token

shapeit4: src/phaser/phaser_algorithm.cpp:50: void phaser::phaseWindow(int, int): Assertion `threadData[id_worker].Kvec[w].size()>0' failed.

Hi,

I am getting the same error message as mdkeehan. In my case, 89.chr20.vcf.gz has a single human sample with ~14,400 SNPs on chr20. I get the error message with the default window, and 1e6, 1e7 and 20e7 used in the --window command. I get the error message for chr2 as well.

shapeit4
--input 89.chr20.vcf.gz
--map /shapeit4/maps/chr20.b37.gmap.gz
--region 20
--window 20e7
--output phased_89.20.vcf.gz

Can you help please?
Stuart Kim

Error: bcf_sr_set_threads was not declared in this scope; did you mean ‘bcf_sr_set_targets

Hi,

I am trying to compile shapeit 4.2.0 but run into a problem:

Modified make file:
...
...
HTSLIB_INC=/usr/local/apps/samtools/1.2/include/
HTSLIB_LIB=/usr/local/apps/samtools/1.2/lib/libhts.a
#HTSLIB_INC=$(HOME)/Tools/htslib-1.9
#HTSLIB_LIB=$(HOME)/Tools/htslib-1.9/libhts.a

#BOOST IOSTREAM & PROGRAM_OPTION LIBRARIES [SPECIFY YOUR OWN PATHS]
BOOST_INC=/usr/local/boost/1.70_gcc-9.2.0/include
BOOST_LIB_IO=/usr/local/boost/1.70_gcc-9.2.0/lib/libboost_iostreams.a
BOOST_LIB_PO=/usr/local/boost/1.70_gcc-9.2.0/lib/libboost_program_options.a
#BOOST_INC=/usr/include
#BOOST_LIB_IO=/usr/lib/x86_64-linux-gnu/libboost_iostreams.a
#BOOST_LIB_PO=/usr/lib/x86_64-linux-gnu/libboost_program_options.a
...
...

I am using gcc 9.2.0 and boost 1.70. This exact environment worked fine with shapeit version 4.1.3 previously. But for 4.2.0 I got the following error while running 'make all':

$ make all
g++ -std=c++11 -O3 -mavx2 -mfma -c src/containers/bitmatrix.cpp -o obj/bitmatrix.o -Isrc -I/usr/local/apps/samtools/1.2/include/ -I/usr/local/boost/1.70_gcc-9.2.0/include
g++ -std=c++11 -O3 -mavx2 -mfma -c src/containers/genotype_set.cpp -o obj/genotype_set.o -Isrc -I/usr/local/apps/samtools/1.2/include/ -I/usr/local/boost/1.70_gcc-9.2.0/include
g++ -std=c++11 -O3 -mavx2 -mfma -c src/containers/haplotype_set.cpp -o obj/haplotype_set.o -Isrc -I/usr/local/apps/samtools/1.2/include/ -I/usr/local/boost/1.70_gcc-9.2.0/include
In file included from src/containers/haplotype_set.cpp:22:
src/containers/haplotype_set.h: In member function ‘bool IBD2track::merge(const IBD2track&)’:
src/containers/haplotype_set.h:53:2: warning: no return statement in function returning non-void [-Wreturn-type]
53 | }
| ^
g++ -std=c++11 -O3 -mavx2 -mfma -c src/containers/variant_map.cpp -o obj/variant_map.o -Isrc -I/usr/local/apps/samtools/1.2/include/ -I/usr/local/boost/1.70_gcc-9.2.0/include
g++ -std=c++11 -O3 -mavx2 -mfma -c src/io/genotype_reader1.cpp -o obj/genotype_reader1.o -Isrc -I/usr/local/apps/samtools/1.2/include/ -I/usr/local/boost/1.70_gcc-9.2.0/include
In file included from src/io/genotype_reader.h:28,
from src/io/genotype_reader1.cpp:22:
src/containers/haplotype_set.h: In member function ‘bool IBD2track::merge(const IBD2track&)’:
src/containers/haplotype_set.h:53:2: warning: no return statement in function returning non-void [-Wreturn-type]
53 | }
| ^
src/io/genotype_reader1.cpp: In member function ‘void genotype_reader::scanGenotypes(std::string)’:
src/io/genotype_reader1.cpp:87:18: error: ‘bcf_sr_set_threads’ was not declared in this scope; did you mean ‘bcf_sr_set_targets’?
87 | if (nthreads>1) bcf_sr_set_threads(sr, nthreads);
| ^~~~~~~~~~~~~~~~~~
| bcf_sr_set_targets
src/io/genotype_reader1.cpp: In member function ‘void genotype_reader::scanGenotypes(std::string, std::string)’:
src/io/genotype_reader1.cpp:110:18: error: ‘bcf_sr_set_threads’ was not declared in this scope; did you mean ‘bcf_sr_set_targets’?
110 | if (nthreads>1) bcf_sr_set_threads(sr, nthreads);
| ^~~~~~~~~~~~~~~~~~
| bcf_sr_set_targets
make: *** [obj/genotype_reader1.o] Error 1

Help is really appreciated.

Jean
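
One way to narrow this down: the compiler is saying that the htslib headers found under HTSLIB_INC (here the samtools/1.2 install) do not declare `bcf_sr_set_threads`. A plausible but unconfirmed explanation is that this htslib predates the function. The Python sketch below checks whether a header's text declares a given function; htslib declares the `bcf_sr_*` functions in `htslib/synced_bcf_reader.h`. The helper name `header_declares` is hypothetical.

```python
import re


def header_declares(header_text, symbol):
    """Return True if `symbol` appears as a whole identifier followed by '('."""
    pattern = r"\b" + re.escape(symbol) + r"\s*\("
    return re.search(pattern, header_text) is not None
```

It could be used by reading `synced_bcf_reader.h` from the include directory named in the makefile and calling `header_declares(text, "bcf_sr_set_threads")`; a `False` result would suggest pointing HTSLIB_INC and HTSLIB_LIB at a newer htslib, such as the htslib-1.9 paths commented out in the makefile above.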

missing genotypes in scaffold

Are missing genotypes allowed in the scaffolds?

For example:

0|1 ./.
./. 0|1

The use case is samples coming from different microarrays.

thanks!

Jared
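
To see how common this pattern is in a candidate scaffold before trying it, the missing calls per sample can be counted. The Python sketch below assumes the GT strings have already been extracted into rows (one row per variant, one entry per sample); it says nothing about whether shapeit4 accepts such a scaffold. The function name `missing_per_sample` is hypothetical.

```python
def missing_per_sample(gt_rows):
    """Count missing genotypes per sample.

    gt_rows: list of rows, each a list of GT strings, one per sample.
    """
    if not gt_rows:
        return []
    counts = [0] * len(gt_rows[0])
    for row in gt_rows:
        for i, gt in enumerate(row):
            # Treat phased and unphased separators alike.
            alleles = gt.replace("|", "/").split("/")
            if "." in alleles:
                counts[i] += 1
    return counts
```

For the two-variant example above, `missing_per_sample([["0|1", "./."], ["./.", "0|1"]])` gives one missing call for each sample.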

free(): invalid pointer \n Aborted (core dumped)

Hi, I'm getting a shapeit4 issue during the first burn-in iteration

g++ Version:

$ g++ --version
g++ (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Log:

SHAPEIT

  • Author : Olivier DELANEAU, University of Lausanne
  • Contact : [email protected]
  • Version : 4.2.0
  • Run date : 09/04/2021 - 16:29:32

Files:

  • Input VCF : [../data/unphased/11/HGDP_1000g_regen_no_AT_CG_3pop_geno05_11.vcf.gz]
  • Reference VCF : [../data/reference/shapeit4/CCDG_14151_B01_GRM_WGS_2020-08-05_chr11.filtered.shapeit2-duohmm-phased_header.vcf.gz]
  • Genetic Map : [../data/gen_maps/chr11.b38.gmap.gz]
  • Output VCF : [../data/phased/11/HGDP_1000g_regen_no_AT_CG_3pop_geno05_shapeit4_11.vcf.gz]

Parameters:

  • Seed : 15052011
  • Threads : 16 threads
  • MCMC : 15 iterations [5b + 1p + 1b + 1p + 1b + 1p + 5m]
  • PBWT : Depth of PBWT neighbours to condition on: 4
  • PBWT : Store indexes at variants [MAC>=2 / MDR<=0.5 / Dist=0.02 cM]
  • HMM : K is variable / min W is 2.50cM / Ne is 15000
  • HMM : Recombination rates given by genetic map
  • HMM : AVX2 optimization active

Initialization:

  • VCF/BCF scanning [Nm=1690 / Nr=3202 / L=1018000 / Reg=11] (602.40s)
  • VCF/BCF parsing [Hom=92.8% / Het=7.2% / Mis=0.0%] (679.11s)
  • GMAP parsing [n=168609] (0.26s)
  • cM interpolation [s=103172 / i=914828] (0.07s)
  • Region length [134943857 bp / 158.3 cM]
  • HMM parameters [Ne=15000 / Error=0.0001 / #rare=144758]
  • PBWT indexing [l=7812] (0.05s)
  • HAP update (10.39s)
  • H2V transpose (32.16s)
  • PBWT phase sweep (107.38s)
  • Build genotype graphs [seg=41377137] (2.59s)

Burn-in iteration [1/5]

  • V2H transpose (2.14s)
  • PBWT selection (35.48s)
  • C2H transpose (0.31s)
  • HMM computations [K=389.134+/-196.794 / W=4.14Mb] (329.76s)
    free(): invalid pointer
    Aborted (core dumped)

Thanks a lot,
Ben

Encoding GT for male on nonpar X

How should genotypes (GT) for males be encoded in the non-PAR region: as "0" / "1" or as "0/0" / "1/1"? Both approaches seem to run. Thank you!

Feature request for "shapeit -convert --input-graph --output-sample" ?

Dear Prof. Delaneau,

In an older version of shapeit, there was an option to sample a pair of haplotypes with:
shapeit -convert --input-graph chr20.graph --output-sample gwas.phased

I see shapeit4 outputs a graph, but currently I see no options for "input-graph" or "output-sample". I was wondering if there were plans to implement that?

(I can't use shapeit2 because I'm using phasing options in shapeit4 not available in shapeit2, so the graph won't be the same).

Best,
Pauline

Compilation problem with libhts.a(hfile_libcurl.o) in htslib 1.9: undefined references in every function

After this part of compilation:
g++ -std=c++11 -O3 obj/haplotype_writer.o obj/genotype_reader1.o obj/genotype_reader2.o obj/gmap_reader.o obj/genotype_set.o obj/variant_map.o obj/haplotype_set.o obj/compute_job.o obj/genotype_managment.o obj/genotype_sweep.o obj/genotype_prune.o obj/genotype_build.o obj/genotype_mask.o obj/variant.o obj/hmm_parameters.o obj/main.o obj/pbwt_solver.o obj/builder.o obj/phaser_management.o obj/phaser_initialise.o obj/phaser_algorithm.o obj/phaser_parameters.o obj/phaser_finalise.o obj/haplotype_segment.o /home/silly/progs/htslib-1.9/libhts.a /home/silly/progs/boost_1_71_0/stage/lib/libboost_iostreams.a /home/silly/progs/boost_1_71_0/stage/lib/libboost_program_options.a -o bin/shapeit4 -lz -lbz2 -lm -lpthread -llzma

I've got the following output:

/home/silly/progs/htslib-1.9/libhts.a(hfile_libcurl.o): In function `easy_errno':
/home/silly/progs/htslib-1.9/hfile_libcurl.c:164: undefined reference to `curl_easy_getinfo'
/home/silly/progs/htslib-1.9/hfile_libcurl.c:178: undefined reference to `curl_easy_getinfo'
/home/silly/progs/htslib-1.9/libhts.a(hfile_libcurl.o): In function `wait_perform':
/home/silly/progs/htslib-1.9/hfile_libcurl.c:686: undefined reference to `curl_multi_fdset'
/home/silly/progs/htslib-1.9/hfile_libcurl.c:707: undefined reference to `curl_multi_perform'
/home/silly/progs/htslib-1.9/libhts.a(hfile_libcurl.o): In function `process_messages':
/home/silly/progs/htslib-1.9/hfile_libcurl.c:662: undefined reference to `curl_multi_info_read'
/home/silly/progs/htslib-1.9/libhts.a(hfile_libcurl.o): In function `wait_perform':
/home/silly/progs/htslib-1.9/hfile_libcurl.c:689: undefined reference to `curl_multi_timeout'
/home/silly/progs/htslib-1.9/libhts.a(hfile_libcurl.o): In function `libcurl_close':
etc... as above
/home/silly/progs/htslib-1.9/libhts.a(hfile_s3.o): In function `s3_sign':
/home/silly/progs/htslib-1.9/hfile_s3.c:77: undefined reference to `EVP_sha1'
/home/silly/progs/htslib-1.9/hfile_s3.c:77: undefined reference to `HMAC'
collect2: error: ld returned 1 exit status
makefile:49: recipe for target 'bin/shapeit4' failed
make: *** [bin/shapeit4] Error 1
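
The link command above ends with `-lz -lbz2 -lm -lpthread -llzma`, while the unresolved symbols belong to libcurl (`curl_*`) and OpenSSL's libcrypto (`EVP_*`, `HMAC`). The Python sketch below maps undefined symbols in linker output to candidate `-l` flags. The prefix table covers only these well-known libraries and is a heuristic, not a complete resolver; `suggest_link_flags` is a hypothetical name.

```python
import re

# Symbol patterns and the linker flag that usually provides them.
HINTS = [
    (re.compile(r"^curl_"), "-lcurl"),
    (re.compile(r"^(EVP_|HMAC)"), "-lcrypto"),
    (re.compile(r"^dl(open|sym|close|error)$"), "-ldl"),
]


def suggest_link_flags(linker_output):
    """Return candidate -l flags for the undefined symbols, in first-seen order."""
    flags = []
    for symbol in re.findall(r"undefined reference to `([^']+)'", linker_output):
        for pattern, flag in HINTS:
            if pattern.search(symbol) and flag not in flags:
                flags.append(flag)
    return flags
```

Applied to the output above, this suggests adding `-lcurl -lcrypto` to the dynamic libraries on the link line. An alternative, if remote file access is not needed, would be to rebuild htslib with `./configure --disable-libcurl`.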

Assertion `ngt_main == 2 * n_main_samples' failed.

Hello:
Just getting started using ShapeIt4 on WGS vcfs for families.
Right away there is an error at Initialization:

"shapeit4: src/io/genotype_reader2.cpp:50: void genotype_reader::readGenotypes0(std::__cxx11::string): Assertion `ngt_main == 2 * n_main_samples' failed."

I imagine it may be a vcf format issue.
Thanks for any help.
Chuck

ERROR: Parsing line 0 : incorrect number of columns, observed: 1 expected: 3

I'm trying to run Shapeit4 on some files and I get an error
(ERROR: Parsing line 0 : incorrect number of columns, observed: 4 expected: 3).
I even tried with the unphased file in /test and I get the same error.

This is how I used it (same with .vcf):
shapeit4 --input unphased.bcf --map ../maps/genetic_maps.b37.tar.gz --region 20 --output test-phased.bcf

This is the log; no output file is produced:
SHAPEIT

  • Author : Olivier DELANEAU, University of Lausanne
  • Contact : [email protected]
  • Version : 4.1.2
  • Run date : 11/03/2021 - 13:25:50

Files:

  • Input VCF : [unphased.bcf]
  • Genetic Map : [../maps/genetic_maps.b37.tar.gz]
  • Output VCF : [test-phased.bcf]

Parameters:

  • Seed : 15052011
  • Threads : 1 threads
  • MCMC : 15 iterations [5b + 1p + 1b + 1p + 1b + 1p + 5m]
  • PBWT : Depth of PBWT neighbours to condition on: 4
  • PBWT : Store indexes at variants [MAC>=2 / MDR<=0.5 / Dist=0.02 cM]
  • HMM : K is variable / min W is 2.50cM / Ne is 15000
  • HMM : Recombination rates given by genetic map
  • HMM : !AVX2 optimization inactive!
  • IBD2 : length>=3.00cM [N>=150 / MAF>=0.010 / MDR<=0.500]

Initialization:

  • VCF/BCF scanning [N=203 / L=24990 / Reg=20] (0.01s)
  • VCF/BCF parsing [Hom=90.0% / Het=10.0% / Mis=0.0%] (0.08s)

ERROR: Parsing line 0 : incorrect number of columns, observed: 4 expected: 3

Has anyone had the same problem?
Thank you so much!
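
The log shows the --map argument pointing at the whole archive (genetic_maps.b37.tar.gz) rather than a per-chromosome map such as the chr20.b37.gmap.gz used in other posts here, which may be why the parser does not see three columns; this is a guess from the log, not a confirmed cause. The Python sketch below checks that every line of a decompressed map has the expected number of whitespace-separated columns. The three-column layout (assumed here to be position, chromosome, cM) is taken from the "expected: 3" in the error; `check_gmap_columns` is a hypothetical name.

```python
def check_gmap_columns(lines, expected=3):
    """Return (line_index, n_columns) for the first malformed line, or None."""
    for index, line in enumerate(lines):
        n_columns = len(line.split())
        if n_columns != expected:
            return index, n_columns
    return None
```

Running it on the lines of the file actually passed to --map would show whether line 0 really has three columns.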

Question in test folder hapmap for b37 chr20

Hi,
I noticed that the file chr20.b37.gmap.gz in the test folder is the same as the one supplied by Beagle. But it differs from what you get from the Eagle download here, named genetic_map_hg19.txt.gz. Would anyone know where I can read about why the positions and cM values differ?

build issue in docker Ubuntu 20.04

Hi,
I am trying to build a Docker image, but I am getting the error below. I believe I have all the libraries, including Boost, in my image. I even tried adding -ldl to DYN_LIBS in the makefile, but the issue persists:

/usr/bin/ld: /usr/local/lib/libhts.a(plugin.o): in function `load_plugin':
/opt/htslib-1.11/plugin.c:137: undefined reference to `dlopen'
/usr/bin/ld: /opt/htslib-1.11/plugin.c:141: undefined reference to `dlsym'
/usr/bin/ld: /opt/htslib-1.11/plugin.c:166: undefined reference to `dlerror'
/usr/bin/ld: /opt/htslib-1.11/plugin.c:144: undefined reference to `dlopen'
/usr/bin/ld: /opt/htslib-1.11/plugin.c:146: undefined reference to `dlclose'
/usr/bin/ld: /opt/htslib-1.11/plugin.c:156: undefined reference to `dlsym'
/usr/bin/ld: /opt/htslib-1.11/plugin.c:168: undefined reference to `dlclose'
/usr/bin/ld: /opt/htslib-1.11/plugin.c:166: undefined reference to `dlerror'
/usr/bin/ld: /usr/local/lib/libhts.a(plugin.o): in function `plugin_sym':
/opt/htslib-1.11/plugin.c:174: undefined reference to `dlsym'
/usr/bin/ld: /opt/htslib-1.11/plugin.c:175: undefined reference to `dlerror'
/usr/bin/ld: /opt/htslib-1.11/plugin.c:174: undefined reference to `dlsym'
/usr/bin/ld: /opt/htslib-1.11/plugin.c:175: undefined reference to `dlerror'
/usr/bin/ld: /usr/local/lib/libhts.a(plugin.o): in function `close_plugin':
/opt/htslib-1.11/plugin.c:188: undefined reference to `dlclose'
/usr/bin/ld: /opt/htslib-1.11/plugin.c:190: undefined reference to `dlerror'
collect2: error: ld returned 1 exit status
make: *** [makefile:50: bin/shapeit4.2] Error 1

This happens whether I download https://github.com/odelaneau/shapeit4/archive/v4.2.0.tar.gz or clone from git.

Here is the code in my Dockerfile, in which I also install htslib:

# htslib (required for samtools) - Updated to 1.11
RUN cd /opt && \
	wget --no-check-certificate https://github.com/samtools/htslib/releases/download/1.11/htslib-1.11.tar.bz2 && \
	tar -xf htslib-1.11.tar.bz2 && rm htslib-1.11.tar.bz2 && cd htslib-1.11 && \
	./configure --enable-libcurl --enable-s3 --enable-plugins --enable-gcs && \
	make && make install && make clean

# boost
RUN cd /opt && \
	wget https://dl.bintray.com/boostorg/release/1.75.0/source/boost_1_75_0.tar.bz2 && \
	tar -xf boost_1_75_0.tar.bz2 && rm boost_1_75_0.tar.bz2 && cd boost_1_75_0 && \
	./bootstrap.sh --prefix=/usr/local/boost && ./b2 install
ENV BOOST=/usr/local/boost

# shapeit4
RUN cd /opt && \
	#wget https://github.com/odelaneau/shapeit4/archive/v4.2.0.tar.gz && \
	#tar -xvzf v4.2.0.tar.gz && rm v4.2.0.tar.gz && cd shapeit4-4.2.0 && \
	git clone https://github.com/odelaneau/shapeit4.git && cd shapeit4 && \
	sed -i 's|HTSLIB_INC=.*|HTSLIB_INC=/usr/local/include/htslib|' makefile && \
	sed -i 's|HTSLIB_LIB=.*|HTSLIB_LIB=/usr/local/lib/libhts.a|' makefile && \
	sed -i 's|BOOST_INC=.*|BOOST_INC=$(BOOST)/include|' makefile && \
	sed -i 's|BOOST_LIB_IO=.*|BOOST_LIB_IO=$(BOOST)/lib/libboost_iostreams.a|' makefile && \
	sed -i 's|BOOST_LIB_PO=.*|BOOST_LIB_PO=$(BOOST)/lib/libboost_program_options.a|' makefile && \
	sed -i 's/DYN_LIBS=-lz -lbz2 -lm -lpthread -llzma -lcurl -lssl -lcrypto/DYN_LIBS=-lz -lbz2 -lm -lpthread -llzma -lcurl -lssl -lcrypto -ldl/' makefile && \
	make && make install && make clean

Error in compiling

Hi,

I'd like to use shapeit4, but I have a problem compiling it.

The error message is below.

g++ -std=c++11 -O3 -mavx2 -mfma -c src/objects/genotype/genotype_sweep.cpp -o obj/genotype_sweep.o -Isrc -I/home/users/sijaewoo/bin/htslib -I/usr/include
In file included from src/utils/otools.h:64:0,
from src/objects/genotype/genotype_header.h:25,
from src/objects/genotype/genotype_sweep.cpp:22:
src/utils/timer.h: In member function 'std::string timer::date()':
src/utils/timer.h:59:12: error: 'put_time' is not a member of 'std'
ss << std::put_time(std::localtime(&in_time_t), "%d/%m/%Y - %X");
^
make: *** [obj/genotype_sweep.o] Error 1

If you know how to solve it, please let me know.

Thanks.

Unable to phase chromosome X of a cohort VCF with ~2,000 samples (updated)

Hello,

I am trying to use SHAPEIT4 to phase a large cohort (~2,000 samples) of whole genome sequencing data, and I'm running into issues when phasing chromosome X VCFs. As far as I know, the input VCFs don't use any special representation for chromosome X, but SHAPEIT4 fails only on chrX, without any error message, while it works well on other chromosomes. It seems to always fail in the "HMM computations" step of the first burn-in iteration, "Burn-in iteration [1/5]".

This issue is reproducible with the public 30x WGS release of 1000 Genomes Project phase 3 by the New York Genome Center. You can download either the variant calls by DeepVariant+GLnexus at this link in Google Cloud (+ .tbi), or the calls by GATK at this link in Google Cloud (+ .tbi) or the official FTP, and run the following command (with the genetic map file included in SHAPEIT4):

$ shapeit4 \
  --input CCDG_13607_B01_GRM_WGS_2019-02-19_chrX.recalibrated_variants.vcf.gz \
  --map chrX.b38.gmap.gz \
  --region chrX \
  --output phased_1kgp_gatk_chrX.vcf.gz \
  --thread $(nproc) \
  --log gatk_shapeit413_chrX.txt \
  --sequencing

The full output log from SHAPEIT v4.1.3 can be found here. I didn't see any error message other than "Killed". I tried this on multiple Debian/Ubuntu machines with Intel Xeon CPUs. The SHAPEIT4 binary was compiled with htslib v1.9.

I have tried the following to fix this issue but all failed:

  1. Changing --pbwt-mdr value to a higher value.
  2. Converting all missing genotype calls (./.) to 0/0.
  3. Skipping the --map flag to use the default flat linkage structure.

I'm using SHAPEIT v4.1.3 for this run, but I also tried v4.2.1 and it failed with the same error message. Just as a reference, I also tried phasing the same VCF with Eagle v2.4.1 and it did work without an error.

It'd be great if I can get some advice on how to fix this issue.

Thank you very much for your work in developing this awesome software.

Best,
Ted

Buffer read overflow in pbwt_solver::sweep

When troubleshooting a problem that might be similar to issue #45 (stack trace indicating heap corruption in a free operation of a string towards the end of the first burn-in round), I rebuilt the code using -fsanitize=address.

This picks up a read buffer overflow in pbwt_solver::sweep at the line:

if (hidx1<(n_total_hap-1)) s -= Guess[pbwt_clusters[idx_prev][hidx1+1]] * scoreBit[l - pbwt_divergences[idx_prev][hidx1+1] + 1];

It's not immediately clear to me that pbwt_divergences values are always supposed to be positive for the case where l == n_site - 1; when they are not, this will indeed trigger an overflow. That is what happens in practice with my data.

While I suppose the only effect of a strange value there is on the phase of the guess, the invalid read itself could in theory land on a page boundary and trigger a segmentation fault in its own right. I would have made a pull request if it were clear to me what the proper fix is. It looks like a possible off-by-one error, but I am not sure in what way.
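
The index arithmetic in question can be modelled outside the C++ code. The Python sketch below computes the same expression, `l - divergence + 1`, and checks it against a table size. The actual size of `scoreBit` is not stated in this report, so `table_size` is a stand-in parameter and the function name `score_index` is hypothetical; this illustrates the failure mode and is not a proposed fix.

```python
def score_index(l, divergence, table_size):
    """Compute l - divergence + 1 and verify it is a valid table index."""
    index = l - divergence + 1
    if not 0 <= index < table_size:
        raise IndexError(f"score index {index} outside [0, {table_size})")
    return index
```

Under the assumption that the table has one entry per site, a divergence of 0 at the last site (`l == n_site - 1`) gives an index equal to `n_site`, one past the end, which matches the suspected off-by-one.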

'--pbwt-disable-init' not recognized

Hi,

When running using the '--pbwt-disable-init' option, I get Error parsing command line arguments: unrecognised option '--pbwt-disable-init'. Am I doing something wrong?

Thanks for developing and supporting this great resource.

Marc

"Underflow impossible to recover" error

I ran shapeit4 on WGS sequencing data with a family-based scaffold and when I run it with version 4.1.2 or 4.1.3 (assuming the latter is the newest version since in the log it says v4.1.2 - see Issue #22) I get the following error:

* HMM computations [52%]
* HMM computations [53%]
* HMM computations [54%]
* HMM computations [55%]
* HMM computations [56%]
* HMM computations [57%]
* HMM computations [58%]
* HMM computations [59%]
* HMM computations [60%]
* HMM computations [61%]
ERROR: Underflow impossible to recover

However, when I run it with version 4.0.0 with the same default parameters it finishes with no issues. Any ideas of what can be causing that? Thanks!

Command I'm using:
shapeit4 \
  --input ${vcf} \
  --map ${genetic_map} \
  --region ${chr} \
  --scaffold ${scaffold_vcf} \
  --thread 20 \
  --log phased.log \
  --output ${outdir}/${prefix}.vcf.gz

Problem opening index file

Hi,

When running shapeit4 v4.1.2, I get the following error in the Initialization step:

Initialization:
[E::hts_hopen] Failed to open file output/1.unphased.vcf.gz.csi
[E::hts_open_format] Failed to open file output/1.unphased.vcf.gz.csi

The command:

shapeit4 --input output/2.unphased.vcf.gz.csi --map chr2.b38.gmap.gz --region 2 --output output/2.phased.vcf.gz

I tried different ways to install shapeit4 and its dependencies, e.g. building from source or using bioconda, but the error keeps showing up.

Do you have any idea what causes this error? Is something wrong with HTSlib or with the library paths? Thank you.

Best,
Andrey

Segmentation Fault when running shapeit4 phasing

I see a possibly relevant comment from biostars 3 months ago but the issue wasn't resolved on that thread: https://www.biostars.org/p/200262/#422658

GCC version:

$ g++ --version
g++ (Debian 6.3.0-18+deb9u1) 6.3.0 20170516
Copyright (C) 2016 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Logs:

liezl@pytorch-vm-high-mem-4:~/shapeit4$ bin/shapeit4 --input /home/lie/kj_test/vcfs/gencove/chr22_ef209283-9df1-4195-b292-2a16cbdad50d.vcf.gz --map ~/genetic_maps.b37/chr22.b37.gmap.gz --region 22 --output ~/test_chr_22_phased.vcf.gz --pbwt-depth 2
SHAPEIT
  * Author        : Olivier DELANEAU, University of Lausanne
  * Contact       : [email protected]
  * Version       : 4.1.3
  * Run date      : 26/05/2020 - 00:06:50

Files:
  * Input VCF     : [/home/lie/kj_test/vcfs/gencove/chr22_ef209283-9df1-4195-b292-2a16cbdad50d.vcf.gz]
  * Genetic Map   : [/home/liezl/genetic_maps.b37/chr22.b37.gmap.gz]
  * Output VCF    : [/home/liezl/test_chr_22_phased.vcf.gz]
Parameters:
  * Seed    : 15052011
  * Threads : 1 threads
  * MCMC    : 15 iterations [5b + 1p + 1b + 1p + 1b + 1p + 5m]
  * PBWT    : Depth of PBWT neighbours to condition on: 2
  * PBWT    : Store indexes at variants [MAC>=2 / MDR<=0.5 / Dist=0.02 cM]
  * HMM     : K is variable / min W is 2.50cM / Ne is 15000
  * HMM     : Recombination rates given by genetic map
  * HMM     : AVX2 optimization active
  * IBD2    : length>=3.00cM [N>=150 / MAF>=0.010 / MDR<=0.500]
Initialization:
  * VCF/BCF scanning ...
  * VCF/BCF scanning [N=1 / L=1110237 / Reg=22] (2.51s)
  * VCF/BCF parsing [Hom=96.7% / Het=3.3% / Mis=0.0%] (3.35s)
  * GMAP parsing [n=45329] (0.03s)
  * cM interpolation [s=45230 / i=1065007] (0.10s)
Segmentation fault

Error in compilation

Hi,

I'm struggling to compile SHAPEIT4; everything I have tried so far ends with an error.

The server I'm working on only has GCC 4.9.2, so I installed GCC 7.1.0 in a local directory. I also installed the Boost library with GCC 7.1.0 in a local directory. Both installs seem to have worked properly, at least as far as I can tell...

However, when trying to install SHAPEIT4 I always get an error that looks something like:

# until here everything compiles OK, then on the last rule of the makefile
g++ -std=c++11 -O3 obj/phaser_management.o obj/phaser_finalise.o obj/phaser_parameters.o obj/phaser_initialise.o obj/phaser_algorithm.o obj/variant_map.o obj/genotype_set.o obj/haplotype_set.o obj/haplotype_writer.o obj/genotype_reader2.o obj/graph_writer.o obj/genotype_reader1.o obj/gmap_reader.o obj/haplotype_segment_double.o obj/haplotype_segment_single.o obj/main.o obj/variant.o obj/hmm_parameters.o obj/genotype_sweep.o obj/genotype_mask.o obj/genotype_prune.o obj/genotype_build.o obj/genotype_managment.o obj/compute_job.o obj/pbwt_solver.o obj/builder.o /home/myname/Scratch/Software/libraries/htslib/1.9-GCC7/lib/libhts.a /home/myname/Scratch/Software/libraries/boost/1_67_0-GCC7/lib/libboost_iostreams.a /home/myname/Scratch/Software/libraries/boost/1_67_0-GCC7/lib/libboost_program_options.a -o bin/shapeit4 -lz -lbz2 -lm -lpthread -llzma -lcurl -lcrypto
obj/haplotype_set.o: In function `void std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_construct<char*>(char*, char*, std::forward_iterator_tag) [clone .isra.136]':
haplotype_set.cpp:(.text+0x63): undefined reference to `std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_create(unsigned long&, unsigned long)'
obj/haplotype_set.o: In function `std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > string_utils::str<double>(double, int) [clone .isra.138] [clone .constprop.503]':
haplotype_set.cpp:(.text+0x381): undefined reference to `VTT for std::__cxx11::basic_ostringstream<char, std::char_traits<char>, std::allocator<char> >'
# ... a bunch of other undefined reference follow
positional_options.cpp:(.text._ZNSt6vectorINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESaIS5_EE14_M_fill_insertEN9__gnu_cxx17__normal_iteratorIPS5_S7_EEmRKS5_[_ZNSt6vectorINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESaIS5_EE14_M_fill_insertEN9__gnu_cxx17__normal_iteratorIPS5_S7_EEmRKS5_]+0x6cf): undefined reference to `std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_assign(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)'
collect2: error: ld returned 1 exit status
make: *** [bin/shapeit4] Error 1

I also tried the suggestions in #7, but to no avail. After some googling I found that these errors could be because different libraries were compiled with different compiler versions, but that doesn't seem to be the case here?

ldd libboost_iostreams.so
	linux-vdso.so.1 =>  (0x00007ffe3fb32000)
	librt.so.1 => /lib64/librt.so.1 (0x00007f98d62a8000)
	libz.so.1 => /lib64/libz.so.1 (0x00007f98d6092000)
	libbz2.so.1 => /lib64/libbz2.so.1 (0x00007f98d5e82000)
	liblzma.so.5 => /lib64/liblzma.so.5 (0x00007f98d5c5b000)
	libstdc++.so.6 => /home/myname/Scratch/Software/Compilers/gcc/7.1.0/lib64/libstdc++.so.6 (0x00007f98d58d9000)
	libm.so.6 => /lib64/libm.so.6 (0x00007f98d55d7000)
	libgcc_s.so.1 => /home/myname/Scratch/Software/Compilers/gcc/7.1.0/lib64/libgcc_s.so.1 (0x00007f98d53bf000)
	libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f98d51a3000)
	libc.so.6 => /lib64/libc.so.6 (0x00007f98d4de0000)
	/lib64/ld-linux-x86-64.so.2 (0x000055a40c6da000)

I'm not sure what else to try. Has anyone experienced the same trouble and can share a solution here?

Many thanks,
Pedro

EDIT:

Boost version is 1_67_0.

$ g++ --version
g++ (GCC) 7.1.0
Copyright (C) 2017 Free Software Foundation, Inc.

Problems with shapeit output

Hi, Prof. I wonder how shapeit deals with missing data. More specifically, if a SNP is coded as "0 0 0" in gens format, what will be output in haps format? Will it be discarded, or handled in some other way?
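
To find which samples carry such calls, the "0 0 0" triplets can be located directly. The Python sketch below assumes the usual layout of a .gen line, five leading columns followed by three genotype probabilities per sample; it only identifies the missing triplets and does not say how shapeit treats them. The function name `missing_samples` is hypothetical.

```python
def missing_samples(gen_line, n_leading=5):
    """Return 0-based sample indices whose probability triplet is all zeros."""
    probs = gen_line.split()[n_leading:]
    missing = []
    # Walk the probabilities three at a time, one triplet per sample.
    for i in range(0, len(probs) - 2, 3):
        if all(float(p) == 0.0 for p in probs[i:i + 3]):
            missing.append(i // 3)
    return missing
```

For a line such as `20 rs1 1000 A G 1 0 0 0 0 0 0 1 0`, the second sample (index 1) is reported as missing.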

Q: Feasibility of a phasing experiment

Hi,

for a study, we are trying to get haplotypes for the gene ERAP1 for some human samples. I have genotypes from exome sequencing data for 6 SNPs described in the literature. We could phase some SNPs with RNA-seq data that we also have, but the distances are still too large for the insert size.

When trying to run shapeit4 (from bioconda), I run into the same problem as #1; see the log below. The overall region is >~ 100 kb, so I guess 6 SNPs are not enough.
I have not checked yet, but I could see whether there are any other SNPs in this region.

So here are my questions:

  1. Is this type of analysis possible with shapeit4?
  2. Would it help to genotype the whole of chr5 instead of just the region we are interested in?
  3. Adding pre-phased info from the RNA-seq wouldn't hurt, would it?
  4. Hapmap3 does have phasing data for the ERAP SNPs available, so I guess I could use it. Not sure about the format, though [1]. Is there a better resource?

Kind regards,
Clemens

[1] ftp://ftp.hapmap.org/hapmap/phasing/2009-02_phaseIII/HapMap3_r2/CEU/UNRELATED/

$ bash 01_impute.sh 

SHAPEIT
  * Author        : Olivier DELANEAU, University of Lausanne
  * Contact       : [email protected]
  * Version       : 4.0.0
  * Run date      : 27/11/2019 - 16:35:15

Files:
  * Input VCF     : [../analysis/eraps.vcf.gz]
  * Genetic Map   : [shapeit4/maps/chr5.b37.gmap.gz]
  * Output VCF    : [phased.vcf.gz]

Parameters:
  * Seed    : 1234
  * Threads : 1 threads
  * MCMC    : 15 iterations [5b + 1p + 1b + 1p + 1b + 1p + 5m]
  * PBWT    : Store indexes every 8 variants
  * PBWT    : Depth of PBWT neighbours to condition on: 4
  * HMM     : K is variable / min W is 2.00Mb / Ne is 15000

Initialization:
  * VCF/BCF scanning [N=216 / L=7 / Reg=5] (0.02s)
  * VCF/BCF parsing [Hom=60.5% / Het=39.5% / Mis=0.0%] (0.01s)
  * GMAP parsing [n=215414] (0.14s)
  * cM interpolation [s=6 / i=1] (0.00s)
  * HAP update (0.00s)
  * H2V transpose (0.00s)
  * IBD2 mask [l=120 / n=0] (0.00s)
  * PBWT phase sweep (0.00s)
  * Build genotype graphs [seg=289] (0.00s)

Burn-in iteration [1/5]
  * V2H transpose (0.00s)
  * PBWT selection (0.00s)
  * C2H transpose (0.00s)
  * HMM computations [K=8.0+/-0.2 / W=0.12Mb] (0.00s)
  * HAP update (0.00s)
  * H2V transpose (0.00s)

Burn-in iteration [2/5]
  * V2H transpose (0.00s)
  * PBWT selection (0.00s)
  * C2H transpose (0.00s)
  * HMM computations [K=8.0+/-0.2 / W=0.12Mb] (0.00s)
  * HAP update (0.00s)
  * H2V transpose (0.00s)

Burn-in iteration [3/5]
  * V2H transpose (0.00s)
  * PBWT selection (0.00s)
  * C2H transpose (0.00s)
shapeit4: src/phaser/phaser_algorithm.cpp:50: void phaser::phaseWindow(int, int): Assertion `threadData[id_worker].Kvec[w].size()>0' failed.
01_impute.sh: line 7: 32656 Aborted                 (core dumped) shapeit4 --input ../analysis/eraps.vcf.gz --map shapeit4/maps/chr5.b37.gmap.gz --region 5 --output phased.vcf.gz --window 2000000 --seed 1234

whatshap phase set variants

Hi!
First, let me thank you for releasing and maintaining shapeit4, and for setting aside time to deal with what usually amounts to user error on the part of the community (which I will likely be guilty of in a second!).

I've been curious about using both statistical and physical phasing to get at the phase of rare variants. Shapeit4 seems to have some... features (that include the inducement of genotyping error from phasing) that caused me to write my own wrapper and clean-up script for it, and in so doing I believe I found a bug in how the phase set genotypes are treated. From what I gather the "known" phase of any genotype in a phase-set is only valid within that set; it may not be valid between sets. However in the few thousand phase-sets I've assessed every single one has the same orientation pre-shapeit4 as well as post shapeit4. This to me seems a bit wrong. At a high level, phasing should seek some path in the possible maternal/paternal orientation of genotypes (in the absence of a phase set), or in a collection of locally phased genotypes from some phase set. Instead what seems to be happening is that the orientation from the phase-set is treated as ground truth for the entire chromosome, which should have the effect of inducing switch errors at the phase-set boundaries as no information (other than that provided by LD) is available to say how the two phase-sets are phased to each other.

Said in another way (in ascii art), if the post whatshap pre shapeit4 vcf file for a single person with two phase sets (PS1 and PS2) is set up as:

            (PS1)  (PS2)

HaplotypeA   1 1    0 0
HaplotypeB   0 0    1 1

If the phase-set error rate is 0, one would expect statistical phasing to choose between 1100 and 1111 for HaplotypeA, because the phase sets only state that sites 1 and 2 are in phase with each other and that sites 3 and 4 are in phase with each other (they say nothing about sites 2 and 3, for example). What seems to be happening instead is that only the sites not in phase sets are being phased.

My apologies if this is in error! And thanks for your continued support for the program!
-August

Compiling does not work with GCC 4.8.2

Hi,

Today I tried to install shapeit4 on our university's HPC cluster. Using GCC 4.8.2, HTSLIB 1.7 and BOOST 1.63.0, compilation failed with an error message:

[sfux@eu-c7-050-15 shapeit4]$ make
g++ -std=c++0x -O3 -c src/containers/genotype_set.cpp -o obj/genotype_set.o -Isrc -I/cluster/apps/gdc/samtools/1.7/include -I/cluster/apps/gdc/boost/1.63.0/include
In file included from src/utils/otools.h:64:0,
                 from src/containers/genotype_set.h:25,
                 from src/containers/genotype_set.cpp:22:
src/utils/timer.h: In member function ‘std::string timer::date()’:
src/utils/timer.h:59:12: error: ‘put_time’ is not a member of ‘std’
      ss << std::put_time(std::localtime(&in_time_t), "%d/%m/%Y - %X");
            ^
make: *** [obj/genotype_set.o] Error 1
[sfux@eu-c7-050-15 shapeit4]

The documentation does not indicate which GCC versions are supported and should work. Please add this to the documentation; clearer requirements would make the software much easier to install.

And please also make a versioned release. This repository has 0 releases. If scientists use your software and afterwards publish results, they need to refer to the software they used. To keep people from writing sentences like "the results were computed using shapeit4 commit 4ae1750", it would be nice to have a release with a version number.

Best regards

Sam
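
A quick compiler check can catch this before running make. The Python sketch below parses the first line of `g++ --version` and compares it with GCC 5, the first release series whose libstdc++ is understood to provide `std::put_time`; that threshold is stated here as an assumption, not taken from the shapeit4 documentation. The function names are hypothetical.

```python
import re


def gcc_version(version_output):
    """Extract (major, minor, patch) from the first line of `g++ --version`."""
    first_line = version_output.splitlines()[0]
    # The version follows the parenthesised vendor string, e.g. "(GCC) 7.1.0".
    match = re.search(r"\)\s+(\d+)\.(\d+)\.(\d+)", first_line)
    if match is None:
        raise ValueError("could not find a version number")
    return tuple(int(part) for part in match.groups())


def has_put_time(version_output):
    """Assumption: std::put_time is available from GCC 5.0.0 onward."""
    return gcc_version(version_output) >= (5, 0, 0)
```

With a first line reporting GCC 4.8.2 this returns `False`, consistent with the compile error shown in this post.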

Segmentation fault

Hello, I hit a segmentation fault at the very beginning of a Shapeit4 run.

shapeit4 --input 22.snps.vcf.gz --map chr22.b37.gmap.gz --region 22 --use-PS 0.0001 --thread 4 --log 22.phased.log --output 22.phased.vcf

SHAPEIT
  * Author        : Olivier DELANEAU, University of Lausanne
  * Contact       : [email protected]
  * Version       : 4.1.3
  * Run date      : 15/10/2020 - 10:13:30

Files:
  * Input VCF     : [22.snps.vcf.gz]
  * Genetic Map   : [chr22.b37.gmap.gz]
  * Output VCF    : [22.phased.vcf]
  * Output LOG    : [22.phased.log]

Parameters:
  * Seed    : 15052011
  * Threads : 4 threads
  * MCMC    : 15 iterations [5b + 1p + 1b + 1p + 1b + 1p + 5m]
  * PBWT    : Depth of PBWT neighbours to condition on: 4
  * PBWT    : Store indexes at variants [MAC>=2 / MDR<=0.5 / Dist=0.02 cM]
  * HMM     : K is variable / min W is 2.50cM / Ne is 15000
  * HMM     : Recombination rates given by genetic map
  * HMM     : Inform phasing using VCF/PS field / Error rate of PS field is 0.0001
  * HMM     : !AVX2 optimization inactive!
  * IBD2    : length>=3.00cM [N>=150 / MAF>=0.010 / MDR<=0.500]

Initialization:
  * VCF/BCF scanning [N=2 / L=70040 / Reg=22] (0.53s)
  * VCF/BCF parsing [Hom=31.5% / Het=68.2% / Pha=9.031% / Mis=0.3%] (0.53s)
  * GMAP parsing [n=45329] (0.06s)
  * cM interpolation [s=17762 / i=52278] (0.01s)
  * PBWT indexing [l=2769] (0.00s)
  * HAP update (0.00s)
  * H2V transpose (0.00s)
  * IBD2 constraints [#inds=0 / #pairs=0] (0.01s)
  * PBWT phase sweep (0.01s)
  * Build genotype graphs [seg=31841] (0.00s)

Burn-in iteration [1/5]
  * V2H transpose (0.00s)
Segmentation fault (core dumped)

I only have two samples here. If I need a reference panel, which one is best? Could you provide a link to download it?

Thanks!

Phasing with whatshap phased variants

I've phased variants from about 600 samples using whatshap with a pedigree file. About 100 of these samples are in trios, so the majority of their variants were phased by whatshap. The remaining 500 samples are poorly phased because trio information is lacking (the parents were not sequenced).

I'm trying to use Shapeit4 for statistical phasing to improve the phasing of these 500 non-trio samples. All 600 samples come from a small isolated village, so I'd like to see whether the haplotypes resolved from the 100 trio samples can improve the phasing of the non-trio samples. The basic assumption is that haplotypes are broadly shared among the samples.

How should I set this up with Shapeit4? Could I just use the whatshap-phased VCF file as input, and will Shapeit4 then use the phasing information automatically? Or should I prepare a scaffold VCF file from the phased variants of the 100 trio samples and provide it to Shapeit4 with the "--scaffold" parameter? If the latter, how exactly should I prepare this scaffold file? Should I prepare a "scaffold" file or a "reference panel" from these 100 trio samples?

Thanks.

core dumped at genotype_reader2.cpp:127

Hi, I am using shapeit4, but it aborts with a core dump during initialization.

The error log is:

SHAPEIT
  * Author        : Olivier DELANEAU, University of Lausanne
  * Contact       : [email protected]
  * Version       : 4.1.3
  * Run date      : 28/10/2020 - 23:41:08

Files:
  * Input VCF     : [22.snps.VQSR.vcf.gz]
  * Reference VCF : [HRC.r1-1.GRCh37.wgs.mac5.sites.bcf]
  * Genetic Map   : [chr22.b37.gmap.gz]
  * Output VCF    : [22.phased.vcf]
  * Output LOG    : [22.phased.log]

Parameters:
  * Seed    : 15052011
  * Threads : 4 threads
  * MCMC    : 15 iterations [5b + 1p + 1b + 1p + 1b + 1p + 5m]
  * PBWT    : Depth of PBWT neighbours to condition on: 4
  * PBWT    : Store indexes at variants [MAC>=2 / MDR<=0.5 / Dist=0.02 cM]
  * HMM     : K is variable / min W is 2.50cM / Ne is 15000
  * HMM     : Recombination rates given by genetic map
  * HMM     : Inform phasing using VCF/PS field / Error rate of PS field is 0.0001
  * HMM     : !AVX2 optimization inactive!
  * IBD2    : length>=3.00cM [N>=150 / MAF>=0.010 / MDR<=0.500]

Initialization:
  * VCF/BCF scanning [Nm=2 / Nr=0 / L=39458 / Reg=22] (1.02s)
shapeit4: src/io/genotype_reader2.cpp:127: void genotype_reader::readGenotypes1(std::string, std::string): Assertion `ngt_ref == 2 * n_ref_samples' failed.
Aborted (core dumped)

Semicolon missing in phaser_algorithm.cpp

Hi,

I tried to install the shapeit4 release 4.1.2 and would like to report some tiny issues that one needs to fix to get the software compiled.

In src/phaser/phaser_algorithm.cpp, there is a semicolon missing at the end of line 51, which causes the error:

src/phaser/phaser_algorithm.cpp: In member function ‘void phaser::phaseWindow(int, int)’:
src/phaser/phaser_algorithm.cpp:52:3: error: expected ‘;’ before ‘}’ token

Another small thing:

I had to replace all calls to isnan() with std::isnan() to avoid the error message:

call of overloaded ‘isnan(double&)’ is ambiguous
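As a minimal illustration of the qualified call (a sketch, not SHAPEIT4 code): an unqualified isnan() can become ambiguous when both the C library's ::isnan and the std::isnan overloads from <cmath> are visible, for example under a using-directive for namespace std. Writing std::isnan names the C++ overload set directly. The helper replace_nan is hypothetical and exists only for this example.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not a SHAPEIT4 function): replace every NaN in
// `values` with `fallback` and return how many entries were replaced.
std::size_t replace_nan(std::vector<double>& values, double fallback) {
    assert(!std::isnan(fallback));  // the replacement itself must be a number
    std::size_t replaced = 0;
    for (std::size_t i = 0; i < values.size(); ++i) {
        // Qualified call: always resolves to the <cmath> overload for double.
        if (std::isnan(values[i])) {
            values[i] = fallback;
            ++replaced;
        }
    }
    return replaced;
}
```

The same one-word change (isnan to std::isnan) applies wherever the compiler reports the ambiguity.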

Best regards

Sam

ERROR: Could not find conditioning haplotypes

Shapeit4 gave me an error on one chromosome:

ERROR: Could not find conditioning haplotypes for [sample_137739] / check options --pbwt-* and --ibd2-*

shapeit4 --input DNA.phased.chr3.bcf --map chr3.b38.gmap.gz --region chr3 --output DNA.shapeit4phased.chr3.vcf --use-PS 0.0001 --sequencing --scaffold DNA.phased.chr3.trios.bcf

How should I change the --pbwt-* and --ibd2-* parameters?

Phasing multiallelic variants

Hi there, does shapeit4 currently support phasing multiallelic variants (2 or more ALT alleles) or are only biallelic variants supported?

Many thanks.

How to prepare reference panel?

I have a set of phased SNPs from a cohort of samples and want to turn it into a reference panel for Shapeit4 phasing. How should I do this? Thanks.
