mfumagalli / ngstools Goto Github PK

View Code? Open in Web Editor NEW

158.0 158.0 64.0 167 KB

Programs to analyse NGS data for population genetics purposes

License: GNU General Public License v3.0

Makefile 1.00% R 90.93% Shell 8.07%

ngstools's People

Contributors

Stargazers

Watchers

ngstools's Issues

running ngsTools on Plasmodium falciparum data

I would like to know if its possible to run ngsTools on P. falciparum parasite ILLUMINA data. Thanks

covar vs. cover

In:

Rscript scripts/plotPCA.R -i test.cover -c 1-2 -a test.pops.clst -o test.pca.pdf

test.cover should be test.covar to be consistent with previous tutorial syntax.

How does ANGSD handle some samples without coverage?

If we have a region in which some samples do not have coverage (e.g. due to presence/absence variation), how will ANGSD handle this? Will it calculate diversity given the samples it has coverage for? Or omit the region entirely?

Is there interest in adding container definition to documentation?

I am currently helping somebody to build the code and we solved some build issues by building the tool set into a Singularity/Apptainer container:

Bootstrap: docker
From: ubuntu:20.04

%post
    export DEBIAN_FRONTEND=noninteractive
    apt-get update -y

    apt install -y git build-essential pkg-config
    apt install -y libz-dev libbz2-dev liblzma-dev libcurl4-openssl-dev libssl-dev libgsl-dev

    git clone --recursive https://github.com/mfumagalli/ngsTools.git
    cd ngsTools
    git checkout 6505f80
    git submodule update --init --recursive
    make

%environment
    export LC_ALL=C

%runscript
    export PATH=/ngsTools/angsd:$PATH
    export PATH=/ngsTools/ngsDist:$PATH
    export PATH=/ngsTools/ngsF:$PATH
    export PATH=/ngsTools/ngsF-HMM:$PATH
    export PATH=/ngsTools/ngsLD:$PATH
    export PATH=/ngsTools/ngsPopGen:$PATH
    export PATH=/ngsTools/ngsSim:$PATH
    export PATH=/ngsTools/ngsUtils:$PATH

    $@

%labels
    Author Radovan Bast

%help
    This is how I use this container image:
    $ ./container.sif ngsDist [other arguments]
    $ ./container.sif [some other NGS tool]

If there is interest, I could submit it to the README or somewhere else.

error while running plotPCA.R

Dear Dr Fumagalli,

I'm trying to use your plotPCA.R script through:
Rscript --vanilla --slave pca.R -i ${in}.covMat -c 1-2 -a ${bamclst}.clst -o ${out}.pdf

with a .covMat file I generated from the following angsd command:

angsd -bam ${bamlist} -out ${out} -nThreads 8 \
      -minMapQ 20 -minQ 20 -minInd ${minInd} -doCounts 1 \
      -GL 1 -doMajorMinor 1 -doMaf 1 -minMaf 0.05 -skipTriallelic 1 \
      -SNP_pval 1e-6 -doHWE 1 -minHWEpval 0.05 -doPost 2 -postCutoff 0.95 \
      -doGeno 4 -geno_minDepth 5 -doGlf 2 -doIBS 1 -doCov 1

But I get the following error message :

Error in eigen(covar, symm = TRUE) : infinite or missing values in 'x'
Execution halted

I don't understand the issue because I don't have any missing data in my .covMat file.
Do you know what this message means?

Best regards,
Manon

TESTS FAILED

Hi. I installed the ngsTools on my ubuntu machine using the make command. I run the make test and some of the tests failed. Is there anything to be concerned about ? Below is the log

make[1]: Entering directory '/home/user/ngsTools/ngsSim'
ngsSim: All tests OK!
make[1]: Leaving directory '/home/user/ngsTools/ngsSim'
make[1]: Entering directory '/home/user/ngsTools/ngsPopGen'
ngsPopGen: All tests OK!
make[1]: Leaving directory '/home/user/ngsTools/ngsPopGen'
make[1]: Entering directory '/home/user/ngsTools/ngsUtils'
ngsUtils: All tests OK!
make[1]: Leaving directory '/home/user/ngsTools/ngsUtils'
make[1]: Entering directory '/home/user/ngsTools/ngsDist'
ngsDist: All tests OK!
make[1]: Leaving directory '/home/user/ngsTools/ngsDist'
make[1]: Entering directory '/home/user/ngsTools/ngsLD'
ngsLD: test(s) failed!
make[1]: Leaving directory '/home/user/ngsTools/ngsLD'
make[1]: Entering directory '/home/user/ngsTools/ngsF'
ngsF: All tests OK!
make[1]: Leaving directory '/home/user/ngsTools/ngsF'
make[1]: Entering directory '/home/user/ngsTools/ngsF-HMM'
ngsF-HMM: test(s) failed!
make[1]: Leaving directory '/home/user/ngsTools/ngsF-HMM'

ngsCovar error

Hi @mfumagalli, @fgvieira

I have run doGeno 32 for each chromosome with specified high quality sites (filtered by depth, SNP_pvalue, hwe_pvalue, SB3, and so on)
angsd -bam bam.list -only_proper_pairs 1 -uniqueOnly 1 -remove_bads 1 -minQ 20 -minMapQ 30 -C 50 -ref ref.fa -r $i -out $i -doMaf 1 -doMajorMinor 1 -doGeno 32 -doPost 1 -GL 1 -doCounts 1 -P 5 -doGlf 2 -sites $i.site
but when I do ngsCovar -probfile test.geno -outfile test.covar -nind $minindividual -nsites $sites -call 0 -norm 0 for one chromosome according to the tutorial, it just give me an error like following,
-> Possible error read GENO, binary file might be broken...
I do not know why because everything proceeded very smoothly except ngsCovar. I need your help.

Many thanks.
Best

Rewriting history

Howdy, and thanks for putting this on github, I'm a fan. Appears you've been rewriting commit history, which causes merge conflicts when I try to git pull to get newest changes to code. See http://git-scm.com/book/ch6-4.html .

Cheers,

Jeff

never ending ngsF

Hello,
I was rellay happy to see ngstool working in combnation wit ANGSD. I made my first steps with ANGSt recently , and tested ngsF. I am working n yeast and I was interested in estimating selfing rates for each strain: as well for life cycle than from genotyping.
I have downloaded ngstools from Github and compiled it on my Mac OSX 10.9.5, with some error messages but when launching the programs , it apparently works.

1- I managed to launch ngsF with my data, , and it looks like it is calculating but it does not stop after one night with only 123000 variant positions . I thought that this might come from a too big dataset, so that I filtered the SNPs for calling and min quality which turned the data set down to 53681 snps and still I do not see any change. Last I added the “quick “ or "-fast_lkl" option and it does the same.
here are the command lines and outputs

2- I looked back to install data which indicate in the case of ngsF make and then perform 'make test': this one answer test(s) failed! .
However I launched the test scripts after simulating the data with ngsSim and it started also a run … but apparently never ending….

It would be nice of you if you can tell me where the problem might come from!.
Best regards.
JL

zcat 20vintest.glf.gz | ./ngsF -n_ind 20 -n_sites 53681 -glf - -min_epsilon 0.1 -out 20vin.ttest3 -seed 1239 -init_values r -fast_lkl
==> Input Arguments:
glf file: -
init_values: r
out file: 20vin.ttest3
n_ind: 20
n_sites: 53681
chunk_size: 100000
fast_lkl: true
approx_EM: false
call_geno: false
max_iters: 1500
min_epsilon: 0.1000000000
n_threads: 1
seed: 1239
quick: false
version: 1.0.0
verbose: 1

==> Fewer sites (53681) than chunk_size (100000). Reducing chunk size to match number of sites
==> Analysis will be run in 1 chunk(s)
==> Using native I/O library
==> Setting initial values

Iteration 1:
......

ERROR DOWNLOADING DATA SET FOR THE TUTORIAL

I tried downloading the data set for the tutorial and I got some error below is the two lines of the error messages. Please advice

mkdir: cannot create directory ‘Data/LWK.BAMs’: No such file or directory
LWK
/NA19313.mapped.ILLUMINA.bwa.LWK.low_coverage.20120522.bam
./data.sh: line 48: Data/LWK.BAMs/NA19313.mapped.ILLUMINA.bwa.LWK.low_coverage.20120522.bam: No such file or directory
/NA19331.mapped.ILLUMINA.bwa.LWK.low_coverage.20120522.bam

I realized that the error was coming from the line below. please advice
$SAMTOOLS view -h ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/$i $CHROM:$START-$END > Data/$POP.BAMs$NAME 2> /dev/null

How to apply inbreeding coeffs (F) in Angsd?

Hi, if i use ngsF to calculate F for individuals in my population, where do I use it as a prior in Angsd? There seems to be no documentation on this.

cheers
DK

cannot convert vcf file to beagle binary format using the -vcf-gl option

Hello,

I did SNP calling with GATK (HaplotypeCaller, CombineGVCFs and GenotypeGVCFs) and as a result I have a final with all snps after applying several filters.

Now I would like to use this matrix of snps to study population structure (do a PCA with ngsCovar) and individual admixture proportions (with NGSadmix). It is my understanding that in order to do this I need to convert my vcf file into beagle binary format, which I could do in angsd using the -vcf-gl option.

I am able to run the command (see below) without getting errors but it seems it is not generating the correct output file.

ANGSD=/path to angsd

REF=genome.fasta.fai

$ANGSD -vcf-gl ./snps.vcf.gz -GL 0 -out ./snps.beagle -fai $REF -doGlf 2 -doMajorMinor 1 -doMaf 1 -nind 43

This is what I see when I check the run log file:

BAD SITE chr24:21289418. return code:-1 while fetching GL tag rec->rid:23
BAD SITE chr24:21289424. return code:-1 while fetching GL tag rec->rid:23
BAD SITE chr24:21289425. return code:-1 while fetching GL tag rec->rid:23
BAD SITE chr24:21289431. return code:-1 while fetching GL tag rec->rid:23
BAD SITE chr24:21289433. return code:-1 while fetching GL tag rec->rid:23
BAD SITE chr24:21289470. return code:-1 while fetching GL tag rec->rid:23
BAD SITE chr24:21289475. return code:-1 while fetching GL tag rec->rid:23
BAD SITE chr24:21289525. return code:-1 while fetching GL tag rec->rid:23
-> Done reading data waiting for calculations to finish
-> Done waiting for threads
-> Output filenames:
->"./snps.beagle.arg"
->"./snps.beagle.beagle.gz"
->"./snps.beagle.mafs.gz"
-> Mon May 15 15:43:02 2023
-> Arguments and parameters for all analysis are located in .arg file
-> Total number of sites analyzed: 0
-> Number of sites retained after filtering: 0
[ALL done] cpu-time used = 134.35 sec
[ALL done] walltime used = 135.00 sec

I'm not really sure what the problem is and how to fix it. Any advice on this?

Thank you!!

Make test(s) failure

Hi there!

I would appreciate some help with installing ngsTools. After cloning latests version, running make and make tests, I get the following (on two completely different servers, CentOS 7.4.1708 and Ubuntu 16.04.3 LTS):

make[1]: Entering directory /storage/rafal.gutaker/bin/INSTALLS/ngsTools/ngsSim' ngsSim: test(s) failed! make[1]: Leaving directory /storage/rafal.gutaker/bin/INSTALLS/ngsTools/ngsSim'
make[1]: Entering directory /storage/rafal.gutaker/bin/INSTALLS/ngsTools/ngsPopGen' ngsPopGen: test(s) failed! make[1]: Leaving directory /storage/rafal.gutaker/bin/INSTALLS/ngsTools/ngsPopGen'
make[1]: Entering directory /storage/rafal.gutaker/bin/INSTALLS/ngsTools/ngsUtils' ngsUtils: test(s) failed! make[1]: Leaving directory /storage/rafal.gutaker/bin/INSTALLS/ngsTools/ngsUtils'
make[1]: Entering directory /storage/rafal.gutaker/bin/INSTALLS/ngsTools/ngsDist' ngsDist: test(s) failed! make[1]: Leaving directory /storage/rafal.gutaker/bin/INSTALLS/ngsTools/ngsDist'
make[1]: Entering directory /storage/rafal.gutaker/bin/INSTALLS/ngsTools/ngsF' ngsF: All tests OK! make[1]: Leaving directory /storage/rafal.gutaker/bin/INSTALLS/ngsTools/ngsF'
make[1]: Entering directory /storage/rafal.gutaker/bin/INSTALLS/ngsTools/ngsF-HMM' ngsF-HMM: test(s) failed! make[1]: Leaving directory /storage/rafal.gutaker/bin/INSTALLS/ngsTools/ngsF-HMM'

Any help would be much welcome!

Best,
Rafal

How to apply Plink format in ngsTools (ped/map data as input)

Dear all,

I use angsd and ngsTools to do PCA analysis and I found the PCA analysis (by angsd) are not similar with my previous study (by Plink). I tried lots of parameters in angsd and found the results are highly depended on the angsd parameters which are mainly about quality control. So I am wondering how I could apply plink output (ped/map data) to ngsTools as input?

Best,
Yafei

Conda recipies?

Hi,

I wanted to install the NGSutils in my conda environment, on a external cluster. I am afraid that the "make" process did not work then. Is there any way to install it using conda? Or are there alternative ways to install in external?

Thank you,

How to use plotSFS.R for SFS plot?

Dear mfumagalli,

Now, I have three population SFS files, and want to use plotSFS.R to plot a figure of SFS. However, I dont't know how to do.
I tried to use command R CMD BATCH plotSFS.R pop1.sfs pop2.sfs pop3.sfs , but with false output file named NA.

The formate of input file :
1501016844.282213 305197949.866378 94107367.693417 28784361.230037 12232819.391580 8848907.018584 4452457.953427 6221102.970627 2647028.357635 3060544.718254

Hope your answer.
Thank you very much!

Sun

cannot estimate fst due problem with size of dimension of 2d SFS

Hi there,

I've been following this tutorial (which is great btw!) without any problem until now :(
For some reason I am not able to estimate fst.

I have what I believe are 2 different populations (the whole point of doing these analyses is to actually see if they are different or not): pop1 (n = 10) and pop2 (n =10)

So far, this is what I've done:

1 -> calculate SAF:
(I'm using my reference genome as the ancestral sequence to polarise the alleles)
FILTERS="-minMapQ 30 -minQ 20 -uniqueOnly 1 -remove_bads 1 -baq 1 -only_proper_pairs 1 -trim 0 -C 50 -maxDepth 500 -minInd 10 -setMinDepth 20 -setMaxDepth 100"
$ANGSD -b pop1.bam.list -anc $ANC -ref $REF $FILTERS -GL 1 -doCounts 1 -doSaf 1 -out pop1
$ANGSD -b pop2.bam.list -anc $ANC -ref $REF $FILTERS -GL 1 -doCounts 1 -doSaf 1 -out pop2

2 -> calculate SFS:
$realSFS -bootstrap 100 pop1.saf.idx > pop1.sfs
$realSFS -bootstrap 100 pop2.saf.idx > pop2.sfs

Then I checked the number of values by doing:
awk -F' ' '{print NF; exit}' pop1.sfs
awk -F' ' '{print NF; exit}' pop2.sfs
awk -F' ' '{print NF; exit}' pop1.pop2.sfs

and I get 21 (2N+1), 21, and 441 respectively

3 -> calculate 2d SFS:
$realSFS -bootstrap 100 pop1.saf.idx pop2.saf.idx > pop1.pop2.sfs

now I'm trying to calculate the fst by doing:
$realSFS fst index pop1.saf.idx pop2.saf.idx -sfs pop1.pop2.sfs -fstout pop1.pop2
$realSFS fst stats pop1.po2.fst.idx

but I get this error:
-> args: tole:0.000000 nthreads:4 maxiter:100 nsites(block):0 start:pop1.pop2.sfs chr:(null) start:-1 stop:-1 fstout:pop1.pop2 oldout:0 seed:-1 bootstrap:0 resample_chr:0 whichFst:0 fold:0 ref:(null) anc:(null)
-> nSites: 100000
-> IMPORTANT: please make sure that your saf files haven't been folded with -fold 1 in -doSaf in angsd
-> [reynoldFst] sfs1:20 sfs2:20 dimspace:441
-> generating offset remapper lookup
-> isSame:1 adjusting foldfactors
-> Reading: pop1.pop2.sfs assuming counts (will normalize to probs internally)
-> Pxroblem with size of dimension of prior 441 vs 44100

I can't figure out what I've done wrong... Any ideas?

Thank you!

Marcela

how to prepare sfs input file?

I used below code to run, it printed a error "Possible error,binaryfiles might be broken"
/home/lvfh/Tools-Reseq/ngsTools/ngsPopGen/ngsCovar -probfile test.pops.geno -outfile test.covar -nind 10 -nsites 10000 -block_size 2000 -call 0 -sfsfile test.sfs.ml.norm
So, i used an another code "/home/lvfh/Tools-Reseq/ngsTools/ngsPopGen/ngsCovar -probfile test.pops.geno -outfile test.covar -nind 10 -nsites 10000 -block_size 2000 -call 0". i got a right results.
I used the following commands to output .sfs files :
"angsd -b bam.filelist -anc chimpHg19.fa -out test -P 5 -r 1: -GL 1 -doSaf 1" "realSFS test.saf.idx -maxIter 100 -P 5 > test.sfs" .

Thetas on a single individual BAM

After running a set of BAM files from multiple individuals through ANGSD, I'd like to take a closer look at summary statistics calculated by ANGSD for a single (inbred) individual BAM to assess some possible earlier technical issues with our mapping steps (I work with plants, so this is quite tricky). I know this is outside of what ANGSD is designed to do, but I'd like to keep our workflow the same as to avoid adding other complexities. Is this possible, or is more than one sample needed with ANGSD?

PCA analysis with unlinked sites

I was wondering why the tutorial is not making a PCA on pruned or unlinked sites in the TUTORIAL.md. It is mentioned in the inbreeding part, but not for PCA. How would this be performed?

ngsTools for polyploid

Hi, is it possible to use ngsTools for simulation of sequence data in polyploid?
Thanks

How to interpret the output from plotQC.R

I want to begin with thanking you for the very helpful angsd tutorial and scripts, they have been so useful.

I have run your code plotQC.R successfully on my samples but I am struggling with how to interpret the pdf output and how I can use it to make decisions about the settings for the -setMinDepth and -setMaxDepth flags as that is not really explained in the tutorial.

I have looked far and wide but have not found good explanations behind how people are choosing values for these flags and the angsd documentation is not very helpful. There it says
-setMinDepth [int]: Discard site if total sequencing depth (all individuals added together) is below [int]. Requires -doCounts
-setMaxDepth [int]: Discard site if total sequencing depth (all individuals added together) is above [int] -doCounts
But gives no info on how to figure out what settings might make sense.

Choi et al 2019 say:
-setMinDepth to be one-third the average genome-wide coverage, and–setMaxDepth to be 2.5 times the average genome-wide coverage.

I am working with WGS ancient and modern with a mammal that has a good reference genome. My sample coverage is 3-7x and I am running angsd only on chr no x, y or scaffolds.

I realize this may not strictly be catagorized as an issue but thought I would try.
Example of plot output for chr 3 with 28 ancient samples

Installation of NGStools when ANGSD already installed

Hello,
Using the 'git clone' option listed, we downloaded and attempted to install NGStools on our system (Linux, RHEL 6 x86_64), but the make seem to bail when it hit ANGSD, which we already have installed. Can you direct how to install when ANGSD (and its related programs like RealSFS, etc.) are already on a system? I've pasted the error message below in case it is helpful. thanks in advance!

g++ -MM -O3 -D__WITH_POOL__ -I/usr/local/work/workspace/ngsTools/htslib abcTemplate.cpp >abcTemplate.d
g++ -c -O3 -D__WITH_POOL__ -I/usr/local/work/workspace/ngsTools/htslib abcWriteFasta.cpp
g++ -MM -O3 -D__WITH_POOL__ -I/usr/local/work/workspace/ngsTools/htslib abcWriteFasta.cpp >abcWriteFasta.d
g++ -c -O3 -D__WITH_POOL__ -I/usr/local/work/workspace/ngsTools/htslib abcWritePlink.cpp
g++ -MM -O3 -D__WITH_POOL__ -I/usr/local/work/workspace/ngsTools/htslib abcWritePlink.cpp >abcWritePlink.d
g++ -c -O3 -D__WITH_POOL__ -I/usr/local/work/workspace/ngsTools/htslib abcWriteVcf.cpp
g++ -MM -O3 -D__WITH_POOL__ -I/usr/local/work/workspace/ngsTools/htslib abcWriteVcf.cpp >abcWriteVcf.d
g++ -c -O3 -D__WITH_POOL__ -I/usr/local/work/workspace/ngsTools/htslib analysisFunction.cpp
g++ -MM -O3 -D__WITH_POOL__ -I/usr/local/work/workspace/ngsTools/htslib analysisFunction.cpp >analysisFunction.d
g++ -c -O3 -D__WITH_POOL__ -I/usr/local/work/workspace/ngsTools/htslib angsd.cpp
g++ -MM -O3 -D__WITH_POOL__ -I/usr/local/work/workspace/ngsTools/htslib angsd.cpp >angsd.d
g++ -c -O3 -D__WITH_POOL__ -I/usr/local/work/workspace/ngsTools/htslib bam_likes.cpp
bam_likes.cpp:271: error: expected constructor, destructor, or type conversion before ‘’ token
bam_likes.cpp: In function ‘void bam_likes_init()’:
bam_likes.cpp:275: error: ‘mod’ was not declared in this scope
bam_likes.cpp:275: error: ‘errmod_init’ was not declared in this scope
bam_likes.cpp: In function ‘void bam_likes_destroy()’:
bam_likes.cpp:280: error: ‘mod’ was not declared in this scope
bam_likes.cpp:281: error: ‘errmod_destroy’ was not declared in this scope
bam_likes.cpp: In function ‘void call_bam(chunkyT, double*, int)’:
bam_likes.cpp:288: error: ‘mod’ was not declared in this scope
bam_likes.cpp:292: error: ‘mod’ was not declared in this scope
bam_likes.cpp:343: error: ‘errmod_cal’ was not declared in this scope
make[1]: ** [bam_likes.o] Error 1
make[1]: Leaving directory `/usr/local/work/workspace/ngsTools/angsd'
make: *** [angsd] Error 2

angsd repo link is broken

Hi,

The link to angsd is broken.

I used this repo : https://github.com/ANGSD/angsd

And processed as mentionned in the README.

git clone https://github.com/samtools/htslib.git;
git clone https://github.com/angsd/angsd.git;
cd htsdir;make;cd ../angsd;make HTSSRC=../htslib

It's seem good

sem_t not included in shared.h for some reason

Hello,

Trying to compile your software here on an Ubuntu 16.04 box and for some reason the sem_t type isn't recognized. It's a standard library so I am not sure what the problem is. My version of g++ is: g++ (Ubuntu 7.3.0-21ubuntu1~16.04) 7.3.0.

Do you have any idea what to do? Thank you!

g++ -I./shared  -O3 -Wall -D_FILE_OFFSET_BITS=64 -D_LARGEFILE64_SOURCE -c ./shared/threadpool.c
g++ -I./shared  -O3 -Wall -D_FILE_OFFSET_BITS=64 -D_LARGEFILE64_SOURCE -c parse_args.cpp
g++ -I./shared  -O3 -Wall -D_FILE_OFFSET_BITS=64 -D_LARGEFILE64_SOURCE -c ngsDist.cpp -lgsl -lgslcblas -lm -lz -lpthread
g++ -O3 -Wall -D_FILE_OFFSET_BITS=64 -D_LARGEFILE64_SOURCE *.o -lgsl -lgslcblas -lm -lz -lpthread -o ngsDist
make[1]: Leaving directory '/usr/local/bin/ngsTools/ngsDist'
make[1]: Entering directory '/usr/local/bin/ngsTools/ngsLD'
g++ -I./shared  -O3 -Wall -D_FILE_OFFSET_BITS=64 -D_LARGEFILE64_SOURCE -c ./shared/gen_func.cpp
g++ -I./shared  -O3 -Wall -D_FILE_OFFSET_BITS=64 -D_LARGEFILE64_SOURCE -c ./shared/read_data.cpp
g++ -I./shared  -O3 -Wall -D_FILE_OFFSET_BITS=64 -D_LARGEFILE64_SOURCE -c ./shared/threadpool.c
g++ -I./shared  -O3 -Wall -D_FILE_OFFSET_BITS=64 -D_LARGEFILE64_SOURCE -c parse_args.cpp
g++ -I./shared  -O3 -Wall -D_FILE_OFFSET_BITS=64 -D_LARGEFILE64_SOURCE -c ngsLD.cpp -lgsl -lgslcblas -lm -lz -lpthread
g++ -O3 -Wall -D_FILE_OFFSET_BITS=64 -D_LARGEFILE64_SOURCE *.o -lgsl -lgslcblas -lm -lz -lpthread -o ngsLD
make[1]: Leaving directory '/usr/local/bin/ngsTools/ngsLD'
make[1]: Entering directory '/usr/local/bin/ngsTools/ngsF'
g++ -O3 -Wall -I -I/usr/local/bin/ngsTools/htslib  -D_FILE_OFFSET_BITS=64 -D_LARGEFILE64_SOURCE -D_USE_KNETFILE  -c parse_args.cpp
In file included from parse_args.cpp:3:0:
shared.h:47:2: error: ‘sem_t’ does not name a type; did you mean ‘key_t’?
  sem_t _launch_thread_semaph;
  ^~~~~
  key_t
shared.h:48:2: error: ‘sem_t’ does not name a type; did you mean ‘key_t’?
  sem_t _running_thread_semaph;
  ^~~~~
  key_t
Makefile:28: recipe for target 'parse_args.o' failed
make[1]: *** [parse_args.o] Error 1
make[1]: Leaving directory '/usr/local/bin/ngsTools/ngsF'
Makefile:11: recipe for target 'ngsF' failed
make: *** [ngsF] Error 2

visualization 2DSFS. plot2DSFS.R

hi
Thank you very much for your valuable sharing.
Recently, I had a problem using plot2DSFS.R.
The command I used was as follows,
Rscript plot2DSFS.R HNHZ.YZ.sfs HNHZ-YZ 27-28
But instead of a raster, I got a strange one, as follows

Can you tell me what causes this?
I am looking forward to your reply!

Tutorial command not running

Hello @mfumagalli . I tried following the tutorial given for ngsTools and it could not run. Below is the command I used

$ANGSD -P 36 -b ALL.bamlist -ref $REF -out Results/ALL.qc -uniqueOnly 1 -remove_bads 1 -only_proper_pairs 1 -trim 0 -C 50 -baq 1 -minMapQ 20 -minQ 20 -minInd 15 -setMinDepth 60 -setMaxDepth 400 -doCounts 1 -GL 1 -doMajorMinor 1 -doMaf 1 -skipTriallelic 1 -SNP_pval 1e-3 -doGeno 32 -doPost 1 &> /dev/null

Below is the message that was displayed
_-> angsd version: 0.930 (htslib: 1.9) build(Jan 3 2021 13:10:41)
-> angsd version: 0.930 (htslib: 1.9) build(Jan 3 2021 13:10:41)
-> Please use the website "http://www.popgen.dk/angsd" as reference
-> Use -nThreads or -P for number of threads allocated to the program
Overview of methods:_Below is the message that was displayed
-GL Estimate genotype likelihoods
-doCounts Calculate various counts statistics
-doAsso Perform association study
-doMaf Estimate allele frequencies
-doError Estimate the type specific error rates
-doAncError Estimate the errorrate based on perfect fastas
-HWE_pval Est inbreedning per site or use as filter
-doGeno Call genotypes
-doFasta Generate a fasta for a BAM file
-doAbbababa Perform an ABBA-BABA test
-sites Analyse specific sites (can force major/minor)
-doSaf Estimate the SFS and/or neutrality tests genotype calling
-doHetPlas Estimate hetplasmy by calculating a pooled haploid frequency

Below are options that can be usefull
-bam		Options relating to bam reading
-doMajorMinor	Infer the major/minor using different approaches
-ref/-anc	Read reference or ancestral genome
-doSNPstat	Calculate various SNPstat
-cigstat	Printout CIGAR stat across readlength
many others

Output files:
In general the specific analysis outputs specific files, but we support basic bcf output
-doBcf Wrapper around -dopost -domajorminor -dofreq -gl -dovcf docounts
For information of specific options type:
./angsd METHODNAME eg
./angsd -GL
./angsd -doMaf
./angsd -doAsso etc
./angsd sites for information about indexing -sites files
Examples:
Estimate MAF for bam files in 'list'
'./angsd -bam list -GL 2 -doMaf 2 -out RES -doMajorMinor 1'__

COMPILATION ERROR

Hi , I tried installing ngsTools and I got this error. Please advice

No package 'gsl' found
make[1]: Entering directory `/home/ha/apps/bioinfo/ngsTools/ngsTools/ngsDist'
g++ -I./shared -O3 -Wall -D_FILE_OFFSET_BITS=64 -D_LARGEFILE64_SOURCE -c ./shared/gen_func.cpp
In file included from ./shared/gen_func.cpp:1:0:
./shared/gen_func.hpp:12:25: fatal error: gsl/gsl_rng.h: No such file or directory
#include <gsl/gsl_rng.h>

But I have gsl installed ( in a non-default linux directory)

segfault with some -offset / -nsites combinations

Dear Matteo,

I am getting a segmentation fault when running ngsCovar for data subsets using -offset and -nsites options - I suspect this could be a bug.

I am running the following command with the latest github version of ngsCovar and get the following verbose output (args and dumping file lines removed):

$ ngsCovar -probfile file.geno -nind 58 -verbose 1 -offset 400000 -nsites 193 -norm 0 -call 0 -sfsfile file.saf -outfile tmp1
Block 0 out of 0 from 399999 to 400191
Getting esti...: 193 174 , 0.969347 0.059351
Getting mu...: 193, 0.286054 0.115822
Updating covar......weighting...
Got sfs: 193 117, e.g. 0.000000 0.000000
Getting pvar...: 193 , 1.000000 1.000000
Updating covar...Segmentation fault

I get the same when not using -sfsfile:

$ ngsCovar -probfile file.geno -nind 58 -verbose 1 -offset 148028 -nsites 327 -norm 0 -call 0 -outfile tmp1
Block 0 out of 0 from 148027 to 148353
Getting esti...: 327 174 , 0.984398 0.015602
Getting mu...: 327, 0.071430 0.074497
Updating covar...
I am using this minmaf 0.000000 ...no weighting...Segmentation fault

While for some other combinations of -offset and -nsites, it works:

$ ngsCovar -probfile file.geno -nind 58 -verbose 1 -offset 148027 -nsites 329 -norm 0 -call 0 -sfsfile file.saf -outfile tmp1**
Block 0 out of 0 from 148026 to 148354
Getting esti...: 329 174 , 0.000000 0.015602
Getting mu...: 329, 0.169526 0.048822
Updating covar......weighting...
Got sfs: 329 117, e.g. 0.000000 0.186886
Getting pvar...: 329 , 1.000000 0.999999
Updating covar...
Updating esite...: 309.205395 Sum:4383384258618.480957
(Exp/eff) nr sites: 309.205395

I use geno/saf files created with angsd 0.911 (newest github version) from a BEAGLE input file containing around 7M SNPs, so the geno and saf files (both binary) should contain information on 7M sites. I observed that the subsetting worked fine for low numbers supplied to -offset, but that with high numbers (= high index of starting site), the segfault happens. To test this, I ran the same data subset once using the full 7M SNP geno/saf files (-> segfault) and once using geno/saf files only containing the data subset of interest (-> worked nicely). The index number (1 vs. 27000000) supplied to -offset was thus the only difference.

Thank you very much for any ideas or input on what I could be wrong or whether this could be a bug.
Best,
David

Floating point exception problem in ngsStat

Hello,

I am running nsgStat in different populations. No matter which population I used, when I try to get the stats by doing:

/opt/ngsTools/ngsPopGen/ngsStat -npop 1 -postfiles BIA.unfolded.ref.postprob.saf -outfile BIA.stats.txt -nind 8

I got the message: "floating point exception". Output file is generated but empty. The .saf file "seems" normal when I print it. I don't really know what could be causing the problem, so sorry for not giving any hint. Any idea?

I would truly appreciate any input,
Best,
María.

realSFS on multiple regions

Can I use realSFS specifying more than one region? if the answer is yes, how?

I am looking for the equivalent to -rf in ANGSD in NGSTools

The idea would be:

Estimate the SFS around SEVERAL REGIONS:
-> ./realSFS afile.saf.idx -r $REGION_FILE

cat $REGION_FILE

chr1:1-10000
chr2:135000000-140000000
chr3:4880-194598

folded SFS

Not an issue. I was just wondering how aspects of the tutorial might change when dealing with a system in which an ancestral sequence is lacking? Could one just at -fold 1 to most commands?

mfumagalli / ngstools Goto Github PK

ngstools's People

Contributors

Stargazers

Watchers

Forkers

ngstools's Issues

Recommend Projects

Recommend Topics

Recommend Org