speciationgenomics / scripts Goto Github PK

View Code? Open in Web Editor NEW

78.0 78.0 61.0 72 KB

Scripts for analysis used during the course

Shell 39.75% R 50.17% Python 3.20% Slim 6.87%

scripts's People

Contributors

Stargazers

Watchers

scripts's Issues

plotADMIXTURE.r not working for K=2

Hello,
I find the plotADMIXTURE.r script very useful, but I cannot make it work for K=2 (only for K=4).
When I try, I get the following message:

Error in tbl[[x]] : subscript out of bounds
Calls: lapply -> FUN -> barplot -> t -> as.matrix
Execution halted

Could it be a mistake in the script itself?
Thank you for your help.
Kind regards.

No "RULES" section in fsc27

Hi Joana,

I was using the script calculateAIC.sh and came across an issue. Since I'm using fsc27, the section "RULES" in the .est file is not necessary anymore and I didn't include it. When I ran the AIC script, it only calculates deltaL and fails to calculate AIC. The problem is in line 33, where it tries to find "RULES".

To solve it, I added "[RULES]" in my .est file and left it blank and it worked. Just to let you know and anyone that can have the same issue and struge to figgure it out because it doesn't return any error.

Thanks for you work, it helped me a lot!

Rita

Altering axes for checkHetsIndvVCF.sh

When I use checkHetsIndvVCF.sh, my plots look similar to the examples on this tutorial (https://speciationgenomics.github.io/allelicBalance/). The only major difference is that I have some background noise in low genotype count that is expanding my x and y axis much further than where the majority of my data is clumped. I tried altering the script myself to limit the x and y axes on the left graph to get a better look at the majority of my data, but I'm not skilled in R and only came up with errors. How would I be able to alter my script so I could limit my x and Y axes to zoom in on my data? Thank you for any help you can provide

chrom ID

Hi there!

I would like to use to use convertvcf2eigenstrat.sh to test hybridization using ADMIXTOOLS. The problem is, that my chromosome IDs don't refer to actual chromosomes but to scaffolds.

I have managed to overcome the issues with plink by adding scaff_ to the beggining of the chrom IDs and changing the original command to:
plink --vcf $file.vcf --recode --allow-extra-chr --set-missing-var-ids @:# --out $file

However, I still run into problems with convertf which apparently also arise from my "weird" chrom IDs. I managed to make it work but only by changing all chromosome names to 1, resorting and removing all duplicates. I would rather not do that as I am not sure if changing the order messes with my downstream analyses so I wanted to ask if you know any workaround?

Thanks in advance!
maernster

fsc-selectbestrun

Hi,

Thanks for the course material. Although I m not able to attend the course on site, I have learnt lots of skills here!
The procedure with fastsimcoal has been running fine before, but I just lately encountered this error when using ./fsc-selectbestrun.sh
"no .bestlhoods file found in run1
./fsc-selectbestrun.sh: line 37: [: -gt: unary operator expected
no .bestlhoods file found in run10
./fsc-selectbestrun.sh: line 37: [: -gt: unary operator expected
no .bestlhoods file found in run100
./fsc-selectbestrun.sh: line 37: [: -gt: unary operator expected
no .bestlhoods file found in run11
"
It seems to have an easy fix, however I m not an advanced linux user, can you help with that?
Thanks a lot!
Cui

vcf2treemix.sh plink2 version error

Hi, I would like to use vcf2treemix.sh file, but I got an error message "ERROR: treemix mc = line[6] indexerror: list index out of range"
so, I expect it to be a mismatch between plinks versions.
Could you suggest any ideas? I hoply can use vcf2treemix.sh under Plink2 version
Thank you:)

Error: Polyploidy found, and not supported by vcftools

Hi Mark,

I ran Treemix and I got this error:

Error: Polyploidy found, and not supported by vcftools: 33:48

So it seems I can't use this tool for my polyploidy samples. Maybe if we replace bcftools with vcftools, it is a good idea?

Thank you!

Best,

.

code issue

Hi,

is ti possible to understand what this part of the code is trying to do and how the final file should look like?

awk '{printf $0
for(i = 4; i <= NF; i++){
split($i,values,",")
if((values[1]+values[2])>0) freq=values[1]/(values[1]+values[2])
else freq=0
printf freq"\t"
}
printf "\n"}' $file".frequencies" > $file".frequencies2"

can you provide an example of the output file? it's very hard to run the whole code as it has a lot of bugs (for example the way it handle the .map file works only if the vcf files has a very specific format for the variants - if the variant shave a standard rs name, it doesn't work)

thank you

allelicBalance and checkHetsIndvVCF on WGS data

Hello,

I've come across this informative article https://speciationgenomics.github.io/allelicBalance/ which brings me to the "allelicBalance" and "checkHetsIndvVCF" scripts.

As I read through the article it suggests that the script is not designed for WGS data but while reviewing the presentation here "https://github.com/speciationgenomics/presentations/blob/master/PDFs_2022/AllelicBalance_PCRduplication.pdf" it shows an example of WGS data.

What are the downsides of using the "checkHetsIndvVCF.sh" for WGS analysis? I am also curious how haploid individuals and the hard region of the genome like repeats will affect the results.

Regards,
Nihir

Newest version of checkHetsIndvVCF.sh not working

Hi Joana and Mark,

thanks a lot for making this repository available to everyone; an awesome resource!

I just wanted to report that the current version of checkHetsIndvVCF.sh is not working. Running it with different vcf-files, some of them zipped, some not, results in the following error:

Rscript outputting your plots
Error in file(file, "rt") : cannot open the connection
Calls: read.table -> file
Execution halted

The older version (without x-axsis pruning) works.

Best wishes,
Isabel

./plotModel.r -p early_geneflow -l EC,GOLAN

Hi, when I try the plotModel.r, I got the error message like this:
[root@GenEngine bestrun]# ./plotModel.r -p early_geneflow -l EC,GOLAN
Error in (grep("Population effective sizes", tpl) + 1):(grep("//Haploid", :
argument of length 0
Calls: plotModelBestParams
Execution halted

Do you know how to deal with it?

Citation

Hello, I used two of the scripts by Joana to analyze my fastsimcoal data (calculateAIC and plotModels) and I would love to cite your work in my publication. Do you have a citation preference for me to use? I greatly appreciated your scripts! Thanks so much, Jenna

Question about the format of genome_scan/chrEnds.txt file

In the script of genomeScan.r, appears the requiring of a txt file to get the end position of the chromosome. But I have no idea of what it's looks like. The script concerning this issue is attached below. Could you please help me to find it out. Thanks in advance.

Get chrom ends for making positions additive

chrom<-read.table("./genome_scan/chrEnds.txt",header=T)
chrom$add<-c(0,cumsum(chrom$end)[1:21])

Outgroup gone missing after

Hi,

I had made successful calculation with my RAD data when it was aligned to a reference genome using ./convertVCFtoEigenstrat.sh, but when I try denovo vcf, I get error message like this:
Parameters as interpreted:
--vcf populations.snps.vcf
--mac 1
--out populations.snps
--plink

After filtering, kept 89 out of 89 Individuals
Writing PLINK PED and MAP files ...
Done.
After filtering, kept 40785653 out of a possible 40785653 Sites
Run Time = 15937.00 seconds
parameter file: par.PED.EIGENSTRAT.populations.snps
genotypename: populations.snps.ped
snpname: populations.snps.map
indivname: populations.snps.ped
outputformat: EIGENSTRAT
genotypeoutname: populations.snps.eigenstratgeno
snpoutname: populations.snps.snp
indivoutname: populations.snps.ind
familynames: NO

convertf version: 5000

*** warning. genetic distances are in cM not Morgans
1 32061:441 100.001 441

snp order check fail; snp list not ordered: populations.snps.map (processing continues) 1 0 2
zzz 0 38841
snp order check fail; snp list not ordered: populations.snps.map (processing continues) 1 0 2
zzz 1 314788
snp order check fail; snp list not ordered: populations.snps.map (processing continues) 1 0 2
zzz 2 343244
snp order check fail; snp list not ordered: populations.snps.map (processing continues) 1 0 2
zzz 3 365965
snp order check fail; snp list not ordered: populations.snps.map (processing continues) 1 0 2
zzz 4 513517
snp order check fail; snp list not ordered: populations.snps.map (processing continues) 1 0 2
zzz 5 575462
snp order check fail; snp list not ordered: populations.snps.map (processing continues) 1 0 2
zzz 6 583003
snp order check fail; snp list not ordered: populations.snps.map (processing continues) 1 0 2
zzz 7 677528
snp order check fail; snp list not ordered: populations.snps.map (processing continues) 1 0 2

And in the end I get the eigenstrat file without the outgroup sample. When I reduce the SNPs number and keep only the alleles appear in more than 90% of the samples, the outgroup was kept.
Do you know what is the problem?

thanks,
Cui

speciationgenomics / scripts Goto Github PK

scripts's People

Contributors

Stargazers

Watchers

Forkers

scripts's Issues

Get chrom ends for making positions additive

convertf version: 5000

Recommend Projects

Recommend Topics

Recommend Org