speciationgenomics / scripts Goto Github PK
View Code? Open in Web Editor NEWScripts for analysis used during the course
Scripts for analysis used during the course
Hello,
I find the plotADMIXTURE.r script very useful, but I cannot make it work for K=2 (only for K=4).
When I try, I get the following message:
Error in tbl[[x]] : subscript out of bounds
Calls: lapply -> FUN -> barplot -> t -> as.matrix
Execution halted
Could it be a mistake in the script itself?
Thank you for your help.
Kind regards.
Hi Joana,
I was using the script calculateAIC.sh and came across an issue. Since I'm using fsc27, the section "RULES" in the .est file is not necessary anymore and I didn't include it. When I ran the AIC script, it only calculates deltaL and fails to calculate AIC. The problem is in line 33, where it tries to find "RULES".
To solve it, I added "[RULES]" in my .est file and left it blank and it worked. Just to let you know and anyone that can have the same issue and struge to figgure it out because it doesn't return any error.
Thanks for you work, it helped me a lot!
Rita
When I use checkHetsIndvVCF.sh, my plots look similar to the examples on this tutorial (https://speciationgenomics.github.io/allelicBalance/). The only major difference is that I have some background noise in low genotype count that is expanding my x and y axis much further than where the majority of my data is clumped. I tried altering the script myself to limit the x and y axes on the left graph to get a better look at the majority of my data, but I'm not skilled in R and only came up with errors. How would I be able to alter my script so I could limit my x and Y axes to zoom in on my data? Thank you for any help you can provide
Hi there!
I would like to use to use convertvcf2eigenstrat.sh to test hybridization using ADMIXTOOLS. The problem is, that my chromosome IDs don't refer to actual chromosomes but to scaffolds.
I have managed to overcome the issues with plink by adding scaff_ to the beggining of the chrom IDs and changing the original command to:
plink --vcf $file.vcf --recode --allow-extra-chr --set-missing-var-ids @:# --out $file
However, I still run into problems with convertf which apparently also arise from my "weird" chrom IDs. I managed to make it work but only by changing all chromosome names to 1, resorting and removing all duplicates. I would rather not do that as I am not sure if changing the order messes with my downstream analyses so I wanted to ask if you know any workaround?
Thanks in advance!
maernster
Hi,
Thanks for the course material. Although I m not able to attend the course on site, I have learnt lots of skills here!
The procedure with fastsimcoal has been running fine before, but I just lately encountered this error when using ./fsc-selectbestrun.sh
"no .bestlhoods file found in run1
./fsc-selectbestrun.sh: line 37: [: -gt: unary operator expected
no .bestlhoods file found in run10
./fsc-selectbestrun.sh: line 37: [: -gt: unary operator expected
no .bestlhoods file found in run100
./fsc-selectbestrun.sh: line 37: [: -gt: unary operator expected
no .bestlhoods file found in run11
"
It seems to have an easy fix, however I m not an advanced linux user, can you help with that?
Thanks a lot!
Cui
Hi, I would like to use vcf2treemix.sh file, but I got an error message "ERROR: treemix mc = line[6] indexerror: list index out of range"
so, I expect it to be a mismatch between plinks versions.
Could you suggest any ideas? I hoply can use vcf2treemix.sh under Plink2 version
Thank you:)
Hi Mark,
I ran Treemix and I got this error:
Error: Polyploidy found, and not supported by vcftools: 33:48
So it seems I can't use this tool for my polyploidy samples. Maybe if we replace bcftools with vcftools, it is a good idea?
Thank you!
Best,
Hi,
is ti possible to understand what this part of the code is trying to do and how the final file should look like?
awk '{printf $0
for(i = 4; i <= NF; i++){
split($i,values,",")
if((values[1]+values[2])>0) freq=values[1]/(values[1]+values[2])
else freq=0
printf freq"\t"
}
printf "\n"}' $file".frequencies" > $file".frequencies2"
can you provide an example of the output file? it's very hard to run the whole code as it has a lot of bugs (for example the way it handle the .map file works only if the vcf files has a very specific format for the variants - if the variant shave a standard rs name, it doesn't work)
thank you
Hello,
I've come across this informative article https://speciationgenomics.github.io/allelicBalance/ which brings me to the "allelicBalance" and "checkHetsIndvVCF" scripts.
As I read through the article it suggests that the script is not designed for WGS data but while reviewing the presentation here "https://github.com/speciationgenomics/presentations/blob/master/PDFs_2022/AllelicBalance_PCRduplication.pdf" it shows an example of WGS data.
What are the downsides of using the "checkHetsIndvVCF.sh" for WGS analysis? I am also curious how haploid individuals and the hard region of the genome like repeats will affect the results.
Regards,
Nihir
Hi Joana and Mark,
thanks a lot for making this repository available to everyone; an awesome resource!
I just wanted to report that the current version of checkHetsIndvVCF.sh is not working. Running it with different vcf-files, some of them zipped, some not, results in the following error:
Rscript outputting your plots
Error in file(file, "rt") : cannot open the connection
Calls: read.table -> file
Execution halted
The older version (without x-axsis pruning) works.
Best wishes,
Isabel
Hi, when I try the plotModel.r, I got the error message like this:
[root@GenEngine bestrun]# ./plotModel.r -p early_geneflow -l EC,GOLAN
Error in (grep("Population effective sizes", tpl) + 1):(grep("//Haploid", :
argument of length 0
Calls: plotModelBestParams
Execution halted
Do you know how to deal with it?
Hello, I used two of the scripts by Joana to analyze my fastsimcoal data (calculateAIC and plotModels) and I would love to cite your work in my publication. Do you have a citation preference for me to use? I greatly appreciated your scripts! Thanks so much, Jenna
In the script of genomeScan.r, appears the requiring of a txt file to get the end position of the chromosome. But I have no idea of what it's looks like. The script concerning this issue is attached below. Could you please help me to find it out. Thanks in advance.
chrom<-read.table("./genome_scan/chrEnds.txt",header=T)
chrom$add<-c(0,cumsum(chrom$end)[1:21])
Hi,
I had made successful calculation with my RAD data when it was aligned to a reference genome using ./convertVCFtoEigenstrat.sh, but when I try denovo vcf, I get error message like this:
Parameters as interpreted:
--vcf populations.snps.vcf
--mac 1
--out populations.snps
--plink
After filtering, kept 89 out of 89 Individuals
Writing PLINK PED and MAP files ...
Done.
After filtering, kept 40785653 out of a possible 40785653 Sites
Run Time = 15937.00 seconds
parameter file: par.PED.EIGENSTRAT.populations.snps
genotypename: populations.snps.ped
snpname: populations.snps.map
indivname: populations.snps.ped
outputformat: EIGENSTRAT
genotypeoutname: populations.snps.eigenstratgeno
snpoutname: populations.snps.snp
indivoutname: populations.snps.ind
familynames: NO
*** warning. genetic distances are in cM not Morgans
1 32061:441 100.001 441
snp order check fail; snp list not ordered: populations.snps.map (processing continues) 1 0 2
zzz 0 38841
snp order check fail; snp list not ordered: populations.snps.map (processing continues) 1 0 2
zzz 1 314788
snp order check fail; snp list not ordered: populations.snps.map (processing continues) 1 0 2
zzz 2 343244
snp order check fail; snp list not ordered: populations.snps.map (processing continues) 1 0 2
zzz 3 365965
snp order check fail; snp list not ordered: populations.snps.map (processing continues) 1 0 2
zzz 4 513517
snp order check fail; snp list not ordered: populations.snps.map (processing continues) 1 0 2
zzz 5 575462
snp order check fail; snp list not ordered: populations.snps.map (processing continues) 1 0 2
zzz 6 583003
snp order check fail; snp list not ordered: populations.snps.map (processing continues) 1 0 2
zzz 7 677528
snp order check fail; snp list not ordered: populations.snps.map (processing continues) 1 0 2
And in the end I get the eigenstrat file without the outgroup sample. When I reduce the SNPs number and keep only the alleles appear in more than 90% of the samples, the outgroup was kept.
Do you know what is the problem?
thanks,
Cui
A declarative, efficient, and flexible JavaScript library for building user interfaces.
๐ Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. ๐๐๐
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google โค๏ธ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.