spacemix's Introduction
############################################################
############################################################
#	SpaceMix ReadMe
############################################################
############################################################

PREAMBLE

Caveat user: this code is still in beta. Please contact Gideon Bradburd at bradburd(at)umich(dot)edu with bug reports or questions. Please thoroughly read the documentation and go through the vignette first to make sure your question has not already been addressed. Also, remember to check the git for updates!

README

This repository contains the R package SpaceMix and accompanying vignette.

To install the package you can either:

(1) download the .tar.gz file and install using
	> install.packages("~/SpaceMix_X.tar.gz",repos=NULL)
where "X" in the package name denotes the number of the most current version.

OR

(2) install directly from github using
	> require(devtools)
	> install_github("gbradburd/SpaceMix",build_vignettes=TRUE)

You can then look at the vignette with
	> vignette("spacemix_vignette")
and you can see all documented functions using
	> ??SpaceMix

The citation for this paper is:
Bradburd GS, Ralph PL, Coop GM (2016) A Spatial Framework for Understanding Population Structure and Admixture. PLoS Genet 12(1): e1005703
and the DOI link is below:
http://dx.doi.org/10.1371/journal.pgen.1005703
spacemix's Issues
Should we move from SNPs to averages over regions?
Currently, we think about thinning SNPs to make them independent. But, why not average them in windows, since then the central limit theorem says the result should fit our Gaussian model better anyhow?
Specifically, we could:
- subtract off SNP means
- take the average genotype in windows
- normalize each (across pops) to have variance 1
First step: see how the resulting covariance matrix compares.
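The three steps above can be sketched roughly as follows. This is only an illustration, not code from the package: `window.average`, the fixed `window.size` (windows defined by SNP count rather than physical distance), and the simulated `freqs` matrix are all assumptions made for the example.

```r
# Hypothetical helper (not part of SpaceMix): average mean-centred allele
# frequencies in windows of consecutive SNPs.
# freqs: populations x loci matrix (rows = populations, columns = SNPs)
# window.size: number of consecutive SNPs per window (an assumed choice)
window.average <- function(freqs, window.size) {
  # 1. subtract off each SNP's mean across populations
  centred <- sweep(freqs, 2, colMeans(freqs, na.rm = TRUE))
  # 2. take the average within each window
  window.id <- ceiling(seq_len(ncol(centred)) / window.size)
  averaged <- vapply(
    split(seq_len(ncol(centred)), window.id),
    function(idx) rowMeans(centred[, idx, drop = FALSE], na.rm = TRUE),
    numeric(nrow(centred))
  )
  # 3. normalise each window (across populations) to have variance 1
  sweep(averaged, 2, apply(averaged, 2, sd), "/")
}

# Simulated input, for illustration only
set.seed(1)
freqs <- matrix(runif(5 * 200), nrow = 5)
win <- window.average(freqs, window.size = 10)

# Population-by-population covariance computed from the windowed values
cov.windows <- tcrossprod(win) / ncol(win)
```

`cov.windows` could then be compared against the covariance computed from thinned SNPs. Windows in which every population has the same average would give a zero standard deviation and would need to be dropped first.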
Negative infinity
I get the following error when running spacemix:
fast run 1 failed
Error in MCMC(model.option = fast.model.option, data.type = data.type, : Initial probability of model is NEGATIVE INFINITY! Please attempt to initiate chain again.
skipping fast_run_1 because it failed
Error in setwd(list.files()[grep("BestRun", list.files())]) :
character argument expected
In addition: Warning messages:
1: In sqrt(mean.freq.mat * (1 - mean.freq.mat)) : NaNs produced
2: In sqrt(mean.freq.mat * (1 - mean.freq.mat)) : NaNs produced
3: In file.rename(paste("../", fast.run.dirs[i], sep = ""), paste("../", :
cannot rename file '../fast_run_1' to '../failed_fast_run_1', reason 'Directory not empty'
4: In file.rename(fast.run.dirs[which.max(last.probs)], paste(fast.run.dirs[which.max(last.probs)], :
cannot rename file 'fast_run_1' to 'fast_run_1_BestRun', reason 'No such file or directory'
I tried initiating the MCMC a number of times but always get this error. The present dataset uses only a single individual per population and about 50k SNPs. Let me know if any additional info would be useful for troubleshooting.
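In R, `sqrt()` only emits the "NaNs produced" warning for negative arguments, so `mean.freq.mat * (1 - mean.freq.mat)` must have been negative somewhere, i.e. some mean frequency fell outside [0, 1]. A quick check of the input along these lines might help narrow it down. `check.frequencies` is a hypothetical helper, not a SpaceMix function, and the example matrix is made up (including the `-9` missing-data code, which is only a guess at one possible cause).

```r
# Hypothetical input check (not part of SpaceMix).
# freqs: populations x loci matrix of sample allele frequencies
check.frequencies <- function(freqs) {
  list(
    # missing values
    n.missing = sum(is.na(freqs)),
    # frequencies outside [0, 1], e.g. from an unconverted missing-data code
    n.out.of.range = sum(freqs < 0 | freqs > 1, na.rm = TRUE),
    # loci with the same frequency in every population
    n.invariant.loci = sum(apply(freqs, 2, function(x) {
      length(unique(x[!is.na(x)])) <= 1
    }))
  )
}

# Made-up example: 3 populations x 3 loci
example.freqs <- matrix(c(0,   0.5, 1,
                          0.5, 0.5, 0.5,
                          -9,  0,   1), nrow = 3)
check.result <- check.frequencies(example.freqs)
```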
Parallel runs
When running spacemix on the same data (with different replicates / different models) at the same time, I found that the runs somehow interact with each other: all runs produce nearly the same output figures, some of which are clearly inconsistent with the parameters of the particular run.
Is there a list of temporary files/folders that are written somewhere? Or are there guidelines on how spacemix should be configured to avoid these interactions?
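The error log in another issue here shows run folders such as 'fast_run_1' being renamed relative to the working directory, so concurrent runs that share a working directory could plausibly collide on those names. One way to rule that out is to give every run its own directory. A minimal sketch, where `make.run.directory` and the run names are hypothetical:

```r
# Hypothetical helper (not part of SpaceMix): create a fresh directory per run
# and refuse to reuse an existing one.
make.run.directory <- function(base.dir, run.name) {
  run.dir <- file.path(base.dir, run.name)
  if (dir.exists(run.dir)) {
    stop("run directory already exists: ", run.dir)
  }
  dir.create(run.dir, recursive = TRUE)
  normalizePath(run.dir)
}

# Illustration in a temporary location
base <- tempfile("spacemix_runs_")
dir.rep1 <- make.run.directory(base, "target_rep1")
dir.rep2 <- make.run.directory(base, "source_and_target_rep1")
```

Each path could then be supplied through the `directory` argument that appears in the `run.spacemix.analysis` calls in these issues, together with a distinct `prefix`. Whether that fully isolates the runs is an assumption, not something confirmed here.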
Plotting admixture when there is no admixture
Hi Gideon.
Thank you for the great package, and method!
I am running the 'source_and_target' mode, and there seems to be little evidence for admixture. Yet, when I make the geogenetic plot, I get large dotted circles that encompass all populations. Is this by design?
Also, as a minor suggestion, you should add a citation and link to the paper to the repo's README.
Thanks again.
Best.
Anders.
make.spacemix.map requires X11
Not exactly a bug, but undesirable if it can be avoided...
Even if a different output dev is specified (I tend to use svg), make.spacemix.map requires a working X11 connection.
Some people (me at least) tend not to forward X on their server logins unless it is really needed... It causes annoyances with persistent logins using screen or tmux, as well as X being (sometimes very) slow when logging in from home or another less-than-ideal connection.
Burnin valid values?
Hi,
What is the valid value for burnin when displaying the SpaceMix results if I want to exclude, for example, the first 10,000 of 100,000 total MCMC iterations?
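Whichever unit the burnin argument expects, the conversion itself is simple arithmetic. The sketch below expresses the same burn-in both as a fraction of the chain and as a number of saved samples. `burnin.equivalents` is a hypothetical helper, and the `samplefreq` of 1e3 is an assumed value because the question does not state one; check the package documentation for which form the plotting functions actually take.

```r
# Hypothetical helper (not part of SpaceMix): express a burn-in given in MCMC
# iterations as a fraction of the chain and as a count of saved samples.
burnin.equivalents <- function(burnin.gen, total.gen, samplefreq) {
  list(
    fraction = burnin.gen / total.gen,
    n.samples = floor(burnin.gen / samplefreq)
  )
}

# First 10,000 of 100,000 iterations, assuming every 1,000th is saved
burnin.example <- burnin.equivalents(1e4, 1e5, samplefreq = 1e3)
```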
bypass SNP calling
Allow as input a sample covariance matrix estimated directly from e.g. low-coverage data.
Chains not mixing
When running my SpaceMix model, I don't think my chains are mixing correctly (no fuzzy caterpillar). With other packages, I have tried running the model longer, but here that hasn't seemed to improve anything. How can I go about optimizing the run.spacemix.analysis parameters to fix this? Below I have included my run code and various troubleshooting graphs.
run.spacemix.analysis(n.fast.reps = 10,
    fast.MCMC.ngen = 1e5,
    fast.model.option = "source_and_target",
    long.model.option = "source_and_target",
    data.type = "sample.frequencies",
    sample.frequencies = allele.frequencies,
    mean.sample.sizes = mean.sample.sizes,
    counts = NULL,
    sample.sizes = NULL,
    sample.covariance = NULL,
    target.spatial.prior.scale = NULL,
    source.spatial.prior.scale = NULL,
    spatial.prior.X.coordinates = LonLat_sorted[,1],
    spatial.prior.Y.coordinates = LonLat_sorted[,2],
    round.earth = TRUE,
    long.run.initial.parameters = NULL,
    k = nrow(allele.frequencies),
    loci = ncol(allele.frequencies),
    ngen = 1e6,
    printfreq = 1e2,
    samplefreq = 1e3,
    mixing.diagn.freq = 50,
    savefreq = 1e5,
    directory = NULL,
    prefix = "G4t90_110")
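To put a number on "no fuzzy caterpillar", a crude effective sample size can be computed from any sampled parameter trace in base R. This is a generic diagnostic sketch, not a SpaceMix function: `crude.ess` and the simulated traces are assumptions for illustration, and a dedicated package would give a better estimate.

```r
# Hypothetical diagnostic (not part of SpaceMix): crude effective sample size
# from the autocorrelation of a trace, summing lags up to the first
# non-positive autocorrelation.
crude.ess <- function(trace, max.lag = 100) {
  n <- length(trace)
  rho <- acf(trace, lag.max = min(max.lag, n - 1), plot = FALSE)$acf[-1]
  first.nonpos <- which(rho <= 0)[1]
  if (!is.na(first.nonpos)) {
    rho <- rho[seq_len(first.nonpos - 1)]
  }
  n / (1 + 2 * sum(rho))
}

# Simulated traces, for illustration only
set.seed(42)
well.mixed <- rnorm(1000)                                       # independent draws
sticky <- as.numeric(arima.sim(list(ar = 0.95), n = 1000))      # strongly autocorrelated
ess.well.mixed <- crude.ess(well.mixed)
ess.sticky <- crude.ess(sticky)
```

A trace whose effective sample size is a small fraction of the number of saved samples is the numerical counterpart of a trace plot that wanders instead of looking like a caterpillar.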
Possible numeric instability?
Hi Gideon.
In a dataset I am currently analysing some of my short runs, and my long run seem to have some numeric stability issues:
748200 ---- 1.907098e+13
748300 ---- 1.907098e+13
748400 ---- 1.907098e+13
748500 ---- 1.907098e+13
Error in solve.default(par.cov) :
system is computationally singular: reciprocal condition number = 1.41589e-16
I had this problem with both the target and source-target models; I have not tried the other models. I am working from a smallish number of SNPs (420) and eight populations. However, they were all found to be outliers --- I was hoping to compare the neutral and outlier geogenetic maps. So, I wonder if that is what is causing the issue.
Congratulations and good luck with your new position (saw it on Twitter).
Cheers.
Anders.
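For context on the error itself: `solve()` in base R stops with "computationally singular" when the reciprocal condition number of the matrix falls below its `tol` argument, which defaults to `.Machine$double.eps` (about 2.2e-16). The reported 1.41589e-16 is just under that. The same quantity can be inspected with `rcond()`. In the sketch below `is.nearly.singular` is a hypothetical helper and the two matrices are toy examples, not SpaceMix output.

```r
# Hypothetical helper (not part of SpaceMix): flag a matrix that solve() would
# reject under its default tolerance.
is.nearly.singular <- function(m, tol = .Machine$double.eps) {
  rcond(m) < tol
}

# Toy covariance matrices, for illustration only
duplicated.pops <- matrix(1, nrow = 2, ncol = 2)  # two perfectly correlated rows
independent.pops <- diag(2)                       # identity matrix

singular.flag <- is.nearly.singular(duplicated.pops)
regular.flag <- is.nearly.singular(independent.pops)
```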
compute goodness of fit along the genome
Perhaps we should compute "residuals" for each SNP, globally and for each population, and use this to look for regions that "look different".
"Residual" could mean the same thing as in bayenv.
But, would this be any different than bayenv, really?
Error in solve.default(par.cov)?
Here is a truncated version of the output... It seems like fast_run_1 ran OK but all subsequent runs failed. Any idea?
--------- OUTPUT BEGIN -------------
100000 ---- -38143.28
9369 loci out of the original 100000 left in curated dataset.
LnL: -300572.6
Pr(a0): -5.390294
Pr(a1): -0.06266915
Pr(a2): -0.6418539
Pr(nugget): -100.6171
Pr(admix_target_locations): -1101.546
Pr(admix_source_locations): -1206.868
Pr(admix_proportions): 335.3737
Prob: -302652.3
1000 ---- -269967.7
...
49000 ---- 1.507207e+16
fast run 2 failed
Error in solve.default(par.cov): system is computationally singular: reciprocal condition number = 1.72956e-16
....
Error in make.spacemix.map.list
Hi,
I am running spacemix on a large dataset of ~3,600,000 SNPs. The spacemix analysis runs fine and generates the long run output file. However, when I try to run the make.spacemix.map.list I receive the following error.
Error in rgb(x[1L, ], x[2L, ], x[3L, ], x[4L, ]) :
alpha level NA, not in [0,1]
Calls: make.spacemix.map.list -> fade.admixture.source.points -> adjustcolor -> rgb
I have run the script on a subset of 500 SNPs, and both the spacemix analysis and reading in the Robj file for plotting run without errors.
Do you know what could be causing the above error in the larger dataset?
Please see attached for the commands used.
Many thanks,
Please move the ".tar.gz" distribution file to a release.
Instead of leaving it in the git repository, where it increases the git clone size by a very large amount, you can just place the "distribution" in the releases section.
If you find that version controlling this file is also important, please consider creating a new repository for it. It really helps to keep things tight and neat.
Thanks for considering!
"system computationally singular" crash
I've got no clue what caused this error. It came up rather late into the LongRun after successfully doing 10 fast runs.
The data set is 100K SNPs in something like 45 samples.
...
652800 ---- 8.926476e+14
652900 ---- 8.926476e+14
653000 ---- 8.926476e+14
653100 ---- 8.926476e+14
Error in solve.default(par.cov) :
system is computationally singular: reciprocal condition number = 9.67127e-17
The command was:
require(SpaceMix)

# load data
# defines ac (allele_counts), ss (sample_size), and loc (lng, lat) matrices
load('data.gzip')

# Need to transpose
ac <- t(ac)
ss <- t(ss)

run.spacemix.analysis(n.fast.reps = 10,
    fast.MCMC.ngen = 1e5,
    fast.model.option = "target",
    long.model.option = "target",
    data.type = "counts",
    sample.frequencies = NULL,
    mean.sample.sizes = NULL,
    counts = ac,                              # our data
    sample.sizes = ss,                        # our data
    sample.covariance = NULL,
    target.spatial.prior.scale = NULL,
    source.spatial.prior.scale = NULL,
    spatial.prior.X.coordinates = loc[,1],    # our data
    spatial.prior.Y.coordinates = loc[,2],    # our data
    round.earth = TRUE,                       # changed
    long.run.initial.parameters = NULL,
    k = nrow(ac),                             # our data
    loci = ncol(ss),                          # our data
    ngen = 1e6,
    printfreq = 1e2,
    samplefreq = 1e3,
    mixing.diagn.freq = 50,
    savefreq = 1e5,
    directory = NULL,
    prefix = outprefix)