smcclatchy / mapping Goto Github PK


Quantitative Trait Mapping Lesson

Home Page: https://smcclatchy.github.io/mapping/

License: Other

Makefile 2.19% HTML 25.12% JavaScript 0.68% R 31.16% Python 37.47% Shell 0.85% Ruby 0.15% SCSS 2.38%

mapping's People

Contributors

Stargazers

Watchers

Forkers

kmbphd94 ytakemon obwalton sedaarat jun-lizst fboehm vk6733 bzebosi stan2019 rohitkt10

mapping's Issues

kinship episode

kinship = proportion of alleles shared between individuals
heatmap(kinship[1:20, 1:20], symm = TRUE)
some individuals don't appear to be fully related to themselves because they are highly heterozygous: at a heterozygous locus the two chromosomes carry different alleles, so each chromosome doesn't share an allele with its partner
discuss with your partner why the diagonal doesn't equal 1, and how this relates to homozygosity and heterozygosity
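
A minimal base-R sketch of why the diagonal falls below 1, using simulated genotypes rather than the iron data. It assumes kinship is computed from allele probabilities as the average, over markers, of the probability that two individuals carry the same allele; the variable names are made up for illustration.

```r
set.seed(1)
n_ind <- 5
n_mar <- 200

# Simulated genotypes coded as the number of B alleles (0 = AA, 1 = AB, 2 = BB)
geno <- matrix(sample(0:2, n_ind * n_mar, replace = TRUE, prob = c(0.25, 0.5, 0.25)),
               nrow = n_ind, dimnames = list(paste0("ind", 1:n_ind), NULL))

# Allele probabilities at each marker: P(B) is 0, 0.5, or 1
p_B <- geno / 2
p_A <- 1 - p_B

# Kinship = average over markers of sum over alleles of p_i * p_j
kinship <- (tcrossprod(p_A) + tcrossprod(p_B)) / n_mar

# Proportion of heterozygous markers per individual
het <- rowMeans(geno == 1)

# Self-kinship is 1 at homozygous markers and 0.5 at heterozygous ones
round(cbind(self_kinship = diag(kinship), expected = 1 - het / 2), 3)
```

Under this definition a fully homozygous individual has self-kinship 1, and every heterozygous marker pulls the diagonal down by half a marker's worth.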

pre-workshop survey

in addition to R and programming language, ask about genetics and biology and statistics background

Bayes credible interval

graphic of lod plot showing 90, 95, 99% credible interval
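
One possible way to draft such a graphic, sketched in base R on a simulated LOD curve. The interval function below is a simplified stand-in (posterior proportional to 10^LOD on an evenly spaced grid, keeping the highest-density positions), not the package's own implementation; all names are hypothetical.

```r
# Toy LOD curve on an evenly spaced grid (simulated, not the iron data)
pos <- seq(0, 100, by = 1)                     # positions in cM
lod <- 6 * exp(-((pos - 40)^2) / (2 * 8^2))    # single peak at 40 cM

# Approximate Bayes credible interval: normalize 10^LOD, then keep the
# highest-probability positions until their total reaches `prob`
bayes_interval <- function(pos, lod, prob = 0.95) {
  post <- 10^lod / sum(10^lod)
  ord <- order(post, decreasing = TRUE)
  n_keep <- which(cumsum(post[ord]) >= prob)[1]
  range(pos[ord[seq_len(n_keep)]])
}

probs <- c(0.90, 0.95, 0.99)
ints <- sapply(probs, function(p) bayes_interval(pos, lod, p))
colnames(ints) <- paste0(probs * 100, "%")
ints

# LOD curve with each interval's end points marked
cols <- c("red", "orange", "blue")
plot(pos, lod, type = "l", xlab = "Position (cM)", ylab = "LOD")
for (k in seq_along(probs)) abline(v = ints[, k], col = cols[k], lty = 2)
legend("topright", legend = colnames(ints), col = cols, lty = 2,
       title = "Credible interval")
```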

Input file format comments

This page discusses the qtl2 package but does not explain how it relates to the other qtl2 sub-libraries mentioned in the last 2 episodes, such as qtl2geno, qtl2convert, and qtl2db. Is qtl2 interchangeable with qtl2geno? Is the future aim to house all the qtl2 sub-libraries under qtl2, so that loading qtl2 loads all of them? This should be explained, or, for consistency, the lesson should just load qtl2.

challenge: explore permuted data

make a histogram of permuted iron data
have a look at the permuted iron data (head)
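
A small sketch of what the challenge could show, using a simulated phenotype as a stand-in for the iron data (the values and sample names are made up). Permuting shuffles which sample each value belongs to but leaves the distribution itself unchanged, so the two histograms should look identical.

```r
set.seed(123)

# Simulated stand-in phenotype (hypothetical values and sample IDs)
pheno <- data.frame(liver = rnorm(20, mean = 100, sd = 15),
                    row.names = paste0("mouse", 1:20))

# Permute: same values, reassigned to samples at random
pheno_perm <- pheno
pheno_perm$liver <- sample(pheno$liver)

# Look at the first few rows side by side
head(cbind(original = pheno$liver, permuted = pheno_perm$liver))

# Histograms of original and permuted values
par(mfrow = c(1, 2))
hist(pheno$liver, main = "Original", xlab = "liver")
hist(pheno_perm$liver, main = "Permuted", xlab = "liver")
```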

Calculating Genotype Probabilities comments

I think this page is very well constructed! It flows very nicely and the explanations are very clear.

One part that wasn't explained was how to choose the error probability. How do you determine error_prob? A short explanation would be nice here:

# How did you get 0.002?
pr <- calc_genoprob(cross=iron, map=map, error_prob=0.002)

kinship loco in 09 genome scan lmm

after explaining the model that accounts for relatedness, include an example of the kinship LOCO calculation
fit the model with leave one chromosome out (LOCO)
model effect of allele minus current allele; chr minus current chr
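
A base-R sketch of the leave-one-chromosome-out idea on simulated allele probabilities, written under the assumption that kinship is the average allele sharing across markers. In qtl2 itself the corresponding call is `calc_kinship(pr, "loco")`; the helper and object names below are hypothetical.

```r
set.seed(2)
n_ind <- 6
n_mar <- c(chr1 = 40, chr2 = 30, chr3 = 20)   # markers per chromosome

# Simulated probability of the B allele at each marker, one matrix per chromosome
p_B <- lapply(n_mar, function(m) {
  matrix(sample(c(0, 0.5, 1), n_ind * m, replace = TRUE), nrow = n_ind,
         dimnames = list(paste0("ind", 1:n_ind), NULL))
})

# Allele sharing summed over the markers of one chromosome
shared <- function(p) tcrossprod(p) + tcrossprod(1 - p)

# For each chromosome, build kinship from all the OTHER chromosomes
kinship_loco <- lapply(names(p_B), function(chr) {
  others <- setdiff(names(p_B), chr)
  Reduce(`+`, lapply(p_B[others], shared)) / sum(n_mar[others])
})
names(kinship_loco) <- names(p_B)

# The matrix used when scanning chr1 contains no chr1 markers
round(kinship_loco$chr1, 2)
```

The result is a list with one kinship matrix per chromosome, which is the shape a LOCO genome scan expects.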

Under Kinship: better explanation of the difference between allele and genotype probabilities

During the Kinship Matrix section, there was discussion about using allele probabilities vs. genotype probabilities. There was a specific question about why you would choose one over the other. However, I wasn't even sure what the difference between allele probabilities and genotype probabilities was. When I spoke to Dan during the break, it turned out I had the two concepts reversed.
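
A small worked example may help separate the two concepts. For a two-allele cross, genotype probabilities are over AA, AB, and BB, while allele probabilities are over A and B, with a heterozygote contributing half to each allele. The numbers below are invented for illustration; in qtl2 the conversion is done by `genoprob_to_alleleprob()`.

```r
# Genotype probabilities at one marker for three individuals (made-up values)
geno_prob <- rbind(ind1 = c(AA = 0.98, AB = 0.02, BB = 0.00),
                   ind2 = c(AA = 0.01, AB = 0.98, BB = 0.01),
                   ind3 = c(AA = 0.00, AB = 0.05, BB = 0.95))

# Allele probabilities: a heterozygote counts half toward A and half toward B
allele_prob <- cbind(A = geno_prob[, "AA"] + 0.5 * geno_prob[, "AB"],
                     B = geno_prob[, "BB"] + 0.5 * geno_prob[, "AB"])

geno_prob
allele_prob
```

Note that ind2, who is almost certainly heterozygous, ends up with allele probabilities of 0.5 for each allele: the allele-level view no longer distinguishes "AB" from "uncertain between AA and BB".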

script of all commands

as a reference so that people can run through commands quickly instead of searching through lesson for individual commands.
Karl's documentation should have it.

make graphic of input data for input file episode

1st column = sample ids
columns 2-5 = phenotypes
column 6-7 = covariates
table 2 = marker genotypes
superior table = marker map
output as svg

LOCO model

... except the genotype under consideration. Why?

because you're putting the same genotype (predictor) into the model twice
overfitting
multi-collinearity
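
The collinearity point can be illustrated with an extreme toy case in base R: entering the same simulated genotype twice as a fixed effect makes the design matrix rank deficient. This is only an analogy, since in the mixed model the marker enters once as a fixed effect and again indirectly through the kinship matrix; the data and names are invented.

```r
set.seed(3)
n <- 50
g <- rbinom(n, size = 2, prob = 0.5)   # simulated genotype (0, 1, 2)
y <- 1 + 0.5 * g + rnorm(n)            # phenotype with a true effect of g

# Put the same predictor into the model twice
g_again <- g
fit <- lm(y ~ g + g_again)
coef(fit)            # the duplicated term cannot be estimated (NA)

# The design matrix has 3 columns but only rank 2
X <- cbind(intercept = 1, g = g, g_again = g_again)
qr(X)$rank
```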

add sex as a covariate

we only use X-chromosome covariates in the lesson
change episode "Special covariates for the X chromosome" to "Covariates" w/ additive covar as subhead 1 and special Xcovar as subhead 2
for addcovar use sex and direction of cross
for Xchr covar do what's written
update all subsequent episodes - all genome scans and perms
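
A sketch of how the additive covariate matrix might be built, using a toy covariate table with hypothetical sample IDs and cross-direction labels. The final commented line assumes `pr`, `iron`, and `Xcovar` from the lesson exist; `addcovar` is an argument of `scan1()`.

```r
# Toy covariate table (hypothetical IDs and labels, not the iron data)
covar <- data.frame(sex = c("f", "m", "f", "m"),
                    cross_direction = c("AxB", "AxB", "BxA", "BxA"),
                    row.names = paste0("mouse", 1:4))

# Numeric 0/1 design matrix; drop the intercept column
addcovar <- model.matrix(~ sex + cross_direction, data = covar)[, -1, drop = FALSE]
addcovar

# With the lesson objects loaded, the scan would then look like:
# out <- scan1(pr, iron$pheno, addcovar = addcovar, Xcovar = Xcovar)
```

Row names are kept so that the covariates can be matched to individuals by ID.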

Change order of lessons

Students were confused by the threshold in find_peaks(). Move permutations after performing a genome scan and before find peaks.

line plot of permutations at 10, 100, and 1000

showing spread of permutations

My permutation simulations.

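# Assumes pr, iron, and Xcovar from the lesson are already in the workspace.
library(qtl2)     # scan1perm()
library(dplyr)    # %>%
library(tidyr)    # pivot_longer()
library(ggplot2)

# 100 replicate estimates of the 0.05 threshold, 10 permutations each
thrs = rep(0, 100)
for(i in 1:100) {
  tmp = scan1perm(genoprobs = pr, pheno = iron$pheno[,'liver', drop = FALSE], Xcovar = Xcovar,
                  n_perm = 10)
  thrs[i] = quantile(tmp, probs = 0.95)
}

# Same, with 100 permutations each
thrs2 = rep(0, 100)
for(i in 1:100) {
  tmp = scan1perm(genoprobs = pr, pheno = iron$pheno[,'liver', drop = FALSE], Xcovar = Xcovar,
                  n_perm = 100)
  thrs2[i] = quantile(tmp, probs = 0.95)
}

# Same, with 1000 permutations each (slow, so print progress and use 4 cores)
thrs3 = rep(0, 100)
for(i in 1:100) {
  print(i)
  tmp = scan1perm(genoprobs = pr, pheno = iron$pheno[,'liver', drop = FALSE], Xcovar = Xcovar,
                  n_perm = 1000, cores = 4)
  thrs3[i] = quantile(tmp, probs = 0.95)
}

# Density of the estimated thresholds for each number of permutations
data.frame(perm10 = thrs, perm100 = thrs2, perm1000 = thrs3) %>%
  pivot_longer(cols = everything()) %>%
  ggplot() +
  geom_density(aes(value, color = name), size = 1) +
  labs(title = 'Comparison of 0.05 thresholds using 10, 100, and 1000 permutations',
       x = 'LOD', color = 'Num. Perms')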
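
For a quick look at the same effect without running real permutations, here is a fast base-R stand-in. It assumes the null maximum LOD can be approximated by the largest of 20 independent chi-square(1) statistics converted to the LOD scale, which is a simplification and not what scan1perm() computes; the function and variable names are made up.

```r
set.seed(2024)
n_mar <- 20   # number of independent "markers" in the toy null model

# Simulated null distribution of the genome-wide maximum LOD
null_max_lod <- function(n_perm) {
  chisq <- matrix(rchisq(n_perm * n_mar, df = 1), nrow = n_perm)
  apply(chisq, 1, max) / (2 * log(10))   # likelihood-ratio statistic to LOD
}

# 100 replicate estimates of the 0.05 threshold for each permutation count
n_perms <- c(10, 100, 1000)
thr <- sapply(n_perms, function(n) {
  replicate(100, quantile(null_max_lod(n), probs = 0.95))
})
colnames(thr) <- paste0("perm", n_perms)

# Spread of the estimated threshold shrinks as permutations increase
spread <- apply(thr, 2, sd)
spread

boxplot(thr, ylab = "Estimated 0.05 LOD threshold",
        main = "Spread of threshold estimates by number of permutations")
```

A boxplot is used here instead of a density or line plot only because it needs no extra packages.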

Introduction comments:

Sample datasets: the read_cross2 step is redundant with the previous setup page. I think the two steps should be collapsed so you don't end up doing the same thing in 5 different ways, which seems to be the easiest way to confuse people (i.e., me :) ).

# Redundant:
grav2 <- read_cross2( system.file("extdata", "grav2.zip", package="qtl2geno") )
grav2 <- read_cross2("http://kbroman.org/qtl2/assets/sampledata/grav2/grav2.zip")
grav2 <- read_cross2("~/my_data/grav2.zip")
zip_datafiles("~/my_data/grav2.yaml")
grav2 <- read_cross2("~/my_data/grav2.yaml")

# From all these I think the easiest method is:
grav2 <- read_cross2( system.file("extdata", "grav2.zip", package="qtl2geno") )
# but it would be good for everyone to download the sample data into their own directory 
# and learn to load data from their own machines:
grav2 <- read_cross2("~/my_data/grav2.yaml")

Setup comments

Currently creating an Rproject is an option. In that case, it would be good to start everyone in the same directory to prevent future directory confusions. For example, it might be easy to have everyone make a directory called "mapping" on their desktop and run the following before creating new directories:

setwd("~/Desktop")
dir.create("./mapping")
setwd("~/Desktop/mapping")
dir.create("./data")
dir.create("./scripts")
dir.create("./results")
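
A possible variant of the same setup that avoids repeated setwd() calls and is safe to re-run. It is only a sketch: tempdir() stands in for "~/Desktop" so the example does not touch anyone's real desktop, and `base_dir` is a made-up name.

```r
# Stand-in for file.path("~/Desktop", "mapping")
base_dir <- file.path(tempdir(), "mapping")

# Create the project folder and its subdirectories in one pass;
# showWarnings = FALSE keeps this quiet if they already exist
for (sub in c("data", "scripts", "results")) {
  dir.create(file.path(base_dir, sub), recursive = TRUE, showWarnings = FALSE)
}

list.files(base_dir)
```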

Downloading DOQTL data:
It should be stated that, once the data have been downloaded, they should be put into the ~/Desktop/mapping/data directory.
The initial steps that demonstrate reading data are a bit confusing because grav2.yaml hasn't been downloaded yet. Where do you get it? There is no link. It might be better either to have participants download grav2.yaml first and put it into their data directory before the demo, or to read directly from Karl's website; showing all the different ways might be confusing:

library(qtl2geno)
grav2 <- read_cross2("http://kbroman.org/qtl2/assets/sampledata/grav2/grav2.zip")
#or download into directory first then:
grav2 <- read_cross2("~/my_data/grav2.yaml")

write_control_file() is mentioned at the very end but lacks explanation. It looks like a pretty useful function, so if it is going to be mentioned, why not provide some examples of its use? This example was taken from the documentation:

?write_control_file()

write_control_file("~/my_data/grav2.yaml",
                        crosstype="riself",
                        geno_file="grav2_geno.csv",
                        gmap_file="grav2_gmap.csv",
                        pheno_file="grav2_pheno.csv",
                        phenocovar_file="grav2_phenocovar.csv",
                        geno_codes=c(L=1L, C=2L),
                        alleles=c("L", "C"),
                        na.strings=c("-", "NA"))

permutation exercise

new_order <- sample(rownames(iron$pheno))
pheno_perm <- iron$pheno
rownames(pheno_perm) <- new_order
xcovar_perm <- Xcovar
rownames(xcovar_perm) <- new_order
p <- scan1(genoprobs = pr, pheno = pheno_perm, Xcovar = xcovar_perm)
plot(p, map)
head(new_order)

regular scan has max lod ~ 7

new permutation has max lod ~ 2

scrambled all the data so that phenotypes are no longer connected to the sample or mouse

what is max lod by chance?

paste your max lod into etherpad
load all max lods into a vector
create a histogram of max lods
do 1000 perms as in the lesson
plot the max lod in a histogram
make a hist of operm[,1]
now on your own, hist of operm[,2]
draw an abline for the 0.2 or 0.05 threshold for liver or spleen onto the histogram
abline(v = 3.36, col = "red")
now on your own, do the same for spleen, for 0.2, etc.; use blue, gray, etc.
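
The whole exercise can be rehearsed end to end on simulated data in base R. This sketch assumes a simple single-marker regression scan, for which LOD = -(n/2) * log10(1 - r^2); the genotypes, phenotype, and function names are invented, and 200 permutations are used only to keep it fast.

```r
set.seed(42)
n_ind <- 100
n_mar <- 30

# Simulated genotypes (0, 1, 2) and a phenotype affected by marker 10
geno <- matrix(rbinom(n_ind * n_mar, size = 2, prob = 0.5), nrow = n_ind)
pheno <- 0.8 * geno[, 10] + rnorm(n_ind)

# LOD score at every marker from the phenotype-genotype correlation
lod_scan <- function(y, g) {
  r <- as.vector(cor(y, g))
  -(length(y) / 2) * log10(1 - r^2)
}

# Observed scan, then the maximum LOD from each of 200 permuted phenotypes
obs_lod <- lod_scan(pheno, geno)
max_lod <- replicate(200, max(lod_scan(sample(pheno), geno)))

# Thresholds: 0.2 level = 80th percentile, 0.05 level = 95th percentile
thr <- quantile(max_lod, probs = c(0.80, 0.95))
thr

hist(max_lod, breaks = 20, xlab = "Maximum LOD",
     main = "Maximum LOD under permutation")
abline(v = thr[2], col = "red")    # 0.05 threshold
abline(v = thr[1], col = "blue")   # 0.2 threshold
```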

create script for comparison of kinship methods for challenge

create a script of the following for people to copy and paste, then change lodcolumn to 2 and the phenotype column to 2 for spleen
create challenge and provide url to script for copy-paste

# this is for liver

# add comment
plot(out, map, lodcolumn=1,
     col=color[1],
     main=colnames(iron$pheno)[1],
     ylim=c(0, ymx*1.02))

# add comment
plot(out_pg, map, lodcolumn=1, col=color[2], add=TRUE)

# add comment
plot(out_pg_loco, map, lodcolumn=1,
     col=color[3], add=TRUE, lty=2)

# add comment
legend("topleft", lwd=2, col=color,
       c("H-K", "LMM", "LOCO"), bg="gray90", lty=c(1,1,2))

plot(out, lodcolumn="liver", map)
plot(out, lodcolumn="liver", map, col="black", add=TRUE)
plot(out_pg, lodcolumn="liver", map, col="red", add=TRUE)
plot(out_pg_loco, lodcolumn="liver", map, col="yellow", add=TRUE)
