
mapping's People

Contributors

abbycabs, billmills, dmgatti, erinbecker, evanwill, fmichonneau, gdevenyi, gvwilson, jduckles, jnothman, jpallen, maxim-belkin, mikewlloyd, neon-ninja, pbanaszkiewicz, pipitone, rgaiacs, smcclatchy, synesthesiam, twitwi, wking

mapping's Issues

kinship episode

kinship = proportion of shared alleles between individuals
heatmap(kinship[1:20, 1:20], symm = TRUE)
some individuals don't appear to be fully related to themselves (diagonal below 1) because they are highly heterozygous: at heterozygous loci, the two homologous chromosomes don't share alleles with each other
discuss with your partner why the diagonal doesn't equal 1, and how this relates to homozygosity and heterozygosity
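A minimal base-R sketch of why the diagonal can fall below 1. It assumes kinship is computed from allele probabilities as the average, across markers, of the summed products of two individuals' allele probabilities. That is our reading of what calc_kinship() does with allele probabilities; check the qtl2 documentation for the exact definition. The mice, alleles, and probabilities below are made up.

```r
# Made-up allele probabilities at two markers for three mice and two founder
# alleles (A, B). Each row sums to 1.
marker1 <- rbind(homA = c(A = 1.0, B = 0.0),   # AA
                 het  = c(A = 0.5, B = 0.5),   # AB
                 homB = c(A = 0.0, B = 1.0))   # BB
marker2 <- rbind(homA = c(A = 1.0, B = 0.0),   # AA
                 het  = c(A = 1.0, B = 0.0),   # AA at this marker
                 homB = c(A = 0.0, B = 1.0))   # BB

# At one marker, similarity of mice i and j = sum over alleles of p_i * p_j,
# which is what tcrossprod() computes for every pair; kinship averages this
# over markers (assumed definition, see above).
markers <- list(marker1, marker2)
kinship <- Reduce(`+`, lapply(markers, tcrossprod)) / length(markers)

round(kinship, 2)
```

The heterozygous mouse gets a diagonal of 0.75 rather than 1. At marker 1 its two chromosomes carry different alleles, so an allele drawn at random from it matches a second allele drawn at random from itself only half the time.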

pre-workshop survey

in addition to R and programming experience, ask about genetics, biology, and statistics background

Input file format comments

  1. This page discusses the qtl2 package but doesn't explain how it relates to the other qtl2 sub-libraries mentioned in the last two episodes, such as qtl2geno, qtl2convert, and qtl2db. Is qtl2 interchangeable with qtl2geno? Is the future aim to house a set of qtl2 sub-libraries under qtl2, so that loading qtl2 loads all of them? This should be explained, or, for consistency, the lesson should just load qtl2.

Calculating Genotype Probabilities comments

I think this page is very well constructed! It flows very nicely and the explanations are very clear.

  1. One thing I noticed that wasn't explained was how to choose the error probability. How do you determine error_prob? A short explanation would be nice here:
# How did you get 0.002?
pr <- calc_genoprob(cross=iron, map=map, error_prob=0.002)

kinship loco in 09 genome scan lmm

after explaining the model that accounts for relatedness, include an example of the LOCO kinship calculation
fit the model leaving one chromosome out
kinship should exclude the allele being tested; in practice, use all chromosomes except the current chromosome
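The leave-one-chromosome-out calculation can be sketched in base R. For each chromosome, kinship is recomputed from markers on all other chromosomes, so the chromosome being scanned is not also part of the random effect. This sketch assumes the marker-averaged allele-probability kinship from the kinship episode. The data are random, and the helper name loco_kinship() is hypothetical. In the lesson itself, the corresponding call is calc_kinship(pr, "loco"), which returns one kinship matrix per chromosome.

```r
set.seed(1)
n_mice <- 5

# Random two-allele probabilities (mice x alleles) for one marker
make_marker <- function(n) {
  p <- runif(n)
  cbind(A = p, B = 1 - p)
}

# Three made-up chromosomes with 4, 2, and 3 markers
chr_probs <- list(chr1 = replicate(4, make_marker(n_mice), simplify = FALSE),
                  chr2 = replicate(2, make_marker(n_mice), simplify = FALSE),
                  chr3 = replicate(3, make_marker(n_mice), simplify = FALSE))

# Hypothetical helper: LOCO kinship matrix for every chromosome
loco_kinship <- function(chr_probs) {
  # Per-chromosome sum of marker similarities, and marker counts
  sums   <- lapply(chr_probs, function(m) Reduce(`+`, lapply(m, tcrossprod)))
  counts <- vapply(chr_probs, length, integer(1))
  total_sum <- Reduce(`+`, sums)
  total_n   <- sum(counts)
  # For chromosome i: average over markers on every chromosome except i
  lapply(setNames(seq_along(sums), names(sums)), function(i)
    (total_sum - sums[[i]]) / (total_n - counts[[i]]))
}

k_loco <- loco_kinship(chr_probs)
names(k_loco)
```

Subtracting each chromosome's contribution from a single genome-wide total avoids recomputing the other chromosomes once per chromosome.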

Under Kinship: better explanation of allele vs. genotype probabilities

During the Kinship Matrix section, there was discussion about using allele probabilities vs. genotype probabilities. There was a specific question about why you would choose one over the other; however, I wasn't even sure what the difference between allele probabilities and genotype probabilities was. When I spoke to Dan during the break, it turned out I actually had the two concepts reversed.
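A small base-R illustration of the difference, assuming a two-founder cross with alleles A and B. Genotype probabilities describe the pair of alleles an individual carries at a position (AA, AB, BB). Allele probabilities describe a single allele drawn at random from that individual, so a heterozygous genotype contributes half its probability to each allele. In qtl2 this collapse is done by genoprob_to_alleleprob(). The helper below is a hypothetical stand-in that handles only the two-allele case.

```r
# Made-up genotype probabilities at one marker for three mice
geno_probs <- rbind(mouse1 = c(AA = 0.9, AB = 0.1, BB = 0.0),
                    mouse2 = c(AA = 0.0, AB = 1.0, BB = 0.0),
                    mouse3 = c(AA = 0.2, AB = 0.3, BB = 0.5))

# Hypothetical two-allele helper: a homozygote contributes fully to its
# allele, and the heterozygote contributes half to each allele.
geno_to_allele <- function(gp) {
  cbind(A = gp[, "AA"] + 0.5 * gp[, "AB"],
        B = gp[, "BB"] + 0.5 * gp[, "AB"])
}

allele_probs <- geno_to_allele(geno_probs)
allele_probs
```

Mouse 2 shows why the two are easy to mix up. Its genotype is known exactly (AB with probability 1), yet its allele probabilities are 0.5 and 0.5.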

script of all commands

as a reference, so that people can run through the commands quickly instead of searching through the lesson for individual commands.
Karl's documentation should have it.

LOCO model

... except the genotype under consideration. Why?

because you're putting the same genotype (predictor) into the model twice
overfitting
multi-collinearity

add sex as a covariate

we only use X-chromosome covariates in the lesson
change episode "Special covariates for the X chromosome" to "Covariates", with additive covariates as subhead 1 and special X-chromosome covariates as subhead 2
for addcovar use sex and direction of cross
for Xchr covar do what's written
update all subsequent episodes - all genome scans and perms
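A minimal sketch of building the additive covariate matrix with model.matrix(). It assumes the covariate table has a sex column and a cross_direction column, as we believe iron$covar does; check names(iron$covar) before relying on it. The small data frame here is a made-up stand-in so the example runs on its own. The resulting numeric matrix, with one row per mouse, would then be passed to scan1() through its addcovar argument, for example scan1(pr, iron$pheno, addcovar = addcovar, Xcovar = Xcovar).

```r
# Made-up stand-in for iron$covar: one row per mouse, IDs as row names
covar <- data.frame(sex = c("f", "m", "m", "f"),
                    cross_direction = c("(SxB)x(SxB)", "(SxB)x(SxB)",
                                        "(BxS)x(BxS)", "(BxS)x(BxS)"),
                    row.names = paste0("mouse", 1:4))

# Dummy-code both factors, then drop the intercept column
# (assumption: scan1() adds its own intercept).
addcovar <- model.matrix(~ sex + cross_direction, data = covar)[, -1, drop = FALSE]
addcovar
```

Using model.matrix() rather than hand-coding 0/1 columns keeps the reference levels explicit and extends cleanly if more covariates are added later.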

Change order of lessons

Students were confused by the threshold in find_peaks(). Move permutations to after the genome scan and before find_peaks().

line plot of permutations at 10, 100, and 1000

showing spread of permutations

My permutation simulations.

library(qtl2)
library(dplyr)    # for %>%
library(tidyr)    # for pivot_longer()
library(ggplot2)

# Run scan1perm() n_rep times with n_perm permutations each and record the
# 95th-percentile (alpha = 0.05) LOD threshold from every run.
perm_thresholds <- function(n_perm, n_rep = 100, cores = 1) {
  thr <- numeric(n_rep)
  for (i in seq_len(n_rep)) {
    tmp <- scan1perm(genoprobs = pr, pheno = iron$pheno[, "liver", drop = FALSE],
                     Xcovar = Xcovar, n_perm = n_perm, cores = cores)
    thr[i] <- quantile(tmp, probs = 0.95)
  }
  thr
}

thrs  <- perm_thresholds(10)
thrs2 <- perm_thresholds(100)
thrs3 <- perm_thresholds(1000, cores = 4)

data.frame(perm10 = thrs, perm100 = thrs2, perm1000 = thrs3) %>%
  pivot_longer(cols = everything()) %>%
  ggplot() +
  geom_density(aes(value, color = name), linewidth = 1) +   # linewidth needs ggplot2 >= 3.4
  labs(title = "Comparison of 0.05 threshold using 10, 100, and 1000 permutations",
       x = "LOD", color = "Num. Perms")
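The reason the threshold settles down as the number of permutations grows can be shown without qtl2. The 0.05 threshold is just the 95th percentile of the permutation maxima, and a percentile estimated from 10 draws is far noisier than one estimated from 1000. This base-R sketch draws made-up "maximum LOD" values from a stand-in null distribution (a shifted gamma, chosen only for illustration) instead of running scan1perm().

```r
set.seed(42)

# Stand-in for one scan1perm() run: n_perm made-up genome-wide max LODs
fake_perm_maxima <- function(n_perm) 1 + rgamma(n_perm, shape = 4, rate = 2)

# Repeat the threshold estimate 100 times for each number of permutations
n_perms <- c(10, 100, 1000)
thresholds <- sapply(n_perms, function(n)
  replicate(100, quantile(fake_perm_maxima(n), probs = 0.95)))
colnames(thresholds) <- paste0("perm", n_perms)

# Spread (SD) of the estimated 0.05 threshold at each number of permutations
sds <- apply(thresholds, 2, sd)
sds
```

The standard deviation should shrink roughly with the square root of the number of permutations, which is what the density plot of the real scan1perm() runs is meant to show.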

Introduction comments:

  1. Sample datasets: the read_cross2 step is redundant with the previous setup page. I think the two steps should be collapsed so you don't end up doing the same thing in 5 different ways, which seems to be the easiest way to confuse people (i.e., me :) ).
# Redundant:
grav2 <- read_cross2( system.file("extdata", "grav2.zip", package="qtl2geno") )
grav2 <- read_cross2("http://kbroman.org/qtl2/assets/sampledata/grav2/grav2.zip")
grav2 <- read_cross2("~/my_data/grav2.zip")
zip_datafiles("~/my_data/grav2.yaml")
grav2 <- read_cross2("~/my_data/grav2.yaml")

# From all these I think the easiest method is:
grav2 <- read_cross2( system.file("extdata", "grav2.zip", package="qtl2geno") )
# but it would be good for everyone to download the sample data into their own directory 
# and learn to load data from their own machines:
grav2 <- read_cross2("~/my_data/grav2.yaml")

Setup comments

  1. Creating an RStudio project is currently optional. Either way, it would be good to start everyone in the same directory to prevent later confusion about directories. For example, it might be easiest to have everyone create a directory called "mapping" on their desktop, along with its subdirectories, by running the following:
setwd("~/Desktop")
dir.create("./mapping")       # project folder
setwd("~/Desktop/mapping")
dir.create("./data")          # input data
dir.create("./scripts")       # R scripts
dir.create("./results")       # output
  1. Downloading DOQTL data:
    It should be stated that once the data have been downloaded, they should be put into the ~/Desktop/mapping/data directory.

  2. The initial steps that demonstrate reading data are a bit confusing because grav2.yaml hasn't been downloaded yet. Where do you get it? There is no link. It might be better either to have participants download grav2.yaml first and put it into their data directory before the demo, or to read it directly from Karl's website; showing all the different ways might be confusing:

library(qtl2geno)
grav2 <- read_cross2("http://kbroman.org/qtl2/assets/sampledata/grav2/grav2.zip")
#or download into directory first then:
grav2 <- read_cross2("~/my_data/grav2.yaml")
  1. write_control_file() is mentioned at the very end but lacks explanation. It looks like a useful function, so if it is going to be mentioned, it would help to provide some examples of its use. This example is taken from the documentation:
?write_control_file

write_control_file("~/my_data/grav2.yaml",
                        crosstype="riself",
                        geno_file="grav2_geno.csv",
                        gmap_file="grav2_gmap.csv",
                        pheno_file="grav2_pheno.csv",
                        phenocovar_file="grav2_phenocovar.csv",
                        geno_codes=c(L=1L, C=2L),
                        alleles=c("L", "C"),
                        na.strings=c("-", "NA"))
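For the directory setup suggested in the first Setup comment, one alternative builds the folders with file.path() and dir.create(recursive = TRUE) instead of changing the working directory between calls. The base location is an assumption. Here it defaults to a temporary directory so the example can run anywhere; you would swap in "~/Desktop/mapping" for the workshop.

```r
# Assumed project root; use "~/Desktop/mapping" in the workshop
base <- file.path(tempdir(), "mapping")

# Create mapping/ and its subfolders in one pass: recursive = TRUE also
# creates mapping/ itself, and showWarnings = FALSE makes re-running harmless.
for (sub in c("data", "scripts", "results")) {
  dir.create(file.path(base, sub), recursive = TRUE, showWarnings = FALSE)
}

# Work from the project root from here on
setwd(base)
list.files(base)
```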

permutation exercise

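# Shuffle the individual IDs so phenotypes are no longer linked to genotypes
new_order <- sample(rownames(iron$pheno))
pheno_perm <- iron$pheno
rownames(pheno_perm) <- new_order
# Shuffle the X covariate the same way so it stays with its phenotype
xcovar_perm <- Xcovar
rownames(xcovar_perm) <- new_order
# Scan the scrambled data and plot the LOD curve
p <- scan1(genoprobs = pr, pheno = pheno_perm, Xcovar = xcovar_perm)
plot(p, map)
head(new_order)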

regular scan has max lod ~ 7

new permutation has max lod ~ 2

scrambled all the data so that phenotypes are no longer connected to the sample or mouse

what is max lod by chance?

paste your max lod into etherpad
load all max lods into a vector
create a histogram of max lods
do 1000 perms as in the lesson
plot the max lod in a histogram
make a hist of operm[,1]
now on your own, hist of operm[,2]
draw abline for 0.2 or 0.05 threshold for liver or spleen onto histogram
abline(v = 3.36, col = "red")
now on your own, do the same for spleen, for 0.2, etc. use blue, gray, etc.
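The exercise can be mirrored in a self-contained base-R sketch. It simulates genotypes at many markers with no true QTL, shuffles the phenotype, and records the genome-wide maximum LOD from each shuffle. The LOD at each marker comes from single-marker linear regression, (n/2) log10(RSS0/RSS1). For one predictor this equals -(n/2) log10(1 - r^2), where r is the phenotype-genotype correlation. This is a simplification of what scan1() does: it uses no genotype probabilities and no covariates. All data are simulated, and max_lod() is a hypothetical helper.

```r
set.seed(2024)
n_mice    <- 100
n_markers <- 200

# Simulated backcross-style genotypes (0/1) and a phenotype with no real QTL
geno  <- matrix(rbinom(n_mice * n_markers, 1, 0.5), nrow = n_mice)
pheno <- rnorm(n_mice)

# Hypothetical helper: maximum single-marker LOD across all markers
max_lod <- function(y, geno) {
  r   <- cor(y, geno)                    # correlation with every marker
  lod <- -length(y) / 2 * log10(1 - r^2) # same as (n/2) log10(RSS0/RSS1)
  max(lod, na.rm = TRUE)
}

# Null distribution of the max LOD: shuffle phenotypes, rescan, repeat
n_perm   <- 1000
perm_max <- replicate(n_perm, max_lod(sample(pheno), geno))

hist(perm_max, breaks = 30, xlab = "Maximum LOD", main = "Max LOD by chance")
thr <- quantile(perm_max, probs = c(0.80, 0.95))   # alpha = 0.2 and 0.05
abline(v = thr, col = c("blue", "red"))
thr
```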

create script for comparison of kinship methods for challenge

create a script of the following for people to copy and paste, then change lodcolumn to 2 and the phenotype column to 2 for spleen
create challenge and provide url to script for copy-paste

this is for liver

add comment

plot(out, map, lodcolumn=1,
     col=color[1],
     main=colnames(iron$pheno)[1],
     ylim=c(0, ymx*1.02))

add comment

plot(out_pg, map, lodcolumn=1, col=color[2], add=TRUE)

add comment

plot(out_pg_loco, map, lodcolumn=1,
     col=color[3], add=TRUE, lty=2)

add comment

legend("topleft", lwd=2, col=color,
       c("H-K", "LMM", "LOCO"), bg="gray90", lty=c(1,1,2))

plot(out, lodcolumn="liver", map)
plot(out, lodcolumn="liver", map, col="black", add=TRUE)
plot(out_pg, lodcolumn="liver", map, col="red", add=TRUE)
plot(out_pg_loco, lodcolumn="liver", map, col="yellow", add=TRUE)