R Genomics

This is a repository for the Data Carpentry R Genomics lessons.

r-genomics's Issues

overall idea for R-genomics lesson

Have a metadata file with all the clones, with these columns:
sample, generation, clade, strain, cit, run, genome_size
Read in that file.
Look at it with str and summary.
Talk about factors.
Deal with NAs somehow, maybe where citrate level unknown?
Do some summary statistics with genome size.
Filter on citrate status or on generation.
Use mutate to do a unit conversion on genome size.
Group by citrate status and then look at how genome size relates to status.
Plot average genome size by citrate status, or by generation.
Could do a t-test for genome size by citrate status.
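
The steps above could be sketched roughly as follows. This is only an illustration under stated assumptions: the data values are invented, the file is a temporary CSV standing in for the real metadata file, citrate status is assumed to be coded as "plus"/"minus" with NA for unknown, and genome size is assumed to be in Mbp. Only the column names come from the list above.

```r
library(dplyr)

# Hypothetical stand-in for the clone metadata file; every value is invented.
csv_text <- "sample,generation,clade,strain,cit,run,genome_size
clone_a,0,clade_1,strain_x,NA,run_01,4.62
clone_b,2000,clade_1,strain_x,minus,run_02,4.63
clone_c,20000,clade_2,strain_x,minus,run_03,4.61
clone_d,30000,clade_2,strain_x,plus,run_04,4.75
clone_e,40000,clade_3,strain_x,plus,run_05,4.80
clone_f,50000,clade_3,strain_x,plus,run_06,4.78
clone_g,50000,clade_3,strain_x,minus,run_07,4.64"
metadata_path <- tempfile(fileext = ".csv")
writeLines(csv_text, metadata_path)

# Read in the file and look at it.
metadata <- read.csv(metadata_path, stringsAsFactors = FALSE)
str(metadata)
summary(metadata)

# Factors: treat citrate status as a categorical variable.
metadata$cit <- factor(metadata$cit, levels = c("minus", "plus"))
levels(metadata$cit)

# NAs: count clones with unknown citrate status, then set them aside.
n_unknown_cit <- sum(is.na(metadata$cit))
known_cit <- filter(metadata, !is.na(cit))

# Summary statistics on genome size.
mean_size <- mean(metadata$genome_size)
sd_size <- sd(metadata$genome_size)

# Filter on generation (the 30000 cutoff is arbitrary).
late_clones <- filter(metadata, generation >= 30000)

# Mutate: unit conversion, assuming genome_size is in Mbp.
metadata <- mutate(metadata, genome_size_bp = genome_size * 1e6)

# Group by citrate status and summarise genome size.
size_by_cit <- known_cit %>%
  group_by(cit) %>%
  summarise(n_clones = n(),
            mean_size = mean(genome_size),
            sd_size = sd(genome_size))
print(size_by_cit)

# Welch two-sample t-test of genome size by citrate status.
size_test <- t.test(genome_size ~ cit, data = known_cit)
print(size_test)
```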

Then, after the dplyr lesson, move on to working with the SNP calling data. That information is in another file.
Read in that file, then join the SNP file with the metadata file using an inner_join.

Read in SNP file
Look at that file
Stitch them together with inner_join
inner_join
https://cran.r-project.org/web/packages/dplyr/dplyr.pdf
https://stat545-ubc.github.io/bit001_dplyr-cheatsheet.html
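
A minimal sketch of the join, assuming both tables share a `sample` column. The two data frames below are invented stand-ins for the real files, and the SNP column names (`position`, `ref`, `alt`) are hypothetical.

```r
library(dplyr)

# Hypothetical metadata table (invented values).
metadata <- data.frame(
  sample = c("clone_a", "clone_b", "clone_c"),
  cit = c("minus", "plus", "plus"),
  generation = c(2000, 30000, 50000),
  stringsAsFactors = FALSE
)

# Hypothetical SNP table, one row per SNP call (invented values).
snps <- data.frame(
  sample = c("clone_a", "clone_a", "clone_b", "clone_x"),
  position = c(1021, 48830, 1021, 777),
  ref = c("A", "G", "A", "T"),
  alt = c("G", "T", "G", "C"),
  stringsAsFactors = FALSE
)

# Look at the SNP table.
str(snps)
head(snps)

# inner_join keeps only rows whose sample appears in both tables:
# clone_x has no metadata and clone_c has no SNP calls, so both drop out.
snps_with_metadata <- inner_join(snps, metadata, by = "sample")
print(snps_with_metadata)
```

Because an inner join silently drops unmatched rows, comparing `nrow(snps)` with `nrow(snps_with_metadata)` is a quick way to notice samples missing from either file.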

Then use inspiration from Ryan's R_visualization lesson and do things with the SNP calls.

look at number of SNPs per clone
compute the average and sd of the number of SNPs for each citrate type
ggplot averages and sd
exercise: how would you do this for generation number?
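
The per-clone counts and the per-citrate-type summary might look something like this. The joined table here is an invented stand-in with one row per SNP call, and the `position` column name is an assumption.

```r
library(dplyr)

# Hypothetical joined SNP + metadata table (invented values).
snps_with_metadata <- data.frame(
  sample = c("clone_a", "clone_a",
             "clone_b", "clone_b", "clone_b",
             "clone_c",
             "clone_d", "clone_d", "clone_d", "clone_d"),
  cit = c(rep("minus", 5), rep("plus", 5)),
  position = c(10, 20, 10, 30, 40, 50, 10, 20, 60, 70),
  stringsAsFactors = FALSE
)

# Number of SNPs per clone. Clones with zero SNP calls have no rows in the
# joined table, so they would not appear in this count.
snps_per_clone <- snps_with_metadata %>%
  count(sample, cit, name = "n_snps")
print(snps_per_clone)

# Average and standard deviation of SNP counts for each citrate type.
snp_summary <- snps_per_clone %>%
  group_by(cit) %>%
  summarise(mean_snps = mean(n_snps),
            sd_snps = sd(n_snps))
print(snp_summary)
```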

multivariate analysis showing how close they are to each other
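
The note above does not name a method. One possible approach, shown here only as an assumption, is to build a clone-by-SNP presence/absence matrix and cluster clones on a binary distance using base R. The data and the `position` column name are invented.

```r
# Hypothetical SNP calls, one row per call (invented values).
snps <- data.frame(
  sample = c("clone_a", "clone_a", "clone_b", "clone_b", "clone_c"),
  position = c(100, 200, 100, 200, 300),
  stringsAsFactors = FALSE
)

# Clone-by-position matrix: 1 if the clone carries a SNP at that position.
counts <- table(snps$sample, snps$position)
presence <- matrix(as.numeric(counts > 0),
                   nrow = nrow(counts),
                   dimnames = dimnames(counts))

# Binary distance: share of positions that differ, among positions where
# at least one of the two clones has a SNP.
clone_dist <- dist(presence, method = "binary")
print(clone_dist)

# Hierarchical clustering; plot(clone_tree) would draw the dendrogram.
clone_tree <- hclust(clone_dist)
```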

Re-synchronize with upstream

I have now injected the history from R-ecology, resolved merge conflicts, and merged your development into the main Data Carpentry repo. Before continuing development in this fork, please ensure that you update from upstream. If you do this, it necessarily won't be a fast-forward. Alternatively, delete the fork and fork again for a clean start, though see below.

I see that you have open issues in your fork's tracker (#1 and #2); if these are still current, could you please move them to the main repo so we have a central place to track issues with the lesson?

Get SNP dataset

We need to get the SNP calls for the E. coli genomes in this study. They might already be in Dryad:

http://datadryad.org/resource/doi:10.5061/dryad.8q6n4

tracykteal / r-genomics