ngs-docs / angus
Materials for Analyzing Next-Generation Sequencing (ANGUS) course.
License: Creative Commons Zero v1.0 Universal
Could also add clicker-style items / active learning
“Challenge questions” socrative.com <- marian has experience.
Can use draw.io collaboratively? Or google draw? Or…? Develop flowcharts collaboratively in classroom.
Can the link on http://angus.readthedocs.org/en/2015/week3.html to the assembly material (now pointing to https://github.com/ngs-docs/angus/blob/2015/week3/LN_assembly.md) be made to point to http://angus.readthedocs.org/en/2015/week3/LN_assembly.html?
Need to replace code for downloading PLINK in the Genome wide association study tutorial
https://angus.readthedocs.io/en/2017/running-blast-large-scale.html doesn't exist yet.
my thought was to run something that took a little while, output some data that we could use in https://angus.readthedocs.io/en/2017/visualizing-blast-scores-with-RStudio.html, and let us show off "starting something big and walking away for a bit."
my first naive thought was to calculate BLAST score distributions for some large reciprocal BLAST (e.g. ecoli vs shewanella or something), visualize it, and then ask students to pick a cutoff based on the graph, and maybe do a little bit of analysis of how many putative orthologs there are.
or we could drag shmlast into this.
@camillescott what do you think?
See below
## Bibliography
It's also possible to include a bibliography file in the YAML header. Bibliography formats that are readable by Pandoc include the following:
| Format | File extension |
|-----+-------|
| MODS | .mods |
| BibLaTeX | .bib |
| BibTeX | .bibtex |
| RIS | .ris |
| EndNote | .enl |
| EndNote XML | .xml |
| ISI | .wos |
| MEDLINE | .medline |
| Copac | .copac |
| JSON citeproc | .json |
second line should be | --- | --- |
All_SKAT_Data_reduced$ <- All_SKAT_Data_reduced[order(All_SKAT_Data_reduced$P.value),]
Has one too many $.
Perhaps progressive taxation will help?
@camillescott adding info on public-private key pair
The tutorial assumes that you have R and RStudio installed on an instance.
I was thinking we could demo Open Science Framework as a place to do closed collaborations that could then => open upon publication.
what other demos should we do? figshare? twitter? seqanswers? stackoverflow?
note, Sue McClatchy has agreed to be external contact.
Notes to be included in the lesson
-Use set -e
: stops the execution of a script if a command or pipeline has an error - which is the opposite of the default shell behaviour, which is to ignore errors in scripts
-Use set -x
: prints out each command before executing it - extra useful for debugging
Both options are great to include at the beginning of the file
Include #! /bin/bash at the beginning of the script to define the script interpreter
Include bash <scriptfile> as a means of running the script file (instead of going directly to ./<scriptfile>)
Include information on batch scripts
Check for a flag in curl for not re-downloading the file every single time
Add the way to rename a screen: at creation, screen -S <name>; after creation, Ctrl+a, then :sessionname <name>
Add commands to detach a screen without the shortcut: screen -d <ID>
Another way to terminate a screen without needing to reattach: screen -X -S [session # you want to kill] quit
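The script-related notes above can be sketched as a small template. This is a minimal sketch, not the lesson's actual script: the helper name `fetch_once` is hypothetical, and its existence check is only a simple stand-in for curl's own options for avoiding repeat downloads (for example `-z`/`--time-cond`).

```shell
#!/bin/bash
# Stop at the first failing command instead of carrying on (set -e),
# and echo each command before it runs (set -x) to help with debugging.
set -e
set -x

# fetch_once URL FILE  (hypothetical helper, not part of the lesson)
# Download URL to FILE only when FILE is not already present.
fetch_once() {
    local url="$1" out="$2"
    if [ -f "$out" ]; then
        echo "skipping download, $out already exists"
    else
        curl -L -o "$out" "$url"
    fi
}
```

Saved as a file, this would be run with `bash <scriptfile>`, and a download step would then be written as `fetch_once <url> <file>`.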
web interface instance password change
we should at least mention this, and maybe demo it (if we have a group of people that are prepared for complexity in 2nd week)
Day 1 and 2: https://hackmd.io/KYExEMA4CMAYFYC0BGA7NAzIgLMgxrItHuOIusNhqhpOMrAGZA==?both
Day 3:
https://hackmd.io/CwDgZgpg7AJgTHAtBAxgVgIaOARhAZkRDQh0TQDYZwwoUAGCtfIA
Day 4:
https://hackmd.io/MwdgrCBGAMCmCcBaEATAxgM0QFniaiAHMAIwBMiwhGY2aa2h8AhmUA==
Day 8 Sourmash:
https://hackmd.io/MwMwDARgTBAmCGBaAjANgKwE5EBZn3UXgFN0IUSAOTYCYKVYqIA=
Day 10 RMarkdown Exploratory Data Analysis:
https://hackmd.io/MYNgrMCcDsBMAMBaYAjeAORAWAZgQz0TyxViPmklgGYQBTAE2tCA
we should update the section on what books people recommend.
Update and improve http://sjackman.github.io/abyss-activity/
In the "what's in my metagenome" section, the command
sourmash gather -k 31 ecoli-genome.sig genbank-k31.sbt.json
fails with a file not found error. Need to add in ../ before the genbank file
it looks like we will have a stats lecture on thursday morning, so maybe we should do counting on thursday and then assembly/annotation on friday?
In the nanopore tutorial (and probably others) there are quite a few export statements that get forgotten when people get disconnected from their favourite cloud provider:
export PATH=/home/linuxbrew/.linuxbrew/bin:$PATH
export PATH=$PWD/prokka/bin:$PATH
TODO: add to .bashrc or .bash_profile and source it
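One way to make these settings survive a disconnect is to append each line to the shell startup file once and then re-source it. A minimal sketch, assuming bash; the helper name `add_line_once` is hypothetical.

```shell
# add_line_once LINE FILE  (hypothetical helper)
# Append LINE to FILE unless an identical line is already there,
# so re-running the setup does not pile up duplicate exports.
add_line_once() {
    local line="$1" file="$2"
    touch "$file"
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}
```

For example, `add_line_once 'export PATH=/home/linuxbrew/.linuxbrew/bin:$PATH' ~/.bashrc` followed by `source ~/.bashrc`. Note that the prokka line uses `$PWD`, which is evaluated each time the startup file is read, so an absolute path would be the safer thing to store there.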
Some high level thoughts from a quick review --
We don't have diagrams or explanations for a lot of what we are doing in week 1.
We should sprinkle in topic guides to help pace things through week 1.
Many of the files downloaded and used are coming in from other repos, e.g. the install-edgeR.R script here. We should move them into angus/2018.
We don't have any formative assessment stuff sprinkled in (questions, mechanisms, etc.)
We may want to add in instructor guides for extra material in case people end too quickly.
We should add hackmds for each major topic shift, as per day1.
write up something on how to choose/benchmark bioinformatics software
Syntax error in SSID_count <- count(SSID[,1]) in NGS_GWAS_via_SKAT.md: the comma should be removed.
ie vim and nano aren't updating their UIs, etc.
On a new instance, following http://angus.readthedocs.io/en/2017/deseq2-asthma.html, install-deseq2.R doesn't install GenomicFeatures (nor DESeq2)
R version 3.4.1 (2017-06-30) -- "Single Candle"
> library(GenomicFeatures)
Error in library(GenomicFeatures) :
there is no package called ‘GenomicFeatures’
when installing with install-deseq2.R:
* removing ‘/usr/local/lib/R/site-library/GenomicAlignments’
ERROR: dependency ‘annotate’ is not available for package ‘genefilter’
* removing ‘/usr/local/lib/R/site-library/genefilter’
ERROR: dependency ‘annotate’ is not available for package ‘geneplotter’
* removing ‘/usr/local/lib/R/site-library/geneplotter’
ERROR: dependencies ‘GenomicRanges’, ‘XML’, ‘GenomeInfoDb’, ‘RCurl’, ‘Rsamtools’, ‘GenomicAlignments’ are not available for package ‘rtracklayer’
* removing ‘/usr/local/lib/R/site-library/rtracklayer’
ERROR: dependencies ‘GenomicRanges’, ‘SummarizedExperiment’, ‘genefilter’, ‘geneplotter’ are not available for package ‘DESeq2’
* removing ‘/usr/local/lib/R/site-library/DESeq2’
ERROR: dependencies ‘GenomeInfoDb’, ‘GenomicRanges’, ‘RCurl’, ‘rtracklayer’, ‘biomaRt’ are not available for package ‘GenomicFeatures’
* removing ‘/usr/local/lib/R/site-library/GenomicFeatures’
points to touch on:
put in an optional discussion/info on apt-get vs pip vs 'git clone' vs perl vs ruby (if we have any ruby)
do we want to link in session feedback forms? if so how do we avoid unconscious bias issues?
@lexnederbragt has a nice formula "a wish and a star" - we could ask attendees who are giving feedback on a session to say at least one nice thing (a star) as well as one "wish" for change.
I would also say that only a few of us could/should read the feedback and then summarize to the instructors. this can be a lot of work, though, and in the past I've never gotten to actually doing it.
...in the first few days.
Add information about shutting down an instance to the booting an instance tutorial
Problems I see from variant calling lesson:
(https://github.com/ngs-docs/angus/blob/2017/variant-calling.md)
samtools sort SRR2584857.bam SRR2584857.sorted
samtools view SRR2584857.sorted.bam 'ecoli:920514-920514' > out.bam
wc -l
I'll try and fix some of these tomorrow!
It seems the data we used in 2016 has been removed from SRA!
https://github.com/ngs-docs/angus/blob/2017/kraken_species_identification.rst
If anyone knows a good metagenome example SRR can they update this?
we should sprinkle in formative assessments and exercises during various tutorials (clicker-style, using Socrative or Google Forms).
@karenword presumably has some thoughts :)
for tonight, maybe -
The line defining All_SKAT_Data_reduced in NGS_GWAS_via_SKAT.md should be
All_SKAT_Data_reduced <- read.table(file = "data/SKAT_all_reduced-pvals.results", header =TRUE)
Comment about the Intro to Docker tutorial.
In the Dockerfile, consider putting apt-get update and apt-get install on the same line, the same way it was done interactively. It has a few benefits: it builds only one layer, and the install runs only if the update is successful. Arguably, the git clone and the make call could be together too, but that might obscure things too much. It's probably better to just stick to the sequence of actions done interactively, i.e. the Dockerfile is essentially the content of history with a few special commands prepended to each line.
Use the explanations from the data-lessons/cloud-genomics Carpentries lessons.
See https://github.com/data-lessons/cloud-genomics/tree/gh-pages/_episodes "beta" episodes.
inconsistent title size and large images in meta_GWAS.md
The link to the read group document is broken.
Wouldn't it be easier to parse the BLAST output for plotting e-value distributions in R if the BLAST output were tab-delimited or comma-separated instead of the default flat query-anchored type?
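If the BLAST run is switched to tabular output (`-outfmt 6` in BLAST+), the e-values can be pulled out with standard tools before plotting in R. A minimal sketch, assuming the default 12-column `-outfmt 6` layout, where the e-value is column 11 and the bit score is column 12; the helper name and the column headers it writes are hypothetical.

```shell
# extract_evalues SRC.tsv DEST.tsv  (hypothetical helper)
# Keep query id, subject id, e-value and bit score from default
# BLAST+ tabular output (-outfmt 6), with a header row for R.
extract_evalues() {
    local src="$1" dest="$2"
    {
        printf 'query\tsubject\tevalue\tbitscore\n'
        awk -F '\t' 'BEGIN { OFS = "\t" } { print $1, $2, $11, $12 }' "$src"
    } > "$dest"
}
```

The resulting file can be loaded in R with `read.delim()`.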
See: http://ivory.idyll.org/blog/2014-davis-swc-training.html.
Watch this space (well, subscribe to this issue :) for updates!
Please e-mail [email protected] directly if you have questions.
should be during first two days
apt-get install doesn't work properly for fastqc - see https://angus.readthedocs.io/en/2017/quality-trimming.html.
I think we can just install the prereqs and then download fastqc / unpack it ourselves. but worth trying out.
The file on booting a jetstream instance still talks about a username and password for the UCSC workshop. This should be changed to refer to ANGUS.
The link to the 48-replicate experiment lacks the reference, which will survive longer than the link.
Add author, publication information for this paper to hands-on.Rmd
The installs of stringr and SKAT in Setup-for-gwas-on-EC2.md failed, so many of the learners on EC2 could not load these libraries.
This requires debugging on EC2.
see https://angus.readthedocs.io/en/2017/kmers-and-sourmash.html - the genbank download is too big for m1.small instances.
we could use this as a way to talk about creating extra disk space, or we could remove it... or see if @luizirber IPFS demo is ready :)
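If this becomes a way to talk about disk space, checking what is free before the download is a natural first step. A minimal sketch using `df`; both helper names are hypothetical, and the required size is left as a parameter because the size of the GenBank download is not stated here.

```shell
# free_kb DIR  (hypothetical helper)
# Print the free space, in 1K blocks, on the filesystem holding DIR.
free_kb() {
    df -Pk "${1:-.}" | awk 'NR == 2 { print $4 }'
}

# has_space DIR NEEDED_KB  (hypothetical helper)
# Succeed only if at least NEEDED_KB 1K blocks are free under DIR.
has_space() {
    [ "$(free_kb "$1")" -ge "$2" ]
}
```

A tutorial step could then guard the download with `has_space . <needed_kb> || echo "not enough disk space"`.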