griffithlab / rnaseq_tutorial

Informatics for RNA-seq: A web resource for analysis on the cloud. Educational tutorials and working pipelines for RNA-seq analysis including an introduction to: cloud computing, critical file formats, reference genomes, gene annotation, expression, differential expression, alternative splicing, data visualization, and interpretation.

License: Other

R 66.91% Perl 24.22% Shell 8.87%

rnaseq_tutorial's Introduction

Informatics for RNA-seq: A web resource for analysis on the cloud

THIS VERSION OF THE RNA-SEQ COURSE IS DEPRECATED. FOR CURRENT VERSION PLEASE VISIT: https://rnabio.org/

An educational tutorial and working demonstration pipeline for RNA-seq analysis including an introduction to: cloud computing, next generation sequence file formats, reference genomes, gene annotation, expression analysis, differential expression analysis, alternative splicing analysis, data visualization, and interpretation.

This repository is used to store code and certain raw materials for a detailed RNA-seq tutorial. To actually complete this tutorial, go to the RNA-seq tutorial wiki.

Citation: Malachi Griffith*, Jason R. Walker, Nicholas C. Spies, Benjamin J. Ainscough, Obi L. Griffith*. 2015. Informatics for RNA-seq: A web resource for analysis on the cloud. PLoS Comp Biol. 11(8):e1004393.

*To whom correspondence should be addressed: E-mail: mgriffit[AT]genome.wustl.edu, ogriffit[AT]genome.wustl.edu

Note: An archived version of this tutorial exists here. This version is maintained for consistency with the published materials (Griffith et al. 2015. PLoS Comp Biol.) and for past students wishing to review covered material. However, we strongly suggest that you continue with the current version of the tutorial below.

Want to contribute to the RNA-seq Wiki?

Fork it and send a pull request.

Tutorial Table of Contents

Module 0 - Introduction and Cloud Computing

Module 1 - Introduction to RNA sequencing

Module 2 - RNA-seq Alignment and Visualization

Module 3 - Expression and Differential Expression

Module 4 - Isoform Discovery and Alternative Expression

Module 5 - De novo transcript reconstruction

De novo RNA-Seq Assembly and Analysis Using Trinity

Module 6 - Functional Annotation of Transcripts

Functional Annotation of Assembled Transcripts Using Trinotate

Appendix

rnaseq_tutorial's Issues

Replace ENSG IDs with gene names/symbols

Create a mapping file, output a final list of DE gene names, create a mapping to gene name in R and use gene names as symbols in final figures.
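
One way to build the mapping file is to pull gene_id and gene_name pairs out of the annotation GTF. The following is a minimal sketch, assuming an Ensembl-style GTF whose ninth column carries both attributes; the function name gtf_gene_map and the output file name in the usage comment are hypothetical and not part of the tutorial.

```shell
#!/usr/bin/env bash
# Build a two-column gene_id -> gene_name table from a GTF file.
# Assumption: attributes look like  gene_id "ENSG..."; gene_name "SYMBOL";
gtf_gene_map() {
  awk -F'\t' '
    $0 !~ /^#/ {
      id = ""; name = ""
      # "gene_id \"" is 9 characters long, "gene_name \"" is 11.
      if (match($9, /gene_id "[^"]+"/))   id   = substr($9, RSTART + 9,  RLENGTH - 10)
      if (match($9, /gene_name "[^"]+"/)) name = substr($9, RSTART + 11, RLENGTH - 12)
      # Emit each gene once, even though it appears on many GTF lines.
      if (id != "" && name != "" && !seen[id]++) print id "\t" name
    }' "$1"
}

# Usage (output file name is illustrative):
#   gtf_gene_map genes_chr22_ERCC92.gtf > ensg_to_symbol.tsv
```

The resulting table can then be joined to the DE results in R on the ID column.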

Steps to conduct RNA-seq data analysis in R

What are the steps to do RNA-seq data analysis in R? I need to do it using any standard R package(s). Could anybody suggest the steps, with R code?

ftp site for data is not working

The ftp site with the data for the RNA-Seq tutorial does not work:

wget ftp://genome.wustl.edu/pub/rnaseq/data/brain_vs_uhr_w_ercc/downsampled_5pc_chr22/HBR_UHR_ERCC_ds_5pc.tar

Update scripts to operate with StringTie/Ballgown

Tutorial_Module4_Part2_CummeRbund.R
Tutorial_Module4_Part3_Supplementary_R.R
Tutorial_Module4_ERCC_DE.R
Tutorial_Module4_Part4_edgeR.R

RNA-seq analysis: getting counts from Ballgown #264

Hello everyone,
I am new to RNA-seq analysis. First, some information about my data. It consists of 24 SRR files from a study of sex-specific and lineage-specific alternative splicing in primates (humans and chimpanzees). RNA-seq was used to study transcript levels in humans and chimpanzees, using liver RNA samples from three males and three females of each species. For each sex there are two replicates. Briefly, for human I have 12 RNA-seq files: three males x 2 replicates (for example male1 rep1, male1 rep2, and so on) and three females x 2 replicates; likewise for chimpanzee. For the RNA-seq analysis I am using the Nature protocol. The data are single-end reads.
1- As a first step I aligned the reads to the genome with this command: "hisat2 -p 2 --dta -x indexes/hg.v90 project_datasets/SRR032126.fastq -S SRR032126.sam". I am not sure whether this is correct.
As I said, I have 2 replicates for each sex. Should I combine the replicates into one file? If so, how? Otherwise, why do we use two or more replicates?
2- I used hisat2 > stringtie > ... > ballgown for this analysis, but I want to do the differential analysis with DESeq2. As you know, this requires read counts. How do I get read counts from Ballgown? Thank you.
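
HISAT2 documents the -U option for unpaired (single-end) reads, and replicates are normally aligned as separate samples rather than merged, since the differential expression step uses them to estimate variability. The sketch below only prints one alignment command per FASTQ file so that the commands can be reviewed before running; print_hisat2_cmds is a hypothetical helper name, and the index prefix and paths are the ones quoted in the post. For DESeq2 input, the StringTie distribution provides a prepDE.py helper that, to my knowledge, generates count matrices from StringTie output; check its documentation before relying on it.

```shell
#!/usr/bin/env bash
# Print (do not run) one HISAT2 command per single-end FASTQ file.
# -U marks unpaired reads; each file becomes its own SAM output.
print_hisat2_cmds() {
  index="$1"
  shift
  for fq in "$@"; do
    sample=$(basename "$fq" .fastq)
    printf 'hisat2 -p 2 --dta -x %s -U %s -S %s.sam\n' "$index" "$fq" "$sample"
  done
}

# Example with the index prefix and file name from the post.
print_hisat2_cmds indexes/hg.v90 project_datasets/SRR032126.fastq
```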

stringtie_expression_matrix.pl

Hello,
Thank you for the RNA-seq tutorial.
I was trying to use the script "stringtie_expression_matrix.pl" to extract FPKM/TPM or coverage results, and I used the following command:
perl stringtie_expression_matrix.pl --expression_metric=FPKM \
--result_dirs='alevin_RNA-1,alevin_RNA-2,alevin_RNA-3,egg_RNA-1,egg_RNA-2,egg_RNA-3,FF_RNA-1,FF_RNA-2,FF_RNA-3' \
--transcript_matrix_file=transcript_tpms_all_samples.tsv \
--gene_matrix_file=gene_tpms_all_samples.tsv
The transcript GTF files are structured as follows:

When I run the command, I get the following error:
Processing data for the following 9 samples:
Could not find transcript id in line: 1 StringTie transcript 135071 135516 1000 . . gene_id "MSTRG.1"; transcript_id "MSTRG.1.1"; cov "6681.424805"; FPKM "1365.562866"; TPM "3094.511230";
How can I fix this problem?
Thank you for your help.
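
Independently of the script, the per-transcript values can be read straight from a StringTie GTF with awk. This is a minimal sketch and not the script's own method; it assumes tab-separated GTF lines with transcript_id and TPM attributes as in the line quoted above, and the function name stringtie_tpm is hypothetical. It does not diagnose why stringtie_expression_matrix.pl rejects the line.

```shell
#!/usr/bin/env bash
# Print transcript_id and TPM for every "transcript" feature in a StringTie GTF.
stringtie_tpm() {
  awk -F'\t' '
    $3 == "transcript" {
      tid = ""; tpm = ""
      # "transcript_id \"" is 15 characters long, "TPM \"" is 5.
      if (match($9, /transcript_id "[^"]+"/)) tid = substr($9, RSTART + 15, RLENGTH - 16)
      if (match($9, /TPM "[^"]+"/))           tpm = substr($9, RSTART + 5,  RLENGTH - 6)
      if (tid != "") print tid "\t" tpm
    }' "$1"
}

# Usage (path is illustrative):
#   stringtie_tpm alevin_RNA-1/transcripts.gtf
```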

No X11 DISPLAY when trying to open fastqc

X11 forwarding seems to be off when I try to open fastqc. Any ideas? I have stupidly tried export DISPLAY= 0.0, which does not seem to work either.

Can this tutorial be done without cloud?

Is it possible to do this tutorial on a standalone system?
I have for now installed the mentioned tools on my system and seems to be ready!

Hisat2 error

Hi, I'm experiencing errors when running the hisat2 to generate sam files.
hisat2 -p 8 --rg-id=UHR_Rep2 --rg SM:UHR --rg LB:UHR_Rep2_ERCC-Mix1 --rg PL:ILLUMINA --rg PU:CXX1234-TGACAC.1 -x $RNA_REF_INDEX --dta --rna-strandness RF -1 $RNA_DATA_DIR/UHR_Rep2_ERCC-Mix1_Build37-ErccTranscripts-chr22.read1.fastq.gz -2 $RNA_DATA_DIR/UHR_Rep2_ERCC-Mix1_Build37-ErccTranscripts-chr22.read2.fastq.gz -S ./UHR_Rep2.sam
Warning: the current version of HISAT2 (2.0.4) is older than the version (2.167.69) used to build the index. Users are strongly recommended to update HISAT2 to the latest version.
Error reading block of _offs[] array: 0, 2097152
Error: Encountered internal HISAT2 exception (#1)
Command: /home/ubuntu/workspace/rna_seq/tools/hisat2-2.0.4/hisat2-align-s --wrapper basic-0 -p 8 --rg-id=UHR_Rep2 --rg SM:UHR --rg LB:UHR_Rep2_ERCC-Mix1 --rg PL:ILLUMINA --rg PU:CXX1234-TGACAC.1 -x /home/ubuntu/workspace/rna_seq/reference_genome/chr22_with_ERCC92 --dta --rna-strandness RF -S ./UHR_Rep2.sam -1 /tmp/1602.inpipe1 -2 /tmp/1602.inpipe2
(ERR): hisat2-align exited with value 1

Any idea what caused it?

I don't know how to use this code.

you can see this website

Where do I get ERCC transcript file?

In 1-ii

This has been done for you and that data placed on your AWS instance. It contains chr22 and ERCC transcript fasta files in both a single combined file and individual files. Copy the file to the rnaseq working directory

I downloaded the file Homo_sapiens.GRCh38.dna_sm.primary_assembly.fa. Do I need to extract chr22 from this file?
Where do I get ERCC transcript file?

./stringtie_expression_matrix.pl does not work

I am trying to follow the tutorial, but I cannot get past the "expression" section.

In particular, when I try to get the expression matrix with ./stringtie_expression_matrix.pl , I get a compilation error:

./stringtie_expression_matrix.pl --expression_metric=TPM --result_dirs='HBR_Rep1,HBR_Rep2,HBR_Rep3,UHR_Rep1,UHR_Rep2,UHR_Rep3' --transcript_matrix_file=transcript_tpm_all_samples.tsv --gene_matrix_file=gene_tpm_all_samples.tsv

Type of arg 1 to keys must be hash (not private array) at ./stringtie_expression_matrix.pl line 49, near "@sample_list;" Execution of ./stringtie_expression_matrix.pl aborted due to compilation errors.

I am not really an expert in Perl, so I am having some trouble troubleshooting the code, but it seems that sample_list is not a hash (well, the error says as much...).

How do I solve this?
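
This compile error is the one that Perl releases before 5.12 give when keys is applied to an array; support for keys on arrays arrived in Perl 5.12. A quick way to check the installed interpreter is sketched below. The helper names version_ge and perl_version are hypothetical, and sort -V assumes GNU coreutils or a compatible sort.

```shell
#!/usr/bin/env bash
# True when version $1 is greater than or equal to version $2.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Print the running Perl version in dotted form, e.g. 5.26.2.
perl_version() {
  perl -e 'printf "%vd\n", $^V'
}

if command -v perl >/dev/null 2>&1; then
  if version_ge "$(perl_version)" 5.12.0; then
    echo "perl is 5.12 or newer: keys on an array is accepted"
  else
    echo "perl is older than 5.12: keys on an array will not compile"
  fi
fi
```

If the interpreter turns out to be older than 5.12, running the script with a newer Perl is the least invasive option.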

Any section discussing lncRNA and circRNA?

Hi Dr. Griffith,

Is there any section about lncRNA and circRNA?

Regards,

Normalization module

I think that this tutorial could use a module on normalizing the gene/transcript expression estimates provided by stringtie, before differential expression estimation.

link broken - adapter trim

Hello Griffith Lab team members,

Thank you for providing such a nice explanation and tutorial on RNA-Seq analysis.

I was going over the adapter trimming tutorial, and found that this link

wget http://genomedata.org/rnaseq-tutorial/illumina_multiplex.fa

is broken. Could you please help fix it?

Thank you.

strand specificity of dUTP libraries and software parameters

Hi,

I have a couple of questions about dUTP method generated RNAseq data.
Do you think it has the same strand specificity as 'TruSeq Strand Specific Total RNA'?

I found this piece of code from 'Trinity-v2.6.5/util/align_and_estimate_abundance.pl'
'''
if ($SS_lib_type) {
    # add strand-specific options for kallisto
    my $kallisto_ss_opt = ($SS_lib_type =~ /^R/) ? "--rf-stranded" : "--fr-stranded";
    if ($kallisto_add_opts !~ /$kallisto_ss_opt/) {
        $kallisto_add_opts .= " $kallisto_add_opts";
    }
}
'''

This suggests that Trinity --SS_lib_type RF is equivalent to kallisto quant --rf-stranded.
Do you think it is correct?

Many thanks.

Huanlee

RNA-seq Test Data

First of all, I want to thank you for providing this tutorial.

I am following the thorough instructions given in Module 1 for obtaining the RNA-seq raw data set (fastq files), and I was wondering if the HBR and UHR data (HBR_UHR_ERCC_ds_5pc.tar) is available for download.

Could you please let me know if the data is available?

Thank you again!

HISAT2 error message

I'm getting a couple errors when trying to run HISAT2

my input is $hisat2 -p 8 --dta -x index {-1 input1_1,input2_1 -2 input1_2,input2_1} -S [output.sam]

but I keep getting the error message: (ERR): Different number of files specified with --reads/-1 as with -2

Any help would be much appreciated.
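
HISAT2 takes the mate files as two comma-separated lists, one after -1 and one after -2, and the lists must have the same number of entries. If the braces in the command above were typed literally, the shell would pass {-1 through as an ordinary word rather than as the -1 option, which could explain a count mismatch. The sketch below builds both lists and compares their lengths before printing the command; the helper names and the sample file names are hypothetical.

```shell
#!/usr/bin/env bash
# Join all arguments with commas.
join_by_comma() {
  local IFS=,
  printf '%s\n' "$*"
}

# Count the entries in a comma-separated list.
count_csv() {
  local IFS=,
  set -- $1
  printf '%s\n' "$#"
}

# Hypothetical file names, for illustration only.
mates1=$(join_by_comma sampleA_1.fastq.gz sampleB_1.fastq.gz)
mates2=$(join_by_comma sampleA_2.fastq.gz sampleB_2.fastq.gz)

if [ "$(count_csv "$mates1")" -ne "$(count_csv "$mates2")" ]; then
  echo "mate lists differ in length" >&2
else
  # Print the command instead of running it, so it can be inspected first.
  printf 'hisat2 -p 8 --dta -x index -1 %s -2 %s -S output.sam\n' "$mates1" "$mates2"
fi
```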

Create FM-index: "hisat2-build: command not found"

I added hisat2 to my PATH, but when I try to use the command it shows "hisat2-build: command not found". Do I need to specify the program path in order to use it, @obigriffith?
Thank you in advance; this is my first time trying to build an RNA-seq environment.
ubuntu@ip-172-31-40-177:~/workspace/rna_seq/tools/hisat2-2.0.4$ hisat2-build -p 8 --ss /home/ubuntu/workspace/rna_seq/reference_genome/splicesites.tsv --exon /home/ubuntu/workspace/rna_seq/reference_genome/exons.tsv $RNA_REF_FASTA $RNA_REF_INDEX
hisat2-build: command not found
ubuntu@ip-172-31-40-177:~/workspace/rna_seq/tools/hisat2-2.0.4$ echo $PATH
/home/ubuntu/bin:/home/ubuntu/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/home/ubuntu/tools/hisat2-2.0.4
ubuntu@ip-172-31-40-177:~/workspace/rna_seq/tools/hisat2-2.0.4$
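
In the session above, the prompt shows the tool under ~/workspace/rna_seq/tools/hisat2-2.0.4, while the PATH entry points at /home/ubuntu/tools/hisat2-2.0.4, so the PATH entry may simply name a directory that does not exist. A small check for that kind of mismatch is sketched below; missing_path_dirs is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Print every entry of a colon-separated path list that is not an existing directory.
missing_path_dirs() {
  local IFS=:
  local dir
  for dir in $1; do
    [ -d "$dir" ] || printf '%s\n' "$dir"
  done
}

# Report stale entries in the current PATH.
missing_path_dirs "$PATH"
```

Any directory printed here contributes nothing to command lookup; correcting the entry in .bashrc and opening a new shell should make hisat2-build resolvable, assuming the binary is in the corrected directory.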

Error in using Trinotate command

Hi,
I wanted to create the Trinotate sqlite database, but I am facing an issue with the first Trinotate command.
Please help me in solving the issue and guide me what to do.
I am attaching the command used as well as the error message received.

Thank you in advance

COMMAND USED:

./Trinotate --db /home/workstation2/Trinotate-Trinotate-v4.0.0/Trinotate.sqlite --create --trinotate_data_dir /home/workstation2/Trinotate-Trinotate-v4.0.0

ERROR MESSAGE:

Use of uninitialized value $blast_type in concatenation (.) or string at ./Trinotate line 200.
-CREATING /home/workstation2/Trinotate-Trinotate-v4.0.0/Trinotate.sqlite and populating data dir: /home/workstation2/Trinotate-Trinotate-v4.0.0
WARNING: SQLITE database /home/workstation2/Trinotate-Trinotate-v4.0.0/Trinotate.sqlite already exists and wont be replaced.
-sqlite db /home/workstation2/Trinotate-Trinotate-v4.0.0/Trinotate.sqlite already exists and is not being replaced. If a new boilerplate is required, copy db from: /home/workstation2/Trinotate-Trinotate-v4.0.0/TrinotateBoilerplate.sqlite
-- Skipping CMD: /home/workstation2/Trinotate-Trinotate-v4.0.0/util/admin/Build_Trinotate_Boilerplate_SQLite_db.pl TrinotateBoilerplate, checkpoint [/home/workstation2/Trinotate-Trinotate-v4.0.0/__chckpts/build_boilerplate.ok] exists.

[Sat May 27 15:37:32 2023] Running CMD: mv TrinotateBoilerplate.sqlite pfam2go go-basic.obo pfam2go.tab NOG.annotations.tsv.gz /home/workstation2/Trinotate-Trinotate-v4.0.0/
mv: 'TrinotateBoilerplate.sqlite' and '/home/workstation2/Trinotate-Trinotate-v4.0.0/TrinotateBoilerplate.sqlite' are the same file
mv: 'pfam2go' and '/home/workstation2/Trinotate-Trinotate-v4.0.0/pfam2go' are the same file
mv: 'go-basic.obo' and '/home/workstation2/Trinotate-Trinotate-v4.0.0/go-basic.obo' are the same file
mv: 'pfam2go.tab' and '/home/workstation2/Trinotate-Trinotate-v4.0.0/pfam2go.tab' are the same file
mv: 'NOG.annotations.tsv.gz' and '/home/workstation2/Trinotate-Trinotate-v4.0.0/NOG.annotations.tsv.gz' are the same file
Error, cmd: mv TrinotateBoilerplate.sqlite pfam2go go-basic.obo pfam2go.tab NOG.annotations.tsv.gz /home/workstation2/Trinotate-Trinotate-v4.0.0/ died with ret 256 No such file or directory at /home/workstation2/Trinotate-Trinotate-v4.0.0/PerlLib/Pipeliner.pm line 187.
Pipeliner::run(Pipeliner=HASH(0x55f807279d58)) called at ./Trinotate line 426
main::run_Trinotate_create("/home/workstation2/Trinotate-Trinotate-v4.0.0/Trinotate.sqlite", "/home/workstation2/Trinotate-Trinotate-v4.0.0") called at ./Trinotate line 237

Error in `contrasts

Multiple hits per transcript/peptide after blastx and blastp

Hi there

I am using the Trinotate pipeline to annotate a transcriptome of 70,000 transcripts. After running blastx and blastp, I got 6 million and 4 million hits, respectively. This is due to there being multiple hits per sequence. Do I first have to filter and keep the top hit for each sequence to continue with the pipeline?

Kind regards,
Tanner

Link broken

Hi,
This link seems to be broken.
https://github.com/griffithlab/rnaseq_tutorial/blob/master/scripts/Tutorial_Module4_Part2_ballgown.R

Can't find some gene id

Thank you for writing the script. When I use it to extract TPM or FPKM, it always reports that it can't find some gene ID, even though the ID actually exists. How can I solve this problem? Thank you very much!

Update the .bashrc that is provided.

See the end of the Installation wiki page:
https://github.com/griffithlab/rnaseq_tutorial/wiki/config/.bashrc

Getting error while loading transcripts

Trinotate Trinotate.sqlite init --gene_trans_map Trinity.fasta.gene_trans_map --transcript_fasta Trinity.fasta --transdecoder_pep Trinity.fasta.transdecoder.pep
CMD: TrinotateSeqLoader.pl --sqlite Trinotate.sqlite --gene_trans_map Trinity.fasta.gene_trans_map --transcript_fasta Trinity.fasta --transdecoder_pep Trinity.fasta.transdecoder.pep --bulk_load
-parsing gene/trans map file.... done.
-loading Transcripts.
[93900]
done.
-loading ORFs.
[160000]
done.
CMD: echo "pragma journal_mode=memory;
pragma synchronous=0;
pragma cache_size=4000000;
.mode tabs
.import tmp.Transcript.bulk_load Transcript" | sqlite3 Trinotate.sqlite
sqlite3: error while loading shared libraries: libncurses.so.6: cannot open shared object file: No such file or directory
Error, cmd: echo "pragma journal_mode=memory;
pragma synchronous=0;
pragma cache_size=4000000;
.mode tabs
.import tmp.Transcript.bulk_load Transcript" | sqlite3 Trinotate.sqlite died with ret 32512 at /home/maran/anaconda3/lib/site_perl/5.26.2/Sqlite_connect.pm line 190.
Sqlite_connect::bulk_load_sqlite("Trinotate.sqlite", "Transcript", "tmp.Transcript.bulk_load") called at /home/maran/anaconda3/bin/TrinotateSeqLoader.pl line 223
Error, cmd: TrinotateSeqLoader.pl --sqlite Trinotate.sqlite --gene_trans_map Trinity.fasta.gene_trans_map --transcript_fasta Trinity.fasta --transdecoder_pep Trinity.fasta.transdecoder.pep --bulk_load died with ret 32512 at /home/maran/anaconda3/bin/Trinotate line 126.

Please help me to solve this issue
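
The log shows that the sqlite3 binary itself cannot start because libncurses.so.6 is not found by the dynamic loader. ldd lists the shared libraries a binary needs and marks unresolved ones as "not found". The sketch below filters that output; filter_missing is a hypothetical helper name, and the check assumes a Linux system where ldd is available.

```shell
#!/usr/bin/env bash
# Read ldd output on stdin and print the names of unresolved libraries.
filter_missing() {
  awk '/not found/ { print $1 }'
}

# Inspect whichever sqlite3 is first on PATH, if both tools are present.
if command -v sqlite3 >/dev/null 2>&1 && command -v ldd >/dev/null 2>&1; then
  ldd "$(command -v sqlite3)" | filter_missing
fi
```

If libncurses.so.6 is listed, installing the matching ncurses library for the environment that provides sqlite3 (here apparently the anaconda3 installation) is the likely next step.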

how to get the genes belong to one sample and gene id conversion

I am new to RNA-seq data analysis. I have followed the HISAT2, StringTie and Ballgown pipeline and created the list of genes by following all the steps in the protocol paper; the list comes from step 15. It contains all differentially expressed genes across all samples, but it has no identifier telling me which sample a gene belongs to. My question is: how can I identify the differentially expressed genes that belong to one sample? Here is my gene list file.
`

feature	id	fc	pval	qval
gene	MSTRG.28632	0.34122	1.95E-05	0.176761
gene	MSTRG.3615	5.155727	2.21E-05	0.176761
gene	MSTRG.7507	0.251907	2.22E-05	0.176761
gene	MSTRG.70532	0.318647	2.42E-05	0.176761

The same is true of the transcript file:

geneNames	geneIDs	geneNames.1	geneIDs.1	feature	id	fc	pval	qval
.	MSTRG.165	.	MSTRG.37909	transcript	94562	7.209395	6.37E-07	0.029347
.	MSTRG.165	.	MSTRG.26699	transcript	66342	0.095091	9.80E-06	0.117421
.	MSTRG.166	.	MSTRG.11475	transcript	28435	3.41491	1.17E-05	0.117421
.	MSTRG.170	.	MSTRG.17454	transcript	42831	16.80033	1.22E-05	0.117421
.	MSTRG.173	.	MSTRG.1249	transcript	3256	0.076575	1.85E-05	0.117421

`
My next question: how can I convert the gene IDs in the files above to Ensembl format so that I can get gene ontology annotations?

typo in hisat2 rna strandness option

Wanted to point out a small typo that gave me an issue for a good 20 minutes - when running hisat2 options the command for indicating library strandedness is incorrect. It is written as ---rna-strandedness but correct option is --rna-strandness.

Thanks for this tutorial - it's great!

Which tool can be used as an alternative to HISAT2?

Hello everyone,

I am following the tutorial https://github.com/griffithlab/rnaseq_tutorial/ .
Since HISAT2 requires a lot of RAM, I am getting an "out of memory" message. Which other splice-aware aligner, with a less compute-intensive algorithm, can be used instead?

Thanks

add multiqc

:-)

getting raw counts

Hi and thank you for the great tool.

I am trying to perform DEG analysis with DESeq2 and need raw counts. After using kallisto quant my output looks like this:

Since the estimated count is calculated by MLE, I was wondering how you would recommend using it for DESeq2.

Thank you for your attention.

How to perform DGE with Kallisto results files

Hi, I am analyzing RNA-seq data. I have 190 PE samples (control: male, female; mutant: male, female). I used kallisto to quantify transcript abundances with the same input data.

My question is how to perform DGE after the kallisto analysis. I have prepared a single .tsv matrix file of all samples. Should I use DESeq2 for the DGE analysis? However, I expect the number of DE features to be higher for transcript abundances than with a gene-based method.
I am using the following commands, but my groups are not of equal size. Please suggest an appropriate script:

names(files) <- paste0("sample", 1:187)
txi.kallisto <- tximport(files, type = "kallisto", txOut = TRUE)
head(txi.kallisto$counts)
#library(DESeq2)
sampleTable <- data.frame(condition = factor(rep(c("affected", "unaffected"), each=93)))

rownames(sampleTable) <- colnames(txi.kallisto$counts)

dds <- DESeqDataSetFromTximport(txi.kallisto, sampleTable, ~condition)

Regarding ERCC spike-in

In the samples used in the tutorial, an additional ERCC spike-in was included. However, the samples I downloaded from SRA do not contain any, and neither does the reference genome downloaded from Ensembl. Will there be much difference in the analysis steps? Where do I need to be careful?

Docker?

Should there be a docker image for this tutorial?

Differential expression on STAR throwing an error

I've been following the guide for STAR from the start, but I seem to have an issue at the cuffmerge stage:

cd $RNA_HOME/expression/star_cufflinks/ref_only/
ls -1 _Rep_ERCC*/transcripts.gtf > assembly_GTF_list.txt
cuffmerge -p 8 -o merged -g $RNA_HOME/refs/hg19/genes/genes_chr22_ERCC92.gtf -s $RNA_HOME/refs/hg19/bwt/chr22_ERCC92/ assembly_GTF_list.txt

I get the error:

[Sat Jan 2 15:23:16 2016] Beginning transcriptome assembly merge

[Sat Jan 2 15:23:16 2016] Preparing output location merged/
[Sat Jan 2 15:23:17 2016] Converting GTF files to SAM
[15:23:17] Loading reference annotation.
Error: duplicate GFF ID 'ENST00000400518' encountered!
[FAILED]
Error: could not execute gtf_to_sam

if I dump the contents of the .gtf file it looks like there are multiple hits for that ID:

grep 'ENST00000400518' $RNA_HOME/refs/hg19/genes/genes_chr22_ERCC92.gtf > hits.txt
hits.txt

This is the second time I have tried to replicate it, so I don't have any of the tophat alignments in this version, but I don't think they should be needed. Do you have any suggestions?

Nanopore reads

Hi,
I just quantified RNA-seq reads generated with an Oxford Nanopore MinION sequencing device using kallisto 0.46.0. The est_counts reflected well the number of reads I observed for some example genes, but I was wondering if the TPM calculation uses the read length parameter. Since each nanopore read should more or less represent a whole transcript, incorporating a fixed read length in the TPM estimation would lead to wrong results, right?

Regards,
Markus

sequences dropped from the index

Hello,

kallisto (0.44.0) seems to be silently dropping sequences from the index.

Working example:

download circBase hg19 circRNA sequences.
count them (140790)
index them and count again how many targets are included in the index (92509)

Is there a reason why some sequences are not indexed?

Code to reproduce example:

wget -O - 'http://www.circbase.org/download/human_hg19_circRNAs_putative_spliced_sequence.fa.gz' | gzip -d -c > human_hg19_circRNAs_putative_spliced_sequence.fa

sed -n '/^>/p' human_hg19_circRNAs_putative_spliced_sequence.fa |  wc -l 

kallisto index -i human_hg19_circRNAs_putative_spliced_sequence.fa.fai human_hg19_circRNAs_putative_spliced_sequence.fa

kallisto inspect human_hg19_circRNAs_putative_spliced_sequence.fa.fai

Ballgown plotting problem

I followed the tutorial exactly as written, with the same tool versions, but I have 2 main problems in the DE visualisation.

The distribution plot of UHR vs HBR shows fewer genes, and the labelled genes are not the actual top 25 genes by p-value.
However, when I look at my UHR_vs_HBR_gene_results_sig.tsv, I have about 310 genes. The top 25 genes (I attached the plot PDFs)
are:
IGLC3
MPPED1
PRAME
IGLV2-23
CDC45
APOBEC3B
PLA2G3
SHANK3
RP5-1119A7.17
CACNA1I
ATF4
IGLV2-14
KDELR3
ERCC-00004
Sep-03
LA16c-3G11.7
SYNGR1
MYO18B
ERCC-00002
GNAZ
MLC1
MAPK8IP2
ERCC-00130
TEF

Tutorial_Part3_Supplementary_R_output.pdf
outfile.pdf

hisat2 --rna-strandness produced same results with RF and FR

Hi there,

I followed the tutorial to align RNAseq PE reads.
I found that both --rna-strandness FR and RF produced the same results, and these are also the same as the results generated when ignoring strand specificity.

I just wonder why this is.

Thanks heaps in advance for your help.

Huanlee

Samtools Install

Hey, I'm getting tripped up in the samtools installation at the make step. It's throwing this error and I don't understand it / couldn't find anything online.

Thanks a lot for the help.