griffithlab / rnaseq_tutorial

Informatics for RNA-seq: A web resource for analysis on the cloud. Educational tutorials and working pipelines for RNA-seq analysis including an introduction to: cloud computing, critical file formats, reference genomes, gene annotation, expression, differential expression, alternative splicing, data visualization, and interpretation.

License: Other

R 66.91% Perl 24.22% Shell 8.87%

rnaseq_tutorial's Introduction

Informatics for RNA-seq: A web resource for analysis on the cloud

THIS VERSION OF THE RNA-SEQ COURSE IS DEPRECATED. FOR CURRENT VERSION PLEASE VISIT: https://rnabio.org/


An educational tutorial and working demonstration pipeline for RNA-seq analysis including an introduction to: cloud computing, next generation sequence file formats, reference genomes, gene annotation, expression analysis, differential expression analysis, alternative splicing analysis, data visualization, and interpretation.

This repository is used to store code and certain raw materials for a detailed RNA-seq tutorial. To actually complete this tutorial, go to the RNA-seq tutorial wiki.

Citation: Malachi Griffith*, Jason R. Walker, Nicholas C. Spies, Benjamin J. Ainscough, Obi L. Griffith*. 2015. Informatics for RNA-seq: A web resource for analysis on the cloud. PLoS Comp Biol. 11(8):e1004393.

*To whom correspondence should be addressed: E-mail: mgriffit[AT]genome.wustl.edu, ogriffit[AT]genome.wustl.edu

Note: An archived version of this tutorial exists here. This version is maintained for consistency with the published materials (Griffith et al. 2015. PLoS Comp Biol.) and for past students wishing to review covered material. However, we strongly suggest that you continue with the current version of the tutorial below.

Want to contribute to the RNA-seq Wiki?

Fork it and send a pull request.


Tutorial Table of Contents

  1. Module 0 - Introduction and Cloud Computing
    1. Authors
    2. Citation and Supplementary Materials
    3. Syntax
    4. Intro to AWS Cloud Computing
    5. Logging into Amazon Cloud
    6. Unix Bootcamp
    7. Environment
    8. Resources
  2. Module 1 - Introduction to RNA sequencing
    1. Installation
    2. Reference Genomes
    3. Annotations
    4. Indexing
    5. RNA-seq Data
    6. Pre-Alignment QC
  3. Module 2 - RNA-seq Alignment and Visualization
    1. Adapter Trim
    2. Alignment
    3. IGV
    4. Alignment Visualization
    5. Alignment QC
  4. Module 3 - Expression and Differential Expression
    1. Expression
    2. Differential Expression
    3. DE Visualization
    4. Kallisto for Reference-Free Abundance Estimation
  5. Module 4 - Isoform Discovery and Alternative Expression
    1. Reference Guided Transcript Assembly
    2. de novo Transcript Assembly
    3. Transcript Assembly Merge
    4. Differential Splicing
    5. Splicing Visualization
  6. Module 5 - De novo transcript reconstruction
    1. De novo RNA-Seq Assembly and Analysis Using Trinity
  7. Module 6 - Functional Annotation of Transcripts
    1. Functional Annotation of Assembled Transcripts Using Trinotate
  8. Appendix
    1. Saving Your Results
    2. Abbreviations
    3. Lectures
    4. Practical Exercise Solutions
    5. Integrated Assignment
    6. Proposed Improvements
    7. AWS Setup

rnaseq_tutorial's People

Contributors

ahwagner, brianjohnhaas, bryant1410, chrisamiller, jasonwalker80, malachig, obigriffith


rnaseq_tutorial's Issues

ftp site for data is not working

The ftp site with the data for the RNA-Seq tutorial does not work:

wget ftp://genome.wustl.edu/pub/rnaseq/data/brain_vs_uhr_w_ercc/downsampled_5pc_chr22/HBR_UHR_ERCC_ds_5pc.tar

RNA-seq analysis: get counts from Ballgown #264

Hello Everyone,
I am new to RNA-seq analysis. First, some information about my data. It consists of 24 SRR files from a study of sex-specific and lineage-specific alternative splicing in primates (humans and chimpanzees). We used RNA-seq to study transcript levels in humans and chimpanzees, using liver RNA samples from three males and three females of each species, with two replicates each. So for human I have 12 RNA-seq files: three males x 2 replicates (for example male1 rep1, male1 rep2, and so on) and three females x 2 replicates; likewise for chimpanzee. For the RNA-seq analysis I am using the Nature Protocols protocol. The data are single-end reads.
1- For the first step, aligning the RNA-seq reads to the genome, I used this command: "hisat2 -p 2 --dta -x indexes/hg.v90 project_datasets/SRR032126.fastq -S SRR032126.sam". I am not sure whether this is correct.
As I said, I have two replicates for each sex. Should I combine the replicates into one file? If so, how? Or why do we use two or more replicates?
2- I used hisat2 > stringtie > ... ballgown for this analysis, but I want to do the differential analysis with DESeq2. As you know, this requires read counts. How do I get read counts from Ballgown? Thank you.
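One way to approach the read-count question is to convert StringTie's per-base coverage back into an approximate read count: coverage times transcript length, divided by read length. As far as is known here, the prepDE.py helper distributed with StringTie follows the same idea; the function name and the example numbers below are hypothetical, and this is only a sketch, not that script.

```python
import math


def estimated_read_count(coverage: float, transcript_length: int, read_length: int) -> int:
    """Approximate the number of reads covering a transcript.

    coverage          average per-base coverage reported for the transcript
    transcript_length transcript length in bases
    read_length       average read length in bases (an assumption you must supply)
    """
    if read_length <= 0:
        raise ValueError("read_length must be positive")
    # Total covered bases divided by bases per read, rounded up to a whole read.
    return math.ceil(coverage * transcript_length / read_length)


if __name__ == "__main__":
    # Hypothetical transcript: 10x coverage, 1500 bases long, 75-base reads.
    print(estimated_read_count(10.0, 1500, 75))
```

The resulting integers can be assembled into a genes-by-samples matrix, with each replicate kept as its own column rather than merged.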

stringtie_expression_matrix.pl

Hello,
Thank you for the RNA-seq tutorial.
I was trying to use the script "stringtie_expression_matrix.pl" to extract FPKM/TPM or coverage results, and I used the following command:
perl stringtie_expression_matrix.pl --expression_metric=FPKM \
  --result_dirs='alevin_RNA-1,alevin_RNA-2,alevin_RNA-3,egg_RNA-1,egg_RNA-2,egg_RNA-3,FF_RNA-1,FF_RNA-2,FF_RNA-3' \
  --transcript_matrix_file=transcript_tpms_all_samples.tsv \
  --gene_matrix_file=gene_tpms_all_samples.tsv
The transcript GTF files have the following structure:
image

When I run the command, I get the following error:
Processing data for the following 9 samples:
Could not find transcript id in line: 1 StringTie transcript 135071 135516 1000 . . gene_id "MSTRG.1"; transcript_id "MSTRG.1.1"; cov "6681.424805"; FPKM "1365.562866"; TPM "3094.511230";
How can I fix the problem?
Thank you for your help.
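The line quoted in the error does carry a transcript_id attribute, so it may help to check independently how the attribute column parses. The sketch below is a generic GTF attribute parser, not the logic of stringtie_expression_matrix.pl; the function name is hypothetical, and it assumes attributes follow the usual key "value"; layout.

```python
import re

# Matches one attribute of the form: key "value";
ATTRIBUTE_PATTERN = re.compile(r'(\S+)\s+"([^"]*)"\s*;')


def parse_gtf_attributes(line: str) -> dict:
    """Return the attributes of one GTF line as a dictionary."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        # Fall back to whitespace splitting if tabs were lost (for example when pasted).
        fields = line.strip().split(None, 8)
    if len(fields) != 9:
        raise ValueError("expected 9 GTF columns, found %d" % len(fields))
    return dict(ATTRIBUTE_PATTERN.findall(fields[8]))


if __name__ == "__main__":
    example = ('1 StringTie transcript 135071 135516 1000 . . gene_id "MSTRG.1"; '
               'transcript_id "MSTRG.1.1"; cov "6681.424805"; FPKM "1365.562866"; '
               'TPM "3094.511230";')
    attributes = parse_gtf_attributes(example)
    print(attributes["transcript_id"], attributes["FPKM"])
```

If this parser finds the ID while the script does not, the script's own pattern (for instance an expectation about the strand column, which is "." in this line) would be the next thing to inspect.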

Hisat2 error

Hi, I'm experiencing errors when running hisat2 to generate SAM files.
hisat2 -p 8 --rg-id=UHR_Rep2 --rg SM:UHR --rg LB:UHR_Rep2_ERCC-Mix1 --rg PL:ILLUMINA --rg PU:CXX1234-TGACAC.1 -x $RNA_REF_INDEX --dta --rna-strandness RF -1 $RNA_DATA_DIR/UHR_Rep2_ERCC-Mix1_Build37-ErccTranscripts-chr22.read1.fastq.gz -2 $RNA_DATA_DIR/UHR_Rep2_ERCC-Mix1_Build37-ErccTranscripts-chr22.read2.fastq.gz -S ./UHR_Rep2.sam
Warning: the current version of HISAT2 (2.0.4) is older than the version (2.167.69) used to build the index. Users are strongly recommended to update HISAT2 to the latest version.
Error reading block of _offs[] array: 0, 2097152
Error: Encountered internal HISAT2 exception (#1)
Command: /home/ubuntu/workspace/rna_seq/tools/hisat2-2.0.4/hisat2-align-s --wrapper basic-0 -p 8 --rg-id=UHR_Rep2 --rg SM:UHR --rg LB:UHR_Rep2_ERCC-Mix1 --rg PL:ILLUMINA --rg PU:CXX1234-TGACAC.1 -x /home/ubuntu/workspace/rna_seq/reference_genome/chr22_with_ERCC92 --dta --rna-strandness RF -S ./UHR_Rep2.sam -1 /tmp/1602.inpipe1 -2 /tmp/1602.inpipe2
(ERR): hisat2-align exited with value 1

Any idea what caused it?

Where do I get ERCC transcript file?

In 1-ii

This has been done for you and that data placed on your AWS instance. It contains chr22 and ERCC transcript fasta files in both a single combined file and individual files. Copy the file to the rnaseq working directory

I downloaded the file Homo_sapiens.GRCh38.dna_sm.primary_assembly.fa. Do I need to extract chr22 from this file?
Where do I get ERCC transcript file?

./stringtie_expression_matrix.pl does not work

I am trying to follow the tutorial, but I cannot get past the "expression" section.

In particular, when I try to get the expression matrix with ./stringtie_expression_matrix.pl , I get a compilation error:

./stringtie_expression_matrix.pl --expression_metric=TPM --result_dirs='HBR_Rep1,HBR_Rep2,HBR_Rep3,UHR_Rep1,UHR_Rep2,UHR_Rep3' --transcript_matrix_file=transcript_tpm_all_samples.tsv --gene_matrix_file=gene_tpm_all_samples.tsv

Type of arg 1 to keys must be hash (not private array) at ./stringtie_expression_matrix.pl line 49, near "@sample_list;" Execution of ./stringtie_expression_matrix.pl aborted due to compilation errors.

I am not really an expert in Perl, so I am having trouble troubleshooting the code, but it seems that sample_list is not a hash (as the error itself says).

How do I solve this?

Normalization module

I think that this tutorial could use a module on normalizing the gene/transcript expression estimates provided by stringtie, before differential expression estimation.

strand specificity of dUTP libraries and software parameters

Hi,

I have a couple of questions about dUTP method generated RNAseq data.
Do you think it has the same strand specificity as 'TruSeq Strand Specific Total RNA'?

I found this piece of code from 'Trinity-v2.6.5/util/align_and_estimate_abundance.pl'
'''
if ($SS_lib_type) {
    # add strand-specific options for kallisto
    my $kallisto_ss_opt = ($SS_lib_type =~ /^R/) ? "--rf-stranded" : "--fr-stranded";
    if ($kallisto_add_opts !~ /$kallisto_ss_opt/) {
        $kallisto_add_opts .= " $kallisto_add_opts";
    }
}
'''

This suggests that Trinity --SS_lib_type RF is equivalent to kallisto quant --rf-stranded.
Do you think it is correct?

Many thanks.

Huanlee

RNA-seq Test Data

First of all, I want to thank you for providing this tutorial.

I am following the thorough instructions given in Module-1 for obtaining the RNA-seq raw data set (fastq files), and I was wondering if the HBR and UHR data (HBR_UHR_ERCC_ds_5pc.tar) is available for download.

Could you please let me know whether the data is available?

Thank you again!

HISAT2 error message

I'm getting a couple errors when trying to run HISAT2

my input is $hisat2 -p 8 --dta -x index {-1 input1_1,input2_1 -2 input1_2,input2_1} -S [output.sam]

but I keep getting the error message: (ERR): Different number of files specified with --reads/-1 as with -2

Any help would be much appreciated.
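HISAT2 accepts comma-separated file lists for -1 and -2, and the two lists need the same number of entries in the same order. The braces and brackets in the command above look like placeholder notation that reached the shell literally, which may be why the -1 list was not recognized. The sketch below builds the two arguments from explicit (mate 1, mate 2) pairs so the counts always match; the function name and file names are hypothetical.

```python
def build_hisat2_read_args(pairs):
    """Return the -1/-2 arguments for a list of (mate1, mate2) file pairs."""
    if not pairs:
        raise ValueError("at least one (mate1, mate2) pair is required")
    mates1 = [mate1 for mate1, _ in pairs]
    mates2 = [mate2 for _, mate2 in pairs]
    # A file listed on both sides is almost certainly a copy-paste mistake.
    overlap = set(mates1) & set(mates2)
    if overlap:
        raise ValueError("file(s) listed as both mate 1 and mate 2: %s" % sorted(overlap))
    return ["-1", ",".join(mates1), "-2", ",".join(mates2)]


if __name__ == "__main__":
    # Hypothetical file names standing in for the real FASTQ files.
    read_args = build_hisat2_read_args([
        ("input1_1.fastq", "input1_2.fastq"),
        ("input2_1.fastq", "input2_2.fastq"),
    ])
    print(" ".join(["hisat2", "-p", "8", "--dta", "-x", "index"] + read_args + ["-S", "output.sam"]))
```

Note that the command in the question lists input2_1 on both the -1 and -2 side; the overlap check above would reject that.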

Create FM-index "hisat2-build: command not found"

I added hisat2 to my PATH, but when I try to use the command it shows "hisat2-build: command not found". Do I need to specify the program in order to use it, @obigriffith?
Thank you in advance; this is my first time trying to build an RNA-seq environment.
ubuntu@ip-172-31-40-177:~/workspace/rna_seq/tools/hisat2-2.0.4$ hisat2-build -p 8 --ss /home/ubuntu/workspace/rna_seq/reference_genome/splicesites.tsv --exon /home/ubuntu/workspace/rna_seq/reference_genome/exons.tsv $RNA_REF_FASTA $RNA_REF_INDEX
hisat2-build: command not found
ubuntu@ip-172-31-40-177:~/workspace/rna_seq/tools/hisat2-2.0.4$ echo $PATH
/home/ubuntu/bin:/home/ubuntu/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/home/ubuntu/tools/hisat2-2.0.4
ubuntu@ip-172-31-40-177:~/workspace/rna_seq/tools/hisat2-2.0.4$
screen shot 2017-03-31 at 4 28 42 pm

Error in using Trinotate command

Hi,
I wanted to create Trinotate sqlite database, but facing an issue while using the first Trinotate command.
Please help me in solving the issue and guide me what to do.
I am attaching the command used as well as the error message received.

Thank you in advance

COMMAND USED:

./Trinotate --db /home/workstation2/Trinotate-Trinotate-v4.0.0/Trinotate.sqlite --create --trinotate_data_dir /home/workstation2/Trinotate-Trinotate-v4.0.0

ERROR MESSAGE:

Use of uninitialized value $blast_type in concatenation (.) or string at ./Trinotate line 200.
-CREATING /home/workstation2/Trinotate-Trinotate-v4.0.0/Trinotate.sqlite and populating data dir: /home/workstation2/Trinotate-Trinotate-v4.0.0
WARNING: SQLITE database /home/workstation2/Trinotate-Trinotate-v4.0.0/Trinotate.sqlite already exists and wont be replaced.
-sqlite db /home/workstation2/Trinotate-Trinotate-v4.0.0/Trinotate.sqlite already exists and is not being replaced. If a new boilerplate is required, copy db from: /home/workstation2/Trinotate-Trinotate-v4.0.0/TrinotateBoilerplate.sqlite
-- Skipping CMD: /home/workstation2/Trinotate-Trinotate-v4.0.0/util/admin/Build_Trinotate_Boilerplate_SQLite_db.pl TrinotateBoilerplate, checkpoint [/home/workstation2/Trinotate-Trinotate-v4.0.0/__chckpts/build_boilerplate.ok] exists.

  • [Sat May 27 15:37:32 2023] Running CMD: mv TrinotateBoilerplate.sqlite pfam2go go-basic.obo pfam2go.tab NOG.annotations.tsv.gz /home/workstation2/Trinotate-Trinotate-v4.0.0/
    mv: 'TrinotateBoilerplate.sqlite' and '/home/workstation2/Trinotate-Trinotate-v4.0.0/TrinotateBoilerplate.sqlite' are the same file
    mv: 'pfam2go' and '/home/workstation2/Trinotate-Trinotate-v4.0.0/pfam2go' are the same file
    mv: 'go-basic.obo' and '/home/workstation2/Trinotate-Trinotate-v4.0.0/go-basic.obo' are the same file
    mv: 'pfam2go.tab' and '/home/workstation2/Trinotate-Trinotate-v4.0.0/pfam2go.tab' are the same file
    mv: 'NOG.annotations.tsv.gz' and '/home/workstation2/Trinotate-Trinotate-v4.0.0/NOG.annotations.tsv.gz' are the same file
    Error, cmd: mv TrinotateBoilerplate.sqlite pfam2go go-basic.obo pfam2go.tab NOG.annotations.tsv.gz /home/workstation2/Trinotate-Trinotate-v4.0.0/ died with ret 256 No such file or directory at /home/workstation2/Trinotate-Trinotate-v4.0.0/PerlLib/Pipeliner.pm line 187.
    Pipeliner::run(Pipeliner=HASH(0x55f807279d58)) called at ./Trinotate line 426
    main::run_Trinotate_create("/home/workstation2/Trinotate-Trinotate-v4.0.0/Trinotate.sqlite", "/home/workstation2/Trinotate-Trinotate-v4.0.0") called at ./Trinotate line 237

Multiple hits per transcript/peptide after blastx and blastp

Hi there

I am using the Trinotate pipeline to annotate a transcriptome comprising 70 000 transcripts. After running blastx and blastp, I got 6 million and 4 million hits, respectively, because there are multiple hits per sequence. Do I first have to filter and keep the top hit for each sequence to continue with the pipeline?

Kind regards,
Tanner
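If a single best hit per query is wanted, tabular BLAST output (-outfmt 6) can be reduced after the fact. The sketch below keeps the row with the highest bit score for each query, assuming the default twelve-column layout in which the query ID is column 1 and the bit score is column 12. Whether Trinotate requires this filtering is not confirmed here, and the function name and sample rows are hypothetical.

```python
import csv


def top_hit_per_query(lines):
    """Keep the highest-bitscore row for every query in BLAST -outfmt 6 lines."""
    best_rows = {}
    best_scores = {}
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, bitscore = row[0], float(row[11])
        # The first row seen wins ties, which preserves BLAST's own ordering.
        if query not in best_scores or bitscore > best_scores[query]:
            best_scores[query] = bitscore
            best_rows[query] = row
    return best_rows


if __name__ == "__main__":
    # Made-up rows for illustration only.
    sample = [
        "q1\ts1\t90.0\t100\t5\t0\t1\t100\t1\t100\t1e-30\t150",
        "q1\ts2\t95.0\t100\t2\t0\t1\t100\t1\t100\t1e-40\t180",
        "q2\ts3\t80.0\t50\t10\t0\t1\t50\t1\t50\t1e-5\t60",
    ]
    for query, row in top_hit_per_query(sample).items():
        print(query, row[1], row[11])
```

An alternative is to limit the hits at search time with BLAST's -max_target_seqs option, although a single target can still produce several alignment rows.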

Can't find some gene id

Thank you for writing the script. When I use it to extract TPM or FPKM, it always reports that it can't find some gene IDs, although they actually exist. How can I solve this problem? Thank you very much!
image

Getting error while loading transcripts

Trinotate Trinotate.sqlite init --gene_trans_map Trinity.fasta.gene_trans_map --transcript_fasta Trinity.fasta --transdecoder_pep Trinity.fasta.transdecoder.pep
CMD: TrinotateSeqLoader.pl --sqlite Trinotate.sqlite --gene_trans_map Trinity.fasta.gene_trans_map --transcript_fasta Trinity.fasta --transdecoder_pep Trinity.fasta.transdecoder.pep --bulk_load
-parsing gene/trans map file.... done.
-loading Transcripts.
[93900]
done.
-loading ORFs.
[160000]
done.
CMD: echo "pragma journal_mode=memory;
pragma synchronous=0;
pragma cache_size=4000000;
.mode tabs
.import tmp.Transcript.bulk_load Transcript" | sqlite3 Trinotate.sqlite
sqlite3: error while loading shared libraries: libncurses.so.6: cannot open shared object file: No such file or directory
Error, cmd: echo "pragma journal_mode=memory;
pragma synchronous=0;
pragma cache_size=4000000;
.mode tabs
.import tmp.Transcript.bulk_load Transcript" | sqlite3 Trinotate.sqlite died with ret 32512 at /home/maran/anaconda3/lib/site_perl/5.26.2/Sqlite_connect.pm line 190.
Sqlite_connect::bulk_load_sqlite("Trinotate.sqlite", "Transcript", "tmp.Transcript.bulk_load") called at /home/maran/anaconda3/bin/TrinotateSeqLoader.pl line 223
Error, cmd: TrinotateSeqLoader.pl --sqlite Trinotate.sqlite --gene_trans_map Trinity.fasta.gene_trans_map --transcript_fasta Trinity.fasta --transdecoder_pep Trinity.fasta.transdecoder.pep --bulk_load died with ret 32512 at /home/maran/anaconda3/bin/Trinotate line 126.

Please help me to solve this issue

How to get the genes belonging to one sample, and gene ID conversion

I am new to RNA-seq data analysis. I followed the hisat2, stringtie and ballgown pipeline and created a list of genes by following all the steps in the published protocol; the list was created at step 15. It contains the differentially expressed genes across all samples, but it has no identifier that tells me which sample a gene belongs to. My question is: how can I identify the number of differentially expressed genes that belong to one sample? Here is my gene list file:

feature  id           fc        pval      qval
gene     MSTRG.28632  0.34122   1.95E-05  0.176761
gene     MSTRG.3615   5.155727  2.21E-05  0.176761
gene     MSTRG.7507   0.251907  2.22E-05  0.176761
gene     MSTRG.70532  0.318647  2.42E-05  0.176761

The transcript file has the same issue:

geneNames  geneIDs    geneNames.1  geneIDs.1    feature     id     fc        pval      qval
.          MSTRG.165  .            MSTRG.37909  transcript  94562  7.209395  6.37E-07  0.029347
.          MSTRG.165  .            MSTRG.26699  transcript  66342  0.095091  9.80E-06  0.117421
.          MSTRG.166  .            MSTRG.11475  transcript  28435  3.41491   1.17E-05  0.117421
.          MSTRG.170  .            MSTRG.17454  transcript  42831  16.80033  1.22E-05  0.117421
.          MSTRG.173  .            MSTRG.1249   transcript  3256   0.076575  1.85E-05  0.117421

My next question is: how can I convert the gene IDs in the files above to Ensembl format so that I can get gene ontology annotations?

typo in hisat2 rna strandness option

I wanted to point out a small typo that cost me a good 20 minutes: in the hisat2 commands, the option for indicating library strandedness is incorrect. It is written as ---rna-strandedness, but the correct option is --rna-strandness.

Thanks for this tutorial - it's great!

getting raw counts

Hi and thank you for the great tool.

I am trying to perform DEG analysis with DESeq2 and need raw counts. After using kallisto quant my output looks like this:

Screenshot 1403-03-19 at 22 59 07

Since the estimated count is calculated by MLE I was wondering how would you recommend using that for DESeq2.

Thank you for your attention.
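DESeq2 expects integer counts, so one common workaround is to round kallisto's est_counts column; the tximport route to DESeq2 handles this conversion for you and is generally the safer choice. The sketch below only illustrates the rounding step on an abundance.tsv-style table with the columns target_id, length, eff_length, est_counts and tpm. The function name and the sample rows are hypothetical.

```python
import csv


def rounded_est_counts(abundance_lines):
    """Map target_id to est_counts rounded to the nearest integer."""
    reader = csv.DictReader(abundance_lines, delimiter="\t")
    counts = {}
    for row in reader:
        # Python's round() sends exact .5 values to the nearest even integer.
        counts[row["target_id"]] = round(float(row["est_counts"]))
    return counts


if __name__ == "__main__":
    # Made-up rows in the abundance.tsv layout, for illustration only.
    sample = [
        "target_id\tlength\teff_length\test_counts\ttpm",
        "tx1\t1000\t801.0\t10.4\t5.0",
        "tx2\t500\t301.0\t3.6\t2.0",
    ]
    print(rounded_est_counts(sample))
```

With a real file, the same function can be called on an open file handle, for example `with open("abundance.tsv") as handle: rounded_est_counts(handle)`.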

How to perform DGE with Kallisto results files

Hi, I am analyzing RNA-Seq data. I have 190 PE samples (control: male, female; mutant: male, female). I used kallisto to quantify transcript abundances with the same input data.

My question is how to perform DGE after the kallisto analysis. I have prepared a single .tsv matrix file of all samples. Should I use DESeq2 for the DGE analysis? However, I would expect more differentially expressed features from transcript abundances than from a gene-based method.
I am using the following commands, but my groups do not have equal numbers of samples. Please suggest an appropriate script:

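library(tximport)
library(DESeq2)

# 'files' is assumed to be a character vector of paths to the kallisto abundance files (not shown in the post)
names(files) <- paste0("sample", 1:187)
txi.kallisto <- tximport(files, type = "kallisto", txOut = TRUE)
head(txi.kallisto$counts)

# Note: rep(..., each = 93) yields 186 labels for 187 samples, so the row names below cannot be assigned;
# the condition vector needs exactly one entry per sample, in the same order as 'files'.
sampleTable <- data.frame(condition = factor(rep(c("affected", "unaffected"), each = 93)))
rownames(sampleTable) <- colnames(txi.kallisto$counts)

dds <- DESeqDataSetFromTximport(txi.kallisto, sampleTable, ~condition)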

Regarding ERCC spike-in

The samples used in the tutorial contain an additional ERCC spike-in. However, the samples that I downloaded from SRA do not contain any, and neither does the reference genome downloaded from Ensembl. Will there be much difference in the analysis steps? Where do I need to be careful?

Docker?

Should there be a docker image for this tutorial?

Differential expression on STAR throwing an error

I've been following the guide for STAR from the start, but I seem to have an issue at the cuffmerge stage:

cd $RNA_HOME/expression/star_cufflinks/ref_only/
ls -1 _Rep_ERCC*/transcripts.gtf > assembly_GTF_list.txt
cuffmerge -p 8 -o merged -g $RNA_HOME/refs/hg19/genes/genes_chr22_ERCC92.gtf -s $RNA_HOME/refs/hg19/bwt/chr22_ERCC92/ assembly_GTF_list.txt

I get the error:

[Sat Jan 2 15:23:16 2016] Beginning transcriptome assembly merge

[Sat Jan 2 15:23:16 2016] Preparing output location merged/
[Sat Jan 2 15:23:17 2016] Converting GTF files to SAM
[15:23:17] Loading reference annotation.
Error: duplicate GFF ID 'ENST00000400518' encountered!
[FAILED]
Error: could not execute gtf_to_sam

if I dump the contents of the .gtf file it looks like there are multiple hits for that ID:

grep 'ENST00000400518' $RNA_HOME/refs/hg19/genes/genes_chr22_ERCC92.gtf > hits.txt
hits.txt

This is the second time I have tried to replicate it, so I don't have any of the tophat alignments in this version, but I don't think they should be needed. Do you have any suggestions?
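One thing that can be checked in the annotation is whether the same transcript_id is attached to records on more than one chromosome or strand. It is an assumption here, not a confirmed diagnosis, that this kind of reuse is what makes gtf_to_sam report a duplicate ID. The sketch below lists such IDs from a tab-separated GTF; the function name and sample records are hypothetical.

```python
import re
from collections import defaultdict

TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def ambiguous_transcripts(gtf_lines):
    """Return transcript IDs that occur on more than one (chromosome, strand)."""
    locations = defaultdict(set)
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # header or comment line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF record
        match = TRANSCRIPT_ID.search(fields[8])
        if match:
            locations[match.group(1)].add((fields[0], fields[6]))
    return {tid: locs for tid, locs in locations.items() if len(locs) > 1}


if __name__ == "__main__":
    # Made-up records for illustration only.
    sample = [
        'chr22\tsrc\texon\t1\t10\t.\t+\t.\tgene_id "G1"; transcript_id "T1";',
        'chrY\tsrc\texon\t1\t10\t.\t+\t.\tgene_id "G1"; transcript_id "T1";',
        'chr22\tsrc\texon\t20\t30\t.\t-\t.\tgene_id "G2"; transcript_id "T2";',
    ]
    print(ambiguous_transcripts(sample))
```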

Nanopore reads

Hi,
I just quantified RNA-Seq reads generated with an Oxford Nanopore MinION sequencing device using kallisto 0.46.0. The est_counts reflected well the number of reads I observed for some example genes, but I was wondering whether the TPM calculation uses the read length parameter. Since each nanopore read should more or less represent a whole transcript, incorporating a fixed read length in the TPM estimation would lead to wrong results, right?

Regards,
Markus
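For reference, TPM is conventionally computed from the estimated counts and the effective transcript lengths: each count is divided by its effective length, and the resulting rates are scaled to sum to one million. In kallisto's output the effective length (the eff_length column) is derived from the fragment length distribution, which is where a length parameter can influence the result. The sketch below shows only this arithmetic with made-up numbers; the function name is hypothetical.

```python
def tpm_from_counts(est_counts, eff_lengths):
    """Compute TPM values from estimated counts and effective lengths."""
    if len(est_counts) != len(eff_lengths):
        raise ValueError("counts and lengths must have the same number of entries")
    # Reads per base of effective length for every transcript.
    rates = [count / length for count, length in zip(est_counts, eff_lengths)]
    total = sum(rates)
    if total == 0:
        return [0.0 for _ in rates]
    return [rate / total * 1e6 for rate in rates]


if __name__ == "__main__":
    # Two hypothetical transcripts with equal counts but different effective lengths.
    print(tpm_from_counts([10, 10], [1000, 2000]))
```

Under this formula, two transcripts with the same number of reads get different TPM values when their effective lengths differ, which is the behaviour the question is concerned about for full-length reads.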

sequences dropped from the index

Hello,

kallisto (0.44.0) seems to be silently dropping sequences from the index.

Working example:

Is there a reason why some sequences are not indexed?

Code to reproduce example:

wget -O - 'http://www.circbase.org/download/human_hg19_circRNAs_putative_spliced_sequence.fa.gz' | gzip -d -c > human_hg19_circRNAs_putative_spliced_sequence.fa

sed -n '/^>/p' human_hg19_circRNAs_putative_spliced_sequence.fa |  wc -l 

kallisto index -i human_hg19_circRNAs_putative_spliced_sequence.fa.fai human_hg19_circRNAs_putative_spliced_sequence.fa

kallisto inspect human_hg19_circRNAs_putative_spliced_sequence.fa.fai
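Before comparing counts with the index, it can help to look for records that an index might treat specially. kallisto's default k-mer size is 31, so sequences shorter than that cannot contribute a full k-mer, and repeated record names are another thing worth ruling out. Whether either explains the missing sequences here is an assumption, not a confirmed cause. The sketch below summarizes a FASTA file along those lines; the function names and sample records are hypothetical.

```python
from collections import Counter


def read_fasta(lines):
    """Yield (name, sequence) pairs from FASTA-formatted lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            header = line[1:].strip()
            # Use the first whitespace-delimited token as the record name.
            name, chunks = (header.split()[0] if header else ""), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def summarize_fasta(lines, k=31):
    """Count records and report short sequences and repeated names."""
    records = list(read_fasta(lines))
    name_counts = Counter(name for name, _ in records)
    return {
        "total": len(records),
        "shorter_than_k": [name for name, seq in records if len(seq) < k],
        "duplicate_names": sorted(n for n, c in name_counts.items() if c > 1),
    }


if __name__ == "__main__":
    # Made-up records for illustration only.
    sample = [">a", "ACGT" * 10, ">b", "ACG", ">a", "A" * 31]
    print(summarize_fasta(sample))
```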

Ballgown plotting problem

I followed the tutorial exactly as written, using the same tool versions, but I have two main problems in the DE visualisation.

The distribution plot of UHR vs HBR shows fewer genes, and the labelled genes are not the actual top 25 genes by probability value.
However, when I look at my UHR_vs_HBR_gene_results_sig.tsv, I have about 310 genes. The top 25 genes (I attached the plot PDFs)
are:
IGLC3
MPPED1
PRAME
IGLV2-23
CDC45
APOBEC3B
PLA2G3
SHANK3
RP5-1119A7.17
CACNA1I
ATF4
IGLV2-14
KDELR3
ERCC-00004
Sep-03
LA16c-3G11.7
SYNGR1
MYO18B
ERCC-00002
GNAZ
MLC1
MAPK8IP2
ERCC-00130
TEF

Tutorial_Part3_Supplementary_R_output.pdf
outfile.pdf

hisat2 --rna-strandness produced same results with RF and FR

Hi there,

I followed the tutorial to align RNAseq PE reads.
I found that --rna-strandness FR and RF both produced the same results, and these are also the same as the results generated when ignoring strand specificity.

I wonder why this is.

Thanks heaps in advance for your help.

Huanlee

Samtools Install

screen shot 2018-01-05 at 15 51 50

Hey, I'm getting tripped up in the samtools installation at the make step. It's throwing this error and I don't understand it / couldn't find anything online.

Thanks a lot for the help.
