pennseq's Introduction

# Copyright (c) 2013, Yu Hu and Mingyao Li (Perelman School of Medicine, University of Pennsylvania)
# All rights reserved.
#########################

UPDATES:
	# 01/11/2017: Added use of the Pysam module

REQUIREMENTS:
	(i) Python v2.X	
	(ii) Pysam
	(iii) Indexed BAM file as input
	

#####################STEP(1)#########################
The Perl script "PreProcess.pl" generates the compatible matrix of isoforms for each gene.

USAGE OF PreProcess.pl:
"./PreProcess.pl -r RefSeqAnnotation -o Outputfile"

The RefSeqAnnotation file is downloaded from the UCSC Genome Browser. The format is as follows:
--------------------------------------------------------------------------------------------
#bin	name	chrom	strand	txStart	txEnd	cdsStart	cdsEnd	exonCount	exonStarts	exonEnds	id	name2	cdsStartStat	cdsEndStat	exonFrames
774	NM_001251984	chr1	+	24828840	24863510	24859578	24861767	4	24828840,24857707,24859572,24861582,	24828953,24857881,24859744,24863510,	0	RCAN3	cmpl	cmpl	-1,-1,0,1,
774	NM_001251979	chr1	+	24828840	24863510	24840862	24861767	5	24828840,24840803,24857707,24859572,24861582,	24828953,24841057,24857881,24859744,24863510,	0	RCAN3	cmpl	cmpl	-1,0,0,0,1,
774	NM_001251977	chr1	+	24829386	24863510	24840862	24861767	5	24829386,24840803,24857707,24859572,24861582,	24829607,24841057,24857881,24859744,24863510,	0	RCAN3	cmpl	cmpl	-1,0,0,0,1,
774	NM_001251981	chr1	+	24829386	24863510	24840862	24861767	4	24829386,24840803,24859572,24861582,	24829607,24841057,24859744,24863510,	0	RCAN3	cmpl	cmpl	-1,0,0,1,

---------------------------------------------------------------------------------------------
-r ---RefSeqAnnotation file

-o ---The name of the file in which to save the results; it will be the input file for "PennSeq.py"

The file "RefSeqAnnotation_ECE1_example" is an example of RefSeqAnnotation.

The file "ISOFORM_Compatible_Matrix_example" is an example of the annotation as processed by PreProcess.pl.
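
To make the annotation format above concrete, here is a minimal Python sketch that parses one record into exon intervals and groups isoforms by gene. It is not part of PennSeq and does not reproduce what PreProcess.pl computes. The function names parse_refseq_line and group_isoforms_by_gene are hypothetical, and the code assumes 16 whitespace-delimited columns as in the header shown above. It is written to run under both Python 2 and Python 3.

```python
# Hypothetical helpers for illustration only; not part of the PennSeq scripts.
EXAMPLE_LINE = (
    "774\tNM_001251984\tchr1\t+\t24828840\t24863510\t24859578\t24861767\t4\t"
    "24828840,24857707,24859572,24861582,\t"
    "24828953,24857881,24859744,24863510,\t0\tRCAN3\tcmpl\tcmpl\t-1,-1,0,1,"
)


def parse_refseq_line(line):
    """Parse one annotation record into a dict (assumes 16 columns)."""
    fields = line.split()
    if len(fields) != 16:
        raise ValueError("expected 16 columns, got %d" % len(fields))
    exon_count = int(fields[8])
    # exonStarts/exonEnds are comma-separated lists with a trailing comma.
    starts = [int(x) for x in fields[9].split(",") if x]
    ends = [int(x) for x in fields[10].split(",") if x]
    if len(starts) != exon_count or len(ends) != exon_count:
        raise ValueError("exonCount does not match the exon coordinate lists")
    return {
        "name": fields[1],
        "chrom": fields[2],
        "strand": fields[3],
        "tx_start": int(fields[4]),
        "tx_end": int(fields[5]),
        "gene": fields[12],  # the name2 column
        "exons": list(zip(starts, ends)),
    }


def group_isoforms_by_gene(lines):
    """Map each gene name to the list of its parsed isoform records."""
    genes = {}
    for line in lines:
        # Skip blank lines and the '#bin ...' header line.
        if not line.strip() or line.startswith("#"):
            continue
        record = parse_refseq_line(line)
        genes.setdefault(record["gene"], []).append(record)
    return genes


if __name__ == "__main__":
    record = parse_refseq_line(EXAMPLE_LINE)
    print(record["name"], record["gene"], record["exons"])
```

Grouping by the name2 column is a choice made for this sketch; how PreProcess.pl itself assigns isoforms to genes is not described here.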


#####################STEP(2)#########################
The Python script "PennSeq.py" will calculate the relative abundance for each isoform.

USAGE OF PennSeq.py:

"python PennSeq.py -ref ISOFORM_Compatible_Matrix -bam Indexed_BAM_File -out Output_Result_FILE"

-bam ---Indexed_BAM_File is the mapping result file generated by mapping tools such as TopHat. 

-ref ---ISOFORM_Compatible_Matrix is the output of PreProcess.pl

-out ---The name of the file in which to save the results

The data file "BAM_example.sorted.bam" is an example indexed BAM file.

The file "Output_Result_FILE" is an example of a results file.
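
The two steps above can be chained from a small driver script. The following is a minimal sketch under stated assumptions, not something shipped with PennSeq: build_commands, find_bam_index and run_pipeline are hypothetical helper names, the command lines are copied from the usage strings above, and the index check assumes the common "sample.bam.bai" or "sample.bai" naming next to the BAM file. It runs under both Python 2 and Python 3.

```python
import os
import subprocess


def find_bam_index(bam_path):
    """Return the path of a .bai index next to bam_path, or None if absent."""
    # Assumed naming conventions: sample.bam.bai or sample.bai
    candidates = [bam_path + ".bai", os.path.splitext(bam_path)[0] + ".bai"]
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None


def build_commands(annotation, matrix, bam, result):
    """Build the argument lists for STEP(1) and STEP(2) as documented above."""
    step1 = ["./PreProcess.pl", "-r", annotation, "-o", matrix]
    step2 = ["python", "PennSeq.py", "-ref", matrix, "-bam", bam, "-out", result]
    return step1, step2


def run_pipeline(annotation, matrix, bam, result):
    """Run both steps in order, stopping at the first failure."""
    if find_bam_index(bam) is None:
        raise IOError("no .bai index found for %s" % bam)
    for command in build_commands(annotation, matrix, bam, result):
        # check_call raises CalledProcessError on a non-zero exit status.
        subprocess.check_call(command)
```

For example, run_pipeline("RefSeqAnnotation_ECE1_example", "matrix.txt", "BAM_example.sorted.bam", "result.txt") would run both scripts from the current directory; "matrix.txt" and "result.txt" are placeholder output names.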

-------------------------------------------------------------------------------------
Contact information
For more information, additional data and scripts (such as the simulation and plotting code), bug reports, and comments, please contact Yu Hu via [email protected].

pennseq's Issues

Error in getCigarStringInformation

To Whom It May Concern,

When generating isoform relative abundances for a dataset using PennSeq, I got the following error:
Traceback (most recent call last):
File "~/program/PennSeq/PennSeq.py", line 289, in
test1.aln.sam.txt

cigarMatchRead1, cigarNumberRead1, cigarMatchInfoCount1, cigarNumberInfoCount1 = getCigarStringInformation(readCigar, readName, 1)

File "~/program/PennSeq/PennSeq.py", line 24, in getCigarStringInformation
cigarNumberRead[cigarMatchInfoCount] = int(splitCigar[i])
ValueError: invalid literal for int() with base 10: '92=1'

The dataset (a sam file) was attached.

Thanks a lot for your help.

Max
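
One plausible reading of the error above, offered as an assumption rather than a diagnosis: the token '92=1' suggests the read's CIGAR string uses the "=" (sequence match) operator, which the SAM format allows alongside "X" (sequence mismatch), and that the splitting logic did not treat "=" as an operator. The sketch below is not PennSeq's getCigarStringInformation; parse_cigar is a hypothetical helper showing one way to tokenize a CIGAR string so that all of the SAM operators M, I, D, N, S, H, P, = and X are recognized. It runs under both Python 2 and Python 3.

```python
import re

# One CIGAR element: a length followed by one of the SAM operators MIDNSHP=X.
CIGAR_ELEMENT = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar):
    """Split a CIGAR string into a list of (length, operator) tuples."""
    if cigar == "*":
        # SAM uses '*' when no CIGAR is available.
        return []
    elements = CIGAR_ELEMENT.findall(cigar)
    # Rebuild the string to detect characters the pattern did not consume.
    if "".join(length + op for length, op in elements) != cigar:
        raise ValueError("malformed CIGAR string: %r" % cigar)
    return [(int(length), op) for length, op in elements]


if __name__ == "__main__":
    print(parse_cigar("92=1X7="))
    print(parse_cigar("50M1000N50M"))
```

A downstream step that only understands "M" could then treat "=" and "X" as aligned positions, since all three consume both the read and the reference; whether that is appropriate for PennSeq's model is a question for the authors.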
