Comments (16)
hi @acarmas1 , sorry for the delay in my reply. I believe you have replied to the issue in [here](#54 (comment) and you encountered an error which I am not quite sure about unless you share with me the function that you used and the full error log?
Thanks!
from m6anet.
Hi Chris, no problem
Also sorry for the late response. On this folder I added the code I run (gen_pos.sbatch), the error I got (out_genomic_position) and the files I used as inputs, which are in .fasta .gtf and .txt format. https://1drv.ms/u/s!AokqkR3muxL0jvIA82csbixN-ANRcQ?e=KWig3S
I'll appreciate your help and thank you.
from m6anet.
hi @acarmas1 seems like a syntax error on gen_pos.py, specifically the last line, your dictionary is missing one curly braces at the end
from m6anet.
Thank you,
By adding the } at the end I was able to run it, but now I'm getting this error:
I think something is wrong with the exon_number argument of the GTF file.
from m6anet.
hi @acarmas1, may I know which gtf file you are using? It might be formatted differently
from m6anet.
Sure, on this link you can find the gtf file I'm using https://1drv.ms/u/s!AokqkR3muxL0jvIA82csbixN-ANRcQ?e=KWig3S
from m6anet.
Please can someone help me with this
from m6anet.
hi @acarmas1, sorry for the delay
I looked at your GTF file and it seems that the problem arises because your gtf file format is a bit different than what we used to deal with. I modified your gen_pos.py, try this out, t2g_dict will give transcript_position -> genomic_position for every transcript in your all_transcripts.txt file
`#!/usr/bin/env python
def t2g(tx_id, fasta_dict, gtf_dict):
t2g_dict = {}
tx_seq = fasta_dict[tx_id.split(".")[0]]
tx_contig = gtf_dict[tx_id]['chr']
g_id = gtf_dict[tx_id]['g_id']
if tx_seq is None:
return [], []
for exon_num in range(len(gtf_dict[tx_id]['exon'])):
g_interval = gtf_dict[tx_id]['exon'][exon_num]
tx_interval = gtf_dict[tx_id]['tx_exon'][exon_num]
for g_pos in range(g_interval[0], g_interval[1] + 1): # Exclude the rims of exons.
dis_from_start = g_pos - g_interval[0]
if gtf_dict[tx_id]['strand'] == "+":
tx_pos = tx_interval[0] + dis_from_start
elif gtf_dict[tx_id]['strand'] == "-":
tx_pos = tx_interval[1] - dis_from_start
if (g_interval[0] <= g_pos < g_interval[0]+2) or (g_interval[1]-2 < g_pos <= g_interval[1]):
kmer = 'XXXXX'
else:
kmer = tx_seq[tx_pos-2:tx_pos+3]
t2g_dict[tx_pos] = g_pos # tx.contig is chromosome.
return t2g_dict
def readFasta(infasta):
fasta=open(infasta,"r")
entries=""
for ln in fasta:
entries+=ln
entries=entries.split(">")
dict={}
for entry in entries:
entry=entry.split("\n")
if len(entry[0].split()) > 0:
id=entry[0].split()[0].split(".")[0]
seq="".join(entry[1:])
dict[id]=seq
return dict
def readGTF(gtf_or_gff):
gtf=open(gtf_or_gff,"r")
dict,is_gff={},0
for ln in gtf:
if not ln.startswith("#"):
ln=ln.strip("\n").split("\t")
if is_gff == 0:
if ln[-1].startswith("ID") or ln[-1].startswith("Parent"):
is_gff = 1
else:
is_gff = -1
if is_gff < 0:
if ln[2] == "transcript" or ln[2] == "exon":
chr,type,start,end=ln[0],ln[2],int(ln[3]),int(ln[4])
attrList=ln[-1].split(";")
attrDict={}
for k in attrList:
p=k.strip().split(" ")
if len(p) == 2:
attrDict[p[0]]=p[1].strip('\"')
##tx_id=ln[-1].split('; transcript_id "')[1].split('";')[0]
##g_id=ln[-1].split('gene_id "')[1].split('";')[0]
tx_id = attrDict["transcript_id"]
g_id = attrDict["gene_id"]
if tx_id not in dict:
dict[tx_id]={'chr':chr,'g_id':g_id,'strand':ln[6]}
if type not in dict[tx_id]:
if type in ("transcript", "exon"):
dict[tx_id][type]=[(start,end)]
else:
if type == "exon":
if type not in dict[tx_id]:
dict[tx_id][type]=[(start,end)]
else:
dict[tx_id][type].append((start,end))
if is_gff > 0:
if ln[2] == "exon" or ln[2] == "mRNA":
chr,type,start,end=ln[0],ln[2],int(ln[3]),int(ln[4])
tx_id=ln[-1].split('transcript:')[1].split(';')[0]
if ln[2] == "mRNA":
type="transcript"
if tx_id not in dict:
dict[tx_id]={'chr':chr,'strand':ln[6]}
if type == "transcript":
dict[tx_id][type]=(start,end)
if type == "exon":
dict[tx_id][type]=[(start,end)]
else:
if type == "transcript" and type not in dict[tx_id]:
dict[tx_id][type]=(start,end)
if type == "exon":
if type not in dict[tx_id]:
dict[tx_id][type]=[(start,end)]
else:
dict[tx_id][type].append((start,end))
#convert genomic positions to tx positions
if is_gff < 0:
for id in dict:
tx_pos,tx_start=[],0
if 'exon' not in dict[id]:
print(id)
continue
for pair in dict[id]["exon"]:
tx_end=pair[1]-pair[0]+tx_start
tx_pos.append((tx_start,tx_end))
tx_start=tx_end+1
dict[id]['tx_exon']=tx_pos
else:
for id in dict:
tx_pos,tx_start=[],0
if dict[id]["strand"] == "-":
dict[id]["exon"].sort(key=lambda tup: tup[0], reverse=True)
for pair in dict[id]["exon"]:
tx_end=pair[1]-pair[0]+tx_start
tx_pos.append((tx_start,tx_end))
tx_start=tx_end+1
dict[id]['tx_exon']=tx_pos
return (dict,is_gff)
def readalltranscript(intranscript):
all_transcripts = []
i = 0
with open(intranscript,"r") as f:
for line in f:
if i == 0: # Skip the first line
i+=1
continue
else:
all_transcripts.append(line.strip("\n"))
return all_transcripts
fasta_dict = readFasta("/projects/dsn001/camila/m6A/genomes/ApisMellifera_RNA.fasta")
gtf_dict, _ = readGTF("/projects/dsn001/camila/m6A/genomes/GCF_003254395.2_Amel_HAv3.1_genomic.gtf")
all_transcripts = readalltranscript("/projects/dsn001/camila/m6A/m6anet_analysis/pooled_rep/Rfiltered_data/trans2.txt")
t2g_dict = {tx: t2g(tx, fasta_dict, gtf_dict) for tx in all_transcripts
`
from m6anet.
Thank you, I tried to run the code, but I got this error
I saw this error EOF implies that the interpreter has reached the end of our program before executing all the code.
from m6anet.
hi @acarmas1 , yeah you encountered that error because the t2g dict does not have a closing bracket. Try
t2g_dict = {tx: t2g(tx, fasta_dict, gtf_dict) for tx in all_transcripts}
from m6anet.
By running it with the closing brackets I didn't get any errors, but I also didn't get any new files or results.
from m6anet.
hi @acarmas1 , the script gives you a Python dictionary that provides a mapping between each transcriptomic position to a genomic position in t2g_dict variable. Will you be able to use this to map the position in your data yourself? I'm a bit packed with other projects recently so I might take a while to help with code implementation
from m6anet.
The problem is that after running the script I didn't get any t2g_dict file. Therefore, I was wondering if you could help me please. After getting the dictionary, yes run the mapping by myself.
from m6anet.
yeah the dictionary is not saved as a file, you can use it directly to map your position. If you want to save it, then you have to save the t2g_dict within that script into a file
from m6anet.
hi @acarmas1 if you're more familiar with R, you can also check this discussion thread
from m6anet.
Yes, I'm more familiar with R. I'll give it a try. Thanks for sharing the discussion thread.
from m6anet.
Related Issues (20)
- Memory usage exploding HOT 3
- Empty Eventalign.txt for Negative Sense Viral RNA HOT 1
- dataprep and data.log HOT 8
- Questions about the m6A motif HOT 1
- Genome vs transcriptome alignment HOT 4
- m6anet, eventalign, galaxy HOT 3
- error with m6anet dataprep HOT 1
- yet another error with m6anet dataprep HOT 1
- running with test eventalign test file file form github produces another error HOT 8
- installation on linux failed HOT 3
- TypeError: argument should be integer or None, not 'numpy.float64' HOT 2
- Interpreting 'inference' results HOT 1
- m6anet dataprep fails to complete: cannot convert float NaN to integer HOT 3
- Issues with Read Selection HOT 2
- The position output by m6anet seems to be 1 less than the actual position HOT 1
- empty dataprep data.json and data.log HOT 5
- Arabidopsis VIRC instead of col0 (wild type) HOT 3
- probability_modified and mod_ratio are 0 HOT 2
- f5c or nanopolish HOT 3
- Disconnect between site and individual read mod probability RNA004
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from m6anet.