Hello, I have used m6anet to identified the m6a modification in dire

hi <a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="

Sure, on this link you can find the gtf file I'm using <a href="https://1drv.ms/u/s!Ao

hi <a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="

Thank you, I tried to run the code, but I got this error <a target="

hi <a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="

genomic position about m6anet HOT 16 CLOSED

acarmas1 commented on July 29, 2024

genomic position

from m6anet.

Comments (16)

chrishendra93 commented on July 29, 2024

hi @acarmas1 , sorry for the delay in my reply. I believe you have replied to the issue in [here](#54 (comment) and you encountered an error which I am not quite sure about unless you share with me the function that you used and the full error log?

Thanks!

from m6anet.

acarmas1 commented on July 29, 2024

Hi Chris, no problem

Also sorry for the late response. On this folder I added the code I run (gen_pos.sbatch), the error I got (out_genomic_position) and the files I used as inputs, which are in .fasta .gtf and .txt format. https://1drv.ms/u/s!AokqkR3muxL0jvIA82csbixN-ANRcQ?e=KWig3S

I'll appreciate your help and thank you.

from m6anet.

chrishendra93 commented on July 29, 2024

hi @acarmas1 seems like a syntax error on gen_pos.py, specifically the last line, your dictionary is missing one curly braces at the end

from m6anet.

acarmas1 commented on July 29, 2024

Thank you,

By adding the } at the end I was able to run it, but now I'm getting this error:

I think something is wrong with the exon_number argument of the GTF file.

from m6anet.

chrishendra93 commented on July 29, 2024

hi @acarmas1, may I know which gtf file you are using? It might be formatted differently

from m6anet.

acarmas1 commented on July 29, 2024

Sure, on this link you can find the gtf file I'm using https://1drv.ms/u/s!AokqkR3muxL0jvIA82csbixN-ANRcQ?e=KWig3S

from m6anet.

acarmas1 commented on July 29, 2024

Please can someone help me with this

from m6anet.

chrishendra93 commented on July 29, 2024

hi @acarmas1, sorry for the delay

I looked at your GTF file and it seems that the problem arises because your gtf file format is a bit different than what we used to deal with. I modified your gen_pos.py, try this out, t2g_dict will give transcript_position -> genomic_position for every transcript in your all_transcripts.txt file

`#!/usr/bin/env python

def t2g(tx_id, fasta_dict, gtf_dict):
    t2g_dict = {}
    tx_seq = fasta_dict[tx_id.split(".")[0]]
    tx_contig = gtf_dict[tx_id]['chr']
    g_id = gtf_dict[tx_id]['g_id']
    if tx_seq is None:
        return [], []

    for exon_num in range(len(gtf_dict[tx_id]['exon'])):
        g_interval = gtf_dict[tx_id]['exon'][exon_num]
        tx_interval = gtf_dict[tx_id]['tx_exon'][exon_num]
        for g_pos in range(g_interval[0], g_interval[1] + 1): # Exclude the rims of exons.
            dis_from_start = g_pos - g_interval[0]
            if gtf_dict[tx_id]['strand'] == "+":
                tx_pos = tx_interval[0] + dis_from_start
            elif gtf_dict[tx_id]['strand'] == "-":
                tx_pos = tx_interval[1] - dis_from_start
            if (g_interval[0] <= g_pos < g_interval[0]+2) or (g_interval[1]-2 < g_pos <= g_interval[1]): 
                kmer = 'XXXXX'
            else:
                kmer = tx_seq[tx_pos-2:tx_pos+3]
            t2g_dict[tx_pos] = g_pos # tx.contig is chromosome.
    return t2g_dict

def readFasta(infasta):
    fasta=open(infasta,"r")
    entries=""
    for ln in fasta:
        entries+=ln
    entries=entries.split(">")
    dict={}
    for entry in entries:
        entry=entry.split("\n")

        if len(entry[0].split()) > 0:
            id=entry[0].split()[0].split(".")[0]
            seq="".join(entry[1:])
            dict[id]=seq
    return dict

def readGTF(gtf_or_gff):
    gtf=open(gtf_or_gff,"r")
    dict,is_gff={},0
    for ln in gtf:
        if not ln.startswith("#"):
            ln=ln.strip("\n").split("\t")
            if is_gff == 0:
                if ln[-1].startswith("ID") or ln[-1].startswith("Parent"):
                    is_gff = 1
                else:
                    is_gff = -1

            if is_gff < 0:
                if ln[2] == "transcript" or ln[2] == "exon":
                    chr,type,start,end=ln[0],ln[2],int(ln[3]),int(ln[4])
                    attrList=ln[-1].split(";")
                    attrDict={}
                    for k in attrList:
                        p=k.strip().split(" ")
                        if len(p) == 2:
                            attrDict[p[0]]=p[1].strip('\"')
                    ##tx_id=ln[-1].split('; transcript_id "')[1].split('";')[0]
                    ##g_id=ln[-1].split('gene_id "')[1].split('";')[0]
                    tx_id = attrDict["transcript_id"]
                    g_id = attrDict["gene_id"]
                    

                    if tx_id not in dict:
                        dict[tx_id]={'chr':chr,'g_id':g_id,'strand':ln[6]}
                        if type not in dict[tx_id]:
                            if type in ("transcript", "exon"):
                                dict[tx_id][type]=[(start,end)]
        
                    else:
                        if type == "exon":
                            if type not in dict[tx_id]:
                                dict[tx_id][type]=[(start,end)]
                            else:
                                dict[tx_id][type].append((start,end))
                        
            if is_gff > 0:
                if ln[2] == "exon" or ln[2] == "mRNA":
                    chr,type,start,end=ln[0],ln[2],int(ln[3]),int(ln[4])
                    tx_id=ln[-1].split('transcript:')[1].split(';')[0]
                    if ln[2] == "mRNA":
                        type="transcript"
                    if tx_id not in dict:
                        dict[tx_id]={'chr':chr,'strand':ln[6]}
                        if type == "transcript":
                            dict[tx_id][type]=(start,end)
                        if type == "exon":
                            dict[tx_id][type]=[(start,end)]
                    else:
                        if type == "transcript" and type not in dict[tx_id]:
                            dict[tx_id][type]=(start,end)
                        if type == "exon":
                            if type not in dict[tx_id]:
                                dict[tx_id][type]=[(start,end)]
                            else:
                                dict[tx_id][type].append((start,end))
    #convert genomic positions to tx positions
    if is_gff < 0:
        for id in dict:
            tx_pos,tx_start=[],0
            if 'exon' not in dict[id]:
                print(id)
                continue
            for pair in dict[id]["exon"]:
                tx_end=pair[1]-pair[0]+tx_start
                tx_pos.append((tx_start,tx_end))
                tx_start=tx_end+1
            dict[id]['tx_exon']=tx_pos
    else:
        for id in dict:
            tx_pos,tx_start=[],0
            if dict[id]["strand"] == "-":
                dict[id]["exon"].sort(key=lambda tup: tup[0], reverse=True)
            for pair in dict[id]["exon"]:
                tx_end=pair[1]-pair[0]+tx_start
                tx_pos.append((tx_start,tx_end))
                tx_start=tx_end+1
            dict[id]['tx_exon']=tx_pos
    return (dict,is_gff)


def readalltranscript(intranscript):
    all_transcripts = []
    i = 0
    with open(intranscript,"r") as f:
        for line in f:
            if i == 0: # Skip the first line
                i+=1
                continue
            else:
                all_transcripts.append(line.strip("\n"))
    return all_transcripts  

fasta_dict = readFasta("/projects/dsn001/camila/m6A/genomes/ApisMellifera_RNA.fasta")
gtf_dict, _ = readGTF("/projects/dsn001/camila/m6A/genomes/GCF_003254395.2_Amel_HAv3.1_genomic.gtf")
all_transcripts = readalltranscript("/projects/dsn001/camila/m6A/m6anet_analysis/pooled_rep/Rfiltered_data/trans2.txt")
t2g_dict = {tx: t2g(tx, fasta_dict, gtf_dict) for tx in all_transcripts

from m6anet.

acarmas1 commented on July 29, 2024

Thank you, I tried to run the code, but I got this error

I saw this error EOF implies that the interpreter has reached the end of our program before executing all the code.

from m6anet.

chrishendra93 commented on July 29, 2024

hi @acarmas1 , yeah you encountered that error because the t2g dict does not have a closing bracket. Try

t2g_dict = {tx: t2g(tx, fasta_dict, gtf_dict) for tx in all_transcripts}

from m6anet.

acarmas1 commented on July 29, 2024

By running it with the closing brackets I didn't get any errors, but I also didn't get any new files or results.

from m6anet.

chrishendra93 commented on July 29, 2024

hi @acarmas1 , the script gives you a Python dictionary that provides a mapping between each transcriptomic position to a genomic position in t2g_dict variable. Will you be able to use this to map the position in your data yourself? I'm a bit packed with other projects recently so I might take a while to help with code implementation

from m6anet.

acarmas1 commented on July 29, 2024

The problem is that after running the script I didn't get any t2g_dict file. Therefore, I was wondering if you could help me please. After getting the dictionary, yes run the mapping by myself.

from m6anet.

chrishendra93 commented on July 29, 2024

yeah the dictionary is not saved as a file, you can use it directly to map your position. If you want to save it, then you have to save the t2g_dict within that script into a file

from m6anet.

chrishendra93 commented on July 29, 2024

hi @acarmas1 if you're more familiar with R, you can also check this discussion thread

from m6anet.

acarmas1 commented on July 29, 2024

Yes, I'm more familiar with R. I'll give it a try. Thanks for sharing the discussion thread.

from m6anet.

genomic position about m6anet HOT 16 CLOSED

Comments (16)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent