Giter VIP home page Giter VIP logo

Comments (16)

chrishendra93 avatar chrishendra93 commented on July 29, 2024

hi @acarmas1 , sorry for the delay in my reply. I believe you have replied to the issue in [here](#54 (comment) and you encountered an error which I am not quite sure about unless you share with me the function that you used and the full error log?

Thanks!

from m6anet.

acarmas1 avatar acarmas1 commented on July 29, 2024

Hi Chris, no problem

Also sorry for the late response. On this folder I added the code I run (gen_pos.sbatch), the error I got (out_genomic_position) and the files I used as inputs, which are in .fasta .gtf and .txt format. https://1drv.ms/u/s!AokqkR3muxL0jvIA82csbixN-ANRcQ?e=KWig3S

I'll appreciate your help and thank you.

from m6anet.

chrishendra93 avatar chrishendra93 commented on July 29, 2024

hi @acarmas1 seems like a syntax error on gen_pos.py, specifically the last line, your dictionary is missing one curly braces at the end

from m6anet.

acarmas1 avatar acarmas1 commented on July 29, 2024

Thank you,

By adding the } at the end I was able to run it, but now I'm getting this error:

image

I think something is wrong with the exon_number argument of the GTF file.

from m6anet.

chrishendra93 avatar chrishendra93 commented on July 29, 2024

hi @acarmas1, may I know which gtf file you are using? It might be formatted differently

from m6anet.

acarmas1 avatar acarmas1 commented on July 29, 2024

Sure, on this link you can find the gtf file I'm using https://1drv.ms/u/s!AokqkR3muxL0jvIA82csbixN-ANRcQ?e=KWig3S

from m6anet.

acarmas1 avatar acarmas1 commented on July 29, 2024

Please can someone help me with this

from m6anet.

chrishendra93 avatar chrishendra93 commented on July 29, 2024

hi @acarmas1, sorry for the delay

I looked at your GTF file and it seems that the problem arises because your gtf file format is a bit different than what we used to deal with. I modified your gen_pos.py, try this out, t2g_dict will give transcript_position -> genomic_position for every transcript in your all_transcripts.txt file

`#!/usr/bin/env python

def t2g(tx_id, fasta_dict, gtf_dict):
    t2g_dict = {}
    tx_seq = fasta_dict[tx_id.split(".")[0]]
    tx_contig = gtf_dict[tx_id]['chr']
    g_id = gtf_dict[tx_id]['g_id']
    if tx_seq is None:
        return [], []

    for exon_num in range(len(gtf_dict[tx_id]['exon'])):
        g_interval = gtf_dict[tx_id]['exon'][exon_num]
        tx_interval = gtf_dict[tx_id]['tx_exon'][exon_num]
        for g_pos in range(g_interval[0], g_interval[1] + 1): # Exclude the rims of exons.
            dis_from_start = g_pos - g_interval[0]
            if gtf_dict[tx_id]['strand'] == "+":
                tx_pos = tx_interval[0] + dis_from_start
            elif gtf_dict[tx_id]['strand'] == "-":
                tx_pos = tx_interval[1] - dis_from_start
            if (g_interval[0] <= g_pos < g_interval[0]+2) or (g_interval[1]-2 < g_pos <= g_interval[1]): 
                kmer = 'XXXXX'
            else:
                kmer = tx_seq[tx_pos-2:tx_pos+3]
            t2g_dict[tx_pos] = g_pos # tx.contig is chromosome.
    return t2g_dict

def readFasta(infasta):
    fasta=open(infasta,"r")
    entries=""
    for ln in fasta:
        entries+=ln
    entries=entries.split(">")
    dict={}
    for entry in entries:
        entry=entry.split("\n")

        if len(entry[0].split()) > 0:
            id=entry[0].split()[0].split(".")[0]
            seq="".join(entry[1:])
            dict[id]=seq
    return dict

def readGTF(gtf_or_gff):
    gtf=open(gtf_or_gff,"r")
    dict,is_gff={},0
    for ln in gtf:
        if not ln.startswith("#"):
            ln=ln.strip("\n").split("\t")
            if is_gff == 0:
                if ln[-1].startswith("ID") or ln[-1].startswith("Parent"):
                    is_gff = 1
                else:
                    is_gff = -1

            if is_gff < 0:
                if ln[2] == "transcript" or ln[2] == "exon":
                    chr,type,start,end=ln[0],ln[2],int(ln[3]),int(ln[4])
                    attrList=ln[-1].split(";")
                    attrDict={}
                    for k in attrList:
                        p=k.strip().split(" ")
                        if len(p) == 2:
                            attrDict[p[0]]=p[1].strip('\"')
                    ##tx_id=ln[-1].split('; transcript_id "')[1].split('";')[0]
                    ##g_id=ln[-1].split('gene_id "')[1].split('";')[0]
                    tx_id = attrDict["transcript_id"]
                    g_id = attrDict["gene_id"]
                    

                    if tx_id not in dict:
                        dict[tx_id]={'chr':chr,'g_id':g_id,'strand':ln[6]}
                        if type not in dict[tx_id]:
                            if type in ("transcript", "exon"):
                                dict[tx_id][type]=[(start,end)]
        
                    else:
                        if type == "exon":
                            if type not in dict[tx_id]:
                                dict[tx_id][type]=[(start,end)]
                            else:
                                dict[tx_id][type].append((start,end))
                        
            if is_gff > 0:
                if ln[2] == "exon" or ln[2] == "mRNA":
                    chr,type,start,end=ln[0],ln[2],int(ln[3]),int(ln[4])
                    tx_id=ln[-1].split('transcript:')[1].split(';')[0]
                    if ln[2] == "mRNA":
                        type="transcript"
                    if tx_id not in dict:
                        dict[tx_id]={'chr':chr,'strand':ln[6]}
                        if type == "transcript":
                            dict[tx_id][type]=(start,end)
                        if type == "exon":
                            dict[tx_id][type]=[(start,end)]
                    else:
                        if type == "transcript" and type not in dict[tx_id]:
                            dict[tx_id][type]=(start,end)
                        if type == "exon":
                            if type not in dict[tx_id]:
                                dict[tx_id][type]=[(start,end)]
                            else:
                                dict[tx_id][type].append((start,end))
    #convert genomic positions to tx positions
    if is_gff < 0:
        for id in dict:
            tx_pos,tx_start=[],0
            if 'exon' not in dict[id]:
                print(id)
                continue
            for pair in dict[id]["exon"]:
                tx_end=pair[1]-pair[0]+tx_start
                tx_pos.append((tx_start,tx_end))
                tx_start=tx_end+1
            dict[id]['tx_exon']=tx_pos
    else:
        for id in dict:
            tx_pos,tx_start=[],0
            if dict[id]["strand"] == "-":
                dict[id]["exon"].sort(key=lambda tup: tup[0], reverse=True)
            for pair in dict[id]["exon"]:
                tx_end=pair[1]-pair[0]+tx_start
                tx_pos.append((tx_start,tx_end))
                tx_start=tx_end+1
            dict[id]['tx_exon']=tx_pos
    return (dict,is_gff)


def readalltranscript(intranscript):
    all_transcripts = []
    i = 0
    with open(intranscript,"r") as f:
        for line in f:
            if i == 0: # Skip the first line
                i+=1
                continue
            else:
                all_transcripts.append(line.strip("\n"))
    return all_transcripts  

fasta_dict = readFasta("/projects/dsn001/camila/m6A/genomes/ApisMellifera_RNA.fasta")
gtf_dict, _ = readGTF("/projects/dsn001/camila/m6A/genomes/GCF_003254395.2_Amel_HAv3.1_genomic.gtf")
all_transcripts = readalltranscript("/projects/dsn001/camila/m6A/m6anet_analysis/pooled_rep/Rfiltered_data/trans2.txt")
t2g_dict = {tx: t2g(tx, fasta_dict, gtf_dict) for tx in all_transcripts

`

from m6anet.

acarmas1 avatar acarmas1 commented on July 29, 2024

Thank you, I tried to run the code, but I got this error

image

I saw this error EOF implies that the interpreter has reached the end of our program before executing all the code.

from m6anet.

chrishendra93 avatar chrishendra93 commented on July 29, 2024

hi @acarmas1 , yeah you encountered that error because the t2g dict does not have a closing bracket. Try

t2g_dict = {tx: t2g(tx, fasta_dict, gtf_dict) for tx in all_transcripts}

from m6anet.

acarmas1 avatar acarmas1 commented on July 29, 2024

By running it with the closing brackets I didn't get any errors, but I also didn't get any new files or results.

from m6anet.

chrishendra93 avatar chrishendra93 commented on July 29, 2024

hi @acarmas1 , the script gives you a Python dictionary that provides a mapping between each transcriptomic position to a genomic position in t2g_dict variable. Will you be able to use this to map the position in your data yourself? I'm a bit packed with other projects recently so I might take a while to help with code implementation

from m6anet.

acarmas1 avatar acarmas1 commented on July 29, 2024

The problem is that after running the script I didn't get any t2g_dict file. Therefore, I was wondering if you could help me please. After getting the dictionary, yes run the mapping by myself.

from m6anet.

chrishendra93 avatar chrishendra93 commented on July 29, 2024

yeah the dictionary is not saved as a file, you can use it directly to map your position. If you want to save it, then you have to save the t2g_dict within that script into a file

from m6anet.

chrishendra93 avatar chrishendra93 commented on July 29, 2024

hi @acarmas1 if you're more familiar with R, you can also check this discussion thread

from m6anet.

acarmas1 avatar acarmas1 commented on July 29, 2024

Yes, I'm more familiar with R. I'll give it a try. Thanks for sharing the discussion thread.

from m6anet.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.