NCC: Non-Coding RNA Classifier

NCC is an AI model trained and tested on a freshly updated dataset of small non-coding RNA (ncRNA or sncRNA) sequences, built to classify small non-coding RNAs efficiently. Experimental methods for identifying ncRNA families are time-consuming, labor-intensive, and expensive, which makes them impractical for the demands of high-throughput technology.



Performance comparison of several prediction methods

Method/Model      Accuracy  Sensitivity  Precision  F-score  MCC
RNAcon            0.3737    0.3787       0.4500     0.3605   0.3341
GeaPPLE           0.6487    0.6684       0.7325     0.7050   0.6857
nRC               0.6960    0.6889       0.6878     0.6878   0.6627
ncRFP             0.7972    0.7878       0.7904     0.7883   0.7714
ncDLRES           0.8430    0.8344       0.8419     0.8407   0.8335
ncDENSE           0.8687    0.8677       0.8703     0.8667   0.8574
NCC (this work)   0.9897    0.9870       0.9892     0.9880   0.9889
MncR              > 97%     -            -          -        -

Main modules of this repo

Functions                        Files
Data collection functions        rfam_query.py
Data analysis                    Analysis.ipynb
Data transformation              ncc_DataTransform.py
AI models                        ncc_Model.py
Training and testing the model   ncc_TrainTest.py

Data collection functions


To collect datasets from the Rfam database and assemble the main dataset, use the functions in rfam_query.py.

# Update this list if you need more or fewer RNA families downloaded from the Rfam db
def get_RNA_Families_in_interest() -> list[str]:
    return [
        'Cis-reg; IRES;',
        'Cis-reg; leader;',
        'Cis-reg; riboswitch;',
        'Gene; ribozyme;',
        'Gene; rRNA;',
        'Gene; miRNA;',
        'Gene; snRNA; snoRNA; CD-box;',
        'Gene; snRNA; snoRNA; HACA-box;',
        'Gene; snRNA; snoRNA; scaRNA;',
        'Gene; tRNA;',
        'Intron;'
    ]
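
The FASTA headers shown in the Data Analysis section (IRES, tRNA, HACA-box, ...) match the last field of these Rfam type strings. A minimal sketch of that mapping, assuming the class label is simply the last non-empty `;`-separated field. `family_to_label` is a hypothetical helper for illustration, not a function from rfam_query.py:

```python
def family_to_label(rfam_type: str) -> str:
    """Return the last non-empty ';'-separated field of an Rfam type string."""
    parts = [p.strip() for p in rfam_type.split(';') if p.strip()]
    if not parts:
        raise ValueError(f"empty Rfam type string: {rfam_type!r}")
    return parts[-1]


# A few of the type strings from the list above.
families = ['Cis-reg; IRES;', 'Gene; snRNA; snoRNA; HACA-box;', 'Intron;']
labels = [family_to_label(f) for f in families]
print(labels)  # ['IRES', 'HACA-box', 'Intron']
```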

Data Analysis

Analysis.ipynb is a Jupyter notebook with some statistical analysis of the dataset that can help finalize the input data for the AI model. The final dataset has more than 50,000 labeled RNA sequences in FASTA format, as shown below:

>IRES
ATACCTTTCTCGGCCTTTTGGCTAAGATCAAGTGTAGTATCTGTTCTTATCAGTTTAATATCTGATACGTGGGCCA ...
>tRNA
GCACCACTCTGGCCTTTTGGCTTAGATCAAGTGTAGTATCTGTTCTTATTAGTTTAACCACTAATATGGTCGCACC ...
>tRNA
ATACCTTTCTCGGCCTTTTGGCTAAGATCAAGTGTAGTATCTGTTTTTATCAGTTTAATATCTGATATGTGGTCCA ...
>riboswitch
ATTACTTCTCAGCCTTTTGGCTAAGATCAAGTGTAATAAATCTCATTGTGCTTTATGCCTAATGTGTGCTTATATT ...
>HACA-box
CCAGCTCTCTTTGCCTTTTGGCTTAGATCAAGTGTAGTATCTGTTCTTTTCAGTTTAATCTCTGAAAGTGTTCTAA ...
>tRNA
ACAGCTGATGCCGCAGCTACACTATGTATTAATCGGATTTTTGAACTTGGAGTACGGTTCTGGAGCTTGCTCCACC ...
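
A file in this layout can be read with a few lines of plain Python, for example to check the class balance before training. The sketch below is an illustration only: `read_fasta` is a hypothetical helper (the repo itself lists fastaparser as a dependency), and the sequences in the example are made up rather than taken from the dataset.

```python
import io
from collections import Counter
from typing import Iterator, TextIO


def read_fasta(handle: TextIO) -> Iterator[tuple[str, str]]:
    """Yield (label, sequence) pairs from a FASTA stream."""
    label, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith('>'):
            # A new header closes the previous record, if any.
            if label is not None:
                yield label, ''.join(chunks)
            label, chunks = line[1:].strip(), []
        else:
            # Sequences may span several lines; collect them all.
            chunks.append(line)
    if label is not None:
        yield label, ''.join(chunks)


# Tiny in-memory example (made-up short sequences, not real dataset entries).
example = io.StringIO(">tRNA\nGCACC\nACTCT\n>IRES\nATACC\n>tRNA\nATACG\n")
records = list(read_fasta(example))
class_counts = Counter(label for label, _ in records)
print(class_counts.most_common())  # [('tRNA', 2), ('IRES', 1)]
```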

Data transformation

ncc_DataTransform.py pads, cuts, and encodes the RNA sequences before they are loaded into the AI model. If you want to change the encoding method, edit this file. A one-hot-style encoding is used, with an 8-dimensional vector per nucleotide:

# Ribosome encoding
# --------------------------------------
A_rep_8d = [1, 0, 0, 0, 0, 0, 1, 0]
U_rep_8d = [0, 1, 0, 0, 0, 0, 0, 1]
G_rep_8d = [0, 0, 1, 0, 1, 0, 0, 0]
C_rep_8d = [0, 0, 0, 1, 0, 1, 0, 0]
X_rep_8d = [0, 0, 0, 0, 0, 0, 0, 0]
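
A minimal sketch of how padding, cutting, and encoding could fit together with these vectors. It is not the code from ncc_DataTransform.py and rests on several assumptions: `encode_sequence` and `max_len` are hypothetical names, sequences are padded on the right, `T` is treated as `U` (the FASTA sample above uses the DNA alphabet), and any other symbol falls back to the all-zero `X` vector.

```python
A_rep_8d = [1, 0, 0, 0, 0, 0, 1, 0]
U_rep_8d = [0, 1, 0, 0, 0, 0, 0, 1]
G_rep_8d = [0, 0, 1, 0, 1, 0, 0, 0]
C_rep_8d = [0, 0, 0, 1, 0, 1, 0, 0]
X_rep_8d = [0, 0, 0, 0, 0, 0, 0, 0]

# Assumption: T is encoded like U; unknown symbols use X_rep_8d.
ENCODING = {'A': A_rep_8d, 'U': U_rep_8d, 'T': U_rep_8d,
            'G': G_rep_8d, 'C': C_rep_8d}


def encode_sequence(seq: str, max_len: int) -> list[list[int]]:
    """Cut or right-pad seq to max_len, then map each symbol to its 8-d vector."""
    seq = seq.upper()[:max_len]      # cut sequences that are too long
    seq = seq.ljust(max_len, 'X')    # pad short sequences with the X symbol
    return [list(ENCODING.get(base, X_rep_8d)) for base in seq]


encoded = encode_sequence('GAUT', max_len=6)
print(len(encoded), len(encoded[0]))  # 6 8
```

The result is a `max_len` x 8 matrix per sequence, which is the shape a sequence model expects once the per-sequence matrices are stacked into a batch.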

AI Models

The Keras model used for this task. It consists of a bidirectional RNN at the input, followed by a DenseNet-style CNN.
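
To make the architecture description concrete, here is a rough Keras sketch of a bidirectional recurrent layer feeding a DenseNet-style 1D convolutional block. It is not the contents of ncc_Model.py. Everything specific is an assumption chosen for illustration: the GRU cell, the layer sizes, `seq_len=200`, the single dense block, and `num_classes=11` (the number of distinct family types in the list above).

```python
import tensorflow as tf
from tensorflow.keras import layers


def dense_block(x, num_layers: int, growth_rate: int):
    """DenseNet-style block: each layer receives the concatenation of all earlier outputs."""
    for _ in range(num_layers):
        y = layers.BatchNormalization()(x)
        y = layers.Activation('relu')(y)
        y = layers.Conv1D(growth_rate, kernel_size=3, padding='same')(y)
        x = layers.Concatenate()([x, y])  # concatenate along the channel axis
    return x


def build_model(seq_len: int = 200, channels: int = 8,
                num_classes: int = 11) -> tf.keras.Model:
    # Input: one encoded sequence of shape (seq_len, 8).
    inputs = layers.Input(shape=(seq_len, channels))
    # Bidirectional RNN that keeps one output vector per position.
    x = layers.Bidirectional(layers.GRU(32, return_sequences=True))(inputs)
    # DenseNet-style convolutional feature extractor.
    x = dense_block(x, num_layers=3, growth_rate=16)
    x = layers.GlobalAveragePooling1D()(x)
    outputs = layers.Dense(num_classes, activation='softmax')(x)
    return tf.keras.Model(inputs, outputs)


model = build_model()
model.compile(optimizer='adam', loss='categorical_crossentropy',
              metrics=['accuracy'])
```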

Training and testing the model

A Jupyter notebook for training and evaluating/testing the selected model, along with some metrics.
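
The metrics reported in the comparison table (accuracy, sensitivity, precision, F-score, MCC) can all be derived from a confusion matrix. The sketch below is one way to compute them, not the repo's evaluation code. It assumes macro averaging over classes for sensitivity, precision, and F-score, and uses the standard multi-class generalization of the Matthews correlation coefficient; `classification_metrics` is a hypothetical name.

```python
import math


def classification_metrics(cm: list[list[int]]) -> dict[str, float]:
    """cm[i][j] = number of samples with true class i predicted as class j."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[k][k] for k in range(n))
    true_counts = [sum(cm[k]) for k in range(n)]                        # row sums
    pred_counts = [sum(cm[i][k] for i in range(n)) for k in range(n)]   # column sums

    recalls = [cm[k][k] / true_counts[k] if true_counts[k] else 0.0 for k in range(n)]
    precisions = [cm[k][k] / pred_counts[k] if pred_counts[k] else 0.0 for k in range(n)]
    f_scores = [2 * p * r / (p + r) if p + r else 0.0
                for p, r in zip(precisions, recalls)]

    # Multi-class MCC: (c*s - sum p_k*t_k) / sqrt((s^2 - sum p_k^2)(s^2 - sum t_k^2))
    cov = correct * total - sum(p * t for p, t in zip(pred_counts, true_counts))
    denom = math.sqrt((total ** 2 - sum(p * p for p in pred_counts))
                      * (total ** 2 - sum(t * t for t in true_counts)))
    return {
        'accuracy': correct / total,
        'sensitivity': sum(recalls) / n,     # macro-averaged recall
        'precision': sum(precisions) / n,    # macro-averaged precision
        'f_score': sum(f_scores) / n,        # macro-averaged F1
        'mcc': cov / denom if denom else 0.0,
    }


# Toy two-class confusion matrix (made-up counts).
metrics = classification_metrics([[5, 1], [2, 4]])
print(metrics['accuracy'])  # 0.75
```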

Requirements

  • python
    • docker - Docker SDK for Python
    • wget
    • fastaparser - A Python FASTA file Parser and Writer

NEED TO UPDATE

Resources

Rfam

The Rfam database is a collection of RNA families, each represented by multiple sequence alignments, consensus secondary structures, and covariance models.

Contributors

vassilas
