
russelllab / kinaseresistance


A method to predict activating, deactivating and resistance mutations in kinases

Home Page: http://activark.russelllab.org

License: GNU General Public License v3.0

Python 9.26% HTML 7.32% Shell 5.41% JavaScript 76.81% CSS 1.12% SCSS 0.05% GLSL 0.02%
activating-mutations bioinformatics drug-resistance flask hidden-markov-model kinases machine-learning mutations random-forest-classifier

kinaseresistance's People

Contributors

gurdeep330, jcgonzs, tschmenger

kinaseresistance's Issues

Rank the genes in the alignment in different ways

https://agrussell.slack.com/archives/C04P305PK61/p1683191322269059
Point 6, can you introduce another parameter in the main function that tells how to sort the genes in the alignment?
e.g.

  • value 1 = rank by similarity (you could write a BLAST protocol to do this?)
  • value 2 = rank by known information quantity (default, as of now)
  • value 3 = rank by p-sites (you can use the kinase_project DB of PostgreSQL on pevolution2)

At the front end, I will create a drop-down menu, from which the user selects the type of sort, and then I call the same main function.
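A minimal sketch of what such a sort parameter could look like. The names `SortMode` and `rank_genes`, the dictionary keys, and the toy numbers are all made up for illustration and are not taken from the repository:

```python
from enum import IntEnum


class SortMode(IntEnum):
    SIMILARITY = 1    # rank by sequence similarity to the query kinase
    INFORMATION = 2   # rank by known information quantity (current default)
    PSITES = 3        # rank by number of p-sites


def rank_genes(genes, sort_mode=SortMode.INFORMATION):
    """Return gene names ordered by the chosen criterion, highest first.

    `genes` maps a gene name to a dict with the hypothetical keys
    'similarity', 'n_annotations' and 'n_psites'.
    """
    key_by_mode = {
        SortMode.SIMILARITY: "similarity",
        SortMode.INFORMATION: "n_annotations",
        SortMode.PSITES: "n_psites",
    }
    # SortMode(...) raises ValueError if the front end sends an unknown value
    key = key_by_mode[SortMode(sort_mode)]
    return sorted(genes, key=lambda name: genes[name][key], reverse=True)


# Toy numbers, for illustration only
genes = {
    "EGFR": {"similarity": 1.00, "n_annotations": 12, "n_psites": 30},
    "ERBB2": {"similarity": 0.82, "n_annotations": 20, "n_psites": 14},
    "FGFR2": {"similarity": 0.45, "n_annotations": 7, "n_psites": 41},
}
print(rank_genes(genes, 3))
```

The drop-down value can then be passed straight through as `sort_mode`, so the front end keeps calling the same main function.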

I think you can use the PostgreSQL DB to fetch the fasta sequences and PTM information. You can connect to the DB in your code:

import psycopg2

def connection():
    '''Function to connect to postgresql database'''
    mydb = psycopg2.connect(
                            database = "kinase_project",
                            user = "gurdeep",
                            password = "hellokitty",
                            host = "localhost",
                            port = "5432")
    return mydb

mydb = connection()
mydb.autocommit = True
mycursor = mydb.cursor()

and from the command-line on pevolution2: psql -U gurdeep kinase_project

Let me know if I can provide or do something else or if you have a better alternative in mind. :-)

Create an FM

Collect all possible features and create an FM (feature matrix) to be used for ML
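One possible shape for the FM, as a rough sketch: each mutation carries a dict of feature values, and the matrix is built with a fixed column order. The function name, the feature names, the toy values, and the choice to fill missing features with 0.0 are assumptions, not the project's actual design:

```python
def build_feature_matrix(samples, feature_names=None, missing=0.0):
    """Turn {sample_id: {feature: value}} into (row_ids, columns, matrix).

    Features absent for a sample are filled with `missing` (an assumption;
    another imputation strategy may suit the ML model better).
    """
    if feature_names is None:
        # Union of all features seen, sorted so the column order is stable
        feature_names = sorted({f for feats in samples.values() for f in feats})
    row_ids = list(samples)
    matrix = [
        [samples[row].get(feature, missing) for feature in feature_names]
        for row in row_ids
    ]
    return row_ids, feature_names, matrix


# Hypothetical features with made-up values
samples = {
    "P00533/V689M": {"hmm_score": 1.5, "is_psite": 1.0},
    "P46734/S218E": {"hmm_score": -0.3},
}
rows, columns, fm = build_feature_matrix(samples)
```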

Rob on slack channel

  1. "
     EGFR
     P00533/V689M
     P00533/E1005R
     P00533/D1006K

     MAP2K3
     P46734/S218E
     P46734/T222E

     Actually, multiple newlines cause a crash I think."

  2. "(gives "You called GET") and putting in various things just crashed (e.g. FGFR2/C278F)"
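A sketch of input parsing that tolerates blank lines and collects anything it cannot read instead of crashing. It assumes one `<accession or gene>/<mutation>` entry per line; the regular expression and the function name `parse_mutations` are illustrative, not the webApp's actual code:

```python
import re

# Assumed entry shape: name, slash, wild-type residue, position, mutant residue
ENTRY = re.compile(r"^([A-Za-z0-9_.-]+)/([A-Z])(\d+)([A-Z])$")


def parse_mutations(text):
    """Split raw text-box input into parsed entries and rejected lines."""
    entries, rejected = [], []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            # Blank lines, including several in a row, are simply skipped
            continue
        match = ENTRY.match(line)
        if match:
            name, wild_type, position, mutant = match.groups()
            entries.append((name, wild_type, int(position), mutant))
        else:
            # e.g. a bare gene name such as "EGFR"; report it back to the user
            rejected.append(line)
    return entries, rejected


entries, rejected = parse_mutations("EGFR\nP00533/V689M\n\n\nMAP2K3\nP46734/S218E\r\n")
```

Returning the rejected lines lets the results page show a message for them rather than failing the whole request.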

Rob suggestions

Other little things for the WebApp (some of these are repetitions, but people under 40 apparently do not ever read anything that isn't in a tweet - according to some podcast I just heard - so...)

  • name of the kinase and the Uniprot ID would be nice in the first info box. We could even do synonyms.
  • #41
  • hyperlinks wherever possible
  • fix the "known mutation" labels to remove boomerie punctuation, ECO1234 and add hyperlinks to the PubMed links.

we want sections about prior info below the name:

  • in this kinase at this position
  • #40
  • in other kinases at this position
  • in other kinases +/- 2 residues or maybe +/- 5 residues
  • perhaps give the user the option to rank kinases in the alignment in different ways? Rank by sequence similarity or rank by information quantity or rank by n-phosphosites, etc. #37 (comment)
  • help pages need to be there. (#39 (comment))
  • Change the URL to activark or whatever
  • (re: 5 I'm not sure if c should be before b but it probably doesn't really matter)
  • labels regarding which region of the kinase canonical structure we are in (P-loop, A-loop whatever)

Write a script to generate input to createSVG

  1. Take an input kinase and find out the top 30 kinases based on seq identity
  2. Do an hmmalign of all the candidates and prepare an output aln
  3. Make dictionary with mutations and region annotations
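Step 1 could be sketched roughly as below, assuming the candidate sequences are already aligned (equal length, with `-` or `.` as gap characters). The function names and the toy alignment are hypothetical, and ignoring gapped columns is only one of several ways to define identity:

```python
def percent_identity(a, b):
    """Percent identity between two aligned sequences of equal length.

    Columns with a gap in either sequence are ignored.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [
        (x, y)
        for x, y in zip(a.upper(), b.upper())
        if x not in "-." and y not in "-."
    ]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def top_n_by_identity(query, alignment, n=30):
    """Names of the n sequences most identical to `query`, best first."""
    scores = {
        name: percent_identity(alignment[query], seq)
        for name, seq in alignment.items()
        if name != query
    }
    return sorted(scores, key=scores.get, reverse=True)[:n]


# Toy alignment, for illustration only
alignment = {"Q": "ACDE-FG", "A": "ACDE-FG", "B": "ACDQ-FA", "C": "A--EWFA"}
print(top_n_by_identity("Q", alignment, n=2))
```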

Features

  1. Normalize the data correctly. For example, homolog scores should be scaled to the range -1 to 1 and not 0 to -1 (maybe)
  2. Add structure features
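For the normalization point, a plain min-max rescaling to [-1, 1] could look like this. It is a sketch only; `scale_to_range` is a made-up name, and whether min-max is the right choice for the homolog scores is an open question:

```python
def scale_to_range(values, lo=-1.0, hi=1.0):
    """Linearly rescale values so the minimum maps to lo and the maximum to hi."""
    vmin, vmax = min(values), max(values)
    if vmin == vmax:
        # A constant feature carries no information; place it at the midpoint
        return [(lo + hi) / 2.0 for _ in values]
    span = vmax - vmin
    return [lo + (hi - lo) * (v - vmin) / span for v in values]


# Scores that currently live between 0 and -1
print(scale_to_range([0.0, -0.5, -1.0]))
```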

webApp

  • make a function that creates the output folder and call it every time
  • make a dedicated function that stores metadata in a dict, which is returned
  • display basic information at top of results page
  • connect summary to results page
  • in AJAXChart, check if output already exists before running the predictor again
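A rough sketch of the output-folder helper and the "check if output already exists" idea. The file name `predictions.json`, the function names, and the JSON storage format are assumptions made for the example:

```python
import json
from pathlib import Path


def make_output_dir(base_dir, job_id):
    """Create (if needed) and return the output folder for one job."""
    out_dir = Path(base_dir) / job_id
    out_dir.mkdir(parents=True, exist_ok=True)
    return out_dir


def run_cached(out_dir, predictor, *args):
    """Run `predictor` only if no stored result exists in `out_dir`."""
    result_file = Path(out_dir) / "predictions.json"
    if result_file.exists():
        # Output already there: reuse it instead of running the predictor again
        return json.loads(result_file.read_text())
    result = predictor(*args)
    result_file.write_text(json.dumps(result))
    return result
```

Here `predictor` stands for whatever callable produces a JSON-serializable result; the second request for the same job then reads the stored file.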

Tutorial

Write a short tutorial like in precog(x)

Gold standard alignment

  1. Take all the human kinases
  2. Align them first using hmmalign to the Pkinase profile
  3. Then look at the alignment with ADR mutations with the help of Torsten's tool
  4. Fix it manually, and then build an HMM out of it.

Improve alignment for CreateSVG

Create an alignment based on what is around the given position of interest or by selecting kinases that have the most information.

Error

Error handling for both webApp and command-line, e.g. if the gene or ACC does not exist, or if the position is outside the domain
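One way to share the checks between the webApp and the command line is a single validation function that raises a dedicated exception. This is a sketch: `InputError`, `validate_query`, and the lookup table of domain boundaries (with a placeholder accession and made-up coordinates) are all hypothetical:

```python
class InputError(ValueError):
    """Raised for bad user input; the message is meant to be shown to the user."""


def validate_query(acc, position, kinases):
    """Check that `acc` is known and `position` falls inside its kinase domain.

    `kinases` maps an accession or gene name to (domain_start, domain_end),
    1-based and inclusive.
    """
    if acc not in kinases:
        raise InputError(f"Unknown gene or accession: {acc}")
    start, end = kinases[acc]
    if not start <= position <= end:
        raise InputError(
            f"Position {position} of {acc} is outside the kinase domain ({start}-{end})"
        )
    return acc, position


# Placeholder entry with made-up domain boundaries
kinases = {"KIN1": (100, 350)}
```

The webApp can catch `InputError` and render the message on the page, while the command-line tool can print it and exit with a non-zero status.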

Tests

Write tests for both webApp and command-line

Static Heatmap

@gurdeep330 told me that Rob
"wants the ADR heatmap to NOT change i.e. it should be done for the entire alignment (>400 kinases) even if the user is looking at top 10 or 20, say"

The most efficient solution I could think of immediately would be to compute the counts for the 'heatmap' separately and read them from a dictionary (keyed by alignment positions rather than protein positions, I think), rather than going through each letter of the whole alignment to sum up the values each time we recalculate/redraw the alignment.
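A sketch of that precomputation: walk the full alignment once, map each annotated protein position to its alignment column, and store the counts per column. The function name, the input shapes, and the assumption that each aligned sequence starts at residue 1 are illustrative only:

```python
from collections import Counter


def column_counts(alignment, mutations):
    """Count mutation labels per alignment column over the whole alignment.

    alignment: {name: aligned sequence}, gaps written as '-' or '.'
    mutations: {name: {protein_position: label}}, positions 1-based
    Returns {alignment_column: Counter of labels}, columns 1-based.
    """
    counts = {}
    for name, seq in alignment.items():
        annotated = mutations.get(name, {})
        protein_pos = 0
        for column, residue in enumerate(seq, start=1):
            if residue in "-.":
                continue  # gaps do not advance the protein position
            protein_pos += 1
            label = annotated.get(protein_pos)
            if label is not None:
                counts.setdefault(column, Counter())[label] += 1
    return counts


# Toy data: labels A/D/R stand for activating/deactivating/resistance
alignment = {"K1": "AC-DE", "K2": "-CWDE"}
mutations = {"K1": {3: "A"}, "K2": {1: "A", 3: "R"}}
counts = column_counts(alignment, mutations)
```

Since the dictionary is keyed by alignment column, showing only the top 10 or 20 kinases does not change the heatmap values; the drawing code just looks up each column.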

Violin plots

Draw a GRID violin plot to describe the role of each feature i.e. show multiple features in the same plot.

webApp suggestions Rob

  • Show aln positions and not hmm positions
  • Check that predictions for outside the kinase domain are not made
  • Check if changing to "Activating" helps in the search
  • Add +1/-1 in the aln/hmmpos columns of the summary table
