The kinaseresistance repository from russelllab

kinaseresistance's Issues

SVG not clickable

https://stackoverflow.com/questions/37121767/svg-with-clickable-links-in-shiny-not-clickable

Re: SVG, I was looking this up for my Shiny application today, and people seem to recommend incorporating SVGs via "".
It did not work for R Shiny (I found another way to do it there), but otherwise it seemed to be a promising approach.

Rank the genes in the alignment in different ways

https://agrussell.slack.com/archives/C04P305PK61/p1683191322269059
Point 6: can you introduce another parameter in the main function that sets how the genes in the alignment are sorted?
e.g.

value 1 = rank by similarity (you could write a BLAST protocol to do this?)
value 2 = rank by known information quantity (default, as of now)
value 3 = rank by p-sites (you can use the kinase_project DB of PostgreSQL on pevolution2)

At the front end, I will create a drop-down menu, from which the user selects the type of sort, and then I call the same main function.
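A minimal sketch of how such a sort parameter could work, assuming each kinase already carries precomputed values for similarity, amount of known information and p-site count. The function name `sort_kinases`, the field names and the toy values are hypothetical and not the project's actual API.

```python
from typing import Dict, List

# Hypothetical mapping from the drop-down value to the per-kinase field used for ranking.
SORT_FIELDS = {
    1: "similarity",  # e.g. a BLAST-derived similarity to the query kinase
    2: "info_count",  # amount of known information (the current default)
    3: "n_psites",    # number of p-sites, e.g. fetched from the kinase_project DB
}


def sort_kinases(kinases: List[Dict], sort_by: int = 2) -> List[str]:
    """Return kinase names ordered from highest to lowest value of the chosen field."""
    if sort_by not in SORT_FIELDS:
        raise ValueError(f"sort_by must be one of {sorted(SORT_FIELDS)}, got {sort_by!r}")
    field = SORT_FIELDS[sort_by]
    ranked = sorted(kinases, key=lambda kinase: kinase[field], reverse=True)
    return [kinase["name"] for kinase in ranked]


# Toy values for illustration only.
kinases = [
    {"name": "KIN_A", "similarity": 0.62, "info_count": 40, "n_psites": 12},
    {"name": "KIN_B", "similarity": 0.35, "info_count": 55, "n_psites": 7},
    {"name": "KIN_C", "similarity": 0.48, "info_count": 10, "n_psites": 20},
]
print(sort_kinases(kinases, sort_by=3))  # ['KIN_C', 'KIN_A', 'KIN_B']
```

The drop-down would then only need to pass its selected value as `sort_by` to the same main function.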

I think you can use the PostgreSQL DB to fetch the FASTA sequences and PTM information. You can connect to the DB in your code:

import psycopg2

def connection():
    '''Connect to the kinase_project PostgreSQL database and return the connection.'''
    mydb = psycopg2.connect(
        database="kinase_project",
        user="gurdeep",
        password="hellokitty",
        host="localhost",
        port="5432")
    return mydb

mydb = connection()
mydb.autocommit = True  # commit each statement immediately (no explicit transactions)
mycursor = mydb.cursor()  # cursor used to execute SQL queries

And from the command line on pevolution2: psql -U gurdeep kinase_project

Let me know if I can provide or do something else or if you have a better alternative in mind. :-)

Add a separate column for Pred D

Create an FM

Collect all possible features and create an FM (feature matrix) to be used for ML

Display "loading" while the alignment script runs in the backend

Make database

Make a PCA with all features

Draw a bubble plot/heatmap to show ME in mutations

Fetch sequences from UniProt
Map Pfam positions to the sequences
Draw a bubble plot/heatmap

Rob on the Slack channel

"
EGFR
P00533/V689M
P00533/E1005R
P00533/D1006K

MAP2K3
P46734/S218E
P46734/T222E

Actually, multiple newlines cause a crash I think."

"(gives "You called GET") and putting in various things just crashed (e.g. FGFR2/C278F)"

in this kinase +/- 2 residues and then maybe +/- 5 residues

COSMIC dataset for kinases

@JCGonzS

Rob's suggestions

Other little things for the WebApp (some of these are repetitions, but people under 40 apparently do not ever read anything that isn't in a tweet - according to some podcast I just heard - so...)

The name of the kinase and the UniProt ID would be nice in the first info box. We could even add synonyms.
#41
hyperlinks wherever possible
fix the "known mutation" labels to remove boomerie punctuation and ECO1234, and add hyperlinks to the PubMed links.

we want sections about prior info below the name:

in this kinase at this position
#40
in other kinases at this position
in other kinases +/- 2 residues or maybe +/- 5 residues
perhaps give the user the option to rank kinases in the alignment in different ways? Rank by sequence similarity or rank by information quantity or rank by n-phosphosites, etc. #37 (comment)
help pages need to be there. (#39 (comment))
Change the URL to activark or whatever
(re: 5 I'm not sure if c should be before b but it probably doesn't really matter)
labels indicating which region of the canonical kinase structure we are in (P-loop, A-loop, whatever)

add bar plots for features

Develop a web application

Avoid crashes

Silent variants
Without slash
Incorrect format
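A minimal sketch of input parsing that reports these cases instead of crashing, assuming the input format used elsewhere in these issues (`P00533/V689M`, one mutation per line). It also skips blank lines, since multiple newlines were reported to cause a crash. The function name and messages are hypothetical; the sketch checks format only, not whether the gene exists.

```python
import re
from typing import List, Tuple

# Assumed input format: <gene or UniProt accession>/<wild-type AA><position><mutant AA>
MUTATION_RE = re.compile(
    r"^([A-Za-z0-9_-]+)/([ACDEFGHIKLMNPQRSTVWY])(\d+)([ACDEFGHIKLMNPQRSTVWY])$",
    re.IGNORECASE,
)


def parse_mutations(text: str) -> Tuple[List[Tuple[str, str, int, str]], List[str]]:
    """Split user input into parsed mutations and error messages instead of crashing."""
    parsed, errors = [], []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines, so repeated newlines are harmless
        if "/" not in line:
            errors.append(f"{line}: missing '/' between protein and mutation")
            continue
        match = MUTATION_RE.match(line)
        if match is None:
            errors.append(f"{line}: incorrect format, expected e.g. P00533/V689M")
            continue
        protein, wild_type, position, mutant = match.groups()
        if wild_type.upper() == mutant.upper():
            errors.append(f"{line}: silent variant (wild type equals mutant)")
            continue
        parsed.append((protein, wild_type.upper(), int(position), mutant.upper()))
    return parsed, errors


ok, problems = parse_mutations("P00533/V689M\n\n\nP46734/S218E\nBRAFV600E\nP00533/V689V\n")
print(ok)
print(problems)
```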

update dataset

e.g. BRAF/V600E is missing

Add dropdown button for gene sorting in alignment

Active links for p-sites and ADR in the info box

Add active links for p-sites and known variants. Remove "ECO" etc. from the info and just have the text and the link.

Alignment colors

@tschmenger

Refer to point 2
https://agrussell.slack.com/archives/C04P305PK61/p1682938480543719

Let's go with colour-blind-friendly shades.

The top one (red/orange) is deactivating
Third from the top (bluish) is resistance
the bottom-most one (greenish) is activating

I can fix this in my scripts; can you do this in yours?
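To keep both sets of scripts in sync, the colours could live in one shared mapping. A minimal sketch, assuming the Okabe-Ito palette (a widely used colour-blind-safe palette) stands in for the shades discussed in the Slack thread; the exact hex values chosen there are not recorded here, and `colour_for` is a hypothetical helper.

```python
# Okabe-Ito colour-blind-safe hex codes; which shade goes with which class is an assumption.
MUTATION_COLOURS = {
    "deactivating": "#D55E00",  # vermillion (red/orange)
    "resistance": "#0072B2",    # blue
    "activating": "#009E73",    # bluish green
}
FALLBACK_COLOUR = "#999999"  # neutral grey for unlabelled or unknown classes


def colour_for(label: str) -> str:
    """Return the hex colour for a mutation class (case-insensitive), or grey if unknown."""
    return MUTATION_COLOURS.get(label.strip().lower(), FALLBACK_COLOUR)
```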

Vary the sequence identity cutoffs and regenerate the scores

Currently, the sequences are sorted based on identity
Take the top 25%, 50% and 75% and regenerate the scores
The idea is that perhaps by limiting the homology distance, we can get better resolution
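A minimal sketch of the subsetting step, assuming the sequence IDs are already sorted by identity (highest first). Rounding the cutoff up and keeping at least one sequence is an assumption; `top_fraction` is a hypothetical name, and the score regeneration itself is not shown.

```python
import math
from typing import List, Sequence


def top_fraction(sorted_ids: Sequence[str], fraction: float) -> List[str]:
    """Keep the first `fraction` of IDs from a list already sorted by identity (highest first)."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    n_keep = max(1, math.ceil(len(sorted_ids) * fraction))  # round up, keep at least one
    return list(sorted_ids[:n_keep])


ranked = [f"seq{i}" for i in range(1, 11)]  # placeholder IDs, already sorted by identity
for fraction in (0.25, 0.50, 0.75):
    subset = top_fraction(ranked, fraction)
    print(fraction, len(subset))  # the scores would be regenerated on `subset` here
```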

Update alignment in the webApp

#41 is also fixed. Follow the comments below to update the alignment

pytests for create_svf

Write about other kinases at the equivalent position in the info box

Write a script to generate input to createSVG

Take an input kinase and find the top 30 kinases by sequence identity
Do an hmmalign of all the candidates and prepare an output aln
Make dictionary with mutations and region annotations
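A minimal sketch of the ranking step, assuming the candidate sequences are already aligned (for example, an existing hmmalign output covering all kinases); the selected candidates could then be re-aligned as described above. Identity is computed over columns where neither sequence has a gap, which is only one of several common definitions; the function names are hypothetical.

```python
from typing import Dict, List, Tuple

GAP_CHARS = "-."  # gap symbols used in hmmalign/Stockholm output


def percent_identity(a: str, b: str) -> float:
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper())
             if x not in GAP_CHARS and y not in GAP_CHARS]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def top_n_by_identity(query: str, aligned: Dict[str, str], n: int = 30) -> List[Tuple[str, float]]:
    """Rank all other aligned sequences by identity to the query and keep the best n."""
    scores = [(name, percent_identity(aligned[query], seq))
              for name, seq in aligned.items() if name != query]
    scores.sort(key=lambda item: item[1], reverse=True)
    return scores[:n]
```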

Prepare a Jupyter notebook to run VAE

Features

Normalize the data correctly. For example, homolog scores should be scaled to the range -1 to 1 rather than 0 to -1 (maybe)
Add structure features
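One way to map a feature onto -1 to 1 is linear min-max scaling, sketched below with a hypothetical helper name. This is an assumption about the intended normalization: min-max scaling moves zero, so if the sign of a homolog score carries meaning, dividing by the maximum absolute value may be the better choice.

```python
from typing import List, Sequence


def scale_to_range(values: Sequence[float], low: float = -1.0, high: float = 1.0) -> List[float]:
    """Min-max scale values linearly onto [low, high]; a constant column maps to the midpoint."""
    vmin, vmax = min(values), max(values)
    if vmax == vmin:
        return [(low + high) / 2.0] * len(values)
    span = vmax - vmin
    return [low + (v - vmin) * (high - low) / span for v in values]


print(scale_to_range([-1.0, -0.5, 0.0]))  # [-1.0, 0.0, 1.0]
```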

Create Database

webApp

make a function that creates the output folder and call it every time
make a dedicated function that stores metadata in a dict, which is returned
display basic information at top of results page
connect summary to results page
in AJAXChart, check if output already exists before running the predictor again
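A minimal sketch of the first two items and the "check before running" idea, assuming the predictor's output is JSON-serialisable and stored as one file per job. The names `make_output_dir`, `build_metadata`, `get_or_run` and `results.json` are hypothetical; how AJAXChart is actually structured is not shown here.

```python
import json
import os
from typing import Callable, Dict


def make_output_dir(base_dir: str, job_id: str) -> str:
    """Create (if needed) and return the output folder for one job."""
    path = os.path.join(base_dir, job_id)
    os.makedirs(path, exist_ok=True)
    return path


def build_metadata(gene: str, acc: str, mutation: str) -> Dict[str, str]:
    """Collect the basic information shown at the top of the results page."""
    return {"gene": gene, "acc": acc, "mutation": mutation}


def get_or_run(output_dir: str, run_predictor: Callable[[], Dict]) -> Dict:
    """Reuse an existing result file instead of running the predictor again."""
    result_file = os.path.join(output_dir, "results.json")
    if os.path.exists(result_file):
        with open(result_file) as fh:
            return json.load(fh)
    result = run_predictor()
    with open(result_file, "w") as fh:
        json.dump(result, fh)
    return result
```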

Tutorial

Write a short tutorial like in precog(x)

Gold standard alignment

Take all the human kinases
Align them first using hmmalign to the Pkinase profile
Then look at the alignment with ADR mutations with the help of Torsten's tool
Fix it manually, and then build an HMM out of it.
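The two automated steps map onto standard HMMER3 commands: `hmmalign [options] <hmmfile> <seqfile>` (with `-o` to write the alignment to a file, Stockholm by default) and `hmmbuild <hmmfile_out> <msafile>`. A minimal sketch that builds and runs them, assuming HMMER3 is on the PATH; all file names are placeholders, and the manual curation step is not automated.

```python
import subprocess
from typing import List


def hmmalign_cmd(hmm_file: str, seq_file: str, out_file: str) -> List[str]:
    """Command to align sequences to a profile HMM, writing Stockholm output to out_file."""
    return ["hmmalign", "-o", out_file, hmm_file, seq_file]


def hmmbuild_cmd(new_hmm: str, msa_file: str) -> List[str]:
    """Command to build a new profile HMM from a (manually curated) alignment."""
    return ["hmmbuild", new_hmm, msa_file]


def run(cmd: List[str]) -> None:
    """Run an HMMER command, raising CalledProcessError if it fails."""
    subprocess.run(cmd, check=True)


# Step 1: align all human kinase sequences to the Pkinase profile.
# run(hmmalign_cmd("Pkinase.hmm", "human_kinases.fasta", "human_kinases.sto"))
# Steps 2-3 (manual): inspect the alignment with ADR mutations using Torsten's tool and fix it.
# Step 4: build the gold-standard HMM from the curated alignment.
# run(hmmbuild_cmd("Pkinase_gold.hmm", "human_kinases_curated.sto"))
```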

Help pages

Improve alignment for CreateSVG

Create an alignment based on what is around the given position of interest or by selecting kinases that have the most information.

Error

Error handling for both webApp and command-line, e.g. if the gene or ACC does not exist, or if the position is outside the domain
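A minimal sketch of a shared check that both the webApp and the command-line version could call, assuming lookup tables from gene name to accession and from accession to kinase-domain boundaries are available. The exception class, function name and table layout are hypothetical; each front end would catch `InputError` and show its message instead of crashing.

```python
from typing import Dict, Tuple


class InputError(ValueError):
    """Raised for user input that should be reported rather than crash the app."""


def check_position(protein: str, position: int,
                   gene_to_acc: Dict[str, str],
                   domain_bounds: Dict[str, Tuple[int, int]]) -> str:
    """Resolve a gene name or accession and check the position lies in the kinase domain."""
    key = protein.strip().upper()
    acc = gene_to_acc.get(key, key)  # fall back to treating the input as an accession
    if acc not in domain_bounds:
        raise InputError(f"Unknown gene or accession: {protein}")
    start, end = domain_bounds[acc]
    if not start <= position <= end:
        raise InputError(
            f"Position {position} is outside the kinase domain ({start}-{end}) of {acc}")
    return acc
```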

add ADR window size in the FM

Tests

Write tests for both webApp and command-line

Convert the "more information" into datatable

Static Heatmap

@gurdeep330 told me that Rob
"wants the ADR heatmap to NOT change i.e. it should be done for the entire alignment (>400 kinases) even if the user is looking at top 10 or 20, say"

The most efficient solution I could think of immediately would be to do the counts for the 'heatmap' separately and read it from a dictionary (using alignment positions rather than protein positions I think) rather than going through each letter of the whole alignment to sum up the values, each time we recalculate/redraw the alignment.
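A minimal sketch of that idea: count ADR mutations per alignment column once, over the whole alignment, and keep the result in a dictionary keyed by column. It assumes the aligned sequences and a list of (kinase, residue position, category) tuples are available, plus optional start residues for domain-only alignments; all names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple


def precompute_column_counts(
    aligned: Dict[str, str],
    adr_sites: Iterable[Tuple[str, int, str]],
    starts: Optional[Dict[str, int]] = None,
) -> Dict[int, Dict[str, int]]:
    """Count ADR mutations per alignment column once, over the whole alignment.

    aligned:   kinase name -> aligned sequence (gaps written as '-' or '.')
    adr_sites: (kinase name, residue position, category) tuples
    starts:    residue number of the first aligned residue per kinase (default 1)
    """
    starts = starts or {}
    # Map each kinase's residue numbers to 0-based alignment columns.
    residue_to_col: Dict[str, Dict[int, int]] = {}
    for name, seq in aligned.items():
        mapping: Dict[int, int] = {}
        residue = starts.get(name, 1) - 1
        for col, char in enumerate(seq):
            if char not in "-.":
                residue += 1
                mapping[residue] = col
        residue_to_col[name] = mapping

    counts: Dict[int, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for name, position, category in adr_sites:
        col = residue_to_col.get(name, {}).get(position)
        if col is not None:  # skip kinases or residues not in the alignment
            counts[col][category] += 1
    return {col: dict(per_category) for col, per_category in counts.items()}
```

The dictionary can be computed once and saved (note that `json` turns the integer column keys into strings), then read back whenever the top 10 or 20 kinases are redrawn, so the heatmap stays the same.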

gene names in the alignment

Violin plots

Draw a grid of violin plots to describe the role of each feature, i.e. show multiple features in the same plot.

Align sequences using Pfam Kinase domain

Write info about the position on the output page and the results page

1. If the given position is a PTM site
2. If the given position is an ADR site
3. 1 and 2, if known at other positions
4. 1 and 2 in neighbouring residues

Show aln positions and not HMM positions
Check that predictions for outside the kinase domain are not made
Check if changing to "Activating" helps in the search
Add +1/-1 in the aln/hmmpos columns of the summary table
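A minimal sketch of the position summary for a single kinase, covering PTM/ADR sites at the position and within a neighbourhood window (+/- 2, or +/- 5 as suggested elsewhere). It assumes the known sites are available as sets of residue numbers; the function name and message wording are hypothetical. Information from other kinases at the equivalent position would first need a mapping through alignment columns.

```python
from typing import List, Set


def describe_position(position: int, ptm_sites: Set[int], adr_sites: Set[int],
                      window: int = 2) -> List[str]:
    """Summarise known PTM/ADR information at a position and within +/- window residues."""
    notes: List[str] = []
    if position in ptm_sites:
        notes.append(f"Position {position} is a known PTM site")
    if position in adr_sites:
        notes.append(f"Position {position} is a known ADR site")
    for label, sites in (("PTM", ptm_sites), ("ADR", adr_sites)):
        nearby = sorted(s for s in sites if s != position and abs(s - position) <= window)
        if nearby:
            notes.append(f"{label} sites within +/-{window}: {nearby}")
    return notes
```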

webApp suggestions

Download buttons
Color scores
"known ADR"
