
pmtnet's People

Contributors

akazhiel, tianshilu


pmtnet's Issues

Labelled training data used in pMTnet

Thank you for a great tool! I am still pretty new to this field.

I would like to learn more about the training process on pMTnet. I am not sure if I missed the training data in the repository. Could you please provide the training data used in pMTnet with positive and negative labels (e.g. positive/TCR_output.csv, negative/TCR_output.csv, training_positive.csv)? Thank you so much for all your efforts!

Prediction with unknown HLA

Dear Tianshi,

I want to examine the connection between peptides and TCRs, but I don't have the related HLA information. Do you have any suggestions?

Thank you so much for your attention and help.
Yingcheng

Retrain Model pMTnet

Hi,

I found the relevant training code in test/code/ternary_train_model_pMTnet.py, but some parts are still missing. Could you help with the following questions?

  1. How are these files generated? Could you share the relevant code?

     tcr_file_train_pos='positive/TCR_output.csv'
     tcr_file_train_neg='negative/TCR_output.csv'
     hla_antigen_file_train='MHC_antigen_output.csv'

  2. What is the exact shape of the negative data?

     ternary_prediction.fit({'pos_in':tcr_train_pos,'neg_in':tcr_train_neg,'hla_antigen_in':hla_antigen_train}, {'output':Y_train}, epochs=150, batch_size=256, shuffle=True)

     This line seems to imply that the number of negative samples must equal the number of positive samples, not the 10:1 ratio stated in the article.

  3. Is the (pos_in, neg_in) pairing fixed?
     The code seems to indicate that each positive sample is paired with one fixed negative sample at training time. Does the network only learn to distinguish the truly binding TCR from that one fixed TCR?

I am new to the field and to Keras. Thank you for your efforts.

Prediction with unknown antigen

Hi Dr Tianshi,

Thanks for this beautiful work. It is really useful.

I have a list of HLAs (e.g. HLA-A*02:01) and TCR CDR3 sequences (e.g. CAVLDSNYQLIW), but I don't know what the exact antigen is. Is there any possible way to compute an HLA-TCR match score?

Thanks again for your kind help.

Best,
Yingcheng

Full TCR sequences of validation data

Thanks for this new tool and for the data you provide!

I am very interested in using the validation dataset you used to judge the performance of pMTnet, as I think you make a very valid point concerning the quality of data and its effect on model performance. However, I am currently using models that take the full TCR sequence into account. Do you perhaps have the full TCR sequences, or V and J gene usage information, for the validation data?

about the testing data

Hello, when I was reading the paper "Attention-aware contrastive learning for predicting T cell receptor–antigen binding specificity", I found that its dataset came from your paper. However, that paper describes half of the 619 test cases as positive and half as negative.
So, are all 619 test cases in your test set positive, or are they half positive and half negative? Thanks

Strange characters in the testing_data.csv and training_data.csv

Hi, I found some strange characters in the datasets you provided for training and testing. For example, in rows 30 and 31 of testing_data.csv, the antigen sequence seems to contain what renders as a strange Chinese character.


When I loaded this file with pandas, the character turned out to be '\xa0' (a non-breaking space).
So, is this a mistake made in generating the files, or does '\xa0' have some special meaning? Thank you.
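If the '\xa0' characters are generation artifacts rather than meaningful, a small preprocessing pass can strip them before running the tool. A minimal sketch (the example sequence is hypothetical, not from the actual file):

```python
# Sketch: strip '\xa0' (the non-breaking space described in this issue)
# from sequences before feeding them to pMTnet. Python classifies '\xa0'
# as whitespace, so str.isspace() catches it along with spaces and tabs.

def clean_sequence(seq: str) -> str:
    """Drop every whitespace character, including '\\xa0'."""
    return "".join(ch for ch in seq if not ch.isspace())

raw = "NLVPMVATV\xa0"             # hypothetical 9-mer with a stray '\xa0'
print(repr(clean_sequence(raw)))  # 'NLVPMVATV'
```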

Slow encoding

Greetings!

Great tool for predicting TCR-pMHC binding! However, is there any way to speed up the encoding step? Since I understand the aim of this tool is to predict how well your TCR repertoire binds to the predicted pMHCs, the encoding is far slower than I'd expect. Given that you pair each TCR with the whole list of pMHCs to test for binding, this generates files of millions of lines. I'm currently running it on a file with 2M lines; after almost 3 days of running time, the encoding is not even close to done. Or is the input not expected to contain all possible combinations, just some of them? In that case, how would you select them?

Best regards,

Jonatan
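For scale, a sketch (not pMTnet's own code) of why the input file grows so fast, and how the pairing could be generated lazily in chunks instead of as one monolithic file; the repertoire and pMHC lists here are hypothetical:

```python
# Sketch: pairing every TCR with every pMHC explodes combinatorially,
# which explains the multi-million-line input. Generating the pairs
# lazily with itertools keeps memory flat and lets you encode/score one
# chunk at a time rather than one giant file.
from itertools import product, islice

tcrs = [f"TCR{i}" for i in range(10_000)]   # hypothetical repertoire
pmhcs = [f"pMHC{j}" for j in range(200)]    # hypothetical pMHC list

pairs = product(tcrs, pmhcs)    # lazy iterator: nothing materialized yet
total = len(tcrs) * len(pmhcs)
print(total)                    # 2,000,000 rows if fully written out

chunk = list(islice(pairs, 3))  # pull only the first 3 pairs
print(chunk)
```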

How to filter the result by Rank

Hi,
I know that a lower rank is considered a better prediction, but how can I select credible CDR3-antigen pairs from the output? Could you please provide thresholds or any filtering methods?

Thanks
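Once a cutoff is chosen, the filtering itself is straightforward. A minimal sketch, assuming the output CSV has a "Rank" column as in pMTnet's prediction file; the 0.02 threshold and the example rows are purely illustrative, not an official cutoff:

```python
# Sketch: keep only output rows whose rank falls below a chosen cutoff.
# THRESHOLD is illustrative; pick a value that suits your precision needs.
import csv, io

output_csv = io.StringIO(           # stand-in for the real prediction file
    "CDR3,Antigen,HLA,Rank\n"
    "CASSLGQAYEQYF,NLVPMVATV,A*02:01,0.003\n"
    "CASSIRSSYEQYF,GILGFVFTL,A*02:01,0.410\n"
)

THRESHOLD = 0.02  # illustrative, not an official pMTnet cutoff
credible = [row for row in csv.DictReader(output_csv)
            if float(row["Rank"]) < THRESHOLD]
print(credible)   # keeps only the rank-0.003 pair
```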

How to calculate the AUC?

Hi,

I'm wondering how you calculated the AUC value, given that the output of pMTnet is a relative rank.

Thanks
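One common approach, sketched below under the assumption that a lower rank means a better prediction: treat the rank itself as a (reversed) score, so AUC becomes the probability that a randomly chosen positive pair has a lower rank than a randomly chosen negative pair, with ties counted as 0.5. The rank values here are hypothetical:

```python
# Sketch: compute AUC directly from relative ranks. With lower = better,
# AUC = P(rank_pos < rank_neg) over all positive/negative pairs,
# counting ties as half a win.

def auc_from_ranks(pos_ranks, neg_ranks):
    wins = sum((p < n) + 0.5 * (p == n)
               for p in pos_ranks for n in neg_ranks)
    return wins / (len(pos_ranks) * len(neg_ranks))

pos = [0.01, 0.05, 0.20]          # hypothetical ranks of known binders
neg = [0.30, 0.60, 0.90]          # hypothetical ranks of non-binders
print(auc_from_ranks(pos, neg))   # 1.0 -- perfect separation here
```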

Layer lstm_2 will not use cuDNN kernels since it doesn't meet the criteria.

Hi, when I try to use pMTnet, a warning appears that looks like this:
2022-05-16 10:04:14.706952: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-05-16 10:04:16.631750: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 10791 MB memory: -> device: 0, name: Tesla K80, pci bus id: 0000:86:00.0, compute capability: 3.7
2022-05-16 10:04:16.635577: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:1 with 10791 MB memory: -> device: 1, name: Tesla K80, pci bus id: 0000:87:00.0, compute capability: 3.7
2022-05-16 10:04:18.266973: I tensorflow/stream_executor/cuda/cuda_dnn.cc:366] Loaded cuDNN version 8100
WARNING:tensorflow:Layer lstm_2 will not use cuDNN kernels since it doesn't meet the criteria. It will use a generic GPU kernel as fallback when running on GPU.
WARNING:tensorflow:Layer lstm_1 will not use cuDNN kernels since it doesn't meet the criteria. It will use a generic GPU kernel as fallback when running on GPU.
And I found the GPU did not help at all compared to the CPU version.
Can you help me find what's wrong? I want to speed up the prediction. I am using TensorFlow 2.7.0 and Python 3.8.5. Thanks!

HLA Pseudosequence generation

Thanks for making the test code for the tool available.

I have a query regarding how the HLA pseudosequences are generated.

Here there are hard-coded indexes for generating the pseudosequences; however, my understanding was that an alignment is needed before using these indexes, since the HLAs in the FASTA files you've used are of varying length. Without that alignment, the indexes from the original netMHCpan paper describing the method wouldn't necessarily be correct for your HLA sequences.

If you look at the HLA analysis in netMHCpan (https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0000796), the pseudosequences have an expected pattern that I don't think holds with your method, indicating you're not using the same pseudosequences they are (at least at test time).

MHCFlurry used what looks like a similar set of HLA FASTA files to yours, and after their alignment they start at index 31, not 7.

Did you use a different method for training? If not, it is possible that the network is mainly performing an accurate peptide-TCR match. The HLAs are still being encoded, but not in a way that preserves the likely contact points.

Apologies if I've missed part of the implementation which addresses this!
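The concern above can be illustrated with a toy sketch. The positions and sequences below are hypothetical (not the real netMHCpan contact positions or pMTnet's code); the point is only that hard-coded indexes pick out different residues once a leader peptide or length difference shifts the sequence, which is exactly what alignment is meant to prevent:

```python
# Sketch: fixed contact-site indexes are only meaningful on *aligned*
# sequences. An unaligned sequence with extra leading residues shifts
# every index, so the "pseudosequence" lands on the wrong residues.

POSITIONS = [7, 9, 24]          # hypothetical contact positions (0-based)

def pseudoseq(seq, positions=POSITIONS):
    return "".join(seq[i] for i in positions)

aligned   = "GSHSMRYFFTSVSRPGRGEPRFIAVGYVD"   # hypothetical aligned HLA start
unaligned = "MAVM" + aligned                  # same allele, 4 leader residues

print(pseudoseq(aligned))     # residues at the intended contact sites
print(pseudoseq(unaligned))   # different residues: every index shifted by 4
```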

A bug may be in pMTnet

Hi,

In the for loop (line 83), the pseudo_seq_pos array contains 34 values, but only 33 of them are used to construct the pseudosequence (line 95): for i in range(0,33).

Since range(0,33) = 0, 1, 2, ..., 32, the last position is silently skipped.
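A minimal sketch of the off-by-one (with a stand-in array, not the repository's actual pseudo_seq_pos values):

```python
# Sketch: iterating range(0, 33) over a 34-element position array drops
# the last pseudosequence residue. Iterating over the array's own length
# avoids hard-coding either number.

pseudo_seq_pos = list(range(34))   # stand-in: any 34 positions

used_buggy = [pseudo_seq_pos[i] for i in range(0, 33)]              # bug
used_fixed = [pseudo_seq_pos[i] for i in range(len(pseudo_seq_pos))]

print(len(used_buggy))   # 33 -- last position silently dropped
print(len(used_fixed))   # 34 -- all positions used
```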
