tianshilu / pMTnet
Deep Learning the T Cell Receptor Binding Specificity of Neoantigen
License: GNU General Public License v2.0
Thank you for a great tool! I am still pretty new to this field.
I would like to learn more about the training process of pMTnet. I am not sure if I missed the training data in the repository. Could you please provide the training data used for pMTnet, with positive and negative labels (e.g. positive/TCR_output.csv, negative/TCR_output.csv, training_positive.csv)? Thank you so much for all your efforts!
Dear Tianshi,
I want to examine the connection between peptides and TCRs, but without the related HLA information. Do you have any suggestions?
Thank you so much for your attention and help.
Yingcheng
Hi,
I found that the relevant training code is provided in test/code/ternary_train_model_pMTnet.py. However, some parts are still missing. Could you help with the following questions?
tcr_file_train_pos='positive/TCR_output.csv'
tcr_file_train_neg='negative/TCR_output.csv'
hla_antigen_file_train='MHC_antigen_output.csv'
ternary_prediction.fit({'pos_in':tcr_train_pos,'neg_in':tcr_train_neg,'hla_antigen_in':hla_antigen_train}, {'output':Y_train},epochs=150,batch_size=256,shuffle=True)
This line of code seems to imply that the number of negative samples must equal the number of positive samples, not the 10:1 ratio stated in the article. Are the shapes of (pos_in, neg_in) fixed to be equal? I am new to this field and to Keras. Thank you for your efforts.
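If the paired pos_in/neg_in inputs do have to be equal-length, one way that could be reconciled with a 10:1 negative ratio is to tile each positive sample against several negatives before calling fit. A minimal sketch (the counts and the 80-dimensional toy encodings are illustrative, not pMTnet's actual data format):

```python
import numpy as np

# Hypothetical encoded TCRs: 100 positives, 1000 negatives (10:1 ratio).
tcr_train_pos = np.random.rand(100, 80)
tcr_train_neg = np.random.rand(1000, 80)

# A paired 'pos_in'/'neg_in' model needs arrays of the same length, so
# repeat each positive 10 times to pair it with 10 distinct negatives.
tcr_train_pos_paired = np.repeat(tcr_train_pos, 10, axis=0)

assert tcr_train_pos_paired.shape == tcr_train_neg.shape
```

Whether the authors actually used this pairing scheme is exactly the open question above; the sketch only shows that equal-length inputs and a 10:1 ratio are not mutually exclusive.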
Hi Dr Tianshi,
Thanks for this beautiful work. It is really useful.
I have a list of HLAs (e.g. HLA-A*02:01) and TCR CDR3 sequences (e.g. CAVLDSNYQLIW), but I don't know what the exact antigens are. Is there any possible way to compute an HLA-TCR match score?
Thanks again for your kind help.
Best,
Yingcheng
Thanks for this new tool and for the data you provide!
I am very interested to use the validation dataset you use to judge the performance of pMTnet, as I think you make a very valid point concerning the quality of data and its effect on model performance. However, I am currently using models that take the full TCR sequence into account. Do you maybe have the full TCR sequences or V and J gene usage information for the validation data?
Hello, when I was reading the paper "Attention-aware contrastive learning for predicting T cell receptor–antigen binding specificity", I found that the dataset involved in that paper came from yours. However, half of the 619 test cases described there were positive and half were negative.
So, may I ask: are all 619 test cases in your test set positive, or are they half positive and half negative? Thanks.
Hi, I found some strange characters in the datasets you provided for training and testing. For example, in rows 30 and 31 of testing_data.csv, the antigen sequence seems to contain a strange non-ASCII character.
When I loaded this file with pandas, the character turned out to be '\xa0'.
So, is this a mistake made when generating the files, or could '\xa0' have some special meaning? Thank you.
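For what it's worth, '\xa0' is a non-breaking space (U+00A0), which often sneaks into CSVs copied from web pages or spreadsheets. A minimal pandas sketch for stripping it before use (the column name and sequences are toy examples, not the actual file contents):

```python
import pandas as pd

# Toy frame with a non-breaking space contaminating one antigen sequence.
df = pd.DataFrame({"Antigen": ["NLVPMVATV\xa0", "GILGFVFTL"]})

# Remove non-breaking spaces and surrounding whitespace.
df["Antigen"] = df["Antigen"].str.replace("\xa0", "", regex=False).str.strip()

print(df["Antigen"].tolist())  # ['NLVPMVATV', 'GILGFVFTL']
```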
Greetings!
Great tool for predicting TCR-pMHC binding! However, is there any way to speed up the encoding step? Since I understand the aim of this tool is to predict how well a TCR repertoire binds to predicted pMHCs, the encoding is far slower than I'd expect. Given that you'd pair each TCR with the whole list of pMHCs to test for binding, this generates files of millions of lines. I'm currently running it on a file with 2M lines; after almost 3 days of running time, the encoding is not even close to done. Maybe the tool is not meant to take all possible combinations as input, but just some of them? In that case, how would you select them?
Best regards,
Jonatan
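One stopgap, independent of any change to the tool itself, is to split the pairing file into chunks and run the encoding on each chunk as a separate job in parallel. A sketch with made-up column names and a tiny chunk size (pMTnet's real input format may differ):

```python
import pandas as pd

# Toy pairing table standing in for a multi-million-line input file.
pairs = pd.DataFrame({"CDR3": ["CASSF"] * 10,
                      "Antigen": ["GILGFVFTL"] * 10,
                      "HLA": ["A*02:01"] * 10})
pairs.to_csv("all_pairs.csv", index=False)

# Split into chunks; each chunk_*.csv can then be passed to a separate
# pMTnet.py run, so the encoding parallelizes across processes/nodes.
chunk_size = 4
for i, chunk in enumerate(pd.read_csv("all_pairs.csv", chunksize=chunk_size)):
    chunk.to_csv(f"chunk_{i}.csv", index=False)
```

This only spreads the cost across workers; it does not reduce the total work, so filtering the candidate pairs beforehand (as the question suggests) would still be the bigger win.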
Hi, I have a problem when I run the code: python pMTnet.py -input test/input/test_input.csv -library library -output test/output -output_log test/output/output.log
tensorflow.python.framework.errors_impl.InvalidArgumentError: input and filter must have the same depth: 76 vs 30
Can you help me with this problem? Thanks.
Hi,
I know that "a lower rank is considered a good prediction", but how can I select the credible CDR3-antigen pairs from the output? Could you please provide thresholds or any filtering method?
Thanks
Hi,
I'm wondering how you calculated the AUC values, since the output of pMTnet is a relative rank.
Thanks
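One possible answer, since AUC is itself a rank-based metric: a relative rank can be used directly as a score, negated so that lower ranks (better predictions) score higher. A sketch with toy labels and ranks (not the paper's actual evaluation code):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Toy data: 1 = binding pair, 0 = non-binding; lower rank = better.
labels = np.array([1, 1, 1, 0, 0, 0])
ranks = np.array([0.02, 0.10, 0.30, 0.55, 0.70, 0.90])

# Negate so that lower ranks become higher scores for roc_auc_score.
auc = roc_auc_score(labels, -ranks)
print(auc)  # 1.0 here, since every positive outranks every negative
```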
Hi, when I try to use pMTnet, a warning appears that looks like this:
2022-05-16 10:04:14.706952: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-05-16 10:04:16.631750: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 10791 MB memory: -> device: 0, name: Tesla K80, pci bus id: 0000:86:00.0, compute capability: 3.7
2022-05-16 10:04:16.635577: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:1 with 10791 MB memory: -> device: 1, name: Tesla K80, pci bus id: 0000:87:00.0, compute capability: 3.7
2022-05-16 10:04:18.266973: I tensorflow/stream_executor/cuda/cuda_dnn.cc:366] Loaded cuDNN version 8100
WARNING:tensorflow:Layer lstm_2 will not use cuDNN kernels since it doesn't meet the criteria. It will use a generic GPU kernel as fallback when running on GPU.
WARNING:tensorflow:Layer lstm_1 will not use cuDNN kernels since it doesn't meet the criteria. It will use a generic GPU kernel as fallback when running on GPU.
And I found that the GPU did not help at all compared to the CPU version.
Can you help me find what's wrong? I want to speed up the prediction. I am using TensorFlow 2.7.0 and Python 3.8.5. Thanks!
Thanks for making the test code for the tool available.
I have a query regarding how the HLA pseudosequences are generated.
Here, there are hard-coded indexes for generating the pseudosequences; however, my understanding is that an alignment is needed before using these indexes, since the HLAs in the FASTAs you've used are of varying length. Without that alignment, the indexes from the original netMHCpan paper describing the method wouldn't necessarily be correct for your HLA sequences.
If you look at the HLA analysis in netMHCpan (https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0000796) the pseudosequences have an expected pattern which I don't think holds using your method, indicating you're not using the same pseudosequences they are (at least at test time).
MHCFlurry used what looks like a similar set of HLA FASTAs to yours, and after their alignment, their indexes start at 31, not 7.
Did you use a different method for training? If not it could be possible that the network is mainly performing an accurate match between peptide-TCR. The HLAs are still being encoded, but not in a way which preserves the likely contact points.
Apologies if I've missed part of the implementation which addresses this!
Is it possible to provide the code used to generate 'TCR_encoder_30.h5' and the other models loaded in the repo?
Hi,
In the for loop (line 83), there are 34 values in the pseudo_seq_pos array, but only 33 of them are used to construct the pseudosequence (line 95): for i in range(0,33).
Note that range(0,33) = 0, 1, 2, ..., 32, so the last position is dropped.
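The off-by-one is easy to confirm in isolation; the placeholder indexes below stand in for the repo's hard-coded pseudo_seq_pos values:

```python
# Placeholder: any 34 hard-coded positions, standing in for pseudo_seq_pos.
pseudo_seq_pos = list(range(7, 41))
assert len(pseudo_seq_pos) == 34

# range(0, 33) yields only 33 indexes (0..32), silently dropping the
# 34th pseudosequence position; range(0, 34) covers all of them.
assert len(list(range(0, 33))) == 33
assert len(list(range(0, 34))) == 34
```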