calm's Introduction

CaLM

The Codon adaptation Language Model

This repository encapsulates all code required to reproduce the results of the paper "Codon language embeddings provide strong signals for use in protein engineering", by Carlos Outeiral and Charlotte M. Deane.

Citation

If you use our work, please cite:

Outeiral, Carlos, and Charlotte M. Deane. "Codon language embeddings provide strong signals for use in protein engineering." Nature Machine Intelligence 6.2 (2024): 170-179.

Installation

git clone https://github.com/oxpig/CaLM
cd CaLM
python setup.py install

Usage

from calm import CaLM

model = CaLM()
model.embed_sequence('ATGGTATAGAGGCATTGA')

calm's Issues

UnpicklingError When Loading Model Weights in training.py

Dear Developer Team,

I hope this message finds you well. I am reaching out to report an issue I encountered while working with your software, specifically when attempting to load a model that I trained using the training.py file. Upon executing the test code to utilize the trained model, I encountered an UnpicklingError that halted the process.

To provide a detailed context, here is the error message that was generated:

UnpicklingError: A load persistent id instruction was encountered, but no persistent_load function was specified.

This error occurred during the following operation in the code:

model = CaLM(weights_file='/home/hugeng/39-CaLM/production-run/latest-0.ckpt')

Thank you for your time and assistance. I look forward to your response and any suggestions you may offer to resolve this issue.

Best regards,

Geng Hu
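
One plausible reading of this error, offered as an assumption rather than a confirmed diagnosis: files written by torch.save (which PyTorch Lightning uses for its .ckpt checkpoints) store tensor data through pickle's persistent-id mechanism, so a loader that calls plain pickle.load on such a file fails with this kind of UnpicklingError. The sketch below reproduces the mechanism with the standard library only. Tensorish, ExternalPickler and ExternalUnpickler are hypothetical names for illustration, and nothing here is taken from CaLM's code.

```python
import io
import pickle


class Tensorish:
    """Hypothetical stand-in for an object stored outside the pickle stream."""

    def __init__(self, key):
        self.key = key


class ExternalPickler(pickle.Pickler):
    def persistent_id(self, obj):
        # Emit a persistent id instead of pickling the object itself.
        if isinstance(obj, Tensorish):
            return ("storage", obj.key)
        return None  # everything else is pickled normally


class ExternalUnpickler(pickle.Unpickler):
    def persistent_load(self, pid):
        # Resolve the persistent id back into an object.
        _kind, key = pid
        return Tensorish(key)


def dump_with_persistent_ids(obj):
    buf = io.BytesIO()
    ExternalPickler(buf).dump(obj)
    return buf.getvalue()


def try_plain_load(payload):
    """Return the exception raised by a plain pickle.loads, or None."""
    try:
        pickle.loads(payload)
    except pickle.UnpicklingError as exc:
        return exc
    return None


payload = dump_with_persistent_ids({"weight": Tensorish("layer0")})

# A plain loader has no persistent_load, so it cannot resolve the id.
print(type(try_plain_load(payload)).__name__)

# A loader that defines persistent_load reads the same bytes without error.
restored = ExternalUnpickler(io.BytesIO(payload)).load()
print(restored["weight"].key)
```

If this is the cause here, the checkpoint would need to be read with torch.load and its state dict converted into whatever format the weights_file argument expects. Whether CaLM ships such a conversion is not stated in this thread.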

The issue of data usage

Hello! I noticed that in the data you provided, some sequences do not begin with "ATG", for example, 'TTGAAAAGAAAAGCCAGTATCATGTTTGTCCATCAAGACAAGTACGAAGAATACAAACAGCGGCATGATGACATTTGGCCTGAGATGGCAGAAGCACTCAAAGCTCATGGAGCACACCATTATTCCATTTTTCTAGACGAGGAAACAGGCAGGCTTTTTGCATATTTAGAAATAGAGGATGAAGAGAAATGGAGAAAGATGGCGGACACGGAAGTTTGCCAAAGATGGTGGAAATCGATGGCGCCATTAATGAAAACAAATTCGGATTTCAGTCCTGTTGCGATAGATCTAAAGGAAGTTTTTTATTTGGATTGA'.
When tokenizing, should I discard the part before ATG and start from ATG, or should I just use the entire sequence as it is?
Similarly, when translating it into an amino acid sequence, should I translate the entire sequence directly or start translating from ATG?
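
For inspecting sequences like the one above, a minimal sketch follows. It splits a coding sequence into codons and reports its first and last codon, so that entries starting with something other than ATG can be counted before deciding how to handle them. The function names are hypothetical and this is not CaLM's tokenizer, nor does it settle which convention the authors used. As general background, TTG and GTG are known alternative start codons in bacterial genomes.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def split_codons(seq):
    """Split a nucleotide sequence into consecutive three-letter codons."""
    seq = seq.upper()
    if len(seq) % 3 != 0:
        raise ValueError("sequence length is not a multiple of 3")
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def describe_cds(seq):
    """Summarise the start and stop codons of a coding sequence."""
    codons = split_codons(seq)
    return {
        "n_codons": len(codons),
        "start_codon": codons[0],
        "starts_with_atg": codons[0] == "ATG",
        "stop_codon": codons[-1],
        "ends_with_stop": codons[-1] in STOP_CODONS,
    }


# The example sequence from the Usage section above.
print(split_codons("ATGGTATAGAGGCATTGA"))
# -> ['ATG', 'GTA', 'TAG', 'AGG', 'CAT', 'TGA']

# A short made-up sequence that begins with TTG instead of ATG.
print(describe_cds("TTGAAATGA")["starts_with_atg"])
# -> False
```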

Missing license

Great resource; however, the repository does not contain a license, which makes it difficult to use or reuse.

FileNotFoundError

Dear Developer Team,

I am writing to seek your assistance regarding an issue I encountered while attempting to run the code associated with your paper titled "Codon language embeddings provide strong signals for use in protein engineering". When running the 'training.py' file, I encountered the following traceback error:

Traceback (most recent call last):
  File "training.py", line 141, in <module>
    ckpt_path='production-run/latest-56000.ckpt')
  ...
FileNotFoundError: [Errno 2] No such file or directory: 'training_data.fasta'

It seems that the 'training_data.fasta' file is not found, leading to this error. I would greatly appreciate it if you could provide some guidance on how to address this issue.

Thank you very much for your time and consideration. I look forward to your valuable guidance.

Sincerely,
Geng Hu
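
The traceback refers to 'training_data.fasta' by a relative path, so the file is looked up in the directory the script is launched from. A small preflight check along the following lines can list which expected inputs are absent before a long run starts. This is a sketch: missing_inputs is a hypothetical helper, and the list of files to check is an assumption based only on the traceback above.

```python
from pathlib import Path


def missing_inputs(paths, base_dir="."):
    """Return the entries of `paths` that are not regular files under `base_dir`."""
    base = Path(base_dir)
    return [p for p in paths if not (base / p).is_file()]


if __name__ == "__main__":
    # File name taken from the traceback; extend the list as needed.
    missing = missing_inputs(["training_data.fasta"])
    if missing:
        print("Missing input files:", ", ".join(missing))
    else:
        print("All input files found.")
```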

Fine-tune on top of your pre-trained model

Can you please share your PyTorch Lightning model snapshot, so I can fine-tune a model on top of yours?

Currently you only share a weights file (calm_weights.pkl). A full PyTorch Lightning model snapshot would allow me to run Trainer() starting from your checkpoint.

GPU device management

Hi and thanks again for the great work,

As far as I can see, the CaLM class in pretrained.py, which is used for inference, does not support a device argument. Moving the model manually, e.g. with model.model.cuda(), results in a mismatch between the model's device and the device of the tensors passed to the forward method. I will fix this in my local copy of the repo.

Environment installation issues

Hi @couteiral and thanks for sharing your work,

I ran python setup.py install, but when I then run from calm import CaLM, the PyTorch initialisation raises an error:
OSError: libnvJitLink.so.12: cannot open shared object file: No such file or directory

So I built the environment myself, starting from a fresh conda environment in which PyTorch/CUDA can be imported without issues. I then installed your repository into this environment. The installation appears to succeed and the imports work, but when I initialise the model, it downloads the weights and then raises another CUDA-related error:
ModuleNotFoundError: No module named 'fused_layer_norm_cuda'

Do you have any hints on how to finalise the installation please?
