scTranslator: A pre-trained large language model for translating single-cell transcriptome to proteome

Despite recent advances in single-cell proteomics technology, it remains limited in throughput and proteome depth, suffers from batch effects, and is still costly. Inspired by natural language translation and the central dogma of molecular biology, we propose scTranslator (single-cell translator), a pre-trained large language model that is alignment-free and generates the absent single-cell proteome by inferring it from the transcriptome.

Dataset

The data can be downloaded from this link. If you have any questions, please contact [email protected].

https://drive.google.com/drive/folders/1gTs9-wlKL0WhyQjSAUo0RK4FVuwwJPK9?usp=sharing

Checkpoint

The pre-trained model checkpoint can be downloaded from this link. If you have any questions, please contact [email protected].

https://drive.google.com/drive/folders/1Grd8IgVH_baN4tKUTxc0RPufVPZE8Vvp?usp=sharing

Results

The results used for the analyses in the Jupyter demo can be downloaded from this link. If you have any questions, please contact [email protected].

https://drive.google.com/drive/folders/1dcDNmRqhntJGLC-2Qu7C1eSotb7J_Hgq?usp=sharing

Installation and Usage

Requirements: python >3.8.13, scipy 1.6.2, pytorch 1.12.1, numpy 1.21.5, pandas 1.2.4, scanpy 1.9.1, scikit-learn 1.1.1, local-attention 1.4.3
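Once an environment is set up, the version pins above can be checked from Python. The sketch below is a hypothetical helper, not part of scTranslator: it reads installed versions with the standard-library `importlib.metadata` and reports anything missing or different. It assumes the PyPI distribution names shown in `PINNED` (PyTorch installs as `torch`) and ignores local build suffixes such as `+cu113` when comparing.

```python
"""Hypothetical helper (not part of scTranslator): compare installed package
versions with the pins listed in this README."""
from importlib.metadata import PackageNotFoundError, version
from typing import Callable, Dict, List, Optional

# Distribution names as published on PyPI; PyTorch installs as "torch".
PINNED: Dict[str, str] = {
    "scipy": "1.6.2",
    "torch": "1.12.1",
    "numpy": "1.21.5",
    "pandas": "1.2.4",
    "scanpy": "1.9.1",
    "scikit-learn": "1.1.1",
    "local-attention": "1.4.3",
}


def installed_version(name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(name)
    except PackageNotFoundError:
        return None


def check_versions(
    pins: Dict[str, str],
    get_version: Callable[[str], Optional[str]] = installed_version,
) -> List[str]:
    """Return one message per package that is missing or differs from its pin."""
    problems = []
    for name, wanted in pins.items():
        found = get_version(name)
        if found is None:
            problems.append(f"{name}: not installed (expected {wanted})")
        # Drop a local build suffix such as "+cu113" before comparing.
        elif found.split("+", 1)[0] != wanted:
            problems.append(f"{name}: found {found}, expected {wanted}")
    return problems


if __name__ == "__main__":
    for line in check_versions(PINNED):
        print(line)
```

An empty output means every pinned package matched; exact pins are strict by design, so a newer patch release will also be reported.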

Environment preparation

The environment for scTranslator can be obtained from the Docker Hub registry or by installing the dependencies listed in requirements.txt.

Option 1:

Pull the Docker image from Docker Hub.

$ docker pull linjingliu/sctranslator:latest

Start a container from the image and activate the environment.

$ docker run --name sctranslator --gpus all -it --rm linjingliu/sctranslator:latest /bin/bash

Option 2:

Use conda to create and activate an environment.

$ conda create -n performer python=3.8
$ conda activate performer

Install the necessary dependencies listed in requirements.txt.

$ conda install --file requirements.txt

Installation

Cloning the repository usually takes about 5 seconds on a typical desktop computer.

$ git clone git@github.com:TencentAILabHealthcare/scTranslator.git

Download the datasets and the checkpoint from the links above and place them in the corresponding folders (dataset/ and checkpoint/) inside scTranslator.
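Before running the demos, it may help to confirm that the downloaded files sit where the commands under Usage expect them. The sketch below is a hypothetical pre-flight check, not part of scTranslator; `EXPECTED` simply copies the relative paths used in those commands. Whether `checkpoint/Dataset1_fine-tuned_scTranslator.pt` (used by the attention-matrix demo) is part of the download is not stated here, so remove it from the list if you do not have it.

```python
"""Hypothetical pre-flight check (not part of scTranslator): confirm that the
downloaded files are where the demo commands in this README expect them."""
from pathlib import Path
from typing import Iterable, List

# Relative paths copied from the Usage commands.
EXPECTED = [
    "checkpoint/stage2_single-cell_scTranslator.pt",
    "checkpoint/Dataset1_fine-tuned_scTranslator.pt",
    "dataset/test/dataset1/GSM5008737_RNA_finetune_withcelltype.h5ad",
    "dataset/test/dataset1/GSM5008738_protein_finetune_withcelltype.h5ad",
]


def missing_files(repo_root: str, relative_paths: Iterable[str]) -> List[str]:
    """Return the relative paths that do not exist as files under repo_root."""
    root = Path(repo_root)
    return [p for p in relative_paths if not (root / p).is_file()]


if __name__ == "__main__":
    # Run from inside the scTranslator folder.
    missing = missing_files(".", EXPECTED)
    if missing:
        print("Missing files:")
        for path in missing:
            print("  " + path)
    else:
        print("All expected files are in place.")
```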

Usage

  1. Activate the environment and switch to the scTranslator folder.
$ conda activate performer
$ cd scTranslator
  2. Demo for protein abundance prediction with or without fine-tuning. The results, comprising both protein abundance and performance metrics, are stored in the 'scTranslator/result/test' directory.
# Inference without fine-tuning
$ python code/stage3_inference_without_finetune.py \
--pretrain_checkpoint='checkpoint/stage2_single-cell_scTranslator.pt' \
--RNA_path='dataset/test/dataset1/GSM5008737_RNA_finetune_withcelltype.h5ad' \
--Pro_path='dataset/test/dataset1/GSM5008738_protein_finetune_withcelltype.h5ad'
# Inference with fine-tuning
$ python -m torch.distributed.launch --nnodes=1 --node_rank=0 --nproc_per_node 1 --master_port 23333 \
code/stage3_fine-tune.py --epoch=100 --frac_finetune_test=0.1 --fix_set \
--pretrain_checkpoint='checkpoint/stage2_single-cell_scTranslator.pt' \
--RNA_path='dataset/test/dataset1/GSM5008737_RNA_finetune_withcelltype.h5ad' \
--Pro_path='dataset/test/dataset1/GSM5008738_protein_finetune_withcelltype.h5ad'
  3. Demo for obtaining the attention matrix.
$ python code/attention_matrix.py \
--pretrain_checkpoint='checkpoint/Dataset1_fine-tuned_scTranslator.pt' \
--RNA_path='dataset/test/dataset1/GSM5008737_RNA_finetune_withcelltype.h5ad' \
--Pro_path='dataset/test/dataset1/GSM5008738_protein_finetune_withcelltype.h5ad'
  4. Demo for pseudo-knockout of a gene.
# Compute the original protein abundance (no knockout)
$ python code/pseudo_knockout_gene.py --gene='org'
# Compute protein abundance after pseudo-knockout of TP53
$ python code/pseudo_knockout_gene.py --gene='TP53'
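The protein abundance prediction demo reports performance metrics alongside the predictions, but this README does not say which metrics. Purely as an illustration, the sketch below computes a per-protein Pearson correlation between measured and predicted abundance across cells, a common way to score such predictions; it is an assumption, not necessarily what the scTranslator scripts compute. `pearson` and `per_protein_pearson` are hypothetical names, and inputs are plain cells × proteins nested lists to keep the example dependency-free.

```python
"""Illustrative sketch (not scTranslator code): per-protein Pearson correlation
between measured and predicted abundance, computed across cells."""
import math
from typing import List, Optional, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> Optional[float]:
    """Pearson correlation of two equal-length vectors; None if either is constant."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two vectors of equal length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    sxx = sum(a * a for a in dx)
    syy = sum(b * b for b in dy)
    if sxx == 0 or syy == 0:
        return None  # correlation is undefined for a constant vector
    return sum(a * b for a, b in zip(dx, dy)) / math.sqrt(sxx * syy)


def per_protein_pearson(
    measured: List[List[float]], predicted: List[List[float]]
) -> List[Optional[float]]:
    """measured and predicted are cells x proteins; returns one score per protein."""
    scores = []
    for j in range(len(measured[0])):
        true_col = [row[j] for row in measured]
        pred_col = [row[j] for row in predicted]
        scores.append(pearson(true_col, pred_col))
    return scores
```

For example, `pearson([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])` returns `1.0`, and a protein whose measured values are identical in every cell yields `None` rather than a misleading number.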
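The pseudo-knockout demo compares protein abundance predicted from the original transcriptome (`--gene='org'`) with the prediction after a gene such as TP53 is knocked out in silico. The simplest reading of a pseudo-knockout, assumed here, is to set that gene's expression to zero in every cell, predict again, and take the per-protein difference in mean predicted abundance. The sketch replaces scTranslator with a hypothetical linear stand-in (`toy_model`, with made-up `WEIGHTS`), so only the knockout-and-compare logic is meant to carry over.

```python
"""Illustrative sketch (not scTranslator code) of an in-silico gene knockout:
zero one gene in every cell, predict again, and compare with the original."""
from typing import Callable, Dict, List

Matrix = List[List[float]]  # rows are cells, columns are features


def knockout(expr: Matrix, genes: List[str], target: str) -> Matrix:
    """Return a copy of expr with the target gene's column set to zero."""
    j = genes.index(target)  # raises ValueError if the gene is absent
    return [[0.0 if k == j else v for k, v in enumerate(row)] for row in expr]


def mean_per_column(m: Matrix) -> List[float]:
    """Average each column (protein) over all rows (cells)."""
    return [sum(col) / len(col) for col in zip(*m)]


def knockout_effect(
    model: Callable[[Matrix], Matrix],
    expr: Matrix,
    genes: List[str],
    proteins: List[str],
    target: str,
) -> Dict[str, float]:
    """Mean predicted abundance after the knockout minus before, per protein."""
    before = mean_per_column(model(expr))
    after = mean_per_column(model(knockout(expr, genes, target)))
    return {p: a - b for p, a, b in zip(proteins, after, before)}


# Hypothetical stand-in for the trained model: each protein is a fixed weighted
# sum of gene expression. WEIGHTS[i][p] links gene i to protein p (made up).
WEIGHTS = [[1.0, 0.0], [0.5, 2.0]]


def toy_model(expr: Matrix) -> Matrix:
    n_proteins = len(WEIGHTS[0])
    return [
        [sum(g * WEIGHTS[i][p] for i, g in enumerate(row)) for p in range(n_proteins)]
        for row in expr
    ]
```

With genes `["TP53", "CD4"]`, proteins `["P1", "P2"]` and expression `[[2.0, 1.0], [4.0, 3.0]]`, `knockout_effect(toy_model, ...)` for `"TP53"` returns `{'P1': -3.0, 'P2': 0.0}`: only the protein that depends on TP53 in the toy weights changes.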

Time cost

The expected runtime for inferring 1000 proteins in 100 cells is approximately 20 seconds on a 16 GB GPU and 110 seconds on a CPU.
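The quoted timings work out to roughly 0.2 s per cell on the GPU and 1.1 s per cell on the CPU, about a 5.5-fold difference. The back-of-the-envelope sketch below extrapolates them to larger datasets under a loudly stated assumption: runtime grows linearly with the number of cells at 1000 proteins. Fixed costs such as loading the checkpoint are not separated out, so treat the numbers as rough upper-level guidance only.

```python
"""Back-of-the-envelope runtime estimate from the timings quoted in this README
(1000 proteins, 100 cells: ~20 s on a 16 GB GPU, ~110 s on a CPU).
ASSUMPTION: runtime scales linearly with the number of cells."""

SECONDS_PER_100_CELLS = {"gpu": 20.0, "cpu": 110.0}


def estimate_seconds(n_cells: int, device: str = "gpu") -> float:
    """Linearly extrapolate the quoted runtime to n_cells (1000 proteins)."""
    if n_cells < 0:
        raise ValueError("n_cells must be non-negative")
    return SECONDS_PER_100_CELLS[device] * n_cells / 100
```

Under this assumption, 10,000 cells would take about 2000 s (roughly 33 minutes) on the GPU and about 11,000 s (roughly 3 hours) on a CPU.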

Disclaimer

This tool is for research purposes only and is not approved for clinical use.

This is not an official Tencent product.

Copyright

This tool was developed by Tencent AI Lab.

The copyright holder for this project is Tencent AI Lab.

All rights reserved.

Contributors

cchen22, elaineliu-920