Giter VIP home page Giter VIP logo

deconpeaker's Introduction

DeconPeaker

DeconPeaker: a deconvolution method to estimate cell type proportions in chromatin accessibility data (ATAC-Seq), as well as gene expression data (RNA-Seq & Microarray). DeconPeaker_pipeline

Dependance and installation of DeconPeaker

Dependance

DeconPeaker's code is a mix of Python3.8 and R(4.0), which requires the following dependencies.

  • Python3.8:
    • Numpy
    • Scipy
    • Pandas
    • bx
    • Matplotlib
    • rpy2
  • R4.0:
    • pls
    • transport
    • colorRamps
    • MASS
  • Other tools (when excute preprocess and simulation steps):
    • bedtools
    • samtools
    • featureCounts

Installation

git clone [email protected]:lihuamei/DeconPeaker.git
cd DeconPeaker && pip install . && cd ..

How to use DeconPeaker?

usage: deconPeaker [-h] {preprocess,findctsps,deconvolution,simulation} ...

deconPeaker - a deconvolution model to identify cell types based on chromatin accessibility in ATAC-Seq data of mixture samples.

positional arguments:
  {preprocess,findctsps,deconvolution,simulation}
    preprocess          Create chromatin accessibility profile for pure samples, this step is the basis for subsequent specific cell type identification and mixture deconvolution. Note: This step only support Linux system, and if
                        you have a large sample size for pure cells, please ensure enough sufficient memory and hard storage space for program to run normally.
    findctsps           Find cell type specific peaks/genes accross pure samples, different pure cell samples require replicates as input.
    deconvolution       Based on pure cell profile information, robust regression deconvolution strategy was used to estimate the proportion of possible cell types in the mixed samples.
    simulation          Simulate mixed samples with different proportions of cells. [Note] The method is proportional random sampling of reads from different cell types of BAM/BED files.

optional arguments:
  -h, --help            show this help message and exit

Show an exmaple for deconvolution

deconPeaker deconvolution --lib-strategy=ATAC-Seq --mixture=DeconPeaker/test/examples/ATAC-Seq/GSE74912_Corces_MR_synthetic_mixture_counts_data.xls --pure=DeconPeaker/test/examples/ATAC-Seq/GSE74912_Corces_MR_pure_readcounts_signature_matrix.xls --format=TABLE --pvalue=FALSE --outdir=DeconPeaker/results/GSE74912_Corces_MR

INFO  @ Tue, 23 Apr 2024 14:49:47: Deconvolving......
[1] "[WARN] Lambda = 0, Log transfer for SIMPLS"
[1] "[WARN] Lambda = 0, Log transfer for SIMPLS"
[1] "[WARN] Lambda = 0, Log transfer for SIMPLS"
INFO  @ Tue, 23 Apr 2024 14:50:17: Showing deconPeaker results:
                 Ery       CLP      CD8T       GMP        NK       CMP      CD4T         B       MEP       MPP      LMPP       HSC      MONO  Rsquared     RMSEP  P.value
Sample_171  0.069005  0.002305  0.052112  0.067137  0.262002  0.013519  0.286919  0.080115  0.049956  0.000000  0.047004  0.065608  0.004319  0.866373  0.365072   9999.0
Sample_177  0.018484  0.111797  0.000000  0.047688  0.263781  0.066290  0.007585  0.049847  0.056509  0.041708  0.218438  0.074697  0.043177  0.928823  0.266408   9999.0
Sample_159  0.000289  0.286506  0.000000  0.090313  0.032724  0.016271  0.245973  0.071600  0.000000  0.191758  0.014806  0.046406  0.003353  0.964885  0.187197   9999.0
Sample_139  0.006446  0.008090  0.000000  0.000000  0.000000  0.011153  0.909165  0.000000  0.006506  0.000000  0.022982  0.029292  0.006367  0.898426  0.318128   9999.0
Sample_86   0.382248  0.165546  0.021309  0.015785  0.003639  0.090452  0.240281  0.024951  0.025254  0.022799  0.000000  0.007111  0.000624  0.952080  0.218717   9999.0
...              ...       ...       ...       ...       ...       ...       ...       ...       ...       ...       ...       ...       ...       ...       ...      ...
Sample_134  0.002720  0.000000  0.019477  0.019017  0.000000  0.017883  0.000000  0.000000  0.000000  0.082663  0.012248  0.834613  0.011378  0.901643  0.313398   9999.0
Sample_1    0.002720  0.000000  0.019477  0.019017  0.000000  0.017883  0.000000  0.000000  0.000000  0.082663  0.012248  0.834613  0.011378  0.901719  0.313285   9999.0
Sample_84   0.019543  0.005681  0.000000  0.008541  0.233237  0.324421  0.211760  0.000000  0.030085  0.000000  0.124828  0.041904  0.000000  0.911399  0.297173   9999.0
Sample_175  0.000891  0.059974  0.021029  0.000000  0.196285  0.253763  0.208002  0.000000  0.004527  0.064388  0.047768  0.136915  0.006459  0.908695  0.301642   9999.0
Sample_29   0.000000  0.044709  0.002532  0.141323  0.000000  0.028472  0.000000  0.000000  0.000000  0.000000  0.674772  0.064291  0.043901  0.980558  0.139116   9999.0

[195 rows x 16 columns]
INFO  @ Tue, 23 Apr 2024 14:50:17: Elapsed time is 29.629956245422363 seconds

More Information

Please see Tutorial.

Citation

Please cite the publication: Li H, Sharma A, Luo K, et al. DeconPeaker, a deconvolution model to identify cell types based on chromatin accessibility in ATAC-Seq data of mixture samples[J]. Frontiers in genetics, 2020, 11: 392.

deconpeaker's People

Contributors

lihuamei avatar lihuamei123 avatar

Stargazers

Joey avatar Fairlie Reese avatar zhangwei avatar balabala avatar

Watchers

 avatar

deconpeaker's Issues

setup.py

Hello,

Could you please add a setup.py file for installation?

Thank you.

mouse ATAC reference data

Hello,

I want to apply this method on mouse ATAC data, and I didn't find a reference data for mouse ATAC. So could you please add a mouse blood ATAC-reference data ? Or do you have an example for this?

Thank you.

installation requirenments.txt or yaml file

hi

I am trying to use this tool but installation is real hurdler can you please provide the yaml file
i tried with all the dependencies but getting this error
import-im6.q16: unable to open X server ' @ error/import.c/ImportImageCommand/346. import-im6.q16: unable to open X server ' @ error/import.c/ImportImageCommand/346.
./deconPeaker2.py: line 5: os.environ[R_HOME]: command not found
./deconPeaker2.py: line 6: os.environ[PATH]: command not found
import-im6.q16: unable to open X server ' @ error/import.c/ImportImageCommand/346. ./deconPeaker2.py: line 24: from: command not found ./deconPeaker2.py: line 29: from: command not found ./deconPeaker2.py: line 30: from: command not found ./deconPeaker2.py: line 31: from: command not found ./deconPeaker2.py: line 32: from: command not found ./deconPeaker2.py: line 37: syntax error near unexpected token ('
./deconPeaker2.py: line 37: `LOGS = log_infos() # logging informative'

Here is the deconpeaker2.py with R path as original deconpeaker.py was not able to access R without explicit R path

import os
import sys

Set R environment variables

os.environ['R_HOME'] = '/nfs/amishra/mambaforge/envs/deconpeaker'
os.environ['PATH'] = '/nfs/amishra/mambaforge/envs/deconpeaker/bin:' + os.environ['PATH']

Continue with the rest of the script

...

#!/usr/bin/env python
#title : deconPeaker.py
#description : Identification and deconvolution of cell types based on chromatin accessibility.
#author : Huamei Li
#date : 29/05/2018
#type : main script
#version : 3.8

#-----------------------------------------------------

load python modules

import numexpr
from time import time

#-----------------------------------------------------

load own python modules

from modules.deconv_mixed import *
from modules.simulate import *
from modules.peaks import *
from modules.parse_opts import parse_opts

#-----------------------------------------------------

global setting

LOGS = log_infos() # logging informative

numexpr.set_num_threads(numexpr.detect_number_of_cores())

#-----------------------------------------------------

def preprocess():
'''
Create chromatin accessibility profile for pure samples,
this step is the basis for subsequent specific cell type identification and mixture deconvolution.
:return: 0

'''
LOGS.info('Loading peaks......')
merged_files = mp_read_peaks(ARGS.infos.PEAK.values.tolist(), kargs=ARGS)

LOGS.info('Chosing a list of non-overlapping, maximally significant peaks')
nonovp_peakfil = remove_redundant_peakfile(
        merged_files              , 
        ARGS.infos.CELL.unique()  , 
        ARGS.prefix               , 
        ARGS.outdir
    )

if ARGS.offset:
    LOGS.info('Filtering out the peaks nearby the TSS (+/-{} bps)'.format(ARGS.offset))
nonovp_peakfil = remove_peaks_nearbytss(nonovp_peakfil, ARGS.hg_genome, offset=ARGS.offset)

peaknum = get_line_number(nonovp_peakfil)
LOGS.info('Counting the number of fragments from each sample falling into each of {} peaks'.format(peaknum))
multi_get_reads(
        ARGS.infos.BAM.values.tolist(), 
        nonovp_peakfil                , 
        ARGS                          , 
        prefix = ARGS.prefix + '_reference_count_matrix',
        outdir = ARGS.outdir          ,
        bg = False
    )
phenotype = os.path.join(ARGS.outdir, ARGS.prefix + '_phenotype_classes.xls')
LOGS.info('Writing phenootype class into {}'.format(phenotype))
write_phenotype_file(ARGS.infos, phenotype)
LOGS.info('Preprocess finished')
return 0

def findctsps():
'''
find cell type specific peaks for pure samples
:return: 0

'''
profile    = load_profile(ARGS.profile, ARGS.lib_strategy)
phenotypes = load_phenotypes(ARGS.phenotype)
LOGS.info('Loaded {} peaks and {} samples'.format(profile.shape[0], profile.shape[1] - 3))

ARGS.prefix = os.path.basename(ARGS.profile).rsplit('.', 1)[0]
LOGS.info('Normalizing pure profile by {} method to remove batch effects'.format(ARGS.norm))
profile_norm = normalize_profile(
        profile              , 
        ARGS.norm            , 
        profile.columns[3 : ],
        False                
    )

profile_norm = filter_weakpeaks(profile_norm)
curcnts, diffcnts = profile_norm.shape[0], profile.shape[0] - profile_norm.shape[0]
LOGS.info('Filtering out {} peaks and {} peaks have been remained'.format(diffcnts, curcnts))
del profile

LOGS.info('Identifying cell specific peaks accross pure cell profile')
markerpeaks = cellspecificpeaks(profile_norm, phenotypes, ARGS)

LOGS.info('Final number of cell specific peaks is {}'.format(markerpeaks.shape[0]))
return 0

def deconvolution():
'''
Based on pure cell type information, a deconvolution strategy was used to
calculate the proportion of possible cell types in the mixed sample.

'''
if ARGS.format == 'BAM':
    LOGS.info('Counting the number of reads/fragments in each peak in the bam files by featureCounts')
    ARGS.mixture = multi_get_reads(
        ARGS.infos.BAM.values.tolist() ,
        ARGS.pure                      ,
        ARGS                           ,
        prefix = 'mixed_sample_profile',
        outdir = ARGS.outdir           ,
        bg = False
    )
mixsamples, sigprofile = load_profile([ARGS.mixture, ARGS.pure], ARGS.lib_strategy)
LOGS.info('Deconvolving......')
results = deconvcells(
        mixsamples       , 
        sigprofile       , 
        ARGS.lib_strategy,
        ARGS.pvalue      ,
        ARGS.method      ,
        ARGS.norm        ,
        ARGS.outdir      
    )
LOGS.info('Showing deconPeaker results: ')
print(results)
return 0

def simulate():
'''
simulate mixture cell sampels by sampling reads of pure cells

'''
LOGS.info('Simulating......')
ARGS.mixture = random_proportions(
        ARGS.pure_infos, 
        ARGS.mixture   , 
        ARGS.rep_counts, 
        ARGS.prefix    , 
        ARGS.outdir
    )
multi_simulate_bams(ARGS) # simulate data with specified proportions
LOGS.info('Okay!')
return 0

def run():
'''
deconPeaker main funcion and contains four sections, pureprofile, identifycells, deconvolution and simulation, respectivly
:return: stat [int]

'''
global ARGS
ARGS, start  = parse_opts(), time()
funcs, modes = [preprocess, findctsps, deconvolution, simulate], \
               ['preprocess', 'findctsps', 'deconvolution', 'simulation']
ARGS.tmpdir = tmpdir = __import__('tempfile').mkdtemp('_deconPeaker')
try:    
    stat = funcs[modes.index(ARGS.sub_parser)]()
finally:
    __import__('shutil').rmtree(tmpdir) if os.path.exists(tmpdir) else 0
LOGS.info('Elapsed time is {} seconds'.format(time() - start))
return stat

if name == 'main':
sys.exit(run())

original deconpeaker.py is giving this error

(deconpeaker) Singularity> ./deconPeaker.py
Unable to determine R library path: [Errno 2] No such file or directory: '/opt/R/4.2.3/lib/R/bin/Rscript'
cannot find system Renviron
Fatal error: unable to open the base package

deconPeaker.py does not work

Hi Huamei,

I'm trying to predict cell proportions with your tools for ATAC-seq data and it does not work even I'm running the example data.
The command with issue is "deconPeaker.py deconvolution --lib-strategy=ATAC-Seq --mixture=test\examples\ATAC-Seq\GSE74912_Corces_MR_synthetic_mixture_counts_data.xls --pure=test\examples\ATAC-Seq\GSE74912_Corces_MR_pure_readcounts_signature_matrix.xls --format=TABLE --pvalue=FALSE --outdir=results\GSE74912_Corces_MR".
It does not generate a result folder and results, it finishes quickly and seems like not running but no error is shown. Is there something I did wrong or issues with the script? Look forward to your reply

Jingyu

lm.fit error

(/home/fyan0011/mx82/fyan0011/conda_env/DeconPeaker) [fyan0011@m3-login2 DeconPeaker]$ python deconPeaker.py deconvolution --lib-strategy=ATAC-Seq --mixture=/home/fyan0011/ls25_scratch/feng.yan/Immgen/diffbind2/output/mix_5k_chr.xls --pure=/home/fyan0011/ls25_scratch/feng.yan/Immgen/diffbind2/output/pure_mean_5k_chr.xls --format=TABLE --pvalue=FALSE --outdir=Lmo2
INFO  @ Tue, 16 Nov 2021 22:34:05: Deconvolving......
WARNING @ Tue, 16 Nov 2021 22:34:07: R[write to console]: Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) :
  0 (non-NA) cases

Traceback (most recent call last):
  File "deconPeaker.py", line 167, in <module>
    sys.exit(run())
  File "deconPeaker.py", line 160, in run
    stat = funcs[modes.index(ARGS.sub_parser)]()
  File "deconPeaker.py", line 118, in deconvolution
    results = deconvcells(
  File "/fs03/ls25/feng.yan/software/DeconPeaker/modules/deconv_mixed.py", line 122, in deconvcells
    deconv_results = deconv(mixsamples, sigprofile, method=method, pvalue=pvalue)
  File "/fs03/ls25/feng.yan/software/DeconPeaker/modules/build_models.py", line 87, in deconv
    for sn in mixsnames: deconv_results.append(SIMPLS(Y[sn], X, method, pvalue))
  File "/fs03/ls25/feng.yan/software/DeconPeaker/modules/build_models.py", line 38, in SIMPLS
    r('''
  File "/home/fyan0011/mx82/fyan0011/conda_env/DeconPeaker/lib/python3.8/site-packages/rpy2/robjects/__init__.py", line 438, in __call__
    res = self.eval(p)
  File "/home/fyan0011/mx82/fyan0011/conda_env/DeconPeaker/lib/python3.8/site-packages/rpy2/robjects/functions.py", line 198, in __call__
    return (super(SignatureTranslatedFunction, self)
  File "/home/fyan0011/mx82/fyan0011/conda_env/DeconPeaker/lib/python3.8/site-packages/rpy2/robjects/functions.py", line 125, in __call__
    res = super(Function, self).__call__(*new_args, **new_kwargs)
  File "/home/fyan0011/mx82/fyan0011/conda_env/DeconPeaker/lib/python3.8/site-packages/rpy2/rinterface_lib/conversion.py", line 45, in _
    cdata = function(*args, **kwargs)
  File "/home/fyan0011/mx82/fyan0011/conda_env/DeconPeaker/lib/python3.8/site-packages/rpy2/rinterface.py", line 680, in __call__
    raise embedded.RRuntimeError(_rinterface._geterrmessage())
rpy2.rinterface_lib.embedded.RRuntimeError: Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) :
  0 (non-NA) cases

Hi team,
Thanks for developing this tool. However, I have encountered this error when running this. (the test data run ok).

My signature table looks like below:

>>> profile = pd.read_csv("/home/fyan0011/ls25_scratch/feng.yan/Immgen/diffbind2/output/pure_mean_5k_chr.xls", sep='\t', header=0)
>>> profile.head()
  chrom      start        end   B.Fem.Sp    B.Fo.Sp   B.FrE.BM  B.GC.CB.Sp  ...  Tgd.g2+d1.24a+.Th  Tgd.g2+d1.LN  Tgd.g2+d17.24a+.Th  Tgd.g2+d17.LN     Tgd.Sp  Treg.4.25hi.Sp  Treg.4.FP3+.Nrplo.Co
0  chr9   39847458   39848851  12.578028  12.578028  12.578028   12.578028  ...          12.578028     12.578028           12.578028      12.578028  12.578028       12.578028             12.578028
1  chr1  193200519  193202061   9.025825   9.025825   9.025825    9.025825  ...           9.025825      9.025825            9.025825       9.025825   9.025825        9.025825              9.025825
2  chr3  106083793  106085556   9.187085   9.187085   9.187085    9.187085  ...           9.187085      9.187085            9.187085       9.187085   9.187085        9.187085              9.187085
3  chr6   67715154   67716896   9.858280   9.858280   9.858280    9.858280  ...           9.858280      9.858280            9.858280       9.858280   9.858280        9.858280              9.858280
4  chr9   48408865   48410729   9.550301   9.550301   9.550301    9.550301  ...           9.550301      9.550301            9.550301       9.550301   9.550301        9.550301              9.550301

[5 rows x 96 columns]
>>> profile.dtypes
chrom                    object
start                     int64
end                       int64
B.Fem.Sp                float64
B.Fo.Sp                 float64
                         ...
Tgd.g2+d17.24a+.Th      float64
Tgd.g2+d17.LN           float64
Tgd.Sp                  float64
Treg.4.25hi.Sp          float64
Treg.4.FP3+.Nrplo.Co    float64
Length: 96, dtype: object

I would really appreciate it if you could help.

Cheers,

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.