gwidecodeml's Introduction

GWideCodeML

GWideCodeML is a Python package for testing evolutionary hypotheses with codeml (from the PAML package) in a genome-wide framework.

For further information on installation and usage, please visit https://github.com/lauguma/GWideCodeML/wiki

Installation

Option 1:

  1. Download GWideCodeML
    git clone https://github.com/lauguma/GWideCodeML.git
    cd GWideCodeML

  2. Install
    python setup.py install

  3. Run GWideCodeML: if the installation succeeded, a gwidecodeml executable is created. You can check it by typing in your console:
    gwidecodeml -h

Note: if you choose this option, all requirements must be satisfied before running GWideCodeML (e.g. codeml must be installed and available from the working directory); see the Requirements section.
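
As a quick sanity check for the note above, the following minimal sketch looks up the executables on the PATH before a run. The helper name find_tool is hypothetical and not part of GWideCodeML; it only wraps the standard library's shutil.which.

```python
import shutil


def find_tool(name):
    """Return the full path of an executable found on PATH, or None."""
    return shutil.which(name)


if __name__ == "__main__":
    # Tool names taken from the installation steps above.
    for tool in ("codeml", "gwidecodeml"):
        path = find_tool(tool)
        print("{}: {}".format(tool, path if path else "NOT FOUND on PATH"))
```

If codeml is reported as not found, installing PAML or using the conda environment from Option 2 is the likely next step.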

Option 2: conda environment

(easier, recommended option)

  1. Install and initialize miniconda
    (skip in case you already have a conda env and know how it works)
    https://docs.conda.io/projects/conda/en/latest/user-guide/install/linux.html
    close and open a new console before continuing

  2. Download GWideCodeML
    git clone https://github.com/lauguma/GWideCodeML.git
    cd GWideCodeML

  3. Create a conda environment from yml file
    conda env create -f gwidecodeml_conda.yml

  4. Activate conda environment
    conda activate gwcodeml

  5. Install and run GWideCodeML
    python setup.py install
    gwidecodeml -h

Requirements

Python >= 3.5

Python libraries

  • Biopython
  • Scipy
  • ete3

Software

  • codeml (from the PAML package)

Citation

Macías L. G., Barrio E. and Toft C. "GWideCodeML: a Python package for testing evolutionary hypothesis at the genome-wide level" G3: Genes, Genomes, Genetics (2020) doi:10.1534/g3.120.401874.

Our pipeline uses third-party software:

Yang, Z. "PAML 4: a program package for phylogenetic analysis by maximum likelihood." Mol Biol Evol (2007) doi: 10.1093/molbev/msm088

Huerta-Cepas, J., Serra, F. and Bork, P. "ETE 3: Reconstruction, analysis and visualization of phylogenomic data." Mol Biol Evol (2016) doi: 10.1093/molbev/msw046

gwidecodeml's Issues

WARNING :: LnL value was not found

Hi there,

GWideCodeML ran successfully. However, the final .tsv results files contained no data. The following warning appears in gwidecodeml.log:
WARNING :: LnL value was not found at x.alt.txt/ x.null.txt for all genes analyzed.
That is why no data was written to the final output files. I have checked the input files and nothing seems wrong to me. I have posted part of the log file below. Could anyone help me out? Thanks a lot!

Part of gwidecodeml.log:
INFO :: Starting GWideCodeML analysis... ROUND NAME: r01
WARNING :: Found duplicated genome tags in fasta files. 1 fasta files won't be analyzed
WARNING :: Discarded fasta files: ['group_1030.fna']
INFO :: These alignments won't be analyzed:
INFO :: group_1030.fna
INFO :: Starting analysis of 22 alignments
INFO :: Control files successfully created, GWidecodeml is ready for codeml performance
INFO :: Running codeml null hypothesis on btuD_2
INFO :: Running codeml alternative hypothesis on btuD_2
...(same for the remaining genes, not shown here)
INFO :: Codeml has finished, output files successfully created. Running LRTs...
WARNING :: LnL value was not found at btuD_2_r01_alt.txt. Please, check.
WARNING :: LnL value was not found at btuD_2_r01_null.txt. Please, check.
WARNING :: LRT not performed because the value LnL is missing in btuD_2_r01_alt.txt btuD_2_r01_null.txt files
....(same for the remaining genes, not shown here)
INFO :: Total nr. of genes rejecting null hypothesis: 0
INFO :: dnds option selected. Omega values will be written to an output file.
INFO :: Round name r01 finished
INFO :: GWideCodeML successfully run. Please, check results files: ['results_r01_SM.tsv']

Thanks!
Li
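
To see what the LRT step needs from each output file, here is a minimal sketch that pulls the log-likelihood out of codeml output text and computes a likelihood-ratio test. It is not GWideCodeML's own parser: the helper names are hypothetical, the regular expression assumes codeml's usual "lnL(ntime: ... np: ...): value" line, and one degree of freedom is an assumption that should be adjusted to the models being compared.

```python
import math
import re

# Assumed shape of the codeml log-likelihood line, for example:
# "lnL(ntime: 41  np: 46):  -2970.527521      +0.000000"
LNL_RE = re.compile(r"^lnL\(.*?\):\s+(-?\d+(?:\.\d+)?)", re.MULTILINE)


def extract_lnl(text):
    """Return the first lnL value found in codeml output text, or None."""
    match = LNL_RE.search(text)
    return float(match.group(1)) if match else None


def lrt_pvalue(lnl_null, lnl_alt):
    """Return (statistic, p-value) for an LRT with 1 degree of freedom."""
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    # Chi-square survival function for df = 1 is erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2.0))
```

An output file that is empty, or that ends before codeml prints its lnL line, makes extract_lnl return None, which corresponds to the situation the warning describes.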

Clade model addition to presets

Hey Dev, I just came across this nice tool and am starting to try it. As a feature request, it would be handy to have a fourth preset nested analysis for the Clade Model (M2a_rel vs Clade model C). ;)

There is an error when I try to run GWideCodeML

Traceback (most recent call last):
  File "/home/zhangyanan/miniconda3/envs/gwcodeml/bin/gwidecodeml", line 33, in <module>
    sys.exit(load_entry_point('gwidecodeml==1.1', 'console_scripts', 'gwidecodeml')())
  File "/home/zhangyanan/miniconda3/envs/gwcodeml/lib/python3.6/site-packages/gwidecodeml-1.1-py3.6.egg/gwidecodeml/__main__.py", line 82, in main
  File "/home/zhangyanan/miniconda3/envs/gwcodeml/lib/python3.6/site-packages/gwidecodeml-1.1-py3.6.egg/gwidecodeml/pkg_utils/utils.py", line 55, in dup_tags
  File "/home/zhangyanan/miniconda3/envs/gwcodeml/lib/python3.6/site-packages/biopython-1.78-py3.6-linux-x86_64.egg/Bio/SeqIO/__init__.py", line 607, in parse
    return iterator_generator(handle)
  File "/home/zhangyanan/miniconda3/envs/gwcodeml/lib/python3.6/site-packages/biopython-1.78-py3.6-linux-x86_64.egg/Bio/SeqIO/FastaIO.py", line 183, in __init__
    super().__init__(source, mode="t", fmt="Fasta")
  File "/home/zhangyanan/miniconda3/envs/gwcodeml/lib/python3.6/site-packages/biopython-1.78-py3.6-linux-x86_64.egg/Bio/SeqIO/Interfaces.py", line 47, in __init__
    self.stream = open(source, "r" + mode)

'Nodes are not connected'

I'm using a species tree generated by OrthoFinder and the genome tags correspond to the names in my codon-aware alignment file. Here's the command and the output:

gwidecodeml -tree SpeciesTree_unrooted.txt -work_dir ~/HGT/flexible.dNdS/Orthogroup_Sequences/single.copy.fasta/codon.aware.alignments -cds .pal2nal -model BS -p 5 -branch branch_labels.csv -dnds

Traceback (most recent call last):
  File "/home/janani/anaconda3/envs/gwcodeml/bin/gwidecodeml", line 33, in <module>
    sys.exit(load_entry_point('gwidecodeml==1.1', 'console_scripts', 'gwidecodeml')())
  File "/home/janani/anaconda3/envs/gwcodeml/lib/python3.6/site-packages/gwidecodeml-1.1-py3.6.egg/gwidecodeml/__main__.py", line 134, in main
  File "/home/janani/anaconda3/envs/gwcodeml/lib/python3.6/site-packages/gwidecodeml-1.1-py3.6.egg/gwidecodeml/pkg_utils/utils.py", line 136, in prepare_codeml
  File "/home/janani/anaconda3/envs/gwcodeml/lib/python3.6/site-packages/gwidecodeml-1.1-py3.6.egg/gwidecodeml/pkg_utils/utils.py", line 527, in prune_tree
  File "/home/janani/anaconda3/envs/gwcodeml/lib/python3.6/site-packages/ete3/coretype/tree.py", line 532, in prune
    start, node2path = self.get_common_ancestor(to_keep, get_path=True)
  File "/home/janani/anaconda3/envs/gwcodeml/lib/python3.6/site-packages/ete3/coretype/tree.py", line 928, in get_common_ancestor
    raise TreeError("Nodes are not connected!")
ete3.coretype.tree.TreeError: 'Nodes are not connected!'

Happy to send any files that might be helpful in troubleshooting.

Empty null.txt and alt.txt files for some sequences

Hello everyone!

While reviewing the folders generated by a GWideCodeML run, I noticed that empty alt.txt and null.txt files were created for many sequences. Because of this, the LRT cannot be performed for these genes, and I also do not get their dN/dS values. I compared sequences that produced empty files with sequences that did not, and I cannot discern any differences. Has anyone experienced something similar? Thanks in advance.
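
To get an overview of which genes are affected, a minimal sketch like the following lists the zero-byte output files under a working directory. The function name empty_outputs is hypothetical, and the "_alt.txt" / "_null.txt" suffixes are taken from the file names shown in the logs on this page; adjust them if your files are named differently.

```python
from pathlib import Path


def empty_outputs(work_dir, suffixes=("_alt.txt", "_null.txt")):
    """Return sorted paths of zero-byte codeml output files under work_dir."""
    root = Path(work_dir)
    return sorted(
        p for p in root.rglob("*.txt")
        # str.endswith accepts a tuple of candidate suffixes.
        if p.name.endswith(suffixes) and p.stat().st_size == 0
    )


if __name__ == "__main__":
    for path in empty_outputs("."):
        print(path)
```

Running codeml by hand on the control file of one listed gene may then show the error message that was lost in the batch run.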

how to set only one species as foreground

Dear lauguma,

GWideCodeML is great work. However, when I set only one species as foreground, the following error appears:

Traceback (most recent call last):
  File "/DATA5/software/miniconda/envs/gwcodeml/bin/gwidecodeml", line 33, in <module>
    sys.exit(load_entry_point('gwidecodeml==1.1', 'console_scripts', 'gwidecodeml')())
  File "/DATA5/software/miniconda/envs/gwcodeml/lib/python3.6/site-packages/gwidecodeml-1.1-py3.6.egg/gwidecodeml/__main__.py", line 134, in main
  File "/DATA5/software/miniconda/envs/gwcodeml/lib/python3.6/site-packages/gwidecodeml-1.1-py3.6.egg/gwidecodeml/pkg_utils/utils.py", line 141, in prepare_codeml
  File "/DATA5/software/miniconda/envs/gwcodeml/lib/python3.6/site-packages/gwidecodeml-1.1-py3.6.egg/gwidecodeml/pkg_utils/utils.py", line 517, in mark_branches
AttributeError: 'str' object has no attribute 'node_id'

This error also happens with the example data when only one species is selected as foreground. How can I set only one species as foreground?

Best,
Leon

GWideCodeML is slow

I tried running the example and after one hour it had not finished. I then removed most of the .fna files, leaving just four alignments, and it still never finishes.

When I check one of the gene folders this is what I get:
total 56
-rw-r--r-- 1 r users 0 May 12 13:04 2NG.dN
-rw-r--r-- 1 r users 0 May 12 13:04 2NG.dS
-rw-r--r-- 1 r users 0 May 12 13:04 2NG.t
lrwxrwxrwx 1 r users 29 May 29 2020 codeml.ctl -> /usr/lib/paml/data/codeml.ctl
-rw-r--r-- 1 r users 0 May 12 13:04 lnf
-rw-r--r-- 1 r users 0 May 12 13:04 rst
-rw-r--r-- 1 r users 0 May 12 13:04 rst1
-rw-r--r-- 1 r users 0 May 12 13:04 rub
-rw-r--r-- 1 r users 2202 May 12 13:04 YBL102W_alt.ctl
-rw-r--r-- 1 r users 2203 May 12 13:04 YBL102W_null.ctl
-rw-r--r-- 1 r users 17559 May 12 13:04 YBL102W.phy
-rw-r--r-- 1 r users 22387 May 12 13:04 YBL102W_r01_null.txt
-rw-r--r-- 1 r users 230 May 12 13:04 YBL102W.tree

Log file :
2022-05-12 13:04:21,361 :: INFO :: Working directory: /home/r/Software/GWideCodeML/example_data
2022-05-12 13:04:21,361 :: INFO :: Multiple branch testing selected: 2 branches will be analyzed as foreground branches.
2022-05-12 13:04:21,361 :: INFO :: Total number of alignments provided: 4
2022-05-12 13:04:21,362 :: INFO :: Model selected: BS
2022-05-12 13:04:21,362 :: INFO :: Species tree: /home/r/Software/GWideCodeML/example_data/spp_unrooted_tree.nwk
2022-05-12 13:04:21,362 :: INFO :: Nr. of threads: 6
2022-05-12 13:04:21,362 :: INFO :: Starting GWideCodeML analysis... ROUND NAME: r01
2022-05-12 13:04:21,371 :: INFO :: Applying filters...
2022-05-12 13:04:21,371 :: INFO :: Minimum number of outgroup species: 3
2022-05-12 13:04:21,373 :: INFO :: 0 alignments removed by min. outgroup filter.
2022-05-12 13:04:21,373 :: INFO :: Minimum number of clade-of-interest species: 3
2022-05-12 13:04:21,376 :: INFO :: 0 alignments removed by min. clade-of-interest filter.
2022-05-12 13:04:21,376 :: INFO :: Starting analysis of 4 alignments
2022-05-12 13:04:21,443 :: INFO :: Control files successfully created, GWidecodeml is ready for codeml performance
2022-05-12 13:04:21,468 :: INFO :: Running codeml null hypothesis on YFR016C
2022-05-12 13:04:21,468 :: INFO :: Running codeml null hypothesis on YEL063C
2022-05-12 13:04:21,469 :: INFO :: Running codeml null hypothesis on YBR167C
2022-05-12 13:04:21,469 :: INFO :: Running codeml null hypothesis on YBL102W

Any suggestions on how to make the software faster? Would it help to provide more threads even though only four alignments are given?

Thank you for this tool anyway!

UnboundLocalError: local variable 'log' referenced before assignment

Hi,

I was trying to use GWideCodeML. I got the following error: UnboundLocalError: local variable 'log' referenced before assignment.

I took a piece of the code, the function lnl(codeml_in), and reran it with and without initializing the local variable "log" inside the function. I was able to reproduce the error by not initializing the log variable.

Can you let me know if there is a bug (the one I mention) in the program?

Best,
Niraj
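
The general mechanism behind this kind of error can be shown with a minimal sketch. The two functions below are hypothetical and are not the package's lnl() code: they only illustrate that a local name bound inside a conditional branch stays unbound when the branch never runs, and that binding it up front avoids the exception.

```python
def lnl_unsafe(lines):
    # 'log' is bound only if some line starts with "lnL".
    for line in lines:
        if line.startswith("lnL"):
            log = line
    return log  # raises UnboundLocalError when no line matched


def lnl_safe(lines):
    log = None  # bound on every path, so the return below is always valid
    for line in lines:
        if line.startswith("lnL"):
            log = line
    return log


if __name__ == "__main__":
    print(lnl_safe(["no likelihood here"]))
    try:
        lnl_unsafe(["no likelihood here"])
    except UnboundLocalError as err:
        print("UnboundLocalError:", err)
```

With the safe variant, the caller can test for None and report a missing value instead of crashing.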

UnicodeDecodeError when installing

Hi!

I'm following the conda install instructions, however I get

Traceback (most recent call last):
  File "setup.py", line 7, in <module>
    long_description = fh.read()
  File "/miniconda3/envs/gwcodeml/lib/python3.6/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 1727: ordinal not in range(128)

The cause seems to be a non-ASCII character in the README.md file. I think it's the í in Macías; when I delete it, the installation works.
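
The byte 0xc3 in the error message is consistent with this diagnosis, as the following minimal sketch shows: in UTF-8, "í" is encoded as the two bytes 0xc3 0xad, which the ASCII codec cannot decode. The helper read_text_utf8 is hypothetical and only illustrates passing an explicit encoding to open(); whether setup.py should be changed that way is for the maintainers to decide.

```python
raw = "Macías".encode("utf-8")  # 'í' becomes the two bytes 0xc3 0xad

try:
    raw.decode("ascii")
except UnicodeDecodeError as err:
    print("ascii failed:", err)

print(raw.decode("utf-8"))


def read_text_utf8(path):
    """Read a text file as UTF-8 regardless of the locale's default."""
    with open(path, encoding="utf-8") as fh:
        return fh.read()
```

Python 3.6 picks the default file encoding from the locale, so the error typically shows up on systems where the locale resolves to ASCII.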

Difference in gene number information in the log file and the results_r01_BS.tsv

Hi Laura,

In the log file (gwidecodeml.log), I see the following information.

INFO :: Total nr. of genes rejecting null hypothesis: 1033

However, when I check the results_r01_BS.tsv file there are only 239 genes.

Could you please let me know whether the number of genes reported in the results_r01_BS.tsv file should equal the number shown in the log file?

Regards,
Niraj

outputs without .tsv file

Hi,
Congratulations on the program! It is a great contribution to work with genomic data.
I have run the branch model several times and I always get the outputs, but not the .tsv file with the LRTs. Do I need to set a special flag, or should it be generated automatically?

I ran the analysis as follows:
gwidecodeml -model BM -work_dir $WD -cds .cd.msa -tree species_tree.nwk -dnds -branch branch1.csv -p 10

Best,
Daly
