The evogenomics_hyphy from veg

BUSTED conclusions do not match tutorial

Greetings, and thank you so much for the great Spielman et al. 2017 tutorial! As I'm working through using HYPHY 2.5.62(MP) for Darwin on arm64 ARM Neon SIMD zlib (v1.2.12), I get very different results for the first exercise. Specifically, I get the following output for 1 Selection / 5 BUSTED:

>Select a coding sequence alignment file (`/Users/chasenelson/Documents/GitHub/evogenomics_hyphy/datasets/`) ksr2.fna
>Loaded a multiple sequence alignment with **9** sequences, **949** codons, and **1** partitions from `/Users/chasenelson/Documents/GitHub/evogenomics_hyphy/datasets/ksr2.fna`

>branches => All

>srv => Yes
The number omega rate classes to include in the model (permissible range = [1,10], default value = 3, integer): 
>rates => 3

>multiple-hits => None
The number alpha rate classes to include in the model [1-10, default 3] (permissible range = [1,10], default value = 3, integer): 
>syn-rates => 3

>error-sink => No
The number of points in the initial distributional guess for likelihood fitting (permissible range = [1,10000], default value = 250, integer): 
>grid-size => 250
The number of initial random guesses to 'seed' rate values optimization (permissible range = [1,25], default value = 1, integer): 
>starting-points => 1


### Branches to test for selection in the BUSTED analysis
* Selected 15 branches to test in the BUSTED analysis: `HUM, PAN, Node6, GOR, Node5, PON, Node4, GIB, Node3, MAC, BAB, Node12, Node2, MAR, BUS`


### Obtaining branch lengths and nucleotide substitution biases under the nucleotide GTR model

>kill-zero-lengths => Yes
* Log(L) = -5768.11, AIC-c = 11582.26 (23 estimated parameters)
* 1 partition. Total tree length by partition (subs/site)  0.121

### Obtaining the global omega estimate based on relative GTR branch lengths and nucleotide substitution biases
* Log(L) = -5342.82, AIC-c = 10745.85 (30 estimated parameters)
* 1 partition. Total tree length by partition (subs/site)  0.134
* non-synonymous/synonymous rate ratio for *test* =   0.0341

### Improving branch lengths, nucleotide substitution biases, and global dN/dS ratios under a full codon model
* Log(L) = -5333.75, AIC-c = 10727.71 (30 estimated parameters)
* non-synonymous/synonymous rate ratio for *test* =   0.0306

### Performing the full (dN/dS > 1 allowed) branch-site model fit
* Log(L) = -5300.23, AIC-c = 10678.82 (39 estimated parameters)
* For *test* branches, the following rate distribution for branch-site combinations was inferred

|          Selection mode           |     dN/dS     |Proportion, %|               Notes               |
|-----------------------------------|---------------|-------------|-----------------------------------|
|        Negative selection         |     0.000     |    5.581    |                                   |
|        Negative selection         |     0.012     |   94.046    |                                   |
|      Diversifying selection       |     9.781     |    0.374    |                                   |

* The following rate distribution for site-to-site **synonymous** rate variation was inferred

|               Rate                | Proportion, % |               Notes               |
|-----------------------------------|---------------|-----------------------------------|
|               0.398               |    32.899     |                                   |
|               0.420               |    52.826     |                                   |
|               4.533               |    14.275     |                                   |


### Performing the constrained (dN/dS > 1 not allowed) model fit
* Log(L) = -5301.39, AIC-c = 10679.13 (38 estimated parameters)
* For *test* branches under the null (no dN/dS > 1 model), the following rate distribution for branch-site combinations was inferred

|          Selection mode           |     dN/dS     |Proportion, %|               Notes               |
|-----------------------------------|---------------|-------------|-----------------------------------|
|        Negative selection         |     0.000     |   89.934    |                                   |
|        Negative selection         |     0.000     |    6.759    |       Collapsed rate class        |
|         Neutral evolution         |     1.000     |    3.307    |                                   |

* The following rate distribution for site-to-site **synonymous** rate variation was inferred

|               Rate                | Proportion, % |               Notes               |
|-----------------------------------|---------------|-----------------------------------|
|               0.381               |    85.219     |                                   |
|               3.603               |    13.886     |                                   |
|              19.555               |     0.895     |                                   |

----
## Branch-site unrestricted statistical test of episodic diversification [BUSTED]
Likelihood ratio test for episodic diversifying positive selection, **p =   0.1564**.

I note in particular that the conclusion differs from that reached in the tutorial, i.e. I get p = 0.1564 rather than p = 0.0015. My gut says this is due to recent advances in synonymous rate variation, whose incorporation reveals that the signal of positive selection is not significant. However, it would also be great to have some reassurance that this result is reproducible and not something going wrong with my installation/machine. Would be very grateful for any feedback / verification / understanding!

With gratitude,
Chase

veg / evogenomics_hyphy Goto Github PK

evogenomics_hyphy's People

Contributors

Stargazers

Watchers

evogenomics_hyphy's Issues

BUSTED conclusions do not match tutorial

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent