evogenomics_hyphy's Issues

BUSTED conclusions do not match tutorial

Greetings, and thank you so much for the great Spielman et al. 2017 tutorial! As I'm working through using HYPHY 2.5.62(MP) for Darwin on arm64 ARM Neon SIMD zlib (v1.2.12), I get very different results for the first exercise. Specifically, I get the following output for 1 Selection / 5 BUSTED:

>Select a coding sequence alignment file (`/Users/chasenelson/Documents/GitHub/evogenomics_hyphy/datasets/`) ksr2.fna
>Loaded a multiple sequence alignment with **9** sequences, **949** codons, and **1** partitions from `/Users/chasenelson/Documents/GitHub/evogenomics_hyphy/datasets/ksr2.fna`

>branches => All

>srv => Yes
The number omega rate classes to include in the model (permissible range = [1,10], default value = 3, integer): 
>rates => 3

>multiple-hits => None
The number alpha rate classes to include in the model [1-10, default 3] (permissible range = [1,10], default value = 3, integer): 
>syn-rates => 3

>error-sink => No
The number of points in the initial distributional guess for likelihood fitting (permissible range = [1,10000], default value = 250, integer): 
>grid-size => 250
The number of initial random guesses to 'seed' rate values optimization (permissible range = [1,25], default value = 1, integer): 
>starting-points => 1


### Branches to test for selection in the BUSTED analysis
* Selected 15 branches to test in the BUSTED analysis: `HUM, PAN, Node6, GOR, Node5, PON, Node4, GIB, Node3, MAC, BAB, Node12, Node2, MAR, BUS`


### Obtaining branch lengths and nucleotide substitution biases under the nucleotide GTR model

>kill-zero-lengths => Yes
* Log(L) = -5768.11, AIC-c = 11582.26 (23 estimated parameters)
* 1 partition. Total tree length by partition (subs/site)  0.121

### Obtaining the global omega estimate based on relative GTR branch lengths and nucleotide substitution biases
* Log(L) = -5342.82, AIC-c = 10745.85 (30 estimated parameters)
* 1 partition. Total tree length by partition (subs/site)  0.134
* non-synonymous/synonymous rate ratio for *test* =   0.0341

### Improving branch lengths, nucleotide substitution biases, and global dN/dS ratios under a full codon model
* Log(L) = -5333.75, AIC-c = 10727.71 (30 estimated parameters)
* non-synonymous/synonymous rate ratio for *test* =   0.0306

### Performing the full (dN/dS > 1 allowed) branch-site model fit
* Log(L) = -5300.23, AIC-c = 10678.82 (39 estimated parameters)
* For *test* branches, the following rate distribution for branch-site combinations was inferred

|          Selection mode           |     dN/dS     |Proportion, %|               Notes               |
|-----------------------------------|---------------|-------------|-----------------------------------|
|        Negative selection         |     0.000     |    5.581    |                                   |
|        Negative selection         |     0.012     |   94.046    |                                   |
|      Diversifying selection       |     9.781     |    0.374    |                                   |

* The following rate distribution for site-to-site **synonymous** rate variation was inferred

|               Rate                | Proportion, % |               Notes               |
|-----------------------------------|---------------|-----------------------------------|
|               0.398               |    32.899     |                                   |
|               0.420               |    52.826     |                                   |
|               4.533               |    14.275     |                                   |


### Performing the constrained (dN/dS > 1 not allowed) model fit
* Log(L) = -5301.39, AIC-c = 10679.13 (38 estimated parameters)
* For *test* branches under the null (no dN/dS > 1 model), the following rate distribution for branch-site combinations was inferred

|          Selection mode           |     dN/dS     |Proportion, %|               Notes               |
|-----------------------------------|---------------|-------------|-----------------------------------|
|        Negative selection         |     0.000     |   89.934    |                                   |
|        Negative selection         |     0.000     |    6.759    |       Collapsed rate class        |
|         Neutral evolution         |     1.000     |    3.307    |                                   |

* The following rate distribution for site-to-site **synonymous** rate variation was inferred

|               Rate                | Proportion, % |               Notes               |
|-----------------------------------|---------------|-----------------------------------|
|               0.381               |    85.219     |                                   |
|               3.603               |    13.886     |                                   |
|              19.555               |     0.895     |                                   |

----
## Branch-site unrestricted statistical test of episodic diversification [BUSTED]
Likelihood ratio test for episodic diversifying positive selection, **p =   0.1564**.

In particular, the conclusion differs from the one reached in the tutorial: I get p = 0.1564 rather than p = 0.0015. My guess is that this is due to recent advances in modeling synonymous rate variation (note `srv => Yes` in the run above), whose incorporation suggests that the signal of positive selection is not significant. However, it would be great to have some reassurance that this result is reproducible and not caused by something going wrong with my installation or machine. I would be very grateful for any feedback, verification, or explanation!
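As a sanity check on the reported p-value, here is a minimal sketch that recomputes the likelihood ratio test from the two Log(L) values printed above (full model: -5300.23, constrained model: -5301.39). It assumes the p-value comes from a 50:50 mixture of a point mass at zero and a chi-squared distribution with 2 degrees of freedom. That mixture is my assumption, not something stated in the output, although it does land close to the reported 0.1564. The function names are hypothetical and are not part of HyPhy.

```python
import math


def lrt_statistic(loglik_full: float, loglik_null: float) -> float:
    """Likelihood ratio statistic: twice the log-likelihood difference."""
    return 2.0 * (loglik_full - loglik_null)


def chi2_sf_2df(x: float) -> float:
    """Survival function of a chi-squared distribution with 2 d.f.

    For 2 degrees of freedom this has the closed form exp(-x / 2).
    """
    return math.exp(-x / 2.0) if x > 0 else 1.0


def mixture_p_value(lrt: float, point_mass_weight: float = 0.5) -> float:
    """p-value under an ASSUMED mixture of a point mass at 0 and chi2(2).

    The 0.5 weight is an assumption made for this sketch.
    """
    if lrt <= 0:
        return 1.0
    return (1.0 - point_mass_weight) * chi2_sf_2df(lrt)


# Log-likelihoods copied from the output above (rounded to two decimals there).
loglik_full = -5300.23  # dN/dS > 1 allowed
loglik_null = -5301.39  # dN/dS > 1 not allowed

lrt = lrt_statistic(loglik_full, loglik_null)
p_value = mixture_p_value(lrt)

print(f"LRT = {lrt:.2f}, p = {p_value:.4f}")
```

With these rounded inputs the statistic is about 2.32 and the p-value is about 0.157. The small gap from the reported 0.1564 is what I would expect from the log-likelihoods being printed to only two decimal places.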

With gratitude,
Chase
