aagos's People

Contributors

emilydolson, leg2015, mercere99

Forkers

amlalejini

aagos's Issues

Fix Data node overlap entry

Currently, the data for each overlap section is written with column names such as 1_overlap. A Python identifier can't start with a digit, so I have to rename every column header by hand before working with the data. Come up with a versatile, streamlined way around this.
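One option is to fix the names where they are generated, so the headers never start with a digit. A minimal sketch, assuming the headers are built from the overlap count; OverlapColumnName, OverlapColumnNames, and IsIdentifierSafe are hypothetical helpers, not existing Aagos functions:

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: build a header that starts with a letter
// ("overlap_1") instead of a digit ("1_overlap").
std::string OverlapColumnName(std::size_t overlap_count) {
  return "overlap_" + std::to_string(overlap_count);
}

// Build the headers for overlap counts 0..max_overlap.
std::vector<std::string> OverlapColumnNames(std::size_t max_overlap) {
  std::vector<std::string> names;
  for (std::size_t k = 0; k <= max_overlap; ++k) {
    names.push_back(OverlapColumnName(k));
  }
  return names;
}

// Returns true if `name` uses only ASCII letters, digits, and underscores
// and does not start with a digit (Python keywords are not checked).
bool IsIdentifierSafe(const std::string &name) {
  if (name.empty()) return false;
  const unsigned char first = static_cast<unsigned char>(name[0]);
  if (!(std::isalpha(first) || first == '_')) return false;
  for (unsigned char c : name) {
    if (!(std::isalnum(c) || c == '_')) return false;
  }
  return true;
}
```

With this naming, IsIdentifierSafe("1_overlap") is false and IsIdentifierSafe("overlap_1") is true, so the check could also be run over every header before a file is written.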

Debug Gradient Model

  • convert the old scripts over to sbatch and see if it still works - #23
  • debug gradient stuff by hand
  • make the script for gradient
    • copy the same scripts from before
    • first need to figure out mutation rate - should be the same
    • then changing environments
  • Test gradient scripts

Test data tracking

  • Test overall data tracking

    • run a test example and see if the data output makes sense
  • Test histogram overlap calculation

  • Test avg # genes per site calculation

  • Test number of neighbor genes calculation

  • Retest overall data tracking

    • run a very small pop and see what output should be, compare to hand-calculated values to confirm all values are as intended

Segmentation Fault in Aagos with Small Genomes

  • Figure out what's causing segmentation fault
    • Bad params:
      • -NUM_BITS = 10
      • -NUM_GENES = 3
      • -POP_SIZE = 5
      • -GENE_SIZE = 3
    • throws a segmentation fault
  • Find why bin_array_offset is producing a negative number with small genomes
    • subtracts num_genes from min_size to get bin_array_offset, which I'm not sure is right
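
While the right formula is still an open question, the negative value could at least be caught before it is used as an index. A minimal sketch, assuming bin_array_offset is computed as min_size - num_genes as described above; ComputeBinArrayOffset is a hypothetical helper:

```cpp
#include <cstddef>

// Hypothetical guard around the subtraction described above.
// Returns false (and leaves `offset` untouched) when min_size < num_genes,
// i.e. when min_size - num_genes would be negative.
bool ComputeBinArrayOffset(std::size_t min_size, std::size_t num_genes,
                           std::size_t &offset) {
  if (min_size < num_genes) return false;
  offset = min_size - num_genes;
  return true;
}
```

The caller can then report the bad parameter combination instead of indexing with an invalid offset.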

Histogram Size not Variable Enough

Currently, the number of bins in the histogram is defined as the number of genes + 1. However, if that were ever to change, the way histogram bins are defined in AagosOrg and accessed by the data tracking in AagosWorld would break. It would be good to generalize and clean this up. This method also does not appear to be working properly at the moment, which is preventing us from getting vital information about the organisms.

Streamline data visualization notebook

Currently, the visualization notebook script is not very versatile despite a lot of boilerplate code. Need to go in and automate and streamline much of the process.

Add data tracking

  • Decide on overlap metric

    • Overlap histogram
  • Finish data collection function

    • AagosOrg, adding tracking info
  • Add all statistics data tracking

    • histogram
    • neighboring genes
    • average gene overlap
  • histogram

    • Number of neutral sites
    • Number of single overlap sites
    • num sites with overlap of >1
  • Number of neighbor genes

  • Average gene overlap
    * Average number of genes per site
    * num_genes * gene_length / genome_length

  • Add snapshot of representative organism for statistics update

  • Add data tracking for snapshots

  • Snapshot org data
    * Gene start positions
    * Genome length
    * Fitness

  • Move data calculation to separate function that node pulls from only on snapshot updates

  • Hook in data collection to world creation

Create launch script for HPCC

need a bash script to launch jobs on cluster

  • bash script should launch 32 qsub calls
  • each qsub should run 40 Aagos instances
  • wall time: 4 hrs
  • num nodes: 1

Fitness Comparison Runs

  • Need to compare the fitness of organisms evolved in static environments to those from changing environments. Will do so across 2 different axes:

Axis 1

Gene locations

  • Will look at how locking down gene locations affects genome length
    1. lock down gene locations for comparison
    2. allow gene locations to mutate for comparison

Axis 2

Environment

  • Will compare static and changing with different environments
  • For all runs below, run both static and changing environments for 50,000 gens.
  • The changes below take effect after 50,000 gens, and all runs then continue for another 10,000 gens.
    1. Fix the environment of the changing-environment runs to what it was at the end of gen 50,000, so the environment becomes static
    2. Give both static and changing env. organisms new randomly generated static environment
    3. Give the changing env. organisms the static env. that the static organisms had been evolving with

Paper To Do

  • literature review
  • start an overleaf doc to start writing down literature review notes
  • Talk to Dr. Ofria about what journal to focus on
    • PLOS Comp Bio - next easiest
    • Artificial Life - easiest
    • Target somewhere more bio-focused?
    • Evolution journal - slightly more prestigious than PLOS and read by the people we want to reach
    • PNAS - pretty moon shot
  • Once we've picked a target, lay out paper structure and length

Create Python Script to Clean Data

  • Python script to clean data
  • should loop through each directory for each mutation combo and add to one large csv
  • Want two scripts:
    1. Grabs data from:
    • representative.csv
    • gene_stats.csv
    • fitness.csv
    2. Grabs data from:
    • representative.csv
    • gene_stats.csv

Getting a malloc error when Aagos reaches MAX_GENS

Getting malloc error:
Aagos(31016,0x7fff7b538300) malloc: *** error for object 0x7fd4d1409040: pointer being freed was not allocated
*** set a breakpoint in malloc_error_break to debug
Abort trap: 6
Happens whenever Aagos reaches the last generation. Looks like a pointer is getting freed before the last generation ends as both the print and snapshot log make it to gen 99 before the program terminates.

Find Slowdown in Aagos Runs

Certain Aagos runs are taking longer than the 4 hr walltime given for each job when submitted to the hpcc. This is problematic as the runs should not be taking this long. Trying to troubleshoot what's slowing down these specific runs so much.

  • Add gprof to runs to better analyze time performance of program
    • Added a new option to makefile to make Aagos with gprof enabled
  • Compare runs of Aagos that were originally slow to runs that finished before walltime
    • Found that the slow runs were taking the same amount of time as the faster runs
    • Discovered that the seed was not being set early enough to make runs deterministic, so values differed across runs
  • Fix setting seed in Aagos so it occurs before fitness tables constructed
    • fixed the seed setting in Aagos so it occurs on construction
    • Confirmed that values were identical across runs with the same seed
  • Rerun initial mutation rate test with new seed and gprof to pinpoint slowdown
    • Using Instruments on Mac, was able to pinpoint parameter combination that causes a slowdown on both the hpcc and my machine
      • GENE_MOVE_PROB = .03
      • BIT_FLIP_PROB = .1
      • BIT_INS_PROB = .003
      • BIT_DEL_PROB = .003
      • SEED = 4524
    • The area of code that seems to be the problem is the mutation function.

      Aagos/source/AagosWorld.h

      Lines 103 to 183 in 57dbc3e

        std::function<size_t(AagosOrg &, emp::Random &)> mut_fun =
          [this](AagosOrg &org, emp::Random &random) {
            // Do gene moves.
            size_t num_moves = random.GetRandBinomial(org.GetNumGenes(), config.GENE_MOVE_PROB());
            for (size_t m = 0; m < num_moves; m++)
            {
              size_t gene_id = random.GetUInt(org.GetNumGenes());
              org.gene_starts[gene_id] = random.GetUInt(org.GetNumBits());
            }
            // Do bit flips mutations
            size_t num_flips = random.GetRandBinomial(org.GetNumBits(), config.BIT_FLIP_PROB());
            for (size_t m = 0; m < num_flips; m++)
            {
              const size_t pos = random.GetUInt(org.GetNumBits());
              org.bits[pos] ^= 1;
            }
            // Get num of insertions and deletions.
            int num_insert = random.GetRandBinomial(org.GetNumBits(), config.BIT_INS_PROB());
            int num_delete = random.GetRandBinomial(org.GetNumBits(), config.BIT_DEL_PROB());
            const int proj_size = (int)org.bits.GetSize() + num_insert - num_delete;
            // checks gene size is within range
            if (proj_size > config.MAX_SIZE())
            { // if size of genome larger than max, restrict to max size
              num_insert -= proj_size - config.MAX_SIZE();
            }
            else if (proj_size < config.MIN_SIZE())
            { // else if size of genome smaller than min, restrict as well
              num_delete -= config.MIN_SIZE() - proj_size;
            }
            // asserts size limitations
            emp_assert((int)org.bits.GetSize() + num_insert - num_delete >= config.MIN_SIZE(),
                       "the genome size can't be smaller than the genome length, else BitSet breaks");
            emp_assert((int)org.bits.GetSize() + num_insert - num_delete <= config.MAX_SIZE(), "some limit on bloat of program");
            // Do insertions
            for (int i = 0; i < num_insert; i++) // For each insertion that occurs,
            {
              const size_t pos = random.GetUInt(org.GetNumBits()); // Figure out position for insertion.
              org.bits.Resize(org.bits.GetSize() + 1); // Increase size to make room for insertion.
              emp::BitVector mask(pos, 1); // Setup a mask to preserve early bits.
              mask.Resize(org.bits.GetSize()); // Align mask size.
              // Now build the new string!
              org.bits = (mask & org.bits) | ((org.bits << 1) & ~mask);
              org.bits[pos] = random.P(0.5); // Randomize the new bit.
              // Shift any genes that started at pos or later.
              for (auto &x : org.gene_starts)
                if (x >= pos)
                  x++;
            }
            // Do deletions
            for (int i = 0; i < num_delete; i++) // For each deletion that occurs,
            {
              size_t pos = random.GetUInt(org.GetNumBits()); // Figure out position to delete.
              emp::BitVector mask(pos, 1); // Setup a mask to preserve early bits.
              mask.Resize(org.bits.GetSize()); // Align mask size.
              org.bits = (mask & org.bits) | ((org.bits >> 1) & ~mask); // Build the new string!
              org.bits.Resize(org.bits.GetSize() - 1); // Decrease size to account for deletion
              // Shift any genes that started at pos or later.
              if (pos == 0)
                pos = 1; // Adjust position if beginning was deleted.
              for (auto &x : org.gene_starts)
                if (x >= pos)
                  x--;
            }
            return num_moves + num_flips + num_insert + num_delete; // Returns total num mutations
          };
        SetMutFun(mut_fun); // set mutation function of world to above
        SetPopStruct_Mixed(true); // uses well-mixed population structure
        SetDataTracking(); // sets up data tracking
      }
      Not sure yet what the slowdown is caused by
  • Find and fix what is slowing down the code in the mutation function
    • Pinpointed the slowdown to the insertion step in mut_fun
      • Once gene_length exceeded 1000, runs sped up dramatically
    • The slowdown was caused by the GetRandBinomial call
      • With N < 1000, the binomial draw was computed manually, which is very slow (N = 999 is the worst case)
    • To fix, @mercere99 added Binomial.h class, which generates static distribution
    • @emilydolson updated code in AagosWorld to pull mutation values from these dists.
      • Now creates one dist for gene mutation
      • Creates a dist for each possible gene length for bit flip, insert, and deletion mutations
    • Runs that previously took minutes to hours now finish in ~13 seconds!
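
The idea behind the fix (build the distribution once, then draw from it cheaply) can be sketched as below. This is not the Binomial.h class added to Empirical; BinomialTable is a hypothetical stand-in that precomputes the cumulative distribution for fixed N and p and samples by binary search, using the standard library's random number generators rather than emp::Random.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical precomputed Binomial(n, p) table (illustration only).
class BinomialTable {
public:
  BinomialTable(std::size_t n, double p) : cdf_(n + 1, 0.0) {
    if (p <= 0.0) {  // always 0 successes
      std::fill(cdf_.begin(), cdf_.end(), 1.0);
      return;
    }
    if (p >= 1.0) {  // always n successes
      cdf_[n] = 1.0;
      return;
    }
    // P(0) = (1-p)^n, then P(k+1) = P(k) * (n-k)/(k+1) * p/(1-p).
    double pmf = std::pow(1.0 - p, static_cast<double>(n));
    double running = pmf;
    cdf_[0] = std::min(running, 1.0);
    const double odds = p / (1.0 - p);
    for (std::size_t k = 0; k < n; ++k) {
      pmf *= odds * static_cast<double>(n - k) / static_cast<double>(k + 1);
      running += pmf;
      cdf_[k + 1] = std::min(running, 1.0);
    }
    cdf_[n] = 1.0;  // guard against rounding error in the running sum
  }

  // Cumulative probability P(X <= k).
  double Cdf(std::size_t k) const { return cdf_[k]; }

  // Draw one value: one uniform number plus a binary search, instead of
  // n separate coin flips per draw.
  template <typename Rng>
  std::size_t Sample(Rng &rng) const {
    std::uniform_real_distribution<double> uniform(0.0, 1.0);
    const double u = uniform(rng);
    auto it = std::upper_bound(cdf_.begin(), cdf_.end(), u);
    if (it == cdf_.end()) --it;
    return static_cast<std::size_t>(it - cdf_.begin());
  }

private:
  std::vector<double> cdf_;
};
```

Following the notes above, one table would be built per genome length for each of the bit flip, insertion, and deletion probabilities, and a single table for gene moves. Each draw then costs a binary search rather than N trials.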

Model Changes

suggested by Claus Wilke at BEACON Congress 2018

  • Currently, gene representation doesn't have any agreement
  • no relationship between bits and gene fitness
  • want to see whether the shrinking effect is due to a smaller mutational target or to lethal mutations / drift robustness

To-Do:

  • need to add correlation between bit sequence in gene and fitness, currently 2 possible options:
    1. turn each gene into an NK landscape
    2. Correlate gene fitness to distance from optimal bit sequence via gradient
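
Option 2 could look something like the sketch below, which scores a gene by the fraction of its bits that match an optimal sequence (1 minus the normalized Hamming distance). The linear scaling, the function name GradientGeneFitness, and the use of std::vector<bool> in place of the project's bit vectors are all assumptions made for illustration.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical gradient fitness for one gene: the fraction of positions
// where the gene matches the target sequence. Returns a value in [0, 1],
// or 0 if the two sequences are empty or differ in length.
double GradientGeneFitness(const std::vector<bool> &gene_bits,
                           const std::vector<bool> &target_bits) {
  if (gene_bits.empty() || gene_bits.size() != target_bits.size()) return 0.0;
  std::size_t matches = 0;
  for (std::size_t i = 0; i < gene_bits.size(); ++i) {
    if (gene_bits[i] == target_bits[i]) ++matches;
  }
  return static_cast<double>(matches) / static_cast<double>(gene_bits.size());
}
```

Under this scheme each bit flip toward the target raises gene fitness by 1 / gene length, so there is a direct relationship between bits and gene fitness.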

Coding Sites not adding up to correct values

  • The number of coding sites often adds up to greater than the total number of gene bits, which shouldn't be possible
  • Find out what is causing the values to not add up correctly
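
A consistency check along these lines might help narrow down where the counts go wrong. It is a sketch under assumptions: bins[k] holds the number of sites covered by exactly k genes, all genes share one length, and no gene is longer than the genome. The function names are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Number of coding sites: every site covered by at least one gene.
std::size_t CountCodingSites(const std::vector<std::size_t> &bins) {
  std::size_t total = 0;
  for (std::size_t k = 1; k < bins.size(); ++k) total += bins[k];
  return total;
}

// Total gene bits implied by the histogram: a site in bin k is used by k genes.
std::size_t CountGeneBits(const std::vector<std::size_t> &bins) {
  std::size_t total = 0;
  for (std::size_t k = 1; k < bins.size(); ++k) total += k * bins[k];
  return total;
}

// The histogram is consistent when the implied gene bits equal
// num_genes * gene_length and the coding sites do not exceed that total.
bool HistogramIsConsistent(const std::vector<std::size_t> &bins,
                           std::size_t num_genes, std::size_t gene_length) {
  const std::size_t expected_bits = num_genes * gene_length;
  return CountGeneBits(bins) == expected_bits &&
         CountCodingSites(bins) <= expected_bits;
}
```

Running this on each snapshot would flag the first update at which the coding-site count exceeds the total number of gene bits.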
