Giter VIP home page Giter VIP logo

namegen's Introduction

Namegen

Three drivers are available:

  • SQLite - persists as sqlite database
    • Safe storage
    • Slow
    • Crash Resistant
      • Safe if crash happens outside a call
      • Relies on rollback if crash happens in call
  • Memory - persists as compressed file on disk
    • Saves as compressed file on disk
      • Perhaps not as portable
    • Much Faster
    • Constant memory use
    • Not crash safe
  • Hybrid
    • Pretty much as fast as the memory one
    • Offloads saving to another thread (SQLite)
    • Uses more and more memory
    • Lags behind generation
    • Not crash safe
      • Saving lags behind
    • Should be interoperable with SQLite driver

Environment Variables

The workdir in Dockerfile is /opt/namegen

NAMEGEN_FILE ( default: wordlist.csv )
- This file specifies names for generation

NAMEGEN_DB_FILE ( default: namegen.db )
- Database file for namegen
- Hybrid and SQLite drivers only

NAMEGEN_MEMORY_STATE ( default: namegen_memory_state )
- Memory driver state file
- Memory driver only

NAMEGEN_DRIVER ( default: hybrid )
- Options: memory, sqlite
- Defines which driver is user
- Server only

Test

  • Coverage is 100%
  • Full gen tests are marked skip because of comparatively long runtime
    • These generate all values available

TODO and Improvements

  • Add tests that test regaining state
  • Add driver for Redis or MariaDB for distributed use
    • Going to be slower
  • Make hybrid model compress events
    • Remove individual counts
    • Save on interval and take the last event maybe

Generation Speed

This is not really relevant in actual use most probably, but shows the differences between the drivers. If the pool of words is a lot larger, then the memory usage will go up. The Hybrid driver will lag behind in saving the events and needs to catch up.

{
  "sqlite": {
    "time": "31.05 ms",
    "memory_increase": "0.27%",
    "memory_maximum": "36.68 mb",
    "memory_minimum": "36.59 mb"
  },
  "memory": {
    "time": "0.48 ms",
    "memory_increase": "0.00%",
    "memory_maximum": "36.80 mb",
    "memory_minimum": "36.80 mb"
  },
  "hybrid": {
    "time": "0.64 ms",
    "memory_increase": "0.09%",
    "memory_maximum": "37.03 mb",
    "memory_minimum": "37.00 mb"
  },
  "count": 1000
}
{
  "sqlite": {
    "time": "2218.99 ms",
    "memory_increase": "0.80%",
    "memory_maximum": "37.34 mb",
    "memory_minimum": "37.04 mb"
  },
  "memory": {
    "time": "40.61 ms",
    "memory_increase": "0.00%",
    "memory_maximum": "37.49 mb",
    "memory_minimum": "37.49 mb"
  },
  "hybrid": {
    "time": "49.99 ms",
    "memory_increase": "27.42%",
    "memory_maximum": "48.04 mb",
    "memory_minimum": "37.70 mb"
  },
  "count": 100000
}
{
  "sqlite": {
    "time": "21669.93 ms",
    "memory_increase": "0.70%",
    "memory_maximum": "37.50 mb",
    "memory_minimum": "37.24 mb"
  },
  "memory": {
    "time": "406.25 ms",
    "memory_increase": "0.00%",
    "memory_maximum": "37.65 mb",
    "memory_minimum": "37.65 mb"
  },
  "hybrid": {
    "time": "488.76 ms",
    "memory_increase": "269.45%",
    "memory_maximum": "139.71 mb",
    "memory_minimum": "37.81 mb"
  },
  "count": 1000000
}

Notes

Period of sum of two periodic functions (a,b) is lcm(a,b)

I.E: (wordgen.py)

START.rotate(1) => Period of len(START) = a
END.rotate(1) => Period of len(END) = b

Proof:

When START rotates fully END has rotated a / b laps.
Meaning END will fully rotate when n * a / b is in N.
=> lcm(a,b) * a / b.

I.E: 
Choosing 2,4 we only get 4 distinct possibilities.
BUT: 2,5 => 10 which is much better.

Now this needs to be tweaked as high as possible.

So keep this in mind when tweaking the list.

At the time of writing it was at 
- len(START)=17
- len(END)=16
=> 272

Remember to check for duplicates ^(.+)$[^#]+^\1$

This can generate 2 720 000 unique values in format StartEnd#0000.

Check this for how the number generation works Wikipedia, LCG

Then numbering uses a LCG with parameters:

- m = 10000
- a = 21
- c = 1
=> period: 10000

Which is used to produce the whole range [0, 9999]

The counts are generated for the whole sum function.
Initially the choice of starting sum is random.
This step uses random.randint(a,b)

The state can be loaded and saved at will.

Since the way LCG's work the state can even be 
recovered from the DB if necessary.

The combined generation limit is set by:
- PERIOD_OF_SUM_FUNCTION * LCG_M

After which the values will repeat.

Previously used this to regain hybrid state instead of rowid

# Regain state
cnt = 0
i = next_start
j = next_end
while len(unsorted) > 0:
    for idx, item in enumerate(unsorted):
        if i == item.start.id and j == item.end.id:
            del unsorted[idx]
            self.queue.append(item)
            break
    cnt += 1
    i = i + 1 if i < max_start else 1
    j = j + 1 if j < max_end else 1

namegen's People

Contributors

joniumgit avatar

Watchers

 avatar

Forkers

muistotkartalla

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.