
declarativepython's Introduction

Declarative Python

A calculation engine for automatically stitching together functions, and walking the dependency tree.

This python file is the best example of how this package can be used to simplify your calculations.

Two major use cases are:

  1. Letting you focus on the individual functions and not the structure of the program.
  2. Writing timeseries projection models whose highly interconnected logic would be a pain to structure in a standard program.

How To Use

Install package

python -m pip install git+https://github.com/hearnderek/DeclarativePython

Oh yeah. I know. But I want to be able to use this on my remote system and not have to deal with PyPI just yet.

Basic usage

import declarative

def f() -> str:
    print('f')
    return 'hello'
    
def g(f: str) -> str:
    print('g')
    return f + ' world'
    
def output(f: str, g: str):
    print(f)
    print(g)

if __name__ == '__main__':
    declarative.Run()
~$ python hello_declarative.py
f
g
hello
hello world
~$ 

Okay, what just happened?

In the above example we have three functions. f and g return values, and g uses the value returned by f. Since the parameter name is exactly the same as the function name, this package -- declarative -- does all of the plumbing work to make that happen. In the third function, output, we take f and g and print their return values. Our functions were executed in the order f -> g -> output.

You may have noticed when looking at the output that every function is only executed once. Every function's output is memoized, or in other words saved in memory for later use. This makes sure your code runs efficiently without any extra effort.

Forward Projection Calculations

import declarative

def count_up(t, count_up):
    if t == 0:
        return 0
    else:
        return count_up[t-1] + 1

if __name__ == '__main__':
    df = declarative.Run(t=10)
    print(df)
~$ python forward_projection.py
             count_up
result_id t
0         0         0
          1         1
          2         2
          3         3
          4         4
          5         5
          6         6
          7         7
          8         8
          9         9
~$  

Woah woah woah. What?

t is a special parameter in this system that tells the engine you are doing calculations with distinct timesteps. You tell the Run function how many time steps you need, and within your functions you can calculate forward through 0..n. This type of programming is super common in Excel: "using the result of the cell above, do a calculation." You can now easily convert those Excel calculations into highly similar code. Let the declarative package handle the loops; you handle the logic.

For the data savvy, you may have noticed that a pandas DataFrame was returned by the Run function. You can write out your projections in standard Python, then do your analysis in pandas.
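
As a further illustration, here is a small sketch of an Excel-style running balance written the same way as count_up above. The balance name is made up for this example, and it only assumes the wiring rules already shown (the special t parameter and name[t-1] for the previous timestep).

import declarative

def balance(t, balance):
    if t == 0:
        return 100.0
    # Excel-style: "previous cell times (1 + interest) plus a deposit"
    return balance[t-1] * 1.01 + 10.0

if __name__ == '__main__':
    df = declarative.Run(t=5)
    print(df)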

Cool, but I still want normal functions

import declarative

@declarative.ignore
def print_helper(s):
    print(s + ' world')

def f():
    print_helper('hello')

if __name__ == '__main__':
    declarative.Run()
~$ python ignoreme.py
hello world
~$

Do IO functions block everything else?

Naw, I got you

import declarative
import time

@declarative.io_bound
def slow_one():
    time.sleep(1)
    return 1

@declarative.io_bound
def slow_two():
    time.sleep(1)
    return 2
    
@declarative.io_bound
def slow_three(slow_one, slow_two):
    time.sleep(1)
    return slow_one + slow_two

def output(slow_one, slow_two, slow_three):
    print(slow_one + slow_two + slow_three)

if __name__ == '__main__':
    # takes 2 seconds not 3
    declarative.Run()
~$ python slow_io.py
6
~$

Warnings

  • I am thoroughly abusing Python within this package. Use at your own risk.
  • I have not implemented any garbage collection, so all function results must be able to fit in memory.
  • You can only reliably use [t], [t-1], and [t+1] when accessing values being passed around in projections. (sorry)

declarativepython's People

Contributors

dependabot[bot], hearnderek


declarativepython's Issues

load in tmp_{col}.py on first run IF there are no changes to the originating module.

How to tell if there was a change to the file:

Hacky Idea 1:

When generating python file

  1. Generate MD5 hash of the user module. (only imagining a single file)
  2. Place hash into generated flat script as a standard string variable user_module_hash.

When initializing engine

  1. search for matching flat script within working directory
  2. Generate MD5 hash of the user module.
  3. test user_module_hash against generated hash
  4. if match use flat script
  5. if not match ignore flat script
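
A minimal sketch of that check, assuming the generated flat script contains a plain user_module_hash = "..." assignment near the top (the helper names here are illustrative, not existing engine methods):

import hashlib
import re

def module_hash(path):
    # MD5 of the user's module file on disk
    with open(path, 'rb') as f:
        return hashlib.md5(f.read()).hexdigest()

def flat_script_is_current(flat_script_path, user_module_path):
    # scan for the embedded hash without importing the (potentially huge) flat script
    pattern = re.compile(r'^user_module_hash\s*=\s*[\'"]([0-9a-f]{32})[\'"]')
    with open(flat_script_path) as f:
        for line in f:
            match = pattern.match(line)
            if match:
                return match.group(1) == module_hash(user_module_path)
    return False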

Adjacent Idea 1:

Once we start generating these files, it would be nice to have a central place to keep them, out of the user's way.
Store the flat file in a sqlite database, with its hash, create date, etc.
Alternatively we could store the files in a zipfile which acts as a document store.
This then means we have a place to store our results as well (which would then mean looking into optimizing writes to sqlite).
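
A minimal sketch of the sqlite variant, assuming a simple table layout (the flat_scripts table and its columns are made up for illustration):

import hashlib
import sqlite3
import time

def store_flat_script(db_path, source):
    # keyed on the script's own hash, with a creation timestamp alongside it
    script_hash = hashlib.md5(source.encode()).hexdigest()
    with sqlite3.connect(db_path) as conn:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS flat_scripts "
            "(hash TEXT PRIMARY KEY, created REAL, source TEXT)")
        conn.execute(
            "INSERT OR REPLACE INTO flat_scripts VALUES (?, ?, ?)",
            (script_hash, time.time(), source))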

Refactor the Engine class.

There are a lot of unneeded members, and some logic should be broken out into separate classes.

Specifically:

  1. Remove all unneeded members
  2. Rename remaining members
  3. Break out flat file generation into its own class for easier refactoring.
  4. Add method for choosing optimization style

home_economics

I'm trying to build a forward projection system which shows you what your net worth will be in x years based on your income, expenses, tax, debt, and investments.

This can basically expand indefinitely. Income tax on its own is a non-trivial topic. I don't yet consider car payments, mortgages, the complexities of buying new cars, unexpected expenses, bankruptcy, pay raises correlated with age, or randomness at all.

Decide on a storage medium.

It would be nice to be able to save work directly to an output file.
It would also be nice to use said output files when working with multi-process or multi-machine workloads.
It would also be important when dealing with high memory workloads.

Ideas:

  1. results_to_dataframe() -> dataframe_to_csv()
    • limited to what can be stored as a string
  2. results_to_json()
  3. results_to_sqlite()
  4. result_to_zipdb()
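
None of the methods above exist yet; as a baseline, ideas 1-3 can already be approximated through the pandas DataFrame that Run returns, for example using the forward projection from the README:

import sqlite3
import declarative

df = declarative.Run(t=10)
flat = df.reset_index()  # flatten the (result_id, t) MultiIndex into columns

flat.to_csv('results.csv', index=False)          # idea 1: everything round-trips through strings
flat.to_json('results.json', orient='records')   # idea 2
with sqlite3.connect('results.db') as conn:      # idea 3
    flat.to_sql('results', conn, if_exists='replace', index=False)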

Trim down generated flat python to keep LoC under 1,000,000.

Looks like my notes didn't save.
Basically, Python scripts and functions are limited to 1,000,000 lines of code; there is a PEP related to this.
In my profiling test I ran into this limit. Importing also takes longer when there are a ton of lines of code.

The basic idea behind this fix is to take the unrolled loops and convert them back into loops.

Where should I do this?

  1. Within a new object which walks the graph without executing it?
  2. While creating the code?
  3. Just before writing to file?
  4. Just before importing?

My gut feeling is that the easier options are 2 and 3, but in the long term 1 would be the most maintainable since get_calc is quite bloated.
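
To make the idea concrete, here is a rough sketch. The exact shape of the generated flat script isn't recorded here, so the unrolled form below is an assumption; the point is only that a run of near-identical timestep lines can be collapsed back into a loop.

# assumed unrolled output -- one assignment per timestep, which is what
# blows past 1,000,000 lines for large t:
#   results['count_up'][0] = 0
#   results['count_up'][1] = results['count_up'][0] + 1
#   results['count_up'][2] = results['count_up'][1] + 1
#   ...

# re-rolled equivalent the generator could emit instead:
n = 10
results = {'count_up': [None] * n}
results['count_up'][0] = 0
for t in range(1, n):
    results['count_up'][t] = results['count_up'][t-1] + 1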

Observe what gains could be made by running on a dedicated CPU

I used psutil to set our process priority to the highest possible and then pin the process to a single CPU.

import os
import psutil

p = psutil.Process(os.getpid())
p.nice(psutil.REALTIME_PRIORITY_CLASS)  # highest priority (Windows-only constant)
p.cpu_affinity([1])                     # pin to a single CPU core

While there were some gains, it was not a significant enough increase to warrant the addition of psutil to our dependencies.

Add in multi-process capabilities to IterativeEngine

Since this system is by design simple to run in parallel, let's build that into our IterativeEngine.

The basic idea is

# serial (initial design)
engine = Engine(...)
foreach (row) in input_rows:
    engine.calculate(row)
result = engine.results_to_df()

# parallel local (pseudo code of intended result of this issue)
parallel_foreach (input_rows, engine) in divided_work:
    foreach row in input_rows:
        engine.calculate(row)
result = collect(divided_work)

# parallel distributed (needed to run at production scale)
foreach (input_rows, engine) in divided_work:
    api.send(input_rows, engine, lambda x: receive_results(x))
result = collect(divided_work)
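
For the "parallel local" case, a minimal sketch with the standard library might look like the following; make_engine, the chunking, and collect-by-concatenation are assumptions layered on the pseudo code above rather than the current IterativeEngine API:

from multiprocessing import Pool

def calculate_chunk(args):
    make_engine, rows = args
    engine = make_engine()      # build a fresh engine inside each worker process
    return [engine.calculate(row) for row in rows]

def run_parallel(make_engine, input_rows, processes=4):
    # divide the work into one chunk per process
    chunk_size = max(1, len(input_rows) // processes)
    chunks = [(make_engine, input_rows[i:i + chunk_size])
              for i in range(0, len(input_rows), chunk_size)]
    with Pool(processes) as pool:
        per_chunk = pool.map(calculate_chunk, chunks)
    # collect: flatten the per-chunk results back into one list
    return [result for results in per_chunk for result in results]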
