teald / porchlight

A function parsing and management library written in Python.

License: GNU General Public License v3.0

Python 100.00%

porchlight's Introduction

porchlight

porchlight is a function management suite that handles shared inputs and outputs of methods and/or functions which evolve over the lifetime of a program.

This package's original intent was to be a part of a modular scientific package yet to be released. Rather than isolating this method to a single model, the already-developed work has been modified to stand alone as a package.

porchlight does not have any dependencies outside of the standard CPython library. Please note that porchlight requires Python 3.9+, and that examples may require external libraries such as numpy and matplotlib.

Installation

You can install porchlight using pip:

pip install porchlight

Usage

The main object used in porchlight is the porchlight.Neighborhood object. This groups all functions together and keeps track of call order and parameters.

from porchlight import Neighborhood


# To add a function, we simply define it and pass it to porchlight.
def increase_x(x: int, y: int) -> int:
    x = x * y
    return x

# Type annotations are optional, as with normal python.
def string_x(x):
    x_string = f"{x = }"
    return x_string

def increment_y(y=0):
    y = y + 1
    return y

# Generating a complete, coupled model between these functions is as simple as
# adding all these functions to a Neighborhood object.
neighborhood = Neighborhood([increment_y, increase_x, string_x])

# The neighborhood object inspects the function, finding input and output
# variables if present. These are added to the collections of functions and
# parameters.
print(neighborhood)

# We initialize any variables we need to (in this case, just x), and then
# executing the model is a single method call.
neighborhood.set_param('x', 2)

neighborhood.run_step()

# Print out information.
for name, param in neighborhood.params.items():
    print(f"{name} = {param}")

Documentation

Documentation for porchlight can be found on Read the Docs here: https://porchlight.readthedocs.io/en/latest/

Other info

You can find slides from presentations about porchlight within the docs folder, under docs/slides.

porchlight's Issues

Pass doors to `Neighborhood.call` and check against available doors

Just a half-formed idea, but being able to call a door directly by reference would be nice. For example:

from porchlight import Neighborhood, Door

@Door
def test_function():
    pass

neighborhood = Neighborhood(test_function)

neighborhood.call(test_function)  # checks function ID to see if it is referenced in the Neighborhood.

This would also encourage cleanliness when naming functions. It should raise a NeighborhoodError if the id is not present in the neighborhood, with a helpful message.
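A minimal sketch of the identity check described above, using only the standard library. `DoorRegistry`, `registered_function`, and the locally defined `NeighborhoodError` are hypothetical stand-ins for illustration, not porchlight's actual classes.

```python
class NeighborhoodError(Exception):
    """Stand-in for porchlight's error type (assumed for this sketch)."""


class DoorRegistry:
    """Hypothetical container that tracks callables by identity."""

    def __init__(self, functions):
        # Map id(function) -> function so lookups ignore names entirely.
        self._by_id = {id(f): f for f in functions}

    def call(self, function, *args, **kwargs):
        if id(function) not in self._by_id:
            name = getattr(function, "__name__", repr(function))
            raise NeighborhoodError(
                f"{name!r} is not registered in this neighborhood; "
                "add it before calling it by reference."
            )
        return function(*args, **kwargs)


def registered_function():
    return "called"


def stranger():
    return "never registered"


registry = DoorRegistry([registered_function])
result = registry.call(registered_function)
```

Calling `registry.call(stranger)` raises the error with a message naming the missing function.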

Add dynamic Door generation to Neighborhood

The ability to dynamically generate doors with porchlight would be nice, so that doors can be re-defined the same way parameters are. This would allow for updating function wrappers based on existing parameters.

For example, suppose we have a function, blackbody_den, that outputs a new function, bb, for a given temperature. E.g.,

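from math import exp
import typing

from porchlight import Door


# a and b stand for physical constants (a = 2hc^2, b = hc/k_B) defined elsewhere.
# returned_def_to_door is a placeholder for the proposed feature.
@Door(returned_def_to_door)
def blackbody_den(temperature: float) -> typing.Callable:
    def bb(wavelength: float) -> float:
        # Planck's law: the exponent is positive.
        intensity = a * wavelength**-5 * (exp(b / (wavelength * temperature)) - 1)**-1
        return intensity
    return bb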

bb could be another Door included in the neighborhood object. It can be called in order with everything else, with some rules:

bb must be called after blackbody_den is called, unless initialized.
bb must abide by all current rules within the Neighborhood object.

Add descriptive aliases for `Neighborhood`, `Door`

Currently, Neighborhood and Door are the heart of porchlight, but their names do rely on some extrapolation from the words themselves. I'm considering including aliases to all objects with those descriptors. I.e.,

Neighborhood -> PorchlightMediator
Door -> PorchlightAdapter
DoorError -> PorchlightAdapterError

This isn't particularly hard, but it will require some work.

Flesh out `BaseDoor` docstring

The BaseDoor class-level docstring is a filler, and should be replaced with something more descriptive.

Add tests for Neighborhood.order_doors

This method is not currently covered by unit tests.

Is `inspect.getsourcelines` the best way to parse through the function for return values?

Right now, BaseDoor._get_return_vals uses inspect.getsourcelines to retrieve the lines of code defining the function of interest, but this seems like a slow means of parsing for return values, especially if functions are large and/or DynamicDoors are considered (see issue #5). That is before considering future features, which may want to retrieve the source multiple times or in different ways.

If the source could be retrieved as a single string instead of a list, for example, then it could be parsed directly with re. Not sure how much of an improvement that would be.

Basically, this needs more thought/research. It works the way it is, and is not an issue, but might have a much better alternative.

BaseDoor.return_types should be a dictionary, not a list

BaseDoor.return_types is meant to communicate what specific return parameters' types are. This isn't always going to be possible, and handling the case where there are no type hints is important.

That said, this is primarily a convenience for the user here. The only situation where BaseDoor.return_vals is integral to porchlight's functionality is with DynamicDoor objects, since they rely on attributing a return type to a function (see issue #5 and pull request #22). Relying on return type hints too much is a significant concern as well, since it may cause misinterpretations of what porchlight is meant to be doing in the background.

The solution to the confusion here is independent of (but might be resolvable alongside) Issue #19.

Handle situations where non-parsable functions are provided

There are plenty of situations where a function of interest is not user-defined and cannot be parsed, and it would be useful to handle these somehow.

Possible Solutions

Raise a very specific error (short-term, maybe a NotImplementedError?)
Allow for user-defined input/output overrides.
- i.e., a way to tell Door that this function will fail through inspect but we will assert the inputs.
- This could be somewhat complicated
Try to find another way to directly parse the source
- This would still fail for some cases without quite a lot of work (e.g., C or Fortran integrations)
- Not sure this is within the scope of porchlight
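A rough sketch of how the first two options could combine, assuming a hypothetical helper `resolve_inputs` and a hypothetical `input_override` argument (neither is part of porchlight). It tries `inspect.signature` first and raises a specific error that tells the user how to assert the inputs by hand.

```python
import inspect


def resolve_inputs(function, input_override=None):
    """Return argument names, preferring an explicit user override."""
    if input_override is not None:
        return list(input_override)
    try:
        signature = inspect.signature(function)
    except (TypeError, ValueError) as err:
        # Some builtins and extension callables expose no signature.
        raise NotImplementedError(
            f"Cannot inspect {function!r}; pass input_override to assert "
            "its inputs manually."
        ) from err
    return list(signature.parameters)


def area(width, height):
    return width * height


inspected = resolve_inputs(area)
asserted = resolve_inputs(max, input_override=["a", "b"])
```

With an override supplied, the callable is never inspected, so the same path would work for compiled extensions.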

Logging restructuring

Currently, there are some significant issues with how logging works for porchlight. This issue is to track ideas for and progress towards a 1.0-ready logging framework.

[BUG] Initialization functions with no return values

Describe the bug
If an initialization function does not have a return value, it will raise TypeError: 'NoneType' object is not iterable.

To Reproduce
Steps to reproduce the behavior (a code snippet preferred):

from porchlight import Neighborhood


def nop_initialization():
    pass

def nop_fxn():
    pass

# Note: this is also the case for a None-returning initialization function
# passed in a list of functions.
nbr = Neighborhood([nop_fxn], initialization=nop_initialization)

nbr.run_step()  # Error raised

Expected behavior
The initialization function should have executed and not been checked for outputs.

Screenshots
N/A

Specs (please complete the following information):

OS: WSL 2 Ubuntu
Python version: 3.11.0
Version v0.1.0

Additional context
This could also be a problem with finalization.

Allow positional-only arguments in functions

Currently, a NotImplementedError is raised if any positional-only arguments are present in a function being converted into a BaseDoor. The initial reasoning behind this was only to skip over implementing this while the rest of the code was worked on, since this was not a common use case at the time (and still isn't), but it's also a pretty obvious feature to include.
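One possible way to support this, sketched with the standard library only. The helper name `call_with_params` and the plain dictionary of parameters are assumptions for illustration; the idea is to pass positional-only arguments by position and everything else by keyword.

```python
import inspect


def call_with_params(function, params):
    """Call `function`, pulling each argument by name from `params`."""
    args, kwargs = [], {}
    for name, parameter in inspect.signature(function).parameters.items():
        if parameter.kind is inspect.Parameter.POSITIONAL_ONLY:
            # Positional-only arguments cannot be passed by keyword.
            if name in params:
                args.append(params[name])
            elif parameter.default is not inspect.Parameter.empty:
                args.append(parameter.default)
            else:
                raise KeyError(f"no value available for {name!r}")
        elif parameter.kind in (
            inspect.Parameter.POSITIONAL_OR_KEYWORD,
            inspect.Parameter.KEYWORD_ONLY,
        ):
            if name in params:
                kwargs[name] = params[name]
    return function(*args, **kwargs)


def scale(value, /, factor=2):
    return value * factor


scaled = call_with_params(scale, {"value": 5, "factor": 3})
defaulted = call_with_params(scale, {"value": 4})
```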

Map arguments from function variables to neighborhood variables

Right now, argument names can mismatch when a new function is added. It would be nice to write something like

@Door(map_arguments={'temp': 'temperature'})
def my_func(temp):
    return pressure
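A minimal sketch of what such a mapping could do, written as a standalone decorator. The name `map_arguments` is borrowed from the proposal, and the direction of the mapping ({function argument: shared name}) is an assumption taken from the example above; this is not porchlight's implementation.

```python
import functools


def map_arguments(mapping):
    """Hypothetical decorator: `mapping` is {function_arg: shared_name}."""
    def decorator(function):
        @functools.wraps(function)
        def wrapper(**shared_params):
            # Translate shared names into the names the function expects.
            local = {
                arg: shared_params[shared] for arg, shared in mapping.items()
            }
            return function(**local)
        return wrapper
    return decorator


@map_arguments({"temp": "temperature"})
def double_temp(temp):
    return temp * 2


doubled = double_temp(temperature=21)
```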

Return values not properly mapped by `argument_mapping` if not present in argument list.

Initially, this was implemented this way to prevent weird behavior across functions, but the below shows an example of this standard failing.

Example

import porchlight

# Below works as expected. The return value is visible as 'why'.
@porchlight.Door(argument_mapping={'ecks': 'x', 'why': 'y'})
def example_door(x, y):
    y = x + 1
    return y

# This raises a DoorError
@porchlight.Door(argument_mapping={'ecks': 'x', 'why': 'y'})
def example_door(x):
    y = x + 1
    return y

Output:

DoorError: why is not a valid argument for example_door

Notes

The standard could be kept as strict, but include return values (i.e., each mapped parameter must appear AT LEAST ONCE or raise a DoorError).
Could be loosened, allow any mappings and ignore the ones that don't show up.
- This would be good for Neighborhood management, since a universal dictionary of mappings could be kept.

Anticipate empty parameters being filled by Doors or other processes.

There are plenty of cases where the initial state of a parameter might not be known until other doors have run. Currently, this level of initialization has to be done on the user-end to avoid the program failing.

from porchlight import Door, Neighborhood


# Define two functions, with the second requiring the output of the first.
def fxn_one(x):
    y = x + 1
    return y


def fxn_two(y):
    print("Hello!")


nbr = Neighborhood()
nbr.add_function(fxn_one)
nbr.add_function(fxn_two)

# Ideally, just providing the value of x should be sufficient to call a step.
nbr.set_param('x', 1)

# However, this raises a ParameterError because 'y' is an undefined input
# and is required by the current way Neighborhood objects check their state
# before running. So this would only succeed in printing "Hello!" if 'y' is set
# to a non-Empty value.
nbr.set_param('y', 0)

This could be done by removing the requirement for non-Empty values and waiting for the Neighborhood to fail when calling the door. The errors here might be cryptic, though, and if a particularly lengthy Door needs to run before the error would get caught that could be an issue.
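An alternative that keeps early, readable errors is to check up front whether each required input is either set or produced by an earlier door. The sketch below assumes a hypothetical helper `unmet_inputs` and hand-written output lists; porchlight itself would get the outputs from its own parsing.

```python
import inspect


def unmet_inputs(doors, known):
    """Return required inputs that are neither set nor produced earlier.

    `doors` is an ordered list of (function, output_names) pairs.
    """
    available = set(known)
    unmet = []
    for function, outputs in doors:
        for name, parameter in inspect.signature(function).parameters.items():
            required = parameter.default is inspect.Parameter.empty
            if required and name not in available:
                unmet.append((function.__name__, name))
        # Whatever this function returns becomes available downstream.
        available.update(outputs)
    return unmet


def fxn_one(x):
    y = x + 1
    return y


def fxn_two(y):
    return f"Hello, {y}!"


doors = [(fxn_one, ["y"]), (fxn_two, [])]
with_x = unmet_inputs(doors, known={"x"})
without_x = unmet_inputs(doors, known=set())
```

Here only `x` needs to be set, since `y` is filled by `fxn_one` before `fxn_two` runs.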

Comprehensive examples written as notebooks and raw python files.

It would be useful to expand on the examples in README.md, which at this point is skeletal, to include a more comprehensive set of use cases.

Importantly, it would be good to have at least three cases spanning unique niches this could be used in.

Initialization and finalization settings

I've encountered a situation where, for convenience and fluidity, it would be nice to have initialization that can be set like doors. It would also be very nice to have guaranteed closing actions in some form of finalization.

Example

I have two functions, f1 and f2, which each require an SSHClient instance. This instance needs to be open/closed, and instead of wrapping f1 and f2 into new functions, it would be nice to do something like this:

from porchlight import Neighborhood

def f1(ssh: SSHClient, ...):
    ...

def f2(ssh: SSHClient, ...):
    ...

def start_sshclient() -> SSHClient:
    ...
    return ssh

def close_sshclient(ssh: SSHClient):
    ssh.close()
    
nbr = Neighborhood()
nbr.add_function(f1)
nbr.add_function(f2)

nbr.add_function(start_sshclient, initialize=True)
nbr.add_function(close_sshclient, finalize=True)

In this implementation, start_sshclient and close_sshclient would be run, including if an error is raised during f1 or f2 (basically akin to a with statement, and could probably be implemented as such).

Ideas

Can refactor Neighborhood.run_step to allow for a with statement that could cover this
- Could make a new class---e.g., RunManager---that allows for something akin to:

with RunManager(start=[start_sshclient], stop=[close_sshclient]):
    nbr.run_step()
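A minimal sketch of how such a `RunManager` might behave. The class is hypothetical (only its name comes from the idea above), and the connection functions are stand-ins that record events instead of opening an SSH session.

```python
class RunManager:
    """Hypothetical context manager running start/stop callables."""

    def __init__(self, start=(), stop=()):
        self.start = list(start)
        self.stop = list(stop)

    def __enter__(self):
        for function in self.start:
            function()
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Runs even if the body raised, like a `finally` clause.
        for function in self.stop:
            function()
        return False  # Do not swallow exceptions from the body.


events = []


def open_connection():
    events.append("open")


def close_connection():
    events.append("close")


try:
    with RunManager(start=[open_connection], stop=[close_connection]):
        events.append("step")
        raise RuntimeError("simulated failure during a step")
except RuntimeError:
    events.append("error seen")
```

The stop callables run before the exception propagates, which is the guarantee the finalization request asks for.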

`name` is required for `BaseDoor`, but not always present

There's a requirement right now for __name__ to be present in a function, but there's not necessarily a specific need for it. It throws errors when otherwise perfectly ok callables (e.g., callables returned by other functions) are passed to doors directly, which should not be an issue.

Possible solutions

Remove the requirement entirely, set BaseDoor.__name__ to be None or '' if not present.
Apply a unique porchlight ID in lieu of a name.
- Confusing if not implemented well
Warn the user that a function name could not be found
Research more extensive search methods to get a name.
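A small sketch combining the first two options: look for a name, and fall back to a generated identifier when none exists. `door_name` and the `anonymous_door_N` format are assumptions for illustration only.

```python
import functools
import itertools

_anonymous_ids = itertools.count(1)


def door_name(function):
    """Best-effort name lookup with a generated fallback."""
    # functools.partial objects lack __name__, so look at the wrapped callable.
    if isinstance(function, functools.partial):
        function = function.func
    name = getattr(function, "__name__", None)
    if name is None:
        name = f"anonymous_door_{next(_anonymous_ids)}"
    return name


def add(a, b):
    return a + b


class Doubler:
    def __call__(self, x):
        return 2 * x


plain_name = door_name(add)
partial_name = door_name(functools.partial(add, 1))
instance_name = door_name(Doubler())
```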

`Door` catches return statements for functions defined within a function.

Minimal example:

from porchlight import Door

@Door
def fxn_wrapper():
    '''This function tries to return a new function.'''
    def ret_fxn():
        return some_value

    return ret_fxn

print(fxn_wrapper.variables)
# Expected output:
#     ['ret_fxn']
#
# Present output:
#     ['some_value', 'ret_fxn']

This is not desired behavior for, e.g., wrapped functions where the output is not of particular concern. This could be an optional argument passed to the Door initializer, since there is some convenience to being able to modify tracked variables using wrapped functions alongside the functions themselves.
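One way to get the expected output is to parse the source with `ast` and skip nested function definitions while collecting return statements. This is a sketch under that assumption, not how `Door` currently works; `TopLevelReturns` is a hypothetical name, and the source is given as a string so the example is self-contained.

```python
import ast

SOURCE = '''
def fxn_wrapper():
    def ret_fxn():
        return some_value

    return ret_fxn
'''


class TopLevelReturns(ast.NodeVisitor):
    """Collect returned names, ignoring nested function definitions."""

    def __init__(self):
        self.names = []
        self._depth = 0

    def visit_FunctionDef(self, node):
        self._depth += 1
        if self._depth == 1:
            self.generic_visit(node)  # Only descend into the outer function.
        self._depth -= 1

    visit_AsyncFunctionDef = visit_FunctionDef

    def visit_Return(self, node):
        if isinstance(node.value, ast.Name):
            self.names.append(node.value.id)


visitor = TopLevelReturns()
visitor.visit(ast.parse(SOURCE))
top_level = visitor.names
```

An optional flag on the visitor could re-enable descent into nested functions for users who want the current behavior.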

Improve testing coverage of porchlight utils

Open issue for getting test coverage to 100% for the following files:

porchlight/utils/inspect_functions.py
porchlight/utils/typing_functions.py

Allow for Neighborhood to split into identical states

It would be useful to spawn new Neighborhoods (and corresponding data) on command.

Motivation and example

I have an extant Neighborhood object, nbr, that I want to stop at some point and then feed two different inputs. Currently, if I wanted to manage this I'd need two instances to get the job done. This is a hassle if any initializations are required that take appreciable time. It would be easier to initialize a single Neighborhood and then have it run the two cases in tandem.

Consider the following example:

# Assume `nbr` was previously initialized as a Neighborhood()
# Run 10 steps
for _ in range(10):
    nbr.run_step()

# Now take this instance and create a new Neighborhood using '.fork()',
# which returns a new Neighborhood object with the same state as nbr.
nbr_other = nbr.fork()

print(nbr_other == nbr)
# Out: "True"

# Now, could run these two as if they were independent neighborhoods.
for _ in range(100):
    nbr.run_step()
    nbr_other.run_step()

# At this point, nbr and nbr_other will have significantly diverged.
print(nbr_other == nbr)
# Out: "False"

With this, more efficient comparison of the two models can be performed as they each evolve.

Implementation specifics

The primary barrier to this being immediately possible is deciding how copying will be applied to the new neighborhood object.
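As one illustration of the copying question, here is a sketch that uses `copy.deepcopy` on a stand-in class. `ForkableState` is hypothetical and holds only a parameter dictionary; a real Neighborhood would also need to decide how doors and any open resources are copied.

```python
import copy


class ForkableState:
    """Hypothetical stand-in holding only a parameter dictionary."""

    def __init__(self, params):
        self.params = params

    def fork(self):
        # Deep copy so mutable values are not shared between forks.
        return ForkableState(copy.deepcopy(self.params))

    def __eq__(self, other):
        return isinstance(other, ForkableState) and self.params == other.params

    def run_step(self, increment):
        history = self.params["history"]
        history.append(history[-1] + increment)


state = ForkableState({"history": [0]})
other = state.fork()
equal_before = state == other

state.run_step(1)
other.run_step(5)
equal_after = state == other
```

A shallow copy would have left both forks appending to the same list, which is why the choice of copy semantics matters here.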

Update `DynamicDoor` documentation

The documentation for DynamicDoor needs to be updated and expanded.

Let `Neighborhood` initialize with a list of candidate functions/doors

This is a pretty obvious need that's been on the backburner a while. Right now, Neighborhood objects must be instantiated alone before any Doors or functions are added in. Instead, it'd be nice to keep it all contained:

def test1(x):
    pass

def test2(y):
    pass

nbh = Neighborhood([test1, test2])

Very minor issue in docs due to RtD scheme implementation

In the About page of the documentation, the list of example use cases does not render properly. Based on my research, this seems to be a problem with how the scheme is implemented. This isn't urgent enough to stall on v0.4.0, so I'm making it an issue for now.

Param string should use reprs for stored data

This is a bit nit-picky, but right now the __str__ method for porchlight.param.Param returns the __str__ for each of the slots it writes out:

porchlight/porchlight/param.py

Line 103 in bd65757

infostrings = [f"{key}={value}" for key, value in info.items()]

This isn't terrible, but if we pass a str as a value for the parameter, it will not be clear the value is a string if it is a string of python-like data.

Example

from porchlight.param import Param

my_data = [0, 1, 2, 3]
my_str_data = str(my_data)

pr = Param("my_data", my_str_data)
print(pr)
# Param(name=my_data, value=[0, 1, 2, 3], constant=False, type=<class 'str'>)

This also may be confusing for newer folks, since the only indication that value is a str is the type attr, and there's no telling what my_data might be.

If the repr for these are used, it will make cases like these unambiguous, since it would instead return the string "Param(name='my_data', value='[0, 1, 2, 3]', constant=False, type=<class 'str'>)".
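The change amounts to adding the `!r` conversion to the f-string quoted above. A standalone sketch with a plain dictionary standing in for the Param's slots (the dictionary contents are illustrative):

```python
info = {"name": "my_data", "value": str([0, 1, 2, 3]), "constant": False}

# Current behaviour: str() of each value hides that `value` is a string.
with_str = ", ".join(f"{key}={value}" for key, value in info.items())

# Proposed: the !r conversion applies repr() to each value.
with_repr = ", ".join(f"{key}={value!r}" for key, value in info.items())
```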

[BUG] Initialization with keyword argument not recognized as default value

Describe the bug
If a keyword argument is used for an initialization function, a KeyError is raised if the argument does not already exist as a Param.

To Reproduce
Steps to reproduce the behavior (a code snippet preferred):

from porchlight import Neighborhood

def my_initialization_function(kwarg_with_val=10):
    pass

neighborhood = Neighborhood(initialization=[my_initialization_function])
neighborhood.run_step()

Results in the following error:

KeyError: 'kwarg_with_val'

Expected behavior
The default value should be read in and stored as a param---or, at least not throw an error since it is not a required parameter for the initialization function.

Specs (please complete the following information):

OS: Ubuntu WSL2
Python version: 3.10
Version v1.0.1
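For reference, defaults can be read from a function with `inspect.signature`, which is one way the expected behavior could be implemented. The helper name `default_params` is an assumption, and the extra `required` argument is added only to show that arguments without defaults are skipped.

```python
import inspect


def default_params(function):
    """Return {name: default} for arguments that declare a default."""
    return {
        name: parameter.default
        for name, parameter in inspect.signature(function).parameters.items()
        if parameter.default is not inspect.Parameter.empty
    }


def my_initialization_function(required, kwarg_with_val=10):
    pass


defaults = default_params(my_initialization_function)
```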

Decorated doors are not correctly parsed by `BaseDoor`

When a function is decorated, the resulting door will not correctly parse the name of the function, nor its return values.

The following will reproduce the problem:

from porchlight import Door

def my_wrapper(func):
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        return result

    return wrapper

@my_wrapper
def my_test_fxn():
    pass

my_door = Door(my_test_fxn)

print(my_door.name)         # >> 'wrapper', but should be 'my_test_fxn'
print(my_door.return_vals)  # >> [['result']], but expected []

This also occurs when using @Door as a decorator, as well as derived classes like DynamicDoor being worked on in issue #5.
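A possible direction, sketched with the standard library: if the decorator uses `functools.wraps`, the wrapper keeps the original name and exposes `__wrapped__`, and `inspect.unwrap` can then recover the original function for parsing. This only helps for decorators that use `functools.wraps`, so it is a partial fix at best.

```python
import functools
import inspect


def my_wrapper(func):
    @functools.wraps(func)  # Copies __name__ and sets __wrapped__.
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        return result

    return wrapper


@my_wrapper
def my_test_fxn():
    pass


wrapped_name = my_test_fxn.__name__
original = inspect.unwrap(my_test_fxn)
```

Parsing `original` instead of the wrapper would avoid picking up `result` as a return value.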

Add an auto-wrapper for non-func callables

In Pull Request #48, a NotImplementedError was added to BaseDoor to handle situations imposed by objects like numpy's ufuncs, which are critical for many (if not the majority) of use cases porchlight targets.

This is a pretty significant issue. It automatically induces overhead for the user who must now wrap the function and re-pass it to BaseDoor. Instead, it would be useful to have something more fluid/dynamic/approachable for this common situation.

Example

import numpy as np
from porchlight import Door

# Current workaround for numpy ufuncs
@Door
def _np_cos(theta):
    cos_theta = np.cos(theta)
    return cos_theta

# Proposed solution
positional_args = ['theta']
_np_cos = Door(np.cos, positional_args=positional_args, wrapping=True)

Notes

Could even offer direct support for common encounters, e.g., numpy, pandas
- Without bringing them on as dependencies, though
There might be side-effects in some cases that I'm not thinking of. Reading the docs for ufuncs and other cases for potential unexpected behavior will be needed.
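A rough sketch of what the proposed wrapping could do internally, using `math.cos` in place of a numpy ufunc so the example has no dependencies. `wrap_callable` is a hypothetical helper; it attaches a synthetic signature so the argument names become visible to `inspect`.

```python
import inspect
import math


def wrap_callable(function, positional_args, name=None):
    """Wrap `function` so that it exposes named positional arguments."""
    parameters = [
        inspect.Parameter(arg, inspect.Parameter.POSITIONAL_OR_KEYWORD)
        for arg in positional_args
    ]
    signature = inspect.Signature(parameters)

    def wrapper(*args, **kwargs):
        # Bind by name, then forward everything positionally.
        bound = signature.bind(*args, **kwargs)
        return function(*bound.args)

    wrapper.__signature__ = signature
    wrapper.__name__ = name or getattr(function, "__name__", "wrapped")
    return wrapper


cos = wrap_callable(math.cos, positional_args=["theta"])
cos_zero = cos(theta=0.0)
argument_names = list(inspect.signature(cos).parameters)
```

Return value names would still need to be supplied separately, since there is no source to parse.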

Memory usage monitor

It would be useful to be able to place memory limits on objects at runtime (as a Param check), so that some relevant exception would be raised if the size of an object exceeds a global maximum limit or a user-defined limit specific to a parameter.
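A minimal sketch of such a check, assuming a hypothetical `check_size` helper and `MemoryLimitError` exception. It uses `sys.getsizeof`, which reports only the shallow size of an object, so containers would need a recursive measurement in practice.

```python
import sys


class MemoryLimitError(Exception):
    """Raised when a value exceeds its allowed size (hypothetical)."""


def check_size(name, value, limit_bytes):
    """Return the shallow size of `value`, raising if it is over the limit."""
    size = sys.getsizeof(value)
    if size > limit_bytes:
        raise MemoryLimitError(
            f"parameter {name!r} uses {size} bytes, limit is {limit_bytes}"
        )
    return size


small_size = check_size("small", 1, limit_bytes=10_000)
```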

Add informative neighborhood `str` method for `porchlight.neighborhood.Neighborhood`

Need to incorporate an informative __str__ method for the porchlight.neighborhood.Neighborhood object. This could include a method that produces a "full report" style string useful for console/terminal updates.

The base string should include the following information:

Number of doors, parameters
Names of parameters
Any empty parameters should be highlighted

e.g.,

from porchlight.neighborhood import Neighborhood

neighborhood = Neighborhood()

neighborhood.add_param('x', 7)
neighborhood.add_param('hello', 'beep')

def ex_fxn(x: int) -> str:
    hello = f"{x + 5 = }"
    return hello

neighborhood.add_function(ex_fxn)

print(neighborhood)

# This would result in something like
# Neighborhood(1 door, 2 params, 0 Empty)

Pretty print is up in the air, something that could be updated in the terminal or easily logged.

Leftover testing artifacts in door.py

print statements from testing were left behind.

Parameters should have associated "output file" representations

It would be useful to define a transformation into a desired input file format and stick it to a Param object to immediately construct when a specific Door is executed. Right now, to support an external input file for a model either the Door must handle this or a separate Door must be made. This also adds an extra barrier to using non-Python doors (which are more likely to solely rely on an input file in my own science use cases).

Proposed updates

A new ParamSet class that contains multiple parameters.
- Contains weak references to parameters, and can be set to read-only.
A new InputFile class that is a ParamSet with temporary file support.
- This takes a constructor that constructs a string and writes it to a temporary file.
- This temporary file (probably a NamedTemporaryFile or something akin to it) is then passed to the model as the input file.
- OR, more traditionally, it can be a specific file within an extant model directory structure, instead saving/replacing files safely.
- Should only open/close once over the lifetime of the InputFile.
Basic column or function-based input file construction that can take in multiple

Rationale

Although this may seem hacky, there are some significant benefits (at least for me) having input files be managed like this. Firstly, if temporary files can be used there's no danger of local files being overwritten.

Notes

The temporary file may need to be in-house, depending on how NamedTemporaryFile works on Windows (see the tempfile documentation)
A parameter set object is important to a number of future updates.
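A small sketch of the basic flow: build a string from a set of parameters, write it to a temporary location, and hand the path to the model. The helper `write_input_file`, the `name = value` format, and the file name are all assumptions. The sketch uses `tempfile.TemporaryDirectory` rather than `NamedTemporaryFile`, which avoids reopening an already-open temporary file by name.

```python
import os
import tempfile


def write_input_file(params, directory):
    """Write `params` as 'name = value' lines and return the file path."""
    lines = [f"{name} = {value}" for name, value in params.items()]
    path = os.path.join(directory, "model_input.txt")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(lines) + "\n")
    return path


with tempfile.TemporaryDirectory() as workdir:
    input_path = write_input_file(
        {"temperature": 300, "pressure": 1.5}, workdir
    )
    # A model would read the file here, before the directory is removed.
    with open(input_path, encoding="utf-8") as handle:
        contents = handle.read()
```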

Finalize how ambiguous return values will be handled

Right now, the current framework allows for multiple returns to occur throughout a function (not including returns from embedded functions; see #5 for an example of a special case).

Here is a minimal case:

import porchlight
import random

@porchlight.Door
def many_ret_func():
    """Four random integers and a random selection of what to return."""
    a, b, c, d = (random.randint(0, 10) for _ in range(4))
    die_roll = random.randint(1, 100)

    # The die roll (a d100) determines which return statement executes
    if die_roll < 50:
        return a, b
    elif die_roll >= 95:
        return d, c
    else:
        return a, b, c, d

# Door(
#     name=many_ret_func,
#     base_function=<function many_ret_func at 0x42>,
#     arguments={},
#     return_vals=[['a', 'b'], ['d', 'c'], ['a', 'b', 'c', 'd']]
# )

There is no way for Door to predict which of the return statements will execute when the function is called. When Door parses many_ret_func for return values, the current return_vals is a list of lists with a length greater than 1 and the actual return values of the function cannot be disentangled. In this case, type checking is useless since a, b, c, and d are the same type, and while we could intuit when a, b, c, d is returned due to its length, there's no hope for distinguishing between a, b and d, c.

The easiest, and I think most reasonable, path forward is to restrict Door definitions to having only one return type bound to one set of return variables. Multiple returns can occur throughout a definition, but all must be identical. This would raise an exception when BaseDoor._get_return_vals evaluates a definition breaking this rule.

An alternative would be to somehow follow execution introspectively. I don't think this is within the scope of a beta release as it is. It seems like a dangerous thing to implement within the current framework, since so much of the long-term design is still in its infancy. Even if implemented, it would probably need to be used with caution and care outside of specific use cases. Overall, identifying return values at runtime would be interesting but not particularly useful.

Rules for functions compatible with `Door`:

All return statements must return the same set of variables separated by commas.
All return statements must have the same type (which may be typing.Any).
If a Door is changed during runtime, such as with a DynamicDoor, it will apply the previous rules to changes as if the function was being re-initialized.

Rule 3 here is the current implementation, not necessarily the best one. It's not the most important question being addressed here, anyways.
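The first rule can be checked statically. The sketch below is one possible approach using `ast`, not the behavior of `BaseDoor._get_return_vals`; the helper names are hypothetical, and the type rule (rule 2) is not checked here.

```python
import ast


def iter_returns(node):
    """Yield return statements, skipping nested function bodies."""
    for child in ast.iter_child_nodes(node):
        if isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef, ast.Lambda)):
            continue
        if isinstance(child, ast.Return):
            yield child
        yield from iter_returns(child)


def returned_names(ret):
    """Return the tuple of variable names in one return statement."""
    value = ret.value
    if value is None or (isinstance(value, ast.Constant) and value.value is None):
        return ()
    elements = value.elts if isinstance(value, ast.Tuple) else [value]
    if not all(isinstance(element, ast.Name) for element in elements):
        raise ValueError("return values must be plain variable names")
    return tuple(element.id for element in elements)


def check_returns(source):
    """Return the single set of returned names, or raise ValueError."""
    function = ast.parse(source).body[0]
    signatures = {returned_names(ret) for ret in iter_returns(function)}
    if len(signatures) > 1:
        raise ValueError(f"inconsistent return statements: {sorted(signatures)}")
    return signatures.pop() if signatures else ()


VALID = "def f(x):\n    y = str(x)\n    return y\n"
INVALID = (
    "def g(x, flag):\n"
    "    a = abs(x)\n"
    "    s = 1\n"
    "    if flag:\n"
    "        return a, s\n"
    "    return a\n"
)

valid_names = check_returns(VALID)
```

Passing `INVALID` to `check_returns` raises a ValueError, as does a function that returns an expression such as `1 + 6`.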

Examples

import typing

##################
# VALID EXAMPLES #
##################
def valid_fxn1():
    pass

def valid_fxn2():
    return

def valid_fxn3() -> None:
    return

def valid_fxn4(x):
    y = str(x)
    return y

def valid_fxn5(x, y, z) -> typing.Tuple[str, int]:
    max1 = max(x, y, z)
    if x == max1:
        max2 = max(y, z)

    elif y == max1:
        max2 = max(x, z)

    else:
        max2 = max(x, y)

    outstr = f"Sum of two max values is {max1 + max2} = {max1} + {max2}"
    value = max1 + max2

    return outstr, value

####################
# INVALID EXAMPLES #
####################
def invalid_fxn1():
    return 1 + 6

def invalid_fxn2(x: float, ret_sign: bool = False) -> float:
    sign = -1 if x < 0. else 1
    abs_value = abs(x)

    if ret_sign:
        return abs_value, sign

    else:
        return abs_value

def invalid_fxn3(x: int) -> int:
    square = x ** 2
    cube = x ** 3
    if square > cube:
        return square
    else:
        return cube

Add self-checking parameter option

It would be nice for parameters to check themselves when being read from/written to as a means of catching errors unique to a data structure.

An intuitive example would be that the value of something with mass should, generally, be positive. Being able to add that to a parameter would avoid errors like a negative mass propagating forward through the program.

Off the top of my head I can think of the following situations where this is helpful:

The above example; bounding things more strictly than the language can without requiring a new class.
Catching unexpected changes in size, type
Applying user-defined restrictions (callables) with a simple True/False success/fail criterion.

Implementation notes:

Need to decide what determines if a check occurs or not
- There is a risk of adding considerable overhead if costly checks are happening frequently
- How to override Param while avoiding the possibility for screw-ups
How enforceable should this be? What limitations need to be set on when/why/how parameters are checked?
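A minimal sketch of the callable-based option, where checks run on every write. `CheckedParam` is a hypothetical class, not an extension of porchlight's `Param`; each check is a callable returning True or False.

```python
class CheckedParam:
    """Hypothetical parameter that validates its value on assignment."""

    def __init__(self, name, value, checks=()):
        self.name = name
        self.checks = list(checks)
        self.value = value  # Goes through the setter below.

    @property
    def value(self):
        return self._value

    @value.setter
    def value(self, new_value):
        for check in self.checks:
            if not check(new_value):
                raise ValueError(
                    f"{new_value!r} failed a check for parameter {self.name!r}"
                )
        self._value = new_value


mass = CheckedParam("mass", 5.0, checks=[lambda m: m > 0])
mass.value = 2.5

try:
    mass.value = -1.0
    rejected = None
except ValueError as err:
    rejected = str(err)
```

Because the check runs before assignment, the rejected write leaves the previous value in place.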

`porchlight.param.Empty` should either be a singleton or be required to be initialized

Right now, Empty == x evaluates to True if x in (Empty(), Empty). This is a little confusing and will be inconsistent across code as a result.

Empty should either have one (and only one) instance, or must be initialized whenever it is used. I am leaning towards a singleton implementation, but there are some restrictions there (particularly for extending Empty easily in the future).
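For comparison, a common singleton pattern in Python overrides `__new__`. This is a sketch of that pattern applied to a stand-in `Empty` class, not porchlight's implementation.

```python
class Empty:
    """Stand-in sentinel that only ever has one instance."""

    _instance = None

    def __new__(cls):
        # Create the instance once, then keep returning the same object.
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance

    def __repr__(self):
        return "Empty"


first = Empty()
second = Empty()
same_object = first is second
```

With this pattern a subclass would return the parent's instance unless it defines its own `_instance`, which is one form the extension restriction mentioned above could take.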
