teald / porchlight
A function parsing and management library written in Python.
License: GNU General Public License v3.0
Currently, Neighborhood and Door are the heart of porchlight, but their names do rely on some extrapolation from the words themselves. I'm considering including aliases to all objects with those descriptors. I.e.,
Neighborhood -> PorchlightMediator
Door -> PorchlightAdapter
DoorError -> PorchlightAdapterError
This isn't particularly hard, but it will require some work.
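A minimal sketch of what such aliases could look like. The classes below are empty stand-ins, not porchlight's real ones, and the alias names are simply the ones proposed above. In Python an alias is just a second name bound to the same class object:

```python
# Stand-in classes for illustration only; porchlight's real classes are richer.
class Door:
    """Placeholder for porchlight's Door."""


class DoorError(Exception):
    """Placeholder for porchlight's DoorError."""


class Neighborhood:
    """Placeholder for porchlight's Neighborhood."""


# Aliases: new names bound to the same class objects, not subclasses.
PorchlightAdapter = Door
PorchlightAdapterError = DoorError
PorchlightMediator = Neighborhood

# An instance created through the alias is an instance of the original class.
adapter = PorchlightAdapter()
print(isinstance(adapter, Door))
```

One consequence of plain aliasing is that PorchlightAdapter.__name__ is still 'Door', so reprs and tracebacks would keep showing the original names.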
Describe the bug
If a keyword argument is used for an initialization function, a KeyError is raised if the argument does not already exist as a Param.
To Reproduce
Steps to reproduce the behavior (a code snippet preferred):
from porchlight import Neighborhood

def my_initialization_function(kwarg_with_val=10):
    pass

neighborhood = Neighborhood(initialization=[my_initialization_function])
neighborhood.run_step()
Results in the following error:
KeyError: 'kwarg_with_val'
Expected behavior
The default value should be read in and stored as a param, or at least no error should be raised, since it is not a required parameter for the initialization function.
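As a sketch of the expected behavior, default values can be read from a function with the standard library's inspect.signature. The helper name keyword_defaults is made up for this example, and how porchlight would actually store the result as a Param is not shown:

```python
import inspect


def keyword_defaults(func):
    """Map each parameter of func that has a default value to that default."""
    parameters = inspect.signature(func).parameters
    return {
        name: parameter.default
        for name, parameter in parameters.items()
        if parameter.default is not inspect.Parameter.empty
    }


def my_initialization_function(kwarg_with_val=10):
    pass


# The default is available without calling the function.
defaults = keyword_defaults(my_initialization_function)
print(defaults)
```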
Specs (please complete the following information):
In Pull Request #48, a NotImplementedError was added to BaseDoor to handle situations imposed by objects like numpy's ufuncs, which are critical for many (if not the majority) of use cases porchlight targets.
This is a pretty significant issue. It automatically induces overhead for the user who must now wrap the function and re-pass it to BaseDoor. Instead, it would be useful to have something more fluid/dynamic/approachable for this common situation.
import numpy as np
from porchlight import Door

# Current workaround for numpy ufuncs
@Door
def _np_cos(theta):
    cos_theta = np.cos(theta)
    return cos_theta

# Proposed solution
positional_args = ['theta']
_np_cos = Door(np.cos, positional_args=positional_args, wrapping=True)
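One way the proposed wrapping could work under the hood, sketched with the standard library only. The helper with_named_args is hypothetical, and math.cos stands in for np.cos so the example has no third-party dependency. The wrapper advertises a caller-supplied signature through __signature__, which inspect.signature honors:

```python
import inspect
import math


def with_named_args(func, positional_args):
    """Wrap func so that it exposes the given names as its signature."""
    signature = inspect.Signature(
        [
            inspect.Parameter(name, inspect.Parameter.POSITIONAL_OR_KEYWORD)
            for name in positional_args
        ]
    )

    def wrapper(*args, **kwargs):
        # Bind against the declared names, then call the target positionally.
        bound = signature.bind(*args, **kwargs)
        return func(*bound.args)

    wrapper.__name__ = getattr(func, "__name__", "wrapped")
    wrapper.__signature__ = signature
    return wrapper


cos = with_named_args(math.cos, ["theta"])
print(list(inspect.signature(cos).parameters))
print(cos(theta=0.0))
```

The caller asserts the argument names here, so nothing checks that they match what the wrapped callable really accepts.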
numpy, pandas ufuncs and other cases for potential unexpected behavior will be needed.
This is a pretty obvious need that's been on the backburner a while. Right now, Neighborhood objects must be instantiated alone before any Doors or functions are added in. Instead, it'd be nice to keep it all contained:
from porchlight import Neighborhood

def test1(x):
    pass

def test2(y):
    pass

nbh = Neighborhood([test1, test2])
Encountering a situation where, for the purpose of convenience and fluidity, it would be nice to have initialization that can be set like doors. It would also be very nice to have guaranteed closing actions in some form of finalization.
I have two functions, f1 and f2, which each require an SSHClient instance. This instance needs to be opened and closed, and instead of wrapping f1 and f2 into new functions, it would be nice to do something like this:
from porchlight import Neighborhood

def f1(ssh: SSHClient, ...):
    ...

def f2(ssh: SSHClient, ...):
    ...

def start_sshclient() -> SSHClient:
    ...
    return ssh

def close_sshclient(ssh: SSHClient):
    ssh.close()

nbr = Neighborhood()
nbr.add_function(f1)
nbr.add_function(f2)
nbr.add_function(start_sshclient, initialize=True)
nbr.add_function(close_sshclient, finalize=True)
In this implementation, start_sshclient and close_sshclient would always be run, even if an error is raised during f1 or f2 (basically akin to a with statement, and could probably be implemented as such).
Neighborhood.run_step to allow for a with statement that could cover this
RunManager, which allows for something akin to:
with RunManager(start=[start_sshclient], stop=[close_sshclient]):
    Neighborhood.run_step()
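A rough sketch of how such a RunManager could guarantee the closing actions. This class does not exist in porchlight; the start and stop keyword names follow the snippet above, and the resource functions are placeholders. The stop callables run from __exit__, which Python calls whether or not the body raised:

```python
class RunManager:
    """Hypothetical context manager: run start callables on entry, stop callables on exit."""

    def __init__(self, start=(), stop=()):
        self.start = list(start)
        self.stop = list(stop)

    def __enter__(self):
        for func in self.start:
            func()
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Runs whether or not the body of the with block raised.
        for func in self.stop:
            func()
        return False  # do not swallow the exception


events = []


def open_resource():
    events.append("open")


def close_resource():
    events.append("close")


try:
    with RunManager(start=[open_resource], stop=[close_resource]):
        events.append("step")
        raise RuntimeError("failure inside a step")
except RuntimeError:
    events.append("error seen by caller")

print(events)
```

This sketch does not pass the opened resource on to the stop callables; sharing it (as close_sshclient(ssh) would need) is left open.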
Right now, when a new function is added, its argument names may not match the parameter names already in use. It would be nice to write something like
@Door(map_arguments={'temp': 'temperature'})
def my_func(temp):
    return pressure
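A minimal sketch of the renaming idea as a plain decorator, independent of Door. It assumes the mapping reads {function_argument_name: external_parameter_name}, as in the snippet above; the decorator itself and the pressure formula are invented for illustration:

```python
import functools


def map_arguments(mapping):
    """mapping is {function_argument_name: external_parameter_name} (an assumption)."""
    internal_for = {external: internal for internal, external in mapping.items()}

    def decorator(func):
        @functools.wraps(func)
        def wrapper(**kwargs):
            # Translate external names to the names the function expects.
            translated = {
                internal_for.get(name, name): value for name, value in kwargs.items()
            }
            return func(**translated)

        return wrapper

    return decorator


@map_arguments({"temp": "temperature"})
def my_func(temp):
    pressure = 2.0 * temp  # placeholder relation for the example
    return pressure


print(my_func(temperature=10.0))
```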
Open issue for getting test coverage to 100% for the following files:
porchlight/utils/inspect_functions.py
porchlight/utils/typing_functions.py
When a function is decorated, the resulting door will not correctly parse the name of the function, nor its return values.
The following will reproduce the problem:
from porchlight import Door

def my_wrapper(func):
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        return result
    return wrapper

@my_wrapper
def my_test_fxn():
    pass

my_door = Door(my_test_fxn)
print(my_door.name)  # >> 'wrapper', but should be 'my_test_fxn'
print(my_door.return_vals)  # >> [['result']], but expected []
This also occurs when using @Door as a decorator, as well as derived classes like DynamicDoor being worked on in issue #5.
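For reference, the standard library already has a convention for this: a decorator that uses functools.wraps copies the original name onto the wrapper and records the original function as __wrapped__, which inspect.unwrap follows. The sketch below shows only that mechanism; whether Door should rely on it is an open question, and it only helps when the decorator author opted in:

```python
import functools
import inspect


def my_wrapper(func):
    @functools.wraps(func)  # copies __name__, __doc__, and sets __wrapped__
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        return result

    return wrapper


@my_wrapper
def my_test_fxn():
    pass


# Follow the __wrapped__ chain back to the undecorated function.
original = inspect.unwrap(my_test_fxn)
print(my_test_fxn.__name__)
print(original.__code__.co_name)
```

Parsing the source of `original` rather than the wrapper would avoid picking up the wrapper's own `return result`.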
Right now, Empty == x evaluates to True if x in (Empty(), Empty). This is a little confusing and will be inconsistent across code as a result.
Empty should either have one (and only one) instance, or must be initialized whenever it is used. I am leaning towards a singleton implementation, but there are some restrictions there (particularly for extending Empty easily in the future).
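A sketch of the singleton option, not porchlight's actual Empty. The class name EmptyType is an assumption made here so that the name Empty can refer to the single instance:

```python
class EmptyType:
    """Hypothetical singleton sentinel; every call returns the same instance."""

    _instance = None

    def __new__(cls):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance

    def __repr__(self):
        return "Empty"


Empty = EmptyType()

# Constructing again yields the same object, so identity checks are reliable.
print(EmptyType() is Empty)
# The instance no longer compares equal to its class.
print(Empty == EmptyType)
```

A subclass of EmptyType would inherit the cached _instance and hand back the parent's object, which is one concrete form of the extension restriction mentioned above.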
This is a bit nit-picky, but right now the __str__ method for porchlight.param.Param returns the __str__ for each of the slots it writes out:
porchlight/porchlight/param.py, line 103 in bd65757
This isn't terrible, but if we pass a str as a value for the parameter, it will not be clear the value is a string if it is a string of python-like data.
my_data = [0, 1, 2, 3]
my_str_data = str(my_data)
pr = Param("my_data", my_str_data)
print(pr)
# Param(name=my_data, value=[0, 1, 2, 3], constant=False, type=<class 'str'>)
This also may be confusing for newer folks, since the only indication that value is a str is the type attr, and there's no telling what my_data might be.
If the repr for each of these is used, it will make cases like these unambiguous, since it would instead return the string "Param(name='my_data', value='[0, 1, 2, 3]', constant=False, type=<class 'str'>)".
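A sketch of the repr-based formatting on a stand-in class. This is not the real porchlight.param.Param; only the four fields shown in the output above are assumed. The !r conversion in an f-string formats a value with repr() instead of str():

```python
class Param:
    """Stand-in for porchlight.param.Param with the four fields shown above."""

    __slots__ = ("name", "value", "constant", "type")

    def __init__(self, name, value, constant=False):
        self.name = name
        self.value = value
        self.constant = constant
        self.type = type(value)

    def __str__(self):
        # !r formats each field with repr() instead of str().
        return (
            f"Param(name={self.name!r}, value={self.value!r}, "
            f"constant={self.constant!r}, type={self.type!r})"
        )


pr = Param("my_data", str([0, 1, 2, 3]))
print(pr)
# Param(name='my_data', value='[0, 1, 2, 3]', constant=False, type=<class 'str'>)
```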
It would be useful to define a transformation into a desired input file format and stick it to a Param object to immediately construct when a specific Door is executed. Right now, to support an external input file for a model either the Door must handle this or a separate Door must be made. This also adds an extra barrier to using non-Python doors (which are more likely to solely rely on an input file in my own science use cases).
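To make the idea concrete, here is a small sketch using only the standard library. The "name = value" file format, the function names, and the parameter values are all invented for illustration; a temporary directory is used rather than NamedTemporaryFile so the file can be reopened by path on any platform:

```python
import os
import tempfile


def to_input_text(params):
    """Render parameters as 'name = value' lines (a made-up format)."""
    return "".join(f"{name} = {value}\n" for name, value in params.items())


def model(input_path):
    """Stand-in for an external model that only reads an input file."""
    with open(input_path) as handle:
        return handle.read()


params = {"temperature": 300.0, "pressure": 1.0}

with tempfile.TemporaryDirectory() as workdir:
    # The input file lives only as long as the with block.
    input_path = os.path.join(workdir, "model.in")
    with open(input_path, "w") as handle:
        handle.write(to_input_text(params))
    seen_by_model = model(input_path)

print(seen_by_model)
```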
ParamSet class that contains multiple parameters.
InputFile class that is a ParamSet with temporary file support.
NamedTemporaryFile or something akin to it) is then passed to the model as the input file.
Although this may seem hacky, there are some significant benefits (at least for me) having input files be managed like this. Firstly, if temporary files can be used there's no danger of local files being overwritten.
NamedTemporaryFile works on Windows (see the tempfile documentation)
There are plenty of situations where a function of interest may not be user-defined, but could be useful to handle this somehow.
NotImplementedError?
Door that this function will fail through inspect but we will assert the inputs.
porchlight
Minimal example:
from porchlight import Door

@Door
def fxn_wrapper():
    '''This function tries to return a new function.'''
    def ret_fxn():
        return some_value
    return ret_fxn

print(fxn_wrapper.variables)
# Expected output:
# ['ret_fxn']
#
# Present output:
# ['some_value', 'ret_fxn']
This is not desired behavior for, e.g., wrapped functions where the output is not of particular concern. This could be an optional argument passed to the Door initializer, since there is some convenience to being able to modify tracked variables using wrapped functions alongside the functions themselves.
BaseDoor.return_types is meant to communicate what specific return parameters' types are. This isn't always going to be possible, and handling the case where there are no type hints is important.
That said, this is primarily a convenience for the user here. The only situation where BaseDoor.return_vals is integral to porchlight's functionality is with DynamicDoor objects, since they rely on attributing a return type to a function (see issue #5 and pull request #22). Relying on return type hints too much is a significant concern as well, since it may cause misinterpretations of what porchlight is meant to be doing in the background.
The solution to the confusion here is independent of (but might be resolvable alongside) Issue #19.
This method is not currently covered by unit tests.
The ability to dynamically generate doors with porchlight would be nice: re-defining doors as is already done with parameters. This would allow for updating function wrappers based on existing parameters.
For example, suppose we have a function, blackbody_den, that will output a new function bb at a given temperature. E.g.,
from math import exp
import typing

@Door(returned_def_to_door)
def blackbody_den(temperature: float) -> typing.Callable:
    def bb(wavelength: float) -> float:
        intensity = a * wavelength**-5 * (exp(-b/(wavelength * temperature)) - 1)**-1
        return intensity
    return bb
bb could be another Door included in the neighborhood object. It can be called in order with everything else, with some rules:
bb must be called after blackbody_den is called, unless initialized.
bb must abide by all current rules within the Neighborhood object.
It would be useful to spawn new Neighborhoods (and corresponding data) on command.
I have an extant Neighborhood object, nbr, that I want to stop at some point and then feed two different inputs. Currently, if I wanted to manage this I'd need two instances to get the job done. This is a hassle if any initializations are required that take appreciable time. It would be easier to initialize a single Neighborhood and then have it run the two cases in tandem.
Consider the following example:
# Assume `nbr` was previously initialized as a Neighborhood()
# Run 10 steps
for _ in range(10):
    nbr.run_step()

# Now take this instance and create a new Neighborhood using '.fork()',
# which returns a new Neighborhood object with the same state as nbr.
nbr_other = nbr.fork()
print(nbr_other == nbr)
# Out: "True"

# Now, could run these two as if they were independent neighborhoods.
for _ in range(100):
    nbr.run_step()
    nbr_other.run_step()

# At this point, nbr and nbr_other will have significantly diverged.
print(nbr_other == nbr)
# Out: "False"
With this, more efficient comparison of the two models can be performed as they each evolve.
The primary barrier to this being immediately possible is deciding how copying will be applied to the new neighborhood object.
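One candidate answer to the copying question is a full deep copy, sketched below on a toy class. MiniNeighborhood, its state layout, and the increment argument are invented here; a real Neighborhood may hold objects (open files, connections) that cannot or should not be deep-copied:

```python
import copy


class MiniNeighborhood:
    """Toy stand-in holding mutable state, used only to illustrate fork()."""

    def __init__(self, state):
        self.state = state

    def run_step(self, increment):
        history = self.state["history"]
        history.append(history[-1] + increment)

    def fork(self):
        # deepcopy duplicates nested containers, so the fork shares nothing.
        return copy.deepcopy(self)

    def __eq__(self, other):
        return isinstance(other, MiniNeighborhood) and self.state == other.state


nbr = MiniNeighborhood({"history": [0]})
for _ in range(10):
    nbr.run_step(1)

nbr_other = nbr.fork()
equal_after_fork = nbr_other == nbr

# Feed the two copies different inputs so they diverge.
for _ in range(100):
    nbr.run_step(1)
    nbr_other.run_step(2)
equal_after_divergence = nbr_other == nbr

print(equal_after_fork, equal_after_divergence)
```

A shallow copy (copy.copy) would leave both objects appending to the same history list, which is exactly the kind of shared state a fork is meant to avoid.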
Initially, this was implemented this way to prevent weird behavior across functions, but the below shows an example of this standard failing.
import porchlight

# Below works as expected. The return value is visible as 'why'.
@porchlight.Door(argument_mapping={'ecks': 'x', 'why': 'y'})
def example_door(x, y):
    y = x + 1
    return y

# This raises a DoorError
@porchlight.Door(argument_mapping={'ecks': 'x', 'why': 'y'})
def example_door(x):
    y = x + 1
    return y
Output:
DoorError: why is not a valid argument for example_door
DoorError).
Neighborhood management, since a universal dictionary of mappings could be kept.
Describe the bug
If an initialization function does not have a return value, it will raise TypeError: 'NoneType' object is not iterable.
To Reproduce
Steps to reproduce the behavior (a code snippet preferred):
from porchlight import Neighborhood

def nop_initialization():
    pass

def nop_fxn():
    pass

# Note: this is also the case for None-returning initialization function
# passed in a list of functions.
nbr = Neighborhood([nop_fxn], initialization=nop_initialization)
nbr.run_step()  # Error raised
Expected behavior
The initialization function should have executed and not been checked for outputs.
Screenshots
N/A
Specs (please complete the following information):
Additional context
This could also be a problem with finalization.
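The error message suggests a None result is being iterated over. A generic guard for that pattern is sketched below; normalize_outputs is a made-up helper and says nothing about where in porchlight the iteration actually happens:

```python
def normalize_outputs(result):
    """Return a tuple of outputs so callers can always iterate."""
    if result is None:
        return ()
    if isinstance(result, tuple):
        return result
    return (result,)


def nop_initialization():
    pass


# Iterating over the normalized result is safe even when nothing is returned.
for value in normalize_outputs(nop_initialization()):
    print(value)
```

The same guard would apply to finalization functions if they are handled the same way.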
Currently, there are some significant issues with how logging works for porchlight. This issue is to track ideas for and progress towards a 1.0-ready logging framework.
In the About page of the documentation, the list of example use cases does not render properly. Based on my research, this seems to be a problem with how the scheme is implemented. This isn't urgent enough to stall on v0.4.0, so I'm making it an issue for now.
Right now, the current framework allows for multiple returns to occur throughout a function (not including returns from embedded functions; see #5 as an example of a special case).
Here is a minimal case:
import porchlight
import random

@porchlight.Door
def many_ret_func():
    """Four random integers and a random selection of what to return."""
    a, b, c, d = (random.randint(0, 10) for _ in range(4))
    die_roll = random.randint(1, 100)

    # The die roll (a d100) determines which return statement executes
    if die_roll < 50:
        return a, b
    elif die_roll >= 95:
        return d, c
    else:
        return a, b, c, d

# Door(
#     name=many_ret_func,
#     base_function=<function many_ret_func at 0x42>,
#     arguments={},
#     return_vals=[['a', 'b'], ['d', 'c'], ['a', 'b', 'c', 'd']]
# )
There is no way for Door to predict which of the return statements will execute when the function is called. When Door parses many_ret_func for return values, the current return_vals is a list of lists with a length greater than 1 and the actual return values of the function cannot be disentangled. In this case, type checking is useless since a, b, c, and d are the same type, and while we could intuit when a, b, c, d is returned due to its length, there's no hope for distinguishing between a, b and d, c.
The easiest, and I think most reasonable, path forward is to restrict Door definitions to having only one return type bound to one set of return variables. Multiple returns can occur throughout a definition, but all must be identical. This would raise an exception when BaseDoor._get_return_vals evaluates a definition breaking this rule.
An alternative would be to somehow follow execution introspectively. I don't think this is within the scope of a beta release as it is. It seems like a dangerous thing to implement within the current framework, since so much of the long-term design is still in its infancy. Even if implemented, it would probably need to be used with caution and care outside of specific use cases. Overall, identifying return values at runtime would be interesting but not particularly useful.
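The single-return rule could be checked statically. The sketch below uses the standard library's ast module on a source string; it is not how BaseDoor._get_return_vals works, the helper names are made up, and ValueError stands in for whatever exception porchlight would raise:

```python
import ast


def returned_names(source):
    """List the names in each return statement of the first function in source."""
    function = ast.parse(source).body[0]
    found = []
    pending = list(function.body)
    while pending:
        node = pending.pop()
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            continue  # returns inside nested definitions belong to them
        if isinstance(node, ast.Return):
            value = node.value
            items = [] if value is None else (
                value.elts if isinstance(value, ast.Tuple) else [value]
            )
            found.append(tuple(
                item.id if isinstance(item, ast.Name) else "<expression>"
                for item in items
            ))
            continue
        pending.extend(ast.iter_child_nodes(node))
    return found


def check_single_return(source):
    """Return the one set of returned names, or raise if the returns disagree."""
    distinct = set(returned_names(source))
    if len(distinct) > 1:
        raise ValueError(f"conflicting return statements: {sorted(distinct)}")
    return distinct.pop() if distinct else ()


CONSISTENT = """
def fxn(x):
    y = str(x)
    if x:
        return y
    return y
"""

CONFLICTING = """
def fxn(die_roll, a, b, c, d):
    if die_roll < 50:
        return a, b
    return d, c
"""

print(check_single_return(CONSISTENT))
try:
    check_single_return(CONFLICTING)
except ValueError as error:
    print(error)
```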
Door:
typing.Any).
Door is changed during runtime, such as with a DynamicDoor, it will apply the previous rules to changes as if the function was being re-initialized.
Rule 3 here is the current implementation, not necessarily the best one. It's not the most important question being addressed here, anyways.
import typing

##################
# VALID EXAMPLES #
##################
def valid_fxn1():
    pass

def valid_fxn2():
    return

def valid_fxn3() -> None:
    return

def valid_fxn4(x):
    y = str(x)
    return y

def valid_fxn5(x, y, z) -> typing.Tuple[str, int]:
    max1 = max(x, y, z)
    if x == max1:
        max2 = max(y, z)
    elif y == max1:
        max2 = max(x, z)
    else:
        max2 = max(x, y)
    outstr = f"Sum of two max values is {max1 + max2} = {max1} + {max2}"
    value = max1 + max2
    return outstr, value

####################
# INVALID EXAMPLES #
####################
def invalid_fxn1():
    return 1 + 6

def invalid_fxn2(x: float, ret_sign: bool = False) -> float:
    sign = -1 if x < 0. else 1
    abs_value = abs(x)
    if ret_sign:
        return abs_value, sign
    else:
        return abs_value

def invalid_fxn3(x: int) -> int:
    square = x ** 2
    cube = x ** 3
    if square > cube:
        return square
    else:
        return cube
print statements from testing were left behind.
It would be useful to be able to place memory limits on objects at runtime (as a Param check), so that some relevant exception would be raised if the size of an object exceeds a global maximum limit or a user-defined limit specific to a parameter.
The BaseDoor class-level docstring is a filler, and should be replaced with something more descriptive.
It would be useful to expand on the examples in README.md, which at this point is skeletal, to include a more comprehensive set of use cases.
Importantly, it would be good to have at least three cases spanning unique niches this could be used in.
It would be nice for parameters to check themselves when being read from/written to as a means of catching errors unique to a data structure.
An intuitive example would be that the value of something with mass should, generally, be positive. Being able to add that to a parameter would avoid errors like a negative mass propagating forward through the program.
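The mass example could look roughly like this. CheckedParam and its check argument are hypothetical and unrelated to the real Param class, and this sketch validates only on write, not on read:

```python
class CheckedParam:
    """Hypothetical parameter that validates every assignment to value."""

    def __init__(self, name, value, check=None):
        self.name = name
        self._check = check
        self._value = None
        self.value = value  # goes through the setter, so it is validated too

    @property
    def value(self):
        return self._value

    @value.setter
    def value(self, new_value):
        if self._check is not None and not self._check(new_value):
            raise ValueError(f"{self.name}={new_value!r} failed its check")
        self._value = new_value


mass = CheckedParam("mass", 1.5, check=lambda m: m > 0)

try:
    mass.value = -2.0  # rejected before it can propagate
except ValueError as error:
    print(error)

print(mass.value)
```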
Off the top of my head I can think of the following situations where this is helpful:
Implementation notes:
Param while avoiding the possibility for screw-ups
Right now, BaseDoor._get_return_vals uses inspect.getsourcelines to retrieve the lines of code defining the function of interest, but this seems like a slow means of parsing for return values, especially if functions are large and/or DynamicDoors are considered (see issue #5). That's not to consider future features which may want to retrieve source multiple times, or in different ways.
If the source could be retrieved as a single string instead of a list, for example, then it could be parsed directly with re. Not sure how much of an improvement that would be.
Basically, this needs more thought/research. It works the way it is, and is not an issue, but might have a much better alternative.
Just an underformed idea, but being able to call directly by reference would be nice. For example:
from porchlight import Neighborhood, Door

@Door
def test_function():
    pass

neighborhood = Neighborhood(test_function)
neighborhood.call(test_function)  # checks function ID to see if it is referenced in the Neighborhood.
This would also encourage cleanliness when naming functions. It should raise a NeighborhoodError if the id is not present in the neighborhood, with a helpful message.
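A toy version of the lookup, keyed on id(). ReferenceNeighborhood is invented for this sketch, and LookupError stands in for the NeighborhoodError mentioned above:

```python
class ReferenceNeighborhood:
    """Toy registry that finds functions by object identity rather than by name."""

    def __init__(self, *functions):
        # Holding references keeps the objects alive, so their ids stay unique.
        self._by_id = {id(function): function for function in functions}

    def call(self, function, *args, **kwargs):
        if id(function) not in self._by_id:
            name = getattr(function, "__name__", repr(function))
            raise LookupError(f"{name} has not been added to this neighborhood")
        return function(*args, **kwargs)


def test_function():
    return "called"


def stranger():
    return "never registered"


neighborhood = ReferenceNeighborhood(test_function)
print(neighborhood.call(test_function))

try:
    neighborhood.call(stranger)
except LookupError as error:
    print(error)
```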
Currently, a NotImplementedError is raised if any positional-only arguments are present in a function being converted into a BaseDoor. The initial reasoning behind this was only to skip over implementing this while the rest of the code was worked on, since this was not a common use case at the time (and still isn't), but it's also a pretty obvious feature to include.
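For context, positional-only parameters (those before a `/` in the signature, Python 3.8+) can be identified with inspect. The helper name below is made up, and how BaseDoor would then pass such arguments is a separate question:

```python
import inspect


def positional_only_names(func):
    """Return the names of func's positional-only parameters, in order."""
    return [
        name
        for name, parameter in inspect.signature(func).parameters.items()
        if parameter.kind is inspect.Parameter.POSITIONAL_ONLY
    ]


def example(a, b, /, c, *, d=0):
    return a + b + c + d


print(positional_only_names(example))
```

When calling such a function, these arguments have to be supplied by position; passing a=1 as a keyword raises a TypeError.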
There's a requirement right now for __name__ to be present in a function, but there's not necessarily a specific need for it. It throws errors when otherwise perfectly ok callables (e.g., callables returned by other functions) are passed to doors directly, which should not be an issue.
BaseDoor.__name__ to be None or '' if not present.
porchlight ID in lieu of a name.
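A sketch of the fallback idea: use __name__ when it exists and otherwise generate an identifier. The helper, the "door" prefix, and the id-based format are assumptions; functools.partial objects are used because they are callables without a __name__ attribute:

```python
import functools


def display_name(func, fallback_prefix="door"):
    """Use __name__ when available, otherwise build an identifier from id()."""
    name = getattr(func, "__name__", None)
    if name:
        return name
    return f"{fallback_prefix}-{id(func):x}"


partial_max = functools.partial(max, 0)

print(display_name(max))
print(display_name(partial_max).startswith("door-"))
```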
There are plenty of cases where the initial state of a parameter might not be known until other doors have run. Currently, this level of initialization has to be done on the user-end to avoid the program failing.
from porchlight import Door, Neighborhood

# Define two functions, with the second requiring the output of the first.
def fxn_one(x):
    y = x + 1
    return y

def fxn_two(y):
    print("Hello!")

nbr = Neighborhood()
nbr.add_function(fxn_one)
nbr.add_function(fxn_two)

# Ideally, just providing the value of x should be sufficient to call a step.
nbr.set_param('x', 1)

# However, this raises a ParameterError because 'y' is an undefined input
# and is required by the current way Neighborhood objects check their state
# before running. So this would only succeed in printing "Hello!" if 'y' is set
# to a non-Empty value.
nbr.set_param('y', 0)
This could be done by removing the requirement for non-Empty values and waiting for the Neighborhood to fail when calling the door. The errors here might be cryptic, though, and if a particularly lengthy Door needs to run before the error would get caught that could be an issue.
Need to incorporate an informative __str__ method for the porchlight.neighborhood.Neighborhood object. This could include a method that incorporated a "full report" style string useful for console/terminal updates.
The base string should include the following information:
e.g.,
from porchlight.neighborhood import Neighborhood

neighborhood = Neighborhood()
neighborhood.add_param('x', 7)
neighborhood.add_param('hello', 'beep')

def ex_fxn(x: int) -> str:
    hello = f"{x + 5 = }"
    return hello

neighborhood.add_function(ex_fxn)
print(neighborhood)
# This would result in something like
# Neighborhood(1 door, 2 params, 0 Empty)
Pretty print is up in the air, something that could be updated in the terminal or easily logged.
The documentation for DynamicDoor needs to be updated and expanded.