Comments (7)
You can simplify gplearn expressions with sympy as follows:
...
model = SymbolicRegressor(**kwargs)
...
import sympy
from sympy import sympify, simplify

# Map gplearn's function names onto sympy operations so the program
# string can be parsed directly.  (Note: naming this dict `locals`
# shadows the builtin locals(), but matches sympify's keyword.)
locals = {
    'sub': lambda x, y: x - y,
    'div': lambda x, y: x / y,
    'mul': lambda x, y: x * y,
    'add': lambda x, y: x + y,
    'neg': lambda x: -x,
    'pow': lambda x, y: x ** y,
    'cos': lambda x: sympy.cos(x)
}
simplify(sympify(str(model._program), locals=locals))
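As a quick check, the same mapping works on a bare prefix string with no fitted model at all; the `program_str` below is just a stand-in for `str(model._program)`:

```python
import sympy
from sympy import sympify, simplify

# Stand-in for str(model._program); no fitted gplearn model needed.
program_str = "sub(X1, sub(mul(X1, X1), mul(mul(X0, 0.131), X0)))"

# Only the functions that actually appear in the string are required here.
locals = {
    'sub': lambda x, y: x - y,
    'mul': lambda x, y: x * y,
}
print(simplify(sympify(program_str, locals=locals)))
# → 0.131*X0**2 - X1**2 + X1
```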
This is helpful for simplifying and displaying equations. I wonder, though, whether there is a way to convert the sympy expression back into gplearn's format using the same locals dictionary? Then it could be done automatically at each step to trim individual expressions.
from gplearn.
To answer: a couple of years ago I decided it would be easier to just write an SR package from scratch - https://github.com/MilesCranmer/PySR. That one does some simplification throughout the search, but I found that doing it too often prevents the genetic algorithm from exploring effectively, since redundant, unsimplified expressions can act as stepping stones to more optimal ones.
Thanks for the feedback @jamartinh ... I'll have to look into sympy. My first concern would be the requirement in GP for closure: we use "safe division", strange concoctions of square roots/logs, and so on, to avoid infs sneaking into the results. I'll leave the issue open and take a peek at what sympy is capable of, though!
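For reference, gplearn's protected division returns 1 whenever the denominator's magnitude is at or below 0.001. One way to carry that closure into sympy is a `Piecewise`; this is an illustrative sketch, not gplearn's API, and `protected_div` is a made-up helper name:

```python
import sympy

# Sketch: mirror gplearn's protected division (returns 1 when
# |denominator| <= 0.001) as a sympy Piecewise, so the closure
# survives the round-trip into sympy.
def protected_div(x1, x2):
    return sympy.Piecewise((x1 / x2, sympy.Abs(x2) > 0.001), (1, True))

x, y = sympy.symbols('x y')
expr = protected_div(x, y)
print(expr.subs({x: 6, y: 2}))  # 3
print(expr.subs({x: 6, y: 0}))  # 1 (protected branch)
```

Pointing the `'div'` entry of the locals dictionary at a function like this, instead of plain `/`, would keep the semantics; the caveat is that `simplify` may rewrite the `Piecewise` in ways the GP closure did not intend.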
Okay, it looks like srepr does the reverse:
srepr(simplify(sympify(str(model._program), locals=locals)))
This converts it back into the print(model._program) format, though the output is slightly different:
Add(Mul(Float('0.13100000000000001', precision=53), Pow(Symbol('X0'), Integer(2))), Mul(Integer(-1), Pow(Symbol('X1'), Integer(2))), Symbol('X1'))
So here's some regex to put it in the same format:
print('GPLearn:', model._program)
>>> GPLearn: sub(X1, sub(mul(X1, X1), mul(mul(X0, 0.131), X0)))
print('sympy:', simplify(sympify(str(model._program), locals=locals)))
>>> sympy: 0.131*X0**2 - X1**2 + X1
import re
from sympy import srepr

sympy_string = srepr(simplify(sympify(str(model._program), locals=locals)))
sympy_string = (
    re.sub(r"x([0-9]+?)", r"X\1",
        re.sub(r"Float\('([\-0-9\.]+)', precision=[0-9]+?\)", r"\1",
            re.sub(r"Integer\(([\-0-9]+)\)", r"\1",
                re.sub(r"Symbol\('(.+?)'\)", r"\1",
                    sympy_string))).lower())
)
print('srepr:', sympy_string)
>>> srepr: add(mul(0.13100000000000001, pow(X0, 2)), mul(-1, pow(X1, 2)), X1)
Can I pass this string back into gplearn somehow? Then it would be easy to auto-simplify with sympy every loop.
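One half of that round trip can be sketched under the assumption that a gplearn program is a flat prefix list of functions, integer feature indices, and float constants. `to_prefix_list` is a hypothetical helper: it keeps function names as strings, whereas real code would look them up in gplearn's function set to get `_Function` objects (and would have to rewrite `pow`, which is not in gplearn's default set):

```python
import re

def to_prefix_list(s):
    """Tokenize a prefix-notation string such as
    'add(mul(0.131, pow(X0, 2)), X1)' into a flat prefix list,
    the layout gplearn uses internally for a program."""
    out = []
    for tok in re.findall(r"X\d+|[a-z]+|[-+]?\d*\.\d+|[-+]?\d+", s):
        if tok.startswith('X'):
            out.append(int(tok[1:]))   # feature index: 'X0' -> 0
        elif tok[0].isalpha():
            out.append(tok)            # function name (placeholder for _Function)
        else:
            out.append(float(tok))     # numeric constant
    return out

print(to_prefix_list("add(mul(0.131, pow(X0, 2)), X1)"))
# → ['add', 'mul', 0.131, 'pow', 0, 2.0, 1]
```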
Each individual in gplearn is internally saved as a list in prefix notation. So all you have to do is convert your string, which already seems to be in prefix notation, into such a list. I also began experimenting with that here: https://github.com/wulfihm/gplearn_ba/blob/master/gplearn/_program.py
Some issues I encountered were that simplify can fail, take too long to simplify the expression (I am talking about 30 minutes here), or never finish at all. My expressions were sometimes around 150 symbols long, which could be why. Maybe I also did something wrong, I am not sure anymore. As far as I remember I also got invalid individuals sometimes and couldn't figure out why, so definitely make sure to implement some safeguards there.
Anyway, from a genetic programming standpoint I would advise also experimenting with simplifying only every N loops. Simplifying every loop could destroy important "genetic" information, making it harder to reach a good solution, and may also negatively affect the genetic operators. Another idea I just had was to treat simplification like subtree mutation: instead of mutating the subtree, simplify it with some probability Ps. That saves computation time, does not completely "destroy" the individuals, and may be more beneficial for the whole genetic programming process.
Simplifying in GP is also not novel; there are definitely papers out there where people experiment with it, so existing work could be a useful starting point.
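The probabilistic variant described above can be sketched in a few lines. This uses plain sympy expressions for illustration; `maybe_simplify` and `p_s` are made-up names, and in gplearn the individuals would first need converting to and from program form:

```python
import random
import sympy

def maybe_simplify(expr, p_s=0.1, rng=random):
    """Simplify an expression with probability p_s, analogous to
    applying a mutation operator; otherwise leave it untouched."""
    if rng.random() < p_s:
        return sympy.simplify(expr)
    return expr

x = sympy.Symbol('x')
e = sympy.cos(x) ** 2 + sympy.sin(x) ** 2
print(maybe_simplify(e, p_s=1.0))  # 1 (always simplified)
print(maybe_simplify(e, p_s=0.0))  # sin(x)**2 + cos(x)**2 (never touched)
```

In a real run `p_s` would be small, so most individuals keep their redundant subtrees as raw material for the genetic operators.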
Awesome, thanks for sharing this advice! Very useful to know.
Good point - I agree that in practice a short cutoff time for simplification could be enough; if the timeout hits, the expression just stays the same. I recall Mathematica's FullSimplify has a TimeConstraint option, which is also used to pick faster, less complete simplification strategies; maybe sympy has something similar.
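As far as I know sympy has no built-in simplify timeout, but a process-based cutoff along those lines is easy to sketch; `simplify_with_timeout` is a made-up helper, not part of any library:

```python
import multiprocessing as mp
import sympy

def _simplify_worker(expr_str, out_queue):
    # Round-trip through strings so nothing unpicklable crosses the
    # process boundary.
    out_queue.put(str(sympy.simplify(sympy.sympify(expr_str))))

def simplify_with_timeout(expr, timeout=5.0):
    """Run sympy.simplify in a child process; if it has not finished
    after `timeout` seconds, kill it and keep the original expression."""
    out_queue = mp.Queue()
    proc = mp.Process(target=_simplify_worker, args=(str(expr), out_queue))
    proc.start()
    proc.join(timeout)
    if proc.is_alive():
        proc.terminate()  # timed out: fall back to the unsimplified form
        proc.join()
        return expr
    try:
        return sympy.sympify(out_queue.get(timeout=1))
    except Exception:
        return expr  # worker died or produced nothing
```

Spawning a process per expression is heavy; batching a whole generation per worker, or only simplifying a sampled subset, would amortize the cost.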
Could this kind of functionality (simplifying with sympy and converting back to a tree) be provided through a helper function, to reduce the complexity of applying it? :) I also wonder whether a parameter on SymbolicRegressor to apply sympy periodically could be helpful (possibly asynchronously / with some timeout, so that it is "unfactorization-proof").