
Comments (7)

MilesCranmer commented on June 19, 2024

You can simplify GPlearn expressions with sympy as follows:

...
model = SymbolicRegressor(**kwargs)
...

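import sympy
from sympy import sympify, simplify

# Map gplearn function names to their sympy equivalents.
# Note: this name shadows Python's built-in locals().
locals = {
    'sub': lambda x, y : x - y,
    'div': lambda x, y : x/y,
    'mul': lambda x, y : x*y,
    'add': lambda x, y : x + y,
    'neg': lambda x    : -x,
    'pow': lambda x, y : x**y,
    'cos': lambda x    : sympy.cos(x)
}
# sympify parses strings, so convert the program with str() first.
simplify(sympify(str(model._program), locals=locals))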

This is helpful for simplification + display of equations.

Though, I wonder if there is a way to convert the sympy expression back into gplearn using the same locals dictionary? Then it could be done automatically at each step to trim individual expressions.

Credit: https://stackoverflow.com/questions/48404263/how-to-export-the-output-of-gplearn-as-a-sympy-expression-or-some-other-readable

from gplearn.

MilesCranmer commented on June 19, 2024

To answer, a couple years ago I decided it would be easier to just write an SR package from scratch - https://github.com/MilesCranmer/PySR. This one does some simplification throughout the search but I found doing it too often prevents the genetic algorithm from exploring effectively, since redundant unsimplified expressions can act as a stepping stone to more optimal ones.


trevorstephens commented on June 19, 2024

Thanks for the feedback @jamartinh ... I'll have to look into sympy, first concern would be the requirement in GP for closure as we use "safe division", strange concoctions of square roots/logs, and so on, to avoid infs sneaking into the results. I'll leave the issue open and take a peek at what sympy is capable of though!
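The closure concern can be made concrete with a small sketch. It assumes a protected division that returns 1.0 when the denominator is close to zero; the `protected_div` helper and its `eps` threshold are hypothetical stand-ins, not gplearn's actual implementation. The mapping uses plain operators, in the same style as the dictionary posted earlier in this thread.

```python
import sympy


def protected_div(a, b, eps=1e-3):
    """Hypothetical 'safe division': return 1.0 when the denominator is near zero."""
    return a / b if abs(b) > eps else 1.0


# Plain, unprotected operators (an assumption of this sketch).
converter = {
    "mul": lambda x, y: x * y,
    "div": lambda x, y: x / y,
}

program = "div(mul(X0, X1), X0)"
simplified = sympy.sympify(program, locals=converter)
print(simplified)  # X1 (sympy cancels X0 as soon as the expression is built)

# Evaluate both versions at a point where the denominator is zero.
x0, x1 = 0.0, 5.0
gp_value = protected_div(x0 * x1, x0)  # protected semantics
sympy_value = float(simplified.subs({"X0": x0, "X1": x1}))
print(gp_value, sympy_value)  # 1.0 5.0
```

The two values disagree at `X0 = 0`, so a simplified expression is not guaranteed to score the same fitness as the program it came from when protected operators are involved.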


MilesCranmer commented on June 19, 2024

Okay it looks like srepr does the reverse.

srepr(simplify(sympify(str(model._program), locals=locals)))

converts it back into the print(model._program) format. Though it's slightly different:

Add(Mul(Float('0.13100000000000001', precision=53), Pow(Symbol('X0'), Integer(2))), Mul(Integer(-1), Pow(Symbol('X1'), Integer(2))), Symbol('X1'))

So here's some regex to put it in the same format:

import re
from sympy import srepr

print('GPLearn:', model._program)
# GPLearn: sub(X1, sub(mul(X1, X1), mul(mul(X0, 0.131), X0)))
print('sympy:', simplify(sympify(str(model._program), locals=locals)))
# sympy: 0.131*X0**2 - X1**2 + X1
sympy_string = srepr(simplify(sympify(str(model._program), locals=locals)))

# Strip the Symbol/Integer/Float wrappers, lowercase the function names,
# then restore the upper-case variable names (x0 -> X0).
sympy_string = re.sub(
    r"x([0-9]+?)", r"X\1",
    re.sub(
        r"Float\('([\-0-9\.]+)', precision=[0-9]+?\)", r"\1",
        re.sub(
            r"Integer\(([\-0-9]+)\)", r"\1",
            re.sub(r"Symbol\('(.+?)'\)", r"\1", sympy_string),
        ),
    ).lower(),
)

print('srepr:', sympy_string)
# srepr: add(mul(0.13100000000000001, pow(X0, 2)), mul(-1, pow(X1, 2)), X1)

Can I pass this string back into GPLearn somehow? Then it would be easy to auto-simplify with sympy every loop.


hwulfmeyer commented on June 19, 2024

Each individual in gplearn is internally saved in prefix notation in a list. So what you simply have to do is convert your string, which already seems to be in prefix notation, to a list. I also began experimenting with that here: https://github.com/wulfihm/gplearn_ba/blob/master/gplearn/_program.py
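As a rough illustration of the string-to-list step, here is a minimal standard-library sketch. It makes several assumptions: variables are named `X<index>` and stored as integers, constants are stored as floats, and function names are kept as plain strings (mapping them to gplearn's internal function objects is left out). The names `tokenize`, `parse_tree`, `flatten`, and `to_prefix_list` are hypothetical. Because sympy's `Add` and `Mul` can take more than two arguments, the sketch folds them into nested binary calls.

```python
import re

_TOKEN = re.compile(r"\s*([A-Za-z_]\w*|-?\d+\.?\d*(?:[eE][-+]?\d+)?|[(),])")


def tokenize(text):
    """Split 'add(X0, 0.5)' into ['add', '(', 'X0', ',', '0.5', ')']."""
    text = text.strip()
    tokens, pos = [], 0
    while pos < len(text):
        match = _TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse_tree(tokens):
    """Recursive descent into (name, [children]) tuples, int variables, float constants."""
    def expr(i):
        tok = tokens[i]
        if i + 1 < len(tokens) and tokens[i + 1] == "(":
            args, i = [], i + 2
            while True:
                node, i = expr(i)
                args.append(node)
                if tokens[i] == ")":
                    return (tok, args), i + 1
                if tokens[i] != ",":
                    raise ValueError("expected ',' or ')'")
                i += 1
        if re.fullmatch(r"X\d+", tok):
            return int(tok[1:]), i + 1  # variable index
        return float(tok), i + 1  # numeric constant

    tree, end = expr(0)
    if end != len(tokens):
        raise ValueError("trailing tokens after expression")
    return tree


def flatten(node):
    """Emit prefix order, folding n-ary add/mul into nested binary calls."""
    if not isinstance(node, tuple):
        return [node]
    name, args = node
    while len(args) > 2 and name in ("add", "mul"):
        args = [(name, args[:2])] + args[2:]
    out = [name]
    for arg in args:
        out.extend(flatten(arg))
    return out


def to_prefix_list(text):
    return flatten(parse_tree(tokenize(text)))


print(to_prefix_list("sub(X1, mul(X0, 0.5))"))
# ['sub', 1, 'mul', 0, 0.5]
```

Note that a simplified expression may contain operations such as `pow` that are not in the function set used for the search, so the resulting list would still need to be validated before being handed back to the evolutionary loop.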

Some issues I encountered: simplify can fail, take too long to simplify the expression (I am talking about 30 minutes here), or never finish at all. My expressions were, I think, sometimes 150 symbols long, so that could be why. Maybe I also did something wrong; I am not sure anymore. As far as I remember, I also sometimes got invalid individuals and couldn't figure out why. So definitely make sure to implement some safeguards there.

Anyway, from a genetic programming standpoint I would advise also experimenting with simplifying only every N loops. Simplifying every loop could destroy important "genetic" information, making it harder to reach a good solution, and may also reduce the effectiveness of the genetic operators. Another idea I just had is to implement simplification as an operator similar to subtree mutation: instead of mutating the subtree, it is simplified with probability Ps. That saves computation time, does not completely "destroy" the individuals, and may be more beneficial for the whole genetic programming process.
Simplification in GP is also not novel; there are definitely papers out there where people experiment with it, so they could be a useful source of prior work.
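The two scheduling ideas (every N generations, and with probability Ps) can be sketched together as follows. This is a simplified variant that acts on whole individuals rather than on a randomly chosen subtree; `maybe_simplify`, `toy_simplify`, and the string-based population are hypothetical placeholders for illustration only.

```python
import random


def maybe_simplify(population, simplify_fn, generation, every_n=10, p_s=0.2, rng=None):
    """Simplify each individual with probability p_s, but only every `every_n` generations."""
    if generation % every_n != 0:
        return list(population)
    rng = rng or random.Random()
    return [simplify_fn(ind) if rng.random() < p_s else ind for ind in population]


def toy_simplify(expr):
    """Stand-in for a real simplifier: rewrites 'add(X0, 0)' to 'X0'."""
    return expr.replace("add(X0, 0)", "X0")


population = ["add(X0, 0)", "mul(X1, add(X0, 0))", "X1"]

# Generation 3 is not a multiple of every_n, so nothing changes.
print(maybe_simplify(population, toy_simplify, generation=3))

# With p_s=1.0 every individual is simplified on a scheduled generation.
print(maybe_simplify(population, toy_simplify, generation=10, p_s=1.0))
# ['X0', 'mul(X1, X0)', 'X1']
```

Keeping `p_s` well below 1 leaves most unsimplified individuals in the population, which is the point of the suggestion: redundant structure remains available to crossover and mutation.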


MilesCranmer commented on June 19, 2024

Awesome, thanks for sharing this advice! Very useful to know.

Good point - I agree that in practice a short cutoff time for simplification could be enough: if the cutoff is exceeded, the expression simply stays the same. I recall Mathematica's FullSimplify has a timeout parameter which is also used to pick faster/less complete simplification strategies; maybe sympy has that too.
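A cutoff of this kind can be approximated without relying on any option in sympy itself. The sketch below is one possible approach, not an established recipe: it uses a Unix alarm signal, so it assumes a Unix-like system and must run in the main thread. `call_with_timeout` and `slow_simplify` are hypothetical names, and `slow_simplify` only imitates a simplification that takes too long.

```python
import signal
import time


class _Timeout(BaseException):
    """Raised by the alarm handler; derives from BaseException so that
    broad 'except Exception' blocks inside the callee do not swallow it."""


def call_with_timeout(func, arg, seconds):
    """Return func(arg), or arg unchanged if the call runs longer than `seconds`."""
    def _handler(signum, frame):
        raise _Timeout()

    previous = signal.signal(signal.SIGALRM, _handler)
    signal.setitimer(signal.ITIMER_REAL, seconds)
    try:
        return func(arg)
    except _Timeout:
        return arg  # fall back to the unsimplified input
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)  # cancel any pending alarm
        signal.signal(signal.SIGALRM, previous)  # restore the old handler


def slow_simplify(expr):
    """Stand-in for a simplification that takes too long."""
    time.sleep(5)
    return "simplified"


print(call_with_timeout(slow_simplify, "add(X0, 0)", 0.2))  # add(X0, 0)
print(call_with_timeout(str.upper, "x0", 1.0))  # X0
```

In use, `func` would be something like `sympy.simplify` and `arg` a sympy expression. Interrupting a computation midway is abrupt, so running the simplification in a separate process may be a more robust option when that is affordable.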


jerabaul29 commented on June 19, 2024

Could this kind of functionality (simplifying with sympy and converting back to a tree) be provided through a helper function to reduce the complexity of applying it? :) I wonder also if having a parameter in the SymbolicRegressor to apply Sympy periodically could be helpful (possibly asynchronously / with some timeout so that it is "unfactorization-proof").
