Memory issues about gplearn HOT 6 CLOSED

trevorstephens commented on June 19, 2024

Memory issues

Comments (6)

jamartinh commented on June 19, 2024

Hello, I have also experienced the same issue, and can't run my experiments
for many iterations.

I thought it was a problem with garbage collection.

I think this could be exposed as a parameter, e.g. num_generations_history.

Cheers,
Jose A.

2016-02-11 16:08 GMT+01:00 guyko81:

Hi Trevor,

it's a very nice implementation - I had been searching for such a solution
for a long time. So thank you very much!

I have only one issue: with a long evolution (generations =
some_huge_number, or population_size = some_huge_number together with
generations = some_number) the program runs out of memory. I checked the
code and it saves every iteration's population. Do you think that is
necessary? As I understand it, we only need the current population and the
best of the previous one at the start of each iteration.

What do you think: could the code be changed to reset
self._programs = []
before every iteration and keep only the previous population in a
self._programs_prev (or something similar)?
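
The proposal above (keep only the current and previous populations), generalised by the num_generations_history idea mentioned earlier in this thread, could look roughly like the sketch below. It is not gplearn code: BoundedHistory and num_generations_history are hypothetical names used only for illustration, and the populations are plain lists of strings standing in for programs.

```python
from collections import deque


class BoundedHistory:
    """Keep only the most recent generations of an evolutionary run."""

    def __init__(self, num_generations_history=2):
        # num_generations_history is the hypothetical parameter suggested in
        # this thread; it is not an actual gplearn argument.
        if num_generations_history < 1:
            raise ValueError("num_generations_history must be at least 1")
        self._programs = deque(maxlen=num_generations_history)

    def record(self, population):
        # Appending to a full deque drops the oldest generation, so memory
        # use stays bounded no matter how many generations are run.
        self._programs.append(list(population))

    @property
    def current(self):
        return self._programs[-1]

    @property
    def previous(self):
        # None until at least two generations have been recorded.
        return self._programs[-2] if len(self._programs) > 1 else None

    def __len__(self):
        return len(self._programs)


history = BoundedHistory(num_generations_history=2)
for generation in range(5):
    # Placeholder population: three named "programs" per generation.
    history.record([f"program_{generation}_{i}" for i in range(3)])
```

The trade-off is that the lineage of a final program can no longer be traced back beyond the retained generations.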

/ .- .-.. .-.. / -.-- --- ..- / -. . . -.. / .. ... / .-.. --- ...- .
José Antonio Martín H. (PhD) E-Mail: [email protected]
Computer Science Faculty Phone: (+34) 91 3947650
Complutense University of Madrid Fax: (+34) 91 3947527
C/ Prof. José García Santesmases,s/n 28040 Madrid, Spain
web: http://www.dacya.ucm.es/jam/
LinkedIn: http://www.linkedin.com/in/jamartinh (Let's connect)
.-.. --- ...- . / .. ... / .- .-.. .-.. / .-- . / -. . . -..

trevorstephens commented on June 19, 2024

Thanks for the report! I'll look into your hypothesis, @guyko81, but I suspect the issue is more likely with numpy arrays being stored as the equations are recursively evaluated. These /should/ be garbage collected by Python, as they are never stored in the object, but I'll check that out as well, @jamartinh.

I have seen this issue as well, and was thinking that an eval_size parameter might help by evaluating fewer samples at once, rather than the whole dataset. I've been meaning to work on a v0.2 for a while now. This should be top of the list.
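
As a rough illustration of the eval_size idea, the sketch below scores a program in fixed-size chunks so that only one chunk of predictions exists at a time. eval_size is the hypothetical parameter from this comment, not an existing gplearn option, and chunked_squared_error and example_program are made-up names; plain Python lists stand in for numpy arrays.

```python
def chunked_squared_error(program, X, y, eval_size=1000):
    """Sum of squared errors, evaluating at most eval_size rows at a time."""
    if eval_size < 1:
        raise ValueError("eval_size must be at least 1")
    total = 0.0
    for start in range(0, len(X), eval_size):
        X_chunk = X[start:start + eval_size]
        y_chunk = y[start:start + eval_size]
        # Only this chunk's predictions are alive at any one time.
        predictions = [program(row) for row in X_chunk]
        total += sum((p - t) ** 2 for p, t in zip(predictions, y_chunk))
    return total


def example_program(row):
    # Stand-in for an evolved expression: x0 * x0 + x1.
    return row[0] * row[0] + row[1]


X = [[float(i), 1.0] for i in range(10)]
y = [row[0] ** 2 + 1.0 for row in X]

# The chunk size changes peak memory use, not the resulting fitness.
full = chunked_squared_error(example_program, X, y, eval_size=len(X))
chunked = chunked_squared_error(example_program, X, y, eval_size=3)
```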

For now, you might find using n_jobs=1 more stable (fewer evaluations at once), or try raising the parsimony coefficient (parsimony_coefficient) to keep the programs smaller.

guyko81 commented on June 19, 2024

Thanks Trevor! I can't add anything more, so thank you :)

trevorstephens commented on June 19, 2024

I've located the main culprit. It is due almost entirely to saving the indices of X & y used for evaluating a program's fitness when max_samples is used. These indices are also retained even when no under-sampling is applied. I am working on a fix now, and can still retain all prior populations for inspecting the lineage of a final program.
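
One general way to avoid holding an index array per program is to store only the seed that produced the sample and regenerate the indices on demand. The sketch below shows that idea under stated assumptions; it is not necessarily how the fix was implemented, and ProgramRecord and get_indices are hypothetical names.

```python
import random


class ProgramRecord:
    """Remember how a subsample was drawn rather than the subsample itself."""

    def __init__(self, n_samples, max_samples, seed):
        self.n_samples = n_samples
        # max_samples is treated here as a fraction of the dataset.
        self.n_selected = int(max_samples * n_samples)
        # A single integer is stored instead of an array of indices.
        self.seed = seed

    def get_indices(self):
        # Re-seeding a fresh generator reproduces the same sample each time.
        rng = random.Random(self.seed)
        return sorted(rng.sample(range(self.n_samples), self.n_selected))


record = ProgramRecord(n_samples=1000, max_samples=0.5, seed=42)
first = record.get_indices()
second = record.get_indices()
```

The indices cost some time to recompute, but the memory held per stored program no longer grows with the size of the dataset.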

trevorstephens commented on June 19, 2024

I have also added a check at each evolution to see whether older generations are still relevant, i.e. whether any of their "dna" exists in the current generation. Any irrelevant programs are removed from the old generation's population by replacing them with None. This results in a massive reduction in the number of programs stored and should help significantly with memory use.
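
A minimal sketch of this kind of pruning follows, assuming each program records the positions of its parents in the previous generation. The dict layout and the prune_history name are assumptions made for illustration; gplearn's actual bookkeeping may differ.

```python
def prune_history(generations):
    """Replace programs with no descendant in the last generation by None."""
    if not generations:
        return generations
    # Every program in the current (last) generation is relevant.
    relevant = set(range(len(generations[-1])))
    # Walk backwards, following parent links one generation at a time.
    for gen_index in range(len(generations) - 1, 0, -1):
        parents = set()
        for i in relevant:
            program = generations[gen_index][i]
            if program is not None:
                parents.update(program["parents"])
        older = generations[gen_index - 1]
        for i in range(len(older)):
            if i not in parents:
                older[i] = None  # no surviving "dna", so drop the program
        relevant = parents
    return generations


# Toy lineage: "parents" holds indices into the previous generation.
generations = [
    [{"expr": "a", "parents": ()},
     {"expr": "b", "parents": ()},
     {"expr": "c", "parents": ()}],
    [{"expr": "a+a", "parents": (0,)},
     {"expr": "a*c", "parents": (0, 2)},
     {"expr": "b-b", "parents": (1,)}],
    [{"expr": "(a+a)", "parents": (0,)},
     {"expr": "(a*c)", "parents": (1,)}],
]
prune_history(generations)
```

In this toy run "b-b" has no descendant in the last generation, so it is replaced by None, and "b" is then dropped as well because nothing that survives descends from it.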

trevorstephens commented on June 19, 2024

Mostly fixed by #19. Please re-open if problems persist in the master branch or the next release.
