ahmedfgad / geneticalgorithmpython

Source code of PyGAD, a Python 3 library for building the genetic algorithm and training machine learning algorithms (Keras & PyTorch).

Home Page: https://pygad.readthedocs.io

License: BSD 3-Clause "New" or "Revised" License

Python 99.31% Cython 0.69%

python genetic-algorithm optimization numpy pygad pygad-documentation neural-networks machine-learning deep-learning evolutionary-algorithms

geneticalgorithmpython's Issues

How do I do it for multiple values of xi and yi?

How do I do it for multiple values of xi and yi? For example, I have:

X=
[[1.875, 1.875, 1.875, 2.5  ],
 [0.625, 3.125, 1.25 , 3.125],
 [3.125, 1.25 , 1.875, 0.625],
 [0.625, 1.25 , 0.625, 1.25 ],
 [0.625, 3.125, 0.625, 0.625]]
y=
[[3.],
[6.],
[9.],
[9.],
[6.]]

How do I calculate the weights for X that would fit y using the code above? Is there a way to calculate the error for each case and optimize accordingly from within the library?
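One possible way to handle several samples at once, sketched here under the assumption that the model is a plain weighted sum and that the fitness function takes (solution, solution_idx) as in the other issues on this page, is to add up the error over every row and return its inverse. The helper name predict is illustrative only.

```python
# Data copied from the question above.
X = [[1.875, 1.875, 1.875, 2.5],
     [0.625, 3.125, 1.25, 3.125],
     [3.125, 1.25, 1.875, 0.625],
     [0.625, 1.25, 0.625, 1.25],
     [0.625, 3.125, 0.625, 0.625]]
y = [3.0, 6.0, 9.0, 9.0, 6.0]


def predict(weights, row):
    """Weighted sum of one input row (assumed model)."""
    return sum(w * x for w, x in zip(weights, row))


def fitness_func(solution, solution_idx):
    """Higher is better: inverse of the total absolute error over all rows."""
    total_error = sum(abs(predict(solution, row) - target)
                      for row, target in zip(X, y))
    return 1.0 / (total_error + 1e-9)  # small constant avoids division by zero
```

Because the error is summed inside the fitness function, the GA optimizes a single set of weights against all rows together.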

Seed of Random Number Generators

Hi,
first of all, thank you for the excellent work you do.

When experimenting for scientific research purposes, it is good practice to specify seeds for random number generators, so that results can be replicated exactly.
I could not find the option to specify the seed of the random generators in the current library, for example when initializing the population.

I was wondering if I am missing something or if this is a potential enhancement of the current library.

Thank you.
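To illustrate what is being asked for, here is a minimal sketch of a seeded population initializer. It does not use PyGAD; the function init_population and its parameters are hypothetical. Whether seeding Python's random module and NumPy's global generator before creating a GA instance covers every source of randomness inside the library is an assumption that would need checking.

```python
import random


def init_population(sol_per_pop, num_genes, low, high, seed=None):
    """Create a random population; a fixed seed makes it reproducible."""
    rng = random.Random(seed)  # private generator, independent of global state
    return [[rng.uniform(low, high) for _ in range(num_genes)]
            for _ in range(sol_per_pop)]


pop_a = init_population(4, 3, -4, 4, seed=42)
pop_b = init_population(4, 3, -4, 4, seed=42)
print(pop_a == pop_b)  # same seed, same population
```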

Implementation of the elitism feature

Initially I would like to thank you for the excellent work.

Moving on to the question at hand: as far as I can tell, no released version of PyGAD provides elitism.
Currently a similar effect can be achieved only with a few parent selection methods, by setting keep_parents = -1 (default) or > 0.

As noted in the code comment itself:
"For some parent selection operators like rank selection, the parents are of high quality and it is beneficial to keep them in the next generation. In some other parent selection operators like roulette wheel selection (RWS), it is not guranteed that the parents will be of high quality and thus keeping the parents might degarde the quality of the population."
This functionality is interesting and important, so its implementation is my suggestion to make this work even better!
The expected behavior would be elitism = 0 (default) or > 0; if > 0, that number of best solutions (individuals) from the current population would be kept in the next generation's population.

This is my suggestion, thanks again!
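A rough sketch of the requested behavior, independent of PyGAD's internals: the function next_generation and its arguments are hypothetical names chosen for illustration.

```python
def next_generation(population, fitness, offspring, elitism=0):
    """Keep the `elitism` best current solutions and fill the rest with offspring."""
    ranked = sorted(range(len(population)), key=lambda i: fitness[i], reverse=True)
    elites = [list(population[i]) for i in ranked[:elitism]]  # copies, not references
    return elites + [list(o) for o in offspring[:len(population) - elitism]]


population = [[1, 1], [5, 5], [3, 3], [2, 2]]
fitness = [2, 10, 6, 4]
offspring = [[0, 0], [9, 9], [8, 8], [7, 7]]
new_population = next_generation(population, fitness, offspring, elitism=2)
print(new_population)  # [[5, 5], [3, 3], [0, 0], [9, 9]]
```

With elitism=0 the population is replaced entirely by offspring, which matches the proposed default.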

Bug in save best solutions

Description

If save_best_solutions = True, the best_solutions list is overwritten after each generation when a new population is being generated. A bug appears on line number 748. Solutions stored in best_solutions are probably just shallow copies, i.e., pointers to population.

What the bug looks like

best_solutions = []
best_solutions = [[9, 10, 5, 9, 9]] # after append on the line 728, it holds first best solution found
best_solutions = [[6, 9, 6, 9, 6]] # after the line 748, original value has been changed (changed after every single generation)
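The suspected cause can be reproduced without the library. The sketch below uses plain lists and hypothetical variable names; it only shows how storing a reference differs from storing a copy, not the exact code on the lines mentioned above.

```python
population = [[9, 10, 5, 9, 9]]

best_by_reference = []
best_by_copy = []

best = population[0]
best_by_reference.append(best)     # stores a reference to the population row
best_by_copy.append(list(best))    # stores an independent copy

# The next generation overwrites the population in place.
population[0][:] = [6, 9, 6, 9, 6]

print(best_by_reference)  # [[6, 9, 6, 9, 6]] (silently changed)
print(best_by_copy)       # [[9, 10, 5, 9, 9]] (preserved)
```

For NumPy arrays the equivalent of list(best) would be best.copy().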

[FEATURE] Add Multiprocess Capabilities! :)

I know in the documentation, or on an article I read (can't remember which) it said that PyGAD didn't perform well enough in multiprocessing to warrant adding it as a feature, however I have a GREAT need for it with a lot of my fitness functions that I create using PyGAD. Would be awesome to see it get implemented as another feature before running a GA search.

I envision something like adding a parameter use_multiprocessing = True, and num_workers = multiprocessing.cpu_count(), and if those are enabled, start a process pool for each chromosome in the current population, so each population item gets its own worker. When the generation is done, the pool is closed, and then when the next generation starts, the pool fires up again for the new population. Pseudo-code would look something like:

import concurrent.futures

if use_multiprocessing:
    with concurrent.futures.ProcessPoolExecutor(max_workers=num_workers) as executor:
        # Submit one fitness evaluation per solution and remember its index.
        futures = {executor.submit(fitness_func, solution, solution_idx): solution_idx
                   for solution_idx, solution in enumerate(current_population)}
        for f in concurrent.futures.as_completed(futures):
            solution_idx = futures[f]
            ind_solution_result = f.result()
            # Logic for what to do with the individual solution stuff here
    # Leaving the "with" block shuts the pool down after all futures finish.
else:
    pass  # ...the rest of the default PyGAD behaviour

...I recognize this COULD be a big undertaking, but doing it this way would allow the current population of chromosomes/generation to be gone through much quicker than having to wait for a linear progression when more cpu cores are available.

You COULD also create several ga_instance's to run simultaneously yes, but I think being able to get through the generations themselves quicker is a better idea.

Would love to see this get implemented as I love PyGAD and don't really want to switch to DEAP as PyGAD is much easier to control/use IMO.

ga_instance.best_solution() performs all solutions in instance

The best_solution() method of a PyGAD.GA() instance runs all the solutions within the instance. This seems unnecessary as the instance could save the best solution on the fly and therefore not need to run all the solutions in the instance.

Optimize problem using constraints

Hi @ahmedfgad

Let's consider
y = f(w1:w6) = w1x1 + w2x2 + w3x3 + w4x4 + w5x5 + w6x6
where (x1,x2,x3,x4,x5,x6)=(4,-2,3.5,5,-11,-4.7) and y=44

Is it possible to get the best solution with constrained weight parameters, i.e., if I want w1 to vary within (0, 1), w2 within (10, 100), and so on (different bounds for each variable)?

Thanks
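Another issue on this page passes gene_space as a list of {'low': ..., 'high': ...} dictionaries, one per gene, which may be the intended route. Independently of the library, the idea of per-gene bounds can be sketched as below; the bounds for w3 to w6 are made up, and random_solution and clip_to_bounds are hypothetical helpers.

```python
import random

# One (low, high) pair per weight; only the first two come from the question.
bounds = [(0, 1), (10, 100), (-5, 5), (-5, 5), (-5, 5), (-5, 5)]


def random_solution(rng):
    """Draw each gene uniformly from its own range."""
    return [rng.uniform(low, high) for low, high in bounds]


def clip_to_bounds(solution):
    """Pull any gene that left its range (e.g. after mutation) back inside."""
    return [min(max(gene, low), high) for gene, (low, high) in zip(solution, bounds)]


rng = random.Random(0)
solution = random_solution(rng)
mutated = [gene + rng.uniform(-50, 50) for gene in solution]
repaired = clip_to_bounds(mutated)
```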

Add plot customization feature

Hello!

I want to ask for a plot customization feature, so that one can title plots or make subplots, for example. Right now the plot_result() method only plots the result, and nothing more can be done with it.

Thank you!

fitness_func performance issue

The fitness_func method is called more than once per generation. This can be avoided by reusing the previously calculated fitness instead of calling fitness_func or cal_pop_fitness again. This issue matters when the fitness calculation is expensive (like the final score of a video game match).
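Until the library avoids the repeated calls, one workaround is to memoize the fitness function on the user side. This is only a sketch with hypothetical names (expensive_fitness, cached_fitness), and it is valid only when the fitness is deterministic, i.e. the same genes always produce the same score.

```python
call_count = 0


def expensive_fitness(solution, solution_idx):
    """Stand-in for a costly evaluation; counts how often it really runs."""
    global call_count
    call_count += 1
    return sum(solution)


_cache = {}


def cached_fitness(solution, solution_idx):
    """Reuse the stored fitness when the same genes were already evaluated."""
    key = tuple(solution)
    if key not in _cache:
        _cache[key] = expensive_fitness(solution, solution_idx)
    return _cache[key]


population = [[1, 2], [3, 4], [1, 2]]
scores = [cached_fitness(s, i) for i, s in enumerate(population)]
scores_again = [cached_fitness(s, i) for i, s in enumerate(population)]
print(call_count)  # 2: each distinct solution was evaluated once
```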

allow_duplicate_genes not working

Hi!

I am trying to solve TSP with GA and it seems like allow_duplicate_genes is not working.

Reproduction:
TSP with 32 cities, each city represented by a number in [0, ..., 31]

ga_instance = pygad.GA(num_generations=5,
                       num_parents_mating=2,
                       fitness_func=fitness,
                       init_range_low=0,
                       init_range_high=32,
                       num_genes=32,
                       gene_space=np.arange(0, 32, 1),
                       gene_type=int,
                       allow_duplicate_genes=False,
                       )

a = ga_instance.run()
solution, solution_fitness, solution_idx = ga_instance.best_solution()
print(f'{solution}')
solution.sort(axis=0)
print(solution)

It gives:
[25 15 20 1 30 1 19 13 29 10 28 3 24 12 12 5 0 26 26 6 7 2 23 16 20 18 8 11 18 3 17 26]
[ 0 1 1 2 3 3 5 6 7 8 10 11 12 12 13 15 16 17 18 18 19 20 20 23 24 25 26 26 26 28 29 30]

As you see numbers 1, 3, 12, 18, 20, 26 are duplicated
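Regardless of how the flag behaves, a tour stays a valid permutation if both the initial solution and the mutation operator are permutation-preserving. The following is a minimal sketch of that idea with hypothetical helpers (random_tour, swap_mutation); it is not PyGAD code.

```python
import random


def random_tour(num_cities, rng):
    """Start from a shuffled list of all cities, so each appears exactly once."""
    tour = list(range(num_cities))
    rng.shuffle(tour)
    return tour


def swap_mutation(tour, rng):
    """Exchange two positions; the result is still a permutation."""
    i, j = rng.sample(range(len(tour)), 2)
    mutated = list(tour)
    mutated[i], mutated[j] = mutated[j], mutated[i]
    return mutated


rng = random.Random(1)
tour = random_tour(32, rng)
child = swap_mutation(tour, rng)
print(sorted(child) == list(range(32)))  # True: no duplicates
```

Single-point crossover can still introduce duplicates, so a permutation-aware crossover (for example order crossover) would be needed as well.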

Advice for GA implementation in my project

Hey!

I am working on some code with OpenCV to track the percentage of frames in which my face is detected, as a measure of "attention span". I intend to optimize this attention span with an evolutionary solver that iterates over combinations of diffused and direct light levels in my environment using an Arduino board. Because each generation of light levels takes a long time to evaluate, I am unsure which evolutionary algorithm library or approach is best for this. I am currently not connected to an Arduino and am just trying to do this with dummy numbers (1-4, 1-4) for diffused and direct light respectively. The idea is that every 15 minutes or so (let's say 50,000 frames) the light levels assume a new and improved combination and wait for another 15.

I am new to Python and coding in general, so I am finding it hard to figure out which library to use and how best to apply it for this purpose. I'd be grateful if you have any advice.

Use of custom `parent_selection_type` function

Hi,

Thanks for your great module. I have an implementation question that might be an issue. I am trying to use a custom fitness function with multiple variables. I also implemented a custom parent_selection_type which uses a custom function to sort the multivariate fitness function. I then run into an issue: regardless of the parent_selection_type function that I pass, the self.steady_state_selection function is called in the run function.

Could it be that it would be more logical to call the custom parent_selection_type here, which is available as self.select_parents?

Thanks for taking this into consideration.

Kind regards,

Fedor Baart

gene_space not working as expected

The attribute gene_space is not working as expected.
I have 4 genes, and have set gene_space as follows:
gene_space = [{'low': 1, 'high': 30}, {'low': 1, 'high': 20}, {'low': 0.0001, 'high': 4}, {'low': 0.00001, 'high': 4}]
However, the 3rd and 4th genes are taking values above 4.

The version of pygad is 2.16.3
Is there any other configuration that is required to impose these restrictions ?

Please specify that your library only supports deterministic subject

PyGAD version: 2.16.3

Please specify somewhere in the docs that PyGAD is only geared for deterministic problems and is not suitable for subjects that produce different outputs even when given the same inputs, such as games. Alternatively, we would very much welcome PyGAD accounting for stochastic problems in the future.

We had to find out about this ourselves by digging into GA.cal_pop_fitness() to find out that parents kept from the previous generation will directly use the fitness value calculated during the previous generation.

Thank you.

Rank versus Steady State Parent Selection

It seems that the two parent selection techniques are exactly the same. Rank parent selection is, however, meant to be a more explorative approach in which every chromosome/solution is assigned a selection probability according to its rank (which is based on its fitness). This is meant to decouple the selection probability from the population fitness distribution in order to avoid over-exploiting very strong solutions.

Rank selection is essentially the same as Roulette Wheel Selection, but instead of weighting each solution's selection probability by its fitness, the weighting should be done by the rank of the solution compared to the other solutions.
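The distinction described above can be sketched as follows. This is an illustrative implementation, not the library's; rank_selection is a hypothetical name, and sampling with replacement is an assumption.

```python
import random


def rank_selection(fitness, num_parents, rng):
    """Pick parent indices with probability proportional to rank, not raw fitness."""
    order = sorted(range(len(fitness)), key=lambda i: fitness[i])  # worst first
    ranks = [0] * len(fitness)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank  # worst gets 1, best gets len(fitness)
    return rng.choices(range(len(fitness)), weights=ranks, k=num_parents)


fitness = [1000.0, 1.0, 2.0, 3.0]  # one dominant solution
rng = random.Random(0)
picks = rank_selection(fitness, 10000, rng)
share_of_best = picks.count(0) / len(picks)
```

With ranks 4, 1, 2, 3 the dominant solution is expected to be chosen about 40% of the time, whereas fitness-proportional weighting would choose it in more than 99% of draws.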

FEATURE REQUEST: Spatial awareness

Hello, I've read that GAs can be improved by introducing a spatial dimension, which can help avoid getting stuck in early local minima.

This effectively means giving each individual an x,y coordinate, and only allowing mating within a specific distance.

Have you considered introducing this to pyGAD? If so then I'd love to learn more and understand why you decided not to include it. If not then is this something you would consider?

Many thanks for a fantastic library!

solution_idx

Why does the fitness function need solution_idx as an input parameter if it doesn't do anything with it?

Setting mutation_type="adaptive" raises an error

Input argument num_parents in GA.stochastic_universal_selection() method

Line:
pointers_distance = 1.0 / self.num_parents_mating # Distance between different pointers.

Should be:
pointers_distance = 1.0 / num_parents # Distance between different pointers.
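For context, a self-contained sketch of stochastic universal selection that derives the pointer spacing from the num_parents argument, as proposed. It assumes strictly positive fitness values and is not a copy of the library's method.

```python
import random


def stochastic_universal_selection(fitness, num_parents, rng):
    """Return num_parents indices using equally spaced pointers on the fitness wheel."""
    total = sum(fitness)
    pointers_distance = 1.0 / num_parents  # uses the argument, as proposed
    start = rng.uniform(0, pointers_distance)
    cumulative, running = [], 0.0
    for f in fitness:
        running += f / total
        cumulative.append(running)
    selected, idx = [], 0
    for k in range(num_parents):
        pointer = start + k * pointers_distance
        # Advance to the first wheel segment that contains the pointer.
        while idx < len(cumulative) - 1 and cumulative[idx] < pointer:
            idx += 1
        selected.append(idx)
    return selected


parents = stochastic_universal_selection([1.0, 2.0, 3.0, 4.0], 3, random.Random(0))
```

If the spacing were computed from a different count than the number of parents requested, the pointers would no longer cover the wheel evenly.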

fitness_func() Repetition Issues

GeneticAlgorithmPython/pygad.py

Line 3080 in c87641b

def best_solution(self, pop_fitness=None):

fitness_func() is called repeatedly whenever I call best_solution() (for example, in on_generation). The call chain is probably best_solution() -> cal_pop_fitness() -> fitness_func().

I think either line 3094 needs to change to pop_fitness = self.last_generation_fitness, or all the examples where best_solution() is called without an argument need to be fixed.

~~Also, in run(), a fitness_func() is called unnecessarily because of cal_pop_fitness() above the main for statement.~~

Penalty functions not working as expected in PyGAD

I'm trying to use PyGAD to solve a minimization problem which is part of a bigger algorithm I'm trying to implement, but I have no idea of the appropriate way to include the constraints. I've tried adding if statements to the fitness function to add a penalty value to the objective, but it isn't always working.

What I'm trying to solve is a binary decision problem: placing a rectangle at one of a set of points that define a grid inside a bigger rectangle, in which some rectangles might have already been placed. The current function just takes into account the distance to the already placed rectangles, but I'll probably add more parameters in the form of rewards and penalties. The constraints (not exceeding the site boundaries, no overlap between rectangles, and placing the rectangle outside or inside some specific zones) make the problem quite complex, so that's why I'm trying to use a discrete set of possible locations and the GA to solve it.

I think I can rework some constraints, also have to add others. I tried using 1/Distance and -Distance, due to the fact that PyGAD always tries to maximize and I want to minimize said Distance. My current fitness function is the following:

def fitness_func(solution, solution_idx):
    Distance = 0
    xi = sum(LookUpList[i][0]*solution[i] for i in range(len(LookUpList)))
    yi = sum(LookUpList[i][1]*solution[i] for i in range(len(LookUpList)))
    alphai = sum(LookUpList[i][2]*solution[i] for i in range(len(LookUpList)))
    LXi = Stages[Current]['LX']
    LYi = Stages[Current]['LY']
    VXi = (LXi/2*(alphai-1)**2,LYi/2*alphai)
    VYi = (LYi/2*(alphai-1)**2,LXi/2*alphai)
    Penalty = np.inf
    
    # Only placed in one spot
    if sum(solution) > 1:
        Distance += Penalty
    
    # Site boundary constraint
    if xi+VXi[0]+VXi[1] >= Site[0] or xi-VXi[0]-VXi[1] <= 0 or yi+VYi[0]+VYi[1] >= Site[1] or yi-VYi[0]-VYi[1] <= 0:
        Distance += 1000*max(Site)
    
    # Avoid overlap between facilities
    for p in Previous:
        xp = Stages[p]['X']
        yp = Stages[p]['Y']
        alphap = Stages[p]['Alpha']
        LXp = Stages[p]['LX']
        LYp = Stages[p]['LY']
        VXp = (LXp/2*(alphap-1)**2,LYp/2*alphap)
        VYp = (LYp/2*(alphap-1)**2,LXp/2*alphap)
        if xi+VXi[0]+VXi[1] <= xp+VXp[0]+VXp[1] and xi-VXi[0]-VXi[1] >= xp-VXp[0]-VXp[1] and yi+VYi[0]+VYi[1] <= xp+VYp[0]+VYp[1] and yi-VYi[0]-VYi[1] >= xp-VYp[0]-VYp[1]:
            Distance += Penalty
    
    # Zones where a certain facility can't be placed
    for e in ExclusionZones:
        if Current == ExclusionZones[e]['Facility']:
            xp = ExclusionZones[e]['X']
            yp = ExclusionZones[e]['Y']
            LXp = ExclusionZones[e]['LX']
            LYp = ExclusionZones[e]['LY']
            if xi+VXi[0]+VXi[1] <= xp+LXp/2 and xi-VXi[0]-VXi[1] >= xp-LXp/2 and yi+VYi[0]+VYi[1] <= xp+LXp/2 and yi-VYi[0]-VYi[1] >= xp-LXp/2:
                Distance += Penalty
    
    for p in Previous:
        Distance += abs(xi-Stages[p]['X']) + abs(yi-Stages[p]['Y'])
    fitness = - Distance
    return fitness

The configuration of the GA and its execution are done as follows:

num_generations = 150
num_parents_mating = 2

sol_per_pop = 10
num_genes = 2*(Grid[0]-1)*(Grid[1]-1)
gene_type = int

init_range_low = random_mutation_min_val = 0
init_range_high = random_mutation_max_val = 2

parent_selection_type = 'sss'
keep_parents = 1

crossover_type = 'single_point'

mutation_type = 'random'
mutation_by_replacement = True
mutation_percent_genes = 10

save_solutions = False

ga_instance = pygad.GA(num_generations = num_generations,
                       num_parents_mating = num_parents_mating,
                       fitness_func = fitness_func,
                       sol_per_pop = sol_per_pop,
                       num_genes = num_genes,
                       gene_type = gene_type,
                       init_range_low = init_range_low,
                       init_range_high = init_range_high,
                       parent_selection_type = parent_selection_type,
                       keep_parents = keep_parents,
                       crossover_type = crossover_type,
                       mutation_type = mutation_type,
                       mutation_by_replacement = mutation_by_replacement,
                       mutation_percent_genes = mutation_percent_genes,
                       save_solutions = save_solutions
                       )

ga_instance.run()

I have a function that evaluates the dictionary called Stages, which stores all the data relevant to the algorithm, and gives me the final cost value. It matches the one I get after running the PyGAD instance, but when plotting the solution with another function (I don't think it is relevant, just a matplotlib figure with shapes drawn) I can see the solution isn't always in the feasible space. I can understand some overlap due to the grid being finite, so that placing the new facility in one spot would be the best solution if this spot were a little bit lower, higher or to the side. However, this could be adjusted if the penalty function took into account how much it overlaps, so it doesn't bother me that much.

What I don't understand is why the cost function just gives me the distance, without the penalty value that should be added, as the solution is definitely violating the constraint stated in the condition. Should I find another way of stating constraint violation? Also, I'm starting to find that some runs don't give a solution at all.
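One thing worth checking, offered as a possibility rather than a diagnosis: with Penalty = np.inf and fitness = -Distance, every solution that triggers an infinite penalty gets exactly the same fitness, so the GA cannot tell a slightly infeasible solution from a badly infeasible one. A finite penalty that grows with the amount of violation keeps a ranking among infeasible solutions. The function penalized_fitness and its weight and violations parameters are hypothetical.

```python
def penalized_fitness(distance, violations, weight=1000.0):
    """Negative cost with a finite penalty proportional to the total violation."""
    return -(distance + weight * sum(violations))


# With an infinite penalty every infeasible solution looks identical:
inf = float("inf")
print(-(12.0 + inf) == -(500.0 + inf))  # True: nothing to rank by

# With a graded penalty, a smaller violation scores better:
slightly_bad = penalized_fitness(12.0, violations=[0.5])
very_bad = penalized_fitness(12.0, violations=[4.0])
feasible = penalized_fitness(20.0, violations=[])
print(feasible > slightly_bad > very_bad)  # True
```

The weight should be large compared with typical distance values so that a feasible solution still beats an infeasible one.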

FEATURE REQUEST: args to pass parameter to fitness function

First of all, thank you for the great repository.

The thing we need to improve about pygad is args for fitness function.
Currently, fitness function takes only two inputs: solution and solution index.
However, sometimes we also need to take extra inputs for the fitness function.

In the example of diff_evol from scipy, it has "args" to pass parameters.
https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.differential_evolution.html

If you think this is a good feature, I will try to implement this by myself.
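Until such a parameter exists, a closure is one way to get a similar effect while keeping the two-argument signature. This is a sketch; make_fitness is a hypothetical helper, and the inputs and target reuse numbers that appear in another issue on this page.

```python
def make_fitness(inputs, target):
    """Build a two-argument fitness function that remembers extra parameters."""
    def fitness_func(solution, solution_idx):
        output = sum(w * x for w, x in zip(solution, inputs))
        return 1.0 / (abs(output - target) + 1e-9)
    return fitness_func


# The extra arguments are bound once, when the function is created.
fitness_func = make_fitness(inputs=[4, -2, 3.5, 5, -11, -4.7], target=44)
value = fitness_func([1.0, 1.0, 1.0, 1.0, 1.0, 1.0], 0)
```

A closure is used here rather than functools.partial because the result is an ordinary function taking exactly two parameters, which is safer if the library inspects the function's argument count.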

feature request: variable length genetic algorithm

A whole class of problems, such as topological partitioning or Bezier path finding, naturally lend themselves to variable-length genetic algorithms. The first GA I ever wrote was variable length. I was surprised to find that this is not an option.

This could be added simply by allowing per-parent selection of points in mutation.

The genes are always integers even if the gene_space attribute has float values.

In the next code, the initial_population parameter is used to feed an initial population that has 8 solutions with 2 genes each. The gene_space parameter is used which has the half values starting from 0.5 to 15.5.

When the mutation is applied, it is expected that some genes have floating-point values like 3.5, 6.5, 0.5, 2.5, etc. But all genes are integers.

import pygad
import numpy

init_pop = ((1, 1), (3, 3), (5, 5), (7, 7), (9, 9), (11, 11), (13, 13), (15, 15))

def fitness_func(solution, solution_idx):
    fitness = numpy.sum(solution)
    return fitness

gene_space = numpy.arange(0.5, 15.5, 1)

ga_instance = pygad.GA(initial_population=init_pop,
                       num_generations=4,
                       num_parents_mating=8,
                       fitness_func=fitness_func,
                       gene_space=gene_space)

ga_instance.run()

Solution fitness array and solutions array have different lengths

I am using pygad to find combinations of solutions that satisfy some conditions. My code runs 15 generations with 40 solutions per population. When the GA stops running, one array has 640 entries whereas the other has 600. I am looking for a single array that holds the solutions of all trials with their fitness values next to them, so I was expecting the two arrays to be equal in length. Maybe I am doing something wrong?

Index error on running GA, looks like best match idx is looked in a zero size array, what can be the reason?

~/pypi_local/pygad/pygad.py in run(self)
1261 self.last_generation_fitness = self.cal_pop_fitness()
1262
-> 1263 best_solution, best_solution_fitness, best_match_idx = self.best_solution(pop_fitness=self.last_generation_fitness)
1264
1265 # Appending the best solution in the current generation to the best_solutions list.

~/pypi_local/pygad/pygad.py in best_solution(self, pop_fitness)
3115 pop_fitness = self.cal_pop_fitness()
3116 # Then return the index of that solution corresponding to the best fitness.
-> 3117 best_match_idx = numpy.where(pop_fitness == numpy.max(pop_fitness))[0][0]
3118
3119 best_solution = self.population[best_match_idx, :].copy()

IndexError: index 0 is out of bounds for axis 0 with size 0

This is how I have created my GA instance before run:

num_generations = 1500
num_parents_mating = 20

sol_per_pop = 50
num_genes = num_customers
gene_type = int

init_range_low = 0
init_range_high = 6
gene_space= np.arange(6)

parent_selection_type = "sss"
keep_parents = 20

crossover_type = "single_point"

mutation_type = "random"
mutation_num_genes= [3, 1]
mutation_probability = [0.25, 0.1]
mutation_percent_genes = [20,10]

# create an instance of the pygad.GA class 
global ga_instance
ga_instance = pygad.GA(num_generations=num_generations,
                   fitness_func=fitness_func,
                   num_parents_mating=4,
                       gene_space = gene_space,
                   sol_per_pop=50,
                   num_genes=num_genes,
                    gene_type = gene_type,
                   mutation_type="adaptive",
                   mutation_num_genes=(3, 1),
                      save_solutions= True)

ga_instance.run()

The initialize_population function only initializes populations that are one dimensional.

The num_genes attribute is not actually used at all in the initialize function.

Issue with pygad.load()

When I load a previously saved instance of the genetic algorithm with ga_instance = pygad.load(filename=filename) the loaded instance has only the best solution as parent and not the selected number of parents from the save instance. To articulate, for num_parents_mating=2 and keep_parents=-1 the loaded instance has two identical parents (two copies of the best solution of the saved instance) and not the two parents of the saved instance

You recommend pip3 for linux and mac, but pip for windows. Why?

If it's pip3 anywhere, it's pip3 everywhere. Windows gets pip3 from python3 just like everybody else

Customize initialize_population()

The initialize_population() is called directly by the constructor of GA, i.e., __init__().
That function randomly creates values from -4 to 4. I have a feeling that that function should be customized, just like fitness_func() so that it could be adapted based on problems at hand.

Project doesn't state a license

The README.md describes this project as 'open-source' but I can't find an actual license anywhere.
Please choose and specify the license under which this code is released as open-source, so that potential users and contributors know what they are permitted to do with it and under what conditions.

'mutation_type = None' not allowed

I did not use mutation, and pygad printed the following warning:

If you do not want to mutate any gene, please set mutation_type=None.

But when I set mutation_type=None, or mutation_type="None", then pygad crashed:

  File "C:\My\MyPythonProject\GeneticAlgo\venv\lib\site-packages\pygad\pygad.py", line 282, in __init__
    raise TypeError("The expected type of the 'mutation_type' parameter is str but ({mutation_type}) found.".format(mutation_type=type(mutation_type)))
TypeError: The expected type of the 'mutation_type' parameter is str but (<class 'NoneType'>) found.

I looked at the source code, it seems that line 280 does not allow the possibility of mutation_type being None. Yet line 295 does allow such possibility.
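The check the report seems to expect could look like the sketch below. The function validate_mutation_type is hypothetical and only mirrors the error message quoted above; it is not the library's code.

```python
def validate_mutation_type(mutation_type):
    """Accept None (no mutation) or a string naming the operator."""
    if mutation_type is None:
        return None  # mutation is disabled, nothing else to validate
    if not isinstance(mutation_type, str):
        raise TypeError(
            "The expected type of the 'mutation_type' parameter is str or None "
            "but ({}) found.".format(type(mutation_type)))
    return mutation_type
```

Testing for None before the type check is what allows mutation_type=None to pass through.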

Scramble mutation unnecessary computation.

Hello,
I do not think the np.flip in this line is necessary in the scramble_mutation as it only reverses the order of selected genes (they were already shuffled):

GeneticAlgorithmPython/pygad.py

Line 2074 in 158e53c

genes_to_scramble = numpy.flip(offspring[idx, genes_range])

IMO this is enough:
genes_to_scramble = offspring[idx, genes_range]
Let me know what you think,
Cheers,
Max

Enforcing Percentage (Genes Must Sum to 1)

The values I'm trying to optimize are percentages, such that the sum of all genes should be exactly 1. Is there a way to enforce this in PyGAD?
I can keep each gene individually bound to [0, 1] with the gene_space hyperparameter, but I haven't yet managed to apply the "sum" constraint.
From the sequencing in the algorithm, would a custom function called on_generation perhaps be able to handle this? I couldn't yet crack that but I think it might be a solution.
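One approach that sidesteps the callback question is to let the genes take any non-negative values and normalize them inside the fitness function, so the quantity being scored always sums to 1. This is a sketch under that assumption; normalize is a hypothetical helper and the target mix is made up for illustration.

```python
def normalize(solution):
    """Scale non-negative genes so that they sum to 1."""
    total = sum(solution)
    if total == 0:
        return [1.0 / len(solution)] * len(solution)  # fall back to equal shares
    return [gene / total for gene in solution]


def fitness_func(solution, solution_idx):
    shares = normalize(solution)
    # Hypothetical objective: prefer allocations close to a made-up target mix.
    target = [0.5, 0.3, 0.2]
    return -sum((s - t) ** 2 for s, t in zip(shares, target))


shares = normalize([0.2, 0.6, 0.2, 1.0])
```

The same normalize call then has to be applied to the best solution after the run, since the raw genes themselves are not constrained.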

Keep Parents Issue

Hello, first of all I would like to say you did an excellent job with the pygad project.

Secondly, I would like to address an issue I am having with the keep_parents parameter. I was running an algorithm yesterday and it was working fine, but some changes were apparently made to the code, and now every time I assign a value to that parameter an error pops up. The cause is that some part of the code related to it holds a tuple, but the .shape attribute is accessed on it, which is only supported for numpy arrays.

The error is the following:
AttributeError: 'tuple' object has no attribute 'shape'

And it happens on the line 1202 of the pygad.py code.

1200 elif (self.keep_parents > 0):
1201     parents_to_keep = self.steady_state_selection(self.last_generation_fitness, num_parents=self.keep_parents)
-> 1202     self.population[0:parents_to_keep.shape[0], :] = parents_to_keep
1203     self.population[parents_to_keep.shape[0]:, :] = self.last_generation_offspring_mutation
1204

Is there a different way to use this parameter now? If not, would you mind fixing the code?

Thanks for your time and work on the pygad project.

License ?

Hello, under what license is this project released? I would like to study it and learn from it, as it seems to be a very well commented project. Thank you.

Dynamic num_genes

Hello
I'm new to genetic algorithms, and I'm trying to figure out if I can use dynamic num of genes to optimise number of parameters and their values in my model, for example

any help would be appreciated

Error when trying to use the built-in plotting functions

pygad version: 2.16.3 (testing in a jupyter notebook), python version: 3.10.3

The function call ga_instance.plot_fitness() as well as other plot functions like ga_instance.plot_new_solution_rate() throw an error:

        matplotlib.pyplot.title(title, fontsize=font_size)
     -> matplotlib.pyplot.xlabel(xlabel, fontsize=font_size)
        matplotlib.pyplot.ylabel(ylabel, fontsize=font_size)

TypeError: 'str' object is not callable

This error also obviously prevents the rest of the code from running, which means I currently cannot use the built-in plotting functions.

Just glancing at the error I would assume the problem stems from the string parameter for the xlabel name being called the same as the used pyplot function to create this plot and python not being able to differ between them.

However that's just a guess, maybe someone had this issue before and it is caused by something else?
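Another possibility, offered only as a guess: a parameter named xlabel does not by itself hide matplotlib.pyplot.xlabel, but an earlier notebook cell that assigned a string to that attribute (instead of calling it) would produce exactly this error until the kernel is restarted. The sketch below uses a stand-in object rather than matplotlib so that it runs anywhere.

```python
from types import SimpleNamespace

# Stand-in for the matplotlib.pyplot module (hypothetical, for illustration).
pyplot = SimpleNamespace(xlabel=lambda text: "label set to " + text)


def plot(xlabel="Generation"):
    # A parameter named xlabel does not hide the attribute pyplot.xlabel.
    return pyplot.xlabel(xlabel)


first_result = plot()  # works

pyplot.xlabel = "Generation"  # accidental assignment instead of a call
try:
    plot()
except TypeError as err:
    message = str(err)
    print(message)  # 'str' object is not callable
```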

Return value of pygad.GA.tournament_selection misleading

The parent_indices returned by pygad.GA.tournament_selection are actually the best indices selected from the K rand_indices. Within this method parents_indices are updated as follows:
parents_indices.append(selected_parent_idx)

where I believe it should be changed to:
parents_indices.append(rand_indices[selected_parent_idx])
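To show why the mapping matters, here is a small standalone tournament selection that returns population indices. It is a sketch, not the library's method; drawing contestants without replacement is an assumption.

```python
import random


def tournament_selection(fitness, num_parents, k, rng):
    """Return population indices of the winners of num_parents tournaments."""
    parents_indices = []
    for _ in range(num_parents):
        rand_indices = rng.sample(range(len(fitness)), k)  # contestants
        contestant_fitness = [fitness[i] for i in rand_indices]
        selected_parent_idx = contestant_fitness.index(max(contestant_fitness))
        # Map the position inside the tournament back to the population index.
        parents_indices.append(rand_indices[selected_parent_idx])
    return parents_indices


fitness = [5, 1, 9, 3, 7, 2]
winners = tournament_selection(fitness, 4, 3, random.Random(0))
```

Without the mapping, the returned values would only range from 0 to K-1 and would not identify solutions in the population.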

KerasGA error: AttributeError: module 'pygad.kerasga' has no attribute 'KerasGA'

Hi!
I have tried to run the example that combines Keras with your library.

But Python can't find the class KerasGA (as shown in the title).

I installed pygad using pip (pip install pygad).

Looking in the file __init__.py, I tried to add:

from .kerasga import * # Relative import.

And it seems to solve the problem...

For some reason fitness never exceeds 1.0

I use pygad to train my neural network. The code below is a test of pygad, and it worked. Afterwards I wrote a simple NN implementation and tried to train it with pygad, but for some reason the fitness never exceeds 1.0. At first I thought that my code didn't work properly, but when I ran my first pygad test again (the code below), it had the same issue.

import math
import pygad


def calculate_neuron(input, weight, nonlinear=None, bias=False):
    """
    Calculate value of neuron.

    :param input: Input for neuron
    :param weight: Weight for each input
    :param nonlinear: Nonlinear function for neuron. If == None then neuron is linear
    :param bias: If true bias exist in previous layer
    :return: value of neuron
    """

    value = 0
    for i in range(len(input)):
        value += input[i] * weight[i]

    if bias:
        value += 1 * weight[len(weight) - 1]

    if nonlinear is not None:
        value = nonlinear(value)

    return value


def sigmoid(x):
    return math.exp(x) / (math.exp(x) + 1)


def xor_neural_network(input, weight):
    """
    This is neural network that must implement xor function. (I didn't read about objects yet)

    :param input: Input for neural network. For this is 2
    :param weight: Weight for neural. Length is 9
    :return:
    """

    hid1 = calculate_neuron(input, weight[:3], sigmoid, True)
    hid2 = calculate_neuron(input, weight[3:6], sigmoid, True)

    output = calculate_neuron([hid1, hid2], weight[6:9], sigmoid, bias=True)
    return output


function_inputs = [[0, 0],
                   [0, 1],
                   [1, 0],
                   [1, 1]]

des_outputs = [0, 1, 1, 0]


def fitness_func(solution):
    outputs = []
    for input in function_inputs:
        outputs.append(xor_neural_network(input, solution))

    error = 0
    for output, des_output in zip(outputs, des_outputs):
        error += abs(output - des_output)

    fitness = 1 / error
    return fitness


if __name__ == "__main__":
    num_generations = 1000
    sol_per_pop = 800
    num_parents_mating = 4

    mutation_percent_genes = 10

    parent_selection_type = "sss"

    crossover_type = "single_point"

    mutation_type = "random"

    keep_parents = 1

    num_genes = 9

    ga_instance = pygad.GA(num_generations=num_generations,
                           sol_per_pop=sol_per_pop,
                           num_parents_mating=num_parents_mating,
                           num_genes=num_genes,
                           fitness_func=fitness_func,
                           mutation_percent_genes=mutation_percent_genes,
                           parent_selection_type=parent_selection_type,
                           crossover_type=crossover_type,
                           mutation_type=mutation_type,
                           keep_parents=keep_parents,
                           )

    while True:
        ga_instance.run()
        print(ga_instance.best_solution())
        print(xor_neural_network(function_inputs[0], ga_instance.best_solution()[0]))
        print(xor_neural_network(function_inputs[1], ga_instance.best_solution()[0]))
        print(xor_neural_network(function_inputs[2], ga_instance.best_solution()[0]))
        print(xor_neural_network(function_inputs[3], ga_instance.best_solution()[0]))
        ga_instance.plot_result()
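
A note on the numbers in this test: since fitness = 1 / error, a fitness of 1.0 corresponds to a total absolute error of 1.0 over the four XOR cases, for instance three cases matched and one fully wrong. That would be consistent with the search stalling in a local optimum, although nothing in this thread confirms it. Separately, two details of the test can fail regardless of the plateau: math.exp(x) overflows for large positive x, and 1 / error divides by zero if the error ever reaches exactly 0. A minimal sketch of more defensive versions follows; the names stable_sigmoid and safe_fitness and the eps value are illustrative assumptions, not part of pygad.

```python
import math


def stable_sigmoid(x):
    """Logistic function written so that exp() is never called on a large positive number."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)  # x < 0 here, so z lies in [0, 1) and cannot overflow
    return z / (1.0 + z)


def safe_fitness(outputs, targets, eps=1e-8):
    """Inverse of the total absolute error; eps (an assumed value) avoids division by zero."""
    error = sum(abs(o - t) for o, t in zip(outputs, targets))
    return 1.0 / (error + eps)
```

With these, sigmoid in the test could be swapped for stable_sigmoid, and the last lines of fitness_func for return safe_fitness(outputs, des_outputs).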

gene_space with condition between genes

Hi @ahmedfgad,
First, congratulations on an excellent project!

Second, I have a question. My problem has a peculiar condition on gene generation: the sum of two genes must lie in the range (0, 1), i.e.:
0 < variable1 + variable2 < 1

How can I implement this?

My gene space:
[{'low': 0, 'high': 1.0},
{'low': 0, 'high': 1.0},
{'low': 0, 'high': 3.5},
{'low': 0, 'high': 4.0}]

The first and second parameters must each stay within their own range, and their sum must be between 0 and 1.

Thank you
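
The gene space shown above bounds each gene on its own, so a condition that couples two genes has to be enforced somewhere else. Two generic options are to repair a candidate before evaluating it, or to penalize violations inside the fitness function. The sketch below is library-independent; repair_sum, penalized_fitness, and the margin and penalty values are hypothetical names and numbers chosen for illustration.

```python
def repair_sum(solution, i=0, j=1, upper=1.0, margin=1e-9):
    """Return a copy of solution with genes i and j scaled down so that their sum is below upper."""
    fixed = list(solution)
    total = fixed[i] + fixed[j]
    if total >= upper:
        scale = (upper - margin) / total
        fixed[i] *= scale
        fixed[j] *= scale
    return fixed


def penalized_fitness(raw_fitness, solution, i=0, j=1, upper=1.0, penalty=1e3):
    """Subtract a penalty proportional to how far the pair sum exceeds upper."""
    violation = max(0.0, solution[i] + solution[j] - upper)
    return raw_fitness - penalty * violation
```

Because both genes are already limited to [0, 1], the lower bound 0 < sum only fails when both are exactly 0, so the sketch handles the upper bound only. Repair keeps every evaluated candidate feasible; the penalty keeps candidates untouched but steers the search away from violations.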

Equation inputs

Hi Ahmed,

I'm using your code to solve a problem in my project. However, when I change equation_inputs from [4,-2,3.5,5,-11,-4.7] to my data file, which is a .dat file with 2000 rows and 1 column, I get this error: "ValueError: operands could not be broadcast together with shapes (5,3) (3,2000)"

I'm using only three files, which is why there is a 3 in (3,2000).

I've partially solved my problem by using max(datafile), but I don't find the best solution in all cases, because I'm optimizing only one point instead of all 2000. The attached figure shows the fitness evolution in this case, for example. The fitness function is the chi-square of one point.

Could you please give some insight into this particular problem?
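
The shapes in the error message suggest an element-wise product between a (5, 3) population and a (3, 2000) data array, which NumPy cannot broadcast. If the model is a weighted sum of the three files (an assumption; the actual model is not shown in this thread), a matrix product contracts the shared axis of length 3 and yields one prediction per solution and per data point, so the chi-square can be summed over all 2000 points. The arrays below are random stand-ins, and the observed curve and unit variance are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.random((3, 2000))      # stand-in for three .dat files, one row per file
observed = rng.random(2000)       # hypothetical target values, one per data point
population = rng.random((5, 3))   # 5 candidate solutions, 3 genes each

# population * data would raise ValueError: (5, 3) and (3, 2000) do not broadcast.
# The matrix product combines the 3 genes with the 3 files instead.
model = population @ data         # shape (5, 2000)

# Chi-square over all points for each solution (unit variance assumed).
chi2 = np.sum((model - observed) ** 2, axis=1)   # shape (5,)
fitness = 1.0 / (chi2 + 1e-12)    # smaller chi-square gives larger fitness
```

This way every one of the 2000 points contributes to the fitness, rather than only the maximum of the data.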

enhancement - soft constraint / hard constraint

This engine has some nice features you may want to consider:
https://github.com/AluBhorta/UCSPy-Engine

The definitions are in JSON, with soft/hard constraints and penalties defined.

generation callback not executed properly, leads to unwanted repetition of fitness function

Working with 'pygad.kerasga' (other flavors not tested). Only one line of the callback_generation() function is executed with each generation. The number of lines in the callback_generation() function determines the number of times fitness_func() is (unexpectedly) repeated.

Set a global counter inside fitness function to prove this:

count=0
def fitness_func(solution, sol_idx):
    global count
    count += 1
    print(count)

Expected number of iterations:

def callback_generation(ga_instance):
    print("Generation = {generation}".format(generation=ga_instance.generations_completed))
    #print("Fitness    = {fitness}".format(fitness=ga_instance.best_solution()[1]))

2X wasted cycles:

def callback_generation(ga_instance):
    print("Generation = {generation}".format(generation=ga_instance.generations_completed))
    print("Fitness    = {fitness}".format(fitness=ga_instance.best_solution()[1]))

fitness_func runs n_times
Prints "Generation..."
fitness_func runs n_times
Prints "Fitness..."

... and so on. Adding more lines of code multiplies the wasted cycles.

EDIT: I believe there is an additional cycle executed at the beginning of the ga_instance run, resulting in n+1 cycles; this is on top of the bug described above.
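
The two callbacks in this report differ only in the call to ga_instance.best_solution(), so a possible explanation (not confirmed in this thread) is that each such call re-evaluates the fitness of the whole population, rather than the line count mattering as such. Whatever the cause, repeated evaluations of the same solution can be made cheap by memoizing the fitness. The sketch below is library-independent; the toy objective and the names _cached_fitness and call_count are illustrative.

```python
from functools import lru_cache

call_count = 0  # counts how often the expensive evaluation really runs


@lru_cache(maxsize=None)
def _cached_fitness(solution_key):
    """Expensive evaluation, executed once per distinct solution (toy objective)."""
    global call_count
    call_count += 1
    return -sum(g * g for g in solution_key)


def fitness_func(solution, sol_idx):
    # Lists and arrays are not hashable, so a tuple of floats serves as the cache key.
    return _cached_fitness(tuple(float(g) for g in solution))


population = [[1.0, 2.0], [0.5, -0.5], [3.0, 0.0]]
first_pass = [fitness_func(s, i) for i, s in enumerate(population)]
second_pass = [fitness_func(s, i) for i, s in enumerate(population)]  # served from the cache
```

Memoization is only appropriate when the fitness is deterministic; a fitness that depends on random batches or external state would return stale values.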

Check for range of values

Inside the mutation function, you are adding random values to the genes. This can cause the gene values to drift over time and in one of my examples, it is causing the value to go beyond the initial range of [-4, 4].

Maybe some checks need to be added for the range of acceptable values. For example, instead of adding a random value between -1 and 1, you could pick a random value from the original range and use that as the mutated value.
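
The two options mentioned here, keeping the additive step but clipping the result, or drawing a fresh value from the original range, can be sketched as follows. This is not the library's mutation code; the function names and the default range [-4, 4] (taken from the example in this report) are assumptions for illustration.

```python
import random


def mutate_clipped(genes, low=-4.0, high=4.0, step=1.0, rng=random):
    """Add a random value in [-step, step] to each gene, then clip the result to [low, high]."""
    return [min(high, max(low, g + rng.uniform(-step, step))) for g in genes]


def mutate_resampled(genes, indices, low=-4.0, high=4.0, rng=random):
    """Replace the genes at the given indices with fresh draws from [low, high]."""
    mutated = list(genes)
    for i in indices:
        mutated[i] = rng.uniform(low, high)
    return mutated
```

Clipping preserves small local steps but tends to pile values up at the bounds; resampling never leaves the range but discards the gene's previous value.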

Randomizing the gene that gets mutated

It looks like the mutation function in your code mutates the same gene(s) each time. Shouldn't this be randomized?
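
Randomizing the mutated positions amounts to drawing distinct gene indices for every offspring. A minimal sketch, with random_mutation as a hypothetical helper rather than the library's function:

```python
import random


def random_mutation(genes, num_mutations, low=-1.0, high=1.0, rng=random):
    """Add a random value in [low, high] to num_mutations distinct, randomly chosen genes."""
    mutated = list(genes)
    for i in rng.sample(range(len(mutated)), num_mutations):
        mutated[i] += rng.uniform(low, high)
    return mutated
```

Over many offspring every position is eventually mutated, since rng.sample picks the indices uniformly without replacement on each call.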

encoding process

I am a beginner in programming. May I ask whether this package has an encoding and decoding process, for example binary encoding? Where can I see it?
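
For readers unfamiliar with the term, binary encoding maps a real-valued gene to a fixed number of bits and back. The sketch below shows the general idea only; it says nothing about how this package represents genes, and encode and decode are illustrative names.

```python
def encode(value, low, high, n_bits):
    """Map a real value in [low, high] to a list of n_bits bits, most significant bit first."""
    levels = (1 << n_bits) - 1
    step = round((value - low) / (high - low) * levels)
    return [(step >> k) & 1 for k in reversed(range(n_bits))]


def decode(bits, low, high):
    """Map a list of bits (most significant first) back to a real value in [low, high]."""
    levels = (1 << len(bits)) - 1
    step = 0
    for b in bits:
        step = (step << 1) | b
    return low + (high - low) * step / levels
```

Decoding an encoded value returns the nearest representable level, so the round trip is exact only up to half a quantization step, (high - low) / (2 * (2**n_bits - 1)).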

plot_fitness(plot_type="scatter") fails after 2nd run()

Hi! Great project, thanks. However, I've experienced a bug.

model = pygad.GA(...)  # any parameters; even the sample model from the documentation
model.run()
model.plot_fitness(plot_type="scatter")  # ok
model.run()
model.plot_fitness()  # ok
model.plot_fitness(plot_type="scatter")  # raises the ValueError below

ValueError Traceback (most recent call last)
in
----> 1 ga_instance.plot_fitness(plot_type="scatter")

~/.conda/envs/test/lib/python3.9/site-packages/pygad/pygad.py in plot_fitness(self, title, xlabel, ylabel, linewidth, font_size, plot_type, color, save_dir)
3150 matplotlib.pyplot.plot(self.best_solutions_fitness, linewidth=linewidth, color=color)
3151 elif plot_type == "scatter":
-> 3152 matplotlib.pyplot.scatter(range(self.generations_completed + 1), self.best_solutions_fitness, linewidth=linewidth, color=color)
3153 elif plot_type == "bar":
3154 matplotlib.pyplot.bar(range(self.generations_completed + 1), self.best_solutions_fitness, linewidth=linewidth, color=color)

~/.conda/envs/test/lib/python3.9/site-packages/matplotlib/pyplot.py in scatter(x, y, s, c, marker, cmap, norm, vmin, vmax, alpha, linewidths, edgecolors, plotnonfinite, data, **kwargs)
3066 vmin=None, vmax=None, alpha=None, linewidths=None, *,
3067 edgecolors=None, plotnonfinite=False, data=None, **kwargs):
-> 3068 __ret = gca().scatter(
3069 x, y, s=s, c=c, marker=marker, cmap=cmap, norm=norm,
3070 vmin=vmin, vmax=vmax, alpha=alpha, linewidths=linewidths,

~/.conda/envs/test/lib/python3.9/site-packages/matplotlib/__init__.py in inner(ax, data, *args, **kwargs)
1359 def inner(ax, *args, data=None, **kwargs):
1360 if data is None:
-> 1361 return func(ax, *map(sanitize_sequence, args), **kwargs)
1362
1363 bound = new_sig.bind(ax, *args, **kwargs)

~/.conda/envs/test/lib/python3.9/site-packages/matplotlib/axes/_axes.py in scatter(self, x, y, s, c, marker, cmap, norm, vmin, vmax, alpha, linewidths, edgecolors, plotnonfinite, **kwargs)
4496 y = np.ma.ravel(y)
4497 if x.size != y.size:
-> 4498 raise ValueError("x and y must be the same size")
4499
4500 if s is None:

ValueError: x and y must be the same size
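
According to the traceback, the scatter branch pairs range(self.generations_completed + 1) with self.best_solutions_fitness, so the error means the two have different lengths after the second run(). Which of the two fails to reset or accumulate is not established here. As a workaround sketch, the x values can be derived from the fitness list itself; scatter_xy is a hypothetical helper and the history values are made up.

```python
def scatter_xy(best_solutions_fitness):
    """Return x and y of equal length for a scatter plot of the recorded best fitness values."""
    y = list(best_solutions_fitness)
    x = list(range(len(y)))  # derive x from y rather than from a separate generation counter
    return x, y


# Made-up fitness history, longer than a single run's generation count would suggest.
history = [0.1, 0.4, 0.5, 0.7, 0.7, 0.8, 0.9, 0.95]
x, y = scatter_xy(history)
# With matplotlib installed, the points could then be drawn with matplotlib.pyplot.scatter(x, y).
```

In the session above this would be called as scatter_xy(model.best_solutions_fitness).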

Inconsistency between the code in the tutorial and the code on GitHub

Hi there,

Thanks so much for your code; its logic is clear. I followed your tutorial at https://towardsdatascience.com/genetic-algorithm-implementation-in-python-5ab67bb124a6 and imported your packages from GitHub to run the example exactly as written. However, I ran into errors.

After inspecting them, it seems that the code in the tutorial is inconsistent with your code on GitHub. For example, in the tutorial the function cal_pop_fitness() takes two positional arguments, equation_inputs and pop, but the function in the GitHub module takes no positional arguments.

Could you confirm whether my understanding is correct? It would be highly appreciated if you could help me get the example running.

Best regards,

-Yili