gkhayes / mlrose

Python package for implementing a number of Machine Learning, Randomized Optimization and Search algorithms.

Home Page: https://mlrose.readthedocs.io/

License: BSD 3-Clause "New" or "Revised" License

Python 87.46% Shell 0.52% Jupyter Notebook 12.03%

mlrose's Issues

ImportError: cannot import name 'six' from 'sklearn.externals'

Environment:

OS: WSL Ubuntu
Python 3.8
Dependencies installed:
- mlrose 1.3.0
- numpy 1.19.0
- scipy 1.5.1
- six 1.14.0
- sklearn 0.0

When importing mlrose, receive error:

File "/lib/python3.8/site-packages/mlrose/__init__.py", line 12, in <module>
    from .neural import NeuralNetwork, LinearRegression, LogisticRegression
File "/lib/python3.8/site-packages/mlrose/neural.py", line 12, in <module>
    from sklearn.externals import six
ImportError: cannot import name 'six' from 'sklearn.externals' (/lib/python3.8/site-packages/sklearn/externals/__init__.py)

Also see stackoverflow question: https://stackoverflow.com/q/61867945/1164465
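
For context, the workaround discussed in the linked question amounts to registering the standalone six package under the old sklearn.externals.six name before mlrose is imported. The sketch below shows only the aliasing mechanism, using a standard-library stand-in so it runs without sklearn or six installed. The helper name alias_submodule is hypothetical and not part of mlrose or sklearn.

```python
import importlib
import sys
import types


def alias_submodule(package_name, attr, replacement):
    """Register `replacement` in sys.modules as `package_name.attr`.

    `from package import attr` falls back to sys.modules when the package
    has no such attribute, so the old import path resolves again.
    """
    sys.modules[f"{package_name}.{attr}"] = replacement


# Stand-in demonstration using only the standard library: make
# `json.shim_demo` resolvable even though json has no such submodule.
demo = types.ModuleType("shim_demo")
alias_submodule("json", "shim_demo", demo)
resolved = importlib.import_module("json.shim_demo")

# For the error above, the equivalent call (assumes the standalone `six`
# package is installed) would run before `import mlrose`:
#     import six
#     alias_submodule("sklearn.externals", "six", six)
```

This only hides the symptom; the import in neural.py would still need to change for the library itself to be fixed.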

Support for asymmetric TSP

Any interest in supporting asymmetric TSP?

For example by the method outlined here.

Question on initial states for GA and MIMIC

Why is the initial state not a parameter for GA and MIMIC (unlike RHC and SA)? It is always reset instead.

Bias for hidden layers

It appears that the neural network in mlrose only adds the bias node to the input layer, but not the hidden layers. This is different from what I've seen elsewhere, for example sklearn.

I have a branch that adds this bias to the hidden layers too, and this works well enough for me. Is this something desirable for mlrose? If so I'll try to fix the tests and submit a PR.
https://github.com/phunehehe/mlrose/tree/bias

GA, TSP, and MaxK Color

I've run several experiments with the optimization problems. Unexpectedly, GA doesn't perform best on TSP (in fact RHC was best, followed closely by SA). Also, for MaxKColor, MIMIC should perform best, but it's only slightly better than GA, while RHC is far better, with SA again close behind. Is it possible there's a bug in the fitness determination, or am I setting up the problems incorrectly?

https://github.com/tmsuidan/cs7641-assignment2/tree/master/MaxkColor

https://github.com/tmsuidan/cs7641-assignment2/tree/master/tsp

no 'random' method in the class OptProb

On line 181 of opt_probs.py, you are calling "state = self.random()". There is no random() method defined in this class.

Max K Color Bug

'max_restarts' should be an option for NeuralNetwork

When NeuralNetwork gets initialized with the algorithm "randomized_hill_climbing," that algorithm gets a default max_restart of 0.

unable to set state vector lower bound?

Hi, thanks for this package, but I can't find a way to set a lower bound on the state vector. Is this possible?

I am trying to fit a nonlinear curve to extract parameters, but simulated annealing immediately drops the state vector to 0 in a continuous optimization problem. Thanks.

Fast MIMIC Fails TSPOpt

When running a TSPOpt problem with custom distances, MIMIC fails only when fast_mimic=True.

Code that works:

best_state, best_fitness, fit_curve = mlrose.mimic(problem_cust, max_attempts=100, max_iters=1000, random_state=42,
                                                   curve=True)

Code that fails:

best_state, best_fitness, fit_curve = mlrose.mimic(problem_cust, max_attempts=100, max_iters=1000, random_state=42,
                                                   fast_mimic=True, curve=True)

It is specifically failing in opt_probs.py on line 1010. The issue is that remaining is empty and it's trying to pull a random choice from an empty array.

'genetic_alg' does not work for multiple layers

From the tutorial (revised) hidden_nodes = [5,5,5,5,5]

np.random.seed(3)

nn_model1 = mlrose.NeuralNetwork(hidden_nodes = [5,5,5,5,5], activation ='relu',
                                 algorithm ='genetic_alg',
                                 max_iters = 1000, bias = True, is_classifier = True,
                                 learning_rate = 0.0001, early_stopping = True,
                                 clip_max = 5, max_attempts = 100)

nn_model1.fit(X_train_scaled, y_train_hot)

Error message:
ValueError: Input contains NaN, infinity or a value too large for dtype('float64').

mlrose.genetic_alg returns fitness_curve even if curve is not mentioned

GA probability of selecting parents incorrect

Hi,

the probability of selecting parents is calculated by (https://github.com/gkhayes/mlrose/blob/master/mlrose/opt_probs.py eval_mate_probs function):

self.mate_probs = pop_fitness/np.sum(pop_fitness)

This results in high probabilities for bad individuals and low probabilities for good individuals for minimization problems.
Example: Population with 2 individuals, individual_1 fitness -1 and individual_2 fitness -2. The probability of selecting individual_1 (the good one) is 1/3 and probability of selecting individual_2 (the bad one) is 2/3. It is more likely to select individuals with poor fitness.

A quick fix could be the use of the reciprocal value for minimization problems:

if self.maximize == 1:
    self.mate_probs = pop_fitness / np.sum(pop_fitness)
else:
    self.mate_probs = 1/pop_fitness / np.sum(1/pop_fitness)
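
To make the numbers in the example concrete, here is a minimal pure-Python sketch of both formulas applied to the two-individual population above. The function names are illustrative only and do not exist in mlrose. The reciprocal variant assumes every fitness value is nonzero and shares one sign.

```python
def mate_probs_original(pop_fitness):
    """Selection probabilities as fitness / sum(fitness)."""
    total = sum(pop_fitness)
    return [f / total for f in pop_fitness]


def mate_probs_reciprocal(pop_fitness):
    """Proposed quick fix for minimization: normalize the reciprocals.

    Assumes all fitness values are nonzero and have the same sign.
    """
    inverse = [1.0 / f for f in pop_fitness]
    total = sum(inverse)
    return [v / total for v in inverse]


# individual_1 (good) has fitness -1, individual_2 (bad) has fitness -2
population = [-1.0, -2.0]
original = mate_probs_original(population)      # favors the bad individual
reciprocal = mate_probs_reciprocal(population)  # favors the good individual
print(original, reciprocal)
```

With mixed-sign or zero fitness values the reciprocal form breaks down, so it is only a quick fix for the all-negative case described here.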

mlrose suitable for variable lengths discrete variables?

Hi,

I am looking for an alternative to a brute force optimisation approach.

My parameter set is the following. So, my problem is a multi-variate, single-objective problem. It is not related to any machine learning task.

    from sklearn.model_selection import ParameterGrid
    from multiprocessing import Pool

    var1 = 'var1'
    var2 = 'var2'
    abc = [1, 2]
    xyz = list(range(1_00_000))
    pg = [{'variant': [var1],
           'abc': abc,
           'xyz': xyz, },
          {'variant': [var2],
           'abc': abc, }]
    parameterGrid = ParameterGrid(pg)

So I have a variable number of parameters, my parameters can only take discrete values, and some parameters exist only for certain values of other parameters.

I would like to avoid brute force, as the execution of the cost function is quite time consuming.

Would I be able to use mlrose for my problem?

(I ask this as DiscreteOpt expects fixed lengths for all parameters).

Thanks in advance!

Why do genetic_alg and mimic not support 'init_state' parameter?

The other optimization algorithms allow init_state to be given as a starting solution. I am not sure why genetic_alg and mimic do not accept init_state. Thanks.

simulated_annealing neighbor selection

In looking at simulated_annealing, the problem method random_neighbor() is called, which for ContinuousOpt objects computes the neighbors based on the step size, and only considers those neighbors (from what I can tell) within a step of the current state. I know it is common to open the "neighbors" to include most/all of the population (perhaps some multiple of step size * a normal probability, or the temperature is commonly used as the step size also), this way the algorithm can random walk at first (while temperature is high), and then settle in on only higher points as the temperature decreases. I think of simulated annealing as almost a random restart at every iteration, with decreasing likelihood of jumping to a lower point as the temperature decreases.

EDIT: after further research, it seems like your approach is a valid one, especially as talked about in the CS7641 lectures. I stumbled on this link, which ended up helping my case, and is a slightly different implementation of simulated annealing : https://www.mathworks.com/help/gads/how-simulated-annealing-works.html
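
The "decreasing likelihood of jumping to a lower point" can be illustrated with the usual Metropolis-style acceptance rule. This is a generic sketch for a maximization problem, not a copy of the mlrose code; the function name and the sample temperatures are assumptions for illustration.

```python
import math


def acceptance_probability(delta, temperature):
    """Metropolis-style acceptance rule for a maximization problem.

    delta is (neighbor fitness - current fitness); temperature must be > 0.
    """
    if delta >= 0:
        return 1.0  # improvements (and ties) are always accepted
    return math.exp(delta / temperature)


# The same downhill move (delta = -1) becomes less likely as it cools.
for temperature in (10.0, 1.0, 0.1):
    print(temperature, acceptance_probability(-1.0, temperature))
```

At high temperature nearly every neighbor is accepted, which gives the random-walk behavior described above; the step size that defines the neighborhood is a separate choice.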

Randomized Hill Climbing improvement

According to referenced "Clever Algorithms: Nature-Inspired Programming Recipes":

neighbors with better or equal cost should be accepted, allowing the technique to navigate across plateaus in the response surface

I suggest a change in the code like this:

while (attempts < max_attempts) and (iters < max_iters):
    iters += 1
    attempts += 1

    # Find random neighbor and evaluate fitness
    next_state = problem.random_neighbor()
    next_fitness = problem.eval_fitness(next_state)

    improvement = next_fitness - problem.get_fitness()

    # If the neighbor is better or equal move to that state
    if improvement >= 0:
        problem.set_state(next_state)
        # if strictly better, then reset the attempts counter
        if improvement > 0:
            attempts = 0

I personally work with problems where this small change has a big impact on results.

get param set param error

In the BaseNeuralNetwork class, the 'set_params' and 'get_params' functions do not set and get all of the parameters. This causes an issue when using basic sklearn functions such as 'cross_validate'.

Edit: Looks like this may be fixed in PR #55

To fix this, update the functions as follows:

    def get_params(self, deep=False):
        """Get parameters for this estimator.

        Returns
        -------
        params : dictionary
            Parameter names mapped to their values.
        """
        params = {'activation': self.activation,
                  'algorithm': self.algorithm,
                  'hidden_nodes': self.hidden_nodes,
                  'max_iters': self.max_iters,
                  'bias': self.bias,
                  'is_classifier': self.is_classifier,
                  'learning_rate': self.learning_rate,
                  'early_stopping': self.early_stopping,
                  'clip_max': self.clip_max,
                  'restarts': self.restarts,
                  'schedule': self.schedule,
                  'pop_size': self.pop_size,
                  'mutation_prob': self.mutation_prob,
                  'max_attempts': self.max_attempts,
                  'random_state': self.random_state,
                  'curve': self.curve}

        return params

    def set_params(self, **in_params):
        """Set the parameters of this estimator.

        Parameters
        -------
        in_params: dictionary
            Dictionary of parameters to be set and the value to be set to.
        """
        if 'hidden_nodes' in in_params.keys():
            self.hidden_nodes = in_params['hidden_nodes']
        if 'max_iters' in in_params.keys():
            self.max_iters = in_params['max_iters']
        if 'bias' in in_params.keys():
            self.bias = in_params['bias']
        if 'is_classifier' in in_params.keys():
            self.is_classifier = in_params['is_classifier']
        if 'learning_rate' in in_params.keys():
            self.learning_rate = in_params['learning_rate']
        if 'early_stopping' in in_params.keys():
            self.early_stopping = in_params['early_stopping']
        if 'clip_max' in in_params.keys():
            self.clip_max = in_params['clip_max']
        if 'restarts' in in_params.keys():
            self.restarts = in_params['restarts']
        if 'schedule' in in_params.keys():
            self.schedule = in_params['schedule']
        if 'pop_size' in in_params.keys():
            self.pop_size = in_params['pop_size']
        if 'mutation_prob' in in_params.keys():
            self.mutation_prob = in_params['mutation_prob']
        if 'activation' in in_params.keys():
            self.activation = in_params['activation']
        if 'algorithm' in in_params.keys():
            self.algorithm = in_params['algorithm']
        if 'max_attempts' in in_params.keys():
            self.max_attempts = in_params['max_attempts']
        if 'random_state' in in_params.keys():
            self.random_state = in_params['random_state']
        if 'curve' in in_params.keys():
            self.curve = in_params['curve']
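
A more compact way to keep the two functions in sync is to drive both from a single list of parameter names. The sketch below is a generic illustration, not the mlrose implementation: ParamsMixin, _param_names and TinyModel are hypothetical names. Unlike the code above, it rejects unknown keys and returns self, which is what sklearn's own set_params does.

```python
class ParamsMixin:
    # Hypothetical attribute: subclasses list every constructor argument here.
    _param_names = ()

    def get_params(self, deep=False):
        """Return a dict mapping each parameter name to its current value."""
        return {name: getattr(self, name) for name in self._param_names}

    def set_params(self, **in_params):
        """Set known parameters and return self so calls can be chained."""
        for name, value in in_params.items():
            if name not in self._param_names:
                raise ValueError(f"Unknown parameter: {name}")
            setattr(self, name, value)
        return self


class TinyModel(ParamsMixin):
    """Toy estimator used only to exercise the mixin."""

    _param_names = ("hidden_nodes", "learning_rate", "max_iters")

    def __init__(self, hidden_nodes=None, learning_rate=0.1, max_iters=100):
        self.hidden_nodes = hidden_nodes
        self.learning_rate = learning_rate
        self.max_iters = max_iters


model = TinyModel().set_params(learning_rate=0.5)
print(model.get_params())
```

Adding a new parameter then means touching one tuple instead of two long functions.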

TSPopt()- genetic_alg- Malfunction in max_attempts

I am using TSPOpt() to solve a simple TSP example with the following code:
best_state, best_fitness, fittness_curve = mlrose.genetic_alg(problem_fit2, mutation_prob = 0.2, max_attempts = max_attempt, random_state = 2, curve=True)
I iterate max_attempt from 1 to 50 and get the following sizes of fittness_curve:
Output:
Max attempts = 1; size of fittness curve = 2
Max attempts = 2; size of fittness curve = 3
Max attempts = 3; size of fittness curve = 4
Max attempts = 4; size of fittness curve = 5
Max attempts = 5; size of fittness curve = 6
Max attempts = 6; size of fittness curve = 7
Max attempts = 7; size of fittness curve = 15
Max attempts = 8; size of fittness curve = 16
Max attempts = 9; size of fittness curve = 17
Max attempts = 10; size of fittness curve = 18
Max attempts = 11; size of fittness curve = 32
Max attempts = 12; size of fittness curve = 33
Max attempts = 13; size of fittness curve = 34
Max attempts = 14; size of fittness curve = 35
Max attempts = 15; size of fittness curve = 36
Max attempts = 16; size of fittness curve = 37
Max attempts = 17; size of fittness curve = 38
Max attempts = 18; size of fittness curve = 39
Max attempts = 19; size of fittness curve = 40
Max attempts = 20; size of fittness curve = 41

It seems that the library makes more attempts than the max_attempts value passed in, with jumps in the curve size at values 7 and 11.

Kindly help to improve my understanding please.
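
One possible explanation, assuming the loop resets its attempts counter whenever the best fitness improves (as in the genetic_alg excerpt quoted in another issue on this page): max_attempts then limits consecutive non-improving iterations, not total iterations. The sketch below simulates only that counting logic on a made-up fitness sequence; it does not call mlrose.

```python
def count_iterations(fitness_sequence, max_attempts):
    """Count loop iterations when the attempts counter resets on improvement."""
    best = float("-inf")
    attempts = 0
    iters = 0
    for candidate in fitness_sequence:
        if attempts >= max_attempts:
            break
        iters += 1
        if candidate > best:
            best = candidate
            attempts = 0  # an improvement restarts the count
        else:
            attempts += 1
    return iters


# Made-up sequence with a late improvement at the third step.
sequence = [1, 1, 2, 2, 2, 2, 2, 2]
for max_attempts in (1, 2, 3):
    print(max_attempts, count_iterations(sequence, max_attempts))
```

A slightly larger max_attempts can let the search survive long enough to reach a later improvement, which would produce jumps like the ones at 7 and 11.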

tsp_mlrose-distance.py.pdf

Hill Climb Curve?

I was looking at the code and it looks like the curve array only gets updated once per restart, I'd expect it to update once per iteration.

Is that on purpose or a bug?

Code snippet below for context:

for _ in range(restarts + 1):
    ...

    while iters < max_iters:
        ...

    ...
    if curve:
        fitness_curve.append(problem.get_fitness())

TypeError during Fitting

I can't fit my data (it works on sklearn though).
I keep getting this

AttributeError: 'float' object has no attribute 'exp'
The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/home/negm/PycharmProjects/testa2/a2_1.py", line 47, in
nn_model.fit(x_train, y_train)
File "/home/negm/PycharmProjects/testa2/venv/lib/python3.6/site-packages/mlrose/neural.py", line 528, in fit
restarts=0, init_state=init_weights)
File "/home/negm/PycharmProjects/testa2/venv/lib/python3.6/site-packages/mlrose/algorithms.py", line 188, in random_hill_climb
problem.set_state(init_state)
File "/home/negm/PycharmProjects/testa2/venv/lib/python3.6/site-packages/mlrose/opt_probs.py", line 224, in set_state
self.fitness = self.eval_fitness(self.state)
File "/home/negm/PycharmProjects/testa2/venv/lib/python3.6/site-packages/mlrose/opt_probs.py", line 98, in eval_fitness
fitness = self.maximize*self.fitness_fn.evaluate(state)
File "/home/negm/PycharmProjects/testa2/venv/lib/python3.6/site-packages/mlrose/neural.py", line 290, in evaluate
self.y_pred = self.output_activation(outputs)
File "/home/negm/PycharmProjects/testa2/venv/lib/python3.6/site-packages/mlrose/activation.py", line 77, in sigmoid
fx = 1/(1 + np.exp(-x))
TypeError: loop of ufunc does not support argument 0 of type float which has no callable exp method

The weird thing is that I tried different activation functions, but it always maps back to 'sigmoid'!

Feature Request: Use np functionality where needed, support random seeding

Thanks for putting together this resource!
To support evaluating fitness across an ndarray of examples, some functions should use np functionality - specifically I see regular sum() used in one_max fitness evaluation, replace with np.sum(axis=-1) or the like. Not sure where else this is relevant.

It'd also be helpful if we could provide random_seed to optimization objects and algorithms for consistent results. This would require adding a few np.random.seed() lines in the respective functions.

genetic_alg with size of 1 crashes

It probably isn't very meaningful, but if you have a size of 1 when using the genetic algorithm, then reproduce crashes.

fitness = mlrose.OneMax()
problem = mlrose.DiscreteOpt(length = 1, fitness_fn = fitness, maximize = True, max_val = 2)
mlrose.genetic_alg(problem)

Then you get a crash :(

BUG in eval_mate_probs()

The fitness values of a population might contain positive AND negative values in OptProb.eval_mate_probs(). Thus their sum might be positive, negative, or even accidentally zero. So we have to be more careful to avoid negative probabilities, which arise if the sum of the fitness values has a different sign than the fitness of a member.

    # Set -1*inf values to 0 to avoid dividing by sum of infinity.
    # This forces mate_probs for these pop members to 0.
    pop_fitness[pop_fitness == -1.0*np.inf] = 0

    # Fitness of the population might contain positive AND negative values.
    # Thus its sum can be positive, negative or even accidentally zero.

    # All fitness values are zero
    if np.count_nonzero(pop_fitness) == 0:
        self.mate_probs = np.ones(len(pop_fitness)) \
                          / len(pop_fitness)
    else:
        # Make all values positive: subtract the minimum so the smallest
        # becomes 0, then offset everything by the sdev of the data
        pop_fitness += (pop_fitness.std() - pop_fitness.min())
        self.mate_probs = pop_fitness/np.sum(pop_fitness)

potential bug for problem.reset() not clearing fevals when running rhc algorithm with multiple restarts

Hey, thank you for creating this awesome package. I was using a fork hiive/mlrose and I seem to run into an issue that reset does not clear function evaluations properly when multiple restarts are specified for the rhc algorithm. And apologies if this might not be the proper place raising this, and let me know if there are other places I should raise this issue.

Issue Description:
I found problem.reset() not clearing fevals when running rhc algorithm.

Problem Setup:
I was using FlipFlop fitness function, DiscreteOpt for the problem, and rhc for the algorithm.

The way I call rhc is as follows. I was trying to have multiple runs of same parameters with different seed to reduce variance and I call reset function each time hoping to clear the states.

opt.reset()
mlrose.random_hill_climb(opt, max_attempts=100, max_iters=3000, restarts=restart, init_state=None, curve=True, random_state=seed)

When I set restarts = 2, for example, I find that fevals is not cleared on the second run. My fevals output is below:

The seed is  aaa (dummy just for illustration)
[  1.   3.   4.   5.   6.   7.   8.   9.  10.  11.  12.  13.  14.  15.
  16.  17.  18.  19.  20.  21.  22.  24.  26.  27.  28.  29.  30.  31.
  32.  33.  34.  36.  37.  38.  39.  40.  41.  42.  43.  44.  45.  46.
  47.  48.  49.  50.  51.  52.  53.  54.  55.  56.  57.  58.  59.  60.
  61.  62.  63.  64.  65.  66.  67.  68.  69.  70.  71.  72.  73.  74.
  75.  76.  77.  78.  79.  80.  81.  82.  83.  84.  85.  86.  87.  88.
  89.  90.  91.  92.  93.  94.  95.  96.  97.  98.  99. 100. 101. 102.
 103. 104. 105. 106. 107. 108. 109. 110. 111. 112. 113. 114. 115. 116.
 117. 118. 119. 120. 121. 122. 123. 124. 125. 126. 127. 128. 129. 130.
 131. 132. 133. 134. 135. 136.]
The seed is  bbb
[315. 317. 319. 320. 321. 322. 324. 325. 326. 327. 328. 329. 330. 331.
 332. 333. 334. 335. 336. 337. 338. 339. 340. 342. 343. 344. 345. 346.
 347. 348. 349. 350. 351. 352. 353. 354. 355. 356. 357. 358. 359. 360.
 362. 363. 364. 365. 366. 367. 368. 369. 370. 371. 372. 373. 374. 375.
 376. 377. 378. 379. 380. 381. 382. 383. 384. 385. 386. 387. 388. 389.
 390. 391. 392. 393. 394. 395. 396. 397. 398. 399. 400. 401. 402. 403.
 404. 405. 406. 407. 408. 409. 410. 411. 412. 413. 414. 415. 416. 417.
 418. 419. 420. 421. 422. 423. 424. 425. 426. 427. 428. 429. 430. 431.
 432. 433. 434. 435. 436. 437. 438. 439. 440. 441. 442. 443. 444. 445.
 446. 447. 448. 449. 450. 451. 452. 453. 454. 455. 456. 457. 458. 459.
 460. 461. 462.]
The seed is  ccc, this got cleared 
[  1.   3.   4.   5.   6.   8.  10.  11.  13.  14.  15.  16.  17.  18.
  19.  20.  21.  23.  24.  25.  26.  27.  28.  29.  30.  31.  32.  33.
  34.  35.  36.  37.  38.  39.  40.  41.  42.  43.  44.  45.  46.  47.
  48.  49.  50.  51.  52.  53.  54.  55.  56.  57.  58.  59.  60.  61.
  62.  63.  64.  65.  66.  67.  68.  69.  70.  71.  72.  73.  74.  75.
  76.  77.  78.  79.  80.  81.  82.  83.  84.  85.  86.  87.  88.  89.
  90.  91.  92.  93.  94.  95.  96.  97.  98.  99. 100. 101. 102. 103.
 104. 105. 106. 107. 108. 109. 110. 111. 112. 113. 114. 115. 116. 117.
 118. 119. 120. 121. 122. 123.]
The seed is  49
[229. 230. 232. 233. 235. 236. 237. 238. 239. 240. 241. 242. 243. 244.
 245. 246. 247. 248. 249. 250. 252. 253. 254. 255. 256. 257. 258. 259.
 260. 261. 262. 263. 264. 265. 266. 267. 268. 269. 270. 271. 272. 273.
 274. 275. 276. 277. 278. 279. 280. 281. 282. 283. 284. 285. 286. 287.
 288. 289. 290. 291. 292. 293. 294. 295. 296. 297. 298. 299. 300. 301.
 302. 303. 304. 305. 306. 307. 308. 309. 310. 311. 312. 313. 314. 315.
 316. 317. 318. 319. 320. 321. 322. 323. 324. 325. 326. 327. 328. 329.
 330. 331. 332. 333. 334. 335. 336. 337. 338. 339. 340. 341. 342. 343.
 344. 345. 346. 347. 348. 349. 350. 351. 352.]
The seed is  66
[263. 265. 267. 268. 269. 270. 271. 272. 273. 274. 275. 276. 277. 278.
 279. 281. 282. 284. 285. 286. 287. 288. 289. 290. 291. 292. 293. 294.
 295. 296. 297. 298. 299. 300. 301. 302. 303. 304. 305. 306. 307. 308.
 309. 310. 311. 312. 313. 314. 315. 316. 317. 318. 319. 320. 321. 322.
 323. 324. 325. 326. 327. 328. 329. 330. 331. 332. 333. 334. 335. 336.
 337. 338. 339. 340. 341. 342. 343. 344. 345. 346. 347. 348. 349. 350.
 351. 352. 353. 354. 355. 356. 357. 358. 359. 360. 361. 362. 363. 364.
 365. 366. 367. 368. 369. 370. 371. 372. 373. 374. 375. 376. 377. 378.
 379. 380. 381. 382. 383. 384.]

Let me know if there is more information I could provide and once again, thank you for this awesome package.

bug for mutation_probability in genetic algorithms

I believe the inequality sign should be flipped in the following code.
A mutation probability of 10% should mean that children are mutated 10% of the time. With the current code, a value of 10% means 90% of children get mutated.

def mutate(self, child, mutation_probability):
    if np.random.rand() > mutation_probability:
        # do swap mutation
        m1 = np.random.randint(len(child))
        m2 = np.random.randint(len(child))
        tmp = child[m1]
        child[m1] = child[m2]
        child[m2] = tmp
    return child
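
The effect of the comparison direction can be checked with a quick simulation. This is a standalone sketch using Python's random module rather than the library code; the function name and the trial count are arbitrary choices for illustration.

```python
import random


def mutated_fraction(mutation_probability, use_greater_than,
                     trials=100_000, seed=0):
    """Estimate how often a child is mutated under each comparison."""
    rng = random.Random(seed)
    mutated = 0
    for _ in range(trials):
        draw = rng.random()  # uniform in [0, 1)
        if use_greater_than:
            should_mutate = draw > mutation_probability  # comparison as quoted
        else:
            should_mutate = draw < mutation_probability  # flipped comparison
        mutated += should_mutate
    return mutated / trials


as_quoted = mutated_fraction(0.1, use_greater_than=True)   # close to 0.9
flipped = mutated_fraction(0.1, use_greater_than=False)    # close to 0.1
print(as_quoted, flipped)
```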

Six Peaks Opt

Code being referenced:

mlrose/mlrose/fitness.py

Lines 393 to 408 in 71ee20a

# Calculate head and tail values
head_0 = head(0, state)
tail_0 = tail(0, state)
head_1 = head(1, state)
tail_1 = tail(1, state)

# Calculate R(X, T)
if (tail_0 > _t and head_1 > _t) or (tail_1 > _t and head_0 > _t):
    _r = _n
else:
    _r = 0

# Evaluate function
fitness = max(tail_0, head_1) + _r

return fitness

I'm reading through https://papers.nips.cc/paper/1328-mimic-finding-optima-by-estimating-probability-densities.pdf to understand the implementation of this algorithm, and I'm finding that Six Peaks does not give 4 optima as expected.

The reason is that the evaluation always uses
fitness = max(tail_0, head_1) + _r on line 406

and NOT the tail or head that actually fulfilled the OR condition for R(X,T).

In the image below, I run a four peaks and six peaks fitness function analysis and have circled the peaks that should match with the other max optimum based on what i presume to be the correct logic based on the above paper.
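
The asymmetry being described can be reproduced with a small standalone version of the quoted logic. This sketch re-implements head and tail from their usual definitions (number of leading and trailing matching bits) and takes the threshold t directly as an integer, which is an assumption; it is not the mlrose code itself and it takes no position on which behavior matches the paper.

```python
def head(bit, state):
    """Number of leading elements of state equal to bit."""
    count = 0
    for value in state:
        if value != bit:
            break
        count += 1
    return count


def tail(bit, state):
    """Number of trailing elements of state equal to bit."""
    return head(bit, state[::-1])


def six_peaks_as_quoted(state, t):
    """Fitness following the quoted snippet, with integer threshold t."""
    n = len(state)
    head_0, tail_0 = head(0, state), tail(0, state)
    head_1, tail_1 = head(1, state), tail(1, state)
    if (tail_0 > t and head_1 > t) or (tail_1 > t and head_0 > t):
        reward = n
    else:
        reward = 0
    return max(tail_0, head_1) + reward


ones_then_zeros = [1] * 5 + [0] * 5   # satisfies the first clause
zeros_then_ones = [0] * 5 + [1] * 5   # satisfies the second clause
print(six_peaks_as_quoted(ones_then_zeros, t=2))
print(six_peaks_as_quoted(zeros_then_ones, t=2))
```

Both mirrored states earn the bonus, but only the first also gets credit from max(tail_0, head_1), so the two score differently.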

Potential Bug with Genetic Algorithm Function

While using the genetic algorithm function for a class project, I was getting poorer performance than expected, and after looking into the code to gain some insight I think I have found an issue. When the algorithm gets into the while (attempts < max_attempts) loop, the new population is bred from the original, and then the fitness of the new population is tested, which makes perfect sense. However, in order to test the fitness, the original population gets overridden, and even when the new fitness is worse than the original the population is never reverted back and the next attempt is based off of the population that performed worse.

Here is the code (algorithms.py, inside genetic_alg function):

    next_gen = np.array(next_gen)
    problem.set_population(next_gen)  # <---- Population gets changed

    next_state = problem.best_child()
    next_fitness = problem.eval_fitness(next_state)

    # If best child is an improvement,
    # move to that state and reset attempts counter
    if next_fitness > problem.get_fitness():
        problem.set_state(next_state)
        attempts = 0

    else:  # <---- If fitness is worse, there is no revert back to original population
        attempts += 1

Maybe this implementation was intentional, but this doesn't seem like ideal functionality. If this was in fact done intentionally, and someone knows the reasoning, please let me know. It's got me curious

GA for NN incorrect

"This problem is because the probability of selecting a parent was determined by:

mate_probs = pop_fitness / np.sum(pop_fitness)

This will result in low numbers if pop_fitness is close to 0 and high numbers if pop_fitness is far from zero regardless of sign. Lets say there are 2 people in population with fitness -0.1 (good) and -10 (bad). the mate_probs would be -0.1 / -10.1 = ~0.01 for the good person and -10 / -10.1 or ~0.99 for the bad person. This means we are more likely to mate people with poor fitness...bad! "(taken from a piazza post in a class by the person who found it)

It's corrected in this fork which has basically redesigned/refactored most of the library with additional functionality
https://github.com/hiive/mlrose
specifically in this file
https://github.com/hiive/mlrose/blob/master/mlrose/opt_probs/_opt_prob.py

Bugs with Neural Network Module

Hello there, I tried mlrose today. However, when I ran the default tutorial_examples.ipynb, some bugs popped up. In example 6, I got the following traceback:

---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
~/.pyenv/versions/3.7.1/envs/py3-mlrose/lib/python3.7/site-packages/IPython/core/formatters.py in __call__(self, obj)
700 type_pprinters=self.type_printers,
701 deferred_pprinters=self.deferred_printers)
--> 702 printer.pretty(obj)
703 printer.flush()
704 return stream.getvalue()

~/.pyenv/versions/3.7.1/envs/py3-mlrose/lib/python3.7/site-packages/IPython/lib/pretty.py in pretty(self, obj)
400 if cls is not object
401 and callable(cls.__dict__.get('__repr__')):
--> 402 return _repr_pprint(obj, self, cycle)
403
404 return _default_pprint(obj, self, cycle)

~/.pyenv/versions/3.7.1/envs/py3-mlrose/lib/python3.7/site-packages/IPython/lib/pretty.py in _repr_pprint(obj, p, cycle)
695 """A pprint that just redirects to the normal repr function."""
696 # Find newlines and replace them with p.break_()
--> 697 output = repr(obj)
698 for idx,output_line in enumerate(output.splitlines()):
699 if idx:

~/.pyenv/versions/3.7.1/envs/py3-mlrose/lib/python3.7/site-packages/sklearn/base.py in __repr__(self)
228 def __repr__(self):
229 class_name = self.__class__.__name__
--> 230 return '%s(%s)' % (class_name, _pprint(self.get_params(deep=False),
231 offset=len(class_name),),)
232

TypeError: get_params() got an unexpected keyword argument 'deep'

After some digging, i think the bug comes from the following parts:

neural.py line 609: get_params(self). In sklearn's BaseEstimator the function also accepts a deep argument, so the two signatures are incompatible.
neural.py line 621: 'learning_rate': self.lr. I think it should be 'learning_rate': self.learning_rate. There is no lr used in the class.
neural.py line 648: self.lr = in_params['learning_rate']. It should be self.learning_rate = in_params['learning_rate'].

Since I'm new to ML, I'm not sure about my findings. If I am wrong, please close this issue.

Infinite loop in find_sample_order()?

Running mimic with {'pop_size': 50, 'keep_pct': 0.2, 'max_attempts': 10, 'max_iters': 100} for a DiscreteOpt(n, oneMax, max_val=2) problem I randomly get stuck in an infinite loop in find_sample_order(). It isn't consistently happening, but if I do a loop of n=1..50 I keep getting caught somewhere around n=10. Trying to understand what find_sample_order() should do in order to propose a fix, but not confident yet..

question

I'm curious about mlrose, whether it might be useful for 2 problems.
Problem 1: minimal disc covering problem. Find the centers for a minimal set of discs of unit radius to cover a given set of points in a plane. These centers take continuous values.

Problem 2: Find a minimal number of sets to partition a set (of points in a plane) such that the euclidean distance between each of the members within each set is greater than some value.

RHC using wrong get_fitness?

I think the fitness_curve.append in line 210 of algorithms.py is using the incorrect get_fitness (it's using get_pop_fitness)?
When running an RHC, I get a blank array back for the fitness curve.

ImportError: cannot import name 'six' from 'sklearn.externals'

When trying to follow along https://github.com/gkhayes/mlrose/blob/master/tutorial_examples.ipynb ,

I get the following:

(base) nolan@nolanDesktop:.../optimization$ python3 ./thing.py
Traceback (most recent call last):
  File "./thing.py", line 1, in <module>
    import mlrose
  File "/home/nolan/anaconda3/lib/python3.8/site-packages/mlrose/__init__.py", line 12, in <module>
    from .neural import NeuralNetwork, LinearRegression, LogisticRegression
  File "/home/nolan/anaconda3/lib/python3.8/site-packages/mlrose/neural.py", line 12, in <module>
    from sklearn.externals import six
ImportError: cannot import name 'six' from 'sklearn.externals' (/home/nolan/anaconda3/lib/python3.8/site-packages/sklearn/externals/__init__.py)

which is resolved by https://stackoverflow.com/questions/61867945/python-import-error-cannot-import-name-six-from-sklearn-externals

(this appears related to #54)

Could the library be updated such that the workaround is no longer needed?

(base) nolan@nolanDesktop:.../optimization$ pip3 show mlrose
Name: mlrose
Version: 1.3.0
Summary: MLROSe: Machine Learning, Randomized Optimization and Search
Home-page: https://github.com/gkhayes/mlrose
Author: Genevieve Hayes
Author-email: None
License: BSD
Location: /home/nolan/anaconda3/lib/python3.8/site-packages
Requires: numpy, scipy, sklearn
Required-by:

Cannot Install

pip install fails. There is an issue in setup.py where it tries to install sklearn instead of scikit-learn.

how would one indicate a start location

Hello,

Looking at the TSPOpt
I am able to generate a route, but would like to make one of the places "first" in the final route I find.
How might I go about extending this code to support that?
Any hints would be great!

Thanks,
Evan
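
One approach that needs no change to the library, assuming the fitness scores a closed tour (the route returns to its first city): rotate the returned state so the desired city comes first. Rotation does not change the length of a closed tour. The helper names and the coordinates below are made up for illustration.

```python
import math


def rotate_tour(tour, start):
    """Rotate a tour so that `start` is the first city; order is preserved."""
    tour = list(tour)
    idx = tour.index(start)
    return tour[idx:] + tour[:idx]


def closed_tour_length(tour, coords):
    """Total Euclidean length of the tour, including the return leg."""
    total = 0.0
    for i, city in enumerate(tour):
        nxt = tour[(i + 1) % len(tour)]
        (x1, y1), (x2, y2) = coords[city], coords[nxt]
        total += math.hypot(x2 - x1, y2 - y1)
    return total


coords = [(0, 0), (3, 0), (3, 4), (0, 4)]  # made-up city coordinates
best_state = [2, 3, 0, 1]                  # e.g. a state returned by a solver
rotated = rotate_tour(best_state, start=0)
print(rotated, closed_tour_length(rotated, coords))
```

If the route is open (no return leg), rotation changes the cost, and the start city would have to be fixed inside the problem definition instead.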
