mlrose's Introduction

mlrose: Machine Learning, Randomized Optimization and SEarch

mlrose is a Python package for applying some of the most common randomized optimization and search algorithms to a range of different optimization problems, over both discrete- and continuous-valued parameter spaces.

Project Background

mlrose was initially developed to support students of Georgia Tech's OMSCS/OMSA offering of CS 7641: Machine Learning.

It includes implementations of all randomized optimization algorithms taught in this course, as well as functionality to apply these algorithms to integer-string optimization problems, such as N-Queens and the Knapsack problem; continuous-valued optimization problems, such as the neural network weight problem; and tour optimization problems, such as the Travelling Salesperson problem. It also has the flexibility to solve user-defined optimization problems.

At the time of development, there did not exist a single Python package that collected all of this functionality together in the one location.

Main Features

Randomized Optimization Algorithms

  • Implementations of: hill climbing, randomized hill climbing, simulated annealing, genetic algorithm and (discrete) MIMIC;
  • Solve both maximization and minimization problems;
  • Define the algorithm's initial state or start from a random state;
  • Define your own simulated annealing decay schedule or use one of three pre-defined, customizable decay schedules: geometric decay, arithmetic decay or exponential decay.

Problem Types

  • Solve discrete-value (bit-string and integer-string), continuous-value and tour optimization (travelling salesperson) problems;
  • Define your own fitness function for optimization or use a pre-defined function;
  • Pre-defined fitness functions exist for solving the: One Max, Flip Flop, Four Peaks, Six Peaks, Continuous Peaks, Knapsack, Travelling Salesperson, N-Queens and Max-K Color optimization problems.

Machine Learning Weight Optimization

  • Optimize the weights of neural networks, linear regression models and logistic regression models using randomized hill climbing, simulated annealing, the genetic algorithm or gradient descent;
  • Supports classification and regression neural networks.

Installation

mlrose was written in Python 3 and requires NumPy, SciPy and Scikit-Learn (sklearn).

The latest released version is available at the Python package index and can be installed using pip:

pip install mlrose

Documentation

The official mlrose documentation can be found here.

A Jupyter notebook containing the examples used in the documentation is also available here.

Licensing, Authors, Acknowledgements

mlrose was written by Genevieve Hayes and is distributed under the 3-Clause BSD license.

You can cite mlrose in research publications and reports as follows:

BibTeX entry:

@misc{Hayes19,
 author = {Hayes, G},
 title 	= {{mlrose: Machine Learning, Randomized Optimization and SEarch package for Python}},
 year 	= 2019,
 howpublished = {\url{https://github.com/gkhayes/mlrose}},
 note 	= {Accessed: day month year}
}

mlrose's People

Contributors

bspivey, christopherbilg, cooknl, davideasaf, domfrecent, gkhayes, jfs42, mjschock, nibelungvalesti, parkds, vpipkt

mlrose's Issues

Randomized Hill Climbing improvement

According to referenced "Clever Algorithms: Nature-Inspired Programming Recipes":

neighbors with better or equal cost should be accepted, allowing the technique to navigate across plateaus in the response surface

I suggest a change in the code like this:

while (attempts < max_attempts) and (iters < max_iters):
    iters += 1
    attempts += 1

    # Find random neighbor and evaluate fitness
    next_state = problem.random_neighbor()
    next_fitness = problem.eval_fitness(next_state)

    improvement = next_fitness - problem.get_fitness()

    # If the neighbor is better or equal move to that state
    if improvement >= 0:
        problem.set_state(next_state)
        # if better than reset attempts counter
        if improvement > 0: 
            attempts = 0

I personally work with problems where this small change has a big impact on results.
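To illustrate why accepting equal-fitness neighbors matters, here is a minimal, self-contained sketch. It does not use mlrose: the `fitness` function, the bit-flip neighborhood and the `accept_equal` flag are all assumptions made up for this example. The toy fitness only increases with every third 1-bit, so most single-bit flips land on a plateau.

```python
import random

def fitness(state):
    # Toy plateau landscape (hypothetical): fitness rises only with every third 1-bit.
    return sum(state) // 3

def random_hill_climb(n_bits, max_attempts, max_iters, accept_equal, seed):
    rng = random.Random(seed)
    state = [0] * n_bits
    current = fitness(state)
    attempts = iters = 0
    while attempts < max_attempts and iters < max_iters:
        iters += 1
        attempts += 1
        # Random neighbor: flip a single randomly chosen bit
        neighbor = state[:]
        i = rng.randrange(n_bits)
        neighbor[i] = 1 - neighbor[i]
        improvement = fitness(neighbor) - current
        # Strict rule accepts only improvements; relaxed rule also accepts ties
        if improvement > 0 or (accept_equal and improvement == 0):
            state = neighbor
            current += improvement
            if improvement > 0:
                attempts = 0  # reset only on a strict improvement
    return current

strict = random_hill_climb(12, 50, 5000, accept_equal=False, seed=0)
relaxed = random_hill_climb(12, 50, 5000, accept_equal=True, seed=0)
print(strict, relaxed)
```

With the strict rule the search never leaves the all-zero start, because no single flip changes the fitness. With ties accepted it can drift across the plateau until a flip pays off.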

BUG in eval_mate_probs()

The fitness values of a population passed to OptProb.eval_mate_probs() can contain both positive and negative values, so their sum may be positive, negative, or even zero. We therefore have to be more careful to avoid the negative probabilities that arise when the sum of the fitness values has a different sign than the fitness of an individual member.

    # Set -1*inf values to 0 to avoid dividing by sum of infinity.
    # This forces mate_probs for these pop members to 0.
    pop_fitness[pop_fitness == -1.0*np.inf] = 0

    # Fitness of the population might contain positive AND negative values.
    # Thus its sum can be positive, negative or even accidentally zero.

    # All fitness values are zero
    if np.count_nonzero(pop_fitness) == 0:
        self.mate_probs = np.ones(len(pop_fitness)) \
                          / len(pop_fitness)
    else:
        # Make all values positive, the smallest will be offset by the sdev of the data
        pop_fitness += (pop_fitness.min() + pop_fitness.std())
        self.mate_probs = pop_fitness/np.sum(pop_fitness)
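One caveat with the proposal above: as written, adding `pop_fitness.min()` appears to shift the values further down when the minimum is negative, so subtracting the minimum may be what was intended. The following is a minimal pure-Python sketch of that shift-based idea, not mlrose code. The function name `mate_probs` and the handling of ties are assumptions for this example, and it assumes every fitness value is either finite or negative infinity.

```python
import math

def mate_probs(pop_fitness):
    """Selection probabilities that stay non-negative for mixed-sign fitness."""
    finite = [f for f in pop_fitness if f != -math.inf]
    if not finite:
        # Nothing usable: fall back to a uniform distribution
        return [1.0 / len(pop_fitness)] * len(pop_fitness)
    lo = min(finite)
    mean = sum(finite) / len(finite)
    std = math.sqrt(sum((f - mean) ** 2 for f in finite) / len(finite))
    if std == 0.0:
        # All usable members are equally fit: share probability among them
        return [1.0 / len(finite) if f != -math.inf else 0.0 for f in pop_fitness]
    # Shift so the worst usable member gets weight `std`; -inf members get 0
    weights = [(f - lo + std) if f != -math.inf else 0.0 for f in pop_fitness]
    total = sum(weights)
    return [w / total for w in weights]

print(mate_probs([-0.1, -10.0]))
print(mate_probs([3.0, -1.0, -math.inf]))
```

Both calls give the fitter member the larger share, and the member with negative infinite fitness is never selected.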

RHC using wrong get_fitness?

I think the fitness_curve.append in line 210 of algorithms.py is using the incorrect get_fitness (it's using get_pop_fitness)?
When running an RHC, I get a blank array back for the fitness curve.

TypeError during Fitting

I can't fit my data (it works on sklearn though).
I keep getting this

AttributeError: 'float' object has no attribute 'exp'
The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/home/negm/PycharmProjects/testa2/a2_1.py", line 47, in <module>
nn_model.fit(x_train, y_train)
File "/home/negm/PycharmProjects/testa2/venv/lib/python3.6/site-packages/mlrose/neural.py", line 528, in fit
restarts=0, init_state=init_weights)
File "/home/negm/PycharmProjects/testa2/venv/lib/python3.6/site-packages/mlrose/algorithms.py", line 188, in random_hill_climb
problem.set_state(init_state)
File "/home/negm/PycharmProjects/testa2/venv/lib/python3.6/site-packages/mlrose/opt_probs.py", line 224, in set_state
self.fitness = self.eval_fitness(self.state)
File "/home/negm/PycharmProjects/testa2/venv/lib/python3.6/site-packages/mlrose/opt_probs.py", line 98, in eval_fitness
fitness = self.maximize*self.fitness_fn.evaluate(state)
File "/home/negm/PycharmProjects/testa2/venv/lib/python3.6/site-packages/mlrose/neural.py", line 290, in evaluate
self.y_pred = self.output_activation(outputs)
File "/home/negm/PycharmProjects/testa2/venv/lib/python3.6/site-packages/mlrose/activation.py", line 77, in sigmoid
fx = 1/(1 + np.exp(-x))
TypeError: loop of ufunc does not support argument 0 of type float which has no callable exp method

The weird thing is that I tried different activation functions, but the error always maps back to 'sigmoid'!

GA for NN incorrect

"This problem is because the probability of selecting a parent was determined by:

mate_probs = pop_fitness / np.sum(pop_fitness)

This will result in low numbers if pop_fitness is close to 0 and high numbers if pop_fitness is far from zero, regardless of sign. Let's say there are 2 people in the population with fitness -0.1 (good) and -10 (bad). The mate_probs would be -0.1 / -10.1 = ~0.01 for the good person and -10 / -10.1 = ~0.99 for the bad person. This means we are more likely to mate people with poor fitness...bad!" (taken from a Piazza post in a class by the person who found it)

It's corrected in this fork which has basically redesigned/refactored most of the library with additional functionality
https://github.com/hiive/mlrose
specifically in this file
https://github.com/hiive/mlrose/blob/master/mlrose/opt_probs/_opt_prob.py

simulated_annealing neighbor selection

In looking at simulated_annealing, the problem method random_neighbor() is called, which for ContinuousOpt objects computes the neighbors based on the step size, and only considers those neighbors (from what I can tell) within a step of the current state. I know it is common to open the "neighbors" to include most/all of the population (perhaps some multiple of step size * a normal probability, or the temperature is commonly used as the step size also), this way the algorithm can random walk at first (while temperature is high), and then settle in on only higher points as the temperature decreases. I think of simulated annealing as almost a random restart at every iteration, with decreasing likelihood of jumping to a lower point as the temperature decreases.

EDIT: after further research, it seems like your approach is a valid one, especially as talked about in the CS7641 lectures. I stumbled on this link, which ended up helping my case, and is a slightly different implementation of simulated annealing : https://www.mathworks.com/help/gads/how-simulated-annealing-works.html
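The behaviour described here (random walking while the temperature is high, then settling as it drops) comes from the acceptance rule. Below is a minimal sketch of the textbook Metropolis-style rule for a maximization problem with a geometric cooling schedule. It is not mlrose's implementation; the function names and the default constants are assumptions for this example.

```python
import math

def accept_prob(delta, temperature):
    """Probability of moving to a neighbor whose fitness differs by `delta`."""
    if delta >= 0:
        return 1.0  # better or equal neighbors are always accepted
    return math.exp(delta / temperature)  # worse neighbors: less likely as T falls

def geometric_temperature(t, init_temp=1.0, decay=0.99, min_temp=0.001):
    """Temperature after t iterations of geometric decay, floored at min_temp."""
    return max(init_temp * decay ** t, min_temp)

early = accept_prob(-0.5, geometric_temperature(0))    # hot: worse moves are common
late = accept_prob(-0.5, geometric_temperature(500))   # cold: worse moves are rare
print(early, late)
```

The same downhill step of 0.5 is accepted more than half the time at the start and almost never after 500 iterations.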

Six Peaks Opt

Code being referenced:

mlrose/mlrose/fitness.py

Lines 393 to 408 in 71ee20a

# Calculate head and tail values
head_0 = head(0, state)
tail_0 = tail(0, state)
head_1 = head(1, state)
tail_1 = tail(1, state)
# Calculate R(X, T)
if (tail_0 > _t and head_1 > _t) or (tail_1 > _t and head_0 > _t):
    _r = _n
else:
    _r = 0
# Evaluate function
fitness = max(tail_0, head_1) + _r
return fitness

I'm reading through https://papers.nips.cc/paper/1328-mimic-finding-optima-by-estimating-probability-densities.pdf to understand the implementation of this algorithm, and I'm finding that Six Peaks does not give 4 optima as expected.

The reason is that the evaluation always uses
fitness = max(tail_0, head_1) + _r on line 406

and NOT the tail or head that actually fulfilled the OR conditional for R(X,T).

In the image below, I run a Four Peaks and Six Peaks fitness function analysis and have circled the peaks that should match the other max optimum, based on what I presume to be the correct logic from the above paper.

image
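The difference can be checked on two mirrored states. This is a self-contained sketch: `head` and `tail` are re-implemented here as simple helpers, `six_peaks_current` follows the quoted snippet, and `six_peaks_symmetric` is one possible reading of the change this issue asks for (an assumption, not code from mlrose or the paper).

```python
def head(bit, state):
    """Number of leading elements equal to `bit`."""
    count = 0
    for value in state:
        if value != bit:
            break
        count += 1
    return count

def tail(bit, state):
    """Number of trailing elements equal to `bit`."""
    return head(bit, state[::-1])

def six_peaks_current(state, t):
    # Mirrors the quoted snippet: the max term ignores tail_1 and head_0
    n = len(state)
    h0, t0, h1, t1 = head(0, state), tail(0, state), head(1, state), tail(1, state)
    r = n if (t0 > t and h1 > t) or (t1 > t and h0 > t) else 0
    return max(t0, h1) + r

def six_peaks_symmetric(state, t):
    # Hypothetical variant: score with the pair that satisfied the condition
    n = len(state)
    h0, t0, h1, t1 = head(0, state), tail(0, state), head(1, state), tail(1, state)
    if t0 > t and h1 > t:
        return max(t0, h1) + n
    if t1 > t and h0 > t:
        return max(t1, h0) + n
    return max(t0, h1)

ones_then_zeros = [1] * 7 + [0] * 3
zeros_then_ones = [0] * 7 + [1] * 3
print(six_peaks_current(ones_then_zeros, 2), six_peaks_current(zeros_then_ones, 2))
print(six_peaks_symmetric(ones_then_zeros, 2), six_peaks_symmetric(zeros_then_ones, 2))
```

Under the quoted logic the mirrored state earns the bonus but scores lower overall, while the symmetric variant gives both states the same fitness.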

Hill Climb Curve?

I was looking at the code, and it looks like the curve array only gets updated once per restart. I'd expect it to update once per iteration.

Is that on purpose or a bug?

Code snippet below for context:

for _ in range(restarts + 1):
    ...

    while iters < max_iters:
        ...

    ...
    if curve:
        fitness_curve.append(problem.get_fitness())
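For comparison, here is a minimal sketch of per-iteration recording, with the append moved inside the inner loop. It is a toy greedy hill climb on One Max written for this example, not mlrose's `hill_climb`; the function name and arguments are assumptions.

```python
import random

def hill_climb_with_curve(init_state, max_iters, restarts, seed):
    """Toy greedy hill climb on One Max that logs fitness on every iteration."""
    rng = random.Random(seed)
    n_bits = len(init_state)
    fitness_curve = []
    best_fitness = -1
    for run in range(restarts + 1):
        # First run starts from init_state, later runs from a random state
        if run == 0:
            state = list(init_state)
        else:
            state = [rng.randint(0, 1) for _ in range(n_bits)]
        iters = 0
        while iters < max_iters:
            iters += 1
            if 0 not in state:
                break  # no improving neighbor left
            state[state.index(0)] = 1  # any 0 -> 1 flip improves One Max by one
            # Record inside the loop: one entry per iteration, not per restart
            fitness_curve.append(sum(state))
        best_fitness = max(best_fitness, sum(state))
    return best_fitness, fitness_curve

best_fitness, fitness_curve = hill_climb_with_curve([0] * 8, max_iters=100, restarts=2, seed=1)
print(best_fitness, len(fitness_curve))
```

With the append outside the `while` loop, the curve would hold only `restarts + 1` entries regardless of how many iterations ran.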

get param set param error

There is an issue in the BaseNeuralNetwork class where the 'set_params' and 'get_params' functions do not get and set all of the parameters. This causes problems when using basic sklearn functions such as 'cross_validate'.

Edit: Looks like this may be fixed in PR #55

To fix this, update the following functions like this:

    def get_params(self, deep=False):
        """Get parameters for this estimator.

        Returns
        -------
        params : dictionary
            Parameter names mapped to their values.
        """
        params = {'activation': self.activation,
                  'algorithm': self.algorithm,
                  'hidden_nodes': self.hidden_nodes,
                  'max_iters': self.max_iters,
                  'bias': self.bias,
                  'is_classifier': self.is_classifier,
                  'learning_rate': self.learning_rate,
                  'early_stopping': self.early_stopping,
                  'clip_max': self.clip_max,
                  'restarts': self.restarts,
                  'schedule': self.schedule,
                  'pop_size': self.pop_size,
                  'mutation_prob': self.mutation_prob,
                  'max_attempts': self.max_attempts,
                  'random_state': self.random_state,
                  'curve': self.curve}

        return params

    def set_params(self, **in_params):
        """Set the parameters of this estimator.

        Parameters
        -------
        in_params: dictionary
            Dictionary of parameters to be set and the value to be set to.
        """
        if 'hidden_nodes' in in_params.keys():
            self.hidden_nodes = in_params['hidden_nodes']
        if 'max_iters' in in_params.keys():
            self.max_iters = in_params['max_iters']
        if 'bias' in in_params.keys():
            self.bias = in_params['bias']
        if 'is_classifier' in in_params.keys():
            self.is_classifier = in_params['is_classifier']
        if 'learning_rate' in in_params.keys():
            self.learning_rate = in_params['learning_rate']
        if 'early_stopping' in in_params.keys():
            self.early_stopping = in_params['early_stopping']
        if 'clip_max' in in_params.keys():
            self.clip_max = in_params['clip_max']
        if 'restarts' in in_params.keys():
            self.restarts = in_params['restarts']
        if 'schedule' in in_params.keys():
            self.schedule = in_params['schedule']
        if 'pop_size' in in_params.keys():
            self.pop_size = in_params['pop_size']
        if 'mutation_prob' in in_params.keys():
            self.mutation_prob = in_params['mutation_prob']
        if 'activation' in in_params.keys():
            self.activation = in_params['activation']
        if 'algorithm' in in_params.keys():
            self.algorithm = in_params['algorithm']
        if 'max_attempts' in in_params.keys():
            self.max_attempts = in_params['max_attempts']
        if 'random_state' in in_params.keys():
            self.random_state = in_params['random_state']
        if 'curve' in in_params.keys():
            self.curve = in_params['curve']
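The long chain of `if` statements can also be written generically. The following is a sketch of that pattern only: `ParamsMixin`, `_param_names` and `TinyModel` are hypothetical names invented for this example and are not part of mlrose or scikit-learn. It accepts the `deep` keyword and returns `self` from `set_params`, as scikit-learn estimators do.

```python
class ParamsMixin:
    """Hypothetical helper: derive get_params/set_params from a list of names."""
    _param_names = ()

    def get_params(self, deep=False):
        # `deep` is accepted for compatibility with callers that pass it
        return {name: getattr(self, name) for name in self._param_names}

    def set_params(self, **in_params):
        for name, value in in_params.items():
            if name not in self._param_names:
                raise ValueError(f"Invalid parameter {name!r}")
            setattr(self, name, value)
        return self  # allows chaining, e.g. model.set_params(...).get_params()

class TinyModel(ParamsMixin):
    _param_names = ("hidden_nodes", "learning_rate", "max_iters")

    def __init__(self, hidden_nodes=None, learning_rate=0.1, max_iters=100):
        self.hidden_nodes = hidden_nodes
        self.learning_rate = learning_rate
        self.max_iters = max_iters

model = TinyModel(hidden_nodes=[5])
params = model.set_params(learning_rate=0.5).get_params(deep=True)
print(params)
```

Keeping the names in one tuple means a new constructor argument only has to be listed once for both methods to pick it up.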

Bias for hidden layers

It appears that the neural network in mlrose only adds the bias node to the input layer, but not the hidden layers. This is different from what I've seen elsewhere, for example sklearn.

I have a branch that adds this bias to the hidden layers too, and this works well enough for me. Is this something desirable for mlrose? If so I'll try to fix the tests and submit a PR.
https://github.com/phunehehe/mlrose/tree/bias

TSPOpt() - genetic_alg - Malfunction in max_attempts

I am using TSPOpt() to solve a simple TSP example with the following code:
best_state, best_fitness, fittness_curve = mlrose.genetic_alg(problem_fit2, mutation_prob = 0.2, max_attempts = max_attempt, random_state = 2, curve=True)
I iterate max_attempt from 1 to 50 and get the following sizes of fittness_curve:
Output:
Max attempts = 1; size of fittness curve = 2
Max attempts = 2; size of fittness curve = 3
Max attempts = 3; size of fittness curve = 4
Max attempts = 4; size of fittness curve = 5
Max attempts = 5; size of fittness curve = 6
Max attempts = 6; size of fittness curve = 7
Max attempts = 7; size of fittness curve = 15
Max attempts = 8; size of fittness curve = 16
Max attempts = 9; size of fittness curve = 17
Max attempts = 10; size of fittness curve = 18
Max attempts = 11; size of fittness curve = 32
Max attempts = 12; size of fittness curve = 33
Max attempts = 13; size of fittness curve = 34
Max attempts = 14; size of fittness curve = 35
Max attempts = 15; size of fittness curve = 36
Max attempts = 16; size of fittness curve = 37
Max attempts = 17; size of fittness curve = 38
Max attempts = 18; size of fittness curve = 39
Max attempts = 19; size of fittness curve = 40
Max attempts = 20; size of fittness curve = 41

It seems that the library makes more attempts than the value passed as a parameter, and the jumps occur at values 7 and 11.

Please help me understand this behaviour.

tsp_mlrose-distance.py.pdf
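One possible explanation, offered as an assumption rather than a confirmed diagnosis: if the attempts counter counts consecutive iterations without improvement and resets to zero whenever the fitness improves, the total number of iterations (and so the curve length) can exceed max_attempts. The sketch below simulates only that counter logic; `iterations_until_stop` and the list of improvement flags are made up for this example.

```python
def iterations_until_stop(improved_flags, max_attempts):
    """Count loop iterations when the attempts counter resets on improvement."""
    attempts = 0
    iters = 0
    for improved in improved_flags:
        if attempts >= max_attempts:
            break
        iters += 1
        if improved:
            attempts = 0   # an improvement restarts the count
        else:
            attempts += 1  # another attempt without improvement
    return iters

no_luck = iterations_until_stop([False] * 20, max_attempts=3)
some_luck = iterations_until_stop([True, False, True] + [False] * 20, max_attempts=3)
print(no_luck, some_luck)
```

Under this reading, a larger max_attempts gives the search more chances to find an improvement that resets the counter, which would produce jumps in the curve length.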

ImportError: cannot import name 'six' from 'sklearn.externals'

Environment:

  • OS: WSL Ubuntu
  • Python 3.8
  • Dependencies installed:
    • mlrose 1.3.0
    • numpy 1.19.0
    • scipy 1.5.1
    • six 1.14.0
    • sklearn 0.0

When importing mlrose, receive error:

File "/lib/python3.8/site-packages/mlrose/__init__.py", line 12, in <module>
    from .neural import NeuralNetwork, LinearRegression, LogisticRegression
File "/lib/python3.8/site-packages/mlrose/neural.py", line 12, in <module>
    from sklearn.externals import six
ImportError: cannot import name 'six' from 'sklearn.externals' (/lib/python3.8/site-packages/sklearn/externals/__init__.py)

Also see stackoverflow question: https://stackoverflow.com/q/61867945/1164465

GA, TSP, and MaxK Color

I've run several experiments with the optimization problems. Unexpectedly, GA doesn't perform best on TSP as expected (in fact rhc was best followed closely by sa). Also, for MaxKColor, MIMIC should perform best but it's only slightly better than GA while RHC is far better and with SA closely behind again. Is it possible there's a bug in the fitness determination or am I setting up the problems incorrectly?

https://github.com/tmsuidan/cs7641-assignment2/tree/master/MaxkColor

https://github.com/tmsuidan/cs7641-assignment2/tree/master/tsp

GA probability of selecting parents incorrect

Hi,

the probability of selecting parents is calculated by (https://github.com/gkhayes/mlrose/blob/master/mlrose/opt_probs.py eval_mate_probs function):

self.mate_probs = pop_fitness/np.sum(pop_fitness)

This results in high probabilities for bad individuals and low probabilities for good individuals for minimization problems.
Example: Population with 2 individuals, individual_1 fitness -1 and individual_2 fitness -2. The probability of selecting individual_1 (the good one) is 1/3 and probability of selecting individual_2 (the bad one) is 2/3. It is more likely to select individuals with poor fitness.

A quick fix could be the use of the reciprocal value for minimization problems:

if self.maximize == 1:
    self.mate_probs = pop_fitness / np.sum(pop_fitness)
else:
    self.mate_probs = 1/pop_fitness / np.sum(1/pop_fitness)
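The arithmetic of the example and of the proposed quick fix can be checked with a few lines of plain Python. The helper name `proportional` is an assumption for this sketch.

```python
def proportional(values):
    """Normalize values so they sum to 1."""
    total = sum(values)
    return [v / total for v in values]

pop_fitness = [-1.0, -2.0]  # individual_1 (good) and individual_2 (bad)

naive = proportional(pop_fitness)                           # favours the bad one
reciprocal = proportional([1.0 / f for f in pop_fitness])   # favours the good one
print(naive, reciprocal)
```

The reciprocal swaps the two probabilities for this all-negative population. It still divides by zero when a fitness value is 0, and a population with mixed signs can still produce negative values (for example, fitness values 1 and -2 give 2 and -1), so it is only a partial fix.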

how would one indicate a start location

Hello,

Looking at the TSPOpt
I am able to generate a route, but would like to make one of the places "first" in the final route I find.
How might I go about extending this code to support that?
Any hints would be great!

Thanks,
Evan
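One simple approach, assuming the returned state is a permutation of city indices that is travelled as a closed loop: rotate the tour so the desired city comes first. Rotation keeps the visiting order, so the tour length does not change. The helper names and the distance matrix below are made up for this sketch.

```python
def rotate_tour(tour, start):
    """Return the same cyclic tour, re-listed so that `start` comes first."""
    tour = list(tour)
    i = tour.index(start)
    return tour[i:] + tour[:i]

def tour_length(tour, dist):
    """Length of the closed tour, including the leg back to the first city."""
    return sum(dist[tour[k]][tour[(k + 1) % len(tour)]] for k in range(len(tour)))

# Hypothetical symmetric distance matrix for four cities
dist = [[0, 2, 9, 10],
        [2, 0, 6, 4],
        [9, 6, 0, 8],
        [10, 4, 8, 0]]

best_state = [2, 0, 3, 1]             # e.g. the tour returned by an optimizer
from_city_3 = rotate_tour(best_state, 3)
print(from_city_3, tour_length(best_state, dist), tour_length(from_city_3, dist))
```

This only relabels the starting point of a tour that has already been found; it does not constrain the search itself.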

'genetic_alg' does not work for multiple layers

From the tutorial, revised to use hidden_nodes = [5,5,5,5,5]:

np.random.seed(3)

nn_model1 = mlrose.NeuralNetwork(hidden_nodes = [5,5,5,5,5], activation ='relu',
                                 algorithm ='genetic_alg',
                                 max_iters = 1000, bias = True, is_classifier = True,
                                 learning_rate = 0.0001, early_stopping = True,
                                 clip_max = 5, max_attempts = 100)

nn_model1.fit(X_train_scaled, y_train_hot)

Error message:
ValueError: Input contains NaN, infinity or a value too large for dtype('float64').

Fast MIMIC Fails TSPOpt

When running a TSPOpt problem with custom distances, MIMIC fails only when fast_mimic=True.

Code that works:

best_state, best_fitness, fit_curve = mlrose.mimic(problem_cust, max_attempts=100, max_iters=1000, random_state=42,
                                                   curve=True)

Code that fails:

best_state, best_fitness, fit_curve = mlrose.mimic(problem_cust, max_attempts=100, max_iters=1000, random_state=42,
                                                   fast_mimic=True, curve=True)

It is specifically failing in opt_probs.py on line 1010. The issue is that remaining is empty and it's trying to pull a random choice from an empty array.

unable to set state vector lower bound?

Hi, thanks for this package, but I can't find a way to set a lower bound on the state vector. Is this possible?

I am trying to fit a nonlinear curve to extract parameters, but simulated annealing drops the state vector straight to 0 in a continuous optimization problem. Thanks.

potential bug for problem.reset() not clearing fevals when running rhc algorithm with multiple restarts

Hey, thank you for creating this awesome package. I was using a fork hiive/mlrose and I seem to run into an issue that reset does not clear function evaluations properly when multiple restarts are specified for the rhc algorithm. And apologies if this might not be the proper place raising this, and let me know if there are other places I should raise this issue.

Issue Description:
I found problem.reset() not clearing fevals when running rhc algorithm.

Problem Setup:
I was using FlipFlop fitness function, DiscreteOpt for the problem, and rhc for the algorithm.

The way I call rhc is as follows. I was trying to run the same parameters multiple times with different seeds to reduce variance, and I call the reset function each time, hoping to clear the state.

opt.reset()
mlrose.random_hill_climb(opt, max_attempts=100, max_iters=3000, restarts=restart, init_state=None, curve=True, random_state=seed)

When I input restarts = 2, for example, I find that the second fevals array is not cleared. My fevals output is below:

The seed is  aaa (dummy just for illustration)
[  1.   3.   4.   5.   6.   7.   8.   9.  10.  11.  12.  13.  14.  15.
  16.  17.  18.  19.  20.  21.  22.  24.  26.  27.  28.  29.  30.  31.
  32.  33.  34.  36.  37.  38.  39.  40.  41.  42.  43.  44.  45.  46.
  47.  48.  49.  50.  51.  52.  53.  54.  55.  56.  57.  58.  59.  60.
  61.  62.  63.  64.  65.  66.  67.  68.  69.  70.  71.  72.  73.  74.
  75.  76.  77.  78.  79.  80.  81.  82.  83.  84.  85.  86.  87.  88.
  89.  90.  91.  92.  93.  94.  95.  96.  97.  98.  99. 100. 101. 102.
 103. 104. 105. 106. 107. 108. 109. 110. 111. 112. 113. 114. 115. 116.
 117. 118. 119. 120. 121. 122. 123. 124. 125. 126. 127. 128. 129. 130.
 131. 132. 133. 134. 135. 136.]
The seed is  bbb
[315. 317. 319. 320. 321. 322. 324. 325. 326. 327. 328. 329. 330. 331.
 332. 333. 334. 335. 336. 337. 338. 339. 340. 342. 343. 344. 345. 346.
 347. 348. 349. 350. 351. 352. 353. 354. 355. 356. 357. 358. 359. 360.
 362. 363. 364. 365. 366. 367. 368. 369. 370. 371. 372. 373. 374. 375.
 376. 377. 378. 379. 380. 381. 382. 383. 384. 385. 386. 387. 388. 389.
 390. 391. 392. 393. 394. 395. 396. 397. 398. 399. 400. 401. 402. 403.
 404. 405. 406. 407. 408. 409. 410. 411. 412. 413. 414. 415. 416. 417.
 418. 419. 420. 421. 422. 423. 424. 425. 426. 427. 428. 429. 430. 431.
 432. 433. 434. 435. 436. 437. 438. 439. 440. 441. 442. 443. 444. 445.
 446. 447. 448. 449. 450. 451. 452. 453. 454. 455. 456. 457. 458. 459.
 460. 461. 462.]
The seed is  ccc, this got cleared 
[  1.   3.   4.   5.   6.   8.  10.  11.  13.  14.  15.  16.  17.  18.
  19.  20.  21.  23.  24.  25.  26.  27.  28.  29.  30.  31.  32.  33.
  34.  35.  36.  37.  38.  39.  40.  41.  42.  43.  44.  45.  46.  47.
  48.  49.  50.  51.  52.  53.  54.  55.  56.  57.  58.  59.  60.  61.
  62.  63.  64.  65.  66.  67.  68.  69.  70.  71.  72.  73.  74.  75.
  76.  77.  78.  79.  80.  81.  82.  83.  84.  85.  86.  87.  88.  89.
  90.  91.  92.  93.  94.  95.  96.  97.  98.  99. 100. 101. 102. 103.
 104. 105. 106. 107. 108. 109. 110. 111. 112. 113. 114. 115. 116. 117.
 118. 119. 120. 121. 122. 123.]
The seed is  49
[229. 230. 232. 233. 235. 236. 237. 238. 239. 240. 241. 242. 243. 244.
 245. 246. 247. 248. 249. 250. 252. 253. 254. 255. 256. 257. 258. 259.
 260. 261. 262. 263. 264. 265. 266. 267. 268. 269. 270. 271. 272. 273.
 274. 275. 276. 277. 278. 279. 280. 281. 282. 283. 284. 285. 286. 287.
 288. 289. 290. 291. 292. 293. 294. 295. 296. 297. 298. 299. 300. 301.
 302. 303. 304. 305. 306. 307. 308. 309. 310. 311. 312. 313. 314. 315.
 316. 317. 318. 319. 320. 321. 322. 323. 324. 325. 326. 327. 328. 329.
 330. 331. 332. 333. 334. 335. 336. 337. 338. 339. 340. 341. 342. 343.
 344. 345. 346. 347. 348. 349. 350. 351. 352.]
The seed is  66
[263. 265. 267. 268. 269. 270. 271. 272. 273. 274. 275. 276. 277. 278.
 279. 281. 282. 284. 285. 286. 287. 288. 289. 290. 291. 292. 293. 294.
 295. 296. 297. 298. 299. 300. 301. 302. 303. 304. 305. 306. 307. 308.
 309. 310. 311. 312. 313. 314. 315. 316. 317. 318. 319. 320. 321. 322.
 323. 324. 325. 326. 327. 328. 329. 330. 331. 332. 333. 334. 335. 336.
 337. 338. 339. 340. 341. 342. 343. 344. 345. 346. 347. 348. 349. 350.
 351. 352. 353. 354. 355. 356. 357. 358. 359. 360. 361. 362. 363. 364.
 365. 366. 367. 368. 369. 370. 371. 372. 373. 374. 375. 376. 377. 378.
 379. 380. 381. 382. 383. 384.]

Let me know if there is more information I could provide and once again, thank you for this awesome package.

Bugs with Neural Network Module

Hello there, I played with mlrose today; however, when I ran the default tutorial_examples.ipynb, some bugs popped up. In example 6, I got the following traceback:

---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
~/.pyenv/versions/3.7.1/envs/py3-mlrose/lib/python3.7/site-packages/IPython/core/formatters.py in __call__(self, obj)
700 type_pprinters=self.type_printers,
701 deferred_pprinters=self.deferred_printers)
--> 702 printer.pretty(obj)
703 printer.flush()
704 return stream.getvalue()

~/.pyenv/versions/3.7.1/envs/py3-mlrose/lib/python3.7/site-packages/IPython/lib/pretty.py in pretty(self, obj)
400 if cls is not object
401 and callable(cls.__dict__.get('__repr__')):
--> 402 return _repr_pprint(obj, self, cycle)
403
404 return _default_pprint(obj, self, cycle)

~/.pyenv/versions/3.7.1/envs/py3-mlrose/lib/python3.7/site-packages/IPython/lib/pretty.py in _repr_pprint(obj, p, cycle)
695 """A pprint that just redirects to the normal repr function."""
696 # Find newlines and replace them with p.break_()
--> 697 output = repr(obj)
698 for idx,output_line in enumerate(output.splitlines()):
699 if idx:

~/.pyenv/versions/3.7.1/envs/py3-mlrose/lib/python3.7/site-packages/sklearn/base.py in __repr__(self)
228 def __repr__(self):
229 class_name = self.__class__.__name__
--> 230 return '%s(%s)' % (class_name, _pprint(self.get_params(deep=False),
231 offset=len(class_name),),)
232

TypeError: get_params() got an unexpected keyword argument 'deep'

After some digging, i think the bug comes from the following parts:

  1. neural.py line 609: get_params(self). In scikit-learn's BaseEstimator this function accepts a deep keyword argument (get_params(self, deep=True)), so the two signatures are incompatible.
  2. neural.py line 621: 'learning_rate': self.lr. I think it should be 'learning_rate': self.learning_rate. There is no lr used in the class.
  3. neural.py line 648: self.lr = in_params['learning_rate']. It should be self.learning_rate = in_params['learning_rate'].

Since I'm new to ML, I'm not sure about my findings. If I am wrong, please close this issue.

genetic_alg with size of 1 crashes

It probably isn't very meaningful, but if the problem has a length of 1 when using the genetic algorithm, then the reproduce step crashes.

fitness = mlrose.OneMax()
problem = mlrose.DiscreteOpt(length = 1, fitness_fn = fitness, maximize = True, max_val = 2)
mlrose.genetic_alg(problem)

Then you get a crash :(

bug for mutation_probability in genetic algorithms

I believe the inequality sign should be flipped in the following code.
A 10% mutation probability should mean that children are mutated 10% of the time. With the current code, a 10% setting means that 90% of children will get mutated.

def mutate(self, child, mutation_probability):
    if np.random.rand() > mutation_probability:
        # do swap mutation
        m1 = np.random.randint(len(child))
        m2 = np.random.randint(len(child))
        tmp = child[m1]
        child[m1] = child[m2]
        child[m2] = tmp
    return child
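The effect of the comparison direction can be checked empirically. This is a standalone sketch using the standard library's `random` module instead of NumPy; the function signature (an explicit `rng` argument and a returned flag saying whether a swap was attempted) is an assumption made so the rate can be measured.

```python
import random

def mutate(child, mutation_probability, rng):
    """Swap two random positions with probability `mutation_probability`."""
    child = list(child)
    if rng.random() < mutation_probability:  # flipped comparison
        m1 = rng.randrange(len(child))
        m2 = rng.randrange(len(child))
        child[m1], child[m2] = child[m2], child[m1]
        return child, True
    return child, False

rng = random.Random(0)
trials = 10_000
mutated = sum(mutate([0, 1, 2, 3], 0.1, rng)[1] for _ in range(trials))
rate = mutated / trials
print(rate)
```

With `<`, the measured rate lands close to the requested 10%; with `>`, the same experiment would mutate roughly 90% of children.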

Infinite loop in find_sample_order()?

Running mimic with {'pop_size': 50, 'keep_pct': 0.2, 'max_attempts': 10, 'max_iters': 100} for a DiscreteOpt(n, oneMax, max_val=2) problem I randomly get stuck in an infinite loop in find_sample_order(). It isn't consistently happening, but if I do a loop of n=1..50 I keep getting caught somewhere around n=10. Trying to understand what find_sample_order() should do in order to propose a fix, but not confident yet..

question

I'm curious about mlrose, whether it might be useful for 2 problems.
Problem 1: minimal disc covering problem. Find the centers for a minimal set of discs of unit radius to cover a given set of points in a plane. These centers take continuous values.

Problem 2: Find a minimal number of sets to partition a set (of points in a plane) such that the euclidean distance between each of the members within each set is greater than some value.

Cannot Install

PIP install fails. There is an issue in setup.py where it is trying to install sklearn instead of scikit-learn.

mlrose suitable for variable lengths discrete variables?

Hi,

I am looking for an alternative to a brute force optimisation approach.

My parameter set is the following. So, my problem is a multi-variate, single-objective problem. It is not related to any machine learning task.

    from sklearn.model_selection import ParameterGrid
    from multiprocessing import Pool

    var1 = 'var1'
    var2 = 'var2'
    abc = [1, 2]
    xyz = list(range(1_00_000))
    pg = [{'variant': [var1],
           'abc': abc,
           'xyz': xyz, },
          {'variant': [var2],
           'abc': abc, }]
    parameterGrid = ParameterGrid(pg)

So I have a variable number of parameters, my parameters can only take discrete values, and some parameters only exist depending on the values of other parameters.

I would like to avoid brute force, as the execution of the cost function is quite time consuming.

Would I be able to use mlrose for my problem?

(I ask this as DiscreteOpt expects fixed lengths for all parameters).

Thanks in advance!

Feature Request: Use np functionality where needed, support random seeding

Thanks for putting together this resource!
To support evaluating fitness across an ndarray of examples, some functions should use np functionality - specifically I see regular sum() used in one_max fitness evaluation, replace with np.sum(axis=-1) or the like. Not sure where else this is relevant.

It'd also be helpful if we could provide random_seed to optimization objects and algorithms, for consistent results, would require adding a few np.random.seed() lines in the respective functions.

ImportError: cannot import name 'six' from 'sklearn.externals'

When trying to follow along https://github.com/gkhayes/mlrose/blob/master/tutorial_examples.ipynb ,

I get the following:

(base) nolan@nolanDesktop:.../optimization$ python3 ./thing.py
Traceback (most recent call last):
  File "./thing.py", line 1, in <module>
    import mlrose
  File "/home/nolan/anaconda3/lib/python3.8/site-packages/mlrose/__init__.py", line 12, in <module>
    from .neural import NeuralNetwork, LinearRegression, LogisticRegression
  File "/home/nolan/anaconda3/lib/python3.8/site-packages/mlrose/neural.py", line 12, in <module>
    from sklearn.externals import six
ImportError: cannot import name 'six' from 'sklearn.externals' (/home/nolan/anaconda3/lib/python3.8/site-packages/sklearn/externals/__init__.py)

which is resolved by https://stackoverflow.com/questions/61867945/python-import-error-cannot-import-name-six-from-sklearn-externals

(this appears related to #54)

Could the library be updated such that the workaround is no longer needed?

(base) nolan@nolanDesktop:.../optimization$ pip3 show mlrose
Name: mlrose
Version: 1.3.0
Summary: MLROSe: Machine Learning, Randomized Optimization and Search
Home-page: https://github.com/gkhayes/mlrose
Author: Genevieve Hayes
Author-email: None
License: BSD
Location: /home/nolan/anaconda3/lib/python3.8/site-packages
Requires: numpy, scipy, sklearn
Required-by:

Potential Bug with Genetic Algorithm Function

While using the genetic algorithm function for a class project, I was getting poorer performance than expected, and after looking into the code to gain some insight I think I have found an issue. When the algorithm gets into the while (attempts < max_attempts) loop, the new population is bred from the original, and then the fitness of the new population is tested, which makes perfect sense. However, in order to test the fitness, the original population gets overridden, and even when the new fitness is worse than the original the population is never reverted back and the next attempt is based off of the population that performed worse.

Here is the code (algorithms.py, inside genetic_alg function):

    next_gen = np.array(next_gen)
    problem.set_population(next_gen)  # <---- Population gets changed

    next_state = problem.best_child()
    next_fitness = problem.eval_fitness(next_state) 

    # If best child is an improvement,
    # move to that state and reset attempts counter
    if next_fitness > problem.get_fitness():
        problem.set_state(next_state)
        attempts = 0

    else:  # <---- If fitness is worse, there is no revert back to original population
        attempts += 1

Maybe this implementation was intentional, but this doesn't seem like ideal functionality. If this was in fact done intentionally, and someone knows the reasoning, please let me know. It's got me curious.
