michaelhush / m-loop

M-LOOP: Machine-learning online optimization package

Home Page: http://m-loop.readthedocs.io/en/latest/

License: MIT License

Python 100.00%

m-loop's Issues

A typo in controllers.py

Hi there,
Thanks for your work.

In controllers.py, I found a typo on line 475; it should be self.max_boundary.

Newer Tensorflow Compatibility for neural_net Controller

Hello,

I'm using M-LOOP with my experiment and so far I've had great results with the gaussian_learner.

I have run into a few issues with the neural_net controller, though. It looks like neuralnet.py uses some deprecated TensorFlow features. In particular, when attempting to optimize with it using TensorFlow 2.0.0, I get the following error:

Process NeuralNetLearner-1:
Traceback (most recent call last):
  File "C:\Users\user_name\.conda\envs\labscript\lib\multiprocessing\process.py", line 297, in _bootstrap
    self.run()
  File "C:\Users\user_name\.conda\envs\labscript\lib\site-packages\mloop\learners.py", line 1869, in run
    self.create_neural_net()
  File "C:\Users\user_name\.conda\envs\labscript\lib\site-packages\mloop\learners.py", line 1619, in create_neural_net
    n.init()
  File "C:\Users\user_name\.conda\envs\labscript\lib\site-packages\mloop\neuralnet.py", line 493, in init
    self.net = self._make_net(self.last_net_reg)
  File "C:\Users\user_name\.conda\envs\labscript\lib\site-packages\mloop\neuralnet.py", line 431, in _make_net
    return SampledNeuralNet(creator, 1)
  File "C:\Users\user_name\.conda\envs\labscript\lib\site-packages\mloop\neuralnet.py", line 298, in __init__
    self.nets = [self.net_creator() for _ in range(count)]
  File "C:\Users\user_name\.conda\envs\labscript\lib\site-packages\mloop\neuralnet.py", line 298, in <listcomp>
    self.nets = [self.net_creator() for _ in range(count)]
  File "C:\Users\user_name\.conda\envs\labscript\lib\site-packages\mloop\neuralnet.py", line 430, in <lambda>
    self.losses_list)
  File "C:\Users\user_name\.conda\envs\labscript\lib\site-packages\mloop\neuralnet.py", line 60, in __init__
    self.tf_session = tf.Session(graph=self.graph)
AttributeError: module 'tensorflow' has no attribute 'Session'

After some Googling, it seems that tf.Session is a deprecated way to use TensorFlow (it is no longer part of the top-level API in TensorFlow 2.0). One possible workaround is to replace import tensorflow as tf in neuralnet.py with the following:

import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()

That prevented the error and allowed the optimization to proceed, but it produced the following warnings:

WARNING:tensorflow:From C:\Users\user_name\.conda\envs\labscript\lib\site-packages\tensorflow_core\python\compat\v2_compat.py:65: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version.
Instructions for updating:
non-resource variables are not supported in the long term
2020-02-26 18:24:16.228468: I tensorflow/core/platform/cpu_feature_guard.cc:145] This TensorFlow binary is optimized with Intel(R) MKL-DNN to use the following CPU instructions in performance critical operations:  AVX
To enable them in non-MKL-DNN operations, rebuild TensorFlow with the appropriate compiler flags.
2020-02-26 18:24:16.230358: I tensorflow/core/common_runtime/process_util.cc:115] Creating new thread pool with default inter op setting: 4. Tune using inter_op_parallelism_threads for best performance.

So it looks like this workaround will likely stop working in a future version of TensorFlow. Maybe there is a better way to fix it?

For reference, I have gotten the neural_net controller to work with tensorflow 1.15.0, although it does issue several deprecation warnings.
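A version-tolerant variant of the workaround tries the TF1 compatibility module first and falls back to the plain import on older TensorFlow 1.x releases. This is a sketch, not code that exists in M-LOOP: the helper name load_tf_v1_api is hypothetical, and it assumes tf.compat.v1 and disable_v2_behavior() are available in the installed TensorFlow (as in 1.15 and 2.x). It returns None when TensorFlow is not installed, so importing it is always safe.

```python
import importlib


def load_tf_v1_api():
    """Return a module exposing the TF1-style API (tf.Session etc.), or None.

    Hypothetical helper: prefers tensorflow.compat.v1 (TensorFlow 1.15 and 2.x)
    and falls back to a plain `import tensorflow` for older 1.x releases.
    """
    try:
        tf = importlib.import_module("tensorflow.compat.v1")
    except ImportError:
        try:
            # Older TensorFlow 1.x without the compat.v1 module
            return importlib.import_module("tensorflow")
        except ImportError:
            # TensorFlow is not installed at all
            return None
    # Switch back to graph mode and sessions, which neuralnet.py relies on
    tf.disable_v2_behavior()
    return tf


tf = load_tf_v1_api()
```

This does not silence the deprecation warnings; it only keeps the session-based code importable across versions until it is ported to the TensorFlow 2 API.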

Cheers,
Zak

GP possibly overestimates uncertainty?

Describe the bug
I realized that M-LOOP seems to overestimate the uncertainty in the presence of noise, both in real experiments and on synthetic data. To illustrate this, I optimized an "experiment" in which I can easily control the level of noise. While M-LOOP fits the function quite well, the uncertainty band seems to far exceed the spread of the sampled data. Furthermore, if I fit the data "by hand" using the hyperparameters from M-LOOP, the uncertainty looks more realistic. I am worried that this will affect performance in the real experiment, so I wanted to ask whether this behavior is expected.

To Reproduce

#Imports for M-LOOP
import mloop.interfaces as mli
import mloop.controllers as mlc
import mloop.visualizations as mlv
import mloop.utilities as mlu

# sklearn imports
import sklearn.gaussian_process as skg
import sklearn.gaussian_process.kernels as skk

import numpy as np
import matplotlib.pyplot as plt

import sys
noise_level = float(sys.argv[1])

input_dict = {
    'max_num_runs' : 30, 
    'num_params' : 1, 
    'min_boundary' : [0],
    'max_boundary' : [1],
    'cost_has_noise' : True
}

with open("cost.npy", 'rb') as file:
    cost_array = np.load(file)
    params_array = np.linspace(0, 1, len(cost_array))

def cost_fct(x, xp, yp, noise):
    res = np.interp(x, xp, yp)
    res += np.random.normal(0, noise)
    return res

class CustomInterface(mli.Interface):
    
    def __init__(self):
        super(CustomInterface,self).__init__()

    def get_next_cost_dict(self,params_dict):
        
        params = params_dict['params']
        
        cost = cost_fct(params[0],  params_array,cost_array, noise_level)
        uncer = 0
        bad = False
        
        cost_dict = {'cost':cost, 'uncer':uncer, 'bad':bad}
        return cost_dict
    
def main():

    print(cost_array)
    
    interface = CustomInterface()
    controller = mlc.create_controller(interface, **input_dict)
    controller.optimize() 
    
    # visualization
    vis = mlv.GaussianProcessVisualizer(controller.ml_learner.total_archive_filename)
    vis.plot_cross_sections()

    # plotting
    plt.figure(1)
    plt.scatter(controller.out_params, controller.in_costs, label="sampled data")
    plt.plot(params_array, cost_array, '-', label="true data")
    plt.title("Landscape M-LOOP (noise = %0.3f)" % noise_level)
    plt.legend()

    # manual fit
    gp_kernel = skk.RBF(vis.length_scale)
    gp_kernel += skk.WhiteKernel(vis.noise_level)
    alpha = vis.all_uncers**2
    gaussian_process = skg.GaussianProcessRegressor(kernel=gp_kernel, n_restarts_optimizer=10)
    gaussian_process.fit(vis.all_params,vis.all_costs)
    params = np.linspace(0, 1, 100).reshape(-1, 1)
    (cost, uncer) = gaussian_process.predict(params, return_std=True)

    # plotting the manual fit (the title is set further down)
    plt.figure(2)
    plt.plot(params, cost, 'r-', label="fit")
    plt.plot(params,cost+uncer, 'r--')
    plt.plot(params, cost-uncer, 'r--')
    plt.scatter(controller.out_params, controller.in_costs, label="sampled data")
    plt.plot(params_array, cost_array, '-', label="true data")
    plt.title("Landscape M-LOOP (noise = %0.3f)" % noise_level)
    plt.xlim(0, 1)
    plt.legend()
    
    plt.show()

if __name__ == '__main__':
    main()

Expected behavior
Good fits, but very large uncertainty

OS: Linux Ubuntu 16.04, 20.04
M-LOOP 3.1.1
Python 3.5.2 & 3.8
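When comparing the two bands, it may help to check which quantity each one shows. In GP regression, the predictive variance of the underlying (latent) function and that of a new noisy measurement differ by exactly the noise variance. Note also that the manual fit above only starts from M-LOOP's length scale and noise level: scikit-learn's GaussianProcessRegressor re-optimizes kernel hyperparameters by default (here with n_restarts_optimizer=10), and the computed alpha is not passed to it. The NumPy sketch below, with a unit-amplitude RBF kernel and illustrative names (rbf, gp_predict) that are not M-LOOP or scikit-learn APIs, computes both kinds of uncertainty; which one M-LOOP plots is not established here.

```python
import numpy as np


def rbf(a, b, length_scale):
    """Unit-amplitude squared-exponential kernel between 1-D arrays a and b."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)


def gp_predict(x_train, y_train, x_test, length_scale, noise_var, include_noise):
    """Posterior mean and standard deviation of a zero-mean GP.

    include_noise=False: uncertainty of the latent function.
    include_noise=True: uncertainty of a new noisy observation.
    """
    # Covariance of the noisy training observations: K + sigma_n^2 I
    K = rbf(x_train, x_train, length_scale) + noise_var * np.eye(len(x_train))
    K_s = rbf(x_train, x_test, length_scale)
    L = np.linalg.cholesky(K)
    # alpha = (K + sigma_n^2 I)^-1 y, via two triangular solves
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    # Prior variance is 1 for this kernel; subtract what the data explain
    var = 1.0 - np.sum(v ** 2, axis=0)
    if include_noise:
        var = var + noise_var
    return mean, np.sqrt(np.clip(var, 0.0, None))


rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 20)
y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.1, x.size)
x_new = np.linspace(0.0, 1.0, 100)
mean, std_latent = gp_predict(x, y, x_new, 0.2, 0.01, include_noise=False)
_, std_observed = gp_predict(x, y, x_new, 0.2, 0.01, include_noise=True)
```

Plotting mean ± std_latent and mean ± std_observed over the sampled points shows how much of the band width comes from the noise term alone.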

Creating a learner visualizer instance creates an M-LOOP_archives directory

Describe the bug
Creating a learner visualizer instance causes M-LOOP to create a directory called M-LOOP_archives in the current working directory.

To Reproduce
Steps to reproduce the behavior:

Generate a learner archive by running an optimization, or take one from a previous optimization.
cd to a directory that does not contain a directory called M-LOOP_archives
Create a visualizer, e.g. by running the sample code below.
See that a M-LOOP_archives directory has been created in the current working directory.

import mloop.visualizations as mlv

learner_archive_filename = ''  # Set to the learner archive file, including path and extension.
learner_visualizer = mlv.create_learner_visualizer_from_archive(learner_archive_filename)

Expected behavior
The M-LOOP_archives directory should be created if necessary when an optimization is started since M-LOOP needs a place to store the new archive files for the new optimization. However, that directory should not be created when instantiating a visualizer instance to plot data from an existing learner archive as no new files will be created.

Additional context
Results from an optimization are plotted using the visualizer classes. The learner visualizer classes inherit from the learner classes themselves, so learners.Learner.__init__() is run when a learner visualizer is instantiated. That method then creates the directory M-LOOP_archives in the current working directory if it doesn't already exist. That behavior makes sense when the Learner.__init__() is being run while creating a learner to start an optimization, but ideally it shouldn't happen when just creating a visualizer.

It might be possible to fix this by passing learner_archive_filename=None to the parent __init__() methods in the visualizer classes. That should avoid the directory creation because of the if learner_archive_filename is None statement in Learner.__init__(). I haven't tried this though.
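The proposed fix can be illustrated with heavily simplified, hypothetical stand-ins for Learner and a learner visualizer (not the real M-LOOP classes, and the archive_dir parameter is invented for the sketch): the parent constructor only creates the archive directory when it is given an archive filename, and the visualizer passes learner_archive_filename=None. Like the suggestion above, this is untested against M-LOOP itself.

```python
import os


class Learner:
    """Hypothetical, heavily simplified stand-in for mloop.learners.Learner."""

    def __init__(self, archive_dir='M-LOOP_archives',
                 learner_archive_filename='learner_archive'):
        self.learner_archive_filename = learner_archive_filename
        if learner_archive_filename is None:
            # Nothing new will be saved, so leave the file system alone
            self.archive_dir = None
        else:
            # Starting a real optimization: make sure the archive directory exists
            os.makedirs(archive_dir, exist_ok=True)
            self.archive_dir = archive_dir


class LearnerVisualizer(Learner):
    """Hypothetical visualizer that only reads an existing archive."""

    def __init__(self, filename, archive_dir='M-LOOP_archives'):
        # Proposed fix: tell the parent not to set up a new archive
        super().__init__(archive_dir=archive_dir, learner_archive_filename=None)
        self.filename = filename
```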

Not sure if I'll get a chance to fix this soon, but figured I'd post a bug report about it now before I forget. It's a pretty minor and inconsequential bug anyway, though it does lead to some file system clutter.

Tests hang due to test_shell_interface_config()

Describe the bug
When running the tests, execution hangs. Pointed out in #111 (comment).

To Reproduce
Steps to reproduce the behavior:

cd to the tests directory
run pytest -v
See that the execution hangs during test_examples.py::TestExamples::test_shell_interface_config.

Desktop (please complete the following information):

OS: Windows 10
M-LOOP commit 6ca72cb

Additional context
Manually running the code from that test gives the following error:

Traceback (most recent call last):
  File "C:\Users\user_name\Software\anaconda3\envs\mloop_install_test_2\lib\threading.py", line 926, in _bootstrap_inner
    self.run()
  File "c:\users\user_name\software\m-loop\mloop\interfaces.py", line 106, in run
    cost_dict = self.get_next_cost_dict(params_dict)
  File "c:\users\user_name\software\m-loop\mloop\interfaces.py", line 287, in get_next_cost_dict
    self.param_names.append('param' + str(ind+1))
AttributeError: 'NoneType' object has no attribute 'append'
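The traceback shows interfaces.py calling .append() on self.param_names while it is still None. Because that exception is raised inside the interface's thread (threading.py appears in the traceback), the main thread presumably keeps waiting for a cost that never arrives, which would explain the hang. One possible fix, sketched as a hypothetical standalone helper rather than a patch to interfaces.py, is to treat None as an empty list before generating default names in the same 'param' + str(ind+1) style.

```python
def fill_param_names(param_names, num_params):
    """Return a list of num_params names, generating defaults where missing.

    Hypothetical helper: None is treated as "no names supplied" instead of
    calling .append() on it.
    """
    names = [] if param_names is None else list(param_names)
    # Generate names only for the parameters that do not have one yet
    for ind in range(len(names), num_params):
        names.append('param' + str(ind + 1))
    return names
```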

lanscape_vis.py is Outdated

mlv.show_all_default_visualizations_from_archive() no longer supports the upload_cross_sections option for uploading to plotly, and the script lanscape_vis.py in the tools directory should be updated to reflect that.

This is an easy fix, which I will hopefully have a chance to take care of in a few weeks.

Add interface for programs executed from the command line

Make an interface for experiments that are run from the command line, with the result returned on the console.
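A minimal sketch of the core of such an interface, under assumed conventions: the parameters are appended to the command as arguments, and the program prints a line of the form cost = <number> (optionally uncer = <number>) on the console. The function name and the output format are hypothetical, not an existing M-LOOP API; an interface's get_next_cost_dict could call something like this.

```python
import subprocess


def get_cost_from_command(command, params):
    """Run `command` with the parameters appended and parse its console output.

    Hypothetical convention: the program prints 'cost = <number>' and,
    optionally, 'uncer = <number>' on their own lines.
    """
    args = list(command) + [repr(float(p)) for p in params]
    result = subprocess.run(args, capture_output=True, text=True, check=True)
    values = {}
    for line in result.stdout.splitlines():
        key, sep, value = line.partition('=')
        if sep and key.strip() in ('cost', 'uncer'):
            values[key.strip()] = float(value)
    if 'cost' not in values:
        raise ValueError('no "cost = ..." line in output: %r' % result.stdout)
    # Same keys as the cost dictionaries returned by M-LOOP interfaces
    return {'cost': values['cost'], 'uncer': values.get('uncer', 0.0), 'bad': False}
```

For example, get_cost_from_command(['python', 'my_experiment.py'], [0.5, 1.2]) would run python my_experiment.py 0.5 1.2, where my_experiment.py is a placeholder for the user's program.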

NeuralNetVisualizer calls nonexistent function safe_squeeze()

Hi all,

I came across this issue when trying to plot the results from an optimization run done using the neural_net controller. Here is the code to produce the error:

import mloop.visualizations as mlv

filename = r"C:\path\to\learner_archive_2020-02-22_06-44.txt"
file_type='txt'

visualization = mlv.NeuralNetVisualizer(filename, file_type)

and here's the resulting traceback:

AttributeError                            Traceback (most recent call last)
<ipython-input-1-21542a9fb6a5> in <module>
     12 # mlv.configure_plots()
     13 # mlv.create_neural_net_learner_visualizations(learner_archive,file_type='txt')
---> 14 visualization = mlv.NeuralNetVisualizer(filename, file_type)
     15 
     16 plt.show()

~\.conda\envs\labscript\lib\site-packages\mloop\visualizations.py in __init__(self, filename, file_type, **kwargs)
    608                                                   nn_training_file_type = file_type,
    609                                                   update_hyperparameters = False,
--> 610                                                   **kwargs)
    611 
    612         import plotly.plotly as py

~\.conda\envs\labscript\lib\site-packages\mloop\learners.py in __init__(self, trust_region, default_bad_cost, default_bad_uncertainty, nn_training_filename, nn_training_file_type, minimum_uncertainty, predict_global_minima_at_end, **kwargs)
   1478             #Data from previous experiment
   1479             self.all_params = np.array(self.training_dict['all_params'], dtype=float)
-> 1480             self.all_costs = mlu.safe_squeeze(self.training_dict['all_costs'])
   1481             self.all_uncers = mlu.safe_squeeze(self.training_dict['all_uncers'])
   1482 

AttributeError: module 'mloop.utilities' has no attribute 'safe_squeeze'

Maybe this function was superseded by safe_cast_to_array()?

I'm happy to submit the learner archive if desired. I'd also be happy to fix the bug and issue a pull request if it's a simple matter of changing safe_squeeze() to safe_cast_to_array().
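If restoring the call turns out to be simpler than switching to safe_cast_to_array(), a stand-in could look like the following. This is a guess at the intended behavior (turn the archived list into a 1-D float array, squeezing away length-1 axes but never returning a 0-d array), not the original mloop.utilities implementation.

```python
import numpy as np


def safe_squeeze(values):
    """Hypothetical stand-in: float array with length-1 axes removed, at least 1-D."""
    arr = np.squeeze(np.asarray(values, dtype=float))
    # np.squeeze can produce a 0-d array for a single value; keep one dimension
    return np.atleast_1d(arr)
```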

Cheers,
Zak

Program stops mid-run without any output

Hello! I have run the code several times, and sometimes (but not always) it stops partway through, and not always at the same point. Here is the output from one of the runs where it stopped (the last lines of the terminal output, PowerShell on Windows 10):

INFO     cost 32.265573116854235 +/- 0.0
INFO     Run: 70 (machine learner)
INFO     params [-500.         -965.9951292  -600.97861861 -300.         -352.33433901]

INFO     cost 1000.0 +/- 0.0
INFO     Run: 71 (trainer)
INFO     params [-1298.35679689  -816.87250569  -615.15777157  -426.3981931
  -499.14163221]

INFO     cost 1000.0 +/- 0.0
INFO     Run: 72 (machine learner)
INFO     params [-1265.03281961 -1137.39862994  -657.10271614  -719.20824797
  -377.11512437]

INFO     cost 1000.0 +/- 0.0
INFO     Run: 73 (machine learner)
INFO     params [-1096.93149896 -1076.23267012  -614.1589204   -706.57107924
  -351.21274201]

INFO     cost 49.006086364989656 +/- 0.0
INFO     Run: 74 (machine learner)
INFO     params [-1500.         -1313.5352554   -765.02461555  -796.65924047
  -369.23886747]

INFO     cost 107.2868748223299 +/- 0.0
INFO     Run: 75 (trainer)
INFO     params [-726.25157754 -628.76582468 -604.00469597 -403.95902915 -397.53860198]

Sometimes it stops at a very early run (<10), sometimes it goes a few hundred runs, and sometimes it finishes, without my changing anything in the Python script, just re-running it. This is the get_next_cost_dict method:

def get_next_cost_dict(self,params_dict):
    
    params = params_dict['params']
    V1 = params[0]
    V2 = params[1]
    V3 = params[2]
    V4 = params[3]
    V5 = params[4]

    filename = "out_test.txt"

    try:
        os.remove(filename)
    except OSError:
        # No leftover output file (or it could not be removed); carry on
        pass

    subprocess.call(r"powershell.exe & '.\SIMION 8.1.lnk' --nogui fastadj .\electrode.PA0 " + "1=" + str(V1) + ",2=" + str(V2) + ",3=" + str(V3) + ",4=" + str(V4) + ",5=" + str(V5))
    subprocess.call(r"powershell.exe & '.\SIMION 8.1.lnk' --nogui fly --restore-potentials=0 --recording-output=out_test.txt .\electrode.iob")
    while True:
        if os.path.isfile(filename):
            data = np.loadtxt(filename,skiprows=1,usecols=0)
            if len(data)==500:
                break
            else: 
                time.sleep(0.2)
        else:
            time.sleep(0.2)

    data_simion = np.loadtxt(filename,skiprows=1)

    idx = np.where(data_simion[:,0]==50)
    data_simion = data_simion[idx]
    pos_y = data_simion[:,1]
    pos_z = data_simion[:,2]
    radius = np.sqrt((pos_y)**2+(pos_z)**2)
    if len(data_simion)<400:
        resolution = 1000
    else:
        resolution = np.std(radius)/np.mean(radius)*200

    new_func_value = resolution
    os.remove(filename)

    cost = np.sum(new_func_value)
    uncer = 0
    bad = False
    
    cost_dict = {'cost':cost, 'uncer':uncer, 'bad':bad}
    return cost_dict

and this is the main function:

def main():
    filename = "learner_archive_" + str(strftime("%Y-%m-%d_%H-%M")) + ".txt"
    
    interface = CustomInterface()
    controller = mlc.create_controller(interface, 
                                       controller_type='neural_net',
                                       #controller_type='gaussian_process',
                                       max_num_runs = 500,
                                       num_params = 5, 
                                       min_boundary = [-1500,-1500,-1000,-1000,-500],
                                       max_boundary = [-500,-500,-300,-300,-100],
                                       first_params = [-1000,-1000,-500,-500,-300],
                                       training_type = "differential_evolution",
                                       num_training_runs = 50)
    controller.optimize()
    
    print('Best parameters found:')
    best_params = controller.best_params
    best_cost = controller.best_cost
    print(best_params)
    print(best_cost)

I get this behavior both for NN and Gaussian process.
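One thing worth ruling out: the polling loop in get_next_cost_dict has no timeout, so if SIMION ever exits without writing all 500 rows, the method never returns and M-LOOP waits silently, which would match the symptom. This is a hypothesis, not a confirmed diagnosis. A bounded wait is sketched below with a hypothetical helper that assumes one header line, as with skiprows=1 in the code above.

```python
import os
import time


def wait_for_rows(filename, expected_rows, timeout=60.0, poll_interval=0.2):
    """Wait until `filename` holds `expected_rows` data rows; give up after `timeout` s.

    Hypothetical helper: assumes one header line. Returns True on success and
    False on timeout, instead of looping forever.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if os.path.isfile(filename):
            with open(filename) as f:
                lines = f.readlines()
            # Count non-empty lines after the header
            rows = sum(1 for line in lines[1:] if line.strip())
            if rows >= expected_rows:
                return True
        time.sleep(poll_interval)
    return False
```

In get_next_cost_dict, a False return could be turned into a cost dictionary with 'bad': True so that the optimization moves on rather than hanging.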

Add tutorial for using M-LOOP in a Python environment.

Many experiments are already run from Python. Add a tutorial to the documentation on how to use M-LOOP as a Python library.

Syntax error during installation

Describe the bug
Syntax error when installing:
python setup.py develop

To Reproduce
Upon recloning:
git clone git://github.com/michaelhush/M-LOOP.git

Running installation:
python setup.py develop

Expected behavior
Successful completion of installation.

Screenshots
/tmp/easy_install-y_M61L/pytest-runner-5.3.1/temp/easy_install-UeIPEu/setuptools_scm-6.0.1/src
<pkg_resources.WorkingSet object at 0xb59f03d0>
Traceback (most recent call last):
  File "setup.py", line 73, in <module>
    main()
  File "setup.py", line 68, in main
    'Topic :: Scientific/Engineering :: Physics']
  File "/usr/lib/python2.7/dist-packages/setuptools/__init__.py", line 144, in setup
    _install_setup_requires(attrs)
  File "/usr/lib/python2.7/dist-packages/setuptools/__init__.py", line 139, in _install_setup_requires
    dist.fetch_build_eggs(dist.setup_requires)
  File "/usr/lib/python2.7/dist-packages/setuptools/dist.py", line 724, in fetch_build_eggs
    replace_conflicting=True,
  File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 782, in resolve
    replace_conflicting=replace_conflicting
  File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 1065, in best_match
    return self.obtain(req, installer)
  File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 1077, in obtain
    return installer(requirement)
  File "/usr/lib/python2.7/dist-packages/setuptools/dist.py", line 791, in fetch_build_egg
    return cmd.easy_install(req)
  File "/usr/lib/python2.7/dist-packages/setuptools/command/easy_install.py", line 704, in easy_install
    return self.install_item(spec, dist.location, tmpdir, deps)
  File "/usr/lib/python2.7/dist-packages/setuptools/command/easy_install.py", line 730, in install_item
    dists = self.install_eggs(spec, download, tmpdir)
  File "/usr/lib/python2.7/dist-packages/setuptools/command/easy_install.py", line 915, in install_eggs
    return self.build_and_install(setup_script, setup_base)
  File "/usr/lib/python2.7/dist-packages/setuptools/command/easy_install.py", line 1183, in build_and_install
    self.run_setup(setup_script, setup_base, args)
  File "/usr/lib/python2.7/dist-packages/setuptools/command/easy_install.py", line 1169, in run_setup
    run_setup(setup_script, args)
  File "/usr/lib/python2.7/dist-packages/setuptools/sandbox.py", line 253, in run_setup
    raise
  File "/usr/lib/python2.7/contextlib.py", line 35, in __exit__
    self.gen.throw(type, value, traceback)
  File "/usr/lib/python2.7/dist-packages/setuptools/sandbox.py", line 195, in setup_context
    yield
  File "/usr/lib/python2.7/contextlib.py", line 35, in __exit__
    self.gen.throw(type, value, traceback)
  File "/usr/lib/python2.7/dist-packages/setuptools/sandbox.py", line 166, in save_modules
    saved_exc.resume()
  File "/usr/lib/python2.7/dist-packages/setuptools/sandbox.py", line 141, in resume
    six.reraise(type, exc, self._tb)
  File "/usr/lib/python2.7/dist-packages/setuptools/sandbox.py", line 154, in save_modules
    yield saved
  File "/usr/lib/python2.7/dist-packages/setuptools/sandbox.py", line 195, in setup_context
    yield
  File "/usr/lib/python2.7/dist-packages/setuptools/sandbox.py", line 250, in run_setup
    _execfile(setup_script, ns)
  File "/usr/lib/python2.7/dist-packages/setuptools/sandbox.py", line 45, in _execfile
    exec(code, globals, locals)
  File "/tmp/easy_install-y_M61L/pytest-runner-5.3.1/setup.py", line 21, in <module>
    name = 'M-LOOP',
  File "/usr/lib/python2.7/dist-packages/setuptools/__init__.py", line 144, in setup
    _install_setup_requires(attrs)
  File "/usr/lib/python2.7/dist-packages/setuptools/__init__.py", line 139, in _install_setup_requires
    dist.fetch_build_eggs(dist.setup_requires)
  File "/usr/lib/python2.7/dist-packages/setuptools/dist.py", line 724, in fetch_build_eggs
    replace_conflicting=True,
  File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 782, in resolve
    replace_conflicting=replace_conflicting
  File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 1065, in best_match
    return self.obtain(req, installer)
  File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 1077, in obtain
    return installer(requirement)
  File "/usr/lib/python2.7/dist-packages/setuptools/dist.py", line 791, in fetch_build_egg
    return cmd.easy_install(req)
  File "/usr/lib/python2.7/dist-packages/setuptools/command/easy_install.py", line 704, in easy_install
    return self.install_item(spec, dist.location, tmpdir, deps)
  File "/usr/lib/python2.7/dist-packages/setuptools/command/easy_install.py", line 730, in install_item
    dists = self.install_eggs(spec, download, tmpdir)
  File "/usr/lib/python2.7/dist-packages/setuptools/command/easy_install.py", line 915, in install_eggs
    return self.build_and_install(setup_script, setup_base)
  File "/usr/lib/python2.7/dist-packages/setuptools/command/easy_install.py", line 1183, in build_and_install
    self.run_setup(setup_script, setup_base, args)
  File "/usr/lib/python2.7/dist-packages/setuptools/command/easy_install.py", line 1169, in run_setup
    run_setup(setup_script, args)
  File "/usr/lib/python2.7/dist-packages/setuptools/sandbox.py", line 253, in run_setup
    raise
  File "/usr/lib/python2.7/contextlib.py", line 35, in __exit__
    self.gen.throw(type, value, traceback)
  File "/usr/lib/python2.7/dist-packages/setuptools/sandbox.py", line 195, in setup_context
    yield
  File "/usr/lib/python2.7/contextlib.py", line 35, in __exit__
    self.gen.throw(type, value, traceback)
  File "/usr/lib/python2.7/dist-packages/setuptools/sandbox.py", line 166, in save_modules
    saved_exc.resume()
  File "/usr/lib/python2.7/dist-packages/setuptools/sandbox.py", line 141, in resume
    six.reraise(type, exc, self._tb)
  File "/usr/lib/python2.7/dist-packages/setuptools/sandbox.py", line 154, in save_modules
    yield saved
  File "/usr/lib/python2.7/dist-packages/setuptools/sandbox.py", line 195, in setup_context
    yield
  File "/usr/lib/python2.7/dist-packages/setuptools/sandbox.py", line 250, in run_setup
    _execfile(setup_script, ns)
  File "/usr/lib/python2.7/dist-packages/setuptools/sandbox.py", line 45, in _execfile
    exec(code, globals, locals)
  File "/tmp/easy_install-y_M61L/pytest-runner-5.3.1/temp/easy_install-UeIPEu/setuptools_scm-6.0.1/setup.py", line 52, in <module>
    download_url = 'https://github.com/michaelhush/M-LOOP/tarball/3.2.1',
  File "/tmp/easy_install-y_M61L/pytest-runner-5.3.1/temp/easy_install-UeIPEu/setuptools_scm-6.0.1/setup.py", line 29, in scm_config
  File "/tmp/easy_install-y_M61L/pytest-runner-5.3.1/temp/easy_install-UeIPEu/setuptools_scm-6.0.1/src/setuptools_scm/__init__.py", line 8, in <module>
  File "/tmp/easy_install-y_M61L/pytest-runner-5.3.1/temp/easy_install-UeIPEu/setuptools_scm-6.0.1/src/setuptools_scm/config.py", line 6, in <module>
  File "/tmp/easy_install-y_M61L/pytest-runner-5.3.1/temp/easy_install-UeIPEu/setuptools_scm-6.0.1/src/setuptools_scm/utils.py", line 41
    print(*k)
    ^
SyntaxError: invalid syntax

Desktop (please complete the following information):

OS: Raspbian GNU/Linux 10 (buster)
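The traceback shows setup.py running under Python 2.7 (/usr/lib/python2.7), while the setuptools_scm 6.0.1 source it downloaded uses print(*k), which Python 2 cannot parse. Running the installation with python3 is the direct fix. A setup.py could also fail early with a clearer message; the sketch below is hypothetical (not M-LOOP's setup.py), the (3, 6) minimum is an assumption, and it avoids f-strings so that Python 2 can still parse it.

```python
import sys

# Assumed minimum version for illustration; M-LOOP's actual requirement may differ.
MIN_PYTHON = (3, 6)


def check_python_version(version_info=None, minimum=MIN_PYTHON):
    """Exit with a readable message instead of a SyntaxError deep in a dependency."""
    if version_info is None:
        version_info = sys.version_info
    if tuple(version_info[:2]) < minimum:
        raise SystemExit(
            "This package needs Python %d.%d or later, but setup.py is running "
            "under Python %d.%d. Try 'python3 setup.py develop'."
            % (minimum + tuple(version_info[:2]))
        )
```

Calling check_python_version() as the first statement of setup.py would stop the install before any setup_requires dependencies are fetched.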

Problem importing previous training data when using gaussian process

Describe the bug
I am trying to use data from previous experiments as input for my optimizer. Unfortunately, this does not work with a Gaussian process controller (although with a differential evolution controller everything seems to work). Is this a known issue, and is there a workaround?

To Reproduce

Create a simple controller and optimize (for example, the optimizer from the tutorial):

# Imports for M-LOOP
import mloop.interfaces as mli
import mloop.controllers as mlc
import mloop.visualizations as mlv
import mloop.utilities as mlu

# Other imports
import numpy as np

class CustomInterface(mli.Interface):
    
    def __init__(self):
        super(CustomInterface,self).__init__()
        self.minimum_params = np.array([0,0.1,-0.1])

    def get_next_cost_dict(self,params_dict):
        
        params = params_dict['params']
        
        cost = -np.sum(np.sinc(params - self.minimum_params))
        uncer = 0
        bad = False
        
        cost_dict = {'cost':cost, 'uncer':uncer, 'bad':bad}
        return cost_dict
    
def main():
    
    input_dict = mlu.get_dict_from_file('config.txt', 'txt')
    interface = CustomInterface()
    controller = mlc.create_controller(interface, **input_dict)
    controller.optimize() 

if __name__ == '__main__':
    main()

with the configuration file

max_num_runs = 15
target_cost = -2.99
num_params = 3
min_boundary = [-2,-2,-2]
max_boundary = [2,2,2]
controller_type = 'differential_evolution'
param_names=None

training_type = 'differential_evolution'   
gp_training_filename = "training_data.txt"
gp_training_file_type = 'txt'

where I use either controller_type = 'differential_evolution' or controller_type = 'gaussian_process'. training_data.txt is simply a learner archive from a previous run.

Expected behavior

previous run controller_type = 'differential_evolution' and current run controller_type = 'differential_evolution': works
previous run controller_type = 'gaussian_process' and current run controller_type = 'differential_evolution': works
previous run controller_type = 'gaussian_process' and current run controller_type = 'gaussian_process': TypeError: __init__() got multiple values for keyword argument 'param_names'
previous run controller_type = 'differential_evolution' and current run controller_type = 'gaussian_process': TypeError: __init__() got multiple values for keyword argument 'param_names'

Screenshots

Desktop (please complete the following information):

OS: Ubuntu 16.04
M-LOOP version 3.1.1
Python version 3.5.2
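The TypeError is what Python raises when the same keyword reaches a function twice, once explicitly and once through **kwargs. The config above sets param_names=None, which is forwarded via **input_dict; if the learner also passes param_names taken from the training archive, the two collide. That this is where M-LOOP passes it twice is an assumption. The minimal reproduction below uses hypothetical function names (not M-LOOP's code), together with one way of merging the two sources.

```python
def learner_init(param_names=None, **kwargs):
    """Hypothetical stand-in for a learner constructor."""
    return param_names


def create_learner_buggy(training_dict, **config):
    # param_names is passed explicitly AND may also be inside **config,
    # which makes Python raise "got multiple values for keyword argument"
    return learner_init(param_names=training_dict.get('param_names'), **config)


def create_learner_fixed(training_dict, **config):
    # Remove param_names from config first; prefer the user's value unless it is None
    user_names = config.pop('param_names', None)
    if user_names is None:
        user_names = training_dict.get('param_names')
    return learner_init(param_names=user_names, **config)
```

Until a fix along these lines lands, removing the param_names=None line from the configuration file may avoid the collision, although that has not been verified here.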

M-LOOP reading the same cost twice

The following problem has turned up recently in my work with M-LOOP, and I haven't been able to understand why:

I'm using the file interface with another program, let's call it TARGET. In my interface I have added some terminal output to print when TARGET starts and stops. I have begun experiencing the following sequence in the terminal:

INFO Run: 1
INFO params [<paramset1>]
Running TARGET...
TARGET done.
INFO cost <cost1>
INFO Run: 2
INFO params [<paramset2>]
Running TARGET...
INFO cost <cost1>
INFO Run: 3
INFO params [<paramset3>]
TARGET done.
INFO cost <cost2>

The problem is that M-LOOP, for some reason, reads the same cost twice. As a result, it assigns the wrong cost to a set of parameters twice: first when it reuses the previous cost, and again when the next cost is assigned to a new, mismatched set of parameters. Note that the interface does nothing between "Running TARGET..." and "TARGET done."; it simply waits for TARGET to finish.

I have checked whether exp_output.txt might for some reason not be deleted, causing M-LOOP to think the experiment finished again, but this is not the case: it is deleted at the correct point in the program. I have the latest version of M-LOOP. This seems like a serious problem to me, as the program will completely misunderstand the parameter-cost space. But for some reason it has only recently started happening, and only intermittently.

Can you help me fix this?
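If exp_output.txt is written in place, a reader polling for it can in principle see the file before it is complete. A common defence, not confirmed as the cause here, is to write to a temporary file in the same directory and then rename it over the target with os.replace, so the file appears complete or not at all. The helper name is hypothetical, and the key = value layout (cost, uncer, bad) is an assumption about the file interface's text format.

```python
import os
import tempfile


def write_cost_atomically(path, cost, uncer=0.0, bad=False):
    """Write a cost file so a reader never sees it half-written.

    Hypothetical helper; the 'key = value' layout is an assumed file format.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file in the same directory so os.replace is a rename
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix='.tmp')
    try:
        with os.fdopen(fd, 'w') as f:
            f.write('cost = %r\nuncer = %r\nbad = %r\n'
                    % (float(cost), float(uncer), bool(bad)))
        # Atomic on POSIX: the final file appears complete or not at all
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```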

Multiprocessing not working on Windows

The Gaussian process learner runs in a separate process. Unfortunately, Windows does not fork Python when it starts a new process; instead it launches a fresh interpreter and pickles the process object to send to it. The GaussianProcessLearner currently cannot be pickled. Looking for a workaround.
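Under the spawn start method (the only one available on Windows), the Process object is pickled and sent to the child interpreter, so every attribute must be picklable. A generic workaround, sketched here with a hypothetical class rather than GaussianProcessLearner itself, is to drop unpicklable attributes in __getstate__ and rebuild them in __setstate__. A threading.Lock stands in for whatever attribute actually blocks pickling in the learner.

```python
import pickle
import threading


class PicklableLearner:
    """Hypothetical learner-like object that survives pickling for spawn."""

    def __init__(self):
        self.settings = {'length_scale': 0.5}
        # Stand-in for an attribute that cannot be pickled
        self.lock = threading.Lock()

    def __getstate__(self):
        # Copy the instance dict without the unpicklable attribute
        state = self.__dict__.copy()
        del state['lock']
        return state

    def __setstate__(self, state):
        # Restore the picklable state, then rebuild what was dropped
        self.__dict__.update(state)
        self.lock = threading.Lock()


# Round trip, roughly what spawn does when handing the object to the child
clone = pickle.loads(pickle.dumps(PicklableLearner()))
```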

Neural net learner fail

Hi,
Using the new neural net learner, we encountered the following error at the end of an execution several hundred runs long.

Traceback (most recent call last):
  File "c:\program files\anaconda3\lib\multiprocessing\process.py", line 249, in _bootstrap
    self.run()
  File "c:\program files\anaconda3\lib\site-packages\mloop\learners.py", line 1914, in run
    self.fit_neural_net()
AttributeError: 'NeuralNetLearner' object has no attribute 'fit_neural_net'

The logs, archives, and config files are given in:

mloop neural net fail.txt
mloopfiles.zip

Previously we have had good success using the gaussian process learner.

Thanks.

Ashby

Add a differential optimizer

A differential optimizer, as a complement to the machine-learning algorithm or to benchmark its performance, would be helpful.
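Assuming "differential optimizer" here means differential evolution, the core loop of the classic DE/rand/1/bin scheme fits in a few lines of NumPy. This is a generic textbook sketch with hypothetical parameter names and defaults, not M-LOOP's implementation; it is tried on the sum-of-sinc cost function used in the M-LOOP tutorial.

```python
import numpy as np


def differential_evolution(cost, min_boundary, max_boundary, pop_size=15,
                           mutation=0.8, crossover=0.7, generations=200, seed=0):
    """Minimize `cost` inside a box with DE/rand/1/bin (textbook sketch)."""
    rng = np.random.default_rng(seed)
    lo = np.asarray(min_boundary, dtype=float)
    hi = np.asarray(max_boundary, dtype=float)
    dim = lo.size
    # Uniform random initial population inside the boundaries
    pop = lo + rng.random((pop_size, dim)) * (hi - lo)
    costs = np.array([cost(p) for p in pop])
    for _ in range(generations):
        for i in range(pop_size):
            # Three distinct population members, all different from i
            others = [j for j in range(pop_size) if j != i]
            a, b, c = pop[rng.choice(others, 3, replace=False)]
            # Mutation, clipped back into the allowed box
            mutant = np.clip(a + mutation * (b - c), lo, hi)
            # Binomial crossover; at least one coordinate comes from the mutant
            mask = rng.random(dim) < crossover
            mask[rng.integers(dim)] = True
            trial = np.where(mask, mutant, pop[i])
            # Greedy selection: keep the trial if it is no worse
            trial_cost = cost(trial)
            if trial_cost <= costs[i]:
                pop[i] = trial
                costs[i] = trial_cost
    best = int(np.argmin(costs))
    return pop[best].copy(), float(costs[best])


# Usage on the tutorial's sinc cost function (minimum at [0, 0.1, -0.1])
minimum_params = np.array([0.0, 0.1, -0.1])
best_params, best_cost = differential_evolution(
    lambda p: -np.sum(np.sinc(p - minimum_params)), [-2, -2, -2], [2, 2, 2])
```

Because every trial only needs one cost evaluation, the same loop could call an experiment instead of a Python function, which is what makes it usable as a benchmark against the machine-learning learners.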

Tutorial Code Bug for neural_net and gaussian_process Learners

Hi, I've been using M-LOOP, but I've run into an issue where the learner archive is not being created, and this happens only for the neural_net learner type; it works fine for the other learners. To narrow down the issue, I tried the simpler tutorial code on the M-LOOP website and found that the same thing happens there. I am using M-LOOP 3.2.1.

To Reproduce
Execute the following code. (It's the same as the tutorial code on the website, but I have specified the controller_type='neural_net')


# Imports for python 2 compatibility
from __future__ import absolute_import, division, print_function

__metaclass__ = type

# Imports for M-LOOP
import mloop.interfaces as mli
import mloop.controllers as mlc
import mloop.visualizations as mlv

# Other imports
import numpy as np
import time


# Declare your custom class that inherits from the Interface class
class CustomInterface(mli.Interface):

    # Initialization of the interface, including this method is optional
    def __init__(self):
        # You must include the super command to call the parent class, Interface, constructor
        super(CustomInterface, self).__init__()

        # Attributes of the interface can be added here
        # If you want to precalculate any variables etc. this is the place to do it
        # In this example we will just define the location of the minimum
        self.minimum_params = np.array([0, 0.1, -0.1])

    # You must include the get_next_cost_dict method in your class
    # this method is called whenever M-LOOP wants to run an experiment
    def get_next_cost_dict(self, params_dict):
        # Get parameters from the provided dictionary
        params = params_dict['params']

        # Here you can include the code to run your experiment given a particular set of parameters
        # In this example we will just evaluate a sum of sinc functions
        cost = -np.sum(np.sinc(params - self.minimum_params))
        # There is no uncertainty in our result
        uncer = 0
        # The evaluation will always be a success
        bad = False
        # Add a small time delay to mimic a real experiment
        time.sleep(.01)

        # The cost, uncertainty and bad boolean must all be returned as a dictionary
        # You can include other variables you want to record as well if you want
        cost_dict = {'cost': cost, 'uncer': uncer, 'bad': bad}
        return cost_dict


def main():
    # M-LOOP can be run with three commands

    # First create your interface
    interface = CustomInterface()
    # Next create the controller. Provide it with your interface and any options you want to set
    controller = mlc.create_controller(interface,
                                       controller_type='neural_net',
                                       max_num_runs=1000,
                                       target_cost=-2.99,
                                       num_params=3,
                                       min_boundary=[-2, -2, -2],
                                       max_boundary=[2, 2, 2])
    # To run M-LOOP and find the optimal parameters just use the controller method optimize
    controller.optimize()

    # The results of the optimization will be saved to files and can also be accessed as attributes of the controller.
    print('Best parameters found:')
    print(controller.best_params)

    # You can also run the default sets of visualizations for the controller with one command
    mlv.show_all_default_visualizations(controller)


# Ensures main is run when this code is run as a script
if __name__ == '__main__':
    main()

The error I get from running this is

Traceback (most recent call last):
  File "C:/Users/matth/PycharmProjects/pythonProject/mloop_quick_test.py", line 77, in <module>
    main()
  File "C:/Users/matth/PycharmProjects/pythonProject/mloop_quick_test.py", line 72, in main
    mlv.show_all_default_visualizations(controller)
  File "C:\Users\matth\labscript-suite\Python_38\lib\site-packages\mloop\visualizations.py", line 91, in show_all_default_visualizations
    create_learner_visualizations(
  File "C:\Users\matth\labscript-suite\Python_38\lib\site-packages\mloop\visualizations.py", line 258, in create_learner_visualizations
    visualizer = create_learner_visualizer_from_archive(
  File "C:\Users\matth\labscript-suite\Python_38\lib\site-packages\mloop\visualizations.py", line 198, in create_learner_visualizer_from_archive
    controller_type = mlu.get_controller_type_from_learner_archive(filename)
  File "C:\Users\matth\labscript-suite\Python_38\lib\site-packages\mloop\utilities.py", line 274, in get_controller_type_from_learner_archive
    learner_dict = get_dict_from_file(learner_filename, file_type)
  File "C:\Users\matth\labscript-suite\Python_38\lib\site-packages\mloop\utilities.py", line 222, in get_dict_from_file
    dictionary = txt_file_to_dict(filename)
  File "C:\Users\matth\labscript-suite\Python_38\lib\site-packages\mloop\utilities.py", line 158, in txt_file_to_dict
    with open(filename,'r') as in_file:
FileNotFoundError: [Errno 2] No such file or directory: './M-LOOP_archives/learner_archive_2022-01-05_13-35.txt'

Process finished with exit code 1

because the learner archive file was not created. I do not get this error with the GP or DE learners. Could someone suggest a fix for this? Is there something that I misunderstood? Thanks!

Include phrase "Bayesian Optimization" in the docs

M-LOOP's Gaussian process and neural network optimizers are implementations of Bayesian optimization. The docs should include those magic words, probably at least in the introductory section.

Error when computing predicted best parameters

Describe the bug
An error is raised when M-LOOP tries to compute the predicted best parameters and the associated cost.

To Reproduce
Simply run the provided example python_controlled_experiment.py. At the end of the optimization, when trying to find the predicted optimum, there is a sklearn error, and the optimum is not computed. When adding noise to the cost function to force the predicted optimum to differ from the best found result, I still get the error.

Expected behavior
I expected to get the predicted best parameters at this step.

Screenshots

Process GaussianProcessLearner-1:
Traceback (most recent call last):
  File "/usr/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "~/M-LOOP/mloop/learners.py", line 2048, in run
    self.find_global_minima()
  File "~/M-LOOP/mloop/learners.py", line 2095, in find_global_minima
    self.predicted_best_cost = self.cost_scaler.inverse_transform(self.predicted_best_scaled_cost)
  File "~/mloop/lib/python3.8/site-packages/scikit_learn-1.0-py3.8-linux-x86_64.egg/sklearn/preprocessing/_data.py", line 1016, in inverse_transform
    X = check_array(
  File "~/mloop/lib/python3.8/site-packages/scikit_learn-1.0-py3.8-linux-x86_64.egg/sklearn/utils/validation.py", line 761, in check_array
    raise ValueError(
ValueError: Expected 2D array, got 1D array instead:
array=[-1.62357476].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.

Desktop (please complete the following information):

Ubuntu 20.04, with all the python packages installed via pip in a venv
Arch Linux, with python dependencies from the package manager
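The error message already suggests the fix: scikit-learn scalers expect 2D input of shape (n_samples, n_features), and the scaled cost being inverse-transformed is 1D. A minimal sketch, assuming for illustration a StandardScaler fitted on a single column of costs (the variable names are hypothetical and M-LOOP's actual scaler setup may differ), shows reshaping the value before calling inverse_transform:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for the learner's cost scaler: fitted on a single
# column of costs, i.e. an array of shape (n_samples, 1).
costs = np.array([[-1.0], [-2.0], [-3.0], [-1.5]])
cost_scaler = StandardScaler().fit(costs)

# A 1D array like the one in the traceback (shape (1,)), which recent
# scikit-learn versions reject in inverse_transform().
scaled_best_cost = np.array([-1.62357476])

# Reshape to (n_samples, n_features) = (1, 1) as the error message suggests,
# then pull the single element back out as a plain float.
best_cost = cost_scaler.inverse_transform(scaled_best_cost.reshape(-1, 1))
predicted_best_cost = float(best_cost[0, 0])
print(predicted_best_cost)
```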

Make interface multiprocessing optional

Currently the interface class inherits from multiprocessing. However, running the interface to the experiment in a forked Python environment can cause trouble on certain operating systems, particularly when calling matplotlib or other libraries that are not multiprocessing-safe.
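One way to make this optional is to choose the worker type at construction time instead of through inheritance. The sketch below is hypothetical (the class name, the use_process flag and the method names are not M-LOOP's API); it only illustrates running the same experiment code either in a thread of the parent process or in a separate process:

```python
import multiprocessing
import queue
import threading


class HypotheticalInterface:
    """Illustrative interface whose worker runs in a thread or a process.

    Not M-LOOP's Interface class; `use_process` is an assumed option.
    """

    def __init__(self, use_process=False):
        self.use_process = use_process
        if use_process:
            # Separate Python process: the experiment code runs in a new
            # interpreter, which some libraries (e.g. matplotlib) handle badly.
            self.results = multiprocessing.Queue()
            self.worker = multiprocessing.Process(target=self._run, args=(self.results,))
        else:
            # Same process, separate thread: no fork/spawn involved.
            self.results = queue.Queue()
            self.worker = threading.Thread(target=self._run, args=(self.results,))

    @staticmethod
    def _run(results):
        # Placeholder for the code that talks to the experiment.
        results.put({'cost': 0.0, 'uncer': 0.0, 'bad': False})

    def start(self):
        self.worker.start()

    def join(self):
        self.worker.join()
```

Composition keeps the experiment code identical in both modes; only the execution context changes.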

python setup.py test report warning

Hi,
I installed M-LOOP from source by running '$python setup.py develop', which completed with no errors. When I then ran '$python setup.py test', a warning occurred:
============================== warnings summary ===============================
mloop\testing.py:71
D:\grocery\M-LOOP\mloop\testing.py:71: DeprecationWarning: invalid escape sequence \s
'''

-- Docs: https://docs.pytest.org/en/latest/warnings.html
================= 18 passed, 1 warnings in 107.85s (0:01:47) ==================

`Controller` docstring has some incorrectly specified default values

The entries for controller_archive_filename and controller_archive_file_type are wrong about what the default values are.

Installation error on OS X 10.11

I followed the installation instructions using Anaconda, creating a Python environment with scikit-learn (named chipGA). I get an error that prevents me from installing M-LOOP:

$ python setup.py test
Traceback (most recent call last):
  File "~/anaconda/envs/chipGA/lib/python3.5/site-packages/setuptols-26.1.1-py3.5.egg/setuptools/dist.py", line 434, in fetch_build_egg
AttributeError: 'Distribution' object has no attribute '_egg_fetcher'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "setup.py", line 41, in <module>
    'Topic :: Scientific/Engineering :: Physics']
  File "/anaconda/envs/chipGA/lib/python3.5/distutils/core.py", line 108, in setup
    _setup_distribution = dist = klass(attrs)
  File "/anaconda/envs/chipGA/lib/python3.5/site-packages/setuptols-26.1.1-py3.5.egg/setuptools/dist.py", line 348, in __init__
  File "/anaconda/envs/chipGA/lib/python3.5/site-packages/setuptols-26.1.1-py3.5.egg/setuptools/dist.py", line 394, in fetch_build_eggs
  File "/anaconda/envs/chipGA/lib/python3.5/site-packages/setuptols-26.1.1-py3.5.egg/pkg_resources/__init__.py", line 851, in resolve
  File "/anaconda/envs/chipGA/lib/python3.5/site-packages/setuptols-26.1.1-py3.5.egg/pkg_resources/__init__.py", line 1123, in best_match
  File "/anaconda/envs/chipGA/lib/python3.5/site-packages/setuptols-26.1.1-py3.5.egg/pkg_resources/__init__.py", line 1135, in obtain
  File "/anaconda/envs/chipGA/lib/python3.5/site-packages/setuptols-26.1.1-py3.5.egg/setuptools/dist.py", line 453, in fetch_build_egg
  File "/anaconda/envs/chipGA/lib/python3.5/site-packages/setuptols-26.1.1-py3.5.egg/setuptools/dist.py", line 418, in get_egg_cache_dir
PermissionError: [Errno 13] Permission denied: './.eggs'

Here's the output of 'conda list' for the environment:

mkl 11.3.3 0
numpy 1.11.1 py35_0
openssl 1.0.2h 2
pip 8.1.2 py35_0
python 3.5.2 0
readline 6.2 2
scikit-learn 0.17.1 np111py35_2
scipy 0.18.0 np111py35_0
setuptools 26.1.1 py35_0
sqlite 3.13.0 0
tk 8.5.18 0
wheel 0.29.0 py35_0
xz 5.2.2 0
zlib 1.2.8 3

Issue while loading my_config.txt

When I run the command M-LOOP -c [my_config.txt] in the directory where the "M-LOOP" command is invoked, I get two errors. I solved the first one, but then the second error appears:

1st error:

root@10814d9fcf9e:/notebooks/M-LOOP# M-LOOP -c [my_config.txt]
Traceback (most recent call last):
  File "/usr/local/bin/M-LOOP", line 6, in <module>
    exec(compile(open(__file__).read(), __file__, 'exec'))
  File "/notebooks/M-LOOP/bin/M-LOOP", line 38
    main(sys.argv[1:])
IndentationError: unindent does not match any outer indentation level

I solved this by modifying line 37 in the /bin/M-LOOP file:

if __name__=="__main__":
    mp.freeze_support()
    main(sys.argv[1:])

and then ran: M-LOOP -c [my_config.txt]

2nd Error:

Traceback (most recent call last):
  File "/usr/local/bin/M-LOOP", line 6, in <module>
    exec(compile(open(__file__).read(), __file__, 'exec'))
  File "/notebooks/M-LOOP/bin/M-LOOP", line 38, in <module>
    main(sys.argv[1:])
  File "/notebooks/M-LOOP/bin/M-LOOP", line 34, in main
    _ = mll.launch_from_file(config_filename)
  File "/notebooks/M-LOOP/mloop/launchers.py", line 26, in launch_from_file
    file_kwargs = mlu.get_dict_from_file(config_filename,'txt')
  File "/notebooks/M-LOOP/mloop/utilities.py", line 159, in get_dict_from_file
    dictionary = txt_file_to_dict(filename)
  File "/notebooks/M-LOOP/mloop/utilities.py", line 113, in txt_file_to_dict
    with open(filename,'r') as in_file:
FileNotFoundError: [Errno 2] No such file or directory: '[my_config.txt]'

I'm confused about this, since the file is in the directory where M-LOOP is called.
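The last line of the traceback shows the string '[my_config.txt]', brackets included, reaching open(). In bash, an unmatched `[...]` pattern is passed through unchanged (unless a one-character filename in the directory happens to match it), so the program receives the brackets literally. A minimal reproduction, assuming only that a file named my_config.txt exists and the brackets were typed on the command line:

```python
import os
import tempfile

failed_name = None
with tempfile.TemporaryDirectory() as directory:
    # A config file named my_config.txt, without brackets.
    config_path = os.path.join(directory, 'my_config.txt')
    with open(config_path, 'w') as config_file:
        config_file.write('num_params = 2\n')

    # The bracketed name is a different filename, so open() fails exactly
    # as in the traceback above.
    bracketed_path = os.path.join(directory, '[my_config.txt]')
    try:
        with open(bracketed_path, 'r'):
            pass
    except FileNotFoundError as error:
        failed_name = os.path.basename(error.filename)

    # The same file opens normally when the brackets are left out.
    with open(config_path, 'r') as config_file:
        contents = config_file.read().strip()

print(failed_name, contents)
```

In other words, the brackets look like a placeholder convention and should be dropped when typing the actual filename.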

Make creating of new figures optional

One thing I would find convenient is being able to arrange the plots created by mloop.visualizations more freely (e.g. position the plots in a specific figure/subfigure). To allow that, one could make the lines

figure_counter += 1
plt.figure(figure_counter)

optional. I think that would be a useful feature, but obviously not necessary.
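A common matplotlib pattern for this is to accept an optional Axes object and only create a new figure when none is given. A minimal sketch, assuming a hypothetical plot_cost_vs_run helper (not an existing M-LOOP function):

```python
import matplotlib
matplotlib.use('Agg')  # Non-interactive backend so the sketch runs headless.
import matplotlib.pyplot as plt


def plot_cost_vs_run(costs, ax=None):
    """Hypothetical plotting helper: draw into `ax` if given, else a new figure."""
    if ax is None:
        # Default behaviour: create a fresh figure, like the current code does.
        _, ax = plt.subplots()
    ax.plot(range(len(costs)), costs, 'o')
    ax.set_xlabel('Run number')
    ax.set_ylabel('Cost')
    return ax


# Usage: place two plots side by side in one user-controlled figure.
fig, (left, right) = plt.subplots(1, 2, figsize=(8, 3))
plot_cost_vs_run([3.0, 2.5, 2.7, 1.9], ax=left)
plot_cost_vs_run([1.0, 0.8, 0.9], ax=right)
```

Keeping ax=None as the default would preserve the existing behaviour for users who don't pass anything.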

Add binary installer package for Windows

Installing on Windows machines is currently quite complicated. A binary installer would be useful, ideally one that can also update an existing installation.

Default ANN - Possible to create custom ANN

Hi,

Apologies, as this is not really an issue but a request for advice. I am currently using M-LOOP to optimise ring resonators on a silicon chip. Having already used the GPR in M-LOOP, I would like to employ an ANN alongside online optimisation. I am aware the neural net controller can be used out of the box, so to speak, but I am unsure how to find the depth of this ANN, its activation functions, etc. How can one create their own ANN to be used in M-LOOP? For example, can one be created in Keras and then imported into M-LOOP, or must it be created in M-LOOP itself? I have read the paper 'Applying machine learning optimization methods to the production of a quantum gas', where they have a self-defined ANN and use it in M-LOOP, but I do not know how to do this myself.

Thank you for any help!

Support for discrete parameter spaces

Is your feature request related to a problem? Please describe.
I'm trying to use your software for some optimizations of experimental parameters. One potential problem is that some of my parameters have discrete values, e.g., integers 0, 1, ..., 255, in addition to other parameters that may be continuous. I don't know how to treat these parameters within your framework.

Describe the solution you'd like

Could you add support for discrete parameter ranges?
When and how can one treat discrete ranges as continuous by rounding? How well could this work?

Describe alternatives you've considered
I'm not very familiar with the mathematical background of your optimization methods, so I'm not sure to what extent discrete parameter ranges are compatible with Gaussian processes or the other algorithms employed. However, the library scikit-optimize (https://pypi.org/project/scikit-optimize/) also uses Gaussian process regression and does allow discrete or categorical parameter ranges. As scikit-learn is commonly used for hyperparameter optimization in machine learning (many dimensions of discrete values), I assume that there are practical ways to allow discrete values in Gaussian process regression.

I have been using scikit-optimize. However, your library seems to have some features that scikit-optimize is lacking, such as parallel data acquisition and processing. Therefore I will be trying it as well and comparing results.

An obvious "alternative" is to treat the discrete spaces as continuous and round off the parameters when they are fed to experimental control. For reasonably fine-grained parameter ranges, such as 256 or 1000 distinct values, I imagine this might work, even though it technically violates some smoothness assumptions implied by the algorithms? I would love to hear some comments from you if this is feasible or would break the optimization. What kinds of hyper-parameters or modes of operation would be best suited to this scenario?
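The rounding workaround described above can be sketched as a small helper applied to the parameters before they are sent to the experiment, e.g. inside get_next_cost_dict(). This is an assumption-laden sketch: round_discrete_params and discrete_mask are hypothetical names, the optimizer still sees a continuous space, and the cost it observes becomes piecewise constant along the rounded directions (the smoothness caveat raised above).

```python
import numpy as np


def round_discrete_params(params, discrete_mask, min_boundary, max_boundary):
    """Round the entries flagged in `discrete_mask` to the nearest integer.

    Hypothetical helper; assumes integer boundaries for the discrete entries
    so that clipping cannot produce a non-integer value. Note that np.round
    rounds exact halves to the nearest even integer.
    """
    params = np.asarray(params, dtype=float).copy()
    mask = np.asarray(discrete_mask, dtype=bool)
    params[mask] = np.round(params[mask])
    # Keep everything inside the optimization boundaries.
    return np.clip(params, min_boundary, max_boundary)


# Example: first parameter is an integer setting in 0..255, second is continuous.
experiment_params = round_discrete_params(
    [127.6, 0.314], discrete_mask=[True, False],
    min_boundary=[0, -1], max_boundary=[255, 1])
print(experiment_params)
```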

Ignores halting conditions

While testing M-LOOP by manually deleting exp_input and creating exp_output.txt, using the simple configuration

num_params = 2; max_num_runs = 3

I was unable to get the program to stop. Using configuration

num_params = 2; max_num_runs = 3; target_cost = 0.1

it stops seemingly at random, both running more than three times and continuing after reaching costs below 0.1. See the attached file for an example of the behaviour with the second configuration.
strange_output.txt

Is it not supposed to stop before 7 runs, or at the instant it surpasses target_cost?

I'm running on Linux Mint.

Scikit-Learn `RuntimeError` due to `ParameterScaler` using `*args`

Describe the bug
With recent versions of scikit-learn (somewhere between 1.1.2 and 1.2.1), the scaler classes now prohibit using *args in the call signatures for their __init__() methods. M-LOOP's custom ParameterScaler class inherits from scikit-learn's MinMaxScaler and uses *args to pass any additional positional arguments to the parent class. We don't actually use that capability at all (it was just put there to be flexible if the parent class call signature changes), so it's easy to just remove *args from the call signature.

To Reproduce
Steps to reproduce the behavior:

Create an environment with M-LOOP and a recent version of scikit-learn (1.2.1 is new enough to exhibit the issue).
Run M-LOOP's tests.

Desktop (please complete the following information):

OS: Ubuntu 20.04
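Why *args matters: scikit-learn's BaseEstimator builds get_params() by inspecting the signature of __init__, skipping self and **kwargs, and raises RuntimeError if it finds a *args parameter. The sketch below is a simplified imitation of that one check using the standard library, applied to two hypothetical scaler classes (neither is M-LOOP's actual ParameterScaler):

```python
import inspect


def check_no_varargs(cls):
    """Simplified imitation of scikit-learn's __init__ signature check.

    Returns the parameter names get_params() would use, or raises
    RuntimeError if __init__ accepts *args.
    """
    signature = inspect.signature(cls.__init__)
    parameters = [p for p in signature.parameters.values()
                  if p.name != 'self' and p.kind != p.VAR_KEYWORD]
    for parameter in parameters:
        if parameter.kind == parameter.VAR_POSITIONAL:
            raise RuntimeError(f'{cls.__name__}.__init__ must not use *args')
    return sorted(p.name for p in parameters)


class ScalerWithArgs:
    # Problematic pattern: extra positional args forwarded to a parent.
    def __init__(self, feature_range=(0, 1), *args, copy=True):
        self.feature_range = feature_range
        self.copy = copy


class ScalerWithoutArgs:
    # The fix described above: list the parameters explicitly, no *args.
    def __init__(self, feature_range=(0, 1), copy=True):
        self.feature_range = feature_range
        self.copy = copy
```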

Problem in mloop.visualizations.mlv.ControllerVisualizer

I realized that when using the function plot_hyperparameters_vs_run I could get an error quite easily if I had fewer distinct length scales than parameters (e.g. when having only one length scale for all parameters). I could easily circumvent this by simply specifying a parameter subset, but fixing this might be useful.

Thank you for the great work you put into this project. I love using M-LOOP!
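One possible fix is to broadcast an isotropic (single) length scale to every parameter before plotting. A sketch, assuming the hyperparameter history is available as a list where each entry is either one length scale or one length scale per parameter (the function name and data layout are hypothetical):

```python
import numpy as np


def length_scales_per_parameter(length_scale_history, num_params):
    """Expand fitted length scales to shape (num_fits, num_params).

    A single length scale is repeated for every parameter; an array of the
    wrong length raises ValueError from np.broadcast_to.
    """
    expanded = [np.broadcast_to(np.atleast_1d(np.asarray(scale, dtype=float)),
                                (num_params,))
                for scale in length_scale_history]
    return np.vstack(expanded)


# Usage: a mix of isotropic and per-parameter fits for three parameters.
history = [0.5, [0.4, 0.6, 0.7], 0.45]
scales = length_scales_per_parameter(history, num_params=3)
print(scales[:, 0])  # length scale of parameter 0 for each fit
```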

test_gaussian_process_complete_config fails

Describe the bug
The test_gaussian_process_complete_config() test fails due to the presence of an unused keyword argument in the config file.

To Reproduce
Steps to reproduce the behavior:

cd to the M-LOOP directory
Run pytest -v -k gaussian_process_complete_config

Desktop (please complete the following information):

OS: Windows 10
Version: M-LOOP Commit 6ca72cb

Additional context
The issue is that the gp_training_override_kwargs argument is passed to GaussianProcessLearner, but that option was removed in 845a74b. I forgot to update a couple places in the code when doing that. In particular I forgot to remove it from the complete Gaussian process learner config file, which is the reason that this test fails. I also forgot to remove it from the call to super() in GaussianProcessVisualizer.__init__(), though that doesn't have any effect here. Fortunately these are easy fixes.

Here is the test output of the test for reference:

==================================================================================================== test session starts ===================================================================================================== 
platform win32 -- Python 3.7.6, pytest-5.3.5, py-1.8.1, pluggy-0.13.1 -- C:\Users\user_name\Software\anaconda3\envs\mloop_install_test_2\python.exe
cachedir: .pytest_cache
hypothesis profile 'default' -> database=DirectoryBasedExampleDatabase('C:\\Users\\user_name\\Software\\M-LOOP\\.hypothesis\\examples')
rootdir: C:\Users\user_name\Software\M-LOOP
plugins: hypothesis-5.5.4, arraydiff-0.3, astropy-header-0.1.2, doctestplus-0.5.0, openfiles-0.4.0, remotedata-0.3.2
collected 20 items / 19 deselected / 1 selected

tests/test_examples.py::TestExamples::test_gaussian_process_complete_config FAILED                                                                                                                                      [100%]

========================================================================================================== FAILURES ==========================================================================================================
_____________________________________________________________________________________ TestExamples.test_gaussian_process_complete_config _____________________________________________________________________________________ 

self = <test_examples.TestExamples testMethod=test_gaussian_process_complete_config>

    def test_gaussian_process_complete_config(self):
        controller = mll.launch_from_file(mlu.mloop_path+'/../examples/gaussian_process_complete_config.txt',
                                          interface_type = 'test',
                                          no_delay = False,
>                                         **self.override_dict)

test_examples.py:100:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _  

config_filename = 'c:\\users\\user_name\\software\\m-loop\\mloop/../examples/gaussian_process_complete_config.txt', kwargs = {'console_log_level': 10, 'file_log_level': 30, 'interface_type': 'test', 'no_delay': False, ...}       
file_kwargs = {'gp_training_override_kwargs': False}, interface = <TestInterface(Thread-1, initial)>, controller = <mloop.controllers.GaussianProcessController object at 0x00000238A3112C88>
extras_kwargs = {'visualizations': False}

    def launch_from_file(config_filename,
                         **kwargs):
        '''
        Launch M-LOOP using a configuration file. See configuration file documentation.
    
        Args:
            config_filename (str): Filename of configuration file
            **kwargs : keywords that override the keywords in the file.
    
        Returns:
            controller (Controller): Controller for optimization.
        '''
        try:
            file_kwargs = mlu.get_dict_from_file(config_filename,'txt')
        except (IOError, OSError):
            print('Unable to open M-LOOP configuration file:' + repr(config_filename))
            raise
        file_kwargs.update(kwargs)
        #Main run sequence
        #Create interface and extract unused keywords
        interface = mli.create_interface(**file_kwargs)
        file_kwargs = interface.remaining_kwargs
        #Create controller and extract unused keywords
        controller = mlc.create_controller(interface, **file_kwargs)
        file_kwargs = controller.remaining_kwargs
        #Extract keywords for post processing extras, and raise an error if any keywords were unused.
        extras_kwargs = _pop_extras_kwargs(file_kwargs)
        if file_kwargs:
            logging.getLogger(__name__).error('Unused extra options provided:' + repr(file_kwargs))
>           raise ValueError
E           ValueError

..\mloop\launchers.py:42: ValueError
---------------------------------------------------------------------------------------------------- Captured stdout call ---------------------------------------------------------------------------------------------------- 
INFO     M-LOOP version 3.2.1
DEBUG    M-LOOP Logger configured.
DEBUG    Creating interface.
DEBUG    Setting default landscapes
INFO     Using the test interface with the experiment.
DEBUG    Controller init completed.
DEBUG    Learner init completed.
DEBUG    Random learner init completed.
DEBUG    Learner init completed.
ERROR    Unused extra options provided:{'gp_training_override_kwargs': False}
----------------------------------------------------------------------------------------------------- Captured log call ------------------------------------------------------------------------------------------------------ 
INFO     mloop:utilities.py:86 M-LOOP version 3.2.1
DEBUG    mloop:utilities.py:87 M-LOOP Logger configured.
DEBUG    mloop.interfaces:interfaces.py:82 Creating interface.
DEBUG    mloop.testing:testing.py:33 Setting default landscapes
INFO     mloop.interfaces:interfaces.py:40 Using the test interface with the experiment.
DEBUG    mloop.controllers:controllers.py:235 Controller init completed.
DEBUG    mloop.learners.1:learners.py:185 Learner init completed.
DEBUG    mloop.learners.1:learners.py:469 Random learner init completed.
DEBUG    mloop.learners.2:learners.py:185 Learner init completed.
ERROR    mloop.launchers:launchers.py:41 Unused extra options provided:{'gp_training_override_kwargs': False}
============================================================================================== 1 failed, 19 deselected in 3.37s ==============================================================================================

Update install instructions

Right now the docs suggest running python setup.py develop to perform an editable install, but the better approach is to use pip install -e <path_to_mloop>. See e.g. https://stackoverflow.com/questions/30306099/pip-install-editable-vs-python-setup-py-develop.

Similarly, the suggested testing command at the moment is python setup.py test which is now a deprecated approach (see pypa/setuptools#1684, pytest-dev/pytest#5534, pytest-dev/pytest#5546). The command pytest runs the tests, so that should be the suggested method.

Error when reproducing visualizations

I am trying to reproduce some visualizations as you have written in http://m-loop.readthedocs.io/en/latest/visualizations.html#reproducing-visualizations, but when I run the sample code with one of my archives, I get the following error:

/path/anaconda3/lib/python3.5/site-packages/sklearn/gaussian_process/gpr.py:308: UserWarning: Predicted variances smaller than 0. Setting those variances to 0.
  warnings.warn("Predicted variances smaller than 0. "
Traceback (most recent call last):
  File "revisualize.py", line 6, in <module>
    mlv.create_gaussian_process_learner_visualizations('M-LOOP_archives/learner_archive_2017-02-13_10-07.txt',file_type='txt')
  File "/path/M-LOOP/mloop/visualizations.py", line 360, in create_gaussian_process_learner_visualizations
    visualization.plot_all_minima_vs_cost()
  File "/path/M-LOOP/mloop/visualizations.py", line 489, in plot_all_minima_vs_cost
    if not self.has_all_minima:
AttributeError: 'GaussianProcessVisualizer' object has no attribute 'has_all_minima'

This happens both for optimizations that finished and for ones that were interrupted. How do I fix this? The archives are attached below.

controller_archive_2017-02-13_10-07.txt
learner_archive_2017-02-13_10-07.txt

Some visualization method names and plot labels are misleading

In particular:

GaussianProcessVisualizer.plot_hyperparameters_vs_run()
- This method actually plots fitted hyperparameter values as a function of fit number, not run number, so the name is a bit misleading. The x-axis label and docstring also say run number instead of fit number.
- The title of the plot accurately says '...vs fit number'.
- There is one fit per generation, so alternatively you could say it plots fitted hyperparameter values vs generation.
GaussianProcessLearner.plot_noise_level_vs_run() has the same issues as plot_hyperparameters_vs_run().
NeuralNetVisualizer.plot_losses()
- The loss is recorded after every 10 epochs.
- The docstring, x-axis label, and title say it is plotted as a function of "training run", but generally there are more than 10 epochs per training run.

To make things more clear I was thinking of making the following changes:

Rename the GaussianProcessVisualizer methods mentioned above to end in vs_fit().
- Update the x-axis labels and docstrings to say fit number.
- Mention that there is one fit per generation in the docstring.
- Keep the methods with the old "vs_run" names for now. Make them issue a deprecation warning then call the methods with the new names.
Change the docstring, x-axis label, and title in NeuralNetVisualizer.plot_losses() to say epochs instead of training run.
- Change the indices on the x-axis to be 0, 10, 20... instead of 0,1, 2...
- Mention in the docstring that the loss is recorded every 10 epochs.
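The "keep the old name, warn, then delegate" step in the plan above is a standard Python deprecation pattern. A sketch using a hypothetical VisualizerSketch class (not M-LOOP's actual GaussianProcessVisualizer):

```python
import warnings


class VisualizerSketch:
    """Hypothetical stand-in for GaussianProcessVisualizer."""

    def plot_hyperparameters_vs_fit(self):
        # New, accurately named method; would plot one point per fit.
        return 'plotted hyperparameters vs fit number'

    def plot_hyperparameters_vs_run(self, *args, **kwargs):
        # Old name kept for backwards compatibility: warn, then delegate.
        warnings.warn(
            'plot_hyperparameters_vs_run() is deprecated; use '
            'plot_hyperparameters_vs_fit() instead.',
            DeprecationWarning,
            stacklevel=2,  # Point the warning at the caller's line.
        )
        return self.plot_hyperparameters_vs_fit(*args, **kwargs)
```

Note that DeprecationWarning is hidden by default outside of __main__ and test runners, so FutureWarning may be preferable if end users should see the message.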

@charmasaur do you have any opinions on this? If not I'll just make the changes listed above, probably some time in the next week.

Small Differential Evolution Learner Docstring Typos

It looks like there are two small errors in the evolution_strategy docstring section of the DifferentialEvolutionLearner in learners.py. The strategy 'best1' is listed twice (one should be 'best2'), and the default is said to be 'best2' but appears to be set to 'best1' in DifferentialEvolutionLearner.__init__().

Add python 2 compatibility

Make M-LOOP compatible with python 2 and 3

Record global minima throughout learner process

I'm trying to study the evolution of the GP learner over time (i.e. how does the fit improve after 10, 50, 100... runs). My original method involved collecting the parameters from the controller_archive dictionary for each run where a lower cost was found than all previous runs:

(plotting cost of each run vs run number, blue points are runs, blue line marks current minimum cost and red markers represent some examples of new, better costs being found)

However this method starts to work less well when noise is added to the fit, as the cost function can't decrease as quickly, and I think it would be interesting to see how the learner's best guess changes over time.

The ideal solution for this would be to optionally run mloop.learners.find_global_minima() and record the output every x runs in the learner_archive or controller_archive. It should be optional because, even though it's only a quick search, it will probably cost performance if it runs often.

Currently, find_global_minima() is only run if the Learner attribute predict_global_minima_at_end is True, after optimisation. This change would mean the loop within Learner.run() is modified to include an if statement:

self.best_params_every_x_runs = []
for generation in range(self.generation_num):
    self.log.debug('Gaussian process learner generating parameter:' + str(self.params_count + 1))
    next_params = self.find_next_parameters()
    self.params_out_queue.put(next_params)
    # Proposed addition: every x generations, record the learner's current
    # best guess (record_minima_over_time and x are the new options below).
    if record_minima_over_time and generation % x == 0:
        self.best_params_every_x_runs.append(self.find_global_minima())
    if self.end_event.is_set():
        raise LearnerInterrupt()

Here, x is the repetition rate aka how often the global minimum is checked and record_minima_over_time is a boolean which is True if recording is enabled. This is from the GaussianProcessLearner but I assume the NeuralNetLearner would work similarly.

I'd be happy to implement this in principle but I haven't fully understood how the controller_archive process works yet, and I'm very new to open-source & github. I checked the current parameters in both controller & learner archive and they don't appear to contain this data already either. If there's a way to do this without a pull request, please let me know!

Issue with installation

I am following your installation instructions on Ubuntu 16.04. I have python 3.5.2 and fully functional tensorflow-gpu 1.2.0 installed on it, but I receive the following error:

Processing dependencies for M-LOOP==2.2.0
Searching for tensorflow>=1.2.0
Reading https://pypi.python.org/simple/tensorflow/
No local packages or working download links found for tensorflow>=1.2.0
error: Could not find suitable distribution for Requirement.parse('tensorflow>=1.2.0')

Incompatible with latest numpy

Bug Description
Since version 1.24, numpy no longer has the attributes np.float (or np.int, etc.). They have been deprecated since numpy 1.20. They are still used in M-LOOP, which causes errors with the latest numpy. See https://numpy.org/doc/stable/release/1.20.0-notes.html#using-the-aliases-of-builtin-types-like-np-int-is-deprecated for the deprecation note.

To Reproduce
Steps to reproduce the behavior:

Install numpy 1.24
Try to use learners.py
AttributeError: module 'numpy' has no attribute 'float'.

Expected behavior
M-LOOP should not use numpy attributes that no longer exist.

Additional context
This is trivial to fix (just replace np.float with float). I can create a PR if that helps.
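Because np.float and np.int were only ever aliases for Python's built-in float and int, the built-ins are drop-in replacements. A short sketch of the substitution (the arrays are illustrative, not taken from learners.py):

```python
import numpy as np

# Built-in types replace the removed aliases directly.
values = np.array([1, 2, 3], dtype=float)   # previously dtype=np.float
counts = np.zeros(4, dtype=int)             # previously dtype=np.int

# If a specific precision was intended, name the sized type explicitly.
values64 = np.asarray(values, dtype=np.float64)
print(values.dtype, counts.dtype, values64.dtype)
```

dtype=int maps to the platform's default integer, which on older numpy versions differs between Windows and Linux, so np.int64 may be the safer choice where the width matters.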

Simplify training archive arguments

Currently if you want to provide a training archive to an optimization using a Gaussian process, you have to pass the name of the file as gp_training_filename. On the other hand, if you want to do the same thing for the neural net you have to pass the name of the file as nn_training_filename. That's a bit annoying because you have to change the argument name if you change which learner you want to use, even though you're passing in the same file to fulfill the same role. For concreteness, I should mention that both learners are able to take an archive from a previous optimization run with any learner. So it's not the case that these arguments have different names because they need to be from optimizations run with different learners. We should deprecate these arguments and just use training_filename argument for the MachineLearner class (then update the docs accordingly).

Along these same lines, we should deprecate the gp_training_file_type and nn_training_file_type arguments since the file type should just be automatically determined from the file extension. Mentions of these options should then be removed from the documentation.
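Both proposals can be sketched together: accept the old learner-specific names with a deprecation warning, map them onto a single training_filename, and infer the file type from the extension. The function name and the set of supported types below are assumptions for illustration, not M-LOOP's API:

```python
import os
import warnings

# Assumed set of archive types; adjust to whatever M-LOOP actually supports.
SUPPORTED_FILE_TYPES = {'txt', 'mat', 'pkl'}


def resolve_training_file(training_filename=None, gp_training_filename=None,
                          nn_training_filename=None):
    """Hypothetical helper returning (filename, file_type).

    The new training_filename argument wins if several are given.
    """
    for old_name, old_value in (('gp_training_filename', gp_training_filename),
                                ('nn_training_filename', nn_training_filename)):
        if old_value is not None:
            warnings.warn(f'{old_name} is deprecated; use training_filename instead.',
                          DeprecationWarning, stacklevel=2)
            if training_filename is None:
                training_filename = old_value
    if training_filename is None:
        return None, None
    # Infer the type from the extension, e.g. 'archive.txt' -> 'txt'.
    file_type = os.path.splitext(training_filename)[1].lstrip('.').lower()
    if file_type not in SUPPORTED_FILE_TYPES:
        raise ValueError(f'Cannot infer archive type from {training_filename!r}')
    return training_filename, file_type
```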

Neural Net Overfits Data

Describe the bug
At the end of an optimization with the neural net learner, the predicted cross sections show signs of overfitting. I haven't done a full study on how this affects optimizations, but it leads to extra local minima in the predicted cost landscape, which may make it harder to pick new parameter values. It also makes it harder for the user to interpret the cross sections, as they have to guess which features are real and which are due to overfitting.

Expected behavior
The predicted landscapes from the neural net learner should fit the data to a reasonable degree; they should be relatively smooth without sharp wiggles and spikes.

Desktop:

OS: Windows 10
M-LOOP commit bc172f2 (soon after version 3.1.1)
TensorFlow 2.1.0

To Reproduce
This can be seen by running an optimization with the neural net learner. For convenience I've attached the files from an optimization that demonstrates this behavior: 2020-11-04_1667_raman_cool_n_stage_4.zip. I've also included some example code below to play around with the results in those attached files. I'll omit many of the plots generated but attach enough of them to demonstrate the issue. Generally I'll include one of the cross section plots generated by a single net and the plot that shows the min/max/average of the predictions of the different nets.

First, here is what the predicted cross sections look like at the end of the optimization:

# Set options (Set learner_archive to the correct path for your machine)
learner_archive = 'path/to/learner_archive_2020-11-04_22-53.txt'

# Load in data from archive.
import mloop.visualizations as mlv
learner_visualizer = mlv.create_learner_visualizer_from_archive(learner_archive)

# Show cross sections generated during optimization.
learner_visualizer.do_cross_sections()

Which yields four plots, including these two:

There are a lot of sharp edges, wiggles, and local minima which are likely due to overfitting. To check that we can delete the neural nets then recreate them with the same regularization coefficient. The only difference is that their weights will be reinitialized to random values then trained.

# Delete the existing nets and create/train new ones.
for net in learner_visualizer.neural_net:
    net.destroy()
learner_visualizer.create_neural_net()
for j in range(learner_visualizer.num_nets):
    learner_visualizer._fit_neural_net(j)

# Plot new fitted cross sections.
learner_visualizer.do_cross_sections()

This yields

The predicted cross sections are now much smoother, as the cost landscapes presumably are. As a further check we can make sure that training the nets more causes overfitting again. The following section took a while, ~5 minutes, to run on my machine.

n_extra_trainings = 100

for k in range(n_extra_trainings):
    for j in range(learner_visualizer.num_nets):
        learner_visualizer._fit_neural_net(j)

# Plot new fitted cross sections.
learner_visualizer.do_cross_sections()

The nets seem overfit again. For example, the blue curve now has a large spike which is not a real feature of the cost landscape. Running the training routines too many times leads to overfitting, and the training routines are run many times throughout an M-LOOP optimization.

Again, I haven't looked much into how this affects M-LOOP's attempts to optimize. This issue certainly does make it harder for the user to interpret the final cost landscapes though, e.g. to figure out which parameters are important. Maybe one solution would be to reinitialize the nets periodically during the optimization?
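The periodic-reinitialization idea can be sketched by combining the calls used in the snippets above (net.destroy(), create_neural_net(), _fit_neural_net(j)) into one schedule. This is a hypothetical helper: reset_interval is an assumed knob, and whether resets actually help the optimization is untested.

```python
def train_with_periodic_reset(learner, num_trainings, reset_interval):
    """Train all nets repeatedly, rebuilding them every `reset_interval` trainings.

    Returns the number of resets performed.
    """
    resets = 0
    for training in range(num_trainings):
        if training > 0 and training % reset_interval == 0:
            # Discard the trained weights and start again from random ones.
            for net in learner.neural_net:
                net.destroy()
            learner.create_neural_net()
            resets += 1
        for j in range(learner.num_nets):
            learner._fit_neural_net(j)
    return resets
```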
