automl / fanova Goto Github PK

This project forked from frank-hutter/fanova

Functional ANOVA

Python 79.16% Jupyter Notebook 10.54% Makefile 10.29%

fanova's Introduction

Fanova

Functional ANOVA: an implementation of the ICML 2014 paper "An Efficient Approach for Assessing Hyperparameter Importance" by Frank Hutter, Holger Hoos and Kevin Leyton-Brown.

Documentation

An 'ever growing' documentation for the Python bindings can be found at https://automl.github.io/fanova/

Note: To use fANOVA, please make sure to use SWIG version 3.0.12. If you experience problems, try using Anaconda or reinstalling pyrfr (#102).

fanova's Issues

Hyperparameter configuration space is unordered but is used as if it is ordered by data columns

ConfigurationSpace.get_hyperparameters() only performs
list(hyperparamsdict.values())
and so is unordered (https://github.com/automl/ConfigSpace/blob/master/ConfigSpace/configuration_space.py#L553).

Yet in your code you're looping over the hyperparameters and expecting the order of the items to match the order of the data columns:

fanova/fanova/fanova.py

Line 61 in 9a7b500

self.cs_params =self.cs.get_hyperparameters()

This leads to errors where the columns are assumed to have a different type than they actually have.
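
One way to sidestep the ordering assumption is to align the data columns to the hyperparameter order explicitly by name. This is a minimal sketch, assuming the caller knows the name of each data column; `reorder_columns` and its arguments are hypothetical and not part of fanova or ConfigSpace.

```python
from typing import List, Sequence


def reorder_columns(rows: Sequence[Sequence[float]],
                    column_names: Sequence[str],
                    hyperparameter_names: Sequence[str]) -> List[List[float]]:
    """Return rows with columns permuted to follow hyperparameter_names."""
    if sorted(column_names) != sorted(hyperparameter_names):
        raise ValueError("column names and hyperparameter names differ")
    # Position of every named column in the original data.
    index_of = {name: i for i, name in enumerate(column_names)}
    order = [index_of[name] for name in hyperparameter_names]
    return [[row[i] for i in order] for row in rows]


# Data columns are (l_rate, n_units); the space lists them the other way round.
rows = [[0.01, 64], [0.1, 128]]
aligned = reorder_columns(rows, ["l_rate", "n_units"], ["n_units", "l_rate"])
print(aligned)
```

With the columns aligned by name, the type of each column no longer depends on the iteration order of the hyperparameter dictionary.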

require matplotlib > 1.4.0

plot_pairwise_marginal() does not work with matplotlib versions (probably) before this commit matplotlib/matplotlib@d32e2ab
and crashes with AttributeError: 'module' object has no attribute '_string_to_bool'
For me it works with matplotlib 1.4.2, but not 1.4.0

no module resource when 'import pysmac'

After I have installed pysmac, I run 'branin_pysmac_example.py'. It shows me no module resource in line 8 of 'remote-smac.py'. How could we solve this issue? Thank you.

Fanova crashes on smac example

Hi,

I installed the newest SMAC and FANOVA versions and ran the smac branin example. Then I tried to visualize the results with FANOVA

cd /smac/smac-output/NoScenarioFile
python

from pyfanova.fanova import Fanova
f = Fanova("state-run394311859")
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "build/bdist.linux-x86_64/egg/pyfanova/fanova.py", line 97, in __init__
RuntimeError: Failed starting fanova. Did you start it from a SMAC state-run directory?

Then I tried my own SMAC setup and got a different error:

from pyfanova.fanova import Fanova
f = Fanova("state-run1")
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
java.lang.IllegalArgumentException: You must supply an intra instance objective
at ca.ubc.cs.beta.aeatk.runhistory.NewRunHistory.(NewRunHistory.java:161)
at net.aeatk.fanova.FAnovaExecutor.main(FAnovaExecutor.java:86)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "build/bdist.linux-x86_64/egg/pyfanova/fanova.py", line 97, in __init__
RuntimeError: Failed starting fanova. Did you start it from a SMAC state-run directory?

Edit: I now got it to run by adding "intra-obj = MEAN" to the scenario file.
I also noticed, that the packaged smac example from fanova crashes with a Deserialization error

Adding header option for parameter names in fanova_from_csv

Hi all,

I am using fANOVA with some data that is not obtained through pySMAC/SMAC (I use hyperopt). Therefore, to analyze the hyperparameters, I use the function fanova_from_csv.

I was wondering if it would be possible to add the option of including a header in the CSV with the names of the variables and use them in the different plots and printing commands instead of X0, X1, ...

I know that this option is the default when using fANOVA with pySMAC-generated data. It would also be great to have it available when importing the data manually.

Regards,

Jesus.
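
The request above could be sketched as follows. This is only an illustration of reading a header row and mapping the generic labels X0, X1, ... to readable names; `read_csv_with_header` is a hypothetical helper, not an existing option of fanova_from_csv.

```python
import csv
import io
from typing import List, Tuple


def read_csv_with_header(text: str) -> Tuple[List[str], List[List[float]]]:
    """Split CSV text into a list of column names and rows of floats."""
    reader = csv.reader(io.StringIO(text))
    names = [name.strip() for name in next(reader)]
    rows = [[float(cell) for cell in row] for row in reader if row]
    return names, rows


sample = "learning_rate,batch_size\n0.01,32\n0.1,64\n"
names, rows = read_csv_with_header(sample)
# Map the generic labels X0, X1, ... back to readable names for plots.
labels = {"X%d" % i: name for i, name in enumerate(names)}
print(labels)
```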

[question] Does this package support logscale hyperparameters

I.e., I would like the marginals to be computed on a different scale, giving more emphasis to the lower values and less to the higher values.

Cut/Cap at default performance

It would be nice, if fANOVA could cut/cap at the default (performance).
This way the results would only quantify improvement.
Of course this would only be possible if fANOVA is given a ConfigSpace object and does not have to construct it from the data.

java version check with OpenJDK

Hi!

Upon installing I stumbled upon a small problem with the regex that you use to check the java version.
The fix boils down to replace the string 'java' for '.*' in the regex ... so no matter how the actual jdk implementation (java, openjdk, ... ?) is named, the regex looks for the version string only.

I provided a diff:

diff --git a/pyfanova/fanova.py b/pyfanova/fanova.py
index c95bfa9..a194fb1 100644
--- a/pyfanova/fanova.py
+++ b/pyfanova/fanova.py
@@ -19,7 +19,7 @@ def check_java_version():
     if len(out) < 1:
         print("Failed checking Java version. Make sure Java version 7 or greater is installed
         return False
-    m = re.match('java version "\d+.(\d+)..*'.encode("utf-8"), out[0])
+    m = re.match('.*version "\d+.(\d+)..*'.encode("utf-8"), out[0])
     if m is None or len(m.groups()) < 1:
         print("Failed checking Java version. Make sure Java version 7 or greater is installed
         return False

Maybe you'd like to change that ...
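
A standalone sketch of the vendor-independent check, for illustration only. It assumes the first line of the version output has the form `<vendor> version "<major>.<minor>..."`; `parse_java_version` is a hypothetical helper and not the function in pyfanova.

```python
import re
from typing import Optional, Tuple

# Accept any vendor prefix ("java", "openjdk", ...) before the version string.
VERSION_RE = re.compile(rb'.*version "(\d+)\.(\d+).*')


def parse_java_version(line: bytes) -> Optional[Tuple[int, int]]:
    """Return the first two numeric version components, or None if absent."""
    match = VERSION_RE.match(line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


print(parse_java_version(b'java version "1.7.0_80"'))
print(parse_java_version(b'openjdk version "1.8.0_292"'))
```

Escaping the dots (`\.`) keeps the pattern from matching arbitrary characters between the version components.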

Add Python 2 vs. 3 compatibility notes in the README

Notes for self in case I need to install again in the future:

It's not clear whether Python 2 or Python 3 or both are supported.
Some of the dependencies are poorly documented and give the wrong impression that only Python 2 is supported. This is not the case, so we don't need to mess with a Python 2 virtualenv.
In fact, as far as I can tell, only Python 3 is supported.
Once we know this, the installation is simple (tested with Miniconda + Python 3.6):
- conda install <typical packages, e.g. numpy>
- conda install cython swig
- pip install ConfigSpace # ConfigSpace 0.4.6 was installed and works
- pip install pyrfr # pyrfr 0.8.0 was installed and works
- pip install fanova # fanova 2.0.3 was installed and works

Accessing the variance value

Is it possible from within the library to access the variance values obtained during the decomposition? In particular, accessing the total variance would be nice.

wrong x-limits for marginal plots

Hi,
I want to generate the marginal plots of the following ConfigSpace with the plot_marginal() function:
burn_in, Type: UniformFloat, Range: [0.0, 0.8], Default: 0.29999999999999999
l_rate, Type: UniformFloat, Range: [1e-06, 0.1], Default: 0.01, on log-scale
mdecay, Type: UniformFloat, Range: [0.0, 1.0], Default: 0.050000000000000003
n_units_1, Type: UniformInteger, Range: [16, 512], Default: 64, on log-scale
n_units_2, Type: UniformInteger, Range: [16, 512], Default: 64, on log-scale

However, the x-limits of the plots seem to be wrongly scaled for hyperparameters that are on a log-scale. For instance, the generated plot for "l_rate" only shows the marginals between 1e-6 and 3*1e-6 instead of 1e-6 and 1e-1. Can I somehow manually set the x_limits? For non-log scaled hyperparameters the plots look fine.

Handling conditionals

Currently fANOVA is not handling conditionals.
I.e., if a parent parameter activates or deactivates a child parameter, this is currently not handled.

Maybe these conditionals could be handled like pairwise marginals.

TypeError: unorderable types: dict() < dict() in get_most_important_pairwise_marginals

When running the example for the pyrfr implementation I get the above mentioned error.

It is thrown in line 336 of fanova.py, in the get_most_important_pairwise_marginals method

The whole error message:
Traceback (most recent call last):
  File "examples/fanova_example.py", line 35, in <module>
    best_margs = f.get_most_important_pairwise_marginals(n=10)
  File "/home/biedenka/envs/imp/local/lib/python3.5/site-packages/fanova-2.0-py3.5.egg/fanova/fanova.py", line 366, in get_most_important_pairwise_marginals
TypeError: unorderable types: dict() < dict()
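
The report does not state the cause. One plausible cause, assumed here, is that tuples containing a dict are sorted directly: in Python 3, a tie on the numeric field makes the comparison fall through to the dicts, which cannot be ordered. The sketch below reproduces that situation with made-up entries and shows a sort on the numeric field only.

```python
# Hypothetical (importance, info) entries; two of them tie on importance.
entries = [
    (0.25, {"pair": ("x0", "x1")}),
    (0.25, {"pair": ("x0", "x2")}),
    (0.75, {"pair": ("x1", "x2")}),
]

try:
    sorted(entries)  # the tie falls through to comparing the dicts
    plain_sort_failed = False
except TypeError:
    plain_sort_failed = True

# Sorting on the numeric field alone never compares the dicts.
ranked = sorted(entries, key=lambda entry: entry[0], reverse=True)
print(plain_sort_failed, [entry[1]["pair"] for entry in ranked])
```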

pairwise plot: ignores logscale of hyperparams

The individual plots show hyperparameters that are on a log scale on a log scale. The pairwise plot always shows them in normal space.

Discrete & Categorical Variables

Hi,

I'm successfully using Fanova with continuous variables, it's doing a great job. I'm loading data coming from scikit-learn random grid search results, with the FanovaFromCSV function; unfortunately I can't find a way to declare some hyperparameters as discrete or categorical when loading data.

Can you give me a pointer to do that ?

update pypi

Integers on logscale result in empty plots

In my configspace I have some parameters like the following:
sp-first-restart [25,3200] [100]il
Per the fANOVA result, it is rather an unimportant parameter with a value at around 0.0001.
There should however be no reason for the plot to be empty.

The data with which I encountered this problem are part of the PIMP package examples.

Bug in fanova/visualizer.py

Hello,

I ran fanova on spear-qcp scenario, and got an error in file fanova/visualizer.py. There seems to be a bug in lines 166, 167, 169, 171: variables "min" and "max" should be changed to "lower_bound" and "upper_bound"

Regards,
Nguyen

add more docstrings

A lot of docstrings for the functions are missing.

Conditional parameter clauses cause an error in fANOVA

Hello,

I am using fANOVA with the data generated by pySMAC 0.9.

param.pcs file is as in the following:

neurons3 integer [10, 200] [11]
neurons2 integer [10, 200] [11]
num_layers {1,2,3}[1]
neurons2 | num_layers in {2,3}
neurons3 | num_layers in {3}

When I run fANOVA, I get the following error:

SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
ca.ubc.cs.beta.aeatk.parameterconfigurationspace.ParameterConfigurationStringFormatException: Error processing Parameter Configuration String "0.007853403141361256,0.007853403141361256,0.007853403141361256,0.16666666666666666" in format: ARRAY_STRING_SYNTAX please check the arguments (and nested exception) and try again
    at ca.ubc.cs.beta.aeatk.parameterconfigurationspace.ParameterConfigurationSpace.getParameterConfigurationFromString(ParameterConfigurationSpace.java:1500)
    at ca.ubc.cs.beta.aeatk.parameterconfigurationspace.ParameterConfigurationSpace.getParameterConfigurationFromString(ParameterConfigurationSpace.java:1313)
    at ca.ubc.cs.beta.aeatk.state.legacy.LegacyStateDeserializer.<init>(LegacyStateDeserializer.java:258)
    at ca.ubc.cs.beta.aeatk.state.legacy.LegacyStateFactory.getStateDeserializer(LegacyStateFactory.java:129)
    at net.aeatk.fanova.FAnovaExecutor.main(FAnovaExecutor.java:91)
Caused by: java.lang.IndexOutOfBoundsException: Index: -1, Size: 3
    at java.util.LinkedList.checkElementIndex(LinkedList.java:553)
    at java.util.LinkedList.get(LinkedList.java:474)
    at ca.ubc.cs.beta.aeatk.parameterconfigurationspace.ParameterConfiguration.get(ParameterConfiguration.java:222)
    at ca.ubc.cs.beta.aeatk.parameterconfigurationspace.ParameterConfiguration._getActiveParameters(ParameterConfiguration.java:1005)
    at ca.ubc.cs.beta.aeatk.parameterconfigurationspace.ParameterConfiguration.cleanUp(ParameterConfiguration.java:908)
    at ca.ubc.cs.beta.aeatk.parameterconfigurationspace.ParameterConfiguration.getActiveParameters(ParameterConfiguration.java:948)
    at ca.ubc.cs.beta.aeatk.parameterconfigurationspace.ParameterConfigurationSpace.getParameterConfigurationFromString(ParameterConfigurationSpace.java:1472)
    ... 4 more
Traceback (most recent call last):
  File "fanova_results.py", line 9, in <module>
    f = Fanova("Desktop/JapVowel/out/scenario/state-run0")
  File "build/bdist.macosx-10.12-x86_64/egg/pyfanova/fanova.py", line 97, in __init__
RuntimeError: Failed starting fanova. Did you start it from a SMAC state-run directory?

What I observed is that this is caused by the conditional parameter clauses in the param.pcs file. If I delete the lines where the conditional parameter clauses are declared, leaving the param.pcs file as follows, I do not get any error.

neurons3 integer [10, 200] [11]
neurons2 integer [10, 200] [11]
num_layers {1,2,3}[1]

Could you help me through that? Is it safe to delete conditional parameter clauses?

Thank you for your help!

Support for Mac OSX?

Is the library supported on Mac systems? A student of mine, @abhinavs95, is having problems running fanova on a Mac (installing is no problem).

@abhinavs95, can you please share the stacktrace?

(If not, that would be good to know, as then we can arrange a linux machine for Abhinav)

Visualizing pair wise marginals not working

The visualization in this example in RoBO does not work for pairwise marginal case.

I get the following error if I add vis.plot_pairwise_marginal([0,1]) at the end of the above program.

Wrong imports in the documentation

In the quick start guide it says:

import fanova
...
f = Fanova(X,Y)

However, the Fanova object does not exist. I think it should be rather:

from fanova import fANOVA
f = fANOVA(X, y)

The same for the visualizer:

import visualizer

Should be:

import fanova.visualizer

Best,
Aaron

Issue in visualizer.py when config_on_hypercube=True

In visualizer.py:

function generate_marginal, for continuous parameter, I guess that "grid" should be built within [0,1] (line 187) if "config_on_hypercube=True", before it is passed to "fanova.marginal_mean_variance_for_values", since fanova is trained on [0,1] scale data points in this case.
And the same issue in function generate_pairwise_marginal
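
The suggestion above can be sketched as follows: build the grid on [0, 1] for the model and convert to the original range only for the axis values. The linear and natural-log mappings here are assumptions for illustration; the exact transformation used by ConfigSpace (in particular for integer parameters) may differ, and `unit_grid` and `to_original_scale` are hypothetical names.

```python
import math
from typing import List


def unit_grid(resolution: int) -> List[float]:
    """Evenly spaced points in [0, 1], the scale the model was trained on."""
    return [i / (resolution - 1) for i in range(resolution)]


def to_original_scale(u: float, lower: float, upper: float,
                      log: bool = False) -> float:
    """Map a unit-interval value back to the parameter range for axis values."""
    if log:
        log_lower, log_upper = math.log(lower), math.log(upper)
        return math.exp(log_lower + u * (log_upper - log_lower))
    return lower + u * (upper - lower)


grid = unit_grid(5)  # values that would be passed to the model
ticks = [to_original_scale(u, 0.0, 0.8) for u in grid]  # values for the x-axis
print(grid, ticks)
```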

fANOVA with pySMAC 0.9 , problem with plot_marginal and plot_categorical_marginal function

Hello,

I am using fANOVA with the data generated by pySMAC 0.9.

What I observe is as in the following :
I have param.pcs file generated by pySMAC as in the following:
popsize integer [1, 30000] [150]l
selection categorical {:tournament, :lexicase} [:lexicase]

When I want to plot marginal of popsize feature:
vis = Visualizer(f)
plot = vis.plot_marginal("popsize")
I got the following error even though it is an integer parameter:
Parameter popsize is not a continuous or integer parameter!

I observed that if I change the line in param.pcs as follows, it works:
popsize integer [1, 30000] [150]l -------- popsize [1, 30000] [150]l

So basically, I need to delete the type of parameter in param.pcs file in order to make it work. plot_categorical_marginal function does not work either unless I do the following changes:

selection categorical {:tournament, :lexicase} [:lexicase] ------- selection {:tournament, :lexicase} [:lexicase]

So, is it safe to make those changes for each param.pcs file that is produced by pySMAC to make it work with fANOVA? Or am I missing something?

Thanks a lot for your help,
Best Regards,
Ecem

support for smac_2_10

When applying pyfanova on HPOlib-pickles and a pcs file valid for SMAC 2.10.x it raises RuntimeError: Failed starting fanova. Did you start it from a SMAC state-run directory?. Internally it crashes, because it cannot read forbidden parameters. Here is the complete log:

[INFO] [15:39:41:HPOlib.doFanovaPlots] Starting pyfanova .. this might take a bit
[INFO] [15:39:41:HPOlib.doFanovaPlots] Using params: {'improvement_over': 'NOTHING', 'heap_size': 1024, 'split_min': 5, 'seed': 42, 'num_trees': 30}
java.lang.IllegalArgumentException: Line specifying forbidden parameters contained an name value pair that could not be parsed: [margin > 0.8*f_0] in line: {margin > 0.8*f_0}
    at ca.ubc.cs.beta.aeatk.parameterconfigurationspace.ParameterConfigurationSpace.parseForbiddenLine(ParameterConfigurationSpace.java:598)
    at ca.ubc.cs.beta.aeatk.parameterconfigurationspace.ParameterConfigurationSpace.<init>(ParameterConfigurationSpace.java:435)
    at ca.ubc.cs.beta.aeatk.parameterconfigurationspace.ParameterConfigurationSpace.<init>(ParameterConfigurationSpace.java:247)
    at ca.ubc.cs.beta.aeatk.parameterconfigurationspace.ParameterConfigurationSpace.<init>(ParameterConfigurationSpace.java:234)
    at ca.ubc.cs.beta.aeatk.parameterconfigurationspace.ParamFileHelper.getParamFileParser(ParamFileHelper.java:41)
    at ca.ubc.cs.beta.aeatk.parameterconfigurationspace.ParamFileHelper.getParamFileParser(ParamFileHelper.java:29)
    at ca.ubc.cs.beta.aeatk.parameterconfigurationspace.ParamConfigurationSpaceOptions.getParamConfigurationSpace(ParamConfigurationSpaceOptions.java:184)
    at ca.ubc.cs.beta.aeatk.algorithmexecutionconfiguration.AlgorithmExecutionOptions.getAlgorithmExecutionConfig(AlgorithmExecutionOptions.java:159)
    at ca.ubc.cs.beta.aeatk.algorithmexecutionconfiguration.AlgorithmExecutionOptions.getAlgorithmExecutionConfigSkipDirCheck(AlgorithmExecutionOptions.java:103)
    at net.aeatk.fanova.FAnovaExecutor.main(FAnovaExecutor.java:83)
Traceback (most recent call last):
  File "---/bin/HPOlib-pyFanova", line 5, in <module>
    pkg_resources.run_script('HPOlib==0.2.0dev', 'HPOlib-pyFanova')
  File "---/site-packages/pkg_resources.py", line 488, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "---/site-packages/pkg_resources.py", line 1354, in run_script
    execfile(script_filename, namespace, namespace)
  File "---/site-packages/HPOlib-0.2.0dev-py2.7.egg/EGG-INFO/scripts/HPOlib-pyFanova", line 26, in <module>
    doFanovaPlots.main()
  File "---/lib/python2.7/site-packages/HPOlib-0.2.0dev-py2.7.egg/HPOlib/Plotting/doFanovaPlots.py", line 92, in main
    f = FanovaFromHPOLib(param_file=args.pcsfile, pkls=unknown, **fanova_params)
  File "build/bdist.linux-x86_64/egg/pyfanova/fanova_from_hpolib.py", line 41, in __init__
  File "build/bdist.linux-x86_64/egg/pyfanova/fanova.py", line 97, in __init__
RuntimeError: Failed starting fanova. Did you start it from a SMAC state-run directory?

ConfigSpace bounds for Float and Integer Parameters should be between 0 and 1

The values of the parameters of the ConfigSpace object are transformed to fall in the interval [0., 1.]
(see _inverse_transform of UniformFloatHyperparameter)
Not just Float Parameters are transformed to [0., 1.0] but also Integer Parameters
(see _inverse_transform of UniformIntegerHyperparameter).

Consequently the bounds read in fanova.__init__ are outside [0.,1.0].
They should just be set to 0. and 1.0 respectively

Prevent crashes of fanova

fanova crashes with wrong inputs.

For example:

from pyfanova.fanova import Fanova
f = Fanova(".")
from pyfanova.visualizer import Visualizer
vis = Visualizer(f)
plot = vis.plot_marginal("max-feature-time")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "build/bdist.linux-x86_64/egg/pyfanova/visualizer.py", line 119, in plot_marginal
AssertionError: param max-feature-time not known
vis = Visualizer(f)
Exception socket.error: (32, 'Broken pipe') in <bound method Fanova.__del__ of <pyfanova.fanova.Fanova object at 0x7f9887bac310>> ignored

smac rejects some smac options in scenario file and fanova crashes

problem, see title.

smac/fanova output:

com.beust.jcommander.ParameterException: Unknown option: --cli-log-all-calls
11:16:07.813 [main] ERROR net.aeatk.fanova.FAnovaExecutor - Unknown option: --cli-log-all-calls

proposed solution: automatically remove all unnecessary options in the scenario file

print number of datapoints used for training

It would be interesting to know how many configurations and datapoints were used for training.

missing dependencies

The following libraries are not declared as dependencies:
pyrfr
ConfigSpace

add function: get_most_important_params(...)

the function print_all_marginals() prints all marginals but does not return them :-(
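
A sketch of what such a function could return, assuming the importances are already available as a mapping from parameter name to a number. `most_important_params` and the values below are hypothetical and only illustrate returning the ranking instead of printing it.

```python
from typing import Dict, List, Tuple


def most_important_params(importances: Dict[str, float],
                          n: int = 3) -> List[Tuple[str, float]]:
    """Return the n parameters with the largest importance, highest first."""
    ranked = sorted(importances.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n]


# Made-up importance values, purely for illustration.
example = {"l_rate": 0.42, "burn_in": 0.05, "mdecay": 0.13}
top = most_important_params(example, n=2)
print(top)
```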

different smac runs

automatically merge different state_run directories

Adding option to define categorical variables when using fanova_from_csv()

Following the same reasoning as I commented here #35, it would be great to have an option when importing data using the fanova_from_csv() to define which variables are categorical.

At the moment, because this definition can't be made, when using fanova.f.get_categorical_marginal_for_value(param,value) or Visualizer.plot_categorical_marginal(param) I get some errors.

EDIT: I just noticed that this issue was referenced in #30 9 months ago. Any idea when the next version will be released?

pairwise plot: Process finished with exit code 139 (interrupted by signal 11: SIGSEGV)

Sorry for the brief description, I can generate an MWE later. The above error occurs when I try to execute the following line of code (and it is given a numeric and a nominal hyperparameter)

vis.create_most_important_pairwise_marginal_plots()

(Workaround is of course to manually pass the numeric params as first argument, in case more people have this problem)

Division by Zero error

This error occurs every now and then. I can provide you with mwe, but github does not accept the files

Traceback (most recent call last):
  File "/home/vanrijn/projects/openml-pimp/openmlpimp/examples/run_on_openml.py", line 89, in retrieve_results
    importance = evaluator.quantify_importance([idx])[(idx,)]['total importance']
  File "/home/vanrijn/projects/pythonvirtual/pimp/lib/python3.5/site-packages/fanova/fanova.py", line 300, in quantify_importance
    fractions_total = np.array([self.V_U_total[sub_dims][t]/self.trees_total_variance[t] for t in range(self.n_trees)])
  File "/home/vanrijn/projects/pythonvirtual/pimp/lib/python3.5/site-packages/fanova/fanova.py", line 300, in <listcomp>
    fractions_total = np.array([self.V_U_total[sub_dims][t]/self.trees_total_variance[t] for t in range(self.n_trees)])
ZeroDivisionError: float division by zero

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/vanrijn/projects/openml-pimp/openmlpimp/examples/run_on_openml.py", line 165, in <module>
    execute(task_folder, args.flow_id, task_id, args)
  File "/home/vanrijn/projects/openml-pimp/openmlpimp/examples/run_on_openml.py", line 138, in execute
    result = retrieve_results(evaluator, configspace)
  File "/home/vanrijn/projects/openml-pimp/openmlpimp/examples/run_on_openml.py", line 94, in retrieve_results
    raise Exception('fANOVA crashed with a "float division by zero" error. Dumping the data to disk')
Exception: fANOVA crashed with a "float division by zero" error. Dumping the data to disk
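
The traceback shows a per-tree division by the tree's total variance. One possible guard, sketched here as an assumption and not as the library's fix, is to skip trees whose total variance is zero before averaging. `variance_fractions` is a hypothetical helper with made-up inputs.

```python
from typing import List, Sequence


def variance_fractions(v_u: Sequence[float],
                       total_variance: Sequence[float]) -> List[float]:
    """Per-tree fraction v_u[t] / total_variance[t], skipping zero-variance trees."""
    return [v / total for v, total in zip(v_u, total_variance) if total != 0.0]


# The second tree has zero total variance and is left out of the average.
fractions = variance_fractions([0.2, 0.0, 0.1], [0.4, 0.0, 0.5])
mean_fraction = sum(fractions) / len(fractions) if fractions else float("nan")
print(fractions, mean_fraction)
```

If every tree has zero variance the result is NaN, which the caller would still need to handle explicitly.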

pdf files

Add function that automatically creates a pdf file with all plots and marginals

outdated quick start guide

https://automl.github.io/fanova/manual.html

The example below should work

import fanova
import numpy as np
import os
path = os.getcwd()
X = np.loadtxt(path + '/examples/example_data/online_lda/online_lda_features.csv', delimiter=",")
Y = np.loadtxt(path + '/examples/example_data/online_lda/online_lda_responses.csv', delimiter=",")
f = fanova.fANOVA(X,Y)
print(f.quantify_importance((0, )))

Prevent capping at quality scenarios

Aaron:
"I think the problem is that fanova still uses an old version of SMAC (or aclib). This version has a strange bug: when quality is specified as the objective function, the cutoff time is applied to the quality. For details, better ask Frank again.
I have wanted to remove the SMAC dependencies from fanova for a long time, but I have not found the time yet. In the meantime you can fix the problem fairly easily by setting the cutoff time as high as possible, or at least larger than the largest y value. (I simply set it to cutoff_time = 100000000 and it seems to work.)"

Frank: "It came from the fact that Steve only ever looked at runtime as the response, and therefore capped the response when reading it in whenever it was above the cutoff time."

issue with pyrfr 0.5 and 0.6

testing on ubuntu 14.04 and a virtualenv
Python 3.4.3
SWIG Version 3.0.2 (ubuntu repo)
pyrfr versions 0.5.0 and 0.6.0 fail with the error below. pyrfr version 0.4.0 works fine.

In [9]: f = fanova.fANOVA(X,Y)
---------------------------------------------------------------------------
AttributeError     Traceback (most recent call last)
<ipython-input-9-d48bae3fcbaf> in <module>()
----> 1 f = fanova.fANOVA(X,Y)

~/fanova/fanova.py in __init__(self, X, Y, config_space, n_trees, seed, bootstrapping, points_per_tree, max_features, min_samples_split, min_samples_leaf, max_depth, cutoffs)
     93         forest = reg.fanova_forest()
     94         forest.options.num_trees = n_trees
---> 95         forest.options.seed = np.random.randint(2**31-1) if seed is None else seed
     96         forest.options.do_bootstrapping = bootstrapping
     97         forest.options.num_data_points_per_tree = X.shape[0] if points_per_tree is None else points_per_tree

~/.fanova/lib/python3.5/site-packages/pyrfr-0.6.0-py3.5-linux-x86_64.egg/pyrfr/regression.py in set_attr(self, name, value)
     93             set(self, name, value)
     94         else:
---> 95             raise AttributeError("You cannot add attributes to %s" % self)
     96     return set_attr
     97 

AttributeError: You cannot add attributes to <pyrfr.regression.forest_opts; proxy of <Swig Object of type 'rfr::forests::forest_options< num_t,response_t,index_t > *' at 0x7f6cd015afc0> >

Creating the n most important pairwise plots!

The following line causes an issue

fanova/fanova/visualizer.py

Line 282 in 9860cdc

 most_important_pairwise_marginals = self.fanova.get_most_important_pairwise_marginals(n) 

get_most_important_pairwise_marginals has two optional parameters, where n is the second one. As the first optional parameter expects a list of strings and the n parameter is an int it always throws the following error:
TypeError: 'int' object is not subscriptable

If the line gets changed to "...get_most_important_pairwise_marginals(n=n)" everything should work fine.
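
The pitfall can be reproduced with a toy function that has the same argument order (an optional list first, then n). `top_pairs` is a hypothetical stand-in, not the fanova method, and the exact TypeError message differs from the one reported.

```python
from itertools import combinations
from typing import List, Optional, Tuple


def top_pairs(params: Optional[List[str]] = None,
              n: int = 10) -> List[Tuple[str, str]]:
    """Toy stand-in with the same argument order: params first, then n."""
    if params is None:
        params = ["x0", "x1", "x2"]
    return list(combinations(params, 2))[:n]


try:
    top_pairs(2)  # the int is bound to params, not to n
    positional_failed = False
except TypeError:
    positional_failed = True

pairs = top_pairs(n=2)  # the keyword form binds the value to n
print(positional_failed, pairs)
```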

heap space

Start the java process with an user specified amount of heap space

"ValueError: Axis limits cannot be NaN or Inf" caused by all parameter importances being nan

When trying to plot single marginals I get the following error:

vis.plot_marginal(self.cs.get_idx_by_hyperparameter_name(param), show=False)
File "/home/fr/fr_fr/fr_ab1184/miniconda3/envs/pimp/lib/python3.6/site-packages/fanova/visualizer.py", line 241, in plot_marginal
plt.ylim([min_y, max_y])
File "/home/fr/fr_fr/fr_ab1184/miniconda3/envs/pimp/lib/python3.6/site-packages/matplotlib/pyplot.py", line 1548, in ylim
ret = ax.set_ylim(*args, **kwargs)
File "/home/fr/fr_fr/fr_ab1184/miniconda3/envs/pimp/lib/python3.6/site-packages/matplotlib/axes/_base.py", line 3225, in set_ylim
bottom = self._validate_converted_limits(bottom, self.convert_yunits)
File "/home/fr/fr_fr/fr_ab1184/miniconda3/envs/pimp/lib/python3.6/site-packages/matplotlib/axes/_base.py", line 2836, in _validate_converted_limits
raise ValueError("Axis limits cannot be NaN or Inf")
ValueError: Axis limits cannot be NaN or Inf

However the issue is already caused way earlier as all parameters "total importance"(s) are nan.

Interestingly, empty plots for integer parameters are generated until the visualizer is tasked to plot the results for a categorical. I.e.

actavgmax [0,150][112]i # glue average max limit for dyn acts
actdblarithlim [0,15][3]i # glue lim for dbl arith increase
actgeomlim [0,15][2]i # glue limit for geometric increase
actgsdul [0,150][7]i # glue useless standard deviation limit
acts {0,1,2}[2] # activity based reduction: 0=no,1=enable,2=dyn

empty plots for the first four are generated and acts then causes an error in the visualizer.

Fanova gives an error for SLF4J in example data

Hello,

I have already generated data with pySMAC, and now I would like to use fANOVA on it.

But I get an error when running fanova on the example data:

from pyfanova.fanova import Fanova
f = Fanova("fanova/example/online_lda")

ERROR:
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.

Do you have any idea how I could fix the problem?

Thanks a lot in advance.
Ecem

ValueError when plotting parameters on log-scale due to imprecision

The parameter that caused the issue is given in the pcs as follows:

barrier_limits_growth [1000000, 100000000000000] [1000000000000]l

According to fANOVA (given the input data), its single total importance is less than 0.000. So computing the marginal is no problem.

However when trying to plot the parameter, the following code just throws a ValueError:

fanova/fanova/visualizer.py

Line 146 in 95ade37

def generate_marginal(self, param, resolution=100):

...

fanova/fanova/visualizer.py

Lines 176 to 185 in 95ade37

if log:
    # JvR: my conjecture is that ConfigSpace uses the natural logarithm
    base = np.e
    log_lower = np.log(lower_bound) / np.log(base)
    log_upper = np.log(upper_bound) / np.log(base)
    grid = np.logspace(log_lower, log_upper, resolution, endpoint=True, base=base)
    if abs(grid[0] - lower_bound) > 0.00001:
        raise ValueError()
    if abs(grid[-1] - upper_bound) > 0.00001:
        raise ValueError()

For a second set of data, the same parameter causes the visualizer to crash.

Just plugging in the lower and upper values and using base and resolution as stated in the code snippet given above:

In [1]: import numpy as np

In [2]: l, u = 1000000, 100000000000000

In [3]: ll, lu = np.log(l)/np.log(np.e), np.log(u)/np.log(np.e)

In [4]: grid = np.logspace(ll, lu, 100, endpoint=True, base=np.e)

In [5]: print(grid[0], grid[-1])
1000000.0 1e+14

In [6]: print(abs(grid[0]-l), abs(grid[-1]-u))
1.16415321827e-09 0.046875

However

In [7]: u-10**14
Out[7]: 0

I'm not sure what the best way of handling this imprecision would be or if this test of having the correct grid is even necessary.
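
One option, sketched here as a suggestion and not as the project's decision, is to compare the grid endpoints with a relative tolerance so the allowed error scales with the size of the bound. The sketch uses only the standard library as a stand-in for np.logspace; `log_grid` is a hypothetical helper.

```python
import math


def log_grid(lower: float, upper: float, resolution: int) -> list:
    """Points spaced evenly in natural-log space between lower and upper."""
    log_lower, log_upper = math.log(lower), math.log(upper)
    step = (log_upper - log_lower) / (resolution - 1)
    return [math.exp(log_lower + i * step) for i in range(resolution)]


lower, upper = 1000000, 100000000000000
grid = log_grid(lower, upper, 100)

# A relative tolerance scales with the magnitude of the bound, so a small
# absolute rounding error at 1e14 still counts as a match.
ok_lower = math.isclose(grid[0], lower, rel_tol=1e-9)
ok_upper = math.isclose(grid[-1], upper, rel_tol=1e-9)
print(ok_lower, ok_upper)
```

Another option would be to overwrite the first and last grid values with the exact bounds, which makes the check unnecessary.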

make single parameter importance usable with a single parameter

f = fanova.fANOVA(X,y, n_trees=32,bootstrapping=True)

f.quantify_importance(0)

should be equivalent to

f.quantify_importance((0,))
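
The requested equivalence could be achieved by normalizing the argument before use. A minimal sketch, assuming dimensions are given either as one integer or as an iterable of integers; `normalize_dims` is a hypothetical helper, not part of fanova.

```python
from numbers import Integral
from typing import Iterable, Tuple, Union


def normalize_dims(dims: Union[int, Iterable[int]]) -> Tuple[int, ...]:
    """Accept a single index or an iterable of indices and return a tuple."""
    if isinstance(dims, Integral):
        return (int(dims),)
    return tuple(dims)


print(normalize_dims(0), normalize_dims((0,)), normalize_dims([0, 1]))
```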

Python 2.7.8 |Anaconda 2.3.0 (64-bit)| (default, Jul  2 2014, 15:12:11) [MSC v.1500 64 bit (AMD64)] on win32
>>> from pyfanova.fanova import Fanova
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "build\bdist.win-amd64\egg\pyfanova\fanova.py", line 13, in <module>
ImportError: No module named ParameterConfigSpace.config_space

I'm not sure whether this constitutes a bug or a configuration mistake.
kind regards
