convexengineering / gpfit

Fit posynomials to data

Home Page: http://gpfit.readthedocs.io/en/latest/

License: MIT License

Python 99.12% Shell 0.88%

gpfit's People

tonystao hrishikeshvganu giserh nanjekyejoannah vishalbelsare shaynababe

gpfit's Issues

unit tests are failing

Traceback below.

======================================================================
FAIL: test_MA (t_print_fit.t_print_MA)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/whoburg/MIT/dev/gpfit/gpfit/tests/t_print_fit.py", line 17, in test_MA
    'w = 8.1e+03 * (u_0)**10 * (u_1)**11 * (u_2)**12'])
AssertionError: Lists differ: ['w = 2.72 * (u_1)**2 * (u_2)*... != ['w = 2.72 * (u_0)**2 * (u_1)*...

First differing element 0:
w = 2.72 * (u_1)**2 * (u_2)**3 * (u_3)**4
w = 2.72 * (u_0)**2 * (u_1)**3 * (u_2)**4

- ['w = 2.72 * (u_1)**2 * (u_2)**3 * (u_3)**4',
-  'w = 148 * (u_1)**6 * (u_2)**7 * (u_3)**8',
-  'w = 8.1e+03 * (u_1)**10 * (u_2)**11 * (u_3)**12']
+ ['w = 2.72 * (u_0)**2 * (u_1)**3 * (u_2)**4',
+  'w = 148 * (u_0)**6 * (u_1)**7 * (u_2)**8',
+  'w = 8.1e+03 * (u_0)**10 * (u_1)**11 * (u_2)**12']

======================================================================
FAIL: test_SMA (t_print_fit.t_print_SMA)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/whoburg/MIT/dev/gpfit/gpfit/tests/t_print_fit.py", line 32, in test_SMA
    '    + 2 * (u_0)**0.769 * (u_1)**0.846 * (u_2)**0.923'])
AssertionError: Lists differ: ['w**0.0769 = 1.08 * (u_1)**0.... != ['w**0.0769 = 1.08 * (u_0)**0....

First differing element 0:
w**0.0769 = 1.08 * (u_1)**0.154 * (u_2)**0.231 * (u_3)**0.308
w**0.0769 = 1.08 * (u_0)**0.154 * (u_1)**0.231 * (u_2)**0.308

- ['w**0.0769 = 1.08 * (u_1)**0.154 * (u_2)**0.231 * (u_3)**0.308',
-  '    + 1.47 * (u_1)**0.462 * (u_2)**0.538 * (u_3)**0.615',
-  '    + 2 * (u_1)**0.769 * (u_2)**0.846 * (u_3)**0.923']
+ ['w**0.0769 = 1.08 * (u_0)**0.154 * (u_1)**0.231 * (u_2)**0.308',
+  '    + 1.47 * (u_0)**0.462 * (u_1)**0.538 * (u_2)**0.615',
+  '    + 2 * (u_0)**0.769 * (u_1)**0.846 * (u_2)**0.923']

======================================================================
FAIL: test_ISMA (t_print_fit.t_print_ISMA)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/whoburg/MIT/dev/gpfit/gpfit/tests/t_print_fit.py", line 47, in test_ISMA
    '    + (1.82/w**0.0667) * (u_0)**0.667 * (u_1)**0.733 * (u_2)**0.8'])
AssertionError: Lists differ: ['1 = (1.08/w**0.0769) * (u_1)... != ['1 = (1.08/w**0.0769) * (u_0)...

First differing element 0:
1 = (1.08/w**0.0769) * (u_1)**0.154 * (u_2)**0.231 * (u_3)**0.308
1 = (1.08/w**0.0769) * (u_0)**0.154 * (u_1)**0.231 * (u_2)**0.308

Diff is 803 characters long. Set self.maxDiff to None to see it.

----------------------------------------------------------------------
Ran 47 tests in 0.136s

FAILED (failures=3)

I've been trying to fit some compressor maps and am getting what appears to be a non-deterministic linear algebra error. I can get the error, then run the exact same code again and it won't throw an error the second time. I'm not too familiar with gpfit, so I'm not sure why this would occur.

Below is the error and some code which causes said error.

Traceback (most recent call last):
  File "/Users/mayork/Documents/GpGit/gpfit/gpfit/compressor_map_REAL_DATA_fitting.py", line 140, in <module>
    r,const = fit.fit(independent, dependent, 4, 'SMA')
  File "/Users/mayork/Documents/GpGit/gpfit/gpfit/fit.py", line 71, in fit
    bainit = max_affine_init(xdata, ydata, K)
  File "/Users/mayork/Documents/GpGit/gpfit/gpfit/max_affine_init.py", line 58, in max_affine_init
    if matrix_rank(X[inds, :]) < dimx + 1:
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/linalg/linalg.py", line 1543, in matrix_rank
    S = svd(M, compute_uv=False)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/linalg/linalg.py", line 1338, in svd
    _assertNoEmpty2d(a)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/linalg/linalg.py", line 222, in _assertNoEmpty2d
    raise LinAlgError("Arrays cannot be empty")
LinAlgError: Arrays cannot be empty

import numpy as np
import matplotlib.pyplot as plt
import fit

#variable to control if plotting occurs
PLOT = True

#-----------------------------------
#Compressor map
#make the data
#real data
##N = [.5, .6, .7, .75, .8, .85, .875, .9, .925, .95, .975, .985, 1, 1.025]
##pi= [[2.27690,2.21610,2.15520,2.00330,1.91210,1.66880],[3.21930,3.12810,3.00650,2.82410,2.61130,2.21610],[4.86100,4.58740,4.31380,4.16180,3.76660,3.18890],
##     [5.95550,5.65150,5.40830,5.19540,4.73940,4.00980],[7.68840,7.56680,7.32360,7.14120,6.62430,5.65150],[10.2117,9.72530,9.26930,8.93490,8.23560,7.20200],
##     [11.7318,11.3062,10.7590,10.2725,9.60370,8.38760],[13.6775,13.1303,12.5223,11.9446,11.2757,9.99890],[16.3529,15.8056,15.0760,14.4072,13.5255,12.1574],
##     [19.4235,18.9978,18.3594,17.4473,16.4745,14.9544],[23.4365,22.8284,22.0076,20.8523,19.8187,18.3898],[24.8350,24.3181,23.6493,22.5548,21.4604,20.1227],
##     [27.0239,26.2942,25.3822,24.1357,22.7372,21.5212],[29.0304,28.2704,27.2975,26.3246,24.5309,23.3453]]
##mbar = [[0.114208,0.118718,0.123241,0.126468,0.128404,0.129694],[0.155504,0.161956,0.165183,0.167763,0.170345,0.170345],[0.220674,0.229707,0.230998,0.232933,0.235515,0.236805],
##        [0.261969,0.274874,0.280681,0.282617,0.285843,0.287779],[0.345206,0.351659,0.364563,0.371015,0.376177,0.378759],[0.425216,0.436185,0.443283,0.448445,0.451026,0.452962],
##        [0.486515,0.496838,0.505227,0.509743,0.512324,0.512324],[0.560072,0.572332,0.580075,0.585237,0.586528,0.588463],[0.651697,0.667183,0.676216,0.682024,0.683314,0.684605],
##        [0.751710,0.770422,0.783972,0.792360,0.792360,0.792360],[0.871081,0.886569,0.896886,0.904634,0.907211,0.907862],[0.918187,0.934317,0.947862,0.958187,0.962057,0.963992],
##        [0.993033,0.997545,1.00142,1.00464,1.00593,1.00787],[1.05433,1.05820,1.06014,1.06337,1.06659,1.06659]]

##N = [.95, .985]
##pi= [[20.4125,20.4325,19.4235,18.9978,18.3594,17.4473,16.4745,14.9544],[25.8250,25.8350,24.8350,24.3181,23.6493,22.5548,21.4604,20.1227]]
##mbar = [[.552,.652,0.751710,0.770422,0.783972,0.792360,0.792360,0.79442360],[.7182,.8182,0.918187,0.934317,0.947862,0.958187,0.962057,0.964]]

uppi=[]
upm=[]
centerpi=26.19
centerm=1
for i in range(8):
    if i==0:
        uppi.extend([centerpi+.06])
        upm.extend([centerm-.217])
    if i==1:
        uppi.extend([centerpi+.063])
        upm.extend([centerm-.1368])
    if i==2:
        uppi.extend([centerpi+.042])
        upm.extend([centerm-.0618])
    if i==3:
        uppi.extend([centerpi])
        upm.extend([centerm])
    if i==4:
        uppi.extend([centerpi-.06])
        upm.extend([centerm+.0447])
    if i==5:
        uppi.extend([centerpi-.102])
        upm.extend([centerm+.0664])
    if i==6:
        uppi.extend([centerpi-.184])
        upm.extend([centerm+.0972])
    if i==7:
        uppi.extend([centerpi-.44])
        upm.extend([centerm+.0972])

uppi2=[]
upm2=[]
centerpi=17.9
centerm=.8
for i in range(8):
    if i==0:
        uppi2.extend([centerpi+.06])
        upm2.extend([centerm-.217])
    if i==1:
        uppi2.extend([centerpi+.063])
        upm2.extend([centerm-.1368])
    if i==2:
        uppi2.extend([centerpi+.042])
        upm2.extend([centerm-.0618])
    if i==3:
        uppi2.extend([centerpi])
        upm2.extend([centerm])
    if i==4:
        uppi2.extend([centerpi-.06])
        upm2.extend([centerm+.0447])
    if i==5:
        uppi2.extend([centerpi-.102])
        upm2.extend([centerm+.0664])
    if i==6:
        uppi2.extend([centerpi-.184])
        upm2.extend([centerm+.0972])
    if i==7:
        uppi2.extend([centerpi-.44])
        upm2.extend([centerm+.0972])

N=[1,.925]
pi=[uppi,uppi2]
mbar=[upm,upm2]
if PLOT == True:
#plot of data used in gpfit
    for i in range(len(N)):
        Nplot = N[i]*np.ones(len(mbar[0]))
        piplot = pi[i]
        mbarplot = mbar[i]
        plt.plot(mbarplot,piplot, '-r')
    plt.xlabel('Normalized Corrected Mass Flow')
    plt.ylabel('Fan Pressure Ratio')
    plt.title('E3 Fan Map')
    plt.show()

    for i in range(len(N)):
        Nplot = N[i]*np.ones(len(mbar[0]))
        piplot = pi[i]
        mbarplot = mbar[i]
        plt.plot(np.log(mbarplot),np.log(piplot), '-r')
    plt.xlabel('Log of Normalized Corrected Mass Flow')
    plt.ylabel('Log of Fan Pressure Ratio')
    plt.title('E3 Fan Map in Log Space')
    plt.show()

    for i in range(len(N)):
        Nplot = N[i]*np.ones(len(mbar[0]))
        invpiplot = np.ones(len(pi[i]))
        for j in range(len(pi[i])):
            invpiplot[j] = 1/(pi[i][j])
        mbarplot = mbar[i]
        plt.plot(np.log(mbarplot),np.log(invpiplot), '-r')
    plt.xlabel('Log of Normalized Corrected Mass Flow')
    plt.ylabel('Log of Fan Pressure Ratio')
    plt.title('E3 Fan Map in Log Space')
    plt.show()

#set up the data for the fit
Nfit = []
mbarfit = []
pifit = []

for i in range(len(N)):
    Nfit.extend(N[i]*np.ones(len(mbar[i])))
    for j in range(len(pi[i])):
        hold=pi[i][j]
        pifit.extend([1/hold])
    mbarfit.extend(np.divide(mbar[i],[1]))

#create the fit
independent = np.array([np.log(Nfit),np.log(mbarfit)])
dependent = np.log(pifit)
r,const = fit.fit(independent, dependent, 4, 'SMA')
print(const)

#plot the fit
nvec = np.linspace(.8, 1, 10)
mbarvec = np.linspace(.8,1,100)

for i in range(len(nvec)):
    N = nvec[i]
    pi=[]
    for j in range(len(mbarvec)):
        mbar = mbarvec[j]
        #fit to the tweaked data
        pi.extend([(0.282 * (N)**-3.56 * (mbar)**0.132
        + 9.75e-06 * (N)**-133 * (mbar)**49.7
        + 0.3 * (N)**-0.59 * (mbar)**-0.184
        + 0.306 * (N)**2.8 * (mbar)**0.0678)**(-1/.124)])
        #fit to the original data
##        pi.extend([(0.106 * (N)**-0.0299 * (mbar)**-0.129
##        + 0.119 * (N)**-0.0527 * (mbar)**-0.123
##        + 0.107 * (N)**0.028 * (mbar)**-0.146
##        + 0.102 * (N)**-0.0231 * (mbar)**-0.131
##        + 0.123 * (N)**-0.0233 * (mbar)**-0.131
##        + 0.136 * (N)**0.082 * (mbar)**-0.161)**(-1/.116)])

    plt.plot(mbarvec, pi, '-r')

#code for adding in the actual fan map data
##N = [.5, .6, .7, .75, .8, .85, .875, .9, .925, .95, .975, .985, 1, 1.025]
##pi= [[2.27690,2.21610,2.15520,2.00330,1.91210,1.66880],[3.21930,3.12810,3.00650,2.82410,2.61130,2.21610],[4.86100,4.58740,4.31380,4.16180,3.76660,3.18890],
##     [5.95550,5.65150,5.40830,5.19540,4.73940,4.00980],[7.68840,7.56680,7.32360,7.14120,6.62430,5.65150],[10.2117,9.72530,9.26930,8.93490,8.23560,7.20200],
##     [11.7318,11.3062,10.7590,10.2725,9.60370,8.38760],[13.6775,13.1303,12.5223,11.9446,11.2757,9.99890],[16.3529,15.8056,15.0760,14.4072,13.5255,12.1574],
##     [19.4235,18.9978,18.3594,17.4473,16.4745,14.9544],[23.4365,22.8284,22.0076,20.8523,19.8187,18.3898],[24.8350,24.3181,23.6493,22.5548,21.4604,20.1227],
##     [27.0239,26.2942,25.3822,24.1357,22.7372,21.5212],[29.0304,28.2704,27.2975,26.3246,24.5309,23.3453]]
##mbar = [[0.114208,0.118718,0.123241,0.126468,0.128404,0.129694],[0.155504,0.161956,0.165183,0.167763,0.170345,0.170345],[0.220674,0.229707,0.230998,0.232933,0.235515,0.236805],
##        [0.261969,0.274874,0.280681,0.282617,0.285843,0.287779],[0.345206,0.351659,0.364563,0.371015,0.376177,0.378759],[0.425216,0.436185,0.443283,0.448445,0.451026,0.452962],
##        [0.486515,0.496838,0.505227,0.509743,0.512324,0.512324],[0.560072,0.572332,0.580075,0.585237,0.586528,0.588463],[0.651697,0.667183,0.676216,0.682024,0.683314,0.684605],
##        [0.751710,0.770422,0.783972,0.792360,0.792360,0.792360],[0.871081,0.886569,0.896886,0.904634,0.907211,0.907862],[0.918187,0.934317,0.947862,0.958187,0.962057,0.963992],
##        [0.993033,0.997545,1.00142,1.00464,1.00593,1.00787],[1.05433,1.05820,1.06014,1.06337,1.06659,1.06659]]
##for i in range(len(N)):
##        Nplot = N[i]*np.ones(len(mbar[0]))
##        piplot = pi[i]
##        mbarplot = mbar[i]
##        plt.plot(mbarplot,piplot, '-b')

plt.xlabel('Normalized Corrected Mass Flow')
plt.ylabel('Pressure Ratio')
plt.title('Fan Map')
plt.show()

regularization

... we should implement it, to avoid huge fitted parameters.

ridge
lasso
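
As a rough illustration of what a ridge penalty would do, here is a minimal sketch for the single-term case, where the fit is affine in log space and ridge regression has a closed form. This is not gpfit's implementation: the helper name `ridge_lstsq` and the penalty weight `lam` are hypothetical, and the data is synthetic.

```python
import numpy as np

def ridge_lstsq(X, y, lam):
    """Solve min ||X b - y||^2 + lam * ||b[1:]||^2 (intercept left unpenalized)."""
    n = X.shape[1]
    penalty = lam * np.eye(n)
    penalty[0, 0] = 0.0  # do not shrink the intercept (log of the coefficient)
    return np.linalg.solve(X.T @ X + penalty, X.T @ y)

# Synthetic log-space data: log w = 1 + 2 * log u
log_u = np.linspace(0.0, 1.0, 20)
log_w = 1.0 + 2.0 * log_u
X = np.column_stack([np.ones_like(log_u), log_u])

b_plain = ridge_lstsq(X, log_w, lam=0.0)  # ordinary least squares
b_ridge = ridge_lstsq(X, log_w, lam=5.0)  # exponent is pulled toward zero
```

A lasso (L1) penalty has no closed form and would need an iterative solver; for multi-term SMA/ISMA fits either penalty would have to be added to the nonlinear least-squares objective instead.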

Ability to pre-specify the power on an independent variable

This might sound like a strange request, and I haven't really thought about how feasible it is, but it would be nice if a user could specify the power on a certain independent variable based on a priori knowledge of the underlying relationships.

For example, if I am fitting a function, z = f(x, y) and I know that z should go with x^0.5, it would be nice to "fix" that part of the regression, so I end up with something like z = 4.58*x^0.5*y^0.234.
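
For the single-term (monomial) case this can already be done outside the fitting routine, by moving the known term to the left-hand side in log space and fitting only what remains. The sketch below assumes synthetic data generated from the example above; it does not use gpfit, and it does not cover multi-term SMA or ISMA fits, where the exponent would have to be held fixed inside the optimizer.

```python
import numpy as np

# Synthetic data generated from z = 4.58 * x**0.5 * y**0.234
rng = np.random.default_rng(0)
x = rng.uniform(1.0, 10.0, 50)
y = rng.uniform(1.0, 10.0, 50)
z = 4.58 * x**0.5 * y**0.234

known_power = 0.5  # exponent on x, fixed from a priori knowledge

# log z - 0.5 * log x = log c + b * log y, so only log c and b are fitted
residual = np.log(z) - known_power * np.log(x)
A = np.column_stack([np.ones_like(y), np.log(y)])
(log_c, y_power), *_ = np.linalg.lstsq(A, residual, rcond=None)
c = np.exp(log_c)
```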

Finish implementation of SMA and MA in fit.py

list index out of range

I keep getting this error and I'm not sure why. I would be grateful for any help.

In [19]: fit(x_log,y_log,2,"SMA")
w**424 = 0 * (u_1)**1.5e+03
    + 0 * (u_1)**1.36e+03
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-19-920e59fab3f5> in <module>()
----> 1 fit(x_log,y_log,2,"SMA")

/Users/mjburton11/Documents/SuperUROP/gpfit/gpfit/fit.pyc in fit(xdata, ydata, K, ftype, varNames)
    156         # Create gpkit objects
    157         # SMA returns a constraint of the form w^alpha >= c1*u1^exp1 + c2*u2^exp2 +....
--> 158         posy  = Posynomial(exps, cs)
    159         mono = Monomial(w_exp,1)
    160         cstrt = (mono >= posy)

/Users/mjburton11/Documents/SuperUROP/gpkit/gpkit/nomials.pyc in __init__(self, exps, cs, require_positive, simplify, **descr)
    104 
    105         # init NomialData to create self.exps, self.cs, and so on
--> 106         super(Signomial, self).__init__(exps, cs, simplify=simplify)
    107 
    108         if self.any_nonpositive_cs:

/Users/mjburton11/Documents/SuperUROP/gpkit/gpkit/nomial_data.pyc in __init__(self, exps, cs, simplify)
     26             return
     27         if simplify:
---> 28             exps, cs = simplify_exps_and_cs(exps, cs)
     29         self.exps, self.cs = exps, cs
     30         self.any_nonpositive_cs = any(mag(c) <= 0 for c in self.cs)

/Users/mjburton11/Documents/SuperUROP/gpkit/gpkit/nomial_data.pyc in simplify_exps_and_cs(exps, cs, return_map)
    177     exps_ = tuple(matches.keys())
    178     cs_ = list(matches.values())
--> 179     if isinstance(cs_[0], Quantity):
    180         units = Quantity(1, cs_[0].units)
    181         cs_ = [c.to(units).magnitude for c in cs_] * units

IndexError: list index out of range

Here are my x_log and y_log arrays:

In [17]: x_log
Out[17]: 
array([ 1.60943791,  1.62964062,  1.64944325,  1.66886133,  1.68790953,
        1.70660166,  1.7249508 ,  1.74296931,  1.76066888,  1.77806062,
        1.79515506,  1.81196218,  1.82849148,  1.844752  ,  1.86075234,
        1.8765007 ,  1.89200488,  1.90727236,  1.92231023,  1.93712532,
        1.95172412,  1.96611286,  1.98029749,  1.99428373,  2.00807706,
        2.02168271,  2.03510573,  2.04835095,  2.06142304,  2.07432644,
        2.08706547,  2.09964425,  2.11206677,  2.12433686,  2.13645822,
        2.14843441,  2.16026887,  2.17196491,  2.18352573,  2.19495443,
        2.20625398,  2.21742728,  2.22847712,  2.23940619,  2.25021711,
        2.2609124 ,  2.27149451,  2.28196581,  2.29232859,  2.30258509])

In [18]: y_log
Out[18]: 
array([ 1.16385884,  1.21664897,  1.27131266,  1.32741476,  1.3845526 ,
        1.4423605 ,  1.50051215,  1.55872099,  1.61673936,  1.67435666,
        1.73139691,  1.78771599,  1.84319872,  1.89775602,  1.95132211,
        2.00385192,  2.05531868,  2.10571175,  2.15503457,  2.20330291,
        2.25054315,  2.29679084,  2.34208929,  2.38648828,  2.43004285,
        2.4728122 ,  2.51485854,  2.5562461 ,  2.59704002,  2.63730544,
        2.67710647,  2.71650529,  2.75556121,  2.79432986,  2.83286237,
        2.87120462,  2.90939664,  2.94747203,  2.98545754,  3.02337269,
        3.06122963,  3.09903299,  3.13677999,  3.17446052,  3.21205749,
        3.24954715,  3.2868996 ,  3.3240793 ,  3.3610457 ,  3.39775386])

fit.py import error

Line 8 of fit.py reads

from gpkit.nomials import Posynomial, Monomial, Constraint, MonoEQConstraint

I'm getting an error that Constraint can't be imported. I'm guessing it was moved during a gpkit refactor. As a workaround I edited my local copy so that line 8 reads import gpkit, and it works, but that is obviously not a good long-term fix.

Refactor goals

top-level support of Fit Constraint Sets
reduce the amount of code
be more consistent with numpy/scipy convention
- such as by having variables in different rows, not columns, or automatically transposing (#72)

printed solution significantly different from posynomial inequality

I think that significant figures on the auto printed fit equation and the posynomial output equation should be the same.

This is what I see when I run fit:

In [4]: fit(X,Y, 4, "SMA")
w**3.72 = 6.35e+10 * (u_1)**-0.243 * (u_2)**-3.43
    + 0.0247 * (u_1)**2.49 * (u_2)**-1.11
    + 2.03e-07 * (u_1)**12.7 * (u_2)**-0.338
    + 6.49e-06 * (u_1)**-1.9 * (u_2)**-0.681
Out[4]: 
(gpkit.PosynomialInequality(w**3.7 >= 0.0247*u_1**2.5*u_2**-1.1 + 2.03e-07*u_1**13*u_2**-0.34
+ 6.35e+10*u_1**-0.24*u_2**-3.4 + 6.49e-06*u_1**-1.9*u_2**-0.68),
 0.0048930297487385886)

The difference in significant figures between the printed solution and the gpkit.PosynomialInequality resulted in pretty drastic gaps when I compared the fits to the actual data. jh01polarfit.pdf uses all the significant figures, jh01polarfit1.pdf uses the posynomial equation.
jh01polarfit.pdf
jh01polarfit1.pdf
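
To give a sense of the size of the effect, the sketch below evaluates both versions of the fit quoted above at one sample point. The point (u_1 = 1, u_2 = 1e5) is arbitrary and not taken from the original data, and both expressions are treated as equalities for the purpose of the comparison.

```python
def full_precision(u_1, u_2):
    # Coefficients and exponents as printed by fit (three significant figures)
    posy = (6.35e+10 * u_1**-0.243 * u_2**-3.43
            + 0.0247 * u_1**2.49 * u_2**-1.11
            + 2.03e-07 * u_1**12.7 * u_2**-0.338
            + 6.49e-06 * u_1**-1.9 * u_2**-0.681)
    return posy ** (1 / 3.72)

def rounded(u_1, u_2):
    # Exponents as shown in the PosynomialInequality repr (two significant figures)
    posy = (6.35e+10 * u_1**-0.24 * u_2**-3.4
            + 0.0247 * u_1**2.5 * u_2**-1.1
            + 2.03e-07 * u_1**13 * u_2**-0.34
            + 6.49e-06 * u_1**-1.9 * u_2**-0.68)
    return posy ** (1 / 3.7)

u_1, u_2 = 1.0, 1e5  # arbitrary sample point, not from the original data
w_full = full_precision(u_1, u_2)
w_rounded = rounded(u_1, u_2)
rel_diff = abs(w_rounded - w_full) / w_full  # relative gap caused by rounding alone
```

Because the exponents multiply logarithms of the inputs, rounding an exponent by 0.03 changes a term by a factor of u**0.03, which is far from negligible when u spans several orders of magnitude.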

plot_fit function

I think GPfit would benefit from a plot_fit method for 1D, and perhaps 2D, functions. This method should plot both the original data and the fitted function, and have the option of plotting in log space too.

Documentation: Add examples to readthedocs

gpfit not updated to match gpkit

---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
/Users/mjburton11/Documents/SuperUROP/gpkit-models/1682/gas_male/Datasets/fitDF70.py in <module>()
      2 
      3 import gpfit
----> 4 from gpfit.fit import fit
      5 import numpy as np
      6 import pandas as pd

/Users/mjburton11/Documents/SuperUROP/gpfit/gpfit/fit.py in <module>()
      6 from max_affine_init import max_affine_init
      7 from print_fit import print_ISMA, print_SMA, print_MA
----> 8 from gpkit.nomials import Posynomial, Monomial, Constraint, MonoEQConstraint
      9 from numpy import append, ones, exp, sqrt, mean, square
     10 

ImportError: cannot import name Constraint

duplicate examples directory

examples in gpfit/examples/ should be updated / renamed as appropriate, moved to docs/source/examples, and added to tests/t_examples.

gpfit logo a disaster

The font changes 1/3 of the way through "fit", at least on my browser: http://gpfit.readthedocs.org/en/latest/

continuous integration

Set up CI just like in gpkit.

GitHub is currently advertising a lot of CI integrations: https://github.com/integrations/feature/code. @galbramc, any thoughts on whether those are worth looking into, or is another Jenkins setup our best bet?

fit should fail gracefully when given infinite or nan inputs

I have been trying to fit an SMA function with two terms to the attached data, but have been getting the index error above. I don't know exactly what I am doing wrong. The data, code, and specific error are attached.

issue.zip
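
One way to fail gracefully would be an explicit input check before any fitting starts. The sketch below is only an illustration: `check_finite` is a hypothetical helper, not an existing gpfit function.

```python
import numpy as np

def check_finite(xdata, ydata):
    """Raise a descriptive ValueError if either input has NaN or inf entries."""
    for name, data in (("xdata", xdata), ("ydata", ydata)):
        data = np.asarray(data, dtype=float)
        bad = ~np.isfinite(data)
        if bad.any():
            raise ValueError(
                "%s contains %d non-finite value(s) (NaN or inf); "
                "check for log(0) or the log of a negative number"
                % (name, bad.sum())
            )

# Example: the log of zero is -inf, which would be caught up front
try:
    check_finite([0.0, 1.0], [float("-inf"), 0.0])
    message = None
except ValueError as err:
    message = str(err)
```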

different attributes in different installations

Hi all,

I'm using gpfit on two different machines. After I run a fitting routine on my data I get certain attributes for the output. The "good" machine shows:

pr.pprint(smafit.__dict__.keys())
['posymap',
 'mfac',
 'ivar',
 'constraint',
 'evaluate',
 'dvars',
 'numpy_bools',
 'bounds',
 'substitutions',
 'max_err',
 'fitdata',
 'varkeys',
 'rms_err']

Whereas the "bad" machine shows:

pr.pprint(smafit.__dict__.keys())
['oper',
 'unsubbed',
 'right',
 'evaluate',
 'nomials',
 'substitutions',
 'p_lt',
 'varkeys',
 'm_gt',
 'last_used_substitutions',
 'left']

I use the fitdata attribute to extract the coefficients in a nice way and import them into MATLAB for post-processing. Why is the attribute list so different in these two installations? I cloned gpfit today on the old (bad) machine to see if there was any update, but it doesn't seem to affect the attribute list. None of the attributes available on the "bad" machine contains the coefficients in a nice format for extraction.

Examples using wind turbine data from NREL

overall system models: https://nwtc.nrel.gov/WISDEM
Cost models: https://nwtc.nrel.gov/taxonomy/term/23

Defunct link to the GPfit paper in Readme.

Probably should fix this. Seems important if anyone wants to understand gpfit.

index errors with large datasets?

reported by tony tao:

"on the surface it looks like a data size issue ("MemoryError" and "Iterator too large" errors) but when I truncate the data set to something that already worked before, it returns indexing errors which makes me believe it's something in GPfit, but I can't figure out what it is. "

"Actually (as usual, problem is solved after calling mayday) I may have figured it out and now I have a Cd model as well.

The training input data set is around 40,000 data points over 7 dimensions (originally 80,000), so it takes around 16 GB of memory to build the model, which explains the memory error running in Python(x,y). Running it in Ubuntu and deleting about half the training data seems to have fixed it.

The index error is caused by line 73 of the max_affine_init.py script where if the while loop isn't fulfilled by the end of the dataset, it calls the next index location which is out of bounds. "
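
The failure mode described above can be avoided by bounding the loop and raising a descriptive error when the data runs out. The sketch below is a simplified stand-in for what the loop appears to do, based only on the description and the matrix_rank check visible in the tracebacks; the function name and arguments are hypothetical, not the actual code in max_affine_init.py.

```python
import numpy as np

def grow_until_full_rank(X, order, n_start):
    """Add rows of X, in the given order, until the selection has full column rank.

    Returns the selected indices, or raises ValueError (rather than running
    past the end of the data) if no full-rank selection exists.
    """
    n_cols = X.shape[1]
    count = max(n_start, 1)
    while count <= len(order):
        inds = order[:count]
        if np.linalg.matrix_rank(X[inds, :]) >= n_cols:
            return inds
        count += 1
    raise ValueError("data set is not full rank; cannot build a local fit")

X = np.array([[1.0, 0.0], [1.0, 0.0], [1.0, 1.0]])
selected = grow_until_full_rank(X, np.arange(3), n_start=1)
```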

gpfit failing tests on reynolds

seemingly because which coverage is returning a blank string!

Identify why fitting process seems to time out for relatively "easy" problems

Relatively straightforward fitting problems seem to cause GPfit to reach the max time limit (5 seconds). Examples of such problems are ex61.py and ex63.py, which are both taken from the GPfit paper.

The fact that they are reaching max time can be seen by re-enabling the verbose option.

Saving and loading fitted nomials

There should be some way of saving/loading/manipulating nomials because it can take a long time to generate them.
During loads, it would be convenient to be able to rename variables.

(Pickling worked in the old implementation of nomials, but it doesn't work anymore.)
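
Until something like this exists, one workaround is to store the fitted coefficients and exponents as plain data and rebuild the nomial on load. The sketch below uses JSON and supports renaming variables while loading; `save_fit`, `load_fit`, and the dictionary layout are hypothetical, not part of gpfit or gpkit, and the numbers are taken from the SMA test output shown earlier purely as sample values.

```python
import json
import os
import tempfile

def save_fit(path, cs, exps, varnames):
    """Write coefficients, exponents (one row per term) and variable names to JSON."""
    with open(path, "w") as f:
        json.dump({"cs": list(cs),
                   "exps": [list(e) for e in exps],
                   "varnames": list(varnames)}, f)

def load_fit(path, rename=None):
    """Read a saved fit, optionally renaming variables via a {old: new} mapping."""
    with open(path) as f:
        data = json.load(f)
    rename = rename or {}
    data["varnames"] = [rename.get(v, v) for v in data["varnames"]]
    return data

path = os.path.join(tempfile.mkdtemp(), "fit.json")
save_fit(path, [1.08, 1.47], [[0.154, 0.231], [0.462, 0.538]], ["u_1", "u_2"])
loaded = load_fit(path, rename={"u_1": "Re"})
```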

Example t_ex6_1.py issue

Here is the current issue with t_ex6_1. It is probably due to a missed update in gpfit.

In [2]: %run gpfit/tests/t_e
gpfit/tests/t_ex6_1.py  gpfit/tests/t_ex6_3.py  

In [2]: %run gpfit/tests/t_ex6_1.py
1 = (0.95/w**0.0961) * (u_1)**0.0161
    + (0.996/w**0.165) * (u_1)**-0.0958
    + (0.975/w**0.112) * (u_1)**-0.0166
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/Users/mjburton11/Documents/SuperUROP/gpfit/gpfit/tests/t_ex6_1.py in <module>()
      3 from gpfit.fit import fit
      4 
----> 5 class t_ex6_1_ISMA(unittest.TestCase):
      6     '''
      7     ISMA unit tests based on example 6.1 from GPfit paper

/Users/mjburton11/Documents/SuperUROP/gpfit/gpfit/tests/t_ex6_1.py in t_ex6_1_ISMA()
     14     K = 3
     15 
---> 16     cstrt, rms_error = fit(x, y, K, "ISMA")
     17 
     18     def test_rms_error(self):

/Users/mjburton11/Documents/SuperUROP/gpfit/gpfit/fit.py in fit(xdata, ydata, K, ftype, varNames)
    110         # ISMA returns a constraint of the form 1 >= c1*u1^exp1*u2^exp2*w^(-alpha) + ....
    111         posy  = Posynomial(exps, cs)
--> 112         cstrt = Constraint(posy,1)
    113 
    114         # # If only one term, automatically make an equality constraint

/Users/mjburton11/Documents/SuperUROP/gpkit/gpkit/nomials.pyc in __init__(self, left, right, oper_ge)
    572         self.left, self.right = (pgt, plt) if oper_ge else (plt, pgt)
    573 
--> 574         p = plt / pgt
    575 
    576         if isinstance(p.cs, Quantity):

TypeError: unsupported operand type(s) for /: 'Monomial' and 'Posynomial'

Can't import fit in the latest commit

I am on commit 648d114, and was about to try to fit something very basic when I got the following error from trying import fit.

In [1]: import fit
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-1-b7a6b9d011b5> in <module>()
----> 1 import fit

C:\Users\Berk\Dropbox (MIT)\MIT Senior Year\16.82\gpfit\gpfit\fit.py in <module>()
      2 from numpy import ones, exp, sqrt, mean, square, hstack
      3 from gpkit import NamedVariables, VectorVariable, Variable, NomialArray
----> 4 from .implicit_softmax_affine import implicit_softmax_affine
      5 from .softmax_affine import softmax_affine
      6 from .max_affine import max_affine

ValueError: Attempted relative import in non-package

??? Help?

Make GPfit more object oriented(?)

It might be more elegant to make classes of fits, e.g. ISMA_fit, SMA_fit, MA_fit.

These classes could then have functions such as plot_fit (for 1D and 2D functions) and print_fit (which already exists).
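
A minimal sketch of what such a class might look like for the max-affine case is given below. The class name, attributes, and method signatures are hypothetical; the printed format imitates the existing print_MA output, and the inputs must be positive because the evaluation goes through log space.

```python
import numpy as np

class MaxAffineFit:
    """Hypothetical container for an MA fit: w = max_k c_k * prod_i u_i**a_ki."""

    def __init__(self, cs, exps):
        self.cs = np.asarray(cs, dtype=float)      # shape (K,)
        self.exps = np.asarray(exps, dtype=float)  # shape (K, d)

    def evaluate(self, u):
        """Evaluate the fit at one or more points u (rows), all entries > 0."""
        u = np.atleast_2d(np.asarray(u, dtype=float))      # shape (n, d)
        terms = self.cs * np.exp(np.log(u) @ self.exps.T)  # shape (n, K)
        return terms.max(axis=1)

    def print_fit(self):
        """Return one 'w = ...' string per monomial term."""
        lines = []
        for c, a in zip(self.cs, self.exps):
            mono = " * ".join("(u_%d)**%.3g" % (i + 1, e) for i, e in enumerate(a))
            lines.append("w = %.3g * %s" % (c, mono))
        return lines

ma = MaxAffineFit([2.72, 148.0], [[2, 3, 4], [6, 7, 8]])
lines = ma.print_fit()
value = ma.evaluate([1.0, 1.0, 1.0])
```

A plot_fit method could then reuse evaluate for both the linear and log-space plots.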

Documentation: make it better

Massive overhaul of docs needed:

installation (see #69)
Correct citation
Tutorial

Plot_fit legend is gross for fits with multiple terms

Example:

Make a new example using data from the airline data project

There is a wealth of data at http://web.mit.edu/airlinedata/www/default.html

Speak to Luke Jensen and co. about interesting correlations.

make unit tests deterministic

unit tests should use fixed seeds to prevent non-determinism associated with initial guesses etc.

For example, this error seems to be sporadic: sometimes the test passes, sometimes it fails:

FAIL [0.000s]: test_rms_error (gpfit.tests.t_ex6_3.t_ex6_3_ISMA)

Traceback (most recent call last):
  File "/home1/jenkins/workspace/gpfit_PullRequest/buildnode/reynolds/gpfit/tests/t_ex6_3.py", line 21, in test_rms_error
    self.assertTrue(self.rms_error < 5e-4)
AssertionError: False is not true
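
A minimal sketch of seeding in a test case follows. It assumes the randomness comes from NumPy's global random number generator, which has not been confirmed for gpfit's initial guesses; the test class is a hypothetical stand-in.

```python
import unittest
import numpy as np

class TestSeeded(unittest.TestCase):
    def setUp(self):
        # Reset the global NumPy RNG so every test starts from the same state
        np.random.seed(0)

    def test_repeatable(self):
        first = np.random.rand(3)   # stands in for a random initial guess
        np.random.seed(0)
        second = np.random.rand(3)
        np.testing.assert_array_equal(first, second)

suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestSeeded)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```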

Create unit tests for fit.py

errors on incorrect inputs
rms error
output types

Example code t_ex6_1.py not working

I just tried importing t_ex6_1.py into ipython and got the following error:

In [1]: import t_ex6_1
---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
<ipython-input-1-eff11dff08f3> in <module>()
----> 1 import t_ex6_1

/Users/mjburton11/Documents/SuperUROP/gpfit/gpfit/tests/t_ex6_1.py in <module>()
      1 import unittest
      2 from numpy import logspace, log, exp, log10
----> 3 from gpfit.fit import fit
      4 
      5 class t_ex6_1_ISMA(unittest.TestCase):

ImportError: No module named gpfit.fit

Should input data be pre- or post- log-transformation?

Given a data set with large numbers, GPfit runs into numerical overflow issues with exp().

x = [1200,
       13000,
       15000,
       16000,
       17000,
       18000,
       19000,
       30000,
       32000,
       34000]

y = [325000,
       250000,
       750000,
       2E6,
       7E6,
       750000,
       8E6,
       6E6,
       2E6,
       13E6,
       ]

Gives results like:

/Users/philippekirschen/Documents/MIT/Research/GPfit/gpfit/gpfit/fit.py:127: RuntimeWarning: overflow encountered in exp
  w_SMA = exp(y_SMA)
/Users/philippekirschen/Documents/MIT/Research/GPfit/gpfit/gpfit/fit.py:130: RuntimeWarning: overflow encountered in exp
  w = (exp(ydata)).T[0]


w**0.1 = 0 * (u_1)**34.9
    + inf * (u_1)**-5.28
    + 0 * (u_1)**198

Wondering if anything clever can be done here.
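
The warnings above are consistent with raw data being passed where log-transformed data is expected, since fit.py applies exp() to ydata. Under that assumption, taking logs before the call keeps everything in range. The sketch below uses the data from this issue and only demonstrates the transformation, not the fit itself.

```python
import numpy as np

x = np.array([1200, 13000, 15000, 16000, 17000,
              18000, 19000, 30000, 32000, 34000], dtype=float)
y = np.array([325000, 250000, 750000, 2e6, 7e6,
              750000, 8e6, 6e6, 2e6, 13e6], dtype=float)

# exp() of the raw values overflows double precision (limit is about exp(709))
with np.errstate(over="ignore"):
    overflowed = np.exp(y)

# Log-transform first; exp() of the transformed data recovers the original values
x_log, y_log = np.log(x), np.log(y)
recovered = np.exp(y_log)
```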

add installation instructions to docs

Just received a request from a user who was confused about how to install GPfit. I've responded, but it made me realize we don't have install docs.

E-mail copied below:

Hi,

I've recently installed your GPkit python tool and I would like to test it by fitting my data as (I)SMA functions. However, when I try to run the example given for GPfit in:

http://gpfit.readthedocs.io/en/latest/examples.html

I get:

ImportError: No module named gpfit.fit

In the GPkit installation there's no reference to GPfit, so I imagine this is a standalone package. How can it be installed?

Thanks in advance.

index error

@bqpd I'm doing some D8 fits (the file is naca_cl0_fits.py in the Tail Fits folder, at commit convexengineering/SPaircraft@851b1d4 on D8 master). GPfit is throwing the following error, and neither I nor @1ozturkbe know what it is.

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
/Users/mayork/Documents/GpGit/d8/Tail Fits/naca_cl0_fits.py in <module>()
    100     X, Y = fit_setup(NACA, Re) # call fit(X, Y, 4, "SMA") to get fit
    101     F, A = plot_fits(NACA, Re)
--> 102     make_fit(NACA, Re)
    103     F.savefig("tail_fits/taildragpolar.pdf",
    104               bbox_inches="tight")

/Users/mayork/Documents/GpGit/d8/Tail Fits/naca_cl0_fits.py in make_fit(naca_range, re_range)
     65     print np.size(x)
     66     print np.size(y)
---> 67     fit(x, y, 3)
     68 
     69 def plot_fits(naca_range, re_range):

/Users/mayork/Documents/GpGit/gpfit/gpfit/fit.pyc in fit(xdata, ydata, K, ftype)
     70         w = Variable("w")
     71 
---> 72     params = get_params(ftype, K, xdata, ydata)
     73 
     74     # A: exponent parameters, B: coefficient parameters

/Users/mayork/Documents/GpGit/gpfit/gpfit/fit.pyc in get_params(ftype, K, xdata, ydata)
     24         return r, drdp
     25 
---> 26     ba = ba_init(xdata, ydata.reshape(ydata.size, 1), K).flatten('F')
     27 
     28     if ftype == "ISMA":

/Users/mayork/Documents/GpGit/gpfit/gpfit/ba_init.pyc in ba_init(x, y, K)
     77                       "full rank for local fitting." % (i-iinit, k))
     78         # now create the local fit
---> 79         b[:, k] = lstsq(X[inds.nonzero()], y[inds.nonzero()])[0][:, 0]
     80 
     81     return b

IndexError: index 59 is out of bounds for axis 0 with size 59

Decide what (if anything) should print to screen during fitting

Currently the bverbose option is set to False in the code. This suppresses all print messages, which is nice and clean, but it also means that the user doesn't know if the fitting process reaches max iterations, or reaches max time etc.

A user may also want to know about the rate of residual convergence.

All terms in multi-term SMA fits have very similar coefficients and exponents

This could just be the models I'm fitting to, but it seems like every time I try to use SMA fits with multiple (K) terms, the result is a sum of K nearly identical terms, e.g. w**0.149 = 1.2 * (u_1)**0.0106 + 1.22 * (u_1)**0.0105. This has been my experience with a wide variety of relationship types, and it seems dubious.

generate example output automatically

just like in gpkit, as discussed here: https://github.com/hoburg/gpfit/pull/26/files#r51971484

@bqpd: which code in gpkit makes this happen?

fits with poorly conditioned data

I'm trying to fit this data. Not sure why, but I'm really struggling. If I fit the full range of data, the fit isn't even close. Fitting to a subset of the entire range yields a much better result, but the returned fit still consistently underestimates the pressure ratio. Any ideas for some data manipulation that could help? I've tried fitting to log(1/(p**2)), log(1/(2p)), log(1/(10p)) with fits ranging from 3 to 20 terms. I included some example plots below. I'm fairly confident my fitting code is correct, because some of the fits are close.

what I want to fit

the data I am trying to fit in log space

fit from a subset of data range. The longer vertical tails are anticipated, I plotted over slightly larger range.

fit from the entire data range

Documentation: figure out how to get auto-doc to work properly

Currently auto-doc seems to work when the html files are built locally using the make html command, but it doesn't work on the live version of the documentation site.

gpfit should accept mesh vectors

per @Ltrollinger, they are a nice interpretable transpose-invariant input.

Unit tests currently do fitting outside of test methods

... leading to inaccurate total times being output, among other issues.
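
One common way to address this is to move the expensive call into setUpClass, so it runs once per test case but inside the test framework. The sketch below uses np.polyfit as a cheap stand-in for the fitting call; `slow_fit` and the test class are hypothetical, not gpfit code.

```python
import unittest
import numpy as np

def slow_fit(x, y):
    """Stand-in for an expensive fitting call (hypothetical)."""
    return np.polyfit(x, y, 1)  # returns [slope, intercept]

class TestFit(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        # Runs once, under the test runner, instead of at class-definition time
        x = np.linspace(0.0, 1.0, 10)
        cls.params = slow_fit(x, 3.0 * x + 1.0)

    def test_slope(self):
        self.assertAlmostEqual(self.params[0], 3.0, places=6)

    def test_intercept(self):
        self.assertAlmostEqual(self.params[1], 1.0, places=6)

suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestFit)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

With this layout, a failure inside the fit is reported as an error of that test case rather than as an import-time crash of the whole test module.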

Returned fit causes overflow error

Again, when trying to fit the compressor maps, I've gotten a number of returned fits with very large exponents. Sometimes, when these are plotted outside of log space, overflow errors occur. I'm not sure this can be avoided, and I don't think the data I'm using is too well conditioned, but it would be nice if there were some way to control how large the exponents in the fit can get.

w**0.231 = 0.00187 * (u_1)**-302 * (u_2)**58.2
    + 3.75e-12 * (u_1)**-2.67e+03 * (u_2)**496
    + 0.326 * (u_1)**-7.81 * (u_2)**0.962
    + 0.465 * (u_1)**1.97 * (u_2)**-0.525

Much documentation. 'Getting started' is blank?!

And @pgkirsch's docstrings are so lovely, they deserve a glossary.

ValueError('Not enough data points')

Hi,

I'm trying to run a SMA fit on my data but I can't seem to enter it correctly. My x-data is a 1062x4 matrix and correspondingly my y-data is a 1062x1 vector.

x
array([[ -0.69314718, -1.2039728 , -13.81551056, -13.81551056],
[ -0.65392647, -1.2039728 , -13.81551056, -13.81551056],
[ -0.61618614, -1.2039728 , -13.81551056, -13.81551056],
...,
[ 0.37843644, 0.18232156, -11.51292546, -9.21034037],
[ 0.39204209, 0.18232156, -11.51292546, -9.21034037],
[ 0.40546511, 0.18232156, -11.51292546, -9.21034037]])

x.shape
(1062, 4)

y
array([-10.09725113, -10.0955659 , -10.09396532, ..., -5.87544124,
-5.87526203, -5.87509244])

y.shape
(1062,)

When I try to run the fit, I get the following error:

cSMA, errorSMA = fit(x,y,K,"SMA")
Traceback (most recent call last):
  File "", line 1, in
  File "/usr/local/lib/python2.7/dist-packages/gpfit/fit.py", line 72, in fit
    params = get_params(ftype, K, xdata, ydata)
  File "/usr/local/lib/python2.7/dist-packages/gpfit/fit.py", line 26, in get_params
    ba = ba_init(xdata, ydata.reshape(ydata.size, 1), K).flatten('F')
  File "/usr/local/lib/python2.7/dist-packages/gpfit/ba_init.py", line 36, in ba_init
    raise ValueError('Not enough data points')
ValueError: Not enough data points

I know that this is probably not a tool error but a user-keyboard bug, but I just don't get what could be the problem here. Any pointers are much appreciated.

Regards,

Lucho.

ensure consistency with gpkit (especially in method / class naming)
put fixme in the command-line flags
remove duplicate-code from the disable
