civisanalytics / python-glmnet
A python port of the glmnet package for fitting generalized linear models via penalized maximum likelihood.
License: Other
Starting in version 0.21 of sklearn, certain packages previously provided in the sklearn.externals module are deprecated. These include joblib and six, which glmnet uses in its util.py and scorer.py files, respectively. To reduce FutureWarnings on module load, and to avoid breakage when sklearn 0.23 is released and joblib and six are no longer found under sklearn.externals, it would be a good idea to depend on these packages directly by adding them to the requirements.txt file.
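A minimal sketch of the direct-dependency idea described above. The helper name `import_first` is hypothetical (it is not part of glmnet or scikit-learn); it tries the standalone package first and falls back to a vendored copy only if the first import fails.

```python
import importlib


def import_first(candidates):
    """Return the first module in `candidates` that imports cleanly.

    `import_first` is a hypothetical helper for illustration only.
    """
    errors = []
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError as exc:
            errors.append(f"{name}: {exc}")
    raise ImportError("none of the candidates could be imported: " + "; ".join(errors))


# In glmnet this might be used as, for example:
#   joblib = import_first(["joblib", "sklearn.externals.joblib"])
#   six = import_first(["six", "sklearn.externals.six"])
# Demonstrated here with standard-library names so the sketch runs anywhere.
json_module = import_first(["module_that_does_not_exist_xyz", "json"])
print(json_module.__name__)
```

Listing joblib and six in requirements.txt would make the first candidate succeed, so the fallback would only matter for older environments.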
I ran into an error when trying to pip install glmnet, which I wanted to share with you. I'm using macOS High Sierra (10.13.6). The output I receive is below. I also ran locate to see whether I could find the dynamic library, which I did; that output is included as well. If you can tell me how to resolve this, I would appreciate it.
Sincerely,
Michiel
(venv) Zarathustra:allosteric-inference-master Zarathustra$ pip3 install glmnet
Collecting glmnet
Using cached https://files.pythonhosted.org/packages/c7/97/6f92f20fc193478c5d5927396c8d691abbdaa7774fd67e8a08fdeb1a2470/glmnet-2.0.0.tar.gz
Complete output from command python setup.py egg_info:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/private/var/folders/tq/yhb6dsx952l1wcmx5qnxp0940000gn/T/pip-install-u44dd773/glmnet/setup.py", line 38, in <module>
GFORTRAN_LIB = get_lib_dir('libgfortran.3.dylib')
File "/private/var/folders/tq/yhb6dsx952l1wcmx5qnxp0940000gn/T/pip-install-u44dd773/glmnet/setup.py", line 30, in get_lib_dir
raise Exception("Failed to find {}".format(dylib))
Exception: Failed to find libgfortran.3.dylib
----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /private/var/folders/tq/yhb6dsx952l1wcmx5qnxp0940000gn/T/pip-install-u44dd773/glmnet/
(venv) Zarathustra:allosteric-inference-master Zarathustra$ locate libgfortran.3.dylib
/Applications/MATLAB_R2017b.app/sys/os/maci64/libgfortran.3.dylib
/Applications/Tellurium.app/Contents/Resources/telocal/python-3.6.3/lib/python3.6/site-packages/scipy/.dylibs/libgfortran.3.dylib
/Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libgfortran.3.dylib
/Users/Zarathustra/ETH/git_repo/allosteric-inference-master/venv/lib/python3.6/site-packages/scipy/.dylibs/libgfortran.3.dylib
/Users/Zarathustra/ETH/repositories/allosteric-inference-master/python_server/tutorial/lib/python3.6/site-packages/scipy/.dylibs/libgfortran.3.dylib
/Users/Zarathustra/ETH/repositories/allosteric-inference-master/venv/lib/python3.6/site-packages/scipy/.dylibs/libgfortran.3.dylib
/Users/Zarathustra/ETH/repositories/allosteric-inference-master/venv/python_server/tutorial/lib/python3.6/site-packages/scipy/.dylibs/libgfortran.3.dylib
/Users/Zarathustra/miniconda2/lib/libgfortran.3.dylib
/Users/Zarathustra/miniconda2/pkgs/libgfortran-3.0.1-h93005f0_2/lib/libgfortran.3.dylib
/Users/Zarathustra/miniconda3/lib/libgfortran.3.dylib
/Users/Zarathustra/miniconda3/pkgs/libgfortran-3.0.1-h93005f0_2/lib/libgfortran.3.dylib
/usr/local/lib/python3.6/site-packages/scipy/.dylibs/libgfortran.3.dylib
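The traceback shows setup.py raising when its `get_lib_dir` lookup fails, even though `locate` finds several copies of the library. Below is a minimal sketch of a directory check that could help narrow down which candidate directories actually contain the file. `find_library_dirs` and the directory names are hypothetical, and this is not the logic used in glmnet's setup.py.

```python
import os
import tempfile


def find_library_dirs(filename, search_dirs):
    """Return every directory in `search_dirs` that directly contains `filename`."""
    return [d for d in search_dirs if os.path.isfile(os.path.join(d, filename))]


# Self-contained demonstration using temporary directories in place of
# real locations such as the miniconda or scipy paths listed above.
with tempfile.TemporaryDirectory() as root:
    with_lib = os.path.join(root, "with_lib")
    without_lib = os.path.join(root, "without_lib")
    os.makedirs(with_lib)
    os.makedirs(without_lib)
    open(os.path.join(with_lib, "libgfortran.3.dylib"), "w").close()

    found = find_library_dirs("libgfortran.3.dylib", [without_lib, with_lib])
    print(len(found))  # number of candidate directories holding the library
```

Whether pointing the build at one of the located directories resolves the install error is untested here.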
Hi,
I have the following pipeline.
First, I apply ridge regression using 10-fold CV to find the best lambda.
I get the same lambda max and lambda best as in R's cv.glmnet.
Next, I refit the model without an intercept using the best lambda from the first step, and compare it to the results of R's glmnet.
The coefficients and predictions are different. Why is that?
Comparison of coefficients:
Feature       R                    Python
(Intercept)   0                    0
f1            -0.004059542         -0.16957
f2            0.377331808          0.33352
f3            1.006589044          0.80749
f4            0.876858914          0.71330
f5            0.140710854          0.11385
f6            730268.470575249     801091.27661
f7            244447.850561236     293769.02256
f8            537663.923355049     557147.70998
f9            176279.892636801     251954.31707
f10           662.748853227        797640.12411
f11           739399.127039033     1086129.27954
Thanks
python-glmnet/glmnet/linear.py
Lines 288 to 293 in 813c06f
glmnet actually does a slightly different check than just an "n" vs "p" comparison like this. It invokes method 1 (covariance method) if p <= 500. The covariance method keeps track of a matrix of covariances C(i,j) for every feature i and every active feature j. Under the hood, C is allocated as a p x p matrix (even though we usually use much less memory than that); this was done for simplicity, because it's very hard to write clever data structures in Fortran. So even when n >> p, if p is also large, this is not a viable default option on most machines.
Anyways, I'd suggest changing to
if X.shape[1] <= 500:
    algo_flag = 1
else:
    algo_flag = 2
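To see why a p x p allocation stops being viable as p grows, a back-of-the-envelope estimate helps. This sketch assumes 8-byte double-precision entries; the function names are illustrative, and the threshold of 500 is the one quoted above.

```python
def covariance_matrix_bytes(n_features, bytes_per_entry=8):
    """Memory for a dense p x p matrix, assuming 8-byte doubles."""
    return n_features * n_features * bytes_per_entry


def choose_algo_flag(n_features, threshold=500):
    """Return 1 (covariance method) for small p, 2 otherwise."""
    return 1 if n_features <= threshold else 2


for p in (500, 10_000, 100_000):
    mib = covariance_matrix_bytes(p) / 2**20
    print(p, choose_algo_flag(p), round(mib, 1))
```

The cost depends only on the number of features, which is why a check on n versus p alone can pick the covariance method in cases where the allocation is impractical.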
I have a specific problem that is very frustrating. In my application, if I specify n_lambda=100, and check the attribute coef_path_ it has the shape (n_samples, 5). This is expected in some cases. However, if I specify n_lambda=5, then the attribute coef_path_ looks very different. Here, I would expect similar behavior. Is there any good explanation?
It looks like the doc strings have slightly different names for various attributes that get set during CV?
I am running into a number of Fortran compilation issues on macOS. I followed the installation instructions, but when running python setup.py install I get the following warning (repeated for many lines):
glmnet/src/glmnet/glmnet5.f90:683:72:
subroutine get_int_parms(sml,eps,big,mnlam,rsqmax,pmin,exmx) 772
1
Warning: Line truncated at (1) [-Wline-truncation]
After these, there are warnings about unused labels.
At the end it returns an error.
ld: symbol(s) not found for architecture x86_64
collect2: error: ld returned 1 exit status
error: Command "/usr/local/bin/gfortran -Wall -g -pie -headerpad_max_install_names build/temp.macosx-10.9-x86_64-3.6/build/src.macosx-10.9-x86_64-3.6/glmnet/_glmnetmodule.o build/temp.macosx-10.9-x86_64-3.6/build/src.macosx-10.9-x86_64-3.6/build/src.macosx-10.9-x86_64-3.6/glmnet/fortranobject.o build/temp.macosx-10.9-x86_64-3.6/glmnet/src/glmnet/glmnet5.o -L/usr/local/Cellar/gcc/6.2.0/lib/gcc/6 -L/usr/local/Cellar/gcc/6.2.0/lib/gcc/6 -L/usr/local/Cellar/gcc/6.2.0/lib/gcc/6/gcc/x86_64-apple-darwin15.6.0/6.2.0 -lgfortran -o build/lib.macosx-10.9-x86_64-3.6/_glmnet.cpython-36m-darwin.so" failed with exit status 1
About a year and a half ago I successfully installed python-glmnet, and I cannot remember any problems at that time. I am installing in a conda environment. The required packages scipy, numpy, and scikit-learn have been installed.
Any ideas?
The glmnet.ElasticNet.predict method outputs an array with dimension 0 when given one row to predict. It should return an array with dimension 1 and shape (1,). Presumably LogitNet.predict_proba has the same problem.
Code to reproduce using glmnet v2.0.0:
from sklearn import datasets
X, y = datasets.make_regression(n_samples=9, n_features=4, random_state=0)
import glmnet
print(glmnet.__version__)
gl = glmnet.ElasticNet(random_state=0)
gl.fit(X, y)
print(gl.predict(X[:2], lamb=[20, 10]).shape)
print(gl.predict(X[:1], lamb=[20, 10]).shape)
print(gl.predict(X[:2]).shape)
print(gl.predict(X[:1]).shape)
Actual output:
2.0.0+18.ga25bcef
(2, 2)
(2,)
(2,)
()
Expected output:
2.0.0+18.ga25bcef
(2, 2)
(2,)
(2,)
(1,)
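One way a 0-dimensional result can arise is an unqualified `squeeze()` applied to a `(1, 1)` array. The sketch below uses plain NumPy with made-up coefficients. It does not reproduce glmnet's internals and is only meant to show the shape arithmetic behind the expected `(1,)`.

```python
import numpy as np

# Made-up data: one sample, four features, a single value of lambda.
X_one = np.ones((1, 4))
coef = np.arange(4.0).reshape(4, 1)  # shape (n_features, n_lambda)

z = X_one @ coef  # shape (1, 1): (n_samples, n_lambda)

collapsed = z.squeeze()    # drops every length-1 axis
kept = z.squeeze(axis=-1)  # drops only the trailing lambda axis

print(collapsed.shape)
print(kept.shape)
```

Squeezing only the lambda axis keeps the sample axis intact, so a single-row input still yields a 1-dimensional result.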
I'd originally reported this under #30 , but I believe it's a different issue.
I'm trying to build this package on Windows and not having much luck. Using the mingw-w64 Fortran compiler (installed from Anaconda) and Visual Studio 2015, I get these errors:
glmnet5.o : error LNK2001: unresolved external symbol _gfortran_runtime_error_at
glmnet5.o : error LNK2001: unresolved external symbol _gfortran_internal_pack
glmnet5.o : error LNK2001: unresolved external symbol _gfortran_internal_unpack
A morning of Googling hasn't been much help. The closest I found was this StackOverflow question, but setting the compiler=mingw32 flag in setup.cfg leads to a different error, ValueError: Unknown MS Compiler version 1900.
Have you successfully built this package on Windows?
Hello,
I have forked the repo to work on a new feature. Following the contributing documents, I tried to run pytest on the tests files, but could not. Apparently the _glmnet module is missing. I tried to import the modules directly and that failed as well. Could you help me? I get the following error.
1 import pkg_resources
2
----> 3 from .logistic import LogitNet
4 from .linear import ElasticNet
5
~/Documents/GitHub/GLM-Net/glmnet/logistic.py in <module>
12
13 from .errors import _check_error_flag
---> 14 from _glmnet import lognet, splognet, lsolns
15 from glmnet.util import (_fix_lambda_path,
16 _check_user_lambda,
ModuleNotFoundError: No module named '_glmnet'
An extra __init__.py slipped into the root directory. It's outside the glmnet module, so we should remove it.
Can we get an update for Python version 3.9 as well?
Hi,
I came across your implementation. Is it possible to have Poisson regression as well? Do you have plans to include this? Thanks.
Thank you for your package, and for making it available on conda.
If I set a max_iter that is too low, instead of getting a convergence warning as in sklearn, the fit simply fails with an error. Can this be fixed easily? I'm trying to get the solution for a single lambda (and from what I understand, if I use a default path, I have no guarantee that glmnet will go to the end of it; it may stop early, which I don't want).
Reproduce with:
from celer.datasets import make_correlated_data
from sklearn.linear_model import ElasticNet
import glmnet
from numpy.linalg import norm
import numpy as np
np.random.seed(0)
X = np.random.randn(100, 200)
X = np.asfortranarray(X)
y = np.random.randn(100)
alpha_max = norm(X.T @ y, ord=np.inf) / len(y)
clf2 = glmnet.ElasticNet(alpha=1, lambda_path=[
alpha_max, alpha_max/100], standardize=False, fit_intercept=False, tol=1e-10, max_iter=1).fit(X, y)
output:
/home/mathurin/anaconda3/envs/benchopt_lasso/lib/python3.8/site-packages/glmnet/errors.py:66: RuntimeWarning: Model did not converge for smaller values of lambda, returning solution for the largest 3 values.
warnings.warn("Model did not converge for smaller values of lambda, "
/home/mathurin/anaconda3/envs/benchopt_lasso/lib/python3.8/site-packages/glmnet/errors.py:66: RuntimeWarning: Model did not converge for smaller values of lambda, returning solution for the largest 3 values.
warnings.warn("Model did not converge for smaller values of lambda, "
/home/mathurin/anaconda3/envs/benchopt_lasso/lib/python3.8/site-packages/glmnet/util.py:202: RuntimeWarning: lambda_path has a single value, this may be an intercept-only model.
warnings.warn("lambda_path has a single value, this may be an "
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-1-97963ff96fcf> in <module>
10
11
---> 12 clf2 = glmnet.ElasticNet(alpha=1, lambda_path=[
13 alpha_max, alpha_max/100], standardize=False, fit_intercept=False, tol=1e-10, max_iter=1).fit(X, y)
~/anaconda3/envs/benchopt_lasso/lib/python3.8/site-packages/glmnet/linear.py in fit(self, X, y, sample_weight, relative_penalties, groups)
236 self._cv = GroupKFold(n_splits=self.n_splits)
237
--> 238 cv_scores = _score_lambda_path(self, X, y, groups,
239 sample_weight,
240 relative_penalties,
~/anaconda3/envs/benchopt_lasso/lib/python3.8/site-packages/glmnet/util.py in _score_lambda_path(est, X, y, groups, sample_weight, relative_penalties, scoring, n_jobs, verbose)
64 warnings.simplefilter(action, UndefinedMetricWarning)
65
---> 66 scores = Parallel(n_jobs=n_jobs, verbose=verbose, backend='threading')(
67 delayed(_fit_and_score)(est, scorer, X, y, sample_weight, relative_penalties,
68 est.lambda_path_, train_idx, test_idx)
~/anaconda3/envs/benchopt_lasso/lib/python3.8/site-packages/joblib/parallel.py in __call__(self, iterable)
1041 # remaining jobs.
1042 self._iterating = False
-> 1043 if self.dispatch_one_batch(iterator):
1044 self._iterating = self._original_iterator is not None
1045
~/anaconda3/envs/benchopt_lasso/lib/python3.8/site-packages/joblib/parallel.py in dispatch_one_batch(self, iterator)
859 return False
860 else:
--> 861 self._dispatch(tasks)
862 return True
863
~/anaconda3/envs/benchopt_lasso/lib/python3.8/site-packages/joblib/parallel.py in _dispatch(self, batch)
777 with self._lock:
778 job_idx = len(self._jobs)
--> 779 job = self._backend.apply_async(batch, callback=cb)
780 # A job can complete so quickly than its callback is
781 # called before we get here, causing self._jobs to
~/anaconda3/envs/benchopt_lasso/lib/python3.8/site-packages/joblib/_parallel_backends.py in apply_async(self, func, callback)
206 def apply_async(self, func, callback=None):
207 """Schedule a func to be run"""
--> 208 result = ImmediateResult(func)
209 if callback:
210 callback(result)
~/anaconda3/envs/benchopt_lasso/lib/python3.8/site-packages/joblib/_parallel_backends.py in __init__(self, batch)
570 # Don't delay the application, to avoid keeping the input
571 # arguments in memory
--> 572 self.results = batch()
573
574 def get(self):
~/anaconda3/envs/benchopt_lasso/lib/python3.8/site-packages/joblib/parallel.py in __call__(self)
260 # change the default number of processes to -1
261 with parallel_backend(self._backend, n_jobs=self._n_jobs):
--> 262 return [func(*args, **kwargs)
263 for func, args, kwargs in self.items]
264
~/anaconda3/envs/benchopt_lasso/lib/python3.8/site-packages/joblib/parallel.py in <listcomp>(.0)
260 # change the default number of processes to -1
261 with parallel_backend(self._backend, n_jobs=self._n_jobs):
--> 262 return [func(*args, **kwargs)
263 for func, args, kwargs in self.items]
264
~/anaconda3/envs/benchopt_lasso/lib/python3.8/site-packages/glmnet/util.py in _fit_and_score(est, scorer, X, y, sample_weight, relative_penalties, score_lambda_path, train_inx, test_inx)
117
118 lamb = np.clip(score_lambda_path, m.lambda_path_[-1], m.lambda_path_[0])
--> 119 return scorer(m, X[test_inx, :], y[test_inx], lamb=lamb)
120
121
~/anaconda3/envs/benchopt_lasso/lib/python3.8/site-packages/glmnet/scorer.py in _passthrough_scorer(estimator, *args, **kwargs)
187 def _passthrough_scorer(estimator, *args, **kwargs):
188 """Function that wraps estimator.score"""
--> 189 return estimator.score(*args, **kwargs)
190
191
~/anaconda3/envs/benchopt_lasso/lib/python3.8/site-packages/glmnet/linear.py in score(self, X, y, lamb)
437
438 # pred will have shape (n_samples, n_lambda)
--> 439 pred = self.predict(X, lamb=lamb)
440
441 # Reverse the args of the r2_score function from scikit-learn. The
~/anaconda3/envs/benchopt_lasso/lib/python3.8/site-packages/glmnet/linear.py in predict(self, X, lamb)
414 Predicted response value for each sample given each value of lambda
415 """
--> 416 return self.decision_function(X, lamb)
417
418 def score(self, X, y, lamb=None):
~/anaconda3/envs/benchopt_lasso/lib/python3.8/site-packages/glmnet/linear.py in decision_function(self, X, lamb)
392 # single value of lambda
393 if lamb.shape[0] == 1:
--> 394 z = z.squeeze(axis=-1)
395 return z
396
ValueError: cannot select an axis to squeeze out which has size not equal to one
ping @agramfort
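The final frame fails because `squeeze(axis=-1)` requires that axis to have length one. A defensive variant is sketched below with plain NumPy. `squeeze_lambda_axis` is a hypothetical helper, not part of glmnet, and whether guarding the squeeze is the right fix for the underlying non-convergence is a separate question.

```python
import numpy as np


def squeeze_lambda_axis(z):
    """Drop the trailing lambda axis only when it has length one."""
    z = np.asarray(z)
    if z.ndim >= 2 and z.shape[-1] == 1:
        return z.squeeze(axis=-1)
    return z


single = np.zeros((5, 1))   # (n_samples, 1 lambda)
several = np.zeros((5, 3))  # (n_samples, 3 lambdas)

print(squeeze_lambda_axis(single).shape)
print(squeeze_lambda_axis(several).shape)

try:
    several.squeeze(axis=-1)  # the same kind of call as the last traceback frame
except ValueError as exc:
    print("ValueError:", exc)
```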
Hey there,
when doing leave-one-out cross-validation, LogitNet fails with an IndexError when I try to predict the label of the single test sample.
Here is a simple working example:
import numpy as np
from glmnet import LogitNet

X = np.random.randint(0, 10, size=(22, 10))
y = np.random.randint(0, 2, size=(22,))
X_train = X[:-1, :]
y_train = y[:-1]
X_test = X[[-1]]  # a single held-out sample, kept 2-D
y_test = y[[-1]]
m = LogitNet(alpha=0.8, tol=0.3, max_iter=2000)
m.fit(X_train, y_train)
m.predict(X_test)
In my case it fails with
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-178-2b6b26202b08> in <module>()
15 m.fit(X_train, y_train)
16
---> 17 m.predict(X_test)
/usr/local/lib/python3.5/dist-packages/glmnet/logistic.py in predict(self, X, lamb)
474 """
475
--> 476 scores = self.predict_proba(X, lamb)
477 indices = scores.argmax(axis=1)
478
/usr/local/lib/python3.5/dist-packages/glmnet/logistic.py in predict_proba(self, X, lamb)
443 # reshape z to (n_samples, n_classes, n_lambda)
444 n_lambda = len(np.atleast_1d(lamb))
--> 445 z = z.reshape(z.shape[0], -1, n_lambda)
446
447 if z.shape[1] == 1:
IndexError: tuple index out of range
As a side note, this may be related to #25, but I am not sure.
When switching to leave-2-out cv the problem does not occur anymore.
Best and Merry Christmas!
I have been trying to use
F1 = make_scorer(fbeta_score, beta=1, labels = ['1', '2'], average='micro')
and F2
as the scoring parameter for logistic regression but glmnet throws the below error. It runs smoothly with scoring = 'accuracy', though. I tried to study the code but couldn't find a way to work with customized scores. Any help would be appreciated.
TypeError Traceback (most recent call last)
/home/Dados/Redes_Neurais_II/Dissertacao/Teste pdf 201224.py in <module>
4110 n_splits=n_splits, min_lambda_ratio=min_lambda_ratio, tol=tol,
4111 scoring=scoring, n_jobs=-1, random_state=1, verbose=True)
----> 4112 clf_cv = lgnetcv.fit(x_train, y_train)
4113 print(f'\nMelhor lambda = {clf_cv.lambda_best_} para alpha = {alpha}')
4114 # Usa todo o conj de treinamento (inclusive valid) para achar coeficientes finais
~/.local/lib/python3.6/site-packages/glmnet/logistic.py in fit(self, X, y, sample_weight, relative_penalties, groups)
248 self.scoring,
249 n_jobs=self.n_jobs,
--> 250 verbose=self.verbose)
251
252 self.cv_mean_score_ = np.atleast_1d(np.mean(cv_scores, axis=0))
~/.local/lib/python3.6/site-packages/glmnet/util.py in _score_lambda_path(est, X, y, groups, sample_weight, relative_penalties, scoring, n_jobs, verbose)
67 delayed(_fit_and_score)(est, scorer, X, y, sample_weight, relative_penalties,
68 est.lambda_path_, train_idx, test_idx)
---> 69 for (train_idx, test_idx) in cv_split)
70
71 return scores
~/.local/lib/python3.6/site-packages/joblib/parallel.py in __call__(self, iterable)
1015
1016 with self._backend.retrieval_context():
-> 1017 self.retrieve()
1018 # Make sure that we get a last message telling us we are done
1019 elapsed_time = time.time() - self._start_time
~/.local/lib/python3.6/site-packages/joblib/parallel.py in retrieve(self)
907 try:
908 if getattr(self._backend, 'supports_timeout', False):
--> 909 self._output.extend(job.get(timeout=self.timeout))
910 else:
911 self._output.extend(job.get())
/usr/lib/python3.6/multiprocessing/pool.py in get(self, timeout)
642 return self._value
643 else:
--> 644 raise self._value
645
646 def _set(self, i, obj):
/usr/lib/python3.6/multiprocessing/pool.py in worker(inqueue, outqueue, initializer, initargs, maxtasks, wrap_exception)
117 job, i, func, args, kwds = task
118 try:
--> 119 result = (True, func(*args, **kwds))
120 except Exception as e:
121 if wrap_exception and func is not _helper_reraises_exception:
~/.local/lib/python3.6/site-packages/joblib/_parallel_backends.py in __call__(self, *args, **kwargs)
606 def __call__(self, *args, **kwargs):
607 try:
--> 608 return self.func(*args, **kwargs)
609 except KeyboardInterrupt:
610 # We capture the KeyboardInterrupt and reraise it as
~/.local/lib/python3.6/site-packages/joblib/parallel.py in __call__(self)
254 with parallel_backend(self._backend, n_jobs=self._n_jobs):
255 return [func(*args, **kwargs)
--> 256 for func, args, kwargs in self.items]
257
258 def __len__(self):
~/.local/lib/python3.6/site-packages/joblib/parallel.py in <listcomp>(.0)
254 with parallel_backend(self._backend, n_jobs=self._n_jobs):
255 return [func(*args, **kwargs)
--> 256 for func, args, kwargs in self.items]
257
258 def __len__(self):
~/.local/lib/python3.6/site-packages/glmnet/util.py in _fit_and_score(est, scorer, X, y, sample_weight, relative_penalties, score_lambda_path, train_inx, test_inx)
117
118 lamb = np.clip(score_lambda_path, m.lambda_path_[-1], m.lambda_path_[0])
--> 119 return scorer(m, X[test_inx, :], y[test_inx], lamb=lamb)
120
121
TypeError: __call__() got an unexpected keyword argument 'lamb'
Hi,
An almost fully functional glmnet_python version from Stanford has been around for a few months now. It has been tested and validated with base R versions with several use cases. There is also an extensive vignette with many examples and feature documentation.
However, it needs a little bit more work in getting it pip installable, by someone knowledgeable about how it is done. Essentially, it has been successfully installed on Centos 6.7/64-bit linux machines but not tested in others.
It would be great to get the help of this community to get the two versions integrated. Here is the code repository:
https://github.com/bbalasub1/glmnet_python
The best starting point for looking at this project is the jupyter notebook here:
https://github.com/bbalasub1/glmnet_python/blob/master/test/glmnet_examples.ipynb
Thank you & Regards,
Bala
A wrapper of the regularized cox regression would be awesome. I'm thinking about trying, would it be a similar thing to LogitNet?
The package does not build with pip using --use-pep517 with poetry. This is mainly an issue because the flag cannot be disabled with poetry (see python-poetry/poetry#3433).
A reproducible example is given below:
mkdir tmp
cd tmp
poetry init
# click through steps...
poetry add glmnet
> Using version ^2.2.1 for glmnet
>
> Updating dependencies
> Resolving dependencies... (0.1s)
>
> Writing lock file
>
> Package operations: 6 installs, 0 updates, 0 removals
>
> • Installing numpy (1.23.3)
> • Installing joblib (1.2.0)
> • Installing scipy (1.9.2)
> • Installing threadpoolctl (3.1.0)
> • Installing scikit-learn (1.1.2)
> • Installing glmnet (2.2.1): Failed
>
> CalledProcessError
>
> ...
>
> Installing build dependencies: started
> Installing build dependencies: finished with status 'done'
> Getting requirements to build wheel: started
> Getting requirements to build wheel: finished with status 'error'
> error: subprocess-exited-with-error
>
> × Getting requirements to build wheel did not run successfully.
> │ exit code: 1
> ╰─> [2 lines of output]
> install requires: 'numpy'. use pip or easy_install.
> $ pip install numpy
> [end of output]
The error comes from the beginning of the setup.py file:
Lines 12 to 18 in 813c06f
which seems to use this SO answer to compile the FORTRAN code from the original R package: https://stackoverflow.com/a/55358607/5861244
A solution in the short term is to call
poetry run python -m pip install glmnet --no-use-pep517
poetry add glmnet
as mentioned here: python-poetry/poetry#3433 (comment)
Versions for completeness:
poetry run python --version
> Python 3.10.7
poetry --version
> Poetry (version 1.2.0)
from glmnet import LogitNet
m3 = LogitNet()
xd = trn
%time m3 = m3.fit(X=xd, y=yd_trn)
The fit command gives me an error:
Traceback (most recent call last):
File "<timed exec>", line 1, in <module>
File "XXX/anaconda3/lib/python3.6/site-packages/glmnet/logistic.py", line 206, in fit
self.cv = self.CV(n_splits=self.n_splits, shuffle=True,
AttributeError: 'LogitNet' object has no attribute 'CV'
I have 'fixed' it by changing code to LogitNet.CV .... I am not clear on why this should be required.
I am using Python 3.6.3 |Anaconda custom (64-bit)| (default, Oct 13 2017, 12:02:49)
Hey there,
I've been trying to fit an Elastic Net with your toolbox and ran into an error:
In the logistic.py class in the predict_proba() function you have the following code:
z = self.decision_function(X, lamb)
expit(z, z)
# z = np.atleast_2d(z)
# reshape z to (n_samples, n_classes, n_lambda)
n_lambda = len(np.atleast_1d(lamb))
z = z.reshape(z.shape[0], -1, n_lambda)
However, when the passed X is only one-dimensional and, let's say, n_lambda = 86, then z.shape will be the number of lambdas (as in (86,), not (1, 86)). This leads the reshape to fail, since it tries to shape a 1x86 array into an 86xKx86 array.
As you can see, I added the z = np.atleast_2d(z) line, which takes care of the reshaping problem. However, I then get this kind of error:
/usr/local/lib/python3.4/dist-packages/glmnet/logistic.py in predict(self, X, lamb)
478 indices = scores.argmax(axis=1)
479
--> 480 return self.classes_[indices]
481
482 def score(self, X, y, lamb=None):
IndexError: index 85 is out of bounds for axis 1 with size 2
since then the output is apparently not in the expected shape anymore.
I believe this error could be fixed with a simple axis=0 in line 478, but I do not have the full overview, so I thought it better to report back to you.
Best,
Sophie
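The reshape failure described in this report can be reproduced with plain NumPy. The sketch below uses made-up decision values and only illustrates the shape arithmetic; it does not address the later `argmax` indexing error, and it is not glmnet's code.

```python
import numpy as np

n_lambda = 86

# One sample: the decision values come back 1-D, one entry per lambda.
z = np.linspace(0.0, 1.0, n_lambda)  # shape (86,)

try:
    z.reshape(z.shape[0], -1, n_lambda)  # asks for (86, ?, 86) from 86 values
except ValueError as exc:
    print("reshape failed:", exc)

# Restoring the sample axis first makes the target shape consistent.
z2 = np.atleast_2d(z)  # shape (1, 86)
z3 = z2.reshape(z2.shape[0], -1, n_lambda)  # (n_samples, n_classes, n_lambda)
print(z3.shape)
```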
On Windows 10 cmd, I typed the following command
conda install -c conda-forge glmnet
But the error message says
PackagesNotFoundError: The following packages are not available from current channels:
- glmnet
Current channels:
- https://conda.anaconda.org/conda-forge/win-64
- https://conda.anaconda.org/conda-forge/noarch
- https://repo.anaconda.com/pkgs/main/win-64
- https://repo.anaconda.com/pkgs/main/noarch
- https://repo.anaconda.com/pkgs/r/win-64
- https://repo.anaconda.com/pkgs/r/noarch
- https://repo.anaconda.com/pkgs/msys2/win-64
- https://repo.anaconda.com/pkgs/msys2/noarch
To search for alternate channels that may provide the conda package you're
looking for, navigate to
https://anaconda.org
and use the search bar at the top of the page.
CircleCI v1 configuration files were discontinued at the end of August 2018. We should update to use a v2 configuration file.
Hi there,
In the R implementation, glmnet allows supplying the model with an offset, a vector to be included in the linear predictor. It seems that the current python implementation does not yet have it. Do you intend to support this soon? I ask because I found a placeholder in your code. Thanks.
offset = np.zeros((X.shape[0], n_classes), dtype=np.float64, order='F')
Is it possible to customise the CV scheme? For example, can one perform canonical CV?
Just what the title says. It should be implemented for full sklearn compliance.
Hi,
So I have this project where I am forced to use 'glmnet'. For the last 10 hours I have tried almost everything possible to install it, but it gives a 'numpy' dependency error even though I have the latest version of numpy installed. From the documentation at https://pypi.org/project/glmnet/ I found out that it needs a Fortran compiler; I have installed that too and ensured it is working, but to no avail. I have pasted the error output below. Any help will be greatly appreciated, as I cannot find any recent post on this that works for the current versions of python and pip. Also, if I need to downgrade something before I can install it, please let me know. I have tried this too, somewhat haphazardly, but it has not worked.
Thanks in advance.
ERROR:
C:\Users\PMLS>pip install glmnet
Collecting glmnet
Using cached glmnet-2.2.1.tar.gz (90 kB)
Preparing metadata (setup.py) ... error
error: subprocess-exited-with-error
× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> [2 lines of output]
install requires: 'numpy'. use pip or easy_install.
$ pip install numpy
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed
× Encountered error while generating package metadata.
╰─> See above for output.
note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
For me, the package still gets installed in Python 2.7 after the update to conda-forge following #32.
I did a conda clean -a to clear caches and tarballs.
I am using Windows 10. I got the error: the following packages are not available from current channels: glmnet
I've been using glmnet==2.2.1 installed from Mac wheels with gcc==9.3.0 with no issues. But when my colleagues who didn't have gcc installed yet tried to go through the setup, they ran into the following:
python -c "import glmnet"
ImportError: dlopen(/Users/rockwellweiner/model/.venv/lib/python3.7/site-packages/_glmnet.cpython-37m-darwin.so, 2): Library not loaded: /usr/local/opt/gcc/lib/gcc/9/libgfortran.5.dylib Referenced from: /Users/rockwellweiner/model/.venv/lib/python3.7/site-packages/_glmnet.cpython-37m-darwin.so Reason: image not found
Symlinking /usr/local/opt/gcc/lib/gcc/10 to /usr/local/opt/gcc/lib/gcc/9 seems to do the trick but is obviously not ideal; is there maybe a change to setup.py or the wheel build script that would support both?
I was hoping to get some clarification on why glmnet for Python is always deterministic regardless of seed, despite the fact that the documentation states the solver is not deterministic (e.g. https://github.com/civisanalytics/python-glmnet/blob/master/glmnet/linear.py#L77). For example, each of the following runs returns the same result, regardless of whether a seed is set:
from glmnet import ElasticNet
import io
import numpy as np
import pandas as pd
import requests
from sklearn.preprocessing import StandardScaler
# Load data
url = 'https://raw.githubusercontent.com/CCS-Lab/easyml/master/Python/datasets/prostate.csv'
s = requests.get(url).content
prostate = pd.read_csv(io.StringIO(s.decode('utf-8')))
# Generate coefficients from data by hand
X, y = prostate.drop('lpsa', axis=1).values, prostate['lpsa'].values
sclr = StandardScaler()
X_preprocessed = sclr.fit_transform(X)
# no random state
coefficients = []
for i in range(10):
    model = ElasticNet(alpha=1, standardize=False, cut_point=0.0, n_lambda=200)
    print(id(model))
    model.fit(X_preprocessed, y)
    coefficients.append(np.asarray(model.coef_))
print(coefficients)
# seed set at outer level
np.random.seed(43210)
coefficients = []
for i in range(10):
    model = ElasticNet(alpha=1, standardize=False, cut_point=0.0, n_lambda=200)
    print(id(model))
    model.fit(X_preprocessed, y)
    coefficients.append(np.asarray(model.coef_))
print(coefficients)
# seed set at inner level
coefficients = []
for i in range(10):
    np.random.seed(43210)
    model = ElasticNet(alpha=1, standardize=False, cut_point=0.0, n_lambda=200)
    print(id(model))
    model.fit(X_preprocessed, y)
    coefficients.append(np.asarray(model.coef_))
print(coefficients)
# seed set at function level
coefficients = []
for i in range(10):
    random_state = np.random.RandomState(i)
    model = ElasticNet(alpha=1, standardize=False, cut_point=0.0, n_lambda=200, random_state=random_state)
    print(id(model))
    model.fit(X_preprocessed, y)
    coefficients.append(np.asarray(model.coef_))
print(coefficients)
coefficients = []
random_state = np.random.RandomState(43210)
for i in range(10):
    model = ElasticNet(alpha=1, standardize=False, cut_point=0.0, n_lambda=200, random_state=random_state)
    print(id(model))
    model.fit(X_preprocessed, y)
    coefficients.append(np.asarray(model.coef_))
print(coefficients)
This behavior is in direct contrast with the behavior observed in the R version of the glmnet package:
library(easyml) # devtools::install_github("CCS-Lab/easyml", subdir = "R")
library(glmnet)
data("prostate", package = "easyml")
# Set X, y, and scale X
X <- as.matrix(prostate[, -9])
y <- prostate[, 9]
X_scaled <- scale(X)
# no seed
m <- 10
n <- ncol(X)
Z <- matrix(NA, nrow = m, ncol = n)
for (i in (1:m)) {
  model_cv <- cv.glmnet(X_scaled, y, standardize = FALSE)
  model <- glmnet(X_scaled, y)
  coefs <- coef(model, s = model_cv$lambda.min)
  Z[i, ] <- as.numeric(coefs)[-1]
}
print(Z)
# Seed set at outer level
set.seed(43210)
m <- 10
n <- ncol(X)
Z <- matrix(NA, nrow = m, ncol = n)
for (i in (1:m)) {
  model_cv <- cv.glmnet(X_scaled, y, standardize = FALSE)
  model <- glmnet(X_scaled, y)
  coefs <- coef(model, s = model_cv$lambda.min)
  Z[i, ] <- as.numeric(coefs)[-1]
}
print(Z)
# Seed set at inner level
Z <- matrix(NA, nrow = m, ncol = n)
for (i in (1:m)) {
  set.seed(43210)
  model_cv <- cv.glmnet(X_scaled, y, standardize = FALSE)
  model <- glmnet(X_scaled, y)
  coefs <- coef(model, s = model_cv$lambda.min)
  Z[i, ] <- as.numeric(coefs)[-1]
}
print(Z)
# Different seed set each loop at inner level
Z <- matrix(NA, nrow = m, ncol = n)
for (i in (1:m)) {
  set.seed(i)
  model_cv <- cv.glmnet(X_scaled, y, standardize = FALSE)
  model <- glmnet(X_scaled, y)
  coefs <- coef(model, s = model_cv$lambda.min)
  Z[i, ] <- as.numeric(coefs)[-1]
}
print(Z)
If python-glmnet (https://github.com/civisanalytics/python-glmnet/) is a wrapper around the same Fortran code, why does the behavior differ between R and Python?
The latest master threw some warnings, such as
/home/samuel/anaconda3/lib/python3.5/site-packages/sklearn/cross_validation.py:44:
DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved.
Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
"This module will be removed in 0.20.", DeprecationWarning)
and
home/samuel/anaconda3/lib/python3.5/site-packages/sklearn/utils/validation.py:395:
DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and will raise ValueError in 0.19.
Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample.
DeprecationWarning)
scikit-learn is already at 0.18, so these warnings are only an annoyance for now, but they are worth looking into before the deprecated module and the 1d-array behavior are removed.
If I run
from glmnet import ElasticNet, LogitNet
from sklearn.utils.estimator_checks import check_estimator
check_estimator(ElasticNet)
check_estimator(LogitNet)
then each estimator check fails.
For the ElasticNet, the error is
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-40-d2891e7905ab> in <module>()
----> 1 check_estimator(ElasticNet)
~/anaconda3/envs/civis/lib/python3.6/site-packages/sklearn/utils/estimator_checks.py in check_estimator(Estimator)
263 for check in _yield_all_checks(name, estimator):
264 try:
--> 265 check(name, estimator)
266 except SkipTest as message:
267 # the only SkipTest thrown currently results from not
~/anaconda3/envs/civis/lib/python3.6/site-packages/sklearn/utils/testing.py in wrapper(*args, **kwargs)
289 with warnings.catch_warnings():
290 warnings.simplefilter("ignore", self.category)
--> 291 return fn(*args, **kwargs)
292
293 return wrapper
~/anaconda3/envs/civis/lib/python3.6/site-packages/sklearn/utils/estimator_checks.py in check_sample_weights_list(name, estimator_orig)
429 sample_weight = [3] * 10
430 # Test that estimators don't raise any exception
--> 431 estimator.fit(X, y, sample_weight=sample_weight)
432
433
~/anaconda3/envs/civis/lib/python3.6/site-packages/glmnet/linear.py in fit(self, X, y, sample_weight, relative_penalties)
186 sample_weight = np.ones(X.shape[0])
187
--> 188 self._fit(X, y, sample_weight, relative_penalties)
189
190 if self.n_splits >= 3:
~/anaconda3/envs/civis/lib/python3.6/site-packages/glmnet/linear.py in _fit(self, X, y, sample_weight, relative_penalties)
225
226 _y = y.astype(dtype=np.float64, order='F', copy=True)
--> 227 _sample_weight = sample_weight.astype(dtype=np.float64, order='F',
228 copy=True)
229
AttributeError: 'list' object has no attribute 'astype'
and for the LogitNet, it's
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-42-b458d16bd33c> in <module>()
----> 1 check_estimator(LogitNet)
~/anaconda3/envs/civis/lib/python3.6/site-packages/sklearn/utils/estimator_checks.py in check_estimator(Estimator)
263 for check in _yield_all_checks(name, estimator):
264 try:
--> 265 check(name, estimator)
266 except SkipTest as message:
267 # the only SkipTest thrown currently results from not
~/anaconda3/envs/civis/lib/python3.6/site-packages/sklearn/utils/testing.py in wrapper(*args, **kwargs)
289 with warnings.catch_warnings():
290 warnings.simplefilter("ignore", self.category)
--> 291 return fn(*args, **kwargs)
292
293 return wrapper
~/anaconda3/envs/civis/lib/python3.6/site-packages/sklearn/utils/estimator_checks.py in check_sample_weights_list(name, estimator_orig)
429 sample_weight = [3] * 10
430 # Test that estimators don't raise any exception
--> 431 estimator.fit(X, y, sample_weight=sample_weight)
432
433
~/anaconda3/envs/civis/lib/python3.6/site-packages/glmnet/logistic.py in fit(self, X, y, sample_weight, relative_penalties)
196 self.scoring, classifier=True,
197 n_jobs=self.n_jobs,
--> 198 verbose=self.verbose)
199
200 self.cv_mean_score_ = np.atleast_1d(np.mean(cv_scores, axis=0))
~/anaconda3/envs/civis/lib/python3.6/site-packages/glmnet/util.py in _score_lambda_path(est, X, y, sample_weight, relative_penalties, cv, scoring, classifier, n_jobs, verbose)
69 delayed(_fit_and_score)(est, scorer, X, y, sample_weight, relative_penalties,
70 est.lambda_path_, train_idx, test_idx)
---> 71 for (train_idx, test_idx) in cv)
72
73 return scores
~/anaconda3/envs/civis/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py in __call__(self, iterable)
777 # was dispatched. In particular this covers the edge
778 # case of Parallel used with an exhausted iterator.
--> 779 while self.dispatch_one_batch(iterator):
780 self._iterating = True
781 else:
~/anaconda3/envs/civis/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py in dispatch_one_batch(self, iterator)
623 return False
624 else:
--> 625 self._dispatch(tasks)
626 return True
627
~/anaconda3/envs/civis/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py in _dispatch(self, batch)
586 dispatch_timestamp = time.time()
587 cb = BatchCompletionCallBack(dispatch_timestamp, len(batch), self)
--> 588 job = self._backend.apply_async(batch, callback=cb)
589 self._jobs.append(job)
590
~/anaconda3/envs/civis/lib/python3.6/site-packages/sklearn/externals/joblib/_parallel_backends.py in apply_async(self, func, callback)
109 def apply_async(self, func, callback=None):
110 """Schedule a func to be run"""
--> 111 result = ImmediateResult(func)
112 if callback:
113 callback(result)
~/anaconda3/envs/civis/lib/python3.6/site-packages/sklearn/externals/joblib/_parallel_backends.py in __init__(self, batch)
330 # Don't delay the application, to avoid keeping the input
331 # arguments in memory
--> 332 self.results = batch()
333
334 def get(self):
~/anaconda3/envs/civis/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py in __call__(self)
129
130 def __call__(self):
--> 131 return [func(*args, **kwargs) for func, args, kwargs in self.items]
132
133 def __len__(self):
~/anaconda3/envs/civis/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py in <listcomp>(.0)
129
130 def __call__(self):
--> 131 return [func(*args, **kwargs) for func, args, kwargs in self.items]
132
133 def __len__(self):
~/anaconda3/envs/civis/lib/python3.6/site-packages/glmnet/util.py in _fit_and_score(est, scorer, X, y, sample_weight, relative_penalties, score_lambda_path, train_inx, test_inx)
112 """
113 m = clone(est)
--> 114 m = m._fit(X[train_inx, :], y[train_inx], sample_weight[train_inx], relative_penalties)
115
116 lamb = np.clip(score_lambda_path, m.lambda_path_[-1], m.lambda_path_[0])
TypeError: only integer scalar arrays can be converted to a scalar index
I would expect both estimators to pass the check_estimator checks.
Sklearn's ElasticNet has the positive=True parameter. Would it be possible to add it here?
http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.ElasticNet.html
A minimal, reproducible example:
from glmnet import ElasticNet, LogitNet
from sklearn import datasets
X, y = datasets.make_classification(n_samples=100000, n_features=20,
n_informative=2, n_redundant=2)
er = ElasticNet()
lr = LogitNet()
er.fit(X, y)
print(er.coef_)
print(er.coef_.shape)
lr.fit(X, y)
print(lr.coef_)
print(lr.coef_.shape)
It'd be nice if the README were .rst instead of .md, for PyPI.
I think one could just
pandoc --from=markdown --to=rst --output=README.rst README.md
I want to use 'lambda.1se' to choose the model, but I couldn't find which parameter controls it.
Can you help me with that?
Thank you!
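For reference, R's lambda.1se is the largest lambda whose cross-validated error is within one standard error of the best one. The snippets earlier in this thread pass cut_point=0.0 to ElasticNet; the working assumption here, to be checked against the package documentation, is that cut_point is the number of standard errors used for this rule and that the selected value ends up in lambda_best_. The rule itself can be sketched without glmnet. The function name and all numbers below are made up for illustration:

```python
def one_standard_error_lambda(lambdas, mean_scores, std_errors, cut_point=1.0):
    """Largest lambda whose mean CV score is within cut_point standard
    errors of the best mean CV score (higher scores are better)."""
    best = max(range(len(lambdas)), key=lambda i: mean_scores[i])
    threshold = mean_scores[best] - cut_point * std_errors[best]
    candidates = [lam for lam, score in zip(lambdas, mean_scores)
                  if score >= threshold]
    return max(candidates)


# Made-up cross-validation results for five penalty values.
lambdas = [1.0, 0.5, 0.25, 0.1, 0.05]
mean_scores = [0.60, 0.70, 0.74, 0.75, 0.73]
std_errors = [0.03, 0.03, 0.02, 0.02, 0.02]

# One standard error: the more heavily penalized model at 0.25 qualifies.
print(one_standard_error_lambda(lambdas, mean_scores, std_errors))       # 0.25
# Zero standard errors: falls back to the best-scoring lambda.
print(one_standard_error_lambda(lambdas, mean_scores, std_errors, 0.0))  # 0.1
```

With cut_point=1.0 the rule prefers the sparser model at lambda 0.25 even though lambda 0.1 scores slightly higher; with cut_point=0.0 it reduces to picking the best score, the analogue of lambda.min.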
Hey there,
I sometimes get the following error when fitting the model for cross validation:
ValueError Traceback (most recent call last)
<ipython-input-39-ed9d5b075d90> in <module>()
38 print(X_train.shape)
39
---> 40 m.fit(X_train,y_train)
41 scores.append(m.score(X_test,y_test))
42
/usr/local/lib/python3.5/dist-packages/glmnet/logistic.py in fit(self, X, y, sample_weight, relative_penalties)
196 self.scoring, classifier=True,
197 n_jobs=self.n_jobs,
--> 198 verbose=self.verbose)
199
200 self.cv_mean_score_ = np.atleast_1d(np.mean(cv_scores, axis=0))
/usr/local/lib/python3.5/dist-packages/glmnet/util.py in _score_lambda_path(est, X, y, sample_weight, relative_penalties, cv, scoring, classifier, n_jobs, verbose)
69 delayed(_fit_and_score)(est, scorer, X, y, sample_weight, relative_penalties,
70 est.lambda_path_, train_idx, test_idx)
---> 71 for (train_idx, test_idx) in cv)
72
73 return scores
/usr/local/lib/python3.5/dist-packages/sklearn/externals/joblib/parallel.py in __call__(self, iterable)
777 # was dispatched. In particular this covers the edge
778 # case of Parallel used with an exhausted iterator.
--> 779 while self.dispatch_one_batch(iterator):
780 self._iterating = True
781 else:
/usr/local/lib/python3.5/dist-packages/sklearn/externals/joblib/parallel.py in dispatch_one_batch(self, iterator)
623 return False
624 else:
--> 625 self._dispatch(tasks)
626 return True
627
/usr/local/lib/python3.5/dist-packages/sklearn/externals/joblib/parallel.py in _dispatch(self, batch)
586 dispatch_timestamp = time.time()
587 cb = BatchCompletionCallBack(dispatch_timestamp, len(batch), self)
--> 588 job = self._backend.apply_async(batch, callback=cb)
589 self._jobs.append(job)
590
/usr/local/lib/python3.5/dist-packages/sklearn/externals/joblib/_parallel_backends.py in apply_async(self, func, callback)
109 def apply_async(self, func, callback=None):
110 """Schedule a func to be run"""
--> 111 result = ImmediateResult(func)
112 if callback:
113 callback(result)
/usr/local/lib/python3.5/dist-packages/sklearn/externals/joblib/_parallel_backends.py in __init__(self, batch)
330 # Don't delay the application, to avoid keeping the input
331 # arguments in memory
--> 332 self.results = batch()
333
334 def get(self):
/usr/local/lib/python3.5/dist-packages/sklearn/externals/joblib/parallel.py in __call__(self)
129
130 def __call__(self):
--> 131 return [func(*args, **kwargs) for func, args, kwargs in self.items]
132
133 def __len__(self):
/usr/local/lib/python3.5/dist-packages/sklearn/externals/joblib/parallel.py in <listcomp>(.0)
129
130 def __call__(self):
--> 131 return [func(*args, **kwargs) for func, args, kwargs in self.items]
132
133 def __len__(self):
/usr/local/lib/python3.5/dist-packages/glmnet/util.py in _fit_and_score(est, scorer, X, y, sample_weight, relative_penalties, score_lambda_path, train_inx, test_inx)
112 """
113 m = clone(est)
--> 114 m = m._fit(X[train_inx, :], y[train_inx], sample_weight[train_inx], relative_penalties)
115
116 lamb = np.clip(score_lambda_path, m.lambda_path_[-1], m.lambda_path_[0])
/usr/local/lib/python3.5/dist-packages/glmnet/logistic.py in _fit(self, X, y, sample_weight, relative_penalties)
370 self.lambda_path_ = self.lambda_path_[:self.n_lambda_]
371 # also fix the first value of lambda
--> 372 self.lambda_path_ = _fix_lambda_path(self.lambda_path_)
373 self.intercept_path_ = self.intercept_path_[:, :self.n_lambda_]
374 # also trim the compressed coefficient matrix
/usr/local/lib/python3.5/dist-packages/glmnet/util.py in _fix_lambda_path(lambda_path)
122 reasonable. The method below matches what is done in the R/glmnent wrapper."""
123 if lambda_path.shape[0] > 2:
--> 124 lambda_0 = math.exp(2 * math.log(lambda_path[1]) - math.log(lambda_path[2]))
125 lambda_path[0] = lambda_0
126 return lambda_path
ValueError: math domain error
I guess there is an uncaught zero, negative value, or -inf in the lambda_path that causes math.log to fail. Could you add a condition to guard against it?
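A minimal sketch of such a guard, based only on the three lines of _fix_lambda_path visible in the traceback. The function name is hypothetical, and leaving the first entry untouched when the extrapolation is undefined is an assumption, not necessarily what the maintainers would choose:

```python
import math


def fix_lambda_path_guarded(lambda_path):
    """Sketch of _fix_lambda_path that skips the update when log() is undefined."""
    if len(lambda_path) > 2:
        second, third = float(lambda_path[1]), float(lambda_path[2])
        # math.log raises "math domain error" for zero or negative input,
        # so only extrapolate when both values are positive and finite.
        if (second > 0 and third > 0
                and math.isfinite(second) and math.isfinite(third)):
            lambda_path[0] = math.exp(2 * math.log(second) - math.log(third))
    return lambda_path


# Well-behaved path: the first entry is replaced by the extrapolated value.
print(fix_lambda_path_guarded([100.0, 0.5, 0.25]))
# Path containing a zero: the original function would raise ValueError here.
print(fix_lambda_path_guarded([100.0, 0.5, 0.0]))
```

The guard only avoids the crash. It does not explain why the solver returned a non-positive lambda, which may still deserve its own warning.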
Hey there,
when trying to fit a LogitNet using
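import numpy as np
from glmnet import LogitNet
# a single feature column that is constant across all 22 samples
X = np.ones((22, 1))
y_ = np.array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
m = LogitNet(alpha=0.8, maxiter=2000, n_splits=3, tol=0.3)
m.fit(X, y_)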
I get
RuntimeError Traceback (most recent call last)
<ipython-input-30-7c128fa244fb> in <module>()
----> 1 m.fit(X,y_)
/usr/local/lib/python3.5/dist-packages/glmnet/logistic.py in fit(self, X, y, sample_weight, relative_penalties)
187
188 # fit the model
--> 189 self._fit(X, y, sample_weight, relative_penalties)
190
191 # score each model on the path of lambda values found by glmnet and
/usr/local/lib/python3.5/dist-packages/glmnet/logistic.py in _fit(self, X, y, sample_weight, relative_penalties)
363 # raises RuntimeError if self.jerr_ is nonzero
364 self.jerr_ = jerr
--> 365 _check_glmnet_error_flag(self.jerr_)
366
367 # glmnet may not return the requested number of lambda values, so we
/usr/local/lib/python3.5/dist-packages/glmnet/util.py in _check_glmnet_error_flag(jerr)
138 else:
139 msg = "glmnet error no. {}"
--> 140 raise RuntimeError(msg.format(jerr))
141
142
RuntimeError: glmnet error no. 7777
Any idea why?
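In the R package's error table, code 7777 is reported as "All used predictors have zero variance", and X = np.ones((22,1)) is a single constant column, so the error looks expected for this input. A minimal sketch of a check that could be run before fitting, assuming dense NumPy input; constant_columns is a hypothetical helper, not part of glmnet:

```python
import numpy as np


def constant_columns(X):
    """Return indices of columns whose values are all identical."""
    X = np.asarray(X, dtype=np.float64)
    # A column has zero variance exactly when its max equals its min.
    return np.flatnonzero(X.max(axis=0) == X.min(axis=0))


X = np.ones((22, 1))            # the design matrix from the report above
print(constant_columns(X))      # [0]: the only column is constant

X_ok = np.column_stack([np.ones(22), np.arange(22)])
print(constant_columns(X_ok))   # [0]: only the first column is constant
```

If every column is flagged, as in the report, there is nothing for the penalized model to fit beyond the intercept, so dropping or replacing the constant features is needed before calling fit.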
Sorry, python-glmnet does not contain glmnetcv; I got confused with glmnet_python.