intel / scikit-learn-intelex Goto Github PK

Intel(R) Extension for Scikit-learn is a seamless way to speed up your Scikit-learn application

Home Page: https://intel.github.io/scikit-learn-intelex/

License: Apache License 2.0

Python 76.97% C++ 19.26% Shell 0.52% Batchfile 0.16% C 0.27% CMake 0.37% Cython 2.46%

oneapi scikit-learn machine-learning-algorithms data-analysis machine-learning python swrepo ai-machine-learning big-data analytics

scikit-learn-intelex's Introduction

Intel(R) Extension for Scikit-learn*

Speed up your scikit-learn applications for Intel(R) CPUs and GPUs across single- and multi-node configurations
Releases | Documentation | Examples | Support | License

Overview

Intel(R) Extension for Scikit-learn is a free software AI accelerator designed to deliver over 10-100X acceleration to your existing scikit-learn code. The software acceleration is achieved with vector instructions, AI hardware-specific memory optimizations, threading, and optimizations for all upcoming Intel(R) platforms at launch time.

With Intel(R) Extension for Scikit-learn, you can:

Speed up training and inference by up to 100x with the equivalent mathematical accuracy
Benefit from performance improvements across different Intel(R) hardware configurations
Integrate the extension into your existing Scikit-learn applications without code modifications
Continue to use the open-source scikit-learn API
Enable and disable the extension with a couple of lines of code or at the command line

Intel(R) Extension for Scikit-learn is also a part of Intel(R) AI Tools.

Acceleration

Benchmarks code

Intel(R) Optimizations

Enable Intel(R) CPU optimizations

import numpy as np
from sklearnex import patch_sklearn
patch_sklearn()

from sklearn.cluster import DBSCAN

X = np.array([[1., 2.], [2., 2.], [2., 3.],
            [8., 7.], [8., 8.], [25., 80.]], dtype=np.float32)
clustering = DBSCAN(eps=3, min_samples=2).fit(X)

Enable Intel(R) GPU optimizations

import numpy as np
import dpctl
from sklearnex import patch_sklearn, config_context
patch_sklearn()

from sklearn.cluster import DBSCAN

X = np.array([[1., 2.], [2., 2.], [2., 3.],
            [8., 7.], [8., 8.], [25., 80.]], dtype=np.float32)
with config_context(target_offload="gpu:0"):
    clustering = DBSCAN(eps=3, min_samples=2).fit(X)

👀 Check out available notebooks for more examples.

Installation

To install Intel(R) Extension for Scikit-learn, run:

pip install scikit-learn-intelex

See all installation instructions in the Installation Guide.

Integration

The software acceleration is achieved through patching. It means, replacing the stock scikit-learn algorithms with their optimized versions provided by the extension.

The patching only affects supported algorithms and their parameters. You can still use not supported ones in your code, the package simply fallbacks into the stock version of scikit-learn.

TIP: Enable verbose mode to see which implementation of the algorithm is currently used.

To patch scikit-learn, you can:

Use the following command-line flag:
```
python -m sklearnex my_application.py
```

Add the following lines to the script:

from sklearnex import patch_sklearn
patch_sklearn()

👀 Read about other ways to patch scikit-learn.

Documentation

daal4py and oneDAL

The acceleration is achieved through the use of the Intel(R) oneAPI Data Analytics Library (oneDAL). Learn more:

Samples & Examples

How to Contribute

We welcome community contributions, check our Contributing Guidelines to learn more.

* The Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others.

scikit-learn-intelex's People

Contributors

Stargazers

Watchers

Forkers

fschlimb gridl keceli ogrisel ghacupha umaimat laymer ehsantn napetrov raoberman surfndez leolorenzoluis icem4n1 bibikar samaid orazve owerbat petrovkp agorshk vyacheslav-smirnov samiuelee michael-smirnov pahandrovich pivovara amgrigoriev alexsandruss rlnx outoftardis aguzmanballen kalyanovd mhash1m scaactk romaa2000 rukhovichiv onlydeniko alexeykrylov shvetsks maria-petrova adityagupta26 yarshev a-mere-peasant asadmansr chinvib66 samsomyajit niltonfrederico makart19 smirnovegorru toxicscum vmeshche felipensantos tapojyotipaul vlad-nazarov sergtkachev lkampoli 5l1v3r1 ssgantayat cmsxbc ghostintheshellarise ankitshah009 cbigit yard1 ssrisunt tronindv rickfsa xwu99 qize itearsl emmacall homksei dmitrii-kriukov cuchulainx jinbeomjeong dave0412 anuragphadnis tirbo06 gshimansky establishedshark iideas18 gradient-ai harryone shi-kou ashishpatel26 lordoz234 rjachuthan abatomunkuev harunpehlivan alex-radio ekmixon samir-nasibli avasenin-148 hussain-catalyst yumorozov cvlearner dliofindia rajkubp020 sdoof greenbaum svpra isabella232 bmg-pcl

scikit-learn-intelex's Issues

Add distributed mode to QR

add all 3 steps to new entry in wrappers.py:has_dist
Most work will be adding the new pattern
- use something like map-reduce-star-plus if same can be used for SVC
- otherwise follow dist_custom pattern
add distributed example

Add distributed mode for covariance

add 2 steps to new entry in wrappers.py:has_dist, use pattern map-reduce-star
add distributed example

Gradient Boosted Classification Malloc Errors

I found 3 errors when running gradient boosted classification (the same errors may also come up with gradient boosted regression, which acts in a similar behavior). These bugs are inconsistent behavior but appear frequently when I run my code (with at least one error appearing about 60% of the time):

When I run my data (Pandas DataFrames) through the gradient boosted classifier training, 9/10 times my code will break and result in a malloc unsigned long error (see first image).

While trying to do more testing with this bug. I found a different inconsistent error while running the same exact code - this time saying there was a corrupted double-linked list somewhere in memory (see second image).

When I tried to troubleshoot this issue by double-checking that my data was in a valid format and re-casting the data as pandas DataFrames, I found another inconsistent error saying that there was a malloc memory corruption error (see third image).

I have already checked that there is no issue with the data itself, its size, or the code I run before it (i.e. the data is valid and should be accepted) - the issue is within gradient boosted classification.

Within the attached mallocproblems.zip zipfile:

mallocproblems.zip

• Daal4py_solution_11_place.py – The code that causes the malloc error when I run it. This is a shortened version that isolates the issue to the line causing the issue, which is the gradient boosted classifier training. A successful run will print the time in seconds it takes for the code to initialize centroids and train the data for KMeans and put it through a gradient boosted classifier.
• D4p_cluster_target_encoder.py – A helper class that encodes the training data, initializes centroids for the KMeans clustering model, then trains the KMeans model. This runs fine.
• Train.csv – The training data for this workflow coding example. (4209 rows with 378 columns)
• Test.csv – The test data for this workflow coding example (not used in the shortened daal4py_solution_11_place.py, but used for prediction)
• Df_classification_train.csv – the training data used for the original gradient boosted classification validation test
• Df_classification_test.csv – the test data used for the original gradient boosted classification validation test
• Gradient_boosted_classification_batch.py – the original validation test code. This works perfectly fine.
• Problem_gradient_boosted_classification.py – the validation test, with the only modification being the inclusion of the Kaggle data (and the KMeans clustering encoder) instead of the validation test’s original datasets. This originally revealed the inconsistent linked list and malloc unsigned errors.
• Inconsistent_malloclong_and_list.py – Very similar to problem_gradient_boosted_classification.py, but rather than the first parameter including np.number, it is including np.int_ (the 1s and 0s are casted as int). This code tends to make the linked list and malloc unsigned errors show up more frequently.
• Malloccorruption.py – Based on problem_gradient_boosted.py, this version has the first parameter does [include =(np.int_)] and then casts the entire dataframe to float, and the second parameter is changed from a pandas Series to a pandas DataFrame casted to float. Thus, the labels and training features are the same exact type of object (pandas dataframes casted to float), but interestingly enough this produces a malloc memory corruption error.

Add distributed mode for low order moments

add 2 steps to new entry in wrappers.py:has_dist, use pattern map-reduce-star
add distributed example

Enable PCA streaming algorithm

Previously we had not functionality to process specialized templates in our code generator. That is why we skipped this algorithm. Today it should be possible to add it. First of all we need to remove condition for its ignoring:
https://github.com/IntelPython/daal4py/blob/cf10c5af86be8951106d520a7683a18a9279923e/generator/gen_daal4py.py#L775

pca_transform_result.transformedData seems to leak memory

@fschlimb

Consider the following script:

import numpy as np
import daal4py
import os
import psutil

X = np.zeros((10**6, 100))
U = np.full((10, 100), 0.3, dtype='d')

n_iters = 200
proc = psutil.Process(os.getpid())
rss0 = proc.memory_info().rss

for _ in range(n_iters):
    res = daal4py.pca_transform(fptype='double').compute(
        X, U, dict()
    )
    Y = res.transformedData
    del res
    del Y
rss1 = proc.memory_info().rss

print( (rss1 - rss0) / X.nbytes )

Running it, prints out a number close to the value of n_iter for any of its value large enough, suggesting that memory is leaked within that loop.

Commenting out lines Y = res.transformedData and del Y, the value printed out becomes independent of the value of n_iter.

Having run Intel(R) Inspector --collect mi1, I see the following

Header of output of `inspxe-cl --report problems`

(vanilla) [09:51:22 ansatskl1004 pca_transform_memory_leak]$ inspxe-cl --report problems
P1: Error: Memory leak: New
 P1.147: Error: Memory leak: 32 Bytes: New
  _daal4py.cpython-36m-x86_64-linux-gnu.so!0x32023d: Error X147: Allocation site: Function [Unknown]: Module /localdisk/work/opavlyk/miniconda3_cb3/envs/vanilla/lib/python3.6/site-packages/_daal4py.cpython-36m-x86_64-linux-gnu.so

P2: Error: Memory leak: New
 P2.148: Error: Memory leak: 32 Bytes: New
  _daal4py.cpython-36m-x86_64-linux-gnu.so!0x3c15cb: Error X148: Allocation site: Function [Unknown]: Module /localdisk/work/opavlyk/miniconda3_cb3/envs/vanilla/lib/python3.6/site-packages/_daal4py.cpython-36m-x86_64-linux-gnu.so

P3: Error: Memory leak: New
 P3.149: Error: Memory leak: 24 Bytes: New
  _daal4py.cpython-36m-x86_64-linux-gnu.so!0x3c1ded: Error X149: Allocation site: Function make_nt: Module /localdisk/work/opavlyk/miniconda3_cb3/envs/vanilla/lib/python3.6/site-packages/_daal4py.cpython-36m-x86_64-linux-gnu.so

P4: Error: Memory leak: New
 P4.150: Error: Memory leak: 24 Bytes: New
  _daal4py.cpython-36m-x86_64-linux-gnu.so!0x3c9a1f: Error X150: Allocation site: Function make_dnt: Module /localdisk/work/opavlyk/miniconda3_cb3/envs/vanilla/lib/python3.6/site-packages/_daal4py.cpython-36m-x86_64-linux-gnu.so

P5: Error: Memory leak: New
 P5.151: Error: Memory leak: 24 Bytes: New
  _daal4py.cpython-36m-x86_64-linux-gnu.so!0x3d7b26: Error X151: Allocation site: Function SharedPtr<unsigned char, daal::services::interface1::ServiceDeleter>: Module /localdisk/work/opavlyk/miniconda3_cb3/envs/vanilla/lib/python3.6/site-packages/_daal4py.cpython-36m-x86_64-linux-gnu.so

P6: Error: Memory leak: New
 P6.152: Error: Memory leak: 48 Bytes: New
  _daal4py.cpython-36m-x86_64-linux-gnu.so!0x3dbda6: Error X152: Allocation site: Function NumericTable: Module /localdisk/work/opavlyk/miniconda3_cb3/envs/vanilla/lib/python3.6/site-packages/_daal4py.cpython-36m-x86_64-linux-gnu.so

P7: Error: Memory leak: New
 P7.153: Error: Memory leak: 48 Bytes: New
  _daal4py.cpython-36m-x86_64-linux-gnu.so!0x3dbe00: Error X153: Allocation site: Function NumericTable: Module /localdisk/work/opavlyk/miniconda3_cb3/envs/vanilla/lib/python3.6/site-packages/_daal4py.cpython-36m-x86_64-linux-gnu.so

P8: Error: Memory leak: New
 P8.154: Error: Memory leak: 24 Bytes: New
  _daal4py.cpython-36m-x86_64-linux-gnu.so!0x3dc0a4: Error X154: Allocation site: Function NumericTable: Module /localdisk/work/opavlyk/miniconda3_cb3/envs/vanilla/lib/python3.6/site-packages/_daal4py.cpython-36m-x86_64-linux-gnu.so

P9: Error: Memory leak: New
 P9.155: Error: Memory leak: 24 Bytes: New
  _daal4py.cpython-36m-x86_64-linux-gnu.so!0x3dc0fc: Error X155: Allocation site: Function NumericTable: Module /localdisk/work/opavlyk/miniconda3_cb3/envs/vanilla/lib/python3.6/site-packages/_daal4py.cpython-36m-x86_64-linux-gnu.so

P10: Error: Memory leak: New
 P10.156: Error: Memory leak: 24 Bytes: New
  _daal4py.cpython-36m-x86_64-linux-gnu.so!0x3f3dae: Error X156: Allocation site: Function create: Module /localdisk/work/opavlyk/miniconda3_cb3/envs/vanilla/lib/python3.6/site-packages/_daal4py.cpython-36m-x86_64-linux-gnu.so

P11: Error: Memory leak: New
 P11.143: Error: Memory leak: 72 Bytes: New
  libdaal_core.so!0x1cd7c8b: Error X143: Allocation site: Function Argument: Module /localdisk/work/opavlyk/miniconda3_cb3/envs/vanilla/lib/python3.6/site-packages/../../libdaal_core.so

P12: Error: Memory leak: New
 P12.144: Error: Memory leak: 56 Bytes: New
  libdaal_core.so!0x210356c: Error X144: Allocation site: Function Atomic: Module /localdisk/work/opavlyk/miniconda3_cb3/envs/vanilla/lib/python3.6/site-packages/../../libdaal_core.so

P13: Error: Memory leak: New
 P13.160: Error: Memory leak: 80013344 Bytes: New
  libdaal_core.so!0x5acff49: Error X160: Allocation site: Function fpk_serv_malloc: Module /localdisk/work/opavlyk/miniconda3_cb3/envs/vanilla/lib/python3.6/site-packages/../../libdaal_core.so

P14: Error: Memory leak: New
 P14.146: Error: Memory leak: 8 Bytes: New
  libdaal_core.so!0x5ad0122: Error X146: Allocation site: Function fpk_serv_malloc: Module /localdisk/work/opavlyk/miniconda3_cb3/envs/vanilla/lib/python3.6/site-packages/../../libdaal_core.so

P15: Error: Memory leak: New
 P15.157: Error: Memory leak: 185127 Bytes: New
  /tmp/build/80754af9/python_1546130271559/work/Objects/obmalloc.c(1446): Error X157: Allocation site: Function _PyObject_Alloc.isra.0: Module /localdisk/work/opavlyk/miniconda3_cb3/envs/vanilla/bin/python3.6

failed to build daal4py with daal built from opensource

hi
i failed to build daal4py with the daal built from https://github.com/intel/daal/archive/2020.tar.gz, it would report some error as below:
work/daal4py/build/daal4py_cpp.h:1396:49: error: ‘weight’ is not a member of ‘daal::algorithms::gbt::training’
1396 | {"weight", daal::algorithms::gbt::training::weight},
| ^~~~~~
/home/pnp/work/daal4py/build/daal4py_cpp.h:1397:53: error: ‘totalCover’ is not a member of ‘daal::algorithms::gbt::training’
1397 | {"totalCover", daal::algorithms::gbt::training::totalCover},

seems daal4py uses different version of daal than the opensource version 2020, as the class definition in the header file is not aligned. How can i make it work?

take gbt_training_parameter.h for example:
this one is from daal github 2020 pkg:
enum ResultsToCompute
{
computeWeight = 0x001ULL,
computeTotalCover = 0x002ULL,
computeCover = 0x004ULL,
computeTotalGain = 0x008ULL,
computeGain = 0x010ULL,
};

below one is used by daal4py:
enum VariableImportanceModes
{
weight = 0x001ULL,
totalCover = 0x002ULL,
cover = 0x004ULL,
totalGain = 0x008ULL,
gain = 0x010ULL,
};

import daal4py fails on MacOSX

Create an environment b_d4py with Python 3.7.6 needed to build daal4py:

# This file may be used to create an environment using:
# $ conda create --name <env> --file <this file>
# platform: osx-64
@EXPLICIT
https://conda.anaconda.org/conda-forge/osx-64/bzip2-1.0.8-h01d97ff_1.tar.bz2
https://conda.anaconda.org/intel/osx-64/daal-include-2020.0-intel_166.tar.bz2
https://conda.anaconda.org/conda-forge/osx-64/ca-certificates-2019.11.28-hecc5488_0.tar.bz2
https://conda.anaconda.org/conda-forge/osx-64/libcxx-9.0.1-1.tar.bz2
https://conda.anaconda.org/conda-forge/osx-64/llvm-openmp-9.0.0-h40edb58_0.tar.bz2
https://conda.anaconda.org/conda-forge/osx-64/mpi-1.0-mpich.tar.bz2
https://conda.anaconda.org/intel/osx-64/tbb-2020.0-intel_166.tar.bz2
https://conda.anaconda.org/conda-forge/osx-64/xz-5.2.4-h1de35cc_1001.tar.bz2
https://conda.anaconda.org/conda-forge/osx-64/zlib-1.2.11-h0b31af3_1006.tar.bz2
https://conda.anaconda.org/intel/osx-64/daal-2020.0-intel_166.tar.bz2
https://conda.anaconda.org/conda-forge/osx-64/libffi-3.2.1-h6de7cb9_1006.tar.bz2
https://conda.anaconda.org/conda-forge/osx-64/libgfortran-4.0.0-2.tar.bz2
https://conda.anaconda.org/conda-forge/osx-64/libllvm9-9.0.1-ha1b3eb9_0.tar.bz2
https://conda.anaconda.org/conda-forge/osx-64/ncurses-6.1-h0a44026_1002.tar.bz2
https://conda.anaconda.org/conda-forge/osx-64/openssl-1.1.1d-h0b31af3_0.tar.bz2
https://conda.anaconda.org/conda-forge/osx-64/tapi-1000.10.8-ha1b3eb9_4.tar.bz2
https://conda.anaconda.org/conda-forge/osx-64/tk-8.6.10-hbbe82c9_0.tar.bz2
https://conda.anaconda.org/conda-forge/osx-64/clang-9.0.1-default_hf57f61e_0.tar.bz2
https://conda.anaconda.org/conda-forge/osx-64/ld64-450.3-h3c32e8a_3.tar.bz2
https://conda.anaconda.org/conda-forge/osx-64/libopenblas-0.3.7-h4bb4525_3.tar.bz2
https://conda.anaconda.org/conda-forge/osx-64/mpich-3.3.2-hc856adb_0.tar.bz2
https://conda.anaconda.org/conda-forge/osx-64/readline-8.0-hcfe32e1_0.tar.bz2
https://conda.anaconda.org/conda-forge/osx-64/cctools-927.0.2-h5ba7a2e_3.tar.bz2
https://conda.anaconda.org/conda-forge/osx-64/clangxx-9.0.1-default_hf57f61e_0.tar.bz2
https://conda.anaconda.org/conda-forge/osx-64/libblas-3.8.0-14_openblas.tar.bz2
https://conda.anaconda.org/conda-forge/osx-64/sqlite-3.30.1-h93121df_0.tar.bz2
https://conda.anaconda.org/conda-forge/noarch/compiler-rt_osx-64-9.0.1-h6a512c6_0.tar.bz2
https://conda.anaconda.org/conda-forge/osx-64/libcblas-3.8.0-14_openblas.tar.bz2
https://conda.anaconda.org/conda-forge/osx-64/liblapack-3.8.0-14_openblas.tar.bz2
https://conda.anaconda.org/conda-forge/osx-64/python-3.7.6-h5c2c468_1.tar.bz2
https://conda.anaconda.org/conda-forge/osx-64/certifi-2019.11.28-py37_0.tar.bz2
https://conda.anaconda.org/conda-forge/osx-64/compiler-rt-9.0.1-h6a512c6_0.tar.bz2
https://conda.anaconda.org/conda-forge/osx-64/cython-0.29.14-py37h4a8c4bd_0.tar.bz2
https://conda.anaconda.org/conda-forge/osx-64/markupsafe-1.1.1-py37h0b31af3_0.tar.bz2
https://conda.anaconda.org/conda-forge/osx-64/numpy-1.17.3-py37hde6bac1_0.tar.bz2
https://conda.anaconda.org/conda-forge/osx-64/clang_osx-64-9.0.1-h05bbb7f_0.tar.bz2
https://conda.anaconda.org/conda-forge/osx-64/setuptools-41.6.0-py37_1.tar.bz2
https://conda.anaconda.org/conda-forge/osx-64/clangxx_osx-64-9.0.1-h05bbb7f_0.tar.bz2
https://conda.anaconda.org/conda-forge/noarch/jinja2-2.10.3-py_0.tar.bz2
https://conda.anaconda.org/conda-forge/osx-64/wheel-0.33.6-py37_0.tar.bz2
https://conda.anaconda.org/conda-forge/osx-64/pip-19.3.1-py37_0.tar.bz2

Activate the environment, and build the latest source of daal4py (2958265) as follows:

LDFLAGS="${LDFLAGS//-Wl,-dead_strip_dylibs}" \
 LDFLAGS_LD="${LDFLAGS_LD//-dead_strip_dylibs}" \
 LDSHARED="-bundle -undefined dynamic_lookup -Wl,-pie -Wl,-headerpad_max_install_names" \  
 DAAL4PY_VERSION=2020 \
 TBBROOT=$CONDA_PREFIX \
 MPIROOT=$CONDA_PREFIX \
 DAALROOT=$CONDA_PREFIX \
 python setup.py build_ext --inplace

It builds without errors. However, running "python -c 'import _daal4py'" fails:

(b_d4p) [14:02:40 mymac daal4py]$ python -c 'import _daal4py; print(_daal4py.__version__)'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ImportError: dynamic module does not define module export function (PyInit__daal4py)

Indeed, the native extension does not seem to have the symbol, as checked with nm -A _daal4py.cpython-37m-darwin.so | grep PyInit.

Downgrading Python from 3.7.6 to 3.7.3 with conda install -c conda-forge python=3.7.3, and repeating build steps gives a working extension:

(b_d4p) [14:09:12 mymac daal4py]$ python -c 'import _daal4py; print(_daal4py.__version__)'
(2020, 0)

This is reminiscent of numpy/numpy#13822

Reproduced in:

      System Version: OS X 10.11.4 (15E65)
      Kernel Version: Darwin 15.4.0
      Boot Volume: Macintosh HD
      Boot Mode: Normal

Remove L0 dependency in daal4py build process and CI

To align with oneDAL Beta06 release L0 dependency was introduced on daal4py side in #209

This dependency should be removed with next oneDAL release as L0 handling process have been changed

Change in behavior due to #19

@fschlimb Binary search implicated #19 in breakage affecting sklearn patches (#15). I left a comment in that PR, but thought it's likely better to file a separate issue.

Specifically, the following script

# d4py_log_loss.py
import numpy as np
import daal4py


def getFPType(X):
    dt = getattr(X, 'dtype', None)
    if dt == np.double:
        return "double"
    elif dt == np.single:
        return "float"
    else:
        raise ValueError("Input array has unexpected dtype = {}".format(dt))


def make2d(X):
    if np.isscalar(X):
        X = np.asarray(X)[np.newaxis, np.newaxis]
    elif isinstance(X, np.ndarray) and X.ndim == 1:
        X = X.reshape((X.size, 1))
    return X


def _resultsToCompute_string(value=True, gradient=True, hessian=False):
    results_needed = []
    if value:
        results_needed.append('value')
    if gradient:
        results_needed.append('gradient')
    if hessian:
        results_needed.append('hessian')

    return '|'.join(results_needed)


def _daal4py_logistic_loss_extra_args(
        nClasses_unused, beta, X, y, l1=0.0, l2=0.0, fit_intercept=True, 
        value=True, gradient=True, hessian=False):
    X = make2d(X)
    nSamples, nFeatures = X.shape

    y = make2d(y)
    beta = make2d(beta)
    n = X.shape[0]

    results_to_compute = _resultsToCompute_string(value=value, 
        gradient=gradient, hessian=hessian)

    objective_function_algorithm_instance = daal4py.optimization_solver_logistic_loss(
        numberOfTerms = n,
        fptype = getFPType(X),
        method = 'defaultDense',
        interceptFlag = fit_intercept,
        penaltyL1 = l1 / n,
        penaltyL2 = l2 / n,
        resultsToCompute = results_to_compute
    )
    objective_function_algorithm_instance.setup(X, y, beta)

    return (objective_function_algorithm_instance, X, y, n)


def _daal4py_loss_and_grad(beta, objF_instance, X, y, n):
    beta_ = make2d(beta)
    res = objF_instance.compute(X, y, beta_)
    gr = res.gradientIdx
    if gr is None:
        print(X)
        print(y)
        print(beta_)
    gr *= n
    v = res.valueIdx
    v *= n
    return (v, gr)


if __name__ == '__main__':
    X, Y1 = np.array([[-1, 0], [0, 1], [1, 1]], dtype=np.double), np.array([0, 1, 1], np.double)
    X = X[-1:]
    y = Y1[-1:]

    beta = np.zeros(3, dtype=np.double)

    objF, X2d, y2d, n = _daal4py_logistic_loss_extra_args(
        1, beta, X, y, l1=0.0, l2=1., 
        value=True, gradient=True, hessian=False)
    _daal4py_loss_and_grad(beta, objF, X2d, y2d, n)

runs fine in daal4py built from 2133108, but fails with an error in package built from 08f1301.

Specifically, res.gradientIdx returns None, rather than an array.

DAAL SVM performance vs sklearn 0.22

I tried SVM with webpage_train.csv (This dataset has 23187 samples with 300 features, 2 classes) with scikit-learn and DAAL4PY on intel cascade lake server. It turns out scikit-learn can finish the training quickly, but i've run with daal4py for like several minutes (to run the origianl scikit-learn script with -m daal4py), it did not finished. And libdaal_core.so keeps as hottest spot through perf top.

I'd like to confirm that is it as expected. that daal svm lags behind sklearn on svm ? or I need to check/tweak anything to make daal perform better.

distributed pca crashes on macos

Accessing the result attributes in distributed pca leads to segmentation fault.
This happens on macOS only.

Task: investigate test failures in unpatched PyPI scikit-learn in our CI

CircleCI shows that unpatched 0.21.2 does not pass all the tests in our CI, with the number of failures changing from run-to-run. Sometimes 11, sometimes 13, sometimes more.

Here is a sample:

FAILED miniconda/envs/bld/lib/python3.6/site-packages/sklearn/covariance/tests/test_graph_lasso.py::test_graph_lasso
FAILED miniconda/envs/bld/lib/python3.6/site-packages/sklearn/covariance/tests/test_graphical_lasso.py::test_graphical_lasso
FAILED miniconda/envs/bld/lib/python3.6/site-packages/sklearn/ensemble/tests/test_gradient_boosting.py::test_boston[0.5-ls-auto]
FAILED miniconda/envs/bld/lib/python3.6/site-packages/sklearn/ensemble/tests/test_gradient_boosting.py::test_boston[0.5-ls-True]
FAILED miniconda/envs/bld/lib/python3.6/site-packages/sklearn/ensemble/tests/test_gradient_boosting.py::test_boston[0.5-ls-False]
FAILED miniconda/envs/bld/lib/python3.6/site-packages/sklearn/linear_model/tests/test_coordinate_descent.py::test_sparse_input_convergence_warning
FAILED miniconda/envs/bld/lib/python3.6/site-packages/sklearn/linear_model/tests/test_least_angle.py::test_lars_cv_max_iter
FAILED miniconda/envs/bld/lib/python3.6/site-packages/sklearn/linear_model/tests/test_omp.py::test_omp_cv
FAILED miniconda/envs/bld/lib/python3.6/site-packages/sklearn/model_selection/tests/test_search.py::test_grid_search_cv_results
FAILED miniconda/envs/bld/lib/python3.6/site-packages/sklearn/model_selection/tests/test_search.py::test_random_search_cv_results
FAILED miniconda/envs/bld/lib/python3.6/site-packages/sklearn/model_selection/tests/test_search.py::test_grid_search_failing_classifier
FAILED miniconda/envs/bld/lib/python3.6/site-packages/sklearn/preprocessing/tests/test_data.py::test_standard_scaler_numerical_stability
FAILED miniconda/envs/bld/lib/python3.6/site-packages/sklearn/tree/tests/test_export.py::test_precision
= 13 failed, 12031 passed, 45 skipped, 218 deselected, 3 xfailed, 20033 warnings in 684.45 seconds =

This task is to investigate the root cause.

Add missing parameters to PCA

PCA parameters 'covariance' and 'normalization' are currently ignored.

PCA is one of the algorithms in DAAL which has multiple different Parameter classes. This was not properly handled win earlier version of daalpy. With 1c7e737 the trickiest part has been fixed.

The following need to be done in generator/wrappers.py

remove 'covariance', and 'normalization' from the 'ignore' list for PCA
add interface classes/definitions where the new parameters need an interface type
add a test

'correlation' input argument requires handling of optional input. Let's address this later.

Missing _classification_model Parameters Info in daal4py Documentation

The daal4py documentation only provides information on the parameter types for the daal4py.decision_forest_classification_model parameters. There is no other information (purpose, other ranges of values, etc.) on these parameters found in the daal4py documentation or the Intel DAAL Developer Guide.

This same issue is found for all daal4py algorithms with a model class in the daal4py documentation.

The documentation should be modified to include the purpose of the model class parameters as well as any other relevant information.

Sklearn integration of RF predict uses float fptype always for daal4py

https://github.com/IntelPython/daal4py/blob/ca67cc2d070ff27318e210e4b74f05ab7b9fa1d0/daal4py/sklearn/ensemble/decision_forest.py#L285

Looking at the code - we use float type always for RF predict. Probably, it's handled in another place, but it looks like an issue.

Support optional input arguments

generator/wrappers.py lists input arguments that are currently ignored, because we do not support optional input arguments. We want them to be exposed as optional args to compute.

in wrappers.py: move optional arguments from 'ignore' to 'defaults'
in wrapper_gen.py, template 'parent_wrapper_template': use 'decl_dflt_cy' instead of 'decl_cy' when generating compute method of cython class
if it adds the wrong default value, look in format.py how the default value is computed for 'decl_dflt_cy'

I suggest starting with one algorithm, like svm. Then, when this works, add all other 'weights'. Then the rest.

Notice: we do not want all inputs listed in 'ignore' to be exposed as optional arguments, like the optimization solvers have arguments that are used internally only.

@oleksandr-pavlyk

daal4py patches to scikit-learn cause test_dtype_match to fail

With #84 daal4py is used to compute logistic loss and its gradient with solver='newton-cg'. This causes

python -m daal4py -m pytest --pyargs sklearn.linear_model.tests.test_logistic::test_dtype_match -ra

to fail, so it is currently being deselected, see deselected_tests.yaml#L31.

This issue is to document the deselection and the failure:

>       assert_allclose(lr_32.coef_, lr_64.coef_.astype(np.float32), rtol=rtol)
E       AssertionError:
E       Not equal to tolerance rtol=1e-06, atol=0
E
E       Mismatch: 100%
E       Max absolute difference: 0.00011081
E       Max relative difference: 0.00011659
E        x: array([[0.950293, 0.723663]], dtype=float32)
E        y: array([[0.950404, 0.723695]], dtype=float32)

tailored daal4py build

Starting from 2019u1.1 DAAL has feature to be built with specified set of algorithms. It is impossible to build daal4py out_of_the_box with such DAAL - will see undefined references.
Add the tailored build feature to daal4py please.

Ubuntu 18.04: ModuleNotFoundError: No module named 'daal4py'

Hi
When I try to run any sample example using Anaconda & jupyter notebook in Ubuntu 18.04, it says
ModuleNotFoundError: No module named 'daal4py'

But I can run the same code on python interpretor and Ubuntu 16.04 via anaconda & Jupyter notebook.
Have anyone else identified this issue? I am experienceing this issue after I migrated to Ubuntu 18.04 itself. Any solution?
Thank you

RuntimeError: Failed to solve the system of normal equations

Hi,
I am running a list of regression models and I am getting the following:
Traceback (most recent call last):
File "/home/XXXXXX/iowa_housing_functions1.py", line 293, in
col_num=columns_num, col_cat=columns_cat, data=df_train,run_from_cache=False)
File "/home/XXXXX/iowa_housing_functions1.py", line 217, in run_regression
regressor.fit(X_preprocessed_train, y_train)
File "/home/XXX/miniconda3/envs/idp3/lib/python3.6/site-packages/sklearn/linear_model/ransac.py", line 366, in fit
base_estimator.fit(X_subset, y_subset)
File "/home/XXX/miniconda3/envs/idp3/lib/python3.6/site-packages/sklearn/daal4sklearn/linear.py", line 81, in fit
_daal4py_fit(self, X, y)
File "/home/XXX/miniconda3/envs/idp3/lib/python3.6/site-packages/sklearn/daal4sklearn/linear.py", line 20, in _daal4py_fit
lr_res = lr_algorithm.compute(X, y)
File "build/daal4py_cy.pyx", line 7445, in _daal4py.linear_regression_training.compute
RuntimeError: Failed to solve the system of normal equations

It does not happen when I run it from another virtual environment with non-intel python and sklearn

Thanks a lot.

Defer (default) fptype/method selection until data arrives

Analyze underlying types of arrays passed to compute() method. Usually we pass numpy arrays into compute() method, but don`t care about the underlying type of this array. So if we pass numpy array read from csv (where dtype=np.float64 by default) and feed the algorithm with it, we may give significant performance degradation caused by internal data conversion from double to float (if algorithm is specified by fptype=’float’). So maybe it is useful to develop automatic fptype detection based on data, which the algorithm is fed with. Or simply notify user by warning in stdout about conversion of data to be made.

The same issues exists for CSR input, the user should by default get the fastest method and not be required to select csr method manually.

The problem here is of course that we cannot create the algorithm until we know the input data. In general it should be possible to defer the creation until we have the data. There are some technical details in daal4py to work out. More importantly this raises a few user-visible issues, like

DAAL’s parameter checking will be deferred as well (and so the user will get a message triggered by a ‘unrelated’ line of code)
what should happen if a kernel is setup by the user with partial input-data and DAAL uses it internally by other algorithms (like optimization solver pattern)? What if the user changes the partial input?

Refresh documentation for streaming algorithms

We should add info about QR and Covariance in
https://github.com/IntelPython/daal4py/blob/master/doc/streaming.rst
add PCA when it will be ready
#158

add windows CI

ADD appveyor CI
@anton-malakhov might be able to help

Add distributed mode implicit ALS

dist_custom
use C++ MPI example to implement dist class
@fschlimb also has a CnC version which might be helpful because of it's modular structure

Numerical precision issues in K-means

The way inertia is computed in daal can be different from the real one which would be computed using the labels, with an arbitrarily large relative error. Below is an example of such behavior with intel sklearn:

import numpy as np
from sklearn.cluster import KMeans

X = np.array([[-1.0001],[-0.9999],[0.9999],[1.0001]], dtype=np.float32)
km = KMeans(n_clusters=2, n_init=1, algorithm='full')
km.fit(X)

km.cluster_centers_
>>> [[ 1.]
     [-1.]]

km.inertia_
>>> 0.0

((X - km.cluster_centers_[km.labels_])**2).sum()
>>> 4.0013276e-08

Here the fitted and exact inertia differ in every digit !

The issue is not about the centers which are the right ones, so computing the distances with ||x||² - 2x.y + ||y||², which can lead to catastrophic cancellation, is fine to find the clusters but not to compute the inertia.

In daal, each time a cluster is found to be the closest from a sample, their distance, computed with the above formula, is added to the inertia. The right way would be not to update inertia incrementally but compute it using the labels and centers at the end of the iteration.

I think this is the reason of the failing sklearn test. With several inits, a same clustering with labels permuted can be found with different computed inertia. Can you confirm that the failure only occurs with lloyd algorithm (elkan uses the safe distance formula)?

@fschlimb @oleksandr-pavlyk @ogrisel

Improve generation of documentation (comment parser)

Doxygen documentation witch is used in DAAL supports comments with doc strings before class members. There are only few places with such doc strings but we don't support them yet.

Add better print behavior for low_order_moments()

The low order moments class has the advantage of being able to process an entire array of features, but does not make it easy to see the results without individually selecting it.

For printing the entire result array, it would be preferred to print ALL of the results if using the print() function.

metrics_processor = d4p.low_order_moments()
data = metrics_processor.compute(dataset.values)
print(data) # this does not do anything smart at the moment
print(data.standardDeviation) # but this does print the result array

From a data scientist perspective, better printing behavior would be useful especially in the Jupyter Notebook arena. Consequently if you add Dataframe support, it might make it easier for easier printing.

Scikit-Learn estimator for RandomForestClassifier fails in check_estimator()

Estimator fails during check_methods_subset_invariance.
https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/utils/estimator_checks.py#L853
Need to be investigated.
For now this test is excluded during executing check_estimator suite.

Expose daal::services::getNumberOfThreads in daal4py

Currently one can effectively call daal::services::setNumberOfThreads(n) via daal4py.daalinit(nthreads=n).

In pyDAAL it used to be possible using nThreadsNew = Environment.getInstance().getNumberOfThreads() as seen in this example.

conda install daal4py forces a downgrade on the Python

Prerequisites state the following:

Prerequisites
Python version 2.7 or >= 3.6

However, the problem here is that 3.7 (which is greater than 3.6) is not supported.
conda install daal4py forces the python version inside my conda environment to be downgraded from 3.7 to 3.6.8.

add kmeans parallel++

distributed mode in kmeans-init currently does not support parallelPlusPlus method. Needs to be added.

Add computation of leftSingularMatrix to distributed SVD

add an extra step (Distributed) to corresponding entry in wrappers.py:has_dist
Most work will be adding the new pattern
- use something like map-reduce-star-plus if same can be used for QR(#49)
- otherwise follow dist_custom pattern
add using leftSingularMatrix to svd examples

SVM Runtime Error

I found a strange error while running SVM. When I run my data (a sklearn toy dataset) through the SVM training algorithm, I receive the following error:

However, both the data and the labels arguments put into the function have the same number of rows and are in an acceptable daal4py format, so I am unsure why I am receiving this error (see below).

This error occurs when I run the data with numpy arrays and Pandas DataFrames.

When I run the same data set in the same format (numpy arrays) through KNN Classification, the error does not occur and the code runs fine.

My code that causes the error:

svmerrors.zip

A successful run will print the statement "Example run successful!"

Add covariance to PCA

Parameter "covariance" defined multiple times with different types.

Undocumented Third Parameter Acceptable in Decision Forest Regression Training .compute() function

While coding, I discovered that decision_forest_regression_training.compute() function is allowed to take in a third parameter, however, according to the daal4py documentation, only 2 input parameters are allowed: data and dependentVariable - meaning that this third parameter is undocumented. See here: https://intelpython.github.io/daal4py/algorithms.html#daal4py.decision_forest_regression_training.compute

Upon further investigation, placing in a 1-d vertical numpy array as the value for the third parameter, then running the training/prediction for Decision Forest Regression does not impact the prediction results (same result as only data and dependentVariable being inputted).

See pic below (does not error out):

What is the purpose of this third mystery parameter, and why is it undocumented?

My test code: third_param.zip

Native pandas dataframe handling

Most algorithms in daal4py take numpy arrays, but this is an extra step that most data scientists prefer not to have to do.

It would be easier to be able to put in a direct Dataframe instead of a NumPy array into the models, as we normally just call '.values' anyways.

It would go from this:

target = np.array(dataset.columnname.values, ndmin=2).T
d4p_lm = d4p.linear_regression_training(interceptFlag=True)
lm_trained = d4p_lm.compute(feature_dataset.values, target)

to this:

d4p_lm = d4p.linear_regression_training(interceptFlag=True)
lm_trained = d4p_lm.compute(feature_dataset, target_df_or_series)

Note that this would 1) simplify the calls if using a single Y vector to not have the above numpy array cast, and 2) would allow us to keep internal information (such as column names and dropped feature frames) within the model. Both of these would be good additions to daal4py.

Replace default device by host device

In https://github.com/IntelPython/daal4py/blob/master/src/oneapi/oneapi.h when we have an issue with selecting of device the default device is selecting. It should be not the default but the host device because devices can have various priorities and the default device can be not the best choice.

Intel Scikit-learn vs daal4py

Both Intel Scikit-learn and daal4py uses Intel DAAL at backend so what's the basic difference? One thing that I for to know is daa;4py can be used during Scaling due to availability of Intel MPI. I hopw I am correct! Any thing else?
please help

Broken References in K-Means Initialization Parameter Documentation

Three of the K-Means Initialization parameters (nTrials, oversamplingFactor, & nRounds) say to refer to a section in references [1] or [2] .

See image:

However, these references do not exist anywhere in the daal4py documentation or the Intel DAAL Dev documentation. I am unable to locate or edit these parameter descriptions in algorithms.rst, as they are automatically populated as members of an autoclass from daal4py.kmeans_init.

See image:

Please add in the references and their corresponding sections to the daal4py documentation or remove the broken references.

Add quality metrics

we need to come up with a useful and generic API
define mapping DAAL -> daal4py
do it

SVD spmd with leftSingularMatrix param throws RuntimeError

version 2019.5 Linux

import daal4py as d4p
import numpy as np
import os, os.path

# initialized DAAL4Py
d4p.daalinit()

fn = os.path.join('data', 'X_' + str( d4p.my_procid() ) + '.csv')

X = np.loadtxt(fn)

print("X.size",X.size)
assert X.size > 0

svd_distr = d4p.svd(
    fptype='double',
    distributed=True,
#    leftSingularMatrix = 'notRequired',
    rightSingularMatrix = 'notRequired'
    )
res = svd_distr.compute(X)

if d4p.my_procid() == 0:
    print(res)

Traceback (most recent call last):
File "svd_distributed.py", line 28, in
res = svd_distr.compute(X)
File "build/daal4py_cy.pyx", line 14355, in _daal4py.svd.compute
File "build/daal4py_cy.pyx", line 14340, in _daal4py.svd._compute
RuntimeError: Incorrect number of elements in input collection
Details:
Argument name: inputOfStep3FromStep2

measure accuracy and confusion matrix

Hi
In daal4py, can we calculate accuracy score of model along with confusion matrix, precision, recall, F1 score etc ? How? Any example?

Performance Warnings

It'd be nice to get performance warnings in certain cases, such as

repeated use of non-contiguous ndarrays leads to repeated conversion to DAAL tables
different input types than algorithm's fptype waste a lot of time in conversion

Latest daal4py doesn't work well under Ubuntu python 3 env.

Repo:

Install latest anaconda.
Create clean env using: conda create -n test python=3.6.6.
Activate env: conda activate test
Install daal4py: conda install -c intel daal4py.

Then just run python and use "import daal4py", I got the following error.
Python 3.6.6 |Anaconda, Inc.| (default, Oct 9 2018, 12:34:16)
[GCC 7.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.

import daal4py
Traceback (most recent call last):
File "", line 1, in
File "/home/jonas/anaconda3/envs/test/lib/python3.6/site-packages/daal4py/init.py", line 2, in
from _daal4py import *
File "init.pxd", line 918, in init _daal4py
ValueError: numpy.ufunc size changed, may indicate binary incompatibility. Expected 216 from C header, got 192 from PyObject

There is no error on windows with the same instructions. And if I use python=2.* on Linux, there is no errors too.

DBScan

Add dbscan to daal4py.

Distributed mode is more complex than just map-reduce.

DAAL's decision_forest_regression_training may produce non-deterministic results

This manifests itself in failure of check_fit_idempotent (a test new in 0.21):

from sklearn.utils.estimator_checks import check_fit_idempotent
from daal4py.sklearn.ensemble import  RandomForestRegressor

check_fit_idempotent('RandomForestRegressor', RandomForestRegressor())

with a sample output of

AssertionError:
Not equal to tolerance rtol=1e-07, atol=1e-09

Mismatch: 25%
Max absolute difference: 1.1920929e-07
Max relative difference: 3.2737175e-06
 x: array([ 0.504864,  0.239177, -0.960478, -0.204671,  0.731089,  0.262303,
       -0.675667, -0.779623,  0.362969, -2.344096,  0.38684 ,  0.104189,
       -0.363802, -0.199642, -0.482426,  0.326316,  0.003698, -0.714691,
       -0.858935,  0.370866], dtype=float32)
 y: array([ 0.504864,  0.239177, -0.960478, -0.204671,  0.731089,  0.262303,
       -0.675667, -0.779623,  0.362969, -2.344096,  0.38684 ,  0.104189,
       -0.363802, -0.199642, -0.482426,  0.326316,  0.003698, -0.714691,
       -0.858935,  0.370866], dtype=float32)

the amount of mismatch varies from run to run.

For this reason the check_fit_idempotent is being skipped for RandomForestRegressor right now (see 7ab470b).

Intel(R) DAAL team is aware of the issue and is working on a fix.

This issue is here to re-enable check_fit_idempotent for RandomForestRegressor once the fix is available in the DAAL itself.

@fschlimb @Alexander-Makaryev

remove TBB references

At this moment TBB is not necessary but it is still mentioned in scripts and README.md. It should be removed.

pandas' fast csv reader explaination

Hi
I am looking for explaination of pandas' fast csv reader from sample examples. Any detailed reference would help which talks about all parameters and their role.

read_csv = lambda f, c, t=np.float64: pandas.read_csv(f, usecols=c, delimiter=',', header=None, dtype=t)
read_csv = lambda f, c, t=np.float64: np.loadtxt(f, usecols=c, delimiter=',', ndmin=2)

daal4py patches make sklearn.preprocessing.tests.test_discretization::test_nonuniform_strategies fail

Using scikit-learn 0.20.3, run

python -m daal4py -m pytest --disable-warnings --pyargs sklearn.preprocessing.tests.test_discretization::test_nonuniform_strategies

The test fails for strategy='kmeans' for 5 bins.

The following script replicates the issue:

# km.py
import numpy as np
import daal4py
import sklearn.cluster

X = np.array([0, 0.5, 2, 3, 9, 10]).reshape(-1, 1)
init = np.array([1, 3, 5, 7, 9], dtype=np.double).reshape(-1,1)

init_labels_ = np.array([np.argmin(np.square(X[k] - init).sum(axis=-1)) for k in range(X.shape[0])])
init_inertia_ = np.square(X - init[init_labels_]).sum()

cl0 = sklearn.cluster.KMeans(n_clusters = 5, n_init=1, init=init, algorithm='full', tol=1e-8, max_iter=1)
cl0.cluster_centers_ = init
assert np.all(np.equal(cl0.predict(X), init_labels_))

print("Initial configuration stats")
print("cluster_centers = ", init)
print("labels ", init_labels_)
print("inertia " , init_inertia_)

for k in range(1, 4):
    print("==" * 80)
    cl = sklearn.cluster.KMeans(n_clusters = 5, n_init=1, init=init, algorithm='full', tol=1e-8, max_iter=k).fit(X)

    print("cluster_centers = ",cl.cluster_centers_)
    print("labels ", cl.labels_)
    print("inertia ", cl.inertia_, np.square(X - cl.cluster_centers_[cl.labels_]).sum())
    print(cl.n_iter_)

Executing without patches python km.py and with patches as python -m daal4py km.py reveals the following

{{'iter': 0, 
    'orig': {'cc': np.array([[1],[3],[5],[7], [9]]), 'inertia': 3.25 }, 
    'daal4py' : {'cc': np.array([[1],[3],[5],[7], [9]]), 'inertia': 3.25}},
 {'iter': 1,
     'orig': {'cc': np.array([[5/6],[3],[10],[2], [9.5]]), 'inertia': 1 + 5/90. },
     'daal4py': {'cc': np.array([[5/6],[3],[0],[2], [9.5]]) , 'inertia': 6/10 + 1/90 }},  # daal4py choice seems better
{'iter': 2,
    'orig': {'cc': np.array([[ 1/4 ],[3],[10],[2], [9]]), 'inertia': 0.125},    # end configuration ends up better
    'daal4py': {'cc': np.array([[ 0.5 ],[3],[0],[2], [9.5]]), 'inertia': 0.5 }},
}

The discrepancy between two runs in how algorithms handle vacuous clusters. Label assignment for the initial configuration is [0, 0, 0, 1, 4, 4], meaning that centers 2 (initial position [5]) and 3 (initial position [7]) are vacuous.

Scikit-learn assigns observation [10] to center 2, and observation [2] for center 3.
Daal4py assigns observations [0] to center 2, and observation [2] for center 3.

@jeremiedbb Could you comment, please. My stance would be that scikit-learn should not use denser dataset, which would avoid the issue of vacuous clusters for KMeans in the test for discretizer.