
sap-samples / hana-ml-samples

This project provides code examples for SAP HANA Predictive and Machine Learning scenarios and is educational content. It covers simple Predictive Analysis Library SQL examples as well as complete SAP HANA design-time “ML scenario”-application content or HANA-ML Python Notebook examples.

License: Apache License 2.0

hana-ml-samples's Introduction

SAP HANA Predictive and Machine Learning Scenarios

Description

This project provides code examples for SAP HANA Predictive and Machine Learning scenarios and is educational content. It covers simple Predictive Analysis Library SQL examples as well as complete SAP HANA design-time “ML scenario”-application content or HANA-ML Python Notebook examples.

Requirements

To run the provided samples, you need an SAP HANA database environment with the AFL component installed, which includes the Predictive Analysis Library (PAL). Individual sample files state any additional requirements.

Download and Installation

The sample files can be downloaded and used within the respective user or developer environment; for example, SQL files can be opened and run in the SQL console of SAP HANA Studio or SAP HANA Database Explorer. The sample files do not require an installation step: simply download them and open them in the respective editor.

How to obtain support

Create an issue in this repository if you find a bug or have questions about the content.

License

Copyright (c) 2019 SAP SE or an SAP affiliate company. All rights reserved. This project is licensed under the Apache Software License, version 2.0 except as noted otherwise in the LICENSE file.

hana-ml-samples's People

Contributors

andreasforster, brookzx, btbernard, cherepnev, cmog, d054070, frankgottfried, marcdaniau, raymondyao, rronaldk, sygyzmundovych, xinchen510

hana-ml-samples's Issues

optimal_parameter : None

Hello ,

I am doing hyperparameter tuning with UnifiedClassification, but the report shows the optimal parameter as "None". I tried both RandomSearchCV and GridSearchCV. Can you please tell me what I am doing wrong?

rfc = UnifiedClassification(func='RandomDecisionTree')

gscv = GridSearchCV(estimator=rfc,
                    param_grid={'n_estimators': [50, 80, 90, 100, 200, 300, 400, 500, 600, 800],
                                'max_depth': [50, 60, 70, 80, 90, 100, 110, 120, 150],
                                'split_threshold': [0.1, 0.4, 0.7, 1],
                                'min_samples_leaf': [1, 2, 3, 4]},
                    train_control=dict(fold_num=5,
                                       random_state=1,
                                       resampling_method='cv'),
                    scoring='error_rate')

gscv.fit(data=hdf_train, key='id',
         label='label',
         features=features,
         partition_method='stratified',
         partition_random_state=1,
         stratified_column='label',
         build_report=True)
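
As a rough aid for sizing the search above, the following sketch counts how many parameter combinations the grid contains and how many model fits 5-fold cross-validation would imply. It uses only the Python standard library and does not call hana_ml. The helper name `count_grid_candidates` is hypothetical, and "one fit per candidate per fold" is an assumption about how the search is carried out. It does not explain why the report shows "None".

```python
from math import prod

# Parameter grid copied from the GridSearchCV call above.
param_grid = {
    'n_estimators': [50, 80, 90, 100, 200, 300, 400, 500, 600, 800],
    'max_depth': [50, 60, 70, 80, 90, 100, 110, 120, 150],
    'split_threshold': [0.1, 0.4, 0.7, 1],
    'min_samples_leaf': [1, 2, 3, 4],
}


def count_grid_candidates(grid):
    """Return the number of distinct parameter combinations in a grid."""
    return prod(len(values) for values in grid.values())


fold_num = 5  # same value as train_control['fold_num'] above
candidates = count_grid_candidates(param_grid)
# Assumption: every candidate is trained once per cross-validation fold.
total_fits = candidates * fold_num
print(candidates, total_fits)
```

For this grid the script reports 1440 candidates and, under the stated assumption, 7200 fits, which may be worth keeping in mind when judging run time or when trimming the value lists.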

ERROR:hana_ml.algorithms.pal.unified_classification

Hello ,

While using GridSearchCV with UnifiedClassification, I get the error below:

ERROR:hana_ml.algorithms.pal.unified_classification:(423, 'AFL error: "HC_APL"."(DO statement)": line 65 col 1 (at pos 2182): search table error: _SYS_AFL.AFLPAL:UNIFIED_CLASSIFICATION_ANY: [423] (range 3) AFL error exception: exception 73001043: PAL error[73001043]:Value [0] of parameter TRY_NUM is invalid. \n')

I haven't declared any "TRY_NUM" variable in my code. Can you please tell me what I am doing wrong?

Code:
# train model with hyperparameter tuning
rfc = UnifiedClassification(func='randomforest')

gscv = GridSearchCV(estimator=rfc,
                    param_grid={'n_estimators': [50, 80, 90, 100, 200, 300, 400, 500, 600, 800],
                                'max_depth': [50, 60, 70, 80, 90, 100, 110, 120, 150],
                                'split_threshold': [0.1, 0.4, 0.7, 1],
                                'min_samples_leaf': [1, 2, 3, 4]},
                    train_control=dict(fold_num=5,
                                       random_state=1,
                                       resampling_method='cv'),
                    scoring='error_rate')

gscv.fit(data=hdf_train, key='CC_KEY', label='NAT_FUNCTION', partition_random_state=1,
         features=features,
         partition_method='stratified',
         stratified_column='class',
         build_report=True)

Can't append new data to existing table.

Hello,

I currently have a table for storing data. New data arrives continuously and should be appended after the last rows of that table.
Our team currently maintains several indexes on this table.

I'm currently dealing with two issues:

  1. The table currently has 4000 rows and I'm adding 2000 new rows (with the same columns) via pandas DataFrames. In pandas I build the combined 6000-row DataFrame (4000 old + 2000 new) and push it with the following code:
    hana_ml.dataframe.create_dataframe_from_pandas(cc, df_upload, 'BUPRY', schema=schema, force=True, primary_key='INDEX')

However, this drops the table and creates a new one with the uploaded data, and in the process the indexes are removed as well.

  2. I have been trying not to drop the table, so that the indexes are preserved, mainly by playing with the following two parameters:
  • force=False/True
  • drop_exist_tab=False/True

When both are set to False, I see the following warning on the console:
Table already exists. Data will be appended to the existing table. To set "force=True" can empty the table.
This is the outcome I am looking for; however, I then get the following error.

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "D:\bupryhr_dev\venv\lib\site-packages\hana_ml\dataframe.py", line 4658, in create_dataframe_from_pandas
    cursor.executemany(sql, rows)
hdbcli.dbapi.IntegrityError: (301, 'unique constraint violated')

I have tried changing different parameters, uploading all 6000 rows or only the 2000 new ones, and not specifying the primary key, but nothing works and I keep getting the same error.

How can I fix this? Thanks in advance.
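
One possible reading of the error above is that some of the rows being appended carry primary-key values that already exist in the table, which a unique constraint rejects. The sketch below illustrates that idea and one way around it (append only rows whose key is new). It uses Python's built-in sqlite3 module as a stand-in for SAP HANA so that it runs anywhere; that substitution, the two-column table layout, and the helper `rows_with_new_keys` are all assumptions for illustration, and nothing here calls hana_ml.

```python
import sqlite3


def rows_with_new_keys(conn, table, key_column, rows):
    """Keep only rows whose key (first element) is not yet in the table."""
    query = f'SELECT "{key_column}" FROM "{table}"'
    existing = {record[0] for record in conn.execute(query)}
    return [row for row in rows if row[0] not in existing]


# In-memory stand-in for the existing table with a primary key on INDEX.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE BUPRY ("INDEX" INTEGER PRIMARY KEY, "VALUE" TEXT)')
with conn:  # commit the "old" rows
    conn.executemany('INSERT INTO BUPRY VALUES (?, ?)',
                     [(i, 'old') for i in range(4)])

# Old rows (keys 0-3) plus two new rows (keys 4-5), like the combined upload.
combined = [(i, 'old') for i in range(4)] + [(4, 'new'), (5, 'new')]

# Appending the combined data repeats existing keys and is rejected.
try:
    with conn:
        conn.executemany('INSERT INTO BUPRY VALUES (?, ?)', combined)
    duplicate_error = None
except sqlite3.IntegrityError as exc:
    duplicate_error = exc

# Appending only the rows with unseen keys succeeds.
new_rows = rows_with_new_keys(conn, 'BUPRY', 'INDEX', combined)
with conn:
    conn.executemany('INSERT INTO BUPRY VALUES (?, ?)', new_rows)
row_count = conn.execute('SELECT COUNT(*) FROM BUPRY').fetchone()[0]
```

If uploading only the new rows still fails, it may be worth checking whether their key values restart from the beginning (for example, a freshly reset pandas index), since that would collide with existing keys in the same way.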

APL hyper-parameter tuning

Hello ,
Do we need hyperparameter tuning when using SAP APL models?
According to the documentation, there are default values for learning_rate, max depth, max_iterations and early stopping. With these defaults my model reaches 81% accuracy on the training dataset, but I need to improve the accuracy further. I could not find anything about hyperparameter tuning for APL models.
In APL, how can I tune the parameters above? Is there an option to activate automatic selection of these hyperparameters, or does the model handle this itself in the backend so that no hyperparameter tuning is needed for APL models?

My code:
apl_model = GradientBoostingClassifier()
apl_model.set_params(variable_auto_selection=True, variable_selection_max_nb_of_final_variables='6')
apl_model.fit(hdf_train, label=col_target, key=col_key, features=col_predictors)

Parallel run

Hi ,
Is there an option, like the one in the HGBT algorithm, to run model training in parallel?
Parallel runs would let model training complete in the shortest possible time.

PAL error[73001255]:Invalid model. Model format does not match specification.. :nodes/n\n'

Hello ,
I'm unable to understand the error below, which occurs with hana_ml version 2.14.22120100.

Error :
ERROR:hana_ml.algorithms.pal.unified_classification:(423, 'AFL error: "HC_APL"."(DO statement)": line 60 col 1 (at pos 1957): search table error: _SYS_AFL.AFLPAL:UNIFIED_CLASSIFICATION_ANY: [423] (range 3) AFL error exception: exception 73001255: PAL error[73001255]:Invalid model. Model format does not match specification.. :nodes/n\n')

I'm using the PAL algorithm as follows:

from hana_ml.algorithms.pal.unified_classification import UnifiedClassification
rdt_params = dict(random_state=2, n_estimators=10, max_depth=25, learning_rate=0.1)
uc_rdt = UnifiedClassification(func='HybridGradientBoostingTree', thread_ratio=1.0, **rdt_params)
uc_rdt.fit(data=res,
           key='ID',
           label='Target',
           features=features,
           partition_method='stratified',
           stratified_column='Target',
           partition_random_state=2,
           training_percent=0.8, ntiles=2)

I have already cross-checked that the features are present in the dataset and named correctly. The code worked when I ran it yesterday, but today the same dataset and the same code give me the error above.

Can anyone please help me understand the root cause of this error?

Tutorial 3 Step 1 - Error using the Graph.describe() method

I'm at step 1 of the tutorial: https://developers.sap.com/tutorials/hana-cloud-python-analysis-multimodel-3.html

The error I get is:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[20], line 1
----> 1 g_storm.describe()

File ~/llm_graph/.venv/lib/python3.11/site-packages/hana_ml/graph/hana_graph.py:196, in Graph.describe(self)
    194 # Merge all to one Series
    195 desc_series = pd.Series(desc)
--> 196 desc_series = pd.concat(desc_series, describer.self_loops)
    197 desc_series = pd.concat(desc_series, describer.degree)
    198 desc_series = pd.concat(desc_series, describer.density)

TypeError: concat() takes 1 positional argument but 2 were given

I read that some pandas versions (1.5) allowed two positional arguments, but pandas 2.2.2 does not:
https://stackoverflow.com/questions/76034809/pandas-concat-second-argument

Could it be a bug?
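
The message in the traceback is what Python raises when a function accepts one positional parameter and everything after it is keyword-only, which matches how pandas 2.x declares `concat` (the objects to join are passed together as a single list). The sketch below reproduces that behaviour with a small stand-in function so that it runs without pandas installed; `concat_stand_in` is hypothetical and only mimics the shape of the signature, not pandas itself.

```python
def concat_stand_in(objs, *, axis=0):
    """Join a list of lists; every parameter after `objs` is keyword-only."""
    result = []
    for obj in objs:
        result.extend(obj)
    return result


a, b = [1, 2], [3]

# Two positional arguments, the same call shape as line 196 in the traceback.
try:
    concat_stand_in(a, b)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Passing the objects as one list is accepted.
merged = concat_stand_in([a, b])
print(error_message)
print(merged)
```

With pandas itself, the corresponding call shape would be `pd.concat([desc_series, describer.self_loops])`; whether that is how the library addresses it is not something this page confirms.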

GridSearchCV

Hello,

I am looking for GridSearchCV in the SAP HANA documentation, but I cannot find a best_estimator_ attribute in this library. For example, after running GridSearchCV, how do I know which attributes were selected by the program and what accuracy they achieved?

Can you please share documentation that explains this? I have already looked at the SAP HANA documentation, but it was not very helpful.

hana_ml.visualizers.model_debriefing.TreeModelDebriefing Issue

Hello ,
I have some questions about model_debriefing in the SAP HANA ML package:

  1. Can we perform model debriefing on the APL hybrid gradient boosting tree classification algorithm? What about PAL?
  2. My only aim is to visualize the model, that is, how it works in the backend when making predictions. Is that possible?
  3. I have tried using hana_ml.visualizers.model_debriefing on an HGBT classification model (I am more interested in APL), but somehow it is not working.

It would be really helpful if you could share your thoughts with us.
Thank you!

Dependent library not installed automatically.

Because the Jinja2 dependency is missing, most of the library's features cannot be imported. This causes an import error and leaves hana_ml in a non-working state.

Either of the two manual installation commands below fixes the problem. Since this has to be done manually, I strongly recommend declaring Jinja2 as a dependency of the hana_ml pip package.

pip install jinja2
or
poetry add jinja2
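
To confirm whether the dependency is actually missing in a given environment before importing hana_ml, a check along these lines can help. It is a minimal sketch using only the standard library; the helper name `missing_modules` is hypothetical, and the list of modules to check is an assumption based on this report (only `jinja2`).

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Assumption: jinja2 is the only dependency in question, as reported above.
missing = missing_modules(['jinja2'])
if missing:
    print('Install before importing hana_ml:', ', '.join(missing))
else:
    print('All checked modules are available.')
```

`importlib.util.find_spec` only looks the module up and does not import it, so the check itself cannot trigger the import error described above.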

RDTClassifier and Random Forest are same ?

Hello ,

Are RDTClassifier and Random Forest the same algorithm in the SAP HANA ML package?
I have checked the SAP documentation, and the attributes of both algorithms look the same to me. Can you please confirm this?
I am currently trying to use GridSearchCV with UnifiedClassification, but there is no option for the Random Forest algorithm, hence this issue.
