sap-samples / hana-ml-samples

This project provides educational code examples for SAP HANA Predictive and Machine Learning scenarios. It covers simple Predictive Analysis Library (PAL) SQL examples, complete SAP HANA design-time “ML scenario” application content, and HANA-ML Python notebook examples.

License: Apache License 2.0

Jupyter Notebook 96.50% Python 0.21% HTML 3.26% R 0.01% ABAP 0.01% CAP CDS 0.01% JavaScript 0.01% CSS 0.01%

sample sample-code predictive ml sap-hana machine-learning

hana-ml-samples's Introduction

SAP HANA Predictive and Machine Learning Scenarios

Description

Requirements

To run the provided sample code, you need an SAP HANA database environment with the AFL component installed, which includes the Predictive Analysis Library (PAL). Individual sample files list any additional requirements.

Download and Installation

The sample files can be downloaded and used within the respective user or developer environment; for example, SQL files can be opened and run in the SQL console of SAP HANA Studio or SAP HANA Database Explorer. The sample files don't require an installation step; simply download them and open them in the respective editor.

How to obtain support

Create an issue in this repository if you find a bug or have questions about the content.

License

hana-ml-samples's Issues

optimal_parameter : None

Hello,

I am doing hyperparameter tuning with UnifiedClassification, but the report shows the optimal parameter as "None". I tried both RandomSearchCV and GridSearchCV. Can you please tell me what I am doing wrong?

from hana_ml.algorithms.pal.unified_classification import UnifiedClassification
from hana_ml.algorithms.pal.model_selection import GridSearchCV

rfc = UnifiedClassification(func='RandomDecisionTree')

gscv = GridSearchCV(estimator=rfc,
                    param_grid={'n_estimators': [50, 80, 90, 100, 200, 300, 400, 500, 600, 800],
                                'max_depth': [50, 60, 70, 80, 90, 100, 110, 120, 150],
                                'split_threshold': [0.1, 0.4, 0.7, 1],
                                'min_samples_leaf': [1, 2, 3, 4]},
                    train_control=dict(fold_num=5,
                                       random_state=1,
                                       resampling_method='cv'),
                    scoring='error_rate')

gscv.fit(data=hdf_train, key='id',
         label='label',
         features=features,
         partition_method='stratified',
         partition_random_state=1,
         stratified_column='label',
         build_report=True)
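For scale, an exhaustive grid search evaluates every combination of the values listed in the param_grid above. The following minimal stdlib sketch (independent of hana_ml) counts those candidates and the model fits implied by fold_num=5. It only illustrates the size of the search; it does not explain why the report shows "None".

```python
from itertools import product

# Parameter grid copied from the report above.
param_grid = {
    "n_estimators": [50, 80, 90, 100, 200, 300, 400, 500, 600, 800],
    "max_depth": [50, 60, 70, 80, 90, 100, 110, 120, 150],
    "split_threshold": [0.1, 0.4, 0.7, 1],
    "min_samples_leaf": [1, 2, 3, 4],
}
fold_num = 5  # taken from train_control

# Every combination of values is one candidate parameter set.
combinations = list(product(*param_grid.values()))
n_candidates = len(combinations)
# With k-fold cross-validation, each candidate is trained k times.
n_fits = n_candidates * fold_num

print(n_candidates, n_fits)  # 1440 7200
```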

ERROR:hana_ml.algorithms.pal.unified_classification

Hello,

While using GridSearchCV with UnifiedClassification, I get the following error:

ERROR:hana_ml.algorithms.pal.unified_classification:(423, 'AFL error: "HC_APL"."(DO statement)": line 65 col 1 (at pos 2182): search table error: _SYS_AFL.AFLPAL:UNIFIED_CLASSIFICATION_ANY: [423] (range 3) AFL error exception: exception 73001043: PAL error[73001043]:Value [0] of parameter TRY_NUM is invalid. \n')

I haven't declared any "TRY_NUM" variable in my code. Can you please tell me what I am doing wrong?

Code:
# Train model with hyperparameter tuning
from hana_ml.algorithms.pal.unified_classification import UnifiedClassification

rfc = UnifiedClassification(func='randomforest')

# gscv: the GridSearchCV instance wrapping rfc (its definition is not shown in this report)
gscv.fit(data=hdf_train, key='CC_KEY', label='NAT_FUNCTION',
         partition_random_state=1,
         features=features,
         partition_method='stratified',
         stratified_column='class',
         build_report=True)

[rl-assigned_teams-3] Violation against OSS Rules of Play

A violation against the OSS Rules of Play has been detected.

Rule ID: rl-assigned_teams-3
Explanation: Does it have enough admins on GitHub? No

Find more information at: https://sap.github.io/fosstars-rating-core/oss_rules_of_play_rating.html

[rl-vulnerability_alerts-1] Violation against OSS Rules of Play

The OSPO bot created this issue by mistake: it did not have sufficient privileges to check the vulnerability alerts, so I am closing this issue now. Sorry for any inconvenience.

Can't append new data to existing table.

Hello,

I have a table that stores data, and new data is continuously appended to the end of it.
Our team has created several indexes on this table.

I'm currently dealing with two issues:

The table currently has 4,000 rows. I add 2,000 new rows (with the same columns) using pandas: I build a 6,000-row DataFrame (the old 4,000 plus the new 2,000) and push it with the following code:
hana_ml.dataframe.create_dataframe_from_pandas(cc, df_upload, 'BUPRY', schema=schema, force=True, primary_key='INDEX')

However, this drops the table and creates a new one with the uploaded data, and in the process the "INDEXES" are removed as well.

To keep the table from being dropped, so that the "INDEXES" tab is not cleared, I have mainly experimented with these two parameters:

force=False/True
drop_exist_tab=False/True

When I set both of them to False, the console shows the following warning:
Table already exists. Data will be appended to the existing table. To set "force=True" can empty the table.
This is the behavior I want; however, I then get the following error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "D:\bupryhr_dev\venv\lib\site-packages\hana_ml\dataframe.py", line 4658, in create_dataframe_from_pandas
    cursor.executemany(sql, rows)
hdbcli.dbapi.IntegrityError: (301, 'unique constraint violated')

I have tried different parameter combinations: uploading all 6,000 rows, uploading only the 2,000 new rows, and not specifying the primary key. Nothing works, and I keep getting the same error.

How can I fix this? Thanks in advance.
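SAP HANA SQL error 301 ("unique constraint violated") is raised when an INSERT would write a primary-key value that already exists in the table, or that appears twice within the same batch. The following minimal stdlib sketch checks the keys of a planned append. It assumes the existing INDEX values have already been read from the table, for example with a SELECT on the INDEX column. The helper name find_key_conflicts is hypothetical and not part of hana_ml.

```python
from collections import Counter


def find_key_conflicts(upload_keys, existing_keys):
    """Return (already_in_table, duplicated_in_upload) for a planned append.

    Hypothetical helper: both arguments are plain iterables of primary-key values.
    """
    upload_keys = list(upload_keys)
    existing = set(existing_keys)
    # Keys that would collide with rows already stored in the table.
    already_in_table = sorted({k for k in upload_keys if k in existing})
    # Keys repeated inside the upload itself also violate the primary key.
    counts = Counter(upload_keys)
    duplicated_in_upload = sorted(k for k, n in counts.items() if n > 1)
    return already_in_table, duplicated_in_upload


# Assume the table holds INDEX values 0..3999.
existing = range(4000)
# Re-sending all 6,000 rows repeats 0..3999, so 4,000 keys collide.
clash, dupes = find_key_conflicts(range(6000), existing)
print(len(clash), len(dupes))  # 4000 0
# If the 2,000 new rows were numbered from 0 again (e.g. a fresh pandas index),
# they would collide too.
clash, dupes = find_key_conflicts(range(2000), existing)
print(len(clash), len(dupes))  # 2000 0
```

With pandas, a frame restricted to unseen keys could be built with df_upload[~df_upload['INDEX'].isin(existing_keys)]. It might then be appended with force=False and drop_exist_tab=False, the combination that produced the "Data will be appended" warning above.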

APL hyperparameter tuning

Hello,
Do we need hyperparameter tuning when using SAP APL models?
According to the documentation, there are default values for learning_rate, max_depth, max_iterations and early stopping. With these defaults, my model reaches 81% accuracy on the training dataset, but I need to improve its accuracy further. I could not find anything about hyperparameter tuning for APL models.
In APL, how can I tune the parameters above? Is there an option to enable automatic selection of these hyperparameters, or does the model handle this by itself in the backend, so that no hyperparameter tuning is needed for APL models?

My code:
from hana_ml.algorithms.apl.gradient_boosting_classification import GradientBoostingClassifier

apl_model = GradientBoostingClassifier()
apl_model.set_params(variable_auto_selection=True,
                     variable_selection_max_nb_of_final_variables='6')
apl_model.fit(hdf_train, label=col_target, key=col_key, features=col_predictors)

Parallel run

Hi,
Is there an option in the HGBT algorithm to run model training in parallel,
so that training completes in less time?

PAL error[73001255]:Invalid model. Model format does not match specification.. :nodes/n\n'

Hello,
I don't understand the following error with hana_ml version 2.14.22120100:

Error:
ERROR:hana_ml.algorithms.pal.unified_classification:(423, 'AFL error: "HC_APL"."(DO statement)": line 60 col 1 (at pos 1957): search table error: _SYS_AFL.AFLPAL:UNIFIED_CLASSIFICATION_ANY: [423] (range 3) AFL error exception: exception 73001255: PAL error[73001255]:Invalid model. Model format does not match specification.. :nodes/n\n')

I'm using the PAL algorithm as follows:

from hana_ml.algorithms.pal.unified_classification import UnifiedClassification

rdt_params = dict(random_state=2, n_estimators=10, max_depth=25, learning_rate=0.1)
uc_rdt = UnifiedClassification(func='HybridGradientBoostingTree', thread_ratio=1.0, **rdt_params)
uc_rdt.fit(data=res,
           key='ID',
           label='Target',
           features=features,
           partition_method='stratified',
           stratified_column='Target',
           partition_random_state=2,
           training_percent=0.8,
           ntiles=2)

I have already verified that the features exist in the dataset and are named correctly. The code worked when I ran it yesterday, but today the same code on the same dataset gives the error above.

Can anyone please help me to understand the root cause of this error.

Tutorial 3 Step 1 - Error using the Graph.describe() method

I'm at step 1 of the tutorial: https://developers.sap.com/tutorials/hana-cloud-python-analysis-multimodel-3.html

The error I get is:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[20], line 1
----> 1 g_storm.describe()

File ~/llm_graph/.venv/lib/python3.11/site-packages/hana_ml/graph/hana_graph.py:196, in Graph.describe(self)
    194 # Merge all to one Series
    195 desc_series = pd.Series(desc)
--> 196 desc_series = pd.concat(desc_series, describer.self_loops)
    197 desc_series = pd.concat(desc_series, describer.degree)
    198 desc_series = pd.concat(desc_series, describer.density)

TypeError: concat() takes 1 positional argument but 2 were given

I read that pandas 1.5 allowed two positional arguments, but pandas 2.2.2 does not:
https://stackoverflow.com/questions/76034809/pandas-concat-second-argument

Could it be a bug?
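For reference, pd.concat expects its first argument to be a list (or other iterable) of Series/DataFrame objects, and since pandas 2.0 every other argument is keyword-only. This matches the TypeError in the traceback above. The following minimal sketch uses plain pandas with made-up values standing in for the graph statistics; it is not the hana_ml code itself.

```python
import pandas as pd

# Made-up stand-ins for the statistics that Graph.describe() merges.
desc_series = pd.Series({"nodes": 10, "edges": 25})
self_loops = pd.Series({"self_loops": 0})
density = pd.Series({"density": 0.28})

# Passing two Series as separate positional arguments raises TypeError.
try:
    pd.concat(desc_series, self_loops)
    failed = False
except TypeError:
    failed = True

# Passing the objects as a single list concatenates them into one Series.
merged = pd.concat([desc_series, self_loops, density])
print(merged.to_dict())
```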

[rl-assigned_teams-1] Violation against OSS Rules of Play

A violation against the OSS Rules of Play has been detected.

Rule ID: rl-assigned_teams-1
Explanation: Does it have enough teams on GitHub? No

Find more information at: https://sap.github.io/fosstars-rating-core/oss_rules_of_play_rating.html

Not able to install hana-ml and hdbcli on a new MacBook Pro with an M1 chip (aarch64 architecture)

ERROR: Could not find a version that satisfies the requirement hdbcli (from versions: none)
ERROR: No matching distribution found for hdbcli

ERROR: Could not find a version that satisfies the requirement hana-ml (from versions: none)
ERROR: No matching distribution found for hana-ml
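pip's "(from versions: none)" message typically means it found no published distribution compatible with the running interpreter. The following small stdlib diagnostic (an illustration, not an official SAP procedure) prints the interpreter details that determine which wheels pip can install. For example, it shows whether Python is a native arm64 build or an x86_64 build running under Rosetta 2. The helper name interpreter_platform is hypothetical.

```python
import platform
import sys
import sysconfig


def interpreter_platform():
    """Summarize interpreter details relevant to wheel compatibility."""
    return {
        "python": platform.python_version(),
        "implementation": sys.implementation.name,
        # e.g. 'arm64' for a native Apple Silicon build, 'x86_64' under Rosetta 2
        "machine": platform.machine(),
        # e.g. 'macosx-11.0-arm64'; the exact string depends on how Python was built
        "platform_tag": sysconfig.get_platform(),
        "64bit": sys.maxsize > 2**32,
    }


for key, value in interpreter_platform().items():
    print(f"{key}: {value}")
```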

GridSearchCV

Hello,

I am looking for GridSearchCV in the SAP HANA documentation, but I can't find a best_estimator_ attribute in this library. For example, after running GridSearchCV, how do I find out which parameters were selected and what accuracy they achieved?

Could you please point me to documentation that explains this? I have already looked at the SAP HANA documentation, but it was not very helpful.
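Conceptually, a grid search scores every parameter combination and keeps the best one. The "selected parameters" and their score are simply the minimum (for an error rate) or maximum (for accuracy) of that results table. The following minimal, library-independent sketch illustrates this in pure Python. The scoring function is a made-up stand-in for a cross-validated error rate, and nothing here is the hana_ml API.

```python
from itertools import product


def grid_search(param_grid, score_fn):
    """Evaluate every combination; return (best_params, best_score, all_results).

    Lower scores are treated as better (e.g. an error rate).
    """
    names = list(param_grid)
    results = []
    for values in product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        results.append((params, score_fn(params)))
    best_params, best_score = min(results, key=lambda r: r[1])
    return best_params, best_score, results


def fake_error_rate(params):
    """Hypothetical stand-in for a cross-validated error rate."""
    return abs(params["max_depth"] - 20) / 100 + 1 / params["n_estimators"]


best, err, table = grid_search(
    {"n_estimators": [50, 100, 200], "max_depth": [10, 20, 30]}, fake_error_rate
)
print(best, round(err, 3), len(table))
```

The sketch does not cover the attribute names under which hana_ml exposes the chosen parameters; the GridSearchCV reference for your hana_ml version is the place to check those.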

hana_ml.visualizers.model_debriefing.TreeModelDebriefing Issue

Hello,
I have some questions about model_debriefing in the SAP HANA ML package:

Can model_debriefing be used with an APL hybrid gradient boosting tree classification model? What about PAL?
I only want to visualize the model, i.e. see how it works in the backend when making predictions. Is that possible?
I have tried hana_ml.visualizers.model_debriefing on an HGBT classification model (I am more interested in APL), but somehow it does not work.

It would be really helpful if you could share your thoughts.
Thank you !

[rl-assigned_teams-2] Violation against OSS Rules of Play

A violation against the OSS Rules of Play has been detected.

Rule ID: rl-assigned_teams-2
Explanation: Does it have an admin team on GitHub? No

Find more information at: https://sap.github.io/fosstars-rating-core/oss_rules_of_play_rating.html

Dependent library not installed automatically.

Because the Jinja2 dependency is missing, most of the library's features cannot be imported: the import fails, leaving a non-working installation of hana_ml.

Either of the two manual installation commands below fixes the problem. Because this currently has to be done manually, I strongly recommend declaring Jinja2 as a dependency of the hana_ml pip package.

pip install jinja2
or
poetry add jinja2
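A quick stdlib check can confirm whether jinja2 is importable in the active environment before or after installing it. The following is a minimal sketch; the helper name is_installed is illustrative only.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found in this environment."""
    return importlib.util.find_spec(module_name) is not None


for name in ("jinja2", "hana_ml"):
    status = "available" if is_installed(name) else "missing"
    print(f"{name}: {status}")
```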

Are RDTClassifier and Random Forest the same?

Hello,

Are RDT Classifier and Random Forest in the SAP HANA ML package the same algorithm?
I have checked the SAP documentation, and the attributes of both algorithms look the same to me. Could you please confirm?
I am currently trying to use GridSearchCV with UnifiedClassification, but there is no option for the Random Forest algorithm, hence this issue.
