signavio / sap-sam

Example source code for SAP Signavio Academic Models (SAP-SAM)

License: Apache License 2.0

Jupyter Notebook 99.59% Python 0.41%

sap-sam's Introduction

SAP Signavio Academic Models (SAP-SAM)

This repository contains the source code for the paper "SAP Signavio Academic Models: A Large Process Model Dataset" by Diana Sola, Christian Warmuth, Bernhard Schäfer, Peyman Badakhshan, Jana-Rebecca Rehse, and Timotheus Kampik.

Link to the paper: https://arxiv.org/abs/2208.12223 (pre-print)

Link to the dataset: https://zenodo.org/record/7012043

License

The example code in this repository is licensed as follows. Note that a different license applies to the dataset itself!

Copyright (c) 2022 by SAP.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

   http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

The following license applies to the SAP-SAM dataset.

Copyright (c) 2022 by SAP.

SAP grants to Recipient a non-exclusive copyright license to the Model Collection to use the Model Collection for Non-Commercial Research purposes of evaluating Recipient’s algorithms or other academic research artefacts against the Model Collection. Any rights not explicitly granted herein are reserved to SAP. For the avoidance of doubt, no rights to make derivative works of the Model Collection is granted and the license granted hereunder is for Non-Commercial Research purposes only.

"Model Collection" shall mean all files in the archive (which are JSON, XML, or other representation of business process models or other models).

"Recipient" means any natural person receiving the Model Collection.

"Non-Commercial Research" means research solely for the advancement of knowledge whether by a university or other learning institution and does not include any commercial or other sales objectives.

Citing SAP-SAM

@misc{SAP-SAM-paper,
  doi = {10.48550/ARXIV.2208.12223},
  url = {https://arxiv.org/abs/2208.12223},
  author = {Sola, Diana and Warmuth, Christian and Schäfer, Bernhard and Badakhshan, Peyman and Rehse, Jana-Rebecca and Kampik, Timotheus},
  keywords = {Other Computer Science (cs.OH), Software Engineering (cs.SE), FOS: Computer and information sciences, FOS: Computer and information sciences},
  title = {SAP Signavio Academic Models: A Large Process Model Dataset},
  publisher = {arXiv},
  year = {2022},
  copyright = {arXiv.org perpetual, non-exclusive license}
}

@dataset{SAP-SAM-dataset,
  author       = {Kampik, Timotheus and Warmuth, Christian and Sola, Diana and Schäfer, Bernhard and Axworthy, Liz and Ivarsson, Erica and
                  Ouda, Karim and Eickhoff, David},
  title        = {SAP Signavio Academic Models},
  month        = aug,
  year         = 2022,
  publisher    = {Zenodo},
  version      = {0.5.1},
  doi          = {10.5281/zenodo.6964944},
  url          = {https://doi.org/10.5281/zenodo.6964944}
}

Setup

You need to download the dataset and place it into the folder ./data/raw such that the models are in ./data/raw/sap_sam_2022/models.
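Before opening the notebooks, you can check that the expected layout is in place. The following is a minimal sketch. The helper find_model_csvs is hypothetical and not part of ./src. It only looks for .csv files, so .sgx exports placed in the same folder are not counted.

```python
from pathlib import Path


def find_model_csvs(root: str = "./data/raw") -> list[Path]:
    """Return the sorted CSV files under <root>/sap_sam_2022/models.

    Raises FileNotFoundError if the folder does not exist, which usually
    means the archive was extracted to the wrong place.
    """
    models_dir = Path(root) / "sap_sam_2022" / "models"
    if not models_dir.is_dir():
        raise FileNotFoundError(f"expected model files in {models_dir}")
    # Only CSV files are listed; other files in the folder are ignored.
    return sorted(models_dir.glob("*.csv"))
```

Run from the repository root, len(find_model_csvs()) should match the number of CSV files in the downloaded archive.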

It is also possible to run the analysis on any .sgx files (Signavio workspace exports). Place the files in ./data/raw/sap_sam_2022/models and the conversion will be performed automatically.

To get started on Mac or Windows, we provide a dependency setup with Poetry. Make sure Poetry is installed on your system by running poetry --version. If it is not installed, run pip install poetry.

To install the dependencies, go to the root of the cloned repository, type the following line in the terminal, and press Enter:

poetry install

Make sure you have the latest stable release of Python installed on your machine, not a pre-release (depending on your system, check with python --version or python3 --version). As of April 2024, the latest stable version is 3.12.2.
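You can also check the interpreter from Python itself. The following is a minimal sketch. The helper is_supported_python is hypothetical, and the 3.11 minimum is taken from the python ^3.11 constraint in the dependency list further down this page.

```python
import sys
from typing import Optional, Sequence


def is_supported_python(version: Optional[Sequence] = None,
                        minimum: tuple[int, int] = (3, 11)) -> bool:
    """True if `version` is a final (non-pre-release) release >= `minimum`.

    `version` is a sys.version_info-style 5-tuple
    (major, minor, micro, releaselevel, serial); it defaults to the
    running interpreter.
    """
    major, minor, _micro, releaselevel, _serial = version or sys.version_info
    # releaselevel is "alpha", "beta" or "candidate" for pre-releases.
    return releaselevel == "final" and (major, minor) >= minimum
```

For example, is_supported_python((3, 12, 2, "final", 0)) is True, while a 3.13 beta is rejected.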

Once the dependencies are installed, you should be able to set up the Jupyter kernel:

python -m ipykernel install --user --name=sap-sam-kernel

Then, to open the project, simply type:

jupyter notebook

Alternatively, a conda setup is possible.

We provide two conda environment files that can be used to create a new environment and install the required dependencies:

- environment.yml: contains the abstract dependencies (pandas, numpy, ...).
- environment-lock.yml: contains pinned versions of all dependencies, including transitive ones, to ensure reproducible results.

Create the environment with one of the following commands (use the lock file for reproducible results):

conda env create -f environment.yml

conda env create -f environment-lock.yml

Getting started

We provide a tutorial Jupyter notebook that illustrates the dataset format in more detail and shows how to use the CSV parsers in ./src.

The properties Jupyter Notebook gives an overview of selected properties of the dataset.

Dataset Format

The SAP-SAM dataset consists of 103 CSV files, roughly 38 GB in total, that contain process models in a variety of modeling notations (listed below).

CSV Format

csv columns:
- Revision ID: Unique identifier for model revision
- Model ID: Unique identifier for model
- Organization ID: Unique identifier for organization this model originates from
- Datetime: Date and time of creation
- Model JSON: JSON containing model information
- Description: Description of model (typically empty)
- Name: Model name
- Type: Model type (duplicate and less specific than namespace)
- Namespace: Stencilset/modeling notation (e.g. BPMN, DMN, UML,...)
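For a quick look at the data without the parsers in ./src, rows can be read with Python's standard csv module. The following is a minimal sketch. It assumes that the header strings match the column names listed above exactly. iter_models is a hypothetical helper, not part of the repository.

```python
import csv
import json
from typing import Iterator

# Model JSON cells can be far larger than the csv module's default field
# size limit (131072 characters), so raise it. 2**31 - 1 avoids the
# OverflowError that sys.maxsize can trigger on Windows.
csv.field_size_limit(2**31 - 1)


def iter_models(csv_path: str) -> Iterator[dict]:
    """Yield one dict per row, with the "Model JSON" column parsed."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            row["Model JSON"] = json.loads(row["Model JSON"])
            yield row
```

Because it is a generator, iter_models keeps only one row in memory at a time. This matters for files that are several hundred MB in size.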

Number of models: 1,021,471

Number of models by modeling notation:

Modeling notation	Frequency
BPMN 2.0	618,807
Value Chain	194,078
DMN 1.0	98,286
EPC	32,369
BPMN 1.0	15,643
UML 2.2 Class	14,953
Petri Net	11,207
ArchiMate 2.1	10,956
UML Use Case	10,228
Organigram	4,568
BPMN 2.0 Choreography	4,096
BPMN 2.0 Conversation	2,788
FMC Block Diagram	1,398
CMMN 1.0	999
CPN	385
Journey Map	287
YAWL 2.2	238
Process Documentation Template	86
jBPM 4	76
XForms	20
Chen Notation	3
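Counts like the ones in this table can in principle be reproduced from the Namespace column. The following is a minimal sketch using the hypothetical helper count_namespaces. It counts distinct Model ID values in case a model appears in more than one row. It returns the raw Namespace strings, which may need mapping to the notation labels above; that mapping is not shown.

```python
import csv
from collections import Counter
from pathlib import Path

# Model JSON cells can exceed the default csv field size limit.
csv.field_size_limit(2**31 - 1)


def count_namespaces(models_dir: str) -> Counter:
    """Count distinct models per "Namespace" value over all CSV files."""
    namespace_by_model: dict[str, str] = {}
    for path in sorted(Path(models_dir).glob("*.csv")):
        with open(path, newline="", encoding="utf-8") as f:
            for row in csv.DictReader(f):
                # Keyed by Model ID, so repeated rows for a model count once.
                namespace_by_model[row["Model ID"]] = row["Namespace"]
    return Counter(namespace_by_model.values())
```

count_namespaces("./data/raw/sap_sam_2022/models").most_common(5) would list the five most frequent namespaces.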

Dummy Data

To remove personal data such as first and last names, emails, and in some cases matriculation numbers (which users had added in violation of the T&Cs), we applied a simple replacement script. Specifically, we replaced, to the extent possible, emails, names, and (matriculation) numbers with the following dummy values:

Context	Dummy
Email Dummy	[email protected]
Name Dummy	Jane Doe
Matriculation/Number Dummy	12345678
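Because the replacements are literal strings, text-based analyses (for example, of model names) may want to detect them. The following is a minimal sketch with a hypothetical contains_dummy helper. It covers only the name and number dummies from the table. Plain substring matching also flags longer numbers that happen to contain 12345678.

```python
# Name and number dummies from the table above.
DUMMY_VALUES = ("Jane Doe", "12345678")


def contains_dummy(text: str) -> bool:
    """True if text contains one of the known dummy replacement values."""
    return any(value in text for value in DUMMY_VALUES)
```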

Project Organization

├── data
│   ├── interim           <- Intermediate data that has been transformed.
│   └── raw               <- The raw dataset should be placed in this folder.
├── notebooks             <- Jupyter notebooks.
├── reports            
│   └── figures           <- Generated graphics and figures used in the paper.
├── src               
│   └── sapsam            <- Source code and dictionaries for use in this project.
├── LICENSE               <- License that applies to the example code in this repository.
├── README.md             <- The top-level README for developers using this project.
├── environment-lock.yml  <- Contains versions for all dependencies and the transitive dependencies to ensure reproducible results.
├── environment.yml       <- Contains the abstract dependencies (pandas, numpy, ...).
└── setup.py              <- Makes project pip installable (pip install -e .) such that src can be imported.

sap-sam's People

loxk pitcherag timkam isabella232 sap-pnschumacher peiyanpan guccigui ccoreasap mzurmuehlen rajatmandaniyan

sap-sam's Issues

Update filter for BPMN 2.0 parsing stage

– Updated code to include a 'name' column in the table of BPMN 2.0 diagrams
– Using the 'name' column, example processes can now be filtered out
– Updated markdown to make clear that diagrams without elements are ignored during the parsing stage, which explains count discrepancies

[rl-vulnerability_alerts-1] Violation against OSS Rules of Play

A violation against the OSS Rules of Play has been detected.

Rule ID: rl-vulnerability_alerts-1
Explanation: Are vulnerability alerts enabled? No

Find more information at: https://sap.github.io/fosstars-rating-core/oss_rules_of_play_rating.html

No Conf file is provided

I am getting an error on this line:
https://github.com/signavio/sap-sam/blob/main/src/sapsam/ImageGenerator.py#L2

from conf import system_instance
However, no config file is provided. Does it have anything to do with the Signavio account?

Please point me in the right direction.

Fix reuse compliance

Running analysis on data subsets

Noticed some unexpected behaviour in the code after trying to run the analysis on a very small subset of the data (~25 models).

[rl-reuse_tool-2] Violation against OSS Rules of Play

A violation against the OSS Rules of Play has been detected.

Rule ID: rl-reuse_tool-2
Explanation: Does it have LICENSES directory with licenses? No

Find more information at: https://sap.github.io/fosstars-rating-core/oss_rules_of_play_rating.html

[rl-reuse_tool-1] Violation against OSS Rules of Play

A violation against the OSS Rules of Play has been detected.

Rule ID: rl-reuse_tool-1
Explanation: Does README mention REUSE? No

Find more information at: https://sap.github.io/fosstars-rating-core/oss_rules_of_play_rating.html

Update for index consistency in modelling notations

[rl-assigned_teams-5] Violation against OSS Rules of Play

A violation against the OSS Rules of Play has been detected.

Rule ID: rl-assigned_teams-5
Explanation: Does teams have enough members on GitHub? No

Find more information at: https://sap.github.io/fosstars-rating-core/oss_rules_of_play_rating.html

[rl-reuse_tool-4] Violation against OSS Rules of Play

A violation against the OSS Rules of Play has been detected.

Rule ID: rl-reuse_tool-4
Explanation: Is it compliant with REUSE rules? No

Find more information at: https://sap.github.io/fosstars-rating-core/oss_rules_of_play_rating.html

[rl-assigned_teams-2] Violation against OSS Rules of Play

A violation against the OSS Rules of Play has been detected.

Rule ID: rl-assigned_teams-2
Explanation: Does it have an admin team on GitHub? No

Find more information at: https://sap.github.io/fosstars-rating-core/oss_rules_of_play_rating.html

Run analysis with SGX exports

[rl-assigned_teams-4] Violation against OSS Rules of Play

A violation against the OSS Rules of Play has been detected.

Rule ID: rl-assigned_teams-4
Explanation: Does it have a team with push privileges on GitHub? No

Find more information at: https://sap.github.io/fosstars-rating-core/oss_rules_of_play_rating.html

Update script to minimum requirements

– Move jupyter and notebook dependency to venv
– Maybe check for the existence of /usr/bin/clang
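The suggested clang check could be sketched in Python as follows. find_clang is a hypothetical helper. It prefers /usr/bin/clang and otherwise falls back to shutil.which, which searches PATH.

```python
import shutil
from pathlib import Path
from typing import Optional


def find_clang() -> Optional[str]:
    """Return the path of a clang executable, or None if none is found."""
    default = Path("/usr/bin/clang")
    if default.is_file():
        return str(default)
    # Fall back to any clang on PATH (e.g. a Homebrew or conda install).
    return shutil.which("clang")
```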

Forks metrics

Getting error invalid_request_missing_parameter

Hi and thank you for your work!

I am trying to use your code to convert BPMN diagrams to images. However, I run into the following error:

Traceback (most recent call last):
  File "convert_to_bpmn.py", line 41, in <module>
    image_request = gen.generate_image(model_name, model_json, model_namespace)
  File "venv/lib/python3.10/site-packages/sapsam-0.0.1-py3.10.egg/sapsam/ImageGenerator.py", line 117, in generate_image
    return self.generate_representation(name, data, namespace, 'png', deletes)
  File "venv/lib/python3.10/site-packages/sapsam-0.0.1-py3.10.egg/sapsam/ImageGenerator.py", line 91, in generate_representation
    model_id = result['href'].replace('/model/', '')
KeyError: 'href'

When printing the result variable from line 90, I get the following data, including an error message (German for "An error occurred"):

{'requestId': '***deleted***', 'message': 'Ein Fehler ist aufgetreten (invalid_request_missing_parameter)', 'errors': ['invalid_request_missing_parameter']}

I checked the login data and entered a wrong password, which led to a different error. In addition, the authenticator information looked correct when I printed it with the correct login information.

Can you help me? What am I doing wrong?
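A defensive variant of the failing line (model_id = result['href'].replace('/model/', '')) would surface the API's own error fields instead of a bare KeyError. This is a hypothetical helper, extract_model_id, not code from ImageGenerator.py. It makes the failure easier to diagnose but does not reveal which parameter is missing.

```python
def extract_model_id(result: dict) -> str:
    """Return the model id from a model-creation response, or fail loudly.

    Same extraction as result['href'].replace('/model/', ''), but when
    'href' is absent the API's 'message' and 'errors' fields are reported.
    """
    if "href" not in result:
        message = result.get("message", "no message")
        errors = ", ".join(result.get("errors", [])) or "none listed"
        raise RuntimeError(f"Signavio request failed: {message} (errors: {errors})")
    return result["href"].replace("/model/", "")
```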

[rl-assigned_teams-1] Violation against OSS Rules of Play

A violation against the OSS Rules of Play has been detected.

Rule ID: rl-assigned_teams-1
Explanation: Does it have enough teams on GitHub? No

Find more information at: https://sap.github.io/fosstars-rating-core/oss_rules_of_play_rating.html

Advanced filtering options

Business object links metrics

Rate limiting for Imagegenerator

I am encountering a technical issue while attempting to convert SAP Signavio Academic models into event logs (XES) using the sap-sam Image Generator module.

I have been following the process outlined in the python notebooks (https://github.com/signavio/sap-sam/blob/main/notebooks/3_images_and_XMLs.ipynb) and utilizing the sap-sam Image Generator module (https://github.com/signavio/sap-sam/blob/main/src/sapsam/ImageGenerator.py) to convert the models. I have successfully converted approximately 50 BPMN JSON files from the CSV format to .bpmn files.
However, I am now facing an issue with the generate_xml method within the Image Generator module. After successfully converting the initial set of files, the method suddenly stops returning any output, and the conversion process halts without generating any errors or indications of failure. I have reviewed the logs and examined my code to ensure there are no obvious errors or misconfigurations.

Despite my efforts, I have not been able to identify the root cause of this issue. To provide more context, I am utilizing the SAP Signavio Academic models available at this link: https://zenodo.org/record/7012043.
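If the stalls come from API rate limiting, as the issue title suggests, one common mitigation is to wrap each call in a retry loop with exponential backoff. The following is a minimal, hypothetical sketch. call_with_backoff treats both exceptions and empty results as failures, because the issue reports empty output rather than errors. It cannot help if a call hangs indefinitely; that would need a timeout at the HTTP level.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def call_with_backoff(
    fn: Callable[[], T],
    retries: int = 5,
    base_delay: float = 1.0,
    is_ok: Callable[[T], bool] = bool,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn() until is_ok(result) holds, pausing base_delay * 2**k
    seconds after the k-th failed attempt (k = 0, 1, ...).

    Exceptions and results rejected by is_ok (by default: empty or None)
    both count as failures. After retries + 1 failed attempts, a
    RuntimeError is raised, chained to the last underlying error.
    """
    last_error: Optional[BaseException] = None
    for attempt in range(retries + 1):
        try:
            result = fn()
            if is_ok(result):
                return result
            last_error = ValueError(f"unusable result on attempt {attempt + 1}: {result!r}")
        except Exception as exc:  # e.g. HTTP or JSON errors from the API call
            last_error = exc
        if attempt < retries:
            sleep(base_delay * 2 ** attempt)
    raise RuntimeError(f"gave up after {retries + 1} attempts") from last_error
```

In a conversion loop, the generate_xml call would be wrapped in a zero-argument lambda and passed as fn. A fixed pause between consecutive models is a simpler alternative to backoff.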

Numpy and Matplotlib updates

Dependency Dashboard

This issue lists Renovate updates and detected dependencies. Read the Dependency Dashboard docs to learn more.

Open

These updates have all been created already. Click a checkbox below to force a retry/rebase of any.

Update dependency ipykernel to v6.29.5
Update dependency matplotlib to v3.9.1
Update dependency thinc to v8.2.5
Update dependency pillow to v10.4.0
Update dependency pydantic to v2.8.2
Update dependency numpy to v2

Ignored or Blocked

These are blocked by an existing closed PR and will not be recreated unless you click a checkbox below.

Update dependency thinc to v9

Detected dependencies

pep621: pyproject.toml

poetry: pyproject.toml
- python ^3.11
- matplotlib ^3.8.4
- pillow ^10.3.0
- pandas ^2.2.1
- numpy ^1.26.4
- toml ^0.10.2
- seaborn ^0.13.2
- wordcloud ^1.9.3
- language-data ^1.2
- tqdm ^4.66.2
- thinc ^8.2.3
- spacy ^3.7.4
- stringcase ^1.2.0
- ipykernel ^6.29.4
- pydantic ^2.6.4
- spacy_langdetect ^0.1.2
- pyarrow ^16.0.0
- jupyter ^1.0.0


Some data-sets seem to have missing header information

When calling parser.parse_model_metadata(), some of the data-sets could not be parsed on my machine, with the error message that the header information was missing. The data-sets that were affected are:

70000.csv
140000.csv
330000.csv
510000.csv
590000.csv
670000.csv
820000.csv

After removing these data-sets from the data folder, the parsing worked fine.
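Instead of removing the affected files by hand, a quick pre-check can list the files whose first row does not look like a header. The following is a minimal sketch using only the standard library. files_missing_header is a hypothetical helper, and EXPECTED_COLUMNS assumes the header strings match the column list under "CSV Format". The sketch only detects such files; it does not repair them.

```python
import csv
from pathlib import Path

# Assumed header strings, taken from the column list under "CSV Format".
EXPECTED_COLUMNS = {"Revision ID", "Model ID", "Organization ID", "Datetime",
                    "Model JSON", "Description", "Name", "Type", "Namespace"}

# Model JSON cells can exceed the default csv field size limit.
csv.field_size_limit(2**31 - 1)


def files_missing_header(models_dir: str) -> list[str]:
    """Return names of CSV files whose first row lacks the expected columns."""
    flagged = []
    for path in sorted(Path(models_dir).glob("*.csv")):
        with open(path, newline="", encoding="utf-8") as f:
            first_row = next(csv.reader(f), [])  # [] for an empty file
        if not EXPECTED_COLUMNS.issubset(first_row):
            flagged.append(path.name)
    return flagged
```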

[rl-assigned_teams-3] Violation against OSS Rules of Play

A violation against the OSS Rules of Play has been detected.

Rule ID: rl-assigned_teams-3
Explanation: Does it have enough admins on GitHub? No

Find more information at: https://sap.github.io/fosstars-rating-core/oss_rules_of_play_rating.html

change the default "setup folder" to "my documents" (instead of shared documents)

In ImageGenerator.py, the function "setup_folder" currently ensures that a folder "SAP-SAM" exists in the workspace under "shared documents". When other functions such as "generate_xml" are then called, everyone with access to the workspace receives (potentially thousands of) notifications such as "model x created, model y deleted". Maybe this can be changed to set up the folder under "my documents" by default. I will try this and create a pull request if applicable.
