Presidio Action

GitHub Action that analyzes text for PII entities with Microsoft's Presidio framework.

Author

Insights Engineering

Inputs

  • path:

    Description: Path to verify

    Required: false

    Default: "."

  • configuration-file:

    Description: Path to custom configuration file

    Required: false

    Default: "default"

  • configuration-data:

    Description: Configuration data as an inline YAML configuration

    Required: false

    Default: ""

  • output:

    Description: Format of output

    Required: false

    Default: "auto"

  • publish:

    Description: Publish result as a PR comment

    Required: false

    Default: "true"

  • upload:

    Description: Upload results as an artifact

    Required: false

    Default: "true"

  • presidio-cli-version:

    Description: Presidio CLI version

    Required: false

    Default: "latest"

  • lang-models:

    Description: List of additional language models to install

    Required: false

    Default: ""

  • only-changed-files:

    Description: Only run checks for changed files

    Required: false

    Default: false
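The usage example further down notes that any value in configuration-data blocks the use of configuration-file, and that configuration-file accepts either a path or one of the predefined names default and limited. A minimal Python sketch of that precedence follows. The function resolve_configuration and the PREDEFINED table are hypothetical names used for illustration only; this is not the action's actual implementation.

```python
from typing import Tuple

# Predefined names taken from the usage example; the paths are relative
# to the action repository.
PREDEFINED = {
    "default": "conf/default.yaml",
    "limited": "conf/limited.yaml",
}


def resolve_configuration(configuration_file: str = "default",
                          configuration_data: str = "") -> Tuple[str, str]:
    """Return ("inline", yaml_text) or ("file", path).

    Hypothetical helper that mirrors the documented precedence.
    """
    if configuration_data:
        # Any inline value takes priority over configuration-file.
        return ("inline", configuration_data)
    # A predefined name maps to a bundled file; anything else is kept as a path.
    return ("file", PREDEFINED.get(configuration_file, configuration_file))


print(resolve_configuration())           # ('file', 'conf/default.yaml')
print(resolve_configuration("limited"))  # ('file', 'conf/limited.yaml')
```

With both inputs set, as in the usage example, the sketch returns the inline YAML and ignores the file path.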

Outputs

The format of the results depends on the output parameter.

The default format is auto.

Available formats:

  • standard - standard output format
tests/conftest.py
  34:58     0.85     PERSON
  37:33     0.85     PERSON
  • github - GitHub Actions workflow commands, grouped per file; similar to the diff function in GitHub
::group::tests/conftest.py
::0.85 file=tests/conftest.py,line=34,col=58::34:58 [PERSON]
::0.85 file=tests/conftest.py,line=37,col=33::37:33 [PERSON]
::endgroup::
  • colored - standard output format but with colors

  • parsable - easy to parse automatically

{"entity_type": "PERSON", "start": 57, "end": 62, "score": 0.85, "analysis_explanation": null}
{"entity_type": "PERSON", "start": 32, "end": 37, "score": 0.85, "analysis_explanation": null}
  • auto - default format; switches automatically between two modes:
    • github, if run on GitHub, that is, when the environment variables GITHUB_ACTIONS and GITHUB_WORKFLOW are set
    • colored, otherwise
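The auto rule above can be sketched in a few lines of Python. The function name resolve_output_format is hypothetical, and treating "set" as "present in the environment" is an assumption; the action's real check may differ.

```python
import os
from typing import Mapping, Optional


def resolve_output_format(requested: str = "auto",
                          env: Optional[Mapping[str, str]] = None) -> str:
    """Pick the effective output format (hypothetical helper)."""
    env = os.environ if env is None else env
    if requested != "auto":
        # An explicit format is used as given.
        return requested
    # Assumption: "set" means the variable is present in the environment.
    on_github = "GITHUB_ACTIONS" in env and "GITHUB_WORKFLOW" in env
    return "github" if on_github else "colored"


print(resolve_output_format("auto", {"GITHUB_ACTIONS": "true",
                                     "GITHUB_WORKFLOW": "Presidio check"}))  # github
print(resolve_output_format("auto", {}))                                     # colored
```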

How it works

Presidio Action uses presidio-cli, which is based on presidio-analyzer from the Microsoft Presidio framework, to check an application's code for undesirable types of data such as 'EMAIL_ADDRESS' or 'PHONE_NUMBER'.

For more information, please see the full list of supported entities.
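To make the idea concrete, here is a toy Python sketch of what checking code for an entity type means. It does not use Presidio: a deliberately simple regular expression stands in for a real recognizer, and the function scan_text, the fixed score, and the 1-based column are assumptions made for illustration. The printed layout only imitates the standard format shown above.

```python
import re
from typing import Iterator, Tuple

# Deliberately simple pattern, for illustration only; Presidio's own
# recognizers are more sophisticated than this.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def scan_text(text: str, score: float = 1.0) -> Iterator[Tuple[int, int, float, str]]:
    """Yield (line, column, score, entity_type) with 1-based line and column."""
    for line_no, line in enumerate(text.splitlines(), start=1):
        for match in EMAIL_RE.finditer(line):
            yield (line_no, match.start() + 1, score, "EMAIL_ADDRESS")


source = 'ADMIN = "jane.doe@example.com"\nprint("hello")\n'
for line_no, col, score, entity in scan_text(source):
    print(f"  {line_no}:{col}     {score:.2f}     {entity}")
```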

Usage

Example usage:

---
name: Presidio check

on:
  push:
    branches:
      - main
  pull_request:
    branches:
      - main

jobs:
  presidio-action:
    runs-on: ubuntu-latest
    name: Presidio check

    steps:
      - name: Checkout Code
        uses: actions/checkout@v2
        with:
          # fetch-depth: 0 is needed if you set `only-changed-files` to true
          # and configure this check to run on push events
          fetch-depth: 0

      - name: Produce the presidio report
        uses: insightsengineering/presidio-action@v1
        # all parameters below are optional
        with:
          # path - path to the project.
          # if the project does not live in a specific folder such as 'my-project',
          # the default value '.' (the current folder) is used
          path: "my-project"
          # configuration-file - path to a file with a custom configuration,
          # or the name of one of the predefined files:
          #   - default - `conf/default.yaml` file from the action repository; checks the default list of entities
          #                and ignores the content of the `.git` folder
          #   - limited - `conf/limited.yaml` file from the action repository; checks only PERSON, EMAIL_ADDRESS and CREDIT_CARD
          #                and ignores the `.git` folder and *.cfg files
          configuration-file: "my-project/conf/my-presidio-config.yaml"
          # configuration-data - content of the configuration in raw YAML format.
          # Lets you prepare your own configuration without adding a file to the project.
          # Any value in this field blocks the use of the configuration file.
          configuration-data: |
            entities:
              - PERSON
            threshold: 0.9
          # output - specify one of output formats
          output: "parsable"
          # only-changed-files - only run the check for files that were changed
          # NOTE: You must set fetch-depth: 0 in the actions/checkout@v2 step
          # for push events while this parameter is set to true
          only-changed-files: true
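
With output set to parsable, as in the workflow above, each finding is one JSON object per line. A minimal Python sketch for post-processing such output follows, using the two sample lines from the Outputs section. The helper names parse_findings and count_by_entity are hypothetical, and skipping lines that do not start with "{" is a defensive assumption, since the full layout of the parsable output is not shown here.

```python
import json
from collections import Counter
from typing import Dict, Iterable, List


def parse_findings(lines: Iterable[str]) -> List[Dict]:
    """Parse one JSON object per line, skipping anything that is not JSON."""
    findings = []
    for line in lines:
        line = line.strip()
        if not line.startswith("{"):
            # Assumption: non-JSON lines (blank lines, file names) are ignored.
            continue
        findings.append(json.loads(line))
    return findings


def count_by_entity(findings: List[Dict], min_score: float = 0.0) -> Counter:
    """Count findings per entity type at or above min_score."""
    return Counter(f["entity_type"] for f in findings if f["score"] >= min_score)


sample = [
    '{"entity_type": "PERSON", "start": 57, "end": 62, "score": 0.85, "analysis_explanation": null}',
    '{"entity_type": "PERSON", "start": 32, "end": 37, "score": 0.85, "analysis_explanation": null}',
]
findings = parse_findings(sample)
print(count_by_entity(findings, min_score=0.8))  # Counter({'PERSON': 2})
```

In the samples, start is one less than the column reported by the standard format (57 versus 58), which suggests 0-based offsets, but treat that as an observation rather than a documented guarantee.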

Example of comment added to the PR:

Screenshot with PR comment example
