Giter VIP home page Giter VIP logo

paresy's Introduction

PaRESy

This repo contains the code and other artifacts for the paper.

Search-Based Regular Expression Inference on a GPU

by Mojtaba Valizadeh and Martin Berger. The paper is available at

The original code, submitted to PLDI 2023, is available using the tag v0.1, or using commit 6aa87ae.

Introduction

In this work, the goal is to find a precise and minimal regular expression (RE) that accepts a given set of positive strings and rejects a given set of negative ones. To accomplish this, we developed an algorithm called PaRESy, Parallel Regular Expression Synthesiser, which is implemented in two codes: one for CPU using C++, and another for GPU using Cuda C++. By doing this, we could measure the speed-up for the most challenging examples.

In this version of the work, we use a simple grammar for the REs:

R ::= Φ|ε|a|R?|R*|R.R|R+R

Regarding the minimality, we need to use a cost function that maps every constructor in the RE to a positive integer. By summing up the costs of all the constructors, we can obtain the overall cost of the RE. This cost function helps us to avoid overfitting and returning the trivial RE that is the union of the positive strings.

Example:

  • Positive: ["", "010", "1000", "0101", "1010", "0010"]
  • Negative: ["001", "1011", "111"]

Note: In this work, we use "" to represent the epsilon (ε) as the empty string.

We aim to find a regular expression (RE) that accepts all strings in the positive set and rejects all strings in the negative set. However, there could be an infinite number of such REs, and we want to identify the minimal one based on a given cost function. This cost function assigns positive integers to each constructor, including the alphabet, question mark, star, concatenation, and union. Let us assume that the costs of these constructors are given as [c1, c2, c3, c4, c5]. Based on the different costs, we can generate various REs, as follows.

  • [1, 1, 1, 1, 1] ---> (0+101?)*
  • [1, 5, 1, 1, 5] ---> (01)*(1*0)*

Note: In this work, we use the + symbol to represent the union constructor, which is commonly denoted by | in standard libraries.

In this simple example, both REs are precise (i.e., accepts all positive and rejects all negative examples) and minimal w.r.t their own cost functions. We can observe that by increasing the costs of question mark and union in the second cost function, the resulting RE contains fewer instances of these constructors. However, to compensate for this, the regular expression tends to use more stars, which are cheaper in this particular cost function.

Colab Notebook

This work is presented on a Google Colab notebook platform, which automatically clones this GitHub repository and comes fully equipped with all necessary dependencies and libraries, eliminating the need for additional installations. You can run the scripts by clicking on the designated buttons and adjusting inputs as needed.

To access the notebook, please use the link below and follow the instructions provided.

paresy's People

Contributors

mojtabavalizadeh avatar martinberger avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.