
Regal

Royally reified regular expressions

Regal provides a syntax for writing regular expressions using plain Clojure data: vectors, keywords, strings. This is known as Regal notation.

Once you have a Regal form you can either compile it to a regex object (java.util.regex.Pattern or JavaScript RegExp), or you can use it to create a Generator (see test.check) for generating values that conform to the given pattern.

It is also possible to parse regular expression patterns back to Regal forms.

Regal is Clojure and ClojureScript compatible, and has fixed semantics across platforms. Write your forms once and run them anywhere! It also allows manipulating multiple regex flavors regardless of the current platform, so you can do things like converting a JavaScript regex pattern to one that is suitable for Java's regex engine.

Support Lambda Island Open Source

If you find value in our work, please consider becoming a backer on Open Collective.

An example

(require '[lambdaisland.regal :as regal]
         '[lambdaisland.regal.generator :as regal-gen])

;; Regal expression, like Hiccup but for Regex
(def r [:cat
        [:+ [:class [\a \z]]]
        "="
        [:+ [:not \=]]])

;; Convert to host-specific regex
(regal/regex r)
;;=> #"[a-z]+\Q=\E[^=]+"

;; Match strings
(re-matches (regal/regex r) "foo=bar")
;;=> "foo=bar"

;; ... And generate them
(regal-gen/gen r)
;;=> #clojure.test.check.generators.Generator{...}

(regal-gen/sample r)
;;=> ("t=�" "d=5Ë" "zja=·" "uatt=ß¾" "lqyk=É" "xkj=q\f��" "gxupw=æ" "pkadbgmc=¯²" "f=Ã�J" "d=ç")
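regal-gen/gen returns an ordinary test.check generator (note the printed #clojure.test.check.generators.Generator above), so test.check's own functions work on it. The sketch below assumes org.clojure/test.check is on the classpath alongside Regal. It draws a fixed number of samples and checks that each one is matched by the regex compiled from the same form. The output values are random, so only the property is checked.

```clojure
(require '[lambdaisland.regal :as regal]
         '[lambdaisland.regal.generator :as regal-gen]
         '[clojure.test.check.generators :as tgen])

;; Same form as in the example above
(def r [:cat
        [:+ [:class [\a \z]]]
        "="
        [:+ [:not \=]]])

;; Use test.check's own sample function on the Regal-built generator
(def samples (tgen/sample (regal-gen/gen r) 20))

;; Every generated string should be matched by the compiled regex
(every? #(re-matches (regal/regex r) %) samples)
```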

A Swiss Army knife

Regal can convert between three different representations of regular expressions: Regal forms, patterns (i.e. strings), and regex objects. Here is an overview of how to get from one to the other:

From ↓ / To →  Form                                    Pattern                           Regex
Form           identity                                lambdaisland.regal/pattern        lambdaisland.regal/regex
Pattern        lambdaisland.regal.parse/parse-pattern  identity                          lambdaisland.regal/compile
Regex          lambdaisland.regal.parse/parse          lambdaisland.regal/regex-pattern  identity
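As a sketch of how these conversions fit together, assuming Regal is on the classpath and the code runs on the JVM, the example below walks one form through each representation using the functions from the table. The parsed form is not expected to be written exactly like the original. The sketch only checks that it still matches the same input.

```clojure
(require '[lambdaisland.regal :as regal]
         '[lambdaisland.regal.parse :as regal-parse])

(def form [:cat [:+ [:class [\a \z]]] "=" [:+ [:not \=]]])

;; Form -> Pattern (a string) and Form -> Regex (a host regex object)
(def p  (regal/pattern form))
(def rx (regal/regex form))

;; Regex -> Pattern: should give back the same string on the current platform
(= p (regal/regex-pattern rx))

;; Pattern -> Regex
(re-matches (regal/compile p) "foo=bar")

;; Pattern -> Form: the parsed form may differ in shape from `form`,
;; but it should match the same strings
(def form* (regal-parse/parse-pattern p))
(re-matches (regal/regex form*) "foo=bar")
```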

Regal forms

Forms consist of vectors, keywords, strings, character literals, and in some cases integers. For example:

[:cat [:alt [:char 11] [:char 13]] \J [:repeat "hello" 2 3]]
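Forms are plain data, so you can build them with ordinary Clojure functions instead of concatenating regex strings. The sketch below assumes Regal is on the classpath. `comma-separated` is a hypothetical helper written for this example and is not part of Regal.

```clojure
(require '[lambdaisland.regal :as regal])

;; Hypothetical helper: one or more `field`s separated by commas
(defn comma-separated [field]
  [:cat field [:* [:cat "," field]]])

(def digits [:+ [:class [\0 \9]]])

(def rx (regal/regex (comma-separated digits)))

(re-matches rx "1,22,333") ;;=> "1,22,333"
(re-matches rx "1,,2")     ;;=> nil
```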

Forms have platform-independent semantics. The same Regal form will match the same strings in both Clojure and ClojureScript, even though Java and JavaScript (and even different versions of Java or JavaScript) have different regex "flavors". In other words, we generate the regex that is right for the target platform.

;; Clojure
(regal/regex :vertical-whitespace) ;;=> #"\v"

;; ClojureScript
(regal/regex :vertical-whitespace) ;;=> #"[\n\x0B\f\r\x85\u2028\u2029]"

Regal currently knows about three "flavors":

  • :java8 Java 1.8 (earlier versions are not supported)
  • :java9 Java 9 or later
  • :ecma ECMAScript (JavaScript)

By default it uses the flavor best suited to the current platform, but you can override that with lambdaisland.regal/with-flavor:

(regal/with-flavor :ecma
  (regal/pattern ...))

Note that using regal/regex with a flavor that does not correspond to the flavor of the platform may yield unexpected results. When dealing with "foreign" regex flavors, always stick to string representations (i.e. patterns).
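As a minimal sketch, assuming Regal is on the classpath and the code runs on the JVM, the example below renders the same form for two flavors with regal/pattern. Both results are plain strings, so no foreign regex object is ever compiled. The exact strings depend on the flavor and are not shown. Only their type and the fact that they differ are checked.

```clojure
(require '[lambdaisland.regal :as regal])

;; Render one form for two different flavors; both are plain strings
(def java-pattern (regal/with-flavor :java8 (regal/pattern :vertical-whitespace)))
(def ecma-pattern (regal/with-flavor :ecma  (regal/pattern :vertical-whitespace)))

(string? ecma-pattern)        ;;=> true
(= java-pattern ecma-pattern) ;;=> false
```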

Pattern

The second regex representation Regal knows about is the pattern, i.e. the regex pattern in string form.

(regal/regex-pattern #"\u000B\v") ;; => "\\u000B\\v"

There are several reasons why you might prefer this pattern representation over a compiled regex object:

  • patterns are simple strings, so they are easy to (de-)serialize
  • they have value semantics, so two patterns can be compared with =
  • they let you manipulate patterns written for regex flavors other than the one supported by the current runtime
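The value-semantics and serialization points can be checked directly at a Clojure (JVM) REPL. This sketch assumes Regal is on the classpath. Regex objects compare by identity, while the pattern strings returned by regal/regex-pattern compare by value and survive a round trip through the reader.

```clojure
(require '[lambdaisland.regal :as regal])

;; Two identical regex literals are distinct objects, so they are not equal
(= #"a+" #"a+") ;;=> false

;; The corresponding patterns are strings and compare by value
(= (regal/regex-pattern #"a+")
   (regal/regex-pattern #"a+")) ;;=> true

;; Strings round-trip through pr-str / read-string, e.g. for EDN storage
(= "\\d+" (read-string (pr-str "\\d+"))) ;;=> true
```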

Note that in Clojure the syntax available in regex literals differs from the syntax available in strings, in particular when it comes to notations starting with a backslash. For example, #"\xFF" is a valid regex, while "\xFF" is not a valid string. We encode regex patterns as strings, which in practice means that backslashes are escaped (doubled).

(regal/regex-pattern #"\xFF") ;;=> "\\xFF"
(regal/compile "\\xFF")       ;;=> #"\xFF"
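To show what the doubled backslash means in practice, here is a short sketch assuming Regal is on the classpath and the code runs on the JVM. The pattern string for #"\xFF" has four characters (\, x, F, F), and compiling it again gives a regex that matches the single character U+00FF.

```clojure
(require '[lambdaisland.regal :as regal])

;; The escaped pattern string is backslash, x, F, F
(count (regal/regex-pattern #"\xFF")) ;;=> 4

;; Compiling the string back gives a regex matching U+00FF (ÿ)
(re-matches (regal/compile "\\xFF") "\u00FF") ;;=> "ÿ"
```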

Regex

To use the regex engine provided by the runtime (e.g. through re-find or re-seq) you need a platform-specific regex object. This is what lambdaisland.regal/regex gives you.
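The object returned by regal/regex works with the standard Clojure regex functions. This sketch assumes Regal is on the classpath. Each :capture group appears as an extra element in the vector returned by re-find.

```clojure
(require '[lambdaisland.regal :as regal])

;; All runs of digits in a string
(re-seq (regal/regex [:+ [:class [\0 \9]]]) "a1b22c333")
;;=> ("1" "22" "333")

;; Capturing groups appear after the full match
(re-find (regal/regex [:cat
                       [:capture [:+ [:class [\a \z]]]]
                       "="
                       [:capture [:+ [:not \=]]]])
         "foo=bar")
;;=> ["foo=bar" "foo" "bar"]
```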

Grammar

  • Strings and characters match literally. They are escaped, so . matches a period, not any character, ^ matches a caret, etc.
  • A few keywords have special meaning. These are :any (match any character, like .), :start (match the start of the input), :end (match the end of the input).
  • All other forms are vectors, with the first element being a keyword:
    • [:cat forms...] : concatenation, match the given Regal expressions in order
    • [:alt forms...] : alternatives, match one of the given options, like (foo|bar|baz)
    • [:* form] : match the given form zero or more times
    • [:+ form] : match the given form one or more times
    • [:? form] : match the given form zero or one time
    • [:class entries...] : match any of the given characters or ranges, with ranges given as two element vectors. E.g. [:class [\a \z] [\A \Z] "_" "-"] is equivalent to [a-zA-Z_-]
    • [:not entries...] : like :class, but negates the result, equivalent to [^...]
    • [:repeat form min max] : repeat a form a number of times, like {2,5}
    • [:capture forms...] : capturing group with implicit concatenation of the given forms
  • A clojure.spec.alpha definition of the grammar can be made available as :lambdaisland.regal/form by explicitly requiring lambdaisland.regal.spec-alpha
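Here is a small worked example that combines several of the grammar elements above. It assumes Regal is on the classpath. The hex-color and URL-scheme forms are illustrative only. Because of :start, re-find does not match the hex color when other text precedes it.

```clojure
(require '[lambdaisland.regal :as regal])

;; A hex color such as "#1a2B3c", anchored at both ends
(def hex-color
  [:cat :start "#" [:repeat [:class [\0 \9] [\a \f] [\A \F]] 6 6] :end])

(re-find (regal/regex hex-color) "#1a2B3c")  ;;=> "#1a2B3c"
(re-find (regal/regex hex-color) "x#1a2B3c") ;;=> nil

;; :alt and :? for alternatives and optional parts
(def scheme [:cat [:alt "http" "ftp"] [:? "s"] "://"])

(re-matches (regal/regex scheme) "https://") ;;=> "https://"
(re-matches (regal/regex scheme) "ftp://")   ;;=> "ftp://"
```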

Use with spec.alpha

(require '[lambdaisland.regal.spec-alpha :as regal-spec]
         '[clojure.spec.alpha :as s]
         '[clojure.spec.gen.alpha :as gen])

(s/def ::x-then-y (regal-spec/spec [:cat [:+ "x"] "-" [:+ "y"]]))

(s/def ::xy-with-stars (regal-spec/spec [:cat "*" ::x-then-y "*"]))

(s/valid? ::xy-with-stars "*xxx-yy*")
;; => true

(gen/sample (s/gen ::xy-with-stars))
;; => ("*x-y*"
;;     "*xx-y*"
;;     "*x-y*"
;;     "*xxxx-y*"
;;     "*xxx-yyyy*"
;;     "*xxxx-yyy*"
;;     "*xxxxxxx-yyyyy*"
;;     "*xx-yyy*"
;;     "*xxxxx-y*"
;;     "*xxx-yyyy*")

BYO test.check / spec-alpha

Regal does not declare any dependencies. This way, people who only want to use Regal expressions in place of ordinary regexes can require lambdaisland.regal without having extra dependencies imposed on them.

If you want to use lambdaisland.regal.generator, you will need org.clojure/test.check. For lambdaisland.regal.spec-alpha you will additionally need org.clojure/spec.alpha.

Contributing

Everyone has a right to submit patches to this project, and thus become a contributor.

Contributors MUST

  • adhere to the LambdaIsland Clojure Style Guide
  • write patches that solve a problem. Start by stating the problem, then supply a minimal solution. *
  • agree to license their contributions as MPLv2.
  • not break the contract with downstream consumers. **
  • not break the tests.

Contributors SHOULD

  • update the CHANGELOG and README.
  • add tests for new functionality.

If you submit a pull request that adheres to these rules, then it will almost certainly be merged immediately. However some things may require more consideration. If you add new dependencies, or significantly increase the API surface, then we need to decide if these changes are in line with the project's goals. In this case you can start by writing a pitch, and collecting feedback on it.

* This goes for features too, a feature needs to solve a problem. State the problem it solves, then supply a minimal solution.

** As long as this project has not seen a public release (i.e. is not on Clojars) we may still consider making breaking changes, if there is consensus that the changes are justified.

Prior Art

License

Copyright © 2020 Arne Brasseur

Licensed under the terms of the Mozilla Public License 2.0, see LICENSE.
