eslint-plugin-clean-regex

An ESLint plugin for writing better regular expressions.

⚠️ Deprecated ⚠️

This project is deprecated.

Please use eslint-plugin-regexp instead.

What happened?

eslint-plugin-clean-regex and eslint-plugin-regexp have joined forces. We decided to work together on one ESLint plugin for JavaScript regexes. Since maintaining two plugins with similar rules takes too much work, I decided to stop working on eslint-plugin-clean-regex.

As of right now, eslint-plugin-regexp supports all rules of eslint-plugin-clean-regex, along with improvements to those rules and many more useful rules.

Migration

See the migration guide.

About

This is an ESLint plugin to lint JavaScript regular expressions. Its goal is to help both beginners and experts to write better regular expressions by pointing out errors and suggesting improvements.

The plugin offers rules for possible errors, best practices, and coding style in regular expressions.

Right now, this project is still young (and many rules are opinionated). Feel free to open an issue if you think rules are too strict/lax/inflexible. Suggestions and feature requests are welcome as well!

Getting started

You'll need to install ESLint and eslint-plugin-clean-regex:

$ npm i eslint eslint-plugin-clean-regex --save-dev

Note: If you installed ESLint globally (using the -g flag) then you must also install eslint-plugin-clean-regex globally.

Add clean-regex to the plugins section of your .eslintrc configuration file (you can omit the eslint-plugin- prefix) and configure the rules you want:

{
    "plugins": [
        "clean-regex"
    ],
    "rules": {
        "clean-regex/rule-name": 2
    }
}

You can also use the recommended config:

{
    "plugins": [
        "clean-regex"
    ],
    "extends": [
        "plugin:clean-regex/recommended"
    ]
}

The setting of every rule in the recommended config can be found in the table below.

Highlights

Some highlights of how the rules in the recommended config work, individually and together.

Optimize character classes

Before:

- /[0-9]/i
- /[^\s]/
- /[a-fA-F0-9]/i
- /[a-zA-Z0-9_-]/
- /[a-z\d\w]/
- /[\S\d]/
- /[\w\p{ASCII}]/u

After:

- /\d/
- /\S/
- /[a-f0-9]/i
- /[\w-]/
- /\w/
- /\S/
- /\p{ASCII}/u
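
The before/after pairs above can be spot-checked by brute force. The following is a minimal sketch, not part of the plugin: the helper name `sameCharSet` is made up for this example, and it only compares single UTF-16 code units, so it is a sanity check rather than a proof of equivalence.

```javascript
// Hypothetical helper: returns true if both regexes accept exactly the same
// single UTF-16 code units (0x0000 to 0xFFFF).
function sameCharSet(before, after) {
    for (let cu = 0; cu <= 0xffff; cu++) {
        const ch = String.fromCharCode(cu);
        if (before.test(ch) !== after.test(ch)) {
            return false;
        }
    }
    return true;
}

// The "Before" and "After" character classes from the lists above.
const charClassPairs = [
    [/[0-9]/i, /\d/],
    [/[^\s]/, /\S/],
    [/[a-fA-F0-9]/i, /[a-f0-9]/i],
    [/[a-zA-Z0-9_-]/, /[\w-]/],
    [/[a-z\d\w]/, /\w/],
    [/[\S\d]/, /\S/],
    [/[\w\p{ASCII}]/u, /\p{ASCII}/u],
];

const charClassResults = charClassPairs.map(([before, after]) => sameCharSet(before, after));
console.log(charClassResults);
```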

Simplify patterns

Before:

- /(?:\w|\d)+/
- /(?:a|(b)|c|(?:d)|(?:ee)){0,}/
- /(?<!\w)a+(?=$)/mi
- /[\s\S]#[\0-\uFFFF]/ysi
- /\d*\w(?:[a-z_]|\d+)*/im

After:

- /\w+/
- /(?:[acd]|(b)|ee)*/
- /\ba+$/im
- /.#./sy
- /\w+/

Detect non-functional code and potential errors

- /\1(a)/        // `\1` won't work
- /a+b*?/        // `b*?` can be removed
- /(?:\b)?a/     // `(?:\b)?` can be removed
- /[a-z]+|Foo/i  // `Foo` can be removed
- /(?=a?)\w\Ba/  // `(?=a?)` and `\B` always accept and can be removed
- /[*/+-^&|]/    // `+-^` will match everything from \x2B to \x5E including all characters A to Z
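
A few of these can be observed directly in a JavaScript console. The snippet below is only an illustration of the runtime behavior; the variable names are made up for this example and it does not use the plugin itself.

```javascript
// `\1` is evaluated before group 1 has captured anything, so it matches the
// empty string and the pattern behaves like /(a)/.
const earlyBackref = /\1(a)/.exec("a");

// `+-^` is a range from "+" (\x2B) to "^" (\x5E), which includes the digits
// and the letters A to Z.
const obscureRange = new RegExp("[*/+-^&|]");

// A lazy quantifier at the very end always takes its minimum, here zero `b`s.
const lazyEnd = /a+b*?/.exec("aabbb");

console.log(earlyBackref[0], earlyBackref[1]);
console.log(obscureRange.test("A"), obscureRange.test("5"));
console.log(lazyEnd[0]);
```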

Supported Rules

Fixable rules are denoted with a 🔧.

Problems

Rule Description
confusing-quantifier Warn about confusing quantifiers.
disjoint-alternatives Disallow different alternatives that can match the same words.
no-empty-alternative Disallow alternatives without elements.
no-empty-backreference Disallow backreferences that will always be replaced with the empty string.
no-empty-lookaround Disallow lookarounds that can match the empty string.
no-lazy-ends Disallow lazy quantifiers at the end of an expression.
no-obscure-range Disallow obscure ranges in character classes.
no-octal-escape Disallow octal escapes outside of character classes.
no-optional-assertion Disallow optional assertions.
no-potentially-empty-backreference Disallow backreferences that reference a group that might not be matched.
no-unnecessary-assertions Disallow assertions that are known to always accept (or reject).
🔧 no-zero-quantifier Disallow quantifiers with a maximum of 0.
optimal-lookaround-quantifier Disallows the alternatives of lookarounds that end with a non-constant quantifier.

Suggestions

Rule Description
🔧 consistent-match-all-characters Use one character class consistently whenever all characters have to be matched.
🔧 identity-escape How to handle identity escapes.
no-constant-capturing-group Disallow capturing groups that can match only one word.
🔧 no-trivially-nested-lookaround Disallow lookarounds that only contain another assertion.
🔧 no-trivially-nested-quantifier Disallow nested quantifiers that can be rewritten as one quantifier.
🔧 no-unnecessary-character-class Disallow unnecessary character classes.
🔧 no-unnecessary-flag Disallow unnecessary regex flags.
🔧 no-unnecessary-group Disallow unnecessary non-capturing groups.
🔧 no-unnecessary-lazy Disallow unnecessarily lazy quantifiers.
🔧 no-unnecessary-quantifier Disallow unnecessary quantifiers.
🔧 optimal-concatenation-quantifier Use optimal quantifiers for concatenated quantified characters.
🔧 optimized-character-class Disallows unnecessary elements in character classes.
🔧 prefer-character-class Prefer character classes wherever possible instead of alternations.
🔧 prefer-predefined-assertion Prefer predefined assertions over equivalent lookarounds.
🔧 prefer-predefined-character-set Prefer predefined character sets instead of their more verbose form.
🔧 prefer-predefined-quantifiers Prefer predefined quantifiers (+*?) instead of their more verbose form.
🔧 simple-constant-quantifier Prefer simple constant quantifiers over the range form.
🔧 sort-flags Requires the regex flags to be sorted.

eslint-plugin-clean-regex's Issues

Join forces with eslint-plugin-regexp

Description

Thanks for your great work on this plugin!

I was using eslint-plugin-regexp for a while, and now I stumbled upon eslint-plugin-clean-regex. I can see that many rules are very similar in these two plugins.

@RunDevelopment and @ota-meshi, how do you feel about joining forces with each other by merging eslint-plugin-regexp and eslint-plugin-clean-regex into a single project?

Detect alternations which are prefix of another

Top-level alternations like int|integer are problematic because the word integer will never be matched.

In general: Let A and B be a pair of distinct alternatives of the same alternation where A comes before B.
If there exist a word w in L(A) and a word x in L(B) \ L(A) such that w is a prefix of x, then x can never be matched by B.
In other words: Report if [ L(B) \ L(A) ] ∩ L(/A[\s\S]*/) != ∅.

Examples:

/a|b/ //ok
/a|aa/ // report that `aa` cannot be matched
/a\b|aa/ // ok
/a|[ab]a/ // report that `aa` cannot be matched
/(?:a|[ab]a)\b/ // ok
/a(a|aa)a/  // report that `aaaa` cannot be matched
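
To make the problem concrete, here is a small sketch in JavaScript. The function `findShadowedLiterals` is hypothetical and covers only the simplest case: alternatives that are plain literal strings at the top level of the pattern. It ignores everything that follows the alternation, so it would wrongly report cases like /(?:a|[ab]a)\b/ if applied there; the general check described above needs the languages of the alternatives.

```javascript
// The first alternative that matches wins, so `integer` is never reached.
const shadowedMatch = /int|integer/.exec("integer")[0];

// An assertion after the shorter alternative makes the longer one reachable.
const reachableMatch = /a\b|aa/.exec("aa")[0];

// Hypothetical, literal-only simplification of the proposed check:
// report every alternative that has an earlier, different alternative as a prefix.
function findShadowedLiterals(alternatives) {
    const reports = [];
    for (let j = 1; j < alternatives.length; j++) {
        for (let i = 0; i < j; i++) {
            const earlier = alternatives[i];
            const later = alternatives[j];
            if (later !== earlier && later.startsWith(earlier)) {
                reports.push({ shadowed: later, by: earlier });
                break;
            }
        }
    }
    return reports;
}

console.log(shadowedMatch, reachableMatch);
console.log(findShadowedLiterals(["int", "integer"]));
console.log(findShadowedLiterals(["a", "b"]));
```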

no-unnecessary-lazy: Improve rule

Right now, the rule only reports lazy constant quantifiers but it can do a lot more:

If the character that follows the lazy quantifier cannot be the start of the lazily quantified element, the lazy modifier can be removed.

Example:

/ab+?c/
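
As a quick sanity check of the example, the snippet below compares the lazy and the greedy variant on a handful of inputs. The sample strings and variable names are chosen just for this illustration; identical results on samples are evidence, not a proof.

```javascript
const lazyVariant = /ab+?c/;
const greedyVariant = /ab+c/;

// Inputs chosen for illustration only.
const lazySamples = ["abc", "abbbc", "ac", "abbb", "xabbcabc"];

// `c` can never be matched by `b`, so `b+?` has to consume every `b` anyway
// and both variants find the same match at the same position.
const lazyAndGreedyAgree = lazySamples.every((input) => {
    const a = lazyVariant.exec(input);
    const b = greedyVariant.exec(input);
    if (a === null || b === null) {
        return a === b;
    }
    return a[0] === b[0] && a.index === b.index;
});

console.log(lazyAndGreedyAgree);
```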

Upgrade to new ESLint major

This project currently uses ESLint 3 which isn't the latest major version of ESLint. Looking at the releases, it's probably the best move to wait until v6.0 is released.

Add a goal section to the readme

When I first started this project, it was just a collection of simple rules I came up with to make reviewing regexes easier. But as the project grew and rules became more numerous, my focus started to shift. While the vague promise of "writing better regular expressions" may have been a good umbrella for what I've been doing until now, it doesn't describe the function and goals of the project at all.

Task:

  • Describe in a few sentences in plain English what the purpose of this plugin is and how it tries to achieve its goals.
  • List non-goals and make it clear that this plugin is opinionated.
  • Be brief!

The section will be placed directly below the slogan, so people don't have to search for this information.

`no-trivially-nested-lookaround` can change the patterns

Description

I just learned that capturing groups inside negated lookarounds behave interestingly. The captured text of a capturing group is reset after leaving a negated lookaround.

This means that the \1 in /(?!(a))\w\1/ is useless. However, the \1 in /(?!(?!(a)))\w\1/ (double negation) is useless too.

This means that (?!(?!R)) for some regex R is equivalent to (?=R) if and only if R does not contain capturing groups.

The no-trivially-nested-lookaround rule does not account for this right now.
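
The difference is observable at runtime. The following snippet is a small illustration (variable names are made up) of how the positive lookahead keeps the capture while the double negation resets it:

```javascript
// Positive lookahead: group 1 keeps its capture, so `\1` must match another "a".
const positiveLookahead = /(?=(a))\w\1/.exec("aa");

// Double negation: the capture is reset when the negated lookaround is left,
// so `\1` refers to an unset group and matches the empty string.
const doubleNegation = /(?!(?!(a)))\w\1/.exec("aa");

console.log(positiveLookahead[0], positiveLookahead[1]);
console.log(doubleNegation[0], doubleNegation[1]);
```

So rewriting the second pattern into the first would change both the matched text and the value of the capturing group.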

prefer-character-class: Reorder alternatives if it's safe

If the only thing preventing two one-character alternatives from being merged is an alternative that cannot start with either character (from the one-character alternatives), then it's safe to reorder the alternatives and merge the one-character alternatives.

Example:
(?:a|foo|b) == (?:a|b|foo) == (?:[ab]|foo)

I already added a util method that can detect the first character of an alternative, so this should be easy to implement.

Prefer standard character sets

/[0-9]/ // replace with \d
/[a-zA-Z_\d]/ // replace with \w
/[a-zA-Z_\d-]/ // replace with [\w-]
/[a-z_\d]/i // replace with \w

This rule should only affect \d\D\w\W.

Detect unnecessary assertions

Some assertions are plain unnecessary, either because they are always true or always false.
Examples:

/a\ba/ // always false
/a\Ba/ // always true
/foo^/ // note that there is no m flag
/$bar/
/foo(?!x)\s+bar/
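
The examples can be checked in plain JavaScript. This is only an illustration on a few hand-picked inputs; the sample strings and variable names are not from the original issue.

```javascript
const assertionSamples = ["aa", "a a", "aba", "baab", "foo", "foo\nbar", "bar", "foo  bar", "foox bar"];

// /a\ba/ can never match: there is no word boundary between two word characters.
const boundaryNeverMatches = assertionSamples.every((s) => !/a\ba/.test(s));

// /a\Ba/ accepts exactly what /aa/ accepts.
const nonBoundaryIsNoop = assertionSamples.every((s) => /a\Ba/.test(s) === /aa/.test(s));

// Without the m flag, `^` after "foo" and `$` before "bar" always reject.
const anchorsNeverMatch = assertionSamples.every((s) => !/foo^/.test(s) && !/$bar/.test(s));

// `(?!x)` always accepts because the next character has to be whitespace.
const lookaheadIsNoop = assertionSamples.every(
    (s) => /foo(?!x)\s+bar/.test(s) === /foo\s+bar/.test(s)
);

console.log(boundaryNeverMatches, nonBoundaryIsNoop, anchorsNeverMatch, lookaheadIsNoop);
```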

Detect quadratic patterns

Some seemingly innocent patterns can have a run time of O(n^2). This can be a vulnerability as pointed out here and further explained here.

"Even extremely simple regexes like /a+b/ show this O(n^2) behavior for inputs like 'a'*n." ('a'*n means n-many a characters.)

The purpose of this rule is to detect these patterns.

From what I've seen, the general rule seems to be: If there exists some set of paths AB*C in the regex R such that x = (L(A) ∩ L(B*)) \ ({ε} ∪ L(C)) is not the empty set, then R will take Ω(n^2) many steps to reject a word w ∈ x^n \ L(R).

Please note the Omega in the time complexity bound. This is not a typo. The backtracking algorithm might actually take more than O(n) steps to reject a suffix of the input string.
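
To illustrate the quoted /a+b/ example, here is a toy model of a backtracking matcher that counts character comparisons. It is a sketch under simplifying assumptions: the function `countStepsAPlusB` is hypothetical, hard-codes the pattern /a+b/, and does not reflect how any real regex engine is implemented or optimized.

```javascript
// Toy backtracking matcher for /a+b/ that counts character comparisons.
function countStepsAPlusB(input) {
    let steps = 0;
    // Like an unanchored regex search, try every start position.
    for (let start = 0; start < input.length; start++) {
        // a+ : greedily consume as many "a"s as possible.
        let end = start;
        while (end < input.length) {
            steps++;
            if (input[end] !== "a") break;
            end++;
        }
        // Backtrack: for every length of the a+ match (longest first), try "b".
        for (let pos = end; pos > start; pos--) {
            steps++;
            if (input[pos] === "b") {
                return { matched: true, steps };
            }
        }
    }
    return { matched: false, steps };
}

// Doubling the input length roughly quadruples the number of steps.
for (const n of [100, 200, 400]) {
    console.log(n, countStepsAPlusB("a".repeat(n)).steps);
}
```

In this model, rejecting n-many a characters takes n(n+1) comparisons, because each of the n start positions scans to the end of the input and then backtracks over everything it consumed.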

Consistent match all character class

There are multiple ways to express a character class which accepts all characters but you should choose one and stick with it to make your regexes easier to understand.

Examples:

/[\s\S]/, /[\d\D]/, /[\w\W]/, /[^]/

Notes:
If present, we could even take advantage of the s flag.

This should also detect alternations which are equal to the set of all characters. Examples:

/(?:\s|\S)/, /\s|\S/, /(?:.|\s)/, /.|\D/
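
That all of these variants accept every character can be verified by brute force over single UTF-16 code units. The sketch below uses a made-up helper name, `acceptsEveryCodeUnit`, and also includes the dot with the s flag for comparison:

```javascript
// The character classes and alternations listed above, plus `.` with the s flag.
const matchAllVariants = [
    /[\s\S]/, /[\d\D]/, /[\w\W]/, /[^]/, /./s,
    /(?:\s|\S)/, /\s|\S/, /(?:.|\s)/, /.|\D/,
];

// Hypothetical helper: true if the regex matches every single code unit.
function acceptsEveryCodeUnit(regex) {
    for (let cu = 0; cu <= 0xffff; cu++) {
        if (!regex.test(String.fromCharCode(cu))) {
            return false;
        }
    }
    return true;
}

const allVariantsMatchAll = matchAllVariants.every(acceptsEveryCodeUnit);
console.log(allVariantsMatchAll);

// For contrast: the dot without the s flag rejects line terminators.
console.log(acceptsEveryCodeUnit(/./));
```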

Detect non-disjoint alternatives

To prevent exponential backtracking, the alternatives of a quantified group have to be disjoint (aside from the empty string and assertions).

For a RE /(A1|A2|...|An)*/, its alternatives have to be disjoint such that:
∀ Aj: ( L(/(A1|...|Aj-1|Aj+1|...|An)*/) ∩ L(/Aj*/) ) \ {ε} = ∅ where L is a function which returns the language of the given RE and ε is the empty string.

This will be difficult to implement because we need to be able to construct an NFA from the RE and assertions don't make this any easier.

Prefer character classes instead of groups if possible

Example:

/(?:a|b|c)/ => /[abc]/
/(?:\w|-|\+|\*|\/)+/ => /[\w+*/-]+/
/(a|b|c)/ => /([abc])/
/(?:[ab]|c)/ => /[abc]/
/(?:a|b)/ // stay like this

Note:
This rule should only affect groups with a) >= 3 alternatives or b) at least one character class.

Suggest i flag if it simplifies the pattern

If a pattern can be simplified by adding the i flag, it might be a good idea to suggest that to the user.

The conditions for this rule to add the i flag (and subsequently simplify the pattern) are:

  1. The i flag isn't present.
  2. All characters, character classes, and character sets match the same character(s) regardless of the i flag (i.e., the flag doesn't change the meaning of the pattern).
  3. There is at least one character class that can be simplified because of the added i flag. E.g. a character range can be removed, or a character class can be replaced by a character or character set, or similar.

Adding flags willy-nilly might cause some problems, so I don't know whether it should be auto-fixable. Does ESLint have a suggestion mode (aka "you might want to do this but I (ESLint) won't auto-fix it for you")?

Use standard assertions if possible

Standard assertions \b, \B, ^, and $ are efficiently implemented and easy to understand. That's why they should be preferred over lookaround assertions which do the same.

Examples:

/foo(?!\w)/ => /foo\b/
/foo(?!.)/ => /foo$/m
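
Both rewrites can be spot-checked on a few inputs. The snippet below is an illustration only; the sample strings and the helper `agreeOn` are made up for this example.

```javascript
// Hypothetical helper: true if both regexes accept and reject the same samples.
function agreeOn(samples, lookaround, predefined) {
    return samples.every((s) => lookaround.test(s) === predefined.test(s));
}

// "foo" is followed by a word character, a non-word character, or nothing.
const boundarySamples = ["foo", "foo bar", "foobar", "foo_", "foo-bar", "fo"];
const wordBoundaryAgrees = agreeOn(boundarySamples, /foo(?!\w)/, /foo\b/);

// `.` rejects exactly the line terminators that `$` with the m flag stops at.
const lineEndSamples = ["foo", "foo\nbar", "foobar", "foo\r\n", "foo\u2028x", "foo "];
const lineEndAgrees = agreeOn(lineEndSamples, /foo(?!.)/, /foo$/m);

console.log(wordBoundaryAgrees, lineEndAgrees);
```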
