rundevelopment / eslint-plugin-clean-regex

An ESLint plugin for writing better regular expressions.

License: MIT License

JavaScript 2.26% TypeScript 97.74%

eslint-plugin regexp

eslint-plugin-clean-regex's Introduction

eslint-plugin-clean-regex

An ESLint plugin for writing better regular expressions.

⚠️ Deprecated ⚠️

This project is deprecated.

Please use eslint-plugin-regexp instead.

What happened?

eslint-plugin-clean-regex and eslint-plugin-regexp have joined forces. We decided to work together on one ESLint plugin for JavaScript regexes. Since maintaining two plugins with similar rules takes too much work, I decided to stop working on eslint-plugin-clean-regex.

As of right now, eslint-plugin-regexp supports all rules of eslint-plugin-clean-regex, along with improvements to those rules and many more useful rules.

Migration

See the migration guide.

About

This is an ESLint plugin to lint JavaScript regular expressions. Its goal is to help both beginners and experts to write better regular expressions by pointing out errors and suggesting improvements.

The plugin offers rules for possible errors, best practices, and coding style in regular expressions.

Right now, this project is still young (and many rules are opinionated). Feel free to open an issue if you think rules are too strict/lax/inflexible. Suggestions and feature requests are welcome as well!

Getting started

You'll need to install ESLint and eslint-plugin-clean-regex:

$ npm i eslint eslint-plugin-clean-regex --save-dev

Note: If you installed ESLint globally (using the -g flag) then you must also install eslint-plugin-clean-regex globally.

Add clean-regex to the plugins section of your .eslintrc configuration file (you can omit the eslint-plugin- prefix) and configure the rules you want:

{
    "plugins": [
        "clean-regex"
    ],
    "rules": {
        "clean-regex/rule-name": 2
    }
}

You can also use the recommended config:

{
    "plugins": [
        "clean-regex"
    ],
    "extends": [
        "plugin:clean-regex/recommended"
    ]
}

The setting of every rule in the recommended config can be found in the table below.

Highlights

Some highlights of how the rules in the recommended config work, on their own and together.

Optimize character classes

Before:

- /[0-9]/i
- /[^\s]/
- /[a-fA-F0-9]/i
- /[a-zA-Z0-9_-]/
- /[a-z\d\w]/
- /[\S\d]/
- /[\w\p{ASCII}]/u

After:

- /\d/
- /\S/
- /[a-f0-9]/i
- /[\w-]/
- /\w/
- /\S/
- /\p{ASCII}/u
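
One way to convince yourself that these rewrites are safe is to compare each before/after pair on every UTF-16 code unit. The following is only a minimal sketch and not part of the plugin; the helper name `sameSingleCharMatches` is made up for illustration.

```javascript
// Hypothetical helper (not part of the plugin): checks that two regexes
// accept exactly the same single UTF-16 code units.
function sameSingleCharMatches(before, after) {
    for (let code = 0; code <= 0xffff; code++) {
        const ch = String.fromCharCode(code);
        if (before.test(ch) !== after.test(ch)) {
            return false;
        }
    }
    return true;
}

// The before/after pairs from the lists above.
const characterClassPairs = [
    [/[0-9]/i, /\d/],
    [/[^\s]/, /\S/],
    [/[a-fA-F0-9]/i, /[a-f0-9]/i],
    [/[a-zA-Z0-9_-]/, /[\w-]/],
    [/[a-z\d\w]/, /\w/],
    [/[\S\d]/, /\S/],
    [/[\w\p{ASCII}]/u, /\p{ASCII}/u],
];

const allCharacterClassPairsAgree = characterClassPairs.every(
    ([before, after]) => sameSingleCharMatches(before, after)
);
console.log(allCharacterClassPairsAgree);
```

The check only covers single code units, which is enough here because every pattern in the list is a single character class.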

Simplify patterns

Before:

- /(?:\w|\d)+/
- /(?:a|(b)|c|(?:d)|(?:ee)){0,}/
- /(?<!\w)a+(?=$)/mi
- /[\s\S]#[\0-\uFFFF]/ysi
- /\d*\w(?:[a-z_]|\d+)*/im

After:

- /\w+/
- /(?:[acd]|(b)|ee)*/
- /\ba+$/im
- /.#./sy
- /\w+/
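
As a rough sanity check, the before/after pairs above can be compared on a handful of sample strings. This is a sketch under assumptions: the helper `sameExecResults` and the sample inputs are invented here, and agreeing on a few samples is evidence, not a proof of equivalence.

```javascript
// Hypothetical helper, not part of the plugin: runs both regexes against
// each sample and compares the matched text, captures, and match index.
function sameExecResults(before, after, samples) {
    return samples.every((sample) => {
        // Sticky (y) regexes start matching at lastIndex, so reset it.
        before.lastIndex = 0;
        after.lastIndex = 0;
        const a = before.exec(sample);
        const b = after.exec(sample);
        if (a === null || b === null) {
            return a === b;
        }
        return a.index === b.index && JSON.stringify(a) === JSON.stringify(b);
    });
}

const simplifyPairs = [
    [/(?:\w|\d)+/, /\w+/],
    [/(?:a|(b)|c|(?:d)|(?:ee)){0,}/, /(?:[acd]|(b)|ee)*/],
    [/(?<!\w)a+(?=$)/mi, /\ba+$/im],
    [/[\s\S]#[\0-\uFFFF]/ysi, /.#./sy],
    [/\d*\w(?:[a-z_]|\d+)*/im, /\w+/],
];

// Arbitrary sample inputs chosen for this sketch.
const simplifySamples = [
    "", "abc", "a1_b2 rest", "abcdeeb!", "xa#b", "a#\n", "baa\nAA", "12ab_3-9",
];

const simplifiedPairsAgree = simplifyPairs.every(
    ([before, after]) => sameExecResults(before, after, simplifySamples)
);
console.log(simplifiedPairsAgree);
```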

Detect non-functional code and potential errors

- /\1(a)/        // `\1` won't work
- /a+b*?/        // `b*?` can be removed
- /(?:\b)?a/     // `(?:\b)?` can be removed
- /[a-z]+|Foo/i  // `Foo` can be removed
- /(?=a?)\w\Ba/  // `(?=a?)` and `\B` always accept and can be removed
- /[*/+-^&|]/    // `+-^` will match everything from \x2B to \x5E including all characters A to Z
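
A few of these cases can be observed directly in a JavaScript console. The snippet below is an illustrative sketch; the named groups `first` and `second` are added here only to show which alternative matched and are not part of the original patterns.

```javascript
// `\1` is evaluated before group 1 has captured anything, so it matches
// the empty string and the regex behaves like /(a)/.
const forwardReference = /\1(a)/.exec("a");
console.log(forwardReference[0]); // "a"

// A lazy quantifier at the very end always takes its minimum (zero here).
const lazyEnd = /a+b*?/.exec("aabbb");
console.log(lazyEnd[0]); // "aa"

// With the i flag, `[a-z]+` already matches "Foo", so the second
// alternative never gets a chance.
const shadowed = /(?<first>[a-z]+)|(?<second>Foo)/i.exec("Foo");
console.log(shadowed.groups.first, shadowed.groups.second); // "Foo" undefined

// `+-^` is a range from \x2B to \x5E, which includes digits and A to Z.
const obscureRange = /[*/+-^&|]/;
console.log(obscureRange.test("A"), obscureRange.test("5")); // true true
```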

Supported Rules

Fixable rules are denoted with a 🔧.

Problems

	Rule	Description
	confusing-quantifier	Warn about confusing quantifiers.
	disjoint-alternatives	Disallow different alternatives that can match the same words.
	no-empty-alternative	Disallow alternatives without elements.
	no-empty-backreference	Disallow backreferences that will always be replaced with the empty string.
	no-empty-lookaround	Disallow lookarounds that can match the empty string.
	no-lazy-ends	Disallow lazy quantifiers at the end of an expression.
	no-obscure-range	Disallow obscure ranges in character classes.
	no-octal-escape	Disallow octal escapes outside of character classes.
	no-optional-assertion	Disallow optional assertions.
	no-potentially-empty-backreference	Disallow backreferences that reference a group that might not be matched.
	no-unnecessary-assertions	Disallow assertions that are known to always accept (or reject).
🔧	no-zero-quantifier	Disallow quantifiers with a maximum of 0.
	optimal-lookaround-quantifier	Disallows the alternatives of lookarounds that end with a non-constant quantifier.

Suggestions

	Rule	Description
🔧	consistent-match-all-characters	Use one character class consistently whenever all characters have to be matched.
🔧	identity-escape	How to handle identity escapes.
	no-constant-capturing-group	Disallow capturing groups that can match only one word.
🔧	no-trivially-nested-lookaround	Disallow lookarounds that only contain another assertion.
🔧	no-trivially-nested-quantifier	Disallow nested quantifiers that can be rewritten as one quantifier.
🔧	no-unnecessary-character-class	Disallow unnecessary character classes.
🔧	no-unnecessary-flag	Disallow unnecessary regex flags.
🔧	no-unnecessary-group	Disallow unnecessary non-capturing groups.
🔧	no-unnecessary-lazy	Disallow unnecessarily lazy quantifiers.
🔧	no-unnecessary-quantifier	Disallow unnecessary quantifiers.
🔧	optimal-concatenation-quantifier	Use optimal quantifiers for concatenated quantified characters.
🔧	optimized-character-class	Disallows unnecessary elements in character classes.
🔧	prefer-character-class	Prefer character classes wherever possible instead of alternations.
🔧	prefer-predefined-assertion	Prefer predefined assertions over equivalent lookarounds.
🔧	prefer-predefined-character-set	Prefer predefined character sets instead of their more verbose form.
🔧	prefer-predefined-quantifiers	Prefer predefined quantifiers (+*?) instead of their more verbose form.
🔧	simple-constant-quantifier	Prefer simple constant quantifiers over the range form.
🔧	sort-flags	Requires the regex flags to be sorted.

eslint-plugin-clean-regex's Issues

Join forces with eslint-plugin-regexp

Description

Thanks for your great work on this plugin!

I was using eslint-plugin-regexp for a while, and now I stumbled upon eslint-plugin-clean-regex. I can see that many rules are very similar in these two plugins.

@RunDevelopment and @ota-meshi, how do you feel about joining forces with each other by merging eslint-plugin-regexp and eslint-plugin-clean-regex into a single project?

Detect alternatives which are a prefix of another alternative

Top-level alternations like int|integer are problematic because the word integer will never be matched.

In general: Let A and B be a pair of distinct alternatives of the same alternation where A comes before B.
If there exists a word w in L(A) and a word x in L(B) \ L(A) such that w is a prefix of x, then x can never be matched by B.
In other words: Report if [ L(B) \ L(A) ] ∩ L(/A[\s\S]*/) != ∅.

Examples:

/a|b/ //ok
/a|aa/ // report that `aa` cannot be matched
/a\b|aa/ // ok
/a|[ab]a/ // report that `aa` cannot be matched
/(?:a|[ab]a)\b/ // ok
/a(a|aa)a/  // report that `aaaa` cannot be matched
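
The int|integer case can be tried out directly. This is a small sketch of the behavior, not of the proposed rule; it also shows why the examples marked "ok" are fine: anything that forces the engine to backtrack into the longer alternative (an anchor, `\b`) makes that alternative reachable again.

```javascript
// The first alternative wins, so "integer" is only matched up to "int".
const keyword = /int|integer/;
console.log(keyword.exec("integer")[0]); // "int"

// Anchoring the end forces backtracking into the second alternative.
const anchored = /^(?:int|integer)$/;
console.log(anchored.exec("integer")[0]); // "integer"

// Putting the longer alternative first avoids the problem altogether.
const reorderedKeyword = /integer|int/;
console.log(reorderedKeyword.exec("integer")[0]); // "integer"
```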

no-unnecessary-lazy: Improve rule

Right now, the rule only reports lazy constant quantifiers but it can do a lot more:

If the next character of the lazy quantifier is not a prefix of the lazily quantified element, the lazy modifier can be removed.

Example:

/ab+?c/
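
A quick sketch of why the lazy modifier in /ab+?c/ changes nothing, next to a case where it does matter. The sample strings are arbitrary and chosen only for this illustration.

```javascript
const lazyVersion = /ab+?c/;
const greedyVersion = /ab+c/;
const lazySamples = ["abc", "abbbc", "ac", "xabbcabc", "abb"];

// `c` can never be consumed by `b+?`, so both versions must consume every
// `b` before they can continue, and they end up with identical matches.
const lazyIsUnnecessary = lazySamples.every((sample) => {
    const a = lazyVersion.exec(sample);
    const b = greedyVersion.exec(sample);
    return a?.[0] === b?.[0] && a?.index === b?.index;
});
console.log(lazyIsUnnecessary);

// Counterexample: here the next character (`b`) can also be matched by the
// quantified element, so laziness changes the result.
console.log(/ab+?b/.exec("abbbb")[0]); // "abb"
console.log(/ab+b/.exec("abbbb")[0]);  // "abbbb"
```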

Make no-unnecessary-group fixable

Upgrade to new ESLint major

This project currently uses ESLint 3 which isn't the latest major version of ESLint. Looking at the releases, it's probably the best move to wait until v6.0 is released.

Add a goal section to the readme

When I first started this project, it was just a collection of simple rules I came up with to make reviewing regexes easier. But as the project grew and rules became more numerous, my focus started to shift. While the vague promise of "writing better regular expressions" may have been a good umbrella for what I've been doing until now, it doesn't describe the function and goals of the project at all.

Task:

Describe in a few sentences in plain English what the purpose of this plugin is and how it tries to achieve its goals.
List non-goals and make it clear that this plugin is opinionated.
Be brief!

The section will be placed directly below the slogan, so people don't have to search for this information.

Add naming conventions for rule names and rule options

`no-trivially-nested-lookaround` can change the patterns

Description

I just learned that capturing groups inside negated lookarounds behave interestingly. The captured text of a capturing group is reset after leaving a negated lookaround.

This means that the \1 in /(?!(a))\w\1/ is useless. However, the \1 in /(?!(?!(a)))\w\1/ (double negation) is useless too.

This means that (?!(?!R)) for some regex R is equivalent to (?=R) if and only if R does not contain capturing groups.

The no-trivially-nested-lookaround rule does not account for this right now.
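
The difference is observable with a backreference. A minimal sketch; the input strings are picked just for this demonstration.

```javascript
const positiveLookahead = /(?=(a))\w\1/;
const doubleNegation = /(?!(?!(a)))\w\1/;

// With (?=(a)), group 1 keeps its capture, so `\1` must match another "a".
console.log(positiveLookahead.exec("aa")); // matches "aa", group 1 is "a"
console.log(positiveLookahead.exec("ab")); // null

// With the double negation, group 1 is reset when the negated lookaround is
// left, so `\1` matches the empty string and group 1 stays undefined.
console.log(doubleNegation.exec("aa")); // matches "a", group 1 is undefined
console.log(doubleNegation.exec("ab")); // matches "a", group 1 is undefined
```

So rewriting the second pattern into the first would change which strings match.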

Warn about empty alternatives

Almost nobody uses empty alternatives and it's easy to forget to delete a | when rewriting regular expressions.

prefer-character-class: Reorder alternatives if it's safe

If the only thing preventing two one-character alternatives from being merged is an alternative that cannot start with either character (from the one-character alternatives), then it's safe to reorder the alternatives and merge the one-character alternatives.

Example:
(?:a|foo|b) == (?:a|b|foo) == (?:[ab]|foo)

I already added a util method that can detect the first character of an alternative, so this should be easy to implement.
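
The equivalence from the example can be checked on a sample input, together with a case where the reordering would not be safe. This is only a sketch; `allMatches` and the inputs are hypothetical and not taken from the plugin.

```javascript
// Hypothetical helper: collects the text of every match of a global regex.
const allMatches = (regex, input) =>
    [...input.matchAll(regex)].map((match) => match[0]);

const sampleInput = "a foo b fab boo";
const originalOrder = allMatches(/(?:a|foo|b)/g, sampleInput);
const reorderedAlternatives = allMatches(/(?:a|b|foo)/g, sampleInput);
const mergedAlternatives = allMatches(/(?:[ab]|foo)/g, sampleInput);

// `foo` cannot start with "a" or "b", so all three variants agree.
console.log(
    JSON.stringify(originalOrder) === JSON.stringify(reorderedAlternatives) &&
    JSON.stringify(originalOrder) === JSON.stringify(mergedAlternatives)
);

// Not safe: `bc` starts with "b", so merging `a` and `b` across it changes
// which alternative wins.
console.log(allMatches(/(?:a|bc|b)/g, "bc")); // [ 'bc' ]
console.log(allMatches(/(?:[ab]|bc)/g, "bc")); // [ 'b' ]
```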

Prefer standard character sets

/[0-9]/ // replace with \d
/[a-zA-Z_\d]/ // replace with \w
/[a-zA-Z_\d-]/ // replace with [\w-]
/[a-z_\d]/i // replace with \w

This rule should only affect \d\D\w\W.

Detect unnecessary assertions

Some assertions are plain unnecessary, either because they are always true or always false.
Examples:

/a\ba/ // always false
/a\Ba/ // always true
/foo^/ // note that there is no m flag
/$bar/
/foo(?!x)\s+bar/
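
The examples can be probed on a few inputs. A sketch under assumptions: the sample strings and the helpers `neverMatches` and `sameTestResult` are made up for this illustration, and passing on samples only hints at the general claim.

```javascript
const assertionSamples = [
    "aa", "a a", "foo", "foo\nbar", "bar", "foo bar", "foox bar", "foo \t bar",
];

// Hypothetical helpers for this sketch.
const neverMatches = (regex) =>
    assertionSamples.every((sample) => !regex.test(sample));
const sameTestResult = (x, y) =>
    assertionSamples.every((sample) => x.test(sample) === y.test(sample));

// Always-rejecting assertions: the whole pattern can never match.
console.log(neverMatches(/a\ba/), neverMatches(/foo^/), neverMatches(/$bar/));

// Always-accepting assertions: removing them does not change the result.
console.log(sameTestResult(/a\Ba/, /aa/));
console.log(sameTestResult(/foo(?!x)\s+bar/, /foo\s+bar/));
```

In the last pattern, `(?!x)` always accepts because the next character has to be whitespace for `\s+` to match anyway.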

Warn about nested unused capturing groups

Unused nested capturing groups are likely meant to be non-capturing groups:

Examples:

/(foo(bar))/
/(foo(\s+|\s*,\s*)?bar)/

Warn about empty groups containing only assertions

Examples:

/(\b)+/
/(?:(?!a))/

Detect quadratic patterns

Some seemingly innocent patterns can have a run time of O(n^2). This can be a vulnerability as pointed out here and further explained here.

"Even extremely simple regexes like /a+b/ show this O(n^2) behavior for inputs like 'a'*n." ('a'*n means n-many a characters.)

The purpose of this rule is to detect these patterns.

From what I've seen, the general rule seems to be: If there exists some set of paths AB*C in the regex R such that x = (L(A) ∩ L(B*)) \ ({ε} ∪ L(C)) is not the empty set, then R will take Ω(n^2) many steps to reject a word w ∈ x^n \ L(R).

Please note the Omega in the time complexity bound. This is not a typo. The backtracking algorithm might actually take more than O(n) steps to reject a suffix of the input string.
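
To see where the quadratic cost comes from, here is a rough model of a naive backtracking matcher for /a+b/ that retries at every start position. This is a hand-written simulation for illustration only, not how any real regex engine is implemented, and `countStepsForAPlusB` is a name invented here.

```javascript
// Simplified model of backtracking for /a+b/: counts character comparisons.
function countStepsForAPlusB(input) {
    let steps = 0;
    for (let start = 0; start < input.length; start++) {
        // Greedy a+: consume as many "a" characters as possible.
        let end = start;
        while (end < input.length && input[end] === "a") {
            end++;
            steps++;
        }
        // Backtrack: for every possible length of a+ (longest first),
        // try to match "b" right after it.
        for (let stop = end; stop > start; stop--) {
            steps++;
            if (input[stop] === "b") {
                return { matched: true, steps };
            }
        }
    }
    return { matched: false, steps };
}

// Doubling the input length roughly quadruples the work in this model.
console.log(countStepsForAPlusB("a".repeat(100)).steps); // 10100
console.log(countStepsForAPlusB("a".repeat(200)).steps); // 40200
```

In this model, rejecting n-many a characters takes n(n+1) steps, because each of the n start positions scans to the end of the input and then backtracks all the way.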

Consistent match all character class

There are multiple ways to express a character class which accepts all characters but you should choose one and stick with it to make your regexes easier to understand.

Examples:

/[\s\S]/, /[\d\D]/, /[\w\W]/, /[^]/

Notes:
If present, we could even take advantage of the s flag.

This should also detect alternations which are equal to the set of all characters. Examples:

/(?:\s|\S)/, /\s|\S/, /(?:.|\s)/, /.|\D/
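
A small sketch confirming that all of these forms accept every UTF-16 code unit, while a plain `.` without the s flag does not. The helper `rejectedCodeUnits` is hypothetical and only exists for this check.

```javascript
// Hypothetical helper: lists the code units a single-character regex rejects.
function rejectedCodeUnits(regex) {
    const rejected = [];
    for (let code = 0; code <= 0xffff; code++) {
        if (!regex.test(String.fromCharCode(code))) {
            rejected.push(code);
        }
    }
    return rejected;
}

const matchAllForms = [
    /[\s\S]/, /[\d\D]/, /[\w\W]/, /[^]/,
    /(?:\s|\S)/, /\s|\S/, /(?:.|\s)/, /.|\D/,
    /./s,
];
const everyFormMatchesAll = matchAllForms.every(
    (regex) => rejectedCodeUnits(regex).length === 0
);
console.log(everyFormMatchesAll);

// Without the s flag, `.` rejects the four line terminators:
// \n (0x0A), \r (0x0D), \u2028, and \u2029.
const dotRejects = rejectedCodeUnits(/./);
console.log(dotRejects);
```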

Detect non-disjoint alternatives

To prevent exponential backtracking, the alternatives of a quantified group have to be disjoint (aside from the empty string and assertions).

For a RE /(A1|A2|...|An)*/, its alternatives have to be disjoint such that:
∀ Aj: ( L(/(A1|...|Aj-1|Aj+1|...|An)*/) ∩ L(/Aj*/) ) \ {ε} = ∅ where L is a function which returns the language of the given RE and ε is the empty string.

This will be difficult to implement because we need to be able to construct an NFA from the RE and assertions don't make this any easier.

Prefer character classes instead of groups if possible

Example:

/(?:a|b|c)/ => /[abc]/
/(?:\w|-|\+|\*|\/)+/ => /[\w+*/-]+/
/(a|b|c)/ => /([abc])/
/(?:[ab]|c)/ => /[abc]/
/(?:a|b)/ // stay like this

Note:
This rule should only affect groups with a) >= 3 alternatives or b) at least one character class.

Suggest i flag if it simplifies the pattern

If a pattern can be simplified by adding the i flag, it might be a good idea to suggest that to the user.

The conditions for this rule to add the i flag (and subsequently simplify the pattern) are:

The i flag isn't present.
All characters, character classes, and character sets match the same character(s) regardless of the i flag (i.e., the flag doesn't change the meaning of the pattern).
There is at least one character class that can be simplified because of the added i flag. E.g. a character range can be removed, or a character class can be replaced by a character or character set, or similar.

Adding flags willy-nilly might cause some problems, so I don't know whether it should be auto-fixable. Does ESLint have a suggestion mode (aka "you might want to do this but I (ESLint) won't auto-fix it for you")?

Use standard assertions if possible

Standard assertions \b, \B, ^, and $ are efficiently implemented and easy to understand. That's why they should be preferred over lookaround assertions which do the same.

Examples:

/foo(?!\w)/ => /foo\b/
/foo(?!.)/ => /foo$/m
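
Both rewrites can be spot-checked on a few inputs. This is a sketch with arbitrary sample strings; `firstMatch` is a helper invented for the comparison.

```javascript
// Hypothetical helper: summarizes the first match as "index:text" or "none".
const firstMatch = (regex, input) => {
    const match = regex.exec(input);
    return match === null ? "none" : `${match.index}:${match[0]}`;
};

const lookaroundSamples = [
    "foo", "foobar", "foo bar", "foo\nbar", "a foo.", "food foo",
];

// `o` is a word character, so "not followed by \w" is the same as `\b`.
const wordBoundaryAgrees = lookaroundSamples.every(
    (s) => firstMatch(/foo(?!\w)/, s) === firstMatch(/foo\b/, s)
);

// Without the s flag, `.` matches everything except line terminators, so
// "not followed by ." is the same as `$` with the m flag.
const lineEndAgrees = lookaroundSamples.every(
    (s) => firstMatch(/foo(?!.)/, s) === firstMatch(/foo$/m, s)
);

console.log(wordBoundaryAgrees, lineEndAgrees);
```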

Detect polynomial backtracking caused by the trivial concatenation of a subset with its superset

Stuff like this:

/;+.*/ == /;.*/
/\d+\w+/ == /\d\w+/
/\w+\d+/ == /\w+\d/
/\w+\d*/ == /\w+/

More general: Let A be a subset of B.

/A{n,m}B{o,}/ == /A{n}B{o,}/
/B{n,}A{o,p}/ == /B{n,}A{o}/

The new rule should detect concatenation like this and report them. In trivial cases, it might even provide a fixer.
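
The four concrete equivalences above can be spot-checked like this. A minimal sketch, assuming a handful of arbitrary samples is enough to illustrate the idea; `describeMatch` is a made-up helper and none of this is the proposed rule itself.

```javascript
// Hypothetical helper: summarizes the first match as "index:text" or "none".
function describeMatch(regex, input) {
    const match = regex.exec(input);
    return match === null ? "none" : `${match.index}:${match[0]}`;
}

const concatenationPairs = [
    [/;+.*/, /;.*/],
    [/\d+\w+/, /\d\w+/],
    [/\w+\d+/, /\w+\d/],
    [/\w+\d*/, /\w+/],
];
const concatenationSamples = [
    ";;; comment", "a;b", "12ab", "ab12", "x1", "9", "abc", "1_2",
];

// In each pair, the quantifier on the subset can give up characters to the
// neighboring superset, so trimming it does not change what is matched.
const concatenationPairsAgree = concatenationPairs.every(([verbose, trimmed]) =>
    concatenationSamples.every(
        (s) => describeMatch(verbose, s) === describeMatch(trimmed, s)
    )
);
console.log(concatenationPairsAgree);
```

The trimmed forms match the same text but leave the engine fewer ways to split the input between the two quantifiers, which is what removes the polynomial backtracking.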
