errata-ai / vale

:pencil: A markup-aware linter for prose built with speed and extensibility in mind.

Home Page: https://vale.sh

License: MIT License

Makefile 0.60% Go 99.17% Dockerfile 0.23%

linter linting nlp vale

vale's Issues

Support predefined variables in rule definitions

For example, $word, $noun, $verb, etc.

Related to #1.

Support changing the severity level of built-in rules

For example,

vale.PassiveVoice = warning # instead of just `= YES`

Refine built-in rules

I think Vale should ship with a minimal set of commonly used and/or non-subjective rules. Considering this, I think:

PassiveVoice should be decreased to a suggestion.
MinAlertLevel should default to warning (it's currently suggestion).

Add basic rule template generation

I think it might be nice if you could type something like vale new RuleName/ExtensionPoint and get a file named RuleName.yml with the basic structure + useful comments about the particular extension point.

Installed via Homebrew, unable to configure any parser but vale

I'm a new user who has just installed Vale via Homebrew. I can't get Vale to use rulesets other than the built-in one. In this example, which reproduces the problem, I'm trying to lint a series of AsciiDoc documents. I have my Vale configuration in a .vale file.

▶ which vale
/usr/local/bin/vale

Here's my .vale file:

[*]
BasedOnStyles = proselint

▶ vale dc
{
"Checks": null,
"GBaseStyles": [
"proselint"
],
"GChecks": {},
"MinAlertLevel": 1,
"SBaseStyles": {},
"SChecks": {},
"StylesPath": "",
"RuleToLevel": {},
"Output": "CLI",
"Wrap": false,
"NoExit": false,
"Sorted": false,
"Normalize": false,
"Simple": false,
"InExt": ".txt"
}
▶ vale *.adoc
✔ 0 errors, 0 warnings and 0 suggestions in 6 files.

When I modify my .vale file to read:

[*]
BasedOnStyles = vale

I get:

▶ vale *.adoc
[several items omitted]
vol4.adoc
125:11 warning Consider removing 'Tiny' vale.Editorializing
196:47 warning Consider removing 'excellent' vale.Editorializing

✖ 7 errors, 75 warnings and 0 suggestions in 6 files.

If vale is in the BasedOnStyles list in my configuration file, I get the 7 errors and 75 warnings shown above. If it's not, I get 0 of each. I've also tried setting StylesPath to a local directory, I've tried some Markdown files, and I've tried configuring it with parsers other than proselint, like 18F and TheEconomist, but to no avail.

Is there logging that would help track down the issue?

Improve Vale's out-of-the-box experience

As a few Reddit users mentioned here, Vale's out-of-the-box experience can be a bit off-putting due to its aggressiveness. I think part of this is a misunderstanding about Vale's purpose, but it probably should be addressed anyway.

(Related to #30, although I think my opinion about offering "non-subjective" rules has changed; I'm not sure they exist.)

Here are a few of my thoughts:

I'm tempted to ship without any rules enabled. My stance from the beginning has been that writing is too complex/nuanced to be able to offer general purpose, authoritative advice. Instead, Vale is designed to help its users adhere to an existing guide—whether that's a simple style guide for an open source project's documentation or something more thorough like 18F.
Assuming we don't completely disable the built-in rules, I think we should strive to be very quiet by default. No one likes to receive a long list of suggestions about their writing and we only get one chance to make a first impression.
Regardless of whether they're controversial or imprecise, I think there are a number of rules that should come pre-implemented with Vale (even if they're likely to be disabled by default). My criteria for these are: (1) it's not easy to implement (e.g., PassiveVoice), (2) it's time-consuming to implement (e.g., ComplexWords), or (3) it's commonly mentioned in style guides (e.g., GenderBias). Basically, I'd like to make style creation as easy as possible. (See https://github.com/ValeLint/docs/issues/1 for more discussion on style creation.)

Improve handling of rule citations

Rule definitions already have a link key—we just need to include it in the CLI output somehow.

Support input from stdin

I'm thinking something like:

# Will be treated as plain text
$ vale 'This is some text to lint'
...
# Will be treated as Markdown
$ vale --ext='.md' 'This is some text to lint'
...
# Will be treated as plain text
$ echo 'this is more text' | vale
...

Duplicate matches on the same line aren't handled correctly

For example, in

all of the teams in all of the use cases of your team.

we return the location of the first match twice.
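
One way to picture the fix is the minimal Python sketch below. It is not Vale's Go code, and match_locations is a hypothetical helper: the idea is to iterate over every match object and read each one's own offsets, rather than searching for the matched text again from the start of the line (which always lands on the first occurrence).

```python
import re

def match_locations(pattern, line):
    """Return a 1-based, inclusive (start, end) column span for every match."""
    return [(m.start() + 1, m.end()) for m in re.finditer(pattern, line)]

line = "all of the teams in all of the use cases of your team."
spans = match_locations(r"all of the", line)

# Searching for the matched text again always finds the first occurrence,
# which is the kind of mistake that reports the same location twice.
naive_start = line.find("all of the") + 1
```

Here spans holds two distinct locations, while naive_start can only ever describe the first one.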

Add org-mode support

There's a library that should make this addition fairly straightforward.

Vale does not recognise javadoc comments

In addition to C-style comments, Java also has Javadoc comments:

The first line contains the begin-comment delimiter ( /**).

Example:

/**
 * Returns an Image object that can then be painted on the screen. 
 *
 * @param  url  an absolute URL giving the base location of the image
 * @param  name the location of the image, relative to the url argument
 * @return      the image at the specified URL
 * @see         Image
 */
 public Image getImage(URL url, String name) {
  // methodimpl.
 }

When parsing Javadoc comments like the one above in Java files, Vale complains about a repeated asterisk:

 47:6   error    '*' is repeated!              vale.Repetition
 83:6   error    '*' is repeated!              vale.Repetition
 86:6   error    '*' is repeated!              vale.Repetition

As Javadoc comments are widely used in Java programs, these false positives greatly impede analysis of Vale's results.
I think the core/format.go file needs to be modified. It currently treats Java comments as C comments (C, of course, has no Javadoc):

var CommentsByNormedExt = map[string]map[string]string{
	".c": {
		"inline":     `(//.+)|(/\*.+\*/)`,
		"blockStart": `(/\*.*)`,
		"blockEnd":   `(.*\*/)`,
	},
.....
	`\.(?:java|bsh)$`:                             {".c", "code"},
....
}
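
To illustrate the kind of preprocessing that would avoid these false positives, here is a rough Python sketch (not a patch for core/format.go). It assumes the whole block comment has already been extracted as a string; javadoc_text and LEADING_STAR are hypothetical names. It removes the /** and */ delimiters and the decorative leading asterisk on each line before the text would reach any rule.

```python
import re

# Leading whitespace, one '*', and one optional space. The lookahead keeps
# the pattern from eating the '*' of a closing '*/'.
LEADING_STAR = re.compile(r"^\s*\*(?!/) ?")

def javadoc_text(comment):
    """Return only the prose of a Javadoc block comment."""
    body = comment.strip()
    body = re.sub(r"^/\*\*?", "", body)   # opening '/**' (or plain '/*')
    body = re.sub(r"\*/$", "", body)      # closing '*/'
    lines = [LEADING_STAR.sub("", ln).rstrip() for ln in body.splitlines()]
    return "\n".join(lines).strip()

comment = "/**\n * Returns an Image object.\n *\n * @see Image\n */"
cleaned = javadoc_text(comment)
```

With the asterisks gone, a repetition check has nothing to trip over in the comment's layout.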

Create a website

Some features to consider are:

A demo textarea (using GopherJS?);
an interactive style creator (similar to clang-format?);
a searchable list of styles; and
documentation to replace the wiki.

Port codetype

For ambiguous cases like .m (is it Objective-C or MATLAB?)...

Support for linting json and yaml content.

A lot of static site and documentation builders keep content in JSON / YAML.

How to match backticks in styles?

I think I'm just being dense, but I can't figure out how to match backticks ("`" characters) in a swap rule. I'm linting a book on Python, and we want to standardize how we spell e.g. "for-loop". So I want a rule like this (simplified from what we really want):

swap:
  'for loop': for-loop
  '`for` loop': for-loop

But this doesn't seem to work. The first rule works just fine, but the second one with backticks doesn't seem to fire. How can I make this work?

Generate a human readable style guide from the YML

Would be a nice enhancement.

Replace Cucumber/Aruba with pure Go tests

Possible bug

This line

https://travis-ci.org/writethedocs/www/builds/212845872#L253

contains 'warningm', which looks like it could be a parsing bug somewhere.

Support in-text rule disabling

I'm thinking something along the lines of

<!-- vale off -->
This is some text

more text here...
<!-- vale on -->

and

<!-- vale Style.Rule = NO -->
This is some text

which would reset on the next blank line.
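
As a rough illustration of the first form only (the off/on pair), the Python sketch below filters out the lines between the two comments. It is an assumption about how this could work, not Vale's implementation; TOGGLE and lintable_lines are hypothetical names, and the per-rule form is not handled.

```python
import re

# Matches a line consisting solely of '<!-- vale off -->' or '<!-- vale on -->'.
TOGGLE = re.compile(r"<!--\s*vale\s+(on|off)\s*-->")

def lintable_lines(text):
    """Return (line number, line) pairs that fall outside off/on regions."""
    enabled = True
    kept = []
    for number, line in enumerate(text.splitlines(), start=1):
        toggle = TOGGLE.fullmatch(line.strip())
        if toggle:
            enabled = toggle.group(1) == "on"
            continue  # the control comment itself is never linted
        if enabled:
            kept.append((number, line))
    return kept

doc = (
    "Intro\n"
    "<!-- vale off -->\n"
    "This is some text\n"
    "\n"
    "more text here...\n"
    "<!-- vale on -->\n"
    "Outro"
)
kept = lintable_lines(doc)
```

Keeping the original line numbers alongside each line means any alerts raised later would still point at the right place in the file.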

Improve LaTeX support

We lint line-by-line at the moment (it only gets scoped as text).

Tests? Possible issue with GenderBias

Are there some tests I can run that'll convince me all checks are working? :-p

I'm currently manually testing GenderBias and it does not seem to be working perfectly.

It does not trigger on doorman but should. (Also typo in concierge.)

Thanks :-)

Include reference styles in other languages

(Note: this depends on further development of the prose library.)

lintProse is our only English-specific component.

We could add a Language configuration key:
```
# .vale
Language = German
...
```
We could compile for different languages.
Languages could be specified on a per-rule basis. (This would probably be the most ideal solution...)

Allow rules to target code spans

While most of the time we'll probably want to ignore code spans, there are cases (see #43 and Rust's style guide) where it'd be nice to have access to them without having to use --ignore-syntax (which negates the other benefits of Vale's syntax handling—notably scoping and avoiding things like URLs).

So, I'm thinking of making this optional on a per-rule basis. For example, if we want to enforce omitting parentheses in code spans (as in the Rust style guide), we could write the following:

extends: existence
message: "Remove parentheses from '%s'"
description: "When talking about a method in prose, DO NOT include the parentheses."
level: error
nonword: true
code: true # this tells Vale to keep code spans
tokens:
  - '`\w+\(\)`' # something like `read_line()` would be flagged.

This is currently being explored on the feat/ignore branch.

Spell checking ... ?

My initial feelings on this were that it was out of scope: Vale is designed to work with the more subjective, stylistic parts of writing—not spelling and grammar.

However, I think its "syntax awareness" could prove useful here too. For example, checking source code comments and ignoring markup syntax and code blocks.

POS Tagger

Create a Windows installer

Consider go-msi and Inno Setup.

Improve styling of multiple options

For example,

message: Use "%s" instead of "%s"
swap:
  foo: bar or baz #  currently the only way to list multiple options

results in Use "bar or baz" instead of "foo". I think it would be better as something like:

foo: [bar, baz] # => Use "bar" or "baz" instead of "foo"
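
A small Python sketch of the message formatting this implies (format_options and swap_message are hypothetical helpers, not part of Vale): each option is quoted separately, and the options are joined with "or".

```python
def format_options(options):
    """Quote each option and join them in a readable way."""
    quoted = ['"%s"' % option for option in options]
    if len(quoted) <= 2:
        return " or ".join(quoted)
    return ", ".join(quoted[:-1]) + ", or " + quoted[-1]

def swap_message(found, options):
    """Build the alert text for a swap with one or more replacements."""
    return 'Use %s instead of "%s"' % (format_options(options), found)

message = swap_message("foo", ["bar", "baz"])
```

A single-option swap goes through the same path and produces the message it does today.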

Create a better benchmark suite

As mentioned in #26, one of the goals of our test suite is to include better profiling information. Vale is pretty fast at the moment (especially when compared to other prose linters), and it'd be nice to be able to measure how changes impact this without having to profile the entire application. For starters, I think we should do this at the package level:

vale only lints first file?

When passing multiple files to vale, as in

$ vale --output JSON --no-exit foo.md bar.md

vale seems to lint only the first file. I see errors in foo.md but no errors from bar.md (regardless of the --output option), although I'm sure bar.md has errors (which do appear if I lint bar.md specifically).

Is this expected behaviour? The --help output suggests that I should be able to pass multiple arguments:

NAME:
   vale - A command-line linter for prose.

USAGE:
   vale [global options] command [command options] [arguments...]

Create .github files

CONTRIBUTING.md, ISSUE_TEMPLATE.md, and PULL_REQUEST_TEMPLATE.md.

add tag to suppress Vale errors/warnings for specific case

Some of the errors/warnings reported by Vale are for perfectly valid English prose. Ultimately, Vale just provides suggestions about potential issues, and only the author should decide whether the text needs correction or not.
It would be very handy to be able to suppress such specific 'style smells' with a special tag, e.g.

<!DOCTYPE html>
<html>
<p>In many cases, ......be returned.<!--novale.begin-->Note that<!--novale.end--> the geocoder is very tolerant...
</html>

When processing such a tag, Vale should not report an issue like:

1:426   warning  Consider removing 'Note that'  vale.Editorializing

Similar tags could be added for other file formats (Java, C, PHP, etc.).

Add to Homebrew

Integrate with existing tools

Editors + CLI tools

Sublime Text (https://github.com/ValeLint/SubVale)
Atom (https://github.com/TimKam/atomic-vale)
VS Code (https://github.com/lunaryorn/vscode-vale)
coala CI
Vim (via ALE, thanks to @chew-z)
Emacs (https://github.com/abingham/flycheck-vale)
JetBrains

Other

The following would be nice to have, but will probably need to access Vale through an API of some sort (see languagetool-msword10-addin, for example).

Chrome + Firefox + Thunderbird
MS Word
WordPress
Slack

Expand scopes

I'd like to add quote and list to the existing sentence, paragraph, heading and comment.

Linux should not install to home dir

I'm fairly sure most people do not want the default install location on macOS and Linux to be ~/vale.

Maybe ~/bin/vale ?

[Question] How to use Joblint via Vale?

1. Briefly

I don't understand how I can use proselint, Joblint, and write-good (the linters with a green development status) at the same time for my documents via Vale.

2. Settings

Content of my SashaEquality.md file:

policewoman skillz

Windows 10 Enterprise LTSB 64-bit EN,
Vale — 0.8.1,
Joblint — 2.3.2,
proselint — 0.8.0

3. Expected behavior

If I run joblint SashaEquality.md:

D:\Киролайна>joblint SashaEquality.md

Joblint

Issue tally:
Culture  |█  (1)

• Use of bro terminology (error)
  policewoman skillz
    ✔ Remove these words.
    ✘ Bro culture terminology can really reduce the number of
    people likely to show interest. It discriminates against anyone who
    doesn't fit into a single gender-specific archetype.

4. Actual behavior

If I run vale SashaEquality.md:

D:\Киролайна>vale SashaEquality.md

 SashaEquality.md
 1:1  error  Consider using 'police          vale.GenderBias
             officer(s)' instead of
             'policewoman'

✖ 1 error, 0 warnings and 0 suggestions in 1 file.

I can see the proselint error, but I can't see the Joblint error.

5. Did not help

I downloaded the Joblint.zip file and placed it in the D:\Киролайна\Styles folder. I created a .vale file in the D:\Киролайна folder. Its content:

# Core settings
StylesPath = D:\Киролайна\Styles
MinAlertLevel = warning # suggestion, warning or error

# Global settings (applied to every syntax)
[*]
# List of styles to load
BasedOnStyles = Joblint
# Style.Rule = {YES, NO} to enable or disable a specific rule
vale.Editorializing = YES
# You can also change the level associated with a rule
vale.Hedging = error
…

# Syntax-specific settings
# These overwrite any conflicting global settings
[*.{md,txt}]
…

No effect.

Thanks.

Readability statistics

I'm thinking about including a new readability extension point that will allow users to set standards for metrics like Flesch-Kincaid, Gunning-Fog, and Coleman-Liau. For example,

extends: readability
level: warning
metric: Flesch-Kincaid
grade: 8
scope: paragraph

This would warn about any paragraphs that exceed a reading level of 8th grade.

The prose library already supports these metrics, so it's just a matter of deciding on the check implementation details.
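
For reference, the Flesch-Kincaid grade level is 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59. The Python sketch below shows what such a check would compute. It is only an illustration: the syllable counter is a crude vowel-group heuristic, all function names are hypothetical, and this is not how the prose library implements the metric.

```python
import re

def count_syllables(word):
    """Crude estimate: one syllable per group of consecutive vowels."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_kincaid_grade(text):
    """Approximate Flesch-Kincaid grade level of `text`."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)

def exceeds_grade(paragraph, grade):
    """True if the paragraph reads above the allowed grade level."""
    return flesch_kincaid_grade(paragraph) > grade

simple = "The cat sat."
dense = ("Institutional considerations necessitate comprehensive "
         "organizational restructuring.")
```

With grade: 8 as in the rule above, exceeds_grade(dense, 8) would raise a warning for that paragraph while exceeds_grade(simple, 8) would not.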

Improve test coverage

We mostly have integration tests now (through cucumber). We need more unit tests.

Multiline matches are handled incorrectly

Something like

ATM \n machine

will be flagged, but we only return one line—so, the second element of Span is always incorrect (it's really on the next line).
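
One possible approach, sketched in Python under the assumption that matching runs over the whole text rather than line by line (match_position is a hypothetical helper, not Vale code): convert the start and end offsets of the match into separate (line, column) pairs, so the end of the span can land on a later line than the start.

```python
import re

def match_position(text, pattern):
    """Return ((start_line, start_col), (end_line, end_col)), all 1-based,
    for the first match of `pattern`, or None if there is no match."""
    m = re.search(pattern, text)
    if m is None:
        return None

    def locate(offset):
        line = text.count("\n", 0, offset) + 1
        line_start = text.rfind("\n", 0, offset) + 1  # 0 when on the first line
        return line, offset - line_start + 1

    # m.end() is exclusive, so the last matched character is at m.end() - 1.
    return locate(m.start()), locate(m.end() - 1)

text = "Use the ATM\nmachine today."
position = match_position(text, r"ATM\s+machine")
```

In this example the match starts on line 1 and ends on line 2, which a single-line span cannot express.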

Referencing key matches in values for swaps

I'm working on a style to standardize some books on Python, and one place I need consistency is how I talk about "blocks" in the language. I want vale rules to make sure I use "for-block" rather than "for block", "`for`-block", or "`for` block". I want similar rules for "if-block", "else-block", and every other kind of block I might need to talk about.

Conceptually, I want swap rules like this:

(for|if|try|else) block: \1-block
`(for|if|try|else)`(?: |-)block: \1-block

That is, I want to capture the type of block in the key regex and reference it in the value. As far as I can tell, this isn't supported in vale. Am I wrong and there's actually a way to do this?

If I'm right, would it be possible to add this?

This is far from critical for me, and it's probably a bit of an edge-case for most users. I can always just enumerate all of the different rules.
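
To show the behaviour being asked for, here is how the same idea looks with Python's re module. This is not Vale's swap syntax, and BLOCK and suggestions are hypothetical names: the keyword is captured in the pattern and reused in the replacement through the \1 backreference.

```python
import re

# Optional backticks around the keyword, then a space or hyphen, then 'block'.
BLOCK = re.compile(r"`?\b(for|if|try|else)\b`?[ -]block")

def suggestions(text):
    """Return (found, preferred) pairs for every non-standard spelling."""
    found = []
    for m in BLOCK.finditer(text):
        preferred = m.expand(r"\1-block")  # reuse the captured keyword
        if m.group(0) != preferred:
            found.append((m.group(0), preferred))
    return found

sample = "Put it in the for block, the `if` block, or the else-block."
pairs = suggestions(sample)
```

A single pattern covers every keyword, so nothing has to be enumerated by hand.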

Improve the scoping system

From the TextMate docs:

it's also possible to AND, OR, and subtract scope selectors, e.g.: (a | b) & c - d would select the scope which is not matched by d, and matched by both c, and a or b.

I think instead of outright ignoring code spans, we should expose them as a scope (text.code, perhaps) that can be selectively ignored. For example,

scope: text - code

would exclude code spans for the particular rule.
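
A minimal Python sketch of how such a selector could be evaluated, assuming each piece of text carries a set of scope names. It only handles & and subtraction (no | and no parentheses), and selects is a hypothetical function rather than anything in Vale or TextMate.

```python
def selects(selector, scopes):
    """Evaluate a selector such as 'a & b - c' against a set of scope names."""
    required, *excluded = [part.strip() for part in selector.split(" - ")]
    wanted = {name.strip() for name in required.split("&")}
    # Every required scope must be present and no excluded scope may be.
    return wanted <= scopes and not (set(excluded) & scopes)

plain_text = {"text"}
code_span = {"text", "code"}
```

With scope: text - code, selects("text - code", plain_text) is true and selects("text - code", code_span) is false, so the rule would skip code spans while still seeing the surrounding prose.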

Improve markup processing

Add a `description` key to rule definitions

In many cases, I think it would be desirable to have a longer description than what message currently provides. While not very practical in a CLI environment, it could be useful in editor tooltips or Word add-in views.

Feature request: way to manage globally installed rule sets

Hello! I just learned about vale and am very excited about it. The first thing I noticed is that when you install it with brew install vale, it just provides the compiled bin with none of the optional rule sets. My current workaround is to clone the entire vale repo and add

StylesPath = xxxxxx/vale/styles in my $HOME/.vale

It would be cool if the distribution shipped with these styles. If they do and I'm not seeing it, please correct me. I'm going to be working off the source version for now.

Support linting embedded markup

For example, comments written for rustdoc:

/// # Examples
///
/// ```
/// use std::rc::Rc;
///
/// let five = Rc::new(5);
/// ```

Use multiple check definition structs?

We currently use a single struct to represent all checks because a rule's type is specified in the definition itself. It looks like we could use mapstructure to split this up.

Improve error reporting

We currently do a decent job of handling errors but most of it is done silently. Some areas that could use better reporting (perhaps with jWalterWeatherman) are:

configuration: malformed glob patterns, invalid paths, unrecognized rules, rule name collisions
rules: missing keys, malformed regular expressions, YAML syntax errors
linting: unsupported file formats, file I/O errors, parsing errors

Add more output options

At the least, I'd like an option that incorporates link and description.

add a `capitalization` check

I'm thinking this will serve two purposes:

When we're looking for specific spelling of a word (e.g., JavaScript), we'll simply pass the desired string:

extends: capitalization
tokens:
# flags any X such that X.lower() == javascript && X != JavaScript
- JavaScript

It will support variables representing generic cases ($lower, $upper, $title, etc.) so that we could, for example, check that all headings are in title case.
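
The first purpose, expressed as a short Python sketch (miscapitalized is a hypothetical helper, not the eventual check): find every case-insensitive occurrence of the preferred spelling and keep only those that differ from it.

```python
import re

def miscapitalized(text, preferred):
    """Return every occurrence of `preferred` that uses different casing."""
    pattern = re.compile(r"\b%s\b" % re.escape(preferred), re.IGNORECASE)
    return [m.group(0) for m in pattern.finditer(text)
            if m.group(0) != preferred]

flagged = miscapitalized("I like Javascript, javascript, and JavaScript.",
                         "JavaScript")
```

Only the two non-matching spellings end up in flagged; the correctly cased one is left alone.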