proselint's Introduction

Writing is notoriously hard, even for the best writers, and it's not for lack of good advice — a tremendous amount of knowledge about the craft is strewn across usage guides, dictionaries, technical manuals, essays, pamphlets, websites, and the hearts and minds of great authors and editors. But poring over Strunk & White hardly makes one a better writer — it turns you into neither Strunk nor White. And nobody has the capacity to apply all the advice from Garner’s Modern English Usage, an 1100-page usage guide, to everything they write. In fact, the whole notion that one becomes a better writer by reading advice on writing rests on untenable assumptions about learning and memory. The traditional formats of knowledge about writing are thus essentially inert, waiting to be transformed.

We devised a simple solution: proselint, a linter for English prose. A linter is a computer program that, akin to a spell checker, scans through a file and detects issues — like how a real lint roller helps you get unwanted lint off of your shirt.

proselint places the world's greatest writers and editors by your side, where they whisper suggestions on how to improve your prose. You’ll be guided by advice inspired by Bryan Garner, David Foster Wallace, Chuck Palahniuk, Steve Pinker, Mary Norris, Mark Twain, Elmore Leonard, George Orwell, Matthew Butterick, William Strunk, Elwyn White, Philip Corbett, Ernest Gowers, and the editorial staff of the world’s finest literary magazines and newspapers, among others. Our goal is to aggregate knowledge about best practices in writing and to make that knowledge immediately accessible to all authors in the form of a linter for prose; all in a neat command-line utility that you can integrate into other tools, scripts, and workflows.

Installation

To get this up and running, install it using pip:

pip install proselint

Fedora

sudo dnf install proselint

Debian

sudo apt install python3-proselint

Ubuntu

sudo add-apt-repository universe
sudo apt install python3-proselint

Plugins for other software

proselint is available on:

Usage

Suppose you have a document text.md with the following text:

John is very unique.

You can run proselint over the document using the command line:

proselint text.md

This prints a list of suggestions to stdout, one per line. Each suggestion has the form:

text.md:<line>:<column>: <check_name> <message>

For example,

text.md:0:10: wallace.uncomparables Comparison of an uncomparable: 'unique' cannot be compared.
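If another script consumes this plain-text output, each line can be split back into its parts. The following is a minimal sketch that assumes only the line format shown above; the `Suggestion` type and the `parse_suggestion` helper are hypothetical names for illustration and are not part of proselint.

```python
import re
from typing import NamedTuple, Optional


class Suggestion(NamedTuple):
    """One parsed line of plain-text output (hypothetical helper type)."""
    path: str
    line: int
    column: int
    check: str
    message: str


# <path>:<line>:<column>: <check_name> <message>
SUGGESTION_PATTERN = re.compile(
    r"^(?P<path>.+?):(?P<line>\d+):(?P<column>\d+): "
    r"(?P<check>\S+) (?P<message>.*)$"
)


def parse_suggestion(raw: str) -> Optional[Suggestion]:
    """Parse one output line, or return None if it does not fit the format."""
    match = SUGGESTION_PATTERN.match(raw.strip())
    if match is None:
        return None
    return Suggestion(
        path=match["path"],
        line=int(match["line"]),
        column=int(match["column"]),
        check=match["check"],
        message=match["message"],
    )


example = (
    "text.md:0:10: wallace.uncomparables "
    "Comparison of an uncomparable: 'unique' cannot be compared."
)
parsed = parse_suggestion(example)
print(parsed.check, parsed.line, parsed.column)
```

For anything beyond quick scripting, the JSON output is likely the more robust input, since it does not depend on the message text being free of unexpected separators.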

The command-line utility can also print suggestions in JSON using the --json flag. In this case, the output is considerably richer:

{
  // Type of check that output this suggestion.
  check: "wallace.uncomparables",

  // Message to describe the suggestion.
  message: "Comparison of an uncomparable: 'unique' cannot be compared.",

  // The person or organization giving the suggestion.
  source: "David Foster Wallace",

  // URL pointing to the source material.
  source_url: "http://www.telegraph.co.uk/a/9715551",

  // Line where the error starts.
  line: 0,

  // Column where the error starts.
  column: 10,

  // Index in the text where the error starts.
  start: 10,

  // Index in the text where the error ends.
  end: 21,

  // length from start -> end
  extent: 11,

  // How important is this? Can be "suggestion", "warning", or "error".
  severity: "warning",

  // Possible replacements.
  replacements: [
    {
      value: "unique"
    }
  ]
}

To run the linter as part of another Python program, you can use the lint function in proselint.tools:

import proselint

suggestions = proselint.tools.lint("This sentence is very unique")

This will return a list of suggestions:

[('weasel_words.very',
  "Substitute 'damn' every time you're inclined to write 'very;' your editor will delete it and the writing will be just as it should be.",
  0, 17, 17, 22, 5, 'warning', None),
 ('uncomparables.misc',
  "Comparison of an uncomparable: 'very unique.' is not comparable.",
  0, 17, 17, 29, 12, 'warning', None)]
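Positional tuples are easy to misread, so it can help to give the fields names. The sketch below assumes the tuple fields follow the same order as the JSON fields described earlier (check, message, line, column, start, end, extent, severity, replacements); that ordering is inferred from the sample output, and the `Suggestion` type is a hypothetical helper rather than part of proselint. The data is copied from the example above so that the block runs without proselint installed.

```python
from typing import Any, NamedTuple


class Suggestion(NamedTuple):
    """Named view of one suggestion tuple (field order is an assumption)."""
    check: str
    message: str
    line: int
    column: int
    start: int
    end: int
    extent: int
    severity: str
    replacements: Any


# Sample data copied from the output shown above.
raw_suggestions = [
    ("weasel_words.very",
     "Substitute 'damn' every time you're inclined to write 'very;' your "
     "editor will delete it and the writing will be just as it should be.",
     0, 17, 17, 22, 5, "warning", None),
    ("uncomparables.misc",
     "Comparison of an uncomparable: 'very unique.' is not comparable.",
     0, 17, 17, 29, 12, "warning", None),
]

suggestions = [Suggestion(*item) for item in raw_suggestions]

for s in suggestions:
    print(f"{s.line}:{s.column} [{s.severity}] {s.check}")
```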

Checks

You can disable any of the checks by modifying $XDG_CONFIG_HOME/proselint/config.json. If $XDG_CONFIG_HOME is not set or empty, ~/.config/proselint/config.json will be used. Additionally, for compatibility reasons, the legacy configurations ~/.proselintrc and $XDG_CONFIG_HOME/proselint/config will be checked if $XDG_CONFIG_HOME/proselint/config.json does not exist.

{
  "checks": {
    "typography.diacritical_marks": false
  }
}
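The lookup rule for the primary configuration file can be sketched in a few lines. This is an illustration of the rule as described above, not proselint's own loader: `config_path` and `disable_checks` are hypothetical helpers, the legacy fallback files are not handled, and the demo writes to a temporary directory rather than to a real configuration.

```python
import json
import os
import tempfile
from pathlib import Path


def config_path(env=None, home=None) -> Path:
    """Resolve config.json: $XDG_CONFIG_HOME if set and non-empty, else ~/.config."""
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)
    xdg = env.get("XDG_CONFIG_HOME", "")
    base = Path(xdg) if xdg else home / ".config"
    return base / "proselint" / "config.json"


def disable_checks(path: Path, check_ids) -> dict:
    """Set the given check IDs to false, keeping any existing settings."""
    config = json.loads(path.read_text()) if path.exists() else {}
    checks = config.setdefault("checks", {})
    for check_id in check_ids:
        checks[check_id] = False
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=2) + "\n")
    return config


# Demo against a throwaway directory so no real configuration is touched.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = config_path(env={"XDG_CONFIG_HOME": tmp})
    disable_checks(demo_path, ["typography.diacritical_marks"])
    written = json.loads(demo_path.read_text())

print(written)
```

The check IDs accepted in the "checks" object are the ones listed in the table that follows.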
ID Description
airlinese.misc Avoiding jargon of the airline industry
annotations.misc Catching annotations left in the text
archaism.misc Avoiding archaic forms
cliches.hell Avoiding a common cliché
cliches.misc Avoiding clichés
consistency.spacing Consistent sentence spacing
consistency.spelling Consistent spelling
corporate_speak.misc Avoiding corporate buzzwords
cursing.filth Words to avoid
cursing.nfl Avoiding words banned by the NFL
dates_times.am_pm Using the right form for the time of day
dates_times.dates Stylish formatting of dates
hedging.misc Not hedging
hyperbole.misc Not being hyperbolic
jargon.misc Avoiding miscellaneous jargon
lgbtq.offensive_terms Avoiding offensive LGBTQ terms
lgbtq.terms Misused LGBTQ terms
lexical_illusions.misc Avoiding lexical illusions
links.broken Linking only to existing sites
malapropisms.misc Avoiding common malapropisms
misc.apologizing Being confident
misc.back_formations Avoiding needless backformations
misc.bureaucratese Avoiding bureaucratese
misc.but Avoid starting a paragraph with "But..."
misc.capitalization Capitalizing only what ought to be capitalized
misc.chatspeak Avoiding lolling and other chatspeak
misc.commercialese Avoiding jargon of the commercial world
misc.currency Avoiding redundant currency symbols
misc.debased Avoiding debased language
misc.false_plurals Avoiding false plurals
misc.illogic Avoiding illogical forms
misc.inferior_superior Superior to, not than
misc.latin Avoiding overuse of Latin phrases
misc.many_a Many a singular
misc.metaconcepts Avoiding overuse of metaconcepts
misc.narcissism Talking about the subject, not its study
misc.phrasal_adjectives Hyphenating phrasal adjectives
misc.preferred_forms Miscellaneous preferred forms
misc.pretension Avoiding being pretentious
misc.professions Calling jobs by the right name
misc.punctuation Using punctuation assiduously
misc.scare_quotes Using scare quotes only when needed
misc.suddenly Avoiding the word suddenly
misc.tense_present Advice from Tense Present
misc.waxed Waxing poetic
misc.whence Using "whence"
mixed_metaphors.misc Not mixing metaphors
mondegreens.misc Avoiding mondegreens
needless_variants.misc Using the preferred form
nonwords.misc Avoid using nonwords
oxymorons.misc Avoiding oxymorons
psychology.misc Avoiding misused psychological terms
redundancy.misc Avoiding redundancy and saying things twice
redundancy.ras_syndrome Avoiding RAS syndrome
skunked_terms.misc Avoid using skunked terms
spelling.able_atable -able vs. -atable
spelling.able_ible -able vs. -ible
spelling.athletes Spelling of athlete names
spelling.em_im_en_in -em vs. -im and -en vs. -in
spelling.er_or -er vs. -or
spelling.in_un in- vs. un-
spelling.misc Spelling words correctly
security.credit_card Keeping credit card numbers secret
security.password Keeping passwords secret
sexism.misc Avoiding sexist language
terms.animal_adjectives Animal adjectives
terms.denizen_labels Calling denizens by the right name
terms.eponymous_adjectives Calling people by the right name
terms.venery Call groups of animals by the right name
typography.diacritical_marks Using dïacríticâl marks
typography.exclamation Avoiding overuse of exclamation
typography.symbols Using the right symbols
uncomparables.misc Not comparing uncomparables
weasel_words.misc Avoiding weasel words
weasel_words.very Avoiding the word "very"

Contributing

Interested in contributing to proselint? Great — there are plenty of ways you can help. Read more on our website, where we describe how you can help us build proselint into the greatest writing tool in the world.

Support

If you run into a problem, please open an issue or send an email to [email protected].

Running Automated Tests

Automated tests are included in the proselint/tests directory. To run these tests locally, you can use ./utils.

License

The project is licensed under the BSD license.

proselint's People

Contributors

carreau, catherineh, craigkelly, dependabot[bot], drinks, hugovk, infotexture, j10sanders, joshmgrant, jwilk, kylesezhi, laraross, lcd047, manuel-uberti, marsam, mavit, mpacer, nvuillam, nytelife26, patchranger, patrick96, pyup-bot, ran4, salbertson, saul, suchow, tatsh, tdenewiler, viccuad, vikasgorur


proselint's Issues

Create a plugin system

I want a single code file for each check that:

  1. Implements the check.
  2. Includes a docstring that is autogenerated into a web page.
  3. Includes test cases that do and do not raise an error.

Here is a sample of what I'm imagining:

"""DFW001: Comparing uncomparables.

---
layout:     post
error_code: DFW201
source:     David Foster Wallace
title:      PL001&#58; Comparing an uncomparable
date:       2014-06-10 12:31:19
summary:    Comparing an uncomparable.
categories: check

---

David Foster Wallace says:

> This is one of a class of adjectives, sometimes called "uncomparables", that
can be a little tricky. Among other uncomparables are precise, exact, correct,
entire, accurate, preferable, inevitable, possible, false; there are probably
two dozen in all. These adjectives all describe absolute, non-negotiable
states: something is either false or it's not; something is either
inevitable or it's not. Many writers get careless and try to modify
uncomparables with comparatives like more and less or intensives like very. But
if you really think about them, the core assertions in sentences like "War is
becoming increasingly inevitable as Middle East tensions rise"; "Their cost
estimate was more accurate than the other firms'"; and "As a mortician, he has
a very unique attitude" are nonsense. If something is inevitable, it is bound
to happen; it cannot be bound to happen and then somehow even more bound to
happen. Unique already means one-of-a-kind, so the adj. phrase very unique is
at best redundant and at worst stupid, like "audible to the ear" or
"rectangular in shape". You can blame the culture of marketing for some of
this difficulty. As the number and rhetorical volume of US ads increase, we
become inured to hyperbolic language, which then forces marketers to load
superlatives and uncomparables with high-octane modifiers (special --- very
special --- Super-special! --- Mega-Special!!), and so on. A deeper issue
implicit in the problem of uncomparables is the dissimilarities between
Standard Written English and the language of advertising. Advertising English,
which probably deserves to be studied as its own dialect, operates under
different syntactic rules than SWE, mainly because AE's goals and assumptions
are different. Sentences like "We offer a totally unique dining experience";
"Come on down and receive your free gift"; and "Save up to 50 per cent... and
more!" are perfectly OK in Advertising English — but this is because
Advertising English is aimed at people who are not paying close attention.
If your audience is by definition involuntary, distracted and numbed, then free
gift and totally unique stand a better chance of penetrating — and simple
penetration is what AE is all about. One axiom of Standard Written English is
that your reader is paying close attention and expects you to have done the
same.
"""

import re


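def check(text):
    """Return (line, offset, error_code, msg) for each compared uncomparable."""
    error_code = "PL001"
    msg = "Comparison of an uncomparable."  # do formatting thing

    comparators = [
        "very",
        "more",
        "less",
        "extremely",
        "increasingly"
    ]

    uncomparables = [
        "unique",
        "correct",
        "inevitable",
        "possible",
        "false",
        "true"
    ]

    errors = []
    for comp in comparators:
        for uncomp in uncomparables:
            # Raw f-string so that \s and \b reach the regex engine intact;
            # the word boundaries keep "every unique" from matching.
            pattern = rf"\b{comp}\s{uncomp}\b"
            occurrences = [m.start() for m in re.finditer(pattern, text)]
            for o in occurrences:
                # The line number is still hard-coded to 1 in this sketch.
                errors.append((1, o, error_code, msg))
    return errors


def test1():
    # One case that raises an error and one that does not.
    assert check("He has a very unique attitude.") == [
        (1, 9, "PL001", "Comparison of an uncomparable.")]
    assert check("He has a unique attitude.") == []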

Check for common typographical issues

2 x 4 vs. 2 × 4
2-4 vs. 2–4
Bose-Einstein condensate vs. Bose–Einstein condensate
--- vs. —
+/- vs. ±

(Take a look at Jordan's typography talk for some examples.)
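Several of the pairs above can be caught with plain regular expressions. The following is a rough sketch and not an existing proselint check: the rule list and the `check_typography` name are made up for illustration, the Bose-Einstein case is left out because it needs a list of names, and the number-range rule will also fire on things like ISO dates, so it would need refinement before real use.

```python
import re

# (pattern, advice) pairs; replacement characters are written as escapes.
TYPOGRAPHY_RULES = [
    (re.compile(r"\b\d+\s*x\s*\d+\b"),
     "use the multiplication sign '\u00d7' instead of 'x'"),
    (re.compile(r"\b\d+-\d+\b"),
     "use an en dash '\u2013' for number ranges"),
    (re.compile(r"(?<!-)---(?!-)"),
     "use an em dash '\u2014' instead of '---'"),
    (re.compile(r"\+/-"),
     "use the plus-minus sign '\u00b1' instead of '+/-'"),
]


def check_typography(text):
    """Return sorted (offset, matched_text, advice) tuples."""
    findings = []
    for pattern, advice in TYPOGRAPHY_RULES:
        for match in pattern.finditer(text):
            findings.append((match.start(), match.group(0), advice))
    return sorted(findings)


sample = "A 2 x 4 board, pages 2-4, wait --- what, +/- 3"
for offset, found, advice in check_typography(sample):
    print(f"{offset}: {found!r}: {advice}")
```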

Apply memoized rule checks at the paragraph level

Rules are currently defined as functions over the full text of the document. It would be better to apply the functions to each paragraph separately. The reason for this is that, for many documents (especially large ones), most of the paragraphs will not change between saves or keystrokes, such that when these functions are memoized, most of the linter computations will be available right away.
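A minimal sketch of the idea, assuming paragraphs are separated by blank lines and that rules return (offset, message) pairs relative to the paragraph. The `check_very` rule, the `CALLS` counter, and the function names are hypothetical and exist only to show that unchanged paragraphs are served from the cache.

```python
import re
from functools import lru_cache

CALLS = {"count": 0}  # counts real (uncached) paragraph lints, for the demo


def check_very(paragraph):
    """Hypothetical rule: flag each use of 'very'."""
    return [(m.start(), "weasel word 'very'")
            for m in re.finditer(r"\bvery\b", paragraph)]


RULES = (check_very,)


@lru_cache(maxsize=4096)
def lint_paragraph(paragraph):
    """Run every rule on one paragraph; results are memoized by its text."""
    CALLS["count"] += 1
    return tuple(err for rule in RULES for err in rule(paragraph))


def lint_document(text):
    """Lint paragraph by paragraph, shifting offsets back to document space."""
    errors = []
    offset = 0
    for paragraph in text.split("\n\n"):
        for position, message in lint_paragraph(paragraph):
            errors.append((offset + position, message))
        offset += len(paragraph) + 2  # skip the "\n\n" separator
    return errors


draft = "This is very good.\n\nNothing here."
first_pass = lint_document(draft)

# Simulate a save after appending one paragraph: only it is linted afresh.
second_pass = lint_document(draft + "\n\nA very new paragraph.")
print(first_pass, second_pass, CALLS["count"])
```

One consequence of this design is that rules which need context beyond a single paragraph (consistency checks, for example) cannot be cached this way and would need a separate document-level pass.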

Great writing should come back nearly clean

It would be good to include an automated test suite that runs the linter over writing that is by a great author and has already been heavily edited and copyedited (e.g., an essay from The New Yorker that went on to win the Pulitzer Prize in nonfiction). The linter should be nearly silent.

Create a sports detector

One of the entries in GMAU is:

answer back is a common REDUNDANCY, especially in BrE—e.g.: “Hilary and Piers du Pre seem determined to wreak the ultimate revenge on their sister by discrediting her while she lies—unable to answer back [read answer]—in her grave.” Julian Lloyd Webber, “An Insult to Jackie’s Memory,” Daily Telegraph, 4 Jan. 1999, at 15.

In AmE, the phrase is fairly common in sportswriting in the sense “to equal an opponent’s recent scoring effort”—e.g.:
• “Even when the Cougars did score, the Herd answered back in an instant.” Joe Davidson, “Herd Remain on a Roll,” Sacramento Bee, 21 Nov. 1998, at D1.
• “Jake Armstrong quickly answered back for the Knights, but the two-goal cushion was short-lived.” Joe Connor, “La Jolla, Bishop’s Tie One On in Wester,” San Diego Union-Trib., 16 Dec. 1998, at D6.

Some writers have used the sport phrase metaphorically—e.g.: “The last time somebody tried to impose prohibition on Chicago, the city answered back with Al Capone.” Peter Annin, “Prohibition Revisited?” Newsweek, 7 Dec. 1998, at 68. Despite the currency of this usage, answer can carry the entire load by itself.

LANGUAGE-CHANGE INDEX answer back for answer (outside sports): Stage 3

This pattern, where there is an exception to a rule when talking about a particular topic (or where a rule applies only when talking about the topic) will come up many times.
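One way to express such a topic-dependent exception is to gate the rule on a topic detector. The sketch below is only an illustration: the keyword list is a crude, made-up stand-in for a real sports detector, and the function names and threshold are assumptions.

```python
import re

# Tiny illustrative vocabulary; a real detector would need far more than this.
SPORTS_TERMS = {
    "goal", "goals", "score", "scored", "team", "game", "season",
    "coach", "inning", "touchdown", "playoff",
}


def looks_like_sports(text, threshold=2):
    """Guess the topic by counting sports words (a deliberately crude heuristic)."""
    words = re.findall(r"[a-z]+", text.lower())
    return sum(word in SPORTS_TERMS for word in words) >= threshold


def check_answer_back(text):
    """Flag 'answer back' as a redundancy, except in sportswriting."""
    if looks_like_sports(text):
        return []
    return [
        (m.start(), "Redundancy: 'answer back'. Try 'answer'.")
        for m in re.finditer(r"\banswer(?:s|ed|ing)? back\b", text, re.IGNORECASE)
    ]


sports = ("Even when the Cougars did score, the Herd answered back "
          "in an instant to win the game.")
general = "She lies in her grave, unable to answer back."

print(check_answer_back(sports))
print(check_answer_back(general))
```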

Choose a sensible naming/numbering scheme for errors

The convention is a capital letter and a 3-digit code. For example, pep257 uses codes like D100, D302, etc. It might be nice for us to use a 3-letter code for the source of the advice and a 3-digit code for the specific check, e.g., DFW201. The numeric codes can then be organized across sources according to higher-level categories of errors. For example, 100-level codes might be for overused words, phrases, idioms, symbols, and grammatical structures. The 200-level codes might be for nonsensical structures, such as DFW's comparing uncomparables. This fails if a particular author has more than 99 pieces of advice of a particular kind, but if we run into that problem, then we're doing great. If that happens, it might also suggest that our errors could use some compression (e.g., by merging all the overused single words into one check).

The URLs it leads to are nice and compact, too: http://lifelinter.com/DFW201.
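To make the proposed scheme concrete, here is a small sketch that splits a code into its source and number, looks up the category from the hundreds digit, and builds the URL. The `ErrorCode` type and `parse_error_code` function are hypothetical, and the category table contains only the two levels mentioned above.

```python
import re
from typing import NamedTuple

# Hundreds digit -> category, using only the two levels proposed above.
CATEGORIES = {
    1: "overused words, phrases, idioms, symbols, and grammatical structures",
    2: "nonsensical structures",
}

CODE_PATTERN = re.compile(r"([A-Z]{3})(\d{3})")


class ErrorCode(NamedTuple):
    source: str    # 3-letter code for the source of the advice
    number: int    # 3-digit code for the specific check
    category: str
    url: str


def parse_error_code(code):
    """Split a code such as 'DFW201' into its parts, or raise ValueError."""
    match = CODE_PATTERN.fullmatch(code)
    if match is None:
        raise ValueError(f"not a 3-letter, 3-digit code: {code!r}")
    source, digits = match.groups()
    number = int(digits)
    category = CATEGORIES.get(number // 100, "uncategorized")
    return ErrorCode(source, number, category, f"http://lifelinter.com/{code}")


parsed_code = parse_error_code("DFW201")
print(parsed_code)
```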

Working out how I can best contribute using GitHub/Git

I may need your advice on this one.

I know to do pull requests requires having set up a separate fork of the repo (or at least I think I know that), and I successfully managed to add my fork as a repo, but I fear trying to push changes and overriding anything you've done.

Or should I not worry about that? This is the kind of thing that is most frustrating about trying to work on these projects: I don't want to break anything, but I'm not always sure how to set things up so that everything correctly follows version control protocol.

Architecture for sharing processed data across rules

If we need to use more computationally intense analyses in multiple rules (e.g., nltk and syntax parsing to identify whether "while" is being used as a conjunction or an adverb), it would make more sense to memoize the output so that other rules can access it rather than rerun the analysis.

This should be a fairly general system that automatically builds the data structures so that they can be shared across individual checks, possibly with some kind of a require type statement being present in more than one rule?
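One possible shape for this is a per-document context that builds each derived data structure on first request, plus a decorator through which a rule declares what it needs. Everything in the sketch below is hypothetical (`SharedContext`, `requires`, and the two processors), and the regex tokenizer and sentence splitter are cheap stand-ins for the heavier nltk analyses mentioned above.

```python
import re
from functools import wraps

# Named analyses; each is computed at most once per document.
PROCESSORS = {
    "tokens": lambda text: re.findall(r"\w+", text.lower()),
    "sentences": lambda text: [s for s in re.split(r"(?<=[.!?])\s+", text) if s],
}


class SharedContext:
    """Holds one document and lazily caches analyses shared across rules."""

    def __init__(self, text):
        self.text = text
        self._cache = {}
        self.build_counts = {}  # how often each analysis was computed

    def get(self, name):
        if name not in self._cache:
            self._cache[name] = PROCESSORS[name](self.text)
            self.build_counts[name] = self.build_counts.get(name, 0) + 1
        return self._cache[name]


def requires(*names):
    """Declare which shared analyses a rule needs; they arrive as keyword args."""
    def decorator(rule):
        @wraps(rule)
        def wrapped(context):
            data = {name: context.get(name) for name in names}
            return rule(context.text, **data)
        wrapped.requires = names
        return wrapped
    return decorator


@requires("tokens")
def count_while(text, tokens):
    return tokens.count("while")


@requires("tokens", "sentences")
def words_per_sentence(text, tokens, sentences):
    return len(tokens) / max(len(sentences), 1)


context = SharedContext("While I waited, it rained. I left while it poured.")
while_count = count_while(context)
average = words_per_sentence(context)
print(while_count, average, context.build_counts)
```

Both rules ask for "tokens", but the tokenizer runs only once per document, which is the saving the proposal is after.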

Rubric for hard-to-implement features

Develop a systematic way to describe features in the extracted sources that are not easily implementable, recording why each one is hard to implement and any clues as to which sources may provide a solution.

Run checks in parallel

There's an opportunity to run the linter in a way that's massively parallel. The main insights here are that many of the rules can be run independently of each other and that they can be run independently on separate parts of the text (e.g., at the paragraph level).
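The two kinds of independence combine naturally: every (rule, paragraph) pair can be its own job. The sketch below uses a thread pool from the standard library to keep the example short; the rules and function names are hypothetical, and for CPU-bound rules in CPython a process pool would probably be needed to see a real speedup.

```python
import re
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def check_very(paragraph):
    """Hypothetical rule: flag each use of 'very'."""
    return [(m.start(), "weasel word 'very'")
            for m in re.finditer(r"\bvery\b", paragraph)]


def check_repeated_word(paragraph):
    """Hypothetical rule: flag an immediately repeated word."""
    return [(m.start(), f"repeated word '{m.group(1)}'")
            for m in re.finditer(r"\b(\w+) \1\b", paragraph, re.IGNORECASE)]


RULES = [check_very, check_repeated_word]


def split_paragraphs(text):
    """Yield (document_offset, paragraph) for blank-line separated paragraphs."""
    offset = 0
    for paragraph in text.split("\n\n"):
        yield offset, paragraph
        offset += len(paragraph) + 2  # skip the "\n\n" separator


def run_job(job):
    """Apply one rule to one paragraph and shift offsets to document space."""
    rule, (offset, paragraph) = job
    return [(offset + position, message) for position, message in rule(paragraph)]


def lint_parallel(text, max_workers=4):
    jobs = list(product(RULES, split_paragraphs(text)))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        batches = list(pool.map(run_job, jobs))
    return sorted(error for batch in batches for error in batch)


document = "This is very good.\n\nIt is the the best."
parallel_errors = lint_parallel(document)
print(parallel_errors)
```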

Unincorporated clichés from GMAU

The following need some more thought before including.

  • "inclement weather", ?
  • "there is wide support" in politics
  • boasts as a transitive verb,
  • choreograph used figuratively,
  • giveth ... taketh away
  • orchestrate in nonmusical contexts
  • venerable when used for 'old'

It would also be good to go through all the clichés and think of variant forms that might appear.
