Comments (6)
On a related note, I'm not wild about referring to the readability or capitalization checks as "extension points." These feel much less abstract than the others (in reality, they're really modified existence checks).
This is also relevant to my goal of supporting externally-defined checks that don't directly use one of the extension points (see #45 (comment) for details).
from vale.
Question: Does this "plugin" ignore content in Markdown that does not appear in a doc build? I'm thinking about links and descriptions such as alt text.
In other words, would a page full of links like [some word](../path/to/something-complex) bias the results? IIRC, the Flesch-Kincaid calculations would read bits like the relative path URL as a single (complicated) word.
Example: when I run https://developer.cobalt.io/getting-started/sign-in/ through:
- The WebFX tool (https://www.webfx.com/tools/read-able/), I get a score of 5.8
- Vale's flesch-kincaid plugin, I get a score of 8.09
My wild guess: Vale's flesch-kincaid plugin also reads links in Markdown, such as [some word](../path/to/something-complex), as single words, which would increase the score.
Thanks for posting this question, @mjang. (For context: we've been chatting on Slack and spitballing ideas of why the scores differ.)
Another idea: I wonder if the web tools are also counting sidebars and menus.
Some examples:
- https://docs.gitlab.com/ee/user/admin_area/settings/visibility_and_access_controls.html
  - 11.83 in Vale version, 8.3 in WebFX version
- https://docs.gitlab.com/ee/ci/pipeline_editor/
  - 10.62 in Vale version, 8.7 in WebFX version
- https://docs.gitlab.com/ee/topics/gitlab_flow.html
  - 8.97 in Vale version, 6.5 in WebFX version
Question: Does this "plugin" ignore content in Markdown that does not appear in a doc build? I'm thinking about links and descriptions such as alt text.
Yes -- Vale tries to be as accurate as possible when calculating these metrics. It uses its summary scope, which strictly follows the formula: (1) it doesn't include non-prose content (links, HTML tags, source code, front matter, etc.) and (2) it only operates on sentence-containing blocks.
There are a few problems with the comparison to WebFX:
- If you pass a link to a web page, it uses the entire page -- not just the equivalent Markdown contents.
- It "strips" HTML naively, which results in it using source code, tables, and other non-prose content in its calculations.
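The second point can be sketched in a few lines of Python. This is only an illustration of the general idea, not Vale's or WebFX's actual implementation: `ProseExtractor`, `extract_words`, and the `SKIPPED_TAGS` set are hypothetical names, and the sample reuses one sentence and one diagram line from the gitlab_flow page.

```python
from html.parser import HTMLParser


class ProseExtractor(HTMLParser):
    """Collect text from HTML, optionally skipping non-prose elements.

    Hypothetical helper for illustration; not how Vale or WebFX work internally.
    """

    # Assumption: these tags are treated as "non-prose" for this sketch.
    SKIPPED_TAGS = {"pre", "code", "script", "style", "table"}

    def __init__(self, skip_non_prose):
        super().__init__()
        self.skip_non_prose = skip_non_prose
        self.depth = 0    # nesting depth inside skipped elements
        self.chunks = []  # text fragments that count as content

    def handle_starttag(self, tag, attrs):
        if self.skip_non_prose and tag in self.SKIPPED_TAGS:
            self.depth += 1

    def handle_endtag(self, tag):
        if self.skip_non_prose and tag in self.SKIPPED_TAGS and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a skipped element.
        if self.depth == 0:
            self.chunks.append(data)


def extract_words(html, skip_non_prose):
    """Return the whitespace-separated tokens that would be counted as words."""
    parser = ProseExtractor(skip_non_prose)
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks).split()


SAMPLE = (
    "<p>It offers a transparent and effective way to work with Git:</p>"
    "<pre><code>A[Working copy] --> |git add| B[Index]</code></pre>"
)

naive = extract_words(SAMPLE, skip_non_prose=False)  # tags dropped, all text kept
prose = extract_words(SAMPLE, skip_non_prose=True)   # code block ignored

print(len(naive), len(prose))
```

With naive stripping, tokens such as `-->` and `B[Index]` are counted as words, which changes both the word count and the average word complexity that a readability formula sees.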
Here's an example HTML document (a snippet from gitlab_flow):
<p>Organizations coming to Git from other version control systems frequently find it hard to develop a productive workflow.
This article describes GitLab flow, which integrates the Git workflow with an issue tracking system.
It offers a transparent and effective way to work with Git:</p>
<pre><code class="language-mermaid">graph LR
subgraph Git workflow
A[Working copy] --> |git add| B[Index]
B --> |git commit| C[Local repository]
C --> |git push| D[Remote repository]
end
</code></pre>
- WebFX reports 10 sentences, 68 words, and a Flesch-Kincaid Grade Level of 7.2, which is wildly inaccurate.
- Vale, on the other hand, internally calculates 3 sentences, 44 words, and a score of 10.78.
Let's break this down:
Sentence 1 [18 words]: Organizations coming to Git from other version control systems frequently find it hard to develop a productive workflow.
Sentence 2 [15 words]: This article describes GitLab flow, which integrates the Git workflow with an issue tracking system.
Sentence 3 [11 words]: It offers a transparent and effective way to work with Git:
Total: 3 sentences, 44 words.
If we pass just the "correct" text to WebFX, it changes its calculations to 3 sentences, 44 words, and a score of 10.2. The remaining difference is likely from the calculation of "complex words" and syllables, but it's much closer.
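For reference, the standard Flesch-Kincaid Grade Level formula is 0.39 × (words / sentences) + 11.8 × (syllables / words) − 15.59. The sketch below plugs in the counts from this example. The syllable counts (77 and 75) are assumptions obtained by working backward from the reported scores; neither tool reports them, and `flesch_kincaid_grade` is a hypothetical helper name.

```python
def flesch_kincaid_grade(sentences, words, syllables):
    """Standard Flesch-Kincaid Grade Level formula."""
    if sentences <= 0 or words <= 0:
        raise ValueError("sentences and words must be positive")
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


# 3 sentences and 44 words come from the breakdown above.
# The syllable counts are back-solved assumptions, not tool output.
with_77_syllables = flesch_kincaid_grade(3, 44, 77)
with_75_syllables = flesch_kincaid_grade(3, 44, 75)

print(round(with_77_syllables, 2))  # 10.78
print(round(with_75_syllables, 1))  # 10.2
```

Under these assumptions, a difference of about two syllables across 44 words is enough to move the score from 10.78 to roughly 10.2, which fits the idea that the remaining gap comes from syllable counting rather than from sentence or word counts.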
I'm reopening this issue because I think it would be useful to add a "View: Readability" option to https://vale-studio.errata.ai/.
To extend the discussion from the Write the Docs Slack:
I need to be able to do an "apples to apples" comparison of Flesch-Kincaid scores, and it's difficult at best to apply the Vale plugin to HTML content. (Sure, I could pull the source code from external HTML into a repo, but that requires understanding Git, repos, and Vale.)
So I need to know: do you have, or know of, a web tool that gives results consistent with your Flesch-Kincaid plugin?