Comments (9)
Some background
There is a longnstanding issue that API Documenter generates "markdown" yet there is no standardized syntax for Markdown. Of course CommonMark is such a standard, but all real world implementations add additional custom syntaxes, and because CommonMark lacks formalized extensibility, such custom syntaxes will always break the ability to parse true CommonMark correctly. As one example, suppose we introduce :::tip
as mentioned above - consider this case:
:::tip
```
[link](http://example.com)
:::
```
Is [link](http://example.com)
a hyperlink or not? Most likely the :::tip
parser will naively snip out the code and parse it as Markdown, interpreting ```
as literal backticks because the closing ```
is now outside the chunk being parsed. As a result [link](http://example.com)
is a real hyperlink. Whereas CommonMark would interpret [link](http://example.com)
as being inside a code fence and NOT a hyperlink.
This mutual intelligibility is a mild inconvenience for human authors, but a major frustration for automated tools such as API Documenter that need to emit thousands of pages of API docs and have them render correctly. (BTW TSDoc explicitly aims to solve this problem by forbidding custom syntaxes and relying instead on HTML elements for advanced custom syntaxes.)
What it means
Until now, API Documenter has aimed to produce a balanced dialect of markdown syntax that is likely to get rendered correctly by the most popular engines (Jekyll and GitHub at the time). Where it doesn't work, people have implemented various hacks like api-documenter-docusaurus-plugin that try rewrite the incorrect expressions. That is exactly what @Lightning00Blade did, too.
PR #4578 will break that delicate equilibrium, so unfortunately we probably shouldn't have merged this PR. The safest move may be to revert it. 😅
A better solution
But how to solve @phryneas's problem then?
What we can do instead is introduce api-documenter.json settings to control the output dialect. In this way API Documenter can be optimized for a known Markdown engine rather than trying to be a "least common denominator." We've been planning this already, since Docusuarus 3 has finally adopted standard MDX (whose syntax is based on JSX not CommonMark and thus really needs its own target profile).
This week I'll put together a design proposal.
from rushstack.
PS: Before you start targeting different Markdown engines (which would probably end up with a ton of work for you guys, both to write and maintain), one suggestion from me:
Roll back my change from #4578 and instead consider either a HTML emitter (or a Markdown AST emitter?).
Most people will use the generated markdown as an intermediate step to HTML anyways, and catering to a dozen engines that will all result in HTML is probably more work than directly generating that HTML (which is probably supported by all consumers like Docusaurus as well, especially if it sticks to semantic HTML with "markdown-y" tags only).
from rushstack.
The old table formatting didn't inject :::
. Is that part of your own table contents that is now getting stripped out?
Can you share a repro?
from rushstack.
@iclanton It's part of our in our TsDoc
summery:
class A {
/**
* :::tip
*
* I am a tip
*
* :::
* /
doSomething(){}
}
We use things on top of that to make messages more distinguished when people visit the website Example Tip.
The problem is that now you emit table
(which is fine), but our custom format start being unable to parse these combination. We had similar issue with other things and used the getEscapedText()
to remove things and get the correct formatting. We also used getTableEscapedText()
to do the same for TableCells.
I can confirm that getTableEscapedText()
is never called after #4578, and you can see that this line removes the check and never reintroduces it. This may make the current version prone to XSS due to text never being escaped.
Here is a Gist to change the apps/api-documenter/src/markdown/test/CustomMarkdownEmitter.test.ts
https://gist.github.com/Lightning00Blade/5904e90b3c92f3e16580dedc832e5150
When you run the test I the snapshot should sort of pass (most likely some extra new line will be added) for old version.
from rushstack.
I believe you were relying on an implementation detail here rather than on an intentional API.
Looking at where that function comes from, it was introduced in #1183 to HTML-escape some characters that could not be rendered in Github-style markdown tables.
I don't think it was meant to be extended to add further functionality on top :(
To be honest, I missed that function in my PR, or I would probably completely have removed it.
This may make the current version prone to XSS due to text never being escaped.
I don't think that this function was meant to prevent XSS in the first place - it was added because type unions could not be displayed in markdown tables. (#1065).
Could you maybe elaborate what attack you are foreseeing here, and how you are relying on that?
As for your code above to remove :::tip
sections - maybe that could be achieved in a remark plugin? 🤔
from rushstack.
@octogonz are you sure that's the core of this issue, though?
I believe we interpret the issue reported here differently, so I want to double-check if we're at the same point here.
If I am understanding @Lightning00Blade's report correctly, it's not about that it's breaking markdown generation or that the generated markdown isn't parsable in their dialect anymore.
They relied on the fact that there was a getTableEscapedText
function that was called only inside of tables, because they were overwriting this (implementation detail?) with their own functionality by extending the ApiFormatterMarkdownEmitter
and overwriting that protected
function - and now that there's no inherent reason for that escaping anymore, their override isn't called anymore.
Essentially, instead of using an external plugin like api-documenter-docusaurus-plugin
, they were changing the implementation itself.
Of course, I don't know if my change breaks anything else regarding markdown parsing (I believe it should be more stable for most parsers out there, or I wouldn't have opened the PR), but looking at all the related PRs and issues around the generation of that function, it seems like getTableEscapedText
was not meant as a public API with the intent of "library users should override it".
If getTableEscapedText
was indeed an intended public api, it should also be possible to restore that method call without rolling all other changes back, or creating multiple different emitter strategies.
from rushstack.
Firstly I want to clarify that I know we are using internal APIs, that may brake without notice, the burden is on us to fix them (be it with more overwrites or other patches).
I reported an issue due to this not being a deliberate one while making the changes.
Also I have reworked some our TsDocs to be more inline with what the transformer is expecting. And no patching is needed anymore.
The issue was we had Markdown that was not fully what it expected, but due to all it being Markdown it was making the correct assumptions of how to patch it. With the new HTML it can no longer do that.
Keeping the bug open (at least) until the functionality is added back or fully remove, based on solely your decisions.
from rushstack.
Roll back my change from #4578
Thanks for your contribution @phryneas. I appreciate that you solved some issues in that PR, but unfortunately it comes at a price (for me, at least). The problem is that my Markdown renderer (Marked) doesn't, to my knowledge, allow me to customize the HTML output for elements that are already encoded in HTML rather than Markdown syntax.
from rushstack.
Seems that Material for Mkdocs doesn't like HTML tables either. Markdown syntax introduced in there isn't rendered.
You can check a (short living) example here:
https://19acf12c.ngx-meta.pages.dev/api/ngx-meta/
I'd say same would happen for sites powered by Mkdocs but not 100% sure
from rushstack.
Related Issues (20)
- [api-extractor] Inline imports sometimes incorrectly hoisted (unmodified) into rollups HOT 1
- [api-extractor-model] Broken link HOT 1
- [heft] Error: Cannot find module 'colors/safe' HOT 6
- [heft] Cannot find module '@rushstack/terminal' HOT 1
- [api-extractor] #3465 incorrectly changed defaults for `reportFolder` in the template HOT 2
- [rush] Rush should not write to the build cache when build contains warnings
- [rush] eslint command in rush autoinstallers broken after updated the rush and pnpm version HOT 5
- [rush] Using install throw "??=" error HOT 6
- [api-extractor] Support Typescript 5.4 HOT 6
- [rush] Rush shouldn't need to determine the merge base if commit hash has been provided HOT 2
- [rush] browser-approved-packages.json is not updated when dependencies are removed HOT 2
- [rush] `rush install --only .` installs all dependencies in the monorepo instead of only the dependencies in current folder/project HOT 10
- [@rushstack/terminal] Remove support for the legacy `IColorableSequence` objects.
- [heft-jest] In watch mode, changedFiles should only remove files that match before/after the run HOT 1
- [heft-typescript] Bound current directory in sys.watchFile and sys.watchDirectory
- [rush] Make `install` and `update` log package manager to a file in common/temp
- [api-documenter]
- [rush] Find rush.json location by using 'dirname' at most 10 times HOT 8
- [api-extractor] Allow generation of API reports per release level HOT 1
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from rushstack.