Summary Then running generation of docs I want to have custom esca

<a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="/us

Roll back my change from <a class="issue-link js-issue-link" data-error-t

[api-documenter] Regression in Table Cells escaping with latest version about rushstack HOT 9 OPEN

Lightning00Blade commented on June 3, 2024 1

[api-documenter] Regression in Table Cells escaping with latest version

from rushstack.

Comments (9)

octogonz commented on June 3, 2024 1

Some background

There is a longnstanding issue that API Documenter generates "markdown" yet there is no standardized syntax for Markdown. Of course CommonMark is such a standard, but all real world implementations add additional custom syntaxes, and because CommonMark lacks formalized extensibility, such custom syntaxes will always break the ability to parse true CommonMark correctly. As one example, suppose we introduce :::tip as mentioned above - consider this case:

:::tip
```
[link](http://example.com)
:::
```

Is [link](http://example.com) a hyperlink or not? Most likely the :::tip parser will naively snip out the code and parse it as Markdown, interpreting ``` as literal backticks because the closing ``` is now outside the chunk being parsed. As a result [link](http://example.com) is a real hyperlink. Whereas CommonMark would interpret [link](http://example.com) as being inside a code fence and NOT a hyperlink.

This mutual intelligibility is a mild inconvenience for human authors, but a major frustration for automated tools such as API Documenter that need to emit thousands of pages of API docs and have them render correctly. (BTW TSDoc explicitly aims to solve this problem by forbidding custom syntaxes and relying instead on HTML elements for advanced custom syntaxes.)

What it means

Until now, API Documenter has aimed to produce a balanced dialect of markdown syntax that is likely to get rendered correctly by the most popular engines (Jekyll and GitHub at the time). Where it doesn't work, people have implemented various hacks like api-documenter-docusaurus-plugin that try rewrite the incorrect expressions. That is exactly what @Lightning00Blade did, too.

PR #4578 will break that delicate equilibrium, so unfortunately we probably shouldn't have merged this PR. The safest move may be to revert it. 😅

A better solution

But how to solve @phryneas's problem then?

What we can do instead is introduce api-documenter.json settings to control the output dialect. In this way API Documenter can be optimized for a known Markdown engine rather than trying to be a "least common denominator." We've been planning this already, since Docusuarus 3 has finally adopted standard MDX (whose syntax is based on JSX not CommonMark and thus really needs its own target profile).

This week I'll put together a design proposal.

from rushstack.

phryneas commented on June 3, 2024 1

PS: Before you start targeting different Markdown engines (which would probably end up with a ton of work for you guys, both to write and maintain), one suggestion from me:

Roll back my change from #4578 and instead consider either a HTML emitter (or a Markdown AST emitter?).
Most people will use the generated markdown as an intermediate step to HTML anyways, and catering to a dozen engines that will all result in HTML is probably more work than directly generating that HTML (which is probably supported by all consumers like Docusaurus as well, especially if it sticks to semantic HTML with "markdown-y" tags only).

from rushstack.

iclanton commented on June 3, 2024

The old table formatting didn't inject :::. Is that part of your own table contents that is now getting stripped out?

Can you share a repro?

from rushstack.

Lightning00Blade commented on June 3, 2024

@iclanton It's part of our in our TsDoc summery:

class A {
/**
   * :::tip
   *
   * I am a tip
   *
   * :::
   * /
doSomething(){}
}

We use things on top of that to make messages more distinguished when people visit the website Example Tip.

The problem is that now you emit table (which is fine), but our custom format start being unable to parse these combination. We had similar issue with other things and used the getEscapedText() to remove things and get the correct formatting. We also used getTableEscapedText() to do the same for TableCells.

I can confirm that getTableEscapedText() is never called after #4578, and you can see that this line removes the check and never reintroduces it. This may make the current version prone to XSS due to text never being escaped.

Here is a Gist to change the apps/api-documenter/src/markdown/test/CustomMarkdownEmitter.test.ts
https://gist.github.com/Lightning00Blade/5904e90b3c92f3e16580dedc832e5150

When you run the test I the snapshot should sort of pass (most likely some extra new line will be added) for old version.

from rushstack.

phryneas commented on June 3, 2024

I believe you were relying on an implementation detail here rather than on an intentional API.

Looking at where that function comes from, it was introduced in #1183 to HTML-escape some characters that could not be rendered in Github-style markdown tables.

I don't think it was meant to be extended to add further functionality on top :(

To be honest, I missed that function in my PR, or I would probably completely have removed it.

This may make the current version prone to XSS due to text never being escaped.

I don't think that this function was meant to prevent XSS in the first place - it was added because type unions could not be displayed in markdown tables. (#1065).

Could you maybe elaborate what attack you are foreseeing here, and how you are relying on that?

As for your code above to remove :::tip sections - maybe that could be achieved in a remark plugin? 🤔

from rushstack.

phryneas commented on June 3, 2024

@octogonz are you sure that's the core of this issue, though?
I believe we interpret the issue reported here differently, so I want to double-check if we're at the same point here.

If I am understanding @Lightning00Blade's report correctly, it's not about that it's breaking markdown generation or that the generated markdown isn't parsable in their dialect anymore.

They relied on the fact that there was a getTableEscapedText function that was called only inside of tables, because they were overwriting this (implementation detail?) with their own functionality by extending the ApiFormatterMarkdownEmitter and overwriting that protected function - and now that there's no inherent reason for that escaping anymore, their override isn't called anymore.
Essentially, instead of using an external plugin like api-documenter-docusaurus-plugin, they were changing the implementation itself.

Of course, I don't know if my change breaks anything else regarding markdown parsing (I believe it should be more stable for most parsers out there, or I wouldn't have opened the PR), but looking at all the related PRs and issues around the generation of that function, it seems like getTableEscapedText was not meant as a public API with the intent of "library users should override it".

If getTableEscapedText was indeed an intended public api, it should also be possible to restore that method call without rolling all other changes back, or creating multiple different emitter strategies.

from rushstack.

Lightning00Blade commented on June 3, 2024

Firstly I want to clarify that I know we are using internal APIs, that may brake without notice, the burden is on us to fix them (be it with more overwrites or other patches).
I reported an issue due to this not being a deliberate one while making the changes.

Also I have reworked some our TsDocs to be more inline with what the transformer is expecting. And no patching is needed anymore.
The issue was we had Markdown that was not fully what it expected, but due to all it being Markdown it was making the correct assumptions of how to patch it. With the new HTML it can no longer do that.

Keeping the bug open (at least) until the functionality is added back or fully remove, based on solely your decisions.

from rushstack.

nsaunders commented on June 3, 2024

Roll back my change from #4578

Thanks for your contribution @phryneas. I appreciate that you solved some issues in that PR, but unfortunately it comes at a price (for me, at least). The problem is that my Markdown renderer (Marked) doesn't, to my knowledge, allow me to customize the HTML output for elements that are already encoded in HTML rather than Markdown syntax.

from rushstack.

davidlj95 commented on June 3, 2024

Seems that Material for Mkdocs doesn't like HTML tables either. Markdown syntax introduced in there isn't rendered.

You can check a (short living) example here:
https://19acf12c.ngx-meta.pages.dev/api/ngx-meta/

I'd say same would happen for sites powered by Mkdocs but not 100% sure

from rushstack.

[api-documenter] Regression in Table Cells escaping with latest version about rushstack HOT 9 OPEN

Comments (9)

Some background

What it means

A better solution

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent