
Comments (9)

adon-at-work commented on July 17, 2024

I can see your point about the double-encoding issues.

First, excuse me for not being able to disclose how Yahoo implements this internally.

But let's take one of the most popular open-source projects, WordPress, as an example. From https://github.com/WordPress/WordPress/search?utf8=%E2%9C%93&q=rawurlencode, you can see that rawurlencode() (i.e., http://php.net/manual/en/function.rawurlencode.php, much like a PHP version of encodeURI() implementing RFC 3986) is found where the URL is about to be concatenated into the HTML output. And occasionally, urldecode() runs right before that. It's inapparent to me that rawurlencode() is placed before DB calls. A modern DB can properly store non-alphanumeric characters.

from xss-filters.

zerkms commented on July 17, 2024

@adon-at-work

> It's inapparent to me that rawurlencode() is placed before DB calls. A modern DB can properly store non-alphanumeric characters.

It's not obvious why one should do that.

Not to mention that WordPress is an example of terrible coding practices that one should never use as an argument.


adon-at-work commented on July 17, 2024

> > It's inapparent to me that rawurlencode() is placed before DB calls. A modern DB can properly store non-alphanumeric characters.

> It's not obvious why one should do that.

@zerkms, as long as the DB can store those non-alphanumeric characters, you're agreeing that rawurlencode() is not needed before data is stored at rest, right?

I concur that WordPress might not be a good enough example. So, let's take a look at the following one.

In general, the DB should store the raw text from users, e.g. http://example.com/你好, which will then be encoded using encodeURI() or encodeURIComponent() depending on the output context.

  • In <a href="/redirect?url={{url}}">, encodeURIComponent('http://example.com/你好') is needed, as in uriComponentInDoubleQuotedAttr()
  • In <a href="{{url}}">, encodeURI('http://example.com/你好') is needed, as in uriInDoubleQuotedAttr()

If input filtering using encodeURI() is applied to the url before it is saved into the DB, what is stored in the DB will be http://example.com/%E4%BD%A0%E5%A5%BD. Imagine @bitinn's suggestion is followed, i.e., removing the encodeURI() and encodeURIComponent() at the output filters. One could then expect to see something like <a href="/redirect?url=http://example.com/%E4%BD%A0%E5%A5%BD">, which is undesirable.

That is because the correctly encoded output should be <a href="/redirect?url=http%3A%2F%2Fexample.com%2F%E4%BD%A0%E5%A5%BD">.

So, unfortunately, @bitinn may need to work around the problem by running decodeURI() on everything that was previously encoded before using the context-aware output filters (e.g., uriInDoubleQuotedAttr(decodeURI(url))).
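To make the difference concrete, here is a minimal sketch using only the built-in JavaScript functions. It does not call xss-filters itself (its context-aware filters do more than plain encoding), and the variable names are made up for illustration.

```javascript
// Raw (decoded) URL, as it would ideally be stored.
const rawUrl = 'http://example.com/你好';

// Whole-URL context, e.g. <a href="{{url}}">: ":" and "/" are kept intact.
const asHref = encodeURI(rawUrl);

// Query-parameter context, e.g. <a href="/redirect?url={{url}}">:
// ":" and "/" are escaped as well.
const asParam = encodeURIComponent(rawUrl);

// If the stored value is already encoded, encoding it again escapes "%" itself.
const doubleEncoded = encodeURI(asHref);

// Decoding first brings the value back to raw text, so encoding happens once.
const fixedParam = encodeURIComponent(decodeURI(asHref));

console.log(asHref);        // http://example.com/%E4%BD%A0%E5%A5%BD
console.log(asParam);       // http%3A%2F%2Fexample.com%2F%E4%BD%A0%E5%A5%BD
console.log(doubleEncoded); // http://example.com/%25E4%25BD%25A0%25E5%25A5%25BD
console.log(fixedParam);    // http%3A%2F%2Fexample.com%2F%E4%BD%A0%E5%A5%BD
```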


bitinn commented on July 17, 2024

@adon-at-work just want to clarify one main point: the raw user input is http://example.com/%E4%BD%A0%E5%A5%BD, because the browser automatically escapes these characters when the user copies the url from the address bar.

So you mean it's better to do decodeURI on input than on output, in general?


adon-at-work commented on July 17, 2024

Yes, something like uriInDoubleQuotedAttr(decodeURI(url)) will ensure a correct output.


adon-at-work commented on July 17, 2024

To further elaborate, I guess you're asking which of the following is preferable:

  • When collecting input: url = decodeURI(url); saveDB(url). At output: uriInDoubleQuotedAttr(url)
  • When collecting input: saveDB(url). At output: uriInDoubleQuotedAttr(decodeURI(url))

Surely, both will work. The first approach is better since decodeURI() runs only once per 'save', and does not need to run every time the page is visited.
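The two options can be sketched side by side as follows. This is only an illustration under assumptions: the Map stands in for the DB, plain encodeURI() stands in for the output filter, and all function names are hypothetical. One point the sketch adds is that decodeURI() throws a URIError on malformed escape sequences, so user input likely needs a guard.

```javascript
// Hypothetical in-memory stand-in for the DB.
const db = new Map();

// decodeURI() throws URIError on malformed input such as '%E4%A',
// so fall back to the original string in that case.
function safeDecode(url) {
  try {
    return decodeURI(url);
  } catch (e) {
    return url;
  }
}

// Approach 1: decode once when the URL is collected.
function saveDecoded(id, url) { db.set(id, safeDecode(url)); }
function renderStored(id) { return encodeURI(db.get(id)); }

// Approach 2: store as received, decode on every render.
function saveAsIs(id, url) { db.set(id, url); }
function renderDecoding(id) { return encodeURI(safeDecode(db.get(id))); }

// A URL as pasted from the address bar, already percent-encoded.
const pasted = 'http://example.com/%E4%BD%A0%E5%A5%BD';
saveDecoded('a', pasted);
saveAsIs('b', pasted);

const out1 = renderStored('a');
const out2 = renderDecoding('b');
console.log(out1 === out2); // true
```

Both paths produce the same href value; the first simply pays the decoding cost once per save instead of once per page view.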


bitinn commented on July 17, 2024

@adon-at-work cheers! On a semi-related note: we use xss-filters with virtual-dom and find that in many instances we have to decode the xss-filtered output again for virtual-dom to output properly, e.g.:

var input = '> <';
var filtered = filter.inHTMLData(input); // > &lt;
var vdom = h('div', filtered); // > &amp;lt;

So we end up needing to decode again before step 3, because on the client side virtual-dom eventually generates text nodes using document.createTextNode, which auto-encodes &. And on the server we follow the same standard.

TL;DR: in many vdom solutions the encoding is done automatically, leading to some performance loss when using xss-filters (nothing serious, but annoying enough).
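The double escaping can be reproduced without either library. In the sketch below, both functions are simplified, hypothetical stand-ins: filterHtmlData() only mimics the '<' escaping shown above, and serializeText() only mimics a renderer that escapes '&' and '<' when writing a text node out as HTML (real serializers may escape more characters, such as '>').

```javascript
// Stand-in for an output filter for HTML data context: escapes "<" only.
function filterHtmlData(s) {
  return s.replace(/</g, '&lt;');
}

// Stand-in for a renderer serializing a text node: escapes "&" first, then "<".
function serializeText(s) {
  return s.replace(/&/g, '&amp;').replace(/</g, '&lt;');
}

const input = '> <';

// Filtering and then letting the renderer escape again double-encodes "&lt;".
const doubled = serializeText(filterHtmlData(input));

// Passing the raw text to the renderer escapes it exactly once.
const once = serializeText(input);

console.log(doubled); // > &amp;lt;
console.log(once);    // > &lt;
```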


bitinn commented on July 17, 2024

I will open an issue on virtual-dom; closing this for now.


adon-at-work commented on July 17, 2024

> So we end up needing to decode again before step 3

Applying xss filters anywhere other than the last step before output is always error-prone.

> on the client side virtual-dom eventually generates text nodes using document.createTextNode

No xss filters are needed if document.createTextNode() is used. But I don't know virtual-dom well enough to tell whether it is used 100% of the time. You may want to confirm with them.

