
Comments (8)

kmike commented on August 24, 2024

There is a stalled PR to address that: #25

from w3lib.

odinplus commented on August 24, 2024

Yes @kmike, that PR addresses exactly this issue. If it is impossible to find a general solution for safe symbols, maybe it is worth giving optional control to the user? For example, with an additional parameter in request.meta.
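A minimal sketch of what such user control could look like, using only the standard library. The `meta` dict stands in for request.meta, and the `url_safe_chars` key, `DEFAULT_SAFE`, and `escape_with_override` are hypothetical names invented for illustration; none of them exist in w3lib or Scrapy.

```python
from urllib.parse import quote

# Hypothetical default: "|" is listed as safe, so it is left unescaped.
DEFAULT_SAFE = "/:@!$&'()*+,;=|%"


def escape_with_override(path, meta=None):
    """Percent-encode a path, letting the caller override the safe set.

    "url_safe_chars" is a made-up key, not an existing request.meta option.
    """
    safe = (meta or {}).get("url_safe_chars", DEFAULT_SAFE)
    return quote(path, safe=safe)


# Default behaviour keeps the pipe as-is.
default_result = escape_with_override("/a|b")
# A caller whose target server rejects "|" can narrow the safe set.
strict_result = escape_with_override("/a|b", {"url_safe_chars": "/"})
```

With the narrowed safe set the pipe is sent as %7C, while other requests keep the default behaviour.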


kmike commented on August 24, 2024

@odinplus I wonder how this site works with Firefox, since according to @redapple's test Firefox doesn't encode | either.


odinplus commented on August 24, 2024

@kmike with Firefox the site responds with the same 400 status code if there is a pipe symbol in the URL.


kmike commented on August 24, 2024

It seems there is still no consensus between browsers on how to handle different characters in the URL path (e.g. https://bugzilla.mozilla.org/show_bug.cgi?id=1064700). This means that a website which works in one browser may break in another, and we can't create an escaping method which works everywhere. #25 is merged, but we've removed | handling from it, so #25 does not fix this particular issue. | handling was removed for these reasons:

  1. it makes changes smaller, more focused and less controversial;
  2. Firefox handles | the same way as w3lib, so it is not that | handling is incorrect per se;
  3. Chrome handles | differently in path and in query, while safe_url_string currently doesn't make this distinction and uses the same set of chars for both; it likely should, though.
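To illustrate point 3, here is a rough sketch of escaping path and query with separate safe sets, built on the standard library rather than on safe_url_string itself. `PATH_SAFE`, `QUERY_SAFE`, and `escape_by_component` are hypothetical names, and the sets are picked only to show the mechanism; they are not a claim about what Chrome or any other browser actually does.

```python
from urllib.parse import quote, urlsplit, urlunsplit

# Hypothetical safe sets, chosen only for illustration.
# "%" is listed so that existing percent-escapes are not encoded twice.
PATH_SAFE = "/:@!$&'()*+,;=~%"       # "|" is absent, so it is escaped in the path
QUERY_SAFE = "/:@!$&'()*+,;=?~%|"    # "|" is present, so it is kept in the query


def escape_by_component(url):
    """Percent-encode path and query using a different safe set for each."""
    parts = urlsplit(url)
    return urlunsplit((
        parts.scheme,
        parts.netloc,
        quote(parts.path, safe=PATH_SAFE),
        quote(parts.query, safe=QUERY_SAFE),
        parts.fragment,
    ))


result = escape_by_component("http://example.com/a|b?x=1|2")
```

Under these assumed sets the pipe in the path becomes %7C while the pipe in the query is left alone.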

I'm not opposed to changing the way | is handled; we can do it in a separate PR.

But even if we do it, we still won't cover all cases, because behavior differs between browsers. @nyov proposed to have an option to specify which browser we should emulate (#25 (comment)). I think it may require work to maintain, because browsers change, so these rules are not set in stone. They already changed between the experiments that @dangra and @redapple ran.

I wonder if a more future-proof (though less user-friendly) way to tackle this is to fix scrapy/scrapy#833.


Gallaecio commented on August 24, 2024

I think Firefox’s approach is the right one in light of https://url.spec.whatwg.org/, which should be considered the latest URL standard.

However, until adoption grows, I wonder if we should, as you @kmike suggest, update safe_url_string to be “safer”.


Gallaecio commented on August 24, 2024

“we can't create [an] escaping method which works everywhere”

Well, if we focus specifically on the logic of whether or not to escape a given code point, I think escaping it if any major browser escapes it would be a valid, safe approach. In fact, we may want to decide which characters to escape based not so much on what characters web browsers escape, but on what characters servers out there need escaped. Over-escaping should not be a problem, so aiming to support as many servers as possible by escaping any character that some server may need escaped may be the safest approach here.
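As a rough sketch of the "escape it if anyone escapes it" idea: take the union of the characters each client escapes and remove that union from the safe set. The per-client sets below are placeholders made up for illustration, not measurements of real browsers, and `ESCAPED_BY`, `CANDIDATE_SAFE`, and `escape_path` are hypothetical names.

```python
from urllib.parse import quote

# Placeholder escape sets; real browsers differ and change over time.
ESCAPED_BY = {
    "client_a": set(' "<>`'),
    "client_b": set(' "<>`|'),
    "client_c": set(' "<>`{}'),
}

# A character must be escaped if at least one client escapes it.
MUST_ESCAPE = set().union(*ESCAPED_BY.values())

# Characters we would otherwise be willing to leave unescaped in a path.
CANDIDATE_SAFE = "/:@!$&'()*+,;=|{}`"
SAFE = "".join(ch for ch in CANDIDATE_SAFE if ch not in MUST_ESCAPE)


def escape_path(path):
    """Percent-encode a path, escaping anything that any listed client escapes."""
    return quote(path, safe=SAFE)


result = escape_path("/a|b{c}")
```

The same construction would work with sets derived from what servers are known to reject, if such data were available.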


wRAR commented on August 24, 2024

I think this makes sense.

