whatwg-url's Introduction

whatwg-url

whatwg-url is a full implementation of the WHATWG URL Standard. It can be used standalone, but it also exposes a lot of the internal algorithms that are useful for integrating a URL parser into a project like jsdom.

Specification conformance

whatwg-url is currently up to date with the URL spec up to commit eee49fd.

For file: URLs, whose origin is left unspecified, whatwg-url chooses to use a new opaque origin (which serializes to "null").
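As a quick illustration of the opaque-origin choice, the sketch below uses the `URL` global built into Node.js, on the assumption that it behaves like whatwg-url's `URL` export in this respect. The file path is made up.

```javascript
// Assumption: Node.js's global URL matches whatwg-url's URL export here.
const fileURL = new URL("file:///tmp/example.txt"); // hypothetical path
const httpURL = new URL("https://example.com/index.html");

// file: URLs get an opaque origin, which serializes to the string "null".
const fileOrigin = fileURL.origin;
// Tuple origins (such as https:) serialize to scheme://host[:port].
const httpOrigin = httpURL.origin;

console.log(fileOrigin, httpOrigin);
```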

whatwg-url does not yet implement any encoding handling beyond UTF-8. That is, the encoding override parameter does not exist in our API.

API

The URL and URLSearchParams classes

The main API is provided by the URL and URLSearchParams exports, which follow the spec's behavior in all ways (including e.g. USVString conversion). Most consumers of this library will want to use these.
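A minimal sketch of typical usage follows. It relies on the `URL` global built into Node.js, on the assumption that whatwg-url's `URL` export exposes the same spec-defined getters; the example URLs are made up.

```javascript
// Assumption: the global URL stands in for require("whatwg-url").URL.
const url = new URL("../c?x=1#top", "https://example.com:8080/a/b/index.html");

// Relative input is resolved against the base, then split into components.
const parts = {
  href: url.href,         // "https://example.com:8080/a/c?x=1#top"
  protocol: url.protocol, // "https:"
  host: url.host,         // "example.com:8080"
  hostname: url.hostname, // "example.com"
  port: url.port,         // "8080"
  pathname: url.pathname, // "/a/c"
  search: url.search,     // "?x=1"
  hash: url.hash          // "#top"
};

console.log(parts);
```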

Low-level URL Standard API

The following methods are exported for use by places like jsdom that need to implement things like HTMLHyperlinkElementUtils. They mostly operate on or return an "internal URL" or "URL record" type.

The stateOverride parameter is one of the following strings:

The URL record type has the following API:

These properties should be treated with care, as in general changing them will cause the URL record to be in an inconsistent state until the appropriate invocation of basicURLParse is used to fix it up. You can see examples of this in the URL Standard, where there are many step sequences like "4. Set context object’s url’s fragment to the empty string. 5. Basic URL parse input with context object’s url as url and fragment state as state override." In between those two steps, a URL record is in an unusable state.

The return value of "failure" in the spec is represented by null. That is, functions like parseURL and basicURLParse can return either a URL record or null.
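The practical consequence is that callers need a null check before touching the result. The sketch below imitates that convention with a hypothetical helper, `parseOrNull`, built on the `URL` constructor rather than on the low-level `parseURL` export, so it only illustrates the calling pattern and not the library's internals.

```javascript
// Hypothetical helper (not part of whatwg-url): maps a parse failure to null,
// mirroring how the low-level API represents the spec's "failure".
function parseOrNull(input, base) {
  try {
    return new URL(input, base);
  } catch (err) {
    if (err instanceof TypeError) {
      return null; // parse failure
    }
    throw err; // anything else is unexpected
  }
}

const ok = parseOrNull("https://example.com/a");
const bad = parseOrNull("not a url");

// Callers check for null before using the result.
const hostnameOrFallback = bad === null ? "(invalid)" : bad.hostname;
console.log(ok.hostname, hostnameOrFallback);
```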

whatwg-url/webidl2js-wrapper module

This module exports the URL and URLSearchParams interface wrappers API generated by webidl2js.

Development instructions

First, install Node.js. Then, fetch the dependencies of whatwg-url, by running from this directory:

npm install

To run tests:

npm test

To generate a coverage report:

npm run coverage

To build and run the live viewer:

npm run prepare
npm run build-live-viewer

Serve the contents of the live-viewer directory using any web server.

Supporting whatwg-url

The jsdom project (including whatwg-url) is a community-driven project maintained by a team of volunteers. You could support us by:

  • Getting professional support for whatwg-url as part of a Tidelift subscription. Tidelift helps make open source sustainable for us while giving teams assurances for maintenance, licensing, and security.
  • Contributing directly to the project.

whatwg-url's People

Contributors

alwinb, annevk, aomarks, armano2, azerum, bfarias-godaddy, charpeni, dependabot[bot], domenic, karwa, mgiuca, pmdartus, rmisev, sebmaster, stevenvachon, tilgovi, timothygu, watilde, ykzts

whatwg-url's Issues

Absolute url with relative base throws error

new URL("http://domain/path", "");
new URL("http://domain/path", "/path2/");
new URL("http://domain/path", "//domain2/path2/");

If the input URL is absolute, the base shouldn't matter (unless it's not a string and not undefined). It probably shouldn't even be parsed in such a case.
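For callers who want the behavior described here regardless of how the constructor treats the base, one possible workaround is sketched below. `resolveURL` is a hypothetical helper, not part of whatwg-url, and it uses the `URL` global. Note that it is not equivalent to the constructor in every case: an input such as `http:foo` parses on its own but would resolve differently against an `http:` base.

```javascript
// Hypothetical helper: only consult the base when the input cannot be
// parsed as an absolute URL by itself.
function resolveURL(input, base) {
  try {
    return new URL(input); // succeeds only for absolute URLs
  } catch {
    return new URL(input, base); // relative input: the base must parse
  }
}

const absolute = resolveURL("http://domain/path", "");
const relative = resolveURL("page", "http://domain2/path2/");

console.log(absolute.href, relative.href);
```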

Host is dropped if path contains colon

file:// URLs on Windows typically do not %-encode the colon after the drive letter. When building such URIs, this library drops the host of the base (I assume because it reads the colon as a port separator?).

Expected, Chrome:

new URL('/c:/baz/qux', 'file://host/foo/bar').href
"file://host/c:/baz/qux"

Actual, whatwg-url:

new URL('/c:/baz/qux', 'file://host/foo/bar').href
"file:///c:/baz/qux"

It actually happens even when you just set pathname to anything with a colon:

const uri = new URL('file://host/foo')
uri.host // 'host'
uri.pathname = '/c:/bar'
uri.host // ''

"get the base" confusion

Code says

        // SPEC: says to use "get the base" algorithm,
        // but the base might've already been provided by the constructor.
        // Clarify!

I think the implementation here is not quite on-spec. If you provide a base URL to the constructor, then per https://url.spec.whatwg.org/#dom-url-urlurl-base, the internal getTheBase returns a (normalized, validated) version of the provided base.

So getURL() doesn't need a getTheBase argument. init() probably does, since getAnchor() or mixinAnchor() or whatever will need it.

I might be able to PR this in.

Unit tests are not run against "built" version of code?

I recently opened a Pull Request that adds tests to check whether the URL constructor would break when a URL object (instead of string) is passed in as the second argument: #42

However, it looks like the "built" lib/url.js file that we get when we npm install whatwg-url-compat does NOT pass these tests. Tests are not run against that file, and I get an error with code like this:

var URL = require('whatwg-url-compat').createURLConstructor();

var b = new URL("https://developer.mozilla.org");
var c = new URL("en-US/docs", b);

This code snippet will throw the error:

node_modules/whatwg-url-compat/lib/url.js:455
  return url.replace(/^[\u0000-\u001F\u0020]+|[\u0000-\u001F\u0020]+$/g, "");
             ^

TypeError: url.replace is not a function
    at trimControlChars (/Users/wei/development/urbancompass/src/js/node_modules/whatwg-url-compat/lib/url.js:455:14)
    at new URLStateMachine (/Users/wei/development/urbancompass/src/js/node_modules/whatwg-url-compat/lib/url.js:480:18)
    at URL.init (/Users/wei/development/urbancompass/src/js/node_modules/whatwg-url-compat/lib/url.js:1324:18)
    at new URL (/Users/wei/development/urbancompass/src/js/node_modules/whatwg-url-compat/lib/url.js:1344:10)
    at Object.<anonymous> (/Users/wei/development/urbancompass/src/js/url-test.js:6:9)
    at Module._compile (module.js:435:26)
    at Object.Module._extensions..js (module.js:442:10)
    at Module.load (module.js:356:32)
    at Function.Module._load (module.js:311:12)
    at Function.Module.runMain (module.js:467:10)
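For comparison, a spec-following constructor converts the second argument to a string, so a URL object and its `href` are interchangeable as the base. The sketch below checks this with the `URL` global in Node.js, which is assumed here to follow the spec in the same way; it does not exercise the whatwg-url-compat build from the report.

```javascript
// Assumption: the global URL stands in for a spec-compliant constructor.
const base = new URL("https://developer.mozilla.org");

// Passing the URL object itself: it is stringified to base.href.
const viaObject = new URL("en-US/docs", base);
// Passing the serialized string explicitly.
const viaString = new URL("en-US/docs", base.href);

console.log(viaObject.href, viaString.href);
```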

Property descriptors are wrong

Currently non-enumerable and non-configurable, but they should be enumerable and configurable.

To fix this it's probably easiest to move from Object.defineProperty to just object literals with get x() {} and set x(v) {}.
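The difference comes from the defaults: `Object.defineProperty` treats omitted `enumerable` and `configurable` as false, while accessors written in an object literal get both as true. A small self-contained sketch (the property name and values are placeholders):

```javascript
// Accessor added with Object.defineProperty and no explicit flags.
const viaDefine = {};
Object.defineProperty(viaDefine, "href", {
  get() { return "about:blank"; },
  set(v) { /* placeholder setter */ }
});

// The same accessor written as an object literal.
const viaLiteral = {
  get href() { return "about:blank"; },
  set href(v) { /* placeholder setter */ }
};

const defineDesc = Object.getOwnPropertyDescriptor(viaDefine, "href");
const literalDesc = Object.getOwnPropertyDescriptor(viaLiteral, "href");

console.log(defineDesc.enumerable, defineDesc.configurable);   // false false
console.log(literalDesc.enumerable, literalDesc.configurable); // true true
```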

Pin tests to a specific commit

Having tests fail if upstream is updated is kind of sad; we acknowledge our current revision in the README and the tests should be for that revision.

Also we probably want the ability to point at PR branches.

I can probably work this in to my PR. Filing it in case I forget.

Explain c vs. c_str in a comment at the top

From what I understand the spec uses c (a "code point", which it writes as a string like "@"). We have c, a number, and c_str, the corresponding string. We pass them around together so that code which wants to do numeric comparison can use c, and code which wants to use string comparison can use c_str.
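A sketch of the pairing described above, using standard string methods. The names `c` and `cStr` and the helper `isASCIIDigit` are illustrative and may not match the implementation's own code.

```javascript
const input = "user@host";
const pointer = 4;

// c: the code point at the pointer, as a number (useful for range checks).
const c = input.codePointAt(pointer);
// cStr: the same code point as a string (useful for equality checks).
const cStr = String.fromCodePoint(c);

// Numeric comparison reads naturally with c...
function isASCIIDigit(codePoint) {
  return codePoint >= 0x30 && codePoint <= 0x39;
}

// ...while string comparison reads naturally with cStr.
const isAtSign = cStr === "@";

console.log(c, cStr, isASCIIDigit(c), isAtSign);
```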

Can we use typed arrays instead of Buffers?

Buffers are such a Node thing... Besides, it's confusing that sometimes in the implementation buffer means a string and sometimes a Node Buffer object.

Uint8Array is probably what you want.
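As a rough sketch of what that could look like, `TextEncoder` produces UTF-8 bytes as a `Uint8Array` without involving Node's `Buffer`. The helper `percentEncodeUTF8` is hypothetical and encodes every byte, whereas a real URL parser would only encode bytes in the relevant percent-encode set.

```javascript
const encoder = new TextEncoder(); // always encodes to UTF-8

// Hypothetical helper: percent-encode every UTF-8 byte of a string.
function percentEncodeUTF8(str) {
  const bytes = encoder.encode(str); // Uint8Array, not a Node Buffer
  let out = "";
  for (const byte of bytes) {
    out += "%" + byte.toString(16).toUpperCase().padStart(2, "0");
  }
  return out;
}

const bytes = encoder.encode("é");
console.log(bytes instanceof Uint8Array, percentEncodeUTF8("é")); // true "%C3%A9"
```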

No low-level access to parseError

As far as I can tell there's no low-level access to parseError. This makes it impossible to implement a simple validator on top of this code. (It would be even better for that to have detailed messages of course, but a boolean is a start.)

How to get/set query string parameters?

Has URLUtils.searchParams been removed?

What is the best way to do this?

Unrelated: Why doesn't URL use the same terminology as RFC3986? (protocol -> scheme, hash -> fragment, search -> query, etc.)
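On the query-string question, the README above lists URLSearchParams as one of the main exports. The sketch below shows reading and writing parameters through `url.searchParams`, using the `URL` global in Node.js as a stand-in under the assumption that whatwg-url's classes behave the same way.

```javascript
// Assumption: the global URL stands in for require("whatwg-url").URL.
const url = new URL("https://example.com/search?q=url&page=1");
const params = url.searchParams;

const q = params.get("q");   // read a parameter
params.set("page", "2");     // replace an existing value
params.append("tag", "a b"); // add a new pair; the space is serialized as "+"

// Changes to searchParams are reflected in the URL itself.
const href = url.href;
console.log(q, href);
```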

hostname setter does not work because of wrong state override.

Using the hostname setter leads to this backtrace:

TypeError: this[("parse" + this.state)] is not a function
at new URLStateMachine (/opt/node/node_modules/whatwg-url/lib/url-state-machine.js:477:43)
at Object.module.exports.basicURLParse (/opt/node/node_modules/whatwg-url/lib/url-state-machine.js:1148:15)
at URLImpl.hostname (/opt/node/node_modules/whatwg-url/lib/URL-impl.js:126:9)
at URL.set [as hostname] (/opt/node/node_modules/whatwg-url/lib/URL.js:132:25)

The problem lies in whatwg-url/lib/URL-impl.js +126 where stateOverride is set to "hostname" instead of "host name" as it is described in URLStateMachine prototype.

Setters need tests

They can be buggy and there are no web platform tests for them. This is sad.

Properly follow "return failure"

At the moment, failures that terminate parsing are short-circuited by throwing an exception directly, while failures that do not imply termination are encoded in the return value.

However, to align more closely with the spec, they should all return a real failure value (probably a symbol).

Update for latest HTML/URL changes

URLUtils is basically dead. Although url.spec.whatwg.org still needs to be updated for the URL constructor.

As you can see from these commits (or the corresponding spec sections) whatwg-url will need to expose things like basicURLParse(urlObject, stateOverride), or setUsername(urlObject, value), or serializeHost(host), or... Might be best to write the jsdom patch first then see what APIs it needs.

host setter doesn't work

Test case:

"use strict";
const whatwgURL = require("whatwg-url-compat");
const URL = whatwgURL.createURLConstructor();

const url = new URL("/dom/nodes/CharacterData-appendData.html", "http://127.0.0.1:57100/");
console.log(url.host);
url.host = "http://w3c-test.org/";

Output:

$ iojs test.js
127.0.0.1:57100
C:\Users\Domenic\Dropbox\GitHub\tmpvar\jsdom\node_modules\whatwg-url-compat\lib\url.js:1337
  obj[updateStepsSymbol].call(obj, value);
                        ^

TypeError: Cannot read property 'call' of undefined
    at preUpdateSteps (C:\Users\Domenic\Dropbox\GitHub\tmpvar\jsdom\node_modules\whatwg-url-compat\lib\url.js:1337:25)
    at URL.host (C:\Users\Domenic\Dropbox\GitHub\tmpvar\jsdom\node_modules\whatwg-url-compat\lib\url.js:1187:5)
    at Object.<anonymous> (C:\Users\Domenic\Dropbox\GitHub\tmpvar\jsdom\test.js:7:10)
    at Module._compile (module.js:430:26)
    at Object.Module._extensions..js (module.js:448:10)
    at Module.load (module.js:355:32)
    at Function.Module._load (module.js:310:12)
    at Function.Module.runMain (module.js:471:10)
    at startup (node.js:117:18)
    at node.js:952:3

browserify support?

This lib is written with ES6, so it doesn't get parsed correctly with browserify:

Parse error at -:3082,12
SyntaxError: Unexpected token: name (x)
Error

It'd be great if there were a browser version available to serve as a polyfill for IE.

Parse relative URLs

... containing only a path and hash. Is this possible? I know that the browser implementation of URL does not.
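One common workaround, sketched below, is to resolve the relative reference against a throwaway base and read back only the parts that came from the input. `PLACEHOLDER_BASE` and `parsePathAndHash` are hypothetical names, and the sketch uses the `URL` global. An absolute input would override the placeholder base, so this is only suitable when the input is known to be relative.

```javascript
// Hypothetical placeholder base; its host never appears in the result.
const PLACEHOLDER_BASE = "http://placeholder.invalid/";

// Hypothetical helper: parse a path/query/fragment-only reference.
function parsePathAndHash(relative) {
  const url = new URL(relative, PLACEHOLDER_BASE);
  return { pathname: url.pathname, search: url.search, hash: url.hash };
}

const result = parsePathAndHash("/docs/../api#intro");
console.log(result); // { pathname: "/api", search: "", hash: "#intro" }
```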

Validate IPv4 addresses

It seems like the spec doesn't contain any steps to validate IPv4 addresses, but the test cases in the web-platform-tests repository include some which validate proper handling of IPv4 addresses.

@annevk is that missing from the spec, or did I miss it somewhere?

Status of this standard?

Has it gained any traction in the industry or is it another proposal that could just get ignored?

Add code coverage

Not necessary for v1. But I think that it would be really good, for both the implementation and the URL spec itself, to add some code coverage so we get some idea of how complete the test cases in web-platform-tests are, and can create new ones to target under-tested areas of the spec.

bug in parsing file:

Some oddities show up while parsing file urls across all the envs I am looking at.

u = new URL('file://x#y')
u.href // 'file://x/#y'

The extra / shouldn't be there for this one.

toJSON() serialization

console.log( JSON.stringify( new (require("whatwg-url").URL)("http://domain/") ) )

produces:

{}
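The `{}` output is what `JSON.stringify` gives for an object whose accessors live on the prototype and which has no `toJSON` method. The URL Standard defines `toJSON()` as returning the href, so a conforming implementation serializes to a JSON string. The sketch below checks this against the `URL` global in Node.js rather than against the whatwg-url version from the report.

```javascript
// Assumption: the global URL implements the spec's toJSON().
const url = new URL("http://domain/");

// With toJSON() defined, the URL serializes as its href.
const json = JSON.stringify(url);
const nested = JSON.stringify({ link: url });

console.log(json, nested);
```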

do browsers use punycode?

I don't see any mention of "punycode" in the spec, and Chrome behaves differently:

new URL("http://faß.ExAmPlE").hostname
//-> fass.example

I don't know enough about text encoding to know if punycode is different from ascii/unicode
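For background, Punycode is the ASCII encoding used for non-ASCII domain labels; the URL Standard reaches it through its "domain to ASCII" step rather than naming it directly. The sketch below uses a different hostname from the one in the report, because how `ß` is mapped depends on the IDNA processing mode. It runs against the `URL` global in Node.js and assumes a build with internationalization support.

```javascript
// Non-ASCII labels are lowercased and converted to an "xn--" (Punycode) form.
const hostname = new URL("http://Bücher.ExAmPlE/").hostname;

// ASCII-only hostnames are simply lowercased.
const asciiHostname = new URL("http://ExAmPlE.com/").hostname;

console.log(hostname, asciiHostname);
```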

Need toString method

URLUtils includes the "stringifier" IDL production which implies a toString method. The following should do:

toString() {
  return this.href
}

Is encodeURIComponent used correctly?

I've always been a little afraid of that function, since IIUC it doesn't really follow any spec besides just "here is the behavior Brendan put in to ES1." Just wanted to check you were aware of its caveats and are sure it is being used correctly?
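Two of the better-known caveats can be shown directly: `encodeURIComponent` leaves the characters `!'()*~` unescaped, and it throws a `URIError` on a lone surrogate instead of substituting U+FFFD. Whether either matters depends on which percent-encode set the calling code needs, which this sketch does not try to settle.

```javascript
// These punctuation characters pass through unchanged.
const unescaped = encodeURIComponent("!'()*~");

// Most other reserved and non-ASCII characters are UTF-8 percent-encoded.
const escaped = encodeURIComponent("a b&c=d/é");

// A lone surrogate is not valid UTF-16, so encoding throws.
let loneSurrogateThrows = false;
try {
  encodeURIComponent("\uD800");
} catch (err) {
  loneSurrogateThrows = err instanceof URIError;
}

console.log(unescaped, escaped, loneSurrogateThrows);
```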

getURL name is meh

It creates, it doesn't really get. createURLConstructor? makeURLConstructor?

Consider not packaging the /coverage/ dir

About 83% of the whatwg-url package as published on npm is taken up by the /coverage directory.
This is probably not intended.

To exclude it, one of the following should be done:

  • Add the directory to .npmignore.
  • Delete .npmignore and merge its entries into .gitignore (probably not an option, as you want to package ./lib).
  • Use the files property in package.json to explicitly specify the packaged files.

Ref: https://docs.npmjs.com/misc/developers#keeping-files-out-of-your-package

Bug in parsing URLs

While playing with some jquery/ajax stuff in node.js, I think I’ve hit a bug in the whatwg-url module used by jsdom. It’s unable to parse certain valid URLs such as: "https://r3---sn-p5qlsnz6.googlevideo.com/" and throws "TypeError: Invalid URL". I think the problem is with the “r3---sn” part of the hostname.

jsdom version: 8.1.0
whatwg-url version: 1.0.1

Following node.js code reproduces the issue:

var url = "https://r3---sn-p5qlsnz6.googlevideo.com/";
parseURL = require("whatwg-url").parseURL;
var result = parseURL(url);

Output:

../node_modules/whatwg-url/lib/url-state-machine.js:1151
    throw new TypeError("Invalid URL");
    ^
TypeError: Invalid URL
    at Object.module.exports.basicURLParse (../node_modules/whatwg-url/lib/url-state-machine.js:1151:11)
    at module.exports.parseURL (../node_modules/whatwg-url/lib/url-state-machine.js:1189:25)
    at Object.<anonymous> (test.js:5:14)
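For comparison, the same hostname can be checked against the `URL` global in Node.js, which is assumed here to follow the current URL Standard. Runs of hyphens inside a label do not cause a parse failure there. This does not exercise whatwg-url 1.0.1, the version in the report.

```javascript
// Assumption: the global URL reflects current URL Standard behavior.
const url = new URL("https://r3---sn-p5qlsnz6.googlevideo.com/");

// Consecutive hyphens in a label are accepted; the host round-trips unchanged.
const hostname = url.hostname;

console.log(hostname);
```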

Fix file URL serialization

The current bug we need to fix is that parsing /foo/bar against file:/path/to/docroot/index.html currently gives file:/foo/bar, whereas it should give file:///foo/bar.

In fact, just parsing file:/path/to/docroot/index.html should give back file:///path/to/docroot/index.html.
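The two expected results can be written down as a small check. The sketch below runs them against the `URL` global in Node.js, which is assumed to serialize file URLs as the spec requires; the same assertions could serve as a starting point for href serialization tests.

```javascript
// Resolving an absolute path against a file: base with a single slash.
const resolved = new URL("/foo/bar", "file:/path/to/docroot/index.html").href;

// Parsing the single-slash form by itself.
const direct = new URL("file:/path/to/docroot/index.html").href;

// Both serialize with an empty host, hence the three slashes.
console.log(resolved); // file:///foo/bar
console.log(direct);   // file:///path/to/docroot/index.html
```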

The complication here is that we have no tests for href serialization, which is horrible.

Related: whatwg/url#52 web-platform-tests/wpt#2038
