psl's Introduction

psl (Public Suffix List)

psl is a JavaScript domain name parser based on the Public Suffix List.

This implementation is tested against the test data hosted by Mozilla and kindly provided by Comodo.

Cross browser testing provided by BrowserStack

What is the Public Suffix List?

The Public Suffix List is a cross-vendor initiative to provide an accurate list of domain name suffixes.

The Public Suffix List is an initiative of the Mozilla Project, but is maintained as a community resource. It is available for use in any software, but was originally created to meet the needs of browser manufacturers.

A "public suffix" is one under which Internet users can directly register names. Some examples of public suffixes are ".com", ".co.uk" and "pvt.k12.wy.us". The Public Suffix List is a list of all known public suffixes.

Source: http://publicsuffix.org

Installation

Node.js

npm install --save psl

Browser

Download psl.min.js and include it in a script tag.

<script src="psl.min.js"></script>

This script is browserified and wrapped in a umd wrapper so you should be able to use it standalone or together with a module loader.

The script is also available on most popular CDNs.

API

psl.parse(domain)

Parse domain based on Public Suffix List. Returns an Object with the following properties:

  • tld: Top level domain (this is the public suffix).
  • sld: Second level domain (the first private part of the domain name).
  • domain: The domain name is the sld + tld.
  • subdomain: Optional parts left of the domain.

Example:

var psl = require('psl');

// Parse domain without subdomain
var parsed = psl.parse('google.com');
console.log(parsed.tld); // 'com'
console.log(parsed.sld); // 'google'
console.log(parsed.domain); // 'google.com'
console.log(parsed.subdomain); // null

// Parse domain with subdomain
var parsed = psl.parse('www.google.com');
console.log(parsed.tld); // 'com'
console.log(parsed.sld); // 'google'
console.log(parsed.domain); // 'google.com'
console.log(parsed.subdomain); // 'www'

// Parse domain with nested subdomains
var parsed = psl.parse('a.b.c.d.foo.com');
console.log(parsed.tld); // 'com'
console.log(parsed.sld); // 'foo'
console.log(parsed.domain); // 'foo.com'
console.log(parsed.subdomain); // 'a.b.c.d'

psl.get(domain)

Get domain name, sld + tld. Returns null if not valid.

Example:

var psl = require('psl');

// null input.
psl.get(null); // null

// Mixed case.
psl.get('COM'); // null
psl.get('example.COM'); // 'example.com'
psl.get('WwW.example.COM'); // 'example.com'

// Unlisted TLD.
psl.get('example'); // null
psl.get('example.example'); // 'example.example'
psl.get('b.example.example'); // 'example.example'
psl.get('a.b.example.example'); // 'example.example'

// TLD with only 1 rule.
psl.get('biz'); // null
psl.get('domain.biz'); // 'domain.biz'
psl.get('b.domain.biz'); // 'domain.biz'
psl.get('a.b.domain.biz'); // 'domain.biz'

// TLD with some 2-level rules.
psl.get('uk.com'); // null
psl.get('example.uk.com'); // 'example.uk.com'
psl.get('b.example.uk.com'); // 'example.uk.com'

// More complex TLD.
psl.get('c.kobe.jp'); // null
psl.get('b.c.kobe.jp'); // 'b.c.kobe.jp'
psl.get('a.b.c.kobe.jp'); // 'b.c.kobe.jp'
psl.get('city.kobe.jp'); // 'city.kobe.jp'
psl.get('www.city.kobe.jp'); // 'city.kobe.jp'

// IDN labels.
psl.get('食狮.com.cn'); // '食狮.com.cn'
psl.get('食狮.公司.cn'); // '食狮.公司.cn'
psl.get('www.食狮.公司.cn'); // '食狮.公司.cn'

// Same as above, but punycoded.
psl.get('xn--85x722f.com.cn'); // 'xn--85x722f.com.cn'
psl.get('xn--85x722f.xn--55qx5d.cn'); // 'xn--85x722f.xn--55qx5d.cn'
psl.get('www.xn--85x722f.xn--55qx5d.cn'); // 'xn--85x722f.xn--55qx5d.cn'

psl.isValid(domain)

Check whether a domain has a valid Public Suffix. Returns a Boolean indicating whether the domain has a valid Public Suffix.

Example:

var psl = require('psl');

psl.isValid('google.com'); // true
psl.isValid('www.google.com'); // true
psl.isValid('x.yz'); // false

Testing and Building

Tests are written using mocha and can be run in two different environments: node and phantomjs.

# This will run `eslint`, `mocha` and `karma`.
npm test

# Individual test environments
# Run tests in node only.
./node_modules/.bin/mocha test
# Run tests in phantomjs only.
./node_modules/.bin/karma start ./karma.conf.js --single-run

# Build data (parse raw list) and create dist files
npm run build

Feel free to fork if you see possible improvements!

Acknowledgements

License

The MIT License (MIT)

Copyright (c) 2017 Lupo Montero

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

psl's People

Contributors

greenkeeper[bot], lupomontero, msmiley, parcley, ph4r05

psl's Issues

An in-range update of eslint-config-hapi is breaking the build 🚨

Version 10.1.0 of eslint-config-hapi just got published.

Branch Build failing 🚨
Dependency eslint-config-hapi
Current Version 10.0.0
Type devDependency

This version is covered by your current version range and after updating it in your project the build failed.

As eslint-config-hapi is “only” a devDependency of this project it might not break production or downstream projects, but “only” your build or test tools – preventing new deploys or publishes.

I recommend you give this issue a high priority. I’m sure you can resolve this 💪

Status Details
  • continuous-integration/travis-ci/push The Travis CI build failed Details

Commits

The new version differs by 7 commits.

  • edfd05d v10.1.0
  • 3d74958 use destructuring: 'all' with prefer-const
  • c283933 update dependencies and test on Node 8
  • 0427131 update lab. ignore eslint cache file
  • b1514a9 add test for no-undef rule
  • 05235d3 remove unnecessary quotes in rule config
  • c0d2550 enable no-undef rule

See the full diff

Not sure how things should work exactly?

There is a collection of frequently asked questions and of course you may always ask my humans.


Your Greenkeeper Bot 🌴

Add support for ES6 browser import

It seems the library cannot be loaded with module imports on modern browsers.

The purpose would be to allow such imports since JavaScript imports are now available with all major browsers. Including an esm file or adapting existing files in the NPM package to have a default or another export would allow something like the following:

import psl from 'psl';

It would also be nice to be able to import only the desired function:

import {parse} from 'psl';

Wrong for domains with AM tld

Version 1.1.23

Steps to reproduce
psl.parse("foo--bar.am")

Expected Behavior
{ error: "Label can't contain two '-'", ... }

Actual Behavior
{ sld: "foo--bar", ... }

Not returning proper domain

var psl = require('psl');

// Parse domain without subdomain
var parsed = psl.parse('resourcehub.in.gov');
console.log(parsed.domain); // we are getting 'in.gov'

Expected output 'resourcehub.in.gov'
Please add 'in.gov' to the list.

Automate List Updates

It would be ideal if we used the Atom feed to refresh the official list and run new tests. I'm not sure whether you would want it to push automatically to the repository or to package managers, but it would be good to minimize the delta between updates.

Unmaintained Package

This package has had no updates in a year and a half and deprecation warnings are stacking up. It has almost 30 million weekly downloads from NPM. Would appreciate if downstream package maintainers would consider taking over maintenance or forking.

Option to parse even when invalid

Hostnames like www-.test.com are invalid, so they return an error and no other info. It would be good if they returned the parsed information as well as the error.

Anyone know of a php version?

Just stumbled here while looking for a way to parse only the domain name (no subs) from a string. I briefly looked over this (haven't played with it yet), but looks to be exactly what I was looking for... and the list is updateable (I think).

Has anyone turned this into a PHP class or anything similar? I need to use this in PHP and would prefer to do so natively, but if need be I can make this JS method work. I just thought I would ask, in case I haven't found the PHP equivalent yet.

Using 'import' instead of require?

Hi,

I'm trying to use import instead of require for this module, but I get an error. I've tried
import * as psl from 'psl', to no avail. Any input?

cloudfront.net tld?

Why does psl think cloudfront.net is a tld? The list at http://data.iana.org/TLD/tlds-alpha-by-domain.txt doesn't indicate it is one.

console.log(psl.parse('d31qbv1cthcecs.cloudfront.net'))
{
  input: 'd31qbv1cthcecs.cloudfront.net',
  tld: 'cloudfront.net',
  sld: 'd31qbv1cthcecs',
  domain: 'd31qbv1cthcecs.cloudfront.net',
  subdomain: null,
  listed: true
}

psl.get('1.2.3.4') returns last two numbers of IP

Example:

> require('psl').get('1.2.3.4')
'3.4'

I think this should return the entire IP and not part of it.

This also does not work with ipv6:

> require('psl').get('2001:0db8:85a3:0000:0000:8a2e:0370:7334')
null

I'm not sure if this is within the scope of this module though. Is this module used to get the domain (in a cookie sense) on which a cookie can be set when given a hostname (domain or ip) or is this module only used to provide the public suffix when given a real domain?
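Until that question is settled, a caller could check for IP literals before handing the host to psl. The following is only a minimal sketch of that idea, using Node's built-in net.isIP. The names domainOrIp and getDomain are hypothetical; getDomain stands in for psl.get and is passed in by the caller.

```javascript
var net = require('net');

// Hypothetical helper: returns IP literals unchanged and defers everything
// else to `getDomain` (a stand-in for psl.get, supplied by the caller).
function domainOrIp(host, getDomain) {
  // Strip the brackets used around IPv6 literals in URLs, e.g. "[::1]".
  var bare = host.replace(/^\[|\]$/g, '');
  // net.isIP returns 4 or 6 for valid IP addresses and 0 otherwise.
  if (net.isIP(bare) !== 0) {
    return bare;
  }
  return getDomain(host);
}
```

With psl installed, this could presumably be called as domainOrIp(hostname, psl.get).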

Performance issues - batch processing

Hi!

Thanks for an awesome package I love it!

I wanted to use it in my dashboard project showing tens of certificates with tens to hundreds alternative domains - turned out the processing is just too slow :(

I also used lodash's _.memoize function to cache the results of the get method, but I am still at 13 seconds of processing time on an i7 processor.

I was wondering whether it could be possible to improve the search in the rule database as it seems to be the most expensive part.

It comes to my mind to somehow init the searching structure before use and then just use pre-processed rule database. E.g., the profiler shows Punycode.toASCII is quite expensive - 86.8% of the get processing time in the findRule method:

return internals.rules.reduce(function (memo, rule) {
  var punySuffix = Punycode.toASCII(rule.suffix);  // <--- overhead
  if (!internals.endsWith(punyDomain, '.' + punySuffix) && punyDomain !== punySuffix) {
    return memo;
  }
  // ...
}, null);

Punycode.toASCII is called on the static rules every time get is called. punySuffix could be pre-computed once for the static rules, which would save a lot of time on batch requests. The if condition could also be reordered to evaluate punyDomain !== punySuffix first; when that comparison is false, short-circuit evaluation skips the endsWith check.

Profiler shows this is the critical path with significant overhead. There are also further improvements possible - e.g., to sort the database / build a suffix tree and reduce O(N) to O(lg(N)) where N is the database size.

My profiler results:

total: 12963 ms
findrule: 12937 ms (99.8%)
findrule -> reduce: 12934 ms (99.8%)
toASCII: 11248 ms (86.8%)

Just for illustration: without memoizing, 993 domains took 15980 ms.

Please let me know what you think about this improvement. I could also do a PR later if you let me know how you would like it implemented.

Thanks for considering!
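The pre-computation proposed in this issue could look roughly like the sketch below. It is not the library's code: it uses Node's built-in url.domainToASCII as a stand-in for Punycode.toASCII, a tiny hand-picked rule list, and a hypothetical findRule that returns the longest matching rule.

```javascript
var url = require('url');

// Tiny illustrative rule set; the real list has thousands of entries.
var rules = ['com', 'cn', 'com.cn', '公司.cn'].map(function (suffix) {
  return {
    suffix: suffix,
    // Convert each suffix once, up front, instead of on every lookup.
    punySuffix: url.domainToASCII(suffix)
  };
});

// Hypothetical counterpart of findRule: longest matching rule, or null.
function findRule(domain) {
  var punyDomain = url.domainToASCII(domain);
  return rules.reduce(function (memo, rule) {
    var matches = punyDomain === rule.punySuffix ||
      punyDomain.endsWith('.' + rule.punySuffix);
    if (!matches) {
      return memo;
    }
    // Keep whichever matching suffix is longer.
    return (!memo || rule.punySuffix.length > memo.punySuffix.length) ? rule : memo;
  }, null);
}
```

The lookup still scans every rule, so this only removes the repeated conversion cost; it does not change the O(N) search.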

Wrong for domains with CO.BZ tld

Version 1.1.23

Steps to reproduce
psl.parse('test.co.bz')

Expected Behavior
{ domain: "test.co.bz", tld: "co.bz", ... }

Actual Behavior
{ domain: "co.bz", tld: "bz", ... }

Dependency on deprecated Node "punycode" module

The project calls require('punycode'), using the Node punycode module which has been deprecated since v7 (2016). The Node docs now recommend using punycode.js in place of it.

This also means that when this library is used in a project using webpack, webpack must be manually configured to load punycode.js as a replacement for the punycode module because webpack since v5 (2020) no longer defaults to including shims for Node APIs like punycode.

It would be helpful if this library switched to using punycode.js instead of the Node punycode module.

An in-range update of browserify is breaking the build 🚨

Version 14.3.0 of browserify just got published.

Branch Build failing 🚨
Dependency browserify
Current Version 14.2.0
Type devDependency

This version is covered by your current version range and after updating it in your project the build failed.

As browserify is “only” a devDependency of this project it might not break production or downstream projects, but “only” your build or test tools – preventing new deploys or publishes.

I recommend you give this issue a high priority. I’m sure you can resolve this 💪


Status Details
  • continuous-integration/travis-ci/push The Travis CI build failed Details
Commits

The new version differs by 4 commits.

  • cd01926 14.3.0
  • 08caf04 changelog
  • ad5060d Merge pull request #1710 from substack/https-browserify-1
  • 7c7b4d4 update https-browserify to ^1.0.0

See the full diff.

Not sure how things should work exactly?

There is a collection of frequently asked questions and of course you may always ask my humans.


Your Greenkeeper Bot 🌴

PSL test cases failing on Power machine

npm version : 5.10.0
node version : 8.9.4
phantomjs version : 2.1.1 (binary created from source code)

The tests pass on x86 Ubuntu.

But they fail on Power RHEL when running npm test -f:

57 tests completed
48 tests failed
psl

Any lead would be helpful.
Thanks.

Performance issues - iterating over 9k rules for every domain

I've made some benchmarks and performance of psl is around 2k ops/sec:

psl#isValid x 2,310 ops/sec ±2.00% (92 runs sampled)
psl#parse x 2,428 ops/sec ±0.24% (97 runs sampled)
psl#parse invalid domain x 2,415 ops/sec ±1.56% (94 runs sampled)

by using maps it is possible to increase speed to 300k ops/sec:

psl#isValid x 349,961 ops/sec ±0.48% (95 runs sampled)
psl#parse x 346,816 ops/sec ±0.18% (99 runs sampled)
psl#parse invalid domain x 375,047 ops/sec ±0.16% (96 runs sampled)
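The issue does not show the map-based implementation, so the following is only a sketch of how such a lookup might work, not the benchmarked code. It assumes a hand-picked subset of rules split into three Sets (plain, wildcard and exception rules) and walks the labels of the input from the longest candidate to the shortest, so each lookup costs a few Set probes instead of a scan over every rule. The function names are hypothetical.

```javascript
// Hypothetical, hand-picked subset of rules.
var plain = new Set(['com', 'uk', 'co.uk', 'jp']);
var wildcard = new Set(['kobe.jp']);        // from the rule "*.kobe.jp"
var exception = new Set(['city.kobe.jp']);  // from the rule "!city.kobe.jp"

function publicSuffix(domain) {
  var labels = domain.toLowerCase().split('.');
  // Try the longest candidate first: i = 0 is the whole name.
  for (var i = 0; i < labels.length; i++) {
    var candidate = labels.slice(i).join('.');
    var parent = labels.slice(i + 1).join('.');
    if (exception.has(candidate)) {
      return parent; // exception: the suffix is the rule minus its first label
    }
    if (plain.has(candidate) || wildcard.has(parent)) {
      return candidate;
    }
  }
  // Unlisted TLD: fall back to the last label.
  return labels[labels.length - 1];
}

function registrableDomain(domain) {
  var labels = domain.toLowerCase().split('.');
  var suffixLength = publicSuffix(domain).split('.').length;
  if (labels.length <= suffixLength) {
    return null; // the name is itself a public suffix
  }
  return labels.slice(-(suffixLength + 1)).join('.');
}
```

With this subset, registrableDomain reproduces the kobe.jp results listed under psl.get in the README, for example 'b.c.kobe.jp' for 'a.b.c.kobe.jp' and null for 'c.kobe.jp'.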

psl.isValid inconsistency with Amazon S3 urls

psl.isValid('s3.amazonaws.com') // false ??? Why???
psl.isValid('s3.amazonaws.co.uk') // true
psl.isValid('s2.amazonaws.com') // true
psl.isValid('s4.amazonaws.com') // true
psl.isValid('s33.amazonaws.com') // true

An in-range update of browserify is breaking the build 🚨

The devDependency browserify was updated from 16.3.0 to 16.4.0.

🚨 View failing branch.

This version is covered by your current version range and after updating it in your project the build failed.

browserify is a devDependency of this project. It might not break your production code or affect downstream projects, but probably breaks your build or test tools, which may prevent deploying or publishing.

Status Details
  • continuous-integration/travis-ci/push: The Travis CI build could not complete due to an error (Details).

Commits

The new version differs by 5 commits.

See the full diff

FAQ and help

There is a collection of frequently asked questions. If those don’t help, you can always ask the humans behind Greenkeeper.


Your Greenkeeper Bot 🌴

Use with Google apps script engine

Is there an easy way to modify this to work in GAS? GAS does not support require as they have their own module system. I probably do not need the punycode stuff if that makes it more difficult.

com.in not recognised as a TLD

psl.parse('testing.com.in')

returns

{ input: 'testing.com.in', tld: 'in', sld: 'com', domain: 'com.in', subdomain: 'testing', listed: true }

I can see com.in in the public suffix list so not sure why this is not returning correctly

An in-range update of uglify-js is breaking the build 🚨

The devDependency uglify-js was updated from 3.5.15 to 3.6.0.

🚨 View failing branch.

This version is covered by your current version range and after updating it in your project the build failed.

uglify-js is a devDependency of this project. It might not break your production code or affect downstream projects, but probably breaks your build or test tools, which may prevent deploying or publishing.

Status Details
  • continuous-integration/travis-ci/push: The Travis CI build could not complete due to an error (Details).

Commits

The new version differs by 3 commits.

  • 70bb304 v3.6.0
  • 9d3b1ef fix corner case in assignments (#3430)
  • 482e1ba enhance assignments & unused (#3428)

See the full diff

FAQ and help

There is a collection of frequently asked questions. If those don’t help, you can always ask the humans behind Greenkeeper.


Your Greenkeeper Bot 🌴

Support loading external data file

Hey @lupomontero,

I'm enjoying using psl - thanks for your work on it!

One extra I'd like is the ability to load the rules from an external data file rather than using the internal one. This might just be to track the upstream MPSL more closely, or it would allow you to fork the MPSL and make custom changes for particular applications. Would you be open to this kind of change?

If so, what kind of interface would you like? I've got a prototype in my fork:

https://github.com/gavincarr/psl

that allows you to do e.g. var psl = require('psl').loadData('effective_tld_names.dat'); to synchronously load a (MPSL format, not json) data file at require time.

Thoughts?
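For illustration, parsing a file in the Public Suffix List text format into rule objects might look like the sketch below. This is not the loadData implementation from the fork: parseRules and the shape of the returned objects are hypothetical. It relies only on the published list format, where "//" starts a comment, each line is read up to the first whitespace, "*." marks a wildcard rule and "!" marks an exception rule.

```javascript
// Hypothetical parser for text in the Public Suffix List file format.
function parseRules(text) {
  return text.split(/\r?\n/).reduce(function (rules, line) {
    // Only the part of the line before the first whitespace is significant.
    var token = line.trim().split(/\s+/)[0];
    // Skip blank lines and "//" comments.
    if (!token || token.indexOf('//') === 0) {
      return rules;
    }
    rules.push({
      rule: token,
      // The suffix without its "!" or "*." marker.
      suffix: token.replace(/^(!|\*\.)/, ''),
      wildcard: token.indexOf('*.') === 0,
      exception: token.charAt(0) === '!'
    });
    return rules;
  }, []);
}
```

In Node the input text could come from fs.readFileSync(path, 'utf8'), which would match the synchronous loading described above.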

platform.sh should be a valid domain

Version 1.8.0

Steps to reproduce
psl.isValid('platform.sh')

Expected Behavior
true

Actual Behavior
false

Why?
platform.sh is not a TLD while .sh is.

it.com

Hello. Can you add it.com to the psl? They have come back online. Thx.

Domain name validation is not correct according to RFC 2181

psl validates each label against the regular expression /^[a-z0-9-]+$/ and returns 'LABEL_INVALID_CHARS' if it does not match.
This is wrong according to RFC 2181, section 11 (Name Syntax):

The DNS itself places only one restriction on the particular labels that can be used to identify resource records. That one restriction relates to the length of the label and the full name....Implementations of the DNS protocols must not place any restrictions on the labels that can be used

The LABEL_INVALID_CHARS validation should be removed.
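To make the difference concrete, here is a sketch that contrasts the character check quoted above with a length-only check in the spirit of RFC 2181 (labels of 1 to 63 octets, a full name of at most 255 octets including separators). hasValidLengths is a hypothetical helper, not part of psl, and it measures labels as UTF-8 bytes for simplicity.

```javascript
// The character check described in this issue.
var LABEL_CHARS = /^[a-z0-9-]+$/;

// Hypothetical length-only check: 1 to 63 octets per label and at most
// 255 octets for the whole name in wire format.
function hasValidLengths(name) {
  // Ignore a single trailing dot (the root label).
  var labels = name.replace(/\.$/, '').split('.');
  var wireLength = 1; // the terminating root label
  for (var i = 0; i < labels.length; i++) {
    var octets = Buffer.byteLength(labels[i], 'utf8');
    if (octets < 1 || octets > 63) {
      return false;
    }
    wireLength += octets + 1; // one length octet precedes each label
  }
  return wireLength <= 255;
}
```

A label such as _dmarc fails LABEL_CHARS but passes hasValidLengths, which is the kind of name the issue argues should not be rejected.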

Interest in isCookieDomainAcceptable()

Hi! Just wondering if there's any interest in implementing a function along the lines of isCookieDomainAcceptable() from libpsl: https://rockdaboot.github.io/libpsl/libpsl-Public-Suffix-List-functions.html#psl-is-cookie-domain-acceptable

This might be useful for cookie handling libraries that currently have to call psl.get() and implement their own logic around checking for cookie validity. I understand if this library wants to remain small, but I noticed this useful function in libpsl and wondered if there was interest in including it in the JS counterpart.
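As a rough idea of what such a function could do, here is a sketch. It is not a port of libpsl and not part of psl; the signature and the injected isPublicSuffix predicate are assumptions. It accepts a cookie domain when it equals the request host, or when it is a parent of the host on a label boundary and is not itself a public suffix.

```javascript
// Hypothetical sketch; `isPublicSuffix` is a predicate supplied by the caller.
function isCookieDomainAcceptable(hostname, cookieDomain, isPublicSuffix) {
  var host = hostname.toLowerCase();
  // A leading dot on the cookie domain is ignored.
  var domain = cookieDomain.toLowerCase().replace(/^\./, '');
  if (host === domain) {
    return true; // exact match with the request host
  }
  // The cookie domain must be a parent of the host, on a label boundary.
  if (!host.endsWith('.' + domain)) {
    return false;
  }
  // A public suffix must never be accepted as a cookie domain.
  return !isPublicSuffix(domain);
}
```

The predicate might be built on psl.get(), which the README shows returning null for bare suffixes such as 'com', but that wiring is left to the caller here.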

About released files on NPM

Hi @lupomontero ,
I'm a member of cdnjs team and we want to host this library.
I found that the files published on NPM are different from GitHub repository.
The files on NPM look as follows:

.
├── data
│   └── rules.json
├── index.js
├── karma.conf.js
├── package.json
└── README.md

I think it would be better to get the main files from GitHub. Or will the files under the dist folder be published on NPM in the future? This will affect whether we use the git or the npm auto-update config.
Thank you.

cdnjs/cdnjs#8942

Backdoor in event-stream dependency

The dependency event-stream version 3.3.6 contains a backdoor via a library it uses, flatmap-stream. The linked issue explains the situation.

Since package.json specifies "event-stream": "^3.3.4", it was possible to pull in the backdoor if you updated psl's dependencies during the period of time that event-stream 3.3.6 was available.

One fix would be to lock event-stream to exactly 3.3.4.

data folder disappearing when packaging for electron

I'm pretty new to Electron/electron-packager/package.json so I don't know enough about its bowels to know what's going on exactly, but somehow when I package an Electron app up that uses psl, it'll copy over index.js and the other files, but the data folder gets lost in the process and I have to manually copy it over for everything to work. If I do then everything is roses, but if not then obviously index.js will fail when it tries to require the json file.

If anyone has a specific pointer for how to address this, I'd be happy to try to take it on. I'm assuming it has something to do with the package.json file not properly referencing the required folder, but I don't actually know that. It's also possible electron-packager is ignoring subdirectories, in which case sorry to waste your time, I just know that in a collection of 100+ node_modules this is the only one that's failing for me.

Thanks for your help!

Incorrectly stating a valid URL is invalid

There is an issue where overlapping suffixes aren't handled correctly and a valid URL is reported as invalid.

For example:
gov.uk - valid URL since "gov" is the sld, and ".uk" is the tld. However, this incorrectly states the URL is invalid, since "gov.uk" is also a tld in its own right (e.g. "data.gov.uk", where "data" is the sld and "gov.uk" is the tld).

Same goes for other overlapping tlds, such as github.io.

ts-lint fails with psl import in version 1.1.29

Here is the error:

./node_modules/.bin/tslint -p tsconfig.json
src/helpers/custom-domains.ts:5:22 - error TS2307: Cannot find module 'psl'.

5 import * as psl from 'psl';
                       ~~~~~

npm ERR! code ELIFECYCLE
npm ERR! errno 2
npm ERR! [email protected] build: `yarn run lint && ./node_modules/.bin/tsc`
npm ERR! Exit status 2
npm ERR! 
npm ERR! Failed at the [email protected] build script.
npm ERR! This is probably not a problem with npm. There is likely additional logging output above.

npm ERR! A complete log of this run can be found in:
npm ERR!     /home/X/.npm/_logs/2018-08-03T13_56_09_536Z-debug.log

Error: functions predeploy error: Command terminated with non-zero exit code2
error Command failed with exit code 1.

Simply forcing version 1.1.28 of psl fixes the issue.

domain is not parsed correctly for `run.app`

Steps to reproduce:

const parsed = psl.parse('health-check-tool-2-api-izeboapi4q-nn.a.run.app')
console.log(parsed.domain)

expected:
run.app

actual:
health-check-tool-2-api-izeboapi4q-nn.a.run.app

version:
"psl": "^1.9.0",

P.S. run.app is the default Google Cloud Run domain.

From the examples in the README:

// Parse domain with nested subdomains
var parsed = psl.parse('a.b.c.d.foo.com');
console.log(parsed.tld); // 'com'
console.log(parsed.sld); // 'foo'
console.log(parsed.domain); // 'foo.com'
console.log(parsed.subdomain); // 'a.b.c.d'

Also added a request publicsuffix/list#1710
