tld-data's Introduction

tld-data

Accurate data on TLDs, with a focus on which ones can actually be registered. Data is pulled weekly.

Browse the data at tld-data.com!

Methodology:

  • Pulls all TLDs from the DNS root zone for accuracy (disregards upcoming and terminated TLDs)
  • Combines them with type information from the IANA root zone database
  • Scrapes ICANN registry agreements for other information, to get as close to the source as possible

Data

tldData.json contains an array with one object for every TLD in the root zone. Each object has the properties shown in the snippet below, which are assembled from multiple sources.

{
  "generated": "2021-03-16T06:41:50-04:00",
  "data": {
    // TLD, no leading '.', unicode (not punycode)
    "tld": "accenture",

    // type of the TLD from IANA database
    // ['generic', 'country-code', 'sponsored', 'infrastructure', 'generic-restricted', 'test']
    // An explanation of each can be found: https://icannwiki.org/Generic_top-level_domain
    "type": "generic",

    // If present, is the generic TLD a generic brand TLD?
    // More specifically, does the registry agreement for this TLD specify "Specification 13"
    // or have an exemption to "Specification 9". Both of these prohibit the registry
    // from giving domains to anyone but the registry and affiliates (no third parties).
    "isBrand": true,

    // If present, are there any restrictions for registering the TLD?
    // Only checks for "Specification 12" currently (see notes in code)
    // Not super accurate yet, and not currently implemented for ccTLDs!
    "hasRestrictions": false,

    // If the gTLD is NOT in General Availability (useful for filtering out domains
    // that are too new)
    // NOTE: This is NOT PARTICULARLY ACCURATE. This uses the end of the last listed
    // period as there's no well-maintained public data source for this...
    // NOTE: omitted on non-generic TLDs
    "isNotInGeneralAvailability": false,

    // The periods of the gTLD, in ISO8601 date format (no time)
    "periods": [{
      "name": "Sunrise",
      "open": "2015-07-06",
      "close": "2015-12-31"
    }]
  }
}
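As a rough illustration of how these keys might be consumed, the sketch below filters a list of TLD objects down to the ones that look registrable. It assumes the objects are already loaded as an array, as described above. The helper name `registrableTlds` and the sample entries other than `accenture` are made up for the example. Because `isBrand` and `isNotInGeneralAvailability` are optional, an absent key is treated the same as `false`.

```javascript
// Hypothetical helper (not part of tld-data): keep TLDs that are neither
// brand TLDs nor flagged as outside General Availability.
// Missing keys (e.g. on ccTLDs) are treated as "not flagged".
function registrableTlds(tlds) {
  return tlds.filter(
    (t) => t.isBrand !== true && t.isNotInGeneralAvailability !== true
  );
}

// Sample entries shaped like the snippet above; "example1" and "example2"
// are invented placeholders, not real data.
const sample = [
  { tld: "accenture", type: "generic", isBrand: true, isNotInGeneralAvailability: false },
  { tld: "example1", type: "generic", isBrand: false, isNotInGeneralAvailability: false },
  { tld: "example2", type: "country-code" },
];

console.log(registrableTlds(sample).map((t) => t.tld));
// prints [ 'example1', 'example2' ]
```

Since `hasRestrictions` is described as not very accurate yet, the sketch leaves it out of the filter; a consumer could add it as another condition.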

Running

src/cli.js prints data to stdout and can take previously generated data from stdin, reusing portions of it to reduce HTTP requests.

To generate all new data, run:

$ node -r esm --unhandled-rejections=strict src/cli.js --color > tldData.json

Or to reuse the old isBrand and hasRestrictions keys, you can run:

$ node -r esm --unhandled-rejections=strict src/cli.js --stdin --color < tldData.json > tldDataNew.json
$ mv -f tldDataNew.json tldData.json

Contributing

Contribution guidelines for tld-data.com can be found here.

tld-data's Issues

Includes TLDs that are not registerable (.kerryproperties)

For some reason .kerryproperties is included in the dataset, even though you definitely can't register it (nic.kerryproperties does not resolve, and the sponsoring party is a real-estate company, kerryprops.com). Most sites say it's a "brand TLD":

  • It's listed on ntldstats.com as a brand TLD.
  • It's listed as a Brand TLD on ICANN Wiki.
  • It's listed as an "Organization" on tld-list, though so is .dev, so perhaps tld-list also has trouble classifying.
  • It's listed as a "Generic brand" TLD on namestat and I have yet to look at where they're scraping data from.

Yet...

  • I see no mention of Specification 13 in any of the ICANN registry agreements
    • Not in the original registry agreement
    • Not in any amendment
    • There's no "Specification 13" section on the registry agreement page

Include _all_ domains

It would be nice to include all the domains we find, instead of just the ones in the root zone. We would have to rewrite the order in which everything is compiled, though.

More helpful commit messages

It'd be nice if the CI could give a little blurb like "Removed .intel" or "Added .spa" in the commit messages
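A minimal sketch of how such a blurb could be built, assuming the CI has the list of TLD names from the previous run and from the new run available as string arrays. `summarizeChanges` is a hypothetical helper, not something that exists in the repository.

```javascript
// Hypothetical helper: describe which TLDs were added or removed
// between the previous run and the new run.
function summarizeChanges(oldTlds, newTlds) {
  const oldSet = new Set(oldTlds);
  const newSet = new Set(newTlds);
  const added = newTlds.filter((t) => !oldSet.has(t));
  const removed = oldTlds.filter((t) => !newSet.has(t));
  const parts = [
    ...added.map((t) => `Added .${t}`),
    ...removed.map((t) => `Removed .${t}`),
  ];
  return parts.length > 0 ? parts.join(", ") : "No TLD changes";
}

console.log(summarizeChanges(["com", "intel"], ["com", "spa"]));
// prints "Added .spa, Removed .intel"
```

The returned string could then be passed to whatever step creates the commit.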

Times are in a specific timezone?

The times seem to differ between the ones generated on GitHub and the ones I'm generating locally. Is there a time/clock issue? What needs to be done to fix this?

Deploy to package repositories?

Might be nice to be able to pull this down from npm or pypi and have it update automatically? Though then the consumer would have to make sure to be constantly rebuilding/updating their packages as well... Which requires tooling like renovate bot

Include all data we're omitting

There's some data we're specifically omitting to not make tldData.json too big, but honestly that should be on the consumer to strip down the JSON file before inclusion in their app... Big file is fine for the default I think

Configure nodeFetch to retry on 502 bad gateway

nodeFetch does not seem to retry on 502 bad gateway. I recall the docs having restrictive defaults for what it will retry on.

const fetch = fetchRetry(nodeFetch); // 3 retries, 1000ms delays :10 in fetch.js
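One way to picture the desired behavior is a small hand-rolled wrapper that retries on selected HTTP statuses as well as on network errors. This is only a sketch: `withRetry` and its option names are hypothetical and are not the API of the wrapper used in fetch.js. If that wrapper is the fetch-retry package (an assumption), its documented `retryOn` option may cover this case directly.

```javascript
// Hypothetical wrapper: retry a fetch-like function on network errors
// and on the HTTP statuses listed in retryOn.
function withRetry(fetchFn, { retries = 3, retryDelay = 1000, retryOn = [502] } = {}) {
  return async function fetchWithRetry(url, options) {
    for (let attempt = 0; ; attempt++) {
      try {
        const response = await fetchFn(url, options);
        // Return unless the status is retryable and attempts remain.
        if (!retryOn.includes(response.status) || attempt >= retries) {
          return response;
        }
      } catch (err) {
        // Network-level failure: rethrow once retries are exhausted.
        if (attempt >= retries) throw err;
      }
      // Wait before the next attempt.
      await new Promise((resolve) => setTimeout(resolve, retryDelay));
    }
  };
}
```

With this sketch, `withRetry(nodeFetch)` would make up to three additional attempts, one second apart, whenever the response status is 502.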

isGenerallyAvailable is annoying to use

.isGenerallyAvailable is annoying to use.

If you want to filter over the array you have to do o.isInGeneralAvailability || o.isInGeneralAvailability === undefined, because it's falsy on ccTLDs as the key is not present on there...

Stop catastrophically failing

If the script runs and fails or doesn't output anything, the file is blown away.

Like here it was blown away for a week and it was a silent failure cause I didn't get an email ;-;

215fd5b
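A possible guard, sketched under the assumption that the generator's output is captured as a string before anything is written and that a good run always produces non-empty, valid JSON. `replaceIfValid` is a hypothetical helper. It validates the new output first and only then swaps it in, so a failed run leaves the existing file untouched.

```javascript
const fs = require("fs");

// Hypothetical helper: replace targetPath only when the new output is
// non-empty, parseable JSON. On any failure it throws before touching
// targetPath.
function replaceIfValid(newJsonText, targetPath) {
  if (newJsonText.trim() === "") {
    throw new Error("Generator produced no output; keeping existing file");
  }
  JSON.parse(newJsonText); // throws on malformed output
  const tmpPath = `${targetPath}.tmp`;
  fs.writeFileSync(tmpPath, newJsonText);
  fs.renameSync(tmpPath, targetPath); // swap in the new file only after validation
}
```

Because the function throws on bad output, a CI step calling it would exit non-zero and surface the failure instead of silently committing an empty file.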
