
addr's Introduction

Robust and fast domain name parsing


This library uses Mozilla's Public Suffix List to reliably parse domain names in Rust. It checks that a domain has valid syntax, and it enforces the length restrictions for each label, the total number of labels, and the full length of the domain name.
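To make the length restrictions concrete, here is a minimal sketch of such a check using only the standard library. It is not addr's implementation: `has_valid_lengths` is a hypothetical helper, the limits (63 bytes per label, 253 bytes per name in text form, 127 labels) are the usual DNS limits, and the input is assumed to be in ASCII form already (an internationalized name would need to be converted to its `xn--` form first).

```rust
/// Maximum length of a single label, in bytes.
const MAX_LABEL_LEN: usize = 63;
/// Maximum length of a full name in text form, without the trailing dot.
const MAX_NAME_LEN: usize = 253;
/// Maximum number of labels in a name.
const MAX_LABELS: usize = 127;

/// Hypothetical helper (not part of addr): checks only the length rules.
fn has_valid_lengths(name: &str) -> bool {
    // A single trailing dot marks a fully qualified name; ignore it here.
    let name = name.strip_suffix('.').unwrap_or(name);
    if name.is_empty() || name.len() > MAX_NAME_LEN {
        return false;
    }
    let mut count = 0;
    for label in name.split('.') {
        count += 1;
        // Empty labels (two dots in a row) and oversized labels are rejected.
        if label.is_empty() || label.len() > MAX_LABEL_LEN {
            return false;
        }
    }
    count <= MAX_LABELS
}
```

A real parser also has to check which characters are allowed in each label, which this sketch deliberately leaves out.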

Examples

use addr::{parse_domain_name, parse_dns_name};

// You can find out the root domain
// or extension of any given domain name
let domain = parse_domain_name("www.example.com")?;
assert_eq!(domain.root(), Some("example.com"));
assert_eq!(domain.suffix(), "com");
assert_eq!(domain.prefix(), Some("www"));

let domain = parse_domain_name("www.食狮.**")?;
assert_eq!(domain.root(), Some("食狮.**"));
assert_eq!(domain.suffix(), "**");

let domain = parse_domain_name("www.xn--85x722f.xn--55qx5d.cn")?;
assert_eq!(domain.root(), Some("xn--85x722f.xn--55qx5d.cn"));
assert_eq!(domain.suffix(), "xn--55qx5d.cn");

let domain = parse_domain_name("a.b.example.uk.com")?;
assert_eq!(domain.root(), Some("example.uk.com"));
assert_eq!(domain.suffix(), "uk.com");

let name = parse_dns_name("_tcp.example.com.")?;
assert_eq!(name.suffix(), Some("com."));

// In any case if the domain's suffix is in the list
// then this is definitely a registrable domain name
assert!(domain.has_known_suffix());

TODO

Strict internationalized domain names (IDN) validation (use the idna feature flag)

Use Cases

For those who work with domain names the use cases of this library are plenty. publicsuffix.org/learn lists quite a few. For the sake of brevity, I'm not going to repeat them here. I work for a domain registrar so we make good use of this library. Here are some of the ways this library can be used:

  • Validating domain names. This one is probably obvious. If domain.has_known_suffix() returns true, you can be sure the name has valid syntax and ends in a suffix from the Public Suffix List. A regular expression is simply not robust enough.
  • Blacklisting or whitelisting domain names. You can't just blindly do this without knowing the actual registrable domain name otherwise you risk being too restrictive or too lenient. Bad news either way...
  • Extracting the registrable part of a domain name so you can check whether the domain is registered or not.
  • Storing details about a domain name in a DBMS using the registrable part of a domain name as the primary key.
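The blacklisting use case can be sketched without the crate to show why the registrable domain matters. Everything here is an assumption made for illustration: `registrable` and `is_blocked` are hypothetical helpers, the suffix set is a tiny hand-written stand-in for the Public Suffix List, and wildcard and exception rules are ignored. In real code, `domain.root()` from this library would take the place of `registrable`.

```rust
use std::collections::HashSet;

/// Hypothetical helper: returns the registrable domain of `name`, given a
/// set of public suffixes. Ignores the wildcard and exception rules of the
/// real Public Suffix List.
fn registrable<'a>(name: &'a str, suffixes: &HashSet<&str>) -> Option<&'a str> {
    // Start offsets of every label-aligned tail of `name`, longest tail first.
    let starts: Vec<usize> = std::iter::once(0)
        .chain(name.match_indices('.').map(|(i, _)| i + 1))
        .collect();
    for (pos, &start) in starts.iter().enumerate() {
        // The first hit is the longest matching suffix.
        if suffixes.contains(&name[start..]) {
            // The registrable domain is the suffix plus one label to its left.
            // A bare suffix (pos == 0) has no registrable part.
            return if pos == 0 { None } else { Some(&name[starts[pos - 1]..]) };
        }
    }
    None
}

/// Hypothetical helper: a name is blocked when its registrable domain is on
/// the block list, no matter how many subdomains precede it.
fn is_blocked(name: &str, suffixes: &HashSet<&str>, blocked: &HashSet<&str>) -> bool {
    registrable(name, suffixes).map_or(false, |root| blocked.contains(root))
}
```

Matching on the registrable domain blocks `ads.tracker.com` when `tracker.com` is listed, without also blocking an unrelated name such as `nottracker.com`, which a plain string suffix match would catch.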

addr's People

Contributors

davidhewitt, jonasbb, jprider63, muellpanda, rushmorem, yerke


addr's Issues

`addr::parse_domain_name()` equivalent in 0.15.2

Hello, thank you for the work on this crate! Prior to 0.15.2 I was using:

use addr::parse_domain_name;

fn parse_domain(target: &str) -> bool {
  parse_domain_name(target).is_ok()
}

However, in 0.15.2 this no longer works because addr::parse_domain_name doesn't exist anymore. I took a quick look at the source code and found that the equivalent seems to be:

use addr::{parser::DomainName, psl::List};

fn parse_domain(target: &str) -> bool {
  List.parse_domain_name(target).is_ok()
}

Is that correct? Sorry for the basic question but I'm using the crate for critical development and I want to be sure before the change.

Regards,
Ed

Build failure

When trying to build the latest master of adblock-rust for my python-adblock project, I get the following build error:

error[E0015]: calls in constant functions are limited to constant functions, tuple structs and tuple variants
  --> /home/arni/.cargo/registry/src/github.com-1ecc6299db9ec823/addr-0.9.2/src/dns.rs:44:13
   |
44 |             suffix.is_known()
   |             ^^^^^^^^^^^^^^^^^

error[E0015]: calls in constant functions are limited to constant functions, tuple structs and tuple variants
  --> /home/arni/.cargo/registry/src/github.com-1ecc6299db9ec823/addr-0.9.2/src/domain.rs:47:9
   |
47 |         self.suffix.is_known()
   |         ^^^^^^^^^^^^^^^^^^^^^^

error: aborting due to 2 previous errors

@rushmorem I wonder if you know what's going on?

Lacks a Git tag for 0.15.2

Crates.io says that 0.15.2 was published 4 days ago, but the newest commit was 15 days ago, and 0.15.2 tag is missing from GitHub.

For other curious souls: I downloaded 0.15.1 and 0.15.2 from https://crates.io/api/v1/crates/addr/0.15.2/download, renamed to tar.gz, unpacked and diffed:

diff -ruN addr-0.15.1/Cargo.toml addr-0.15.2/Cargo.toml
--- addr-0.15.1/Cargo.toml      1970-01-01 03:00:01.000000000 +0300
+++ addr-0.15.2/Cargo.toml      1970-01-01 03:00:01.000000000 +0300
@@ -12,7 +12,7 @@
 [package]
 edition = "2018"
 name = "addr"
-version = "0.15.1"
+version = "0.15.2"
 authors = ["rushmorem <[email protected]>"]
 description = "A library for parsing domain names"
 documentation = "https://docs.rs/addr"
@@ -20,8 +20,6 @@
 keywords = ["tld", "gtld", "cctld", "domain", "no_std"]
 license = "MIT/Apache-2.0"
 repository = "https://github.com/addr-rs/addr"
-[package.metadata.docs.rs]
-all-features = true
 
 [[bench]]
 name = "list_benchmark"
diff -ruN addr-0.15.1/Cargo.toml.orig addr-0.15.2/Cargo.toml.orig
--- addr-0.15.1/Cargo.toml.orig 1973-11-30 00:33:09.000000000 +0300
+++ addr-0.15.2/Cargo.toml.orig 1973-11-30 00:33:09.000000000 +0300
@@ -1,7 +1,7 @@
 [package]
 name = "addr"
 description = "A library for parsing domain names"
-version = "0.15.1"
+version = "0.15.2"
 license = "MIT/Apache-2.0"
 repository = "https://github.com/addr-rs/addr"
 documentation = "https://docs.rs/addr"
@@ -51,6 +51,3 @@
 idna = []
 net = ["no-std-net"]
 std = []
-
-[package.metadata.docs.rs]
-all-features = true
diff -ruN addr-0.15.1/.cargo_vcs_info.json addr-0.15.2/.cargo_vcs_info.json
--- addr-0.15.1/.cargo_vcs_info.json    1970-01-01 03:00:01.000000000 +0300
+++ addr-0.15.2/.cargo_vcs_info.json    1970-01-01 03:00:01.000000000 +0300
@@ -1,5 +1,5 @@
 {
   "git": {
-    "sha1": "40bf37d50c7c7f6d86535ffe5143e13d2615a540"
+    "sha1": "be9602f2a06792f3680400373deae1a765487e1a"
   }
 }

Subdomains with underscores aren't handled

I noticed that parse_domain_name will produce an IllegalCharacter error if there is an underscore in a subdomain.

use addr::psl::List;
use addr::parser::DomainName as _;

fn main() {
    // ok
    List.parse_domain_name("zn-ed65ynwxvsuk9lf-cbs.siteintercept.qualtrics.com").unwrap();

    // panics
    List.parse_domain_name("zn_ed65ynwxvsuk9lf-cbs.siteintercept.qualtrics.com").unwrap();
}

According to this stackoverflow post, it doesn't seem to be "best practice" to include an underscore in a subdomain. However, there are several real-world examples of URLs (shared in that post as well as the one I found above), all of which appear to have no issues in modern browsers and CLI tools. So it would be great to have it supported by addr as well.
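The distinction at play can be sketched with two label checks. Both functions are hypothetical and only illustrate the character rules; they are not how addr implements its parsers. The strict one follows the traditional host name rule (letters, digits and hyphens, with no hyphen at either end), while the relaxed one also accepts underscores, as the `_tcp.example.com.` DNS name in the README example does.

```rust
/// Hypothetical strict check: a host name label made of ASCII letters,
/// digits and hyphens, not starting or ending with a hyphen.
fn is_hostname_label(label: &str) -> bool {
    !label.is_empty()
        && label.len() <= 63
        && !label.starts_with('-')
        && !label.ends_with('-')
        && label.bytes().all(|b| b.is_ascii_alphanumeric() || b == b'-')
}

/// Hypothetical relaxed check: additionally accepts underscores, which show
/// up in DNS names such as `_tcp.example.com`.
fn is_dns_label(label: &str) -> bool {
    !label.is_empty()
        && label.len() <= 63
        && label
            .bytes()
            .all(|b| b.is_ascii_alphanumeric() || b == b'-' || b == b'_')
}
```

Under these assumptions the first label from the report passes the relaxed check but fails the strict one, which matches the behaviour described above.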

Strict internationalized domain names (IDN) validation

Hello!

First allow me to thank you for your work 👍 That crate has been really useful and very simple to use!

I am not exactly sure if it's actually a goal of the crate but I figured I might ask.
Should internationalized domain names be properly validated?
I was looking at test cases from https://github.com/json-schema-org/JSON-Schema-Test-Suite/blob/master/tests/draft7/optional/format/idn-hostname.json and it seems that some domain names are accepted whereas they should probably be rejected.

A few examples:

  • 〮실례.테스트 should be rejected because it contains a forbidden leading combining character
$ dig 〮실례.테스트
dig: '〮실례.테스트' is not a legal IDNA2008 name (string contains a forbidden leading combining character), use +noidnin
  • 실〮례.테스트 should be rejected because it contains a disallowed character
$ dig 실〮례.테스트
dig: '실〮례.테스트' is not a legal IDNA2008 name (string contains a disallowed character), use +noidnin
  • xn--X should be rejected because it contains invalid punycode data
$ dig xn--X
dig: 'xn--X' is not a legal IDNA2008 name (string contains invalid punycode data), use +noidnin

What do you think? Could the crate be enhanced to provide such domain validation? If not, do you recommend some alternatives?

Thank you for taking the time to read this.

has_known_suffix does not support custom suffixes

Name { full: "aaa-bbb.cccc.lab", suffix: Suffix { bytes: [108, 97, 98], fqdn: false, typ: None } }
let url_check = match parse_domain_name(address) {
    Ok(domain) => {
        println!("{:?}", domain);
        domain.has_known_suffix()
    }
    Err(_) => {
        false
    }
};

I tried to use lab as the domain name suffix and found that the verification failed.
Admittedly, lab is not a commonly used domain name suffix. Is there a way to implement a check that supports custom suffixes?
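One possible workaround, sketched here with the standard library only, is to keep a separate set of private suffixes and fall back to it when has_known_suffix() returns false. `has_custom_suffix` is a hypothetical helper and not part of addr, and the names are assumed to be lowercase already.

```rust
use std::collections::HashSet;

/// Hypothetical fallback (not part of addr): true when `name` ends in one of
/// the caller's own suffixes and has at least one label in front of it.
fn has_custom_suffix(name: &str, custom: &HashSet<String>) -> bool {
    // Ignore the trailing dot of a fully qualified name.
    let name = name.strip_suffix('.').unwrap_or(name);
    custom.iter().any(|suffix| {
        // The text left of the suffix must end at a label boundary, so that
        // "slab" does not count as ending in the suffix "lab".
        name.strip_suffix(suffix.as_str())
            .map_or(false, |rest| rest.ends_with('.'))
    })
}
```

With this in place, the check from the snippet above could become `domain.has_known_suffix() || has_custom_suffix(address, &custom)`, where `custom` contains "lab".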

Is this project still active?

The repository has no commits since October 2018. Perhaps the code just still works fine, but I want to be sure there is still a developer.

document that inputs are case-sensitive or convert to lowercase

I was somewhat surprised by this behavior:

$ evcxr
>> addr::parse_domain_name("GOOGLE.COM").unwrap().is_icann()
false
>> addr::parse_domain_name("google.com").unwrap().is_icann()
true

It might be a good idea to deal with input case in a more explicit way, either by documenting that the validation is case-sensitive, converting the inputs internally, or using case-insensitive comparisons.

I expect there is some reason for this (probably defined in some RFC about domain names) that would seem obvious to someone who knows, but my expectation as a new user of the library is that a parse_domain_name function would deal with the casing of the input somehow.
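Until the behaviour is documented or changed, a caller can normalise the input before parsing. The sketch below assumes the suffix lookup is a case-sensitive comparison against lowercase list entries, which would explain the output above; `normalize` and `has_listed_suffix` are hypothetical helpers, and the one-entry set stands in for the real list. DNS itself compares ASCII letters case-insensitively, so lowercasing them does not change which name is meant.

```rust
use std::collections::HashSet;

/// Hypothetical helper: lowercases ASCII letters and leaves all other
/// characters untouched.
fn normalize(name: &str) -> String {
    name.to_ascii_lowercase()
}

/// Hypothetical stand-in for a case-sensitive suffix lookup: checks whether
/// the last label of `name` is in `suffixes`.
fn has_listed_suffix(name: &str, suffixes: &HashSet<&str>) -> bool {
    name.rsplit('.')
        .next()
        .map_or(false, |last| suffixes.contains(last))
}
```

The normalised string would then be what gets passed to parse_domain_name.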

Remove tests directory from published crate

The tests directory contains public_suffix_list.dat which is licensed under the MPL2. As far as I can tell it isn't used by any of the crate's dependencies. Can you please upload a new release with the tests directory (or at least the file) removed?

Thanks!

Consider pub use psl

#[cfg(psl)]
pub use psl;
// or
#[cfg(psl)]
pub use psl::List;

This would allow using the example from the README without adding an additional crate.

Allow compiling addr without depending on openssl

The sub-crates of addr appear to have features to disable fetching of remote lists, but addr doesn't allow a 'pass-through' of these features. It would be nice to be able to disable the remote list from addr.

Additionally, I don't suppose there's a way to build the list at runtime?

Document the difference between domain names and DNS names

Old issue: rushmorem/publicsuffix#33

My test program outputs "domain contains illegal characters". But if you replace the _ with a -, it outputs "sub-domain.example.com".

src/main.rs:

use addr::parser::DomainName;
use psl::List;

fn main() {
    match List.parse_domain_name("sub_domain.example.com") {
        Ok(ok) => println!("{}", ok),
        Err(err) => println!("{}", err),
    }
}

Cargo.toml:

[package]
name = "adr"
version = "0.1.0"
authors = ["rusty-snake"]
edition = "2018"

# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html

[dependencies]
addr = "0.11"
psl = "2.0"

Docs fail to build on docs.rs

Thanks for this excellent library.

It appears psl has been causing the documentation on docs.rs for the adblock crate to fail to build for quite some time now. However, it looks like the documentation for both psl and addr build without issues. I'm curious if you have any insight as to what the difference might be.

Original issue reported here: brave/adblock-rust#86
