Giter VIP home page Giter VIP logo

Comments (11)

egulias avatar egulias commented on August 28, 2024 1

Hi there!
Thanks for reaching out and @chconnor for the analysis.

On the analysis:

  1. "compilation issue": Will check it. Tests are passing though.
  2. Features: That's correct, the validator lacks any "feature flag" or ways of extracting parts of an email. It was not the intention of it. The way to add/remove is to implement the interface.
  3. (& 4)RFCs issues:
    "bob" @ example.com
    "Bob Smith" <[email protected]>
    Bob Smith <[email protected]>
    are all invalid emails, at the veary least because there's a whitespace in the string. So maybe you are trying to validate with an "email application" point of view?
  4. Sample usage: my bad, you just spotted typo/bug. Will fix. Thanks!
  5. Hand made lexer:
    RFCs uses a extended bnf which generates a crazy grammar. I also took a look at http://www.antlr.org/ back then but it seemed that the effort to learn the tool to generate it would not compensate.
    I'd be more than happy to change the approach if there's a tool out there for lexer generation out of the bnf already provided by the RFC.
    I haven't found any lib doing this without a regex, maybe a hand made parser is the least of the sins :).
    You can check https://github.com/egulias/EmailValidator4Scala for a similar to a regex approach (not yet complete however).
    Also this lib is a port from https://github.com/egulias/EmailValidator which has the similar testing and we can say is battle tested (used by Drupal 8, Symfony and with more than 7M downloads, see https://packagist.org/packages/egulias/email-validator).
    Regarding testing, was (is?) easier to test for the failures than for the crazy cases. You can take a look at egulias/EmailValidator's issues (e.g egulias/EmailValidator#161) to see how "crazy" it can get, to the point where being strict is not necesarly useful :S

One more thing: I checked Hibernate implementation of the @email for validation and... it is a regex. A well crafted one, passes almost all the cases (half a year ago when I did the tests).

Finally, the lib is young, yes. Is a port from another one and in Java is hard to get some adoption to battle test and improve. I'm more than happy to join efforts.

cheers!

from email-rfc2822-validator.

bbottema avatar bbottema commented on August 28, 2024 1

And whats the status of this task? Can email4j replace this lib?

I'm not convinced it solves any real problem. The only real advantage I see is a performance boost, but I'm not convinced this is that really an issue. Furthermore, I have very limited time, so I have to be a little critical as to which changes I spend my time on. Currently I mostly investigate bugs and useful features.

That aside, I'm not that happy about the FkEMailValidator code base. I can't easily analyse the code as it is in German and the quality is not great imo (lots of magic numbers), it's not really a library at all (no static code analyses, no flexibility to change what is validated, no standardized build script, not available in Maven Central, etc), so I don't see this the road to go forward with this library (yet).

As far as Email4J is concerned, I think the libraries are too divergent in feature set and compliancy and as I don't have the time to bridge the difference and patch the bugs in Email4J, I think we can simply coexist for the time being.

This library too has been battle tested and quite mature now, I have no qualms leaving it be until a true contender comes along. 🍺

from email-rfc2822-validator.

chconnor avatar chconnor commented on August 28, 2024

Wow, that's great news. I'll check it out.

from email-rfc2822-validator.

chconnor avatar chconnor commented on August 28, 2024

Unfortunately I haven't kept up with the RFC's to know how accurate this new parser/lexer is.
I note a few things:

  • The recent code wouldn't compile for me... I had to change line 138 (commit f23390a) of emailvalidator4j.parser.Parser.java to the following to get it to compile, and I don't know if this is the intent of the non-compiling line that was there:

if (!this.lexer.isNextToken(Tokens.SP) && !this.lexer.isNextToken(Tokens.HTAB)) {

  • I may well be missing something, but it seems a lot less featured than this project. E.g. I don't see methods for extracting parts of addresses, etc, or configuration options for allowing or disallowing this or that, etc. I didn't dig through everything, but that's the impression I get.

  • Maybe new RFCs are less permissive than 2822 was (we can only hope), but there are a few addresses that were valid under 2822 that this class won't validate. E.g.:

"bob" @ example.com -- invalid

  • It also failed to validate these very standard forms for me:
"Bob Smith" <[email protected]>
Bob Smith <[email protected]>

Those are probably the most common forms of address there are...but again, I had to modify the code to get it to compile, so maybe I broke it.

  • the demo code on the github page shows
    if (validator->isValid(args[0])
    ...which doesn't seem to be a proper use of the -> lambda operator and perhaps is confused with some other language's method reference syntax? Or perhaps I'm confused?

  • It would also appear that the author hand-made a lexer, instead of automatically generating one from a specified grammar, and it's the latter that Scott Kitterman recommended, as I understood him. Again, I'm not hip to the latest RFC's, but 2822 wasn't something that a person would likely correctly/completely implement by hand (hence the regex approach in the first place.) There are a respectable number of test cases present, but only a few that are intended to pass: the majority are invalid addresses. Meaning, the vast universe of crazy-but-legitimate addresses probably aren't handled. If one is going to hand-code, I intuit that the regex approach is probably the better route.

Upshot: emailvalidator4j seems young still, and I may not be testing properly, but I would stick with this project for now. Perhaps it will continue to improve. And perhaps I'm totally wrong in my analysis. :-)

Last commit was last Feb, I think.

from email-rfc2822-validator.

bbottema avatar bbottema commented on August 28, 2024

I'm grateful for your quick analyses. I'll invite the creator of EmailValidator4J to take a look and see if he can address some of observations you made.

from email-rfc2822-validator.

chconnor avatar chconnor commented on August 28, 2024

Hello again and thanks for the response --

re: (1) -- line 138 of Parser.java calls EmailLexer.isNextToken() with an ArrayList of Token objects (i.e. ArrayList<Tokens>.) My Java is rusty, so I can't say if this is wrong; I see that EmailLexer has isNextToken(List<TokenInterface> tokens) -- and since the Token object implements TokenInterface, perhaps this is OK and my older version of Eclipse is just confused, but I'm not able to compile (using Java 8) because no matching method in EmailLexer is found for that line 138. If I do an apparently-redundant cast to ArrayList it compiles (with a warning): if (!this.lexer.isNextToken((ArrayList)(new ArrayList<>(Arrays.asList(Tokens.SP, Tokens.HTAB))))) {

I don't know what to do about that, and will leave it to the more experienced Java folks such as yourself. :-)

re: (3) -- our class implements the grammar for the "mailbox" and "group" ABNF specifications from RFC 2822 (as well as a few other tokens). In other words, what we call an "email address" is either the "name-addr" or "addr-spec" tokens from the grammar.

However, I just checked 5322, and as with 2822, I think you will find that even the more restricted addr-spec token permits whitespace to be found in it. This is due to the liberally-defined "dot-atom" token, and since an (unquoted) addr-spec is just a dot-atom left of the "@" sign, and since the dot-atom grammar starts and ends with optional CFWS, you are technically allowed to insert any amount of folded whitespace before and after the "local part" left of the @ sign (please correct me if I have misunderstood this). I don't personally understand why they don't just remove this annoying "feature" of the spec, since nobody wants to allow it in practice, but for whatever (potentially good) reason, they have not.

As a result, the example addresses I supplied above are valid, as well as many more crazy than that. The following, for example, is a legitimate 2822 (and probably 5322) email address, white space and all:

"<bob \" (here) " < (hi there) "bob(the man)smith" (hi) @ (there) example.com (hello) > (again)

Our intent was to make a parser that matched as much of 2822 as possible, and which permitted users to extract as much usable information from email addresses presented in the wild as possible, with a focus on text extracted from email messages, as opposed to simply validating email addresses entered in webforms. We included some features to make it usable as a more "strict" email parser as well.

However an address like:

"Bob Smith" <[email protected]>

...is the most common form of address on the internet, and it seems to me that it would be wise for any email address parser to be able to handle that?

I'm no longer involved in maintaining this code, so I will leave it to the current maintainers to decide if it makes sense to join forces. It seems to me that they are different projects with slightly different goals, so maybe it doesn't make sense to do so.

from email-rfc2822-validator.

ea234 avatar ea234 commented on August 28, 2024

https://github.com/ea234/FkEMailValidator

from email-rfc2822-validator.

chconnor avatar chconnor commented on August 28, 2024

Thanks @ea234 -- I don't have time to really check this out or test it, but judging from an overview of the code it looks like good work. @bbottema it looks like a hand-tooled approach that includes exhaustive testing of many cases, so the first question of course would be if it covers all the weird cases. It also might not include the utility stuff that email-rfc2822-validator does (extraction, cleanup, etc.) not sure -- I don't know german. But if it does I would expect the hand-tooled approach to be faster, naturally.

from email-rfc2822-validator.

ea234 avatar ea234 commented on August 28, 2024

I added some changes:

  • A version with english comments (Google Translate)
  • A version with email part extraction (... in German)
  • Fixed a problem with weird inputs
    Input "Non EMail part <(comment)Local."Part"@[IPv6::ffff:127.0.0.1]>" is now OK

Answers to Questions:

  • I think i covered the most weird Inputs (see the Testclass)
  • every Class has its own MAIN-Function with testcases
  • I now have a Version with Part-Extraction. You can make a Diff between FkEMail_EXTRACTION and FkEMail) to spot the changes.

Questions:

  • IPv6 Adresses: do I check for "IPv6:" or only for "IPv6" ?
  • Square Brackets: Is the form "[email protected] cd" correct ?

from email-rfc2822-validator.

Lonzak avatar Lonzak commented on August 28, 2024

And whats the status of this task? Can email4j replace this lib?

from email-rfc2822-validator.

Lonzak avatar Lonzak commented on August 28, 2024

Thanks for the heads-up...

from email-rfc2822-validator.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.