
Comments (6)

chconnor commented on July 4, 2024

Non-ASCII characters are forbidden in legitimate email addresses, at least in 'classic' addresses. There are more recent SMTP extensions, which I don't know much about, that allow non-ASCII in email headers, but AFAIK the standard practice is still to use RFC 2047 to encode non-ASCII as ASCII. You seem to have a decoded address there. So one option is to make sure you aren't decoding the addresses from the raw headers before giving them to the validator.

But of course you are right: our class should be able to extract the address parts, even if the personal name is invalid per the RFCs.

I don't have time to work on this, personally, but maybe @bbottema can take a look at toughening the parser in these cases.

from email-rfc2822-validator.

kdabir commented on July 4, 2024

@chconnor thanks for explaining. I saw similar behavior using an npm module in Node, so I was guessing that non-ASCII characters are not allowed per the RFC.

However, I am actually getting email addresses like this from an email API, and just wanted to extract the actual address (local + domain) and the personal name from the entire address. Seems like no present Java/Node library can do that perfectly :(


chconnor commented on July 4, 2024

Hopefully @bbottema has some time to check it out; shouldn't be hard to catch an appropriate exception and just not-fail when this happens. Or better, I suppose, to check for non-ascii preemptively and behave accordingly. Seems like an increasing number of mail servers are accepting and passing through UTF-8 type characters, so we should be able to handle it.
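For what it's worth, the preemptive check could look something like the sketch below. It only uses the JDK's `CharsetEncoder.canEncode`; the class and method names (`AsciiCheck`, `isPureAscii`) are made up for illustration and are not part of email-rfc2822-validator.

```java
import java.nio.charset.StandardCharsets;

public class AsciiCheck {

    // Hypothetical helper: true when every character in the input is 7-bit ASCII.
    static boolean isPureAscii(String s) {
        // A fresh encoder per call, since CharsetEncoder instances are not thread-safe.
        return StandardCharsets.US_ASCII.newEncoder().canEncode(s);
    }

    public static void main(String[] args) {
        // Plain ASCII address: safe to hand to a strict RFC 2822 parser.
        System.out.println(isPureAscii("John Doe <john@example.com>"));
        // Decoded personal name containing u-umlaut: needs special handling first.
        System.out.println(isPureAscii("J\u00fcrgen M\u00fcller <jm@example.com>"));
    }
}
```

A caller could use the result to decide whether to parse directly or to pre-process the input first.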


bbottema commented on July 4, 2024

I would love to add extra support for this, but I recently became a father and have my hands full (literally!). Adding non-standardized support isn't exactly at the top of my list currently.


chconnor commented on July 4, 2024

Oh, sure, pull the father card! :-)

I just took a look at it and it's going to be too complicated (and probably not appropriate) for us to handle non-ASCII in addresses. I'd suggest pre-processing your addresses before sending them to our class. A brutal but simple way is to just strip out the non-ASCII characters. If you know the email address is not null, you can just do:

EmailAddressParser.getAddressParts(emailAddress.replaceAll("[^\\x00-\\x7F]", ""), EmailAddressCriteria.RFC_COMPLIANT, false);
...but that may not be what you want to accomplish, since it will erase the personal name altogether if the name consists entirely of non-ASCII characters. Actually extracting the Unicode characters would require a significant rewrite of this project, and I would guess that it isn't going to happen any time soon.

I don't know how you're getting these email addresses: it's possible that whoever is sending them to you is decoding them from properly-encoded RFC2047 strings, in which case you could ask them to stop doing that and send you the raw addresses from the email headers.
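To illustrate what a properly encoded header looks like, here is a minimal sketch that builds an RFC 2047 "encoded-word" (Base64 variant) for a personal name using only the JDK. The names `EncodedWordSketch`, `encodeWord` and `formatAddress` are hypothetical, and the sketch ignores the 75-character limit on encoded-words, so it is only reasonable for short names.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class EncodedWordSketch {

    // Wraps text as =?UTF-8?B?<base64>?= (RFC 2047 "B" encoding).
    // Assumption: the text is short enough to fit in a single encoded-word.
    static String encodeWord(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return "=?UTF-8?B?" + Base64.getEncoder().encodeToString(utf8) + "?=";
    }

    // Builds an all-ASCII "personal name <address>" string.
    // Assumption: the address part itself is already plain ASCII.
    static String formatAddress(String personalName, String address) {
        return encodeWord(personalName) + " <" + address + ">";
    }

    public static void main(String[] args) {
        // The personal name contains u-umlaut; the output line contains ASCII only.
        System.out.println(formatAddress("J\u00fcrgen", "jm@example.com"));
    }
}
```

A raw header in this form contains no non-ASCII characters, which is the kind of input the validator was written for.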


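bbottema commented on July 4, 2024

Using Normalizer:

import java.text.Normalizer;

// Decompose accented characters into base letter + combining mark
string = Normalizer.normalize(string, Normalizer.Form.NFD);
// Drop everything that is not ASCII (the combining marks included)
string = string.replaceAll("[^\\p{ASCII}]", "");
// or, to keep other Unicode characters and drop only the combining marks:
string = string.replaceAll("\\p{M}", "");

This removes diacritics, but keeps base letters.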
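As a self-contained sketch of the same idea (the class and method names `DiacriticsSketch` and `stripDiacritics` are made up for this example):

```java
import java.text.Normalizer;

public class DiacriticsSketch {

    // Decomposes to NFD, then removes the combining marks (\p{M}),
    // leaving the base letters in place.
    static String stripDiacritics(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}", "");
    }

    public static void main(String[] args) {
        // "José Müller <jm@example.com>" written with escapes to avoid source-encoding issues
        String raw = "Jos\u00e9 M\u00fcller <jm@example.com>";
        System.out.println(stripDiacritics(raw)); // prints "Jose Muller <jm@example.com>"
    }
}
```

Note that letters without a canonical decomposition (for example "ø" or "ß") pass through this variant unchanged, so the result is not guaranteed to be pure ASCII.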

