Comments (6)
Non-ASCII characters are forbidden in legitimate email addresses, at least in 'classic' (RFC 2822) addresses. There are more recent SMTP extensions, which I don't know much about, that allow non-ASCII in email headers, but AFAIK the standard approach is still to use RFC 2047 to encode non-ASCII text as ASCII. You seem to have a decoded address there, so one option is to make sure you aren't decoding the addresses from the raw header before giving them to the validator.
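For illustration, this is roughly what an RFC 2047 "B" encoded-word for a non-ASCII personal name looks like. It is a minimal stdlib-only sketch, not code from this project: `EncodedWordSketch` and `encodeWord` are hypothetical names, and it assumes the name fits in a single encoded-word (RFC 2047 caps each encoded-word at 75 characters, so longer names would need splitting).

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper, not part of email-rfc2822-validator.
class EncodedWordSketch {

    // Builds one RFC 2047 "B" encoded-word: =?UTF-8?B?<Base64 of the UTF-8 bytes>?=
    // Assumes the result stays within the 75-character limit for a single
    // encoded-word; longer names would have to be split into several words.
    static String encodeWord(String text) {
        String base64 = Base64.getEncoder()
                .encodeToString(text.getBytes(StandardCharsets.UTF_8));
        return "=?UTF-8?B?" + base64 + "?=";
    }

    public static void main(String[] args) {
        // The escape is e-acute; it keeps this source file ASCII-only.
        String header = encodeWord("Jos\u00e9") + " <jose@example.com>";
        System.out.println(header); // =?UTF-8?B?Sm9zw6k=?= <jose@example.com>
    }
}
```

The encoded header is pure ASCII, which is what a classic RFC 2822 parser expects to see.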
But of course you are right: our class should be able to extract the address parts even if the personal name is invalid per the RFCs.
I don't have time to work on this, personally, but maybe @bbottema can take a look at toughening the parser in these cases.
from email-rfc2822-validator.
@chconnor Thanks for explaining. I saw similar behavior with an npm module in Node, so I guessed that non-ASCII characters are not allowed by the RFC.
However, I am getting email addresses like this from an email API, and I just want to extract the actual address (local part + domain) and the personal name from the full address string. It seems no current Java or Node library can do that perfectly :(
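If the personal name is the only thing tripping up the parser, a deliberately lenient split may be enough for this use case. A minimal sketch, assuming input of the form `Name <local@domain>` or a bare `local@domain`: `LenientAddressSplitter` is a hypothetical helper, not part of email-rfc2822-validator, and it does no RFC validation at all (comments, `<` or `>` inside a quoted name, and group syntax are not handled).

```java
// Hypothetical, deliberately lenient helper; NOT part of email-rfc2822-validator
// and NOT RFC 2822 compliant. It only splits "Name <local@domain>" or a bare
// "local@domain", so non-ASCII personal names pass through untouched.
class LenientAddressSplitter {

    // Returns {personalName, localPart, domain}, or null if no local@domain is found.
    static String[] split(String address) {
        String name = "";
        String addrSpec = address.trim();
        int open = addrSpec.lastIndexOf('<');
        int close = addrSpec.lastIndexOf('>');
        if (open >= 0 && close > open) {
            name = addrSpec.substring(0, open).trim();
            // Drop the surrounding double quotes of a quoted display name.
            if (name.length() >= 2 && name.startsWith("\"") && name.endsWith("\"")) {
                name = name.substring(1, name.length() - 1);
            }
            addrSpec = addrSpec.substring(open + 1, close).trim();
        }
        int at = addrSpec.lastIndexOf('@');
        if (at <= 0 || at == addrSpec.length() - 1) {
            return null; // missing local part or domain
        }
        return new String[] {name, addrSpec.substring(0, at), addrSpec.substring(at + 1)};
    }

    public static void main(String[] args) {
        // The escapes are o-umlaut and u-umlaut.
        String[] parts = split("\"J\u00f6rg M\u00fcller\" <joerg@example.com>");
        System.out.println(parts[0] + " | " + parts[1] + " | " + parts[2]);
    }
}
```

The trade-off is intentional: it never rejects a name, so it should only be used on input you already trust to be a single mailbox.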
Hopefully @bbottema has some time to check it out; it shouldn't be hard to catch an appropriate exception and simply not fail when this happens. Or, better, I suppose, to check for non-ASCII characters preemptively and behave accordingly. An increasing number of mail servers seem to accept and pass through UTF-8 characters, so we should be able to handle it.
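A preemptive check could be as simple as the following sketch (`NonAsciiCheck` and `isAscii` are hypothetical names): anything that fails the check gets routed to a fallback instead of the strict parser.

```java
// Hypothetical helper, not part of email-rfc2822-validator.
class NonAsciiCheck {

    // True when every char of the string is in the US-ASCII range 0x00-0x7F.
    static boolean isAscii(String s) {
        return s.chars().allMatch(c -> c <= 0x7F);
    }

    public static void main(String[] args) {
        String[] inputs = {"Bob <bob@example.com>", "J\u00f6rg <joerg@example.com>"};
        for (String input : inputs) {
            // Route ASCII input to the strict parser; anything else to a fallback
            // (e.g. pre-processing, a lenient split, or a clear error).
            System.out.println(input + " -> " + (isAscii(input) ? "strict parser" : "fallback"));
        }
    }
}
```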
I would love to add extra support for this, but I recently became a father and have my hands full (literally!). Adding non-standardized support isn't exactly at the top of my list right now.
Oh, sure, pull the father card! :-)
I just took a look at it, and handling non-ASCII in addresses is going to be too complicated (and probably not appropriate) for us. I'd suggest pre-processing your addresses before passing them to our class. A brutal but simple way is to strip out non-ASCII characters. If you know the email address is not null, you can do:
EmailAddressParser.getAddressParts(emailAddress.replaceAll("[^\\x00-\\x7F]", ""), EmailAddressCriteria.RFC_COMPLIANT, false);
...but that may not be what you want, since it mangles the personal name, or erases it altogether when the name is entirely non-ASCII. Actually extracting the Unicode characters would require a significant rewrite of this project, and I'd guess that isn't going to happen any time soon.
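To see what the stripping regex does to the personal name, here is a small sketch (the class name is hypothetical; the regex is the one from the suggestion above). A partially non-ASCII name is mangled, and a fully non-ASCII name disappears, leaving only the leading space.

```java
// Hypothetical demo class; the regex is the one suggested above.
class StripNonAsciiDemo {

    // Removes every char outside the range U+0000..U+007F.
    static String stripNonAscii(String s) {
        return s.replaceAll("[^\\x00-\\x7F]", "");
    }

    public static void main(String[] args) {
        // Partially non-ASCII name: mangled to "Jos".
        System.out.println(stripNonAscii("Jos\u00e9 <jose@example.com>"));
        // Fully non-ASCII name (two CJK characters): only the space survives.
        System.out.println(stripNonAscii("\u5f20\u4e09 <zhang@example.com>"));
    }
}
```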
I don't know how you're getting these email addresses: it's possible that whoever sends them to you is decoding them from properly encoded RFC 2047 strings, in which case you could ask them to stop doing that and send you the raw addresses from the email headers.
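One rough way to tell whether you are receiving raw or already-decoded headers is to look for RFC 2047 encoded-words (`=?charset?B|Q?text?=`). This is a loose sketch with a hypothetical class name: the pattern only approximates the encoded-word syntax and does not validate charset names, encoded text, or length limits.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of email-rfc2822-validator.
class EncodedWordDetector {

    // Loose approximation of an RFC 2047 encoded-word: =?charset?B|Q?text?=
    // (encoding letter case-insensitive). It does not validate the charset name,
    // the encoded text, or the 75-character length limit.
    private static final Pattern ENCODED_WORD =
            Pattern.compile("=\\?[^?\\s]+\\?[BbQq]\\?[^?\\s]*\\?=");

    static boolean containsEncodedWord(String header) {
        return ENCODED_WORD.matcher(header).find();
    }

    public static void main(String[] args) {
        // Still encoded: looks like a raw header value.
        System.out.println(containsEncodedWord("=?UTF-8?B?Sm9zw6k=?= <jose@example.com>"));
        // Non-ASCII with no encoded-word: probably decoded somewhere upstream.
        System.out.println(containsEncodedWord("Jos\u00e9 <jose@example.com>"));
    }
}
```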
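Using Normalizer:
import java.text.Normalizer;
// Decompose accented characters into base letter + combining mark, e.g. "é" -> "e" + U+0301
string = Normalizer.normalize(string, Normalizer.Form.NFD);
// Then either drop every remaining non-ASCII character:
string = string.replaceAll("[^\\p{ASCII}]", "");
// ...or, to keep other Unicode letters, drop only the combining marks instead:
string = string.replaceAll("\\p{M}", "");
This removes diacritics but keeps the base letters.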
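A runnable sketch of both variants (class and method names are hypothetical) shows where they differ: letters that have no canonical decomposition, such as `Æ` or `ø`, survive the combining-mark variant but are dropped by the ASCII-only variant.

```java
import java.text.Normalizer;

// Hypothetical helper names; both follow the Normalizer approach above.
class DiacriticFolding {

    // NFD splits e.g. u-umlaut into "u" + U+0308 (combining diaeresis);
    // removing \p{M} (all combining marks) then leaves the base letter.
    static String stripMarks(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD).replaceAll("\\p{M}", "");
    }

    // NFD, then drop every non-ASCII char; letters with no decomposition
    // (e.g. AE ligature, o-slash, CJK) are removed entirely.
    static String toAscii(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD).replaceAll("[^\\p{ASCII}]", "");
    }

    public static void main(String[] args) {
        String german = "Gesellschaft f\u00fcr Freiheitsrechte";
        String danish = "\u00c6r\u00f8sk\u00f8bing"; // contains no decomposable letters
        System.out.println(stripMarks(german)); // Gesellschaft fur Freiheitsrechte
        System.out.println(toAscii(german));    // Gesellschaft fur Freiheitsrechte
        System.out.println(stripMarks(danish)); // unchanged
        System.out.println(toAscii(danish));    // rskbing
    }
}
```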
Related Issues (20)
- Make the EmailValidator configurable
- `"atest"@example.com` is parsed to `[email protected]`
- "Bob" <[email protected]> is parsed to "[email protected]" <Bob>
- Update project to Java 1.7 and Jakarta Mail
- Remove proprietary annotations
- IllegalArgumentException when passing null to EmailAddressValidator.isValid(String)
- [question] Parsing problem with '=?UTF-8?Q?Gesellschaft_f=C3=BCr_Freiheitsrechte_e=2EV=2E?= <[email protected]>'
- Brackets and Parens not parsed properly and API documentation / usage needs improvement
- Verify regex crash doesn't happen
- The prefix "Xlint" for element "Xlint:all" is not bound.
- Consider making jakarata.mail as optional and make slf4j as scope test
- Incorrect validation for a variety of emails
- Published JARs for versions 2.2.0 and 2.3.0 have an invalid module name in MANIFEST.MF
- Javadoc regarding default setting for email validation contradicting code, but what it should be
- Address causing infinite loop
- URLDataSource uses source data name
- Support international domain names
- RFC: is & valid in a domain?
- Investigate and possibly deprecate email-rfc2822-validator in favor of EmailValidator4J