Comments (15)

ggoforth commented on June 1, 2024

[Image: screenshot, 2018-07-27]

But for reals, reading this issue is fantastic. I like the solution proposed at the end, and the amount of testing done against it. 👍

from standards-and-practices.

coreyshuman commented on June 1, 2024

I noticed we do have an example documented in best practices here:
https://github.com/Shift3/standards-and-practices/tree/main/best-practices/development-tools/validation#code-example

For this to be a completed standard, we should include a definition for our goal on what should and shouldn't pass this validation. It should also include a set of unit tests to verify that goal.

coreyshuman commented on June 1, 2024

This is a promising solution if we could figure out a way to standardize it for our projects.
https://github.com/django/django/blob/master/django/core/validators.py#L164

Plus lots of examples and resources here:
http://emailregex.com/

zbyte64 commented on June 1, 2024

This Stack Overflow answer has a pretty awesome regexp pattern with a state machine diagram: https://stackoverflow.com/questions/201323/how-to-validate-an-email-address-using-a-regular-expression

But considering that different languages have different regexp syntaxes, it might be better to designate a validation library for each language we use. For Node.js, isemail looks pretty robust: https://github.com/hapijs/isemail/blob/master/test/tests.json

coreyshuman commented on June 1, 2024

I would like to humbly propose a solution which performs as well as the RFC 5322 Official Standard regex (on my particular test set) but is much easier to understand and verify.

^(?!\.)(?!.*?\.(\.|@))[\w\d.!#$%&'*+\-\/=?^_`{|}~]+@[\w\d.-]+\.[\w\d]{2,}$
  • ^ - start of line
  • (?!\.) - don't allow the line to start with .
  • (?!.*?\.(\.|@)) - don't allow consecutive periods, ex. ([email protected]). Also don't allow a period at the end of the local part, ex ([email protected])
  • [\w\d.!#$%&'*+\-\/=?^_`{|}~]+ - match one or more letters, numbers, and these special characters: .!#$%&'*+-/=?^_`{|}~
  • @ - match the literal character @
  • [\w\d.-]+ - match one or more letters, digits, underscores, periods (.), or hyphens (-)
  • \. - match a period (.)
  • [\w\d]{2,} - match 2 or more letters, digits, or underscores (\w already covers digits and _, so the \d is redundant)
  • $ - end of line

This regex can be tested here: https://regex101.com/r/A9jZZ4/4
This is not meant to be a perfect solution, but should cover 99% of email addresses Shift3 would expect to deal with, while catching some basic mistakes for user convenience. It does NOT handle extended ASCII / international characters, which the RFC 5322 standard does.
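
A minimal TypeScript sketch of how the pattern above might be wrapped for client-side use. The helper name `isPlausibleEmail` and the sample addresses are illustrative assumptions; they are not part of the proposal or of the original test set.

```typescript
// The proposed pattern, copied verbatim from the comment above.
const PROPOSED_EMAIL_PATTERN =
  /^(?!\.)(?!.*?\.(\.|@))[\w\d.!#$%&'*+\-\/=?^_`{|}~]+@[\w\d.-]+\.[\w\d]{2,}$/;

// Hypothetical helper: true when the input matches the proposed pattern.
function isPlausibleEmail(input: string): boolean {
  return PROPOSED_EMAIL_PATTERN.test(input);
}

// Illustrative inputs (assumptions, not taken from the original test set).
const samples: string[] = [
  'user@example.com',              // ordinary address
  'first.last+tag@sub.example.co', // dots and a plus tag in the local part
  '.user@example.com',             // leading period
  'us..er@example.com',            // consecutive periods
  'user.@example.com',             // period right before the @
  'admin@mailserver1',             // no dot-separated suffix after the host
];

for (const sample of samples) {
  console.log(`${sample} -> ${isPlausibleEmail(sample)}`);
}
```

The first two samples match; the remaining four are rejected by the lookaheads or by the required `\.` plus suffix at the end of the domain.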

The following email addresses expectedly pass this validation:

The following email addresses expectedly fail this validation:

[email protected]
@test.com
admin@mailserver1
"()<>[]:,;@\\\"!#$%&'-/=?^_`{}| ~.a"@example.org
user@[2001:DB8::1]
" "@example.org
[email protected]
"very.(),:;<>[]\".VERY.\"very@\\ \"very\".unusual"@strange.example.com
Abc.example.com 
A@b@[email protected]
a"b(c)d,e:f;g<h>i[j\k][email protected]
just"not"[email protected]
this is"not\[email protected]
this\ still\"not\\[email protected]
[email protected]
[email protected]
[email protected].

I would appreciate it if others would throw some other test cases against this regex and try to break it.

For reference, here is the RFC 5322 Standard I am comparing against.

^(?:[a-z0-9!#$%&'*+\/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+\/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])$

Found at http://emailregex.com/

zbyte64 commented on June 1, 2024

I ran the validation examples from isemail against:

^(?!\.)(?!.*?\.(\.|@))[\w\d.!#$%&'*+\-\/=?^_`{|}~]+@[\w\d.-]+\.[\w\d]{2,}$

The most notable gaps are the lack of UTF-8 support and the hyphen handling.

False positives:

[email protected]
a@abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefg.abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefg.abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefg.abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefg.abcdefghijkl.hijk
[email protected]
[email protected]
[email protected]

False negatives:

[email protected]
ñoñó[email protected]
test@\uD800\uD800ñoñó郵件ñoñó郵件.ñoñó郵件ñoñoñó郵件ñoñó.郵件ñoñó郵件ñoñó郵件.ñoñó郵件ñoñó郵件ñoñó郵件.ñoñó郵件ñoñó郵件.ñoñó郵件ñoñó郵件.ñoñó郵件ñoñó郵件.ñoñó郵件ñoñó郵件.ñoñó郵件ñoñó郵件.oñó郵件ñoñó郵件ñoñó郵件.商務
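
Both failure modes can be reproduced with a short TypeScript sketch. The two addresses are hypothetical stand-ins chosen to show the same behavior; they are not the isemail cases listed above.

```typescript
// The pattern under test, copied verbatim from the comment above.
const proposedPattern =
  /^(?!\.)(?!.*?\.(\.|@))[\w\d.!#$%&'*+\-\/=?^_`{|}~]+@[\w\d.-]+\.[\w\d]{2,}$/;

// Hypothetical false positive: hostname rules do not allow a label to start
// with a hyphen, but [\w\d.-]+ places no restriction on where hyphens appear.
const leadingHyphenDomain = 'user@-example.com';

// Hypothetical false negative: in JavaScript, \w only covers [A-Za-z0-9_],
// so any non-ASCII character in the local part is rejected.
const nonAsciiLocalPart = 'ñoñó@example.com';

const acceptsLeadingHyphen: boolean = proposedPattern.test(leadingHyphenDomain);
const acceptsNonAscii: boolean = proposedPattern.test(nonAsciiLocalPart);

console.log(`leading hyphen accepted: ${acceptsLeadingHyphen}`);
console.log(`non-ASCII local part accepted: ${acceptsNonAscii}`);
```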

coreyshuman commented on June 1, 2024

Hyphen support I'm not as concerned with, in terms of hitting that balance of simplicity vs. complete accuracy to RFC 5322. A false positive is not a big deal, vs a false negative which would stop a valid user from accessing a service. With that in mind, the false negatives do seem like a problem. How common is UTF8 support with the major email providers? And what percentage of users would hit that use case? If we're talking < 1 %, I would rather just tell a user to use a different email address.

Let me know what you guys think.

stephengtuggy commented on June 1, 2024

Personally, I've known people from multiple people groups in various parts of the world, and as far as I recall, almost all of them used plain ANSI characters in their email addresses, web addresses, and IM'ing. So I don't think UTF-8 support is a big deal.

zbyte64 commented on June 1, 2024

Frankly, I think it is more important to adopt a library for this concern than to bless a regex to be copied into all projects. Having a small, clever regex pattern to stamp out is cool, but it runs afoul of the DRY principle: https://en.wikipedia.org/wiki/Don%27t_repeat_yourself. The argument for simplicity makes more sense if we're the ones maintaining the code, which, for something as common as email validation, can we not?

Emoji is another reason to support UTF8: https://medium.com/@zackbloom/i-have-a-unicode-email-address-fbecd630ec12

If we're good at our jobs, our software should live to see a day when UTF-8 is more common in email addresses. Since we're here to address email validation, let's do it so we don't have to again.

coreyshuman commented on June 1, 2024

I don't disagree. My goal in this particular task was to discover a good front-end validation for email which gives a user immediate feedback to avoid typos, not necessarily to vet and validate all possible correct email addresses (we can leave that to the 3rd party email service).

The issue I see with using someone else's library for this is that we support and develop for many frontend frameworks (ionic, react, .net mvc, nativescript, xamarin..... ) One library would not work across all of those. A regex line would.

I imagine this being the beginning of a Shift3 internal library of common functions, which we could build out for all of our primary development. If these things were rolled into our own libraries, we'd be respecting DRY far more than we do nowadays (across projects, not necessarily per individual project).

@zbyte64 I'm definitely open for other suggestions as well. Let me know if there was a particular library you had in mind, or if there is something you're already doing on your projects that you really like.

coreyshuman commented on June 1, 2024

@michaelachrisco 3 years later and this is still a recurring issue in projects. Now that we have boilerplates to implement a standard, I think this is a good time to resurface this.

Now that we're supporting locale translation in the boilerplates, I think the UTF-8 argument has some more strength behind it.

I suspect for client-side validation we will still be served best by simple and permissive validation, as opposed to strict and technical. What do you think?

coreyshuman commented on June 1, 2024

Adding that I agree with Justin Schiff's assessment here:

@coreyshuman I would normally agree, but what I'm trying to make clear here is that complicated email regex is not the preferred pattern for signup or email validation anyway. Attempting to send an email to the address specified is. Providing a permissive regex, or none at all (or just asking the user to enter their email twice), while sending a confirmation email, is a 100% method to ensure you end up with a valid email address, and a 100% method to make sure you have no false negatives.

When you run into an "edge case" in your complicated regular expression you have to do the following: find the fix, hope you don't introduce a regression in other untested parts of the regex -> backport to all running applications using the old regex -> make sure all old versions of applications are updated -> etc. etc. etc.

I think that having an email regex may be valuable for things other than sign up fields, but I want it to be clear that in my opinion for sign in/sign up this is not the preferred pattern of validation, nor does it enhance security.

Originally posted by @DropsOfSerenity in #130 (comment)
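
As a rough TypeScript sketch of the "permissive check, then confirm by email" idea quoted above: the pattern, the function names, and the `SignupDecision` type are all assumptions made for illustration, and the confirmation email itself (not shown) remains the real validation step.

```typescript
// Deliberately permissive (an assumed pattern, not one from this thread):
// exactly one "@" with at least one non-space character on each side.
const PERMISSIVE_EMAIL_PATTERN = /^[^\s@]+@[^\s@]+$/;

// Hypothetical helper: screens out obvious typos only.
function looksLikeEmail(input: string): boolean {
  return PERMISSIVE_EMAIL_PATTERN.test(input.trim());
}

// Hypothetical signup step: anything that looks like an address moves on to
// the confirmation email; everything else is sent back to the user.
type SignupDecision = 'send-confirmation' | 'ask-to-retype';

function screenSignupEmail(input: string): SignupDecision {
  return looksLikeEmail(input) ? 'send-confirmation' : 'ask-to-retype';
}

console.log(screenSignupEmail('user@example.com'));
console.log(screenSignupEmail('ñoñó@example.com'));
console.log(screenSignupEmail('no-at-sign'));
```

A check this loose still rejects the rare valid address with a quoted "@" in the local part, so even here the trade-off between simplicity and completeness does not fully disappear.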

Karvel commented on June 1, 2024

The current RegEx in the Angular boilerplate is the following:

/^[a-z0-9!#$%&'*+\/=?^_\`{|}~.-]+@[a-z0-9]([a-z0-9-])+(\.[a-z0-9]([a-z0-9-]*[a-z0-9])?)*$/i

Against the test sets you provided above, every address that should match does. Of the addresses that should fail, the commented-out ones below pass instead.

        const failingValues: string[] = [
          // '[email protected]', //
          '@test.com',
          // 'admin@mailserver1', //
          `"()<>[]:,;@\\\"!#$%&'-/=?^_\`{}| ~.a"@example.org`,
          'user@[2001:DB8::1]',
          '" "@example.org',
          '[email protected]',
          '"very.(),:;<>[]".VERY."very@\\ "very".unusual"@strange.example.com',
          'Abc.example.com ',
          'A@b@[email protected]',
          'a"b(c)d,e:f;g<h>i[jk][email protected]',
          'just"not"[email protected]',
          'this is"[email protected]',
          'this still"not\\[email protected]',
          // '[email protected]', //
          '[email protected]',
          '[email protected].',
        ];

I do have unit tests for the validator using the regular expression, but I can add the test sets as follows:

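      import { FormControl } from '@angular/forms';
      // EmailValidation is the boilerplate's own validator class; its import path depends on the project layout.

      describe('[Unit] EmailValidation validEmail() Required', () => {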
        const urlValidator = EmailValidation.validEmail(true);
        const emailControl = new FormControl('');
        const matchingValues: string[] = [
          '[email protected]',
          '[email protected]',
          '[email protected]',
          '[email protected]',
          '[email protected]',
          '[email protected]',
          '[email protected]',
          '[email protected]',
          '[email protected]',
          '[email protected]',
          '1234567890123456789012345678901234567890123456789012345678901234+x@example.com',
        ];

        const failingValues: string[] = [
          '@test.com',
          `"()<>[]:,;@\\\"!#$%&'-/=?^_\`{}| ~.a"@example.org`,
          'user@[2001:DB8::1]',
          '" "@example.org',
          '[email protected]',
          '"very.(),:;<>[]".VERY."very@\\ "very".unusual"@strange.example.com',
          'Abc.example.com ',
          'A@b@[email protected]',
          'a"b(c)d,e:f;g<h>i[jk][email protected]',
          'just"not"[email protected]',
          'this is"[email protected]',
          'this still"not\\[email protected]',
          '[email protected]',
          '[email protected].',
        ];

        it(`should return null if value matches a list of values that should work`, () => {
          matchingValues.forEach((value) => {
            emailControl.setValue(value);
            expect(urlValidator(emailControl)).toEqual(null);
          });
        });

        it(`should return { invalidEmail: 'Please enter a valid email.' } if value matches a list of values that should fail`, () => {
          failingValues.forEach((value) => {
            emailControl.setValue(value);
            const expectedValue = {
              invalidEmail: 'Please enter a valid email.',
            };
            expect(urlValidator(emailControl)).toEqual(expectedValue);
          });
        });
      });

We can decide whether to keep the current RegEx or change it, and whether to add the above test values.

Either way, the boilerplate also follows the recommendations that @DropsOfSerenity posted above: it requires confirming the email address and sends an activation email to that account.

michaelachrisco commented on June 1, 2024

@michaelachrisco 3 years later and this is still a recurring issue in projects. Now that we have boilerplates to implement a standard, I think this is a good time to resurface this.

Now that we're supporting locale translation in the boilerplates, I think the UTF-8 argument has some more strength behind it.

I suspect for client-side validation we will still be served best by simple and permissive validation, as opposed to strict and technical. What do you think?

@coreyshuman I agree with making validation simple and permissive as you stated. If we get too strict with the regex/standard, we may end up rejecting quite a few valid addresses (I remember a few horror projects I worked on in the EDI world). Emojis are now valid in email addresses. It's a strange world we live in.

I also like the example @Karvel shows by adding real email addresses to the unit tests for each of the valid/invalid emails. As time goes on, this list will naturally expand as we find a user with some strange valid email address that we will need to accommodate and we can just add that to the unit test/fix.

Most of the projects I have worked on in the past have stolen or used the default MDN example here: https://developer.mozilla.org/en-US/docs/Web/HTML/Element/input/email and called it a day.

/^[a-zA-Z0-9.!#$%&'*+\/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$/

This, of course, leaves in bugs (like https://www.w3.org/Bugs/Public/show_bug.cgi?id=15489) but it does seem to be "good enough" for most.
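
For comparison, a small TypeScript sketch of the MDN pattern in use. The helper name `matchesMdnPattern` and the sample addresses are assumptions for illustration only.

```typescript
// The MDN example pattern quoted above, written on a single line.
const MDN_EMAIL_PATTERN =
  /^[a-zA-Z0-9.!#$%&'*+\/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$/;

// Hypothetical helper: true when the input matches the MDN pattern.
function matchesMdnPattern(input: string): boolean {
  return MDN_EMAIL_PATTERN.test(input);
}

// Illustrative inputs (assumptions, not from the thread's test sets).
const mdnSamples: string[] = [
  'user@example.com',   // ordinary address
  'admin@mailserver1',  // accepted: dot-separated labels are optional
  'us..er@example.com', // accepted: periods are unrestricted in the local part
  'user@-example.com',  // rejected: a domain label cannot start with a hyphen
  '@test.com',          // rejected: empty local part
];

for (const sample of mdnSamples) {
  console.log(`${sample} -> ${matchesMdnPattern(sample)}`);
}
```

Compared with the regex proposed earlier in the thread, this pattern is stricter about hyphens in the domain and looser about periods in the local part and about requiring a dotted suffix.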

I feel like we could add unit tests to the examples https://github.com/Shift3/standards-and-practices/tree/main/best-practices/development-tools/validation#code-example but a better place would probably be in the boilerplate projects.

stephengtuggy commented on June 1, 2024

FWIW, I also agree with making validation simple and permissive, and with requiring confirmation emails. I think something like @Karvel's regex or the MDN one @michaelachrisco mentioned would probably work well.
