email-addresses.js

An RFC 5322 email address parser.

v 5.0.0

What?

Want to see if something could be an email address? Want to grab the display name or just the address out of a string? Put your regexes down and use this parser!

This library does not validate email addresses - we can't really do that without sending an email. However, it attempts to parse addresses using the (fairly liberal) grammar specified in RFC 5322. You can use this to check if user input looks like an email address.

Note carefully though - this parser supports all features of RFC 5322, which means that "Bob Example" <[email protected]> is a valid email address. If you just want to validate the [email protected] part, that is RFC 5321, for which you want to use something like node-address-rfc2821.

Why use this?

Use this library because you can be sure it really respects the RFC:

  • The functions in the recursive descent parser match up with the productions in the RFC
  • The productions from the RFC are written above each function for easy verification
  • Tests include all of the test cases from the is_email project, which are extensive

Installation

npm install email-addresses

Example

$ node
> addrs = require("email-addresses")
{ [Function: parse5322]
  parseOneAddress: [Function: parseOneAddressSimple],
  parseAddressList: [Function: parseAddressListSimple] }
> addrs.parseOneAddress('"Jack Bowman" <jack@fogcreek.com>')
{ parts:
   { name: [Object],
     address: [Object],
     local: [Object],
     domain: [Object] },
  name: 'Jack Bowman',
  address: 'jack@fogcreek.com',
  local: 'jack',
  domain: 'fogcreek.com' }
> addrs.parseAddressList('jack@fogcreek.com, Bob <bob@example.com>')
[ { parts:
     { name: null,
       address: [Object],
       local: [Object],
       domain: [Object] },
    name: null,
    address: 'jack@fogcreek.com',
    local: 'jack',
    domain: 'fogcreek.com' },
  { parts:
     { name: [Object],
       address: [Object],
       local: [Object],
       domain: [Object] },
    name: 'Bob',
    address: 'bob@example.com',
    local: 'bob',
    domain: 'example.com' } ]
> addrs("jack@fogcreek.com")
{ ast:
   { name: 'address-list',
     tokens: 'jack@fogcreek.com',
     semantic: 'jack@fogcreek.com',
     children: [ [Object] ] },
  addresses:
   [ { node: [Object],
       parts: [Object],
       name: null,
       address: 'jack@fogcreek.com',
       local: 'jack',
       domain: 'fogcreek.com' } ] }
> addrs("bogus")
null

API

obj = addrs(opts)

Call the module directly as a function to get access to the AST. Returns null for a failed parse (an invalid address).

Options:

  • string - An email address to parse. Parses as address-list, a list of email addresses separated by commas.
  • object with the following keys:
    • input - An email address to parse. Required.
    • rfc6532 - Enable rfc6532 support (unicode in email addresses). Default: false.
    • partial - Allow a failed parse to return the AST it managed to produce so far. Default: false.
    • simple - Return just the address or addresses parsed. Default: false.
    • strict - Turn off features of RFC 5322 marked "Obsolete". Default: false.
    • rejectTLD - Require at least one . in domain names. Default: false.
    • startAt - Start the parser at one of address, address-list, angle-addr, from, group, mailbox, mailbox-list, reply-to, sender. Default: address-list.
    • atInDisplayName - Allow the @ character in the display name of the email address. Default: false.
    • commaInDisplayName - Allow the , character in the display name of the email address. Default: false.
    • addressListSeparator - Specifies the character separating the list of email addresses. Default: ,.

Returns an object with the following properties:

  • ast - the full AST of the parse.
  • addresses - array of addresses found. Each has the following properties:
    • parts - components of the AST that make up the address.
    • type - The type of the node, e.g. mailbox, address, group.
    • name - The extracted name from the email. e.g. parsing "Bob" <bob@example.com> will give Bob for the name.
    • address - The full email address. e.g. parsing the above will give bob@example.com for the address.
    • local - The local part. e.g. parsing the above will give bob for local.
    • domain - The domain part. e.g. parsing the above will give example.com for domain.

Note if simple is set, the return will be an array of addresses rather than the object above.

Note that addresses can contain a group address, which in contrast to the address objects will simply contain two properties: a name and addresses which is an array of the addresses in the group. You can identify groups because they will have a type of group. A group looks something like this: Managing Partners:[email protected],[email protected];
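
A minimal sketch of handling groups when walking the addresses array, assuming the result shape described above (group entries carry a type of group plus name and addresses). The helper name flattenAddresses is hypothetical, and the sample data is hand-written in the documented shape rather than produced by the parser.

```javascript
// Hypothetical helper: collect every mailbox from a parsed list,
// expanding group entries (type === 'group') into their members.
function flattenAddresses(entries) {
  const mailboxes = [];
  for (const entry of entries) {
    if (entry.type === 'group') {
      // A group only has a name and a nested addresses array.
      mailboxes.push(...entry.addresses);
    } else {
      mailboxes.push(entry);
    }
  }
  return mailboxes;
}

// Hand-written sample in the documented shape, not real parser output.
const sample = [
  { type: 'mailbox', name: 'Bob', address: 'bob@example.com' },
  {
    type: 'group',
    name: 'Managing Partners',
    addresses: [
      { type: 'mailbox', name: null, address: 'ben@example.com' },
      { type: 'mailbox', name: null, address: 'carol@example.com' },
    ],
  },
];

const flat = flattenAddresses(sample).map((m) => m.address);
console.log(flat);
```

Keeping the group check in one helper means callers that only want mailboxes never have to touch the type field themselves.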

obj = addrs.parseOneAddress(opts)

Parse a single email address.

Operates similarly to addrs(opts), with the exception that rfc6532 and simple default to true.

Returns a single address object as described above. If you set simple: false the returned object includes a node object that contains the AST for the address.

obj = addrs.parseAddressList(opts)

Parse a list of email addresses separated by commas.

Operates similarly to addrs(opts), with the exception that rfc6532 and simple default to true.

Returns a list of address objects as described above. If you set simple: false each address will include a node object that contains the AST for the address.

obj = addrs.parseFrom(opts)

Parse an email header "From:" address (specified as mailbox-list or address-list).

Operates similarly to addrs(opts), with the exception that rfc6532 and simple default to true.

Returns a list of address objects as described above. If you set simple: false each address will include a node object that contains the AST for the address.

obj = addrs.parseSender(opts)

Parse an email header "Sender:" address (specified as mailbox or address).

Operates similarly to addrs(opts), with the exception that rfc6532 and simple default to true.

Returns a single address object as described above. If you set simple: false the returned object includes a node object that contains the AST for the address.

obj = addrs.parseReplyTo(opts)

Parse an email header "Reply-To:" address (specified as address-list).

Operates identically to addrs.parseAddressList(opts).

Usage

If you want to simply check whether an address or address list parses, you'll want to call the following functions and check whether the results are null or not: parseOneAddress for a single address and parseAddressList for multiple addresses.

If you want to examine the parsed address, for example to extract a name or address, you have some options. The object returned by parseOneAddress has four helper values on it: name, address, local, and domain. See the example above to understand what is actually returned. (These are equivalent to parts.name.semantic, parts.address.semantic, etc.) These values try to be smart about collapsing whitespace, handling quotations, and excluding RFC 5322 comments. If you desire, you can also obtain the raw parsed tokens or semantic tokens for those fields. The parts value is an object referencing nodes in the generated AST. Nodes in the AST have two values of interest here, tokens and semantic.

> a = addrs.parseOneAddress('Jack  Bowman  <[email protected] >')
> a.parts.name.tokens
'Jack  Bowman  '
> a.name
'Jack Bowman'
> a.parts.name.semantic
'Jack Bowman '
> a.parts.address.tokens
'[email protected] '
> a.address
'[email protected]'
> a.parts.address.semantic
'[email protected]'

If you need to, you can inspect the AST directly. The entire AST is returned when calling the module's function.

Props

Many thanks to Dominic Sayers and his documentation and tests for is_email which helped greatly in writing this parser.

License

Licensed under the MIT License. See the LICENSE file.

email-addresses's People

Contributors

0xflotus, baudehlo, chesnokovilya, jackbearheart, leafac, msimerson, osm, pauloppenheim-gingerlabs, rctay, sciyoshi, wbhob, zhangyijiang

email-addresses's Issues

Allow semicolon separator

When I cut and paste an address list from Outlook it comes in with a semicolon delimiter like this:

"Smith, Alice" <[email protected]>; "Jones, Bob" <[email protected]>

So I'd like it if email-addresses would tolerate either a , or ; separated list. But currently it requires a comma-separated list.

I would just blindly replace semicolons with commas, except that I'm worried about semicolons appearing in quoted names, so it would be best if they would get parsed properly by email-addresses.
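
The addressListSeparator option listed in the API section above looks like it covers this case. A minimal sketch, assuming the installed version supports that option; the helper name parseSemicolonList is hypothetical, and the module is passed in as a parameter so the sketch does not depend on how it was loaded.

```javascript
// Hypothetical helper: parse an Outlook-style list separated by ';'.
// `addrs` is the email-addresses module, supplied by the caller.
function parseSemicolonList(addrs, input) {
  return addrs.parseAddressList({
    input: input,
    addressListSeparator: ';', // option named in the API section
  });
}

// Usage (assumes the package is installed; addresses are placeholders):
//   const addrs = require('email-addresses');
//   parseSemicolonList(addrs, '"Smith, Alice" <alice@example.com>; "Jones, Bob" <bob@example.com>');
```

Letting the parser handle the separator avoids the blind search-and-replace, so semicolons inside quoted names are left to the grammar.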

.git

After installing the package, I found a .git file in the folder.

Finding email addresses in arbitrary text

Hi!

Do you think this library can be used for finding emails in arbitrary texts? For example, I want to pull [email protected] out of the following string:

"My email address is [email protected]."

Right now I'm splitting the text on /\s+/ and running each part through this module to see if it's an email address. Is there a better way to utilize this module?

Edit: Sometimes I have emails that are not separated by whitespace. For example <strong>[email protected]</strong> or [email protected]. Is there a more liberal (but still safe) boundary than /\s+/?
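
One possible approach, sketched under assumptions: split on a broader set of boundary characters than /\s+/, trim trailing sentence punctuation, and keep only the candidates the parser accepts. The helper name findAddresses and the chosen boundary set are hypothetical, not something the library provides, and the heuristic will miss addresses with quoted local parts or display names.

```javascript
// Hypothetical helper: pull address-like tokens out of free text.
// `parseOne` stands for addrs.parseOneAddress and is passed in by the caller.
function findAddresses(parseOne, text) {
  const candidates = text
    .split(/[\s<>()\[\]",;]+/) // whitespace plus common wrapper characters
    .map((part) => part.replace(/[.!?:]+$/, '')) // drop trailing punctuation
    .filter((part) => part.includes('@'));
  // Let the parser make the final decision on each candidate.
  return candidates.filter((part) => parseOne(part) !== null);
}

// Usage (assumes the package is installed):
//   const addrs = require('email-addresses');
//   findAddresses(addrs.parseOneAddress, 'My email address is jane@example.com.');
```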

Properly parse FROM email headers with colon, comma characters

We've encountered a few email address headers that are not properly parsed, likely because they contain comma and colon characters. These return null when parsed via parseOneAddress.

Test code:

var Addrs = require('email-addresses');
var inspect = require('util').inspect; // needed for inspect() below
function testParse(fromHeader) {
  console.log("Input: " + fromHeader);
  var parsed = Addrs.parseOneAddress(fromHeader);
  console.log("Output: %s", inspect(parsed));
}

Here are some sample inputs and the null output they produce:

Input: iZotope, Inc. <[email protected]>
Output: null
Input: Goodhertz, Inc. <[email protected]>
Output: null
Input: HowlRound: A Center for the Theater Commons <[email protected]>
Output: null

Removing the colon and comma characters generates successful output:

Input: iZotope, Inc. <[email protected]>
Output: null
Input: iZotope Inc. <[email protected]>
Output: { parts: 
   { name: 
      { name: 'display-name',
        tokens: 'iZotope Inc. ',
        semantic: 'iZotope Inc.',
        children: [Object] },
     address: 
      { name: 'addr-spec',
        tokens: '[email protected]',
        semantic: '[email protected]',
        children: [Object] },
     local: 
      { name: 'local-part',
        tokens: 'izotope',
        semantic: 'izotope',
        children: [Object] },
     domain: 
      { name: 'domain',
        tokens: 'izotope.com',
        semantic: 'izotope.com',
        children: [Object] } },
  name: 'iZotope Inc.',
  address: '[email protected]',
  local: 'izotope',
  domain: 'izotope.com' }
Input: Goodhertz, Inc. <[email protected]>
Output: null
Input: Goodhertz Inc <[email protected]>
Output: { parts: 
   { name: 
      { name: 'display-name',
        tokens: 'Goodhertz Inc ',
        semantic: 'Goodhertz Inc',
        children: [Object] },
     address: 
      { name: 'addr-spec',
        tokens: '[email protected]',
        semantic: '[email protected]',
        children: [Object] },
     local: 
      { name: 'local-part',
        tokens: 'support',
        semantic: 'support',
        children: [Object] },
     domain: 
      { name: 'domain',
        tokens: 'goodhertz.com',
        semantic: 'goodhertz.com',
        children: [Object] } },
  name: 'Goodhertz Inc',
  address: '[email protected]',
  local: 'support',
  domain: 'goodhertz.com' }
Input: HowlRound: A Center for the Theater Commons <[email protected]>
Output: null
Input: HowlRound A Center for the Theater Commons <[email protected]>
Output: { parts: 
   { name: 
      { name: 'display-name',
        tokens: 'HowlRound A Center for the Theater Commons ',
        semantic: 'HowlRound A Center for the Theater Commons',
        children: [Object] },
     address: 
      { name: 'addr-spec',
        tokens: '[email protected]',
        semantic: '[email protected]',
        children: [Object] },
     local: 
      { name: 'local-part',
        tokens: 'webmaster',
        semantic: 'webmaster',
        children: [Object] },
     domain: 
      { name: 'domain',
        tokens: 'howlround.com',
        semantic: 'howlround.com',
        children: [Object] } },
  name: 'HowlRound A Center for the Theater Commons',
  address: '[email protected]',
  local: 'webmaster',
  domain: 'howlround.com' }

Are these not parsing properly because they don't meet the RFC 5322 standard?

It'd be great to support these headers and others like them, particularly in calls to parseOneAddress, since delimiter checking shouldn't be necessary for a single address. Headers like these show up often enough in the email we see in practice that it may make sense to extend support.

Accept address lists with a trailing comma

Would be nice if the trailing comma was ignored and this worked:

$ npm install email-addresses
npm http GET https://registry.npmjs.org/email-addresses
npm http 304 https://registry.npmjs.org/email-addresses
[email protected] node_modules/email-addresses
$ node
> var email = require('email-addresses')
undefined
> email.parseAddressList('Kysen Vogel <[email protected]>,')
null
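
Until the parser tolerates this, a caller-side workaround is to strip the trailing separator before parsing. A minimal sketch; the helper name stripTrailingComma is hypothetical and the address is a placeholder.

```javascript
// Hypothetical pre-processing step: drop trailing commas and whitespace
// before handing the string to parseAddressList.
function stripTrailingComma(input) {
  return input.replace(/[\s,]+$/, '');
}

const cleaned = stripTrailingComma('Kysen Vogel <kysen@example.com>,  ');
console.log(cleaned);
```

Only the end of the string is touched, so commas between addresses or inside quoted display names are left alone.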

Names with periods in them don't resolve

Repro:

$ npm install email-addresses
npm http GET https://registry.npmjs.org/email-addresses
npm http 304 https://registry.npmjs.org/email-addresses
[email protected] node_modules/email-addresses
$ node
> var email = require('email-addresses')
undefined
>  email.parseAddressList('H.B. Lewis <[email protected]>')
null
>  email.parseAddressList('Cedarville Store Info. <[email protected]>')
null
>  email.parseAddressList('stardust.com <[email protected]>')
null

Should '[email protected]' (apostrophes included!) be a valid email address?

The following script instead of returning null, returns a parsed address, with apostrophes present in local and domain parts:

const addr = require('email-addresses');
console.log(addr.parseOneAddress("'[email protected]'"));

Result:

{
  parts: {
    name: null,
    address: {
      name: 'addr-spec',
      tokens: "'[email protected]'",
      semantic: "'[email protected]'",
      children: [Array]
    },
    local: {
      name: 'local-part',
      tokens: "'foo",
      semantic: "'foo",
      children: [Array]
    },
    domain: {
      name: 'domain',
      tokens: "bar.com'",
      semantic: "bar.com'",
      children: [Array]
    },
    comments: []
  },
  type: 'mailbox',
  name: null,
  address: "'[email protected]'",
  local: "'foo",
  domain: "bar.com'",
  comments: '',
  groupName: null
}

Domain name bar.com' is obviously invalid.

Library version: 3.1.0

Parser fails when quoted string includes dquotes and comma

Hi,

First of all, thanks for the lib, it helps a lot :)

I may have some cases where the parser fails :

//Inside dquotes are not escaped, but it works
>> emailAddresses.parseAddressList({
    input     : '"Blah blah "Some quoted string" foo bar" <[email protected]>',
    rfc6532   : true,
    partial   : true,
    simple    : false,
    strict    : false,
    rejectTLD : true,
    startAt   : 'address-list'
});
>> OK

//Inside dquotes are not escaped, but it fails only because of the comma
>> emailAddresses.parseAddressList({
    input     : '"Blah blah "One, comma" foo bar" <[email protected]>',
    rfc6532   : true,
    partial   : true,
    simple    : false,
    strict    : false,
    rejectTLD : true,
    startAt   : 'address-list'
});
>> null

//Inside dquotes are escaped, but it still fails because of the comma
>> emailAddresses.parseAddressList({
    input     : '"Blah blah \"One, comma\" foo bar" <[email protected]>',
    rfc6532   : true,
    partial   : true,
    simple    : false,
    strict    : false,
    rejectTLD : true,
    startAt   : 'address-list'
});
>> null

If I got the RFC5322 right :

  • The display-name is a phrase
  • A phrase can be a 1*word.
  • A word can be a quoted-string
  • A quoted-string is like "qcontent"
  • A qcontent can be qtext or a quoted-pair
  • A quoted-pair can be \VCHAR (VCHAR includes the comma and the double quote chars)
  • A qtext is the set of ASCII chars %d33 and %d35-91 and %d93-126. In other words, the printable US-ASCII characters not including \ or the quote character

So, for those display names :

  • "Display, name" is ok. It's a simple qtext
  • "Display "middle" name" is not ok. Dquotes are not allowed in qtext and should be a quoted-pair, as \". However the parser accepts it.
  • "Display \"middle\" name" is ok. Dquotes are quoted characters, as described in the section 3.2.1 of the RFC5322
  • "Display \"one, comma\" name" is ok. The comma is accepted as part of the qtext, and the dquotes are escaped. However the parser fails

I can also be wrong because that RFC is awful to read and understand :D In this case, I'd be glad to hear your explanations :)

Thank you for your time !

Address list without commas?

Is it possible to fiddle with this parser to make it accept address lists without commas? I've been fiddling with it for a while and I can't seem to make commas optional. Not a big deal, just wondering.

Simple option doesn't seem to work

> addrs.parseOneAddress({ input: 'a@b', simple: true })
{ parts:
   { name: null,
     address:
      { name: 'addr-spec',
        tokens: 'a@b',
        semantic: 'a@b',
        children: [Array] },
     local:
      { name: 'local-part',
        tokens: 'a',
        semantic: 'a',
        children: [Array] },
     domain: { name: 'domain', tokens: 'b', semantic: 'b', children: [Array] },
     comments: [] },
  type: 'mailbox',
  name: null,
  address: 'a@b',
  local: 'a',
  domain: 'b',
  groupName: null }

Was expecting it to just return a@b. Am I using this right?

Feature Request: Include the parsed value in the returned objects

In the addresses portion, return the full value captured by the parser.

ex.

> addrs.parseAddressList('[email protected], Bob <[email protected]>')
[
  {
    parts: {
      name: null,
      address: [Object],
      local: [Object],
      domain: [Object],
    },
    name: null,
    address: "[email protected]",
    local: "jack",
    domain: "fogcreek.com",
    full: "[email protected]",
  },
  {
    parts: {
      name: [Object],
      address: [Object],
      local: [Object],
      domain: [Object],
    },
    name: "Bob",
    address: "[email protected]",
    local: "bob",
    domain: "example.com",
    full: "Bob <[email protected]>",
  },
];

Parsing error on email names

When parsing an email formatted like "Kyle Mathews <[email protected]>" (where the name doesn't have quotes around it), the name always loses the space between the two parts. So "Kyle Mathews" becomes "KyleMathews".

coffee> ea = require('email-addresses')
{ [Function: parse5322]
  parseOneAddress: [Function: parseOneAddressSimple],
  parseAddressList: [Function: parseAddressListSimple] }
coffee> ea.parseOneAddress("Kyle Mathews <[email protected]>")
{ name: 'KyleMathews',
  address: '[email protected]',
  local: 'blah',
  domain: 'blah.com' }

Line feed in address string will return null

Lately I've been seeing emails that have a line feed separating the user name and email address. This causes email-addresses to return null, even though the email address is valid.

$ node
> var addrs = require("email-addresses")
> addrs.parseOneAddress('"Jack Bowman"\n<[email protected]>')
parseString = "Jack Bowman"
<[email protected]>
parsed = null
null

rejectTLD option for input with no domain part produces error

Version 3.0.1

addrs = require("email-addresses")
addrs.parseAddressList({input: 'jack@'})
null

addrs.parseAddressList({input: 'jack@', rejectTLD: true})
TypeError: Cannot read property 'semantic' of null
at domainCheckTLD (email-addresses/lib/email-addresses.js:580:27)
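
On an affected version, a caller can guard against the exception and treat it as a failed parse. A minimal sketch; the helper name safeParseList is hypothetical, and the module is passed in as a parameter.

```javascript
// Hypothetical guard: treat an exception from the parser as a failed parse.
// `addrs` is the email-addresses module, supplied by the caller.
function safeParseList(addrs, options) {
  try {
    return addrs.parseAddressList(options);
  } catch (err) {
    return null; // e.g. the TypeError shown above
  }
}

// Usage (assumes the package is installed):
//   const addrs = require('email-addresses');
//   safeParseList(addrs, { input: 'jack@', rejectTLD: true });
```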

Handle unicode names?

I'm getting null when trying to parse unicode names:

node 
> addrs = require("email-addresses")
> addrs.parseOneAddress('杨孝宇 <[email protected]>')
null

Are there plans to support it?

support ES module importing

I'd love to be able to use this library directly out of node_modules, but it doesn't work with ES modules.

Ideally it would export native ES Modules, but I'd also be fine with just loading it for side effects. The only problem is that "this" is undefined in ES Modules. Changing the last line from "}(this));" to "}(self));" makes that possible.

`parseOneAddress` sometimes returns `null`

function parseOneAddress(input: string | Options): ParsedMailbox | ParsedGroup;

I use this package to scan my spam email addresses, and some of them are not legitimate. Like the From field would appear as Example <abc.example.com>, so the parseOneAddress function returns null.

This wasn't indicated in the types. Please add null as a possible return type, for the functions for which this is true.

Thanks!

`isValid` method?

Other packages such as isemail, @hapi/address, and validator seem to have validation bugs. Parsing packages such as addressparser in nodemailer/lib/addressparser also seem to have issues when parsing certain addresses. It would be useful if this package exposed a simple isValid method which returned true or false. I'm assuming this could be a simple wrapper around parseOneAddress method which checked if the return value had a typeof with value of "Object" - but I haven't looked close enough yet to determine if that's the case.
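
One caveat with the typeof idea: in JavaScript, typeof yields the lowercase string 'object', and typeof null is also 'object', so a typeof check cannot tell a failed parse from a successful one. Comparing against null is more reliable. A minimal sketch of such a wrapper; the name isValid and the pass-in parser parameter are hypothetical, not part of the library.

```javascript
// Hypothetical wrapper: true when the parser returns a non-null result.
// `parseOne` stands for addrs.parseOneAddress and is passed in by the caller.
function isValid(parseOne, input) {
  const result = parseOne(input);
  return result !== null && result !== undefined;
}

// Why typeof alone is not enough: null also reports 'object'.
const typeOfNull = typeof null;
console.log(typeOfNull);

// Usage (assumes the package is installed):
//   const addrs = require('email-addresses');
//   isValid(addrs.parseOneAddress, 'bogus');
```

As the README notes, a successful parse only means the input fits the RFC 5322 grammar, not that the mailbox exists.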

Support for rfc6854

rfc6854 allows a From address to be either a mailbox-list or an address-list.

The parser should support that.

It's a bit more complex than changing where to start the parser at, because if there's a "group" (or multiple groups) then the set of addresses should be collected under that group. For example: Group1:foo@bar,boo@baz;, Group2:blah@fod; should return 2 groups with the sets of addresses underneath them.

https://tools.ietf.org/html/rfc6854

How should I detect ParsedGroup?

Hey, thanks for the great module!

I have a minor issue that may just be a type definitions thing.
The following code in typescript:

function getAddress(headerString: string): string | null {
  const parsedAddress = emailAddresses.parseOneAddress(headerString);
  if (parsedAddress) {
    return parsedAddress.address;
  }
  return null;
}

Gives me the warning Property 'address' does not exist on type 'ParsedGroup'

It looks like all functions return ParsedMailbox | ParsedGroup (array of or single item) so this means two things.

  1. It does not explicitly tell me that the result may be null
  2. It does not guarantee that if there is a result, it will contain the email address.

My question is, how do I detect that it is not a ParsedGroup type and under what circumstances would ParsedGroup be returned instead of ParsedMailbox (or an array of ParsedMailbox).
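
The README says group entries can be identified by a type of group, so a check on that field is one way to tell the two apart. A minimal sketch in plain JavaScript; the helper name getAddressFromParsed is hypothetical. Whether the same check narrows the union in TypeScript depends on how the type declarations model the type field, which is not shown here.

```javascript
// Hypothetical helper: return the address of a parsed mailbox, or null
// for a failed parse or a group entry.
function getAddressFromParsed(parsed) {
  if (parsed === null) return null; // failed parse
  if (parsed.type === 'group') return null; // groups carry name and addresses only
  return parsed.address;
}

// Usage (assumes the package is installed):
//   const emailAddresses = require('email-addresses');
//   getAddressFromParsed(emailAddresses.parseOneAddress(headerString));
```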

Release v3.0.2?

Hi Jack,

I just bumped into #33. I saw a fix was merged on Aug 23, 2017, but it was never released. What can I do to help the release of this patch as v3.0.2?

Bracketing not supported

Hi,

I am using the 'email-addresses' package in order to parse incoming emails senders.
However, sometimes I get something like "JOHN DOE[XXX - DOE John] [email protected]". The functions parseOneAddress() and parseAddressList() both return null.

Would be nice to accept this kind of format.

Thanks!

Document how to get semantic parts

After the change from #2, the parsed addresses now include any non-semantic content. It appears that this can now be retrieved through parts.local.semantic and parts.domain.semantic (parts.address is the address function from line 402, which is probably not intended) but these are not documented in the README.

I would argue that the semantic content should be the default for returned address, local, and domain parts (although not name, for the reasons in #2) since callers are unlikely to be expecting whitespace and RFC 5322 comments in the address parts, but if you'd prefer to keep the current defaults, a comment about the preferred way of accessing the semantic content in the README might save future users some time and would be much appreciated.

Thanks,
Kevin

Whitespaces in address?

It seems whitespaces are allowed in addresses. I received an email into our system with the below "from" address and parsing failed and returned null.

var from = '"xxxxxxx, yyy A MAJ USARMY 2 CAV REGT (US)"\n\t<[email protected]>\n';

console.log(addrs.parseOneAddress(from)); // returns null
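
A possible caller-side workaround is to collapse line breaks and the whitespace around them before parsing. A minimal sketch; the helper name unfoldHeader is hypothetical, the address is a placeholder, and whether the parser then accepts the normalized string has not been verified here.

```javascript
// Hypothetical normalizer: collapse CR/LF runs, together with any
// surrounding spaces or tabs, into a single space, then trim the ends.
function unfoldHeader(value) {
  return value.replace(/[ \t]*[\r\n]+[ \t]*/g, ' ').trim();
}

const folded = '"Jack Bowman"\n\t<jack@example.com>\n';
const unfolded = unfoldHeader(folded);
console.log(JSON.stringify(unfolded));
```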
