
usaddress's Introduction


Fork of https://usaddress.codeplex.com/

Since CodePlex is closing soon, here is the README from CodePlex:

Project Description

This is a partial port of the Perl Geo::StreetAddress::US CPAN module to C#. The goal is to take a US address as a single-line string and parse it out into its component pieces to accelerate data entry and import.

The class AddressParser takes a semi-structured address input as a single String and returns it parsed into an AddressParseResult instance. It includes some unit tests and a console application so that you can play around with it.

Where this came from

This code is a partial port of the Geo::StreetAddress::US Perl module from CPAN written by Schuyler D. Erle. In his case, he wrote it as part of the great geocoder.us service that provides free geocoding for US addresses (with some reasonable rate limiting and restrictions for commercial use).

As such, the original Perl module has the ability to parse intersections "Main St & 1st St, Anytown, VA 12345" and partial addresses. I didn't port this functionality over to C# for reasons explained below.

What this is for

In my case, I wanted to be able to provide a single textbox on a Web page for users to paste in shipping addresses and be able to parse out the address correctly (in a large percentage of cases, at least) into the individual street / city / state / zip fields for database storage and submission to third-party APIs. Instead of having to tab between four different fields, the user can just paste in the address from Word or another Web page and off they go. (If the address parsing fails, then I pop up an AJAX dialog that shows the individual form fields.) I wanted to be able to do this without having to worry about subscribing to, paying for, and integrating with a third-party CASS-certified address verification API. In other words, it's meant to provide a convenience for users when it comes to data entry, not data validation.

As a result of this difference in intended use (geocoding vs parsing), I neglected to port some functionality of the original Perl module (intersections and partial addresses) and added some additional functionality (recognizing PO boxes and military addresses as well as correcting secondary unit abbreviations).

The class could also be useful if you have a large list of unstructured address input and you want some help in getting everything merged into a "mostly correct" set of delimited fields without actually paying for a CASS-certified service.

What this is NOT for

This does not provide CASS-certified address correction, and it does not verify a delivery point. It does not tell you if an address is correct and/or deliverable. The USPS address database costs real money and is updated monthly, and this doesn't depend on that.

As such, you can pass in "321 Cheese Street Apt A Sillytown Virginia 12345", and it will happily spit back "321 CHEESE ST APT A; SILLYTOWN VA 12345". It does not know whether or not an address exists.

It also cannot provide perfect parse results in all cases, although an effort has been made to cover common ones, such as grid-style addresses, Queens-style addresses, post office boxes, and military addresses. But without a list of valid delivery points, it won't be able to decide if "403D S St" should be "403 SOUTH ST APT D" or "403 D S ST". This kind of ambiguity is particularly common when users add nonsense to the street line.

How it works

Like the original Perl version, the AddressParser class solves the problem by building up a fantastically giant regular expression ("now we have two problems") based on the common abbreviations and formats that a US address can be in. Calling ParseAddress() simply runs a match against that regex and returns any of its named captures as properties in an AddressParseResult instance.
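As a rough illustration of the approach described above, here is a minimal sketch in Python (not the C# source) that assembles one regular expression from small abbreviation tables and returns its named captures. The tables, the pattern, and the parse_address name are assumptions made for this example; the real AddressParser pattern is far larger and covers many more formats.

```python
import re

# Tiny illustrative tables (assumed for this sketch): long form -> abbreviation.
DIRECTIONALS = {"NORTH": "N", "SOUTH": "S", "EAST": "E", "WEST": "W"}
SUFFIXES = {"STREET": "ST", "AVENUE": "AVE", "DRIVE": "DR", "ROAD": "RD"}
UNITS = {"APARTMENT": "APT", "SUITE": "STE", "UNIT": "UNIT"}
STATES = {"VIRGINIA": "VA", "ILLINOIS": "IL", "MAINE": "ME"}


def alternation(table):
    """Build a regex alternation from the long and short forms in a table."""
    words = set(table) | set(table.values())
    # Longest first, so STREET is tried before ST.
    return "|".join(re.escape(w) for w in sorted(words, key=len, reverse=True))


# One big pattern built from the tables; every field is a named capture.
ADDRESS_REGEX = re.compile(
    rf"""
    ^\s*
    (?P<number>\d+)\s+
    (?:(?P<predirectional>{alternation(DIRECTIONALS)})\s+)?
    (?P<street>.+?)\s+
    (?P<suffix>{alternation(SUFFIXES)})\b
    [\s,]+
    (?:(?P<unit>{alternation(UNITS)})\s+(?P<unit_number>\w+)[\s,]+)?
    (?P<city>[A-Z ]+?)
    [\s,]+
    (?P<state>{alternation(STATES)})
    [\s,]+
    (?P<zip>\d{{5}})
    \s*$
    """,
    re.VERBOSE,
)


def parse_address(text):
    """Match the pattern once and return the non-empty named captures."""
    match = ADDRESS_REGEX.match(text.upper())
    if match is None:
        return None
    fields = {k: v for k, v in match.groupdict().items() if v}
    # Normalise long forms to their abbreviations where a table applies.
    for key, table in (
        ("predirectional", DIRECTIONALS),
        ("suffix", SUFFIXES),
        ("unit", UNITS),
        ("state", STATES),
    ):
        if key in fields:
            fields[key] = table.get(fields[key], fields[key])
    return fields


result = parse_address("321 Cheese Street Apt A Sillytown Virginia 12345")
```

With these toy tables, the "Cheese Street" example from earlier in this README comes back with street CHEESE, suffix ST, unit APT A, city SILLYTOWN, and state VA. Nothing in the sketch checks that the address exists, which is the same limitation the text describes.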

It is certainly not magic. Without a list of valid delivery points, it's impossible for it to know the true intent in some cases, especially if you hand it addresses where users have typed crap in the street line. ("123 Main St Door Code # 438" would result in it thinking the street is named "MAIN DOOR CODE", for example, and without a list of valid streets, it's not going to know that "DOOR CODE" should be part of a leftovers field instead of the street name.)

Make sure that you can accept these limitations in your intended use case. I have posted it because I feel that it's something that people could still use in a great many products, and that porting it to C# makes it a bit more accessible to many more people.

Last edited Dec 24, 2011 at 1:26 PM by npiaseck, version 12

Available on NuGet

Install-Package AddressParser

Icon

Map by Pieter J. Smits from The Noun Project

usaddress's People

Contributors

eslamkamal1985, jamesrcounts, jcw87, tetsuo13, waynebrantley


usaddress's Issues

OutOfMemoryError parsing some addresses

I have been using this library successfully for quite a while, but just deployed to a production environment and the ParseAddress method is causing the .Net Regex class to trigger an OutOfMemoryError. I have not seen this problem during development -- it only is happening in production.

Some sample addresses that trigger the behavior (but the exact address may not matter):

  • 4611 KOLZE AVE, SCHILLER PARK, IL
  • 1654 ILLINOIS ST, DES PLAINES, IL
  • 11 W HUBBARD ST, CHICAGO, IL

Any suggestion of what we can look for that may be triggering this behavior? We have very limited access to the production environment, so just dropping into a debugger is not feasible.


Parsing address without City, State, Zip?

Hello, I'm curious if you can support the parsing of an address without City, State, and Zip? I have a situation where I can be given a string for a partial address, and even if I can't get a full address I'd like to parse it into number, pre-direction, street and suffix.

I think the workaround would be to append bogus data to the string if I detect that, e.g. "Beverly Hills, CA 90210".
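A minimal sketch of that workaround, in Python for illustration only: it appends the placeholder place from the example above when the input does not already end in a state code and ZIP. The function name, the detection regex, and the returned flag are assumptions for this sketch, not part of the library.

```python
import re

# Placeholder place, taken from the example in the issue text.
PLACEHOLDER_PLACE = "Beverly Hills, CA 90210"

# Assumption: an input "has a place" if it ends in a two-letter state code
# followed by a ZIP or ZIP+4.
ENDS_WITH_STATE_ZIP = re.compile(r"\b[A-Za-z]{2}[\s,]+\d{5}(?:-\d{4})?\s*$")


def with_placeholder_place(text):
    """Return (text_to_parse, was_padded)."""
    stripped = text.strip().rstrip(",")
    if ENDS_WITH_STATE_ZIP.search(stripped):
        return stripped, False
    return f"{stripped}, {PLACEHOLDER_PLACE}", True


padded, was_padded = with_placeholder_place("100 N Main St")
```

When was_padded is true, the caller would discard the city, state, and zip from the parse result and keep only the street-line fields.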

Dotnet Core Support

I noticed that some of the forks out there seem to have been created just to add core support. It would be great to roll that support into this library.

Route thrown away

This is not a verified address.

Route 1 Gouldsboro ME 04607

The STREET comes back as Gouldsboro ME.
There is no STATE or CITY in the parsed results and the word Route is completely discarded.

Is this available via Nuget?

I see there is another "fork of the codeplex version" at https://github.com/twz1234/AddressParser in Nuget, but I can't locate this one. It appears that this one has had enough updates applied since then to make it more useful.

I know this is a convenience matter but appreciate your efforts on maintaining this library in any case.

Address Incorrectly Parsed "no 228"

'97327 Forest Ln, No. 228\nDallas, TX 75243'
The Streetline value becomes "22 8". The expected value would be something like "97327 Forest Ln, No. 228".

PlacePattern matches partial state names

Using the following address text, PlacePattern will match it incorrectly:

FT LAUDERDALE,FL,33312

It will match the city as 'FT', the state as 'LA', and nothing for the zip code. Removing the last question mark from the StatePattern seems to fix this.
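The failure mode can be reproduced with a much smaller stand-in pattern. The Python sketch below is not the library's PlacePattern or StatePattern; both patterns and the match_place helper are assumptions that only illustrate how an optional, unanchored state group lets "LA" match inside "LAUDERDALE".

```python
import re

STATES = "FL|LA"  # two state codes are enough to show the problem

# Loose: the state group is optional and nothing has to follow it.
LOOSE_PLACE = re.compile(
    rf"(?P<city>[^\d,]+?)\W+(?P<state>{STATES})?\W*(?P<zip>\d{{5}})?"
)

# Stricter: the state is required, must end at a word boundary,
# and the match must reach the end of the input.
STRICT_PLACE = re.compile(
    rf"(?P<city>[^\d,]+?)\W+(?P<state>{STATES})\b\W*(?P<zip>\d{{5}})?\s*$"
)


def match_place(pattern, text):
    """Return the named captures, or None when the pattern does not match."""
    m = pattern.match(text)
    return m.groupdict() if m else None


loose = match_place(LOOSE_PLACE, "FT LAUDERDALE,FL,33312")
strict = match_place(STRICT_PLACE, "FT LAUDERDALE,FL,33312")
```

Here the loose pattern yields city FT, state LA, and no zip, while the stricter one yields city FT LAUDERDALE, state FL, and zip 33312. In this stand-in, making the state required is not enough on its own; the word boundary is what stops the partial match.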

In x64 mode the parser goes into an infinite loop

If the project is built for x64, ParseAddress goes into an infinite loop.
Googling suggests that there is an issue with regex backtracking. The only solution seems to be to rewrite, simplify, or split the regex into pieces.

Address incorrectly parsed

This is a valid address; you can confirm it on any address verification site.

3360 County Road F  Tekamah NE 68061

When AddressParser takes this, it parses the CITY as F Tekamah.
Of course the CITY should be Tekamah and the STREET should be County Road F.

Additionally, it makes the STREET County Rd instead of County Road.

Thoughts?

TODOs:

  • Add Bulk Testing Method
  • Add Failing Examples
  • Add 10k samples
  • Solve failing examples (possibly with bigger refactorings)
    • 3360 County Road F Tekamah NE 68061
    • 623 NE 5th AVE Fort Lauderdale FL 33304
    • Route 1 Gouldsboro ME 04607
    • 3419 Avenue C Council Bluffs IA 51501
    • 65 Ginger Woods Valley NE 68064
    • Rte 175 Blue Hill Falls Rd Blue Hill ME 04614
    • 1302 LUCERNE AVENUE Lake Worth FL 33460
    • RR 1 Box 3145 Sedgwick ME 04676
  • Create plugin to allow city to be looked up from an external source (think this should be its own issue)
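The first few TODO items could be served by a small harness along these lines. This is a Python sketch under stated assumptions: run_bulk and naive_parse are hypothetical names, and naive_parse is a deliberately crude stand-in for the real parser. The expected values come from the issue reports above.

```python
def run_bulk(parse, cases):
    """Run parse over (text, expected_fields) pairs and collect mismatches.

    Each failure is (text, {field: (expected, actual)}).
    """
    failures = []
    for text, expected in cases:
        actual = parse(text) or {}
        diff = {
            field: (want, actual.get(field))
            for field, want in expected.items()
            if actual.get(field) != want
        }
        if diff:
            failures.append((text, diff))
    return failures


def naive_parse(text):
    """Stand-in parser: treats the last two tokens as state and zip."""
    parts = text.split()
    if len(parts) < 3:
        return None
    return {"state": parts[-2], "zip": parts[-1]}


CASES = [
    ("Route 1 Gouldsboro ME 04607", {"state": "ME", "zip": "04607"}),
    ("3360 County Road F Tekamah NE 68061", {"city": "Tekamah", "state": "NE"}),
]

failures = run_bulk(naive_parse, CASES)
```

Keeping the failing examples in a plain list like CASES makes it easy to grow toward the 10k samples mentioned above and to see which fields regress after a regex change.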

Handling a directional in an Apt/Unit number

This is parsing very nicely for the most part but not when there is a directional in the Unit/Apt number.

355 8TH AVE APT 15E NewYork NY 10001

Number: 355
Street: 8TH AVE APT 15
PostDirectional: E
SecondaryNumber:
SecondaryUnit:
Suffix:

Swap the Apt to 15F and it works great, but N, S, E, and W all get munged.
I am not smart enough to figure out the regex to fix that.


Incorrect Handling of Null Secondary Unit

I've tried to take a stab at fixing this, but I am not skilled enough at RegEx to solve it...

If I understand the USPS spec correctly, the secondary unit is allowed to be null, so when passed an address like:

3330 W SIGNAL PEAK DR RP303

Expected Behavior would be:
Number: 3330
Predirectional: W
Street: SIGNAL PEAK
Suffix: DR
SecondaryUnit: string.Empty
SecondaryNumber: RP303

Actual Behavior is:
Number: 3330
Predirectional: W
Street: SIGNAL PEAK DR RP303
Suffix: string.Empty
SecondaryUnit: string.Empty
SecondaryNumber: string.Empty

Question about dual address with PO Box

Hello,

I'm trying to understand the parsing behavior in a scenario with a specific address:
"741 N Main St Po Box 246, Cedarville, CA 96104".

This is an address string that contains <Line1> <Line2>, <City>, <State> <Zip> and <Line2> is a PO Box.

Here is the output:
(screenshot of the parsed output)

I've highlighted the parts of the output that seem incorrect.

I entered the freeform address on https://smartystreets.com/ and it calls it a "Dual Address".

Is this output correct? If not, what is the correct output you'd expect based on your knowledge of the USPS specifications? Is having a dual address a valid scenario for this parser to process?

Very slow!

I've been using the old AddressParser library and this version is extremely slow. It also installs a ton of libraries that I'm pretty sure it doesn't need. System.Security.Cryptography??

If you're going to take over this library, please make sure it doesn't run slower, and clean house with the libraries.

Address incorrectly parsed

This is a valid address; you can confirm it on any address verification site.

623 NE 5th AVE  Fort Lauderdale FL 33304

STREET is expected to be 623 NE 5th AVE but instead was 623 NE 5th AVE FT.
As expected, CITY is wrong too: it should be Fort Lauderdale but is Lauderdale.

Common tech company addresses are not parsed

I'm having some trouble with parsing addresses for large tech companies that don't use a numerical street number.

For example, Microsoft and Apple:

One Microsoft Way Redmond WA 98502-6399
One Infinite Loop, Cupertino, CA 95014

Note that nothing is successfully parsed, not even the city, state or zip code.

Could we change the pattern for the street number to accommodate this?

Support parsing address line only

This would be a nice feature.
Sometimes I have the address, city, state and zip in different fields. So, there is no need to parse them.
However, I would like to hand it just the address line(s) and let it parse out the street number, direction, etc.

What do you think?

TODO:

  • Create an overload of ParseAddress that accepts a regex.
  • Direct current overloads to use the new overload, passing AddressRegex
  • Create a property that returns an address line regex (as a string).
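The shape of that TODO can be sketched as follows, in Python for illustration only. The pattern fragments, ADDRESS_LINE_PATTERN, ADDRESS_PATTERN, and this parse_address signature are assumptions for the sketch; they are not the library's AddressRegex or its C# overloads.

```python
import re

# Simplified fragments (assumed): a street line and a city/state/zip tail.
_LINE_CORE = (
    r"(?P<number>\d+)\s+"
    r"(?:(?P<predirectional>[NSEW])\s+)?"
    r"(?P<street>.+?)\s+"
    r"(?P<suffix>ST|AVE|DR|RD)\b"
)
_PLACE_CORE = r"(?P<city>[A-Z ]+?)[\s,]+(?P<state>[A-Z]{2})[\s,]+(?P<zip>\d{5})"

# Exposed as strings, so callers can pass either one in.
ADDRESS_LINE_PATTERN = rf"^\s*{_LINE_CORE}\s*$"
ADDRESS_PATTERN = rf"^\s*{_LINE_CORE}[\s,]+{_PLACE_CORE}\s*$"


def parse_address(text, pattern=ADDRESS_PATTERN):
    """Parse with the given pattern; the default is the full-address one."""
    match = re.match(pattern, text.upper())
    if match is None:
        return None
    return {k: v for k, v in match.groupdict().items() if v}


line_only = parse_address("3330 W Signal Peak Dr", ADDRESS_LINE_PATTERN)
full = parse_address("777 RAINBOW DR WATERFORD CT 06385")
```

Existing callers keep the default pattern, while callers that already hold city, state, and zip in separate fields pass the address-line pattern and get only the street-line captures back.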

Detecting CT as part of a street when no punctuation exists

Hi! Loving the address parser. I have found an edge case where it doesn't like parsing addresses where the state is CT and there is no punctuation in the address, like:

"777 RAINBOW DR WATERFORD CT 06385"

It picks up the street as: "RAINBOW DR WATERFORD" and "CT" as the suffix, with no city or state.

I forked the repo and stepped through the code and am able to replicate in a regex tester, but I am having issues figuring out a working regex tweak to support this scenario.

Anyone got any thoughts?

Target .NET Framework

It appears that the project does not use many additional Core libraries, but when I added the NuGet package to my .NET Framework project it needed to add all these extra libraries because this project only targets .NET Standard. If the project were updated to also target .NET Framework, all those extra libraries would not be required.

I ended up not using the project because it would add about 2 dozen more references to my solution and that didn't seem worth the benefit. But I'd be interested in using it if it targeted .NET Framework.
