usaddress-scourgify

A Python 3.x library for cleaning and normalizing US addresses following USPS Publication 28 and RESO guidelines.

Documentation

Use

normalize_address_record()

or

get_geocoder_normalized_addr()

or

NormalizeAddress().normalize()

to standardize your addresses. (Note: usaddress-scourgify does not make any attempts at address validation.)

Both functions and the class initializer take an address string or a dict-like object and return an address dict with all field values in uppercase, mapped to the keys address_line_1, address_line_2, city, state, and postal_code.

from scourgify import normalize_address_record, NormalizeAddress

normalize_address_record('123 southwest Main street, Boring, or, 97203')

normalize_address_record({
    'address_line_1': '123 southwest Main street',
    'address_line_2': 'unit 2',
    'city': 'Boring',
    'state': 'or',
    'postal_code': '97203'
})

NormalizeAddress('123 southwest Main street, Boring, or, 97203').normalize()

expected output

{
     'address_line_1': '123 SW MAIN ST',
     'address_line_2': 'UNIT 2',
     'city': 'BORING',
     'state': 'OR',
     'postal_code': '97203'
 }

By default, the output abbreviates all pre- and post-directionals, street types, and occupancy types. Alternatively, if you would like full-word directionals and street types in your output, use the long_hand parameter.

from scourgify import normalize_address_record, NormalizeAddress

normalize_address_record('123 southwest Main street, Boring, or, 97203', long_hand=True)

normalize_address_record({
    'address_line_1': '123 southwest Main street',
    'address_line_2': 'unit 2',
    'city': 'Boring',
    'state': 'or',
    'postal_code': '97203'
}, long_hand=True)

NormalizeAddress('123 southwest Main street, Boring, or, 97203', long_hand=True).normalize()

expected output

{
     'address_line_1': '123 SOUTHWEST MAIN STREET',
     'address_line_2': 'UNIT 2',
     'city': 'BORING',
     'state': 'OR',
     'postal_code': '97203'
 }

normalize_address_record() uses the included processing functions to remove unacceptable special characters, extra spaces, and predictable abnormal character sub-strings and phrases. It also abbreviates directional indicators and street types according to the abbreviation mappings found in address_constants. If applicable, line 2 address elements (e.g. Apt, Unit) are separated from line 1 inputs and standard occupancy type abbreviations are applied.

You may supply additional processing functions as a list of callables passed to the addtl_funcs parameter. Each additional function should take a string address and return a tuple of strings (line1, line2).
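As an illustration of that signature, here is a minimal sketch of one such function. The name split_building and the "BLDG" rule are hypothetical and not part of usaddress-scourgify, and returning an empty string for line2 when nothing matches is an assumption rather than documented behavior.

```python
import re
from typing import Tuple

# Hypothetical rule: treat a trailing "BLDG <id>" as a line 2 element.
_BLDG = re.compile(
    r'^(?P<line1>.*?)[,\s]+(?P<line2>BLDG\s+\w+)\s*$',
    re.IGNORECASE,
)


def split_building(address: str) -> Tuple[str, str]:
    """Take a string address and return a (line1, line2) tuple of strings."""
    match = _BLDG.match(address)
    if match:
        return match.group('line1').strip(), match.group('line2').strip()
    # Assumption: an empty string signals "no line 2 found".
    return address, ''


print(split_building('123 Main St, Bldg 4'))
```

A function like this would then be passed in a list, for example addtl_funcs=[split_building].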

Postal codes are normalized to US zip or zip+4 and zero padded as applicable, e.g. 2129 => 02129, 02129-44 => 02129-0044, 021290044 => 02129-0044. However, postal codes that cannot be effectively normalized, such as those with an invalid length or invalid characters, will raise AddressValidationError, e.g. 12345678901, 02129-, or 02129-0044-123.
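The padding rules can be sketched in a few lines. This is not the library's implementation: normalize_zip is a hypothetical name, it covers only the examples listed above, and it raises ValueError as a stand-in for AddressValidationError so the snippet runs without the package installed.

```python
import re


def normalize_zip(raw) -> str:
    """Zero pad a US postal code to zip or zip+4 form (illustrative only)."""
    text = str(raw).strip()
    hyphenated = re.fullmatch(r'(\d{1,5})(?:-(\d{1,4}))?', text)
    if hyphenated:
        zip5, plus4 = hyphenated.group(1), hyphenated.group(2)
    elif re.fullmatch(r'\d{9}', text):
        # Nine bare digits are read as zip+4 without the hyphen.
        zip5, plus4 = text[:5], text[5:]
    else:
        # Stand-in for the library's AddressValidationError.
        raise ValueError(f'cannot normalize postal code: {text!r}')
    zip5 = zip5.zfill(5)
    return f'{zip5}-{plus4.zfill(4)}' if plus4 else zip5


for value in ('2129', '02129-44', '021290044'):
    print(value, '=>', normalize_zip(value))
```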

Alternatively, you may extend the NormalizeAddress class to customize the normalization behavior by overriding any of the class's methods.

If your address is in the form of a dict that does not use the keys address_line_1, address_line_2, city, state, and postal_code, you must supply a key map to the addr_map parameter in the format {standard_key: custom_key}:

{
    'address_line_1': 'Line1',
    'address_line_2': 'Line2',
    'city': 'City',
    'state': 'State',
    'postal_code': 'Zip'
}
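To make the direction of the mapping concrete, the sketch below applies a {standard_key: custom_key} map to a record by hand. The helper apply_addr_map is hypothetical and only illustrates how the map is read; it is not how the library does it internally.

```python
def apply_addr_map(record: dict, addr_map: dict) -> dict:
    """Re-key a record so it uses the standard address keys."""
    # addr_map is {standard_key: custom_key}, so look up each custom key
    # in the record and store its value under the standard key.
    return {standard: record.get(custom) for standard, custom in addr_map.items()}


addr_map = {
    'address_line_1': 'Line1',
    'address_line_2': 'Line2',
    'city': 'City',
    'state': 'State',
    'postal_code': 'Zip',
}
record = {
    'Line1': '123 southwest Main street',
    'Line2': 'unit 2',
    'City': 'Boring',
    'State': 'or',
    'Zip': '97203',
}
print(apply_addr_map(record, addr_map))
```

With the library itself, the same record and map would be passed as normalize_address_record(record, addr_map=addr_map).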

You can also customize the address constants used by setting up an address_constants.yaml config file. Allowed keys are:

    DIRECTIONAL_REPLACEMENTS
    OCCUPANCY_TYPE_ABBREVIATIONS
    STATE_ABBREVIATIONS
    STREET_TYPE_ABBREVIATIONS
    KNOWN_ODDITIES
    PROBLEM_ST_TYPE_ABBRVS

You may also use the key insertion_method with a value of update or replace to indicate whether you would like to insert your values into the existing constants or replace them. If insertion_method is not present, update is assumed.

insertion_method: update
KNOWN_ODDITIES:
    'developed by HOST': ''
    ', UN ': ' UNIT '

OCCUPANCY_TYPE_ABBREVIATIONS:
    'UN': 'UNIT'
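The difference between the two insertion methods can be pictured with plain dicts. This is a sketch of the semantics as described above, not the library's loading code; apply_custom_constants is a hypothetical name and the default values shown are made up for illustration.

```python
def apply_custom_constants(defaults: dict, custom: dict,
                           insertion_method: str = 'update') -> dict:
    """Combine default and custom values for one constants key."""
    if insertion_method == 'replace':
        # Custom values stand in for the defaults entirely.
        return dict(custom)
    if insertion_method == 'update':
        # Custom values are added to the defaults, overriding on conflict.
        merged = dict(defaults)
        merged.update(custom)
        return merged
    raise ValueError(f'unknown insertion_method: {insertion_method!r}')


# Hypothetical defaults, not the library's actual constants.
defaults = {'APARTMENT': 'APT', 'SUITE': 'STE'}
custom = {'UN': 'UNIT'}

print(apply_custom_constants(defaults, custom))
print(apply_custom_constants(defaults, custom, 'replace'))
```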

get_geocoder_normalized_addr() uses geocoder.google to parse your address into a standard dict. No additional cleaning is performed, so if your address contains any stray or non-conforming elements (e.g. 8888 NE KILLINGSWORTH ST, UN C, PORTLAND, OR 97008), no result will be returned. Because geocoder accepts an address string, an address in dict format whose keys do not match the standard key set (address_line_1, address_line_2, city, state, postal_code) requires you to supply a list of the address-related keys in your dict, in the order in which the address string should be composed.
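The role of the ordered key list can be shown with a small sketch that builds an address string from a dict. The helper compose_address_string is hypothetical, and joining the parts with a comma and a space is an assumption; the library may compose the string differently.

```python
def compose_address_string(record: dict, keys: list) -> str:
    """Join the values found under keys, in the order given."""
    # Skip keys that are missing or hold empty values.
    parts = [str(record[key]).strip() for key in keys if record.get(key)]
    return ', '.join(parts)


record = {
    'Line1': '8888 NE Killingsworth St',
    'Line2': '',
    'City': 'Portland',
    'State': 'OR',
    'Zip': '97008',
}
keys = ['Line1', 'Line2', 'City', 'State', 'Zip']
print(compose_address_string(record, keys))
```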

Installation

Requires Python 3.x.

pip install usaddress-scourgify

To use a custom constants yaml, set the ADDRESS_CONFIG_DIR environment variable to the full path of the directory containing your address_constants.yaml file:

export ADDRESS_CONFIG_DIR=/path/to/your/config_dir

To use get_geocoder_normalized_addr, set the GOOGLE_API_KEY environment variable:

export GOOGLE_API_KEY=your_google_api_key

Contributing

Create a new branch to hold your change; pull requests submitted directly to dev or master will not be approved. Please include a comment explaining the issue your pull request solves. Make sure all appropriate test and tox updates are included and that all tests are passing.

License

usaddress-scourgify is released under the terms of the MIT license. Full details are in the LICENSE file.

Changelog

usaddress-scourgify was developed for use in the greenbuildingregistry project. For a full changelog see CHANGELOG.rst.
