NYC building and address import

Generates an OSM file of buildings with addresses per NYC election district, ready to be used in JOSM for manual review and upload to OpenStreetMap.

This README covers the data conversion. Import guidelines are in the Wiki.

Blog posts

Data

See the documentation on the OSM Wiki.

Status

  • Needs peer review

Prerequisites

Python 2.7.x
pip
virtualenv
libxml2
libxslt
spatialindex
GDAL

Installing prerequisites on Mac OSX

# install brew http://brew.sh

brew install libxml2 
brew install libxslt 
brew install spatialindex 
brew install gdal 

Installing prerequisites on Ubuntu

apt-get install python-pip
apt-get install python-virtualenv
apt-get install gdal-bin
apt-get install libgdal-dev
apt-get install libxml2-dev
apt-get install libxslt-dev
apt-get install python-lxml
apt-get install python-dev
apt-get install libspatialindex-dev
apt-get install unzip

Set up Python virtualenv and get dependencies

# may need to easy_install pip and pip install virtualenv 
virtualenv ~/venvs/nycbuildings
source ~/venvs/nycbuildings/bin/activate 
pip install -r requirements.txt

Usage

Run all stages:

# Download all files and process them into a building
# and an address .osm file per district.
make

You can run stages separately, like so:

# Download and expand all files, reproject
make download

# Chunk address and building files by district
make chunks

# Generate importable .osm files.
# This will populate the osm/ directory with one .osm file per
# NYC election district.
make osm

# Clean up all intermediary files:
make clean

# For testing it's useful to convert just a single district.
# For instance, convert election district 65001:
make merged # Will take a while
python convert.py merged/buildings-addresses-65001.geojson # Fast

Features

  • Conflates buildings and addresses
  • Cleans address names
  • Exports one OSM XML building file per NYC election district
  • Exports OSM XML address files for addresses that pertain to buildings with more than one address
  • Handles multipolygons
  • Simplifies building shapes

Attribute mapping

Buildings

Each building is a closed way tagged with:

building="yes"
height="HEIGHT_ROO" # In meters, if available
addr:housenumber="HOUSE_NUMB" # If available
addr:street="STREET_NAM" # If available
addr:postcode="ZIPCODE" # If available
nycdoitt:bin="BIN" # NYC DoITT building identifier

(All addr:* values shown in CAPS come from the address file.)
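The mapping above can be sketched as a function over one joined building record. This is a minimal sketch, assuming each record is a dict keyed by the source column names shown (BIN, HEIGHT_ROO, HOUSE_NUMB, STREET_NAM, ZIPCODE); `building_tags` is a hypothetical name, not taken from convert.py. It writes the street name to addr:street, the standard OSM key and the one the issues below refer to. It also assumes HEIGHT_ROO holds the roof height in feet and converts it because the height tag is documented as meters; if the source is already metric, drop the factor. The sample values are made up.

```python
# Hypothetical helper, not taken from convert.py.
FEET_TO_METERS = 0.3048  # assumption: HEIGHT_ROO is given in feet


def building_tags(props):
    """Map one building's source properties to OSM tags."""
    tags = {'building': 'yes', 'nycdoitt:bin': str(int(props['BIN']))}
    height = props.get('HEIGHT_ROO')
    if height:  # skip missing or 0.0 heights
        tags['height'] = str(round(float(height) * FEET_TO_METERS, 1))
    # Address columns joined from the address file; emitted only if present.
    for column, key in (('HOUSE_NUMB', 'addr:housenumber'),
                        ('STREET_NAM', 'addr:street'),  # standard OSM street key
                        ('ZIPCODE', 'addr:postcode')):
        value = props.get(column)
        if value:
            tags[key] = str(value).strip()
    return tags


tags = building_tags({'BIN': 3000001, 'HEIGHT_ROO': 100, 'HOUSE_NUMB': '632',
                      'STREET_NAM': '5th Avenue', 'ZIPCODE': '11215'})
print(tags['height'])  # 30.5
```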

Addresses

Each address is a node tagged with:

addr:housenumber="HOUSE_NUMB"
addr:street="STREET_NAM"
addr:postcode="ZIPCODE"

(All values shown in CAPS come from the address file.)

House number attributes

House number attributes are captured in five columns of the address shapefile.

There are four fields that begin with HOUSE_NU and one named HYPHEN_TYPE.

HOUSE_NUMBER (HOUSE_NUMB):
Alias: HOUSE_NUMBER
Data type: String
Width: 9
Precision: 0
Scale: 0

Definition: Stores the address number. The field will support hyphenated and range based addresses. Excludes suffixes.

HOUSE_NUMBER_SUFFIX (HOUSE_NU_1):
Alias: HOUSE_NUMBER_SUFFIX
Data type: String
Width: 9
Precision: 0
Scale: 0

Definition: It contains any suffix (e.g. 1/2, A, B) associated with the house number. GARAGE and REAR are not captured.

HOUSE_NUMBER_RANGE (HOUSE_NU_2):
Alias: HOUSE_NUMBER_RANGE
Data type: String
Width: 9
Precision: 0
Scale: 0

Definition: Stores the minimum and maximum numbers of a range assigned to a building. Unlike a hyphen, these are separate building numbers found on a building. Co-op City is an example.

HOUSE_NUMBER_RANGE_SUFFIX (HOUSE_NU_3):
Alias: HOUSE_NUMBER_RANGE_SUFFIX
Data type: String
Width: 9
Precision: 0
Scale: 0

Definition: Stores suffixes associated with HOUSE_NUMBER_RANGE addresses.

HYPHEN_TYPE
Alias: HYPHEN_TYPE
Data type: String
Width: 1
Precision: 0
Scale: 0

Definition: The Address Point Feature Class will support the storage of hyphenated addresses. There are three domain values associated with hyphen types: 1) Building Ranges, 2) "Queens" style hyphens and 3) Floor numbers.
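Before tagging, HOUSE_NUMB and its suffix (HOUSE_NU_1) have to be combined into one addr:housenumber value. A minimal sketch, assuming letter suffixes are appended directly ("12A") and fractional suffixes are separated by a space ("12 1/2"); these conventions and the name `join_suffix` are hypothetical, not rules documented by this project. The range columns and HYPHEN_TYPE are left out, and hyphenated numbers pass through unchanged.

```python
def join_suffix(number, suffix):
    """Attach a HOUSE_NU_1-style suffix to a HOUSE_NUMB-style number.

    Assumed convention (hypothetical): fractions such as '1/2' are separated
    by a space, letters such as 'A' are appended directly.
    """
    number = (number or '').strip()
    suffix = (suffix or '').strip()
    if not suffix:
        return number
    if '/' in suffix:
        return number + ' ' + suffix
    return number + suffix


print(join_suffix('12', 'A'))    # 12A
print(join_suffix('12', '1/2'))  # 12 1/2
```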

Contributors

aaronlidman, ebrelsford, ingalls, lxbarth, mateov, rub21, wildintellect

Issues

Incomplete addresses

Needs fix before continuing import.

There are areas in which what looks to be the same address appears multiple times. What's very likely going on is that we're missing a significant attribute from the source data in our addressing scheme.

Investigating now.
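One quick check for a missing attribute is to count address records that collapse to the same key when only the columns we currently use are kept. A minimal sketch, assuming each address is a dict keyed by the shapefile column names; `find_duplicate_addresses` is a hypothetical helper and the two records are made up. If adding a column (here the HOUSE_NU_1 suffix) makes the duplicates disappear, that column is probably the one being dropped.

```python
from collections import Counter


def find_duplicate_addresses(addresses, fields=('HOUSE_NUMB', 'STREET_NAM', 'ZIPCODE')):
    """Return {key: count} for address keys that occur more than once."""
    counts = Counter(tuple(a.get(f) for f in fields) for a in addresses)
    return dict((key, n) for key, n in counts.items() if n > 1)


# Made-up records that differ only in their suffix column.
records = [
    {'HOUSE_NUMB': '12', 'HOUSE_NU_1': 'A', 'STREET_NAM': 'Main Street', 'ZIPCODE': '10001'},
    {'HOUSE_NUMB': '12', 'HOUSE_NU_1': 'B', 'STREET_NAM': 'Main Street', 'ZIPCODE': '10001'},
]
print(find_duplicate_addresses(records))
# {('12', 'Main Street', '10001'): 2}
print(find_duplicate_addresses(records, ('HOUSE_NUMB', 'HOUSE_NU_1', 'STREET_NAM', 'ZIPCODE')))
# {}
```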

Formatting of REAR addresses

Bit of a strange one from the source data, and it could be a one-off that doesn't need fixing, but node 2494076774 has addr:housenumber = 174 REAR. Is there a better formatting for this?

Extraneous Nodes

I've noticed that many of the buildings contain extra nodes.

They don't detract from the import in any way, but they serve no purpose. If there is an easy way to remove them with an automated edit, it is worth examining.

Merging address nodes into buildings

There are a series of buildings with single separate address nodes where the address information should be sitting on the building polygon (see #15).

Write a bot that iterates through those buildings, copies address data from address nodes to building polygon and then saves back to OSM?

@emacsen - got some good starter code for this?
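The core of such a bot is a tag merge. A minimal in-memory sketch, assuming the building and its single address node are already loaded as tag dicts; downloading from and uploading to the OSM API are deliberately left out, and `merge_address_into_building` and `node_is_address_only` are hypothetical names. The merge refuses to overwrite a conflicting address so those cases can go to manual review, and the node should only be deleted if it carries nothing but addr:* tags.

```python
ADDRESS_PREFIX = 'addr:'


def merge_address_into_building(building_tags, node_tags):
    """Copy addr:* tags from an address node onto a building.

    Returns a new merged tag dict, or None if the two disagree on any
    addr:* value (those cases should be reviewed by hand).
    """
    merged = dict(building_tags)
    for key, value in node_tags.items():
        if not key.startswith(ADDRESS_PREFIX):
            continue  # only address tags are moved
        if key in merged and merged[key] != value:
            return None  # conflicting address, do not guess
        merged[key] = value
    return merged


def node_is_address_only(node_tags):
    """True if the node can be deleted after the merge (it holds only addr:* tags)."""
    return bool(node_tags) and all(k.startswith(ADDRESS_PREFIX) for k in node_tags)
```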

Fix overlapping buildings

There are 150+ changesets, uploaded before #8 was fixed, that generate overlapping building warnings.

Needs automated fix.

One 'open' button or flip buttons?

@MateoV -

I assume there are two buttons because we can't open this in one go, right?

In my mind ideally we'd have one button that:

  • Downloads OSM data into JOSM
  • Downloads import data in a separate layer into JOSM

Presuming that this is not possible, we should at least flip the buttons around: if you download OSM data first, loading the import data second merges it into the same layer, which is not preferable.

Not merge addresses to polygons?

Colin Reilly told me DoITT has tried to place address points as close as possible to the actual entrance to a building.

Right now we're merging addresses into buildings where possible, losing any address location information.

Should we stop merging addresses into buildings?

Count me in!

I can't join the mapping party on October 12th but I'd like to help. I can be a guinea pig for the import process, or whatever else makes sense. Let me know.

(Opening a ticket because that's what the instructions on OSM Task Manager said to do.)

Detect common import errors early

@emacsen - based on your review, what are common mistakes importers make? Looking for a list here that we can use as a basis for a daily script or something that creates a report that we can use for cleanups.

Review of imports

The tasking server has a review function.

Who has access to review? And if they do, does it provide feedback?

Conflate DoITT with OSM NYC data

Two overlapping questions:

  1. Should we exclude existing NYC buildings from the OSM files prepared for import? I can go either way here, because it could be fine to do this manually.
  2. How can we generate quick diffs between OSM buildings and NYC data for future updates from NYC to OSM?

This question pertains not only to buildings but also to addresses.
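For question 2, the nycdoitt:bin tag gives a stable key to diff on. A minimal sketch, assuming both sides have been reduced to dicts mapping BIN to a tag dict; `diff_by_bin` is a hypothetical helper. Only the keys present in the city data are compared, so tags that mappers added in OSM (names, shops) don't count as changes. Buildings in OSM without a BIN would still need a spatial comparison.

```python
def diff_by_bin(osm, city):
    """Compare {bin: tags} dicts from OSM and from the NYC dataset.

    Returns (added, removed, changed): BINs only in the city data,
    BINs only in OSM, and BINs where a city-supplied tag differs in OSM.
    """
    osm_bins, city_bins = set(osm), set(city)
    added = sorted(city_bins - osm_bins)
    removed = sorted(osm_bins - city_bins)
    changed = sorted(
        b for b in osm_bins & city_bins
        if any(osm[b].get(k) != v for k, v in city[b].items())
    )
    return added, removed, changed
```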

Some addr:street tags have no number suffix

Some buildings have addr:street tags with an incorrect street name (e.g. 108 Avenue instead of 108th Avenue).

(This appears to have been fixed in a later generation of the OSM files, but the older stuff needs to be fixed.)

Ordinal suffixes missing

Ordinal suffixes are missing. This is a critical error and I'll stop import effective immediately until I've fixed this.

There are a series of areas affected. I'll need to find a way to update them.

Find the meaning of 4-million ids

Some BINs are 4xxxxxx, and they are unaddressed. Sarah thought this had some special meaning in the dataset.

Some of these buildings have no height (height of 0.0 meters), which tells me that something is very odd with that.

If we know what these are, we should tag them. The same goes for any other feature like this, where the ID or another indicator in the original dataset tells us something about the feature beyond it being a building.

Precision discrepancies

We export data with much higher precision and we validate data in JOSM at much higher precision than what OSM handles (see example below).

The result is that while buildings validate before upload, they don't validate after they've been downloaded from OpenStreetMap again.

@MateoV - what options do you see for a fix here? Should we round all coordinates to 7 places after the decimal point when we read in data before we merge nodes?

# Node before upload  
Node: -14546
  Data Set: 26e15b1d
  Edited at: <new object>
  Edited by: <new object>
  Version: 0
  In changeset: 0
  Coordinates: 40.68840054832918, -73.99240044050097
  Coordinates (projected): -8236796.339608559, 4966488.890780375
  Part of: 
    Way: -14556

# Node after upload - note the precision lost in Coordinates.
Node: 2524540191
  Data Set: 55a7e5ae
  Edited at: 2013-11-08T18:26:37Z
  Edited by: lxbarth_nycbuildings (1764427)
  Version: 1
  In changeset: 18787119
  Coordinates: 40.6884005, -73.9924004
  Coordinates (projected): -8236796.335100011, 4966488.883685272
  Part of: 
    Way: 245220992
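Rounding on read could look roughly like this. OSM stores coordinates in units of 10^-7 degrees, which matches the 7 decimal places visible in the "after upload" output. A minimal sketch, assuming coordinates arrive as (lon, lat) float pairs; `node_key` and `merge_nodes` are hypothetical helpers. Keying nodes by the rounded pair means two vertices that OSM would store at the same position are merged before upload, so validation before upload sees what OSM will return.

```python
PRECISION = 7  # OSM stores coordinates in units of 1e-7 degrees


def node_key(lon, lat):
    """Round a coordinate pair to the precision OSM keeps."""
    return (round(lon, PRECISION), round(lat, PRECISION))


def merge_nodes(points):
    """Assign one node id per distinct rounded position.

    points: iterable of (lon, lat). Returns (ids, nodes) where ids[i] is the
    node id for the i-th point and nodes maps id -> rounded (lon, lat).
    """
    index = {}
    ids = []
    for lon, lat in points:
        key = node_key(lon, lat)
        if key not in index:
            index[key] = -(len(index) + 1)  # negative ids, as for new OSM objects
        ids.append(index[key])
    nodes = dict((node_id, key) for key, node_id in index.items())
    return ids, nodes


print(node_key(-73.99240044050097, 40.68840054832918))  # (-73.9924004, 40.6884005)
```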

Dashes in addresses

We were discussing the issue of dashes in addresses, e.g. 632 - 636 5 Avenue, Brooklyn.

The wiki currently states this is equivalent to all housenumbers in the inclusive range being there. Of course the wiki does not make it clear if this is only for a-b or also for a - b, or how to deal with housenumbers that do have a dash in them.

The obvious question was, what do data consumers support.

Richard Welty thought that mkgmap and OsmAnd didn't support dashes to indicate ranges in housenumbers. Nominatim interprets more complicated schemes than other geocoders, so I investigated what it supports by querying ___ 5th Avenue, Brooklyn, New York, USA:

Housenumber (___)    Match type        Nominatim place_id
646                  Exact match       9151589306
640                  No exact match    8213537384
636                  No exact match    8213537307
634                  No exact match    8213537303
632                  No exact match    8213537298

It's safe to say that Nominatim doesn't handle - in addresses but falls back to matching on the street which it knows is close.

This is actually good for NYC, because there are addresses that have dashes in them, and you wouldn't want those dashes specially interpreted.

Given that the data consumers we checked don't special-case -, nor do we really want them to, we should look at what to better do with these addresses.

Converting ranges covering more than two housenumbers to interpolation ways would be ideal and the correct way to represent the data. It's not obvious to me that this conversion could be done within code, and there's a substantial number of these.
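For a plain "a-b" value the conversion itself is mechanical; the hard part is deciding which values really are ranges. A minimal sketch, assuming ranges are written as two integers joined by a hyphen (optional spaces) with the same parity, as on one side of a street; `range_to_interpolation` is a hypothetical name. It returns tags for the two end nodes and for the interpolation way (addr:interpolation=even/odd); the end nodes would also need the street and postcode tags. Values it cannot classify, such as Queens-style 37-02, return None. In practice the HYPHEN_TYPE column, not string shape, should decide which values are building ranges.

```python
import re

RANGE_RE = re.compile(r'^\s*(\d+)\s*-\s*(\d+)\s*$')


def range_to_interpolation(housenumber):
    """Convert 'a-b' (same parity, a < b) into interpolation tags.

    Returns (start_tags, end_tags, way_tags), or None if the value is not a
    plain range covering more than two house numbers.
    """
    match = RANGE_RE.match(housenumber)
    if not match:
        return None
    start, end = int(match.group(1)), int(match.group(2))
    if start >= end or start % 2 != end % 2:
        return None  # not a same-side range; may be a Queens-style number
    if (end - start) // 2 + 1 <= 2:
        return None  # only two numbers: no interpolation needed
    parity = 'even' if start % 2 == 0 else 'odd'
    return ({'addr:housenumber': str(start)},
            {'addr:housenumber': str(end)},
            {'addr:interpolation': parity})
```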

Preserving BIN (building identification number) as tag

sort of like chicago:building_id, dug up this previous discussion:

https://groups.google.com/forum/#!topic/openstreetmap/kH_DwuYEWRU

I'm not super familiar with doing imports, and on cursory inspection it looks like this isn't included right now; are there plans for this? It would be useful for joining with PLUTO/PAD, and I imagine this could be used as a stepping stone to crowdsourcing a better building age dataset (see comments on http://bdon.org/2013/09/12/building-age-nyc/)

Detecting suspicious uploads

Issue: it's hard to determine whether an upload was done properly simply by looking at the ratio between time and number of objects created. It is possible to create large, clean uploads in a short time, while some slower ones can still be quite messy.

Here are a couple of ideas for scripted changeset analysis that would help us flag uploads for manual inspection:

  • Run JOSM validators on an upload (can you run JOSM from CLI?), flag on any non-role related errors.
  • Flag if only building and address nodes were modified in an upload and no other geometries were touched

None of the above is a measure of quality in itself, but each is a useful threshold for raising a flag.

/cc @iandees per our conversation last night
/cc @emacsen
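The second check can be scripted without JOSM. A minimal sketch, assuming the changeset has already been parsed (for example from its osmChange download) into a list of tag dicts for every tagged object it touched, with untagged way vertices filtered out; `is_import_object` and `looks_unconflated` are hypothetical names. A True result only means the upload is worth a look.

```python
def is_import_object(tags):
    """True for objects this import creates: buildings and address-only nodes."""
    if 'building' in tags:
        return True
    return bool(tags) and all(k.startswith('addr:') for k in tags)


def looks_unconflated(tagged_objects):
    """Flag a changeset whose tagged objects are all buildings or address nodes."""
    return bool(tagged_objects) and all(is_import_object(t) for t in tagged_objects)
```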

Some missed POI merges

There were a couple nodes imported from GNIS (e.g. this fire station) that probably should have been merged with buildings during import.

This can be laborious, because you can't assume the building directly underneath the node is the one it should be merged with. For example, there were two nodes for a fire station that was a couple buildings down the block. I had to look up the name on the GNIS nodes to figure out which building they belonged on.

Spurious nodes on buildings

Sometimes there are extra nodes on an otherwise perfectly rectangular building. I checked for these by scanning up a block and comparing the positions of the "meta node" handles: they should all be pretty close together, except where there's an extra node that doesn't need to be there.

You might consider running a simplification algorithm that only removes completely superfluous nodes (i.e. nodes where the angle between the line segments on either side is very, very close to 180°).
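Such a filter only needs the turning angle at each vertex. A minimal sketch, assuming a closed ring given as a list of (x, y) points without the repeated closing point, in a projected, roughly metric system (angles computed on raw lon/lat are distorted); `drop_straight_nodes` and the 1° tolerance are hypothetical choices. Unlike a general simplifier such as Douglas-Peucker, it removes a vertex only when the outline runs almost straight through it, so real corners are never moved. Angles are recomputed against the remaining neighbors after each removal, so a gentle curve is not flattened all at once.

```python
import math


def drop_straight_nodes(ring, tolerance_deg=1.0):
    """Remove vertices of a closed ring where the outline continues straight on.

    ring: list of (x, y) without the repeated closing point.
    """
    kept = list(ring)
    changed = True
    while changed and len(kept) > 3:
        changed = False
        for i in range(len(kept)):
            prev, cur, nxt = kept[i - 1], kept[i], kept[(i + 1) % len(kept)]
            ax, ay = cur[0] - prev[0], cur[1] - prev[1]
            bx, by = nxt[0] - cur[0], nxt[1] - cur[1]
            # Signed turn at cur; 0 means the segments form a 180° angle.
            turn = math.degrees(math.atan2(ax * by - ay * bx, ax * bx + ay * by))
            if abs(turn) < tolerance_deg:
                del kept[i]  # vertex adds nothing to the shape
                changed = True
                break
    return kept


print(drop_straight_nodes([(0, 0), (5, 0), (10, 0), (10, 10), (0, 10)]))
# [(0, 0), (10, 0), (10, 10), (0, 10)]
```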

Add missing ordinal suffixes

Before #28 was fixed, a series of changesets were uploaded where ordinal suffixes were missing:

East 4 Street instead of East 4th Street

Add the missing ordinal suffixes to data that was uploaded before #28 was applied. Because ordinals can be detected and fixed reliably in code, this could be an automated edit.
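The fix itself is a small text rule. A minimal sketch, assuming street names arrive as plain strings like "East 4 Street"; `ordinal`, `add_ordinal` and the short list of street types are hypothetical and illustrative, not exhaustive. Only a bare number directly followed by one of those street types is changed, so names that already carry a suffix pass through untouched. Note that 11, 12 and 13 (and 111, 112, ...) take "th".

```python
import re

# Illustrative, not exhaustive, list of street types that follow a numbered name.
STREET_TYPES = ('Street', 'Avenue', 'Road', 'Place', 'Drive', 'Lane', 'Terrace')
NUMBERED = re.compile(r'\b(\d+)\s+(?=(?:%s)\b)' % '|'.join(STREET_TYPES))


def ordinal(n):
    """1 -> '1st', 2 -> '2nd', 11 -> '11th', 22 -> '22nd', 108 -> '108th'."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '%d%s' % (n, suffix)


def add_ordinal(street):
    """'East 4 Street' -> 'East 4th Street'; already-suffixed names are unchanged."""
    return NUMBERED.sub(lambda m: ordinal(int(m.group(1))) + ' ', street)


print(add_ordinal('East 4 Street'))  # East 4th Street
```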

Rat Island

The OSM file for Rat Island loads a bunch of buildings that aren't in the bounding box.
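A cheap sanity check before publishing each chunk is to test every building against the district's bounding box. A minimal sketch, assuming each building is a list of (lon, lat) vertices and the box is (min_lon, min_lat, max_lon, max_lat); `outside_bbox` is a hypothetical helper and the coordinates are made up. It reports buildings that lie completely outside the box, which is the symptom described here.

```python
def bbox_of(points):
    """Bounding box (min_x, min_y, max_x, max_y) of a list of (x, y) points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs), max(ys))


def outside_bbox(buildings, district_bbox):
    """Return indices of buildings whose bounding box misses district_bbox entirely."""
    min_x, min_y, max_x, max_y = district_bbox
    result = []
    for i, ring in enumerate(buildings):
        bx0, by0, bx1, by1 = bbox_of(ring)
        if bx1 < min_x or bx0 > max_x or by1 < min_y or by0 > max_y:
            result.append(i)
    return result
```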

Implement script

  • Model after dcbuildings
  • Capture entire process from download to output as chunked .osm files in the scripts
  • Keep in mind that address data will change and might need a spatial join and not an id join to buildings.
  • Find an appropriate shapefile for chunking out data (for DC we used census tracts; they're a bit big, and there may be something smarter in NYC)
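For the spatial join, the basic operation is deciding which building footprint contains an address point. A minimal sketch using the even-odd ray-casting test, assuming simple polygons without holes given as lists of (x, y) vertices; `point_in_polygon` and `join_addresses` are hypothetical names. A run over all of NYC would first narrow candidates with a spatial index (spatialindex is already among the prerequisites) instead of testing every building.

```python
def point_in_polygon(x, y, ring):
    """Even-odd rule: count edge crossings of a ray cast to the right of (x, y)."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal line through y
            x_cross = x1 + (y - y1) * (x2 - x1) / float(y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def join_addresses(addresses, buildings):
    """Map each address index to the index of the first building containing it, or None."""
    result = {}
    for a_idx, (x, y) in enumerate(addresses):
        result[a_idx] = next((b_idx for b_idx, ring in enumerate(buildings)
                              if point_in_polygon(x, y, ring)), None)
    return result
```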
