csvgeocode's Introduction

csvgeocode

For when you have a CSV with addresses and you want a lat/lng for every row. Bulk geocode the addresses in a CSV with a few lines of code.

The defaults are configured for Google's geocoder, but csvgeocode can be configured to work with any similar geocoding service. There are built-in response handlers for Google, Mapbox, OSM Nominatim, Mapzen, and Texas A&M's geocoders (details below).

Make sure that you use this in compliance with the relevant API's terms of service.

Basic command line usage

Install globally via npm:

npm install -g csvgeocode

Use it:

$ csvgeocode path/to/input.csv path/to/output.csv --url "https://maps.googleapis.com/maps/api/geocode/json?address={{MY_ADDRESS_COLUMN_NAME}}&key=MY_API_KEY"

If you don't specify an output file, the output will stream to stdout instead, so you can stream the result as an HTTP response or do something like:

$ csvgeocode path/to/input.csv [options] | grep "greppin for somethin"

Options

You can add extra options when running csvgeocode. For example:

$ csvgeocode input.csv output.csv --url "http://someurl.com/" --lat CALL_MY_LATITUDE_COLUMN_THIS_SPECIAL_NAME --delay 1000 --verbose

The only required option is --url. All others are optional.

--url [url] (REQUIRED)

A URL template with column names as Mustache tags, like:

http://api.tiles.mapbox.com/v4/geocode/mapbox.places/{{address}}.json?access_token=MY_API_KEY

https://maps.googleapis.com/maps/api/geocode/json?address={{address}}&key=MY_API_KEY

http://geoservices.tamu.edu/Services/Geocode/WebService/GeocoderWebServiceHttpNonParsed_V04_01.aspx?apiKey=MY_API_KEY&version=4.01&streetAddress={{address}}&city={{city}}&state={{state}}

https://search.mapzen.com/v1/search?api_key=MY_API_KEY&text={{address}}

If your addresses are broken up into multiple columns (e.g. a street_address column, a city column, and a state column), you can use them all together in a URL template:

https://maps.googleapis.com/maps/api/geocode/json?address={{street_address}},{{city}},{{state}}&key=MY_API_KEY
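csvgeocode fills these templates with Mustache. The sketch below is not csvgeocode's code: it's a hypothetical `fillUrlTemplate` helper, written without a Mustache dependency, that shows the idea. Each `{{column}}` tag is replaced with that row's value, URL-encoded so spaces, commas, and apostrophes can't break the query string. Whether csvgeocode encodes values exactly this way is an assumption.

```javascript
// Hypothetical helper: fill a csvgeocode-style URL template from one CSV row.
// Each {{column}} tag is replaced with the row's value for that column,
// passed through encodeURIComponent so the value is safe inside a URL.
function fillUrlTemplate(template, row) {
  return template.replace(/\{\{\s*([^}]+?)\s*\}\}/g, (tag, column) => {
    if (!(column in row)) {
      throw new Error("Column not found in row: " + column);
    }
    return encodeURIComponent(String(row[column]));
  });
}

const url = fillUrlTemplate(
  "https://maps.googleapis.com/maps/api/geocode/json?address={{street_address}},{{city}},{{state}}&key=MY_API_KEY",
  { street_address: "160 Varick St", city: "New York", state: "NY" }
);
console.log(url);
// https://maps.googleapis.com/maps/api/geocode/json?address=160%20Varick%20St,New%20York,NY&key=MY_API_KEY
```

The literal commas between tags stay as written; only the substituted column values are encoded.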

--handler [handler]

The handler function used to process the API response. Current built-in handlers are "google", "mapbox", "mapzen", "osm", and "tamu". Contributions of handlers for other geocoders are welcome! You can define a custom handler when using csvgeocode as a Node module (see below).

Examples:

$ csvgeocode input.csv --url "http://api.tiles.mapbox.com/v4/geocode/mapbox.places/{{MY_ADDRESS_COLUMN_NAME}}.json?access_token=123ABC" --handler mapbox

$ csvgeocode input.csv --url 'https://search.mapzen.com/v1/search?api_key=123ABC&text={{MY_ADDRESS_COLUMN_NAME}}' --handler mapzen

$ csvgeocode input.csv --url "http://geoservices.tamu.edu/Services/Geocode/WebService/GeocoderWebServiceHttpNonParsed_V04_01.aspx?version=4.01&streetAddress={{ADDR}}&city={{CITY}}&state={{STATE}}&apiKey=123ABC" --handler tamu

Default: "google"

--lat [latitude column name]

The name of the column that should contain the resulting latitude. If this column doesn't exist in the input CSV, it will be created in the output.

Default: Tries to automatically detect if there is a relevant existing column name in the input CSV, like lat or latitude. If none is found, it will use lat.

--lng [longitude column name]

The name of the column that should contain the resulting longitude. If this column doesn't exist in the input CSV, it will be created in the output.

Default: Tries to automatically detect if there is a relevant existing column name in the input CSV, like lng or longitude. If none is found, it will use lng.
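The README doesn't list every name csvgeocode checks for beyond lat/latitude and lng/longitude. Here is a hypothetical sketch of that kind of detection; the candidate lists (including `lon` and `long`) and the case-insensitive match are assumptions, not csvgeocode's actual rules.

```javascript
// Hypothetical sketch of lat/lng column auto-detection.
// The candidate names below are assumptions, not csvgeocode's actual list.
const LAT_CANDIDATES = ["lat", "latitude"];
const LNG_CANDIDATES = ["lng", "lon", "long", "longitude"];

// Return the first header that matches a candidate (case-insensitively),
// or the fallback name if none matches.
function detectColumn(headers, candidates, fallback) {
  const match = headers.find((h) => candidates.includes(h.trim().toLowerCase()));
  return match !== undefined ? match : fallback;
}

const headers = ["street", "city", "zip", "Latitude", "Longitude"];
const latColumn = detectColumn(headers, LAT_CANDIDATES, "lat");
const lngColumn = detectColumn(headers, LNG_CANDIDATES, "lng");
console.log(latColumn, lngColumn); // Latitude Longitude
```

Returning the header exactly as it appears (not lowercased) means the output keeps the input's original column name.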

--delay [milliseconds]

The number of milliseconds to wait between geocoding calls. Setting this to 0 is probably a bad idea because most geocoders limit how fast you can make requests.

Default: 250
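To show what --delay controls, here is a minimal sketch of rate-limited, one-at-a-time geocoding. `geocodeRow` stands in for a single API call and is hypothetical; csvgeocode's own request queue may be structured differently.

```javascript
// Promise-based pause.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Geocode rows one at a time, waiting delayMs between calls so the
// geocoder's rate limit isn't exceeded. geocodeRow is a hypothetical
// async function that makes one API request for one row.
async function geocodeSequentially(rows, geocodeRow, delayMs = 250) {
  const results = [];
  for (let i = 0; i < rows.length; i++) {
    if (i > 0) {
      await sleep(delayMs); // no wait before the first request
    }
    results.push(await geocodeRow(rows[i]));
  }
  return results;
}
```

With the default 250 ms, 1,000 rows spend roughly 250 seconds just waiting, before counting the time each request takes.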

--force

By default, if a lat/lng is already found in an input row, that will be kept. If you want to re-geocode every row no matter what and replace any lat/lngs that already exist, add --force. This means you'll hit API limits faster and the process will take longer.
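A minimal sketch of the keep-or-regeocode decision described above. It assumes a row counts as already geocoded when both columns hold finite numbers; csvgeocode's exact test may differ, and `needsGeocoding` is a hypothetical name. One consequence of this default: if a run stops partway, feeding the partial output back in as input should only send the still-empty rows to the geocoder.

```javascript
// Decide whether a CSV row should be sent to the geocoder.
// Assumption: a row already "has" coordinates when both columns parse as
// finite numbers; empty strings or junk count as missing.
function needsGeocoding(row, latColumn, lngColumn, force = false) {
  if (force) {
    return true; // --force: re-geocode every row
  }
  const hasValue = (v) =>
    v !== undefined && v !== null && String(v).trim() !== "" && Number.isFinite(Number(v));
  return !(hasValue(row[latColumn]) && hasValue(row[lngColumn]));
}
```

The explicit empty-string check matters because `Number("")` is `0`, which would otherwise look like a valid coordinate.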

--verbose

See extra output while csvgeocode is running.

$ csvgeocode input.csv --url "MY_API_URL" --verbose
160 Varick St,New York,NY
SUCCESS

1600 Pennsylvania Ave,Washington,DC
SUCCESS

123 Fictional St,Noncity,XY
NO MATCH

Rows geocoded: 2
Rows failed: 1
Time elapsed: 1.8 seconds

Using as a Node module

Install via npm:

npm install csvgeocode

Use it:

var csvgeocode = require("csvgeocode");

//stream to stdout
csvgeocode("path/to/input.csv",{
    url: "MY_API_URL"
  });

//write to a file
csvgeocode("path/to/input.csv","path/to/output.csv",{
    url: "MY_API_URL"
  });

You can add all the same options in a script, except for verbose.

var options = {
  "url": "MY_API_URL",
  "lat": "MY_SPECIAL_LATITUDE_COLUMN_NAME",
  "lng": "MY_SPECIAL_LONGITUDE_COLUMN_NAME",
  "delay": 1000,
  "force": true,
  "handler": "mapbox"
};

//stream to stdout
csvgeocode("input.csv",options);

//write to a file
csvgeocode("input.csv","output.csv",options);

csvgeocode runs asynchronously, but you can listen for two events: row and complete.

row is triggered when each row is processed. It passes a string error message if geocoding the row failed, and the row itself.

csvgeocode("input.csv",options)
  .on("row",function(err,row){
    if (err) {
      console.warn(err);
    }
    /*
      `row` is an object like:
      {
        first: "John",
        last: "Keefe",
        address: "160 Varick St, New York NY",
        employer: "WNYC",
        lat: 40.7267926,
        lng: -74.00537369999999
      }
    */
  });

complete is triggered when all geocoding is done. It passes a summary object with three properties: failures, successes, and time.

csvgeocode("input.csv",options)
  .on("complete",function(summary){
    /*
      `summary` is an object like:
      {
        failures: 1, //1 row failed
        successes: 49, //49 rows succeeded
        time: 8700 //it took 8.7 seconds
      }
    */
  });
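Because the return value emits both row and complete, you can attach both listeners to the same call. The sketch below is a hypothetical helper, `trackGeocoding`, that works on any Node `EventEmitter` emitting those two events with the arguments described above, so it can be tried without an API key. Passing it the return value of `csvgeocode(...)` is the intended use, assuming that value is an `EventEmitter` (the chained `.on(...)` calls above suggest it is).

```javascript
const { EventEmitter } = require("events");

// Attach "row" and "complete" listeners to a csvgeocode-style emitter.
// Collects failed rows and resolves with them plus the summary when done.
function trackGeocoding(emitter) {
  const failedRows = [];
  return new Promise((resolve) => {
    emitter
      .on("row", (err, row) => {
        if (err) {
          failedRows.push({ error: err, row });
        }
      })
      .on("complete", (summary) => {
        resolve({ summary, failedRows });
      });
  });
}

// Demo with a plain EventEmitter standing in for csvgeocode's return value.
const fake = new EventEmitter();
const done = trackGeocoding(fake);
fake.emit("row", null, { address: "160 Varick St, New York NY", lat: 40.7267926, lng: -74.00537369999999 });
fake.emit("row", "NO MATCH", { address: "123 Fictional St, Noncity XY" });
fake.emit("complete", { failures: 1, successes: 1, time: 1800 });
done.then(({ summary, failedRows }) => {
  console.log(summary.failures, failedRows.map((f) => f.error)); // 1 [ 'NO MATCH' ]
});
```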

Using a custom geocoder

You can use any basic geocoding service from within a Node script by supplying a custom handler.

The easiest way to see what a handler should look like is to look at handlers.js.

The handler function is passed the body of an API response and should either return a string error message or an object with lat and lng properties.

csvgeocode("input.csv",{
  url: "MY_API_URL",
  handler: customHandler
});

function customHandler(body) {
  //success, return a lat/lng
  if (body.result) {
    return {
      lat: body.result.lat,
      lng: body.result.lng
    };
  }

  //failure, return a string
  return "NO MATCH";
}
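As a further example, here is a sketch of a handler for a GeoJSON FeatureCollection response like the Mapbox one quoted in the issues below. It is not csvgeocode's built-in mapbox handler. Two assumptions: the body might arrive as a string or as an already-parsed object, so the sketch accepts both, and the first feature is taken as the best match. Note GeoJSON's `[longitude, latitude]` order in `center`, which is easy to swap by accident.

```javascript
// Sketch of a custom handler for a GeoJSON FeatureCollection response
// (e.g. the Mapbox v4 geocoder). Returns {lat, lng} on success or a
// string error message on failure, matching the handler contract above.
function geojsonHandler(body) {
  let data = body;
  if (typeof body === "string") {
    try {
      data = JSON.parse(body);
    } catch (e) {
      return "Parsing error: " + e.message;
    }
  }
  if (!data || !Array.isArray(data.features) || data.features.length === 0) {
    return "NO MATCH";
  }
  const feature = data.features[0];
  const coords = feature.center || (feature.geometry && feature.geometry.coordinates);
  if (!Array.isArray(coords) || coords.length < 2) {
    return "NO MATCH";
  }
  // GeoJSON order is [longitude, latitude].
  const [lng, lat] = coords;
  return { lat, lng };
}
```

It would be passed the same way as `customHandler`, with `handler: geojsonHandler` in the options.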

Contributing/tests

The tests for the Mapbox and TAMU geocoders both require API keys. To run those tests, you need those API keys in a .env file in the project's root folder that defines two environment variables like so:

MAPBOX_API_KEY=123ABC
TAMU_API_KEY=123ABC

Some Alternatives

To Do

  • Add the NYC geocoder as a built-in handler.
  • Support a CSV with no header row where lat and lng are numerical indices instead of column names.
  • Support both POST and GET requests somehow.

Credits/License

By Noah Veltman

Available under the MIT license.

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions.

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

csvgeocode's People

Contributors

ericsoco, jasonsanford, veltman

csvgeocode's Issues

include full error message when REQUEST_DENIED from Google geocoder

The Google geocoder can return REQUEST_DENIED, which csvgeocode will display if --verbose argument is given, but additional info from the error message returned by Google's API is not being shown.

Adding that information to the output from csvgeocode would make it easier for users to quickly debug why the Google geocoder isn't returning the expected results.

For example, here's what I see now:

$ csvgeocode --verbose --url "https://maps.googleapis.com/maps/api/geocode/json?address={{City}},{{State}}&key=YOUR_KEY_HERE" input.csv output.csv

8/19/00,NM,Carlsbad,Pipeline rupture and explosion; 12 deaths,el paso,

REQUEST_DENIED

Rows geocoded: 0
Rows failed: 1
Time elapsed: 1.3 seconds

If we add a check for REQUEST_DENIED in handlers.js, like so:

    else if (response.status === "REQUEST_DENIED") {
      return "Request denied: " + response.error_message;
    }

we get more useful info when running csvgeocode:

$ csvgeocode --verbose --url "https://maps.googleapis.com/maps/api/geocode/json?address={{City}},{{State}}&key=YOUR_KEY_HERE" input.csv output.csv

8/19/00,NM,Carlsbad,Pipeline rupture and explosion; 12 deaths,el paso,

Request denied: This IP, site or mobile application is not authorized to use this API key. Request received from IP address 200.200.200.201, with empty referer

Rows geocoded: 0
Rows failed: 1
Time elapsed: 1.2 seconds

This tells me that the request was denied because of an issue with the IP address I'm on, so I know I need to change the settings for the Google Geocoder API key. That's much more helpful than just 'request denied'. After updating the allowed IP addresses, it works fine:

$ csvgeocode --verbose --url "https://maps.googleapis.com/maps/api/geocode/json?address={{City}},{{State}}&key=YOUR_KEY_HERE" input.csv output.csv

8/19/00,NM,Carlsbad,Pipeline rupture and explosion; 12 deaths,el paso,32.4206736,-104.2288375

SUCCESS

Rows geocoded: 1
Rows failed: 0
Time elapsed: 1.5 seconds
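Extending that idea, here is a sketch of a Google handler that reports every non-OK status, with Google's `error_message` appended when present. It is modeled on the Geocoding API's documented response shape (`status`, `results[0].geometry.location`, optional `error_message`), not copied from csvgeocode's handlers.js, and it assumes the handler receives the parsed JSON body.

```javascript
// Sketch of a Google Geocoding API handler: {lat, lng} on success,
// otherwise a string that includes Google's error_message when available.
function googleHandler(response) {
  if (response.status === "OK" && Array.isArray(response.results) && response.results.length > 0) {
    const location = response.results[0].geometry.location;
    return { lat: location.lat, lng: location.lng };
  }
  if (response.status === "ZERO_RESULTS" || response.status === "OK") {
    return "NO MATCH"; // no usable result
  }
  // REQUEST_DENIED, OVER_QUERY_LIMIT, INVALID_REQUEST, UNKNOWN_ERROR, etc.
  return response.error_message
    ? response.status + ": " + response.error_message
    : String(response.status);
}
```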

Needed --lat and --lng options

Here's the original data:

address
0 JFK AIRPORT JAMAICA 11430
0 jfk airport queens 11430
0 JFK INT'L AIRPORT JAMAICA 11430
0 JFK INTERNATIONAL AIRPOR JAMAICA 11430
0 JFK INTERNATIONAL AIRPORT JAMAICA 11430
0 JFK INTL AIRPORT JAMAICA 11430

Running with: csvgeocode addresses.csv addressesGeo.csv --url="https://maps.googleapis.com/maps/api/geocode/json?address={{address}}" --handler=google --verbose

address,undefined
0 JFK AIRPORT JAMAICA 11430,-74.8987696
0 jfk airport queens 11430,-74.8987696
0 JFK INT'L AIRPORT JAMAICA 11430,-74.8987696

Running with: csvgeocode addresses.csv addressesGeo.csv --url="https://maps.googleapis.com/maps/api/geocode/json?address={{address}}" --handler=google --verbose --lat=lat --lng=lng

address,lat,lng
0 JFK AIRPORT JAMAICA 11430,40.123,-74.8987696
0 jfk airport queens 11430,40.123,-74.8987696
0 JFK INT'L AIRPORT JAMAICA 11430,40.123,-74.8987696

Might be something to do with the single column.

(Side note: I got an error without the explicit --handler=google option.)

an apostrophe can modify results

I was looking to geocode a file that contained "Villeneuve d'Ascq, France", and the script found "Strada Villeneuve D' Asq, Gorj, Romania".

For some reason the apostrophe was encoded to an HTML numeric entity and caused this problem and the death of many squirrels 🐿.

I fixed it for this specific case with

url = url.replace(/'/, " ");

but maybe Mustache has an option to skip HTML escaping or to transcode entities properly to UTF-8.

I can send a PR if you like, but I'm not sure of the scope of the problem or the best way to fix it.
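One plausible mechanism, sketched below: Mustache's default `{{tag}}` HTML-escapes values (mustache.js turns `'` into `&#39;`). Inside a URL, that `&` starts a new query parameter and the `#` starts a fragment, so the geocoder receives only part of the address. The `htmlEscape` function is a simplified stand-in, not Mustache's code, and `example.com` is a placeholder URL. For reference, Mustache's triple-brace form, `{{{tag}}}`, skips HTML escaping; whether csvgeocode then URL-encodes the raw value is untested here.

```javascript
// Simplified HTML escaping of the kind Mustache's {{tag}} applies by default.
// Only a few characters are handled; this is a stand-in, not Mustache's code.
function htmlEscape(s) {
  return s
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

const address = "Villeneuve d'Ascq, France";
const buildUrl = (value) => "https://example.com/search?text=" + value + "&api_key=MY_API_KEY";

// HTML-escaped: "&" starts a new parameter and "#" starts a URL fragment,
// so the geocoder only receives "Villeneuve d" (and the api_key is lost).
const escapedUrl = new URL(buildUrl(htmlEscape(address)));
console.log(escapedUrl.searchParams.get("text")); // Villeneuve d
console.log(escapedUrl.searchParams.get("api_key")); // null

// URL-encoded: the full address survives.
const encodedUrl = new URL(buildUrl(encodeURIComponent(address)));
console.log(encodedUrl.searchParams.get("text")); // Villeneuve d'Ascq, France
```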

Escaping strings with double quote inside

When a field contains double quotes that are escaped by prepending a backslash, the backslash gets mangled, and Excel, LibreOffice, and other editors interpret those double quotes as the end of the field.
This might be more of an issue with dsv, but it's noteworthy even if it can't be fixed within csvgeocode.
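For reference, the common CSV convention (RFC 4180) escapes a double quote inside a field by doubling it, not with a backslash, and wraps such fields in double quotes; that is the form spreadsheet programs generally expect. A minimal sketch of a field formatter following that rule (`formatCsvField` is a hypothetical name, not part of csvgeocode or dsv):

```javascript
// Format one value as an RFC 4180 CSV field: if it contains a comma,
// double quote, CR, or LF, wrap it in double quotes and double any
// embedded double quotes. null/undefined become an empty field.
function formatCsvField(value) {
  const s = value == null ? "" : String(value);
  return /[",\r\n]/.test(s) ? '"' + s.replace(/"/g, '""') + '"' : s;
}

const row = ["Joe's \"Diner\"", "160 Varick St, New York", 40.7267926];
console.log(row.map(formatCsvField).join(","));
// "Joe's ""Diner""","160 Varick St, New York",40.7267926
```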

migrate mapzen to geocode.earth

Just to confirm that https://geocode.earth/, the drop-in replacement for the mapzen† Pelias geocoder, worked for me:

csvgeocode addresses.csv --verbose --url "https://api.geocode.earth/v1/search?api_key=ge-XXXXXXXX&text={{address}}" --handler mapzen

We may want to rename or alias a few things in the documentation and the handler names.

Error when using mapbox geocoder

Hi,

I'm getting this error when using the Mapbox geocoder:

csvgeocode.js:161
} else if ("lat" in result && "lng" in result) {
^

TypeError: Cannot use 'in' operator to search for 'lat' in undefined
at handleResponse (C:\Users\argenis\AppData\Roaming\npm\node_modules\csvgeocode\src\csvgeocode.js:161:25)
at Request._callback (C:\Users\argenis\AppData\Roaming\npm\node_modules\csvgeocode\src\csvgeocode.js:133:9)
at Request.self.callback (C:\Users\argenis\AppData\Roaming\npm\node_modules\csvgeocode\node_modules\request\request.js:199:22)
at emitTwo (events.js:87:13)
at Request.emit (events.js:172:7)
at Request.<anonymous> (C:\Users\argenis\AppData\Roaming\npm\node_modules\csvgeocode\node_modules\request\request.js:1036:10)
at emitOne (events.js:82:20)
at Request.emit (events.js:169:7)
at IncomingMessage.<anonymous> (C:\Users\argenis\AppData\Roaming\npm\node_modules\csvgeocode\node_modules\request\request.js:963:12)
at emitNone (events.js:72:20)

When I open the URL in the browser, I get:

{"type":"FeatureCollection","query":["maracaibo"],"features":[{"id":"place.15865","type":"Feature","text":"Maracaibo","place_name":"Maracaibo, Zulia, Venezuela","relevance":0.99,"properties":{},"bbox":[-71.7817466,10.582766,-71.586462,10.7688503],"center":[-71.6407,10.6436],"geometry":{"type":"Point","coordinates":[-71.6407,10.6436]},"context":[{"id":"region.5652266462607880","text":"Zulia"},{"id":"country.13795682694360550","text":"Venezuela","short_code":"ve"}]},{"id":"place.15864","type":"Feature","text":"Maracaibo","place_name":"Maracaibo, Valle del Cauca, Colombia","relevance":0.99,"properties":{},"bbox":[-76.5181048,3.4174864,-76.5159618,3.4195548],"center":[-76.517063,3.418486],"geometry":{"type":"Point","coordinates":[-76.517063,3.418486]},"context":[{"id":"region.13115541496430840","text":"Valle del Cauca"},{"id":"country.8835466213820440","text":"Colombia","short_code":"co"}]},{"id":"address.3991858998015520","type":"Feature","text":"Rua Maracaibo","place_name":"Rua Maracaibo, Santo André, São Paulo 09250, Brazil","relevance":0.79,"properties":{},"center":[-46.520427,-23.623473],"geometry":{"type":"Point","coordinates":[-46.520427,-23.623473]},"context":[{"id":"place.24117","text":"Santo André"},{"id":"postcode.1287776102566740","text":"09250"},{"id":"region.6056669612838220","text":"São Paulo"},{"id":"country.17318012316682710","text":"Brazil","short_code":"br"}]},{"id":"address.3219787327733630","type":"Feature","text":"Privada Maracaibo","place_name":"Privada Maracaibo, Ecatepec de Morelos, México, Mexico","relevance":0.79,"properties":{},"center":[-99.016035,19.593392],"geometry":{"type":"Point","coordinates":[-99.016035,19.593392]},"context":[{"id":"place.7221","text":"Ecatepec de Morelos"},{"id":"region.5989025148456720","text":"México"},{"id":"country.11735183656773450","text":"Mexico","short_code":"mx"}]},{"id":"address.9663854498015520","type":"Feature","text":"Rua Maracaibo","place_name":"Rua Maracaibo, Porto Alegre, Rio Grande Do Sul 91225, Brazil","relevance":0.79,"properties":{},"center":[-51.1345179,-30.0127374],"geometry":{"type":"Point","coordinates":[-51.1345179,-30.0127374]},"context":[{"id":"place.20332","text":"Porto Alegre"},{"id":"postcode.11986695081151740","text":"91225"},{"id":"region.4633809344890480","text":"Rio Grande Do Sul"},{"id":"country.17318012316682710","text":"Brazil","short_code":"br"}]}],"attribution":"NOTICE: © 2016 Mapbox and its suppliers. All rights reserved. Use of this data is subject to the Mapbox Terms of Service (https://www.mapbox.com/about/maps/). This response and the information it contains may not be retained."}

Any help?
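The trace shows csvgeocode.js applying the `in` operator to the handler's return value. `in` throws a `TypeError` when its right-hand side isn't an object, which is what happens if a handler returns `undefined` (for example, when a response doesn't match the shape the handler expects). A guard like the hypothetical `checkHandlerResult` below (an illustrative name, not a csvgeocode function) would turn that case into an ordinary row failure instead of a crash.

```javascript
// Normalize whatever a handler returned into either {error} or {lat, lng},
// never using `in` on undefined, null, or other non-objects.
function checkHandlerResult(result) {
  if (typeof result === "string") {
    return { error: result }; // handler reported a failure
  }
  if (result === null || typeof result !== "object") {
    return { error: "Handler returned " + (result === null ? "null" : typeof result) };
  }
  if ("lat" in result && "lng" in result) {
    return { lat: result.lat, lng: result.lng };
  }
  return { error: "Handler result is missing lat/lng" };
}
```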

add maxFailures option?

Add a maxFailures option so that if there are more than X consecutive failures, it stops running (so that if, e.g., you're over an API's limit for the day, it doesn't try another 1000 rows).
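A minimal sketch of how such an option could work, as a hypothetical `createFailureTracker` that counts consecutive failures, resets on any success, and signals a stop once the count exceeds `maxFailures` (matching "more than X" above). Nothing here is an existing csvgeocode option.

```javascript
// Track consecutive failures; record() returns true once there have been
// more than maxFailures failures in a row. Infinity disables the limit.
function createFailureTracker(maxFailures = Infinity) {
  let consecutive = 0;
  return {
    record(failed) {
      consecutive = failed ? consecutive + 1 : 0; // any success resets the count
      return consecutive > maxFailures;
    },
    get consecutive() {
      return consecutive;
    },
  };
}

const tracker = createFailureTracker(2);
console.log([true, true, false, true, true, true].map((f) => tracker.record(f)));
// [ false, false, false, false, false, true ]
```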

`=` in arguments

I am pretty sure an = is needed in arguments:

--url=thing

The documentation does not have this. My commands would not work without it.

Also, awesome module!

Fails with Javascript "in operator" error

I'm trying to geocode this file:

street,city,zip,latitude,longitude
123 E. Main St.,Charlottesville,22902,,

With this command:

$ csvgeocode --url "http://geoservices.tamu.edu/Services/Geocode/WebService/GeocoderWebServiceHttpNonParsed_V04_01.aspx?apiKey=abcdefghijklmnopqrstuvwxyz&version=4.01&streetAddress={{street}}&city={{city}}&state=VA" test.csv

It's failing, with this error:

/usr/local/lib/node_modules/csvgeocode/src/csvgeocode.js:161
    } else if ("lat" in result && "lng" in result) {
                        ^
TypeError: Cannot use 'in' operator to search for 'lat' in undefined
    at handleResponse (/usr/local/lib/node_modules/csvgeocode/src/csvgeocode.js:161:25)
    at Request._callback (/usr/local/lib/node_modules/csvgeocode/src/csvgeocode.js:133:9)
    at Request.self.callback (/usr/local/lib/node_modules/csvgeocode/node_modules/request/request.js:344:22)
    at Request.emit (events.js:110:17)
    at Request.<anonymous> (/usr/local/lib/node_modules/csvgeocode/node_modules/request/request.js:1239:14)
    at Request.emit (events.js:129:20)
    at IncomingMessage.<anonymous> (/usr/local/lib/node_modules/csvgeocode/node_modules/request/request.js:1187:12)
    at IncomingMessage.emit (events.js:129:20)
    at _stream_readable.js:908:16
    at process._tickCallback (node.js:355:11)

I'd figured that result isn't defined because the file can't be opened, but running in verbose mode precedes the above error with this:

123 E. Main St.,Charlottesville,22902,
Parsing error: SyntaxError: Unexpected number

I don't know what's up with the SyntaxError, but the point is that it is retrieving the first data line from the file.

I'm running Node v0.12.0.

Any ideas?

Streams

Maybe I missed something, but it would be cool to use streams/pipes, and not because they are all the rage. The specific use case: if I have a file with 5000 addresses and I want to geocode them with Google, I'll run into the 24-hour limit. It would be great to do a run that saves what has been done so that I can come back to it later and finish it.