geopandas / geopandas

Python tools for geographic data

Home Page: http://geopandas.org/

License: BSD 3-Clause "New" or "Revised" License

Python 99.95% Shell 0.05%

geoparquet geospatial pandas python spatial

geopandas's Issues

Pandas SQL API change breaks read_postgis()

As discovered in #97, pandas PR#6867 changed the read_sql() API in a way that breaks read_postgis() and tests. A change to get GeoPandas working again with pandas master is in 75c5301, but it needs to be cleaned up to make it backward compatible with released pandas versions.

Rename from_ functions to read_

I'd like to propose the from_file, from_postgis, and future from_... functions be renamed to read_file, read_postgis, and read_.... This is more consistent with the Pandas naming conventions.

The common pattern in Pandas is:

import pandas as pd
df = pd.read_csv(...)
df = pd.read_json(...)
...

With this suggestion, the analogous GeoPandas code would be:

import geopandas as gpd
df = gpd.read_file(...)
df = gpd.read_postgis(...)

Currently, you have to write:

import geopandas as gpd
df = gpd.GeoDataFrame.from_file(...)
df = gpd.GeoDataFrame.from_postgis(...)

The proposed GeoPandas version has a very similar layout to the Pandas code and would be more familiar to Pandas users.

Like Pandas, the code would be in a geopandas.io package which would be exposed at the top-level geopandas module. The to_ functions, to_json(), to_file(), etc. would remain methods of GeoDataFrame and GeoSeries.

Thoughts? I can put this together but want to make sure others agree first.

Spatial joins

This is a sketch of an API for spatial joins in GeoPandas:

def sjoin(left_df, right_df, op='intersects', how='left', **kwargs):
    """ Spatial join of two GeoDataFrames.

    left_df, right_df are GeoDataFrames (or GeoSeries)
    op: binary predicate {'intersects', 'contains', 'dwithin'}
    how: the type of join {'left', 'right', 'inner', 'outer'}
    left_cols, right_cols: optional lists of columns to include from each side
    kwargs: passed to op method
    """

References: PostGIS spatial joins and R's over function.
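To make the join semantics concrete, here is a minimal sketch rather than the proposed implementation. It assumes geometries are reduced to bounding boxes `(minx, miny, maxx, maxy)` held in plain lists, and it only covers `how='left'` and `how='inner'`. The names `bbox_intersects` and `sjoin_pairs` are hypothetical.

```python
def bbox_intersects(a, b):
    """True if two (minx, miny, maxx, maxy) boxes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def sjoin_pairs(left, right, op=bbox_intersects, how="left"):
    """Return (left_position, right_position) pairs.

    With how='left', unmatched left rows are kept with None on the right.
    """
    if how not in ("left", "inner"):
        raise ValueError("this sketch only covers how='left' and how='inner'")
    pairs = []
    for i, lgeom in enumerate(left):
        matches = [j for j, rgeom in enumerate(right) if op(lgeom, rgeom)]
        if matches:
            pairs.extend((i, j) for j in matches)
        elif how == "left":
            pairs.append((i, None))
    return pairs


left = [(0, 0, 1, 1), (5, 5, 6, 6)]
right = [(0.5, 0.5, 2, 2), (0, 0, 0.2, 0.2)]
print(sjoin_pairs(left, right, how="left"))
print(sjoin_pairs(left, right, how="inner"))
```

The nested loop compares every pair; a spatial index would presumably replace the inner scan with a candidate lookup.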

Error import geopandas.io

When I imported the geopandas library, the following error occurred:

Traceback (most recent call last):
File "", line 1, in
File "/usr/local/lib/python2.7/dist-packages/geopandas/__init__.py", line 6, in
from geodataframe import GeoDataFrame
File "/usr/local/lib/python2.7/dist-packages/geopandas/geodataframe.py", line 12, in
import geopandas.io
ImportError: No module named io

I installed geopandas on Ubuntu 12.04 by running this command in the terminal:
pip install geopandas

The error occurred because the io folder was not installed, so I created that folder inside the geopandas package and copied the files from this commit into it:
https://github.com/kjordahl/geopandas/commit/47f25eea93fc3ad1129f972848d14b8c71828c77
That solved the problem.

tools module

As mentioned in a discussion on #121, utility functions like geocode and sjoin should go in a common tools module.

Imports would work like this:

from geopandas.tools import sjoin, geocode

'GIS-like' operations

Currently, GeoPandas has several 'overlay' operations (such as intersect, difference, union, etc.) which, in their current implementation, perform one-to-one spatial overlays on the aligned GeoSeries. While this is useful (particularly if the indexes of the two GeoSeries carry some useful meaning), many GIS users may find this behavior slightly confusing. Instead, a user may expect to perform a one-to-many overlay comparison between the calling GeoSeries and the input GeoSeries (e.g., for each geometry in the calling GeoSeries, 'difference' it with all 'other' geometries and return the resultant 'differenced' geometry). For this, a spatial index is important to avoid unnecessary overlay comparisons.

A potential API for this type of operation may be something like:

def difference(self, other, by_geom=True):
    """
    Return the set-theoretic difference of each geometry with 
    *other*. This function is used to 'erase' parts of the 
    input geometries that overlap the other geometries.

    Parameters
    ----------
    other: GeoSeries or BaseGeometry
        Series of difference geometry objects
    by_geom: bool
        Should difference be applied by individual geometries,
        or across the whole GeoSeries? `by_geom`=False is the 
        normal GIS-type operation between 'layers', whereas 
        `by_geom`=True is equivalent to geoms - other
    """
...

These types of operations would be greatly enhanced (enabled) by a comprehensive spatial index implementation. This would enable something like obj1.geo_align(obj2) to be applied before the overlay operation when by_geom=False.
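The two modes can be contrasted with a small sketch. As an assumption made purely for illustration, each geometry is modeled as a `frozenset` of grid cells, so set difference stands in for geometric difference. The function is a free-standing stand-in for the proposed method, not GeoPandas code.

```python
def difference(geoms, others, by_geom=True):
    """Difference of cell-set 'geometries' in the two proposed modes."""
    if by_geom:
        # One-to-one: pair geometries positionally, like aligned GeoSeries.
        return [g - o for g, o in zip(geoms, others)]
    # One-to-many: erase the union of all `others` from every geometry.
    eraser = frozenset().union(*others)
    return [g - eraser for g in geoms]


geoms = [frozenset({1, 2, 3}), frozenset({4, 5})]
others = [frozenset({3}), frozenset({1, 5})]

print(difference(geoms, others, by_geom=True))
print(difference(geoms, others, by_geom=False))
```

In the one-to-many mode every geometry is compared against everything in `others`; a spatial index would limit that to nearby candidates.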

Import failing due to shapely import error

After d071f48, I can no longer import GeoPandas, because Shapely 1.2.17 doesn't have an ops.transform method, which was only introduced here: https://github.com/sgillies/shapely/commit/1ce9c0503d1d06a7df805c2cda03f30cd45fbcad

Not sure how best to handle this – maybe just a note in the README to alert people to install shapely from Github.

Have GeoSeries return regular Series if not geo-like

I.e. GeoSeries([1.0, 2.0, 3.0]) would return Series([1.0, 2.0, 3.0]). That way you don't have to decide whether to use GeoSeries or Series in operations. Very simple to implement if this is something people want. Related question: how do you validate that something is geo-like?
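One possible answer to the validation question, sketched under assumptions: treat a value as geo-like if it exposes the `__geo_interface__` protocol, and allow missing values. `is_geo_like`, `series_type_for`, and `FakePoint` are hypothetical names. The type is returned as a string so the sketch runs without pandas.

```python
class FakePoint:
    """Stand-in for a Shapely geometry; only the protocol matters here."""

    def __init__(self, x, y):
        self.x, self.y = x, y

    @property
    def __geo_interface__(self):
        return {"type": "Point", "coordinates": (self.x, self.y)}


def is_geo_like(values):
    """True if at least one value is present and all present values are geometries."""
    present = [v for v in values if v is not None]
    return bool(present) and all(hasattr(v, "__geo_interface__") for v in present)


def series_type_for(values):
    # Hypothetical dispatch: which class the constructor would hand back.
    return "GeoSeries" if is_geo_like(values) else "Series"


print(series_type_for([1.0, 2.0, 3.0]))
print(series_type_for([FakePoint(0, 0), None]))
```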

ENH: Beginnings of a Geo groupby

It would be great to be able to pass a collection of polygons to groupby and then end up with a {polygon: rows-contained-within-polygon} grouping. I'm focusing on the idea of points grouped by polygon as an abstraction, but it seems like it would be similar for most geometries.

Some context under the hood from pandas (not particularly complicated):

Index objects have a groupby method that takes in an ordered collection of objects and returns a mapping from each group to its members:

>>> index = Index(range(6))
>>> index.groupby(np.array(['a', 'a', 'b', 'c', 'a', 'c']))
{'a': [0L, 1L, 4L], 'b': [2L], 'c': [3L, 5L]}

which is how you end up with the groups in the GroupByObject. However, the grouper is required to be the same length as the index/NDFrame, which doesn't really make sense for geometry operations.

For our purposes it makes sense to allow an arbitrarily long set of polygons and then group geometries by containment.

geo_groupby(points, polygons) --> {poly1: indices_of_points, poly2: indices_of_points...}

My understanding is that this would be trivial to handle if we had an R-tree mapping points --> integer position and then queried for the points that intersect with an arbitrary polygon.

Eventually we could skip the step of calling groupby on the Index and try to hook into pandas internals, but for now, it might be easier to just create an array of polygons for pandas to group by hashing, i.e.:

{poly1: [0, 3, 5], poly2: [1, 2], poly3: [4]} --> [poly1, poly2, poly2, poly1, poly3, poly1]

It does some duplicate work (because it'll group on the hash of each polygon), but it makes our own code much simpler.

This makes sense for grouping an arbitrary GeoSeries a single time if you don't shuffle or make any other changes (or if we used an integer index) and is probably a good place to start. Things will get a little more complicated if we want to reuse the same R-tree across multiple objects -- maybe we can come up with some sort of intermediate hash mapping (or buckle down and create a GeoIndex :P). Don't have any great ideas, but reusing Categorical could be an easy way to handle this. (i.e. a geo column would be an integer column internally to pandas, but we keep track of a separate categorical that translates those integers to level labels, or maybe have a 'hidden' integer column).
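The grouping and its inversion into a same-length grouper can be sketched as follows. This is a brute-force illustration, not the R-tree approach: polygons are assumed to be axis-aligned boxes `(minx, miny, maxx, maxy)`, and `contains` and `to_grouper` are hypothetical helpers.

```python
def contains(box, point):
    """True if the box contains the point (boundary included)."""
    minx, miny, maxx, maxy = box
    x, y = point
    return minx <= x <= maxx and miny <= y <= maxy


def geo_groupby(points, polygons):
    """Map each polygon label to the positions of the points it contains."""
    return {
        label: [i for i, p in enumerate(points) if contains(box, p)]
        for label, box in polygons.items()
    }


def to_grouper(groups, n):
    """Invert {label: positions} into a length-n list of labels.

    Positions in no group stay None; with overlapping polygons the
    label processed last wins.
    """
    grouper = [None] * n
    for label, positions in groups.items():
        for i in positions:
            grouper[i] = label
    return grouper


points = [(0.5, 0.5), (2.5, 0.5), (2.2, 0.1), (0.1, 0.9), (4.5, 0.5), (0.9, 0.2)]
polygons = {"poly1": (0, 0, 1, 1), "poly2": (2, 0, 3, 1), "poly3": (4, 0, 5, 1)}

groups = geo_groupby(points, polygons)
print(groups)
print(to_grouper(groups, len(points)))
```

The resulting list has the same length as the points, which is the shape a pandas grouper is required to have.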

Keep R-trees up to date with series and frames

@kjordahl points out that setting geometries in series and frames needs to be considered. I'm not currently addressing this. TODO.

API: pandas is changing _propogate_attributes

see here: pandas-dev/pandas#5205

It is now going to be called __finalize__, and data is now _metadata.
It is going to be called on pretty much everything before returning to the user.

I am thinking about allowing this:

def __finalize__(self, other, method=None, **kwargs):
....

so that in theory a sub-class (as you guys have done), could dispatch on the method (if you need/want).

does this help?

ENH: Access row geometry methods for GeoDataFrame.iterrows

As @jwass mentions in #134 :

"it's weird and annoying to iterate over the rows and not have the geom methods"

You can get to the geometry methods via row.geometry.* but you have no way of knowing if geometry is the active geometry field on that Series.

Add to_csv() method to GeoDataFrame for point geometries

Add a to_csv method to GeoDataFrame assuming all the geometries are points. This is similar to the special to_json() method that writes GeoJSON.

The signature might look like
def to_csv(self, x, y, *args, **kwargs):
x is a string specifying the name of the column to use for the x coordinate
y is a string specifying the name of the column to use for the y coordinate
args and kwargs are passed to pandas DataFrame.to_csv()

Roughly the equivalent of:

gdf = GeoDataFrame(...)
df = gdf.drop('geometry', axis=1)  # df is a DataFrame, not GeoDataFrame after the drop
df[x] = gdf.geometry.apply(lambda p: p.x)
df[y] = gdf.geometry.apply(lambda p: p.y)
df.to_csv(...)

which enables

gdf = GeoDataFrame(...)
gdf.to_csv('longitude', 'latitude', 'output.csv')

This would raise TypeError(?) if a non-Point geometry is encountered.
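A dependency-free sketch of the same idea, assuming a `namedtuple` stands in for `shapely.geometry.Point` and attribute rows are plain dicts. `points_to_csv` is a hypothetical free function, not the proposed method.

```python
import csv
import io
from collections import namedtuple

Point = namedtuple("Point", "x y")  # stand-in for shapely.geometry.Point


def points_to_csv(rows, geometries, x, y, out):
    """Write attribute dicts plus x/y columns taken from point geometries."""
    for geom in geometries:
        if not isinstance(geom, Point):
            raise TypeError("only Point geometries can be written as x/y columns")
    fieldnames = (list(rows[0]) if rows else []) + [x, y]
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row, geom in zip(rows, geometries):
        writer.writerow(dict(row, **{x: geom.x, y: geom.y}))


buf = io.StringIO()
points_to_csv(
    [{"name": "a"}, {"name": "b"}],
    [Point(-73.9, 40.7), Point(-74.0, 40.6)],
    x="longitude",
    y="latitude",
    out=buf,
)
print(buf.getvalue())
```

Checking every geometry before writing anything means a non-Point input fails without leaving a partial file behind.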

Some ideas from PyData

Had some ideas from PyData that could be useful sugar:

~~autoconvert geometry on setting crs (and validate) eg string rep set becomes dict under the hood~~ (not doing this)
add tests to make sure that getitem with 'geometry' returns the column named geometry, not the overall geometry.
pick a default unit spec
keep track of geo units on columns
allow remembering geo data when you have multiple columns and switch geometries with set_geometry.

Implement the __geo_interface__

Implement a __geo_interface__ method on GeoDataFrames and GeoSeries. See https://gist.github.com/sgillies/2217756

Add spatial index to GeoDataFrame

An r-tree for the data frame's geometry column.

Add spatial indexing

We'll be adding a spatial index to speed up spatial queries in GeoPandas objects. The focus right now is to get a useful API working, with potential behind-the-scenes changes for future development.

Check whether crs is the same when transforming

Currently, the to_crs method will always do a transformation, even if the requested crs is the same as the current one. This could be improved with a simple dictionary check to skip if the crs is equal. A more involved check would need to determine whether coordinate systems are equivalent even if the PROJ.4 parameters are not equal.

An example of different crs mappings that refer to the same projection is

crs1 = {'datum': 'NAD83', 'ellps': 'GRS80', 'proj': 'utm', 'units': 'm', 'zone': 18, 'no_defs': True}
crs2 = {'init': 'epsg:26918', 'no_defs': True}

which both represent UTM zone 18N.
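The simple dictionary check could look like the sketch below, where `needs_transform` is a hypothetical helper. It only recognizes identical mappings, so the two equivalent definitions above would still be transformed, which is redundant but safe.

```python
def needs_transform(current_crs, target_crs):
    """Cheap check: skip reprojection only when the mappings are identical."""
    return current_crs != target_crs


crs1 = {'datum': 'NAD83', 'ellps': 'GRS80', 'proj': 'utm',
        'units': 'm', 'zone': 18, 'no_defs': True}
crs2 = {'init': 'epsg:26918', 'no_defs': True}

print(needs_transform(crs2, dict(crs2)))  # identical mapping: can be skipped
print(needs_transform(crs1, crs2))        # equivalent, but not detected here
```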

De-couple exact column name from definition of geometry with a set_geometry() method on GeoDataFrame.

Is the assumption that there is always one and only one 'geometry' column that defines the geometry on the object? For more flexibility in defining the internal representation of geometries, and more clarity about what's going on, we could add a set_geometry(col_name_or_series) method (the same as or similar to set_index()) and a .geometry property.

For now, that could just set the 'geometry' column, but this could be helpful when showing examples for GeoDataFrame. Also means you could potentially have two columns with different geo data (though that'd be weird) and more easily switch between them.

GeoDataFrame geometry methods should return a GeoDataFrame

Currently, something like df.buffer(50) returns a GeoSeries, rather than a GeoDataFrame. This is because of how things are implemented in GeoPandasBase (more specifically, _geo_op).

Instead of returning gpd.GeoSeries() wrapped results, we should check for the type, and if it is a GeoDataFrame, set the geometry column and return it.

Geo operations on GeoDataFrame

Currently, to perform any geometric operation on a GeoDataFrame, you have to pull out the geometry column, which can be cumbersome. This could be made easier for users by adding the corresponding GeoSeries methods to GeoDataFrame, which would delegate to the geometry column. Also, the geo methods should be able to accept a GeoDataFrame or GeoSeries.

The following would be equivalent:

gdf1 = GeoDataFrame(...)
gdf2 = GeoDataFrame(...)

s = gdf1['geometry'].intersects(gdf2['geometry'])  # How it must be done now
s = gdf1.intersects(gdf2['geometry'])
s = gdf1['geometry'].intersects(gdf2)
s = gdf1.intersects(gdf2)

All of the above would return a GeoSeries or the Series equivalent type depending on the function.

To do this, there would be some base GeoPandas class containing the definitions of the methods, which GeoSeries and GeoDataFrame would both inherit from. The geo methods would call something like _geo_data when they need a GeoSeries. GeoSeries would return self and GeoDataFrame would return self['geometry'].

Thoughts? I can take a crack at putting this together.
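The delegation pattern can be sketched with plain classes. The `Mini*` classes and `_boxes_intersect` are hypothetical stand-ins, geometries are assumed to be bounding boxes, and results are plain lists rather than Series.

```python
def _boxes_intersect(a, b):
    """True if two (minx, miny, maxx, maxy) boxes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


class GeoBase:
    """Holds the geo methods; subclasses say where the geometries live."""

    def _geo_data(self):
        raise NotImplementedError

    def intersects(self, other):
        # Either kind of object is accepted on both sides of the call.
        mine, theirs = self._geo_data(), other._geo_data()
        return [_boxes_intersect(a, b) for a, b in zip(mine.geoms, theirs.geoms)]


class MiniGeoSeries(GeoBase):
    def __init__(self, geoms):
        self.geoms = list(geoms)

    def _geo_data(self):
        return self


class MiniGeoDataFrame(GeoBase):
    def __init__(self, columns):
        self.columns = columns

    def __getitem__(self, name):
        return self.columns[name]

    def _geo_data(self):
        return self["geometry"]


gdf1 = MiniGeoDataFrame({"geometry": MiniGeoSeries([(0, 0, 1, 1), (5, 5, 6, 6)])})
gdf2 = MiniGeoDataFrame({"geometry": MiniGeoSeries([(0.5, 0.5, 2, 2), (0, 0, 1, 1)])})

print(gdf1["geometry"].intersects(gdf2["geometry"]))
print(gdf1.intersects(gdf2))
```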

Improve docstrings

Some docstrings are missing, some are fairly rudimentary. All modules, classes, methods and functions should have useful docstrings.

Optionally include bbox in GeoJSON representations

The json (and geo interface) representations of the data frame should optionally include the bounding box (bbox).

This could be implemented at the FeatureCollection level (using df.total_bounds) as well as on each Feature.

http://geojson.org/geojson-spec.html#bounding-boxes
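A rough sketch of the optional bbox at both levels, assuming features are LineStrings given as lists of `(x, y)` pairs. `bounds` and `to_feature_collection` are hypothetical names, and the total bounds are computed directly rather than through `df.total_bounds`.

```python
def bounds(coords):
    """Bounding box (minx, miny, maxx, maxy) of a sequence of (x, y) pairs."""
    xs, ys = zip(*coords)
    return (min(xs), min(ys), max(xs), max(ys))


def to_feature_collection(lines, bbox=False):
    """Build a GeoJSON-like FeatureCollection dict from LineString coordinates."""
    features = []
    for coords in lines:
        feature = {
            "type": "Feature",
            "properties": {},
            "geometry": {"type": "LineString",
                         "coordinates": [list(c) for c in coords]},
        }
        if bbox:
            feature["bbox"] = bounds(coords)
        features.append(feature)
    collection = {"type": "FeatureCollection", "features": features}
    if bbox and lines:
        # Total bounds: the box enclosing every coordinate of every feature.
        collection["bbox"] = bounds([c for coords in lines for c in coords])
    return collection


lines = [[(0, 0), (2, 1)], [(-1, 3), (1, 4)]]
fc = to_feature_collection(lines, bbox=True)
print(fc["bbox"])
print(fc["features"][0]["bbox"])
```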

Have GeoDataFrame return regular DataFrame if no geometry column

This is directly related to #53. Should the following return a DataFrame or a GeoDataFrame? If it returns a GeoDataFrame, then any calls to 'geo' methods will result in an error.

df = GeoDataFrame({"A":[1,2,3,4,5,6], "B":[6,5,4,3,2,1]})
df.geometry

AttributeError: 'GeoDataFrame' object has no attribute 'geometry'

to_file(): 'long' isn't a valid Fiona property type

The question http://gis.stackexchange.com/questions/89206/geopandas-error-when-writing-to-file-valueerror-long-is-not-in-list revealed a bug to me.

If you pass schema={'geometry': 'Point', 'properties': {'foo': 'long'}} into fiona.open(), the type 'long' isn't found at https://github.com/Toblerity/Fiona/blob/master/src/fiona/ogrext.pyx#L973. OGR doesn't distinguish between long and int, so converting 'long' to 'int' within Fiona may help...

But :)

Fiona will always return 'int' in the .schema attribute and this could cause trouble for programs that pass 'long' and expect it to stick. So, let's fix up geopandas so it always uses 'int' and never 'long'.
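On the GeoPandas side, the fix could be as small as normalizing the schema before it reaches fiona.open(). The sketch below assumes a schema dict shaped like the one above; `normalize_schema` is a hypothetical helper.

```python
def normalize_schema(schema):
    """Return a copy of a Fiona-style schema with 'long' replaced by 'int'."""
    properties = {
        name: "int" if field_type == "long" else field_type
        for name, field_type in schema["properties"].items()
    }
    return {"geometry": schema["geometry"], "properties": properties}


schema = {'geometry': 'Point', 'properties': {'foo': 'long', 'bar': 'str'}}
print(normalize_schema(schema))
```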

KeyError for `to_file` with 'custom' geometry column name

Currently, we get a KeyError if we try to write a GeoDataFrame to_file after using set_geometry on a column with a name other than 'geometry':

In [23]: df = df.set_geometry('geom')
In [24]: df.to_file('output.shp')
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-24-abb121197786> in <module>()
----> 1 df.to_file('output.shp')

/Users/cfarmer/Dropbox/Git/geopandas/geopandas/geodataframe.pyc in to_file(self, filename, driver, **kwargs)
    292         # Some (most?) providers expect a single geometry type:
    293         # Point, LineString, or Polygon
--> 294         geom_types = self['geometry'].geom_type.unique()
    295         from os.path import commonprefix # To find longest common prefix
    296         geom_type = commonprefix([g[::-1] for g in geom_types])[::-1]  # Reverse

# Long output suppressed

KeyError: 'geometry'

Set operators are inconsistently handled

Since merging #68 to allow geometric operations on GeoDataFrame objects, set operators work with a dataframe as the second item in an operation, but not the first.

For example,

a = GeoSeries(...)
b = GeoDataFrame(...)

a.intersection(b)  # Okay
b.intersection(a)  # Okay
a & b  # Okay
b & a # Error

Require Rtree 0.8

In requirements.txt. Blocked by Toblerity/rtree#21. For now, we'll require from the repo.

Handle empty geometry

When I called geopandas.GeoDataFrame.from_file() I was getting an exception. I followed the error and realized I had geometry that was being returned by fiona as 'None'. The following code, for instance, solves the issue by checking for 'None' and continuing.

https://github.com/fscottfoti/geopandas/commit/a4cbbde5467043338ee2404df0ad7af3e82a490e

I'm not sure how you want to handle this issue, but I figured you should be aware of it - it seems like geopandas should gracefully handle empty geometry. When I added code to allow adding records with empty geometry to the dataframe, I got another exception when I called 'to_file'. I'm happy to fix this if you want to advise me on how you want it done.

Thanks -
Fletcher

Travis CI builds failing

Something seems to have changed with the ubuntugis repository. All the Travis builds are failing on installing gdal with the following errors:

The following packages have unmet dependencies:
 gdal-bin : Depends: libgdal1h (>= 1.10.0-1~precise1) but it is not going to be installed
 libgdal-dev : Depends: libgdal1h (= 1.10.0-1~precise1) but it is not going to be installed
E: Unable to correct problems, you have held broken packages.

Similar issues had already prevented postgis from being installed on Travis since f9ee0be .

Suggestions welcome. Going back to ubuntugis:unstable may be worth a try, or alternate PPAs, or maybe the main 12.04 repo would work now.

Implement skipped geometry tests for GeoSeries

There are currently 8 tests in tests/test_geoseries.py that are skipped with the line

 @unittest.skip('TODO')

(for example, the crosses() method). It would be great to get them implemented and complete test coverage of the Shapely methods on GeoSeries objects. If there are any other holes in test coverage for the Shapely API on series objects they can be added to this issue.

Faster coordinate reference transformation

Using the OGR SpatialReference and CoordinateTransformation classes, we can get about a 40x speedup on coordinate reference transformations. Since Fiona requires OGR anyway, this doesn't really add any additional dependencies that I am aware of, and in fact, eliminates the pyproj dependency. I'll open a PR if the following proves useful. You should be able to just 'drop' this function into geoseries, with the following imports at the top of the file:

from shapely import wkb
from osgeo import ogr
from osgeo.osr import SpatialReference, CoordinateTransformation
from fiona.crs import to_string  # assumption: the to_string() called below is Fiona's

and the main function is here:

def transform(self, crs=None, epsg=None):
    if self.crs is None:
        raise ValueError('Cannot transform naive geometries.  '
                                'Please set a crs on the object first.')
    if crs is None:
        target_crs = SpatialReference()
        if target_crs.ImportFromEPSG(epsg):
            raise TypeError('Invalid EPSG code.')
    else: # Might be a proj4string, or a dict of proj params
        target_crs = SpatialReference()
        if type(crs) in (str, unicode):
            if target_crs.ImportFromProj4(crs):
                raise TypeError('Invalid proj4 string.')
        else:
            if target_crs.ImportFromProj4(to_string(crs)):
                raise TypeError('Invalid proj4 string.')
    origin_crs = SpatialReference()
    origin_crs.ImportFromProj4(to_string(self.crs)) # Should be a dict of params
    transformer = CoordinateTransformation(origin_crs, target_crs)
    def trans(geom):
        g = ogr.CreateGeometryFromWkb(geom.wkb)
        if g.Transform(transformer):
            raise RuntimeError('Could not transform geometry.')
        g = wkb.loads(g.ExportToWkb())
        return g
    result = self.apply(trans)
    result.__class__ = GeoSeries
    result.crs = crs
    return result

BUG: Make sure to reset cache on setting anything on GeoSeries

total_bounds is set as cache_readonly, leading to this:

In [6]: s = GeoSeries([Point(x,y) for x, y in zip(range(5), range(5))])

In [11]: s.total_bounds
Out[11]: (0.0, 0.0, 4.0, 4.0)

In [12]: s[0] = Point(10, 50)

In [13]: s
Out[13]:
0    POINT (10.0000000000000000 50.0000000000000000)
1      POINT (1.0000000000000000 1.0000000000000000)
2      POINT (2.0000000000000000 2.0000000000000000)
3      POINT (3.0000000000000000 3.0000000000000000)
4      POINT (4.0000000000000000 4.0000000000000000)
Name: geometry, dtype: object

In [14]: s.total_bounds
Out[14]: (0.0, 0.0, 4.0, 4.0)

If we support pandas 0.13 only, then there's a _reset_cache() (or _update_inplace()) method that works just fine for this. Otherwise, we need to monkey patch that in or just delete _cache after setting.
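The invalidation itself can be illustrated without pandas. In this sketch `CachedBoundsSeries` is a hypothetical stand-in that keeps derived values in a `_cache` dict and clears it on every item assignment. It does not show how pandas implements `cache_readonly`.

```python
class CachedBoundsSeries:
    """Toy series of (x, y) points with a cached total_bounds."""

    def __init__(self, points):
        self._points = list(points)
        self._cache = {}

    @property
    def total_bounds(self):
        if "total_bounds" not in self._cache:
            xs, ys = zip(*self._points)
            self._cache["total_bounds"] = (min(xs), min(ys), max(xs), max(ys))
        return self._cache["total_bounds"]

    def __setitem__(self, i, point):
        self._points[i] = point
        self._cache.clear()  # drop everything derived from the old data


s = CachedBoundsSeries([(float(i), float(i)) for i in range(5)])
print(s.total_bounds)
s[0] = (10.0, 50.0)
print(s.total_bounds)
```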

Implement pandas methods for GeoDataFrame objects

GeoPandas GeoDataFrame objects should be able to handle most standard pandas methods. Currently, those which do not behave well (or make sense) on a geometry column will fail.

See #19 (comment) for some examples. Implementing a geometry attribute that is a separate GeoSeries object rather than a normal column should make this easier to do as well (see #45).

Generate rtree when joining frames

The product of joining two GeoDataFrames needs an rtree. Related to #118 and #120.

Tests fail intermittently due to GeocoderTimedOut error

Occasionally I experience a geopy.exc.GeocoderTimedOut: Service timed out exception from the various geocoding services when running tests, both locally and on Travis.

I'm not sure what the options are but we should ensure that a broken third-party web service doesn't break the unit tests. I'm not a big fan of mocking but it might be appropriate here? Or maybe just raise a unittest.SkipTest if we get a GeocoderTimedOut?
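The SkipTest option might look like this sketch. To keep it runnable offline, `GeocoderTimedOut` is a local stand-in for `geopy.exc.GeocoderTimedOut`, and `flaky_geocode` is a hypothetical service call that always times out.

```python
import unittest


class GeocoderTimedOut(Exception):
    """Stand-in for geopy.exc.GeocoderTimedOut so the sketch runs offline."""


def flaky_geocode(address):
    # Hypothetical service call; here it always times out.
    raise GeocoderTimedOut("Service timed out")


class TestGeocode(unittest.TestCase):
    def test_forward(self):
        try:
            result = flaky_geocode("260 Broadway, New York, NY")
        except GeocoderTimedOut:
            # A broken third-party service is reported as a skip, not a failure.
            raise unittest.SkipTest("geocoding service timed out")
        self.assertIsNotNone(result)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestGeocode)
outcome = unittest.TextTestRunner(verbosity=0).run(suite)
print(len(outcome.skipped), len(outcome.failures), len(outcome.errors))
```

The trade-off is that a skip can hide a real regression in the geocoding code, which mocking the service would catch.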

Coveralls is broken since move to new organization

The new link to the coveralls run should be https://coveralls.io/r/geopandas/geopandas, but I haven't been able to add it yet on coveralls. I've tried "sync repos", but I don't see the geopandas org in my account yet.

Maybe I'm missing something, or maybe it will show up eventually.

Finish implementing pandas methods

There are still a number of pandas operations on GeoPandas objects that return the wrong type of object. The most egregious of these is slicing:

>>> s
0    POINT (0.0000000000000000 0.0000000000000000)
1    POINT (1.0000000000000000 1.0000000000000000)
2    POINT (2.0000000000000000 2.0000000000000000)
dtype: object

>>> type(s)
geopandas.geoseries.GeoSeries

>>> type(s[:2])
pandas.core.series.Series

This stems from difficulty in subclassing pandas objects. This should become much easier in pandas after #3482 is merged (expected to be in 0.13 release). So the question is how much it is worth working around this now, or just to wait and only support newer versions of pandas. My inclination is that slicing, at the very least, is important enough that it should work with pandas 0.12. I'll do some testing after that PR is merged into pandas master and see how much it fixes.

Spatial indexes for geo data frames and series

A new issue and branch because we're starting over from f42d049.

When a GeoDataFrame or GeoSeries is created, it will get an R-tree attached as _sindex. This R-tree can be used to speed up queries. Usage example:

from geopandas import read_file

boros = read_file("/nybb_14a_av/nybb.shp", vfs="zip://examples/nybb_14aav.zip")
tree = boros._sindex
hits = list(tree.intersection((1012821.80, 229228.26)))
print(boros.ix[hits]['BoroName'])
# Output:
#
#4     Bronx
#2    Queens
# Name: BoroName, dtype: object

R-tree intersection considers bounding boxes only, so it generates false positives: boros.intersects(Point(1012821.80, 229228.26)) needs to run over those 2 candidates and reduce the selection to those that precisely intersect.
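The two-stage filter can be sketched as follows. For illustration only, geometries are assumed to be circles `(cx, cy, r)`, and a linear scan over bounding boxes stands in for the R-tree query. All function names are hypothetical.

```python
def bbox_of(circle):
    """Bounding box (minx, miny, maxx, maxy) of a (cx, cy, r) circle."""
    cx, cy, r = circle
    return (cx - r, cy - r, cx + r, cy + r)


def bbox_hits(circles, point):
    """Stage 1: positions whose bounding box contains the point (may over-report)."""
    x, y = point
    hits = []
    for i, circle in enumerate(circles):
        minx, miny, maxx, maxy = bbox_of(circle)
        if minx <= x <= maxx and miny <= y <= maxy:
            hits.append(i)
    return hits


def precise_hits(circles, point):
    """Stage 2: keep only candidates whose exact shape contains the point."""
    x, y = point
    return [
        i for i in bbox_hits(circles, point)
        if (x - circles[i][0]) ** 2 + (y - circles[i][1]) ** 2 <= circles[i][2] ** 2
    ]


circles = [(0.0, 0.0, 1.0), (1.0, 1.0, 1.0), (5.0, 5.0, 1.0)]
point = (0.9, 0.9)

print(bbox_hits(circles, point))     # candidates, including a false positive
print(precise_hits(circles, point))  # exact matches only
```

The point lies in the corner of the first circle's bounding box but outside the circle itself, so it is a false positive that only the exact test removes.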

After reading http://pandas.pydata.org/pandas-docs/stable/indexing.html again I think we need to put some thought into whether this supports integer or label indexing. Example above is for integer/iloc type indexing.

GeoDataFrame.iterrows() should yield GeoSeries instead of Series

Seems like a GeoSeries would be more appropriate here?

>>> for i, v in df.iterrows(): print type(v)
<class 'pandas.core.series.Series'>

Geopandas join returns pandas DataFrame

For example:

boros = gp.read_file("nybb.shp")
populations = pd.read_csv("populations.csv")
populations.set_index('BoroCode', inplace=True)

joined = boros.join(populations, on='BoroCode')
type(joined)

pandas.core.frame.DataFrame

Output should be:

geopandas.geodataframe.GeoDataFrame

populations.csv data:

BoroCode,BoroName,Pop
2,Bronx,1385108
3,Brooklyn,2504700
1,Manhattan,1585873
4,Queens,2230722
5,Staten Island,468730

GeoSeries.to_crs method Pyproj error

Minimal example using Geopandas 5219026, and Pyproj 1.9.3:

import geopandas as gp
from shapely.geometry import Point
from fiona.crs import from_epsg

test_series = gp.GeoSeries([Point(0.5, 49), Point(0.7, 49.5)])
test_series.crs = from_epsg(4326)
test_series.to_crs(epsg=7405)

Throws:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-2-d3ff04143697> in <module>()
      1 test_series = gp.GeoSeries([Point(0.5, 49), Point(0.7, 49.5)])
      2 test_series.crs = from_epsg(4326)
----> 3 test_series.to_crs(epsg=7405)

/Users/sth/dev/gp/venv/lib/python2.7/site-packages/geopandas/geoseries.pyc in to_crs(self, crs, epsg)
    266                 raise TypeError('Must set either crs or epsg for output.')
    267         proj_in = pyproj.Proj(preserve_units=True, **self.crs)
--> 268         proj_out = pyproj.Proj(preserve_units=True, **crs)
    269         project = partial(pyproj.transform, proj_in, proj_out)
    270         result = self.apply(lambda geom: transform(project, geom))

/Users/sth/dev/gp/venv/lib/python2.7/site-packages/pyproj/__init__.pyc in __new__(self, projparams, preserve_units, **kwargs)
    341         # on case-insensitive filesystems).
    342         projstring = projstring.replace('EPSG','epsg')
--> 343         return _proj.Proj.__new__(self, projstring)
    344 
    345     def __call__(self, *args, **kw):

/Users/sth/gp/xmltocsv/venv/lib/python2.7/site-packages/pyproj/_proj.so in _proj.Proj.__cinit__ (_proj.c:950)()

RuntimeError: no options found in 'init' file

Minor typo?

It looks like line 121 in geoseries.py (in function _series_unary_op) should read:

return Series([getattr(geom, op) for geom in self],

rather than

return GeoSeries([getattr(geom, op) for geom in self],

This is a very minor bug, so I didn't bother with a PR, but am happy to do one if that helps.

Debian packaging

Supported python versions

Currently, GeoPandas only explicitly supports Python 2.7.

Python 3 support is currently blocked by Shapely. Once shapely fully supports 3.3, it would be worth getting GeoPandas there as well.

Python 2.6 support would probably not be difficult if it was needed, but is currently untested. There are at least a couple of dictionary comprehensions that would have to be changed to get it working. Input welcome if this is important to anyone.

Python versions <= 2.5 are unlikely to be supported at all. If someone feels strongly about 2.5 or earlier, submit a pull request and it will be considered.

Spatial overlays

Similar to #38 , implement an overlay function to do spatial overlays at the GeoDataFrame/GeoSeries level. This was discussed at the scipy 2014 code sprint.

The function signature would be e.g.

import geopandas as gpd
...
gpd.overlay(df1, df2, how='intersection')

The how argument defines the overlay operation. We could define these by their common GIS nomenclature:

intersection
union
identity
symmetric_difference
erase

Alternately, we may want to provide aliases for these according to SQL vocabulary.

Properly handle rows with missing geometries

GeoSeries and GeoDataFrame should support missing values in the geometry column like all of Pandas, with a fill value of np.nan.

For rows with missing geometries:
df.area should return nan
df.intersects would return False
df.buffer would return nan

Thoughts? Pretty sure that's in sync with Pandas patterns.
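The proposed behavior for area and intersects can be sketched with plain lists. The sketch assumes geometries are bounding boxes `(minx, miny, maxx, maxy)` and that a missing geometry is represented by `None`. Both functions are hypothetical stand-ins for the GeoSeries methods.

```python
import math


def area(geoms):
    """Area per row; nan where the geometry is missing."""
    return [
        float("nan") if g is None else (g[2] - g[0]) * (g[3] - g[1])
        for g in geoms
    ]


def intersects(geoms, other):
    """Row-wise test against one box; a missing geometry never intersects."""
    return [
        g is not None
        and g[0] <= other[2] and other[0] <= g[2]
        and g[1] <= other[3] and other[1] <= g[3]
        for g in geoms
    ]


geoms = [(0, 0, 2, 3), None, (5, 5, 6, 6)]
print(area(geoms))
print(intersects(geoms, (1, 1, 4, 4)))
```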

Implement reverse geocoding

The GeoPandas geocode module only implements forward geocoding using geopy. It takes in a Series of strings and returns a GeoDataFrame of the corresponding points.

GeoPandas should take advantage of geopy's reverse geocoding implementation. An interface would take a GeoSeries of Point objects and return a DataFrame of the corresponding string locations.

The tricky part here is to normalize the column names from the different geocoders.

The new function would raise TypeError if it encounters a non-Point geometry.

This should be done in conjunction with #108

Borough Shapefile URL Is Dead

The shapefile that's downloaded for testing (line 40 of tests/util.py) is no longer at http://www.nyc.gov/html/dcp/download/bytes/nybb_13a.zip. It appears the new link is http://www.nyc.gov/html/dcp/download/bytes/nybb_14bav.zip. I was going to submit a PR with this update, but, as @cfarmer suggested, perhaps it would be better to find a more stable home for this file? Or just include the files in the source?

Use Pandas wheels to speed up Travis builds

Sadly, I haven't found any out there.