
uganda-tanzania-building-footprints

Introduction

Under Microsoft’s AI for Humanitarian Action program, Bing Maps is contributing to an initiative from the Humanitarian OpenStreetMap Team that will bring AI assistance to open map building. More information about the partnership is available on the Bing Maps blog.

Bing Maps is releasing countrywide open building footprint datasets for Uganda and Tanzania. This dataset contains 17,942,345 computer-generated building footprints derived by applying Bing Maps algorithms to satellite imagery. The imagery used for the Uganda and Tanzania extraction comes from our imagery partner, Maxar Technologies. The data is freely available for download and use under the applicable license.

License

This data is licensed by Microsoft under the Open Data Commons Open Database License (ODbL).

FAQ

What does the data include?

17,942,345 building footprint polygon geometries in Uganda and Tanzania, in GeoJSON format. You can download the data here:

Country     Number of buildings    Unzipped size (MB)
Uganda      6,928,078              1,339
Tanzania    11,014,267             2,202

What is the GeoJSON format?

GeoJSON is a format for encoding a variety of geographic data structures. For extensive documentation and tutorials, refer to the GeoJSON blog.
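As a quick orientation to the format, the sketch below pulls the polygon footprints out of a FeatureCollection using only Python's standard `json` module and computes their bounding box. It assumes each feature's geometry is a `Polygon` whose first ring is the exterior ring, with positions in [longitude, latitude] order as GeoJSON specifies; `parse_footprints` and `bounding_box` are hypothetical helper names, not part of this release.

```python
import json


def parse_footprints(geojson_text):
    """Return the exterior ring of every Polygon feature in a GeoJSON FeatureCollection."""
    collection = json.loads(geojson_text)
    rings = []
    for feature in collection.get("features", []):
        geometry = feature.get("geometry") or {}
        if geometry.get("type") == "Polygon":
            # coordinates[0] is the exterior ring; positions are [longitude, latitude].
            rings.append(geometry["coordinates"][0])
    return rings


def bounding_box(rings):
    """Return (min_lon, min_lat, max_lon, max_lat) over every vertex of every ring."""
    lons = [pt[0] for ring in rings for pt in ring]
    lats = [pt[1] for ring in rings for pt in ring]
    return min(lons), min(lats), max(lons), max(lats)
```

Because the unzipped files are 1.3 to 2.2 GB, loading a whole country with `json.loads` needs a correspondingly large amount of memory; treat this as an illustration for small extracts.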

Why is the data being released?

Microsoft has a continued interest in supporting a thriving OpenStreetMap ecosystem.

Should we import the data into OpenStreetMap?

Maybe. Never overwrite the hard work of other contributors or blindly import data into OSM without first checking the local quality. While our metrics show that this data meets or exceeds the quality of hand-drawn building footprints, the data does vary in quality from place to place, between rural and urban, mountains and plains, and so on. Inspect quality locally and discuss an import plan with the community. Always follow the OSM import community guidelines.

Will the data be used or made available in the larger OpenStreetMap ecosystem?

Yes. The Microsoft Open Buildings dataset is currently used in ml-enabler for task creation. You can try it out in the AI-assisted Tasking Manager. Facebook has also integrated the dataset into the RapiD editor; try it out here: RapiD.

What is the creation process for this data?

The building extraction is done in two stages:

  1. Semantic Segmentation – Recognizing building pixels on the satellite image using DNNs
  2. Polygonization – Converting building pixel blobs into polygons
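The hand-off between the two stages can be pictured as follows: the network produces a per-pixel building score, the scores are thresholded into a building mask, and adjacent building pixels are grouped into the blobs that polygonization then turns into outlines. This is a minimal sketch under stated assumptions; the 0.5 threshold and 4-connectivity are illustrative choices, not documented Bing Maps parameters, and `building_mask` and `label_blobs` are hypothetical names.

```python
from collections import deque


def building_mask(probabilities, threshold=0.5):
    """Threshold a per-pixel building score map (list of rows) into a boolean mask."""
    return [[score >= threshold for score in row] for row in probabilities]


def label_blobs(mask):
    """Group True pixels into 4-connected blobs; each blob is a list of (row, col) pairs."""
    height = len(mask)
    width = len(mask[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    blobs = []
    for r in range(height):
        for c in range(width):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill starting from an unvisited building pixel.
            seen[r][c] = True
            queue = deque([(r, c)])
            blob = []
            while queue:
                y, x = queue.popleft()
                blob.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < height and 0 <= nx < width and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            blobs.append(blob)
    return blobs
```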

Stage 1: Semantic Segmentation

DNN architecture

The network backbone is EfficientNet B3, which can be found here. The model is fully convolutional, meaning it can be applied to an image of any size (constrained by GPU memory; 4096x4096 pixels in our case).
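Because the model is fully convolutional, a scene larger than what fits in GPU memory can be processed in windows. The sketch below splits an image into windows of at most 4096x4096 pixels. The 256-pixel overlap between neighboring windows (so that a building cut by one window edge appears whole in another window) is an assumption, not a documented detail of the Bing Maps pipeline, and `tile_windows` is a hypothetical name.

```python
def tile_windows(width, height, tile=4096, overlap=256):
    """Split a width x height image into (x0, y0, x1, y1) windows of at most tile x tile pixels."""
    if tile <= overlap:
        raise ValueError("tile must be larger than overlap")
    stride = tile - overlap

    def starts(size):
        # Window start offsets along one axis; the last window is flush with the image edge.
        if size <= tile:
            return [0]
        positions = list(range(0, size - tile, stride))
        positions.append(size - tile)
        return positions

    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in starts(height)
        for x in starts(width)
    ]
```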

Training details

The training set consists of 1.2 million labeled buildings. The data is diverse in terms of geolocation, urbanization and underlying imagery, so that the corpus is representative. We also used a mixture of high- and low-quality labels. Images in the set have a resolution of 30 cm/pixel.
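At 30 cm/pixel, one pixel covers 0.3 m × 0.3 m = 0.09 m² on the ground, a 4096-pixel-wide input spans 4096 × 0.3 m = 1,228.8 m, and a 10 m building wall spans about 33 pixels. The helpers below make these conversions explicit; they assume a uniform ground sampling distance and ignore off-nadir distortion, and their names are hypothetical.

```python
# Ground sampling distance of the training imagery: 30 cm per pixel.
GSD_M = 0.30


def pixels_to_square_meters(pixel_count, gsd_m=GSD_M):
    """Ground area covered by `pixel_count` pixels, assuming square pixels of side gsd_m."""
    return pixel_count * gsd_m * gsd_m


def meters_to_pixels(length_m, gsd_m=GSD_M):
    """Number of pixels spanned by a ground distance of `length_m` meters."""
    return length_m / gsd_m


def tile_extent_meters(tile_pixels, gsd_m=GSD_M):
    """Ground width, in meters, of a square tile `tile_pixels` wide."""
    return tile_pixels * gsd_m
```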

Pixel Metrics

These pixel-based, intermediate-stage metrics are what we use to track DNN model improvements. Pixel precision is 86.8% and pixel recall is 81.8%.
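Pixel precision is the fraction of predicted building pixels that are labeled as building, and pixel recall is the fraction of labeled building pixels that the model predicts. A minimal sketch of that computation over two same-sized boolean masks follows; it illustrates the definitions and is not Microsoft's evaluation code.

```python
def pixel_precision_recall(predicted, labels):
    """Pixel-level (precision, recall) for two same-shaped boolean masks given as lists of rows."""
    tp = fp = fn = 0
    for pred_row, label_row in zip(predicted, labels):
        for p, t in zip(pred_row, label_row):
            if p and t:
                tp += 1  # building predicted and labeled
            elif p:
                fp += 1  # predicted building, labeled background
            elif t:
                fn += 1  # labeled building the model missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```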

Stage 2: Polygonization

Method description

We developed a method that approximates the predicted pixels with polygons, making decisions based on the whole prediction feature space. This is very different from standard approaches such as the Douglas-Peucker algorithm, which are greedy in nature. The method tries to impose some a priori building properties, which are currently defined manually and tuned automatically.
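For comparison, the standard baseline named above, Douglas-Peucker simplification, can be sketched as follows. This is the textbook algorithm, not the Bing Maps polygonization method, which is not published here. It works on an open polyline: if the vertex farthest from the chord between the endpoints lies more than `epsilon` away, it is kept and both halves are simplified recursively; otherwise only the endpoints survive. A closed building ring would first be split into two polylines.

```python
import math


def _point_segment_distance(p, a, b):
    """Distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment, clamping to its endpoints.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def douglas_peucker(points, epsilon):
    """Simplify an open polyline given as a list of (x, y) vertices."""
    if len(points) < 3:
        return list(points)
    # Find the interior vertex farthest from the chord joining the endpoints.
    index, max_dist = 0, -1.0
    for i in range(1, len(points) - 1):
        d = _point_segment_distance(points[i], points[0], points[-1])
        if d > max_dist:
            index, max_dist = i, d
    if max_dist <= epsilon:
        return [points[0], points[-1]]
    # Keep the farthest vertex and recurse; drop the split vertex duplicated by both halves.
    left = douglas_peucker(points[: index + 1], epsilon)
    right = douglas_peucker(points[index:], epsilon)
    return left[:-1] + right
```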

Polygon Metrics

Building matching metrics:

Metric       Value
Precision    94.5%
Recall       61.8%

The false positive ratio across the board is 1.6%.
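This section does not state how predicted and labeled buildings are paired for the matching metrics. One common approach, assumed here, is one-to-one greedy matching by descending IoU above a minimum IoU (0.5 in this sketch); building precision is then matched predictions over all predictions, and recall is matched labels over all labels. `match_buildings` and `matching_precision_recall` are hypothetical names, and the threshold is illustrative.

```python
def match_buildings(iou, threshold=0.5):
    """Greedy one-to-one matching of predictions (rows) to labels (columns) by descending IoU.

    iou[i][j] is the IoU between prediction i and label j; pairs below `threshold`
    never match. Returns the matched (prediction, label) index pairs.
    """
    candidates = sorted(
        ((value, i, j) for i, row in enumerate(iou) for j, value in enumerate(row) if value >= threshold),
        reverse=True,
    )
    used_pred, used_label, matches = set(), set(), []
    for _, i, j in candidates:
        if i not in used_pred and j not in used_label:
            used_pred.add(i)
            used_label.add(j)
            matches.append((i, j))
    return matches


def matching_precision_recall(iou, num_labels, threshold=0.5):
    """Building-level (precision, recall) from the greedy matches."""
    matches = match_buildings(iou, threshold)
    num_predictions = len(iou)
    precision = len(matches) / num_predictions if num_predictions else 0.0
    recall = len(matches) / num_labels if num_labels else 0.0
    return precision, recall
```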

We track various metrics to measure the quality of the output:

  1. Intersection over Union – This is the standard metric measuring the overlap quality against the labels
  2. Shape distance – With this metric we measure the polygon outline similarity
  3. Dominant angle rotation error – This measures the polygon rotation deviation
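Intersection over Union divides the area shared by a predicted polygon and its label by the area of their union. The sketch below computes it for two convex polygons by clipping one against the other with the Sutherland-Hodgman algorithm and measuring areas with the shoelace formula. Convexity is an assumption of this sketch (Sutherland-Hodgman needs a convex clip polygon); concave footprints would call for a general polygon-clipping library. The function names are hypothetical.

```python
def polygon_area(poly):
    """Signed shoelace area; positive for counter-clockwise vertex order."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(poly, poly[1:] + poly[:1]):
        total += x1 * y2 - x2 * y1
    return total / 2.0


def _side(a, b, p):
    """Positive when p lies to the left of (inside) the directed edge a->b."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def _clip(subject, clip):
    """Sutherland-Hodgman: clip `subject` against the convex, counter-clockwise `clip`."""
    output = list(subject)
    for a, b in zip(clip, clip[1:] + clip[:1]):
        if not output:
            break
        inputs, output = output, []
        for cur, nxt in zip(inputs, inputs[1:] + inputs[:1]):
            s_cur, s_nxt = _side(a, b, cur), _side(a, b, nxt)
            if s_cur >= 0:
                output.append(cur)
            if (s_cur >= 0) != (s_nxt >= 0):
                # The edge cur->nxt crosses the clip line; add the crossing point.
                t = s_cur / (s_cur - s_nxt)
                output.append((cur[0] + t * (nxt[0] - cur[0]), cur[1] + t * (nxt[1] - cur[1])))
    return output


def convex_iou(a, b):
    """IoU of two convex polygons given as (x, y) vertex lists without a repeated closing vertex."""
    if polygon_area(a) < 0:
        a = a[::-1]
    if polygon_area(b) < 0:
        b = b[::-1]
    inter = _clip(a, b)
    inter_area = abs(polygon_area(inter)) if len(inter) >= 3 else 0.0
    union = polygon_area(a) + polygon_area(b) - inter_area
    return inter_area / union if union > 0 else 0.0
```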

The evaluation set contains 18.5k buildings. The metrics on the set are:

  • IoU: 0.68
  • Shape distance: 0.39
  • Average rotation error: 4.1 degrees
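This section does not define how the dominant angle is computed. One plausible definition, assumed here, takes the orientation of a polygon's longest edge folded into [0°, 90°), since most building outlines are rectilinear; the rotation error is then the smallest angular difference between the predicted and labeled dominant angles, which lies between 0° and 45°. The function names are hypothetical.

```python
import math


def dominant_angle(poly):
    """Orientation in degrees, folded into [0, 90), of the polygon's longest edge."""
    best_len, best_angle = -1.0, 0.0
    for (x1, y1), (x2, y2) in zip(poly, poly[1:] + poly[:1]):
        length = math.hypot(x2 - x1, y2 - y1)
        if length > best_len:
            best_len = length
            # Folding modulo 90 treats perpendicular walls as the same orientation.
            best_angle = math.degrees(math.atan2(y2 - y1, x2 - x1)) % 90.0
    return best_angle


def rotation_error(predicted, label):
    """Smallest difference in degrees (0 to 45) between the two dominant angles."""
    diff = abs(dominant_angle(predicted) - dominant_angle(label)) % 90.0
    return min(diff, 90.0 - diff)
```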

What is the vintage of this data?

The vintage of the footprints depends on the vintage of the underlying imagery. Because Bing imagery is a composite of multiple sources, it is difficult to know the exact dates for individual pieces of data.

How good is the data?

Our metrics show that in the vast majority of cases the quality is at least as good as that of hand-digitized buildings in OpenStreetMap. It is not perfect, particularly in dense urban areas, but it provides good recall in rural areas. See below for metrics by area type:

What is the coordinate reference system?

EPSG:4326 (WGS 84)
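EPSG:4326 coordinates are longitude and latitude in degrees, so footprint areas are not directly in square meters. For building-sized polygons, a local equirectangular projection around the polygon followed by the shoelace formula gives a reasonable approximation. The sketch assumes a spherical Earth with a mean radius of 6,371,008.8 m and [longitude, latitude] vertex order as in GeoJSON; `approx_area_m2` is a hypothetical name, and precise work should use a proper equal-area projection instead.

```python
import math

# Mean Earth radius in meters; treating the Earth as a sphere is an assumption of this sketch.
EARTH_RADIUS_M = 6_371_008.8


def approx_area_m2(ring):
    """Approximate area in square meters of a small polygon given as [longitude, latitude] pairs."""
    lon_ref, lat_ref = ring[0][0], ring[0][1]
    mean_lat = math.radians(sum(pt[1] for pt in ring) / len(ring))
    # Local equirectangular projection: offsets from the first vertex, in meters.
    xy = [
        (
            EARTH_RADIUS_M * math.cos(mean_lat) * math.radians(pt[0] - lon_ref),
            EARTH_RADIUS_M * math.radians(pt[1] - lat_ref),
        )
        for pt in ring
    ]
    # Shoelace formula; a repeated closing vertex only adds a zero-length edge.
    twice_area = sum(x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in zip(xy, xy[1:] + xy[:1]))
    return abs(twice_area) / 2.0
```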

Will there be more data coming for other geographies?

Maybe. This is a work in progress.


Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.microsoft.com.

When you submit a pull request, a CLA-bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact [email protected] with any additional questions or comments.

Legal Notices

Microsoft, Windows, Microsoft Azure and/or other Microsoft products and services referenced in the documentation may be either trademarks or registered trademarks of Microsoft in the United States and/or other countries. The licenses for this project do not grant you rights to use any Microsoft names, logos, or trademarks. Microsoft's general trademark guidelines can be found here.

Privacy information can be found here.

Microsoft and any contributors reserve all other rights, whether under their respective copyrights, patents, or trademarks, whether by implication, estoppel or otherwise.


Issues

Polygonization

Is there a specific description or code for Polygonization?

Tanzania geometry character strings are limited to 2^31-1 bytes

Hi, I'm getting the error "character strings are limited to 2^31-1 bytes" in R when importing the Tanzania data set.

From what I can determine, it seems to be one of the geometries that is very long and complex.

Uganda imports correctly with no issues. Thanks for the amazing data set.

How to access imagery

As far as I can tell, the GeoJSON contains FeatureCollections of polygons but no URLs for the source imagery, and no way to work out exactly where the polygons are located.

How can I access the geotiff imagery that the polygons were drawn on?

Sharing the original training data

This is great work! However, since the building polygons are based on the model's predictions, I am wondering whether it is possible to share the original training data for the model, so that others can develop models to scale mapping beyond Uganda and Tanzania, or increase mapping accuracy within these two countries with corrected labels.

Thank you!
