
training-data-stac-spec's Introduction

This repository is archived. For the latest specification of STAC training data, please refer to the label extension in the radiantearth/stac-spec repository.

Training Data STAC Profile

The Training Data STAC Profile is an extension of the SpatioTemporal Asset Catalog core to handle 'training data' assets. Training data is earth observation imagery (and potentially other types of assets) with associated labels that describe what is in the imagery. Labels typically describe a set of geospatial 'things' that are contained in the image, be it forests, roads, ships or Walmarts. The training data specification is agnostic as to what is actually labeled: the structure specified can be used to describe anything that can be identified on the Earth. The primary use of training data is as input into Machine Learning models, training them to automatically recognize or segment the same types of objects in new imagery, but the labels can be used for a variety of other purposes.

Training Data STAC Item

The core of the Training Data profile is a STAC Training Data Item that extends the core STAC definition. Some of the meanings have been tweaked and refined. There are two required assets for a Training Data Item: it should have a source asset that the labels are created from, and a labels asset (listed as 'source-asset' and 'labels' in the assets row of the table below).

| element | type info | name | description |
| --- | --- | --- | --- |
| id | string | Training Data Item ID | The ID of the Item assigned by the Labels Creator |
| geometry | geojson | Geometry | A polygon of the area of the image that labels are valid for, in lat/long (EPSG 4326) |
| datetime | date and time | Date and Time | The searchable date/time of the source asset for the labels, in UTC (formatted per RFC 3339) |
| links | array | Resource Links | Dict of link objects to resources and related URLs (self required) |
| assets | array | Assets | Dict of asset objects that can be downloaded ('source-asset' and 'labels' required, with thumbnail strongly recommended) |
| td:contributor | string | Contributor | The name of the contributor who created the Training Data Item |
| td:method | string | Method | The method of gathering the training data: from an image, from ground truth, etc. (TODO: pick the possible values) |
| td:task_type | string (enum) | Task | One of 'Tile Classification', 'Object Detection', 'Segmentation', or 'Other' |
| provider | string | Provider | (optional) The provider of the labels (image provider is in the input-asset) |
| license | string | Data License | (optional) Item's license name based on SPDX License List or following guidelines for non-SPDX licenses |
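
As a rough illustration of the elements listed above, the following Python sketch builds a hypothetical Training Data Item as a plain dictionary and checks it for the elements and assets the table calls for. Several things here are assumptions made only for illustration and are not defined by the specification: the `validate_td_item` helper, the example IDs and URLs, placing every element at the top level of the item, keying `links` by relation name, and treating every element not marked optional as required.

```python
from typing import Any, Dict, List

# Elements the table does not mark as optional (assumed here to be required).
REQUIRED_ELEMENTS = (
    "id", "geometry", "datetime", "links", "assets",
    "td:contributor", "td:method", "td:task_type",
)
# Asset keys the table lists as required.
REQUIRED_ASSETS = ("source-asset", "labels")
# Allowed values for td:task_type, taken from the table.
TASK_TYPES = ("Tile Classification", "Object Detection", "Segmentation", "Other")


def validate_td_item(item: Dict[str, Any]) -> List[str]:
    """Return a list of problems found; an empty list means none were found."""
    problems = [f"missing element: {name}"
                for name in REQUIRED_ELEMENTS if name not in item]

    # Both 'source-asset' and 'labels' must be present among the assets.
    assets = item.get("assets", {})
    problems += [f"missing asset: {key}"
                 for key in REQUIRED_ASSETS if key not in assets]

    # td:task_type is an enum; only check the value when one is given.
    task = item.get("td:task_type")
    if task is not None and task not in TASK_TYPES:
        problems.append(f"unknown td:task_type: {task}")

    # The table requires a 'self' link (assumed layout: dict keyed by relation).
    if "self" not in item.get("links", {}):
        problems.append("missing link: self")
    return problems


# Hypothetical item; every value below is a placeholder.
example_item = {
    "id": "example-td-item-001",
    "geometry": {
        "type": "Polygon",
        "coordinates": [[[-43.3, -22.9], [-43.2, -22.9], [-43.2, -22.8],
                         [-43.3, -22.8], [-43.3, -22.9]]],
    },
    "datetime": "2018-01-01T00:00:00Z",  # RFC 3339, UTC
    "links": {"self": {"href": "https://example.com/td/example-td-item-001.json"}},
    "assets": {
        "source-asset": {"href": "https://example.com/imagery/scene.tif"},
        "labels": {"href": "https://example.com/labels/scene.geojson"},
    },
    "td:contributor": "Example Contributor",
    "td:method": "from an image",
    "td:task_type": "Object Detection",
}

print(validate_td_item(example_item))
```

Returning a list of problems rather than raising on the first one makes it easier to report everything that is wrong with an item in a single pass.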

training-data-stac-spec's People

Contributors

cholmes, hamedalemo


training-data-stac-spec's Issues

Simplified initial version of Training Data STAC

I had a fruitful discussion with @dlindenbaum that I want to record here as a suggestion and use case description as a starting point for a minimal implementation of TD STAC.

Raster Vision can consume labeled data that has the following items:

  • Image(s) of the scene
  • GeoJSON
  • An optional AOI polygon or set of polygons that describe the area of the image that is fully labeled.

For the "Image(s) of a scene" part, it's good to have the image size scoped such that downloading and loading up the corresponding GeoJSON in QGIS with the Raster Vision Plugin won't put an end to my machine.

Currently the Rio dataset in SpaceNet is set up with a set of images, one large label GeoJSON, and a total AOI. This notebook splits up the labels to fit the above scheme.

Other cities in SpaceNet have COGs, a large GeoJSON in a tarball, and a tar of the images chipped out to various smaller sizes with corresponding label GeoJSONs. To work with those datasets, we'll have to do similar preprocessing, sometimes requiring the user to download all the imagery to get at the labels, which is something we'd like to avoid.

Dave talked about STAC-ifying all of SpaceNet, and TD-STAC-ifying it as well. He also mentioned that a good first step is just un-tarring some of those files and exposing them in a way that they could be read directly off of S3 without requiring a bulk download. I mentioned that, because all I really need is that (Image(s), GeoJSON, Optional[AOI]) triplet, that triplet is really all I would want for now out of a TD STAC of SpaceNet, or anything else for that matter.
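
To make the (Image(s), GeoJSON, Optional[AOI]) triplet concrete, here is a minimal Python sketch of how it might be modeled. The `LabeledScene` class, its field names, the choice to reference each part by URI, and the example S3 paths are all hypothetical; nothing here is defined by Raster Vision or by the TD STAC README.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LabeledScene:
    """One (Image(s), GeoJSON, Optional[AOI]) triplet; all names are hypothetical."""

    image_uris: List[str]          # one or more images of the scene
    labels_uri: str                # GeoJSON file holding the labels
    aoi_uri: Optional[str] = None  # optional polygon(s) marking the fully labeled area

    def __post_init__(self) -> None:
        # A scene without any imagery cannot be used for training.
        if not self.image_uris:
            raise ValueError("a labeled scene needs at least one image")

    def has_aoi(self) -> bool:
        """True when an AOI is supplied to bound the fully labeled area."""
        return self.aoi_uri is not None


# Placeholder paths; the bucket and keys are made up for illustration.
scene = LabeledScene(
    image_uris=["s3://example-bucket/rio/scene-1.tif"],
    labels_uri="s3://example-bucket/rio/scene-1.geojson",
)
print(scene.has_aoi())
```

Referencing each part by URI rather than embedding it keeps a scene small enough to index, and lets a consumer fetch only the files it needs instead of a bulk download.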

I'd like to propose we figure out a simplified version of the TD STAC that just tries to get us to that point - not necessarily containing everything in the table currently in the README, but just getting to an indexable set of labeled data that people putting training data out there can aim at, and consumers like Raster Vision can utilize.

This issue can serve as a place for discussing ideas about this "TD STAC 0.0.0.1" implementation before making the PRs to add info about it to the repository.
