training-data-stac-spec's Introduction

This repository is archived. For the latest specification of STAC training data, please refer to the label extension in the radiantearth/stac-spec repository.

Training Data STAC Profile

The Training Data STAC Profile is an extension of the SpatioTemporal Asset Catalog core to handle 'training data' assets. Training data is earth observation imagery (and potentially other types of assets) with associated labels that describe what is in the imagery. Labels typically describe a set of geospatial 'things' that are contained in the image, be it forests, roads, ships or Walmarts. The training data specification is agnostic as to what is actually labeled: the structure specified can be used to describe anything that can be identified on the earth. The primary use of training data is as input to Machine Learning models, training them to automatically recognize or segment the same types of objects in new imagery, but the labels can be used for a variety of other purposes.

Training Data STAC Item

The core of the Training Data profile is a STAC Training Data Item that extends the core STAC definition. Some of the meanings have been tweaked and refined. A Training Data Item has two required assets: a source asset that the labels are created from, and a labels asset (listed as 'source-asset' and 'labels' in the table below).

element	type info	name	description
id	string	Training Data Item ID	The ID of the Item assigned by the Labels Creator
geometry	geojson	Geometry	A polygon of the area of the image that labels are valid for, in lat/long (EPSG 4326)
datetime	date and time	Date and Time	The searchable date/time of the source asset for the labels, in UTC (Formatted in RFC 3339)
links	array	Resource Links	Dict of link objects to resources and related URLs (self required)
assets	array	Assets	Dict of asset objects that can be downloaded ('source-asset' and 'labels' required, with a thumbnail strongly recommended)
td:contributor	string	Contributor	The name of the contributor who created the Training Data Item
td:method	string	Method	The method of gathering the training data - from an image, from ground truth, etc (TODO: pick the possible values)
td:task_type	string (enum)	Task	One of 'Tile Classification', 'Object Detection', 'Segmentation', or 'Other'
provider	string	Provider (optional)	The provider of the labels (the image provider is in the input-asset)
license	string	Data License (optional)	Item's license name based on SPDX License List or following guidelines for non-SPDX licenses
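
As a rough illustration of the table above, the following sketch builds one Training Data Item as a plain Python dict and checks it against the listed fields. Several things here are assumptions and not part of the profile: the flat layout (all fields at the top level), treating every field not marked optional as required, modelling links and assets as dicts keyed by name, the 'td:method' value (the allowed values are still a TODO in the table), the helper name check_training_data_item, and every ID and example.com URL.

```python
# Fields from the table that are not marked "(optional)". Treating them all
# as required is an assumption of this sketch.
REQUIRED_FIELDS = (
    "id", "geometry", "datetime", "links", "assets",
    "td:contributor", "td:method", "td:task_type",
)
REQUIRED_ASSETS = ("source-asset", "labels")
TASK_TYPES = ("Tile Classification", "Object Detection", "Segmentation", "Other")


def check_training_data_item(item):
    """Return a list of problems; an empty list means the item passed."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in item]
    assets = item.get("assets", {})
    problems += [f"missing asset: {key}" for key in REQUIRED_ASSETS if key not in assets]
    if "links" in item and "self" not in item["links"]:
        problems.append("missing link: self")
    if "td:task_type" in item and item["td:task_type"] not in TASK_TYPES:
        problems.append(f"unknown td:task_type: {item['td:task_type']}")
    return problems


# Hypothetical item; none of these values come from a real catalog.
example_item = {
    "id": "example-item-001",
    "geometry": {
        "type": "Polygon",
        "coordinates": [[[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [0.0, 0.0]]],
    },
    "datetime": "2018-01-01T00:00:00Z",  # RFC 3339, UTC
    "links": {"self": {"href": "https://example.com/items/example-item-001.json"}},
    "assets": {
        "source-asset": {"href": "https://example.com/imagery/scene.tif"},
        "labels": {"href": "https://example.com/labels/scene.geojson"},
        "thumbnail": {"href": "https://example.com/thumbs/scene.png"},
    },
    "td:contributor": "Example Contributor",
    "td:method": "from an image",  # placeholder; allowed values are undecided
    "td:task_type": "Object Detection",
}

print(check_training_data_item(example_item))
```

A check like this only covers the presence of fields and the 'td:task_type' enumeration; it does not validate the geometry or the date format.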

training-data-stac-spec's Issues

Simplified initial version of Training Data STAC

I had a fruitful discussion with @dlindenbaum that I want to record here as a suggestion and use case description as a starting point for a minimal implementation of TD STAC.

Raster Vision can consume labeled data that has the following items:

Image(s) of the scene
GeoJSON
An optional AOI polygon or set of polygons that describe the area of the image that is fully labeled.

For the "Image(s) of a scene" part, it's good to have the image size scoped such that downloading and loading up the corresponding GeoJSON in QGIS with the Raster Vision Plugin won't put an end to my machine.

Currently the Rio dataset in SpaceNet is set up as a set of images, one large label GeoJSON, and a total AOI. This notebook splits up the labels to fit the above scheme.
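
To make the splitting step concrete, here is one possible way to divide a single large label GeoJSON into per-image collections. This is not the notebook's method, which is not shown here; it is a minimal sketch that assumes each image's extent is known as a (minx, miny, maxx, maxy) box in the same coordinate system as the labels, and it assigns a label to every image whose box overlaps the label's bounding box. The function names and image names are hypothetical.

```python
def bbox_of(geometry):
    """Bounding box (minx, miny, maxx, maxy) of a GeoJSON geometry."""
    def walk(coords):
        # A position is a list of numbers; anything else is a nested list.
        if isinstance(coords[0], (int, float)):
            yield coords
        else:
            for part in coords:
                yield from walk(part)

    xs, ys = zip(*((p[0], p[1]) for p in walk(geometry["coordinates"])))
    return min(xs), min(ys), max(xs), max(ys)


def boxes_overlap(a, b):
    """True if two (minx, miny, maxx, maxy) boxes touch or intersect."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def split_labels(label_collection, image_bounds):
    """Map each image name to a FeatureCollection of the labels overlapping it."""
    per_image = {
        name: {"type": "FeatureCollection", "features": []} for name in image_bounds
    }
    for feature in label_collection["features"]:
        feature_box = bbox_of(feature["geometry"])
        for name, image_box in image_bounds.items():
            if boxes_overlap(feature_box, image_box):
                per_image[name]["features"].append(feature)
    return per_image


def square(x, y, size=1.0):
    """Small helper for the demo: a square Polygon feature."""
    ring = [[x, y], [x + size, y], [x + size, y + size], [x, y + size], [x, y]]
    return {"type": "Feature", "properties": {},
            "geometry": {"type": "Polygon", "coordinates": [ring]}}


labels = {"type": "FeatureCollection", "features": [square(1, 1), square(15, 5)]}
bounds = {"img_a": (0, 0, 10, 10), "img_b": (12, 0, 20, 10)}
split = split_labels(labels, bounds)
print({name: len(fc["features"]) for name, fc in split.items()})
```

A bounding-box test is coarse: a label that straddles two images ends up in both collections, and no clipping is done. A real pipeline would more likely use a geometry library for exact intersection.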

Other cities in SpaceNet have COGs, a large GeoJSON in a tarball, and a tar of the images chipped out to various smaller sizes with corresponding label GeoJSONs. For working with those datasets, we'll have to do similar preprocessing, sometimes requiring the user to download all the imagery to get at the labels, which is something we'd like to avoid.

Dave talked about STAC-ifying all of SpaceNet, and TD-STAC-ifying it as well. He also mentioned that a good first step is just un-tarring some of those files and exposing them in a way that they could be read directly off of S3 without requiring a bulk download. I mentioned that, because all I really need is that (Image(s), GeoJSON, Optional[AOI]) triplet, that triplet is really all I would want for now out of a TD STAC of SpaceNet, or anything else for that matter.
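
For discussion purposes, the (Image(s), GeoJSON, Optional[AOI]) triplet could be written down as a small data structure. The sketch below is only an illustration: the class name LabeledScene, its field names, and the s3://example-bucket paths are all hypothetical, and nothing here is taken from Raster Vision's actual API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class LabeledScene:
    """One unit of labeled data: image(s), labels, and an optional AOI."""

    image_uris: List[str]           # one or more images of the scene
    label_uri: str                  # GeoJSON with the labels
    aoi_uri: Optional[str] = None   # GeoJSON polygon(s) of the fully labeled area

    def __post_init__(self):
        if not self.image_uris:
            raise ValueError("a labeled scene needs at least one image")

    def all_uris(self) -> List[str]:
        """Every file a consumer would need to read for this scene."""
        uris = list(self.image_uris) + [self.label_uri]
        if self.aoi_uri is not None:
            uris.append(self.aoi_uri)
        return uris


# Hypothetical locations; individually addressable files, no tarballs.
scene = LabeledScene(
    image_uris=["s3://example-bucket/scene-001/image.tif"],
    label_uri="s3://example-bucket/scene-001/labels.geojson",
)
print(scene.all_uris())
```

Keeping each element as a separate URI reflects the point made above: a consumer can read exactly the files it needs without a bulk download.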

I'd like to propose we figure out a simplified version of the TD STAC that just tries to get us to that point - not necessarily containing everything in the table currently in the README, but just getting to an indexable set of labeled data that people putting training data out there can aim at, and consumers like Raster Vision can utilize.

This issue can serve as a place for discussing ideas about this "TD STAC 0.0.0.1" implementation before making the PRs to add info about it to the repository.
