
geopython / pycsw


pycsw is an OGC CSW server implementation written in Python. pycsw fully implements the OpenGIS Catalogue Service Implementation Specification [Catalogue Service for the Web]. Initial development started in 2010 (more formally announced in 2011). The project is certified OGC Compliant, and is an OGC Reference Implementation. pycsw allows for the publishing and discovery of geospatial metadata via numerous APIs (CSW 2/CSW 3, OpenSearch, OAI-PMH, SRU). Existing repositories of geospatial metadata can also be exposed, providing a standards-based metadata and catalogue component of spatial data infrastructures. pycsw is Open Source, released under an MIT license, and runs on all major platforms (Windows, Linux, Mac OS X). Please read the docs at https://pycsw.org/docs for more information.

Home Page: https://pycsw.org

License: MIT License

Python 96.01% Shell 0.01% Dockerfile 0.29% Makefile 0.07% HTML 3.58% XSLT 0.04%
ogc csw metadata geospatial

pycsw's Introduction

pycsw


pycsw is an OGC API - Records and CSW server implementation written in Python.

pycsw fully implements the OGC API - Records (OARec) standard and the OpenGIS Catalogue Service Implementation Specification (Catalogue Service for the Web). Initial development started in 2010 (more formally announced in 2011). The project is certified OGC Compliant, and is an OGC Reference Implementation. Since 2015, pycsw has been an official OSGeo Project.

pycsw allows for the publishing and discovery of geospatial metadata via numerous APIs (OGC API - Records, CSW 2/CSW 3, OpenSearch, OAI-PMH, SRU). Existing repositories of geospatial metadata can also be exposed, providing a standards-based metadata and catalogue component of spatial data infrastructures.

pycsw is Open Source, released under an MIT license, and runs on all major platforms (Windows, Linux, Mac OS X).

Please read the docs at https://pycsw.org/docs for more information.

pycsw's People

Contributors

ahinz, ahmdthr, amercader, bukun, epifanio, etj, fazledyn-or, gsaaportal, ingenieroariel, johanvdw, kalxas, kindly, koalageo, kwilcox, mandyellow, menegon, milokmet, minhd, netanelc, ocefpaf, pvgenuchten, rcoup, ricardogsilva, rouault, sebastic, simod, sriranganathan, tomkralidis, vjf, volter


pycsw's Issues

fix metadata output when elementsetname=full

When {{{elementsetname=full}}}, the codebase always returns the contents of the {{{xml}}} column verbatim. When the metadata is in a given format, and requested in a different {{{outputSchema}}}, the native format is returned.

Fix each supported {{{outputSchema}}}, ensure that {{{full}}} metadata is as per the {{{outputSchema}}} requested.

support Django orm

Add the ability for pycsw to operate against repositories using the Django ORM.

This will require an abstraction layer so as to support both Django and SQLAlchemy (which is the default).

support links in metadata model

Support online resources in data model, allowing for multiplicity, where link has the following properties: url, protocol, name, description.
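A minimal sketch of what such a link model could look like, assuming plain dataclasses. The `Link` and `Record` names and the example URLs are hypothetical and not part of pycsw; only the four properties come from the description above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Link:
    """One online resource attached to a metadata record (hypothetical)."""
    url: str
    protocol: Optional[str] = None
    name: Optional[str] = None
    description: Optional[str] = None


@dataclass
class Record:
    """Hypothetical record holding zero or more links (multiplicity)."""
    identifier: str
    links: List[Link] = field(default_factory=list)


record = Record(identifier="rec-1")
record.links.append(Link(url="http://example.org/wms", protocol="OGC:WMS",
                         name="roads", description="Road network layer"))
# Only the URL is required; the other properties are optional.
record.links.append(Link(url="http://example.org/data.zip"))
```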

enhance csw:ResponseHandler support

!GetRecords and Harvest allow a {{{csw:ResponseHandler}}} to be specified (subclause 10.8.4.14) for asynchronous processing.

Enhance current support to cover both operations (asynchronously) as well as additional protocols.

improve performance

The current approach of querying via !GetRecords results in performance problems when metadata repositories have > 8000 records. Factors:

  • data model: we store the full XML document and query via embedded Python lxml xpath functions
  • sql fetching: we currently query and return all applicable records
  • sqlalchemy: creates some overhead cost

After many rounds of testing, it was found that deconstructing the data model (parse out XML and store in db columns) to eliminate the (expensive) xpath queries against the db in realtime and implementing paging makes performance results more than acceptable.
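The idea can be sketched as follows, assuming a hypothetical `records` table in which the queryables have already been parsed out into columns. This uses the standard library `sqlite3` module directly rather than SQLAlchemy, purely to keep the example self-contained; the table layout and `get_page` helper are illustrative.

```python
import sqlite3

# Hypothetical schema: queryables parsed out of the XML into plain columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE records (identifier TEXT PRIMARY KEY, title TEXT, xml TEXT)"
)
conn.executemany(
    "INSERT INTO records VALUES (?, ?, ?)",
    [(f"id-{i:05d}", f"Title {i}", f"<Record>{i}</Record>")
     for i in range(10000)],
)


def get_page(conn, title_like, start_position=1, max_records=10):
    """Return (total matched, one page of rows); start_position is 1-based."""
    matched = conn.execute(
        "SELECT COUNT(*) FROM records WHERE title LIKE ?", (title_like,)
    ).fetchone()[0]
    # Only the requested page is fetched, not every applicable record.
    rows = conn.execute(
        "SELECT identifier, title FROM records WHERE title LIKE ? "
        "ORDER BY identifier LIMIT ? OFFSET ?",
        (title_like, max_records, start_position - 1),
    ).fetchall()
    return matched, rows


matched, page = get_page(conn, "Title 1%", start_position=11, max_records=5)
```

The filter runs against an ordinary column, so no XPath evaluation is needed per row, and `LIMIT`/`OFFSET` keep the result set bounded by `maxRecords`.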

migrate repository layout to support pure XML model querying

Currently, pycsw splits metadata XML (the core queryables) into the data model.

The attached patch (so I don't lose the work) enables the use of pycsw with a simpler repository. Notes:

  • repository model much simpler to manage ({{{identifier}}} and {{{bbox}}} still stored in separate columns, the rest is full XML in text field)
  • id and spatial queries are done against column data (still) and SQLite3 functions mapping to Python methods
  • anytext type queries are done in the same manner
  • all other logical queries are done with XPath (SQLite3 functions mapping to Python method {{{util.query_xpath}}})
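As a rough illustration of the "SQLite3 functions mapping to Python methods" approach, the sketch below registers a Python function with SQLite and calls it from SQL. It uses the standard library `xml.etree.ElementTree` instead of lxml so that it runs without extra dependencies; the `query_xpath` signature, the namespace map and the table layout are assumptions, not the patch's actual code.

```python
import sqlite3
import xml.etree.ElementTree as ET

NAMESPACES = {"dc": "http://purl.org/dc/elements/1.1/"}


def query_xpath(xml_text, xpath):
    """Return the text of the first node matching xpath, or None."""
    try:
        node = ET.fromstring(xml_text).find(xpath, NAMESPACES)
    except (ET.ParseError, TypeError):
        return None
    return None if node is None else node.text


conn = sqlite3.connect(":memory:")
# Expose the Python function to SQL under the same name, taking 2 arguments.
conn.create_function("query_xpath", 2, query_xpath)
conn.execute("CREATE TABLE records (identifier TEXT, bbox TEXT, xml TEXT)")
conn.executemany("INSERT INTO records VALUES (?, ?, ?)", [
    ("rec-1", None,
     '<Record xmlns:dc="http://purl.org/dc/elements/1.1/">'
     "<dc:title>Roads</dc:title></Record>"),
    ("rec-2", None,
     '<Record xmlns:dc="http://purl.org/dc/elements/1.1/">'
     "<dc:title>Rivers</dc:title></Record>"),
])

# A queryable is then just a name bound to an XPath expression.
rows = conn.execute(
    "SELECT identifier FROM records WHERE query_xpath(xml, ?) = ?",
    ("dc:title", "Roads"),
).fetchall()
```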

Benefits

  • less parsing on metadata loading
  • one repository is always loaded/queried
  • less configuration for the catalog administrator
  • less code handling of queryables (no having to keep track of db column names or object attribute names)
  • dead easy to define queryables (i.e. ns:queryablename = XPath)
  • handles multiple / repeated entities
  • !GetRecords queries against multiple typenames are now supported as a result
  • easier result set handling
  • streamlined db setup / metadata loading
  • this sets up for easier CSW-T functionality

The patch passes OGC CITE 103/103. APISO has not been updated yet. This is a break from the previous approach and a significant change.

fix GetRecords request handling with invalid SortBy property name

In !GetRecords requests, when an invalid !SortBy !PropertyName is specified, the CSW should return an !ExceptionReport. Currently, it returns an HTTP 500 instead. The following request reproduces the error:

{{{

<csw:GetRecords xmlns:csw="http://www.opengis.net/cat/csw/2.0.2" xmlns:ogc="http://www.opengis.net/ogc" service="CSW" version="2.0.2" resultType="results" startPosition="1" maxRecords="5" outputFormat="application/xml" outputSchema="http://www.opengis.net/cat/csw/2.0.2" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.opengis.net/cat/csw/2.0.2 http://schemas.opengis.net/csw/2.0.2/CSW-discovery.xsd" xmlns:gml="http://www.opengis.net/gml">
  <csw:Query typeNames="csw:Record">
    <csw:ElementSetName>brief</csw:ElementSetName>
    <csw:Constraint version="1.1.0">
      <ogc:Filter>
        <ogc:BBOX>
          <ogc:PropertyName>ows:BoundingBox</ogc:PropertyName>
          <gml:Envelope>
            <gml:lowerCorner>47 -5</gml:lowerCorner>
            <gml:upperCorner>55 20</gml:upperCorner>
          </gml:Envelope>
        </ogc:BBOX>
      </ogc:Filter>
    </csw:Constraint>
    <ogc:SortBy>
      <ogc:SortProperty>
        <ogc:PropertyName>dc:foo</ogc:PropertyName>
        <ogc:SortOrder>DESC</ogc:SortOrder>
      </ogc:SortProperty>
    </ogc:SortBy>
  </csw:Query>
</csw:GetRecords>
}}}
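One possible shape for the fix, sketched under assumptions: validate the requested sort property against the known queryables before building the query, and return an exception document rather than letting the error surface as a 500. The `SORTABLE` set, the helper names, the `InvalidParameterValue` code and the `version` attribute value are illustrative choices, not taken from the codebase.

```python
import xml.etree.ElementTree as ET

OWS_NS = "http://www.opengis.net/ows"

# Hypothetical set of sortable queryables; the real list depends on the
# repository and the loaded profiles.
SORTABLE = {"dc:title", "dc:identifier", "dc:date"}


def exception_report(code, locator, text):
    """Build a minimal ows:ExceptionReport document as a string."""
    ET.register_namespace("ows", OWS_NS)
    # The version value is an assumption made for this sketch.
    root = ET.Element(f"{{{OWS_NS}}}ExceptionReport", version="1.2.0")
    exc = ET.SubElement(root, f"{{{OWS_NS}}}Exception",
                        exceptionCode=code, locator=locator)
    ET.SubElement(exc, f"{{{OWS_NS}}}ExceptionText").text = text
    return ET.tostring(root, encoding="unicode")


def check_sortby(property_name):
    """Return None if the property is sortable, else an ExceptionReport."""
    if property_name in SORTABLE:
        return None
    return exception_report(
        "InvalidParameterValue", "SortBy",
        f"Invalid ogc:SortBy PropertyName: {property_name}",
    )


report = check_sortby("dc:foo")
```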

add support for OGC WMS

per subject. Implementation notes:

  • via CSW-T harvest operation
  • use OWSLib for service parsing
  • break out into separate metadata documents:
    • 1 document for service offering
    • 1 document for each Layer
  • keep in mind support for future OGC services (WFS, WCS, SOS)

repurpose server/profiles

Move {{{server/profiles}}} to {{{server/plugins/profiles}}} so the plugin architecture can be abstracted to take on more than just CSW profiles.

add support for setup.py type install

Currently, the codebase is a simple download and configure approach. Add support for installation via setup.py. This will add support for properly referencing dependencies, and generating various templates/etc., at install time.

fix profile loading on Windows

For Windows installs, when server.profiles is set, running any CSW request results in the following:

{{{
Traceback (most recent call last):
  File "C:/Program Files/Apache Software Foundation/Apache2.2/htdocs/pycsw/csw.py", line 37, in <module>
    CSW = server.Csw('./default.cfg')
  File "C:\Program Files\Apache Software Foundation\Apache2.2\htdocs\pycsw\server\server.py", line 110, in __init__
    self.config.get('server', 'profiles'))
  File "C:\Program Files\Apache Software Foundation\Apache2.2\htdocs\pycsw\server\profile.py", line 137, in load_profiles
    look_for_subclass(modulename)
  File "C:\Program Files\Apache Software Foundation\Apache2.2\htdocs\pycsw\server\profile.py", line 110, in look_for_subclass
    module = __import__(modulename)
ImportError: Import by filename is not supported.
}}}

The following patch solves the issue:
{{{

Index: profile.py
===================================================================
--- profile.py (revision 310)
+++ profile.py (working copy)
@@ -103,10 +103,12 @@

 def load_profiles(path, cls, profiles):
     ''' load CSW profiles, return dict by class name '''

+    import imp
+
     def look_for_subclass(modulename):
-        module = __import__(modulename)
+        module = imp.load_source(modulename, path)

         dmod = module.__dict__
         for modname in modulename.split('.')[1:]:
             dmod = dmod[modname].__dict__
}}}

Will apply to trunk once tested on Linux.
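Note that the `imp` module used in the patch was deprecated in Python 3 and removed in Python 3.12. On current Python versions, the same load-by-path behaviour can be sketched with `importlib`. The `load_module_from_path` helper and the throwaway profile file below are hypothetical, for illustration only.

```python
import importlib.util
import os
import tempfile


def load_module_from_path(modulename, path):
    """Load a Python source file by path, avoiding import-by-filename."""
    spec = importlib.util.spec_from_file_location(modulename, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


# Demonstration with a throwaway profile file (hypothetical content).
with tempfile.TemporaryDirectory() as tmpdir:
    profile_path = os.path.join(tmpdir, "apiso.py")
    with open(profile_path, "w") as fh:
        fh.write("class APISO(object):\n    name = 'apiso'\n")
    module = load_module_from_path("apiso", profile_path)
    profile_name = module.APISO.name
```

Because the file is located by path rather than by dotted module name, this behaves the same way with Windows and POSIX path separators.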

support geometry transformation

!GetRecords requests with spatial filters allow for a given geometry to be passed with {{{@srsName}}}, a projection identifier.

The pycsw native geometry model (WKT) is always 4326, with an xy axis order.

Support the ability to convert from a non-4326 geometry for accurate spatial filtering. This takes advantage of the changes implemented as per #35.

add testing framework

pycsw currently [wiki:OGCCITECompliance utilizes] the OGC CITE tests as a means of testing functionality.

There are features and use cases which OGC CITE does not test, but are valuable to pycsw testing (more exhaustive HTTP GET requests, error handling, etc.).

The proposed pycsw testsuite will be developed in the spirit of the [http://trac.osgeo.org/mapserver/browser/trunk/msautotest/wxs MapServer msautotest] tests. The test suite should allow for testing various configurations and profiles. Test data should be part of the testsuite, as well as various config files ({{{default.cfg}}}).

The test suite should be integrated with the {{{tester/}}} application, so that the same tests can be shared between the tester and local testing.

repurpose axis order for geometry

Process axis order accordingly, when:

  • parsing metadata for insert
  • storing WKT in the repository
  • processing !GetRecords requests with spatial predicates

This will include intelligence for CRS handling and axis order.

support abstract core model

Currently the codebase binds to the underlying database by way of configuration in {{{server/config.py}}}, and each of the profile queryable models, by way of mapping queryables (i.e. {{{apiso:Title}}}) to database column names. There is then a reverse mapping to {{{csw:Record}}}, of only queryable names, so as to transform to/from metadata formats.

This presents an issue when trying to set up pycsw against an existing, different metadata model. In this case, the user must override the mappings in each model to the underlying db columns.

Implementing a core abstract model, where all other models map to, would alleviate this issue such that users can override the mappings only in the core abstract model, either in the codebase, or by allowing a mapping file specified in, say {{{repository.model=path/to/model.py}}}, or configuration file {{{repository.model=path/to/model.cfg}}}, or inline in the main configuration.
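A sketch of the two-level mapping this implies, assuming plain dictionaries. The `pycsw:*` core names, the column names and the `resolve_column` helper are hypothetical placeholders; only `apiso:Title` and `csw:Record` appear in the description above.

```python
# Hypothetical core model: abstract queryable name -> database column.
CORE_MODEL = {
    "pycsw:Identifier": "identifier",
    "pycsw:Title": "title",
    "pycsw:Abstract": "abstract",
}

# Each profile maps its own queryables onto the core model,
# never directly onto database columns.
PROFILE_MODELS = {
    "csw:Record": {
        "dc:identifier": "pycsw:Identifier",
        "dc:title": "pycsw:Title",
    },
    "apiso": {
        "apiso:Identifier": "pycsw:Identifier",
        "apiso:Title": "pycsw:Title",
    },
}


def resolve_column(profile, queryable, core_model=CORE_MODEL):
    """Resolve a profile queryable to a db column via the core model."""
    return core_model[PROFILE_MODELS[profile][queryable]]


# A site with a different schema overrides only the core model;
# the profile mappings stay untouched.
custom_core = dict(CORE_MODEL, **{"pycsw:Title": "md_title"})
```

With this layout, pointing pycsw at an existing database means editing one mapping instead of one mapping per profile.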

add support for ogc:Function

Add the ability for OGC filters to accept ogc:Function elements. This includes:

  • advertising (in Capabilities XML)
  • processing ogc:Function and applying to underlying queries

FYI non-aggregate queries only (aggregate functions are not supported by OGC Filter).

safeguard ConfigParser config objects

When [metadata:main] options are not set/commented out, pycsw returns 500.

Implement an approach to safeguard these outputs by setting default values.
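One way to do this, sketched with the Python 3 `configparser` module: load a dictionary of defaults first, then let the user's file override it, and use `fallback` for anything still missing. The option names and default values below are illustrative only.

```python
import configparser

# Hypothetical defaults; option names and values are illustrative only.
DEFAULTS = {
    "metadata:main": {
        "identification_title": "pycsw catalogue",
        "identification_abstract": "",
    }
}


def load_config(text):
    """Parse configuration text on top of the built-in defaults."""
    config = configparser.ConfigParser()
    config.read_dict(DEFAULTS)   # defaults first
    config.read_string(text)     # user values override them
    return config


# The user has commented out the option; the default still applies.
config = load_config("[metadata:main]\n# identification_title=My catalogue\n")
title = config.get("metadata:main", "identification_title")

# Options with no default at all can be guarded with a fallback value.
missing = config.get("metadata:main", "provider_name", fallback="")
```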

fix support for specifying alternate configuration from URL

Currently, this is accomplished by the client passing an HTTP header ({{{PYCSW_CONFIG}}}), and is used for the tester.

Passing the configuration as part of the base URL will allow for pycsw to advertise multiple configurations without the client having prior knowledge of {{{PYCSW_CONFIG}}} values/configurations.

This will also make the tester framework more straightforward.

fix resulttype handling

!GetRecords resulttype=hits should return an empty result set, with an empty {{{csw:SearchResults}}} element/attributes.
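A sketch of the intended behaviour, assuming that for `hits` the element carries the match count and a returned count of zero but no child records. The `search_results` helper and this reading of "empty" are assumptions.

```python
import xml.etree.ElementTree as ET

CSW_NS = "http://www.opengis.net/cat/csw/2.0.2"


def search_results(result_type, matched, records):
    """Build csw:SearchResults; for 'hits' no records are attached."""
    ET.register_namespace("csw", CSW_NS)
    node = ET.Element(f"{{{CSW_NS}}}SearchResults")
    node.set("numberOfRecordsMatched", str(matched))
    if result_type == "hits":
        node.set("numberOfRecordsReturned", "0")
        return node
    node.set("numberOfRecordsReturned", str(len(records)))
    node.extend(records)
    return node


hits = search_results("hits", 42, [ET.Element("Record")])
results = search_results("results", 42, [ET.Element("Record")])
```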

use non trunk version of owslib

For releases, a stable version of owslib should be used. This will prevent older releases from pulling owslib svn trunk, which may cause errors.

fix geometry handling for older Shapely builds

{{{geometry.exterior.bounds}}} fails for the test case of a POLYGON with 0 area (i.e. {{{POLYGON ((10 10, 10 10, 10 10, 10 10, 10 10))}}}). !GetRecords handling fails when trying to write out geometry constructs of records.

add support for WSGI

Adding functionality for [http://en.wikipedia.org/wiki/Web_Server_Gateway_Interface WSGI] will improve performance, as well as integration with other applications.
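A minimal sketch of what a WSGI entry point looks like, assuming a hypothetical `dispatch` function standing in for the actual CSW request handling. Only the `application(environ, start_response)` signature is fixed by the WSGI specification.

```python
from urllib.parse import parse_qs


def dispatch(params):
    """Hypothetical stand-in for the CSW request handler."""
    request = params.get("request", ["GetCapabilities"])[0]
    return f'<Response request="{request}"/>'


def application(environ, start_response):
    """WSGI entry point: parse the query string, return an XML body."""
    params = parse_qs(environ.get("QUERY_STRING", ""))
    body = dispatch(params).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "application/xml"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

For local testing, such an application can be served with the standard library, e.g. `wsgiref.simple_server.make_server("", 8000, application).serve_forever()`. Unlike CGI, the process stays resident between requests, which is where the performance gain comes from.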

internalize core queryable configuration settings

Currently, we allow for the user to map column names to queryables in config.

While we originally implemented this with maximum flexibility in mind, given our tie to SQLite3 (registered functions, etc.), we are (at this point) bound to the underlying data model. As well, it is uncommon for a user to have a custom defined SQLite3 database of metadata.

Most CSW implementations provide a core data repository into which users import their metadata, as opposed to allowing further flexibility.

Pulling these configuration options into the code will make configuration much easier for the end user.

add support for additional databases

pycsw is currently bound to SQLite3. Extend it to support other databases (SQLAlchemy gives us this abstraction, but the Python SQLite3 bindings are tied to custom functions).

add support for JSON output

!DescribeRecord, !GetRecords, and !GetRecordById allow for specifying an {{{outputFormat}}} parameter. This should be a valid MIME type.

This is independent of the model of the metadata ({{{outputSchema}}}).

We currently support {{{application/xml}}} as the default {{{outputFormat}}}.

Steps to implement JSON as an {{{outputFormat}}}:

  • advertise {{{application/json}}} for !GetCapabilities responses
    • FYI the codebase already checks for invalid {{{outputFormat}}} request values
  • implement XML to JSON converter, to be called before returning response. Because of the varying metadata models, implement generically (inspired by https://bitbucket.org/smulloni/pesterfish)
  • implement such that additional {{{outputFormat}}} values can be supported if required (such as {{{text/plain}}}, {{{text/html}}})
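The steps above could be sketched as follows. The tag/attributes/text/children layout is a generic structure chosen for illustration (it is not claimed to match pesterfish), and the `FORMATTERS` registry and `render` helper are hypothetical names.

```python
import json
import xml.etree.ElementTree as ET


def element_to_dict(element):
    """Generic conversion: tag, attributes, text and children."""
    node = {"tag": element.tag}
    if element.attrib:
        node["attributes"] = dict(element.attrib)
    text = (element.text or "").strip()
    if text:
        node["text"] = text
    children = [element_to_dict(child) for child in element]
    if children:
        node["children"] = children
    return node


def xml_to_json(xml_text):
    """Convert an XML response to JSON, independent of outputSchema."""
    return json.dumps(element_to_dict(ET.fromstring(xml_text)))


# Registry keyed by MIME type, so further outputFormat values
# (e.g. text/plain, text/html) can be added later.
FORMATTERS = {
    "application/xml": lambda xml_text: xml_text,
    "application/json": xml_to_json,
}


def render(xml_text, output_format="application/xml"):
    """Apply the formatter for the requested outputFormat."""
    return FORMATTERS[output_format](xml_text)


as_json = render("<a x='1'><b>hi</b></a>", "application/json")
```

Because the conversion works on the element tree rather than on a specific metadata model, the same converter applies to any {{{outputSchema}}}.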
