nmandery / batyr

Microservice for on-demand synchronization of geographical vector datasources to a PostgreSQL/PostGIS database. The service provides an HTTP API for easy integration into other applications.

License: BSD 3-Clause "New" or "Revised" License

Makefile 0.39% C++ 48.27% C 0.16% Shell 0.73% Python 5.84% CSS 13.53% HTML 2.83% JavaScript 1.85% CMake 26.41%
postgis-database ogr synchronization geometry gis postgresql gdal

batyr's Introduction

batyr

A server which connects all kinds of vector geodata sources to a PostgreSQL/PostGIS database and provides a structured way to synchronize external data to database tables.

One common situation when dealing with geographic data is repeatedly exporting and importing this data to and from a PostGIS-enabled database. While the export is very well covered by products like MapServer and GeoServer, importing is trickier. Common solutions consist mostly of custom scripts wrapping commands like shp2pgsql or ogr2ogr. These solutions often fail, or at least need some tricky hacks, if single rows of data should be updated instead of deleting and restoring the complete table content. It is also hard to account for slow or interrupted transactions and still make sure that the data stays synchronized as a whole. Using these import scripts requires either command line access or some custom code to hook them up to a job queue or even a web interface to make them usable from within other applications.

Flaws like those were the reason for us to create batyr as a reusable solution for similar demands in the future.

Screenshot of the status page of the web interface

batyr is a single server application providing the following:

  • "Intelligent" writing of data. A synchronization does not consist of a complete truncate and restore of a table anymore. Only features which have any differences to the ones provided by the external datasource are actually updated. New features are only created if they are not already in the database and features get (optionally) removed from the database if they are not part of the datasource any more. All this uses the primary key of the table to identify matching features from the datasource.
  • An integrated web interface to get an overview of the current state of the server and to optionally start synchronizations manually.
  • A well-documented HTTP-API to easily integrate batyr into other applications and allow flexible triggering of synchronizations. Furthermore, the HTTP-API provides methods to integrate batyr into existing monitoring systems like Nagios.
  • On-the-fly transformation of geometries to the spatial reference system of the database table. The required reference system is looked up in the PostGIS geometry_columns view/table and the transformation itself is performed by PostGIS.
  • Internally batyr uses the OGR-library to access datasources. So batyr covers all vector formats supported by OGR and connecting to - for example - a WFS is possible. Additionally this allows using OGR Virtual Formats for extended configuration options.
  • Synchronization jobs are internally queued and are handled in parallel using a configurable number of database connections. This keeps the HTTP-API responsive and makes good use of resources.

With these features it is possible to quickly integrate external geodata into your PostGIS database - without having to spend time creating custom code.
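
The primary-key-based comparison described in the feature list can be illustrated with a small sketch. It is not batyr's actual implementation (batyr is written in C++ and works against PostgreSQL directly); the function name `plan_sync`, the `SyncPlan` type and the dictionary-based feature representation are all assumptions made for illustration.

```python
from typing import Hashable, List, Mapping, NamedTuple, Sequence, Tuple

Feature = Mapping[str, object]
Key = Tuple[Hashable, ...]


class SyncPlan(NamedTuple):
    """Hypothetical container for the work a synchronization would do."""
    inserts: List[Feature]
    updates: List[Feature]
    deletes: List[Key]


def plan_sync(source: Sequence[Feature],
              existing: Sequence[Feature],
              primary_key: Sequence[str],
              remove_missing: bool = True) -> SyncPlan:
    """Compare datasource features with table rows using the primary key."""

    def key_of(feature: Feature) -> Key:
        return tuple(feature[column] for column in primary_key)

    existing_by_key = {key_of(row): row for row in existing}
    seen = set()
    inserts: List[Feature] = []
    updates: List[Feature] = []

    for feature in source:
        key = key_of(feature)
        seen.add(key)
        current = existing_by_key.get(key)
        if current is None:
            inserts.append(feature)          # not in the database yet
        elif dict(current) != dict(feature):
            updates.append(feature)          # present, but attributes differ
        # identical features are left untouched

    # Removal is optional, mirroring the "(optionally) removed" wording above.
    deletes = [k for k in existing_by_key if k not in seen] if remove_missing else []
    return SyncPlan(inserts, updates, deletes)
```

Only the rows listed in the plan would need to be written, which is the difference from a truncate-and-restore import.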

Screenshot of the job queue of the web interface

For the complete manual see the included MANUAL.md file.

batyr's People

Contributors

azuledu, nmandery, patrickbr

batyr's Issues

Interval based pulling

Allow layers to be pulled at a configurable interval.

In the most minimal case this would mean adding a pull job to the queue every n seconds. In the best case batyr should estimate the total duration it will take to finish all queued jobs and add the new pull event to the queue so that the interval is kept with as little delay as possible.

The second approach would mean collecting statistics on the duration of all job types and variants.
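
A rough sketch of the second approach, assuming mean durations per job type are a good enough estimate and that queued work is spread evenly across workers. The names `DurationStats`, `estimated_wait` and `enqueue_at` are hypothetical and do not exist in batyr.

```python
from collections import defaultdict
from typing import Iterable


class DurationStats:
    """Hypothetical running statistics of job durations per job type."""

    def __init__(self, default_seconds: float = 1.0) -> None:
        self._default = default_seconds
        self._total = defaultdict(float)
        self._count = defaultdict(int)

    def record(self, job_type: str, seconds: float) -> None:
        self._total[job_type] += seconds
        self._count[job_type] += 1

    def mean(self, job_type: str) -> float:
        count = self._count.get(job_type, 0)
        if count == 0:
            return self._default  # no observations yet, fall back to a guess
        return self._total[job_type] / count


def estimated_wait(queued_types: Iterable[str],
                   stats: DurationStats,
                   workers: int) -> float:
    """Estimate how long a newly queued job waits before it starts."""
    if workers < 1:
        raise ValueError("at least one worker is required")
    # Simplification: assume the queued work divides evenly among workers.
    return sum(stats.mean(t) for t in queued_types) / workers


def enqueue_at(next_due: float, now: float,
               queued_types: Iterable[str],
               stats: DurationStats, workers: int) -> float:
    """Time at which to enqueue the pull so it starts close to next_due."""
    return max(now, next_due - estimated_wait(queued_types, stats, workers))
```

For example, with four queued pull jobs averaging 5 seconds and two workers, the estimated wait is 10 seconds, so a pull due at t=100 would be enqueued at t=90.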

configuration option to specify the primary key for a table

This would be useful to handle

  • tables without primary key
  • views with rules to distribute data to other tables

This option should allow a comma-separated list of column names to handle composite primary keys.

The option should override existing primary keys.
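
The requested behaviour could look roughly like the following sketch. The function names and the error handling are assumptions for illustration; the issue does not specify how the option would be named or validated.

```python
from typing import List, Optional, Sequence


def parse_primary_key(option: str) -> List[str]:
    """Split a comma-separated option value into column names."""
    columns = [part.strip() for part in option.split(",")]
    columns = [column for column in columns if column]
    if not columns:
        raise ValueError("primary key option contains no column names")
    if len(set(columns)) != len(columns):
        raise ValueError("primary key option lists a column twice")
    return columns


def effective_primary_key(option: Optional[str],
                          table_primary_key: Sequence[str]) -> List[str]:
    """Return the key columns to use for matching features.

    A configured option overrides whatever key the table itself defines;
    without the option, the table's own primary key is used.
    """
    if option is not None and option.strip():
        return parse_primary_key(option)
    if not table_primary_key:
        raise ValueError("table has no primary key and none was configured")
    return list(table_primary_key)
```

With this, a view or a table without a primary key can still be synchronized as long as the configuration names the columns that identify a feature.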

Build failed on debian stretch with updated gdal lib

I have tried to compile the project on Debian stretch. Because of the updated library versions in the distribution, I also updated the install steps.
I'm using the GitLab CI service for building; you can see the complete build progress here:
https://gitlab.com/lb1c/batyr/-/jobs/53577003

                 from /builds/lb1c/batyr/src/server/db/transaction.h:8,
                 from /builds/lb1c/batyr/src/server/db/connection.h:13,
                 from /builds/lb1c/batyr/src/server/worker.h:12,
                 from /builds/lb1c/batyr/src/server/worker.cpp:11:
/usr/include/gdal/ogrsf_frmts.h:245:25: note: declared here
     static void         DestroyDataSource( OGRDataSource * ) OGR_DEPRECATED("Use GDALDataset class instead");
                         ^~~~~~~~~~~~~~~~~
src/server/CMakeFiles/batyrd.dir/build.make:614: recipe for target 'src/server/CMakeFiles/batyrd.dir/worker.cpp.o' failed
make[2]: *** [src/server/CMakeFiles/batyrd.dir/worker.cpp.o] Error 1
CMakeFiles/Makefile2:269: recipe for target 'src/server/CMakeFiles/batyrd.dir/all' failed
make[1]: *** [src/server/CMakeFiles/batyrd.dir/all] Error 2
Makefile:127: recipe for target 'all' failed
make: *** [all] Error 2
ERROR: Job failed: exit code 1

Crash with message "longjmp causes uninitialized stack frame"

The backtrace sent to stdout:

*** longjmp causes uninitialized stack frame ***: /usr/bin/batyrd terminated
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(__fortify_fail+0x37)[0x7f3d78cbb2a7]
/lib/x86_64-linux-gnu/libc.so.6(+0xef239)[0x7f3d78cbb239]
/lib/x86_64-linux-gnu/libc.so.6(__longjmp_chk+0x33)[0x7f3d78cbb1a3]
/usr/lib/x86_64-linux-gnu/libcurl-gnutls.so.4(+0xbc35)[0x7f3d71c41c35]
/lib/x86_64-linux-gnu/libpthread.so.0(+0xf030)[0x7f3d79704030]
/lib/x86_64-linux-gnu/libpthread.so.0(+0xed77)[0x7f3d79703d77]
/lib/x86_64-linux-gnu/libpthread.so.0(sigwait+0x37)[0x7f3d79703df7]
/usr/lib/libPocoUtil.so.9(_ZN4Poco4Util17ServerApplication25waitForTerminationRequestEv+0x5d)[0x7f3d7ab5b49d]
/usr/bin/batyrd(_ZN5Batyr6Server4mainERKSt6vectorISsSaISsEE+0x131)[0x416fb1]
/usr/lib/libPocoUtil.so.9(_ZN4Poco4Util11Application3runEv+0x1b)[0x7f3d7ab4476b]
/usr/lib/libPocoUtil.so.9(_ZN4Poco4Util17ServerApplication3runEiPPc+0x79)[0x7f3d7ab5b809]
/usr/bin/batyrd(main+0x27)[0x4171f7]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xfd)[0x7f3d78beaead]
/usr/bin/batyrd[0x4174f1]

Wrap prepared statements in their own object for RAII-style freeing

To avoid issues like

2013-11-18 13:35:02 [Information] [Worker] job c8d14bb2e0f140a29fa6edd50a9b51925192 pulling layer "investements"
2013-11-18 13:35:02 [Information] [Worker] job c8d14bb2e0f140a29fa6edd50a9b51925192 geometry column geom_station_1 uses a known, defined SRID (21781). Reprojecting the geometries if they got a SRS assigned, otherwise assigning the SRID of the table.
2013-11-18 13:35:02 [Error] [Db::Transaction] query failed: prepared statement "batyr_insertc8d14bb2e0f140a29fa6edd50a9b51925192" already exists [sqlstate: 42P05]
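
The idea of tying a prepared statement's lifetime to an object can be sketched as follows. batyr itself is C++, where this would be a class whose destructor releases the statement; the sketch below expresses the same acquire/release guarantee as a Python context manager. The `execute` callable and the function name `prepared_statement` are hypothetical, and the statement name is assumed to be a trusted identifier. `PREPARE` and `DEALLOCATE` are the standard PostgreSQL commands.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def prepared_statement(execute: Callable[[str], None],
                       name: str, sql: str) -> Iterator[str]:
    """Prepare a statement and always deallocate it when the block ends.

    `execute` is any callable that sends one SQL command to the server
    (hypothetical; a real connection wrapper would be used instead).
    """
    execute(f"PREPARE {name} AS {sql}")
    try:
        yield name
    finally:
        # Runs on normal exit and on exceptions, so the name is free to be
        # prepared again on the same connection.
        execute(f"DEALLOCATE {name}")
```

One caveat for a real implementation: inside an aborted transaction PostgreSQL rejects further commands until a rollback, so the cleanup would have to run after the rollback rather than before it.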
