NLDI Crawler

The Crawler is used to ingest event data and link it to the network. The only requirement is that the source system is able to provide GeoJSON via a web request. A database table (nldi_data.crawler_source) contains metadata about the GeoJSON. We can link events to the network via latitude/longitude coordinates or NHDPlus reach and measure.
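As a rough illustration of the latitude/longitude case, the sketch below reads the coordinates from a GeoJSON Point feature. It is not the crawler's own code (the crawler is a Java application). The function name `point_lon_lat` and the sample feature are hypothetical, and the sketch assumes the source returns Point geometries. The GeoJSON specification orders Point coordinates as longitude, then latitude.

```python
import json


def point_lon_lat(feature):
    """Return (longitude, latitude) for a GeoJSON Point feature, or None otherwise.

    Hypothetical helper for illustration; not part of nldi-crawler.
    """
    geometry = feature.get("geometry") or {}
    if geometry.get("type") != "Point":
        return None
    # GeoJSON positions are [longitude, latitude, (optional) elevation].
    lon, lat = geometry["coordinates"][:2]
    return float(lon), float(lat)


# Made-up sample data shaped like a GeoJSON FeatureCollection.
sample = json.loads("""
{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "geometry": {"type": "Point", "coordinates": [-89.4, 43.1]},
      "properties": {"site_id": "example-1", "site_name": "Example site"}
    }
  ]
}
""")

for feature in sample["features"]:
    print(feature["properties"]["site_id"], point_lon_lat(feature))
```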

Current nldi_data.crawler_source table fields:

crawler_source_id: An integer used to identify the source when starting the crawler source.
source_name: A human-oriented name for the source.
source_suffix: The suffix to use in NLDI service urls to identify the source.
source_uri: A uri the crawler can use to retrieve source data to be indexed by the crawling method.
feature_id: The attribute in the returned data used to identify the feature for use in NLDI service urls.
feature_name: A human readable name used to label the source feature.
feature_uri: A uri that can be used to access information about the feature.
feature_reach: (Conditionally optional) The attribute in the source feature data where the crawler can find a reachcode.
feature_measure: (Conditionally optional) The attribute in the source feature data where the crawler can find a measure to be used with the reachcode.
ingest_type: Either reach or point. If reach then the feature_reach and feature_measure fields must be populated.
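The rule tying ingest_type to feature_reach and feature_measure can be expressed as a small check. This is a minimal sketch, not the crawler's validation logic. The helper `validate_source` is hypothetical, and it assumes a row is available as a dict keyed by the column names above and that ingest_type is stored as the lowercase strings reach or point.

```python
VALID_INGEST_TYPES = {"reach", "point"}  # assumed stored spelling of the two values


def validate_source(row):
    """Return a list of problems found in one crawler_source row.

    Hypothetical helper for illustration; an empty list means no problems found.
    """
    problems = []
    ingest_type = row.get("ingest_type")
    if ingest_type not in VALID_INGEST_TYPES:
        problems.append(f"ingest_type must be 'reach' or 'point', got {ingest_type!r}")
    if ingest_type == "reach":
        # Both attribute names are needed to locate reachcode and measure in the data.
        for field in ("feature_reach", "feature_measure"):
            if not row.get(field):
                problems.append(f"{field} is required when ingest_type is 'reach'")
    return problems


# Made-up rows: a point source needs no reach fields, a reach source needs both.
point_row = {"crawler_source_id": 1, "ingest_type": "point"}
reach_row = {"crawler_source_id": 2, "ingest_type": "reach", "feature_reach": "reachcode"}

print(validate_source(point_row))
print(validate_source(reach_row))
```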

A command-line argument is used to initiate the ingest process for a data source:

java -jar target/nldi-crawler-0.4-SNAPSHOT.jar <crawler_source_id>

Developer Environment

nldi-db contains everything you need to set up a development database environment. It includes data for the Yahara River in Wisconsin.

To run the project you will need to create the file application.yml in the project's root directory and add the following:

nldiDbHost: hostNameOfDatabase
nldiDbPort: portNumberForDatabase
nldiDbUsername: dbUserName
nldiDbPassword: dbPassword

To run:

% mvn spring-boot:run

This project has integration tests that run against the database. The Maven "package" goal stops the build before they run. To set up the project for running the integration tests, add the following to your Maven settings.xml file (the values below will work with the nldi-db Docker container running on the same machine as the tests):

<profiles>
  <profile>
    <id>default</id>
    <activation>
      <activeByDefault>true</activeByDefault>
    </activation>
    <properties>
      <nldi.url>jdbc:postgresql://127.0.0.1:5433/nldi</nldi.url>
      <nldi.dbUsername>nldi</nldi.dbUsername>
      <nldi.dbPassword>nldi</nldi.dbPassword>
      <nldi.dbUnitUsername>nldi</nldi.dbUnitUsername>
      <nldi.dbUnitPassword>nldi</nldi.dbUnitPassword>
    </properties>
  </profile>
</profiles>

If running integration tests without Maven, you may specify the properties in the file application-it.yml. See the maven-failsafe-plugin configuration in the pom.xml for the mapping of properties to variables.

Running via Docker Compose

To run via Docker Compose, create a secrets.env file with the following format:

nldiDbHost: hostNameOfDatabase
nldiDbPort: portNumberForDatabase
nldiDbUsername: dbUserName
nldiDbPassword: dbPassword

Then run with:

docker-compose run -e CRAWLER_SOURCE_ID=<crawler_source_id> nldi-crawler
