openownership / register-ingester-dk

Register Ingester DK is an application designed for use with beneficial ownership data from Denmark's Central Business Register, published by the Danish Business Authority.

Home Page: https://bods-data.openownership.org/source/denmark/

License: Apache License 2.0

Languages: Shell 3.57%, Dockerfile 4.16%, Ruby 92.27%

Topics: beneficial-ownership, beneficial-ownership-data, denmark, elasticsearch, open-source, opendata

register-ingester-dk's Introduction

Register Ingester DK

Register Ingester DK is a data ingester for the OpenOwnership Register project. It processes bulk data from the Central Business Register, published by the Danish Business Authority in Denmark, and ingests the records into Elasticsearch. Optionally, it can also publish new records to AWS Kinesis. It works with raw records only and does not convert them into the Beneficial Ownership Data Standard (BODS) format.

Installation

Install and boot Register.

Configure your environment using the example file:

cp .env.example .env

Then set the following variables in .env:

  • DK_CVR_PASSWORD: DK Elasticsearch source password
  • DK_CVR_USERNAME: DK Elasticsearch source username
  • DK_STREAM: AWS Kinesis stream to which to publish new records (optional)
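
As a rough illustration of how these variables might be read at start-up, here is a minimal Ruby sketch. The helper name load_dk_config and the returned hash layout are assumptions made for this example and are not taken from the repository; only the three variable names come from the list above. It treats the two credentials as required and DK_STREAM as optional.

```ruby
# Sketch only: `load_dk_config` is a hypothetical helper, not code from this repo.
# `env` defaults to the process environment; any Hash-like object works too.
def load_dk_config(env = ENV)
  # Treat an empty DK_STREAM the same as an unset one (an assumption).
  stream = env['DK_STREAM']
  stream = nil if stream.to_s.empty?

  {
    # Required: `fetch` raises KeyError when the variable is missing.
    username: env.fetch('DK_CVR_USERNAME'),
    password: env.fetch('DK_CVR_PASSWORD'),
    # Optional: nil when no Kinesis stream is configured.
    stream: stream
  }
end
```

Failing fast on a missing credential makes a misconfigured .env visible before any ingestion starts.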

Create the Elasticsearch indexes:

docker compose run ingester-dk create-indexes

Testing

Run the tests:

docker compose run ingester-dk test

Usage

To ingest the bulk data:

docker compose run ingester-dk ingest-bulk

register-ingester-dk's People

Contributors

dependabot[bot], spacesnottabs, stephenabbott, tiredpixel

register-ingester-dk's Issues

No search context found for id error

This month's bulk data import crashed during DK ingestion with the error "No search context found for id".

I haven't investigated, but I suspect that something took too long, and the scroll cursor timed out.

I'm trying to simply rerun the entire DK ingestion.

/usr/local/bundle/gems/elasticsearch-transport-7.17.7/lib/elasticsearch/transport/transport/base.rb:218:in `__raise_transport_error': [404] {"error":{"root_cause":[{"type":"search_context_missing_exception","reason":"No search context found for id [325827711]"},{"type":"search_context_missing_exception","reason":"No search context found for id [325827713]"},{"type":"search_context_missing_exception","reason":"No search context found for id [325827712]"}],"type":"search_phase_execution_exception","reason":"all shards failed","phase":"query","grouped":true,"failed_shards":[{"shard":-1,"index":null,"reason":{"type":"search_context_missing_exception","reason":"No search context found for id [325827711]"}},{"shard":-1,"index":null,"reason":{"type":"search_context_missing_exception","reason":"No search context found for id [325827713]"}},{"shard":-1,"index":null,"reason":{"type":"search_context_missing_exception","reason":"No search context found for id [325827712]"}}],"caused_by":{"type":"search_context_missing_exception","reason":"No search context found for id [325827712]"}},"status":404} (Elasticsearch::Transport::Transport::Errors::NotFound)
        from /usr/local/bundle/gems/elasticsearch-transport-7.17.7/lib/elasticsearch/transport/transport/base.rb:341:in `perform_request'
        from /usr/local/bundle/gems/elasticsearch-transport-7.17.7/lib/elasticsearch/transport/transport/http/faraday.rb:36:in `perform_request'
        from /usr/local/bundle/gems/elasticsearch-transport-7.17.7/lib/elasticsearch/transport/client.rb:197:in `perform_request'
        from /usr/local/bundle/gems/elasticsearch-7.17.7/lib/elasticsearch.rb:41:in `method_missing'
        from /usr/local/bundle/gems/elasticsearch-api-7.17.7/lib/elasticsearch/api/actions/scroll.rb:57:in `scroll'
        from /home/x/r/lib/register_ingester_dk/clients/dk_client.rb:53:in `scroll'
        from /home/x/r/lib/register_ingester_dk/clients/dk_client.rb:25:in `block in all_records'
        from /home/x/r/lib/register_ingester_dk/apps/ingester.rb:28:in `each'
        from /home/x/r/lib/register_ingester_dk/apps/ingester.rb:28:in `each'
        from /home/x/r/lib/register_ingester_dk/apps/ingester.rb:28:in `each'
        from /home/x/r/lib/register_ingester_dk/apps/ingester.rb:28:in `each_slice'
        from /home/x/r/lib/register_ingester_dk/apps/ingester.rb:28:in `call'
        from /home/x/r/lib/register_ingester_dk/apps/ingester.rb:16:in `bash_call'
        from /home/x/r/bin/ingest-bulk:8:in `<main>'
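
The scroll timeout suspected in this issue can be illustrated with a minimal Ruby sketch. This is not the code in dk_client.rb: the helper name each_scrolled_hit, the default keep-alive of 10m, and the page size are all assumptions chosen for the example. It relies only on the search and scroll calls of the elasticsearch gem that appear in the trace. Elasticsearch resets a scroll's keep-alive each time a scroll request passes the scroll parameter, so the window has to cover the processing time of one page rather than the whole run. Whether a longer window would actually resolve this crash has not been verified.

```ruby
# Sketch only: `each_scrolled_hit` is a hypothetical helper, not code from this repo.
# `client` is assumed to respond to `search` and `scroll` like the client
# from the elasticsearch gem (7.x) seen in the stack trace.
def each_scrolled_hit(client, index:, keep_alive: '10m', page_size: 500)
  # Open the scroll; `scroll:` sets how long the search context is kept alive.
  response = client.search(
    index: index,
    scroll: keep_alive,
    size: page_size,
    body: { query: { match_all: {} } }
  )

  loop do
    hits = response['hits']['hits']
    break if hits.empty?

    hits.each { |hit| yield hit }

    # Passing `scroll:` again renews the keep-alive, so it only has to cover
    # the time spent processing one page, not the whole ingestion.
    response = client.scroll(
      scroll: keep_alive,
      body: { scroll_id: response['_scroll_id'] }
    )
  end
end
```

If processing one page can take longer than the keep-alive (for example when downstream indexing stalls), the next scroll request would be expected to fail with the 404 shown above, so both the keep-alive and the page size are worth tuning together.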

repo branch name standardisation

Without commenting on the broader debate of branch naming, 11 register repos use main as the main/master/latest branch, and 4 register repos use master as the main/master/latest branch.

I propose we standardise everything, and change this repo's main/master branch to main.

This will require co-ordination with CircleCI and Heroku configuration.
