openownership / register-transformer-sk

Register Transformer SK ingests records from a Kinesis stream (published by register_ingester_sk) and transforms them into BODS v0.2 records.

Home Page: https://bods-data.openownership.org/source/slovakia/

Shell 1.90% Dockerfile 2.10% Ruby 96.00%
beneficial-ownership beneficial-ownership-data elasticsearch open-standards


Register Transformer SK

Register Transformer SK is a data transformer for the OpenOwnership Register project. It processes bulk data published to AWS S3 (for example, data emitted by AWS Kinesis Data Firehose), converts it into the Beneficial Ownership Data Standard (BODS) format, and stores the resulting records in Elasticsearch. Optionally, it can also use AWS Kinesis to process streamed data (instead of bulk data published to AWS S3) or to publish newly transformed records to a different stream.

The transformation schema is BODS 0.2.

Installation

Install and boot Register.

Configure your environment using the example file:

cp .env.example .env

Create the Elasticsearch indexes:

docker compose run transformer-sk create-indexes

Testing

Run the tests:

docker compose run transformer-sk test

Usage

To transform the bulk data from a prefix in AWS S3:

docker compose run transformer-sk transform-bulk raw_data/source=SK/year=2023/month=10/
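The prefix in the command above follows a partitioned layout (source, year, month). As a minimal sketch, assuming that layout holds for other periods, a prefix could be built as shown below. `bulk_prefix` is a hypothetical helper and not part of the transformer, and the zero-padding of single-digit months is an assumption, since the example only shows a two-digit month.

```ruby
# Hypothetical helper: builds an S3 prefix in the layout seen in the usage
# example (raw_data/source=SK/year=YYYY/month=MM/). Not part of the transformer.
# Assumption: single-digit months are zero-padded; the source does not show this.
def bulk_prefix(year:, month:, source: 'SK')
  raise ArgumentError, "month must be 1..12, got #{month}" unless (1..12).cover?(month)

  format('raw_data/source=%<source>s/year=%<year>d/month=%<month>02d/',
         source: source, year: year, month: month)
end

puts bulk_prefix(year: 2023, month: 10)
# prints raw_data/source=SK/year=2023/month=10/
```

The resulting string would be passed as the argument to transform-bulk.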

register-transformer-sk's People

Contributors

dependabot[bot], spacesnottabs, tiredpixel


register-transformer-sk's Issues

Add Register Partnerov Verejného Sektora ID scheme to org-id and revise ID mappings to use correct prefix/ID number

The ID scheme for entities and individuals listed on Slovakia's Register Partnerov Verejného Sektora (RPVS) is not listed on org-id.

As a result, the Open Ownership Register code appears to generate temporary IDs for people when data is ingested from the RPVS. Entity IDs are pulled in from the RPVS but are stored without an org-id prefix.

Here is an example: https://register.openownership.org/entities/59c2263f67e4ebf340344201

The BODS JSON for Thomas Michael Amend contains:

    "identifiers": [
      {
        "schemeName": "SK Register Partnerov Verejného Sektora",
        "id": "201232"
      },
      {
        "scheme": "MISC-Slovakia PSP Register",
        "schemeName": "Not a valid Org-Id scheme, provided for backwards compatibility",
        "id": "201232"
      },

And here is the entity ID for Körber Supply Chain Logistics GmbH:

  {
    "statementID": "openownership-register-5837167456734863248",
    "statementType": "entityStatement",
    "entityType": "registeredEntity",
    "name": "Körber Supply Chain Logistics GmbH",
    "incorporatedInJurisdiction": {
      "name": "Germany",
      "code": "DE"
    },
    "identifiers": [
      {
        "schemeName": "SK Register Partnerov Verejného Sektora",
        "id": "hrb711189"
      },

This ticket is for Open Ownership and Open Data Services to consider adding the RPVS to org-id and then to make the necessary updates to the ID mappings for our Slovakia RPVS data source.
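To illustrate what the revised mapping might produce, here is a minimal Ruby sketch of an identifier builder that adds a `scheme` only when a registered code is available. The method name `bods_identifier` is hypothetical, and `XX-PLACEHOLDER` is deliberately not a real org-id code, because no code for the RPVS exists at the time of this issue.

```ruby
# Sketch only: builds a BODS-style identifier hash.
# The scheme code is passed in because the RPVS has no org-id code yet;
# 'XX-PLACEHOLDER' below is NOT a real org-id code.
def bods_identifier(id:, scheme_name:, scheme: nil)
  identifier = { 'schemeName' => scheme_name, 'id' => id }
  # Attach the scheme only when one is known, instead of a made-up value.
  identifier['scheme'] = scheme if scheme
  identifier
end

without_scheme = bods_identifier(id: '201232',
                                 scheme_name: 'SK Register Partnerov Verejného Sektora')
with_scheme = bods_identifier(id: '201232',
                              scheme_name: 'SK Register Partnerov Verejného Sektora',
                              scheme: 'XX-PLACEHOLDER')

p without_scheme.key?('scheme') # prints false
p with_scheme['scheme']         # prints "XX-PLACEHOLDER"
```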

Person Statement - Address country

Currently, the country field of a person's registered address is populated from the country code of the individual's nationality. People often live outside their country of nationality, so this method is not reliable.

The alternative, determining the country from the address itself using Google Geocoder (as is done for entities), is not perfect either, but it could be more accurate.
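One way to combine the two approaches is to prefer a country resolved from the address and fall back to nationality only when resolution fails. The following is a minimal sketch under that assumption; `address_country` and the `geocode` callable are hypothetical stand-ins, and the fallback is a suggestion rather than existing transformer behaviour.

```ruby
# Sketch: prefer a country resolved from the address text, and fall back to
# the nationality code only when resolution fails.
# `geocode` is a hypothetical callable standing in for a geocoding client;
# it returns an ISO 3166-1 alpha-2 code, or nil when nothing is found.
def address_country(address, nationality_code, geocode:)
  geocode.call(address) || nationality_code
end

# Stand-in geocoder for this example: a fixed lookup, not a real service.
fake_geocode = ->(address) { address.include?('Bratislava') ? 'SK' : nil }

puts address_country('Hlavná 1, Bratislava', 'DE', geocode: fake_geocode) # prints SK
puts address_country('unknown', 'DE', geocode: fake_geocode)              # prints DE
```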

Mapping PEP status in BODS v0.2

The BODS schema states that PEP fields should only be used when these declarations of status are expected as part of a BO regime. Slovak law mandates disclosure of public official status in the register, so the condition for using those fields is met.

PEP status is represented in the RPVS data source KonecniUzivateliaVyhod, which contains details about the beneficial owners of companies. The field JeVerejnyCinitel translates as "Is a public official" and holds a boolean value (true or false). In BODS v0.2 it can be mapped directly to the hasPepStatus field of a personStatement, which is also boolean.
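The mapping described above can be sketched in Ruby as follows. Only JeVerejnyCinitel and hasPepStatus come from the issue text; the method name `pep_fields` and the hash-shaped record are assumptions made for illustration.

```ruby
require 'json'

# Sketch: derive the PEP-related part of a BODS v0.2 personStatement from an
# RPVS KonecniUzivateliaVyhod record. The record shape is hypothetical.
def pep_fields(record)
  value = record['JeVerejnyCinitel']
  # Emit hasPepStatus only when the source carries an explicit boolean, so a
  # missing value is not silently reported as "not a public official".
  return {} unless [true, false].include?(value)

  { 'hasPepStatus' => value }
end

person_statement = { 'statementType' => 'personStatement' }
person_statement = person_statement.merge(pep_fields({ 'JeVerejnyCinitel' => true }))

puts JSON.generate(person_statement)
# prints {"statementType":"personStatement","hasPepStatus":true}
```

Omitting the field for a missing value, rather than defaulting to false, is a design choice in this sketch and may not match how the transformer should behave.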

OOC Statement - statementDate mapping

Currently, the Slovak transformer appears to determine the statementDate by taking the latest date found in a record. When a record has been updated, this gives every OOC statement about an entity the same date, including the deprecated statement, so it is impossible to tell which OOC statement is current.

The Open Ownership register records the statementDate field correctly. The method used there should be replicated here.

An example:
On the SK register -
Entity name: GPP INDUSTRIE BAU, s. r. o.
Entity Statement ID: 6595377034097487679
Subject of two OOC statements:
OOC Statement ID: 12345957095720737766
OOC Statement date: 2017-08-03
OOC Statement ID: 4402911589374326650
OOC Statement date: 2017-08-03

On the OO register -
Entity name: GPP INDUSTRIE BAU, s. r. o.
Entity Statement ID: "openownership-register-12113881716578601941"
Subject of two OOC statements:
OOC Statement ID: "openownership-register-5410405319263359080"
OOC Statement date: 2017-08-03
OOC Statement ID: "openownership-register-8730749564560403448"
OOC Statement date: 2017-02-01

The OO register is accurately reflecting the source data here.
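The difference between the two behaviours can be sketched in Ruby. The record shape and key names (`:owners`, `:valid_from`) are hypothetical and simplified, not the RPVS schema, and the per-statement variant is only one plausible approach; how the Open Ownership Register actually derives the date is not shown in this issue.

```ruby
require 'date'

# Hypothetical, simplified record: two beneficial-owner entries, each with its
# own start date. Key names are illustrative only.
record = {
  owners: [
    { name: 'Owner A', valid_from: Date.new(2017, 2, 1) },
    { name: 'Owner B', valid_from: Date.new(2017, 8, 3) }
  ]
}

# Record-level strategy described in the issue: every statement derived from
# the record receives the latest date found anywhere in it.
def record_level_dates(record)
  max_date = record[:owners].map { |owner| owner[:valid_from] }.max
  record[:owners].map { max_date.iso8601 }
end

# Per-statement strategy: each statement keeps the date of its own entry.
def per_statement_dates(record)
  record[:owners].map { |owner| owner[:valid_from].iso8601 }
end

p record_level_dates(record)  # prints ["2017-08-03", "2017-08-03"]
p per_statement_dates(record) # prints ["2017-02-01", "2017-08-03"]
```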
