openownership / register-transformer-dk

Register Transformer DK ingests records from a Kinesis stream (published by register_ingester_dk) and transforms them into BODS v0.2 records

Home Page: https://bods-data.openownership.org/source/denmark/

Shell 1.57% Dockerfile 1.74% Ruby 96.69%
beneficial-ownership beneficial-ownership-data denmark opendata elasticsearch open-standards

Register Transformer DK

Register Transformer DK is a data transformer for the OpenOwnership Register project. It processes bulk data published to AWS S3 (for example, data emitted by AWS Kinesis Data Firehose), converts it into the Beneficial Ownership Data Standard (BODS) format, and stores the records in Elasticsearch. Optionally, it can also use AWS Kinesis to process streamed data (rather than bulk data published to AWS S3), or to publish newly transformed records to a different stream.

The transformation schema is BODS 0.2.

Installation

Install and boot Register.

Configure your environment using the example file:

cp .env.example .env

Create the Elasticsearch indexes:

docker compose run transformer-dk create-indexes

Testing

Run the tests:

docker compose run transformer-dk test

Usage

To transform the bulk data from a prefix in AWS S3:

docker compose run transformer-dk transform-bulk raw_data/source=DK/year=2023/month=10/

register-transformer-dk's People

Contributors

dependabot[bot], spacesnottabs, tiredpixel

register-transformer-dk's Issues

Map to the BODS 0.2 property interestLevel

At the moment, the directness of interests is not retained in the mapping.

Where an attribute has type SÆRLIGE_EJERFORHOLD in the source data, look for the […].vaerdier[].vaerdi value ‘Har indirekte besiddelser’ (has indirect holdings).

For each interest, set interest.interestLevel to ‘direct’, except where the ‘Har indirekte besiddelser’ value is found. If ‘Har indirekte besiddelser’ is found:

  • Set interest.interestLevel to ‘unknown’.
  • Use a BODS annotation to point to the containing BODS Interests field. Set annotation.motivation to ‘commenting’ and annotation.description to “One or more of the interests held in the subject is held indirectly”.
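
The rule above could be sketched in Ruby roughly as follows. This is a minimal illustration, not code from the repository: the helper names, the hash shape of the source attributes (a 'type' key plus a 'vaerdier' array of 'vaerdi' entries), and the use of '/interests' as the annotation's statementPointerTarget are all assumptions.

```ruby
# Hypothetical helpers; names and data shapes are assumptions for illustration.
INDIRECT_HOLDINGS = 'Har indirekte besiddelser'

# attributes: array of hashes assumed to look like
#   { 'type' => 'SÆRLIGE_EJERFORHOLD', 'vaerdier' => [{ 'vaerdi' => '...' }] }
def indirect_holdings?(attributes)
  attributes.any? do |attribute|
    attribute['type'] == 'SÆRLIGE_EJERFORHOLD' &&
      Array(attribute['vaerdier']).any? { |v| v['vaerdi'] == INDIRECT_HOLDINGS }
  end
end

# Returns [interests_with_level, annotations].
def apply_interest_level(interests, attributes)
  indirect = indirect_holdings?(attributes)
  level = indirect ? 'unknown' : 'direct'
  updated = interests.map { |interest| interest.merge('interestLevel' => level) }

  annotations = []
  if indirect
    annotations << {
      # Assumed JSON pointer to the statement's interests field.
      'statementPointerTarget' => '/interests',
      'motivation' => 'commenting',
      'description' => 'One or more of the interests held in the subject is held indirectly'
    }
  end

  [updated, annotations]
end
```

Returning the annotations separately leaves it to the caller to attach them to the ownership-or-control statement alongside any annotations it already carries.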

Retain entity form information

The original DK data does contain info about the form of entity (virksomhedSummariskRelation[].virksomhed.virksomhedsform[]).

"virksomhedsform": [
    {
        "ansvarligDataleverandoer": "E&S",
        "kortBeskrivelse": "APS",
        "langBeskrivelse": "Anpartsselskab",
        "periode": {
            "gyldigFra": "2005-03-07",
            "gyldigTil": null
        },
        "sidstOpdateret": "2005-03-08T15:30:15.000+01:00",
        "virksomhedsformkode": 80
    }
]

However, this is not used or mapped. At the moment, all DK entities are given the BODS code 'registeredEntity', with no further information about their form.
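
As a starting point, the currently valid entity form could be extracted from the source record along these lines. This is only a sketch: the helper name and the output keys are hypothetical, and it assumes the current form is the entry whose periode.gyldigTil is null. Where the result should be stored in the BODS statement is left open.

```ruby
require 'json'

# Hypothetical helper: pick the entity form that is still valid
# (periode.gyldigTil is null, or the period is absent) from a
# virksomhedsform array.
def current_entity_form(forms)
  current = Array(forms).find { |form| form.dig('periode', 'gyldigTil').nil? }
  return nil unless current

  {
    'code' => current['virksomhedsformkode'],
    'shortDescription' => current['kortBeskrivelse'],
    'longDescription' => current['langBeskrivelse']
  }
end

sample = JSON.parse(<<~DATA)
  [
    {
      "kortBeskrivelse": "APS",
      "langBeskrivelse": "Anpartsselskab",
      "periode": { "gyldigFra": "2005-03-07", "gyldigTil": null },
      "virksomhedsformkode": 80
    }
  ]
DATA

form = current_entity_form(sample)
puts "#{form['code']} #{form['shortDescription']} (#{form['longDescription']})"
```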

Missing Data - Company Management Declared as BO

Where a company has declared its management as the BO because the true BO could not be identified, those people’s records do not appear to be ingested (or they are ingested but not included in the mapping). This is based on a search of several companies using their CVR number.

Examples of companies declaring management as BO in the Danish register:
https://datacvr.virk.dk/enhed/virksomhed/4008830571
https://datacvr.virk.dk/enhed/virksomhed/4001192978
https://datacvr.virk.dk/enhed/virksomhed/4001920667
https://datacvr.virk.dk/enhed/virksomhed/4001534784
https://datacvr.virk.dk/enhed/virksomhed/16095230

Recommendation: Ensure companies reporting management as BOs are included in the mapping to BODS

Inaccurate Nationality Data

The data in person_nationalities is being taken from the country code of people's addresses.

Nationality isn't included in the raw data - there is no field for 'nationalitet' and the only country codes included are in the address field.

We should remove nationality from this mapping to avoid confusion. Address data including country codes is already being captured in person_addresses.

Repo branch name standardisation

Without commenting on the broader debate of branch naming, 11 register repos use main as the main/master/latest branch, and 4 register repos use master as the main/master/latest branch.

I propose we standardise everything, and change this repo's main/master branch to main.

This will require co-ordination with CircleCI and Heroku configuration.

Map to the beneficialOwnershipOrControl property

As per this BODS guidance, interest.beneficialOwnershipOrControl should be set to True to indicate that the interestedParty is a beneficial owner.

Currently in the record_processor.rb script, the value ‘reel ejer’ (beneficial owner) is looked for amongst the medlemsData attributes, in order to identify BO records to process. This is correct. Beyond that, for each beneficial ownership interest, interest.beneficialOwnershipOrControl should be set to True.
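
A minimal sketch of that last step, assuming interests are plain hashes keyed by BODS property names (the helper name is hypothetical and does not come from record_processor.rb):

```ruby
# Hypothetical helper: flag every interest of an identified beneficial
# owner as a beneficial ownership or control interest.
def mark_beneficial_ownership(interests)
  interests.map { |interest| interest.merge('beneficialOwnershipOrControl' => true) }
end

interests = [
  { 'type' => 'shareholding' },
  { 'type' => 'voting-rights' }
]
flagged = mark_beneficial_ownership(interests)
```

Using merge returns new hashes, so the original interests are left unchanged.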

Widen the scope of the interest_parser script to handle additional interest types

On the evidence of this mapping script, the register only looks for shareholding and voting rights interests once the ‘Reel ejer’ flag is found.

Looking at a sample of Danish source data, there are other types of interest that should be handled. The scope of the interest_parser script should be widened to handle the additional types:

  • BETYDELIG_INDFLYDELSE_VIA_ROLLE (Significant influence through role)
  • EJERANDEL_KAPITALKLASSE (Owner's capital class)
  • EJERANDEL_MEDDELELSE_DATO (Ownership Message Date)
  • SÆRLIGE_EJERFORHOLD (Special ownership)
  • SÆRLIGE_EJERFORHOLD_BESKRIVELSE (Special ownership description)

Mappings for these are below.

There may be other interest types that need to be handled. This can only be ascertained by identifying reliable documentation or by examining the full DK dataset, not just a sample.

BETYDELIG_INDFLYDELSE_VIA_ROLLE

If the value of this attribute is 'Er reel ejer som bestyrelsesmedlem' (Is beneficial owner as a board member) then:

  • interest.type = 'other-influence-or-control'
  • interest.details = '"Er reel ejer som bestyrelsesmedlem" (Is beneficial owner as a board member)'

If the value of this attribute is 'Er reel ejer som udpeget daglig ledelse' (Is beneficial owner as appointed daily management) then:

  • interest.type = 'senior-managing-official'
  • interest.details = '"Er reel ejer som udpeget daglig ledelse" (Is beneficial owner as appointed daily management)'
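
The two cases above lend themselves to a lookup table. The following Ruby sketch is illustrative only; the constant and method names are hypothetical and not part of the interest_parser script.

```ruby
# Hypothetical mapping from BETYDELIG_INDFLYDELSE_VIA_ROLLE values to
# [BODS interest type, English translation].
ROLE_INTERESTS = {
  'Er reel ejer som bestyrelsesmedlem' =>
    ['other-influence-or-control', 'Is beneficial owner as a board member'],
  'Er reel ejer som udpeget daglig ledelse' =>
    ['senior-managing-official', 'Is beneficial owner as appointed daily management']
}.freeze

# Returns a BODS-style interest hash, or nil for values not listed above.
def role_interest(value)
  type, translation = ROLE_INTERESTS[value]
  return nil unless type

  { 'type' => type, 'details' => format('"%s" (%s)', value, translation) }
end
```

Returning nil for unlisted values makes unexpected source data visible to the caller instead of silently mapping it.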

EJERANDEL_KAPITALKLASSE

This attribute accompanies the EJERANDEL_PROCENT (Ownership percent) and EJERANDEL_STEMMERET_PROCENT (Ownership voting percent) attributes which are already handled. The value of EJERANDEL_KAPITALKLASSE gives information about the share classes of the shares. So where EJERANDEL_PROCENT and EJERANDEL_STEMMERET_PROCENT are mapped to BODS interest types 'shareholding' and 'voting-rights' the EJERANDEL_KAPITALKLASSE attribute should be sought. If a value (X) exists for EJERANDEL_KAPITALKLASSE then for those shareholding and voting rights interests:

  • interest.details = 'Share class details: X'
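
As a rough sketch of this rule, assuming interests are plain hashes and the share class value has already been read from the EJERANDEL_KAPITALKLASSE attribute (the helper name is hypothetical):

```ruby
SHARE_INTEREST_TYPES = %w[shareholding voting-rights].freeze

# Hypothetical helper: add share class details to shareholding and
# voting-rights interests when a share class value is present.
def add_share_class(interests, share_class)
  return interests if share_class.nil? || share_class.strip.empty?

  interests.map do |interest|
    next interest unless SHARE_INTEREST_TYPES.include?(interest['type'])

    interest.merge('details' => "Share class details: #{share_class}")
  end
end
```

Interests of other types pass through unchanged, as do all interests when no share class value exists.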

EJERANDEL_MEDDELELSE_DATO

For the moment, I don't think we should do anything with the value of EJERANDEL_MEDDELELSE_DATO (Ownership message date). If it becomes clear what this date relates to, then we can map it properly.

SÆRLIGE_EJERFORHOLD

If the value of this attribute is 'Andre vedtægtsbestemte rettigheder' (Other statutory rights) then:

  • interest.type = 'other-influence-or-control'
  • interest.details = '"Andre vedtægtsbestemte rettigheder" (Other statutory rights)'

If the value of this attribute is 'Potentielle kapital- og/eller stemmeandele' (Potential capital and/or voting shares) then:

  • interest.type = 'other-influence-or-control'
  • interest.details = '"Potentielle kapital- og/eller stemmeandele" (Potential capital and/or voting shares)'

If the value of this attribute is 'Ret til at godkende årsrapport i forhold til udbyttebetalinger' (Right to approve annual report in relation to dividend payments) then:

  • interest.type = 'other-influence-or-control'
  • interest.details = '"Ret til at godkende årsrapport i forhold til udbyttebetalinger" (Right to approve annual report in relation to dividend payments)'

If the value of this attribute is 'Ret til at udpege ledelsesmedlemmer' (Right to appoint directors) then:

  • interest.type = 'appointment-of-board'
  • interest.details = '"Ret til at udpege ledelsesmedlemmer" (Right to appoint directors)'

If the value of this attribute is 'Vetoret' (Veto) then:

  • interest.type = 'other-influence-or-control'
  • interest.details = '"Vetoret" (Veto)'
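
The five SÆRLIGE_EJERFORHOLD cases above can be expressed as a single lookup table. This is a sketch under the assumption that each attribute value is handled independently; the constant and method names are hypothetical.

```ruby
# Hypothetical mapping from SÆRLIGE_EJERFORHOLD values to
# [BODS interest type, English translation].
SPECIAL_OWNERSHIP_INTERESTS = {
  'Andre vedtægtsbestemte rettigheder' =>
    ['other-influence-or-control', 'Other statutory rights'],
  'Potentielle kapital- og/eller stemmeandele' =>
    ['other-influence-or-control', 'Potential capital and/or voting shares'],
  'Ret til at godkende årsrapport i forhold til udbyttebetalinger' =>
    ['other-influence-or-control',
     'Right to approve annual report in relation to dividend payments'],
  'Ret til at udpege ledelsesmedlemmer' =>
    ['appointment-of-board', 'Right to appoint directors'],
  'Vetoret' =>
    ['other-influence-or-control', 'Veto']
}.freeze

# Returns a BODS-style interest hash, or nil for values not listed above.
def special_ownership_interest(value)
  type, translation = SPECIAL_OWNERSHIP_INTERESTS[value]
  return nil unless type

  { 'type' => type, 'details' => format('"%s" (%s)', value, translation) }
end
```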

SÆRLIGE_EJERFORHOLD_BESKRIVELSE

If the value of this attribute is X then:

  • interest.type = 'other-influence-or-control'
  • interest.details = 'X'
