Giter VIP home page Giter VIP logo

social_investment_analytical_layer's Introduction

Social Investment Analytical Layer

GitHub version

Standardised tables for doing cross agency work in the IDI. Refer to CHANGELOG.md for summary of changes.

Overview:

The SIU has created a Social Investment Analytical Layer (SIAL): events-structured tables that arrange a selected subset of the data held in Statistics New Zealand's IDI into a consistent format, making it easier and faster for authorised IDI users (researchers and analysts) to use and understand.

The SIAL is designed to be reusable tool for the purposes of easy creation of variables for analysis and to add extra cost information that is not inherently available in the IDI. It reformats the commonly used social sector tables available in the IDI into tables based on events (events-structured). An event is defined by a start date and an end date during which the individual was engaged in a particular interaction as recorded by an agency (governmental or NGO). The event can be described as being of a particular event-type (like a particular benefit type or a hospitalisation purchase unit code), and the SIAL also attempts to attach a cost for this event. When such costs are not available directly from the IDI, the SIAL augments the IDI data with costing information obtained from specific agencies. For most of the commonly used tables available in the IDI, there is a corresponding events-structured table under the SIAL.

For example, a hospitalisation spell for an individual in SIAL event terms will have the snz_uid, start and end dates of the hospitalisation, the type of event described by the purchase unit code, the cost of that event and the agency and sub-agency information that the individual interacted with. Another example would be a school enrollment, again defined by a start and end date of enrollment, along with the cost of enrollment. If costs are available from the IDI tables, these are directly used. In the above examples, the costs are not readily available in the IDI, but the SIAL has additional cost-related info obtained from the respective govt. agencies that enables putting a price to each of those events. Wherever the costs cannot be quantified, the cost columns are left NULL, or are completely absent. Also, events of an instantaneous nature (for example- purchasing a pharmaceutical drug) will have the same start and end dates.

Folder Descriptions:

SIAL_dependencies: this folder contains the main_sial.sas script that builds all tables/views. It also contains cost data that isn't available in the IDI along with code to load these files in to SQL.
SIAL_bqa_complete: this folder contains scripts that have been reviewed by the corresponding agencies.
SIAL_bqa_incomplete: this folder contains scripts that are currently being reviewed by the corresponding agencies.
SIAL_log: This folder is used to store the output logs that SAS generates. Since SAS does not have easy to use error handling the logs are written to file and scanned for errors.
SIAL_docs: This folder contains the data dictionary for the SIAL and an overview of the datasets timeline.

Pre-requisites

Running the SIAL requires read privileges for the following schemas in the IDI_Clean (for any archive version)-

  • moh_clean
  • moe_clean
  • msd_clean
  • acc_clean
  • cyf_clean
  • cor_clean
  • moj_clean
  • pol_clean
  • hnz_clean

If you do not have access to any of the above schemas, then the SIAL installation will still create the tables/views pertaining to that schema but will return a “SELECT PERMISSION not allowed” error when trying to utilise the view. It will still create the components that for the schemas that you have access to. At the end of installation, the tool creates a log itemising which components were successfully created, and which ones failed, or have been created but with SELECT Permission denied. It also gives you the reason for failure. To get access to the schemas listed above, please contact Statistics New Zealand.

Installation:

  1. Download the zip file containing the SIAL scripts from github.

  2. Email the zipped file to [email protected] and ask them to move it into your project folder.

  3. Unzip the files into your project.

  4. Within the unzipped folder, navigate to social_investment_analytical_layer\SIAL_dependencies folder. From this folder, open main_sial_sas_grid.sas in SAS EG

  5. Go to the main_sial_sas_grid.sas script- At the top of the script there are three macro variables called idi_refresh, targetschema & sial_code_path

  6. Change the targetschema to the location you wish to write your files to e.g. the SIU schema is DL-MAA2016-15

  7. Change the sial_code_path to the location where you stored the scripts on the network. E.g. for the SIU our location would be \wprdfs08\MAA2016-15 Supporting the Social Investment Unit\social_investment_analytical_layer

  8. Change idi_refresh to the required version of IDI_Clean database, e.g. IDI_CLEAN_20190420. 9. Run the main_sial_sas_grid.sas script.

The scripts runs in the following fashion-

  • First, it creates all the necessary variables required for execution based on the user's input.
  • It then uninstalls all SIAL components that are already available in the target schema specified by the user.
  • It then runs through all the .sas programs found in "sial_bqa_complete" folder, and then the ones in "sial_bqa_incomplete" folder, which all create the SIAL views and tables
  • The SIAL tables are then created one by one.

Note that each time you run main_sial.sas it will uninstall all the SIAL tables you currently have before creating the SIAL tables. Once the script finishes running, the WORK.sialexecresults table will give you the errors for each SIAL component from the execution if there are any.

Most of the individual sas programs are simply creating SIAL views and so they run in a matter of seconds. However, the two SIAL tables that are still created as tables SIAL_MSD_T1_events and SIAL_MOE_school_events take a significantly longer time to run. End to end, main_sial.sas should run in around 3 hours.

Output:

The following tables and views will be created in the target schema that you specified under IDI_Sandpit-

SIAL Views (can be found in SQL Management Studio under IDI_UserCode -> target schema -> Views)-

  • SIAL_ACC_injury_events
  • SIAL_COR_sentence_events
  • SIAL_CYF_abuse_events
  • SIAL_CYF_client_events
  • SIAL_HNZ_register_events
  • SIAL_IRD_income_events
  • SIAL_MIX_mortality_events
  • SIAL_MIX_selfharm_events
  • SIAL_MOE_ece_events
  • SIAL_MOE_intervention_events
  • SIAL_MOE_itl_events
  • SIAL_MOE_tertiary_events
  • SIAL_MOH_B4School_events
  • SIAL_MOH_cancer_events
  • SIAL_MOH_chronic_events
  • SIAL_MOH_gms_events
  • SIAL_MOH_labtest_events
  • SIAL_MOH_nir_events
  • SIAL_MOH_nnpac_events
  • SIAL_MOH_pfhd_events
  • SIAL_MOH_pharm_events
  • SIAL_MOH_primhd_events
  • SIAL_MOJ_courtcase_events
  • SIAL_MSD_T2_events
  • SIAL_MSD_T3_events
  • SIAL_POL_offender_events
  • SIAL_POL_victim_events

SIAL tables (can be found in SQL Management Studio under IDI_Sandpit -> target schema -> Tables)-

  • SIAL_MSD_T1_events
  • SIAL_MOE_school_events

SIAL Supporting tables(can be found in SQL Management Studio under IDI_Sandpit -> target schema -> Tables)-

  • inflation_index
  • moe_school_decile_pricing
  • moe_ter_fundingrates
  • moh_b4sc_pricing
  • moh_primhd_pu_pricing
  • moh_pu_pricing
  • moj_offence_to_category_map
  • moj_offense_cat_pricing
  • cor_mmc_pricing

Uninstallation:

  1. Open uninstall_sial.sas under social_investment_analytical_layer\SIAL_dependencies folder in SAS EG
  2. Go to the uninstall_sial.sas script- At the top of the script there are 2 macro variables called targetschema & sial_code_path
  3. Change the targetschema to the project schema from which the SIAL components are to be removed (for example, the SIU schema is DL-MAA2016-15).
  4. Change the sial_code_path to the location where you stored the scripts on the network
  5. Run the uninstall.sas script.

Known Issues:

  1. IDI_Clean.moh_clean.mortality_registrations table structure has changed since the latest IDI refresh. The latest SIAL script has been modified to accommodate for this change. What this means is that if the SIAL is repointed to IDI_Clean_20161020, then the script 'MIX_mortality_events.sql' is bound to fail. You will need to obtain the earlier version of this SIAL script from Github and replace the new script with the older version.

  2. If the SIAL is to be run pointing to the IDI_Clean_20160715 refresh version, please note that you will have failures for 'MOH_nnpac_events.sql', 'MIX_mortality_events.sql' and 'CYF_client_events.sql'. The underlying IDI tables have changed the strucuture since this refresh, and the latest SIAL tables have also changed to reflect these underlying table-level changes. To circumvent these errors, use an older version of these scripts from Github or make the required changes to these 3 scripts manually by renaming the appropriate columns to reflect what is available in this IDI refresh version. For more guidance/help, contact us at [email protected].

Getting Help:

For more help/guidance in running the SIAL, email [email protected]

Tracking number: SIU-2017-0139

social_investment_analytical_layer's People

Contributors

conrad-mac avatar ernestynne avatar kylie-reiri avatar simon-anasta avatar vinaybenny avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

social_investment_analytical_layer's Issues

Corrections cost logic incorrect

The code is not taking into account the fact that there are different daily corrections costs each year. The join will need to be adjusted to take into account the years.

sole trader, partnership, shareholder and rental losses counted as profits

The SIAL income table counts takes abs() of sole trader, partnership, shareholder and rental net profits. This means losses are counted as profits.

Furthermore, there are some values of suspiciously large magnitude (e.g. greater than 100 million). As no individual is likely to have profits or losses that are so large within a single year a filter should be applied to automatically sanitize such extreme values.

SIAL_HNZ_register_events doesn't show all applications

This is because an inner join is applied between housing application tables and register exit. This fetches only those records that are either housed or exitted. The applications still on register does not appear. This needs to be changed to a left join.

WFF double count

Those who are not aware may double count Working for Families costs because these are captured in the IRD income table as well as the MSD T2 table. Would recommend removing events with service code '064' from the T2 SIAL table so that there is no risk of people double counting working for Families. This would be consistent with how T1 main benefit costs are retrieved from IRD information.

SIAL prefixes

Folder names all have SIAL_ prefix. This is redundant and doesn't follow current code guidelines (which is understandable as this repo was created before guidelines were created). Something to think about for next update. Will require some code change where code uses filepaths.

README doesn't mention expected output

The readme needs to mention the expected output from running the main_sial.sas script and where to look for the outputs views and tables. The same needs to be done for uninstall_sial as well.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.