pes_match's Introduction

ONS PES Matching Pipeline Scripts

A Python pipeline for matching census and post-enumeration survey (PES) data deterministically.

For additional information consider reading:

Setup instructions

Installing Python and an IDE

We recommend using the Spyder IDE, which is available through Anaconda Navigator. Anaconda can be installed here, and further instructions on installing Spyder can be found here.

Using this code

Once Python is installed, download this repository and execute the command pip install -r requirements.txt to install/update the required packages for this repository.

The aim of this repository is to provide a set of functions for cleaning and then matching census and PES data. PES_MATCH gives matchers the tools to automatically match records at the chosen level of geography and then resolve conflicts/difficult cases using the Clerical Resolution Online Widget. Associative matching can also be implemented to assist in finding all possible matches between census and PES households. Once all methods have been applied at a single level of geography, matches can be combined and residuals collected for the next stage of matching.
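To make the idea of deterministic matching concrete, here is a minimal sketch in plain Python. It is not the repository's implementation: the function name match_on_key, the record layout and the field names (id, forename, surname, dob, ea) are all assumptions made for illustration. Records that agree on every field of a matchkey become candidate pairs; one-to-one pairs are accepted automatically, while pairs involving a record with several candidates are set aside as conflicts, which is the kind of case that would go to clerical resolution.

```python
from collections import Counter, defaultdict


def match_on_key(census, pes, key_fields):
    """Match two lists of records on one matchkey (a tuple of field names).

    Returns (unique, conflicts), both lists of (census_id, pes_id) pairs.
    Hypothetical helper for illustration only.
    """
    # Index PES record ids by their matchkey value.
    pes_index = defaultdict(list)
    for rec in pes:
        pes_index[tuple(rec[f] for f in key_fields)].append(rec["id"])

    # Every census/PES pair that agrees on all key fields is a candidate.
    candidates = []
    for rec in census:
        key = tuple(rec[f] for f in key_fields)
        for pes_id in pes_index.get(key, []):
            candidates.append((rec["id"], pes_id))

    # A pair is accepted only if both records appear in exactly one candidate.
    census_counts = Counter(c for c, _ in candidates)
    pes_counts = Counter(p for _, p in candidates)
    unique = [(c, p) for c, p in candidates
              if census_counts[c] == 1 and pes_counts[p] == 1]
    conflicts = [pair for pair in candidates if pair not in unique]
    return unique, conflicts


# Made-up records; "ea" stands for an enumeration area code.
census = [
    {"id": "c1", "forename": "ANN", "surname": "LEE", "dob": "1990-01-01", "ea": "E01"},
    {"id": "c2", "forename": "BOB", "surname": "RAY", "dob": "1985-05-05", "ea": "E01"},
    {"id": "c3", "forename": "SAM", "surname": "FOX", "dob": "2000-02-02", "ea": "E02"},
]
pes = [
    {"id": "p1", "forename": "ANN", "surname": "LEE", "dob": "1990-01-01", "ea": "E01"},
    {"id": "p2", "forename": "BOB", "surname": "RAY", "dob": "1985-05-05", "ea": "E01"},
    {"id": "p3", "forename": "BOB", "surname": "RAY", "dob": "1985-05-05", "ea": "E01"},
]

unique, conflicts = match_on_key(census, pes, ("forename", "surname", "dob", "ea"))
print(unique)     # accepted one-to-one matches
print(conflicts)  # candidate pairs needing a clerical decision
```

In this toy data, c1 and p1 form a clean match, c2 has two PES candidates and is held back as a conflict, and c3 has no candidate at all, so it would remain a residual.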

An example of a typical matching pipeline is provided in the pipeline/ directory, although users may wish to apply the matching methods in their own chosen order. Files can be executed from the command line using python file_name.py, or within Spyder using the built-in run function.

Pipeline running order

  • Step 1: Run scripts in pipeline/processing/ to clean raw files
  • Step 2: Update the matchkeys in pipeline/X_Stage_X/, then run the scripts in order. Two stages are included, but more can be added if needed
  • Step 3 (Optional): Run the clerical search (pipeline/Clerical_Search/) to find the remaining matches within the chosen level of geography, e.g. postcode or enumeration area
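The running order above can be sketched as a loop over stages, where each stage matches what it can and passes the unmatched residuals on to the next. This is an illustrative outline only, assuming that later stages use a wider level of geography; run_stage, the field names (name, dob, ea, district) and the sample records are hypothetical and do not come from the repository.

```python
def run_stage(census, pes, matchkeys):
    """Apply each matchkey in order and return (matches, census_residuals, pes_residuals).

    Hypothetical helper: a matchkey is a tuple of field names, and only keys
    that identify exactly one record on each side are accepted.
    """
    matches = []
    for key_fields in matchkeys:
        census_by_key, pes_by_key = {}, {}
        for rec in census:
            census_by_key.setdefault(tuple(rec[f] for f in key_fields), []).append(rec)
        for rec in pes:
            pes_by_key.setdefault(tuple(rec[f] for f in key_fields), []).append(rec)

        new = [
            (c_recs[0]["id"], pes_by_key[key][0]["id"])
            for key, c_recs in census_by_key.items()
            if len(c_recs) == 1 and len(pes_by_key.get(key, [])) == 1
        ]
        matches.extend(new)

        # Remove matched records so later matchkeys only see residuals.
        matched_c = {c for c, _ in new}
        matched_p = {p for _, p in new}
        census = [r for r in census if r["id"] not in matched_c]
        pes = [r for r in pes if r["id"] not in matched_p]
    return matches, census, pes


census_left = [
    {"id": "c1", "name": "ANN LEE", "dob": "1990-01-01", "ea": "E01", "district": "D1"},
    {"id": "c2", "name": "BOB RAY", "dob": "1985-05-05", "ea": "E01", "district": "D1"},
    {"id": "c3", "name": "SAM FOX", "dob": "2000-02-02", "ea": "E02", "district": "D1"},
]
pes_left = [
    {"id": "p1", "name": "ANN LEE", "dob": "1990-01-01", "ea": "E01", "district": "D1"},
    {"id": "p2", "name": "BOB RAY", "dob": "1985-05-05", "ea": "E02", "district": "D1"},
    {"id": "p3", "name": "JOE DOE", "dob": "1970-07-07", "ea": "E02", "district": "D1"},
]

# One list of matchkeys per stage; stage 2 widens the geography.
stages = [
    [("name", "dob", "ea")],
    [("name", "dob", "district")],
]

all_matches = []
for matchkeys in stages:
    stage_matches, census_left, pes_left = run_stage(census_left, pes_left, matchkeys)
    all_matches.extend(stage_matches)

print(all_matches)
print([r["id"] for r in census_left], [r["id"] for r in pes_left])
```

Here the first stage matches c1 to p1 within the enumeration area, and the second stage picks up c2 and p2, who agree on name and date of birth but were recorded in different enumeration areas. The records still left over (c3 and p3) are the ones an optional clerical search would examine.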

Directories and files of note

Descriptions of project directories and other significant files:

  • CROW/ - contains the code and config files for the Clerical Resolution Online Widget
  • Data/ - contains mock data and stores cleaned data, clerical inputs/outputs, checkpoint files and final outputs
  • library/ - contains functions and a configurable parameter file
  • pipeline/ - scripts forming the record matching pipeline. Matchkeys can be updated within the scripts

pes_match's People

Contributors

c-tomlin

pes_match's Issues

Matchkey Functions

Matchkeys are currently defined inside functions.
They are updated within the function, and the function is then called on two datasets at the chosen level of geography.
We want to take the matchkeys out of the functions and execute the matchkey code directly from the main pipeline.
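One possible shape for this change, offered as a sketch rather than the project's planned design: declare the matchkeys as plain data in the pipeline script and pass each one to a single generic matching function. The names MATCHKEYS and apply_matchkey, and the record fields, are hypothetical. Each matchkey here is a small function that derives a comparison key from a record, which also allows derived fields such as a first initial.

```python
# Hypothetical matchkeys, declared in the pipeline script instead of inside a function.
MATCHKEYS = [
    # Exact agreement on forename, surname and date of birth.
    lambda r: (r["forename"], r["surname"], r["dob"]),
    # Looser key: first initial, surname and date of birth.
    lambda r: (r["forename"][:1], r["surname"], r["dob"]),
]


def apply_matchkey(census, pes, key):
    """Return one-to-one (census_id, pes_id) pairs that agree on key(record)."""
    def index(records):
        out = {}
        for rec in records:
            out.setdefault(key(rec), []).append(rec["id"])
        return out

    census_idx, pes_idx = index(census), index(pes)
    return [
        (ids[0], pes_idx[k][0])
        for k, ids in census_idx.items()
        if len(ids) == 1 and len(pes_idx.get(k, [])) == 1
    ]


census = [
    {"id": "c1", "forename": "ANN", "surname": "LEE", "dob": "1990-01-01"},
    {"id": "c2", "forename": "ROBERT", "surname": "RAY", "dob": "1985-05-05"},
]
pes = [
    {"id": "p1", "forename": "ANN", "surname": "LEE", "dob": "1990-01-01"},
    {"id": "p2", "forename": "ROB", "surname": "RAY", "dob": "1985-05-05"},
]

# The main pipeline simply loops over the declared matchkeys.
results = [apply_matchkey(census, pes, key) for key in MATCHKEYS]
for pairs in results:
    print(pairs)
```

With this layout, adding or reordering matchkeys means editing one list in the pipeline script and leaves the matching function untouched. For brevity the sketch runs every matchkey against the full datasets and does not remove already matched records between keys.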
