Artificial Data Plug and Play

Get up and running with experimenting on artificial NHS data!

This material is maintained by the NHS England Data Science team.

See our other work here: NHS England Analytical Services.

To contact us, raise an issue on GitHub or send us an email, and we will respond promptly.

What is artificial data?

Artificial data sets provide users with large volumes of data that share some of the characteristics of real data while protecting patient confidentiality. They are designed to model the structure of real data but are completely artificial: they do not contain any actual patient records. We are piloting this new service with a limited number of artificial data sets.

You can find out more about the pilot on the NHS website.

What is this repo for?

This repo contains some example code for getting started with using artificial data with minimal setup.

It was created using the rap-package-template, which provides a neat way to create new repositories for Reproducible Analytical Pipelines.

What does the repo contain?

The repo contains the following files and directories:

|- sql                  # Code for interacting with SQL
|- src                  # Source code for data ingestion, cleaning, processing, etc
|- templates            # Templates for excel reporting
|- tests                # Test modules
|- pyproject.toml       # Configuration
|- plug_and_play.ipynb  # Plug and play notebook
|- requirements.txt     # Python dependencies to be installed via pip
|- ...                  # Additional repo files (e.g. .gitignore)

Note: because this repo was created from the rap-package-template there are a number of files / folders that persist from that template. These have been left in the repo so that you can fork the repo and adapt to your own needs!

For the plug and play tutorial, the main file you'll be interacting with is plug_and_play.ipynb. See below for instructions on how to get set up to run the tutorial.

How do I get started?

To run the tutorial in an environment which is provisioned out of the box (such as Google Colab or GitHub Codespaces), see Quick start. To set it up on your own machine, see Full setup.

Quick start

The easiest way to run the tutorial is in an environment which is provisioned out of the box. Clicking one of the buttons below will open the repo in the respective environment with all the dependencies set up, so you can just get coding!

Open In Codespaces Open In Colab

Full setup

Prerequisites:

  • A bash terminal (although similar instructions will work in PowerShell)
  • Python >= 3.10
  • An IDE or text editor (such as VS Code or PyCharm)
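
As a rough illustration, the prerequisites above can be checked from Python before you start. This is only a sketch: the function names `python_is_supported` and `missing_tools` are made up for this example and are not part of the repo, and checking for `git` is an assumption based on the clone step below rather than a listed prerequisite.

```python
import shutil
import sys

# Minimum version taken from the prerequisites list (Python >= 3.10).
MINIMUM_PYTHON = (3, 10)


def python_is_supported(version=sys.version_info[:2], minimum=MINIMUM_PYTHON):
    """Return True if the (major, minor) version meets the minimum."""
    return tuple(version) >= tuple(minimum)


def missing_tools(tools=("git",)):
    """Return the command-line tools that cannot be found on PATH.

    Checking for git is an assumption: it is needed for `git clone`,
    but it is not listed explicitly in the prerequisites.
    """
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    print("Python version OK:", python_is_supported())
    print("Missing tools:", missing_tools() or "none")
```

If the version check fails, install a newer Python before creating the virtual environment.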

Open a terminal and run the following steps:

  1. Navigate to a directory you want to create the tutorial repo in (using cd DESTINATION_DIRECTORY)
  2. Clone the repo using git clone https://github.com/NHSDigital/artificial-data-plug-and-play.git
  3. Open the repo in the terminal using cd artificial-data-plug-and-play and create a virtual environment via python -m venv .venv (a virtual environment is optional, but recommended)
  4. Activate the environment and install the requirements using . .venv/bin/activate && pip install -r requirements.txt (the leading dot is the shell's source command)
  5. (Optional) Install Jupyter via pip install jupyter. This will allow you to use Jupyter notebooks through the classic web interface.
  6. Open the tutorial
    • If you installed Jupyter in the previous step, run jupyter notebook plug_and_play.ipynb
    • Alternatively, you can open the notebook in your IDE of choice (for example using VS Code)

You should now be ready to run the plug and play!
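
If you would rather script steps 3 to 5 than type them, the sketch below does the same thing using only the Python standard library. It is a minimal example under a few assumptions: the helper names `venv_python` and `set_up` are hypothetical and not part of the repo, the repo has already been cloned, and the environment's own interpreter is called directly instead of activating the environment (which also sidesteps the difference between bash and PowerShell activation).

```python
import subprocess
import sys
import venv
from pathlib import Path


def venv_python(venv_dir: Path, windows: bool = sys.platform == "win32") -> Path:
    """Return the path of the interpreter inside a virtual environment.

    Windows places executables in Scripts/, other platforms in bin/.
    """
    if windows:
        return venv_dir / "Scripts" / "python.exe"
    return venv_dir / "bin" / "python"


def set_up(repo_dir: Path, install_jupyter: bool = False) -> Path:
    """Create .venv inside repo_dir and install requirements.txt into it."""
    venv_dir = repo_dir / ".venv"
    venv.create(venv_dir, with_pip=True)  # equivalent to: python -m venv .venv
    python = venv_python(venv_dir)
    # Running pip through the environment's interpreter installs into the
    # environment without needing to activate it first.
    subprocess.run(
        [str(python), "-m", "pip", "install", "-r", str(repo_dir / "requirements.txt")],
        check=True,
    )
    if install_jupyter:  # optional step 5
        subprocess.run([str(python), "-m", "pip", "install", "jupyter"], check=True)
    return python
```

After cloning, calling set_up(Path("artificial-data-plug-and-play")) from the parent directory should leave you with the same environment as the manual steps. You still need to activate it, or select it as the interpreter in your IDE, before opening the notebook.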

See also

Here are some other related projects that are worth checking out:

  1. Reproducible Analytical Pipeline example which uses artificial HES data to create a simple stats publication
  2. Codebase to generate artificial data written for Databricks using Python / PySpark

Contributors

alistair-jones
