EDS-Tuto

About

This tutorial introduces some of the issues that arise when analyzing real-world data made available for research in clinical data warehouses. It is aimed at data scientists who have mastered the basics of Python programming and data analysis. The tutorial is organized as a series of small exercises followed by a final project. The small exercises each illustrate a specific issue, whereas the final project mimics an end-to-end research study of the kind that may be reported in a scientific article.

The data are fake, so this project can be shared freely without impacting patients’ privacy. A fake data generator is provided and can be tuned to illustrate various use cases. Its development was loosely inspired by the characteristics and issues observed while analyzing data from the Greater Paris University Hospitals.
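To give an idea of what "tunable" can mean for a fake data generator, here is a minimal sketch using only the Python standard library. It is not the generator shipped with this project: the function name generate_fake_patients, its parameters, and the record fields are all hypothetical and chosen for illustration only. The missing_rate parameter controls how often a birth date is left empty, which is one simple way to mimic incomplete fields.

```python
import random
from datetime import date, timedelta


def generate_fake_patients(n_patients, missing_rate=0.1, seed=0):
    """Return a list of fake patient records (hypothetical schema).

    missing_rate is the probability that a record has no birth date.
    """
    rng = random.Random(seed)  # seeded so that runs are reproducible
    start = date(1930, 1, 1)
    span_days = (date(2010, 12, 31) - start).days
    patients = []
    for patient_id in range(n_patients):
        # Draw a birth date uniformly between 1930-01-01 and 2010-12-31.
        birth_date = start + timedelta(days=rng.randint(0, span_days))
        # Blank out the birth date with probability missing_rate.
        if rng.random() < missing_rate:
            birth_date = None
        patients.append(
            {
                "patient_id": patient_id,
                "gender": rng.choice(["f", "m"]),
                "birth_date": birth_date,
            }
        )
    return patients


patients = generate_fake_patients(1000, missing_rate=0.2, seed=42)
n_missing = sum(p["birth_date"] is None for p in patients)
print(f"{n_missing} of {len(patients)} records lack a birth date")
```

Raising or lowering missing_rate changes how severe the data quality problem is, and fixing the seed makes the same dataset reappear from one run to the next.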

The 2022 session for CentraleSupelec used version 0.0.1.

Getting started

Environment and kernel creation

Python, JupyterLab and an environment manager are recommended. You may choose Anaconda, for instance.

First, clone the project locally: git clone https://github.com/aphp/edstuto.git

If you use Conda as an environment manager, create a new Python environment with the required packages:

  1. conda create -n eds-tuto python=3.7
  2. conda activate eds-tuto
  3. pip install -r requirements.txt

Create and name a Jupyter kernel linked to this virtual environment:

  4. python -m ipykernel install --user --name eds_tutorial

A kernel named eds_tutorial is now available in JupyterLab.

Start JupyterLab using:

  5. jupyter lab

JupyterLab will open automatically in your browser.
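If something does not work after the steps above, a quick sanity check can be run from the eds-tuto environment. The sketch below uses only the standard library; the helper names missing_modules and check_environment are hypothetical, and the modules it checks (ipykernel, jupyterlab) are assumptions based on the commands above, not a reading of requirements.txt.

```python
import importlib.util
import sys


def missing_modules(module_names):
    """Return the top-level module names that cannot be found for import."""
    return [
        name for name in module_names if importlib.util.find_spec(name) is None
    ]


def check_environment(module_names, min_python=(3, 7)):
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []
    if sys.version_info < min_python:
        wanted = ".".join(str(part) for part in min_python)
        problems.append(f"Python {wanted} or newer is expected")
    for name in missing_modules(module_names):
        problems.append(f"module not found: {name}")
    return problems


# Assumed module list; adapt it to the content of requirements.txt.
problems = check_environment(["ipykernel", "jupyterlab"])
for problem in problems:
    print(problem)
if not problems:
    print("Environment looks ready.")
```

The check only tells whether the modules can be located by the current interpreter; it does not verify their versions.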

NB: For VS Code users, to see the plots clearly, it is recommended to enable the Theme Matplotlib Plots option under Settings > Extensions > Jupyter.

Scientific libraries installation

The following scientific libraries, developed in the context of the Paris clinical data warehouse, may also be used to help solve some of the exercises:

  • eds-scikit: a set of tools to assist data scientists working on a clinical data warehouse (structured data).
  • edsnlp: a set of spaCy components that are used to extract information from clinical notes written in French (unstructured data).
  • edsteva: a set of tools to measure indicators describing data quality and its temporal variation (quality indicators).

Acknowledgement

We would like to thank Assistance Publique – Hôpitaux de Paris and the AP-HP Foundation for funding this project.

Contributors

aremaki
