Welcome to the repository for synthetic data generation based on New York's SPARCS Hospital Discharge Records). Our goal here is to showcase how synthetic data can enhance the capabilities of machine learning models.
Synthetic data is a game-changer in the realm of machine learning. By generating high-quality synthetic data, we can train large-scale ML systems which, interestingly, can often perform more effectively on real-world data compared to models trained and tested solely on actual data. This approach minimizes overfitting, diversifies the training regime, and can provide robustness to the machine learning systems, making them more adaptable to a variety of scenarios.
Item | Status |
---|---|
Synthetic Data Generation Models | โ๏ธ Completed |
Data Specification Paper | โณ In Progress |
Risk Prediction Paper | โณ In Progress |
This Jupyter notebook walks you through the initial steps in our data preprocessing pipeline. Here, we detail the process of cleaning the SPARCS Hospital Discharge Records, ensuring data quality and consistency.
In this notebook, we delve into encoding techniques tailored for the dataset. Encoding is crucial in preparing the data for synthetic data generation, ensuring that the generated data maintains properties similar to the original.
A comprehensive guide that takes you step-by-step through the process of synthetic data generation for the NY SPARCS data. Ideal for both beginners and experienced practitioners.
New York's SPARCS Hospital Discharge Record data from 2009 to 2021 can be obtained from the Health Data NY Catalog.
This text file contains ICD-9 to ICD-10 mappings, an essential tool for understanding the conversion between these medical coding systems.
Updated geographical data linked with the NY SPARCS records. This dataset provides details on Operating Certificate Numbers, assisting in the geographic location mapping of hospitals and care facilities.
This project is licensed, ensuring open source availability while protecting against unauthorized use or modification. Please take a look at the LICENSE file for a detailed description of terms and conditions.
Contributions are what make the open-source community such an inspiring place for discovery, learning, and innovation. We welcome your contributions! Whether it's fixing bugs, enhancing documentation, proposing new features, or helping out in any other way, every contribution counts.
We thank the New York Department of Health for releasing these datasets. Note: The New York State Department of Health makes no representation, warranty or guarantee relating to the data or analyses derived from these data. For more information on the New York Department of Health Data Use Policy, please see their statement.