This project is based on the National NLP Clinical Challenges (n2c2) and focuses on automatically determining the smoking status of patients from clinical discharge records. It involves the use of machine learning and deep learning models, detailed in the provided documentation and Jupyter Notebooks.
The repository includes the following files:
- `6001_Project_Report.pdf`: A detailed project report providing background, methodology, results, and discussion.
- `Project_2_Smoking_Status_ALL_FINAL.ipynb`: Jupyter Notebook for preprocessing data and training machine learning models.
- `Smoking_Status_Keras_Model.ipynb`: Jupyter Notebook for applying a deep learning model using Keras.
To view the Jupyter Notebooks, you'll need an environment that supports `.ipynb` files. Here are a couple of options:
- JupyterLab/Jupyter Notebook: If you have this installed, navigate to the directory containing the files and run `jupyter notebook`.
- Google Colab: Upload the notebooks to Google Colab to view and run them in the cloud.
The PDF file can be viewed with any standard PDF reader.
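If neither Jupyter nor Colab is at hand, the notebooks can still be skimmed, because an `.ipynb` file is plain JSON with a top-level `cells` list. The following is a minimal sketch using only the Python standard library; the helper name `summarize_notebook` is hypothetical and not part of this repository.

```python
import json
from pathlib import Path


def summarize_notebook(path):
    """Return (cell_type, first_line) pairs for each cell in an .ipynb file.

    Hypothetical helper for illustration; it is not part of this repository.
    """
    notebook = json.loads(Path(path).read_text(encoding="utf-8"))
    summary = []
    for cell in notebook.get("cells", []):
        source = cell.get("source", "")
        # The notebook format stores source as one string or a list of lines.
        text = "".join(source) if isinstance(source, list) else source
        lines = text.strip().splitlines()
        first_line = lines[0] if lines else ""
        summary.append((cell.get("cell_type", "unknown"), first_line))
    return summary
```

For example, calling `summarize_notebook("Smoking_Status_Keras_Model.ipynb")` from the repository directory should return one pair per cell, which gives a quick outline of the notebook without executing anything.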
Feel free to fork this repository and contribute by:
- Extending the existing models
- Improving the methodologies
- Fixing bugs or issues
- Joan Mattle
- Yuktha Penumala
- Aizhan Uteubayeva
This project is open source and available under the MIT License.
- Harvard Medical School and the National NLP Clinical Challenges (n2c2) for the inspiration and dataset.
- All contributors and supporters of this project.