
Smoking Status Identification from Clinical Notes

Project Overview

This project is based on the National NLP Clinical Challenges (n2c2) and automatically determines patients' smoking status from clinical discharge records. It uses both machine learning and deep learning models, which are described in the accompanying project report and Jupyter Notebooks.
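
To make the shape of the task concrete, here is a minimal sketch of a keyword-rule baseline that maps the text of a note to a smoking-status label. It is not the approach used in the notebooks, which train machine learning and deep learning models. The label names, the keyword patterns, and the function name classify_smoking_status are all assumptions chosen for illustration.

```python
import re

# Hypothetical label set and keyword patterns, for illustration only.
# Order matters: specific patterns are checked before the generic "smoker" one.
RULES = [
    ("NON-SMOKER", re.compile(
        r"\b(non-?smoker|never smoked|denies (smoking|tobacco)"
        r"|no (history of )?(smoking|tobacco))\b", re.I)),
    ("PAST SMOKER", re.compile(
        r"\b(former smoker|ex-?smoker|quit smoking|stopped smoking)\b", re.I)),
    ("CURRENT SMOKER", re.compile(
        r"\b(current(ly)? (smoker|smokes|smoking)|continues to smoke"
        r"|still smok(es|ing))\b", re.I)),
    ("SMOKER", re.compile(r"\b(smoker|smoking|tobacco use)\b", re.I)),
]


def classify_smoking_status(note: str) -> str:
    """Return the label of the first rule whose pattern matches the note."""
    for label, pattern in RULES:
        if pattern.search(note):
            return label
    # No smoking-related keyword was found in the note.
    return "UNKNOWN"


if __name__ == "__main__":
    examples = [
        "Patient denies tobacco use.",
        "He quit smoking 10 years ago.",
        "Currently smokes one pack per day.",
        "Admitted for chest pain; discharged in stable condition.",
    ]
    for text in examples:
        print(classify_smoking_status(text), "<-", text)
```

A rule list like this is brittle (negation, misspellings, and family history all cause errors), which is one reason a learned model is the more natural fit for this task.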

Files

The repository includes the following files:

6001_Project_Report.pdf - A detailed project report providing background, methodology, results, and discussion.
Project_2_Smoking_Status_ALL_FINAL.ipynb - Jupyter Notebook for preprocessing data and training machine learning models.
Smoking_Status_Keras_Model.ipynb - Jupyter Notebook for applying a deep learning model using Keras.

Viewing the Files

To view the Jupyter Notebooks, you'll need an environment that supports .ipynb files. Here are a couple of options:

JupyterLab/Jupyter Notebook: If you have either one installed, navigate to the directory containing the files and run the command jupyter notebook (or jupyter lab for JupyterLab).
Google Colab: Upload the notebooks to Google Colab to view and run them in the cloud.
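
If neither option is available, a notebook can still be inspected as data, because an .ipynb file is a JSON document. The sketch below counts cells by type using only the Python standard library. It assumes the notebook uses the nbformat 4 layout (a top-level "cells" list whose entries carry a "cell_type" field); the function name count_cell_types is hypothetical.

```python
import json
from collections import Counter


def count_cell_types(notebook_json: str) -> dict:
    """Count cells by type in the JSON text of a notebook.

    Assumes the nbformat 4 layout: a top-level "cells" list whose
    entries each have a "cell_type" such as "code" or "markdown".
    """
    notebook = json.loads(notebook_json)
    cells = notebook.get("cells", [])
    return dict(Counter(cell["cell_type"] for cell in cells))
```

To apply it to one of the notebooks in this repository, read the file as text first, for example with pathlib.Path("Smoking_Status_Keras_Model.ipynb").read_text(encoding="utf-8"), and pass the result to count_cell_types.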

The PDF file can be viewed with any standard PDF reader.

Contributing

Feel free to fork this repository and contribute by:

Extending the existing models
Improving the methodologies
Fixing bugs or issues

Authors

Joan Mattle
Yuktha Penumala
Aizhan Uteubayeva

License

This project is open source and available under the MIT License.

Acknowledgments

Harvard Medical School and the National NLP Clinical Challenges (n2c2) for the inspiration and dataset.
All contributors and supporters of this project.
