This project develops a resume classification system built on a fine-tuned BERT model. The model classifies resumes into predefined job categories and is trained on a dataset of 2400+ resumes. The project also includes a Streamlit web application for uploading a resume in PDF format and predicting its job category.
Accurate classification of resumes is crucial for efficient talent acquisition and human resource management. This project leverages a fine-tuned BERT model to classify resumes into specified job categories, providing a scalable solution integrated with a user-friendly web application.
- PDF Resume Upload: Upload resumes in PDF format.
- Resume Classification: Predict job categories using a fine-tuned BERT model.
- Web Application: Streamlit-based interface for easy interaction.
- Real-time Processing: Handle resume uploads and provide instant classification results.
Follow these steps to set up the project locally:

- Clone the repository::

    git clone https://github.com/dvtushar/Resume-Classifier.git
    cd Resume-Classifier

- Create a virtual environment::

    python -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate

- Install the required packages::

    pip install -r requirements.txt

To run the Streamlit web application:

- Activate the virtual environment::

    source venv/bin/activate  # On Windows: venv\Scripts\activate

- Start the Streamlit application::

    streamlit run app.py --server.enableXsrfProtection false

- Upload a PDF resume: open http://localhost:8501 in your browser, upload a PDF resume, and view the predicted job category.

The model training process involves the following steps:

- Data Preparation:

  - Convert PDFs to text using the PyPDF2 library.
  - Encode job categories using a label encoder.
  - Tokenize the text data using the BERT tokenizer.

- Fine-tuning BERT:

  - Use Hugging Face's Transformers library to fine-tune the BERT model on the dataset.
  - Use the AdamW optimizer and a learning rate scheduler.

- Training Loop:

  - Train the model for 3 epochs with gradient accumulation.
  - Evaluate the model on the validation and test datasets.

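The training loop with gradient accumulation might look roughly like the sketch below. It is not the project's actual code: the function name ``train`` and the default ``accum_steps=4`` are assumptions (the number of accumulation steps is not stated here), and each batch is assumed to be a dict that includes ``labels`` so that the model output carries a ``.loss``, as Hugging Face classification models do.

```python
def train(model, loader, optimizer, scheduler, epochs=3, accum_steps=4):
    """Fine-tune `model`, updating weights once every `accum_steps` batches.

    Returns the number of optimizer updates performed.
    `accum_steps=4` is an assumed value, not taken from the project.
    """
    updates = 0
    model.train()
    for _ in range(epochs):
        optimizer.zero_grad()
        for i, batch in enumerate(loader, start=1):
            # The model is expected to return an output object with a `.loss`
            # attribute when the batch contains a `labels` entry.
            loss = model(**batch).loss
            # Scale the loss so the accumulated gradient approximates the
            # mean over the group instead of the sum.
            (loss / accum_steps).backward()
            # Update after every full group, and once more at the end of the
            # epoch so a trailing partial group is not dropped.
            if i % accum_steps == 0 or i == len(loader):
                optimizer.step()
                scheduler.step()
                optimizer.zero_grad()
                updates += 1
    return updates
```

With PyTorch and Transformers installed, ``optimizer`` could be ``torch.optim.AdamW(model.parameters(), lr=2e-5)`` and ``scheduler`` the result of ``transformers.get_linear_schedule_with_warmup``, with ``num_training_steps`` set to the number of optimizer updates rather than the number of batches. The learning rate shown is a placeholder, not the value used in this project.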
The model achieved the following performance metrics:

- Validation Accuracy: 0.7882
- Validation F1 Score: 0.79
- Test Accuracy: 0.7828
- Test F1 Score: 0.78

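For reference, metrics of this kind can be computed as in the dependency-free sketch below. The averaging used for the reported F1 score is not stated, so support-weighted averaging is an assumption here, and the labels in the usage note are made up. In practice, ``sklearn.metrics.accuracy_score`` and ``sklearn.metrics.f1_score`` would be the usual choice.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def weighted_f1(y_true, y_pred):
    """Per-class F1, averaged with weights proportional to class support."""
    support = Counter(y_true)
    total = 0.0
    for label, n in support.items():
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        predicted = sum(p == label for p in y_pred)
        precision = tp / predicted if predicted else 0.0
        recall = tp / n
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        total += f1 * n
    return total / len(y_true)
```

For example, with the hypothetical labels ``y_true = ["HR", "HR", "IT", "IT"]`` and ``y_pred = ["HR", "IT", "IT", "IT"]``, ``accuracy`` returns 0.75 and ``weighted_f1`` returns 11/15 (about 0.733).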
The Streamlit web application (app.py) allows users to upload resumes and get predictions for job categories.
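To show a category name, the app has to map the model's output scores back to a label, which is the inverse of the label-encoding step used in training. The sketch below illustrates that step only. The category names and the helper ``decode_prediction`` are hypothetical, and the sorted class order assumes the encoder behaves like scikit-learn's ``LabelEncoder``.

```python
import math

# Hypothetical category names; the real label set comes from the training data.
CATEGORIES = ["HR", "ENGINEERING", "FINANCE"]

# Sorting mirrors scikit-learn's LabelEncoder, which assigns ids in sorted order.
CLASSES = sorted(set(CATEGORIES))
LABEL_TO_ID = {name: i for i, name in enumerate(CLASSES)}


def softmax(logits):
    """Convert raw scores to probabilities (shifted by the max for stability)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode_prediction(logits):
    """Return (category name, probability) for the highest-scoring class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return CLASSES[best], probs[best]
```

In an app like this one, ``logits`` would be the model's output for the text extracted from the uploaded PDF, and the returned name is what the interface would display.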