Resume Classifier

This project develops a resume classification system built on a fine-tuned BERT model. The model classifies resumes into predefined job categories and is trained on a dataset of more than 2,400 resumes. The project also includes a Streamlit web application for uploading a resume in PDF format and predicting its job category.

Introduction
Features
Installation
Usage
Model Training and Evaluation

Introduction

Accurate classification of resumes is crucial for efficient talent acquisition and human resource management. This project leverages a fine-tuned BERT model to classify resumes into specified job categories, providing a scalable solution integrated with a user-friendly web application.

Features

PDF Resume Upload: Upload resumes in PDF format.
Resume Classification: Predict job categories using a fine-tuned BERT model.
Web Application: Streamlit-based interface for easy interaction.
Real-time Processing: Handle resume uploads and provide instant classification results.

Installation

Follow these steps to set up the project locally:

Clone the repository:

git clone https://github.com/dvtushar/Resume-Classifier.git
cd Resume-Classifier

Create a virtual environment::

python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

Install the required packages::
```
pip install -r requirements.txt
```

Usage

To run the Streamlit web application:

Activate the virtual environment::

 source venv/bin/activate  # On Windows: venv\Scripts\activate

Start the Streamlit application::

streamlit run app.py --server.enableXsrfProtection false

Upload a PDF resume: Open your browser and go to http://localhost:8501, upload a PDF resume, and view the predicted job category.

Model Training and Evaluation

The model training process involves the following steps:

Data Preparation::

Convert PDFs to text using the PyPDF2 library.
Encode job categories using a label encoder.
Tokenize the text data using the BERT tokenizer.
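
The three preparation steps above could look roughly like the sketch below. It is an illustration, not the project's actual code: the function names, the `bert-base-uncased` checkpoint, and the 512-token limit are assumptions. `encode_labels` is a plain-Python stand-in that behaves like scikit-learn's `LabelEncoder` (sorted class names mapped to 0..n-1), so the example has no extra dependency for that step.

```python
def pdf_to_text(pdf_path):
    """Extract the text of every page in a PDF and join it into one string."""
    # Imported lazily so the other helpers run without PyPDF2 installed.
    from PyPDF2 import PdfReader

    reader = PdfReader(str(pdf_path))
    # extract_text() may yield nothing for image-only pages, hence `or ""`.
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def encode_labels(categories):
    """Map category names to integer ids, using sorted class order."""
    classes = sorted(set(categories))
    index = {name: i for i, name in enumerate(classes)}
    return [index[name] for name in categories], classes


def tokenize(texts, checkpoint="bert-base-uncased", max_length=512):
    """Tokenize texts for BERT (checkpoint and max_length are assumed values)."""
    from transformers import BertTokenizerFast

    tokenizer = BertTokenizerFast.from_pretrained(checkpoint)
    return tokenizer(
        list(texts),
        truncation=True,          # cut resumes longer than max_length
        padding="max_length",     # pad shorter ones to the same length
        max_length=max_length,
        return_tensors="pt",      # PyTorch tensors
    )
```

The returned `classes` list is what lets the web application turn a predicted id back into a category name.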

Fine-tuning BERT::

Use Hugging Face's Transformers library to fine-tune the BERT model on the dataset.
Use the AdamW optimizer and a learning rate scheduler.
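
A minimal sketch of this setup follows. The source names AdamW and a learning rate scheduler but not their settings, so the linear schedule, the zero warmup steps, the learning rate of 2e-5, the accumulation factor of 4, the `bert-base-uncased` checkpoint, and both function names are assumptions chosen for illustration.

```python
import math


def count_optimizer_steps(num_batches, epochs, accumulation_steps):
    """Number of optimizer updates when gradients are accumulated.

    Assumes a leftover partial group of batches at the end of an epoch
    still triggers one update.
    """
    return math.ceil(num_batches / accumulation_steps) * epochs


def build_training_objects(num_labels, num_batches, epochs=3,
                           accumulation_steps=4, lr=2e-5,
                           checkpoint="bert-base-uncased"):
    """Create the model, an AdamW optimizer, and a linear LR scheduler."""
    # Imported lazily so count_optimizer_steps works without these packages.
    import torch
    from transformers import (BertForSequenceClassification,
                              get_linear_schedule_with_warmup)

    # A classification head with num_labels outputs is added on top of BERT.
    model = BertForSequenceClassification.from_pretrained(
        checkpoint, num_labels=num_labels)
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    total_steps = count_optimizer_steps(num_batches, epochs, accumulation_steps)
    # The learning rate decays linearly to zero over total_steps updates.
    scheduler = get_linear_schedule_with_warmup(
        optimizer, num_warmup_steps=0, num_training_steps=total_steps)
    return model, optimizer, scheduler
```

The scheduler is sized in optimizer updates rather than batches, which is why the accumulation factor enters the step count.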

Training Loop::

Train the model for 3 epochs with gradient accumulation.
Evaluate the model on validation and test datasets.
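
Gradient accumulation can be sketched as below. This is one possible loop, not necessarily the one used in the project: the accumulation factor of 4 and the function name `train` are assumptions. It expects a Hugging Face style model whose output has a `.loss` attribute and batches that are dicts of tensors including `labels`.

```python
def train(model, loader, optimizer, scheduler, epochs=3, accumulation_steps=4):
    """Train with gradient accumulation; return the number of optimizer updates."""
    updates = 0
    model.train()  # enable dropout and other training-time behavior
    for _ in range(epochs):
        optimizer.zero_grad()
        num_batches = len(loader)
        for i, batch in enumerate(loader, start=1):
            # batch holds input_ids, attention_mask, and labels.
            loss = model(**batch).loss
            # Scale the loss so a full group of accumulated gradients
            # averages over its batches instead of summing them.
            (loss / accumulation_steps).backward()
            # Update after every full group, and once more for any
            # leftover batches at the end of the epoch.
            if i % accumulation_steps == 0 or i == num_batches:
                optimizer.step()
                scheduler.step()
                optimizer.zero_grad()
                updates += 1
    return updates
```

Accumulating over several small batches approximates a larger batch size without the memory cost of holding it on the device at once.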

Results:

The model achieved the following performance metrics:

Validation Accuracy: 0.7882
Validation F1 Score: 0.79
Test Accuracy: 0.7828
Test F1 Score: 0.78
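
For reference, metrics of this kind can be computed from true and predicted label ids as sketched below. The source does not say which F1 averaging was used, so the macro average here is an assumption, and the function names are illustrative.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    pairs = list(zip(y_true, y_pred))
    scores = []
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class
        # never appears in either the labels or the predictions.
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

With imbalanced job categories, a weighted average would give a different number from the macro average, so the averaging mode is worth stating next to any reported F1.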

The Streamlit web application (app.py) allows users to upload resumes and get predictions for job categories.

Screenshot of the working application
