CSR-Call-Scoring

DataLoader_fns.py: contains the functions needed to create a dataloader object (saving the vocabulary, padding reviews, collating batches, and looking up the vocabulary indices of a sentence)
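
The helpers described above might look roughly like the following. This is a minimal pure-Python sketch, not the repository's code: the function names, the `<pad>` and `<unk>` tokens, and their reserved indices are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical special tokens; the real vocabulary may use different ones.
PAD, UNK = "<pad>", "<unk>"


def build_vocab(sentences: List[List[str]]) -> Dict[str, int]:
    """Map each distinct token to an integer index, reserving 0 and 1."""
    vocab = {PAD: 0, UNK: 1}
    for sentence in sentences:
        for token in sentence:
            # Assign the next free index only if the token is new.
            vocab.setdefault(token, len(vocab))
    return vocab


def sentence_to_indices(sentence: List[str], vocab: Dict[str, int]) -> List[int]:
    """Look up each token, falling back to the unknown-token index."""
    return [vocab.get(token, vocab[UNK]) for token in sentence]


def pad_batch(batch: List[List[int]], pad_index: int = 0) -> Tuple[List[List[int]], List[int]]:
    """Right-pad every sequence to the longest length in the batch."""
    lengths = [len(seq) for seq in batch]
    max_len = max(lengths)
    padded = [seq + [pad_index] * (max_len - len(seq)) for seq in batch]
    return padded, lengths
```

Returning the original lengths alongside the padded batch lets a downstream encoder ignore the padding positions, which is a common reason for doing the padding inside a collate step.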

DatasetClasses.py: contains the classes that define the Yelp dataset and the Call Transcripts dataset

Inference_fns.py: contains the functions used for inference (computing accuracy, predicting the label of a new data point)
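
As an illustration of the accuracy computation for a binary classifier, here is a small sketch. It assumes the model outputs one probability per example and that 0.5 is the decision threshold; neither detail is confirmed by the repository, and `binary_accuracy` is a hypothetical name.

```python
from typing import Sequence


def binary_accuracy(probabilities: Sequence[float],
                    labels: Sequence[int],
                    threshold: float = 0.5) -> float:
    """Fraction of examples whose thresholded prediction matches the label."""
    if len(probabilities) != len(labels):
        raise ValueError("probabilities and labels must have the same length")
    if len(labels) == 0:
        raise ValueError("cannot compute accuracy on an empty set")
    # Threshold each probability into a 0/1 prediction (assumed cutoff).
    predictions = [1 if p >= threshold else 0 for p in probabilities]
    correct = sum(1 for pred, label in zip(predictions, labels) if pred == label)
    return correct / len(labels)
```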

MainCalls.py: builds the train, test, and dev datasets from call transcripts, saves the vocabulary, creates the dataloaders, builds the initial word embeddings matrix, trains the model, runs it on the test set and reports accuracies, and saves the final model after training

MainYelp.py: builds the train, test, and dev datasets from the Yelp dataset, saves the vocabulary, creates the dataloaders, builds the initial word embeddings matrix, trains the model, runs it on the test set and reports accuracies, and saves the final model after training

MainCallsInference.py: loads the saved model and uses it to predict whether a new call is good or bad (the command-line argument is the name of the file to run inference on)
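
A script that takes a single file name on the command line could be structured along these lines. This is only a sketch: the argument name `transcript_file`, the 0.5 threshold, and the choice that a higher probability means "good" are assumptions, and model loading is left out because the saved-model format is not described here.

```python
import argparse
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse the one positional argument: the transcript file to score."""
    parser = argparse.ArgumentParser(
        description="Predict whether a call transcript is good or bad."
    )
    # Hypothetical argument name; the script only documents "the file name".
    parser.add_argument("transcript_file", help="file to run inference on")
    return parser.parse_args(argv)


def label_from_probability(probability: float, threshold: float = 0.5) -> str:
    """Turn a classifier probability into a label (assumed mapping)."""
    return "good" if probability >= threshold else "bad"
```

Passing `argv` explicitly keeps the parser easy to exercise without touching `sys.argv`; when `argv` is `None`, `argparse` reads the real command line.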

Models.py: contains the class definitions for the encoder and binary classifier PyTorch models

Preprocessing.py: contains functions for call transcript preprocessing (takes in call transcript and returns an array of cleaned tokenized sentences)
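
To make the input and output shape concrete, a transcript-to-sentences step could be sketched as follows. The cleaning rules shown (lowercasing, splitting on sentence-ending punctuation, keeping only letters, digits, and apostrophes) are assumptions for illustration; the actual rules in Preprocessing.py may differ, and `preprocess_transcript` is a hypothetical name.

```python
import re
from typing import List


def preprocess_transcript(transcript: str) -> List[List[str]]:
    """Split a transcript into sentences, then lowercase and tokenize each."""
    # Split after '.', '!' or '?' when followed by whitespace.
    raw_sentences = re.split(r"(?<=[.!?])\s+", transcript.strip())
    cleaned: List[List[str]] = []
    for sentence in raw_sentences:
        # Keep runs of lowercase letters, digits, and apostrophes as tokens.
        tokens = re.findall(r"[a-z0-9']+", sentence.lower())
        if tokens:  # drop sentences that are empty after cleaning
            cleaned.append(tokens)
    return cleaned
```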

TrainModel.py: contains the function that trains the model and reports accuracy and loss at each epoch

Yelp files:

dataset_dev.json: JSON file for the dev set of Yelp reviews. Each record contains the preprocessed review body and its classification.

dataset_test.json: JSON file for the test set of Yelp reviews. Each record contains the preprocessed review body and its classification.

dataset_train.json: JSON file for the train set of Yelp reviews. Each record contains the preprocessed review body and its classification.

yelp_preprocessing.py: Preprocesses the Yelp dataset. Takes the original Yelp dataset and creates dev, test, and train dataset JSON files from a subset. Cleans each review as it makes the dataset. Also reports statistics on the subset of the dataset used. NOTE: this script is not part of the model pipeline. It should be run to produce the three JSON files listed above after any changes are made. This script requires yelp_academic_dataset_review.json, which contains the full dataset of Yelp reviews. This can be downloaded from https://www.kaggle.com/yelp-dataset/yelp-dataset.
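
The train/dev/test split step can be pictured with the sketch below. The split fractions, the seeded shuffle, and the function name `split_dataset` are assumptions; the script's actual subset size and split method are not stated here.

```python
import random
from typing import Dict, List, Sequence


def split_dataset(records: Sequence[dict],
                  dev_fraction: float = 0.1,
                  test_fraction: float = 0.1,
                  seed: int = 0) -> Dict[str, List[dict]]:
    """Shuffle records reproducibly and cut them into dev, test, and train."""
    shuffled = list(records)  # copy so the caller's data is left untouched
    random.Random(seed).shuffle(shuffled)
    n_dev = int(len(shuffled) * dev_fraction)
    n_test = int(len(shuffled) * test_fraction)
    return {
        "dev": shuffled[:n_dev],
        "test": shuffled[n_dev:n_dev + n_test],
        "train": shuffled[n_dev + n_test:],
    }
```

Each of the three lists could then be written out with `json.dump` to produce files such as the ones listed above.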
