
generic-names's Introduction

Install Anaconda, then create the virtual environment with the packages this project requires by running the following command:
conda env create --file environment.yml

Activate the environment by running the following command:
activate generic-names

To test the program according to the specification, run the following command
python main.py

To run a training session on the full dataset
python src/modelling/train.py

To view how the hyperparameters were selected, run the below (this will take about 10 to 15 minutes)
python src/modelling/tune_hyperparameters.py

To generate a plot of the precision vs recall curve, run the below
python src/modelling/tune_decision_threshold.py

In the end, a decision threshold of 0.2 was selected since it appeared to give the best trade-off between precision and recall.
At this threshold, precision is about 50% to 60% and recall is about 60% to 70%, depending on the train/test split selected.
We therefore capture approximately two-thirds of all the generic names while keeping false positives relatively low,
especially considering how imbalanced the dataset is. The threshold can be tuned depending on what matters more:
minimising false positives or capturing as many of the generic names as possible.

To view a sample plot of the precision recall curve, look at reports/decisionthreshold.png
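
To make the trade-off concrete, here is a minimal sketch of how precision and recall change with the decision threshold. It is not the code in src/modelling/tune_decision_threshold.py. The function name precision_recall_at, the labels and the scores are all made up for illustration, and it assumes the model outputs a probability that a name is generic.

```python
def precision_recall_at(y_true, scores, threshold):
    """Return (precision, recall) when scores >= threshold are predicted positive."""
    predicted = [s >= threshold for s in scores]
    tp = sum(1 for p, t in zip(predicted, y_true) if p and t == 1)
    fp = sum(1 for p, t in zip(predicted, y_true) if p and t == 0)
    fn = sum(1 for p, t in zip(predicted, y_true) if not p and t == 1)
    # Guard against division by zero when nothing is predicted or nothing is positive.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical data: 1 = generic name, 0 = non-generic name.
y_true = [1, 0, 0, 1, 0, 0, 1, 0, 0, 0]
scores = [0.9, 0.25, 0.1, 0.3, 0.05, 0.6, 0.15, 0.02, 0.22, 0.01]

for threshold in (0.2, 0.5):
    p, r = precision_recall_at(y_true, scores, threshold)
    print(f"threshold={threshold}: precision={p:.2f}, recall={r:.2f}")
```

On this toy data, lowering the threshold from 0.5 to 0.2 raises recall at the cost of precision, which is the same kind of trade-off described above.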

As an exploration step, I also plotted the frequencies of all the characters in the generic and non-generic classes (the frequencies
were normalised by the number of samples in each class). The plot suggests that there are some differences between these frequency
distributions. See reports/frequency.png for the plot.

To generate the plot again, run the below command
python src/data/explore.py
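
As a rough sketch of the normalisation mentioned above, the snippet below counts characters per class and divides by the number of samples in that class. This is one reading of "normalised", not the code in src/data/explore.py. The function name normalised_char_frequencies and the sample names are hypothetical.

```python
from collections import Counter


def normalised_char_frequencies(names):
    """Average number of occurrences of each character per name in the class."""
    counts = Counter()
    for name in names:
        counts.update(name.lower())
    n = len(names)
    # Dividing by the class size makes classes of different sizes comparable.
    return {ch: c / n for ch, c in counts.items()} if n else {}


# Hypothetical samples, for illustration only.
generic = ["test", "admin"]
non_generic = ["alice", "bob", "carol"]

generic_freq = normalised_char_frequencies(generic)
non_generic_freq = normalised_char_frequencies(non_generic)

for ch in sorted(set(generic_freq) | set(non_generic_freq)):
    print(ch, round(generic_freq.get(ch, 0.0), 2), round(non_generic_freq.get(ch, 0.0), 2))
```

Without the division by class size, the larger class would dominate the plot simply because it has more names.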
