Find Similar Companies

About

Find Similar Companies is a project that finds the company names most similar to a given input. It combines general NLP techniques with machine learning.
The project is published on Hugging Face Spaces, and our production model is available on Hugging Face Models.
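
To make the task concrete, here is a minimal sketch that ranks candidate names by normalized Levenshtein similarity, one of the baseline methods listed in the Comparison section. It is not the project's production model; the function names (`levenshtein`, `similarity`, `most_similar`), the lowercasing step, and the sample company names are assumptions made for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]; 1.0 means identical after lowercasing and trimming."""
    a, b = a.lower().strip(), b.lower().strip()
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def most_similar(query: str, candidates: list[str], top_k: int = 3) -> list[tuple[str, float]]:
    """Return the top_k candidates with the highest similarity to the query."""
    scored = [(name, similarity(query, name)) for name in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]


if __name__ == "__main__":
    # Made-up names, used only to show the call shape.
    names = ["Acme Corp", "Acme Corporation", "Globex LLC", "Initech Inc"]
    for name, score in most_similar("ACME Corp.", names):
        print(f"{name}: {score:.3f}")
```

Character-level edit distance handles typos and punctuation but knows nothing about meaning, which is one plausible reason the embedding-based methods in the Comparison section score higher.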

Installation

To run inference, first install the required dependencies:

pip install -r requirements.txt

Inference

To launch inference, run the server script:

python src/server.py

Test stand

Type            Model
CPU             Intel Core i5-3470
GPU (optional)  NVIDIA GeForce GTX 1060 6GB
RAM             Crucial DDR3 1600MHz 8GB x2

Comparison

Method                        F1-score  Accuracy  Precision  Recall  Performance (s)
word-by-word comparison       0.3540    0.9931    0.5398     0.2633  4.9571
Levenshtein distance          0.3499    0.9931    0.546      0.2574  6.4292
TF-IDF                        0.5204    0.9918    0.457      0.6042  -
TF-IDF + Logistic regression  0.5009    0.9914    0.4336     0.593   -
fastText cosine similarity    0.409     0.9916    0.4629     0.3664  15.0971
sentence-bert (pretrained)    0.4459    0.9925    0.4223     0.4724  14.9001 (GPU)
sentence-bert (fine-tuned)    0.8815    0.9982    0.8642     0.8996  15.2045 (GPU)
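
The F1-score is the harmonic mean of precision and recall, so the F1 column can be cross-checked against the other two. The short sketch below does this for three rows of the table; the helper name `f1_from_pr` is an assumption, and the numbers are copied from the table as reported.

```python
def f1_from_pr(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) taken from the comparison table.
rows = {
    "word-by-word comparison": (0.5398, 0.2633, 0.3540),
    "TF-IDF": (0.457, 0.6042, 0.5204),
    "sentence-bert (fine-tuned)": (0.8642, 0.8996, 0.8815),
}

if __name__ == "__main__":
    for method, (precision, recall, reported) in rows.items():
        computed = f1_from_pr(precision, recall)
        print(f"{method}: computed {computed:.4f}, reported {reported:.4f}")
```

Accuracy stays near 0.99 for every method while F1 varies widely. That pattern usually means matching pairs are a small share of the data, although this README does not state the class balance, so F1 is likely the more informative column here.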

Performance is the time (in seconds) a method takes to process the entire dataset (500k rows). For the fastText and sentence-bert methods, sentence embeddings are cached. For sentence-bert, the cache is built by passing all unique names (17k samples) to the GPU in a single batch.
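
The caching described above can be sketched as follows: embed each unique name once in a single batched call, then score every pair by cosine similarity using cache lookups only. This is an illustration under stated assumptions, not the project's code. `toy_embed_batch` is a hypothetical stand-in (letter-frequency vectors) for a real fastText or sentence-bert encoder, and `build_cache` and `score_pairs` are names invented for this example.

```python
import math


def toy_embed_batch(names: list[str]) -> list[list[float]]:
    """Hypothetical encoder stand-in: one letter-frequency vector per name."""
    vectors = []
    for name in names:
        counts = [0.0] * 26
        for ch in name.lower():
            if "a" <= ch <= "z":
                counts[ord(ch) - ord("a")] += 1.0
        vectors.append(counts)
    return vectors


def build_cache(pairs, embed_batch=toy_embed_batch) -> dict[str, list[float]]:
    """Embed every unique name exactly once, in one batched call."""
    unique_names = sorted({name for pair in pairs for name in pair})
    vectors = embed_batch(unique_names)
    return dict(zip(unique_names, vectors))


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def score_pairs(pairs, cache) -> list[float]:
    """Score each (name_a, name_b) pair using cached embeddings only."""
    return [cosine(cache[a], cache[b]) for a, b in pairs]


if __name__ == "__main__":
    # Made-up pairs; a real run would iterate over the dataset rows.
    pairs = [("Acme Corp", "Acme Corporation"), ("Acme Corp", "Globex LLC")]
    cache = build_cache(pairs)
    for pair, score in zip(pairs, score_pairs(pairs, cache)):
        print(pair, f"{score:.3f}")
```

With 500k rows but only 17k unique names, the encoder runs on far fewer inputs than there are pairs, and the per-pair work reduces to a dictionary lookup and a dot product.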


License

MIT

Contributors

sesevasa64, klaasibub
