fashion_nlp_v2's Introduction

FashionNLP

FashionBrain D2.1: Named Entity Recognition and Linking Methods
EU project 732328: "Fashion Brain"

In this repository, we provide FashionNLP, a natural language processing tool designed specifically for fashion text. It extends existing state-of-the-art NER techniques to fashion applications. FashionNLP has three main components: NER, which recognizes fashion entities in text; NEL, which links each recognized entity to the FashionBrain taxonomy; and taxonomy enrichment, which adds an entity to the taxonomy when it is not already present.

Getting Started

This project requires PyTorch 0.4+ and Python 3.6+. You also need to install Flair with the following command:

pip install flair

Then, you can start using the FashionNLP package:

git clone https://github.com/FashionBrainTeam/fashion_nlp_v2
cd ./fashion_nlp_v2/
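
Before running the scripts, the version requirements stated above (Python 3.6+, PyTorch 0.4+, Flair installed) can be checked from Python. The following is a minimal sketch, not part of the repository: the helper names `parse_version`, `meets_minimum` and `check_environment` are hypothetical, and the version parsing is deliberately simple.

```python
import importlib.util
import itertools
import sys


def parse_version(text):
    """Turn a version string such as '1.13.1+cu117' into a tuple of ints."""
    parts = []
    for piece in text.split("."):
        # Keep only the leading digits of each dot-separated piece.
        digits = "".join(itertools.takewhile(str.isdigit, piece))
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(version, minimum):
    """Return True when `version` (a string) is at least `minimum` (a tuple)."""
    return parse_version(version) >= minimum


def check_environment():
    """Report which of the stated requirements the current interpreter meets."""
    report = {"python": sys.version_info[:2] >= (3, 6)}
    if importlib.util.find_spec("torch") is None:
        report["torch"] = False
    else:
        import torch
        report["torch"] = meets_minimum(torch.__version__, (0, 4))
    # Only checks that Flair is importable; the README states no minimum version.
    report["flair"] = importlib.util.find_spec("flair") is not None
    return report


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(name, "OK" if ok else "missing or too old")
```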

Description of the FashionNLP Package

The "fashionnlp" package contains three folders:

  • src contains three python scripts:
    • lstm_fashion.py: the implementation of the LSTM-CRF models
    • bootsrap_lstm.py: the implementation of the bootstrapping approach
    • taxonomy_matching.py: the implementation of the taxonomy enrichment
  • data contains the training set (fashion_items_train.txt), the testing set (fashion_items_test.txt) and the FashionBrain taxonomy (FBtaxonomy.csv)
  • output contains the results of the bootstrap approach.
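
The layout described above can be verified after cloning with a short script. This is a sketch under the assumption that the three folders sit directly under the directory you pass in; the function name `missing_entries` and the `EXPECTED` list are illustrative and not part of the package.

```python
from pathlib import Path

# Entries named in the package description above.
EXPECTED = [
    "src/lstm_fashion.py",
    "src/bootsrap_lstm.py",
    "src/taxonomy_matching.py",
    "data",
    "output",
]


def missing_entries(repo_root):
    """Return the expected files or folders that do not exist under repo_root."""
    root = Path(repo_root)
    return [entry for entry in EXPECTED if not (root / entry).exists()]


if __name__ == "__main__":
    missing = missing_entries(".")
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("Package layout looks complete.")
```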

Running the code

  • The arguments needed to train an LSTM-CRF model are:
    • data folder: Path to input folder containing training and testing sets
    • embedding: Type of embedding, one of three options: ‘no_char’, ‘char’ or ‘flair’
    • epochs: Number of epochs to train the chosen model
    • output folder: Path to the output folder, where three files are saved: loss.tsv (the accuracy measures for each epoch), test.tsv (the testing set with the model's labels) and training.log (the log history).

Example:

   python lstm_fashion.py --data_folder '../data/lstm_input' --embedding 'no_char' --epochs 150 --output_folder '../output'
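
For readers who want to see how these four arguments fit together, here is a minimal argparse sketch. The flag names and the three embedding values come from the example command and the list above; the function name `build_parser`, the argument types, and the choice to mark every flag as required are assumptions and may not match what lstm_fashion.py actually does.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented lstm_fashion.py arguments."""
    parser = argparse.ArgumentParser(
        description="Train an LSTM-CRF model (illustrative sketch only)."
    )
    parser.add_argument("--data_folder", required=True,
                        help="Input folder containing training and testing sets")
    parser.add_argument("--embedding", required=True,
                        choices=["no_char", "char", "flair"],
                        help="Type of embedding")
    parser.add_argument("--epochs", type=int, required=True,
                        help="Number of training epochs")
    parser.add_argument("--output_folder", required=True,
                        help="Folder for loss.tsv, test.tsv and training.log")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```

Restricting `--embedding` with `choices` makes a misspelled value fail immediately instead of after the data has been loaded.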
  • The arguments needed to use the bootstrap approach are:
    • model: Path to folder containing the model to load
    • first iteration: Path to folder containing the first iteration input data
    • second iteration: Path to folder containing the second iteration input data
    • epochs: Number of epochs to train the chosen model
    • retrained: Path to the data file trained in the second iteration
    • output folder: Path to the output folder to save three files

Example:

   python bootsrap_lstm.py --model '../output/no_char_1st_iter/final-model.pt' --first_iteration '../data/lstm_input' --second_iteration '../data/lstm_bootstrap' --epochs 100 --retrained '../data/lstm_bootstrap/retrained_data.tsv' --output_folder '../output/no_char_2nd_iter'
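
The two examples above can be chained: the bootstrap step loads a model from an earlier training run. The sketch below builds both command lines so that the paths stay consistent. It assumes that training writes final-model.pt into its output folder (inferred from the bootstrap example, not stated in this README) and that the scripts are run from the src folder; `stage_commands` and `run_stages` are hypothetical helper names.

```python
import subprocess
import sys
from pathlib import PurePosixPath


def stage_commands(embedding="no_char", first_epochs=150, second_epochs=100):
    """Return the (training, bootstrap) command lists with matching paths."""
    first_out = PurePosixPath("../output") / f"{embedding}_1st_iter"
    second_out = PurePosixPath("../output") / f"{embedding}_2nd_iter"
    train = [
        sys.executable, "lstm_fashion.py",
        "--data_folder", "../data/lstm_input",
        "--embedding", embedding,
        "--epochs", str(first_epochs),
        "--output_folder", str(first_out),
    ]
    bootstrap = [
        sys.executable, "bootsrap_lstm.py",
        # Assumption: the first stage saves final-model.pt in its output folder.
        "--model", str(first_out / "final-model.pt"),
        "--first_iteration", "../data/lstm_input",
        "--second_iteration", "../data/lstm_bootstrap",
        "--epochs", str(second_epochs),
        "--retrained", "../data/lstm_bootstrap/retrained_data.tsv",
        "--output_folder", str(second_out),
    ]
    return train, bootstrap


def run_stages(commands, cwd="src"):
    """Run each command in order, stopping at the first failure."""
    for command in commands:
        subprocess.run(command, cwd=cwd, check=True)
```

Calling `run_stages(stage_commands())` from the repository root would then run both stages in sequence.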
  • The arguments needed to enrich the FashionBrain taxonomy are:
    • taxonomy: Path to the FashionBrain taxonomy
    • test result: Path to the file containing the testing result

Example:

   python taxonomy_matching.py --taxonomy '../data/enrichment_input/FBtaxonomy.csv' --test_result '../data/enrichment_input/test_result.txt'
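
The link-or-add behaviour described in the introduction (link an entity to the taxonomy, or add it when it is absent) can be illustrated as follows. This is a minimal sketch that assumes case-insensitive exact matching; the matching used in taxonomy_matching.py may differ, the format of FBtaxonomy.csv is not assumed here, and the function names and sample terms are invented for illustration.

```python
def normalize(term):
    """Lower-case a term and collapse runs of whitespace."""
    return " ".join(term.lower().split())


def link_or_add(entity, taxonomy):
    """Link `entity` to a taxonomy term, adding it when no match exists.

    `taxonomy` maps normalized terms to their canonical spelling.
    Returns (canonical_term, was_added).
    """
    key = normalize(entity)
    if key in taxonomy:
        return taxonomy[key], False
    # No match: enrich the taxonomy with the new entity.
    taxonomy[key] = entity.strip()
    return taxonomy[key], True


if __name__ == "__main__":
    # Hypothetical taxonomy terms, not taken from FBtaxonomy.csv.
    taxonomy = {normalize(t): t for t in ["Dress", "Denim Jacket"]}
    for entity in ["denim  jacket", "Kimono"]:
        print(entity, "->", link_or_add(entity, taxonomy))
```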

fashion_nlp_v2's People

Contributors

inesarous
