MatchXML

This is the official repository for the paper "MatchXML: An Efficient Text-label Matching Framework for Extreme Multi-label Text Classification".

Install the environment

  • Create a virtual environment
    # We recommend using Anaconda to create a conda environment
    conda create --name matchxml python=3.8
    conda activate matchxml
  • Install the required packages:
    pip install -r requirements.txt

Prepare Data

# eurlex-4k, wiki10-31k, amazoncat-31k, wiki-500k, amazon-670k, amazon-3m

  • Download the six XMC datasets from XR-Transformer

  • Download our trained label embeddings from Google Drive and save them to xmc-base/{dataset}

  • Download our static text features (static sentence embeddings + TF-IDF features) from Google Drive and save them to xmc-base/{dataset}/tfidf-attnxml, replacing the original TF-IDF features.
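Before training, it can help to confirm that the directories named in the steps above are in place. The following is a minimal sketch, not part of the repository: the helper name `missing_dirs` is hypothetical, and it checks only that the directories exist, since the file names inside them are not listed here.

```python
from pathlib import Path

# Dataset names as listed in this README.
DATASETS = [
    "eurlex-4k", "wiki10-31k", "amazoncat-31k",
    "wiki-500k", "amazon-670k", "amazon-3m",
]


def missing_dirs(base="xmc-base", datasets=DATASETS):
    """Return the expected dataset directories that do not exist under `base`.

    Hypothetical helper: it checks xmc-base/{dataset} and
    xmc-base/{dataset}/tfidf-attnxml, the two locations named above.
    """
    missing = []
    for name in datasets:
        dataset_dir = Path(base) / name
        for path in (dataset_dir, dataset_dir / "tfidf-attnxml"):
            if not path.is_dir():
                missing.append(str(path))
    return missing


if __name__ == "__main__":
    for path in missing_dirs():
        print(f"missing: {path}")
```

Run it from the repository root; no output means every expected directory was found.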

Train and evaluate MatchXML

# eurlex-4k, wiki10-31k, amazoncat-31k, wiki-500k, amazon-670k, amazon-3m

bash run.sh {dataset}
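To run several datasets in sequence, the command above can be wrapped in a small driver. This is a sketch under assumptions: `build_command` and `run_all` are hypothetical names that do not come from the repository, and the script assumes it is started from the directory containing run.sh. By default it only prints the commands (dry run).

```python
import subprocess

# Dataset names as listed in this README.
DATASETS = [
    "eurlex-4k", "wiki10-31k", "amazoncat-31k",
    "wiki-500k", "amazon-670k", "amazon-3m",
]


def build_command(dataset):
    """Return the argv list for `bash run.sh {dataset}`; reject unknown names."""
    if dataset not in DATASETS:
        raise ValueError(f"unknown dataset: {dataset}")
    return ["bash", "run.sh", dataset]


def run_all(datasets, dry_run=True):
    """Print (dry run) or execute the training command for each dataset in order."""
    commands = [build_command(d) for d in datasets]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True stops the loop at the first failing run.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    run_all(["eurlex-4k", "wiki10-31k"])
```

Pass `dry_run=False` to launch the runs. The larger datasets will likely take far longer than the smaller ones, so starting with eurlex-4k is a reasonable smoke test.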

Train label2vec

# eurlex-4k, wiki10-31k, amazoncat-31k, wiki-500k, amazon-670k, amazon-3m

bash ./label2vec_run/{dataset}.sh

Generate static text features

python sentence_embedding.py

Pre-trained models

  • Our pre-trained models can be downloaded from Google Drive

Citation

If you find this work useful in your research, please consider citing:

@article{ye2024matchxml,
  title={MatchXML: An Efficient Text-label Matching Framework for Extreme Multi-label Text Classification},
  author={Ye, Hui and Sunderraman, Rajshekhar and Ji, Shihao},
  journal={IEEE Transactions on Knowledge and Data Engineering},
  year={2024},
  publisher={IEEE}
}
