EvoImp

Multiple Imputation of Multi-label Classification Data With a Genetic Algorithm


About the paper

Article submitted to the journal PLOS ONE in 2023. This work proposes an imputation method for multi-label learning and evaluates its performance on six synthetic databases under various missing-value distribution scenarios.

Authors (original paper)


Antonio F. L. Jacob Jr.

UEMA | UFMA

Fabrício A. do Carmo

UEMA

Ádamo L. Santana

Fuji Electric Co., Japan

Ewaldo Santana

UEMA | UFMA

Fábio M. F. Lobato

UFOPA | UEMA

Abstract

Missing data is a prevalent problem that requires attention, as most data analysis techniques are unable to handle it. This is particularly critical in Multi-Label Classification (MLC), where only a few studies have investigated missing data. MLC differs from Single-Label Classification (SLC) by allowing an instance to be associated with multiple classes. Movie classification is a didactic example, since a movie can be "drama" and "biography" simultaneously. One of the most common missing data treatment methods is data imputation, which seeks plausible values to fill in the missing ones. In this scenario, we propose a novel imputation method based on a multi-objective genetic algorithm for optimizing multiple data imputations, called Multiple Imputation of Multi-label Classification data with a genetic algorithm, or simply EvoImp. We applied the proposed method in multi-label learning and evaluated its performance using six synthetic databases, considering various missing values distribution scenarios. The method was compared with other state-of-the-art imputation strategies, such as K-Means Imputation (KMI) and weighted K-Nearest Neighbors Imputation (WKNNI). The results showed that the proposed method outperformed the baselines in all the scenarios, achieving the best evaluation measures considering Exact Match, Accuracy, and Hamming Loss. The superior results were consistent across different dataset domains and sizes, demonstrating the robustness of EvoImp. Thus, EvoImp represents a feasible solution to missing data treatment for multi-label learning.
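
For readers unfamiliar with the three evaluation measures named in the abstract, the sketch below shows how they are commonly defined in the multi-label literature. It is an illustration only, not code from this repository: the class name `MultiLabelMetrics`, its method names, and the toy label matrices are hypothetical, and Accuracy is assumed here to be the example-based (Jaccard) variant, which may differ from the exact formulation used in the paper.

```java
import java.util.Arrays;

// Hypothetical helper class, not part of the EvoImp sources.
public class MultiLabelMetrics {

    /** Exact Match: fraction of instances whose whole label vector is predicted correctly. */
    public static double exactMatch(boolean[][] yTrue, boolean[][] yPred) {
        int hits = 0;
        for (int i = 0; i < yTrue.length; i++) {
            if (Arrays.equals(yTrue[i], yPred[i])) {
                hits++;
            }
        }
        return (double) hits / yTrue.length;
    }

    /** Hamming Loss: fraction of individual label slots predicted incorrectly (lower is better). */
    public static double hammingLoss(boolean[][] yTrue, boolean[][] yPred) {
        int wrong = 0;
        int total = 0;
        for (int i = 0; i < yTrue.length; i++) {
            for (int j = 0; j < yTrue[i].length; j++) {
                if (yTrue[i][j] != yPred[i][j]) {
                    wrong++;
                }
                total++;
            }
        }
        return (double) wrong / total;
    }

    /**
     * Example-based Accuracy (assumed variant): mean over instances of
     * |true AND predicted| / |true OR predicted|. An instance with no true
     * and no predicted labels is counted as a perfect match.
     */
    public static double accuracy(boolean[][] yTrue, boolean[][] yPred) {
        double sum = 0.0;
        for (int i = 0; i < yTrue.length; i++) {
            int inter = 0;
            int union = 0;
            for (int j = 0; j < yTrue[i].length; j++) {
                if (yTrue[i][j] && yPred[i][j]) {
                    inter++;
                }
                if (yTrue[i][j] || yPred[i][j]) {
                    union++;
                }
            }
            sum += (union == 0) ? 1.0 : (double) inter / union;
        }
        return sum / yTrue.length;
    }

    public static void main(String[] args) {
        // Toy data: two movies, three hypothetical genre labels (drama, comedy, thriller).
        boolean[][] yTrue = {{true, true, false}, {false, false, true}};
        boolean[][] yPred = {{true, false, false}, {false, false, true}};

        System.out.println("Exact Match:  " + exactMatch(yTrue, yPred));
        System.out.println("Hamming Loss: " + hammingLoss(yTrue, yPred));
        System.out.println("Accuracy:     " + accuracy(yTrue, yPred));
    }
}
```

In the toy data, the first movie has one of its two labels missed and the second is predicted perfectly, so Exact Match is 1/2, Hamming Loss is 1/6, and example-based Accuracy is (1/2 + 1)/2 = 3/4.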

If you use any of the resources available here, please cite this work as follows:

Paper

Jacob Junior, A. F. L., do Carmo, F. A., de Santana, A. L., Santana, E. E. C., & Lobato, F. M. F. (2024). EvoImp: Multiple Imputation of Multi-label Classification data with a genetic algorithm. PLOS ONE, 19(1), e0297147. https://doi.org/10.1371/journal.pone.0297147

Dataset

Antonio F. L. Jacob Jr., Fabrício A. do Carmo, Ádamo L. de Santana, Ewaldo Santana, & Fábio M. F. Lobato. (2023). Multi-Label Datasets with Missing Values [Data set]. Zenodo. https://doi.org/10.5281/zenodo.7748933

Directory description
  • dist --> Includes a distribution build (.jar) and an example, along with the folder structure required to run it.
  • src --> Contains the Java code of GAMultImp and the libraries used.
  • supp --> Contains the supplementary files discussed in the article, such as the complexity of the method, baseline tests, etc.
