Exhaustive Exploitation of Nature-inspired Computation for Cancer Screening in an Ensemble Manner

Code for the paper published in IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB 2024).

License: MIT

Xubin Wang1 · Yunhe Wang2* · Zhiqing Ma3 · Ka-Chun Wong4 · Xiangtao Li1*

1Jilin University · 2Hebei University of Technology · 3Northeast Normal University · 4City University of Hong Kong

*corresponding authors

PDF · Code

Overview

Accurate screening of cancer types is crucial for effective cancer detection and precise treatment selection. However, the association between gene expression profiles and tumors is often limited to a small number of biomarker genes. While computational methods using nature-inspired algorithms have shown promise in selecting predictive genes, existing techniques are limited by inefficient search and poor generalization across diverse datasets. This study presents a framework termed Evolutionary Optimized Diverse Ensemble Learning (EODE) to improve ensemble learning for cancer classification from gene expression data. The EODE methodology combines an intelligent grey wolf optimization algorithm for selective feature space reduction, guided random injection modeling for ensemble diversity enhancement, and subset model optimization for synergistic classifier combinations. Extensive experiments were conducted across 35 gene expression benchmark datasets encompassing varied cancer types. Results demonstrated that EODE obtained significantly improved screening accuracy over individual and conventionally aggregated models. The integrated optimization of advanced feature selection, directed specialized modeling, and cooperative classifier ensembles helps address key challenges in current nature-inspired approaches. This provides an effective framework for robust and generalized ensemble learning with gene expression biomarkers.

Framework

Overview of the proposed EODE algorithm: in the GWO feature selection phase, the original cancer gene expression training data is used to train all base classifiers, and the best-performing classifier is selected as the evaluation classifier. The reduced data is then used to construct an ensemble model. Specifically, the training data is incrementally clustered with the K-means method to form subspace clusters; each cluster trains an individual base classifier, which is added to the model pool. Classifiers in the pool with below-average performance are filtered out. Next, GWO is applied to optimize the classifier pool and determine the best possible ensemble combination. Finally, the optimized ensemble model is evaluated on the independent test dataset using a plurality-voting strategy to generate the final cancer type predictions.
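The control flow described above can be sketched compactly. The sketch below is illustrative only, not the repository's MATLAB code: it substitutes a nearest-centroid learner for the MATLAB base classifiers and omits the GWO steps, keeping just the cluster-train-prune-vote skeleton. All names (`kmeans`, `build_pool`, `plurality_vote`) are invented for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

def kmeans(X, k, iters=20):
    # Plain Lloyd's algorithm; returns a cluster label for each row of X.
    C = X[rng.choice(len(X), size=k, replace=False)]
    lab = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        d = np.linalg.norm(X[:, None, :] - C[None, :, :], axis=2)
        lab = d.argmin(axis=1)
        for j in range(k):
            if np.any(lab == j):
                C[j] = X[lab == j].mean(axis=0)
    return lab

class NearestCentroid:
    # Toy base learner standing in for the MATLAB base classifiers.
    def fit(self, X, y):
        self.classes = np.unique(y)
        self.mu = np.array([X[y == c].mean(axis=0) for c in self.classes])
        return self
    def predict(self, X):
        d = np.linalg.norm(X[:, None, :] - self.mu[None, :, :], axis=2)
        return self.classes[d.argmin(axis=1)]

def build_pool(Xtr, ytr, ks=(1, 2, 3)):
    # Incrementally cluster the training data (K-means with growing k),
    # train one base classifier per cluster, then prune the pool by
    # dropping members with below-average training accuracy.
    pool = []
    for k in ks:
        lab = kmeans(Xtr, k)
        for j in range(k):
            m = lab == j
            if m.sum() >= 2:
                pool.append(NearestCentroid().fit(Xtr[m], ytr[m]))
    acc = np.array([(c.predict(Xtr) == ytr).mean() for c in pool])
    return [c for c, a in zip(pool, acc) if a >= acc.mean()]

def plurality_vote(pool, X):
    # Each pool member votes; the most frequent class label wins.
    P = np.array([c.predict(X) for c in pool])
    return np.array([np.bincount(col).argmax() for col in P.T])

# Toy two-class "expression" data: 80 samples, 5 features.
X = np.vstack([rng.normal(0, 1, (40, 5)), rng.normal(3, 1, (40, 5))])
y = np.r_[np.zeros(40, dtype=int), np.ones(40, dtype=int)]
pool = build_pool(X, y)
pred = plurality_vote(pool, X)
```

Note how pruning by below-average training accuracy keeps the pool diverse but competent; the real pipeline additionally runs GWO twice, once over features and once over the pool.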

Data and Baseline Availability

  • ComparisonMethods: The baselines for comparison, including nature-inspired, machine learning, and ensemble methods.
  • OriginalData: The original datasets, randomly divided into training and test sets in an 8:2 ratio.
  • TrainData: Training data used in the experiments.
  • TestData: Test data used in the experiments.

Dependencies

  • This project was developed with MATLAB R2021a; earlier MATLAB releases may be incompatible.

Instructions

1. Main Code

  • EODE.m (the main file of the proposed model)
    • Replace the entries of Problem with the names of your own datasets. For example:
      • Problem = {'The_name_of_your_own_data'};
    • How to load your own data:
        % Load the .mat file whose name matches p_name, then extract the
        % variable stored under that name.
        traindata = load(['C:\Users\c\Desktop\EODE\train\', p_name]);
        traindata = getfield(traindata, p_name);
        data  = traindata;
        feat  = data(:, 1:end-1);  % all columns but the last are features
        label = data(:, end);      % the last column is the class label
      
    • The number of independent runs of the whole experiment is set through numRun
    • The file paths can be changed where traindata and testdata are loaded
    • The parameters of the GWO algorithm can be adjusted in:
      • opts.k = 3; % number of neighbors k in the K-nearest-neighbor evaluation
      • opts.N = 100; % number of solutions (wolves)
      • opts.T = 50; % maximum number of iterations
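The parameters opts.N and opts.T correspond to the pack size and iteration budget of the canonical GWO loop. As an illustration of that loop only (not the repository's jGreyWolfOptimizer.m, which searches over binary feature masks), here is a minimal continuous GWO minimizing a toy sphere objective:

```python
import numpy as np

rng = np.random.default_rng(1)

def gwo(f, dim, N=100, T=50, lb=-5.0, ub=5.0):
    # Canonical grey wolf optimizer (minimization). The pack follows the
    # three best wolves found so far (alpha, beta, delta); the control
    # parameter a decays linearly from 2 to 0 over T iterations.
    X = rng.uniform(lb, ub, (N, dim))
    fit = np.array([f(x) for x in X])
    top = fit.argsort()[:3]
    leaders, lead_fit = X[top].copy(), fit[top].copy()
    for t in range(T):
        a = 2.0 * (1 - t / T)
        def encircle(leader):
            # A controls exploration vs. exploitation; C perturbs the prey.
            A = 2 * a * rng.random((N, dim)) - a
            C = 2 * rng.random((N, dim))
            return leader - A * np.abs(C * leader - X)
        X = np.clip((encircle(leaders[0]) + encircle(leaders[1])
                     + encircle(leaders[2])) / 3.0, lb, ub)
        fit = np.array([f(x) for x in X])
        # Merge the new pack into the leader board (elitist update).
        allX, allf = np.vstack([leaders, X]), np.r_[lead_fit, fit]
        top = allf.argsort()[:3]
        leaders, lead_fit = allX[top].copy(), allf[top].copy()
    return leaders[0], lead_fit[0]

sphere = lambda x: float(np.sum(x * x))
best, val = gwo(sphere, dim=10)
```

With N=100 wolves and T=50 iterations (the defaults above mirror opts.N and opts.T), the best sphere value drops close to zero.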

To reproduce our experiments, run EODE.m ten times and average the results.

2. Data Partition

  • DataPartition.m (This file divides the raw data in an 8:2 ratio)
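DataPartition.m itself is not reproduced here; the sketch below shows one way to do an 8:2 split in Python, stratified per class so both sets keep the label distribution (stratification is an assumption for illustration; the repository may split purely at random):

```python
import numpy as np

rng = np.random.default_rng(0)

def partition(data, train_frac=0.8):
    # Split rows 8:2, stratified by the label stored in the last column
    # (matching the feat/label layout used by EODE.m).
    labels = data[:, -1]
    tr_idx = []
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        rng.shuffle(idx)
        n_tr = int(round(train_frac * len(idx)))
        tr_idx.extend(idx[:n_tr])
    mask = np.zeros(len(data), dtype=bool)
    mask[tr_idx] = True
    return data[mask], data[~mask]

# 100 samples, 5 features + label column (60 of class 0, 40 of class 1).
data = np.hstack([rng.normal(size=(100, 5)),
                  np.r_[np.zeros(60), np.ones(40)][:, None]])
train, test = partition(data)
```

A stratified split avoids test sets that accidentally lose a rare cancer type, which matters on small gene expression cohorts.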

3. Feature Selection Phase

  • jGreyWolfOptimizer.m (To find an optimal feature subset)

4. Classifier Generation Phase

  • generateClusters.m (To generate multiple clusters)
  • trainClassifiers.m (To train base classifiers using these clusters)

5. Classifier Pool Optimization Phase

  • classifierSelectionGWO.m (Uses the GWO algorithm to find an optimal classifier set)
  • GWOPredict.m

6. Model Fusion

  • fusion.m
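fusion.m implements the plurality-voting step described in the framework overview. A minimal sketch of plurality voting over integer class labels (illustrative only, not the repository's code):

```python
import numpy as np

def plurality_vote(pred_matrix):
    # pred_matrix: (n_models, n_samples) integer class predictions.
    # Returns the most frequent class per sample; on a tie, argmax
    # returns the first maximum, i.e. the lowest class label.
    n_models, n_samples = pred_matrix.shape
    out = np.empty(n_samples, dtype=int)
    for j in range(n_samples):
        counts = np.bincount(pred_matrix[:, j])
        out[j] = counts.argmax()
    return out

votes = np.array([[0, 1, 2],
                  [0, 2, 2],
                  [1, 2, 2]])
print(plurality_vote(votes))   # → [0 2 2]
```

Unlike majority voting, plurality voting needs no class to exceed 50% of the votes, which suits multi-class cancer typing with many base classifiers.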

7. Fitness Function

  • jFeatureSelectionFunction.m
  • jFitnessFunction.m
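jFitnessFunction.m is not shown here; a formulation commonly used for nature-inspired wrapper feature selection (an assumption for illustration, not verified against the repository) combines the classification error with a feature-count penalty:

```python
def fitness(error_rate, n_selected, n_total, alpha=0.99):
    # Weighted wrapper-selection objective (to be minimized): the
    # classification error dominates, and the fraction of selected
    # features acts as a tie-breaker that rewards smaller gene subsets.
    return alpha * error_rate + (1 - alpha) * n_selected / n_total

# With equal error, the smaller subset scores (fits) better:
small = fitness(0.10, 20, 2000)
large = fitness(0.10, 200, 2000)
```

The weight alpha close to 1 ensures accuracy is never sacrificed for compactness, while still steering the search toward parsimonious biomarker panels.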

Results

We conducted experiments on 35 datasets encompassing various cancer types, and the results demonstrate the effectiveness of our algorithm compared to four nature-inspired ensemble methods (PSOEL, EAEL, FESM, and GA-Bagging-SVM), six benchmark machine learning algorithms (KNN, DT, ANN, SVM, DISCR, and NB), six state-of-the-art ensemble algorithms (RF, ADABOOST, RUSBOOST, SUBSPACE, TOTALBOOST, and LPBOOST), and seven nature-inspired methods (ACO, CS, DE, GA, GWO, PSO, and ABC). Our algorithm outperformed these methods in terms of classification accuracy.

Cite Our Work

@article{wang2024eode,
  title={Exhaustive Exploitation of Nature-inspired Computation for Cancer Screening in an Ensemble Manner},
  author={Wang, Xubin and Wang, Yunhe and Ma, Zhiqiang and Wong, Ka-Chun and Li, Xiangtao},
  journal={IEEE/ACM Transactions on Computational Biology and Bioinformatics},
  year={2024},
  publisher={IEEE/ACM}
}

Contact

wangxb19 at mails.jlu.edu.cn
