

Jx-WFST: Wrapper Feature Selection Toolbox


"Toward Talent Scientist: Sharing and Learning Together" --- Jingwei Too


Introduction

  • This toolbox offers 13 wrapper feature selection methods (PSO, GA, GWO, HHO, BA, WOA, etc.) with examples
  • Demo_PSO provides an example of how to apply PSO to a benchmark dataset
  • The source code of these methods is written based on the pseudocode in the original papers

Usage

The main function jfs is used to perform feature selection. You can switch the algorithm by changing the pso in from FS.pso import jfs to another method's abbreviation.

  • If you wish to use particle swarm optimization (PSO), then you may write
from FS.pso import jfs
  • If you want to use differential evolution (DE), then you may write
from FS.de import jfs
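
Since every module exports the same jfs name, the two imports above cannot coexist in one file. If you want to compare two algorithms in a single script, you may alias them (a small sketch, not part of the toolbox itself):

from FS.pso import jfs as pso_jfs   # PSO variant of jfs
from FS.de import jfs as de_jfs     # DE variant of jfs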

Input

  • feat : feature vector matrix (Instances x Features)
  • label : label matrix (Instances x 1)
  • opts : parameter settings
    • N : number of solutions / population size (for all methods)
    • T : maximum number of iterations (for all methods)
    • k : k-value in k-nearest neighbor
    • fold : dictionary holding the training and validation sets (xt, yt, xv, and yv), as built in the examples below

Output

  • Acc : accuracy of the validation model
  • fmdl : feature selection model (it contains several results)
    • sf : index of selected features
    • nf : number of selected features
    • c : convergence curve
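
Putting input and output together, a minimal end-to-end call looks like the sketch below. The random array is only a stand-in for a real dataset, and PSO's extra parameters (w, c1, c2) are left at their defaults:

import numpy as np
from sklearn.model_selection import train_test_split
from FS.pso import jfs

# stand-in data: 100 instances with 10 features and binary labels
feat  = np.random.rand(100, 10)
label = np.random.randint(0, 2, 100)

# jfs reads the train/validation split from opts['fold']
xt, xv, yt, yv = train_test_split(feat, label, test_size=0.3, stratify=label)
opts = {'k': 5, 'N': 10, 'T': 100,
        'fold': {'xt': xt, 'yt': yt, 'xv': xv, 'yv': yv}}

fmdl = jfs(feat, label, opts)
print(fmdl['sf'], fmdl['nf'])   # selected feature indices and their count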

Example 1: Particle Swarm Optimization (PSO)

import numpy as np
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from FS.pso import jfs   # change this to switch algorithm 
import matplotlib.pyplot as plt


# load data
data  = pd.read_csv('ionosphere.csv')
data  = data.values
feat  = np.asarray(data[:, 0:-1])   # feature vector
label = np.asarray(data[:, -1])     # label vector

# split data into train & validation (70 -- 30)
xtrain, xtest, ytrain, ytest = train_test_split(feat, label, test_size=0.3, stratify=label)
fold = {'xt':xtrain, 'yt':ytrain, 'xv':xtest, 'yv':ytest}

# parameter
k    = 5     # k-value in KNN
N    = 10    # number of particles
T    = 100   # maximum number of iterations
w    = 0.9
c1   = 2
c2   = 2
opts = {'k':k, 'fold':fold, 'N':N, 'T':T, 'w':w, 'c1':c1, 'c2':c2}

# perform feature selection
fmdl = jfs(feat, label, opts)
sf   = fmdl['sf']

# model with selected features
num_train = np.size(xtrain, 0)
num_valid = np.size(xtest, 0)
x_train   = xtrain[:, sf]
y_train   = ytrain.reshape(num_train)  # flatten labels to 1-D for scikit-learn
x_valid   = xtest[:, sf]
y_valid   = ytest.reshape(num_valid)   # flatten labels to 1-D for scikit-learn

mdl       = KNeighborsClassifier(n_neighbors = k) 
mdl.fit(x_train, y_train)

# accuracy
y_pred    = mdl.predict(x_valid)
Acc       = np.sum(y_valid == y_pred)  / num_valid
print("Accuracy:", 100 * Acc)

# number of selected features
num_feat = fmdl['nf']
print("Feature Size:", num_feat)

# plot convergence
curve   = fmdl['c']
curve   = curve.reshape(np.size(curve,1))
x       = np.arange(0, opts['T'], 1.0) + 1.0

fig, ax = plt.subplots()
ax.plot(x, curve, 'o-')
ax.set_xlabel('Number of Iterations')
ax.set_ylabel('Fitness')
ax.set_title('PSO')
ax.grid()
plt.show()
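
Note that both the 70/30 split and PSO itself are stochastic, so the printed accuracy will differ between runs. Passing a fixed random_state to train_test_split (e.g. train_test_split(feat, label, test_size=0.3, stratify=label, random_state=42)) at least makes the split reproducible.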

Example 2: Genetic Algorithm (GA)

import numpy as np
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from FS.ga import jfs   # change this to switch algorithm 
import matplotlib.pyplot as plt


# load data
data  = pd.read_csv('ionosphere.csv')
data  = data.values
feat  = np.asarray(data[:, 0:-1])
label = np.asarray(data[:, -1])

# split data into train & validation (70 -- 30)
xtrain, xtest, ytrain, ytest = train_test_split(feat, label, test_size=0.3, stratify=label)
fold = {'xt':xtrain, 'yt':ytrain, 'xv':xtest, 'yv':ytest}

# parameter
k    = 5     # k-value in KNN
N    = 10    # number of chromosomes
T    = 100   # maximum number of generations
CR   = 0.8
MR   = 0.01
opts = {'k':k, 'fold':fold, 'N':N, 'T':T, 'CR':CR, 'MR':MR}

# perform feature selection
fmdl = jfs(feat, label, opts)
sf   = fmdl['sf']

# model with selected features
num_train = np.size(xtrain, 0)
num_valid = np.size(xtest, 0)
x_train   = xtrain[:, sf]
y_train   = ytrain.reshape(num_train)  # flatten labels to 1-D for scikit-learn
x_valid   = xtest[:, sf]
y_valid   = ytest.reshape(num_valid)   # flatten labels to 1-D for scikit-learn

mdl       = KNeighborsClassifier(n_neighbors = k) 
mdl.fit(x_train, y_train)

# accuracy
y_pred    = mdl.predict(x_valid)
Acc       = np.sum(y_valid == y_pred)  / num_valid
print("Accuracy:", 100 * Acc)

# number of selected features
num_feat = fmdl['nf']
print("Feature Size:", num_feat)

# plot convergence
curve   = fmdl['c']
curve   = curve.reshape(np.size(curve,1))
x       = np.arange(0, opts['T'], 1.0) + 1.0

fig, ax = plt.subplots()
ax.plot(x, curve, 'o-')
ax.set_xlabel('Number of Iterations')
ax.set_ylabel('Fitness')
ax.set_title('GA')
ax.grid()
plt.show()

Requirements

  • Python 3
  • NumPy
  • Pandas
  • Scikit-learn
  • Matplotlib
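
The dependencies can be installed with pip, for example:

pip install numpy pandas scikit-learn matplotlib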

List of available wrapper feature selection methods

  • Note that the methods are altered so that they can be used for feature selection tasks
  • The extra parameters are the parameter(s) other than the population size and the maximum number of iterations
  • Click on the name of a method to view how to set its extra parameter(s)
  • Use opts to set the specific parameters
  • If you do not set the extra parameters, the algorithm will use its default settings
No.  Abbreviation  Name                          Year  Extra Parameters
13   hho           Harris Hawk Optimization      2019  No
12   ssa           Salp Swarm Algorithm          2017  No
11   woa           Whale Optimization Algorithm  2016  Yes
10   sca           Sine Cosine Algorithm         2016  Yes
09   ja            Jaya Algorithm                2016  No
08   gwo           Grey Wolf Optimizer           2014  No
07   fpa           Flower Pollination Algorithm  2012  Yes
06   ba            Bat Algorithm                 2010  Yes
05   fa            Firefly Algorithm             2010  Yes
04   cs            Cuckoo Search Algorithm       2009  Yes
03   de            Differential Evolution        1997  Yes
02   pso           Particle Swarm Optimization   1995  Yes
01   ga            Genetic Algorithm             -     Yes
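
Because each method lives in its own FS.<abbreviation> module and exposes the same jfs entry point, you can also select the algorithm dynamically, for instance to benchmark several methods on one split. A small sketch, assuming feat, label, and an opts holding only the common settings (k, fold, N, T) are prepared as in the examples above, so each method falls back to its default extra parameters:

import importlib

# compare a few wrappers on the same data and common settings
for abbr in ['pso', 'de', 'gwo', 'hho']:
    module = importlib.import_module('FS.' + abbr)   # e.g. FS.pso
    fmdl = module.jfs(feat, label, opts)
    print(abbr, 'selected', fmdl['nf'], 'features')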

More Repositories

 1. Wrapper-Feature-Selection-Toolbox (MATLAB, 164 stars)
    This toolbox offers more than 40 wrapper feature selection methods, including PSO, GA, DE, ACO, GSA, etc. They are simple and easy to implement.
 2. EMG-Feature-Extraction-Toolbox (MATLAB, 79 stars)
    This toolbox offers 40 feature extraction methods (EMAV, EWL, MAV, WL, SSC, ZC, etc.) for Electromyography (EMG) signal applications.
 3. EEG-Feature-Extraction-Toolbox (MATLAB, 69 stars)
    This toolbox offers 30 types of EEG feature extraction methods (HA, HM, HC, etc.) for Electroencephalogram (EEG) applications.
 4. Binary-Grey-Wolf-Optimization-for-Feature-Selection (MATLAB, 33 stars)
    Demonstration of how binary grey wolf optimization (BGWO) is applied to the feature selection task.
 5. Advanced-Feature-Selection-Toolbox (Python, 31 stars)
    This toolbox offers advanced feature selection tools. Several modifications, variants, enhancements, or improvements of algorithms such as GWO, FPA, SCA, PSO, and SSA are provided.
 6. Whale-Optimization-Algorithm-for-Feature-Selection (MATLAB, 23 stars)
    Application of the Whale Optimization Algorithm (WOA) in feature selection tasks.
 7. Machine-Learning-Toolbox (MATLAB, 22 stars)
    This toolbox offers 8 machine learning methods, including KNN, SVM, DA, DT, etc., which are simple and easy to implement.
 8. Filter-Feature-Selection-Toolbox (MATLAB, 21 stars)
    Simple, fast, and easy to implement. The filter feature selection methods include Relief-F, PCC, TV, and NCA.
 9. Binary-Harris-Hawk-Optimization-for-Feature-Selection (MATLAB, 12 stars)
    The binary version of Harris Hawk Optimization (HHO), called Binary Harris Hawk Optimization (BHHO), is applied to feature selection tasks.
10. Ant-Colony-Optimization-for-Feature-Selection (MATLAB, 11 stars)
    Implementation of ant colony optimization (ACO) without a predetermined number of selected features in feature selection tasks.
11. Binary-Differential-Evolution-for-Feature-Selection (MATLAB, 10 stars)
    The binary version of Differential Evolution (DE), named Binary Differential Evolution (BDE), is applied to feature selection tasks.
12. Sine-Cosine-Algorithm-for-Feature-Selection (MATLAB, 9 stars)
    Application of the Sine Cosine Algorithm (SCA) in feature selection tasks.
13. Neural-Network-Toolbox (MATLAB, 9 stars)
    This toolbox contains 6 types of neural networks, which are simple and easy to implement.
14. Salp-Swarm-Algorithm-for-Feature-Selection (MATLAB, 8 stars)
    Application of the Salp Swarm Algorithm (SSA) in feature selection tasks.
15. Equilibrium-Optimizer-for-Feature-Selection (MATLAB, 7 stars)
    Application of the Equilibrium Optimizer (EO) in feature selection tasks.
16. Binary-Particle-Swarm-Optimization-for-Feature-Selection (MATLAB, 7 stars)
    A simple algorithm showing how binary particle swarm optimization (BPSO) is used in the feature selection problem.
17. Particle-Swarm-Optimization-for-Feature-Selection (MATLAB, 7 stars)
    Application of Particle Swarm Optimization (PSO) in feature selection tasks.
18. Henry-Gas-Solubility-Optimization-for-Feature-Selection (MATLAB, 6 stars)
    Application of Henry Gas Solubility Optimization (HGSO) in feature selection tasks.
19. Binary-Dragonfly-Algorithm-for-Feature-Selection (MATLAB, 6 stars)
    Application of the Binary Dragonfly Algorithm (BDA) in feature selection tasks.
20. Genetic-Algorithm-for-Feature-Selection (MATLAB, 5 stars)
    A simple algorithm showing how the genetic algorithm (GA) is used in the feature selection problem.
21. Ant-Colony-System-for-Feature-Selection (MATLAB, 4 stars)
    Application of ant colony optimization (ACO) to feature selection problems.
22. Deep-Learning-Toolbox-Python (Python, 4 stars)
    This toolbox offers several deep learning methods, which are simple and easy to implement.
23. Atom-Search-Optimization-for-Feature-Selection (MATLAB, 4 stars)
    Application of Atom Search Optimization (ASO) in feature selection tasks.
24. Binary-Tree-Growth-Algorithm-for-Feature-Selection (MATLAB, 4 stars)
    A feature selection algorithm named the Binary Tree Growth Algorithm (BTGA) is applied to feature selection tasks.
25. Deep-Learning-Toolbox (MATLAB, 3 stars)
    This toolbox offers convolutional neural networks (CNN) with k-fold cross-validation, which are simple and easy to implement.
26. Binary-Atom-Search-Optimization-for-Feature-Selection (MATLAB, 3 stars)
    A new feature selection algorithm named Binary Atom Search Optimization (BASO) is applied to feature selection tasks.
27. Machine-Learning-Regression-Toolbox (Python, 2 stars)
    This toolbox offers 7 machine learning methods for regression problems.
28. Machine-Learning-Toolbox-Python (Python, 2 stars)
    This toolbox offers 6 machine learning methods, including KNN, SVM, LDA, DT, etc., which are simple and easy to implement.
29. JingweiToo (1 star)
30. Dimensionality-Reduction-Demonstration (MATLAB, 1 star)
    Application of principal component analysis (PCA) for feature reduction.