

Repository Details

An open source python library for automated feature engineering based on Genetic Programming

Evolutionary Forest


Introduction

Feature engineering is a long-standing challenge for machine learning practitioners. In recent years, deep learning techniques have significantly reduced the need for manual feature engineering. However, the features discovered by deep learning methods are difficult to interpret.

In interpretable machine learning, genetic programming has proven to be a promising method for automated feature construction: it can improve the performance of traditional machine learning systems while keeping them comparably interpretable. Nonetheless, practitioners rarely mention this method. We believe the main reason is the lack of a mature package that automatically constructs features with genetic programming. We therefore provide this package as a feature construction tool for enhancing existing state-of-the-art machine learning algorithms, particularly decision-tree-based algorithms.
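To make the idea of an interpretable constructed feature concrete, here is a minimal sketch of how a genetic programming system can represent a feature as a small expression tree over the input columns. It is an illustration only, not this package's implementation: the primitive set, the nested-tuple encoding, and the helper names (random_tree, evaluate, to_string, height) are all assumptions made for the example.

```python
import operator
import random

# Primitive set: name -> (function, arity). A hypothetical, minimal choice.
PRIMITIVES = {
    "add": (operator.add, 2),
    "sub": (operator.sub, 2),
    "mul": (operator.mul, 2),
}


def random_tree(n_features, max_height, rng):
    """Grow a random expression tree as nested tuples; leaves are column indices."""
    if max_height == 0 or rng.random() < 0.3:
        return rng.randrange(n_features)  # terminal: one input column
    name = rng.choice(sorted(PRIMITIVES))
    arity = PRIMITIVES[name][1]
    children = tuple(
        random_tree(n_features, max_height - 1, rng) for _ in range(arity)
    )
    return (name,) + children


def evaluate(tree, row):
    """Compute the constructed feature for one sample (a list of column values)."""
    if isinstance(tree, int):
        return row[tree]
    func = PRIMITIVES[tree[0]][0]
    return func(*(evaluate(child, row) for child in tree[1:]))


def to_string(tree):
    """Render the tree as a readable formula; this is what keeps it interpretable."""
    if isinstance(tree, int):
        return f"x{tree}"
    return f"{tree[0]}({', '.join(to_string(c) for c in tree[1:])})"


def height(tree):
    """Number of operator levels above the deepest terminal."""
    if isinstance(tree, int):
        return 0
    return 1 + max(height(c) for c in tree[1:])


rng = random.Random(0)
X = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
tree = random_tree(n_features=3, max_height=3, rng=rng)
feature = [evaluate(tree, row) for row in X]  # one new column for the dataset
print(to_string(tree), feature)
```

In a full system, many such trees are evolved and the resulting columns are passed to a base learner such as a decision tree. Because each feature is a short formula, it can be inspected directly.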

Features

  • A powerful feature construction tool for generating interpretable machine learning features.
  • A reliable machine learning model that performs well on small datasets.

Installation

From PyPI:

pip install -U evolutionary_forest

From GitHub (Latest Code):

pip install git+https://github.com/hengzhe-zhang/EvolutionaryForest.git

Supported Algorithms

Example

An example of usage:

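# Imports for the example; the evolutionary_forest import path is assumed from the package layout.
from sklearn.datasets import load_diabetes
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from evolutionary_forest.forest import EvolutionaryForestRegressor

# Hold out 20% of the diabetes dataset for testing.
X, y = load_diabetes(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
r = EvolutionaryForestRegressor(max_height=3, normalize=True, select='AutomaticLexicase',
                                gene_num=10, boost_size=100, n_gen=20, n_pop=200, cross_pb=1,
                                base_learner='Random-DT', verbose=True)
r.fit(x_train, y_train)
print(r2_score(y_test, r.predict(x_test)))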
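The select='AutomaticLexicase' option in the example is assumed here to name a lexicase-style parent selection with an automatically chosen tolerance. The package's actual implementation may differ. As a rough sketch of that family of methods, the code below implements epsilon-lexicase selection with a tolerance set per training case from the median absolute deviation (MAD) of the population's errors. The function names and the toy error matrix are hypothetical.

```python
import random
import statistics


def mad(values):
    """Median absolute deviation, used here as the per-case tolerance epsilon."""
    med = statistics.median(values)
    return statistics.median([abs(v - med) for v in values])


def epsilon_lexicase_select(errors, rng):
    """Pick one parent index.

    errors[i][j] is the absolute error of individual i on training case j.
    """
    n_cases = len(errors[0])
    # One epsilon per case, computed over the whole population.
    epsilons = [mad([ind[j] for ind in errors]) for j in range(n_cases)]
    pool = list(range(len(errors)))
    cases = list(range(n_cases))
    rng.shuffle(cases)  # each selection event sees the cases in a new order
    for j in cases:
        best = min(errors[i][j] for i in pool)
        # Keep only individuals within epsilon of the best error on this case.
        pool = [i for i in pool if errors[i][j] <= best + epsilons[j]]
        if len(pool) == 1:
            break
    return rng.choice(pool)


errors = [
    [0.1, 0.9, 0.2],  # good on cases 0 and 2
    [0.8, 0.1, 0.7],  # specialist on case 1
    [0.9, 0.8, 0.9],  # poor on every case
]
rng = random.Random(0)
parents = [epsilon_lexicase_select(errors, rng) for _ in range(20)]
print(parents)
```

Unlike selection on an aggregate score, this scheme can pick the specialist (individual 1) whenever its strong case comes first. That helps preserve the diversity an ensemble of evolved features relies on.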

An example of improvements brought about by constructed features:

https://raw.githubusercontent.com/zhenlingcn/EvolutionaryForest/master/docs/constructed_features.png
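The kind of improvement shown in the figure can be illustrated with a small synthetic example, which is not the data behind the figure. Assume the target depends on an interaction between two inputs. Each raw input is then almost uncorrelated with the target, while the constructed feature x0 * x1 is strongly correlated with it. All names and data below are made up for the sketch.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


rng = random.Random(0)
x0 = [rng.uniform(-1, 1) for _ in range(500)]
x1 = [rng.uniform(-1, 1) for _ in range(500)]
# Synthetic target driven by an interaction between the two inputs, plus noise.
y = [a * b + rng.gauss(0, 0.05) for a, b in zip(x0, x1)]

# The kind of feature a genetic programming search might construct.
constructed = [a * b for a, b in zip(x0, x1)]

print("corr(x0, y)      =", round(pearson(x0, y), 3))
print("corr(x1, y)      =", round(pearson(x1, y), 3))
print("corr(x0 * x1, y) =", round(pearson(constructed, y), 3))
```

An axis-aligned decision tree needs many splits to approximate such an interaction from x0 and x1 alone, whereas a single split on the constructed column already captures most of the signal.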

Tutorials

Here are some notebook examples of using Evolutionary Forest:

Documentation

Tutorial: English Version | Chinese Version (中文版本)

Credits

This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.

Citation

Please cite our paper if you find it helpful :)

@article{zhang2021evolutionary,
  title={An Evolutionary Forest for Regression},
  author={Zhang, Hengzhe and Zhou, Aimin and Zhang, Hu},
  journal={IEEE Transactions on Evolutionary Computation},
  volume={26},
  number={4},
  pages={735--749},
  year={2021},
  publisher={IEEE}
}

@article{zhang2023sr,
  title={SR-Forest: A Genetic Programming based Heterogeneous Ensemble Learning Method},
  author={Zhang, Hengzhe and Zhou, Aimin and Chen, Qi and Xue, Bing and Zhang, Mengjie},
  journal={IEEE Transactions on Evolutionary Computation},
  year={2023},
  publisher={IEEE}
}
