  • Stars: 539
  • Rank: 79,348 (Top 2%)
  • Language: Julia
  • License: Apache License 2.0
  • Created over 3 years ago
  • Updated 2 days ago

Repository Details

Distributed High-Performance Symbolic Regression in Julia

SymbolicRegression.jl searches for symbolic expressions which optimize a particular objective.

Check out PySR for a Python frontend. Cite this software

Contributors ✨

We are eager to welcome new contributors! If you have an idea for a new feature, don't hesitate to share it on the issues page or forums.

Mark Kittisopikul
💻 💡 🚇 📦 📣 👀 🔧 ⚠️

T Coxon
🐛 💻 🔌 💡 🚇 🚧 👀 🔧 ⚠️ 📓

Dhananjay Ashok
💻 🌍 💡 🚧 ⚠️

Johan Blåbäck
🐛 💻 💡 🚧 📣 👀 ⚠️ 📓

JuliusMartensen
🐛 💻 📖 🔌 💡 🚇 🚧 📦 📣 👀 🔧 📓

ngam
💻 🚇 📦 👀 🔧 ⚠️

Kaze Wong
🐛 💻 💡 🚇 🚧 📣 👀 🔬 📓

Christopher Rackauckas
🐛 💻 🔌 💡 🚇 📣 👀 🔬 🔧 ⚠️ 📓

Patrick Kidger
🐛 💻 📖 🔌 💡 🚧 📣 👀 🔬 🔧 ⚠️ 📓

Okon Samuel
🐛 💻 📖 🚧 💡 🚇 👀 ⚠️ 📓

William Booth-Clibborn
💻 🌍 📖 📓 🚧 👀 🔧 ⚠️

Pablo Lemos
🐛 💡 📣 👀 🔬 📓

Jerry Ling
🐛 💻 📖 🌍 💡 📣 👀 📓

Charles Fox
🐛 💻 💡 🚧 📣 👀 🔬 📓

Johann Brehmer
💻 📖 💡 📣 👀 🔬 ⚠️ 📓

Marius Millea
💻 💡 📣 👀 📓

Coba
🐛 💻 💡 👀 📓

Pietro Monticone
🐛 📖 💡

Mateusz Kubica
📖 💡

Jay Wadekar
🐛 💡 📣 🔬

Anthony Blaom, PhD
🚇 💡 👀

Jgmedina95
🐛 💡 👀

Michael Abbott
💻 💡 👀 🔧

Oscar Smith
💻 💡

Eric Hanson
💡 📣 📓

Henrique Becker
💻 💡 👀

qwertyjl
🐛 📖 💡 📓

Rik Huijzer
💡 🚇

Hongyu Wang
💡 📣 🔬

Saurav Maheshkar
🔧

Quickstart

Install in Julia with:

using Pkg
Pkg.add("SymbolicRegression")

MLJ Interface

The easiest way to use SymbolicRegression.jl is with MLJ. Let's see an example:

import SymbolicRegression: SRRegressor
import MLJ: machine, fit!, predict, report

# Dataset with two named features:
X = (a = rand(500), b = rand(500))

# and one target:
y = @. 2 * cos(X.a * 23.5) - X.b ^ 2 

# with some noise:
y = y .+ randn(500) .* 1e-3

model = SRRegressor(
    niterations=50,
    binary_operators=[+, -, *],
    unary_operators=[cos],
)

Now, let's create and train this model on our data:

mach = machine(model, X, y)

fit!(mach)

You will notice that expressions are printed using the column names of our table. If, instead of a table-like object, a simple array is passed (e.g., X=randn(100, 2)), x1, ..., xn will be used for variable names.

Let's look at the expressions discovered:

report(mach)

Finally, we can make predictions with the expressions on new data:

predict(mach, X)

This will make predictions using the expression chosen by the function passed to selection_method. By default, this selection is made using a mix of accuracy and complexity. For example, we can make predictions using expression 2 with:

mach.model.selection_method = Returns(2)
predict(mach, X)
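As a side note on Returns(2): Returns is part of Julia Base (available since Julia 1.7) and builds a callable that ignores its arguments and always yields the stored value. The sketch below shows only that Base behavior; the names always_two and pick_second are illustrative, and how SRRegressor calls selection_method internally is not shown here.

```julia
# Returns(value) builds a callable that ignores all of its arguments.
always_two = Returns(2)

# Positional or keyword arguments make no difference to the result.
@assert always_two() == 2
@assert always_two("any", :arguments; keyword=1) == 2

# A hand-written closure with the same behavior (illustrative name):
pick_second = (args...; kwargs...) -> 2
@assert pick_second(rand(3)) == always_two(rand(3))
```

Any function that returns an index could presumably be used in the same slot; Returns is just the shortest way to hard-code one.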

For fitting multiple outputs, one can use MultitargetSRRegressor. For a full list of options available to each regressor, see the API page.

Low-Level Interface

The heart of SymbolicRegression.jl is the equation_search function. This takes a 2D array and attempts to model a 1D array using analytic functional forms. Note: unlike the MLJ interface, this assumes column-major input of shape [features, rows].

import SymbolicRegression: Options, equation_search

X = randn(2, 100)
y = 2 * cos.(X[2, :]) + X[1, :] .^ 2 .- 2

options = Options(
    binary_operators=[+, *, /, -],
    unary_operators=[cos, exp],
    populations=20
)

hall_of_fame = equation_search(
    X, y, niterations=40, options=options,
    parallelism=:multithreading
)
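Because of the [features, rows] layout noted above, data stored as a column table (as in the MLJ example) has to be rearranged before it can be passed to the low-level interface. A minimal sketch using only Julia Base, assuming a two-column named tuple called table (an illustrative name):

```julia
# Column table as in the MLJ example: 2 named features, 500 rows each.
table = (a = rand(500), b = rand(500))

# Stack the columns side by side (rows x features), then transpose
# into the features x rows layout.
X_rows_by_features = hcat(table.a, table.b)   # size (500, 2)
X = permutedims(X_rows_by_features)           # size (2, 500)

@assert size(X) == (2, 500)
@assert X[1, :] == table.a   # feature 1 is row 1
@assert X[2, :] == table.b   # feature 2 is row 2
```

With this layout, X[i, :] selects feature i across all rows, which matches how the example above builds y from X[1, :] and X[2, :].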

You can view the resultant equations in the dominating Pareto front (best expression seen at each complexity) with:

import SymbolicRegression: calculate_pareto_frontier

dominating = calculate_pareto_frontier(hall_of_fame)

This is a vector of PopMember objects, each of which contains an expression along with its score. We can get the expressions with:

trees = [member.tree for member in dominating]

Each of these equations is of type Node{T}, for some constant type T (like Float32).
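To make "dominating" concrete, here is a small sketch in plain Julia that does not use the library. It assumes a candidate stays on the front only if its loss is lower than that of every simpler candidate. The candidates data and the dominating_front function are hypothetical, and calculate_pareto_frontier may differ in its details.

```julia
# Hypothetical candidates: the best (complexity, loss) pair seen at each complexity.
candidates = [
    (complexity=1, loss=2.10),
    (complexity=3, loss=0.90),
    (complexity=5, loss=0.95),  # more complex, yet worse than complexity 3
    (complexity=7, loss=0.02),
]

# Walk from simplest to most complex, keeping a candidate only when its
# loss beats every simpler candidate kept so far.
function dominating_front(candidates)
    front = eltype(candidates)[]
    best_loss = Inf
    for c in sort(candidates; by=c -> c.complexity)
        if c.loss < best_loss
            push!(front, c)
            best_loss = c.loss
        end
    end
    return front
end

front = dominating_front(candidates)
@assert [c.complexity for c in front] == [1, 3, 7]
```

The complexity-5 candidate is dropped because a simpler expression already achieves a lower loss.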

You can evaluate a given tree with:

import SymbolicRegression: eval_tree_array

tree = trees[end]
output, did_succeed = eval_tree_array(tree, X, options)

The output array will contain the result of the tree at each of the 100 rows. The did_succeed flag indicates whether the evaluation was successful, or whether it encountered any NaNs or Infs during calculation (such as, e.g., sqrt(-1)).
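The idea behind such a success flag can be sketched in plain Julia. The helper evaluate_with_flag below is hypothetical and is not how eval_tree_array is implemented. It only shows the pattern of returning the output together with a finiteness check.

```julia
# Hypothetical helper: apply f to every element and report whether
# all results are finite (no NaN, no Inf).
function evaluate_with_flag(f, xs::AbstractVector{<:Real})
    output = map(f, xs)
    did_succeed = all(isfinite, output)
    return output, did_succeed
end

_, ok = evaluate_with_flag(x -> x^2, [1.0, 2.0])
@assert ok

# 1 / 0.0 is Inf in floating point, so the flag is false here.
_, ok = evaluate_with_flag(x -> 1 / x, [1.0, 0.0])
@assert !ok
```

Checking the flag before using the output avoids silently propagating NaNs into later computations such as loss values.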

Constructing trees

You can also manipulate and construct trees directly. For example:

import SymbolicRegression: Options, Node, eval_tree_array

options = Options(;
    binary_operators=[+, -, *, ^, /], unary_operators=[cos, exp, sin]
)
x1, x2, x3 = [Node(; feature=i) for i=1:3]
tree = cos(x1 - 3.2 * x2) - x1^3.2

This tree has Float64 constants, so the type of the entire tree will be promoted to Node{Float64}.

We can convert all constants (recursively) to Float32:

float32_tree = convert(Node{Float32}, tree)

We can then evaluate this tree on a dataset:

X = rand(Float32, 3, 100)
output, did_succeed = eval_tree_array(float32_tree, X, options)

Exporting to SymbolicUtils.jl

We can view the equations in the dominating Pareto frontier with:

dominating = calculate_pareto_frontier(hall_of_fame)

We can convert the best equation to SymbolicUtils.jl with the following function:

import SymbolicRegression: node_to_symbolic
import SymbolicUtils: simplify

eqn = node_to_symbolic(dominating[end].tree, options)
println(simplify(eqn*5 + 3))

We can also print out the full Pareto frontier like so:

import SymbolicRegression: compute_complexity, string_tree

println("Complexity\tMSE\tEquation")

for member in dominating
    complexity = compute_complexity(member, options)
    loss = member.loss
    equation = string_tree(member.tree, options)

    println("$(complexity)\t$(loss)\t$(equation)")
end

Code structure

SymbolicRegression.jl is organized roughly as follows. Rounded rectangles indicate objects, and rectangles indicate functions.

(if you can't see this diagram being rendered, try pasting it into mermaid-js.github.io/mermaid-live-editor)

flowchart TB
    op([Options])
    d([Dataset])
    op --> ES
    d --> ES
    subgraph ES[equation_search]
        direction TB
        IP[sr_spawner]
        IP --> p1
        IP --> p2
        subgraph p1[Thread 1]
            direction LR
            pop1([Population])
            pop1 --> src[s_r_cycle]
            src --> opt[optimize_and_simplify_population]
            opt --> pop1
        end
        subgraph p2[Thread 2]
            direction LR
            pop2([Population])
            pop2 --> src2[s_r_cycle]
            src2 --> opt2[optimize_and_simplify_population]
            opt2 --> pop2
        end
        pop1 --> hof
        pop2 --> hof
        hof([HallOfFame])
        hof --> migration
        pop1 <-.-> migration
        pop2 <-.-> migration
        migration[migrate!]
    end
    ES --> output([HallOfFame])

The HallOfFame objects store the expressions with the lowest loss seen at each complexity.
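The bookkeeping described here, keeping only the lowest-loss expression per complexity, can be sketched with a plain dictionary. This is an illustrative stand-in, not the library's HallOfFame type. The function record! and the stored expression strings are hypothetical.

```julia
# Hypothetical stand-in for a hall of fame: complexity => (loss, expression string).
function record!(best::Dict{Int,Tuple{Float64,String}},
                 complexity::Int, loss::Float64, expr::String)
    # Replace the stored entry only when the new loss is strictly lower.
    current_loss = first(get(best, complexity, (Inf, "")))
    if loss < current_loss
        best[complexity] = (loss, expr)
    end
    return best
end

best = Dict{Int,Tuple{Float64,String}}()
record!(best, 3, 0.90, "a * b")
record!(best, 3, 0.40, "cos(a)")      # better at the same complexity: replaces
record!(best, 3, 0.70, "a - b")       # worse: ignored
record!(best, 5, 0.10, "cos(a) - b")

@assert best[3] == (0.40, "cos(a)")
@assert best[5] == (0.10, "cos(a) - b")
```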

The dependency structure of the code itself is as follows:

stateDiagram-v2
    AdaptiveParsimony --> Mutate
    AdaptiveParsimony --> Population
    AdaptiveParsimony --> RegularizedEvolution
    AdaptiveParsimony --> SingleIteration
    AdaptiveParsimony --> SymbolicRegression
    CheckConstraints --> Mutate
    CheckConstraints --> SymbolicRegression
    Complexity --> CheckConstraints
    Complexity --> HallOfFame
    Complexity --> LossFunctions
    Complexity --> Mutate
    Complexity --> Population
    Complexity --> SearchUtils
    Complexity --> SingleIteration
    Complexity --> SymbolicRegression
    ConstantOptimization --> Mutate
    ConstantOptimization --> SingleIteration
    Core --> AdaptiveParsimony
    Core --> CheckConstraints
    Core --> Complexity
    Core --> ConstantOptimization
    Core --> HallOfFame
    Core --> InterfaceDynamicExpressions
    Core --> LossFunctions
    Core --> Migration
    Core --> Mutate
    Core --> MutationFunctions
    Core --> PopMember
    Core --> Population
    Core --> Recorder
    Core --> RegularizedEvolution
    Core --> SearchUtils
    Core --> SingleIteration
    Core --> SymbolicRegression
    Dataset --> Core
    HallOfFame --> SearchUtils
    HallOfFame --> SingleIteration
    HallOfFame --> SymbolicRegression
    InterfaceDynamicExpressions --> LossFunctions
    InterfaceDynamicExpressions --> SymbolicRegression
    LossFunctions --> ConstantOptimization
    LossFunctions --> HallOfFame
    LossFunctions --> Mutate
    LossFunctions --> PopMember
    LossFunctions --> Population
    LossFunctions --> SymbolicRegression
    Migration --> SymbolicRegression
    Mutate --> RegularizedEvolution
    MutationFunctions --> Mutate
    MutationFunctions --> Population
    MutationFunctions --> SymbolicRegression
    Operators --> Core
    Operators --> Options
    Options --> Core
    OptionsStruct --> Core
    OptionsStruct --> Options
    PopMember --> ConstantOptimization
    PopMember --> HallOfFame
    PopMember --> Migration
    PopMember --> Mutate
    PopMember --> Population
    PopMember --> RegularizedEvolution
    PopMember --> SingleIteration
    PopMember --> SymbolicRegression
    Population --> Migration
    Population --> RegularizedEvolution
    Population --> SearchUtils
    Population --> SingleIteration
    Population --> SymbolicRegression
    ProgramConstants --> Core
    ProgramConstants --> Dataset
    ProgressBars --> SearchUtils
    ProgressBars --> SymbolicRegression
    Recorder --> Mutate
    Recorder --> RegularizedEvolution
    Recorder --> SingleIteration
    Recorder --> SymbolicRegression
    RegularizedEvolution --> SingleIteration
    SearchUtils --> SymbolicRegression
    SingleIteration --> SymbolicRegression
    Utils --> CheckConstraints
    Utils --> ConstantOptimization
    Utils --> Options
    Utils --> PopMember
    Utils --> SingleIteration
    Utils --> SymbolicRegression

Bash command to generate dependency structure from src directory (requires vim-stream):

echo 'stateDiagram-v2'
IFS=$'\n'
for f in *.jl; do
    for line in $(grep -e 'import \.\.' -e 'import \.' "$f"); do
        echo $(echo $line | vims -s 'dwf:d$' -t '%s/^\.*//g' '%s/Module//g') $(basename "$f" .jl);
    done;
done | vims -l 'f a--> ' | sort

Search options

See https://astroautomata.com/SymbolicRegression.jl/stable/api/#Options
