  • Stars: 536
  • Rank: 82,794 (Top 2 %)
  • Language: Python
  • License: Apache License 2.0
  • Created almost 4 years ago
  • Updated over 1 year ago

Repository Details

Minimum-distortion embedding with PyTorch

PyMDE

The official documentation for PyMDE is available at www.pymde.org.

This repository accompanies the monograph Minimum-Distortion Embedding.

PyMDE is a Python library for computing vector embeddings for finite sets of items, such as images, biological cells, nodes in a network, or any other abstract object.

What sets PyMDE apart from other embedding libraries is that it provides a simple but general framework for embedding, called Minimum-Distortion Embedding (MDE). With MDE, it is easy to recreate well-known embeddings and to create new ones, tailored to your particular application.
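To make the idea of "distortion" concrete, here is a minimal pure-Python sketch, not PyMDE's implementation. It assumes the formulation in which each pair of items (i, j) has a distortion function of the embedding distance d_ij, and the quantity to minimize is the average distortion over the pairs. The helper names (average_distortion, quadratic) and the toy data are hypothetical.

```python
import math


def average_distortion(embedding, edges, distortion):
    """Mean of distortion(i, j, d_ij) over the given pairs of items."""
    total = 0.0
    for i, j in edges:
        # d_ij is the Euclidean distance between the two embedding vectors.
        d_ij = math.dist(embedding[i], embedding[j])
        total += distortion(i, j, d_ij)
    return total / len(edges)


def quadratic(i, j, d):
    """Hypothetical distortion function: penalizes pairs that are far apart."""
    return d ** 2


# Toy problem: three items embedded in the plane, two pairs to compare.
embedding = [(0.0, 0.0), (3.0, 4.0), (0.0, 1.0)]
edges = [(0, 1), (0, 2)]

value = average_distortion(embedding, edges, quadratic)
print(value)
```

Choosing different distortion functions for different pairs is what lets a single framework express many different embedding methods.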

PyMDE is competitive in runtime with more specialized embedding methods. With a GPU, it can be even faster.

Overview

PyMDE can be enjoyed by beginners and experts alike. It can be used to:

  • visualize datasets, small or large;
  • generate feature vectors for supervised learning;
  • compress high-dimensional vector data;
  • draw graphs (up to orders of magnitude faster than packages like NetworkX);
  • create custom embeddings, with custom objective functions and constraints (such as having uncorrelated feature columns);
  • and more.

PyMDE is very young software, under active development. If you run into issues, or have any feedback, please reach out by filing a GitHub issue.

This README gives a very brief overview of PyMDE. Make sure to read the official documentation at www.pymde.org, which has in-depth tutorials and API documentation.

Installation

PyMDE is available on the Python Package Index, and on Conda Forge.

To install with pip, use

pip install pymde

Alternatively, to install with conda, use

conda install -c pytorch -c conda-forge pymde

PyMDE has the following requirements:

  • Python >= 3.7
  • numpy >= 1.17.5
  • scipy
  • torch >= 1.7.1
  • torchvision >= 0.8.2
  • pynndescent
  • requests

Getting started

Getting started with PyMDE is easy. For embeddings that work out of the box, we provide two main functions:

pymde.preserve_neighbors

which preserves the local structure of original data, and

pymde.preserve_distances

which preserves pairwise distances or dissimilarity scores in the original data.

Arguments. The input to these functions is the original data, represented either as a data matrix in which each row is a feature vector, or as a (possibly sparse) graph encoding pairwise distances. The embedding dimension is specified by the embedding_dim keyword argument, which is 2 by default.

Return value. The return value is an MDE object. Calling the embed() method on this object returns an embedding, which is a matrix (torch.Tensor) in which each row is an embedding vector. For example, if the original input is a data matrix of shape (n_items, n_features), then the embedding matrix has shape (n_items, embedding_dim).

We give examples of using these functions below.

Preserving neighbors

The following code produces an embedding of the MNIST dataset (images of handwritten digits), in a fashion similar to LargeVis, t-SNE, UMAP, and other neighborhood-based embeddings. The original data is a matrix of shape (70000, 784), with each row representing an image.

import pymde

mnist = pymde.datasets.MNIST()
embedding = pymde.preserve_neighbors(mnist.data, verbose=True).embed()
pymde.plot(embedding, color_by=mnist.attributes['digits'])
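For intuition about what "neighborhood-based" means, the following pure-Python sketch builds a k-nearest-neighbor graph from a small data matrix by brute force. It only illustrates the concept; the function name knn_edges and the toy data are hypothetical, and PyMDE's own preprocessing may differ (for large datasets, an approximate nearest-neighbor search is the usual choice).

```python
import math


def knn_edges(data, k):
    """Return sorted pairs (i, j), i < j, where one item is among the
    k nearest neighbors of the other (brute force, Euclidean distance)."""
    edges = set()
    for i, row in enumerate(data):
        others = [j for j in range(len(data)) if j != i]
        # Sort the other items by their distance to item i.
        others.sort(key=lambda j: math.dist(row, data[j]))
        for j in others[:k]:
            edges.add((min(i, j), max(i, j)))
    return sorted(edges)


# Toy data matrix: each row is a feature vector; two well-separated pairs.
data = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
edges = knn_edges(data, k=1)
print(edges)
```

An embedding that preserves neighbors tries to keep the items joined by such edges close together, without insisting on exact distances between far-apart items.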

Unlike most other embedding methods, PyMDE can compute embeddings that satisfy constraints. For example:

embedding = pymde.preserve_neighbors(mnist.data, constraint=pymde.Standardized(), verbose=True).embed()
pymde.plot(embedding, color_by=mnist.attributes['digits'])

The standardization constraint requires the embedding vectors to be centered and to have uncorrelated features.
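As a rough check of what a standardized embedding looks like, the sketch below tests whether a small matrix is centered and has uncorrelated columns. It additionally assumes that each column has unit variance, which is an assumption beyond what this README states. The helper names are hypothetical and the code is plain Python rather than PyMDE's API.

```python
def column_means(X):
    """Mean of each column of X, given as a list of rows."""
    n = len(X)
    return [sum(row[a] for row in X) / n for a in range(len(X[0]))]


def second_moment(X):
    """The matrix (1/n) X^T X, which equals the covariance when X is centered."""
    n, m = len(X), len(X[0])
    return [[sum(row[a] * row[b] for row in X) / n for b in range(m)]
            for a in range(m)]


def is_standardized(X, tol=1e-9):
    """True if the columns are centered, uncorrelated, and of unit variance."""
    m = len(X[0])
    centered = all(abs(mu) <= tol for mu in column_means(X))
    S = second_moment(X)
    identity = all(abs(S[a][b] - (1.0 if a == b else 0.0)) <= tol
                   for a in range(m) for b in range(m))
    return centered and identity


# Toy embeddings: the corners of a square, and the same square shifted.
square = [(1.0, 1.0), (1.0, -1.0), (-1.0, 1.0), (-1.0, -1.0)]
shifted = [(x + 1.0, y) for x, y in square]

print(is_standardized(square), is_standardized(shifted))
```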

Preserving distances

The function pymde.preserve_distances is useful when you're more interested in preserving the global structure of the data than its local structure.

Here's an example that produces an embedding of an academic coauthorship network, from Google Scholar. The original data is a sparse graph on roughly 40,000 authors, with an edge between authors who have collaborated on at least one paper.

import pymde

google_scholar = pymde.datasets.google_scholar()
embedding = pymde.preserve_distances(google_scholar.data, verbose=True).embed()
pymde.plot(embedding, color_by=google_scholar.attributes['coauthors'], color_map='viridis', background_color='black')

More collaborative authors are colored brighter, and are near the center of the embedding.
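When the input is an unweighted graph such as a coauthorship network, one natural notion of distance between two nodes is the number of edges on a shortest path between them. The sketch below computes these hop distances from one node with breadth-first search. It illustrates the idea only; whether and how PyMDE derives distances from a graph internally is not stated here, and the function name and toy graph are hypothetical.

```python
from collections import deque


def hop_distances(n_nodes, edges, source):
    """Shortest-path (hop) distance from source to every reachable node."""
    neighbors = [[] for _ in range(n_nodes)]
    for u, v in edges:
        # The graph is undirected: collaboration is symmetric.
        neighbors[u].append(v)
        neighbors[v].append(u)

    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in neighbors[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


# Toy network: authors 0-1-2-3 form a chain; author 4 has no coauthors.
edges = [(0, 1), (1, 2), (2, 3)]
distances = hop_distances(5, edges, source=0)
print(distances)
```

Nodes that cannot be reached from the source, such as author 4 here, are simply absent from the result.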

Example notebooks

We have several example notebooks that show how to use PyMDE on real (and synthetic) datasets.

Citing

To cite our work, please use the following BibTeX entry.

@article{agrawal2021minimum,
  author  = {Agrawal, Akshay and Ali, Alnur and Boyd, Stephen},
  title   = {Minimum-Distortion Embedding},
  journal = {arXiv},
  year    = {2021},
}

PyMDE was designed and developed by Akshay Agrawal.
