  • Language: Python
  • License: Apache License 2.0
  • Created over 3 years ago
  • Updated 11 months ago


Repository Details

Minimum-distortion embedding with PyTorch

PyMDE


The official documentation for PyMDE is available at www.pymde.org.

This repository accompanies the monograph Minimum-Distortion Embedding.

PyMDE is a Python library for computing vector embeddings for finite sets of items, such as images, biological cells, nodes in a network, or any other abstract object.

What sets PyMDE apart from other embedding libraries is that it provides a simple but general framework for embedding, called Minimum-Distortion Embedding (MDE). With MDE, it is easy to recreate well-known embeddings and to create new ones, tailored to your particular application.

PyMDE is competitive in runtime with more specialized embedding methods. With a GPU, it can be even faster.

Overview

PyMDE can be enjoyed by beginners and experts alike. It can be used to:

  • visualize datasets, small or large;
  • generate feature vectors for supervised learning;
  • compress high-dimensional vector data;
  • draw graphs (in some cases orders of magnitude faster than packages like NetworkX);
  • create custom embeddings, with custom objective functions and constraints (such as having uncorrelated feature columns);
  • and more.

PyMDE is young software under active development. If you run into issues or have any feedback, please reach out by filing a GitHub issue.

This README gives a very brief overview of PyMDE. Make sure to read the official documentation at www.pymde.org, which has in-depth tutorials and API documentation.

Installation

PyMDE is available on the Python Package Index and on Conda Forge.

To install with pip, use

pip install pymde

Alternatively, to install with conda, use

conda install -c pytorch -c conda-forge pymde

PyMDE has the following requirements:

  • Python >= 3.7
  • numpy >= 1.17.5
  • scipy
  • torch >= 1.7.1
  • torchvision >= 0.8.2
  • pynndescent
  • requests
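
To see which of these packages are already present in an environment, something like the following sketch can help. It is an illustration rather than part of PyMDE: the helper name installed_versions is made up here, it relies on importlib.metadata (in the standard library from Python 3.8 onward), and it only reports versions without comparing them against the minimums listed above.

```python
from importlib import metadata

# Distribution names taken from the requirements list above.
REQUIRED = ["numpy", "scipy", "torch", "torchvision", "pynndescent", "requests"]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


report = installed_versions(REQUIRED)
for name, version in report.items():
    print(f"{name}: {version if version is not None else 'not installed'}")
```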

Getting started

Getting started with PyMDE is easy. For embeddings that work out of the box, we provide two main functions:

pymde.preserve_neighbors

which preserves the local structure of original data, and

pymde.preserve_distances

which preserves pairwise distances or dissimilarity scores in the original data.

Arguments. The input to these functions is the original data, represented either as a data matrix in which each row is a feature vector, or as a (possibly sparse) graph encoding pairwise distances. The embedding dimension is specified by the embedding_dim keyword argument, which is 2 by default.

Return value. The return value is an MDE object. Calling the embed() method on this object returns an embedding, which is a matrix (torch.Tensor) in which each row is an embedding vector. For example, if the original input is a data matrix of shape (n_items, n_features), then the embedding matrix has shape (n_items, embedding_dim).

We give examples of using these functions below.

Preserving neighbors

The following code produces an embedding of the MNIST dataset (images of handwritten digits), in a fashion similar to LargeVis, t-SNE, UMAP, and other neighborhood-based embeddings. The original data is a matrix of shape (70000, 784), with each row representing an image.

import pymde

mnist = pymde.datasets.MNIST()
embedding = pymde.preserve_neighbors(mnist.data, verbose=True).embed()
pymde.plot(embedding, color_by=mnist.attributes['digits'])
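
Neighborhood-based embeddings such as the one above start from a record of which items are close to each other in the original data. The sketch below shows that idea with a brute-force k-nearest-neighbor search over a tiny made-up dataset. It is only an illustration: the function name and the points are hypothetical, and it says nothing about how PyMDE itself finds neighbors.

```python
import math


def k_nearest_neighbors(rows, k):
    """Return, for each row index, the indices of its k closest other rows."""
    neighbors = {}
    for i, a in enumerate(rows):
        # Euclidean distance from row i to every other row, smallest first.
        ranked = sorted(
            (math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b))), j)
            for j, b in enumerate(rows)
            if j != i
        )
        neighbors[i] = [j for _, j in ranked[:k]]
    return neighbors


# Four hypothetical 2-D feature vectors forming two well-separated pairs.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
knn = k_nearest_neighbors(points, k=1)
print(knn)
```

Each point's nearest neighbor is the other member of its pair, which is the kind of local structure a neighbor-preserving embedding tries to keep.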

Unlike most other embedding methods, PyMDE can compute embeddings that satisfy constraints. For example:

embedding = pymde.preserve_neighbors(mnist.data, constraint=pymde.Standardized(), verbose=True).embed()
pymde.plot(embedding, color_by=mnist.attributes['digits'])

The standardization constraint requires the embedding vectors to be centered and to have uncorrelated features.
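
To make "centered" and "uncorrelated" concrete, here is a small pure-Python sketch that checks both properties on a matrix given as a list of rows. The function names and the two toy matrices are made up for illustration, and the check covers only the two properties named above; it is not PyMDE's implementation of the constraint.

```python
def column_means(rows):
    """Mean of each column of a matrix given as a list of rows."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def covariance(rows):
    """Covariance matrix of the columns (dividing by the number of rows)."""
    n = len(rows)
    means = column_means(rows)
    centered = [[x - m for x, m in zip(row, means)] for row in rows]
    dim = len(means)
    return [
        [sum(r[i] * r[j] for r in centered) / n for j in range(dim)]
        for i in range(dim)
    ]


def is_centered_and_uncorrelated(rows, tol=1e-9):
    """True if every column has mean ~0 and distinct columns have covariance ~0."""
    means = column_means(rows)
    cov = covariance(rows)
    dim = len(means)
    centered = all(abs(m) <= tol for m in means)
    uncorrelated = all(
        abs(cov[i][j]) <= tol for i in range(dim) for j in range(dim) if i != j
    )
    return centered and uncorrelated


# Corners of a square: centered, and the two columns are uncorrelated.
square = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
# Points on a diagonal line: centered, but the columns move together.
line = [[1.0, 1.0], [-1.0, -1.0], [2.0, 2.0], [-2.0, -2.0]]

print(is_centered_and_uncorrelated(square))
print(is_centered_and_uncorrelated(line))
```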

Preserving distances

The function pymde.preserve_distances is useful when you care more about preserving global structure than local structure.

Here's an example that produces an embedding of an academic coauthorship network, from Google Scholar. The original data is a sparse graph on roughly 40,000 authors, with an edge between authors who have collaborated on at least one paper.

import pymde

google_scholar = pymde.datasets.google_scholar()
embedding = pymde.preserve_distances(google_scholar.data, verbose=True).embed()
pymde.plot(embedding, color_by=google_scholar.attributes['coauthors'], color_map='viridis', background_color='black')

More collaborative authors are colored brighter, and are near the center of the embedding.
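
One natural notion of distance on an unweighted graph like this is the number of edges on the shortest path between two nodes. The sketch below computes those hop counts with a breadth-first search on a tiny, made-up coauthorship graph. The function name and the author names are hypothetical, and treating hop count as the distance is an assumption made for illustration, not a description of what pymde.preserve_distances does internally.

```python
from collections import deque


def hop_distances(edges, source):
    """Shortest-path hop counts from source to every reachable node."""
    # Build an undirected adjacency map from the edge list.
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)

    # Breadth-first search visits nodes in order of increasing hop count.
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


# Hypothetical authors; an edge means the two have written a paper together.
coauthorships = [("ada", "bo"), ("bo", "cy"), ("cy", "di"), ("ada", "cy")]
distances = hop_distances(coauthorships, "ada")
print(distances)
```

Here "di" is two hops from "ada" even though they never wrote a paper together, which is the sort of global relationship a distance-preserving embedding aims to reflect.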

Example notebooks

We have several example notebooks that show how to use PyMDE on real (and synthetic) datasets.

Citing

To cite our work, please use the following BibTeX entry.

@article{agrawal2021minimum,
  author  = {Agrawal, Akshay and Ali, Alnur and Boyd, Stephen},
  title   = {Minimum-Distortion Embedding},
  journal = {arXiv},
  year    = {2021},
}

PyMDE was designed and developed by Akshay Agrawal.
