• Stars
    star
    369
  • Rank 114,993 (Top 3 %)
  • Language
    Python
  • License
    MIT License
  • Created over 4 years ago
  • Updated about 2 months ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

Model Predictive Path Integral (MPPI) with approximate dynamics implemented in pytorch

PyTorch MPPI Implementation

This repository implements Model Predictive Path Integral (MPPI) with approximate dynamics in pytorch. MPPI typically requires actual trajectory samples, but this paper showed that it could be done with approximate dynamics (such as with a neural network) using importance sampling.

Thus it can be used in place of other trajectory optimization methods such as the Cross Entropy Method (CEM), or random shooting.

Installation

pip install pytorch-mppi

for autotuning hyperparameters, install with

pip install pytorch-mppi[tune]

for running tests, install with

pip install pytorch-mppi[test]

for development, clone the repository then install in editable mode

pip install -e .

Usage

See tests/pendulum_approximate.py for usage with a neural network approximating the pendulum dynamics. See the not_batch branch for an easier to read algorithm. Basic use case is shown below

from pytorch_mppi import MPPI

# create controller with chosen parameters
ctrl = MPPI(dynamics, running_cost, nx, noise_sigma, num_samples=N_SAMPLES, horizon=TIMESTEPS,
            lambda_=lambda_, device=d,
            u_min=torch.tensor(ACTION_LOW, dtype=torch.double, device=d),
            u_max=torch.tensor(ACTION_HIGH, dtype=torch.double, device=d))

# assuming you have a gym-like env
obs = env.reset()
for i in range(100):
    action = ctrl.command(obs)
    obs, reward, done, _ = env.step(action.cpu().numpy())

Requirements

  • pytorch (>= 1.0)
  • next state <- dynamics(state, action) function (doesn't have to be true dynamics)
    • state is K x nx, action is K x nu
  • cost <- running_cost(state, action) function
    • cost is K x 1, state is K x nx, action is K x nu

Features

  • Approximate dynamics MPPI with importance sampling
  • Parallel/batch pytorch implementation for accelerated sampling
  • Control bounds via sampling control noise from rectified gaussian
  • Handle stochastic dynamic models (assuming each call is a sample) by sampling multiple state trajectories for the same action trajectory with rollout_samples

Parameter tuning and hints

terminal_state_cost - function(state (K x T x nx)) -> cost (K x 1) by default there is no terminal cost, but if you experience your trajectory getting close to but never quite reaching the goal, then having a terminal cost can help. The function should scale with the horizon (T) to keep up with the scaling of the running cost.

lambda_ - higher values increases the cost of control noise, so you end up with more samples around the mean; generally lower values work better (try 1e-2)

num_samples - number of trajectories to sample; generally the more the better. Runtime performance scales much better with num_samples than horizon, especially if you're using a GPU device (remember to pass that in!)

noise_mu - the default is 0 for all control dimensions, which may work out really poorly if you have control bounds and the allowed range is not 0-centered. Remember to change this to an appropriate value for non-symmetric control dimensions.

Autotune

from version 0.5.0 onwards, you can automatically tune the hyperparameters. A convenient tuner compatible with the popular ray tune library is implemented. You can select from a variety of cutting edge black-box optimizers such as CMA-ES, HyperOpt, fmfn/BayesianOptimization, and so on. See tests/auto_tune_parameters.py for an example. A tutorial based on it follows.

The tuner can be used for other controllers as well, but you will need to define the appropriate TunableParameter subclasses.

First we create a toy 2D environment to do controls on and create the controller with some default parameters.

import torch
from pytorch_mppi import MPPI

device = "cpu"
dtype = torch.double

# create toy environment to do on control on (default start and goal)
env = Toy2DEnvironment(visualize=True, terminal_scale=10)

# create MPPI with some initial parameters
mppi = MPPI(env.dynamics, env.running_cost, 2,
            terminal_state_cost=env.terminal_cost,
            noise_sigma=torch.diag(torch.tensor([5., 5.], dtype=dtype, device=device)),
            num_samples=500,
            horizon=20, device=device,
            u_max=torch.tensor([2., 2.], dtype=dtype, device=device),
            lambda_=1)

We then need to create an evaluation function for the tuner to tune on. It should take no arguments and output a EvaluationResult populated at least by costs. If you don't need rollouts for the cost evaluation, then you can set it to None in the return. Tips for creating the evaluation function are described in comments below:

from pytorch_mppi import autotune
# use the same nominal trajectory to start with for all the evaluations for fairness
nominal_trajectory = mppi.U.clone()
# parameters for our sample evaluation function - lots of choices for the evaluation function
evaluate_running_cost = True
num_refinement_steps = 10
num_trajectories = 5

def evaluate():
    costs = []
    rollouts = []
    # we sample multiple trajectories for the same start to goal problem, but in your case you should consider
    # evaluating over a diverse dataset of trajectories
    for j in range(num_trajectories):
        mppi.U = nominal_trajectory.clone()
        # the nominal trajectory at the start will be different if the horizon's changed
        mppi.change_horizon(mppi.T)
        # usually MPPI will have its nominal trajectory warm-started from the previous iteration
        # for a fair test of tuning we will reset its nominal trajectory to the same random one each time
        # we manually warm it by refining it for some steps
        for k in range(num_refinement_steps):
            mppi.command(env.start, shift_nominal_trajectory=False)

        rollout = mppi.get_rollouts(env.start)

        this_cost = 0
        rollout = rollout[0]
        # here we evaluate on the rollout MPPI cost of the resulting trajectories
        # alternative costs for tuning the parameters are possible, such as just considering terminal cost
        if evaluate_running_cost:
            for t in range(len(rollout) - 1):
                this_cost = this_cost + env.running_cost(rollout[t], mppi.U[t])
        this_cost = this_cost + env.terminal_cost(rollout, mppi.U)

        rollouts.append(rollout)
        costs.append(this_cost)
    # can return None for rollouts if they do not need to be calculated
    return autotune.EvaluationResult(torch.stack(costs), torch.stack(rollouts))

With this we have enough to start tuning. For example, we can tune iteratively with the CMA-ES optimizer

# these are subclass of TunableParameter (specifically MPPIParameter) that we want to tune
params_to_tune = [autotune.SigmaParameter(mppi), autotune.HorizonParameter(mppi), autotune.LambdaParameter(mppi)]
# create a tuner with a CMA-ES optimizer
tuner = autotune.Autotune(params_to_tune, evaluate_fn=evaluate, optimizer=autotune.CMAESOpt(sigma=1.0))
# tune parameters for a number of iterations
iterations = 30
for i in range(iterations):
  # results of this optimization step are returned
  res = tuner.optimize_step()
  # we can render the rollouts in the environment
  env.draw_rollouts(res.rollouts)
# get best results and apply it to the controller
# (by default the controller will take on the latest tuned parameter, which may not be best)
res = tuner.get_best_result()
tuner.apply_parameters(res.param_values)

This is a local search method that optimizes starting from the initially defined parameters. For global searching, we use ray tune compatible searching algorithms. Note that you can modify the search space of each parameter, but default reasonable ones are provided.

# can also use a Ray Tune optimizer, see
# https://docs.ray.io/en/latest/tune/api_docs/suggestion.html#search-algorithms-tune-search
# rather than adapting the current parameters, these optimizers allow you to define a search space for each
# and will search on that space
from pytorch_mppi import autotune_global
from ray.tune.search.hyperopt import HyperOptSearch
from ray.tune.search.bayesopt import BayesOptSearch

# the global version of the parameters define a reasonable search space for each parameter
params_to_tune = [autotune_global.SigmaGlobalParameter(mppi),
                  autotune_global.HorizonGlobalParameter(mppi),
                  autotune_global.LambdaGlobalParameter(mppi)]

# be sure to close any figures before ray tune optimization or they will be duplicated
env.visualize = False
plt.close('all')
tuner = autotune_global.AutotuneGlobal(params_to_tune, evaluate_fn=evaluate,
                                       optimizer=autotune_global.RayOptimizer(HyperOptSearch))
# ray tuners cannot be tuned iteratively, but you can specify how many iterations to tune for
res = tuner.optimize_all(100)
res = tuner.get_best_result()
tuner.apply_parameters(res.params)

For example tuning hyperparameters (with CMA-ES) only on the toy problem (the nominal trajectory is reset each time so they are sampling from noise):

toy tuning

If you want more than just the best solution found, such as if you want diversity across hyperparameter values, or if your evaluation function has large uncertainty, then you can directly query past results by

for res in tuner.optim.all_res:
    # the cost
    print(res.metrics['cost'])
    # extract the parameters
    params = tuner.config_to_params(res.config)
    print(params)
    # apply the parameters to the controller
    tuner.apply_parameters(params)

Alternatively you can try Quality Diversity optimization using the CMA-ME optimizer. This optimizer will try to optimize for high quality parameters while ensuring there is diversity across them. However, it is very slow and you might be better using a RayOptimizer and selecting for top results while checking for diversity. To use it, you need to install

pip install ribs

You then use it as

import pytorch_mppi.autotune_qd

optim = pytorch_mppi.autotune_qd.CMAMEOpt()
tuner = autotune_global.AutotuneGlobal(params_to_tune, evaluate_fn=evaluate,
                                       optimizer=optim)

iterations = 10
for i in range(iterations):
  # results of this optimization step are returned
  res = tuner.optimize_step()
  # we can render the rollouts in the environment
  best_params = optim.get_diverse_top_parameters(5)
  for res in best_params:
    print(res)

Tests

You'll need to install gym<=0.20 to run the tests (for the Pendulum-v0 environment). The easy way to install this and other testing dependencies is running python setup.py test. Note that gym past 0.20 deprecated Pendulum-v0 for Pendulum-v1 with incompatible dynamics.

Under tests you can find the MPPI method applied to known pendulum dynamics and approximate pendulum dynamics (with a 2 layer feedforward net estimating the state residual). Using a continuous angle representation (feeding cos(\theta), sin(\theta) instead of \theta directly) makes a huge difference. Although both works, the continuous representation is much more robust to controller parameters and random seed. In addition, the problem of continuing to spin after over-swinging does not appear.

Sample result on approximate dynamics with 100 steps of random policy data to initialize the dynamics:

pendulum results

Related projects

  • pytorch CEM - an alternative MPC shooting method with similar API as this project

More Repositories

1

pytorch_kinematics

Robot kinematics implemented in pytorch
Python
233
star
2

sdf_tools

Builds 2D signed distance fields from images, 3D signed distance fields from pointclouds, 3D signed distance fields from Octomap, provides a lightweight signed distance field library, message types for signed distance fields, and tools to compress signed distance fields for transport.
C++
187
star
3

Chamfer-Distance-API

A python class that calculates chamfer distance between point clouds using tensorflow
C++
58
star
4

cdcpd

Library for Constrained Deformable Coherent Point Drift
C++
29
star
5

pytorch_volumetric

Volumetric structures such as voxels and SDFs implemented in pytorch
Python
23
star
6

vr_teleop

[ROS 1] Teleop control of dual arm robots using virtual reality devices
CMake
19
star
7

Efficient-Eng-2-LTL

The associated repo for paper "Data-Efficient Learning of Natural Language to Linear Temporal Logic Translators for Robot Task Specification"
Python
16
star
8

chsel

Constrained pose Hypothesis Set Elimination official code for pose estimation
Python
11
star
9

arc_utilities

C++ and Python utilities. ARC -> ARM
C++
10
star
10

husky_stereo_nav

ROS package for navigation of Clearpath Husky using visual SLAM with a stereo camera
Python
7
star
11

kuka_iiwa_interface

Developmental controllers for the IIWA 7 robot, including a Catkinized version of the Kuka FRI SDK
C++
7
star
12

pytorch_collision_checker

PyTorch Collision Checker
Python
6
star
13

probabilistic_shape_completion

Code for Probabilistic Shape Completion
Python
6
star
14

arm_robots

Packages for controlling our robots at a high-level, though simple to use ROS and Python interfaces.
Python
6
star
15

arm_pytorch_utilities

Utilities for working with pytorch
Python
6
star
16

tampc

Trap aware model predictive control
Python
5
star
17

fast_kinematic_simulator

Fast kinematic simulator used for motion planning and control with uncertainty projects.
C++
5
star
18

merge_pointclouds

C++
5
star
19

unity_victor_teleop

Unity project for teleoperation of ARM-lab's "Victor" robot using an HTC Vive
C#
5
star
20

visual_servoing

position based visual servoing library with PnP pose estimation
Python
5
star
21

robot_calibration

A ROS package that publishes position of a table in kinect camera frame.
Python
5
star
22

uncertainty_planning_examples

SE(2), SE(3), Baxter, and UR5 examples for using uncertainty_planning_core
C++
4
star
23

vr_ros2_bridge

Unity project and ROS 2 interface definitions for using the HTC Vive
C#
4
star
24

gpu_voxel_planning

Brad's library for PhD research on CHS, BTP, and other voxel-based planning approaches using ARMLabs Victor robot
C++
3
star
25

mab_ms

Bandit-based Model Selection for Deformable Object Manipulation
C++
3
star
26

CudaSift

Cuda
3
star
27

Traversability-Based-Contact-Space-Planner

Traversability-based contact planner code release
C++
3
star
28

multihypothesis_segmentation_tracking

Public release of code for Manipulating Piles of Stuff voxel pipeline.
Jupyter Notebook
3
star
29

stucco

Soft Tracking Using Contacts for Cluttered Objects
Python
3
star
30

deformable_manipulation_interface

ROS interface between a robot/simulator and planners for deformable object manipulation.
C++
3
star
31

mjregrasping_cpp

C++
2
star
32

arm_segmentation

Python
2
star
33

pytorch_rrt

Kinodynamic RRT implemented in pytorch
Python
2
star
34

arm_video_recorder

ROS node to record from a connected video device (webcam) and save to a file
Python
2
star
35

uncertainty_planning_core

Framework for motion planning with uncertainty
C++
2
star
36

plant_motion_planning

This repository contains code related to the motion planning around plants project.
C++
2
star
37

mjregrasping

Code accompanying "The Grasp Loop Signature: A Topological Representation for Manipulation Planning with Ropes and Cables"
Python
2
star
38

unreliable-deform-manipulation

Code accompanying the paper "Learning Where to Trust Unreliable Models in an Unstructured World for Deformable Object Manipulation"
Jupyter Notebook
2
star
39

gpu_profiling

GPU Profiling Tools For The Lab
Python
1
star
40

val_pbvs_pybullet

Pybullet simulation for visual servoing of a dual arm robot val
Python
1
star
41

SiamMask

Python
1
star
42

deform_control

Project for deformable object control
C++
1
star
43

kinect2_calibration_files

Calibration files for use with the iai_kinect2 ROS package.
CMake
1
star
44

smmap

Simultaneous Modeling Manipulation And Planning
C++
1
star
45

unity_victor_teleop_headless

C#
1
star
46

pr2_reachability_map_visualization

Python
1
star
47

focused-adaptation-dynamics

Code accompanying Focused Adaptation of Dynamics Models for Deformable Object Manipulation
Jupyter Notebook
1
star
48

mps_shape_completion

Neural network completion of partial shapes in a voxel grid
Python
1
star
49

pytorch_seed

RNG seeding and context management for pytorch
Python
1
star
50

data-augmentation-for-manipulation

Code accompanying the RSS paper "Data Augmentation for Manipulation"
Jupyter Notebook
1
star
51

arm_clouds

Simple module for convenient NumPy and PyTorch point cloud utilities
Python
1
star
52

lightweight_vicon_bridge

Bridge between Vicon DataStream SDK and ROS. Designed specifically for reduced latency.
C++
1
star
53

husky_ark_armlab

A Python ROS package to provide additional functionality for the Clearpath Autonomous Research Kit. This project was conducted for the University of Michigan's ARM Lab.
Python
1
star