Optimizers

Optimizer Base

class blackbox_mpc.optimizers.OptimizerBase(name, planning_horizon, max_iterations, num_agents, env_action_space, env_observation_space)
__call__(current_state, time_step, add_exploration_noise)

This is the call function for the base optimizer class. It computes the optimal action for the current time step given the current state.

Parameters
  • current_state (tf.float32) – Defines the current state of the system, (dims=num_of_agents X dim_S)

  • time_step (tf.float32) – Defines the current timestep of the episode.

  • add_exploration_noise (tf.bool) – Defines whether noise should be added to the optimal action before it is returned.

Returns

  • resulting_action (tf.float32) – The optimal solution for the first action to be applied in the current time step.

  • next_state (tf.float32) – The next state predicted using the dynamics model in the trajectory evaluator.

  • rewards_of_next_state (tf.float32) – The predicted reward achieved after applying the action given by the optimizer.
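
The call contract above (state, time step and noise flag in; action, predicted next state and predicted reward out) is typically used in a receding-horizon loop. The sketch below illustrates that loop only. StubOptimizer, its toy proportional rule, the toy dynamics and run_episode are hypothetical stand-ins written for this example and are not part of blackbox_mpc.

```python
class StubOptimizer:
    """Hypothetical stand-in that mimics the documented call contract."""

    def __call__(self, current_state, time_step, add_exploration_noise):
        # Toy rule: push every state component halfway towards zero.
        action = [-0.5 * s for s in current_state]
        # Toy dynamics (assumed): next_state = state + action.
        next_state = [s + a for s, a in zip(current_state, action)]
        # Toy reward: negative squared distance from the origin.
        reward = -sum(s * s for s in next_state)
        return action, next_state, reward

    def reset(self):
        # Nothing to clear in this stub.
        pass


def run_episode(optimizer, initial_state, num_steps):
    """Apply only the first planned action at every step, then replan."""
    optimizer.reset()
    state = list(initial_state)
    predicted_rewards = []
    for time_step in range(num_steps):
        action, _predicted_next, predicted_reward = optimizer(
            state, time_step, False)
        # Stand-in for stepping a real environment with the chosen action.
        state = [s + a for s, a in zip(state, action)]
        predicted_rewards.append(predicted_reward)
    return state, predicted_rewards
```

In a real setup the environment step would replace the toy dynamics line, and the predicted next state could be compared with the observed one to monitor model error.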

__init__(name, planning_horizon, max_iterations, num_agents, env_action_space, env_observation_space)

This is the base class of the optimizers.

Parameters
  • name (String) – Defines the name of the block of the optimizer.

  • planning_horizon (Int) – Defines the planning horizon for the optimizer (how many steps to look ahead and optimize for).

  • max_iterations (tf.int32) – Defines the maximum number of iterations for the optimizer to refine its guess for the optimal solution.

  • num_agents (tf.int32) – Defines the number of runners running in parallel.

  • env_action_space (gym.ActionSpace) – Defines the action space of the gym environment.

  • env_observation_space (gym.ObservationSpace) – Defines the observation space of the gym environment.

reset()

This method resets the optimizer to its default state at the beginning of the trajectory/episode.

set_trajectory_evaluator(trajectory_evaluator)

Sets the trajectory evaluator to be used by the optimizer.

Parameters

  • trajectory_evaluator (EvaluatorBaseClass) – Defines the trajectory evaluator used to evaluate the reward of a sequence of actions.

Cross Entropy Method

class blackbox_mpc.optimizers.CEMOptimizer(env_action_space, env_observation_space, planning_horizon=50, max_iterations=5, population_size=500, num_elite=50, num_agents=5, epsilon=0.001, alpha=0.25)
__init__(env_action_space, env_observation_space, planning_horizon=50, max_iterations=5, population_size=500, num_elite=50, num_agents=5, epsilon=0.001, alpha=0.25)

This class defines a Cross-Entropy Method (CEM) optimizer. (http://web.mit.edu/6.454/www/www_fall_2003/gew/CEtutorial.pdf)

Parameters
  • env_action_space (gym.ActionSpace) – Defines the action space of the gym environment.

  • env_observation_space (gym.ObservationSpace) – Defines the observation space of the gym environment.

  • planning_horizon (Int) – Defines the planning horizon for the optimizer (how many steps to look ahead and optimize for).

  • max_iterations (tf.int32) – Defines the maximum number of iterations for the CEM optimizer to refine its guess for the optimal solution.

  • population_size (tf.int32) – Defines the population size of the particles evaluated at each iteration.

  • num_elite (tf.int32) – Defines the number of elites kept for the next iteration from the population.

  • num_agents (tf.int32) – Defines the number of runners running in parallel.

  • epsilon (tf.float32) – Defines the threshold on the difference between the solutions of successive iterations; once the difference falls below it, the optimizer returns the solution before reaching max_iterations.

  • alpha (tf.float32) – Defines the weight of the solution at iteration t-1 in determining the solution at iteration t, e.g. mean = alpha*old_mean + (1-alpha)*new_mean.

reset()

This method resets the optimizer to its default state at the beginning of the trajectory/episode.
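
To make the roles of population_size, num_elite, epsilon and alpha concrete, here is a minimal pure-Python sketch of a cross-entropy loop over a scalar action sequence. It is an illustration under stated assumptions, not the library's TensorFlow implementation: cem_plan and reward_fn are hypothetical names, the initial mean and standard deviation are arbitrary, smoothing the standard deviation with alpha is an assumption, and the stopping rule is one plausible reading of the epsilon description.

```python
import random
import statistics


def cem_plan(reward_fn, horizon, max_iterations=5, population_size=500,
             num_elite=50, epsilon=0.001, alpha=0.25, seed=0):
    """Toy cross-entropy method over a sequence of scalar actions."""
    rng = random.Random(seed)
    mean = [0.0] * horizon   # assumed starting guess
    std = [1.0] * horizon    # assumed starting spread
    for _ in range(max_iterations):
        # Sample a population of candidate action sequences.
        population = [[rng.gauss(m, s) for m, s in zip(mean, std)]
                      for _ in range(population_size)]
        # Keep the num_elite sequences with the highest reward.
        elites = sorted(population, key=reward_fn, reverse=True)[:num_elite]
        new_mean = [statistics.fmean(column) for column in zip(*elites)]
        new_std = [statistics.pstdev(column) for column in zip(*elites)]
        # mean = alpha * old_mean + (1 - alpha) * new_mean, as documented.
        updated = [alpha * m + (1 - alpha) * nm
                   for m, nm in zip(mean, new_mean)]
        # Assumption: the spread is smoothed with the same alpha.
        std = [alpha * s + (1 - alpha) * ns for s, ns in zip(std, new_std)]
        shift = max(abs(u - m) for u, m in zip(updated, mean))
        mean = updated
        # Assumed stopping rule: the solution barely moved, so return early.
        if shift < epsilon:
            break
    return mean
```

Calling cem_plan with a reward such as the negative squared distance to a target sequence should move the mean towards that target; the first entry of the returned list plays the role of the action applied at the current time step.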

Random Shooting

class blackbox_mpc.optimizers.RandomSearchOptimizer(env_action_space, env_observation_space, planning_horizon=50, population_size=1024, num_agents=5)
__init__(env_action_space, env_observation_space, planning_horizon=50, population_size=1024, num_agents=5)

This class performs random shooting: it evaluates randomly sampled action sequences, selects the one with the best predicted trajectory, and returns the first action of that trajectory.

Parameters
  • env_action_space (gym.ActionSpace) – Defines the action space of the gym environment.

  • env_observation_space (gym.ObservationSpace) – Defines the observation space of the gym environment.

  • planning_horizon (Int) – Defines the planning horizon for the optimizer (how many steps to look ahead and optimize for).

  • population_size (tf.int32) – Defines the population size of the particles evaluated at each iteration.

  • num_agents (tf.int32) – Defines the number of runners running in parallel.

reset()

This method resets the optimizer to its default state at the beginning of the trajectory/episode.
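
Random shooting as described above can be sketched in a few lines of pure Python. The function name random_shooting, the uniform sampling between low and high, and reward_fn are assumptions made for this illustration; the library itself works on TensorFlow tensors and gym action spaces.

```python
import random


def random_shooting(reward_fn, horizon, low, high,
                    population_size=1024, seed=0):
    """Sample action sequences uniformly and keep the best-scoring one."""
    rng = random.Random(seed)
    # Each candidate is one sequence of `horizon` scalar actions.
    candidates = [[rng.uniform(low, high) for _ in range(horizon)]
                  for _ in range(population_size)]
    # Pick the candidate whose predicted reward is highest.
    best_sequence = max(candidates, key=reward_fn)
    # Only the first action would be applied; the rest is discarded
    # and replanned at the next time step.
    return best_sequence[0], best_sequence
```

Because there is no refinement step, solution quality depends only on population_size and on how the sampling range covers the action space.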

Covariance Matrix Adaptation Evolution Strategy

class blackbox_mpc.optimizers.CMAESOptimizer(env_action_space, env_observation_space, planning_horizon=50, max_iterations=5, population_size=500, num_elite=50, num_agents=5, alpha_cov=2.0, h_sigma=1.0)
__init__(env_action_space, env_observation_space, planning_horizon=50, max_iterations=5, population_size=500, num_elite=50, num_agents=5, alpha_cov=2.0, h_sigma=1.0)

This class defines a Covariance Matrix Adaptation Evolution Strategy (CMA-ES) optimizer. (https://arxiv.org/pdf/1604.00772.pdf) Note: this optimizer is not optimized for more than one agent.

Parameters
  • env_action_space (gym.ActionSpace) – Defines the action space of the gym environment.

  • env_observation_space (gym.ObservationSpace) – Defines the observation space of the gym environment.

  • planning_horizon (Int) – Defines the planning horizon for the optimizer (how many steps to look ahead and optimize for).

  • max_iterations (tf.int32) – Defines the maximum number of iterations for the CMAES optimizer to refine its guess for the optimal solution.

  • population_size (tf.int32) – Defines the population size of the particles evaluated at each iteration.

  • num_elite (tf.int32) – Defines the number of elites kept for the next iteration from the population.

  • num_agents (tf.int32) – Defines the number of runners running in parallel.

  • alpha_cov (tf.float32) – Defines the alpha covariance to be used.

  • h_sigma (tf.float32) – Defines the h sigma to be used.

reset()

This method resets the optimizer to its default state at the beginning of the trajectory/episode.
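
Full CMA-ES combines several mechanisms (evolution paths, step-size control, rank-one and rank-mu covariance updates). The sketch below shows only a simplified rank-mu covariance update with equal elite weights and a fixed step size of 1, so it is not the complete algorithm and does not use alpha_cov or h_sigma. The names rank_mu_es, cholesky, reward_fn and the learning rate c_mu are hypothetical and chosen for this illustration.

```python
import math
import random


def cholesky(matrix):
    """Lower-triangular Cholesky factor of a small positive-definite matrix."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                # Guard against tiny negative values from round-off.
                lower[i][j] = math.sqrt(max(matrix[i][i] - partial, 1e-12))
            else:
                lower[i][j] = (matrix[i][j] - partial) / lower[j][j]
    return lower


def rank_mu_es(reward_fn, horizon, max_iterations=5, population_size=500,
               num_elite=50, c_mu=0.5, seed=0):
    """Evolution strategy with a simplified rank-mu covariance update."""
    rng = random.Random(seed)
    n = horizon
    mean = [0.0] * n
    cov = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_iterations):
        lower = cholesky(cov)
        population = []
        for _ in range(population_size):
            z = [rng.gauss(0.0, 1.0) for _ in range(n)]
            # x = mean + L z is distributed as N(mean, L L^T) = N(mean, cov).
            x = [mean[i] + sum(lower[i][k] * z[k] for k in range(i + 1))
                 for i in range(n)]
            population.append(x)
        elites = sorted(population, key=reward_fn, reverse=True)[:num_elite]
        # Rank-mu estimate: elite deviations are measured from the OLD mean.
        estimate = [[sum((e[i] - mean[i]) * (e[j] - mean[j]) for e in elites)
                     / num_elite for j in range(n)] for i in range(n)]
        cov = [[(1 - c_mu) * cov[i][j] + c_mu * estimate[i][j]
                for j in range(n)] for i in range(n)]
        # Move the mean to the (equally weighted) average of the elites.
        mean = [sum(e[i] for e in elites) / num_elite for i in range(n)]
    return mean, cov
```

Measuring deviations from the old mean is what lets the covariance stretch along the direction of recent progress, which is the main difference from the per-dimension variance estimate used in a plain cross-entropy update.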

Particle Swarm Optimizer

class blackbox_mpc.optimizers.PSOOptimizer(env_action_space, env_observation_space, planning_horizon=50, max_iterations=5, population_size=500, num_agents=5, c1=0.3, c2=0.5, w=0.2, initial_velocity_fraction=0.01)
__init__(env_action_space, env_observation_space, planning_horizon=50, max_iterations=5, population_size=500, num_agents=5, c1=0.3, c2=0.5, w=0.2, initial_velocity_fraction=0.01)

This class defines the particle swarm optimizer. (https://www.cs.tufts.edu/comp/150GA/homeworks/hw3/_reading6%201995%20particle%20swarming.pdf)

Parameters
  • env_action_space (gym.ActionSpace) – Defines the action space of the gym environment.

  • env_observation_space (gym.ObservationSpace) – Defines the observation space of the gym environment.

  • planning_horizon (Int) – Defines the planning horizon for the optimizer (how many steps to look ahead and optimize for).

  • max_iterations (tf.int32) – Defines the maximum number of iterations for the PSO optimizer to refine its guess for the optimal solution.

  • population_size (tf.int32) – Defines the population size of the particles evaluated at each iteration.

  • num_agents (tf.int32) – Defines the number of runners running in parallel.

  • c1 (tf.float32) – Defines the fraction of the local best known position direction.

  • c2 (tf.float32) – Defines the fraction of the global best known position direction.

  • w (tf.float32) – Defines the fraction of the current velocity to use.

  • initial_velocity_fraction (tf.float32) – Defines the initial velocity fraction out of the action space.

reset()

This method resets the optimizer to its default state at the beginning of the trajectory/episode.
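
The parameters c1, c2, w and initial_velocity_fraction map naturally onto the textbook particle swarm velocity update. The sketch below uses that textbook form on a scalar action sequence; it is an assumption that the library follows the same form, and pso_plan, reward_fn, low and high are hypothetical names for this example.

```python
import random


def pso_plan(reward_fn, horizon, low, high, max_iterations=5,
             population_size=500, c1=0.3, c2=0.5, w=0.2,
             initial_velocity_fraction=0.01, seed=0):
    """Toy particle swarm search over a sequence of scalar actions."""
    rng = random.Random(seed)
    span = high - low
    positions = [[rng.uniform(low, high) for _ in range(horizon)]
                 for _ in range(population_size)]
    # Initial velocities are a small fraction of the action range.
    velocities = [[initial_velocity_fraction * span * rng.uniform(-1.0, 1.0)
                   for _ in range(horizon)] for _ in range(population_size)]
    best_positions = [p[:] for p in positions]
    best_rewards = [reward_fn(p) for p in positions]
    top = max(range(population_size), key=best_rewards.__getitem__)
    global_best, global_reward = best_positions[top][:], best_rewards[top]
    for _ in range(max_iterations):
        for i in range(population_size):
            for d in range(horizon):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull to own best (c1) + pull to swarm best (c2).
                velocities[i][d] = (
                    w * velocities[i][d]
                    + c1 * r1 * (best_positions[i][d] - positions[i][d])
                    + c2 * r2 * (global_best[d] - positions[i][d]))
                # Keep the particle inside the action bounds.
                positions[i][d] = min(
                    high, max(low, positions[i][d] + velocities[i][d]))
            reward = reward_fn(positions[i])
            if reward > best_rewards[i]:
                best_rewards[i], best_positions[i] = reward, positions[i][:]
                if reward > global_reward:
                    global_reward, global_best = reward, positions[i][:]
    return global_best, global_reward
```

With small values of w and c1 relative to c2, as in the defaults listed above, particles are drawn mostly towards the swarm's best known sequence.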

Path Integral (Information Theoretic MPC)

class blackbox_mpc.optimizers.PI2Optimizer(env_action_space, env_observation_space, planning_horizon=50, max_iterations=5, population_size=500, num_agents=5, lamda=1.0)
__init__(env_action_space, env_observation_space, planning_horizon=50, max_iterations=5, population_size=500, num_agents=5, lamda=1.0)

This class defines the information-theoretic MPC optimizer based on path integral methods. (https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7989202)

Parameters
  • env_action_space (gym.ActionSpace) – Defines the action space of the gym environment.

  • env_observation_space (gym.ObservationSpace) – Defines the observation space of the gym environment.

  • planning_horizon (Int) – Defines the planning horizon for the optimizer (how many steps to look ahead and optimize for).

  • max_iterations (tf.int32) – Defines the maximum number of iterations for the PI2 optimizer to refine its guess for the optimal solution.

  • population_size (tf.int32) – Defines the population size of the particles evaluated at each iteration.

  • num_agents (tf.int32) – Defines the number of runners running in parallel.

  • lamda (tf.float32) – Defines the lamda used in the energy function.

reset()

This method resets the optimizer to its default state at the beginning of the trajectory/episode.
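
In path integral style updates, lamda acts as a temperature: each sampled perturbation is weighted by the exponential of its reward divided by lamda, and the action sequence moves by the weighted average of the perturbations. The sketch below shows that weighting on a scalar action sequence. It is a simplified illustration, not the library's code; path_integral_plan, reward_fn and noise_std are hypothetical names introduced here.

```python
import math
import random


def path_integral_plan(reward_fn, horizon, max_iterations=5,
                       population_size=500, lamda=1.0, noise_std=0.5,
                       seed=0):
    """Exponentially weighted averaging of sampled action perturbations."""
    rng = random.Random(seed)
    mean = [0.0] * horizon
    for _ in range(max_iterations):
        # Perturb the current plan with Gaussian noise.
        noises = [[rng.gauss(0.0, noise_std) for _ in range(horizon)]
                  for _ in range(population_size)]
        rewards = [reward_fn([m + e for m, e in zip(mean, noise)])
                   for noise in noises]
        # Subtract the best reward before exponentiating for stability.
        best = max(rewards)
        weights = [math.exp((r - best) / lamda) for r in rewards]
        total = sum(weights)
        # Shift the plan by the weighted average perturbation.
        mean = [m + sum(wt * noise[t] for wt, noise in zip(weights, noises))
                / total for t, m in enumerate(mean)]
    return mean
```

A small lamda concentrates the weight on the few best samples, while a large lamda approaches a plain average of all perturbations and therefore changes the plan very little.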

Simultaneous Perturbation Stochastic Approximation Optimizer

class blackbox_mpc.optimizers.SPSAOptimizer(env_action_space, env_observation_space, planning_horizon=50, max_iterations=5, population_size=500, num_agents=5, alpha=0.602, gamma=0.101, a_par=0.01, noise_parameter=0.3)
__init__(env_action_space, env_observation_space, planning_horizon=50, max_iterations=5, population_size=500, num_agents=5, alpha=0.602, gamma=0.101, a_par=0.01, noise_parameter=0.3)

This class defines the simultaneous perturbation stochastic approximation optimizer. (https://www.jhuapl.edu/SPSA/PDF-SPSA/Spall_Stochastic_Optimization.PDF)

Parameters
  • env_action_space (gym.ActionSpace) – Defines the action space of the gym environment.

  • env_observation_space (gym.ObservationSpace) – Defines the observation space of the gym environment.

  • planning_horizon (Int) – Defines the planning horizon for the optimizer (how many steps to look ahead and optimize for).

  • max_iterations (tf.int32) – Defines the maximum number of iterations for the SPSA optimizer to refine its guess for the optimal solution.

  • population_size (tf.int32) – Defines the population size of the particles evaluated at each iteration.

  • num_agents (tf.int32) – Defines the number of runners running in parallel.

  • alpha (tf.float32) – Defines the alpha used.

  • gamma (tf.float32) – Defines the gamma used.

  • a_par (tf.float32) – Defines the a_par used.

  • noise_parameter (tf.float32) – Defines the noise_parameter used.

reset()

This method resets the optimizer to its default state at the beginning of the trajectory/episode.
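
The defaults alpha=0.602 and gamma=0.101 match the gain-sequence exponents commonly recommended in the SPSA literature, which suggests the usual decaying step size and perturbation size. The sketch below follows that textbook scheme on a scalar action sequence. Treating a_par as the step-size numerator and noise_parameter as the perturbation-size numerator is an assumption, the stability constant of the textbook step size is omitted, and spsa_plan and reward_fn are hypothetical names.

```python
import random


def spsa_plan(reward_fn, horizon, max_iterations=5, alpha=0.602,
              gamma=0.101, a_par=0.01, noise_parameter=0.3, seed=0):
    """Toy SPSA ascent on the reward of a sequence of scalar actions."""
    rng = random.Random(seed)
    x = [0.0] * horizon
    for k in range(max_iterations):
        # Decaying gains (assumed mapping of a_par and noise_parameter).
        a_k = a_par / (k + 1) ** alpha
        c_k = noise_parameter / (k + 1) ** gamma
        # Perturb all coordinates at once with random signs.
        delta = [rng.choice((-1.0, 1.0)) for _ in range(horizon)]
        plus = reward_fn([xi + c_k * d for xi, d in zip(x, delta)])
        minus = reward_fn([xi - c_k * d for xi, d in zip(x, delta)])
        # Two evaluations give an estimate of every gradient component.
        # The plus sign makes this an ascent step on the reward.
        x = [xi + a_k * (plus - minus) / (2.0 * c_k * d)
             for xi, d in zip(x, delta)]
    return x
```

The appeal of this scheme is that each iteration needs only two reward evaluations regardless of the planning horizon, at the cost of a noisy gradient estimate.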