Optimizers

Optimizer Base

class blackbox_mpc.optimizers.OptimizerBase(name, planning_horizon, max_iterations, num_agents, env_action_space, env_observation_space)

__call__(current_state, time_step, add_exploration_noise)

Computes the optimal action for the current time step, given the current state of the system. This is the call function of the base optimizer class.

Parameters

current_state (tf.float32) – Defines the current state of the system, with shape (num_agents, dim_S).
time_step (tf.float32) – Defines the current time step of the episode.
add_exploration_noise (tf.bool) – Defines whether noise should be added to the optimal action before it is returned.

Returns

resulting_action (tf.float32) – The optimized first action to apply at the current time step.
next_state (tf.float32) – The next state predicted by the dynamics model of the trajectory evaluator.
rewards_of_next_state (tf.float32) – The predicted reward obtained after applying the action returned by the optimizer.
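To show how `__call__` is meant to be used once per time step in a receding-horizon loop, here is a minimal sketch. The functions `toy_dynamics`, `toy_reward`, `toy_optimizer` and `run_episode` are hypothetical and not part of blackbox_mpc. `toy_optimizer` only mimics the documented call signature and `(action, next_state, reward)` return triple. It is a one-step greedy stand-in, not any of the optimizers documented below.

```python
import random


# Hypothetical toy model: 1-D state, next_state = state + action, reward = -next_state**2.
def toy_dynamics(state, action):
    return state + action


def toy_reward(next_state):
    return -next_state ** 2


def toy_optimizer(current_state, time_step, add_exploration_noise,
                  candidates=(-1.0, -0.5, 0.0, 0.5, 1.0), noise_std=0.1):
    """Stand-in mimicking __call__: returns (action, next_state, reward)."""
    # time_step is unused by this stand-in; it is kept only to mirror the signature.
    # Pick the candidate action whose predicted one-step reward is highest.
    action = max(candidates, key=lambda a: toy_reward(toy_dynamics(current_state, a)))
    if add_exploration_noise:
        action += random.gauss(0.0, noise_std)
    next_state = toy_dynamics(current_state, action)
    return action, next_state, toy_reward(next_state)


def run_episode(initial_state, num_steps):
    state = initial_state
    actions = []
    for t in range(num_steps):
        # Re-plan from the current state at every step and apply only the returned action.
        action, _, _ = toy_optimizer(state, t, add_exploration_noise=False)
        actions.append(action)
        state = toy_dynamics(state, action)  # here the "real" system equals the model
    return state, actions
```

For example, `run_episode(2.0, 3)` drives the state to 0.0 with the actions `[-1.0, -1.0, 0.0]`.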

__init__(name, planning_horizon, max_iterations, num_agents, env_action_space, env_observation_space)

This is the base class of all optimizers.

Parameters

name (String) – Defines the name of the optimizer block.
planning_horizon (Int) – Defines the planning horizon of the optimizer (how many steps to look ahead and optimize over).
max_iterations (tf.int32) – Defines the maximum number of iterations the optimizer uses to refine its estimate of the optimal solution.
num_agents (tf.int32) – Defines the number of agents (runners) running in parallel.
env_action_space (gym.ActionSpace) – Defines the action space of the gym environment.
env_observation_space (gym.ObservationSpace) – Defines the observation space of the gym environment.

reset(): This method resets the optimizer to its default state at the beginning of the trajectory/episode.

set_trajectory_evaluator(trajectory_evaluator)

Sets the trajectory evaluator to be used by the optimizer.

Parameters

trajectory_evaluator (EvaluatorBaseClass) – Defines the trajectory evaluator used to evaluate the reward of a sequence of actions.

Cross-Entropy Method

class blackbox_mpc.optimizers.CEMOptimizer(env_action_space, env_observation_space, planning_horizon=50, max_iterations=5, population_size=500, num_elite=50, num_agents=5, epsilon=0.001, alpha=0.25)

This class defines a Cross-Entropy Method (CEM) optimizer (http://web.mit.edu/6.454/www/www_fall_2003/gew/CEtutorial.pdf).

Parameters

env_action_space (gym.ActionSpace) – Defines the action space of the gym environment.
env_observation_space (gym.ObservationSpace) – Defines the observation space of the gym environment.
planning_horizon (Int) – Defines the planning horizon of the optimizer (how many steps to look ahead and optimize over).
max_iterations (tf.int32) – Defines the maximum number of iterations the CEM optimizer uses to refine its estimate of the optimal solution.
population_size (tf.int32) – Defines the number of particles (candidate action sequences) evaluated at each iteration.
num_elite (tf.int32) – Defines the number of elite particles selected from the population and kept for the next iteration.
num_agents (tf.int32) – Defines the number of agents (runners) running in parallel.
epsilon (tf.float32) – Defines the convergence threshold: if the solution changes by less than epsilon between consecutive iterations, the optimizer returns before reaching max_iterations.
alpha (tf.float32) – Defines the weight of the previous solution (at t-1) when determining the new solution (at t), e.g. mean = alpha*old_mean + (1-alpha)*new_mean.
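The following minimal pure-Python sketch shows how max_iterations, population_size, num_elite, epsilon and alpha interact in a CEM planner. It is not blackbox_mpc's implementation. It assumes a single action dimension per step, clipping to hypothetical `action_low`/`action_high` bounds, and a hypothetical `reward_fn` that scores a whole action sequence. It also assumes that alpha smooths across CEM iterations. Smoothing the standard deviation the same way as the mean is a further assumption of this sketch.

```python
import random
import statistics


def cem_plan(reward_fn, horizon, max_iterations=5, population_size=500,
             num_elite=50, epsilon=1e-3, alpha=0.25,
             action_low=-1.0, action_high=1.0, seed=0):
    """Hypothetical CEM planner over a 1-D action sequence; maximizes reward_fn."""
    rng = random.Random(seed)
    mean = [0.0] * horizon
    std = [(action_high - action_low) / 2.0] * horizon
    for _ in range(max_iterations):
        # Sample population_size action sequences and clip them to the action bounds.
        population = [
            [min(max(rng.gauss(m, s), action_low), action_high)
             for m, s in zip(mean, std)]
            for _ in range(population_size)
        ]
        # Keep the num_elite sequences with the highest predicted reward.
        elites = sorted(population, key=reward_fn, reverse=True)[:num_elite]
        elite_mean = [statistics.fmean(col) for col in zip(*elites)]
        elite_std = [statistics.pstdev(col) for col in zip(*elites)]
        # Smooth with the previous iteration: alpha * old + (1 - alpha) * new.
        new_mean = [alpha * o + (1 - alpha) * n for o, n in zip(mean, elite_mean)]
        std = [alpha * o + (1 - alpha) * n for o, n in zip(std, elite_std)]
        # Stop early once the mean moves by less than epsilon in every coordinate.
        shift = max(abs(n - o) for n, o in zip(new_mean, mean))
        mean = new_mean
        if shift < epsilon:
            break
    return mean  # mean[0] is the action to apply at the current step
```

With `reward_fn = lambda seq: -sum((a - 0.3) ** 2 for a in seq)` and `horizon=3`, every entry of the returned plan ends up close to 0.3.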

reset(): This method resets the optimizer to its default state at the beginning of the trajectory/episode.

Random Shooting

class blackbox_mpc.optimizers.RandomSearchOptimizer(env_action_space, env_observation_space, planning_horizon=50, population_size=1024, num_agents=5)

This class performs random shooting. It samples candidate action sequences, selects the one whose predicted trajectory has the best reward, and returns the first action of that sequence.

Parameters

env_action_space (gym.ActionSpace) – Defines the action space of the gym environment.
env_observation_space (gym.ObservationSpace) – Defines the observation space of the gym environment.
planning_horizon (Int) – Defines the planning horizon of the optimizer (how many steps to look ahead and optimize over).
population_size (tf.int32) – Defines the number of particles (candidate action sequences) that are sampled and evaluated.
num_agents (tf.int32) – Defines the number of agents (runners) running in parallel.
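Random shooting reduces to a single sample-and-select step. The minimal sketch below assumes one action dimension per step, uniform sampling within hypothetical `action_low`/`action_high` bounds, and a hypothetical `reward_fn` over a whole sequence. It illustrates the technique and is not the library's code.

```python
import random


def random_shooting(reward_fn, horizon, population_size=1024,
                    action_low=-1.0, action_high=1.0, seed=0):
    """Hypothetical random-shooting planner over a 1-D action sequence."""
    rng = random.Random(seed)
    # Sample population_size action sequences uniformly within the action bounds.
    candidates = [
        [rng.uniform(action_low, action_high) for _ in range(horizon)]
        for _ in range(population_size)
    ]
    # Keep the sequence with the highest predicted reward; only its first action is applied.
    best = max(candidates, key=reward_fn)
    return best[0], best
```

Because nothing is refined between samples, plan quality depends entirely on population_size relative to the dimension of the action sequence (planning_horizon times the action dimension).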

reset(): This method resets the optimizer to its default state at the beginning of the trajectory/episode.

Covariance Matrix Adaptation Evolution Strategy

class blackbox_mpc.optimizers.CMAESOptimizer(env_action_space, env_observation_space, planning_horizon=50, max_iterations=5, population_size=500, num_elite=50, num_agents=5, alpha_cov=2.0, h_sigma=1.0)

This class defines a Covariance Matrix Adaptation Evolution Strategy (CMA-ES) optimizer (https://arxiv.org/pdf/1604.00772.pdf). Note: this optimizer is not optimized for use with more than one agent.

Parameters

env_action_space (gym.ActionSpace) – Defines the action space of the gym environment.
env_observation_space (gym.ObservationSpace) – Defines the observation space of the gym environment.
planning_horizon (Int) – Defines the planning horizon of the optimizer (how many steps to look ahead and optimize over).
max_iterations (tf.int32) – Defines the maximum number of iterations the CMA-ES optimizer uses to refine its estimate of the optimal solution.
population_size (tf.int32) – Defines the number of particles (candidate action sequences) evaluated at each iteration.
num_elite (tf.int32) – Defines the number of elite particles selected from the population and kept for the next iteration.
num_agents (tf.int32) – Defines the number of agents (runners) running in parallel.
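alpha_cov (tf.float32) – Defines the α_cov constant, which in the referenced tutorial scales the learning rates of the covariance matrix update.
h_sigma (tf.float32) – Defines h_σ, which in the referenced tutorial is a 0/1 indicator that stalls the update of the covariance evolution path.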
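For orientation, the referenced tutorial (Hansen, arXiv:1604.00772) uses α_cov and h_σ in the two formulas below. Here n is the dimension of the search space, μ_eff the variance-effective selection mass, c_c the cumulation rate of the evolution path p_c, and m and σ the distribution mean and step size at generation g. Whether blackbox_mpc applies them exactly this way is an assumption. With the default h_sigma = 1.0, the evolution-path update would presumably never be stalled.

```latex
\[
c_1 = \frac{\alpha_{\mathrm{cov}}}{(n + 1.3)^2 + \mu_{\mathrm{eff}}}
\]
\[
p_c \leftarrow (1 - c_c)\, p_c + h_\sigma \sqrt{c_c (2 - c_c)\, \mu_{\mathrm{eff}}}\;
\frac{m^{(g+1)} - m^{(g)}}{\sigma^{(g)}}
\]
```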

reset(): This method resets the optimizer to its default state at the beginning of the trajectory/episode.

Particle Swarm Optimizer

class blackbox_mpc.optimizers.PSOOptimizer(env_action_space, env_observation_space, planning_horizon=50, max_iterations=5, population_size=500, num_agents=5, c1=0.3, c2=0.5, w=0.2, initial_velocity_fraction=0.01)

This class defines a particle swarm optimizer (PSO) (https://www.cs.tufts.edu/comp/150GA/homeworks/hw3/_reading6%201995%20particle%20swarming.pdf).

Parameters

env_action_space (gym.ActionSpace) – Defines the action space of the gym environment.
env_observation_space (gym.ObservationSpace) – Defines the observation space of the gym environment.
planning_horizon (Int) – Defines the planning horizon of the optimizer (how many steps to look ahead and optimize over).
max_iterations (tf.int32) – Defines the maximum number of iterations the PSO optimizer uses to refine its estimate of the optimal solution.
population_size (tf.int32) – Defines the number of particles (candidate action sequences) evaluated at each iteration.
num_agents (tf.int32) – Defines the number of agents (runners) running in parallel.
c1 (tf.float32) – Defines the weight of the velocity component pointing toward each particle's own (local) best known position.
c2 (tf.float32) – Defines the weight of the velocity component pointing toward the swarm's global best known position.
w (tf.float32) – Defines the inertia weight, i.e. the fraction of the current velocity carried over to the next velocity.
initial_velocity_fraction (tf.float32) – Defines the initial velocity as a fraction of the action-space range.
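The minimal pure-Python sketch below shows the roles of c1, c2, w and initial_velocity_fraction in the standard PSO velocity update. It is not the library's code. `pso_maximize`, `reward_fn`, `low` and `high` are hypothetical names. The sketch also assumes two details that the documentation above does not specify: initial velocities are drawn uniformly in ±initial_velocity_fraction × (high − low), and positions are clipped to the bounds.

```python
import random


def pso_maximize(reward_fn, dim, max_iterations=5, population_size=500,
                 c1=0.3, c2=0.5, w=0.2, initial_velocity_fraction=0.01,
                 low=-1.0, high=1.0, seed=0):
    """Hypothetical PSO sketch: maximizes reward_fn over a dim-dimensional box."""
    rng = random.Random(seed)
    span = high - low
    positions = [[rng.uniform(low, high) for _ in range(dim)]
                 for _ in range(population_size)]
    # Assumption: initial velocities are uniform in +/- initial_velocity_fraction * span.
    velocities = [[rng.uniform(-1.0, 1.0) * initial_velocity_fraction * span
                   for _ in range(dim)] for _ in range(population_size)]
    personal_best = [p[:] for p in positions]
    personal_reward = [reward_fn(p) for p in positions]
    best_i = max(range(population_size), key=personal_reward.__getitem__)
    global_best, global_reward = personal_best[best_i][:], personal_reward[best_i]
    for _ in range(max_iterations):
        for i in range(population_size):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best (c1) + pull toward global best (c2).
                velocities[i][d] = (w * velocities[i][d]
                                    + c1 * r1 * (personal_best[i][d] - positions[i][d])
                                    + c2 * r2 * (global_best[d] - positions[i][d]))
                positions[i][d] = min(max(positions[i][d] + velocities[i][d], low), high)
            reward = reward_fn(positions[i])
            if reward > personal_reward[i]:
                personal_best[i], personal_reward[i] = positions[i][:], reward
                if reward > global_reward:
                    global_best, global_reward = positions[i][:], reward
    return global_best, global_reward
```

In an MPC setting, each position would be a flattened action sequence of length planning_horizon × action dimension, and the first action of global_best would be applied.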

reset(): This method resets the optimizer to its default state at the beginning of the trajectory/episode.

Path Integral (Information-Theoretic MPC)

class blackbox_mpc.optimizers.PI2Optimizer(env_action_space, env_observation_space, planning_horizon=50, max_iterations=5, population_size=500, num_agents=5, lamda=1.0)

This class defines the information-theoretic MPC optimizer based on path integral methods (https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7989202).

Parameters

env_action_space (gym.ActionSpace) – Defines the action space of the gym environment.
env_observation_space (gym.ObservationSpace) – Defines the observation space of the gym environment.
planning_horizon (Int) – Defines the planning horizon of the optimizer (how many steps to look ahead and optimize over).
max_iterations (tf.int32) – Defines the maximum number of iterations the optimizer uses to refine its estimate of the optimal solution.
population_size (tf.int32) – Defines the number of particles (candidate action sequences) evaluated at each iteration.
num_agents (tf.int32) – Defines the number of agents (runners) running in parallel.
lamda (tf.float32) – Defines λ, the temperature used in the energy function.
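The core of path-integral MPC is an exponentially weighted average of sampled perturbations, with λ acting as a temperature. The minimal sketch below shows only that weighting step for a 1-D action sequence. It omits sampling, model rollouts and the control-cost term. `mppi_update` and its arguments are hypothetical names, and each cost is assumed to be the negative of a rollout's predicted reward.

```python
import math


def mppi_update(nominal, perturbations, costs, lamda=1.0):
    """Hypothetical MPPI-style update of a nominal 1-D action sequence.

    perturbations[k][t] is the noise added to nominal[t] in sampled rollout k,
    and costs[k] is that rollout's cost (the negative of its predicted reward).
    """
    # Subtracting the minimum cost avoids overflow; it cancels after normalization.
    min_cost = min(costs)
    weights = [math.exp(-(c - min_cost) / lamda) for c in costs]
    total = sum(weights)
    weights = [w / total for w in weights]
    # Shift each nominal action by the weight-averaged perturbation at that step.
    return [u + sum(w * eps[t] for w, eps in zip(weights, perturbations))
            for t, u in enumerate(nominal)]
```

A small lamda concentrates almost all weight on the lowest-cost rollout. A large lamda approaches a plain average of the perturbations.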

reset(): This method resets the optimizer to its default state at the beginning of the trajectory/episode.

Simultaneous Perturbation Stochastic Approximation Optimizer

class blackbox_mpc.optimizers.SPSAOptimizer(env_action_space, env_observation_space, planning_horizon=50, max_iterations=5, population_size=500, num_agents=5, alpha=0.602, gamma=0.101, a_par=0.01, noise_parameter=0.3)

This class defines a simultaneous perturbation stochastic approximation (SPSA) optimizer (https://www.jhuapl.edu/SPSA/PDF-SPSA/Spall_Stochastic_Optimization.PDF).

Parameters

env_action_space (gym.ActionSpace) – Defines the action space of the gym environment.
env_observation_space (gym.ObservationSpace) – Defines the observation space of the gym environment.
planning_horizon (Int) – Defines the planning horizon of the optimizer (how many steps to look ahead and optimize over).
max_iterations (tf.int32) – Defines the maximum number of iterations the SPSA optimizer uses to refine its estimate of the optimal solution.
population_size (tf.int32) – Defines the number of particles (candidate action sequences) evaluated at each iteration.
num_agents (tf.int32) – Defines the number of agents (runners) running in parallel.
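alpha (tf.float32) – Defines α, the decay exponent of the step-size gain sequence (0.602 is the value recommended by Spall).
gamma (tf.float32) – Defines γ, the decay exponent of the perturbation-size gain sequence (0.101 is the value recommended by Spall).
a_par (tf.float32) – Defines the a parameter, presumably the numerator of the step-size gain sequence.
noise_parameter (tf.float32) – Defines the noise parameter, presumably the magnitude c of the simultaneous perturbations.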
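The minimal sketch below shows SPSA gradient ascent with Spall's gain sequences a_k = a / (k + 1)^α and c_k = c / (k + 1)^γ. It assumes that a_par plays the role of a and noise_parameter the role of c; the parameter descriptions above do not confirm this mapping. `spsa_maximize`, `reward_fn` and `theta` are hypothetical names, and this is not the library's implementation.

```python
import random


def spsa_maximize(reward_fn, theta, max_iterations=200, a_par=0.01,
                  noise_parameter=0.3, alpha=0.602, gamma=0.101, seed=0):
    """Hypothetical SPSA sketch (gradient ascent on reward_fn).

    Assumes a_par is the step-size numerator a and noise_parameter the
    perturbation size c in Spall's gain sequences.
    """
    rng = random.Random(seed)
    theta = list(theta)
    for k in range(max_iterations):
        a_k = a_par / (k + 1) ** alpha            # step-size gain
        c_k = noise_parameter / (k + 1) ** gamma  # perturbation size
        # Perturb every coordinate simultaneously with a random +/-1 (Rademacher) sign.
        delta = [rng.choice((-1.0, 1.0)) for _ in theta]
        plus = [t + c_k * d for t, d in zip(theta, delta)]
        minus = [t - c_k * d for t, d in zip(theta, delta)]
        diff = reward_fn(plus) - reward_fn(minus)
        # Two reward evaluations give a gradient estimate for all coordinates at once.
        theta = [t + a_k * diff / (2.0 * c_k * d) for t, d in zip(theta, delta)]
    return theta
```

The appeal of SPSA is visible in the loop. Each iteration costs two reward evaluations regardless of how many coordinates theta has.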

reset(): This method resets the optimizer to its default state at the beginning of the trajectory/episode.