Policies

Model Based Base Policy

class blackbox_mpc.policies.ModelBasedBasePolicy(trajectory_evaluator)
__init__(trajectory_evaluator)

This is the base class for model-based policies that control the agent.

Parameters

trajectory_evaluator (EvaluatorBase) – Defines the trajectory evaluator to be used in the optimizer to evaluate trajectories.


act(observations, t, exploration_noise=False)

This is the act function for the model-based policy base class, which should be called to provide the action to be executed at the current time step.

Parameters
  • observations (tf.float32) – Defines the current observations received from the environment.

  • t (tf.float32) – Defines the current timestep.

  • exploration_noise (bool) – Defines whether exploration noise should be added to the action to be executed.

Returns

  • action (tf.float32) – The action to be executed by each runner (dims = number of runners × dim_U).

  • next_observations (tf.float32) – The next observations predicted by the dynamics function learned so far.

  • rewards_of_next_state (tf.float32) – The predicted reward of executing the action, computed from the predicted next observations.

reset()

This is the reset function for the model-based policy base class, which should be called at the beginning of the episode.
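The calling pattern described above (reset() once at the start of an episode, then act() at every time step) can be sketched as follows. ToyModelBasedPolicy and run_episode are hypothetical names, not part of blackbox_mpc; the toy policy only imitates the documented return values (action, next_observations, rewards_of_next_state) using plain Python lists with one row per runner, a 1-D state, and an assumed dynamics s' = s + a.

```python
import random


class ToyModelBasedPolicy:
    """Hypothetical stand-in that follows the documented act/reset contract.

    It is NOT part of blackbox_mpc; it only mimics the call pattern:
    act() returns (action, next_observations, rewards_of_next_state).
    """

    def __init__(self, num_agents):
        self.num_agents = num_agents
        self.episode_count = 0

    def reset(self):
        # Called once at the start of every episode.
        self.episode_count += 1

    def act(self, observations, t, exploration_noise=False):
        # Toy controller: push each 1-D state towards zero.
        actions = [[-0.5 * obs[0]] for obs in observations]
        if exploration_noise:
            # Optional Gaussian exploration noise on every action.
            actions = [[a[0] + random.gauss(0.0, 0.1)] for a in actions]
        # Assumed toy "learned" dynamics: s' = s + a.
        next_obs = [[o[0] + a[0]] for o, a in zip(observations, actions)]
        # Toy reward: negative squared distance of s' from the origin.
        rewards = [-(s[0] ** 2) for s in next_obs]
        return actions, next_obs, rewards


def run_episode(policy, initial_observations, horizon):
    policy.reset()  # once per episode, before the first act()
    observations = initial_observations
    for t in range(horizon):
        action, predicted_obs, predicted_rewards = policy.act(observations, t)
        # A real loop would send `action` to the environment and read back
        # the true observations; this sketch reuses the prediction instead.
        observations = predicted_obs
    return observations
```

In this toy setup each step halves every state, so starting from [[1.0], [-2.0]] and running 10 steps leaves both runners close to the origin.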

Model Predictive Control Policy

class blackbox_mpc.policies.MPCPolicy(trajectory_evaluator=None, optimizer=None, tf_writer=None, log_dir=None, reward_function=None, env_action_space=None, env_observation_space=None, dynamics_function=None, dynamics_handler=None, true_model=False, optimizer_name=None, num_agents=None, save_model_frequency=1, saved_model_dir=None, **optimizer_args)
__init__(trajectory_evaluator=None, optimizer=None, tf_writer=None, log_dir=None, reward_function=None, env_action_space=None, env_observation_space=None, dynamics_function=None, dynamics_handler=None, true_model=False, optimizer_name=None, num_agents=None, save_model_frequency=1, saved_model_dir=None, **optimizer_args)

This is the model predictive control (MPC) policy for controlling the agent.

Parameters
  • trajectory_evaluator (EvaluatorBase) – Defines the trajectory evaluator that the optimizer uses to evaluate trajectories.

  • optimizer (OptimizerBaseClass) – Optimizer that searches for the best action sequence and returns its first action.

  • tf_writer (tf.summary) – TensorFlow writer used to log data.

  • optimizer_name (str) – Name of the optimizer; one of ['CEM', 'CMA-ES', 'PI2', 'RandomSearch', 'PSO', 'SPSA'].

  • env_action_space (gym.ActionSpace) – Defines the action space of the gym environment.

  • env_observation_space (gym.ObservationSpace) – Defines the observation space of the gym environment.

  • dynamics_function (DeterministicDynamicsFunctionBaseClass) – Defines the system dynamics function.

  • dynamics_handler (SystemDynamicsHandler) – Handler that bundles the dynamics function with the state, action, and target processing functions.

  • reward_function (tf_function) – Defines the reward function with the prototype tf_func_name(current_state, current_actions, next_state), where current_state and next_state have dims Batch × dim_S and current_actions has dims Batch × dim_U.

  • true_model (bool) – Whether the dynamics function is the true model of the system rather than a learned one.

  • log_dir (string) – Defines the log directory in which the normalization statistics are saved.

  • num_agents (tf.int32) – Defines the number of runners running in parallel.

  • saved_model_dir (string) – Defines the directory containing a previously saved model, used when loading that model.

  • save_model_frequency (int) – Defines how often the model is saved, counted in refining iterations.

  • optimizer_args (args) – Additional arguments specific to the optimizer.
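As an illustration of the reward_function prototype, the sketch below computes one reward per batch row. The quadratic cost and the 0.1 action weight are arbitrary assumptions, and plain Python lists stand in for tensors so that it runs without TensorFlow; the policy itself expects a TensorFlow function over tf.float32 tensors.

```python
def reward_function(current_state, current_actions, next_state):
    """Hypothetical reward following the documented prototype.

    current_state, next_state: Batch x dim_S (lists of lists here)
    current_actions:           Batch x dim_U
    Returns a list with one reward per batch row.
    """
    rewards = []
    for s_next, u in zip(next_state, current_actions):
        state_cost = sum(x * x for x in s_next)     # squared distance of s' from the origin
        action_cost = 0.1 * sum(a * a for a in u)   # small penalty on control effort
        rewards.append(-(state_cost + action_cost))
    return rewards
```

A TensorFlow version of the same reward would be -tf.reduce_sum(tf.square(next_state), axis=1) - 0.1 * tf.reduce_sum(tf.square(current_actions), axis=1). current_state is unused here but stays in the signature because the prototype requires it.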

act(observations, t, exploration_noise=False)

This is the act function for the model predictive control policy, which should be called to provide the action to be executed at the current time step.

Parameters
  • observations (tf.float32) – Defines the current observations received from the environment.

  • t (tf.float32) – Defines the current timestep.

  • exploration_noise (bool) – Defines whether exploration noise should be added to the action to be executed.

Returns

  • action (tf.float32) – The action to be executed by each runner (dims = number of runners × dim_U).

  • next_observations (tf.float32) – The next observations predicted by the dynamics function learned so far.

  • rewards_of_next_state (tf.float32) – The predicted reward of executing the action, computed from the predicted next observations.
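Model predictive control repeatedly plans an action sequence over a short horizon, executes only its first action, and replans at the next step. The sketch below shows this with random shooting, the general idea behind a random-search optimizer such as the 'RandomSearch' option; it is not blackbox_mpc's implementation. It handles a single agent with a scalar state and action, random_shooting_act is a hypothetical name, and the dynamics and reward callables are toy assumptions. It returns a triple that mirrors the three documented return values.

```python
import random


def random_shooting_act(observation, dynamics, reward, horizon=5,
                        num_candidates=500, low=-1.0, high=1.0, seed=None):
    """Random-shooting MPC step for one agent with a scalar action.

    dynamics(state, action) -> next_state        (assumed toy model)
    reward(state, action, next_state) -> float   (same argument order as
                                                  the documented prototype)
    """
    rng = random.Random(seed)
    best_return = float("-inf")
    best_first = None
    for _ in range(num_candidates):
        # Sample one candidate action sequence uniformly within the bounds.
        plan = [rng.uniform(low, high) for _ in range(horizon)]
        state, total = observation, 0.0
        first_next, first_reward = None, None
        # Roll the plan out through the model and accumulate predicted reward.
        for k, u in enumerate(plan):
            next_state = dynamics(state, u)
            r = reward(state, u, next_state)
            if k == 0:
                first_next, first_reward = next_state, r
            total += r
            state = next_state
        if total > best_return:
            best_return = total
            best_first = (plan[0], first_next, first_reward)
    # Only the first action of the best plan is executed; replan next step.
    return best_first  # (action, predicted next observation, predicted reward)
```

With the toy dynamics s' = s + u and reward -(s')**2, starting from s = 3.0 the best sampled plan steers towards zero, so the returned first action is expected to be negative.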

reset()

This is the reset function for the model predictive control policy, which should be called at the beginning of the episode.

switch_optimizer(optimizer=None, optimizer_name='', **optimizer_args)

This function switches the optimizer used by the model predictive control policy.

Parameters
  • optimizer (OptimizerBaseClass) – Optimizer that searches for the best action sequence and returns its first action.

  • optimizer_name (str) – Name of the optimizer; one of ['CEM', 'CMA-ES', 'PI2', 'RandomSearch', 'PSO', 'SPSA'].

  • optimizer_args (args) – Additional arguments specific to the optimizer.
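One plausible way to turn an optimizer_name into an optimizer object is a name-to-factory registry, sketched below. resolve_optimizer, SUPPORTED_OPTIMIZERS, and registry are hypothetical, and the rule that an explicit optimizer object takes precedence over optimizer_name is an assumption, not documented behavior of switch_optimizer.

```python
SUPPORTED_OPTIMIZERS = ("CEM", "CMA-ES", "PI2", "RandomSearch", "PSO", "SPSA")


def resolve_optimizer(optimizer=None, optimizer_name="", registry=None,
                      **optimizer_args):
    """Hypothetical helper returning an optimizer instance.

    An explicit optimizer object is returned as-is (assumed precedence);
    otherwise the name is validated and looked up in `registry`, a dict
    mapping names to factory callables.
    """
    if optimizer is not None:
        return optimizer
    if optimizer_name not in SUPPORTED_OPTIMIZERS:
        raise ValueError(
            f"unknown optimizer_name {optimizer_name!r}; "
            f"expected one of {SUPPORTED_OPTIMIZERS}")
    registry = registry or {}
    if optimizer_name not in registry:
        raise KeyError(f"no factory registered for {optimizer_name!r}")
    # Forward the remaining keyword arguments to the chosen factory.
    return registry[optimizer_name](**optimizer_args)
```

Keyword arguments are forwarded unchanged to the factory, analogous to **optimizer_args.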

Model Free Base Policy

class blackbox_mpc.policies.ModelFreeBasePolicy
__init__()

This is the base class for model-free policies that control the agent.


act(observations, t, exploration_noise=False)

This is the act function for the model-free policy base class, which should be called to provide the action to be executed at the current time step.

Parameters
  • observations (tf.float32) – Defines the current observations received from the environment.

  • t (tf.float32) – Defines the current timestep.

  • exploration_noise (bool) – Defines whether exploration noise should be added to the action that will be executed.

Returns

action – The action to be executed by each runner (dims = number of runners × dim_U).

Return type

tf.float32

reset()

This is the reset function for the model-free policy base class, which should be called at the beginning of the episode.
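Unlike the model-based act(), which also returns predicted observations and rewards, the model-free act() returns only the action. The sketch below illustrates that contract with a simple linear feedback controller. ModelFreePolicySketch is a hypothetical stand-in for ModelFreeBasePolicy so the example runs without blackbox_mpc installed; the gain and the assumption that dim_U equals the observation dimension are illustrative choices.

```python
from abc import ABC, abstractmethod


class ModelFreePolicySketch(ABC):
    """Hypothetical stand-in mirroring the documented model-free interface."""

    @abstractmethod
    def act(self, observations, t, exploration_noise=False):
        """Return one action row (length dim_U) per runner."""

    def reset(self):
        """Called at the beginning of each episode; stateless by default."""


class LinearFeedbackPolicy(ModelFreePolicySketch):
    """u = -gain * s for each runner (state and action dims assumed equal)."""

    def __init__(self, gain):
        self.gain = gain

    def act(self, observations, t, exploration_noise=False):
        # No dynamics model: the action depends only on the current
        # observation. exploration_noise is ignored in this sketch.
        return [[-self.gain * x for x in obs] for obs in observations]
```

Calling LinearFeedbackPolicy(0.5).act([[2.0, -4.0]], 0) returns [[-1.0, 2.0]].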

Random Policy

class blackbox_mpc.policies.RandomPolicy(number_of_agents, env_action_space)
__init__(number_of_agents, env_action_space)

This is the random policy for controlling the agent.

Parameters
  • env_action_space (gym.ActionSpace) – Defines the action space of the gym environment.

  • number_of_agents (tf.int32) – Defines the number of runners running in parallel.

act(observations, t, exploration_noise=False)

This is the act function for the random policy, which should be called to provide the action to be executed at the current time step.

Parameters
  • observations (tf.float32) – Defines the current observations received from the environment.

  • t (tf.float32) – Defines the current timestep.

  • exploration_noise (bool) – Defines whether exploration noise should be added to the action to be executed.

Returns

action – The action to be executed by each runner (dims = number of runners × dim_U).

Return type

tf.float32
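A random policy has no need for observations or t; the sketch below ignores them and draws each action uniformly from the action bounds. It assumes a bounded continuous (Box-like) action space whose per-dimension low and high bounds are given as lists. sample_random_actions is a hypothetical helper, not RandomPolicy's actual implementation; it returns one row per agent, matching the documented dims (number of agents × dim_U).

```python
import random


def sample_random_actions(number_of_agents, low, high, rng=None):
    """Uniformly sample one action per agent from the box [low, high].

    low, high: per-dimension bounds (length dim_U), e.g. the low/high
    arrays of a Box action space converted to lists.
    Returns a number_of_agents x dim_U nested list.
    """
    rng = rng or random.Random()
    if len(low) != len(high):
        raise ValueError("low and high must have the same length")
    # One independent uniform draw per agent and per action dimension.
    return [[rng.uniform(lo, hi) for lo, hi in zip(low, high)]
            for _ in range(number_of_agents)]
```

For a single draw from a gym space, env.action_space.sample() does the equivalent; the loop here produces a batch of number_of_agents actions at once.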

reset()

This is the reset function for the random policy, which should be called at the beginning of the episode.