Trajectory Evaluators

Evaluator Base Class

class blackbox_mpc.trajectory_evaluators.EvaluatorBase(reward_function, system_dynamics_handler, name=None)

This is the base class of the trajectory evaluators.

__call__(current_states, action_sequences, time_step)

This is the call function for the Evaluator Base Class. It is used to calculate the rewards corresponding to each of the action sequences starting from the current state.

Parameters

current_states (tf.float32) – Defines the current state of the system, (dims=num_of_agents X dim_S)
action_sequences (tf.float32) – Defines the action sequences to be evaluated, (dims = population X num_of_agents X planning_horizon X dim_U)
time_step (tf.float32) – Defines the current timestep of the episode.

Returns

rewards – The rewards corresponding to each action sequence (dims = 1 X population)

Return type

tf.float32
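The shapes above can be illustrated with a minimal rollout sketch. It is written with NumPy rather than TensorFlow for brevity and is not the library's implementation: `evaluate_sequences`, `toy_dynamics` and `toy_reward` are hypothetical names, and the sketch assumes the reward of a sequence is the plain sum of per-step rewards over the planning horizon. It also keeps the agent axis explicit and returns `population X num_of_agents`, whereas the documented return layout is `1 X population`.

```python
import numpy as np


def evaluate_sequences(current_states, action_sequences,
                       predict_next_state, evaluate_next_reward):
    """Sum per-step rewards along each candidate action sequence.

    current_states:   (num_of_agents, dim_S)
    action_sequences: (population, num_of_agents, planning_horizon, dim_U)
    Returns an array of shape (population, num_of_agents) (an assumption
    of this sketch, see the paragraph above).
    """
    population, num_of_agents, horizon, _ = action_sequences.shape
    totals = np.zeros((population, num_of_agents), dtype=np.float32)
    for p in range(population):
        states = current_states
        for t in range(horizon):
            actions = action_sequences[p, :, t, :]  # (num_of_agents, dim_U)
            next_states = predict_next_state(states, actions)
            # Per-step reward has shape (num_of_agents, 1).
            step_rewards = evaluate_next_reward(states, next_states, actions)
            totals[p] += step_rewards[:, 0]
            states = next_states
    return totals


# Hypothetical toy system used only to exercise the sketch.
def toy_dynamics(states, actions):
    return states + actions


def toy_reward(states, next_states, actions):
    return -np.sum(next_states ** 2, axis=-1, keepdims=True)


current_states = np.zeros((2, 3), dtype=np.float32)          # 2 agents, dim_S = 3
action_sequences = np.ones((4, 2, 5, 3), dtype=np.float32)   # population 4, horizon 5
rewards = evaluate_sequences(current_states, action_sequences,
                             toy_dynamics, toy_reward)
print(rewards.shape)
```

The explicit Python loops are for readability only; a real evaluator would typically vectorize over the population.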

__init__(reward_function, system_dynamics_handler, name=None)

This is the initializer function for the Evaluator Base Class.

Parameters

name (String) – Defines the name of the block of the evaluator.

evaluate_next_reward(current_state, next_state, current_action)

This is the function used to evaluate the reward of a single transition (current state, action, next state) using the internal reward function.

Parameters

current_state (tf.float32) – Defines the current state of the system, (dims=num_of_agents X dim_S)
next_state (tf.float32) – Defines the next state of the system, (dims=num_of_agents X dim_S)
current_action (tf.float32) – Defines the current action to be applied, (dims = num_of_agents X dim_U)

Returns

reward – The predicted reward for the given current state, action and next state, (dims=num_of_agents X 1)

Return type

tf.float32

predict_next_state(current_state, current_action)

This is the function used to predict the next state using the internal dynamics handler.

Parameters

current_state (tf.float32) – Defines the current state of the system, (dims=num_of_agents X dim_S)
current_action (tf.float32) – Defines the current action to be applied, (dims = num_of_agents X dim_U)

Returns

next_state – Defines the next state of the system, (dims=num_of_agents X dim_S)

Return type

tf.float32

Deterministic Evaluator

class blackbox_mpc.trajectory_evaluators.DeterministicTrajectoryEvaluator(reward_function, system_dynamics_handler)

__call__(current_states, action_sequences, time_step)

This is the call function for the Deterministic Trajectory Evaluator Class. It is used to calculate the rewards corresponding to each of the action sequences starting from the current state.

Parameters

current_states (tf.float32) – Defines the current state of the system, (dims=num_of_agents X dim_S)
action_sequences (tf.float32) – Defines the action sequences to be evaluated, (dims = population X num_of_agents X planning_horizon X dim_U)
time_step (tf.float32) – Defines the current timestep of the episode.

Returns

rewards – The rewards corresponding to each action sequence (dims = 1 X population)

Return type

tf.float32

__init__(reward_function, system_dynamics_handler)

This is the trajectory evaluator class for a deterministic dynamics function.

Parameters

reward_function (tf_function) – Defines the reward function with the prototype: tf_func_name(current_state, current_actions, next_state), where current_state is Batch X dim_S, next_state is Batch X dim_S and current_actions is Batch X dim_U.
system_dynamics_handler (SystemDynamicsHandler) – Defines the system dynamics handler class with its own trainer and its observation and action preprocessing functions.
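As an illustration of the reward function prototype described above, here is a minimal sketch of a function with the argument order (current_state, current_actions, next_state). It uses NumPy for brevity, whereas the library expects a TensorFlow function; `quadratic_reward` and its cost weights are hypothetical, and returning one reward per batch row with shape Batch X 1 is an assumption based on the documented return shape of evaluate_next_reward.

```python
import numpy as np


def quadratic_reward(current_state, current_actions, next_state):
    """Hypothetical reward: penalize distance from the origin and control effort.

    current_state:   (Batch, dim_S)  (unused here, kept to match the prototype)
    current_actions: (Batch, dim_U)
    next_state:      (Batch, dim_S)
    Returns an array of shape (Batch, 1).
    """
    state_cost = np.sum(np.square(next_state), axis=-1, keepdims=True)
    action_cost = 0.1 * np.sum(np.square(current_actions), axis=-1, keepdims=True)
    return -(state_cost + action_cost)


batch_states = np.zeros((2, 3))       # Batch = 2, dim_S = 3
batch_actions = np.ones((2, 2))       # dim_U = 2
batch_next_states = np.ones((2, 3))
reward = quadratic_reward(batch_states, batch_actions, batch_next_states)
print(reward.shape)
```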

evaluate_next_reward(current_states, next_states, current_actions)

This is the function used to evaluate the reward of a single transition (current states, actions, next states) using the internal reward function.

Parameters

current_states (tf.float32) – Defines the current state of the system, (dims=num_of_agents X dim_S)
next_states (tf.float32) – Defines the next state of the system, (dims=num_of_agents X dim_S)
current_actions (tf.float32) – Defines the current action to be applied, (dims = num_of_agents X dim_U)

Returns

reward – The predicted reward for the given current states, actions and next states, (dims=num_of_agents X 1)

Return type

tf.float32
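Note that evaluate_next_reward takes its arguments in the order (current states, next states, current actions), while the reward function prototype documented for __init__ takes (current_state, current_actions, next_state). The sketch below shows how the two orders can be reconciled. It assumes the evaluator simply forwards its arguments to the stored reward function, which the documentation here does not confirm; `RewardAdapter` and `order_probe` are hypothetical names.

```python
class RewardAdapter:
    """Hypothetical wrapper that reorders arguments for the reward function."""

    def __init__(self, reward_function):
        self._reward_function = reward_function

    def evaluate_next_reward(self, current_states, next_states, current_actions):
        # The reward function prototype expects (state, action, next_state).
        return self._reward_function(current_states, current_actions, next_states)


def order_probe(current_state, current_actions, next_state):
    # Returns its arguments by name so the forwarding order can be inspected.
    return {"state": current_state, "action": current_actions, "next": next_state}


result = RewardAdapter(order_probe).evaluate_next_reward("s", "s_next", "a")
print(result)
```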

predict_next_state(current_states, current_actions)

This is the function used to predict the next state using the internal dynamics handler.

Parameters

current_states (tf.float32) – Defines the current state of the system, (dims=num_of_agents X dim_S)
current_actions (tf.float32) – Defines the current action to be applied, (dims = num_of_agents X dim_U)

Returns

next_state – Defines the next state of the system, (dims=num_of_agents X dim_S)

Return type

tf.float32
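Since predict_next_state operates on a batch of shape num_of_agents X dim_S, evaluating a whole population of candidates in one call requires some way of batching over the population axis as well. One common approach, sketched below with NumPy, is to fold the population and agent axes into a single batch axis, call the dynamics once, and unfold the result. Whether the library does this internally is not stated here; `predict_for_population` and `toy_dynamics` are hypothetical.

```python
import numpy as np


def predict_for_population(current_states, actions, predict_next_state):
    """Apply a batched dynamics function to every candidate at once.

    current_states: (population, num_of_agents, dim_S)
    actions:        (population, num_of_agents, dim_U)
    Returns next states of shape (population, num_of_agents, dim_S).
    """
    population, num_of_agents, dim_s = current_states.shape
    # Fold (population, num_of_agents) into one batch axis.
    flat_states = current_states.reshape(population * num_of_agents, dim_s)
    flat_actions = actions.reshape(population * num_of_agents, -1)
    flat_next = predict_next_state(flat_states, flat_actions)
    # Unfold back to the original layout.
    return flat_next.reshape(population, num_of_agents, -1)


# Hypothetical deterministic dynamics with dim_S == dim_U.
def toy_dynamics(states, actions):
    return states + 0.5 * actions


states = np.arange(24, dtype=np.float32).reshape(4, 2, 3)
actions = np.ones((4, 2, 3), dtype=np.float32)
next_states = predict_for_population(states, actions, toy_dynamics)
print(next_states.shape)
```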