Trajectory Evaluators
Evaluator Base Class
-
class blackbox_mpc.trajectory_evaluators.EvaluatorBase(reward_function, system_dynamics_handler, name=None)
Base class for the trajectory evaluators.
-
__call__(current_states, action_sequences, time_step)
Call function of the Evaluator Base Class. It calculates the reward corresponding to each action sequence when that sequence is applied starting from the current state.
- Parameters
current_states (tf.float32) – Defines the current state of the system, (dims=num_of_agents X dim_S)
action_sequences (tf.float32) – Defines the action sequences to be evaluated, (dims = population X num_of_agents X planning_horizon X dim_U)
time_step (tf.float32) – Defines the current timestep of the episode.
- Returns
rewards – The rewards corresponding to each action sequence (dims = 1 X population)
- Return type
tf.float32
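The shape contract above can be illustrated with a minimal sketch. It is not the library's implementation: it uses NumPy in place of TensorFlow, and `toy_dynamics`, `toy_reward`, and `evaluate_sequences` are hypothetical names introduced only for this example. It also assumes that per-agent rewards are summed to reach the documented 1 X population output, which the text above does not state.

```python
import numpy as np


def toy_dynamics(states, actions):
    # Hypothetical stand-in for the dynamics handler; assumes dim_S == dim_U.
    return states + 0.1 * actions


def toy_reward(current_states, current_actions, next_states):
    # Hypothetical reward: negative squared distance of the next state from the origin.
    return -np.sum(next_states ** 2, axis=-1)


def evaluate_sequences(current_states, action_sequences):
    """Return rewards of shape (1, population) for the given action sequences."""
    population, num_agents, horizon, _ = action_sequences.shape
    # Every candidate sequence starts from the same current states.
    states = np.broadcast_to(
        current_states, (population,) + current_states.shape
    ).copy()
    rewards = np.zeros((population, num_agents), dtype=np.float32)
    for t in range(horizon):
        actions = action_sequences[:, :, t, :]
        next_states = toy_dynamics(states, actions)
        rewards += toy_reward(states, actions, next_states)
        states = next_states
    # Assumption: agents are summed so the result has the documented shape.
    return rewards.sum(axis=1)[np.newaxis, :]


current_states = np.ones((2, 3), dtype=np.float32)           # num_of_agents=2, dim_S=3
action_sequences = np.zeros((4, 2, 5, 3), dtype=np.float32)  # population=4, horizon=5, dim_U=3
action_sequences[1] = -1.0  # candidate 1 pushes the state toward the origin
rewards = evaluate_sequences(current_states, action_sequences)
```

Here `rewards` has one entry per candidate sequence, so a planner could compare candidates directly along the last axis.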
-
__init__(reward_function, system_dynamics_handler, name=None)
Initializer function for the Evaluator Base Class.
- Parameters
name (String) – Defines the name of the block of the evaluator.
-
evaluate_next_reward(current_state, next_state, current_action)
Predicts the next reward using the internal dynamics handler.
- Parameters
current_state (tf.float32) – Defines the current state of the system, (dims=num_of_agents X dim_S)
next_state (tf.float32) – Defines the next state of the system, (dims=num_of_agents X dim_S)
current_action (tf.float32) – Defines the current action to be applied, (dims = num_of_agents X dim_U)
- Returns
reward – The predicted reward computed from the current state, the action, and the next state, (dims=num_of_agents X 1)
- Return type
tf.float32
-
predict_next_state(current_state, current_action)
Predicts the next state using the internal dynamics handler.
- Parameters
current_state (tf.float32) – Defines the current state of the system, (dims=num_of_agents X dim_S)
current_action (tf.float32) – Defines the current action to be applied, (dims = num_of_agents X dim_U)
- Returns
next_state – Defines the next state of the system, (dims=num_of_agents X dim_S)
- Return type
tf.float32
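As a rough illustration of the shapes involved in a single prediction step, the sketch below uses a hypothetical linear model in NumPy. The matrices `A` and `B` and the free function `predict_next_state` are assumptions made for this example and are not part of blackbox_mpc, where the prediction comes from the system dynamics handler.

```python
import numpy as np

# Hypothetical linear model x' = x A^T + u B^T standing in for learned dynamics.
A = np.eye(3, dtype=np.float32)              # dim_S X dim_S
B = 0.5 * np.ones((3, 2), dtype=np.float32)  # dim_S X dim_U


def predict_next_state(current_state, current_action):
    # (num_of_agents, dim_S) and (num_of_agents, dim_U) -> (num_of_agents, dim_S)
    return current_state @ A.T + current_action @ B.T


current_state = np.zeros((4, 3), dtype=np.float32)  # num_of_agents=4, dim_S=3
current_action = np.ones((4, 2), dtype=np.float32)  # dim_U=2
next_state = predict_next_state(current_state, current_action)
```

The output keeps the num_of_agents X dim_S layout of the input state, which is what lets the result be fed back in as the next current state.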
-
Deterministic Evaluator
-
class blackbox_mpc.trajectory_evaluators.DeterministicTrajectoryEvaluator(reward_function, system_dynamics_handler)
-
__call__(current_states, action_sequences, time_step)
Call function of the Deterministic Trajectory Evaluator Class. It calculates the reward corresponding to each action sequence when that sequence is applied starting from the current state.
- Parameters
current_states (tf.float32) – Defines the current state of the system, (dims=num_of_agents X dim_S)
action_sequences (tf.float32) – Defines the action sequences to be evaluated, (dims = population X num_of_agents X planning_horizon X dim_U)
time_step (tf.float32) – Defines the current timestep of the episode.
- Returns
rewards – The rewards corresponding to each action sequence (dims = 1 X population)
- Return type
tf.float32
-
__init__(reward_function, system_dynamics_handler)
Initializes the trajectory evaluator for a deterministic dynamics function.
- Parameters
reward_function (tf_function) – Defines the reward function with the prototype tf_func_name(current_state, current_actions, next_state), where current_state is Batch X dim_S, next_state is Batch X dim_S and current_actions is Batch X dim_U.
system_dynamics_handler (SystemDynamicsHandler) – Defines the system dynamics handler class with its own trainer and its observation and action preprocessing functions.
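A reward function following the argument order of the prototype above might look like the sketch below. It is written with NumPy for brevity, whereas the documented type is a TensorFlow function. The name `quadratic_reward` and the cost weights are hypothetical, and the output shape the library expects is not stated here, so returning one value per batch row is an assumption.

```python
import numpy as np


def quadratic_reward(current_state, current_actions, next_state):
    """Hypothetical reward using the documented argument order.

    current_state: (Batch, dim_S), current_actions: (Batch, dim_U),
    next_state: (Batch, dim_S). Returns one reward per batch row.
    """
    state_cost = np.sum(next_state ** 2, axis=-1)
    action_cost = 0.01 * np.sum(current_actions ** 2, axis=-1)
    # Higher reward for staying near the origin with small actions.
    return -(state_cost + action_cost)


batch_rewards = quadratic_reward(
    np.zeros((5, 3)),      # current_state: Batch=5, dim_S=3
    np.ones((5, 2)),       # current_actions: dim_U=2
    np.full((5, 3), 2.0),  # next_state
)
```

Note the argument order: the actions come second and the next state last, which differs from the order used by evaluate_next_reward.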
-
evaluate_next_reward(current_states, next_states, current_actions)
Predicts the next reward using the internal dynamics handler.
- Parameters
current_states (tf.float32) – Defines the current state of the system, (dims=num_of_agents X dim_S)
next_states (tf.float32) – Defines the next state of the system, (dims=num_of_agents X dim_S)
current_actions (tf.float32) – Defines the current action to be applied, (dims = num_of_agents X dim_U)
- Returns
reward – The predicted reward computed from the current state, the action, and the next state, (dims=num_of_agents X 1)
- Return type
tf.float32
-
predict_next_state(current_states, current_actions)
Predicts the next state using the internal dynamics handler.
- Parameters
current_states (tf.float32) – Defines the current state of the system, (dims=num_of_agents X dim_S)
current_actions (tf.float32) – Defines the current action to be applied, (dims = num_of_agents X dim_U)
- Returns
next_state – Defines the next state of the system, (dims=num_of_agents X dim_S)
- Return type
tf.float32
-