Dynamics Handlers

class blackbox_mpc.dynamics_handlers.SystemDynamicsHandler(env_action_space, env_observation_space, dynamics_function=None, true_model=False, is_normalized=True, log_dir=None, tf_writer=None, save_model_frequency=1, saved_model_dir=None, transform_targets_func=<tensorflow.python.eager.def_function.Function object>, inverse_transform_targets_func=<tensorflow.python.eager.def_function.Function object>)

__init__(env_action_space, env_observation_space, dynamics_function=None, true_model=False, is_normalized=True, log_dir=None, tf_writer=None, save_model_frequency=1, saved_model_dir=None, transform_targets_func=<tensorflow.python.eager.def_function.Function object>, inverse_transform_targets_func=<tensorflow.python.eager.def_function.Function object>)

This is the system dynamics handler class, which is responsible for training the dynamics function, storing the rollouts, and preprocessing and postprocessing the MDP elements.

Parameters

dynamics_function (DeterministicDynamicsFunctionBase) – Defines the system dynamics function, i.e. the NN dynamics function itself.
env_action_space (gym.ActionSpace) – Defines the action space of the gym environment.
env_observation_space (gym.spaces.Space) – Defines the observation space of the gym environment.
tf_writer (tf.summary) – Defines a TensorFlow summary writer to be used for logging.
log_dir (string) – Defines the log directory in which to save the normalization statistics.
saved_model_dir (string) – Defines the directory of a previously saved model, used when loading the model.
transform_targets_func (tf_function) – Defines a tf function that transforms the next states into the targets (outputs) of the NN dynamics function. By default, this is the deviation function: target = next_state - current_state.
inverse_transform_targets_func (tf_function) – Defines a tf function that inverse-transforms the targets (outputs) of the NN dynamics function back into next states. By default, this is the inverse of the deviation function: next_state = target + current_state.
save_model_frequency (int) – Defines how often the model should be saved, measured in refining iterations.
true_model (bool) – Defines whether the dynamics function is a non-trainable (true) model.
is_normalized (bool) – Defines whether the dynamics function should be trained with normalization.
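The default target transform described for transform_targets_func and its inverse can be illustrated with plain arrays. This is a minimal NumPy sketch of the arithmetic only, not the library's tf.function defaults; the function names deviation and inverse_deviation and their argument order are hypothetical.

```python
import numpy as np


def deviation(current_state: np.ndarray, next_state: np.ndarray) -> np.ndarray:
    """Hypothetical sketch of the default transform: target = next_state - current_state."""
    return next_state - current_state


def inverse_deviation(current_state: np.ndarray, target: np.ndarray) -> np.ndarray:
    """Hypothetical sketch of the default inverse: next_state = target + current_state."""
    return target + current_state


# Round trip on a batch of 2 states with dim_S = 3.
states = np.array([[0.0, 1.0, 2.0], [1.0, 1.0, 1.0]], dtype=np.float32)
next_states = np.array([[0.5, 1.5, 1.0], [2.0, 0.0, 1.0]], dtype=np.float32)
targets = deviation(states, next_states)
recovered = inverse_deviation(states, targets)
print(np.allclose(recovered, next_states))  # True
```

Predicting the deviation rather than the absolute next state is a common choice in learned dynamics models, since consecutive states are usually close and the network then only has to learn a small residual.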


get_dynamics_function()

Returns the dynamics function used by the system handler.

Return type

DeterministicDynamicsFunctionBase

process_input(states, actions)

This is the process_input function, which takes in the states and actions and preprocesses them (e.g. normalization) into the input expected by the dynamics function.

Parameters

states (tf.float32) – The current states, with a shape of (Batch x dim_S).
actions (tf.float32) – The current actions, with a shape of (Batch x dim_U).

Returns

result – concatenated states and actions after preprocessing them.

Return type

tf.float32
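A minimal sketch of the kind of preprocessing process_input performs is shown below. It assumes z-score normalization with precomputed mean and standard deviation statistics; the function name, the statistic names, the epsilon, and the exact normalization scheme are hypothetical, not the library's implementation.

```python
import numpy as np


def process_input_sketch(states, actions, state_mean, state_std,
                         action_mean, action_std, eps=1e-8):
    """Hypothetical sketch of process_input.

    states: (Batch, dim_S), actions: (Batch, dim_U).
    Assumes z-score normalization; returns (Batch, dim_S + dim_U).
    """
    norm_states = (states - state_mean) / (state_std + eps)
    norm_actions = (actions - action_mean) / (action_std + eps)
    # Concatenate along the feature axis so each row is one (state, action) pair.
    return np.concatenate([norm_states, norm_actions], axis=-1).astype(np.float32)


states = np.array([[1.0, 2.0], [3.0, 4.0]], dtype=np.float32)   # Batch = 2, dim_S = 2
actions = np.array([[0.5], [-0.5]], dtype=np.float32)            # Batch = 2, dim_U = 1
result = process_input_sketch(
    states, actions,
    state_mean=states.mean(axis=0), state_std=states.std(axis=0),
    action_mean=actions.mean(axis=0), action_std=actions.std(axis=0),
)
print(result.shape)  # (2, 3)
```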

process_output(inputs_states, raw_output)

This is the process_output function, which takes in the previous states and the predicted targets (deltas) and processes them to get the predicted absolute next states.

Parameters

inputs_states (tf.float32) – The previous states, with a shape of (Batch x dim_S).
raw_output (tf.float32) – The predicted normalized delta as received from the dynamics function, with a shape of (Batch x dim_S).

Returns

result – absolute predicted next state.

Return type

tf.float32
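The postprocessing step can be sketched the same way: undo the target normalization, then apply the inverse of the default deviation transform. This assumes z-score normalization of the targets with hypothetical statistics target_mean and target_std; it is an illustration, not the library's code.

```python
import numpy as np


def process_output_sketch(inputs_states, raw_output, target_mean, target_std):
    """Hypothetical sketch of process_output.

    inputs_states: (Batch, dim_S); raw_output: normalized delta, (Batch, dim_S).
    Assumes z-score normalized targets and the default deviation transform
    (next_state = target + current_state).
    """
    delta = raw_output * target_std + target_mean  # undo the target normalization
    return inputs_states + delta                   # inverse deviation transform


inputs_states = np.array([[0.0, 1.0], [2.0, 3.0]], dtype=np.float32)
raw_output = np.array([[1.0, -1.0], [0.0, 2.0]], dtype=np.float32)
next_states = process_output_sketch(
    inputs_states, raw_output,
    target_mean=np.array([0.1, 0.0], dtype=np.float32),
    target_std=np.array([0.5, 0.2], dtype=np.float32),
)
print(next_states)
```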

train(observations_trajectories, actions_trajectories, rewards_trajectories, validation_split=0.2, batch_size=128, learning_rate=0.001, epochs=30, nn_optimizer=tf.keras.optimizers.Adam)

This is the train function, which takes in the collected MDP data and trains the dynamics model on it.

Parameters

observations_trajectories ([np.float32]) – A list of observations of each of the episodes for the n agents.
actions_trajectories ([np.float32]) – A list of actions of each of the episodes for the n agents.
rewards_trajectories ([np.float32]) – A list of rewards of each of the episodes for the n agents.
learning_rate (float) – Learning rate to be used in training the dynamics function.
epochs (int) – Number of epochs used to train the dynamics function every time train is called.
validation_split (float32) – Defines the fraction of the collected rollouts to be used for validation.
batch_size (int) – Defines the batch size to be used for training the model.
nn_optimizer (tf.keras.optimizers) – Defines the optimizer to use with the neural network.
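One plausible way the trajectories passed to train could be turned into supervised (input, target) pairs and split by validation_split is sketched below. It assumes each episode's observation array holds one more entry than its action array (the final observation after the last action) and uses the default deviation targets without normalization; the helper name build_dataset and the shuffling seed are hypothetical, and rewards_trajectories is left out because the sketch only covers the dynamics targets.

```python
import numpy as np


def build_dataset(observations_trajectories, actions_trajectories,
                  validation_split=0.2, seed=0):
    """Hypothetical sketch: flatten episodes into (input, target) pairs.

    Assumes each observations array is (T + 1, dim_S) and each actions
    array is (T, dim_U); this rollout layout is an assumption.
    """
    inputs, targets = [], []
    for obs, acts in zip(observations_trajectories, actions_trajectories):
        current, following = obs[:-1], obs[1:]
        inputs.append(np.concatenate([current, acts], axis=-1))
        targets.append(following - current)  # default deviation target
    inputs = np.concatenate(inputs, axis=0).astype(np.float32)
    targets = np.concatenate(targets, axis=0).astype(np.float32)

    # Shuffle, then hold out a fraction of the transitions for validation.
    order = np.random.default_rng(seed).permutation(len(inputs))
    n_val = int(len(inputs) * validation_split)
    val_idx, train_idx = order[:n_val], order[n_val:]
    return (inputs[train_idx], targets[train_idx]), (inputs[val_idx], targets[val_idx])


# Two episodes of length T = 5 with dim_S = 3 and dim_U = 2.
rng = np.random.default_rng(1)
obs_trajs = [rng.normal(size=(6, 3)) for _ in range(2)]
act_trajs = [rng.normal(size=(5, 2)) for _ in range(2)]
(train_x, train_y), (val_x, val_y) = build_dataset(obs_trajs, act_trajs)
print(train_x.shape, val_x.shape)  # (8, 5) (2, 5)
```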