Dynamics Handlers

class blackbox_mpc.dynamics_handlers.SystemDynamicsHandler(env_action_space, env_observation_space, dynamics_function=None, true_model=False, is_normalized=True, log_dir=None, tf_writer=None, save_model_frequency=1, saved_model_dir=None, transform_targets_func=<default deviation tf.function>, inverse_transform_targets_func=<default inverse deviation tf.function>)

This is the system dynamics handler class, which is responsible for training the dynamics function, storing the rollouts, and preprocessing and postprocessing the MDP elements.

Parameters
  • dynamics_function (DeterministicDynamicsFunctionBase) – Defines the system dynamics function, i.e. the NN dynamics model itself.

  • env_action_space (gym.spaces.Space) – Defines the action space of the gym environment.

  • env_observation_space (gym.spaces.Space) – Defines the observation space of the gym environment.

  • tf_writer (tf.summary.SummaryWriter) – Defines a TensorFlow summary writer to be used for logging.

  • log_dir (string) – Defines the log directory to save the normalization statistics in.

  • saved_model_dir (string) – Defines the directory of a previously saved model, used when loading the model.

  • transform_targets_func (tf.function) – Defines a tf function that transforms the next states into targets (the output of the NN dynamics). By default this is the deviation function, target = next_state - current_state.

  • inverse_transform_targets_func (tf.function) – Defines a tf function that inverse-transforms the targets (the output of the NN dynamics) back into next states. By default this is the inverse of the deviation function, next_state = target + current_state.

  • save_model_frequency (int) – Defines how often the model is saved, measured in refinement iterations.

  • true_model (bool) – Defines whether the dynamics function is a non-trainable (true) model.

  • is_normalized (bool) – Defines whether the dynamics function is trained with normalization.
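The default target transforms described for transform_targets_func and inverse_transform_targets_func amount to a simple deviation (delta) mapping. The sketch below reproduces that arithmetic with NumPy purely for illustration; the library's own defaults are TensorFlow tf.function objects, and the names `deviation` and `inverse_deviation` (and their argument order) are hypothetical.

```python
import numpy as np


def deviation(current_states, next_states):
    """Hypothetical stand-in for the default transform_targets_func:
    the training target is the change in state, next_state - current_state."""
    return next_states - current_states


def inverse_deviation(current_states, targets):
    """Hypothetical stand-in for the default inverse_transform_targets_func:
    recovers the absolute next state as target + current_state."""
    return targets + current_states


# A batch of 2 states with dim_S = 2.
s = np.array([[0.0, 1.0], [2.0, 3.0]], dtype=np.float32)
s_next = np.array([[0.5, 1.5], [1.0, 4.0]], dtype=np.float32)
targets = deviation(s, s_next)  # targets == [[0.5, 0.5], [-1.0, 1.0]]
# The inverse transform undoes the forward transform exactly.
assert np.allclose(inverse_deviation(s, targets), s_next)
```

Predicting deltas rather than absolute states is a common choice for learned dynamics models, since the network then only has to model how much the state changes per step.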

get_dynamics_function()

Returns the dynamics function used by the system handler.

Returns

The handler's dynamics function (DeterministicDynamicsFunctionBase).

process_input(states, actions)

This is the process_input function, which takes in the states and actions and preprocesses them (e.g. normalization) for the dynamics function.

Parameters
  • states (tf.float32) – The current states, with shape (Batch × dim_S).

  • actions (tf.float32) – The current actions, with shape (Batch × dim_U).

Returns

result – concatenated states and actions after preprocessing them.

Return type

tf.float32
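A minimal sketch of what this preprocessing might look like, assuming per-dimension z-score normalization using mean and standard-deviation statistics (the kind of normalization statistics that log_dir refers to) followed by concatenation along the feature axis. The names `normalize` and `process_input_sketch` and the statistics arguments are hypothetical, not part of the library's API, and the actual handler operates on TensorFlow tensors rather than NumPy arrays.

```python
import numpy as np


def normalize(x, mean, std, eps=1e-8):
    """Hypothetical helper: z-score normalization per feature dimension."""
    return (x - mean) / (std + eps)


def process_input_sketch(states, actions, state_mean, state_std,
                         action_mean, action_std):
    """Hypothetical illustration of process_input.

    Normalizes states (Batch x dim_S) and actions (Batch x dim_U), then
    concatenates them into a single (Batch x (dim_S + dim_U)) array.
    """
    norm_states = normalize(states, state_mean, state_std)
    norm_actions = normalize(actions, action_mean, action_std)
    return np.concatenate([norm_states, norm_actions], axis=-1)


states = np.array([[1.0, 2.0], [3.0, 4.0]], dtype=np.float32)  # Batch=2, dim_S=2
actions = np.array([[0.5], [-0.5]], dtype=np.float32)           # Batch=2, dim_U=1
x = process_input_sketch(
    states, actions,
    state_mean=states.mean(axis=0), state_std=states.std(axis=0),
    action_mean=actions.mean(axis=0), action_std=actions.std(axis=0),
)
print(x.shape)  # (2, 3)
```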

process_output(inputs_states, raw_output)

This is the process_output function, which takes in the previous states and the predicted target/delta and processes them to get the predicted absolute next state.

Parameters
  • inputs_states (tf.float32) – The previous states, with shape (Batch × dim_S).

  • raw_output (tf.float32) – The predicted normalized delta as received from the dynamics function, with shape (Batch × dim_S).

Returns

result – absolute predicted next state.

Return type

tf.float32
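Under the same assumptions (z-score-normalized targets and the default deviation transform), the postprocessing step can be sketched as two operations: undo the target normalization, then add the recovered delta back onto the input states. The helper name `process_output_sketch` and the `delta_mean`/`delta_std` statistics are hypothetical.

```python
import numpy as np


def process_output_sketch(input_states, raw_output, delta_mean, delta_std):
    """Hypothetical illustration of process_output.

    input_states: previous states, shape (Batch x dim_S)
    raw_output:   normalized predicted deltas, shape (Batch x dim_S)
    Returns the absolute predicted next states, shape (Batch x dim_S).
    """
    # Undo the (assumed) z-score normalization of the targets.
    deltas = raw_output * delta_std + delta_mean
    # Inverse of the default deviation transform: next_state = target + current_state.
    return input_states + deltas


states = np.array([[0.0, 1.0], [2.0, 3.0]])
raw = np.array([[1.0, -1.0], [0.0, 2.0]])
next_states = process_output_sketch(
    states, raw, delta_mean=np.array([0.1, 0.0]), delta_std=np.array([0.5, 0.5])
)
# next_states == [[0.6, 0.5], [2.1, 4.0]]
```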

train(observations_trajectories, actions_trajectories, rewards_trajectories, validation_split=0.2, batch_size=128, learning_rate=0.001, epochs=30, nn_optimizer=tf.keras.optimizers.Adam)

This is the train function, which takes in the collected MDP rollouts and trains the dynamics model on them.

Parameters
  • observations_trajectories ([np.float32]) – A list of observations of each of the episodes for the n agents.

  • actions_trajectories ([np.float32]) – A list of actions of each of the episodes for the n agents.

  • rewards_trajectories ([np.float32]) – A list of rewards of each of the episodes for the n agents.

  • learning_rate (float) – Learning rate to be used in training the dynamics function.

  • epochs (int) – Number of epochs used to train the dynamics function every time train is called.

  • validation_split (float) – Defines the fraction of the collected rollouts that is held out for validation.

  • batch_size (int) – Defines the batch size to be used for training the model.

  • nn_optimizer (tf.keras.optimizers.Optimizer class) – Defines the optimizer class to use with the neural network.
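Before fitting, per-episode trajectory lists like these have to be flattened into supervised (input, target) pairs and split into training and validation sets. The sketch below shows one plausible way to do this; it is not the library's implementation. It assumes each observation trajectory has one more entry than its action trajectory (the initial observation plus one per step), uses the default deviation target, and shuffles transitions before holding out validation_split of them. The helper name `build_dataset` and the `seed` argument are hypothetical. Rewards are not needed to fit a state-transition model, so this sketch ignores them.

```python
import numpy as np


def build_dataset(observations_trajectories, actions_trajectories,
                  validation_split=0.2, seed=0):
    """Hypothetical helper: turn episode trajectories into (input, target) pairs.

    Assumes observations_trajectories[i] has shape (T_i + 1, dim_S) and
    actions_trajectories[i] has shape (T_i, dim_U).
    """
    states, actions, targets = [], [], []
    for obs, acts in zip(observations_trajectories, actions_trajectories):
        states.append(obs[:-1])              # s_t
        actions.append(acts)                 # u_t
        targets.append(obs[1:] - obs[:-1])   # deviation target: s_{t+1} - s_t
    states = np.concatenate(states, axis=0)
    actions = np.concatenate(actions, axis=0)
    targets = np.concatenate(targets, axis=0)

    # Shuffle the transitions, then hold out a fraction for validation.
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(states))
    n_val = int(len(states) * validation_split)
    val_idx, train_idx = order[:n_val], order[n_val:]
    train = (states[train_idx], actions[train_idx], targets[train_idx])
    val = (states[val_idx], actions[val_idx], targets[val_idx])
    return train, val


# Two toy episodes with dim_S = 2, dim_U = 1 (5 + 5 = 10 transitions).
obs_trajs = [np.random.rand(6, 2), np.random.rand(6, 2)]
act_trajs = [np.random.rand(5, 1), np.random.rand(5, 1)]
train, val = build_dataset(obs_trajs, act_trajs, validation_split=0.2)
print(len(train[0]), len(val[0]))  # 8 2
```

Shuffling at the transition level rather than the episode level is a simplification; holding out whole episodes would give a stricter estimate of generalization.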