swarmrl.losses.policy_gradient_loss Module API Reference
Module for the implementation of the policy gradient loss.

The policy gradient is the simplest policy-based loss function: the log-probabilities of the actions taken are weighted by a value function (for example, the expected returns), so the value estimate drives the entire policy update.

Notes

https://spinningup.openai.com/en/latest/algorithms/vpg.html
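The vanilla policy gradient (VPG) objective described at the link above can be sketched in a few lines. This is a minimal illustration of the loss formula, not swarmrl's implementation; the function name and shapes are chosen for the example.

```python
import numpy as np

def vanilla_pg_loss(log_probs: np.ndarray, returns: np.ndarray) -> float:
    """Vanilla policy gradient loss: L = -mean(log pi(a|s) * G).

    Minimizing L with gradient descent maximizes the expected return.
    """
    return float(-np.mean(log_probs * returns))

# Toy example: two time steps for a single particle.
log_probs = np.array([-0.1, -0.5])  # log pi(a_t | s_t) of the actions taken
returns = np.array([1.0, 0.5])      # value-function weights, e.g. returns-to-go
loss = vanilla_pg_loss(log_probs, returns)
```

Because the returns multiply the log-probabilities, actions that led to higher returns receive a stronger gradient push.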
PolicyGradientLoss

Bases: Loss

Loss class implementing the simple policy gradient; the parent Loss class covers reinforcement learning tasks in general.
Source code in swarmrl/losses/policy_gradient_loss.py, lines 27–136.
__init__(value_function=ExpectedReturns())

Constructor for the loss class.

Parameters

value_function : ExpectedReturns
    Value function used to weight the action log-probabilities in the loss; defaults to the expected returns.
Source code in swarmrl/losses/policy_gradient_loss.py
35 36 37 38 39 40 41 42 43 44 45 46 |
|
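To make the `value_function` argument concrete, here is an illustrative stand-in for an expected-returns computation (discounted returns-to-go). The class name, `gamma`, and `standardize` parameters are assumptions for the sketch, not swarmrl's actual API.

```python
import numpy as np

class ExpectedReturnsSketch:
    """Illustrative stand-in for a value function like ExpectedReturns."""

    def __init__(self, gamma: float = 0.99, standardize: bool = False):
        self.gamma = gamma            # discount factor (assumed parameter)
        self.standardize = standardize

    def __call__(self, rewards: np.ndarray) -> np.ndarray:
        # rewards: (n_timesteps,) -> discounted returns-to-go, same shape.
        returns = np.zeros_like(rewards, dtype=float)
        running = 0.0
        for t in reversed(range(len(rewards))):
            running = rewards[t] + self.gamma * running
            returns[t] = running
        if self.standardize:
            # Normalizing the returns often stabilizes training.
            returns = (returns - returns.mean()) / (returns.std() + 1e-8)
        return returns

rewards = np.array([1.0, 0.0, 1.0])
g = ExpectedReturnsSketch(gamma=0.5)(rewards)
```

Any callable with this shape contract (per-timestep rewards in, per-timestep weights out) could in principle play the role of the value function in the loss.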
compute_loss(network, episode_data)

Compute the loss and update the shared actor-critic network.

Parameters

network : Network
    Actor-critic model to use in the analysis.
episode_data : np.ndarray (n_timesteps, n_particles, feature_dimension)
    Observable data for each time step and particle within the episode.

Returns

Nothing is returned; the network is updated in place.
Source code in swarmrl/losses/policy_gradient_loss.py
105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 |
|
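The per-particle structure that `compute_loss` must handle can be sketched as follows. This is a hypothetical numpy version of the loss computation only (swarmrl's actual method also performs the network update); the argument names and shapes are assumptions based on the documented `episode_data` layout.

```python
import numpy as np

def compute_loss_sketch(logits: np.ndarray,
                        actions: np.ndarray,
                        returns: np.ndarray) -> float:
    """Hypothetical per-particle policy-gradient loss over one episode.

    logits  : (n_timesteps, n_particles, n_actions) raw policy outputs
    actions : (n_timesteps, n_particles) integer actions taken
    returns : (n_timesteps, n_particles) value-function weights
    """
    # Numerically stable log-softmax over the action dimension.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    # Select the log-probability of each action actually taken.
    t_idx, p_idx = np.indices(actions.shape)
    taken = log_probs[t_idx, p_idx, actions]
    # Sum over time, average over particles, negate for gradient descent.
    return float(-(taken * returns).sum(axis=0).mean())
```

With uniform logits and unit returns, each chosen action contributes -log(n_actions), so the loss grows with episode length; in training, the returns reweight these terms per particle.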