You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Copy file name to clipboardExpand all lines: units/en/unit4/policy-gradient.mdx
+1Lines changed: 1 addition & 0 deletions
Original file line number
Diff line number
Diff line change
@@ -54,6 +54,7 @@ Let's give some more details on this formula:
54
54
55
55
56
56
-\\(R(\tau)\\) : Return from an arbitrary trajectory. To take this quantity and use it to calculate the expected return, we need to multiply it by the probability of each possible trajectory.
57
+
57
58
-\\(P(\tau;\theta)\\) : Probability of each possible trajectory \\(\tau\\) (that probability depends on \\( \theta\\) since it defines the policy that it uses to select the actions of the trajectory which has an impact of the states visited).
0 commit comments