Commit bf5a72a

Merge pull request #420 from fzyzcjy/patch-1
Super tiny fix format
2 parents: 262cc0c + 59bce06

File tree

1 file changed: +1 −0 lines


units/en/unit4/policy-gradient.mdx

Lines changed: 1 addition & 0 deletions
```diff
@@ -54,6 +54,7 @@ Let's give some more details on this formula:
 
 
 - \\(R(\tau)\\) : Return from an arbitrary trajectory. To take this quantity and use it to calculate the expected return, we need to multiply it by the probability of each possible trajectory.
+
 - \\(P(\tau;\theta)\\) : Probability of each possible trajectory \\(\tau\\) (that probability depends on \\( \theta\\) since it defines the policy that it uses to select the actions of the trajectory which has an impact of the states visited).
 
 <img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit6/probability.png" alt="Probability"/>
```
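The two bullets in this diff define the terms of the expected return: weight each trajectory's return \\(R(\tau)\\) by its probability \\(P(\tau;\theta)\\) and sum over trajectories. A minimal Python sketch of that weighted sum, using hypothetical toy values for the trajectory probabilities and returns (not data from the course):

```python
# Expected return as a probability-weighted sum over trajectories:
#   J(theta) = sum over tau of P(tau; theta) * R(tau)
# Toy, hypothetical values: each entry is (P(tau; theta), R(tau)).
trajectories = [
    (0.5, 10.0),   # tau_1: likely trajectory, high return
    (0.3, 4.0),    # tau_2
    (0.2, -2.0),   # tau_3: unlikely trajectory, negative return
]

# Weighted sum over all enumerated trajectories.
expected_return = sum(p * r for p, r in trajectories)

print(round(expected_return, 3))  # 5.8
```

In a real environment the trajectory space is far too large to enumerate, which is why the course goes on to estimate this expectation from sampled trajectories instead.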
