Uooh's blog
Trust Region Policy Optimization
Mar 28, 2024
TRPO aims for steady improvement of the policy.
Policy Gradient Theorem
Mar 16, 2024
The Policy Gradient Theorem for the discounted reward setting.