RLtoolbox Function
Mc_On_Policy_Improve - Search for an optimal policy.
Calling Sequence
- Pi=Mc_On_Policy_Improve(Episode, Pi, Q, ActionList, _epsilon)
Parameters
- Episode
: List of each action-value recorded during the episode
- Pi
: Policy to be improved. Size : NbStates x 1
- Q
: Value of each state-action pair
- ActionList
: List of each action
- _epsilon
: Exploration parameter of the epsilon-greedy policy. 0 < _epsilon < 1
- Pi
: Matrix with the optimal policy found. Size : NbStates x 1
Description
Searches for an optimal policy for the environment described by the other parameters.
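The improvement step can be illustrated with a minimal Python sketch. It is not the toolbox's implementation: it assumes that the episode is a list of (state, action) pairs with 0-based state indices, that q[s][i] holds the value of action_list[i] in state s, and that the policy stores one action per state. The name mc_on_policy_improve and the rng argument are hypothetical. For each state visited during the episode, the sketch picks the greedy action with probability 1 - epsilon and a random action otherwise.

```python
import random


def mc_on_policy_improve(episode, pi, q, action_list, epsilon, rng=random):
    """Return a copy of pi updated at every state visited in the episode.

    Assumptions (not confirmed by the toolbox documentation):
    - episode is a list of (state, action) pairs, states indexed from 0
    - q[s][i] is the estimated value of action_list[i] in state s
    - pi[s] stores the single action selected for state s
    """
    new_pi = list(pi)  # leave the caller's policy untouched
    for state, _action in episode:
        if rng.random() < epsilon:
            # Explore: pick any action uniformly at random.
            new_pi[state] = rng.choice(action_list)
        else:
            # Exploit: pick the action with the highest estimated value.
            values = q[state]
            best = max(range(len(action_list)), key=lambda i: values[i])
            new_pi[state] = action_list[best]
    return new_pi


# Usage: three states, two actions labelled 1 and 2.
q = [[0.1, 0.9], [0.5, 0.2], [0.0, 0.0]]
pi = [1, 1, 1]
episode = [(0, 1), (1, 2)]
# epsilon is set to 0 here only to make the result deterministic.
greedy_pi = mc_on_policy_improve(episode, pi, q, [1, 2], 0.0)
```

States that do not appear in the episode keep their previous action, so repeated calls over many episodes are needed before the whole policy has been revised.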
Examples
None
See Also
None