RLtoolbox Function
Mc_Off_Policy_Improve - Search for an optimal policy.
Calling Sequence
- Pi=Mc_Off_Policy_Improve(Episode, Pi, Q, ActionList)
Parameters
- Episode
: List of each action-value recorded during the episode
- Pi
: Policy to be improved. Size : NbStates x 1
- Q
: Value of each state-action pair
- ActionList
: List of the available actions
- Pi
: Return value. Matrix with the optimal policy found. Size : NbStates x 1
Description
Searches for an optimal policy for the environment described by the other parameters.
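This page does not show how the function computes its result. As a rough illustration only, policy improvement from action values is commonly done greedily: for each state, pick the action with the highest value in Q. The sketch below is written in Python, not Scilab, and does not call the toolbox. The function name greedy_improve, the layout of Q as one row per state and one column per action, and the representation of the episode as a plain list of visited state indices are all assumptions made for this example, not the toolbox's documented data formats.

```python
def greedy_improve(episode_states, pi, q, action_list):
    """Hypothetical helper: make pi greedy with respect to q.

    Only the states listed in episode_states are updated (assumption).
    q[s][a] is assumed to hold the value of the a-th action in state s.
    """
    new_pi = list(pi)  # copy so the input policy is left unchanged
    for s in episode_states:
        row = q[s]
        # Index of the highest-valued action; ties go to the first one.
        best = max(range(len(row)), key=lambda a: row[a])
        new_pi[s] = action_list[best]
    return new_pi


# Toy data: 3 states, 2 actions labeled 1 and 2 (made up for illustration).
q = [[0.1, 0.5],
     [0.7, 0.2],
     [0.0, 0.0]]
pi = [1, 1, 1]
action_list = [1, 2]

new_pi = greedy_improve([0, 1], pi, q, action_list)
print(new_pi)  # state 2 was not visited, so its entry is unchanged
```

In this toy run, state 0 switches to action 2 because 0.5 > 0.1, state 1 keeps action 1, and state 2 is untouched because it does not appear in the episode.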
Examples
None
See Also
None