RLtoolbox Function Value_Iteration - Search for an optimal policy.
Calling Sequence
- [Pi,V,T_V]=Value_Iteration(NbStates, NbActions, TransProb, Rewards, Gamma, Actions_States)
Parameters
- NbStates
: Number of states the agent can be in
- NbActions
: Number of actions the agent can take
- TransProb
: List of the transition probabilities between two states under a given action. Size of the list: NbActions. Size of each element: NbStates x NbStates
- Rewards
: List of the rewards given by the environment to the agent when it chooses an action from a given state. Size of the list: NbActions. Size of each element: NbStates x NbStates
- Gamma
: Discount factor weighting future rewards against immediate ones ("the memory" of the agent). 0 < Gamma < 1
- Actions_States
: Matrix representing
- Pi
: Matrix with the optimal policy found. Size : NbStates x 1
- V
: Matrix with the value of each state. Size : NbStates x 1
- T_V
: List of all the matrices V calculated by the function.
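The layout of TransProb and Rewards described above can be illustrated with a small sketch. It is written in Python rather than Scilab and is not part of the toolbox: the names trans_prob, rewards, check_shapes and rows_are_distributions are hypothetical, the numbers are made up, and the sketch assumes that rows index the source state and columns the destination state, which the toolbox does not state.

```python
# Hypothetical toy problem: 2 states, 2 actions (values are illustrative only).
nb_states = 2
nb_actions = 2

# trans_prob[a][s][s2]: probability of moving from state s to state s2 under action a.
# ASSUMPTION: rows index the source state; the toolbox may use another orientation.
trans_prob = [
    [[0.9, 0.1], [0.2, 0.8]],  # action 0
    [[0.5, 0.5], [0.0, 1.0]],  # action 1
]

# rewards[a][s][s2]: reward received for that same transition.
rewards = [
    [[1.0, 0.0], [0.0, 2.0]],  # action 0
    [[0.0, 3.0], [0.0, 0.0]],  # action 1
]


def check_shapes(matrices, nb_actions, nb_states):
    """Return True if the list holds nb_actions matrices of size nb_states x nb_states."""
    return (len(matrices) == nb_actions
            and all(len(m) == nb_states
                    and all(len(row) == nb_states for row in m)
                    for m in matrices))


def rows_are_distributions(matrices, tol=1e-9):
    """Return True if every row is non-negative and sums to 1 within tol."""
    return all(min(row) >= 0 and abs(sum(row) - 1.0) <= tol
               for m in matrices for row in m)


shapes_ok = (check_shapes(trans_prob, nb_actions, nb_states)
             and check_shapes(rewards, nb_actions, nb_states))
probs_ok = rows_are_distributions(trans_prob)
```

Checking that each row of every transition matrix sums to 1 before calling the solver catches most data-entry mistakes early.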
Description
Searches for an optimal policy for the environment described by the other parameters.
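As a rough illustration of the technique the function is named after, the following is a minimal Python sketch of textbook value iteration. It is not the toolbox's implementation. It assumes that trans_prob[a][s][s2] and rewards[a][s][s2] are indexed by action, source state and destination state, that actions and states are numbered from 0, and that iteration stops when the largest change in V falls below a tolerance; tol and max_iter are hypothetical parameters. Actions_States is left out because its meaning is not specified here.

```python
def value_iteration(nb_states, nb_actions, trans_prob, rewards, gamma,
                    tol=1e-8, max_iter=10000):
    """Textbook value iteration; returns (policy, v, history)."""

    def q_values(v):
        # q[s][a] = sum over s2 of P(s2 | s, a) * (R(s, a, s2) + gamma * v[s2])
        return [[sum(trans_prob[a][s][s2] * (rewards[a][s][s2] + gamma * v[s2])
                     for s2 in range(nb_states))
                 for a in range(nb_actions)]
                for s in range(nb_states)]

    v = [0.0] * nb_states
    history = [list(v)]  # every value vector computed, starting with the initial one
    for _ in range(max_iter):
        q = q_values(v)
        new_v = [max(q[s]) for s in range(nb_states)]
        delta = max(abs(new_v[s] - v[s]) for s in range(nb_states))
        v = new_v
        history.append(list(v))
        if delta < tol:  # ASSUMPTION: stop once the update is small enough
            break

    # Greedy policy with respect to the final value vector (0-based action indices).
    q = q_values(v)
    policy = [max(range(nb_actions), key=lambda a: q[s][a]) for s in range(nb_states)]
    return policy, v, history


# Toy problem: action 0 stays in place, action 1 switches state;
# arriving in state 1 yields a reward of 1, arriving in state 0 yields 0.
trans_prob = [
    [[1.0, 0.0], [0.0, 1.0]],  # action 0: stay
    [[0.0, 1.0], [1.0, 0.0]],  # action 1: switch
]
rewards = [
    [[0.0, 1.0], [0.0, 1.0]],
    [[0.0, 1.0], [0.0, 1.0]],
]

policy, v, history = value_iteration(2, 2, trans_prob, rewards, gamma=0.5)
print(policy)  # [1, 0]: switch from state 0, stay in state 1
```

In this toy problem both states converge to a value close to 2, since the agent can collect a reward of 1 at every step and 1 / (1 - 0.5) = 2.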
Examples
None
See Also
Iter_Policy_Improv