RLtoolbox Function
TD
Calling Sequence
- Q=TD_R_learning(NbIterations,Actionstates,NbStates,NbActions,Alpha,Beta,Epsilon)
Parameters
- NbIterations
: Number of episodes to simulate
- Actionstates
: Matrix defining the possible actions in each state
- NbStates
: Number of states in the environment
- NbActions
: Number of possible actions
- Alpha
: Step-size (convergence) parameter of the learning updates
- Beta
: Step-size (convergence) parameter of the learning updates
- Epsilon
: Exploration parameter of the epsilon-greedy policy
- Q
: The action-value function
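As an illustration of how Actionstates and Epsilon might interact, here is a minimal Python sketch of epsilon-greedy selection restricted to the actions allowed in a state. It is not the toolbox's code. The function name epsilon_greedy, the 0/1 layout of the allowed-action row, and the reading of Epsilon as the probability of a random choice are all assumptions made for this example.

```python
import random

def epsilon_greedy(q_row, allowed, epsilon, rng=random):
    """Pick an action index for a single state (hypothetical helper).

    q_row   : list of Q-values, one per action
    allowed : list of 0/1 flags, 1 where the action is possible
              (assumed layout of one row of Actionstates)
    epsilon : assumed to be the probability of a random allowed action
    """
    # Keep only the actions that are possible in this state.
    candidates = [a for a, ok in enumerate(allowed) if ok]
    if not candidates:
        raise ValueError("no action is allowed in this state")
    # Explore with probability epsilon, otherwise act greedily.
    if rng.random() < epsilon:
        return rng.choice(candidates)
    return max(candidates, key=lambda a: q_row[a])
```

With epsilon set to 0 the choice is purely greedy among the allowed actions; with epsilon set to 1 it is uniformly random among them.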
Description
Computes the action-value function Q with R-learning, simulating NbIterations episodes.
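For orientation, one common textbook formulation of a single R-learning update can be sketched in Python as follows. This is a sketch under assumptions, not the toolbox's implementation. The name r_learning_step and the average-reward estimate rho are introduced here for illustration, and pairing Alpha with the Q update and Beta with the rho update is an assumption, since this page does not say which step size drives which update. For brevity the maximum is taken over all actions, ignoring Actionstates.

```python
def r_learning_step(Q, rho, s, a, r, s_next, alpha, beta):
    """Apply one R-learning update to Q in place and return the new rho.

    Q      : list of lists, indexed as Q[state][action]
    rho    : current estimate of the average reward per step
    s, a   : state visited and action taken
    r      : reward received
    s_next : state reached
    alpha  : step size for the Q update (assumed pairing)
    beta   : step size for the rho update (assumed pairing)
    """
    best_next = max(Q[s_next])
    # Q moves toward the reward relative to the average, plus the next value.
    Q[s][a] += alpha * (r - rho + best_next - Q[s][a])
    # rho is adjusted only when the action taken is greedy in s.
    best_here = max(Q[s])
    if Q[s][a] == best_here:
        rho += beta * (r - rho + best_next - best_here)
    return rho
```

A full run would repeat this step along each simulated episode, choosing actions with an epsilon-greedy policy.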
Examples
None
See Also
TD prediction (policy Evaluation)
Sarsa: On-Policy TD Control
Q-learning: Off-Policy TD Control