RLtoolbox Function
TD
Calling Sequence
- Q=TD_Q_learning(NbEpisodes,Actionstates,NbStates,NbActions,Alpha,Gamma,Epsilon)
Parameters
- NbEpisodes
: Number of episodes to simulate
- Actionstates
: Matrix defining the possible actions in each state
- NbStates
: Number of states in the environment
- NbActions
: Number of possible actions
- Alpha
: Convergence parameter (learning rate, i.e. the step size of each update)
- Gamma
: Discount factor influencing "the memory" of the agent. 0 < Gamma < 1
- Epsilon
: Exploration parameter, presumably the probability of choosing a random action (epsilon-greedy policy)
- Q
: The action-value function
Description
Compute the Q values (action-value function) by Q-learning over NbEpisodes episodes
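The following is a minimal Python sketch of tabular Q-learning, given only to illustrate the kind of computation this function performs. It is not the toolbox's implementation. Several points are assumptions made for the illustration: Actionstates is taken to be an NbStates x NbActions 0/1 matrix, the environment is supplied through a hypothetical step(state, action) callback returning (next_state, reward, done), exploration is epsilon-greedy, and the start_state, seed and max_steps arguments do not exist in the calling sequence above.

```python
import random


def td_q_learning(nb_episodes, action_states, nb_states, nb_actions,
                  alpha, gamma, epsilon, step,
                  start_state=0, seed=0, max_steps=1000):
    """Tabular Q-learning sketch (illustrative, not the toolbox code).

    `step` is a hypothetical environment callback:
    step(state, action) -> (next_state, reward, done).
    """
    rng = random.Random(seed)
    # Q table initialised to zero: one row per state, one column per action.
    q = [[0.0] * nb_actions for _ in range(nb_states)]

    def allowed_actions(state):
        # Assumption: action_states[state][a] is non-zero when a is allowed.
        return [a for a in range(nb_actions) if action_states[state][a]]

    for _ in range(nb_episodes):
        s = start_state
        for _ in range(max_steps):
            allowed = allowed_actions(s)
            if not allowed:
                break
            # Epsilon-greedy choice among the allowed actions.
            if rng.random() < epsilon:
                a = rng.choice(allowed)
            else:
                a = max(allowed, key=lambda act: q[s][act])
            s_next, reward, done = step(s, a)
            # Off-policy target: best allowed action value in the next state.
            next_allowed = allowed_actions(s_next)
            if done or not next_allowed:
                best_next = 0.0
            else:
                best_next = max(q[s_next][b] for b in next_allowed)
            q[s][a] += alpha * (reward + gamma * best_next - q[s][a])
            if done:
                break
            s = s_next
    return q


# Hypothetical 4-state chain: action 0 moves left, action 1 moves right.
# Entering state 3 ends the episode with reward 1.
chain_actions = [[0, 1], [1, 1], [1, 1], [0, 0]]


def chain_step(state, action):
    next_state = state + 1 if action == 1 else state - 1
    done = next_state == 3
    return next_state, (1.0 if done else 0.0), done


q = td_q_learning(500, chain_actions, 4, 2,
                  alpha=0.5, gamma=0.9, epsilon=0.2, step=chain_step)
```

In this toy chain the values of moving right should approach 1, Gamma and Gamma squared for states 2, 1 and 0 respectively, which shows how Gamma discounts rewards that lie further in the future.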
Examples
None
See Also
TD prediction (policy Evaluation)
Sarsa: On-Policy TD Control
R-learning: for Undiscounted Continual Tasks