RLtoolbox Function
TD
Calling Sequence
- [V,T]=TD_nonbatch_prediction(NbEpisodes,NbStates,NbActions,Alpha,Gamma)
Parameters
- NbEpisodes
: Number of episodes to simulate
- NbStates
: Number of states in the environment
- NbActions
: Number of possible actions
- Alpha
: Learning rate (step size), which controls how quickly the estimates converge
- Gamma
: Discount factor, influencing "the memory" of the agent. 0 < Gamma < 1
- V
: The state-value function
- T
: Some V values stored for display
Description
Computes the state-value function V by non-batch temporal-difference (TD) prediction over NbEpisodes episodes.
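As an illustration of what a non-batch TD prediction of V can look like, here is a minimal Python sketch of tabular TD(0). It is not the toolbox's implementation: the environment (a 1-D random walk with a reward of 1 on leaving by the right end), the function name td0_prediction, the seed argument, and the choice to store one snapshot of V per episode in T are all assumptions made for the example. NbActions is omitted because the assumed policy simply steps left or right at random.

```python
import random

def td0_prediction(nb_episodes, nb_states, alpha, gamma, seed=0):
    """Tabular TD(0) prediction on a hypothetical 1-D random walk."""
    rng = random.Random(seed)
    V = [0.0] * nb_states      # state-value estimates, one per state
    T = []                     # snapshot of V after each episode (assumption)
    for _ in range(nb_episodes):
        s = nb_states // 2     # every episode starts in the middle state
        while True:
            s_next = s + rng.choice((-1, 1))   # random policy: left or right
            if s_next < 0:                     # left exit: terminal, reward 0
                r, v_next, done = 0.0, 0.0, True
            elif s_next >= nb_states:          # right exit: terminal, reward 1
                r, v_next, done = 1.0, 0.0, True
            else:                              # ordinary transition, reward 0
                r, v_next, done = 0.0, V[s_next], False
            # TD(0) update, applied after every step (non-batch)
            V[s] += alpha * (r + gamma * v_next - V[s])
            if done:
                break
            s = s_next
        T.append(list(V))
    return V, T

V, T = td0_prediction(nb_episodes=200, nb_states=5, alpha=0.1, gamma=0.9)
```

In this sketch alpha sets how far each estimate moves toward its TD target, and gamma sets how strongly later rewards count toward the value of earlier states.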
Examples
None
See Also
Sarsa: On-Policy TD Control
Q-learning: Off-Policy TD Control
R-learning: for Undiscounted Continual Tasks