RLtoolbox Function

TD_Sarsa - Sarsa: On-Policy TD Control

Calling Sequence

Q=TD_Sarsa(NbEpisodes,Actionstates,NbStates,NbActions,Alpha,Gamma,Epsilon)

Parameters

Description

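Computes the action-value estimates Q with the Sarsa algorithm (on-policy temporal-difference control), learning over NbEpisodes episodes. Q is returned after the last episode.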
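
The general Sarsa update, Q(s,a) <- Q(s,a) + Alpha*[r + Gamma*Q(s',a') - Q(s,a)], can be sketched as follows. This is a minimal Python illustration of the textbook algorithm, not the toolbox's own code. It assumes that Alpha is the step size, Gamma the discount factor, and Epsilon the exploration rate of an epsilon-greedy policy. The `step` callback, the `start_state`, `max_steps` and `seed` arguments, and the toy chain environment are all hypothetical and do not correspond to the Actionstates argument, whose format is not documented here.

```python
import random


def epsilon_greedy(q_row, epsilon, rng):
    """Pick a random action with probability epsilon, else a greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    best = max(q_row)
    # Break ties randomly among the greedy actions.
    return rng.choice([a for a, v in enumerate(q_row) if v == best])


def td_sarsa(nb_episodes, step, nb_states, nb_actions, alpha, gamma, epsilon,
             start_state=0, max_steps=1000, seed=0):
    """Tabular Sarsa.

    `step(s, a)` is a hypothetical environment callback returning
    (next_state, reward, done). It is an assumption of this sketch,
    not the toolbox's Actionstates argument.
    """
    rng = random.Random(seed)
    q = [[0.0] * nb_actions for _ in range(nb_states)]
    for _ in range(nb_episodes):
        s = start_state
        a = epsilon_greedy(q[s], epsilon, rng)
        for _ in range(max_steps):
            s2, r, done = step(s, a)
            a2 = epsilon_greedy(q[s2], epsilon, rng)
            # On-policy target: uses the action actually selected in s2.
            target = r if done else r + gamma * q[s2][a2]
            q[s][a] += alpha * (target - q[s][a])
            s, a = s2, a2
            if done:
                break
    return q


# Toy 5-state chain (hypothetical): action 1 moves right, action 0 moves left.
# Reaching state 4 ends the episode with reward 1.
def chain_step(s, a):
    s2 = min(s + 1, 4) if a == 1 else max(s - 1, 0)
    done = s2 == 4
    return s2, (1.0 if done else 0.0), done


Q = td_sarsa(500, chain_step, 5, 2, alpha=0.1, gamma=0.9, epsilon=0.1)
```

In this sketch Q is a list of NbStates rows with NbActions entries each. Because the target uses the action the epsilon-greedy policy actually selects in the next state, the learned values reflect the exploring policy itself, which is what distinguishes Sarsa from Q-learning.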

Examples

See Also

TD Prediction (Policy Evaluation)
Q-learning: Off-Policy TD Control
R-learning: for Undiscounted Continual Tasks