Chapter 9: On-policy Prediction with Approximation From this chapter, we...
Chapter 7: n-step Bootstrapping n-step TD methods span a spectrum with M...
Chapter 6: Temporal-Difference Learning Temporal-difference (TD) learnin...
Chapter 5: Monte Carlo Methods Monte Carlo (MC) methods are learning met...
Chapter 4: Dynamic Programming Dynamic programming computes optimal poli...
Chapter 3: Finite Markov Decision Processes Basic Definitions MDP is the...
Chapter 2: Multi-armed Bandits Multi-armed bandits can be seen as the si...
Pointer Networks Oriol Vinyals, Meire Fortunato, Navdeep Jaitly Google, B...
Neural Computation of Decisions in Optimization Problems J. J. Hopfield,...