Q-learning

Q-learning is a model-free reinforcement learning algorithm that learns the value of each action an agent can take in a given state. It does not require a model of the environment (hence "model-free"), and it can handle problems with stochastic transitions and rewards without requiring adaptations.[1]
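The update rule itself is not spelled out above, so the following is a minimal sketch of the standard tabular Q-learning (temporal-difference) update under assumed settings; the state and action counts and the hyperparameters `n_states`, `n_actions`, `alpha`, and `gamma` are illustrative, not taken from this article.

```python
import numpy as np

n_states, n_actions = 6, 4
alpha, gamma = 0.1, 0.99  # learning rate and discount factor (assumed values)

# Q-table: estimated value of taking each action in each state
Q = np.zeros((n_states, n_actions))

def q_update(state, action, reward, next_state, done):
    """One temporal-difference update of the Q-table from an observed transition."""
    # The target bootstraps from the best action in the next state.
    # No transition probabilities are needed (model-free): only sampled
    # (state, action, reward, next_state) tuples are used.
    target = reward if done else reward + gamma * np.max(Q[next_state])
    Q[state, action] += alpha * (target - Q[state, action])
```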

For any finite Markov decision process, Q-learning finds an optimal policy in the sense of maximizing the expected value of the total reward over all successive steps, starting from the current state.[2] Q-learning can identify an optimal action-selection policy for any given finite Markov decision process, given infinite exploration time and a partly random policy.[2] "Q" refers to the function that the algorithm computes: the expected reward—that is, the quality—of an action taken in a given state.[3]
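As a sketch of the "partly random" policy mentioned above, one common choice (an assumption here, since the article does not name a specific scheme) is epsilon-greedy exploration, together with reading off the greedy policy from the learned Q-table:

```python
import numpy as np

rng = np.random.default_rng(0)
epsilon = 0.1  # probability of taking a random exploratory action (assumed value)

def select_action(Q, state):
    """Epsilon-greedy behaviour policy: usually exploit, occasionally explore."""
    if rng.random() < epsilon:
        return int(rng.integers(Q.shape[1]))  # random action (exploration)
    return int(np.argmax(Q[state]))           # highest-Q action (exploitation)

def greedy_policy(Q):
    """The learned policy: in each state, choose the action with the highest Q-value."""
    return np.argmax(Q, axis=1)
```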

  1. Li, Shengbo (2023). Reinforcement Learning for Sequential Decision and Optimal Control (First ed.). Singapore: Springer Verlag. pp. 1–460. doi:10.1007/978-981-19-7784-8. ISBN 978-9-811-97783-1. S2CID 257928563.
  2. Melo, Francisco S. "Convergence of Q-learning: a simple proof" (PDF).
  3. Matiisen, Tambet (December 19, 2015). "Demystifying Deep Reinforcement Learning". neuro.cs.ut.ee. Computational Neuroscience Lab. Retrieved 2018-04-06.
