Counter examples for compact action Markov decision chains with average reward criteria


Article
pp 357-368.

In this note we present two examples of compact-action finite-state Markov decision chains in which a policy improvement procedure yields wrong or limited results. In the first example, which exhibits a multichain structure, there is no convergence of the average rewards of the successive policies to the maximal value. In the second example, which has a unichain structure, the lack of uniqueness of maximizing policies in each step of the algorithm means that there is no convergence of either bias vectors or maximizing policies. Accordingly, no solution to the average optimality equations can be obtained.
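
For orientation, the unichain average optimality equation and one policy improvement step can be written as follows. This is a sketch in standard textbook notation, not the notation of the note itself: the symbols S (finite state space), A(i) (compact action set in state i), r(i,a) (one-step reward), p(j | i,a) (transition probability), g (average reward or gain), h (bias vector) and f_n (the n-th stationary policy) are all assumptions introduced here, and the maxima are assumed to be attained.

```latex
% Average optimality equation (unichain case; notation assumed, not taken from the note)
g + h(i) = \max_{a \in A(i)} \Bigl\{ r(i,a) + \sum_{j \in S} p(j \mid i,a)\, h(j) \Bigr\},
\qquad i \in S.

% Policy evaluation for the current policy f_n:
% (g_n, h_n) solves the linear system below, with h_n fixed only up to an additive constant
g_n + h_n(i) = r\bigl(i, f_n(i)\bigr) + \sum_{j \in S} p\bigl(j \mid i, f_n(i)\bigr)\, h_n(j),
\qquad i \in S.

% Policy improvement: any maximizer may be chosen when it is not unique
f_{n+1}(i) \in \operatorname*{arg\,max}_{a \in A(i)}
\Bigl\{ r(i,a) + \sum_{j \in S} p(j \mid i,a)\, h_n(j) \Bigr\},
\qquad i \in S.
```

In this notation, the second example concerns the case where the arg max in the improvement step is not a singleton, so the sequences f_n and h_n need not converge.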


