Recurrence conditions for average and Blackwell optimality in denumerable state Markov decision chains.
In a previous paper, Dekker and Hordijk (1988) presented an operator-theoretical approach for multichain Markov decision processes with a countable state space, compact action sets, and unbounded rewards. Conditions were presented guaranteeing the existence of a Laurent series expansion for the discounted rewards, the existence of average and Blackwell optimal policies, and the existence of solutions to the average and Blackwell optimality equations. While those assumptions were operator oriented and formulated as conditions on the deviation matrix, we show in this paper that the same approach can also be carried out under recurrence conditions. These new conditions are generally easier to check and are especially suited for applications to queueing models.
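For orientation, the Laurent series expansion referred to in the abstract has the following standard form in the sensitive-optimality literature (a sketch, not the paper's own statement): writing the discount factor as β and the interest rate as ρ = (1 − β)/β, the β-discounted reward vector of a policy with transition matrix P, stationary matrix P* and deviation matrix D admits, for ρ near 0,

```latex
v_\beta \;=\; \sum_{n=-1}^{\infty} \rho^{\,n}\, y_n,
\qquad
y_{-1} = P^{*} r,\quad
y_{0} = D r,\quad
y_{n} = (-1)^{n} D^{\,n+1} r \;\; (n \ge 1),
```

where r is the immediate-reward vector. The term y₋₁ is the average (gain) and y₀ the bias; the paper's contribution is establishing the validity of this expansion, under recurrence conditions rather than direct conditions on D, for countable state spaces with unbounded r.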
Keywords: Denumerable Markov decision chains, Laurent series expansion, average optimality, multi-chain model, optimality equation, recurrence conditions, sensitive optimality criteria, unbounded immediate rewards
Dekker, R., & Hordijk, A. (1992). Recurrence conditions for average and Blackwell optimality in denumerable state Markov decision chains. Mathematics of Operations Research, 271–289. Retrieved from http://hdl.handle.net/1765/2252