Counter examples for compact action Markov decision chains with average reward criteria

Dekker, Rommert

doi:10.1080/15326348708807061

In this note we present two examples of compact-action finite-state Markov decision chains in which a policy improvement procedure yields wrong or limited results. In the first example, which exhibits a multichain structure, there is no convergence of the average rewards of the successive policies to the maximal value. In the second example, which has a unichain structure, the lack of uniqueness of maximizing policies in each step of the algorithm means that there is no convergence of either bias vectors or maximizing policies. Accordingly, no solution to the average optimality equations can be obtained.

Additional Metadata
Keywords	Markov Decision Chains, stochastic models
Persistent URL	doi.org/10.1080/15326348708807061, hdl.handle.net/1765/2250
Journal	Stochastic Models
Organisation	Erasmus School of Economics
Citation	Dekker, R. (1987). Counter examples for compact action Markov decision chains with average reward criteria. Stochastic Models, 357–368. doi:10.1080/15326348708807061

