Abstract
In this paper we investigate denumerable state semi-Markov decision chains with small interest rates. We consider average and Blackwell optimality and allow for multiple closed sets and unbounded immediate rewards. Our analysis uses the existence of a Laurent series expansion for the total discounted rewards and the continuity of its terms. The assumptions are expressed in terms of a weighted supremum norm. Our method is based on an algebraic treatment of Laurent series; it constructs an appropriate linear space with a lexicographic ordering. Using two operators and a positiveness property we establish the existence of bounded solutions to optimality equations. The theory is illustrated with an example of aK-dimensional queueing system. This paper is strongly based on the work of Denardo [11] and Dekker and Hordijk [7].
Similar content being viewed by others
References
J.A. Bather, Optimal decision procedures for finite Markov chains, part I: Examples, Adv. Appl. Prob. 5 (1973) 328–339; part II: Communicating systems, Adv. Appl. Prob. 5 (1973) 521–540; part III: General convex systems. Adv. Appl. Prob. 5 (1973) 541–553.
D. Blackwell, Discrete dynamic programming, Ann. Math. Statist. 33 (1962) 719–726.
D. Blackwell, Discounted dynamic programming, Ann. Math. Statist. 36 (1965) 226–235.
K.L. Chung,Markov Chains with Stationary Transition Probabilities (Springer, Berlin, 1960).
E. Cinlar,Introduction to Stochastic Processes (Prentice-Hall, Englewood Cliffs, NJ, 1975).
J.S. De Cani, A dynamic programming algorithm for embedded Markov chains when the planning horizon is infinitely, Mgmt. Sci. 10 (1964) 716–733.
R. Dekker and A. Hordijk, Average, sensitive and Blackwell optimal policies in denumerable Markov decision chains with unbounded rewards, Math. Oper. Res. 13 (1988) 395–421.
R. Dekker and A. Hordijk, Recurrence conditions for average and Blackwell optimality in denumerable state Markov decision chains, Technical report, Dept. of Math, and Comp. Sci, Univ. of Leiden (1989) to appear in Math. Oper. Res.
R. Dekker, Denumerable Markov decision chains: optimal policies for small interest rates, Ph.D. thesis, Univ. of Leiden (1985).
E.V. Denardo and B.L. Miller, An optimality condition for discrete dynamic programming with no discounting, Ann. Math. Statist. 39 (1968) 1220–1227.
E.V. Denardo, Markov renewal programming with small interest rates, Ann. Math. Statist. 42 (1971) 477–496.
H. Deppe, On the existence of average optimal policies in semiregenerative decision models, Math. Oper. Res. 9 (1984) 558–575.
C. Derman, Denumerable state Markovian decision processes — average cost criterion, Ann. Math. Statist. 42 (1966) 1545–1553.
A. Federgruen, A. Hordijk and H.C. Tijms, Denumerable state semi-Markov decision processes with unbounded costs, average cost criterion, Stoch. Proc. Appl. 9 (1979) 223–235.
A. Federgruen, P.J. Schweitzer and H.C. Tijms, Denumerable undiscounted semi-Markov decision processes with unbounded rewards, Math. Oper. Res. 8 (1983) 298–313.
A. Federgruen and H.C. Tijms, The optimality equation in average cost denumerable state semi-Markov decision problems, recurrency conditions and algorithms, J. Appl. Prob. 15 (1978) 356–373.
J.M. Harrison, Discrete dynamic programming with unbounded rewards, Ann. Math. Statist. 43 (1972) 636–644.
A. Hordijk,Dynamic Programming and Markov Potential Theory, Mathematical Centre Tract no. 51, Amsterdam (1974).
A. Hordijk and K. Sladky, Sensitive optimality criteria in countable state dynamic programming, Math. Oper. Res. 2 (1977) 1–14.
R.A. Howard,Semi-Markovian Decision Processes, Proc. Int. Statist. Inst., Ottawa, Canada (1963).
W.S. Jewell, Markov-renewal programming I: formulation, finite return models; Markov-renewal programming II: infinite return models, example, Oper. Res. 11 (1963) 938–971.
G.P. Klimov, Time-sharing service systems I, Th. Prob. Appl. 19 (1974) 532–551.
J.B. Lasserre, Conditions for existence of average and Blackwell optimal stationary policies in denumerable Markov decision processes, J. Math. Anal. Appl. 136 (1988) 479–490.
E. Mann, Optimality equations and sensitive optimality in bounded Markov decision processes, Optimization 16 (1985) 767–781.
B.L. Miller, Finite state continuous time Markov decision processes with infinite planning horizon, J. Math. Anal. Appl. 22 (1968) 552–569.
B.L. Miller and A.F. Veinott Jr., Discrete dynamic programming with a small interest rate, Ann. Math. Statist. 40 (1969) 366–370.
S.M. Ross, Non-discounted denumerable Markovian decision models, Ann. Math. Statist. 39 (1968) 412–423.
S.M. Ross,Applied Probability Models with Optimization Applications (Holden-Day, San Fransisco, 1970).
H.L. Royden,Real Analysis (MacMillan, New-York, 2nd ed. 1968).
M. SchÄl, Conditions for optimality in dynamic programming and for the limit ofn-stage optimal policies to be optimal, Z. Wahr. verw. Gebiete 32 (1975) 179–196.
M. SchÄl, On the second optimality equation for semi-Markov decision models, Technical report. Inst. Angew. Math., Univ. Bonn (1989) submitted for publication.
P.J. Schweitzer, Perturbation theory and Markovian decision processes, Ph.D. dissertation, Mass. Inst. of Techn. (1965).
L.J. Sennott, Average cost optimal stationary policies in infinite state Markov decision processes with unbounded costs, Oper. Res. 37 (1989) 626–633.
L.J. Sennott, Average cost semi-Markov decision processes and the control of queueing systems. Prob. Eng. Inform. Sci. 2 (1989) 247–272.
F.M. Spieksma, The existence of sensitive optimal policies in two multi-dimensional queueing models, this volume, pp. 273–296.
Sh. Stidham, Jr. and R.R. Weber, Monotonic and insensitive optimal policies for control of queues with undiscounted costs, Oper. Res. 87 (1989) 611–625.
J.A.E.E. van Nunen,Contracting Markov Decision Processes, Mathematical Centre Tract no. 71, Amsterdam (1976).
J.A.E.E. van Nunen and J. Wessels, Markov decision processes with unbounded rewards, in:Markov Decision Theory, eds. H.C. Tijms and J. Wessels, Mathematical Centre Tract no. 93, Amsterdam (1976).
R.R. Weber and Sh. Stidham, Jr., Optimal control of service rates in networks of queues, Adv. Appl. Prob. 19 (1987) 202–218.
A.F. Veinott Jr., On discrete dynamic programming with sensitive discount optimality criteria, Ann. Math. Stat. 40 (1969) 1635–1660.
W.H.M. Zijm, The optimality equations in multichain denumerable state Markov decision processes with the average cost criterion: the bounded cost case, Stat. Decisions 3 (1985) 143–165.
W.H.M. Zijm, The optimality equations in multichain denumerable state Markov decision processes with the average cost criterion: the unbounded cost case, C.Q.M. Note 22, Centre for Quant. Methods, Philips B.V. Eindhoven (1984).
Author information
Authors and Affiliations
Additional information
This research has partially been sponsored by the Netherlands Organization for Scientific Research (NWO).
Rights and permissions
About this article
Cite this article
Dekker, R., Hordijk, A. Denumerable semi-Markov decision chains with small interest rates. Ann Oper Res 28, 185–211 (1991). https://doi.org/10.1007/BF02055581
Issue Date:
DOI: https://doi.org/10.1007/BF02055581