Résumé | Markov Decision Processes (MDP) are a widely used model including both
non-deterministic and probabilistic choices. Minimal and maximal
probabilities to reach a target set of states, with respect to a
policy resolving non-determinism, may be computed by several methods
including value iteration. This algorithm, easy to implement and
efficient in terms of space complexity, consists in iteratively
finding the probabilities of paths of increasing length. However, it
raises three issues: (1) defining a stopping criterion ensuring a
bound on the approximation, (2) analyzing the rate of convergence, and
(3) specifying an additional procedure to obtain the exact values once
a sufficient number of iterations has been performed. The first two
issues are still open and for the third one a « crude » upper bound on
the number of iterations has been proposed. Based on a graph analysis
and transformation of MDPs, we address these problems. First we
introduce an interval iteration algorithm, for which the stopping
criterion is straightforward. Then we exhibit convergence
rate. Finally we significantly improve the bound on the number of
iterations required to get the exact values. |