Colloquium Talk – Friday, Feb 21 at 2:00 PM
Dr. Masayuki Horiguchi from Kanagawa University, Japan, invited by Dr. Isaac Sonin, will be giving a colloquium talk on Friday, February 21, at 2:00 PM in Fretwell 116.
The title and abstract of the talk are provided below.
Title: Interval Bayesian Markov decision processes: the average optimality criteria
Speaker: Masayuki HORIGUCHI
(Kanagawa University, Faculty of Science, Department of Mathematics, Yokohama, Japan)
Abstract: We treat sequential decision models with a probabilistic state-action transition law, called Markov decision processes (MDPs). In analyzing and solving practical problems with MDPs (cf. Blackwell 1962, Hinderer 1970, Howard 1960, Puterman 1994), we often encounter the case where the transition probabilities of the states vary within some domain at each time and this variation is unknown or unobservable to the decision maker. To treat such a case, Kurano et al. (1998) introduced a decision model, called controlled Markov set-chains, based on Markov set-chains (Hartfiel 1998). In this model, the true transition laws for each state-action pair are expressed as a set of interval transition matrices. MDPs with unknown transition matrices are also called uncertain MDPs.
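As a brief illustration (the notation here is ours, not necessarily that of the talk), an interval transition matrix determines the set of stochastic matrices lying componentwise between given lower and upper bounds \underline{P} = (\underline{p}_{ij}) and \overline{P} = (\overline{p}_{ij}):

\[
\mathcal{P} = \Bigl\{\, P = (p_{ij}) : \underline{p}_{ij} \le p_{ij} \le \overline{p}_{ij} \ \text{for all } i, j, \quad \textstyle\sum_{j} p_{ij} = 1 \ \text{for all } i \,\Bigr\}.
\]

A Markov set-chain then allows a possibly different P in \mathcal{P} to act at each time step, so the reachable state distributions form a set rather than a single trajectory.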
In this talk, we give an introductory guide to our uncertain MDPs, as follows:
We formulate uncertain MDPs using Bayesian inference with intervals of prior measures (DeRobertis and Hartigan 1981); the true transition matrix is estimated by an interval matrix induced from state-action observations and Bayesian updating. The uncertain MDP under this Bayesian approach is defined as a controlled Markov set-chain model, and we consider the case of average optimality criteria (cf. Kurano et al. 1999). The interval representation of the transition law, inferred from an observed data set of state-action pairs, has a 'robust' property in the sense that it is more flexible than a single point estimate, and it is useful as a 'rough approximation' of unknown transition matrices. A brief sketch of our optimization approach is as follows: (i) we give a brief exposition of the characterization of Pareto optimal policies, (ii) we develop asymptotic bounds on the value function v_t(f) with stationary distributions on periodic Markov chains, (iii) we deal with the case of Π_1-Pareto optimality, and (iv) we develop interval Bayesian Markov decision processes under the expected average criterion. We also give numerical examples related to stochastic optimization problems on passage times and traffic assignment.
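As a rough sketch of the expected average criterion mentioned above (the precise formulation in the talk may differ), for a stationary policy f and reward function r one considers

\[
\psi(f) = \liminf_{T \to \infty} \frac{1}{T}\, E_f\!\Bigl[ \sum_{t=0}^{T-1} r(X_t, f(X_t)) \Bigr],
\]

where the expectation is taken under transition matrices ranging over the interval set; this yields lower and upper average values for each policy and hence a Pareto-type (partial) ordering of policies rather than a single scalar criterion.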