Thu, Aug 28 |
Organizational issues. Course outline.
Stochastic bandits.
|
Syllabus
Stochastic bandits
|
Tue, Sep 2 |
Regret lower bounds for stochastic bandits.
Lecture will stop at 2:30, to allow the class to attend
Tze Leung Lai's talk (2:30-3:30, 3113 Etcheverry Hall).
|
Lower bounds
Tze Leung Lai talk details
|
Thu, Sep 4 |
Regret lower bounds.
Robbins' strategy.
|
Robbins' strategy
See also:
(Robbins, 1952).
|
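As a concrete reference point for this lecture, here is a minimal Python sketch in the spirit of Robbins (1952): play the empirically better arm, except at a sparse schedule of forced rounds where each arm is sampled regardless of its record. The two-armed Bernoulli setup, the squares-based forcing schedule, and the name `robbins_two_armed` are illustrative choices, not from the lecture notes.

```python
import random

def robbins_two_armed(arms, horizon, rng=random):
    """Certainty-equivalence play with sparse forced sampling:
    arm 0 is forced at square times, arm 1 just after; otherwise
    play the arm with the higher empirical mean."""
    counts = [0, 0]
    means = [0.0, 0.0]
    roots = range(1, int(horizon ** 0.5) + 1)
    forced = {n * n: 0 for n in roots}          # force arm 0 at t = 1, 4, 9, ...
    forced.update({n * n + 1: 1 for n in roots})  # force arm 1 at t = 2, 5, 10, ...
    for t in range(1, horizon + 1):
        if t in forced:
            arm = forced[t]
        elif counts[0] == 0 or counts[1] == 0:
            arm = 0 if counts[0] == 0 else 1
        else:
            arm = 0 if means[0] >= means[1] else 1
        reward = 1.0 if rng.random() < arms[arm] else 0.0
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]
    return counts
```

The forced rounds guarantee each arm is sampled infinitely often (about sqrt(T) times by horizon T), which is enough for the greedy choice to lock onto the better arm eventually.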
Tue, Sep 9 |
Regret upper bounds:
Concentration inequalities.
UCB.
|
UCB notes
See also: Section 2.2 of
(Bubeck and Cesa-Bianchi, 2012);
(Agrawal, 1995);
(Auer, Cesa-Bianchi and Fischer, 2002);
Sections 3 and 4 of
(Lai and Robbins, 1985).
|
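A minimal Python sketch of the UCB1 index policy of Auer, Cesa-Bianchi and Fischer (2002), with the confidence radius sqrt(2 ln t / n_i) from the original paper. The Bernoulli reward model and the name `ucb1` are illustrative choices.

```python
import math
import random

def ucb1(arms, horizon, rng=random):
    """Run UCB1 on a list of Bernoulli arms (success probabilities).

    Returns the empirical means and pull counts after `horizon` rounds.
    """
    k = len(arms)
    counts = [0] * k     # times each arm was pulled
    means = [0.0] * k    # empirical mean reward of each arm

    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # pull each arm once to initialize
        else:
            # index = empirical mean + confidence radius sqrt(2 ln t / n_i)
            arm = max(range(k),
                      key=lambda i: means[i] + math.sqrt(2 * math.log(t) / counts[i]))
        reward = 1.0 if rng.random() < arms[arm] else 0.0
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]
    return means, counts
```

With a moderate horizon and well-separated arms, most pulls concentrate on the best arm while the confidence radius keeps every arm sampled at a logarithmic rate.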
Thu, Sep 11 |
KL-UCB.
|
KL-UCB notes
See also:
(Cappé, Garivier, Maillard, Munos, Stoltz, 2013).
|
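The KL-UCB index of Cappé et al. (2013) replaces the Hoeffding radius with a Kullback-Leibler upper confidence bound. A sketch for Bernoulli arms, computing the index by bisection; the simple exploration level log(t) (the paper also analyzes log(t) + c log log(t)) and the function names are illustrative.

```python
import math

def bern_kl(p, q, eps=1e-12):
    """KL divergence between Bernoulli(p) and Bernoulli(q)."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def kl_ucb_index(p_hat, n, t, iters=50):
    """Largest q >= p_hat with n * kl(p_hat, q) <= log(t), by bisection.

    p_hat: empirical mean of the arm, n: number of pulls, t: current round.
    Bisection works because kl(p_hat, q) is increasing in q for q >= p_hat.
    """
    level = math.log(t) / n
    lo, hi = p_hat, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if bern_kl(p_hat, mid) <= level:
            lo = mid
        else:
            hi = mid
    return lo
```

The index shrinks toward the empirical mean as the pull count grows, and fewer samples give a wider bound.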
Tue, Sep 16 |
More KL-UCB.
|
Updated KL-UCB notes
|
Thu, Sep 18 |
Gittins index.
|
Gittins notes
See also:
(Weber, 1992).
|
Tue, Sep 23 |
Thompson sampling.
|
Thompson notes
See also:
(Agrawal and Goyal, 2012).
|
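A minimal Python sketch of Beta-Bernoulli Thompson sampling, the setting analyzed by Agrawal and Goyal (2012): maintain a Beta posterior per arm, draw one sample from each posterior, and play the argmax. The uniform Beta(1, 1) prior and the name `thompson` are illustrative choices.

```python
import random

def thompson(arms, horizon, rng=random):
    """Beta-Bernoulli Thompson sampling: keep a Beta(s+1, f+1) posterior
    per arm, sample one draw from each posterior, play the argmax."""
    k = len(arms)
    succ = [0] * k  # observed successes per arm
    fail = [0] * k  # observed failures per arm
    for _ in range(horizon):
        samples = [rng.betavariate(succ[i] + 1, fail[i] + 1) for i in range(k)]
        arm = max(range(k), key=lambda i: samples[i])
        if rng.random() < arms[arm]:
            succ[arm] += 1
        else:
            fail[arm] += 1
    return succ, fail
```

Posterior sampling randomizes exploration automatically: an under-sampled arm has a wide posterior, so its draw is occasionally the largest.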
Thu, Sep 25 |
Minimax regret.
|
Minimax notes
See also: Section 5 and Appendix A of
(Auer et al., 2002).
|
Tue, Sep 30 |
Adversarial bandits.
|
Adversarial bandits
See also: Section 3 of
(Auer et al., 2002).
|
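A compact Python sketch of Exp3 from Auer et al. (2002): exponential weights with gamma-uniform exploration and importance-weighted estimates for the played arm only. This is the loss-based variant (the paper is phrased in terms of gains); the interface `loss_fn(t, arm)` and the fixed gamma are illustrative choices.

```python
import math
import random

def exp3(loss_fn, k, horizon, gamma, rng=random):
    """Exp3 with exponential weights and gamma-uniform exploration.

    `loss_fn(t, arm)` returns the loss in [0, 1] of the arm played at round t;
    only the played arm's loss is observed (bandit feedback)."""
    weights = [1.0] * k
    chosen = []
    for t in range(horizon):
        total = sum(weights)
        probs = [(1 - gamma) * w / total + gamma / k for w in weights]
        # sample an arm from the categorical distribution probs
        r, arm, acc = rng.random(), 0, probs[0]
        while acc < r and arm < k - 1:
            arm += 1
            acc += probs[arm]
        loss = loss_fn(t, arm)
        # importance-weighted update: estimated loss is loss / probs[arm]
        weights[arm] *= math.exp(-gamma * loss / (probs[arm] * k))
        chosen.append(arm)
    return chosen
```

The gamma-uniform mixing keeps every probability at least gamma/k, which bounds the variance of the importance-weighted loss estimates.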
Thu, Oct 2 |
Partial information games.
|
Partial monitoring
See also:
(Piccolboni and Schindelhauer, 2001),
(Cesa-Bianchi et al., 2006),
(Bartók et al., 2011).
|
Tue, Oct 7 |
Contextual bandits.
|
Contextual bandits
See also:
(Woodroofe, 1979),
(Sarkar, 1991),
Section 7 of
(Auer et al., 2002).
|
Thu, Oct 9 |
Contextual bandits: infinite comparison classes; epsilon-covers.
|
Contextual bandits,
epsilon-covers
See also:
(Beygelzimer et al., 2011).
Discussion paper:
Kaufmann et al., 2012.
Bayes UCB: On Bayesian Upper Confidence Bounds for Bandit Problems.
|
Tue, Oct 14 |
Contextual bandits: reduction to classification.
|
Contextual bandits,
Reduction to classification
See also:
(Agarwal et al., 2014).
Discussion papers:
György et al., 2007. The on-line shortest
path problem under partial monitoring.
Agrawal and Goyal, 2012.
Thompson Sampling for Contextual Bandits with Linear Payoffs.
|
Thu, Oct 16 |
Linear bandits.
|
Linear bandits
See also:
(Awerbuch and Kleinberg, 2008),
(Dani et al., 2008).
Discussion paper:
Chapelle and Li, 2011.
An Empirical Evaluation of Thompson Sampling.
|
Tue, Oct 21 |
Linear bandits: exponential weights.
|
More linear bandits
See also:
(Cesa-Bianchi and Lugosi, 2009),
(Bubeck et al., 2012).
Discussion paper:
Bertsimas and Niño-Mora, 2000.
Restless bandits, linear programming relaxations, and a primal-dual index heuristic.
|
Thu, Oct 23 |
Linear bandits: lower bounds.
|
Still more linear bandits
See also:
(Dani et al., 2008).
Discussion papers:
Bartók, 2013.
A near-optimal algorithm for finite partial-monitoring games against adversarial opponents.
Korda et al., 2013.
Thompson Sampling for 1-Dimensional Exponential Family Bandits.
Audibert et al., 2009.
Exploration-exploitation trade-off using variance
estimates in multi-armed bandits.
|
Tue, Oct 28 |
Linear bandits: stochastic mirror descent.
|
Mirror descent
See also: Chapter 5 of
Bubeck and Cesa-Bianchi, 2012.
Discussion papers:
Gopalan et al., 2013.
Thompson Sampling for Complex Bandit Problems.
Audibert et al., 2012.
Regret in Online Combinatorial Optimization.
|
Thu, Oct 30 |
Linear bandits: stochastic mirror descent (continued).
|
See also:
Bubeck et al., 2012.
Discussion papers:
Russo and Van Roy, 2013.
Learning to Optimize Via Posterior Sampling.
Badanidiyuru et al., 2014.
Resourceful Contextual Bandits.
|
Tue, Nov 4 |
Markov decision processes.
|
MDPs
See also: Slides based on
Bertsekas, 2005.
Discussion paper:
Mannor and Shamir, 2011.
From Bandits to Experts: On the Value of Side-Observations.
|
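The dynamic-programming material can be anchored by a short value-iteration sketch for a finite discounted MDP, in the style of Bertsekas (2005); the tabular representation (`P[s][a][s']` transition probabilities, `R[s][a]` expected rewards) and the function name are illustrative choices.

```python
def value_iteration(P, R, gamma=0.9, tol=1e-8):
    """Value iteration for a finite discounted MDP.

    P[s][a][t] = probability of moving s -> t under action a,
    R[s][a]    = expected one-step reward.
    Iterates the Bellman optimality operator to a fixed point, then
    returns the value function and a greedy policy."""
    n_s, n_a = len(P), len(P[0])
    V = [0.0] * n_s
    while True:
        V_new = [max(R[s][a] + gamma * sum(P[s][a][t] * V[t] for t in range(n_s))
                     for a in range(n_a))
                 for s in range(n_s)]
        done = max(abs(V_new[s] - V[s]) for s in range(n_s)) < tol
        V = V_new
        if done:
            break
    policy = [max(range(n_a),
                  key=lambda a: R[s][a] + gamma * sum(P[s][a][t] * V[t]
                                                      for t in range(n_s)))
              for s in range(n_s)]
    return V, policy
```

Because the Bellman operator is a gamma-contraction in the sup norm, the stopping rule above leaves V within tol * gamma / (1 - gamma) of the optimal value function.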
Thu, Nov 6 |
More Markov decision processes.
|
More MDPs
Discussion papers:
Salomon and Audibert, 2011.
Deviations of stochastic bandit regret.
Slivkins, 2009.
Contextual Bandits with Similarity Information.
|
Tue, Nov 11 |
Veterans Day |
|
Thu, Nov 13 |
Approximate methods for MDPs.
|
Approximate methods
See also: Chapter 6 of
Bertsekas, 2012.
Discussion papers:
Hazan and Kale, 2011.
Better Algorithms for Benign Bandits.
Abernethy et al., 2012.
Interior-Point Methods for Full-Information and Bandit Online
Learning.
|
Tue, Nov 18 |
Guest lecture:
Nikos Vlassis.
On the Computational Complexity of Stochastic Controller Optimization
in POMDPs.
|
Discussion papers:
Filippi et al., 2010.
Parametric Bandits: The Generalized Linear Case.
Bubeck et al., 2013.
Bounded regret in stochastic multi-armed bandits.
|
Tue, Nov 25 |
Guest lecture:
Mohammad Ghavamzadeh.
Finite-Sample Analysis of Approximate DP Algorithms.
|
See also:
ICML2012 Tutorial:
Statistical Learning Theory in Reinforcement Learning and Approximate Dynamic Programming.
|
Thu, Nov 27 |
Thanksgiving |
|
Tue, Dec 2 |
Final project presentations: CS 194 students.
|
Max and Walid; Dylan; Ming and Yuxun; Michael and Jeff; Ke; James.
|
Thu, Dec 4 |
Final project presentations: Stat 260 students.
|
Animesh; Andres and Yonatan; Kieren; Soeren and Soumendu; Jiung and
Siyuan; Yannik; Auyon
and Birce.
|