CS 294 / Stat 260, Fall 2014:
Learning in Sequential Decision
Problems
Class Discussion Papers
Available Papers
This list will be updated through the semester.
If you'd like to present one of these papers, be the first to
email me (bartlett at cs) with the name of the paper.
-
(no longer available)
Deviations of stochastic bandit regret.
Antoine Salomon and Jean-Yves Audibert.
Proceedings of Algorithmic Learning Theory (ALT 2011), pp. 159-173.
2011.
-
(no longer available)
Exploration-exploitation trade-off using variance
estimates in multi-armed bandits.
J.-Y. Audibert, R. Munos, and C. Szepesvári.
Theoretical Computer Science, vol. 410, pp. 1876-1902.
2009.
-
(no longer available)
Bounded regret in stochastic multi-armed bandits.
Sébastien Bubeck, Vianney Perchet and Philippe Rigollet.
arXiv:1302.1611 [math.ST].
2013.
-
(no longer available)
Bayes UCB: On Bayesian Upper Confidence Bounds for Bandit Problems.
Emilie Kaufmann, Olivier Cappé, Aurélien Garivier.
JMLR W&CP 22:592-600.
2012.
-
(no longer available)
Thompson Sampling for 1-Dimensional Exponential Family Bandits.
Nathaniel Korda, Emilie Kaufmann, and Rémi Munos.
arXiv:1307.3400 [stat.ML].
2013.
-
(no longer available)
Thompson Sampling for Complex Bandit Problems.
Aditya Gopalan, Shie Mannor and Yishay Mansour.
arXiv:1311.0466 [stat.ML].
2013.
-
(no longer available)
Learning to Optimize Via Posterior Sampling.
Daniel Russo and Benjamin Van Roy.
arXiv:1301.2609 [cs.LG].
2013.
-
(no longer available)
An Empirical Evaluation of Thompson Sampling.
Olivier Chapelle and Lihong Li.
Advances in Neural Information Processing Systems 24:2249-2257.
2011.
-
(no longer available)
From Bandits to Experts: On the Value of Side-Observations.
Shie Mannor, Ohad Shamir.
NIPS 2011.
-
(no longer available)
Better Algorithms for Benign Bandits.
Elad Hazan, Satyen Kale.
JMLR 12:1287-1311, 2011.
-
(no longer available)
A near-optimal algorithm for finite partial-monitoring games against
adversarial opponents.
Gábor Bartók.
COLT 2013.
-
Robust approachability and regret minimization in games with partial
monitoring.
Shie Mannor, Vianney Perchet and Gilles Stoltz.
JMLR (to appear).
-
(no longer available)
Restless bandits, linear programming relaxations,
and a primal-dual index heuristic.
Dimitris Bertsimas and José Niño-Mora.
Operations Research, vol. 48, pp. 80-90, 2000.
-
(no longer available)
The on-line shortest path problem under partial monitoring.
A. György, T. Linder, G. Lugosi, and G. Ottucsák.
Journal of Machine Learning Research, vol. 8, pp. 2369-2403, 2007.
-
(no longer available)
Regret in Online Combinatorial Optimization.
Jean-Yves Audibert, Sébastien Bubeck, Gábor Lugosi.
Mathematics of Operations Research, 39:31-45.
-
(no longer available)
Contextual Bandits with Similarity Information.
Aleksandrs Slivkins.
COLT 2011. JMLR W&CP 19:679-702, 2011.
-
(no longer available)
Parametric Bandits: The Generalized Linear Case.
Sarah Filippi, Olivier Cappé, Aurélien Garivier and Csaba Szepesvári.
NIPS 2010, pp. 586-594.
-
(no longer available)
Thompson Sampling for Contextual Bandits with Linear Payoffs.
Shipra Agrawal and Navin Goyal.
ICML 2013, pp. 127-135.
-
(no longer available)
Resourceful Contextual Bandits.
Ashwinkumar Badanidiyuru, John Langford and Aleksandrs Slivkins.
COLT 2014, pp. 1109-1134.
-
Finite-time analysis of kernelised contextual bandits.
Michal Valko, Nathaniel Korda, Rémi Munos, Ilias Flaounas,
and Nello Cristianini.
UAI 2013.
-
(no longer available)
Interior-Point Methods for Full-Information and Bandit Online
Learning.
Jacob Abernethy, Elad Hazan and Alexander Rakhlin.
IEEE Transactions on Information Theory. 58(7):4164-4175, 2012.