Regret bounds for Narendra-Shapiro bandit algorithms
Narendra-Shapiro (NS) algorithms are bandit-type algorithms introduced in the sixties (with a view to applications in psychology and learning automata), whose convergence has been intensively studied in the stochastic-algorithm literature. In this talk, we study the efficiency of these bandit algorithms from a regret point of view. We show that competitive bounds can be obtained for a suitably penalized version of such algorithms. Up to an over-penalization modification, the pseudo-regret R_n associated with the penalized two-armed bandit is uniformly bounded by C sqrt(n), for an explicit constant C. We also generalize existing convergence and rate-of-convergence results to the multi-armed case of the over-penalized bandit algorithm, including convergence, after a suitable renormalization, toward the invariant measure of a Piecewise Deterministic Markov Process (PDMP). Finally, ergodic properties of this PDMP are given in the multi-armed case.
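To make the setting concrete, here is a minimal toy simulation of a penalized two-armed NS-type update with Bernoulli rewards. It is only a sketch: the state x is the probability of playing arm 1, a success reinforces the played arm, a failure penalizes it with strength rho, and the step sizes gamma_n ~ n^{-1/2} and the exact penalty form are illustrative choices, not the schedules analyzed in the talk.

```python
import random

def penalized_ns_two_armed(p1, p2, n_steps, rho=1.0, seed=0):
    """Toy penalized two-armed Narendra-Shapiro-type bandit.

    x is the probability of playing arm 1; arms pay Bernoulli(p1) and
    Bernoulli(p2). A success reinforces the played arm; a failure
    penalizes it, with over-penalization strength rho. The step sizes
    and penalty form here are illustrative, not the paper's.
    """
    rng = random.Random(seed)
    x = 0.5                    # initial probability of choosing arm 1
    pulls_of_best = 0
    for n in range(1, n_steps + 1):
        gamma = 1.0 / (n + 1) ** 0.5      # illustrative step size gamma_n
        play_1 = rng.random() < x
        if play_1:
            if rng.random() < p1:
                x += gamma * (1 - x)       # reinforce arm 1 on success
            else:
                x -= rho * gamma * x       # penalize arm 1 on failure
        else:
            if rng.random() < p2:
                x -= gamma * x             # reinforce arm 2 on success
            else:
                x += rho * gamma * (1 - x) # penalize arm 2 on failure
        x = min(max(x, 0.0), 1.0)          # keep x a valid probability
        if play_1 == (p1 >= p2):
            pulls_of_best += 1
    return x, pulls_of_best / n_steps

x_final, frac_best = penalized_ns_two_armed(p1=0.7, p2=0.3, n_steps=20000)
```

With p1 > p2 the drift pushes x upward from 1/2, so the better arm is played more than half the time; the over-penalization term is what keeps x away from the absorbing boundaries, which is the mechanism behind the uniform C sqrt(n) pseudo-regret bound discussed in the abstract.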
Date:
30 April 2015, 14:15
Venue:
1 South Parks Road, OX1 3TG
Venue Details:
Lecture Theatre
Speaker:
Sebastian Gadat (Toulouse School of Economics)
Organising department:
Department of Statistics
Part of:
Statistics, Applied Probability & Operational Research Seminars
Booking required?:
Not required
Audience:
Members of the University only
Editor:
Anne Bowtell