Supervisors: David Leslie (Lancaster) and Raphael Clifford (Bristol)
Closing date: When we have found a suitable candidate
Start date: October 2022
Reinforcement learning (RL) is a machine learning technique in which
computers experiment with an environment and learn effective
behaviour. The problems in which RL is particularly effective are
sequential decision-making problems: the task consists of observing
the state of the environment, selecting an action, incurring some
cost, and moving to a new state, where both the cost and the
successor state depend on the current state and the action selected.
The canonical mathematical formulation of such problems is the Markov
decision process. The most famous recent example of RL's success is
the game of Go, addressed by DeepMind, which builds on a rich history
in both games and individual decision-making examples (see Sutton and
Barto (2018) for a survey).
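To make this concrete, here is a minimal tabular Q-learning sketch
for a toy Markov decision process, written in Python; the corridor
environment, costs and parameter values are illustrative inventions
rather than anything specific to the project:

    import random

    # A toy Markov decision process: states are cells of a one-dimensional
    # corridor, actions move left or right, and each step incurs cost 1
    # until the goal is reached.  All names and numbers are illustrative.
    N_STATES = 5          # states 0..4; state 4 is the goal
    ACTIONS = [-1, +1]    # move left or move right

    def step(state, action):
        """Return (cost, next_state): both depend on state and action."""
        next_state = min(max(state + action, 0), N_STATES - 1)
        cost = 0.0 if next_state == N_STATES - 1 else 1.0
        return cost, next_state

    # Tabular Q-learning: learn the expected total cost of each (state, action).
    Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    alpha, eps = 0.1, 0.2   # learning rate and exploration probability

    for episode in range(500):
        s = 0
        while s != N_STATES - 1:
            # Epsilon-greedy: usually pick the action with lowest learned cost.
            if random.random() < eps:
                a = random.choice(ACTIONS)
            else:
                a = min(ACTIONS, key=lambda a: Q[(s, a)])
            cost, s_next = step(s, a)
            target = cost + min(Q[(s_next, b)] for b in ACTIONS)
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            s = s_next

    # Greedy action learned for each non-goal state (should all be +1).
    print({s: min(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES - 1)})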
In common reinforcement learning approaches, the set of actions
available to the decision maker at each time instant is very regular:
in many examples it is either fixed, finite and small (e.g. move
North, South, East or West), or a simple continuous space (an angle
and a speed to move at). However, in many problems the action space
is more complex. In robotic soccer-playing environments, the player
can choose whether to run, turn or kick, and each of these choices is
then parameterised by a strength and/or direction. This type of
action space, with a finite number of action families each indexed by
a parameter, is called a parameterised action space (illustrated in
the sketch below).
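For illustration, such a space can be written down directly; the
following Python sketch uses hypothetical robot-soccer action
families and parameter ranges, none of which are prescribed by the
project:

    from dataclasses import dataclass
    from typing import Union

    # A hypothetical robot-soccer parameterised action space: a finite set
    # of action families (run, turn, kick), each indexed by continuous
    # parameters.  The families and ranges below are invented for
    # illustration only.

    @dataclass
    class Run:
        speed: float        # metres per second, say in [0, 8]
        direction: float    # radians, in [-pi, pi]

    @dataclass
    class Turn:
        angle: float        # radians, in [-pi, pi]

    @dataclass
    class Kick:
        power: float        # normalised, in [0, 1]
        direction: float    # radians, in [-pi, pi]

    # An action is one family together with that family's parameters, so a
    # policy must choose both the discrete family and its continuous part.
    Action = Union[Run, Turn, Kick]

    action: Action = Kick(power=0.7, direction=0.3)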
In contrast to image processing, for which standard deep learning
methods and libraries now exist, action sets with complex structure
have generally required custom solutions. This severely hinders the
ability of non-specialists to deploy RL methods on their own
problems. The focus of this PhD project is therefore to formulate and
code modular reinforcement learning components for general structured
action spaces.
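One speculative sketch of what such a modular component might look
like is given below; the interface and class names are assumptions
about what the project could build, not existing code:

    import random
    from abc import ABC, abstractmethod

    # A speculative sketch of a reusable action-space component, analogous
    # to the standard layers of a deep learning library.

    class StructuredActionSpace(ABC):
        @abstractmethod
        def sample(self):
            """Draw a random action, e.g. for exploration."""

        @abstractmethod
        def contains(self, action):
            """Check whether an action is valid in this space."""

    class ParameterisedActionSpace(StructuredActionSpace):
        """A finite set of families, each with continuous parameter bounds."""

        def __init__(self, families):
            # families maps a family name to a list of (low, high) bounds.
            self.families = families

        def sample(self):
            name = random.choice(list(self.families))
            params = [random.uniform(lo, hi) for lo, hi in self.families[name]]
            return name, params

        def contains(self, action):
            name, params = action
            bounds = self.families.get(name)
            return (bounds is not None and len(params) == len(bounds)
                    and all(lo <= p <= hi for p, (lo, hi) in zip(params, bounds)))

    # Example: a kick has (power, direction) parameters; a turn just an angle.
    space = ParameterisedActionSpace({"kick": [(0.0, 1.0), (-3.14, 3.14)],
                                      "turn": [(-3.14, 3.14)]})
    print(space.sample())    # e.g. ('kick', [0.42, 1.7])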
The project could take several directions, including:
• devising policy optimisation analogues of existing value learning
approaches for structured action spaces
• extending parameterised action space approaches to more general
structured action spaces
• deriving exploration strategies for parameterised action spaces to
ensure efficient experimentation
A successful candidate will have skills in both mathematics and
computer science: you will formulate methods for awkward action
spaces, implement those methods as modular code, and run computer
experiments to compare them on a range of problems.
To start the application process, please send your undergraduate
transcript to d.leslie@lancaster.ac.uk with a brief note about why
this project interests you.
--
David Leslie (he/him/his), Professor of Statistical Learning,
Head of Statistics, Lancaster University