Reinforcement Learning

We study reinforcement learning from three main angles:

  1. Improving the efficiency of specific algorithms for continuous control (sample efficiency)
  2. Tools for the interpretability of deep networks trained to perform control
  3. Custom environments for training deep reinforcement learning (DRL) systems

Sample Efficiency

Tianhong Dai and Kai Arulkumaran have been working on techniques to improve the sample efficiency of specific algorithms for continuous control. These include increasing the diversity of action sequences and selecting diverse goals in sparse-reward settings. Recent papers have appeared in PRICAI 2021 and Neurocomputing.


Interpretability

DRL systems use deep neural networks in several ways: to suggest actions, to estimate Q-values or rewards, and even to critique the actions that are taken. Treating these networks as black boxes does little to inspire confidence in the resulting controllers.

Custom Environments

DRL is typically evaluated on well-known “benchmark” tasks involving physics-based simulations of the real world. Examples include an inverted pendulum, simple models of articulated grasping robots, and wheeled robots.

However, there are many other areas where DRL could have impact. Although these are not widely studied by the DRL community, they are of interest in other scientific and engineering fields. Examples include engineering systems that exhibit active control (e.g. damping systems), and even the process of tracking structures, similar to techniques in active vision.
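To make the idea of a custom active-control environment concrete, here is a minimal sketch of a toy damping task written in plain Python. Everything in it is an illustrative assumption rather than a description of our actual environments: the class name DampingEnv, the mass-spring dynamics, the parameter values and the reward weights are all hypothetical. The reset/step interface simply follows the convention common to many RL toolkits and is not tied to any particular library.

```python
import random


class DampingEnv:
    """Hypothetical toy environment: a mass on an undamped spring.

    The agent applies a bounded force at each step and is rewarded for
    bringing the mass to rest at the origin, i.e. for acting as a damper.
    """

    def __init__(self, mass=1.0, stiffness=4.0, dt=0.02,
                 max_force=2.0, max_steps=500, seed=None):
        self.mass = mass
        self.stiffness = stiffness
        self.dt = dt
        self.max_force = max_force
        self.max_steps = max_steps
        self.rng = random.Random(seed)
        self.x = 0.0  # displacement
        self.v = 0.0  # velocity
        self.t = 0    # step counter

    def reset(self):
        # Start from a random displacement, at rest.
        self.x = self.rng.uniform(-1.0, 1.0)
        self.v = 0.0
        self.t = 0
        return (self.x, self.v)

    def step(self, action):
        # Clip the requested force to the actuator limits.
        force = max(-self.max_force, min(self.max_force, float(action)))
        acc = (force - self.stiffness * self.x) / self.mass
        # Semi-implicit Euler: update velocity first, then position.
        self.v += acc * self.dt
        self.x += self.v * self.dt
        self.t += 1
        # Penalise displacement, velocity and (lightly) control effort.
        reward = -(self.x ** 2 + 0.1 * self.v ** 2 + 0.001 * force ** 2)
        done = self.t >= self.max_steps
        return (self.x, self.v), reward, done


def rollout(env, policy):
    """Run one episode and return the undiscounted sum of rewards."""
    obs = env.reset()
    total = 0.0
    done = False
    while not done:
        obs, reward, done = env.step(policy(obs))
        total += reward
    return total


if __name__ == "__main__":
    # Compare doing nothing with a hand-written velocity-feedback damper.
    print("no control:", rollout(DampingEnv(seed=0), lambda obs: 0.0))
    print("damper:    ", rollout(DampingEnv(seed=0), lambda obs: -2.0 * obs[1]))
```

A learned policy would take the place of the hand-written damper in the last line. The point of the sketch is only that a domain-specific control problem can be exposed to a DRL algorithm through the same small interface as a standard benchmark.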

Our work on axon tracking (a type of problem encountered in microscopy for neuroscience) is an example of the possible use of DRL in a non-traditional context.