Online Distillation Network

We propose a lifelong learning system that reuses knowledge from previously learned skills to generalize and accelerate the learning of new skills. A skill is a strategy the agent learns in order to perform a particular task; a Multi-Task Agent is an agent that can use multiple skills.

Policy Distillation is an offline learning method for creating Multi-Task Agents: the network keeps multiple copies of its last layer, each corresponding to one skill. Once distilled, these agents cannot learn new skills.
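For concreteness, below is a minimal sketch of such an architecture, assuming a PyTorch implementation; the layer sizes, task count, temperature, and the random teacher logits are illustrative stand-ins, not the actual setup:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DistilledMultiTaskNet(nn.Module):
    """Shared layers plus one copied last layer per skill."""
    def __init__(self, obs_dim, n_actions, n_tasks, hidden=256):
        super().__init__()
        # Layers shared by every task.
        self.trunk = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # One task-specific last layer per skill.
        self.heads = nn.ModuleList(
            nn.Linear(hidden, n_actions) for _ in range(n_tasks)
        )

    def forward(self, obs, task_id):
        return self.heads[task_id](self.trunk(obs))

# Offline distillation step: the head for a task is trained to match
# a pre-trained teacher's softened action distribution via KL divergence.
net = DistilledMultiTaskNet(obs_dim=8, n_actions=4, n_tasks=3)
obs = torch.randn(32, 8)
teacher_logits = torch.randn(32, 4)  # stand-in for the expert's output
tau = 2.0                            # distillation temperature (illustrative)
loss = F.kl_div(
    F.log_softmax(net(obs, task_id=0) / tau, dim=-1),
    F.softmax(teacher_logits / tau, dim=-1),
    reduction="batchmean",
)
loss.backward()
```

Because every head shares the same trunk and the heads are fixed at distillation time, adding a skill afterwards would require retraining the whole network, which is the limitation addressed next.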

We propose a method for creating Multi-Task Agents that can also learn new skills. In our approach, the agent accumulates knowledge in the form of skills and learns new ones through an online form of policy distillation. This eliminates the pre-trained expert required by traditional distillation setups, speeds up training, and retains previously learned skills.

This is done through two changes to the architecture of the NN:

1. Increasing the number of task-specific layers: instead of policy distillation's traditional single last layer per task, each task has its own set of task-specific layers (see the sketch after this list).

2. Changing the network optimization flow: while learning a new skill, the shared domain-related layers must not change, so we optimize only the new task's set of task-specific layers. This both speeds up learning and overcomes catastrophic forgetting.
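A sketch of how these two changes might look in code, continuing the hypothetical PyTorch setup above; the branch depth and optimizer choice are assumptions for illustration:

```python
import torch
import torch.nn as nn

class OnlineDistillationNet(nn.Module):
    """Shared domain layers plus a multi-layer branch per skill."""
    def __init__(self, obs_dim, hidden=256):
        super().__init__()
        # Domain-related layers, shared by all skills.
        self.shared = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
        )
        self.branches = nn.ModuleList()  # one set of layers per skill
        self.hidden = hidden

    def add_task(self, n_actions, depth=2):
        """Change 1: each task gets a set of layers, not just one."""
        layers = []
        for _ in range(depth - 1):
            layers += [nn.Linear(self.hidden, self.hidden), nn.ReLU()]
        layers.append(nn.Linear(self.hidden, n_actions))
        self.branches.append(nn.Sequential(*layers))
        return len(self.branches) - 1  # id of the new task

    def forward(self, obs, task_id):
        return self.branches[task_id](self.shared(obs))

net = OnlineDistillationNet(obs_dim=8)
task_id = net.add_task(n_actions=4)

# Change 2: while learning the new skill, freeze the shared domain
# layers and give the optimizer only the new task-specific branch.
for p in net.shared.parameters():
    p.requires_grad = False
optimizer = torch.optim.Adam(net.branches[task_id].parameters(), lr=1e-3)
```

Restricting the optimizer to the new branch means a gradient step can never move the weights that earlier skills depend on, which is what prevents catastrophic forgetting while also shrinking the number of trainable parameters per new skill.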