
Making use of off-policy, offline data remains a hard problem for reinforcement learning agents. In this project we will investigate methods for making unbiased use of very large datasets through experience replay. Our agent will learn to sample helpful transitions from the replay buffer efficiently while simultaneously gathering new experience.
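The sketch below shows what "unbiased use" of a replay buffer can mean when transitions are sampled non-uniformly. It is an assumption, not the method this project will develop. It uses proportional prioritized replay with importance-sampling correction. A transition is drawn with probability P(i) = p_i^alpha / sum_k p_k^alpha and weighted by w_i = (N * P(i))^(-beta). With beta = 1, the weighted average of any per-transition quantity equals its uniform average over the buffer in expectation.

The class name `PrioritizedReplayBuffer`, the parameters `alpha`, `beta` and `eps`, and the fixed rule for setting priorities are all hypothetical. In the project, the hand-set priorities would be replaced by a learned sampler.

```python
import random
from typing import Any, List, Tuple


class PrioritizedReplayBuffer:
    """Hypothetical proportional prioritized replay buffer (illustrative sketch).

    Sampling probability: P(i) = p_i**alpha / sum_k p_k**alpha.
    Importance weight:    w_i  = (N * P(i)) ** (-beta).
    With beta = 1, E_{i~P}[w_i * f_i] = (1/N) * sum_i f_i, i.e. the weights
    undo the sampling bias relative to uniform replay.
    """

    def __init__(self, capacity: int, alpha: float = 0.6,
                 eps: float = 1e-6, seed: int = 0) -> None:
        self.capacity = capacity
        self.alpha = alpha      # 0 = uniform sampling, 1 = fully proportional
        self.eps = eps          # keeps updated priorities strictly positive
        self.data: List[Any] = []
        self.priorities: List[float] = []
        self.next_idx = 0       # slot to overwrite once the buffer is full
        self.rng = random.Random(seed)

    def add(self, transition: Any, priority: float = 1.0) -> None:
        """Store a transition; `priority` must be > 0. Overwrites the oldest when full."""
        if len(self.data) < self.capacity:
            self.data.append(transition)
            self.priorities.append(priority)
        else:
            self.data[self.next_idx] = transition
            self.priorities[self.next_idx] = priority
        self.next_idx = (self.next_idx + 1) % self.capacity

    def probabilities(self) -> List[float]:
        """Sampling distribution P(i) over the stored transitions."""
        scaled = [p ** self.alpha for p in self.priorities]
        total = sum(scaled)
        return [s / total for s in scaled]

    def sample(self, batch_size: int,
               beta: float = 1.0) -> Tuple[List[int], List[Any], List[float]]:
        """Draw indices with replacement from P and return raw importance weights."""
        probs = self.probabilities()
        n = len(self.data)
        idxs = self.rng.choices(range(n), weights=probs, k=batch_size)
        weights = [(n * probs[i]) ** (-beta) for i in idxs]
        return idxs, [self.data[i] for i in idxs], weights

    def update_priorities(self, idxs: List[int], new_priorities: List[float]) -> None:
        """Set new priorities, e.g. from absolute TD errors, plus a small epsilon."""
        for i, p in zip(idxs, new_priorities):
            self.priorities[i] = abs(p) + self.eps
```

The weights are returned unnormalized so that the unbiasedness claim holds exactly for beta = 1. Implementations in the prioritized-replay style often divide them by their maximum and anneal beta toward 1. This trades a small bias for more stable updates.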