Learning By Doing NeurIPS 2021 Competition – ROBO Forum


> How to train a batch reinforcement learning agent without target positions

I'm trying to solve the robot learning problem with the batch reinforcement learning approach mentioned in the whitepaper. However, I found that the demonstration data contains no target positions, which means I can't compute the reward for the control samples at each time step.

What's more, if I can't evaluate the demonstration policy and instead have to treat it as a perfect expert control policy, then it is impossible to learn a control policy that achieves a control loss lower than 1 (i.e., one that is better than the demonstration policy), and the problem becomes a pure imitation learning one. Therefore, would it be possible to add the target positions to the demonstration data? Or does anyone agree that this problem can be solved by a batch reinforcement learning algorithm without target positions?
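For concreteness, without rewards the best I can see doing is plain behavioral cloning on the demonstrations, along these lines (a minimal sketch; the state/action dimensions and the synthetic `states`/`actions` tensors are placeholders I made up, not the actual ROBO data format):

```python
import torch
import torch.nn as nn

# Placeholder demonstration data: observed states and the expert's actions.
# Shapes are assumptions for illustration only.
states = torch.randn(1000, 8)   # hypothetical state dimension
actions = torch.randn(1000, 2)  # hypothetical action dimension

# Simple MLP policy mapping states to actions.
policy = nn.Sequential(
    nn.Linear(8, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 2),
)

optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# Behavioral cloning: regress the expert's actions directly;
# no reward signal (and hence no target position) is needed.
for epoch in range(100):
    optimizer.zero_grad()
    loss = loss_fn(policy(states), actions)
    loss.backward()
    optimizer.step()
```

But a policy trained this way can at best match the demonstration policy, never beat it, which is exactly my concern.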

Posted by: zhaoyinuo @ Aug. 2, 2021, 9:38 a.m.

Thank you for your question. To clarify the terminology in the whitepaper: we think of the tasks as 'offline' because the training data is collected without the agent interacting with the system. This is not meant to imply that the data and tasks fit directly into any offline learning framework.

The training targets were only used internally to generate sensible training trajectories and hence do not contain any task-relevant information (see also 'Training data' in Section 1.2 of the whitepaper). At this moment we will not include targets in the training data.

We hope this helps,
Søren

Posted by: LearningByDoing @ Aug. 4, 2021, 2:16 p.m.