Learning By Doing NeurIPS 2021 Competition – ROBO Forum


> Evaluation function


I just started looking into this track and I have a question about the evaluation strategy. We are given a typical LQR-type objective, but we can't really know the two constants b and c. While the overall scale of these two constants is irrelevant, their relative size is not. Without knowing beforehand the desired tradeoff between tracking error and input energy for each repetition, how can we know whether to make the controller more conservative or more aggressive?

I just find it a bit strange that we can't really know what we are trying to optimize for. Even if we were somehow able to figure out the true dynamics AND the LQR controller used, we don't know the target trajectory beforehand, so we can't compute the scaling constants. I'm not sure if I'm missing something.
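To make the point about relative size concrete, here is a minimal toy sketch. It is not the competition's scoring function: it assumes a hypothetical one-step scalar cost J(u) = b * (target - gain * u)^2 + c * u^2, where `target`, `gain`, `optimal_input` and `cost` are names made up for illustration. Under that assumption the minimizing input depends only on the ratio c / b, so rescaling both constants changes nothing while changing their ratio changes how aggressive the best input is.

```python
def cost(u, b, c, target, gain=1.0):
    """Toy one-step cost (an assumption, not the official score):
    b weights the squared tracking error, c weights the input energy."""
    return b * (target - gain * u) ** 2 + c * u ** 2


def optimal_input(b, c, target, gain=1.0):
    """Closed-form minimizer of the toy cost above.

    Setting dJ/du = 0 gives u = b*gain*target / (b*gain**2 + c),
    which equals gain*target / (gain**2 + c/b): only c/b matters.
    """
    return b * gain * target / (b * gain ** 2 + c)


target = 1.0

# Same ratio c/b, different overall scale: identical optimal input.
u_base = optimal_input(b=1.0, c=1.0, target=target)
u_scaled = optimal_input(b=10.0, c=10.0, target=target)

# Different ratios: cheap input favors an aggressive action,
# expensive input favors a conservative one.
u_cheap = optimal_input(b=1.0, c=0.1, target=target)
u_costly = optimal_input(b=1.0, c=10.0, target=target)

print("same ratio:", u_base, u_scaled)
print("cheap input:", u_cheap, "costly input:", u_costly)
```

In this toy setting, a controller tuned for one ratio is suboptimal for another, which is why not knowing b and c per repetition makes the choice between conservative and aggressive ambiguous.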

Best regards,

Posted by: Ajoo @ Aug. 24, 2021, 11:24 a.m.

Hi João,

Thank you for your question. Your observation appears to be correct, and this does add to the complexity of the task.

The scaling coefficients for the score are selected to normalize the difficulty of the target trajectories, as described in the white paper.

Best wishes,

Posted by: LearningByDoing @ Aug. 27, 2021, 11:15 a.m.