It seems that after we edit the observation function to select a subset of the data as the agent's truncated/transformed observation, the environment's observation function is changed globally and the default full observation is no longer available. We would like to train the agent on the truncated observation while using the full observation for other purposes. Is there a way to do this under the evaluation scheme?
Posted by: Team67 @ Dec. 5, 2019, 3:57 p.m.

A simple workaround here may be to use a global variable to collect the un-transformed observations for out-of-agent processing, something like:
https://gist.github.com/davidrusu/ed5e13d80ab9d242c378aa9aff14c619
This should work just fine with multiple workers in RLlib as well: since each worker process is forked prior to the first call to the user-defined `observation()`, the worker processes won't share a heap, and each one collects its own copy of the observations.
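For reference, here is a minimal sketch of the idea without following the link, assuming a Gym-style `Box` observation space. The wrapper name `TruncatedObsWrapper`, the `keep_dims` parameter, and the `FULL_OBSERVATIONS` global are illustrative and not taken from the gist:

```python
import gym
import numpy as np

# Global buffer for the full (un-transformed) observations.
# Each RLlib rollout worker is forked before `observation()` is
# first called, so every worker process keeps its own private copy.
FULL_OBSERVATIONS = []

class TruncatedObsWrapper(gym.ObservationWrapper):
    """Expose only `keep_dims` of the observation to the agent while
    stashing the full observation for out-of-agent processing."""

    def __init__(self, env, keep_dims):
        super().__init__(env)
        self.keep_dims = keep_dims
        # Narrow the observation space to the kept dimensions
        # (assumes the underlying space is a Box).
        self.observation_space = gym.spaces.Box(
            low=env.observation_space.low[keep_dims],
            high=env.observation_space.high[keep_dims],
            dtype=env.observation_space.dtype,
        )

    def observation(self, obs):
        obs = np.asarray(obs)
        FULL_OBSERVATIONS.append(obs.copy())  # keep the full observation
        return obs[self.keep_dims]            # agent sees the truncated view
```

Usage would look something like `env = TruncatedObsWrapper(gym.make("CartPole-v1"), keep_dims=[0, 2])`. Note that because the buffer is per-process, aggregating observations across workers would require a separate mechanism (e.g. writing each worker's buffer to disk).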
Posted by: HuaweiUK @ Dec. 6, 2019, 7:48 p.m.