When I increase the number of parameters of the model, the training phase randomly crashes with this error. Does anyone have any idea about it?
Posted by: Team85 @ Nov. 8, 2019, 1:28 a.m.

We are aware of this issue and are looking into it. If you are using rllib, setting 'max_failures' to a high number as a temporary workaround will allow training to continue even when this error crops up. In practice we see the error only rarely, so training has progressed smoothly for us, but as you point out it may be model-dependent, so your mileage may vary.
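For reference, a minimal sketch of what that workaround might look like when driving rllib training through Ray Tune (the algorithm name, environment, and stopping criteria below are placeholders, not part of any specific setup):

```python
# Sketch of the 'max_failures' workaround, assuming training is launched
# via Ray Tune. Algorithm, env, and stop condition are placeholders.
from ray import tune

tune.run(
    "PPO",                               # placeholder rllib algorithm
    config={"env": "CartPole-v0"},       # placeholder environment/config
    stop={"training_iteration": 100},    # placeholder stopping criterion
    max_failures=100,                    # retry the trial on crash instead of aborting
)
```

With `max_failures` set high, Tune restarts a crashed trial from its last checkpoint rather than failing the whole run, which is why the occasional crash does not stop training.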
Posted by: HuaweiUK @ Nov. 8, 2019, 10:59 a.m.