Hello,
We are facing the following error when submitting a new model:
OSError: [Errno 12] Cannot allocate memory
Error: tcpip::Socket::recvAndCheck @ recv: peer shutdown
Quitting (on error).
This works fine for the first few episodes, but after a while the error is raised.
Could you point out how to solve it?
Thank you,
Team 153
Hi, some suggestions for you to investigate:
1. Policy.teardown() is called at the end of each episode. Make sure you release any references to large objects when it is called.
2. Watch out for data that accumulates over the course of an episode. Are you appending objects to lists on each iteration? This can quickly eat up a lot of memory, especially if the lists are not cleaned up after each episode.
3. Use the run.py scripts that come with the starter kit examples to watch for memory growth caused by your policy code, both within long episodes and across multiple episodes, so that you catch memory leaked from one episode to the next.
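To make the three points above concrete, here is a minimal sketch of what we mean. It is not the starter kit code: only Policy.teardown() is the real hook, while SketchPolicy, setup(), act() and traced_memory_per_episode() are hypothetical names made up for illustration, and the loop is a stand-in for what run.py does rather than a copy of it. It uses Python's standard tracemalloc module to report how much memory is still allocated after each episode.

```python
import gc
import tracemalloc


class SketchPolicy:
    """Hypothetical policy; only teardown() is a name taken from this thread."""

    def setup(self):
        # Stand-in for a large object such as model weights (assumption).
        self.model = bytearray(10_000_000)
        self.history = []

    def act(self, observation):
        # Per-step accumulation: harmless only if it is cleared every episode.
        self.history.append(observation)
        return 0

    def teardown(self):
        # Drop references so the memory can be reclaimed.
        self.model = None
        self.history.clear()
        gc.collect()


def traced_memory_per_episode(policy, episodes=5, steps=1000):
    """Return the traced memory (in bytes) still allocated after each episode."""
    tracemalloc.start()
    readings = []
    for _ in range(episodes):
        policy.setup()
        for step in range(steps):
            policy.act([float(step)] * 10)
        policy.teardown()
        current, _peak = tracemalloc.get_traced_memory()
        readings.append(current)
    tracemalloc.stop()
    return readings


if __name__ == "__main__":
    for episode, used in enumerate(traced_memory_per_episode(SketchPolicy())):
        print(f"episode {episode}: {used / 1e6:.2f} MB still allocated")
```

If the reported number keeps climbing from one episode to the next, something is surviving teardown(). In that case tracemalloc.take_snapshot() can help narrow down which lines allocated the memory that is still held.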
Best of luck,
DriveML team