We tried to submit 8 zip files of our models yesterday, but every single submission failed after the zip file had actually been uploaded. Two issues are still unclear: how can this problem be solved? And do failed submissions count towards the limit of 20 submissions per day? Urgent help needed @HuaweiUK, TIA!
Below are the two variants of the error messages returned for our submissions:
Error 1:
```
WARNING: Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap.
X.Org X Server 1.19.6
Release Date: 2017-12-20
X Protocol Version 11, Revision 0
Build Operating System: Linux 4.4.0-148-generic x86_64 Ubuntu
Current Operating System: Linux 88ba6ce3a4b9 4.4.0-87-generic #110-Ubuntu SMP Tue Jul 18 12:55:35 UTC 2017 x86_64
Kernel command line: BOOT_IMAGE=/vmlinuz-4.4.0-87-generic root=/dev/mapper/newminio--vg-root ro
Build Date: 03 June 2019 08:10:35AM
xorg-server 2:1.19.6-1ubuntu4.3 (For technical support please see http://www.ubuntu.com/support)
Current version of pixman: 0.34.0
Before reporting problems, check http://wiki.x.org
to make sure that you have the latest version.
Markers: (--) probed, (**) from config file, (==) default setting,
(++) from command line, (!!) notice, (II) informational,
(WW) warning, (EE) error, (NI) not implemented, (??) unknown.
(++) Log file: "./xdummy.log", Time: Sun Dec 8 04:41:38 2019
(++) Using config file: "/etc/X11/xorg.conf"
(==) Using system config directory "/usr/share/X11/xorg.conf.d"
INFO:/tmp/codalab/tmpUASpau/run/program/evaluate.py:Root Directory: [metadata, evaluate.py, deps]
/usr/share/sumo/tools/sumolib/__init__.py:28: UserWarning: No module named 'matplotlib'
warnings.warn(str(e))
INFO:/tmp/codalab/tmpUASpau/run/program/evaluate.py:args.input_dir: [scores, res, ref, metadata, tmpkDMXNq.zip, tmpE4ByJX.txt, coopetition, tmpazx9Zl.zip, tmpcYzQjv.zip, history, tmpFulYC3.txt]
INFO:/tmp/codalab/tmpUASpau/run/program/evaluate.py:args.output_dir: [metadata]
INFO:/tmp/codalab/tmpUASpau/run/program/evaluate.py:Submission Directory: [model000, metadata]
INFO:/tmp/codalab/tmpUASpau/run/program/evaluate.py:Evaluation Scenarios Directory: [1lane_10v, 3lane_sharp_b_10v, 1lane, 2lane_sharp_bwd_10v, 3lane_sharp_b_25v, 3lane_bwd_10v, 3lane_sharp_b_50v]
INFO:/tmp/codalab/tmpUASpau/run/program/evaluate.py:Score Output Directory: [metadata]
Traceback (most recent call last):
File "/tmp/codalab/tmpUASpau/run/program/evaluate.py", line 80, in
from policy import Policy
ModuleNotFoundError: No module named 'policy'
```
Error 2:
```
WARNING: Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap.
X.Org X Server 1.19.6
Release Date: 2017-12-20
X Protocol Version 11, Revision 0
Build Operating System: Linux 4.4.0-148-generic x86_64 Ubuntu
Current Operating System: Linux 1603cf9d3329 4.4.0-87-generic #110-Ubuntu SMP Tue Jul 18 12:55:35 UTC 2017 x86_64
Kernel command line: BOOT_IMAGE=/vmlinuz-4.4.0-87-generic root=/dev/mapper/newminio--vg-root ro
Build Date: 03 June 2019 08:10:35AM
xorg-server 2:1.19.6-1ubuntu4.3 (For technical support please see http://www.ubuntu.com/support)
Current version of pixman: 0.34.0
Before reporting problems, check http://wiki.x.org
to make sure that you have the latest version.
Markers: (--) probed, (**) from config file, (==) default setting,
(++) from command line, (!!) notice, (II) informational,
(WW) warning, (EE) error, (NI) not implemented, (??) unknown.
(++) Log file: "./xdummy.log", Time: Mon Dec 9 02:58:49 2019
(++) Using config file: "/etc/X11/xorg.conf"
(==) Using system config directory "/usr/share/X11/xorg.conf.d"
INFO:/tmp/codalab/tmpJNXRrs/run/program/evaluate.py:Root Directory: [metadata, evaluate.py, deps]
/usr/share/sumo/tools/sumolib/__init__.py:28: UserWarning: No module named 'matplotlib'
warnings.warn(str(e))
INFO:/tmp/codalab/tmpJNXRrs/run/program/evaluate.py:args.input_dir: [tmp79NrlZ.zip, tmpzbcq_u.zip, scores, res, ref, tmpClIKEy.txt, metadata, tmpO_3Y9E.zip, coopetition, history, tmps4L7BL.txt]
INFO:/tmp/codalab/tmpJNXRrs/run/program/evaluate.py:args.output_dir: [metadata]
INFO:/tmp/codalab/tmpJNXRrs/run/program/evaluate.py:Submission Directory: [saved_model.pb, variables, metadata, policy.py]
INFO:/tmp/codalab/tmpJNXRrs/run/program/evaluate.py:Evaluation Scenarios Directory: [1lane_10v, 3lane_sharp_b_10v, 1lane, 2lane_sharp_bwd_10v, 3lane_sharp_b_25v, 3lane_bwd_10v, 3lane_sharp_b_50v]
INFO:/tmp/codalab/tmpJNXRrs/run/program/evaluate.py:Score Output Directory: [metadata]
/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/dtypes.py:523: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint8 = np.dtype([("qint8", np.int8, 1)])
/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/dtypes.py:524: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/dtypes.py:525: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint16 = np.dtype([("qint16", np.int16, 1)])
/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/dtypes.py:526: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/dtypes.py:527: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint32 = np.dtype([("qint32", np.int32, 1)])
/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/dtypes.py:532: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
np_resource = np.dtype([("resource", np.ubyte, 1)])
/usr/local/lib/python3.6/dist-packages/tensorboard/compat/tensorflow_stub/dtypes.py:541: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint8 = np.dtype([("qint8", np.int8, 1)])
/usr/local/lib/python3.6/dist-packages/tensorboard/compat/tensorflow_stub/dtypes.py:542: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/usr/local/lib/python3.6/dist-packages/tensorboard/compat/tensorflow_stub/dtypes.py:543: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint16 = np.dtype([("qint16", np.int16, 1)])
/usr/local/lib/python3.6/dist-packages/tensorboard/compat/tensorflow_stub/dtypes.py:544: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/usr/local/lib/python3.6/dist-packages/tensorboard/compat/tensorflow_stub/dtypes.py:545: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint32 = np.dtype([("qint32", np.int32, 1)])
/usr/local/lib/python3.6/dist-packages/tensorboard/compat/tensorflow_stub/dtypes.py:550: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
np_resource = np.dtype([("resource", np.ubyte, 1)])
INFO:/tmp/codalab/tmpJNXRrs/run/program/evaluate.py:Loaded user policy:
INFO:/tmp/codalab/tmpJNXRrs/run/program/evaluate.py:Running evaluation for scenario=1lane
INFO:/tmp/codalab/tmpJNXRrs/run/program/evaluate.py:Creating gym environment using scenario=SumoScenario(
_root=/tmp/codalab/tmpJNXRrs/run/input/ref/1lane,
_random_social_vehicle_count=0
), seed=649
Known pipe types:
glxGraphicsPipe
(all display modules loaded.)
:device(error): Error adding inotify watch on /dev/input: No such file or directory
:device(error): Error opening directory /dev/input: No such file or directory
AL lib: (WW) pulse_load: Failed to load libpulse.so.0
AL lib: (WW) alc_initconfig: Failed to initialize backend "pulse"
AL lib: (WW) alsa_load: Failed to load libasound.so.2
AL lib: (WW) alc_initconfig: Failed to initialize backend "alsa"
AL lib: (EE) ALCplaybackOSS_open: Could not open /dev/dsp: No such file or directory
AL lib: (WW) alcSetError: Error generated on device (nil), code 0xa004
AL lib: (EE) ALCplaybackOSS_open: Could not open /dev/dsp: No such file or directory
AL lib: (WW) alcSetError: Error generated on device (nil), code 0xa004
:audio(error): Couldn't open default OpenAL device
:audio(error): OpenALAudioManager: No open device or context
:audio(error): OpenALAudioManager is not valid, will use NullAudioManager
AL lib: (EE) ALCplaybackOSS_open: Could not open /dev/dsp: No such file or directory
AL lib: (WW) alcSetError: Error generated on device (nil), code 0xa004
AL lib: (EE) ALCplaybackOSS_open: Could not open /dev/dsp: No such file or directory
AL lib: (WW) alcSetError: Error generated on device (nil), code 0xa004
:audio(error): Couldn't open default OpenAL device
:audio(error): OpenALAudioManager: No open device or context
:audio(error): OpenALAudioManager is not valid, will use NullAudioManager
INFO:/tmp/codalab/tmpJNXRrs/run/program/evaluate.py:Episode (0) start
Traceback (most recent call last):
File "/tmp/codalab/tmpJNXRrs/run/program/evaluate.py", line 162, in
agent_policy.setup()
File "/tmp/codalab/tmpJNXRrs/run/input/res/policy.py", line 217, in setup
self._model = EvaluationModel()
File "/tmp/codalab/tmpJNXRrs/run/input/res/policy.py", line 202, in __init__
tags=['serve'])
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/util/deprecation.py", line 324, in new_func
return func(*args, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/saved_model/loader_impl.py", line 268, in load
loader = SavedModelLoader(export_dir)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/saved_model/loader_impl.py", line 284, in __init__
self._saved_model = parse_saved_model(export_dir)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/saved_model/loader_impl.py", line 83, in parse_saved_model
constants.SAVED_MODEL_FILENAME_PB))
OSError: SavedModel file does not exist at: /tmp/codalab/tmpJNXRrs/run/input/res/model/{saved_model.pbtxt|saved_model.pb}
Error: tcpip::Socket::recvAndCheck @ recv: peer shutdown
Quitting (on error).
```
This is likely an error in your submission directory structure. Make sure that the way you read your model in for evaluation matches what you include in your submission. If you take a look at the EvaluationModel in the policy.py starter kit, you'll see that it tries to load the SavedModel from a `model` directory:
```
# Resolve the `model` directory that lives next to policy.py.
model_path = os.path.join(
    os.path.dirname(os.path.realpath(__file__)),
    'model')
# Load the SavedModel that was exported with the 'serve' tag.
self.base_model = tf.saved_model.load(self._sess,
                                       export_dir=model_path,
                                       tags=['serve'])
```
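For comparison, here is a minimal, hypothetical sketch (not part of the starter kit) of how a TF 1.x SavedModel carrying the 'serve' tag could be written into such a `model` directory next to policy.py; the tiny placeholder graph, tensor names, and shapes are assumptions to be replaced by your own policy.
```
# Hypothetical export sketch: writes model/saved_model.pb and model/variables/
# so that a loader like the one above can find them. The graph below is only a
# stand-in; substitute your own observation/action tensors.
import os
import tensorflow as tf

# Note: the export directory must not already exist when simple_save runs.
export_dir = os.path.join(os.path.dirname(os.path.realpath(__file__)), 'model')

with tf.Graph().as_default(), tf.Session() as sess:
    obs = tf.placeholder(tf.float32, shape=[None, 4], name='observation')
    action = tf.layers.dense(obs, 2, name='action')
    sess.run(tf.global_variables_initializer())
    # simple_save attaches the 'serve' tag, matching tags=['serve'] in the loader.
    tf.saved_model.simple_save(sess, export_dir,
                               inputs={'observation': obs},
                               outputs={'action': action})
```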
If you are using code similar to the starter-kit snippet to load your saved models, make sure your submission folder structure matches what that code expects. Your submission, when unzipped, should have a folder structure that looks like this:
```
.
...
├── model
│   ├── saved_model.pb
│   └── variables
│       ├── variables.data-00000-of-00002
│       ├── variables.data-00001-of-00002
│       └── variables.index
├── policy.py
...
```
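One way to package a zip whose root matches this layout is sketched below; it is not an official script, and `my_submission` / `submission.zip` are placeholder names. The key point is that policy.py and model/ sit at the root of the archive rather than inside an extra top-level folder (Error 1 above shows a submission root containing only [model000, metadata], which suggests exactly that kind of nesting).
```
# Hypothetical packaging sketch: writes submission.zip with policy.py and
# model/ at the archive root (no extra top-level folder).
import os
import zipfile

submission_dir = 'my_submission'   # placeholder: contains policy.py and model/

with zipfile.ZipFile('submission.zip', 'w', zipfile.ZIP_DEFLATED) as zf:
    for root, _, files in os.walk(submission_dir):
        for name in files:
            path = os.path.join(root, name)
            # relpath strips the leading folder so entries start at the zip root.
            zf.write(path, arcname=os.path.relpath(path, submission_dir))
```
Unzipping such an archive reproduces the tree shown above.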
To speed up your debugging, make sure that the `run.py` script that comes with the RLlib example works with your policy before submitting to CodaLab.
Posted by: HuaweiUK @ Dec. 9, 2019, 6:17 p.m.

The problem can now be considered solved, as my newly generated model can be submitted successfully. Thanks for the information.
Please also review my other issues in my new threads and reply to them. TIA!