Hi guys,
Unfortunately, the model constantly crashes on this configuration (Ubuntu 16.04, Python 3.6, CUDA 9.0).
To be exact, it's Python 3.6.3, if that matters.
Posted by: arsenyinfo @ May 27, 2018, 10:19 p.m.
I've investigated a bit; it looks like a mysterious conflict with `torchvision.transforms`:
arseny@cobalt:~/mcs$ ipython
Python 3.6.3 (default, Mar 31 2018, 11:33:10)
Type 'copyright', 'credits' or 'license' for more information
IPython 6.2.1 -- An enhanced Interactive Python. Type '?' for help.
In [1]: import MCS2018
In [2]: from torchvision import transforms
Segmentation fault
arseny@cobalt:~/mcs$ ipython
Python 3.6.3 (default, Mar 31 2018, 11:33:10)
Type 'copyright', 'credits' or 'license' for more information
IPython 6.2.1 -- An enhanced Interactive Python. Type '?' for help.
In [1]: from torchvision import transforms
In [2]: import MCS2018
Segmentation fault
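To compare the two import orders without losing the interactive session, each order can be run in its own fresh interpreter. A minimal sketch, assuming a POSIX system (where a child killed by a signal gets a negative return code); `try_snippet` and `ORDERS` are hypothetical names, and `-X faulthandler` only makes Python print a Python-level stack when the crash happens:

```python
import signal
import subprocess
import sys


def try_snippet(code):
    """Run `code` in a fresh interpreter and classify how it ended.

    Returns "ok", "segfault" or "error". Assumes POSIX, where a child
    killed by a signal reports a negative return code.
    """
    proc = subprocess.run(
        # -X faulthandler dumps the Python stack to stderr if the child crashes
        [sys.executable, "-X", "faulthandler", "-c", code],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    if proc.returncode == 0:
        return "ok"
    if proc.returncode == -signal.SIGSEGV:
        return "segfault"
    return "error"


# The two orders tried in the sessions above (MCS2018 is the competition module)
ORDERS = {
    "MCS2018 first": "import MCS2018\nfrom torchvision import transforms",
    "torchvision first": "from torchvision import transforms\nimport MCS2018",
}
```

Looping over `ORDERS.items()` and printing `try_snippet(code)` for each entry checks both orders in isolation, so one crash does not hide the result of the other.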
Please try the following:
1. import MCS2018
gpu_id = 0  # index of the GPU to use
net = MCS2018.Predictor(gpu_id)
# only then import transforms
from torchvision import transforms
2. If this does not help, please downgrade PyTorch from 0.4 to 0.3 and then try the commands in step 1.
The two libraries somehow conflict, but if you create the net before importing torchvision, it's OK.
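The import-order workaround can be wrapped so that it fails loudly if some other module has already pulled in the conflicting library before the predictor is built. A sketch under assumptions: `build_before_imports` is a hypothetical helper, and treating `torch`/`torchvision` being present in `sys.modules` as the trigger is a guess based on the sessions in this thread, not a confirmed root cause:

```python
import importlib
import sys


def build_before_imports(first_module, factory_name, factory_args=(),
                         later_modules=(), conflicting=()):
    """Import `first_module`, build an object from it, then import the rest.

    Raises RuntimeError if any module in `conflicting` is already loaded,
    because then the ordering workaround can no longer be guaranteed.
    """
    already_loaded = [name for name in conflicting if name in sys.modules]
    if already_loaded:
        raise RuntimeError(
            "already imported before the predictor was built: "
            + ", ".join(already_loaded)
        )
    module = importlib.import_module(first_module)
    # e.g. MCS2018.Predictor(gpu_id): create the object *before* the later imports
    obj = getattr(module, factory_name)(*factory_args)
    for name in later_modules:
        importlib.import_module(name)
    return obj
```

With the module from this thread, the call would look like `net = build_before_imports("MCS2018", "Predictor", (gpu_id,), later_modules=("torchvision.transforms",), conflicting=("torch", "torchvision"))`, where `gpu_id` is the GPU index.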
Posted by: oleggrinch @ May 28, 2018, 10:02 a.m.
Thanks! It worked with torch 0.3.
Could you please also compile it against cuDNN 7.0? It now fails with this error: RuntimeError: [enforce fail at common_cudnn.h:108] version_match || backwards_compatible_7 || patch_compatible. cuDNN compiled (7103) and runtime (7003) versions mismatch
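The numbers in this error are cuDNN version integers, encoded as major*1000 + minor*100 + patch, so the library was compiled against cuDNN 7.1.3 while the machine provides 7.0.3 at runtime. Below is a sketch of the check, reconstructed only from the three flag names in the message; the exact conditions in the Caffe2 source may differ:

```python
def decode_cudnn(version):
    """Split a cuDNN 7.x version integer (major*1000 + minor*100 + patch)."""
    return version // 1000, (version % 1000) // 100, version % 100


def cudnn_versions_compatible(compiled, runtime):
    # Assumption: mirrors the three conditions named in the error message
    version_match = compiled == runtime
    compiled_with_7 = compiled >= 7000
    # a runtime at least as new as the compiled version is accepted for cuDNN >= 7
    backwards_compatible_7 = compiled_with_7 and runtime >= compiled
    # only the patch level differs (same major.minor)
    patch_compatible = compiled_with_7 and runtime // 100 == compiled // 100
    return version_match or backwards_compatible_7 or patch_compatible
```

Under this reading the runtime is older than the build and also differs in the minor version (7.0 vs 7.1), so all three conditions fail; the same holds for the 7103/7005 pair reported later in the thread.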
Have you tried this order of imports?
>>>
1. import MCS2018
net = MCS2018.Predictor(gpu_id)
# and then import transforms
import torchvision
Yes, step 1 didn't work for me until I downgraded torch.
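Since step 1 alone was not enough until torch was downgraded, a script could check the installed torch version up front. A minimal sketch; `parse_version` and `torch_is_pre_04` are hypothetical helpers, and the 0.4 cut-off simply reflects what worked in this thread:

```python
def parse_version(version):
    """Turn "0.3.1" or "0.3.1.post2" into a tuple of ints like (0, 3, 1).

    Keeps the leading digits of each dot-separated piece, stops at the
    first piece without leading digits, and ignores a "+local" tag.
    """
    numbers = []
    for piece in version.split("+")[0].split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        numbers.append(int(digits))
    return tuple(numbers)


def torch_is_pre_04(version):
    # Cut-off taken from this thread: 0.3.x worked, 0.4 crashed
    return parse_version(version)[:2] < (0, 4)
```

Because importing torch before the predictor exists might itself trigger the conflict (an assumption), the version string can be read without importing torch, for example via `pkg_resources.get_distribution("torch").version` from setuptools.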
Posted by: arsenyinfo @ May 28, 2018, 10:17 a.m.
This step should also solve the cuDNN version mismatch problem.
Posted by: oleggrinch @ May 28, 2018, 1:53 p.m.
I just tried both the downgrade and the import-order change, and a new, even more interesting issue appeared:
[MCS2018] Initializing.
[MCS2018] Mode: Gpu 0
[MCS2018] Loading predictor.
[MCS2018] Ready.
/home/arseny/.pyenv/versions/3.6.3/lib/python3.6/site-packages/torchvision/transforms/transforms.py:188: UserWarning: The use of the transforms.Scale transform is deprecated, please use transforms.Resize instead.
"please use transforms.Resize instead.")
0%| | 0/1068366 [00:00<?, ?it/s]
*** Error in `python': malloc(): memory corruption: 0x0000000001edbe80 ***
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x777e5)[0x7f61f5d667e5]
/lib/x86_64-linux-gnu/libc.so.6(+0x8213e)[0x7f61f5d7113e]
/lib/x86_64-linux-gnu/libc.so.6(__libc_malloc+0x54)[0x7f61f5d73184]
/usr/lib/x86_64-linux-gnu/libstdc++.so.6(_Znwm+0x18)[0x7f61ef1eae78]
/usr/lib/x86_64-linux-gnu/libstdc++.so.6(_ZNSs4_Rep9_S_createEmmRKSaIcE+0x59)[0x7f61ef22be39]
/usr/lib/x86_64-linux-gnu/libstdc++.so.6(_ZNSs4_Rep8_M_cloneERKSaIcEm+0x1b)[0x7f61ef22cc6b]
/usr/lib/x86_64-linux-gnu/libstdc++.so.6(_ZNSs7reserveEm+0x44)[0x7f61ef22cd24]
/usr/lib/x86_64-linux-gnu/libstdc++.so.6(_ZNSt15basic_stringbufIcSt11char_traitsIcESaIcEE8overflowEi+0xc2)[0x7f61ef223ee2]
/usr/lib/x86_64-linux-gnu/libstdc++.so.6(_ZNSt15basic_streambufIcSt11char_traitsIcEE6xsputnEPKcl+0x89)[0x7f61ef27ae79]
/usr/lib/x86_64-linux-gnu/libstdc++.so.6(_ZSt16__ostream_insertIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_PKS3_l+0x1a6)[0x7f61ef26bec6]
/home/arseny/mcs/src/MCS2018.so(+0xd0ba2b)[0x7f61b7961a2b]
/home/arseny/mcs/src/MCS2018.so(_ZN6caffe211CudnnConvOp13DoRunWithTypeIffffEEbv+0xc2a)[0x7f61b79720ea]
/home/arseny/mcs/src/MCS2018.so(_ZN6caffe211CudnnConvOp11RunOnDeviceEv+0x178)[0x7f61b795f5e8]
/home/arseny/mcs/src/MCS2018.so(+0xcd07cd)[0x7f61b79267cd]
/home/arseny/mcs/src/MCS2018.so(_ZN6caffe29SimpleNet3RunEv+0x172)[0x7f61b74f0b42]
/home/arseny/mcs/src/MCS2018.so(+0x861ac3)[0x7f61b74b7ac3]
/home/arseny/mcs/src/MCS2018.so(+0x834027)[0x7f61b748a027]
/home/arseny/mcs/src/MCS2018.so(+0x83b4c7)[0x7f61b74914c7]
/home/arseny/mcs/src/MCS2018.so(+0x825c1a)[0x7f61b747bc1a]
python(_PyCFunction_FastCallKeywords+0x1ac)[0x4aedac]
python[0x540268]
python(_PyEval_EvalFrameDefault+0x102d)[0x54251d]
python[0x540105]
python(PyEval_EvalCodeEx+0x3e)[0x54101e]
python[0x485796]
python(PyObject_Call+0x5c)[0x452fac]
python(_PyEval_EvalFrameDefault+0x2d2a)[0x54421a]
python[0x53f761]
python[0x540611]
python(_PyEval_EvalFrameDefault+0x102d)[0x54251d]
python[0x540105]
python[0x5403b7]
python(_PyEval_EvalFrameDefault+0x102d)[0x54251d]
python[0x540105]
python[0x5403b7]
python(_PyEval_EvalFrameDefault+0x102d)[0x54251d]
python[0x540105]
python(PyEval_EvalCode+0x23)[0x540f93]
python(PyRun_FileExFlags+0x16f)[0x42762f]
python(PyRun_SimpleFileExFlags+0xec)[0x42785c]
python(Py_Main+0xd85)[0x43bc25]
python(main+0x162)[0x41dd72]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0)[0x7f61f5d0f830]
Please try the following .so (for Python 3.5 for now):
http://mcs2018-competition.visionlabs.ru/distribs/test/MCS2018.cpython-35m-x86_64-linux-gnu.so
Update:
3.6:
http://mcs2018-competition.visionlabs.ru/distribs/test/MCS2018.cpython-36m-x86_64-linux-gnu.so
3.5:
http://mcs2018-competition.visionlabs.ru/distribs/test/MCS2018.cpython-35m-x86_64-linux-gnu.so
2.7:
http://mcs2018-competition.visionlabs.ru/distribs/test/MCS2018.so
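Picking the right test build from the links above can be automated from the running interpreter's version. A sketch; `build_url`, `BASE_URL` and `BUILDS` are hypothetical names, and the table only covers the three builds posted here:

```python
import sys

BASE_URL = "http://mcs2018-competition.visionlabs.ru/distribs/test/"
# Test builds posted above, keyed by (major, minor) Python version
BUILDS = {
    (3, 6): "MCS2018.cpython-36m-x86_64-linux-gnu.so",
    (3, 5): "MCS2018.cpython-35m-x86_64-linux-gnu.so",
    (2, 7): "MCS2018.so",
}


def build_url(version_info=sys.version_info):
    """Return the download URL of the test build matching a Python version."""
    key = (version_info[0], version_info[1])
    if key not in BUILDS:
        raise RuntimeError("no test build posted for Python %d.%d" % key)
    return BASE_URL + BUILDS[key]
```

On Linux, CPython 3 accepts both the tagged suffix (such as `.cpython-36m-x86_64-linux-gnu.so`) and a plain `.so` for extension modules, as listed in `importlib.machinery.EXTENSION_SUFFIXES`, so keeping the downloaded file's original name should still let `import MCS2018` find it.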
I downloaded the latest posted build for Ubuntu/CUDA 9.0/Python 3.6.
At first I was getting a segfault, but then I saw this thread and downgraded torch to 0.3.1.
Now I get:
python prepare_data.py --root data/student_model_imgs/ --datalist_path data/datalist/ --datalist_type train --gpu_id 0
[MCS2018] Initializing.
[MCS2018] Mode: Gpu 0
[MCS2018] Loading predictor.
[MCS2018] Ready.
/home/sadworker/anaconda3/lib/python3.6/site-packages/torchvision/transforms/transforms.py:188: UserWarning: The use of the transforms.Scale transform is deprecated, please use transforms.Resize instead.
"please use transforms.Resize instead.")
0%| | 0/1000 [00:00<?, ?it/s]
*** Error in `python': malloc(): memory corruption: 0x000055cb84c24740
@tetelias
try
http://mcs2018-competition.visionlabs.ru/distribs/test/MCS2018.cpython-36m-x86_64-linux-gnu.so
It works now, thanks!
A couple of things in the GitHub instructions need fixing:
in step 5, it's better to remove --datalist ../data/datalist_small/
in step 6, the file is called best_model_chkpt.t7, not best_model_ckpt.t7 :)
I spoke too soon: when I try to run step 8 from the GitHub instructions, it fails with:
python evaluate.py --attack_root ./baseline1/ --target_dscr ./data/val_descriptors.npy --submit_name Baseline1 --gpu_id 0
[MCS2018] Initializing.
[MCS2018] Mode: Gpu 0
Traceback (most recent call last):
  File "evaluate.py", line 160, in <module>
    main(args)
  File "evaluate.py", line 60, in main
    net = MCS2018.Predictor(args.gpu_id)
RuntimeError: [enforce fail at common_cudnn.h:108] version_match || backwards_compatible_7 || patch_compatible. cuDNN compiled (7103) and runtime (7005) versions mismatch
Everything works now: a neighbouring thread recommends initializing net before importing from torchvision.
Posted by: tetelias @ May 31, 2018, 2:15 p.m.