CodaLab -

> segfault (Ubuntu 16.04, Python 3.6, CUDA 9.0)

Hi guys,
Unfortunately, the model crashes every time on this configuration (Ubuntu 16.04, Python 3.6, CUDA 9.0).

Posted by: arsenyinfo @ May 27, 2018, 10:18 p.m.

To be exact, it's Python 3.6.3, in case that matters.

Posted by: arsenyinfo @ May 27, 2018, 10:19 p.m.

I've investigated a bit; it looks like a mysterious conflict with `torchvision.transforms`:

arseny@cobalt:~/mcs$ ipython
Python 3.6.3 (default, Mar 31 2018, 11:33:10)
Type 'copyright', 'credits' or 'license' for more information
IPython 6.2.1 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import MCS2018
In [2]: from torchvision import transforms
Segmentation fault

In [1]: from torchvision import transforms
In [2]: import MCS2018
Segmentation fault

Posted by: arsenyinfo @ May 27, 2018, 10:41 p.m.
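Since a segfault takes down the whole interpreter, one way to compare import orders systematically is to run each ordering in a fresh child process and inspect its exit code. This is a minimal sketch, not anything shipped with MCS2018: `run_statements` and `probe_orders` are hypothetical helpers, and it assumes a POSIX system, where a child killed by a signal gets a negative return code (for example -11 for SIGSEGV). The `-X faulthandler` option makes the child print a Python-level traceback to stderr if it crashes.

```python
import itertools
import subprocess
import sys


def run_statements(statements):
    """Run the given Python statements, in order, in a fresh interpreter.

    Returns the child's exit code: 0 on success, 1 on an uncaught Python
    exception, and -N if the child was killed by signal N (POSIX only),
    e.g. -11 for a segmentation fault.
    """
    code = "\n".join(statements)
    proc = subprocess.run(
        # -X faulthandler dumps a Python traceback to stderr if the child segfaults
        [sys.executable, "-X", "faulthandler", "-c", code],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    return proc.returncode


def probe_orders(statements):
    """Try every ordering of the statements and map each ordering to its exit code."""
    return {
        order: run_statements(order)
        for order in itertools.permutations(statements)
    }
```

For example, `probe_orders(["import MCS2018", "from torchvision import transforms"])` runs both orderings from the sessions above, and `run_statements(["import MCS2018", "net = MCS2018.Predictor(0)", "from torchvision import transforms"])` checks a single predictor-first ordering without permuting it.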

Please try the following:
1. Create the predictor first, and only then import torchvision:
import MCS2018
gpu_id = 0  # index of the GPU to run on
net = MCS2018.Predictor(gpu_id)
# only now import transforms
from torchvision import transforms

2. If that does not help, please downgrade PyTorch from 0.4 to 0.3 and then try the commands in 1.

The two libraries somehow conflict, but if you create the net before importing torchvision, it works.

Posted by: oleggrinch @ May 28, 2018, 10:02 a.m.
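The predictor-first order can also be enforced in code, so that a script stops with a clear error instead of segfaulting when torchvision has already been imported somewhere else. A minimal sketch, assuming nothing beyond the advice above; `create_then_import` is a hypothetical helper, and the factory is passed in so the sketch does not depend on MCS2018 itself.

```python
import importlib
import sys


def create_then_import(factory, late_modules):
    """Call factory() first, then import each module named in late_modules.

    Raises RuntimeError if a late module (or its top-level package) is already
    loaded, because then the import-order workaround can no longer apply.
    """
    too_early = [
        name for name in late_modules
        if name in sys.modules or name.split(".")[0] in sys.modules
    ]
    if too_early:
        raise RuntimeError("already imported: " + ", ".join(too_early))
    obj = factory()  # e.g. build the MCS2018 predictor here
    modules = [importlib.import_module(name) for name in late_modules]
    return obj, modules
```

With MCS2018 available, a call would look like `net, (transforms,) = create_then_import(lambda: MCS2018.Predictor(0), ["torchvision.transforms"])`. The check on the top-level package matters because importing anything from torchvision loads the `torchvision` package first.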

Thanks! It worked with torch 0.3.
Could you please also build it against cuDNN 7? It now fails with this error:
RuntimeError: [enforce fail at common_cudnn.h:108] version_match || backwards_compatible_7 || patch_compatible. cuDNN compiled (7103) and runtime (7003) versions mismatch

Posted by: arsenyinfo @ May 28, 2018, 10:12 a.m.
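The numbers in this error are cuDNN version codes of the form major*1000 + minor*100 + patch, so 7103 is cuDNN 7.1.3 (what MCS2018.so was compiled against) and 7003 is 7.0.3 (what was loaded at runtime). The sketch below restates the enforced condition in Python. It is based on a reading of the check in Caffe2's common_cudnn.h (exact match, or, for builds against cuDNN 7+, either a runtime at least as new as the compiled version or the same major.minor) and should be treated as an approximation, not the library's own code.

```python
def decode_cudnn_version(code):
    """Split a cuDNN version code (major*1000 + minor*100 + patch) into a tuple."""
    return code // 1000, (code % 1000) // 100, code % 100


def cudnn_versions_compatible(compiled, runtime):
    """Approximate the condition behind 'cuDNN compiled and runtime versions mismatch'."""
    version_match = compiled == runtime
    compiled_with_7 = compiled >= 7000
    # the runtime library is at least as new as the one the binary was built against
    backwards_compatible_7 = compiled_with_7 and runtime >= compiled
    # same major.minor, only the patch level differs
    patch_compatible = compiled_with_7 and runtime // 100 == compiled // 100
    return version_match or backwards_compatible_7 or patch_compatible
```

Under this reading, the 7103 build fails against both runtimes reported in this thread (7003 here, 7005 further down), since neither is newer than 7.1.3 nor shares its 7.1 major.minor. In a PyTorch session, `torch.backends.cudnn.version()` returns the version code of the cuDNN library PyTorch loaded, which is one way to see which runtime a process ends up with.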

Have you tried this order of imports?

>>>
1. import MCS2018
net = MCS2018.Predictor(gpu_id)
# and then import transforms
from torchvision import transforms

Posted by: oleggrinch @ May 28, 2018, 10:16 a.m.

Yes; step 1 didn't work for me until I downgraded torch.

Posted by: arsenyinfo @ May 28, 2018, 10:17 a.m.

This step should also solve the cuDNN version mismatch.

Posted by: oleggrinch @ May 28, 2018, 1:53 p.m.

I just tried both the downgrade and the import-order change, and a new, even more interesting issue appeared:

[MCS2018] Initializing.
[MCS2018] Mode: Gpu 0
[MCS2018] Loading predictor.
[MCS2018] Ready.
/home/arseny/.pyenv/versions/3.6.3/lib/python3.6/site-packages/torchvision/transforms/transforms.py:188: UserWarning: The use of the transforms.Scale transform is deprecated, please use transforms.Resize instead.
"please use transforms.Resize instead.")
0%| | 0/1068366 [00:00<?, ?it/s]
*** Error in `python': malloc(): memory corruption: 0x0000000001edbe80 ***
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x777e5)[0x7f61f5d667e5]
/lib/x86_64-linux-gnu/libc.so.6(+0x8213e)[0x7f61f5d7113e]
/lib/x86_64-linux-gnu/libc.so.6(__libc_malloc+0x54)[0x7f61f5d73184]
/usr/lib/x86_64-linux-gnu/libstdc++.so.6(_Znwm+0x18)[0x7f61ef1eae78]
/usr/lib/x86_64-linux-gnu/libstdc++.so.6(_ZNSs4_Rep9_S_createEmmRKSaIcE+0x59)[0x7f61ef22be39]
/usr/lib/x86_64-linux-gnu/libstdc++.so.6(_ZNSs4_Rep8_M_cloneERKSaIcEm+0x1b)[0x7f61ef22cc6b]
/usr/lib/x86_64-linux-gnu/libstdc++.so.6(_ZNSs7reserveEm+0x44)[0x7f61ef22cd24]
/usr/lib/x86_64-linux-gnu/libstdc++.so.6(_ZNSt15basic_stringbufIcSt11char_traitsIcESaIcEE8overflowEi+0xc2)[0x7f61ef223ee2]
/usr/lib/x86_64-linux-gnu/libstdc++.so.6(_ZNSt15basic_streambufIcSt11char_traitsIcEE6xsputnEPKcl+0x89)[0x7f61ef27ae79]
/usr/lib/x86_64-linux-gnu/libstdc++.so.6(_ZSt16__ostream_insertIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_PKS3_l+0x1a6)[0x7f61ef26bec6]
/home/arseny/mcs/src/MCS2018.so(+0xd0ba2b)[0x7f61b7961a2b]
/home/arseny/mcs/src/MCS2018.so(_ZN6caffe211CudnnConvOp13DoRunWithTypeIffffEEbv+0xc2a)[0x7f61b79720ea]
/home/arseny/mcs/src/MCS2018.so(_ZN6caffe211CudnnConvOp11RunOnDeviceEv+0x178)[0x7f61b795f5e8]
/home/arseny/mcs/src/MCS2018.so(+0xcd07cd)[0x7f61b79267cd]
/home/arseny/mcs/src/MCS2018.so(_ZN6caffe29SimpleNet3RunEv+0x172)[0x7f61b74f0b42]
/home/arseny/mcs/src/MCS2018.so(+0x861ac3)[0x7f61b74b7ac3]
/home/arseny/mcs/src/MCS2018.so(+0x834027)[0x7f61b748a027]
/home/arseny/mcs/src/MCS2018.so(+0x83b4c7)[0x7f61b74914c7]
/home/arseny/mcs/src/MCS2018.so(+0x825c1a)[0x7f61b747bc1a]
python(_PyCFunction_FastCallKeywords+0x1ac)[0x4aedac]
python[0x540268]
python(_PyEval_EvalFrameDefault+0x102d)[0x54251d]
python[0x540105]
python(PyEval_EvalCodeEx+0x3e)[0x54101e]
python[0x485796]
python(PyObject_Call+0x5c)[0x452fac]
python(_PyEval_EvalFrameDefault+0x2d2a)[0x54421a]
python[0x53f761]
python[0x540611]
python(_PyEval_EvalFrameDefault+0x102d)[0x54251d]
python[0x540105]
python[0x5403b7]
python(_PyEval_EvalFrameDefault+0x102d)[0x54251d]
python[0x540105]
python[0x5403b7]
python(_PyEval_EvalFrameDefault+0x102d)[0x54251d]
python[0x540105]
python(PyEval_EvalCode+0x23)[0x540f93]
python(PyRun_FileExFlags+0x16f)[0x42762f]
python(PyRun_SimpleFileExFlags+0xec)[0x42785c]
python(Py_Main+0xd85)[0x43bc25]
python(main+0x162)[0x41dd72]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0)[0x7f61f5d0f830]

Posted by: arsenyinfo @ May 28, 2018, 1:59 p.m.
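Reading a glibc backtrace like this one mostly comes down to finding the frames that belong to the extension module. A small sketch for that, assuming the `object(symbol+offset)[address]` frame layout seen above; `parse_frame` and `frames_in` are hypothetical names. The mangled names it returns can be passed to binutils' `c++filt` for demangling, e.g. `_ZN6caffe211CudnnConvOp11RunOnDeviceEv` becomes `caffe2::CudnnConvOp::RunOnDevice()`.

```python
import os
import re

# object(symbol+offset)[address]; the "(symbol+offset)" part is optional
FRAME_RE = re.compile(
    r"^(?P<obj>[^(\[]+)"
    r"(?:\((?P<sym>[^)+]*)\+(?P<off>0x[0-9a-f]+)\))?"
    r"\[(?P<addr>0x[0-9a-f]+)\]$"
)


def parse_frame(line):
    """Return (object, symbol, offset, address) for one backtrace line, or None."""
    m = FRAME_RE.match(line.strip())
    if m is None:
        return None
    # an empty symbol, as in "(+0x777e5)", means no exported name was available
    return m.group("obj"), m.group("sym") or None, m.group("off"), m.group("addr")


def frames_in(lines, object_name):
    """List (symbol, offset) for frames whose object file is named object_name, in order."""
    result = []
    for line in lines:
        frame = parse_frame(line)
        if frame and os.path.basename(frame[0]) == object_name:
            result.append((frame[1], frame[2]))
    return result
```

Applied to the backtrace above, `frames_in(lines, "MCS2018.so")` yields nine frames, three of them named: `caffe2::CudnnConvOp::DoRunWithType<float, float, float, float>()`, `caffe2::CudnnConvOp::RunOnDevice()` and `caffe2::SimpleNet::Run()`. That places the failure inside a cuDNN convolution run by the Caffe2 code bundled in MCS2018.so; the libstdc++ string allocation on top is where glibc detected the corruption, which is not necessarily where it was caused.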

Please try the following .so (Python 3.5 only, for now):
http://mcs2018-competition.visionlabs.ru/distribs/test/MCS2018.cpython-35m-x86_64-linux-gnu.so

Posted by: oleggrinch @ May 28, 2018, 5:38 p.m.

Update:
3.6:
http://mcs2018-competition.visionlabs.ru/distribs/test/MCS2018.cpython-36m-x86_64-linux-gnu.so

3.5:
http://mcs2018-competition.visionlabs.ru/distribs/test/MCS2018.cpython-35m-x86_64-linux-gnu.so

2.7:
http://mcs2018-competition.visionlabs.ru/distribs/test/MCS2018.so

Posted by: oleggrinch @ May 30, 2018, 9:17 a.m.
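The three builds differ only in the Python version (ABI) they target. As a sketch, the download URL for the running interpreter can be picked from `sys.version_info`; the mapping below just restates the links in this post, and `build_url_for` is a hypothetical helper.

```python
import sys

BASE_URL = "http://mcs2018-competition.visionlabs.ru/distribs/test/"

# file names as posted in this thread, keyed by (major, minor) Python version
BUILDS = {
    (3, 6): "MCS2018.cpython-36m-x86_64-linux-gnu.so",
    (3, 5): "MCS2018.cpython-35m-x86_64-linux-gnu.so",
    (2, 7): "MCS2018.so",
}


def build_url_for(version=None):
    """Return the test-build URL for a (major, minor) version; defaults to the running Python."""
    if version is None:
        version = tuple(sys.version_info[:2])
    try:
        return BASE_URL + BUILDS[version]
    except KeyError:
        raise ValueError("no MCS2018 test build posted for Python %d.%d" % version)
```

Keeping the posted file name matters on Python 3: on Linux CPython 3.6, `.cpython-36m-x86_64-linux-gnu.so` is one of `importlib.machinery.EXTENSION_SUFFIXES`, so `import MCS2018` finds the file under that name.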

I downloaded the latest posted build for Ubuntu/CUDA 9.0/Python 3.6.
At first I was getting a segfault, but then I saw this thread and downgraded torch to 0.3.1.
Now I get:
python prepare_data.py --root data/student_model_imgs/ --datalist_path data/datalist/ --datalist_type train --gpu_id 0
[MCS2018] Initializing.
[MCS2018] Mode: Gpu 0
[MCS2018] Loading predictor.
[MCS2018] Ready.
/home/sadworker/anaconda3/lib/python3.6/site-packages/torchvision/transforms/transforms.py:188: UserWarning: The use of the transforms.Scale transform is deprecated, please use transforms.Resize instead.
"please use transforms.Resize instead.")
0%| | 0/1000 [00:00<?, ?it/s]
*** Error in `python': malloc(): memory corruption: 0x000055cb84c24740

Posted by: tetelias @ May 30, 2018, 8:06 p.m.

@tetelias
try
http://mcs2018-competition.visionlabs.ru/distribs/test/MCS2018.cpython-36m-x86_64-linux-gnu.so

Posted by: a.parkin @ May 31, 2018, 8:44 a.m.

It works now, thanks!

A couple of things in the GitHub instructions need fixing:
in step 5, it's better to remove --datalist ../data/datalist_small/
in step 6, the file is called best_model_chkpt.t7, not best_model_ckpt.t7 :)

Posted by: tetelias @ May 31, 2018, 12:55 p.m.

I spoke too soon: when I try to run step 8 from GitHub, it fails with:
python evaluate.py --attack_root ./baseline1/ --target_dscr ./data/val_descriptors.npy --submit_name Baseline1 --gpu_id 0
[MCS2018] Initializing.
[MCS2018] Mode: Gpu 0
Traceback (most recent call last):
File "evaluate.py", line 160, in <module>
main(args)
File "evaluate.py", line 60, in main
net = MCS2018.Predictor(args.gpu_id)
RuntimeError: [enforce fail at common_cudnn.h:108] version_match || backwards_compatible_7 || patch_compatible. cuDNN compiled (7103) and runtime (7005) versions mismatch

Posted by: tetelias @ May 31, 2018, 2:01 p.m.

Everything works: a neighboring thread recommends initializing net before importing from torchvision.

Posted by: tetelias @ May 31, 2018, 2:15 p.m.


MCS 2018. Adversarial Attacks on Black Box Face Recognition Forum
