Run TensorFlow on a GPU cluster in a virtualenv
I installed the GPU version of TensorFlow in a virtualenv following these instructions. The problem is that I get a segmentation fault when starting a session. That is, this code:
import tensorflow as tf
sess = tf.InteractiveSession()
exits with the following error:
(tesnsorflowenv)user@machine$ python testtensorflow.py
I tensorflow/stream_executor/dso_loader.cc:101] successfully opened CUDA library libcublas.so.7.0 locally
I tensorflow/stream_executor/dso_loader.cc:93] Couldn't open CUDA library libcudnn.so.6.5. LD_LIBRARY_PATH: :/vol/cuda/7.0.28/lib64
I tensorflow/stream_executor/cuda/cuda_dnn.cc:1382] Unable to load cuDNN DSO
I tensorflow/stream_executor/dso_loader.cc:101] successfully opened CUDA library libcufft.so.7.0 locally
I tensorflow/stream_executor/dso_loader.cc:101] successfully opened CUDA library libcuda.so locally
I tensorflow/stream_executor/dso_loader.cc:101] successfully opened CUDA library libcurand.so.7.0 locally
I tensorflow/core/common_runtime/local_device.cc:40] Local device intra op parallelism threads: 40
Segmentation fault
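The log above reports that libcudnn.so.6.5 could not be opened, with LD_LIBRARY_PATH containing only /vol/cuda/7.0.28/lib64. To see which CUDA libraries are actually present on that path, a minimal sketch like the following could help. The helper name find_shared_library is hypothetical (it is not part of TensorFlow or CUDA), and whether the missing cuDNN library is related to the segmentation fault is not established here.

```python
import glob
import os


def find_shared_library(name, search_path):
    """Return files whose names start with `name` in a colon-separated path."""
    matches = []
    for directory in search_path.split(":"):
        if not directory:
            # Empty entries (the dynamic linker treats them as the current
            # directory) are skipped in this sketch, e.g. the leading ':' above.
            continue
        matches.extend(sorted(glob.glob(os.path.join(directory, name + "*"))))
    return matches


if __name__ == "__main__":
    path = os.environ.get("LD_LIBRARY_PATH", "")
    for lib in ("libcudnn.so", "libcublas.so"):
        # An empty list means no matching file was found on the path.
        print(lib, find_shared_library(lib, path))
```

An empty list for libcudnn.so would be consistent with the "Couldn't open CUDA library" line in the log.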
I tried to dig deeper using gdb, but it only gave the following additional output:
[New Thread 0x7fffdf880700 (LWP 32641)]
[New Thread 0x7fffdf07f700 (LWP 32642)]
... lines omitted
[New Thread 0x7fffadffb700 (LWP 32681)]
[Thread 0x7fffadffb700 (LWP 32681) exited]
Program received signal SIGSEGV, Segmentation fault.
0x0000000000000000 in ?? ()
Any idea what is happening here and how to fix it?
Here is the output of nvidia-smi:
+------------------------------------------------------+
| NVIDIA-SMI 352.63 Driver Version: 352.63 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla K80 On | 0000:06:00.0 Off | 0 |
| N/A 65C P0 142W / 149W | 235MiB / 11519MiB | 81% E. Process |
+-------------------------------+----------------------+----------------------+
| 1 Tesla K80 On | 0000:07:00.0 Off | 0 |
| N/A 25C P8 30W / 149W | 55MiB / 11519MiB | 0% E. Process |
+-------------------------------+----------------------+----------------------+
| 2 Tesla K80 On | 0000:0D:00.0 Off | 0 |
| N/A 27C P8 26W / 149W | 55MiB / 11519MiB | 0% Prohibited |
+-------------------------------+----------------------+----------------------+
| 3 Tesla K80 On | 0000:0E:00.0 Off | 0 |
| N/A 25C P8 28W / 149W | 55MiB / 11519MiB | 0% E. Process |
+-------------------------------+----------------------+----------------------+
| 4 Tesla K80 On | 0000:86:00.0 Off | 0 |
| N/A 46C P0 85W / 149W | 206MiB / 11519MiB | 97% E. Process |
+-------------------------------+----------------------+----------------------+
| 5 Tesla K80 On | 0000:87:00.0 Off | 0 |
| N/A 27C P8 29W / 149W | 55MiB / 11519MiB | 0% E. Process |
+-------------------------------+----------------------+----------------------+
| 6 Tesla K80 On | 0000:8D:00.0 Off | 0 |
| N/A 28C P8 26W / 149W | 55MiB / 11519MiB | 0% Prohibited |
+-------------------------------+----------------------+----------------------+
| 7 Tesla K80 On | 0000:8E:00.0 Off | 0 |
| N/A 23C P8 30W / 149W | 55MiB / 11519MiB | 0% E. Process |
+-------------------------------+----------------------+----------------------+
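The table shows several GPUs in "E. Process" (exclusive process) mode, two in "Prohibited" mode, and GPUs 0 and 4 already busy. One experiment worth trying, as an untested assumption rather than a confirmed fix, is to expose only a single idle GPU to the process through the CUDA_VISIBLE_DEVICES environment variable before TensorFlow is imported. The helper name restrict_visible_gpus and the choice of GPU 1 are illustrative only.

```python
import os


def restrict_visible_gpus(gpu_ids, environ=os.environ):
    """Set CUDA_VISIBLE_DEVICES to the given GPU indices and return the value."""
    value = ",".join(str(i) for i in gpu_ids)
    environ["CUDA_VISIBLE_DEVICES"] = value
    return value


# Must run before `import tensorflow`, since the CUDA runtime reads the
# variable when it initializes. GPU 1 is idle in the table above (assumption:
# it is still idle when the script runs).
restrict_visible_gpus([1])
```

The same effect can be had from the shell by prefixing the command with CUDA_VISIBLE_DEVICES=1.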
Thanks for any help with this issue!