es-dev-stack's People

Contributors

aspart, mikeorzel2, viklas

es-dev-stack's Issues

libssl-dev, openssl are missing

Running the script, I get this error:
scripts/sign-file.c:23:30: fatal error: openssl/opensslv.h: No such file or directory
#include <openssl/opensslv.h>
^
compilation terminated.
make[1]: *** [scripts/sign-file] Error 1
make: *** [scripts] Error 2

libssl-dev is missing from your apt-get line.
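
A rough sketch of the fix, assuming the Dockerfile installs its build dependencies in a single apt-get line (the other package names here are only illustrative):

  # Add libssl-dev (and openssl) so scripts/sign-file.c can find openssl/opensslv.h
  RUN apt-get update && apt-get install -y \
      build-essential git bc \
      openssl libssl-dev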

Step 13 fails with: gzip: /proc/config.gz: No such file or directory

[on Ubuntu]

docker build -t cuda .

Step 11 : RUN git clone git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git linux
---> Running in 79114fa89e84
Cloning into 'linux'...
Checking out files: 100% (53649/53649), done.
---> 9168a4ef123a
Removing intermediate container 79114fa89e84
Step 12 : WORKDIR linux
---> Running in 1ec4c1e43813
---> 2523f3c1008d
Removing intermediate container 1ec4c1e43813
Step 13 : RUN git checkout -b stable v`uname -r | sed -e "s/-.*//" | sed -e "s/\.[0]*$//"` && zcat /proc/config.gz > .config && make modules_prepare
---> Running in 12bbc24c2fff
Switched to a new branch 'stable'
gzip: /proc/config.gz: No such file or directory
The command '/bin/sh -c git checkout -b stable v`uname -r | sed -e "s/-.*//" | sed -e "s/\.[0]*$//"` && zcat /proc/config.gz > .config && make modules_prepare' returned a non-zero code: 1
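
The failure is expected on a stock Ubuntu host: /proc/config.gz only exists when the kernel was built with CONFIG_IKCONFIG_PROC, and Ubuntu instead ships the config as /boot/config-$(uname -r). A hedged workaround sketch, assuming the host's /boot (or a copy of the config file) is made available to the build:

  # Prefer /proc/config.gz, fall back to the distro-provided config file
  if [ -r /proc/config.gz ]; then
      zcat /proc/config.gz > .config
  else
      cp "/boot/config-$(uname -r)" .config
  fi
  make olddefconfig && make modules_prepare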

Maintain consistency between the CoreOS Linux kernel version and the one used to build the CUDA kernel module

I installed the stable channel of CoreOS to disk, and docker build works with corenvidiadrivers. But when I run docker run -it --privileged cuda, it complains:

ERROR: Unable to load the kernel module 'nvidia.ko'.  This happens most frequently when this kernel module was built against
       the wrong or improperly configured kernel sources, with a version of gcc that differs from the one used to build the
       target kernel, or if a driver such as rivafb, nvidiafb, or nouveau is present and prevents the NVIDIA kernel module
       from obtaining ownership of the NVIDIA graphics device(s), or no NVIDIA GPU installed in this system is supported by
       this NVIDIA Linux graphics driver release.

       Please see the log entries 'Kernel module load error' and 'Kernel messages' at the end of the file
       '/var/log/nvidia-installer.log' for more information.

Also, CoreOS is said to upgrade itself automatically. It seems that every time it upgrades, we would need to rebuild the CUDA kernel module and reload it?
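
One rough way to detect when a rebuild is needed after an auto-update is to compare the running kernel against the version the module was built for (the module path below is an assumption):

  # Rebuild/reload only when the installed module no longer matches the running kernel
  built_for=$(modinfo -F vermagic /path/to/nvidia.ko | awk '{print $1}')
  if [ "$built_for" != "$(uname -r)" ]; then
      echo "kernel changed ($built_for -> $(uname -r)); rebuild the CUDA driver image"
  fi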

The IPython kernel dies at the end of the example (version 1.2)

The Jupyter Notebook kernel dies at the end of the example (BTW, very good job!)

Output from the console:
docker run --device /dev/nvidia0:/dev/nvidia0 --device /dev/nvidia1:/dev/nvidia1 --device /dev/nvidiactl:/dev/nvidiactl --device /dev/nvidia-uvm:/dev/nvidia-uvm -it -p 8888:8888 --privileged tflowgpu
[I 07:25:01.001 NotebookApp] Writing notebook server cookie secret to /root/.local/share/jupyter/runtime/notebook_cookie_secret
[W 07:25:01.018 NotebookApp] WARNING: The notebook server is listening on all IP addresses and not using encryption. This is not recommended.
[W 07:25:01.018 NotebookApp] WARNING: The notebook server is listening on all IP addresses and not using authentication. This is highly insecure and not recommended.
[I 07:25:01.021 NotebookApp] Serving notebooks from local directory: /examples
[I 07:25:01.021 NotebookApp] 0 active kernels
[I 07:25:01.021 NotebookApp] The Jupyter Notebook is running at: http://[all ip addresses on your system]:8888/
[I 07:25:01.021 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[I 07:25:46.927 NotebookApp] Writing notebook-signing key to /root/.local/share/jupyter/notebook_secret
[W 07:25:46.928 NotebookApp] Notebook CNN.ipynb is not trusted
[I 07:25:47.225 NotebookApp] Kernel started: 30c8c283-a7f0-4596-94b3-641563e7cf2d
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:99] Couldn't open CUDA library libcudnn.so. LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64:
I tensorflow/stream_executor/cuda/cuda_dnn.cc:1407] Unable to load cuDNN DSO
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcurand.so locally
[I 07:26:16.736 NotebookApp] Kernel shutdown: 30c8c283-a7f0-4596-94b3-641563e7cf2d
[W 07:26:21.891 NotebookApp] Notebook CNN.ipynb is not trusted
[I 07:26:22.195 NotebookApp] Kernel started: 6daac0d5-a27e-4ef5-96c4-af5e4e6fd345
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:99] Couldn't open CUDA library libcudnn.so. LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64:
I tensorflow/stream_executor/cuda/cuda_dnn.cc:1407] Unable to load cuDNN DSO
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcurand.so locally
I tensorflow/core/common_runtime/gpu/gpu_init.cc:102] Found device 0 with properties:
name: GeForce GTX TITAN X
major: 5 minor: 2 memoryClockRate (GHz) 1.076
pciBusID 0000:04:00.0
Total memory: 12.00GiB
Free memory: 11.87GiB
I tensorflow/core/common_runtime/gpu/gpu_init.cc:102] Found device 1 with properties:
name: GeForce GTX TITAN X
major: 5 minor: 2 memoryClockRate (GHz) 1.076
pciBusID 0000:83:00.0
Total memory: 12.00GiB
Free memory: 11.87GiB
I tensorflow/core/common_runtime/gpu/gpu_init.cc:59] cannot enable peer access from device ordinal 0 to device ordinal 1
I tensorflow/core/common_runtime/gpu/gpu_init.cc:59] cannot enable peer access from device ordinal 1 to device ordinal 0
I tensorflow/core/common_runtime/gpu/gpu_init.cc:126] DMA: 0 1
I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 0: Y N
I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 1: N Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:717] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX TITAN X, pci bus id: 0000:04:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:717] Creating TensorFlow device (/gpu:1) -> (device: 1, name: GeForce GTX TITAN X, pci bus id: 0000:83:00.0)
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 1.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 2.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 4.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 8.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 16.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 32.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 64.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 128.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 256.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 512.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 1.00MiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 2.00MiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 4.00MiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 8.00MiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 16.00MiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 32.00MiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 64.00MiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 128.00MiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 256.00MiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 512.00MiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 1.00GiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 2.00GiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 4.00GiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 8.00GiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 16.00GiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 1.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 2.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 4.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 8.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 16.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 32.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 64.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 128.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 256.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 512.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 1.00MiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 2.00MiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 4.00MiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 8.00MiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 16.00MiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 32.00MiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 64.00MiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 128.00MiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 256.00MiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 512.00MiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 1.00GiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 2.00GiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 4.00GiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 8.00GiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 16.00GiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:73] Allocating 11.27GiB bytes.
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:83] GPU 0 memory begins at 0x420d900000 extends to 0x44df1b5400
F tensorflow/stream_executor/cuda/cuda_dnn.cc:204] could not find cudnnCreate in cudnn DSO; dlerror: /usr/local/lib/python2.7/dist-packages/tensorflow/python/_pywrap_tensorflow.so: undefined symbol: cudnnCreate
[I 07:26:43.197 NotebookApp] KernelRestarter: restarting kernel (1/5)
WARNING:root:kernel 6daac0d5-a27e-4ef5-96c4-af5e4e6fd345 restarted
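
The kernel deaths line up with the missing cuDNN library: libcudnn.so is not on the LD_LIBRARY_PATH shown above, and _pywrap_tensorflow.so then aborts on the unresolved cudnnCreate symbol. A hedged sketch of making cuDNN visible inside the image, assuming you have downloaded a cuDNN tarball from NVIDIA (the archive name and install prefix below are illustrative):

  # Install the cuDNN headers/libraries next to the other CUDA libraries, then refresh the loader cache
  tar xzf cudnn-7.0-linux-x64-v4.0-prod.tgz
  cp cuda/include/cudnn.h /usr/local/cuda/include/
  cp cuda/lib64/libcudnn* /usr/local/cuda/lib64/
  ldconfig
  # alternatively: export LD_LIBRARY_PATH=/path/to/cudnn/lib64:$LD_LIBRARY_PATH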

Did you intend to checkout 'v3.13.0-48-generic' which can not be resolved as commit?

Step 9 : RUN git checkout -b stable v`uname -r` && zcat /proc/config.gz > .config && make modules_prepare
 ---> Running in d97be9a10152
fatal: Cannot update paths and switch to branch 'stable' at the same time.
Did you intend to checkout 'v3.13.0-48-generic' which can not be resolved as commit?
The command '/bin/sh -c git checkout -b stable v`uname -r` && zcat /proc/config.gz > .config && make modules_prepare' returned a non-zero code: 128

Environment

~$ docker --version
env:  Docker version 1.9.1, build a34a1d5
~$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 14.04.2 LTS
Release:    14.04
Codename:   trusty

Happens for both:

~/es-dev-stack/corenvidiadrivers$ docker build -t cuda .

and

  ~/es-dev-stack/tflowgpu$ docker build -t tflowgpu .
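
The underlying problem is that the upstream linux-stable repository only carries tags such as v3.13 or v3.13.11, so the Ubuntu-flavoured 3.13.0-48-generic reported by uname -r cannot be resolved. A rough sketch of deriving a checkout-able tag, mirroring the sed pipeline used in the other issue above:

  # 3.13.0-48-generic -> 3.13.0 -> 3.13
  KVER=$(uname -r | sed -e 's/-.*//' -e 's/\.[0]*$//')
  git checkout -b stable "v${KVER}"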

Can build CUDA kernel module but can NOT load it.

I am using AWS EC2 g2.8xlarge instances running the CoreOS stable channel with kernel 4.3.6:

core@ip-172-31-32-170 ~# uname -a
Linux ip-172-31-32-170.us-west-2.compute.internal 4.3.6-coreos #2 SMP Tue Apr 5 10:32:16 UTC 2016 x86_64 Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz GenuineIntel GNU/Linux

The compiler used to build the kernel is GCC 4.9.3,

core@ip-172-31-32-170 ~# cat /proc/version 
Linux version 4.3.6-coreos (buildbot@ip-10-204-3-57) (gcc version 4.9.3 (Gentoo Hardened 4.9.3 p1.3, pie-0.6.3) ) #2 SMP Tue Apr 5 10:32:16 UTC 2016

which is the same version as the one used in the Dockerfile https://github.com/emergingstack/es-dev-stack/blob/master/corenvidiadrivers/Dockerfile

root@bd330876a124:/# gcc --version
gcc (Ubuntu 4.9.3-8ubuntu2~14.04) 4.9.3

I checked out the same version of the Linux kernel source code as the CoreOS kernel:

git clone -b v4.3.6 --depth 1 git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git linux

I manually ran every step in the Dockerfile inside docker run -it ubuntu:14.04 /bin/bash. Everything worked fine, but the last command, ./NVIDIA-Linux-x86_64-352.39/nvidia-installer -q -a -n -s --kernel-source-path=/usr/src/kernels/linux/, failed.

For your reference, the tail of /var/log/nvidia-installer.log is as follows:

-> done.
-> Kernel module compilation complete.
-> Unable to determine if Secure Boot is enabled: No such file or directory
ERROR: Unable to load the kernel module 'nvidia.ko'.  This happens most frequently when this kernel module was built against the wrong or improperly configured kernel sources, with a version of gcc that differs from the one used to build the target kernel, or if a driver such as rivafb, nvidiafb, or nouveau is present and prevents the NVIDIA kernel module from obtaining ownership of the NVIDIA graphics device(s), or no NVIDIA GPU installed in this system is supported by this NVIDIA Linux graphics driver release.

Please see the log entries 'Kernel module load error' and 'Kernel messages' at the end of the file '/var/log/nvidia-installer.log' for more information.
-> Kernel module load error: Operation not permitted
-> Kernel messages:
[16796.891483] docker0: port 1(vethef6dba9) entered forwarding state
[16796.994375] docker0: port 1(vethef6dba9) entered disabled state
[16796.998753] eth0: renamed from vethf74bd80
[16797.011532] docker0: port 1(vethef6dba9) entered forwarding state
[16797.015840] docker0: port 1(vethef6dba9) entered forwarding state
[16812.064059] docker0: port 1(vethef6dba9) entered forwarding state
[17191.805854] docker0: port 1(vethef6dba9) entered disabled state
[17191.805982] vethf74bd80: renamed from eth0
[17191.834880] docker0: port 1(vethef6dba9) entered forwarding state
[17191.839654] docker0: port 1(vethef6dba9) entered forwarding state
[17191.851665] docker0: port 1(vethef6dba9) entered disabled state
[17191.857097] device vethef6dba9 left promiscuous mode
[17191.857100] docker0: port 1(vethef6dba9) entered disabled state
[17321.036031] device veth0ad1bb0 entered promiscuous mode
[17321.036313] IPv6: ADDRCONF(NETDEV_UP): veth0ad1bb0: link is not ready
[17321.038931] IPv6: ADDRCONF(NETDEV_CHANGE): veth0ad1bb0: link becomes ready
[17321.038976] docker0: port 1(veth0ad1bb0) entered forwarding state
[17321.038982] docker0: port 1(veth0ad1bb0) entered forwarding state
[17322.119380] docker0: port 1(veth0ad1bb0) entered disabled state
[17322.123987] eth0: renamed from vethe4e5899
[17322.132413] docker0: port 1(veth0ad1bb0) entered forwarding state
[17322.136489] docker0: port 1(veth0ad1bb0) entered forwarding state
[17337.184133] docker0: port 1(veth0ad1bb0) entered forwarding state
[18408.303762] deprecated_sysctl_warning: 27 callbacks suppressed
[18408.308095] warning: process `nvidia-installe' used the deprecated sysctl system call with 1.23.
ERROR: Installation has failed.  Please see the file '/var/log/nvidia-installer.log' for details.  You may find suggestions on fixing installation problems in the README available on the Linux driver download page at www.nvidia.com.
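
Note that the log reports "Kernel module load error: Operation not permitted": the module built, but insmod was blocked, which is expected when the container is not allowed to load kernel modules. The manual session above was started without --privileged, unlike the run command used elsewhere in this project. A hedged retry sketch (this removes one obvious blocker, but does not rule out the kernel-version mismatch discussed above):

  # Re-run the manual session with the privileges needed to load kernel modules
  docker run -it --privileged ubuntu:14.04 /bin/bash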

Consider using CoreOS developer container

See https://gist.github.com/marineam/9914debc25c8d7dc458f
It uses the same toolchain as the one that built the CoreOS kernel and it doesn't require running Ubuntu. I haven't tested it, but I think the current approach results in a much larger driver image than necessary.

Perhaps this approach would work, for a given CoreOS release XYZ:

  1. Grab cuda_*run and extract it to a directory on the host machine. You can't do that inside the container, because it doesn't have enough space to download the archive, let alone expand it, etc.
  2. Prepare a shell script to be run on the target nodes, which performs sanity checks, then runs insmod
  3. Run container version XYZ using systemd-nspawn, adding --bind for the cuda directory, the script, and any other files/directories needed. Make nspawn run a Makefile or similar (CoreOS doesn't ship with Make, but the container does!) which outputs a tar file with just the driver, a shell script, and maybe insmod (see the sketch after this list)
  4. Run docker import to create a Docker image from the tarball (which can be piped), with version XYZ and the shell script as the entrypoint
  5. ???
  6. Profit!
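
For steps 3 and 4, a rough, untested sketch of what the nspawn invocation and the docker import could look like (every path, the XYZ placeholder, and the tarball/script names are illustrative):

  # Step 3: build inside the CoreOS developer container, binding in the CUDA bits and an output dir
  sudo systemd-nspawn \
      --directory=/path/to/coreos_developer_container_XYZ \
      --bind=/opt/cuda:/opt/cuda \
      --bind=/opt/out:/opt/out \
      /bin/bash -c 'cd /opt/cuda && make && tar cf /opt/out/nvidia-driver-XYZ.tar nvidia.ko load.sh'

  # Step 4: turn the tarball into a Docker image, with the load script as the entrypoint
  docker import -c 'ENTRYPOINT ["/load.sh"]' /opt/out/nvidia-driver-XYZ.tar nvidia-driver:XYZ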

What do you think? There'd be no Dockerfile left at the end, but it might be worth it.

CUDA behaviour when changing GPUs

Hi @zheng-xq,

I got a message from Eugene on our blog mentioning that you may have some information about strange NVIDIA behaviour with Docker and CUDA drivers when changing cards, where the NVIDIA drivers push invalid firmware to a GPU. I'd love to discuss it with you sometime or get some insight into this behaviour.

Thanks

Mike

Unable to run cuda:latest image - nvidia: version magic

Shortly after running the container, stdout prints:

ERROR: Unable to load the kernel module 'nvidia.ko'.  This happens most frequently when this kernel module was built
       against the wrong or improperly configured kernel sources, with a version of gcc that differs from the one
       used to build the target kernel, or if a driver such as rivafb, nvidiafb, or nouveau is present and prevents
       the NVIDIA kernel module from obtaining ownership of the NVIDIA graphics device(s), or no NVIDIA GPU installed
       in this system is supported by this NVIDIA Linux graphics driver release.

       Please see the log entries 'Kernel module load error' and 'Kernel messages' at the end of the file
       '/var/log/nvidia-installer.log' for more information.

ERROR: Installation has failed.  Please see the file '/var/log/nvidia-installer.log' for details.  You may find
       suggestions on fixing installation problems in the README available on the Linux driver download page at
       www.nvidia.com.

In /var/log/nvidia-installer.log:

-> done.
-> Kernel module compilation complete.
ERROR: Unable to load the kernel module 'nvidia.ko'.  This happens most frequently when this kernel module was built against the wrong or improperly configured kernel sources, with a version of gcc that differs from the one used to build the target kernel, or if a driver such as rivafb, nvidiafb, or nouveau is present and prevents the NVIDIA kernel module from obtaining ownership of the NVIDIA graphics device(s), or no NVIDIA GPU installed in this system is supported by this NVIDIA Linux graphics driver release.

Please see the log entries 'Kernel module load error' and 'Kernel messages' at the end of the file '/var/log/nvidia-installer.log' for more information.
-> Kernel module load error: Exec format error
-> Kernel messages:
[245414.289012] vgaarb: device changed decodes: PCI:0000:06:00.0,olddecodes=none,decodes=none:owns=none
[245414.289295] NVRM: loading NVIDIA UNIX x86_64 Kernel Module  352.39  Fri Aug 14 18:09:10 PDT 2015
[245414.290603] nvidia_uvm: version magic '4.5.0-coreos-r1.0 SMP mod_unload ' should be '4.5.0-coreos-r1 SMP mod_unload '
[245530.126055] vgaarb: device changed decodes: PCI:0000:01:00.0,olddecodes=none,decodes=none:owns=io+mem
[245530.126355] vgaarb: device changed decodes: PCI:0000:06:00.0,olddecodes=none,decodes=none:owns=none
[245530.126647] NVRM: loading NVIDIA UNIX x86_64 Kernel Module  352.39  Fri Aug 14 18:09:10 PDT 2015
[245530.127915] nvidia_uvm: version magic '4.5.0-coreos-r1.0 SMP mod_unload ' should be '4.5.0-coreos-r1 SMP mod_unload '
[245683.707725] docker0: port 1(veth22fab08) entered disabled state
[245683.714388] vethac0f204: renamed from eth0
[245683.736914] docker0: port 1(veth22fab08) entered forwarding state
[245683.743732] docker0: port 1(veth22fab08) entered forwarding state
[245683.838647] docker0: port 1(veth22fab08) entered disabled state
[245683.848227] device veth22fab08 left promiscuous mode
[245683.854539] docker0: port 1(veth22fab08) entered disabled state
[245689.662547] device veth1ced9a7 entered promiscuous mode
[245689.668489] IPv6: ADDRCONF(NETDEV_UP): veth1ced9a7: link is not ready
[245689.676076] IPv6: ADDRCONF(NETDEV_CHANGE): veth1ced9a7: link becomes ready
[245689.683594] docker0: port 1(veth1ced9a7) entered forwarding state
[245689.690294] docker0: port 1(veth1ced9a7) entered forwarding state
[245689.850591] docker0: port 1(veth1ced9a7) entered disabled state
[245689.857775] eth0: renamed from veth91b59ba
[245689.871251] docker0: port 1(veth1ced9a7) entered forwarding state
[245689.878092] docker0: port 1(veth1ced9a7) entered forwarding state
[245704.917474] docker0: port 1(veth1ced9a7) entered forwarding state
[245729.922871] nvidia: version magic '4.5.0-coreos-r1.0 SMP mod_unload ' should be '4.5.0-coreos-r1 SMP mod_unload '
ERROR: Installation has failed.  Please see the file '/var/log/nvidia-installer.log' for details.  You may find suggestions on fixing installation problems in the README available on the Linux driver download page at www.nvidia.com.

I am running 4.5.0-coreos-r1; /usr/src/kernels/linux contains the 4.5.5 kernel.

I forced the container to run by editing /usr/src/kernels/linux/include/generated/utsrelease.h from:

#define UTS_RELEASE "4.5.0-coreos-r1.0"

to

#define UTS_RELEASE "4.5.0-coreos-r1"

Line 21 of the current master Dockerfile is probably redundant.
