Following the remote-launch outline laid out in @albarji's blog post, I'm:

1. Booting a remote p2.xlarge server with Ubuntu 16.04.2 LTS (GNU/Linux 4.4.0-1020-aws x86_64)
2. Cloning the repo
3. Running the install script

...and I get this:
```
./scripts/install-nvidia.sh
Reading package lists... Done
Building dependency tree
Reading state information... Done
Note, selecting 'libc6-dev' instead of 'libc-dev'
gcc is already the newest version (4:5.3.1-1ubuntu1).
make is already the newest version (4.1-6).
libc6-dev is already the newest version (2.23-0ubuntu9).
0 upgraded, 0 newly installed, 0 to remove and 128 not upgraded.
--2018-01-11 15:31:19-- http://us.download.nvidia.com/XFree86/Linux-x86_64/361.42/NVIDIA-Linux-x86_64-361.42.run
Resolving us.download.nvidia.com (us.download.nvidia.com)... 192.229.211.70, 2606:2800:21f:3aa:dcf:37b:1ed6:1fb
Connecting to us.download.nvidia.com (us.download.nvidia.com)|192.229.211.70|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 86760004 (83M) [application/octet-stream]
Saving to: ‘/tmp/NVIDIA-Linux-x86_64-361.42.run.1’
NVIDIA-Linux-x86_64-361.42.run.1 100%[=============================>] 82.74M 140MB/s in 0.6s
2018-01-11 15:31:19 (140 MB/s) - ‘/tmp/NVIDIA-Linux-x86_64-361.42.run.1’ saved [86760004/86760004]
Verifying archive integrity... OK
Uncompressing NVIDIA Accelerated Graphics Driver for Linux-x86_64 361.42..............................
ERROR: Unable to load the kernel module 'nvidia.ko'. This happens most frequently when this kernel module was built against the wrong or improperly configured kernel sources,
with a version of gcc that differs from the one used to build the target kernel, or if a driver such as rivafb, nvidiafb, or nouveau is present and prevents the NVIDIA
kernel module from obtaining ownership of the NVIDIA graphics device(s), or no NVIDIA GPU installed in this system is supported by this NVIDIA Linux graphics driver
release.
Please see the log entries 'Kernel module load error' and 'Kernel messages' at the end of the file '/var/log/nvidia-installer.log' for more information.
ERROR: Installation has failed. Please see the file '/var/log/nvidia-installer.log' for details. You may find suggestions on fixing installation problems in the README
available on the Linux driver download page at www.nvidia.com.
--2018-01-11 15:31:53-- https://github.com/NVIDIA/nvidia-docker/releases/download/v1.0.1/nvidia-docker_1.0.1-1_amd64.deb
Resolving github.com (github.com)... 192.30.253.113, 192.30.253.112
Connecting to github.com (github.com)|192.30.253.113|:443... connected.
HTTP request sent, awaiting response... 502 Bad Gateway
2018-01-11 15:31:53 ERROR 502: Bad Gateway.
dpkg: error processing archive /tmp/nvidia-docker*.deb (--install):
 cannot access archive: No such file or directory
Errors were encountered while processing:
 /tmp/nvidia-docker*.deb
sudo: nvidia-docker: command not found
```
This looks like a driver mismatch. Unfortunately I can't test this locally (wrong GPU), so I'm left guessing whether the image needs rebuilding or whether I need to change my EC2 configuration somehow. It looks like the driver version the script installs (361.42) needs a bump.
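The installer's error lists several possible causes for `nvidia.ko` failing to load. A minimal sketch that checks two of them, assuming the raw text of `/proc/modules`, `/proc/version` and `gcc --version` is passed in: whether `nouveau`, `nvidiafb` or `rivafb` is loaded, and whether the local gcc differs from the one the running kernel was built with. The function names are hypothetical; the gcc comparison looks only at the x.y.z version and relies on the older `gcc version x.y.z` wording in `/proc/version`. The 'Kernel module load error' section of `/var/log/nvidia-installer.log` remains the authoritative diagnosis.

```python
import re
from typing import List, Optional, Set

# Drivers the installer error names as able to keep nvidia.ko from
# taking ownership of the GPU.
CONFLICTING_MODULES = {"nouveau", "nvidiafb", "rivafb"}


def loaded_conflicts(proc_modules: str) -> Set[str]:
    """Return conflicting drivers listed in the contents of /proc/modules."""
    # The first whitespace-separated field of each /proc/modules line is the module name.
    loaded = {line.split()[0] for line in proc_modules.splitlines() if line.strip()}
    return loaded & CONFLICTING_MODULES


def kernel_gcc_version(proc_version: str) -> Optional[str]:
    """Return the gcc version recorded in /proc/version, if it uses the 'gcc version x.y.z' form."""
    match = re.search(r"gcc version (\d+\.\d+\.\d+)", proc_version)
    return match.group(1) if match else None


def local_gcc_version(gcc_output: str) -> Optional[str]:
    """Return the first x.y.z version on the first line of `gcc --version` output."""
    lines = gcc_output.splitlines()
    match = re.search(r"\d+\.\d+\.\d+", lines[0]) if lines else None
    return match.group(0) if match else None


def diagnose(proc_modules: str, proc_version: str, gcc_output: str) -> List[str]:
    """Collect warnings for two of the causes named in the installer error."""
    problems = [f"conflicting module loaded: {name}"
                for name in sorted(loaded_conflicts(proc_modules))]
    kernel_gcc = kernel_gcc_version(proc_version)
    local_gcc = local_gcc_version(gcc_output)
    # Only compare when both versions could be extracted.
    if kernel_gcc and local_gcc and kernel_gcc != local_gcc:
        problems.append(f"kernel built with gcc {kernel_gcc}, local gcc is {local_gcc}")
    return problems
```

An empty result means neither check found anything, not that the driver will load; wrong kernel sources or an unsupported GPU are not covered here.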
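The log also shows a second, separate failure: GitHub answered the nvidia-docker download with `502 Bad Gateway`, no `.deb` was saved, and `dpkg` was then handed the literal pattern `/tmp/nvidia-docker*.deb`. A minimal sketch of a download step that retries transient 5xx responses with exponential backoff and otherwise stops, assuming the install logic were driven from Python (the repository's script is shell; `fetch_with_retry`, `urllib_fetch` and the retry policy are hypothetical):

```python
import time
import urllib.error
import urllib.request
from typing import Callable

# Status codes treated as transient; 502 is what GitHub returned in the log.
TRANSIENT_STATUSES = {500, 502, 503, 504}


def fetch_with_retry(
    url: str,
    fetch: Callable[[str], bytes],
    attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bytes:
    """Call fetch(url), retrying transient 5xx errors with exponential backoff.

    Non-transient errors, or the last failed attempt, are re-raised so the
    caller stops before any later step (e.g. dpkg) runs on a missing file.
    """
    for attempt in range(attempts):
        try:
            return fetch(url)
        except urllib.error.HTTPError as err:
            if err.code not in TRANSIENT_STATUSES or attempt == attempts - 1:
                raise
            # Wait base_delay, 2*base_delay, 4*base_delay, ... between tries.
            sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")


def urllib_fetch(url: str) -> bytes:
    """Download url with the standard library; urlopen raises HTTPError on 4xx/5xx."""
    with urllib.request.urlopen(url, timeout=60) as resp:
        return resp.read()
```

`fetch_with_retry(url, urllib_fetch)` returns the package bytes, which would be written to disk only after a successful download, so a failed fetch never reaches `dpkg`. Injecting `fetch` and `sleep` keeps the retry policy testable without network access.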
Hey @wboykinm! It's been a while since I last used that script to deploy this container, so I'm afraid it's quite outdated. My recommendation right now would be to launch an instance from one of the AMIs provided by NVIDIA, which come with the appropriate driver and CUDA toolkit versions already installed.
I use the AMI named "NVIDIA CUDA Toolkit 7.5 on Amazon Linux" and it works pretty well; the only things you need to install manually after creating the instance are Docker and nvidia-docker. After that you should be ready to run the container!
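Once the instance is up and Docker and nvidia-docker are installed, a quick readiness check can confirm that the tools the failed script never provided (the original log ends with `nvidia-docker: command not found`) are on `PATH` and that `nvidia-smi` can see the GPU. A minimal sketch; the function names are hypothetical, and it relies only on `shutil.which` and the `--query-gpu` and `--format=csv,noheader` options of `nvidia-smi`:

```python
import shutil
import subprocess
from typing import Callable, List, Optional, Tuple

# Binaries needed before running the container on the instance.
REQUIRED_TOOLS = ("nvidia-smi", "docker", "nvidia-docker")


def missing_tools(which: Callable[[str], Optional[str]] = shutil.which) -> List[str]:
    """Return the required tools that cannot be found on PATH."""
    return [tool for tool in REQUIRED_TOOLS if which(tool) is None]


def parse_gpu_query(csv_text: str) -> List[Tuple[str, str]]:
    """Parse `name, driver_version` lines from nvidia-smi's CSV output."""
    gpus = []
    for line in csv_text.splitlines():
        if not line.strip():
            continue
        name, driver = (field.strip() for field in line.split(",", 1))
        gpus.append((name, driver))
    return gpus


def query_gpus() -> List[Tuple[str, str]]:
    """Ask nvidia-smi for each GPU's name and driver version."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,driver_version", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_gpu_query(out)
```

`query_gpus()` raises `subprocess.CalledProcessError` if `nvidia-smi` exits with a non-zero status and `FileNotFoundError` if it is not installed, so calling `missing_tools()` first gives the clearer message.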