
Xorg fatal error: (EE) no screens found(EE) #11

Closed
reinismu opened this issue Aug 3, 2021 · 11 comments

@reinismu

reinismu commented Aug 3, 2021

Have been trying to get it to work for a while now.

What seems to stand out

[  2386.268] (EE) NVIDIA(GPU-0): Failed to acquire modesetting permission.
[  2386.268] (EE) NVIDIA(0): Failing initialization of X screen

Any ideas?

/var/log/Xorg.0.log
[  2386.258] 
X.Org X Server 1.20.9
X Protocol Version 11, Revision 0
[  2386.258] Build Operating System: Linux 4.15.0-140-generic x86_64 Ubuntu
[  2386.258] Current Operating System: Linux d515df6cb351 5.12.14-arch1-1 #1 SMP PREEMPT Thu, 01 Jul 2021 07:26:06 +0000 x86_64
[  2386.258] Kernel command line: initrd=\initramfs-linux.img root=PARTUUID=1fc6aac3-02 rootfstype=ext4 add_efi_memmap pcie_acs_override=downstream,multifunction systemd.unified_cgroup_hierarchy=0 nvidia-drm.modeset=1
[  2386.258] Build Date: 08 April 2021  12:29:22PM
[  2386.258] xorg-server 2:1.20.9-2ubuntu1.2~20.04.2 (For technical support please see http://www.ubuntu.com/support) 
[  2386.258] Current version of pixman: 0.38.4
[  2386.258] 	Before reporting problems, check http://wiki.x.org
	to make sure that you have the latest version.
[  2386.258] Markers: (--) probed, (**) from config file, (==) default setting,
	(++) from command line, (!!) notice, (II) informational,
	(WW) warning, (EE) error, (NI) not implemented, (??) unknown.
[  2386.258] (==) Log file: "/var/log/Xorg.0.log", Time: Tue Aug  3 10:21:30 2021
[  2386.258] (==) Using config file: "/etc/X11/xorg.conf"
[  2386.258] (==) Using system config directory "/usr/share/X11/xorg.conf.d"
[  2386.258] (==) ServerLayout "Layout0"
[  2386.258] (**) |-->Screen "Screen0" (0)
[  2386.258] (**) |   |-->Monitor "Monitor0"
[  2386.258] (**) |   |-->Device "Device0"
[  2386.259] (**) |-->Input Device "Keyboard0"
[  2386.259] (**) |-->Input Device "Mouse0"
[  2386.259] (==) Automatically adding devices
[  2386.259] (==) Automatically enabling devices
[  2386.259] (==) Automatically adding GPU devices
[  2386.259] (==) Automatically binding GPU devices
[  2386.259] (==) Max clients allowed: 256, resource mask: 0x1fffff
[  2386.259] (WW) The directory "/usr/share/fonts/X11/cyrillic" does not exist.
[  2386.259] 	Entry deleted from font path.
[  2386.259] (WW) The directory "/usr/share/fonts/X11/100dpi/" does not exist.
[  2386.259] 	Entry deleted from font path.
[  2386.259] (WW) The directory "/usr/share/fonts/X11/75dpi/" does not exist.
[  2386.259] 	Entry deleted from font path.
[  2386.259] (WW) The directory "/usr/share/fonts/X11/100dpi" does not exist.
[  2386.259] 	Entry deleted from font path.
[  2386.259] (WW) The directory "/usr/share/fonts/X11/75dpi" does not exist.
[  2386.259] 	Entry deleted from font path.
[  2386.259] (==) FontPath set to:
	/usr/share/fonts/X11/misc,
	/usr/share/fonts/X11/Type1,
	built-ins
[  2386.259] (==) ModulePath set to "/usr/lib/xorg/modules"
[  2386.259] (WW) Hotplugging is on, devices using drivers 'kbd', 'mouse' or 'vmmouse' will be disabled.
[  2386.259] (WW) Disabling Keyboard0
[  2386.259] (WW) Disabling Mouse0
[  2386.259] (II) Loader magic: 0x5582690b8020
[  2386.259] (II) Module ABI versions:
[  2386.259] 	X.Org ANSI C Emulation: 0.4
[  2386.259] 	X.Org Video Driver: 24.1
[  2386.259] 	X.Org XInput driver : 24.1
[  2386.259] 	X.Org Server Extension : 10.0
[  2386.259] (++) using VT number 7

[  2386.259] (II) systemd-logind: logind integration requires -keeptty and -keeptty was not provided, disabling logind integration
[  2386.259] (II) xfree86: Adding drm device (/dev/dri/card0)
[  2386.261] (--) PCI:*(10@0:0:0) 10de:1b81:1043:8598 rev 161, Mem @ 0xf6000000/16777216, 0xe0000000/268435456, 0xf0000000/33554432, I/O @ 0x0000e000/128, BIOS @ 0x????????/131072
[  2386.261] (II) LoadModule: "glx"
[  2386.261] (II) Loading /usr/lib/xorg/modules/extensions/libglx.so
[  2386.262] (II) Module glx: vendor="X.Org Foundation"
[  2386.262] 	compiled for 1.20.9, module version = 1.0.0
[  2386.262] 	ABI class: X.Org Server Extension, version 10.0
[  2386.262] (II) LoadModule: "nvidia"
[  2386.262] (II) Loading /usr/lib/xorg/modules/drivers/nvidia_drv.so
[  2386.262] (II) Module nvidia: vendor="NVIDIA Corporation"
[  2386.262] 	compiled for 1.6.99.901, module version = 1.0.0
[  2386.262] 	Module class: X.Org Video Driver
[  2386.262] (II) NVIDIA dlloader X Driver  465.31  Thu May 13 22:19:15 UTC 2021
[  2386.262] (II) NVIDIA Unified Driver for all Supported NVIDIA GPUs
[  2386.262] (WW) xf86OpenConsole: setpgid failed: Operation not permitted
[  2386.262] (WW) xf86OpenConsole: VT_GETSTATE failed: Inappropriate ioctl for device
[  2386.262] xf86EnableIOPorts: failed to set IOPL for I/O (Operation not permitted)
[  2386.262] (II) Loading sub module "fb"
[  2386.262] (II) LoadModule: "fb"
[  2386.262] (II) Loading /usr/lib/xorg/modules/libfb.so
[  2386.262] (II) Module fb: vendor="X.Org Foundation"
[  2386.262] 	compiled for 1.20.9, module version = 1.0.0
[  2386.262] 	ABI class: X.Org ANSI C Emulation, version 0.4
[  2386.262] (II) Loading sub module "wfb"
[  2386.262] (II) LoadModule: "wfb"
[  2386.262] (II) Loading /usr/lib/xorg/modules/libwfb.so
[  2386.262] (II) Module wfb: vendor="X.Org Foundation"
[  2386.262] 	compiled for 1.20.9, module version = 1.0.0
[  2386.262] 	ABI class: X.Org ANSI C Emulation, version 0.4
[  2386.262] (II) Loading sub module "ramdac"
[  2386.262] (II) LoadModule: "ramdac"
[  2386.262] (II) Module "ramdac" already built-in
[  2386.263] (WW) VGA arbiter: cannot open kernel arbiter, no multi-card support
[  2386.263] (**) NVIDIA(0): Depth 24, (--) framebuffer bpp 32
[  2386.263] (==) NVIDIA(0): RGB weight 888
[  2386.263] (==) NVIDIA(0): Default visual is TrueColor
[  2386.263] (==) NVIDIA(0): Using gamma correction (1.0, 1.0, 1.0)
[  2386.263] (**) NVIDIA(0): Option "DPI" "96 x 96"
[  2386.263] (**) NVIDIA(0): Option "ModeValidation" "NoMaxPClkCheck, NoEdidMaxPClkCheck, NoMaxSizeCheck, NoHorizSyncCheck, NoVertRefreshCheck, NoVirtualSizeCheck, NoExtendedGpuCapabilitiesCheck, NoTotalSizeCheck, NoDualLinkDVICheck, NoDisplayPortBandwidthCheck, AllowNon3DVisionModes, AllowNonHDMI3DModes, AllowNonEdidModes, NoEdidHDMI2Check, AllowDpInterlaced"
[  2386.263] (**) NVIDIA(0): Option "ProbeAllGpus" "False"
[  2386.263] (**) NVIDIA(0): Option "AllowEmptyInitialConfiguration" "True"
[  2386.263] (**) NVIDIA(0): Option "ConnectedMonitor" "DP-0"
[  2386.263] (**) NVIDIA(0): Enabling 2D acceleration
[  2386.263] (**) NVIDIA(0): ConnectedMonitor string: "DP-0"
[  2386.263] (II) Loading sub module "glxserver_nvidia"
[  2386.263] (II) LoadModule: "glxserver_nvidia"
[  2386.263] (II) Loading /usr/lib/xorg/modules/extensions/libglxserver_nvidia.so
[  2386.266] (II) Module glxserver_nvidia: vendor="NVIDIA Corporation"
[  2386.266] 	compiled for 1.6.99.901, module version = 1.0.0
[  2386.266] 	Module class: X.Org Server Extension
[  2386.266] (II) NVIDIA GLX Module  465.31  Thu May 13 22:16:59 UTC 2021
[  2386.266] (II) NVIDIA: The X server supports PRIME Render Offload.
[  2386.268] (--) NVIDIA(0): Valid display device(s) on GPU-0 at PCI:10:0:0
[  2386.268] (--) NVIDIA(0):     DFP-0
[  2386.268] (--) NVIDIA(0):     DFP-1
[  2386.268] (--) NVIDIA(0):     DFP-2
[  2386.268] (--) NVIDIA(0):     DFP-3
[  2386.268] (--) NVIDIA(0):     DFP-4
[  2386.268] (--) NVIDIA(0):     DFP-5 (boot)
[  2386.268] (--) NVIDIA(0):     DFP-6
[  2386.268] (**) NVIDIA(0): Using ConnectedMonitor string "DFP-3".
[  2386.268] (WW) NVIDIA: No DRM device: No direct render devices found.
[  2386.268] (II) NVIDIA(0): NVIDIA GPU NVIDIA GeForce GTX 1070 (GP104-A) at PCI:10:0:0
[  2386.268] (II) NVIDIA(0):     (GPU-0)
[  2386.268] (--) NVIDIA(0): Memory: 8388608 kBytes
[  2386.268] (--) NVIDIA(0): VideoBIOS: 86.04.50.00.64
[  2386.268] (II) NVIDIA(0): Detected PCI Express Link width: 16X
[  2386.268] (EE) NVIDIA(GPU-0): Failed to acquire modesetting permission.
[  2386.268] (EE) NVIDIA(0): Failing initialization of X screen
[  2386.268] (II) UnloadModule: "nvidia"
[  2386.268] (II) UnloadSubModule: "glxserver_nvidia"
[  2386.268] (II) Unloading glxserver_nvidia
[  2386.268] (II) UnloadSubModule: "wfb"
[  2386.269] (II) UnloadSubModule: "fb"
[  2386.269] (EE) Screen(s) found, but none have a usable configuration.
[  2386.269] (EE) 
Fatal server error:
[  2386.269] (EE) no screens found(EE) 
[  2386.269] (EE) 
Please consult the The X.Org Foundation support 
	 at http://wiki.x.org
 for help. 
[  2386.269] (EE) Please also check the log file at "/var/log/Xorg.0.log" for additional information.
[  2386.269] (EE) 
[  2386.269] (EE) Server terminated with error (1). Closing log file.
@ehfd
Member

ehfd commented Aug 8, 2021

YOU CANNOT USE THIS CONTAINER IF YOU ALREADY HAVE AN X SERVER IN YOUR HOST FOR THAT GPU!!!
Use https://github.com/selkies-project/docker-nvidia-egl-desktop if you do not have permissions to touch the host.

In order to use an X server on the host for your monitor with one GPU, and then provision other GPUs for the containers, it is required to change your /etc/X11/xorg.conf configurations.
First use nvidia-xconfig --no-probe-all-gpus --busid=$BUS_ID --only-one-x-screen to generate /etc/X11/xorg.conf where BUS_ID is generated with the below script. GPU_SELECT is the ID of the specific GPU you want to provision.

HEX_ID=$(nvidia-smi --query-gpu=pci.bus_id --id="$GPU_SELECT" --format=csv | sed -n 2p)
IFS=":." ARR_ID=($HEX_ID)
unset IFS
BUS_ID=PCI:$((16#${ARR_ID[1]})):$((16#${ARR_ID[2]})):$((16#${ARR_ID[3]}))
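For illustration, here is a worked run of the conversion above using a hypothetical `pci.bus_id` value in place of the live `nvidia-smi` output (the sample value `00000000:0A:00.0` is an assumption, not from this machine):

```shell
# Worked example of the BUS_ID conversion above, with a hypothetical
# pci.bus_id value standing in for the nvidia-smi query result.
HEX_ID="00000000:0A:00.0"
IFS=":." ARR_ID=($HEX_ID)   # split on ":" and "." -> (00000000 0A 00 0)
unset IFS
# Convert each hex field to decimal, giving the PCI:bus:device:function
# form that nvidia-xconfig expects.
BUS_ID=PCI:$((16#${ARR_ID[1]})):$((16#${ARR_ID[2]})):$((16#${ARR_ID[3]}))
echo "$BUS_ID"   # prints PCI:10:0:0 (hex 0A = decimal 10)
```

Note that the decimal bus number (10 here) is what goes into the `BusID` line of `xorg.conf`, which is why the hex output of `nvidia-smi` cannot be pasted in directly.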

Then, edit /etc/X11/xorg.conf and add the following to the end.

Section "ServerFlags"
    Option "AutoAddGPU" "false"
EndSection

Note: https://man.archlinux.org/man/extra/xorg-server/xorg.conf.d.5.en
If you restart your OS or the Xorg server, you will now be able to use one GPU for your host X server and your real monitor, and use the rest of the GPUs for the containers.
Use docker --gpus '"device=1,2"' to provision GPUs with device IDs 1 and 2 to the container. --gpus 1 means any one GPU, not device ID of 1. Same for podman.
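As a sketch of that provisioning step (the image name is only an example, not part of this project):

```shell
# Pass only device IDs 1 and 2 to the container, leaving device 0 free
# for the host X server. The nested quoting is deliberate: Docker's
# parser needs the inner double quotes around device=1,2.
# (Image name is an example only.)
docker run --gpus '"device=1,2"' nvidia/cuda:11.4.2-base-ubuntu20.04 nvidia-smi
```

Inside the container, `nvidia-smi` should then list only the two provisioned GPUs.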

@reinismu
Author

reinismu commented Aug 8, 2021

Ohh, so to make this work I shouldn't use the GPU on the host?

I did have an X server running on the host.

@ehfd
Member

ehfd commented Aug 11, 2021

You should not use one GPU for two X servers.
Check the --no-probe-all-gpus --busid=BUS_ID --only-one-x-screen options for nvidia-xconfig to create /etc/X11/xorg.conf automatically.

Stripped down example:

# nvidia-xconfig: X configuration file generated by nvidia-xconfig
# nvidia-xconfig:  version 460.39

Section "ServerLayout"
    Identifier     "Layout0"
    Screen      0  "Screen0"
    InputDevice    "Keyboard0" "CoreKeyboard"
    InputDevice    "Mouse0" "CorePointer"
EndSection

Section "Files"
EndSection

Section "InputDevice"
    # generated from default
    Identifier     "Mouse0"
    Driver         "mouse"
    Option         "Protocol" "auto"
    Option         "Device" "/dev/mouse"
    Option         "Emulate3Buttons" "no"
    Option         "ZAxisMapping" "4 5"
EndSection

Section "InputDevice"
    # generated from default
    Identifier     "Keyboard0"
    Driver         "kbd"
EndSection

Section "Monitor"
    Identifier     "Monitor0"
    VendorName     "Unknown"
    ModelName      "Unknown"
    Option         "DPMS"
EndSection

Section "Device"
    Identifier     "Device0"
    Driver         "nvidia"
    Option         "DPI" "96 x 96"
    VendorName     "NVIDIA Corporation"
    BusID          "PCI:97:0:0"
EndSection

Section "Screen"
    Identifier     "Screen0"
    Device         "Device0"
    Monitor        "Monitor0"
    DefaultDepth    24
    Option         "ProbeAllGpus" "False"
    SubSection     "Display"
        Virtual     1920 1080
        Depth       24
        Modes      "1920x1080"
    EndSubSection
EndSection

@ehfd ehfd pinned this issue Aug 11, 2021
@ehfd ehfd closed this as completed Aug 15, 2021
@LeehanLee

@ehfd Do you have any idea why we can't use one GPU for multiple X servers?

@ehfd
Member

ehfd commented Sep 29, 2021

The NVIDIA X.Org driver's limitations, I guess. The transition to Wayland is going to make things better for sure.

@ehfd
Member

ehfd commented Jun 15, 2022

For people new to Linux:

The container is usable with (and recommended for) multiple GPUs in one host, as long as you set up nvidia-container-toolkit (also called nvidia-docker).
However, first try disabling the GUI on your host (not the container) and check whether the container works. If you still need a GUI on the host and have multiple GPUs, allocate a single GPU to the host X server as described above (again, not the container), then exclude that GPU's PCI ID from the nvidia-container-toolkit Docker allocation (use nvidia-smi to find out which GPU it is).

If you only have one GPU, just disable the X server on the host, since one GPU cannot drive a GUI on the host and in the container at the same time.
The goal here is to set aside one GPU for the host X server, then allocate the remaining GPUs to the container to prevent collisions.

You can stop the GUI on the host with sudo service lightdm stop or sudo service gdm stop, depending on your display manager (either can be disabled permanently by changing stop to disable). If you don't need a GUI on the host, this is the cleanest option. You can work from the console on your monitor or just SSH in.
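On a systemd-based host (an assumption; most current distributions qualify), the equivalent systemctl commands look like this, substituting your own display manager:

```shell
# Stop the host GUI now, and keep it from starting on the next boot.
# Replace "lightdm" with gdm, sddm, etc., to match your display manager.
sudo systemctl stop lightdm
sudo systemctl disable lightdm
```

Re-enabling later is symmetric: `sudo systemctl enable lightdm && sudo systemctl start lightdm`.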

https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/user-guide.html#gpu-enumeration
This is how you select a specific GPU ID (this may not be the PCI ID, check nvidia-smi) or the UUID for a container.

Also, play with the VIDEO_PORT environment variable (try setting it to DP-0, DP-1, DP-2, etc.); only change this on non-datacenter (non-Tesla) GPUs.
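For example, a run that combines GPU selection with a VIDEO_PORT override might look like the sketch below (the image name is an assumption; use the image you actually deploy):

```shell
# Provision GPU device 1 to the container and try the DP-2 connector
# instead of the default. Image name/tag is an example only.
docker run --gpus '"device=1"' -e VIDEO_PORT=DP-2 \
    ghcr.io/selkies-project/nvidia-glx-desktop:latest
```

If one connector produces the "no screens found" error, cycling through DP-0, DP-1, DP-2 is a cheap first diagnostic.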

@ayunami2000

#26 this seems to fix it, but let me know if it's secretly breaking something elsewhere

@ehfd
Member

ehfd commented Jun 16, 2022

#26 this seems to fix it, but let me know if it's secretly breaking something elsewhere

Not confirmed. Check my answer on #26.

@ehfd ehfd reopened this Jul 3, 2022
@ehfd
Member

ehfd commented Aug 18, 2022

In order to use an X server on the host for your monitor with one GPU, and then provision other GPUs for the containers, it is required to change your /etc/X11/xorg.conf configurations.
First use nvidia-xconfig --no-probe-all-gpus --busid=$BUS_ID --only-one-x-screen to generate /etc/X11/xorg.conf where BUS_ID is generated with the below script. GPU_SELECT is the ID of the specific GPU you want to provision.

HEX_ID=$(nvidia-smi --query-gpu=pci.bus_id --id="$GPU_SELECT" --format=csv | sed -n 2p)
IFS=":." ARR_ID=($HEX_ID)
unset IFS
BUS_ID=PCI:$((16#${ARR_ID[1]})):$((16#${ARR_ID[2]})):$((16#${ARR_ID[3]}))

Then, edit /etc/X11/xorg.conf and add the following to the end.

Section "ServerFlags"
    Option "AutoAddGPU" "false"
EndSection

Note: https://man.archlinux.org/man/extra/xorg-server/xorg.conf.d.5.en
If you restart your OS or the Xorg server, you will now be able to use one GPU for your host X server and your real monitor, and use the rest of the GPUs for the containers.
Use docker --gpus '"device=1,2"' to provision GPUs with device IDs 1 and 2 to the container. --gpus 1 means any one GPU, not device ID of 1. Same for podman.

@ehfd ehfd closed this as completed Aug 18, 2022
@ehfd
Member

ehfd commented Aug 24, 2022

Added in Documentation.

@ehfd
Member

ehfd commented Aug 24, 2022

Do not use MIG on Ampere (A100, A30) GPUs. Anything graphics-related will not work.
