Continuing the discussion from https://discuss.getsol.us/d/9461-systemctl-status-solus-state-degraded
The nvidia driver works, but after boot I get:
~$systemctl status solus
State: degraded
The cause is that nvidia-persistenced.service fails to start. It's not entirely clear to me what it's for, or whether I need it at all (I don't have a POWER9, in case you ask).
However, if I start nvidia-persistenced.service manually as a regular user, it asks me for the root password, starts, and everything is solved.
From what I understand, it is a permissions problem: nvidia-persistenced has its own user and group, while the rest of the nvidia stack runs as root. So nvidia-persistenced probably can't write or delete something somewhere (/var/run/nvidia-persistenced?), but only during the shutdown and/or boot phase.
So in the end, is there something I can do myself, or is this a bug in the nvidia package that has to be fixed at build time?
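To check the permissions guess above, a small helper like this could list who owns the runtime directory and the device nodes. This is only a diagnostic sketch: `show_owner_mode` is a made-up name, and it assumes GNU `stat` (for the `-c` format option).

```shell
#!/bin/sh
# Print "owner:group mode path" for every path that exists,
# and "missing path" for every path that does not.
# Assumption: GNU stat, which supports the -c format option.
show_owner_mode() {
    for p in "$@"; do
        if [ -e "$p" ]; then
            stat -c '%U:%G %a %n' "$p"
        else
            printf 'missing %s\n' "$p"
        fi
    done
}

# Possible use on this machine (paths taken from the outputs in this post):
#   show_owner_mode /var/run/nvidia-persistenced /dev/nvidia0 /dev/nvidiactl
```

If the runtime directory shows up as missing or owned by root while the daemon runs as `nvidia-persistenced`, that would fit the guess, but it would not prove it.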
More info:
`~ $ sudo systemctl status nvidia-persistenced.service`
`[sudo] password for -..:`
`× nvidia-persistenced.service - NVIDIA Persistence Daemon`
`Loaded: loaded (/usr/lib/systemd/system/nvidia-persistenced.service; enabled; preset: enabled)`
`Active: failed (Result: exit-code) since Sat 2023-07-15 17:50:10 CEST; 7min ago`
`Process: 1034 ExecStart=/usr/bin/nvidia-persistenced --user nvidia-persistenced (code=exited, status=1/FAILURE)`
`Process: 1039 ExecStopPost=/bin/rm -rf /var/run/nvidia-persistenced (code=exited, status=0/SUCCESS)`
`CPU: 3ms`
`lug 15 17:50:10 solus systemd[1]: Starting NVIDIA Persistence Daemon...`
`lug 15 17:50:10 solus nvidia-persistenced[1035]: Started (1035)`
`lug 15 17:50:10 solus nvidia-persistenced[1035]: Failed to query NVIDIA devices. Please ensure that the NVIDIA devi>`
`lug 15 17:50:10 solus nvidia-persistenced[1034]: nvidia-persistenced failed to initialize. Check syslog for more de>`
`lug 15 17:50:10 solus nvidia-persistenced[1035]: Shutdown (1035)`
`lug 15 17:50:10 solus systemd[1]: nvidia-persistenced.service: Control process exited, code=exited, status=1/FAILURE`
`lug 15 17:50:10 solus systemd[1]: nvidia-persistenced.service: Failed with result 'exit-code'.`
`lug 15 17:50:10 solus systemd[1]: Failed to start NVIDIA Persistence Daemon.`
`~ $ list-installed nvidia | grep nvidia`
`libnvidia-container - NVIDIA container runtime library`
`nvidia-470-glx-driver-modaliases - These files are used by the Software Center for hardware detection`
`nvidia-container-toolkit - NVIDIA Container Toolkit`
`nvidia-glx-driver-common - Shared assets for the NVIDIA GLX Driver`
`nvidia-glx-driver-current - NVIDIA Binary Driver (Current Kernel)`
`nvidia-glx-driver-modaliases - These files are used by the Software Center for hardware detection`
`~ $ id nvidia-persistenced`
`uid=143(nvidia-persistenced) gid=143(nvidia-persistenced) gruppi=143(nvidia-persistenced) `
`~ $ ls -l /dev/nvidia*`
`crw-rw-rw- 1 root root 195, 0 15 lug 17.50 /dev/nvidia0`
`crw-rw-rw- 1 root root 195, 255 15 lug 17.50 /dev/nvidiactl`
`crw-rw-rw- 1 root root 195, 254 15 lug 17.50 /dev/nvidia-modeset`
`crw-rw-rw- 1 root root 237, 0 15 lug 17.50 /dev/nvidia-uvm`
`crw-rw-rw- 1 root root 237, 1 15 lug 17.50 /dev/nvidia-uvm-tools`
`/dev/nvidia-caps:`
`totale 0`
`cr-------- 1 root root 240, 1 15 lug 17.50 nvidia-cap1`
`cr--r--r-- 1 root root 240, 2 15 lug 17.50 nvidia-cap2 `
`~ $ inxi -F | grep NVIDIA`
`Device-2: NVIDIA TU117M [GeForce GTX 1650 Mobile / Max-Q] driver: nvidia`
`API: OpenGL v: 4.6.0 NVIDIA 535.54.03 renderer: NVIDIA GeForce GTX`
`Device-2: NVIDIA driver: snd_hda_intel`
~ $ nvidia-smi
Sat Jul 15 18:14:43 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.54.03 Driver Version: 535.54.03 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA GeForce GTX 1650 Off | 00000000:01:00.0 On | N/A |
| N/A 46C P8 6W / 50W | 510MiB / 4096MiB | 6% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| 0 N/A N/A 1020 G /usr/lib64/xorg-server/Xorg 302MiB |
| 0 N/A N/A 1613 G /usr/bin/budgie-wm 44MiB |
| 0 N/A N/A 2896 G /usr/bin/firefox 159MiB |
+---------------------------------------------------------------------------------------+
I think it could be related to this passage:
The daemon does not require root privileges to run, and may safely be run as an unprivileged user, given that its runtime directory, /var/run/nvidia-persistenced, is created for and owned by that user prior to starting the daemon. nvidia-persistenced also requires read and write access to the NVIDIA character device files. If the permissions of the device files have been altered through any of the NVreg_DeviceFileUID, NVreg_DeviceFileGID, or NVreg_DeviceFileMode NVIDIA kernel module options, nvidia-persistenced will need to run as a suitable user.
If the daemon is started with root privileges, the --user option may be used instead to indicate that the daemon should drop its privileges and run as the specified user after setting up its runtime directory. Using this option may cause the daemon to be unable to remove the /var/run/nvidia-persistenced directory when it is killed, if the specified user does not have write permissions to /var/run. In this case, directory removal should be handled by a post-execution script. See the sample init scripts provided in /usr/share/doc/NVIDIA_GLX-1.0/sample/nvidia-persistenced-init.tar.bz2 for examples of this behavior.
The daemon indirectly utilizes nvidia-modprobe via the nvidia-cfg library to load the NVIDIA kernel module and create the NVIDIA character device files after the daemon has dropped its root privileges, if it had any to begin with. If nvidia-modprobe is not installed, the daemon may not be able to start properly if it is not run with root privileges.
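The requirement quoted above (the runtime directory has to exist and be owned by the daemon's user before the daemon starts) could be met by hand roughly like this. It is a sketch, not a confirmed fix for the boot failure: `prepare_runtime_dir` is a made-up helper name, and the directory would have to be recreated on every boot if /var/run is a tmpfs, as it usually is on systemd systems.

```shell
#!/bin/sh
# Create a runtime directory with mode 0755, owned by the given user and group.
# Arguments: directory, user, group (names or numeric IDs).
prepare_runtime_dir() {
    install -d -m 0755 -o "$2" -g "$3" "$1"
}

# Hypothetical invocation for this setup (needs root, so it is left commented out):
#   sudo install -d -m 0755 -o nvidia-persistenced -g nvidia-persistenced /var/run/nvidia-persistenced
```

Note that the unit file shown earlier already starts the daemon as root with `--user nvidia-persistenced`, in which case, according to the quoted text, the daemon sets up the directory itself before dropping privileges. So this only covers the case where the daemon is launched directly as the unprivileged user.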
From https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#pre-installation-actions
"NVIDIA is providing a user-space daemon on Linux to support persistence of driver state across CUDA job runs. The daemon approach provides a more elegant and robust solution to this problem than persistence mode. For more details on the NVIDIA Persistence Daemon, see the documentation here.
The NVIDIA Persistence Daemon can be started as the root user by running:
/usr/bin/nvidia-persistenced --verbose
This command should be run on boot. Consult your Linux distribution’s init documentation for details on how to automate this."
~ $ /usr/bin/nvidia-persistenced --verbose
nvidia-persistenced failed to initialize. Check syslog for more details.
~ $ journalctl
....
`lug 15 18:37:32 solus nvidia-persistenced[13610]: Verbose syslog connection opened`
`lug 15 18:37:32 solus nvidia-persistenced[13610]: Failed to create directory /var/run/nvidia-persistenced: Permission denied`
`lug 15 18:37:32 solus nvidia-persistenced[13610]: Directory /var/run/nvidia-persistenced will not be removed on exit`
`lug 15 18:37:32 solus nvidia-persistenced[13610]: Unable to access /var/run/nvidia-persistenced: No such file or directory`
`lug 15 18:37:32 solus nvidia-persistenced[13610]: Shutdown (13610)`
...
Other link:
https://forums.developer.nvidia.com/t/nvidia-persistenced-fails-to-start-if-user-option-is-set-to-non-root-user/174542