Also, can we get the output of `sudo env CBM_DEBUG=1 clr-boot-manager update`?
Major difficulties recovering system after week 27 and 28 updates
Have you tried this? https://help.getsol.us/docs/user/troubleshooting/boot-rescue
After `chroot /target`, try updating to the latest with `sudo eopkg up`, which should have a fix for nvidia-*.
Not sure if you need to run `sudo usysconf run -f`, but just do it to be sure, then exit (the chroot), umount, and restart. This is how I recovered my PC.
infinitymdm silke
Here is the output of `lsblk`; nvme0n1p1 would be my EFI partition:
sda 8:0 0 5.5T 0 disk
├─sda1 8:1 0 5.4T 0 part /mnt/quaternary
└─sda2 8:2 0 94.1G 0 part
sdb 8:16 0 465.8G 0 disk
├─sdb1 8:17 0 499M 0 part
├─sdb2 8:18 0 100M 0 part
├─sdb3 8:19 0 16M 0 part
└─sdb4 8:20 0 195.4G 0 part
sdc 8:32 0 931.5G 0 disk
└─sdc1 8:33 0 931.5G 0 part /mnt/secondary
sdd 8:48 0 3.6T 0 disk
└─sdd3 8:51 0 3.6T 0 part /mnt/tertiary
sde 8:64 1 14.5G 0 disk
└─sde1 8:65 1 14.5G 0 part
zram0 252:0 0 8G 0 disk [SWAP]
nvme0n1 259:0 0 931.5G 0 disk
├─nvme0n1p1 259:1 0 512M 0 part
├─nvme0n1p2 259:2 0 70.8G 0 part /
├─nvme0n1p3 259:3 0 161.4G 0 part /home
├─nvme0n1p4 259:4 0 29.8G 0 part [SWAP]
└─nvme0n1p5 259:5 0 668.9G 0 part /mnt/fast storage
and here is the output of `sudo env CBM_DEBUG=1 clr-boot-manager update`:
[DEBUG] cbm (../src/cli/cli.c:L142): No such file: //etc/kernel/update_efi_vars
[INFO] cbm (../src/bootman/bootman.c:L787): Current running kernel: 6.9.8-294.current
[INFO] cbm (../src/bootman/sysconfig.c:L179): Discovered UEFI ESP: /dev/disk/by-partuuid/e9fc2609-be10-4546-ab1b-f7beebb9167e
[INFO] cbm (../src/bootman/sysconfig.c:L256): Fully resolved boot device: /dev/nvme0n1p1
[DEBUG] cbm (../src/bootman/bootman.c:L141): shim-systemd caps: 0x26, wanted: 0x26
[DEBUG] cbm (../src/bootman/bootman.c:L156): UEFI boot now selected (shim-systemd)
[INFO] cbm (../src/bootman/bootman.c:L807): path ///etc/kernel/initrd.d does not exist
[INFO] cbm (../src/bootman/bootman.c:L807): path ///usr/lib/initrd.d does not exist
[INFO] cbm (../src/bootman/bootman.c:L503): Checking for mounted boot dir
[INFO] cbm (../src/bootman/bootman.c:L555): Mounting boot device /dev/nvme0n1p1 at /boot
[SUCCESS] cbm (../src/bootman/bootman.c:L568): /dev/nvme0n1p1 successfully mounted at /boot
[DEBUG] cbm (../src/bootman/update.c:L164): Now beginning update_native
[DEBUG] cbm (../src/bootman/update.c:L173): update_native: 1 available kernels
[DEBUG] cbm (../src/bootman/update.c:L191): update_native: Running kernel is (current) ///usr/lib/kernel/com.solus-project.current.6.9.8-294
[SUCCESS] cbm (../src/bootman/update.c:L205): update_native: Bootloader updated
[DEBUG] cbm (../src/bootman/kernel.c:L617): installing extra initrd: /usr/lib64/kernel/initrd-com.solus-project.current.6.9.8-294.nvidia
[DEBUG] cbm (../src/bootloaders/systemd-class.c:L219): adding extra initrd to bootloader: initrd-com.solus-project.current.6.9.8-294.nvidia
[SUCCESS] cbm (../src/bootman/update.c:L220): update_native: Repaired running kernel ///usr/lib/kernel/com.solus-project.current.6.9.8-294
[DEBUG] cbm (../src/bootman/update.c:L230): update_native: Checking kernels for type current
[INFO] cbm (../src/bootman/update.c:L243): update_native: Default kernel for type current is ///usr/lib/kernel/com.solus-project.current.6.9.8-294
[DEBUG] cbm (../src/bootman/kernel.c:L617): installing extra initrd: /usr/lib64/kernel/initrd-com.solus-project.current.6.9.8-294.nvidia
[DEBUG] cbm (../src/bootloaders/systemd-class.c:L219): adding extra initrd to bootloader: initrd-com.solus-project.current.6.9.8-294.nvidia
[SUCCESS] cbm (../src/bootman/update.c:L255): update_native: Installed tip for current: ///usr/lib/kernel/com.solus-project.current.6.9.8-294
[DEBUG] cbm (../src/bootman/kernel.c:L617): installing extra initrd: /usr/lib64/kernel/initrd-com.solus-project.current.6.9.8-294.nvidia
[DEBUG] cbm (../src/bootloaders/systemd-class.c:L219): adding extra initrd to bootloader: initrd-com.solus-project.current.6.9.8-294.nvidia
[SUCCESS] cbm (../src/bootman/update.c:L269): update_native: Installed last_good kernel (current) (///usr/lib/kernel/com.solus-project.current.6.9.8-294)
[DEBUG] cbm (../src/bootman/update.c:L280): update_native: Analyzing for type current: ///usr/lib/kernel/com.solus-project.current.6.9.8-294
[DEBUG] cbm (../src/bootman/update.c:L285): update_native: Skipping running kernel
[INFO] cbm (../src/bootman/bootman.c:L503): Checking for mounted boot dir
[INFO] cbm (../src/bootman/bootman.c:L510): boot_dir is already mounted: /boot
[SUCCESS] cbm (../src/bootman/update.c:L338): update_native: Default kernel for current is ///usr/lib/kernel/com.solus-project.current.6.9.8-294
[DEBUG] cbm (../src/bootman/update.c:L353): No kernel removals found
[INFO] cbm (../src/bootman/bootman.c:L469): Attempting umount of /boot
[SUCCESS] cbm (../src/bootman/bootman.c:L473): Unmounted boot directory
Which appears to work as intended?
minh I was about to chroot and try this boot rescue when my computer unexpectedly booted as normal. I am not sure if it would still be worthwhile to chroot in; shouldn't I be able to do everything from inside my system now? If it's still worthwhile, let me know and I will give it a shot!
Also, weirdly, I haven't been having any issues with my nvidia card since the initial error with lightdm. I have tested a few different games as well, and the system is definitely running on the GPU rather than integrated graphics, as performance is as expected.
We pushed a hotfix for the nvidia driver, if you installed that it should be working (assuming it's the same issue). We're still trying to figure out the root cause of the issue.
Which appears to work as intended?
Yep, strange that it sometimes complains about /dev/nvme0n1p1 not existing.
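One quick way to check whether the device node is actually there at a given moment is to look at it directly, along with the by-partuuid symlink that clr-boot-manager resolved in the debug log above (the PARTUUID below is copied from that log):

```shell
# Does the ESP's device node exist right now?
ls -l /dev/nvme0n1p1
# Does the PARTUUID symlink cbm resolved still point at it?
ls -l /dev/disk/by-partuuid/e9fc2609-be10-4546-ab1b-f7beebb9167e
# Cross-check what blkid reports for the partition:
sudo blkid /dev/nvme0n1p1
```

Running this right after a failed `clr-boot-manager update` might show whether the node genuinely disappears or whether only the mount attempt fails.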
ReillyBrogan If I try to update my system now it appears everything is up to date. The issue doesn't appear to be with the nvidia driver anymore, but clr-boot-manager is still giving me errors.
For now it seems I am still able to restart my computer and use it as normal. I believe it is able to find the EFI partition when actually booting, since my system starts, so I am not sure why it cannot properly detect it once the system is up and running.
silke It really is bizarre. I was also having a very similar issue with my laptop today when I went to start it, except it was unable to detect my /home partition and so only reached a terminal. From the terminal, if I rebooted once or twice it would catch the partition and start as normal. But again, if I turned it off I risked it "losing" the partition again.
Incredibly odd that it is happening across both my main solus devices. The laptop is a t480s without any integrated graphics for what it is worth so nvidia should most definitely not be playing a role on my laptop.
On Friday I believe I will be able to hop into the Matrix at some point, as I will have more time available. I just need to sign up for it still.
Matt_Nico I am wondering if it may be something to do with these drives being NVMe devices. Both the drive on my laptop and the one on my desktop with my Solus install are fast NVMe drives. This is purely conjecture, but I wonder if the speed of these drives may be causing the issue. Could things be moving faster than clr-boot-manager or eopkg can keep up with? That would explain why the issue is intermittent: the system may be able to grab the information in time on some boot sequences but not on others.
Matt_Nico I've run Solus on a variety of PCIe x4 gen 3 and gen 4 drives. The speed of the drive is probably not your problem.
Matt_Nico I'm also using NVMe devices and have no issues. My guess is that there's something weird going on with your systems specifically. You can check the kernel logs (`journalctl -k`) for any suspicious information.
You could try updating the firmware using fwupd (`sudo eopkg install fwupd`). Make sure you have a good backup beforehand, though (I haven't seen it brick a system yet, but someone has to be the first).
- Ensure /boot is mounted. This is normally done automatically, but it can't hurt to double-check, seeing as the partitions seem a bit flaky: `sudo clr-boot-manager mount-boot`
- Check for updates: `fwupdmgr refresh`, then `fwupdmgr get-updates`
- Install them: `fwupdmgr update`
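Putting the steps above into one copy-pasteable run (assuming fwupd installed cleanly):

```shell
# Firmware update sequence, consolidated from the steps above.
sudo eopkg install fwupd          # install the firmware update daemon
sudo clr-boot-manager mount-boot  # make sure the ESP is mounted at /boot
fwupdmgr refresh                  # fetch the latest metadata
fwupdmgr get-updates              # list devices with pending updates
fwupdmgr update                   # apply them (a reboot may be required)
```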
silke here is the output of `journalctl -k`. To me it didn't look like there was anything glaringly wrong, but I am definitely out of my depth here.
Jul 18 14:09:59 solus kernel: Command line: initrd=\EFI\com.solus-project\initrd-com.solus-project.current.6.9.8-294 initrd=>
Jul 18 14:09:59 solus kernel: BIOS-provided physical RAM map:
Jul 18 14:09:59 solus kernel: BIOS-e820: [mem 0x0000000000000000-0x0000000000057fff] usable
Jul 18 14:09:59 solus kernel: BIOS-e820: [mem 0x0000000000058000-0x0000000000058fff] reserved
Jul 18 14:09:59 solus kernel: BIOS-e820: [mem 0x0000000000059000-0x000000000009dfff] usable
Jul 18 14:09:59 solus kernel: BIOS-e820: [mem 0x000000000009e000-0x00000000000fffff] reserved
Jul 18 14:09:59 solus kernel: BIOS-e820: [mem 0x0000000000100000-0x000000003fffffff] usable
Jul 18 14:09:59 solus kernel: BIOS-e820: [mem 0x0000000040000000-0x00000000403fffff] reserved
Jul 18 14:09:59 solus kernel: BIOS-e820: [mem 0x0000000040400000-0x0000000069bd7fff] usable
Jul 18 14:09:59 solus kernel: BIOS-e820: [mem 0x0000000069bd8000-0x0000000069bd8fff] ACPI NVS
Jul 18 14:09:59 solus kernel: BIOS-e820: [mem 0x0000000069bd9000-0x0000000069bd9fff] reserved
Jul 18 14:09:59 solus kernel: BIOS-e820: [mem 0x0000000069bda000-0x000000007b1befff] usable
Jul 18 14:09:59 solus kernel: BIOS-e820: [mem 0x000000007b1bf000-0x000000007b68efff] reserved
Jul 18 14:09:59 solus kernel: BIOS-e820: [mem 0x000000007b68f000-0x000000007b6fefff] ACPI data
Jul 18 14:09:59 solus kernel: BIOS-e820: [mem 0x000000007b6ff000-0x000000007bb2efff] ACPI NVS
Jul 18 14:09:59 solus kernel: BIOS-e820: [mem 0x000000007bb2f000-0x000000007cffefff] reserved
Jul 18 14:09:59 solus kernel: BIOS-e820: [mem 0x000000007cfff000-0x000000007cffffff] usable
Jul 18 14:09:59 solus kernel: BIOS-e820: [mem 0x000000007d000000-0x000000007fffffff] reserved
Jul 18 14:09:59 solus kernel: BIOS-e820: [mem 0x00000000e0000000-0x00000000efffffff] reserved
Jul 18 14:09:59 solus kernel: BIOS-e820: [mem 0x00000000fe000000-0x00000000fe010fff] reserved
Jul 18 14:09:59 solus kernel: BIOS-e820: [mem 0x00000000fec00000-0x00000000fec00fff] reserved
Jul 18 14:09:59 solus kernel: BIOS-e820: [mem 0x00000000fed00000-0x00000000fed00fff] reserved
Jul 18 14:09:59 solus kernel: BIOS-e820: [mem 0x00000000fee00000-0x00000000fee00fff] reserved
When I am home later I would be happy to try to update the firmware. I have my important files saved on a separate drive but have not done a proper backup. Would something like this guide be acceptable to create a backup?
IIRC this installation was done years ago with a Solus 4.2 or 4.3 ISO; would that point to a firmware issue?
Matt_Nico silke I didn't end up going through with the `fwupdmgr update` command yet, as I am not sure it will actually do anything given the output of the `fwupdmgr get-updates` command.
matt@matt-solus-desktop ~ $ fwupdmgr get-updates
WARNING: This package has not been validated, it may not work properly.
Devices with no available firmware updates:
• SSD 850 EVO 500GB
• SSD 860 EVO 1TB
• WD BLACK SN750 SE 1TB
• WDC WD40EZRZ-75GXCB0
• WDC WD60EZAZ-00SF3B0
No updatable devices
It appears as though all my drives are up to date firmware-wise, so would it be worth it to go through and run the `fwupdmgr update` command?
Still running into these issues on both my laptop and my desktop.
Matt_Nico If there are no updates available, running `fwupdmgr update` will just tell you the same thing that `get-updates` did. No need to run it.
Just to be clear, you're still getting this error you mentioned in your earlier post, correct?
Matt_Nico
[✗] Updating clr-boot-manager failed
A copy of the command output follows:
[FATAL] cbm (../src/bootman/bootman.c:L562): FATAL: Cannot mount boot device /dev/nvme0n1p1 on /boot: No such device
Are there any other errors you're seeing, or any other behavior that you don't think is normal?
Could we get the output of `sudo journalctl -k | grep nvme`? That should filter the kernel log for messages containing the string "nvme". There may be better search strings to try; this is just where I would start.
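For a broader sweep than just "nvme", something like this might also surface mount or I/O trouble (the pattern here is only my starting guess; tweak it as needed):

```shell
# Case-insensitive sweep of the kernel log for storage-related trouble.
sudo journalctl -k | grep -iE 'nvme|mount|fail|error|timeout'
```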
infinitymdm On my laptop there is additional weird behaviour: it is failing to mount the /home partition on startup around 50% of the time. If I just reboot after the error it will usually boot fine, but sometimes it takes multiple attempts. Here is an image of the error:
And yes, to be clear, the other error is still persisting; this is as it appears on my laptop:
[✗] Updating clr-boot-manager failed
A copy of the command output follows:
[FATAL] cbm (../src/bootman/bootman.c:L562): FATAL: Cannot mount boot device /dev/nvme0n1p1 on /boot: No such device
[✗] Updating clr-boot-manager failed
A copy of the command output follows:
[FATAL] cbm (../src/bootman/bootman.c:L562): FATAL: Cannot mount boot device /dev/nvme0n1p1 on /boot: No such device
[✓] Running depmod on kernel 6.9.8-294.current success
Here is the output of `sudo journalctl -k | grep nvme` on my laptop (I will also post my desktop's output in a few minutes):
Jul 19 15:42:35 solus kernel: nvme nvme0: 8/0/0 default/read/poll queues
Jul 19 15:42:35 solus kernel: nvme0n1: p1 p2 p3 p4
Jul 19 15:42:37 solus kernel: EXT4-fs (nvme0n1p3): mounted filesystem 9aa71c0a-9d55-4f79-b370-a3f554f8eb80 r/w with ordered data mode. Quota mode: none.
Jul 19 15:42:35 solus kernel: nvme nvme0: pci function 0000:3e:00.0
Jul 19 15:42:35 solus kernel: nvme nvme0: 8/0/0 default/read/poll queues
Jul 19 15:42:35 solus kernel: nvme0n1: p1 p2 p3 p4
Jul 19 15:42:37 solus kernel: EXT4-fs (nvme0n1p3): mounted filesystem 9aa71c0a-9d55-4f79-b370-a3f554f8eb80 r/w with ordered data mode. Quota mode: none.
Jul 19 15:42:38 matt-solus-t480s kernel: EXT4-fs (nvme0n1p3): re-mounted 9aa71c0a-9d55-4f79-b370-a3f554f8eb80 r/w. Quota mode: none.
Jul 19 15:42:38 matt-solus-t480s kernel: Adding 11718652k swap on /dev/nvme0n1p2. Priority:-2 extents:1 across:11718652k SS
Jul 19 15:42:40 matt-solus-t480s kernel: EXT4-fs (nvme0n1p4): mounted filesystem fa0ff32f-4b57-468f-981f-e79f6fed9aa7 r/w with ordered data mode. Quota mode: none.
What I find most bizarre is that the error is almost exactly replicated on both of my systems.
infinitymdm One additional weird thing occurring on my desktop is that I will sometimes be unable to log in from the lock screen after the computer has gone into standby mode. There will be no option to type in the field where I must enter the password. This behaviour stops if I log out of the account and then log back into the system. It happens when the system is left in standby for longer than 6 hours, so I had just switched to powering off my system when this behaviour occasionally flares up (usually it will happen a few days in a row and then I will switch to powering the system down). I don't think this would be related.
Here is the error as it has been appearing on my desktop system:
[✓] Updating dynamic library cache success
[✗] Updating clr-boot-manager failed
A copy of the command output follows:
[FATAL] cbm (../src/bootman/bootman.c:L562): FATAL: Cannot mount boot device /dev/nvme0n1p1 on /boot: No such device
[✗] Updating clr-boot-manager failed
A copy of the command output follows:
[FATAL] cbm (../src/bootman/bootman.c:L562): FATAL: Cannot mount boot device /dev/nvme0n1p1 on /boot: No such device
[✗] Updating clr-boot-manager failed
A copy of the command output follows:
[FATAL] cbm (../src/bootman/bootman.c:L562): FATAL: Cannot mount boot device /dev/nvme0n1p1 on /boot: No such device
[✓] Running depmod on kernel 6.9.8-294.current success
[✓] Updating hwdb success
[✓] Updating system users success
[✓] Updating systemd tmpfiles success
[✓] Reloading systemd configuration success
[✓] Re-starting vendor-enabled .socket units success
[✓] Compiling and Reloading AppArmor profiles success
[✓] Updating manpages database success
[✓] Reloading udev rules success
[✓] Applying udev rules success
and the output of `sudo journalctl -k | grep nvme` on my desktop system:
Jul 19 16:01:31 solus kernel: nvme nvme0: allocated 64 MiB host memory buffer.
Jul 19 16:01:31 solus kernel: nvme nvme0: 6/0/0 default/read/poll queues
Jul 19 16:01:31 solus kernel: nvme0n1: p1 p2 p3 p4 p5
Jul 19 16:01:33 solus kernel: EXT4-fs (nvme0n1p2): mounted filesystem 2456cde0-a7e1-4af1-99ca-c30eb65f868a r/w with ordered data mode. Quota mode: none.
Jul 19 16:01:31 solus kernel: nvme nvme0: pci function 0000:03:00.0
Jul 19 16:01:31 solus kernel: nvme nvme0: allocated 64 MiB host memory buffer.
Jul 19 16:01:31 solus kernel: nvme nvme0: 6/0/0 default/read/poll queues
Jul 19 16:01:31 solus kernel: nvme0n1: p1 p2 p3 p4 p5
Jul 19 16:01:33 solus kernel: EXT4-fs (nvme0n1p2): mounted filesystem 2456cde0-a7e1-4af1-99ca-c30eb65f868a r/w with ordered data mode. Quota mode: none.
Jul 19 16:01:34 matt-solus-desktop kernel: EXT4-fs (nvme0n1p2): re-mounted 2456cde0-a7e1-4af1-99ca-c30eb65f868a r/w. Quota mode: none.
Jul 19 16:01:34 matt-solus-desktop kernel: Adding 31250428k swap on /dev/nvme0n1p4. Priority:-2 extents:1 across:31250428k SS
Jul 19 16:01:34 matt-solus-desktop kernel: EXT4-fs (nvme0n1p5): mounted filesystem 81b8fa97-0b82-4375-a08a-6d7ec6daa7af r/w with ordered data mode. Quota mode: none.
Jul 19 16:01:35 matt-solus-desktop kernel: EXT4-fs (nvme0n1p3): mounted filesystem 168e8227-9af4-4f3a-b2be-3bdf0875dece r/w with ordered data mode. Quota mode: none.
Jul 19 16:01:36 matt-solus-desktop kernel: block nvme0n1: No UUID available providing old NGUID
Matt_Nico Now I am running into the same issue with the /home partition on my desktop computer. I snapped a picture of it as well:
It seems to be identical to the issue which is present on the laptop. So now I can say that both systems are exhibiting the exact same set of errors.
I have not yet applied the W39 updates as I do not know what will happen when I do. Should I just go for it?
Been watching; not sure why two different computers would have the same issue.
Do you have fast boot turned off on each?
It's interesting.
Matt_Nico One additional weird thing occurring on my desktop is that I will sometimes be unable to log in from the lock screen after the computer has gone into standby mode. There will be no option to type in the field where I must enter the password. This behaviour stops if I log out of the account and then log back into the system. It happens when the system is left in standby for longer than 6 hours, so I had just switched to powering off my system when this behaviour occasionally flares up (usually it will happen a few days in a row and then I will switch to powering the system down). I don't think this would be related.
I actually also have this issue, but I usually resolve it by hitting "Switch User" and then logging back in. It kicks you back to the standby screen, but then password entry works. So I don't think this is related to your primary issue.
The /home failing to mount, alongside the boot manager issue, would lead me to think there's a problem with the storage device. But the fact that it's replicated across both systems should rule that out. The odds of two drives failing in exactly the same way at exactly the same time in two separate systems are basically zero.
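Still, to rule hardware out definitively, a quick NVMe health check costs nothing. This assumes nvme-cli is available (I believe it can be installed with `sudo eopkg install nvme-cli`); look at critical_warning, media_errors, and num_err_log_entries in the output.

```shell
# SMART-style health summary for the NVMe controller
sudo nvme smart-log /dev/nvme0n1
# Recent controller error-log entries, if any
sudo nvme error-log /dev/nvme0n1 | head -n 20
```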
Your issue seems to be beyond my knowledge to troubleshoot. I don't see any issues in your kernel logs either. This sure is a puzzle. If I think of anything I'll chime in, but I'm not sure how to help you at this point; you need someone more experienced than me.