
(thanks @pomon, @Staudey and Copilot)
Everything works now. I'll explain the problem in detail:
Symptoms
After an NVIDIA driver update, I found myself with an unusable PC. Four times.
When I started my laptop, I was greeted with a blinking white bar instead of KDE Plasma. Several times I thought the OS was at fault; it turned out it wasn't, since Boot Rescue (very good guide BTW) didn't bring my system back. I learned way too late (7 months after the first "blackout") that I could open a session in a TTY (Ctrl+Alt+F3) and start Plasma manually: startplasma-wayland.
Once in Plasma, I opened the settings and saw that my kernel was as old as the world itself (joking of course, it was actually a few months old I guess) and that my graphics card (NVIDIA, you guessed it) was no longer recognized. I uninstalled the drivers and my setup was back on track.
So I was in a situation where my system could start most of the time (sometimes I couldn't enter my password and had to go through a TTY), but I couldn't use my graphics card. That's where this discussion started.
What caused this situation
There are several reasons why this rare problem persisted. While I doubt anyone will end up in such a complete mess, I will detail the causes in case someone faces a similar problem.
Remnants from other OSes: honestly, I'm not proud of this one. When I broke Solus for the first time (that time it was entirely my fault, I shut the system down while it was updating), I reinstalled it over the previously used partition. Then Solus broke again, because of those drivers. I reinstalled again, and it happened again. Then I tried Fedora Kinoite on the same machine: problems with the NVIDIA drivers. Back to Solus. Another crash. At this stage, my GRUB looked like a grocery list (3 Fedora entries, 4 Solus 4.7). I guess this explains the old LTS kernel problem: a leftover from an old install must have loaded a prehistoric kernel, making my system barely usable.
Full EFI partition: once again, a bit of beginner negligence had consequences. My EFI partition is only 512 MB. I remember making it bigger (1 GB) at some point during a Solus install, but obviously that wasn't the last install I did.
A boot device ... not in /boot: this one is the most comical. My boot directory was virtually empty. The kernels did exist, of course, since sudo clr-boot-manager list-kernels showed SIX OF THEM. No surprise, then, that my already small EFI partition was 93% full. In summary, I could mount a boot partition that didn't exist, and couldn't use bootman, since it didn't know where the kernels were and/or the EFI partition was full (as @Staudey pointed out).
The solutions
The process to solve these issues is, as expected, a bit tedious:
Print the list of installed kernels: sudo clr-boot-manager list-kernels
Check the kernel currently in use: uname -a. If it isn't the most recent installed kernel, that's definitely not good.
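If you'd rather script that comparison, here is a minimal bash sketch (not something we used in the thread). The function names are made up, and it assumes the kernel file names follow the kernel-com.solus-project.<type>.<version>-<release> pattern seen in the rm commands further down, and that uname -r prints <version>-<release>.<type> (e.g. 6.12.9-312.current). Check both assumptions on your own system.

```bash
#!/usr/bin/env bash
# Sketch: compare the running kernel with the newest installed one.
# ASSUMPTION: names look like com.solus-project.<type>.<version>-<release>
# and `uname -r` prints <version>-<release>.<type>.

# Read kernel names on stdin, print the one with the highest version.
newest_kernel() {
  awk -F. 'NF >= 4 { v = $4; for (i = 5; i <= NF; i++) v = v "." $i; print v, $0 }' \
    | sort -k1,1V | tail -n 1 | cut -d' ' -f2
}

# Turn com.solus-project.current.6.12.9-312 into 6.12.9-312.current.
to_uname_release() {
  awk -F. '{ v = $4; for (i = 5; i <= NF; i++) v = v "." $i; print v "." $3 }' <<< "$1"
}

# Usage: check_running_kernel NAME [RUNNING_RELEASE]
# RUNNING_RELEASE defaults to the output of `uname -r`.
check_running_kernel() {
  local expected running
  expected=$(to_uname_release "$1")
  running=${2:-$(uname -r)}
  if [ "$running" = "$expected" ]; then
    echo "OK: running the newest kernel ($running)"
  else
    echo "WARNING: running $running, newest installed is $expected"
    return 1
  fi
}

# Example, with the EFI partition mounted on /boot:
#   newest=$(ls /boot/EFI/com.solus-project | sed -n 's/^kernel-//p' | newest_kernel)
#   check_running_kernel "$newest"
```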
At this point, @pomon advised me to remove old kernels to free up space and then update the boot manager, which I couldn't do since my boot device wasn't recognized. That's when I shamefully turned to Copilot for help (still, thanks to @pomon and @Staudey for their guidance). Here is how we solved the /boot issue.
Identify the EFI partition: lsblk (lsblk -f also shows the filesystem type and UUID), then look for the 512 MB or 1 GB partition formatted as FAT32 (vfat). Most of the time it's even labeled as the EFI partition.
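A small bash sketch to narrow this down, assuming a GPT disk: it filters lsblk -rno NAME,PARTTYPE output for the standard EFI System partition type GUID. The function name is hypothetical. If it prints nothing, the partition type is probably not set yet (that's the gdisk step), so fall back to spotting the small vfat partition in lsblk -f by hand.

```bash
#!/usr/bin/env bash
# Sketch: list partitions whose GPT type is "EFI System" (EF00 in gdisk).
# Input: the output of `lsblk -rno NAME,PARTTYPE` on stdin.
ESP_GUID="c12a7328-f81f-11d2-ba4b-00a0c93ec93b"  # standard ESP type GUID

find_esp() {
  # Whole disks have no PARTTYPE, so only 2-field lines can match.
  awk -v guid="$ESP_GUID" 'NF == 2 && tolower($2) == guid { print "/dev/" $1 }'
}

# Example on the real system:
#   lsblk -rno NAME,PARTTYPE | find_esp
```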
Change the partition type: sudo gdisk /dev/nvme0n1 (the disk, not the partition), then:
- press p to print the partition table and note the number of the EFI partition found in the previous step (in my case, 1)
- press t to change the partition type, enter that number again, then EF00 to make it an EFI System partition (if it isn't one already)
- press p to check the result, then w to write the changes to disk and exit (confirm with y)
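For reference, the same type change can be done non-interactively with sgdisk, which usually ships alongside gdisk. This bash sketch only prints the command unless APPLY=1 is set; the function name and the APPLY switch are my own invention, and the disk and partition number are the ones from my setup.

```bash
#!/usr/bin/env bash
# Sketch: set a partition's GPT type to EF00 (EFI System) with sgdisk.
# Dry-run by default: prints the command. Set APPLY=1 to run it with sudo.
set_esp_type() {
  local disk=$1 partnum=$2
  local cmd=(sgdisk --typecode="${partnum}:EF00" "$disk")
  if [ "${APPLY:-0}" = 1 ]; then
    sudo "${cmd[@]}"
  else
    echo "${cmd[*]}"
  fi
}

# Example:
#   set_esp_type /dev/nvme0n1 1
#   APPLY=1 set_esp_type /dev/nvme0n1 1
```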
Check the partition: sudo fdisk -l /dev/nvme0n1. The result should look like this: /dev/nvme0n1p1 2048 ... 512M EFI System
Update /etc/fstab for a persistent mount: run lsblk -f to list the partitions again and note the UUID of the EFI partition, then open fstab with nano: sudo nano /etc/fstab. Replace the existing UUID with the one you noted (in my case, the previous one was extremely long, while the new one had 8 characters); the line should look like this: UUID=YOUR-UUID-HERE /boot vfat umask=0077 0 1. Save and exit (the reason for /boot instead of /boot/efi is explained below).
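If you prefer not to copy the UUID by hand, here is a bash sketch that builds the same fstab line and shows the existing /boot entries so you know which one to replace. The helper names are hypothetical; blkid -s UUID -o value prints only the UUID of a device. A vfat UUID looks like XXXX-XXXX, which matches the 8 characters mentioned above. Back up fstab before editing it.

```bash
#!/usr/bin/env bash
# Sketch: build the /boot line for /etc/fstab from the EFI partition UUID.
# Options copied from the line above: vfat, umask=0077, dump 0, pass 1.
fstab_boot_line() {
  printf 'UUID=%s /boot vfat umask=0077 0 1\n' "$1"
}

# Read fstab text on stdin, print the uncommented entries mounted on /boot.
boot_entries() {
  awk '$1 !~ /^#/ && $2 == "/boot"'
}

# Example:
#   sudo cp /etc/fstab /etc/fstab.bak
#   uuid=$(sudo blkid -s UUID -o value /dev/nvme0n1p1)
#   fstab_boot_line "$uuid"
#   boot_entries < /etc/fstab
```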
At this point, Copilot noticed something strange: while every forum/tutorial I found mentioned /boot/efi/ as the directory the boot manager works with, my system actually expected /boot to be the reference directory, so we went with that. Mount /boot: sudo umount /boot/efi if it's mounted, then sudo mount /dev/nvme0n1p1 /boot.
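A slightly more careful version of the mount step, as a bash sketch: it only unmounts /boot/efi when something is actually mounted there (mountpoint comes from util-linux) and prints the commands instead of running them unless APPLY=1. The helper names are hypothetical, and /dev/nvme0n1p1 is my EFI partition, not necessarily yours.

```bash
#!/usr/bin/env bash
# Sketch: mount the EFI partition on /boot.
# Dry-run by default: prints commands. APPLY=1 runs them with sudo.
run() {
  if [ "${APPLY:-0}" = 1 ]; then sudo "$@"; else echo "$*"; fi
}

mount_esp_on_boot() {
  local dev=${1:-/dev/nvme0n1p1}
  # Unmount /boot/efi only if something is actually mounted there.
  if mountpoint -q /boot/efi 2>/dev/null; then
    run umount /boot/efi
  fi
  run mount "$dev" /boot
}

# Example:
#   mount_esp_on_boot /dev/nvme0n1p1
#   APPLY=1 mount_esp_on_boot /dev/nvme0n1p1
```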
Check free space on the EFI partition, then clean it: df -h /boot /boot/efi shows how much space is left in those 2 directories. As expected, /boot was almost completely full. We checked the contents of /boot/EFI/com.solus-project to know which files to delete (kernels + associated initrds), then removed all the useless kernels (quite a lot, it's almost ridiculous):
sudo rm /boot/EFI/com.solus-project/initrd-com.solus-project.current.6.12.5-311
sudo rm /boot/EFI/com.solus-project/initrd-com.solus-project.current.6.12.5-311.nvidia
sudo rm /boot/EFI/com.solus-project/initrd-com.solus-project.current.6.12.9-312
sudo rm /boot/EFI/com.solus-project/initrd-com.solus-project.current.6.12.9-312.nvidia
sudo rm /boot/EFI/com.solus-project/initrd-com.solus-project.lts.6.6.70-263
sudo rm /boot/EFI/com.solus-project/kernel-com.solus-project.current.6.12.5-311
sudo rm /boot/EFI/com.solus-project/kernel-com.solus-project.current.6.12.9-312
sudo rm /boot/EFI/com.solus-project/kernel-com.solus-project.lts.6.6.70-263
sudo rm /boot/EFI/com.solus-project/kernel-com.solus-project.lts.6.6.75-264
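Instead of typing every rm by hand, a bash sketch like this one can list the stale files for you. It only prints the commands, keeps every kernel and initrd (including the .nvidia initrd) whose name you pass as an argument, and assumes the file naming seen in the list above; the function name is made up. Always keep the kernel you're currently running, and review the output before running any of it.

```bash
#!/usr/bin/env bash
# Sketch: print (not run) rm commands for kernel/initrd files that do not
# belong to the kernels you want to keep.
# ASSUMPTION: names follow kernel-<name> and initrd-<name>[.suffix],
# e.g. initrd-com.solus-project.lts.6.12.28-269.nvidia
stale_boot_files() {
  local dir=$1; shift
  local f base keep k
  for f in "$dir"/kernel-* "$dir"/initrd-*; do
    [ -e "$f" ] || continue            # the glob matched nothing
    base=${f##*/}
    keep=0
    for k in "$@"; do                  # k: a full kernel name to keep
      case $base in
        kernel-"$k" | initrd-"$k" | initrd-"$k".*) keep=1 ;;
      esac
    done
    [ "$keep" = 1 ] || echo "sudo rm $f"
  done
}

# Example:
#   stale_boot_files /boot/EFI/com.solus-project com.solus-project.lts.6.12.28-269
```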
Check free space after cleaning: df -h /boot. My EFI partition is now as clean as a whistle. However, updating the boot manager (sudo clr-boot-manager update) and setting the recent kernel as the default (sudo clr-boot-manager set-kernel com.solus-project.lts.6.12.28-269) didn't make Solus boot with it. The strangest part is that sudo clr-boot-manager list-kernels shows the newest kernel as the main one.
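To check the space before running the update, here is a bash sketch that reads the usage percentage from df -P (the portable output format, where the 5th column is the "Capacity" percentage). The helper names are hypothetical and the 90% default threshold is an arbitrary choice of mine.

```bash
#!/usr/bin/env bash
# Sketch: report how full a filesystem is, from `df -P` output.

# Read `df -P` output on stdin, print the first filesystem's usage as a number.
parse_pcent() {
  awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Usage: warn_if_full PATH [THRESHOLD]  (returns 1 at or above THRESHOLD)
warn_if_full() {
  local path=$1 limit=${2:-90} used
  used=$(df -P "$path" | parse_pcent)
  if [ "$used" -ge "$limit" ]; then
    echo "WARNING: $path is ${used}% full"
    return 1
  fi
  echo "$path is ${used}% full"
}

# Example: only refresh the boot entries when there is room left.
#   warn_if_full /boot && sudo clr-boot-manager update
```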
Set and clean up the boot order: this final step, specific to my configuration I think (since I did sooooo well by reinstalling OSes), consists of finding the boot order, changing it, then removing all the crap left by other OSes. We get the order by running sudo efibootmgr. In my case, the Solus bootloader was Boot0000 and the Fedora bootloader Boot0005; guess which one came first in the boot sequence (hint: the oldest). We changed that with the following command: sudo efibootmgr -o 0000,0006,0005,0001,0002,0003,0004. Since I no longer have Fedora, I removed its boot entry: sudo efibootmgr -b 0005 -B. Reboot afterwards, then install the drivers using DoFlicky after checking that the correct kernel is being used.
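To avoid typos in the efibootmgr -o list, the new order can be computed from efibootmgr's own output. This bash sketch (hypothetical helper names) extracts the BootOrder line and moves one entry, e.g. 0000 for Solus in my case, to the front. It only prints the result, which you then pass to sudo efibootmgr -o after checking it.

```bash
#!/usr/bin/env bash
# Sketch: compute a new UEFI boot order with one entry moved to the front.

# Read `efibootmgr` output on stdin, print the BootOrder value.
boot_order() {
  awk -F': *' '$1 == "BootOrder" { print $2 }'
}

# Usage: move_first ENTRY ORDER   e.g. move_first 0000 0005,0000,0001
move_first() {
  local first=$1 order=$2 out=$1 e
  local -a entries
  IFS=, read -ra entries <<< "$order"
  for e in "${entries[@]}"; do
    [ "$e" = "$first" ] || out="$out,$e"   # keep the others in their order
  done
  echo "$out"
}

# Example:
#   new=$(move_first 0000 "$(sudo efibootmgr | boot_order)")
#   echo "$new"   # check it, then: sudo efibootmgr -o "$new"
```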
Results
Two of my problems have been solved:
- My system is restored: I can use my graphics card as I did before the update
- My system boots into Solus automatically, without me having to pick it over Fedora in GRUB.
I hope this discussion helps someone in the future; thanks again to @pomon and @Staudey. I'll go enjoy my not-so-vanilla Solus KDE. Have a nice journey on this OS.