I have a Dell Inspiron 5770 laptop with Intel UHD Graphics 620 and a discrete AMD ATI Radeon R7.
With the 5.3 series I would get i915 fifo overrun messages that manifested as a screen flash - these happened infrequently. With the latest 5.4 I've now been getting lockups.

Relevant dmesg:

[ 7294.702710] i915 0000:00:02.0: GPU recovery timed out, cancelling all in-flight rendering.
[ 7294.702895] i915 0000:00:02.0: Resetting chip for hang on rcs0
[ 7295.198781] pcieport 0000:00:1c.0: Intel SPT PCH root port ACS workaround enabled
[ 7295.206230] [drm] PCIE GART of 256M enabled (table at 0x000000F400000000).
[ 7295.211065] amdgpu: [powerplay] can't get the mac of 5
[ 7295.414739] rfkill: input handler enabled
[ 7296.686784] i915 0000:00:02.0: GPU recovery timed out, cancelling all in-flight rendering.
[ 7296.687530] i915 0000:00:02.0: Resetting chip for hang on rcs0

I've had to reboot the laptop to recover each time this has happened. As a temporary workaround I've enabled my discrete AMD card and will see if that works or not.
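
To see how often the hangs occur, the dmesg lines above can be tallied with a short script. This is only a minimal sketch: `count_i915_events` is a made-up helper name, the sample text is copied from the excerpt above, and the two patterns match the wording of these particular messages, which other kernel versions may phrase differently.

```python
import re
from collections import Counter

# Patterns copied from the dmesg excerpt above; other kernels may log different text.
PATTERNS = {
    "recovery_timeout": re.compile(r"i915 .*GPU recovery timed out"),
    "chip_reset": re.compile(r"i915 .*Resetting chip for hang on \w+"),
}

def count_i915_events(log_text):
    """Count i915 hang-related lines in dmesg-style text (hypothetical helper)."""
    counts = Counter()
    for line in log_text.splitlines():
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts

# Sample input: a subset of the lines quoted in the post.
SAMPLE = """\
[ 7294.702710] i915 0000:00:02.0: GPU recovery timed out, cancelling all in-flight rendering.
[ 7294.702895] i915 0000:00:02.0: Resetting chip for hang on rcs0
[ 7295.414739] rfkill: input handler enabled
[ 7296.686784] i915 0000:00:02.0: GPU recovery timed out, cancelling all in-flight rendering.
[ 7296.687530] i915 0000:00:02.0: Resetting chip for hang on rcs0
"""

if __name__ == "__main__":
    print(dict(count_i915_events(SAMPLE)))
```

For a real log, the sample string could be replaced with text read from a saved dmesg file.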

Side rant: the Linux kernel has become increasingly unstable with the 5.0+ series. I've actually had data loss thanks to the poor quality of kernel development. This isn't the responsibility of the Solus team of course - I really like Solus and Budgie and the curated rolling release model. But with crappy quality control in upstream Linux kernel development you might start thinking seriously about offering only an LTS kernel. I'm probably going to have to switch to the -lts package anyways if my AMD workaround fails.

    appmath I tend to agree, now that my wifi keeps shutting down after new kernel updates. Maybe the Solus team should slow down the newer kernel installations until they have been tested over time? Unless an Ethernet connection is readily available as a backup, the rude surprise of my machine not being connected to the internet makes me a not very happy camper. This is a serious issue for those whose PC is a daily driver for work and home.

    Regarding LTS vs Latest: the main reason we provide the -current releases of the kernel is hardware support. If we didn't, you might have to wait several months after the -current release gets support for new hardware before that support is backported. That could be over a year in some cases, and that's assuming it ever gets backported at all. Unfortunately, 4.14 was a problem child when we had it as -current, so we were forced to stay back with 4.9. This happened again with 4.19 because it was the first LTS after the new AMD DC/DAL stack was merged, which broke a lot of things with older AMD GPUs. So again 4.9 has been held back as the LTS release. Well guess what folks, 5.4 is slated to be the next LTS and once again we have several people having huge regressions that don't seem to be getting fixed fast enough. I'm not a superstitious person, but damn if it doesn't seem like all of the LTS releases after 4.9 are just cursed.

    I don't know how to fix this problem, but I also can't in good conscience keep us on an EOL kernel and 4.9 just doesn't support most new hardware.

      I didn't get a lockup with my AMD ATI R7 workaround but there was a momentary freeze when the i915 driver screwed up again.
      So after that happened I tried the linux-lts ... Sadly I can't use that one because my wi-fi device crashes on startup:

      [ 12.166125] ath10k_pci 0000:03:00.0: firmware crashed! (uuid n/a)
      [ 12.166127] ath10k_pci 0000:03:00.0: qca9377 hw1.1 target 0x05020001 chip_id 0x003821ff sub 1028:1810
      [ 12.166128] ath10k_pci 0000:03:00.0: kconfig debug 0 debugfs 0 tracing 0 dfs 0 testmode 0
      [ 12.166577] ath10k_pci 0000:03:00.0: firmware ver WLAN.TF.1.0-00002-QCATFSWPZ-5 api 5 features ignore-otp crc32 c3e0d04f
      [ 12.166734] ath10k_pci 0000:03:00.0: board_file api 2 bmi_id N/A crc32 8aedfa4a
      [ 12.166735] ath10k_pci 0000:03:00.0: htt-ver 0.0 wmi-op 4 htt-op 3 cal otp max-sta 32 raw 0 hwcrypto 1
      [ 12.169051] ath10k_pci 0000:03:00.0: firmware register dump:
      [ 12.169052] ath10k_pci 0000:03:00.0: [00]: 0x05020001 0x00000000 0x00A0F774 0x00000000
      [ 12.169053] ath10k_pci 0000:03:00.0: [04]: 0x00A0F774 0x00060130 0x00000010 0xFFFFE000
      [ 12.169054] ath10k_pci 0000:03:00.0: [08]: 0x0042136C 0x00420660 0x00400000

      <etc>

      Oh well.

      DataDrake

      Since you're at the mercy of an incompetent Linux kernel team (I don't know what the hell Linus thinks is going on) all I can suggest is taking a stable LTS kernel from Red Hat/Ubuntu/etc (whoever has the installed base to find problems quickly and the manpower to apply custom patches) and using that as the Solus LTS.

        appmath Not sure if "incompetent" is really the right word... I'm guessing the whole kernel thing does come with a certain complexity.

        DataDrake Thanks for that post. Lots of insights there for a noob like me 😁 Funny enough 5.4.8-141.current is the first current kernel that works flawlessly on my XPS 9370. Until now I stayed on the LTS kernel but that has a problem with bluetooth on my machine.

        • [deleted]

        Can confirm the lockups on the GNOME Edition as well with 5.4. Booting the previous 5.3.18-140 for now, with no issues there.

          [deleted]
          I've had to return to 5.3.18-140 as well since my attempt at a workaround (using my discrete gpu) failed. I suppose blacklisting the i915 driver would be the next thing to try but I find myself unmotivated to do that.
          So for the first time I find myself running a FrankenSolus - up to date but downgraded to the previous linux-current, -headers and virtualbox.
          So far so good...

            appmath How did you get the headers for 5.3.18-140? I only installed the kernel but could not find the headers. And why roll back VirtualBox as well? FrankenSolus... lol! 🤣

              elfprince
              I rolled back VirtualBox because when you install it manually you generally have to rebuild the kernel modules using dkms. Not being sure how Solus does it (I haven't had to worry about it, Solus just works), I wanted to be safe rather than sorry. 99%+ of the time we get the kernel, headers and VirtualBox updated together.

              The current headers probably aren't necessary since I'm not manually building drivers atm but I wanted to keep my FrankenSolus at least somewhat consistent with how the upgrade process goes.

              • linux-current-5.3.18-140-1-x86_64.eopkg
              • linux-current-headers-5.3.18-140-1-x86_64.eopkg

              Thanks, but headers don't seem to exist.
              $ sudo eopkg it linux-current-headers-5.3.18-140-1-x86_64.eopkg
              Password:
              Program terminated.
              Cannot open package file: [Errno 2] No such file or directory: 'linux-current-headers-5.3.18-140-1-x86_64.eopkg'
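
The `Errno 2` above only says that the named file is not in the directory the command was run from. A minimal sketch for checking which of the downloaded packages are actually present before running the install; `missing_packages` is a hypothetical helper, and the file names are simply the two listed earlier in the thread:

```python
from pathlib import Path

# The two package files named earlier in the thread.
PACKAGES = [
    "linux-current-5.3.18-140-1-x86_64.eopkg",
    "linux-current-headers-5.3.18-140-1-x86_64.eopkg",
]

def missing_packages(names, directory="."):
    """Return the names that are not regular files in `directory` (hypothetical helper)."""
    base = Path(directory)
    return [name for name in names if not (base / name).is_file()]

if __name__ == "__main__":
    missing = missing_packages(PACKAGES)
    if missing:
        print("Not found in the current directory:", ", ".join(missing))
    else:
        print("All package files are present.")
```

If a name is reported as missing, the file was either never downloaded or was saved somewhere else, and the install command needs the path to wherever it actually is.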

                DataDrake just to throw that in here randomly, if it's okay.
                The last update with 5.4 is actually the first kernel update that gave me problems on my Solus machine. The entire system "feels" unstable; some tasks don't close as fast as they used to and/or make the system "stutter" while the task is getting closed/killed.

                I also have graphical glitches in some games with this kernel. I can't tell if this has anything to do with the kernel or if it's an Nvidia driver issue, but with the latest 5.3 kernel those problems do not happen at all. I am using a Ryzen 1700+ btw, with an Nvidia 1080.

                Edit: the budgie-wm process also has unusually high CPU load. It hovers around 8% when idle and goes up to 60 or 70% when I do simple stuff like using nautilus windows or Firefox. That also seems to be the reason for the "stuttering when closing tasks": my fans go crazy because CPU load shoots up, and then everything slows down because the budgie-wm process takes all the CPU.

                14 days later

                Looks like it is kernel related (rather than the Nvidia driver).
                My machine is running on the internal GPU (Intel 620) only, and I am getting the GPU panics as well.

                Driver: 20190822
                Time: 1580203603 s 941455 us
                Boottime: 5955 s 161091 us
                Uptime: 14 s 9601 us
                Epoch: 4300622016 jiffies (1000 HZ)
                Capture: 4300622016 jiffies; 184551 ms ago, 0 ms after epoch
                Reset count: 1
                Suspend count: 0
                Platform: KABYLAKE
                Subplatform: 0x0
                PCI ID: 0x5917
                PCI Revision: 0x07
                PCI Subsystem: 10cf:1959

                7 days later

                I can also attest that 5.4.x kernels are giving me an unstable system with Intel video:
                kernel: i915 0000:00:02.0: GPU HANG: ecode 9:1:0x00000000, hang on rcs0
                The mouse moves but the picture is frozen; I can ssh in and reboot. Sometimes I can switch to another tty, e.g. with ctrl+alt+f3; other times switching is very slow or does not happen at all.
                I noticed that electron-based(?)/GPU-accelerated(?) software, like Opera/Chrome, Spotify and VS Code, sometimes starts to flicker and becomes unusable.

                Since I can choose to boot 5.3.x, I use it. So far it works without problems.

                5.4.12 brought back my troubles as well while 5.4.8 worked nicely. Not sure if it's really the kernel itself or some incompatibility though. My knowledge of these things is very superficial.

                The 5.5.1 kernel is coming soon and may have fixes if enough people reported this issue to the kernel developers.

                7 days later

                I had the same problem with kernel 5.4. 4.9 works nicely, but the wifi/ethernet drivers do not work properly with it.