I am having a problem with my Linux desktop and am hoping this community can help. The hardware is only about a year old: an AMD Ryzen 7 7800X3D CPU and an AMD Radeon RX 7900 XTX GPU. I am running KDE Plasma 6.4.4 on Wayland. My distro is Fedora 42, and I keep up with regular updates. The kernel version is 6.15.9.

Lately, I have been getting a flicker of horizontal white stripes occasionally coming across my monitors. It does not seem to matter what the program on the screen is. They’re not making it impossible to use the computer, but it is very distracting when it happens. I am also worried that it may be the hardware failing, but I am hoping it’s just a driver issue.

Is this a known issue with AMD drivers? Part of my concern is that last year I installed amdgpu and rocm from the AMDGPU and ROCm repos to play with AI models locally. Now I am wondering if that is messing with my video drivers. How can I tell which ones are being used? If I want to go back to the stock drivers, do I just uninstall the amdgpu package with dnf?

  • Vik@lemmy.world
    3 months ago

    Perhaps I fudged the dnf command example; it doesn’t appear to be possible to list the gfx kernel driver version via dnf. The entry we see is the external package installed for ROCm support. On that note, it’s worth mentioning that Fedora packages ROCm directly, and I’d recommend uninstalling the downloaded driver and switching over unless you have a specific reason to use the out-of-repo package.

    As for underflow, a colleague of mine provided a nice little run-down: the display buffer holds the image you want to send to the display and is essentially part of GPU memory. When you send it via DisplayPort, HDMI, etc., you add sync signals that mark things like the start of a frame and the start of a line, plus extra data. If this exchange gets messed up at any point (for example, the memory is unreadable, or you send data before it’s ready), you get garbage on-screen.

    One way underflow can happen from the perspective of DAL (display abstraction layer) is during a DPM (dynamic power management) change. This change takes a relatively long time and, on rare occasions, can lead to underflow.
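
    To make the idea concrete, here is a toy model of underflow, assuming a simple FIFO between memory and the display output. It is only a sketch: the function name, the tick-based timing, and all the numbers are made up for illustration, and real display hardware is far more involved. The "stall" stands in for any period where the buffer cannot be refilled, such as a slow power-state change.

```python
def simulate_scanout(ticks, fifo_capacity, fill_per_tick, drain_per_tick,
                     stall_start, stall_len):
    """Toy display FIFO: return the ticks at which scanout ran out of data.

    Hypothetical model, not how any real driver works. Each tick, memory
    refills the FIFO (unless stalled) and the display drains a fixed amount.
    """
    level = fifo_capacity  # start with a full buffer
    underflow_ticks = []
    for t in range(ticks):
        stalled = stall_start <= t < stall_start + stall_len
        if not stalled:
            # Refill from memory, never exceeding the FIFO capacity.
            level = min(fifo_capacity, level + fill_per_tick)
        if level >= drain_per_tick:
            # The display got the pixels it needed for this tick.
            level -= drain_per_tick
        else:
            # Not enough data ready: this is the underflow (garbage on-screen).
            underflow_ticks.append(t)
            level = 0
    return underflow_ticks
```

    With a capacity of 8 and fill and drain rates of 4, a one-tick stall is absorbed by the buffer (`simulate_scanout(20, 8, 4, 4, 10, 1)` returns `[]`), while a two-tick stall is not (`simulate_scanout(20, 8, 4, 4, 10, 2)` returns `[11]`). That matches the "rare occasions" description: it only shows up when the stall outlasts whatever slack the buffer has.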

    Alternatively, sometimes display hardware can’t change refresh rate (for example) as fast as we ask it to, and you may end up with underflow from that. I suppose we could try other cables but the impression I get is that this just started happening at some point?

    If this behaviour is very recent, it could be from the distro-provided amdgpu kernel driver, though I’m not sure whether having the external package installed could cause a conflict. As I mentioned above, it could be worth removing that set of packages and installing the Fedora ROCm metapackage instead (sudo dnf install rocm).
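
    One rough way to see which amdgpu kernel module the system would load is to look at where the module file lives. The sketch below asks `modinfo -n amdgpu` for the file path and classifies it. The path conventions are assumptions: the kernel's own build normally sits under `kernel/drivers/gpu/drm/amd/amdgpu/`, while add-on builds (DKMS, for example) typically land under `extra/` or `updates/`. The function names are made up for illustration.

```python
import subprocess


def classify_module_path(path):
    """Guess whether a module file is the kernel's own build or an add-on.

    The directory names checked here are common conventions, not guarantees.
    """
    p = path.strip()
    if "/kernel/drivers/gpu/drm/amd/amdgpu/" in p:
        return "in-tree"       # shipped with the distro kernel
    if "/extra/" in p or "/updates/" in p:
        return "out-of-tree"   # e.g. built by DKMS from an external package
    return "unknown"


def amdgpu_module_path():
    """Return the file path modinfo reports for amdgpu on the running kernel."""
    result = subprocess.run(
        ["modinfo", "-n", "amdgpu"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()
```

    Calling `classify_module_path(amdgpu_module_path())` on the affected machine should give a hint: "in-tree" would suggest the stock kernel driver is in use and the external packages only add userspace pieces, while "out-of-tree" would suggest the external package replaced the kernel module.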

    • folekaule@lemmy.worldOP
      3 months ago

      Thank you for the detailed answer, this is very informative. I should read up some more on the underflow issue. This is the first time I’ve heard that term. I’m familiar with buffer underflows, but this sounds a little more complex.

      You are right, it did start happening just recently (within the last couple of weeks maybe). I forgot to mention that I am running this through a 4-port KVM. I didn’t think it relevant before because I’m not seeing any issues when using another port (my work PC), but I can’t rule it out.

      It has been some time since I played with it, but I think the reason I went out-of-repo at the time was to get recent enough versions of amdgpu and rocm to run ollama. I was following some online guide and had no idea what I was doing so I probably messed something up.

      It sounds like it should be safe to uninstall the out-of-repo amdgpu and rocm packages. I am not doing any local AI right now so I can probably leave it out. I do use the PC for gaming, but from what I’ve been reading it sounds like the standard drivers are good enough for that now.

      Thank you everyone for your help. I didn’t expect anyone to actually reply and you guys have been awesome! I am going to swap the cables first of all since that’s an easy thing to try and see if anything changes.

      • Vik@lemmy.world
        3 months ago

        To be honest, I feel as if I may have jumped the gun somewhat by suggesting this could be display underflow. I realise this could be tricky, but if you could somehow capture some footage of it happening, I’d be curious to see it.

        Does this white line show up across both displays in tandem?

        • folekaule@lemmy.worldOP
          3 months ago

          It’s very quick and unpredictable. I have not noticed it across both screens at the same time, but can’t rule that out. They’re pretty thin lines and usually only one at a time, like a scan line on a TV. It just started becoming more noticeable recently so I’ll try to keep track of when it happens.

          • Vik@lemmy.world
            2 months ago

            If it’s captured via a DVR system, it’d likely be a UMD (user-mode driver) issue. If not, I would attribute it to DAL / KMD (kernel-mode driver).

            Try to use the instant replay feature in GPU screen recorder to see if you can cap it in there?

              • Vik@lemmy.world
                2 months ago

                I don’t think it would either, to be honest, but it’s a good way to determine whether a form of visual corruption is caused by the display abstraction layer or by the gfx UMD (or something else entirely, like the OS). I use the same technique at work 😅