A Nouveau graphics driver update

By Jonathan Corbet
December 1, 2023
LPC
Support for NVIDIA graphics processors has traditionally been a sore point for Linux users; NVIDIA has not felt the need to cooperate with the kernel community or make free drivers available, and the reverse-engineered Nouveau driver has often struggled to keep up with product releases. There have, however, been signs of improvement in recent years. At the 2023 Linux Plumbers Conference, graphics subsystem maintainer Dave Airlie provided an update on the state of support for NVIDIA GPUs and what remains to be done.

The kernel community's relationship with NVIDIA "has gone up and down" over the years, Airlie began. Recently, though, the company has rearchitected its products, adding a large RISC-V processor (the GPU system processor, or GSP) and moving much of the functionality once handled by drivers into the GSP firmware. The company allows that firmware to be used by Linux and shipped by distributors. This arrangement brings a number of advantages; for example, it is now possible for the kernel to do reclocking of NVIDIA GPUs, running them at full speed just like the proprietary drivers can. It is, he said, a big improvement over the Nouveau-only firmware that was provided previously.

There are a number of disadvantages too, though. The firmware provides no stable ABI, and a lot of the calls it provides are not documented. The firmware files themselves are large, in the range of 20-30MB, and two of them are required for any given device. That significantly bloats a system's /boot directory and initramfs image (which must provide every version of the firmware that the kernel might need), and forces the Nouveau developers to be strict and careful about picking up firmware updates.
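To get a feel for the numbers, here is a back-of-the-envelope sketch in Python. The 20-30MB range and the two-files-per-device figure come from the talk; the number of firmware versions an image might carry is purely an assumption for illustration, and `firmware_footprint_mb` is a made-up helper name.

```python
# Rough estimate of the space GSP firmware takes up for one device.
# FILE_SIZE_MB and FILES_PER_DEVICE reflect the figures quoted in the talk;
# the version counts passed in below are hypothetical.

FILES_PER_DEVICE = 2
FILE_SIZE_MB = (20, 30)  # (low, high) estimate per firmware file


def firmware_footprint_mb(versions):
    """Return (low, high) total size in MB when `versions` releases are kept."""
    low, high = FILE_SIZE_MB
    return (versions * FILES_PER_DEVICE * low,
            versions * FILES_PER_DEVICE * high)


if __name__ == "__main__":
    for versions in (1, 2, 3):
        low, high = firmware_footprint_mb(versions)
        print(f"{versions} firmware version(s): {low}-{high} MB")
```

Even under these simple assumptions, each additional supported firmware version adds tens of megabytes to the image.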

Nouveau work has taken a bit of a setback since longtime developer Ben Skeggs left the project, but he did manage to do a lot of refactoring before he went. Nouveau now has initial GSP support for one firmware version; that code was merged in for the 6.7-rc1 release. It is only enabled for the Ada series of GPUs by default; with a command-line argument it can be made to work with Turing and Ampere devices as well. It is missing some features, including fault handling (which "shouldn't be too hard" to add) and sensor monitoring, which doesn't work at all.

NVIDIA's firmware, Airlie said, comes with a set of include files that, in turn, define structures that change over time. To deal with these changes, the driver is going to need some sort of automated ABI generation; he noted that the developers working on the Apple M1 GPU driver have run into the same problem. This problem could be made easier to tackle, he suggested, if the driver were, like the M1 driver, to be rewritten in Rust.
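The difficulty with changing structures can be illustrated with a small Python sketch using `ctypes`. The structure names, fields, and version numbers below are entirely hypothetical and do not correspond to any real NVIDIA header; the point is only that inserting a field moves everything after it, so a driver must know exactly which layout a given firmware version expects.

```python
import ctypes


class InitArgsV1(ctypes.LittleEndianStructure):
    """Hypothetical message layout for firmware version 1."""
    _fields_ = [
        ("size", ctypes.c_uint32),
        ("flags", ctypes.c_uint32),
        ("heap_base", ctypes.c_uint64),
    ]


class InitArgsV2(ctypes.LittleEndianStructure):
    """Hypothetical layout for version 2: two fields inserted in the middle."""
    _fields_ = [
        ("size", ctypes.c_uint32),
        ("flags", ctypes.c_uint32),
        ("log_level", ctypes.c_uint32),
        ("reserved", ctypes.c_uint32),
        ("heap_base", ctypes.c_uint64),
    ]


# One table entry per supported firmware version; a generator working from
# the vendor's include files could, in principle, produce such a table.
LAYOUTS = {1: InitArgsV1, 2: InitArgsV2}


def pack_init_args(version, heap_base):
    """Build the message bytes using the layout for the given version."""
    layout = LAYOUTS[version]
    msg = layout()
    msg.size = ctypes.sizeof(layout)
    msg.heap_base = heap_base
    return bytes(msg)


if __name__ == "__main__":
    for version, layout in LAYOUTS.items():
        print(version, ctypes.sizeof(layout), layout.heap_base.offset)
```

In this sketch, the same `heap_base` value lands at a different byte offset depending on the version, which is the kind of mismatch that hand-maintained definitions tend to get wrong.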

Next steps

Supporting the GSP firmware is just the beginning, though; at this point, Airlie took a step back and talked about the task of making a useful GPU driver in general. Years ago, a graphics card came with some video RAM and a graphics translation table (GTT). The driver would map system memory into the graphics card; user space could then submit buffer handles that would be relocated for the graphics device. This approach works, he said, but it is slow.
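As a rough illustration of the relocation step described above, consider the following Python sketch. The `Reloc` record, the command-stream format, and the address table are all invented for this example; real drivers use their own formats, and the sketch ignores pinning, validation, and cache handling.

```python
from dataclasses import dataclass


@dataclass
class Reloc:
    cmd_index: int  # which word of the command stream to patch
    handle: int     # buffer handle supplied by user space
    delta: int      # byte offset within that buffer


def relocate(commands, relocs, gtt_address):
    """Return a copy of `commands` with buffer handles resolved to addresses.

    `gtt_address` maps each handle to the address the kernel chose for it.
    """
    patched = list(commands)
    for reloc in relocs:
        patched[reloc.cmd_index] = gtt_address[reloc.handle] + reloc.delta
    return patched


if __name__ == "__main__":
    commands = [0xAA, 0, 0xBB, 0]            # zeros are placeholders
    relocs = [Reloc(1, 7, 0), Reloc(3, 9, 0x40)]
    gtt_address = {7: 0x10000, 9: 0x20000}   # decided at submission time
    print([hex(word) for word in relocate(commands, relocs, gtt_address)])
```

Every submission has to be walked and patched in this way before the hardware can run it, which gives a sense of where the overhead comes from.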

Current GPUs have full virtual memory, instead, which saves a lot of that overhead. The kernel has grown a number of subsystems for working with this virtual memory, including the graphics execution manager (GEM) for buffer-object management, the translation table manager (TTM) for discrete video-RAM buffer-object management, and a bunch of synchronization and fencing code. Initially, the DRM subsystem would tie the allocation of a buffer to an allocation of virtual memory at the same time; that was easy to do and sufficed to implement OpenGL. But, he said, the graphics world moved on from there.

Specifically, Vulkan came along. It brought the concept of sparse memory and, with it, virtual memory that is managed by user space. Vulkan can handle both synchronous and asynchronous virtual-area updates, but it "gets complicated". Various drivers started inventing their own virtual-area management; as a way of bringing that work back together, the VM_BIND API was developed.

This is consistent with a recurring pattern, Airlie said. The DRM developers work to share common code between graphics drivers, but the driver developers keep trying to reinvent wheels, a tendency that has to be resisted. The subsystem did well with regard to mode setting, he said, but less well on the acceleration side; there is a "common GPU scheduler" that is only used by one driver, for example. Similarly, there are a lot of drivers implementing VM_BIND by doing their own virtual-area management.

In response, Airlie came up with the "good idea" of getting somebody else to write a common virtual-area manager, called GPUVM, inspired by the amdgpu code. It is intended to be useful for all drivers; it is used by the Nouveau, Xe (Intel's new driver), and Panfrost drivers now. Hopefully the amdgpu and MSM drivers will pick it up as well. The best part is that there are multiple developers who understand it and can help to keep it from going off in the wrong direction. GPUVM has been through a lot of iterations, he said, providing "lots of learning experiences".
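To make the idea of user-space-managed virtual areas more concrete, here is a toy model in Python. It is not the GPUVM API; the class and method names are made up, and the sketch simply rejects overlapping ranges where a real manager would have more involved work to do.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Mapping:
    va: int         # start of the virtual range, chosen by user space
    size: int       # length of the range in bytes
    bo: str         # name of the backing buffer object (hypothetical)
    bo_offset: int  # offset into that buffer object


class ToyGpuVm:
    """Tracks which virtual ranges are bound to which buffer objects."""

    def __init__(self) -> None:
        self._maps: List[Mapping] = []

    def bind(self, va: int, size: int, bo: str, bo_offset: int = 0) -> Mapping:
        if size <= 0:
            raise ValueError("size must be positive")
        for m in self._maps:
            if va < m.va + m.size and m.va < va + size:
                raise ValueError("range overlaps an existing mapping")
        mapping = Mapping(va, size, bo, bo_offset)
        self._maps.append(mapping)
        return mapping

    def unbind(self, va: int) -> Mapping:
        for i, m in enumerate(self._maps):
            if m.va == va:
                return self._maps.pop(i)
        raise KeyError(f"no mapping starts at {va:#x}")

    def translate(self, addr: int) -> Optional[Tuple[str, int]]:
        """Return (buffer, offset) for `addr`, or None if it is unbacked."""
        for m in self._maps:
            if m.va <= addr < m.va + m.size:
                return (m.bo, m.bo_offset + addr - m.va)
        return None


if __name__ == "__main__":
    vm = ToyGpuVm()
    vm.bind(0x100000, 0x2000, "bo_a")
    print(vm.translate(0x100010))  # backed address
    print(vm.translate(0x200000))  # unbacked address
```

Addresses that `translate()` cannot resolve stand in for the unbacked parts of a sparse allocation; the buffer and its virtual address are managed separately, unlike the older model where the two were allocated together.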

As an example, he talked about the problem of fence signaling. A fence indicates when a series of GPU operations has been completed; waits for these fences have to be time-bounded, or the memory-management subsystem might deadlock. The underlying problem is that a GPU can easily pin down all of a system's RAM if given the opportunity. There is a shrinker that can be called when memory gets tight, but it will have to wait for fences to be signaled to know when memory can be freed. If the code that is responsible for signaling a fence decides to allocate more memory while this is happening, a deadlock results. To avoid this outcome, developers have to strictly limit the operations that can be performed in fence-signaling critical sections; care must also be taken before acquiring any locks. It would be nice to be able to update the page tables within these sections, but that ran into deadlock problems and had to be backed out.
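The rule about fence-signaling critical sections can be modeled with a toy Python sketch. Everything here is hypothetical and greatly simplified: a context manager marks the critical section, and an allocation helper refuses to run inside it, so any needed memory has to be set aside beforehand.

```python
import contextlib
import threading

_state = threading.local()


class AllocationInSignallingSection(RuntimeError):
    """Raised when code tries to allocate while it owes a fence signal."""


@contextlib.contextmanager
def fence_signalling():
    """Mark a region in which a fence must be signaled without blocking."""
    depth = getattr(_state, "depth", 0)
    _state.depth = depth + 1
    try:
        yield
    finally:
        _state.depth = depth


def allocate(nbytes):
    """Toy allocator that refuses to run inside a signaling section."""
    if getattr(_state, "depth", 0) > 0:
        raise AllocationInSignallingSection(
            "allocation could wait on reclaim, which waits on this fence")
    return bytearray(nbytes)


def run_job(prealloc):
    """Complete a job using only memory that was set aside in advance."""
    with fence_signalling():
        prealloc[0] = 1  # do the work in the preallocated buffer
        return len(prealloc)


if __name__ == "__main__":
    buf = allocate(16)   # allowed: outside the critical section
    print(run_job(buf))
```

The design choice this models is preallocation: whatever a job might need is obtained before entering the section, so that nothing inside it can end up waiting on memory reclaim.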

Returning to Nouveau, Airlie said that the initial VM_BIND API, using GPUVM, synchronous objects, and integration with the scheduler, was merged for the 6.6 release. There are a lot of improvements in the works that should land in 6.8. At this point, he said, we have the core of a modern GPU driver for NVIDIA hardware — for graphics, at least. More work will be required before Nouveau can support compute applications.

On the user-space side, Faith Ekstrand has been developing the NVK Vulkan driver for Nouveau; this driver recently reached Vulkan 1.0 conformance. This work involved creating a new compiler, called NAK, that has just been merged into Mesa; this compiler yields far better performance (from 20 frames per second to over 1000) than the old "codegen" compiler did. Naturally, this compiler is written in Rust. The next step, Airlie concluded, is to move forward to Vulkan 1.3.

Video and slides from the talk are available.

[Thanks to the Linux Foundation, LWN's travel sponsor, for supporting our travel to this event.]

Index entries for this article
Kernel: Device drivers/Graphics
Conference: Linux Plumbers Conference/2023


Posted Dec 3, 2023 6:19 UTC (Sun) by logang (subscriber, #127618) [Link]

My perspective as a user of the Linux graphics stack has been poor. I have one machine stuck on an aging 5.4 kernel as the Intel integrated graphics doesn't work on anything newer. And another older machine that, after upgrading to bookworm, the Nouveau driver stopped working. I was forced to install the proprietary Nvidia driver just to get any work done, and now the graphics on that machine are far buggier and more crash-prone than they used to be. I've spent way too much time trying to solve both these problems. I miss the days when Linux would reliably work on nearly any hardware -- seems to only be getting worse now.

Posted Dec 3, 2023 11:16 UTC (Sun) by tux3 (subscriber, #101245) [Link]

I mostly remember days where getting any sound out of it was a hero's journey, wifi was complicated, and the GPU was for displaying a tty and/or wallpaper (when Xorg wasn't crashing)

I'm not sure moving all the software into the GSP is a good sign of nvidia-upstream collaboration, but pragmatically maybe it's easier to manage than the current situation?

Won't fix your old GPU, but if the new way is "keep a giant blob around", we could hope old hardware support will have fewer places to bitrot.
Not that I would bet on it, mind you.

Posted Dec 3, 2023 15:50 UTC (Sun) by Gerardo (subscriber, #37539) [Link]

Try setting i915.modeset=0 in the kernel cmdline.

Posted Dec 3, 2023 20:15 UTC (Sun) by roc (subscriber, #30627) [Link]

> I miss the days when Linux would reliably work on nearly any hardware

I'm a huge Linux fan, been running it on my laptops and desktops for > 15 years, and those days never existed.

Posted Dec 3, 2023 20:42 UTC (Sun) by mb (subscriber, #50428) [Link]

>and those days never existed

Yes, the hardware compatibility as of today is better than it has ever been.

20 years ago one had to check "Linux compatibility" before buying hardware.
Today virtually all new hardware just works.
And old hardware keeps working most of the time.

Today it is possible to buy a random laptop and everything just works.
That was not the case 20 years ago.

Posted Dec 3, 2023 20:51 UTC (Sun) by himi (subscriber, #340) [Link]

Certainly never for graphics or sound, and a fairly wide range of other "consumer" devices.

Most of the sound issues have gone away since the sound card market kind of went away (outside the "pro" market) - everything's integrated these days, which means there's a much smaller set of things that need some level of support, and they generally come from larger vendors. So sound mostly "just works" now.

Graphics is a /long/ way from getting to that point, though it's not /that/ hard to pick hardware that will likely work out of the box. There are really only three vendors to worry about, and two of them work pretty well with Linux - if you get a laptop with integrated Intel or AMD graphics you're probably not going to have many issues (though that's far from guaranteed). But the graphics market includes crazy shit like Prime, which is a nightmare to deal with sensibly - I've got a laptop with an NVidia card that's got perfectly solid (if proprietary) driver support, and an integrated Intel chip that's really well supported, but the combination falls over regularly.

Obviously the server side is different, but in that market Linux is the 800 pound gorilla. Consumer markets, not so much . . .

Posted Dec 3, 2023 22:28 UTC (Sun) by ballombe (subscriber, #9523) [Link]

On the other hand, no operating system in history has supported as much hardware as Linux.

Posted Dec 4, 2023 0:07 UTC (Mon) by willy (subscriber, #9762) [Link]

I don't really care how many ISA cards work on a MIPS platform if my new ethernet card doesn't work. This is a red herring.

Posted Dec 4, 2023 5:03 UTC (Mon) by wtogami (subscriber, #32325) [Link]

Nothing else comes close to Linux in terms of dropping your disk into another x86_64 machine and it probably working. Windows is a f*ing mess with mutually conflicting drivers that often break this.

Posted Dec 7, 2023 11:01 UTC (Thu) by ceplm (subscriber, #41334) [Link]

???

When was the last time that an Ethernet card, in particular, didn’t work for you? Their manufacturers are now absolutely crazy about supporting Linux, because with the dominant position Linux has in the server world, an Ethernet card which doesn’t work with it is dead.

Posted Dec 7, 2023 11:42 UTC (Thu) by james (subscriber, #1325) [Link]

Depends if we're still counting "wireless Ethernet". Realtek RTL8821/8822 chips provide 802.11ac / Wi-Fi 5 and Bluetooth on a single chip: the configuration I had didn't work with an in-tree driver until kernel 5.12 in 2021.

The Windows drivers on the Realtek website date from 2017.

Incidentally, the last three cheap-ish Core 5/Ryzen 5 laptops I've specced (HP, Dell, Asus) have all had the chip — I presume it's cheap.

Thank goodness for USB dongles.

Posted Dec 8, 2023 0:37 UTC (Fri) by jschrod (subscriber, #1646) [Link]

Last year, I set up a rather old small Lenovo box (that we never used) as a disaster backup for one of my firewalls. I bought a 2nd Intel network card and then discovered that the box came with a RealTek network controller. (I would have to boot the system to check what the actual Realtek network controller is -- but it's 01:36am now and I won't do this if it's not necessary.)

Under Debian 11, I needed the proprietary module r8168-dkms to enable it. Robust support was not provided by the in-kernel driver r8169.

Debian unstable still lists this proprietary kernel module package in sid -- and I would have thought they would have got rid of it if all network devices were supported in-kernel by now.

Are all the RealTek network devices listed in https://packages.debian.org/de/sid/r8168-dkms fully supported by r8169 now? That would be good news.

Posted Dec 4, 2023 0:12 UTC (Mon) by gerdesj (subscriber, #5446) [Link]

"My perspective as a user or [of] the Linux graphics stack has been poor. I have one machine stuck on an aging 5.4 kernel as the Intel integrated graphics doesn't work on anything newer. And another older machine that, after upgrading to bookworm, the Nouveau driver stopped working."

I remember XFree86 mode line challenges that could end in real tears and not just fuzzy lines on a CRT. Those are tears of sadness at a destroyed monitor, not unpleasant graphic artifacts. I recall using a Windows wifi driver via some magnificent hack on one laptop.

For me, around 2005ish, graphic support on Linux generally became reliably stable, with some spectacular buggerations! Certainly no worse than Windows. Even today, we (I own a small IT company) have Windows laptops being ... retired after everyone has given up getting something to work properly, be it graphics, wifi or whatever.

My last few laptops have been customer cast offs and my current one an employee cast off. They might be shite for Windows but still fine for this old nerd running Arch (actually). I'm currently rocking a (smbios-sys-info) ... "HP 255 G6 Notebook PC" - good enough for me, barely runs Win 10 and won't ever see Win 11! It does grind a little when I fire up some CAD apps. I can still run this: http://webglsamples.org/aquarium/aquarium.html with 30,000 fish and the fan barely twitches. I use KDE as my WM which is hardly tiny. I also run ESET AV to show solidarity and tick various boxes.

My general experience of Linux is that it really does generally work on nearly anything I throw it at but might need some tweaks, which is the same of any other OS I have ever encountered. I generally "upgrade" to the next laptop by moving my M.2 SSD and getting on with work and life.

Posted Dec 3, 2023 15:24 UTC (Sun) by wsy (subscriber, #121706) [Link]

Just tried NVK on latest kernel + mesa. The result is beyond my expectation. A lot of vulkan demos[1] just work.

You guys are heroes.

[1] https://github.com/SaschaWillems/Vulkan

Posted Dec 3, 2023 21:18 UTC (Sun) by PastyAndroid (subscriber, #168019) [Link]

I'm glad to see there is more progress on Nouveau! I've been watching it since it came out hoping for improvements.

It's a shame that more often than not when discussing Linux with potential new users I have to ask that question "Do you have an AMD or Nvidia GPU?" followed by "Well, if it's Nvidia you might have to do this that and this to make it work.".

It's not ideal, and it is unreasonable to suggest to someone to replace their hardware for best results. On most distributions it is relatively easy to set up the proprietary drivers. However, it is still an entry barrier, however small, for a new user.

I do find it amusing how the situation has been flipped, where previously the advice would have been to avoid ATI/AMD and get Nvidia instead for 3D on Linux.

I'm hoping that someday both AMD and Nvidia can work out of the box in the same way on any given distribution. But in the meantime, I will continue using my AMD GPU.

Posted Dec 4, 2023 0:23 UTC (Mon) by khim (subscriber, #9252) [Link]

> It's not ideal, and it is unreasonable to suggest to someone to replace their hardware for best results.

It's very reasonable and I often say it's the right thing to do. If you buy some crazy device with an integrated Intel PowerVR-based GPU then you can't use it with any operating system today: it's too underpowered to run Windows 10 and Linux drivers don't exist (except for some old version of Android and then these are binary-only).

Okay, many nVidia-based laptops are not that old, but still… you wouldn't expect great usability from some poor device with HDD and soldered-on 4GB of RAM in Windows 11, why should Linux support anything and everything?

Linux suffers from its inability to provide an SDK which would make it possible to develop software for it, but expecting full-blown support for any random hardware is unreasonable: none of the OSes that exist achieve that feat.

Posted Dec 4, 2023 18:58 UTC (Mon) by PastyAndroid (subscriber, #168019) [Link]

That is true in the case of obscure hardware. But for mainstream hardware, it is unreasonable to suggest hardware replacement simply because of a driver.

For example, if someone has an NVIDIA GeForce RTX 3060, a very popular Nvidia card that is not obscure, it would be unreasonable to tell this person they must switch to AMD to use Wayland. (Wayland on Nvidia is still a mess.)

Bearing in mind, aside from having to provide instructions on how to install the proprietary drivers (which can vary depending on the distribution), even if they have the drivers installed the experience with Wayland may be less than optimal due to poor driver functionality.

To be clear though, I am not blaming the open source developers here. I am blaming Nvidia. Nvidia can, and should, go the same route as AMD whereby users can buy any AMD graphics card and simply plug and play on Linux, without further instructions being necessary.

I should also make it clear; I am talking about potential new users, who have no prior Linux experience. I hope that it will become as easy and as simple for them as possible.

Everyone deserves the freedom of FOSS, even if they are not technically minded.

Posted Dec 3, 2023 23:57 UTC (Sun) by Cyberax (✭ supporter ✭, #52523) [Link]

I assume that the firmware is encrypted, so it can't be compressed? Perhaps NVidia can switch to signing instead, so that binary diffs can meaningfully decrease the size?


Copyright © 2023, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds