A Nouveau graphics driver update
The kernel community's relationship with NVIDIA "has gone up and down" over the years, Airlie began. Recently, though, the company has rearchitected its products, adding a large RISC-V processor (the GPU system processor, or GSP) and moving much of the functionality once handled by drivers into the GSP firmware. The company allows that firmware to be used by Linux and shipped by distributors. This arrangement brings a number of advantages; for example, it is now possible for the kernel to do reclocking of NVIDIA GPUs, running them at full speed just like the proprietary drivers can. It is, he said, a big improvement over the Nouveau-only firmware that was provided previously.
There are a number of disadvantages too, though. The firmware provides no stable ABI, and a lot of the calls it provides are not documented. The firmware files themselves are large, in the range of 20-30MB, and two of them are required for any given device. That significantly bloats a system's /boot directory and initramfs image (which must provide every version of the firmware that the kernel might need), and forces the Nouveau developers to be strict and careful about picking up firmware updates.
Nouveau work has suffered a bit of a setback since longtime developer Ben Skeggs left the project, but he did manage to do a lot of refactoring before he went. Nouveau now has initial GSP support for one firmware version; that code was merged for the 6.7-rc1 release. It is only enabled by default for the Ada series of GPUs; a command-line argument can make it work with Turing and Ampere devices as well. It is missing some features, including fault handling (which "shouldn't be too hard" to add) and sensor monitoring, which doesn't work at all.
NVIDIA's firmware, Airlie said, comes with a set of include files that, in turn, define structures that change over time. To deal with these changes, the driver is going to need some sort of automated ABI generation; he noted that the developers working on the Apple M1 GPU driver have run into the same problem. This problem could be made easier to tackle, he suggested, if the driver were, like the M1 driver, to be rewritten in Rust.
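The idea of automated ABI generation can be sketched in a few lines of Python. This is a minimal illustration, not Nouveau's actual approach: it assumes a generator has already turned each firmware release's headers into a per-version layout table, and the version labels (`fw-A`, `fw-B`) and field names are hypothetical. Driver code supplies field values by name, and the layout for the firmware actually loaded decides how they are packed.

```python
import struct

# Hypothetical layouts for one firmware message, as a generator might emit
# them from two releases of the vendor headers. Version labels, field names,
# and types are invented for illustration.
LAYOUTS = {
    "fw-A": [("engine_id", "I"), ("flags", "I")],
    "fw-B": [("engine_id", "I"), ("priority", "H"), ("flags", "I")],
}


def pack_message(version, **fields):
    """Pack a message using the layout for the given firmware version."""
    layout = LAYOUTS[version]
    # "<" selects little-endian byte order with no implicit padding.
    fmt = "<" + "".join(code for _, code in layout)
    missing = [name for name, _ in layout if name not in fields]
    if missing:
        raise ValueError(f"{version} requires fields: {missing}")
    # Fields the chosen layout does not mention are simply not packed.
    return struct.pack(fmt, *(fields[name] for name, _ in layout))
```

Dropping fields that a given version lacks is only one possible design; a real generator would also have to cope with renamed fields, changed types, and the firmware's alignment rules.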
Next steps
Supporting the GSP firmware is just the beginning, though; at this point, Airlie took a step back and talked about the task of making a useful GPU driver in general. Years ago, a graphics card came with some video RAM and a graphics translation table (GTT). The driver would map system memory into the graphics card; user space could then submit buffer handles that would be relocated for the graphics device. This approach works, he said, but it is slow.
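The relocation model amounts to patching the command stream at submission time. The sketch below is hypothetical (the `Reloc` record and `apply_relocations()` are invented names) and assumes the kernel has already bound each buffer into the GTT and knows the address it landed at:

```python
from dataclasses import dataclass


@dataclass
class Reloc:
    cmd_index: int  # which word of the command buffer holds an address
    handle: int     # buffer object the address refers to
    delta: int = 0  # byte offset within that buffer object


def apply_relocations(cmds, relocs, placement):
    """Return a copy of cmds with real GPU addresses patched in.

    placement maps each buffer handle to the GTT address where the
    kernel bound it for this particular submission.
    """
    patched = list(cmds)
    for r in relocs:
        patched[r.cmd_index] = placement[r.handle] + r.delta
    return patched


cmds = [0x10, 0, 0x20, 0]  # opcodes with address slots left as zero
relocs = [Reloc(1, 7), Reloc(3, 9, 0x40)]
placement = {7: 0x100000, 9: 0x200000}
print([hex(w) for w in apply_relocations(cmds, relocs, placement)])
```

Because this walk-and-patch step has to be repeated for every submission, it is plausibly one source of the overhead that full GPU virtual memory avoids.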
Current GPUs have full virtual memory instead, which saves a lot of that overhead. The kernel has grown a number of subsystems for working with this virtual memory, including the graphics execution manager (GEM) for buffer-object management, the translation table manager (TTM) for discrete video-RAM buffer-object management, and a bunch of synchronization and fencing code. Initially, the DRM subsystem tied each buffer allocation to an allocation of GPU virtual address space made at the same time; that was easy to do and sufficed to implement OpenGL. But, he said, the graphics world moved on from there.
Specifically, Vulkan came along. It brought the concept of sparse memory and, with it, virtual memory that is managed by user space. Vulkan can handle both synchronous and asynchronous virtual-area updates, but it "gets complicated". Various drivers started inventing their own virtual-area management; as a way of bringing that work back together, the VM_BIND API was developed.
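The difference between synchronous and asynchronous virtual-area updates can be modeled with a toy bind queue. This is an assumption-laden sketch, not the VM_BIND API: `Fence`, `BindQueue`, and its methods are hypothetical. A synchronous bind takes effect before the call returns, while an asynchronous one waits for an input fence and signals an output fence once it has been applied.

```python
from collections import deque


class Fence:
    """Hypothetical stand-in for a GPU fence: just a signaled flag."""

    def __init__(self, signaled=False):
        self.signaled = signaled


class BindQueue:
    """Toy model of synchronous vs. asynchronous virtual-area updates."""

    def __init__(self):
        self.bound = set()      # virtual ranges currently bound
        self.pending = deque()  # async binds, kept in submission order

    def bind_sync(self, va_range):
        # Takes effect immediately, before the caller continues.
        self.bound.add(va_range)

    def bind_async(self, va_range, wait, out):
        # Queued; applied only after `wait` signals, then signals `out`.
        self.pending.append((va_range, wait, out))

    def process(self):
        # Apply queued binds in order, stopping at the first whose input
        # fence has not signaled, so later binds never overtake earlier ones.
        while self.pending and self.pending[0][1].signaled:
            va_range, _, out = self.pending.popleft()
            self.bound.add(va_range)
            out.signaled = True
```

Processing queued binds strictly in submission order is the simplest choice; letting them complete out of order would require tracking dependencies between overlapping ranges.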
This is consistent with a recurring pattern, Airlie said. The DRM developers work to share common code between graphics drivers, but the driver developers keep trying to reinvent wheels, a tendency that has to be resisted. The subsystem did well with regard to mode setting, he said, but less well on the acceleration side; there is a "common GPU scheduler" that is only used by one driver, for example. Similarly, there are a lot of drivers implementing VM_BIND by doing their own virtual-area management.
In response, Airlie came up with the "good idea" of getting somebody else to write a common virtual-area manager, called GPUVM, inspired by the amdgpu code. It is intended to be useful for all drivers; it is used by the Nouveau, Xe (Intel's new driver), and Panfrost drivers now. Hopefully the amdgpu and MSM drivers will pick it up as well. The best part is that there are multiple developers who understand it and can help to keep it from going off in the wrong direction. GPUVM has been through a lot of iterations, he said, providing "lots of learning experiences".
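A core job of a shared virtual-area manager is splitting existing mappings when a new bind or unbind only partially overlaps them. The sketch below is a minimal illustration of that bookkeeping, under the assumption that mappings are half-open address ranges backed by an offset into a buffer object; it is not GPUVM's interface (GPUVM hands drivers a series of map, remap, and unmap operations rather than editing a list directly), and the `Mapping` and `VASpace` names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mapping:
    start: int   # first GPU virtual address covered
    end: int     # one past the last address covered
    bo: str      # backing buffer object (a name, for illustration)
    offset: int  # offset into the buffer object that `start` maps to


class VASpace:
    def __init__(self):
        self.mappings = []  # non-overlapping, kept sorted by start

    def unmap(self, start, end):
        """Remove [start, end), splitting mappings that straddle its edges."""
        kept = []
        for m in self.mappings:
            if m.end <= start or m.start >= end:
                kept.append(m)  # no overlap: leave untouched
                continue
            if m.start < start:  # keep the head before the hole
                kept.append(Mapping(m.start, start, m.bo, m.offset))
            if m.end > end:  # keep the tail, shifting its buffer offset
                kept.append(Mapping(end, m.end, m.bo, m.offset + (end - m.start)))
        self.mappings = sorted(kept, key=lambda m: m.start)

    def map(self, start, end, bo, offset=0):
        """Bind [start, end) to bo, replacing whatever was mapped there."""
        self.unmap(start, end)
        self.mappings.append(Mapping(start, end, bo, offset))
        self.mappings.sort(key=lambda m: m.start)


vm = VASpace()
vm.map(0x0000, 0x3000, "A")
vm.map(0x1000, 0x2000, "B")  # punches B into the middle of A
for m in vm.mappings:
    print(m)
```

Mapping B into the middle of A leaves two pieces of A, with the tail's buffer offset adjusted so it still points at the same memory; an unmap without a following map simply leaves a hole, which is what sparse bindings need.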
As an example, he talked about the problem of fence signaling. A fence indicates when a series of GPU operations has been completed; waits for these fences have to be time-bounded, or the memory-management subsystem might deadlock. In short, a GPU can easily pin down all of a system's RAM if given the opportunity. There is a shrinker that can be called when memory gets tight, but it will have to wait for fences to be signaled to know when memory can be freed. If the code that set the fence decides to allocate more memory while this is happening, a deadlock results. To avoid this outcome, developers have to strictly limit the operations that can be performed in fence-signaling critical sections; care must also be taken before acquiring any locks. It would be nice to be able to update the page tables from within these critical sections, but an attempt to do so ran into deadlock problems and had to be backed out.
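The rule about fence-signaling critical sections can be illustrated by a guard that rejects allocations made inside them. The kernel has lockdep annotations for marking such sections (`dma_fence_begin_signalling()` and `dma_fence_end_signalling()`); the Python below is only a loose model of the idea, and every name in it is hypothetical. What it shows is ordering: the job allocates what it needs before entering the section, so nothing inside can recurse into reclaim.

```python
import threading
from contextlib import contextmanager

_state = threading.local()  # per-thread nesting depth of signaling sections


class FenceSignalingViolation(RuntimeError):
    pass


@contextmanager
def fence_signaling_section():
    """Mark code that must complete for a fence to be signaled."""
    _state.depth = getattr(_state, "depth", 0) + 1
    try:
        yield
    finally:
        _state.depth -= 1


def allocate(nbytes):
    """Hypothetical allocator that refuses to run inside a signaling section.

    A real allocation there might trigger reclaim, and reclaim might wait
    on the very fence this code is supposed to signal.
    """
    if getattr(_state, "depth", 0) > 0:
        raise FenceSignalingViolation(f"allocating {nbytes} bytes while signaling")
    return bytearray(nbytes)


def run_job(preallocated):
    # Everything the job needs was allocated before the section begins...
    with fence_signaling_section():
        # ...so the critical section only touches memory it already owns.
        preallocated[:4] = b"done"
    return preallocated
```

Making the violation fail loudly, rather than occasionally deadlocking under memory pressure, mirrors the reasoning behind the kernel's annotations: such bugs are far easier to find when the forbidden pattern is reported every time it occurs.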
Returning to Nouveau, Airlie said that the initial VM_BIND API, using GPUVM, synchronization objects, and integration with the scheduler, was merged for the 6.6 release. There are a lot of improvements in the works that should land in 6.8. At this point, he said, we have the core of a modern GPU driver for NVIDIA hardware, at least for graphics. More work will be required before Nouveau can support compute applications.
On the user-space side, Faith Ekstrand has been developing the NVK Vulkan driver for Nouveau; this driver recently reached Vulkan 1.0 conformance. This work involved creating a new compiler, called NAK, that has just been merged into Mesa; this compiler yields far better performance (from 20 frames per second to over 1000) than the old "codegen" compiler did. Naturally, this compiler is written in Rust. The next step, Airlie concluded, is to move forward to Vulkan 1.3.
Video and slides from the talk are available.
[Thanks to the Linux Foundation, LWN's travel sponsor, for supporting our travel to this event.]
Index entries for this article
Kernel: Device drivers/Graphics
Conference: Linux Plumbers Conference/2023
A Nouveau graphics driver update
Posted Dec 3, 2023 6:19 UTC (Sun) by logang (subscriber, #127618) [Link]
A Nouveau graphics driver update
Posted Dec 3, 2023 11:16 UTC (Sun) by tux3 (subscriber, #101245) [Link]
I'm not sure moving all the software into the GSP is a good sign of nvidia-upstream collaboration, but pragmatically maybe it's easier to manage than the current situation?
Won't fix your old GPU, but if the new way is "keep a giant blob around", we could hope old hardware support will have fewer places to bitrot.
Not that I would bet on it, mind you.
A Nouveau graphics driver update
Posted Dec 3, 2023 15:50 UTC (Sun) by Gerardo (subscriber, #37539) [Link]
A Nouveau graphics driver update
Posted Dec 3, 2023 20:15 UTC (Sun) by roc (subscriber, #30627) [Link]
I'm a huge Linux fan, been running it on my laptops and desktops for > 15 years, and those days never existed.
A Nouveau graphics driver update
Posted Dec 3, 2023 20:42 UTC (Sun) by mb (subscriber, #50428) [Link]
Yes, the hardware compatibility as of today is better than it has ever been.
20 years ago one had to check "Linux compatibility" before buying hardware.
Today virtually all new hardware just works.
And old hardware keeps working most of the time.
Today it is possible to buy a random laptop and everything just works.
That was not the case 20 years ago.
A Nouveau graphics driver update
Posted Dec 3, 2023 20:51 UTC (Sun) by himi (subscriber, #340) [Link]
Most of the sound issues have gone away since the sound card market kind of went away (outside the "pro" market) - everything's integrated these days, which means there's a much smaller set of things that need some level of support, and they generally come from larger vendors. So sound mostly "just works" now.
Graphics is a /long/ way from getting to that point, though it's not /that/ hard to pick hardware that will likely work out of the box. There are really only three vendors to worry about, and two of them work pretty well with Linux - if you get a laptop with integrated Intel or AMD graphics you're probably not going to have many issues (though that's far from guaranteed). But the graphics market includes crazy shit like Prime, which is a nightmare to deal with sensibly - I've got a laptop with an NVidia card that's got perfectly solid (if proprietary) driver support, and an integrated Intel chip that's really well supported, but the combination falls over regularly.
Obviously the server side is different, but in that market Linux is the 800 pound gorilla. Consumer markets, not so much . . .
A Nouveau graphics driver update
Posted Dec 3, 2023 22:28 UTC (Sun) by ballombe (subscriber, #9523) [Link]
A Nouveau graphics driver update
Posted Dec 4, 2023 0:07 UTC (Mon) by willy (subscriber, #9762) [Link]
A Nouveau graphics driver update
Posted Dec 4, 2023 5:03 UTC (Mon) by wtogami (subscriber, #32325) [Link]
A Nouveau graphics driver update
Posted Dec 7, 2023 11:01 UTC (Thu) by ceplm (subscriber, #41334) [Link]
When was the last time an Ethernet card, in particular, didn't work for you? Their manufacturers are now absolutely crazy about supporting Linux, because, given the dominant position Linux has in the server world, an Ethernet card that doesn't work with it is dead.
A Nouveau graphics driver update
Posted Dec 7, 2023 11:42 UTC (Thu) by james (subscriber, #1325) [Link]
Depends on whether we're still counting "wireless Ethernet". Realtek RTL8821/8822 chips provide 802.11ac / Wi-Fi 5 and Bluetooth on a single chip: the configuration I had didn't work with an in-tree driver until kernel 5.12 in 2021. The Windows drivers on the Realtek website date from 2017.
Incidentally, the last three cheap-ish Core 5/Ryzen 5 laptops I've specced (HP, Dell, Asus) have all had the chip — I presume it's cheap.
Thank goodness for USB dongles.
A Nouveau graphics driver update
Posted Dec 8, 2023 0:37 UTC (Fri) by jschrod (subscriber, #1646) [Link]
Under Debian 11, I needed the proprietary module r8168-dkms to enable it. Robust support was not provided by the in-kernel driver r8169.
Debian unstable still lists this proprietary kernel module package in sid, and I would have thought they would have gotten rid of it if all network devices were supported in-kernel by now.
Are all Realtek network devices that are listed in https://packages.debian.org/de/sid/r8168-dkms now fully supported by r8169? That would be good news.
A Nouveau graphics driver update
Posted Dec 4, 2023 0:12 UTC (Mon) by gerdesj (subscriber, #5446) [Link]
I remember XFree86 mode line challenges that could end in real tears and not just fuzzy lines on a CRT. Those are tears of sadness at a destroyed monitor, not unpleasant graphic artifacts. I recall using a Windows wifi driver via some magnificent hack on one laptop.
For me, around 2005ish, graphic support on Linux generally became reliably stable, with some spectacular buggerations! Certainly no worse than Windows. Even today, we (I own a small IT company) have Windows laptops being ... retired after everyone has given up getting something to work properly, be it graphics, wifi or whatever.
My last few laptops have been customer cast offs and my current one an employee cast off. They might be shite for Windows but still fine for this old nerd running Arch (actually). I'm currently rocking a (smbios-sys-info) ... "HP 255 G6 Notebook PC" - good enough for me, barely runs Win 10 and won't ever see Win 11! It does grind a little when I fire up some CAD apps. I can still run this: http://webglsamples.org/aquarium/aquarium.html with 30,000 fish and the fan barely twitches. I use KDE as my WM which is hardly tiny. I also run ESET AV to show solidarity and tick various boxes.
My general experience of Linux is that it really does generally work on nearly anything I throw it at but might need some tweaks, which is true of any other OS I have ever encountered. I generally "upgrade" to the next laptop by moving my M.2 SSD and getting on with work and life.
A Nouveau graphics driver update
Posted Dec 3, 2023 15:24 UTC (Sun) by wsy (subscriber, #121706) [Link]
You guys are heroes.
A Nouveau graphics driver update
Posted Dec 3, 2023 21:18 UTC (Sun) by PastyAndroid (subscriber, #168019) [Link]
It's a shame that more often than not when discussing Linux with potential new users I have to ask that question "Do you have an AMD or Nvidia GPU?" followed by "Well, if it's Nvidia you might have to do this that and this to make it work.".
It's not ideal, and it is unreasonable to suggest to someone to replace their hardware for best results. On most distributions it is relatively easy to set up the proprietary drivers. Even so, it is still a barrier to entry, however small, for a new user.
I do find it amusing how the situation has been flipped, where previously the advice would have been to avoid ATI/AMD and get Nvidia instead for 3D on Linux.
I'm hoping that someday both AMD and Nvidia will work out of the box in the same way on any given distribution. But in the meantime, I will continue using my AMD GPU.
A Nouveau graphics driver update
Posted Dec 4, 2023 0:23 UTC (Mon) by khim (subscriber, #9252) [Link]
> It's not ideal, and it is unreasonable to suggest to someone to replace their hardware for best results.
It's very reasonable, and I often say it's the right thing to do. If you buy some crazy device with an integrated Intel PowerVR-based GPU, then you can't use it with any operating system today: it's too underpowered to run Windows 10, and Linux drivers don't exist (except for some old version of Android, and even then they are binary-only).
Okay, many nVidia-based laptops are not that old, but still… you wouldn't expect great usability from some poor device with HDD and soldered-on 4GB of RAM in Windows 11, why should Linux support anything and everything?
Linux suffers from its inability to provide an SDK that would make it possible to develop software for it, but expecting full-blown support for any random hardware is unreasonable: none of the OSes that exist achieve that feat.
A Nouveau graphics driver update
Posted Dec 4, 2023 18:58 UTC (Mon) by PastyAndroid (subscriber, #168019) [Link]
For example, if someone has a NVIDIA GeForce RTX 3060, a very popular Nvidia card that is not obscure, it would be unreasonable to tell this person they must switch to AMD to use Wayland for example. (Wayland on Nvidia is still a mess.)
Bear in mind that, aside from having to provide instructions on how to install the proprietary drivers (which can vary depending on the distribution), even with the drivers installed the experience with Wayland may be less than optimal due to poor driver functionality.
To be clear though, I am not blaming the open source developers here. I am blaming Nvidia. Nvidia can, and should, go the same route as AMD whereby users can buy any AMD graphics card and simply plug and play on Linux, without further instructions being necessary.
I should also make it clear; I am talking about potential new users, who have no prior Linux experience. I hope that it will become as easy and as simple for them as possible.
Everyone deserves the freedom of FOSS, even if they are not technically minded.
A Nouveau graphics driver update
Posted Dec 3, 2023 23:57 UTC (Sun) by Cyberax (✭ supporter ✭, #52523) [Link]