Improving printk()

By Jonathan Corbet
November 1, 2017

2017 Kernel Summit

When a kernel developer wants to communicate a message to user space, be it for debugging or to report a serious problem with the system, the venerable printk() function is usually the tool of choice. But, as Steve Rostedt (accompanied by Petr Mladek and Sergey Senozhatsky) noted during a brief session at the 2017 Kernel Summit, printk() has not aged well. In particular, it can affect the performance of the system as a whole; the roots of that problem and a possible solution were discussed, but a real solution will have to wait for the appearance of the code.

The problem, Rostedt said, has to do with the management of the console lock, which serializes access to the console device where messages are printed. Deep within printk(), one will find interesting call sequences like:

    if (console_trylock())
	console_unlock();

The first call will attempt to acquire the console lock but may not succeed; the second, on its surface, releases that lock immediately after it was acquired. It is the work involved in releasing the console lock that can create problems for the system.

printk() must proceed regardless of the availability of the console lock; since printk() is called from all over the kernel, waiting for any sort of lock risks deadlocking the system. So, if a particular call is unable to obtain the console lock, it simply buffers its output and returns, in the expectation that somebody else will flush that output to the console. That task falls to the thread that holds the console lock; that thread is expected to flush out all buffered output as part of the process of releasing the lock.

On a large system with a lot of CPUs, there can be multiple threads calling printk() at any given time. They can leave behind a lot of work for the unlucky thread that holds the console lock; indeed, in the worst case, output can continue to pile up while the buffer is being flushed, leaving the lock holder with a job of indefinite duration. That is bad for system performance and the latency of anything that needs to run on the affected CPU.

Peter Zijlstra jumped in to say that, whenever this problem comes up, he just removes printk() calls until it goes away. Andrew Morton, instead, asked for forgiveness for creating this mechanism in the first place; it was, he said, something he came up with at 3AM. Rostedt went on to say that, in the worst case, flushing printk() output can take so long that the watchdog fires and the system crashes. If there are 100 CPUs in the system, one of them can end up flushing printk() output forever.

There are, he said, a couple of possible solutions to the problem. One of them is to remove printk() calls as Zijlstra suggested, but that is a game of whack-a-mole that is never really done. The alternative is a new locking scheme where the second thread attempting to obtain the console lock spins and waits for it to become available. The current holder of the lock will see that there is a waiter and release the lock; the second thread will then acquire it and continue flushing the output buffer. If multiple CPUs are generating output, the lock will circulate between them, and none will end up printing output for too long.

Jan Kara said that he had once tried to implement a similar scheme, but he ran into a lot of special cases and finally gave up on it. Mathieu Desnoyers suggested deferring any excess printing work to a workqueue rather than pushing it out immediately; Ben Herrenschmidt concurred, saying that there is no real need to flush output right away. But Rostedt answered that Linus Torvalds insists that crash dumps must go out immediately, so any scheme that can delay output will not fly. The entire printk() buffer must be printed out as soon as possible.

There was some unstructured discussion on the details of the new approach, but no real conclusions were reached. This is a conversation that will have to resume once the code to implement the new mechanism has been posted.

[Your editor would like to thank the Linux Foundation, LWN's travel sponsor, for supporting his travel to the Kernel Summit.]

Index entries for this article
Kernel	Kernel messages
Kernel	printk()
Conference	Kernel Summit/2017

(Log in to post comments)

Improving printk()

Posted Nov 1, 2017 16:01 UTC (Wed) by smoogen (subscriber, #97) [Link]

I wonder how much large CPU systems need to start dedicating CPUs to certain tasks so that work can continue.

I am sorry CPU1, your job til we reboot is flushing the printks. CPU2 you have to deal with networking and you over there CPU17?.. you got threading. Got it? Good. The other 240 cpus get to work on other stuff this time. What me? CPU0? Oh I am just here to make sure you guys get your work done. Someone has to manage you lot managing jobs.

Improving printk()

Posted Nov 1, 2017 16:27 UTC (Wed) by gioele (subscriber, #61675) [Link]

> I wonder how much large CPU systems need to start dedicating CPUs to certain tasks so that work can continue.
>
> I am sorry CPU1, your job til we reboot is flushing the printks. CPU2 you have to deal with networking and you over there CPU17?.. you got threading. Got it? Good. The other 240 cpus get to work on other stuff this time. What me? CPU0? Oh I am just here to make sure you guys get your work done. Someone has to manage you lot managing jobs.

So, finally the Cell architecture would be here?

What you are describing is a Cell processor with 1 PPE + 240 SPE. https://en.wikipedia.org/wiki/Cell_(microprocessor)#Synergistic_Processing_Elements_.28SPE.29

Improving printk()

Posted Nov 1, 2017 16:32 UTC (Wed) by setech (subscriber, #105454) [Link]

That’s exactly the thought I had. Without changing the requirements, at such scale you might as well dedicate a single core for flushing buffers. Maybe we can give it a name: kjanitor’s Closet

Improving printk()

Posted Nov 2, 2017 14:45 UTC (Thu) by felixfix (subscriber, #242) [Link]

"kjanitor’s Closet"? Surely Fibber McGee and Molly's closet!

Improving printk()

Posted Nov 1, 2017 18:25 UTC (Wed) by jhoblitt (subscriber, #77733) [Link]

Perhaps high volume messages shouldn't be going to the console at all? What scenarios generate these heavy loads in the first place? All that comes to my mind is ipfilter logging dropped packets.

Improving printk()

Posted Nov 2, 2017 16:57 UTC (Thu) by nevets (subscriber, #11875) [Link]

BTW, a lockup was reported the day after this Kernel Summit discussion that was a result of exactly what I talked about. A lot of CPUs were doing printk() (specifically calling warn_alloc()), and as all sorts of threads start calling warn_alloc() due to memory failing to allocate, the original thread ends up processing all other warnings, and never executes OOM.

http://lkml.kernel.org/r/1509017339-4802-1-git-send-email...

I replied with a patch that implements what I talked about at Kernel Summit (and described in this article). Now there's real code to review.

http://lkml.kernel.org/r/20171102115625.13892e18@gandalf....

Improving Crash Dumps & printk()

Posted Nov 10, 2017 14:01 UTC (Fri) by roblucid (guest, #48964) [Link]

Why can't crash dumping use an explicit immediate flush on console, so it is immediate? With the non-Torvalds mandated printk() stuff get passed off to a worker thread. If you've decided to crash then you're about to terminate so locks can be over-ridden. Guess someone will then need a way to flush real error messages somewhere for post-crash inspection.

Frankly from first impression of Linux in the 90's I was a bit surprised by how chatty it was on boot, in more recent times most of that is hidden to the user.