
The problematic kthread freezer

By Jonathan Corbet
November 2, 2016
2016 Kernel Summit
The kernel thread ("kthread") freezer, as its name would suggest, is charged with freezing kernel threads during a system hibernation cycle. At the 2016 Kernel Summit, Jiri Kosina took the stage (for the second time) to say that the usage of the kthread freezer is "out of control" and "broken everywhere." It is time, he said, to bring things under control, then get rid of the freezer altogether.

The first problem, he said, is that the freezer's semantics are not well defined; nobody really knows what it means for a kthread to be frozen. Most of the current uses of the freezer are superfluous. In many cases, the purpose is to have filesystems be in a consistent state during hibernation; that can be better achieved with the filesystem freeze mechanism. It doesn't make sense to freeze I/O operations in general, since they are needed to write out the hibernation image. There is a lot of freezing in drivers too, a situation which, he said, makes no sense. There is a well-defined set of power-management callbacks in place to put drivers into a suspended state during hibernation.

The kernel, he said, is the victim of a massive copy-and-paste cargo cult. Uses of the kthread freezer are spreading like a disease, a situation that has to stop.

There are two especially pathological uses that he called out. One is try_to_freeze() calls for threads that have not been marked freezable in the first place; those calls will never have any effect. The other is try_to_freeze() calls after starting I/O, but without waiting for that I/O to complete.
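The first of these patterns can be illustrated with a small userspace model. This is only a sketch, not kernel code: every `toy_` name below is hypothetical, and the model assumes just what the paragraph above says, namely that a thread not marked freezable gets no effect from a freeze call.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy userspace model; every toy_* name is hypothetical, not a kernel API. */
struct toy_thread {
    bool nofreeze;      /* true until the thread marks itself freezable */
    int  times_frozen;  /* how many times the thread actually froze */
};

/* Models "a hibernation cycle has asked threads to freeze". */
static bool toy_system_freezing;

/* Models thread creation: assumed not freezable until it opts in. */
static struct toy_thread toy_thread_create(void)
{
    struct toy_thread t = { .nofreeze = true, .times_frozen = 0 };
    return t;
}

/* Models the opt-in step that the first pathological pattern leaves out. */
static void toy_set_freezable(struct toy_thread *t)
{
    t->nofreeze = false;
}

/* Models the freeze point: returns true only if the thread really froze. */
static bool toy_try_to_freeze(struct toy_thread *t)
{
    if (!toy_system_freezing || t->nofreeze)
        return false;   /* silently does nothing */
    t->times_frozen++;
    return true;
}

/* One pass through a thread's main loop while a freeze is requested. */
static bool toy_freeze_round(struct toy_thread *t)
{
    bool frozen;

    toy_system_freezing = true;
    frozen = toy_try_to_freeze(t);
    toy_system_freezing = false;
    return frozen;
}

/* Contrasts a thread that never opted in with one that did. */
static void toy_demo(void)
{
    struct toy_thread forgot = toy_thread_create();
    struct toy_thread opted_in = toy_thread_create();

    toy_set_freezable(&opted_in);

    assert(!toy_freeze_round(&forgot));   /* the call has no effect */
    assert(toy_freeze_round(&opted_in));  /* the call freezes the thread */
}
```

In this model, the thread that never opted in passes through its freeze point unchanged, which is why such calls are dead code rather than a safeguard.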

The solution is to eliminate use of the kthread freezer wherever possible. It is not needed in threads that will not generate disk I/O. It is also not needed — indeed, its use is a bug — in I/O helper threads. The best solution would be to move the entire hibernation subsystem to use filesystem freezing instead, and simply get rid of the kthread freezer. It might be necessary to keep it around for NFS, he said, but there's not much else that should need it. But the first step is to stop its use from spreading.

Ben Herrenschmidt spent a while talking about the history of the freezer, which, he said, was invented as "a big, fat band-aid" without which the system could not suspend properly. Now, instead, we simply need to make our drivers cope properly with I/O during a suspend operation. As the session closed, Linus agreed that the best approach was to get rid of the kthread freezer altogether and to use filesystem freezing where it is really needed. So one should expect development to go in that direction.

Index entries for this article
Kernel: Kernel threads
Conference: Kernel Summit/2016



The problematic kthread freezer

Posted Nov 3, 2016 2:03 UTC (Thu) by trondmy (subscriber, #28934) [Link]

> It might be necessary to keep it around for NFS, he said, but there's not much else that should need it.

Thanks for the offer, but no thanks. The kthread freezer is borked for NFS as well, and we'd rather get rid of it.

The problematic kthread freezer

Posted Nov 3, 2016 11:26 UTC (Thu) by jlayton (subscriber, #31672) [Link]

Agreed. As we discussed recently, I think the right solution is to wire up the fs_freeze mechanism for NFS to quiesce it like we would any other filesystem. The machine suspending would fs_freeze all of its filesystems and then go to sleep.

The problematic kthread freezer

Posted Nov 5, 2016 14:53 UTC (Sat) by jikos (subscriber, #43140) [Link]

That'd be awesome; if you could then CC me once the ->freeze_fs() callback implementation is going in, I'd immediately proceed, as that'd remove one of the biggest current roadblocks to my "move hibernation towards fs freezing and kill kthread freezer" patchset. Thanks.

The problematic kthread freezer

Posted Nov 9, 2016 15:29 UTC (Wed) by jlayton (subscriber, #31672) [Link]

Well, I handwaved that a bit. It _is_ rather tricky to wire up for NFS.

Basically what I think we'd want to do is to have fsfreeze tell the RPC transport layer that it should stop sending RPCs to the server(s) and drain the queue by waiting on replies to come in.

The question though is what to do with threads sitting in syscalls that need to issue an RPC. "Parking" them down at the layer where we're synchronously waiting for an RPC reply would be bad, as it would mean that we could easily be holding vfs-layer locks at that point (inode->i_rwsem for instance).

How should that work?

The problematic kthread freezer

Posted Nov 7, 2016 19:30 UTC (Mon) by Alan.Stern (subscriber, #12437) [Link]

It's important not to go too far. Work queues are implemented as kernel threads, and there are legitimate reasons for a work queue to be freezable. I wouldn't want to see freezable work queues acting up at the wrong time simply because somebody had decided to stop freezing kernel threads.


Copyright © 2016, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds