KS2010: A staging process for ABIs

By Jonathan Corbet
November 2, 2010

2010 Kernel Summit

Like many kernel developers, Steve Rostedt has found out that user-space interfaces are hard. API design is hard in general, but once an interface makes it into a released kernel, it must be maintained indefinitely. Breaking applications is just not something that can be allowed to happen. But we always learn things about ABIs after people start trying to actually use them. So, he asked: do we need a way to stage new ABIs into the kernel? New interfaces, perhaps, could be specially marked and only available via debugfs; they could be withdrawn or changed in any future kernel release. That scheme would give developers a chance to find and fix any remaining problems before committing to the ABI.

The answer from Linus came quickly: "no." Any sort of staging process for adding ABIs would, in his opinion, be a failure. If we need any such thing, we are clearly just adding too many ABIs in the first place. We have too many system calls, and too many other ways of interacting with the kernel. We should, instead, be talking about how to say "no" more often.

Another way of putting it, he said, is that, if you still want people to try out an ABI, you should not be asking him to pull it. In general, it can be better if new interfaces stay out of the kernel for a while. SystemTap was given as an example here: according to Linus, time has shown that the SystemTap interface is not a good one. He's very glad he never pulled it into the kernel. The lesson is that it's a good idea to impose a certain amount of pain on people who want to create new interfaces; let them live out of the mainline for a while. If, after five years, it looks like a good idea, the code can be taken upstream. Maintainers should not be accepting ABIs which have not seen that sort of testing.

Ted Ts'o talked for a bit about the financial resources behind some new features; fanotify was given as an example. Companies that want such features will put their cash behind them; given enough time, some of those features will get past the community's defenses. It is going to happen at times; how can we deal with it?

Another example that was raised was the Android suspend blockers. The answer here is that the code has now been merged; the final pieces went in for 2.6.37. Of course, it's not suspend blockers that were merged, it was the opportunistic suspend and wakeup sources work done by Rafael Wysocki; suspend blockers "done the right way." The only problem here is that the Android developers have not said whether they will use this ABI or not; this particular interface is essentially untested and without users at the moment.

Should new ABIs go into linux-next? Patches going there are supposed to be intended for merging in the next development cycle, so the answer is "no" unless it's clear that the ABI is ready to go in.

What about removing ABIs that we don't like? Linus's response was, once again, clear: breaking applications is a regression. So if he gets even a single complaint about a removed interface, he'll revert the patch and put the ABI back. It's really only possible to remove an ABI if nobody will notice that it's gone. Andrew Morton agreed, but also pointed out that we have to have some way of getting rid of old ABIs if we are going to preserve our sanity over the long term. The first step, he says, is to warn users. Then, after a while (five to ten years, perhaps), there will be no users left and the code can go away. Linus noted that Google can be an effective way of looking for deprecation warnings. If nobody has posted a log with a warning in at least a year, it's probably safe to remove the interface.

Andrew added that a bad ABI indicates a failure of the review process. And the review process, he said, is what we should be caring about more than anything else. When a new ABI is posted, everybody should be looking at it. He was clearly not happy about the amount of review that is happening now.

Dave Airlie asked if it would help to require man pages for every new ABI. In the past, Michael Kerrisk's man page work has helped to reveal a number of ABI problems and bugs, but Michael is not doing that anymore. Linus responded that we've tried in the past, but it hasn't worked very well. Al Viro added that "the man page kind of sucks" is a weak last line of defense which comes too late. Beyond that, as Linus noted, man pages tend to describe system calls, but that's not where the real problem is. Much more ABI trouble comes from tracepoints, ioctl() calls, sysfs, etc.

Ted said that, in the end, bad ABIs are really a maintainer problem. Maintainers have to say "no" more often. Hugh Dickins suggested that a special effort could be dedicated to removing crap in the -rc2 and -rc3 releases that was added in -rc1. At the closing of the session, it was suggested that there would be value in having a tool which could identify all new user-space ABIs added since the previous kernel release. That could make a good project for somebody who would like to help the kernel process.
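
The tool suggested at the close of the session could start out as little more than a set difference between two releases. The sketch below is not an existing kernel tool; it assumes a made-up input format (one "number name" pair per line) and made-up sample data, and it covers system calls only. As noted above, more ABI trouble comes from tracepoints, ioctl() calls, and sysfs, so a real tool would need to gather those as well.

```python
import re

# Hypothetical input format: one "number name" pair per line, e.g. "288 accept4".
# How a given kernel release records its system calls varies; this format is
# an assumption made purely for the sketch.
def parse_syscalls(text):
    """Return the set of system call names listed in `text`."""
    names = set()
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and surrounding space
        if not line:
            continue
        match = re.fullmatch(r"(\d+)\s+(\w+)", line)
        if match:
            names.add(match.group(2))
    return names

def abi_diff(old_text, new_text):
    """Return (added, removed) system call names, each as a sorted list."""
    old, new = parse_syscalls(old_text), parse_syscalls(new_text)
    return sorted(new - old), sorted(old - new)

# Made-up sample data standing in for two consecutive releases.
OLD = """
# previous release
0 read
1 write
2 open
"""
NEW = """
# current release
0 read
1 write
2 open
3 example_new_call
"""

added, removed = abi_diff(OLD, NEW)
print("added:", added)
print("removed:", removed)
```

Anything in the "added" list is a candidate for extra review before the release; anything in the "removed" list is, by the rule described above, a potential regression.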

KS2010: A staging process for ABIs

Posted Nov 2, 2010 22:23 UTC (Tue) by buck (subscriber, #55985)

One way to ease the pain of ABIs might be to insist that an ABI have
an API that lives in userland, one that client code could depend on not
changing, and to have whoever sponsors the ABI inclusion commit to
maintaining the API and making sure it works across kernel versions, no
matter the ABI stability. At worst, if the commitment is broken,
the API can be fixed by some responsible user and everybody dependent
on the feature would benefit. And changes to the API could even be
countenanced apart from the kernel, by the concerned community,
regardless of how stable/unstable the ABI is.

Or is this just a way of introducing inefficiency, complexity and
bugs (on top of what the user is liable to do already)?

KS2010: A staging process for ABIs

Posted Nov 3, 2010 15:06 UTC (Wed) by njs (subscriber, #40338)

This is what they do now; the only difference is that the boundary between the compatibility interface and the real implementation lies inside the kernel, not in userspace. I'm not sure what difference it would make to move that outside: it's still the same amount of code to be maintained, and now you introduce potential version skew issues.

KS2010: A staging process for ABIs

Posted Nov 3, 2010 0:38 UTC (Wed) by nix (subscriber, #2304)

"We have too many system calls"

But, but, Linus has been encouraging people to add new syscalls instead of other interaction methods!

KS2010: A staging process for ABIs

Posted Nov 3, 2010 3:37 UTC (Wed) by arnd (subscriber, #8866)

Well, we also have too many of the others. I guess it mostly comes down to having too many interfaces of all sorts, syscalls being the more obvious kind.

KS2010: A staging process for ABIs

Posted Nov 3, 2010 5:12 UTC (Wed) by magfr (subscriber, #16052)

Regarding the man page issue, I think one of the big problems is that they fail to describe ioctls, sysfs, etc., so it becomes very hard to figure out how to use the ABI, or even that it exists.
Hence I think it would be better to add more man pages and require them.

KS2010: A staging process for ABIs

Posted Nov 3, 2010 13:05 UTC (Wed) by fuhchee (subscriber, #40059)

"SystemTap was given as an example here: according to Linus, time has shown that the SystemTap interface is not a good one."

What mush. systemtap was never proposed for pulling. In any case, systemtap does not provide a kernel interface or ABI. Perhaps he meant "utrace", but that also was not a user-space ABI. So I can't see what he might have meant in this context. Can someone clarify / correct?

Remove bad ABIs as fast as possible.

Posted Nov 4, 2010 15:24 UTC (Thu) by i3839 (guest, #31386)

I think Linus is too stubborn in his unwillingness to remove existing ABIs. Removing an ABI should be about as hard as adding one, because both actions change the ABI. If a new ABI is introduced, then userspace can't be sure it will be there and has to check for it anyway. So the sooner you remove a bad ABI, the better, because less userspace relies on it being there. It doesn't matter if an ABI doesn't work because the kernel is too old, or because the kernel is too new. Waiting 5 to 10 years is the worst possible approach, except for ABIs that are twice that old.
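
The "has to check for it anyway" point can be illustrated with a small sketch. It assumes the missing interface is a system call, which the kernel reports with ENOSYS; `new_interface` and `old_interface` are hypothetical stand-ins for real wrappers, not actual kernel interfaces. The same fallback path would serve whether the call is missing because the kernel is too old or because the interface has since been removed.

```python
import errno

def call_with_fallback(preferred, fallback, *args):
    """Try `preferred`; if the kernel reports it is not implemented, use `fallback`."""
    try:
        return preferred(*args)
    except OSError as exc:
        if exc.errno != errno.ENOSYS:
            raise  # a real failure, not a missing interface
        return fallback(*args)

# Hypothetical stand-in: behaves like a wrapper around a system call that
# the running kernel does not provide.
def new_interface(x):
    raise OSError(errno.ENOSYS, "Function not implemented")

# Hypothetical stand-in for the older, always-available interface.
def old_interface(x):
    return x * 2

result = call_with_fallback(new_interface, old_interface, 21)
print(result)
```

Only ENOSYS triggers the fallback; any other error is passed through, so a permission problem is not mistaken for a missing interface.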