
Linux 5.12's very bad, double ungood day

By Jonathan Corbet
March 8, 2021
The -rc kernels released by Linus Torvalds exist for a reason: after 10,000 or so changes flow into the kernel over a two-week merge window, there will surely be some bugs in need of squashing. The -rc kernels provide an opportunity for wider testing after all those patches have been integrated. Most of the time, -rc kernels (even the initial -rc1 releases) are surprisingly safe to run. Occasionally, though, something goes wrong, giving early testers reason to reconsider their life choices. The 5.12-rc1 kernel, as it turns out, was one of those.

On January 26, Christoph Hellwig posted a 17-patch series containing cleanups to the code dealing with the allocation of the BIO structures used to represent block-I/O requests. The final patch in that series simplified the allocation of requests for swap files in particular. The series was applied by block maintainer Jens Axboe one day later. The change was included in this pull request sent on February 17, and landed in the mainline on February 21 as part of the massive set of pulls done by Torvalds once his power was restored.

"Swapping" is how the kernel pushes anonymous pages (those which are not backed up by a file on disk — program data, in other words) out to persistent storage when memory gets tight. Linux can swap directly to a partition on a block device; that is how systems are traditionally set up. But the kernel can also swap to a file within a mounted filesystem with something close to the same performance. Swapping to a file gives administrators some additional flexibility; it is an easy way to give some relief to a system that is running short of memory without disturbing the existing workload, for example.

The problematic patch works just fine for swap activity to a partition. But, when a swap file is in use, the offset to the location of its blocks within the filesystem would be lost. That had the result of redirecting swap traffic to an essentially random location on the filesystem — a location that was not intended for swapping and may well have contained other information that somebody would have preferred to keep. In other words, the filesystem containing the swap file would be corrupted. (Interestingly, a similar bug was introduced in 2014 and lurked undetected because it only affects extremely rare configurations.)

On March 2, a fix was applied to the mainline repository. One day later, Torvalds sent a warning to the linux-kernel mailing list describing the problem and noting that: "This is what we in the industry call 'double ungood'". He removed the 5.12-rc1 tag from his repository (or, rather, renamed it to "5.12-rc1-dontuse") and generally made it clear that this release should be avoided. He also had a couple of requests for maintainers.

One had to do with the 5.12-rc1 tag; he can remove it from his repository, but that does not change anybody else's repository. So, he said, maintainers should manually remove that tag themselves just to prevent any accidental use of that release.

Beyond that, it is common for subsystem maintainers to base new branches on an -rc release, even -rc1. But any such branches pose a particular hazard for the development and testing community in general, because they will not contain the fix for this problem. The git bisect tool, which is frequently used to find the patch that caused a regression, can land on any location in the development history; branches containing the buggy commit (but not the fix) create extended ranges of history with the potential to destroy filesystems far into the future. This, also, seems "double ungood".

The answer is to not base development branches on 5.12-rc1 but, instead, either to use an older release or to wait until -rc2. That will not be entirely helpful for subsystem maintainers who have already published branches with an -rc1 base, as some certainly have. They have a choice between keeping the dangerous branch or rebasing it onto a safer branch point, possibly creating difficulties for anybody else who has built on that branch. That sort of rebasing would normally be frowned upon, but Torvalds made it clear that it is permissible — and expected — this time.

The kernel community is mostly treating this as a case of "bugs happen" and moving on. One can indeed argue that the process has worked the way it is supposed to: a catastrophic bug was caught early during the stabilization period and will never make it anywhere near a production system. That said, there may be room for a bit of introspection and thinking about how things could have been done better.

For example, one might argue that the review process could have worked better. The patch in question was only posted once, and was applied to the block repository the next day. The patch in the mainline repository contains two Reviewed-by tags and one Acked-by tag, but the mailing-list thread shows those tags all being offered for a different patch in the series. Perhaps all three reviewers offered their tags off-list but, from the public record, it's not clear how much review the problematic patch actually received.

There is no guarantee that more eyeballs would have found this bug before it was committed, but it seems that those eyeballs were not given much of a chance here. It is tempting to see "cleanup" patches as being inherently safe and needing less review, but those patches can contain subtle bugs as well.

A part of the process that could have worked more quickly was warning the community as a whole of the problem. In the message linked above, Torvalds acknowledged that he should have acted more quickly rather than waiting to be sure that the proposed fix actually made the problem go away. One can only imagine that the warning will go out more quickly the next time.

There is one other little problem here that surprisingly few people are talking about. The health of the mailing-lists hosted on vger.kernel.org has not been optimal for a while; at times, posts to the mailing lists have been delayed for a day or two or dropped altogether. Torvalds's warning was not delayed that badly, but it still took several hours to get to recipients of linux-kernel. Whether the delay in the delivery of that warning caused more testers to be bitten by this bug is not clear, but it certainly did not help.

The good news is that work is currently being done to put the vger lists on a more solid, well-supported footing. With luck, the mailing-list problems will be behind us in the relatively near future.

Even then, though, a warning sent just to linux-kernel has a real chance of disappearing into the noise. The traffic on that list, which routinely exceeds 1,000 messages per day, can drown out important messages and has caused many developers to unsubscribe completely. It may well be that we need better communication channels for this kind of urgent message.

In the end, the actual pain caused by this bug was probably relatively small; there have not been massive numbers of "it ate my filesystem!" messages going around. The process did mostly work the way it was intended to. With luck, lessons learned from this incident can help that process to work even better the next time.




Linux 5.12's very bad, double ungood day

Posted Mar 8, 2021 17:06 UTC (Mon) by pbonzini (subscriber, #60935) [Link]

The commit message doesn't say why what is essentially
-	bio->bi_iter.bi_sector = map_swap_page(page, &bdev);
+	bio->bi_iter.bi_sector = swap_page_sector(page);
would have been the right replacement. I suppose that's where the bug lies?

Linux 5.12's very bad, double ungood day

Posted Mar 8, 2021 17:56 UTC (Mon) by farnz (subscriber, #17727) [Link]

The bug is trivially awful (because it's obvious when you see it, but not when you're working on it); for most of the swap code, the swap location for any given page is referenced as an offset from the beginning of the swap file (or device). For a swap partition, this is fine - the offset from the beginning of the device is the correct location in the swap partition, too. For swap files, the correct offset on the device has to account for where on the device the filesystem has placed swap extents.

In the old code, map_swap_page calls map_swap_entry, which accounts for this offset. In the new code, it calls swap_page_sector, which was previously only ever used for swap partitions, and thus didn't account for the filesystem offset. The fix is to change swap_page_sector to account for swap files, and thus to look at the filesystem-provided extent map. That way, the foot gun is removed for everyone.
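
For anyone trying to picture it, here is a rough sketch of the extent lookup being described. This is illustrative only, not the mainline code: the structures and the linear search are simplified stand-ins for the kernel's swap_info_struct and swap_extent bookkeeping, and the kernel's real lookup is more efficient.

    #include <stdint.h>

    /* Simplified stand-ins for the kernel's swap-extent bookkeeping. */
    struct swap_extent {
    	uint64_t start_page;	/* first page of the extent within the swap area */
    	uint64_t nr_pages;	/* extent length, in pages */
    	uint64_t start_block;	/* where those pages actually live on the device */
    };

    struct swap_info {
    	struct swap_extent *extents;
    	unsigned int nr_extents;
    };

    /*
     * Map an offset within the swap area to a 512-byte device sector.  A swap
     * partition has a single extent starting at block zero, so the lookup
     * degenerates to a shift; a swap file has one extent per chunk that the
     * filesystem allocated for it, and forgetting to consult that map is
     * exactly the bug discussed here.
     */
    static uint64_t swap_offset_to_sector(const struct swap_info *si,
    				      uint64_t offset, unsigned int page_shift)
    {
    	unsigned int i;

    	for (i = 0; i < si->nr_extents; i++) {
    		const struct swap_extent *se = &si->extents[i];

    		if (offset >= se->start_page &&
    		    offset < se->start_page + se->nr_pages)
    			return (se->start_block + (offset - se->start_page))
    				<< (page_shift - 9);
    	}
    	return (uint64_t)-1;	/* not covered by any extent */
    }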

Linux 5.12's very bad, double ungood day

Posted Mar 8, 2021 18:10 UTC (Mon) by pbonzini (subscriber, #60935) [Link]

> In the old code, map_swap_page calls map_swap_entry, which accounts for this offset. In the new code, it calls swap_page_sector,

And doing that without pointing it out in the commit message is super sloppy. If you're saying that you're Just Inlining Code, you'd better just inline code

Linux 5.12's very bad, double ungood day

Posted Mar 9, 2021 8:32 UTC (Tue) by epa (subscriber, #39769) [Link]

I dream about a compiler flag that would output a 'logical build' of the code. It would compile to some kind of bytecode where all symbol names have been stripped and all non-recursive function calls have been inlined. Then if the commit message says "renamed variable" or "inlined function" or even "use ?: ternary operator instead of if-else", you could automatically verify that indeed it hasn't changed the bytecode output and so has no effect on the meaning of the code. In other words you can flag commits as pure refactorings and have a machine assistant to verify that, so that human reviewers can concentrate their attention on the patches in the series that change something.

Linux 5.12's very bad, double ungood day

Posted Mar 9, 2021 8:36 UTC (Tue) by epa (subscriber, #39769) [Link]

P.S. I know that in general, it is impossible for a computer program to spot all possible cases where two programs have identical behaviour. I am not suggesting this assistant could have anything like 100% coverage, even for pure refactoring commits. But if it can tick off 80% of them that would still make a reviewer's life easier, and improve my peace of mind as a programmer. (Did that change from a for-loop to a while-loop affect the behaviour? Hmm, I'm not sure... instead of staring at it or working it through with pencil and paper, let's see if the bytecode has changed. If it has changed, the bytecode diff tool will usually point out what I've accidentally altered.)
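
To make that concrete, here is a hypothetical pair of functions of the kind such an assistant would be asked to bless: a pure for-to-while rewrite that should produce identical compiler output once symbol names are stripped.

    #include <stddef.h>

    /* Before: sum an array with a for-loop. */
    long sum_before(const int *a, size_t n)
    {
    	long total = 0;
    	size_t i;

    	for (i = 0; i < n; i++)
    		total += a[i];
    	return total;
    }

    /* After the "pure refactoring": the same loop as a while-loop. */
    long sum_after(const int *a, size_t n)
    {
    	long total = 0;
    	size_t i = 0;

    	while (i < n) {
    		total += a[i];
    		i++;
    	}
    	return total;
    }

If the rewrite had accidentally changed the loop bounds or dropped the increment, the stripped output of the two would differ and the diff would point straight at it.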

Linux 5.12's very bad, double ungood day

Posted Mar 9, 2021 19:09 UTC (Tue) by NYKevin (subscriber, #129325) [Link]

I tend to imagine you can get 90% of the way there with gcc -S -O0 -finline-functions -findirect-inlining and some combination of the max-inline tuning parameters (whose documentation I'm finding a little hard to follow). But that won't cross translation units because it doesn't link. You'd need some kind of LTO optionally followed by a disassembly step to get the other 10%.

Linux 5.12's very bad, double ungood day

Posted Mar 9, 2021 21:01 UTC (Tue) by error27 (subscriber, #8346) [Link]

I use my rename_rev.pl script to review "rename the variable" patches.

https://github.com/error27/rename_rev

It also has a -r NULL mode to review changes like "if (foo != NULL) {" --> "if (foo) {" because sometimes people get those transitions inverted. A bunch of little tricks like that.

I sometimes think like you do about ways to do something similar at the bytecode level or with static analysis but I can't think how that would work...

Linux 5.12's very bad, double ungood day

Posted Mar 10, 2021 7:23 UTC (Wed) by epa (subscriber, #39769) [Link]

Another approach is when you use an automated tool to make the change in the first place. I mostly develop in C# and I use the proprietary tool Resharper to rename variables, inline methods, convert for-loops to foreach-loops, and a few more exotic transformations like "extract method" (take a chunk of code and move it to its own method, passing in and out the variables it uses). In this case the commit message could include full details of the transformation you applied, in machine-readable format. Then to verify the patch series you run the same transformation and check the output code is the same. (That would be an additional verification step; it checks you just ran the "convert for to foreach" automated refactoring and didn't accidentally introduce other changes, but it doesn't check that the refactoring tool itself is correct. So some kind of bytecode check would also be useful.)

Such a machine-readable record of the refactoring change would also be handy when rebasing. Instead of hitting a merge conflict on the 'renamed variable' commit, you could reapply that change using the refactoring tool. As long as both the old commit and the new one are pure refactorings (logical bytecode doesn't change), this can be done automatically as part of the rebase process, leaving the human programmer to concentrate on the conflicts that aren't trivial.

Linux 5.12's very bad, double ungood day

Posted Mar 10, 2021 8:10 UTC (Wed) by error27 (subscriber, #8346) [Link]

Yeah. If you make your patch with Coccinelle then we normally encourage you to post the script (unless it's one of the ones that ship with the kernel).

Linux 5.12's very bad, double ungood day

Posted Mar 20, 2021 23:50 UTC (Sat) by Hi-Angel (guest, #110915) [Link]

For C, these days, part of such refactoring could be done with LSP servers, clangd for example. Thankfully, all actively developed editors and IDEs support LSP servers, whether natively or through a plugin.

Linux 5.12's very bad, double ungood day

Posted Mar 21, 2021 5:34 UTC (Sun) by pabs (subscriber, #43278) [Link]

While GNU ed is actively developed, it seems unlikely it will ever support LSP :)

Linux 5.12's very bad, double ungood day

Posted Mar 8, 2021 17:12 UTC (Mon) by PaulMcKenney (subscriber, #9624) [Link]

Thank you for this! You see, rcutorture does not use swap, so I was blissfully ignorant of this problem. I see another rebase in the -rcu tree's future. ;-)

Linux 5.12's very bad, double ungood day

Posted Mar 8, 2021 17:25 UTC (Mon) by brice (subscriber, #35493) [Link]

Oh man. This reminds me of when I ran a benchmark on newly installed SSDs in a production colo box by punching holes in the raw block device instead of executing something that was filesystem-aware. A couple days later the database was corrupted. And for some reason I was convinced it was an LSI RAID firmware issue....

Linux 5.12's very bad, double ungood day

Posted Mar 8, 2021 17:30 UTC (Mon) by syrjala (subscriber, #47399) [Link]

I've not seen a definite statement on whether the patch was tested at all by the author. I don't think it's unreasonable to expect changes to swapfile.c to be tested using a swapfile? "I was too busy/lazy to run any tests whatsoever" is not a valid excuse IMO.

An excuse I would accept is that appropriate tests were run but they simply did not catch the problem. In which case I would expect to see improvements to the tests (+ making sure someone is actually running them in their CI system) to make sure this same bug doesn't happen ever again.

Linux 5.12's very bad, double ungood day

Posted Mar 8, 2021 19:28 UTC (Mon) by mathstuf (subscriber, #69389) [Link]

Well, the state of patch status visibility on mailing lists is…low. Long mail delivery time doesn't help the issue. I'm fine with mailing lists as a development nexus, but there needs to be some infrastructure around them to indicate the status of the patches as superseded, reviewed, rejected, queued, etc.

Linux 5.12's very bad, double ungood day

Posted Mar 8, 2021 20:34 UTC (Mon) by flussence (subscriber, #85566) [Link]

That's what https://patchwork.kernel.org/ appears to already address.

Linux 5.12's very bad, double ungood day

Posted Mar 8, 2021 21:56 UTC (Mon) by mathstuf (subscriber, #69389) [Link]

Well, it *tries*. Look at this pull request:

https://patchwork.kernel.org/project/keyrings/patch/13239...

Its state is "New", but it has been dropped by the submitter. How would one change the state from "New" to "Dropped"?

This one is tracked as "New", but has been merged (as noted by the pr-tracker-bot). How did Patchtracker miss this?

https://patchwork.kernel.org/project/keyrings/patch/13228...

And this is for pull requests.

This one is "New", but has a "Reviewed-by" someone who can collect it for the subsystem. Where is it on its way into the appropriate tree(s)?

https://patchwork.kernel.org/project/keyrings/patch/20210...

And that's just the one list I follow loosely. How can one trust this site for actual tracking of arbitrary kernel patches with this quality of tracking? Sure, this list might not *use* patchtracker as much as other lists, but how does one determine *that* bit of information?

Linux 5.12's very bad, double ungood day

Posted Mar 9, 2021 16:12 UTC (Tue) by andy_shev (subscriber, #75870) [Link]

Unfortunately patchwork is disconnected from the target repository. That's why Gerrit is better, for example.

Linux 5.12's very bad, double ungood day

Posted Mar 9, 2021 20:29 UTC (Tue) by marcH (subscriber, #57642) [Link]

> Unfortunately patchwork is disconnected from the target repository. That's why Gerrit is better, for example.

patchwork tries to find code submissions not addressed to it directly. All other code review solutions require everything to be sent to them directly; they act as gateways. That's the very simple reason why they have all the information and all work better.

The icing on the cake is of course recipient-controlled email notifications instead of sender-controlled notifications (a.k.a... "spam"). In other words people routinely subscribe and unsubscribe to/from individual code reviews; not possible with a crude mailing list.

None of this is rocket science, it's all very easy to see and understand except when obsessing about the "email versus web" user interface questions; these debates are not "wrong" but they hide the much more important backend and database design issues.

You could totally have a gateway type code review tool entirely driven by email. In fact requesting all submissions, review comments and approvals or rejections to be sent (by email) to patchwork directly and putting patchwork in better control of the git repo would get at least half-way there.

Linux 5.12's very bad, double ungood day

Posted Mar 11, 2021 16:04 UTC (Thu) by mathstuf (subscriber, #69389) [Link]

Another instance I saw today was that a version of the patch that went in was never on the list (a whitespace fix). While such a tiny change is probably OK in practice, I vastly prefer having a bot attached to the service that stashes away every version of the patchset for posterity.

Linux 5.12's very bad, double ungood day

Posted Mar 8, 2021 20:19 UTC (Mon) by cesarb (subscriber, #6266) [Link]

> I don't think it's unreasonable to expect changes to swapfile.c to be tested using a swapfile?

Even though it's called "swapfile.c", that file has the code for both swap files and swap partitions, including things like the swapon and swapoff system calls. Surprisingly little of that code is specific to either swap files only or swap partitions only; nearly all of the divergence is in the setup and teardown of the swap areas (setting the block size on swapon for partitions, reading the swap extents from the filesystem on swapon for regular files). Search that file for S_ISBLK and S_ISREG to see the few places where the code diverges.
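
As a rough user-space illustration of that divergence (the kernel does the equivalent checks on the inode during swapon; this is not kernel code), the same S_ISBLK/S_ISREG distinction can be seen with an ordinary stat():

    #include <stdio.h>
    #include <sys/stat.h>

    /* Report which of the two swapon setup paths a given target would take. */
    static int classify_swap_target(const char *path)
    {
    	struct stat st;

    	if (stat(path, &st) != 0) {
    		perror(path);
    		return -1;
    	}

    	if (S_ISBLK(st.st_mode)) {
    		/* Partition: the swap area starts at block 0 of the device. */
    		printf("%s: block device - offsets map 1:1 to the device\n", path);
    	} else if (S_ISREG(st.st_mode)) {
    		/* File: the filesystem decides where each extent lives. */
    		printf("%s: regular file - extents must be read from the filesystem\n", path);
    	} else {
    		fprintf(stderr, "%s: not usable as swap\n", path);
    		return -1;
    	}
    	return 0;
    }

    int main(int argc, char **argv)
    {
    	for (int i = 1; i < argc; i++)
    		classify_swap_target(argv[i]);
    	return 0;
    }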

Linux 5.12's very bad, double ungood day

Posted Mar 8, 2021 23:17 UTC (Mon) by tome (subscriber, #3171) [Link]

> An excuse I would accept is that appropriate tests were run but they simply did not catch the problem. In which case I would expect to see improvements to the tests

Indeed this bug could pass tests on a swap file accidentally by just clobbering unused locations. If so, those tests need more fuzz.

Linux 5.12's very bad, double ungood day

Posted Mar 9, 2021 8:21 UTC (Tue) by matthias (subscriber, #94967) [Link]

If the test does not verify that the data is written to the correct location in the swapfile, the test will probably pass. The data can be read back fine. Even if you hit some used location this will manifest as random filesystem corruption. You only notice this if you verify the filesystem afterwards, but why should a test of the swapping code verify the correctness of a totally unrelated filesystem?

Linux 5.12's very bad, double ungood day

Posted Mar 9, 2021 8:44 UTC (Tue) by marcH (subscriber, #57642) [Link]

Tests may pass, but some time later the machine will behave erratically which will raise suspicion. Pretty much what happened but too far away from the origin of the patch and too late.

So back to the original question: which "swaptorture" test suite was used to validate this patch and did that test suite include configuration(s) with swap file(s)? The main article wonders "how things could have been done better" and asks some interesting code review questions but does not seem to mention anything like "test gap".

Linux 5.12's very bad, double ungood day

Posted Mar 9, 2021 16:43 UTC (Tue) by mathstuf (subscriber, #69389) [Link]

Here's an idea:

- initialize a swapfile
- do things with it
- copy it to a new, fresh FS
- restore from it

That at least gets the "everything was written to the swapfile" guarantee (rather than writes landing willy-nilly across the hosting FS). It doesn't guarantee that *extra* writes didn't go out. Some form of writing a given bitpattern to the entire FS and then ensuring that, post-FS-init and swap-file creation, nothing else changes (modulo mtime/inode updates) might suffice? I have no idea if such a "simple" FS exists though. Or create a single file which takes up the remaining FS space with a given bitpattern and check that it reads back properly after using the swapfile. Wrecking any data or metadata would presumably be detectable then, no?

Linux 5.12's very bad, double ungood day

Posted Mar 10, 2021 1:47 UTC (Wed) by NYKevin (subscriber, #129325) [Link]

This is (hypothetically) a test. It doesn't need to be a reasonable FS in the first place. Tell the swapfile code that the file begins at such-and-such offset and occupies such-and-such size, and fill the rest of the image with 0xDEADBEEF or whatever sentinel value you like. I'm sure they could make an extremely stupid filesystem that works that way internally (and just returns EROFS or whatever if the user tries to create files or do other things it doesn't like), so you don't even need to properly mock out any of the FS code.

Linux 5.12's very bad, double ungood day

Posted Mar 8, 2021 19:28 UTC (Mon) by luto (subscriber, #39314) [Link]

I would consider this situation to demonstrate a problem with git. Sure, maintainers should avoid publishing known-bad trees / commits, but git bisect should be able to avoid generating known-bad states for testing.

Linux 5.12's very bad, double ungood day

Posted Mar 8, 2021 20:13 UTC (Mon) by mathstuf (subscriber, #69389) [Link]

I'd have a way to add a commit range to something like a `bisect.avoidIfPossible` configuration. It would just treat any commit in there as `skip` by default and would delve in only on explicit requirements (maybe each range could even have an explanation for the exclusion so I can know whether it applies to my situation). However…

Git just generally has a problem with shipping configuration values around. I can't make bullet-proof commands for others to use because I don't even know what remote names are in use. I vastly prefer `origin` for the main repo and `$prefix/$username` for any forks (including mine; the prefix depends on the hosting service). Others prefer `origin` for their fork and `upstream` for the main repository. Useful aliases, hooks, etc. have no real way of being distributed in git today. We have a script that developers should run to at least match the docs, but that's not generally used.

Linux 5.12's very bad, double ungood day

Posted Mar 8, 2021 20:28 UTC (Mon) by pebolle (subscriber, #35204) [Link]

What I wonder is why v5.12-rc2 wasn't done like this:
- restart at v5.11;
- add everything in v5.12-rc1-dontuse except the misguided commit;
- add the misguided commit and its fix squashed as a single commit;
- add everything else that is now in v5.12-rc2 but not in v5.12-rc1-dontuse.

That should make v5.12-rc1-dontuse a dead end which no-one would ever hit while bisecting. (And v5.12-rc2 basically a second try at a v5.12-rc1.) Or doesn't git work like that?

Linux 5.12's very bad, double ungood day

Posted Mar 8, 2021 21:13 UTC (Mon) by josh (subscriber, #17465) [Link]

That could absolutely be done. It'd mean that the repository would have a non-fast-forwarding change, which would be disruptive. It'd also set a precedent that non-fast-forwarding changes are an option for a serious enough bug, which might be undesirable.

Linux 5.12's very bad, double ungood day

Posted Mar 8, 2021 21:16 UTC (Mon) by geert (subscriber, #98403) [Link]

Following your approach would rebase the master branch of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/..., which is something Linus does not want to do ever. This would affect any tree that has pulled from this branch after the bad commit was merged into his tree.

Linux 5.12's very bad, double ungood day

Posted Mar 9, 2021 8:46 UTC (Tue) by marcH (subscriber, #57642) [Link]

> This would affect any tree that has pulled from this branch after the bad commit was merged into his tree.

That's the goal.

Linux 5.12's very bad, double ungood day

Posted Mar 9, 2021 10:21 UTC (Tue) by pr1268 (subscriber, #24648) [Link]

Following your approach would rebase the master branch [...], which is something Linus does not want to do ever.

But, in this particular case, he's making an exception. From our editor's article (emphasis mine):

That sort of rebasing would normally be frowned upon, but Torvalds made it clear that it is permissible — and expected — this time.

Linux 5.12's very bad, double ungood day

Posted Mar 9, 2021 10:44 UTC (Tue) by geert (subscriber, #98403) [Link]

The discussion was about rebasing Linus Torvalds' master branch, not about rebasing maintainers' branches for submitting pull requests.
If Linus had been comfortable about rebasing his own master branch, he would have done so for sure.

Linux 5.12's very bad, double ungood day

Posted Mar 8, 2021 19:30 UTC (Mon) by amarao (subscriber, #87073) [Link]

I never worked on kernel CI, but for our in-house needs there is a rule of red-green bugfixing. When this sort of nastiness happens, you search for the root cause, commit a red test that reproduces it, then fix the code to make the test green. That more or less protects against repeating the same type of mistake/misunderstanding, and has an amazingly high 'catch ratio'. These 'red-green' derived tests usually raise alarms more often than the synthetic inventions of QA or developers.
Basically, in this story I can't see the finale: a rather trivial test to see that a filesystem holding a swap file is not corrupted after swapping in and out of that file.

Even without kernel instrumentation it would take about an hour or two to develop.

Linux 5.12's very bad, double ungood day

Posted Mar 8, 2021 21:20 UTC (Mon) by roc (subscriber, #30627) [Link]

Yes. I was a little sad to see that "add some automated tests that would have caught this" (that run before every RC release, if not every commit) was not one of the "thinking about how things could have been done better" items in this article.

Linux 5.12's very bad, double ungood day

Posted Mar 8, 2021 21:42 UTC (Mon) by airlied (subscriber, #9104) [Link]

automated tests did catch it. It took out the Intel graphics CI system for a few days while they root caused it.

Linux 5.12's very bad, double ungood day

Posted Mar 8, 2021 23:18 UTC (Mon) by roc (subscriber, #30627) [Link]

Something is definitely wrong if automated tests caught it and it was still released as rc1 and people still based work on top of it.

Linux 5.12's very bad, double ungood day

Posted Mar 9, 2021 4:26 UTC (Tue) by airlied (subscriber, #9104) [Link]

the Intel graphics CI only gets its hands on things at -rc1; there is usually a bunch of minor regressions, and sometimes a major one.

Linux 5.12's very bad, double ungood day

Posted Mar 9, 2021 12:01 UTC (Tue) by roc (subscriber, #30627) [Link]

Well then, like I said above: tests need to be run before the RC releases. This is testing 101 stuff that is table stakes for most projects today.

I am constantly frustrated by the kernel testing culture, or lack of it. rr has approximately zero resources behind it and our automated tests are still responsible for detecting an average of about one kernel regression every release.

Every time I bring this up people have a string of excuses, like how it's hard to test drivers etc. Some of those are fair, but the bug in this article and pretty much all the regressions found by rr aren't in drivers, they're in core kernel code that can easily be tested, even from user space.

Linux 5.12's very bad, double ungood day

Posted Mar 9, 2021 13:27 UTC (Tue) by pizza (subscriber, #46) [Link]

> Well then, like I said above: tests need to be run before the RC releases.

Okay, so... when exactly?

There are 10K commits (give or take a couple thousand) that land in every -rc1. Indeed, until -rc1 lands, nobody can really be sure if a given pull request (or even a specific patch) will get accepted. This is why nearly all upstream tooling treats "-rc1" as the "time to start looking for regressions" inflection point [1], and they spend the next *two months* fixing whatever comes up. This has been the established process for over a decade now.

So what if there was a (nasty) bug that takes down a test rig? That's what the test rigs are for! The only thing unusual about this bug is that it leads to silent corruption, to the point where "testing" in and of itself wasn't enough; the test would have had to be robust enough to ensure nothing unexpected was written anywhere to the disk. That's a deceptively hairy testing scenario, arguably going well beyond the tests folks developing filesystems run.

Note I'm not making excuses here; it is a nasty bug and clearly the tests that its developers ran were insufficient. But it is ridiculous to expect "release-quality" regression testing to be completed at the start of the designated testing period.

[1] Indeed, many regressions are due to the combinations of unrelated changes in a given -rc1; each of those 10K patches is fine in and of itself, but (eg) patch #3313 could lead to data loss, but only in combination with a specific kernel option being enabled, and only when run on a system containing an old 3Ware RAID controller and a specific motherboard with a PCI-X bridge that can't pass through MSI interrupts due to how it was physically wired up. [2] [3]

[2] It's sitting about four feet away from me as I type this.

[3] Kernel bugzilla #43074

Linux 5.12's very bad, double ungood day

Posted Mar 9, 2021 14:11 UTC (Tue) by epa (subscriber, #39769) [Link]

I guess if the test setup uses a checksummed filesystem then you verify the filesystem after the tests have completed and if it's corrupted, that's a failure. If the filesystem doesn't have per-file checksums, you can at least do the usual fsck stuff to check metadata. (For testing it would be handy to have a filesystem mode that always zeroes out unused pages, so that a thorough fsck can later check that all unused pages are zero.)

Linux 5.12's very bad, double ungood day

Posted Mar 9, 2021 15:12 UTC (Tue) by Cyberax (✭ supporter ✭, #52523) [Link]

Linus can issue a pre-RC (alpha1?) to give time to run the tests, a day before the actual RC.

Linux 5.12's very bad, double ungood day

Posted Mar 9, 2021 15:22 UTC (Tue) by geert (subscriber, #98403) [Link]

That's why we have linux-next, which is used for lots of automated integration tests:

$ git tag --contains 48d15436fde6
next-20210128
next-20210129
next-20210201
next-20210202
next-20210203
next-20210204
next-20210205
next-20210208
next-20210209
next-20210210
next-20210211
next-20210212
next-20210215
next-20210216
next-20210217
next-20210218
next-20210219
next-20210222
next-20210223
next-20210224
next-20210225
next-20210226
next-20210301
next-20210302
next-20210303
next-20210304
next-20210305
next-20210309
v5.12-rc1
v5.12-rc1-dontuse
v5.12-rc2

Three weeks passed between the buggy commit entering linux-next and upstream.

Linux 5.12's very bad, double ungood day

Posted Mar 9, 2021 15:35 UTC (Tue) by pizza (subscriber, #46) [Link]

> That's why we linux-next, which is used for lots of automated integration tests
> Three weeks passed between the buggy commit entering linux-next and upstream.

So the "problem" here isn't that nothing was being tested, it's just that none of the tests run during this interval window caught this particular issue. It's also not clear that there was even a test out there that could have caught this, except by pure happenstance.

But that's the reality of software work; a bug turns up, write a test to catch it (and hopefully others of the same class), add it to the test suite (which runs as often as your available resources allow) .... and repeat endlessly.

Linux 5.12's very bad, double ungood day

Posted Mar 9, 2021 15:25 UTC (Tue) by pizza (subscriber, #46) [Link]

... okay, so rename "rc1" to "alpha1" , "rc2" to "alpha2" and so forth. Problem solved?

Not that it will stop folks complaining when "5.32-alpha0-rc4-pre3" fails to boot on their production system, obviously because it should have been tested first, and we need a pre-pre-pre-pre-pre release snapshot to start testing against.

Linux 5.12's very bad, double ungood day

Posted Mar 9, 2021 15:26 UTC (Tue) by Cyberax (✭ supporter ✭, #52523) [Link]

Kinda. You'll have alphaN that is released before rcN, and the distinction is that alphaN is meant only for automatic testing on throwaway hardware.

Linux 5.12's very bad, double ungood day

Posted Mar 9, 2021 17:47 UTC (Tue) by Wol (subscriber, #4433) [Link]

And how many people will ignore that (or be unaware) and load alphaN on their production server anyway?

Horse to water and all that ...

Cheers,
Wol

Linux 5.12's very bad, double ungood day

Posted Mar 9, 2021 19:55 UTC (Tue) by mathstuf (subscriber, #69389) [Link]

Crazy idea: require kernel.testing.alpha=5.20.0-alpha1 on the cmdline to boot such an alpha kernel. Reject such an option in non-alpha kernels (including development kernels; the tagged release would require this code and not otherwise).

But this kind of one-off code is annoying to test itself and someone will script adding it to their boot command lines anyways.

Linux 5.12's very bad, double ungood day

Posted Mar 10, 2021 21:58 UTC (Wed) by sjj (subscriber, #2020) [Link]

How many people run non-distro kernels these days, especially in production? If you do that with an rc kernel, you certainly deserve whatever pieces are left of your data.

I don’t think I’ve built a kernel in 10 years, or maybe that one time 7-8 years ago.

Linux 5.12's very bad, double ungood day

Posted Mar 10, 2021 22:22 UTC (Wed) by roc (subscriber, #30627) [Link]

In the past, when reporting a bug in a release kernel, people have asked me to install some random kernel revision to see if the bug is still present with it. If we should never do that, people should stop asking for it.

Linux 5.12's very bad, double ungood day

Posted Mar 11, 2021 8:43 UTC (Thu) by pbonzini (subscriber, #60935) [Link]

You weren't supposed to do that in production though. Also, answering "sorry I can't" is perfectly valid. :)

Linux 5.12's very bad, double ungood day

Posted Mar 10, 2021 23:20 UTC (Wed) by Wol (subscriber, #4433) [Link]

> I don’t think I’ve built a kernel in 10 years, or maybe that one time 7-8 years ago.

You clearly don't run gentoo :-)

Cheers,
Wol

Linux 5.12's very bad, double ungood day

Posted May 2, 2021 2:58 UTC (Sun) by pizza (subscriber, #46) [Link]

> Not that it will stop folks complaining when "5.32-alpha0-rc4-pre3" fails to boot on their production system, obviously because it should have been tested first, and we need a pre-pre-pre-pre-pre release snapshot to start testing against.

I saw this scroll by when I upgraded this system to Fedora 34:

$ rpm -q icedtea-web
icedtea-web-2.0.0-pre.0.3.alpha16.patched1.fc34.3.x86_64

Linux 5.12's very bad, double ungood day

Posted Mar 9, 2021 22:44 UTC (Tue) by dbnichol (subscriber, #39622) [Link]

Why not test linus master HEAD prior to tagging? It's not like he pulls all the requests and tags the release in one giant push. Testing a tagged release makes sense as that's what people are going to use, but a CI system could test HEAD constantly. I think that's what basically every other project does - test what's on the branch before you tag it.

Linux 5.12's very bad, double ungood day

Posted Mar 10, 2021 2:43 UTC (Wed) by roc (subscriber, #30627) [Link]

Many projects run tests on every PR before merging. It's difficult to get this to scale, but not impossible.

Linux 5.12's very bad, double ungood day

Posted Mar 10, 2021 2:34 UTC (Wed) by roc (subscriber, #30627) [Link]

Projects that are serious about testing run all their automated tests as often as they can. The sooner you detect a regression the easier it is to bisect, debug, and fix, with less impact on other developers and users.

In practice, large projects often try to maximise bang-for-the-buck by dividing tests into tiers, e.g. tier 1 tests run on every push, tier 2 every day, maybe a tier 3 that runs less often. Many projects use heuristics or machine learning to choose which tests to run in each run of tier 1.

Yes, I understand that it's difficult to thoroughly test weird hardware and configuration combinations. Ideally organizations that produce hardware with Linux support would contribute testing on that hardware. But even if we ignore all those bugs, there are still lots of core kernel bugs not being caught by kernel CI.

Linux 5.12's very bad, double ungood day

Posted Mar 9, 2021 14:29 UTC (Tue) by daenzer (subscriber, #7050) [Link]

Like other commenters, I was a bit disappointed that the (otherwise excellent as usual) article doesn't discuss the possibility of automated tests catching buggy changes like this before they are merged anywhere. The kernel is behind current best practices there.

E.g. it shouldn't be hard to hook up tests to a GitLab CI pipeline which can catch this bug (and more), and only allow merging changes which pass the tests.

Linux 5.12's very bad, double ungood day

Posted Mar 9, 2021 14:54 UTC (Tue) by pbonzini (subscriber, #60935) [Link]

There's already CKI, KernelCI and more. However, testing kernels is not that easy. For example, for obvious reasons you cannot install a kernel from a bog-standard GitLab pipeline.

Linux 5.12's very bad, double ungood day

Posted Mar 9, 2021 16:12 UTC (Tue) by pabs (subscriber, #43278) [Link]

You could run user-mode-linux, or possibly kvm if the platform has nested VMs.

Linux 5.12's very bad, double ungood day

Posted Mar 9, 2021 17:32 UTC (Tue) by daenzer (subscriber, #7050) [Link]

Yeah, user-mode-linux should have sufficed in this case.

And there are CI pipelines on https://gitlab.freedesktop.org/ running VMs via KVM, which should allow booting kernels built as part of the pipeline as well. (Another possibility is having dedicated test machines which can be powered on/off remotely; Mesa is using a number of those for testing GPU drivers)

Linux 5.12's very bad, double ungood day

Posted Mar 9, 2021 21:11 UTC (Tue) by marcH (subscriber, #57642) [Link]

> I was a bit disappointed that the (otherwise excellent as usual) article doesn't discuss the possibility of automated tests catching buggy changes like this before they are merged anywhere. The kernel is behind current best practices there.

I think the gap in the article hides a much more worrying gap. There's a long and interesting discussion in the middle of the comments about test setups and workflows with not a single link to .... test code. Is there any?

Automated tests

Posted Mar 9, 2021 21:33 UTC (Tue) by corbet (editor, #1) [Link]

One could certainly write an automated test to catch this, but it would not be easy. You'd need to set up a machine with a swap file, drive it into memory pressure with a lot of dirty anonymous pages, then somehow verify that none of the swap traffic went astray. That means comparing the entire block device (including free space) with what you expect it to be, or mapping out the swap file, picking the swap traffic out of a blktrace stream, and ensuring that each page goes to the right place.

Certainly doable, but this would not be a fast-running test.
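
The memory-pressure half of such a test is the easy part. Here is a minimal user-space sketch (the sizes are placeholders, and note that simply reading the pages back would not have caught this particular bug, since the reads go to the same wrong location as the writes; the expensive part is verifying the block device afterwards, as described above):

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    /*
     * Dirty more anonymous memory than the machine has RAM so that the kernel
     * must push pages out to the configured swap file, then fault everything
     * back in and check the contents.  This verifies the swap round trip;
     * catching the bug discussed here additionally requires checking that
     * nothing outside the swap file's extents was touched.
     */
    int main(void)
    {
    	long page = sysconf(_SC_PAGESIZE);
    	size_t npages = (size_t)1 << 18;	/* placeholder: size this above RAM */
    	unsigned char *buf = malloc(npages * page);
    	size_t i;

    	if (!buf) {
    		perror("malloc");
    		return 1;
    	}

    	for (i = 0; i < npages; i++)		/* dirty every page */
    		memset(buf + i * page, (int)(i & 0xff), page);

    	for (i = 0; i < npages; i++) {		/* fault them back in */
    		if (buf[i * page] != (unsigned char)(i & 0xff)) {
    			fprintf(stderr, "mismatch at page %zu\n", i);
    			return 1;
    		}
    	}
    	puts("swap round trip OK");
    	free(buf);
    	return 0;
    }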

Automated tests

Posted Mar 10, 2021 2:49 UTC (Wed) by roc (subscriber, #30627) [Link]

You could make it run pretty fast by having the test generate a virtual machine image that is just big enough, i.e. a minimal amount of memory and a minimal-sized block device. Lots of tests could potentially benefit from this.

You'd have to write block device verification code to check the free space and the contents of all files, but that code could be useful for detecting all kinds of bugs.

One thing about automated testing is that once you bite the bullet and start creating infrastructure for things that look hard to test, you make it easier to test all kinds of things and people are much more willing to write tests for all kinds of things as part of their normal development. So the sooner you create such infrastructure, the better.

Automated tests

Posted Mar 10, 2021 2:52 UTC (Wed) by Cyberax (✭ supporter ✭, #52523) [Link]

> One could certainly write an automated test to catch this, but it would not be easy.
I was one of the people responsible for getting EC2 instances to hibernate. We used files for hibernation and actually found that the kernel had been broken for YEARS with file hibernation (it required a reboot for the hibernation target setting to take effect).

We also had a test for this very issue. It created an EC2 instance with a small disk (~2 GB) and limited RAM (512 MB). The test program created a swap file and then filled the disk to capacity with pseudo-random numbers (by creating a file and writing to it). It then allocated enough pseudo-random data to swap out at least some of it.

Then hibernate, thaw, and checksum the disk and the data in RAM to check for corruption.

The tests ran in about 2 minutes.
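
A rough sketch of the fill-and-verify part of that approach (file name, size, and seed are placeholders, the size is assumed to be a multiple of the chunk size, and the real test also checksummed the data held in RAM across the hibernate/thaw cycle):

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    #define CHUNK 4096

    /*
     * Fill a file with output from a seeded PRNG, or re-read it and compare
     * against the same stream.  Fill before exercising swap or hibernation,
     * verify afterwards: any stray write into the file shows up as a
     * mismatch at a known offset.
     */
    static int fill_or_verify(const char *path, size_t bytes, unsigned seed, int verify)
    {
    	FILE *f = fopen(path, verify ? "rb" : "wb");
    	unsigned char expect[CHUNK], got[CHUNK];
    	size_t done, i;

    	if (!f) {
    		perror(path);
    		return -1;
    	}
    	srand(seed);
    	for (done = 0; done < bytes; done += CHUNK) {
    		for (i = 0; i < CHUNK; i++)
    			expect[i] = (unsigned char)rand();
    		if (verify) {
    			if (fread(got, 1, CHUNK, f) != CHUNK ||
    			    memcmp(got, expect, CHUNK) != 0) {
    				fprintf(stderr, "%s: corruption near byte %zu\n",
    					path, done);
    				fclose(f);
    				return -1;
    			}
    		} else if (fwrite(expect, 1, CHUNK, f) != CHUNK) {
    			perror("write");
    			fclose(f);
    			return -1;
    		}
    	}
    	fclose(f);
    	return 0;
    }

Call it once with verify=0 before the swap workload and again with verify=1 afterwards; the same seed reproduces the expected stream both times.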

Automated tests

Posted Mar 10, 2021 22:12 UTC (Wed) by sjj (subscriber, #2020) [Link]

Interesting, I never thought about hibernation in AWS. I haven’t thought about hibernation in years, since it was the unreliable thing you had to do on laptops of the day.

Curious what the use case for it in AWS is.

Hibernation

Posted Mar 10, 2021 22:18 UTC (Wed) by corbet (editor, #1) [Link]

That work, and the use case behind it, were discussed in this OSPM article from last May.

Automated tests

Posted Mar 10, 2021 6:42 UTC (Wed) by marcH (subscriber, #57642) [Link]

> You'd need to set up a machine with a swap file,

As this is an apparently common configuration, you'd expect anyone modifying swap code to grant that configuration some reasonable amount of test time.

> drive it into memory pressure with a lot of dirty anonymous pages,

Also called "swapping"?

> then somehow verify that none of the swap traffic went astray.

Unless you're re-installing your entire system every few tests, there's a good chance you will soon notice that something somewhere has gone terribly wrong, even when not verifying every byte on the disk. This is apparently how the bug was found, and relatively quickly, by people not even testing swap but other things. The perfect that does not get done is the enemy of the good that does, and this is especially true with chronically underdeveloped validation.

Automated tests

Posted Mar 11, 2021 19:56 UTC (Thu) by thumperward (guest, #34368) [Link]

For an integration test that would specifically have caught this issue, sure. But given the assumption that swap_page_sector() could be called on a swap file, a unit test that called swap_page_sector() on a swap file with a given input and verified that the file contained the right bytes in the right order afterwards is something that could well have existed and caught said bug before the refactoring inadvertently exposed it.


Copyright © 2021, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds