Variable-length arrays and the max() mess

Benefits for LWN subscribers

The primary benefit from subscribing to LWN is helping to keep us publishing, but, beyond that, subscribers get immediate access to all site content and access to a number of extra site features. Please sign up today!

By Jonathan Corbet
March 12, 2018

Variable-length arrays (VLAs) have a non-constant size that is determined (and which can vary) at run time; they are supported by the ISO C99 standard. Use of VLAs in the kernel has long been discouraged but not prohibited, so there are naturally numerous VLA instances to be found. A recent push to remove VLAs from the kernel entirely has gained momentum, but it ran into an interesting snag on the way.

A VLA is simply an array that is declared with a non-constant dimension. For example, btree_merge() begins like this:

    int btree_merge(struct btree_head *target, struct btree_head *victim,
		    struct btree_geo *geo, gfp_t gfp)
    {
	unsigned long key[geo->keylen];
	unsigned long dup[geo->keylen];

The length of both the key and dup arrays is determined by the value stored in the keylen field of the passed-in geo structure. The compiler cannot know what that value will be at compile time, so those arrays must be allocated at run time. For this reason, VLAs must be automatic variables, allocated on the stack.

As an extension to the language, GCC also allows the use of VLAs within structures. The LLVM Clang developers have refused to add this extension, though; that has led developers interested in building the kernel with Clang to work to remove VLAs from structures in the kernel. C99 VLAs are supported by Clang, though, so developers working on Clang compatibility have not paid much attention to them.

Since VLAs are standard C, one might wonder why there is a desire to remove them. One reason is that they add a bit of run-time overhead, since the size of a VLA must be calculated every time that the function declaring it is called. But the bigger issue has to do with stack usage. Stacks in the kernel are small, so every kernel developer must be constantly aware of how much stack space each function is using. VLAs, since they are automatic variables whose size is determined at run time, add a degree of uncertainty to a function's stack usage. Changes in distant code might result in a VLA growing in surprising ways. If an attacker finds a way to influence the size of a VLA, the potential for all kinds of mischief arises.

For these reasons, Linus Torvalds recently declared that "using VLA's is actively bad not just for security worries, but simply because VLA's are a really horribly bad idea in general in the kernel". That added some energy to the VLA-removal work that was already underway. In the process, though, Kees Cook discovered an interesting surprise: a number of VLAs in the kernel are not actually variable in length and were never meant to be seen as such by the compiler.

Accidental VLAs and the difficult story of max()

A useful tool in the quest to remove VLAs from the kernel is the GCC -Wvla option, which issues warnings when VLAs are declared. Cook found, though, that it was issuing warnings for arrays that were meant to be of constant size; one of them was this bit of code from lib/vsprintf.c:

    #define RSRC_BUF_SIZE	((2 * sizeof(resource_size_t)) + 4)
    #define FLAG_BUF_SIZE	(2 * sizeof(res->flags))
    #define DECODED_BUF_SIZE	sizeof("[mem - 64bit pref window disabled]")
    #define RAW_BUF_SIZE	sizeof("[mem - flags 0x]")
	char sym[max(2*RSRC_BUF_SIZE + DECODED_BUF_SIZE,
		     2*RSRC_BUF_SIZE + FLAG_BUF_SIZE + RAW_BUF_SIZE)];

The length of sym is clearly constant and can be determined at compile time, but GCC warns that sym is a VLA anyway. The problem turns out to be the kernel's max() macro, which generates an expression that is not recognized as constant by the compiler.

If one looks on page 87 of the original edition of The C Programming Language by Kernighan and Ritchie, one will find the classic definition of the max() macro:

    #define max(A, B)  ((A) > (B) ? (A) : (B))

There are some problems with this definition that make it unsuitable for kernel use, including the double-evaluation of the arguments and the lack of any sort of type checking. So a lot of effort has gone into the kernel's special version of max():

    #define __max(t1, t2, max1, max2, x, y) ({		\
	t1 max1 = (x);					\
	t2 max2 = (y);					\
	(void) (&max1 == &max2);			\
	max1 > max2 ? max1 : max2; })

    #define max(x, y)					\
	__max(typeof(x), typeof(y),			\
	      __UNIQUE_ID(max1_), __UNIQUE_ID(max2_),	\
	      x, y)

For the curious, the outer max() macro uses __UNIQUE_ID() to generate two unique variable names. Those names, along with the types of the two values and the values themselves, are passed to __max(). That macro declares two new variables (using the unique names) and assigns the passed-in values to them; this is done to prevent x and y from being evaluated more than once. Pointers to those two variables are then compared while discarding the result; this is done to force the compiler to issue an error if the two operands do not have compatible types. Finally, the two values themselves are compared to determine which one is greater.

It was initially assumed that GCC was simply not smart enough to understand that the result of this whole mess was still constant, so it created a VLA instead. Torvalds eventually figured out the real problem, though: the C standard makes a distinction between a "constant value" and a "constant expression". Array dimensions are required to be constant expressions, but the max() macro does not qualify as such. As Torvalds pointed out, the warning from GCC is thus somewhat misleading; while the compiler is emitting the VLA code for these arrays, they are not actually variable in length, and the problem is more a matter of syntax.

Regardless of the reason for their creation, it clearly would be a good thing to get rid of these "accidental" VLAs. What followed was an effort to rewrite max(); it was the sort of quest that the kernel community is so good at mustering. At one point, Cook came up with this:

    #define __single_eval_max(t1, t2, max1, max2, x, y) ({	\
 	t1 max1 = (x);					\
 	t2 max2 = (y);					\
 	(void) (&max1 == &max2);			\
 	max1 > max2 ? max1 : max2; })

    #define __max(t1, t2, x, y)						\
	__builtin_choose_expr(__builtin_constant_p(x) &&		\
			      __builtin_constant_p(y),			\
			      (t1)(x) > (t2)(y) ? (t1)(x) : (t2)(y),	\
			      __single_eval_max(t1, t2,			\
						__UNIQUE_ID(max1_),	\
						__UNIQUE_ID(max2_),	\
						x, y))

    #define max(x, y)	__max(typeof(x), typeof(y), x, y)

Essentially, this version uses more macro trickery to short out much of the max() logic when the two operands are constants. It worked well — on recent compilers. The results turned out to not be so pleasing on older compilers, though. Initially there was some thought of revisiting the discussion on deprecating support for older compilers, but then 4.8.5 was reported to fail as well. At that point, Cook threw up his hands and gave up on creating a better max(). The solution going forward is likely to be what he suggested when he first discovered the problem: create a new SIMPLE_MAX() macro that is really just the original Kernighan and Ritchie max() with a different name. It is good enough for constant array dimensions.

Finishing the job

Now that this discussion has run its course, the process of eliminating all of the non-accidental VLAs can continue. There are just over 200 of them in the kernel; many of them are easily replaced with ordinary fixed-length arrays. Several developers have posted patches eliminating VLAs from various kernel subsystems. Getting rid of all of them will probably take some time; a few instances may require fairly deep analysis to determine the real maximum length and, perhaps, internal API changes. Eventually, though, we will get to a point where a -Wvla build generates no warnings.

Index entries for this article
Kernel	Variable-length arrays

(Log in to post comments)

Variable-length arrays and the max() mess

Posted Mar 12, 2018 21:50 UTC (Mon) by Sesse (subscriber, #53779) [Link]

Occasionally, C++ (in this case, with constexpr std::max) does seem like the more sane language. :-) Only occasionally, though.

Variable-length arrays and the max() mess

Posted Mar 13, 2018 7:38 UTC (Tue) by kccqzy (subscriber, #121854) [Link]

Really? I think the kernel developers want to additionally make sure the two expressions have the same type, and cause a compile-time error if they do not.

My C++ is quite rusty but here’s my version:

#include <type_traits>

template <typename T1, typename T2,
          typename = typename std::enable_if<
              std::is_same<typename std::decay<T1>::type,
                           typename std::decay<T2>::type>::value,
              T1>::type>
inline constexpr auto maxx(T1 a, T2 b) {
  return a > b ? a : b;
}

int test(void) {
  int a = maxx(12, 14);         // should compile
  int b = maxx(12ul, 14l);      // should fail
  int c[maxx(17, 19)];          // should compile
}

Variable-length arrays and the max() mess

Posted Mar 13, 2018 8:30 UTC (Tue) by Homer512 (subscriber, #85295) [Link]

I could be wrong but that seems unnecessary. std::max will already fail to compile if you give it two different integer types. You just have to define it with a single type for both both parameters.

template<class T>
constexpr T max(const T& left, const T& right);

Also works for pass-by-value.

Variable-length arrays and the max() mess

Posted Mar 13, 2018 8:32 UTC (Tue) by Sesse (subscriber, #53779) [Link]

Yes; it's defined with one template parameter only, so if you give it two different types, you'll either need to cast one or (better) say explicitly which max() you want, e.g. std::max<int>(a, b). See http://en.cppreference.com/w/cpp/algorithm/max.

Variable-length arrays and the max() mess

Posted Mar 16, 2018 4:22 UTC (Fri) by inphi (guest, #120554) [Link]

assuming C++, I wonder how much of the kernel can be constexpr'd and what would be the performance consequences.

Variable-length arrays and the max() mess

Posted Mar 22, 2018 19:10 UTC (Thu) by antnisp (guest, #123056) [Link]

Other than VLAs, what else precludes g++ from compiling the kernel?

Variable-length arrays and the max() mess

Posted Mar 22, 2018 20:53 UTC (Thu) by excors (subscriber, #95769) [Link]

https://lwn.net/Articles/734449/ has some relevant information.

Variable-length arrays and the max() mess

Posted Mar 12, 2018 22:44 UTC (Mon) by zblaxell (subscriber, #26385) [Link]

I might have missed this in the discussion, but surely that should be

#define __max(t1, t2, max1, max2, x, y) ({ \
const t1 max1 = (x); /* add const here */ \
const t2 max2 = (y); \
(void) (&max1 == &max2); \
max1 > max2 ? max1 : max2; })

#define max(x, y) \
__max(typeof(x), typeof(y), \
__UNIQUE_ID(max1_), __UNIQUE_ID(max2_), \
x, y)

otherwise &max1 is a pointer to non-const t1, and who knows what side-effects that has. (OK, you and I know, but the compiler may not).

Variable-length arrays and the max() mess

Posted Mar 14, 2018 14:51 UTC (Wed) by fweimer (guest, #111581) [Link]

Not sure what your point is, but const variables are not constant expressions in C. This creates a VLA:

const int count = 5;
int array[count];

Variable-length arrays and the max() mess

Posted Mar 14, 2018 17:50 UTC (Wed) by zblaxell (subscriber, #26385) [Link]

The original version, annotated:

#define __max(t1, t2, max1, max2, x, y) ({	\
    t1 max1 = (x);	\    // max1 is a non-const t1
    t2 max2 = (y);	\    // max2 is a non-const t2
    (void) (&max1 == &max2);	\
        // compute address of a non-const t1
        // compute address of a non-const t2
        // compare addresses
        // don't emit code for the last three lines
        // see questions below
    max1 > max2 ? max1 : max2; })

At what point do the compiler's pointer-aliasing optimization rules kick in? Does the compiler realize the pointers are not used, or must it assume that because these pointers exist, that max1 and max2 might be modified before the comparison? Does the & operator force the compiler to instantiate max1 and max2, then later elide the code but not forget the side-effect? Does even asking this question cause the compiler to be unable to reduce this to a compile-time constant, and turn this into a VLA instead?

Variable-length arrays and the max() mess

Posted Mar 13, 2018 2:26 UTC (Tue) by nevyn (subscriber, #33129) [Link]

Anyone know why they don't use the new magic max() on newer compilers, but continue letting older ones generate "VLAs"?

Variable-length arrays and the max() mess

Posted Mar 13, 2018 11:24 UTC (Tue) by anselm (subscriber, #2796) [Link]

At that point, Cook threw up his hands and gave up on creating a better max().

Does that mean Miracle max() is mostly dead?

Variable-length arrays and the max() mess

Posted Mar 14, 2018 18:57 UTC (Wed) by tmattox (subscriber, #4169) [Link]

LOL! Love that movie!

Variable-length arrays and the max() mess

Posted Mar 13, 2018 13:20 UTC (Tue) by epa (subscriber, #39769) [Link]

Do variable-length arrays have to go on the stack? Could the compiler generate code to call some allocation function, and then some free function when leaving the stack frame? These would have to be specified by the programmer somehow as compiler options, so it would be un-C-like, but might give a way to have the convenient syntax of VLAs while doing memory management the way kernel programmers like.

VLAs on the heap

Posted Mar 13, 2018 14:47 UTC (Tue) by corbet (editor, #1) [Link]

The kernel community tends to be less than thrilled about that kind of behind-the-curtain magic; it obscures things in a setting where developers really need to know exactly what is happening. There's also the little issue of figuring out which GFP flags to use. So I don't think I see anything like this happening.

Variable-length arrays and the max() mess

Posted Mar 13, 2018 14:50 UTC (Tue) by comio (subscriber, #115526) [Link]

if you needn't why you should use? 90% of cases you can use fixed size vectors. About 10% you can find a solid way avoiding vla. Why do kernel's devs enable warning on VLA by default? this will reduce the usage in short time avoiding wrong coding style in the future.

Variable-length arrays and the max() mess

Posted Mar 13, 2018 14:53 UTC (Tue) by comio (subscriber, #115526) [Link]

I killed royal majesty the queen of UK with the last post.

If you needn't why should you use? In 90% of cases you can use fixed size vectors instead. About 10%, you can find a solid way avoiding vla. Why don't kernel's devs enable warning on VLA by default? this will reduce the usage in short time avoiding wrong coding style in the future.

Variable-length arrays and the max() mess

Posted Mar 13, 2018 16:52 UTC (Tue) by Paf (subscriber, #91811) [Link]

I suspect they *will* enable warn on VLA by default, once they've sorted removing all the existing ones. And I also suspect that some part of the automatic building/testing infrastructure of the kernel will be tweaked to fail when VLA warnings are found.

Variable-length arrays and the max() mess

Posted Mar 17, 2018 21:43 UTC (Sat) by gerdesj (subscriber, #5446) [Link]

Missing out two letters and an apostrophe is hardly regicide!

However, if you are going to pick on a monarch, you ought to be aware that that particular Queen is multiple monarchs at the same time. You will also have attacked the Queen of Australia, New Zealand, Canada, Jamaica and several others. Each of those Queens is a separate person who happens to be the same physical individual. I'm not sure how you could have bumped off just one of them. Anyway, why did you pick on ours?

God bless her and all who sail in her!

Variable-length arrays and the max() mess

Posted Mar 13, 2018 16:10 UTC (Tue) by mb (subscriber, #50428) [Link]

What if the allocation fails?

Variable-length arrays and the max() mess

Posted Mar 13, 2018 17:29 UTC (Tue) by BenHutchings (subscriber, #37955) [Link]

Then BUG(). Which is certainly no worse than the result of a stack allocation failure (i.e. stack overflow).

Variable-length arrays and the max() mess

Posted Mar 14, 2018 17:33 UTC (Wed) by mkatiyar (guest, #75286) [Link]

> doing memory management the way kernel programmers like.

unfortunately ppl do like to do crazy things like.

foo(int x) {
int vla[x];
int y;

// size of x =
printf("%d\n", (&y - &vla[0])/sizeof(int *));
....
}

This clearly won't work if vla is not on stack.

Variable-length arrays and the max() mess

Posted Mar 14, 2018 18:15 UTC (Wed) by dtlin (subscriber, #36537) [Link]

That's UB with or without VLAs. Pointer arithmetic is only defined within the same object (or one past the end of an array).

Variable-length arrays and the max() mess

Posted Mar 22, 2018 13:13 UTC (Thu) by HelloWorld (guest, #56129) [Link]

By the way, is it allowed to have a pointer that is one past the end of a variable that isn't an array? Something like this:

void print_ints(const int *begin, const int *end) {
  for (; begin < end; ++begin) {
    printf("%d\n", *begin);
  }
}

int main() {
  int i = 42;
  print_ints(&i, &i + 1);
  return 0;
}

Variable-length arrays and the max() mess

Posted Mar 22, 2018 13:17 UTC (Thu) by HelloWorld (guest, #56129) [Link]

When compiling this with -fsanitize=address, it doesn't complain, so it's probably OK… but still, i isn't technically an array, right?

Variable-length arrays and the max() mess

Posted Mar 22, 2018 15:08 UTC (Thu) by excors (subscriber, #95769) [Link]

http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf says "For the purposes of these operators ['+' and '-'], a pointer to an object that is not an element of an array behaves the same as a pointer to the first element of an array of length one with the type of the object as its element type.", immediately before defining the rules for pointer arithmetic (where it's undefined behaviour unless "both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object"). So it sounds to me like your code is fine.

Variable-length arrays and the max() mess

Posted Apr 12, 2018 23:24 UTC (Thu) by HelloWorld (guest, #56129) [Link]

Ah right, that is the part of the spec I was looking for, thanks!

Variable-length arrays and the max() mess

Posted Mar 14, 2018 8:04 UTC (Wed) by flewellyn (subscriber, #5047) [Link]

All of this macrology...why not an inline function for max()?

Variable-length arrays and the max() mess

Posted Mar 14, 2018 10:06 UTC (Wed) by Koral (guest, #115236) [Link]

When a function is declared the type of the arguments are specified.
max macro is type agnostic.

Variable-length arrays and the max() mess

Posted Mar 14, 2018 17:35 UTC (Wed) by mkatiyar (guest, #75286) [Link]

Inline functions are not compile time...isn't it ? So you can't declare global variables using max.