RFR(L): 8180032: Unaligned pointer dereference in ClassFileParser

Tue May 9 22:40:34 UTC 2017

Warning: It may be wise to stock up on coffee or tea before reading this.

Bug: https://bugs.openjdk.java.net/browse/JDK-8180032
Webrev: http://cr.openjdk.java.net/~mikael/webrevs/8180032/webrev.00/hotspot/webrev/ <http://cr.openjdk.java.net/~mikael/webrevs/8180032/webrev.00/hotspot/webrev/>

* Background (from the JBS description)

x86 is normally very forgiving when it comes to dereferencing unaligned pointers. Even if the pointer isn’t aligned on the natural size of the element being accessed, the hardware will do the Right Thing(tm) and nobody gets hurt. However, turns out there are exceptions to this. Specifically, SSA2 introduced the movdqa instruction, which is a 128-bit load/store which *does* require that the pointer is 128-bit aligned.

Normally this isn’t a problem, because after all we don’t typically use 128-bit data types in our C/C++ code. However, just because we don’t use any such data types explicitly, there’s nothing preventing the C compiler to do so under the covers. Specifically, the C compiler tends to do loop unrolling and vectorization, which can turn pretty much any data access into vectorized SIMD accesses.

We’ve actually run into a variation on this exact same problem a while back when upgrading to gcc 4.9.2. That time the problem (as captured in JDK-8141491) was in nio/Bits.c, and it was fixed by moving the copy functionality into hotspot (copy.[ch]pp), making sure the copy logic does the relevant alignment checks etc.

This time the problem is with ClassFileParser. Or more accurately, it’s in the methods ClassFileParser makes use of. Specifically, the problem is with the copy_u2_with_conversion method, used to copy various data from the class file and put it in the “native” endian order in memory. It, in turn, uses Bytes::get_Java_u2 to read and potentially byte swap a 16-bit entry from the class file. bytes_x86.hpp has this to say about its implementation:

// Efficient reading and writing of unaligned unsigned data in platform-specific byte ordering
// (no special code is needed since x86 CPUs can access unaligned data)

While that is /almost/ always true for the x86 architecture in itself, the C standard still expects accesses to be aligned, and the C compiler is free to make use of that expectation to, for example, vectorize operations and make use of the movdqa instruction.

I noticed this when working on the portola/musl port, and in that environment the bug is tickled immediately. Why is it only a problem with musl? Turns out it sorta isn’t. It seems that this isn't an actual problem on any of the current platforms/toolchains we're using, but it's a latent bug which may be triggered at any point.

bytes_x86.hpp will, in the end, actually use system library functions to do byte swapping. Specifically, on linux_x86 it will come down to bytes_linux_x86.inline.hpp which, on AMD64, uses the system <byteswap.h> functions/macros swap_u{16,32,64} to do the actual byte swapping. Now here’s the “funny” & interesting part:

With glibc, the swap_u{16,32,64} methods are implemented using inline assembly - in the end it comes down to an inline rotate “rorw” instruction. Since GCC can’t see through the inline assembly, it will not realize that there are loop unrolling/vectorization opportunities, and of specific interest to us: the movdqa instruction will not be used. The code will potentially not be as efficient as it could be, but it will be functional.

With musl, the swap methods are instead implemented as normal macros, shifting bits around to achieve the desired effect. GCC recognizes the bit shifting patterns, will realize that it’s just byte swapping a bunch of values, will vectorize the loop, and *will* make use of the movdqa instruction. Kaboom.

To recap: dereferencing unaligned pointers in C/C++ is a no-no, even in cases where you think it should be okay. With the existing compilers and header files we are not currently running into this problem, but even a small change in the byte swap implementation exposes the problem.

* About the change

The key changes are in three different areas:

1. copy.[ch]pp

Introducing: conjoint_swap_if_needed

conjoint_swap_if_needed copies data, and bytes wap it on-the-fly if the specified endianness differs from the native/CPU endianness. It does this by either delegating to conjoint_swap (on endian mismatch), or conjoint_copy (on match). In copy.cpp, the changes all boil down to making the innermost do_conjoint_swap method more flexible so that it can be reused for both cases (straight copy as well as copy+swap).

2. classFile{Parser,Stream}

The key change is in classFileParser.cpp, switching to copying data from the class file using the new conjoint_swap_if_needed method, replacing the loop implemented in copy_u2_with_conversion/Bytes::get_Java_u2.

However, in addition to that change, I noticed that there are a lot of u2* passed around in the code, pointers which are not necessarily 16-bit aligned. While there’s nothing wrong with *having* an unaligned pointer in C - as long as it’s not dereferenced everything is peachy - it made me uneasy to see it passed around and used the way it is. Specifically, ClassFileStream::get_u2_buffer() could, to the untrained eye, be a bit misleading. One could accidentally and incorrectly assume that the returned pointer is, in fact, 16-bit aligned and start dereferencing it directly, where in fact there is no such guarantee. Perhaps even use it as an array and attract the wrath of the C compiler.

Changing to void* may or may not be the right thing to do here. In a way I’d actually like to “carry” the type information, but in some way still prevent the pointer from being directly dereferenced. Taking suggestions.

3. bytes_x86.hpp

This is addressing the wider concern that other parts of hotspot may use the same primitives in much the same (potentially broken) way, and in particular the fact that the get/put primitives aren’t checking whether the pointer argument is aligned before they dereference it. It may well be that a simple assert or two would do the trick here. That said:

It turns out that the various platforms all have their own unique ways of implementing bytes.hpp, duplicating some logic which could/should be platform independent. I tried to clean up and unify it all a bit while at it by introducing an Endian helper class in bytes.hpp. The primitives for accessing data in memory now check for alignment and either perform the raw memory access (when the pointer is aligned), or does a memcpy (if unaligned). There’s some template “magic” in there avoid duplicating code, but hopefully the magic is relatively straightforward.

* Testing

I’ve run some basic testing on this, but I’m very much looking for advice on which tests to run. Let me know if you have any suggestions!

Cheers,
Mikael