RFR(L): 8180032: Unaligned pointer dereference in ClassFileParser

Tue May 16 23:11:50 UTC 2017

I am concern about using libc call memcpy() for few bytes coping. Mikael says that he verified that it is intrinsified by most C++ compilers. What about Solaris C++ which is less advance then gcc? 
Also what happens in fasdebug JVM?

May we should also put TODO comments into bytes_aarch64.hpp and bytes_s390.hpp. Or file bugs.

Thanks,
Vladimir

On 5/15/17 11:34 AM, Mikael Vidstedt wrote:
>
> New webrevs:
>
> full: http://cr.openjdk.java.net/~mikael/webrevs/8180032/webrev.01/hotspot/webrev/
> incremental (from webrev.00): http://cr.openjdk.java.net/~mikael/webrevs/8180032/webrev.01.incr/hotspot/webrev/
>
> I definitely want a second reviewer of this, and I’m (still) taking suggestions on tests to run etc.!
>
> Also, comments inline below..
>
>> On May 9, 2017, at 5:12 PM, David Holmes <david.holmes at oracle.com> wrote:
>>
>> Hi Mikael,
>>
>> On 10/05/2017 8:40 AM, Mikael Vidstedt wrote:
>>>
>>> Warning: It may be wise to stock up on coffee or tea before reading this.
>>>
>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8180032
>>> Webrev: http://cr.openjdk.java.net/~mikael/webrevs/8180032/webrev.00/hotspot/webrev/ <http://cr.openjdk.java.net/~mikael/webrevs/8180032/webrev.00/hotspot/webrev/>
>>
>> Overall this looks good to me. I like the refactoring from Bytes to Endian - that simplifies things a lot I think.
>
> Thanks, I like it too ;)
>
>> The actual "copy/swap" changes and templates I'm not an expert on but I get the gist and they seemed okay.
>>
>> Was a little unsure about all the changes to void* from u2*/u1* in classFileParser.h/cpp - does that just simplify use of the copy/swap code? Though I see some casts to u2* are no longer necessary as well.
>
> Right, I could go either way here, but when I personally see a u2* I feel tempted to just dereference it, and that’s not valid in these cases. With void* it’s more obvious that you need to do something else to get the data, but the width/type of the underlying data is lost. It may make sense to introduce a few helpful typedefs of void* which include in their names the underlying data type, or something along those lines.
>
> I suggest wrapping up and pushing what I have and working on improving the story here as a separate change. Reasonable?
>
>> A couple of oddities I noticed:
>>
>> src/share/vm/classfile/classFileStream.hpp
>>
>> Without the get_u2_buffer/get_u1_buffer distinction get_u1_buffer seems superfluous and all uses can be replaced by the existing current() accessor.
>
> Good point, get_u1_buffer can be removed (and it’s gone in the new webrev).
>
>>
>> ---
>>
>> src/share/vm/classfile/classFileParser.cpp
>>
>> We do we have void* here:
>>
>> 1707   const void* const exception_table_start = cfs->get_u1_buffer();
>>
>> but u1* here:
>>
>> 1845   const u1* const localvariable_table_start = cfs->get_u1_buffer();
>
> Good catch. Changed to void*.
>
> Cheers,
> Mikael
>
>>
>> Thanks,
>> David
>>
>>> * Background (from the JBS description)
>>>
>>> x86 is normally very forgiving when it comes to dereferencing unaligned pointers. Even if the pointer isn’t aligned on the natural size of the element being accessed, the hardware will do the Right Thing(tm) and nobody gets hurt. However, turns out there are exceptions to this. Specifically, SSA2 introduced the movdqa instruction, which is a 128-bit load/store which *does* require that the pointer is 128-bit aligned.
>>>
>>> Normally this isn’t a problem, because after all we don’t typically use 128-bit data types in our C/C++ code. However, just because we don’t use any such data types explicitly, there’s nothing preventing the C compiler to do so under the covers. Specifically, the C compiler tends to do loop unrolling and vectorization, which can turn pretty much any data access into vectorized SIMD accesses.
>>>
>>> We’ve actually run into a variation on this exact same problem a while back when upgrading to gcc 4.9.2. That time the problem (as captured in JDK-8141491) was in nio/Bits.c, and it was fixed by moving the copy functionality into hotspot (copy.[ch]pp), making sure the copy logic does the relevant alignment checks etc.
>>>
>>> This time the problem is with ClassFileParser. Or more accurately, it’s in the methods ClassFileParser makes use of. Specifically, the problem is with the copy_u2_with_conversion method, used to copy various data from the class file and put it in the “native” endian order in memory. It, in turn, uses Bytes::get_Java_u2 to read and potentially byte swap a 16-bit entry from the class file. bytes_x86.hpp has this to say about its implementation:
>>>
>>> // Efficient reading and writing of unaligned unsigned data in platform-specific byte ordering
>>> // (no special code is needed since x86 CPUs can access unaligned data)
>>>
>>> While that is /almost/ always true for the x86 architecture in itself, the C standard still expects accesses to be aligned, and the C compiler is free to make use of that expectation to, for example, vectorize operations and make use of the movdqa instruction.
>>>
>>> I noticed this when working on the portola/musl port, and in that environment the bug is tickled immediately. Why is it only a problem with musl? Turns out it sorta isn’t. It seems that this isn't an actual problem on any of the current platforms/toolchains we're using, but it's a latent bug which may be triggered at any point.
>>>
>>> bytes_x86.hpp will, in the end, actually use system library functions to do byte swapping. Specifically, on linux_x86 it will come down to bytes_linux_x86.inline.hpp which, on AMD64, uses the system <byteswap.h> functions/macros swap_u{16,32,64} to do the actual byte swapping. Now here’s the “funny” & interesting part:
>>>
>>> With glibc, the swap_u{16,32,64} methods are implemented using inline assembly - in the end it comes down to an inline rotate “rorw” instruction. Since GCC can’t see through the inline assembly, it will not realize that there are loop unrolling/vectorization opportunities, and of specific interest to us: the movdqa instruction will not be used. The code will potentially not be as efficient as it could be, but it will be functional.
>>>
>>> With musl, the swap methods are instead implemented as normal macros, shifting bits around to achieve the desired effect. GCC recognizes the bit shifting patterns, will realize that it’s just byte swapping a bunch of values, will vectorize the loop, and *will* make use of the movdqa instruction. Kaboom.
>>>
>>> To recap: dereferencing unaligned pointers in C/C++ is a no-no, even in cases where you think it should be okay. With the existing compilers and header files we are not currently running into this problem, but even a small change in the byte swap implementation exposes the problem.
>>>
>>>
>>>
>>> * About the change
>>>
>>> The key changes are in three different areas:
>>>
>>> 1. copy.[ch]pp
>>>
>>> Introducing: conjoint_swap_if_needed
>>>
>>> conjoint_swap_if_needed copies data, and bytes wap it on-the-fly if the specified endianness differs from the native/CPU endianness. It does this by either delegating to conjoint_swap (on endian mismatch), or conjoint_copy (on match). In copy.cpp, the changes all boil down to making the innermost do_conjoint_swap method more flexible so that it can be reused for both cases (straight copy as well as copy+swap).
>>>
>>> 2. classFile{Parser,Stream}
>>>
>>> The key change is in classFileParser.cpp, switching to copying data from the class file using the new conjoint_swap_if_needed method, replacing the loop implemented in copy_u2_with_conversion/Bytes::get_Java_u2.
>>>
>>> However, in addition to that change, I noticed that there are a lot of u2* passed around in the code, pointers which are not necessarily 16-bit aligned. While there’s nothing wrong with *having* an unaligned pointer in C - as long as it’s not dereferenced everything is peachy - it made me uneasy to see it passed around and used the way it is. Specifically, ClassFileStream::get_u2_buffer() could, to the untrained eye, be a bit misleading. One could accidentally and incorrectly assume that the returned pointer is, in fact, 16-bit aligned and start dereferencing it directly, where in fact there is no such guarantee. Perhaps even use it as an array and attract the wrath of the C compiler.
>>>
>>> Changing to void* may or may not be the right thing to do here. In a way I’d actually like to “carry” the type information, but in some way still prevent the pointer from being directly dereferenced. Taking suggestions.
>>>
>>>
>>> 3. bytes_x86.hpp
>>>
>>> This is addressing the wider concern that other parts of hotspot may use the same primitives in much the same (potentially broken) way, and in particular the fact that the get/put primitives aren’t checking whether the pointer argument is aligned before they dereference it. It may well be that a simple assert or two would do the trick here. That said:
>>>
>>> It turns out that the various platforms all have their own unique ways of implementing bytes.hpp, duplicating some logic which could/should be platform independent. I tried to clean up and unify it all a bit while at it by introducing an Endian helper class in bytes.hpp. The primitives for accessing data in memory now check for alignment and either perform the raw memory access (when the pointer is aligned), or does a memcpy (if unaligned). There’s some template “magic” in there avoid duplicating code, but hopefully the magic is relatively straightforward.
>>>
>>>
>>> * Testing
>>>
>>> I’ve run some basic testing on this, but I’m very much looking for advice on which tests to run. Let me know if you have any suggestions!
>>>
>>> Cheers,
>>> Mikael
>>>
>