Array accesses using sun.misc.Unsafe cause data corruption or SIGSEGV

Sun Jul 12 11:54:55 UTC 2015

Non-reviewer here, but I'd extend the code comment to say *why* you don't
want to scale again.

Cheers,
Martijn

On 12 July 2015 at 11:29, Serkan Özal <serkan at hazelcast.com> wrote:

> Hi all,
>
> I have created a webrev containing the patch for review and made it
> publicly accessible here:
> https://s3.amazonaws.com/jdk-8087134/webrev.00/index.html
>
> Regards.
>
> On Sat, Jul 4, 2015 at 9:06 PM, Serkan Özal <serkan at hazelcast.com> wrote:
>
>> Hi,
>>
>> I have added some logging to show that the problem is caused by double
>> scaling of the offset (index).
>>
>> Here is my updated reproducer code (log messages added):
>>
>>
>> int count = 100000;
>> long size = count * 8L;
>> long baseAddress = unsafe.allocateMemory(size);
>> System.out.println("Start address: " + Long.toHexString(baseAddress) +
>>                    ", End address: " + Long.toHexString(baseAddress + size));
>>
>> for (int i = 0; i < count; i++) {
>>     long address = baseAddress + (i * 8L);
>>     System.out.println(
>>         "Normal: " + Long.toHexString(address) + ", " +
>>         "If double scaled: " + Long.toHexString(baseAddress + (i * 8L * 8L)));
>>     long expected = i;
>>     unsafe.putLong(address, expected);
>>     unsafe.getLong(address);
>> }
>>
>>
>> After some time it crashes with:
>>
>>
>> ...
>> Current thread (0x0000000002068800):  JavaThread "main" [_thread_in_Java,
>> id=10412, stack(0x00000000023f0000,0x00000000024f0000)]
>>
>> siginfo: ExceptionCode=0xc0000005, reading address 0x0000000059061020
>> ...
>> ...
>>
>>
>> And here is the output of the execution up to the crash:
>>
>> Start address: 58bbcfa0, End address: 58c804a0
>> Normal: 58bbcfa0, If double scaled: 58bbcfa0
>> Normal: 58bbcfa8, If double scaled: 58bbcfe0
>> Normal: 58bbcfb0, If double scaled: 58bbd020
>> ...
>> ...
>> Normal: 58c517b0, If double scaled: 59061020
>>
>>
>> As seen from the logs and the crash dump, the double-scaled version of the
>> target address (*If double scaled: 59061020*) is the same as the faulting
>> address (*siginfo: ExceptionCode=0xc0000005, reading address
>> 0x0000000059061020*) whose access causes the crash.
>>
>> So I think it is clear that the crash is caused by a wrong optimization
>> of the index value: the index is scaled twice (once for *Unsafe::put* and
>> once for *Unsafe::get*) instead of only once. The double-scaled index then
>> points to an invalid memory address.
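
The address arithmetic in the log above can be re-checked without touching real memory. The sketch below only re-derives the numbers already printed in this thread; the class and method names are hypothetical, and it assumes (as the reproducer does) 8-byte elements.

```java
// Re-derives the numbers from the log in this thread; no Unsafe calls involved.
final class DoubleScaleCheck {
    static final long ELEMENT_SIZE = 8L; // bytes per long, as in the reproducer

    // Loop index that corresponds to a correctly scaled ("Normal") address.
    static long indexOf(long baseAddress, long address) {
        return (address - baseAddress) / ELEMENT_SIZE;
    }

    // Address that results if the already scaled offset is scaled by 8 again.
    static long doubleScaled(long baseAddress, long index) {
        return baseAddress + index * ELEMENT_SIZE * ELEMENT_SIZE;
    }

    public static void main(String[] args) {
        long base = 0x58bbcfa0L;       // "Start address" from the log
        long end = 0x58c804a0L;        // "End address" from the log
        long lastNormal = 0x58c517b0L; // last "Normal" address before the crash
        long faulting = 0x59061020L;   // address reported in siginfo

        long index = indexOf(base, lastNormal);
        long bad = doubleScaled(base, index);

        System.out.println("index at crash: " + index);
        System.out.println("double scaled:  " + Long.toHexString(bad));
        System.out.println("matches siginfo: " + (bad == faulting));
        System.out.println("outside allocation: " + (bad >= end));
    }
}
```

With the logged values the last iteration is index 76034, and its double-scaled address equals the faulting address. Note also that with an 800000-byte block, a double-scaled offset (i * 64) already leaves the allocation from index 12500 onwards, which would fit the data corruption seen before the crash, if the double-scaling hypothesis holds.
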
>>
>> Regards.
>>
>> On Sun, Jun 14, 2015 at 2:39 PM, Serkan Özal <serkan at hazelcast.com>
>> wrote:
>>
>>> Hi all,
>>>
>>> I dug into the issue by going through the JDK HotSpot commits, and
>>> the issue first appeared after this commit: http://hg.openjdk.java.net/jdk8u/jdk8u/hotspot/rev/a60a1309a03a
>>>
>>> Then I added some additional logging to *"vm/c1/c1_Canonicalizer.cpp"*:
>>> void Canonicalizer::do_UnsafeGetRaw(UnsafeGetRaw* x) {
>>>   if (OptimizeUnsafes) do_UnsafeRawOp(x);
>>>   tty->print_cr("Canonicalizer: do_UnsafeGetRaw id %d: base = id %d, index = id %d, log2_scale = %d",
>>>                     x->id(), x->base()->id(), x->index()->id(), x->log2_scale());
>>> }
>>>
>>> void Canonicalizer::do_UnsafePutRaw(UnsafePutRaw* x) {
>>>   if (OptimizeUnsafes) do_UnsafeRawOp(x);
>>>   tty->print_cr("Canonicalizer: do_UnsafePutRaw id %d: base = id %d, index = id %d, log2_scale = %d",
>>>                     x->id(), x->base()->id(), x->index()->id(), x->log2_scale());
>>> }
>>>
>>>
>>> Then I ran the test, calculating the address as
>>> - *"int * long"* (the int is the index and the long is 8L)
>>> - *"long * long"* (the first long is the index and the second long is 8L)
>>> - *"int * int"* (the first int is the index and the second int is 8)
>>>
>>> Here are the logs:
>>> *int * long:*
>>> Canonicalizer: do_UnsafeGetRaw id 18: base = id 16, index = id 17, log2_scale = 0
>>> Canonicalizer: do_UnsafeGetRaw id 20: base = id 16, index = id 19, log2_scale = 0
>>> Canonicalizer: do_UnsafeGetRaw id 22: base = id 16, index = id 21, log2_scale = 0
>>> Canonicalizer: do_UnsafeGetRaw id 24: base = id 16, index = id 23, log2_scale = 0
>>> Canonicalizer: do_UnsafePutRaw id 33: base = id 13, index = id 27, log2_scale = 3
>>> Canonicalizer: do_UnsafeGetRaw id 36: base = id 13, index = id 27, log2_scale = 3
>>> *long * long:*
>>> Canonicalizer: do_UnsafeGetRaw id 18: base = id 16, index = id 17, log2_scale = 0
>>> Canonicalizer: do_UnsafeGetRaw id 20: base = id 16, index = id 19, log2_scale = 0
>>> Canonicalizer: do_UnsafeGetRaw id 22: base = id 16, index = id 21, log2_scale = 0
>>> Canonicalizer: do_UnsafeGetRaw id 24: base = id 16, index = id 23, log2_scale = 0
>>> Canonicalizer: do_UnsafePutRaw id 35: base = id 13, index = id 14, log2_scale = 3
>>> Canonicalizer: do_UnsafeGetRaw id 37: base = id 13, index = id 14, log2_scale = 3
>>> *int * int:*
>>> Canonicalizer: do_UnsafeGetRaw id 18: base = id 16, index = id 17, log2_scale = 0
>>> Canonicalizer: do_UnsafeGetRaw id 20: base = id 16, index = id 19, log2_scale = 0
>>> Canonicalizer: do_UnsafeGetRaw id 22: base = id 16, index = id 21, log2_scale = 0
>>> Canonicalizer: do_UnsafeGetRaw id 24: base = id 16, index = id 23, log2_scale = 0
>>> Canonicalizer: do_UnsafePutRaw id 33: base = id 13, index = id 29, log2_scale = 0
>>> Canonicalizer: do_UnsafeGetRaw id 36: base = id 13, index = id 29, log2_scale = 0
>>> Canonicalizer: do_UnsafePutRaw id 19: base = id 8, index = id 15, log2_scale = 0
>>> Canonicalizer: do_UnsafeGetRaw id 22: base = id 8, index = id 15, log2_scale = 0
>>>
>>> As you can see, in the problematic runs (*"int * long"* and *"long * long"*) scaling is applied twice:
>>> once for *"Unsafe.put"* and once for *"Unsafe.get"*, and both instructions point to
>>> the same *"base"* and *"index"* instructions.
>>> This means that the address is scaled one time too many, because there should be only one scaling.
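
To make the "scaled one time too many" reading concrete, here is a small model. It assumes (a simplification for illustration, not taken from the HotSpot sources) that a raw access computes its address as base + (index << log2_scale). The class and method names are hypothetical.

```java
// Toy model of scaled addressing; it does not reproduce HotSpot's actual code.
final class ScaledAddressModel {
    // Assumed effective address of a raw access: base + (index << log2Scale).
    static long effectiveAddress(long base, long index, int log2Scale) {
        return base + (index << log2Scale);
    }

    public static void main(String[] args) {
        long base = 0x1000L; // arbitrary illustrative base address
        long i = 5;          // arbitrary loop index

        // What the Java source asks for.
        long written = base + i * 8L;

        // Correct folding: the multiply by 8 moves into the scale,
        // and the index operand is the unscaled i.
        long folded = effectiveAddress(base, i, 3);

        // Faulty combination: the operand is still i * 8,
        // yet a scale of 3 is applied on top of it.
        long scaledTwice = effectiveAddress(base, i * 8L, 3);

        System.out.println("written:      " + Long.toHexString(written));
        System.out.println("folded:       " + Long.toHexString(folded));
        System.out.println("scaled twice: " + Long.toHexString(scaledTwice));
    }
}
```

In this model the folded form is only equivalent to the source expression when the index operand is the unscaled value; any path that keeps the multiplied operand and also records log2_scale = 3 ends up at base + i * 64.
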
>>>
>>>
>>> When I debugged the non-problematic run (*"int * int"*),
>>> I saw that *"instr->as_ArithmeticOp();"* always returns *"null"*, so the *"match_index_and_scale"* method always returns *"false"*.
>>> So there is no scaling.
>>> static bool match_index_and_scale(Instruction*  instr,
>>>                                   Instruction** index,
>>>                                   int*          log2_scale) {
>>>   ...
>>>
>>>   ArithmeticOp* arith = instr->as_ArithmeticOp();
>>>   if (arith != NULL) {
>>>      ...
>>>   }
>>>
>>>   return false;
>>> }
>>>
>>>
>>> Then I added my fix attempt, which prevents multiple scaling for Unsafe instructions that point to the same index instruction, like this:
>>> void Canonicalizer::do_UnsafeRawOp(UnsafeRawOp* x) {
>>>   Instruction* base = NULL;
>>>   Instruction* index = NULL;
>>>   int          log2_scale;
>>>
>>>   if (match(x, &base, &index, &log2_scale)) {
>>>     x->set_base(base);
>>>     x->set_index(index);
>>>     // The fix attempt here
>>>     // /////////////////////////////
>>>     if (index != NULL) {
>>>       if (index->is_pinned()) {
>>>         log2_scale = 0;
>>>       } else {
>>>         if (log2_scale != 0) {
>>>           index->pin();
>>>         }
>>>       }
>>>     }
>>>     // /////////////////////////////
>>>     x->set_log2_scale(log2_scale);
>>>     if (PrintUnsafeOptimization) {
>>>       tty->print_cr("Canonicalizer: UnsafeRawOp id %d: base = id %d, index = id %d, log2_scale = %d",
>>>                     x->id(), x->base()->id(), x->index()->id(), x->log2_scale());
>>>     }
>>>   }
>>> }
>>> In this fix attempt, if there is scaling for the Unsafe instruction, I pin the index instruction of that instruction,
>>> and on later calls, if the index instruction is already pinned, I assume that scaling has already been applied, so there is no need for another scaling.
>>>
>>> After this fix, I reran the problematic test (*"int * long"*) and it works, with these logs:
>>> *int * long (after fix):*
>>> Canonicalizer: do_UnsafeGetRaw id 18: base = id 16, index = id 17, log2_scale = 0
>>> Canonicalizer: do_UnsafeGetRaw id 20: base = id 16, index = id 19, log2_scale = 0
>>> Canonicalizer: do_UnsafeGetRaw id 22: base = id 16, index = id 21, log2_scale = 0
>>> Canonicalizer: do_UnsafeGetRaw id 24: base = id 16, index = id 23, log2_scale = 0
>>> Canonicalizer: do_UnsafePutRaw id 35: base = id 13, index = id 14, log2_scale = 3
>>> Canonicalizer: do_UnsafeGetRaw id 37: base = id 13, index = id 14, log2_scale = 0
>>> Canonicalizer: do_UnsafePutRaw id 21: base = id 8, index = id 11, log2_scale = 3
>>> Canonicalizer: do_UnsafeGetRaw id 23: base = id 8, index = id 11, log2_scale = 0
>>>
>>> I am not sure whether my fix attempt is a real fix, or whether there are better fixes.
>>>
>>> Regards.
>>>
>>> --
>>>
>>> Serkan ÖZAL
>>>
>>>
>>>> Btw (thanks to one of my colleagues), when the address calculation in the loop is
>>>> changed to
>>>> long address = baseAddress + (i * 8)
>>>> the test passes. The only difference is that the next long pointer is calculated using
>>>> integer 8 instead of long 8.
>>>> ```
>>>> for (int i = 0; i < count; i++) {
>>>>     long address = baseAddress + (i * 8); // <--- here, integer 8 instead of long 8
>>>>     long expected = i;
>>>>     unsafe.putLong(address, expected);
>>>>     long actual = unsafe.getLong(address);
>>>>     if (expected != actual) {
>>>>         throw new AssertionError("Expected: " + expected + ", Actual: " + actual);
>>>>     }
>>>> }
>>>> ```
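
For the iteration counts used in this thread, the two forms of the offset expression are numerically identical at the Java level, so the difference in behaviour would have to come from how the compiled code treats them rather than from language semantics. A minimal sketch to check that (class and method names are hypothetical):

```java
// Compares (i * 8) and (i * 8L) purely at the Java language level.
final class IntVsLongScale {
    // Counts indices in [0, count) where the two offset expressions differ.
    static int mismatches(int count) {
        int n = 0;
        for (int i = 0; i < count; i++) {
            long viaInt = i * 8;   // int multiply, result widened to long
            long viaLong = i * 8L; // i widened to long first, then long multiply
            if (viaInt != viaLong) {
                n++;
            }
        }
        return n;
    }

    // Smallest non-negative index at which the int product i * 8 overflows.
    static int firstOverflowIndex() {
        return Integer.MAX_VALUE / 8 + 1;
    }

    public static void main(String[] args) {
        System.out.println("mismatches for count=100000:  " + mismatches(100_000));
        System.out.println("mismatches for count=1000000: " + mismatches(1_000_000));
        System.out.println("int product first overflows at i = " + firstOverflowIndex());
    }
}
```

The int form only diverges once i * 8 overflows a 32-bit int, which is far beyond the 10000 to 1000000 iterations tested here.
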
>>>> On Tue, Jun 9, 2015 at 1:07 PM Mehmet Dogan <mehmet at hazelcast.com> wrote:
>>>> > Hi all,
>>>> >
>>>> > While I was testing my app using Java 8, I encountered the previously
>>>> > reported sun.misc.Unsafe issue.
>>>> >
>>>> > https://bugs.openjdk.java.net/browse/JDK-8076445
>>>> >
>>>> > http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2015-April/017685.html
>>>> >
>>>> > The issue status says it's resolved with resolution "Cannot Reproduce". But
>>>> > unfortunately it's still reproducible using "1.8.0_60-ea-b18" and
>>>> > "1.9.0-ea-b67".
>>>> >
>>>> > The test is very simple:
>>>> >
>>>> > ```
>>>> > public static void main(String[] args) throws Exception {
>>>> >     Unsafe unsafe = findUnsafe();
>>>> >     // 10000 pass
>>>> >     // 100000 jvm crash
>>>> >     // 1000000 fail
>>>> >     int count = 100000;
>>>> >     long size = count * 8L;
>>>> >     long baseAddress = unsafe.allocateMemory(size);
>>>> >
>>>> >     try {
>>>> >         for (int i = 0; i < count; i++) {
>>>> >             long address = baseAddress + (i * 8L);
>>>> >
>>>> >             long expected = i;
>>>> >             unsafe.putLong(address, expected);
>>>> >
>>>> >             long actual = unsafe.getLong(address);
>>>> >
>>>> >             if (expected != actual) {
>>>> >                 throw new AssertionError("Expected: " + expected + ", Actual: " + actual);
>>>> >             }
>>>> >         }
>>>> >     } finally {
>>>> >         unsafe.freeMemory(baseAddress);
>>>> >     }
>>>> > }
>>>> > ```
>>>> > It does not fail up to version 1.8.0.31; starting with 1.8.0.40 the test
>>>> > fails consistently.
>>>> >
>>>> > - With iteration count 10000, the test passes.
>>>> > - With iteration count 100000, the JVM crashes with SIGSEGV.
>>>> > - With iteration count 1000000, the test fails with AssertionError.
>>>> >
>>>> > When any one of compilation (-Xint), inlining (-XX:-Inline) or
>>>> > on-stack-replacement (-XX:-UseOnStackReplacement) is disabled, the test
>>>> > does not fail at all.
>>>> >
>>>> > I tested on these platforms:
>>>> > - Centos-7/openjdk-1.8.0.45
>>>> > - OSX/oraclejdk-1.8.0.40
>>>> > - OSX/oraclejdk-1.8.0.45
>>>> > - OSX/oraclejdk-1.8.0_60-ea-b18
>>>> > - OSX/oraclejdk-1.9.0-ea-b67
>>>> >
>>>> > A previous comment on the issue (
>>>> > https://bugs.openjdk.java.net/browse/JDK-8076445?focusedCommentId=13633043#comment-13633043)
>>>> > says "Cannot reproduce based on the latest version". I hope that "latest
>>>> > version" does not refer to '1.8.0_60-ea-b18' or '1.9.0-ea-b67', because
>>>> > both are failing.
>>>> >
>>>> > I'm looking forward to hearing from you.
>>>> >
>>>> > Thanks,
>>>> > -Mehmet Dogan-
>>>> > --
>>>> >
>>>> > @mmdogan
>>>> >
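
The reproducer above calls findUnsafe() without showing it. One common way to write such a helper is to read the private static theUnsafe field reflectively; the sketch below assumes that approach and is not necessarily what the original test used. The class name is hypothetical.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

// One possible findUnsafe(); the original helper is not shown in this thread.
final class UnsafeLookup {
    // Reads the singleton from the private static field Unsafe.theUnsafe.
    static Unsafe findUnsafe() {
        try {
            Field field = Unsafe.class.getDeclaredField("theUnsafe");
            field.setAccessible(true);
            return (Unsafe) field.get(null); // static field, so no receiver object
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("sun.misc.Unsafe is not accessible", e);
        }
    }

    public static void main(String[] args) {
        Unsafe unsafe = findUnsafe();
        System.out.println("Unsafe obtained: " + (unsafe != null));
    }
}
```

Compilers typically warn that sun.misc.Unsafe is an internal API; that warning is expected for this kind of test code.
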
>>>
>>>
>>> --
>>> Serkan ÖZAL
>>> Remotest Software Engineer
>>> GSM: +90 542 680 39 18
>>> Twitter: @serkan_ozal
>>>