Array accesses using sun.misc.Unsafe cause data corruption or SIGSEGV

Fri Jul 17 20:27:31 UTC 2015

The fix is in JDK 8u51, which was released a few days ago:

http://www.oracle.com/technetwork/java/javase/8u51-relnotes-2587590.html

Regards,
Vladimir

On 7/17/15 12:49 PM, Serkan Özal wrote:
> Hi John,
>
> Yes, I have applied your fix and it works.
> Thanks!
>
> Which JDK version will include this patch?
>
> Regards.
>
> On Fri, Jul 17, 2015 at 10:31 PM, John Rose <john.r.rose at oracle.com> wrote:
>
>     Thanks Serkan and Martijn for reporting and analyzing this.
>
>     We had a very similar bug reported internally, and we just
>     integrated a fix:
>     http://hg.openjdk.java.net/jdk8u/jdk8u/hotspot/rev/3816de51b5e7
>
>     Would you mind checking if it fixes your problem also?
>
>     Best wishes,
>     — John
>
>     On Jul 12, 2015, at 5:07 AM, Serkan Özal <serkan at hazelcast.com> wrote:
>>
>>     Hi Martijn,
>>
>>     Thanks for your interest and for your comment, which brought
>>     this thread back to life.
>>
>>
>>     From my previous message
>>     (http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2015-June/018221.html):
>>
>>         I added some additional logs to *"vm/c1/c1_Canonicalizer.cpp"*:
>>
>>         void Canonicalizer::do_UnsafeGetRaw(UnsafeGetRaw* x) {
>>           if (OptimizeUnsafes) do_UnsafeRawOp(x);
>>           tty->print_cr("Canonicalizer: do_UnsafeGetRaw id %d: base = id %d, index = id %d, log2_scale = %d",
>>                         x->id(), x->base()->id(), x->index()->id(), x->log2_scale());
>>         }
>>
>>         void Canonicalizer::do_UnsafePutRaw(UnsafePutRaw* x) {
>>           if (OptimizeUnsafes) do_UnsafeRawOp(x);
>>           tty->print_cr("Canonicalizer: do_UnsafePutRaw id %d: base = id %d, index = id %d, log2_scale = %d",
>>                         x->id(), x->base()->id(), x->index()->id(), x->log2_scale());
>>         }
>>
>>
>>
>>         So I ran the test with the address calculated as:
>>
>>         - *"int * long"* (int is the index and long is 8L)
>>
>>         - *"long * long"* (the first long is the index and the second
>>         long is 8L)
>>
>>         - *"int * int"* (the first int is the index and the second int is 8)
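For reference, the three variants can be written out in Java as below. This is only a sketch of the source-level expressions: the class name, method names and the base value are illustrative and do not come from the thread. It shows that the three expressions produce the same address over the index range used in the test, and it says nothing about how C1 compiles them.

```java
// Sketch: the three address expressions discussed in the thread.
// AddressVariants and its methods are hypothetical names used for illustration.
final class AddressVariants {
    // "int * long": the int index is widened to long, then multiplied by 8L.
    static long intTimesLong(long base, int i) {
        return base + (i * 8L);
    }

    // "long * long": both operands of the multiply are already long.
    static long longTimesLong(long base, long i) {
        return base + (i * 8L);
    }

    // "int * int": the multiply is done in 32-bit arithmetic and the product
    // is widened to long only for the addition.
    static long intTimesInt(long base, int i) {
        return base + (i * 8);
    }

    // True if all three variants agree for every index in [0, count).
    static boolean allAgree(long base, int count) {
        for (int i = 0; i < count; i++) {
            long a = intTimesLong(base, i);
            if (a != longTimesLong(base, i) || a != intTimesInt(base, i)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        long base = 0x1000L; // arbitrary base chosen for the illustration
        System.out.println("all variants agree: " + allAgree(base, 100000));
    }
}
```

Because the values are identical at the source level for these indices, any difference in behaviour between the runs has to come from how the expressions are compiled, which is what the logs are meant to expose.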
>>
>>         Here are the logs:
>>
>>
>>         *int * long:*
>>
>>         Canonicalizer: do_UnsafeGetRaw id 18: base = id 16, index = id 17, log2_scale = 0
>>         Canonicalizer: do_UnsafeGetRaw id 20: base = id 16, index = id 19, log2_scale = 0
>>         Canonicalizer: do_UnsafeGetRaw id 22: base = id 16, index = id 21, log2_scale = 0
>>         Canonicalizer: do_UnsafeGetRaw id 24: base = id 16, index = id 23, log2_scale = 0
>>         Canonicalizer: do_UnsafePutRaw id 33: base = id 13, index = id 27, log2_scale = 3
>>         Canonicalizer: do_UnsafeGetRaw id 36: base = id 13, index = id 27, log2_scale = 3
>>
>>         *long * long:*
>>
>>         Canonicalizer: do_UnsafeGetRaw id 18: base = id 16, index = id 17, log2_scale = 0
>>         Canonicalizer: do_UnsafeGetRaw id 20: base = id 16, index = id 19, log2_scale = 0
>>         Canonicalizer: do_UnsafeGetRaw id 22: base = id 16, index = id 21, log2_scale = 0
>>         Canonicalizer: do_UnsafeGetRaw id 24: base = id 16, index = id 23, log2_scale = 0
>>         Canonicalizer: do_UnsafePutRaw id 35: base = id 13, index = id 14, log2_scale = 3
>>         Canonicalizer: do_UnsafeGetRaw id 37: base = id 13, index = id 14, log2_scale = 3
>>
>>         *int * int:*
>>
>>         Canonicalizer: do_UnsafeGetRaw id 18: base = id 16, index = id 17, log2_scale = 0
>>         Canonicalizer: do_UnsafeGetRaw id 20: base = id 16, index = id 19, log2_scale = 0
>>         Canonicalizer: do_UnsafeGetRaw id 22: base = id 16, index = id 21, log2_scale = 0
>>         Canonicalizer: do_UnsafeGetRaw id 24: base = id 16, index = id 23, log2_scale = 0
>>         Canonicalizer: do_UnsafePutRaw id 33: base = id 13, index = id 29, log2_scale = 0
>>         Canonicalizer: do_UnsafeGetRaw id 36: base = id 13, index = id 29, log2_scale = 0
>>         Canonicalizer: do_UnsafePutRaw id 19: base = id 8, index = id 15, log2_scale = 0
>>         Canonicalizer: do_UnsafeGetRaw id 22: base = id 8, index = id 15, log2_scale = 0
>>
>>
>>         As you can see, in the problematic runs (*"int * long"* and
>>         *"long * long"*) scaling is applied twice: once for
>>         *"Unsafe.put"* and once for *"Unsafe.get"*, and both
>>         instructions point to the same *"base"* and *"index"*
>>         instructions. This means that the address is scaled one extra
>>         time, although it should be scaled only once.
>>
>>
>>
>>     With this fix (or fix attempt, since I am not 100% sure that it is
>>     the right or optimal way), I prevent multiple scaling on the
>>     same index instruction.
>>
>>     Also, one of my previous messages
>>     (http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2015-July/018383.html)
>>     shows that the index is scaled multiple times; once that happens,
>>     the resulting address can point anywhere in memory.
>>
>>     On Sun, Jul 12, 2015 at 2:54 PM, Martijn Verburg <martijnverburg at gmail.com> wrote:
>>
>>         Non reviewer here, but I'd add to the comment *why* you don't
>>         want to scale again.
>>
>>         Cheers,
>>         Martijn
>>
>>         On 12 July 2015 at 11:29, Serkan Özal <serkan at hazelcast.com> wrote:
>>
>>             Hi all,
>>
>>             I have created a webrev containing the patch for review
>>             and made it publicly available here:
>>             https://s3.amazonaws.com/jdk-8087134/webrev.00/index.html
>>
>>             Regards.
>>
>>             On Sat, Jul 4, 2015 at 9:06 PM, Serkan Özal <serkan at hazelcast.com> wrote:
>>
>>                 Hi,
>>
>>                 I have added some logs to show that the problem is caused
>>                 by double scaling of the offset (index).
>>
>>                 Here is my updated reproducer code (with log messages added):
>>
>>
>>                 int count = 100000;
>>                 long size = count * 8L;
>>                 long baseAddress = unsafe.allocateMemory(size);
>>                 System.out.println("Start address: " + Long.toHexString(baseAddress) +
>>                                    ", End address: " + Long.toHexString(baseAddress + size));
>>
>>                 for (int i = 0; i < count; i++) {
>>                     long address = baseAddress + (i * 8L);
>>                     System.out.println(
>>                         "Normal: " + Long.toHexString(address) + ", " +
>>                         "If double scaled: " + Long.toHexString(baseAddress + (i * 8L * 8L)));
>>                     long expected = i;
>>                     unsafe.putLong(address, expected);
>>                     unsafe.getLong(address);
>>                 }
>>
>>
>>                 After some time it crashes with:
>>
>>
>>                 ...
>>                 Current thread (0x0000000002068800):  JavaThread
>>                 "main" [_thread_in_Java, id=10412,
>>                 stack(0x00000000023f0000,0x00000000024f0000)]
>>
>>                 siginfo: ExceptionCode=0xc0000005, reading address
>>                 0x0000000059061020
>>                 ...
>>                 ...
>>
>>
>>                 And here is output of the execution until crash:
>>
>>                 Start address: 58bbcfa0, End address: 58c804a0
>>                 Normal: 58bbcfa0, If double scaled: 58bbcfa0
>>                 Normal: 58bbcfa8, If double scaled: 58bbcfe0
>>                 Normal: 58bbcfb0, If double scaled: 58bbd020
>>                 ...
>>                 ...
>>                 Normal: 58c517b0, If double scaled: 59061020
>>
>>
>>                 As the logs and the crash dump show, the double-scaled
>>                 version of the target address (*If double scaled:
>>                 59061020*) is the same as the faulting address
>>                 (*siginfo: ExceptionCode=0xc0000005, reading address
>>                 0x0000000059061020*) whose access causes the crash.
>>
>>                 So I think it is clear that the crash is caused by a
>>                 wrong optimization of the index value: the index is
>>                 scaled twice (for *Unsafe::put* and *Unsafe::get*)
>>                 instead of once, and the double-scaled index then
>>                 points to an invalid memory address.
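The arithmetic behind this comparison can be checked with a small standalone sketch. It uses only the addresses printed in the output above; the class and method names are illustrative and do not come from the thread. Dividing the offset of the last "Normal" address by 8 and the offset of the faulting address by 64 gives the same loop index, and the faulting address lies beyond the end of the allocated block.

```java
// Sketch: recover the loop index from the logged addresses and check that the
// double-scaled address falls outside the allocated block.
// DoubleScaleCheck and its members are hypothetical names for illustration.
final class DoubleScaleCheck {
    static final long BASE = 0x58bbcfa0L;   // "Start address" from the output
    static final long END = 0x58c804a0L;    // "End address" from the output
    static final long NORMAL = 0x58c517b0L; // last "Normal" address printed
    static final long FAULT = 0x59061020L;  // faulting address from siginfo

    // Index implied by an address when consecutive elements are `stride` bytes apart.
    static long indexFor(long address, long stride) {
        return (address - BASE) / stride;
    }

    // True if the address lies within [BASE, END).
    static boolean insideAllocation(long address) {
        return address >= BASE && address < END;
    }

    public static void main(String[] args) {
        // Both lines print 76034: the same index, scaled by 8 and by 8 * 8.
        System.out.println("index at stride 8:  " + indexFor(NORMAL, 8L));
        System.out.println("index at stride 64: " + indexFor(FAULT, 64L));
        System.out.println("normal address inside block: " + insideAllocation(NORMAL));
        System.out.println("fault address inside block:  " + insideAllocation(FAULT));
    }
}
```

The block is 800000 bytes long (100000 longs), so a stride of 64 bytes leaves it once the index exceeds 12499. The sketch does not explain why the crash shows up only at index 76034.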
>>
>>                 Regards.
>>
>>                 On Sun, Jun 14, 2015 at 2:39 PM, Serkan Özal <serkan at hazelcast.com> wrote:
>>
>>                     Hi all,
>>
>>                     I dug into the JDK HotSpot commits, and the issue
>>                     arose after this commit:
>>                     http://hg.openjdk.java.net/jdk8u/jdk8u/hotspot/rev/a60a1309a03a
>>
>>                     Then I added some additional logs to
>>                     *"vm/c1/c1_Canonicalizer.cpp"*:
>>
>>                     void Canonicalizer::do_UnsafeGetRaw(UnsafeGetRaw* x) {
>>                       if (OptimizeUnsafes) do_UnsafeRawOp(x);
>>                       tty->print_cr("Canonicalizer: do_UnsafeGetRaw id %d: base = id %d, index = id %d, log2_scale = %d",
>>                                     x->id(), x->base()->id(), x->index()->id(), x->log2_scale());
>>                     }
>>
>>                     void Canonicalizer::do_UnsafePutRaw(UnsafePutRaw* x) {
>>                       if (OptimizeUnsafes) do_UnsafeRawOp(x);
>>                       tty->print_cr("Canonicalizer: do_UnsafePutRaw id %d: base = id %d, index = id %d, log2_scale = %d",
>>                                     x->id(), x->base()->id(), x->index()->id(), x->log2_scale());
>>                     }
>>
>>                     So I ran the test with the address calculated as:
>>
>>                     - *"int * long"* (int is the index and long is 8L)
>>                     - *"long * long"* (the first long is the index and the
>>                       second long is 8L)
>>                     - *"int * int"* (the first int is the index and the
>>                       second int is 8)
>>
>>                     Here are the logs:
>>
>>                     *int * long:*
>>
>>                     Canonicalizer: do_UnsafeGetRaw id 18: base = id 16, index = id 17, log2_scale = 0
>>                     Canonicalizer: do_UnsafeGetRaw id 20: base = id 16, index = id 19, log2_scale = 0
>>                     Canonicalizer: do_UnsafeGetRaw id 22: base = id 16, index = id 21, log2_scale = 0
>>                     Canonicalizer: do_UnsafeGetRaw id 24: base = id 16, index = id 23, log2_scale = 0
>>                     Canonicalizer: do_UnsafePutRaw id 33: base = id 13, index = id 27, log2_scale = 3
>>                     Canonicalizer: do_UnsafeGetRaw id 36: base = id 13, index = id 27, log2_scale = 3
>>
>>                     *long * long:*
>>
>>                     Canonicalizer: do_UnsafeGetRaw id 18: base = id 16, index = id 17, log2_scale = 0
>>                     Canonicalizer: do_UnsafeGetRaw id 20: base = id 16, index = id 19, log2_scale = 0
>>                     Canonicalizer: do_UnsafeGetRaw id 22: base = id 16, index = id 21, log2_scale = 0
>>                     Canonicalizer: do_UnsafeGetRaw id 24: base = id 16, index = id 23, log2_scale = 0
>>                     Canonicalizer: do_UnsafePutRaw id 35: base = id 13, index = id 14, log2_scale = 3
>>                     Canonicalizer: do_UnsafeGetRaw id 37: base = id 13, index = id 14, log2_scale = 3
>>
>>                     *int * int:*
>>
>>                     Canonicalizer: do_UnsafeGetRaw id 18: base = id 16, index = id 17, log2_scale = 0
>>                     Canonicalizer: do_UnsafeGetRaw id 20: base = id 16, index = id 19, log2_scale = 0
>>                     Canonicalizer: do_UnsafeGetRaw id 22: base = id 16, index = id 21, log2_scale = 0
>>                     Canonicalizer: do_UnsafeGetRaw id 24: base = id 16, index = id 23, log2_scale = 0
>>                     Canonicalizer: do_UnsafePutRaw id 33: base = id 13, index = id 29, log2_scale = 0
>>                     Canonicalizer: do_UnsafeGetRaw id 36: base = id 13, index = id 29, log2_scale = 0
>>                     Canonicalizer: do_UnsafePutRaw id 19: base = id 8, index = id 15, log2_scale = 0
>>                     Canonicalizer: do_UnsafeGetRaw id 22: base = id 8, index = id 15, log2_scale = 0
>>
>>                     As you can see, in the problematic runs (*"int *
>>                     long"* and *"long * long"*) scaling is applied twice:
>>                     once for *"Unsafe.put"* and once for *"Unsafe.get"*,
>>                     and both instructions point to the same *"base"* and
>>                     *"index"* instructions. This means that the address is
>>                     scaled one extra time, although it should be scaled
>>                     only once.
>>
>>                     When I debugged the non-problematic run (*"int *
>>                     int"*), I saw that *"instr->as_ArithmeticOp();"*
>>                     always returns *"null"*, so the
>>                     *"match_index_and_scale"* method always returns
>>                     *"false"*. So there is no scaling.
>>
>>                     static bool match_index_and_scale(Instruction* instr,
>>                                                       Instruction** index,
>>                                                       int* log2_scale) {
>>                       ...
>>                       ArithmeticOp* arith = instr->as_ArithmeticOp();
>>                       if (arith != NULL) {
>>                         ...
>>                       }
>>                       return false;
>>                     }
>>
>>                     Then I added my fix attempt, which prevents multiple
>>                     scaling for Unsafe instructions that point to the same
>>                     index instruction, like this:
>>
>>                     void Canonicalizer::do_UnsafeRawOp(UnsafeRawOp* x) {
>>                       Instruction* base = NULL;
>>                       Instruction* index = NULL;
>>                       int log2_scale;
>>                       if (match(x, &base, &index, &log2_scale)) {
>>                         x->set_base(base);
>>                         x->set_index(index);
>>                         // The fix attempt here
>>                         // /////////////////////////////
>>                         if (index != NULL) {
>>                           if (index->is_pinned()) {
>>                             log2_scale = 0;
>>                           } else {
>>                             if (log2_scale != 0) {
>>                               index->pin();
>>                             }
>>                           }
>>                         }
>>                         // /////////////////////////////
>>                         x->set_log2_scale(log2_scale);
>>                         if (PrintUnsafeOptimization) {
>>                           tty->print_cr("Canonicalizer: UnsafeRawOp id %d: base = id %d, index = id %d, log2_scale = %d",
>>                                         x->id(), x->base()->id(), x->index()->id(), x->log2_scale());
>>                         }
>>                       }
>>                     }
>>
>>                     In this fix attempt, if there is scaling for the
>>                     Unsafe instruction, I pin the index instruction of
>>                     that instruction; on later calls, if the index
>>                     instruction is pinned, I assume that scaling has
>>                     already been applied, so there is no need to scale
>>                     again. After this fix, I reran the problematic test
>>                     (*"int * long"*) and it works, with these logs:
>>
>>                     *int * long (after fix):*
>>
>>                     Canonicalizer: do_UnsafeGetRaw id 18: base = id 16, index = id 17, log2_scale = 0
>>                     Canonicalizer: do_UnsafeGetRaw id 20: base = id 16, index = id 19, log2_scale = 0
>>                     Canonicalizer: do_UnsafeGetRaw id 22: base = id 16, index = id 21, log2_scale = 0
>>                     Canonicalizer: do_UnsafeGetRaw id 24: base = id 16, index = id 23, log2_scale = 0
>>                     Canonicalizer: do_UnsafePutRaw id 35: base = id 13, index = id 14, log2_scale = 3
>>                     Canonicalizer: do_UnsafeGetRaw id 37: base = id 13, index = id 14, log2_scale = 0
>>                     Canonicalizer: do_UnsafePutRaw id 21: base = id 8, index = id 11, log2_scale = 3
>>                     Canonicalizer: do_UnsafeGetRaw id 23: base = id 8, index = id 11, log2_scale = 0
>>
>>                     I am not sure whether my fix attempt is a real fix;
>>                     maybe there are better fixes.
>>
>>                     Regards.
>>                     --
>>                     Serkan ÖZAL
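The guard described in the fix attempt above can be modeled in a few lines of Java. This is a toy model and not HotSpot code: `ScaleOnceModel`, `Instr`, `scaleToApply` and `putThenGet` are hypothetical names, and only the pinned flag of an instruction is represented. It mirrors the rule "record the scale on the first Unsafe op that uses an index, record 0 on later ops that share it".

```java
// Toy model of the pin-based guard from the fix attempt (not HotSpot code).
final class ScaleOnceModel {
    // Hypothetical stand-in for a C1 IR instruction; only the pinned flag is modeled.
    static final class Instr {
        boolean pinned;
    }

    // Returns the log2 scale to record on an Unsafe op, following the guard:
    // a pinned index is treated as already scaled.
    static int scaleToApply(Instr index, int matchedLog2Scale) {
        if (index == null) {
            return matchedLog2Scale;
        }
        if (index.pinned) {
            return 0;
        }
        if (matchedLog2Scale != 0) {
            index.pinned = true; // remember that a scale was recorded for this index
        }
        return matchedLog2Scale;
    }

    // Simulates a put followed by a get that share one index instruction.
    static int[] putThenGet(int matchedLog2Scale) {
        Instr sharedIndex = new Instr();
        int putScale = scaleToApply(sharedIndex, matchedLog2Scale);
        int getScale = scaleToApply(sharedIndex, matchedLog2Scale);
        return new int[] {putScale, getScale};
    }

    public static void main(String[] args) {
        int[] scales = putThenGet(3);
        System.out.println("put log2_scale = " + scales[0] + ", get log2_scale = " + scales[1]);
    }
}
```

With a matched scale of 3 the model yields 3 for the put and 0 for the get, which is the pattern in the "after fix" logs. The model assumes the pinned flag is used for nothing else, which may not hold for real C1 instructions.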
>>
>>                         Btw, (thanks to one of my colleagues), when the
>>                         address calculation in the loop is changed to
>>                         long address = baseAddress + (i * 8)
>>                         the test passes. The only difference is that the
>>                         next long pointer is calculated using integer 8
>>                         instead of long 8.
>>
>>                         ```
>>                         for (int i = 0; i < count; i++) {
>>                             long address = baseAddress + (i * 8); // <--- here, integer 8 instead of long 8
>>                             long expected = i;
>>                             unsafe.putLong(address, expected);
>>                             long actual = unsafe.getLong(address);
>>                             if (expected != actual) {
>>                                 throw new AssertionError("Expected: " + expected + ", Actual: " + actual);
>>                             }
>>                         }
>>                         ```
>>
>>                         On Tue, Jun 9, 2015 at 1:07 PM Mehmet Dogan <mehmet at hazelcast.com> wrote:
>>                         > Hi all,
>>                         >
>>                         > While I was testing my app using Java 8, I encountered the previously
>>                         > reported sun.misc.Unsafe issue.
>>                         >
>>                         > https://bugs.openjdk.java.net/browse/JDK-8076445
>>                         >
>>                         > http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2015-April/017685.html
>>                         >
>>                         > The issue status says it's resolved with resolution "Cannot Reproduce". But
>>                         > unfortunately it's still reproducible using "1.8.0_60-ea-b18" and
>>                         > "1.9.0-ea-b67".
>>                         >
>>                         > The test is very simple:
>>                         >
>>                         > ```
>>                         > public static void main(String[] args) throws Exception {
>>                         >     Unsafe unsafe = findUnsafe();
>>                         >     // 10000 pass
>>                         >     // 100000 jvm crash
>>                         >     // 1000000 fail
>>                         >     int count = 100000;
>>                         >     long size = count * 8L;
>>                         >     long baseAddress = unsafe.allocateMemory(size);
>>                         >
>>                         >     try {
>>                         >         for (int i = 0; i < count; i++) {
>>                         >             long address = baseAddress + (i * 8L);
>>                         >
>>                         >             long expected = i;
>>                         >             unsafe.putLong(address, expected);
>>                         >
>>                         >             long actual = unsafe.getLong(address);
>>                         >
>>                         >             if (expected != actual) {
>>                         >                 throw new AssertionError("Expected: " + expected + ", Actual: " + actual);
>>                         >             }
>>                         >         }
>>                         >     } finally {
>>                         >         unsafe.freeMemory(baseAddress);
>>                         >     }
>>                         > }
>>                         > ```
>>                         >
>>                         > It does not fail up to version 1.8.0.31; starting with 1.8.0.40 the test
>>                         > fails consistently.
>>                         >
>>                         > - With iteration count 10000, the test passes.
>>                         > - With iteration count 100000, the JVM crashes with SIGSEGV.
>>                         > - With iteration count 1000000, the test fails with AssertionError.
>>                         >
>>                         > When any one of compilation (-Xint), inlining (-XX:-Inline), or
>>                         > on-stack replacement (-XX:-UseOnStackReplacement) is disabled, the test
>>                         > does not fail at all.
>>                         >
>>                         > I tested on these platforms:
>>                         > - Centos-7/openjdk-1.8.0.45
>>                         > - OSX/oraclejdk-1.8.0.40
>>                         > - OSX/oraclejdk-1.8.0.45
>>                         > - OSX/oraclejdk-1.8.0_60-ea-b18
>>                         > - OSX/oraclejdk-1.9.0-ea-b67
>>                         >
>>                         > The previous issue comment
>>                         > (https://bugs.openjdk.java.net/browse/JDK-8076445?focusedCommentId=13633043#comment-13633043)
>>                         > says "Cannot reproduce based on the latest version". I hope that "latest
>>                         > version" does not refer to '1.8.0_60-ea-b18' or '1.9.0-ea-b67', because
>>                         > both are failing.
>>                         >
>>                         > I'm looking forward to hearing from you.
>>                         >
>>                         > Thanks,
>>                         > -Mehmet Dogan-
>>                         > --
>>                         >
>>                         > @mmdogan
>>
>>
>>                     --
>>                     Serkan ÖZAL
>>                     Remotest Software Engineer
>>                     GSM: +90 542 680 39 18
>>                     Twitter: @serkan_ozal
>>