Array accesses using sun.misc.Unsafe cause data corruption or SIGSEGV

Sun Jun 14 11:39:05 UTC 2015

Hi all,

I dug into the issue by going through the JDK HotSpot commits, and
the issue arose after this commit:
http://hg.openjdk.java.net/jdk8u/jdk8u/hotspot/rev/a60a1309a03a

Then I added some additional logs to *"vm/c1/c1_Canonicalizer.cpp"*:
void Canonicalizer::do_UnsafeGetRaw(UnsafeGetRaw* x) {
  if (OptimizeUnsafes) do_UnsafeRawOp(x);
  tty->print_cr("Canonicalizer: do_UnsafeGetRaw id %d: base = id %d, index = id %d, log2_scale = %d",
                x->id(), x->base()->id(), x->index()->id(), x->log2_scale());
}

void Canonicalizer::do_UnsafePutRaw(UnsafePutRaw* x) {
  if (OptimizeUnsafes) do_UnsafeRawOp(x);
  tty->print_cr("Canonicalizer: do_UnsafePutRaw id %d: base = id %d, index = id %d, log2_scale = %d",
                x->id(), x->base()->id(), x->index()->id(), x->log2_scale());
}

So I ran the test, calculating the address as
- *"int * long"* (the int is the index and the long is 8L)
- *"long * long"* (the first long is the index and the second long is 8L)
- *"int * int"* (the first int is the index and the second int is 8)
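
For reference, the three variants differ only in where Java widens the int to a long. The following is a minimal sketch (the class and method names are made up for illustration and are not part of the test) showing that all three forms yield the same address for the index range used here:

```java
// Illustrative only: AddressForms and its method names are not part of the original test.
final class AddressForms {
    // int * long: i is widened to long first, then a long multiply is performed
    static long intTimesLong(long baseAddress, int i) {
        return baseAddress + (i * 8L);
    }

    // long * long: the index is already a long, again a long multiply
    static long longTimesLong(long baseAddress, long i) {
        return baseAddress + (i * 8L);
    }

    // int * int: a 32-bit multiply first, then the product is widened to long
    static long intTimesInt(long baseAddress, int i) {
        return baseAddress + (i * 8);
    }

    public static void main(String[] args) {
        long base = 4096L; // arbitrary stand-in for the value returned by allocateMemory
        for (int i = 0; i < 1_000_000; i++) {
            long a = intTimesLong(base, i);
            if (a != longTimesLong(base, i) || a != intTimesInt(base, i)) {
                throw new AssertionError("address forms differ at i = " + i);
            }
        }
        System.out.println("all three forms agree");
    }
}
```

So in the first two variants the multiplication by 8 is a long arithmetic operation, while in the third it is an int multiplication followed by a conversion to long.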

Here are the logs:
*int * long:*
Canonicalizer: do_UnsafeGetRaw id 18: base = id 16, index = id 17, log2_scale = 0
Canonicalizer: do_UnsafeGetRaw id 20: base = id 16, index = id 19, log2_scale = 0
Canonicalizer: do_UnsafeGetRaw id 22: base = id 16, index = id 21, log2_scale = 0
Canonicalizer: do_UnsafeGetRaw id 24: base = id 16, index = id 23, log2_scale = 0
Canonicalizer: do_UnsafePutRaw id 33: base = id 13, index = id 27, log2_scale = 3
Canonicalizer: do_UnsafeGetRaw id 36: base = id 13, index = id 27, log2_scale = 3
*long * long:*
Canonicalizer: do_UnsafeGetRaw id 18: base = id 16, index = id 17, log2_scale = 0
Canonicalizer: do_UnsafeGetRaw id 20: base = id 16, index = id 19, log2_scale = 0
Canonicalizer: do_UnsafeGetRaw id 22: base = id 16, index = id 21, log2_scale = 0
Canonicalizer: do_UnsafeGetRaw id 24: base = id 16, index = id 23, log2_scale = 0
Canonicalizer: do_UnsafePutRaw id 35: base = id 13, index = id 14, log2_scale = 3
Canonicalizer: do_UnsafeGetRaw id 37: base = id 13, index = id 14, log2_scale = 3
*int * int:*
Canonicalizer: do_UnsafeGetRaw id 18: base = id 16, index = id 17, log2_scale = 0
Canonicalizer: do_UnsafeGetRaw id 20: base = id 16, index = id 19, log2_scale = 0
Canonicalizer: do_UnsafeGetRaw id 22: base = id 16, index = id 21, log2_scale = 0
Canonicalizer: do_UnsafeGetRaw id 24: base = id 16, index = id 23, log2_scale = 0
Canonicalizer: do_UnsafePutRaw id 33: base = id 13, index = id 29, log2_scale = 0
Canonicalizer: do_UnsafeGetRaw id 36: base = id 13, index = id 29, log2_scale = 0
Canonicalizer: do_UnsafePutRaw id 19: base = id 8, index = id 15, log2_scale = 0
Canonicalizer: do_UnsafeGetRaw id 22: base = id 8, index = id 15, log2_scale = 0

As you can see, in the problematic runs (*"int * long"* and *"long *
long"*) scaling is applied twice: once for *"Unsafe.put"* and once for
*"Unsafe.get"*, and both instructions point to the
same *"base"* and *"index"* instructions.
This means that the address is scaled one time too many, because there
should be only one scale.
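
To make the hypothesis concrete, here is a toy arithmetic sketch of what "scaled one more time" would mean for the offsets. It only models the numbers and does not reproduce the JIT behaviour itself; the class and method names are made up for illustration.

```java
// Toy arithmetic only: DoubleScaleSketch is a hypothetical name, and this does not
// reproduce what the compiler actually emits.
final class DoubleScaleSketch {
    // intended offset for element i of a long array: i * 8
    static long scaledOnce(long i) {
        return i << 3;
    }

    // hypothesised faulty offset: the already multiplied index is shifted by 3 again, giving i * 64
    static long scaledTwice(long i) {
        return (i * 8L) << 3;
    }

    // first index whose doubly scaled 8-byte access no longer fits into count * 8 bytes
    static long firstOutOfRange(long count) {
        long size = count * 8L;
        for (long i = 0; i < count; i++) {
            if (scaledTwice(i) + 8 > size) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println("offset of element 1, scaled once:  " + scaledOnce(1));
        System.out.println("offset of element 1, scaled twice: " + scaledTwice(1));
        System.out.println("first out-of-range index for count 100000: " + firstOutOfRange(100000));
    }
}
```

Under this assumption, accesses would leave the allocated block long before the loop ends, which would be consistent with either a crash or silently corrupted values.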

When I debugged the non-problematic run (*"int * int"*),
I saw that *"instr->as_ArithmeticOp();"* always returns *"null"*,
so the *"match_index_and_scale"* method always returns *"false"*.
So there is no scaling.
static bool match_index_and_scale(Instruction*  instr,
                                  Instruction** index,
                                  int*          log2_scale) {
  ...

  ArithmeticOp* arith = instr->as_ArithmeticOp();
  if (arith != NULL) {
     ...
  }

  return false;
}
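
As a rough illustration of this kind of matching, here is a toy model in Java. It is not HotSpot's code: every type below is a hypothetical stand-in, and the only point is that a multiply by a small power-of-two constant is turned into an index plus a log2 scale, while an expression wrapped in a conversion is not matched at all.

```java
import java.util.Optional;

// Toy model only: every type below is a hypothetical stand-in, not a HotSpot class.
final class ScaleMatchSketch {
    interface Node {}

    static final class Index implements Node {}            // the loop index value

    static final class Const implements Node {             // a constant operand
        final long value;
        Const(long value) { this.value = value; }
    }

    static final class Mul implements Node {               // a multiply of two operands
        final Node x, y;
        Mul(Node x, Node y) { this.x = x; this.y = y; }
    }

    static final class Conv implements Node {              // an int-to-long conversion
        final Node value;
        Conv(Node value) { this.value = value; }
    }

    static final class Match {
        final Node index;
        final int log2Scale;
        Match(Node index, int log2Scale) { this.index = index; this.log2Scale = log2Scale; }
    }

    // Recognises "index * c" or "c * index" with c in {1, 2, 4, 8}; anything else is not matched.
    static Optional<Match> matchIndexAndScale(Node instr) {
        if (!(instr instanceof Mul)) {
            return Optional.empty();                        // e.g. a Conv node: no scale is extracted
        }
        Mul mul = (Mul) instr;
        Const con;
        Node index;
        if (mul.x instanceof Const) {
            con = (Const) mul.x;
            index = mul.y;
        } else if (mul.y instanceof Const) {
            con = (Const) mul.y;
            index = mul.x;
        } else {
            return Optional.empty();
        }
        long c = con.value;
        if (c != 1 && c != 2 && c != 4 && c != 8) {
            return Optional.empty();
        }
        return Optional.of(new Match(index, Long.numberOfTrailingZeros(c)));
    }

    public static void main(String[] args) {
        Node i = new Index();
        // top-level multiply, as in "i * 8L"
        Optional<Match> longForm = matchIndexAndScale(new Mul(i, new Const(8)));
        // multiply hidden behind a conversion, as in "(long) (i * 8)"
        Optional<Match> intForm = matchIndexAndScale(new Conv(new Mul(i, new Const(8))));
        System.out.println("long form matched: " + longForm.isPresent());
        System.out.println("int form matched:  " + intForm.isPresent());
    }
}
```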

Then I added my fix attempt, which prevents multiple scaling for
Unsafe instructions that point to the same index instruction, like this:
void Canonicalizer::do_UnsafeRawOp(UnsafeRawOp* x) {
  Instruction* base = NULL;
  Instruction* index = NULL;
  int          log2_scale;

  if (match(x, &base, &index, &log2_scale)) {
    x->set_base(base);
    x->set_index(index);
    // The fix attempt here
    // /////////////////////////////
    if (index != NULL) {
      if (index->is_pinned()) {
        log2_scale = 0;
      } else {
        if (log2_scale != 0) {
          index->pin();
        }
      }
    }
    // /////////////////////////////
    x->set_log2_scale(log2_scale);
    if (PrintUnsafeOptimization) {
      tty->print_cr("Canonicalizer: UnsafeRawOp id %d: base = id %d, index = id %d, log2_scale = %d",
                    x->id(), x->base()->id(), x->index()->id(), x->log2_scale());
    }
  }
}
In this fix attempt, if there is scaling for the Unsafe instruction,
I pin the index instruction of that instruction.
On subsequent calls, if the index instruction is already pinned, I assume that
scaling has already been applied, so there is no need for another scaling.
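
The guard logic can be modelled in isolation like this. It is a toy sketch in Java, not the C1 code: IndexNode, RawOp and canonicalize are hypothetical stand-ins, and matchedScale plays the role of the log2_scale value produced by the match.

```java
// Toy model only: IndexNode, RawOp and canonicalize are hypothetical stand-ins for the C1 types.
final class PinGuardSketch {
    static final class IndexNode {
        boolean pinned;                       // stands in for is_pinned() / pin()
    }

    static final class RawOp {
        final String name;
        final IndexNode index;
        int log2Scale;
        RawOp(String name, IndexNode index) {
            this.name = name;
            this.index = index;
        }
    }

    // matchedScale plays the role of the log2_scale value produced by the match
    static void canonicalize(RawOp op, int matchedScale) {
        int log2Scale = matchedScale;
        if (op.index != null) {
            if (op.index.pinned) {
                log2Scale = 0;                // an earlier op on this index already took the scale
            } else if (log2Scale != 0) {
                op.index.pinned = true;       // remember that this index has been scaled
            }
        }
        op.log2Scale = log2Scale;
    }

    public static void main(String[] args) {
        IndexNode index = new IndexNode();
        RawOp put = new RawOp("put", index);
        RawOp get = new RawOp("get", index);
        canonicalize(put, 3);
        canonicalize(get, 3);
        System.out.println(put.name + " log2_scale = " + put.log2Scale);
        System.out.println(get.name + " log2_scale = " + get.log2Scale);
    }
}
```

The sketch uses a dedicated boolean for the marker. Whether the pinned flag in C1 can also be set for unrelated reasons is not checked here.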

After this fix, I reran the problematic test (*"int * long"*) and it
works, with these logs:
*int * long (after fix):*
Canonicalizer: do_UnsafeGetRaw id 18: base = id 16, index = id 17, log2_scale = 0
Canonicalizer: do_UnsafeGetRaw id 20: base = id 16, index = id 19, log2_scale = 0
Canonicalizer: do_UnsafeGetRaw id 22: base = id 16, index = id 21, log2_scale = 0
Canonicalizer: do_UnsafeGetRaw id 24: base = id 16, index = id 23, log2_scale = 0
Canonicalizer: do_UnsafePutRaw id 35: base = id 13, index = id 14, log2_scale = 3
Canonicalizer: do_UnsafeGetRaw id 37: base = id 13, index = id 14, log2_scale = 0
Canonicalizer: do_UnsafePutRaw id 21: base = id 8, index = id 11, log2_scale = 3
Canonicalizer: do_UnsafeGetRaw id 23: base = id 8, index = id 11, log2_scale = 0

I am not sure whether my fix attempt is a real fix, or whether there are better fixes.

Regards.

--

Serkan ÖZAL

> Btw (thanks to one of my colleagues), when the address calculation in the loop is
> changed to
> long address = baseAddress + (i * 8)
> the test passes. The only difference is that the next long pointer is calculated
> using integer 8 instead of long 8.
> ```
> for (int i = 0; i < count; i++) {
>     long address = baseAddress + (i * 8); // <--- here, integer 8 instead of long 8
>     long expected = i;
>     unsafe.putLong(address, expected);
>     long actual = unsafe.getLong(address);
>     if (expected != actual) {
>         throw new AssertionError("Expected: " + expected + ", Actual: " + actual);
>     }
> }
> ```
> On Tue, Jun 9, 2015 at 1:07 PM Mehmet Dogan <mehmet at hazelcast.com> wrote:
> > Hi all,
> >
> > While I was testing my app using Java 8, I encountered the previously
> > reported sun.misc.Unsafe issue.
> >
> > https://bugs.openjdk.java.net/browse/JDK-8076445
> >
> > http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2015-April/017685.html
> >
> > The issue status says it is resolved with resolution "Cannot Reproduce". But
> > unfortunately it is still reproducible using "1.8.0_60-ea-b18" and
> > "1.9.0-ea-b67".
> >
> > The test is very simple:
> >
> > ```
> > public static void main(String[] args) throws Exception {
> >     Unsafe unsafe = findUnsafe();
> >     // 10000 pass
> >     // 100000 jvm crash
> >     // 1000000 fail
> >     int count = 100000;
> >     long size = count * 8L;
> >     long baseAddress = unsafe.allocateMemory(size);
> >
> >     try {
> >         for (int i = 0; i < count; i++) {
> >             long address = baseAddress + (i * 8L);
> >
> >             long expected = i;
> >             unsafe.putLong(address, expected);
> >
> >             long actual = unsafe.getLong(address);
> >
> >             if (expected != actual) {
> >                 throw new AssertionError("Expected: " + expected + ", Actual: " + actual);
> >             }
> >         }
> >     } finally {
> >         unsafe.freeMemory(baseAddress);
> >     }
> > }
> > ```
> > It does not fail up to version 1.8.0.31; starting with 1.8.0.40 the test
> > fails consistently.
> >
> > - With iteration count 10000, the test passes.
> > - With iteration count 100000, the JVM crashes with SIGSEGV.
> > - With iteration count 1000000, the test fails with AssertionError.
> >
> > When any one of compilation (-Xint), inlining (-XX:-Inline), or
> > on-stack replacement (-XX:-UseOnStackReplacement) is disabled, the test does
> > not fail at all.
> >
> > I tested on these platforms:
> > - Centos-7/openjdk-1.8.0.45
> > - OSX/oraclejdk-1.8.0.40
> > - OSX/oraclejdk-1.8.0.45
> > - OSX/oraclejdk-1.8.0_60-ea-b18
> > - OSX/oraclejdk-1.9.0-ea-b67
> >
> > The previous issue comment (
> > https://bugs.openjdk.java.net/browse/JDK-8076445?focusedCommentId=13633043#comment-13633043)
> > says "Cannot reproduce based on the latest version". I hope that "latest
> > version" does not refer to '1.8.0_60-ea-b18' or '1.9.0-ea-b67', because
> > both are failing.
> >
> > I'm looking forward to hearing from you.
> >
> > Thanks,
> > -Mehmet Dogan-
> > --
> >
> > @mmdogan
> >
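
The quoted test calls findUnsafe(), which is not shown in the thread. A common way to obtain the instance is reflection on the private static field "theUnsafe"; the sketch below assumes the helper does something along these lines (the class name FindUnsafeSketch is made up for illustration).

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

// Assumption: findUnsafe() in the quoted test does something along these lines.
final class FindUnsafeSketch {
    static Unsafe findUnsafe() throws ReflectiveOperationException {
        // sun.misc.Unsafe keeps its singleton in a private static field
        Field field = Unsafe.class.getDeclaredField("theUnsafe");
        field.setAccessible(true);
        return (Unsafe) field.get(null);
    }

    public static void main(String[] args) throws Exception {
        Unsafe unsafe = findUnsafe();
        System.out.println(unsafe != null ? "got Unsafe" : "no Unsafe");
    }
}
```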
