Array accesses using sun.misc.Unsafe cause data corruption or SIGSEGV
Serkan Özal
serkan at hazelcast.com
Fri Jul 17 19:49:38 UTC 2015
Hi John,
Yes, I have applied your fix and it works.
Thanks!
In which JDK version will this patch be available?
Regards.
On Fri, Jul 17, 2015 at 10:31 PM, John Rose <john.r.rose at oracle.com> wrote:
> Thanks Serkan and Martijn for reporting and analyzing this.
>
> We had a very similar bug reported internally, and we just integrated a
> fix:
> http://hg.openjdk.java.net/jdk8u/jdk8u/hotspot/rev/3816de51b5e7
>
> Would you mind checking if it fixes your problem also?
>
> Best wishes,
> — John
>
> On Jul 12, 2015, at 5:07 AM, Serkan Özal <serkan at hazelcast.com> wrote:
>
>
> Hi Martijn,
>
> Thanks for your interest and for your comment, which has made this thread
> a little more active.
>
>
> From my previous message (
> http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2015-June/018221.html
> ):
>
> I added some logging to *"vm/c1/c1_Canonicalizer.cpp"*:
>
> void Canonicalizer::do_UnsafeGetRaw(UnsafeGetRaw* x) {
>   if (OptimizeUnsafes) do_UnsafeRawOp(x);
>   tty->print_cr("Canonicalizer: do_UnsafeGetRaw id %d: base = id %d, index = id %d, log2_scale = %d",
>                 x->id(), x->base()->id(), x->index()->id(), x->log2_scale());
> }
>
> void Canonicalizer::do_UnsafePutRaw(UnsafePutRaw* x) {
>   if (OptimizeUnsafes) do_UnsafeRawOp(x);
>   tty->print_cr("Canonicalizer: do_UnsafePutRaw id %d: base = id %d, index = id %d, log2_scale = %d",
>                 x->id(), x->base()->id(), x->index()->id(), x->log2_scale());
> }
>
> So I ran the test, calculating the address as:
>
> - *"int * long"* (the int is the index and the long is 8L)
> - *"long * long"* (the first long is the index and the second long is 8L)
> - *"int * int"* (the first int is the index and the second int is 8)
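The three variants differ only in the type of the multiplication. In "int * long", i is widened to long before the multiply. In "long * long" both operands are already long. In "int * int" the multiply is done in 32-bit int arithmetic and only the product is widened. The plain-Java sketch below uses hypothetical class and method names. For the index range of this test all three produce the same address, but the int form silently overflows once i * 8 exceeds Integer.MAX_VALUE.

```java
public class AddressVariants {
    // "int * long": i is widened to long, then multiplied by 8L.
    static long intTimesLong(long base, int i) {
        return base + (i * 8L);
    }

    // "long * long": both operands are already long.
    static long longTimesLong(long base, long i) {
        return base + (i * 8L);
    }

    // "int * int": 32-bit multiply first; the (possibly overflowed) product is widened afterwards.
    static long intTimesInt(long base, int i) {
        return base + (i * 8);
    }

    public static void main(String[] args) {
        long base = 0x58bbcfa0L; // start address from the crash log in this thread
        int i = 99_999;          // last index when count = 100000
        System.out.println(intTimesLong(base, i) == longTimesLong(base, i)); // true
        System.out.println(intTimesLong(base, i) == intTimesInt(base, i));   // true
        int big = 1 << 28;       // big * 8 == 2^31, which overflows int
        System.out.println(intTimesLong(0L, big) == intTimesInt(0L, big));   // false
    }
}
```

The values agree, but the expression shapes do not. This is consistent with the later observation in this thread that, in the "int * int" case, C1 never sees an ArithmeticOp at the top of the index and so folds no scale. Our reading is that the top-level node is then an int-to-long conversion; this has not been checked against the HotSpot sources.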
>
> Here are the logs:
>
> *int * long:*
> Canonicalizer: do_UnsafeGetRaw id 18: base = id 16, index = id 17, log2_scale = 0
> Canonicalizer: do_UnsafeGetRaw id 20: base = id 16, index = id 19, log2_scale = 0
> Canonicalizer: do_UnsafeGetRaw id 22: base = id 16, index = id 21, log2_scale = 0
> Canonicalizer: do_UnsafeGetRaw id 24: base = id 16, index = id 23, log2_scale = 0
> Canonicalizer: do_UnsafePutRaw id 33: base = id 13, index = id 27, log2_scale = 3
> Canonicalizer: do_UnsafeGetRaw id 36: base = id 13, index = id 27, log2_scale = 3
>
> *long * long:*
> Canonicalizer: do_UnsafeGetRaw id 18: base = id 16, index = id 17, log2_scale = 0
> Canonicalizer: do_UnsafeGetRaw id 20: base = id 16, index = id 19, log2_scale = 0
> Canonicalizer: do_UnsafeGetRaw id 22: base = id 16, index = id 21, log2_scale = 0
> Canonicalizer: do_UnsafeGetRaw id 24: base = id 16, index = id 23, log2_scale = 0
> Canonicalizer: do_UnsafePutRaw id 35: base = id 13, index = id 14, log2_scale = 3
> Canonicalizer: do_UnsafeGetRaw id 37: base = id 13, index = id 14, log2_scale = 3
>
> *int * int:*
> Canonicalizer: do_UnsafeGetRaw id 18: base = id 16, index = id 17, log2_scale = 0
> Canonicalizer: do_UnsafeGetRaw id 20: base = id 16, index = id 19, log2_scale = 0
> Canonicalizer: do_UnsafeGetRaw id 22: base = id 16, index = id 21, log2_scale = 0
> Canonicalizer: do_UnsafeGetRaw id 24: base = id 16, index = id 23, log2_scale = 0
> Canonicalizer: do_UnsafePutRaw id 33: base = id 13, index = id 29, log2_scale = 0
> Canonicalizer: do_UnsafeGetRaw id 36: base = id 13, index = id 29, log2_scale = 0
> Canonicalizer: do_UnsafePutRaw id 19: base = id 8, index = id 15, log2_scale = 0
> Canonicalizer: do_UnsafeGetRaw id 22: base = id 8, index = id 15, log2_scale = 0
>
> As you can see, in the problematic runs (*"int * long"* and *"long * long"*) there are two scalings,
> one for *"Unsafe.put"* and the other for *"Unsafe.get"*, and both instructions point to
> the same *"base"* and *"index"* instructions. This means the address is scaled one more time
> than it should be, since there should be only one scaling.
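This example makes "scaled one more time" concrete. An Unsafe raw access with base b, index x and log2_scale s addresses b + (x << s). The sketch below uses a hypothetical class name and plain Java arithmetic, not HotSpot code. It compares applying a scale of 8 (log2_scale = 3) once with applying it again to an index that already contains the factor 8.

```java
public class ScaledAddress {
    // Effective address of a raw access: base + (index << log2Scale).
    static long effectiveAddress(long base, long index, int log2Scale) {
        return base + (index << log2Scale);
    }

    public static void main(String[] args) {
        long base = 0x1000L;
        long i = 5;
        // Correct: the factor 8 (log2_scale = 3) is applied exactly once.
        long once = effectiveAddress(base, i, 3);      // 0x1000 + 40
        // Broken: the index already holds i * 8 and is scaled by 8 again.
        long twice = effectiveAddress(base, i * 8, 3); // 0x1000 + 320
        System.out.println(Long.toHexString(once));    // 1028
        System.out.println(Long.toHexString(twice));   // 1140
    }
}
```

In general the doubled address is base + 64 * i instead of base + 8 * i. The computed address therefore drifts further from the intended slot as i grows.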
>
>
>
> With this fix (or rather fix attempt, since I am not 100% sure it is the right or optimal approach), I prevent the same index instruction from being scaled more than once.
>
> Also, one of my previous messages (http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2015-July/018383.html) shows that the index is scaled multiple times; once it has been scaled more than once, the resulting address can point anywhere in memory.
>
>
> On Sun, Jul 12, 2015 at 2:54 PM, Martijn Verburg <martijnverburg at gmail.com> wrote:
>
>> Non-reviewer here, but I'd add to the comment *why* you don't want to
>> scale again.
>>
>> Cheers,
>> Martijn
>>
>> On 12 July 2015 at 11:29, Serkan Özal <serkan at hazelcast.com> wrote:
>>
>>> Hi all,
>>>
>>> I have created a webrev containing the patch for review and made it
>>> publicly accessible here:
>>> https://s3.amazonaws.com/jdk-8087134/webrev.00/index.html
>>>
>>> Regards.
>>>
>>> On Sat, Jul 4, 2015 at 9:06 PM, Serkan Özal <serkan at hazelcast.com>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> I have added some logging to show that the problem is caused by double
>>>> scaling of the offset (index).
>>>>
>>>> Here is my updated reproducer code, with log messages added:
>>>>
>>>>
>>>> int count = 100000;
>>>> long size = count * 8L;
>>>> long baseAddress = unsafe.allocateMemory(size);
>>>> System.out.println("Start address: " + Long.toHexString(baseAddress) +
>>>>         ", End address: " + Long.toHexString(baseAddress + size));
>>>>
>>>> for (int i = 0; i < count; i++) {
>>>>     long address = baseAddress + (i * 8L);
>>>>     System.out.println(
>>>>             "Normal: " + Long.toHexString(address) + ", " +
>>>>             "If double scaled: " + Long.toHexString(baseAddress + (i * 8L * 8L)));
>>>>     long expected = i;
>>>>     unsafe.putLong(address, expected);
>>>>     unsafe.getLong(address);
>>>> }
>>>>
>>>>
>>>> After some time it crashes with:
>>>>
>>>>
>>>> ...
>>>> Current thread (0x0000000002068800): JavaThread "main" [_thread_in_Java, id=10412, stack(0x00000000023f0000,0x00000000024f0000)]
>>>>
>>>> siginfo: ExceptionCode=0xc0000005, reading address 0x0000000059061020
>>>> ...
>>>> ...
>>>>
>>>>
>>>> And here is the output of the execution up to the crash:
>>>>
>>>> Start address: 58bbcfa0, End address: 58c804a0
>>>> Normal: 58bbcfa0, If double scaled: 58bbcfa0
>>>> Normal: 58bbcfa8, If double scaled: 58bbcfe0
>>>> Normal: 58bbcfb0, If double scaled: 58bbd020
>>>> ...
>>>> ...
>>>> Normal: 58c517b0, If double scaled: 59061020
>>>>
>>>>
>>>> As the logs and the crash dump show, the double-scaled version of the
>>>> target address (*If double scaled: 59061020*) is the same as the
>>>> problematic address (*siginfo: ExceptionCode=0xc0000005, reading
>>>> address 0x0000000059061020*) whose access causes the crash.
>>>>
>>>> So I think it is clear that the crash is caused by a wrong optimization
>>>> of the index value: the index is scaled twice (for *Unsafe::put* and
>>>> *Unsafe::get*) instead of only once, and the double-scaled index then
>>>> points to an invalid memory address.
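The numbers in the log can be checked by hand. The sketch below uses a hypothetical class name; only the addresses come from the log above. It recovers the loop index from the last "Normal" address, then recomputes both candidates: the correct base + i * 8 and the double-scaled base + i * 64. The double-scaled value matches the faulting address 0x59061020 and lies well past the end of the allocated block.

```java
public class CrashAddressCheck {
    static final long BASE = 0x58bbcfa0L; // "Start address" from the log
    static final long END  = 0x58c804a0L; // "End address" = BASE + 100000 * 8

    // Address the loop intends to use: base + i * 8.
    static long normal(long i)       { return BASE + i * 8L; }
    // Address produced if the factor 8 is applied twice: base + i * 64.
    static long doubleScaled(long i) { return BASE + i * 8L * 8L; }

    public static void main(String[] args) {
        long lastNormal = 0x58c517b0L;     // last "Normal:" line before the crash
        long i = (lastNormal - BASE) / 8L; // recover the loop index
        System.out.println(i);                                 // 76034
        System.out.println(Long.toHexString(normal(i)));       // 58c517b0
        System.out.println(Long.toHexString(doubleScaled(i))); // 59061020
        System.out.println(doubleScaled(i) >= END);            // true: outside the block
    }
}
```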
>>>>
>>>> Regards.
>>>>
>>>> On Sun, Jun 14, 2015 at 2:39 PM, Serkan Özal <serkan at hazelcast.com>
>>>> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> I dug into the issue through the JDK HotSpot commits, and the issue
>>>>> arose after this commit: http://hg.openjdk.java.net/jdk8u/jdk8u/hotspot/rev/a60a1309a03a
>>>>>
>>>>> Then I added some logging to *"vm/c1/c1_Canonicalizer.cpp"*:
>>>>>
>>>>> void Canonicalizer::do_UnsafeGetRaw(UnsafeGetRaw* x) {
>>>>>   if (OptimizeUnsafes) do_UnsafeRawOp(x);
>>>>>   tty->print_cr("Canonicalizer: do_UnsafeGetRaw id %d: base = id %d, index = id %d, log2_scale = %d",
>>>>>                 x->id(), x->base()->id(), x->index()->id(), x->log2_scale());
>>>>> }
>>>>>
>>>>> void Canonicalizer::do_UnsafePutRaw(UnsafePutRaw* x) {
>>>>>   if (OptimizeUnsafes) do_UnsafeRawOp(x);
>>>>>   tty->print_cr("Canonicalizer: do_UnsafePutRaw id %d: base = id %d, index = id %d, log2_scale = %d",
>>>>>                 x->id(), x->base()->id(), x->index()->id(), x->log2_scale());
>>>>> }
>>>>>
>>>>>
>>>>> So I ran the test, calculating the address as:
>>>>> - *"int * long"* (the int is the index and the long is 8L)
>>>>> - *"long * long"* (the first long is the index and the second long is 8L)
>>>>> - *"int * int"* (the first int is the index and the second int is 8)
>>>>>
>>>>> Here are the logs:
>>>>>
>>>>> *int * long:*
>>>>> Canonicalizer: do_UnsafeGetRaw id 18: base = id 16, index = id 17, log2_scale = 0
>>>>> Canonicalizer: do_UnsafeGetRaw id 20: base = id 16, index = id 19, log2_scale = 0
>>>>> Canonicalizer: do_UnsafeGetRaw id 22: base = id 16, index = id 21, log2_scale = 0
>>>>> Canonicalizer: do_UnsafeGetRaw id 24: base = id 16, index = id 23, log2_scale = 0
>>>>> Canonicalizer: do_UnsafePutRaw id 33: base = id 13, index = id 27, log2_scale = 3
>>>>> Canonicalizer: do_UnsafeGetRaw id 36: base = id 13, index = id 27, log2_scale = 3
>>>>>
>>>>> *long * long:*
>>>>> Canonicalizer: do_UnsafeGetRaw id 18: base = id 16, index = id 17, log2_scale = 0
>>>>> Canonicalizer: do_UnsafeGetRaw id 20: base = id 16, index = id 19, log2_scale = 0
>>>>> Canonicalizer: do_UnsafeGetRaw id 22: base = id 16, index = id 21, log2_scale = 0
>>>>> Canonicalizer: do_UnsafeGetRaw id 24: base = id 16, index = id 23, log2_scale = 0
>>>>> Canonicalizer: do_UnsafePutRaw id 35: base = id 13, index = id 14, log2_scale = 3
>>>>> Canonicalizer: do_UnsafeGetRaw id 37: base = id 13, index = id 14, log2_scale = 3
>>>>>
>>>>> *int * int:*
>>>>> Canonicalizer: do_UnsafeGetRaw id 18: base = id 16, index = id 17, log2_scale = 0
>>>>> Canonicalizer: do_UnsafeGetRaw id 20: base = id 16, index = id 19, log2_scale = 0
>>>>> Canonicalizer: do_UnsafeGetRaw id 22: base = id 16, index = id 21, log2_scale = 0
>>>>> Canonicalizer: do_UnsafeGetRaw id 24: base = id 16, index = id 23, log2_scale = 0
>>>>> Canonicalizer: do_UnsafePutRaw id 33: base = id 13, index = id 29, log2_scale = 0
>>>>> Canonicalizer: do_UnsafeGetRaw id 36: base = id 13, index = id 29, log2_scale = 0
>>>>> Canonicalizer: do_UnsafePutRaw id 19: base = id 8, index = id 15, log2_scale = 0
>>>>> Canonicalizer: do_UnsafeGetRaw id 22: base = id 8, index = id 15, log2_scale = 0
>>>>>
>>>>> As you can see, in the problematic runs (*"int * long"* and *"long * long"*) there are two scalings,
>>>>> one for *"Unsafe.put"* and the other for *"Unsafe.get"*, and both instructions point to
>>>>> the same *"base"* and *"index"* instructions.
>>>>> This means the address is scaled one more time than it should be, since there should be only one scaling.
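The pattern described here can be spotted mechanically: two or more Unsafe raw ops share the same base and index ids, and each has a non-zero log2_scale. The hypothetical helper below scans log lines in the print_cr format of the instrumented Canonicalizer and reports such "base/index" pairs. It is only a heuristic for this specific symptom and does not prove that the generated code is wrong.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ScaleLogScanner {
    // Matches lines such as:
    // "Canonicalizer: do_UnsafePutRaw id 33: base = id 13, index = id 27, log2_scale = 3"
    private static final Pattern OP = Pattern.compile(
        "do_Unsafe(?:Get|Put)Raw id \\d+: base = id (\\d+), index = id (\\d+), log2_scale = (\\d+)");

    /** Returns "base/index" id pairs used by more than one op with a non-zero log2_scale. */
    static List<String> sharedScaledIndexes(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted for stable output
        for (String line : lines) {
            Matcher m = OP.matcher(line);
            if (m.find() && Integer.parseInt(m.group(3)) != 0) {
                counts.merge(m.group(1) + "/" + m.group(2), 1, Integer::sum);
            }
        }
        List<String> shared = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() > 1) {
                shared.add(e.getKey());
            }
        }
        return shared;
    }

    public static void main(String[] args) {
        List<String> intTimesLong = Arrays.asList(
            "Canonicalizer: do_UnsafeGetRaw id 24: base = id 16, index = id 23, log2_scale = 0",
            "Canonicalizer: do_UnsafePutRaw id 33: base = id 13, index = id 27, log2_scale = 3",
            "Canonicalizer: do_UnsafeGetRaw id 36: base = id 13, index = id 27, log2_scale = 3");
        List<String> afterFix = Arrays.asList(
            "Canonicalizer: do_UnsafePutRaw id 35: base = id 13, index = id 14, log2_scale = 3",
            "Canonicalizer: do_UnsafeGetRaw id 37: base = id 13, index = id 14, log2_scale = 0");
        System.out.println(sharedScaledIndexes(intTimesLong)); // [13/27]
        System.out.println(sharedScaledIndexes(afterFix));     // []
    }
}
```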
>>>>>
>>>>>
>>>>> When I debugged the non-problematic run (*"int * int"*),
>>>>> I saw that *"instr->as_ArithmeticOp()"* always returns *"NULL"*, so the *"match_index_and_scale"* method always returns *"false"*.
>>>>> So there is no scaling.
>>>>> static bool match_index_and_scale(Instruction*  instr,
>>>>>                                   Instruction** index,
>>>>>                                   int*          log2_scale) {
>>>>>   ...
>>>>>
>>>>>   ArithmeticOp* arith = instr->as_ArithmeticOp();
>>>>>   if (arith != NULL) {
>>>>>     ...
>>>>>   }
>>>>>
>>>>>   return false;
>>>>> }
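The following is a deliberately simplified Java model of the behavior described above. It is not HotSpot code. In the model, a scale is folded into the Unsafe access only when the index expression is an arithmetic node multiplying by a power-of-two constant. All class names are hypothetical. Restricting scales to 1, 2, 4 and 8 is our assumption (modeled on x86 addressing modes). Shifts, which the real method may also handle, are left out.

```java
public class MatchIndexAndScaleModel {
    // Hypothetical, minimal stand-ins for C1 IR nodes.
    static abstract class Instr {}
    static final class Local extends Instr {}
    static final class Constant extends Instr {
        final long value;
        Constant(long value) { this.value = value; }
    }
    static final class Convert extends Instr {      // e.g. int-to-long widening
        final Instr value;
        Convert(Instr value) { this.value = value; }
    }
    static final class ArithmeticOp extends Instr { // only multiplication is modeled
        final Instr x, y;
        ArithmeticOp(Instr x, Instr y) { this.x = x; this.y = y; }
    }

    /** Result of a successful match: the unscaled index and log2 of the scale. */
    static final class Match {
        final Instr index;
        final int log2Scale;
        Match(Instr index, int log2Scale) { this.index = index; this.log2Scale = log2Scale; }
    }

    /** Returns a Match for "index * 2^k" (k in 0..3), or null when no scale can be folded. */
    static Match matchIndexAndScale(Instr instr) {
        if (!(instr instanceof ArithmeticOp)) {
            return null; // mirrors as_ArithmeticOp() returning NULL
        }
        ArithmeticOp arith = (ArithmeticOp) instr;
        if (arith.y instanceof Constant) {
            long c = ((Constant) arith.y).value;
            if (c > 0 && Long.bitCount(c) == 1 && Long.numberOfTrailingZeros(c) <= 3) {
                return new Match(arith.x, Long.numberOfTrailingZeros(c));
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Local i = new Local();
        // "int * long": ((long) i) * 8L, the multiply is the top-level node
        Match m1 = matchIndexAndScale(new ArithmeticOp(new Convert(i), new Constant(8)));
        // "int * int": (long) (i * 8), the conversion is the top-level node
        Match m2 = matchIndexAndScale(new Convert(new ArithmeticOp(i, new Constant(8))));
        System.out.println(m1 != null ? m1.log2Scale : -1); // 3
        System.out.println(m2 != null ? m2.log2Scale : -1); // -1
    }
}
```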
>>>>>
>>>>>
>>>>> Then I added my fix attempt, which prevents multiple scaling for Unsafe instructions that point to the same index instruction:
>>>>> void Canonicalizer::do_UnsafeRawOp(UnsafeRawOp* x) {
>>>>>   Instruction* base = NULL;
>>>>>   Instruction* index = NULL;
>>>>>   int log2_scale;
>>>>>
>>>>>   if (match(x, &base, &index, &log2_scale)) {
>>>>>     x->set_base(base);
>>>>>     x->set_index(index);
>>>>>     // The fix attempt here //////////////////////////////
>>>>>     if (index != NULL) {
>>>>>       if (index->is_pinned()) {
>>>>>         // Pinned index: assume another Unsafe op has already scaled it
>>>>>         log2_scale = 0;
>>>>>       } else {
>>>>>         if (log2_scale != 0) {
>>>>>           // First scaling of this index: pin it to mark it as scaled
>>>>>           index->pin();
>>>>>         }
>>>>>       }
>>>>>     }
>>>>>     ///////////////////////////////////////////////////////
>>>>>     x->set_log2_scale(log2_scale);
>>>>>     if (PrintUnsafeOptimization) {
>>>>>       tty->print_cr("Canonicalizer: UnsafeRawOp id %d: base = id %d, index = id %d, log2_scale = %d",
>>>>>                     x->id(), x->base()->id(), x->index()->id(), x->log2_scale());
>>>>>     }
>>>>>   }
>>>>> }
>>>>> In this fix attempt, if the Unsafe instruction is scaled, I pin its index instruction;
>>>>> on subsequent calls, if the index instruction is already pinned, I assume it has already been scaled, so no further scaling is needed.
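Stripped of the surrounding C1 machinery, the bookkeeping of this fix attempt fits in a few lines of Java. The sketch is hypothetical: it is not HotSpot code, and it says nothing about whether the approach is correct in general. It only reproduces the scale values seen in the after-fix log that follows: 3 for the put, and 0 for the get that shares its index.

```java
public class PinOnceModel {
    // Hypothetical stand-in for a C1 index instruction with a "pinned" flag.
    static final class Index {
        private boolean pinned;
        boolean isPinned() { return pinned; }
        void pin() { pinned = true; }
    }

    /**
     * Mirrors the fix attempt: the first Unsafe op that folds a non-zero scale
     * into an index pins it; later ops sharing that pinned index get log2_scale 0.
     */
    static int adjustLog2Scale(Index index, int log2Scale) {
        if (index != null) {
            if (index.isPinned()) {
                return 0;     // assume the index has already been scaled
            } else if (log2Scale != 0) {
                index.pin();  // remember that this index now carries a scale
            }
        }
        return log2Scale;
    }

    public static void main(String[] args) {
        Index shared = new Index();
        int putScale = adjustLog2Scale(shared, 3); // Unsafe.put sees the index first
        int getScale = adjustLog2Scale(shared, 3); // Unsafe.get shares the same index
        System.out.println(putScale + " " + getScale); // 3 0
    }
}
```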
>>>>>
>>>>> After this fix, I reran the problematic test (*"int * long"*) and it passes, with these logs:
>>>>> *int * long (after fix):*
>>>>> Canonicalizer: do_UnsafeGetRaw id 18: base = id 16, index = id 17, log2_scale = 0
>>>>> Canonicalizer: do_UnsafeGetRaw id 20: base = id 16, index = id 19, log2_scale = 0
>>>>> Canonicalizer: do_UnsafeGetRaw id 22: base = id 16, index = id 21, log2_scale = 0
>>>>> Canonicalizer: do_UnsafeGetRaw id 24: base = id 16, index = id 23, log2_scale = 0
>>>>> Canonicalizer: do_UnsafePutRaw id 35: base = id 13, index = id 14, log2_scale = 3
>>>>> Canonicalizer: do_UnsafeGetRaw id 37: base = id 13, index = id 14, log2_scale = 0
>>>>> Canonicalizer: do_UnsafePutRaw id 21: base = id 8, index = id 11, log2_scale = 3
>>>>> Canonicalizer: do_UnsafeGetRaw id 23: base = id 8, index = id 11, log2_scale = 0
>>>>>
>>>>> I am not sure whether my fix attempt is a real fix, or whether there are better fixes.
>>>>>
>>>>> Regards.
>>>>>
>>>>> --
>>>>>
>>>>> Serkan ÖZAL
>>>>>
>>>>>
>>>>>> Btw (thanks to one of my colleagues), when the address calculation in the loop is
>>>>>> changed to
>>>>>> long address = baseAddress + (i * 8)
>>>>>> the test passes. The only difference is that the next long pointer is calculated
>>>>>> using an integer 8 instead of a long 8.
>>>>>> ```
>>>>>> for (int i = 0; i < count; i++) {
>>>>>>     long address = baseAddress + (i * 8); // <--- here, integer 8 instead of long 8
>>>>>>     long expected = i;
>>>>>>     unsafe.putLong(address, expected);
>>>>>>     long actual = unsafe.getLong(address);
>>>>>>     if (expected != actual) {
>>>>>>         throw new AssertionError("Expected: " + expected + ", Actual: " + actual);
>>>>>>     }
>>>>>> }
>>>>>> ```
>>>>>> On Tue, Jun 9, 2015 at 1:07 PM Mehmet Dogan <mehmet at hazelcast.com> wrote:
>>>>>> > Hi all,
>>>>>> >
>>>>>> > While I was testing my app using Java 8, I encountered the previously
>>>>>> > reported sun.misc.Unsafe issue.
>>>>>> >
>>>>>> > https://bugs.openjdk.java.net/browse/JDK-8076445
>>>>>> >
>>>>>> > http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2015-April/017685.html
>>>>>> >
>>>>>> > The issue status says it is resolved with resolution "Cannot Reproduce", but
>>>>>> > unfortunately it is still reproducible using "1.8.0_60-ea-b18" and
>>>>>> > "1.9.0-ea-b67".
>>>>>> >
>>>>>> > The test is very simple:
>>>>>> >
>>>>>> > ```
>>>>>> > public static void main(String[] args) throws Exception {
>>>>>> >     Unsafe unsafe = findUnsafe();
>>>>>> >     // 10000 pass
>>>>>> >     // 100000 jvm crash
>>>>>> >     // 1000000 fail
>>>>>> >     int count = 100000;
>>>>>> >     long size = count * 8L;
>>>>>> >     long baseAddress = unsafe.allocateMemory(size);
>>>>>> >
>>>>>> >     try {
>>>>>> >         for (int i = 0; i < count; i++) {
>>>>>> >             long address = baseAddress + (i * 8L);
>>>>>> >
>>>>>> >             long expected = i;
>>>>>> >             unsafe.putLong(address, expected);
>>>>>> >
>>>>>> >             long actual = unsafe.getLong(address);
>>>>>> >
>>>>>> >             if (expected != actual) {
>>>>>> >                 throw new AssertionError("Expected: " + expected + ", Actual: " + actual);
>>>>>> >             }
>>>>>> >         }
>>>>>> >     } finally {
>>>>>> >         unsafe.freeMemory(baseAddress);
>>>>>> >     }
>>>>>> > }
>>>>>> > ```
>>>>>> > It does not fail up to version 1.8.0.31; starting with 1.8.0.40, the test
>>>>>> > fails consistently.
>>>>>> >
>>>>>> > - With iteration count 10000, the test passes.
>>>>>> > - With iteration count 100000, the JVM crashes with SIGSEGV.
>>>>>> > - With iteration count 1000000, the test fails with AssertionError.
>>>>>> >
>>>>>> > When any one of compilation (-Xint), inlining (-XX:-Inline) or
>>>>>> > on-stack replacement (-XX:-UseOnStackReplacement) is disabled, the test
>>>>>> > does not fail at all.
>>>>>> >
>>>>>> > I tested on these platforms:
>>>>>> > - Centos-7/openjdk-1.8.0.45
>>>>>> > - OSX/oraclejdk-1.8.0.40
>>>>>> > - OSX/oraclejdk-1.8.0.45
>>>>>> > - OSX/oraclejdk-1.8.0_60-ea-b18
>>>>>> > - OSX/oraclejdk-1.9.0-ea-b67
>>>>>> >
>>>>>> > The previous issue comment (
>>>>>> > https://bugs.openjdk.java.net/browse/JDK-8076445?focusedCommentId=13633043#comment-13633043)
>>>>>> > says "Cannot reproduce based on the latest version". I hope that "latest
>>>>>> > version" does not refer to '1.8.0_60-ea-b18' or '1.9.0-ea-b67', because
>>>>>> > both are failing.
>>>>>> >
>>>>>> > I'm looking forward to hearing from you.
>>>>>> >
>>>>>> > Thanks,
>>>>>> > -Mehmet Dogan-
>>>>>> > --
>>>>>> >
>>>>>> > @mmdogan
>>>>>> >
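Mehmet's test calls findUnsafe() without showing it, so a self-contained version is sketched below. Reading the private theUnsafe field reflectively is a widely used idiom. It is only our assumption about what findUnsafe() does, and the class and method names (UnsafeReproducer, run) are hypothetical. According to the report, count = 100000 crashes on affected 8u40+ builds. On a JVM with the fix, the test is expected to simply pass.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

public class UnsafeReproducer {
    // Assumed implementation of the thread's findUnsafe(): read the private static
    // "theUnsafe" field reflectively (a common idiom, not a supported API).
    static Unsafe findUnsafe() {
        try {
            Field field = Unsafe.class.getDeclaredField("theUnsafe");
            field.setAccessible(true);
            return (Unsafe) field.get(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("sun.misc.Unsafe is not accessible", e);
        }
    }

    // Writes i into slot i of an off-heap long array and reads it back immediately.
    static void run(int count) {
        Unsafe unsafe = findUnsafe();
        long size = count * 8L;
        long baseAddress = unsafe.allocateMemory(size);
        try {
            for (int i = 0; i < count; i++) {
                long address = baseAddress + (i * 8L);
                long expected = i;
                unsafe.putLong(address, expected);
                long actual = unsafe.getLong(address);
                if (expected != actual) {
                    throw new AssertionError("Expected: " + expected + ", Actual: " + actual);
                }
            }
        } finally {
            unsafe.freeMemory(baseAddress);
        }
    }

    public static void main(String[] args) {
        // Per the report: 10000 passes, 100000 crashes, 1000000 fails on affected builds.
        int count = args.length > 0 ? Integer.parseInt(args[0]) : 100000;
        run(count);
        System.out.println("passed with count = " + count);
    }
}
```

Per the report, adding -Xint, -XX:-Inline or -XX:-UseOnStackReplacement to the java command line made the failure disappear on affected builds. This gives a quick way to check whether a crash in this test is compiler-related.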
>>>>>
>>>>>
>>>>> --
>>>>> Serkan ÖZAL
>>>>> Remotest Software Engineer
>>>>> GSM: +90 542 680 39 18
>>>>> Twitter: @serkan_ozal
>>>>>
>>>>
>>>
>>
>>
>
More information about the hotspot-compiler-dev mailing list