Undefined behaviour in hotspot

Wed Apr 23 17:47:05 UTC 2014

Hi,

this didn't let me sleep so I did some deeper investigation. The outcome is
that I'm pretty sure this is a bug in gcc 4.9. Following the details:

It seems that 'MacroAssembler::jump_cc(Assembler::Condition,
AddressLiteral)' is getting compiled incorrectly. Looking at the assembly
of this function in gdb shows the following:

   0xf7195db0
<_ZN14MacroAssembler7jump_ccEN9Assembler9ConditionE14AddressLiteral>:
 push   %ebp
   0xf7195db1
<_ZN14MacroAssembler7jump_ccEN9Assembler9ConditionE14AddressLiteral+1>:
 mov    %esp,%ebp
   0xf7195db3
<_ZN14MacroAssembler7jump_ccEN9Assembler9ConditionE14AddressLiteral+3>:
 push   %esi
   0xf7195db4
<_ZN14MacroAssembler7jump_ccEN9Assembler9ConditionE14AddressLiteral+4>:
 push   %ebx
   0xf7195db5
<_ZN14MacroAssembler7jump_ccEN9Assembler9ConditionE14AddressLiteral+5>:
 call   0xf67a88e5 <__x86.get_pc_thunk.bx>
   0xf7195dba
<_ZN14MacroAssembler7jump_ccEN9Assembler9ConditionE14AddressLiteral+10>:
 add    $0x7b61de,%ebx
   0xf7195dc0
<_ZN14MacroAssembler7jump_ccEN9Assembler9ConditionE14AddressLiteral+16>:
 mov    0x8(%ebp),%esi
   0xf7195dc3
<_ZN14MacroAssembler7jump_ccEN9Assembler9ConditionE14AddressLiteral+19>:
 mov    0xc(%esi),%eax
   0xf7195dc6
<_ZN14MacroAssembler7jump_ccEN9Assembler9ConditionE14AddressLiteral+22>:
 mov    0x4(%eax),%edx
   0xf7195dc9
<_ZN14MacroAssembler7jump_ccEN9Assembler9ConditionE14AddressLiteral+25>:
 test   %edx,%edx
   0xf7195dcb
<_ZN14MacroAssembler7jump_ccEN9Assembler9ConditionE14AddressLiteral+27>:
 je     0xf7195df7
<_ZN14MacroAssembler7jump_ccEN9Assembler9ConditionE14AddressLiteral+71>
   0xf7195dcd
<_ZN14MacroAssembler7jump_ccEN9Assembler9ConditionE14AddressLiteral+29>:
 lea    -0x386e28(%ebx),%eax
   0xf7195dd3
<_ZN14MacroAssembler7jump_ccEN9Assembler9ConditionE14AddressLiteral+35>:
 push   %eax
   0xf7195dd4
<_ZN14MacroAssembler7jump_ccEN9Assembler9ConditionE14AddressLiteral+36>:
 lea    -0x386884(%ebx),%eax
   0xf7195dda
<_ZN14MacroAssembler7jump_ccEN9Assembler9ConditionE14AddressLiteral+42>:
 push   %eax
   0xf7195ddb
<_ZN14MacroAssembler7jump_ccEN9Assembler9ConditionE14AddressLiteral+43>:
 lea    -0x3872f0(%ebx),%eax
   0xf7195de1
<_ZN14MacroAssembler7jump_ccEN9Assembler9ConditionE14AddressLiteral+49>:
 push   $0xe3
   0xf7195de6
<_ZN14MacroAssembler7jump_ccEN9Assembler9ConditionE14AddressLiteral+54>:
 push   %eax
   0xf7195de7
<_ZN14MacroAssembler7jump_ccEN9Assembler9ConditionE14AddressLiteral+55>:
 call   0xf6c9b730 <_Z15report_vm_errorPKciS0_S0_>
   0xf7195dec
<_ZN14MacroAssembler7jump_ccEN9Assembler9ConditionE14AddressLiteral+60>:
 call   0xf72bc1c0 <breakpoint>
   0xf7195df1
<_ZN14MacroAssembler7jump_ccEN9Assembler9ConditionE14AddressLiteral+65>:
 mov    0xc(%esi),%eax
   0xf7195df4
<_ZN14MacroAssembler7jump_ccEN9Assembler9ConditionE14AddressLiteral+68>:
 add    $0x10,%esp
   0xf7195df7
<_ZN14MacroAssembler7jump_ccEN9Assembler9ConditionE14AddressLiteral+71>:
 mov    0x8(%eax),%edx
   0xf7195dfa
<_ZN14MacroAssembler7jump_ccEN9Assembler9ConditionE14AddressLiteral+74>:
 mov    %edx,0x4(%eax)
   0xf7195dfd:    nop
   0xf7195dfe:    xchg   %ax,%ax
   0xf7195e00 <_ZN14MacroAssembler10verify_oopEP12RegisterImplPKc.part.49>:
   push   %ebp
   0xf7195e01
<_ZN14MacroAssembler10verify_oopEP12RegisterImplPKc.part.49+1>:    mov
 %esp,%ebp

Besides the fact that the assembly for this functions is much too small,
you can see that the function has no return statement. Ath the end it just
runs into the following function (in this case
''MacroAssembler::verify_oop(RegisterImpl*, char const*)").

The check at:

   0xf7195dc6 :    mov    0x4(%eax),%edx
   0xf7195dc9 :    test   %edx,%edx

is from the assertion in InstructionMark constructor:

    InstructionMark(AbstractAssembler* assm) : _assm(assm) {
      assert(assm->inst_mark() == NULL, "overlapping instructions");
      _assm->set_inst_mark();
    }

which is at the beginning of MacroAssembler::jump_cc():

void MacroAssembler::jump_cc(Condition cc, AddressLiteral dst) {
  if (reachable(dst)) {
    InstructionMark im(this);

If 'assm->inst_mark()' is not zero, we jump right to 0xf7195df7 where we
set the instruction mark of the code section:

   0xf7195df7 :    mov    0x8(%eax),%edx
   0xf7195dfa :    mov    %edx,0x4(%eax)

'_assm->set_inst_mark()' expands to 'code_section()->set_mark()' which in
turn expands to '{ _mark = _end; }'. In register 'eax' we have the
CodeSection object where the members '_end' and '_mark' have the offsets 8
and 4 respectively. So far so good - everything looks pretty until here.

But afterwards we have two NOPs and then we unconditionally enter
'MacroAssembler::verify_oop()" which is obviously wrong!
Later we indirectly call "Assembler::push(RegisterImpl*)" from
'MacroAssembler::verify_oop()" but at that point we're already lost and we
end up with a wrong CodeSection pointer as described in my previous mail.

Funny enough, the "-fsanitize=undefined" warnings are simply wrong for the
exactly same reason. If we disassemble a version of
'MacroAssembler::jump_cc(Assembler::Condition, AddressLiteral)' wich was
compiled with '-fsanitize=undefined' we will see:

   0xf71935d4 <+100>:    mov    -0x1c(%ebp),%esi
   0xf71935d7 <+103>:    test   %esi,%esi
   0xf71935d9 <+105>:    je     0xf719362f
<_ZN14MacroAssembler7jump_ccEN9Assembler9ConditionE14AddressLiteral+191>
   0xf71935db <+107>:    mov    0xc(%esi),%esi
   0xf71935de <+110>:    test   %esi,%esi
   0xf71935e0 <+112>:    je     0xf719361a
<_ZN14MacroAssembler7jump_ccEN9Assembler9ConditionE14AddressLiteral+170>
   0xf71935e2 <+114>:    mov    0x8(%esi),%edi
   0xf71935e5 <+117>:    test   %esi,%esi
   0xf71935e7 <+119>:    je     0xf7193605
<_ZN14MacroAssembler7jump_ccEN9Assembler9ConditionE14AddressLiteral+149>
   0xf71935e9 <+121>:    mov    %edi,0x4(%esi)
   0xf71935ec <+124>:    cmpl   $0xfffffff8,0x10(%ebp)
   0xf71935f0 <+128>:    je     0xf71935f2
<_ZN14MacroAssembler7jump_ccEN9Assembler9ConditionE14AddressLiteral+130>
   0xf71935f2 <+130>:    push   %eax
   0xf71935f3 <+131>:    push   %eax
   0xf71935f4 <+132>:    lea    0xb750(%ebx),%eax
   0xf71935fa <+138>:    push   $0x0
   0xf71935fc <+140>:    push   %eax
   0xf71935fd <+141>:    call   0xf6797260 <__ubsan_handle_type_mismatch at plt
>
   0xf7193602 <+146>:    add    $0x10,%esp
   0xf7193605 <+149>:    lea    0xb768(%ebx),%eax
   0xf719360b <+155>:    push   %edx
   0xf719360c <+156>:    push   %edx
   0xf719360d <+157>:    push   $0x0
   0xf719360f <+159>:    push   %eax
   0xf7193610 <+160>:    call   0xf6797260 <__ubsan_handle_type_mismatch at plt
>

We now have NULL-checks all over in the code (which doesn't fire in our
case) but after we executed 'code_section()->set_mark()' at address
0xf71935e2 to 0xf71935e9 we unconditionally run into the ubsan ("undefined
behavior sanitizer) erro handlers which print the two warnings (noticed
that I've debugged the code to verify that we didn't jump into the ubsan
handlers after a 'test' instruction).

So to cut a long story short, I've identified '-fdevirtualize' as the
culprit. '-fdevirtualize' is automatically set if we compile with -O2 but
if I compile macroAssembler_x86.o only with '-O2 -fno-devirtualize' I get a
VM which at least starts up and passes some smoke tests. @Omair: could you
please verify this. I haven't tried x86_64 until now so I can not say if
this fix also helps there.

I'll try to prepare a usefully bug-report for GCC although I've no idea how
I could cut this down to a small and reproducible test case.

Regards,
Volker

On Tue, Apr 22, 2014 at 9:11 PM, Volker Simonis <volker.simonis at gmail.com>wrote:

> Hi Omair,
>
> I've downloaded and compiled a brand new version of gcc 4.9.0 (it was
> actually released today!)
>
> I could reproduce your problem on x86. On x86_64 there's a problem as
> well - the VM hangs during the initialization. Funny enough I had no
> problems on ppc64. A freshly compiled VM from jdk9/hs-comp ran without
> any problems. So this seems to be x86 related.
>
> I've further analyzed the problem you described and the good news is
> that it is not related to calling encoding() on a zero pointer - so
> hopefully no need to massively change HotSpot code. The assembly code
> of "Assembler::push(RegisterImpl* src)" looks as follows:
>
> 0xf6959174 <_ZN9Assembler4pushEP12RegisterImpl+20>:    mov
>  0xc(%ebp),%esi
> 0xf6959177 <_ZN9Assembler4pushEP12RegisterImpl+23>:    mov
>  0x8(%ebp),%edi
> 0xf695917a <_ZN9Assembler4pushEP12RegisterImpl+26>:    cmp    $0x7,%esi
> 0xf695917d <_ZN9Assembler4pushEP12RegisterImpl+29>:    jbe
> 0xf69591a3 <_ZN9Assembler4pushEP12RegisterImpl+67>
>
> 0xf695917f <_ZN9Assembler4pushEP12RegisterImpl+31>:    lea
> -0x3a0c4e(%ebx),%eax
> 0xf6959185 <_ZN9Assembler4pushEP12RegisterImpl+37>:    push   %eax
> 0xf6959186 <_ZN9Assembler4pushEP12RegisterImpl+38>:    lea
> -0x3a0c3d(%ebx),%eax
> 0xf695918c <_ZN9Assembler4pushEP12RegisterImpl+44>:    push   %eax
> 0xf695918d <_ZN9Assembler4pushEP12RegisterImpl+45>:    lea
> -0x39be48(%ebx),%eax
> 0xf6959193 <_ZN9Assembler4pushEP12RegisterImpl+51>:    push   $0x41
> 0xf6959195 <_ZN9Assembler4pushEP12RegisterImpl+53>:    push   %eax
> 0xf6959196 <_ZN9Assembler4pushEP12RegisterImpl+54>:    call
> 0xf6c9b730 <_Z15report_vm_errorPKciS0_S0_>
> 0xf695919b <_ZN9Assembler4pushEP12RegisterImpl+59>:    call
> 0xf72bc1c0 <breakpoint>
> 0xf69591a0 <_ZN9Assembler4pushEP12RegisterImpl+64>:    add    $0x10,%esp
>
> 0xf69591a3 <_ZN9Assembler4pushEP12RegisterImpl+67>:    mov
>  0xc(%edi),%edx
> 0xf69591a6 <_ZN9Assembler4pushEP12RegisterImpl+70>:    mov    %esi,%eax
> 0xf69591a8 <_ZN9Assembler4pushEP12RegisterImpl+72>:    or     $0x50,%eax
> 0xf69591ab <_ZN9Assembler4pushEP12RegisterImpl+75>:    mov
>  0x8(%edx),%ecx
> 0xf69591ae <_ZN9Assembler4pushEP12RegisterImpl+78>:    mov    %al,(%ecx)
>
> 'this' is loaded into register 'edi' and 'src' is loaded into register
> 'esi'. 'src' (i.e. 'esi') is compared against '0x7' which comes from
> the call to 'is_valid()'. If everything is fine, we proceed at address
> '0xf69591a3'. There we load the '_code_section' field with offest 0xc
> from the AbstractAssembler which is in register 'edi' and stroe it in
> register 'edx'. We OR the register encoding with '0x50' which succeeds
> altough the actual RegisterImpl pointer (i.e. 'src') is NULL.
>
> We then load the '_end' field with offest '0x8' from the CodeSection
> in register 'edx' into 'ecx'. But this gives us a wrong address (i.e.
> 0xcccccccc) and that's the reason why we crash when we try to write
> (i.e. emit_int8()) the register encoding from register 'al' to that
> address in 'ecx'.
>
> So the problem is that for some reason the CodeSection of the
> AbstractAssembler isn't initialized correctly. This may be related to
> the "codeBuffer.hpp:181:51: runtime error: member access within null
> pointer of type 'struct CodeSection'" - problem when compiling with
> "-fsanitize=undefined" as reported by you. I'll take a look at that
> tomorrow.
>
> Regards,
> Volker
>
>
>
>
> On Tue, Apr 22, 2014 at 6:34 PM, Omair Majid <omajid at redhat.com> wrote:
> > * Volker Simonis <volker.simonis at gmail.com> [2014-04-22 04:41]:
> >> I think the simplest and safest fix would be to make encoding() (and
> >> all the corresponding functions) static functions which take a
> >> Register as argument like:
> >>
> >> static int encoding(const RegisterImpl* r) { assert(is_valid(r),
> >> "invalid register"); return (intptr_t)r; }
> >>
> >> This wouldn't waste any more memory and it would be fully C++
> >> compliant at the price of a slightly more verbose usage:
> >>
> >> 2577    void Assembler::push(Register src) {
> >> 2578      int encode = prefix_and_encode(RegisterImpl::encoding(src));
> >> 2579
> >>
> >> And of course this would work with any compiler.
> >
> > Are you sure about this?
> >
> > It seems to me that a pointer created from an int does not satisfy the
> > conditions for a "safely-derived pointer". It has implementation defined
> > behaviour [1].
> >
> >> What do you think?
> >
> > That certainly works for me, but not sure for others.
> >
> > For the sake of safety, I would rather use a typedef for registers and a
> > separate class with static methods to operate on the typedef.
> >
> > Thanks,
> > Omair
> >
> > [1] http://cpp0x.centaur.ath.cx/basic.stc.dynamic.safety.html
> >
> > --
> > PGP Key: 66484681 (http://pgp.mit.edu/)
> > Fingerprint = F072 555B 0A17 3957 4E95  0056 F286 F14F 6648 4681
>