RFR: 8263718: unused-result warning happens at os_linux.cpp

Sun Mar 21 15:45:39 UTC 2021

On Sun, Mar 21, 2021 at 2:30 PM David Holmes <david.holmes at oracle.com>
wrote:

> On 21/03/2021 5:22 pm, Thomas Stüfe wrote:
> > On Sun, Mar 21, 2021 at 7:43 AM Yasumasa Suenaga <
> ysuenaga at openjdk.java.net>
> > wrote:
> >
> >> On Sat, 20 Mar 2021 12:35:18 GMT, Yasumasa Suenaga <
> ysuenaga at openjdk.org>
> >> wrote:
> >>
> >>>> @YaSuenag The code generated by debug builds is often significantly
> >> different from release builds. You might want to check the latter
> instead
> >> to figure out if this has any effect on what we actually ship.
> >>>
> >>> @magicus I checked assembly code of release build, it is similar to
> >> fastdebug build. The stack (RSP) is expanded.
> >>> (I confirmed it with Visual Studio 16.9.2 because I received update
> >> notification before your reply...)
> >>>    // Try to randomize the cache line index of hot stack frames.
> >>>    // This helps when threads of the same stack traces evict each
> other's
> >>>    // cache lines. The threads can be either from the same JVM
> instance,
> >> or
> >>>    // from different JVM instances. The benefit is especially true for
> >>>    // processors with hyperthreading technology.
> >>>    static int counter = 0;
> >>>    int pid = os::current_process_id();
> >>> 00007FF80F4E10ED  mov         eax,dword ptr [_initial_pid
> >> (07FF80F9D8164h)]
> >>> 00007FF80F4E10F3  test        eax,eax
> >>> 00007FF80F4E10F5  jne         thread_native_entry+3Dh (07FF80F4E10FDh)
> >>> 00007FF80F4E10F7  call        qword ptr [__imp__getpid
> >> (07FF80F6EF748h)]
> >>>    _alloca(((pid ^ counter++) & 7) * 128);
> >>> 00007FF80F4E10FD  mov         ecx,dword ptr [counter (07FF80F9D8300h)]
> >>> 00007FF80F4E1103  mov         edx,ecx
> >>> 00007FF80F4E1105  xor         edx,eax
> >>> 00007FF80F4E1107  inc         ecx
> >>> 00007FF80F4E1109  mov         dword ptr [counter (07FF80F9D8300h)],ecx
> >>> 00007FF80F4E110F  and         edx,7
> >>> 00007FF80F4E1112  shl         edx,7
> >>> 00007FF80F4E1115  mov         eax,edx
> >>> 00007FF80F4E1117  lea         rcx,[rdx+0Fh]
> >>> 00007FF80F4E111B  cmp         rcx,rax
> >>> 00007FF80F4E111E  ja          thread_native_entry+6Ah (07FF80F4E112Ah)
> >>> 00007FF80F4E1120  mov         rcx,0FFFFFFFFFFFFFF0h
> >>> 00007FF80F4E112A  and         rcx,0FFFFFFFFFFFFFFF0h
> >>> 00007FF80F4E112E  mov         rax,rcx
> >>> 00007FF80F4E1131  call        __chkstk (07FF80F6ED370h)
> >>> 00007FF80F4E1136  sub         rsp,rcx
> >>
> >>> Did you measure on Alpine too, with muslc? And the XXXBsds? Are we sure
> >> we
> >>> measure the right thing? I wish there were regression tests telling us
> >> when
> >>> to re-apply this optimization.
> >>
> >> I think we can decide by whether `alloca()` or equivalent stack
> operation
> >> is contained in current binary.
> >> If current binary does not contain it like JDK 17 Linux x64, we can
> remove
> >> it because it already does not work - performance degradation will not
> >> happen.
> >>
> >>
> > How do you know this? We may have had a performance degradation since
> years
> > without noticing.  The question is, if a performance optimization,
> > considered necessary in the past, did bitrot, should we remove or repair
> it.
> >
> > In its context, I think Alpine + musl libc is also ok to remove it
> because
> >> I confirmed JDK 17 for Alpine x64 (from jdk.java.net) does not contain
> >> stack operation same as x64.
> >>
> >> 0000000000b889b0 <thread_native_entry(Thread*)>:
> >>    b889b0:       55                      push   %rbp
> >>    b889b1:       48 89 e5                mov    %rsp,%rbp
> >>    b889b4:       41 56                   push   %r14
> >>    b889b6:       41 55                   push   %r13
> >>    b889b8:       49 89 fd                mov    %rdi,%r13
> >>    b889bb:       41 54                   push   %r12
> >>    b889bd:       53                      push   %rbx
> >>    b889be:       e8 6d a5 1a 00          callq  d32f30
> >> <Thread::record_stack_base_and_size()>
> >>    b889c3:       e8 e8 fb 6a ff          callq  2385b0 <getpid at plt>
> >>    b889c8:       4c 89 ef                mov    %r13,%rdi
> >>    b889cb:       83 05 b6 98 61 00 01    addl   $0x1,0x6198b6(%rip)
> >> # 11a2288 <thread_native_entry(Thread*)::counter>
> >>    b889d2:       e8 f9 a4 1a 00          callq  d32ed0
> >> <Thread::initialize_thread_current()>
> >>    b889d7:       49 8b 9d 68 02 00 00    mov    0x268(%r13),%rbx
> >>    b889de:       31 c0                   xor    %eax,%eax
> >>
> >>
> > which does not prove it is not beneficial, it just proves it is not
> there.
>
> Right. It may be that it is of no benefit on glibc because glibc already
> does something clever here. But musl may not, in which case repairing
> the alloca call may improve performance on musl.
>
> This is frustrating because on the one hand there is no point fixing a
> warning for something the compiler elides anyway; while on the other we
> can't even know if this quirky optimisation would be of benefit under
> some conditions.
>
> And we don't know how to try and measure it regardless. :(
>
>
Sigh. This feels like cargo cult programming. This is really frustrating.

..Thomas

David
> -----
>
> >
> >>> (Please leave the alloca in the AIX implementation; we currently don't
> >> have
> >>> the cycles to run regression tests for this)
> >>
> >> Ok, I got it.
> >>
> >>
> > Cheers, Thomas
> >
> >
> >> -------------
> >>
> >> PR: https://git.openjdk.java.net/jdk/pull/3042
> >>
>