RFR: 8263718: unused-result warning happens at os_linux.cpp
Thomas Stüfe
thomas.stuefe at gmail.com
Sun Mar 21 15:45:39 UTC 2021
On Sun, Mar 21, 2021 at 2:30 PM David Holmes <david.holmes at oracle.com>
wrote:
> On 21/03/2021 5:22 pm, Thomas Stüfe wrote:
> > On Sun, Mar 21, 2021 at 7:43 AM Yasumasa Suenaga <
> ysuenaga at openjdk.java.net>
> > wrote:
> >
> >> On Sat, 20 Mar 2021 12:35:18 GMT, Yasumasa Suenaga <
> ysuenaga at openjdk.org>
> >> wrote:
> >>
> >>>> @YaSuenag The code generated by debug builds is often significantly
> >> different from release builds. You might want to check the latter
> instead
> >> to figure out if this has any effect on what we actually ship.
> >>>
> >>> @magicus I checked assembly code of release build, it is similar to
> >> fastdebug build. The stack (RSP) is expanded.
> >>> (I confirmed it with Visual Studio 16.9.2 because I received update
> >> notification before your reply...)
> >>> // Try to randomize the cache line index of hot stack frames.
> >>> // This helps when threads of the same stack traces evict each
> other's
> >>> // cache lines. The threads can be either from the same JVM
> instance,
> >> or
> >>> // from different JVM instances. The benefit is especially true for
> >>> // processors with hyperthreading technology.
> >>> static int counter = 0;
> >>> int pid = os::current_process_id();
> >>> 00007FF80F4E10ED mov eax,dword ptr [_initial_pid
> >> (07FF80F9D8164h)]
> >>> 00007FF80F4E10F3 test eax,eax
> >>> 00007FF80F4E10F5 jne thread_native_entry+3Dh (07FF80F4E10FDh)
> >>> 00007FF80F4E10F7 call qword ptr [__imp__getpid
> >> (07FF80F6EF748h)]
> >>> _alloca(((pid ^ counter++) & 7) * 128);
> >>> 00007FF80F4E10FD mov ecx,dword ptr [counter (07FF80F9D8300h)]
> >>> 00007FF80F4E1103 mov edx,ecx
> >>> 00007FF80F4E1105 xor edx,eax
> >>> 00007FF80F4E1107 inc ecx
> >>> 00007FF80F4E1109 mov dword ptr [counter (07FF80F9D8300h)],ecx
> >>> 00007FF80F4E110F and edx,7
> >>> 00007FF80F4E1112 shl edx,7
> >>> 00007FF80F4E1115 mov eax,edx
> >>> 00007FF80F4E1117 lea rcx,[rdx+0Fh]
> >>> 00007FF80F4E111B cmp rcx,rax
> >>> 00007FF80F4E111E ja thread_native_entry+6Ah (07FF80F4E112Ah)
> >>> 00007FF80F4E1120 mov rcx,0FFFFFFFFFFFFFF0h
> >>> 00007FF80F4E112A and rcx,0FFFFFFFFFFFFFFF0h
> >>> 00007FF80F4E112E mov rax,rcx
> >>> 00007FF80F4E1131 call __chkstk (07FF80F6ED370h)
> >>> 00007FF80F4E1136 sub rsp,rcx
> >>
> >>> Did you measure on Alpine too, with muslc? And the XXXBsds? Are we sure
> >> we
> >>> measure the right thing? I wish there were regression tests telling us
> >> when
> >>> to re-apply this optimization.
> >>
> >> I think we can decide by whether `alloca()` or equivalent stack
> operation
> >> is contained in current binary.
> >> If current binary does not contain it like JDK 17 Linux x64, we can
> remove
> >> it because it already does not work - performance degradation will not
> >> happen.
> >>
> >>
> > How do you know this? We may have had a performance degradation since
> years
> > without noticing. The question is, if a performance optimization,
> > considered necessary in the past, did bitrot, should we remove or repair
> it.
> >
> > In its context, I think Alpine + musl libc is also ok to remove it
> because
> >> I confirmed JDK 17 for Alpine x64 (from jdk.java.net) does not contain
> >> stack operation same as x64.
> >>
> >> 0000000000b889b0 <thread_native_entry(Thread*)>:
> >> b889b0: 55 push %rbp
> >> b889b1: 48 89 e5 mov %rsp,%rbp
> >> b889b4: 41 56 push %r14
> >> b889b6: 41 55 push %r13
> >> b889b8: 49 89 fd mov %rdi,%r13
> >> b889bb: 41 54 push %r12
> >> b889bd: 53 push %rbx
> >> b889be: e8 6d a5 1a 00 callq d32f30
> >> <Thread::record_stack_base_and_size()>
> >> b889c3: e8 e8 fb 6a ff callq 2385b0 <getpid at plt>
> >> b889c8: 4c 89 ef mov %r13,%rdi
> >> b889cb: 83 05 b6 98 61 00 01 addl $0x1,0x6198b6(%rip)
> >> # 11a2288 <thread_native_entry(Thread*)::counter>
> >> b889d2: e8 f9 a4 1a 00 callq d32ed0
> >> <Thread::initialize_thread_current()>
> >> b889d7: 49 8b 9d 68 02 00 00 mov 0x268(%r13),%rbx
> >> b889de: 31 c0 xor %eax,%eax
> >>
> >>
> > which does not prove it is not beneficial, it just proves it is not
> there.
>
> Right. It may be that it is of no benefit on glibc because glibc already
> does something clever here. But musl may not, in which case repairing
> the alloca call may improve performance on musl.
>
> This is frustrating because on the one hand there is no point fixing a
> warning for something the compiler elides anyway; while on the other we
> can't even know if this quirky optimisation would be of benefit under
> some conditions.
>
> And we don't know how to try and measure it regardless. :(
>
>
Sigh. This feels like cargo cult programming. This is really frustrating.
..Thomas
David
> -----
>
> >
> >>> (Please leave the alloca in the AIX implementation; we currently don't
> >> have
> >>> the cycles to run regression tests for this)
> >>
> >> Ok, I got it.
> >>
> >>
> > Cheers, Thomas
> >
> >
> >> -------------
> >>
> >> PR: https://git.openjdk.java.net/jdk/pull/3042
> >>
>
More information about the hotspot-dev
mailing list