RFR: 8224974: Implement JEP 352
Andrew Dinn
adinn at redhat.com
Wed Jul 31 11:29:15 UTC 2019
Hi Aleksey,
I have an update regarding the change to the computation of the
CPU_FLUSH flag. After posting the new webrev I built a debug version
with the change that tests the clflush bit on x86_64. It crashed when
the assert is reached in VM_Version::supports_cpuflush:
# A fatal error has been detected by the Java Runtime Environment:
#
# Internal Error (src/hotspot/cpu/x86/vm_version_x86.hpp:978),
pid=29597, tid=29602
# assert((_features & ((uint64_t)(0x20000000000ULL))) != 0) failed:
clflush should be available
So, it seems the clflush bit is not set on my x86_64 box even though I
am able to execute clflush quite happily.
"Toto, it looks like we are no longer in Antarctica."
So, I will revert this change (in the next webrev).
regards,
Andrew Dinn
-----------
On 31/07/2019 11:55, Andrew Dinn wrote:
> Hi Aleksey
>
> On 30/07/2019 17:00, Aleksey Shipilev wrote:
>> On 7/30/19 5:04 PM, Andrew Dinn wrote:
>>> JEP 352 has now been targeted for inclusion in JDK14. The latest webrev
>>> for the implementation JIRA has been rebased to apply to the current
>>> tree. Is it now ok to push this change set?
>>>
>>> JIRA: https://bugs.openjdk.java.net/browse/JDK-8224974
>>> webrev: http://cr.openjdk.java.net/~adinn/8224974/webrev.09
>> Is it okay to have a late code review? Here it goes:
>
> Sure, welcome to the party even if most of the booze has been drunk :-)
>
> Thank you for a very thorough review. Comments (mostly in agreement) are
> inline below. They include a few refusals and a walk past some of the
> jumps pending confirmation they truly form part of the obstacle course.
>
> A new webrev against the latest jdk/jdk dev tree including all
> uncontested changes is here:
>
> http://cr.openjdk.java.net/~adinn/8224974/webrev.10/
>
>> === General:
>>
>> So if pre wbsync is no-op, why do we need to handle it everywhere? We seem to be falling through all
>> the way to the stub to do nothing there, maybe we should instead cut much earlier, e.g. when
>> inlining Unsafe.writeBackPresync0? Would it be better to not emit CacheWBPreSync at all?
>
> Ah, I see you brought your own bottle ;-)
>
> The pre sync is definitely not needed at present. However, I put it
> there because I didn't know for sure if some future port of this
> capability (e.g. to ppc) might need to sync prior writes before writing
> back cache lines. [Indeed, the old Intel documentation stated that
> pre-sync was needed on x86 for clflush to be safe but it is definitely
> not needed.]
>
> Anyway, folding out the pre-sync as you suggest would mean taking what
> is still potentially an architecture-specific decision in generic code.
> That may well prove to be a redundant precaution but it also costs
> little -- every affected /compile/ will last a tad longer and be a tad
> fatter in its memory footprint. If you think change is important I will
> happily oblige.
>
>> === General:
>>
>> IIRC, notproduct and PRODUCT defines are legacy, and should be avoided? develop and ASSERT must be
>> the substitutes for this. See some discussion here: https://bugs.openjdk.java.net/browse/JDK-8183287
>
> Ok done.
>
> I changed #ifndef PRODUCT to #ifdef ASSERT in library_call.cpp and
> unsafe.cpp. I also redefined global flag TraceMemoryWriteback using
> develop rather than notproduct.
>
> I also removed the AArch64 /product/ compiler option UsePOCForPOP. This
> allowed the JIT to use a memory writeback instruction with weakened
> semantics if the full writeback is not available in a product build.
> product was chosen because this was needed to establish a benchmark for
> writeback on existing kit when using a pseudo-NVRAM memory device based
> on volatile memory. The point was to be able to do an apples to apples
> comparison with writeback to a disk device. Obviously, it's not much
> use doing that test with a a non-product build. This is only important
> during development as a one-off test once the OS support for
> pseudo-NVRAM devices arrives in Linux AArch64 (which should happen well
> before new kit with the POP writeback arrives). I can still perform it
> using a custom build so I dropped this from the patch.
>
>> === src/hotspot/cpu/aarch64/aarch64.ad
>> src/hotspot/cpu/x86/x86.ad
>>
>> This should probably be just "!VM_Version...". Braces around the statement would not hurt either.
>>
>> 2196 if (VM_Version::supports_data_cache_line_flush() == false)
>> 2197 ret_value = false;
>
> Done.
>
>> === src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp
>> src/hotspot/cpu/x86/macroAssembler_x86.cpp
>>
>> Braces style mismatch, should be on the same line, as in the rest of the file:
>>
>> 5837 void MacroAssembler::cache_wb(Address line)
>> 5838 {
>>
>> 5856 void MacroAssembler::cache_wbsync(bool is_pre)
>> 5857 {
>
> Done.
>
>> === src/hotspot/cpu/x86/assembler_x86.cpp
>>
>> It feels like these comments are redundant, especially L8630 and L8646 which mention magic values
>> "6" and "7", not present in the code:
>>
>> 8624 // instruction prefix is 0x66
>>
>> 8627 // opcode family is 0x0f 0xAE
>>
>> 8630 // extended opcode byte is 7 == rdi
>>
>> 8640 // instruction prefix is 0x66
>>
>> 8643 // opcode family is 0x0f 0xAE
>>
>> 8646 // extended opcode byte is 6 == rsi
>
> Well, I think the question of redundancy depends on how you look at it.
> The comments are not there to explain that these values are the ones
> being passed to the emit functions -- that is pretty obvious. They are
> there to explain what those values mean i.e. to explain what elements of
> the fully assembled instruction the emit calls assembling piecemeal.
> Perhaps that reading woudl be clearer if each such equation of terms was
> done in reverse order:
>
> 8624 // 0x66 is instruction prefix
>
> 8627 // 0x0f 0xAE is opcode family
>
> 8630 // rdi == 7 is extended opcode byte
> . . .
>
> Given that the code is simply stuffing numbers (whether supplied as
> literals or as symbolic constants) into a byte stream I think these
> comments are a help when it comes to cross-checking each specific
> assembly against the corresponding numbers declared in the Intel
> manuals. So, I don't really want to remove them. Would you prefer me to
> reverse the wording as above?
>
>> === src/hotspot/cpu/x86/macroAssembler_x86.cpp
>>
>> These comments feel redundant too. Well, maybe they should instead talk about the choices the
>> subsequent code makes.
>>
>> 9915 // pick the correct implementation
>>
>> 9936 // pick the correct implementation
>
> I agree it would help to expand the comment to explain the logic of the
> choice in the first case:
>
> // prefer clwb (potentially parallel writeback without evict)
> // otherwise prefer clflushopt (potentially parallel writeback
> // with evict)
> // otherwise fallback on clflush (serial writeback with evict)
>
> In the second case the comment is redundant because the need for an
> sfence is covered by the existing comment inside the if:
>
> // need an sfence for post flush when using clflushopt or clwb
> // otherwise no no need for any synchroniaztion
>
>> === src/hotspot/cpu/x86/macroAssembler_x86.cpp
>> src/hotspot/cpu/x86/vm_version_x86.hpp
>>
>> The docs say "The CLFLUSH instruction was introduced with the SSE2 extensions; however, because it
>> has its own CPUID feature flag, it can be implemented in IA-32 processors that do not include the
>> SSE2 extensions. Also, detecting the presence of the SSE2 extensions with the CPUID instruction does
>> not guarantee that the CLFLUSH instruction is implemented in the processor."
>>
>> Yet, we have:
>>
>> 9910 // 64 bit cpus always support clflush
>> 9911 assert(VM_Version::supports_clflush(), "should not reach here on 32-bit");
>>
>> 505 #ifdef _LP64
>> 506 // cpflush is always available on x86_64
>> 507 result |= CPU_FLUSH;
>> 508 #else
>> 509 if (_cpuid_info.std_cpuid1_edx.bits.clflush != 0)
>> 510 result |= CPU_FLUSH;
>> 511 #endif
>>
>> 967 // 64 bit cpus always support clflush which writes back and evicts
>> 968 // on 32 bit cpus support is recorded via a feature flag
>>
>> 980 static bool supports_clflush() { return true; }
>>
>> I think the assert should just say "clflush should be available", and clflush cpu flag detected for
>> 64-bits too?
>
> I think that documentation is only talking about x86_32. I remember
> reading somewhere that clflush is always guaranteed to be available on
> x86_64 only I cannot remember where (I think maybe it was in Intel's
> libpmem code?). Anyway, I do not take the above to imply that clflush
> has been, or even can be, omitted on a 64-bit processor. However, let us
> assume that it can in principle be omitted from an x86_64
> implementation. If so then I'll suggest that such a beast is a thing
> only in the sense that the Kingdom of Antarctica is a thing (i.e. it is
> an unactualized possible). As a non-existence proof I'll note that the
> JVM already relies implicitly (i.e. without checking any CPU bits) on
> being able to call clflush on x86_64 (see icache_x86.cpp).
>
> Anyway, granted the possibility of said Magic Kingdom I have changed the
> assert to say "clflush should be available". I have also
>
> i) modified setting of the CPU_FLUSH flag to test the relevant bit on
> both 32 and 64 bits
>
> + // cpflush should always be available on x86_64
> + // test the flag anyway and assert later
> + if (_cpuid_info.std_cpuid1_edx.bits.clflush != 0)
> + result |= CPU_FLUSH;
> . . .
>
> ii) added an assert that it is set to the 64-bit version of
> supports_clflush:
>
> +#ifdef _LP64
> + static bool supports_clflush() {
> + assert((_features & CPU_FLUSH) != 0, "clflush should be available");
> + return true;
> + }
> . . .
>
>> === src/hotspot/cpu/x86/macroAssembler_x86.hpp
>>
>> Accidental camelCase, while hotspot code expects snake_case:
>>
>> 1794 void cache_wbsync(bool isPre);
>
> Fixed.
>
>> === src/hotspot/cpu/x86/stubGenerator_x86_64.cpp
>>
>> Any reason to avoid inline here, e.g. "__ cache_wb(Address(src, 0))"?
>>
>> 2921 const Address line(src, 0);
>> 2922 __ cache_wb(line);
>
> Well, I guess only if you're "not into that whole brevity thing" (c.f.
> The Big Lebowski). Condensed accordingly.
>
>> === src/hotspot/cpu/x86/stubGenerator_x86_64.cpp
>>
>> Is it really "cmpl" here, not "cmpb"? I think aarch64 code tests for byte.
>>
>> 2942 __ cmpl(is_pre, 0);
>
> This is a Java boolean input. I believe that means the value will be
> loaded into c_arg0 as an int so this test ought to be adequate.
>
> The Arch64 code actually tests the input value as a long word:
>
> __ enter();
> __ cbnz(is_pre, skip);
> . . .
>
> n.b. cbnz == compare and branch if non-zero (64 bit). This latter code
> could (arguably) employ a cbnzw (i.e. 32-bit variant). However, the
> argument will have been loaded into c_rarg0 using an AArch64 int load
> which guarantees to zero the top 32 bits. So, it doesn't really make any
> difference whether this uses cbnz or cbnzw.
>
>> === src/hotspot/share/classfile/vmSymbols.hpp
>>
>> Excess space before "jdk_internal..." here:
>>
>> 1096 do_intrinsic(_writebackPostSync0, jdk_internal_misc_Unsafe,
>> writebackPostSync0_name, void_method_signature , F_RN) \
>
> Fixed.
>
>> === src/hotspot/share/opto/c2compiler.cpp
>>
>> Why inject new cases here, instead of at the end of switch? Saves sudden "break":
>>
>> 578 break;
>> 579 case vmIntrinsics::_writeback0:
>> 580 if (!Matcher::match_rule_supported(Op_CacheWB)) return false;
>> 581 break;
>> 582 case vmIntrinsics::_writebackPreSync0:
>> 583 if (!Matcher::match_rule_supported(Op_CacheWBPreSync)) return false;
>> 584 break;
>> 585 case vmIntrinsics::_writebackPostSync0:
>> 586 if (!Matcher::match_rule_supported(Op_CacheWBPostSync)) return false;
>> 587 break;
>
> I placed them here so they were close to the other Unsafe intrinsics. In
> particular they precede _allocateInstance, an ordering which is also the
> case in the declarations in vmSymbols.hpp.
>
> In what sense do you mean that an extra 'break' is saved? That would be
> true as regards the textual layout. It wouldn't affect the logic of
> folding different ranges of values into branching range tests (which is
> only determined by the numeric values of the intrinsics). If you are
> concerned about the former then I would argue that placing the values in
> declaration order seems to me to be the more important concern.
>
>> === src/hotspot/share/opto/library_call.cpp
>>
>> Accidental camelCase here, should be snake_case:
>>
>> 257 bool inline_unsafe_writebackSync0(bool isPre);
>>
>> 2870 bool LibraryCallKit::inline_unsafe_writebackSync0(bool isPre) {
>
> Fixed.
>
>> === src/hotspot/share/prims/unsafe.cpp
>>
>> Odd indenting near "CC":
>>
>> 1122 {CC "writeback0", CC "(" "J" ")V", FN_PTR(Unsafe_WriteBack0)},
>> 1123 {CC "writebackPreSync0", CC "()V", FN_PTR(Unsafe_WriteBackPreSync0)},
>> 1124 {CC "writebackPostSync0", CC "()V", FN_PTR(Unsafe_WriteBackPostSync0)},
>
> Fixed.
>
>> camelCase:
>> 464 static void doWriteBackSync0(bool isPre)
>
> Fixed.
>
>> === src/hotspot/share/prims/unsafe.cpp
>>
>> Do we really need this function pointer mess?
>>
>> 457 void (*wb)(void *);
>> 458 void *a = addr_from_java(line);
>> 459 wb = (void (*)(void *)) StubRoutines::data_cache_writeback();
>> 460 assert(wb != NULL, "generate writeback stub!");
>> 461 (*wb)(a);
>>
>> Seems easier to:
>>
>> assert(StubRoutines::data_cache_writeback() != NULL, "sanity");
>> StubRoutines::data_cache_writeback()(addr_from_java(line));
> Hmm, "that whole brevity thing" again? Well, I guess you must now call
> me "El Duderino".
>
> Are you sure that all the compilers used to build openJDK will happily
> eat the second line of your replacement? If you can guarantee that I'll
> happily remove the type declarations.
>
> regards,
>
>
> Andrew Dinn
> -----------
> Senior Principal Software Engineer
> Red Hat UK Ltd
> Registered in England and Wales under Company Registration No. 03798903
> Directors: Michael Cunningham, Michael ("Mike") O'Neill, Eric Shander
>
--
regards,
Andrew Dinn
-----------
Senior Principal Software Engineer
Red Hat UK Ltd
Registered in England and Wales under Company Registration No. 03798903
Directors: Michael Cunningham, Michael ("Mike") O'Neill, Eric Shander
More information about the hotspot-compiler-dev
mailing list