RFR (S/M) expose L1_data_cache_line_size for diagnostic/sanity checks (8049717)

Fri Jul 11 17:53:17 UTC 2014

On 7/11/14 11:48 AM, Vladimir Kozlov wrote:
> Yes, I got 32 from prtpicl on T4.
>
> I googled and T1 had the S1 core, T2 and T3 had the S2 core, T4, T5
> and T6 have the S3 core, and T7 will have the S4 core.
> Based on your prtpicl data I would assume T3 also has a 16-byte cache
> line, like T2.
>
> But it is a mess. How come we don't have SPARC documents which clearly
> state all these parameters :(
>
> I don't want to start about CPUID again.
>
> How critical is it for you to have the correct size?

Not critical. The only reason for exposing this info is to
provide some diagnostics about possible false sharing issues.
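
As a rough illustration of the kind of check this enables (a sketch only,
not code from the webrev): given an L1 data cache line size, two fields
can false-share when their byte offsets fall in the same line. The helper
names below are hypothetical, and the sketch assumes the enclosing object
starts on a cache line boundary.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper, not HotSpot code: index of the cache line that a
// byte offset falls in. Assumes line_size is a non-zero power of two.
inline std::size_t cache_line_index(std::size_t offset, std::size_t line_size) {
  assert(line_size != 0 && (line_size & (line_size - 1)) == 0);
  return offset / line_size;
}

// Hypothetical helper: true if two fields at the given byte offsets land
// on the same cache line. Assumes the object base is line-aligned.
inline bool may_false_share(std::size_t offset_a, std::size_t offset_b,
                            std::size_t line_size) {
  return cache_line_index(offset_a, line_size) ==
         cache_line_index(offset_b, line_size);
}
```

With a 16-byte line, offsets 8 and 16 are on different lines; with a
32-byte line they share one, which is why the reported size matters for
the diagnostic.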

I think the new code that I'm testing right now will get it right.

Dan

>
> Vladimir
>
> On 7/11/14 9:09 AM, Daniel D. Daugherty wrote:
>> Vladimir, thanks for the thorough review.
>>
>>
>> On 7/10/14 1:07 PM, Vladimir Kozlov wrote:
>>> Hi Dan
>>>
>>> vm_version_sparc.cpp:
>>>
>>> I don't know where you got the 16-byte cache line size:
>>
>> That would be covered by the comments:
>>
>>   if (is_sun4v()) {
>>     assert(_L1_data_cache_line_size == 0, "overlap with sun4v family");
>>     // All Niagara's are sun4v's, but not all sun4v's are Niagaras.
>>     //
>>     // Ref: UltraSPARC T1 Supplement to the UltraSPARC Architecture 2005
>>     // Appendix F.1.3.1 Cacheable Accesses
>>     //
>>     // Ref: UltraSPARC T2: A Highly-Threaded, Power-Efficient, SPARC SOC
>>     // Section III: SPARC Processor Core
>>     //
>>     // Ref: Oracle's SPARC T4-1, SPARC T4-2, SPARC T4-4, and SPARC T4-1B Server Architecture
>>     // Section SPARC T4 Processor Cache Architecture
>>     _L1_data_cache_line_size = 16;
>>   }
>>
>> Unfortunately, I can no longer find the T4 L1 cache line size in
>> that last reference. Either I dreamed it or that doc has been
>> tweaked since I previously looked at it. I googled around again,
>> but I can't find a good reference for the T4 L1 cache line size.
>>
>>>
>>> /usr/sbin/prtpicl -v |grep l1-dcache |more
>>>           :l1-dcache-line-size   32
>>>           :l1-dcache-size        16384
>>>           :l1-dcache-associativity       4
>>
>> I'm guessing the above is from a 'T4' or newer machine.
>>
>> And here is the output from these machines:
>>
>> $ uname -a
>> SunOS dr-evil 5.10 Generic_142900-03 sun4v sparc SUNW,Sun-Fire-T1000
>>
>> $ /usr/sbin/prtpicl -v | grep l1-dcache-line-size | sort -u
>>            :l1-dcache-line-size      16
>>
>> $ uname -a
>> SunOS mrspock 5.10 Generic_141444-09 sun4v sparc SUNW,T5440
>>
>> $ /usr/sbin/prtpicl -v | head -1000 | grep l1-dcache-line-size | sort -u
>>            :l1-dcache-line-size   16
>>
>> prtpicl seems to go on and on and on on mrspock... hence 'head -1000'
>>
>> $ uname -a
>> SunOS terminus 5.11 11.0 sun4u sparc SUNW,SPARC-Enterprise
>>
>> $ /usr/sbin/prtpicl -v | grep l1-dcache-line-size | sort -u
>>                :l1-dcache-line-size       0x40
>>
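
Note that prtpicl prints the line size in decimal on some machines (16,
32) and in hex on others (0x40). A minimal sketch of reading such a value,
assuming the text after the property name has already been extracted; the
function name is hypothetical and this is not how the VM obtains the value:

```cpp
#include <cassert>
#include <cerrno>
#include <cstdlib>

// Hypothetical helper: parse a prtpicl-style value such as "16" or "0x40".
// Base 0 lets strtoul accept decimal and 0x-prefixed hex (a leading "0"
// alone would be read as octal). Returns 0 if nothing could be parsed.
inline unsigned long parse_line_size(const char* text) {
  assert(text != nullptr);
  errno = 0;
  char* end = nullptr;
  unsigned long value = std::strtoul(text, &end, 0);
  if (end == text || errno != 0) {
    return 0;  // no digits consumed, or value out of range
  }
  return value;
}
```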
>>
>>
>>
>>> It is 32 for T4 and for T7 it will be larger:
>>>
>>>   static intx prefetch_data_size()  {
>>>     return is_T4() && !is_T7() ? 32 : 64;  // default prefetch block size on sparc
>>>   }
>>
>> OK. So T1 and T2 have 16-byte L1 cache line sizes. Is there a T3?
>> T4 and T5 have 32-byte L1 cache line sizes. Is there a T6?
>> T7 and newer have 64-byte cache line sizes.
>>
>> Can I repeat (from a different e-mail thread) that SPARC really
>> needs the equivalent of CPUID?
>>
>>
>>> sun4v could be defined for Fujitsu Sparc64 too:
>>>
>>>   static bool is_niagara(int features)  {
>>>     // 'sun4v_m' may be defined on both Sun/Oracle Sparc CPUs as well as
>>>     // on Fujitsu Sparc64 CPUs, but only Sun/Oracle Sparcs can be 'niagaras'.
>>>     return (features & sun4v_m) != 0 && (features & sparc64_family_m) == 0;
>>>   }
>>
>> So are the three distinct SPARC 64-bit families better stated as
>> (where the Niagara family has three different L1 cache line sizes):
>>
>>      is_ultra3()         // 64-byte L1 cache line size
>>      is_niagara()
>>        is_T7()           // 64-byte L1 cache line size
>>        else is_T4()      // 32-byte L1 cache line size
>>        else /* T[12] */  // 16-byte L1 cache line size
>>      is_sparc64()        // 64-byte L1 cache line size
>>
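
The breakdown above could be sketched roughly as follows. This is only an
illustration of the proposed mapping, not the code in the webrev: the
feature flags and function names are hypothetical stand-ins for the real
VM_Version masks, and the sizes are the ones proposed in this thread
rather than values confirmed by a SPARC reference.

```cpp
#include <cassert>  // for assert() in sanity checks on the mapping

// Hypothetical feature bits; the real VM_Version feature masks differ.
enum SparcFeature : unsigned {
  kUltra3  = 1u << 0,
  kSun4v   = 1u << 1,
  kSparc64 = 1u << 2,  // Fujitsu Sparc64 family
  kT4      = 1u << 3,
  kT7      = 1u << 4
};

// Mirrors the quoted is_niagara(): sun4v but not the Sparc64 family.
inline bool is_niagara(unsigned features) {
  return (features & kSun4v) != 0 && (features & kSparc64) == 0;
}

// Proposed L1 data cache line size in bytes; 0 means "unknown".
inline int l1_data_cache_line_size(unsigned features) {
  if (is_niagara(features)) {
    if (features & kT7) return 64;  // check T7 first: it also reports T4
    if (features & kT4) return 32;
    return 16;                      // T1 and T2
  }
  if (features & (kUltra3 | kSparc64)) return 64;
  return 0;
}
```

Testing T7 before T4 follows the quoted prefetch_data_size(), where a T7
also satisfies is_T4().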
>>
>>> vm_version_x86.hpp, vm_version_x86.cpp
>>>
>>> I would like to keep the cpuid bit access in the .hpp file.
>>> I would suggest keeping the prefetch_data_size() code but maybe
>>> renaming it to L1_line_size(), so that you have in the .hpp:
>>>
>>>   static intx L1_line_size()  {
>>>     intx result = 0;
>>>     if (is_intel()) {
>>>       result = (_cpuid_info.dcp_cpuid4_ebx.bits.L1_line_size + 1);
>>>     } else if (is_amd()) {
>>>       result = _cpuid_info.ext_cpuid5_ecx.bits.L1_line_size;
>>>     }
>>>     if (result < 32) // not defined ?
>>>       result = 32;   // 32 bytes by default on x86 and other x64
>>>     return result;
>>>   }
>>>
>>>   static intx prefetch_data_size()  {
>>>     return L1_line_size();
>>>   }
>>>
>>> and in .cpp for > i486 (i486 code is still yours):
>>>
>>> _L1_data_cache_line_size = L1_line_size();
>>
>> Sure, I can move the CPUID bit stuff back into the .hpp file.
>> I'll do the rename and make prefetch_data_size() a wrapper
>> call to L1_line_size().
>>
>>
>>> objectMonitor.cpp and synchronizer.cpp:
>>>
>>> cast to 'int' but destination is 'unsigned' (also you can use 'uint'):
>>>
>>> unsigned int offset_stwRandom = (int)
>>
>> I'll check that out.
>>
>>
>>> combine two 'if (verbose)' into one.
>>
>> I'll check that out also.
>>
>> Dan
>>
>>
>>>
>>> On 7/9/14 9:42 AM, Daniel D. Daugherty wrote:
>>>> Greetings,
>>>>
>>>> I have the fix for the following bug ready for JDK9 RT_Baseline:
>>>>
>>>>      JDK-8049717 expose L1_data_cache_line_size for diagnostic/sanity
>>>>                  checks
>>>>      https://bugs.openjdk.java.net/browse/JDK-8049717
>>>>
>>>> Here is the URL for the webrev:
>>>>
>>>> http://cr.openjdk.java.net/~dcubed/8049717-webrev/0-jdk9-hs-rt/
>>>>
>>>> This fix is a standalone piece from my Contended Locking reorder
>>>> and cache-line bucket. I've split it off as an independent bug fix
>>>> in order to make the reorder and cache-line bucket clearer.
>>>>
>>>> Testing:
>>>>
>>>> - JPRT test jobs
>>>> - manual testing of the new output via existing options:
>>>>    -XX:+UnlockExperimentalVMOptions -XX:SyncKnobs=Verbose=1
>>>>    -XX:+ExecuteInternalVMTests -XX:+VerboseInternalVMTests
>>>> - Aurora Adhoc nsk.sajdi and vm.parallel_class_loading as part of
>>>>    testing for my Contended Locking reorder and cache-line bucket
>>>>
>>>> Thanks, in advance, for any comments, questions or suggestions.
>>>>
>>>> Dan
>>>>
>>>>
>>