RFR (S/M) expose L1_data_cache_line_size for diagnostic/sanity checks (8049717)

Vladimir Kozlov vladimir.kozlov at oracle.com
Fri Jul 11 17:48:03 UTC 2014


Yes, I got 32 from prtpicl on T4.

I googled and T1 had the S1 core, T2 and T3 had the S2 core, T4, T5, and T6 
have the S3 core, and T7 will have the S4 core.
Based on your prtpicl data I would assume T3 has a 16-byte cache line too, 
like T2.

But it is a mess. How come we don't have SPARC documents that clearly 
state all these parameters :(

I don't want to start the CPUID discussion again.

How critical is it for you to have the correct size?

Vladimir

On 7/11/14 9:09 AM, Daniel D. Daugherty wrote:
> Vladimir, thanks for the thorough review.
>
>
> On 7/10/14 1:07 PM, Vladimir Kozlov wrote:
>> Hi Dan
>>
>> vm_version_sparc.cpp:
>>
>> I don't know where you get 16 byte cache line size:
>
> That would be covered by the comments:
>
>   if (is_sun4v()) {
>     assert(_L1_data_cache_line_size == 0, "overlap with sun4v family");
>     // All Niagara's are sun4v's, but not all sun4v's are Niagaras.
>     //
>     // Ref: UltraSPARC T1 Supplement to the UltraSPARC Architecture 2005
>     // Appendix F.1.3.1 Cacheable Accesses
>     //
>     // Ref: UltraSPARC T2: A Highly-Threaded, Power-Efficient, SPARC SOC
>     // Section III: SPARC Processor Core
>     //
>     // Ref: Oracle's SPARC T4-1, SPARC T4-2, SPARC T4-4, and SPARC T4-1B Server Architecture
>     // Section SPARC T4 Processor Cache Architecture
>     _L1_data_cache_line_size = 16;
>   }
>
> Unfortunately, I can no longer find the T4 L1 cache line size in
> that last reference. Either I dreamed it or that doc has been
> tweaked since I previously looked at it. I googled around again,
> but I can't find a good reference for the T4 L1 cache line size.
>
>>
>> /usr/sbin/prtpicl -v |grep l1-dcache |more
>>           :l1-dcache-line-size   32
>>           :l1-dcache-size        16384
>>           :l1-dcache-associativity       4
>
> I'm guessing the above is from a 'T4' or newer machine.
>
> And, by way of example, from these machines:
>
> $ uname -a
> SunOS dr-evil 5.10 Generic_142900-03 sun4v sparc SUNW,Sun-Fire-T1000
>
> $ /usr/sbin/prtpicl -v | grep l1-dcache-line-size | sort -u
>            :l1-dcache-line-size      16
>
> $ uname -a
> SunOS mrspock 5.10 Generic_141444-09 sun4v sparc SUNW,T5440
>
> $ /usr/sbin/prtpicl -v | head -1000 | grep l1-dcache-line-size | sort -u
>            :l1-dcache-line-size   16
>
> prtpicl seems to go on and on and on on mrspock... hence 'head -1000'
>
> $ uname -a
> SunOS terminus 5.11 11.0 sun4u sparc SUNW,SPARC-Enterprise
>
> $ /usr/sbin/prtpicl -v | grep l1-dcache-line-size | sort -u
>                :l1-dcache-line-size       0x40
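> For reference, here is a rough, untested sketch of how the same value could
> be read programmatically via libpicl instead of parsing prtpicl output. The
> "cpu" class and property name are assumed from the output above, and a robust
> version would check the property size with picl_get_propinfo_by_name() first:
>
>   #include <picl.h>
>
>   static int cpu_cb(picl_nodehdl_t nodeh, void* arg) {
>     int* line_size = (int*) arg;
>     int val = 0;
>     // Assumes a 4-byte int property; a wider property is simply skipped here.
>     if (picl_get_propval_by_name(nodeh, "l1-dcache-line-size",
>                                  &val, sizeof(val)) == PICL_SUCCESS) {
>       *line_size = val;
>       return PICL_WALK_TERMINATE;   // first hit is enough
>     }
>     return PICL_WALK_CONTINUE;
>   }
>
>   // Returns the reported L1 d-cache line size, or 0 if PICL has no answer.
>   static int picl_l1_dcache_line_size() {
>     int line_size = 0;
>     if (picl_initialize() != PICL_SUCCESS) {
>       return 0;
>     }
>     picl_nodehdl_t rooth;
>     if (picl_get_root(&rooth) == PICL_SUCCESS) {
>       picl_walk_tree_by_class(rooth, "cpu", &line_size, cpu_cb);
>     }
>     picl_shutdown();
>     return line_size;
>   }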
>
>
>
>
>> It is 32 for T4 and for T7 it will be larger:
>>
>>   static intx prefetch_data_size()  {
>>     return is_T4() && !is_T7() ? 32 : 64;  // default prefetch block size on sparc
>>   }
>
> OK. So T1 and T2 have 16-byte L1 cache line sizes. Is there a T3?
> T4 and T5 have 32-byte L1 cache line sizes. Is there a T6?
> T7 and newer have 64-byte cache line sizes.
>
> Can I repeat (from a different e-mail thread) that SPARC really
> needs the equivalent of CPUID?
>
>
>> sun4v could be defined for Fujitsu Sparc64 too:
>>
>>   static bool is_niagara(int features)  {
>>     // 'sun4v_m' may be defined on both Sun/Oracle Sparc CPUs as well as
>>     // on Fujitsu Sparc64 CPUs, but only Sun/Oracle Sparcs can be 'niagaras'.
>>     return (features & sun4v_m) != 0 && (features & sparc64_family_m) == 0;
>>   }
>
> So are the three distinct SPARC 64-bit families better stated as
> (where the Niagara family has three different L1 cache line sizes):
>
>      is_ultra3()         // 64-byte L1 cache line size
>      is_niagara()
>        is_T7()           // 64-byte L1 cache line size
>        else is_T4()      // 32-byte L1 cache line size
>        else /* T[12] */  // 16-byte L1 cache line size
>      is_sparc64()        // 64-byte L1 cache line size
>
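> For illustration, here is a rough sketch (not the actual webrev code) of how
> the sun4v branch could pick the size per generation instead of hard-coding 16,
> assuming is_T4()/is_T7() predicates like the SPARC ones quoted above:
>
>   if (is_sun4v() && is_niagara()) {
>     if (is_T7()) {
>       _L1_data_cache_line_size = 64;   // S4 core
>     } else if (is_T4()) {
>       _L1_data_cache_line_size = 32;   // S3 core: T4/T5/T6
>     } else {
>       _L1_data_cache_line_size = 16;   // T1/T2 (and presumably T3)
>     }
>   }
>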
>
>> vm_version_x86.hpp, vm_version_x86.cpp
>>
>> I would like to keep the cpuid bit access in the .hpp file.
>> I would suggest keeping the prefetch_data_size() code but maybe renaming it
>> to L1_line_size(), so that in the .hpp you have:
>>
>>   static intx L1_line_size()  {
>>     intx result = 0;
>>     if (is_intel()) {
>>       result = (_cpuid_info.dcp_cpuid4_ebx.bits.L1_line_size + 1);
>>     } else if (is_amd()) {
>>       result = _cpuid_info.ext_cpuid5_ecx.bits.L1_line_size;
>>     }
>>     if (result < 32) // not defined ?
>>       result = 32;   // 32 bytes by default on x86 and other x64
>>     return result;
>>   }
>>
>>   static intx prefetch_data_size()  {
>>     return L1_line_size();
>>   }
>>
>> and in the .cpp for > i486 (the i486 code is still yours):
>>
>> _L1_data_cache_line_size = L1_line_size();
>
> Sure, I can move the CPUID bit stuff back into the .hpp file.
> I'll do the rename and make prefetch_data_size() a wrapper
> around L1_line_size().
>
>
>> objectMonitor.cpp and synchronizer.cpp:
>>
>> cast to 'int' but destination is 'unsigned' (also you can use 'uint'):
>>
>> unsigned int offset_stwRandom = (int)
>
> I'll check that out.
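>
> For illustration only, here is the general shape of the mismatch and the
> suggested 'uint' fix; the struct and field below are made up, not the actual
> objectMonitor.cpp code:
>
>   #include <stddef.h>
>
>   typedef unsigned int uint;   // HotSpot defines 'uint' along these lines
>
>   struct MonitorLike {         // stand-in type, not the real ObjectMonitor layout
>     void* _header;
>     int   _stwRandom;
>   };
>
>   // Flagged pattern: the cast is to 'int' but the destination is unsigned.
>   unsigned int offset_mismatched = (int) offsetof(MonitorLike, _stwRandom);
>
>   // Suggested pattern: make the cast agree with the destination type.
>   uint offset_stwRandom = (uint) offsetof(MonitorLike, _stwRandom);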
>
>
>> combine two 'if (verbose)' into one.
>
> I'll check that out also.
>
> Dan
>
>
>>
>> On 7/9/14 9:42 AM, Daniel D. Daugherty wrote:
>>> Greetings,
>>>
>>> I have the fix for the following bug ready for JDK9 RT_Baseline:
>>>
>>>      JDK-8049717 expose L1_data_cache_line_size for diagnostic/sanity
>>>                  checks
>>>      https://bugs.openjdk.java.net/browse/JDK-8049717
>>>
>>> Here is the URL for the webrev:
>>>
>>> http://cr.openjdk.java.net/~dcubed/8049717-webrev/0-jdk9-hs-rt/
>>>
>>> This fix is a standalone piece from my Contended Locking reorder
>>> and cache-line bucket. I've split it off as an independent bug fix
>>> in order to make the reorder and cache-line bucket more clear.
>>>
>>> Testing:
>>>
>>> - JPRT test jobs
>>> - manual testing of the new output via existing options:
>>>    -XX:+UnlockExperimentalVMOptions -XX:SyncKnobs=Verbose=1
>>>    -XX:+ExecuteInternalVMTests -XX:+VerboseInternalVMTests
>>> - Aurora Adhoc nsk.sajdi and vm.parallel_class_loading as part of
>>>    testing for my Contended Locking reorder and cache-line bucket
>>>
>>> Thanks, in advance, for any comments, questions or suggestions.
>>>
>>> Dan
>>>
>>>
>

