RFR (S/M) expose L1_data_cache_line_size for diagnostic/sanity checks (8049717)
Daniel D. Daugherty
daniel.daugherty at oracle.com
Fri Jul 11 17:53:17 UTC 2014
On 7/11/14 11:48 AM, Vladimir Kozlov wrote:
> Yes, I got 32 from prtpicl on T4.
>
> I googled and T1 had the S1 core, T2 and T3 had the S2 core, T4, T5 and
> T6 have the S3 core, and T7 will have the S4 core.
> Based on your prtpicl data I would assume T3 also has a 16-byte cache
> line, like T2.
>
> But it is a mess. How come we don't have SPARC documents that clearly
> state all these parameters :(
>
> I don't want to start on CPUID again.
>
> How critical is it for you to have the correct size?
Not critical. The only reason for exposing this info is to
provide some diagnostics about possible false sharing issues.
I think the new code that I'm testing right now will get it right.
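
To illustrate the kind of false-sharing check this is for, here is a
minimal C++ sketch. It is not the code under review: HypotheticalLock,
same_cache_line() and lock_fields_may_false_share() are hypothetical
names, and the sketch assumes the object starts on a cache-line boundary.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical layout (NOT HotSpot's ObjectMonitor): 'owner' and 'waiters'
// are both written frequently by different threads, and 24 bytes of padding
// place 'waiters' 32 bytes after 'owner' on common ABIs.
struct HypotheticalLock {
  std::uint64_t owner;    // written by the lock holder
  char          pad[24];  // hypothetical padding
  std::uint64_t waiters;  // written by contending threads
};

// Returns true when two byte offsets within a cache-line-aligned object fall
// on the same cache line of 'line_size' bytes, i.e. a write to one field can
// invalidate the line holding the other (false sharing).
bool same_cache_line(std::size_t off_a, std::size_t off_b, std::size_t line_size) {
  assert(line_size > 0);
  return off_a / line_size == off_b / line_size;
}

// Diagnostic helper: reports whether the two hot fields of HypotheticalLock
// would share a line for the given L1 data cache line size.
bool lock_fields_may_false_share(std::size_t line_size) {
  return same_cache_line(offsetof(HypotheticalLock, owner),
                         offsetof(HypotheticalLock, waiters),
                         line_size);
}
```

With 16- or 32-byte lines the two fields land on separate lines, but with
64-byte lines they share one, which is why a wrong line size makes such a
diagnostic misleading rather than harmful.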
Dan
>
> Vladimir
>
> On 7/11/14 9:09 AM, Daniel D. Daugherty wrote:
>> Vladimir, thanks for the thorough review.
>>
>>
>> On 7/10/14 1:07 PM, Vladimir Kozlov wrote:
>>> Hi Dan
>>>
>>> vm_version_sparc.cpp:
>>>
>>> I don't know where you get 16 byte cache line size:
>>
>> That would be covered by the comments:
>>
>> if (is_sun4v()) {
>>   assert(_L1_data_cache_line_size == 0, "overlap with sun4v family");
>>   // All Niagaras are sun4vs, but not all sun4vs are Niagaras.
>>   //
>>   // Ref: UltraSPARC T1 Supplement to the UltraSPARC Architecture 2005
>>   //      Appendix F.1.3.1 Cacheable Accesses
>>   //
>>   // Ref: UltraSPARC T2: A Highly-Threaded, Power-Efficient, SPARC SOC
>>   //      Section III: SPARC Processor Core
>>   //
>>   // Ref: Oracle's SPARC T4-1, SPARC T4-2, SPARC T4-4, and SPARC T4-1B Server Architecture
>>   //      Section SPARC T4 Processor Cache Architecture
>>   _L1_data_cache_line_size = 16;
>> }
>>
>> Unfortunately, I can no longer find the T4 L1 cache line size in
>> that last reference. Either I dreamed it or that doc has been
>> tweaked since I previously looked at it. I googled around again,
>> but I can't find a good reference for the T4 L1 cache line size.
>>
>>>
>>> /usr/sbin/prtpicl -v |grep l1-dcache |more
>>> :l1-dcache-line-size 32
>>> :l1-dcache-size 16384
>>> :l1-dcache-associativity 4
>>
>> I'm guessing the above is from a 'T4' or newer machine.
>>
>> And judging by the output from these machines:
>>
>> $ uname -a
>> SunOS dr-evil 5.10 Generic_142900-03 sun4v sparc SUNW,Sun-Fire-T1000
>>
>> $ /usr/sbin/prtpicl -v | grep l1-dcache-line-size | sort -u
>> :l1-dcache-line-size 16
>>
>> $ uname -a
>> SunOS mrspock 5.10 Generic_141444-09 sun4v sparc SUNW,T5440
>>
>> $ /usr/sbin/prtpicl -v | head -1000 | grep l1-dcache-line-size | sort -u
>> :l1-dcache-line-size 16
>>
>> prtpicl seems to go on and on and on on mrspock... hence 'head -1000'
>>
>> $ uname -a
>> SunOS terminus 5.11 11.0 sun4u sparc SUNW,SPARC-Enterprise
>>
>> $ /usr/sbin/prtpicl -v | grep l1-dcache-line-size | sort -u
>> :l1-dcache-line-size 0x40
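
Note that the sun4u box reports the value in hex (0x40 = 64 bytes) while
the sun4v boxes report decimal. Anything that scrapes this output needs to
accept both forms. A minimal sketch, assuming lines of the form shown
above (parse_l1_dcache_line_size() is a hypothetical helper, not how the
VM obtains the value):

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <sstream>
#include <stdexcept>
#include <string>

// Parses one line of `prtpicl -v` output such as ":l1-dcache-line-size 16"
// or ":l1-dcache-line-size 0x40" and returns the size in bytes. Returns
// std::nullopt if the line is not an l1-dcache-line-size entry or the value
// cannot be parsed.
std::optional<unsigned long> parse_l1_dcache_line_size(const std::string& line) {
  std::istringstream in(line);
  std::string key;
  std::string value;
  // operator>> skips leading whitespace, so indented lines are fine.
  if (!(in >> key >> value) || key != ":l1-dcache-line-size") {
    return std::nullopt;
  }
  try {
    std::size_t used = 0;
    // Base 0 lets std::stoul accept both decimal ("16") and hex ("0x40").
    unsigned long size = std::stoul(value, &used, 0);
    if (used != value.size()) {
      return std::nullopt;  // trailing characters after the number
    }
    return size;
  } catch (const std::exception&) {
    return std::nullopt;  // not a number, or out of range
  }
}
```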
>>
>>
>>
>>
>>> It is 32 for T4 and for T7 it will be larger:
>>>
>>> static intx prefetch_data_size() {
>>>   return is_T4() && !is_T7() ? 32 : 64; // default prefetch block size on sparc
>>> }
>>
>> OK. So T1 and T2 have 16-byte L1 cache line sizes. Is there a T3?
>> T4 and T5 have 32-byte L1 cache line sizes. Is there a T6?
>> T7 and newer have 64-byte cache line sizes.
>>
>> Can I repeat (from a different e-mail thread) that SPARC really
>> needs the equivalent of CPUID?
>>
>>
>>> sun4v could be defined for Fujitsu Sparc64 too:
>>>
>>> static bool is_niagara(int features) {
>>>   // 'sun4v_m' may be defined on both Sun/Oracle Sparc CPUs as well as
>>>   // on Fujitsu Sparc64 CPUs, but only Sun/Oracle Sparcs can be 'niagaras'.
>>>   return (features & sun4v_m) != 0 && (features & sparc64_family_m) == 0;
>>> }
>>
>> So are the three distinct SPARC 64-bit families better stated as
>> (where the Niagara family has three different L1 cache line sizes):
>>
>>   is_ultra3()            // 64-byte L1 cache line size
>>   is_niagara()
>>     is_T7()              // 64-byte L1 cache line size
>>     else is_T4()         // 32-byte L1 cache line size
>>     else /* T[12] */     // 16-byte L1 cache line size
>>   is_sparc64()           // 64-byte L1 cache line size
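
The same classification written as a plain C++ function, as a sketch only:
SparcFamily and the boolean parameters are hypothetical stand-ins for
is_ultra3(), is_niagara(), is_sparc64(), is_T4() and is_T7(), and the
sizes are the ones summarized above rather than values confirmed by SPARC
documentation. T7 is tested before T4 because the prefetch_data_size()
snippet above suggests is_T4() is also true on a T7.

```cpp
#include <cassert>

// Hypothetical stand-in for the VM_Version family predicates.
enum class SparcFamily { Ultra3, Niagara, Sparc64, Unknown };

// Returns the L1 data cache line size in bytes per the proposed
// classification, or 0 when the family is not recognized (leave unset).
int l1_data_cache_line_size(SparcFamily family, bool is_t4, bool is_t7) {
  switch (family) {
    case SparcFamily::Ultra3:
      return 64;
    case SparcFamily::Sparc64:
      return 64;
    case SparcFamily::Niagara:
      if (is_t7) return 64;  // T7 and newer (checked first, see above)
      if (is_t4) return 32;  // T4-class
      return 16;             // T1/T2
    default:
      return 0;              // unknown family
  }
}
```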
>>
>>
>>> vm_version_x86.hpp, vm_version_x86.cpp
>>>
>>> I would like to keep the cpuid bit access in the .hpp file.
>>> I would suggest keeping the prefetch_data_size() code but maybe
>>> renaming it to L1_line_size(), so that you have in .hpp:
>>>
>>> static intx L1_line_size() {
>>>   intx result = 0;
>>>   if (is_intel()) {
>>>     result = (_cpuid_info.dcp_cpuid4_ebx.bits.L1_line_size + 1);
>>>   } else if (is_amd()) {
>>>     result = _cpuid_info.ext_cpuid5_ecx.bits.L1_line_size;
>>>   }
>>>   if (result < 32) // not defined ?
>>>     result = 32;   // 32 bytes by default on x86 and other x64
>>>   return result;
>>> }
>>>
>>> static intx prefetch_data_size() {
>>>   return L1_line_size();
>>> }
>>>
>>> and in .cpp for > i486 (i486 code is still yours):
>>>
>>> _L1_data_cache_line_size = L1_line_size();
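
For reference, a standalone sketch of the same decoding from raw register
values. It assumes the two bit fields correspond to
CPUID.(EAX=4,ECX=0):EBX[11:0] on Intel, which holds the line size minus
one (and that subleaf 0 describes the L1 data cache, as it typically
does), and to CPUID.80000005h:ECX[7:0] on AMD, which holds the L1 data
cache line size in bytes. Vendor and l1_line_size_from_cpuid() are
hypothetical names; std::intptr_t stands in for HotSpot's intx.

```cpp
#include <cassert>
#include <cstdint>

enum class Vendor { Intel, Amd, Other };  // hypothetical vendor tag

// Mirrors the suggested L1_line_size(): decode the vendor-specific field,
// then fall back to 32 bytes when the value is missing or implausibly small.
std::intptr_t l1_line_size_from_cpuid(Vendor vendor,
                                      std::uint32_t leaf4_ebx,
                                      std::uint32_t ext5_ecx) {
  std::intptr_t result = 0;
  if (vendor == Vendor::Intel) {
    // EBX[11:0] = line size - 1
    result = static_cast<std::intptr_t>(leaf4_ebx & 0xFFFu) + 1;
  } else if (vendor == Vendor::Amd) {
    // ECX[7:0] = L1 data cache line size in bytes
    result = static_cast<std::intptr_t>(ext5_ecx & 0xFFu);
  }
  if (result < 32) {  // not defined?
    result = 32;      // 32 bytes by default on x86 and x64
  }
  return result;
}
```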
>>
>> Sure, I can move the CPUID bit stuff back into the .hpp file.
>> I'll do the rename and make prefetch_data_size() a wrapper
>> call to L1_line_size().
>>
>>
>>> objectMonitor.cpp and synchronizer.cpp:
>>>
>>> cast to 'int' but destination is 'unsigned' (also you can use 'uint'):
>>>
>>> unsigned int offset_stwRandom = (int)
>>
>> I'll check that out.
>>
>>
>>> Combine the two 'if (verbose)' blocks into one.
>>
>> I'll check that out also.
>>
>> Dan
>>
>>
>>>
>>> On 7/9/14 9:42 AM, Daniel D. Daugherty wrote:
>>>> Greetings,
>>>>
>>>> I have the fix for the following bug ready for JDK9 RT_Baseline:
>>>>
>>>> JDK-8049717 expose L1_data_cache_line_size for diagnostic/sanity checks
>>>> https://bugs.openjdk.java.net/browse/JDK-8049717
>>>>
>>>> Here is the URL for the webrev:
>>>>
>>>> http://cr.openjdk.java.net/~dcubed/8049717-webrev/0-jdk9-hs-rt/
>>>>
>>>> This fix is a standalone piece from my Contended Locking reorder
>>>> and cache-line bucket. I've split it off as an independent bug fix
>>>> in order to make the reorder and cache-line bucket more clear.
>>>>
>>>> Testing:
>>>>
>>>> - JPRT test jobs
>>>> - manual testing of the new output via existing options:
>>>> -XX:+UnlockExperimentalVMOptions -XX:SyncKnobs=Verbose=1
>>>> -XX:+ExecuteInternalVMTests -XX:+VerboseInternalVMTests
>>>> - Aurora Adhoc nsk.sajdi and vm.parallel_class_loading as part of
>>>> testing for my Contended Locking reorder and cache-line bucket
>>>>
>>>> Thanks, in advance, for any comments, questions or suggestions.
>>>>
>>>> Dan
>>>>
>>>>
>>
More information about the hotspot-runtime-dev mailing list