RFR(s): 8205582: PPC64: RTM: Fix counter for aborts on nested transactions
Gustavo Romero
gromero at linux.vnet.ibm.com
Tue Jul 17 11:18:25 UTC 2018
Hi Martin,
OK. Let's continue with v4_B [1] so.
Thanks for reviewing it!
Best regards,
Gustavo
[1] http://cr.openjdk.java.net/~gromero/8205582/v4_B
On 07/17/2018 06:02 AM, Doerr, Martin wrote:
> Hi Gustavo,
>
> your webrev v4_B looks good. Reviewed.
>
> I think v4_A wouldn't work appropriately when using +1 and -1 bits together.
>
> A generic version could be something like (just to explain what I was thinking about):
> for (int nbit = 0; nbit < num_failure_bits; nbit++) {
> Label do_increment, check_abort;
>
> int last_match = -1;
> for (ncounter = 0; ncounter < num_counters; ncounter++) {
> if (last_match >= 0) {
> rldicr_(temp_Reg, abort_status_R0, failure_bit[last_match], 0);
> int selection = bit_counter_map[last_match][ncounter];
> if (selection == 1) {
> bne(CCR1, do_increment);
> } else if (selection == -1) {
> beq(CCR1, do_increment);
> }
> }
> last_match = nbit;
> }
>
> assert(last_match >= 0, "should have at least one");
> rldicr_(temp_Reg, abort_status_R0, failure_bit[last_match], 0);
> int selection = bit_counter_map[last_match][ncounter];
> if (selection == 1) {
> beq(CCR1, check_abort);
> } else if (selection == -1) {
> bne(CCR1, check_abort);
> }
>
> bind(do_increment);
> ld(temp_Reg, abort_counter_offs, rtm_counters_Reg);
> addi(temp_Reg, temp_Reg, 1);
> std(temp_Reg, abort_counter_offs, rtm_counters_Reg);
> bind(check_abort);
> }
>
> But I'm fine with webrev v4_B with the comment you have added.
>
> Thanks,
> Martin
>
>
> -----Original Message-----
> From: Gustavo Romero [mailto:gromero at linux.vnet.ibm.com]
> Sent: Dienstag, 17. Juli 2018 08:55
> To: Doerr, Martin <martin.doerr at sap.com>; Lindenmaier, Goetz <goetz.lindenmaier at sap.com>; hotspot-compiler-dev at openjdk.java.net
> Cc: ppc-aix-port-dev at openjdk.java.net
> Subject: Re: RFR(s): 8205582: PPC64: RTM: Fix counter for aborts on nested transactions
>
> Hi Martin,
>
> On 07/16/2018 05:17 AM, Doerr, Martin wrote:
>> thanks for the new webrev.
>
> Thanks a lot for the thorough review.
>
>
>> You're right, the two bits should be mutual exclusive, so the sum is equivalent to the logical or in this case.
>> However, the loop pretends to be generic, but it's not. The "or" only works for mutual exclusive bits.
>> If you want to keep the code with the loops, I think there should be a comment explaining this.
>
> I see. I've experimented a couple of options following your previous suggestion
> of using the counter loop as the outer loop. Most difficult point was to work
> around the constraint of having just R0 available as scratch. Bit/bitfield
> extracting instrs did not help much since most of them can't perform an OR with
> its destination operand, which gets worse with the R0 constraint. On the other,
> afaics 'rldimi' which does an OR with its destination operand can't be used to
> extract the bit/bitfields in a generic way. That best alternative I found was to
> use both CCR0 and CCR1 and their EQ bits. I think all cases are covered now,
> i.e. all bits/conditions for a given counter are ORed. failure_code logic is not
> inverted any more in the map, which seems more natural. I added also more
> information to the comment before the bit/counter map.
> I think now the loop is generic.
> Webrev for the final result:
>
> http://cr.openjdk.java.net/~gromero/8205582/v4_A/
>
>
>> Please also fix indentation.
>
> Done.
>
>
>> Alternatively, the loop could get replaced by some code for each bit. Given that the loop is not really generic, I think this wouldn't be worse.
>>
>> Besides that, it looks good. Thanks for improving it.
>
> Due to the tight schedule I also provide the corrections for the last reviewed
> version:
>
> http://cr.openjdk.java.net/~gromero/8205582/v4_B/
>
> If v4_A also looks good I vouch for pushing it instead of v4_B.
>
>
> Best regards,
> Gustavo
>
>> Best regards,
>> Martin
>>
>>
>> -----Original Message-----
>> From: Gustavo Romero [mailto:gromero at linux.vnet.ibm.com]
>> Sent: Freitag, 13. Juli 2018 16:56
>> To: Doerr, Martin <martin.doerr at sap.com>; Lindenmaier, Goetz <goetz.lindenmaier at sap.com>; hotspot-compiler-dev at openjdk.java.net
>> Cc: ppc-aix-port-dev at openjdk.java.net
>> Subject: Re: RFR(s): 8205582: PPC64: RTM: Fix counter for aborts on nested transactions
>>
>> Hi Martin,
>>
>> On 07/12/2018 12:30 PM, Doerr, Martin wrote:
>>> I think your new code may increment a counter twice when 2 bits are specified for it. I think this should get fixed.
>>> Iterating over the RTMLockingCounters in the outer loop and performing several checks before incrementing should fix this, right?
>>
>> Do you mean increment twice counter #2 (conflict counter) or another counter?
>> non_trans_cf and trans_cf are mutually exclusive.
>> You probably spotted a case I'm missing so I ask.
>>
>>
>>> Besides that, I have some improvement proposals:
>>>
>>> - I think using 3 constant tables is not so good to read. For example, inverting could be encoded in your new 2 dimensional table by using -1, 0 , +1 when using int instead of bool. Would this be better?
>>>
>>> - I think you can get rid of rtm_counters_Reg increment and restoration by computing the abort_offs relative to the original value.
>>
>> Cool :)
>>
>> New (interim) webrev:
>> http://cr.openjdk.java.net/~gromero/8205582/v3/
>>
>> Thanks.
>>
>>
>> Best regards,
>> Gustavo
>>
>>> Best regards,
>>> Martin
>>>
>>>
>>> -----Original Message-----
>>> From: Gustavo Romero [mailto:gromero at linux.vnet.ibm.com]
>>> Sent: Dienstag, 10. Juli 2018 18:59
>>> To: Doerr, Martin <martin.doerr at sap.com>; Lindenmaier, Goetz <goetz.lindenmaier at sap.com>; hotspot-compiler-dev at openjdk.java.net
>>> Cc: ppc-aix-port-dev at openjdk.java.net
>>> Subject: Re: RFR(s): 8205582: PPC64: RTM: Fix counter for aborts on nested transactions
>>>
>>> Dear Martin,
>>>
>>> On 06/25/2018 01:31 PM, Doerr, Martin wrote:
>>>> I think it would be better to ignore bit 63 in the macroAssembler code and use the definition from the spec in assembler_ppc.hpp.
>>>> Somebody may want to use the definition for other purposes.
>>>
>>> Done. transactional_level bit is 52 now so bits 52:62 can easily be extracted
>>> using 'rldicr' in macroAssembler.
>>>
>>>
>>>> I wonder if Assembler::tm_trans_cf | Assembler::tm_non_trans_cf would be a better match for x86's description for tm_failure_bit[2]. It's also a little unfortunate to print the same bit twice as tm_failure_bit[4].
>>>
>>> Done. Now both tm_trans_cf and tm_non_trans_cf failures will increment counter 2
>>> (conflict). Duplicated check code for tm_failure_bit[4] was removed and now
>>> counter 4 (debug) is mapped to count traps or syscalls caught in TM events,
>>> which seems a reasonable approximation to the original semantics of the debug
>>> counter on Intel. Unfortunately I could not confirm on AIX how these two events
>>> (trap and syscall in TM) will set the failure code, so the counter will never
>>> track any information on AIX. But with the current proposed change that failure
>>> code can be easily added in the future.
>>>
>>> I also realized that I used previously a wrong ME operand value in:
>>>
>>> + // Extract 11 bits
>>> + rldicr_(temp_Reg, abort_status, tm_failure_bit[i], 11);
>>>
>>> It should be 10 to extract 11 bits actually, so all extractions must be correct
>>> now.
>>>
>>>
>>> I hope you don't find the array map of failure bits vs counters overkilling.
>>>
>>> Finally, I replaced the comment:
>>>
>>> tm_tabort, // Note: Seems like signal handler sets this, too.
>>>
>>> by:
>>>
>>> tm_tabort, // Signal handler will set this too.
>>>
>>> because we just enable RTM support on Power if 'htm-nosc' is supported, so
>>> treclaim. on aborting the syscall will indeed always set Abort bit in TEXASR
>>> afaik. Since debug counter now tracks trap/syscall in TM it's possible to check
>>> that counter to verify the number of aborts cause by the kernel (if any).
>>>
>>> new webrev: http://cr.openjdk.java.net/~gromero/8205582/v2/
>>>
>>>
>>> Best regards,
>>> Gustavo
>>>
>>>> Best regards,
>>>> Martin
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: Gustavo Romero [mailto:gromero at linux.vnet.ibm.com]
>>>> Sent: Montag, 25. Juni 2018 10:19
>>>> To: Lindenmaier, Goetz <goetz.lindenmaier at sap.com>; Doerr, Martin <martin.doerr at sap.com>; hotspot-compiler-dev at openjdk.java.net
>>>> Cc: ppc-aix-port-dev at openjdk.java.net
>>>> Subject: RFR(s): 8205582: PPC64: RTM: Fix counter for aborts on nested transactions
>>>>
>>>> Hi,
>>>>
>>>> Could the following change be reviewed please?
>>>>
>>>> bug : https://bugs.openjdk.java.net/browse/JDK-8205582
>>>> webrev: http://cr.openjdk.java.net/~gromero/8205582/v1/
>>>>
>>>> It fixes the RTM counter for nested aborts (rtm lock aborts type 5) by
>>>> extracting and checking bits in the Transactional Level field of TEXASR
>>>> register.
>>>>
>>>> It also fixes the memory conflict counter (rtm lock aborts type 2). Power TM
>>>> status register supports two bits to inform two different types of memory
>>>> conflict between threads: non-transactional and transactional. According to how
>>>> the jtreg RTM tests are designed the memory conflict counter counts
>>>> non-transactional conflicts: on TestPrintPreciseRTMLockingStatistics a RTM lock
>>>> is held on a static variable while another thread without any synchronization
>>>> (non-trasactional) tries to modify the same variable. Hence that small
>>>> adjustment satisfies the TestPrintPreciseRTMLockingStatistics making it pass on
>>>> Power. The memory conflict counter is not used in any other place besides by the
>>>> RTM precise statistics (no decision is made by the JVM based on that amount).
>>>>
>>>> This change partially fixes some failures in
>>>> compiler/rtm/print/TestPrintPreciseRTMLockingStatistics.java regarding the
>>>> nested and memory conflict abort counters. The remaining issue will be fixed by
>>>> aborting on calling JNI (next RFR).
>>>>
>>>>
>>>> Thank you and best regards,
>>>> Gustavo
>>>>
>>>
>>
>
More information about the hotspot-compiler-dev
mailing list