From martin.doerr at sap.com Mon Jul 2 13:20:21 2018
From: martin.doerr at sap.com (Doerr, Martin)
Date: Mon, 2 Jul 2018 13:20:21 +0000
Subject: RFR(s): 8205580: PPC64: RTM: Don't retry lock on abort if abort was intentional
In-Reply-To: <56e66a89-42a7-eb72-05a2-a97c696379c9@linux.vnet.ibm.com>
References: <56e66a89-42a7-eb72-05a2-a97c696379c9@linux.vnet.ibm.com>
Message-ID:

Hi Gustavo,

I meant retrying "on abort", not "on busy". There are different counters for these two retry functions.
RTM for Stack Locks only supports "on abort".
Inflated RTM locking supports both, using both counters.

But I see that x86 behaves the same way as your change when using -XX:-UseRTMXendForLockBusy.
I don't think it is good to treat "abort instruction on lock busy" as a permanent abort reason.
So the behavior is fine with UseRTMXendForLockBusy, but not without it.

But I can live with your change because it only has a negative effect on an unsupported experimental option. And I think your change is fine for the other usages of tabort().

Best regards,
Martin

-----Original Message-----
From: Gustavo Romero [mailto:gromero at linux.vnet.ibm.com]
Sent: Tuesday, 26 June 2018 18:01
To: Doerr, Martin; Lindenmaier, Goetz; hotspot-compiler-dev at openjdk.java.net
Cc: ppc-aix-port-dev at openjdk.java.net
Subject: Re: RFR(s): 8205580: PPC64: RTM: Don't retry lock on abort if abort was intentional

Hi Martin,

On 06/25/2018 01:49 PM, Doerr, Martin wrote:
> Looks good for the case UseRTMXendForLockBusy is active (which is default).

I did all the tests focusing on the case when it's deactivated (-UseRTMXendForLockBusy).
This is also the flag passed in the jtreg tests, since if it's active there are no
aborts caused by 'tabort' or 'xabort' and so no abort statistics (related to
that event, which is used by the jtreg tests).

> If this flag is deactivated, we use tabort if we see the object locked so your change prevents retrying the transaction in this case.
> I guess this was not intended?

I think that rtm_retry_lock_on_abort() is a misleading name; it should be
something like rtm_retry_lock_on_conflict(), since, in my understanding, the
purpose of this function is to not retry if the abort was caused by a
tabort/xabort.

On Intel that function checks bit 1 (0x2 mask) and, if it is set, the operation
is retried. But bit 1 being set implies that the transaction didn't abort due to
xabort; otherwise that bit would be clear, as:

  // 0      Set if abort caused by XABORT instruction.
  // 1      If set, the transaction may succeed on a retry.
            This bit is always clear if bit 0 is set (or is always clear if abort is caused by XABORT)

That's why filtering on Power by the "Abort" bit in TEXASR makes the number of
aborts behave like on x64. If we don't filter aborts caused by tabort, we find the
X*2+1 pattern, because both rtm_retry_lock_on_abort() and
rtm_retry_lock_on_busy() will retry the operation RTMRetryCount times.

My change won't prevent retrying because after rtm_retry_lock_on_abort(), if
cmpxchgd() does not succeed, it calls rtm_retry_lock_on_busy(), which in turn
will also retry the operation based on the value specified by RTMRetryCount.

I prepared a simple test case where UseRTMXendForLockBusy is deactivated to show
that, if we increase RTMRetryCount, even with that flag deactivated
the operation is retried exactly RTMRetryCount+1 times after the fix, like on
Intel:

https://github.com/gromero/retry

You just need to clone and run it pointing to a build dir:

$ git clone https://github.com/gromero/retry && cd retry
$ ./retry

You have to build the WhiteBox lib through "make build-test-lib" before running
it.
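
The bit semantics described above can be sketched in C++. This is only an
illustration, not the HotSpot code: the two x86 masks come from the comment
quoted in this mail, while PowerAbortInfo and all function names are
hypothetical, and the TEXASR bits are modelled as plain booleans rather than
real bit positions.

```cpp
#include <cstdint>

// x86 abort status bits, as described in the quoted comment.
constexpr std::uint32_t kAbortedByXabort   = 0x1;  // bit 0
constexpr std::uint32_t kMaySucceedOnRetry = 0x2;  // bit 1

// x86: retry "on abort" only if bit 1 is set. Since bit 1 is always clear
// when bit 0 is set, an abort caused by XABORT is never retried here.
bool x86_should_retry_on_abort(std::uint32_t abort_status) {
  return (abort_status & kMaySucceedOnRetry) != 0;
}

// Hypothetical model of the two TEXASR facts discussed in the thread.
struct PowerAbortInfo {
  bool persistent;        // failure is flagged as persistent
  bool caused_by_tabort;  // "Abort" bit: transaction ended by tabort
};

// Before the change: only the persistent bit is checked.
bool ppc_should_retry_before_fix(const PowerAbortInfo& s) {
  return !s.persistent;
}

// After the change: an intentional abort is not retried either.
bool ppc_should_retry_after_fix(const PowerAbortInfo& s) {
  return !s.persistent && !s.caused_by_tabort;
}
```

With a non-persistent abort caused by tabort, the "before" check asks for a
retry while the "after" check does not, which matches the x86 result for a
status word that has only bit 0 set.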

So for RTMRetryCount=1 and RTMRetryCount=2 w/ -UseRTMXendForLockBusy before the
change:

gromero@gromero16:~/git/retry$ ./retry.sh /home/gromero/hg/jdk/jdk/build/linux-ppc64le-normal-server-release 1
++ /home/gromero/hg/jdk/jdk/build/linux-ppc64le-normal-server-release/jdk/bin/javac -cp /home/gromero/hg/jdk/jdk/build/linux-ppc64le-normal-server-release/support/test/lib/wb.jar --add-exports java.base/jdk.internal.misc=ALL-UNNAMED retry.java
++ LD_LIBRARY_PATH=/home/gromero/hg/jdk/jdk/build/linux-ppc64le-normal-server-release/jdk/../..//src/utils/hsdis/build/linux-ppc64le
++ /home/gromero/hg/jdk/jdk/build/linux-ppc64le-normal-server-release/jdk/bin/java -Xbootclasspath/a:/home/gromero/hg/jdk/jdk/build/linux-ppc64le-normal-server-release/support/test/lib/wb.jar -XX:+UnlockExperimentalVMOptions -XX:+UnlockDiagnosticVMOptions -XX:+WhiteBoxAPI -XX:+UseRTMLocking -XX:+PrintPreciseRTMLockingStatistics -XX:-TieredCompilation -Xcomp -XX:-UseRTMXendForLockBusy -XX:RTMTotalCountIncrRate=1 -XX:RTMRetryCount=1 -XX:CompileOnly=RTM.syncAndTest --add-exports java.base/jdk.internal.misc=ALL-UNNAMED retry
Creating thread0...
Trying to inflate lock...
Is monitor inflated? Yes
Entering thread to sleep...
RTM.syncAndTest@26
# rtm locks total (estimated): 3
# rtm lock aborts : 3
# rtm lock aborts 0: 3
# rtm lock aborts 1: 3
# rtm lock aborts 2: 0
# rtm lock aborts 3: 0
# rtm lock aborts 4: 0
# rtm lock aborts 5: 0
++ set +x
gromero@gromero16:~/git/retry$ ./retry.sh /home/gromero/hg/jdk/jdk/build/linux-ppc64le-normal-server-release 2
++ /home/gromero/hg/jdk/jdk/build/linux-ppc64le-normal-server-release/jdk/bin/javac -cp /home/gromero/hg/jdk/jdk/build/linux-ppc64le-normal-server-release/support/test/lib/wb.jar --add-exports java.base/jdk.internal.misc=ALL-UNNAMED retry.java
++ LD_LIBRARY_PATH=/home/gromero/hg/jdk/jdk/build/linux-ppc64le-normal-server-release/jdk/../..//src/utils/hsdis/build/linux-ppc64le
++ /home/gromero/hg/jdk/jdk/build/linux-ppc64le-normal-server-release/jdk/bin/java -Xbootclasspath/a:/home/gromero/hg/jdk/jdk/build/linux-ppc64le-normal-server-release/support/test/lib/wb.jar -XX:+UnlockExperimentalVMOptions -XX:+UnlockDiagnosticVMOptions -XX:+WhiteBoxAPI -XX:+UseRTMLocking -XX:+PrintPreciseRTMLockingStatistics -XX:-TieredCompilation -Xcomp -XX:-UseRTMXendForLockBusy -XX:RTMTotalCountIncrRate=1 -XX:RTMRetryCount=2 -XX:CompileOnly=RTM.syncAndTest --add-exports java.base/jdk.internal.misc=ALL-UNNAMED retry
Creating thread0...
Trying to inflate lock...
Is monitor inflated? Yes
Entering thread to sleep...
RTM.syncAndTest@26
# rtm locks total (estimated): 5
# rtm lock aborts : 5
# rtm lock aborts 0: 5
# rtm lock aborts 1: 5
# rtm lock aborts 2: 0
# rtm lock aborts 3: 0
# rtm lock aborts 4: 0
# rtm lock aborts 5: 0
++ set +x

and after the change:

gromero@gromero16:~/git/retry$ ./retry.sh /home/gromero/hg/jdk/jdk/build/linux-ppc64le-normal-server-release 1
++ /home/gromero/hg/jdk/jdk/build/linux-ppc64le-normal-server-release/jdk/bin/javac -cp /home/gromero/hg/jdk/jdk/build/linux-ppc64le-normal-server-release/support/test/lib/wb.jar --add-exports java.base/jdk.internal.misc=ALL-UNNAMED retry.java
++ LD_LIBRARY_PATH=/home/gromero/hg/jdk/jdk/build/linux-ppc64le-normal-server-release/jdk/../..//src/utils/hsdis/build/linux-ppc64le
++ /home/gromero/hg/jdk/jdk/build/linux-ppc64le-normal-server-release/jdk/bin/java -Xbootclasspath/a:/home/gromero/hg/jdk/jdk/build/linux-ppc64le-normal-server-release/support/test/lib/wb.jar -XX:+UnlockExperimentalVMOptions -XX:+UnlockDiagnosticVMOptions -XX:+WhiteBoxAPI -XX:+UseRTMLocking -XX:+PrintPreciseRTMLockingStatistics -XX:-TieredCompilation -Xcomp -XX:-UseRTMXendForLockBusy -XX:RTMTotalCountIncrRate=1 -XX:RTMRetryCount=1 -XX:CompileOnly=RTM.syncAndTest --add-exports java.base/jdk.internal.misc=ALL-UNNAMED retry
Creating thread0...
Trying to inflate lock...
Is monitor inflated? Yes
Entering thread to sleep...
RTM.syncAndTest@26
# rtm locks total (estimated): 2
# rtm lock aborts : 2
# rtm lock aborts 0: 2
# rtm lock aborts 1: 2
# rtm lock aborts 2: 0
# rtm lock aborts 3: 0
# rtm lock aborts 4: 0
# rtm lock aborts 5: 0
++ set +x
gromero@gromero16:~/git/retry$ ./retry.sh /home/gromero/hg/jdk/jdk/build/linux-ppc64le-normal-server-release 2
++ /home/gromero/hg/jdk/jdk/build/linux-ppc64le-normal-server-release/jdk/bin/javac -cp /home/gromero/hg/jdk/jdk/build/linux-ppc64le-normal-server-release/support/test/lib/wb.jar --add-exports java.base/jdk.internal.misc=ALL-UNNAMED retry.java
++ LD_LIBRARY_PATH=/home/gromero/hg/jdk/jdk/build/linux-ppc64le-normal-server-release/jdk/../..//src/utils/hsdis/build/linux-ppc64le
++ /home/gromero/hg/jdk/jdk/build/linux-ppc64le-normal-server-release/jdk/bin/java -Xbootclasspath/a:/home/gromero/hg/jdk/jdk/build/linux-ppc64le-normal-server-release/support/test/lib/wb.jar -XX:+UnlockExperimentalVMOptions -XX:+UnlockDiagnosticVMOptions -XX:+WhiteBoxAPI -XX:+UseRTMLocking -XX:+PrintPreciseRTMLockingStatistics -XX:-TieredCompilation -Xcomp -XX:-UseRTMXendForLockBusy -XX:RTMTotalCountIncrRate=1 -XX:RTMRetryCount=2 -XX:CompileOnly=RTM.syncAndTest --add-exports java.base/jdk.internal.misc=ALL-UNNAMED retry
Creating thread0...
Trying to inflate lock...
Is monitor inflated? Yes
Entering thread to sleep...
RTM.syncAndTest@26
# rtm locks total (estimated): 3
# rtm lock aborts : 3
# rtm lock aborts 0: 3
# rtm lock aborts 1: 3
# rtm lock aborts 2: 0
# rtm lock aborts 3: 0
# rtm lock aborts 4: 0
# rtm lock aborts 5: 0
++ set +x

Best regards,
Gustavo

> Thanks and best regards,
> Martin
>
>
> -----Original Message-----
> From: Gustavo Romero [mailto:gromero at linux.vnet.ibm.com]
> Sent: Monday, 25 June 2018 10:24
> To: Lindenmaier, Goetz; Doerr, Martin; hotspot-compiler-dev at openjdk.java.net
> Cc: ppc-aix-port-dev at openjdk.java.net
> Subject: RFR(s): 8205580: PPC64: RTM: Don't retry lock on abort if abort was intentional
>
> Hi,
>
> Could the following change be reviewed please?
>
> bug   : https://bugs.openjdk.java.net/browse/JDK-8205580
> webrev: http://cr.openjdk.java.net/~gromero/8205580/v1/
>
> It changes the behavior of rtm_retry_lock_on_abort() by avoiding a retry if the
> abort was deliberate, i.e. caused by a 'tabort r0' instruction.
>
> On Intel, bit 1 in abort_status_Reg (which communicates the abort status) is
> always clear when an 'xabort 0' instruction is executed, to indicate that the
> transaction /cannot/ succeed on retry. So rtm_retry_lock_on_abort() on
> Intel, on finding bit 1 clear in abort_status_Reg, skips the retry.
>
> Currently on Power rtm_retry_lock_on_abort() only checks the persistent bit
> (if set => skip), which /is not set/ by 'tabort r0'. Hence
> rtm_retry_lock_on_abort() does retry the lock on an intentional abort caused by
> 'tabort'. It leads, for instance with -XX:RTMRetryCount=1, to the following
> discrepancy between Intel and Power regarding the number of retries/aborts:
>
> [Power]
> # rtm locks total (estimated): 3
> # rtm lock aborts : 3
> # rtm lock aborts 0: 3
> # rtm lock aborts 1: 3
> # rtm lock aborts 2: 0
> # rtm lock aborts 3: 0
> # rtm lock aborts 4: 0
> # rtm lock aborts 5: 0
>
> [Intel]
> # rtm locks total (estimated): 2
> # rtm lock aborts : 2
> # rtm lock aborts 0: 2
> # rtm lock aborts 1: 2
> # rtm lock aborts 2: 0
> # rtm lock aborts 3: 0
> # rtm lock aborts 4: 0
> # rtm lock aborts 5: 0
>
> So for -XX:RTMRetryCount=X:
> on Power the number of aborts is: X*2+1 [1 first failure + X rtm_retry_lock_on_abort() + X rtm_retry_lock_on_busy()];
> on Intel the number of aborts is: X+1 [1 first failure + X rtm_retry_lock_on_busy()]
>
> This change fixes that discrepancy by using the "Abort" bit in the TEXASR register
> (abort_status_Reg), which tells whether a transaction was aborted due to a 'tabort'
> instruction, and skips the retry if that bit is set.
>
> It fixes the following tests:
>
> +Passed: compiler/rtm/locking/TestRTMRetryCount.java
> +Passed: compiler/rtm/locking/TestRTMAbortThreshold.java
>
>
> Thank you and best regards,
> Gustavo
>

From gromero at linux.vnet.ibm.com Mon Jul 2 14:03:00 2018
From: gromero at linux.vnet.ibm.com (Gustavo Romero)
Date: Mon, 2 Jul 2018 11:03:00 -0300
Subject: RFR(s): 8205580: PPC64: RTM: Don't retry lock on abort if abort was intentional
In-Reply-To:
References: <56e66a89-42a7-eb72-05a2-a97c696379c9@linux.vnet.ibm.com>
Message-ID:

Hi Martin,

On 07/02/2018 10:20 AM, Doerr, Martin wrote:
> I meant retrying "on abort", not "on busy". There are different counters for these two retry functions.
> RTM for Stack Locks only supports "on abort".
> Inflated RTM locking supports both using both counters.

Yup, I meant "on abort" too, but I missed your point regarding the retry "on
abort" in the RTM for Stack Locks case.
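
The two separate retry counters mentioned here also explain the abort counts
reported in the original RFR. A minimal sketch, assuming the lock stays busy
for the whole run and every transaction attempt is aborted deliberately (the
-XX:-UseRTMXendForLockBusy case); count_aborts and its parameters are
hypothetical names for a simplified model, not HotSpot functions:

```cpp
// Simplified model of inflated RTM locking while the lock stays busy.
// retry_count stands in for RTMRetryCount; retry_on_intentional_abort is
// true for Power before the fix and false for Intel and Power after the fix.
int count_aborts(int retry_count, bool retry_on_intentional_abort) {
  int on_abort_left = retry_count;  // counter of the "on abort" retry path
  int on_busy_left  = retry_count;  // counter of the "on busy" retry path
  int aborts = 0;
  while (true) {
    ++aborts;  // lock is busy, so the attempt ends in tabort/xabort
    if (retry_on_intentional_abort && on_abort_left > 0) {
      --on_abort_left;  // "on abort" path retries the transaction
      continue;
    }
    // The compare-and-exchange on the owner field fails (lock still busy),
    // so control falls through to the "on busy" path.
    if (on_busy_left > 0) {
      --on_busy_left;
      continue;
    }
    break;  // both counters exhausted
  }
  return aborts;
}
```

Under these assumptions the model returns X*2+1 when intentional aborts are
retried and X+1 when they are not, which agrees with the 3/5 and 2/3 abort
counts shown for RTMRetryCount=1 and RTMRetryCount=2.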

> But I see that x86 uses the same behavior as you when using -XX:-UseRTMXendForLockBusy.
> I think it's not so good to treat "abort instruction on lock busy" as permanent abort reason.
> So the behavior is fine with UseRTMXendForLockBusy, but not without it.

hmm I see. So you think it's also not fine on x64, right? In general I think
it's not good to finish a RTM atomic block with `xabort/tabort`, but maybe it
makes more sense on x86. Luckily -XX:+UseRTMXendForLockBusy is the default.

> But I can live with your change because it only has a negative effect on an unsupported experimental option. And I think your change is fine for the other usages of tabort().

OK. I'll revisit that when doing additional performance tests with RTM for Stack
Locks.

Thanks!

Best regards,
Gustavo

From gromero at linux.vnet.ibm.com Tue Jul 10 16:59:11 2018
From: gromero at linux.vnet.ibm.com (Gustavo Romero)
Date: Tue, 10 Jul 2018 13:59:11 -0300
Subject: RFR(s): 8205582: PPC64: RTM: Fix counter for aborts on nested transactions
In-Reply-To: <571b4e4aa8d04c55a2c68d655aff4023@sap.com>
References: <8aadb20a-fba7-0c52-04c0-015a731e60bd@linux.vnet.ibm.com> <571b4e4aa8d04c55a2c68d655aff4023@sap.com>
Message-ID: <780da692-c439-2169-b007-3a7c07db5d9e@linux.vnet.ibm.com>

Dear Martin,

On 06/25/2018 01:31 PM, Doerr, Martin wrote:
> I think it would be better to ignore bit 63 in the macroAssembler code and use the definition from the spec in assembler_ppc.hpp.
> Somebody may want to use the definition for other purposes.

Done. The transactional_level bit is 52 now, so bits 52:62 can easily be
extracted using 'rldicr' in macroAssembler.

> I wonder if Assembler::tm_trans_cf | Assembler::tm_non_trans_cf would be a better match for x86's description for tm_failure_bit[2]. It's also a little unfortunate to print the same bit twice as tm_failure_bit[4].

Done. Now both tm_trans_cf and tm_non_trans_cf failures will increment counter 2
(conflict). The duplicated check code for tm_failure_bit[4] was removed, and
counter 4 (debug) is now mapped to count traps or syscalls caught in TM events,
which seems a reasonable approximation of the original semantics of the debug
counter on Intel. Unfortunately I could not confirm on AIX how these two events
(trap and syscall in TM) set the failure code, so the counter will never track
any information on AIX. But with the current proposed change that failure code
can easily be added in the future.

I also realized that I previously used a wrong ME operand value in:

+    // Extract 11 bits
+    rldicr_(temp_Reg, abort_status, tm_failure_bit[i], 11);

It should actually be 10 to extract 11 bits, so all extractions should be
correct now.

I hope you don't find the array map of failure bits vs counters overkill.

Finally, I replaced the comment:

  tm_tabort, // Note: Seems like signal handler sets this, too.

by:

  tm_tabort, // Signal handler will set this too.

because we only enable RTM support on Power if 'htm-nosc' is supported, so
treclaim. on aborting the syscall will indeed always set the Abort bit in TEXASR,
afaik. Since the debug counter now tracks trap/syscall in TM, it's possible to
check that counter to verify the number of aborts caused by the kernel (if any).

new webrev:
http://cr.openjdk.java.net/~gromero/8205582/v2/

Best regards,
Gustavo

> Best regards,
> Martin
>
>
> -----Original Message-----
> From: Gustavo Romero [mailto:gromero at linux.vnet.ibm.com]
> Sent: Monday, 25 June 2018 10:19
> To: Lindenmaier, Goetz; Doerr, Martin; hotspot-compiler-dev at openjdk.java.net
> Cc: ppc-aix-port-dev at openjdk.java.net
> Subject: RFR(s): 8205582: PPC64: RTM: Fix counter for aborts on nested transactions
>
> Hi,
>
> Could the following change be reviewed please?
>
> bug   : https://bugs.openjdk.java.net/browse/JDK-8205582
> webrev: http://cr.openjdk.java.net/~gromero/8205582/v1/
>
> It fixes the RTM counter for nested aborts (rtm lock aborts type 5) by
> extracting and checking bits in the Transactional Level field of the TEXASR
> register.
>
> It also fixes the memory conflict counter (rtm lock aborts type 2). The Power TM
> status register has two bits to report two different types of memory
> conflict between threads: non-transactional and transactional. According to how
> the jtreg RTM tests are designed, the memory conflict counter counts
> non-transactional conflicts: in TestPrintPreciseRTMLockingStatistics an RTM lock
> is held on a static variable while another thread without any synchronization
> (non-transactional) tries to modify the same variable. Hence that small
> adjustment satisfies TestPrintPreciseRTMLockingStatistics, making it pass on
> Power. The memory conflict counter is not used anywhere besides the
> RTM precise statistics (no decision is made by the JVM based on that amount).
>
> This change partially fixes some failures in
> compiler/rtm/print/TestPrintPreciseRTMLockingStatistics.java regarding the
> nested and memory conflict abort counters. The remaining issue will be fixed by
> aborting on calling JNI (next RFR).
>
>
> Thank you and best regards,
> Gustavo
>

From mikael.vidstedt at oracle.com Wed Jul 11 22:52:26 2018
From: mikael.vidstedt at oracle.com (Mikael Vidstedt)
Date: Wed, 11 Jul 2018 15:52:26 -0700
Subject: RFR(S): 8207011: Remove uses of the register storage class specifier
Message-ID:

Please review the below change which removes *most* uses of the register keyword/storage class specifier.

Bug: https://bugs.openjdk.java.net/browse/JDK-8207011
Webrev: http://cr.openjdk.java.net/~mikael/webrevs/8207011/webrev.01/open/webrev/

* Background (from the bug)

The C/C++ register keyword/storage class specifier may have made a difference many moons ago, but the C++11 standard deprecated it, and starting with C++17 it is a reserved keyword. Some compilers emit deprecation warnings even when targeting earlier C++ standards such as C++14.

* Commentary

The one case where the register keyword remains is when compiling (effectively) inline assembly with gcc, patterns like:

address os::current_stack_pointer() {
  ...
#else // gcc
  register void *esp __asm__ (SPELL_REG_SP);
  return (address) esp;
#endif
}

Removing the register keyword here breaks the code, and gcc does *not* complain about using it for these patterns, so I chose to leave it there. An alternative to that would be to always use the 'clang' style mov instruction. I know there is another thread[1] discussing how to move forward with the current_stack_pointer on clang 4.0. I'll keep my eyes on that to make sure we don't collide (and cc:ing Martin for good luck).

Would appreciate some help from the respective porting folks to verify the aix/ppc/s390 changes.

Cheers,
Mikael

[1] http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2018-July/029099.html

From kim.barrett at oracle.com Thu Jul 12 04:03:42 2018
From: kim.barrett at oracle.com (Kim Barrett)
Date: Thu, 12 Jul 2018 00:03:42 -0400
Subject: RFR(S): 8207011: Remove uses of the register storage class specifier
In-Reply-To:
References:
Message-ID: <35BB03F7-3FCB-4DC9-82A1-F1DFD4FDCD2F@oracle.com>

> On Jul 11, 2018, at 6:52 PM, Mikael Vidstedt wrote:
>
>
> Please review the below change which removes *most* uses of the register keyword/storage class specifier.
>
> Bug: https://bugs.openjdk.java.net/browse/JDK-8207011
> Webrev: http://cr.openjdk.java.net/~mikael/webrevs/8207011/webrev.01/open/webrev/
>
> * Background (from the bug)
>
> The C/C++ register keyword/storage class specifier may have made a difference many moons ago, but the C++11 standard deprecated it, and starting with C++17 it is a reserved keyword. Some compilers emit deprecation warnings even when targeting earlier C++ standards such as C++14.
>
>
> * Commentary
>
> The one case where the register keyword remains is when compiling (effectively) inline assembly with gcc, patterns like:
>
> address os::current_stack_pointer() {
>   ...
> #else // gcc
>   register void *esp __asm__ (SPELL_REG_SP);
>   return (address) esp;
> #endif
> }
>
> Removing the register keyword here breaks the code, and gcc does *not* complain about using it for these patterns, so I chose to leave it there. An alternative to that would be to always use the 'clang' style mov instruction. I know there is another thread[1] discussing how to move forward with the current_stack_pointer on clang 4.0. I'll keep my eyes on that to make sure we don't collide (and cc:ing Martin for good luck).
>
> Would appreciate some help from the respective porting folks to verify the aix/ppc/s390 changes.
>
> Cheers,
> Mikael
>
> [1] http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2018-July/029099.html

Looks good.

From martin.doerr at sap.com Thu Jul 12 15:30:27 2018
From: martin.doerr at sap.com (Doerr, Martin)
Date: Thu, 12 Jul 2018 15:30:27 +0000
Subject: RFR(s): 8205582: PPC64: RTM: Fix counter for aborts on nested transactions
In-Reply-To: <780da692-c439-2169-b007-3a7c07db5d9e@linux.vnet.ibm.com>
References: <8aadb20a-fba7-0c52-04c0-015a731e60bd@linux.vnet.ibm.com> <571b4e4aa8d04c55a2c68d655aff4023@sap.com> <780da692-c439-2169-b007-3a7c07db5d9e@linux.vnet.ibm.com>
Message-ID:

Hi Gustavo,

I think your new code may increment a counter twice when 2 bits are specified for it.
I think this should get fixed.
Iterating over the RTMLockingCounters in the outer loop and performing several checks before incrementing should fix this, right?

Besides that, I have some improvement proposals:

- I think using 3 constant tables is not so easy to read. For example, inverting could be encoded in your new 2-dimensional table by using -1, 0, +1 when using int instead of bool. Would this be better?

- I think you can get rid of the rtm_counters_Reg increment and restoration by computing the abort_offs relative to the original value.

Best regards,
Martin


-----Original Message-----
From: Gustavo Romero [mailto:gromero at linux.vnet.ibm.com]
Sent: Tuesday, July 10, 2018 18:59
To: Doerr, Martin; Lindenmaier, Goetz; hotspot-compiler-dev at openjdk.java.net
Cc: ppc-aix-port-dev at openjdk.java.net
Subject: Re: RFR(s): 8205582: PPC64: RTM: Fix counter for aborts on nested transactions

Dear Martin,

On 06/25/2018 01:31 PM, Doerr, Martin wrote:
> I think it would be better to ignore bit 63 in the macroAssembler code and use the definition from the spec in assembler_ppc.hpp.
> Somebody may want to use the definition for other purposes.

Done. transactional_level bit is 52 now so bits 52:62 can easily be extracted
using 'rldicr' in macroAssembler.

> I wonder if Assembler::tm_trans_cf | Assembler::tm_non_trans_cf would be a better match for x86's description for tm_failure_bit[2]. It's also a little unfortunate to print the same bit twice as tm_failure_bit[4].

Done. Now both tm_trans_cf and tm_non_trans_cf failures will increment counter 2
(conflict). Duplicated check code for tm_failure_bit[4] was removed and now
counter 4 (debug) is mapped to count traps or syscalls caught in TM events,
which seems a reasonable approximation to the original semantics of the debug
counter on Intel. Unfortunately I could not confirm on AIX how these two events
(trap and syscall in TM) will set the failure code, so the counter will never
track any information on AIX.
But with the current proposed change that failure
code can be easily added in the future.

I also realized that I previously used a wrong ME operand value in:

+ // Extract 11 bits
+ rldicr_(temp_Reg, abort_status, tm_failure_bit[i], 11);

It should actually be 10 to extract 11 bits, so all extractions must be correct
now.


I hope you don't find the array map of failure bits vs counters overkill.

Finally, I replaced the comment:

tm_tabort, // Note: Seems like signal handler sets this, too.

by:

tm_tabort, // Signal handler will set this too.

because we only enable RTM support on Power if 'htm-nosc' is supported, so
treclaim. on aborting the syscall will indeed always set Abort bit in TEXASR
afaik. Since debug counter now tracks trap/syscall in TM it's possible to check
that counter to verify the number of aborts caused by the kernel (if any).

new webrev: http://cr.openjdk.java.net/~gromero/8205582/v2/


Best regards,
Gustavo

> Best regards,
> Martin
>
>
> -----Original Message-----
> From: Gustavo Romero [mailto:gromero at linux.vnet.ibm.com]
> Sent: Monday, June 25, 2018 10:19
> To: Lindenmaier, Goetz; Doerr, Martin; hotspot-compiler-dev at openjdk.java.net
> Cc: ppc-aix-port-dev at openjdk.java.net
> Subject: RFR(s): 8205582: PPC64: RTM: Fix counter for aborts on nested transactions
>
> Hi,
>
> Could the following change be reviewed please?
>
> bug : https://bugs.openjdk.java.net/browse/JDK-8205582
> webrev: http://cr.openjdk.java.net/~gromero/8205582/v1/
>
> It fixes the RTM counter for nested aborts (rtm lock aborts type 5) by
> extracting and checking bits in the Transactional Level field of TEXASR
> register.
>
> It also fixes the memory conflict counter (rtm lock aborts type 2). Power TM
> status register supports two bits to inform two different types of memory
> conflict between threads: non-transactional and transactional.
According to how
> the jtreg RTM tests are designed the memory conflict counter counts
> non-transactional conflicts: on TestPrintPreciseRTMLockingStatistics an RTM lock
> is held on a static variable while another thread without any synchronization
> (non-transactional) tries to modify the same variable. Hence that small
> adjustment satisfies the TestPrintPreciseRTMLockingStatistics making it pass on
> Power. The memory conflict counter is not used in any other place besides by the
> RTM precise statistics (no decision is made by the JVM based on that amount).
>
> This change partially fixes some failures in
> compiler/rtm/print/TestPrintPreciseRTMLockingStatistics.java regarding the
> nested and memory conflict abort counters. The remaining issue will be fixed by
> aborting on calling JNI (next RFR).
>
>
> Thank you and best regards,
> Gustavo
>

From martinrb at google.com Thu Jul 12 18:42:03 2018
From: martinrb at google.com (Martin Buchholz)
Date: Thu, 12 Jul 2018 11:42:03 -0700
Subject: RFR(S): 8207011: Remove uses of the register storage class specifier
In-Reply-To:
References:
Message-ID:

On Wed, Jul 11, 2018 at 3:52 PM, Mikael Vidstedt wrote:

>
> Removing the register keyword here breaks the code, and gcc does *not*
> complain about using it for these patterns, so I chose to leave it there.
> An alternative to that would be to always use the "clang" style mov
> instruction. I know there is another thread[1] discussing how to move
> forward with the current_stack_pointer on clang 4.0. I'll keep my eyes on
> that to make sure we don't collide (and cc:ing Martin for good luck).
>

As I wrote elsewhere, I'd like to get rid of the super-brittle stack pointer assembly entirely, but especially for stack alignment checking.
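As a side note on the stack pointer discussion above, here is a minimal sketch of one way to obtain an approximate stack position without the register keyword or inline assembly. It assumes the GCC/Clang builtin __builtin_frame_address, which nobody in this thread proposes; the function name approximate_stack_pointer is hypothetical. The builtin returns the frame address of the current function, which is close to, but not the same as, the stack pointer, so this is only an illustration and not a replacement for os::current_stack_pointer().

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

// Hypothetical helper (not HotSpot code): returns the frame address of this
// function, which lies on the current thread's stack near the stack pointer.
// Assumes a GCC- or Clang-compatible compiler.
__attribute__((noinline)) static void* approximate_stack_pointer() {
  return __builtin_frame_address(0);
}

// Distance in bytes between the address of a caller-provided local variable
// and the approximate stack pointer. Small values mean both are on one stack.
static long long distance_from_local(const void* local) {
  intptr_t a = reinterpret_cast<intptr_t>(local);
  intptr_t b = reinterpret_cast<intptr_t>(approximate_stack_pointer());
  return std::llabs(static_cast<long long>(a - b));
}
```

For a coarse check such as "is this address on my stack", the frame address is usually good enough; an exact alignment check of the stack pointer itself would still need a different mechanism.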
From gromero at linux.vnet.ibm.com Fri Jul 13 14:55:53 2018
From: gromero at linux.vnet.ibm.com (Gustavo Romero)
Date: Fri, 13 Jul 2018 11:55:53 -0300
Subject: RFR(s): 8205582: PPC64: RTM: Fix counter for aborts on nested transactions
In-Reply-To:
References: <8aadb20a-fba7-0c52-04c0-015a731e60bd@linux.vnet.ibm.com> <571b4e4aa8d04c55a2c68d655aff4023@sap.com> <780da692-c439-2169-b007-3a7c07db5d9e@linux.vnet.ibm.com>
Message-ID: <7f8b0534-baf1-9ddf-d4ef-e96a75b898ed@linux.vnet.ibm.com>

Hi Martin,

On 07/12/2018 12:30 PM, Doerr, Martin wrote:
> I think your new code may increment a counter twice when 2 bits are specified for it. I think this should get fixed.
> Iterating over the RTMLockingCounters in the outer loop and performing several checks before incrementing should fix this, right?

Do you mean increment twice counter #2 (conflict counter) or another counter?
non_trans_cf and trans_cf are mutually exclusive.
You probably spotted a case I'm missing so I ask.

> Besides that, I have some improvement proposals:
>
> - I think using 3 constant tables is not so good to read. For example, inverting could be encoded in your new 2 dimensional table by using -1, 0 , +1 when using int instead of bool. Would this be better?
>
> - I think you can get rid of rtm_counters_Reg increment and restoration by computing the abort_offs relative to the original value.

Cool :)

New (interim) webrev:
http://cr.openjdk.java.net/~gromero/8205582/v3/

Thanks.


Best regards,
Gustavo

> Best regards,
> Martin
>
>
> -----Original Message-----
> From: Gustavo Romero [mailto:gromero at linux.vnet.ibm.com]
> Sent: Dienstag, 10.
Juli 2018 18:59 > To: Doerr, Martin ; Lindenmaier, Goetz ; hotspot-compiler-dev at openjdk.java.net > Cc: ppc-aix-port-dev at openjdk.java.net > Subject: Re: RFR(s): 8205582: PPC64: RTM: Fix counter for aborts on nested transactions > > Dear Martin, > > On 06/25/2018 01:31 PM, Doerr, Martin wrote: >> I think it would be better to ignore bit 63 in the macroAssembler code and use the definition from the spec in assembler_ppc.hpp. >> Somebody may want to use the definition for other purposes. > > Done. transactional_level bit is 52 now so bits 52:62 can easily be extracted > using 'rldicr' in macroAssembler. > > >> I wonder if Assembler::tm_trans_cf | Assembler::tm_non_trans_cf would be a better match for x86's description for tm_failure_bit[2]. It's also a little unfortunate to print the same bit twice as tm_failure_bit[4]. > > Done. Now both tm_trans_cf and tm_non_trans_cf failures will increment counter 2 > (conflict). Duplicated check code for tm_failure_bit[4] was removed and now > counter 4 (debug) is mapped to count traps or syscalls caught in TM events, > which seems a reasonable approximation to the original semantics of the debug > counter on Intel. Unfortunately I could not confirm on AIX how these two events > (trap and syscall in TM) will set the failure code, so the counter will never > track any information on AIX. But with the current proposed change that failure > code can be easily added in the future. > > I also realized that I used previously a wrong ME operand value in: > > + // Extract 11 bits > + rldicr_(temp_Reg, abort_status, tm_failure_bit[i], 11); > > It should be 10 to extract 11 bits actually, so all extractions must be correct > now. > > > I hope you don't find the array map of failure bits vs counters overkilling. > > Finally, I replaced the comment: > > tm_tabort, // Note: Seems like signal handler sets this, too. > > by: > > tm_tabort, // Signal handler will set this too. 
> > because we just enable RTM support on Power if 'htm-nosc' is supported, so > treclaim. on aborting the syscall will indeed always set Abort bit in TEXASR > afaik. Since debug counter now tracks trap/syscall in TM it's possible to check > that counter to verify the number of aborts cause by the kernel (if any). > > new webrev: http://cr.openjdk.java.net/~gromero/8205582/v2/ > > > Best regards, > Gustavo > >> Best regards, >> Martin >> >> >> -----Original Message----- >> From: Gustavo Romero [mailto:gromero at linux.vnet.ibm.com] >> Sent: Montag, 25. Juni 2018 10:19 >> To: Lindenmaier, Goetz ; Doerr, Martin ; hotspot-compiler-dev at openjdk.java.net >> Cc: ppc-aix-port-dev at openjdk.java.net >> Subject: RFR(s): 8205582: PPC64: RTM: Fix counter for aborts on nested transactions >> >> Hi, >> >> Could the following change be reviewed please? >> >> bug : https://bugs.openjdk.java.net/browse/JDK-8205582 >> webrev: http://cr.openjdk.java.net/~gromero/8205582/v1/ >> >> It fixes the RTM counter for nested aborts (rtm lock aborts type 5) by >> extracting and checking bits in the Transactional Level field of TEXASR >> register. >> >> It also fixes the memory conflict counter (rtm lock aborts type 2). Power TM >> status register supports two bits to inform two different types of memory >> conflict between threads: non-transactional and transactional. According to how >> the jtreg RTM tests are designed the memory conflict counter counts >> non-transactional conflicts: on TestPrintPreciseRTMLockingStatistics a RTM lock >> is held on a static variable while another thread without any synchronization >> (non-trasactional) tries to modify the same variable. Hence that small >> adjustment satisfies the TestPrintPreciseRTMLockingStatistics making it pass on >> Power. The memory conflict counter is not used in any other place besides by the >> RTM precise statistics (no decision is made by the JVM based on that amount). 
>> >> This change partially fixes some failures in >> compiler/rtm/print/TestPrintPreciseRTMLockingStatistics.java regarding the >> nested and memory conflict abort counters. The remaining issue will be fixed by >> aborting on calling JNI (next RFR). >> >> >> Thank you and best regards, >> Gustavo >> > From martin.doerr at sap.com Mon Jul 16 08:17:59 2018 From: martin.doerr at sap.com (Doerr, Martin) Date: Mon, 16 Jul 2018 08:17:59 +0000 Subject: RFR(s): 8205582: PPC64: RTM: Fix counter for aborts on nested transactions In-Reply-To: <7f8b0534-baf1-9ddf-d4ef-e96a75b898ed@linux.vnet.ibm.com> References: <8aadb20a-fba7-0c52-04c0-015a731e60bd@linux.vnet.ibm.com> <571b4e4aa8d04c55a2c68d655aff4023@sap.com> <780da692-c439-2169-b007-3a7c07db5d9e@linux.vnet.ibm.com> <7f8b0534-baf1-9ddf-d4ef-e96a75b898ed@linux.vnet.ibm.com> Message-ID: <5b0e6dff18b74e69962b4b7e8256a17e@sap.com> Hi Gustavo, thanks for the new webrev. You're right, the two bits should be mutual exclusive, so the sum is equivalent to the logical or in this case. However, the loop pretends to be generic, but it's not. The "or" only works for mutual exclusive bits. If you want to keep the code with the loops, I think there should be a comment explaining this. Please also fix indentation. Alternatively, the loop could get replaced by some code for each bit. Given that the loop is not really generic, I think this wouldn't be worse. Besides that, it looks good. Thanks for improving it. Best regards, Martin -----Original Message----- From: Gustavo Romero [mailto:gromero at linux.vnet.ibm.com] Sent: Freitag, 13. Juli 2018 16:56 To: Doerr, Martin ; Lindenmaier, Goetz ; hotspot-compiler-dev at openjdk.java.net Cc: ppc-aix-port-dev at openjdk.java.net Subject: Re: RFR(s): 8205582: PPC64: RTM: Fix counter for aborts on nested transactions Hi Martin, On 07/12/2018 12:30 PM, Doerr, Martin wrote: > I think your new code may increment a counter twice when 2 bits are specified for it. I think this should get fixed. 
> Iterating over the RTMLockingCounters in the outer loop and performing several checks before incrementing should fix this, right? Do you mean increment twice counter #2 (conflict counter) or another counter? non_trans_cf and trans_cf are mutually exclusive. You probably spotted a case I'm missing so I ask. > Besides that, I have some improvement proposals: > > - I think using 3 constant tables is not so good to read. For example, inverting could be encoded in your new 2 dimensional table by using -1, 0 , +1 when using int instead of bool. Would this be better? > > - I think you can get rid of rtm_counters_Reg increment and restoration by computing the abort_offs relative to the original value. Cool :) New (interim) webrev: http://cr.openjdk.java.net/~gromero/8205582/v3/ Thanks. Best regards, Gustavo > Best regards, > Martin > > > -----Original Message----- > From: Gustavo Romero [mailto:gromero at linux.vnet.ibm.com] > Sent: Dienstag, 10. Juli 2018 18:59 > To: Doerr, Martin ; Lindenmaier, Goetz ; hotspot-compiler-dev at openjdk.java.net > Cc: ppc-aix-port-dev at openjdk.java.net > Subject: Re: RFR(s): 8205582: PPC64: RTM: Fix counter for aborts on nested transactions > > Dear Martin, > > On 06/25/2018 01:31 PM, Doerr, Martin wrote: >> I think it would be better to ignore bit 63 in the macroAssembler code and use the definition from the spec in assembler_ppc.hpp. >> Somebody may want to use the definition for other purposes. > > Done. transactional_level bit is 52 now so bits 52:62 can easily be extracted > using 'rldicr' in macroAssembler. > > >> I wonder if Assembler::tm_trans_cf | Assembler::tm_non_trans_cf would be a better match for x86's description for tm_failure_bit[2]. It's also a little unfortunate to print the same bit twice as tm_failure_bit[4]. > > Done. Now both tm_trans_cf and tm_non_trans_cf failures will increment counter 2 > (conflict). 
Duplicated check code for tm_failure_bit[4] was removed and now > counter 4 (debug) is mapped to count traps or syscalls caught in TM events, > which seems a reasonable approximation to the original semantics of the debug > counter on Intel. Unfortunately I could not confirm on AIX how these two events > (trap and syscall in TM) will set the failure code, so the counter will never > track any information on AIX. But with the current proposed change that failure > code can be easily added in the future. > > I also realized that I used previously a wrong ME operand value in: > > + // Extract 11 bits > + rldicr_(temp_Reg, abort_status, tm_failure_bit[i], 11); > > It should be 10 to extract 11 bits actually, so all extractions must be correct > now. > > > I hope you don't find the array map of failure bits vs counters overkilling. > > Finally, I replaced the comment: > > tm_tabort, // Note: Seems like signal handler sets this, too. > > by: > > tm_tabort, // Signal handler will set this too. > > because we just enable RTM support on Power if 'htm-nosc' is supported, so > treclaim. on aborting the syscall will indeed always set Abort bit in TEXASR > afaik. Since debug counter now tracks trap/syscall in TM it's possible to check > that counter to verify the number of aborts cause by the kernel (if any). > > new webrev: http://cr.openjdk.java.net/~gromero/8205582/v2/ > > > Best regards, > Gustavo > >> Best regards, >> Martin >> >> >> -----Original Message----- >> From: Gustavo Romero [mailto:gromero at linux.vnet.ibm.com] >> Sent: Montag, 25. Juni 2018 10:19 >> To: Lindenmaier, Goetz ; Doerr, Martin ; hotspot-compiler-dev at openjdk.java.net >> Cc: ppc-aix-port-dev at openjdk.java.net >> Subject: RFR(s): 8205582: PPC64: RTM: Fix counter for aborts on nested transactions >> >> Hi, >> >> Could the following change be reviewed please? 
>> >> bug : https://bugs.openjdk.java.net/browse/JDK-8205582 >> webrev: http://cr.openjdk.java.net/~gromero/8205582/v1/ >> >> It fixes the RTM counter for nested aborts (rtm lock aborts type 5) by >> extracting and checking bits in the Transactional Level field of TEXASR >> register. >> >> It also fixes the memory conflict counter (rtm lock aborts type 2). Power TM >> status register supports two bits to inform two different types of memory >> conflict between threads: non-transactional and transactional. According to how >> the jtreg RTM tests are designed the memory conflict counter counts >> non-transactional conflicts: on TestPrintPreciseRTMLockingStatistics a RTM lock >> is held on a static variable while another thread without any synchronization >> (non-trasactional) tries to modify the same variable. Hence that small >> adjustment satisfies the TestPrintPreciseRTMLockingStatistics making it pass on >> Power. The memory conflict counter is not used in any other place besides by the >> RTM precise statistics (no decision is made by the JVM based on that amount). >> >> This change partially fixes some failures in >> compiler/rtm/print/TestPrintPreciseRTMLockingStatistics.java regarding the >> nested and memory conflict abort counters. The remaining issue will be fixed by >> aborting on calling JNI (next RFR). 
>>
>>
>> Thank you and best regards,
>> Gustavo
>>
>

From gromero at linux.vnet.ibm.com Tue Jul 17 06:55:17 2018
From: gromero at linux.vnet.ibm.com (Gustavo Romero)
Date: Tue, 17 Jul 2018 03:55:17 -0300
Subject: RFR(s): 8205582: PPC64: RTM: Fix counter for aborts on nested transactions
In-Reply-To: <5b0e6dff18b74e69962b4b7e8256a17e@sap.com>
References: <8aadb20a-fba7-0c52-04c0-015a731e60bd@linux.vnet.ibm.com> <571b4e4aa8d04c55a2c68d655aff4023@sap.com> <780da692-c439-2169-b007-3a7c07db5d9e@linux.vnet.ibm.com> <7f8b0534-baf1-9ddf-d4ef-e96a75b898ed@linux.vnet.ibm.com> <5b0e6dff18b74e69962b4b7e8256a17e@sap.com>
Message-ID: <58772d79-69a3-0589-f1dd-80e33b834c2d@linux.vnet.ibm.com>

Hi Martin,

On 07/16/2018 05:17 AM, Doerr, Martin wrote:
> thanks for the new webrev.

Thanks a lot for the thorough review.

> You're right, the two bits should be mutually exclusive, so the sum is equivalent to the logical or in this case.
> However, the loop pretends to be generic, but it's not. The "or" only works for mutually exclusive bits.
> If you want to keep the code with the loops, I think there should be a comment explaining this.

I see. I've experimented with a couple of options following your previous
suggestion of using the counter loop as the outer loop. The most difficult point
was to work around the constraint of having just R0 available as scratch.
Bit/bitfield extracting instructions did not help much since most of them can't
perform an OR with their destination operand, which gets worse with the R0
constraint. On the other hand, afaics 'rldimi', which does an OR with its
destination operand, can't be used to extract the bit/bitfields in a generic way.

The best alternative I found was to use both CCR0 and CCR1 and their EQ bits. I
think all cases are covered now, i.e. all bits/conditions for a given counter
are ORed. failure_code logic is not inverted any more in the map, which seems
more natural. I also added more information to the comment before the
bit/counter map. I think now the loop is generic.
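To make the "all bits/conditions for a given counter are ORed" idea concrete, here is a minimal host-side sketch in plain C++. It is not the macro assembler code from any webrev: the names kNumBits, kNumCounters, BitCounterMap, kExampleMap and update_counters are hypothetical, the status word is simplified to one bit per failure condition, and the -1/0/+1 encoding follows the table layout suggested earlier in the review. The point it illustrates is that iterating over counters in the outer loop lets a counter be incremented at most once, even if several of its selected conditions hold.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Assumed sizes for the illustration only.
constexpr int kNumBits = 4;
constexpr int kNumCounters = 3;

// map[bit][counter]: +1 = count when the bit is set,
//                    -1 = count when the bit is clear,
//                     0 = the bit does not influence this counter.
using BitCounterMap = std::array<std::array<int, kNumCounters>, kNumBits>;

// Example map: counter 0 counts bit 0 OR bit 1, counter 1 counts
// "bit 2 clear", counter 2 counts bit 3.
constexpr BitCounterMap kExampleMap = {{
  {{ 1,  0, 0}},
  {{ 1,  0, 0}},
  {{ 0, -1, 0}},
  {{ 0,  0, 1}},
}};

// Increments each counter at most once per call if ANY of its selected
// conditions holds (logical OR), regardless of how many hold at once.
inline void update_counters(uint32_t status, const BitCounterMap& map,
                            std::array<uint64_t, kNumCounters>& counters) {
  for (int c = 0; c < kNumCounters; c++) {   // counters in the outer loop
    bool hit = false;
    for (int b = 0; b < kNumBits && !hit; b++) {
      const bool bit_set = ((status >> b) & 1u) != 0;
      const int selection = map[b][c];
      hit = (selection == 1 && bit_set) || (selection == -1 && !bit_set);
    }
    if (hit) {
      counters[c]++;
    }
  }
}
```

With bits 0 and 1 both set, counter 0 still advances by one only, which is the double-increment case raised in the review.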
Webrev for the final result: http://cr.openjdk.java.net/~gromero/8205582/v4_A/ > Please also fix indentation. Done. > Alternatively, the loop could get replaced by some code for each bit. Given that the loop is not really generic, I think this wouldn't be worse. > > Besides that, it looks good. Thanks for improving it. Due to the tight schedule I also provide the corrections for the last reviewed version: http://cr.openjdk.java.net/~gromero/8205582/v4_B/ If v4_A also looks good I vouch for pushing it instead of v4_B. Best regards, Gustavo > Best regards, > Martin > > > -----Original Message----- > From: Gustavo Romero [mailto:gromero at linux.vnet.ibm.com] > Sent: Freitag, 13. Juli 2018 16:56 > To: Doerr, Martin ; Lindenmaier, Goetz ; hotspot-compiler-dev at openjdk.java.net > Cc: ppc-aix-port-dev at openjdk.java.net > Subject: Re: RFR(s): 8205582: PPC64: RTM: Fix counter for aborts on nested transactions > > Hi Martin, > > On 07/12/2018 12:30 PM, Doerr, Martin wrote: >> I think your new code may increment a counter twice when 2 bits are specified for it. I think this should get fixed. >> Iterating over the RTMLockingCounters in the outer loop and performing several checks before incrementing should fix this, right? > > Do you mean increment twice counter #2 (conflict counter) or another counter? > non_trans_cf and trans_cf are mutually exclusive. > You probably spotted a case I'm missing so I ask. > > >> Besides that, I have some improvement proposals: >> >> - I think using 3 constant tables is not so good to read. For example, inverting could be encoded in your new 2 dimensional table by using -1, 0 , +1 when using int instead of bool. Would this be better? >> >> - I think you can get rid of rtm_counters_Reg increment and restoration by computing the abort_offs relative to the original value. > > Cool :) > > New (interim) webrev: > http://cr.openjdk.java.net/~gromero/8205582/v3/ > > Thanks. 
> > > Best regards, > Gustavo > >> Best regards, >> Martin >> >> >> -----Original Message----- >> From: Gustavo Romero [mailto:gromero at linux.vnet.ibm.com] >> Sent: Dienstag, 10. Juli 2018 18:59 >> To: Doerr, Martin ; Lindenmaier, Goetz ; hotspot-compiler-dev at openjdk.java.net >> Cc: ppc-aix-port-dev at openjdk.java.net >> Subject: Re: RFR(s): 8205582: PPC64: RTM: Fix counter for aborts on nested transactions >> >> Dear Martin, >> >> On 06/25/2018 01:31 PM, Doerr, Martin wrote: >>> I think it would be better to ignore bit 63 in the macroAssembler code and use the definition from the spec in assembler_ppc.hpp. >>> Somebody may want to use the definition for other purposes. >> >> Done. transactional_level bit is 52 now so bits 52:62 can easily be extracted >> using 'rldicr' in macroAssembler. >> >> >>> I wonder if Assembler::tm_trans_cf | Assembler::tm_non_trans_cf would be a better match for x86's description for tm_failure_bit[2]. It's also a little unfortunate to print the same bit twice as tm_failure_bit[4]. >> >> Done. Now both tm_trans_cf and tm_non_trans_cf failures will increment counter 2 >> (conflict). Duplicated check code for tm_failure_bit[4] was removed and now >> counter 4 (debug) is mapped to count traps or syscalls caught in TM events, >> which seems a reasonable approximation to the original semantics of the debug >> counter on Intel. Unfortunately I could not confirm on AIX how these two events >> (trap and syscall in TM) will set the failure code, so the counter will never >> track any information on AIX. But with the current proposed change that failure >> code can be easily added in the future. >> >> I also realized that I used previously a wrong ME operand value in: >> >> + // Extract 11 bits >> + rldicr_(temp_Reg, abort_status, tm_failure_bit[i], 11); >> >> It should be 10 to extract 11 bits actually, so all extractions must be correct >> now. >> >> >> I hope you don't find the array map of failure bits vs counters overkilling. 
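The ME correction quoted above (10 rather than 11 to keep 11 bits) can be checked with a small software model of rldicr. This is a sketch written for illustration, not code from the webrev: the helper names rldicr, ibm_bit and bits_kept are hypothetical, and the shift amount 52 is taken from the "bits 52:62" remark in the quoted mail. It assumes the usual Power ISA definition: rotate the doubleword left by SH, then keep bits 0 through ME (IBM numbering, bit 0 is the most significant bit) and clear the rest.

```cpp
#include <cassert>
#include <cstdint>

// Software model of 'rldicr RA,RS,SH,ME' (Rotate Left Doubleword Immediate
// then Clear Right). Valid for sh and me in the range 0..63.
static uint64_t rldicr(uint64_t rs, unsigned sh, unsigned me) {
  const uint64_t rotated = (sh == 0) ? rs : (rs << sh) | (rs >> (64 - sh));
  const uint64_t mask = ~UINT64_C(0) << (63 - me);  // ones in IBM bits 0..me
  return rotated & mask;
}

// A doubleword with only IBM bit 'pos' set (bit 0 is the MSB).
static uint64_t ibm_bit(unsigned pos) {
  return UINT64_C(1) << (63 - pos);
}

// Number of bits that survive the mask for a given ME operand.
static unsigned bits_kept(unsigned me) {
  uint64_t r = rldicr(~UINT64_C(0), 0, me);
  unsigned n = 0;
  for (; r != 0; r &= r - 1) {
    n++;
  }
  return n;
}
```

With SH = 52 and ME = 10, bits 52 through 62 of the source survive and bit 63 is dropped; with ME = 11 bit 63 would leak into the result, which matches the off-by-one described in the mail.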
>> >> Finally, I replaced the comment: >> >> tm_tabort, // Note: Seems like signal handler sets this, too. >> >> by: >> >> tm_tabort, // Signal handler will set this too. >> >> because we just enable RTM support on Power if 'htm-nosc' is supported, so >> treclaim. on aborting the syscall will indeed always set Abort bit in TEXASR >> afaik. Since debug counter now tracks trap/syscall in TM it's possible to check >> that counter to verify the number of aborts cause by the kernel (if any). >> >> new webrev: http://cr.openjdk.java.net/~gromero/8205582/v2/ >> >> >> Best regards, >> Gustavo >> >>> Best regards, >>> Martin >>> >>> >>> -----Original Message----- >>> From: Gustavo Romero [mailto:gromero at linux.vnet.ibm.com] >>> Sent: Montag, 25. Juni 2018 10:19 >>> To: Lindenmaier, Goetz ; Doerr, Martin ; hotspot-compiler-dev at openjdk.java.net >>> Cc: ppc-aix-port-dev at openjdk.java.net >>> Subject: RFR(s): 8205582: PPC64: RTM: Fix counter for aborts on nested transactions >>> >>> Hi, >>> >>> Could the following change be reviewed please? >>> >>> bug : https://bugs.openjdk.java.net/browse/JDK-8205582 >>> webrev: http://cr.openjdk.java.net/~gromero/8205582/v1/ >>> >>> It fixes the RTM counter for nested aborts (rtm lock aborts type 5) by >>> extracting and checking bits in the Transactional Level field of TEXASR >>> register. >>> >>> It also fixes the memory conflict counter (rtm lock aborts type 2). Power TM >>> status register supports two bits to inform two different types of memory >>> conflict between threads: non-transactional and transactional. According to how >>> the jtreg RTM tests are designed the memory conflict counter counts >>> non-transactional conflicts: on TestPrintPreciseRTMLockingStatistics a RTM lock >>> is held on a static variable while another thread without any synchronization >>> (non-trasactional) tries to modify the same variable. Hence that small >>> adjustment satisfies the TestPrintPreciseRTMLockingStatistics making it pass on >>> Power. 
The memory conflict counter is not used in any other place besides by the >>> RTM precise statistics (no decision is made by the JVM based on that amount). >>> >>> This change partially fixes some failures in >>> compiler/rtm/print/TestPrintPreciseRTMLockingStatistics.java regarding the >>> nested and memory conflict abort counters. The remaining issue will be fixed by >>> aborting on calling JNI (next RFR). >>> >>> >>> Thank you and best regards, >>> Gustavo >>> >> > From gromero at linux.vnet.ibm.com Tue Jul 17 07:06:57 2018 From: gromero at linux.vnet.ibm.com (Gustavo Romero) Date: Tue, 17 Jul 2018 04:06:57 -0300 Subject: RFR(xs): 8205578: jtreg: Fix failing TestRTMAbortRatio on PPC64 In-Reply-To: <76020dfc-81b1-897e-f04e-5ac5cd021418@linux.vnet.ibm.com> References: <37227b8b-7977-349a-249c-f7613d06d64f@linux.vnet.ibm.com> <25dba6c8-067d-3d3f-c4f3-dca19a34d70b@oracle.com> <76020dfc-81b1-897e-f04e-5ac5cd021418@linux.vnet.ibm.com> Message-ID: <1bdcc4ba-6cdc-7fb8-f859-ef7558b51314@linux.vnet.ibm.com> Hi, I'm going to push it to jdk/jdk11 after running the tests if there are no objections. Thanks, Gustavo On 06/28/2018 03:00 PM, Gustavo Romero wrote: > Hi Vladimir, > > On 06/28/2018 02:43 PM, Vladimir Kozlov wrote: >> Looks good. > > Thanks a lot for reviewing it. :) > > > Regards, > Gustavo > >> Thanks, >> Vladimir >> >> On 6/28/18 6:13 AM, Gustavo Romero wrote: >>> Hi Igor, >>> >>> On 06/28/2018 03:26 AM, Igor Ignatyev wrote: >>>> Hi Gustavo, >>>> >>>> looks fine to me. >>> >>> Thanks! >>> >>> Could I get a second review please? >>> >>> >>> Regards, >>> Gustavo >>> >>>> Thanks, >>>> -- Igor >>>> >>>>> On Jun 25, 2018, at 1:29 AM, Gustavo Romero wrote: >>>>> >>>>> Hi, >>>>> >>>>> Could the following simple change be reviewed please? >>>>> >>>>> bug?? : https://bugs.openjdk.java.net/browse/JDK-8205578 >>>>> webrev: http://cr.openjdk.java.net/~gromero/8205578/v1/ >>>>> >>>>> Currently native method pageSize() is used to cause deliberate transactional >>>>> aborts. 
However in test TestRTMAbortRatio pageSize() is not marked to be >>>>> compilable and as a consequence it's never called through the code path of >>>>> SharedRuntime::generate_native_wrapper(). As that code path is never exercised >>>>> no 'tabort' on JNI call is executed and the test fails on Power because of fewer >>>>> aborts than expected by the test. >>>>> >>>>> I can't say for sure why that test is getting the correct number of aborts on >>>>> x86. Nonetheless I can confirm that even on x86 the aborts do not come from the >>>>> native wrapper, i.e. from 'xabort' in SharedRuntime::generate_native_wrapper(). >>>>> I suspect the aborts on x86 are occurring a bit latter when the native function >>>>> is called and a "Far Call" is executed in the native method by chance and not in >>>>> a controlled way. As far as I know there is no way to inspect the exact address >>>>> when a transaction failed on Intel as it's possible on Power. >>>>> >>>>> Anyway, marking pageSize() as compilable does not cause any regression on Intel >>>>> (at the same time it starts to exercise the generate_native_wrapper code path) >>>>> and makes the test pass on Power as expected. >>>>> >>>>> So it fixes the following test on Power: >>>>> >>>>> +Passed: compiler/rtm/locking/TestRTMAbortRatio.java >>>>> >>>>> >>>>> Thank you and best regards, >>>>> Gustavo >>>>> >>>> >>> >> > From gromero at linux.vnet.ibm.com Tue Jul 17 07:10:12 2018 From: gromero at linux.vnet.ibm.com (Gustavo Romero) Date: Tue, 17 Jul 2018 04:10:12 -0300 Subject: RFR(xs): 8205390: jtreg: Fix failing TestRTMSpinLoopCount on PPC64 In-Reply-To: <4385AD40-8851-41D1-AA48-D2F53CF9A7BA@oracle.com> References: <13b30945-b86b-5c4e-c92b-a47b7ed425d3@oracle.com> <4385AD40-8851-41D1-AA48-D2F53CF9A7BA@oracle.com> Message-ID: <6a1bc8d7-6a6d-b83b-38d3-3446fc33126a@linux.vnet.ibm.com> Hi, I'm going to push it to jdk/jdk11 after running the tests if there are no objections. 
Thanks, Gustavo On 06/26/2018 05:56 PM, Igor Ignatyev wrote: > +1 > > -- Igor > >> On Jun 25, 2018, at 10:21 AM, Vladimir Kozlov wrote: >> >> Good. >> >> Thanks, >> Vladimir >> >> On 6/25/18 1:31 AM, Gustavo Romero wrote: >>> Hi, >>> Could the following change be reviewed please? >>> bug : https://bugs.openjdk.java.net/browse/JDK-8205390 >>> webrev: http://cr.openjdk.java.net/~gromero/8205390/v1/ >>> It adds a new throttling sequence for PPC64 because the last value on current >>> sequence does not fit on PPC64. >>> By using the new sequence the following test is fixed: >>> +Passed: compiler/rtm/locking/TestRTMSpinLoopCount.java >>> Thank you and best regards, >>> Gustavo > From martin.doerr at sap.com Tue Jul 17 09:02:01 2018 From: martin.doerr at sap.com (Doerr, Martin) Date: Tue, 17 Jul 2018 09:02:01 +0000 Subject: RFR(s): 8205582: PPC64: RTM: Fix counter for aborts on nested transactions In-Reply-To: <58772d79-69a3-0589-f1dd-80e33b834c2d@linux.vnet.ibm.com> References: <8aadb20a-fba7-0c52-04c0-015a731e60bd@linux.vnet.ibm.com> <571b4e4aa8d04c55a2c68d655aff4023@sap.com> <780da692-c439-2169-b007-3a7c07db5d9e@linux.vnet.ibm.com> <7f8b0534-baf1-9ddf-d4ef-e96a75b898ed@linux.vnet.ibm.com> <5b0e6dff18b74e69962b4b7e8256a17e@sap.com> <58772d79-69a3-0589-f1dd-80e33b834c2d@linux.vnet.ibm.com> Message-ID: <428b95df734446c285cdbccf1c2dc24d@sap.com> Hi Gustavo, your webrev v4_B looks good. Reviewed. I think v4_A wouldn't work appropriately when using +1 and -1 bits together. 
A generic version could be something like (just to explain what I was thinking about):

for (int nbit = 0; nbit < num_failure_bits; nbit++) {
  Label do_increment, check_abort;
  int last_match = -1;
  for (ncounter = 0; ncounter < num_counters; ncounter++) {
    if (last_match >= 0) {
      rldicr_(temp_Reg, abort_status_R0, failure_bit[last_match], 0);
      int selection = bit_counter_map[last_match][ncounter];
      if (selection == 1) {
        bne(CCR1, do_increment);
      } else if (selection == -1) {
        beq(CCR1, do_increment);
      }
    }
    last_match = nbit;
  }
  assert(last_match >= 0, "should have at least one");
  rldicr_(temp_Reg, abort_status_R0, failure_bit[last_match], 0);
  int selection = bit_counter_map[last_match][ncounter];
  if (selection == 1) {
    beq(CCR1, check_abort);
  } else if (selection == -1) {
    bne(CCR1, check_abort);
  }
  bind(do_increment);
  ld(temp_Reg, abort_counter_offs, rtm_counters_Reg);
  addi(temp_Reg, temp_Reg, 1);
  std(temp_Reg, abort_counter_offs, rtm_counters_Reg);
  bind(check_abort);
}

But I'm fine with webrev v4_B with the comment you have added.

Thanks,
Martin


-----Original Message-----
From: Gustavo Romero [mailto:gromero at linux.vnet.ibm.com]
Sent: Tuesday, July 17, 2018 08:55
To: Doerr, Martin; Lindenmaier, Goetz; hotspot-compiler-dev at openjdk.java.net
Cc: ppc-aix-port-dev at openjdk.java.net
Subject: Re: RFR(s): 8205582: PPC64: RTM: Fix counter for aborts on nested transactions

Hi Martin,

On 07/16/2018 05:17 AM, Doerr, Martin wrote:
> thanks for the new webrev.

Thanks a lot for the thorough review.

> You're right, the two bits should be mutually exclusive, so the sum is equivalent to the logical or in this case.
> However, the loop pretends to be generic, but it's not. The "or" only works for mutually exclusive bits.
> If you want to keep the code with the loops, I think there should be a comment explaining this.

I see. I've experimented with a couple of options following your previous suggestion of using the counter loop as the outer loop.
Most difficult point was to work around the constraint of having just R0 available as scratch. Bit/bitfield extracting instrs did not help much since most of them can't perform an OR with its destination operand, which gets worse with the R0 constraint. On the other, afaics 'rldimi' which does an OR with its destination operand can't be used to extract the bit/bitfields in a generic way. That best alternative I found was to use both CCR0 and CCR1 and their EQ bits. I think all cases are covered now, i.e. all bits/conditions for a given counter are ORed. failure_code logic is not inverted any more in the map, which seems more natural. I added also more information to the comment before the bit/counter map. I think now the loop is generic. Webrev for the final result: http://cr.openjdk.java.net/~gromero/8205582/v4_A/ > Please also fix indentation. Done. > Alternatively, the loop could get replaced by some code for each bit. Given that the loop is not really generic, I think this wouldn't be worse. > > Besides that, it looks good. Thanks for improving it. Due to the tight schedule I also provide the corrections for the last reviewed version: http://cr.openjdk.java.net/~gromero/8205582/v4_B/ If v4_A also looks good I vouch for pushing it instead of v4_B. Best regards, Gustavo > Best regards, > Martin > > > -----Original Message----- > From: Gustavo Romero [mailto:gromero at linux.vnet.ibm.com] > Sent: Freitag, 13. Juli 2018 16:56 > To: Doerr, Martin ; Lindenmaier, Goetz ; hotspot-compiler-dev at openjdk.java.net > Cc: ppc-aix-port-dev at openjdk.java.net > Subject: Re: RFR(s): 8205582: PPC64: RTM: Fix counter for aborts on nested transactions > > Hi Martin, > > On 07/12/2018 12:30 PM, Doerr, Martin wrote: >> I think your new code may increment a counter twice when 2 bits are specified for it. I think this should get fixed. >> Iterating over the RTMLockingCounters in the outer loop and performing several checks before incrementing should fix this, right? 
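
The "counter loop outside, OR all mapped bits, increment at most once" idea discussed above can be modelled on the host side in plain C++. This is only a sketch of the logic, not the macro-assembler code from any of the webrevs: `kFailureMask`, `kBitCounterMap`, `update_counters` and `counters_after` are hypothetical names, and the masks are arbitrary illustration values rather than real TEXASR bit positions.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical sizes and masks, chosen only for illustration.
constexpr int kNumBits = 3;
constexpr int kNumCounters = 2;
constexpr std::array<uint32_t, kNumBits> kFailureMask = {0x1, 0x2, 0x4};

// Selection per (bit, counter): +1 counts when the bit is set,
// -1 counts when the bit is clear, 0 ignores the bit.
constexpr int kBitCounterMap[kNumBits][kNumCounters] = {
    {+1, 0},  // bit 0 feeds counter 0
    {+1, 0},  // bit 1 also feeds counter 0
    {0, -1},  // counter 1 counts when bit 2 is clear
};

using Counters = std::array<uint64_t, kNumCounters>;

// Counter loop outside, bit loop inside: the conditions for one counter
// are ORed, so the counter is bumped at most once per abort status.
void update_counters(uint32_t status, Counters& counters) {
  for (int c = 0; c < kNumCounters; c++) {
    bool hit = false;
    for (int b = 0; b < kNumBits && !hit; b++) {
      const int selection = kBitCounterMap[b][c];
      assert(selection >= -1 && selection <= 1);
      const bool bit_set = (status & kFailureMask[b]) != 0;
      hit = (selection == +1 && bit_set) || (selection == -1 && !bit_set);
    }
    if (hit) {
      counters[c]++;
    }
  }
}

// Convenience wrapper: counters after processing a single abort status.
Counters counters_after(uint32_t status) {
  Counters counters = {0, 0};
  update_counters(status, counters);
  return counters;
}
```

With both bits 0 and 1 set, counter 0 in this model still advances by one only, which is the double-increment concern raised in the review. Mixing +1 and -1 selections for the same counter also works here because each condition is evaluated separately before the single increment.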
>
> Do you mean incrementing counter #2 (conflict counter) twice, or another counter?
> non_trans_cf and trans_cf are mutually exclusive.
> You probably spotted a case I'm missing so I ask.
>
>
>> Besides that, I have some improvement proposals:
>>
>> - I think using 3 constant tables is not so good to read. For example, inverting could be encoded in your new 2 dimensional table by using -1, 0 , +1 when using int instead of bool. Would this be better?
>>
>> - I think you can get rid of rtm_counters_Reg increment and restoration by computing the abort_offs relative to the original value.
>
> Cool :)
>
> New (interim) webrev:
> http://cr.openjdk.java.net/~gromero/8205582/v3/
>
> Thanks.
>
>
> Best regards,
> Gustavo
>
>> Best regards,
>> Martin
>>
>>
>> -----Original Message-----
>> From: Gustavo Romero [mailto:gromero at linux.vnet.ibm.com]
>> Sent: Tuesday, 10 July 2018 18:59
>> To: Doerr, Martin ; Lindenmaier, Goetz ; hotspot-compiler-dev at openjdk.java.net
>> Cc: ppc-aix-port-dev at openjdk.java.net
>> Subject: Re: RFR(s): 8205582: PPC64: RTM: Fix counter for aborts on nested transactions
>>
>> Dear Martin,
>>
>> On 06/25/2018 01:31 PM, Doerr, Martin wrote:
>>> I think it would be better to ignore bit 63 in the macroAssembler code and use the definition from the spec in assembler_ppc.hpp.
>>> Somebody may want to use the definition for other purposes.
>>
>> Done. transactional_level bit is 52 now so bits 52:62 can easily be extracted
>> using 'rldicr' in macroAssembler.
>>
>>
>>> I wonder if Assembler::tm_trans_cf | Assembler::tm_non_trans_cf would be a better match for x86's description for tm_failure_bit[2]. It's also a little unfortunate to print the same bit twice as tm_failure_bit[4].
>>
>> Done. Now both tm_trans_cf and tm_non_trans_cf failures will increment counter 2
>> (conflict). Duplicated check code for tm_failure_bit[4] was removed and now
>> counter 4 (debug) is mapped to count traps or syscalls caught in TM events,
>> which seems a reasonable approximation to the original semantics of the debug
>> counter on Intel. Unfortunately I could not confirm on AIX how these two events
>> (trap and syscall in TM) will set the failure code, so the counter will never
>> track any information on AIX. But with the current proposed change that failure
>> code can be easily added in the future.
>>
>> I also realized that I previously used a wrong ME operand value in:
>>
>> + // Extract 11 bits
>> + rldicr_(temp_Reg, abort_status, tm_failure_bit[i], 11);
>>
>> It should actually be 10 to extract 11 bits, so all extractions must be correct
>> now.
>>
>>
>> I hope you don't find the array map of failure bits vs counters overkill.
>>
>> Finally, I replaced the comment:
>>
>> tm_tabort, // Note: Seems like signal handler sets this, too.
>>
>> by:
>>
>> tm_tabort, // Signal handler will set this too.
>>
>> because we just enable RTM support on Power if 'htm-nosc' is supported, so
>> treclaim. on aborting the syscall will indeed always set Abort bit in TEXASR
>> afaik. Since debug counter now tracks trap/syscall in TM it's possible to check
>> that counter to verify the number of aborts caused by the kernel (if any).
>>
>> new webrev: http://cr.openjdk.java.net/~gromero/8205582/v2/
>>
>>
>> Best regards,
>> Gustavo
>>
>>> Best regards,
>>> Martin
>>>
>>>
>>> -----Original Message-----
>>> From: Gustavo Romero [mailto:gromero at linux.vnet.ibm.com]
>>> Sent: Monday, 25 June 2018 10:19
>>> To: Lindenmaier, Goetz ; Doerr, Martin ; hotspot-compiler-dev at openjdk.java.net
>>> Cc: ppc-aix-port-dev at openjdk.java.net
>>> Subject: RFR(s): 8205582: PPC64: RTM: Fix counter for aborts on nested transactions
>>>
>>> Hi,
>>>
>>> Could the following change be reviewed please?
>>>
>>> bug   : https://bugs.openjdk.java.net/browse/JDK-8205582
>>> webrev: http://cr.openjdk.java.net/~gromero/8205582/v1/
>>>
>>> It fixes the RTM counter for nested aborts (rtm lock aborts type 5) by
>>> extracting and checking bits in the Transactional Level field of TEXASR
>>> register.
>>>
>>> It also fixes the memory conflict counter (rtm lock aborts type 2). Power TM
>>> status register supports two bits to inform two different types of memory
>>> conflict between threads: non-transactional and transactional. According to how
>>> the jtreg RTM tests are designed the memory conflict counter counts
>>> non-transactional conflicts: in TestPrintPreciseRTMLockingStatistics an RTM lock
>>> is held on a static variable while another thread without any synchronization
>>> (non-transactional) tries to modify the same variable. Hence that small
>>> adjustment satisfies the TestPrintPreciseRTMLockingStatistics making it pass on
>>> Power. The memory conflict counter is not used in any other place besides by the
>>> RTM precise statistics (no decision is made by the JVM based on that amount).
>>>
>>> This change partially fixes some failures in
>>> compiler/rtm/print/TestPrintPreciseRTMLockingStatistics.java regarding the
>>> nested and memory conflict abort counters. The remaining issue will be fixed by
>>> aborting on calling JNI (next RFR).
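
The ME operand correction mentioned earlier in this thread (10 rather than 11 to keep 11 bits) and the nested-abort check on the Transactional Level field can be checked with a small host-side model of 'rldicr'. This is a sketch under stated assumptions, not the HotSpot code: it assumes IBM bit numbering (bit 0 is the most significant bit), a Transaction Level field in bits 52:63 as the thread implies, and a level of 1 for an outermost transaction. `rotl64`, `rldicr`, `with_transaction_level` and `is_nested` are hypothetical helper names.

```cpp
#include <cassert>
#include <cstdint>

// Rotate a 64-bit value left by sh positions (0..63).
uint64_t rotl64(uint64_t x, unsigned sh) {
  sh &= 63;
  return sh == 0 ? x : (x << sh) | (x >> (64 - sh));
}

// Model of rldicr RA,RS,SH,ME: rotate left by SH, then keep only IBM bits
// 0..ME (bit 0 = MSB). ME = 10 therefore keeps 11 bits, ME = 11 keeps 12.
uint64_t rldicr(uint64_t rs, unsigned sh, unsigned me) {
  assert(sh < 64 && me < 64);
  const uint64_t mask = ~uint64_t(0) << (63 - me);
  return rotl64(rs, sh) & mask;
}

// Build a status word whose only content is the level field, assumed to
// occupy IBM bits 52:63, i.e. the 12 least significant bits.
uint64_t with_transaction_level(unsigned level) {
  return uint64_t(level) & 0xFFF;
}

// Rotating bit 52 up to bit 0 and keeping bits 0..10 selects original bits
// 52:62 and drops bit 63, so a level of exactly 1 yields zero.
bool is_nested(uint64_t status) {
  return rldicr(status, 52, 10) != 0;
}
```

In this model a level of 1 gives a zero result and any level of 2 or more gives a non-zero result, while using ME = 11 would also keep bit 63 and report a level of 1 as nested.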
>>> >>> >>> Thank you and best regards, >>> Gustavo >>> >> > From gromero at linux.vnet.ibm.com Tue Jul 17 11:18:25 2018 From: gromero at linux.vnet.ibm.com (Gustavo Romero) Date: Tue, 17 Jul 2018 08:18:25 -0300 Subject: RFR(s): 8205582: PPC64: RTM: Fix counter for aborts on nested transactions In-Reply-To: <428b95df734446c285cdbccf1c2dc24d@sap.com> References: <8aadb20a-fba7-0c52-04c0-015a731e60bd@linux.vnet.ibm.com> <571b4e4aa8d04c55a2c68d655aff4023@sap.com> <780da692-c439-2169-b007-3a7c07db5d9e@linux.vnet.ibm.com> <7f8b0534-baf1-9ddf-d4ef-e96a75b898ed@linux.vnet.ibm.com> <5b0e6dff18b74e69962b4b7e8256a17e@sap.com> <58772d79-69a3-0589-f1dd-80e33b834c2d@linux.vnet.ibm.com> <428b95df734446c285cdbccf1c2dc24d@sap.com> Message-ID: Hi Martin, OK. Let's continue with v4_B [1] so. Thanks for reviewing it! Best regards, Gustavo [1] http://cr.openjdk.java.net/~gromero/8205582/v4_B On 07/17/2018 06:02 AM, Doerr, Martin wrote: > Hi Gustavo, > > your webrev v4_B looks good. Reviewed. > > I think v4_A wouldn't work appropriately when using +1 and -1 bits together. 
> > A generic version could be something like (just to explain what I was thinking about): > for (int nbit = 0; nbit < num_failure_bits; nbit++) { > Label do_increment, check_abort; > > int last_match = -1; > for (ncounter = 0; ncounter < num_counters; ncounter++) { > if (last_match >= 0) { > rldicr_(temp_Reg, abort_status_R0, failure_bit[last_match], 0); > int selection = bit_counter_map[last_match][ncounter]; > if (selection == 1) { > bne(CCR1, do_increment); > } else if (selection == -1) { > beq(CCR1, do_increment); > } > } > last_match = nbit; > } > > assert(last_match >= 0, "should have at least one"); > rldicr_(temp_Reg, abort_status_R0, failure_bit[last_match], 0); > int selection = bit_counter_map[last_match][ncounter]; > if (selection == 1) { > beq(CCR1, check_abort); > } else if (selection == -1) { > bne(CCR1, check_abort); > } > > bind(do_increment); > ld(temp_Reg, abort_counter_offs, rtm_counters_Reg); > addi(temp_Reg, temp_Reg, 1); > std(temp_Reg, abort_counter_offs, rtm_counters_Reg); > bind(check_abort); > } > > But I'm fine with webrev v4_B with the comment you have added. > > Thanks, > Martin > > > -----Original Message----- > From: Gustavo Romero [mailto:gromero at linux.vnet.ibm.com] > Sent: Dienstag, 17. Juli 2018 08:55 > To: Doerr, Martin ; Lindenmaier, Goetz ; hotspot-compiler-dev at openjdk.java.net > Cc: ppc-aix-port-dev at openjdk.java.net > Subject: Re: RFR(s): 8205582: PPC64: RTM: Fix counter for aborts on nested transactions > > Hi Martin, > > On 07/16/2018 05:17 AM, Doerr, Martin wrote: >> thanks for the new webrev. > > Thanks a lot for the thorough review. > > >> You're right, the two bits should be mutual exclusive, so the sum is equivalent to the logical or in this case. >> However, the loop pretends to be generic, but it's not. The "or" only works for mutual exclusive bits. >> If you want to keep the code with the loops, I think there should be a comment explaining this. > > I see. 
I've experimented a couple of options following your previous suggestion > of using the counter loop as the outer loop. Most difficult point was to work > around the constraint of having just R0 available as scratch. Bit/bitfield > extracting instrs did not help much since most of them can't perform an OR with > its destination operand, which gets worse with the R0 constraint. On the other, > afaics 'rldimi' which does an OR with its destination operand can't be used to > extract the bit/bitfields in a generic way. That best alternative I found was to > use both CCR0 and CCR1 and their EQ bits. I think all cases are covered now, > i.e. all bits/conditions for a given counter are ORed. failure_code logic is not > inverted any more in the map, which seems more natural. I added also more > information to the comment before the bit/counter map. > I think now the loop is generic. > Webrev for the final result: > > http://cr.openjdk.java.net/~gromero/8205582/v4_A/ > > >> Please also fix indentation. > > Done. > > >> Alternatively, the loop could get replaced by some code for each bit. Given that the loop is not really generic, I think this wouldn't be worse. >> >> Besides that, it looks good. Thanks for improving it. > > Due to the tight schedule I also provide the corrections for the last reviewed > version: > > http://cr.openjdk.java.net/~gromero/8205582/v4_B/ > > If v4_A also looks good I vouch for pushing it instead of v4_B. > > > Best regards, > Gustavo > >> Best regards, >> Martin >> >> >> -----Original Message----- >> From: Gustavo Romero [mailto:gromero at linux.vnet.ibm.com] >> Sent: Freitag, 13. 
Juli 2018 16:56 >> To: Doerr, Martin ; Lindenmaier, Goetz ; hotspot-compiler-dev at openjdk.java.net >> Cc: ppc-aix-port-dev at openjdk.java.net >> Subject: Re: RFR(s): 8205582: PPC64: RTM: Fix counter for aborts on nested transactions >> >> Hi Martin, >> >> On 07/12/2018 12:30 PM, Doerr, Martin wrote: >>> I think your new code may increment a counter twice when 2 bits are specified for it. I think this should get fixed. >>> Iterating over the RTMLockingCounters in the outer loop and performing several checks before incrementing should fix this, right? >> >> Do you mean increment twice counter #2 (conflict counter) or another counter? >> non_trans_cf and trans_cf are mutually exclusive. >> You probably spotted a case I'm missing so I ask. >> >> >>> Besides that, I have some improvement proposals: >>> >>> - I think using 3 constant tables is not so good to read. For example, inverting could be encoded in your new 2 dimensional table by using -1, 0 , +1 when using int instead of bool. Would this be better? >>> >>> - I think you can get rid of rtm_counters_Reg increment and restoration by computing the abort_offs relative to the original value. >> >> Cool :) >> >> New (interim) webrev: >> http://cr.openjdk.java.net/~gromero/8205582/v3/ >> >> Thanks. >> >> >> Best regards, >> Gustavo >> >>> Best regards, >>> Martin >>> >>> >>> -----Original Message----- >>> From: Gustavo Romero [mailto:gromero at linux.vnet.ibm.com] >>> Sent: Dienstag, 10. Juli 2018 18:59 >>> To: Doerr, Martin ; Lindenmaier, Goetz ; hotspot-compiler-dev at openjdk.java.net >>> Cc: ppc-aix-port-dev at openjdk.java.net >>> Subject: Re: RFR(s): 8205582: PPC64: RTM: Fix counter for aborts on nested transactions >>> >>> Dear Martin, >>> >>> On 06/25/2018 01:31 PM, Doerr, Martin wrote: >>>> I think it would be better to ignore bit 63 in the macroAssembler code and use the definition from the spec in assembler_ppc.hpp. >>>> Somebody may want to use the definition for other purposes. >>> >>> Done. 
transactional_level bit is 52 now so bits 52:62 can easily be extracted >>> using 'rldicr' in macroAssembler. >>> >>> >>>> I wonder if Assembler::tm_trans_cf | Assembler::tm_non_trans_cf would be a better match for x86's description for tm_failure_bit[2]. It's also a little unfortunate to print the same bit twice as tm_failure_bit[4]. >>> >>> Done. Now both tm_trans_cf and tm_non_trans_cf failures will increment counter 2 >>> (conflict). Duplicated check code for tm_failure_bit[4] was removed and now >>> counter 4 (debug) is mapped to count traps or syscalls caught in TM events, >>> which seems a reasonable approximation to the original semantics of the debug >>> counter on Intel. Unfortunately I could not confirm on AIX how these two events >>> (trap and syscall in TM) will set the failure code, so the counter will never >>> track any information on AIX. But with the current proposed change that failure >>> code can be easily added in the future. >>> >>> I also realized that I used previously a wrong ME operand value in: >>> >>> + // Extract 11 bits >>> + rldicr_(temp_Reg, abort_status, tm_failure_bit[i], 11); >>> >>> It should be 10 to extract 11 bits actually, so all extractions must be correct >>> now. >>> >>> >>> I hope you don't find the array map of failure bits vs counters overkilling. >>> >>> Finally, I replaced the comment: >>> >>> tm_tabort, // Note: Seems like signal handler sets this, too. >>> >>> by: >>> >>> tm_tabort, // Signal handler will set this too. >>> >>> because we just enable RTM support on Power if 'htm-nosc' is supported, so >>> treclaim. on aborting the syscall will indeed always set Abort bit in TEXASR >>> afaik. Since debug counter now tracks trap/syscall in TM it's possible to check >>> that counter to verify the number of aborts cause by the kernel (if any). 
>>> >>> new webrev: http://cr.openjdk.java.net/~gromero/8205582/v2/ >>> >>> >>> Best regards, >>> Gustavo >>> >>>> Best regards, >>>> Martin >>>> >>>> >>>> -----Original Message----- >>>> From: Gustavo Romero [mailto:gromero at linux.vnet.ibm.com] >>>> Sent: Montag, 25. Juni 2018 10:19 >>>> To: Lindenmaier, Goetz ; Doerr, Martin ; hotspot-compiler-dev at openjdk.java.net >>>> Cc: ppc-aix-port-dev at openjdk.java.net >>>> Subject: RFR(s): 8205582: PPC64: RTM: Fix counter for aborts on nested transactions >>>> >>>> Hi, >>>> >>>> Could the following change be reviewed please? >>>> >>>> bug : https://bugs.openjdk.java.net/browse/JDK-8205582 >>>> webrev: http://cr.openjdk.java.net/~gromero/8205582/v1/ >>>> >>>> It fixes the RTM counter for nested aborts (rtm lock aborts type 5) by >>>> extracting and checking bits in the Transactional Level field of TEXASR >>>> register. >>>> >>>> It also fixes the memory conflict counter (rtm lock aborts type 2). Power TM >>>> status register supports two bits to inform two different types of memory >>>> conflict between threads: non-transactional and transactional. According to how >>>> the jtreg RTM tests are designed the memory conflict counter counts >>>> non-transactional conflicts: on TestPrintPreciseRTMLockingStatistics a RTM lock >>>> is held on a static variable while another thread without any synchronization >>>> (non-trasactional) tries to modify the same variable. Hence that small >>>> adjustment satisfies the TestPrintPreciseRTMLockingStatistics making it pass on >>>> Power. The memory conflict counter is not used in any other place besides by the >>>> RTM precise statistics (no decision is made by the JVM based on that amount). >>>> >>>> This change partially fixes some failures in >>>> compiler/rtm/print/TestPrintPreciseRTMLockingStatistics.java regarding the >>>> nested and memory conflict abort counters. The remaining issue will be fixed by >>>> aborting on calling JNI (next RFR). 
>>>> >>>> >>>> Thank you and best regards, >>>> Gustavo >>>> >>> >> > From gromero at linux.vnet.ibm.com Tue Jul 17 13:44:25 2018 From: gromero at linux.vnet.ibm.com (Gustavo Romero) Date: Tue, 17 Jul 2018 10:44:25 -0300 Subject: RFR(s): 8205581: PPC64: RTM: Fix abort on native calls In-Reply-To: References: <86a9e4085b424a58a28436415786e312@sap.com> <7c133789-e13a-d675-17ad-8338fc19c9ce@linux.vnet.ibm.com> Message-ID: Hi, Could I get a second review for that change please? Best regards, Gustavo On 06/26/2018 09:54 AM, Doerr, Martin wrote: > Hi Gustavo, > > thanks for the update. Looks good to me. > > Best regards, > Martin > > > -----Original Message----- > From: Gustavo Romero [mailto:gromero at linux.vnet.ibm.com] > Sent: Dienstag, 26. Juni 2018 14:41 > To: Doerr, Martin ; Lindenmaier, Goetz ; hotspot-compiler-dev at openjdk.java.net > Cc: ppc-aix-port-dev at openjdk.java.net > Subject: Re: RFR(s): 8205581: PPC64: RTM: Fix abort on native calls > > Hi Martin, > > Thanks for the quick review! > > On 06/25/2018 01:37 PM, Doerr, Martin wrote: >> I wonder why you placed the tabort so late in generate_native_wrapper. I'd put it at the Verified Entry Point. > > Actually for no particular reason. 
>
> So previously:
>
> [Verified Entry Point]
>   0x00007fff9c816520: mfcr r22
>   0x00007fff9c816524: std r22,8(r1)
>   0x00007fff9c816528: mflr r22
>   0x00007fff9c81652c: std r22,16(r1)
>   0x00007fff9c816530: addis r11,r1,-2
>   0x00007fff9c816534: std r0,0(r11)
>   0x00007fff9c816538: mr r21,r1
>   0x00007fff9c81653c: stdu r1,-176(r1)
>   0x00007fff9c816540: std r3,96(r1)
>   0x00007fff9c816544: addi r4,r1,96
>   0x00007fff9c816548: cmpdi r3,0
>   0x00007fff9c81654c: bne- 0x00007fff9c816554
>   0x00007fff9c816550: li r4,0
>   0x00007fff9c816554: addi r3,r16,824 ; ImmutableOopMap{[96]=Oop }
>   0x00007fff9c816558: addis r28,r29,7
>   0x00007fff9c81655c: addi r28,r28,25944 ; {internal_word}
>   0x00007fff9c816560: tabort. r0 <==
>   ...
>
> Now:
>
> [Verified Entry Point]
>   0x00007fff78816320: tabort. r0 <==
>   0x00007fff78816324: mfcr r22
>   0x00007fff78816328: std r22,8(r1)
>   0x00007fff7881632c: mflr r22
>   0x00007fff78816330: std r22,16(r1)
>   0x00007fff78816334: addis r11,r1,-2
>   0x00007fff78816338: std r0,0(r11)
>   0x00007fff7881633c: mr r21,r1
>   0x00007fff78816340: stdu r1,-176(r1)
>   0x00007fff78816344: std r3,96(r1)
>   0x00007fff78816348: addi r4,r1,96
>   0x00007fff7881634c: cmpdi r3,0
>   0x00007fff78816350: bne- 0x00007fff78816358
>   0x00007fff78816354: li r4,0
>   0x00007fff78816358: addi r3,r16,824 ; ImmutableOopMap{[96]=Oop }
>   0x00007fff7881635c: addis r28,r29,7
>   0x00007fff78816360: addi r28,r28,25436 ; {internal_word}
>   ...
>
> Yep, it's better to abort sooner.
>
> new webrev: http://cr.openjdk.java.net/~gromero/8205581/v2/
>
>
> Best regards,
> Gustavo
>
>> Besides that, it looks good to me.
>>
>> Thanks,
>> Martin
>>
>>
>> -----Original Message-----
>> From: Gustavo Romero [mailto:gromero at linux.vnet.ibm.com]
>> Sent: Monday, 25 June 2018 10:21
>> To: Lindenmaier, Goetz ; Doerr, Martin ; hotspot-compiler-dev at openjdk.java.net
>> Cc: ppc-aix-port-dev at openjdk.java.net
>> Subject: RFR(s): 8205581: PPC64: RTM: Fix abort on native calls
>>
>> Hi,
>>
>> Could the following change be reviewed please?
>>
>> bug   : https://bugs.openjdk.java.net/browse/JDK-8205581
>> webrev: http://cr.openjdk.java.net/~gromero/8205581/v1/
>>
>> It forces a transactional state to abort before calling native methods, before
>> calling runtime, and on uncommon trap checking, mostly because the transaction
>> will be aborted sooner or later in either case, similarly to what happens on Intel.
>> The abort instruction (tabort) is only emitted if UseRTMLocking is "true" and
>> any 'tabort' instruction is treated as a 'nop' instruction if TM state is
>> non-transactional.
>>
>> It fixes the following tests:
>>
>> +Passed: compiler/rtm/print/TestPrintPreciseRTMLockingStatistics.java
>> +Passed: compiler/rtm/locking/TestRTMDeoptOnHighAbortRatio.java
>>
>>
>> Thank you and best regards,
>> Gustavo
>>
>
From gromero at linux.vnet.ibm.com Tue Jul 17 13:44:17 2018
From: gromero at linux.vnet.ibm.com (Gustavo Romero)
Date: Tue, 17 Jul 2018 10:44:17 -0300
Subject: RFR(s): 8205580: PPC64: RTM: Don't retry lock on abort if abort was intentional
In-Reply-To:
References: <56e66a89-42a7-eb72-05a2-a97c696379c9@linux.vnet.ibm.com>
Message-ID: <33106239-dc48-0599-3b20-66d93f27463e@linux.vnet.ibm.com>

Hi,

Could I get a second review for that change please?

It's already reviewed by Martin.


Best regards,
Gustavo

On 07/02/2018 11:03 AM, Gustavo Romero wrote:
> Hi Martin,
>
> On 07/02/2018 10:20 AM, Doerr, Martin wrote:
>> I meant retrying "on abort", not "on busy". There are different counters for these two retry functions.
>> RTM for Stack Locks only supports "on abort".
>> Inflated RTM locking supports both using both counters.
> > Yup, I meant "on abort" too, but I missed your point regarding the retry > "on abort" in the RTM for Stack Locks case. > > >> But I see that x86 uses the same behavior as you when using -XX:-UseRTMXendForLockBusy. >> I think it's not so good to treat "abort instruction on lock busy" as permanent abort reason. >> So the behavior is fine with UseRTMXendForLockBusy, but not without it. > > hmm I see. So you think it's also not fine on x64, right? In general I think > it's not good to finish a RTM atomic block with `xabort/tabort`, but maybe it > makes more sense on x86. Luckily -XX:+UseRTMXendForLockBusy is the default. > > >> But I can live with your change because it only has a negative effect on an unsupported experimental option. And I think your change is fine for the other usages of tabort(). > > OK. I'll revisit that when doing additional performance tests with RTM for Stack > Locks. > > Thanks! > > > Best regards, > Gustavo > >> Best regards, >> Martin >> >> >> -----Original Message----- >> From: Gustavo Romero [mailto:gromero at linux.vnet.ibm.com] >> Sent: Dienstag, 26. Juni 2018 18:01 >> To: Doerr, Martin ; Lindenmaier, Goetz ; hotspot-compiler-dev at openjdk.java.net >> Cc: ppc-aix-port-dev at openjdk.java.net >> Subject: Re: RFR(s): 8205580: PPC64: RTM: Don't retry lock on abort if abort was intentional >> >> Hi Martin, >> >> On 06/25/2018 01:49 PM, Doerr, Martin wrote: >>> Looks good for the case UseRTMXendForLockBusy is active (which is default). >> >> I did all the tests focusing on when it's deactivated (-UseRTMXendForLockBusy). >> This is also the flag passed in jtreg tests since if it's active there are no >> aborts caused by 'tabort or xabort' and so no abort statistics (related to >> that event, which is used by the jtreg tests). >> >> >>> If this flag is deactivated, we use tabort if we see the object locked so your change prevents retrying the transaction in this case. >>> I guess this was not intended? 
>>
>> I think that rtm_retry_lock_on_abort() is a misleading name, it should be
>> something like rtm_retry_lock_on_conflict(), since the purpose of this
>> function is to not retry if the abort is caused by a tabort/xabort in my
>> understanding.
>>
>> On Intel that function checks for bit 1 (0x2 mask) and if it is set the operation
>> is retried. But for bit 1 to be set, the transaction must not have aborted due to
>> xabort, otherwise that bit would be clear as:
>>
>>    //     0     Set if abort caused by XABORT instruction.
>>    //     1     If set, the transaction may succeed on a retry.
>>                 This bit is always clear if bit 0 is set (or is always clear if abort is caused by XABORT)
>>
>> That's why filtering on Power by the "Abort" bit in TEXASR makes the number of
>> aborts behave like on x64. If we don't filter aborts caused by tabort we find the
>> pattern of X*2+1 retries, because both rtm_retry_lock_on_abort() and
>> rtm_retry_lock_on_busy() will retry the operation RTMRetryCount times.
>>
>> My change won't prevent retrying because after rtm_retry_lock_on_abort(), if
>> cmpxchgd() does not succeed it calls rtm_retry_lock_on_busy(), which in turn
>> will retry the operation, also based on the value specified by RTMRetryCount.
>>
>> I prepared a simple test-case where UseRTMXendForLockBusy is deactivated to show
>> that if we increase the number of RTMRetryCount even with that flag deactivated
>> the operation is retried exactly RTMRetryCount+1 times after the fix, like on
>> Intel:
>>
>> https://github.com/gromero/retry
>>
>> You just need to clone and run it pointing to a build dir:
>>
>> $ git clone https://github.com/gromero/retry && cd retry
>> $ ./retry
>>
>> You have to build the WhiteBox lib through "make build-test-lib" before running
>> it.
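
The retry decision and the resulting abort counts described above can be sketched as a small host-side model. It is a simplification under stated assumptions, not the HotSpot implementation: every attempt is assumed to find the lock busy and end in an explicit tabort/xabort (the -UseRTMXendForLockBusy scenario), the x86 status bits are the two quoted in this thread, and the Power status is modelled with booleans instead of real TEXASR bit positions. All names (`x86_retry_on_abort`, `PowerAbort`, `count_aborts`, ...) are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// x86 abort status bits as quoted in the thread.
constexpr uint32_t kX86Xabort = 0x1;    // abort caused by XABORT
constexpr uint32_t kX86CanRetry = 0x2;  // transaction may succeed on retry

// x86: retry "on abort" only if bit 1 is set.
bool x86_retry_on_abort(uint32_t abort_status) {
  return (abort_status & kX86CanRetry) != 0;
}

// Power status reduced to the two conditions discussed here.
struct PowerAbort {
  bool persistent;      // failure is marked persistent
  bool explicit_abort;  // "Abort" bit, set by tabort
};

// Before the change: only the persistent bit suppresses the retry.
bool power_retry_on_abort_before(const PowerAbort& a) {
  return !a.persistent;
}

// After the change: an explicit tabort suppresses the retry as well.
bool power_retry_on_abort_after(const PowerAbort& a) {
  return !a.persistent && !a.explicit_abort;
}

// Number of aborts when every attempt ends in an explicit abort.
// Each retry path has its own budget of retry_count attempts.
int count_aborts(int retry_count, bool retry_after_explicit_abort) {
  assert(retry_count >= 0);
  int aborts = 0;
  int on_abort_left = retry_count;
  int on_busy_left = retry_count;
  for (;;) {
    aborts++;  // this attempt aborts explicitly
    if (retry_after_explicit_abort && on_abort_left > 0) {
      on_abort_left--;  // "retry on abort" path
      continue;
    }
    if (on_busy_left > 0) {
      on_busy_left--;  // "retry on busy" path
      continue;
    }
    return aborts;
  }
}
```

For a retry count of X the model gives 2*X+1 aborts when explicit aborts are retried and X+1 when they are not, which matches the figures reported in this thread for RTMRetryCount=1 and RTMRetryCount=2.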
>> >> So for RTMRetryCount=1 and RTMRetryCount=2 w/ -UseRTMXendForLockBusy before the >> change: >> >> gromero at gromero16:~/git/retry$ ./retry.sh /home/gromero/hg/jdk/jdk/build/linux-ppc64le-normal-server-release 1 >> ++ /home/gromero/hg/jdk/jdk/build/linux-ppc64le-normal-server-release/jdk/bin/javac -cp /home/gromero/hg/jdk/jdk/build/linux-ppc64le-normal-server-release/support/test/lib/wb.jar --add-exports java.base/jdk.internal.misc=ALL-UNNAMED retry.java >> ++ LD_LIBRARY_PATH=/home/gromero/hg/jdk/jdk/build/linux-ppc64le-normal-server-release/jdk/../..//src/utils/hsdis/build/linux-ppc64le >> ++ /home/gromero/hg/jdk/jdk/build/linux-ppc64le-normal-server-release/jdk/bin/java -Xbootclasspath/a:/home/gromero/hg/jdk/jdk/build/linux-ppc64le-normal-server-release/support/test/lib/wb.jar -XX:+UnlockExperimentalVMOptions -XX:+UnlockDiagnosticVMOptions -XX:+WhiteBoxAPI -XX:+UseRTMLocking -XX:+PrintPreciseRTMLockingStatistics -XX:-TieredCompilation -Xcomp -XX:-UseRTMXendForLockBusy -XX:RTMTotalCountIncrRate=1 -XX:RTMRetryCount=1 -XX:CompileOnly=RTM.syncAndTest --add-exports java.base/jdk.internal.misc=ALL-UNNAMED retry >> Creating thread0... >> Trying to inflate lock... >> Is monitor inflated? Yes >> Entering thread to sleep... >> RTM.syncAndTest at 26 >> # rtm locks total (estimated): 3 >> # rtm lock aborts? 
: 3 >> # rtm lock aborts 0: 3 >> # rtm lock aborts 1: 3 >> # rtm lock aborts 2: 0 >> # rtm lock aborts 3: 0 >> # rtm lock aborts 4: 0 >> # rtm lock aborts 5: 0 >> ++ set +x >> gromero at gromero16:~/git/retry$ ./retry.sh /home/gromero/hg/jdk/jdk/build/linux-ppc64le-normal-server-release 2 >> ++ /home/gromero/hg/jdk/jdk/build/linux-ppc64le-normal-server-release/jdk/bin/javac -cp /home/gromero/hg/jdk/jdk/build/linux-ppc64le-normal-server-release/support/test/lib/wb.jar --add-exports java.base/jdk.internal.misc=ALL-UNNAMED retry.java >> ++ LD_LIBRARY_PATH=/home/gromero/hg/jdk/jdk/build/linux-ppc64le-normal-server-release/jdk/../..//src/utils/hsdis/build/linux-ppc64le >> ++ /home/gromero/hg/jdk/jdk/build/linux-ppc64le-normal-server-release/jdk/bin/java -Xbootclasspath/a:/home/gromero/hg/jdk/jdk/build/linux-ppc64le-normal-server-release/support/test/lib/wb.jar -XX:+UnlockExperimentalVMOptions -XX:+UnlockDiagnosticVMOptions -XX:+WhiteBoxAPI -XX:+UseRTMLocking -XX:+PrintPreciseRTMLockingStatistics -XX:-TieredCompilation -Xcomp -XX:-UseRTMXendForLockBusy -XX:RTMTotalCountIncrRate=1 -XX:RTMRetryCount=2 -XX:CompileOnly=RTM.syncAndTest --add-exports java.base/jdk.internal.misc=ALL-UNNAMED retry >> Creating thread0... >> Trying to inflate lock... >> Is monitor inflated? Yes >> Entering thread to sleep... >> RTM.syncAndTest at 26 >> # rtm locks total (estimated): 5 >> # rtm lock aborts? 
: 5 >> # rtm lock aborts 0: 5 >> # rtm lock aborts 1: 5 >> # rtm lock aborts 2: 0 >> # rtm lock aborts 3: 0 >> # rtm lock aborts 4: 0 >> # rtm lock aborts 5: 0 >> ++ set +x >> >> >> and after the change: >> >> gromero at gromero16:~/git/retry$ ./retry.sh /home/gromero/hg/jdk/jdk/build/linux-ppc64le-normal-server-release 1 >> ++ /home/gromero/hg/jdk/jdk/build/linux-ppc64le-normal-server-release/jdk/bin/javac -cp /home/gromero/hg/jdk/jdk/build/linux-ppc64le-normal-server-release/support/test/lib/wb.jar --add-exports java.base/jdk.internal.misc=ALL-UNNAMED retry.java >> ++ LD_LIBRARY_PATH=/home/gromero/hg/jdk/jdk/build/linux-ppc64le-normal-server-release/jdk/../..//src/utils/hsdis/build/linux-ppc64le >> ++ /home/gromero/hg/jdk/jdk/build/linux-ppc64le-normal-server-release/jdk/bin/java -Xbootclasspath/a:/home/gromero/hg/jdk/jdk/build/linux-ppc64le-normal-server-release/support/test/lib/wb.jar -XX:+UnlockExperimentalVMOptions -XX:+UnlockDiagnosticVMOptions -XX:+WhiteBoxAPI -XX:+UseRTMLocking -XX:+PrintPreciseRTMLockingStatistics -XX:-TieredCompilation -Xcomp -XX:-UseRTMXendForLockBusy -XX:RTMTotalCountIncrRate=1 -XX:RTMRetryCount=1 -XX:CompileOnly=RTM.syncAndTest --add-exports java.base/jdk.internal.misc=ALL-UNNAMED retry >> Creating thread0... >> Trying to inflate lock... >> Is monitor inflated? Yes >> Entering thread to sleep... >> RTM.syncAndTest at 26 >> # rtm locks total (estimated): 2 >> # rtm lock aborts? 
: 2 >> # rtm lock aborts 0: 2 >> # rtm lock aborts 1: 2 >> # rtm lock aborts 2: 0 >> # rtm lock aborts 3: 0 >> # rtm lock aborts 4: 0 >> # rtm lock aborts 5: 0 >> ++ set +x >> gromero at gromero16:~/git/retry$ ./retry.sh /home/gromero/hg/jdk/jdk/build/linux-ppc64le-normal-server-release 2 >> ++ /home/gromero/hg/jdk/jdk/build/linux-ppc64le-normal-server-release/jdk/bin/javac -cp /home/gromero/hg/jdk/jdk/build/linux-ppc64le-normal-server-release/support/test/lib/wb.jar --add-exports java.base/jdk.internal.misc=ALL-UNNAMED retry.java >> ++ LD_LIBRARY_PATH=/home/gromero/hg/jdk/jdk/build/linux-ppc64le-normal-server-release/jdk/../..//src/utils/hsdis/build/linux-ppc64le >> ++ /home/gromero/hg/jdk/jdk/build/linux-ppc64le-normal-server-release/jdk/bin/java -Xbootclasspath/a:/home/gromero/hg/jdk/jdk/build/linux-ppc64le-normal-server-release/support/test/lib/wb.jar -XX:+UnlockExperimentalVMOptions -XX:+UnlockDiagnosticVMOptions -XX:+WhiteBoxAPI -XX:+UseRTMLocking -XX:+PrintPreciseRTMLockingStatistics -XX:-TieredCompilation -Xcomp -XX:-UseRTMXendForLockBusy -XX:RTMTotalCountIncrRate=1 -XX:RTMRetryCount=2 -XX:CompileOnly=RTM.syncAndTest --add-exports java.base/jdk.internal.misc=ALL-UNNAMED retry >> Creating thread0... >> Trying to inflate lock... >> Is monitor inflated? Yes >> Entering thread to sleep... >> RTM.syncAndTest at 26 >> # rtm locks total (estimated): 3 >> # rtm lock aborts? : 3 >> # rtm lock aborts 0: 3 >> # rtm lock aborts 1: 3 >> # rtm lock aborts 2: 0 >> # rtm lock aborts 3: 0 >> # rtm lock aborts 4: 0 >> # rtm lock aborts 5: 0 >> ++ set +x >> >> >> Best regards, >> Gustavo >> >>> Thanks and best regards, >>> Martin >>> >>> >>> -----Original Message----- >>> From: Gustavo Romero [mailto:gromero at linux.vnet.ibm.com] >>> Sent: Montag, 25. 
Juni 2018 10:24
>>> To: Lindenmaier, Goetz ; Doerr, Martin ; hotspot-compiler-dev at openjdk.java.net
>>> Cc: ppc-aix-port-dev at openjdk.java.net
>>> Subject: RFR(s): 8205580: PPC64: RTM: Don't retry lock on abort if abort was intentional
>>>
>>> Hi,
>>>
>>> Could the following change be reviewed please?
>>>
>>> bug   : https://bugs.openjdk.java.net/browse/JDK-8205580
>>> webrev: http://cr.openjdk.java.net/~gromero/8205580/v1/
>>>
>>> It changes the behavior of rtm_retry_lock_on_abort() by avoiding the retry if the
>>> abort was deliberate, i.e. caused by a 'tabort r0' instruction.
>>>
>>> On Intel, bit 1 in abort_status_Reg (which communicates the abort status) is
>>> always clear when an 'xabort 0' instruction is executed, to signal that a
>>> retry of the transaction /cannot/ succeed. So rtm_retry_lock_on_abort() on
>>> Intel, on finding bit 1 clear in abort_status_Reg, skips the retry.
>>>
>>> Currently on Power rtm_retry_lock_on_abort() only checks the persistent bit
>>> (if set => skip), which /is not set/ by 'tabort r0'. Hence
>>> rtm_retry_lock_on_abort() does retry the lock after an intentional abort caused by
>>> 'tabort'. This leads, for instance with -XX:RTMRetryCount=1, to the following
>>> discrepancy between Intel and Power regarding the number of retries/aborts:
>>>
>>> [Power]
>>> # rtm locks total (estimated): 3
>>> # rtm lock aborts  : 3
>>> # rtm lock aborts 0: 3
>>> # rtm lock aborts 1: 3
>>> # rtm lock aborts 2: 0
>>> # rtm lock aborts 3: 0
>>> # rtm lock aborts 4: 0
>>> # rtm lock aborts 5: 0
>>>
>>> [Intel]
>>> # rtm locks total (estimated): 2
>>> # rtm lock aborts
: 2
>>> # rtm lock aborts 0: 2
>>> # rtm lock aborts 1: 2
>>> # rtm lock aborts 2: 0
>>> # rtm lock aborts 3: 0
>>> # rtm lock aborts 4: 0
>>> # rtm lock aborts 5: 0
>>>
>>> So for -XX:RTMRetryCount=X:
>>> on Power the number of aborts is: X*2+1 [1 first failure + X rtm_retry_lock_on_abort() + X rtm_retry_lock_on_busy()];
>>> on Intel the number of aborts is: X+1   [1 first failure + X rtm_retry_lock_on_busy()]
>>>
>>> This change fixes that discrepancy by using the "Abort" bit in the TEXASR register
>>> (abort_status_Reg), which tells whether a transaction was aborted due to a 'tabort'
>>> instruction, and skipping the retry if that bit is set.
>>>
>>> It fixes the following tests:
>>>
>>> +Passed: compiler/rtm/locking/TestRTMRetryCount.java
>>> +Passed: compiler/rtm/locking/TestRTMAbortThreshold.java
>>>
>>>
>>> Thank you and best regards,
>>> Gustavo
>>>
>>
>

From gromero at linux.vnet.ibm.com Tue Jul 17 13:44:33 2018
From: gromero at linux.vnet.ibm.com (Gustavo Romero)
Date: Tue, 17 Jul 2018 10:44:33 -0300
Subject: RFR(s): 8205582: PPC64: RTM: Fix counter for aborts on nested transactions
In-Reply-To:
References: <8aadb20a-fba7-0c52-04c0-015a731e60bd@linux.vnet.ibm.com> <571b4e4aa8d04c55a2c68d655aff4023@sap.com> <780da692-c439-2169-b007-3a7c07db5d9e@linux.vnet.ibm.com> <7f8b0534-baf1-9ddf-d4ef-e96a75b898ed@linux.vnet.ibm.com> <5b0e6dff18b74e69962b4b7e8256a17e@sap.com> <58772d79-69a3-0589-f1dd-80e33b834c2d@linux.vnet.ibm.com> <428b95df734446c285cdbccf1c2dc24d@sap.com>
Message-ID: <4d26ba51-30c5-d230-8017-96b67d4a758f@linux.vnet.ibm.com>

Hi,

Could I get a second review for that change please?

Best regards,
Gustavo

On 07/17/2018 08:18 AM, Gustavo Romero wrote:
> Hi Martin,
>
> OK. Let's continue with v4_B [1] then.
>
> Thanks for reviewing it!
>
>
> Best regards,
> Gustavo
>
> [1] http://cr.openjdk.java.net/~gromero/8205582/v4_B
>
> On 07/17/2018 06:02 AM, Doerr, Martin wrote:
>> Hi Gustavo,
>>
>> your webrev v4_B looks good. Reviewed.
>>
>> I think v4_A wouldn't work appropriately when using +1 and -1 bits together.
>>
>> A generic version could be something like (just to explain what I was thinking about):
>> for (int nbit = 0; nbit < num_failure_bits; nbit++) {
>>   Label do_increment, check_abort;
>>
>>   int last_match = -1;
>>   for (ncounter = 0; ncounter < num_counters; ncounter++) {
>>     if (last_match >= 0) {
>>       rldicr_(temp_Reg, abort_status_R0, failure_bit[last_match], 0);
>>       int selection = bit_counter_map[last_match][ncounter];
>>       if (selection == 1) {
>>         bne(CCR1, do_increment);
>>       } else if (selection == -1) {
>>         beq(CCR1, do_increment);
>>       }
>>     }
>>     last_match = nbit;
>>   }
>>
>>   assert(last_match >= 0, "should have at least one");
>>   rldicr_(temp_Reg, abort_status_R0, failure_bit[last_match], 0);
>>   int selection = bit_counter_map[last_match][ncounter];
>>   if (selection == 1) {
>>     beq(CCR1, check_abort);
>>   } else if (selection == -1) {
>>     bne(CCR1, check_abort);
>>   }
>>   bind(do_increment);
>>   ld(temp_Reg, abort_counter_offs, rtm_counters_Reg);
>>   addi(temp_Reg, temp_Reg, 1);
>>   std(temp_Reg, abort_counter_offs, rtm_counters_Reg);
>>   bind(check_abort);
>> }
>>
>> But I'm fine with webrev v4_B with the comment you have added.
>>
>> Thanks,
>> Martin
>>
>>
>> -----Original Message-----
>> From: Gustavo Romero [mailto:gromero at linux.vnet.ibm.com]
>> Sent: Dienstag, 17. Juli 2018 08:55
>> To: Doerr, Martin ; Lindenmaier, Goetz ; hotspot-compiler-dev at openjdk.java.net
>> Cc: ppc-aix-port-dev at openjdk.java.net
>> Subject: Re: RFR(s): 8205582: PPC64: RTM: Fix counter for aborts on nested transactions
>>
>> Hi Martin,
>>
>> On 07/16/2018 05:17 AM, Doerr, Martin wrote:
>>> thanks for the new webrev.
>>
>> Thanks a lot for the thorough review.
>>
>>
>>> You're right, the two bits should be mutually exclusive, so the sum is equivalent to the logical or in this case.
>>> However, the loop pretends to be generic, but it's not. The "or" only works for mutual exclusive bits. >>> If you want to keep the code with the loops, I think there should be a comment explaining this. >> >> I see. I've experimented a couple of options following your previous suggestion >> of using the counter loop as the outer loop. Most difficult point was to work >> around the constraint of having just R0 available as scratch. Bit/bitfield >> extracting instrs did not help much since most of them can't perform an OR with >> its destination operand, which gets worse with the R0 constraint. On the other, >> afaics 'rldimi' which does an OR with its destination operand can't be used to >> extract the bit/bitfields in a generic way. That best alternative I found was to >> use both CCR0 and CCR1 and their EQ bits. I think all cases are covered now, >> i.e. all bits/conditions for a given counter are ORed. failure_code logic is not >> inverted any more in the map, which seems more natural. I added also more >> information to the comment before the bit/counter map. >> I think now the loop is generic. >> Webrev for the final result: >> >> http://cr.openjdk.java.net/~gromero/8205582/v4_A/ >> >> >>> Please also fix indentation. >> >> Done. >> >> >>> Alternatively, the loop could get replaced by some code for each bit. Given that the loop is not really generic, I think this wouldn't be worse. >>> >>> Besides that, it looks good. Thanks for improving it. >> >> Due to the tight schedule I also provide the corrections for the last reviewed >> version: >> >> http://cr.openjdk.java.net/~gromero/8205582/v4_B/ >> >> If v4_A also looks good I vouch for pushing it instead of v4_B. >> >> >> Best regards, >> Gustavo >> >>> Best regards, >>> Martin >>> >>> >>> -----Original Message----- >>> From: Gustavo Romero [mailto:gromero at linux.vnet.ibm.com] >>> Sent: Freitag, 13. 
Juli 2018 16:56 >>> To: Doerr, Martin ; Lindenmaier, Goetz ; hotspot-compiler-dev at openjdk.java.net >>> Cc: ppc-aix-port-dev at openjdk.java.net >>> Subject: Re: RFR(s): 8205582: PPC64: RTM: Fix counter for aborts on nested transactions >>> >>> Hi Martin, >>> >>> On 07/12/2018 12:30 PM, Doerr, Martin wrote: >>>> I think your new code may increment a counter twice when 2 bits are specified for it. I think this should get fixed. >>>> Iterating over the RTMLockingCounters in the outer loop and performing several checks before incrementing should fix this, right? >>> >>> Do you mean increment twice counter #2 (conflict counter) or another counter? >>> non_trans_cf and trans_cf are mutually exclusive. >>> You probably spotted a case I'm missing so I ask. >>> >>> >>>> Besides that, I have some improvement proposals: >>>> >>>> - I think using 3 constant tables is not so good to read. For example, inverting could be encoded in your new 2 dimensional table by using -1, 0 , +1 when using int instead of bool. Would this be better? >>>> >>>> - I think you can get rid of rtm_counters_Reg increment and restoration by computing the abort_offs relative to the original value. >>> >>> Cool :) >>> >>> New (interim) webrev: >>> http://cr.openjdk.java.net/~gromero/8205582/v3/ >>> >>> Thanks. >>> >>> >>> Best regards, >>> Gustavo >>> >>>> Best regards, >>>> Martin >>>> >>>> >>>> -----Original Message----- >>>> From: Gustavo Romero [mailto:gromero at linux.vnet.ibm.com] >>>> Sent: Dienstag, 10. Juli 2018 18:59 >>>> To: Doerr, Martin ; Lindenmaier, Goetz ; hotspot-compiler-dev at openjdk.java.net >>>> Cc: ppc-aix-port-dev at openjdk.java.net >>>> Subject: Re: RFR(s): 8205582: PPC64: RTM: Fix counter for aborts on nested transactions >>>> >>>> Dear Martin, >>>> >>>> On 06/25/2018 01:31 PM, Doerr, Martin wrote: >>>>> I think it would be better to ignore bit 63 in the macroAssembler code and use the definition from the spec in assembler_ppc.hpp. 
>>>>> Somebody may want to use the definition for other purposes.
>>>>
>>>> Done. transactional_level bit is 52 now so bits 52:62 can easily be extracted
>>>> using 'rldicr' in macroAssembler.
>>>>
>>>>
>>>>> I wonder if Assembler::tm_trans_cf | Assembler::tm_non_trans_cf would be a better match for x86's description for tm_failure_bit[2]. It's also a little unfortunate to print the same bit twice as tm_failure_bit[4].
>>>>
>>>> Done. Now both tm_trans_cf and tm_non_trans_cf failures will increment counter 2
>>>> (conflict). The duplicated check code for tm_failure_bit[4] was removed and now
>>>> counter 4 (debug) is mapped to count traps or syscalls caught in TM events,
>>>> which seems a reasonable approximation to the original semantics of the debug
>>>> counter on Intel. Unfortunately I could not confirm on AIX how these two events
>>>> (trap and syscall in TM) will set the failure code, so the counter will never
>>>> track any information on AIX. But with the currently proposed change that failure
>>>> code can easily be added in the future.
>>>>
>>>> I also realized that I previously used a wrong ME operand value in:
>>>>
>>>> +        // Extract 11 bits
>>>> +        rldicr_(temp_Reg, abort_status, tm_failure_bit[i], 11);
>>>>
>>>> It should actually be 10 to extract 11 bits, so all extractions should be correct
>>>> now.
>>>>
>>>>
>>>> I hope you don't find the array map of failure bits vs counters overkill.
>>>>
>>>> Finally, I replaced the comment:
>>>>
>>>> tm_tabort, // Note: Seems like signal handler sets this, too.
>>>>
>>>> by:
>>>>
>>>> tm_tabort, // Signal handler will set this too.
>>>>
>>>> because we only enable RTM support on Power if 'htm-nosc' is supported, so
>>>> treclaim. on aborting the syscall will indeed always set the Abort bit in TEXASR
>>>> afaik. Since the debug counter now tracks trap/syscall in TM it's possible to check
>>>> that counter to verify the number of aborts caused by the kernel (if any).
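
[Editorial sketch, not part of the original message.] The ME operand point above (10 rather than 11 to keep 11 bits starting at bit 52) can be checked with a small model of 'rldicr'. This is a minimal sketch in Java, assuming the usual PowerPC semantics of the instruction (rotate left by SH, then keep IBM-numbered bits 0..ME, where bit 0 is the most significant). The class and helper names are hypothetical and are not taken from the webrev.

```java
// Hypothetical model of 'rldicr' for illustration only; it is not HotSpot code.
final class RldicrSketch {
    // Rotate rs left by sh, then keep IBM-numbered bits 0..me (bit 0 = MSB)
    // and clear every bit to the right of bit me.
    static long rldicr(long rs, int sh, int me) {
        long rotated = Long.rotateLeft(rs, sh);
        long mask = -1L << (63 - me); // ones in IBM bits 0..me
        return rotated & mask;
    }

    // Hypothetical helper: a 64-bit value with only IBM-numbered bit n set.
    static long ibmBit(int n) {
        return 1L << (63 - n);
    }

    public static void main(String[] args) {
        int sh = 52; // first bit of the field, per "transactional_level bit is 52"

        // ME = 10 keeps 11 bits, i.e. original bits 52..62.
        System.out.println(rldicr(ibmBit(52), sh, 10) != 0); // true
        System.out.println(rldicr(ibmBit(62), sh, 10) != 0); // true
        System.out.println(rldicr(ibmBit(63), sh, 10) != 0); // false

        // ME = 11 keeps 12 bits, so bit 63 would leak into the result.
        System.out.println(rldicr(ibmBit(63), sh, 11) != 0); // true
    }
}
```

In this model a nonzero result means at least one bit of the extracted field was set, which is the condition the record form 'rldicr_' would expose through the condition register.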
>>>>
>>>> new webrev: http://cr.openjdk.java.net/~gromero/8205582/v2/
>>>>
>>>>
>>>> Best regards,
>>>> Gustavo
>>>>
>>>>> Best regards,
>>>>> Martin
>>>>>
>>>>>
>>>>> -----Original Message-----
>>>>> From: Gustavo Romero [mailto:gromero at linux.vnet.ibm.com]
>>>>> Sent: Montag, 25. Juni 2018 10:19
>>>>> To: Lindenmaier, Goetz ; Doerr, Martin ; hotspot-compiler-dev at openjdk.java.net
>>>>> Cc: ppc-aix-port-dev at openjdk.java.net
>>>>> Subject: RFR(s): 8205582: PPC64: RTM: Fix counter for aborts on nested transactions
>>>>>
>>>>> Hi,
>>>>>
>>>>> Could the following change be reviewed please?
>>>>>
>>>>> bug   : https://bugs.openjdk.java.net/browse/JDK-8205582
>>>>> webrev: http://cr.openjdk.java.net/~gromero/8205582/v1/
>>>>>
>>>>> It fixes the RTM counter for nested aborts (rtm lock aborts type 5) by
>>>>> extracting and checking bits in the Transactional Level field of the TEXASR
>>>>> register.
>>>>>
>>>>> It also fixes the memory conflict counter (rtm lock aborts type 2). The Power TM
>>>>> status register provides two bits to report two different types of memory
>>>>> conflict between threads: non-transactional and transactional. According to how
>>>>> the jtreg RTM tests are designed, the memory conflict counter counts
>>>>> non-transactional conflicts: in TestPrintPreciseRTMLockingStatistics an RTM lock
>>>>> is held on a static variable while another thread without any synchronization
>>>>> (non-transactional) tries to modify the same variable. Hence that small
>>>>> adjustment satisfies TestPrintPreciseRTMLockingStatistics, making it pass on
>>>>> Power. The memory conflict counter is not used anywhere other than the
>>>>> RTM precise statistics (no decision is made by the JVM based on that amount).
>>>>>
>>>>> This change partially fixes some failures in
>>>>> compiler/rtm/print/TestPrintPreciseRTMLockingStatistics.java regarding the
>>>>> nested and memory conflict abort counters.
The remaining issue will be fixed by >>>>> aborting on calling JNI (next RFR). >>>>> >>>>> >>>>> Thank you and best regards, >>>>> Gustavo >>>>> >>>> >>> >> > From goetz.lindenmaier at sap.com Tue Jul 17 14:40:29 2018 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Tue, 17 Jul 2018 14:40:29 +0000 Subject: Filed 8207404: MulticastSocket tests failing on Aix Message-ID: <36050ccf558f438981d21626cae2f685@sap.com> Hi, is there anyone at IBM interested in looking at this issue? https://bugs.openjdk.java.net/browse/JDK-8207404 We see this failing on our systems all the time. I'll problem list these tests. Best regards, Goetz. From goetz.lindenmaier at sap.com Tue Jul 17 15:06:38 2018 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Tue, 17 Jul 2018 15:06:38 +0000 Subject: RFR(s): 8205581: PPC64: RTM: Fix abort on native calls In-Reply-To: References: <86a9e4085b424a58a28436415786e312@sap.com> <7c133789-e13a-d675-17ad-8338fc19c9ce@linux.vnet.ibm.com> Message-ID: <52a75406da49472ea22039cf846fa64f@sap.com> Hi Gustavo, change looks good, Reviewed. Best regards, Goetz. > -----Original Message----- > From: Gustavo Romero [mailto:gromero at linux.vnet.ibm.com] > Sent: Dienstag, 17. Juli 2018 15:44 > To: Lindenmaier, Goetz > Cc: Doerr, Martin ; hotspot-compiler- > dev at openjdk.java.net; ppc-aix-port-dev at openjdk.java.net > Subject: Re: RFR(s): 8205581: PPC64: RTM: Fix abort on native calls > > Hi, > > Could I get a second review for that change please? > > Best regards, > Gustavo > > On 06/26/2018 09:54 AM, Doerr, Martin wrote: > > Hi Gustavo, > > > > thanks for the update. Looks good to me. > > > > Best regards, > > Martin > > > > > > -----Original Message----- > > From: Gustavo Romero [mailto:gromero at linux.vnet.ibm.com] > > Sent: Dienstag, 26. 
Juni 2018 14:41 > > To: Doerr, Martin ; Lindenmaier, Goetz > ; hotspot-compiler-dev at openjdk.java.net > > Cc: ppc-aix-port-dev at openjdk.java.net > > Subject: Re: RFR(s): 8205581: PPC64: RTM: Fix abort on native calls > > > > Hi Martin, > > > > Thanks for the quick review! > > > > On 06/25/2018 01:37 PM, Doerr, Martin wrote: > >> I wonder why you placed the tabort so late in generate_native_wrapper. > I'd put it at the Verified Entry Point. > > > > Actually for no particular reason. > > > > So previously: > > > > 2247 [Verified Entry Point] > > 2248 0x00007fff9c816520: mfcr r22 > > 2249 0x00007fff9c816524: std r22,8(r1) > > 2250 0x00007fff9c816528: mflr r22 > > 2251 0x00007fff9c81652c: std r22,16(r1) > > 2252 0x00007fff9c816530: addis r11,r1,-2 > > 2253 0x00007fff9c816534: std r0,0(r11) > > 2254 0x00007fff9c816538: mr r21,r1 > > 2255 0x00007fff9c81653c: stdu r1,-176(r1) > > 2256 0x00007fff9c816540: std r3,96(r1) > > 2257 0x00007fff9c816544: addi r4,r1,96 > > 2258 0x00007fff9c816548: cmpdi r3,0 > > 2259 0x00007fff9c81654c: bne- 0x00007fff9c816554 > > 2260 0x00007fff9c816550: li r4,0 > > 2261 0x00007fff9c816554: addi r3,r16,824 ; > ImmutableOopMap{[96]=Oop } > > 2262 0x00007fff9c816558: addis r28,r29,7 > > 2263 0x00007fff9c81655c: addi r28,r28,25944 ; {internal_word} > > 2264 0x00007fff9c816560: tabort. r0 <== > > ... > > > > Now: > > > > 2169 [Verified Entry Point] > > 2170 0x00007fff78816320: tabort. 
r0 <== > > 2171 0x00007fff78816324: mfcr r22 > > 2172 0x00007fff78816328: std r22,8(r1) > > 2173 0x00007fff7881632c: mflr r22 > > 2174 0x00007fff78816330: std r22,16(r1) > > 2175 0x00007fff78816334: addis r11,r1,-2 > > 2176 0x00007fff78816338: std r0,0(r11) > > 2177 0x00007fff7881633c: mr r21,r1 > > 2178 0x00007fff78816340: stdu r1,-176(r1) > > 2179 0x00007fff78816344: std r3,96(r1) > > 2180 0x00007fff78816348: addi r4,r1,96 > > 2181 0x00007fff7881634c: cmpdi r3,0 > > 2182 0x00007fff78816350: bne- 0x00007fff78816358 > > 2183 0x00007fff78816354: li r4,0 > > 2184 0x00007fff78816358: addi r3,r16,824 ; > ImmutableOopMap{[96]=Oop } > > 2185 0x00007fff7881635c: addis r28,r29,7 > > 2186 0x00007fff78816360: addi r28,r28,25436 ; {internal_word} > > ... > > > > Yep, it's better to abort sooner. > > > > new webrev: http://cr.openjdk.java.net/~gromero/8205581/v2/ > > > > > > Best regards, > > Gustavo > > > >> Besides that, it looks good to me. > >> > >> Thanks, > >> Martin > >> > >> > >> -----Original Message----- > >> From: Gustavo Romero [mailto:gromero at linux.vnet.ibm.com] > >> Sent: Montag, 25. Juni 2018 10:21 > >> To: Lindenmaier, Goetz ; Doerr, Martin > ; hotspot-compiler-dev at openjdk.java.net > >> Cc: ppc-aix-port-dev at openjdk.java.net > >> Subject: RFR(s): 8205581: PPC64: RTM: Fix abort on native calls > >> > >> Hi, > >> > >> Could the following change be reviewed please? > >> > >> bug : https://bugs.openjdk.java.net/browse/JDK-8205581 > >> webrev: http://cr.openjdk.java.net/~gromero/8205581/v1/ > >> > >> It forces a transactional state to abort before calling native methods, > before > >> calling runtime, and on uncommon trap checking, mostly because > transaction will > >> be aborted soon or latter in either case, similarly to what happens on > Intel. > >> The abort instruction (tabort) is only emitted if UseRTMLocking is "true" > and > >> any 'tabort' instruction is treated as a 'nop' instruction if TM state is > >> non-transactional. 
> >> > >> It fixes the following tests: > >> > >> +Passed: compiler/rtm/print/TestPrintPreciseRTMLockingStatistics.java > >> +Passed: compiler/rtm/locking/TestRTMDeoptOnHighAbortRatio.java > >> > >> > >> Thank you and best regards, > >> Gustavo > >> > > From goetz.lindenmaier at sap.com Tue Jul 17 15:19:05 2018 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Tue, 17 Jul 2018 15:19:05 +0000 Subject: RFR(s): 8205580: PPC64: RTM: Don't retry lock on abort if abort was intentional In-Reply-To: <33106239-dc48-0599-3b20-66d93f27463e@linux.vnet.ibm.com> References: <56e66a89-42a7-eb72-05a2-a97c696379c9@linux.vnet.ibm.com> <33106239-dc48-0599-3b20-66d93f27463e@linux.vnet.ibm.com> Message-ID: Hi Gustavo, the change looks good, Reviewed. Small nit, don't need a new webrev: + // used in the JVM. Thus mostly (B) a Nesting Overflows or (C) a Footprint 'Overflows' should be singular I think. Best regards, Goetz. > -----Original Message----- > From: Gustavo Romero [mailto:gromero at linux.vnet.ibm.com] > Sent: Dienstag, 17. Juli 2018 15:44 > To: Lindenmaier, Goetz > Cc: Doerr, Martin ; hotspot-compiler- > dev at openjdk.java.net; ppc-aix-port-dev at openjdk.java.net > Subject: Re: RFR(s): 8205580: PPC64: RTM: Don't retry lock on abort if abort > was intentional > > Hi, > > Could I get a second review for that change please? > It's already reviewed by Martin. > > Best regards, > Gustavo > > On 07/02/2018 11:03 AM, Gustavo Romero wrote: > > Hi Martin, > > > > On 07/02/2018 10:20 AM, Doerr, Martin wrote: > >> I meant retrying "on abort", not "on busy". There are different counters > for these two retry functions. > >> RTM for Stack Locks only supports "on abort". > >> Inflated RTM locking supports both using both counters. > > > > Yup, I meant "on abort" too, but I missed your point regarding the retry > > "on abort" in the RTM for Stack Locks case. > > > > > >> But I see that x86 uses the same behavior as you when using -XX:- > UseRTMXendForLockBusy. 
> >> I think it's not so good to treat "abort instruction on lock busy" as > permanent abort reason. > >> So the behavior is fine with UseRTMXendForLockBusy, but not without it. > > > > hmm I see. So you think it's also not fine on x64, right? In general I think > > it's not good to finish a RTM atomic block with `xabort/tabort`, but maybe it > > makes more sense on x86. Luckily -XX:+UseRTMXendForLockBusy is the > default. > > > > > >> But I can live with your change because it only has a negative effect on an > unsupported experimental option. And I think your change is fine for the > other usages of tabort(). > > > > OK. I'll revisit that when doing additional performance tests with RTM for > Stack > > Locks. > > > > Thanks! > > > > > > Best regards, > > Gustavo > > > >> Best regards, > >> Martin > >> > >> > >> -----Original Message----- > >> From: Gustavo Romero [mailto:gromero at linux.vnet.ibm.com] > >> Sent: Dienstag, 26. Juni 2018 18:01 > >> To: Doerr, Martin ; Lindenmaier, Goetz > ; hotspot-compiler-dev at openjdk.java.net > >> Cc: ppc-aix-port-dev at openjdk.java.net > >> Subject: Re: RFR(s): 8205580: PPC64: RTM: Don't retry lock on abort if > abort was intentional > >> > >> Hi Martin, > >> > >> On 06/25/2018 01:49 PM, Doerr, Martin wrote: > >>> Looks good for the case UseRTMXendForLockBusy is active (which is > default). > >> > >> I did all the tests focusing on when it's deactivated (- > UseRTMXendForLockBusy). > >> This is also the flag passed in jtreg tests since if it's active there are no > >> aborts caused by 'tabort or xabort' and so no abort statistics (related to > >> that event, which is used by the jtreg tests). > >> > >> > >>> If this flag is deactivated, we use tabort if we see the object locked so > your change prevents retrying the transaction in this case. > >>> I guess this was not intended? 
> >>
> >> I think that rtm_retry_lock_on_abort() is a misleading name, it should be
> >> something like rtm_retry_lock_on_conflict(), since the purpose of this
> >> function is to not retry if the abort is caused by a tabort/xabort in my
> >> understanding.
> >>
> >> On Intel that function checks for bit 1 (0x2 mask) and if it is set the operation
> >> is retried. But for bit 1 to be set, the transaction must not have aborted due to
> >> xabort, otherwise that bit would be clear as:
> >>
> >>    //     0     Set if abort caused by XABORT instruction.
> >>    //     1     If set, the transaction may succeed on a retry.
> >>                 This bit is always clear if bit 0 is set (or is always clear if abort is caused by XABORT)
> >>
> >> That's why filtering on Power by the "Abort" bit in TEXASR makes the number of
> >> aborts behave like on x64. If we don't filter aborts caused by tabort we find the
> >> pattern of X*2+1 aborts, because both rtm_retry_lock_on_abort() and
> >> rtm_retry_lock_on_busy() will retry the operation RTMRetryCount times.
> >>
> >> My change won't prevent retrying because after rtm_retry_lock_on_abort(), if
> >> cmpxchgd() does not succeed it calls rtm_retry_lock_on_busy(), which in turn
> >> will also retry the operation based on the value specified by RTMRetryCount.
> >>
> >> I prepared a simple test-case where UseRTMXendForLockBusy is deactivated to show
> >> that if we increase RTMRetryCount, even with that flag deactivated
> >> the operation is attempted exactly RTMRetryCount+1 times after the fix, like on
> >> Intel:
> >>
> >> https://github.com/gromero/retry
> >>
> >> You just need to clone and run it pointing to a build dir:
> >>
> >> $ git clone https://github.com/gromero/retry && cd retry
> >> $ ./retry
> >>
> >> You have to build the WhiteBox lib through "make build-test-lib" before running
> >> it.
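
[Editorial sketch, not part of the original message.] The abort arithmetic described above (X*2+1 without the filter, X+1 with it) can be reproduced with a toy model. This is a minimal sketch in Java under strong assumptions: the lock is always seen busy, so every transactional attempt ends in a deliberate tabort/xabort, and each retry helper simply repeats the attempt RTMRetryCount times. The class name, method name and boolean parameter are hypothetical; the real logic lives in the PPC64 macro assembler, not in Java.

```java
// Toy model of the retry counting discussed in the thread; not HotSpot code.
final class RtmRetryModel {
    // Returns the number of aborts observed for one contended lock acquisition,
    // assuming every attempt is aborted deliberately because the lock is busy.
    static int countAborts(int rtmRetryCount, boolean skipRetryOnDeliberateAbort) {
        int aborts = 1; // the first transactional attempt aborts

        // Modeled rtm_retry_lock_on_abort(): without the filter it retries even
        // after a deliberate abort; with the filter (Abort bit set) it is skipped.
        if (!skipRetryOnDeliberateAbort) {
            for (int i = 0; i < rtmRetryCount; i++) {
                aborts++;
            }
        }

        // Modeled rtm_retry_lock_on_busy(): retries RTMRetryCount times either way.
        for (int i = 0; i < rtmRetryCount; i++) {
            aborts++;
        }
        return aborts;
    }

    public static void main(String[] args) {
        for (int x = 1; x <= 2; x++) {
            System.out.println("RTMRetryCount=" + x
                    + " before fix: " + countAborts(x, false)
                    + ", after fix: " + countAborts(x, true));
        }
    }
}
```

Under these assumptions the model yields 3 and 5 aborts before the fix and 2 and 3 after it for RTMRetryCount=1 and 2, which matches the "rtm lock aborts" totals in the quoted test runs.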
> >> > >> So for RTMRetryCount=1 and RTMRetryCount=2 w/ - > UseRTMXendForLockBusy before the > >> change: > >> > >> gromero at gromero16:~/git/retry$ ./retry.sh > /home/gromero/hg/jdk/jdk/build/linux-ppc64le-normal-server-release 1 > >> ++ /home/gromero/hg/jdk/jdk/build/linux-ppc64le-normal-server- > release/jdk/bin/javac -cp /home/gromero/hg/jdk/jdk/build/linux-ppc64le- > normal-server-release/support/test/lib/wb.jar --add-exports > java.base/jdk.internal.misc=ALL-UNNAMED retry.java > >> ++ LD_LIBRARY_PATH=/home/gromero/hg/jdk/jdk/build/linux-ppc64le- > normal-server-release/jdk/../..//src/utils/hsdis/build/linux-ppc64le > >> ++ /home/gromero/hg/jdk/jdk/build/linux-ppc64le-normal-server- > release/jdk/bin/java - > Xbootclasspath/a:/home/gromero/hg/jdk/jdk/build/linux-ppc64le-normal- > server-release/support/test/lib/wb.jar -XX:+UnlockExperimentalVMOptions > -XX:+UnlockDiagnosticVMOptions -XX:+WhiteBoxAPI -XX:+UseRTMLocking - > XX:+PrintPreciseRTMLockingStatistics -XX:-TieredCompilation -Xcomp -XX:- > UseRTMXendForLockBusy -XX:RTMTotalCountIncrRate=1 - > XX:RTMRetryCount=1 -XX:CompileOnly=RTM.syncAndTest --add-exports > java.base/jdk.internal.misc=ALL-UNNAMED retry > >> Creating thread0... > >> Trying to inflate lock... > >> Is monitor inflated? Yes > >> Entering thread to sleep... > >> RTM.syncAndTest at 26 > >> # rtm locks total (estimated): 3 > >> # rtm lock aborts? 
: 3 > >> # rtm lock aborts 0: 3 > >> # rtm lock aborts 1: 3 > >> # rtm lock aborts 2: 0 > >> # rtm lock aborts 3: 0 > >> # rtm lock aborts 4: 0 > >> # rtm lock aborts 5: 0 > >> ++ set +x > >> gromero at gromero16:~/git/retry$ ./retry.sh > /home/gromero/hg/jdk/jdk/build/linux-ppc64le-normal-server-release 2 > >> ++ /home/gromero/hg/jdk/jdk/build/linux-ppc64le-normal-server- > release/jdk/bin/javac -cp /home/gromero/hg/jdk/jdk/build/linux-ppc64le- > normal-server-release/support/test/lib/wb.jar --add-exports > java.base/jdk.internal.misc=ALL-UNNAMED retry.java > >> ++ LD_LIBRARY_PATH=/home/gromero/hg/jdk/jdk/build/linux-ppc64le- > normal-server-release/jdk/../..//src/utils/hsdis/build/linux-ppc64le > >> ++ /home/gromero/hg/jdk/jdk/build/linux-ppc64le-normal-server- > release/jdk/bin/java - > Xbootclasspath/a:/home/gromero/hg/jdk/jdk/build/linux-ppc64le-normal- > server-release/support/test/lib/wb.jar -XX:+UnlockExperimentalVMOptions > -XX:+UnlockDiagnosticVMOptions -XX:+WhiteBoxAPI -XX:+UseRTMLocking - > XX:+PrintPreciseRTMLockingStatistics -XX:-TieredCompilation -Xcomp -XX:- > UseRTMXendForLockBusy -XX:RTMTotalCountIncrRate=1 - > XX:RTMRetryCount=2 -XX:CompileOnly=RTM.syncAndTest --add-exports > java.base/jdk.internal.misc=ALL-UNNAMED retry > >> Creating thread0... > >> Trying to inflate lock... > >> Is monitor inflated? Yes > >> Entering thread to sleep... > >> RTM.syncAndTest at 26 > >> # rtm locks total (estimated): 5 > >> # rtm lock aborts? 
: 5 > >> # rtm lock aborts 0: 5 > >> # rtm lock aborts 1: 5 > >> # rtm lock aborts 2: 0 > >> # rtm lock aborts 3: 0 > >> # rtm lock aborts 4: 0 > >> # rtm lock aborts 5: 0 > >> ++ set +x > >> > >> > >> and after the change: > >> > >> gromero at gromero16:~/git/retry$ ./retry.sh > /home/gromero/hg/jdk/jdk/build/linux-ppc64le-normal-server-release 1 > >> ++ /home/gromero/hg/jdk/jdk/build/linux-ppc64le-normal-server- > release/jdk/bin/javac -cp /home/gromero/hg/jdk/jdk/build/linux-ppc64le- > normal-server-release/support/test/lib/wb.jar --add-exports > java.base/jdk.internal.misc=ALL-UNNAMED retry.java > >> ++ LD_LIBRARY_PATH=/home/gromero/hg/jdk/jdk/build/linux-ppc64le- > normal-server-release/jdk/../..//src/utils/hsdis/build/linux-ppc64le > >> ++ /home/gromero/hg/jdk/jdk/build/linux-ppc64le-normal-server- > release/jdk/bin/java - > Xbootclasspath/a:/home/gromero/hg/jdk/jdk/build/linux-ppc64le-normal- > server-release/support/test/lib/wb.jar -XX:+UnlockExperimentalVMOptions > -XX:+UnlockDiagnosticVMOptions -XX:+WhiteBoxAPI -XX:+UseRTMLocking - > XX:+PrintPreciseRTMLockingStatistics -XX:-TieredCompilation -Xcomp -XX:- > UseRTMXendForLockBusy -XX:RTMTotalCountIncrRate=1 - > XX:RTMRetryCount=1 -XX:CompileOnly=RTM.syncAndTest --add-exports > java.base/jdk.internal.misc=ALL-UNNAMED retry > >> Creating thread0... > >> Trying to inflate lock... > >> Is monitor inflated? Yes > >> Entering thread to sleep... > >> RTM.syncAndTest at 26 > >> # rtm locks total (estimated): 2 > >> # rtm lock aborts? 
: 2 > >> # rtm lock aborts 0: 2 > >> # rtm lock aborts 1: 2 > >> # rtm lock aborts 2: 0 > >> # rtm lock aborts 3: 0 > >> # rtm lock aborts 4: 0 > >> # rtm lock aborts 5: 0 > >> ++ set +x > >> gromero at gromero16:~/git/retry$ ./retry.sh > /home/gromero/hg/jdk/jdk/build/linux-ppc64le-normal-server-release 2 > >> ++ /home/gromero/hg/jdk/jdk/build/linux-ppc64le-normal-server- > release/jdk/bin/javac -cp /home/gromero/hg/jdk/jdk/build/linux-ppc64le- > normal-server-release/support/test/lib/wb.jar --add-exports > java.base/jdk.internal.misc=ALL-UNNAMED retry.java > >> ++ LD_LIBRARY_PATH=/home/gromero/hg/jdk/jdk/build/linux-ppc64le- > normal-server-release/jdk/../..//src/utils/hsdis/build/linux-ppc64le > >> ++ /home/gromero/hg/jdk/jdk/build/linux-ppc64le-normal-server- > release/jdk/bin/java - > Xbootclasspath/a:/home/gromero/hg/jdk/jdk/build/linux-ppc64le-normal- > server-release/support/test/lib/wb.jar -XX:+UnlockExperimentalVMOptions > -XX:+UnlockDiagnosticVMOptions -XX:+WhiteBoxAPI -XX:+UseRTMLocking - > XX:+PrintPreciseRTMLockingStatistics -XX:-TieredCompilation -Xcomp -XX:- > UseRTMXendForLockBusy -XX:RTMTotalCountIncrRate=1 - > XX:RTMRetryCount=2 -XX:CompileOnly=RTM.syncAndTest --add-exports > java.base/jdk.internal.misc=ALL-UNNAMED retry > >> Creating thread0... > >> Trying to inflate lock... > >> Is monitor inflated? Yes > >> Entering thread to sleep... > >> RTM.syncAndTest at 26 > >> # rtm locks total (estimated): 3 > >> # rtm lock aborts? : 3 > >> # rtm lock aborts 0: 3 > >> # rtm lock aborts 1: 3 > >> # rtm lock aborts 2: 0 > >> # rtm lock aborts 3: 0 > >> # rtm lock aborts 4: 0 > >> # rtm lock aborts 5: 0 > >> ++ set +x > >> > >> > >> Best regards, > >> Gustavo > >> > >>> Thanks and best regards, > >>> Martin > >>> > >>> > >>> -----Original Message----- > >>> From: Gustavo Romero [mailto:gromero at linux.vnet.ibm.com] > >>> Sent: Montag, 25. 
Juni 2018 10:24
> >>> To: Lindenmaier, Goetz; Doerr, Martin; hotspot-compiler-dev at openjdk.java.net
> >>> Cc: ppc-aix-port-dev at openjdk.java.net
> >>> Subject: RFR(s): 8205580: PPC64: RTM: Don't retry lock on abort if abort was intentional
> >>>
> >>> Hi,
> >>>
> >>> Could the following change be reviewed please?
> >>>
> >>> bug   : https://bugs.openjdk.java.net/browse/JDK-8205580
> >>> webrev: http://cr.openjdk.java.net/~gromero/8205580/v1/
> >>>
> >>> It changes the behavior of rtm_retry_lock_on_abort() by avoiding retry if abort
> >>> was a deliberate abort, i.e. caused by a 'tabort r0' instruction.
> >>>
> >>> On Intel bit 1 in abort_status_Reg (which communicates the abort status) is
> >>> always clear when a 'xabort 0' instruction is executed in order to inform that a
> >>> transactional retry /can not/ succeed on retry. So rtm_retry_lock_on_abort() on
> >>> Intel, on finding bit 1 clear in abort_status_Reg, skips the retry (don't
> >>> retry).
> >>>
> >>> Currently on Power rtm_retry_lock_on_abort() is just checking the persistent bit
> >>> (if set => skip) which /is not set/ by 'tabort r0'. Hence
> >>> rtm_retry_lock_on_abort() does retry to lock on an intentional abort caused by
> >>> 'tabort'. It leads, for instance when -XX:RTMRetryCount=1, to the following
> >>> discrepancy between Intel and Power regarding the number of retries/aborts:
> >>>
> >>> [Power]
> >>> # rtm locks total (estimated): 3
> >>> # rtm lock aborts  : 3
> >>> # rtm lock aborts 0: 3
> >>> # rtm lock aborts 1: 3
> >>> # rtm lock aborts 2: 0
> >>> # rtm lock aborts 3: 0
> >>> # rtm lock aborts 4: 0
> >>> # rtm lock aborts 5: 0
> >>>
> >>> [Intel]
> >>> # rtm locks total (estimated): 2
> >>> # rtm lock aborts  : 2
> >>> # rtm lock aborts 0: 2
> >>> # rtm lock aborts 1: 2
> >>> # rtm lock aborts 2: 0
> >>> # rtm lock aborts 3: 0
> >>> # rtm lock aborts 4: 0
> >>> # rtm lock aborts 5: 0
> >>>
> >>> So for -XX:RTMRetryCount=X:
> >>> on Power the number of aborts is: X*2+1 [1 first failure + 1 rtm_retry_lock_on_abort() + 1 rtm_retry_lock_on_busy()];
> >>> on Intel the number of aborts is: X+1   [1 first failure + 1 rtm_retry_lock_on_busy()]
> >>>
> >>> This change fixes that discrepancy by using bit "Abort" in TEXASR register
> >>> (abort_status_Reg) that tells if a transaction was aborted due to a 'tabort'
> >>> instruction and skip the retry if such a bit is set.
> >>>
> >>> It fixes the following tests:
> >>>
> >>> +Passed: compiler/rtm/locking/TestRTMRetryCount.java
> >>> +Passed: compiler/rtm/locking/TestRTMAbortThreshold.java
> >>>
> >>>
> >>> Thank you and best regards,
> >>> Gustavo
> >>>
> >>
> >

From goetz.lindenmaier at sap.com Wed Jul 18 07:34:48 2018
From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz)
Date: Wed, 18 Jul 2018 07:34:48 +0000
Subject: RFR(s): 8205582: PPC64: RTM: Fix counter for aborts on nested transactions
In-Reply-To: <4d26ba51-30c5-d230-8017-96b67d4a758f@linux.vnet.ibm.com>
References: <8aadb20a-fba7-0c52-04c0-015a731e60bd@linux.vnet.ibm.com>
 <571b4e4aa8d04c55a2c68d655aff4023@sap.com>
 <780da692-c439-2169-b007-3a7c07db5d9e@linux.vnet.ibm.com>
 <7f8b0534-baf1-9ddf-d4ef-e96a75b898ed@linux.vnet.ibm.com>
 <5b0e6dff18b74e69962b4b7e8256a17e@sap.com>
 <58772d79-69a3-0589-f1dd-80e33b834c2d@linux.vnet.ibm.com>
 <428b95df734446c285cdbccf1c2dc24d@sap.com>
 <4d26ba51-30c5-d230-8017-96b67d4a758f@linux.vnet.ibm.com>
Message-ID: <668ca44f5db64601a8c5c1bd863779f7@sap.com>

Hi Gustavo,

I had a look at your change. Basically looks good.

Some smaller things:
2440 Otherwise, counter will be increment
2441 // more than once.
This should read: Otherwise, the counter will be incremented more than once.

Can you please put declaration and assignments on one line?
2472 int abortX_offs;
2473 abortX_offs = RTMLockingCounters::abortX_count_offset();
This should read:
2472 int abortX_offs = RTMLockingCounters::abortX_count_offset();
Similar 2479+2481.

I don't need a new webrev for these fixes. Reviewed.

Best regards,
  Goetz.

> -----Original Message-----
> From: Gustavo Romero [mailto:gromero at linux.vnet.ibm.com]
> Sent: Dienstag, 17. Juli 2018 15:45
> To: Lindenmaier, Goetz
> Cc: Doerr, Martin; hotspot-compiler-dev at openjdk.java.net; ppc-aix-port-dev at openjdk.java.net
> Subject: Re: RFR(s): 8205582: PPC64: RTM: Fix counter for aborts on nested transactions
>
> Hi,
>
> Could I get a second review for that change please?
>
> Best regards,
> Gustavo
>
> On 07/17/2018 08:18 AM, Gustavo Romero wrote:
> > Hi Martin,
> >
> > OK. Let's continue with v4_B [1] so.
> >
> > Thanks for reviewing it!
> >
> >
> > Best regards,
> > Gustavo
> >
> > [1] http://cr.openjdk.java.net/~gromero/8205582/v4_B
> >
> > On 07/17/2018 06:02 AM, Doerr, Martin wrote:
> >> Hi Gustavo,
> >>
> >> your webrev v4_B looks good. Reviewed.
> >>
> >> I think v4_A wouldn't work appropriately when using +1 and -1 bits together.
> >>
> >> A generic version could be something like (just to explain what I was thinking about):
> >> for (int nbit = 0; nbit < num_failure_bits; nbit++) {
> >>   Label do_increment, check_abort;
> >>
> >>   int last_match = -1;
> >>   for (ncounter = 0; ncounter < num_counters; ncounter++) {
> >>     if (last_match >= 0) {
> >>       rldicr_(temp_Reg, abort_status_R0, failure_bit[last_match], 0);
> >>       int selection = bit_counter_map[last_match][ncounter];
> >>       if (selection == 1) {
> >>         bne(CCR1, do_increment);
> >>       } else if (selection == -1) {
> >>         beq(CCR1, do_increment);
> >>       }
> >>     }
> >>     last_match = nbit;
> >>   }
> >>
> >>   assert(last_match >= 0, "should have at least one");
> >>   rldicr_(temp_Reg, abort_status_R0, failure_bit[last_match], 0);
> >>   int selection = bit_counter_map[last_match][ncounter];
> >>   if (selection == 1) {
> >>     beq(CCR1, check_abort);
> >>   } else if (selection == -1) {
> >>     bne(CCR1, check_abort);
> >>   }
> >>   bind(do_increment);
> >>   ld(temp_Reg, abort_counter_offs, rtm_counters_Reg);
> >>   addi(temp_Reg, temp_Reg, 1);
> >>   std(temp_Reg, abort_counter_offs, rtm_counters_Reg);
> >>   bind(check_abort);
> >> }
> >>
> >> But I'm fine with webrev v4_B with the comment you have added.
> >>
> >> Thanks,
> >> Martin
> >>
> >>
> >> -----Original Message-----
> >> From: Gustavo Romero [mailto:gromero at linux.vnet.ibm.com]
> >> Sent: Dienstag, 17. Juli 2018 08:55
> >> To: Doerr, Martin; Lindenmaier, Goetz; hotspot-compiler-dev at openjdk.java.net
> >> Cc: ppc-aix-port-dev at openjdk.java.net
> >> Subject: Re: RFR(s): 8205582: PPC64: RTM: Fix counter for aborts on nested transactions
> >>
> >> Hi Martin,
> >>
> >> On 07/16/2018 05:17 AM, Doerr, Martin wrote:
> >>> thanks for the new webrev.
> >>
> >> Thanks a lot for the thorough review.
> >>
> >>
> >>> You're right, the two bits should be mutually exclusive, so the sum is equivalent to the logical or in this case.
> >>> However, the loop pretends to be generic, but it's not. The "or" only works for mutually exclusive bits.
> >>> If you want to keep the code with the loops, I think there should be a comment explaining this.
> >>
> >> I see. I've experimented with a couple of options following your previous suggestion
> >> of using the counter loop as the outer loop. The most difficult point was to work
> >> around the constraint of having just R0 available as scratch. Bit/bitfield
> >> extracting instructions did not help much since most of them can't perform an OR with
> >> their destination operand, which gets worse with the R0 constraint. On the other hand,
> >> afaics 'rldimi', which does an OR with its destination operand, can't be used to
> >> extract the bit/bitfields in a generic way.
> >> The best alternative I found was to
> >> use both CCR0 and CCR1 and their EQ bits. I think all cases are covered now,
> >> i.e. all bits/conditions for a given counter are ORed. failure_code logic is not
> >> inverted any more in the map, which seems more natural. I also added more
> >> information to the comment before the bit/counter map.
> >> I think now the loop is generic.
> >> Webrev for the final result:
> >>
> >> http://cr.openjdk.java.net/~gromero/8205582/v4_A/
> >>
> >>
> >>> Please also fix indentation.
> >>
> >> Done.
> >>
> >>
> >>> Alternatively, the loop could get replaced by some code for each bit. Given that the loop is not really generic, I think this wouldn't be worse.
> >>>
> >>> Besides that, it looks good. Thanks for improving it.
> >>
> >> Due to the tight schedule I also provide the corrections for the last reviewed
> >> version:
> >>
> >> http://cr.openjdk.java.net/~gromero/8205582/v4_B/
> >>
> >> If v4_A also looks good I vouch for pushing it instead of v4_B.
> >>
> >>
> >> Best regards,
> >> Gustavo
> >>
> >>> Best regards,
> >>> Martin
> >>>
> >>>
> >>> -----Original Message-----
> >>> From: Gustavo Romero [mailto:gromero at linux.vnet.ibm.com]
> >>> Sent: Freitag, 13. Juli 2018 16:56
> >>> To: Doerr, Martin; Lindenmaier, Goetz; hotspot-compiler-dev at openjdk.java.net
> >>> Cc: ppc-aix-port-dev at openjdk.java.net
> >>> Subject: Re: RFR(s): 8205582: PPC64: RTM: Fix counter for aborts on nested transactions
> >>>
> >>> Hi Martin,
> >>>
> >>> On 07/12/2018 12:30 PM, Doerr, Martin wrote:
> >>>> I think your new code may increment a counter twice when 2 bits are specified for it. I think this should get fixed.
> >>>> Iterating over the RTMLockingCounters in the outer loop and performing several checks before incrementing should fix this, right?
> >>>
> >>> Do you mean increment twice counter #2 (conflict counter) or another counter?
> >>> non_trans_cf and trans_cf are mutually exclusive.
> >>> You probably spotted a case I'm missing so I ask.
> >>>
> >>>
> >>>> Besides that, I have some improvement proposals:
> >>>>
> >>>> - I think using 3 constant tables is not so good to read. For example, inverting could be encoded in your new 2 dimensional table by using -1, 0, +1 when using int instead of bool. Would this be better?
> >>>>
> >>>> - I think you can get rid of rtm_counters_Reg increment and restoration by computing the abort_offs relative to the original value.
> >>>
> >>> Cool :)
> >>>
> >>> New (interim) webrev:
> >>> http://cr.openjdk.java.net/~gromero/8205582/v3/
> >>>
> >>> Thanks.
> >>>
> >>>
> >>> Best regards,
> >>> Gustavo
> >>>
> >>>> Best regards,
> >>>> Martin
> >>>>
> >>>>
> >>>> -----Original Message-----
> >>>> From: Gustavo Romero [mailto:gromero at linux.vnet.ibm.com]
> >>>> Sent: Dienstag, 10. Juli 2018 18:59
> >>>> To: Doerr, Martin; Lindenmaier, Goetz; hotspot-compiler-dev at openjdk.java.net
> >>>> Cc: ppc-aix-port-dev at openjdk.java.net
> >>>> Subject: Re: RFR(s): 8205582: PPC64: RTM: Fix counter for aborts on nested transactions
> >>>>
> >>>> Dear Martin,
> >>>>
> >>>> On 06/25/2018 01:31 PM, Doerr, Martin wrote:
> >>>>> I think it would be better to ignore bit 63 in the macroAssembler code and use the definition from the spec in assembler_ppc.hpp.
> >>>>> Somebody may want to use the definition for other purposes.
> >>>>
> >>>> Done. transactional_level bit is 52 now so bits 52:62 can easily be extracted
> >>>> using 'rldicr' in macroAssembler.
> >>>>
> >>>>
> >>>>> I wonder if Assembler::tm_trans_cf | Assembler::tm_non_trans_cf would be a better match for x86's description for tm_failure_bit[2]. It's also a little unfortunate to print the same bit twice as tm_failure_bit[4].
> >>>>
> >>>> Done. Now both tm_trans_cf and tm_non_trans_cf failures will increment counter 2
> >>>> (conflict).
> >>>> Duplicated check code for tm_failure_bit[4] was removed and now
> >>>> counter 4 (debug) is mapped to count traps or syscalls caught in TM events,
> >>>> which seems a reasonable approximation to the original semantics of the debug
> >>>> counter on Intel. Unfortunately I could not confirm on AIX how these two events
> >>>> (trap and syscall in TM) will set the failure code, so the counter will never
> >>>> track any information on AIX. But with the current proposed change that failure
> >>>> code can be easily added in the future.
> >>>>
> >>>> I also realized that I previously used a wrong ME operand value in:
> >>>>
> >>>> +        // Extract 11 bits
> >>>> +        rldicr_(temp_Reg, abort_status, tm_failure_bit[i], 11);
> >>>>
> >>>> It should be 10 to extract 11 bits actually, so all extractions must be correct
> >>>> now.
> >>>>
> >>>>
> >>>> I hope you don't find the array map of failure bits vs counters overkill.
> >>>>
> >>>> Finally, I replaced the comment:
> >>>>
> >>>> tm_tabort, // Note: Seems like signal handler sets this, too.
> >>>>
> >>>> by:
> >>>>
> >>>> tm_tabort, // Signal handler will set this too.
> >>>>
> >>>> because we just enable RTM support on Power if 'htm-nosc' is supported, so
> >>>> treclaim. on aborting the syscall will indeed always set Abort bit in TEXASR
> >>>> afaik. Since debug counter now tracks trap/syscall in TM it's possible to check
> >>>> that counter to verify the number of aborts caused by the kernel (if any).
> >>>>
> >>>> new webrev: http://cr.openjdk.java.net/~gromero/8205582/v2/
> >>>>
> >>>>
> >>>> Best regards,
> >>>> Gustavo
> >>>>
> >>>>> Best regards,
> >>>>> Martin
> >>>>>
> >>>>>
> >>>>> -----Original Message-----
> >>>>> From: Gustavo Romero [mailto:gromero at linux.vnet.ibm.com]
> >>>>> Sent: Montag, 25.
Juni 2018 10:19
> >>>>> To: Lindenmaier, Goetz; Doerr, Martin; hotspot-compiler-dev at openjdk.java.net
> >>>>> Cc: ppc-aix-port-dev at openjdk.java.net
> >>>>> Subject: RFR(s): 8205582: PPC64: RTM: Fix counter for aborts on nested transactions
> >>>>>
> >>>>> Hi,
> >>>>>
> >>>>> Could the following change be reviewed please?
> >>>>>
> >>>>> bug   : https://bugs.openjdk.java.net/browse/JDK-8205582
> >>>>> webrev: http://cr.openjdk.java.net/~gromero/8205582/v1/
> >>>>>
> >>>>> It fixes the RTM counter for nested aborts (rtm lock aborts type 5) by
> >>>>> extracting and checking bits in the Transactional Level field of TEXASR
> >>>>> register.
> >>>>>
> >>>>> It also fixes the memory conflict counter (rtm lock aborts type 2). Power TM
> >>>>> status register supports two bits to inform two different types of memory
> >>>>> conflict between threads: non-transactional and transactional. According to how
> >>>>> the jtreg RTM tests are designed the memory conflict counter counts
> >>>>> non-transactional conflicts: on TestPrintPreciseRTMLockingStatistics a RTM lock
> >>>>> is held on a static variable while another thread without any synchronization
> >>>>> (non-transactional) tries to modify the same variable. Hence that small
> >>>>> adjustment satisfies the TestPrintPreciseRTMLockingStatistics making it pass on
> >>>>> Power. The memory conflict counter is not used in any other place besides by the
> >>>>> RTM precise statistics (no decision is made by the JVM based on that amount).
> >>>>>
> >>>>> This change partially fixes some failures in
> >>>>> compiler/rtm/print/TestPrintPreciseRTMLockingStatistics.java regarding the
> >>>>> nested and memory conflict abort counters. The remaining issue will be fixed by
> >>>>> aborting on calling JNI (next RFR).
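The bit extraction this thread discusses for the nested-abort counter can be illustrated outside the assembler. The following is only a sketch in plain Java, not the HotSpot code: the class and method names are hypothetical, and it assumes (as the thread suggests) that bits 52:62 of TEXASR, in IBM numbering where bit 0 is the most significant bit, are non-zero for an abort in a nested transaction, while bit 63 is ignored.

```java
public class TexasrNesting {
    // Emulates the PPC64 'rldicr RA,RS,SH,ME' instruction: rotate RS left
    // by SH bits, then keep bits 0..ME (IBM numbering, bit 0 is the MSB)
    // and clear everything to the right of bit ME.
    static long rldicr(long rs, int sh, int me) {
        long mask = -1L << (63 - me);
        return Long.rotateLeft(rs, sh) & mask;
    }

    // Hypothetical helper: true if any of TEXASR bits 52:62 is set.
    // SH=52 moves bit 52 to position 0; ME=10 keeps 11 bits (52..62).
    static boolean isNestedAbort(long texasr) {
        return rldicr(texasr, 52, 10) != 0;
    }

    public static void main(String[] args) {
        // Only IBM bit 63 set (the least significant bit): ignored.
        System.out.println(isNestedAbort(1L));
        // IBM bit 62 set: counted as nested.
        System.out.println(isNestedAbort(2L));
    }
}
```

With ME=11 instead of 10 the mask would keep 12 bits and bit 63 would no longer be ignored, which matches the ME correction mentioned in the thread.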
> >>>>>
> >>>>>
> >>>>> Thank you and best regards,
> >>>>> Gustavo
> >>>>>
> >>>>
> >>>
> >>
> >

From enasser at in.ibm.com Wed Jul 18 14:03:48 2018
From: enasser at in.ibm.com (Nasser Ebrahim)
Date: Wed, 18 Jul 2018 14:03:48 +0000
Subject: Filed 8207404: MulticastSocket tests failing on Aix
Message-ID:

Lindenmaier, Goetz goetz.lindenmaier at sap.com
Tue Jul 17 14:40:29 UTC 2018

> is there anyone at IBM interested in looking at this issue?
> https://bugs.openjdk.java.net/browse/JDK-8207404

Yes Goetz. I will analyze the issue and get back to you.

Thank you,
Nasser Ebrahim

From goetz.lindenmaier at sap.com Wed Jul 18 14:07:39 2018
From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz)
Date: Wed, 18 Jul 2018 14:07:39 +0000
Subject: Filed 8207404: MulticastSocket tests failing on Aix
In-Reply-To:
References:
Message-ID:

Hi Nasser,

that's great, thanks a lot!

Best regards,
  Goetz.

From: ppc-aix-port-dev [mailto:ppc-aix-port-dev-bounces at openjdk.java.net] On Behalf Of Nasser Ebrahim
Sent: Mittwoch, 18. Juli 2018 16:04
To: ppc-aix-port-dev at openjdk.java.net
Subject: Filed 8207404: MulticastSocket tests failing on Aix

Lindenmaier, Goetz goetz.lindenmaier at sap.com
Tue Jul 17 14:40:29 UTC 2018

> is there anyone at IBM interested in looking at this issue?
> https://bugs.openjdk.java.net/browse/JDK-8207404

Yes Goetz. I will analyze the issue and get back to you.

Thank you,
Nasser Ebrahim

From gromero at linux.vnet.ibm.com Wed Jul 18 14:36:48 2018
From: gromero at linux.vnet.ibm.com (Gustavo Romero)
Date: Wed, 18 Jul 2018 11:36:48 -0300
Subject: RFR(s): 8205581: PPC64: RTM: Fix abort on native calls
In-Reply-To: <52a75406da49472ea22039cf846fa64f@sap.com>
References: <86a9e4085b424a58a28436415786e312@sap.com>
 <7c133789-e13a-d675-17ad-8338fc19c9ce@linux.vnet.ibm.com>
 <52a75406da49472ea22039cf846fa64f@sap.com>
Message-ID:

Thanks, Goetz. I'll push it to jdk/jdk11 today.

Best regards,
Gustavo

On 07/17/2018 12:06 PM, Lindenmaier, Goetz wrote:
> Hi Gustavo,
>
> change looks good, Reviewed.
>
> Best regards,
> Goetz.
>
>> -----Original Message-----
>> From: Gustavo Romero [mailto:gromero at linux.vnet.ibm.com]
>> Sent: Dienstag, 17. Juli 2018 15:44
>> To: Lindenmaier, Goetz
>> Cc: Doerr, Martin; hotspot-compiler-dev at openjdk.java.net; ppc-aix-port-dev at openjdk.java.net
>> Subject: Re: RFR(s): 8205581: PPC64: RTM: Fix abort on native calls
>>
>> Hi,
>>
>> Could I get a second review for that change please?
>>
>> Best regards,
>> Gustavo
>>
>> On 06/26/2018 09:54 AM, Doerr, Martin wrote:
>>> Hi Gustavo,
>>>
>>> thanks for the update. Looks good to me.
>>>
>>> Best regards,
>>> Martin
>>>
>>>
>>> -----Original Message-----
>>> From: Gustavo Romero [mailto:gromero at linux.vnet.ibm.com]
>>> Sent: Dienstag, 26. Juni 2018 14:41
>>> To: Doerr, Martin; Lindenmaier, Goetz; hotspot-compiler-dev at openjdk.java.net
>>> Cc: ppc-aix-port-dev at openjdk.java.net
>>> Subject: Re: RFR(s): 8205581: PPC64: RTM: Fix abort on native calls
>>>
>>> Hi Martin,
>>>
>>> Thanks for the quick review!
>>>
>>> On 06/25/2018 01:37 PM, Doerr, Martin wrote:
>>>> I wonder why you placed the tabort so late in generate_native_wrapper. I'd put it at the Verified Entry Point.
>>>
>>> Actually for no particular reason.
>>>
>>> So previously:
>>>
>>> 2247 [Verified Entry Point]
>>> 2248 0x00007fff9c816520: mfcr r22
>>> 2249 0x00007fff9c816524: std r22,8(r1)
>>> 2250 0x00007fff9c816528: mflr r22
>>> 2251 0x00007fff9c81652c: std r22,16(r1)
>>> 2252 0x00007fff9c816530: addis r11,r1,-2
>>> 2253 0x00007fff9c816534: std r0,0(r11)
>>> 2254 0x00007fff9c816538: mr r21,r1
>>> 2255 0x00007fff9c81653c: stdu r1,-176(r1)
>>> 2256 0x00007fff9c816540: std r3,96(r1)
>>> 2257 0x00007fff9c816544: addi r4,r1,96
>>> 2258 0x00007fff9c816548: cmpdi r3,0
>>> 2259 0x00007fff9c81654c: bne- 0x00007fff9c816554
>>> 2260 0x00007fff9c816550: li r4,0
>>> 2261 0x00007fff9c816554: addi r3,r16,824 ; ImmutableOopMap{[96]=Oop }
>>> 2262 0x00007fff9c816558: addis r28,r29,7
>>> 2263 0x00007fff9c81655c: addi r28,r28,25944 ; {internal_word}
>>> 2264 0x00007fff9c816560: tabort. r0 <==
>>> ...
>>>
>>> Now:
>>>
>>> 2169 [Verified Entry Point]
>>> 2170 0x00007fff78816320: tabort. r0 <==
>>> 2171 0x00007fff78816324: mfcr r22
>>> 2172 0x00007fff78816328: std r22,8(r1)
>>> 2173 0x00007fff7881632c: mflr r22
>>> 2174 0x00007fff78816330: std r22,16(r1)
>>> 2175 0x00007fff78816334: addis r11,r1,-2
>>> 2176 0x00007fff78816338: std r0,0(r11)
>>> 2177 0x00007fff7881633c: mr r21,r1
>>> 2178 0x00007fff78816340: stdu r1,-176(r1)
>>> 2179 0x00007fff78816344: std r3,96(r1)
>>> 2180 0x00007fff78816348: addi r4,r1,96
>>> 2181 0x00007fff7881634c: cmpdi r3,0
>>> 2182 0x00007fff78816350: bne- 0x00007fff78816358
>>> 2183 0x00007fff78816354: li r4,0
>>> 2184 0x00007fff78816358: addi r3,r16,824 ; ImmutableOopMap{[96]=Oop }
>>> 2185 0x00007fff7881635c: addis r28,r29,7
>>> 2186 0x00007fff78816360: addi r28,r28,25436 ; {internal_word}
>>> ...
>>>
>>> Yep, it's better to abort sooner.
>>>
>>> new webrev: http://cr.openjdk.java.net/~gromero/8205581/v2/
>>>
>>>
>>> Best regards,
>>> Gustavo
>>>
>>>> Besides that, it looks good to me.
>>>>
>>>> Thanks,
>>>> Martin
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: Gustavo Romero [mailto:gromero at linux.vnet.ibm.com]
>>>> Sent: Montag, 25. Juni 2018 10:21
>>>> To: Lindenmaier, Goetz; Doerr, Martin; hotspot-compiler-dev at openjdk.java.net
>>>> Cc: ppc-aix-port-dev at openjdk.java.net
>>>> Subject: RFR(s): 8205581: PPC64: RTM: Fix abort on native calls
>>>>
>>>> Hi,
>>>>
>>>> Could the following change be reviewed please?
>>>>
>>>> bug   : https://bugs.openjdk.java.net/browse/JDK-8205581
>>>> webrev: http://cr.openjdk.java.net/~gromero/8205581/v1/
>>>>
>>>> It forces a transactional state to abort before calling native methods, before
>>>> calling runtime, and on uncommon trap checking, mostly because transaction will
>>>> be aborted sooner or later in either case, similarly to what happens on Intel.
>>>> The abort instruction (tabort) is only emitted if UseRTMLocking is "true" and
>>>> any 'tabort' instruction is treated as a 'nop' instruction if TM state is
>>>> non-transactional.
>>>>
>>>> It fixes the following tests:
>>>>
>>>> +Passed: compiler/rtm/print/TestPrintPreciseRTMLockingStatistics.java
>>>> +Passed: compiler/rtm/locking/TestRTMDeoptOnHighAbortRatio.java
>>>>
>>>>
>>>> Thank you and best regards,
>>>> Gustavo
>>>>
>>>
>

From gromero at linux.vnet.ibm.com Wed Jul 18 14:38:05 2018
From: gromero at linux.vnet.ibm.com (Gustavo Romero)
Date: Wed, 18 Jul 2018 11:38:05 -0300
Subject: RFR(s): 8205580: PPC64: RTM: Don't retry lock on abort if abort was intentional
In-Reply-To:
References: <56e66a89-42a7-eb72-05a2-a97c696379c9@linux.vnet.ibm.com>
 <33106239-dc48-0599-3b20-66d93f27463e@linux.vnet.ibm.com>
Message-ID: <17fe991d-ea35-e423-d7fe-0a9501072116@linux.vnet.ibm.com>

On 07/17/2018 12:19 PM, Lindenmaier, Goetz wrote:
> Hi Gustavo,
>
> the change looks good, Reviewed.
>
> Small nit, don't need a new webrev:
> + // used in the JVM. Thus mostly (B) a Nesting Overflows or (C) a Footprint
> 'Overflows' should be singular I think.

Thanks, Goetz. I'll push it to jdk/jdk11 today.

Best regards,
Gustavo

> Best regards,
> Goetz.
>
>
>> -----Original Message-----
>> From: Gustavo Romero [mailto:gromero at linux.vnet.ibm.com]
>> Sent: Dienstag, 17. Juli 2018 15:44
>> To: Lindenmaier, Goetz
>> Cc: Doerr, Martin; hotspot-compiler-dev at openjdk.java.net; ppc-aix-port-dev at openjdk.java.net
>> Subject: Re: RFR(s): 8205580: PPC64: RTM: Don't retry lock on abort if abort was intentional
>>
>> Hi,
>>
>> Could I get a second review for that change please?
>> It's already reviewed by Martin.
>>
>> Best regards,
>> Gustavo
>>
>> On 07/02/2018 11:03 AM, Gustavo Romero wrote:
>>> Hi Martin,
>>>
>>> On 07/02/2018 10:20 AM, Doerr, Martin wrote:
>>>> I meant retrying "on abort", not "on busy". There are different counters for these two retry functions.
>>>> RTM for Stack Locks only supports "on abort".
>>>> Inflated RTM locking supports both using both counters.
>>>
>>> Yup, I meant "on abort" too, but I missed your point regarding the retry
>>> "on abort" in the RTM for Stack Locks case.
>>>
>>>
>>>> But I see that x86 uses the same behavior as you when using -XX:-UseRTMXendForLockBusy.
>>>> I think it's not so good to treat "abort instruction on lock busy" as permanent abort reason.
>>>> So the behavior is fine with UseRTMXendForLockBusy, but not without it.
>>>
>>> hmm I see. So you think it's also not fine on x64, right? In general I think
>>> it's not good to finish a RTM atomic block with `xabort/tabort`, but maybe it
>>> makes more sense on x86. Luckily -XX:+UseRTMXendForLockBusy is the default.
>>>
>>>
>>>> But I can live with your change because it only has a negative effect on an unsupported experimental option. And I think your change is fine for the other usages of tabort().
>>>
>>> OK. I'll revisit that when doing additional performance tests with RTM for Stack
>>> Locks.
>>>
>>> Thanks!
>>>
>>>
>>> Best regards,
>>> Gustavo
>>>
>>>> Best regards,
>>>> Martin
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: Gustavo Romero [mailto:gromero at linux.vnet.ibm.com]
>>>> Sent: Dienstag, 26. Juni 2018 18:01
>>>> To: Doerr, Martin; Lindenmaier, Goetz; hotspot-compiler-dev at openjdk.java.net
>>>> Cc: ppc-aix-port-dev at openjdk.java.net
>>>> Subject: Re: RFR(s): 8205580: PPC64: RTM: Don't retry lock on abort if abort was intentional
>>>>
>>>> Hi Martin,
>>>>
>>>> On 06/25/2018 01:49 PM, Doerr, Martin wrote:
>>>>> Looks good for the case UseRTMXendForLockBusy is active (which is default).
>>>>
>>>> I did all the tests focusing on when it's deactivated (-UseRTMXendForLockBusy).
>>>> This is also the flag passed in jtreg tests since if it's active there are no
>>>> aborts caused by 'tabort or xabort' and so no abort statistics (related to
>>>> that event, which is used by the jtreg tests).
>>>>
>>>>
>>>>> If this flag is deactivated, we use tabort if we see the object locked so your change prevents retrying the transaction in this case.
>>>>> I guess this was not intended?
>>>>
>>>> I think that rtm_retry_lock_on_abort() is a misleading name, it should be
>>>> something like rtm_retry_lock_on_conflict(), since the purpose of this
>>>> function is to not retry if the abort is caused by a tabort/xabort, in my
>>>> understanding.
>>>>
>>>> On Intel that function checks for bit 1 (0x2 mask) and if it is set the operation
>>>> is retried. But bit 1 being set implies that the transaction didn't abort due to
>>>> xabort, otherwise that bit would be clear as:
>>>>
>>>>   77 //   0   Set if abort caused by XABORT instruction.
>>>>   78 //   1   If set, the transaction may succeed on a retry.
>>>>               This bit is always clear if bit 0 is set (or is always clear if abort is caused by XABORT)
>>>>
>>>> That's why filtering on Power by the "Abort" bit in TEXASR makes the number of
>>>> aborts behave like on x64. If we don't filter aborts caused by tabort we find the
>>>> pattern of X*2+1 retries, because both rtm_retry_lock_on_abort() and
>>>> rtm_retry_lock_on_busy() will retry the operation RTMRetryCount times.
>>>>
>>>> My change won't prevent retrying because after rtm_retry_lock_on_abort(), if
>>>> cmpxchgd() does not succeed it calls rtm_retry_lock_on_busy(), which in turn
>>>> will also retry the operation based on the value specified by RTMRetryCount.
>>>>
>>>> I prepared a simple test-case where UseRTMXendForLockBusy is deactivated to show
>>>> that if we increase the number of RTMRetryCount even with that flag deactivated
>>>> the operation is retried exactly RTMRetryCount+1 times after the fix, like on
>>>> Intel:
>>>>
>>>> https://github.com/gromero/retry
>>>>
>>>> You just need to clone and run it pointing to a build dir:
>>>>
>>>> $ git clone https://github.com/gromero/retry && cd retry
>>>> $ ./retry
>>>>
>>>> You have to build the WhiteBox lib through "make build-test-lib" before running
>>>> it.
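The retry arithmetic described above can be written down as a small model. This is only an illustrative sketch, not JVM code: the class and method names are hypothetical, and it assumes the scenario of this thread, where -XX:-UseRTMXendForLockBusy is used, the lock stays busy, and every attempt ends in an abort caused by 'tabort'.

```java
public class RtmAbortModel {
    // Before the fix on Power: the first failure, then RTMRetryCount
    // retries in rtm_retry_lock_on_abort(), then RTMRetryCount retries
    // in rtm_retry_lock_on_busy(), i.e. X*2+1 aborts.
    static int abortsBeforeFix(int retryCount) {
        return 1 + retryCount + retryCount;
    }

    // After the fix (and on Intel): an abort caused by 'tabort' is not
    // retried by rtm_retry_lock_on_abort(), so only the first failure and
    // the rtm_retry_lock_on_busy() retries are counted, i.e. X+1 aborts.
    static int abortsAfterFix(int retryCount) {
        return 1 + retryCount;
    }

    public static void main(String[] args) {
        for (int x = 1; x <= 2; x++) {
            System.out.println("RTMRetryCount=" + x
                + ": before=" + abortsBeforeFix(x)
                + ", after=" + abortsAfterFix(x));
        }
    }
}
```

For RTMRetryCount=1 and RTMRetryCount=2 the model gives 3 and 5 aborts before the fix and 2 and 3 after it, which are the "rtm lock aborts" values reported in the test runs quoted in this thread.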
>>>> >>>> So for RTMRetryCount=1 and RTMRetryCount=2 w/ - >> UseRTMXendForLockBusy before the >>>> change: >>>> >>>> gromero at gromero16:~/git/retry$ ./retry.sh >> /home/gromero/hg/jdk/jdk/build/linux-ppc64le-normal-server-release 1 >>>> ++ /home/gromero/hg/jdk/jdk/build/linux-ppc64le-normal-server- >> release/jdk/bin/javac -cp /home/gromero/hg/jdk/jdk/build/linux-ppc64le- >> normal-server-release/support/test/lib/wb.jar --add-exports >> java.base/jdk.internal.misc=ALL-UNNAMED retry.java >>>> ++ LD_LIBRARY_PATH=/home/gromero/hg/jdk/jdk/build/linux-ppc64le- >> normal-server-release/jdk/../..//src/utils/hsdis/build/linux-ppc64le >>>> ++ /home/gromero/hg/jdk/jdk/build/linux-ppc64le-normal-server- >> release/jdk/bin/java - >> Xbootclasspath/a:/home/gromero/hg/jdk/jdk/build/linux-ppc64le-normal- >> server-release/support/test/lib/wb.jar -XX:+UnlockExperimentalVMOptions >> -XX:+UnlockDiagnosticVMOptions -XX:+WhiteBoxAPI -XX:+UseRTMLocking - >> XX:+PrintPreciseRTMLockingStatistics -XX:-TieredCompilation -Xcomp -XX:- >> UseRTMXendForLockBusy -XX:RTMTotalCountIncrRate=1 - >> XX:RTMRetryCount=1 -XX:CompileOnly=RTM.syncAndTest --add-exports >> java.base/jdk.internal.misc=ALL-UNNAMED retry >>>> Creating thread0... >>>> Trying to inflate lock... >>>> Is monitor inflated? Yes >>>> Entering thread to sleep... >>>> RTM.syncAndTest at 26 >>>> # rtm locks total (estimated): 3 >>>> # rtm lock aborts? 
: 3 >>>> # rtm lock aborts 0: 3 >>>> # rtm lock aborts 1: 3 >>>> # rtm lock aborts 2: 0 >>>> # rtm lock aborts 3: 0 >>>> # rtm lock aborts 4: 0 >>>> # rtm lock aborts 5: 0 >>>> ++ set +x >>>> gromero at gromero16:~/git/retry$ ./retry.sh >> /home/gromero/hg/jdk/jdk/build/linux-ppc64le-normal-server-release 2 >>>> ++ /home/gromero/hg/jdk/jdk/build/linux-ppc64le-normal-server- >> release/jdk/bin/javac -cp /home/gromero/hg/jdk/jdk/build/linux-ppc64le- >> normal-server-release/support/test/lib/wb.jar --add-exports >> java.base/jdk.internal.misc=ALL-UNNAMED retry.java >>>> ++ LD_LIBRARY_PATH=/home/gromero/hg/jdk/jdk/build/linux-ppc64le- >> normal-server-release/jdk/../..//src/utils/hsdis/build/linux-ppc64le >>>> ++ /home/gromero/hg/jdk/jdk/build/linux-ppc64le-normal-server- >> release/jdk/bin/java - >> Xbootclasspath/a:/home/gromero/hg/jdk/jdk/build/linux-ppc64le-normal- >> server-release/support/test/lib/wb.jar -XX:+UnlockExperimentalVMOptions >> -XX:+UnlockDiagnosticVMOptions -XX:+WhiteBoxAPI -XX:+UseRTMLocking - >> XX:+PrintPreciseRTMLockingStatistics -XX:-TieredCompilation -Xcomp -XX:- >> UseRTMXendForLockBusy -XX:RTMTotalCountIncrRate=1 - >> XX:RTMRetryCount=2 -XX:CompileOnly=RTM.syncAndTest --add-exports >> java.base/jdk.internal.misc=ALL-UNNAMED retry >>>> Creating thread0... >>>> Trying to inflate lock... >>>> Is monitor inflated? Yes >>>> Entering thread to sleep... >>>> RTM.syncAndTest at 26 >>>> # rtm locks total (estimated): 5 >>>> # rtm lock aborts? 
: 5 >>>> # rtm lock aborts 0: 5 >>>> # rtm lock aborts 1: 5 >>>> # rtm lock aborts 2: 0 >>>> # rtm lock aborts 3: 0 >>>> # rtm lock aborts 4: 0 >>>> # rtm lock aborts 5: 0 >>>> ++ set +x >>>> >>>> >>>> and after the change: >>>> >>>> gromero at gromero16:~/git/retry$ ./retry.sh >> /home/gromero/hg/jdk/jdk/build/linux-ppc64le-normal-server-release 1 >>>> ++ /home/gromero/hg/jdk/jdk/build/linux-ppc64le-normal-server- >> release/jdk/bin/javac -cp /home/gromero/hg/jdk/jdk/build/linux-ppc64le- >> normal-server-release/support/test/lib/wb.jar --add-exports >> java.base/jdk.internal.misc=ALL-UNNAMED retry.java >>>> ++ LD_LIBRARY_PATH=/home/gromero/hg/jdk/jdk/build/linux-ppc64le- >> normal-server-release/jdk/../..//src/utils/hsdis/build/linux-ppc64le >>>> ++ /home/gromero/hg/jdk/jdk/build/linux-ppc64le-normal-server- >> release/jdk/bin/java - >> Xbootclasspath/a:/home/gromero/hg/jdk/jdk/build/linux-ppc64le-normal- >> server-release/support/test/lib/wb.jar -XX:+UnlockExperimentalVMOptions >> -XX:+UnlockDiagnosticVMOptions -XX:+WhiteBoxAPI -XX:+UseRTMLocking - >> XX:+PrintPreciseRTMLockingStatistics -XX:-TieredCompilation -Xcomp -XX:- >> UseRTMXendForLockBusy -XX:RTMTotalCountIncrRate=1 - >> XX:RTMRetryCount=1 -XX:CompileOnly=RTM.syncAndTest --add-exports >> java.base/jdk.internal.misc=ALL-UNNAMED retry >>>> Creating thread0... >>>> Trying to inflate lock... >>>> Is monitor inflated? Yes >>>> Entering thread to sleep... >>>> RTM.syncAndTest at 26 >>>> # rtm locks total (estimated): 2 >>>> # rtm lock aborts? 
: 2 >>>> # rtm lock aborts 0: 2 >>>> # rtm lock aborts 1: 2 >>>> # rtm lock aborts 2: 0 >>>> # rtm lock aborts 3: 0 >>>> # rtm lock aborts 4: 0 >>>> # rtm lock aborts 5: 0 >>>> ++ set +x >>>> gromero at gromero16:~/git/retry$ ./retry.sh >> /home/gromero/hg/jdk/jdk/build/linux-ppc64le-normal-server-release 2 >>>> ++ /home/gromero/hg/jdk/jdk/build/linux-ppc64le-normal-server- >> release/jdk/bin/javac -cp /home/gromero/hg/jdk/jdk/build/linux-ppc64le- >> normal-server-release/support/test/lib/wb.jar --add-exports >> java.base/jdk.internal.misc=ALL-UNNAMED retry.java >>>> ++ LD_LIBRARY_PATH=/home/gromero/hg/jdk/jdk/build/linux-ppc64le- >> normal-server-release/jdk/../..//src/utils/hsdis/build/linux-ppc64le >>>> ++ /home/gromero/hg/jdk/jdk/build/linux-ppc64le-normal-server- >> release/jdk/bin/java - >> Xbootclasspath/a:/home/gromero/hg/jdk/jdk/build/linux-ppc64le-normal- >> server-release/support/test/lib/wb.jar -XX:+UnlockExperimentalVMOptions >> -XX:+UnlockDiagnosticVMOptions -XX:+WhiteBoxAPI -XX:+UseRTMLocking - >> XX:+PrintPreciseRTMLockingStatistics -XX:-TieredCompilation -Xcomp -XX:- >> UseRTMXendForLockBusy -XX:RTMTotalCountIncrRate=1 - >> XX:RTMRetryCount=2 -XX:CompileOnly=RTM.syncAndTest --add-exports >> java.base/jdk.internal.misc=ALL-UNNAMED retry >>>> Creating thread0... >>>> Trying to inflate lock... >>>> Is monitor inflated? Yes >>>> Entering thread to sleep... >>>> RTM.syncAndTest at 26 >>>> # rtm locks total (estimated): 3 >>>> # rtm lock aborts? : 3 >>>> # rtm lock aborts 0: 3 >>>> # rtm lock aborts 1: 3 >>>> # rtm lock aborts 2: 0 >>>> # rtm lock aborts 3: 0 >>>> # rtm lock aborts 4: 0 >>>> # rtm lock aborts 5: 0 >>>> ++ set +x >>>> >>>> >>>> Best regards, >>>> Gustavo >>>> >>>>> Thanks and best regards, >>>>> Martin >>>>> >>>>> >>>>> -----Original Message----- >>>>> From: Gustavo Romero [mailto:gromero at linux.vnet.ibm.com] >>>>> Sent: Montag, 25. 
Juni 2018 10:24 >>>>> To: Lindenmaier, Goetz ; Doerr, Martin >> ; hotspot-compiler-dev at openjdk.java.net >>>>> Cc: ppc-aix-port-dev at openjdk.java.net >>>>> Subject: RFR(s): 8205580: PPC64: RTM: Don't retry lock on abort if abort >> was intentional >>>>> >>>>> Hi, >>>>> >>>>> Could the following change be reviewed please? >>>>> >>>>> bug?? : https://bugs.openjdk.java.net/browse/JDK-8205580 >>>>> webrev: http://cr.openjdk.java.net/~gromero/8205580/v1/ >>>>> >>>>> It changes the behavior of rtm_retry_lock_on_abort() by avoiding retry >> if abort >>>>> was a deliberate abort, i.e. caused by a 'tabort r0' instruction. >>>>> >>>>> On Intel bit 1 in abort_status_Reg (which communicates the abort >> status) is >>>>> always clear when a 'xabort 0' instruction is executed in order to inform >> that a >>>>> transactional retry /can not/ succeed on retry. So >> rtm_retry_lock_on_abort() on >>>>> Intel, on finding bit 1 clear in abort_status_Reg, skips the retry (don't >>>>> retry). >>>>> >>>>> Currently on Power rtm_retry_lock_on_abort() is just checking the >> persistent bit >>>>> (if set => skip) which /is not set/ by 'tabort r0'. Hence >>>>> rtm_retry_lock_on_abort() does retry to lock on an intentional abort >> caused by >>>>> 'tabort'. It leads, for instance when -XX:RTMRetryCount=1, to the >> following >>>>> discrepancy between Intel and Power regarding the number of >> retries/aborts: >>>>> >>>>> [Power] >>>>> # rtm locks total (estimated): 3 >>>>> # rtm lock aborts? : 3 >>>>> # rtm lock aborts 0: 3 >>>>> # rtm lock aborts 1: 3 >>>>> # rtm lock aborts 2: 0 >>>>> # rtm lock aborts 3: 0 >>>>> # rtm lock aborts 4: 0 >>>>> # rtm lock aborts 5: 0 >>>>> >>>>> [Intel] >>>>> # rtm locks total (estimated): 2 >>>>> # rtm lock aborts? 
: 2 >>>>> # rtm lock aborts 0: 2 >>>>> # rtm lock aborts 1: 2 >>>>> # rtm lock aborts 2: 0 >>>>> # rtm lock aborts 3: 0 >>>>> # rtm lock aborts 4: 0 >>>>> # rtm lock aborts 5: 0 >>>>> >>>>> So for -XX:RTMRetryCount=X: >>>>> on Power the number of aborts is: X*2+1 [1 first failure + 1 >> rtm_retry_lock_on_abort() + 1 rtm_retry_lock_on_busy()]; >>>>> on Intel the number of aborts is: X+1?? [1 first failure + 1 >> rtm_retry_lock_on_busy()] >>>>> >>>>> This change fixes that discrepancy by using bit "Abort" in TEXASR register >>>>> (abort_status_Reg) that tells if a transaction was aborted due to a >> 'tabort' >>>>> instruction and skip the retry if such a bit is set. >>>>> >>>>> It fixes the following tests: >>>>> >>>>> +Passed: compiler/rtm/locking/TestRTMRetryCount.java >>>>> +Passed: compiler/rtm/locking/TestRTMAbortThreshold.java >>>>> >>>>> >>>>> Thank you and best regards, >>>>> Gustavo >>>>> >>>> >>> > From gromero at linux.vnet.ibm.com Wed Jul 18 14:40:15 2018 From: gromero at linux.vnet.ibm.com (Gustavo Romero) Date: Wed, 18 Jul 2018 11:40:15 -0300 Subject: RFR(s): 8205582: PPC64: RTM: Fix counter for aborts on nested transactions In-Reply-To: <668ca44f5db64601a8c5c1bd863779f7@sap.com> References: <8aadb20a-fba7-0c52-04c0-015a731e60bd@linux.vnet.ibm.com> <571b4e4aa8d04c55a2c68d655aff4023@sap.com> <780da692-c439-2169-b007-3a7c07db5d9e@linux.vnet.ibm.com> <7f8b0534-baf1-9ddf-d4ef-e96a75b898ed@linux.vnet.ibm.com> <5b0e6dff18b74e69962b4b7e8256a17e@sap.com> <58772d79-69a3-0589-f1dd-80e33b834c2d@linux.vnet.ibm.com> <428b95df734446c285cdbccf1c2dc24d@sap.com> <4d26ba51-30c5-d230-8017-96b67d4a758f@linux.vnet.ibm.com> <668ca44f5db64601a8c5c1bd863779f7@sap.com> Message-ID: <3b461bf6-9b79-6f5c-d12f-cb786bb6a1b4@linux.vnet.ibm.com> On 07/18/2018 04:34 AM, Lindenmaier, Goetz wrote: > Hi Gustavo, > > I had a look at your change. Basically looks good. > > Some smaller things: > 2440 Otherwise, counter will be increment > 2441 // more than once. 
> This should read: > Otherwise, the counter will be incremented more than once. > > Can you please put declaration and assignments on one line? > 2472 int abortX_offs; > 2473 abortX_offs = RTMLockingCounters::abortX_count_offset(); > This should read: > 2472 int abortX_offs = RTMLockingCounters::abortX_count_offset(); > Similar 2479+2481. > > I don't need a new webrev for these fixes. Reviewed. Thanks, Goetz. I'll fix them and push the change to jdk/jdk11 today. Best regards, Gustavo > Best regards, > Goetz. > >> -----Original Message----- >> From: Gustavo Romero [mailto:gromero at linux.vnet.ibm.com] >> Sent: Dienstag, 17. Juli 2018 15:45 >> To: Lindenmaier, Goetz >> Cc: Doerr, Martin ; hotspot-compiler- >> dev at openjdk.java.net; ppc-aix-port-dev at openjdk.java.net >> Subject: Re: RFR(s): 8205582: PPC64: RTM: Fix counter for aborts on nested >> transactions >> >> Hi, >> >> Could I get a second review for that change please? >> >> Best regards, >> Gustavo >> >> On 07/17/2018 08:18 AM, Gustavo Romero wrote: >>> Hi Martin, >>> >>> OK. Let's continue with v4_B [1] so. >>> >>> Thanks for reviewing it! >>> >>> >>> Best regards, >>> Gustavo >>> >>> [1] http://cr.openjdk.java.net/~gromero/8205582/v4_B >>> >>> On 07/17/2018 06:02 AM, Doerr, Martin wrote: >>>> Hi Gustavo, >>>> >>>> your webrev v4_B looks good. Reviewed. >>>> >>>> I think v4_A wouldn't work appropriately when using +1 and -1 bits >> together. >>>> >>>> A generic version could be something like (just to explain what I was >> thinking about): >>>> for (int nbit = 0; nbit < num_failure_bits; nbit++) { >>>> ?? Label do_increment, check_abort; >>>> >>>> ?? int last_match = -1; >>>> ?? for (ncounter = 0; ncounter < num_counters; ncounter++) { >>>> ???? if (last_match >= 0) { >>>> ?????? rldicr_(temp_Reg, abort_status_R0, failure_bit[last_match], 0); >>>> ?????? int selection = bit_counter_map[last_match][ncounter]; >>>> ?????? if (selection == 1) { >>>> ???????? bne(CCR1, do_increment); >>>> ?????? 
} else if (selection == -1) { >>>> ???????? beq(CCR1, do_increment); >>>> ?????? } >>>> ???? } >>>> ???? last_match = nbit; >>>> ?? } >>>> >>>> ?? assert(last_match >= 0, "should have at least one"); >>>> ?? rldicr_(temp_Reg, abort_status_R0, failure_bit[last_match], 0); >>>> ?? int selection = bit_counter_map[last_match][ncounter]; >>>> ?? if (selection == 1) { >>>> ???? beq(CCR1, check_abort); >>>> ?? } else if (selection == -1) { >>>> ???? bne(CCR1, check_abort); >>>> ?? } >>>> ?? bind(do_increment); >>>> ?? ld(temp_Reg, abort_counter_offs, rtm_counters_Reg); >>>> ?? addi(temp_Reg, temp_Reg, 1); >>>> ?? std(temp_Reg, abort_counter_offs, rtm_counters_Reg); >>>> ?? bind(check_abort); >>>> } >>>> >>>> But I'm fine with webrev v4_B with the comment you have added. >>>> >>>> Thanks, >>>> Martin >>>> >>>> >>>> -----Original Message----- >>>> From: Gustavo Romero [mailto:gromero at linux.vnet.ibm.com] >>>> Sent: Dienstag, 17. Juli 2018 08:55 >>>> To: Doerr, Martin ; Lindenmaier, Goetz >> ; hotspot-compiler-dev at openjdk.java.net >>>> Cc: ppc-aix-port-dev at openjdk.java.net >>>> Subject: Re: RFR(s): 8205582: PPC64: RTM: Fix counter for aborts on >> nested transactions >>>> >>>> Hi Martin, >>>> >>>> On 07/16/2018 05:17 AM, Doerr, Martin wrote: >>>>> thanks for the new webrev. >>>> >>>> Thanks a lot for the thorough review. >>>> >>>> >>>>> You're right, the two bits should be mutual exclusive, so the sum is >> equivalent to the logical or in this case. >>>>> However, the loop pretends to be generic, but it's not. The "or" only >> works for mutual exclusive bits. >>>>> If you want to keep the code with the loops, I think there should be a >> comment explaining this. >>>> >>>> I see. I've experimented a couple of options following your previous >> suggestion >>>> of using the counter loop as the outer loop. Most difficult point was to >> work >>>> around the constraint of having just R0 available as scratch. 
Bit/bitfield >>>> extracting instrs did not help much since most of them can't perform an >> OR with >>>> its destination operand, which gets worse with the R0 constraint. On the >> other, >>>> afaics 'rldimi' which does an OR with its destination operand can't be used >> to >>>> extract the bit/bitfields in a generic way. That best alternative I found was >> to >>>> use both CCR0 and CCR1 and their EQ bits. I think all cases are covered >> now, >>>> i.e. all bits/conditions for a given counter are ORed. failure_code logic is >> not >>>> inverted any more in the map, which seems more natural. I added also >> more >>>> information to the comment before the bit/counter map. >>>> I think now the loop is generic. >>>> Webrev for the final result: >>>> >>>> http://cr.openjdk.java.net/~gromero/8205582/v4_A/ >>>> >>>> >>>>> Please also fix indentation. >>>> >>>> Done. >>>> >>>> >>>>> Alternatively, the loop could get replaced by some code for each bit. >> Given that the loop is not really generic, I think this wouldn't be worse. >>>>> >>>>> Besides that, it looks good. Thanks for improving it. >>>> >>>> Due to the tight schedule I also provide the corrections for the last >> reviewed >>>> version: >>>> >>>> http://cr.openjdk.java.net/~gromero/8205582/v4_B/ >>>> >>>> If v4_A also looks good I vouch for pushing it instead of v4_B. >>>> >>>> >>>> Best regards, >>>> Gustavo >>>> >>>>> Best regards, >>>>> Martin >>>>> >>>>> >>>>> -----Original Message----- >>>>> From: Gustavo Romero [mailto:gromero at linux.vnet.ibm.com] >>>>> Sent: Freitag, 13. Juli 2018 16:56 >>>>> To: Doerr, Martin ; Lindenmaier, Goetz >> ; hotspot-compiler-dev at openjdk.java.net >>>>> Cc: ppc-aix-port-dev at openjdk.java.net >>>>> Subject: Re: RFR(s): 8205582: PPC64: RTM: Fix counter for aborts on >> nested transactions >>>>> >>>>> Hi Martin, >>>>> >>>>> On 07/12/2018 12:30 PM, Doerr, Martin wrote: >>>>>> I think your new code may increment a counter twice when 2 bits are >> specified for it. 
I think this should get fixed. >>>>>> Iterating over the RTMLockingCounters in the outer loop and >> performing several checks before incrementing should fix this, right? >>>>> >>>>> Do you mean increment twice counter #2 (conflict counter) or another >> counter? >>>>> non_trans_cf and trans_cf are mutually exclusive. >>>>> You probably spotted a case I'm missing so I ask. >>>>> >>>>> >>>>>> Besides that, I have some improvement proposals: >>>>>> >>>>>> - I think using 3 constant tables is not so good to read. For example, >> inverting could be encoded in your new 2 dimensional table by using -1, 0 , +1 >> when using int instead of bool. Would this be better? >>>>>> >>>>>> - I think you can get rid of rtm_counters_Reg increment and restoration >> by computing the abort_offs relative to the original value. >>>>> >>>>> Cool :) >>>>> >>>>> New (interim) webrev: >>>>> http://cr.openjdk.java.net/~gromero/8205582/v3/ >>>>> >>>>> Thanks. >>>>> >>>>> >>>>> Best regards, >>>>> Gustavo >>>>> >>>>>> Best regards, >>>>>> Martin >>>>>> >>>>>> >>>>>> -----Original Message----- >>>>>> From: Gustavo Romero [mailto:gromero at linux.vnet.ibm.com] >>>>>> Sent: Dienstag, 10. Juli 2018 18:59 >>>>>> To: Doerr, Martin ; Lindenmaier, Goetz >> ; hotspot-compiler-dev at openjdk.java.net >>>>>> Cc: ppc-aix-port-dev at openjdk.java.net >>>>>> Subject: Re: RFR(s): 8205582: PPC64: RTM: Fix counter for aborts on >> nested transactions >>>>>> >>>>>> Dear Martin, >>>>>> >>>>>> On 06/25/2018 01:31 PM, Doerr, Martin wrote: >>>>>>> I think it would be better to ignore bit 63 in the macroAssembler code >> and use the definition from the spec in assembler_ppc.hpp. >>>>>>> Somebody may want to use the definition for other purposes. >>>>>> >>>>>> Done. transactional_level bit is 52 now so bits 52:62 can easily be >> extracted >>>>>> using 'rldicr' in macroAssembler. 
>>>>>> >>>>>> >>>>>>> I wonder if Assembler::tm_trans_cf | Assembler::tm_non_trans_cf >> would be a better match for x86's description for tm_failure_bit[2]. It's also a >> little unfortunate to print the same bit twice as tm_failure_bit[4]. >>>>>> >>>>>> Done. Now both tm_trans_cf and tm_non_trans_cf failures will >> increment counter 2 >>>>>> (conflict). Duplicated check code for tm_failure_bit[4] was removed >> and now >>>>>> counter 4 (debug) is mapped to count traps or syscalls caught in TM >> events, >>>>>> which seems a reasonable approximation to the original semantics of >> the debug >>>>>> counter on Intel. Unfortunately I could not confirm on AIX how these >> two events >>>>>> (trap and syscall in TM) will set the failure code, so the counter will >> never >>>>>> track any information on AIX. But with the current proposed change >> that failure >>>>>> code can be easily added in the future. >>>>>> >>>>>> I also realized that I used previously a wrong ME operand value in: >>>>>> >>>>>> +??????? // Extract 11 bits >>>>>> +??????? rldicr_(temp_Reg, abort_status, tm_failure_bit[i], 11); >>>>>> >>>>>> It should be 10 to extract 11 bits actually, so all extractions must be >> correct >>>>>> now. >>>>>> >>>>>> >>>>>> I hope you don't find the array map of failure bits vs counters >> overkilling. >>>>>> >>>>>> Finally, I replaced the comment: >>>>>> >>>>>> tm_tabort, // Note: Seems like signal handler sets this, too. >>>>>> >>>>>> by: >>>>>> >>>>>> tm_tabort, // Signal handler will set this too. >>>>>> >>>>>> because we just enable RTM support on Power if 'htm-nosc' is >> supported, so >>>>>> treclaim. on aborting the syscall will indeed always set Abort bit in >> TEXASR >>>>>> afaik. Since debug counter now tracks trap/syscall in TM it's possible to >> check >>>>>> that counter to verify the number of aborts cause by the kernel (if >> any). 
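[Illustrative note, not part of the webrev: a minimal Java sketch of the ME off-by-one described above. It assumes the documented semantics of 'rldicr' (rotate left by SH, then keep IBM bits 0..ME, where bit 0 is the most significant bit). The names RldicrSketch, rldicr and ibmBit are hypothetical helpers, and the shift amount 52 is taken from the "bits 52:62" remark earlier in the thread.]

```java
final class RldicrSketch {
    // Simulates the 64-bit result of PowerPC "rldicr RA,RS,SH,ME":
    // rotate RS left by SH bits, then keep only IBM bits 0..ME (bit 0 = MSB).
    static long rldicr(long rs, int sh, int me) {
        long rotated = Long.rotateLeft(rs, sh);
        long mask = -1L << (63 - me); // the top (me + 1) bits are set
        return rotated & mask;
    }

    // Returns a value with only IBM bit n set (bit 0 is the most significant bit).
    static long ibmBit(int n) {
        return 1L << (63 - n);
    }

    public static void main(String[] args) {
        int sh = 52; // assumed first bit of the field being extracted

        // ME = 10 keeps 11 bits after the rotate, i.e. original bits 52..62.
        System.out.println("bit 52, ME=10 kept: " + (rldicr(ibmBit(52), sh, 10) != 0));
        System.out.println("bit 62, ME=10 kept: " + (rldicr(ibmBit(62), sh, 10) != 0));
        System.out.println("bit 63, ME=10 kept: " + (rldicr(ibmBit(63), sh, 10) != 0));

        // ME = 11 keeps 12 bits, so original bit 63 leaks into the result.
        System.out.println("bit 63, ME=11 kept: " + (rldicr(ibmBit(63), sh, 11) != 0));
    }
}
```

[With ME = 10 the mask covers exactly 11 bits, so a set bit 63 no longer makes the extracted field non-zero; with ME = 11 it would.]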
>>>>>>
>>>>>> new webrev: http://cr.openjdk.java.net/~gromero/8205582/v2/
>>>>>>
>>>>>>
>>>>>> Best regards,
>>>>>> Gustavo
>>>>>>
>>>>>>> Best regards,
>>>>>>> Martin
>>>>>>>
>>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: Gustavo Romero [mailto:gromero at linux.vnet.ibm.com]
>>>>>>> Sent: Monday, 25 June 2018 10:19
>>>>>>> To: Lindenmaier, Goetz ; Doerr, Martin ; hotspot-compiler-dev at openjdk.java.net
>>>>>>> Cc: ppc-aix-port-dev at openjdk.java.net
>>>>>>> Subject: RFR(s): 8205582: PPC64: RTM: Fix counter for aborts on nested transactions
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> Could the following change be reviewed please?
>>>>>>>
>>>>>>> bug   : https://bugs.openjdk.java.net/browse/JDK-8205582
>>>>>>> webrev: http://cr.openjdk.java.net/~gromero/8205582/v1/
>>>>>>>
>>>>>>> It fixes the RTM counter for nested aborts (rtm lock aborts type 5) by
>>>>>>> extracting and checking bits in the Transactional Level field of the TEXASR
>>>>>>> register.
>>>>>>>
>>>>>>> It also fixes the memory conflict counter (rtm lock aborts type 2). The Power TM
>>>>>>> status register has two bits that indicate two different types of memory
>>>>>>> conflict between threads: non-transactional and transactional. According to how
>>>>>>> the jtreg RTM tests are designed the memory conflict counter counts
>>>>>>> non-transactional conflicts: in TestPrintPreciseRTMLockingStatistics an RTM lock
>>>>>>> is held on a static variable while another thread without any synchronization
>>>>>>> (non-transactional) tries to modify the same variable. Hence that small
>>>>>>> adjustment satisfies TestPrintPreciseRTMLockingStatistics, making it pass on
>>>>>>> Power. The memory conflict counter is not used in any other place besides the
>>>>>>> RTM precise statistics (no decision is made by the JVM based on that amount).
>>>>>>> >>>>>>> This change partially fixes some failures in >>>>>>> compiler/rtm/print/TestPrintPreciseRTMLockingStatistics.java >> regarding the >>>>>>> nested and memory conflict abort counters. The remaining issue will >> be fixed by >>>>>>> aborting on calling JNI (next RFR). >>>>>>> >>>>>>> >>>>>>> Thank you and best regards, >>>>>>> Gustavo >>>>>>> >>>>>> >>>>> >>>> >>> > From gromero at linux.vnet.ibm.com Wed Jul 18 21:24:38 2018 From: gromero at linux.vnet.ibm.com (Gustavo Romero) Date: Wed, 18 Jul 2018 18:24:38 -0300 Subject: RFR(xs): 8205390: jtreg: Fix failing TestRTMSpinLoopCount on PPC64 In-Reply-To: <6a1bc8d7-6a6d-b83b-38d3-3446fc33126a@linux.vnet.ibm.com> References: <13b30945-b86b-5c4e-c92b-a47b7ed425d3@oracle.com> <4385AD40-8851-41D1-AA48-D2F53CF9A7BA@oracle.com> <6a1bc8d7-6a6d-b83b-38d3-3446fc33126a@linux.vnet.ibm.com> Message-ID: <10a552a0-7099-c1da-5ffb-36e9c2e870d8@linux.vnet.ibm.com> On 07/17/2018 04:10 AM, Gustavo Romero wrote: > I'm going to push it to jdk/jdk11 after running the tests if there are no objections. All tests passed: mach5-one-gromero-JDK-8205390-20180718-1940-31577: Build tasks PASSED. Test tasks SUCCESSFUL. Pushing it. Thanks, Gustavo > > Thanks, > Gustavo > > On 06/26/2018 05:56 PM, Igor Ignatyev wrote: >> +1 >> >> -- Igor >> >>> On Jun 25, 2018, at 10:21 AM, Vladimir Kozlov wrote: >>> >>> Good. >>> >>> Thanks, >>> Vladimir >>> >>> On 6/25/18 1:31 AM, Gustavo Romero wrote: >>>> Hi, >>>> Could the following change be reviewed please? >>>> bug?? : https://bugs.openjdk.java.net/browse/JDK-8205390 >>>> webrev: http://cr.openjdk.java.net/~gromero/8205390/v1/ >>>> It adds a new throttling sequence for PPC64 because the last value on current >>>> sequence does not fit on PPC64. 
>>>> By using the new sequence the following test is fixed: >>>> +Passed: compiler/rtm/locking/TestRTMSpinLoopCount.java >>>> Thank you and best regards, >>>> Gustavo >> > From gromero at linux.vnet.ibm.com Wed Jul 18 21:25:43 2018 From: gromero at linux.vnet.ibm.com (Gustavo Romero) Date: Wed, 18 Jul 2018 18:25:43 -0300 Subject: RFR(xs): 8205578: jtreg: Fix failing TestRTMAbortRatio on PPC64 In-Reply-To: <1bdcc4ba-6cdc-7fb8-f859-ef7558b51314@linux.vnet.ibm.com> References: <37227b8b-7977-349a-249c-f7613d06d64f@linux.vnet.ibm.com> <25dba6c8-067d-3d3f-c4f3-dca19a34d70b@oracle.com> <76020dfc-81b1-897e-f04e-5ac5cd021418@linux.vnet.ibm.com> <1bdcc4ba-6cdc-7fb8-f859-ef7558b51314@linux.vnet.ibm.com> Message-ID: <19bba62c-2e52-bf92-841f-64a3238b2d39@linux.vnet.ibm.com> On 07/17/2018 04:06 AM, Gustavo Romero wrote: > I'm going to push it to jdk/jdk11 after running the tests if there are no objections. All tests passed: mach5-one-gromero-JDK-8205578-20180718-1954-31578: Build tasks PASSED. Test tasks SUCCESSFUL. Pushing it. Thanks, Gustavo > > Thanks, > Gustavo > > On 06/28/2018 03:00 PM, Gustavo Romero wrote: >> Hi Vladimir, >> >> On 06/28/2018 02:43 PM, Vladimir Kozlov wrote: >>> Looks good. >> >> Thanks a lot for reviewing it. :) >> >> >> Regards, >> Gustavo >> >>> Thanks, >>> Vladimir >>> >>> On 6/28/18 6:13 AM, Gustavo Romero wrote: >>>> Hi Igor, >>>> >>>> On 06/28/2018 03:26 AM, Igor Ignatyev wrote: >>>>> Hi Gustavo, >>>>> >>>>> looks fine to me. >>>> >>>> Thanks! >>>> >>>> Could I get a second review please? >>>> >>>> >>>> Regards, >>>> Gustavo >>>> >>>>> Thanks, >>>>> -- Igor >>>>> >>>>>> On Jun 25, 2018, at 1:29 AM, Gustavo Romero wrote: >>>>>> >>>>>> Hi, >>>>>> >>>>>> Could the following simple change be reviewed please? >>>>>> >>>>>> bug?? : https://bugs.openjdk.java.net/browse/JDK-8205578 >>>>>> webrev: http://cr.openjdk.java.net/~gromero/8205578/v1/ >>>>>> >>>>>> Currently native method pageSize() is used to cause deliberate transactional >>>>>> aborts. 
However, in test TestRTMAbortRatio pageSize() is not marked to be
>>>>>> compilable and as a consequence it's never called through the code path of
>>>>>> SharedRuntime::generate_native_wrapper(). As that code path is never exercised,
>>>>>> no 'tabort' on JNI call is executed and the test fails on Power because of fewer
>>>>>> aborts than expected by the test.
>>>>>>
>>>>>> I can't say for sure why that test is getting the correct number of aborts on
>>>>>> x86. Nonetheless I can confirm that even on x86 the aborts do not come from the
>>>>>> native wrapper, i.e. from 'xabort' in SharedRuntime::generate_native_wrapper().
>>>>>> I suspect the aborts on x86 are occurring a bit later when the native function
>>>>>> is called and a "Far Call" is executed in the native method by chance and not in
>>>>>> a controlled way. As far as I know there is no way on Intel to inspect the exact
>>>>>> address at which a transaction failed, as is possible on Power.
>>>>>>
>>>>>> Anyway, marking pageSize() as compilable does not cause any regression on Intel
>>>>>> (at the same time it starts to exercise the generate_native_wrapper code path)
>>>>>> and makes the test pass on Power as expected.
>>>>>>
>>>>>> So it fixes the following test on Power:
>>>>>>
>>>>>> +Passed: compiler/rtm/locking/TestRTMAbortRatio.java
>>>>>>
>>>>>>
>>>>>> Thank you and best regards,
>>>>>> Gustavo
>>>>>>
>>>>>
>>>>
>>>
>>
>
From matthias.baesken at sap.com Thu Jul 19 16:05:02 2018
From: matthias.baesken at sap.com (Baesken, Matthias)
Date: Thu, 19 Jul 2018 16:05:02 +0000
Subject: Base64 encoding algorithm enhancements for ppc64 ?
Message-ID: <32561163bee74bfe9eba4a22a108c411@sap.com>

Hello, when looking into some jtreg test issues I noticed that some enhancements to Base64 encoding were recently contributed for x86_64.
Please see:

JDK-8205528: Base64 encoding algorithm using AVX512 instructions
https://bugs.openjdk.java.net/browse/JDK-8205528
http://hg.openjdk.java.net/jdk/jdk/rev/480a96a43b62

Do we have something similar for ppc64? If not, do you think it's worth doing it for this architecture as well?

Best regards, Matthias

From gromero at linux.vnet.ibm.com Fri Jul 20 14:02:40 2018
From: gromero at linux.vnet.ibm.com (Gustavo Romero)
Date: Fri, 20 Jul 2018 11:02:40 -0300
Subject: Base64 encoding algorithm enhancements for ppc64 ?
In-Reply-To: <32561163bee74bfe9eba4a22a108c411@sap.com>
References: <32561163bee74bfe9eba4a22a108c411@sap.com>
Message-ID: <416d0d0b-2985-4b23-3649-c9d127507333@linux.vnet.ibm.com>

Hi Matthias,

On 07/19/2018 01:05 PM, Baesken, Matthias wrote:
> Hello, when looking into some jtreg test issues I came across that for x86_64 some enhancements
>
> to Base64 encoding were contributed recently .
>
> Please see :
>
> JDK-8205528 : Base64 encoding algorithm using AVX512 instructions
>
> https://bugs.openjdk.java.net/browse/JDK-8205528
>
> http://hg.openjdk.java.net/jdk/jdk/rev/480a96a43b62
>
> Do we have something similar for ppc64 ?

No, not currently in the PPC64 / JVM.
Your idea would be to use the VSX for Base64 encoding using SIMD instructions, right?
You probably already have references on hand, but just in case there are a few
refs (on Intel - I don't know any on Power so far) on this page:
http://0x80.pl/notesen/2016-01-12-sse-base64-encoding.html

> If not do you think it's worth doing it for this architecture as well ?

I'm not sure. On our side I'm not aware of any workload relying heavily on
that, so I have not seen any comments on it being worth implementing for PPC64.
On Intel it looks like it's possible to get at least a 2x improvement, so
maybe an initial quick test could be done outside the JVM to get a feeling about
it on PPC64?
I can help to run/test it on POWER9. If it proves reasonable, I think it's a matter of available manpower to do that :) BTW, I discuss from time to time with Martin, Volker, and Goetz features pending and possibly interesting to be implemented on PPC64 (like GHASH - which is in my TODO list). What do you think about creating a Wiki section, like in [1], to track these things? Best regards, Gustavo [1] https://wiki.openjdk.java.net/pages/viewpage.action?pageId=13041681 From matthias.baesken at sap.com Tue Jul 24 07:58:31 2018 From: matthias.baesken at sap.com (Baesken, Matthias) Date: Tue, 24 Jul 2018 07:58:31 +0000 Subject: Base64 encoding algorithm enhancements for ppc64 ? In-Reply-To: <416d0d0b-2985-4b23-3649-c9d127507333@linux.vnet.ibm.com> References: <32561163bee74bfe9eba4a22a108c411@sap.com> <416d0d0b-2985-4b23-3649-c9d127507333@linux.vnet.ibm.com> Message-ID: <69931eb0ba524f82a09809484556f5a7@sap.com> > Your idea would be to use the VSX for Base64 encoding using SIMD > instructions, right? Hi Gustavo , I haven't looked into the details yet but thought about something like this . > > BTW, I discuss from time to time with Martin, Volker, and Goetz features > pending > and possibly interesting to be implemented on PPC64 (like GHASH - which is > in > my TODO list). What do you think about creating a Wiki section, like in [1], to > track these things? > Sounds like a good idea ! Best regards, Matthias > -----Original Message----- > From: Gustavo Romero [mailto:gromero at linux.vnet.ibm.com] > Sent: Freitag, 20. Juli 2018 16:03 > To: Baesken, Matthias ; Simonis, Volker > ; ppc-aix-port-dev at openjdk.java.net > Cc: Lindenmaier, Goetz ; Doerr, Martin > > Subject: Re: Base64 encoding algorithm enhancements for ppc64 ? > > Hi Matthias, > > On 07/19/2018 01:05 PM, Baesken, Matthias wrote: > > Hello, when looking into? some jtreg? test issues I came across? that? for > x86_64? some enhancements > > > > to Base64 encoding?? were contributed? recently . 
> > > > Please see : > > > > JDK-8205528 > :? Base64 encoding algorithm using AVX512 instructions > > > > https://bugs.openjdk.java.net/browse/JDK-8205528 > > > > http://hg.openjdk.java.net/jdk/jdk/rev/480a96a43b62 > > > > Do we have something similar? for? ppc64? ? > > No, not currently in the PPC64 / JVM. > Your idea would be to use the VSX for Base64 encoding using SIMD > instructions, right? > You probably already have references on hand, but just in case there are a > few > refs (on Intel - I don't know any on Power so far) on this page: > http://0x80.pl/notesen/2016-01-12-sse-base64-encoding.html > > > > If not do you think it's worth? doing it for? this architecture as well ? > > I'm not sure. On our side I'm not aware of any workload relying heavily on > that so I did not see any comments on it being worth to be implemented for > PPC64. > On Intel it looks like that it's possible to get at least a 2x improvement, so > maybe an initial quick test could be done outside the JVM to get a feeling > about > it on PPC64? I can help to run/test it on POWER9. If it proves reasonable, I > think it's a matter of available manpower to do that :) > > BTW, I discuss from time to time with Martin, Volker, and Goetz features > pending > and possibly interesting to be implemented on PPC64 (like GHASH - which is > in > my TODO list). What do you think about creating a Wiki section, like in [1], to > track these things? > > > Best regards, > Gustavo > > [1] https://wiki.openjdk.java.net/pages/viewpage.action?pageId=13041681 From HORIE at jp.ibm.com Wed Jul 25 05:43:59 2018 From: HORIE at jp.ibm.com (Michihiro Horie) Date: Wed, 25 Jul 2018 14:43:59 +0900 Subject: RFR: 8208171: PPC64: Enrich SLP support Message-ID: Dear all, Would you review the following change? Bug: https://bugs.openjdk.java.net/browse/JDK-8208171 Webrev: http://cr.openjdk.java.net/~mhorie/8208171/webrev.00 This change adds support for vectorized arithmetic calculation with SLP. 
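[Illustrative note, not taken from the attached slp_microbench.zip: a minimal Java sketch of the kind of loop that SLP (superword-level parallelism) in C2 may turn into packed vector instructions. The class and method names are hypothetical, and whether the loop is actually vectorized depends on the JVM build, the flags in use and the CPU.]

```java
final class SlpSketch {
    // Element-wise float multiply. The loop body consists of independent,
    // isomorphic operations on adjacent array elements, which is the
    // pattern SLP looks for when packing scalar operations into vectors.
    static void mul(float[] a, float[] b, float[] c) {
        for (int i = 0; i < c.length; i++) {
            c[i] = a[i] * b[i];
        }
    }

    public static void main(String[] args) {
        float[] a = new float[1024];
        float[] b = new float[1024];
        float[] c = new float[1024];
        for (int i = 0; i < a.length; i++) {
            a[i] = i;
            b[i] = 2.0f;
        }
        // Call the method many times so the JIT has a chance to compile it.
        for (int iter = 0; iter < 20_000; iter++) {
            mul(a, b, c);
        }
        System.out.println(c[3]);
    }
}
```

[Running such a loop with the compiler's disassembly output enabled is one way to check which vector instructions, if any, were emitted for the multiply.]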
The to_vr function is added to convert VSR to VR. Currently, vecX is associated with a VSR class vs_reg that only defines VSR32-51 in ppc.ad, which are exactly overlapped with VRs. Instruction APIs receiving VRs use the to_vr via vecX. Another thing is the change in sqrtF_reg to enable the matching with SqrtVF. I think the change in sqrtF_reg would be fine due to the ConvD2FNode::Value in convertnode.cpp. I confirmed this change with JTREG. In addition, I used attached micro benchmarks. (See attached file: slp_microbench.zip) Best regards, -- Michihiro, IBM Research - Tokyo -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: slp_microbench.zip Type: application/zip Size: 5288 bytes Desc: not available URL: From gromero at linux.vnet.ibm.com Wed Jul 25 14:05:22 2018 From: gromero at linux.vnet.ibm.com (Gustavo Romero) Date: Wed, 25 Jul 2018 11:05:22 -0300 Subject: RFR: 8208171: PPC64: Enrich SLP support In-Reply-To: References: Message-ID: Hi Michi, On 07/25/2018 02:43 AM, Michihiro Horie wrote: > Dear all, > > Would you review the following change? > Bug: https://bugs.openjdk.java.net/browse/JDK-8208171 > Webrev: http://cr.openjdk.java.net/~mhorie/8208171/webrev.00 > > This change adds support for vectorized arithmetic calculation with SLP. > > The to_vr function is added to convert VSR to VR. Currently, vecX is associated with a VSR class vs_reg that only defines VSR32-51 in ppc.ad, which are exactly overlapped with VRs. Instruction APIs receiving VRs use the to_vr via vecX. Another thing is the change in sqrtF_reg to enable the matching with SqrtVF. I think the change in sqrtF_reg would be fine due to the ConvD2FNode::Value in convertnode.cpp. Looks good. Just a few comments: - In vmul4F_reg() would it be reasonable to use xvmulsp instead of vmaddfp in order to avoid the splat? 
- Although all instructions added by your change were introduced in ISA 2.06,
  so POWER7 and above are OK, as I see probes for PowerArchitecturePPC64=6|5 in
  vm_version_ppc.cpp (line 64), I'm wondering if there is any control point to
  guarantee that these instructions won't be emitted on a CPU that does not
  support them.

- I think that in general strings in format %{} are in upper case. For instance,
  this is the current output on optoassembly for vmul4F:

  2941835 5b4 ADDI R24, R24, #64
  2941836 5b8 vmaddfp VSR32,VSR32,VSR36 ! mul packed4F
  2941837 5c0 STXVD2X [R17], VSR32 // store 16-byte Vector

  I think it would be better to be in upper case instead. I also think that if
  the node match emits more than one instruction all instructions must be listed
  in format %{}, since it's meant for detailed debugging. Finally I think it
  would be better to replace \t! by \t// in that string (unless I'm missing any
  special meaning for that char). So for vmul4F it would be something like:

  2941835 5b4 ADDI R24, R24, #64
              VSPLTISW VSR34, 0 // Splat 0 imm in VSR34
  2941836 5b8 VMADDFP VSR32,VSR32,VSR36,VSR34 // Mul packed4F
  2941837 5c0 STXVD2X [R17], VSR32 // store 16-byte Vector

But feel free to change anything just after you get additional reviews :)

> I confirmed this change with JTREG. In addition, I used attached micro benchmarks.
> /(See attached file: slp_microbench.zip)/

Thanks for sharing it. Btw, another option to host it would be in the CR
server, in http://cr.openjdk.java.net/~mhorie/8208171

Best regards,
Gustavo

>
> Best regards,
> --
> Michihiro,
> IBM Research - Tokyo
>
From HORIE at jp.ibm.com Thu Jul 26 04:43:40 2018
From: HORIE at jp.ibm.com (Michihiro Horie)
Date: Thu, 26 Jul 2018 13:43:40 +0900
Subject: RFR: 8208171: PPC64: Enrich SLP support
In-Reply-To:
References:
Message-ID:

Hi Gustavo,

Thank you very much for your helpful comments!
I updated webrev: http://cr.openjdk.java.net/~mhorie/8208171/webrev.01/ Best regards, -- Michihiro, IBM Research - Tokyo From: Gustavo Romero To: Michihiro Horie/Japan/IBM at IBMJP, ppc-aix-port-dev at openjdk.java.net, hotspot-dev at openjdk.java.net Cc: goetz.lindenmaier at sap.com, volker.simonis at sap.com, "Doerr, Martin" Date: 2018/07/25 23:05 Subject: Re: RFR: 8208171: PPC64: Enrich SLP support Hi Michi, On 07/25/2018 02:43 AM, Michihiro Horie wrote: > Dear all, > > Would you review the following change? > Bug: https://bugs.openjdk.java.net/browse/JDK-8208171 > Webrev: http://cr.openjdk.java.net/~mhorie/8208171/webrev.00 > > This change adds support for vectorized arithmetic calculation with SLP. > > The to_vr function is added to convert VSR to VR. Currently, vecX is associated with a VSR class vs_reg that only defines VSR32-51 in ppc.ad, which are exactly overlapped with VRs. Instruction APIs receiving VRs use the to_vr via vecX. Another thing is the change in sqrtF_reg to enable the matching with SqrtVF. I think the change in sqrtF_reg would be fine due to the ConvD2FNode::Value in convertnode.cpp. Looks good. Just a few comments: - In vmul4F_reg() would it be reasonable to use xvmulsp instead of vmaddfp in order to avoid the splat? - Although all instructions added by your change where introduced in ISA 2.06, so POWER7 and above are OK, as I see probes for PowerArchictecturePPC64=6|5 in vm_version_ppc.cpp (line 64), I'm wondering if there is any control point to guarantee that these instructions won't be emitted on a CPU that does not support them. - I think that in general string in format %{} are in upper case. For instance, this the current output on optoassembly for vmul4F: 2941835 5b4 ADDI R24, R24, #64 2941836 5b8 vmaddfp VSR32,VSR32,VSR36 ! mul packed4F 2941837 5c0 STXVD2X [R17], VSR32 // store 16-byte Vector I think it would be better to be in upper case instead. 
  I also think that if
  the node match emits more than one instruction all instructions must be listed
  in format %{}, since it's meant for detailed debugging. Finally I think it
  would be better to replace \t! by \t// in that string (unless I'm missing any
  special meaning for that char). So for vmul4F it would be something like:

2941835 5b4     ADDI    R24, R24, #64
                VSPLTISW VSR34, 0       // Splat 0 imm in VSR34
2941836 5b8     VMADDFP  VSR32,VSR32,VSR36,VSR34        // Mul packed4F
2941837 5c0     STXVD2X     [R17], VSR32        // store 16-byte Vector


But feel free to change anything just after you get additional reviews :)


> I confirmed this change with JTREG. In addition, I used attached micro benchmarks.
> /(See attached file: slp_microbench.zip)/

Thanks for sharing it.
Btw, another option to host it would be in the CR
server, in http://cr.openjdk.java.net/~mhorie/8208171


Best regards,
Gustavo

>
> Best regards,
> --
> Michihiro,
> IBM Research - Tokyo
>

From gromero at linux.vnet.ibm.com  Thu Jul 26 14:01:39 2018
From: gromero at linux.vnet.ibm.com (Gustavo Romero)
Date: Thu, 26 Jul 2018 11:01:39 -0300
Subject: RFR: 8208171: PPC64: Enrich SLP support
In-Reply-To:
References:
Message-ID:

Hi Michi,

On 07/26/2018 01:43 AM, Michihiro Horie wrote:
> I updated webrev:
> http://cr.openjdk.java.net/~mhorie/8208171/webrev.01/

Thanks for providing an updated webrev and for fixing indentation and function
order in assembler_ppc.inline.hpp as well.
I have no further comments :)


Best Regards,
Gustavo

>
> Best regards,
> --
> Michihiro,
> IBM Research - Tokyo
>
> From: Gustavo Romero
> To: Michihiro Horie/Japan/IBM at IBMJP, ppc-aix-port-dev at openjdk.java.net, hotspot-dev at openjdk.java.net
> Cc: goetz.lindenmaier at sap.com, volker.simonis at sap.com, "Doerr, Martin"
> Date: 2018/07/25 23:05
> Subject: Re: RFR: 8208171: PPC64: Enrich SLP support
>
> ------------------------------------------------------------------------
>
> Hi Michi,
>
> On 07/25/2018 02:43 AM, Michihiro Horie wrote:
> > Dear all,
> >
> > Would you review the following change?
> > Bug: https://bugs.openjdk.java.net/browse/JDK-8208171
> > Webrev: http://cr.openjdk.java.net/~mhorie/8208171/webrev.00
> >
> > This change adds support for vectorized arithmetic calculation with SLP.
> >
> > The to_vr function is added to convert VSR to VR. Currently, vecX is
> > associated with a VSR class vs_reg that only defines VSR32-51 in ppc.ad,
> > which are exactly overlapped with VRs. Instruction APIs receiving VRs use
> > the to_vr via vecX. Another thing is the change in sqrtF_reg to enable the
> > matching with SqrtVF. I think the change in sqrtF_reg would be fine due to
> > the ConvD2FNode::Value in convertnode.cpp.
>
> Looks good. Just a few comments:
>
> - In vmul4F_reg() would it be reasonable to use xvmulsp instead of vmaddfp in
>   order to avoid the splat?
>
> - Although all instructions added by your change were introduced in ISA 2.06,
>   so POWER7 and above are OK, as I see probes for PowerArchitecturePPC64=6|5 in
>   vm_version_ppc.cpp (line 64), I'm wondering if there is any control point to
>   guarantee that these instructions won't be emitted on a CPU that does not
>   support them.
>
> - I think that in general strings in format %{} are in upper case. For instance,
>   this is the current output on optoassembly for vmul4F:
>
> 2941835 5b4     ADDI    R24, R24, #64
> 2941836 5b8     vmaddfp  VSR32,VSR32,VSR36      ! mul packed4F
> 2941837 5c0     STXVD2X     [R17], VSR32        // store 16-byte Vector
>
>   I think it would be better to be in upper case instead. I also think that if
>   the node match emits more than one instruction all instructions must be listed
>   in format %{}, since it's meant for detailed debugging. Finally I think it
>   would be better to replace \t! by \t// in that string (unless I'm missing any
>   special meaning for that char). So for vmul4F it would be something like:
>
> 2941835 5b4     ADDI    R24, R24, #64
>                 VSPLTISW VSR34, 0       // Splat 0 imm in VSR34
> 2941836 5b8     VMADDFP  VSR32,VSR32,VSR36,VSR34        // Mul packed4F
> 2941837 5c0     STXVD2X     [R17], VSR32        // store 16-byte Vector
>
>
> But feel free to change anything just after you get additional reviews :)
>
> >
> > I confirmed this change with JTREG. In addition, I used attached micro benchmarks.
> > /(See attached file: slp_microbench.zip)/
>
> Thanks for sharing it.
> Btw, another option to host it would be in the CR
> server, in http://cr.openjdk.java.net/~mhorie/8208171
>
>
> Best regards,
> Gustavo
>
> >
> > Best regards,
> > --
> > Michihiro,
> > IBM Research - Tokyo
> >
> >
>
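
The VSR-to-VR conversion mentioned in the quoted review request can be pictured
as a plain index shift, because on Power each VR n occupies the same storage as
VSR 32+n. Below is a minimal C++ sketch under that assumption only. The struct
names and to_vr_sketch() are hypothetical stand-ins for illustration; they are
not the HotSpot register classes or the to_vr implementation from the webrev.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical stand-ins for the register classes (not the HotSpot API).
struct VectorSRegister { int encoding; };  // VSR0..VSR63
struct VectorRegister  { int encoding; };  // VR0..VR31

// Assumption: VR n overlays VSR 32+n, so only VSR32..VSR63 have a VR
// counterpart and the conversion is a subtraction of 32.
inline VectorRegister to_vr_sketch(VectorSRegister vsr) {
    if (vsr.encoding < 32 || vsr.encoding > 63) {
        throw std::out_of_range("this VSR does not overlap any VR");
    }
    VectorRegister vr{vsr.encoding - 32};
    assert(vr.encoding >= 0 && vr.encoding <= 31);  // postcondition
    return vr;
}
```

With this mapping, VSR32 corresponds to VR0 and VSR51 (the upper end of the
vs_reg class described above) corresponds to VR19, while any VSR below 32 is
rejected because it has no overlapping VR.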