From david.holmes at oracle.com Sun Mar 2 20:50:37 2014 From: david.holmes at oracle.com (David Holmes) Date: Mon, 03 Mar 2014 14:50:37 +1000 Subject: Removing intrinsic of Thread.isInterrupted() In-Reply-To: <530D1E53.70206@oracle.com> References: <530BF5FD.5060105@oracle.com> <530CEBA5.8090601@oracle.com> <39387847-AE08-473B-AE32-929AFEB7F210@oracle.com> <530CFFF3.6040304@oracle.com> <530D1E53.70206@oracle.com> Message-ID: <53140A1D.1030307@oracle.com> On 26/02/2014 8:50 AM, Vladimir Kozlov wrote: > Thank you, Karen and Yumin, for explanation. > > I understand the problem now. > > Should we remove (put under #ifndef) fast path "if (TLS._interrupted && > !clear_int) return true;" only for Windows or for other platforms too? I assume this was resolved elsewhere but the problem is only on Windows. David ----- > Thanks, > Vladimir > > On 2/25/14 12:41 PM, Yumin Qi wrote: >> When I am writing email I saw your email. >> >> to answer Vladimir's question, I come up with a scenario: >> >> When Thread is in process of interrupt call: >> >> OSThread* osthread = thread->osthread(); >> osthread->set_interrupted(true); >> // More than one thread can get here with the same value of osthread, >> // resulting in multiple notifications. We do, however, want the >> store >> // to interrupted() to be visible to other threads before we post >> // the interrupt event. >> OrderAccess::release(); >> SetEvent(osthread->interrupt_event()); >> >> >> Before SetEvent, bit is set. >> >> Now, with intrinsification, call >> >> Thread.currentThread().isInterrupted() will return return 'true' due >> to clear_int is 'false'. >> >> The following call >> Thread.isInterrupted() will go to slow path, since the Event is >> not set, we got a 'false' with new fix. >> >> Karen's suggestion is get ride of second fastpath (t == >> Thread.current() && !clear_int) so it will be >> >> return (t == Thread.current() && !TLS._osthread._interrupted) ? >> fast : slow >> >> Thanks >> Yumin >> Thanks >> Yumin >> >> >> >> On 2/25/2014 11:56 AM, Karen Kinnear wrote: >>> Vladimir, >>> >>> I updated the bug to reflect the code review email thread between >>> Yumin, myself >>> and David Holmes. >>> >>> To the best of my understanding there is a potential timing hole here, >>> with the clearInterrupted >>> case (study the webrev for the fix), i.e. in the isInterrupted(false) >>> case. >>> >>> What I proposed was that we could keep the intrinsic quick check for >>> isInterrupted bit, but not >>> the logic for the isInterrupted(false) - unless you want to change it >>> to add the Windows >>> WaitForSingleObject call - which I assume removes the benefit of the >>> intrinsic. >>> >>> Does that make sense to you? >>> thanks, >>> Karen >>> >>> On Feb 25, 2014, at 2:14 PM, Vladimir Kozlov wrote: >>> >>>> Yumin, >>>> >>>> On 2/24/14 5:46 PM, Yumin Qi wrote: >>>>> Hi, Compiler team >>>>> >>>>> I worked on this bug: 6498581:ThreadInterruptTest3 produces wrong >>>>> output on Windows. This is a problem thread sleep wakes up spuriously >>>>> due to a race condition between os::interrupt and os::is_interruped. >>>>> Detail please see bug comments. 
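[For illustration only: a rough, hypothetical sketch of the inconsistency described above, in which the same thread samples its interrupt status twice in a row and the two reads disagree even though nothing cleared the status in between. This is not the actual ThreadInterruptTest3 regression test; the class and variable names are invented.]

    public class InterruptStatusRaceSketch {
        public static void main(String[] args) {
            final Thread victim = Thread.currentThread();
            Thread interrupter = new Thread() {
                public void run() {
                    while (true) {
                        victim.interrupt();      // only ever sets the status
                    }
                }
            };
            interrupter.setDaemon(true);
            interrupter.start();
            for (long i = 0; i < 100000000L; i++) {
                boolean first  = victim.isInterrupted();   // non-clearing read
                boolean second = Thread.interrupted();     // clearing read
                if (first && !second) {
                    // nothing cleared the status between the two reads, so this
                    // combination should not be observable without a race
                    System.out.println("inconsistent pair at iteration " + i);
                }
            }
        }
    }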
>>>>> >>>>> https://bugs.openjdk.java.net/browse/JDK-6498581 >>>>> >>>>> The fix is (without removing intrinsic, but intend to remove it) : >>>>> http://cr.openjdk.java.net/~minqi/6498581/webrev00/ >>>>> >>>>> One problem is that Thread.isInterrupted() is intrinsic and >>>>> there is >>>>> chance that code like >>>>> >>>>> boolean a = Thread.currentThread().isInterrupted(); >>>>> boolean b = Thread.interrupted(); >>>>> >>>>> Will get different value. (fast/slow path) >>>> How you come to this conclusion? You my be mistaken. We intrinsify >>>> native boolean isInterrupted(boolean ClearInterrupted) and not java >>>> method: isInterrupted(). >>>> >>>> Also you are comparing different code which is nothing to do with >>>> intrinsic: >>>> >>>> isInterrupted() passes ClearInterrupted=false: >>>> >>>> public boolean isInterrupted() { >>>> return isInterrupted(false); >>>> } >>>> >>>> when interrupted() passes 'true': >>>> >>>> public static boolean interrupted() { >>>> return currentThread().isInterrupted(true); >>>> } >>>> >>>> Both method calls native isInterrupted(bool) which is intrinsified. >>>> So both calls intrinsic. There should be no difference. >>>> >>>> From performance point of view, as Aleksey pointed, there is huge >>>> difference. We can't remove intrinsic. >>>> >>>> Thanks, >>>> Vladimir >>>> >>>>> I tried to remove the intrinsic code and done a test using >>>>> following >>>>> code. The result showed there is no difference by removing the >>>>> intrinsic >>>>> of Thread.isInterrupted(). >>>>> >>>>> // test performance of removing Thread.isInterrupted() inlining >>>>> public class TestThreadInterrupted { >>>>> public static void main(String... args) { >>>>> Thread t = new Thread () { >>>>> public void run() { >>>>> boolean isInt = false; >>>>> while (!isInt) { >>>>> try { >>>>> Thread.sleep(30); >>>>> } catch (InterruptedException ie) { >>>>> isInt = true; >>>>> } >>>>> } >>>>> } >>>>> }; >>>>> >>>>> t.start(); >>>>> // run >>>>> long start, finish, isum = 0L, osum = 0L; >>>>> int NUM = 20000; >>>>> for (int j = 0; j < 100; j++) { >>>>> isum = 0L; >>>>> for (int i = 0; i < NUM; i++) { >>>>> start = System.currentTimeMillis(); >>>>> t.isInterrupted(); >>>>> finish = System.currentTimeMillis(); >>>>> isum += (finish - start); >>>>> } >>>>> >>>>> System.out.println("Total cost of " + NUM + " calls is " + >>>>> isum >>>>> + " ms"); >>>>> osum += isum; >>>>> } >>>>> System.out.println("Average " + osum/100 + " ms"); >>>>> t.interrupt(); >>>>> try { >>>>> t.join(); >>>>> } catch (InterruptedException e) {} >>>>> } >>>>> } >>>>> >>>>> And found there is no difference on Solaris-x64/sparcv9, >>>>> Windows(32/64), >>>>> linux(32/64) before and after the removing of intrinsic >>>>> Thread.isInterrupted(). >>>>> >>>>> Should I remove the intrinsic? >>>>> >>>>> Data (no major difference for both with/without intrinsic): >>>>> >>>>> 1)windows : >>>>> .... >>>>> Total cost of 20000 calls is 2 ms >>>>> Total cost of 20000 calls is 1 ms >>>>> Total cost of 20000 calls is 1 ms >>>>> Total cost of 20000 calls is 1 ms >>>>> Total cost of 20000 calls is 2 ms >>>>> Total cost of 20000 calls is 1 ms >>>>> Total cost of 20000 calls is 2 ms >>>>> Total cost of 20000 calls is 2 ms >>>>> Total cost of 20000 calls is 2 ms >>>>> Total cost of 20000 calls is 0 ms >>>>> Total cost of 20000 calls is 2 ms >>>>> Total cost of 20000 calls is 2 ms >>>>> Average 1 ms >>>>> >>>>> 2) Solaris-x64 >>>>> .... 
>>>>> Total cost of 20000 calls is 3 ms >>>>> Total cost of 20000 calls is 1 ms >>>>> Total cost of 20000 calls is 4 ms >>>>> Total cost of 20000 calls is 6 ms >>>>> Total cost of 20000 calls is 6 ms >>>>> Total cost of 20000 calls is 5 ms >>>>> Total cost of 20000 calls is 7 ms >>>>> Total cost of 20000 calls is 5 ms >>>>> Total cost of 20000 calls is 5 ms >>>>> Total cost of 20000 calls is 1 ms >>>>> Total cost of 20000 calls is 3 ms >>>>> Total cost of 20000 calls is 2 ms >>>>> Total cost of 20000 calls is 3 ms >>>>> Total cost of 20000 calls is 3 ms >>>>> Total cost of 20000 calls is 5 ms >>>>> Total cost of 20000 calls is 4 ms >>>>> Total cost of 20000 calls is 4 ms >>>>> Total cost of 20000 calls is 7 ms >>>>> Average 4 ms >>>>> >>>>> 3) Linux: >>>>> >>>>> .... >>>>> Total cost of 20000 calls is 30 ms >>>>> Total cost of 20000 calls is 29 ms >>>>> Total cost of 20000 calls is 26 ms >>>>> Total cost of 20000 calls is 26 ms >>>>> Total cost of 20000 calls is 26 ms >>>>> Total cost of 20000 calls is 24 ms >>>>> Total cost of 20000 calls is 29 ms >>>>> Total cost of 20000 calls is 25 ms >>>>> Total cost of 20000 calls is 20 ms >>>>> Average 24 ms >>>>> >>>>> >>>>> Thanks >>>>> Yumin >> From albert.noll at oracle.com Mon Mar 3 00:34:32 2014 From: albert.noll at oracle.com (Albert) Date: Mon, 03 Mar 2014 09:34:32 +0100 Subject: [9] RFR(XXS): 8036091: compiler/membars/DekkerTest.java fails with -XX:CICompilerCount=1 Message-ID: <53143E98.5080102@oracle.com> Hi, could I get reviews for this small patch? Bug: https://bugs.openjdk.java.net/browse/JDK-8036091 Problem: JDK-8034775 changed the minimum number of compiler threads for tiered compilation to 2, since each compiler (C1 and C2) requires a separate compiler thread. The test is started with -XX:CICompilerCount=1, which is illegal after JDK-8034775. Solution: Remove -XX:CICompilerCount=1 Testing: Reproduced bug with removed option: -XX:CICompilerCount=1 Webrev: http://cr.openjdk.java.net/~anoll/8036091/webrev.00/ Thanks, Albert From roland.westrelin at oracle.com Mon Mar 3 02:04:30 2014 From: roland.westrelin at oracle.com (Roland Westrelin) Date: Mon, 3 Mar 2014 11:04:30 +0100 Subject: [9] RFR(XXS): 8036091: compiler/membars/DekkerTest.java fails with -XX:CICompilerCount=1 In-Reply-To: <53143E98.5080102@oracle.com> References: <53143E98.5080102@oracle.com> Message-ID: <04D12456-BFAC-4C44-AAEF-BF52CFF73F04@oracle.com> Hi Albert, > Problem: > JDK-8034775 changed the minimum number of compiler threads for tiered compilation to 2, since each compiler (C1 and C2) requires a separate compiler thread. The test is started with -XX:CICompilerCount=1, which is illegal after JDK-8034775. > > Solution: > Remove -XX:CICompilerCount=1 But then if we run with tiered off, C2 will have more than 1 compiler thread which is not what the test wants for some reason. Maybe if tiered is on CICompilerCount=1 should make the VM silently use 1 thread per compilers? Roland. From igor.ignatyev at oracle.com Mon Mar 3 02:21:51 2014 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Mon, 03 Mar 2014 14:21:51 +0400 Subject: [9] RFR(XXS): 8036091: compiler/membars/DekkerTest.java fails with -XX:CICompilerCount=1 In-Reply-To: <04D12456-BFAC-4C44-AAEF-BF52CFF73F04@oracle.com> References: <53143E98.5080102@oracle.com> <04D12456-BFAC-4C44-AAEF-BF52CFF73F04@oracle.com> Message-ID: <531457BF.40509@oracle.com> maybe you should explicitly disable TieredCompilation in the test? 
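[For illustration: a hypothetical jtreg run line along the lines suggested above; the real DekkerTest.java header may carry different or additional flags.]

    /*
     * @test
     * @bug 8036091
     * @run main/othervm -Xbatch -XX:-TieredCompilation -XX:CICompilerCount=1 DekkerTest
     */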
Igor On 03/03/2014 02:04 PM, Roland Westrelin wrote: > Hi Albert, > >> Problem: >> JDK-8034775 changed the minimum number of compiler threads for tiered compilation to 2, since each compiler (C1 and C2) requires a separate compiler thread. The test is started with -XX:CICompilerCount=1, which is illegal after JDK-8034775. >> >> Solution: >> Remove -XX:CICompilerCount=1 > > But then if we run with tiered off, C2 will have more than 1 compiler thread which is not what the test wants for some reason. Maybe if tiered is on CICompilerCount=1 should make the VM silently use 1 thread per compilers? > > Roland. > From albert.noll at oracle.com Mon Mar 3 02:28:32 2014 From: albert.noll at oracle.com (Albert) Date: Mon, 03 Mar 2014 11:28:32 +0100 Subject: [9] RFR(XXS): 8036091: compiler/membars/DekkerTest.java fails with -XX:CICompilerCount=1 In-Reply-To: <04D12456-BFAC-4C44-AAEF-BF52CFF73F04@oracle.com> References: <53143E98.5080102@oracle.com> <04D12456-BFAC-4C44-AAEF-BF52CFF73F04@oracle.com> Message-ID: <53145950.7070903@oracle.com> Hi Roland, thanks for your feedback. Silently using 2 instead of 1 compiler thread when tiered is enabled and -XX:CICompilerCount=1 was the default behavior before JDK-8034775. I worked on JDK-8034775 and back than it occurred to me that it is strange that the JVM silently 'overrules' an explicit command given by the user (namely that he/she wants to use only 1 compiler thread). Since the tests starts with -Xbatch and hence background compilation is disabled, I do not see how more compiler threads can fail the test. Also, the original bug reproduces easily with more than 1 compiler thread. Best, Albert On 03/03/2014 11:04 AM, Roland Westrelin wrote: > Hi Albert, > >> Problem: >> JDK-8034775 changed the minimum number of compiler threads for tiered compilation to 2, since each compiler (C1 and C2) requires a separate compiler thread. The test is started with -XX:CICompilerCount=1, which is illegal after JDK-8034775. >> >> Solution: >> Remove -XX:CICompilerCount=1 > But then if we run with tiered off, C2 will have more than 1 compiler thread which is not what the test wants for some reason. Maybe if tiered is on CICompilerCount=1 should make the VM silently use 1 thread per compilers? > > Roland. From aleksey.shipilev at oracle.com Mon Mar 3 02:28:43 2014 From: aleksey.shipilev at oracle.com (Aleksey Shipilev) Date: Mon, 03 Mar 2014 14:28:43 +0400 Subject: [9] RFR(XXS): 8036091: compiler/membars/DekkerTest.java fails with -XX:CICompilerCount=1 In-Reply-To: <04D12456-BFAC-4C44-AAEF-BF52CFF73F04@oracle.com> References: <53143E98.5080102@oracle.com> <04D12456-BFAC-4C44-AAEF-BF52CFF73F04@oracle.com> Message-ID: <5314595B.60600@oracle.com> On 03/03/2014 02:04 PM, Roland Westrelin wrote: > Hi Albert, > >> Problem: JDK-8034775 changed the minimum number of compiler threads >> for tiered compilation to 2, since each compiler (C1 and C2) >> requires a separate compiler thread. The test is started with >> -XX:CICompilerCount=1, which is illegal after JDK-8034775. >> >> Solution: Remove -XX:CICompilerCount=1 > > But then if we run with tiered off, C2 will have more than 1 compiler > thread which is not what the test wants for some reason. Maybe if > tiered is on CICompilerCount=1 should make the VM silently use 1 > thread per compilers? +1. I missed the review of the original CICompilerCount fix which made "1" illegal in some cases. I think there are other broken tests and/or use cases which rely on CICompilerCount=1, e.g. 
printing generated assembly without (hopefully) mixing up the output from the concurrent compiler tests, i.e.: $ alias java-asm alias java-asm='java -XX:+UnlockDiagnosticVMOptions -XX:CICompilerCount=1 -XX:+PrintCompilation -XX:+PrintInlining -XX:+PrintAssembly' -Aleksey From aleksey.shipilev at oracle.com Mon Mar 3 02:36:24 2014 From: aleksey.shipilev at oracle.com (Aleksey Shipilev) Date: Mon, 03 Mar 2014 14:36:24 +0400 Subject: [9] RFR(XXS): 8036091: compiler/membars/DekkerTest.java fails with -XX:CICompilerCount=1 In-Reply-To: <53145950.7070903@oracle.com> References: <53143E98.5080102@oracle.com> <04D12456-BFAC-4C44-AAEF-BF52CFF73F04@oracle.com> <53145950.7070903@oracle.com> Message-ID: <53145B28.7040109@oracle.com> On 03/03/2014 02:28 PM, Albert wrote: > Silently using 2 instead of 1 compiler thread when tiered is enabled > and -XX:CICompilerCount=1 was the default behavior before > JDK-8034775. I worked on JDK-8034775 and back than it occurred to me > that it is strange that the JVM silently 'overrules' an explicit > command given by the user (namely that he/she wants to use only 1 > compiler thread). I don't find it strange if you follow the notion of two compilers in TieredCompilation. CICompilerCount=1 means "one thread per compiler", which given two compilers means two threads. If you really want a single thread in the compiler thread pool, then you should disable one of the compilers. -Aleksey. From albert.noll at oracle.com Mon Mar 3 02:38:03 2014 From: albert.noll at oracle.com (Albert) Date: Mon, 03 Mar 2014 11:38:03 +0100 Subject: [9] RFR(XXS): 8036091: compiler/membars/DekkerTest.java fails with -XX:CICompilerCount=1 In-Reply-To: <53145B28.7040109@oracle.com> References: <53143E98.5080102@oracle.com> <04D12456-BFAC-4C44-AAEF-BF52CFF73F04@oracle.com> <53145950.7070903@oracle.com> <53145B28.7040109@oracle.com> Message-ID: <53145B8B.6060306@oracle.com> Hi Aleksey, that is not what the description of the flag says: product(intx, CICompilerCount, CI_COMPILER_COUNT, "Number of compiler threads to run") Best, Albert On 03/03/2014 11:36 AM, Aleksey Shipilev wrote: > On 03/03/2014 02:28 PM, Albert wrote: >> Silently using 2 instead of 1 compiler thread when tiered is enabled >> and -XX:CICompilerCount=1 was the default behavior before >> JDK-8034775. I worked on JDK-8034775 and back than it occurred to me >> that it is strange that the JVM silently 'overrules' an explicit >> command given by the user (namely that he/she wants to use only 1 >> compiler thread). > I don't find it strange if you follow the notion of two compilers in > TieredCompilation. CICompilerCount=1 means "one thread per compiler", > which given two compilers means two threads. If you really want a single > thread in the compiler thread pool, then you should disable one of the > compilers. > > -Aleksey. From albert.noll at oracle.com Mon Mar 3 02:42:57 2014 From: albert.noll at oracle.com (Albert) Date: Mon, 03 Mar 2014 11:42:57 +0100 Subject: [9] RFR(XXS): 8036091: compiler/membars/DekkerTest.java fails with -XX:CICompilerCount=1 In-Reply-To: <53145B8B.6060306@oracle.com> References: <53143E98.5080102@oracle.com> <04D12456-BFAC-4C44-AAEF-BF52CFF73F04@oracle.com> <53145950.7070903@oracle.com> <53145B28.7040109@oracle.com> <53145B8B.6060306@oracle.com> Message-ID: <53145CB1.9070804@oracle.com> Also, there is nothing that conceptually prevents having a shared compilation queue between C1 and C2, which would make -XX:CICompilerCount=1 work also for TieredCompilation. 
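[For illustration: a purely conceptual sketch of the shared-queue idea, written as Java-flavored pseudo-code rather than the actual C++ CompileBroker code; the class names, the tier threshold and the println placeholders are invented.]

    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.LinkedBlockingQueue;

    // One queue feeds both compilers, so a single compiler thread could in
    // principle honor CICompilerCount=1 even with tiered compilation enabled.
    class SharedCompileQueueSketch {
        static final class CompileTask {
            final String method;
            final int compLevel;
            CompileTask(String method, int compLevel) { this.method = method; this.compLevel = compLevel; }
        }

        private final BlockingQueue<CompileTask> queue = new LinkedBlockingQueue<CompileTask>();

        void enqueue(CompileTask task) {
            queue.add(task);
        }

        void compilerThreadLoop() throws InterruptedException {
            while (true) {
                CompileTask task = queue.take();            // blocks until work arrives
                if (task.compLevel <= 3) {
                    System.out.println("C1 compiles " + task.method);   // tiers 1-3
                } else {
                    System.out.println("C2 compiles " + task.method);   // tier 4
                }
            }
        }
    }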
Best, Albert On 03/03/2014 11:38 AM, Albert wrote: > Hi Aleksey, > > that is not what the description of the flag says: > > product(intx, CICompilerCount, CI_COMPILER_COUNT, > "Number of compiler threads to run") > > Best, > Albert > > > On 03/03/2014 11:36 AM, Aleksey Shipilev wrote: >> On 03/03/2014 02:28 PM, Albert wrote: >>> Silently using 2 instead of 1 compiler thread when tiered is enabled >>> and -XX:CICompilerCount=1 was the default behavior before >>> JDK-8034775. I worked on JDK-8034775 and back than it occurred to me >>> that it is strange that the JVM silently 'overrules' an explicit >>> command given by the user (namely that he/she wants to use only 1 >>> compiler thread). >> I don't find it strange if you follow the notion of two compilers in >> TieredCompilation. CICompilerCount=1 means "one thread per compiler", >> which given two compilers means two threads. If you really want a single >> thread in the compiler thread pool, then you should disable one of the >> compilers. >> >> -Aleksey. > From aleksey.shipilev at oracle.com Mon Mar 3 03:03:42 2014 From: aleksey.shipilev at oracle.com (Aleksey Shipilev) Date: Mon, 03 Mar 2014 15:03:42 +0400 Subject: [9] RFR(XXS): 8036091: compiler/membars/DekkerTest.java fails with -XX:CICompilerCount=1 In-Reply-To: <53145B8B.6060306@oracle.com> References: <53143E98.5080102@oracle.com> <04D12456-BFAC-4C44-AAEF-BF52CFF73F04@oracle.com> <53145950.7070903@oracle.com> <53145B28.7040109@oracle.com> <53145B8B.6060306@oracle.com> Message-ID: <5314618E.5050607@oracle.com> Hi Albert, But, your original explanation [1] means C1 and C2 are different compilers, and CICompilerCount=1 could apply to both of them individually, which means C1 and C2 get one thread each. We can probably weasel out by saying the option means "*total* number of compiler threads". However, that does not help much, because we just annoy users with VM errors. How would user force a single compiler thread to run? Before the CICC enforcement: a) +Tiered, CICC=1: user silently gets two compiler threads b) -Tiered, CICC=1: user gets single compiler thread After the CICC change: a) +Tiered, CICC=1: user gets the VM error. If user still wants to run Tiered then he/she should set CICC=2 and move on. If user wants a single thread, then we force him/her to disable Tiered. b) -Tiered, CICC=1: user is opaque about the change So, we require users to explicitly segregate the additional configuration sets depending on +/-Tiered, only to communicate to them they maybe running two separate compilers which by implementation detail require two separate threads. I think there is no new and valuable information to the users in that parlance, only the annoying. I can see yet another case why would users disable Tiered right away in their tests, which is not what we want. I think we should rollback CICC=1 check to silently mean CICC=2 in case of Tiered and rework Tiered to use the common compiler thread pool, so that to remove the substance of this discussion. Is this doable? Otherwise, I think we need to have a proper CCC in place discussing the change in the accepted values of a product flag. -Aleksey. [1] "JDK-8034775 changed the minimum number of compiler threads for tiered compilation to 2, since ***each compiler*** (C1 and C2) requires a separate compiler thread." 
(emphasis is mine) On 03/03/2014 02:38 PM, Albert wrote: > Hi Aleksey, > > that is not what the description of the flag says: > > product(intx, CICompilerCount, CI_COMPILER_COUNT, > "Number of compiler threads to run") > > Best, > Albert > > > On 03/03/2014 11:36 AM, Aleksey Shipilev wrote: >> On 03/03/2014 02:28 PM, Albert wrote: >>> Silently using 2 instead of 1 compiler thread when tiered is enabled >>> and -XX:CICompilerCount=1 was the default behavior before >>> JDK-8034775. I worked on JDK-8034775 and back than it occurred to me >>> that it is strange that the JVM silently 'overrules' an explicit >>> command given by the user (namely that he/she wants to use only 1 >>> compiler thread). >> I don't find it strange if you follow the notion of two compilers in >> TieredCompilation. CICompilerCount=1 means "one thread per compiler", >> which given two compilers means two threads. If you really want a single >> thread in the compiler thread pool, then you should disable one of the >> compilers. >> >> -Aleksey. > From aleksey.shipilev at oracle.com Mon Mar 3 03:11:05 2014 From: aleksey.shipilev at oracle.com (Aleksey Shipilev) Date: Mon, 03 Mar 2014 15:11:05 +0400 Subject: [9] RFR(XXS): 8036091: compiler/membars/DekkerTest.java fails with -XX:CICompilerCount=1 In-Reply-To: <53145CB1.9070804@oracle.com> References: <53143E98.5080102@oracle.com> <04D12456-BFAC-4C44-AAEF-BF52CFF73F04@oracle.com> <53145950.7070903@oracle.com> <53145B28.7040109@oracle.com> <53145B8B.6060306@oracle.com> <53145CB1.9070804@oracle.com> Message-ID: <53146349.6030800@oracle.com> I think that should be the way to go. However, that requires choosing the less of two evils: restore the original implicit CICC=2 with +Tiered. Because if you keep the "+Tiered should use explicit CICC=2" today, then you effectively block yourself from merging the compiler threads for C1 and C2, because then "+Tiered CICC=2" will suddenly change the meaning, and users are already exposed. -Aleksey. On 03/03/2014 02:42 PM, Albert wrote: > Also, there is noting that conceptually prevent having a shared > compilation queue between C1 and C2, > which would make -XX:CICompilerCount=1 make work also for > TieredCompilation. > > Best, > Albert > > On 03/03/2014 11:38 AM, Albert wrote: >> Hi Aleksey, >> >> that is not what the description of the flag says: >> >> product(intx, CICompilerCount, CI_COMPILER_COUNT, >> "Number of compiler threads to run") >> >> Best, >> Albert >> >> >> On 03/03/2014 11:36 AM, Aleksey Shipilev wrote: >>> On 03/03/2014 02:28 PM, Albert wrote: >>>> Silently using 2 instead of 1 compiler thread when tiered is enabled >>>> and -XX:CICompilerCount=1 was the default behavior before >>>> JDK-8034775. I worked on JDK-8034775 and back than it occurred to me >>>> that it is strange that the JVM silently 'overrules' an explicit >>>> command given by the user (namely that he/she wants to use only 1 >>>> compiler thread). >>> I don't find it strange if you follow the notion of two compilers in >>> TieredCompilation. CICompilerCount=1 means "one thread per compiler", >>> which given two compilers means two threads. If you really want a single >>> thread in the compiler thread pool, then you should disable one of the >>> compilers. >>> >>> -Aleksey. 
>> > From aleksey.shipilev at oracle.com Mon Mar 3 03:46:46 2014 From: aleksey.shipilev at oracle.com (Aleksey Shipilev) Date: Mon, 03 Mar 2014 15:46:46 +0400 Subject: RFR (S) 8031818: Experimental VM flag for enforcing safe object construction In-Reply-To: <52DEF8FB.3020707@oracle.com> References: <52DEF8FB.3020707@oracle.com> Message-ID: <53146BA6.3000104@oracle.com> On 01/22/2014 02:47 AM, Aleksey Shipilev wrote: > Please review the experimental patch for switching the research VM mode > which unconditionally emits the memory barrier at the end of constructor: > http://cr.openjdk.java.net/~shade/8031818/webrev.00/ > https://bugs.openjdk.java.net/browse/JDK-8031818 Since the hs-comp is finally open, let's get back to this one. Only Vladimir K. had formally reviewed, plus Vladimir I. had informally reviewed. I think we need a second formal Reviewer? Here is the current webrev: http://cr.openjdk.java.net/~shade/8031818/webrev.02/ ...and here is the changeset (second reviewer is ????): http://cr.openjdk.java.net/~shade/8031818/8031818.changeset The code was passing the full JPRT cycle two weeks ago, passed the HS jtregs back then. It applies cleanly over jdk9/hs-comp now, and builds successfully with Linux x86_64/fastdebug. Thanks, -Aleksey. From aleksey.shipilev at oracle.com Mon Mar 3 04:07:18 2014 From: aleksey.shipilev at oracle.com (Aleksey Shipilev) Date: Mon, 03 Mar 2014 16:07:18 +0400 Subject: RFR (S): 8033380: Experimental VM flag to enforce access atomicity In-Reply-To: <52FA032E.4000206@oracle.com> References: <52FA032E.4000206@oracle.com> Message-ID: <53147076.5060909@oracle.com> On 02/11/2014 03:02 PM, Aleksey Shipilev wrote: > Please review this small feature meanwhile: > https://bugs.openjdk.java.net/browse/JDK-8033380 > http://cr.openjdk.java.net/~shade/8033380/webrev.02/ Since we are open for integration, Let us get back to this thing as well. Recapping the feedbacks: Roland had OK'ayed. Vladimir K. had OK'ayed C2 parts. I think Igor V. had blessed the C1 parts. Christian Tornqvist has doubts about whether we should commit it. Christian Thalinger had the objection, but retracted it. Marcus L. had agreed this can be pushed into the mainline. Current webrev: http://cr.openjdk.java.net/~shade/8033380/webrev.05/ Current changeset: http://cr.openjdk.java.net/~shade/8033380/8033380.changeset The code was passing the full JPRT cycle two weeks ago, passed the microbenchmark tests back then. It applies cleanly over jdk9/hs-comp now, and builds successfully with Linux x86_64/fastdebug. Thanks, -Aleksey. From roland.westrelin at oracle.com Mon Mar 3 04:21:42 2014 From: roland.westrelin at oracle.com (Roland Westrelin) Date: Mon, 3 Mar 2014 13:21:42 +0100 Subject: RFR(S): 8035841: assert(dp_src->tag() == dp_dst->tag()) failed: should be same tags 1 != 0 at ciMethodData.cpp:90 In-Reply-To: <5310DD74.1000604@oracle.com> References: <589EDF81-5165-4970-B8DB-E3FC37222672@oracle.com> <5310DD74.1000604@oracle.com> Message-ID: <67FBC1B2-4334-43D8-8DDF-896E9FCD54D7@oracle.com> Hi Vladimir, Thanks for reviewing that change. >> In ciMethodData.cpp: when the ciMethodData is loaded, the code walks over the traps in the extra data to translate their Method into a ciMethod. There can be new traps added as this is happening so the code that walks over the traps should iterate over the ciMethodData copy of the profile data. Because of concurrent updates, the assert is incorrect. > > Load_data() use Copy::disjoint_words() to get snapshot of all data (int total_size = _data_size + _extra_data_size;). 
Whatever we add after that concurrently should not be taking into account. Can you do that, process only _extra_data_size extra data? As I understand _extra_data_size takes into account all extra data entries, including the ones that are not yet used and the arg info entries at the end of the MDO. So I?m not sure I understand what you?re proposing. > I think load_extra_data() should get extra_data_base(), etc. from ciMethodData copy: > > 81 void ciMethodData::load_extra_data() { > 82 MethodData* mdo = get_MethodData(); > 83 > 84 // speculative trap entries also hold a pointer to a Method so need to be translated > 85 DataLayout* dp_src = mdo->extra_data_base(); > 86 DataLayout* end_src = mdo->extra_data_limit(); > 87 DataLayout* dp_dst = extra_data_base(); Are you saying that because we make a copy of the MDO we don?t need to read the references to translate from the MDO but we can read them from the copy and then overwrite them? I followed the pattern that is used elsewhere: read from the MDO the entries that need to be translated. >> In methodData.cpp: I had to remove the asserts because they are incorrect in case of concurrent updates as well. Also, the test that checks whether there is room for a speculative trap is broken in case of concurrent updates: the intent of next_extra(dp) is to check the next cell but if dp is allocated to a speculative trap concurrently it checks 2 cells from the current cell. Also, next_extra(dp)->tag() != DataLayout::no_tag doesn?t mean there?s no more space because it may have been allocated to some other trap concurrently and there may be more free space after. > > create_if_missing is true only during deoptimization so performance is not important. So can we do update under a lock? > > Concurrency will screw up you in one or an other way if you don't use lock. That sounds more reasonable. I?ll do that. Roland. > > Thanks, > Vladimir > >> >> http://cr.openjdk.java.net/~roland/8035841/webrev.00/ >> >> Roland. >> >> From albert.noll at oracle.com Mon Mar 3 04:25:12 2014 From: albert.noll at oracle.com (Albert) Date: Mon, 03 Mar 2014 13:25:12 +0100 Subject: [9] RFR(XXS): 8036091: compiler/membars/DekkerTest.java fails with -XX:CICompilerCount=1 In-Reply-To: <5314618E.5050607@oracle.com> References: <53143E98.5080102@oracle.com> <04D12456-BFAC-4C44-AAEF-BF52CFF73F04@oracle.com> <53145950.7070903@oracle.com> <53145B28.7040109@oracle.com> <53145B8B.6060306@oracle.com> <5314618E.5050607@oracle.com> Message-ID: <531474A8.3090608@oracle.com> Hi Aleksey, thanks for your feedback. I just want to make clear that I do not have a strong opinion on this. I understand your arguments and if it turns out that reverting to the original behavior (silently start a second compiler thread) is the way to go, I will certainly not argue against it. For me the only clean solution that *enforces* and *retains* the meaning of the fag is to enable a compiler thread to grab tasks from both queues (or have a shared queue). Maybe someone can tell why it is implemented as it is? I agree with you that the current approach (report an error) exposes an unnecessary implementation detail of HS. However, the fact that tiered compilation is enabled by default and that tiered compilation is provided by two different compilers is implementation-specific to Hotspot. If someone looks at the code in globals.hpp and reads "Number of compiler threads to run" he/she expects to have 1 compiler thread when starting Hotspot with -XX:CICompilerCount=1. 
I think it is better to report a misconfiguration to the user (HS currently requires at least 2 compiler threads for tiered) than silently introducing a new behavior. I.e., behavior of HS changes from Java 7 to Java 8, since TieredCompilation is enabled by default: java -XX:CICompilerCount=1 (1 compiler thread in Java 7) java -XX:CICompilerCount=1 (2 compiler threads in Java 8) So if, for whatever reason, a customer wants to have a single compiler thread, this behavior change will go unnoticed. I think this is not good. Finally, I want to say that the meaning of the flag is not well specified. No matter how we will proceed, we should provide a precise definition. Best, Albert On 03/03/2014 12:03 PM, Aleksey Shipilev wrote: > Hi Albert, > > But, your original explanation [1] means C1 and C2 are different > compilers, and CICompilerCount=1 could apply to both of them > individually, which means C1 and C2 get one thread each. We can probably > weasel out by saying the option means "*total* number of compiler threads". > > However, that does not help much, because we just annoy users with VM > errors. How would user force a single compiler thread to run? > > Before the CICC enforcement: > a) +Tiered, CICC=1: user silently gets two compiler threads > b) -Tiered, CICC=1: user gets single compiler thread > > After the CICC change: > a) +Tiered, CICC=1: user gets the VM error. If user still wants to run > Tiered then he/she should set CICC=2 and move on. If user wants a single > thread, then we force him/her to disable Tiered. > b) -Tiered, CICC=1: user is opaque about the change > > So, we require users to explicitly segregate the additional > configuration sets depending on +/-Tiered, only to communicate to them > they maybe running two separate compilers which by implementation detail > require two separate threads. I think there is no new and valuable > information to the users in that parlance, only the annoying. I can see > yet another case why would users disable Tiered right away in their > tests, which is not what we want. > > I think we should rollback CICC=1 check to silently mean CICC=2 in case > of Tiered and rework Tiered to use the common compiler thread pool, so > that to remove the substance of this discussion. Is this doable? > Otherwise, I think we need to have a proper CCC in place discussing the > change in the accepted values of a product flag. > > -Aleksey. > > [1] "JDK-8034775 changed the minimum number of compiler threads for > tiered compilation to 2, since ***each compiler*** (C1 and C2) requires > a separate compiler thread." (emphasis is mine) > > On 03/03/2014 02:38 PM, Albert wrote: >> Hi Aleksey, >> >> that is not what the description of the flag says: >> >> product(intx, CICompilerCount, CI_COMPILER_COUNT, >> "Number of compiler threads to run") >> >> Best, >> Albert >> >> >> On 03/03/2014 11:36 AM, Aleksey Shipilev wrote: >>> On 03/03/2014 02:28 PM, Albert wrote: >>>> Silently using 2 instead of 1 compiler thread when tiered is enabled >>>> and -XX:CICompilerCount=1 was the default behavior before >>>> JDK-8034775. I worked on JDK-8034775 and back than it occurred to me >>>> that it is strange that the JVM silently 'overrules' an explicit >>>> command given by the user (namely that he/she wants to use only 1 >>>> compiler thread). >>> I don't find it strange if you follow the notion of two compilers in >>> TieredCompilation. CICompilerCount=1 means "one thread per compiler", >>> which given two compilers means two threads. 
If you really want a single >>> thread in the compiler thread pool, then you should disable one of the >>> compilers. >>> >>> -Aleksey. From roland.westrelin at oracle.com Mon Mar 3 04:29:32 2014 From: roland.westrelin at oracle.com (Roland Westrelin) Date: Mon, 3 Mar 2014 13:29:32 +0100 Subject: RFR (S) 8031818: Experimental VM flag for enforcing safe object construction In-Reply-To: <53146BA6.3000104@oracle.com> References: <52DEF8FB.3020707@oracle.com> <53146BA6.3000104@oracle.com> Message-ID: > Here is the current webrev: > http://cr.openjdk.java.net/~shade/8031818/webrev.02/ That looks good to me. Roland. From aleksey.shipilev at oracle.com Mon Mar 3 04:36:19 2014 From: aleksey.shipilev at oracle.com (Aleksey Shipilev) Date: Mon, 03 Mar 2014 16:36:19 +0400 Subject: RFR (S) 8031818: Experimental VM flag for enforcing safe object construction In-Reply-To: References: <52DEF8FB.3020707@oracle.com> <53146BA6.3000104@oracle.com> Message-ID: <53147743.1040804@oracle.com> On 03/03/2014 04:29 PM, Roland Westrelin wrote: >> Here is the current webrev: >> http://cr.openjdk.java.net/~shade/8031818/webrev.02/ > > That looks good to me. > > Roland. Thanks Roland, I updated the changeset citing you as the Reviewer: http://cr.openjdk.java.net/~shade/8031818/8031818.changeset Vladimir I. volunteered to sponsor this. -Aleksey. From aleksey.shipilev at oracle.com Mon Mar 3 04:41:11 2014 From: aleksey.shipilev at oracle.com (Aleksey Shipilev) Date: Mon, 03 Mar 2014 16:41:11 +0400 Subject: [9] RFR(XXS): 8036091: compiler/membars/DekkerTest.java fails with -XX:CICompilerCount=1 In-Reply-To: <531474A8.3090608@oracle.com> References: <53143E98.5080102@oracle.com> <04D12456-BFAC-4C44-AAEF-BF52CFF73F04@oracle.com> <53145950.7070903@oracle.com> <53145B28.7040109@oracle.com> <53145B8B.6060306@oracle.com> <5314618E.5050607@oracle.com> <531474A8.3090608@oracle.com> Message-ID: <53147867.1020401@oracle.com> On 03/03/2014 04:25 PM, Albert wrote: > I agree with you that the current approach (report an error) exposes > an unnecessary implementation detail of HS. However, the fact that > tiered compilation is enabled by default and that tiered compilation > is provided by two different compilers is implementation-specific to > Hotspot. If someone looks at the code in globals.hpp and reads > "Number of compiler threads to run" he/she expects to have 1 > compiler thread when starting Hotspot with -XX:CICompilerCount=1. I understand this as the purity argument, and I do think we need to make CICC as non-ambiguous and non-surprising as possible. My point is that both solutions (keeping the implicit CICC for Tiered, or report VM error) are bad in users' eyes. For example, it will require me to treat Tiered separately in most of my benchmarking/analysis scripts -- that is why I feel strongly about this whole thing. Thanks, -Aleksey. 
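[For illustration: one way to see which compiler-thread count the VM actually ended up with is to query the flag at runtime, for example with the sketch below; this assumes the VM reflects any ergonomic adjustment back into the flag value, which is worth verifying.]

    import com.sun.management.HotSpotDiagnosticMXBean;
    import java.lang.management.ManagementFactory;

    public class ShowCICompilerCount {
        public static void main(String[] args) {
            HotSpotDiagnosticMXBean hs =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            // Prints the effective CICompilerCount (and its origin), which is how a
            // silent 1 -> 2 adjustment under tiered compilation could be observed.
            System.out.println(hs.getVMOption("CICompilerCount"));
        }
    }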
From aleksey.shipilev at oracle.com Mon Mar 3 04:42:37 2014 From: aleksey.shipilev at oracle.com (Aleksey Shipilev) Date: Mon, 03 Mar 2014 16:42:37 +0400 Subject: [9] RFR(XXS): 8036091: compiler/membars/DekkerTest.java fails with -XX:CICompilerCount=1 In-Reply-To: <531474A8.3090608@oracle.com> References: <53143E98.5080102@oracle.com> <04D12456-BFAC-4C44-AAEF-BF52CFF73F04@oracle.com> <53145950.7070903@oracle.com> <53145B28.7040109@oracle.com> <53145B8B.6060306@oracle.com> <5314618E.5050607@oracle.com> <531474A8.3090608@oracle.com> Message-ID: <531478BD.1080806@oracle.com> On 03/03/2014 04:25 PM, Albert wrote: > For me the only clean solution that *enforces* and *retains* the > meaning of the flag is to enable a compiler thread to grab tasks from > both queues (or have a shared queue). Maybe someone can tell why it > is implemented as it is? I agree, this seems to be only good solution for not-that-familiar-with-HS-tiered-arch guy like me. -Aleksey. From albert.noll at oracle.com Mon Mar 3 04:47:58 2014 From: albert.noll at oracle.com (Albert) Date: Mon, 03 Mar 2014 13:47:58 +0100 Subject: [9] RFR(XXS): 8036091: compiler/membars/DekkerTest.java fails with -XX:CICompilerCount=1 In-Reply-To: <531478BD.1080806@oracle.com> References: <53143E98.5080102@oracle.com> <04D12456-BFAC-4C44-AAEF-BF52CFF73F04@oracle.com> <53145950.7070903@oracle.com> <53145B28.7040109@oracle.com> <53145B8B.6060306@oracle.com> <5314618E.5050607@oracle.com> <531474A8.3090608@oracle.com> <531478BD.1080806@oracle.com> Message-ID: <531479FE.7080005@oracle.com> Hi Aleksey, Vladimir K. reviewed JDK-8034775, let's see what he thinks. If we decide to go for the clean solution, I could make it work. Best, Albert On 03/03/2014 01:42 PM, Aleksey Shipilev wrote: > On 03/03/2014 04:25 PM, Albert wrote: >> For me the only clean solution that *enforces* and *retains* the >> meaning of the flag is to enable a compiler thread to grab tasks from >> both queues (or have a shared queue). Maybe someone can tell why it >> is implemented as it is? > I agree, this seems to be only good solution for > not-that-familiar-with-HS-tiered-arch guy like me. > > -Aleksey. From aleksey.shipilev at oracle.com Mon Mar 3 04:49:28 2014 From: aleksey.shipilev at oracle.com (Aleksey Shipilev) Date: Mon, 03 Mar 2014 16:49:28 +0400 Subject: [9] RFR(XXS): 8036091: compiler/membars/DekkerTest.java fails with -XX:CICompilerCount=1 In-Reply-To: <531479FE.7080005@oracle.com> References: <53143E98.5080102@oracle.com> <04D12456-BFAC-4C44-AAEF-BF52CFF73F04@oracle.com> <53145950.7070903@oracle.com> <53145B28.7040109@oracle.com> <53145B8B.6060306@oracle.com> <5314618E.5050607@oracle.com> <531474A8.3090608@oracle.com> <531478BD.1080806@oracle.com> <531479FE.7080005@oracle.com> Message-ID: <53147A58.8@oracle.com> Thanks Albert! -Aleksey. P.S. Serves me right for not paying attention to the original issue thinking it only covers the negative values. On 03/03/2014 04:47 PM, Albert wrote: > Hi Aleksey, > > Vladimir K. reviewed JDK-8034775, let's see what he thinks. If we decide > to go for the clean 
> > Best, > Albert > > On 03/03/2014 01:42 PM, Aleksey Shipilev wrote: >> On 03/03/2014 04:25 PM, Albert wrote: >>> For me the only clean solution that *enforces* and *retains* the >>> meaning of the flag is to enable a compiler thread to grab tasks from >>> both queues (or have a shared queue). Maybe someone can tell why it >>> is implemented as it is? >> I agree, this seems to be only good solution for >> not-that-familiar-with-HS-tiered-arch guy like me. >> >> -Aleksey. > From vladimir.kozlov at oracle.com Mon Mar 3 11:50:26 2014 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 03 Mar 2014 11:50:26 -0800 Subject: RFR (S): 8033380: Experimental VM flag to enforce access atomicity In-Reply-To: <53147076.5060909@oracle.com> References: <52FA032E.4000206@oracle.com> <53147076.5060909@oracle.com> Message-ID: <5314DD02.5080609@oracle.com> If there is no objection this time you can ask Vladimir I. to push it. I am fine with it. Thanks, Vladimir On 3/3/14 4:07 AM, Aleksey Shipilev wrote: > On 02/11/2014 03:02 PM, Aleksey Shipilev wrote: >> Please review this small feature meanwhile: >> https://bugs.openjdk.java.net/browse/JDK-8033380 >> http://cr.openjdk.java.net/~shade/8033380/webrev.02/ > > Since we are open for integration, Let us get back to this thing as well. > > Recapping the feedbacks: Roland had OK'ayed. Vladimir K. had OK'ayed C2 > parts. I think Igor V. had blessed the C1 parts. Christian Tornqvist has > doubts about whether we should commit it. Christian Thalinger had the > objection, but retracted it. Marcus L. had agreed this can be pushed > into the mainline. > > Current webrev: > http://cr.openjdk.java.net/~shade/8033380/webrev.05/ > > Current changeset: > http://cr.openjdk.java.net/~shade/8033380/8033380.changeset > > The code was passing the full JPRT cycle two weeks ago, passed the > microbenchmark tests back then. It applies cleanly over jdk9/hs-comp > now, and builds successfully with Linux x86_64/fastdebug. > > Thanks, > -Aleksey. > From vladimir.kozlov at oracle.com Mon Mar 3 12:03:03 2014 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 03 Mar 2014 12:03:03 -0800 Subject: RFR(S): 8035841: assert(dp_src->tag() == dp_dst->tag()) failed: should be same tags 1 != 0 at ciMethodData.cpp:90 In-Reply-To: <67FBC1B2-4334-43D8-8DDF-896E9FCD54D7@oracle.com> References: <589EDF81-5165-4970-B8DB-E3FC37222672@oracle.com> <5310DD74.1000604@oracle.com> <67FBC1B2-4334-43D8-8DDF-896E9FCD54D7@oracle.com> Message-ID: <5314DFF7.8080209@oracle.com> On 3/3/14 4:21 AM, Roland Westrelin wrote: > Hi Vladimir, > > Thanks for reviewing that change. > >>> In ciMethodData.cpp: when the ciMethodData is loaded, the code walks over the traps in the extra data to translate their Method into a ciMethod. There can be new traps added as this is happening so the code that walks over the traps should iterate over the ciMethodData copy of the profile data. Because of concurrent updates, the assert is incorrect. >> >> Load_data() use Copy::disjoint_words() to get snapshot of all data (int total_size = _data_size + _extra_data_size;). Whatever we add after that concurrently should not be taking into account. Can you do that, process only _extra_data_size extra data? > > As I understand _extra_data_size takes into account all extra data entries, including the ones that are not yet used and the arg info entries at the end of the MDO. So I?m not sure I understand what you?re proposing. You are right it is area reserved during MDO creation. 
> >> I think load_extra_data() should get extra_data_base(), etc. from ciMethodData copy: >> >> 81 void ciMethodData::load_extra_data() { >> 82 MethodData* mdo = get_MethodData(); >> 83 >> 84 // speculative trap entries also hold a pointer to a Method so need to be translated >> 85 DataLayout* dp_src = mdo->extra_data_base(); >> 86 DataLayout* end_src = mdo->extra_data_limit(); >> 87 DataLayout* dp_dst = extra_data_base(); > > Are you saying that because we make a copy of the MDO we don?t need to read the references to translate from the MDO but we can read them from the copy and then overwrite them? > I followed the pattern that is used elsewhere: read from the MDO the entries that need to be translated. Ignore this my comment. It was stupid. ciMethodData.cpp changes are fine. Thanks, Vladimir > >>> In methodData.cpp: I had to remove the asserts because they are incorrect in case of concurrent updates as well. Also, the test that checks whether there is room for a speculative trap is broken in case of concurrent updates: the intent of next_extra(dp) is to check the next cell but if dp is allocated to a speculative trap concurrently it checks 2 cells from the current cell. Also, next_extra(dp)->tag() != DataLayout::no_tag doesn?t mean there?s no more space because it may have been allocated to some other trap concurrently and there may be more free space after. >> >> create_if_missing is true only during deoptimization so performance is not important. So can we do update under a lock? >> >> Concurrency will screw up you in one or an other way if you don't use lock. > > That sounds more reasonable. I?ll do that. > > Roland. > >> >> Thanks, >> Vladimir >> >>> >>> http://cr.openjdk.java.net/~roland/8035841/webrev.00/ >>> >>> Roland. >>> >>> > From vladimir.kozlov at oracle.com Mon Mar 3 12:31:46 2014 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 03 Mar 2014 12:31:46 -0800 Subject: [9] RFR(XXS): 8036091: compiler/membars/DekkerTest.java fails with -XX:CICompilerCount=1 In-Reply-To: <53147A58.8@oracle.com> References: <53143E98.5080102@oracle.com> <04D12456-BFAC-4C44-AAEF-BF52CFF73F04@oracle.com> <53145950.7070903@oracle.com> <53145B28.7040109@oracle.com> <53145B8B.6060306@oracle.com> <5314618E.5050607@oracle.com> <531474A8.3090608@oracle.com> <531478BD.1080806@oracle.com> <531479FE.7080005@oracle.com> <53147A58.8@oracle.com> Message-ID: <5314E6B2.70508@oracle.com> Albert, You need to add -XX:-TieredCompilation to test's commands. I forgot it when I added the test. Removing -XX:CICompilerCount=1 is wrong because sequence of C2 compilations will affect the reproduction of the problem. CICompilerCount=1 serializes compilations and makes compilation sequence more deterministic. Even with -Xbatch tests which execute several threads the compilation is not deterministic because compilation requests from different java threads will be served by different compiler threads. CICompilerCount, by definition and by the code we used in Tiered, is total number of compilers threads, C1+C2. You can't interpret it differently for =1 case. Albert, make sure to allow CICompilerCount=1 with Tiered compilation when only C1 is used (TieredStopAtLevel < 4). Thanks, Vladimir On 3/3/14 4:49 AM, Aleksey Shipilev wrote: > Thanks Albert! > > -Aleksey. > > P.S. Serves me right for not paying attention to the original issue > thinking it only covers the negative values. > > On 03/03/2014 04:47 PM, Albert wrote: >> Hi Aleksey, >> >> Vladimir K. reviewed JDK-8034775, let's see what he thinks. 
If we decide >> to go for the clean >> solution, I could make it work. >> >> Best, >> Albert >> >> On 03/03/2014 01:42 PM, Aleksey Shipilev wrote: >>> On 03/03/2014 04:25 PM, Albert wrote: >>>> For me the only clean solution that *enforces* and *retains* the >>>> meaning of the flag is to enable a compiler thread to grab tasks from >>>> both queues (or have a shared queue). Maybe someone can tell why it >>>> is implemented as it is? >>> I agree, this seems to be only good solution for >>> not-that-familiar-with-HS-tiered-arch guy like me. >>> >>> -Aleksey. >> > From christian.thalinger at oracle.com Mon Mar 3 14:04:15 2014 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Mon, 3 Mar 2014 14:04:15 -0800 Subject: RFR (XS): 8035887: VM crashes trying to force inlining the recursive call In-Reply-To: <5310C3D6.3090903@oracle.com> References: <530F714A.7030805@oracle.com> <530F735A.80102@oracle.com> <530F7B31.5010408@oracle.com> <75C9710B-6A43-47EF-8E92-D40F5B2A62E1@oracle.com> <5310C3D6.3090903@oracle.com> Message-ID: Looks good. On Feb 28, 2014, at 9:13 AM, Vladimir Ivanov wrote: > Chris, David, thanks for review. > > Yes, that's an overlook on my side. > > Updated webrev: > http://cr.openjdk.java.net/~vlivanov/8035887/webrev.02/ > > I use empty string as a default, because NULL has special meaning for print_inlining. I hope we'll clean this up some day... > > Best regards, > Vladimir Ivanov > > On 2/28/14 8:22 PM, David Chase wrote: >> >> On 2014-02-27, at 1:01 PM, Christian Thalinger wrote: >> >>> ! const char* msg; >>> + if (callee->force_inline()) msg = "force inline by annotation"; >>> + if (callee->should_inline()) msg = "force inline by CompileOracle"; >>> + print_inlining(callee, msg); >>> >>> We shouldn?t leave msg uninitialized. I know that it is not a problem with the code as it is now but it might be in the future. >> >> And Parfait will complain right now. >> >> From roland.westrelin at oracle.com Mon Mar 3 14:12:29 2014 From: roland.westrelin at oracle.com (Roland Westrelin) Date: Mon, 3 Mar 2014 23:12:29 +0100 Subject: RFR(M): 8036146: make CPP interpreter build again Message-ID: I?m working on a fix that required some changes to the CPP interpreter so I got it building again on sparc and x86: http://cr.openjdk.java.net/~roland/8036146/webrev.00/ But it doesn?t run (java -version crashes). Should I push the build fixes anyway? Roland. From vladimir.kozlov at oracle.com Mon Mar 3 14:28:04 2014 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 03 Mar 2014 14:28:04 -0800 Subject: RFR(M): 8036146: make CPP interpreter build again In-Reply-To: References: Message-ID: <531501F4.1040505@oracle.com> Changes look reasonable. I think you can push this fix. Did you file a bug for run failure? thanks, Vladimir On 3/3/14 2:12 PM, Roland Westrelin wrote: > I?m working on a fix that required some changes to the CPP interpreter so I got it building again on sparc and x86: > > http://cr.openjdk.java.net/~roland/8036146/webrev.00/ > > But it doesn?t run (java -version crashes). Should I push the build fixes anyway? > > Roland. 
> From vladimir.x.ivanov at oracle.com Mon Mar 3 15:24:38 2014 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Tue, 04 Mar 2014 03:24:38 +0400 Subject: [8] RFR (XS): 8036100: Default method returns true for a while, and then returns false Message-ID: <53150F36.1010004@oracle.com> http://cr.openjdk.java.net/~vlivanov/8036100/webrev.01/ https://bugs.openjdk.java.net/browse/JDK-8036100 CHA still doesn't handle default methods right. For the following hierarchy and C1 as a context: interface I1 { default m() {} } interface I2 extends I1 { default m() {} } class C1 implements I1 {} class C2 extends C1 implements I2 {} CHA reports I1.m as a unique method. However C2.m resolves to I2.m. The fix for 8 is to disable CHA for default methods. Proper fix will go into 8u and 9. It is enough to add the check on root_m, because root_m should be non-abstract (see ciMethod::resolve_invoke). So, it's either (1) a default or (2) an instance method. (1) is covered by the fix and (2) isn't affected by default methods, because concrete method always hides all default methods in the hierarchy. Testing: regression test, vm.quick.testlist, vm.adhoc test set Best regards, Vladimir Ivanov From vladimir.kozlov at oracle.com Mon Mar 3 15:49:03 2014 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 03 Mar 2014 15:49:03 -0800 Subject: [8] RFR (XS): 8036100: Default method returns true for a while, and then returns false In-Reply-To: <53150F36.1010004@oracle.com> References: <53150F36.1010004@oracle.com> Message-ID: <531514EF.20001@oracle.com> Looks good. Thanks, Vladimir On 3/3/14 3:24 PM, Vladimir Ivanov wrote: > http://cr.openjdk.java.net/~vlivanov/8036100/webrev.01/ > https://bugs.openjdk.java.net/browse/JDK-8036100 > > CHA still doesn't handle default methods right. > > For the following hierarchy and C1 as a context: > interface I1 { default m() {} } > interface I2 extends I1 { default m() {} } > > class C1 implements I1 {} > class C2 extends C1 implements I2 {} > > CHA reports I1.m as a unique method. However C2.m resolves to I2.m. > > The fix for 8 is to disable CHA for default methods. > Proper fix will go into 8u and 9. > > It is enough to add the check on root_m, because root_m should be > non-abstract (see ciMethod::resolve_invoke). So, it's either (1) a > default or (2) an instance method. (1) is covered by the fix and (2) > isn't affected by default methods, because concrete method always hides > all default methods in the hierarchy. > > Testing: regression test, vm.quick.testlist, vm.adhoc test set > > Best regards, > Vladimir Ivanov From john.r.rose at oracle.com Mon Mar 3 15:52:25 2014 From: john.r.rose at oracle.com (John Rose) Date: Mon, 3 Mar 2014 15:52:25 -0800 Subject: [8] RFR (XS): 8036100: Default method returns true for a while, and then returns false In-Reply-To: <53150F36.1010004@oracle.com> References: <53150F36.1010004@oracle.com> Message-ID: <86B75F52-CF2A-4EC6-9076-E32031E534BE@oracle.com> Reviewed. ? 
John On Mar 3, 2014, at 3:24 PM, Vladimir Ivanov wrote: > http://cr.openjdk.java.net/~vlivanov/8036100/webrev.01/ > https://bugs.openjdk.java.net/browse/JDK-8036100 From vladimir.x.ivanov at oracle.com Mon Mar 3 16:25:37 2014 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Tue, 04 Mar 2014 04:25:37 +0400 Subject: [8] RFR (XS): 8036100: Default method returns true for a while, and then returns false In-Reply-To: <86B75F52-CF2A-4EC6-9076-E32031E534BE@oracle.com> References: <53150F36.1010004@oracle.com> <86B75F52-CF2A-4EC6-9076-E32031E534BE@oracle.com> Message-ID: <53151D81.3030701@oracle.com> Vladimir, John, thanks for the review. Best regards, Vladimir Ivanov On 3/4/14 3:52 AM, John Rose wrote: > Reviewed. ? John > > On Mar 3, 2014, at 3:24 PM, Vladimir Ivanov wrote: > >> http://cr.openjdk.java.net/~vlivanov/8036100/webrev.01/ >> https://bugs.openjdk.java.net/browse/JDK-8036100 > From vladimir.x.ivanov at oracle.com Mon Mar 3 17:07:15 2014 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Tue, 04 Mar 2014 05:07:15 +0400 Subject: RFR (XS): 8035887: VM crashes trying to force inlining the recursive call In-Reply-To: References: <530F714A.7030805@oracle.com> <530F735A.80102@oracle.com> <530F7B31.5010408@oracle.com> <75C9710B-6A43-47EF-8E92-D40F5B2A62E1@oracle.com> <5310C3D6.3090903@oracle.com> Message-ID: <53152743.1040204@oracle.com> Chris, thank you. Best regards, Vladimir Ivanov On 3/4/14 2:04 AM, Christian Thalinger wrote: > Looks good. > > On Feb 28, 2014, at 9:13 AM, Vladimir Ivanov wrote: > >> Chris, David, thanks for review. >> >> Yes, that's an overlook on my side. >> >> Updated webrev: >> http://cr.openjdk.java.net/~vlivanov/8035887/webrev.02/ >> >> I use empty string as a default, because NULL has special meaning for print_inlining. I hope we'll clean this up some day... >> >> Best regards, >> Vladimir Ivanov >> >> On 2/28/14 8:22 PM, David Chase wrote: >>> >>> On 2014-02-27, at 1:01 PM, Christian Thalinger wrote: >>> >>>> ! const char* msg; >>>> + if (callee->force_inline()) msg = "force inline by annotation"; >>>> + if (callee->should_inline()) msg = "force inline by CompileOracle"; >>>> + print_inlining(callee, msg); >>>> >>>> We shouldn?t leave msg uninitialized. I know that it is not a problem with the code as it is now but it might be in the future. >>> >>> And Parfait will complain right now. >>> >>> > From albert.noll at oracle.com Mon Mar 3 23:29:31 2014 From: albert.noll at oracle.com (Albert) Date: Tue, 04 Mar 2014 08:29:31 +0100 Subject: [9] RFR(XXS): 8036091: compiler/membars/DekkerTest.java fails with -XX:CICompilerCount=1 In-Reply-To: <5314E6B2.70508@oracle.com> References: <53143E98.5080102@oracle.com> <04D12456-BFAC-4C44-AAEF-BF52CFF73F04@oracle.com> <53145950.7070903@oracle.com> <53145B28.7040109@oracle.com> <53145B8B.6060306@oracle.com> <5314618E.5050607@oracle.com> <531474A8.3090608@oracle.com> <531478BD.1080806@oracle.com> <531479FE.7080005@oracle.com> <53147A58.8@oracle.com> <5314E6B2.70508@oracle.com> Message-ID: <531580DB.5050209@oracle.com> Hi Vladimir, thanks for your review. I did your proposed changes. Here is the new webrev: http://cr.openjdk.java.net/~anoll/8036091/webrev.01/ Best, Albert On 03/03/2014 09:31 PM, Vladimir Kozlov wrote: > Albert, > > You need to add -XX:-TieredCompilation to test's commands. I forgot it > when I added the test. Removing -XX:CICompilerCount=1 is wrong because > sequence of C2 compilations will affect the reproduction of the > problem. 
CICompilerCount=1 serializes compilations and makes > compilation sequence more deterministic. Even with -Xbatch tests which > execute several threads the compilation is not deterministic because > compilation requests from different java threads will be served by > different compiler threads. > > CICompilerCount, by definition and by the code we used in Tiered, is > total number of compilers threads, C1+C2. You can't interpret it > differently for =1 case. > > Albert, make sure to allow CICompilerCount=1 with Tiered compilation > when only C1 is used (TieredStopAtLevel < 4). > > Thanks, > Vladimir > > On 3/3/14 4:49 AM, Aleksey Shipilev wrote: >> Thanks Albert! >> >> -Aleksey. >> >> P.S. Serves me right for not paying attention to the original issue >> thinking it only covers the negative values. >> >> On 03/03/2014 04:47 PM, Albert wrote: >>> Hi Aleksey, >>> >>> Vladimir K. reviewed JDK-8034775, let's see what he thinks. If we >>> decide >>> to go for the clean >>> solution, I could make it work. >>> >>> Best, >>> Albert >>> >>> On 03/03/2014 01:42 PM, Aleksey Shipilev wrote: >>>> On 03/03/2014 04:25 PM, Albert wrote: >>>>> For me the only clean solution that *enforces* and *retains* the >>>>> meaning of the flag is to enable a compiler thread to grab tasks from >>>>> both queues (or have a shared queue). Maybe someone can tell why it >>>>> is implemented as it is? >>>> I agree, this seems to be only good solution for >>>> not-that-familiar-with-HS-tiered-arch guy like me. >>>> >>>> -Aleksey. >>> >> From albert.noll at oracle.com Tue Mar 4 00:13:11 2014 From: albert.noll at oracle.com (Albert) Date: Tue, 04 Mar 2014 09:13:11 +0100 Subject: [9] RFR(XXS): 8036092: [TESTBUG] compiler/uncommontrap/TestSpecTrapClassUnloading.java fails with: Unrecognized VM option 'UseTypeSpeculation' Message-ID: <53158B17.7050204@oracle.com> Hi all, could I get reviews for this small patch? Bug: https://bugs.openjdk.java.net/browse/JDK-8036092 Problem: -XX:+UseTypeSpeculation is a C2 flag, which is not known in a client VM. Solution: add -XX:+IgnoreUnrecognizedVMOptions to @run main/othervm Testing: Failing test case Webrev: http://cr.openjdk.java.net/~anoll/8036092/webrev.00/ Many thanks in advance, Albert -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/attachments/20140304/6678ea66/attachment.html From vladimir.x.ivanov at oracle.com Tue Mar 4 02:18:47 2014 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Tue, 04 Mar 2014 14:18:47 +0400 Subject: RFR (XS): 8035887: VM crashes trying to force inlining the recursive call In-Reply-To: <5310C9CE.2030000@oracle.com> References: <530F714A.7030805@oracle.com> <530F759A.9000506@oracle.com> <5310C715.6090309@oracle.com> <5310C9CE.2030000@oracle.com> Message-ID: <5315A887.6070406@oracle.com> Thanks, Vladimir! Best regards, Vladimir Ivanov On 2/28/14 9:39 PM, Vladimir Kozlov wrote: > On 2/28/14 9:27 AM, Vladimir Ivanov wrote: >> Vladimir, thanks for review! >> >> With the addition of recursive depth check for force_inline case, the >> only difference should be compiled lambda form >> case. But I haven't been able to come up with a test case which >> demonstrates the issue. >> >> Strangely, stack overflow in compiler thread Chris fixed a while back >> (8011138) was observed only in C2. >> >> I'd like to keep this fix as is for now. I'll spend more time >> investigating compiled lambda form recursive inlining >> behavior in C1, and file a bug if necessary. > > Okay. 
Webrev.02 seems fine. > > Thanks, > Vladimir K > >> >> Best regards, >> Vladimir Ivanov >> >> On 2/27/14 9:27 PM, Vladimir Kozlov wrote: >>> Vladimir, >>> >>> I think C1 still missing check for recursive depth in case of >>> force_inline(). In C2 recursion check is done for all types of inlining. >>> Yes, in case of lambda inlining it needs to check receiver. Only then C1 >>> and C2 will match. >>> >>> Thanks, >>> Vladimir K >>> >>> On 2/27/14 9:09 AM, Vladimir Ivanov wrote: >>>> http://cr.openjdk.java.net/~vlivanov/8035887/webrev.00/ >>>> https://bugs.openjdk.java.net/browse/JDK-8035887 >>>> >>>> 4 lines changed: 3 ins; 0 del; 1 mod >>>> >>>> C1 overflows the stack when it tries to inline a recursive call of a >>>> method which is forced for inlining by CompilerOracle. >>>> >>>> The problem is that C1 doesn't check inlining depth for methods forced >>>> for inlining by CompilerOracle. >>>> >>>> The fix is to add missing checks. I added 2 checks (total depth and >>>> recursive depth). The former is to avoid a situation >>>> (very unlikely) when a long chain of methods, which are forced for >>>> inlining, overflows compiler stack. The latter is to >>>> unify behavior between C1 & C2. >>>> >>>> No regression test is added because it can take very long time to >>>> provoke the crash in some configurations. >>>> >>>> Testing: failing test >>>> >>>> Best regards, >>>> Vladimir Ivanov From roland.westrelin at oracle.com Tue Mar 4 02:27:10 2014 From: roland.westrelin at oracle.com (Roland Westrelin) Date: Tue, 4 Mar 2014 11:27:10 +0100 Subject: [9] RFR(XXS): 8036092: [TESTBUG] compiler/uncommontrap/TestSpecTrapClassUnloading.java fails with: Unrecognized VM option 'UseTypeSpeculation' In-Reply-To: <53158B17.7050204@oracle.com> References: <53158B17.7050204@oracle.com> Message-ID: > http://cr.openjdk.java.net/~anoll/8036092/webrev.00/ That looks good to me. Thanks for fixing that. Roland. From albert.noll at oracle.com Tue Mar 4 02:31:35 2014 From: albert.noll at oracle.com (Albert) Date: Tue, 04 Mar 2014 11:31:35 +0100 Subject: [9] RFR(XXS): 8036092: [TESTBUG] compiler/uncommontrap/TestSpecTrapClassUnloading.java fails with: Unrecognized VM option 'UseTypeSpeculation' In-Reply-To: References: <53158B17.7050204@oracle.com> Message-ID: <5315AB87.9060602@oracle.com> Thank you for looking at this, Roland. Best, Albert On 03/04/2014 11:27 AM, Roland Westrelin wrote: >> http://cr.openjdk.java.net/~anoll/8036092/webrev.00/ > That looks good to me. Thanks for fixing that. > > Roland. From roland.westrelin at oracle.com Tue Mar 4 03:12:46 2014 From: roland.westrelin at oracle.com (Roland Westrelin) Date: Tue, 4 Mar 2014 12:12:46 +0100 Subject: RFR(S): 8035841: assert(dp_src->tag() == dp_dst->tag()) failed: should be same tags 1 != 0 at ciMethodData.cpp:90 In-Reply-To: <5314DFF7.8080209@oracle.com> References: <589EDF81-5165-4970-B8DB-E3FC37222672@oracle.com> <5310DD74.1000604@oracle.com> <67FBC1B2-4334-43D8-8DDF-896E9FCD54D7@oracle.com> <5314DFF7.8080209@oracle.com> Message-ID: Thanks Vladimir. Here is a new webrev: http://cr.openjdk.java.net/~roland/8035841/webrev.01/ Roland. On Mar 3, 2014, at 9:03 PM, Vladimir Kozlov wrote: > On 3/3/14 4:21 AM, Roland Westrelin wrote: >> Hi Vladimir, >> >> Thanks for reviewing that change. >> >>>> In ciMethodData.cpp: when the ciMethodData is loaded, the code walks over the traps in the extra data to translate their Method into a ciMethod. 
There can be new traps added as this is happening so the code that walks over the traps should iterate over the ciMethodData copy of the profile data. Because of concurrent updates, the assert is incorrect. >>> >>> Load_data() use Copy::disjoint_words() to get snapshot of all data (int total_size = _data_size + _extra_data_size;). Whatever we add after that concurrently should not be taking into account. Can you do that, process only _extra_data_size extra data? >> >> As I understand _extra_data_size takes into account all extra data entries, including the ones that are not yet used and the arg info entries at the end of the MDO. So I?m not sure I understand what you?re proposing. > > You are right it is area reserved during MDO creation. > >> >>> I think load_extra_data() should get extra_data_base(), etc. from ciMethodData copy: >>> >>> 81 void ciMethodData::load_extra_data() { >>> 82 MethodData* mdo = get_MethodData(); >>> 83 >>> 84 // speculative trap entries also hold a pointer to a Method so need to be translated >>> 85 DataLayout* dp_src = mdo->extra_data_base(); >>> 86 DataLayout* end_src = mdo->extra_data_limit(); >>> 87 DataLayout* dp_dst = extra_data_base(); >> >> Are you saying that because we make a copy of the MDO we don?t need to read the references to translate from the MDO but we can read them from the copy and then overwrite them? >> I followed the pattern that is used elsewhere: read from the MDO the entries that need to be translated. > > Ignore this my comment. It was stupid. ciMethodData.cpp changes are fine. > > Thanks, > Vladimir > >> >>>> In methodData.cpp: I had to remove the asserts because they are incorrect in case of concurrent updates as well. Also, the test that checks whether there is room for a speculative trap is broken in case of concurrent updates: the intent of next_extra(dp) is to check the next cell but if dp is allocated to a speculative trap concurrently it checks 2 cells from the current cell. Also, next_extra(dp)->tag() != DataLayout::no_tag doesn?t mean there?s no more space because it may have been allocated to some other trap concurrently and there may be more free space after. >>> >>> create_if_missing is true only during deoptimization so performance is not important. So can we do update under a lock? >>> >>> Concurrency will screw up you in one or an other way if you don't use lock. >> >> That sounds more reasonable. I?ll do that. > > > >> >> Roland. >> >>> >>> Thanks, >>> Vladimir >>> >>>> >>>> http://cr.openjdk.java.net/~roland/8035841/webrev.00/ >>>> >>>> Roland. >>>> >>>> >> From roland.westrelin at oracle.com Tue Mar 4 03:25:07 2014 From: roland.westrelin at oracle.com (Roland Westrelin) Date: Tue, 4 Mar 2014 12:25:07 +0100 Subject: RFR(M): 8036146: make CPP interpreter build again In-Reply-To: <531501F4.1040505@oracle.com> References: <531501F4.1040505@oracle.com> Message-ID: <458FDF1B-54A5-480A-B711-0CA0DA817622@oracle.com> > Changes look reasonable. Thanks Vladimir. Do I need another review? > I think you can push this fix. Did you file a bug for run failure? I filed: https://bugs.openjdk.java.net/browse/JDK-8036585 CPP interpreter crashes Roland. > > thanks, > Vladimir > > On 3/3/14 2:12 PM, Roland Westrelin wrote: >> I?m working on a fix that required some changes to the CPP interpreter so I got it building again on sparc and x86: >> >> http://cr.openjdk.java.net/~roland/8036146/webrev.00/ >> >> But it doesn?t run (java -version crashes). Should I push the build fixes anyway? >> >> Roland. 
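Coming back to the speculative-trap entries discussed in the 8035841 thread above, the check-then-claim race that the proposed lock closes can be sketched generically. This is plain Java, not HotSpot code, and the class, field and parameter names are made up for the illustration: without the lock, two threads scanning for a free cell can both pick the same one; with it, a claim either finds the entry another thread just created or takes a still-free cell. Since entries are only created on the deoptimization path, holding the lock for the whole lookup is acceptable, which matches the "performance is not important" argument above.

import java.util.concurrent.locks.ReentrantLock;

class ExtraDataTable {
    private final Object[] keys;                    // null marks a free cell
    private final ReentrantLock lock = new ReentrantLock();

    ExtraDataTable(int capacity) { keys = new Object[capacity]; }

    // Returns the index of the cell holding 'key'; when createIfMissing is true,
    // claims a free cell for it. Returns -1 if absent or the table is full.
    int find(Object key, boolean createIfMissing) {
        lock.lock();
        try {
            int free = -1;
            for (int i = 0; i < keys.length; i++) {
                if (keys[i] == null) { if (free < 0) free = i; }
                else if (keys[i].equals(key)) return i;   // already present
            }
            if (createIfMissing && free >= 0) {
                keys[free] = key;                         // claim under the lock
                return free;
            }
            return -1;
        } finally {
            lock.unlock();
        }
    }
}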
>> From volker.simonis at gmail.com Tue Mar 4 05:34:59 2014 From: volker.simonis at gmail.com (Volker Simonis) Date: Tue, 4 Mar 2014 14:34:59 +0100 Subject: RFR(M): 8036146: make CPP interpreter build again In-Reply-To: References: Message-ID: Hi Roland, I suppose you need the following change in bytecodeInterpreter.cpp: ! #if !defined(ZERO) && defined(PPC) because _last_Java_fp isn't defined for sparc and x86. But do you really mean PPC or PPC64? We use '_last_Java_fp' for PPC64 (see src/cpu/ppc/vm/bytecodeInterpreter_ppc.hpp) so this should at leaset read: ! #if !defined(ZERO) && (defined(PPC) || defined(PPC64)) If you don't use the CPP interpreter for your closed PPC32 port you could probably just change the PPC to PPC64. Thanks, Volker On Mon, Mar 3, 2014 at 11:12 PM, Roland Westrelin wrote: > I?m working on a fix that required some changes to the CPP interpreter so I got it building again on sparc and x86: > > http://cr.openjdk.java.net/~roland/8036146/webrev.00/ > > But it doesn?t run (java -version crashes). Should I push the build fixes anyway? > > Roland. From volker.simonis at gmail.com Tue Mar 4 05:44:49 2014 From: volker.simonis at gmail.com (Volker Simonis) Date: Tue, 4 Mar 2014 14:44:49 +0100 Subject: RFR(M): 8036146: make CPP interpreter build again In-Reply-To: References: Message-ID: I was just told that defining PPC64 explicitly on the command line will implicitly define PPC (see src/share/vm/utilities/macros.hpp). So from our PPC64 port perspective your change looks fine as it currently is. You'll probably only want to change PPC to PPC64 if you have the CPP interpreter for your internal PPC32 port and it doesn't support '_last_Java_fp'. Thanks, Volker On Tue, Mar 4, 2014 at 2:34 PM, Volker Simonis wrote: > Hi Roland, > > I suppose you need the following change in bytecodeInterpreter.cpp: > > ! #if !defined(ZERO) && defined(PPC) > > because _last_Java_fp isn't defined for sparc and x86. > > But do you really mean PPC or PPC64? We use '_last_Java_fp' for PPC64 > (see src/cpu/ppc/vm/bytecodeInterpreter_ppc.hpp) so this should at > leaset read: > > ! #if !defined(ZERO) && (defined(PPC) || defined(PPC64)) > > If you don't use the CPP interpreter for your closed PPC32 port you > could probably just change the PPC to PPC64. > > Thanks, > Volker > > On Mon, Mar 3, 2014 at 11:12 PM, Roland Westrelin > wrote: >> I?m working on a fix that required some changes to the CPP interpreter so I got it building again on sparc and x86: >> >> http://cr.openjdk.java.net/~roland/8036146/webrev.00/ >> >> But it doesn?t run (java -version crashes). Should I push the build fixes anyway? >> >> Roland. From roland.westrelin at oracle.com Tue Mar 4 05:55:09 2014 From: roland.westrelin at oracle.com (Roland Westrelin) Date: Tue, 4 Mar 2014 14:55:09 +0100 Subject: RFR(M): 8036146: make CPP interpreter build again In-Reply-To: References: Message-ID: <6E226BDD-BC58-45FD-9186-53EF9948FA23@oracle.com> > I was just told that defining PPC64 explicitly on the command line > will implicitly define PPC (see src/share/vm/utilities/macros.hpp). So > from our PPC64 port perspective your change looks fine as it currently > is. Thanks for taking a look at this, Volker. I?ll leave it as it is then. Roland. 
From vladimir.x.ivanov at oracle.com Tue Mar 4 08:13:57 2014 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Tue, 04 Mar 2014 20:13:57 +0400 Subject: RFR (XS): 8035887: VM crashes trying to force inlining the recursive call In-Reply-To: <5310C715.6090309@oracle.com> References: <530F714A.7030805@oracle.com> <530F759A.9000506@oracle.com> <5310C715.6090309@oracle.com> Message-ID: <5315FBC5.7030503@oracle.com> I'd like to give an update about my recent findings why C1 wasn't affected by 8011138. The culprit is that C1 doesn't do disambiguation of compiled lambda forms based on "receiver" type. So, normal recursive depth check applies here. It limits inlining horizon, but I don't think it's critical for C1. If you disagree, let me know and I'll file a RFE. Best regards, Vladimir Ivanov On 2/28/14 9:27 PM, Vladimir Ivanov wrote: > Vladimir, thanks for review! > > With the addition of recursive depth check for force_inline case, the > only difference should be compiled lambda form case. But I haven't been > able to come up with a test case which demonstrates the issue. > > Strangely, stack overflow in compiler thread Chris fixed a while back > (8011138) was observed only in C2. > > I'd like to keep this fix as is for now. I'll spend more time > investigating compiled lambda form recursive inlining behavior in C1, > and file a bug if necessary. > > Best regards, > Vladimir Ivanov > > On 2/27/14 9:27 PM, Vladimir Kozlov wrote: >> Vladimir, >> >> I think C1 still missing check for recursive depth in case of >> force_inline(). In C2 recursion check is done for all types of inlining. >> Yes, in case of lambda inlining it needs to check receiver. Only then C1 >> and C2 will match. >> >> Thanks, >> Vladimir K >> >> On 2/27/14 9:09 AM, Vladimir Ivanov wrote: >>> http://cr.openjdk.java.net/~vlivanov/8035887/webrev.00/ >>> https://bugs.openjdk.java.net/browse/JDK-8035887 >>> >>> 4 lines changed: 3 ins; 0 del; 1 mod >>> >>> C1 overflows the stack when it tries to inline a recursive call of a >>> method which is forced for inlining by CompilerOracle. >>> >>> The problem is that C1 doesn't check inlining depth for methods forced >>> for inlining by CompilerOracle. >>> >>> The fix is to add missing checks. I added 2 checks (total depth and >>> recursive depth). The former is to avoid a situation >>> (very unlikely) when a long chain of methods, which are forced for >>> inlining, overflows compiler stack. The latter is to >>> unify behavior between C1 & C2. >>> >>> No regression test is added because it can take very long time to >>> provoke the crash in some configurations. >>> >>> Testing: failing test >>> >>> Best regards, >>> Vladimir Ivanov From vladimir.kozlov at oracle.com Tue Mar 4 08:39:43 2014 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 04 Mar 2014 08:39:43 -0800 Subject: [9] RFR(XXS): 8036091: compiler/membars/DekkerTest.java fails with -XX:CICompilerCount=1 In-Reply-To: <531580DB.5050209@oracle.com> References: <53143E98.5080102@oracle.com> <04D12456-BFAC-4C44-AAEF-BF52CFF73F04@oracle.com> <53145950.7070903@oracle.com> <53145B28.7040109@oracle.com> <53145B8B.6060306@oracle.com> <5314618E.5050607@oracle.com> <531474A8.3090608@oracle.com> <531478BD.1080806@oracle.com> <531479FE.7080005@oracle.com> <53147A58.8@oracle.com> <5314E6B2.70508@oracle.com> <531580DB.5050209@oracle.com> Message-ID: <531601CF.1070908@oracle.com> Looks good. Thanks, Vladimir On 3/3/14 11:29 PM, Albert wrote: > Hi Vladimir, > > thanks for your review. I did your proposed changes. 
> > Here is the new webrev: > http://cr.openjdk.java.net/~anoll/8036091/webrev.01/ > > Best, > Albert > > On 03/03/2014 09:31 PM, Vladimir Kozlov wrote: >> Albert, >> >> You need to add -XX:-TieredCompilation to test's commands. I forgot it >> when I added the test. Removing -XX:CICompilerCount=1 is wrong because >> sequence of C2 compilations will affect the reproduction of the >> problem. CICompilerCount=1 serializes compilations and makes >> compilation sequence more deterministic. Even with -Xbatch tests which >> execute several threads the compilation is not deterministic because >> compilation requests from different java threads will be served by >> different compiler threads. >> >> CICompilerCount, by definition and by the code we used in Tiered, is >> total number of compilers threads, C1+C2. You can't interpret it >> differently for =1 case. >> >> Albert, make sure to allow CICompilerCount=1 with Tiered compilation >> when only C1 is used (TieredStopAtLevel < 4). >> >> Thanks, >> Vladimir >> >> On 3/3/14 4:49 AM, Aleksey Shipilev wrote: >>> Thanks Albert! >>> >>> -Aleksey. >>> >>> P.S. Serves me right for not paying attention to the original issue >>> thinking it only covers the negative values. >>> >>> On 03/03/2014 04:47 PM, Albert wrote: >>>> Hi Aleksey, >>>> >>>> Vladimir K. reviewed JDK-8034775, let's see what he thinks. If we >>>> decide >>>> to go for the clean >>>> solution, I could make it work. >>>> >>>> Best, >>>> Albert >>>> >>>> On 03/03/2014 01:42 PM, Aleksey Shipilev wrote: >>>>> On 03/03/2014 04:25 PM, Albert wrote: >>>>>> For me the only clean solution that *enforces* and *retains* the >>>>>> meaning of the flag is to enable a compiler thread to grab tasks from >>>>>> both queues (or have a shared queue). Maybe someone can tell why it >>>>>> is implemented as it is? >>>>> I agree, this seems to be only good solution for >>>>> not-that-familiar-with-HS-tiered-arch guy like me. >>>>> >>>>> -Aleksey. >>>> >>> > From vladimir.kozlov at oracle.com Tue Mar 4 08:40:31 2014 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 04 Mar 2014 08:40:31 -0800 Subject: [9] RFR(XXS): 8036092: [TESTBUG] compiler/uncommontrap/TestSpecTrapClassUnloading.java fails with: Unrecognized VM option 'UseTypeSpeculation' In-Reply-To: <53158B17.7050204@oracle.com> References: <53158B17.7050204@oracle.com> Message-ID: <531601FF.9070802@oracle.com> Good. Vladimir On 3/4/14 12:13 AM, Albert wrote: > Hi all, > > could I get reviews for this small patch? > > Bug: > https://bugs.openjdk.java.net/browse/JDK-8036092 > > Problem: > -XX:+UseTypeSpeculation is a C2 flag, which is not known in a client VM. > > Solution: > add -XX:+IgnoreUnrecognizedVMOptions to @run main/othervm > > Testing: > Failing test case > > Webrev: > http://cr.openjdk.java.net/~anoll/8036092/webrev.00/ > > Many thanks in advance, > Albert From vladimir.kozlov at oracle.com Tue Mar 4 08:44:42 2014 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 04 Mar 2014 08:44:42 -0800 Subject: RFR(S): 8035841: assert(dp_src->tag() == dp_dst->tag()) failed: should be same tags 1 != 0 at ciMethodData.cpp:90 In-Reply-To: References: <589EDF81-5165-4970-B8DB-E3FC37222672@oracle.com> <5310DD74.1000604@oracle.com> <67FBC1B2-4334-43D8-8DDF-896E9FCD54D7@oracle.com> <5314DFF7.8080209@oracle.com> Message-ID: <531602FA.3000204@oracle.com> Looks good to me. Thanks, Vladimir On 3/4/14 3:12 AM, Roland Westrelin wrote: > Thanks Vladimir. Here is a new webrev: > > http://cr.openjdk.java.net/~roland/8035841/webrev.01/ > > Roland. 
> > > On Mar 3, 2014, at 9:03 PM, Vladimir Kozlov wrote: > >> On 3/3/14 4:21 AM, Roland Westrelin wrote: >>> Hi Vladimir, >>> >>> Thanks for reviewing that change. >>> >>>>> In ciMethodData.cpp: when the ciMethodData is loaded, the code walks over the traps in the extra data to translate their Method into a ciMethod. There can be new traps added as this is happening so the code that walks over the traps should iterate over the ciMethodData copy of the profile data. Because of concurrent updates, the assert is incorrect. >>>> >>>> Load_data() use Copy::disjoint_words() to get snapshot of all data (int total_size = _data_size + _extra_data_size;). Whatever we add after that concurrently should not be taking into account. Can you do that, process only _extra_data_size extra data? >>> >>> As I understand _extra_data_size takes into account all extra data entries, including the ones that are not yet used and the arg info entries at the end of the MDO. So I?m not sure I understand what you?re proposing. >> >> You are right it is area reserved during MDO creation. >> >>> >>>> I think load_extra_data() should get extra_data_base(), etc. from ciMethodData copy: >>>> >>>> 81 void ciMethodData::load_extra_data() { >>>> 82 MethodData* mdo = get_MethodData(); >>>> 83 >>>> 84 // speculative trap entries also hold a pointer to a Method so need to be translated >>>> 85 DataLayout* dp_src = mdo->extra_data_base(); >>>> 86 DataLayout* end_src = mdo->extra_data_limit(); >>>> 87 DataLayout* dp_dst = extra_data_base(); >>> >>> Are you saying that because we make a copy of the MDO we don?t need to read the references to translate from the MDO but we can read them from the copy and then overwrite them? >>> I followed the pattern that is used elsewhere: read from the MDO the entries that need to be translated. >> >> Ignore this my comment. It was stupid. ciMethodData.cpp changes are fine. >> >> Thanks, >> Vladimir >> >>> >>>>> In methodData.cpp: I had to remove the asserts because they are incorrect in case of concurrent updates as well. Also, the test that checks whether there is room for a speculative trap is broken in case of concurrent updates: the intent of next_extra(dp) is to check the next cell but if dp is allocated to a speculative trap concurrently it checks 2 cells from the current cell. Also, next_extra(dp)->tag() != DataLayout::no_tag doesn?t mean there?s no more space because it may have been allocated to some other trap concurrently and there may be more free space after. >>>> >>>> create_if_missing is true only during deoptimization so performance is not important. So can we do update under a lock? >>>> >>>> Concurrency will screw up you in one or an other way if you don't use lock. >>> >>> That sounds more reasonable. I?ll do that. >> >> >> >>> >>> Roland. >>> >>>> >>>> Thanks, >>>> Vladimir >>>> >>>>> >>>>> http://cr.openjdk.java.net/~roland/8035841/webrev.00/ >>>>> >>>>> Roland. >>>>> >>>>> >>> > From vladimir.kozlov at oracle.com Tue Mar 4 08:49:04 2014 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 04 Mar 2014 08:49:04 -0800 Subject: RFR(M): 8036146: make CPP interpreter build again In-Reply-To: <458FDF1B-54A5-480A-B711-0CA0DA817622@oracle.com> References: <531501F4.1040505@oracle.com> <458FDF1B-54A5-480A-B711-0CA0DA817622@oracle.com> Message-ID: <53160400.4000009@oracle.com> On 3/4/14 3:25 AM, Roland Westrelin wrote: >> Changes look reasonable. > > Thanks Vladimir. Do I need another review? You got comments from Volker, I think it is enough. 
Thanks, Vladimir > >> I think you can push this fix. Did you file a bug for run failure? > > I filed: > https://bugs.openjdk.java.net/browse/JDK-8036585 > CPP interpreter crashes > > Roland. > >> >> thanks, >> Vladimir >> >> On 3/3/14 2:12 PM, Roland Westrelin wrote: >>> I?m working on a fix that required some changes to the CPP interpreter so I got it building again on sparc and x86: >>> >>> http://cr.openjdk.java.net/~roland/8036146/webrev.00/ >>> >>> But it doesn?t run (java -version crashes). Should I push the build fixes anyway? >>> >>> Roland. >>> > From vladimir.kozlov at oracle.com Tue Mar 4 09:17:23 2014 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 04 Mar 2014 09:17:23 -0800 Subject: RFR (XS): 8035887: VM crashes trying to force inlining the recursive call In-Reply-To: <5315FBC5.7030503@oracle.com> References: <530F714A.7030805@oracle.com> <530F759A.9000506@oracle.com> <5310C715.6090309@oracle.com> <5315FBC5.7030503@oracle.com> Message-ID: <53160AA3.3060302@oracle.com> Thank you for this info. I agree that it is not critical for C1. Thanks, Vladimir On 3/4/14 8:13 AM, Vladimir Ivanov wrote: > I'd like to give an update about my recent findings why C1 wasn't > affected by 8011138. > > The culprit is that C1 doesn't do disambiguation of compiled lambda > forms based on "receiver" type. So, normal recursive depth check applies > here. It limits inlining horizon, but I don't think it's critical for > C1. If you disagree, let me know and I'll file a RFE. > > Best regards, > Vladimir Ivanov > > On 2/28/14 9:27 PM, Vladimir Ivanov wrote: >> Vladimir, thanks for review! >> >> With the addition of recursive depth check for force_inline case, the >> only difference should be compiled lambda form case. But I haven't been >> able to come up with a test case which demonstrates the issue. >> >> Strangely, stack overflow in compiler thread Chris fixed a while back >> (8011138) was observed only in C2. >> >> I'd like to keep this fix as is for now. I'll spend more time >> investigating compiled lambda form recursive inlining behavior in C1, >> and file a bug if necessary. >> >> Best regards, >> Vladimir Ivanov >> >> On 2/27/14 9:27 PM, Vladimir Kozlov wrote: >>> Vladimir, >>> >>> I think C1 still missing check for recursive depth in case of >>> force_inline(). In C2 recursion check is done for all types of inlining. >>> Yes, in case of lambda inlining it needs to check receiver. Only then C1 >>> and C2 will match. >>> >>> Thanks, >>> Vladimir K >>> >>> On 2/27/14 9:09 AM, Vladimir Ivanov wrote: >>>> http://cr.openjdk.java.net/~vlivanov/8035887/webrev.00/ >>>> https://bugs.openjdk.java.net/browse/JDK-8035887 >>>> >>>> 4 lines changed: 3 ins; 0 del; 1 mod >>>> >>>> C1 overflows the stack when it tries to inline a recursive call of a >>>> method which is forced for inlining by CompilerOracle. >>>> >>>> The problem is that C1 doesn't check inlining depth for methods forced >>>> for inlining by CompilerOracle. >>>> >>>> The fix is to add missing checks. I added 2 checks (total depth and >>>> recursive depth). The former is to avoid a situation >>>> (very unlikely) when a long chain of methods, which are forced for >>>> inlining, overflows compiler stack. The latter is to >>>> unify behavior between C1 & C2. >>>> >>>> No regression test is added because it can take very long time to >>>> provoke the crash in some configurations. 
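(To make the failure mode just described concrete, here is a minimal sketch of the kind of shape that triggers it. It is not the reproducer from the bug report; the class name and the CompileCommand line are only illustrative, and the exact CompileCommand syntax varies between JDK versions.)

public class ForceInlineRecursion {
    // Forcing this method to be inlined, e.g. with something like
    //   -XX:CompileCommand=inline,ForceInlineRecursion::fib
    // makes the inliner meet fib() again inside fib() at every level, so without
    // a depth cut-off the compiler thread's own stack overflows while inlining.
    static long fib(int n) {
        return (n < 2) ? n : fib(n - 1) + fib(n - 2);
    }

    public static void main(String[] args) {
        long sum = 0;
        for (int i = 0; i < 200_000; i++) {   // warm up so fib() gets compiled
            sum += fib(15);
        }
        System.out.println(sum);
    }
}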
>>>> >>>> Testing: failing test >>>> >>>> Best regards, >>>> Vladimir Ivanov From albert.noll at oracle.com Tue Mar 4 10:31:29 2014 From: albert.noll at oracle.com (Albert) Date: Tue, 04 Mar 2014 19:31:29 +0100 Subject: [9] RFR(XXS): 8036091: compiler/membars/DekkerTest.java fails with -XX:CICompilerCount=1 In-Reply-To: <531601CF.1070908@oracle.com> References: <53143E98.5080102@oracle.com> <04D12456-BFAC-4C44-AAEF-BF52CFF73F04@oracle.com> <53145950.7070903@oracle.com> <53145B28.7040109@oracle.com> <53145B8B.6060306@oracle.com> <5314618E.5050607@oracle.com> <531474A8.3090608@oracle.com> <531478BD.1080806@oracle.com> <531479FE.7080005@oracle.com> <53147A58.8@oracle.com> <5314E6B2.70508@oracle.com> <531580DB.5050209@oracle.com> <531601CF.1070908@oracle.com> Message-ID: <53161C01.5050906@oracle.com> Thank you, Vladimir. Best, Albert On 03/04/2014 05:39 PM, Vladimir Kozlov wrote: > Looks good. > > Thanks, > Vladimir > > On 3/3/14 11:29 PM, Albert wrote: >> Hi Vladimir, >> >> thanks for your review. I did your proposed changes. >> >> Here is the new webrev: >> http://cr.openjdk.java.net/~anoll/8036091/webrev.01/ >> >> Best, >> Albert >> >> On 03/03/2014 09:31 PM, Vladimir Kozlov wrote: >>> Albert, >>> >>> You need to add -XX:-TieredCompilation to test's commands. I forgot it >>> when I added the test. Removing -XX:CICompilerCount=1 is wrong because >>> sequence of C2 compilations will affect the reproduction of the >>> problem. CICompilerCount=1 serializes compilations and makes >>> compilation sequence more deterministic. Even with -Xbatch tests which >>> execute several threads the compilation is not deterministic because >>> compilation requests from different java threads will be served by >>> different compiler threads. >>> >>> CICompilerCount, by definition and by the code we used in Tiered, is >>> total number of compilers threads, C1+C2. You can't interpret it >>> differently for =1 case. >>> >>> Albert, make sure to allow CICompilerCount=1 with Tiered compilation >>> when only C1 is used (TieredStopAtLevel < 4). >>> >>> Thanks, >>> Vladimir >>> >>> On 3/3/14 4:49 AM, Aleksey Shipilev wrote: >>>> Thanks Albert! >>>> >>>> -Aleksey. >>>> >>>> P.S. Serves me right for not paying attention to the original issue >>>> thinking it only covers the negative values. >>>> >>>> On 03/03/2014 04:47 PM, Albert wrote: >>>>> Hi Aleksey, >>>>> >>>>> Vladimir K. reviewed JDK-8034775, let's see what he thinks. If we >>>>> decide >>>>> to go for the clean >>>>> solution, I could make it work. >>>>> >>>>> Best, >>>>> Albert >>>>> >>>>> On 03/03/2014 01:42 PM, Aleksey Shipilev wrote: >>>>>> On 03/03/2014 04:25 PM, Albert wrote: >>>>>>> For me the only clean solution that *enforces* and *retains* the >>>>>>> meaning of the flag is to enable a compiler thread to grab tasks >>>>>>> from >>>>>>> both queues (or have a shared queue). Maybe someone can tell why it >>>>>>> is implemented as it is? >>>>>> I agree, this seems to be only good solution for >>>>>> not-that-familiar-with-HS-tiered-arch guy like me. >>>>>> >>>>>> -Aleksey. >>>>> >>>> >> From albert.noll at oracle.com Tue Mar 4 10:35:48 2014 From: albert.noll at oracle.com (Albert) Date: Tue, 04 Mar 2014 19:35:48 +0100 Subject: [9] RFR(XXS): 8036092: [TESTBUG] compiler/uncommontrap/TestSpecTrapClassUnloading.java fails with: Unrecognized VM option 'UseTypeSpeculation' In-Reply-To: <531601FF.9070802@oracle.com> References: <53158B17.7050204@oracle.com> <531601FF.9070802@oracle.com> Message-ID: <53161D04.1050504@oracle.com> Thank you, Vladimir. 
Best, Albert On 03/04/2014 05:40 PM, Vladimir Kozlov wrote: > Good. > > Vladimir > > On 3/4/14 12:13 AM, Albert wrote: >> Hi all, >> >> could I get reviews for this small patch? >> >> Bug: >> https://bugs.openjdk.java.net/browse/JDK-8036092 >> >> Problem: >> -XX:+UseTypeSpeculation is a C2 flag, which is not known in a client VM. >> >> Solution: >> add -XX:+IgnoreUnrecognizedVMOptions to @run main/othervm >> >> Testing: >> Failing test case >> >> Webrev: >> http://cr.openjdk.java.net/~anoll/8036092/webrev.00/ >> >> Many thanks in advance, >> Albert From christian.thalinger at oracle.com Tue Mar 4 12:47:33 2014 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Tue, 4 Mar 2014 12:47:33 -0800 Subject: RFR(S): 8035841: assert(dp_src->tag() == dp_dst->tag()) failed: should be same tags 1 != 0 at ciMethodData.cpp:90 In-Reply-To: References: <589EDF81-5165-4970-B8DB-E3FC37222672@oracle.com> <5310DD74.1000604@oracle.com> <67FBC1B2-4334-43D8-8DDF-896E9FCD54D7@oracle.com> <5314DFF7.8080209@oracle.com> Message-ID: <2ED4D33A-854F-4056-83B7-CDE3756ABFB9@oracle.com> Looks good. On Mar 4, 2014, at 3:12 AM, Roland Westrelin wrote: > Thanks Vladimir. Here is a new webrev: > > http://cr.openjdk.java.net/~roland/8035841/webrev.01/ > > Roland. > > > On Mar 3, 2014, at 9:03 PM, Vladimir Kozlov wrote: > >> On 3/3/14 4:21 AM, Roland Westrelin wrote: >>> Hi Vladimir, >>> >>> Thanks for reviewing that change. >>> >>>>> In ciMethodData.cpp: when the ciMethodData is loaded, the code walks over the traps in the extra data to translate their Method into a ciMethod. There can be new traps added as this is happening so the code that walks over the traps should iterate over the ciMethodData copy of the profile data. Because of concurrent updates, the assert is incorrect. >>>> >>>> Load_data() use Copy::disjoint_words() to get snapshot of all data (int total_size = _data_size + _extra_data_size;). Whatever we add after that concurrently should not be taking into account. Can you do that, process only _extra_data_size extra data? >>> >>> As I understand _extra_data_size takes into account all extra data entries, including the ones that are not yet used and the arg info entries at the end of the MDO. So I?m not sure I understand what you?re proposing. >> >> You are right it is area reserved during MDO creation. >> >>> >>>> I think load_extra_data() should get extra_data_base(), etc. from ciMethodData copy: >>>> >>>> 81 void ciMethodData::load_extra_data() { >>>> 82 MethodData* mdo = get_MethodData(); >>>> 83 >>>> 84 // speculative trap entries also hold a pointer to a Method so need to be translated >>>> 85 DataLayout* dp_src = mdo->extra_data_base(); >>>> 86 DataLayout* end_src = mdo->extra_data_limit(); >>>> 87 DataLayout* dp_dst = extra_data_base(); >>> >>> Are you saying that because we make a copy of the MDO we don?t need to read the references to translate from the MDO but we can read them from the copy and then overwrite them? >>> I followed the pattern that is used elsewhere: read from the MDO the entries that need to be translated. >> >> Ignore this my comment. It was stupid. ciMethodData.cpp changes are fine. >> >> Thanks, >> Vladimir >> >>> >>>>> In methodData.cpp: I had to remove the asserts because they are incorrect in case of concurrent updates as well. 
Also, the test that checks whether there is room for a speculative trap is broken in case of concurrent updates: the intent of next_extra(dp) is to check the next cell but if dp is allocated to a speculative trap concurrently it checks 2 cells from the current cell. Also, next_extra(dp)->tag() != DataLayout::no_tag doesn?t mean there?s no more space because it may have been allocated to some other trap concurrently and there may be more free space after. >>>> >>>> create_if_missing is true only during deoptimization so performance is not important. So can we do update under a lock? >>>> >>>> Concurrency will screw up you in one or an other way if you don't use lock. >>> >>> That sounds more reasonable. I?ll do that. >> >> >> >>> >>> Roland. >>> >>>> >>>> Thanks, >>>> Vladimir >>>> >>>>> >>>>> http://cr.openjdk.java.net/~roland/8035841/webrev.00/ >>>>> >>>>> Roland. >>>>> >>>>> >>> > From christian.thalinger at oracle.com Tue Mar 4 12:58:19 2014 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Tue, 4 Mar 2014 12:58:19 -0800 Subject: [9] RFR(XXS): 8036091: compiler/membars/DekkerTest.java fails with -XX:CICompilerCount=1 In-Reply-To: <531580DB.5050209@oracle.com> References: <53143E98.5080102@oracle.com> <04D12456-BFAC-4C44-AAEF-BF52CFF73F04@oracle.com> <53145950.7070903@oracle.com> <53145B28.7040109@oracle.com> <53145B8B.6060306@oracle.com> <5314618E.5050607@oracle.com> <531474A8.3090608@oracle.com> <531478BD.1080806@oracle.com> <531479FE.7080005@oracle.com> <53147A58.8@oracle.com> <5314E6B2.70508@oracle.com> <531580DB.5050209@oracle.com> Message-ID: <1B6083C5-3B74-464C-BBC8-24A476A64FA9@oracle.com> ! const int num_min_compiler_threads = (TieredCompilation && (TieredStopAtLevel >= 4)) ? 2 : 1; Use CompLevel_full_optimization instead of 4. Otherwise this looks good. On Mar 3, 2014, at 11:29 PM, Albert wrote: > Hi Vladimir, > > thanks for your review. I did your proposed changes. > > Here is the new webrev: > http://cr.openjdk.java.net/~anoll/8036091/webrev.01/ > > Best, > Albert > > On 03/03/2014 09:31 PM, Vladimir Kozlov wrote: >> Albert, >> >> You need to add -XX:-TieredCompilation to test's commands. I forgot it when I added the test. Removing -XX:CICompilerCount=1 is wrong because sequence of C2 compilations will affect the reproduction of the problem. CICompilerCount=1 serializes compilations and makes compilation sequence more deterministic. Even with -Xbatch tests which execute several threads the compilation is not deterministic because compilation requests from different java threads will be served by different compiler threads. >> >> CICompilerCount, by definition and by the code we used in Tiered, is total number of compilers threads, C1+C2. You can't interpret it differently for =1 case. >> >> Albert, make sure to allow CICompilerCount=1 with Tiered compilation when only C1 is used (TieredStopAtLevel < 4). >> >> Thanks, >> Vladimir >> >> On 3/3/14 4:49 AM, Aleksey Shipilev wrote: >>> Thanks Albert! >>> >>> -Aleksey. >>> >>> P.S. Serves me right for not paying attention to the original issue >>> thinking it only covers the negative values. >>> >>> On 03/03/2014 04:47 PM, Albert wrote: >>>> Hi Aleksey, >>>> >>>> Vladimir K. reviewed JDK-8034775, let's see what he thinks. If we decide >>>> to go for the clean >>>> solution, I could make it work. 
>>>> >>>> Best, >>>> Albert >>>> >>>> On 03/03/2014 01:42 PM, Aleksey Shipilev wrote: >>>>> On 03/03/2014 04:25 PM, Albert wrote: >>>>>> For me the only clean solution that *enforces* and *retains* the >>>>>> meaning of the flag is to enable a compiler thread to grab tasks from >>>>>> both queues (or have a shared queue). Maybe someone can tell why it >>>>>> is implemented as it is? >>>>> I agree, this seems to be only good solution for >>>>> not-that-familiar-with-HS-tiered-arch guy like me. >>>>> >>>>> -Aleksey. >>>> >>> > From roland.westrelin at oracle.com Wed Mar 5 00:36:46 2014 From: roland.westrelin at oracle.com (Roland Westrelin) Date: Wed, 5 Mar 2014 09:36:46 +0100 Subject: RFR(S): 8035841: assert(dp_src->tag() == dp_dst->tag()) failed: should be same tags 1 != 0 at ciMethodData.cpp:90 In-Reply-To: <2ED4D33A-854F-4056-83B7-CDE3756ABFB9@oracle.com> References: <589EDF81-5165-4970-B8DB-E3FC37222672@oracle.com> <5310DD74.1000604@oracle.com> <67FBC1B2-4334-43D8-8DDF-896E9FCD54D7@oracle.com> <5314DFF7.8080209@oracle.com> <2ED4D33A-854F-4056-83B7-CDE3756ABFB9@oracle.com> Message-ID: Thanks Vladimir & Chris. Roland. From albert.noll at oracle.com Wed Mar 5 01:14:07 2014 From: albert.noll at oracle.com (Albert) Date: Wed, 05 Mar 2014 10:14:07 +0100 Subject: [9] RFR(XXS): 8036091: compiler/membars/DekkerTest.java fails with -XX:CICompilerCount=1 In-Reply-To: <1B6083C5-3B74-464C-BBC8-24A476A64FA9@oracle.com> References: <53143E98.5080102@oracle.com> <04D12456-BFAC-4C44-AAEF-BF52CFF73F04@oracle.com> <53145950.7070903@oracle.com> <53145B28.7040109@oracle.com> <53145B8B.6060306@oracle.com> <5314618E.5050607@oracle.com> <531474A8.3090608@oracle.com> <531478BD.1080806@oracle.com> <531479FE.7080005@oracle.com> <53147A58.8@oracle.com> <5314E6B2.70508@oracle.com> <531580DB.5050209@oracle.com> <1B6083C5-3B74-464C-BBC8-24A476A64FA9@oracle.com> Message-ID: <5316EADF.3040605@oracle.com> Thank you, Christian. I'll to the change and push it. Best, Albert On 03/04/2014 09:58 PM, Christian Thalinger wrote: > ! const int num_min_compiler_threads = (TieredCompilation && (TieredStopAtLevel >= 4)) ? 2 : 1; > > Use CompLevel_full_optimization instead of 4. > > Otherwise this looks good. > > On Mar 3, 2014, at 11:29 PM, Albert wrote: > >> Hi Vladimir, >> >> thanks for your review. I did your proposed changes. >> >> Here is the new webrev: >> http://cr.openjdk.java.net/~anoll/8036091/webrev.01/ >> >> Best, >> Albert >> >> On 03/03/2014 09:31 PM, Vladimir Kozlov wrote: >>> Albert, >>> >>> You need to add -XX:-TieredCompilation to test's commands. I forgot it when I added the test. Removing -XX:CICompilerCount=1 is wrong because sequence of C2 compilations will affect the reproduction of the problem. CICompilerCount=1 serializes compilations and makes compilation sequence more deterministic. Even with -Xbatch tests which execute several threads the compilation is not deterministic because compilation requests from different java threads will be served by different compiler threads. >>> >>> CICompilerCount, by definition and by the code we used in Tiered, is total number of compilers threads, C1+C2. You can't interpret it differently for =1 case. >>> >>> Albert, make sure to allow CICompilerCount=1 with Tiered compilation when only C1 is used (TieredStopAtLevel < 4). >>> >>> Thanks, >>> Vladimir >>> >>> On 3/3/14 4:49 AM, Aleksey Shipilev wrote: >>>> Thanks Albert! >>>> >>>> -Aleksey. >>>> >>>> P.S. Serves me right for not paying attention to the original issue >>>> thinking it only covers the negative values. 
>>>> >>>> On 03/03/2014 04:47 PM, Albert wrote: >>>>> Hi Aleksey, >>>>> >>>>> Vladimir K. reviewed JDK-8034775, let's see what he thinks. If we decide >>>>> to go for the clean >>>>> solution, I could make it work. >>>>> >>>>> Best, >>>>> Albert >>>>> >>>>> On 03/03/2014 01:42 PM, Aleksey Shipilev wrote: >>>>>> On 03/03/2014 04:25 PM, Albert wrote: >>>>>>> For me the only clean solution that *enforces* and *retains* the >>>>>>> meaning of the flag is to enable a compiler thread to grab tasks from >>>>>>> both queues (or have a shared queue). Maybe someone can tell why it >>>>>>> is implemented as it is? >>>>>> I agree, this seems to be only good solution for >>>>>> not-that-familiar-with-HS-tiered-arch guy like me. >>>>>> >>>>>> -Aleksey. From vladimir.x.ivanov at oracle.com Wed Mar 5 16:26:20 2014 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Thu, 06 Mar 2014 04:26:20 +0400 Subject: [9] RFR (XS): 8036667: "assert(adr->is_AddP() && adr->in(AddPNode::Offset)->is_Con()) failed: offset is a constant" with FoldStableValues on Message-ID: <5317C0AC.2000005@oracle.com> http://cr.openjdk.java.net/~vlivanov/8036667/webrev.00/ https://bugs.openjdk.java.net/browse/JDK-8036667 10 lines changed: 8 ins; 0 del; 2 mod The assert is too strong. It doesn't take into account the case when constant offset comes from phi node. The fix is to relax the assertion by skipping phi node case. I decided to keep it simple and avoided doing deep verification of the graph. Also, fixed a crash w/ -XX:+TraceIterGVN (see callnode.cpp). Testing: failing tests. Thanks! Best regards, Vladimir Ivanov From vladimir.kozlov at oracle.com Wed Mar 5 16:49:52 2014 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 05 Mar 2014 16:49:52 -0800 Subject: [9] RFR (XS): 8036667: "assert(adr->is_AddP() && adr->in(AddPNode::Offset)->is_Con()) failed: offset is a constant" with FoldStableValues on In-Reply-To: <5317C0AC.2000005@oracle.com> References: <5317C0AC.2000005@oracle.com> Message-ID: <5317C630.1010305@oracle.com> Why you did changes in extract_uncommon_trap_request()? How we can get phi there? What is 'off' value in this case? How you can get constant value from C2 type system (tp->offset()) but not from looking on ideal nodes? thanks, Vladimir K On 3/5/14 4:26 PM, Vladimir Ivanov wrote: > http://cr.openjdk.java.net/~vlivanov/8036667/webrev.00/ > https://bugs.openjdk.java.net/browse/JDK-8036667 > 10 lines changed: 8 ins; 0 del; 2 mod > > The assert is too strong. It doesn't take into account the case when > constant offset comes from phi node. > > The fix is to relax the assertion by skipping phi node case. I decided > to keep it simple and avoided doing deep verification of the graph. > > Also, fixed a crash w/ -XX:+TraceIterGVN (see callnode.cpp). > > Testing: failing tests. > > Thanks! > > Best regards, > Vladimir Ivanov From vladimir.x.ivanov at oracle.com Wed Mar 5 18:01:22 2014 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Thu, 06 Mar 2014 06:01:22 +0400 Subject: [9] RFR (XS): 8036667: "assert(adr->is_AddP() && adr->in(AddPNode::Offset)->is_Con()) failed: offset is a constant" with FoldStableValues on In-Reply-To: <5317C630.1010305@oracle.com> References: <5317C0AC.2000005@oracle.com> <5317C630.1010305@oracle.com> Message-ID: <5317D6F2.6050505@oracle.com> > Why you did changes in extract_uncommon_trap_request()? How we can get > phi there? It's a fix for a crash w/ -XX:+TraceIterGVN. Will file a separate bug to track it, as you suggested. > What is 'off' value in this case? 
How you can get constant value from C2 > type system (tp->offset()) but not from looking on ideal nodes? Good point! My fix isn't correct in some cases. In case only 1 phi input has non-top type, it should be safe to use offset. But when at least 2 inputs are live, it's not correct (same offset, but different address). So, I need to ignore phi case completely. Updated fix: http://cr.openjdk.java.net/~vlivanov/8036667/webrev.01/ Best regards, Vladimir Ivanov > > On 3/5/14 4:26 PM, Vladimir Ivanov wrote: >> http://cr.openjdk.java.net/~vlivanov/8036667/webrev.00/ >> https://bugs.openjdk.java.net/browse/JDK-8036667 >> 10 lines changed: 8 ins; 0 del; 2 mod >> >> The assert is too strong. It doesn't take into account the case when >> constant offset comes from phi node. >> >> The fix is to relax the assertion by skipping phi node case. I decided >> to keep it simple and avoided doing deep verification of the graph. >> >> Also, fixed a crash w/ -XX:+TraceIterGVN (see callnode.cpp). >> >> Testing: failing tests. >> >> Thanks! >> >> Best regards, >> Vladimir Ivanov From vladimir.kozlov at oracle.com Wed Mar 5 18:10:10 2014 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 05 Mar 2014 18:10:10 -0800 Subject: [9] RFR (XS): 8036667: "assert(adr->is_AddP() && adr->in(AddPNode::Offset)->is_Con()) failed: offset is a constant" with FoldStableValues on In-Reply-To: <5317D6F2.6050505@oracle.com> References: <5317C0AC.2000005@oracle.com> <5317C630.1010305@oracle.com> <5317D6F2.6050505@oracle.com> Message-ID: <5317D902.4000200@oracle.com> Looks good. Vladimir On 3/5/14 6:01 PM, Vladimir Ivanov wrote: >> Why you did changes in extract_uncommon_trap_request()? How we can get >> phi there? > It's a fix for a crash w/ -XX:+TraceIterGVN. > Will file a separate bug to track it, as you suggested. > >> What is 'off' value in this case? How you can get constant value from C2 >> type system (tp->offset()) but not from looking on ideal nodes? > Good point! My fix isn't correct in some cases. > > In case only 1 phi input has non-top type, it should be safe to use > offset. But when at least 2 inputs are live, it's not correct (same > offset, but different address). > > So, I need to ignore phi case completely. > > Updated fix: http://cr.openjdk.java.net/~vlivanov/8036667/webrev.01/ > > Best regards, > Vladimir Ivanov > >> >> On 3/5/14 4:26 PM, Vladimir Ivanov wrote: >>> http://cr.openjdk.java.net/~vlivanov/8036667/webrev.00/ >>> https://bugs.openjdk.java.net/browse/JDK-8036667 >>> 10 lines changed: 8 ins; 0 del; 2 mod >>> >>> The assert is too strong. It doesn't take into account the case when >>> constant offset comes from phi node. >>> >>> The fix is to relax the assertion by skipping phi node case. I decided >>> to keep it simple and avoided doing deep verification of the graph. >>> >>> Also, fixed a crash w/ -XX:+TraceIterGVN (see callnode.cpp). >>> >>> Testing: failing tests. >>> >>> Thanks! >>> >>> Best regards, >>> Vladimir Ivanov From roland.westrelin at oracle.com Thu Mar 6 03:08:21 2014 From: roland.westrelin at oracle.com (Roland Westrelin) Date: Thu, 6 Mar 2014 12:08:21 +0100 Subject: RFR(L): 8032410: compiler/uncommontrap/TestStackBangRbp.java times out on Solaris-Sparc V9 Message-ID: <09C8FEED-84F2-4F1A-A845-5DC8A76CF80E@oracle.com> This test causes a deadlock because when the stack bang in the deopt or uncommon trap blobs triggers an exception, we throw the exception right away even if the deoptee has some monitors locked. 
We had several issues recently with the stack banging in the deopt/uncommon trap blobs and so rather than add more code to fix stack banging on deoptimization, this change removes the need for stack banging on deoptimization as discussed previously: http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2014-February/013519.html The compilers compute by how much deoptimization would bang the stack at every possible deoptimization points in the compiled code and use the worst case to generate the stack banging in the nmethod. In debug builds, the stack banging code is still performed in the deopt/uncommon trap blobs but only to verify that the compiled code has done the stack banging correctly. Otherwise, the stack banging from deoptimization causes the VM to abort. This change contains some code refactoring. AbstractInterpreter::size_activation() is currently implemented as a call to AbstractInterpreter::layout_activation() but on most platforms, the logic to do the actual lay out of the activation and the logic to calculate its size are largely independent and having both done by layout_activation() feels wrong to me and error prone. I made AbstractInterpreter::size_activation() and AbstractInterpreter::layout_activation() two independent methods that share common helper functions if some code needs to be shared. I dropped unnecessary arguments to size_activation() in the current implementation as well. I also made it a template method so that it can be called with either a Method* (from the deoptimization code) or a ciMethod* (from the compilers). I created src/cpu/x86/vm/templateInterpreter_x86.cpp to put code that is common to 32 and 64 bit x86. This change in AbstractAssembler::generate_stack_overflow_check(): 137 int bang_end = (StackShadowPages+1)*page_size; is so that the stack banging code from the deopt/uncommon trap blobs and in the compiled code are consistent. Let?s say frame size is less than 1 page. During deoptimization, we bang sp+1 page and then sp+2 pages ? sp+(StackShadowPages+1) pages. If the frame size is > 1 page and <= 2 pages, we bang sp+1page, sp+2pages and then sp+3pages ? sp+(ShadowPages+2) pages. In the compiled code, to be consistent, for a frame size of less than 1 page we need to bang sp+(StackShadowPages+1) pages, if it?s > 1 page and <= 2 pages then we need to bang sp+(StackShadowPages+2) pages etc. http://cr.openjdk.java.net/~roland/8032410/webrev.01/ Roland. From vladimir.x.ivanov at oracle.com Thu Mar 6 09:40:20 2014 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Thu, 06 Mar 2014 21:40:20 +0400 Subject: [9] RFR (XS): 8036667: "assert(adr->is_AddP() && adr->in(AddPNode::Offset)->is_Con()) failed: offset is a constant" with FoldStableValues on In-Reply-To: <5317D6F2.6050505@oracle.com> References: <5317C0AC.2000005@oracle.com> <5317C630.1010305@oracle.com> <5317D6F2.6050505@oracle.com> Message-ID: <5318B304.9050002@oracle.com> Unfortunately, replacing tp->offset() with AddPNode::Ideal_base_and_offset doesn't work for other cases in LoadNode::Value. I have to revert to using tp->offset(). Updated webrev: http://cr.openjdk.java.net/~vlivanov/8036667/webrev.02/ I did some testing and haven't found any problems with the adr being PhiNode, but I'm not sure I understand all aspects of this code. So I decided to skip phi case by adding adr->is_AddP() case. Some other cases in LoadNode::Value have similar check. Best regards, Vladimir Ivanov On 3/6/14 6:01 AM, Vladimir Ivanov wrote: >> Why you did changes in extract_uncommon_trap_request()? 
How we can get >> phi there? > It's a fix for a crash w/ -XX:+TraceIterGVN. > Will file a separate bug to track it, as you suggested. > >> What is 'off' value in this case? How you can get constant value from C2 >> type system (tp->offset()) but not from looking on ideal nodes? > Good point! My fix isn't correct in some cases. > > In case only 1 phi input has non-top type, it should be safe to use > offset. But when at least 2 inputs are live, it's not correct (same > offset, but different address). > > So, I need to ignore phi case completely. > > Updated fix: http://cr.openjdk.java.net/~vlivanov/8036667/webrev.01/ > > Best regards, > Vladimir Ivanov > >> >> On 3/5/14 4:26 PM, Vladimir Ivanov wrote: >>> http://cr.openjdk.java.net/~vlivanov/8036667/webrev.00/ >>> https://bugs.openjdk.java.net/browse/JDK-8036667 >>> 10 lines changed: 8 ins; 0 del; 2 mod >>> >>> The assert is too strong. It doesn't take into account the case when >>> constant offset comes from phi node. >>> >>> The fix is to relax the assertion by skipping phi node case. I decided >>> to keep it simple and avoided doing deep verification of the graph. >>> >>> Also, fixed a crash w/ -XX:+TraceIterGVN (see callnode.cpp). >>> >>> Testing: failing tests. >>> >>> Thanks! >>> >>> Best regards, >>> Vladimir Ivanov From vladimir.kozlov at oracle.com Thu Mar 6 09:47:47 2014 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 06 Mar 2014 09:47:47 -0800 Subject: [9] RFR (XS): 8036667: "assert(adr->is_AddP() && adr->in(AddPNode::Offset)->is_Con()) failed: offset is a constant" with FoldStableValues on In-Reply-To: <5318B304.9050002@oracle.com> References: <5317C0AC.2000005@oracle.com> <5317C630.1010305@oracle.com> <5317D6F2.6050505@oracle.com> <5318B304.9050002@oracle.com> Message-ID: <5318B4C3.1020000@oracle.com> Looks good. Vladimir On 3/6/14 9:40 AM, Vladimir Ivanov wrote: > Unfortunately, replacing tp->offset() with AddPNode::Ideal_base_and_offset doesn't work for other cases in > LoadNode::Value. I have to revert to using tp->offset(). > > Updated webrev: http://cr.openjdk.java.net/~vlivanov/8036667/webrev.02/ > > I did some testing and haven't found any problems with the adr being PhiNode, but I'm not sure I understand all aspects > of this code. So I decided to skip phi case by adding adr->is_AddP() case. Some other cases in LoadNode::Value have > similar check. > > Best regards, > Vladimir Ivanov > > On 3/6/14 6:01 AM, Vladimir Ivanov wrote: >>> Why you did changes in extract_uncommon_trap_request()? How we can get >>> phi there? >> It's a fix for a crash w/ -XX:+TraceIterGVN. >> Will file a separate bug to track it, as you suggested. >> >>> What is 'off' value in this case? How you can get constant value from C2 >>> type system (tp->offset()) but not from looking on ideal nodes? >> Good point! My fix isn't correct in some cases. >> >> In case only 1 phi input has non-top type, it should be safe to use >> offset. But when at least 2 inputs are live, it's not correct (same >> offset, but different address). >> >> So, I need to ignore phi case completely. >> >> Updated fix: http://cr.openjdk.java.net/~vlivanov/8036667/webrev.01/ >> >> Best regards, >> Vladimir Ivanov >> >>> >>> On 3/5/14 4:26 PM, Vladimir Ivanov wrote: >>>> http://cr.openjdk.java.net/~vlivanov/8036667/webrev.00/ >>>> https://bugs.openjdk.java.net/browse/JDK-8036667 >>>> 10 lines changed: 8 ins; 0 del; 2 mod >>>> >>>> The assert is too strong. It doesn't take into account the case when >>>> constant offset comes from phi node. 
>>>> >>>> The fix is to relax the assertion by skipping phi node case. I decided >>>> to keep it simple and avoided doing deep verification of the graph. >>>> >>>> Also, fixed a crash w/ -XX:+TraceIterGVN (see callnode.cpp). >>>> >>>> Testing: failing tests. >>>> >>>> Thanks! >>>> >>>> Best regards, >>>> Vladimir Ivanov From vladimir.x.ivanov at oracle.com Thu Mar 6 09:50:43 2014 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Thu, 06 Mar 2014 21:50:43 +0400 Subject: [9] RFR (XS): 8036667: "assert(adr->is_AddP() && adr->in(AddPNode::Offset)->is_Con()) failed: offset is a constant" with FoldStableValues on In-Reply-To: <5318B4C3.1020000@oracle.com> References: <5317C0AC.2000005@oracle.com> <5317C630.1010305@oracle.com> <5317D6F2.6050505@oracle.com> <5318B304.9050002@oracle.com> <5318B4C3.1020000@oracle.com> Message-ID: <5318B573.1040703@oracle.com> Thanks, Vladimir. Best regards, Vladimir Ivanov On 3/6/14 9:47 PM, Vladimir Kozlov wrote: > Looks good. > > Vladimir > > On 3/6/14 9:40 AM, Vladimir Ivanov wrote: >> Unfortunately, replacing tp->offset() with >> AddPNode::Ideal_base_and_offset doesn't work for other cases in >> LoadNode::Value. I have to revert to using tp->offset(). >> >> Updated webrev: http://cr.openjdk.java.net/~vlivanov/8036667/webrev.02/ >> >> I did some testing and haven't found any problems with the adr being >> PhiNode, but I'm not sure I understand all aspects >> of this code. So I decided to skip phi case by adding adr->is_AddP() >> case. Some other cases in LoadNode::Value have >> similar check. >> >> Best regards, >> Vladimir Ivanov >> >> On 3/6/14 6:01 AM, Vladimir Ivanov wrote: >>>> Why you did changes in extract_uncommon_trap_request()? How we can get >>>> phi there? >>> It's a fix for a crash w/ -XX:+TraceIterGVN. >>> Will file a separate bug to track it, as you suggested. >>> >>>> What is 'off' value in this case? How you can get constant value >>>> from C2 >>>> type system (tp->offset()) but not from looking on ideal nodes? >>> Good point! My fix isn't correct in some cases. >>> >>> In case only 1 phi input has non-top type, it should be safe to use >>> offset. But when at least 2 inputs are live, it's not correct (same >>> offset, but different address). >>> >>> So, I need to ignore phi case completely. >>> >>> Updated fix: http://cr.openjdk.java.net/~vlivanov/8036667/webrev.01/ >>> >>> Best regards, >>> Vladimir Ivanov >>> >>>> >>>> On 3/5/14 4:26 PM, Vladimir Ivanov wrote: >>>>> http://cr.openjdk.java.net/~vlivanov/8036667/webrev.00/ >>>>> https://bugs.openjdk.java.net/browse/JDK-8036667 >>>>> 10 lines changed: 8 ins; 0 del; 2 mod >>>>> >>>>> The assert is too strong. It doesn't take into account the case when >>>>> constant offset comes from phi node. >>>>> >>>>> The fix is to relax the assertion by skipping phi node case. I decided >>>>> to keep it simple and avoided doing deep verification of the graph. >>>>> >>>>> Also, fixed a crash w/ -XX:+TraceIterGVN (see callnode.cpp). >>>>> >>>>> Testing: failing tests. >>>>> >>>>> Thanks! >>>>> >>>>> Best regards, >>>>> Vladimir Ivanov From christian.thalinger at oracle.com Thu Mar 6 11:16:17 2014 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Thu, 6 Mar 2014 11:16:17 -0800 Subject: RFR(L): 8032410: compiler/uncommontrap/TestStackBangRbp.java times out on Solaris-Sparc V9 In-Reply-To: <09C8FEED-84F2-4F1A-A845-5DC8A76CF80E@oracle.com> References: <09C8FEED-84F2-4F1A-A845-5DC8A76CF80E@oracle.com> Message-ID: This is a very nice cleanup too. 
+ assert(bang_size_in_bytes >= frame_size_in_bytes, "stack bang size incorrect?); I?m pretty sure this is almost always true but it might not (for whatever reason). I don?t see much value in that assert. src/share/vm/opto/compile.cpp: + int Compile::bang_size_in_bytes() const { + int callee_locals = method() != NULL ? method()->max_locals() : 0; + int interpreter_frame_size = _interpreter_frame_size; + return MAX2(interpreter_frame_size, frame_size_in_bytes()); + } callee_locals is unused. Is there a reason you load _interpreter_frame_size into a local variable? src/share/vm/asm/assembler.cpp: ! int bang_end = (StackShadowPages+1)*page_size; Why +1? src/share/vm/ci/ciMethod.cpp: + int ciMethod::get_stack_effect_at_invoke(int bci, Bytecodes::Code code, int& inputs) { + int ciMethod::stack_effect_if_at_invoke(int bci) { Either both with or without ?get?. Since this is very hard to test what testing did you do? On Mar 6, 2014, at 3:08 AM, Roland Westrelin wrote: > This test causes a deadlock because when the stack bang in the deopt or uncommon trap blobs triggers an exception, we throw the exception right away even if the deoptee has some monitors locked. We had several issues recently with the stack banging in the deopt/uncommon trap blobs and so rather than add more code to fix stack banging on deoptimization, this change removes the need for stack banging on deoptimization as discussed previously: > http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2014-February/013519.html > > The compilers compute by how much deoptimization would bang the stack at every possible deoptimization points in the compiled code and use the worst case to generate the stack banging in the nmethod. In debug builds, the stack banging code is still performed in the deopt/uncommon trap blobs but only to verify that the compiled code has done the stack banging correctly. Otherwise, the stack banging from deoptimization causes the VM to abort. > > This change contains some code refactoring. AbstractInterpreter::size_activation() is currently implemented as a call to AbstractInterpreter::layout_activation() but on most platforms, the logic to do the actual lay out of the activation and the logic to calculate its size are largely independent and having both done by layout_activation() feels wrong to me and error prone. I made AbstractInterpreter::size_activation() and AbstractInterpreter::layout_activation() two independent methods that share common helper functions if some code needs to be shared. I dropped unnecessary arguments to size_activation() in the current implementation as well. I also made it a template method so that it can be called with either a Method* (from the deoptimization code) or a ciMethod* (from the compilers). > > I created src/cpu/x86/vm/templateInterpreter_x86.cpp to put code that is common to 32 and 64 bit x86. > > This change in AbstractAssembler::generate_stack_overflow_check(): > > 137 int bang_end = (StackShadowPages+1)*page_size; > > is so that the stack banging code from the deopt/uncommon trap blobs and in the compiled code are consistent. Let?s say frame size is less than 1 page. During deoptimization, we bang sp+1 page and then sp+2 pages ? sp+(StackShadowPages+1) pages. If the frame size is > 1 page and <= 2 pages, we bang sp+1page, sp+2pages and then sp+3pages ? sp+(ShadowPages+2) pages. 
In the compiled code, to be consistent, for a frame size of less than 1 page we need to bang sp+(StackShadowPages+1) pages, if it?s > 1 page and <= 2 pages then we need to bang sp+(StackShadowPages+2) pages etc. > > http://cr.openjdk.java.net/~roland/8032410/webrev.01/ > > Roland. From roland.westrelin at oracle.com Fri Mar 7 05:29:29 2014 From: roland.westrelin at oracle.com (Roland Westrelin) Date: Fri, 7 Mar 2014 14:29:29 +0100 Subject: RFR(L): 8032410: compiler/uncommontrap/TestStackBangRbp.java times out on Solaris-Sparc V9 In-Reply-To: References: <09C8FEED-84F2-4F1A-A845-5DC8A76CF80E@oracle.com> Message-ID: <3FF8512B-ACC8-4BAE-B9A2-79439DD67193@oracle.com> Thanks for reviewing this, Chris. > This is a very nice cleanup too. > > + assert(bang_size_in_bytes >= frame_size_in_bytes, "stack bang size incorrect?); > > I?m pretty sure this is almost always true but it might not (for whatever reason). I don?t see much value in that assert. With the current code, yes. But what if changes are made to the stack banging code? Wouldn?t it be nice to catch a problem if something goes very wrong? > src/share/vm/opto/compile.cpp: > > + int Compile::bang_size_in_bytes() const { > + int callee_locals = method() != NULL ? method()->max_locals() : 0; > + int interpreter_frame_size = _interpreter_frame_size; > + return MAX2(interpreter_frame_size, frame_size_in_bytes()); > + } > > callee_locals is unused. Is there a reason you load _interpreter_frame_size into a local variable? Thanks for catching that. I?ll clean it up. > > src/share/vm/asm/assembler.cpp: > > ! int bang_end = (StackShadowPages+1)*page_size; > > Why +1? From my first email: >> This change in AbstractAssembler::generate_stack_overflow_check(): >> >> 137 int bang_end = (StackShadowPages+1)*page_size; >> >> is so that the stack banging code from the deopt/uncommon trap blobs and in the compiled code are consistent. Let?s say frame size is less than 1 page. During deoptimization, we bang sp+1 page and then sp+2 pages ? sp+(StackShadowPages+1) pages. If the frame size is > 1 page and <= 2 pages, we bang sp+1page, sp+2pages and then sp+3pages ? sp+(ShadowPages+2) pages. In the compiled code, to be consistent, for a frame size of less than 1 page we need to bang sp+(StackShadowPages+1) pages, if it?s > 1 page and <= 2 pages then we need to bang sp+(StackShadowPages+2) pages etc. > > src/share/vm/ci/ciMethod.cpp: > > + int ciMethod::get_stack_effect_at_invoke(int bci, Bytecodes::Code code, int& inputs) { > + int ciMethod::stack_effect_if_at_invoke(int bci) { > > Either both with or without ?get?. Ok. > Since this is very hard to test what testing did you do? Indeed. I logged every stack size computation that the compilers do, ran some tests (some subset of specjvm98) with DeoptimizeALot and verified with a script that the stack size computation at deoptimization matches the one done by the compilers before. I ran regression tests from: java/lang, java/util, hotspot/compiler, hotspot/runtime, hotspot/gc + nsk.stress, vm.compiler, vm.regression, nsk.regression, nsk.monitoring with -Xcomp and with and without -XX:+DeoptimizeALot on x64. Roland. > > On Mar 6, 2014, at 3:08 AM, Roland Westrelin wrote: > >> This test causes a deadlock because when the stack bang in the deopt or uncommon trap blobs triggers an exception, we throw the exception right away even if the deoptee has some monitors locked. 
We had several issues recently with the stack banging in the deopt/uncommon trap blobs and so rather than add more code to fix stack banging on deoptimization, this change removes the need for stack banging on deoptimization as discussed previously: >> http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2014-February/013519.html >> >> The compilers compute by how much deoptimization would bang the stack at every possible deoptimization points in the compiled code and use the worst case to generate the stack banging in the nmethod. In debug builds, the stack banging code is still performed in the deopt/uncommon trap blobs but only to verify that the compiled code has done the stack banging correctly. Otherwise, the stack banging from deoptimization causes the VM to abort. >> >> This change contains some code refactoring. AbstractInterpreter::size_activation() is currently implemented as a call to AbstractInterpreter::layout_activation() but on most platforms, the logic to do the actual lay out of the activation and the logic to calculate its size are largely independent and having both done by layout_activation() feels wrong to me and error prone. I made AbstractInterpreter::size_activation() and AbstractInterpreter::layout_activation() two independent methods that share common helper functions if some code needs to be shared. I dropped unnecessary arguments to size_activation() in the current implementation as well. I also made it a template method so that it can be called with either a Method* (from the deoptimization code) or a ciMethod* (from the compilers). >> >> I created src/cpu/x86/vm/templateInterpreter_x86.cpp to put code that is common to 32 and 64 bit x86. >> >> This change in AbstractAssembler::generate_stack_overflow_check(): >> >> 137 int bang_end = (StackShadowPages+1)*page_size; >> >> is so that the stack banging code from the deopt/uncommon trap blobs and in the compiled code are consistent. Let?s say frame size is less than 1 page. During deoptimization, we bang sp+1 page and then sp+2 pages ? sp+(StackShadowPages+1) pages. If the frame size is > 1 page and <= 2 pages, we bang sp+1page, sp+2pages and then sp+3pages ? sp+(ShadowPages+2) pages. In the compiled code, to be consistent, for a frame size of less than 1 page we need to bang sp+(StackShadowPages+1) pages, if it?s > 1 page and <= 2 pages then we need to bang sp+(StackShadowPages+2) pages etc. >> >> http://cr.openjdk.java.net/~roland/8032410/webrev.01/ >> >> Roland. > From david.r.chase at oracle.com Fri Mar 7 06:50:06 2014 From: david.r.chase at oracle.com (David Chase) Date: Fri, 7 Mar 2014 09:50:06 -0500 Subject: RFR (XS): 8028037: "[parfait] warnings from b114 for hotspot.src.share.vm " Message-ID: <5AE2AC8C-0278-4E72-9A39-667B64B71BB1@oracle.com> Bug: https://bugs.openjdk.java.net/browse/JDK-8028037 Fix: http://cr.openjdk.java.net/~drchase/8028037/webrev.00/ Testing: local jtreg. The fix addresses half of the bug, inserting a null check to keep Parfait happy. The logic to ensure that the null cannot happen is a little convoluted and split across two places. The other half of the bug is addressed (separately, by other people) with an addition to the Parfait configuration files so that report_vm_error is treated as an exit (which it is except for certain cases of debugging). -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 801 bytes Desc: Message signed with OpenPGP using GPGMail Url : http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/attachments/20140307/cdc62d93/signature.asc From vladimir.x.ivanov at oracle.com Fri Mar 7 08:26:45 2014 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Fri, 07 Mar 2014 20:26:45 +0400 Subject: RFR (S): 8023461: Thread holding lock at safepoint that vm can block on: MethodCompileQueue_lock Message-ID: <5319F345.80607@oracle.com> http://cr.openjdk.java.net/~vlivanov/8023461/webrev.00 https://bugs.openjdk.java.net/browse/JDK-8023461 42 lines changed: 13 ins; 1 del; 28 mod The rule of thumb for VM is that a thread shouldn't hold any VM lock when it reaches a safepoint. It's not the case for MethodCompileQueue_lock now. The problem is that AdvancedThresholdPolicy updates task's rate when iterating compiler queue. It holds MethodCompileQueue_lock while doing so. Method counters are allocated lazily. If method counters aren't there and VM fails to allocate them, GC is initiated (see CollectorPolicy::satisfy_failed_metadata_allocation) and a thead entering a safepoint holding MethodCompileQueue lock. Normally, counters are initialized during method interpretation, but in Xcomp mode it's not the case. That's the mode where the failures are observed. The fix is to skip the update, if counters aren't allocated yet. Testing: added No_Safepoint_Verifier, JPRT, failing tests from nightly testing (in progress). Best regards, Vladimir Ivanov From vladimir.kozlov at oracle.com Fri Mar 7 09:08:39 2014 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 07 Mar 2014 09:08:39 -0800 Subject: RFR (XS): 8028037: "[parfait] warnings from b114 for hotspot.src.share.vm " In-Reply-To: <5AE2AC8C-0278-4E72-9A39-667B64B71BB1@oracle.com> References: <5AE2AC8C-0278-4E72-9A39-667B64B71BB1@oracle.com> Message-ID: <5319FD17.5010507@oracle.com> David, Next line is not needed, I think it was there because some time ago proj_out() returned basic Node* type. as_Proj() is simple cast and assert. And that is already done in proj_out() method now. other_proj = other_proj -> as_Proj(); Also don't use spaces around '->' in your changes. Thanks, Vladimir On 3/7/14 6:50 AM, David Chase wrote: > Bug: https://bugs.openjdk.java.net/browse/JDK-8028037 > Fix: http://cr.openjdk.java.net/~drchase/8028037/webrev.00/ > > Testing: local jtreg. > > The fix addresses half of the bug, inserting a null check to keep Parfait happy. The logic to ensure that the null cannot happen is a little convoluted and split across two places. > > The other half of the bug is addressed (separately, by other people) with an addition to the Parfait configuration files so that report_vm_error is treated as an exit (which it is except for certain cases of debugging). 
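A hedged sketch of the pattern under review (the call site and the projection constant are assumed here, not taken from the webrev): proj_out() already returns a ProjNode* and performs the checked cast internally, so the extra as_Proj() adds nothing, and the explicit NULL check exists only so the static analyzer can prove the dereference safe.

    ProjNode* other_proj = call->proj_out(TypeFunc::Control);  // assumed call site and projection
    if (other_proj == NULL) {
      return;                     // cannot happen in practice; satisfies Parfait
    }
    // use other_proj directly; no other_proj->as_Proj() cast is needed any more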
> > From david.r.chase at oracle.com Fri Mar 7 11:38:42 2014 From: david.r.chase at oracle.com (David Chase) Date: Fri, 7 Mar 2014 14:38:42 -0500 Subject: RFR (XS): 8028037: "[parfait] warnings from b114 for hotspot.src.share.vm " In-Reply-To: <5319FD17.5010507@oracle.com> References: <5AE2AC8C-0278-4E72-9A39-667B64B71BB1@oracle.com> <5319FD17.5010507@oracle.com> Message-ID: <6BB54560-3626-4E2C-89FA-F25486DB9C8C@oracle.com> It's a trivial change, but I'm rerunning jtreg anyhow: new fix: http://cr.openjdk.java.net/~drchase/8028037/webrev.01/ On 2014-03-07, at 12:08 PM, Vladimir Kozlov wrote: > David, > > Next line is not needed, I think it was there because some time ago proj_out() returned basic Node* type. as_Proj() is simple cast and assert. And that is already done in proj_out() method now. > > other_proj = other_proj -> as_Proj(); > > Also don't use spaces around '->' in your changes. > > Thanks, > Vladimir > > On 3/7/14 6:50 AM, David Chase wrote: >> Bug: https://bugs.openjdk.java.net/browse/JDK-8028037 >> Fix: http://cr.openjdk.java.net/~drchase/8028037/webrev.00/ >> >> Testing: local jtreg. >> >> The fix addresses half of the bug, inserting a null check to keep Parfait happy. The logic to ensure that the null cannot happen is a little convoluted and split across two places. >> >> The other half of the bug is addressed (separately, by other people) with an addition to the Parfait configuration files so that report_vm_error is treated as an exit (which it is except for certain cases of debugging). >> >> -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 801 bytes Desc: Message signed with OpenPGP using GPGMail Url : http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/attachments/20140307/b0bc30ca/signature.asc From vladimir.kozlov at oracle.com Fri Mar 7 11:45:20 2014 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 07 Mar 2014 11:45:20 -0800 Subject: RFR (XS): 8028037: "[parfait] warnings from b114 for hotspot.src.share.vm " In-Reply-To: <6BB54560-3626-4E2C-89FA-F25486DB9C8C@oracle.com> References: <5AE2AC8C-0278-4E72-9A39-667B64B71BB1@oracle.com> <5319FD17.5010507@oracle.com> <6BB54560-3626-4E2C-89FA-F25486DB9C8C@oracle.com> Message-ID: <531A21D0.8020701@oracle.com> Good. Thanks, Vladimir On 3/7/14 11:38 AM, David Chase wrote: > It's a trivial change, but I'm rerunning jtreg anyhow: > > new fix: http://cr.openjdk.java.net/~drchase/8028037/webrev.01/ > > > On 2014-03-07, at 12:08 PM, Vladimir Kozlov wrote: > >> David, >> >> Next line is not needed, I think it was there because some time ago proj_out() returned basic Node* type. as_Proj() is simple cast and assert. And that is already done in proj_out() method now. >> >> other_proj = other_proj -> as_Proj(); >> >> Also don't use spaces around '->' in your changes. >> >> Thanks, >> Vladimir >> >> On 3/7/14 6:50 AM, David Chase wrote: >>> Bug: https://bugs.openjdk.java.net/browse/JDK-8028037 >>> Fix: http://cr.openjdk.java.net/~drchase/8028037/webrev.00/ >>> >>> Testing: local jtreg. >>> >>> The fix addresses half of the bug, inserting a null check to keep Parfait happy. The logic to ensure that the null cannot happen is a little convoluted and split across two places. >>> >>> The other half of the bug is addressed (separately, by other people) with an addition to the Parfait configuration files so that report_vm_error is treated as an exit (which it is except for certain cases of debugging). 
>>> >>> > From christian.thalinger at oracle.com Sun Mar 9 21:47:49 2014 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Sun, 9 Mar 2014 14:47:49 -0700 Subject: RFR(L): 8032410: compiler/uncommontrap/TestStackBangRbp.java times out on Solaris-Sparc V9 In-Reply-To: <3FF8512B-ACC8-4BAE-B9A2-79439DD67193@oracle.com> References: <09C8FEED-84F2-4F1A-A845-5DC8A76CF80E@oracle.com> <3FF8512B-ACC8-4BAE-B9A2-79439DD67193@oracle.com> Message-ID: <7A7DF220-BF4C-4A34-B2D7-A6E3A26121EA@oracle.com> On Mar 7, 2014, at 5:29 AM, Roland Westrelin wrote: > Thanks for reviewing this, Chris. > >> This is a very nice cleanup too. >> >> + assert(bang_size_in_bytes >= frame_size_in_bytes, "stack bang size incorrect?); >> >> I?m pretty sure this is almost always true but it might not (for whatever reason). I don?t see much value in that assert. > > With the current code, yes. But what if changes are made to the stack banging code? Wouldn?t it be nice to catch a problem if something goes very wrong? > >> src/share/vm/opto/compile.cpp: >> >> + int Compile::bang_size_in_bytes() const { >> + int callee_locals = method() != NULL ? method()->max_locals() : 0; >> + int interpreter_frame_size = _interpreter_frame_size; >> + return MAX2(interpreter_frame_size, frame_size_in_bytes()); >> + } >> >> callee_locals is unused. Is there a reason you load _interpreter_frame_size into a local variable? > > Thanks for catching that. I?ll clean it up. > >> >> src/share/vm/asm/assembler.cpp: >> >> ! int bang_end = (StackShadowPages+1)*page_size; >> >> Why +1? > > From my first email: Sure but what about the people who read the code but didn?t see that email? > >>> This change in AbstractAssembler::generate_stack_overflow_check(): >>> >>> 137 int bang_end = (StackShadowPages+1)*page_size; >>> >>> is so that the stack banging code from the deopt/uncommon trap blobs and in the compiled code are consistent. Let?s say frame size is less than 1 page. During deoptimization, we bang sp+1 page and then sp+2 pages ? sp+(StackShadowPages+1) pages. If the frame size is > 1 page and <= 2 pages, we bang sp+1page, sp+2pages and then sp+3pages ? sp+(ShadowPages+2) pages. In the compiled code, to be consistent, for a frame size of less than 1 page we need to bang sp+(StackShadowPages+1) pages, if it?s > 1 page and <= 2 pages then we need to bang sp+(StackShadowPages+2) pages etc. > > > >> >> src/share/vm/ci/ciMethod.cpp: >> >> + int ciMethod::get_stack_effect_at_invoke(int bci, Bytecodes::Code code, int& inputs) { >> + int ciMethod::stack_effect_if_at_invoke(int bci) { >> >> Either both with or without ?get?. > > Ok. > >> Since this is very hard to test what testing did you do? > > Indeed. > I logged every stack size computation that the compilers do, ran some tests (some subset of specjvm98) with DeoptimizeALot and verified with a script that the stack size computation at deoptimization matches the one done by the compilers before. > I ran regression tests from: java/lang, java/util, hotspot/compiler, hotspot/runtime, hotspot/gc > + nsk.stress, vm.compiler, vm.regression, nsk.regression, nsk.monitoring > with -Xcomp and with and without -XX:+DeoptimizeALot on x64. Any tests that throw a StackOverflowError? > > Roland. > >> >> On Mar 6, 2014, at 3:08 AM, Roland Westrelin wrote: >> >>> This test causes a deadlock because when the stack bang in the deopt or uncommon trap blobs triggers an exception, we throw the exception right away even if the deoptee has some monitors locked. 
We had several issues recently with the stack banging in the deopt/uncommon trap blobs and so rather than add more code to fix stack banging on deoptimization, this change removes the need for stack banging on deoptimization as discussed previously: >>> http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2014-February/013519.html >>> >>> The compilers compute by how much deoptimization would bang the stack at every possible deoptimization points in the compiled code and use the worst case to generate the stack banging in the nmethod. In debug builds, the stack banging code is still performed in the deopt/uncommon trap blobs but only to verify that the compiled code has done the stack banging correctly. Otherwise, the stack banging from deoptimization causes the VM to abort. >>> >>> This change contains some code refactoring. AbstractInterpreter::size_activation() is currently implemented as a call to AbstractInterpreter::layout_activation() but on most platforms, the logic to do the actual lay out of the activation and the logic to calculate its size are largely independent and having both done by layout_activation() feels wrong to me and error prone. I made AbstractInterpreter::size_activation() and AbstractInterpreter::layout_activation() two independent methods that share common helper functions if some code needs to be shared. I dropped unnecessary arguments to size_activation() in the current implementation as well. I also made it a template method so that it can be called with either a Method* (from the deoptimization code) or a ciMethod* (from the compilers). >>> >>> I created src/cpu/x86/vm/templateInterpreter_x86.cpp to put code that is common to 32 and 64 bit x86. >>> >>> This change in AbstractAssembler::generate_stack_overflow_check(): >>> >>> 137 int bang_end = (StackShadowPages+1)*page_size; >>> >>> is so that the stack banging code from the deopt/uncommon trap blobs and in the compiled code are consistent. Let?s say frame size is less than 1 page. During deoptimization, we bang sp+1 page and then sp+2 pages ? sp+(StackShadowPages+1) pages. If the frame size is > 1 page and <= 2 pages, we bang sp+1page, sp+2pages and then sp+3pages ? sp+(ShadowPages+2) pages. In the compiled code, to be consistent, for a frame size of less than 1 page we need to bang sp+(StackShadowPages+1) pages, if it?s > 1 page and <= 2 pages then we need to bang sp+(StackShadowPages+2) pages etc. >>> >>> http://cr.openjdk.java.net/~roland/8032410/webrev.01/ >>> >>> Roland. >> > From roland.westrelin at oracle.com Mon Mar 10 15:49:39 2014 From: roland.westrelin at oracle.com (Roland Westrelin) Date: Mon, 10 Mar 2014 16:49:39 +0100 Subject: RFR(L): 8032410: compiler/uncommontrap/TestStackBangRbp.java times out on Solaris-Sparc V9 In-Reply-To: <7A7DF220-BF4C-4A34-B2D7-A6E3A26121EA@oracle.com> References: <09C8FEED-84F2-4F1A-A845-5DC8A76CF80E@oracle.com> <3FF8512B-ACC8-4BAE-B9A2-79439DD67193@oracle.com> <7A7DF220-BF4C-4A34-B2D7-A6E3A26121EA@oracle.com> Message-ID: Hi Chris, See below >>> This is a very nice cleanup too. >>> >>> + assert(bang_size_in_bytes >= frame_size_in_bytes, "stack bang size incorrect?); >>> >>> I?m pretty sure this is almost always true but it might not (for whatever reason). I don?t see much value in that assert. >> >> With the current code, yes. But what if changes are made to the stack banging code? Wouldn?t it be nice to catch a problem if something goes very wrong? 
>> >>> src/share/vm/opto/compile.cpp: >>> >>> + int Compile::bang_size_in_bytes() const { >>> + int callee_locals = method() != NULL ? method()->max_locals() : 0; >>> + int interpreter_frame_size = _interpreter_frame_size; >>> + return MAX2(interpreter_frame_size, frame_size_in_bytes()); >>> + } >>> >>> callee_locals is unused. Is there a reason you load _interpreter_frame_size into a local variable? >> >> Thanks for catching that. I?ll clean it up. >> >>> >>> src/share/vm/asm/assembler.cpp: >>> >>> ! int bang_end = (StackShadowPages+1)*page_size; >>> >>> Why +1? >> >> From my first email: > > Sure but what about the people who read the code but didn?t see that email? I?ll add a comment. >>>> This change in AbstractAssembler::generate_stack_overflow_check(): >>>> >>>> 137 int bang_end = (StackShadowPages+1)*page_size; >>>> >>>> is so that the stack banging code from the deopt/uncommon trap blobs and in the compiled code are consistent. Let?s say frame size is less than 1 page. During deoptimization, we bang sp+1 page and then sp+2 pages ? sp+(StackShadowPages+1) pages. If the frame size is > 1 page and <= 2 pages, we bang sp+1page, sp+2pages and then sp+3pages ? sp+(ShadowPages+2) pages. In the compiled code, to be consistent, for a frame size of less than 1 page we need to bang sp+(StackShadowPages+1) pages, if it?s > 1 page and <= 2 pages then we need to bang sp+(StackShadowPages+2) pages etc. >> >> >> >>> >>> src/share/vm/ci/ciMethod.cpp: >>> >>> + int ciMethod::get_stack_effect_at_invoke(int bci, Bytecodes::Code code, int& inputs) { >>> + int ciMethod::stack_effect_if_at_invoke(int bci) { >>> >>> Either both with or without ?get?. >> >> Ok. >> >>> Since this is very hard to test what testing did you do? >> >> Indeed. >> I logged every stack size computation that the compilers do, ran some tests (some subset of specjvm98) with DeoptimizeALot and verified with a script that the stack size computation at deoptimization matches the one done by the compilers before. >> I ran regression tests from: java/lang, java/util, hotspot/compiler, hotspot/runtime, hotspot/gc >> + nsk.stress, vm.compiler, vm.regression, nsk.regression, nsk.monitoring >> with -Xcomp and with and without -XX:+DeoptimizeALot on x64. > > Any tests that throw a StackOverflowError? The hotspot/compiler regression tests have a few. Otherwise, frankly, I don?t know. I don?t see what other testing I could do. Roland. > >> >> Roland. >> >>> >>> On Mar 6, 2014, at 3:08 AM, Roland Westrelin wrote: >>> >>>> This test causes a deadlock because when the stack bang in the deopt or uncommon trap blobs triggers an exception, we throw the exception right away even if the deoptee has some monitors locked. We had several issues recently with the stack banging in the deopt/uncommon trap blobs and so rather than add more code to fix stack banging on deoptimization, this change removes the need for stack banging on deoptimization as discussed previously: >>>> http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2014-February/013519.html >>>> >>>> The compilers compute by how much deoptimization would bang the stack at every possible deoptimization points in the compiled code and use the worst case to generate the stack banging in the nmethod. In debug builds, the stack banging code is still performed in the deopt/uncommon trap blobs but only to verify that the compiled code has done the stack banging correctly. Otherwise, the stack banging from deoptimization causes the VM to abort. >>>> >>>> This change contains some code refactoring. 
AbstractInterpreter::size_activation() is currently implemented as a call to AbstractInterpreter::layout_activation() but on most platforms, the logic to do the actual lay out of the activation and the logic to calculate its size are largely independent and having both done by layout_activation() feels wrong to me and error prone. I made AbstractInterpreter::size_activation() and AbstractInterpreter::layout_activation() two independent methods that share common helper functions if some code needs to be shared. I dropped unnecessary arguments to size_activation() in the current implementation as well. I also made it a template method so that it can be called with either a Method* (from the deoptimization code) or a ciMethod* (from the compilers). >>>> >>>> I created src/cpu/x86/vm/templateInterpreter_x86.cpp to put code that is common to 32 and 64 bit x86. >>>> >>>> This change in AbstractAssembler::generate_stack_overflow_check(): >>>> >>>> 137 int bang_end = (StackShadowPages+1)*page_size; >>>> >>>> is so that the stack banging code from the deopt/uncommon trap blobs and in the compiled code are consistent. Let?s say frame size is less than 1 page. During deoptimization, we bang sp+1 page and then sp+2 pages ? sp+(StackShadowPages+1) pages. If the frame size is > 1 page and <= 2 pages, we bang sp+1page, sp+2pages and then sp+3pages ? sp+(ShadowPages+2) pages. In the compiled code, to be consistent, for a frame size of less than 1 page we need to bang sp+(StackShadowPages+1) pages, if it?s > 1 page and <= 2 pages then we need to bang sp+(StackShadowPages+2) pages etc. >>>> >>>> http://cr.openjdk.java.net/~roland/8032410/webrev.01/ >>>> >>>> Roland. From christian.thalinger at oracle.com Mon Mar 10 17:04:49 2014 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Mon, 10 Mar 2014 10:04:49 -0700 Subject: RFR(L): 8032410: compiler/uncommontrap/TestStackBangRbp.java times out on Solaris-Sparc V9 In-Reply-To: References: <09C8FEED-84F2-4F1A-A845-5DC8A76CF80E@oracle.com> <3FF8512B-ACC8-4BAE-B9A2-79439DD67193@oracle.com> <7A7DF220-BF4C-4A34-B2D7-A6E3A26121EA@oracle.com> Message-ID: <372A81A3-7410-427E-A463-A3E8C2F7D4FD@oracle.com> On Mar 10, 2014, at 8:49 AM, Roland Westrelin wrote: > Hi Chris, > > See below > >>>> This is a very nice cleanup too. >>>> >>>> + assert(bang_size_in_bytes >= frame_size_in_bytes, "stack bang size incorrect?); >>>> >>>> I?m pretty sure this is almost always true but it might not (for whatever reason). I don?t see much value in that assert. >>> >>> With the current code, yes. But what if changes are made to the stack banging code? Wouldn?t it be nice to catch a problem if something goes very wrong? >>> >>>> src/share/vm/opto/compile.cpp: >>>> >>>> + int Compile::bang_size_in_bytes() const { >>>> + int callee_locals = method() != NULL ? method()->max_locals() : 0; >>>> + int interpreter_frame_size = _interpreter_frame_size; >>>> + return MAX2(interpreter_frame_size, frame_size_in_bytes()); >>>> + } >>>> >>>> callee_locals is unused. Is there a reason you load _interpreter_frame_size into a local variable? >>> >>> Thanks for catching that. I?ll clean it up. >>> >>>> >>>> src/share/vm/asm/assembler.cpp: >>>> >>>> ! int bang_end = (StackShadowPages+1)*page_size; >>>> >>>> Why +1? >>> >>> From my first email: >> >> Sure but what about the people who read the code but didn?t see that email? > > I?ll add a comment. 
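As a side note, the effect of the "+1" can be seen with a small standalone calculation. This is not the HotSpot code, only the arithmetic: with an inclusive loop bound, raising bang_end from StackShadowPages*page_size to (StackShadowPages+1)*page_size touches exactly one additional page.

    #include <cstdio>

    int main() {
      const int page_size = 4096;
      const int shadow_pages = 6;                  // stand-in for StackShadowPages
      for (int extra = 0; extra <= 1; ++extra) {
        int bang_end = (shadow_pages + extra) * page_size;
        int banged = 0;
        for (int bang_offset = page_size; bang_offset <= bang_end; bang_offset += page_size) {
          ++banged;                                // real code: bang_stack_with_offset(bang_offset)
        }
        std::printf("bang_end = (StackShadowPages+%d)*page_size -> %d pages banged\n",
                    extra, banged);
      }
      return 0;
    }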
> >>>>> This change in AbstractAssembler::generate_stack_overflow_check(): >>>>> >>>>> 137 int bang_end = (StackShadowPages+1)*page_size; >>>>> >>>>> is so that the stack banging code from the deopt/uncommon trap blobs and in the compiled code are consistent. Let?s say frame size is less than 1 page. During deoptimization, we bang sp+1 page and then sp+2 pages ? sp+(StackShadowPages+1) pages. If the frame size is > 1 page and <= 2 pages, we bang sp+1page, sp+2pages and then sp+3pages ? sp+(ShadowPages+2) pages. In the compiled code, to be consistent, for a frame size of less than 1 page we need to bang sp+(StackShadowPages+1) pages, if it?s > 1 page and <= 2 pages then we need to bang sp+(StackShadowPages+2) pages etc. >>> >>> >>> >>>> >>>> src/share/vm/ci/ciMethod.cpp: >>>> >>>> + int ciMethod::get_stack_effect_at_invoke(int bci, Bytecodes::Code code, int& inputs) { >>>> + int ciMethod::stack_effect_if_at_invoke(int bci) { >>>> >>>> Either both with or without ?get?. >>> >>> Ok. >>> >>>> Since this is very hard to test what testing did you do? >>> >>> Indeed. >>> I logged every stack size computation that the compilers do, ran some tests (some subset of specjvm98) with DeoptimizeALot and verified with a script that the stack size computation at deoptimization matches the one done by the compilers before. >>> I ran regression tests from: java/lang, java/util, hotspot/compiler, hotspot/runtime, hotspot/gc >>> + nsk.stress, vm.compiler, vm.regression, nsk.regression, nsk.monitoring >>> with -Xcomp and with and without -XX:+DeoptimizeALot on x64. >> >> Any tests that throw a StackOverflowError? > > The hotspot/compiler regression tests have a few. Otherwise, frankly, I don?t know. I don?t see what other testing I could do. The testing you?ve done should be enough. Thanks. > > Roland. > >> >>> >>> Roland. >>> >>>> >>>> On Mar 6, 2014, at 3:08 AM, Roland Westrelin wrote: >>>> >>>>> This test causes a deadlock because when the stack bang in the deopt or uncommon trap blobs triggers an exception, we throw the exception right away even if the deoptee has some monitors locked. We had several issues recently with the stack banging in the deopt/uncommon trap blobs and so rather than add more code to fix stack banging on deoptimization, this change removes the need for stack banging on deoptimization as discussed previously: >>>>> http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2014-February/013519.html >>>>> >>>>> The compilers compute by how much deoptimization would bang the stack at every possible deoptimization points in the compiled code and use the worst case to generate the stack banging in the nmethod. In debug builds, the stack banging code is still performed in the deopt/uncommon trap blobs but only to verify that the compiled code has done the stack banging correctly. Otherwise, the stack banging from deoptimization causes the VM to abort. >>>>> >>>>> This change contains some code refactoring. AbstractInterpreter::size_activation() is currently implemented as a call to AbstractInterpreter::layout_activation() but on most platforms, the logic to do the actual lay out of the activation and the logic to calculate its size are largely independent and having both done by layout_activation() feels wrong to me and error prone. I made AbstractInterpreter::size_activation() and AbstractInterpreter::layout_activation() two independent methods that share common helper functions if some code needs to be shared. 
I dropped unnecessary arguments to size_activation() in the current implementation as well. I also made it a template method so that it can be called with either a Method* (from the deoptimization code) or a ciMethod* (from the compilers). >>>>> >>>>> I created src/cpu/x86/vm/templateInterpreter_x86.cpp to put code that is common to 32 and 64 bit x86. >>>>> >>>>> This change in AbstractAssembler::generate_stack_overflow_check(): >>>>> >>>>> 137 int bang_end = (StackShadowPages+1)*page_size; >>>>> >>>>> is so that the stack banging code from the deopt/uncommon trap blobs and in the compiled code are consistent. Let?s say frame size is less than 1 page. During deoptimization, we bang sp+1 page and then sp+2 pages ? sp+(StackShadowPages+1) pages. If the frame size is > 1 page and <= 2 pages, we bang sp+1page, sp+2pages and then sp+3pages ? sp+(ShadowPages+2) pages. In the compiled code, to be consistent, for a frame size of less than 1 page we need to bang sp+(StackShadowPages+1) pages, if it?s > 1 page and <= 2 pages then we need to bang sp+(StackShadowPages+2) pages etc. >>>>> >>>>> http://cr.openjdk.java.net/~roland/8032410/webrev.01/ >>>>> >>>>> Roland. From vladimir.kozlov at oracle.com Mon Mar 10 22:33:00 2014 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 10 Mar 2014 15:33:00 -0700 Subject: RFR (S): 8023461: Thread holding lock at safepoint that vm can block on: MethodCompileQueue_lock In-Reply-To: <5319F345.80607@oracle.com> References: <5319F345.80607@oracle.com> Message-ID: <531E3D9C.2020004@oracle.com> The method Method::increment_interpreter_invocation_count(TRAP) changes are incorrect. It is used by C++ Interpreter and you did not modified code there. I would leave this method unchanged. The rest looks fine to me but Igor should know better this code. Thanks, Vladimir K On 3/7/14 8:26 AM, Vladimir Ivanov wrote: > http://cr.openjdk.java.net/~vlivanov/8023461/webrev.00 > https://bugs.openjdk.java.net/browse/JDK-8023461 > 42 lines changed: 13 ins; 1 del; 28 mod > > The rule of thumb for VM is that a thread shouldn't hold any VM lock > when it reaches a safepoint. It's not the case for > MethodCompileQueue_lock now. > > The problem is that AdvancedThresholdPolicy updates task's rate when > iterating compiler queue. It holds MethodCompileQueue_lock while doing > so. Method counters are allocated lazily. If method counters aren't > there and VM fails to allocate them, GC is initiated (see > CollectorPolicy::satisfy_failed_metadata_allocation) and a thead > entering a safepoint holding MethodCompileQueue lock. > > Normally, counters are initialized during method interpretation, but in > Xcomp mode it's not the case. That's the mode where the failures are > observed. > > The fix is to skip the update, if counters aren't allocated yet. > > Testing: added No_Safepoint_Verifier, JPRT, failing tests from nightly > testing (in progress). > > Best regards, > Vladimir Ivanov From vladimir.x.ivanov at oracle.com Mon Mar 10 23:57:35 2014 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Tue, 11 Mar 2014 03:57:35 +0400 Subject: RFR (S): 8023461: Thread holding lock at safepoint that vm can block on: MethodCompileQueue_lock In-Reply-To: <531E3D9C.2020004@oracle.com> References: <5319F345.80607@oracle.com> <531E3D9C.2020004@oracle.com> Message-ID: <531E516F.5090000@oracle.com> Vladimir, thanks for the review. You are absolutely right about Method::increment_interpreter_invocation_count. Reverted the change. 
Updated fix: http://cr.openjdk.java.net/~vlivanov/8023461/webrev.01/ Yes, Igor's feedback on this change would be invaluable. Best regards, Vladimir Ivanov On 3/11/14 2:33 AM, Vladimir Kozlov wrote: > The method Method::increment_interpreter_invocation_count(TRAP) changes > are incorrect. It is used by C++ Interpreter and you did not modified > code there. I would leave this method unchanged. > > The rest looks fine to me but Igor should know better this code. > > Thanks, > Vladimir K > > On 3/7/14 8:26 AM, Vladimir Ivanov wrote: >> http://cr.openjdk.java.net/~vlivanov/8023461/webrev.00 >> https://bugs.openjdk.java.net/browse/JDK-8023461 >> 42 lines changed: 13 ins; 1 del; 28 mod >> >> The rule of thumb for VM is that a thread shouldn't hold any VM lock >> when it reaches a safepoint. It's not the case for >> MethodCompileQueue_lock now. >> >> The problem is that AdvancedThresholdPolicy updates task's rate when >> iterating compiler queue. It holds MethodCompileQueue_lock while doing >> so. Method counters are allocated lazily. If method counters aren't >> there and VM fails to allocate them, GC is initiated (see >> CollectorPolicy::satisfy_failed_metadata_allocation) and a thead >> entering a safepoint holding MethodCompileQueue lock. >> >> Normally, counters are initialized during method interpretation, but in >> Xcomp mode it's not the case. That's the mode where the failures are >> observed. >> >> The fix is to skip the update, if counters aren't allocated yet. >> >> Testing: added No_Safepoint_Verifier, JPRT, failing tests from nightly >> testing (in progress). >> >> Best regards, >> Vladimir Ivanov From rednaxelafx at gmail.com Tue Mar 11 00:06:08 2014 From: rednaxelafx at gmail.com (Krystal Mok) Date: Mon, 10 Mar 2014 17:06:08 -0700 Subject: C1's usage of 32-bit registers whose part of 64-bit registers on amd64 Message-ID: Hi all, I'd like to ask a couple of questions on C1's usage of 32-bit registers on amd64, when they're a part of the corresponding 64-bit register (e.g. ESI vs RSI). 1. Does C1 ensure the high 32 bits of a 64-bit register is cleared when using it as a 32-bit register? If so, where does C1 enforce that? I see that for array indexing, C1 generates code that uses 64-bit register whose actual value is only stored in the low 32-bit part, e.g. static int foo(int[] a, int i) { return a[i]; } the actual load in C1 generated code would be (in AT&T syntax): mov 0x10(%rsi,%rax,4),%eax and there's an instruction prior to it that explicitly clears the high 32 bits, movslq %edx,%rax generated by LIRGenerator::emit_array_address(). So it's an invariant property enforced throughout C1, right? 2. There a piece of code in C1's linear scan register allocator that removes useless moves: http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/480b0109db65/src/share/vm/c1/c1_LinearScan.cpp#l2996 // remove useless moves if (op->code() == lir_move) { assert(op->as_Op1() != NULL, "move must be LIR_Op1"); LIR_Op1* move = (LIR_Op1*)op; LIR_Opr src = move->in_opr(); LIR_Opr dst = move->result_opr(); if (dst == src || !dst->is_pointer() && !src->is_pointer() && src->is_same_register(dst)) { instructions->at_put(j, NULL); has_dead = true; } } and I'd like to ask two questions about it: 2.1: On amd64, moving between a 32-bit register and themselves has the side effect of clearing the high 32 bits of the corresponding 64-bit register. So the code being removed isn't entirely side-effect free. It's only safe to remove them if there's an invariant from question 1 holds. 
2.2 This piece of code explicitly checks !LIR_Opr::is_pointer(). Why is this check needed? Could anybody share the history behind it? I thought LIR_Opr::is_same_register() checks LIR_Opr::is_register() which is stricter than !is_pointer(), which seems to make the !is_pointer() check redundant. Thanks, Kris -------------- next part -------------- An HTML attachment was scrubbed... URL: From vladimir.kozlov at oracle.com Tue Mar 11 00:19:02 2014 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 10 Mar 2014 17:19:02 -0700 Subject: RFR (S): 8023461: Thread holding lock at safepoint that vm can block on: MethodCompileQueue_lock In-Reply-To: <531E516F.5090000@oracle.com> References: <5319F345.80607@oracle.com> <531E3D9C.2020004@oracle.com> <531E516F.5090000@oracle.com> Message-ID: <531E5676.9030407@oracle.com> Looks good to me. Thanks, Vladimir On 3/10/14 4:57 PM, Vladimir Ivanov wrote: > Vladimir, thanks for the review. > > You are absolutely right about > Method::increment_interpreter_invocation_count. Reverted the change. > > Updated fix: > http://cr.openjdk.java.net/~vlivanov/8023461/webrev.01/ > > Yes, Igor's feedback on this change would be invaluable. > > Best regards, > Vladimir Ivanov > > On 3/11/14 2:33 AM, Vladimir Kozlov wrote: >> The method Method::increment_interpreter_invocation_count(TRAP) changes >> are incorrect. It is used by C++ Interpreter and you did not modified >> code there. I would leave this method unchanged. >> >> The rest looks fine to me but Igor should know better this code. >> >> Thanks, >> Vladimir K >> >> On 3/7/14 8:26 AM, Vladimir Ivanov wrote: >>> http://cr.openjdk.java.net/~vlivanov/8023461/webrev.00 >>> https://bugs.openjdk.java.net/browse/JDK-8023461 >>> 42 lines changed: 13 ins; 1 del; 28 mod >>> >>> The rule of thumb for VM is that a thread shouldn't hold any VM lock >>> when it reaches a safepoint. It's not the case for >>> MethodCompileQueue_lock now. >>> >>> The problem is that AdvancedThresholdPolicy updates task's rate when >>> iterating compiler queue. It holds MethodCompileQueue_lock while doing >>> so. Method counters are allocated lazily. If method counters aren't >>> there and VM fails to allocate them, GC is initiated (see >>> CollectorPolicy::satisfy_failed_metadata_allocation) and a thead >>> entering a safepoint holding MethodCompileQueue lock. >>> >>> Normally, counters are initialized during method interpretation, but in >>> Xcomp mode it's not the case. That's the mode where the failures are >>> observed. >>> >>> The fix is to skip the update, if counters aren't allocated yet. >>> >>> Testing: added No_Safepoint_Verifier, JPRT, failing tests from nightly >>> testing (in progress). >>> >>> Best regards, >>> Vladimir Ivanov From rednaxelafx at gmail.com Tue Mar 11 01:10:50 2014 From: rednaxelafx at gmail.com (Krystal Mok) Date: Mon, 10 Mar 2014 18:10:50 -0700 Subject: C1's usage of 32-bit registers whose part of 64-bit registers on amd64 In-Reply-To: References: Message-ID: One correction to my previous mail: by "clearing the high 32 bits" I meant to say making the high 32 bits contain something predictable (either zero-extended or sign-extended), instead of some garbage value. - Kris On Mon, Mar 10, 2014 at 5:06 PM, Krystal Mok wrote: > Hi all, > > I'd like to ask a couple of questions on C1's usage of 32-bit registers on > amd64, when they're a part of the corresponding 64-bit register (e.g. ESI > vs RSI). > > 1. Does C1 ensure the high 32 bits of a 64-bit register is cleared when > using it as a 32-bit register? 
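A hedged sketch of the shape of that fix (not the actual advancedThresholdPolicy.cpp change): while MethodCompileQueue_lock is held the policy must not allocate MethodCounters, because a failed metadata allocation can trigger a GC and bring the thread to a safepoint with the lock still held, so the rate update simply bails out when the counters do not exist yet.

    void update_rate_sketch(Method* m) {
      MethodCounters* counters = m->method_counters();  // plain accessor, never allocates
      if (counters == NULL) {
        return;                   // -Xcomp: the interpreter may never have created them
      }
      // safe to read and update the stored rate here (guarded by No_Safepoint_Verifier)
    }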
If so, where does C1 enforce that? > > I see that for array indexing, C1 generates code that uses 64-bit register > whose actual value is only stored in the low 32-bit part, e.g. > > static int foo(int[] a, int i) { > return a[i]; > } > > the actual load in C1 generated code would be (in AT&T syntax): > > mov 0x10(%rsi,%rax,4),%eax > > and there's an instruction prior to it that explicitly clears the high 32 > bits, > > movslq %edx,%rax > > generated by LIRGenerator::emit_array_address(). > > So it's an invariant property enforced throughout C1, right? > > 2. There a piece of code in C1's linear scan register allocator that > removes useless moves: > > > http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/480b0109db65/src/share/vm/c1/c1_LinearScan.cpp#l2996 > > // remove useless moves > if (op->code() == lir_move) { > assert(op->as_Op1() != NULL, "move must be LIR_Op1"); > LIR_Op1* move = (LIR_Op1*)op; > LIR_Opr src = move->in_opr(); > LIR_Opr dst = move->result_opr(); > if (dst == src || > !dst->is_pointer() && !src->is_pointer() && > src->is_same_register(dst)) { > instructions->at_put(j, NULL); > has_dead = true; > } > } > > and I'd like to ask two questions about it: > > 2.1: On amd64, moving between a 32-bit register and themselves has the > side effect of clearing the high 32 bits of the corresponding 64-bit > register. So the code being removed isn't entirely side-effect free. It's > only safe to remove them if there's an invariant from question 1 holds. > > 2.2 This piece of code explicitly checks !LIR_Opr::is_pointer(). Why is > this check needed? Could anybody share the history behind it? > I thought LIR_Opr::is_same_register() checks LIR_Opr::is_register() which > is stricter than !is_pointer(), which seems to make the !is_pointer() check > redundant. > > Thanks, > Kris > -------------- next part -------------- An HTML attachment was scrubbed... URL: From vladimir.kozlov at oracle.com Tue Mar 11 01:11:18 2014 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 10 Mar 2014 18:11:18 -0700 Subject: RFR(L): 8032410: compiler/uncommontrap/TestStackBangRbp.java times out on Solaris-Sparc V9 In-Reply-To: <09C8FEED-84F2-4F1A-A845-5DC8A76CF80E@oracle.com> References: <09C8FEED-84F2-4F1A-A845-5DC8A76CF80E@oracle.com> Message-ID: <531E62B6.1000708@oracle.com> Roland, Changes are good in general. I don't see corresponding changes in MachPrologNode in src/cpu/ppc/vm/ppc.ad. Do we need changes there? src/share/vm/opto/output.cpp should be + DEBUG_ONLY(|| true))); On 3/6/14 3:08 AM, Roland Westrelin wrote: > This test causes a deadlock because when the stack bang in the deopt or uncommon trap blobs triggers an exception, we throw the exception right away even if the deoptee has some monitors locked. We had several issues recently with the stack banging in the deopt/uncommon trap blobs and so rather than add more code to fix stack banging on deoptimization, this change removes the need for stack banging on deoptimization as discussed previously: > http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2014-February/013519.html > > The compilers compute by how much deoptimization would bang the stack at every possible deoptimization points in the compiled code and use the worst case to generate the stack banging in the nmethod. In debug builds, the stack banging code is still performed in the deopt/uncommon trap blobs but only to verify that the compiled code has done the stack banging correctly. Otherwise, the stack banging from deoptimization causes the VM to abort. 
> > This change contains some code refactoring. AbstractInterpreter::size_activation() is currently implemented as a call to AbstractInterpreter::layout_activation() but on most platforms, the logic to do the actual lay out of the activation and the logic to calculate its size are largely independent and having both done by layout_activation() feels wrong to me and error prone. I made AbstractInterpreter::size_activation() and AbstractInterpreter::layout_activation() two independent methods that share common helper functions if some code needs to be shared. I dropped unnecessary arguments to size_activation() in the current implementation as well. I also made it a template method so that it can be called with either a Method* (from the deoptimization code) or a ciMethod* (from the compilers). > > I created src/cpu/x86/vm/templateInterpreter_x86.cpp to put code that is common to 32 and 64 bit x86. > > This change in AbstractAssembler::generate_stack_overflow_check(): > > 137 int bang_end = (StackShadowPages+1)*page_size; > > is so that the stack banging code from the deopt/uncommon trap blobs and in the compiled code are consistent. Let?s say frame size is less than 1 page. During deoptimization, we bang sp+1 page and then sp+2 pages ? sp+(StackShadowPages+1) pages. If the frame size is > 1 page and <= 2 pages, we bang sp+1page, sp+2pages and then sp+3pages ? sp+(ShadowPages+2) pages. In the compiled code, to be consistent, for a frame size of less than 1 page we need to bang sp+(StackShadowPages+1) pages, if it?s > 1 page and <= 2 pages then we need to bang sp+(StackShadowPages+2) pages etc. With +1 you will touch yellow page because it is inclusive if I read it right: while (bang_offset <= bang_end) { Can you test with StackShadowPages=1? Thanks, Vladimir > > http://cr.openjdk.java.net/~roland/8032410/webrev.01/ > > Roland. > From igor.veresov at oracle.com Tue Mar 11 02:52:20 2014 From: igor.veresov at oracle.com (Igor Veresov) Date: Mon, 10 Mar 2014 19:52:20 -0700 Subject: C1's usage of 32-bit registers whose part of 64-bit registers on amd64 In-Reply-To: References: Message-ID: <56643F3A-4DFF-4EF7-9E25-2271E7BC3B79@oracle.com> I think everything should be zero-extended by default on x64. The invariant should be supported by using only 32bit ops on 32bit arguments and using zero-extending loads. Not sure why we do sign extension in the element address formation, zero-extending would seem to be enough (which should be a no-op on x64). igor On Mar 10, 2014, at 5:06 PM, Krystal Mok wrote: > Hi all, > > I'd like to ask a couple of questions on C1's usage of 32-bit registers on amd64, when they're a part of the corresponding 64-bit register (e.g. ESI vs RSI). > > 1. Does C1 ensure the high 32 bits of a 64-bit register is cleared when using it as a 32-bit register? If so, where does C1 enforce that? > > I see that for array indexing, C1 generates code that uses 64-bit register whose actual value is only stored in the low 32-bit part, e.g. > > static int foo(int[] a, int i) { > return a[i]; > } > > the actual load in C1 generated code would be (in AT&T syntax): > > mov 0x10(%rsi,%rax,4),%eax > > and there's an instruction prior to it that explicitly clears the high 32 bits, > > movslq %edx,%rax > > generated by LIRGenerator::emit_array_address(). > > So it's an invariant property enforced throughout C1, right? > > 2. 
There a piece of code in C1's linear scan register allocator that removes useless moves: > > http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/480b0109db65/src/share/vm/c1/c1_LinearScan.cpp#l2996 > > // remove useless moves > if (op->code() == lir_move) { > assert(op->as_Op1() != NULL, "move must be LIR_Op1"); > LIR_Op1* move = (LIR_Op1*)op; > LIR_Opr src = move->in_opr(); > LIR_Opr dst = move->result_opr(); > if (dst == src || > !dst->is_pointer() && !src->is_pointer() && > src->is_same_register(dst)) { > instructions->at_put(j, NULL); > has_dead = true; > } > } > > and I'd like to ask two questions about it: > > 2.1: On amd64, moving between a 32-bit register and themselves has the side effect of clearing the high 32 bits of the corresponding 64-bit register. So the code being removed isn't entirely side-effect free. It's only safe to remove them if there's an invariant from question 1 holds. > > 2.2 This piece of code explicitly checks !LIR_Opr::is_pointer(). Why is this check needed? Could anybody share the history behind it? > I thought LIR_Opr::is_same_register() checks LIR_Opr::is_register() which is stricter than !is_pointer(), which seems to make the !is_pointer() check redundant. > > Thanks, > Kris -------------- next part -------------- An HTML attachment was scrubbed... URL: From igor.veresov at oracle.com Tue Mar 11 03:31:36 2014 From: igor.veresov at oracle.com (Igor Veresov) Date: Mon, 10 Mar 2014 20:31:36 -0700 Subject: RFR (S): 8023461: Thread holding lock at safepoint that vm can block on: MethodCompileQueue_lock In-Reply-To: <531E516F.5090000@oracle.com> References: <5319F345.80607@oracle.com> <531E3D9C.2020004@oracle.com> <531E516F.5090000@oracle.com> Message-ID: <59216276-E78A-431E-A88E-AFAF89905F3E@oracle.com> I think it?s a reasonable fix. igor On Mar 10, 2014, at 4:57 PM, Vladimir Ivanov wrote: > Vladimir, thanks for the review. > > You are absolutely right about Method::increment_interpreter_invocation_count. Reverted the change. > > Updated fix: > http://cr.openjdk.java.net/~vlivanov/8023461/webrev.01/ > > Yes, Igor's feedback on this change would be invaluable. > > Best regards, > Vladimir Ivanov > > On 3/11/14 2:33 AM, Vladimir Kozlov wrote: >> The method Method::increment_interpreter_invocation_count(TRAP) changes >> are incorrect. It is used by C++ Interpreter and you did not modified >> code there. I would leave this method unchanged. >> >> The rest looks fine to me but Igor should know better this code. >> >> Thanks, >> Vladimir K >> >> On 3/7/14 8:26 AM, Vladimir Ivanov wrote: >>> http://cr.openjdk.java.net/~vlivanov/8023461/webrev.00 >>> https://bugs.openjdk.java.net/browse/JDK-8023461 >>> 42 lines changed: 13 ins; 1 del; 28 mod >>> >>> The rule of thumb for VM is that a thread shouldn't hold any VM lock >>> when it reaches a safepoint. It's not the case for >>> MethodCompileQueue_lock now. >>> >>> The problem is that AdvancedThresholdPolicy updates task's rate when >>> iterating compiler queue. It holds MethodCompileQueue_lock while doing >>> so. Method counters are allocated lazily. If method counters aren't >>> there and VM fails to allocate them, GC is initiated (see >>> CollectorPolicy::satisfy_failed_metadata_allocation) and a thead >>> entering a safepoint holding MethodCompileQueue lock. >>> >>> Normally, counters are initialized during method interpretation, but in >>> Xcomp mode it's not the case. That's the mode where the failures are >>> observed. >>> >>> The fix is to skip the update, if counters aren't allocated yet. 
>>> >>> Testing: added No_Safepoint_Verifier, JPRT, failing tests from nightly >>> testing (in progress). >>> >>> Best regards, >>> Vladimir Ivanov From christian.thalinger at oracle.com Tue Mar 11 03:56:42 2014 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Mon, 10 Mar 2014 20:56:42 -0700 Subject: C1's usage of 32-bit registers whose part of 64-bit registers on amd64 In-Reply-To: <56643F3A-4DFF-4EF7-9E25-2271E7BC3B79@oracle.com> References: <56643F3A-4DFF-4EF7-9E25-2271E7BC3B79@oracle.com> Message-ID: <8465A7F2-6E38-42CB-8A51-F222CAEBF3EF@oracle.com> On Mar 10, 2014, at 7:52 PM, Igor Veresov wrote: > I think everything should be zero-extended by default on x64. The invariant should be supported by using only 32bit ops on 32bit arguments and using zero-extending loads. Not sure why we do sign extension in the element address formation, zero-extending would seem to be enough (which should be a no-op on x64). I think the main reason C1 does a sign-extend on 64-bit is because pointers have the type T_LONG and we need the index register to be a T_LONG as well. Additionally to be able to reuse existing machinery we just do an I2L: #ifdef _LP64 if (index_opr->type() == T_INT) { LIR_Opr tmp = new_register(T_LONG); __ convert(Bytecodes::_i2l, index_opr, tmp); index_opr = tmp; } #endif > > igor > > On Mar 10, 2014, at 5:06 PM, Krystal Mok wrote: > >> Hi all, >> >> I'd like to ask a couple of questions on C1's usage of 32-bit registers on amd64, when they're a part of the corresponding 64-bit register (e.g. ESI vs RSI). >> >> 1. Does C1 ensure the high 32 bits of a 64-bit register is cleared when using it as a 32-bit register? If so, where does C1 enforce that? >> >> I see that for array indexing, C1 generates code that uses 64-bit register whose actual value is only stored in the low 32-bit part, e.g. >> >> static int foo(int[] a, int i) { >> return a[i]; >> } >> >> the actual load in C1 generated code would be (in AT&T syntax): >> >> mov 0x10(%rsi,%rax,4),%eax >> >> and there's an instruction prior to it that explicitly clears the high 32 bits, >> >> movslq %edx,%rax >> >> generated by LIRGenerator::emit_array_address(). >> >> So it's an invariant property enforced throughout C1, right? >> >> 2. There a piece of code in C1's linear scan register allocator that removes useless moves: >> >> http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/480b0109db65/src/share/vm/c1/c1_LinearScan.cpp#l2996 >> >> // remove useless moves >> if (op->code() == lir_move) { >> assert(op->as_Op1() != NULL, "move must be LIR_Op1"); >> LIR_Op1* move = (LIR_Op1*)op; >> LIR_Opr src = move->in_opr(); >> LIR_Opr dst = move->result_opr(); >> if (dst == src || >> !dst->is_pointer() && !src->is_pointer() && >> src->is_same_register(dst)) { >> instructions->at_put(j, NULL); >> has_dead = true; >> } >> } >> >> and I'd like to ask two questions about it: >> >> 2.1: On amd64, moving between a 32-bit register and themselves has the side effect of clearing the high 32 bits of the corresponding 64-bit register. So the code being removed isn't entirely side-effect free. It's only safe to remove them if there's an invariant from question 1 holds. >> >> 2.2 This piece of code explicitly checks !LIR_Opr::is_pointer(). Why is this check needed? Could anybody share the history behind it? >> I thought LIR_Opr::is_same_register() checks LIR_Opr::is_register() which is stricter than !is_pointer(), which seems to make the !is_pointer() check redundant. 
>> >> Thanks, >> Kris > -------------- next part -------------- An HTML attachment was scrubbed... URL: From rednaxelafx at gmail.com Tue Mar 11 06:38:01 2014 From: rednaxelafx at gmail.com (Krystal Mok) Date: Mon, 10 Mar 2014 23:38:01 -0700 Subject: C1's usage of 32-bit registers whose part of 64-bit registers on amd64 In-Reply-To: <8465A7F2-6E38-42CB-8A51-F222CAEBF3EF@oracle.com> References: <56643F3A-4DFF-4EF7-9E25-2271E7BC3B79@oracle.com> <8465A7F2-6E38-42CB-8A51-F222CAEBF3EF@oracle.com> Message-ID: Hi Igor and Christian, Thanks a lot for your replies. I think my first question about the invariant boils down to these: 1. I can't trust any 64-bit register used as a 32-bit int to have its high 32 bits cleared, so: I have to always use 32-bit ops when possible; when having to use it in addressing, explicitly clear the high 32 bits. 2. The only special case of having to explicitly clear the high 32 bits is array addressing. Are these statements correct? Also, any thoughts on the second question on removing useless moves? Thanks, Kris On Mon, Mar 10, 2014 at 8:56 PM, Christian Thalinger < christian.thalinger at oracle.com> wrote: > > On Mar 10, 2014, at 7:52 PM, Igor Veresov wrote: > > I think everything should be zero-extended by default on x64. The > invariant should be supported by using only 32bit ops on 32bit arguments > and using zero-extending loads. Not sure why we do sign extension in the > element address formation, zero-extending would seem to be enough (which > should be a no-op on x64). > > > I think the main reason C1 does a sign-extend on 64-bit is because > pointers have the type T_LONG and we need the index register to be a T_LONG > as well. Additionally to be able to reuse existing machinery we just do an > I2L: > > #ifdef _LP64 > if (index_opr->type() == T_INT) { > LIR_Opr tmp = new_register(T_LONG); > __ convert(Bytecodes::_i2l, index_opr, tmp); > index_opr = tmp; > } > #endif > > > igor > > On Mar 10, 2014, at 5:06 PM, Krystal Mok wrote: > > Hi all, > > I'd like to ask a couple of questions on C1's usage of 32-bit registers on > amd64, when they're a part of the corresponding 64-bit register (e.g. ESI > vs RSI). > > 1. Does C1 ensure the high 32 bits of a 64-bit register is cleared when > using it as a 32-bit register? If so, where does C1 enforce that? > > I see that for array indexing, C1 generates code that uses 64-bit register > whose actual value is only stored in the low 32-bit part, e.g. > > static int foo(int[] a, int i) { > return a[i]; > } > > the actual load in C1 generated code would be (in AT&T syntax): > > mov 0x10(%rsi,%rax,4),%eax > > and there's an instruction prior to it that explicitly clears the high 32 > bits, > > movslq %edx,%rax > > generated by LIRGenerator::emit_array_address(). > > So it's an invariant property enforced throughout C1, right? > > 2. 
There a piece of code in C1's linear scan register allocator that > removes useless moves: > > > http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/480b0109db65/src/share/vm/c1/c1_LinearScan.cpp#l2996 > > // remove useless moves > if (op->code() == lir_move) { > assert(op->as_Op1() != NULL, "move must be LIR_Op1"); > LIR_Op1* move = (LIR_Op1*)op; > LIR_Opr src = move->in_opr(); > LIR_Opr dst = move->result_opr(); > if (dst == src || > !dst->is_pointer() && !src->is_pointer() && > src->is_same_register(dst)) { > instructions->at_put(j, NULL); > has_dead = true; > } > } > > and I'd like to ask two questions about it: > > 2.1: On amd64, moving between a 32-bit register and themselves has the > side effect of clearing the high 32 bits of the corresponding 64-bit > register. So the code being removed isn't entirely side-effect free. It's > only safe to remove them if there's an invariant from question 1 holds. > > 2.2 This piece of code explicitly checks !LIR_Opr::is_pointer(). Why is > this check needed? Could anybody share the history behind it? > I thought LIR_Opr::is_same_register() checks LIR_Opr::is_register() which > is stricter than !is_pointer(), which seems to make the !is_pointer() check > redundant. > > Thanks, > Kris > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From igor.veresov at oracle.com Tue Mar 11 07:20:50 2014 From: igor.veresov at oracle.com (Igor Veresov) Date: Tue, 11 Mar 2014 00:20:50 -0700 Subject: C1's usage of 32-bit registers whose part of 64-bit registers on amd64 In-Reply-To: References: <56643F3A-4DFF-4EF7-9E25-2271E7BC3B79@oracle.com> <8465A7F2-6E38-42CB-8A51-F222CAEBF3EF@oracle.com> Message-ID: No, it?s quite the opposite. Upper 32bits should be clear (zeros) for 32bit values on x64. Moreover, C2 relies on the fact the on x64 32bit ints have upper word with zeros. So if you plan to call C2-compiled methods this must hold. Addressing requires that you use full 64-bit registers for the base and index, so if your index is 32bit, you must make it 64-bit one way on another. On SPARC however it?s another story, so you can?t rely on this in platform-independent way. igor On Mar 10, 2014, at 11:38 PM, Krystal Mok wrote: > Hi Igor and Christian, > > Thanks a lot for your replies. I think my first question about the invariant boils down to these: > > 1. I can't trust any 64-bit register used as a 32-bit int to have its high 32 bits cleared, so: I have to always use 32-bit ops when possible; when having to use it in addressing, explicitly clear the high 32 bits. > > 2. The only special case of having to explicitly clear the high 32 bits is array addressing. > > Are these statements correct? > > Also, any thoughts on the second question on removing useless moves? > > Thanks, > Kris > > > On Mon, Mar 10, 2014 at 8:56 PM, Christian Thalinger wrote: > > On Mar 10, 2014, at 7:52 PM, Igor Veresov wrote: > >> I think everything should be zero-extended by default on x64. The invariant should be supported by using only 32bit ops on 32bit arguments and using zero-extending loads. Not sure why we do sign extension in the element address formation, zero-extending would seem to be enough (which should be a no-op on x64). > > I think the main reason C1 does a sign-extend on 64-bit is because pointers have the type T_LONG and we need the index register to be a T_LONG as well. 
Additionally to be able to reuse existing machinery we just do an I2L: > > #ifdef _LP64 > if (index_opr->type() == T_INT) { > LIR_Opr tmp = new_register(T_LONG); > __ convert(Bytecodes::_i2l, index_opr, tmp); > index_opr = tmp; > } > #endif > >> >> igor >> >> On Mar 10, 2014, at 5:06 PM, Krystal Mok wrote: >> >>> Hi all, >>> >>> I'd like to ask a couple of questions on C1's usage of 32-bit registers on amd64, when they're a part of the corresponding 64-bit register (e.g. ESI vs RSI). >>> >>> 1. Does C1 ensure the high 32 bits of a 64-bit register is cleared when using it as a 32-bit register? If so, where does C1 enforce that? >>> >>> I see that for array indexing, C1 generates code that uses 64-bit register whose actual value is only stored in the low 32-bit part, e.g. >>> >>> static int foo(int[] a, int i) { >>> return a[i]; >>> } >>> >>> the actual load in C1 generated code would be (in AT&T syntax): >>> >>> mov 0x10(%rsi,%rax,4),%eax >>> >>> and there's an instruction prior to it that explicitly clears the high 32 bits, >>> >>> movslq %edx,%rax >>> >>> generated by LIRGenerator::emit_array_address(). >>> >>> So it's an invariant property enforced throughout C1, right? >>> >>> 2. There a piece of code in C1's linear scan register allocator that removes useless moves: >>> >>> http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/480b0109db65/src/share/vm/c1/c1_LinearScan.cpp#l2996 >>> >>> // remove useless moves >>> if (op->code() == lir_move) { >>> assert(op->as_Op1() != NULL, "move must be LIR_Op1"); >>> LIR_Op1* move = (LIR_Op1*)op; >>> LIR_Opr src = move->in_opr(); >>> LIR_Opr dst = move->result_opr(); >>> if (dst == src || >>> !dst->is_pointer() && !src->is_pointer() && >>> src->is_same_register(dst)) { >>> instructions->at_put(j, NULL); >>> has_dead = true; >>> } >>> } >>> >>> and I'd like to ask two questions about it: >>> >>> 2.1: On amd64, moving between a 32-bit register and themselves has the side effect of clearing the high 32 bits of the corresponding 64-bit register. So the code being removed isn't entirely side-effect free. It's only safe to remove them if there's an invariant from question 1 holds. >>> >>> 2.2 This piece of code explicitly checks !LIR_Opr::is_pointer(). Why is this check needed? Could anybody share the history behind it? >>> I thought LIR_Opr::is_same_register() checks LIR_Opr::is_register() which is stricter than !is_pointer(), which seems to make the !is_pointer() check redundant. >>> >>> Thanks, >>> Kris >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From rednaxelafx at gmail.com Tue Mar 11 07:34:27 2014 From: rednaxelafx at gmail.com (Krystal Mok) Date: Tue, 11 Mar 2014 00:34:27 -0700 Subject: C1's usage of 32-bit registers whose part of 64-bit registers on amd64 In-Reply-To: References: <56643F3A-4DFF-4EF7-9E25-2271E7BC3B79@oracle.com> <8465A7F2-6E38-42CB-8A51-F222CAEBF3EF@oracle.com> Message-ID: Hi Igor, Thanks again for your reply. I started out to believe that I should be able to trust the upper 32 bits being clean, but then I realized C1 did that i2l explicitly in array addressing. So I'm somehow confused about the assumptions in C1. If the upper 32 bits are guaranteed to be clean, why is there a need for a i2l anyway? Can't we just receive an int argument in esi and then use rsi directly in array addressing? What I really wanted to know is what could go wrong if we didn't have that i2l. 
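As a standalone illustration of the failure mode behind that question (not C1 output): if the upper half of an index register were ever left dirty and the register were used directly in 64-bit addressing, the computed address would be wrong; the movslq / i2l guarantees a properly widened 64-bit index.

  #include <cstdint>
  #include <cstdio>

  int main() {
    int a[4] = {10, 20, 30, 40};
    // A 64-bit "register" whose low 32 bits hold index 2 but whose upper half
    // was never cleared -- exactly the situation the invariant forbids.
    uint64_t dirty   = (uint64_t{0xdeadbeef} << 32) | 2u;
    int64_t  widened = (int64_t)(int32_t)dirty;          // what movslq / i2l produces

    printf("widened index -> a[%lld] = %d\n", (long long)widened, a[widened]);  // 30
    printf("raw 64-bit scaled offset -> %#llx bytes\n",
           (unsigned long long)(dirty * sizeof(int)));   // nowhere near the array
    return 0;
  }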
If we're passing an int argument in a register, there should have been a move or a constant load, and that would have cleared the upper 32 bits already. I'm missing what the failing scenarios are... Thanks, Kris On Tuesday, March 11, 2014, Igor Veresov wrote: > No, it's quite the opposite. Upper 32bits should be clear (zeros) for > 32bit values on x64. Moreover, C2 relies on the fact the on x64 32bit ints > have upper word with zeros. So if you plan to call C2-compiled methods this > must hold. Addressing requires that you use full 64-bit registers for the > base and index, so if your index is 32bit, you must make it 64-bit one way > on another. > > On SPARC however it's another story, so you can't rely on this in > platform-independent way. > > igor > > On Mar 10, 2014, at 11:38 PM, Krystal Mok wrote: > > Hi Igor and Christian, > > Thanks a lot for your replies. I think my first question about the > invariant boils down to these: > > 1. I can't trust any 64-bit register used as a 32-bit int to have its high > 32 bits cleared, so: I have to always use 32-bit ops when possible; when > having to use it in addressing, explicitly clear the high 32 bits. > > 2. The only special case of having to explicitly clear the high 32 bits is > array addressing. > > Are these statements correct? > > Also, any thoughts on the second question on removing useless moves? > > Thanks, > Kris > > > On Mon, Mar 10, 2014 at 8:56 PM, Christian Thalinger < > christian.thalinger at oracle.com> wrote: > > > On Mar 10, 2014, at 7:52 PM, Igor Veresov wrote: > > I think everything should be zero-extended by default on x64. The > invariant should be supported by using only 32bit ops on 32bit arguments > and using zero-extending loads. Not sure why we do sign extension in the > element address formation, zero-extending would seem to be enough (which > should be a no-op on x64). > > > I think the main reason C1 does a sign-extend on 64-bit is because > pointers have the type T_LONG and we need the index register to be a T_LONG > as well. Additionally to be able to reuse existing machinery we just do an > I2L: > > #ifdef _LP64 > if (index_opr->type() == T_INT) { > LIR_Opr tmp = new_register(T_LONG); > __ convert(Bytecodes::_i2l, index_opr, tmp); > index_opr = tmp; > } > #endif > > > igor > > On Mar 10, 2014, at 5:06 PM, Krystal Mok wrote: > > Hi all, > > I'd like to ask a couple of questions on C1's usage of 32-bit registers on > amd64, when they're a part of the corresponding 64-bit register (e.g. ESI > vs RSI). > > 1. Does C1 ensure the high 32 bits of a 64-bit register is cleared when > using it as a 32-bit register? If so, where does C1 enforce that? > > I see that for array indexing, C1 generates code that uses 64-bit register > whose actual value is only stored in the low 32-bit part, e.g. > > static int foo(int[] a, int i) { > return a[i]; > } > > the actual load in C1 generated code would be (in AT&T syntax): > > mov 0x10(%rsi,%rax,4),%eax > > and there's an instruction prior to it that explicitly clears the high 32 > bits, > > movslq %edx,%rax > > generated by LIRGenerator::emit_array_address(). > > So it's an invariant property enforced throughout C1, right? > > 2. There a piece of code in C1's linear scan register allocator that > removes useless moves: > > > http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/480b0109db65/src/share/vm/c1/c1_LinearScan.cpp#l2996 > > // remove useless moves > if (op- > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From vladimir.x.ivanov at oracle.com Tue Mar 11 07:58:43 2014 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Tue, 11 Mar 2014 11:58:43 +0400 Subject: RFR (S): 8023461: Thread holding lock at safepoint that vm can block on: MethodCompileQueue_lock In-Reply-To: <59216276-E78A-431E-A88E-AFAF89905F3E@oracle.com> References: <5319F345.80607@oracle.com> <531E3D9C.2020004@oracle.com> <531E516F.5090000@oracle.com> <59216276-E78A-431E-A88E-AFAF89905F3E@oracle.com> Message-ID: <531EC233.7050409@oracle.com> Igor, Vladimir, thanks for review. Best regards, Vladimir Ivanov On 3/11/14 7:31 AM, Igor Veresov wrote: > I think it?s a reasonable fix. > > igor > > On Mar 10, 2014, at 4:57 PM, Vladimir Ivanov wrote: > >> Vladimir, thanks for the review. >> >> You are absolutely right about Method::increment_interpreter_invocation_count. Reverted the change. >> >> Updated fix: >> http://cr.openjdk.java.net/~vlivanov/8023461/webrev.01/ >> >> Yes, Igor's feedback on this change would be invaluable. >> >> Best regards, >> Vladimir Ivanov >> >> On 3/11/14 2:33 AM, Vladimir Kozlov wrote: >>> The method Method::increment_interpreter_invocation_count(TRAP) changes >>> are incorrect. It is used by C++ Interpreter and you did not modified >>> code there. I would leave this method unchanged. >>> >>> The rest looks fine to me but Igor should know better this code. >>> >>> Thanks, >>> Vladimir K >>> >>> On 3/7/14 8:26 AM, Vladimir Ivanov wrote: >>>> http://cr.openjdk.java.net/~vlivanov/8023461/webrev.00 >>>> https://bugs.openjdk.java.net/browse/JDK-8023461 >>>> 42 lines changed: 13 ins; 1 del; 28 mod >>>> >>>> The rule of thumb for VM is that a thread shouldn't hold any VM lock >>>> when it reaches a safepoint. It's not the case for >>>> MethodCompileQueue_lock now. >>>> >>>> The problem is that AdvancedThresholdPolicy updates task's rate when >>>> iterating compiler queue. It holds MethodCompileQueue_lock while doing >>>> so. Method counters are allocated lazily. If method counters aren't >>>> there and VM fails to allocate them, GC is initiated (see >>>> CollectorPolicy::satisfy_failed_metadata_allocation) and a thead >>>> entering a safepoint holding MethodCompileQueue lock. >>>> >>>> Normally, counters are initialized during method interpretation, but in >>>> Xcomp mode it's not the case. That's the mode where the failures are >>>> observed. >>>> >>>> The fix is to skip the update, if counters aren't allocated yet. >>>> >>>> Testing: added No_Safepoint_Verifier, JPRT, failing tests from nightly >>>> testing (in progress). >>>> >>>> Best regards, >>>> Vladimir Ivanov > From vladimir.x.ivanov at oracle.com Tue Mar 11 15:50:02 2014 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Tue, 11 Mar 2014 19:50:02 +0400 Subject: RFR (S): 8023461: Thread holding lock at safepoint that vm can block on: MethodCompileQueue_lock In-Reply-To: <531EC233.7050409@oracle.com> References: <5319F345.80607@oracle.com> <531E3D9C.2020004@oracle.com> <531E516F.5090000@oracle.com> <59216276-E78A-431E-A88E-AFAF89905F3E@oracle.com> <531EC233.7050409@oracle.com> Message-ID: <531F30AA.2070801@oracle.com> Unfortunately, it's not enough. There's another safepoint check. For blocking compilation requests of stale methods CompileTaskWrapper (see AdvancedThresholdPolicy::select_task) sends a notification to blocked threads after cancelling the compilation. It can safepoint while locking on compile task before sending notification. I don't see how to avoid this situation. Any ideas? 
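For context on how acquiring a lock can safepoint: a JavaThread taking an ordinary Mutex/Monitor in HotSpot goes through a safepoint check while blocked, unless the lock is acquired with the no-safepoint-check flag. A rough sketch with a hypothetical lock name, illustration only:

  // Ordinary acquisition: the thread may block here and take part in a
  // safepoint -- a problem if it already holds another VM lock.
  {
    MutexLocker ml(SomeTask_lock);
    // ... notify waiters, etc. ...
  }

  // Acquisition without a safepoint check: the thread never safepoints while
  // blocked on the lock, at the cost of stricter rules for the critical section.
  {
    MutexLockerEx ml(SomeTask_lock, Mutex::_no_safepoint_check_flag);
    // ...
  }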
Otherwise, I need to exclude MethodCompileQueue from the check in Thread::check_for_valid_safepoint_state. Best regards, Vladimir Ivanov On 3/11/14 11:58 AM, Vladimir Ivanov wrote: > Igor, Vladimir, thanks for review. > > Best regards, > Vladimir Ivanov > > On 3/11/14 7:31 AM, Igor Veresov wrote: >> I think it?s a reasonable fix. >> >> igor >> >> On Mar 10, 2014, at 4:57 PM, Vladimir Ivanov >> wrote: >> >>> Vladimir, thanks for the review. >>> >>> You are absolutely right about >>> Method::increment_interpreter_invocation_count. Reverted the change. >>> >>> Updated fix: >>> http://cr.openjdk.java.net/~vlivanov/8023461/webrev.01/ >>> >>> Yes, Igor's feedback on this change would be invaluable. >>> >>> Best regards, >>> Vladimir Ivanov >>> >>> On 3/11/14 2:33 AM, Vladimir Kozlov wrote: >>>> The method Method::increment_interpreter_invocation_count(TRAP) changes >>>> are incorrect. It is used by C++ Interpreter and you did not modified >>>> code there. I would leave this method unchanged. >>>> >>>> The rest looks fine to me but Igor should know better this code. >>>> >>>> Thanks, >>>> Vladimir K >>>> >>>> On 3/7/14 8:26 AM, Vladimir Ivanov wrote: >>>>> http://cr.openjdk.java.net/~vlivanov/8023461/webrev.00 >>>>> https://bugs.openjdk.java.net/browse/JDK-8023461 >>>>> 42 lines changed: 13 ins; 1 del; 28 mod >>>>> >>>>> The rule of thumb for VM is that a thread shouldn't hold any VM lock >>>>> when it reaches a safepoint. It's not the case for >>>>> MethodCompileQueue_lock now. >>>>> >>>>> The problem is that AdvancedThresholdPolicy updates task's rate when >>>>> iterating compiler queue. It holds MethodCompileQueue_lock while doing >>>>> so. Method counters are allocated lazily. If method counters aren't >>>>> there and VM fails to allocate them, GC is initiated (see >>>>> CollectorPolicy::satisfy_failed_metadata_allocation) and a thead >>>>> entering a safepoint holding MethodCompileQueue lock. >>>>> >>>>> Normally, counters are initialized during method interpretation, >>>>> but in >>>>> Xcomp mode it's not the case. That's the mode where the failures are >>>>> observed. >>>>> >>>>> The fix is to skip the update, if counters aren't allocated yet. >>>>> >>>>> Testing: added No_Safepoint_Verifier, JPRT, failing tests from nightly >>>>> testing (in progress). >>>>> >>>>> Best regards, >>>>> Vladimir Ivanov >> From roland.westrelin at oracle.com Tue Mar 11 16:35:43 2014 From: roland.westrelin at oracle.com (Roland Westrelin) Date: Tue, 11 Mar 2014 17:35:43 +0100 Subject: RFR(L): 8032410: compiler/uncommontrap/TestStackBangRbp.java times out on Solaris-Sparc V9 In-Reply-To: <531E62B6.1000708@oracle.com> References: <09C8FEED-84F2-4F1A-A845-5DC8A76CF80E@oracle.com> <531E62B6.1000708@oracle.com> Message-ID: <87pplslmlc.fsf@oracle.com> Hi Vladimir, > Changes are good in general. Thanks for reviewing this. > I don't see corresponding changes in MachPrologNode in > src/cpu/ppc/vm/ppc.ad. Do we need changes there? We must, I guess. I forgot about c2 ppc. I'll look into it. > src/share/vm/opto/output.cpp should be > > + DEBUG_ONLY(|| true))); Indeed. Thanks for spotting that. > On 3/6/14 3:08 AM, Roland Westrelin wrote: >> This test causes a deadlock because when the stack bang in the deopt >> or uncommon trap blobs triggers an exception, we throw the exception >> right away even if the deoptee has some monitors locked. 
We had >> several issues recently with the stack banging in the deopt/uncommon >> trap blobs and so rather than add more code to fix stack banging on >> deoptimization, this change removes the need for stack banging on >> deoptimization as discussed previously: >> http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2014-February/013519.html >> >> The compilers compute by how much deoptimization would bang the stack >> at every possible deoptimization points in the compiled code and use >> the worst case to generate the stack banging in the nmethod. In debug >> builds, the stack banging code is still performed in the >> deopt/uncommon trap blobs but only to verify that the compiled code >> has done the stack banging correctly. Otherwise, the stack banging >> from deoptimization causes the VM to abort. >> >> This change contains some code >> refactoring. AbstractInterpreter::size_activation() is currently >> implemented as a call to AbstractInterpreter::layout_activation() but >> on most platforms, the logic to do the actual lay out of the >> activation and the logic to calculate its size are largely >> independent and having both done by layout_activation() feels wrong >> to me and error prone. I made AbstractInterpreter::size_activation() >> and AbstractInterpreter::layout_activation() two independent methods >> that share common helper functions if some code needs to be shared. I >> dropped unnecessary arguments to size_activation() in the current >> implementation as well. I also made it a template method so that it >> can be called with either a Method* (from the deoptimization code) or >> a ciMethod* (from the compilers). >> >> I created src/cpu/x86/vm/templateInterpreter_x86.cpp to put code that is common to 32 and 64 bit x86. >> >> This change in AbstractAssembler::generate_stack_overflow_check(): >> >> 137 int bang_end = (StackShadowPages+1)*page_size; >> >> is so that the stack banging code from the deopt/uncommon trap blobs >> and in the compiled code are consistent. Let?s say frame size is less >> than 1 page. During deoptimization, we bang sp+1 page and then sp+2 >> pages ? sp+(StackShadowPages+1) pages. If the frame size is > 1 page >> and <= 2 pages, we bang sp+1page, sp+2pages and then sp+3pages ? >> sp+(ShadowPages+2) pages. In the compiled code, to be consistent, for >> a frame size of less than 1 page we need to bang >> sp+(StackShadowPages+1) pages, if it?s > 1 page and <= 2 pages then >> we need to bang sp+(StackShadowPages+2) pages etc. > > With +1 you will touch yellow page because it is inclusive if I read it > right: > > while (bang_offset <= bang_end) { > > Can you test with StackShadowPages=1? Are you suggesting I run with StackShadowPages=1 to check if: 137 int bang_end = (StackShadowPages+1)*page_size; is ok? What would I run with StackShadowPages=1? The hotspot-comp regression tests? All testing? Do you agree that if I revert to: 137 int bang_end = StackShadowPages*page_size; I need to change stack banging in the deopt/uncommon trap blobs to bang one less page? Roland. > > Thanks, > Vladimir > >> >> http://cr.openjdk.java.net/~roland/8032410/webrev.01/ >> >> Roland. 
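A tiny standalone model of the page counts in Roland's description above (the constants are made up for the example; this is not the generate_stack_overflow_check code itself):

  #include <cstdio>

  // Which pages below SP get banged, following the description above:
  //   frame <= 1 page           -> bang sp+1 .. sp+(StackShadowPages+1) pages
  //   1 page < frame <= 2 pages -> bang sp+1 .. sp+(StackShadowPages+2) pages, etc.
  int main() {
    const int page_size        = 4096;   // example values only
    const int StackShadowPages = 6;
    const int frame_size       = 6000;   // ~1.5 pages

    int extra_pages = (frame_size + page_size - 1) / page_size - 1;  // 0 if frame fits in one page
    int last_page   = StackShadowPages + 1 + extra_pages;

    for (int p = 1; p <= last_page; ++p)
      printf("bang sp + %d page(s)\n", p);
    return 0;
  }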
>> From igor.veresov at oracle.com Tue Mar 11 16:36:42 2014 From: igor.veresov at oracle.com (Igor Veresov) Date: Tue, 11 Mar 2014 09:36:42 -0700 Subject: C1's usage of 32-bit registers whose part of 64-bit registers on amd64 In-Reply-To: References: <56643F3A-4DFF-4EF7-9E25-2271E7BC3B79@oracle.com> <8465A7F2-6E38-42CB-8A51-F222CAEBF3EF@oracle.com> Message-ID: <082C4D57-9C03-481E-B3E0-228A5099BFFD@oracle.com> In theory you need i2l, because the index can be negative. If you just used it as-is in addressing with the conversion that would be incorrect (addressing wants a 64-bit register). However, in this case you?re right, and we?re pretty sure it?s never negative and using it directly would be just as fine, except the type of the virtual register would be T_INT and we really want T_LONG. So, yes, you could have a conversion, say, ?ui2l" that essentially does nothing. But, sign-extending is not wrong either. igor On Mar 11, 2014, at 12:34 AM, Krystal Mok wrote: > Hi Igor, > > Thanks again for your reply. > > I started out to believe that I should be able to trust the upper 32 bits being clean, but then I realized C1 did that i2l explicitly in array addressing. So I'm somehow confused about the assumptions in C1. > > If the upper 32 bits are guaranteed to be clean, why is there a need for a i2l anyway? Can't we just receive an int argument in esi and then use rsi directly in array addressing? > > What I really wanted to know is what could go wrong if we didn't have that i2l. > > If we're passing an int argument in a register, there should have been a move or a constant load, and that would have cleared the upper 32 bits already. I'm missing what the failing scenarios are... > > Thanks, > Kris > > On Tuesday, March 11, 2014, Igor Veresov wrote: > No, it?s quite the opposite. Upper 32bits should be clear (zeros) for 32bit values on x64. Moreover, C2 relies on the fact the on x64 32bit ints have upper word with zeros. So if you plan to call C2-compiled methods this must hold. Addressing requires that you use full 64-bit registers for the base and index, so if your index is 32bit, you must make it 64-bit one way on another. > > On SPARC however it?s another story, so you can?t rely on this in platform-independent way. > > igor > > On Mar 10, 2014, at 11:38 PM, Krystal Mok wrote: > >> Hi Igor and Christian, >> >> Thanks a lot for your replies. I think my first question about the invariant boils down to these: >> >> 1. I can't trust any 64-bit register used as a 32-bit int to have its high 32 bits cleared, so: I have to always use 32-bit ops when possible; when having to use it in addressing, explicitly clear the high 32 bits. >> >> 2. The only special case of having to explicitly clear the high 32 bits is array addressing. >> >> Are these statements correct? >> >> Also, any thoughts on the second question on removing useless moves? >> >> Thanks, >> Kris >> >> >> On Mon, Mar 10, 2014 at 8:56 PM, Christian Thalinger wrote: >> >> On Mar 10, 2014, at 7:52 PM, Igor Veresov wrote: >> >>> I think everything should be zero-extended by default on x64. The invariant should be supported by using only 32bit ops on 32bit arguments and using zero-extending loads. Not sure why we do sign extension in the element address formation, zero-extending would seem to be enough (which should be a no-op on x64). >> >> I think the main reason C1 does a sign-extend on 64-bit is because pointers have the type T_LONG and we need the index register to be a T_LONG as well. 
Additionally to be able to reuse existing machinery we just do an I2L: >> >> #ifdef _LP64 >> if (index_opr->type() == T_INT) { >> LIR_Opr tmp = new_register(T_LONG); >> __ convert(Bytecodes::_i2l, index_opr, tmp); >> index_opr = tmp; >> } >> #endif >> >>> >>> igor >>> >>> On Mar 10, 2014, at 5:06 PM, Krystal Mok wrote: >>> >>>> Hi all, >>>> >>>> I'd like to ask a couple of questions on C1's usage of 32-bit registers on amd64, when they're a part of the corresponding 64-bit register (e.g. ESI vs RSI). >>>> >>>> 1. Does C1 ensure the high 32 bits of a 64-bit register is cleared when using it as a 32-bit register? If so, where does C1 enforce that? >>>> >>>> I see that for array indexing, C1 generates code that uses 64-bit register whose actual value is only stored in the low 32-bit part, e.g. >>>> >>>> static int foo(int[] a, int i) { >>>> return a[i]; >>>> } >>>> >>>> the actual load in C1 generated code would be (in AT&T syntax): >>>> >>>> mov 0x10(%rsi,%rax,4),%eax >>>> >>>> and there's an instruction prior to it that explicitly clears the high 32 bits, >>>> >>>> movslq %edx,%rax >>>> >>>> generated by LIRGenerator::emit_array_address(). >>>> >>>> So it's an invariant property enforced throughout C1, right? >>>> >>>> 2. There a piece of code in C1's linear scan register allocator that removes useless moves: >>>> >>>> http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/480b0109db65/src/share/vm/c1/c1_LinearScan.cpp#l2996 >>>> >>>> // remove useless moves >>>> if (op- -------------- next part -------------- An HTML attachment was scrubbed... URL: From volker.simonis at gmail.com Tue Mar 11 17:03:15 2014 From: volker.simonis at gmail.com (Volker Simonis) Date: Tue, 11 Mar 2014 18:03:15 +0100 Subject: RFR(L): 8032410: compiler/uncommontrap/TestStackBangRbp.java times out on Solaris-Sparc V9 In-Reply-To: <87pplslmlc.fsf@oracle.com> References: <09C8FEED-84F2-4F1A-A845-5DC8A76CF80E@oracle.com> <531E62B6.1000708@oracle.com> <87pplslmlc.fsf@oracle.com> Message-ID: Hi, thanks for thinking about PPC. Goetz will have a look at this tomorrow. If it's not too urgent, it would be nice if you could wait with pushing until we get it running on PPC64 as well. Thanks, Volker On Tue, Mar 11, 2014 at 5:35 PM, Roland Westrelin wrote: > > Hi Vladimir, > >> Changes are good in general. > > Thanks for reviewing this. > >> I don't see corresponding changes in MachPrologNode in >> src/cpu/ppc/vm/ppc.ad. Do we need changes there? > > We must, I guess. I forgot about c2 ppc. I'll look into it. > >> src/share/vm/opto/output.cpp should be >> >> + DEBUG_ONLY(|| true))); > > Indeed. Thanks for spotting that. > >> On 3/6/14 3:08 AM, Roland Westrelin wrote: >>> This test causes a deadlock because when the stack bang in the deopt >>> or uncommon trap blobs triggers an exception, we throw the exception >>> right away even if the deoptee has some monitors locked. We had >>> several issues recently with the stack banging in the deopt/uncommon >>> trap blobs and so rather than add more code to fix stack banging on >>> deoptimization, this change removes the need for stack banging on >>> deoptimization as discussed previously: >>> http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2014-February/013519.html >>> >>> The compilers compute by how much deoptimization would bang the stack >>> at every possible deoptimization points in the compiled code and use >>> the worst case to generate the stack banging in the nmethod. 
In debug >>> builds, the stack banging code is still performed in the >>> deopt/uncommon trap blobs but only to verify that the compiled code >>> has done the stack banging correctly. Otherwise, the stack banging >>> from deoptimization causes the VM to abort. >>> >>> This change contains some code >>> refactoring. AbstractInterpreter::size_activation() is currently >>> implemented as a call to AbstractInterpreter::layout_activation() but >>> on most platforms, the logic to do the actual lay out of the >>> activation and the logic to calculate its size are largely >>> independent and having both done by layout_activation() feels wrong >>> to me and error prone. I made AbstractInterpreter::size_activation() >>> and AbstractInterpreter::layout_activation() two independent methods >>> that share common helper functions if some code needs to be shared. I >>> dropped unnecessary arguments to size_activation() in the current >>> implementation as well. I also made it a template method so that it >>> can be called with either a Method* (from the deoptimization code) or >>> a ciMethod* (from the compilers). >>> >>> I created src/cpu/x86/vm/templateInterpreter_x86.cpp to put code that is common to 32 and 64 bit x86. >>> >>> This change in AbstractAssembler::generate_stack_overflow_check(): >>> >>> 137 int bang_end = (StackShadowPages+1)*page_size; >>> >>> is so that the stack banging code from the deopt/uncommon trap blobs >>> and in the compiled code are consistent. Let?s say frame size is less >>> than 1 page. During deoptimization, we bang sp+1 page and then sp+2 >>> pages ? sp+(StackShadowPages+1) pages. If the frame size is > 1 page >>> and <= 2 pages, we bang sp+1page, sp+2pages and then sp+3pages ? >>> sp+(ShadowPages+2) pages. In the compiled code, to be consistent, for >>> a frame size of less than 1 page we need to bang >>> sp+(StackShadowPages+1) pages, if it?s > 1 page and <= 2 pages then >>> we need to bang sp+(StackShadowPages+2) pages etc. >> >> With +1 you will touch yellow page because it is inclusive if I read it >> right: >> >> while (bang_offset <= bang_end) { >> >> Can you test with StackShadowPages=1? > > Are you suggesting I run with StackShadowPages=1 to check if: > > 137 int bang_end = (StackShadowPages+1)*page_size; > > is ok? > What would I run with StackShadowPages=1? The hotspot-comp regression > tests? All testing? > > Do you agree that if I revert to: > > 137 int bang_end = StackShadowPages*page_size; > > I need to change stack banging in the deopt/uncommon trap blobs to bang > one less page? > > Roland. >> >> Thanks, >> Vladimir >> >>> >>> http://cr.openjdk.java.net/~roland/8032410/webrev.01/ >>> >>> Roland. >>> From roland.westrelin at oracle.com Tue Mar 11 18:05:36 2014 From: roland.westrelin at oracle.com (Roland Westrelin) Date: Tue, 11 Mar 2014 19:05:36 +0100 Subject: RFR(L): 8032410: compiler/uncommontrap/TestStackBangRbp.java times out on Solaris-Sparc V9 In-Reply-To: References: <09C8FEED-84F2-4F1A-A845-5DC8A76CF80E@oracle.com> <531E62B6.1000708@oracle.com> <87pplslmlc.fsf@oracle.com> Message-ID: <97F5D8E4-A24A-4FB3-8DEA-4E18025D3680@oracle.com> Hi Volker, > thanks for thinking about PPC. > Goetz will have a look at this tomorrow. > If it's not too urgent, it would be nice if you could wait with > pushing until we get it running on PPC64 as well. Sure. The webrev has a change to the cpp interpreter for PPC. I don?t know if it compiles. I forgot the ppc.ad file. And the upcoming PPC template interpreter will also need some changes. Roland. 
From igor.veresov at oracle.com Tue Mar 11 18:50:17 2014 From: igor.veresov at oracle.com (Igor Veresov) Date: Tue, 11 Mar 2014 11:50:17 -0700 Subject: RFR (S): 8023461: Thread holding lock at safepoint that vm can block on: MethodCompileQueue_lock In-Reply-To: <531F30AA.2070801@oracle.com> References: <5319F345.80607@oracle.com> <531E3D9C.2020004@oracle.com> <531E516F.5090000@oracle.com> <59216276-E78A-431E-A88E-AFAF89905F3E@oracle.com> <531EC233.7050409@oracle.com> <531F30AA.2070801@oracle.com> Message-ID: <29284C33-0E48-4765-99D3-2F6DAC1189C7@oracle.com> Could you please remind me why we can?t enter a safepoint while holding the MethodCompileQueue_lock? igor On Mar 11, 2014, at 8:50 AM, Vladimir Ivanov wrote: > Unfortunately, it's not enough. There's another safepoint check. > > For blocking compilation requests of stale methods CompileTaskWrapper (see AdvancedThresholdPolicy::select_task) sends a notification to blocked threads after cancelling the compilation. It can safepoint while locking on compile task before sending notification. > > I don't see how to avoid this situation. Any ideas? > Otherwise, I need to exclude MethodCompileQueue from the check in Thread::check_for_valid_safepoint_state. > > Best regards, > Vladimir Ivanov > > On 3/11/14 11:58 AM, Vladimir Ivanov wrote: >> Igor, Vladimir, thanks for review. >> >> Best regards, >> Vladimir Ivanov >> >> On 3/11/14 7:31 AM, Igor Veresov wrote: >>> I think it?s a reasonable fix. >>> >>> igor >>> >>> On Mar 10, 2014, at 4:57 PM, Vladimir Ivanov >>> wrote: >>> >>>> Vladimir, thanks for the review. >>>> >>>> You are absolutely right about >>>> Method::increment_interpreter_invocation_count. Reverted the change. >>>> >>>> Updated fix: >>>> http://cr.openjdk.java.net/~vlivanov/8023461/webrev.01/ >>>> >>>> Yes, Igor's feedback on this change would be invaluable. >>>> >>>> Best regards, >>>> Vladimir Ivanov >>>> >>>> On 3/11/14 2:33 AM, Vladimir Kozlov wrote: >>>>> The method Method::increment_interpreter_invocation_count(TRAP) changes >>>>> are incorrect. It is used by C++ Interpreter and you did not modified >>>>> code there. I would leave this method unchanged. >>>>> >>>>> The rest looks fine to me but Igor should know better this code. >>>>> >>>>> Thanks, >>>>> Vladimir K >>>>> >>>>> On 3/7/14 8:26 AM, Vladimir Ivanov wrote: >>>>>> http://cr.openjdk.java.net/~vlivanov/8023461/webrev.00 >>>>>> https://bugs.openjdk.java.net/browse/JDK-8023461 >>>>>> 42 lines changed: 13 ins; 1 del; 28 mod >>>>>> >>>>>> The rule of thumb for VM is that a thread shouldn't hold any VM lock >>>>>> when it reaches a safepoint. It's not the case for >>>>>> MethodCompileQueue_lock now. >>>>>> >>>>>> The problem is that AdvancedThresholdPolicy updates task's rate when >>>>>> iterating compiler queue. It holds MethodCompileQueue_lock while doing >>>>>> so. Method counters are allocated lazily. If method counters aren't >>>>>> there and VM fails to allocate them, GC is initiated (see >>>>>> CollectorPolicy::satisfy_failed_metadata_allocation) and a thead >>>>>> entering a safepoint holding MethodCompileQueue lock. >>>>>> >>>>>> Normally, counters are initialized during method interpretation, >>>>>> but in >>>>>> Xcomp mode it's not the case. That's the mode where the failures are >>>>>> observed. >>>>>> >>>>>> The fix is to skip the update, if counters aren't allocated yet. >>>>>> >>>>>> Testing: added No_Safepoint_Verifier, JPRT, failing tests from nightly >>>>>> testing (in progress). 
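A sketch of the kind of debug-only check that "added No_Safepoint_Verifier" refers to: an RAII verifier placed in the scope that must not reach a safepoint (illustrative placement, not the actual patch):

  {
    MutexLocker locker(MethodCompileQueue_lock);
    // In debug builds this object makes the VM assert if anything executed in
    // this scope could bring the thread to a safepoint (for example an
    // allocation that ends up triggering a GC).
    No_Safepoint_Verifier nsv;
    // ... walk the compile queue and update task rates ...
  }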
>>>>>> >>>>>> Best regards, >>>>>> Vladimir Ivanov >>> From rednaxelafx at gmail.com Tue Mar 11 21:53:47 2014 From: rednaxelafx at gmail.com (Krystal Mok) Date: Tue, 11 Mar 2014 14:53:47 -0700 Subject: C1's usage of 32-bit registers whose part of 64-bit registers on amd64 In-Reply-To: <082C4D57-9C03-481E-B3E0-228A5099BFFD@oracle.com> References: <56643F3A-4DFF-4EF7-9E25-2271E7BC3B79@oracle.com> <8465A7F2-6E38-42CB-8A51-F222CAEBF3EF@oracle.com> <082C4D57-9C03-481E-B3E0-228A5099BFFD@oracle.com> Message-ID: Hi Igor, Alrighty, thanks again for your reply! I've got it straight now. Any ideas on the second question that I asked, about the check on !is_pointer() in LinearScan? Thanks, Kris On Tue, Mar 11, 2014 at 9:36 AM, Igor Veresov wrote: > In theory you need i2l, because the index can be negative. If you just > used it as-is in addressing with the conversion that would be incorrect > (addressing wants a 64-bit register). However, in this case you're right, > and we're pretty sure it's never negative and using it directly would be > just as fine, except the type of the virtual register would be T_INT and we > really want T_LONG. So, yes, you could have a conversion, say, "ui2l" that > essentially does nothing. But, sign-extending is not wrong either. > > igor > > On Mar 11, 2014, at 12:34 AM, Krystal Mok wrote: > > Hi Igor, > > Thanks again for your reply. > > I started out to believe that I should be able to trust the upper 32 bits > being clean, but then I realized C1 did that i2l explicitly in array > addressing. So I'm somehow confused about the assumptions in C1. > > If the upper 32 bits are guaranteed to be clean, why is there a need for a > i2l anyway? Can't we just receive an int argument in esi and then use > rsi directly in array addressing? > > What I really wanted to know is what could go wrong if we didn't have that > i2l. > > If we're passing an int argument in a register, there should have been a > move or a constant load, and that would have cleared the upper 32 bits > already. I'm missing what the failing scenarios are... > > Thanks, > Kris > > On Tuesday, March 11, 2014, Igor Veresov wrote: > >> No, it's quite the opposite. Upper 32bits should be clear (zeros) for >> 32bit values on x64. Moreover, C2 relies on the fact the on x64 32bit ints >> have upper word with zeros. So if you plan to call C2-compiled methods this >> must hold. Addressing requires that you use full 64-bit registers for the >> base and index, so if your index is 32bit, you must make it 64-bit one way >> on another. >> >> On SPARC however it's another story, so you can't rely on this in >> platform-independent way. >> >> igor >> >> On Mar 10, 2014, at 11:38 PM, Krystal Mok wrote: >> >> Hi Igor and Christian, >> >> Thanks a lot for your replies. I think my first question about the >> invariant boils down to these: >> >> 1. I can't trust any 64-bit register used as a 32-bit int to have its >> high 32 bits cleared, so: I have to always use 32-bit ops when possible; >> when having to use it in addressing, explicitly clear the high 32 bits. >> >> 2. The only special case of having to explicitly clear the high 32 bits >> is array addressing. >> >> Are these statements correct? >> >> Also, any thoughts on the second question on removing useless moves? >> >> Thanks, >> Kris >> >> >> On Mon, Mar 10, 2014 at 8:56 PM, Christian Thalinger < >> christian.thalinger at oracle.com> wrote: >> >> >> On Mar 10, 2014, at 7:52 PM, Igor Veresov >> wrote: >> >> I think everything should be zero-extended by default on x64. 
The >> invariant should be supported by using only 32bit ops on 32bit arguments >> and using zero-extending loads. Not sure why we do sign extension in the >> element address formation, zero-extending would seem to be enough (which >> should be a no-op on x64). >> >> >> I think the main reason C1 does a sign-extend on 64-bit is because >> pointers have the type T_LONG and we need the index register to be a T_LONG >> as well. Additionally to be able to reuse existing machinery we just do an >> I2L: >> >> #ifdef _LP64 >> if (index_opr->type() == T_INT) { >> LIR_Opr tmp = new_register(T_LONG); >> __ convert(Bytecodes::_i2l, index_opr, tmp); >> index_opr = tmp; >> } >> #endif >> >> >> igor >> >> On Mar 10, 2014, at 5:06 PM, Krystal Mok wrote: >> >> Hi all, >> >> I'd like to ask a couple of questions on C1's usage of 32-bit registers >> on amd64, when they're a part of the corresponding 64-bit register (e.g. >> ESI vs RSI). >> >> 1. Does C1 ensure the high 32 bits of a 64-bit register is cleared when >> using it as a 32-bit register? If so, where does C1 enforce that? >> >> I see that for array indexing, C1 generates code that uses 64-bit >> register whose actual value is only stored in the low 32-bit part, e.g. >> >> static int foo(int[] a, int i) { >> return a[i]; >> } >> >> the actual load in C1 generated code would be (in AT&T syntax): >> >> mov 0x10(%rsi,%rax,4),%eax >> >> and there's an instruction prior to it that explicitly clears the high 32 >> bits, >> >> movslq %edx,%rax >> >> generated by LIRGenerator::emit_array_address(). >> >> So it's an invariant property enforced throughout C1, right? >> >> 2. There a piece of code in C1's linear scan register allocator that >> removes useless moves: >> >> >> http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/480b0109db65/src/share/vm/c1/c1_LinearScan.cpp#l2996 >> >> // remove useless moves >> if (op- >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From vladimir.x.ivanov at oracle.com Tue Mar 11 22:04:06 2014 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Wed, 12 Mar 2014 02:04:06 +0400 Subject: RFR (S): 8023461: Thread holding lock at safepoint that vm can block on: MethodCompileQueue_lock In-Reply-To: <29284C33-0E48-4765-99D3-2F6DAC1189C7@oracle.com> References: <5319F345.80607@oracle.com> <531E3D9C.2020004@oracle.com> <531E516F.5090000@oracle.com> <59216276-E78A-431E-A88E-AFAF89905F3E@oracle.com> <531EC233.7050409@oracle.com> <531F30AA.2070801@oracle.com> <29284C33-0E48-4765-99D3-2F6DAC1189C7@oracle.com> Message-ID: <531F8856.2050807@oracle.com> The policy for a thread is not to hold any locks VM can block on when entering a safepoint (see Thread::check_for_valid_safepoint_state). Otherwise we would need to be very careful about what code can be executed during a safepoint to avoid deadlocks. There are exceptions (like Threads_lock and Compile_lock), but generally we try to adhere the rule. Making an exception for MethodCompileQueue looks safe (I went through the code and didn't find any scenarios when VM can attempt to grab it during a safepoint), but I'd like to avoid it if possible. Best regards, Vladimir Ivanov On 3/11/14 10:50 PM, Igor Veresov wrote: > Could you please remind me why we can?t enter a safepoint while holding the MethodCompileQueue_lock? > > igor > > On Mar 11, 2014, at 8:50 AM, Vladimir Ivanov wrote: > >> Unfortunately, it's not enough. There's another safepoint check. 
>> >> For blocking compilation requests of stale methods CompileTaskWrapper (see AdvancedThresholdPolicy::select_task) sends a notification to blocked threads after cancelling the compilation. It can safepoint while locking on compile task before sending notification. >> >> I don't see how to avoid this situation. Any ideas? >> Otherwise, I need to exclude MethodCompileQueue from the check in Thread::check_for_valid_safepoint_state. >> >> Best regards, >> Vladimir Ivanov >> >> On 3/11/14 11:58 AM, Vladimir Ivanov wrote: >>> Igor, Vladimir, thanks for review. >>> >>> Best regards, >>> Vladimir Ivanov >>> >>> On 3/11/14 7:31 AM, Igor Veresov wrote: >>>> I think it?s a reasonable fix. >>>> >>>> igor >>>> >>>> On Mar 10, 2014, at 4:57 PM, Vladimir Ivanov >>>> wrote: >>>> >>>>> Vladimir, thanks for the review. >>>>> >>>>> You are absolutely right about >>>>> Method::increment_interpreter_invocation_count. Reverted the change. >>>>> >>>>> Updated fix: >>>>> http://cr.openjdk.java.net/~vlivanov/8023461/webrev.01/ >>>>> >>>>> Yes, Igor's feedback on this change would be invaluable. >>>>> >>>>> Best regards, >>>>> Vladimir Ivanov >>>>> >>>>> On 3/11/14 2:33 AM, Vladimir Kozlov wrote: >>>>>> The method Method::increment_interpreter_invocation_count(TRAP) changes >>>>>> are incorrect. It is used by C++ Interpreter and you did not modified >>>>>> code there. I would leave this method unchanged. >>>>>> >>>>>> The rest looks fine to me but Igor should know better this code. >>>>>> >>>>>> Thanks, >>>>>> Vladimir K >>>>>> >>>>>> On 3/7/14 8:26 AM, Vladimir Ivanov wrote: >>>>>>> http://cr.openjdk.java.net/~vlivanov/8023461/webrev.00 >>>>>>> https://bugs.openjdk.java.net/browse/JDK-8023461 >>>>>>> 42 lines changed: 13 ins; 1 del; 28 mod >>>>>>> >>>>>>> The rule of thumb for VM is that a thread shouldn't hold any VM lock >>>>>>> when it reaches a safepoint. It's not the case for >>>>>>> MethodCompileQueue_lock now. >>>>>>> >>>>>>> The problem is that AdvancedThresholdPolicy updates task's rate when >>>>>>> iterating compiler queue. It holds MethodCompileQueue_lock while doing >>>>>>> so. Method counters are allocated lazily. If method counters aren't >>>>>>> there and VM fails to allocate them, GC is initiated (see >>>>>>> CollectorPolicy::satisfy_failed_metadata_allocation) and a thead >>>>>>> entering a safepoint holding MethodCompileQueue lock. >>>>>>> >>>>>>> Normally, counters are initialized during method interpretation, >>>>>>> but in >>>>>>> Xcomp mode it's not the case. That's the mode where the failures are >>>>>>> observed. >>>>>>> >>>>>>> The fix is to skip the update, if counters aren't allocated yet. >>>>>>> >>>>>>> Testing: added No_Safepoint_Verifier, JPRT, failing tests from nightly >>>>>>> testing (in progress). >>>>>>> >>>>>>> Best regards, >>>>>>> Vladimir Ivanov >>>> > From igor.veresov at oracle.com Tue Mar 11 23:11:22 2014 From: igor.veresov at oracle.com (Igor Veresov) Date: Tue, 11 Mar 2014 16:11:22 -0700 Subject: RFR (S): 8023461: Thread holding lock at safepoint that vm can block on: MethodCompileQueue_lock In-Reply-To: <531F8856.2050807@oracle.com> References: <5319F345.80607@oracle.com> <531E3D9C.2020004@oracle.com> <531E516F.5090000@oracle.com> <59216276-E78A-431E-A88E-AFAF89905F3E@oracle.com> <531EC233.7050409@oracle.com> <531F30AA.2070801@oracle.com> <29284C33-0E48-4765-99D3-2F6DAC1189C7@oracle.com> <531F8856.2050807@oracle.com> Message-ID: <044D1ED2-D7F7-48F4-A3DF-C15028EBA9D6@oracle.com> I vaguely remember that is was allowed before. 
That?s basically the reason why everything has handles in the policy. I need to recall how that works... Btw, I may be wrong but it seems like there could be a race in MethodCounters creation. There is a similar problem with MDO, but we take a lock for it to avoid races. igor On Mar 11, 2014, at 3:04 PM, Vladimir Ivanov wrote: > The policy for a thread is not to hold any locks VM can block on when entering a safepoint (see Thread::check_for_valid_safepoint_state). > > Otherwise we would need to be very careful about what code can be executed during a safepoint to avoid deadlocks. > > There are exceptions (like Threads_lock and Compile_lock), but generally we try to adhere the rule. > > Making an exception for MethodCompileQueue looks safe (I went through the code and didn't find any scenarios when VM can attempt to grab it during a safepoint), but I'd like to avoid it if possible. > > Best regards, > Vladimir Ivanov > > On 3/11/14 10:50 PM, Igor Veresov wrote: >> Could you please remind me why we can?t enter a safepoint while holding the MethodCompileQueue_lock? >> >> igor >> >> On Mar 11, 2014, at 8:50 AM, Vladimir Ivanov wrote: >> >>> Unfortunately, it's not enough. There's another safepoint check. >>> >>> For blocking compilation requests of stale methods CompileTaskWrapper (see AdvancedThresholdPolicy::select_task) sends a notification to blocked threads after cancelling the compilation. It can safepoint while locking on compile task before sending notification. >>> >>> I don't see how to avoid this situation. Any ideas? >>> Otherwise, I need to exclude MethodCompileQueue from the check in Thread::check_for_valid_safepoint_state. >>> >>> Best regards, >>> Vladimir Ivanov >>> >>> On 3/11/14 11:58 AM, Vladimir Ivanov wrote: >>>> Igor, Vladimir, thanks for review. >>>> >>>> Best regards, >>>> Vladimir Ivanov >>>> >>>> On 3/11/14 7:31 AM, Igor Veresov wrote: >>>>> I think it?s a reasonable fix. >>>>> >>>>> igor >>>>> >>>>> On Mar 10, 2014, at 4:57 PM, Vladimir Ivanov >>>>> wrote: >>>>> >>>>>> Vladimir, thanks for the review. >>>>>> >>>>>> You are absolutely right about >>>>>> Method::increment_interpreter_invocation_count. Reverted the change. >>>>>> >>>>>> Updated fix: >>>>>> http://cr.openjdk.java.net/~vlivanov/8023461/webrev.01/ >>>>>> >>>>>> Yes, Igor's feedback on this change would be invaluable. >>>>>> >>>>>> Best regards, >>>>>> Vladimir Ivanov >>>>>> >>>>>> On 3/11/14 2:33 AM, Vladimir Kozlov wrote: >>>>>>> The method Method::increment_interpreter_invocation_count(TRAP) changes >>>>>>> are incorrect. It is used by C++ Interpreter and you did not modified >>>>>>> code there. I would leave this method unchanged. >>>>>>> >>>>>>> The rest looks fine to me but Igor should know better this code. >>>>>>> >>>>>>> Thanks, >>>>>>> Vladimir K >>>>>>> >>>>>>> On 3/7/14 8:26 AM, Vladimir Ivanov wrote: >>>>>>>> http://cr.openjdk.java.net/~vlivanov/8023461/webrev.00 >>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8023461 >>>>>>>> 42 lines changed: 13 ins; 1 del; 28 mod >>>>>>>> >>>>>>>> The rule of thumb for VM is that a thread shouldn't hold any VM lock >>>>>>>> when it reaches a safepoint. It's not the case for >>>>>>>> MethodCompileQueue_lock now. >>>>>>>> >>>>>>>> The problem is that AdvancedThresholdPolicy updates task's rate when >>>>>>>> iterating compiler queue. It holds MethodCompileQueue_lock while doing >>>>>>>> so. Method counters are allocated lazily. 
If method counters aren't >>>>>>>> there and VM fails to allocate them, GC is initiated (see >>>>>>>> CollectorPolicy::satisfy_failed_metadata_allocation) and a thead >>>>>>>> entering a safepoint holding MethodCompileQueue lock. >>>>>>>> >>>>>>>> Normally, counters are initialized during method interpretation, >>>>>>>> but in >>>>>>>> Xcomp mode it's not the case. That's the mode where the failures are >>>>>>>> observed. >>>>>>>> >>>>>>>> The fix is to skip the update, if counters aren't allocated yet. >>>>>>>> >>>>>>>> Testing: added No_Safepoint_Verifier, JPRT, failing tests from nightly >>>>>>>> testing (in progress). >>>>>>>> >>>>>>>> Best regards, >>>>>>>> Vladimir Ivanov >>>>> >> From igor.veresov at oracle.com Tue Mar 11 23:22:40 2014 From: igor.veresov at oracle.com (Igor Veresov) Date: Tue, 11 Mar 2014 16:22:40 -0700 Subject: C1's usage of 32-bit registers whose part of 64-bit registers on amd64 In-Reply-To: References: <56643F3A-4DFF-4EF7-9E25-2271E7BC3B79@oracle.com> <8465A7F2-6E38-42CB-8A51-F222CAEBF3EF@oracle.com> <082C4D57-9C03-481E-B3E0-228A5099BFFD@oracle.com> Message-ID: I don?t know. The only idea is that it could be for the case when we do pointer arithmetic in GC barriers and change the type from T_OBJECT to T_LONG/T_INT, at which point the register if it is the same should disappear from the oopmaps. But I?m probably wrong. igor On Mar 11, 2014, at 2:53 PM, Krystal Mok wrote: > Hi Igor, > > Alrighty, thanks again for your reply! I've got it straight now. > > Any ideas on the second question that I asked, about the check on !is_pointer() in LinearScan? > > Thanks, > Kris > > > On Tue, Mar 11, 2014 at 9:36 AM, Igor Veresov wrote: > In theory you need i2l, because the index can be negative. If you just used it as-is in addressing with the conversion that would be incorrect (addressing wants a 64-bit register). However, in this case you?re right, and we?re pretty sure it?s never negative and using it directly would be just as fine, except the type of the virtual register would be T_INT and we really want T_LONG. So, yes, you could have a conversion, say, ?ui2l" that essentially does nothing. But, sign-extending is not wrong either. > > igor > > On Mar 11, 2014, at 12:34 AM, Krystal Mok wrote: > >> Hi Igor, >> >> Thanks again for your reply. >> >> I started out to believe that I should be able to trust the upper 32 bits being clean, but then I realized C1 did that i2l explicitly in array addressing. So I'm somehow confused about the assumptions in C1. >> >> If the upper 32 bits are guaranteed to be clean, why is there a need for a i2l anyway? Can't we just receive an int argument in esi and then use rsi directly in array addressing? >> >> What I really wanted to know is what could go wrong if we didn't have that i2l. >> >> If we're passing an int argument in a register, there should have been a move or a constant load, and that would have cleared the upper 32 bits already. I'm missing what the failing scenarios are... >> >> Thanks, >> Kris >> >> On Tuesday, March 11, 2014, Igor Veresov wrote: >> No, it?s quite the opposite. Upper 32bits should be clear (zeros) for 32bit values on x64. Moreover, C2 relies on the fact the on x64 32bit ints have upper word with zeros. So if you plan to call C2-compiled methods this must hold. Addressing requires that you use full 64-bit registers for the base and index, so if your index is 32bit, you must make it 64-bit one way on another. >> >> On SPARC however it?s another story, so you can?t rely on this in platform-independent way. 
>> >> igor >> >> On Mar 10, 2014, at 11:38 PM, Krystal Mok wrote: >> >>> Hi Igor and Christian, >>> >>> Thanks a lot for your replies. I think my first question about the invariant boils down to these: >>> >>> 1. I can't trust any 64-bit register used as a 32-bit int to have its high 32 bits cleared, so: I have to always use 32-bit ops when possible; when having to use it in addressing, explicitly clear the high 32 bits. >>> >>> 2. The only special case of having to explicitly clear the high 32 bits is array addressing. >>> >>> Are these statements correct? >>> >>> Also, any thoughts on the second question on removing useless moves? >>> >>> Thanks, >>> Kris >>> >>> >>> On Mon, Mar 10, 2014 at 8:56 PM, Christian Thalinger wrote: >>> >>> On Mar 10, 2014, at 7:52 PM, Igor Veresov wrote: >>> >>>> I think everything should be zero-extended by default on x64. The invariant should be supported by using only 32bit ops on 32bit arguments and using zero-extending loads. Not sure why we do sign extension in the element address formation, zero-extending would seem to be enough (which should be a no-op on x64). >>> >>> I think the main reason C1 does a sign-extend on 64-bit is because pointers have the type T_LONG and we need the index register to be a T_LONG as well. Additionally to be able to reuse existing machinery we just do an I2L: >>> >>> #ifdef _LP64 >>> if (index_opr->type() == T_INT) { >>> LIR_Opr tmp = new_register(T_LONG); >>> __ convert(Bytecodes::_i2l, index_opr, tmp); >>> index_opr = tmp; >>> } >>> #endif >>> >>>> >>>> igor >>>> >>>> On Mar 10, 2014, at 5:06 PM, Krystal Mok wrote: >>>> >>>>> Hi all, >>>>> >>>>> I'd like to ask a couple of questions on C1's usage of 32-bit registers on amd64, when they're a part of the corresponding 64-bit register (e.g. ESI vs RSI). >>>>> >>>>> 1. Does C1 ensure the high 32 bits of a 64-bit register is cleared when using it as a 32-bit register? If so, where does C1 enforce that? >>>>> >>>>> I see that for array indexing, C1 generates code that uses 64-bit register whose actual value is only stored in the low 32-bit part, e.g. >>>>> >>>>> static int foo(int[] a, int i) { >>>>> return a[i]; >>>>> } >>>>> >>>>> the actual load in C1 generated code would be (in AT&T syntax): >>>>> >>>>> mov 0x10(%rsi,%rax,4),%eax >>>>> >>>>> and there's an instruction prior to it that explicitly clears the high 32 bits, >>>>> >>>>> movslq %edx,%rax >>>>> >>>>> generated by LIRGenerator::emit_array_address(). >>>>> >>>>> So it's an invariant property enforced throughout C1, right? >>>>> >>>>> 2. There a piece of code in C1's linear scan register allocator that removes useless moves: >>>>> >>>>> http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/480b0109db65/src/share/vm/c1/c1_LinearScan.cpp#l2996 >>>>> >>>>> // remove useless moves >>>>> if (op- > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From rednaxelafx at gmail.com Tue Mar 11 23:26:54 2014 From: rednaxelafx at gmail.com (Krystal Mok) Date: Tue, 11 Mar 2014 16:26:54 -0700 Subject: C1's usage of 32-bit registers whose part of 64-bit registers on amd64 In-Reply-To: References: <56643F3A-4DFF-4EF7-9E25-2271E7BC3B79@oracle.com> <8465A7F2-6E38-42CB-8A51-F222CAEBF3EF@oracle.com> <082C4D57-9C03-481E-B3E0-228A5099BFFD@oracle.com> Message-ID: Hi Igor, I guess is_pointer() has been a confusing name: it's not about whether the semantic type is a pointer type or not, but rather if the contents of this LIR_Opr is allocated in an instance, in which case is_pointer() is true; or if the contents are actually packed in the LIR_Opr* pointer, in which case it's a fake pointer and is_pointer() is false. When LIR_Opr::is_register() is true, LIR_is_pointer() is always false. So I believe the !is_pointer() check is redundant. Does that sound reasonable? Thanks, Kris On Tue, Mar 11, 2014 at 4:22 PM, Igor Veresov wrote: > I don't know. The only idea is that it could be for the case when we do > pointer arithmetic in GC barriers and change the type from T_OBJECT to > T_LONG/T_INT, at which point the register if it is the same should > disappear from the oopmaps. But I'm probably wrong. > > igor > > > On Mar 11, 2014, at 2:53 PM, Krystal Mok wrote: > > Hi Igor, > > Alrighty, thanks again for your reply! I've got it straight now. > > Any ideas on the second question that I asked, about the check on > !is_pointer() in LinearScan? > > Thanks, > Kris > > > On Tue, Mar 11, 2014 at 9:36 AM, Igor Veresov wrote: > >> In theory you need i2l, because the index can be negative. If you just >> used it as-is in addressing with the conversion that would be incorrect >> (addressing wants a 64-bit register). However, in this case you're right, >> and we're pretty sure it's never negative and using it directly would be >> just as fine, except the type of the virtual register would be T_INT and we >> really want T_LONG. So, yes, you could have a conversion, say, "ui2l" that >> essentially does nothing. But, sign-extending is not wrong either. >> >> igor >> >> On Mar 11, 2014, at 12:34 AM, Krystal Mok wrote: >> >> Hi Igor, >> >> Thanks again for your reply. >> >> I started out to believe that I should be able to trust the upper 32 bits >> being clean, but then I realized C1 did that i2l explicitly in array >> addressing. So I'm somehow confused about the assumptions in C1. >> >> If the upper 32 bits are guaranteed to be clean, why is there a need for >> a i2l anyway? Can't we just receive an int argument in esi and then use >> rsi directly in array addressing? >> >> What I really wanted to know is what could go wrong if we didn't have >> that i2l. >> >> If we're passing an int argument in a register, there should have been a >> move or a constant load, and that would have cleared the upper 32 bits >> already. I'm missing what the failing scenarios are... >> >> Thanks, >> Kris >> >> On Tuesday, March 11, 2014, Igor Veresov wrote: >> >>> No, it's quite the opposite. Upper 32bits should be clear (zeros) for >>> 32bit values on x64. Moreover, C2 relies on the fact the on x64 32bit ints >>> have upper word with zeros. So if you plan to call C2-compiled methods this >>> must hold. Addressing requires that you use full 64-bit registers for the >>> base and index, so if your index is 32bit, you must make it 64-bit one way >>> on another. >>> >>> On SPARC however it's another story, so you can't rely on this in >>> platform-independent way. 
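Kris's description of is_pointer() is essentially the classic tagged-word trick. The sketch below shows the general technique only; the bit assignment, field names and types are invented for illustration and do not match HotSpot's actual LIR_Opr encoding.

#include <cstdint>

// Generic tagged-word sketch of the idea behind is_pointer(): the same
// machine word either packs the operand description into its bits, or is a
// real pointer to an out-of-line descriptor.  All names and bit layouts here
// are invented for illustration.
struct OperandDesc;                                        // the out-of-line case

struct Operand {
  uintptr_t _value;

  bool is_packed()  const { return (_value & 1) != 0; }    // low bit = "packed" tag
  bool is_pointer() const { return !is_packed(); }

  int reg_number() const { return (int)(_value >> 1); }        // packed case
  OperandDesc* desc() const { return (OperandDesc*)_value; }   // pointer case

  static Operand packed_register(int reg) { return Operand{((uintptr_t)reg << 1) | 1}; }
  static Operand from_desc(OperandDesc* d) { return Operand{(uintptr_t)d}; }
};
// An operand built with packed_register() always answers is_pointer() == false,
// which is why a !is_pointer() check on a register operand is redundant -- the
// point Kris makes above.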
>>> >>> igor >>> >>> On Mar 10, 2014, at 11:38 PM, Krystal Mok wrote: >>> >>> Hi Igor and Christian, >>> >>> Thanks a lot for your replies. I think my first question about the >>> invariant boils down to these: >>> >>> 1. I can't trust any 64-bit register used as a 32-bit int to have its >>> high 32 bits cleared, so: I have to always use 32-bit ops when possible; >>> when having to use it in addressing, explicitly clear the high 32 bits. >>> >>> 2. The only special case of having to explicitly clear the high 32 bits >>> is array addressing. >>> >>> Are these statements correct? >>> >>> Also, any thoughts on the second question on removing useless moves? >>> >>> Thanks, >>> Kris >>> >>> >>> On Mon, Mar 10, 2014 at 8:56 PM, Christian Thalinger < >>> christian.thalinger at oracle.com> wrote: >>> >>> >>> On Mar 10, 2014, at 7:52 PM, Igor Veresov >>> wrote: >>> >>> I think everything should be zero-extended by default on x64. The >>> invariant should be supported by using only 32bit ops on 32bit arguments >>> and using zero-extending loads. Not sure why we do sign extension in the >>> element address formation, zero-extending would seem to be enough (which >>> should be a no-op on x64). >>> >>> >>> I think the main reason C1 does a sign-extend on 64-bit is because >>> pointers have the type T_LONG and we need the index register to be a T_LONG >>> as well. Additionally to be able to reuse existing machinery we just do an >>> I2L: >>> >>> #ifdef _LP64 >>> if (index_opr->type() == T_INT) { >>> LIR_Opr tmp = new_register(T_LONG); >>> __ convert(Bytecodes::_i2l, index_opr, tmp); >>> index_opr = tmp; >>> } >>> #endif >>> >>> >>> igor >>> >>> On Mar 10, 2014, at 5:06 PM, Krystal Mok wrote: >>> >>> Hi all, >>> >>> I'd like to ask a couple of questions on C1's usage of 32-bit registers >>> on amd64, when they're a part of the corresponding 64-bit register (e.g. >>> ESI vs RSI). >>> >>> 1. Does C1 ensure the high 32 bits of a 64-bit register is cleared when >>> using it as a 32-bit register? If so, where does C1 enforce that? >>> >>> I see that for array indexing, C1 generates code that uses 64-bit >>> register whose actual value is only stored in the low 32-bit part, e.g. >>> >>> static int foo(int[] a, int i) { >>> return a[i]; >>> } >>> >>> the actual load in C1 generated code would be (in AT&T syntax): >>> >>> mov 0x10(%rsi,%rax,4),%eax >>> >>> and there's an instruction prior to it that explicitly clears the high >>> 32 bits, >>> >>> movslq %edx,%rax >>> >>> generated by LIRGenerator::emit_array_address(). >>> >>> So it's an invariant property enforced throughout C1, right? >>> >>> 2. There a piece of code in C1's linear scan register allocator that >>> removes useless moves: >>> >>> >>> http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/480b0109db65/src/share/vm/c1/c1_LinearScan.cpp#l2996 >>> >>> // remove useless moves >>> if (op- >>> >>> >> > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From vladimir.x.ivanov at oracle.com Tue Mar 11 23:34:24 2014 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Wed, 12 Mar 2014 03:34:24 +0400 Subject: RFR (S): 8023461: Thread holding lock at safepoint that vm can block on: MethodCompileQueue_lock In-Reply-To: <044D1ED2-D7F7-48F4-A3DF-C15028EBA9D6@oracle.com> References: <5319F345.80607@oracle.com> <531E3D9C.2020004@oracle.com> <531E516F.5090000@oracle.com> <59216276-E78A-431E-A88E-AFAF89905F3E@oracle.com> <531EC233.7050409@oracle.com> <531F30AA.2070801@oracle.com> <29284C33-0E48-4765-99D3-2F6DAC1189C7@oracle.com> <531F8856.2050807@oracle.com> <044D1ED2-D7F7-48F4-A3DF-C15028EBA9D6@oracle.com> Message-ID: <531F9D80.7060400@oracle.com> Igor, > I vaguely remember that is was allowed before. That?s basically the reason why everything has handles in the policy. I need to recall how that works... It's there for a long time, but I converted the check from VM warning to fatal error only recently. AdvancedThresholdPolicy::select_task operates on raw Method*. As I can see in the sources, handles are used only in Method::build_method_counters. Lazy allocation of method counters wasn't there originally. It was added by 8010862. > Btw, I may be wrong but it seems like there could be a race in MethodCounters creation. There is a similar problem with MDO, but we take a lock for it to avoid races. You are right. There's a window in Method::build_method_counters when counters can be allocated twice. We need to grab a lock / use CAS to avoid memory leak here. Best regards, Vladimir Ivanov [1] https://bugs.openjdk.java.net/browse/JDK-8010862 > > igor > > On Mar 11, 2014, at 3:04 PM, Vladimir Ivanov wrote: > >> The policy for a thread is not to hold any locks VM can block on when entering a safepoint (see Thread::check_for_valid_safepoint_state). >> >> Otherwise we would need to be very careful about what code can be executed during a safepoint to avoid deadlocks. >> >> There are exceptions (like Threads_lock and Compile_lock), but generally we try to adhere the rule. >> >> Making an exception for MethodCompileQueue looks safe (I went through the code and didn't find any scenarios when VM can attempt to grab it during a safepoint), but I'd like to avoid it if possible. >> >> Best regards, >> Vladimir Ivanov >> >> On 3/11/14 10:50 PM, Igor Veresov wrote: >>> Could you please remind me why we can?t enter a safepoint while holding the MethodCompileQueue_lock? >>> >>> igor >>> >>> On Mar 11, 2014, at 8:50 AM, Vladimir Ivanov wrote: >>> >>>> Unfortunately, it's not enough. There's another safepoint check. >>>> >>>> For blocking compilation requests of stale methods CompileTaskWrapper (see AdvancedThresholdPolicy::select_task) sends a notification to blocked threads after cancelling the compilation. It can safepoint while locking on compile task before sending notification. >>>> >>>> I don't see how to avoid this situation. Any ideas? >>>> Otherwise, I need to exclude MethodCompileQueue from the check in Thread::check_for_valid_safepoint_state. >>>> >>>> Best regards, >>>> Vladimir Ivanov >>>> >>>> On 3/11/14 11:58 AM, Vladimir Ivanov wrote: >>>>> Igor, Vladimir, thanks for review. >>>>> >>>>> Best regards, >>>>> Vladimir Ivanov >>>>> >>>>> On 3/11/14 7:31 AM, Igor Veresov wrote: >>>>>> I think it?s a reasonable fix. >>>>>> >>>>>> igor >>>>>> >>>>>> On Mar 10, 2014, at 4:57 PM, Vladimir Ivanov >>>>>> wrote: >>>>>> >>>>>>> Vladimir, thanks for the review. 
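The double-allocation window in Method::build_method_counters that Vladimir points out above is the classic lazy-initialization race. The self-contained model below uses generic names and std::atomic rather than HotSpot's Atomic class; only the publish-or-free pattern is the point.

#include <atomic>

// Model of the race: two threads may both observe NULL counters and both
// allocate.  Only one allocation may be published; the loser must free its
// copy or the memory leaks.  Names are generic, not the actual HotSpot code.
struct Counters { long invocation_count = 0; };

struct Method {
  std::atomic<Counters*> _counters{nullptr};

  Counters* get_or_build_counters() {
    Counters* c = _counters.load(std::memory_order_acquire);
    if (c != nullptr) return c;                       // fast path: already built
    Counters* fresh = new Counters();
    Counters* expected = nullptr;
    // CAS publication: keep whichever value was installed first.
    if (_counters.compare_exchange_strong(expected, fresh,
                                          std::memory_order_release,
                                          std::memory_order_acquire)) {
      return fresh;                                   // we published ours
    }
    delete fresh;                                     // lost the race: no leak
    return expected;                                  // the winner's counters
  }
};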
>>>>>>> >>>>>>> You are absolutely right about >>>>>>> Method::increment_interpreter_invocation_count. Reverted the change. >>>>>>> >>>>>>> Updated fix: >>>>>>> http://cr.openjdk.java.net/~vlivanov/8023461/webrev.01/ >>>>>>> >>>>>>> Yes, Igor's feedback on this change would be invaluable. >>>>>>> >>>>>>> Best regards, >>>>>>> Vladimir Ivanov >>>>>>> >>>>>>> On 3/11/14 2:33 AM, Vladimir Kozlov wrote: >>>>>>>> The method Method::increment_interpreter_invocation_count(TRAP) changes >>>>>>>> are incorrect. It is used by C++ Interpreter and you did not modified >>>>>>>> code there. I would leave this method unchanged. >>>>>>>> >>>>>>>> The rest looks fine to me but Igor should know better this code. >>>>>>>> >>>>>>>> Thanks, >>>>>>>> Vladimir K >>>>>>>> >>>>>>>> On 3/7/14 8:26 AM, Vladimir Ivanov wrote: >>>>>>>>> http://cr.openjdk.java.net/~vlivanov/8023461/webrev.00 >>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8023461 >>>>>>>>> 42 lines changed: 13 ins; 1 del; 28 mod >>>>>>>>> >>>>>>>>> The rule of thumb for VM is that a thread shouldn't hold any VM lock >>>>>>>>> when it reaches a safepoint. It's not the case for >>>>>>>>> MethodCompileQueue_lock now. >>>>>>>>> >>>>>>>>> The problem is that AdvancedThresholdPolicy updates task's rate when >>>>>>>>> iterating compiler queue. It holds MethodCompileQueue_lock while doing >>>>>>>>> so. Method counters are allocated lazily. If method counters aren't >>>>>>>>> there and VM fails to allocate them, GC is initiated (see >>>>>>>>> CollectorPolicy::satisfy_failed_metadata_allocation) and a thead >>>>>>>>> entering a safepoint holding MethodCompileQueue lock. >>>>>>>>> >>>>>>>>> Normally, counters are initialized during method interpretation, >>>>>>>>> but in >>>>>>>>> Xcomp mode it's not the case. That's the mode where the failures are >>>>>>>>> observed. >>>>>>>>> >>>>>>>>> The fix is to skip the update, if counters aren't allocated yet. >>>>>>>>> >>>>>>>>> Testing: added No_Safepoint_Verifier, JPRT, failing tests from nightly >>>>>>>>> testing (in progress). >>>>>>>>> >>>>>>>>> Best regards, >>>>>>>>> Vladimir Ivanov >>>>>> >>> > From vladimir.kozlov at oracle.com Wed Mar 12 00:27:23 2014 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 11 Mar 2014 17:27:23 -0700 Subject: RFR(L): 8032410: compiler/uncommontrap/TestStackBangRbp.java times out on Solaris-Sparc V9 In-Reply-To: <87pplslmlc.fsf@oracle.com> References: <09C8FEED-84F2-4F1A-A845-5DC8A76CF80E@oracle.com> <531E62B6.1000708@oracle.com> <87pplslmlc.fsf@oracle.com> Message-ID: <531FA9EB.2090906@oracle.com> On 3/11/14 9:35 AM, Roland Westrelin wrote: > > Hi Vladimir, > >> Changes are good in general. > > Thanks for reviewing this. > >> I don't see corresponding changes in MachPrologNode in >> src/cpu/ppc/vm/ppc.ad. Do we need changes there? > > We must, I guess. I forgot about c2 ppc. I'll look into it. > >> src/share/vm/opto/output.cpp should be >> >> + DEBUG_ONLY(|| true))); > > Indeed. Thanks for spotting that. > >> On 3/6/14 3:08 AM, Roland Westrelin wrote: >>> This test causes a deadlock because when the stack bang in the deopt >>> or uncommon trap blobs triggers an exception, we throw the exception >>> right away even if the deoptee has some monitors locked. 
We had >>> several issues recently with the stack banging in the deopt/uncommon >>> trap blobs and so rather than add more code to fix stack banging on >>> deoptimization, this change removes the need for stack banging on >>> deoptimization as discussed previously: >>> http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2014-February/013519.html >>> >>> The compilers compute by how much deoptimization would bang the stack >>> at every possible deoptimization points in the compiled code and use >>> the worst case to generate the stack banging in the nmethod. In debug >>> builds, the stack banging code is still performed in the >>> deopt/uncommon trap blobs but only to verify that the compiled code >>> has done the stack banging correctly. Otherwise, the stack banging >>> from deoptimization causes the VM to abort. >>> >>> This change contains some code >>> refactoring. AbstractInterpreter::size_activation() is currently >>> implemented as a call to AbstractInterpreter::layout_activation() but >>> on most platforms, the logic to do the actual lay out of the >>> activation and the logic to calculate its size are largely >>> independent and having both done by layout_activation() feels wrong >>> to me and error prone. I made AbstractInterpreter::size_activation() >>> and AbstractInterpreter::layout_activation() two independent methods >>> that share common helper functions if some code needs to be shared. I >>> dropped unnecessary arguments to size_activation() in the current >>> implementation as well. I also made it a template method so that it >>> can be called with either a Method* (from the deoptimization code) or >>> a ciMethod* (from the compilers). >>> >>> I created src/cpu/x86/vm/templateInterpreter_x86.cpp to put code that is common to 32 and 64 bit x86. In templateInterpreter_x86.cpp can you add {} for if statement?: 100 #ifdef ASSERT 101 if (!EnableInvokeDynamic) >>> >>> This change in AbstractAssembler::generate_stack_overflow_check(): >>> >>> 137 int bang_end = (StackShadowPages+1)*page_size; >>> >>> is so that the stack banging code from the deopt/uncommon trap blobs >>> and in the compiled code are consistent. Let?s say frame size is less >>> than 1 page. During deoptimization, we bang sp+1 page and then sp+2 >>> pages ? sp+(StackShadowPages+1) pages. If the frame size is > 1 page >>> and <= 2 pages, we bang sp+1page, sp+2pages and then sp+3pages ? >>> sp+(ShadowPages+2) pages. In the compiled code, to be consistent, for >>> a frame size of less than 1 page we need to bang >>> sp+(StackShadowPages+1) pages, if it?s > 1 page and <= 2 pages then >>> we need to bang sp+(StackShadowPages+2) pages etc. >> >> With +1 you will touch yellow page because it is inclusive if I read it >> right: >> >> while (bang_offset <= bang_end) { >> >> Can you test with StackShadowPages=1? > > Are you suggesting I run with StackShadowPages=1 to check if: > > 137 int bang_end = (StackShadowPages+1)*page_size; > > is ok? Yes, because you may be creating hole in banging if compiled code called from interpreter. It should be consistent with AbstractInterpreterGenerator::bang_stack_shadow_pages(). Should you also change it in generate_native_wrapper()? > What would I run with StackShadowPages=1? The hotspot-comp regression > tests? All testing? I think compiler regression tests with -XX:+DeoptimizeALot flag should be enough. 
Thanks, Vladimir > > Do you agree that if I revert to: > > 137 int bang_end = StackShadowPages*page_size; > > I need to change stack banging in the deopt/uncommon trap blobs to bang > one less page? > > Roland. >> >> Thanks, >> Vladimir >> >>> >>> http://cr.openjdk.java.net/~roland/8032410/webrev.01/ >>> >>> Roland. >>> From igor.veresov at oracle.com Wed Mar 12 00:30:16 2014 From: igor.veresov at oracle.com (Igor Veresov) Date: Tue, 11 Mar 2014 17:30:16 -0700 Subject: RFR (S): 8023461: Thread holding lock at safepoint that vm can block on: MethodCompileQueue_lock In-Reply-To: <531F9D80.7060400@oracle.com> References: <5319F345.80607@oracle.com> <531E3D9C.2020004@oracle.com> <531E516F.5090000@oracle.com> <59216276-E78A-431E-A88E-AFAF89905F3E@oracle.com> <531EC233.7050409@oracle.com> <531F30AA.2070801@oracle.com> <29284C33-0E48-4765-99D3-2F6DAC1189C7@oracle.com> <531F8856.2050807@oracle.com> <044D1ED2-D7F7-48F4-A3DF-C15028EBA9D6@oracle.com> <531F9D80.7060400@oracle.com> Message-ID: Yes, I think you?re right, this is a bug. select_task() obviously assumes there can be no safepoints. May be we can introduce a flag in CompileTask that would indicate that it needs to be removed, and then return it from select_task() and correspondingly from CompileQueue::get() and then free it with the existing CompileTaskWrapper that is CompileBroker::compiler_thread_loop() instead of invoking a compile. That way, combined with your existing fixes select_task() will be lock-free. Does that make sense? igor On Mar 11, 2014, at 4:34 PM, Vladimir Ivanov wrote: > Igor, > >> I vaguely remember that is was allowed before. That?s basically the reason why everything has handles in the policy. I need to recall how that works... > It's there for a long time, but I converted the check from VM warning to fatal error only recently. > > AdvancedThresholdPolicy::select_task operates on raw Method*. As I can see in the sources, handles are used only in Method::build_method_counters. Lazy allocation of method counters wasn't there originally. It was added by 8010862. > >> Btw, I may be wrong but it seems like there could be a race in MethodCounters creation. There is a similar problem with MDO, but we take a lock for it to avoid races. > You are right. There's a window in Method::build_method_counters when counters can be allocated twice. We need to grab a lock / use CAS to avoid memory leak here. > > Best regards, > Vladimir Ivanov > > [1] https://bugs.openjdk.java.net/browse/JDK-8010862 > >> >> igor >> >> On Mar 11, 2014, at 3:04 PM, Vladimir Ivanov wrote: >> >>> The policy for a thread is not to hold any locks VM can block on when entering a safepoint (see Thread::check_for_valid_safepoint_state). >>> >>> Otherwise we would need to be very careful about what code can be executed during a safepoint to avoid deadlocks. >>> >>> There are exceptions (like Threads_lock and Compile_lock), but generally we try to adhere the rule. >>> >>> Making an exception for MethodCompileQueue looks safe (I went through the code and didn't find any scenarios when VM can attempt to grab it during a safepoint), but I'd like to avoid it if possible. >>> >>> Best regards, >>> Vladimir Ivanov >>> >>> On 3/11/14 10:50 PM, Igor Veresov wrote: >>>> Could you please remind me why we can?t enter a safepoint while holding the MethodCompileQueue_lock? >>>> >>>> igor >>>> >>>> On Mar 11, 2014, at 8:50 AM, Vladimir Ivanov wrote: >>>> >>>>> Unfortunately, it's not enough. There's another safepoint check. 
>>>>> >>>>> For blocking compilation requests of stale methods CompileTaskWrapper (see AdvancedThresholdPolicy::select_task) sends a notification to blocked threads after cancelling the compilation. It can safepoint while locking on compile task before sending notification. >>>>> >>>>> I don't see how to avoid this situation. Any ideas? >>>>> Otherwise, I need to exclude MethodCompileQueue from the check in Thread::check_for_valid_safepoint_state. >>>>> >>>>> Best regards, >>>>> Vladimir Ivanov >>>>> >>>>> On 3/11/14 11:58 AM, Vladimir Ivanov wrote: >>>>>> Igor, Vladimir, thanks for review. >>>>>> >>>>>> Best regards, >>>>>> Vladimir Ivanov >>>>>> >>>>>> On 3/11/14 7:31 AM, Igor Veresov wrote: >>>>>>> I think it?s a reasonable fix. >>>>>>> >>>>>>> igor >>>>>>> >>>>>>> On Mar 10, 2014, at 4:57 PM, Vladimir Ivanov >>>>>>> wrote: >>>>>>> >>>>>>>> Vladimir, thanks for the review. >>>>>>>> >>>>>>>> You are absolutely right about >>>>>>>> Method::increment_interpreter_invocation_count. Reverted the change. >>>>>>>> >>>>>>>> Updated fix: >>>>>>>> http://cr.openjdk.java.net/~vlivanov/8023461/webrev.01/ >>>>>>>> >>>>>>>> Yes, Igor's feedback on this change would be invaluable. >>>>>>>> >>>>>>>> Best regards, >>>>>>>> Vladimir Ivanov >>>>>>>> >>>>>>>> On 3/11/14 2:33 AM, Vladimir Kozlov wrote: >>>>>>>>> The method Method::increment_interpreter_invocation_count(TRAP) changes >>>>>>>>> are incorrect. It is used by C++ Interpreter and you did not modified >>>>>>>>> code there. I would leave this method unchanged. >>>>>>>>> >>>>>>>>> The rest looks fine to me but Igor should know better this code. >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> Vladimir K >>>>>>>>> >>>>>>>>> On 3/7/14 8:26 AM, Vladimir Ivanov wrote: >>>>>>>>>> http://cr.openjdk.java.net/~vlivanov/8023461/webrev.00 >>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8023461 >>>>>>>>>> 42 lines changed: 13 ins; 1 del; 28 mod >>>>>>>>>> >>>>>>>>>> The rule of thumb for VM is that a thread shouldn't hold any VM lock >>>>>>>>>> when it reaches a safepoint. It's not the case for >>>>>>>>>> MethodCompileQueue_lock now. >>>>>>>>>> >>>>>>>>>> The problem is that AdvancedThresholdPolicy updates task's rate when >>>>>>>>>> iterating compiler queue. It holds MethodCompileQueue_lock while doing >>>>>>>>>> so. Method counters are allocated lazily. If method counters aren't >>>>>>>>>> there and VM fails to allocate them, GC is initiated (see >>>>>>>>>> CollectorPolicy::satisfy_failed_metadata_allocation) and a thead >>>>>>>>>> entering a safepoint holding MethodCompileQueue lock. >>>>>>>>>> >>>>>>>>>> Normally, counters are initialized during method interpretation, >>>>>>>>>> but in >>>>>>>>>> Xcomp mode it's not the case. That's the mode where the failures are >>>>>>>>>> observed. >>>>>>>>>> >>>>>>>>>> The fix is to skip the update, if counters aren't allocated yet. >>>>>>>>>> >>>>>>>>>> Testing: added No_Safepoint_Verifier, JPRT, failing tests from nightly >>>>>>>>>> testing (in progress). 
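The "added No_Safepoint_Verifier" item in the testing note just quoted refers to HotSpot's debug-build guard for exactly the invariant under discussion. A minimal sketch of the pattern is below; the function name and its body are hypothetical and assume HotSpot's CompileTask and No_Safepoint_Verifier declarations are in scope.

// Hypothetical function showing the guard pattern: a No_Safepoint_Verifier on
// the stack makes debug builds assert if anything in its scope reaches a
// safepoint -- the invariant the queue scan must keep while holding
// MethodCompileQueue_lock.
void scan_queue_without_safepointing(CompileTask* head) {
  No_Safepoint_Verifier nsv;                          // debug-only assertion scope
  for (CompileTask* task = head; task != NULL; task = task->next()) {
    // Only touch data that cannot allocate metadata or otherwise block, e.g.
    // skip the rate update when method counters are not allocated yet.
  }
}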
>>>>>>>>>> >>>>>>>>>> Best regards, >>>>>>>>>> Vladimir Ivanov >>>>>>> >>>> >> From igor.veresov at oracle.com Wed Mar 12 00:50:56 2014 From: igor.veresov at oracle.com (Igor Veresov) Date: Tue, 11 Mar 2014 17:50:56 -0700 Subject: RFR (S): 8023461: Thread holding lock at safepoint that vm can block on: MethodCompileQueue_lock In-Reply-To: References: <5319F345.80607@oracle.com> <531E3D9C.2020004@oracle.com> <531E516F.5090000@oracle.com> <59216276-E78A-431E-A88E-AFAF89905F3E@oracle.com> <531EC233.7050409@oracle.com> <531F30AA.2070801@oracle.com> <29284C33-0E48-4765-99D3-2F6DAC1189C7@oracle.com> <531F8856.2050807@oracle.com> <044D1ED2-D7F7-48F4-A3DF-C15028EBA9D6@oracle.com> <531F9D80.7060400@oracle.com> Message-ID: <76925124-60A3-430D-AE00-A32051767514@oracle.com> What I mean is a strategy like this: diff --git a/src/share/vm/compiler/compileBroker.cpp b/src/share/vm/compiler/compileBroker.cpp --- a/src/share/vm/compiler/compileBroker.cpp +++ b/src/share/vm/compiler/compileBroker.cpp @@ -247,6 +247,7 @@ _is_complete = false; _is_success = false; + _skip_compile = false; _code_handle = NULL; _hot_method = NULL; @@ -1691,9 +1692,14 @@ // Assign the task to the current thread. Mark this compilation // thread as active for the profiler. CompileTaskWrapper ctw(task); + methodHandle method(thread, task->method()); + if (task->skip_compile()) { + method->clear_queued_for_compilation(); + continue; + } + nmethodLocker result_handle; // (handle for the nmethod produced by this task) task->set_code_handle(&result_handle); - methodHandle method(thread, task->method()); // Never compile a method if breakpoints are present in it if (method()->number_of_breakpoints() == 0) { diff --git a/src/share/vm/compiler/compileBroker.hpp b/src/share/vm/compiler/compileBroker.hpp --- a/src/share/vm/compiler/compileBroker.hpp +++ b/src/share/vm/compiler/compileBroker.hpp @@ -48,6 +48,7 @@ bool _is_complete; bool _is_success; bool _is_blocking; + bool _skip_compile; int _comp_level; int _num_inlined_bytecodes; nmethodLocker* _code_handle; // holder of eventual result @@ -78,6 +79,8 @@ bool is_blocking() const { return _is_blocking; } bool is_success() const { return _is_success; } + bool skip_compile() const { return _skip_compile; } + void set_skip_compile() { _skip_compile = true; } nmethodLocker* code_handle() const { return _code_handle; } void set_code_handle(nmethodLocker* l) { _code_handle = l; } nmethod* code() const; // _code_handle->code() diff --git a/src/share/vm/runtime/advancedThresholdPolicy.cpp b/src/share/vm/runtime/advancedThresholdPolicy.cpp --- a/src/share/vm/runtime/advancedThresholdPolicy.cpp +++ b/src/share/vm/runtime/advancedThresholdPolicy.cpp @@ -174,11 +174,8 @@ if (PrintTieredEvents) { print_event(REMOVE_FROM_QUEUE, method, method, task->osr_bci(), (CompLevel)task->comp_level()); } - CompileTaskWrapper ctw(task); // Frees the task - compile_queue->remove(task); - method->clear_queued_for_compilation(); - task = next_task; - continue; + task->set_skip_compile(); + return task; } // Select a method with a higher rate On Mar 11, 2014, at 5:30 PM, Igor Veresov wrote: > Yes, I think you?re right, this is a bug. select_task() obviously assumes there can be no safepoints. May be we can introduce a flag in CompileTask that would indicate that it needs to be removed, and then return it from select_task() and correspondingly from CompileQueue::get() and then free it with the existing CompileTaskWrapper that is CompileBroker::compiler_thread_loop() instead of invoking a compile. 
That way, combined with your existing fixes select_task() will be lock-free. Does that make sense? > > igor > > On Mar 11, 2014, at 4:34 PM, Vladimir Ivanov wrote: > >> Igor, >> >>> I vaguely remember that is was allowed before. That?s basically the reason why everything has handles in the policy. I need to recall how that works... >> It's there for a long time, but I converted the check from VM warning to fatal error only recently. >> >> AdvancedThresholdPolicy::select_task operates on raw Method*. As I can see in the sources, handles are used only in Method::build_method_counters. Lazy allocation of method counters wasn't there originally. It was added by 8010862. >> >>> Btw, I may be wrong but it seems like there could be a race in MethodCounters creation. There is a similar problem with MDO, but we take a lock for it to avoid races. >> You are right. There's a window in Method::build_method_counters when counters can be allocated twice. We need to grab a lock / use CAS to avoid memory leak here. >> >> Best regards, >> Vladimir Ivanov >> >> [1] https://bugs.openjdk.java.net/browse/JDK-8010862 >> >>> >>> igor >>> >>> On Mar 11, 2014, at 3:04 PM, Vladimir Ivanov wrote: >>> >>>> The policy for a thread is not to hold any locks VM can block on when entering a safepoint (see Thread::check_for_valid_safepoint_state). >>>> >>>> Otherwise we would need to be very careful about what code can be executed during a safepoint to avoid deadlocks. >>>> >>>> There are exceptions (like Threads_lock and Compile_lock), but generally we try to adhere the rule. >>>> >>>> Making an exception for MethodCompileQueue looks safe (I went through the code and didn't find any scenarios when VM can attempt to grab it during a safepoint), but I'd like to avoid it if possible. >>>> >>>> Best regards, >>>> Vladimir Ivanov >>>> >>>> On 3/11/14 10:50 PM, Igor Veresov wrote: >>>>> Could you please remind me why we can?t enter a safepoint while holding the MethodCompileQueue_lock? >>>>> >>>>> igor >>>>> >>>>> On Mar 11, 2014, at 8:50 AM, Vladimir Ivanov wrote: >>>>> >>>>>> Unfortunately, it's not enough. There's another safepoint check. >>>>>> >>>>>> For blocking compilation requests of stale methods CompileTaskWrapper (see AdvancedThresholdPolicy::select_task) sends a notification to blocked threads after cancelling the compilation. It can safepoint while locking on compile task before sending notification. >>>>>> >>>>>> I don't see how to avoid this situation. Any ideas? >>>>>> Otherwise, I need to exclude MethodCompileQueue from the check in Thread::check_for_valid_safepoint_state. >>>>>> >>>>>> Best regards, >>>>>> Vladimir Ivanov >>>>>> >>>>>> On 3/11/14 11:58 AM, Vladimir Ivanov wrote: >>>>>>> Igor, Vladimir, thanks for review. >>>>>>> >>>>>>> Best regards, >>>>>>> Vladimir Ivanov >>>>>>> >>>>>>> On 3/11/14 7:31 AM, Igor Veresov wrote: >>>>>>>> I think it?s a reasonable fix. >>>>>>>> >>>>>>>> igor >>>>>>>> >>>>>>>> On Mar 10, 2014, at 4:57 PM, Vladimir Ivanov >>>>>>>> wrote: >>>>>>>> >>>>>>>>> Vladimir, thanks for the review. >>>>>>>>> >>>>>>>>> You are absolutely right about >>>>>>>>> Method::increment_interpreter_invocation_count. Reverted the change. >>>>>>>>> >>>>>>>>> Updated fix: >>>>>>>>> http://cr.openjdk.java.net/~vlivanov/8023461/webrev.01/ >>>>>>>>> >>>>>>>>> Yes, Igor's feedback on this change would be invaluable. 
>>>>>>>>> >>>>>>>>> Best regards, >>>>>>>>> Vladimir Ivanov >>>>>>>>> >>>>>>>>> On 3/11/14 2:33 AM, Vladimir Kozlov wrote: >>>>>>>>>> The method Method::increment_interpreter_invocation_count(TRAP) changes >>>>>>>>>> are incorrect. It is used by C++ Interpreter and you did not modified >>>>>>>>>> code there. I would leave this method unchanged. >>>>>>>>>> >>>>>>>>>> The rest looks fine to me but Igor should know better this code. >>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> Vladimir K >>>>>>>>>> >>>>>>>>>> On 3/7/14 8:26 AM, Vladimir Ivanov wrote: >>>>>>>>>>> http://cr.openjdk.java.net/~vlivanov/8023461/webrev.00 >>>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8023461 >>>>>>>>>>> 42 lines changed: 13 ins; 1 del; 28 mod >>>>>>>>>>> >>>>>>>>>>> The rule of thumb for VM is that a thread shouldn't hold any VM lock >>>>>>>>>>> when it reaches a safepoint. It's not the case for >>>>>>>>>>> MethodCompileQueue_lock now. >>>>>>>>>>> >>>>>>>>>>> The problem is that AdvancedThresholdPolicy updates task's rate when >>>>>>>>>>> iterating compiler queue. It holds MethodCompileQueue_lock while doing >>>>>>>>>>> so. Method counters are allocated lazily. If method counters aren't >>>>>>>>>>> there and VM fails to allocate them, GC is initiated (see >>>>>>>>>>> CollectorPolicy::satisfy_failed_metadata_allocation) and a thead >>>>>>>>>>> entering a safepoint holding MethodCompileQueue lock. >>>>>>>>>>> >>>>>>>>>>> Normally, counters are initialized during method interpretation, >>>>>>>>>>> but in >>>>>>>>>>> Xcomp mode it's not the case. That's the mode where the failures are >>>>>>>>>>> observed. >>>>>>>>>>> >>>>>>>>>>> The fix is to skip the update, if counters aren't allocated yet. >>>>>>>>>>> >>>>>>>>>>> Testing: added No_Safepoint_Verifier, JPRT, failing tests from nightly >>>>>>>>>>> testing (in progress). >>>>>>>>>>> >>>>>>>>>>> Best regards, >>>>>>>>>>> Vladimir Ivanov >>>>>>>> >>>>> >>> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From igor.veresov at oracle.com Wed Mar 12 01:09:01 2014 From: igor.veresov at oracle.com (Igor Veresov) Date: Tue, 11 Mar 2014 18:09:01 -0700 Subject: C1's usage of 32-bit registers whose part of 64-bit registers on amd64 In-Reply-To: References: <56643F3A-4DFF-4EF7-9E25-2271E7BC3B79@oracle.com> <8465A7F2-6E38-42CB-8A51-F222CAEBF3EF@oracle.com> <082C4D57-9C03-481E-B3E0-228A5099BFFD@oracle.com> Message-ID: You?re right. Mixed it up with is_oop(). igor On Mar 11, 2014, at 4:26 PM, Krystal Mok wrote: > Hi Igor, > > I guess is_pointer() has been a confusing name: it's not about whether the semantic type is a pointer type or not, but rather if the contents of this LIR_Opr is allocated in an instance, in which case is_pointer() is true; or if the contents are actually packed in the LIR_Opr* pointer, in which case it's a fake pointer and is_pointer() is false. > > When LIR_Opr::is_register() is true, LIR_is_pointer() is always false. So I believe the !is_pointer() check is redundant. > Does that sound reasonable? > > Thanks, > Kris > > > On Tue, Mar 11, 2014 at 4:22 PM, Igor Veresov wrote: > I don?t know. The only idea is that it could be for the case when we do pointer arithmetic in GC barriers and change the type from T_OBJECT to T_LONG/T_INT, at which point the register if it is the same should disappear from the oopmaps. But I?m probably wrong. > > igor > > > On Mar 11, 2014, at 2:53 PM, Krystal Mok wrote: > >> Hi Igor, >> >> Alrighty, thanks again for your reply! I've got it straight now. 
>> >> Any ideas on the second question that I asked, about the check on !is_pointer() in LinearScan? >> >> Thanks, >> Kris >> >> >> On Tue, Mar 11, 2014 at 9:36 AM, Igor Veresov wrote: >> In theory you need i2l, because the index can be negative. If you just used it as-is in addressing with the conversion that would be incorrect (addressing wants a 64-bit register). However, in this case you?re right, and we?re pretty sure it?s never negative and using it directly would be just as fine, except the type of the virtual register would be T_INT and we really want T_LONG. So, yes, you could have a conversion, say, ?ui2l" that essentially does nothing. But, sign-extending is not wrong either. >> >> igor >> >> On Mar 11, 2014, at 12:34 AM, Krystal Mok wrote: >> >>> Hi Igor, >>> >>> Thanks again for your reply. >>> >>> I started out to believe that I should be able to trust the upper 32 bits being clean, but then I realized C1 did that i2l explicitly in array addressing. So I'm somehow confused about the assumptions in C1. >>> >>> If the upper 32 bits are guaranteed to be clean, why is there a need for a i2l anyway? Can't we just receive an int argument in esi and then use rsi directly in array addressing? >>> >>> What I really wanted to know is what could go wrong if we didn't have that i2l. >>> >>> If we're passing an int argument in a register, there should have been a move or a constant load, and that would have cleared the upper 32 bits already. I'm missing what the failing scenarios are... >>> >>> Thanks, >>> Kris >>> >>> On Tuesday, March 11, 2014, Igor Veresov wrote: >>> No, it?s quite the opposite. Upper 32bits should be clear (zeros) for 32bit values on x64. Moreover, C2 relies on the fact the on x64 32bit ints have upper word with zeros. So if you plan to call C2-compiled methods this must hold. Addressing requires that you use full 64-bit registers for the base and index, so if your index is 32bit, you must make it 64-bit one way on another. >>> >>> On SPARC however it?s another story, so you can?t rely on this in platform-independent way. >>> >>> igor >>> >>> On Mar 10, 2014, at 11:38 PM, Krystal Mok wrote: >>> >>>> Hi Igor and Christian, >>>> >>>> Thanks a lot for your replies. I think my first question about the invariant boils down to these: >>>> >>>> 1. I can't trust any 64-bit register used as a 32-bit int to have its high 32 bits cleared, so: I have to always use 32-bit ops when possible; when having to use it in addressing, explicitly clear the high 32 bits. >>>> >>>> 2. The only special case of having to explicitly clear the high 32 bits is array addressing. >>>> >>>> Are these statements correct? >>>> >>>> Also, any thoughts on the second question on removing useless moves? >>>> >>>> Thanks, >>>> Kris >>>> >>>> >>>> On Mon, Mar 10, 2014 at 8:56 PM, Christian Thalinger wrote: >>>> >>>> On Mar 10, 2014, at 7:52 PM, Igor Veresov wrote: >>>> >>>>> I think everything should be zero-extended by default on x64. The invariant should be supported by using only 32bit ops on 32bit arguments and using zero-extending loads. Not sure why we do sign extension in the element address formation, zero-extending would seem to be enough (which should be a no-op on x64). >>>> >>>> I think the main reason C1 does a sign-extend on 64-bit is because pointers have the type T_LONG and we need the index register to be a T_LONG as well. 
Additionally to be able to reuse existing machinery we just do an I2L: >>>> >>>> #ifdef _LP64 >>>> if (index_opr->type() == T_INT) { >>>> LIR_Opr tmp = new_register(T_LONG); >>>> __ convert(Bytecodes::_i2l, index_opr, tmp); >>>> index_opr = tmp; >>>> } >>>> #endif >>>> >>>>> >>>>> igor >>>>> >>>>> On Mar 10, 2014, at 5:06 PM, Krystal Mok wrote: >>>>> >>>>>> Hi all, >>>>>> >>>>>> I'd like to ask a couple of questions on C1's usage of 32-bit registers on amd64, when they're a part of the corresponding 64-bit register (e.g. ESI vs RSI). >>>>>> >>>>>> 1. Does C1 ensure the high 32 bits of a 64-bit register is cleared when using it as a 32-bit register? If so, where does C1 enforce that? >>>>>> >>>>>> I see that for array indexing, C1 generates code that uses 64-bit register whose actual value is only stored in the low 32-bit part, e.g. >>>>>> >>>>>> static int foo(int[] a, int i) { >>>>>> return a[i]; >>>>>> } >>>>>> >>>>>> the actual load in C1 generated code would be (in AT&T syntax): >>>>>> >>>>>> mov 0x10(%rsi,%rax,4),%eax >>>>>> >>>>>> and there's an instruction prior to it that explicitly clears the high 32 bits, >>>>>> >>>>>> movslq %edx,%rax >>>>>> >>>>>> generated by LIRGenerator::emit_array_address(). >>>>>> >>>>>> So it's an invariant property enforced throughout C1, right? >>>>>> >>>>>> 2. There a piece of code in C1's linear scan register allocator that removes useless moves: >>>>>> >>>>>> http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/480b0109db65/src/share/vm/c1/c1_LinearScan.cpp#l2996 >>>>>> >>>>>> // remove useless moves >>>>>> if (op- >> >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From vladimir.kozlov at oracle.com Wed Mar 12 01:11:49 2014 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 11 Mar 2014 18:11:49 -0700 Subject: RFR(L): 8031755: Type speculation should be used to optimize explicit null checks In-Reply-To: <8B07BC32-F70C-4B10-AB8C-C82D1BA8AAFF@oracle.com> References: <8B07BC32-F70C-4B10-AB8C-C82D1BA8AAFF@oracle.com> Message-ID: <531FB455.3070203@oracle.com> On 2/28/14 9:43 AM, Roland Westrelin wrote: > http://cr.openjdk.java.net/~roland/8031755/webrev.00/ > > Most Type profiling points record whether a null reference was seen but type speculation doesn?t currently take advantage of it. With this change, not only is the type seen at a profiling point feed to type speculation but also whether a null pointer was seen or not. This is then used to optimize null checks. > > The _speculative field becomes: > const TypePtr* _speculative; > > so that if all we know from profiling is that a reference is not null: > _speculative = TypePtr::NOT NULL; > > When a type is met with a TypePtr (NULL_PTR for instance), the meet must be applied to the speculative types as well. To keep the type system symmetric, the _speculative field had to be moved to class TypePtr. I also moved the _inline_depth there to keep all speculative stuff together but with the current code it could have stayed in class TypeOopPtr. > > Traps caused by a failed speculative null check are recorded with speculative traps, similarly to what is done for a failed class check. The idea is good, I think, but I need time to go through changes. What about changes in arguments.cpp and phaseX.cpp? I don't understand why you changed higher_equal_speculative() in parse2.cpp. 
> > I made the following change: > > *** 3561,3571 **** > > > // Since klasses are different, we require a LCA in the Java > // class hierarchy - which means we have to fall to at least NotNull. > if( ptr == TopPTR || ptr == AnyNull || ptr == Constant ) > ptr = NotNull; > > - instance_id = InstanceBot; > > > // Now we find the LCA of Java classes > ciKlass* k = this_klass->least_common_ancestor(tinst_klass); > return make(ptr, k, false, NULL, off, instance_id, speculative, depth); > } // End of case InstPtr > > --- 3706,3715 ?? > > because I hit a type not symmetric failure that I think it causes: > > === Meet Not Symmetric === > t = javax/management/openmbean/OpenType:AnyNull * (inline_depth=-2) > this= javax/management/openmbean/ArrayType:TopPTR *,iid=top (inline_depth=InlineDepthTop) > mt=(t meet this)= javax/management/openmbean/ArrayType:AnyNull * (inline_depth=-2) > t_dual= javax/management/openmbean/OpenType:NotNull *,iid=top (inline_depth=2) > this_dual= javax/management/openmbean/ArrayType * > mt_dual= javax/management/openmbean/ArrayType:NotNull *,iid=top (inline_depth=2) > mt_dual meet t_dual= javax/management/openmbean/OpenType:NotNull * (inline_depth=2) > mt_dual meet this_dual= javax/management/openmbean/ArrayType * The question is why OpenType:AnyNull does not have iid=top. It is in upper type lattice. Thanks, Vladimir > > Roland. > From christian.thalinger at oracle.com Wed Mar 12 04:38:12 2014 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Tue, 11 Mar 2014 21:38:12 -0700 Subject: RFR (S): 8031203: remove SafepointPollOffset Message-ID: <2F901183-4942-47CA-A1E8-91B6A0BEED78@oracle.com> https://bugs.openjdk.java.net/browse/JDK-8031203 http://cr.openjdk.java.net/~twisti/8031203 8031203: remove SafepointPollOffset Reviewed-by: SafepointPollOffset is only used in C1 because of JDK-4986249 which was never reproducible with C2. This seems to be an architecture anomaly and confined to certain chip versions. An old comment of JDK-4986249 mentions "some P4's". Running on any newer chips doesn't show that behavior. I suggest to remove that option. CCC request was filed and approved. -------------- next part -------------- An HTML attachment was scrubbed... URL: From vladimir.x.ivanov at oracle.com Wed Mar 12 11:58:20 2014 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Wed, 12 Mar 2014 15:58:20 +0400 Subject: RFR (S): 8023461: Thread holding lock at safepoint that vm can block on: MethodCompileQueue_lock In-Reply-To: <76925124-60A3-430D-AE00-A32051767514@oracle.com> References: <5319F345.80607@oracle.com> <531E3D9C.2020004@oracle.com> <531E516F.5090000@oracle.com> <59216276-E78A-431E-A88E-AFAF89905F3E@oracle.com> <531EC233.7050409@oracle.com> <531F30AA.2070801@oracle.com> <29284C33-0E48-4765-99D3-2F6DAC1189C7@oracle.com> <531F8856.2050807@oracle.com> <044D1ED2-D7F7-48F4-A3DF-C15028EBA9D6@oracle.com> <531F9D80.7060400@oracle.com> <76925124-60A3-430D-AE00-A32051767514@oracle.com> Message-ID: <53204BDC.6080404@oracle.com> Igor, thanks for the suggestion. I thought about this approach and my concern was that we need multiple passes over the queue to clear stale compile requests (worst case is O(n^2)). Currently, the queue is cleared out in a single pass. But the only option I see is to accumulate canceled tasks in a side structure and send bulk notifications between iterations of compiler loop. 
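Vladimir's "side structure" idea can be modeled without any HotSpot plumbing. The snippet below is self-contained (generic names, std containers) and only illustrates the shape: a single pass under the lock unlinks stale tasks onto a local list, and the notification step that may safepoint runs after the lock is released.

#include <mutex>
#include <vector>

// Model of "accumulate canceled tasks in a side structure": one pass over the
// queue under the lock collects stale entries; notifying them (which in
// HotSpot may reach a safepoint) happens only after unlocking.  Names are
// generic, not the actual HotSpot code.
struct Task { bool stale = false; bool notified = false; };

class Queue {
  std::mutex _lock;                        // stands in for MethodCompileQueue_lock
  std::vector<Task*> _tasks;

 public:
  void add(Task* t) { std::lock_guard<std::mutex> g(_lock); _tasks.push_back(t); }

  void purge_stale() {
    std::vector<Task*> canceled;                      // the "side structure"
    {
      std::lock_guard<std::mutex> g(_lock);
      std::vector<Task*> keep;
      for (Task* t : _tasks) {
        (t->stale ? canceled : keep).push_back(t);
      }
      _tasks.swap(keep);
    }                                                 // lock released here
    for (Task* t : canceled) {
      t->notified = true;                             // now safe to block or safepoint
    }
  }
};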
Best regards, Vladimir Ivanov On 3/12/14 4:50 AM, Igor Veresov wrote: > What I mean is a strategy like this: > > diff --git a/src/share/vm/compiler/compileBroker.cpp > b/src/share/vm/compiler/compileBroker.cpp > --- a/src/share/vm/compiler/compileBroker.cpp > +++ b/src/share/vm/compiler/compileBroker.cpp > @@ -247,6 +247,7 @@ > > _is_complete = false; > _is_success = false; > + _skip_compile = false; > _code_handle = NULL; > > _hot_method = NULL; > @@ -1691,9 +1692,14 @@ > // Assign the task to the current thread. Mark this compilation > // thread as active for the profiler. > CompileTaskWrapper ctw(task); > + methodHandle method(thread, task->method()); > + if (task->skip_compile()) { > + method->clear_queued_for_compilation(); > + continue; > + } > + > nmethodLocker result_handle; // (handle for the nmethod produced > by this task) > task->set_code_handle(&result_handle); > - methodHandle method(thread, task->method()); > > // Never compile a method if breakpoints are present in it > if (method()->number_of_breakpoints() == 0) { > diff --git a/src/share/vm/compiler/compileBroker.hpp > b/src/share/vm/compiler/compileBroker.hpp > --- a/src/share/vm/compiler/compileBroker.hpp > +++ b/src/share/vm/compiler/compileBroker.hpp > @@ -48,6 +48,7 @@ > bool _is_complete; > bool _is_success; > bool _is_blocking; > + bool _skip_compile; > int _comp_level; > int _num_inlined_bytecodes; > nmethodLocker* _code_handle; // holder of eventual result > @@ -78,6 +79,8 @@ > bool is_blocking() const { return _is_blocking; } > bool is_success() const { return _is_success; } > > + bool skip_compile() const { return _skip_compile; } > + void set_skip_compile() { _skip_compile = true; } > nmethodLocker* code_handle() const { return _code_handle; } > void set_code_handle(nmethodLocker* l) { _code_handle = l; } > nmethod* code() const; // _code_handle->code() > diff --git a/src/share/vm/runtime/advancedThresholdPolicy.cpp > b/src/share/vm/runtime/advancedThresholdPolicy.cpp > --- a/src/share/vm/runtime/advancedThresholdPolicy.cpp > +++ b/src/share/vm/runtime/advancedThresholdPolicy.cpp > @@ -174,11 +174,8 @@ > if (PrintTieredEvents) { > print_event(REMOVE_FROM_QUEUE, method, method, > task->osr_bci(), (CompLevel)task->comp_level()); > } > - CompileTaskWrapper ctw(task); // Frees the task > - compile_queue->remove(task); > - method->clear_queued_for_compilation(); > - task = next_task; > - continue; > + task->set_skip_compile(); > + return task; > } > > // Select a method with a higher rate > On Mar 11, 2014, at 5:30 PM, Igor Veresov > wrote: > >> Yes, I think you?re right, this is a bug. select_task() obviously >> assumes there can be no safepoints. May be we can introduce a flag in >> CompileTask that would indicate that it needs to be removed, and then >> return it from select_task() and correspondingly from >> CompileQueue::get() and then free it with the existing >> CompileTaskWrapper that is CompileBroker::compiler_thread_loop() >> instead of invoking a compile. That way, combined with your existing >> fixes select_task() will be lock-free. Does that make sense? >> >> igor >> >> On Mar 11, 2014, at 4:34 PM, Vladimir Ivanov >> > >> wrote: >> >>> Igor, >>> >>>> I vaguely remember that is was allowed before. That?s basically the >>>> reason why everything has handles in the policy. I need to recall >>>> how that works... >>> It's there for a long time, but I converted the check from VM warning >>> to fatal error only recently. >>> >>> AdvancedThresholdPolicy::select_task operates on raw Method*. 
As I >>> can see in the sources, handles are used only in >>> Method::build_method_counters. Lazy allocation of method counters >>> wasn't there originally. It was added by 8010862. >>> >>>> Btw, I may be wrong but it seems like there could be a race in >>>> MethodCounters creation. There is a similar problem with MDO, but we >>>> take a lock for it to avoid races. >>> You are right. There's a window in Method::build_method_counters when >>> counters can be allocated twice. We need to grab a lock / use CAS to >>> avoid memory leak here. >>> >>> Best regards, >>> Vladimir Ivanov >>> >>> [1] https://bugs.openjdk.java.net/browse/JDK-8010862 >>> >>>> >>>> igor >>>> >>>> On Mar 11, 2014, at 3:04 PM, Vladimir Ivanov >>>> > >>>> wrote: >>>> >>>>> The policy for a thread is not to hold any locks VM can block on >>>>> when entering a safepoint (see >>>>> Thread::check_for_valid_safepoint_state). >>>>> >>>>> Otherwise we would need to be very careful about what code can be >>>>> executed during a safepoint to avoid deadlocks. >>>>> >>>>> There are exceptions (like Threads_lock and Compile_lock), but >>>>> generally we try to adhere the rule. >>>>> >>>>> Making an exception for MethodCompileQueue looks safe (I went >>>>> through the code and didn't find any scenarios when VM can attempt >>>>> to grab it during a safepoint), but I'd like to avoid it if possible. >>>>> >>>>> Best regards, >>>>> Vladimir Ivanov >>>>> >>>>> On 3/11/14 10:50 PM, Igor Veresov wrote: >>>>>> Could you please remind me why we can?t enter a safepoint while >>>>>> holding the MethodCompileQueue_lock? >>>>>> >>>>>> igor >>>>>> >>>>>> On Mar 11, 2014, at 8:50 AM, Vladimir Ivanov >>>>>> >>>>> > wrote: >>>>>> >>>>>>> Unfortunately, it's not enough. There's another safepoint check. >>>>>>> >>>>>>> For blocking compilation requests of stale methods >>>>>>> CompileTaskWrapper (see AdvancedThresholdPolicy::select_task) >>>>>>> sends a notification to blocked threads after cancelling the >>>>>>> compilation. It can safepoint while locking on compile task >>>>>>> before sending notification. >>>>>>> >>>>>>> I don't see how to avoid this situation. Any ideas? >>>>>>> Otherwise, I need to exclude MethodCompileQueue from the check in >>>>>>> Thread::check_for_valid_safepoint_state. >>>>>>> >>>>>>> Best regards, >>>>>>> Vladimir Ivanov >>>>>>> >>>>>>> On 3/11/14 11:58 AM, Vladimir Ivanov wrote: >>>>>>>> Igor, Vladimir, thanks for review. >>>>>>>> >>>>>>>> Best regards, >>>>>>>> Vladimir Ivanov >>>>>>>> >>>>>>>> On 3/11/14 7:31 AM, Igor Veresov wrote: >>>>>>>>> I think it?s a reasonable fix. >>>>>>>>> >>>>>>>>> igor >>>>>>>>> >>>>>>>>> On Mar 10, 2014, at 4:57 PM, Vladimir Ivanov >>>>>>>>> >>>>>>>> > wrote: >>>>>>>>> >>>>>>>>>> Vladimir, thanks for the review. >>>>>>>>>> >>>>>>>>>> You are absolutely right about >>>>>>>>>> Method::increment_interpreter_invocation_count. Reverted the >>>>>>>>>> change. >>>>>>>>>> >>>>>>>>>> Updated fix: >>>>>>>>>> http://cr.openjdk.java.net/~vlivanov/8023461/webrev.01/ >>>>>>>>>> >>>>>>>>>> Yes, Igor's feedback on this change would be invaluable. >>>>>>>>>> >>>>>>>>>> Best regards, >>>>>>>>>> Vladimir Ivanov >>>>>>>>>> >>>>>>>>>> On 3/11/14 2:33 AM, Vladimir Kozlov wrote: >>>>>>>>>>> The method >>>>>>>>>>> Method::increment_interpreter_invocation_count(TRAP) changes >>>>>>>>>>> are incorrect. It is used by C++ Interpreter and you did not >>>>>>>>>>> modified >>>>>>>>>>> code there. I would leave this method unchanged. >>>>>>>>>>> >>>>>>>>>>> The rest looks fine to me but Igor should know better this code. 
>>>>>>>>>>> >>>>>>>>>>> Thanks, >>>>>>>>>>> Vladimir K >>>>>>>>>>> >>>>>>>>>>> On 3/7/14 8:26 AM, Vladimir Ivanov wrote: >>>>>>>>>>>> http://cr.openjdk.java.net/~vlivanov/8023461/webrev.00 >>>>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8023461 >>>>>>>>>>>> 42 lines changed: 13 ins; 1 del; 28 mod >>>>>>>>>>>> >>>>>>>>>>>> The rule of thumb for VM is that a thread shouldn't hold any >>>>>>>>>>>> VM lock >>>>>>>>>>>> when it reaches a safepoint. It's not the case for >>>>>>>>>>>> MethodCompileQueue_lock now. >>>>>>>>>>>> >>>>>>>>>>>> The problem is that AdvancedThresholdPolicy updates task's >>>>>>>>>>>> rate when >>>>>>>>>>>> iterating compiler queue. It holds MethodCompileQueue_lock >>>>>>>>>>>> while doing >>>>>>>>>>>> so. Method counters are allocated lazily. If method counters >>>>>>>>>>>> aren't >>>>>>>>>>>> there and VM fails to allocate them, GC is initiated (see >>>>>>>>>>>> CollectorPolicy::satisfy_failed_metadata_allocation) and a thead >>>>>>>>>>>> entering a safepoint holding MethodCompileQueue lock. >>>>>>>>>>>> >>>>>>>>>>>> Normally, counters are initialized during method interpretation, >>>>>>>>>>>> but in >>>>>>>>>>>> Xcomp mode it's not the case. That's the mode where the >>>>>>>>>>>> failures are >>>>>>>>>>>> observed. >>>>>>>>>>>> >>>>>>>>>>>> The fix is to skip the update, if counters aren't allocated yet. >>>>>>>>>>>> >>>>>>>>>>>> Testing: added No_Safepoint_Verifier, JPRT, failing tests >>>>>>>>>>>> from nightly >>>>>>>>>>>> testing (in progress). >>>>>>>>>>>> >>>>>>>>>>>> Best regards, >>>>>>>>>>>> Vladimir Ivanov >>>>>>>>> >>>>>> >>>> >> > From goetz.lindenmaier at sap.com Wed Mar 12 15:40:35 2014 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Wed, 12 Mar 2014 15:40:35 +0000 Subject: RFR(L): 8032410: compiler/uncommontrap/TestStackBangRbp.java times out on Solaris-Sparc V9 In-Reply-To: <531FA9EB.2090906@oracle.com> References: <09C8FEED-84F2-4F1A-A845-5DC8A76CF80E@oracle.com> <531E62B6.1000708@oracle.com> <87pplslmlc.fsf@oracle.com> <531FA9EB.2090906@oracle.com> Message-ID: <4295855A5C1DE049A61835A1887419CC2CEA75E1@DEWDFEMB12A.global.corp.sap> Hi Roland, Thanks for considering ppc! I did the remaining required changes, but I still get some errors. I'm working on it to get it resolved as soon as possible, and then send the patch to you. I think it's a good idea to refactor layout_activation(). While reading the code I saw that you can further simplify the code: * You can remove extra_locals in size_activation() on x86. It's never used. * size_activation_helper on sparc must not use template M. (ci)Method is not used there. * After this, only max_locals() is used from M. I would propose to pass in max_locals() and get rid of the template argument. Best regards, Goetz. -----Original Message----- From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Vladimir Kozlov Sent: Mittwoch, 12. M?rz 2014 01:27 To: Roland Westrelin; hotspot-compiler-dev at openjdk.java.net Cc: ppc-aix-port-dev at openjdk.java.net Subject: Re: RFR(L): 8032410: compiler/uncommontrap/TestStackBangRbp.java times out on Solaris-Sparc V9 On 3/11/14 9:35 AM, Roland Westrelin wrote: > > Hi Vladimir, > >> Changes are good in general. > > Thanks for reviewing this. > >> I don't see corresponding changes in MachPrologNode in >> src/cpu/ppc/vm/ppc.ad. Do we need changes there? > > We must, I guess. I forgot about c2 ppc. I'll look into it. > >> src/share/vm/opto/output.cpp should be >> >> + DEBUG_ONLY(|| true))); > > Indeed. 
Thanks for spotting that. > >> On 3/6/14 3:08 AM, Roland Westrelin wrote: >>> This test causes a deadlock because when the stack bang in the deopt >>> or uncommon trap blobs triggers an exception, we throw the exception >>> right away even if the deoptee has some monitors locked. We had >>> several issues recently with the stack banging in the deopt/uncommon >>> trap blobs and so rather than add more code to fix stack banging on >>> deoptimization, this change removes the need for stack banging on >>> deoptimization as discussed previously: >>> http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2014-February/013519.html >>> >>> The compilers compute by how much deoptimization would bang the stack >>> at every possible deoptimization points in the compiled code and use >>> the worst case to generate the stack banging in the nmethod. In debug >>> builds, the stack banging code is still performed in the >>> deopt/uncommon trap blobs but only to verify that the compiled code >>> has done the stack banging correctly. Otherwise, the stack banging >>> from deoptimization causes the VM to abort. >>> >>> This change contains some code >>> refactoring. AbstractInterpreter::size_activation() is currently >>> implemented as a call to AbstractInterpreter::layout_activation() but >>> on most platforms, the logic to do the actual lay out of the >>> activation and the logic to calculate its size are largely >>> independent and having both done by layout_activation() feels wrong >>> to me and error prone. I made AbstractInterpreter::size_activation() >>> and AbstractInterpreter::layout_activation() two independent methods >>> that share common helper functions if some code needs to be shared. I >>> dropped unnecessary arguments to size_activation() in the current >>> implementation as well. I also made it a template method so that it >>> can be called with either a Method* (from the deoptimization code) or >>> a ciMethod* (from the compilers). >>> >>> I created src/cpu/x86/vm/templateInterpreter_x86.cpp to put code that is common to 32 and 64 bit x86. In templateInterpreter_x86.cpp can you add {} for if statement?: 100 #ifdef ASSERT 101 if (!EnableInvokeDynamic) >>> >>> This change in AbstractAssembler::generate_stack_overflow_check(): >>> >>> 137 int bang_end = (StackShadowPages+1)*page_size; >>> >>> is so that the stack banging code from the deopt/uncommon trap blobs >>> and in the compiled code are consistent. Let?s say frame size is less >>> than 1 page. During deoptimization, we bang sp+1 page and then sp+2 >>> pages ? sp+(StackShadowPages+1) pages. If the frame size is > 1 page >>> and <= 2 pages, we bang sp+1page, sp+2pages and then sp+3pages ? >>> sp+(ShadowPages+2) pages. In the compiled code, to be consistent, for >>> a frame size of less than 1 page we need to bang >>> sp+(StackShadowPages+1) pages, if it?s > 1 page and <= 2 pages then >>> we need to bang sp+(StackShadowPages+2) pages etc. >> >> With +1 you will touch yellow page because it is inclusive if I read it >> right: >> >> while (bang_offset <= bang_end) { >> >> Can you test with StackShadowPages=1? > > Are you suggesting I run with StackShadowPages=1 to check if: > > 137 int bang_end = (StackShadowPages+1)*page_size; > > is ok? Yes, because you may be creating hole in banging if compiled code called from interpreter. It should be consistent with AbstractInterpreterGenerator::bang_stack_shadow_pages(). Should you also change it in generate_native_wrapper()? > What would I run with StackShadowPages=1? 
The hotspot-comp regression > tests? All testing? I think compiler regression tests with -XX:+DeoptimizeALot flag should be enough. Thanks, Vladimir > > Do you agree that if I revert to: > > 137 int bang_end = StackShadowPages*page_size; > > I need to change stack banging in the deopt/uncommon trap blobs to bang > one less page? > > Roland. >> >> Thanks, >> Vladimir >> >>> >>> http://cr.openjdk.java.net/~roland/8032410/webrev.01/ >>> >>> Roland. >>> From roland.westrelin at oracle.com Wed Mar 12 16:36:05 2014 From: roland.westrelin at oracle.com (Roland Westrelin) Date: Wed, 12 Mar 2014 17:36:05 +0100 Subject: RFR(L): 8032410: compiler/uncommontrap/TestStackBangRbp.java times out on Solaris-Sparc V9 In-Reply-To: <4295855A5C1DE049A61835A1887419CC2CEA75E1@DEWDFEMB12A.global.corp.sap> References: <09C8FEED-84F2-4F1A-A845-5DC8A76CF80E@oracle.com> <531E62B6.1000708@oracle.com> <87pplslmlc.fsf@oracle.com> <531FA9EB.2090906@oracle.com> <4295855A5C1DE049A61835A1887419CC2CEA75E1@DEWDFEMB12A.global.corp.sap> Message-ID: <5D92930D-3FA4-4657-9EB4-605480B03742@oracle.com> Hi Goetz, > Thanks for considering ppc! I did the remaining required changes, but I still get > some errors. I'm working on it to get it resolved as soon as possible, and then > send the patch to you. Thanks for working on it. And making the suggestions below! > I think it's a good idea to refactor layout_activation(). While reading the code I > saw that you can further simplify the code: > * You can remove extra_locals in size_activation() on x86. It's never used. Right. Thanks. > * size_activation_helper on sparc must not use template M. (ci)Method is not used there. I had problems with the solaris C compiler that required some static functions to be template functions eventhough the template parameter was not used. > * After this, only max_locals() is used from M. I would propose to pass in max_locals() > and get rid of the template argument. Isn?t it only max_stack()? That?s a good suggestion. I will do that. Roland. > > Best regards, > Goetz. > > > > > -----Original Message----- > From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Vladimir Kozlov > Sent: Mittwoch, 12. M?rz 2014 01:27 > To: Roland Westrelin; hotspot-compiler-dev at openjdk.java.net > Cc: ppc-aix-port-dev at openjdk.java.net > Subject: Re: RFR(L): 8032410: compiler/uncommontrap/TestStackBangRbp.java times out on Solaris-Sparc V9 > > On 3/11/14 9:35 AM, Roland Westrelin wrote: >> >> Hi Vladimir, >> >>> Changes are good in general. >> >> Thanks for reviewing this. >> >>> I don't see corresponding changes in MachPrologNode in >>> src/cpu/ppc/vm/ppc.ad. Do we need changes there? >> >> We must, I guess. I forgot about c2 ppc. I'll look into it. >> >>> src/share/vm/opto/output.cpp should be >>> >>> + DEBUG_ONLY(|| true))); >> >> Indeed. Thanks for spotting that. >> >>> On 3/6/14 3:08 AM, Roland Westrelin wrote: >>>> This test causes a deadlock because when the stack bang in the deopt >>>> or uncommon trap blobs triggers an exception, we throw the exception >>>> right away even if the deoptee has some monitors locked. 
We had >>>> several issues recently with the stack banging in the deopt/uncommon >>>> trap blobs and so rather than add more code to fix stack banging on >>>> deoptimization, this change removes the need for stack banging on >>>> deoptimization as discussed previously: >>>> http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2014-February/013519.html >>>> >>>> The compilers compute by how much deoptimization would bang the stack >>>> at every possible deoptimization points in the compiled code and use >>>> the worst case to generate the stack banging in the nmethod. In debug >>>> builds, the stack banging code is still performed in the >>>> deopt/uncommon trap blobs but only to verify that the compiled code >>>> has done the stack banging correctly. Otherwise, the stack banging >>>> from deoptimization causes the VM to abort. >>>> >>>> This change contains some code >>>> refactoring. AbstractInterpreter::size_activation() is currently >>>> implemented as a call to AbstractInterpreter::layout_activation() but >>>> on most platforms, the logic to do the actual lay out of the >>>> activation and the logic to calculate its size are largely >>>> independent and having both done by layout_activation() feels wrong >>>> to me and error prone. I made AbstractInterpreter::size_activation() >>>> and AbstractInterpreter::layout_activation() two independent methods >>>> that share common helper functions if some code needs to be shared. I >>>> dropped unnecessary arguments to size_activation() in the current >>>> implementation as well. I also made it a template method so that it >>>> can be called with either a Method* (from the deoptimization code) or >>>> a ciMethod* (from the compilers). >>>> >>>> I created src/cpu/x86/vm/templateInterpreter_x86.cpp to put code that is common to 32 and 64 bit x86. > > In templateInterpreter_x86.cpp can you add {} for if statement?: > > 100 #ifdef ASSERT > 101 if (!EnableInvokeDynamic) > > >>>> >>>> This change in AbstractAssembler::generate_stack_overflow_check(): >>>> >>>> 137 int bang_end = (StackShadowPages+1)*page_size; >>>> >>>> is so that the stack banging code from the deopt/uncommon trap blobs >>>> and in the compiled code are consistent. Let?s say frame size is less >>>> than 1 page. During deoptimization, we bang sp+1 page and then sp+2 >>>> pages ? sp+(StackShadowPages+1) pages. If the frame size is > 1 page >>>> and <= 2 pages, we bang sp+1page, sp+2pages and then sp+3pages ? >>>> sp+(ShadowPages+2) pages. In the compiled code, to be consistent, for >>>> a frame size of less than 1 page we need to bang >>>> sp+(StackShadowPages+1) pages, if it?s > 1 page and <= 2 pages then >>>> we need to bang sp+(StackShadowPages+2) pages etc. >>> >>> With +1 you will touch yellow page because it is inclusive if I read it >>> right: >>> >>> while (bang_offset <= bang_end) { >>> >>> Can you test with StackShadowPages=1? >> >> Are you suggesting I run with StackShadowPages=1 to check if: >> >> 137 int bang_end = (StackShadowPages+1)*page_size; >> >> is ok? > > Yes, because you may be creating hole in banging if compiled code called > from interpreter. It should be consistent with > AbstractInterpreterGenerator::bang_stack_shadow_pages(). > > Should you also change it in generate_native_wrapper()? > >> What would I run with StackShadowPages=1? The hotspot-comp regression >> tests? All testing? > > I think compiler regression tests with -XX:+DeoptimizeALot flag should > be enough. 
> > Thanks, > Vladimir > >> >> Do you agree that if I revert to: >> >> 137 int bang_end = StackShadowPages*page_size; >> >> I need to change stack banging in the deopt/uncommon trap blobs to bang >> one less page? >> >> Roland. >>> >>> Thanks, >>> Vladimir >>> >>>> >>>> http://cr.openjdk.java.net/~roland/8032410/webrev.01/ >>>> >>>> Roland. >>>> From roland.westrelin at oracle.com Wed Mar 12 17:12:09 2014 From: roland.westrelin at oracle.com (Roland Westrelin) Date: Wed, 12 Mar 2014 18:12:09 +0100 Subject: RFR(L): 8031755: Type speculation should be used to optimize explicit null checks In-Reply-To: <531FB455.3070203@oracle.com> References: <8B07BC32-F70C-4B10-AB8C-C82D1BA8AAFF@oracle.com> <531FB455.3070203@oracle.com> Message-ID: <87fvmnl4t2.fsf@oracle.com> Hi Vladimir, Thanks for taking a look at this. >> http://cr.openjdk.java.net/~roland/8031755/webrev.00/ >> >> Most Type profiling points record whether a null reference was seen >> but type speculation doesn?t currently take advantage of it. With >> this change, not only is the type seen at a profiling point feed to >> type speculation but also whether a null pointer was seen or >> not. This is then used to optimize null checks. >> >> The _speculative field becomes: >> const TypePtr* _speculative; >> >> so that if all we know from profiling is that a reference is not null: >> _speculative = TypePtr::NOT NULL; >> >> When a type is met with a TypePtr (NULL_PTR for instance), the meet >> must be applied to the speculative types as well. To keep the type >> system symmetric, the _speculative field had to be moved to class >> TypePtr. I also moved the _inline_depth there to keep all speculative >> stuff together but with the current code it could have stayed in >> class TypeOopPtr. >> >> Traps caused by a failed speculative null check are recorded with >> speculative traps, similarly to what is done for a failed class >> check. > > The idea is good, I think, but I need time to go through changes. > > What about changes in arguments.cpp and phaseX.cpp? In c2_globals.hpp product(intx, MaxNodeLimit, 80000, \ "Maximum number of nodes") \ So today, we're decreasing the number of nodes if incremental inlining is on. In phaseX.cpp, the problem is that during an IGVN, some nodes that become disconnected from the graph are not removed from the IGVN's hash table. It's a bit ugly but maybe good enough for some verification code? > I don't understand why you changed higher_equal_speculative() in > parse2.cpp. I added a cleanup_speculative() to Type that removes the speculative type if it's useless (not an exact klass for instance). So: 1289 TypeNode* ccast = new (C) CheckCastPPNode(control(), obj, tboth); 1290 const Type* tcc = ccast->as_Type()->type(); 1291 assert(tcc != obj_type && tcc->higher_equal(obj_type), "must improve"); The speculative type may disappear and tcc->higher_equal_speculative(obj_type) is no longer true. >> I made the following change: >> >> *** 3561,3571 **** >> >> >> // Since klasses are different, we require a LCA in the Java >> // class hierarchy - which means we have to fall to at least NotNull. >> if( ptr == TopPTR || ptr == AnyNull || ptr == Constant ) >> ptr = NotNull; >> >> - instance_id = InstanceBot; >> >> >> // Now we find the LCA of Java classes >> ciKlass* k = this_klass->least_common_ancestor(tinst_klass); >> return make(ptr, k, false, NULL, off, instance_id, speculative, depth); >> } // End of case InstPtr >> >> --- 3706,3715 ?? 
>> >> because I hit a type not symmetric failure that I think it causes: >> >> === Meet Not Symmetric === >> t = javax/management/openmbean/OpenType:AnyNull * (inline_depth=-2) >> this= javax/management/openmbean/ArrayType:TopPTR *,iid=top (inline_depth=InlineDepthTop) >> mt=(t meet this)= javax/management/openmbean/ArrayType:AnyNull * (inline_depth=-2) >> t_dual= javax/management/openmbean/OpenType:NotNull *,iid=top (inline_depth=2) >> this_dual= javax/management/openmbean/ArrayType * >> mt_dual= javax/management/openmbean/ArrayType:NotNull *,iid=top (inline_depth=2) >> mt_dual meet t_dual= javax/management/openmbean/OpenType:NotNull * (inline_depth=2) >> mt_dual meet this_dual= javax/management/openmbean/ArrayType * > > The question is why OpenType:AnyNull does not have iid=top. It is in > upper type lattice. I'll see if I can come up with a test case for this one. Roland. From vladimir.kozlov at oracle.com Wed Mar 12 19:11:32 2014 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 12 Mar 2014 12:11:32 -0700 Subject: RFR (S): 8031203: remove SafepointPollOffset In-Reply-To: <2F901183-4942-47CA-A1E8-91B6A0BEED78@oracle.com> References: <2F901183-4942-47CA-A1E8-91B6A0BEED78@oracle.com> Message-ID: <5320B164.4010509@oracle.com> Looks good. Thanks, Vladimir On 3/11/14 9:38 PM, Christian Thalinger wrote: > https://bugs.openjdk.java.net/browse/JDK-8031203 > http://cr.openjdk.java.net/~twisti/8031203 > > 8031203: remove SafepointPollOffset > Reviewed-by: > > SafepointPollOffset is only used in C1 because of JDK-4986249 which was > never reproducible with C2. > > This seems to be an architecture anomaly and confined to certain chip > versions. An old comment of JDK-4986249 mentions "some P4's". Running on > any newer chips doesn't show that behavior. I suggest to remove that option. > > CCC request was filed and approved. > From christian.thalinger at oracle.com Wed Mar 12 19:24:14 2014 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Wed, 12 Mar 2014 12:24:14 -0700 Subject: RFR (S): 8031203: remove SafepointPollOffset In-Reply-To: <5320B164.4010509@oracle.com> References: <2F901183-4942-47CA-A1E8-91B6A0BEED78@oracle.com> <5320B164.4010509@oracle.com> Message-ID: <3D81D8B2-DD4C-4A34-87E3-3B691FA7240E@oracle.com> Thank you, Vladimir. On Mar 12, 2014, at 12:11 PM, Vladimir Kozlov wrote: > Looks good. > > Thanks, > Vladimir > > On 3/11/14 9:38 PM, Christian Thalinger wrote: >> https://bugs.openjdk.java.net/browse/JDK-8031203 >> http://cr.openjdk.java.net/~twisti/8031203 >> >> 8031203: remove SafepointPollOffset >> Reviewed-by: >> >> SafepointPollOffset is only used in C1 because of JDK-4986249 which was >> never reproducible with C2. >> >> This seems to be an architecture anomaly and confined to certain chip >> versions. An old comment of JDK-4986249 mentions "some P4's". Running on >> any newer chips doesn't show that behavior. I suggest to remove that option. >> >> CCC request was filed and approved. >> From roland.westrelin at oracle.com Wed Mar 12 19:24:51 2014 From: roland.westrelin at oracle.com (Roland Westrelin) Date: Wed, 12 Mar 2014 20:24:51 +0100 Subject: RFR (S): 8031203: remove SafepointPollOffset In-Reply-To: <2F901183-4942-47CA-A1E8-91B6A0BEED78@oracle.com> References: <2F901183-4942-47CA-A1E8-91B6A0BEED78@oracle.com> Message-ID: <529185E3-05B4-432A-BD1B-1663B6C9CC0F@oracle.com> > http://cr.openjdk.java.net/~twisti/8031203 That looks good to me. Roland. 
From christian.thalinger at oracle.com Wed Mar 12 19:43:14 2014 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Wed, 12 Mar 2014 12:43:14 -0700 Subject: RFR (S): 8031203: remove SafepointPollOffset In-Reply-To: <529185E3-05B4-432A-BD1B-1663B6C9CC0F@oracle.com> References: <2F901183-4942-47CA-A1E8-91B6A0BEED78@oracle.com> <529185E3-05B4-432A-BD1B-1663B6C9CC0F@oracle.com> Message-ID: <60FB3115-425D-4727-9AF3-9263BC236C0B@oracle.com> Thank you, Roland. On Mar 12, 2014, at 12:24 PM, Roland Westrelin wrote: >> http://cr.openjdk.java.net/~twisti/8031203 > > That looks good to me. > > Roland. From vladimir.kozlov at oracle.com Thu Mar 13 00:57:21 2014 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 12 Mar 2014 17:57:21 -0700 Subject: RFR(L): 8031755: Type speculation should be used to optimize explicit null checks In-Reply-To: <87fvmnl4t2.fsf@oracle.com> References: <8B07BC32-F70C-4B10-AB8C-C82D1BA8AAFF@oracle.com> <531FB455.3070203@oracle.com> <87fvmnl4t2.fsf@oracle.com> Message-ID: <53210271.3070302@oracle.com> On 3/12/14 10:12 AM, Roland Westrelin wrote: > > Hi Vladimir, > > Thanks for taking a look at this. > >>> http://cr.openjdk.java.net/~roland/8031755/webrev.00/ >>> >>> Most Type profiling points record whether a null reference was seen >>> but type speculation doesn?t currently take advantage of it. With >>> this change, not only is the type seen at a profiling point feed to >>> type speculation but also whether a null pointer was seen or >>> not. This is then used to optimize null checks. >>> >>> The _speculative field becomes: >>> const TypePtr* _speculative; >>> >>> so that if all we know from profiling is that a reference is not null: >>> _speculative = TypePtr::NOT NULL; >>> >>> When a type is met with a TypePtr (NULL_PTR for instance), the meet >>> must be applied to the speculative types as well. To keep the type >>> system symmetric, the _speculative field had to be moved to class >>> TypePtr. I also moved the _inline_depth there to keep all speculative >>> stuff together but with the current code it could have stayed in >>> class TypeOopPtr. >>> >>> Traps caused by a failed speculative null check are recorded with >>> speculative traps, similarly to what is done for a failed class >>> check. >> >> The idea is good, I think, but I need time to go through changes. >> >> What about changes in arguments.cpp and phaseX.cpp? > > In c2_globals.hpp > > product(intx, MaxNodeLimit, 80000, \ > "Maximum number of nodes") \ > > So today, we're decreasing the number of nodes if incremental inlining is on. Agree. > > In phaseX.cpp, the problem is that during an IGVN, some nodes that > become disconnected from the graph are not removed from the IGVN's hash > table. It's a bit ugly but maybe good enough for some verification code? Yes, handling dead nodes is less then satisfactory in C2. > >> I don't understand why you changed higher_equal_speculative() in >> parse2.cpp. > > I added a cleanup_speculative() to Type that removes the speculative > type if it's useless (not an exact klass for instance). So: > > 1289 TypeNode* ccast = new (C) CheckCastPPNode(control(), obj, tboth); > 1290 const Type* tcc = ccast->as_Type()->type(); > 1291 assert(tcc != obj_type && tcc->higher_equal(obj_type), "must improve"); > > The speculative type may disappear and > tcc->higher_equal_speculative(obj_type) is no longer true. Okay. 
> >>> I made the following change: >>> >>> *** 3561,3571 **** >>> >>> >>> // Since klasses are different, we require a LCA in the Java >>> // class hierarchy - which means we have to fall to at least NotNull. >>> if( ptr == TopPTR || ptr == AnyNull || ptr == Constant ) >>> ptr = NotNull; >>> >>> - instance_id = InstanceBot; >>> >>> >>> // Now we find the LCA of Java classes >>> ciKlass* k = this_klass->least_common_ancestor(tinst_klass); >>> return make(ptr, k, false, NULL, off, instance_id, speculative, depth); >>> } // End of case InstPtr >>> >>> --- 3706,3715 ?? >>> >>> because I hit a type not symmetric failure that I think it causes: >>> >>> === Meet Not Symmetric === >>> t = javax/management/openmbean/OpenType:AnyNull * (inline_depth=-2) >>> this= javax/management/openmbean/ArrayType:TopPTR *,iid=top (inline_depth=InlineDepthTop) >>> mt=(t meet this)= javax/management/openmbean/ArrayType:AnyNull * (inline_depth=-2) >>> t_dual= javax/management/openmbean/OpenType:NotNull *,iid=top (inline_depth=2) >>> this_dual= javax/management/openmbean/ArrayType * >>> mt_dual= javax/management/openmbean/ArrayType:NotNull *,iid=top (inline_depth=2) >>> mt_dual meet t_dual= javax/management/openmbean/OpenType:NotNull * (inline_depth=2) >>> mt_dual meet this_dual= javax/management/openmbean/ArrayType * >> >> The question is why OpenType:AnyNull does not have iid=top. It is in >> upper type lattice. > > I'll see if I can come up with a test case for this one. Thanks, Vladimir > > Roland. > From goetz.lindenmaier at sap.com Thu Mar 13 15:33:55 2014 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Thu, 13 Mar 2014 15:33:55 +0000 Subject: RFR(L): 8032410: compiler/uncommontrap/TestStackBangRbp.java times out on Solaris-Sparc V9 In-Reply-To: <5D92930D-3FA4-4657-9EB4-605480B03742@oracle.com> References: <09C8FEED-84F2-4F1A-A845-5DC8A76CF80E@oracle.com> <531E62B6.1000708@oracle.com> <87pplslmlc.fsf@oracle.com> <531FA9EB.2090906@oracle.com> <4295855A5C1DE049A61835A1887419CC2CEA75E1@DEWDFEMB12A.global.corp.sap> <5D92930D-3FA4-4657-9EB4-605480B03742@oracle.com> Message-ID: <4295855A5C1DE049A61835A1887419CC2CEA79EF@DEWDFEMB12A.global.corp.sap> Hi Roland, I made a change that applies on top of http://cr.openjdk.java.net/~roland/8032410/webrev.01/hotspot.patch The patch is here: http://cr.openjdk.java.net/~goetz/webrevs/8032410-ppc_part.patch I checked that it works with cppInterpreter and templateInterpreter. In deoptimization.cpp I had to change the test for the top frame. Also, I cleaned up the ppc code a bit. Vladimir just pushed the ppc template interpreter to hs-comp. Where will you push your change? Obviously the ppc template interpreter part of the change will only apply in hs-comp currently. Thanks for giving me the time to implement this! Best regards, Goetz. -----Original Message----- From: Roland Westrelin [mailto:roland.westrelin at oracle.com] Sent: Mittwoch, 12. M?rz 2014 17:36 To: Lindenmaier, Goetz Cc: Vladimir Kozlov; hotspot-compiler-dev at openjdk.java.net; ppc-aix-port-dev at openjdk.java.net Subject: Re: RFR(L): 8032410: compiler/uncommontrap/TestStackBangRbp.java times out on Solaris-Sparc V9 Hi Goetz, > Thanks for considering ppc! I did the remaining required changes, but I still get > some errors. I'm working on it to get it resolved as soon as possible, and then > send the patch to you. Thanks for working on it. And making the suggestions below! > I think it's a good idea to refactor layout_activation(). 
While reading the code I > saw that you can further simplify the code: > * You can remove extra_locals in size_activation() on x86. It's never used. Right. Thanks. > * size_activation_helper on sparc must not use template M. (ci)Method is not used there. I had problems with the solaris C compiler that required some static functions to be template functions eventhough the template parameter was not used. > * After this, only max_locals() is used from M. I would propose to pass in max_locals() > and get rid of the template argument. Isn't it only max_stack()? That's a good suggestion. I will do that. Roland. > > Best regards, > Goetz. > > > > > -----Original Message----- > From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Vladimir Kozlov > Sent: Mittwoch, 12. M?rz 2014 01:27 > To: Roland Westrelin; hotspot-compiler-dev at openjdk.java.net > Cc: ppc-aix-port-dev at openjdk.java.net > Subject: Re: RFR(L): 8032410: compiler/uncommontrap/TestStackBangRbp.java times out on Solaris-Sparc V9 > > On 3/11/14 9:35 AM, Roland Westrelin wrote: >> >> Hi Vladimir, >> >>> Changes are good in general. >> >> Thanks for reviewing this. >> >>> I don't see corresponding changes in MachPrologNode in >>> src/cpu/ppc/vm/ppc.ad. Do we need changes there? >> >> We must, I guess. I forgot about c2 ppc. I'll look into it. >> >>> src/share/vm/opto/output.cpp should be >>> >>> + DEBUG_ONLY(|| true))); >> >> Indeed. Thanks for spotting that. >> >>> On 3/6/14 3:08 AM, Roland Westrelin wrote: >>>> This test causes a deadlock because when the stack bang in the deopt >>>> or uncommon trap blobs triggers an exception, we throw the exception >>>> right away even if the deoptee has some monitors locked. We had >>>> several issues recently with the stack banging in the deopt/uncommon >>>> trap blobs and so rather than add more code to fix stack banging on >>>> deoptimization, this change removes the need for stack banging on >>>> deoptimization as discussed previously: >>>> http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2014-February/013519.html >>>> >>>> The compilers compute by how much deoptimization would bang the stack >>>> at every possible deoptimization points in the compiled code and use >>>> the worst case to generate the stack banging in the nmethod. In debug >>>> builds, the stack banging code is still performed in the >>>> deopt/uncommon trap blobs but only to verify that the compiled code >>>> has done the stack banging correctly. Otherwise, the stack banging >>>> from deoptimization causes the VM to abort. >>>> >>>> This change contains some code >>>> refactoring. AbstractInterpreter::size_activation() is currently >>>> implemented as a call to AbstractInterpreter::layout_activation() but >>>> on most platforms, the logic to do the actual lay out of the >>>> activation and the logic to calculate its size are largely >>>> independent and having both done by layout_activation() feels wrong >>>> to me and error prone. I made AbstractInterpreter::size_activation() >>>> and AbstractInterpreter::layout_activation() two independent methods >>>> that share common helper functions if some code needs to be shared. I >>>> dropped unnecessary arguments to size_activation() in the current >>>> implementation as well. I also made it a template method so that it >>>> can be called with either a Method* (from the deoptimization code) or >>>> a ciMethod* (from the compilers). 
>>>> >>>> I created src/cpu/x86/vm/templateInterpreter_x86.cpp to put code that is common to 32 and 64 bit x86. > > In templateInterpreter_x86.cpp can you add {} for if statement?: > > 100 #ifdef ASSERT > 101 if (!EnableInvokeDynamic) > > >>>> >>>> This change in AbstractAssembler::generate_stack_overflow_check(): >>>> >>>> 137 int bang_end = (StackShadowPages+1)*page_size; >>>> >>>> is so that the stack banging code from the deopt/uncommon trap blobs >>>> and in the compiled code are consistent. Let's say frame size is less >>>> than 1 page. During deoptimization, we bang sp+1 page and then sp+2 >>>> pages . sp+(StackShadowPages+1) pages. If the frame size is > 1 page >>>> and <= 2 pages, we bang sp+1page, sp+2pages and then sp+3pages . >>>> sp+(ShadowPages+2) pages. In the compiled code, to be consistent, for >>>> a frame size of less than 1 page we need to bang >>>> sp+(StackShadowPages+1) pages, if it's > 1 page and <= 2 pages then >>>> we need to bang sp+(StackShadowPages+2) pages etc. >>> >>> With +1 you will touch yellow page because it is inclusive if I read it >>> right: >>> >>> while (bang_offset <= bang_end) { >>> >>> Can you test with StackShadowPages=1? >> >> Are you suggesting I run with StackShadowPages=1 to check if: >> >> 137 int bang_end = (StackShadowPages+1)*page_size; >> >> is ok? > > Yes, because you may be creating hole in banging if compiled code called > from interpreter. It should be consistent with > AbstractInterpreterGenerator::bang_stack_shadow_pages(). > > Should you also change it in generate_native_wrapper()? > >> What would I run with StackShadowPages=1? The hotspot-comp regression >> tests? All testing? > > I think compiler regression tests with -XX:+DeoptimizeALot flag should > be enough. > > Thanks, > Vladimir > >> >> Do you agree that if I revert to: >> >> 137 int bang_end = StackShadowPages*page_size; >> >> I need to change stack banging in the deopt/uncommon trap blobs to bang >> one less page? >> >> Roland. >>> >>> Thanks, >>> Vladimir >>> >>>> >>>> http://cr.openjdk.java.net/~roland/8032410/webrev.01/ >>>> >>>> Roland. >>>> From roland.westrelin at oracle.com Thu Mar 13 15:41:19 2014 From: roland.westrelin at oracle.com (Roland Westrelin) Date: Thu, 13 Mar 2014 16:41:19 +0100 Subject: RFR(L): 8032410: compiler/uncommontrap/TestStackBangRbp.java times out on Solaris-Sparc V9 In-Reply-To: <4295855A5C1DE049A61835A1887419CC2CEA79EF@DEWDFEMB12A.global.corp.sap> References: <09C8FEED-84F2-4F1A-A845-5DC8A76CF80E@oracle.com> <531E62B6.1000708@oracle.com> <87pplslmlc.fsf@oracle.com> <531FA9EB.2090906@oracle.com> <4295855A5C1DE049A61835A1887419CC2CEA75E1@DEWDFEMB12A.global.corp.sap> <5D92930D-3FA4-4657-9EB4-605480B03742@oracle.com> <4295855A5C1DE049A61835A1887419CC2CEA79EF@DEWDFEMB12A.global.corp.sap> Message-ID: Hi Goetz, > I made a change that applies on top of > http://cr.openjdk.java.net/~roland/8032410/webrev.01/hotspot.patch > The patch is here: > http://cr.openjdk.java.net/~goetz/webrevs/8032410-ppc_part.patch Thanks. > I checked that it works with cppInterpreter and templateInterpreter. > In deoptimization.cpp I had to change the test for the top frame. > Also, I cleaned up the ppc code a bit. > > Vladimir just pushed the ppc template interpreter to hs-comp. > Where will you push your change? Obviously the ppc template interpreter > part of the change will only apply in hs-comp currently. I?ll push it to hs-comp. I?ve reworked the code to drop the template as you suggested. I?ll send that updated webrev. 
> Thanks for giving me the time to implement this! No problem. I still have some stuff to figure out for this change anyway. Roland. From igor.veresov at oracle.com Thu Mar 13 17:20:31 2014 From: igor.veresov at oracle.com (Igor Veresov) Date: Thu, 13 Mar 2014 10:20:31 -0700 Subject: RFR(XXS): A couple of type fixes in C1 Message-ID: I?d like to push a couple of tiny fixes of types in C1 that went unnoticed when porting to 64-bit. 8037140: C1: Incorrect argument type used for SharedRuntime::OSR_migration_end in LIRGenerator::do_Goto 8037149: C1: getThreadTemp should return a T_LONG register on 64bit The fixes don?t really change the generated code since the only affect some reg-reg moves. However it?d be nice to improve the overall type hygiene. Webrevs: http://cr.openjdk.java.net/~iveresov/8037149/webrev.00/ http://cr.openjdk.java.net/~iveresov/8037140/webrev.00/ Thanks, igor From vladimir.kozlov at oracle.com Thu Mar 13 17:25:09 2014 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 13 Mar 2014 10:25:09 -0700 Subject: RFR(XXS): A couple of type fixes in C1 In-Reply-To: References: Message-ID: <5321E9F5.20902@oracle.com> Seems reasonable. Thanks, Vladimir On 3/13/14 10:20 AM, Igor Veresov wrote: > I?d like to push a couple of tiny fixes of types in C1 that went unnoticed when porting to 64-bit. > > 8037140: C1: Incorrect argument type used for SharedRuntime::OSR_migration_end in LIRGenerator::do_Goto > 8037149: C1: getThreadTemp should return a T_LONG register on 64bit > > The fixes don?t really change the generated code since the only affect some reg-reg moves. However it?d be nice to improve the overall type hygiene. > > Webrevs: > http://cr.openjdk.java.net/~iveresov/8037149/webrev.00/ > http://cr.openjdk.java.net/~iveresov/8037140/webrev.00/ > > Thanks, > igor > From igor.veresov at oracle.com Thu Mar 13 17:34:02 2014 From: igor.veresov at oracle.com (Igor Veresov) Date: Thu, 13 Mar 2014 10:34:02 -0700 Subject: RFR(XXS): A couple of type fixes in C1 In-Reply-To: <5321E9F5.20902@oracle.com> References: <5321E9F5.20902@oracle.com> Message-ID: <762291F3-58EB-4200-8A82-6BBA7527BFA2@oracle.com> Thanks, Vladimir! igor On Mar 13, 2014, at 10:25 AM, Vladimir Kozlov wrote: > Seems reasonable. > > Thanks, > Vladimir > > On 3/13/14 10:20 AM, Igor Veresov wrote: >> I?d like to push a couple of tiny fixes of types in C1 that went unnoticed when porting to 64-bit. >> >> 8037140: C1: Incorrect argument type used for SharedRuntime::OSR_migration_end in LIRGenerator::do_Goto >> 8037149: C1: getThreadTemp should return a T_LONG register on 64bit >> >> The fixes don?t really change the generated code since the only affect some reg-reg moves. However it?d be nice to improve the overall type hygiene. >> >> Webrevs: >> http://cr.openjdk.java.net/~iveresov/8037149/webrev.00/ >> http://cr.openjdk.java.net/~iveresov/8037140/webrev.00/ >> >> Thanks, >> igor >> From christian.thalinger at oracle.com Thu Mar 13 20:43:07 2014 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Thu, 13 Mar 2014 13:43:07 -0700 Subject: RFR(XXS): A couple of type fixes in C1 In-Reply-To: References: Message-ID: <2F4CB0B6-5BDF-4252-A16E-0BC853370C5B@oracle.com> Looks good. On Mar 13, 2014, at 10:20 AM, Igor Veresov wrote: > I?d like to push a couple of tiny fixes of types in C1 that went unnoticed when porting to 64-bit. 
> > 8037140: C1: Incorrect argument type used for SharedRuntime::OSR_migration_end in LIRGenerator::do_Goto > 8037149: C1: getThreadTemp should return a T_LONG register on 64bit > > The fixes don?t really change the generated code since the only affect some reg-reg moves. However it?d be nice to improve the overall type hygiene. > > Webrevs: > http://cr.openjdk.java.net/~iveresov/8037149/webrev.00/ > http://cr.openjdk.java.net/~iveresov/8037140/webrev.00/ > > Thanks, > igor From igor.veresov at oracle.com Thu Mar 13 21:53:24 2014 From: igor.veresov at oracle.com (Igor Veresov) Date: Thu, 13 Mar 2014 14:53:24 -0700 Subject: RFR(XXS): A couple of type fixes in C1 In-Reply-To: <2F4CB0B6-5BDF-4252-A16E-0BC853370C5B@oracle.com> References: <2F4CB0B6-5BDF-4252-A16E-0BC853370C5B@oracle.com> Message-ID: Thanks, Chris! igor On Mar 13, 2014, at 1:43 PM, Christian Thalinger wrote: > Looks good. > > On Mar 13, 2014, at 10:20 AM, Igor Veresov wrote: > >> I?d like to push a couple of tiny fixes of types in C1 that went unnoticed when porting to 64-bit. >> >> 8037140: C1: Incorrect argument type used for SharedRuntime::OSR_migration_end in LIRGenerator::do_Goto >> 8037149: C1: getThreadTemp should return a T_LONG register on 64bit >> >> The fixes don?t really change the generated code since the only affect some reg-reg moves. However it?d be nice to improve the overall type hygiene. >> >> Webrevs: >> http://cr.openjdk.java.net/~iveresov/8037149/webrev.00/ >> http://cr.openjdk.java.net/~iveresov/8037140/webrev.00/ >> >> Thanks, >> igor > From rednaxelafx at gmail.com Fri Mar 14 00:35:30 2014 From: rednaxelafx at gmail.com (Krystal Mok) Date: Thu, 13 Mar 2014 17:35:30 -0700 Subject: Question on ciInstanceKlass::has_subklass() and unique_concrete_subklass() Message-ID: Hi all, There something I'm confused about around the way ciInstanceKlass caches values of shared ci klasses. A "shared" ci klass means it's created during ciObjectFactory initialization, and shared among all ciEnv instances. Let me use the code in current JDK9 tip to be concrete: has_subklass(): http://hg.openjdk.java.net/jdk9/jdk9/hotspot/file/tip/src/share/vm/ci/ciInstanceKlass.hpp#l137 bool has_subklass() { assert(is_loaded(), "must be loaded"); if (_is_shared && !_has_subklass) { if (flags().is_final()) { return false; } else { return compute_shared_has_subklass(); } } return _has_subklass; } The way it's implemented, it makes sure that if a shared ci klass didn't see a subklass before, it can go fetch the current value from the runtime. The problem is: once such a klass has seen a subklass, it stays that way, and disregards class unloading that may happen later on, which may make the cached value different from the current actual value in the runtime InstanceKlass. unique_concrete_subklass(): http://hg.openjdk.java.net/jdk9/jdk9/hotspot/file/tip/src/share/vm/ci/ciInstanceKlass.cpp#l347 // ------------------------------------------------------------------ // ciInstanceKlass::unique_concrete_subklass ciInstanceKlass* ciInstanceKlass::unique_concrete_subklass() { if (!is_loaded()) return NULL; // No change if class is not loaded if (!is_abstract()) return NULL; // Only applies to abstract classes. if (!has_subklass()) return NULL; // Must have at least one subklass. 
VM_ENTRY_MARK; InstanceKlass* ik = get_instanceKlass(); Klass* up = ik->up_cast_abstract(); assert(up->oop_is_instance(), "must be InstanceKlass"); if (ik == up) { return NULL; } return CURRENT_THREAD_ENV->get_instance_klass(up); } This function depends on the klass having a subklass, but as mentioned above, has_subklass() could return cached true for a shared ci klass even if the real value in the runtime is false. That's not necessarily unsafe, but logically it doesn't look right. It's interesting to see that ciInstanceKlass never caches the _implementor value for shared ci klasses. It may be a bit slow having to call into the runtime to get the _implementor value every time, but at least it's safe and sane. My question is why doesn't has_subklass() follow the model of implementor(), and call into runtime every time for shared ci klasses? Thanks, Kris -------------- next part -------------- An HTML attachment was scrubbed... URL: From roland.westrelin at oracle.com Fri Mar 14 13:42:59 2014 From: roland.westrelin at oracle.com (Roland Westrelin) Date: Fri, 14 Mar 2014 14:42:59 +0100 Subject: RFR(L): 8032410: compiler/uncommontrap/TestStackBangRbp.java times out on Solaris-Sparc V9 In-Reply-To: <531FA9EB.2090906@oracle.com> References: <09C8FEED-84F2-4F1A-A845-5DC8A76CF80E@oracle.com> <531E62B6.1000708@oracle.com> <87pplslmlc.fsf@oracle.com> <531FA9EB.2090906@oracle.com> Message-ID: <1B57AAEA-FE0D-40FC-850D-DC1E4BCD75CB@oracle.com> Hi Vladimir, >>> With +1 you will touch yellow page because it is inclusive if I read it >>> right: >>> >>> while (bang_offset <= bang_end) { >>> >>> Can you test with StackShadowPages=1? >> >> Are you suggesting I run with StackShadowPages=1 to check if: >> >> 137 int bang_end = (StackShadowPages+1)*page_size; >> >> is ok? > > Yes, because you may be creating hole in banging if compiled code called from interpreter. It should be consistent with AbstractInterpreterGenerator::bang_stack_shadow_pages(). The VM doesn?t like running with StackShadowPages=1. Every crash that I have running the regression tests with (StackShadowPages+1)*page_size, I can reproduce with StackShadowPages*page_size. The bad thing that could happen would be for stack banging to hit the red zone or even past the red zone, right? I tried to write a test case that would cause this to happen and my conclusion is that it?s not possible as long as we have more than one yellow page. Let?s say StackShadowPages=1. Worst case is that sp points somewhere in the shadow page when we enter a compiled method. Then we bang sp + 2 pages which hits in the second yellow page. This said, I don?t understand why 8026775 changed, in macroAssembler_.cpp: - for (int i = 0; i< StackShadowPages-1; i++) { to + for (int i = 1; i <= StackShadowPages; i++) { for the stack banging during deoptimizations. To me: + for (int i = 1; i < StackShadowPages; i++) { would have been good enough. So what would make sense to me is to use: + for (int i = 1; i < StackShadowPages; i++) { for the stack banging from the deopt blobs. And: int bang_end = StackShadowPages*page_size; for the stack banging from compiled code. Roland. From bharadwaj.yadavalli at oracle.com Fri Mar 14 16:18:54 2014 From: bharadwaj.yadavalli at oracle.com (S. Bharadwaj Yadavalli) Date: Fri, 14 Mar 2014 12:18:54 -0400 Subject: [JDK9] RFR(XXS): JDK-8036576 - jtreg failed on Test6792161 timed out Message-ID: <53232BEE.4090100@oracle.com> Please review the proposed change to timeout value for the test. 
The reasoning for the increase is as follows: When the test is run using fastdebug VM, it executes additional debug code. Verification of IR during Range Check Elimination phase of C1 compiler is one such additional debug code executed. The runtime complexity of verification step is non-linear. Specifically, the execution of this test forces compilation and sets an increased inlining size. The increased inlining size results in increased size of compilation units in (at least some) method compilations. Consequently, the IR verification consumes significant portion of runtime. Execution of the test using a fastdebug VM built with tip of hotspot-comp tree $ time ../hotspot-comp/build/solaris/jdk-solaris-sparcv9/fastdebug/bin/java -Xcomp -XX:MaxInlineSize=120 Test6792161 real 4m19.321s user 4m19.636s sys 0m0.719s compared to that using hotspot built without the the above verification: $ time ../hotspot-comp/build/solaris/jdk-solaris-sparcv9/fastdebug/bin/java -Xcomp -XX:MaxInlineSize=120 Test6792161 real 1m31.486s user 1m31.863s sys 0m0.743s So, it appears that the timeout value needs to be increased from 300 if we use fastdebug VM to run this test. Changeset at http://cr.openjdk.java.net/~bharadwaj/8036576/webrev_0/ Bug report at https://bugs.openjdk.java.net/browse/JDK-8036576 Thanks, Bharadwaj From vladimir.kozlov at oracle.com Fri Mar 14 16:29:31 2014 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 14 Mar 2014 09:29:31 -0700 Subject: [JDK9] RFR(XXS): JDK-8036576 - jtreg failed on Test6792161 timed out In-Reply-To: <53232BEE.4090100@oracle.com> References: <53232BEE.4090100@oracle.com> Message-ID: <53232E6B.5010504@oracle.com> Looks good. We run our nightly testing with fastdebug VM only. Thanks, Vladimir On 3/14/14 9:18 AM, S. Bharadwaj Yadavalli wrote: > Please review the proposed change to timeout value for the test. The reasoning for the increase is as follows: > > When the test is run using fastdebug VM, it executes additional debug code. Verification of IR during Range Check > Elimination phase of C1 compiler is one such additional debug code executed. The runtime complexity of verification step > is non-linear. Specifically, the execution of this test forces compilation and sets an increased inlining size. The > increased inlining size results in increased size of compilation units in (at least some) method compilations. > Consequently, the IR verification consumes significant portion of runtime. Execution of the test using a fastdebug VM > built with tip of hotspot-comp tree > > $ time ../hotspot-comp/build/solaris/jdk-solaris-sparcv9/fastdebug/bin/java -Xcomp -XX:MaxInlineSize=120 Test6792161 > real 4m19.321s > user 4m19.636s > sys 0m0.719s > > compared to that using hotspot built without the the above verification: > > $ time ../hotspot-comp/build/solaris/jdk-solaris-sparcv9/fastdebug/bin/java -Xcomp -XX:MaxInlineSize=120 Test6792161 > > real 1m31.486s > user 1m31.863s > sys 0m0.743s > > So, it appears that the timeout value needs to be increased from 300 if we use fastdebug VM to run this test. > > Changeset at http://cr.openjdk.java.net/~bharadwaj/8036576/webrev_0/ > Bug report at https://bugs.openjdk.java.net/browse/JDK-8036576 > > Thanks, > > Bharadwaj > > > From bharadwaj.yadavalli at oracle.com Fri Mar 14 16:36:51 2014 From: bharadwaj.yadavalli at oracle.com (S. 
Bharadwaj Yadavalli) Date: Fri, 14 Mar 2014 12:36:51 -0400 Subject: [JDK9] RFR(XXS): JDK-8036576 - jtreg failed on Test6792161 timed out In-Reply-To: <53232E6B.5010504@oracle.com> References: <53232BEE.4090100@oracle.com> <53232E6B.5010504@oracle.com> Message-ID: <53233023.4040107@oracle.com> Thank you, Vladimir. Bharadwaj On 03/14/2014 12:29 PM, Vladimir Kozlov wrote: > Looks good. > > We run our nightly testing with fastdebug VM only. > > Thanks, > Vladimir > > On 3/14/14 9:18 AM, S. Bharadwaj Yadavalli wrote: >> Please review the proposed change to timeout value for the test. The >> reasoning for the increase is as follows: >> <...> From christian.thalinger at oracle.com Fri Mar 14 23:57:08 2014 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Fri, 14 Mar 2014 16:57:08 -0700 Subject: RFR(L): 8032410: compiler/uncommontrap/TestStackBangRbp.java times out on Solaris-Sparc V9 In-Reply-To: <1B57AAEA-FE0D-40FC-850D-DC1E4BCD75CB@oracle.com> References: <09C8FEED-84F2-4F1A-A845-5DC8A76CF80E@oracle.com> <531E62B6.1000708@oracle.com> <87pplslmlc.fsf@oracle.com> <531FA9EB.2090906@oracle.com> <1B57AAEA-FE0D-40FC-850D-DC1E4BCD75CB@oracle.com> Message-ID: <0F2960C5-BACB-42D6-A8DB-E751C39D68A7@oracle.com> On Mar 14, 2014, at 6:42 AM, Roland Westrelin wrote: > > Hi Vladimir, > >>>> With +1 you will touch yellow page because it is inclusive if I read it >>>> right: >>>> >>>> while (bang_offset <= bang_end) { >>>> >>>> Can you test with StackShadowPages=1? >>> >>> Are you suggesting I run with StackShadowPages=1 to check if: >>> >>> 137 int bang_end = (StackShadowPages+1)*page_size; >>> >>> is ok? >> >> Yes, because you may be creating hole in banging if compiled code called from interpreter. It should be consistent with AbstractInterpreterGenerator::bang_stack_shadow_pages(). > > The VM doesn?t like running with StackShadowPages=1. Every crash that I have running the regression tests with (StackShadowPages+1)*page_size, I can reproduce with StackShadowPages*page_size. > > The bad thing that could happen would be for stack banging to hit the red zone or even past the red zone, right? I tried to write a test case that would cause this to happen and my conclusion is that it?s not possible as long as we have more than one yellow page. Let?s say StackShadowPages=1. Worst case is that sp points somewhere in the shadow page when we enter a compiled method. Then we bang sp + 2 pages which hits in the second yellow page. > > This said, I don?t understand why 8026775 changed, in macroAssembler_.cpp: > > - for (int i = 0; i< StackShadowPages-1; i++) { > > to > > + for (int i = 1; i <= StackShadowPages; i++) { > > for the stack banging during deoptimizations. To me: > > + for (int i = 1; i < StackShadowPages; i++) { > > would have been good enough. So what would make sense to me is to use: > > + for (int i = 1; i < StackShadowPages; i++) { > > for the stack banging from the deopt blobs. And: > > int bang_end = StackShadowPages*page_size; > > for the stack banging from compiled code. This was very hairy to get right and hopefully Mikael still has all the details swapped in. > > Roland. 
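Since most of the back and forth above is about which pages below sp actually get touched, the following small standalone program may make the arithmetic easier to follow. It is only a model of the quoted loop (bang_offset starting at bang_end_safe and advancing one page at a time while bang_offset <= bang_end), not the real HotSpot routine, and StackShadowPages, the page size and the frame sizes are all made-up values used purely for illustration:

#include <cstdio>

// Model of the loop discussed in this thread:
//   int bang_offset = bang_end_safe;
//   while (bang_offset <= bang_end) { bang(sp - bang_offset); bang_offset += page_size; }
static void show_bangs(int shadow_pages, int page_size, int frame_size, int extra) {
  int bang_end = (shadow_pages + extra) * page_size;
  const int bang_end_safe = bang_end;        // how far the caller's banging already reached
  if (frame_size > page_size) {
    bang_end += frame_size;
  }
  std::printf("frame=%5d, StackShadowPages%s: banged offsets below sp:",
              frame_size, extra ? "+1" : "  ");
  for (int off = bang_end_safe; off <= bang_end; off += page_size) {
    std::printf(" %d", off);
  }
  std::printf("\n");
}

int main() {
  const int shadow = 6;                      // assumed StackShadowPages
  const int page   = 4096;                   // assumed page size
  const int frames[] = { 512, 5000, 9000 };  // assumed frame sizes: under 1, ~1.2 and ~2.2 pages
  for (int i = 0; i < 3; i++) {
    show_bangs(shadow, page, frames[i], 0);  // bang_end = StackShadowPages*page_size
    show_bangs(shadow, page, frames[i], 1);  // bang_end = (StackShadowPages+1)*page_size
  }
  return 0;
}

With these assumed numbers a frame smaller than one page gets exactly one bang, and the +1 variant places that bang one page deeper, which is why the discussion keeps coming back to whether touching the first yellow page is acceptable.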
From vladimir.kozlov at oracle.com Sat Mar 15 00:06:58 2014 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 14 Mar 2014 17:06:58 -0700 Subject: RFR(L): 8032410: compiler/uncommontrap/TestStackBangRbp.java times out on Solaris-Sparc V9 In-Reply-To: <1B57AAEA-FE0D-40FC-850D-DC1E4BCD75CB@oracle.com> References: <09C8FEED-84F2-4F1A-A845-5DC8A76CF80E@oracle.com> <531E62B6.1000708@oracle.com> <87pplslmlc.fsf@oracle.com> <531FA9EB.2090906@oracle.com> <1B57AAEA-FE0D-40FC-850D-DC1E4BCD75CB@oracle.com> Message-ID: <532399A2.8080209@oracle.com> On 3/14/14 6:42 AM, Roland Westrelin wrote: > > Hi Vladimir, > >>>> With +1 you will touch yellow page because it is inclusive if I read it >>>> right: >>>> >>>> while (bang_offset <= bang_end) { >>>> >>>> Can you test with StackShadowPages=1? >>> >>> Are you suggesting I run with StackShadowPages=1 to check if: >>> >>> 137 int bang_end = (StackShadowPages+1)*page_size; >>> >>> is ok? >> >> Yes, because you may be creating hole in banging if compiled code called from interpreter. It should be consistent with AbstractInterpreterGenerator::bang_stack_shadow_pages(). > > The VM doesn?t like running with StackShadowPages=1. Every crash that I have running the regression tests with (StackShadowPages+1)*page_size, I can reproduce with StackShadowPages*page_size. > > The bad thing that could happen would be for stack banging to hit the red zone or even past the red zone, right? I tried to write a test case that would cause this to happen and my conclusion is that it?s not possible as long as we have more than one yellow page. Let?s say StackShadowPages=1. Worst case is that sp points somewhere in the shadow page when we enter a compiled method. Then we bang sp + 2 pages which hits in the second yellow page. You are right, I agree that we can touch the yellow page. May be we should change low limit for StackYellowPages in Arguments::check_stack_pages() (current default values are >=2). > > This said, I don?t understand why 8026775 changed, in macroAssembler_.cpp: > > - for (int i = 0; i< StackShadowPages-1; i++) { > > to > > + for (int i = 1; i <= StackShadowPages; i++) { > > for the stack banging during deoptimizations. To me: > > + for (int i = 1; i < StackShadowPages; i++) { > > would have been good enough. So what would make sense to me is to use: > > + for (int i = 1; i < StackShadowPages; i++) { > > for the stack banging from the deopt blobs. And: > > int bang_end = StackShadowPages*page_size; > > for the stack banging from compiled code. Mikael matched code in interpreter in AbstractInterpreterGenerator::bang_stack_shadow_pages(). This way we consistently touching the same number of pages. What I want to avoid is 8026775 situation when a middle page is not touched. My main concern now is that you changed the page where banging *starts* even so the variable is named bang_end: // This is how far the previous frame's stack banging extended. const int bang_end_safe = bang_end; But after 8026775 change (<=StackShadowPages) your changes seems right since StackShadowPages*page_size is already touched. You may be touching +1 page but as we discussed above with StackYellowPages >=2 it should be fine. We need to share drawing board :) thanks, Vladimir > > Roland. 
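As for the lower limit on StackYellowPages mentioned above, a minimal sketch of such a check is shown below. The function name, the parameter and the error reporting are invented for illustration; this is not the real Arguments::check_stack_pages(), it only spells out the suggested constraint:

#include <cstdio>

// stack_yellow_pages stands in for the value of the StackYellowPages flag.
static bool check_stack_pages_sketch(long stack_yellow_pages) {
  if (stack_yellow_pages < 2) {
    // With only one yellow page, the extra page of banging discussed in this
    // thread could land below the yellow zone.
    std::printf("StackYellowPages (%ld) must be at least 2\n", stack_yellow_pages);
    return false;  // treat the flag setting as invalid
  }
  return true;
}

int main() {
  check_stack_pages_sketch(2);  // accepted
  check_stack_pages_sketch(1);  // rejected with a message
  return 0;
}

Nothing more is implied here than turning the current default minimum of two yellow pages into an explicit requirement.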
> From morris.meyer at oracle.com Mon Mar 17 13:51:15 2014 From: morris.meyer at oracle.com (Morris Meyer) Date: Mon, 17 Mar 2014 09:51:15 -0400 Subject: RFR(L) 8001532: C2 node files refactoring Message-ID: <5326FDD3.8080005@oracle.com> Folks, Could I get a review for 8001532 - refactoring the old connode file in the C2 source tree? I've broken connode into bitsnode, castnode, constnode, convertnode, movenode, narrownode, optonode and threadnode. This change has been through JPRT. --morris meyer JBS - https://bugs.openjdk.java.net/browse/JDK-8001532 WEBREV - http://cr.openjdk.java.net/~morris/8001532.01 From roland.westrelin at oracle.com Mon Mar 17 18:23:53 2014 From: roland.westrelin at oracle.com (Roland Westrelin) Date: Mon, 17 Mar 2014 19:23:53 +0100 Subject: RFR(L): 8032410: compiler/uncommontrap/TestStackBangRbp.java times out on Solaris-Sparc V9 In-Reply-To: <532399A2.8080209@oracle.com> References: <09C8FEED-84F2-4F1A-A845-5DC8A76CF80E@oracle.com> <531E62B6.1000708@oracle.com> <87pplslmlc.fsf@oracle.com> <531FA9EB.2090906@oracle.com> <1B57AAEA-FE0D-40FC-850D-DC1E4BCD75CB@oracle.com> <532399A2.8080209@oracle.com> Message-ID: > Mikael matched code in interpreter in AbstractInterpreterGenerator::bang_stack_shadow_pages(). This way we consistently touching the same number of pages. > > What I want to avoid is 8026775 situation when a middle page is not touched. My main concern now is that you changed the page where banging *starts* even so the variable is named bang_end: > > // This is how far the previous frame's stack banging extended. > const int bang_end_safe = bang_end; > > But after 8026775 change (<=StackShadowPages) your changes seems right since StackShadowPages*page_size is already touched. > > You may be touching +1 page but as we discussed above with StackYellowPages >=2 it should be fine. Thanks, Vladimir. Here is a new webrev: http://cr.openjdk.java.net/~roland/8032410/webrev.02/ that should take all comments into account. In particular as suggested by Goetz I got rid of the templates. The native wrappers now bang to StackShadowPages+1. Roland. From christian.thalinger at oracle.com Mon Mar 17 18:26:35 2014 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Mon, 17 Mar 2014 11:26:35 -0700 Subject: RFR(L) 8001532: C2 node files refactoring In-Reply-To: <5326FDD3.8080005@oracle.com> References: <5326FDD3.8080005@oracle.com> Message-ID: <8EB41922-3CC4-4FE9-8D72-3286BC7771DD@oracle.com> Did you try to build on Linux and/or OS X without pre-compiled headers? src/share/vm/memory/allocation.hpp: Did these changes accidentally slip in? src/share/vm/opto/matcher.hpp: +class BinaryNode : public Node { Not sure if this is a good place for BinaryNode. matcher.hpp shouldn?t contain any nodes. I don?t want to put the burden upon you but it would be *really* nice if we could unify class and method comments. Preferably Doxygen format. Otherwise this looks good. On Mar 17, 2014, at 6:51 AM, Morris Meyer wrote: > Folks, > > Could I get a review for 8001532 - refactoring the old connode file in the C2 source tree? > > I've broken connode into bitsnode, castnode, constnode, convertnode, movenode, narrownode, optonode and threadnode. > > This change has been through JPRT. 
> > --morris meyer > > JBS - https://bugs.openjdk.java.net/browse/JDK-8001532 > WEBREV - http://cr.openjdk.java.net/~morris/8001532.01 From vladimir.kozlov at oracle.com Mon Mar 17 22:20:27 2014 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 17 Mar 2014 15:20:27 -0700 Subject: RFR(L) 8001532: C2 node files refactoring In-Reply-To: <5326FDD3.8080005@oracle.com> References: <5326FDD3.8080005@oracle.com> Message-ID: <5327752B.3080601@oracle.com> File names usually match base class name of ideal nodes. Please change: constnode back to connode bitsnode --> countbitsnode narrownode --> narrowptrnode optonode --> opaquenode PartialSubtypeCheckNode class should be in new intrinsicnode file together with other similar classes from memnode files: StrIntrinsicNode and related, EncodeISOArrayNode. ThreadLocalNode can be kept in connode because it is kind of a constant pointer value. Put BinaryNode into movenode.hpp since it references cmove nodes. constnode.hpp is included into callnode.hpp so you don't need to include it into files which have callnode.hpp included. Yes, we had it before but you are cleaning the code. thanks, Vladimir On 3/17/14 6:51 AM, Morris Meyer wrote: > Folks, > > Could I get a review for 8001532 - refactoring the old connode file in > the C2 source tree? > > I've broken connode into bitsnode, castnode, constnode, convertnode, > movenode, narrownode, optonode and threadnode. > > This change has been through JPRT. > > --morris meyer > > JBS - https://bugs.openjdk.java.net/browse/JDK-8001532 > WEBREV - http://cr.openjdk.java.net/~morris/8001532.01 From goetz.lindenmaier at sap.com Tue Mar 18 09:13:27 2014 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Tue, 18 Mar 2014 09:13:27 +0000 Subject: RFR(L): 8032410: compiler/uncommontrap/TestStackBangRbp.java times out on Solaris-Sparc V9 In-Reply-To: References: <09C8FEED-84F2-4F1A-A845-5DC8A76CF80E@oracle.com> <531E62B6.1000708@oracle.com> <87pplslmlc.fsf@oracle.com> <531FA9EB.2090906@oracle.com> <1B57AAEA-FE0D-40FC-850D-DC1E4BCD75CB@oracle.com> <532399A2.8080209@oracle.com> Message-ID: <4295855A5C1DE049A61835A1887419CC2CEA8BEA@DEWDFEMB12A.global.corp.sap> Hi Roland, The change in deoptimization.cpp does not reflect what I fixed in the patch I sent to you. You need to pass index==0 for test of top_frame. Best regards, Goetz. -----Original Message----- From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Roland Westrelin Sent: Monday, March 17, 2014 7:24 PM To: Vladimir Kozlov Cc: hotspot compiler; ppc-aix-port-dev at openjdk.java.net Subject: Re: RFR(L): 8032410: compiler/uncommontrap/TestStackBangRbp.java times out on Solaris-Sparc V9 > Mikael matched code in interpreter in AbstractInterpreterGenerator::bang_stack_shadow_pages(). This way we consistently touching the same number of pages. > > What I want to avoid is 8026775 situation when a middle page is not touched. My main concern now is that you changed the page where banging *starts* even so the variable is named bang_end: > > // This is how far the previous frame's stack banging extended. > const int bang_end_safe = bang_end; > > But after 8026775 change (<=StackShadowPages) your changes seems right since StackShadowPages*page_size is already touched. > > You may be touching +1 page but as we discussed above with StackYellowPages >=2 it should be fine. Thanks, Vladimir. Here is a new webrev: http://cr.openjdk.java.net/~roland/8032410/webrev.02/ that should take all comments into account. 
In particular as suggested by Goetz I got rid of the templates. The native wrappers now bang to StackShadowPages+1. Roland. From lev.priima at oracle.com Tue Mar 18 13:35:10 2014 From: lev.priima at oracle.com (Lev Priima) Date: Tue, 18 Mar 2014 17:35:10 +0400 Subject: RFR(XS): 8037589: PrintFlagsFinalGetter to testlibrary Message-ID: <53284B8E.5020303@oracle.com> Please review and help me with integration: Bug: https://bugs.openjdk.java.net/browse/JDK-8037589 Webrev: in attachment -- Best regards, Lev -------------- next part -------------- A non-text attachment was scrubbed... Name: webrev.zip Type: application/zip Size: 17648 bytes Desc: not available URL: From roland.westrelin at oracle.com Tue Mar 18 13:50:51 2014 From: roland.westrelin at oracle.com (Roland Westrelin) Date: Tue, 18 Mar 2014 14:50:51 +0100 Subject: RFR(L): 8032410: compiler/uncommontrap/TestStackBangRbp.java times out on Solaris-Sparc V9 In-Reply-To: <4295855A5C1DE049A61835A1887419CC2CEA8BEA@DEWDFEMB12A.global.corp.sap> References: <09C8FEED-84F2-4F1A-A845-5DC8A76CF80E@oracle.com> <531E62B6.1000708@oracle.com> <87pplslmlc.fsf@oracle.com> <531FA9EB.2090906@oracle.com> <1B57AAEA-FE0D-40FC-850D-DC1E4BCD75CB@oracle.com> <532399A2.8080209@oracle.com> <4295855A5C1DE049A61835A1887419CC2CEA8BEA@DEWDFEMB12A.global.corp.sap> Message-ID: <8256510E-9E2F-428E-95C5-1176F243A3F7@oracle.com> Hi Goetz, > The change in deoptimization.cpp does not reflect what I fixed > in the patch I sent to you. > You need to pass index==0 for test of top_frame. Yes, I missed that. Thanks for spotting it. Here is a new webrev: http://cr.openjdk.java.net/~roland/8032410/webrev.03/ Roland. From igor.ignatyev at oracle.com Tue Mar 18 13:58:09 2014 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Tue, 18 Mar 2014 17:58:09 +0400 Subject: RFR(XS): 8037589: PrintFlagsFinalGetter to testlibrary In-Reply-To: <53284B8E.5020303@oracle.com> References: <53284B8E.5020303@oracle.com> Message-ID: <532850F1.5010107@oracle.com> Lev, I've uploaded your webrev: http://cr.openjdk.java.net/~iignatyev/lpriima/8037589/webrev.00/ What is the common use case for PrintFlagsFinalGetter? I don't understand when you need use getFlagsFinal instead of getWithVMOpts. So I'd prefer to have one method and use ProcessTools::executeTestJvm instead of createJavaProcessBuilder Igor On 03/18/2014 05:35 PM, Lev Priima wrote: > Please review and help me with integration: > > Bug: https://bugs.openjdk.java.net/browse/JDK-8037589 > Webrev: in attachment > From goetz.lindenmaier at sap.com Tue Mar 18 14:14:39 2014 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Tue, 18 Mar 2014 14:14:39 +0000 Subject: RFR(L): 8032410: compiler/uncommontrap/TestStackBangRbp.java times out on Solaris-Sparc V9 In-Reply-To: <8256510E-9E2F-428E-95C5-1176F243A3F7@oracle.com> References: <09C8FEED-84F2-4F1A-A845-5DC8A76CF80E@oracle.com> <531E62B6.1000708@oracle.com> <87pplslmlc.fsf@oracle.com> <531FA9EB.2090906@oracle.com> <1B57AAEA-FE0D-40FC-850D-DC1E4BCD75CB@oracle.com> <532399A2.8080209@oracle.com> <4295855A5C1DE049A61835A1887419CC2CEA8BEA@DEWDFEMB12A.global.corp.sap> <8256510E-9E2F-428E-95C5-1176F243A3F7@oracle.com> Message-ID: <4295855A5C1DE049A61835A1887419CC2CEA8D6E@DEWDFEMB12A.global.corp.sap> Hi, the changes to templatInterpreter_ppc and the ad file are missing, too. Please add them, then I'll test the webrev on ppc. Thanks, Goetz. 
-----Original Message----- From: Roland Westrelin [mailto:roland.westrelin at oracle.com] Sent: Tuesday, March 18, 2014 2:51 PM To: Lindenmaier, Goetz Cc: Vladimir Kozlov; hotspot compiler; ppc-aix-port-dev at openjdk.java.net Subject: Re: RFR(L): 8032410: compiler/uncommontrap/TestStackBangRbp.java times out on Solaris-Sparc V9 Hi Goetz, > The change in deoptimization.cpp does not reflect what I fixed > in the patch I sent to you. > You need to pass index==0 for test of top_frame. Yes, I missed that. Thanks for spotting it. Here is a new webrev: http://cr.openjdk.java.net/~roland/8032410/webrev.03/ Roland. From roland.westrelin at oracle.com Tue Mar 18 14:21:58 2014 From: roland.westrelin at oracle.com (Roland Westrelin) Date: Tue, 18 Mar 2014 15:21:58 +0100 Subject: RFR(L): 8032410: compiler/uncommontrap/TestStackBangRbp.java times out on Solaris-Sparc V9 In-Reply-To: <4295855A5C1DE049A61835A1887419CC2CEA8D6E@DEWDFEMB12A.global.corp.sap> References: <09C8FEED-84F2-4F1A-A845-5DC8A76CF80E@oracle.com> <531E62B6.1000708@oracle.com> <87pplslmlc.fsf@oracle.com> <531FA9EB.2090906@oracle.com> <1B57AAEA-FE0D-40FC-850D-DC1E4BCD75CB@oracle.com> <532399A2.8080209@oracle.com> <4295855A5C1DE049A61835A1887419CC2CEA8BEA@DEWDFEMB12A.global.corp.sap> <8256510E-9E2F-428E-95C5-1176F243A3F7@oracle.com> <4295855A5C1DE049A61835A1887419CC2CEA8D6E@DEWDFEMB12A.global.corp.sap> Message-ID: <1707C0ED-1F21-40DC-B64C-32A3A53AED7E@oracle.com> > the changes to templateInterpreter_ppc and the ad file are missing, too. > Please add them, then I'll test the webrev on ppc. The patch doesn't apply cleanly any more because I removed the template methods. Can you provide an updated patch? Roland. From lev.priima at oracle.com Tue Mar 18 14:31:24 2014 From: lev.priima at oracle.com (Lev Priima) Date: Tue, 18 Mar 2014 18:31:24 +0400 Subject: RFR(XS): 8037589: PrintFlagsFinalGetter to testlibrary In-Reply-To: <532850F1.5010107@oracle.com> References: <53284B8E.5020303@oracle.com> <532850F1.5010107@oracle.com> Message-ID: <532858BC.9070409@oracle.com> info from bug description in jbs: On 03/18/2014 05:58 PM, Igor Ignatyev wrote: > Lev, > > I've uploaded your webrev: > http://cr.openjdk.java.net/~iignatyev/lpriima/8037589/webrev.00/ > > What is the common use case for PrintFlagsFinalGetter? when writing test for java products which does not contain JMX(e.g, compact1, compact2) we have to read options of VM. Class com.oracle.java.testlibrary.PrintFlagsFinalGetter may help in testdev with it > I don't understand when you need use getFlagsFinal instead of > getWithVMOpts. So I'd prefer to have one method and use > ProcessTools::executeTestJvm instead of createJavaProcessBuilder > Igor > > On 03/18/2014 05:35 PM, Lev Priima wrote: >> Please review and help me with integration: >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8037589 >> Webrev: in attachment >> Lev -------------- next part -------------- An HTML attachment was scrubbed... URL: From morris.meyer at oracle.com Tue Mar 18 15:57:54 2014 From: morris.meyer at oracle.com (Morris Meyer) Date: Tue, 18 Mar 2014 11:57:54 -0400 Subject: RFR(L) 8001532: C2 node files refactoring In-Reply-To: <8EB41922-3CC4-4FE9-8D72-3286BC7771DD@oracle.com> References: <5326FDD3.8080005@oracle.com> <8EB41922-3CC4-4FE9-8D72-3286BC7771DD@oracle.com> Message-ID: <53286D02.9020709@oracle.com> Christian, Thanks for the review. Comments inline. On 3/17/14, 2:26 PM, Christian Thalinger wrote: > Did you try to build on Linux and/or OS X without pre-compiled headers?
Yes - Tried on the Mac without pre-compiled headers. > src/share/vm/memory/allocation.hpp: > > Did these changes accidentally slip in? Yes - purged. > src/share/vm/opto/matcher.hpp: > > +class BinaryNode : public Node { > > Not sure if this is a good place for BinaryNode. matcher.hpp shouldn't contain any nodes. Moved this to movenode. > I don't want to put the burden upon you but it would be *really* nice if we could unify class and method comments. Preferably Doxygen format. Added doxygen comments to convertnode.hpp. Maybe adding Doxygen comments could be a starter task? > Otherwise this looks good. JPRT - hotspotwest - 2014-03-18-144255.mameyer.8001532 WEBREV - http://cr.openjdk.java.net/~morris/8001532.02 --mm > On Mar 17, 2014, at 6:51 AM, Morris Meyer wrote: > >> Folks, >> >> Could I get a review for 8001532 - refactoring the old connode file in the C2 source tree? >> >> I've broken connode into bitsnode, castnode, constnode, convertnode, movenode, narrownode, optonode and threadnode. >> >> This change has been through JPRT. >> >> --morris meyer >> >> JBS - https://bugs.openjdk.java.net/browse/JDK-8001532 >> WEBREV - http://cr.openjdk.java.net/~morris/8001532.01 From roland.westrelin at oracle.com Tue Mar 18 18:38:21 2014 From: roland.westrelin at oracle.com (Roland Westrelin) Date: Tue, 18 Mar 2014 19:38:21 +0100 Subject: RFR (S): 8007988: PrintInlining output is inconsistent with incremental inlining Message-ID: <8A3B1D76-7B9B-4EBA-9352-B0F5A549BD0E@oracle.com> The PrintInlining output with incremental inlining can be a mess. Currently it can print 2 lines for a single call site with conflicting messages and, when that doesn't happen, sometimes report a failed inlining with a wrong reason. PrintInlining messages are currently stored in a list of buffers. Every new message is appended to the current buffer in the list. When a call site is enqueued for late inlining then we allocate a new buffer for the subsequent messages and enqueue it after the current buffer. The new buffer becomes the current buffer. When we inline the call site, we can then insert a new buffer for new messages in between the two buffers and keep inlining messages ordered. This doesn't work well because, when we enqueue a late inlining call site, we've already appended the PrintInlining messages for this call site to the current buffer. When we try to inline that late inlining call site, we want to be able to change those messages (it may have failed the first time around but succeeds now, or it may have reported success but have been delayed and now we are out of nodes so we can't do the inlining). To achieve this, this change appends PrintInlining messages for the current call site to a staging buffer, then if the call site is a late inlining call site, we allocate a new buffer and add the messages to that buffer. This way we can go back later and update the messages. This should also solve the assert failures with PrintInlining that are sometimes seen: https://bugs.openjdk.java.net/browse/JDK-8028274 http://cr.openjdk.java.net/~roland/8007988/webrev.00/ Roland.
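To picture the scheme Roland describes, here is a small standalone sketch (plain C++ with standard containers; the class names and message strings are invented for illustration and are not the actual C2 data structures): each call site gets its own message buffer, the buffers are kept in program order, and a delayed (late-inlining) call site keeps a handle to its buffer so the message can be rewritten once the final decision is known.

#include <cstdio>
#include <list>
#include <memory>
#include <string>

// Illustration only: one PrintInlining-style message buffer per call site,
// kept in program order so the printed output stays ordered.
struct InlineMessageBuffer {
  std::string text;
};

class InlineMessageLog {
  std::list<std::shared_ptr<InlineMessageBuffer>> buffers_;
public:
  // Call site decided immediately: record its message and forget about it.
  void record(const std::string& msg) {
    auto buf = std::make_shared<InlineMessageBuffer>();
    buf->text = msg;
    buffers_.push_back(buf);
  }
  // Call site enqueued for late inlining: keep a handle so the message can
  // be replaced later instead of appending a second, conflicting line.
  std::shared_ptr<InlineMessageBuffer> record_late(const std::string& msg) {
    auto buf = std::make_shared<InlineMessageBuffer>();
    buf->text = msg;
    buffers_.push_back(buf);
    return buf;
  }
  void print() const {
    for (const auto& b : buffers_) {
      std::printf("%s\n", b->text.c_str());
    }
  }
};

int main() {
  InlineMessageLog log;
  log.record("@ 10 Foo::bar (5 bytes) inline (hot)");
  auto late = log.record_late("@ 22 Foo::baz (40 bytes) late inline (incremental)");
  log.record("@ 31 Foo::qux (7 bytes) inline (hot)");
  // Later, when the delayed call site is finally processed, its single
  // message is updated in place, e.g. because the node budget ran out.
  late->text = "@ 22 Foo::baz (40 bytes) NOT inlined (node budget exhausted)";
  log.print();
  return 0;
}

The webrev does the equivalent bookkeeping inside C2's own structures; the sketch only shows why a per-call-site, updatable buffer avoids both the two-conflicting-lines problem and the wrong-reason problem.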
From vladimir.kozlov at oracle.com Tue Mar 18 18:53:29 2014 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 18 Mar 2014 11:53:29 -0700 Subject: RFR(XS): 8037589: PrintFlagsFinalGetter to testlibrary In-Reply-To: <532858BC.9070409@oracle.com> References: <53284B8E.5020303@oracle.com> <532850F1.5010107@oracle.com> <532858BC.9070409@oracle.com> Message-ID: <53289629.1060106@oracle.com> I don't get why you forking new JVM process to get flags value. It is very expensive way to get info. Why not use WB api to get flags values? Current JVM should run already with all flags specified on command line. Thanks, Vladimir On 3/18/14 7:31 AM, Lev Priima wrote: > info from bug description in jbs: > On 03/18/2014 05:58 PM, Igor Ignatyev wrote: >> Lev, >> >> I've uploaded your webrev: >> http://cr.openjdk.java.net/~iignatyev/lpriima/8037589/webrev.00/ >> >> What is the common use case for PrintFlagsFinalGetter? > when writing test for java products which does not contain JMX(e.g, > compact1, compact2) we have to read options of VM. Class > com.oracle.java.testlibrary.PrintFlagsFinalGetter may help in testdev > with it >> I don't understand when you need use getFlagsFinal instead of >> getWithVMOpts. So I'd prefer to have one method and use >> ProcessTools::executeTestJvm instead of createJavaProcessBuilder >> Igor >> >> On 03/18/2014 05:35 PM, Lev Priima wrote: >>> Please review and help me with integration: >>> >>> Bug: https://bugs.openjdk.java.net/browse/JDK-8037589 >>> Webrev: in attachment >>> > Lev From roland.westrelin at oracle.com Tue Mar 18 18:56:15 2014 From: roland.westrelin at oracle.com (Roland Westrelin) Date: Tue, 18 Mar 2014 19:56:15 +0100 Subject: RFR(S): 8005079: fix LogCompilation for incremental inlining Message-ID: This fixes the LogCompilation tool when incremental inlining happens. It required some extra data in the log file. I also changed the way the class names are reported in the log output so it uses the same as PrintInlining. It's especially useful for lambda form: java.lang.invoke.LambdaForm$MH/1282811396 rather than java/lang/invoke/LambdaForm$MH (which makes it hard to know what LF this is) http://cr.openjdk.java.net/~roland/8005079/webrev.00/ Roland. From igor.ignatyev at oracle.com Tue Mar 18 19:28:39 2014 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Tue, 18 Mar 2014 23:28:39 +0400 Subject: RFR(XS): 8037589: PrintFlagsFinalGetter to testlibrary In-Reply-To: <53289629.1060106@oracle.com> References: <53284B8E.5020303@oracle.com> <532850F1.5010107@oracle.com> <532858BC.9070409@oracle.com> Message-ID: <53289E67.1040808@oracle.com> Vladimir, I agree w/ you, WB api is better. I've a prototype for such api: > template <typename T> > bool SetVMFlag(JavaThread* thread, JNIEnv* env, jstring name, T* value, bool (*TAtPut)(const char*, T*, Flag::Flags)) { > if (name == NULL) { > return false; > } > ThreadToNativeFromVM ttnfv(thread); // can't be in VM when we call JNI > const char* flag_name = env->GetStringUTFChars(name, NULL); > bool result = (*TAtPut)(flag_name, value, Flag::INTERNAL); > env->ReleaseStringUTFChars(name, flag_name); > return result; > } > > WB_ENTRY(jboolean, WB_TestSetBooleanVMFlag(JNIEnv* env, jobject o, jstring name, jboolean value)) > bool result = value; > SetVMFlag (thread, env, name, &result, &CommandLineFlags::boolAtPut); > return result; > WB_END ... I am going to integrate it as a part of 8028595. Lev, does WB api work for you? if yes, you can close 8037589 as a dup of 8028595/8032449.
Igor On 03/18/2014 10:53 PM, Vladimir Kozlov wrote: > I don't get why you forking new JVM process to get flags value. It is > very expensive way to get info. > Why not use WB api to get flags values? Current JVM should run already > with all flags specified on command line. > > Thanks, > Vladimir > > On 3/18/14 7:31 AM, Lev Priima wrote: >> info from bug description in jbs: >> On 03/18/2014 05:58 PM, Igor Ignatyev wrote: >>> Lev, >>> >>> I've uploaded your webrev: >>> http://cr.openjdk.java.net/~iignatyev/lpriima/8037589/webrev.00/ >>> >>> What is the common use case for PrintFlagsFinalGetter? >> when writing test for java products which does not contain JMX(e.g, >> compact1, compact2) we have to read options of VM. Class >> com.oracle.java.testlibrary.PrintFlagsFinalGetter may help in testdev >> with it >>> I don't understand when you need use getFlagsFinal instead of >>> getWithVMOpts. So I'd prefer to have one method and use >>> ProcessTools::executeTestJvm instead of createJavaProcessBuilder >>> Igor >>> >>> On 03/18/2014 05:35 PM, Lev Priima wrote: >>>> Please review and help me with integration: >>>> >>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8037589 >>>> Webrev: in attachment >>>> >> Lev From morris.meyer at oracle.com Tue Mar 18 20:34:42 2014 From: morris.meyer at oracle.com (Morris Meyer) Date: Tue, 18 Mar 2014 16:34:42 -0400 Subject: RFR(L) 8001532: C2 node files refactoring In-Reply-To: <5327752B.3080601@oracle.com> References: <5326FDD3.8080005@oracle.com> <5327752B.3080601@oracle.com> Message-ID: <5328ADE2.9020301@oracle.com> Thanks for the review Vladimir. Here is the webrev modified from yours and Christian's feedback. --mm JPRT - hotspotest - 2014-03-18-190819.mameyer.8001532 WEBREV - http://cr.openjdk.java.net/~morris/8001532.03 On 3/17/14, 6:20 PM, Vladimir Kozlov wrote: > File names usually match base class name of ideal nodes. Please change: > > constnode back to connode > bitsnode --> countbitsnode > narrownode --> narrowptrnode > optonode --> opaquenode > > PartialSubtypeCheckNode class should be in new intrinsicnode file > together with other similar classes from memnode files: > StrIntrinsicNode and related, EncodeISOArrayNode. > > ThreadLocalNode can be kept in connode because it is kind of a > constant pointer value. > > Put BinaryNode into movenode.hpp since it references cmove nodes. > > constnode.hpp is included into callnode.hpp so you don't need to > include it into files which have callnode.hpp included. Yes, we had it > before but you are cleaning the code. > > thanks, > Vladimir > > On 3/17/14 6:51 AM, Morris Meyer wrote: >> Folks, >> >> Could I get a review for 8001532 - refactoring the old connode file in >> the C2 source tree? >> >> I've broken connode into bitsnode, castnode, constnode, convertnode, >> movenode, narrownode, optonode and threadnode. >> >> This change has been through JPRT. >> >> --morris meyer >> >> JBS - https://bugs.openjdk.java.net/browse/JDK-8001532 >> WEBREV - http://cr.openjdk.java.net/~morris/8001532.01 From vladimir.kozlov at oracle.com Tue Mar 18 21:17:25 2014 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 18 Mar 2014 14:17:25 -0700 Subject: RFR(XS): 8037589: PrintFlagsFinalGetter to testlibrary In-Reply-To: <53289E67.1040808@oracle.com> References: <53284B8E.5020303@oracle.com> <532850F1.5010107@oracle.com> <532858BC.9070409@oracle.com> <53289629.1060106@oracle.com> <53289E67.1040808@oracle.com> Message-ID: <5328B7E5.3000400@oracle.com> Yes, 8032449 is the right solution. 
Thanks, Vladimir On 3/18/14 12:28 PM, Igor Ignatyev wrote: > Vladimir, > > I agree w/ you, WB api is better. I've a prototype for such api: >> template >> bool SetVMFlag(JavaThread* thread, JNIEnv* env, jstring name, T* >> value, bool (*TAtPut)(const char*, T*, Flag::Flags)) { >> if (name == NULL) { >> return false; >> } >> ThreadToNativeFromVM ttnfv(thread); // can't be in VM when we call >> JNI >> const char* flag_name = env->GetStringUTFChars(name, NULL); >> bool result = (*TAtPut)(flag_name, value, Flag::INTERNAL); >> env->ReleaseStringUTFChars(name, flag_name); >> return result; >> } >> >> WB_ENTRY(jboolean, WB_TestSetBooleanVMFlag(JNIEnv* env, jobject o, >> jstring name, jboolean value)) >> bool result = value; >> SetVMFlag (thread, env, name, &result, >> &CommandLineFlags::boolAtPut); >> return result; >> WB_END > ... > > I am going to integrate it as a part of 8028595. > > Lev, > does WB api work for you? if yes, you can close 8037589 as a dup of > 8028595/8032449. > > Igor > > On 03/18/2014 10:53 PM, Vladimir Kozlov wrote: >> I don't get why you forking new JVM process to get flags value. It is >> very expensive way to get info. >> Why not use WB api to get flags values? Current JVM should run already >> with all flags specified on command line. >> >> Thanks, >> Vladimir >> >> On 3/18/14 7:31 AM, Lev Priima wrote: >>> info from bug description in jbs: >>> On 03/18/2014 05:58 PM, Igor Ignatyev wrote: >>>> Lev, >>>> >>>> I've uploaded your webrev: >>>> http://cr.openjdk.java.net/~iignatyev/lpriima/8037589/webrev.00/ >>>> >>>> What is the common use case for PrintFlagsFinalGetter? >>> when writing test for java products which does not contain JMX(e.g, >>> compact1, compact2) we have to read options of VM. Class >>> com.oracle.java.testlibrary.PrintFlagsFinalGetter may help in testdev >>> with it >>>> I don't understand when you need use getFlagsFinal instead of >>>> getWithVMOpts. So I'd prefer to have one method and use >>>> ProcessTools::executeTestJvm instead of createJavaProcessBuilder >>>> Igor >>>> >>>> On 03/18/2014 05:35 PM, Lev Priima wrote: >>>>> Please review and help me with integration: >>>>> >>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8037589 >>>>> Webrev: in attachment >>>>> >>> Lev From vladimir.kozlov at oracle.com Tue Mar 18 22:11:55 2014 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 18 Mar 2014 15:11:55 -0700 Subject: RFR(L) 8001532: C2 node files refactoring In-Reply-To: <5328ADE2.9020301@oracle.com> References: <5326FDD3.8080005@oracle.com> <5327752B.3080601@oracle.com> <5328ADE2.9020301@oracle.com> Message-ID: <5328C4AB.9030300@oracle.com> The big comment in connode.cpp belongs to cmove nodes and should be moved into movenode.cpp. Why you kept threadnode.hpp file? Otherwise it looks good. I would ask to not push it now because it is interfering with my RTM changes (I added Opaque3Node). May be next week. Thanks, Vladimir On 3/18/14 1:34 PM, Morris Meyer wrote: > Thanks for the review Vladimir. > > Here is the webrev modified from yours and Christian's feedback. > > --mm > > JPRT - hotspotest - 2014-03-18-190819.mameyer.8001532 > WEBREV - http://cr.openjdk.java.net/~morris/8001532.03 > > On 3/17/14, 6:20 PM, Vladimir Kozlov wrote: >> File names usually match base class name of ideal nodes. 
Please change: >> >> constnode back to connode >> bitsnode --> countbitsnode >> narrownode --> narrowptrnode >> optonode --> opaquenode >> >> PartialSubtypeCheckNode class should be in new intrinsicnode file >> together with other similar classes from memnode files: >> StrIntrinsicNode and related, EncodeISOArrayNode. >> >> ThreadLocalNode can be kept in connode because it is kind of a >> constant pointer value. >> >> Put BinaryNode into movenode.hpp since it references cmove nodes. >> >> constnode.hpp is included into callnode.hpp so you don't need to >> include it into files which have callnode.hpp included. Yes, we had it >> before but you are cleaning the code. >> >> thanks, >> Vladimir >> >> On 3/17/14 6:51 AM, Morris Meyer wrote: >>> Folks, >>> >>> Could I get a review for 8001532 - refactoring the old connode file in >>> the C2 source tree? >>> >>> I've broken connode into bitsnode, castnode, constnode, convertnode, >>> movenode, narrownode, optonode and threadnode. >>> >>> This change has been through JPRT. >>> >>> --morris meyer >>> >>> JBS - https://bugs.openjdk.java.net/browse/JDK-8001532 >>> WEBREV - http://cr.openjdk.java.net/~morris/8001532.01 > From igor.veresov at oracle.com Wed Mar 19 00:15:40 2014 From: igor.veresov at oracle.com (Igor Veresov) Date: Tue, 18 Mar 2014 17:15:40 -0700 Subject: RFR (XL) 8031320: Use Intel RTM instructions for locks In-Reply-To: <532748EF.6080200@oracle.com> References: <532748EF.6080200@oracle.com> Message-ID: In macroAssembler_x86.cpp: 1372 // set rtm_state to "no rtm" in method oop 1388 // set rtm_state to "always rtm" in method oop These comments should be s/method oop/MDO/ In a couple of places like void MacroAssembler::rtmcounters_update() and void MacroAssembler::rtm_abortratio_calculation() the parameters although named as tmpReg or scrReg also carry useful input data. Could these be renamed to reflect the input parameter semantics? And then, before the value in them gets destroyed aliased to tmpReg or scrReg (or may be something that reflects their meaning better)? Otherwise looks good. igor On Mar 17, 2014, at 12:11 PM, Vladimir Kozlov wrote: > https://bugs.openjdk.java.net/browse/JDK-8031320 > http://cr.openjdk.java.net/~kvn/8031320_9/webrev/ > > The Intel architectures codenamed Haswell has support for RTM (Restricted Transactional Memory) instructions xbegin, xabort, xend and xtest as part of Intel Transactional Synchronization Extension (TSX). The xbegin and xend instructions enclose a set of instructions to be executed as a transaction. If no conflict found during execution of the transaction, the memory and register modifications are committed together at xend. xabort instruction can be used for explicit abort of transaction and xtest to check if we are in transaction. > > RTM is useful for highly contended locks with low conflict in the critical region. The highly contended locks don't scale well otherwise but with RTM they show good scaling. RTM allows using coarse grain locking for applications. Also for lightly contended locks which are used by different threads RTM can reduce cache line ping pong and thereby show performance improvement too. > > Implementation: > > Generate RTM locking code for all inflated locks when "UseRTMLocking" option is on with normal locking as fall back mechanism. On abort or lock busy the lock will be retried a fixed number of times as specified by "RTMRetryCount" option. The locks which abort too often can be auto tuned or manually tuned. 
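For readers who have not used TSX before, the lock-elision pattern that RTM enables looks roughly like the sketch below. This is generic user-level C++ using the <immintrin.h> intrinsics, not the code HotSpot emits; the spin lock and the fixed RETRY_COUNT are simplified stand-ins for what the webrev does with inflated locks and RTMRetryCount, and running it needs an RTM-capable CPU (real code first checks the CPUID feature bit) plus -mrtm on gcc/clang.

#include <immintrin.h>   // _xbegin, _xend, _xabort, _xtest
#include <atomic>

// Simplified fallback lock: 0 = free, 1 = held.
static std::atomic<int> fallback_lock{0};
static const int RETRY_COUNT = 5;   // plays the role of RTMRetryCount

static bool fallback_is_locked() {
  return fallback_lock.load(std::memory_order_acquire) != 0;
}

void lock() {
  for (int i = 0; i < RETRY_COUNT; i++) {
    unsigned status = _xbegin();
    if (status == _XBEGIN_STARTED) {
      // Speculating: reading the lock word puts it in the transaction's
      // read-set, so another thread taking the fallback lock aborts us.
      if (!fallback_is_locked()) {
        return;                 // run the critical section transactionally
      }
      _xabort(0xff);            // lock is busy: abort explicitly
    }
    // 'status' now holds the abort reason; real code inspects it and may
    // wait for the lock to be released before retrying, or give up early
    // (for example on capacity aborts).
  }
  // Too many aborts: take the normal lock.
  int expected = 0;
  while (!fallback_lock.compare_exchange_weak(expected, 1,
                                              std::memory_order_acquire)) {
    expected = 0;
  }
}

void unlock() {
  if (_xtest()) {
    _xend();                    // commit the transactional critical section
  } else {
    fallback_lock.store(0, std::memory_order_release);
  }
}

The policy questions discussed in the webrev (when to stop retrying, when to deoptimize and recompile without RTM) all hang off abort statistics gathered around this basic shape.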
> > Auto-tuning can be done using "UseRTMDeopt" flag which will add an abort ratio calculation code for each lock. The abort ratio will be calculated after "RTMAbortThreshold" aborts are encountered. > With "UseRTMDeopt" if the aborts ratio reaches "RTMAbortRatio" the nmethod containing the lock will be deoptimized and recompiled with all locks as normal (stack) locks. If the abort ratio continues to remain low after "RTMLockingThreshold" attempted locks, then the method will be deoptimized and recompiled with all locks as RTM locks without abort ratio calculation code. The abort ratio calculation can be delayed by specifying -XX:RTMLockingCalculationDelay= flag. > Deoptimization of nmethod is done by adding an uncommon trap at the beginning of the code which checks rtm state field in MDO which is modified by the abort calculation code. > > For manual tuning the abort statistics for each lock could be provided to a user using "PrintPreciseRTMLockingStatistics" diagnostic flag. Based on the abort statistics users can create a .hotspot_compiler file or use -XX:CompileCommand=