From vladimir.kozlov at oracle.com Thu Mar 1 02:12:57 2018
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Wed, 28 Feb 2018 18:12:57 -0800
Subject: [11] RFR(S) 8195632: [Graal] Introduce EagerJVMCI flag to force eager JVMCI initialization
In-Reply-To: <068ec530-2cd3-8c1a-e0ed-a734d2d9eb79@oracle.com>
References: <54e13cbb-eb0f-8eed-1035-12153b8d48b9@oracle.com> <8475baeb-1392-5771-b300-eda1ec8d022d@oracle.com> <068ec530-2cd3-8c1a-e0ed-a734d2d9eb79@oracle.com>
Message-ID: <6b7c30ce-bd9c-a149-c4ff-cf7a7a3f4ef5@oracle.com>

Update: http://cr.openjdk.java.net/~kvn/8195632/webrev.01/

Added EagerJVMCI to the 3 failing tests. The flag check code in jvmci_globals.cpp was modified to allow specifying this flag when JVMCI is disabled.

Ran the 3 tests with all combinations of JVMCI flags.

Thanks,
Vladimir

On 2/28/18 9:32 AM, Vladimir Kozlov wrote:
> On 2/28/18 5:30 AM, Tobias Hartmann wrote:
>> Hi Vladimir,
>>
>> looks good to me. Why not already add that flag to the failing tests?
>
> Right, we should. I asked Katya to provide a list. As I understand it, the 3 listed tests are not all of the failing tests.
>
> I will re-post the RFR when I update the tests.
>
> Thanks,
> Vladimir
>
>>
>> Best regards,
>> Tobias
>>
>> On 28.02.2018 02:56, Vladimir Kozlov wrote:
>>> http://cr.openjdk.java.net/~kvn/8195632/webrev.00/
>>> https://bugs.openjdk.java.net/browse/JDK-8195632
>>>
>>> Added the flag and tested it with the tests listed in the bug.
>>>
>>> Thanks,
>>> Vladimir

From igor.veresov at oracle.com Thu Mar 1 07:05:21 2018
From: igor.veresov at oracle.com (Igor Veresov)
Date: Wed, 28 Feb 2018 23:05:21 -0800
Subject: RFR(S): 8198756: Limit number of compiler threads for small code cache
In-Reply-To:
References: <8c77470008d343f19d032de8f48f1bbb@sap.com>
Message-ID:

I'm curious about the rationale for tying the number of threads to the size of the code cache. Is it because you don't want them to keep holding the space for their code buffers when they are idle?
igor

> On Feb 27, 2018, at 10:19 AM, Vladimir Kozlov wrote:
>
> Hi Doerr,
>
> The problem with your proposal is that we would not scale the number of compiler threads when we have a lot of CPUs (>1000 on big "slow" machines).
> By default, tiered compilation uses a 240MB CodeCache. With your formula we will always have 7 threads (2 C1 and 5 C2), which could be fine if the machine has fewer than 32 procs/threads in total. But on big machines it may be a bottleneck for JIT-compilation-intensive applications (and for startup, when most JIT compilations happen).
>
> The main motivation of the current approach was to reach peak performance (C2 compilations) as fast as possible. What we usually observed before was a large C2 compilation queue because of the slow throughput of C2. It was especially visible with tiered compilation, where compilation thresholds are reached faster by the profiling code compiled by the first tier.
>
> And I agree that we may have a problem with the number of compiler threads at the beginning of the graph (< 32 CPU threads), where the number grows too fast:
>
> [Graph of 3*log2(x)*log2(log2(x))/2; marked point: x = 32.0711217, y = 17.4325495]
>
> Maybe we should have a formula which takes into account both the code cache size and the number of CPU threads.
>
> Igor Veresov was the original developer of the current formula. It would be nice to hear his opinion.
>
> Thanks,
> Vladimir
>
> On 2/27/18 8:10 AM, Doerr, Martin wrote:
>> Hi,
>>
>> the VM currently starts a large number of compiler threads on systems with many CPUs, regardless of the code cache size.
>> This doesn't make sense for very small code cache sizes.
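To make the numbers in this exchange concrete, here is a small C++ sketch of the thread-count arithmetic: the log * loglog curve from the graph, the proposed cap of one compiler thread per 32MB of code cache, and a one-third C1 / rest C2 split that reproduces the "7 threads (2 C1 and 5 C2)" figure for a 240MB code cache. It is not the HotSpot source. The integer log2, the floor of 2 threads under the cap, and the split rule are assumptions of this sketch, and all function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Integer floor(log2(x)) for x >= 1. Assumption: the VM's ergonomics use an
// integer log2, so the counts are a step-wise version of the plotted curve.
int ilog2(int64_t x) {
  assert(x >= 1);
  int r = 0;
  while (x > 1) {
    x >>= 1;
    ++r;
  }
  return r;
}

// Default heuristic as plotted: 3 * log2(n) * log2(log2(n)) / 2, at least 2.
int default_compiler_count(int cpus) {
  int log_cpu = ilog2(cpus);
  int loglog_cpu = ilog2(std::max(log_cpu, 1));
  return std::max(log_cpu * loglog_cpu * 3 / 2, 2);
}

// Proposed cap: no more than one compiler thread per 32MB of reserved code
// cache. Keeping at least 2 threads (one per tier) is an assumption here.
int capped_compiler_count(int cpus, int64_t code_cache_mb) {
  int64_t cap = std::max<int64_t>(code_cache_mb / 32, 2);
  return static_cast<int>(std::min<int64_t>(default_compiler_count(cpus), cap));
}

// Hypothetical split between tiers: one third C1 (at least 1), the rest C2.
struct TierSplit {
  int c1;
  int c2;
};

TierSplit split_tiers(int count) {
  int c1 = std::max(count / 3, 1);
  return TierSplit{c1, std::max(count - c1, 1)};
}
```

With 1024 CPUs the uncapped curve gives 45 threads, while a 240MB code cache caps that at 7, split 2 + 5; on an 8-CPU machine the curve (4 threads) stays below the cap, so the cap only bites on larger machines or smaller code caches.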
>>
>> The dynamically determined number of compiler threads can be observed with:
>> jdk/bin/java -XX:ReservedCodeCacheSize=128m -XX:+PrintFlagsFinal -version|grep CICompiler
>>
>> I suggest using no more than 1 compiler thread per 32MB of code cache:
>> http://cr.openjdk.java.net/~mdoerr/8198756_CompilerCount/webrev.00/
>>
>> This seems to be conservative.
>> Please review and let me know if you have a different limitation proposal.
>>
>> Best regards,
>> Martin
>>
>
>

From rahul.v.raghavan at oracle.com Thu Mar 1 07:29:27 2018
From: rahul.v.raghavan at oracle.com (Rahul Raghavan)
Date: Thu, 1 Mar 2018 12:59:27 +0530
Subject: [11] RFR: 8198252: Null pointer dereference in fold_compares_helper
In-Reply-To:
References: <2e21edcd-a8cc-660f-27a6-38ccc7759788@oracle.com>
Message-ID:

Thanks Tobias, Vladimir.

-Rahul

On Tuesday 27 February 2018 10:19 PM, Vladimir Kozlov wrote:
> +1
>
> Vladimir
>
> On 2/27/18 2:55 AM, Tobias Hartmann wrote:
>> Hi Rahul,
>>
>> looks good to me!
>>
>> Best regards,
>> Tobias
>>
>> On 27.02.2018 11:37, Rahul Raghavan wrote:
>>> Hi,
>>>
>>> Please review the following fix proposal.
>>>
>>> - http://cr.openjdk.java.net/~rraghavan/8198252/webrev.01/
>>>
>>> - https://bugs.openjdk.java.net/browse/JDK-8198252 -
>>>   'Null pointer dereference in IfNode::fold_compares_helper'
>>>
>>> -- The reported issue: filtered_int_type() may return NULL, and in IfNode::fold_compares_helper() the results of the filtered_int_type() calls (lo_type, hi_type) are dereferenced without null checks.
>>>
>>> -- The proposed fix above adds NULL checks to the required if-condition checks.
>>>
>>> -- Confirmed that at the other call sites of filtered_int_type(), the possible NULL result is handled.
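The shape of the fix described above (test both filtered_int_type() results for NULL before any condition dereferences them) can be sketched in plain C++. IntRange, filtered_range() and can_fold() are hypothetical stand-ins rather than C2's TypeInt or IfNode code, and the folding condition itself is invented purely for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for an integer type range [lo, hi] (not C2's TypeInt).
struct IntRange {
  int32_t lo;
  int32_t hi;
};

// Hypothetical stand-in for filtered_int_type(): returns nullptr when no
// useful range can be derived (here: missing input or an empty range).
const IntRange* filtered_range(const IntRange* r) {
  if (r == nullptr || r->lo > r->hi) {
    return nullptr;
  }
  return r;
}

// The guarded pattern: both results are checked for null before being
// dereferenced; if either is missing, the transformation is simply skipped.
bool can_fold(const IntRange* a, const IntRange* b) {
  const IntRange* lo_type = filtered_range(a);
  const IntRange* hi_type = filtered_range(b);
  return lo_type != nullptr && hi_type != nullptr &&
         lo_type->hi < hi_type->lo;  // invented folding condition
}
```

Because `&&` short-circuits, the dereferences in the last operand are only reached once both null checks have passed.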
>>>
>>>
>>> Thanks,
>>> Rahul

From tobias.hartmann at oracle.com Thu Mar 1 07:35:37 2018
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Thu, 1 Mar 2018 08:35:37 +0100
Subject: [11] RFR(S) 8195632: [Graal] Introduce EagerJVMCI flag to force eager JVMCI initialization
In-Reply-To: <6b7c30ce-bd9c-a149-c4ff-cf7a7a3f4ef5@oracle.com>
References: <54e13cbb-eb0f-8eed-1035-12153b8d48b9@oracle.com> <8475baeb-1392-5771-b300-eda1ec8d022d@oracle.com> <068ec530-2cd3-8c1a-e0ed-a734d2d9eb79@oracle.com> <6b7c30ce-bd9c-a149-c4ff-cf7a7a3f4ef5@oracle.com>
Message-ID: <9993557c-4d16-f938-6b30-a41952f31395@oracle.com>

Hi Vladimir,

looks good, thanks for updating!

Best regards,
Tobias

On 01.03.2018 03:12, Vladimir Kozlov wrote:
> Update: http://cr.openjdk.java.net/~kvn/8195632/webrev.01/
>
> Added EagerJVMCI to the 3 failing tests. The flag check code in jvmci_globals.cpp was modified to allow specifying this flag when JVMCI is disabled.
>
> Ran the 3 tests with all combinations of JVMCI flags.
>
> Thanks,
> Vladimir
>
> On 2/28/18 9:32 AM, Vladimir Kozlov wrote:
>> On 2/28/18 5:30 AM, Tobias Hartmann wrote:
>>> Hi Vladimir,
>>>
>>> looks good to me. Why not already add that flag to the failing tests?
>>
>> Right, we should. I asked Katya to provide a list. As I understand it, the 3 listed tests are not all of the failing tests.
>>
>> I will re-post the RFR when I update the tests.
>>
>> Thanks,
>> Vladimir
>>
>>>
>>> Best regards,
>>> Tobias
>>>
>>> On 28.02.2018 02:56, Vladimir Kozlov wrote:
>>>> http://cr.openjdk.java.net/~kvn/8195632/webrev.00/
>>>> https://bugs.openjdk.java.net/browse/JDK-8195632
>>>>
>>>> Added the flag and tested it with the tests listed in the bug.
>>>>
>>>> Thanks,
>>>> Vladimir

From martin.doerr at sap.com Thu Mar 1 08:31:52 2018
From: martin.doerr at sap.com (Doerr, Martin)
Date: Thu, 1 Mar 2018 08:31:52 +0000
Subject: RFR(S): 8198756: Limit number of compiler threads for small code cache
In-Reply-To:
References: <8c77470008d343f19d032de8f48f1bbb@sap.com>
Message-ID:

Hi Igor,

we observed that the compiler threads fill up the code cache faster than the sweeper can clean it when using a small code cache.
This doesn't seem beneficial at all.

Some customers try to save memory by using a very small code cache. It's very annoying that so much memory gets wasted on such a large number of idle compiler threads, which hold their arenas etc.

Maybe the current formula was optimized for a special scenario with many slow cores? Maybe SPARC Niagara?
Shouldn't such scenarios use a large code cache? Maybe much more than 240MB?

Best regards,
Martin

From: Igor Veresov [mailto:igor.veresov at oracle.com]
Sent: Thursday, March 1, 2018 08:05
To: Vladimir Kozlov
Cc: Doerr, Martin; hotspot-compiler-dev at openjdk.java.net
Subject: Re: RFR(S): 8198756: Limit number of compiler threads for small code cache

I'm curious about the rationale for tying the number of threads to the size of the code cache. Is it because you don't want them to keep holding the space for their code buffers when they are idle?

igor

On Feb 27, 2018, at 10:19 AM, Vladimir Kozlov wrote:

Hi Doerr,

The problem with your proposal is that we would not scale the number of compiler threads when we have a lot of CPUs (>1000 on big "slow" machines).
By default, tiered compilation uses a 240MB CodeCache. With your formula we will always have 7 threads (2 C1 and 5 C2), which could be fine if the machine has fewer than 32 procs/threads in total. But on big machines it may be a bottleneck for JIT-compilation-intensive applications (and for startup, when most JIT compilations happen).
The main motivation of the current approach was to reach peak performance (C2 compilations) as fast as possible. What we usually observed before was a large C2 compilation queue because of the slow throughput of C2. It was especially visible with tiered compilation, where compilation thresholds are reached faster by the profiling code compiled by the first tier.

And I agree that we may have a problem with the number of compiler threads at the beginning of the graph (< 32 CPU threads), where the number grows too fast:

[Graph of 3*log2(x)*log2(log2(x))/2; marked point: x = 32.0711217, y = 17.4325495]

Maybe we should have a formula which takes into account both the code cache size and the number of CPU threads.

Igor Veresov was the original developer of the current formula. It would be nice to hear his opinion.

Thanks,
Vladimir

On 2/27/18 8:10 AM, Doerr, Martin wrote:
Hi,

the VM currently starts a large number of compiler threads on systems with many CPUs, regardless of the code cache size.
This doesn't make sense for very small code cache sizes.

The dynamically determined number of compiler threads can be observed with:
jdk/bin/java -XX:ReservedCodeCacheSize=128m -XX:+PrintFlagsFinal -version|grep CICompiler

I suggest using no more than 1 compiler thread per 32MB of code cache:
http://cr.openjdk.java.net/~mdoerr/8198756_CompilerCount/webrev.00/

This seems to be conservative.
Please review and let me know if you have a different limitation proposal.

Best regards,
Martin

From martin.doerr at sap.com Thu Mar 1 15:49:49 2018
From: martin.doerr at sap.com (Doerr, Martin)
Date: Thu, 1 Mar 2018 15:49:49 +0000
Subject: RFR(L): 8198894: [PPC64] More generic vector CRC implementation
Message-ID:

Hi,

I have implemented a more generic version of the vector-instruction-based CRC code. It supports CRC32C and Big Endian, too.
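When validating a vectorized CRC kernel like this one, a slow bitwise reference makes a handy oracle. The C++ sketch below is not the PPC64 code from the webrev; it computes the standard reflected CRC-32 (reversed polynomial 0xEDB88320, the variant java.util.zip.CRC32 produces) and CRC-32C (Castagnoli, reversed polynomial 0x82F63B78, the variant java.util.zip.CRC32C produces). Both are checked against the well-known check values for the ASCII string "123456789". The function names are illustrative only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Bitwise reflected CRC: initial value and final XOR are 0xFFFFFFFF, and
// `poly` is the bit-reversed generator polynomial. One bit per inner step.
uint32_t crc_reflected(const uint8_t* data, size_t len, uint32_t poly) {
  assert(data != nullptr || len == 0);
  uint32_t crc = 0xFFFFFFFFu;
  for (size_t i = 0; i < len; ++i) {
    crc ^= data[i];
    for (int bit = 0; bit < 8; ++bit) {
      crc = (crc & 1u) ? (crc >> 1) ^ poly : (crc >> 1);
    }
  }
  return crc ^ 0xFFFFFFFFu;
}

// CRC-32 (IEEE 802.3 / zlib polynomial).
uint32_t crc32_ref(const uint8_t* data, size_t len) {
  return crc_reflected(data, len, 0xEDB88320u);
}

// CRC-32C (Castagnoli polynomial).
uint32_t crc32c_ref(const uint8_t* data, size_t len) {
  return crc_reflected(data, len, 0x82F63B78u);
}
```

Feeding the same random buffers of many lengths, including the short ones where the new version may be slower, to both this reference and the vector code and comparing the results gives a cheap differential test.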
The peak performance was even better for large input streams. I got almost 40GB/s. Some smaller lengths may be slower than with the old version. Maybe somebody from IBM would like to double-check the performance.

Please review:
http://cr.openjdk.java.net/~mdoerr/8198894_PPC64_CRC32/webrev.00/
(hoping you like math :))

Best regards,
Martin

From vladimir.kozlov at oracle.com Thu Mar 1 17:00:44 2018
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 1 Mar 2018 09:00:44 -0800
Subject: [11] RFR(S) 8195632: [Graal] Introduce EagerJVMCI flag to force eager JVMCI initialization
In-Reply-To: <9993557c-4d16-f938-6b30-a41952f31395@oracle.com>
References: <54e13cbb-eb0f-8eed-1035-12153b8d48b9@oracle.com> <8475baeb-1392-5771-b300-eda1ec8d022d@oracle.com> <068ec530-2cd3-8c1a-e0ed-a734d2d9eb79@oracle.com> <6b7c30ce-bd9c-a149-c4ff-cf7a7a3f4ef5@oracle.com> <9993557c-4d16-f938-6b30-a41952f31395@oracle.com>
Message-ID: <29940f35-93a3-8c41-f548-c195bc32fa25@oracle.com>

Thank you, Tobias

Vladimir

On 2/28/18 11:35 PM, Tobias Hartmann wrote:
> Hi Vladimir,
>
> looks good, thanks for updating!
>
> Best regards,
> Tobias
>
> On 01.03.2018 03:12, Vladimir Kozlov wrote:
>> Update: http://cr.openjdk.java.net/~kvn/8195632/webrev.01/
>>
>> Added EagerJVMCI to the 3 failing tests. The flag check code in jvmci_globals.cpp was modified to allow specifying this flag when JVMCI is disabled.
>>
>> Ran the 3 tests with all combinations of JVMCI flags.
>>
>> Thanks,
>> Vladimir
>>
>> On 2/28/18 9:32 AM, Vladimir Kozlov wrote:
>>> On 2/28/18 5:30 AM, Tobias Hartmann wrote:
>>>> Hi Vladimir,
>>>>
>>>> looks good to me. Why not already add that flag to the failing tests?
>>>
>>> Right, we should. I asked Katya to provide a list. As I understand it, the 3 listed tests are not all of the failing tests.
>>>
>>> I will re-post the RFR when I update the tests.
>>>
>>> Thanks,
>>> Vladimir
>>>
>>>>
>>>> Best regards,
>>>> Tobias
>>>>
>>>> On 28.02.2018 02:56, Vladimir Kozlov wrote:
>>>>> http://cr.openjdk.java.net/~kvn/8195632/webrev.00/
>>>>> https://bugs.openjdk.java.net/browse/JDK-8195632
>>>>>
>>>>> Added the flag and tested it with the tests listed in the bug.
>>>>>
>>>>> Thanks,
>>>>> Vladimir

From lutz.schmidt at sap.com Thu Mar 1 17:17:55 2018
From: lutz.schmidt at sap.com (Schmidt, Lutz)
Date: Thu, 1 Mar 2018 17:17:55 +0000
Subject: RFR(L): 8198691: CodeHeap State Analytics
Message-ID: <250CD3F0-F4AB-4EA1-9EA9-8136442FDFF9@sap.com>

Dear all,

may I please request reviews for this quite voluminous enhancement:

Bug: https://bugs.openjdk.java.net/browse/JDK-8198691

Webrev: http://cr.openjdk.java.net/~lucy/webrevs/8198691.00/index.html

Don't be afraid! Most of the logic is new and isolated in a big, separate block in heap.cpp. The changes to other files are not difficult to understand.

If you need information about what this enhancement does and how it can be used, please refer to the bug description. There I have attached some documentation which will greatly help with understanding the code.

Thank you!

Lutz

From erik.joelsson at oracle.com Thu Mar 1 19:54:20 2018
From: erik.joelsson at oracle.com (Erik Joelsson)
Date: Thu, 1 Mar 2018 11:54:20 -0800
Subject: RFR(S) 8197453 : Add support of extra problem list
In-Reply-To: <565cbc8c-64e7-8fd8-65a8-84564b2ce49b@oracle.com>
References: <565cbc8c-64e7-8fd8-65a8-84564b2ce49b@oracle.com>
Message-ID: <1b2b62ce-6c7d-9935-28c4-7a86371c38b8@oracle.com>

Makefile change looks good.

/Erik

On 2018-02-27 09:45, Ekaterina Pavlova wrote:
> Jon,
>
> thanks for the review.
> I have updated the webrev.
>
> thanks,
> -katya
>
>
> On 2/26/18 12:02 PM, Jonathan Gibbons wrote:
>> If these new problem-list files are destined for use by jtreg, I would encourage adding a platform specifier on each line, after the bug number. If you want to mark the test as excluded on all platforms, the convention is to use "generic-all".
>>
>> -- Jon
>>
>>
>> On 2/26/18 11:47 AM, Igor Ignatyev wrote:
>>> adding build-dev alias
>>>
>>> -- Igor
>>>
>>>> On Feb 8, 2018, at 3:08 PM, Ekaterina Pavlova wrote:
>>>>
>>>> Hi all,
>>>>
>>>> ProblemList.txt files used by makefiles for jtreg testing allow specifying a list of tests to be excluded from execution on all or specific platforms. However, to test features like Graal we want to be able to specify a list of tests which fail in a particular JVM mode only.
>>>> Please review this change, which adds support for an extra problem list and introduces 2 Graal-specific problem list files:
>>>> - test/hotspot/jtreg/ProblemList-graal.txt
>>>> - test/jdk/ProblemList-graal.txt
>>>>
>>>>
>>>>     JBS: https://bugs.openjdk.java.net/browse/JDK-8197453
>>>>  webrev: http://cr.openjdk.java.net/~epavlova//8197453/webrev.00/
>>>> testing: precheckin, tier1 and tier2 with empty EXTRA_PROBLEM_LISTS.
>>>>          testing in Graal mode with EXTRA_PROBLEM_LISTS=ProblemList-graal.txt
>>>>
>>>> thanks,
>>>> -katya
>>>>
>>>> p.s.
>>>> Igor Ignatyev volunteered to sponsor this change.
>>
>

From vladimir.kozlov at oracle.com Thu Mar 1 20:26:12 2018
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 1 Mar 2018 12:26:12 -0800
Subject: RFR(L): 8198691: CodeHeap State Analytics
In-Reply-To: <250CD3F0-F4AB-4EA1-9EA9-8136442FDFF9@sap.com>
References: <250CD3F0-F4AB-4EA1-9EA9-8136442FDFF9@sap.com>
Message-ID: <01763ff8-6eb6-88f6-0c7a-a383d9857c0c@oracle.com>

Hi, Lutz

I would say nice work! And nice document.
Can you add the "general description" (maybe a shorter version) as a comment to the heap.cpp code, to avoid having to look for the document when you want to know how to use it?

Maybe move the new code from the heap.* files to new files and a new class (pass CodeHeap as a parameter) to isolate this code. I would also expect them to be in the code/ directory and not in memory/.

Did you try it with AOT libraries? You used FOR_ALL_HEAPS(), which includes AOT heaps. Maybe you should just use FOR_ALL_ALLOCABLE_HEAPS().

We usually do not add RFE/bug IDs to comments.

I also did not understand your placement of the PrintCodeHeapState code in the "nonproduct" case of print_statistics(). I would expect it near the PrintCodeCache code, as in the "product" case.

Thanks,
Vladimir

On 3/1/18 9:17 AM, Schmidt, Lutz wrote:
> Dear all,
>
> may I please request reviews for this quite voluminous enhancement:
>
> Bug: https://bugs.openjdk.java.net/browse/JDK-8198691
>
> Webrev: http://cr.openjdk.java.net/~lucy/webrevs/8198691.00/index.html
>
> Don't be afraid! Most of the logic is new and isolated in a big, separate block in heap.cpp. The changes to other files are not difficult to understand.
>
> If you need information about what this enhancement does and how it can be used, please refer to the bug description. There I have attached some documentation which will greatly help with understanding the code.
>
> Thank you!
>
> Lutz
>

From lutz.schmidt at sap.com Thu Mar 1 20:50:35 2018
From: lutz.schmidt at sap.com (Schmidt, Lutz)
Date: Thu, 1 Mar 2018 20:50:35 +0000
Subject: RFR(L): 8198691: CodeHeap State Analytics
In-Reply-To: <01763ff8-6eb6-88f6-0c7a-a383d9857c0c@oracle.com>
References: <250CD3F0-F4AB-4EA1-9EA9-8136442FDFF9@sap.com> <01763ff8-6eb6-88f6-0c7a-a383d9857c0c@oracle.com>
Message-ID: <2CE49F4C-E625-4566-B22F-DE6465A52A25@sap.com>

Thank you, Vladimir, for your kind words!
I can certainly add a condensed version of the "General Description" chapter as an introduction in front of the code (heap.cpp or wherever it may finally live). When preparing the change, I was also thinking about significantly extending the help text that "jcmd help Compiler.CodeHeap_Analytics" would spit out. I didn't dare to do so because no other jcmd command has an extended help text. Should I dare?

I will try to move the analytics code to a new cpp/hpp pair and invent a new class. I'm not sure how much interfacing hassle this will cause, but I'll try...

No, I did not try with AOT libs. My AOT experience is zero. For the time being, and with all the other requests I have to work on, I will resort to FOR_ALL_ALLOCABLE_HEAPS().

I added the RFE/bug ID comments for the sole purpose of making the reviewers' lives easier. Removing them in the next webrev iteration should be doable.

I will rearrange the PrintCodeHeapState code so that it is consistent in the "product" and "nonproduct" cases.

That will all take a while. A new webrev will most probably not be ready before the weekend. So stay tuned...

Regards,
Lutz

On 01.03.18, 21:26, "hotspot-compiler-dev on behalf of Vladimir Kozlov" wrote:

Hi, Lutz

I would say nice work! And nice document.

Can you add the "general description" (maybe a shorter version) as a comment to the heap.cpp code, to avoid having to look for the document when you want to know how to use it?

Maybe move the new code from the heap.* files to new files and a new class (pass CodeHeap as a parameter) to isolate this code. I would also expect them to be in the code/ directory and not in memory/.

Did you try it with AOT libraries? You used FOR_ALL_HEAPS(), which includes AOT heaps. Maybe you should just use FOR_ALL_ALLOCABLE_HEAPS().

We usually do not add RFE/bug IDs to comments.

I also did not understand your placement of the PrintCodeHeapState code in the "nonproduct" case of print_statistics(). I would expect it near the PrintCodeCache code, as in the "product" case.
Thanks,
Vladimir

On 3/1/18 9:17 AM, Schmidt, Lutz wrote:
> Dear all,
>
> may I please request reviews for this quite voluminous enhancement:
>
> Bug: https://bugs.openjdk.java.net/browse/JDK-8198691
>
> Webrev: http://cr.openjdk.java.net/~lucy/webrevs/8198691.00/index.html
>
> Don't be afraid! Most of the logic is new and isolated in a big, separate block in heap.cpp. The changes to other files are not difficult to understand.
>
> If you need information about what this enhancement does and how it can be used, please refer to the bug description. There I have attached some documentation which will greatly help with understanding the code.
>
> Thank you!
>
> Lutz
>

From vladimir.kozlov at oracle.com Thu Mar 1 22:58:01 2018
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 1 Mar 2018 14:58:01 -0800
Subject: RFR(M): 8194490: [JVMCI] Move `iterateFrames` to C++
In-Reply-To:
References: <6b9d58d4-9015-dfd7-a3bf-485c40516d24@oracle.com>
Message-ID: <0aa88b2a-8922-044b-8132-4186fbdb7fbb@oracle.com>

Hi Gilles,

What happened to this fix? All testing passed, but it was not pushed.

Thanks,
Vladimir

On 2/1/18 5:21 PM, Vladimir Kozlov wrote:
> Thank you, Gilles
>
> Seems fine to me. Who reviewed it in Labs?
>
> And thank you for running the testing.
>
> Vladimir
>
> On 1/22/18 6:58 AM, Gilles Duboscq wrote:
>> Hi,
>>
>> Please review the following fix for `HotSpotStackIntrospection.iterateFrames`.
>> It moves the iteration code from Java to C++: this helps with an issue that would arise if the nmethod containing `iterateFrames` hits an uncommon trap during iteration. It would change the layout of the top frames, which would confuse the stack-walking logic. Having this loop in C++ ensures there can be no uncommon trap.
>>
>> Webrev: http://cr.openjdk.java.net/~gdub/webrev-8194490/
>> Issue: https://bugs.openjdk.java.net/browse/JDK-8194490
>> Testing: hs-tier1,hs-tier2,hs-precheckin-comp
>>
>> It's a bit unfortunate that we have tests for implementation details of JVMCI (i.e., tests for CompilerToVM) instead of tests for the actual API.
>>
>> Thanks,
>>   Gilles
>>

From igor.veresov at oracle.com Fri Mar 2 00:45:31 2018
From: igor.veresov at oracle.com (Igor Veresov)
Date: Thu, 1 Mar 2018 16:45:31 -0800
Subject: RFR(S): 8198756: Limit number of compiler threads for small code cache
In-Reply-To:
References: <8c77470008d343f19d032de8f48f1bbb@sap.com>
Message-ID:

Doerr,

I think the optimal number of compiler threads is the one that keeps the compiler queues as short as possible. During startup, the optimal number of compiler threads is typically equal to the number of CPUs, maybe even more, considering that threads are either C1 or C2 and that compiles typically happen in waves, using one and then the other. The fact that some users see the code cache filling more slowly with fewer threads is just an indication of how huge their compile queues are, and this is certainly not good for startup.

The problem of resource holding is real, since after startup we don't need that many threads (unless you're running something that does dynamic code generation).

Perhaps the solution to all of this is a dynamic pool of compiler threads that could expand/shrink depending on the load (the length of the compile queues).

igor

> On Mar 1, 2018, at 12:31 AM, Doerr, Martin wrote:
>
> Hi Igor,
>
> we observed that the compiler threads fill up the code cache faster than the sweeper can clean it when using a small code cache.
> This doesn't seem beneficial at all.
>
> Some customers try to save memory by using a very small code cache. It's very annoying that so much memory gets wasted on such a large number of idle compiler threads, which hold their arenas etc.
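Igor's idea of a pool that expands and shrinks with the compile-queue length reduces, at its core, to a sizing function evaluated periodically. The sketch below is purely hypothetical: the function name, the tasks-per-thread ratio, and the bounds are illustrative assumptions, not HotSpot code or flags. A caller would start a thread when the result exceeds the number of live threads and let idle threads exit when it drops; max_threads could come from the CPU count, the code cache size, or both, as discussed in this thread.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical sizing rule for a dynamic compiler-thread pool: aim for one
// active thread per `tasks_per_thread` queued compile tasks, clamped to
// [min_threads, max_threads].
int desired_compiler_threads(int queue_length, int tasks_per_thread,
                             int min_threads, int max_threads) {
  assert(queue_length >= 0 && tasks_per_thread > 0);
  assert(min_threads >= 1 && min_threads <= max_threads);
  // Ceiling division: any non-empty queue asks for at least one thread.
  int wanted = (queue_length + tasks_per_thread - 1) / tasks_per_thread;
  return std::min(std::max(wanted, min_threads), max_threads);
}
```

During a startup burst the queue is long and the result hits max_threads; once the queue drains, the result falls back to min_threads, which is where memory held by idle threads (arenas, code buffers) would be released.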
>
> Maybe the current formula was optimized for a special scenario with many slow cores? Maybe SPARC Niagara?
> Shouldn't such scenarios use a large code cache? Maybe much more than 240MB?
>
> Best regards,
> Martin
>
>
> From: Igor Veresov [mailto:igor.veresov at oracle.com]
> Sent: Thursday, March 1, 2018 08:05
> To: Vladimir Kozlov
> Cc: Doerr, Martin; hotspot-compiler-dev at openjdk.java.net
> Subject: Re: RFR(S): 8198756: Limit number of compiler threads for small code cache
>
> I'm curious about the rationale for tying the number of threads to the size of the code cache. Is it because you don't want them to keep holding the space for their code buffers when they are idle?
>
> igor
>
>
> On Feb 27, 2018, at 10:19 AM, Vladimir Kozlov wrote:
>
> Hi Doerr,
>
> The problem with your proposal is that we would not scale the number of compiler threads when we have a lot of CPUs (>1000 on big "slow" machines).
> By default, tiered compilation uses a 240MB CodeCache. With your formula we will always have 7 threads (2 C1 and 5 C2), which could be fine if the machine has fewer than 32 procs/threads in total. But on big machines it may be a bottleneck for JIT-compilation-intensive applications (and for startup, when most JIT compilations happen).
>
> The main motivation of the current approach was to reach peak performance (C2 compilations) as fast as possible. What we usually observed before was a large C2 compilation queue because of the slow throughput of C2. It was especially visible with tiered compilation, where compilation thresholds are reached faster by the profiling code compiled by the first tier.
>
> And I agree that we may have a problem with the number of compiler threads at the beginning of the graph (< 32 CPU threads), where the number grows too fast:
>
> [Graph of 3*log2(x)*log2(log2(x))/2; marked point: x = 32.0711217, y = 17.4325495]
>
>
> Maybe we should have a formula which takes into account both the code cache size and the number of CPU threads.
>
> Igor Veresov was the original developer of the current formula. It would be nice to hear his opinion.
>
> Thanks,
> Vladimir
>
> On 2/27/18 8:10 AM, Doerr, Martin wrote:
> Hi,
>
> the VM currently starts a large number of compiler threads on systems with many CPUs, regardless of the code cache size.
> This doesn't make sense for very small code cache sizes.
>
> The dynamically determined number of compiler threads can be observed with:
> jdk/bin/java -XX:ReservedCodeCacheSize=128m -XX:+PrintFlagsFinal -version|grep CICompiler
>
> I suggest using no more than 1 compiler thread per 32MB of code cache:
> http://cr.openjdk.java.net/~mdoerr/8198756_CompilerCount/webrev.00/
>
> This seems to be conservative.
> Please review and let me know if you have a different limitation proposal.
>
> Best regards,
> Martin

From Derek.White at cavium.com Fri Mar 2 03:24:30 2018
From: Derek.White at cavium.com (White, Derek)
Date: Fri, 2 Mar 2018 03:24:30 +0000
Subject: RFR(S): 8198756: Limit number of compiler threads for small code cache
In-Reply-To:
References: <8c77470008d343f19d032de8f48f1bbb@sap.com>
Message-ID:

Hi Igor, Martin,

Just to throw out some other user experience: I'm typically running on machines with 98 to 224 CPUs.

It's not the case that *every* Java app needs to use all the CPUs for compiler threads. The JVM may not be the only JVM running on the system (Hadoop, microservices, etc.), let alone the only important app on the system.
Historically, the GC threads have been the worst offenders in this regard. The GC threads' "scaling factor" is much higher than the compiler threads' scaling factor. But with options like UseDynamicNumberOfGCThreads, the GC tries to adjust the number of GC threads to the work to be done. I think it's important that the JVM figure out how to scale the number of compiler threads as well.

I won't claim that Martin's scheme is the best approach, or that it should be on by default, but unless a better solution is going into JDK 11, I'd support this scheme as an experimental flag.

FWIW.

- Derek White, Cavium (Purveyor of fine 224 cpu systems for the discerning developer).

From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Igor Veresov
Sent: Thursday, March 01, 2018 7:46 PM
To: Doerr, Martin
Cc: Vladimir Kozlov; hotspot-compiler-dev at openjdk.java.net
Subject: Re: RFR(S): 8198756: Limit number of compiler threads for small code cache

Doerr,

I think the optimal number of compiler threads is the one that keeps the compiler queues as short as possible. During startup, the optimal number of compiler threads is typically equal to the number of CPUs, maybe even more, considering that threads are either C1 or C2 and that compiles typically happen in waves, using one and then the other. The fact that some users see the code cache filling more slowly with fewer threads is just an indication of how huge their compile queues are, and this is certainly not good for startup.

The problem of resource holding is real, since after startup we don't need that many threads (unless you're running something that does dynamic code generation).
igor On Mar 1, 2018, at 12:31 AM, Doerr, Martin > wrote: Hi Igor, we observed that the compiler threads fill up the code cache faster than the sweeper can clean when using a small code cache. This doesn't seem beneficial at all. Some customers try to save memory by using a very small code cache. It's very annoying that so much memory gets wasted for such a large number of idle compiler threads which hold their arenas etc. Maybe the current formula was optimized for a special scenario with many slow cores? Maybe SPARC Niagara? Shouldn't such scenarios use a large code cache? Maybe much more than 240MB? Best regards, Martin From: Igor Veresov [mailto:igor.veresov at oracle.com] Sent: Donnerstag, 1. M?rz 2018 08:05 To: Vladimir Kozlov > Cc: Doerr, Martin >; hotspot-compiler-dev at openjdk.java.net Subject: Re: RFR(S): 8198756: Limit number of compiler threads for small code cache I?m curious about the rationale for tying the number of thread to the size of the code cache. Is it because you don?t want them to keep holding the space for their code buffers when they are idle? igor On Feb 27, 2018, at 10:19 AM, Vladimir Kozlov > wrote: Hi Doerr, The problem with your proposal is that we don't use scale number of compiler threads when we have a lot of cpus (>1000 on big "slow" machines). By default for tiered compilation we have 240Mb for CodeCache. With your formula we always will have 7 threads (2 C1 and 5 C2) which could be fine if machine has < total 32 procs/threads. But for big machines it may be bottleneck for JIT compilation intensive applications (and for startup when most JIT compilations happened). Main motivation of current approach was to reach peak performance (c2 compilations) as fast as possible. What we usually observed before is large compilation queue for C2 compilation because slow throughput of C2. It was especially visible with tiered compilation when compilation thresholds reached faster with first tier compiled profiling code. 
And I agree that we may have a problem with the number of compiler threads at the beginning of the graph (< 32 CPU threads), where the number grows too fast:

[Graph of 3*log2(x)*log2(log2(x))/2; marked point: x = 32.0711217, y = 17.4325495]

Maybe we should have a formula which takes into account both the code cache size and the number of CPU threads.

Igor Veresov was the original developer of the current formula. It would be nice to hear his opinion.

Thanks,
Vladimir

On 2/27/18 8:10 AM, Doerr, Martin wrote:
Hi,

the VM currently starts a large number of compiler threads on systems with many CPUs, regardless of the code cache size.
This doesn't make sense for very small code cache sizes.

The dynamically determined number of compiler threads can be observed with:
jdk/bin/java -XX:ReservedCodeCacheSize=128m -XX:+PrintFlagsFinal -version|grep CICompiler

I suggest using no more than 1 compiler thread per 32MB of code cache:
http://cr.openjdk.java.net/~mdoerr/8198756_CompilerCount/webrev.00/

This seems to be conservative.
Please review and let me know if you have a different limitation proposal.

Best regards,
Martin

From robbin.ehn at oracle.com Fri Mar 2 08:35:03 2018
From: robbin.ehn at oracle.com (Robbin Ehn)
Date: Fri, 2 Mar 2018 09:35:03 +0100
Subject: RFR(L): 8198691: CodeHeap State Analytics
In-Reply-To: <01763ff8-6eb6-88f6-0c7a-a383d9857c0c@oracle.com>
References: <250CD3F0-F4AB-4EA1-9EA9-8136442FDFF9@sap.com> <01763ff8-6eb6-88f6-0c7a-a383d9857c0c@oracle.com>
Message-ID: <36d1c9d8-9e57-d74e-42d5-55080752ec50@oracle.com>

Hi Lutz,

On 03/01/2018 09:26 PM, Vladimir Kozlov wrote:
> I also did not understand your placement of the PrintCodeHeapState code in the "nonproduct" case of print_statistics(). I would expect it near the PrintCodeCache code, as in the "product" case.
Since we are removing the -XX:*Print* flags and friends in favor of UL, this should be something like -Xlog:compiler+codeheap+dump=debug or similar, IMHO. /Robbin > > Thanks, > Vladimir > > On 3/1/18 9:17 AM, Schmidt, Lutz wrote: >> Dear all, >> >> may I please request reviews for this quite voluminous enhancement: >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8198691 >> >> Webrev: http://cr.openjdk.java.net/~lucy/webrevs/8198691.00/index.html >> >> Don't be afraid! Most of the logic is new and isolated in a big, separate >> block in heap.cpp. The changes to other files are not difficult to understand. >> >> If you need information about what this enhancement does and how it can be >> used, please refer to the bug description. There I have attached some >> documentation which will greatly help with understanding the code. >> >> Thank you! >> >> Lutz >> From martin.doerr at sap.com Fri Mar 2 10:28:58 2018 From: martin.doerr at sap.com (Doerr, Martin) Date: Fri, 2 Mar 2018 10:28:58 +0000 Subject: RFR(S): 8198756: Limit number of compiler threads for small code cache In-Reply-To: References: <8c77470008d343f19d032de8f48f1bbb@sap.com> Message-ID: <0597e4be52684e168ba30adc2ad84b7b@sap.com> Hi Derek, Igor and Vladimir, thanks for all the replies. I agree that it would be good to have something like UseDynamicNumberOfGCThreads. Btw. I have recently requested to activate that one by default (JDK-8198547). If we can't get it for jdk11, I'd at least like to make it easier for customers to save memory without explicitly setting CICompilerCount. Best regards, Martin From: White, Derek [mailto:Derek.White at cavium.com] Sent: Friday, 2 March 2018 04:25 To: Igor Veresov ; Doerr, Martin Cc: Vladimir Kozlov ; hotspot-compiler-dev at openjdk.java.net Subject: RE: RFR(S): 8198756: Limit number of compiler threads for small code cache Hi Igor, Martin, Just to throw out some other user experience: I'm typically running on machines with 98 to 224 CPUs.
It's not the case that *every* Java app needs to use all the CPUs for compiler threads. The JVM may not be the only JVM running on the system (Hadoop, microservices, etc.), let alone the only important app on the system. Historically the GC threads have been the worst offenders in this regard. The GC threads' 'scaling factor' is much higher than the compiler threads' scaling factor. But with options like UseDynamicNumberOfGCThreads, the GC tries to adjust the number of GC threads to the work to be done. I think it's important that the JVM figure out how to scale the number of compiler threads as well. I won't claim that Martin's scheme is the best approach, or that it should be on by default, but unless a better solution is going into JDK 11, I'd support this scheme as an experimental flag. FWIW. * Derek White, Cavium (Purveyor of fine 224-CPU systems for the discerning developer). From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Igor Veresov Sent: Thursday, March 01, 2018 7:46 PM To: Doerr, Martin Cc: Vladimir Kozlov; hotspot-compiler-dev at openjdk.java.net Subject: Re: RFR(S): 8198756: Limit number of compiler threads for small code cache Doerr, I think the optimal number of compiler threads is one that keeps the compiler queues minimal in length. During startup the optimal number of compiler threads is typically equal to the number of CPUs, maybe even more than that, considering that threads are either C1 or C2 and compiles typically happen in waves using one and then the other. The fact that some users see the code cache filling more slowly with fewer threads is just an indication of how huge their compile queues are, and this is certainly not good for startup. The problem of resource holding is real, since after startup we don't need that many threads (unless you're running something that does dynamic code generation).
Perhaps the solution to all of this is having a dynamic pool of compiler threads that could expand/shrink depending on the load (the length of the compile queues). igor On Mar 1, 2018, at 12:31 AM, Doerr, Martin wrote: Hi Igor, we observed that the compiler threads fill up the code cache faster than the sweeper can clean it when using a small code cache. This doesn't seem beneficial at all. Some customers try to save memory by using a very small code cache. It's very annoying that so much memory gets wasted on such a large number of idle compiler threads which hold their arenas etc. Maybe the current formula was optimized for a special scenario with many slow cores? Maybe SPARC Niagara? Shouldn't such scenarios use a large code cache? Maybe much more than 240MB? Best regards, Martin From: Igor Veresov [mailto:igor.veresov at oracle.com] Sent: Thursday, 1 March 2018 08:05 To: Vladimir Kozlov Cc: Doerr, Martin; hotspot-compiler-dev at openjdk.java.net Subject: Re: RFR(S): 8198756: Limit number of compiler threads for small code cache I'm curious about the rationale for tying the number of threads to the size of the code cache. Is it because you don't want them to keep holding the space for their code buffers when they are idle? igor On Feb 27, 2018, at 10:19 AM, Vladimir Kozlov wrote: Hi Doerr, The problem with your proposal is that we don't scale the number of compiler threads when we have a lot of CPUs (>1000 on big "slow" machines). By default for tiered compilation we have 240MB for CodeCache. With your formula we will always have 7 threads (2 C1 and 5 C2), which could be fine if the machine has fewer than 32 procs/threads in total. But for big machines it may be a bottleneck for JIT-compilation-intensive applications (and for startup, when most JIT compilations happen). The main motivation of the current approach was to reach peak performance (C2 compilations) as fast as possible.
What we usually observed before was a large C2 compilation queue because of the slow throughput of C2. It was especially visible with tiered compilation, where compilation thresholds are reached faster by first-tier compiled profiling code. And I agree that we may have a problem with the number of compiler threads at the beginning of the graph (< 32 CPU threads), where the number grows too fast: Graph for 3*log2(x)*log2(log2(x))/2 [plot not preserved; highlighted point x: 32.0711217, y: 17.4325495] Maybe we should have a formula which takes into account both the code cache size and the number of CPU threads. Igor Veresov was the original developer of the current formula. It would be nice to hear his opinion. Thanks, Vladimir On 2/27/18 8:10 AM, Doerr, Martin wrote: Hi, the VM currently starts a large number of compiler threads on systems with many CPUs regardless of the code cache size. This doesn't make sense for very small code cache sizes. The dynamically determined number of compiler threads can be observed by: jdk/bin/java -XX:ReservedCodeCacheSize=128m -XX:+PrintFlagsFinal -version|grep CICompiler I suggest not using more than 1 compiler thread per 32MB of code cache: http://cr.openjdk.java.net/~mdoerr/8198756_CompilerCount/webrev.00/ This seems to be conservative. Please review and let me know if you have a different limitation proposal. Best regards, Martin From HORIE at jp.ibm.com Fri Mar 2 10:55:02 2018 From: HORIE at jp.ibm.com (Michihiro Horie) Date: Fri, 2 Mar 2018 19:55:02 +0900 Subject: RFR(L): 8198894: [PPC64] More generic vector CRC implementation In-Reply-To: References: Message-ID: Hi Martin, I double-checked the performance with our micro benchmark. This change was 5 times faster. In addition, I did not observe degradation with smaller lengths; performance there was almost equal.
Best regards, -- Michihiro, IBM Research - Tokyo From: "Doerr, Martin" To: "'hotspot-compiler-dev at openjdk.java.net'" Cc: "Lindenmaier, Goetz", "Hiroshi H Horii (HORII at jp.ibm.com)", "Michihiro Horie (HORIE at jp.ibm.com)", Gustavo Romero Date: 2018/03/02 00:49 Subject: RFR(L): 8198894: [PPC64] More generic vector CRC implementation Hi, I have implemented a more generic version of the vector-instruction-based CRC code. It supports CRC32C and Big Endian, too. The peak performance was even better for large input streams. I got almost 40GB/s. Some smaller lengths may be slower than with the old version. Maybe somebody from IBM would like to double-check the performance. Please review: http://cr.openjdk.java.net/~mdoerr/8198894_PPC64_CRC32/webrev.00/ (hoping you like math :)) Best regards, Martin From martin.doerr at sap.com Fri Mar 2 11:27:03 2018 From: martin.doerr at sap.com (Doerr, Martin) Date: Fri, 2 Mar 2018 11:27:03 +0000 Subject: RFR(L): 8198894: [PPC64] More generic vector CRC implementation In-Reply-To: References: Message-ID: <82435025fcb644918a3bd24405d5eaac@sap.com> Hi Michihiro, thank you very much for measuring. This sounds good. Best regards, Martin From: Michihiro Horie [mailto:HORIE at jp.ibm.com] Sent: Friday, 2 March 2018 11:55 To: Doerr, Martin Cc: Lindenmaier, Goetz; Gustavo Romero; Hiroshi H Horii (HORII at jp.ibm.com); 'hotspot-compiler-dev at openjdk.java.net' Subject: Re: RFR(L): 8198894: [PPC64] More generic vector CRC implementation Hi Martin, I double-checked the performance with our micro benchmark. This change was 5 times faster. In addition, I did not observe degradation with smaller lengths; performance there was almost equal.
Best regards, -- Michihiro, IBM Research - Tokyo From: "Doerr, Martin" To: "'hotspot-compiler-dev at openjdk.java.net'" Cc: "Lindenmaier, Goetz", "Hiroshi H Horii (HORII at jp.ibm.com)", "Michihiro Horie (HORIE at jp.ibm.com)", Gustavo Romero Date: 2018/03/02 00:49 Subject: RFR(L): 8198894: [PPC64] More generic vector CRC implementation Hi, I have implemented a more generic version of the vector-instruction-based CRC code. It supports CRC32C and Big Endian, too. The peak performance was even better for large input streams. I got almost 40GB/s. Some smaller lengths may be slower than with the old version. Maybe somebody from IBM would like to double-check the performance. Please review: http://cr.openjdk.java.net/~mdoerr/8198894_PPC64_CRC32/webrev.00/ (hoping you like math :)) Best regards, Martin From gilles.m.duboscq at oracle.com Fri Mar 2 11:34:26 2018 From: gilles.m.duboscq at oracle.com (Gilles Duboscq) Date: Fri, 2 Mar 2018 12:34:26 +0100 Subject: RFR(M): 8194490: [JVMCI] Move `iterateFrames` to C++ In-Reply-To: <0aa88b2a-8922-044b-8132-4186fbdb7fbb@oracle.com> References: <6b9d58d4-9015-dfd7-a3bf-485c40516d24@oracle.com> <0aa88b2a-8922-044b-8132-4186fbdb7fbb@oracle.com> Message-ID: Hi Vladimir, Sorry about that, somehow I never saw your first message. Tom and Doug reviewed the JDK8 version internally. Besides the copyright year change in `src/hotspot/share/jvmci/jvmciCompilerToVM.cpp`, everything still applies cleanly.
Should I consider this reviewed? I guess I should re-run the tests since it's been a while. Gilles On 01/03/18 23:58, Vladimir Kozlov wrote: > Hi Gilles, > > What happened to this fix? > > Testing all passed but it was not pushed. > > Thanks, > Vladimir > > On 2/1/18 5:21 PM, Vladimir Kozlov wrote: >> Thank you, Gilles >> >> Seems fine to me. Who reviewed it in Labs? >> >> And thank you for running the testing. >> >> Vladimir >> >> On 1/22/18 6:58 AM, Gilles Duboscq wrote: >>> Hi, >>> >>> Please review the following fix for `HotSpotStackIntrospection.iterateFrames`. >>> It moves the iteration code from Java to C++: this helps with an issue that would arise if the nmethod containing the `iterateFrames` hits an uncommon trap during iteration. It would change the layout of the top frames, which would confuse the stack walking logic. Having this loop in C++ ensures there can be no uncommon trap. >>> >>> Webrev: http://cr.openjdk.java.net/~gdub/webrev-8194490/ >>> Issue: https://bugs.openjdk.java.net/browse/JDK-8194490 >>> Testing: hs-tier1,hs-tier2,hs-precheckin-comp >>> >>> It's a bit unfortunate that we have tests for implementation details of JVMCI (i.e., tests for CompilerToVM) instead of tests for the actual API. >>> >>> Thanks, >>> Gilles >>> From martin.doerr at sap.com Fri Mar 2 14:42:12 2018 From: martin.doerr at sap.com (Doerr, Martin) Date: Fri, 2 Mar 2018 14:42:12 +0000 Subject: RFR(L): 8198894: [PPC64] More generic vector CRC implementation In-Reply-To: References: Message-ID: <31a09572cc574165b0852abfbea32123@sap.com> Hi, I just noticed that kernel_crc32_1word_vpmsum can be simplified a little more. kernel_crc32_1word can be used for the tail, so we don't need to generate it separately for the small case. This is also better for my new implementation, which leaves up to 255 bytes remaining. I also noticed that the unroll factor of 4096 seems to be too large. Half of it results in rather better performance. I now get up to 42 GB/s.
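Handing the remainder to kernel_crc32_1word, as described above, relies on a general CRC property: a checksum can be computed over consecutive pieces of a buffer by carrying the intermediate value forward, so a wide kernel can process the bulk and a simpler one can finish the tail. The Java sketch below demonstrates that property at the API level using java.util.zip.CRC32 and CRC32C (the latter requires Java 9+). It does not model the PPC64 vpmsum algorithm, and the 256-byte split granularity is only an assumption chosen to mirror the "up to 255 bytes remaining" remark.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;
import java.util.zip.CRC32C;
import java.util.zip.Checksum;

/** API-level illustration only; not the PPC64 stub. */
final class CrcTailSketch {

    /** Checksums data[0, bulkLen) and then the remaining tail, reusing one Checksum object. */
    static long crcInTwoParts(Checksum checksum, byte[] data, int bulkLen) {
        checksum.reset();
        checksum.update(data, 0, bulkLen);                     // part a wide kernel might handle
        checksum.update(data, bulkLen, data.length - bulkLen); // tail continues from the intermediate value
        return checksum.getValue();
    }

    /** Checksums the whole array in a single call, for comparison. */
    static long crcInOnePart(Checksum checksum, byte[] data) {
        checksum.reset();
        checksum.update(data, 0, data.length);
        return checksum.getValue();
    }

    public static void main(String[] args) {
        // Standard check input; the published check values are
        // 0xCBF43926 (CRC-32) and 0xE3069283 (CRC-32C).
        byte[] check = "123456789".getBytes(StandardCharsets.US_ASCII);
        System.out.printf("CRC32  %08x%n", crcInOnePart(new CRC32(), check));
        System.out.printf("CRC32C %08x%n", crcInOnePart(new CRC32C(), check));

        // 1000 bytes: 768 "bulk" bytes (a multiple of 256) plus a 232-byte tail.
        byte[] data = new byte[1000];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) (i * 31);
        }
        int bulk = data.length - data.length % 256;
        System.out.println(crcInTwoParts(new CRC32C(), data, bulk) == crcInOnePart(new CRC32C(), data));
    }
}
```

Running main prints the standard check values cbf43926 and e3069283 for the input "123456789", followed by true for the split-versus-whole comparison.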
New webrev with these 2 minor changes: http://cr.openjdk.java.net/~mdoerr/8198894_PPC64_CRC32/webrev.01 Best regards, Martin From: Michihiro Horie [mailto:HORIE at jp.ibm.com] Sent: Friday, 2 March 2018 11:55 To: Doerr, Martin Cc: Lindenmaier, Goetz; Gustavo Romero; Hiroshi H Horii (HORII at jp.ibm.com); 'hotspot-compiler-dev at openjdk.java.net' Subject: Re: RFR(L): 8198894: [PPC64] More generic vector CRC implementation Hi Martin, I double-checked the performance with our micro benchmark. This change was 5 times faster. In addition, I did not observe degradation with smaller lengths; performance there was almost equal. Best regards, -- Michihiro, IBM Research - Tokyo From: "Doerr, Martin" To: "'hotspot-compiler-dev at openjdk.java.net'" Cc: "Lindenmaier, Goetz", "Hiroshi H Horii (HORII at jp.ibm.com)", "Michihiro Horie (HORIE at jp.ibm.com)", Gustavo Romero Date: 2018/03/02 00:49 Subject: RFR(L): 8198894: [PPC64] More generic vector CRC implementation Hi, I have implemented a more generic version of the vector-instruction-based CRC code. It supports CRC32C and Big Endian, too. The peak performance was even better for large input streams. I got almost 40GB/s. Some smaller lengths may be slower than with the old version. Maybe somebody from IBM would like to double-check the performance. Please review: http://cr.openjdk.java.net/~mdoerr/8198894_PPC64_CRC32/webrev.00/ (hoping you like math :)) Best regards, Martin
From vladimir.x.ivanov at oracle.com Fri Mar 2 14:47:45 2018 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Fri, 2 Mar 2018 17:47:45 +0300 Subject: RFC: C2: Anti-dependence on a load with a control in presence of a membar Message-ID: Hi, I'm seeing an unschedulable graph being produced during GCM when adding an anti-dependence to a load node with a control dependency. I found the root cause, but can't decide how to fix it. Here are the steps which lead to the broken graph: (1) The load causing problems (#391) is added as part of specializing ArrayCopy for small arrays (added as part of JDK-6912521 [1] in 9). Both control & memory are tied to AllocateArray. (IR [2]) (2) EA proves that AllocateArray (#363, destination) is scalar replaceable and during split_unique_types() updates the corresponding MemoryMerge (#379), which allows memory produced by ArrayCopy (#255, source) to be used directly, bypassing the allocation & membar (#348). (IR [3]) (3) After allocation elimination, the load's control dependency is switched to MemBarCPUOrder (#348), which was the immediate dominator of the eliminated allocation. (IR [4]) (4) After matching, the load has control on the membar, but not memory. (IR before [5] and after [6] matching.) (5) During GCM, an anti-dependence from the membar (#317) to the load is added, but it makes the graph unschedulable, which then triggers the assertion [7] during LCM. Relevant places in the code: [8] Everything looks fine, except the updates of MergeMems in step #2: * the load is pinned to the proper branch after deciding what direction to go; * wide membars do need anti-dependences on loads. So, as a fix I'd disable memory edge updates which bypass any membars. Does it sound reasonable or am I missing something important?
Best regards, Vladimir Ivanov [1] https://bugs.openjdk.java.net/browse/JDK-6912521 [2] http://cr.openjdk.java.net/~vlivanov/misc/antidep/01_initial.png [3] http://cr.openjdk.java.net/~vlivanov/misc/antidep/02_ea_split_unique_types.png [4] http://cr.openjdk.java.net/~vlivanov/misc/antidep/03_after_alloc_elimination.png [5] http://cr.openjdk.java.net/~vlivanov/misc/antidep/04_before_matching.png [6] http://cr.openjdk.java.net/~vlivanov/misc/antidep/05_after_matching.png [7] # Internal Error (/Users/vlivanov/ws/jdk/panama-dev/open/src/hotspot/share/opto/lcm.cpp:1169), pid=90414, tid=14851 # assert(false) failed: graph should be schedulable [8] http://cr.openjdk.java.net/~vlivanov/misc/antidep/webrev/ From paul.sandoz at oracle.com Fri Mar 2 16:42:50 2018 From: paul.sandoz at oracle.com (Paul Sandoz) Date: Fri, 2 Mar 2018 08:42:50 -0800 Subject: Vectorization of Unsafe.putByte() In-Reply-To: References: <1519337615225-0.post@n7.nabble.com> Message-ID: <38D38590-6D5B-4D93-AA67-6464CDEB0974@oracle.com> Hi, I chatted with Vladimir a bit about this. Since we trust code within DirectByteBuffer to do the right thing, it should be possible to vectorize, since the boundaries and type are known; we *just* need to teach the JIT about 'em. There is also another solution being cooked up: an explicit API for vectorization is being actively worked on under Project Panama, where vectors can be loaded from DBBs. Paul. > On Feb 23, 2018, at 5:08 PM, Vladimir Kozlov wrote: > > Hi Vlad, > > Vectorization in HotSpot JVM does not work with direct addresses. It only works with Java arrays, where the boundaries and element type are known. For Java arrays, Unsafe accesses are converted to normal accesses to the array's elements and vectorization works. > > Regards, > Vladimir K > > On 2/22/18 2:13 PM, vrozov wrote: >> Hi, >> What is the reason that Unsafe.putByte(Object var1, long var2, byte var4) is >> vectorized differently compared to Unsafe.putByte(long var1, byte var3)?
>> Below are results of JMH with Java 8 on my Mac. Results for Java 9 and 10
>> are similar.
>>
>> DirectBufferBenchmark.testNettyDirectPutBytes  avgt  87.490 ms/op
>> DirectBufferBenchmark.testNettyHeapPutBytes    avgt  23.782 ms/op
>>
>> @State(Scope.Thread)
>> @BenchmarkMode(Mode.AverageTime)
>> @OutputTimeUnit(TimeUnit.MILLISECONDS)
>> public class DirectBufferBenchmark {
>>     private static final int capacity = 256 * 1024 * 1024;
>>     private final ByteBuffer direct_source =
>>             ByteBuffer.allocateDirect(capacity);
>>     private final ByteBuffer direct_target =
>>             ByteBuffer.allocateDirect(capacity);
>>     private final ByteBuffer heap_source = ByteBuffer.allocate(capacity);
>>     private final ByteBuffer heap_target = ByteBuffer.allocate(capacity);
>>     private final long direct_source_address =
>>             PlatformDependent.directBufferAddress(direct_source);
>>     private final long direct_target_address =
>>             PlatformDependent.directBufferAddress(direct_target);
>>     private final byte[] heap_source_array = heap_source.array();
>>     private final byte[] heap_target_array = heap_target.array();
>>
>>     @Benchmark
>>     public void testNettyHeapPutBytes() {
>>         for (int i = 0; i < capacity; i++) {
>>             PlatformDependent.putByte(heap_target_array, i, (byte)0xFF);
>>         }
>>     }
>>
>>     @Benchmark
>>     public void testNettyDirectPutBytes() {
>>         for (int i = 0; i < capacity; i++) {
>>             PlatformDependent.putByte(direct_target_address + i, (byte)0xFF);
>>         }
>>     }
>> }
>>
>> Thank you,
>> Vlad
>> --
>> Sent from: http://openjdk.5641.n7.nabble.com/OpenJDK-Hotspot-Compiler-Development-List-f6935.html
From doug.simon at oracle.com Fri Mar 2 16:56:44 2018 From: doug.simon at oracle.com (Doug Simon) Date: Fri, 2 Mar 2018 17:56:44 +0100 Subject: RFR: 8198571: [JVMCI] must not install wide vector code unless runtime supports it In-Reply-To: References: <17539397-8CE4-467D-A19F-C706560288FA@oracle.com> <815f1ab4-4eeb-5c03-cff0-954dedb88a2b@oracle.com> <598DBBE2-CEAD-42D5-8C9B-B747B72B5AD3@oracle.com> Message-ID: Thanks Vladimir.
Can I get another review please? -Doug > On 23 Feb 2018, at 21:14, Vladimir Kozlov wrote: > > Okay. Then it is good. > > Thanks, > Vladimir > > On 2/23/18 11:59 AM, Doug Simon wrote: >> Yes. We have a separate fix for Graal that does what you propose. This is just a last bit of defense to prevent a VM crash in case the bug creeps back into Graal (or any other JVMCI compiler). >> Sent from my iPhone >>> On 23 Feb 2018, at 8:54 pm, Vladimir Kozlov wrote: >>> >>> Hi Doug, >>> >>> Are you planning changes to check MaxVectorSize value when vectors are generated by Graal? >>> Throwing exception during code installation is very late and expensive (you have to throw out all methods with vectors after spending CPUs to compile them). >>> >>> Thanks, >>> Vladimir >>> >>>> On 2/23/18 8:57 AM, Doug Simon wrote: >>>> As shown in https://github.com/oracle/graal/issues/303, a bug in a JVMCI compiler can result in vector code being installed even if the runtime doesn't support it. JVMCI should be defensive and raise an exception in this case. >>>> https://bugs.openjdk.java.net/browse/JDK-8198571 >>>> http://cr.openjdk.java.net/~dnsimon/8198571/ >>>> -Doug From vladimir.kozlov at oracle.com Fri Mar 2 18:26:01 2018 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 2 Mar 2018 10:26:01 -0800 Subject: RFR(M): 8194490: [JVMCI] Move `iterateFrames` to C++ In-Reply-To: References: <6b9d58d4-9015-dfd7-a3bf-485c40516d24@oracle.com> <0aa88b2a-8922-044b-8132-4186fbdb7fbb@oracle.com> Message-ID: <904b5398-40e1-8f1d-30fb-b4afd802ee1a@oracle.com> On 3/2/18 3:34 AM, Gilles Duboscq wrote: > Hi Vladimir, > > Sorry about that, somehow i never saw your first message. > Tom and Doug reviewed the JDK8 version internally. > > Beside the copyright year change of `src/hotspot/share/jvmci/jvmciCompilerToVM.cpp` everything still applies cleanly. > Should i consider this reviewed? Yes. > I guess i should re-run the tests since it's been a while. 
Yes, please update to the latest jdk/hs sources and run the testing again. Thanks, Vladimir > > Gilles > > On 01/03/18 23:58, Vladimir Kozlov wrote: >> Hi Gilles, >> >> What happened to this fix? >> >> Testing all passed but it was not pushed. >> >> Thanks, >> Vladimir >> >> On 2/1/18 5:21 PM, Vladimir Kozlov wrote: >>> Thank you, Gilles >>> >>> Seems fine to me. Who reviewed it in Labs? >>> >>> And thank you for running the testing. >>> >>> Vladimir >>> >>> On 1/22/18 6:58 AM, Gilles Duboscq wrote: >>>> Hi, >>>> >>>> Please review the following fix for `HotSpotStackIntrospection.iterateFrames`. >>>> It moves the iteration code from Java to C++: this helps with an issue that would arise if the nmethod containing the `iterateFrames` hits an uncommon trap during iteration. It would change the layout of the top frames, which would confuse the stack walking logic. Having this loop in C++ ensures there can be no uncommon trap. >>>> >>>> Webrev: http://cr.openjdk.java.net/~gdub/webrev-8194490/ >>>> Issue: https://bugs.openjdk.java.net/browse/JDK-8194490 >>>> Testing: hs-tier1,hs-tier2,hs-precheckin-comp >>>> >>>> It's a bit unfortunate that we have tests for implementation details of JVMCI (i.e., tests for CompilerToVM) instead of tests for the actual API. >>>> >>>> Thanks, >>>> Gilles >>>> From vladimir.kozlov at oracle.com Fri Mar 2 20:55:39 2018 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 2 Mar 2018 12:55:39 -0800 Subject: [11] RFR(S) 8198789: [TESTBUG] CTW of java.base and java.desktop takes long time Message-ID: <296b1056-7243-a8cc-cd45-5ddd887eeb76@oracle.com> http://cr.openjdk.java.net/~kvn/8198789/webrev.00/ https://bugs.openjdk.java.net/browse/JDK-8198789 Split CTW tests in 2 sets with long running tests in separate set: applications/ctw/modules/java_base.java applications/ctw/modules/java_desktop.java Run tier2 and tier3 hotspot testing which run CTW tests.
-- Thanks, Vladimir From dean.long at oracle.com Fri Mar 2 21:12:31 2018 From: dean.long at oracle.com (dean.long at oracle.com) Date: Fri, 2 Mar 2018 13:12:31 -0800 Subject: RFR: 8198571: [JVMCI] must not install wide vector code unless runtime supports it In-Reply-To: References: <17539397-8CE4-467D-A19F-C706560288FA@oracle.com> <815f1ab4-4eeb-5c03-cff0-954dedb88a2b@oracle.com> <598DBBE2-CEAD-42D5-8C9B-B747B72B5AD3@oracle.com> Message-ID: <9036a6f1-7de6-67f9-0c98-2852260671a7@oracle.com> Should we also bail out if HotSpotReferenceMap::maxRegisterSize(reference_map)) > MaxVectorSize or is_wide_vector(HotSpotReferenceMap::maxRegisterSize(reference_map))) && !is_wide_vector(MaxVectorSize) instead of determining based only on the existence of the safepoint blob? Otherwise it looks good. dl On 3/2/18 8:56 AM, Doug Simon wrote: > Thanks Vladimir. > > Can I get another review please? > > -Doug > >> On 23 Feb 2018, at 21:14, Vladimir Kozlov wrote: >> >> Okay. Then it is good. >> >> Thanks, >> Vladimir >> >> On 2/23/18 11:59 AM, Doug Simon wrote: >>> Yes. We have a separate fix for Graal that does what you propose. This is just a last bit of defense to prevent a VM crash in case the bug creeps back into Graal (or any other JVMCI compiler). >>> Sent from my iPhone >>>> On 23 Feb 2018, at 8:54 pm, Vladimir Kozlov wrote: >>>> >>>> Hi Doug, >>>> >>>> Are you planning changes to check MaxVectorSize value when vectors are generated by Graal? >>>> Throwing exception during code installation is very late and expensive (you have to throw out all methods with vectors after spending CPUs to compile them). >>>> >>>> Thanks, >>>> Vladimir >>>> >>>>> On 2/23/18 8:57 AM, Doug Simon wrote: >>>>> As shown in https://github.com/oracle/graal/issues/303, a bug in a JVMCI compiler can result in vector code being installed even if the runtime doesn't support it. JVMCI should be defensive and raise an exception in this case.
>>>>> https://bugs.openjdk.java.net/browse/JDK-8198571 >>>>> http://cr.openjdk.java.net/~dnsimon/8198571/ >>>>> -Doug From igor.ignatyev at oracle.com Fri Mar 2 22:34:15 2018 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Fri, 2 Mar 2018 14:34:15 -0800 Subject: [11] RFR(S) 8198789: [TESTBUG] CTW of java.base and java.desktop takes long time In-Reply-To: <296b1056-7243-a8cc-cd45-5ddd887eeb76@oracle.com> References: <296b1056-7243-a8cc-cd45-5ddd887eeb76@oracle.com> Message-ID: Vladimir, the fix looks good to me. Thanks, -- Igor > On Mar 2, 2018, at 12:55 PM, Vladimir Kozlov wrote: > > http://cr.openjdk.java.net/~kvn/8198789/webrev.00/ > https://bugs.openjdk.java.net/browse/JDK-8198789 > > Split CTW tests in 2 sets with long running tests in separate set: > > applications/ctw/modules/java_base.java > applications/ctw/modules/java_desktop.java > > Run tier2 and tier3 hotspot testing which run CTW tests. > > -- > Thanks, > Vladimir From vladimir.kozlov at oracle.com Fri Mar 2 22:35:02 2018 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 2 Mar 2018 14:35:02 -0800 Subject: [11] RFR(S) 8198789: [TESTBUG] CTW of java.base and java.desktop takes long time In-Reply-To: References: <296b1056-7243-a8cc-cd45-5ddd887eeb76@oracle.com> Message-ID: <8dde68cc-ed1d-cec8-72ce-a9da38a283af@oracle.com> Thank you, Igor Vladimir On 3/2/18 2:34 PM, Igor Ignatyev wrote: > Vladimir, > > the fix looks good to me. > > Thanks, > -- Igor > >> On Mar 2, 2018, at 12:55 PM, Vladimir Kozlov wrote: >> >> http://cr.openjdk.java.net/~kvn/8198789/webrev.00/ >> https://bugs.openjdk.java.net/browse/JDK-8198789 >> >> Split CTW tests in 2 sets with long running tests in separate set: >> >> applications/ctw/modules/java_base.java >> applications/ctw/modules/java_desktop.java >> >> Run tier2 and tier3 hotspot testing which run CTW tests. 
>> >> -- >> Thanks, >> Vladimir > From dean.long at oracle.com Fri Mar 2 22:38:07 2018 From: dean.long at oracle.com (dean.long at oracle.com) Date: Fri, 2 Mar 2018 14:38:07 -0800 Subject: [11] RFR(S) 8198789: [TESTBUG] CTW of java.base and java.desktop takes long time In-Reply-To: References: <296b1056-7243-a8cc-cd45-5ddd887eeb76@oracle.com> Message-ID: +1 dl On 3/2/18 2:34 PM, Igor Ignatyev wrote: > Vladimir, > > the fix looks good to me. > > Thanks, > -- Igor > >> On Mar 2, 2018, at 12:55 PM, Vladimir Kozlov wrote: >> >> http://cr.openjdk.java.net/~kvn/8198789/webrev.00/ >> https://bugs.openjdk.java.net/browse/JDK-8198789 >> >> Split CTW tests in 2 sets with long running tests in separate set: >> >> applications/ctw/modules/java_base.java >> applications/ctw/modules/java_desktop.java >> >> Run tier2 and tier3 hotspot testing which run CTW tests. >> >> -- >> Thanks, >> Vladimir From vladimir.kozlov at oracle.com Fri Mar 2 22:40:41 2018 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 2 Mar 2018 14:40:41 -0800 Subject: [11] RFR(S) 8198789: [TESTBUG] CTW of java.base and java.desktop takes long time In-Reply-To: References: <296b1056-7243-a8cc-cd45-5ddd887eeb76@oracle.com> Message-ID: <35305b8d-b13a-5c25-de48-2df685ed50bc@oracle.com> Thank you, Dean Vladimir On 3/2/18 2:38 PM, dean.long at oracle.com wrote: > +1 > > dl > > On 3/2/18 2:34 PM, Igor Ignatyev wrote: >> Vladimir, >> >> the fix looks good to me. >> >> Thanks, >> -- Igor >> >>> On Mar 2, 2018, at 12:55 PM, Vladimir Kozlov >>> wrote: >>> >>> http://cr.openjdk.java.net/~kvn/8198789/webrev.00/ >>> https://bugs.openjdk.java.net/browse/JDK-8198789 >>> >>> Split CTW tests in 2 sets with long running tests in separate set: >>> >>> applications/ctw/modules/java_base.java >>> applications/ctw/modules/java_desktop.java >>> >>> Run tier2 and tier3 hotspot testing which run CTW tests. 
>>> >>> -- >>> Thanks, >>> Vladimir > From doug.simon at oracle.com Sat Mar 3 18:25:01 2018 From: doug.simon at oracle.com (Doug Simon) Date: Sat, 3 Mar 2018 19:25:01 +0100 Subject: RFR: 8198571: [JVMCI] must not install wide vector code unless runtime supports it In-Reply-To: <9036a6f1-7de6-67f9-0c98-2852260671a7@oracle.com> References: <17539397-8CE4-467D-A19F-C706560288FA@oracle.com> <815f1ab4-4eeb-5c03-cff0-954dedb88a2b@oracle.com> <598DBBE2-CEAD-42D5-8C9B-B747B72B5AD3@oracle.com> <9036a6f1-7de6-67f9-0c98-2852260671a7@oracle.com> Message-ID: <927AACD3-AC31-4B22-B13B-FFBBA331C647@oracle.com> > On 2 Mar 2018, at 22:12, dean.long at oracle.com wrote: > > Should we also bail out if > > HotSpotReferenceMap::maxRegisterSize(reference_map)) > MaxVectorSize > > or > > is_wide_vector(HotSpotReferenceMap::maxRegisterSize(reference_map))) && !is_wide_vector(MaxVectorSize) > > instead of determining based only on the existence of the safepoint blob? Otherwise it looks good. Interesting questions. However, this test is a last line of defense in case a JVMCI compiler doesn't do the feature testing you propose. Its purpose is purely to ensure we do not install code relying on the existence of a certain safepoint handler. -Doug > > On 3/2/18 8:56 AM, Doug Simon wrote: >> Thanks Vladimir. >> >> Can I get another review please? >> >> -Doug >> >>> On 23 Feb 2018, at 21:14, Vladimir Kozlov wrote: >>> >>> Okay. Then it is good. >>> >>> Thanks, >>> Vladimir >>> >>> On 2/23/18 11:59 AM, Doug Simon wrote: >>>> Yes. We have a separate fix for Graal that does what you propose. This is just a last bit of defense to prevent a VM crash in case the bug creeps back into Graal (or any other JVMCI compiler). >>>> Sent from my iPhone >>>>> On 23 Feb 2018, at 8:54 pm, Vladimir Kozlov wrote: >>>>> >>>>> Hi Doug, >>>>> >>>>> Are you planning changes to check MaxVectorSize value when vectors are generated by Graal? 
>>>>> Throwing exception during code installation is very late and expensive (you have to throw out all methods with vectors after spending CPUs to compile them). >>>>> >>>>> Thanks, >>>>> Vladimir >>>>> >>>>>> On 2/23/18 8:57 AM, Doug Simon wrote: >>>>>> As shown in https://github.com/oracle/graal/issues/303, a bug in a JVMCI compiler can result in vector code being installed even if the runtime doesn't support it. JVMCI should be defensive and raise an exception in this case. >>>>>> https://bugs.openjdk.java.net/browse/JDK-8198571 >>>>>> http://cr.openjdk.java.net/~dnsimon/8198571/ >>>>>> -Doug > From fairoz.matte at oracle.com Mon Mar 5 04:29:05 2018 From: fairoz.matte at oracle.com (Fairoz Matte) Date: Sun, 4 Mar 2018 20:29:05 -0800 (PST) Subject: RFR JDK-8194642: Improve error reporting in hs_error file for JDK8 In-Reply-To: <0b00321e-7bf0-efc8-d34e-4dc7c1a08b58@oracle.com> References: <7035bccf-88f9-454a-8276-05c6a98d46e7@default> <0b00321e-7bf0-efc8-d34e-4dc7c1a08b58@oracle.com> Message-ID: Hi David, Thanks for the review. Restricting this issue only to improve OOM related error messaging. Fatal error reporting can be taken separately as there is already couple of other fatal errors need to be handled in similar way. Changed the description and scope of the work. Kindly review the webrev.01 having OOM related changes. http://cr.openjdk.java.net/~fmatte/8194642/webrev.01/ JBS - https://bugs.openjdk.java.net/browse/JDK-8194642 Thanks, Fairoz > -----Original Message----- > From: David Holmes > Sent: Monday, February 26, 2018 11:00 AM > To: Fairoz Matte ; hotspot-compiler- > dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net > Subject: Re: RFR JDK-8194642: Improve error reporting in hs_error file for > JDK8 > > Hi Fairoz, > > On 26/02/2018 2:10 PM, Fairoz Matte wrote: > > Hi All, > > > > Kindly review the small enhancement for 8u-dev, which is a mini backport > of JDK-8136421, only things related to hs_error file improvements were > considered. 
> > JBS - https://bugs.openjdk.java.net/browse/JDK-8194642 > > Webrev - http://cr.openjdk.java.net/~fmatte/8194642/webrev.00/ > > > > Reference > > JDK9 bug - https://bugs.openjdk.java.net/browse/JDK-8136421 > > JDK9 changeset - > > http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/a41fe5ffa839#l381.1 > > and > > http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/a41fe5ffa839#l401.1 > > src/share/vm/utilities/vmError.cpp > > The backport of the OOM reason changes seems quite reasonable. > > src/share/vm/runtime/sharedRuntime.cpp > > It is not at all clear to me that simply doing "return NULL" is sufficient to > achieve the desired goal here. Given all the other changes that were done in > 8136421 I can't tell if something else may be needed for this part - which > seems to be the key change you are after. I have to wonder why we did not > already just "return NULL" if regular error reporting can already handle it? > > > Testing: JPRT no issues found > > What crash testing have you done to verify that the new error reports are as > expected? > > Thanks, > David > > > Thanks, > > Fairoz > > From david.holmes at oracle.com Mon Mar 5 05:05:09 2018 From: david.holmes at oracle.com (David Holmes) Date: Mon, 5 Mar 2018 15:05:09 +1000 Subject: RFR JDK-8194642: Improve error reporting in hs_error file for JDK8 In-Reply-To: References: <7035bccf-88f9-454a-8276-05c6a98d46e7@default> <0b00321e-7bf0-efc8-d34e-4dc7c1a08b58@oracle.com> Message-ID: <86127aa9-aec7-3ae3-b47a-a95bcd4915c2@oracle.com> Hi Fairoz, This seems fine. Not sure why you need the extra blank line output here: if (_verbose && _siginfo) { + st->cr(); os::print_siginfo(st, _siginfo); Thanks, David On 5/03/2018 2:29 PM, Fairoz Matte wrote: > Hi David, > > Thanks for the review. > Restricting this issue only to improve OOM related error messaging. Fatal error reporting can be taken separately as there is already couple of other fatal errors need to be handled in similar way. 
> Changed the description and scope of the work. > > Kindly review the webrev.01 having OOM related changes. > http://cr.openjdk.java.net/~fmatte/8194642/webrev.01/ > > JBS - https://bugs.openjdk.java.net/browse/JDK-8194642 > > Thanks, > Fairoz > >> -----Original Message----- >> From: David Holmes >> Sent: Monday, February 26, 2018 11:00 AM >> To: Fairoz Matte ; hotspot-compiler- >> dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net >> Subject: Re: RFR JDK-8194642: Improve error reporting in hs_error file for >> JDK8 >> >> Hi Fairoz, >> >> On 26/02/2018 2:10 PM, Fairoz Matte wrote: >>> Hi All, >>> >>> Kindly review the small enhancement for 8u-dev, which is a mini backport >> of JDK-8136421, only things related to hs_error file improvements were >> considered. >>> JBS - https://bugs.openjdk.java.net/browse/JDK-8194642 >>> Webrev - http://cr.openjdk.java.net/~fmatte/8194642/webrev.00/ >>> >>> Reference >>> JDK9 bug - https://bugs.openjdk.java.net/browse/JDK-8136421 >>> JDK9 changeset - >>> http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/a41fe5ffa839#l381.1 >>> and >>> http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/a41fe5ffa839#l401.1 >> >> src/share/vm/utilities/vmError.cpp >> >> The backport of the OOM reason changes seems quite reasonable. >> >> src/share/vm/runtime/sharedRuntime.cpp >> >> It is not at all clear to me that simply doing "return NULL" is sufficient to >> achieve the desired goal here. Given all the other changes that were done in >> 8136421 I can't tell if something else may be needed for this part - which >> seems to be the key change you are after. I have to wonder why we did not >> already just "return NULL" if regular error reporting can already handle it? >> >>> Testing: JPRT no issues found >> >> What crash testing have you done to verify that the new error reports are as >> expected? 
>> >> Thanks, >> David >> >>> Thanks, >>> Fairoz >>> From fairoz.matte at oracle.com Mon Mar 5 05:10:42 2018 From: fairoz.matte at oracle.com (Fairoz Matte) Date: Sun, 4 Mar 2018 21:10:42 -0800 (PST) Subject: RFR JDK-8194642: Improve error reporting in hs_error file for JDK8 In-Reply-To: <86127aa9-aec7-3ae3-b47a-a95bcd4915c2@oracle.com> References: <7035bccf-88f9-454a-8276-05c6a98d46e7@default> <0b00321e-7bf0-efc8-d34e-4dc7c1a08b58@oracle.com> <86127aa9-aec7-3ae3-b47a-a95bcd4915c2@oracle.com> Message-ID: Hi David, > -----Original Message----- > From: David Holmes > Sent: Monday, March 05, 2018 10:35 AM > To: Fairoz Matte ; hotspot-compiler- > dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net > Subject: Re: RFR JDK-8194642: Improve error reporting in hs_error file for > JDK8 > > Hi Fairoz, > > This seems fine. > > Not sure why you need the extra blank line output here: > > if (_verbose && _siginfo) { > + st->cr(); > os::print_siginfo(st, _siginfo); > It is just to get on next line, I will revert this change. I hope, next webrev for this change is not required? Thanks, Fairoz > Thanks, > David > > On 5/03/2018 2:29 PM, Fairoz Matte wrote: > > Hi David, > > > > Thanks for the review. > > Restricting this issue only to improve OOM related error messaging. Fatal > error reporting can be taken separately as there is already couple of other > fatal errors need to be handled in similar way. > > Changed the description and scope of the work. > > > > Kindly review the webrev.01 having OOM related changes. 
> > http://cr.openjdk.java.net/~fmatte/8194642/webrev.01/ > > > > JBS - https://bugs.openjdk.java.net/browse/JDK-8194642 > > > > Thanks, > > Fairoz > > > >> -----Original Message----- > >> From: David Holmes > >> Sent: Monday, February 26, 2018 11:00 AM > >> To: Fairoz Matte ; hotspot-compiler- > >> dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net > >> Subject: Re: RFR JDK-8194642: Improve error reporting in hs_error > >> file for > >> JDK8 > >> > >> Hi Fairoz, > >> > >> On 26/02/2018 2:10 PM, Fairoz Matte wrote: > >>> Hi All, > >>> > >>> Kindly review the small enhancement for 8u-dev, which is a mini > >>> backport > >> of JDK-8136421, only things related to hs_error file improvements > >> were considered. > >>> JBS - https://bugs.openjdk.java.net/browse/JDK-8194642 > >>> Webrev - http://cr.openjdk.java.net/~fmatte/8194642/webrev.00/ > >>> > >>> Reference > >>> JDK9 bug - https://bugs.openjdk.java.net/browse/JDK-8136421 > >>> JDK9 changeset - > >>> http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/a41fe5ffa839#l381.1 > >>> and > >>> http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/a41fe5ffa839#l401.1 > >> > >> src/share/vm/utilities/vmError.cpp > >> > >> The backport of the OOM reason changes seems quite reasonable. > >> > >> src/share/vm/runtime/sharedRuntime.cpp > >> > >> It is not at all clear to me that simply doing "return NULL" is > >> sufficient to achieve the desired goal here. Given all the other > >> changes that were done in > >> 8136421 I can't tell if something else may be needed for this part - > >> which seems to be the key change you are after. I have to wonder why > >> we did not already just "return NULL" if regular error reporting can already > handle it? > >> > >>> Testing: JPRT no issues found > >> > >> What crash testing have you done to verify that the new error reports > >> are as expected? 
> >> > >> Thanks, > >> David > >> > >>> Thanks, > >>> Fairoz > >>> From david.holmes at oracle.com Mon Mar 5 06:23:48 2018 From: david.holmes at oracle.com (David Holmes) Date: Mon, 5 Mar 2018 16:23:48 +1000 Subject: RFR JDK-8194642: Improve error reporting in hs_error file for JDK8 In-Reply-To: References: <7035bccf-88f9-454a-8276-05c6a98d46e7@default> <0b00321e-7bf0-efc8-d34e-4dc7c1a08b58@oracle.com> <86127aa9-aec7-3ae3-b47a-a95bcd4915c2@oracle.com> Message-ID: On 5/03/2018 3:10 PM, Fairoz Matte wrote: > Hi David, > >> -----Original Message----- >> From: David Holmes >> Sent: Monday, March 05, 2018 10:35 AM >> To: Fairoz Matte ; hotspot-compiler- >> dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net >> Subject: Re: RFR JDK-8194642: Improve error reporting in hs_error file for >> JDK8 >> >> Hi Fairoz, >> >> This seems fine. >> >> Not sure why you need the extra blank line output here: >> >> if (_verbose && _siginfo) { >> + st->cr(); >> os::print_siginfo(st, _siginfo); >> > > It is just to get on next line, I will revert this change. > I hope, next webrev for this change is not required? No, no need for a new webrev. All the preceding sections end with st->cr(), which seems to be the basic pattern in this code. Thanks, David > Thanks, > Fairoz > >> Thanks, >> David >> >> On 5/03/2018 2:29 PM, Fairoz Matte wrote: >>> Hi David, >>> >>> Thanks for the review. >>> Restricting this issue only to improve OOM related error messaging. Fatal >> error reporting can be taken separately as there is already couple of other >> fatal errors need to be handled in similar way. >>> Changed the description and scope of the work. >>> >>> Kindly review the webrev.01 having OOM related changes. 
>>> http://cr.openjdk.java.net/~fmatte/8194642/webrev.01/ >>> >>> JBS - https://bugs.openjdk.java.net/browse/JDK-8194642 >>> >>> Thanks, >>> Fairoz >>> >>>> -----Original Message----- >>>> From: David Holmes >>>> Sent: Monday, February 26, 2018 11:00 AM >>>> To: Fairoz Matte ; hotspot-compiler- >>>> dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net >>>> Subject: Re: RFR JDK-8194642: Improve error reporting in hs_error >>>> file for >>>> JDK8 >>>> >>>> Hi Fairoz, >>>> >>>> On 26/02/2018 2:10 PM, Fairoz Matte wrote: >>>>> Hi All, >>>>> >>>>> Kindly review the small enhancement for 8u-dev, which is a mini >>>>> backport >>>> of JDK-8136421, only things related to hs_error file improvements >>>> were considered. >>>>> JBS - https://bugs.openjdk.java.net/browse/JDK-8194642 >>>>> Webrev - http://cr.openjdk.java.net/~fmatte/8194642/webrev.00/ >>>>> >>>>> Reference >>>>> JDK9 bug - https://bugs.openjdk.java.net/browse/JDK-8136421 >>>>> JDK9 changeset - >>>>> http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/a41fe5ffa839#l381.1 >>>>> and >>>>> http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/a41fe5ffa839#l401.1 >>>> >>>> src/share/vm/utilities/vmError.cpp >>>> >>>> The backport of the OOM reason changes seems quite reasonable. >>>> >>>> src/share/vm/runtime/sharedRuntime.cpp >>>> >>>> It is not at all clear to me that simply doing "return NULL" is >>>> sufficient to achieve the desired goal here. Given all the other >>>> changes that were done in >>>> 8136421 I can't tell if something else may be needed for this part - >>>> which seems to be the key change you are after. I have to wonder why >>>> we did not already just "return NULL" if regular error reporting can already >> handle it? >>>> >>>>> Testing: JPRT no issues found >>>> >>>> What crash testing have you done to verify that the new error reports >>>> are as expected? 
>>>> >>>> Thanks, >>>> David >>>> >>>>> Thanks, >>>>> Fairoz >>>>> From kevin.walls at oracle.com Mon Mar 5 10:36:06 2018 From: kevin.walls at oracle.com (Kevin Walls) Date: Mon, 5 Mar 2018 10:36:06 +0000 Subject: RFR JDK-8194642: Improve error reporting in hs_error file for JDK8 In-Reply-To: References: <7035bccf-88f9-454a-8276-05c6a98d46e7@default> <0b00321e-7bf0-efc8-d34e-4dc7c1a08b58@oracle.com> <86127aa9-aec7-3ae3-b47a-a95bcd4915c2@oracle.com> Message-ID: <7399fbb3-929b-8d88-5e33-14f0e8941497@oracle.com> Hi Fairoz, Yes, looks good and useful. And thanks David. In webrev.01 you want a blank line after the function end at 346 and the start of the comment for VMError::report(). If you can add a link to https://bugs.openjdk.java.net/browse/JDK-8064814 to this change in jbs that would be good, as you're kind of backporting that plus some part of 8026324 / 8026333 / 8026336 where print_oom_reasons() moves to its own function. Thanks Kevin On 05/03/2018 06:23, David Holmes wrote: > On 5/03/2018 3:10 PM, Fairoz Matte wrote: >> Hi David, >> >>> -----Original Message----- >>> From: David Holmes >>> Sent: Monday, March 05, 2018 10:35 AM >>> To: Fairoz Matte ; hotspot-compiler- >>> dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net >>> Subject: Re: RFR JDK-8194642: Improve error reporting in hs_error >>> file for >>> JDK8 >>> >>> Hi Fairoz, >>> >>> This seems fine. >>> >>> Not sure why you need the extra blank line output here: >>> >>> if (_verbose && _siginfo) { >>> + st->cr(); >>> os::print_siginfo(st, _siginfo); >>> >> >> It is just to get on next line, I will revert this change. >> I hope, next webrev for this change is not required? > > No, no need for a new webrev. All the preceding sections end with > st->cr(), which seems to be the basic pattern in this code. > > Thanks, > David > >> Thanks, >> Fairoz >> >>> Thanks, >>> David >>> >>> On 5/03/2018 2:29 PM, Fairoz Matte wrote: >>>> Hi David, >>>> >>>> Thanks for the review.
>>>> Restricting this issue only to improve OOM related error messaging. >>>> Fatal >>> error reporting can be taken separately as there is already couple >>> of other >>> fatal errors need to be handled in similar way. >>>> Changed the description and scope of the work. >>>> >>>> Kindly review the webrev.01 having OOM related changes. >>>> http://cr.openjdk.java.net/~fmatte/8194642/webrev.01/ >>>> >>>> JBS - https://bugs.openjdk.java.net/browse/JDK-8194642 >>>> >>>> Thanks, >>>> Fairoz >>>> >>>>> -----Original Message----- >>>>> From: David Holmes >>>>> Sent: Monday, February 26, 2018 11:00 AM >>>>> To: Fairoz Matte ; hotspot-compiler- >>>>> dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net >>>>> Subject: Re: RFR JDK-8194642: Improve error reporting in hs_error >>>>> file for >>>>> JDK8 >>>>> >>>>> Hi Fairoz, >>>>> >>>>> On 26/02/2018 2:10 PM, Fairoz Matte wrote: >>>>>> Hi All, >>>>>> >>>>>> Kindly review the small enhancement for 8u-dev, which is a mini >>>>>> backport >>>>> of JDK-8136421, only things related to hs_error file improvements >>>>> were considered. >>>>>> JBS - https://bugs.openjdk.java.net/browse/JDK-8194642 >>>>>> Webrev - http://cr.openjdk.java.net/~fmatte/8194642/webrev.00/ >>>>>> >>>>>> Reference >>>>>> JDK9 bug - https://bugs.openjdk.java.net/browse/JDK-8136421 >>>>>> JDK9 changeset - >>>>>> http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/a41fe5ffa839#l381.1 >>>>>> and >>>>>> http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/a41fe5ffa839#l401.1 >>>>> >>>>> src/share/vm/utilities/vmError.cpp >>>>> >>>>> The backport of the OOM reason changes seems quite reasonable. >>>>> >>>>> src/share/vm/runtime/sharedRuntime.cpp >>>>> >>>>> It is not at all clear to me that simply doing "return NULL" is >>>>> sufficient to achieve the desired goal here. Given all the other >>>>> changes that were done in >>>>> 8136421 I can't tell if something else may be needed for this part - >>>>> which seems to be the key change you are after. 
I have to wonder why >>>>> we did not already just "return NULL" if regular error reporting >>>>> can already >>> handle it? >>>>> >>>>>> Testing: JPRT no issues found >>>>> >>>>> What crash testing have you done to verify that the new error reports >>>>> are as expected? >>>>> >>>>> Thanks, >>>>> David >>>>> >>>>>> Thanks, >>>>>> Fairoz >>>>>> From fairoz.matte at oracle.com Mon Mar 5 14:30:42 2018 From: fairoz.matte at oracle.com (Fairoz Matte) Date: Mon, 5 Mar 2018 06:30:42 -0800 (PST) Subject: RFR JDK-8194642: Improve error reporting in hs_error file for JDK8 In-Reply-To: <7399fbb3-929b-8d88-5e33-14f0e8941497@oracle.com> References: <7035bccf-88f9-454a-8276-05c6a98d46e7@default> <0b00321e-7bf0-efc8-d34e-4dc7c1a08b58@oracle.com> <86127aa9-aec7-3ae3-b47a-a95bcd4915c2@oracle.com> <7399fbb3-929b-8d88-5e33-14f0e8941497@oracle.com> Message-ID: <79370e5f-d970-40b6-b09f-028afa6a118e@default> Thanks David and Kevin for the reviews. > -----Original Message----- > From: Kevin Walls > Sent: Monday, March 05, 2018 4:06 PM > To: David Holmes ; Fairoz Matte > ; hotspot-compiler-dev at openjdk.java.net; > hotspot-runtime-dev at openjdk.java.net > Subject: Re: RFR JDK-8194642: Improve error reporting in hs_error file for > JDK8 > > Hi Fairoz, > > Yes, looks good and useful. And thanks David. > > In webrev.01 you want a blank line after the function end at 346 and the start > of the comment for VMError::report(). Added an extra line > > If you can add a link to > https://bugs.openjdk.java.net/browse/JDK-8064814 to this change in jbs > that would be good, as you're kind of backporting that plus some part of > 8026324 / 8026333 / 8026336 where print_oom_reasons() moves to its own > function. Links have been added.
Thanks, Fairoz > > Thanks > Kevin > > > On 05/03/2018 06:23, David Holmes wrote: > > On 5/03/2018 3:10 PM, Fairoz Matte wrote: > >> Hi David, > >> > >>> -----Original Message----- > >>> From: David Holmes > >>> Sent: Monday, March 05, 2018 10:35 AM > >>> To: Fairoz Matte ; hotspot-compiler- > >>> dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net > >>> Subject: Re: RFR JDK-8194642: Improve error reporting in hs_error > >>> file for > >>> JDK8 > >>> > >>> Hi Fairoz, > >>> > >>> This seems fine. > >>> > >>> Not sure why you need the extra blank line output here: > >>> > >>> if (_verbose && _siginfo) { > >>> + st->cr(); > >>> os::print_siginfo(st, _siginfo); > >>> > >> > >> It is just to get on next line, I will revert this change. > >> I hope, next webrev for this change is not required? > > > > No, no need for a new webrev. All the preceding sections end with > > st->cr(), which seems to be the basic pattern in this code. > > > > Thanks, > > David > > > >> Thanks, > >> Fairoz > >> > >>> Thanks, > >>> David > >>> > >>> On 5/03/2018 2:29 PM, Fairoz Matte wrote: > >>>> Hi David, > >>>> > >>>> Thanks for the review. > >>>> Restricting this issue only to improve OOM related error messaging. > >>>> Fatal > >>> error reporting can be taken separately as there is already couple > >>> of other fatal errors need to be handled in similar way. > >>>> Changed the description and scope of the work. > >>>> > >>>> Kindly review the webrev.01 having OOM related changes.
> >>>> http://cr.openjdk.java.net/~fmatte/8194642/webrev.01/ > >>>> > >>>> JBS - https://bugs.openjdk.java.net/browse/JDK-8194642 > >>>> > >>>> Thanks, > >>>> Fairoz > >>>> > >>>>> -----Original Message----- > >>>>> From: David Holmes > >>>>> Sent: Monday, February 26, 2018 11:00 AM > >>>>> To: Fairoz Matte ; hotspot-compiler- > >>>>> dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net > >>>>> Subject: Re: RFR JDK-8194642: Improve error reporting in hs_error > >>>>> file for > >>>>> JDK8 > >>>>> > >>>>> Hi Fairoz, > >>>>> > >>>>> On 26/02/2018 2:10 PM, Fairoz Matte wrote: > >>>>>> Hi All, > >>>>>> > >>>>>> Kindly review the small enhancement for 8u-dev, which is a mini > >>>>>> backport > >>>>> of JDK-8136421, only things related to hs_error file improvements > >>>>> were considered. > >>>>>> JBS - https://bugs.openjdk.java.net/browse/JDK-8194642 > >>>>>> Webrev - http://cr.openjdk.java.net/~fmatte/8194642/webrev.00/ > >>>>>> > >>>>>> Reference > >>>>>> JDK9 bug - https://bugs.openjdk.java.net/browse/JDK-8136421 > >>>>>> JDK9 changeset - > >>>>>> > http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/a41fe5ffa839#l38 > >>>>>> 1.1 > >>>>>> and > >>>>>> > http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/a41fe5ffa839#l40 > >>>>>> 1.1 > >>>>> > >>>>> src/share/vm/utilities/vmError.cpp > >>>>> > >>>>> The backport of the OOM reason changes seems quite reasonable. > >>>>> > >>>>> src/share/vm/runtime/sharedRuntime.cpp > >>>>> > >>>>> It is not at all clear to me that simply doing "return NULL" is > >>>>> sufficient to achieve the desired goal here. Given all the other > >>>>> changes that were done in > >>>>> 8136421 I can't tell if something else may be needed for this part > >>>>> - which seems to be the key change you are after. I have to wonder > >>>>> why we did not already just "return NULL" if regular error > >>>>> reporting can already > >>> handle it? 
> >>>>> > >>>>>> Testing: JPRT no issues found > >>>>> > >>>>> What crash testing have you done to verify that the new error > >>>>> reports are as expected? > >>>>> > >>>>> Thanks, > >>>>> David > >>>>> > >>>>>> Thanks, > >>>>>> Fairoz > >>>>>> > From tobias.hartmann at oracle.com Mon Mar 5 14:31:30 2018 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Mon, 5 Mar 2018 15:31:30 +0100 Subject: [11] RFR(S): 8198987: [Graal] compiler/intrinsics/sha/sanity tests fail on macos with Graal as JIT Message-ID: <66ab3d9f-e57b-68bd-fca9-6ab90564ebd9@oracle.com> Hi, please review the following patch: https://bugs.openjdk.java.net/browse/JDK-8198987 http://cr.openjdk.java.net/~thartmann/8198987/webrev.00/ Although Graal supports SHA intrinsics [1], the test uses LogCompilation to determine if the intrinsics were emitted which is not supported by Graal. Also, the test does not check if the SHA intrinsics are actually available. It also fails if C2 is used and the intrinsics are disabled via -XX:DisableIntrinsic=_sha_implCompress,_sha2_implCompress,_sha5_implCompress. Similar to the mathexact tests (see JDK-8182727), we should use the isIntrinsicAvailable WhiteBox API method. It currently always returns false for Graal as JIT (Graal has its own unit tests for intrinsics). Thanks, Tobias [1] https://github.com/oracle/graal/blob/master/compiler/src/org.graalvm.compiler.hotspot/src/org/graalvm/compiler/hotspot/meta/HotSpotGraphBuilderPlugins.java#L514 From nils.eliasson at oracle.com Mon Mar 5 15:50:59 2018 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Mon, 5 Mar 2018 16:50:59 +0100 Subject: RFR(S): 8192992: Test8007294.java failed: attempted to spill a non-spillable item Message-ID: <9249e5ae-3043-da1d-80de-fcecdabdfad2@oracle.com> Hi, This patch is a workaround for a scheduling problem encountered in some rare circumstances. Instead of hitting the assert we retry the compilation without subsuming loads. 
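Nils's workaround relies on C2's compile-and-retry pattern. When a compilation hits a known problem, it fails with a specific reason, and the driver recompiles the method with the offending optimization switched off. The following hypothetical C++ sketch models only that control flow. `Failure`, `try_compile` and `compile_with_retry` are invented stand-ins for C2's `Compile` object and its failure reasons. The toy compiler simply pretends that subsuming loads always triggers the problem.

```cpp
#include <cassert>

// Hypothetical stand-ins: a real driver would inspect the compilation's
// recorded failure reason rather than an enum like this.
enum class Failure { kNone, kRetryNoSubsumingLoads, kOther };

// Toy "compiler": pretend that subsuming loads into compares always leads to
// a flag register that would have to be spilled, so it asks for a retry.
Failure try_compile(bool subsume_loads) {
  return subsume_loads ? Failure::kRetryNoSubsumingLoads : Failure::kNone;
}

// Driver loop: start with the optimization on and switch it off only when the
// compilation reports the matching retry reason. Returns the number of
// attempts on success, or -1 if the compilation bails out for another reason.
int compile_with_retry(bool subsume_loads) {
  for (int attempts = 1;; ++attempts) {
    Failure f = try_compile(subsume_loads);
    if (f == Failure::kNone) return attempts;
    if (f == Failure::kRetryNoSubsumingLoads) {
      // Each retry must change the configuration, otherwise it would repeat.
      assert(subsume_loads && "must make progress");
      subsume_loads = false;  // retry once with the optimization disabled
      continue;
    }
    return -1;  // other failures: give up on this method
  }
}
```

In this toy model `compile_with_retry(true)` needs two attempts, while `compile_with_retry(false)` succeeds on the first. The cost Vladimir points out later in the thread is visible here: once the retry fires, the whole method is recompiled without subsuming any loads, not just the problematic one.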
To quote Tobias: "The crash happens because a testN_mem_reg0 (CmpN(LoadN(mem), NULL)) is scheduled in a different block than its jmpCon user and the register allocator tries to spill the flag register. The problem is that PhaseCFG::schedule_late() detects an anti-dependency for the testN_mem_reg0 on a bottom memory Phi and therefore raises the LCA to the early block (see PhaseCFG::insert_anti_dependences()) which is "far away" from its jmpCon user. " Thanks to Roland who suggested the workaround. https://bugs.openjdk.java.net/browse/JDK-8192992 http://cr.openjdk.java.net/~neliasso/8192992/webrev.01/ Regards, Nils From tobias.hartmann at oracle.com Mon Mar 5 16:00:26 2018 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Mon, 5 Mar 2018 17:00:26 +0100 Subject: RFR(S): 8192992: Test8007294.java failed: attempted to spill a non-spillable item In-Reply-To: <9249e5ae-3043-da1d-80de-fcecdabdfad2@oracle.com> References: <9249e5ae-3043-da1d-80de-fcecdabdfad2@oracle.com> Message-ID: <90903ea0-442f-0e6b-953a-922ee8d7ec8e@oracle.com> Hi Nils, looks reasonable to me. Please add the "noreg-hard" label to the bug. Best regards, Tobias On 05.03.2018 16:50, Nils Eliasson wrote: > Hi, > > This patch is a workaround for a scheduling problem encountered in some rare circumstances. Instead > of hitting the assert we retry the compilation without subsuming loads. > > To quote Tobias: > > "The crash happens because a testN_mem_reg0 (CmpN(LoadN(mem), NULL)) is scheduled in a different > block than its jmpCon user and the register allocator tries to spill the flag register. The problem > is that PhaseCFG::schedule_late() detects an anti-dependency for the testN_mem_reg0 on a bottom > memory Phi and therefore raises the LCA to the early block (see PhaseCFG::insert_anti_dependences()) > which is "far away" from its jmpCon user. " > > Thanks to Roland who suggested the workaround. 
> > https://bugs.openjdk.java.net/browse/JDK-8192992 > > http://cr.openjdk.java.net/~neliasso/8192992/webrev.01/ > > Regards, > > Nils > From vladimir.kozlov at oracle.com Mon Mar 5 16:30:04 2018 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 5 Mar 2018 08:30:04 -0800 Subject: RFC: C2: Anti-dependence on a load with a control in presence of a membar In-Reply-To: References: Message-ID: <1c74d8d6-94b7-0864-268b-a7ec32a377f6@oracle.com> There were several bugs before when we had trouble with loads which have control edge. As I remember we only require RAW loads to have such edges. Meaning Load nodes should have only dependency on memory state. Of course, there could be exclusions. Originally EA can skip all membars for instance's load because it assumes that it will end-up in Store node into allocated object which should *follow* instance's allocation. And it can skip membars (which follow allocation) because nobody see non-escaping allocation. Load (#391) is not instance load from instance array (#363). It is load from source Arraycopy (#255) (it is not allocation). So it should not have bypass membars separating them: http://hg.openjdk.java.net/jdk/hs/file/4e82736053ae/src/hotspot/share/opto/escape.cpp#l2698 So it is really some problem in step 2) in EA. Could be because only one alias index (memory slice) is used for whole array access. So what memory slice of Merge node (#379) was updated to bypass membar? Vladimir On 3/2/18 6:47 AM, Vladimir Ivanov wrote: > Hi, > > I'm seeing unschedulable graph being produced during GCM when adding > anti-dependence to a load node with a control dependency. I found the > root cause, but can't decide how to fix it. > > Here are steps which lead to the broken graph: > > (1) The load causing problems (#391) is added as part of specializing > ArrayCopy for small arrays (added as part of JDK-6912521 [1] in 9). Both > control & memory are tied to AllocateArray.
(IR [2]) > > (2) EA proves that AllocateArray (#363, destination) is scalar > replaceable and during split_unique_types() updates corresponding > MemoryMerge (#379) and it allows to directly use memory produced by > ArrayCopy (#255, source) bypassing the allocation & membar (#348). (IR [3]) > > (3) After allocation elimination, the load control dependency is > switched to MemBarCPUOrder (#348) which was immediate dominator of > eliminated allocation (IR [4]) > > (4) After matching the load has control on the membar, but not memory > (IR before [5] and after [6] matching.) > > (5) During GCM, anti-dependence from membar (#317) to the load is > added, but it makes the graph unschedulable which then triggers the > assertion [7] during LCM. > > Relevant places in the code: [8] > > Everything looks fine, except updates of MergeMems in step #2: > > * the load is pinned to the proper branch after deciding what > direction to go; > > * wide membars do need anti-dependences on loads > > So, as a fix I'd disable memory edge updates which bypass any membars. > Does it sound reasonable or am I missing something important? > > Best regards, > Vladimir Ivanov > > [1] https://bugs.openjdk.java.net/browse/JDK-6912521 > > [2] http://cr.openjdk.java.net/~vlivanov/misc/antidep/01_initial.png > > [3] > http://cr.openjdk.java.net/~vlivanov/misc/antidep/02_ea_split_unique_types.png > > > [4] > http://cr.openjdk.java.net/~vlivanov/misc/antidep/03_after_alloc_elimination.png > > > [5] > http://cr.openjdk.java.net/~vlivanov/misc/antidep/04_before_matching.png > > [6] http://cr.openjdk.java.net/~vlivanov/misc/antidep/05_after_matching.png > > [7] > # Internal Error > (/Users/vlivanov/ws/jdk/panama-dev/open/src/hotspot/share/opto/lcm.cpp:1169), > pid=90414, tid=14851 > #
assert(false) failed: graph should be schedulable > > > [8] http://cr.openjdk.java.net/~vlivanov/misc/antidep/webrev/ From vladimir.kozlov at oracle.com Mon Mar 5 17:02:22 2018 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 5 Mar 2018 09:02:22 -0800 Subject: RFR(S): 8192992: Test8007294.java failed: attempted to spill a non-spillable item In-Reply-To: <9249e5ae-3043-da1d-80de-fcecdabdfad2@oracle.com> References: <9249e5ae-3043-da1d-80de-fcecdabdfad2@oracle.com> Message-ID: Hi Nils, Yes, it is legal workaround but this way you removed all subsuming loads in code. Should we do anti-dependency check for loads during matching when shared nodes are marked?: http://hg.openjdk.java.net/jdk/hs/file/4e82736053ae/src/hotspot/share/opto/matcher.cpp#l2159 How expensive would be that? Vladimir On 3/5/18 7:50 AM, Nils Eliasson wrote: > Hi, > > This patch is a workaround for a scheduling problem encountered in some > rare circumstances. Instead of hitting the assert we retry the > compilation without subsuming loads. > > To quote Tobias: > > "The crash happens because a testN_mem_reg0 (CmpN(LoadN(mem), NULL)) is > scheduled in a different block than its jmpCon user and the register > allocator tries to spill the flag register. The problem is that > PhaseCFG::schedule_late() detects an anti-dependency for the > testN_mem_reg0 on a bottom memory Phi and therefore raises the LCA to > the early block (see PhaseCFG::insert_anti_dependences()) which is "far > away" from its jmpCon user. " > > Thanks to Roland who suggested the workaround. 
> > https://bugs.openjdk.java.net/browse/JDK-8192992 > > http://cr.openjdk.java.net/~neliasso/8192992/webrev.01/ > > Regards, > > Nils > From vladimir.kozlov at oracle.com Mon Mar 5 17:05:21 2018 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 5 Mar 2018 09:05:21 -0800 Subject: [11] RFR(S): 8198987: [Graal] compiler/intrinsics/sha/sanity tests fail on macos with Graal as JIT In-Reply-To: <66ab3d9f-e57b-68bd-fca9-6ab90564ebd9@oracle.com> References: <66ab3d9f-e57b-68bd-fca9-6ab90564ebd9@oracle.com> Message-ID: <26df25b2-e9f0-b60b-2805-1165c406e538@oracle.com> Good. Thanks, Vladimir On 3/5/18 6:31 AM, Tobias Hartmann wrote: > Hi, > > please review the following patch: > https://bugs.openjdk.java.net/browse/JDK-8198987 > http://cr.openjdk.java.net/~thartmann/8198987/webrev.00/ > > Although Graal supports SHA intrinsics [1], the test uses LogCompilation to determine if the > intrinsics were emitted which is not supported by Graal. Also, the test does not check if the SHA > intrinsics are actually available. It also fails if C2 is used and the intrinsics are disabled via > -XX:DisableIntrinsic=_sha_implCompress,_sha2_implCompress,_sha5_implCompress. > > Similar to the mathexact tests (see JDK-8182727), we should use the isIntrinsicAvailable WhiteBox > API method. It currently always returns false for Graal as JIT (Graal has its own unit tests for > intrinsics). 
> > Thanks, > Tobias > > [1] > https://github.com/oracle/graal/blob/master/compiler/src/org.graalvm.compiler.hotspot/src/org/graalvm/compiler/hotspot/meta/HotSpotGraphBuilderPlugins.java#L514 > From tobias.hartmann at oracle.com Mon Mar 5 18:22:41 2018 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Mon, 5 Mar 2018 19:22:41 +0100 Subject: [11] RFR(S): 8198987: [Graal] compiler/intrinsics/sha/sanity tests fail on macos with Graal as JIT In-Reply-To: <26df25b2-e9f0-b60b-2805-1165c406e538@oracle.com> References: <66ab3d9f-e57b-68bd-fca9-6ab90564ebd9@oracle.com> <26df25b2-e9f0-b60b-2805-1165c406e538@oracle.com> Message-ID: Thanks Vladimir. Best regards, Tobias On 05.03.2018 18:05, Vladimir Kozlov wrote: > Good. > > Thanks, > Vladimir > > On 3/5/18 6:31 AM, Tobias Hartmann wrote: >> Hi, >> >> please review the following patch: >> https://bugs.openjdk.java.net/browse/JDK-8198987 >> http://cr.openjdk.java.net/~thartmann/8198987/webrev.00/ >> >> Although Graal supports SHA intrinsics [1], the test uses LogCompilation to determine if the >> intrinsics were emitted which is not supported by Graal. Also, the test does not check if the SHA >> intrinsics are actually available. It also fails if C2 is used and the intrinsics are disabled via >> -XX:DisableIntrinsic=_sha_implCompress,_sha2_implCompress,_sha5_implCompress. >> >> Similar to the mathexact tests (see JDK-8182727), we should use the isIntrinsicAvailable WhiteBox >> API method. It currently always returns false for Graal as JIT (Graal has its own unit tests for >> intrinsics). 
>> >> Thanks, >> Tobias >> >> [1] >> https://github.com/oracle/graal/blob/master/compiler/src/org.graalvm.compiler.hotspot/src/org/graalvm/compiler/hotspot/meta/HotSpotGraphBuilderPlugins.java#L514 >> >> From igor.ignatyev at oracle.com Mon Mar 5 19:43:55 2018 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Mon, 5 Mar 2018 11:43:55 -0800 Subject: RFR(XXS) : 8199050 : reenable concurrent execution of compiler tests Message-ID: <8E9B0B17-244A-400A-B9C8-809DF28A4CD3@oracle.com> http://cr.openjdk.java.net/~iignatyev/8199050/webrev.00/ > 3 lines changed: 0 ins; 3 del; Hi all, Concurrent execution has been disabled for compiler tests which use JIB b/c JIB wasn't able to perform artifactory installation concurrently. now w/ it being fixed, the tests can be run concurrently. webrev: http://cr.openjdk.java.net/~iignatyev/8199050/webrev.00/ JBS: https://bugs.openjdk.java.net/browse/JDK-8199050 testing: compiler/aot tests Thanks, -- Igor From vladimir.kozlov at oracle.com Mon Mar 5 20:10:47 2018 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 5 Mar 2018 12:10:47 -0800 Subject: RFR(XXS) : 8199050 : reenable concurrent execution of compiler tests In-Reply-To: <8E9B0B17-244A-400A-B9C8-809DF28A4CD3@oracle.com> References: <8E9B0B17-244A-400A-B9C8-809DF28A4CD3@oracle.com> Message-ID: <26f0d2cd-136a-7eed-4a96-2322ff63ae2c@oracle.com> Looks good. Vladimir On 3/5/18 11:43 AM, Igor Ignatyev wrote: > http://cr.openjdk.java.net/~iignatyev/8199050/webrev.00/ >> 3 lines changed: 0 ins; 3 del; > > Hi all, > > Concurrent execution has been disabled for compiler tests which use JIB b/c JIB wasn't able to perform artifactory installation concurrently. now w/ it being fixed, the tests can be run concurrently. 
> > webrev: http://cr.openjdk.java.net/~iignatyev/8199050/webrev.00/ > JBS: https://bugs.openjdk.java.net/browse/JDK-8199050 > testing: compiler/aot tests > > Thanks, > -- Igor > From igor.ignatyev at oracle.com Mon Mar 5 21:09:42 2018 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Mon, 5 Mar 2018 13:09:42 -0800 Subject: RFR(XXS) : 8199050 : reenable concurrent execution of compiler tests In-Reply-To: <26f0d2cd-136a-7eed-4a96-2322ff63ae2c@oracle.com> References: <8E9B0B17-244A-400A-B9C8-809DF28A4CD3@oracle.com> <26f0d2cd-136a-7eed-4a96-2322ff63ae2c@oracle.com> Message-ID: <3338040D-5CC1-4E73-9552-E805F516DA63@oracle.com> Thanks for your review Vladimir, -- Igor > On Mar 5, 2018, at 12:10 PM, Vladimir Kozlov wrote: > > Looks good. > > Vladimir > > On 3/5/18 11:43 AM, Igor Ignatyev wrote: >> http://cr.openjdk.java.net/~iignatyev/8199050/webrev.00/ >>> 3 lines changed: 0 ins; 3 del; >> Hi all, >> Concurrent execution has been disabled for compiler tests which use JIB b/c JIB wasn't able to perform artifactory installation concurrently. now w/ it being fixed, the tests can be run concurrently. >> webrev: http://cr.openjdk.java.net/~iignatyev/8199050/webrev.00/ >> JBS: https://bugs.openjdk.java.net/browse/JDK-8199050 >> testing: compiler/aot tests >> Thanks, >> -- Igor From tobias.hartmann at oracle.com Tue Mar 6 10:25:01 2018 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 6 Mar 2018 11:25:01 +0100 Subject: RFR(L): 8198691: CodeHeap State Analytics In-Reply-To: <250CD3F0-F4AB-4EA1-9EA9-8136442FDFF9@sap.com> References: <250CD3F0-F4AB-4EA1-9EA9-8136442FDFF9@sap.com> Message-ID: <1b411eee-29c8-b977-96b0-45145ebbfcf9@oracle.com> Hi Lutz, first of all, this is a very nice enhancement! I haven't looked at the code in detail yet (will wait for the next webrev). Here are some comments: - The documentation is great and should at least go into the release notes (I've added the release-note=yes label).
I think we should also add something to the Tools Reference Guide [1] or even the JVM Guide [2]. - As Robbin already pointed out, please use unified logging instead of PrintCodeHeapState. I think that would also allow to specify and on the command line. - I think it would be nice to print some/all of this new information if the code cache is full at CodeCache::report_codemem_full() if PrintCodeHeapState is enabled. We had several customer reported issues in the past where the compilers were disabled although there still was some free space in the code cache. We were never able to reproduce/analyze but always expected this to be due to high fragmentation. - JFR integration would be nice but that should be done in a separate RFE - It's a style question but I would prefer the pointer asterisk at the type not the argument. For example, "outputStream* out" instead of "outputStream *out" - heap.hpp: please put the commas after the definitions in the enum Thanks, Tobias [1] https://docs.oracle.com/javase/9/tools/java.htm#JSWOR624 [2] https://docs.oracle.com/javase/9/vm/java-virtual-machine-technology-overview.htm#JSJVM-GUID-982B244A-9B01-479A-8651-CB6475019281 On 01.03.2018 18:17, Schmidt, Lutz wrote: > Dear all, > > may I please request reviews for this quite voluminous enhancement: > > Bug: https://bugs.openjdk.java.net/browse/JDK-8198691 > > Webrev: http://cr.openjdk.java.net/~lucy/webrevs/8198691.00/index.html > > Don't be afraid! Most of the logic is new and isolated in a big, separate block in heap.cpp. The > changes to other files are not difficult to understand. > > If you need information about what this enhancement does and how it can be used, please refer to the > bug description. There I have attached some documentation which will greatly help with understanding > the code. > > Thank you! > > Lutz
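The fragmentation hypothesis in the review above comes down to the difference between total free space and the largest contiguous free block. Free space can exist while no single block is large enough for an allocation request. What follows is a toy sketch under simplifying assumptions. The heap is modeled as a vector of used/free segments. `FreeStats`, `free_stats` and `can_allocate` are hypothetical names and are not part of HotSpot's CodeHeap API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model only (not HotSpot's CodeHeap API): the heap is a sequence of
// fixed-size segments, each either used (true) or free (false).
struct FreeStats {
  std::size_t total_free;     // number of free segments in the whole heap
  std::size_t largest_block;  // longest run of contiguous free segments
};

FreeStats free_stats(const std::vector<bool>& used) {
  FreeStats s{0, 0};
  std::size_t run = 0;  // length of the free run currently being scanned
  for (bool u : used) {
    if (u) {
      run = 0;  // a used segment ends the current free run
    } else {
      ++s.total_free;
      ++run;
      if (run > s.largest_block) s.largest_block = run;
    }
  }
  return s;
}

// A request for `request` contiguous segments succeeds only if some single
// free run is long enough; the total amount of free space does not matter.
bool can_allocate(const std::vector<bool>& used, std::size_t request) {
  assert(request > 0);
  return free_stats(used).largest_block >= request;
}
```

Take a heap laid out as used, free, used, free, used, free, free, used. For it, `free_stats` reports 4 free segments but a largest block of only 2, so a request for 3 segments fails even though more than 3 segments are free.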
> From tobias.hartmann at oracle.com Tue Mar 6 10:31:51 2018 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 6 Mar 2018 11:31:51 +0100 Subject: RFR(L): 8198691: CodeHeap State Analytics In-Reply-To: <1b411eee-29c8-b977-96b0-45145ebbfcf9@oracle.com> References: <250CD3F0-F4AB-4EA1-9EA9-8136442FDFF9@sap.com> <1b411eee-29c8-b977-96b0-45145ebbfcf9@oracle.com> Message-ID: I've quickly run this through our testing and there are some build failures: jib > t:/workspace/open/src/hotspot/share/memory/heap.cpp(1710) : warning C4101: 'frameLine' : unreferenced local variable jib > t:/workspace/open/src/hotspot/share/memory/heap.cpp(1711) : warning C4101: 'textLine' : unreferenced local variable jib > t:/workspace/open/src/hotspot/share/memory/heap.cpp(1922) : warning C4101: 'frameLine' : unreferenced local variable jib > t:/workspace/open/src/hotspot/share/memory/heap.cpp(1923) : warning C4101: 'textLine' : unreferenced local variable jib > t:/workspace/open/src/hotspot/share/memory/heap.cpp(2094) : warning C4101: 'frameLine' : unreferenced local variable jib > t:/workspace/open/src/hotspot/share/memory/heap.cpp(2095) : warning C4101: 'textLine' : unreferenced local variable jib > t:/workspace/open/src/hotspot/share/memory/heap.cpp(2282) : warning C4101: 'frameLine' : unreferenced local variable jib > t:/workspace/open/src/hotspot/share/memory/heap.cpp(2283) : warning C4101: 'textLine' : unreferenced local variable jib > t:/workspace/open/src/hotspot/share/memory/heap.cpp(2479) : warning C4101: 'frameLine' : unreferenced local variable jib > t:/workspace/open/src/hotspot/share/memory/heap.cpp(2480) : warning C4101: 'textLine' : unreferenced local variable jib > t:/workspace/open/src/hotspot/share/memory/heap.cpp(2699) : warning C4267: '+=' : conversion from 'size_t' to 'int', possible loss of data jib > t:/workspace/open/src/hotspot/share/memory/heap.cpp(2700) : warning C4267: '+=' : conversion from 'size_t' to 'int', possible loss of data Best regards, Tobias
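Of the warnings above, C4101 only flags local variables (`frameLine`, `textLine`) that are declared but never used, and deleting those declarations silences it. C4267 is MSVC's report of an implicit narrowing from `size_t` to a 32-bit integer, here through `+=`; it typically appears on 64-bit builds. What follows is a minimal sketch of the usual fix. It assumes a hypothetical helper `total_text_width`, which is for illustration only and is not code from heap.cpp.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper, for illustration only: sums the lengths of several
// text fragments, e.g. to size a printed box.
unsigned int total_text_width(const char* const* texts, unsigned int count) {
  assert(texts != nullptr || count == 0);
  unsigned int width = 0;
  for (unsigned int i = 0; i < count; i++) {
    // std::strlen returns size_t (64 bits on x64). Adding it to a 32-bit
    // integer with += narrows implicitly, which MSVC reports as C4267;
    // the explicit cast marks the narrowing as intentional.
    width += (unsigned int)std::strlen(texts[i]);
  }
  return width;
}
```

With the fragments "CodeHeap" and "non-profiled nmethods", the helper returns 29. The cast is acceptable only because the strings involved are short. Code that can receive very long inputs would need a range check rather than a silent truncation.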
On 06.03.2018 11:25, Tobias Hartmann wrote: > Hi Lutz, > > first of all, this is a very nice enhancement! > > I haven't looked at the code in detail yet (will wait for the next webrev). Here are some comments: > - The documentation is great and should at least go into the release notes (I've added the > release-note=yes label). I think we should also add something to the Tools Reference Guide [1] or > even the JVM Guide [2]. > - As Robbin already pointed out, please use unified logging instead of PrintCodeHeapState. I think > that would also allow to specify and on the command line. > - I think it would be nice to print some/all of this new information if the code cache is full at > CodeCache::report_codemem_full() if PrintCodeHeapState is enabled. We had several customer reported > issues in the past where the compilers were disabled although there still was some free space in the > code cache. We were never able to reproduce/analyze but always expected this to be due to high > fragmentation. > - JFR integration would be nice but that should be done in a separate RFE > - It's a style question but I would prefer the pointer asterisk at the type not the argument. For > example, "outputStream* out" instead of "outputStream *out" > - heap.hpp: please put the commas after the definitions in the enum > > Thanks, > Tobias > > [1] https://docs.oracle.com/javase/9/tools/java.htm#JSWOR624 > [2] > https://docs.oracle.com/javase/9/vm/java-virtual-machine-technology-overview.htm#JSJVM-GUID-982B244A-9B01-479A-8651-CB6475019281 > > > On 01.03.2018 18:17, Schmidt, Lutz wrote: >> Dear all, >> >> may I please request reviews for this quite voluminous enhancement: >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8198691 >> >> Webrev: http://cr.openjdk.java.net/~lucy/webrevs/8198691.00/index.html >> >> Don't be afraid! Most of the logic is new and isolated in a big, separate block in heap.cpp.
The >> changes to other files are not difficult to understand. >> >> If you need information about what this enhancement does and how it can be used, please refer to the >> bug description. There I have attached some documentation which will greatly help with understanding >> the code. >> >> Thank you! >> >> Lutz >> From tobias.hartmann at oracle.com Tue Mar 6 13:32:31 2018 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 6 Mar 2018 14:32:31 +0100 Subject: RFR(L): 8198691: CodeHeap State Analytics In-Reply-To: References: <250CD3F0-F4AB-4EA1-9EA9-8136442FDFF9@sap.com> <1b411eee-29c8-b977-96b0-45145ebbfcf9@oracle.com> Message-ID: Hi Lutz, I found more build/test problems and added a summary comment to the bug. Please let me know if you need more information. Best regards, Tobias On 06.03.2018 11:31, Tobias Hartmann wrote: > I've quickly ran this through our testing and there are some build failures: > > jib > t:/workspace/open/src/hotspot/share/memory/heap.cpp(1710) : warning C4101: 'frameLine' : > unreferenced local variable > jib > t:/workspace/open/src/hotspot/share/memory/heap.cpp(1711) : warning C4101: 'textLine' : > unreferenced local variable > jib > t:/workspace/open/src/hotspot/share/memory/heap.cpp(1922) : warning C4101: 'frameLine' : > unreferenced local variable > jib > t:/workspace/open/src/hotspot/share/memory/heap.cpp(1923) : warning C4101: 'textLine' : > unreferenced local variable > jib > t:/workspace/open/src/hotspot/share/memory/heap.cpp(2094) : warning C4101: 'frameLine' : > unreferenced local variable > jib > t:/workspace/open/src/hotspot/share/memory/heap.cpp(2095) : warning C4101: 'textLine' : > unreferenced local variable > jib > t:/workspace/open/src/hotspot/share/memory/heap.cpp(2282) : warning C4101: 'frameLine' : > unreferenced local variable > jib > t:/workspace/open/src/hotspot/share/memory/heap.cpp(2283) : warning C4101: 'textLine' : > unreferenced local variable > jib >
t:/workspace/open/src/hotspot/share/memory/heap.cpp(2479) : warning C4101: 'frameLine' : > unreferenced local variable > jib > t:/workspace/open/src/hotspot/share/memory/heap.cpp(2480) : warning C4101: 'textLine' : > unreferenced local variable > jib > t:/workspace/open/src/hotspot/share/memory/heap.cpp(2699) : warning C4267: '+=' : conversion > from 'size_t' to 'int', possible loss of data > jib > t:/workspace/open/src/hotspot/share/memory/heap.cpp(2700) : warning C4267: '+=' : conversion > from 'size_t' to 'int', possible loss of data > > Best regards, > Tobias > > On 06.03.2018 11:25, Tobias Hartmann wrote: >> Hi Lutz, >> >> first of all, this is a very nice enhancement! >> >> I haven't looked at the code in detail yet (will wait for the next webrev). Here are some comments: >> - The documentation is great and should at least go into the release notes (I've added the >> release-note=yes label). I think we should also add something to the Tools Reference Guide [1] or >> even the JVM Guide [2]. >> - As Robbin already pointed out, please use unified logging instead of PrintCodeHeapState. I think >> that would also allow to specify and on the command line. >> - I think it would be nice to print some/all of this new information if the code cache is full at >> CodeCache::report_codemem_full() if PrintCodeHeapState is enabled. We had several customer reported >> issues in the past where the compilers were disabled although there still was some free space in the >> code cache. We were never able to reproduce/analyze but always expected this to be due to high >> fragmentation. >> - JFR integration would be nice but that should be done in a separate RFE >> - It's a style question but I would prefer the pointer asterisk at the type not the argument.
For >> example, "outputStream* out" instead of "outputStream *out" >> - heap.hpp: please put the commas after the definitions in the enum >> >> Thanks, >> Tobias >> >> [1] https://docs.oracle.com/javase/9/tools/java.htm#JSWOR624 >> [2] >> https://docs.oracle.com/javase/9/vm/java-virtual-machine-technology-overview.htm#JSJVM-GUID-982B244A-9B01-479A-8651-CB6475019281 >> >> >> On 01.03.2018 18:17, Schmidt, Lutz wrote: >>> Dear all, >>> >>> may I please request reviews for this quite voluminous enhancement: >>> >>> Bug: https://bugs.openjdk.java.net/browse/JDK-8198691 >>> >>> Webrev: http://cr.openjdk.java.net/~lucy/webrevs/8198691.00/index.html >>> >>> Don't be afraid! Most of the logic is new and isolated in a big, separate block in heap.cpp. The >>> changes to other files are not difficult to understand. >>> >>> If you need information about what this enhancement does and how it can be used, please refer to the >>> bug description. There I have attached some documentation which will greatly help with understanding >>> the code. >>> >>> Thank you! >>> >>> Lutz >>> From lutz.schmidt at sap.com Tue Mar 6 17:03:35 2018 From: lutz.schmidt at sap.com (Schmidt, Lutz) Date: Tue, 6 Mar 2018 17:03:35 +0000 Subject: RFR(L): 8198691: CodeHeap State Analytics In-Reply-To: References: <250CD3F0-F4AB-4EA1-9EA9-8136442FDFF9@sap.com> <1b411eee-29c8-b977-96b0-45145ebbfcf9@oracle.com> Message-ID: <8EB0096D-1020-4E86-82E1-2D7D564601AE@sap.com> Hi Tobias, thanks for the extensive testing. Here are my comments/actions: ISSUES ====== - frameLine/textLine warnings: These were leftover declarations from when I consolidated all that box printing into printBox(). Removed. - "+=" warning: Resolved by using "unsigned int" and explicit cast "(unsigned int)strlen(text)" - "memset" error: Only ran into this problem when moving the new code to a separate file. Resolved by casting to (void*) as suggested.
- failing AOT tests: I will restrict analytics to FOR_ALL_ALLOCABLE_HEAPS(), as suggested by Vladimir. Not sure if that will heal the tests. - SIGFPE in CodeHeap::aggregate(): could that be related to the "memset" error? Never had that here at SAP, neither in OpenJDK test nor in SAP JVM. If that issue persists with the new webrev (coming up soon), I may need ad'l information. If not activated by command line argument or explicit call, the new code has _ZERO_ effect. COMMENTS ========== - Documentation format: which format would you need? The PDF I uploaded is generated from our internal Wiki. It was the least effort for now. I could also provide plain text (losing all the formatting) or MS Word .docx (hopefully preserving most of the formatting). - Documentation location: I'm not familiar with the policies that direct documentation to a certain place. Please put it where you think it's appropriate. Of course I will help wherever I can. - Documentation contents/writing style: I will ask a SAP documentation specialist to have a look at it once it's final content-wise. That might eliminate some German-sounding English text. - Printing on CodeCache full condition: We have that in our SAP JVM product. Must be activated by a command line argument. Already proved helpful. - Expect "unified logging" instead of "PrintCodeHeapState" with the new webrev. - Code style: will move the '*' and ',' as requested. So please stay tuned. I'm working hard to get all the modifications ready. It's a lot to do and, unfortunately, there is that day-to-day business demanding some attention as well. Regards, Lutz On 06.03.18, 14:32, "Tobias Hartmann" wrote: Hi Lutz, I found more build/test problems and added a summary comment to the bug. Please let me know if you need more information.
Best regards, Tobias On 06.03.2018 11:31, Tobias Hartmann wrote: > I've quickly ran this through our testing and there are some build failures: > > jib > t:/workspace/open/src/hotspot/share/memory/heap.cpp(1710) : warning C4101: 'frameLine' : > unreferenced local variable > jib > t:/workspace/open/src/hotspot/share/memory/heap.cpp(1711) : warning C4101: 'textLine' : > unreferenced local variable > jib > t:/workspace/open/src/hotspot/share/memory/heap.cpp(1922) : warning C4101: 'frameLine' : > unreferenced local variable > jib > t:/workspace/open/src/hotspot/share/memory/heap.cpp(1923) : warning C4101: 'textLine' : > unreferenced local variable > jib > t:/workspace/open/src/hotspot/share/memory/heap.cpp(2094) : warning C4101: 'frameLine' : > unreferenced local variable > jib > t:/workspace/open/src/hotspot/share/memory/heap.cpp(2095) : warning C4101: 'textLine' : > unreferenced local variable > jib > t:/workspace/open/src/hotspot/share/memory/heap.cpp(2282) : warning C4101: 'frameLine' : > unreferenced local variable > jib > t:/workspace/open/src/hotspot/share/memory/heap.cpp(2283) : warning C4101: 'textLine' : > unreferenced local variable > jib > t:/workspace/open/src/hotspot/share/memory/heap.cpp(2479) : warning C4101: 'frameLine' : > unreferenced local variable > jib > t:/workspace/open/src/hotspot/share/memory/heap.cpp(2480) : warning C4101: 'textLine' : > unreferenced local variable > jib > t:/workspace/open/src/hotspot/share/memory/heap.cpp(2699) : warning C4267: '+=' : conversion > from 'size_t' to 'int', possible loss of data > jib > t:/workspace/open/src/hotspot/share/memory/heap.cpp(2700) : warning C4267: '+=' : conversion > from 'size_t' to 'int', possible loss of data > > Best regards, > Tobias > > On 06.03.2018 11:25, Tobias Hartmann wrote: >> Hi Lutz, >> >> first of all, this is a very nice enhancement! >> >> I haven't looked at the code in detail yet (will wait for the next webrev). 
Here are some comments: >> - The documentation is great and should at least go into the release notes (I've added the >> release-note=yes label). I think we should also add something to the Tools Reference Guide [1] or >> even the JVM Guide [2]. >> - As Robbin already pointed out, please use unified logging instead of PrintCodeHeapState. I think >> that would also allow to specify and on the command line. >> - I think it would be nice to print some/all of this new information if the code cache is full at >> CodeCache::report_codemem_full() if PrintCodeHeapState is enabled. We had several customer reported >> issues in the past where the compilers were disabled although there still was some free space in the >> code cache. We were never able to reproduce/analyze but always expected this to be due to high >> fragmentation. >> - JFR integration would be nice but that should be done in a separate RFE >> - It's a style question but I would prefer the pointer asterisk at the type not the argument. For >> example, "outputStream* out" instead of "outputStream *out" >> - heap.hpp: please put the commas after the definitions in the enum >> >> Thanks, >> Tobias >> >> [1] https://docs.oracle.com/javase/9/tools/java.htm#JSWOR624 >> [2] >> https://docs.oracle.com/javase/9/vm/java-virtual-machine-technology-overview.htm#JSJVM-GUID-982B244A-9B01-479A-8651-CB6475019281 >> >> >> On 01.03.2018 18:17, Schmidt, Lutz wrote: >>> Dear all, >>> >>> may I please request reviews for this quite voluminous enhancement: >>> >>> Bug: https://bugs.openjdk.java.net/browse/JDK-8198691 >>> >>> Webrev: http://cr.openjdk.java.net/~lucy/webrevs/8198691.00/index.html >>> >>> Don't be afraid! Most of the logic is new and isolated in a big, separate block in heap.cpp. The >>> changes to other files are not difficult to understand. >>> >>> If you need information about what this enhancement does and how it can be used, please refer to the >>> bug description.
There I have attached some documentation which will greatly help with understanding >>> the code. >>> >>> Thank you! >>> >>> Lutz >>> From vladimir.kozlov at oracle.com Tue Mar 6 17:36:45 2018 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 6 Mar 2018 09:36:45 -0800 Subject: RFR(M): 8194490: [JVMCI] Move `iterateFrames` to C++ In-Reply-To: <0878bf59-fe9e-9c34-f07a-8252c62c6b1f@oracle.com> References: <6b9d58d4-9015-dfd7-a3bf-485c40516d24@oracle.com> <0aa88b2a-8922-044b-8132-4186fbdb7fbb@oracle.com> <904b5398-40e1-8f1d-30fb-b4afd802ee1a@oracle.com> <793e8655-ba42-44a4-f35a-bca097cd2977@oracle.com> <0878bf59-fe9e-9c34-f07a-8252c62c6b1f@oracle.com> Message-ID: Thank you, Gilles I looked at the logs of the timed-out test runs and there were no failing tests before they timed out. So it should be fine. Thanks, Vladimir On 3/6/18 8:23 AM, Gilles Duboscq wrote: > I also re-generated a webrev that will apply cleanly: > http://cr.openjdk.java.net/~gdub/webrev-8194490.2/ > > On 06/03/18 17:22, Gilles Duboscq wrote: >> Hi, >> >> I re-ran the tests: >> http://java.se.oracle.com:10065/mdash/jobs/gmdubosc-8194490-20180305-1320-13250 >> There are 2 timeouts (flaky test?) >> >> Tom, Doug, could either of you push this? >> >> Gilles >> >> On 02/03/18 19:26, Vladimir Kozlov wrote: >>> On 3/2/18 3:34 AM, Gilles Duboscq wrote: >>>> Hi Vladimir, >>>> >>>> Sorry about that, somehow I never saw your first message. >>>> Tom and Doug reviewed the JDK8 version internally. >>>> >>>> Besides the copyright year change of `src/hotspot/share/jvmci/jvmciCompilerToVM.cpp` everything still applies cleanly. >>>> Should I consider this reviewed? >>> >>> Yes. >>> >>>> I guess I should re-run the tests since it's been a while. >>> >>> Yes, please update to latest jdk/hs sources and run testing again. >>> >>> Thanks, >>> Vladimir >>> >>>> >>>> Gilles >>>> >>>> On 01/03/18 23:58, Vladimir Kozlov wrote: >>>>> Hi Gilles, >>>>> >>>>> What happened to this fix?
>>>>> >>>>> Testing all passed but it was not pushed. >>>>> >>>>> Thanks, >>>>> Vladimir >>>>> >>>>> On 2/1/18 5:21 PM, Vladimir Kozlov wrote: >>>>>> Thank you, Gilles >>>>>> >>>>>> Seems fine to me. Who reviewed it in Labs? >>>>>> >>>>>> And thank you for running testing. >>>>>> >>>>>> Vladimir >>>>>> >>>>>> On 1/22/18 6:58 AM, Gilles Duboscq wrote: >>>>>>> Hi, >>>>>>> >>>>>>> Please review the following fix for `HotSpotStackIntrospection.iterateFrames`. >>>>>>> It moves the iteration code from Java to C++: this helps with an issue that would arise if the nmethod containing `iterateFrames` hits an uncommon trap during iteration. It would change the layout of the top frames, which would confuse the stack walking logic. Having this loop in C++ ensures there can be no uncommon trap. >>>>>>> >>>>>>> Webrev: http://cr.openjdk.java.net/~gdub/webrev-8194490/ >>>>>>> Issue: https://bugs.openjdk.java.net/browse/JDK-8194490 >>>>>>> Testing: hs-tier1,hs-tier2,hs-precheckin-comp >>>>>>> >>>>>>> It's a bit unfortunate that we have tests for implementation details of JVMCI (i.e., tests for CompilerToVM) instead of tests for the actual API. >>>>>>> >>>>>>> Thanks, >>>>>>> Gilles >>>>>>> From vladimir.x.ivanov at oracle.com Tue Mar 6 18:51:23 2018 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Tue, 6 Mar 2018 21:51:23 +0300 Subject: RFC: C2: Anti-dependence on a load with a control in presence of a membar In-Reply-To: <1c74d8d6-94b7-0864-268b-a7ec32a377f6@oracle.com> References: <1c74d8d6-94b7-0864-268b-a7ec32a377f6@oracle.com> Message-ID: <57713069-cb76-e60b-e9db-70e89e15388f@oracle.com> > There were several bugs before when we had trouble with loads which > have control edge. As I remember we only require RAW loads to have such > edges. Meaning Load nodes should have only dependency on memory state. > Of course, there could be exceptions.
> > Originally EA can skip all membars for instance's load because it > assumes that it will end-up in Store node into allocated object which > should *follow* instance's allocation. And it can skip membars (which > follow allocation) because nobody see non-escaping allocation. > > Load (#391) is not instance load from instance array (#363). It is load > from source Arraycopy (#255) (it is not allocation). So it should not > have bypass membars separating them: > > http://hg.openjdk.java.net/jdk/hs/file/4e82736053ae/src/hotspot/share/opto/escape.cpp#l2698 Updated IR dump during before/after split_unique_types with wider context (and, unfortunately, different node ids): http://cr.openjdk.java.net/~vlivanov/misc/antidep/02_ea_split_unique_types_01.png One detail is missing in the original description: there's another AllocateArray (#311) dominating the ArrayCopy (#389) and loads access it directly. ArrayCopy uses #311 as destination, so ArrayCopyNode::may_modify() returns true and stops further analysis: http://hg.openjdk.java.net/jdk/hs/file/edb65305d3ac/src/hotspot/share/opto/escape.cpp#l2705 > So it is really some problem in step 2) in EA. Could be because only one > alias index (memory slice) is used for whole array access. Unlikely, since I don't see any interference between accesses to different elements during split_unique_types(). > So what memory slice of Merge node (#379) was updated to bypass membar? It updates instance memory slice corresponding to: bool[int:8]:NotNull:exact+any *,iid=311 Best regards, Vladimir Ivanov > On 3/2/18 6:47 AM, Vladimir Ivanov wrote: >> Hi, >> >> I'm seeing unschedulable graph being produced during GCM when adding >> anti-dependence to a load node with a control dependency. I found the >> root cause, but can't decide how to fix it. 
>> >> Here are steps which lead to the broken graph: >> >> (1) The load causing problems (#391) is added as part of >> specializing ArrayCopy for small arrays (added as part of JDK-6912521 >> [1] in 9). Both control & memory are tied to AllocateArray. (IR [2]) >> >> (2) EA proves that AllocateArray (#363, destination) is scalar >> replaceable and during split_unique_types() updates corresponding >> MemoryMerge (#379) and it allows to directly use memory produced by >> ArrayCopy (#255, source) bypassing the allocation & membar (#348). (IR >> [3]) >> >> (3) After allocation elimination, the load control dependency is >> switched to MemBarCPUOrder (#348) which was immediate dominator of >> eliminated allocation (IR [4]) >> >> (4) After matching the load has control on the membar, but not >> memory (IR before [5] and after [6] matching.) >> >> (5) During GCM, anti-dependence from membar (#317) to the load is >> added, but it makes the graph unschedulable which then triggers the >> assertion [7] during LCM. >> >> Relevant places in the code: [8] >> >> Everything looks fine, except updates of MergeMems in step #2: >> >> * the load is pinned to the proper branch after deciding what >> direction to go; >> >> * wide membars do need anti-dependences on loads >> >> So, as a fix I'd disable memory edge updates which bypass any membars. >> Does it sound reasonable or am I missing something important? >> >> Best regards, >> Vladimir Ivanov >> >> [1] https://bugs.openjdk.java.net/browse/JDK-6912521 >> >> [2] http://cr.openjdk.java.net/~vlivanov/misc/antidep/01_initial.png >> >> [3] >> http://cr.openjdk.java.net/~vlivanov/misc/antidep/02_ea_split_unique_types.png >> >> >> [4] >> http://cr.openjdk.java.net/~vlivanov/misc/antidep/03_after_alloc_elimination.png >> >> >> [5] >> http://cr.openjdk.java.net/~vlivanov/misc/antidep/04_before_matching.png >> >> [6] >> http://cr.openjdk.java.net/~vlivanov/misc/antidep/05_after_matching.png >> >> [7] >> #
Internal Error >> (/Users/vlivanov/ws/jdk/panama-dev/open/src/hotspot/share/opto/lcm.cpp:1169), >> pid=90414, tid=14851 >> # assert(false) failed: graph should be schedulable >> >> >> [8] http://cr.openjdk.java.net/~vlivanov/misc/antidep/webrev/ From vladimir.kozlov at oracle.com Tue Mar 6 19:21:15 2018 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 6 Mar 2018 11:21:15 -0800 Subject: RFC: C2: Anti-dependence on a load with a control in presence of a membar In-Reply-To: <57713069-cb76-e60b-e9db-70e89e15388f@oracle.com> References: <1c74d8d6-94b7-0864-268b-a7ec32a377f6@oracle.com> <57713069-cb76-e60b-e9db-70e89e15388f@oracle.com> Message-ID: This changes everything. Load is associated with non-global-escaping allocation #311 (iid is assigned only in such cases). Its memory edge is allowed to change in this way. Why does GCM produce an unschedulable graph? I don't see a problem in 05_after_matching.png. Vladimir K On 3/6/18 10:51 AM, Vladimir Ivanov wrote: > >> There were several bugs before when we had trouble with loads which >> have control edge. As I remember we only require RAW loads to have >> such edges. Meaning Load nodes should have only dependency on memory >> state. Of course, there could be exceptions. >> >> Originally EA can skip all membars for instance's load because it >> assumes that it will end-up in Store node into allocated object which >> should *follow* instance's allocation. And it can skip membars (which >> follow allocation) because nobody see non-escaping allocation. >> >> Load (#391) is not instance load from instance array (#363). It is >> load from source Arraycopy (#255) (it is not allocation).
So it should >> not have bypass membars separating them: >> >> http://hg.openjdk.java.net/jdk/hs/file/4e82736053ae/src/hotspot/share/opto/escape.cpp#l2698 > > > Updated IR dump during before/after split_unique_types with wider > context (and, unfortunately, different node ids): > > http://cr.openjdk.java.net/~vlivanov/misc/antidep/02_ea_split_unique_types_01.png > > > One detail is missing in the original description: there's another > AllocateArray (#311) dominating the ArrayCopy (#389) and loads access it > directly. > > ArrayCopy uses #311 as destination, so ArrayCopyNode::may_modify() > returns true and stops further analysis: > > > http://hg.openjdk.java.net/jdk/hs/file/edb65305d3ac/src/hotspot/share/opto/escape.cpp#l2705 > > >> So it is really some problem in step 2) in EA. Could be because only >> one alias index (memory slice) is used for whole array access. > > Unlikely, since I don't see any interference between accesses to > different elements during split_unique_types(). > >> So what memory slice of Merge node (#379) was updated to bypass membar? > > It updates instance memory slice corresponding to: > bool[int:8]:NotNull:exact+any *,iid=311 > > Best regards, > Vladimir Ivanov > > >> On 3/2/18 6:47 AM, Vladimir Ivanov wrote: >>> Hi, >>> >>> I'm seeing unschedulable graph being produced during GCM when adding >>> anti-dependence to a load node with a control dependency. I found the >>> root cause, but can't decide how to fix it. >>> >>> Here are steps which lead to the broken graph: >>> >>> (1) The load causing problems (#391) is added as part of >>> specializing ArrayCopy for small arrays (added as part of JDK-6912521 >>> [1] in 9). Both control & memory are tied to AllocateArray.
(IR [2]) >>> >>> (2) EA proves that AllocateArray (#363, destination) is scalar >>> replaceable and during split_unique_types() updates corresponding >>> MemoryMerge (#379) and it allows to directly use memory produced by >>> ArrayCopy (#255, source) bypassing the allocation & membar (#348). >>> (IR [3]) >>> >>> (3) After allocation elimination, the load control dependency is >>> switched to MemBarCPUOrder (#348) which was immediate dominator of >>> eliminated allocation (IR [4]) >>> >>> (4) After matching the load has control on the membar, but not >>> memory (IR before [5] and after [6] matching.) >>> >>> (5) During GCM, anti-dependence from membar (#317) to the load is >>> added, but it makes the graph unschedulable which then triggers the >>> assertion [7] during LCM. >>> >>> Relevant places in the code: [8] >>> >>> Everything looks fine, except updates of MergeMems in step #2: >>> >>> * the load is pinned to the proper branch after deciding what >>> direction to go; >>> >>> * wide membars do need anti-dependences on loads >>> >>> So, as a fix I'd disable memory edge updates which bypass any >>> membars. Does it sound reasonable or am I missing something important? >>> >>> Best regards, >>> Vladimir Ivanov >>> >>> [1] https://bugs.openjdk.java.net/browse/JDK-6912521 >>> >>> [2] http://cr.openjdk.java.net/~vlivanov/misc/antidep/01_initial.png >>> >>> [3] >>> http://cr.openjdk.java.net/~vlivanov/misc/antidep/02_ea_split_unique_types.png >>> >>> >>> [4] >>> http://cr.openjdk.java.net/~vlivanov/misc/antidep/03_after_alloc_elimination.png >>> >>> >>> [5] >>> http://cr.openjdk.java.net/~vlivanov/misc/antidep/04_before_matching.png >>> >>> [6] >>> http://cr.openjdk.java.net/~vlivanov/misc/antidep/05_after_matching.png >>> >>> [7] >>> # Internal Error >>> (/Users/vlivanov/ws/jdk/panama-dev/open/src/hotspot/share/opto/lcm.cpp:1169), >>> pid=90414, tid=14851 >>> #
assert(false) failed: graph should be schedulable >>> >>> >>> [8] http://cr.openjdk.java.net/~vlivanov/misc/antidep/webrev/ From vladimir.kozlov at oracle.com Tue Mar 6 19:26:44 2018 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 6 Mar 2018 11:26:44 -0800 Subject: RFC: C2: Anti-dependence on a load with a control in presence of a membar In-Reply-To: References: <1c74d8d6-94b7-0864-268b-a7ec32a377f6@oracle.com> <57713069-cb76-e60b-e9db-70e89e15388f@oracle.com> Message-ID: On 3/6/18 11:21 AM, Vladimir Kozlov wrote: > This changes everything. Load is associated with non-global-escaping > allocation #311 (iid is assigned only in such cases). It is allowed its > memory edge change in such way. > > Why GCM makes unschedulable graph? I don't see a problem in > 05_after_matching.png. Is it because Load's memory (#173) is above membar (#317) but the Load below because of control? Vladimir K > > Vladimir K > > On 3/6/18 10:51 AM, Vladimir Ivanov wrote: >> >>> There were several bugs before when we had trouble with loads which >>> have control edge. As I remember we only require RAW loads to have >>> such edges. Meaning Load nodes should have only dependency on memory >>> state. Of course, there could be exclusions. >>> >>> Originally EA can skip all membars for instance's load because it >>> assumes that it will end-up in Store node into allocated object which >>> should *follow* instance's allocation. And it can skip membars (which >>> follow allocation) because nobody sees non-escaping allocation. >>> >>> Load (#391) is not instance load from instance array (#363). It is >>> load from source Arraycopy (#255) (it is not allocation).
So it >>> should not bypass membars separating them: >>> >>> http://hg.openjdk.java.net/jdk/hs/file/4e82736053ae/src/hotspot/share/opto/escape.cpp#l2698 >> >> >> >> Updated IR dump during before/after split_unique_types with wider >> context (and, unfortunately, different node ids): >> >> http://cr.openjdk.java.net/~vlivanov/misc/antidep/02_ea_split_unique_types_01.png >> >> >> One detail is missing in the original description: there's another >> AllocateArray (#311) dominating the ArrayCopy (#389) and loads access >> it directly. >> >> ArrayCopy uses #311 as destination, so ArrayCopyNode::may_modify() >> returns true and stops further analysis: >> >> >> http://hg.openjdk.java.net/jdk/hs/file/edb65305d3ac/src/hotspot/share/opto/escape.cpp#l2705 >> >> >>> So it is really some problem in step 2) in EA. Could be because only >>> one alias index (memory slice) is used for whole array access. >> >> Unlikely, since I don't see any interference between accesses to >> different elements during split_unique_types(). >> >>> So what memory slice of Merge node (#379) was updated to bypass membar? >> >> It updates instance memory slice corresponding to: >> bool[int:8]:NotNull:exact+any *,iid=311 >> >> Best regards, >> Vladimir Ivanov >> >> >>> On 3/2/18 6:47 AM, Vladimir Ivanov wrote: >>>> Hi, >>>> >>>> I'm seeing unschedulable graph being produced during GCM when adding >>>> anti-dependence to a load node with a control dependency. I found >>>> the root cause, but can't decide how to fix it. >>>> >>>> Here are steps which lead to the broken graph: >>>> >>>> (1) The load causing problems (#391) is added as part of >>>> specializing ArrayCopy for small arrays (added as part of >>>> JDK-6912521 [1] in 9). Both control & memory are tied to >>>> AllocateArray.
(IR [2]) >>>> >>>> (2) EA proves that AllocateArray (#363, destination) is scalar >>>> replaceable and during split_unique_types() updates corresponding >>>> MemoryMerge (#379) and it allows to directly use memory produced by >>>> ArrayCopy (#255, source) bypassing the allocation & membar (#348). >>>> (IR [3]) >>>> >>>> (3) After allocation elimination, the load control dependency is >>>> switched to MemBarCPUOrder (#348) which was immediate dominator of >>>> eliminated allocation (IR [4]) >>>> >>>> (4) After matching the load has control on the membar, but not >>>> memory (IR before [5] and after [6] matching.) >>>> >>>> (5) During GCM, anti-dependence from membar (#317) to the load is >>>> added, but it makes the graph unschedulable which then triggers the >>>> assertion [7] during LCM. >>>> >>>> Relevant places in the code: [8] >>>> >>>> Everything looks fine, except updates of MergeMems in step #2: >>>> >>>> * the load is pinned to the proper branch after deciding what >>>> direction to go; >>>> >>>> * wide membars do need anti-dependences on loads >>>> >>>> So, as a fix I'd disable memory edge updates which bypass any >>>> membars. Does it sound reasonable or am I missing something important? >>>> >>>> Best regards, >>>> Vladimir Ivanov >>>> >>>> [1] https://bugs.openjdk.java.net/browse/JDK-6912521 >>>> >>>> [2] http://cr.openjdk.java.net/~vlivanov/misc/antidep/01_initial.png >>>> >>>> [3] >>>> http://cr.openjdk.java.net/~vlivanov/misc/antidep/02_ea_split_unique_types.png >>>> >>>> >>>> [4] >>>> http://cr.openjdk.java.net/~vlivanov/misc/antidep/03_after_alloc_elimination.png >>>> >>>> >>>> [5] >>>> http://cr.openjdk.java.net/~vlivanov/misc/antidep/04_before_matching.png >>>> >>>> >>>> [6] >>>> http://cr.openjdk.java.net/~vlivanov/misc/antidep/05_after_matching.png >>>> >>>> [7] >>>> # Internal Error >>>> (/Users/vlivanov/ws/jdk/panama-dev/open/src/hotspot/share/opto/lcm.cpp:1169), >>>> pid=90414, tid=14851 >>>> #
assert(false) failed: graph should be schedulable >>>> >>>> >>>> [8] http://cr.openjdk.java.net/~vlivanov/misc/antidep/webrev/ From vladimir.kozlov at oracle.com Tue Mar 6 20:15:05 2018 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 6 Mar 2018 12:15:05 -0800 Subject: [11] RFR(S) 8199066: [JVMCI] EagerJVMCI option should also initialize the JVMCI compiler Message-ID: <1f73289e-590a-ec8f-964f-092c614b6d3d@oracle.com> http://cr.openjdk.java.net/~kvn/8199066/webrev.00/ https://bugs.openjdk.java.net/browse/JDK-8199066 JDK-8195632 introduced the EagerJVMCI option to avoid issues with lazy initialization of Graal. However, the change only initialized the JVMCI subsystem without initializing Graal itself. Contributed by Doug Simon. -- Thanks, Vladimir From vladimir.kozlov at oracle.com Tue Mar 6 20:20:05 2018 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 6 Mar 2018 12:20:05 -0800 Subject: RFC: C2: Anti-dependence on a load with a control in presence of a membar In-Reply-To: <6823c248-3b38-661d-d03c-4763c4d66528@oracle.com> References: <1c74d8d6-94b7-0864-268b-a7ec32a377f6@oracle.com> <57713069-cb76-e60b-e9db-70e89e15388f@oracle.com> <6823c248-3b38-661d-d03c-4763c4d66528@oracle.com> Message-ID: I think we should remove control edge for Loads from *non-escaping* instances. Instance's pointer is not NULL and class is exact. And, as I said, such Loads can skip membars since their instance is not escaping. It is not an exception - we have other Load nodes without control edge. Vladimir K On 3/6/18 12:13 PM, Vladimir Ivanov wrote: > > > On 3/6/18 10:26 PM, Vladimir Kozlov wrote: >> On 3/6/18 11:21 AM, Vladimir Kozlov wrote: >>> This changes everything. Load is associated with non-global-escaping >>> allocation #311 (iid is assigned only in such cases). It is allowed >>> its memory edge change in such way. >>> >>> Why GCM makes unschedulable graph? I don't see a problem in >>> 05_after_matching.png.
>> >> Is it because Load's memory (#173) is above membar (#317) but the Load >> below because of control? > > Exactly. Anti-dependences are added from membar (#317) to the loads > (#380/...) and it makes the graph unschedulable in LCM. > > Best regards, > Vladimir Ivanov > >>> On 3/6/18 10:51 AM, Vladimir Ivanov wrote: >>>> >>>>> There were several bugs before when we had trouble with loads which >>>>> have control edge. As I remember we only require RAW loads to have >>>>> such edges. Meaning Load nodes should have only dependency on >>>>> memory state. Of course, there could be exclusions. >>>>> >>>>> Originally EA can skip all membars for instance's load because it >>>>> assumes that it will end-up in Store node into allocated object >>>>> which should *follow* instance's allocation. And it can skip >>>>> membars (which follow allocation) because nobody sees non-escaping >>>>> allocation. >>>>> >>>>> Load (#391) is not instance load from instance array (#363). It is >>>>> load from source Arraycopy (#255) (it is not allocation). So it >>>>> should not bypass membars separating them: >>>>> >>>>> http://hg.openjdk.java.net/jdk/hs/file/4e82736053ae/src/hotspot/share/opto/escape.cpp#l2698 >>>> >>>> >>>> >>>> >>>> >>>> Updated IR dump during before/after split_unique_types with wider >>>> context (and, unfortunately, different node ids): >>>> >>>> http://cr.openjdk.java.net/~vlivanov/misc/antidep/02_ea_split_unique_types_01.png >>>> >>>> >>>> One detail is missing in the original description: there's another >>>> AllocateArray (#311) dominating the ArrayCopy (#389) and loads >>>> access it directly. >>>> >>>> ArrayCopy uses #311 as destination, so ArrayCopyNode::may_modify() >>>> returns true and stops further analysis: >>>> >>>> >>>> http://hg.openjdk.java.net/jdk/hs/file/edb65305d3ac/src/hotspot/share/opto/escape.cpp#l2705 >>>> >>>> >>>>> So it is really some problem in step 2) in EA.
Could be because >>>>> only one alias index (memory slice) is used for whole array access. >>>> >>>> Unlikely, since I don't see any interference between accesses to >>>> different elements during split_unique_types(). >>>> >>>>> So what memory slice of Merge node (#379) was updated to bypass >>>>> membar? >>>> >>>> It updates instance memory slice corresponding to: >>>> bool[int:8]:NotNull:exact+any *,iid=311 >>>> >>>> Best regards, >>>> Vladimir Ivanov >>>> >>>> >>>>> On 3/2/18 6:47 AM, Vladimir Ivanov wrote: >>>>>> Hi, >>>>>> >>>>>> I'm seeing unschedulable graph being produced during GCM when >>>>>> adding anti-dependence to a load node with a control dependency. I >>>>>> found the root cause, but can't decide how to fix it. >>>>>> >>>>>> Here are steps which lead to the broken graph: >>>>>> >>>>>> (1) The load causing problems (#391) is added as part of >>>>>> specializing ArrayCopy for small arrays (added as part of >>>>>> JDK-6912521 [1] in 9). Both control & memory are tied to >>>>>> AllocateArray. (IR [2]) >>>>>> >>>>>> (2) EA proves that AllocateArray (#363, destination) is scalar >>>>>> replaceable and during split_unique_types() updates corresponding >>>>>> MemoryMerge (#379) and it allows to directly use memory produced >>>>>> by ArrayCopy (#255, source) bypassing the allocation & membar >>>>>> (#348). (IR [3]) >>>>>> >>>>>> (3) After allocation elimination, the load control dependency is >>>>>> switched to MemBarCPUOrder (#348) which was immediate dominator of >>>>>> eliminated allocation (IR [4]) >>>>>> >>>>>> (4) After matching the load has control on the membar, but not >>>>>> memory (IR before [5] and after [6] matching.) >>>>>> >>>>>> (5) During GCM, anti-dependence from membar (#317) to the load >>>>>> is added, but it makes the graph unschedulable which then triggers >>>>>> the assertion [7] during LCM.
>>>>>> >>>>>> Relevant places in the code: [8] >>>>>> >>>>>> Everything looks fine, except updates of MergeMems in step #2: >>>>>> >>>>>> * the load is pinned to the proper branch after deciding what >>>>>> direction to go; >>>>>> >>>>>> * wide membars do need anti-dependences on loads >>>>>> >>>>>> So, as a fix I'd disable memory edge updates which bypass any >>>>>> membars. Does it sound reasonable or am I missing something >>>>>> important? >>>>>> >>>>>> Best regards, >>>>>> Vladimir Ivanov >>>>>> >>>>>> [1] https://bugs.openjdk.java.net/browse/JDK-6912521 >>>>>> >>>>>> [2] http://cr.openjdk.java.net/~vlivanov/misc/antidep/01_initial.png >>>>>> >>>>>> [3] >>>>>> http://cr.openjdk.java.net/~vlivanov/misc/antidep/02_ea_split_unique_types.png >>>>>> >>>>>> >>>>>> [4] >>>>>> http://cr.openjdk.java.net/~vlivanov/misc/antidep/03_after_alloc_elimination.png >>>>>> >>>>>> >>>>>> [5] >>>>>> http://cr.openjdk.java.net/~vlivanov/misc/antidep/04_before_matching.png >>>>>> >>>>>> >>>>>> [6] >>>>>> http://cr.openjdk.java.net/~vlivanov/misc/antidep/05_after_matching.png >>>>>> >>>>>> >>>>>> [7] >>>>>> # Internal Error >>>>>> (/Users/vlivanov/ws/jdk/panama-dev/open/src/hotspot/share/opto/lcm.cpp:1169), >>>>>> pid=90414, tid=14851 >>>>>> # assert(false) failed: graph should be schedulable >>>>>> >>>>>> >>>>>> [8] http://cr.openjdk.java.net/~vlivanov/misc/antidep/webrev/ From vladimir.kozlov at oracle.com Tue Mar 6 20:23:33 2018 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 6 Mar 2018 12:23:33 -0800 Subject: RFC: C2: Anti-dependence on a load with a control in presence of a membar In-Reply-To: References: <1c74d8d6-94b7-0864-268b-a7ec32a377f6@oracle.com> <57713069-cb76-e60b-e9db-70e89e15388f@oracle.com> <6823c248-3b38-661d-d03c-4763c4d66528@oracle.com> Message-ID: <30bd475f-8355-9e15-a1d9-c8a73949fe16@oracle.com> What would happen if it is volatile Load?
Vladimir On 3/6/18 12:20 PM, Vladimir Kozlov wrote: > I think we should remove control edge for Loads from *non-escaping* > instances. Instance's pointer is not NULL and class is exact. And, as I > said, such Loads can skip membars since their instance is not escaping. > > It is not an exception - we have other Load nodes without control edge. > > Vladimir K > > On 3/6/18 12:13 PM, Vladimir Ivanov wrote: >> >> >> On 3/6/18 10:26 PM, Vladimir Kozlov wrote: >>> On 3/6/18 11:21 AM, Vladimir Kozlov wrote: >>>> This changes everything. Load is associated with non-global-escaping >>>> allocation #311 (iid is assigned only in such cases). It is allowed >>>> its memory edge change in such way. >>>> >>>> Why GCM makes unschedulable graph? I don't see a problem in >>>> 05_after_matching.png. >>> >>> Is it because Load's memory (#173) is above membar (#317) but the >>> Load below because of control? >> >> Exactly. Anti-dependences are added from membar (#317) to the loads >> (#380/...) and it makes the graph unschedulable in LCM. >> >> Best regards, >> Vladimir Ivanov >> >>>> On 3/6/18 10:51 AM, Vladimir Ivanov wrote: >>>>> >>>>>> There were several bugs before when we had trouble with loads >>>>>> which have control edge. As I remember we only require RAW loads >>>>>> to have such edges. Meaning Load nodes should have only dependency >>>>>> on memory state. Of course, there could be exclusions. >>>>>> >>>>>> Originally EA can skip all membars for instance's load because it >>>>>> assumes that it will end-up in Store node into allocated object >>>>>> which should *follow* instance's allocation. And it can skip >>>>>> membars (which follow allocation) because nobody sees non-escaping >>>>>> allocation. >>>>>> >>>>>> Load (#391) is not instance load from instance array (#363). It is >>>>>> load from source Arraycopy (#255) (it is not allocation).
So it >>>>>> should not bypass membars separating them: >>>>>> >>>>>> http://hg.openjdk.java.net/jdk/hs/file/4e82736053ae/src/hotspot/share/opto/escape.cpp#l2698 >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> Updated IR dump during before/after split_unique_types with wider >>>>> context (and, unfortunately, different node ids): >>>>> >>>>> http://cr.openjdk.java.net/~vlivanov/misc/antidep/02_ea_split_unique_types_01.png >>>>> >>>>> >>>>> One detail is missing in the original description: there's another >>>>> AllocateArray (#311) dominating the ArrayCopy (#389) and loads >>>>> access it directly. >>>>> >>>>> ArrayCopy uses #311 as destination, so ArrayCopyNode::may_modify() >>>>> returns true and stops further analysis: >>>>> >>>>> >>>>> http://hg.openjdk.java.net/jdk/hs/file/edb65305d3ac/src/hotspot/share/opto/escape.cpp#l2705 >>>>> >>>>> >>>>>> So it is really some problem in step 2) in EA. Could be because >>>>>> only one alias index (memory slice) is used for whole array access. >>>>> >>>>> Unlikely, since I don't see any interference between accesses to >>>>> different elements during split_unique_types(). >>>>> >>>>>> So what memory slice of Merge node (#379) was updated to bypass >>>>>> membar? >>>>> >>>>> It updates instance memory slice corresponding to: >>>>> bool[int:8]:NotNull:exact+any *,iid=311 >>>>> >>>>> Best regards, >>>>> Vladimir Ivanov >>>>> >>>>> >>>>>> On 3/2/18 6:47 AM, Vladimir Ivanov wrote: >>>>>>> Hi, >>>>>>> >>>>>>> I'm seeing unschedulable graph being produced during GCM when >>>>>>> adding anti-dependence to a load node with a control dependency. >>>>>>> I found the root cause, but can't decide how to fix it. >>>>>>> >>>>>>> Here are steps which lead to the broken graph: >>>>>>> >>>>>>> (1) The load causing problems (#391) is added as part of >>>>>>> specializing ArrayCopy for small arrays (added as part of >>>>>>> JDK-6912521 [1] in 9). Both control & memory are tied to >>>>>>> AllocateArray.
(IR [2]) >>>>>>> >>>>>>> (2) EA proves that AllocateArray (#363, destination) is scalar >>>>>>> replaceable and during split_unique_types() updates corresponding >>>>>>> MemoryMerge (#379) and it allows to directly use memory produced >>>>>>> by ArrayCopy (#255, source) bypassing the allocation & membar >>>>>>> (#348). (IR [3]) >>>>>>> >>>>>>> (3) After allocation elimination, the load control dependency >>>>>>> is switched to MemBarCPUOrder (#348) which was immediate >>>>>>> dominator of eliminated allocation (IR [4]) >>>>>>> >>>>>>> (4) After matching the load has control on the membar, but not >>>>>>> memory (IR before [5] and after [6] matching.) >>>>>>> >>>>>>> (5) During GCM, anti-dependence from membar (#317) to the load >>>>>>> is added, but it makes the graph unschedulable which then >>>>>>> triggers the assertion [7] during LCM. >>>>>>> >>>>>>> Relevant places in the code: [8] >>>>>>> >>>>>>> Everything looks fine, except updates of MergeMems in step #2: >>>>>>> >>>>>>> * the load is pinned to the proper branch after deciding what >>>>>>> direction to go; >>>>>>> >>>>>>> * wide membars do need anti-dependences on loads >>>>>>> >>>>>>> So, as a fix I'd disable memory edge updates which bypass any >>>>>>> membars. Does it sound reasonable or am I missing something >>>>>>> important?
>>>>>>> >>>>>>> Best regards, >>>>>>> Vladimir Ivanov >>>>>>> >>>>>>> [1] https://bugs.openjdk.java.net/browse/JDK-6912521 >>>>>>> >>>>>>> [2] http://cr.openjdk.java.net/~vlivanov/misc/antidep/01_initial.png >>>>>>> >>>>>>> [3] >>>>>>> http://cr.openjdk.java.net/~vlivanov/misc/antidep/02_ea_split_unique_types.png >>>>>>> >>>>>>> >>>>>>> [4] >>>>>>> http://cr.openjdk.java.net/~vlivanov/misc/antidep/03_after_alloc_elimination.png >>>>>>> >>>>>>> >>>>>>> [5] >>>>>>> http://cr.openjdk.java.net/~vlivanov/misc/antidep/04_before_matching.png >>>>>>> >>>>>>> >>>>>>> [6] >>>>>>> http://cr.openjdk.java.net/~vlivanov/misc/antidep/05_after_matching.png >>>>>>> >>>>>>> >>>>>>> [7] >>>>>>> # Internal Error >>>>>>> (/Users/vlivanov/ws/jdk/panama-dev/open/src/hotspot/share/opto/lcm.cpp:1169), >>>>>>> pid=90414, tid=14851 >>>>>>> # assert(false) failed: graph should be schedulable >>>>>>> >>>>>>> >>>>>>> [8] http://cr.openjdk.java.net/~vlivanov/misc/antidep/webrev/ From vladimir.x.ivanov at oracle.com Tue Mar 6 20:13:16 2018 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Tue, 6 Mar 2018 23:13:16 +0300 Subject: RFC: C2: Anti-dependence on a load with a control in presence of a membar In-Reply-To: References: <1c74d8d6-94b7-0864-268b-a7ec32a377f6@oracle.com> <57713069-cb76-e60b-e9db-70e89e15388f@oracle.com> Message-ID: <6823c248-3b38-661d-d03c-4763c4d66528@oracle.com> On 3/6/18 10:26 PM, Vladimir Kozlov wrote: > On 3/6/18 11:21 AM, Vladimir Kozlov wrote: >> This changes everything. Load is associated with non-global-escaping >> allocation #311 (iid is assigned only in such cases). It is allowed >> its memory edge change in such way. >> >> Why GCM makes unschedulable graph? I don't see a problem in >> 05_after_matching.png. > > Is it because Load's memory (#173) is above membar (#317) but the Load > below because of control? Exactly. Anti-dependences are added from membar (#317) to the loads (#380/...) and it makes the graph unschedulable in LCM.
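The conflict described above can be reproduced in a toy model. This is a hypothetical sketch, not HotSpot's GCM/LCM code: the `Graph` type and `build_scenario` helper are invented for illustration, and the node labels only reuse the ids quoted in the thread. Each edge means "schedule before". The load's memory input (#173) is above the membar (#317), its control pins it below the membar, and the anti-dependence added during GCM requires the load to run before the membar. Kahn's topological sort finds an order without the anti-dependence edge and detects a cycle with it.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <string>
#include <vector>

// Toy precedence graph: an edge (a, b) means "a must be scheduled before b".
// Hypothetical illustration only; this is not HotSpot's GCM/LCM code.
struct Graph {
  std::vector<std::string> names;
  std::vector<std::vector<std::size_t>> succs;

  std::size_t add_node(const std::string& name) {
    names.push_back(name);
    succs.emplace_back();
    return names.size() - 1;
  }

  void add_edge(std::size_t before, std::size_t after) {
    assert(before < names.size() && after < names.size());
    succs[before].push_back(after);
  }

  // Kahn's topological sort: returns true and fills `order` with a legal
  // schedule, or returns false if the constraints form a cycle.
  bool schedule(std::vector<std::string>& order) const {
    std::vector<std::size_t> indegree(names.size(), 0);
    for (const auto& out : succs)
      for (std::size_t s : out) ++indegree[s];
    std::queue<std::size_t> ready;
    for (std::size_t n = 0; n < names.size(); ++n)
      if (indegree[n] == 0) ready.push(n);
    order.clear();
    while (!ready.empty()) {
      std::size_t n = ready.front();
      ready.pop();
      order.push_back(names[n]);
      for (std::size_t s : succs[n])
        if (--indegree[s] == 0) ready.push(s);
    }
    return order.size() == names.size();
  }
};

// Hypothetical helper modelling the constraints from the thread;
// `with_anti_dep` toggles the membar -> load anti-dependence added by GCM.
Graph build_scenario(bool with_anti_dep) {
  Graph g;
  std::size_t mem = g.add_node("Mem#173");
  std::size_t membar = g.add_node("MemBar#317");
  std::size_t load = g.add_node("Load");
  g.add_edge(mem, membar);   // the membar follows memory state #173
  g.add_edge(mem, load);     // the load reads memory state #173 (above the membar)
  g.add_edge(membar, load);  // control pins the load below the membar
  if (with_anti_dep) {
    // Anti-dependence: the membar may not run until the load has read #173.
    g.add_edge(load, membar);
  }
  return g;
}
```

With `build_scenario(false)`, `schedule` succeeds with the order Mem#173, MemBar#317, Load. With `build_scenario(true)` it returns false, because MemBar#317 -> Load -> MemBar#317 forms a cycle; this is the toy analogue of the "graph should be schedulable" assertion.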
Best regards, Vladimir Ivanov >> On 3/6/18 10:51 AM, Vladimir Ivanov wrote: >>> >>>> There were several bugs before when we had trouble with loads which >>>> have control edge. As I remember we only require RAW loads to have >>>> such edges. Meaning Load nodes should have only dependency on memory >>>> state. Of course, there could be exclusions. >>>> >>>> Originally EA can skip all membars for instance's load because it >>>> assumes that it will end-up in Store node into allocated object >>>> which should *follow* instance's allocation. And it can skip membars >>>> (which follow allocation) because nobody sees non-escaping allocation. >>>> >>>> Load (#391) is not instance load from instance array (#363). It is >>>> load from source Arraycopy (#255) (it is not allocation). So it >>>> should not bypass membars separating them: >>>> >>>> http://hg.openjdk.java.net/jdk/hs/file/4e82736053ae/src/hotspot/share/opto/escape.cpp#l2698 >>> >>> >>> >>> >>> Updated IR dump during before/after split_unique_types with wider >>> context (and, unfortunately, different node ids): >>> >>> http://cr.openjdk.java.net/~vlivanov/misc/antidep/02_ea_split_unique_types_01.png >>> >>> >>> One detail is missing in the original description: there's another >>> AllocateArray (#311) dominating the ArrayCopy (#389) and loads access >>> it directly. >>> >>> ArrayCopy uses #311 as destination, so ArrayCopyNode::may_modify() >>> returns true and stops further analysis: >>> >>> >>> http://hg.openjdk.java.net/jdk/hs/file/edb65305d3ac/src/hotspot/share/opto/escape.cpp#l2705 >>> >>> >>>> So it is really some problem in step 2) in EA. Could be because only >>>> one alias index (memory slice) is used for whole array access. >>> >>> Unlikely, since I don't see any interference between accesses to >>> different elements during split_unique_types(). >>> >>>> So what memory slice of Merge node (#379) was updated to bypass membar? >>> >>> It updates instance memory slice corresponding to: >>>
bool[int:8]:NotNull:exact+any *,iid=311 >>> >>> Best regards, >>> Vladimir Ivanov >>> >>> >>>> On 3/2/18 6:47 AM, Vladimir Ivanov wrote: >>>>> Hi, >>>>> >>>>> I'm seeing unschedulable graph being produced during GCM when >>>>> adding anti-dependence to a load node with a control dependency. I >>>>> found the root cause, but can't decide how to fix it. >>>>> >>>>> Here are steps which lead to the broken graph: >>>>> >>>>> (1) The load causing problems (#391) is added as part of >>>>> specializing ArrayCopy for small arrays (added as part of >>>>> JDK-6912521 [1] in 9). Both control & memory are tied to >>>>> AllocateArray. (IR [2]) >>>>> >>>>> (2) EA proves that AllocateArray (#363, destination) is scalar >>>>> replaceable and during split_unique_types() updates corresponding >>>>> MemoryMerge (#379) and it allows to directly use memory produced by >>>>> ArrayCopy (#255, source) bypassing the allocation & membar (#348). >>>>> (IR [3]) >>>>> >>>>> (3) After allocation elimination, the load control dependency is >>>>> switched to MemBarCPUOrder (#348) which was immediate dominator of >>>>> eliminated allocation (IR [4]) >>>>> >>>>> (4) After matching the load has control on the membar, but not >>>>> memory (IR before [5] and after [6] matching.) >>>>> >>>>> (5) During GCM, anti-dependence from membar (#317) to the load is >>>>> added, but it makes the graph unschedulable which then triggers the >>>>> assertion [7] during LCM. >>>>> >>>>> Relevant places in the code: [8] >>>>> >>>>> Everything looks fine, except updates of MergeMems in step #2: >>>>> >>>>> * the load is pinned to the proper branch after deciding what >>>>> direction to go; >>>>> >>>>> * wide membars do need anti-dependences on loads >>>>> >>>>> So, as a fix I'd disable memory edge updates which bypass any >>>>> membars. Does it sound reasonable or am I missing something important?
>>>>> >>>>> Best regards, >>>>> Vladimir Ivanov >>>>> >>>>> [1] https://bugs.openjdk.java.net/browse/JDK-6912521 >>>>> >>>>> [2] http://cr.openjdk.java.net/~vlivanov/misc/antidep/01_initial.png >>>>> >>>>> [3] >>>>> http://cr.openjdk.java.net/~vlivanov/misc/antidep/02_ea_split_unique_types.png >>>>> >>>>> >>>>> [4] >>>>> http://cr.openjdk.java.net/~vlivanov/misc/antidep/03_after_alloc_elimination.png >>>>> >>>>> >>>>> [5] >>>>> http://cr.openjdk.java.net/~vlivanov/misc/antidep/04_before_matching.png >>>>> >>>>> >>>>> [6] >>>>> http://cr.openjdk.java.net/~vlivanov/misc/antidep/05_after_matching.png >>>>> >>>>> >>>>> [7] >>>>> # Internal Error >>>>> (/Users/vlivanov/ws/jdk/panama-dev/open/src/hotspot/share/opto/lcm.cpp:1169), >>>>> pid=90414, tid=14851 >>>>> # assert(false) failed: graph should be schedulable >>>>> >>>>> >>>>> [8] http://cr.openjdk.java.net/~vlivanov/misc/antidep/webrev/ From vladimir.x.ivanov at oracle.com Tue Mar 6 21:03:14 2018 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Wed, 7 Mar 2018 00:03:14 +0300 Subject: RFC: C2: Anti-dependence on a load with a control in presence of a membar In-Reply-To: <30bd475f-8355-9e15-a1d9-c8a73949fe16@oracle.com> References: <1c74d8d6-94b7-0864-268b-a7ec32a377f6@oracle.com> <57713069-cb76-e60b-e9db-70e89e15388f@oracle.com> <6823c248-3b38-661d-d03c-4763c4d66528@oracle.com> <30bd475f-8355-9e15-a1d9-c8a73949fe16@oracle.com> Message-ID: <092cbc61-c582-cf08-8a0d-9f50540c8901@oracle.com> > What would happen if it is volatile Load? If it accesses non-escaping object, then preserving program dependence should be enough. >> I think we should remove control edge for Loads from *non-escaping* >> instances. Instance's pointer is not NULL and class is exact. And, as I >> said, such Loads can skip membars since their instance is not escaping. >> >> It is not an exception - we have other Load nodes without control edge.
So, in case of early expanded ArrayCopy, it'll enable loads to float above the direction check (copy forward/backwards). Moreover, it can lead to possible change of respective order which can be incorrect w.r.t. accompanying stores. Best regards, Vladimir Ivanov >>> On 3/6/18 10:26 PM, Vladimir Kozlov wrote: >>>> On 3/6/18 11:21 AM, Vladimir Kozlov wrote: >>>>> This changes everything. Load is associated with >>>>> non-global-escaping allocation #311 (iid is assigned only in such >>>>> cases). It is allowed its memory edge change in such way. >>>>> >>>>> Why GCM makes unschedulable graph? I don't see a problem in >>>>> 05_after_matching.png. >>>> >>>> Is it because Load's memory (#173) is above membar (#317) but the >>>> Load below because of control? >>> >>> Exactly. Anti-dependences are added from membar (#317) to the loads >>> (#380/...) and it makes the graph unschedulable in LCM. >>> >>> Best regards, >>> Vladimir Ivanov >>> >>>>> On 3/6/18 10:51 AM, Vladimir Ivanov wrote: >>>>>> >>>>>>> There were several bugs before when we had trouble with loads >>>>>>> which have control edge. As I remember we only require RAW loads >>>>>>> to have such edges. Meaning Load nodes should have only >>>>>>> dependency on memory state. Of course, there could be exclusions. >>>>>>> >>>>>>> Originally EA can skip all membars for instance's load because it >>>>>>> assumes that it will end-up in Store node into allocated object >>>>>>> which should *follow* instance's allocation. And it can skip >>>>>>> membars (which follow allocation) because nobody sees non-escaping >>>>>>> allocation. >>>>>>> >>>>>>> Load (#391) is not instance load from instance array (#363). It >>>>>>> is load from source Arraycopy (#255) (it is not allocation).
So >>>>>>> it should not bypass membars separating them: >>>>>>> >>>>>>> http://hg.openjdk.java.net/jdk/hs/file/4e82736053ae/src/hotspot/share/opto/escape.cpp#l2698 >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> Updated IR dump during before/after split_unique_types with wider >>>>>> context (and, unfortunately, different node ids): >>>>>> >>>>>> http://cr.openjdk.java.net/~vlivanov/misc/antidep/02_ea_split_unique_types_01.png >>>>>> >>>>>> >>>>>> One detail is missing in the original description: there's another >>>>>> AllocateArray (#311) dominating the ArrayCopy (#389) and loads >>>>>> access it directly. >>>>>> >>>>>> ArrayCopy uses #311 as destination, so ArrayCopyNode::may_modify() >>>>>> returns true and stops further analysis: >>>>>> >>>>>> >>>>>> http://hg.openjdk.java.net/jdk/hs/file/edb65305d3ac/src/hotspot/share/opto/escape.cpp#l2705 >>>>>> >>>>>> >>>>>>> So it is really some problem in step 2) in EA. Could be because >>>>>>> only one alias index (memory slice) is used for whole array access. >>>>>> >>>>>> Unlikely, since I don't see any interference between accesses to >>>>>> different elements during split_unique_types(). >>>>>> >>>>>>> So what memory slice of Merge node (#379) was updated to bypass >>>>>>> membar? >>>>>> >>>>>> It updates instance memory slice corresponding to: >>>>>> bool[int:8]:NotNull:exact+any *,iid=311 >>>>>> >>>>>> Best regards, >>>>>> Vladimir Ivanov >>>>>> >>>>>> >>>>>>> On 3/2/18 6:47 AM, Vladimir Ivanov wrote: >>>>>>>> Hi, >>>>>>>> >>>>>>>> I'm seeing unschedulable graph being produced during GCM when >>>>>>>> adding anti-dependence to a load node with a control dependency. >>>>>>>> I found the root cause, but can't decide how to fix it. >>>>>>>> >>>>>>>> Here are steps which lead to the broken graph: >>>>>>>> >>>>>>>> (1) The load causing problems (#391) is added as part of >>>>>>>> specializing ArrayCopy for small arrays (added as part of >>>>>>>> JDK-6912521 [1] in 9).
Both control & memory are tied to >>>>>>>> AllocateArray. (IR [2]) >>>>>>>> >>>>>>>> (2) EA proves that AllocateArray (#363, destination) is scalar >>>>>>>> replaceable and during split_unique_types() updates >>>>>>>> corresponding MemoryMerge (#379) and it allows to directly use >>>>>>>> memory produced by ArrayCopy (#255, source) bypassing the >>>>>>>> allocation & membar (#348). (IR [3]) >>>>>>>> >>>>>>>> (3) After allocation elimination, the load control dependency >>>>>>>> is switched to MemBarCPUOrder (#348) which was immediate >>>>>>>> dominator of eliminated allocation (IR [4]) >>>>>>>> >>>>>>>> (4) After matching the load has control on the membar, but not >>>>>>>> memory (IR before [5] and after [6] matching.) >>>>>>>> >>>>>>>> (5) During GCM, anti-dependence from membar (#317) to the load >>>>>>>> is added, but it makes the graph unschedulable which then >>>>>>>> triggers the assertion [7] during LCM. >>>>>>>> >>>>>>>> Relevant places in the code: [8] >>>>>>>> >>>>>>>> Everything looks fine, except updates of MergeMems in step #2: >>>>>>>> >>>>>>>> * the load is pinned to the proper branch after deciding what >>>>>>>> direction to go; >>>>>>>> >>>>>>>> * wide membars do need anti-dependences on loads >>>>>>>> >>>>>>>> So, as a fix I'd disable memory edge updates which bypass any >>>>>>>> membars. Does it sound reasonable or am I missing something >>>>>>>> important?
>>>>>>>> >>>>>>>> Best regards, >>>>>>>> Vladimir Ivanov >>>>>>>> >>>>>>>> [1] https://bugs.openjdk.java.net/browse/JDK-6912521 >>>>>>>> >>>>>>>> [2] >>>>>>>> http://cr.openjdk.java.net/~vlivanov/misc/antidep/01_initial.png >>>>>>>> >>>>>>>> [3] >>>>>>>> http://cr.openjdk.java.net/~vlivanov/misc/antidep/02_ea_split_unique_types.png >>>>>>>> >>>>>>>> >>>>>>>> [4] >>>>>>>> http://cr.openjdk.java.net/~vlivanov/misc/antidep/03_after_alloc_elimination.png >>>>>>>> >>>>>>>> >>>>>>>> [5] >>>>>>>> http://cr.openjdk.java.net/~vlivanov/misc/antidep/04_before_matching.png >>>>>>>> >>>>>>>> >>>>>>>> [6] >>>>>>>> http://cr.openjdk.java.net/~vlivanov/misc/antidep/05_after_matching.png >>>>>>>> >>>>>>>> >>>>>>>> [7] >>>>>>>> # Internal Error >>>>>>>> (/Users/vlivanov/ws/jdk/panama-dev/open/src/hotspot/share/opto/lcm.cpp:1169), >>>>>>>> pid=90414, tid=14851 >>>>>>>> # assert(false) failed: graph should be schedulable >>>>>>>> >>>>>>>> >>>>>>>> [8] http://cr.openjdk.java.net/~vlivanov/misc/antidep/webrev/ From vladimir.kozlov at oracle.com Tue Mar 6 21:45:24 2018 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 6 Mar 2018 13:45:24 -0800 Subject: RFC: C2: Anti-dependence on a load with a control in presence of a membar In-Reply-To: <092cbc61-c582-cf08-8a0d-9f50540c8901@oracle.com> References: <1c74d8d6-94b7-0864-268b-a7ec32a377f6@oracle.com> <57713069-cb76-e60b-e9db-70e89e15388f@oracle.com> <6823c248-3b38-661d-d03c-4763c4d66528@oracle.com> <30bd475f-8355-9e15-a1d9-c8a73949fe16@oracle.com> <092cbc61-c582-cf08-8a0d-9f50540c8901@oracle.com> Message-ID: On 3/6/18 1:03 PM, Vladimir Ivanov wrote: > >> What would happen if it is volatile Load? > > If it accesses non-escaping object, then preserving program dependence > should be enough. > >>> I think we should remove control edge for Loads from *non-escaping* >>> instances. Instance' pointer is not NULL and class is exact. And, as >>> I said, such Loads can skip membars since their instance is not >>> escaping.
>>> >>> It is not exception - we have other Load nodes without control edge. > > So, in case of early expanded ArrayCopy, it'll enable loads to float > above the direction check (copy forward/backwards). It should not be a problem because it is loads from new allocation - no check is generated for forward copying: http://hg.openjdk.java.net/jdk/hs/file/edb65305d3ac/src/hotspot/share/opto/arraycopynode.cpp#l236 http://hg.openjdk.java.net/jdk/hs/file/edb65305d3ac/src/hotspot/share/opto/arraycopynode.cpp#l326 But it needs to be tested. > > Moreover, it can lead to possible change of respective order which can > be incorrect w.r.t. accompanying stores. Memory edge should keep order of stores-loads. Note, other threads should not see this memory. Regards, Vladimir > > Best regards, > Vladimir Ivanov > >>>> On 3/6/18 10:26 PM, Vladimir Kozlov wrote: >>>>> On 3/6/18 11:21 AM, Vladimir Kozlov wrote: >>>>>> This changes everything. Load is associated with >>>>>> non-global-escaping allocation #311 (iid is assigned only in such >>>>>> cases). It is allowed its memory edge change in such way. >>>>>> >>>>>> Why GCM makes unschedulable graph? I don't see a problem in >>>>>> 05_after_matching.png. >>>>> >>>>> Is it because Load's memory (#173) is above membar (#317) but the >>>>> Load below because of control? >>>> >>>> Exactly. Anti-dependences are added from membar (#317) to the loads >>>> (#380/...) and it makes the graph unschedulable in LCM. >>>> >>>> Best regards, >>>> Vladimir Ivanov >>>> >>>>>> On 3/6/18 10:51 AM, Vladimir Ivanov wrote: >>>>>>> >>>>>>>> There were several bugs before when we had trouble with loads >>>>>>>> which have control edge. As I remember we only require RAW loads >>>>>>>> to have such edges. Meaning Load nodes should have only >>>>>>>> dependency on memory state. Of cause, there could be exclusions. 
>>>>>>>> >>>>>>>> Originally EA can skip all membars for instance's load because >>>>>>>> it assumes that it will end-up in Store node into allocated >>>>>>>> object which should *follow* instance's allocation. And it can >>>>>>>> skip membars (which follow allocation) because nobody see >>>>>>>> non-escaping allocation. >>>>>>>> >>>>>>>> Load (#391) is not instance load from instance array (#363). It >>>>>>>> is load from source Arraycopy (#255) (it is not allocation). So >>>>>>>> it should not have bypass membars separating them: >>>>>>>> >>>>>>>> http://hg.openjdk.java.net/jdk/hs/file/4e82736053ae/src/hotspot/share/opto/escape.cpp#l2698 >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> Updated IR dump during before/after split_unique_types with wider >>>>>>> context (and, unfortunately, different node ids): >>>>>>> >>>>>>> http://cr.openjdk.java.net/~vlivanov/misc/antidep/02_ea_split_unique_types_01.png >>>>>>> >>>>>>> >>>>>>> One detail is missing in the original description: there's >>>>>>> another AllocateArray (#311) dominating the ArrayCopy (#389) and >>>>>>> loads access it directly. >>>>>>> >>>>>>> ArrayCopy uses #311 as destination, so >>>>>>> ArrayCopyNode::may_modify() returns true and stops further analysis: >>>>>>> >>>>>>> >>>>>>> http://hg.openjdk.java.net/jdk/hs/file/edb65305d3ac/src/hotspot/share/opto/escape.cpp#l2705 >>>>>>> >>>>>>> >>>>>>>> So it is really some problem in step 2) in EA. Could be because >>>>>>>> only one alias index (memory slice) is used for whole array access. >>>>>>> >>>>>>> Unlikely, since I don't see any interference between accesses to >>>>>>> different elements during split_unique_types(). >>>>>>> >>>>>>>> So what memory slice of Merge node (#379) was updated to bypass >>>>>>>> membar? >>>>>>> >>>>>>> It updates instance memory slice corresponding to: >>>>>>>
bool[int:8]:NotNull:exact+any *,iid=311 >>>>>>> >>>>>>> Best regards, >>>>>>> Vladimir Ivanov >>>>>>> >>>>>>> >>>>>>>> On 3/2/18 6:47 AM, Vladimir Ivanov wrote: >>>>>>>>> Hi, >>>>>>>>> >>>>>>>>> I'm seeing unschedulable graph being produced during GCM when >>>>>>>>> adding anti-dependence to a load node with a control >>>>>>>>> dependency. I found the root cause, but can't decide how to fix >>>>>>>>> it. >>>>>>>>> >>>>>>>>> Here are steps which lead to the broken graph: >>>>>>>>> >>>>>>>>> (1) The load causing problems (#391) is added as part of >>>>>>>>> specializing ArrayCopy for small arrays (added as part of >>>>>>>>> JDK-6912521 [1] in 9). Both control & memory are tied to >>>>>>>>> AllocateArray. (IR [2]) >>>>>>>>> >>>>>>>>> (2) EA proves that AllocateArray (#363, destination) is >>>>>>>>> scalar replaceable and during split_unique_types() updates >>>>>>>>> corresponding MemoryMerge (#379) and it allows to directly use >>>>>>>>> memory produced by ArrayCopy (#255, source) bypassing the >>>>>>>>> allocation & membar (#348). (IR [3]) >>>>>>>>> >>>>>>>>> (3) After allocation elimination, the load control dependency >>>>>>>>> is switched to MemBarCPUOrder (#348) which was immediate >>>>>>>>> dominator of eliminated allocation (IR [4]) >>>>>>>>> >>>>>>>>> (4) After matching the load has control on the membar, but >>>>>>>>> not memory (IR before [5] and after [6] matching.) >>>>>>>>> >>>>>>>>> (5) During GCM, anti-dependence from membar (#317) to the >>>>>>>>> load is added, but it makes the graph unschedulable which then >>>>>>>>> triggers the assertion [7] during LCM. >>>>>>>>> >>>>>>>>> Relevant places in the code: [8] >>>>>>>>> >>>>>>>>> Everything looks fine, except updates of MergeMems in step #2: >>>>>>>>> >>>>>>>>> * the load is pinned to the proper branch after deciding >>>>>>>>> what direction to go; >>>>>>>>> >>>>>>>>>
* wide membars do need anti-dependences on loads >>>>>>>>> >>>>>>>>> So, as a fix I'd disable memory edge updates which bypass any >>>>>>>>> membars. Does it sound reasonable or am I missing something >>>>>>>>> important? >>>>>>>>> >>>>>>>>> Best regards, >>>>>>>>> Vladimir Ivanov >>>>>>>>> >>>>>>>>> [1] https://bugs.openjdk.java.net/browse/JDK-6912521 >>>>>>>>> >>>>>>>>> [2] >>>>>>>>> http://cr.openjdk.java.net/~vlivanov/misc/antidep/01_initial.png >>>>>>>>> >>>>>>>>> [3] >>>>>>>>> http://cr.openjdk.java.net/~vlivanov/misc/antidep/02_ea_split_unique_types.png >>>>>>>>> >>>>>>>>> >>>>>>>>> [4] >>>>>>>>> http://cr.openjdk.java.net/~vlivanov/misc/antidep/03_after_alloc_elimination.png >>>>>>>>> >>>>>>>>> >>>>>>>>> [5] >>>>>>>>> http://cr.openjdk.java.net/~vlivanov/misc/antidep/04_before_matching.png >>>>>>>>> >>>>>>>>> >>>>>>>>> [6] >>>>>>>>> http://cr.openjdk.java.net/~vlivanov/misc/antidep/05_after_matching.png >>>>>>>>> >>>>>>>>> >>>>>>>>> [7] >>>>>>>>> # Internal Error >>>>>>>>> (/Users/vlivanov/ws/jdk/panama-dev/open/src/hotspot/share/opto/lcm.cpp:1169), >>>>>>>>> pid=90414, tid=14851 >>>>>>>>> # assert(false) failed: graph should be schedulable >>>>>>>>> >>>>>>>>> >>>>>>>>> [8] http://cr.openjdk.java.net/~vlivanov/misc/antidep/webrev/ From vladimir.x.ivanov at oracle.com Tue Mar 6 22:28:16 2018 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Wed, 7 Mar 2018 01:28:16 +0300 Subject: RFC: C2: Anti-dependence on a load with a control in presence of a membar In-Reply-To: References: <1c74d8d6-94b7-0864-268b-a7ec32a377f6@oracle.com> <57713069-cb76-e60b-e9db-70e89e15388f@oracle.com> <6823c248-3b38-661d-d03c-4763c4d66528@oracle.com> <30bd475f-8355-9e15-a1d9-c8a73949fe16@oracle.com> <092cbc61-c582-cf08-8a0d-9f50540c8901@oracle.com> Message-ID: <7f0e566e-1b31-7435-ee5f-d6f87c9419ca@oracle.com> >> >> So, in case of early expanded ArrayCopy, it'll enable loads to float >> above the direction check (copy forward/backwards).
> > It should not be a problem because it is loads from new allocation - no > check is generated for forward copying: > > http://hg.openjdk.java.net/jdk/hs/file/edb65305d3ac/src/hotspot/share/opto/arraycopynode.cpp#l236 > > http://hg.openjdk.java.net/jdk/hs/file/edb65305d3ac/src/hotspot/share/opto/arraycopynode.cpp#l326 > > > But it needs to be tested. is_alloc_tightly_coupled() covers the case when ArrayCopy immediately follows AllocateArray [1], but it seems it doesn't completely eliminate the possibility of overlapping: I'm still concerned about the case when src == dst, there are some interim stores, but consequent loads still see the unique instance. In that case, the order of memory operations should be preserved. BTW I'm curious why ArrayCopy expansion happens so early (during IGVN). Can it be delayed till macro expansion phase? Also, what about unsafe accesses? It doesn't sound right if they get their control dependency trimmed and start to float around, even on non-escaping objects. >> >> Moreover, it can lead to possible change of respective order which can >> be incorrect w.r.t. accompanying stores. > > Memory edge should keep order of stores-loads. Note, other threads > should not see this memory. split_unique_types() can separate loads & stores, so memory edges aren't enough to order loads in presence of src/dst overlapping: http://cr.openjdk.java.net/~vlivanov/misc/antidep/ea_after.png http://cr.openjdk.java.net/~vlivanov/misc/antidep/ea_before.png Best regards, Vladimir Ivanov [1] http://hg.openjdk.java.net/jdk/hs/file/edb65305d3ac/src/hotspot/share/opto/arraycopynode.hpp#l48 >>>>> On 3/6/18 10:26 PM, Vladimir Kozlov wrote: >>>>>> On 3/6/18 11:21 AM, Vladimir Kozlov wrote: >>>>>>> This changes everything. Load is associated with >>>>>>> non-global-escaping allocation #311 (iid is assigned only in such >>>>>>> cases). It is allowed its memory edge change in such way. >>>>>>> >>>>>>> Why GCM makes unschedulable graph? 
I don't see a problem in >>>>>>> 05_after_matching.png. >>>>>> >>>>>> Is it because Load's memory (#173) is above membar (#317) but the >>>>>> Load below because of control? >>>>> >>>>> Exactly. Anti-dependences are added from membar (#317) to the loads >>>>> (#380/...) and it makes the graph unschedulable in LCM. >>>>> >>>>> Best regards, >>>>> Vladimir Ivanov >>>>> >>>>>>> On 3/6/18 10:51 AM, Vladimir Ivanov wrote: >>>>>>>> >>>>>>>>> There were several bugs before when we had trouble with loads >>>>>>>>> which have control edge. As I remember we only require RAW >>>>>>>>> loads to have such edges. Meaning Load nodes should have only >>>>>>>>> dependency on memory state. Of cause, there could be exclusions. >>>>>>>>> >>>>>>>>> Originally EA can skip all membars for instance's load because >>>>>>>>> it assumes that it will end-up in Store node into allocated >>>>>>>>> object which should *follow* instance's allocation. And it can >>>>>>>>> skip membars (which follow allocation) because nobody see >>>>>>>>> non-escaping allocation. >>>>>>>>> >>>>>>>>> Load (#391) is not instance load from instance array (#363). It >>>>>>>>> is load from source Arraycopy (#255) (it is not allocation). So >>>>>>>>> it should not have bypass membars separating them: >>>>>>>>> >>>>>>>>> http://hg.openjdk.java.net/jdk/hs/file/4e82736053ae/src/hotspot/share/opto/escape.cpp#l2698 >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> Updated IR dump during before/after split_unique_types with >>>>>>>> wider context (and, unfortunately, different node ids): >>>>>>>> >>>>>>>> http://cr.openjdk.java.net/~vlivanov/misc/antidep/02_ea_split_unique_types_01.png >>>>>>>> >>>>>>>> >>>>>>>> One detail is missing in the original description: there's >>>>>>>> another AllocateArray (#311) dominating the ArrayCopy (#389) and >>>>>>>> loads access it directly. 
>>>>>>>> >>>>>>>> ArrayCopy uses #311 as destination, so >>>>>>>> ArrayCopyNode::may_modify() returns true and stops further >>>>>>>> analysis: >>>>>>>> >>>>>>>> >>>>>>>> http://hg.openjdk.java.net/jdk/hs/file/edb65305d3ac/src/hotspot/share/opto/escape.cpp#l2705 >>>>>>>> >>>>>>>> >>>>>>>>> So it is really some problem in step 2) in EA. Could be because >>>>>>>>> only one alias index (memory slice) is used for whole array >>>>>>>>> access. >>>>>>>> >>>>>>>> Unlikely, since I don't see any interference between accesses to >>>>>>>> different elements during split_unique_types(). >>>>>>>> >>>>>>>>> So what memory slice of Merge node (#379) was updated to bypass >>>>>>>>> membar? >>>>>>>> >>>>>>>> It updates instance memory slice corresponding to: >>>>>>>> bool[int:8]:NotNull:exact+any *,iid=311 >>>>>>>> >>>>>>>> Best regards, >>>>>>>> Vladimir Ivanov >>>>>>>> >>>>>>>> >>>>>>>>> On 3/2/18 6:47 AM, Vladimir Ivanov wrote: >>>>>>>>>> Hi, >>>>>>>>>> >>>>>>>>>> I'm seeing unschedulable graph being produced during GCM when >>>>>>>>>> adding anti-dependence to a load node with a control >>>>>>>>>> dependency. I found the root cause, but can't decide how to >>>>>>>>>> fix it. >>>>>>>>>> >>>>>>>>>> Here are steps which lead to the broken graph: >>>>>>>>>> >>>>>>>>>> (1) The load causing problems (#391) is added as part of >>>>>>>>>> specializing ArrayCopy for small arrays (added as part of >>>>>>>>>> JDK-6912521 [1] in 9). Both control & memory are tied to >>>>>>>>>> AllocateArray. (IR [2]) >>>>>>>>>> >>>>>>>>>> (2) EA proves that AllocateArray (#363, destination) is >>>>>>>>>> scalar replaceable and during split_unique_types() updates >>>>>>>>>> corresponding MemoryMerge (#379) and it allows to directly use >>>>>>>>>> memory produced by ArrayCopy (#255, source) bypassing the >>>>>>>>>> allocation & membar (#348).
(IR [3]) >>>>>>>>>> >>>>>>>>>> (3) After allocation elimination, the load control >>>>>>>>>> dependency is switched to MemBarCPUOrder (#348) which was >>>>>>>>>> immediate dominator of eliminated allocation (IR [4]) >>>>>>>>>> >>>>>>>>>> (4) After matching the load has control on the membar, but >>>>>>>>>> not memory (IR before [5] and after [6] matching.) >>>>>>>>>> >>>>>>>>>> (5) During GCM, anti-dependence from membar (#317) to the >>>>>>>>>> load is added, but it makes the graph unschedulable which then >>>>>>>>>> triggers the assertion [7] during LCM. >>>>>>>>>> >>>>>>>>>> Relevant places in the code: [8] >>>>>>>>>> >>>>>>>>>> Everything looks fine, except updates of MergeMems in step #2: >>>>>>>>>> >>>>>>>>>> * the load is pinned to the proper branch after deciding >>>>>>>>>> what direction to go; >>>>>>>>>> >>>>>>>>>> * wide membars do need anti-dependences on loads >>>>>>>>>> >>>>>>>>>> So, as a fix I'd disable memory edge updates which bypass any >>>>>>>>>> membars. Does it sound reasonable or am I missing something >>>>>>>>>> important? >>>>>>>>>> >>>>>>>>>> Best regards, >>>>>>>>>> Vladimir Ivanov >>>>>>>>>> >>>>>>>>>> [1] https://bugs.openjdk.java.net/browse/JDK-6912521 >>>>>>>>>> >>>>>>>>>> [2] >>>>>>>>>> http://cr.openjdk.java.net/~vlivanov/misc/antidep/01_initial.png >>>>>>>>>> >>>>>>>>>> [3] >>>>>>>>>> http://cr.openjdk.java.net/~vlivanov/misc/antidep/02_ea_split_unique_types.png >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> [4] >>>>>>>>>> http://cr.openjdk.java.net/~vlivanov/misc/antidep/03_after_alloc_elimination.png >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> [5] >>>>>>>>>> http://cr.openjdk.java.net/~vlivanov/misc/antidep/04_before_matching.png >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> [6] >>>>>>>>>> http://cr.openjdk.java.net/~vlivanov/misc/antidep/05_after_matching.png >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> [7] >>>>>>>>>> #
Internal Error >>>>>>>>>> (/Users/vlivanov/ws/jdk/panama-dev/open/src/hotspot/share/opto/lcm.cpp:1169), >>>>>>>>>> pid=90414, tid=14851 >>>>>>>>>> # assert(false) failed: graph should be schedulable >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> [8] http://cr.openjdk.java.net/~vlivanov/misc/antidep/webrev/ From vladimir.kozlov at oracle.com Tue Mar 6 22:56:37 2018 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 6 Mar 2018 14:56:37 -0800 Subject: RFC: C2: Anti-dependence on a load with a control in presence of a membar In-Reply-To: <7f0e566e-1b31-7435-ee5f-d6f87c9419ca@oracle.com> References: <1c74d8d6-94b7-0864-268b-a7ec32a377f6@oracle.com> <57713069-cb76-e60b-e9db-70e89e15388f@oracle.com> <6823c248-3b38-661d-d03c-4763c4d66528@oracle.com> <30bd475f-8355-9e15-a1d9-c8a73949fe16@oracle.com> <092cbc61-c582-cf08-8a0d-9f50540c8901@oracle.com> <7f0e566e-1b31-7435-ee5f-d6f87c9419ca@oracle.com> Message-ID: <0b8688fc-54bd-eff3-6edd-cab128b7712a@oracle.com> On 3/6/18 2:28 PM, Vladimir Ivanov wrote: >>> >>> So, in case of early expanded ArrayCopy, it'll enable loads to float >>> above the direction check (copy forward/backwards). >> >> It should not be a problem because it is loads from new allocation - >> no check is generated for forward copying: >> >> http://hg.openjdk.java.net/jdk/hs/file/edb65305d3ac/src/hotspot/share/opto/arraycopynode.cpp#l236 >> >> http://hg.openjdk.java.net/jdk/hs/file/edb65305d3ac/src/hotspot/share/opto/arraycopynode.cpp#l326 >> >> >> But it needs to be tested. > > is_alloc_tightly_coupled() covers the case when ArrayCopy immediately > follows AllocateArray [1], but it seems it doesn't completely eliminate > the possibility of overlapping: I'm still concerned about the case when > src == dst, there are some interim stores, but consequent loads still > see the unique instance. In that case, the order of memory operations > should be preserved. Load's memory edge should point to preceding store and not instance if it is the same array.
Memory slice is the same - iy can't bypass store. > > BTW I'm curious why ArrayCopy expansion happens so early (during IGVN). > Can it be delayed till macro expansion phase? ArrayCopy is call which prevents a lot of loop optimizations. > > Also, what about unsafe accesses? It doesn't sound right if they get > their control dependency trimmed and start to float around, even on > non-escaping objects. Yes, unsafe is special case. But in such cases instance will *escape*: http://hg.openjdk.java.net/jdk/hs/file/fde3feaaa4ed/src/hotspot/share/opto/escape.cpp#l742 > >>> >>> Moreover, it can lead to possible change of respective order which >>> can be incorrect w.r.t. accompanying stores. >> >> Memory edge should keep order of stores-loads. Note, other threads >> should not see this memory. > > split_unique_types() can separate loads & stores, so memory edges aren't > enough to order loads in presence of src/dst overlapping: > http://cr.openjdk.java.net/~vlivanov/misc/antidep/ea_after.png > http://cr.openjdk.java.net/~vlivanov/misc/antidep/ea_before.png If stores and loads access the *same* instance (overlap) they will have the same memory alias (slice) and will be correctly ordered. Vladimir > > Best regards, > Vladimir Ivanov > > [1] > http://hg.openjdk.java.net/jdk/hs/file/edb65305d3ac/src/hotspot/share/opto/arraycopynode.hpp#l48 > > >>>>>> On 3/6/18 10:26 PM, Vladimir Kozlov wrote: >>>>>>> On 3/6/18 11:21 AM, Vladimir Kozlov wrote: >>>>>>>> This changes everything. Load is associated with >>>>>>>> non-global-escaping allocation #311 (iid is assigned only in >>>>>>>> such cases). It is allowed its memory edge change in such way. >>>>>>>> >>>>>>>> Why GCM makes unschedulable graph? I don't see a problem in >>>>>>>> 05_after_matching.png. >>>>>>> >>>>>>> Is it because Load's memory (#173) is above membar (#317) but the >>>>>>> Load below because of control? >>>>>> >>>>>> Exactly. Anti-dependences are added from membar (#317) to the >>>>>> loads (#380/...)
and it makes the graph unschedulable in LCM. >>>>>> >>>>>> Best regards, >>>>>> Vladimir Ivanov >>>>>> >>>>>>>> On 3/6/18 10:51 AM, Vladimir Ivanov wrote: >>>>>>>>> >>>>>>>>>> There were several bugs before when we had trouble with loads >>>>>>>>>> which have control edge. As I remember we only require RAW >>>>>>>>>> loads to have such edges. Meaning Load nodes should have only >>>>>>>>>> dependency on memory state. Of cause, there could be exclusions. >>>>>>>>>> >>>>>>>>>> Originally EA can skip all membars for instance's load because >>>>>>>>>> it assumes that it will end-up in Store node into allocated >>>>>>>>>> object which should *follow* instance's allocation. And it can >>>>>>>>>> skip membars (which follow allocation) because nobody see >>>>>>>>>> non-escaping allocation. >>>>>>>>>> >>>>>>>>>> Load (#391) is not instance load from instance array (#363). >>>>>>>>>> It is load from source Arraycopy (#255) (it is not >>>>>>>>>> allocation). So it should not have bypass membars separating >>>>>>>>>> them: >>>>>>>>>> >>>>>>>>>> http://hg.openjdk.java.net/jdk/hs/file/4e82736053ae/src/hotspot/share/opto/escape.cpp#l2698 >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> Updated IR dump during before/after split_unique_types with >>>>>>>>> wider context (and, unfortunately, different node ids): >>>>>>>>> >>>>>>>>> http://cr.openjdk.java.net/~vlivanov/misc/antidep/02_ea_split_unique_types_01.png >>>>>>>>> >>>>>>>>> >>>>>>>>> One detail is missing in the original description: there's >>>>>>>>> another AllocateArray (#311) dominating the ArrayCopy (#389) >>>>>>>>> and loads access it directly. 
>>>>>>>>> >>>>>>>>> ArrayCopy uses #311 as destination, so >>>>>>>>> ArrayCopyNode::may_modify() returns true and stops further >>>>>>>>> analysis: >>>>>>>>> >>>>>>>>> >>>>>>>>> http://hg.openjdk.java.net/jdk/hs/file/edb65305d3ac/src/hotspot/share/opto/escape.cpp#l2705 >>>>>>>>> >>>>>>>>> >>>>>>>>>> So it is really some problem in step 2) in EA. Could be >>>>>>>>>> because only one alias index (memory slice) is used for whole >>>>>>>>>> array access. >>>>>>>>> >>>>>>>>> Unlikely, since I don't see any interference between accesses >>>>>>>>> to different elements during split_unique_types(). >>>>>>>>> >>>>>>>>>> So what memory slice of Merge node (#379) was updated to >>>>>>>>>> bypass membar? >>>>>>>>> >>>>>>>>> It updates instance memory slice corresponding to: >>>>>>>>> bool[int:8]:NotNull:exact+any *,iid=311 >>>>>>>>> >>>>>>>>> Best regards, >>>>>>>>> Vladimir Ivanov >>>>>>>>> >>>>>>>>> >>>>>>>>>> On 3/2/18 6:47 AM, Vladimir Ivanov wrote: >>>>>>>>>>> Hi, >>>>>>>>>>> >>>>>>>>>>> I'm seeing unschedulable graph being produced during GCM when >>>>>>>>>>> adding anti-dependence to a load node with a control >>>>>>>>>>> dependency. I found the root cause, but can't decide how to >>>>>>>>>>> fix it. >>>>>>>>>>> >>>>>>>>>>> Here are steps which lead to the broken graph: >>>>>>>>>>> >>>>>>>>>>> (1) The load causing problems (#391) is added as part of >>>>>>>>>>> specializing ArrayCopy for small arrays (added as part of >>>>>>>>>>> JDK-6912521 [1] in 9). Both control & memory are tied to >>>>>>>>>>> AllocateArray. (IR [2]) >>>>>>>>>>> >>>>>>>>>>> (2) EA proves that AllocateArray (#363, destination) is >>>>>>>>>>> scalar replaceable and during split_unique_types() updates >>>>>>>>>>> corresponding MemoryMerge (#379) and it allows to directly >>>>>>>>>>> use memory produced by ArrayCopy (#255, source) bypassing the >>>>>>>>>>> allocation & membar (#348).
(IR [3]) >>>>>>>>>>> >>>>>>>>>>> (3) After allocation elimination, the load control >>>>>>>>>>> dependency is switched to MemBarCPUOrder (#348) which was >>>>>>>>>>> immediate dominator of eliminated allocation (IR [4]) >>>>>>>>>>> >>>>>>>>>>> (4) After matching the load has control on the membar, but >>>>>>>>>>> not memory (IR before [5] and after [6] matching.) >>>>>>>>>>> >>>>>>>>>>> (5) During GCM, anti-dependence from membar (#317) to the >>>>>>>>>>> load is added, but it makes the graph unschedulable which >>>>>>>>>>> then triggers the assertion [7] during LCM. >>>>>>>>>>> >>>>>>>>>>> Relevant places in the code: [8] >>>>>>>>>>> >>>>>>>>>>> Everything looks fine, except updates of MergeMems in step #2: >>>>>>>>>>> >>>>>>>>>>> * the load is pinned to the proper branch after deciding >>>>>>>>>>> what direction to go; >>>>>>>>>>> >>>>>>>>>>> * wide membars do need anti-dependences on loads >>>>>>>>>>> >>>>>>>>>>> So, as a fix I'd disable memory edge updates which bypass any >>>>>>>>>>> membars. Does it sound reasonable or am I missing something >>>>>>>>>>> important? >>>>>>>>>>> >>>>>>>>>>> Best regards, >>>>>>>>>>> Vladimir Ivanov >>>>>>>>>>> >>>>>>>>>>> [1] https://bugs.openjdk.java.net/browse/JDK-6912521 >>>>>>>>>>> >>>>>>>>>>> [2] >>>>>>>>>>> http://cr.openjdk.java.net/~vlivanov/misc/antidep/01_initial.png >>>>>>>>>>> >>>>>>>>>>> [3] >>>>>>>>>>> http://cr.openjdk.java.net/~vlivanov/misc/antidep/02_ea_split_unique_types.png >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> [4] >>>>>>>>>>> http://cr.openjdk.java.net/~vlivanov/misc/antidep/03_after_alloc_elimination.png >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> [5] >>>>>>>>>>> http://cr.openjdk.java.net/~vlivanov/misc/antidep/04_before_matching.png >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> [6] >>>>>>>>>>> http://cr.openjdk.java.net/~vlivanov/misc/antidep/05_after_matching.png >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> [7] >>>>>>>>>>> #
Internal Error >>>>>>>>>>> (/Users/vlivanov/ws/jdk/panama-dev/open/src/hotspot/share/opto/lcm.cpp:1169), >>>>>>>>>>> pid=90414, tid=14851 >>>>>>>>>>> # assert(false) failed: graph should be schedulable >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> [8] http://cr.openjdk.java.net/~vlivanov/misc/antidep/webrev/ From vladimir.x.ivanov at oracle.com Tue Mar 6 23:57:57 2018 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Wed, 7 Mar 2018 02:57:57 +0300 Subject: RFC: C2: Anti-dependence on a load with a control in presence of a membar In-Reply-To: <0b8688fc-54bd-eff3-6edd-cab128b7712a@oracle.com> References: <1c74d8d6-94b7-0864-268b-a7ec32a377f6@oracle.com> <57713069-cb76-e60b-e9db-70e89e15388f@oracle.com> <6823c248-3b38-661d-d03c-4763c4d66528@oracle.com> <30bd475f-8355-9e15-a1d9-c8a73949fe16@oracle.com> <092cbc61-c582-cf08-8a0d-9f50540c8901@oracle.com> <7f0e566e-1b31-7435-ee5f-d6f87c9419ca@oracle.com> <0b8688fc-54bd-eff3-6edd-cab128b7712a@oracle.com> Message-ID: Thanks for clarifications & suggestions, Vladimir! >> is_alloc_tightly_coupled() covers the case when ArrayCopy immediately >> follows AllocateArray [1], but it seems it doesn't completely >> eliminate the possibility of overlapping: I'm still concerned about >> the case when src == dst, there are some interim stores, but >> consequent loads still see the unique instance. In that case, the >> order of memory operations should be preserved. > > Load's memory edge should point to preceding store and not instance if > it is the same array. Memory slice is the same - iy can't bypass store. Agree. >> >> BTW I'm curious why ArrayCopy expansion happens so early (during >> IGVN). Can it be delayed till macro expansion phase? > > ArrayCopy is call which prevents a lot of loop optimizations. > Got it. >> >> Also, what about unsafe accesses? It doesn't sound right if they get >> their control dependency trimmed and start to float around, even on >> non-escaping objects. > > Yes, unsafe is special case.
But in such cases instance will *escape*: > > http://hg.openjdk.java.net/jdk/hs/file/fde3feaaa4ed/src/hotspot/share/opto/escape.cpp#l742 Unsafe accesses are many-sided, e.g. mismatched accesses. That check doesn't cover them. There's a bug lurking there: JDK-8198543 [1]. It hits an assert [2] in debug build because there's an access to "wide" on-heap slice [3] (Object+off) and there's no field info associated with it. As report says in product binaries it leads to incorrect code being generated. (Haven't added my analysis to the bug yet.) Once all flavors of mismatched accesses make base object escape, I agree that your suggestion to prune control from loads on non-escaping objects should fix the problem. I'll file a bug. Best regards, Vladimir Ivanov [1] https://bugs.openjdk.java.net/browse/JDK-8198543 [2] http://hg.openjdk.java.net/jdk/hs/file/fde3feaaa4ed/src/hotspot/share/opto/escape.cpp#l735 [3] http://hg.openjdk.java.net/jdk/hs/file/fde3feaaa4ed/src/hotspot/share/opto/library_call.cpp#l2459 > > >> >>>> >>>> Moreover, it can lead to possible change of respective order which >>>> can be incorrect w.r.t. accompanying stores. >>> >>> Memory edge should keep order of stores-loads. Note, other threads >>> should not see this memory. >> >> split_unique_types() can separate loads & stores, so memory edges >> aren't enough to order loads in presence of src/dst overlapping: >> http://cr.openjdk.java.net/~vlivanov/misc/antidep/ea_after.png >> http://cr.openjdk.java.net/~vlivanov/misc/antidep/ea_before.png > > If stores and loads access the *same* instance (overlap) they will have > the same memory alias (slice) and will be correctly ordered. > > Vladimir > > >> >> Best regards, >> Vladimir Ivanov >> >> [1] >> http://hg.openjdk.java.net/jdk/hs/file/edb65305d3ac/src/hotspot/share/opto/arraycopynode.hpp#l48 >> >> >>>>>>> On 3/6/18 10:26 PM, Vladimir Kozlov wrote: >>>>>>>> On 3/6/18 11:21 AM, Vladimir Kozlov wrote: >>>>>>>>> This changes everything.
Load is associated with >>>>>>>>> non-global-escaping allocation #311 (iid is assigned only in >>>>>>>>> such cases). It is allowed its memory edge change in such way. >>>>>>>>> >>>>>>>>> Why GCM makes unschedulable graph? I don't see a problem in >>>>>>>>> 05_after_matching.png. >>>>>>>> >>>>>>>> Is it because Load's memory (#173) is above membar (#317) but >>>>>>>> the Load below because of control? >>>>>>> >>>>>>> Exactly. Anti-dependences are added from membar (#317) to the >>>>>>> loads (#380/...) and it makes the graph unschedulable in LCM. >>>>>>> >>>>>>> Best regards, >>>>>>> Vladimir Ivanov >>>>>>> >>>>>>>>> On 3/6/18 10:51 AM, Vladimir Ivanov wrote: >>>>>>>>>> >>>>>>>>>>> There were several bugs before when we had trouble with loads >>>>>>>>>>> which have control edge. As I remember we only require RAW >>>>>>>>>>> loads to have such edges. Meaning Load nodes should have only >>>>>>>>>>> dependency on memory state. Of cause, there could be exclusions. >>>>>>>>>>> >>>>>>>>>>> Originally EA can skip all membars for instance's load >>>>>>>>>>> because it assumes that it will end-up in Store node into >>>>>>>>>>> allocated object which should *follow* instance's allocation. >>>>>>>>>>> And it can skip membars (which follow allocation) because >>>>>>>>>>> nobody see non-escaping allocation. >>>>>>>>>>> >>>>>>>>>>> Load (#391) is not instance load from instance array (#363). >>>>>>>>>>> It is load from source Arraycopy (#255) (it is not >>>>>>>>>>> allocation). 
So it should not have bypass membars separating >>>>>>>>>>> them: >>>>>>>>>>> >>>>>>>>>>> http://hg.openjdk.java.net/jdk/hs/file/4e82736053ae/src/hotspot/share/opto/escape.cpp#l2698 >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Updated IR dump during before/after split_unique_types with >>>>>>>>>> wider context (and, unfortunately, different node ids): >>>>>>>>>> >>>>>>>>>> http://cr.openjdk.java.net/~vlivanov/misc/antidep/02_ea_split_unique_types_01.png >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> One detail is missing in the original description: there's >>>>>>>>>> another AllocateArray (#311) dominating the ArrayCopy (#389) >>>>>>>>>> and loads access it directly. >>>>>>>>>> >>>>>>>>>> ArrayCopy uses #311 as destination, so >>>>>>>>>> ArrayCopyNode::may_modify() returns true and stops further >>>>>>>>>> analysis: >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> http://hg.openjdk.java.net/jdk/hs/file/edb65305d3ac/src/hotspot/share/opto/escape.cpp#l2705 >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> So it is really some problem in step 2) in EA. Could be >>>>>>>>>>> because only one alias index (memory slice) is used for whole >>>>>>>>>>> array access. >>>>>>>>>> >>>>>>>>>> Unlikely, since I don't see any interference between accesses >>>>>>>>>> to different elements during split_unique_types(). >>>>>>>>>> >>>>>>>>>>> So what memory slice of Merge node (#379) was updated to >>>>>>>>>>> bypass membar? >>>>>>>>>> >>>>>>>>>> It updates instance memory slice corresponding to: >>>>>>>>>> bool[int:8]:NotNull:exact+any *,iid=311 >>>>>>>>>> >>>>>>>>>> Best regards, >>>>>>>>>> Vladimir Ivanov >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> On 3/2/18 6:47 AM, Vladimir Ivanov wrote: >>>>>>>>>>>> Hi, >>>>>>>>>>>> >>>>>>>>>>>> I'm seeing unschedulable graph being produced during GCM >>>>>>>>>>>> when adding anti-dependence to a load node with a control >>>>>>>>>>>> dependency.
I found the root cause, but can't decide how to >>>>>>>>>>>> fix it. >>>>>>>>>>>> >>>>>>>>>>>> Here are the steps which lead to the broken graph: >>>>>>>>>>>> >>>>>>>>>>>> (1) The load causing problems (#391) is added as part of >>>>>>>>>>>> specializing ArrayCopy for small arrays (added as part of >>>>>>>>>>>> JDK-6912521 [1] in 9). Both control & memory are tied to >>>>>>>>>>>> AllocateArray. (IR [2]) >>>>>>>>>>>> >>>>>>>>>>>> (2) EA proves that AllocateArray (#363, destination) is >>>>>>>>>>>> scalar replaceable and during split_unique_types() updates >>>>>>>>>>>> the corresponding MemoryMerge (#379), which allows the load to directly >>>>>>>>>>>> use memory produced by ArrayCopy (#255, source), bypassing >>>>>>>>>>>> the allocation & membar (#348). (IR [3]) >>>>>>>>>>>> >>>>>>>>>>>> (3) After allocation elimination, the load's control >>>>>>>>>>>> dependency is switched to MemBarCPUOrder (#348), which was the >>>>>>>>>>>> immediate dominator of the eliminated allocation. (IR [4]) >>>>>>>>>>>> >>>>>>>>>>>> (4) After matching, the load has control on the membar, but >>>>>>>>>>>> not memory. (IR before [5] and after [6] matching) >>>>>>>>>>>> >>>>>>>>>>>> (5) During GCM, an anti-dependence from membar (#317) to the >>>>>>>>>>>> load is added, but it makes the graph unschedulable, which >>>>>>>>>>>> then triggers the assertion [7] during LCM. >>>>>>>>>>>> >>>>>>>>>>>> Relevant places in the code: [8] >>>>>>>>>>>> >>>>>>>>>>>> Everything looks fine, except the updates of MergeMems in step #2: >>>>>>>>>>>> >>>>>>>>>>>> * the load is pinned to the proper branch after deciding >>>>>>>>>>>> what direction to go; >>>>>>>>>>>> >>>>>>>>>>>> * wide membars do need anti-dependences on loads. >>>>>>>>>>>> >>>>>>>>>>>> So, as a fix I'd disable memory edge updates which bypass >>>>>>>>>>>> any membars. Does it sound reasonable or am I missing >>>>>>>>>>>> something important?
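The cycle behind step (5) can be reproduced in a toy model. The sketch below is a hypothetical illustration, not C2's Node, Block or PhaseCFG code: each scheduling constraint is an edge "u must be scheduled before v", and a topological sort (Kahn's algorithm) decides whether any schedule exists at all. The control edge keeps the load below the membar, while the anti-dependence added in GCM requires the load to run before the membar kills the memory it reads; together the two edges form a cycle.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy dependence graph (hypothetical, not C2's data structures): an edge
// u -> v means "u must be scheduled before v".
struct ToyScheduleGraph {
  explicit ToyScheduleGraph(std::size_t n) : succ(n) {}

  void must_precede(int before, int after) { succ[before].push_back(after); }

  // Kahn's algorithm: returns true iff a topological order (a schedule) exists.
  bool schedulable() const {
    std::vector<int> indegree(succ.size(), 0);
    for (const auto& out : succ)
      for (int v : out) ++indegree[v];

    std::vector<int> ready;
    for (std::size_t i = 0; i < succ.size(); ++i)
      if (indegree[i] == 0) ready.push_back(static_cast<int>(i));

    std::size_t scheduled = 0;
    while (!ready.empty()) {
      int u = ready.back();
      ready.pop_back();
      ++scheduled;
      for (int v : succ[u])
        if (--indegree[v] == 0) ready.push_back(v);
    }
    return scheduled == succ.size();  // leftover nodes sit on a cycle
  }

  std::vector<std::vector<int>> succ;
};

// Models step (5): the load is control-dependent on the membar (membar -> load);
// the membar kills the load's memory, so GCM's anti-dependence demands that the
// load be scheduled first (load -> membar).
bool load_below_membar_is_schedulable(bool add_anti_dependence) {
  enum { kMemBar = 0, kLoad = 1 };
  ToyScheduleGraph g(2);
  g.must_precede(kMemBar, kLoad);    // control edge
  if (add_anti_dependence) {
    g.must_precede(kLoad, kMemBar);  // anti-dependence edge
  }
  return g.schedulable();
}
```

With only the control edge the toy graph is schedulable; adding the anti-dependence makes it unschedulable, which mirrors the "graph should be schedulable" assertion quoted in [7].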
>>>>>>>>>>>> >>>>>>>>>>>> Best regards, >>>>>>>>>>>> Vladimir Ivanov >>>>>>>>>>>> >>>>>>>>>>>> [1] https://bugs.openjdk.java.net/browse/JDK-6912521 >>>>>>>>>>>> >>>>>>>>>>>> [2] >>>>>>>>>>>> http://cr.openjdk.java.net/~vlivanov/misc/antidep/01_initial.png >>>>>>>>>>>> >>>>>>>>>>>> [3] >>>>>>>>>>>> http://cr.openjdk.java.net/~vlivanov/misc/antidep/02_ea_split_unique_types.png >>>>>>>>>>>> >>>>>>>>>>>> [4] >>>>>>>>>>>> http://cr.openjdk.java.net/~vlivanov/misc/antidep/03_after_alloc_elimination.png >>>>>>>>>>>> >>>>>>>>>>>> [5] >>>>>>>>>>>> http://cr.openjdk.java.net/~vlivanov/misc/antidep/04_before_matching.png >>>>>>>>>>>> >>>>>>>>>>>> [6] >>>>>>>>>>>> http://cr.openjdk.java.net/~vlivanov/misc/antidep/05_after_matching.png >>>>>>>>>>>> >>>>>>>>>>>> [7] >>>>>>>>>>>> # Internal Error >>>>>>>>>>>> (/Users/vlivanov/ws/jdk/panama-dev/open/src/hotspot/share/opto/lcm.cpp:1169), >>>>>>>>>>>> pid=90414, tid=14851 >>>>>>>>>>>> # assert(false) failed: graph should be schedulable >>>>>>>>>>>> >>>>>>>>>>>> [8] http://cr.openjdk.java.net/~vlivanov/misc/antidep/webrev/ From tobias.hartmann at oracle.com Wed Mar 7 08:19:27 2018 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Wed, 7 Mar 2018 09:19:27 +0100 Subject: RFR(L): 8198691: CodeHeap State Analytics In-Reply-To: <8EB0096D-1020-4E86-82E1-2D7D564601AE@sap.com> References: <250CD3F0-F4AB-4EA1-9EA9-8136442FDFF9@sap.com> <1b411eee-29c8-b977-96b0-45145ebbfcf9@oracle.com> <8EB0096D-1020-4E86-82E1-2D7D564601AE@sap.com> Message-ID: <14c67b9d-4e1d-cb49-b29a-0d4cd02fdcc9@oracle.com> Hi Lutz, On 06.03.2018 18:03, Schmidt, Lutz wrote: > - failing AOT tests: I will restrict analytics to FOR_ALL_ALLOCABLE_HEAPS(), as suggested by Vladimir. Not sure if that will heal the tests. > - SIGFPE in CodeHeap::aggregate(): could that be related to the "memset" error? Never had that here at SAP, neither in OpenJDK test nor in SAP JVM.
If that issue persists with the new webrev (coming up soon), I may need ad'l information. If not activated by command line argument or explicit call, the new code has _ZERO_ effect. Yes, I've executed all tests with the new flag enabled. I'll re-run with the new webrev and let you know the result. > - Documentation format: which format would you need? The PDF I uploaded is generated from our internal Wiki. It was the least effort for now. I could also provide plain text (losing all the formatting) or MS Word .docx (hopefully preserving most of the formatting). I think the pdf should be fine for now (the doc team will have a look). I've created a subtask to keep track of the documentation in the Tools Reference Guide: https://bugs.openjdk.java.net/browse/JDK-8199213 > - Documentation location: I'm not familiar with the policies that direct documentation to a certain place. Please put it where you think it's appropriate. Of course I will help wherever I can. I'm not sure either. I would assume that a summary will go into the Tools Reference Guide (and maybe the release notes or other guides) and that we should provide a link to the full documentation. Let's see what the doc team comes up with. I've added a comment to the doc task. > - Documentation contents/writing style: I will ask a SAP documentation specialist to have a look at it once it's final content-wise. That might eliminate some German-sounding English text. Sounds good, I know that problem :) > - Printing on CodeCache full condition: We have that in our SAP JVM product. Must be activated by a command line argument. Already proved helpful. Yes, I think it would be nice to add this to the patch. > So please stay tuned. I'm working hard to get all the modifications ready. It's a lot to do and, unfortunately, there is that day-to-day business demanding some attention as well. Sure, don't hurry. I'll take a look at the new webrev once it's available. 
Best regards, Tobias From lutz.schmidt at sap.com Wed Mar 7 09:26:07 2018 From: lutz.schmidt at sap.com (Schmidt, Lutz) Date: Wed, 7 Mar 2018 09:26:07 +0000 Subject: RFR(L): 8198691: CodeHeap State Analytics In-Reply-To: <14c67b9d-4e1d-cb49-b29a-0d4cd02fdcc9@oracle.com> References: <250CD3F0-F4AB-4EA1-9EA9-8136442FDFF9@sap.com> <1b411eee-29c8-b977-96b0-45145ebbfcf9@oracle.com> <8EB0096D-1020-4E86-82E1-2D7D564601AE@sap.com> <14c67b9d-4e1d-cb49-b29a-0d4cd02fdcc9@oracle.com> Message-ID: Hi Tobias, just one quick question on printing on CodeCache full condition: We have the print call in CodeCache::report_codemem_full(): if ((heap->full_count() == 0) || print) { ... if (heap->full_count() == 0) { CompileBroker::print_heapinfo(tty, "all", "4096"); // details, may be a lot! } } ... Would that location be ok for you? Thanks, Lutz On 07.03.18, 09:19, "Tobias Hartmann" wrote: Hi Lutz, On 06.03.2018 18:03, Schmidt, Lutz wrote: > - failing AOT tests: I will restrict analytics to FOR_ALL_ALLOCABLE_HEAPS(), as suggested by Vladimir. Not sure if that will heal the tests. > - SIGFPE in CodeHeap::aggregate(): could that be related to the "memset" error? Never had that here at SAP, neither in OpenJDK test nor in SAP JVM. If that issue persists with the new webrev (coming up soon), I may need ad'l information. If not activated by command line argument or explicit call, the new code has _ZERO_ effect. Yes, I've executed all tests with the new flag enabled. I'll re-run with the new webrev and let you know the result. > - Documentation format: which format would you need? The PDF I uploaded is generated from our internal Wiki. It was the least effort for now. I could also provide plain text (losing all the formatting) or MS Word .docx (hopefully preserving most of the formatting). I think the pdf should be fine for now (the doc team will have a look).
I've created a subtask to keep track of the documentation in the Tools Reference Guide: https://bugs.openjdk.java.net/browse/JDK-8199213 > - Documentation location: I'm not familiar with the policies that direct documentation to a certain place. Please put it where you think it's appropriate. Of course I will help wherever I can. I'm not sure either. I would assume that a summary will go into the Tools Reference Guide (and maybe the release notes or other guides) and that we should provide a link to the full documentation. Let's see what the doc team comes up with. I've added a comment to the doc task. > - Documentation contents/writing style: I will ask a SAP documentation specialist to have a look at it once it's final content-wise. That might eliminate some German-sounding English text. Sounds good, I know that problem :) > - Printing on CodeCache full condition: We have that in our SAP JVM product. Must be activated by a command line argument. Already proved helpful. Yes, I think it would be nice to add this to the patch. > So please stay tuned. I'm working hard to get all the modifications ready. It's a lot to do and, unfortunately, there is that day-to-day business demanding some attention as well. Sure, don't hurry. I'll take a look at the new webrev once it's available. Best regards, Tobias From tobias.hartmann at oracle.com Wed Mar 7 09:40:34 2018 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Wed, 7 Mar 2018 10:40:34 +0100 Subject: RFR(L): 8198691: CodeHeap State Analytics In-Reply-To: References: <250CD3F0-F4AB-4EA1-9EA9-8136442FDFF9@sap.com> <1b411eee-29c8-b977-96b0-45145ebbfcf9@oracle.com> <8EB0096D-1020-4E86-82E1-2D7D564601AE@sap.com> <14c67b9d-4e1d-cb49-b29a-0d4cd02fdcc9@oracle.com> Message-ID: Hi Lutz, On 07.03.2018 10:26, Schmidt, Lutz wrote: > just one quick question on printing on CodeCache full condition: > > We have the print call in CodeCache::report_codemem_full(): > if ((heap->full_count() == 0) || print) { > ... 
> if (heap->full_count() == 0) { > CompileBroker::print_heapinfo(tty, "all", "4096"); // details, may be a lot! > } > } > ... > > Would that location be ok for you? Yes, but I would prefer to guard this by PrintCodeHeapState (or the unified logging equivalent) to avoid too much information being printed in production systems because it might confuse the non-expert users. Thanks, Tobias From tobias.hartmann at oracle.com Wed Mar 7 14:47:18 2018 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Wed, 7 Mar 2018 15:47:18 +0100 Subject: [11] RFR(S) 8199066: [JVMCI] EagerJVMCI option should also initialize the JVMCI compiler In-Reply-To: <1f73289e-590a-ec8f-964f-092c614b6d3d@oracle.com> References: <1f73289e-590a-ec8f-964f-092c614b6d3d@oracle.com> Message-ID: Hi Vladimir, looks good to me. Best regards, Tobias On 06.03.2018 21:15, Vladimir Kozlov wrote: > http://cr.openjdk.java.net/~kvn/8199066/webrev.00/ > https://bugs.openjdk.java.net/browse/JDK-8199066 > > JDK-8195632 introduced the EagerJVMCI option to avoid issues with lazy initialization of Graal. > However, the change only initialized the JVMCI subsystem without initializing Graal itself. > > Contributed by Doug Simon. > From vladimir.kozlov at oracle.com Wed Mar 7 16:30:13 2018 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 7 Mar 2018 08:30:13 -0800 Subject: [11] RFR(S) 8199066: [JVMCI] EagerJVMCI option should also initialize the JVMCI compiler In-Reply-To: References: <1f73289e-590a-ec8f-964f-092c614b6d3d@oracle.com> Message-ID: <26df778d-47cc-0ac6-960b-0f93fe366329@oracle.com> Thank you, Tobias Vladimir On 3/7/18 6:47 AM, Tobias Hartmann wrote: > Hi Vladimir, > > looks good to me. > > Best regards, > Tobias > > On 06.03.2018 21:15, Vladimir Kozlov wrote: >> http://cr.openjdk.java.net/~kvn/8199066/webrev.00/ >> https://bugs.openjdk.java.net/browse/JDK-8199066 >> >> JDK-8195632 introduced the EagerJVMCI option to avoid issues with lazy initialization of Graal.
>> However, the change only initialized the JVMCI subsystem without initializing Graal itself. >> >> Contributed by Doug Simon. >> From vladimir.kozlov at oracle.com Wed Mar 7 23:23:28 2018 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 7 Mar 2018 15:23:28 -0800 Subject: [11] RFR(M) 8197235: src/hotspot/share/jvmci/jvmciCompilerToVM.cpp takes 4 minutes to compile on windows Message-ID: http://cr.openjdk.java.net/~kvn/8197235/webrev.00/ https://bugs.openjdk.java.net/browse/JDK-8197235 To avoid long compilation time of jvmciCompilerToVM.cpp I moved most expensive methods to new file jvmciCompilerToVMInit.cpp and switch off C++ compiler optimization for it on Windows and Solaris where the problem is seen. Changes are copy+paste of code from one file into an other. I also removed #include which are not required for compilation. The bug contains performance data. Ran pre-integration testing. -- Thanks, Vladimir From erik.joelsson at oracle.com Wed Mar 7 23:58:44 2018 From: erik.joelsson at oracle.com (Erik Joelsson) Date: Wed, 7 Mar 2018 15:58:44 -0800 Subject: [11] RFR(M) 8197235: src/hotspot/share/jvmci/jvmciCompilerToVM.cpp takes 4 minutes to compile on windows In-Reply-To: References: Message-ID: Looks good to me. Thanks for fixing this! /Erik On 2018-03-07 15:23, Vladimir Kozlov wrote: > http://cr.openjdk.java.net/~kvn/8197235/webrev.00/ > https://bugs.openjdk.java.net/browse/JDK-8197235 > > To avoid long compilation time of jvmciCompilerToVM.cpp I moved most > expensive methods to new file jvmciCompilerToVMInit.cpp and switch off > C++ compiler optimization for it on Windows and Solaris where the > problem is seen. > > Changes are copy+paste of code from one file into an other. I also > removed #include which are not required for compilation. > > The bug contains performance data. > > Ran pre-integration testing. 
> From vladimir.kozlov at oracle.com Thu Mar 8 00:43:39 2018 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 7 Mar 2018 16:43:39 -0800 Subject: [11] RFR(M) 8197235: src/hotspot/share/jvmci/jvmciCompilerToVM.cpp takes 4 minutes to compile on windows In-Reply-To: References: Message-ID: Thank you, Erik Vladimir K On 3/7/18 3:58 PM, Erik Joelsson wrote: > Looks good to me. Thanks for fixing this! > > /Erik > > > On 2018-03-07 15:23, Vladimir Kozlov wrote: >> http://cr.openjdk.java.net/~kvn/8197235/webrev.00/ >> https://bugs.openjdk.java.net/browse/JDK-8197235 >> >> To avoid long compilation time of jvmciCompilerToVM.cpp I moved most >> expensive methods to new file jvmciCompilerToVMInit.cpp and switch off >> C++ compiler optimization for it on Windows and Solaris where the >> problem is seen. >> >> Changes are copy+paste of code from one file into an other. I also >> removed #include which are not required for compilation. >> >> The bug contains performance data. >> >> Ran pre-integration testing. >> > From vladimir.kozlov at oracle.com Thu Mar 8 05:02:59 2018 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 7 Mar 2018 21:02:59 -0800 Subject: [11] RFR(S) 8199212: [TESTBUG] don't run compiler/aot tests with -Xcomp Message-ID: <1c0527ac-acda-8124-b2de-99e5b058c185@oracle.com> http://cr.openjdk.java.net/~kvn/8199212/webrev.00/ https://bugs.openjdk.java.net/browse/JDK-8199212 Our testing shows that AOT jtreg tests consume a lot of time when running with -Xcomp. Running AOT compiler in these tests with -Xcomp is not helpful.
-- Thanks, Vladimir From jcbeyler at google.com Fri Mar 9 00:00:17 2018 From: jcbeyler at google.com (JC Beyler) Date: Fri, 09 Mar 2018 00:00:17 +0000 Subject: JDK-8171119: Low-Overhead Heap Profiling In-Reply-To: References: <5A819F10.8040201@oracle.com> <5A8414AC.3020209@oracle.com> Message-ID: Hi all, I apologize for the delay but I wanted to add an event system and that took a bit longer than expected and I also reworked the code to take into account the deprecation of FastTLABRefill. This update has four parts: A) I moved the implementation from Thread to ThreadHeapSampler inside of Thread. Would you prefer it as a pointer inside of Thread or like this works for you? Second question would be would you rather have an association outside of Thread altogether that tries to remember when threads are live and then we would have something like: ThreadHeapSampler::get_sampling_size(this_thread); I worry about the overhead of this but perhaps it is not too too bad? B) I also have been working on the Allocation event system that sends out a notification at each sampled event. This will be practical when wanting to do something at the allocation point. I'm also looking at if the whole heapMonitoring code could not reside in the agent code and not in the JDK. I'm not convinced but I'm talking to Serguei about it to see/assess :) - Also added two tests for the new event subsystem C) Removed the slow_path fields inside the TLAB code since now FastTLABRefill is deprecated D) Updated the JVMTI documentation and specification for the methods. 
So the incremental webrev is here: http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.09_10/ and the full webrev is here: http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.10 I believe I have updated the various JIRA issues that track this :) Thanks for your input, Jc On Wed, Feb 14, 2018 at 10:34 PM, JC Beyler wrote: > Hi Erik, > > I inlined my answers, which the last one seems to answer Robbin's concerns > about the same thing (adding things to Thread). > > On Wed, Feb 14, 2018 at 2:51 AM, Erik Österlund > wrote: > >> Hi JC, >> >> Comments are inlined below. >> >> >> On 2018-02-13 06:18, JC Beyler wrote: >> >> Hi Erik, >> >> Thanks for your answers, I've now inlined my own answers/comments. >> >> I've done a new webrev here: >> http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.08/ >> >> The incremental is here: >> http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.07_08/ >> >> Note to all: >> - I've been integrating changes from Erin/Serguei/David comments so >> this webrev incremental is a bit an answer to all comments in one. I >> apologize for that :) >> >> >> On Mon, Feb 12, 2018 at 6:05 AM, Erik Österlund < >> erik.osterlund at oracle.com> wrote: >> >>> Hi JC, >>> >>> Sorry for the delayed reply. >>> >>> Inlined answers: >>> >>> >>> On 2018-02-06 00:04, JC Beyler wrote: >>> >>>> Hi Erik, >>>> >>>> (Renaming this to be folded into the newly renamed thread :)) >>>> >>>> First off, thanks a lot for reviewing the webrev! I appreciate it! >>>> >>>> I updated the webrev to: >>>> http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.05a/ >>>> >>>> And the incremental one is here: >>>> http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.04_05a/ >>>> >>>> It contains: >>>> - The change for since from 9 to 11 for the jvmti.xml >>>> - The use of the OrderAccess for initialized >>>> - Clearing the oop >>>> >>>> I also have inlined my answers to your comments. The biggest question >>>> will come from the multiple *_end variables.
A bit of the logic there >>>> is due to handling the slow path refill vs fast path refill and >>>> checking that the rug was not pulled underneath the slowpath. I >>>> believe that a previous comment was that TlabFastRefill was going to >>>> be deprecated. >>>> >>>> If this is true, we could revert this code a bit and just do a : if >>>> TlabFastRefill is enabled, disable this. And then deprecate that when >>>> TlabFastRefill is deprecated. >>>> >>>> This might simplify this webrev and I can work on a follow-up that >>>> either: removes TlabFastRefill if Robbin does not have the time to do >>>> it or add the support to the assembly side to handle this correctly. >>>> What do you think? >>>> >>> >>> I support removing TlabFastRefill, but I think it is good to not depend >>> on that happening first. >>> >>> >> >> I'm slowly pushing on the FastTLABRefill ( >> >> https://bugs.openjdk.java.net/browse/JDK-8194084), I agree on keeping >> both separate for now though so that we can think of both differently >> >> >> >>> Now, below, inlined are my answers: >>>> >>>> On Fri, Feb 2, 2018 at 8:44 AM, Erik Österlund >>>> wrote: >>>> >>>>> Hi JC, >>>>> >>>>> Hope I am reviewing the right version of your work. Here goes... >>>>> >>>>> src/hotspot/share/gc/shared/collectedHeap.inline.hpp: >>>>> >>>>> 159 AllocTracer::send_allocation_outside_tlab(klass, result, >>>>> size * >>>>> HeapWordSize, THREAD); >>>>> 160 >>>>> 161 THREAD->tlab().handle_sample(THREAD, result, size); >>>>> 162 return result; >>>>> 163 } >>>>> >>>>> Should not call tlab()->X without checking if (UseTLAB) IMO. >>>>> >>>>> Done! >>>> >>> >>> More about this later. >>> >>> >>> >>>> src/hotspot/share/gc/shared/threadLocalAllocBuffer.cpp: >>>>> >>>>> So first of all, there seem to be quite a few ends. There is an "end", a >>>>> "hard >>>>> end", a "slow path end", and an "actual end". Moreover, it seems like >>>>> the >>>>> "hard end" is actually further away than the "actual end".
So the >>>>> "hard end" >>>>> seems like more of a "really definitely actual end" or something. I >>>>> don't >>>>> know about you, but I think it looks kind of messy. In particular, I >>>>> don't >>>>> feel like the name "actual end" reflects what it represents, >>>>> especially when >>>>> there is another end that is behind the "actual end". >>>>> >>>>> 413 HeapWord* ThreadLocalAllocBuffer::hard_end() { >>>>> 414 // Did a fast TLAB refill occur? >>>>> 415 if (_slow_path_end != _end) { >>>>> 416 // Fix up the actual end to be now the end of this TLAB. >>>>> 417 _slow_path_end = _end; >>>>> 418 _actual_end = _end; >>>>> 419 } >>>>> 420 >>>>> 421 return _actual_end + alignment_reserve(); >>>>> 422 } >>>>> >>>>> I really do not like making getters unexpectedly have these kind of >>>>> side >>>>> effects. It is not expected that when you ask for the "hard end", you >>>>> implicitly update the "slow path end" and "actual end" to new values. >>>>> >>>>> As I said, a lot of this is due to the FastTlabRefill. If I make this >>>> not supporting FastTlabRefill, this goes away. The reason the system >>>> needs to update itself at the get is that you only know at that get if >>>> things have shifted underneath the tlab slow path. I am not sure of >>>> really better names (naming is hard!), perhaps we could do these >>>> names: >>>> >>>> - current_tlab_end // Either the allocated tlab end or a sampling >>>> point >>>> - last_allocation_address // The end of the tlab allocation >>>> - last_slowpath_allocated_end // In case a fast refill occurred the >>>> end might have changed, this is to remember slow vs fast past refills >>>> >>>> the hard_end method can be renamed to something like: >>>> tlab_end_pointer() // The end of the lab including a bit of >>>> alignment reserved bytes >>>> >>> >>> Those names sound better to me. Could you please provide a mapping from >>> the old names to the new names so I understand which one is which please? 
>>> >>> This is my current guess of what you are proposing: >>> >>> end -> current_tlab_end >>> actual_end -> last_allocation_address >>> slow_path_end -> last_slowpath_allocated_end >>> hard_end -> tlab_end_pointer >>> >>> >> Yes that is correct, that was what I was proposing. >> >> >>> I would prefer this naming: >>> >>> end -> slow_path_end // the end for taking a slow path; either due to >>> sampling or refilling >>> actual_end -> allocation_end // the end for allocations >>> slow_path_end -> last_slow_path_end // last address for slow_path_end >>> (as opposed to allocation_end) >>> hard_end -> reserved_end // the end of the reserved space of the TLAB >>> >>> About setting things in the getter... that still seems like a very >>> unpleasant thing to me. It would be better to inspect the call hierarchy >>> and explicitly update the ends where they need updating, and assert in the >>> getter that they are in sync, rather than implicitly setting various ends >>> as a surprising side effect in a getter. It looks like the call hierarchy >>> is very small. With my new naming convention, reserved_end() would >>> presumably return _allocation_end + alignment_reserve(), and have an assert >>> checking that _allocation_end == _last_slow_path_allocation_end, >>> complaining that this invariant must hold, and that a caller to this >>> function, such as make_parsable(), must first explicitly synchronize the >>> ends as required, to honor that invariant. >>> >>> >> >> I've renamed the variables to how you preferred it except for the _end >> one. I did: >> current_end >> last_allocation_address >> tlab_end_ptr >> >> The reason is that the architecture dependent code uses the thread.hpp API >> and it already has tlab included into the name so it becomes >> tlab_current_end (which is better than tlab_current_tlab_end in my opinion).
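A minimal sketch of the "no side effects in the getter" idea, assuming plain word offsets instead of HeapWord* and an arbitrary constant in place of alignment_reserve(). The field names follow Erik's proposed naming, but the struct itself is hypothetical and is not HotSpot's ThreadLocalAllocBuffer. The fix-up that hard_end() performed implicitly becomes an explicit sync step. The getter only asserts the condition hard_end() used to test (last_slow_path_end == slow_path_end), since while sampling allocation_end can legitimately lie beyond slow_path_end.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical toy model of a TLAB's "ends" (word offsets, not HeapWord*).
struct ToyTlab {
  static constexpr std::size_t kAlignmentReserve = 2;  // stand-in for alignment_reserve()

  std::size_t slow_path_end = 0;       // where the fast path bails out (sample point or refill)
  std::size_t allocation_end = 0;      // real end for allocations
  std::size_t last_slow_path_end = 0;  // slow_path_end as last recorded by the slow path

  // Explicit fix-up for a fast-path refill that moved slow_path_end without
  // going through the slow path; callers (e.g. a make_parsable()-like routine)
  // run this before asking for reserved_end().
  void sync_ends_after_fast_refill() {
    if (last_slow_path_end != slow_path_end) {
      last_slow_path_end = slow_path_end;
      allocation_end = slow_path_end;
    }
  }

  // Side-effect-free getter: it only checks that the ends were synchronized.
  std::size_t reserved_end() const {
    assert(last_slow_path_end == slow_path_end &&
           "call sync_ends_after_fast_refill() before reserved_end()");
    return allocation_end + kAlignmentReserve;
  }
};
```

The design choice is the one argued above: the mutation lives in one clearly named method with a small call hierarchy, and a forgotten sync shows up as an assertion failure instead of a silent update inside a getter.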
>> >> I also moved the update into a separate method with a TODO that says to >> remove it when FastTLABRefill is deprecated >> >> >> This looks a lot better now. Thanks. >> >> Note that the following comment now needs updating accordingly in >> threadLocalAllocBuffer.hpp: >> >> 41 // Heap sampling is performed via the end/actual_end fields. 42 // actual_end contains the real end of the tlab allocation, 43 // whereas end can be set to an arbitrary spot in the tlab to 44 // trip the return and sample the allocation. 45 // slow_path_end is used to track if a fast tlab refill occured 46 // between slowpath calls. >> >> There might be other comments too, I have not looked in detail. >> > > This was the only spot that still had an actual_end, I fixed it now. I'll > do a sweep to double check other comments. > > >> >> >> >> >> >>> >>> Not sure it's better but before updating the webrev, I wanted to try >>>> to get input/consensus :) >>>> >>>> (Note hard_end was always further off than end). >>>> >>>> src/hotspot/share/prims/jvmti.xml: >>>>> >>>>> 10357 >>>>> 10358 >>>>> 10359 Can sample the heap. >>>>> 10360 If this capability is enabled then the heap sampling >>>>> methods >>>>> can be called. >>>>> 10361 >>>>> 10362 >>>>> >>>>> Looks like this capability should not be "since 9" if it gets >>>>> integrated >>>>> now. >>>>> >>>> Updated now to 11, crossing my fingers :) >>>> >>>> >>>> src/hotspot/share/runtime/heapMonitoring.cpp: >>>>> >>>>> 448 if (is_alive->do_object_b(value)) { >>>>> 449 // Update the oop to point to the new object if it is >>>>> still >>>>> alive. >>>>> 450 f->do_oop(&(trace.obj)); >>>>> 451 >>>>> 452 // Copy the old trace, if it is still live. >>>>> 453 _allocated_traces->at_put(curr_pos++, trace); >>>>> 454 >>>>> 455 // Store the live trace in a cache, to be served up on >>>>> /heapz. 
>>>>> 456 _traces_on_last_full_gc->append(trace); >>>>> 457 >>>>> 458 count++; >>>>> 459 } else { >>>>> 460 // If the old trace is no longer live, add it to the >>>>> list of >>>>> 461 // recently collected garbage. >>>>> 462 store_garbage_trace(trace); >>>>> 463 } >>>>> >>>>> In the case where the oop was not live, I would like it to be >>>>> explicitly >>>>> cleared. >>>>> >>>> Done I think how you wanted it. Let me know because I'm not familiar >>>> with the RootAccess API. I'm unclear if I'm doing this right or not so >>>> reviews of these parts are highly appreciated. Robbin had talked of >>>> perhaps later pushing this all into a OopStorage, should I do this now >>>> do you think? Or can that wait a second webrev later down the road? >>>> >>> >>> I think using handles can and should be done later. You can use the >>> Access API now. >>> I noticed that you are missing an #include "oops/access.inline.hpp" in >>> your heapMonitoring.cpp file. >>> >>> >> The missing header is there for me so I don't know, I made sure it is >> present in the latest webrev. Sorry about that. >> >> >> >>> + Did I clear it the way you wanted me to or were you thinking of >>>> something else? >>>> >>> >>> That is precisely how I wanted it to be cleared. Thanks. >>> >>> + Final question here, seems like if I were to want to not do the >>>> f->do_oop directly on the trace.obj, I'd need to do something like: >>>> >>>> f->do_oop(&value); >>>> ... >>>> trace->store_oop(value); >>>> >>>> to update the oop internally. Is that right/is that one of the >>>> advantages of going to the Oopstorage sooner than later? >>>> >>> >>> I think you really want to do the do_oop on the root directly. Is there >>> a particular reason why you would not want to do that? >>> Otherwise, yes - the benefit with using the handle approach is that you >>> do not need to call do_oop explicitly in your code. 
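The heapMonitoring.cpp loop quoted above (lines 448-463: keep live traces and update their object references, move dead ones to the garbage list) can be sketched in isolation, with the dead reference explicitly cleared as Erik requested. This is a simplified, hypothetical model: objects are plain ints, std::function stands in for the GC's liveness and update closures, and neither the Access API nor any GC barrier is modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical stand-ins: an "object" is an int and a trace holds a raw
// pointer to it (HotSpot would use an oop and closures instead).
struct ToyTrace {
  int* obj;    // stand-in for the oop held by the trace
  int frames;  // stand-in for the stack trace payload
};

struct ToyTraceStorage {
  std::vector<ToyTrace> allocated;     // traces whose objects are still live
  std::vector<ToyTrace> last_full_gc;  // snapshot of live traces after this pass
  std::vector<ToyTrace> garbage;       // traces whose objects died

  // Returns the number of live traces kept.
  std::size_t weak_oops_do(const std::function<bool(int*)>& is_alive,
                           const std::function<void(int**)>& update) {
    std::size_t curr_pos = 0;
    last_full_gc.clear();
    for (std::size_t i = 0; i < allocated.size(); ++i) {
      ToyTrace trace = allocated[i];
      if (is_alive(trace.obj)) {
        update(&trace.obj);             // analogous to f->do_oop(&trace.obj)
        allocated[curr_pos++] = trace;  // compact live traces to the front
        last_full_gc.push_back(trace);
      } else {
        trace.obj = nullptr;            // explicitly clear the dead reference
        garbage.push_back(trace);
      }
    }
    allocated.resize(curr_pos);         // drop the now-stale tail
    return curr_pos;
  }
};
```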
>>> >>> >> There is no reason except that now we have a load_oop and a get_oop_addr, >> I was not sure what you would think of that. >> >> >> That's fine. >> >> >> >>> >>>> Also I see a lot of concurrent-looking use of the following field: >>>>> 267 volatile bool _initialized; >>>>> >>>>> Please note that the "volatile" qualifier does not help with reordering >>>>> here. Reordering between volatile and non-volatile fields is >>>>> completely free >>>>> for both compiler and hardware, except for windows with MSVC, where >>>>> volatile >>>>> semantics is defined to use acquire/release semantics, and the >>>>> hardware is >>>>> TSO. But for the general case, I would expect this field to be stored >>>>> with >>>>> OrderAccess::release_store and loaded with OrderAccess::load_acquire. >>>>> Otherwise it is not thread safe. >>>>> >>>> Because everything is behind a mutex, I wasn't really worried about >>>> this. I have a test that has multiple threads trying to hit this >>>> corner case and it passes. >>>> >>>> However, to be paranoid, I updated it to using the OrderAccess API >>>> now, thanks! Let me know what you think there too! >>>> >>> >>> If it is indeed always supposed to be read and written under a mutex, >>> then I would strongly prefer to have it accessed as a normal non-volatile >>> member, and have an assertion that given lock is held or we are in a >>> safepoint, as we do in many other places. Something like this: >>> >>> assert(HeapMonitorStorage_lock->owned_by_self() || >>> (SafepointSynchronize::is_at_safepoint() && >>> Thread::current()->is_VM_thread()), "this should not be accessed >>> concurrently"); >>> >>> It would be confusing to people reading the code if there are uses of >>> OrderAccess that are actually always protected under a mutex. >>> >>> >> Thank you for the exact example to be put in the code! I put it around >> each access/assignment of the _initialized method and found one case where >> yes you can touch it and not have the lock. 
It actually is "ok" because you >> don't act on the storage until later and only when you really want to >> modify the storage (see the object_alloc_do_sample method which calls the >> add_trace method). >> >> But, because of this, I'm going to put the OrderAccess here, I'll do some >> performance numbers later and if there are issues, I might add an "unsafe" >> read and a "safe" one to make it explicit to the reader. But I don't think >> it will come to that. >> >> >> Okay. This double return in heapMonitoring.cpp looks wrong: >> >> 283 bool initialized() { >> 284 return OrderAccess::load_acquire(&_initialized) != 0; >> 285 return _initialized; >> 286 } >> >> Since you said object_alloc_do_sample() is the only place where you do >> not hold the mutex while reading initialized(), I had a closer look at >> that. It looks like in its current shape, the lack of a mutex may lead to a >> memory leak. In particular, it first checks if (initialized()). Let's >> assume this is now true. It then allocates a bunch of stuff, and checks if >> the number of frames was over 0. If it was, it calls >> StackTraceStorage::storage()->add_trace() seemingly hoping that after >> grabbing the lock in there, initialized() will still return true. But it >> could now return false and skip doing anything, in which case the allocated >> stuff will never be freed. >> > > I fixed this now by making add_trace return a boolean and checking for > that. It will be in the next webrev. Thanks, the truth is that in our > implementation the system is always on or off, so this never really occurs > :). In this version though, that is not true and it's important to handle > so thanks again! > > > >> >> So the analysis seems to be that _initialized is only used outside of the >> mutex in one instance, where it is used to perform double-checked locking, >> that actually causes a memory leak. >> >> I am not proposing how to fix that, just raising the issue.
If you still >> want to perform this double-checked locking somehow, then the use of >> acquire/release still seems odd. Because the memory ordering restrictions >> of it never comes into play in this particular case. If it ever did, then >> the use of destroy_stuff(); release_store(_initialized, 0) would be broken >> anyway as that would imply that whatever concurrent reader there ever was >> would after reading _initialized with load_acquire() could *never* read the >> data that is concurrently destroyed anyway. I would be biased to think that >> RawAccess::load/store looks like a more appropriate solution, >> given that the memory leak issue is resolved. I do not know how painful it >> would be to not perform this double-checked locking. >> > > So I agree with this entirely. I looked also a bit more and the difference > and code really stems from our internal version. In this version however, > there are actually a lot of things going on that I did not go entirely > through in my head but this comment made me ponder a bit more on it. > > Since every object_alloc_do_sample is protected by a check to > HeapMonitoring::enabled(), there is only a small chance that the call is > happening when things have been disabled. So there is no real need to do a > first check on the initialized, it is a rare occurence that a call happens > to object_alloc_do_sample and the initialized of the storage returns false. > > (By the way, even if you did call object_alloc_do_sample without looking > at HeapMonitoring::enabled(), that would be ok too. You would gather the > stacktrace and get nowhere at the add_trace call, which would return false; > so though not optimal performance wise, nothing would break). > > Furthermore, the add_trace is really the moment of no return and we have > the mutex lock and then the initialized check. So, in the end, I did two > things: I removed that first check and then I removed the OrderAccess for > the storage initialized. 
I think now I have a better grasp and > understanding why it was done in our code and why it is not needed here. > Thanks for pointing it out :). This now still passes my JTREG tests, > especially the threaded one. > > > > > >> >> >> >> >> >>> As a kind of meta comment, I wonder if it would make sense to add >>>>> sampling >>>>> for non-TLAB allocations. Seems like if someone is rapidly allocating a >>>>> whole bunch of 1 MB objects that never fit in a TLAB, I might still be >>>>> interested in seeing that in my traces, and not get surprised that the >>>>> allocation rate is very high yet not showing up in any profiles. >>>>> >>>>> That is handled by the handle_sample where you wanted me to put a >>>> UseTlab because you hit that case if the allocation is too big. >>>> >>> >>> I see. It was not obvious to me that non-TLAB sampling is done in the >>> TLAB class. That seems like an abstraction crime. >>> What I wanted in my previous comment was that we do not call into the >>> TLAB when we are not using TLABs. If there is sampling logic in the TLAB >>> that is used for something else than TLABs, then it seems like that logic >>> simply does not belong inside of the TLAB. It should be moved out of the >>> TLAB, and instead have the TLAB call this common abstraction that makes >>> sense. >>> >>> >> So in the incremental version: >> http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.07_08/, this is >> still a "crime". The reason is that the system has to have the >> bytes_until_sample on a per-thread level and it made "sense" to have it >> with the TLAB implementation. Also, I was not sure how people felt about >> adding something to the thread instance instead. >> >> Do you think it fits better at the Thread level? I can see how difficult >> it is to make it happen there and add some logic there. Let me know what >> you think. 
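The resolution reached in this exchange has three parts: drop the unlocked pre-check, read and write `_initialized` only under the storage lock, and have `add_trace` report whether it kept the trace. It can be sketched in portable C++. This is a hypothetical illustration, not the webrev code. `std::mutex` stands in for `HeapMonitorStorage_lock`/`MutexLocker`, and `Trace`, `TraceStorage`, `initialize`, `shutdown` and `size` are assumed names. The trace is passed as a `std::unique_ptr`, so a rejected trace is freed automatically. That closes the leak the review identified.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical stand-in for a captured stack trace; counts live
// instances so that a leak would be observable.
struct Trace {
  static int live;
  Trace() { ++live; }
  ~Trace() { --live; }
};
int Trace::live = 0;

class TraceStorage {
 public:
  void initialize() {
    std::lock_guard<std::mutex> guard(lock_);
    initialized_ = true;
  }

  void shutdown() {
    std::lock_guard<std::mutex> guard(lock_);
    initialized_ = false;
    traces_.clear();  // destroys every stored Trace
  }

  // The flag is only read while the lock is held, so no acquire/release
  // ordering is needed. Returns false when storage is off; the rejected
  // trace is destroyed with the by-value parameter, so nothing leaks.
  bool add_trace(std::unique_ptr<Trace> trace) {
    std::lock_guard<std::mutex> guard(lock_);
    if (!initialized_) {
      return false;
    }
    traces_.push_back(std::move(trace));
    return true;
  }

  std::size_t size() {
    std::lock_guard<std::mutex> guard(lock_);
    return traces_.size();
  }

 private:
  std::mutex lock_;
  bool initialized_ = false;  // plain bool: always guarded by lock_
  std::vector<std::unique_ptr<Trace>> traces_;
};
```

With this design, ownership transfer guarantees cleanup, so the caller does not have to free anything. The boolean result only tells the caller whether the sample was recorded.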
>> >> >> We have an unfortunate situation where everyone that has some fields that >> are thread local tend to dump them right into Thread, making the size and >> complexity of Thread grow as it becomes tightly coupled with various >> unrelated subsystems. It would be desirable to have a separate class for >> this instead that encapsulates the sampling logic. That class could >> possibly reside in Thread though as a value object of Thread. >> > > I imagined that would be the case but was not sure. I will look at the > example that Robbin is talking about (ThreadSMR) and will see how to > refactor my code to use that. > > Thanks again for your help, > Jc > > >> >> >> >> >> >>> Hope I have answered your questions and that my feedback makes sense to >>> you. >>> >>> >> You have and thank you for them, I think we are getting to a cleaner >> implementation and things are getting better and more readable :) >> >> >> Yes it is getting better. >> >> Thanks, >> /Erik >> >> >> Thanks for your help! >> Jc >> >> >> >>> Thanks, >>> /Erik >>> >>> >>> I double checked by changing the test >>>> >>>> http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.05a/raw_files/new/test/hotspot/jtreg/serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorStatObjectCorrectnessTest.java >>>> >>>> to use a smaller Tlab (2048) and made the object bigger and it goes >>>> through that and passes. >>>> >>>> Thanks again for your review and I look forward to your pointers for >>>> the questions I now have raised! 
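Erik suggests a separate class that encapsulates the sampling state and lives inside `Thread` as a value object. That could look roughly like the sketch below. All names (`HeapSampler`, `should_sample`, `bytes_until_sample`, and the toy `Thread`) are hypothetical. The fixed sampling interval is also a simplification; a production sampler would likely randomize it. The TLAB is not involved, so the same check also covers allocations that bypass the TLAB, which is the concern behind the "abstraction crime" remark.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical per-thread sampling state, kept out of the TLAB class.
class HeapSampler {
 public:
  explicit HeapSampler(std::size_t interval)
      : interval_(interval), bytes_until_sample_(interval) {}

  // Called for every allocation, TLAB or not. Returns true when this
  // allocation crosses the sampling threshold, then re-arms the counter.
  bool should_sample(std::size_t bytes) {
    if (bytes >= bytes_until_sample_) {
      bytes_until_sample_ = interval_;  // simplification: fixed interval
      return true;
    }
    bytes_until_sample_ -= bytes;
    return false;
  }

  std::size_t bytes_until_sample() const { return bytes_until_sample_; }

 private:
  std::size_t interval_;
  std::size_t bytes_until_sample_;
};

// Toy thread: the sampler is embedded by value, so Thread gains a single
// field rather than the sampling logic itself.
struct Thread {
  explicit Thread(std::size_t interval) : heap_sampler(interval) {}
  HeapSampler heap_sampler;
};
```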
>>>> Jc >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> Thanks, >>>>> /Erik >>>>> >>>>> >>>>> On 2018-01-26 06:45, JC Beyler wrote: >>>>> >>>>>> Thanks Robbin for the reviews :) >>>>>> >>>>>> The new full webrev is here: >>>>>> http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.03/ >>>>>> The incremental webrev is here: >>>>>> http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.02_03/ >>>>>> >>>>>> I inlined my answers: >>>>>> >>>>>> On Thu, Jan 25, 2018 at 1:15 AM, Robbin Ehn < >>>>>> robbin.ehn at oracle.com> wrote: >>>>>> >>>>>>> Hi JC, great to see another revision! >>>>>>> >>>>>>> #### >>>>>>> heapMonitoring.cpp >>>>>>> >>>>>>> StackTraceData should not contain the oop for 'safety' reasons. >>>>>>> When StackTraceData is moved from _allocated_traces: >>>>>>> L452 store_garbage_trace(trace); >>>>>>> it contains a dead oop. >>>>>>> _allocated_traces could instead be a tupel of oop and StackTraceData >>>>>>> thus >>>>>>> dead oops are not kept. >>>>>>> >>>>>> Done I used inheritance to make the copier work regardless but the >>>>>> idea is the same. >>>>>> >>>>>> You should use the new Access API for loading the oop, something like >>>>>>> this: >>>>>>> RootAccess::load(...) >>>>>>> I don't think you need to use Access API for clearing the oop, but it >>>>>>> would >>>>>>> look nicer. And you shouldn't probably be using: >>>>>>> Universe::heap()->is_in_reserved(value) >>>>>>> >>>>>> I am unfamiliar with this but I think I did do it like you wanted me >>>>>> to (all tests pass so that's a start). I'm not sure how to clear the >>>>>> oop exactly, is there somewhere that does that, which I can use to do >>>>>> the same? >>>>>> >>>>>> I removed the is_in_reserved, this came from our internal version, I >>>>>> don't know why it was there but my tests work without so I removed it >>>>>> :) >>>>>> >>>>>> >>>>>> The lock: >>>>>>> L424 MutexLocker mu(HeapMonitorStorage_lock); >>>>>>> Is not needed as far as I can see. 
>>>>>>> weak_oops_do is called in a safepoint, no TLAB allocation can happen >>>>>>> and >>>>>>> JVMTI thread can't access these data-structures. Is there something >>>>>>> more >>>>>>> to >>>>>>> this lock that I'm missing? >>>>>>> >>>>>> Since a thread can call the JVMTI getLiveTraces (or any of the other >>>>>> ones), it can get to the point of trying to copying the >>>>>> _allocated_traces. I imagine it is possible that this is happening >>>>>> during a GC or that it can be started and a GC happens afterwards. >>>>>> Therefore, it seems to me that you want this protected, no? >>>>>> >>>>>> >>>>>> #### >>>>>>> You have 6 files without any changes in them (any more): >>>>>>> g1CollectedHeap.cpp >>>>>>> psMarkSweep.cpp >>>>>>> psParallelCompact.cpp >>>>>>> genCollectedHeap.cpp >>>>>>> referenceProcessor.cpp >>>>>>> thread.hpp >>>>>>> >>>>>>> Done. >>>>>> >>>>>> #### >>>>>>> I have not looked closely, but is it possible to hide heap sampling >>>>>>> in >>>>>>> AllocTracer ? (with some minor changes to the AllocTracer API) >>>>>>> >>>>>>> I am imagining that you are saying to move the code that does the >>>>>> sampling code (change the tlab end, do the call to HeapMonitoring, >>>>>> etc.) into the AllocTracer code itself? I think that is right and I'll >>>>>> look if that is possible and prepare a webrev to show what would be >>>>>> needed to make that happen. >>>>>> >>>>>> #### >>>>>>> Minor nit, when declaring pointer there is a little mix of having the >>>>>>> pointer adjacent by type name and data name. (Most hotspot code is by >>>>>>> type >>>>>>> name) >>>>>>> E.g. >>>>>>> heapMonitoring.cpp:711 jvmtiStackTrace *trace = .... >>>>>>> heapMonitoring.cpp:733 Method* m = vfst.method(); >>>>>>> (not just this file) >>>>>>> >>>>>>> Done! >>>>>> >>>>>> #### >>>>>>> HeapMonitorThreadOnOffTest.java:77 >>>>>>> I would make g_tmp volatile, otherwise the assignment in loop may >>>>>>> theoretical be skipped. >>>>>>> >>>>>>> Also done! >>>>>> >>>>>> Thanks again! 
>>>>>> Jc >>>>>> >>>>> >>>>> >>> >> >> > From vladimir.kozlov at oracle.com Fri Mar 9 01:36:31 2018 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 8 Mar 2018 17:36:31 -0800 Subject: [11] RFR(S) 8198591: compiler/aot/fingerprint tests should be moved to open Message-ID: http://cr.openjdk.java.net/~kvn/8198591/webrev.00/ https://bugs.openjdk.java.net/browse/JDK-8198591 AppCDS is now available in Open JDK. Move associated AOT tests into open. To test with Oracle JDK tests use +UnlockCommercialFeatures flag with UseAppCDS. To test with Open JDK +IgnoreUnrecognizedVMOptions flag was added because UnlockCommercialFeatures flag is not available and OpenJDK does not require this flag to use AppCDS. Ran pre-integration testing which runs AOT tests. -- Thanks, Vladimir From tobias.hartmann at oracle.com Fri Mar 9 10:15:11 2018 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Fri, 9 Mar 2018 11:15:11 +0100 Subject: [11] RFR(S) 8198591: compiler/aot/fingerprint tests should be moved to open In-Reply-To: References: Message-ID: <4ddb8595-d30b-9339-6b61-609927c1b00a@oracle.com> Hi Vladimir, looks good to me. Best regards, Tobias On 09.03.2018 02:36, Vladimir Kozlov wrote: > http://cr.openjdk.java.net/~kvn/8198591/webrev.00/ > https://bugs.openjdk.java.net/browse/JDK-8198591 > > AppCDS is now available in Open JDK. Move associated AOT tests into open. > > To test with Oracle JDK tests use +UnlockCommercialFeatures flag with UseAppCDS. > To test with Open JDK +IgnoreUnrecognizedVMOptions flag was added because UnlockCommercialFeatures > flag is not available and OpenJDK does not require this flag to use AppCDS. > > Ran pre-integration testing which runs AOT tests.
> From goetz.lindenmaier at sap.com Fri Mar 9 14:53:47 2018 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Fri, 9 Mar 2018 14:53:47 +0000 Subject: RFR(L): 8198894: [PPC64] More generic vector CRC implementation In-Reply-To: References: Message-ID: Hi Martin, well, I like math, but not if it's expressed in Power vector instructions :) I had a look at the code and checked our nightly tests where you put the change to make sure they are fine. No issues. Eventually you could put it into the jdk build, too, then it's also verified by the benchmarks. I understand you are now generating the tables at startup. Looks good. Best regards, Goetz. > -----Original Message----- > From: Doerr, Martin > Sent: Donnerstag, 1. März 2018 16:50 > To: 'hotspot-compiler-dev at openjdk.java.net' > Cc: Lindenmaier, Goetz; Hiroshi H Horii > (HORII at jp.ibm.com); Michihiro Horie > (HORIE at jp.ibm.com); Gustavo Romero > Subject: RFR(L): 8198894: [PPC64] More generic vector CRC implementation > > Hi, > > > > I have implemented a more generic version of the vector instruction based > CRC code. > > It supports CRC32C and Big Endian, too. > > > > The peak performance was even better for large input streams. I got almost > 40GB/s. > > Some smaller length may be slower than with the old version. > > Maybe somebody from IBM would like to double-check performance. > > > > Please review: > > http://cr.openjdk.java.net/~mdoerr/8198894_PPC64_CRC32/webrev.00/ > > (hoping you like math :)) > > > > Best regards, > > Martin > > From martin.doerr at sap.com Fri Mar 9 17:17:51 2018 From: martin.doerr at sap.com (Doerr, Martin) Date: Fri, 9 Mar 2018 17:17:51 +0000 Subject: RFR(L): 8198894: [PPC64] More generic vector CRC implementation In-Reply-To: References: Message-ID: Hi Michihiro and Götz, thanks, Götz, for reviewing. Michihiro, can I add you as 2nd reviewer (no need to be jdk-reviewer for that)? I think you have taken a look and you obviously ran tests.
So if you're ok with the change, I can push it next week. Our nightly tests look good, too. Thanks and best regards, Martin From: Michihiro Horie [mailto:HORIE at jp.ibm.com] Sent: Freitag, 2. März 2018 11:55 To: Doerr, Martin Cc: Lindenmaier, Goetz; Gustavo Romero; Hiroshi H Horii (HORII at jp.ibm.com); 'hotspot-compiler-dev at openjdk.java.net' Subject: Re: RFR(L): 8198894: [PPC64] More generic vector CRC implementation Hi Martin, I double checked performance with our micro benchmark. This change was 5 times faster. In addition, I did not observe degradation with smaller length but have almost equal performance. Best regards, -- Michihiro, IBM Research - Tokyo From: "Doerr, Martin" To: "'hotspot-compiler-dev at openjdk.java.net'" Cc: "Lindenmaier, Goetz", "Hiroshi H Horii (HORII at jp.ibm.com)", "Michihiro Horie (HORIE at jp.ibm.com)", Gustavo Romero Date: 2018/03/02 00:49 Subject: RFR(L): 8198894: [PPC64] More generic vector CRC implementation ________________________________ Hi, I have implemented a more generic version of the vector instruction based CRC code. It supports CRC32C and Big Endian, too. The peak performance was even better for large input streams. I got almost 40GB/s. Some smaller length may be slower than with the old version. Maybe somebody from IBM would like to double-check performance. Please review: http://cr.openjdk.java.net/~mdoerr/8198894_PPC64_CRC32/webrev.00/ (hoping you like math :)) Best regards, Martin From vladimir.kozlov at oracle.com Fri Mar 9 17:34:25 2018 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 9 Mar 2018 09:34:25 -0800 Subject: [11] RFR(S) 8198591: compiler/aot/fingerprint tests should be moved to open In-Reply-To: <4ddb8595-d30b-9339-6b61-609927c1b00a@oracle.com> References: <4ddb8595-d30b-9339-6b61-609927c1b00a@oracle.com> Message-ID: <86504890-aebf-51ee-f10a-df297e02b51a@oracle.com> Thank you, Tobias Vladimir On 3/9/18 2:15 AM, Tobias Hartmann wrote: > Hi Vladimir, > > looks good to me. > > Best regards, > Tobias > > On 09.03.2018 02:36, Vladimir Kozlov wrote: >> http://cr.openjdk.java.net/~kvn/8198591/webrev.00/ >> https://bugs.openjdk.java.net/browse/JDK-8198591 >> >> AppCDS is now available in Open JDK. Move associated AOT tests into open. >> >> To test with Oracle JDK tests use +UnlockCommercialFeatures flag with UseAppCDS. >> To test with Open JDK +IgnoreUnrecognizedVMOptions flag was added because UnlockCommercialFeatures >> flag is not available and OpenJDK does not require this flag to use AppCDS. >> >> Ran pre-integration testing which runs AOT tests. >> From nils.eliasson at oracle.com Fri Mar 9 19:46:52 2018 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Fri, 9 Mar 2018 20:46:52 +0100 Subject: [11] RFR(S) 8199212: [TESTBUG] don't run compiler/aot tests with -Xcomp In-Reply-To: <1c0527ac-acda-8124-b2de-99e5b058c185@oracle.com> References: <1c0527ac-acda-8124-b2de-99e5b058c185@oracle.com> Message-ID: <66156c4a-8dd5-4cf9-ef84-3e57093866f2@oracle.com> Hi Vladimir, Looks good, // Nils On 2018-03-08 06:02, Vladimir Kozlov wrote: > http://cr.openjdk.java.net/~kvn/8199212/webrev.00/ > https://bugs.openjdk.java.net/browse/JDK-8199212 > > Our testing show that AOT jtreg tests consume a lot of time when > running with -Xcomp. Running AOT compiler in these tests with -Xcomp > is not helpful.
> From vladimir.kozlov at oracle.com Fri Mar 9 19:51:07 2018 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 9 Mar 2018 11:51:07 -0800 Subject: [11] RFR(S) 8199212: [TESTBUG] don't run compiler/aot tests with -Xcomp In-Reply-To: <66156c4a-8dd5-4cf9-ef84-3e57093866f2@oracle.com> References: <1c0527ac-acda-8124-b2de-99e5b058c185@oracle.com> <66156c4a-8dd5-4cf9-ef84-3e57093866f2@oracle.com> Message-ID: Thank you, Nils Vladimir On 3/9/18 11:46 AM, Nils Eliasson wrote: > Hi Vladimir, > > Looks good, > > // Nils > > > On 2018-03-08 06:02, Vladimir Kozlov wrote: >> http://cr.openjdk.java.net/~kvn/8199212/webrev.00/ >> https://bugs.openjdk.java.net/browse/JDK-8199212 >> >> Our testing show that AOT jtreg tests consume a lot of time when >> running with -Xcomp. Running AOT compiler in these tests with -Xcomp >> is not helpful. >> > From razvan.a.lupusoru at intel.com Fri Mar 9 22:54:04 2018 From: razvan.a.lupusoru at intel.com (Lupusoru, Razvan A) Date: Fri, 9 Mar 2018 22:54:04 +0000 Subject: RFR(S): Vector popcount support Message-ID: <48D92A70936F7946BE99F3FF5ECB6461F1957FA0@ORSMSX105.amr.corp.intel.com> Hi everyone, As per "Intel Architecture Instruction Set Extensions and Future Features Programming Reference" manual [1], vector popcount instruction will be supported in future Intel ISA. I have updated the superword vectorizer to take advantage of this instruction. I have tested with Intel SDE [2] to confirm encoding and semantics are correctly implemented. Please take a look and let me know if you have any questions or comments.
http://cr.openjdk.java.net/~rlupusoru/jdk_hs/webrev_vpopcount_01/index.html Thanks, Razvan [1] https://software.intel.com/sites/default/files/managed/c5/15/architecture-instruction-set-extensions-programming-reference.pdf [2] https://software.intel.com/en-us/articles/intel-software-development-emulator [3] https://bugs.openjdk.java.net/browse/JDK-8199421 From vladimir.kozlov at oracle.com Sat Mar 10 00:02:32 2018 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 9 Mar 2018 16:02:32 -0800 Subject: RFR(S): Vector popcount support In-Reply-To: <48D92A70936F7946BE99F3FF5ECB6461F1957FA0@ORSMSX105.amr.corp.intel.com> References: <48D92A70936F7946BE99F3FF5ECB6461F1957FA0@ORSMSX105.amr.corp.intel.com> Message-ID: <3f80df98-6bb8-bbe9-238d-8b2ee5750ab4@oracle.com> Hi Razvan, In general changes are good. Do you plans to add vpopcntb,w too? Use 'supports' with 's' in name as in other support functions names: supports_avx512_vpopcntdq() Also why use avx512 in function name? I know it is CPUID bit name. But do you have other vpopcntdq instructions, not avx512? In assembler_x86.cpp and other places you don't need to check UseAVX, support_avx512_vpopcntdq() is enough. You can clear feature bit in vm_version_x86.cpp when AVX < 3: if (UseAVX < 3) { _features &= ~CPU_AVX512F; _features &= ~CPU_AVX512DQ; _features &= ~CPU_AVX512CD; _features &= ~CPU_AVX512BW; _features &= ~CPU_AVX512VL; + _features &= ~CPU_AVX512_VPOPCNTDQ; } In x86.ad you forgot to add length check in predicate() like next: instruct vadd2I_reg(vecD dst, vecD src1, vecD src2) %{ predicate(UseAVX > 0 && n->as_Vector()->length() == 2); And, please, add code generation test to test/hotspot/jtreg/compiler/vectorization/ tests to verify correctness of vector operations.
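Vladimir suggests clearing the new CPUID feature bit together with the other AVX-512 bits when `UseAVX < 3`. The `supports_...()` query is then sufficient on its own, and call sites need no separate `UseAVX` check. The sketch below shows that idea. The bit positions and the free functions `clamp_features` and `supports_vpopcntdq` are hypothetical stand-ins, not HotSpot's actual `VM_Version` code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical bit assignments; HotSpot's real CPU_* constants differ.
constexpr std::uint64_t CPU_AVX2             = std::uint64_t(1) << 0;
constexpr std::uint64_t CPU_AVX512F          = std::uint64_t(1) << 1;
constexpr std::uint64_t CPU_AVX512DQ         = std::uint64_t(1) << 2;
constexpr std::uint64_t CPU_AVX512CD         = std::uint64_t(1) << 3;
constexpr std::uint64_t CPU_AVX512BW         = std::uint64_t(1) << 4;
constexpr std::uint64_t CPU_AVX512VL         = std::uint64_t(1) << 5;
constexpr std::uint64_t CPU_AVX512_VPOPCNTDQ = std::uint64_t(1) << 6;

// Mirrors the "if (UseAVX < 3)" block: every AVX-512-dependent bit,
// including VPOPCNTDQ, is cleared when AVX-512 use is disabled.
std::uint64_t clamp_features(std::uint64_t features, int use_avx) {
  if (use_avx < 3) {
    features &= ~(CPU_AVX512F | CPU_AVX512DQ | CPU_AVX512CD |
                  CPU_AVX512BW | CPU_AVX512VL | CPU_AVX512_VPOPCNTDQ);
  }
  return features;
}

// Once the bit has been cleared, callers need no extra UseAVX test.
bool supports_vpopcntdq(std::uint64_t features) {
  return (features & CPU_AVX512_VPOPCNTDQ) != 0;
}
```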
Thanks, Vladimir On 3/9/18 2:54 PM, Lupusoru, Razvan A wrote: > Hi everyone, > > As per "Intel Architecture Instruction Set Extensions and Future > Features Programming Reference" manual [1], vector popcount instruction > will be supported in future Intel ISA. I have updated the superword > vectorizer to take advantage of this instruction. I have tested with > Intel SDE [2] to confirm encoding and semantics are correctly > implemented. Please take a look and let me know if you have any > questions or comments. > > http://cr.openjdk.java.net/~rlupusoru/jdk_hs/webrev_vpopcount_01/index.html > > Thanks, > > Razvan > > [1] > https://software.intel.com/sites/default/files/managed/c5/15/architecture-instruction-set-extensions-programming-reference.pdf > > [2] > https://software.intel.com/en-us/articles/intel-software-development-emulator > > [3] https://bugs.openjdk.java.net/browse/JDK-8199421 > From HORIE at jp.ibm.com Mon Mar 12 05:14:17 2018 From: HORIE at jp.ibm.com (Michihiro Horie) Date: Mon, 12 Mar 2018 14:14:17 +0900 Subject: RFR(L): 8198894: [PPC64] More generic vector CRC implementation In-Reply-To: References: Message-ID: Hi Martin, >Michihiro, can I add you as 2nd reviewer (no need to be jdk-reviewer for that)? Sure. I had a look at the code, and I'm ok with the change. Best regards, -- Michihiro, IBM Research - Tokyo From: "Doerr, Martin" To: Michihiro Horie Cc: "Lindenmaier, Goetz", Gustavo Romero, "Hiroshi H Horii (HORII at jp.ibm.com)", "'hotspot-compiler-dev at openjdk.java.net'" Date: 2018/03/10 02:17 Subject: RE: RFR(L): 8198894: [PPC64] More generic vector CRC implementation Hi Michihiro and Götz, thanks, Götz, for reviewing. Michihiro, can I add you as 2nd reviewer (no need to be jdk-reviewer for that)? I think you have taken a look and you obviously ran tests. So if you're ok with the change, I can push it next week. Our nightly tests look good, too.
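Goetz notes that the new code generates its CRC tables at startup, and the change covers both CRC32 and CRC32C. The PPC64 vector algorithm itself is not reproduced here. The sketch below is only a scalar, byte-at-a-time, table-driven CRC. It shows how one table generator can serve both variants by taking the reflected polynomial as a parameter: 0xEDB88320 for CRC32 and 0x82F63B78 for CRC32C. Function and table names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using CrcTable = std::array<std::uint32_t, 256>;

// Build the 256-entry lookup table for a reflected (LSB-first) polynomial.
CrcTable make_crc_table(std::uint32_t reflected_poly) {
  CrcTable table{};
  for (std::uint32_t i = 0; i < 256; ++i) {
    std::uint32_t c = i;
    for (int bit = 0; bit < 8; ++bit) {
      c = (c & 1u) ? (reflected_poly ^ (c >> 1)) : (c >> 1);
    }
    table[i] = c;
  }
  return table;
}

// Generated once during static initialization, i.e. at program startup.
const CrcTable kCrc32Table  = make_crc_table(0xEDB88320u);  // CRC32
const CrcTable kCrc32cTable = make_crc_table(0x82F63B78u);  // CRC32C

// Standard pre- and post-inversion. 'crc' is the previous result (0 to
// start), so calls can be chained over consecutive buffers.
std::uint32_t crc_update(const CrcTable& table, std::uint32_t crc,
                         const unsigned char* data, std::size_t len) {
  crc = ~crc;
  for (std::size_t i = 0; i < len; ++i) {
    crc = table[(crc ^ data[i]) & 0xFFu] ^ (crc >> 8);
  }
  return ~crc;
}
```

Chaining `crc_update` over consecutive buffers gives the same result as a single call over the concatenated data. Only the table differs between the two CRC variants.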
Thanks and best regards, Martin From: Michihiro Horie [mailto:HORIE at jp.ibm.com] Sent: Freitag, 2. März 2018 11:55 To: Doerr, Martin Cc: Lindenmaier, Goetz; Gustavo Romero; Hiroshi H Horii (HORII at jp.ibm.com); 'hotspot-compiler-dev at openjdk.java.net' Subject: Re: RFR(L): 8198894: [PPC64] More generic vector CRC implementation Hi Martin, I double checked performance with our micro benchmark. This change was 5 times faster. In addition, I did not observe degradation with smaller length but have almost equal performance. Best regards, -- Michihiro, IBM Research - Tokyo From: "Doerr, Martin" To: "'hotspot-compiler-dev at openjdk.java.net'" Cc: "Lindenmaier, Goetz", "Hiroshi H Horii (HORII at jp.ibm.com)", "Michihiro Horie (HORIE at jp.ibm.com)", Gustavo Romero Date: 2018/03/02 00:49 Subject: RFR(L): 8198894: [PPC64] More generic vector CRC implementation Hi, I have implemented a more generic version of the vector instruction based CRC code. It supports CRC32C and Big Endian, too. The peak performance was even better for large input streams. I got almost 40GB/s. Some smaller length may be slower than with the old version. Maybe somebody from IBM would like to double-check performance. Please review: http://cr.openjdk.java.net/~mdoerr/8198894_PPC64_CRC32/webrev.00/ (hoping you like math :)) Best regards, Martin From martin.doerr at sap.com Mon Mar 12 11:08:20 2018 From: martin.doerr at sap.com (Doerr, Martin) Date: Mon, 12 Mar 2018 11:08:20 +0000 Subject: RE: RFR(L): 8198894: [PPC64] More generic vector CRC implementation In-Reply-To: References: Message-ID: <692c6966f7d94599a10edadd0fc44a69@sap.com> Thanks. I've pushed it. Best regards, Martin From: Michihiro Horie [mailto:HORIE at jp.ibm.com] Sent: Montag, 12. März 2018 06:14 To: Doerr, Martin Cc: Lindenmaier, Goetz; Gustavo Romero; Hiroshi H Horii (HORII at jp.ibm.com); 'hotspot-compiler-dev at openjdk.java.net' Subject: RE: RFR(L): 8198894: [PPC64] More generic vector CRC implementation Hi Martin, >Michihiro, can I add you as 2nd reviewer (no need to be jdk-reviewer for that)? Sure. I had a look at the code, and I'm ok with the change. Best regards, -- Michihiro, IBM Research - Tokyo From: "Doerr, Martin" To: Michihiro Horie Cc: "Lindenmaier, Goetz", Gustavo Romero, "Hiroshi H Horii (HORII at jp.ibm.com)", "'hotspot-compiler-dev at openjdk.java.net'" Date: 2018/03/10 02:17 Subject: RE: RFR(L): 8198894: [PPC64] More generic vector CRC implementation ________________________________ Hi Michihiro and Götz, thanks, Götz, for reviewing. Michihiro, can I add you as 2nd reviewer (no need to be jdk-reviewer for that)? I think you have taken a look and you obviously ran tests. So if you're ok with the change, I can push it next week. Our nightly tests look good, too. Thanks and best regards, Martin From: Michihiro Horie [mailto:HORIE at jp.ibm.com] Sent: Freitag, 2. März 2018 11:55 To: Doerr, Martin Cc: Lindenmaier, Goetz; Gustavo Romero; Hiroshi H Horii (HORII at jp.ibm.com); 'hotspot-compiler-dev at openjdk.java.net' Subject: Re: RFR(L): 8198894: [PPC64] More generic vector CRC implementation Hi Martin, I double checked performance with our micro benchmark. This change was 5 times faster. In addition, I did not observe degradation with smaller length but have almost equal performance. Best regards, -- Michihiro, IBM Research - Tokyo From: "Doerr, Martin" To: "'hotspot-compiler-dev at openjdk.java.net'" Cc: "Lindenmaier, Goetz", "Hiroshi H Horii (HORII at jp.ibm.com)", "Michihiro Horie (HORIE at jp.ibm.com)", Gustavo Romero Date: 2018/03/02 00:49 Subject: RFR(L): 8198894: [PPC64] More generic vector CRC implementation ________________________________ Hi, I have implemented a more generic version of the vector instruction based CRC code. It supports CRC32C and Big Endian, too. The peak performance was even better for large input streams. I got almost 40GB/s. Some smaller length may be slower than with the old version. Maybe somebody from IBM would like to double-check performance. Please review: http://cr.openjdk.java.net/~mdoerr/8198894_PPC64_CRC32/webrev.00/ (hoping you like math :)) Best regards, Martin From igor.ignatyev at oracle.com Mon Mar 12 19:20:57 2018 From: igor.ignatyev at oracle.com (Igor Ignatev) Date: Mon, 12 Mar 2018 12:20:57 -0700 Subject: [11] RFR(S) 8199212: [TESTBUG] don't run compiler/aot tests with -Xcomp In-Reply-To: <1c0527ac-acda-8124-b2de-99e5b058c185@oracle.com> References: <1c0527ac-acda-8124-b2de-99e5b058c185@oracle.com> Message-ID: <68B11413-38BE-424C-B7A0-5C2E544BA8D7@oracle.com> Hi Vladimir, The fix looks good. Igor > On Mar 7, 2018, at 9:02 PM, Vladimir Kozlov wrote: > > http://cr.openjdk.java.net/~kvn/8199212/webrev.00/ > https://bugs.openjdk.java.net/browse/JDK-8199212 > > Our testing show that AOT jtreg tests consume a lot of time when running with -Xcomp. Running AOT compiler in these tests with -Xcomp is not helpful. > > -- > Thanks, > Vladimir From vladimir.kozlov at oracle.com Mon Mar 12 19:39:49 2018 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 12 Mar 2018 12:39:49 -0700 Subject: [11] RFR(S) 8199212: [TESTBUG] don't run compiler/aot tests with -Xcomp In-Reply-To: <68B11413-38BE-424C-B7A0-5C2E544BA8D7@oracle.com> References: <1c0527ac-acda-8124-b2de-99e5b058c185@oracle.com> <68B11413-38BE-424C-B7A0-5C2E544BA8D7@oracle.com> Message-ID: Thank you, Igor Vladimir On 3/12/18 12:20 PM, Igor Ignatev wrote: > Hi Vladimir, > > The fix looks good. > > Igor > >> On Mar 7, 2018, at 9:02 PM, Vladimir Kozlov wrote: >> >> http://cr.openjdk.java.net/~kvn/8199212/webrev.00/ >> https://bugs.openjdk.java.net/browse/JDK-8199212 >> >> Our testing show that AOT jtreg tests consume a lot of time when running with -Xcomp. Running AOT compiler in these tests with -Xcomp is not helpful.
>> >> -- >> Thanks, >> Vladimir > From razvan.a.lupusoru at intel.com Tue Mar 13 00:47:48 2018 From: razvan.a.lupusoru at intel.com (Lupusoru, Razvan A) Date: Tue, 13 Mar 2018 00:47:48 +0000 Subject: RFR(S): Vector popcount support In-Reply-To: <3f80df98-6bb8-bbe9-238d-8b2ee5750ab4@oracle.com> References: <48D92A70936F7946BE99F3FF5ECB6461F1957FA0@ORSMSX105.amr.corp.intel.com> <3f80df98-6bb8-bbe9-238d-8b2ee5750ab4@oracle.com> Message-ID: <48D92A70936F7946BE99F3FF5ECB6461F1958964@ORSMSX105.amr.corp.intel.com> Hi Vladimir, Thank you so much for the quick review. I have addressed following issues: - renamed support_avx512_vpopcntdq to supports_vpopcntdq - No longer check UseAVX > 2 following your suggestion to clear flag when AVX512 is not available - Added length checking in predicate - Added appropriate compiler tests to exercise functionality Update patch is available at: http://cr.openjdk.java.net/~rlupusoru/jdk_hs/webrev_vpopcount_03/ Regarding b and w variants, semantics of popcount are not consistent with other instructions which do allow narrowing to subword types. Namely, since popcount counts bits, narrowing ends up losing bit count information. Potentially, this can be fixed with some additional instructions being generated that test top bit and add appropriate count to each lane based on that top bit. But given users are likely to use Integer.bitCount without casting result to byte/short, I don't believe adding these variants is useful. If you believe otherwise, I can try to see what I can do. Regarding q variant, it is currently not easily supported in vectorizer without some fundamental changes. That is because Long.bitCount returns int instead of long. The type mismatch in same chain of operations does not pass vectorizer alignment checking. Let me know if you have any more comments or suggestions! 
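Razvan's narrowing argument can be made concrete. In Java, a `byte` is sign-extended to `int` before `Integer.bitCount` runs. A negative byte therefore contributes 24 extra set bits that an 8-bit lane popcount would never see. The sketch below uses plain C++, with `std::bitset::count` as a portable popcount; the helper names are hypothetical. For every byte value, it checks that an 8-bit popcount plus a top-bit fix-up of 24 matches the promoted result. That fix-up is roughly the extra "test top bit and add" step Razvan mentions for a byte-lane variant.

```cpp
#include <bitset>
#include <cassert>
#include <cstdint>

// Scalar reference for Integer.bitCount: set bits in a 32-bit value.
int popcount32(std::uint32_t x) {
  return static_cast<int>(std::bitset<32>(x).count());
}

// What one 8-bit lane of a byte-wise vector popcount would produce.
int popcount8(std::uint8_t x) {
  return static_cast<int>(std::bitset<8>(x).count());
}

// Java semantics: the byte is sign-extended to int before counting.
int bitcount_promoted_byte(std::int8_t b) {
  return popcount32(static_cast<std::uint32_t>(static_cast<std::int32_t>(b)));
}

// Lane popcount plus fix-up: if the sign bit is set, the 24 sign-extended
// upper bits are all ones and must be added back.
int lane_popcount_with_fixup(std::int8_t b) {
  const std::uint8_t bits = static_cast<std::uint8_t>(b);
  return popcount8(bits) + ((bits & 0x80u) ? 24 : 0);
}
```

For example, for the byte `-1` the lane popcount is 8, while Java's `Integer.bitCount` on the promoted value returns 32.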
Thanks, Razvan -----Original Message----- From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] Sent: Friday, March 09, 2018 4:03 PM To: Lupusoru, Razvan A ; hotspot-compiler-dev at openjdk.java.net Subject: Re: RFR(S): Vector popcount support Hi Razvan, In general changes are good. Do you plans to add vpopcntb,w too? Use 'supports' with 's' in name as in other support functions names: supports_avx512_vpopcntdq() Also why use avx512 in function name? I know it is CPUID bit name. But do you have other vpopcntdq instructions, not avx512? In assembler_x86.cpp and other places you don't need to check UseAVX, support_avx512_vpopcntdq() is enough. You can clear feature bit in vm_version_x86.cpp when AVX < 3: if (UseAVX < 3) { _features &= ~CPU_AVX512F; _features &= ~CPU_AVX512DQ; _features &= ~CPU_AVX512CD; _features &= ~CPU_AVX512BW; _features &= ~CPU_AVX512VL; + _features &= ~CPU_AVX512_VPOPCNTDQ; } In x86.ad you forgot to add length check in predicate() like next: instruct vadd2I_reg(vecD dst, vecD src1, vecD src2) %{ predicate(UseAVX > 0 && n->as_Vector()->length() == 2); And, please, add code generation test to test/hotspot/jtreg/compiler/vectorization/ tests to verify correctness of vector operations. Thanks, Vladimir On 3/9/18 2:54 PM, Lupusoru, Razvan A wrote: > Hi everyone, > > As per "Intel Architecture Instruction Set Extensions and Future > Features Programming Reference" manual [1], vector popcount > instruction will be supported in future Intel ISA. I have updated the > superword vectorizer to take advantage of this instruction. I have > tested with Intel SDE [2] to confirm encoding and semantics are > correctly implemented. Please take a look and let me know if you have > any questions or comments. 
> > http://cr.openjdk.java.net/~rlupusoru/jdk_hs/webrev_vpopcount_01/index > .html > > Thanks, > > Razvan > > [1] > https://software.intel.com/sites/default/files/managed/c5/15/architect > ure-instruction-set-extensions-programming-reference.pdf > > [2] > https://software.intel.com/en-us/articles/intel-software-development-e > mulator > > [3] https://bugs.openjdk.java.net/browse/JDK-8199421 > From vladimir.kozlov at oracle.com Tue Mar 13 01:33:26 2018 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 12 Mar 2018 18:33:26 -0700 Subject: RFR(S): Vector popcount support In-Reply-To: <48D92A70936F7946BE99F3FF5ECB6461F1958964@ORSMSX105.amr.corp.intel.com> References: <48D92A70936F7946BE99F3FF5ECB6461F1957FA0@ORSMSX105.amr.corp.intel.com> <3f80df98-6bb8-bbe9-238d-8b2ee5750ab4@oracle.com> <48D92A70936F7946BE99F3FF5ECB6461F1958964@ORSMSX105.amr.corp.intel.com> Message-ID: <83bd1990-afe4-f581-3993-d0944d040212@oracle.com> Looks good. Thank you for explanation about b,w and q variants. I will start testing. Can you explain why you have additional test with MaxVectorSize=8? Thanks, Vladimir On 3/12/18 5:47 PM, Lupusoru, Razvan A wrote: > Hi Vladimir, > > Thank you so much for the quick review. > > I have addressed following issues: > - renamed support_avx512_vpopcntdq to supports_vpopcntdq > - No longer check UseAVX > 2 following your suggestion to clear flag when AVX512 is not available > - Added length checking in predicate > - Added appropriate compiler tests to exercise functionality > > Update patch is available at: http://cr.openjdk.java.net/~rlupusoru/jdk_hs/webrev_vpopcount_03/ > > Regarding b and w variants, semantics of popcount are not consistent with other instructions which do allow narrowing to subword types. Namely, since popcount counts bits, narrowing ends up losing bit count information. 
Potentially, this can be fixed with some additional instructions being generated that test top bit and add appropriate count to each lane based on that top bit. But given users are likely to use Integer.bitCount without casting result to byte/short, I don't believe adding these variants is useful. If you believe otherwise, I can try to see what I can do. > > Regarding q variant, it is currently not easily supported in vectorizer without some fundamental changes. That is because Long.bitCount returns int instead of long. The type mismatch in same chain of operations does not pass vectorizer alignment checking. > > Let me know if you have any more comments or suggestions! > > Thanks, > Razvan > > -----Original Message----- > From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] > Sent: Friday, March 09, 2018 4:03 PM > To: Lupusoru, Razvan A ; hotspot-compiler-dev at openjdk.java.net > Subject: Re: RFR(S): Vector popcount support > > Hi Razvan, > > In general changes are good. Do you plans to add vpopcntb,w too? > > Use 'supports' with 's' in name as in other support functions names: > supports_avx512_vpopcntdq() > > Also why use avx512 in function name? I know it is CPUID bit name. But do you have other vpopcntdq instructions, not avx512? > > In assembler_x86.cpp and other places you don't need to check UseAVX, > support_avx512_vpopcntdq() is enough. You can clear feature bit in vm_version_x86.cpp when AVX < 3: > > if (UseAVX < 3) { > _features &= ~CPU_AVX512F; > _features &= ~CPU_AVX512DQ; > _features &= ~CPU_AVX512CD; > _features &= ~CPU_AVX512BW; > _features &= ~CPU_AVX512VL; > + _features &= ~CPU_AVX512_VPOPCNTDQ; > } > > In x86.ad you forgot to add length check in predicate() like next: > > instruct vadd2I_reg(vecD dst, vecD src1, vecD src2) %{ > predicate(UseAVX > 0 && n->as_Vector()->length() == 2); > > And, please, add code generation test to test/hotspot/jtreg/compiler/vectorization/ tests to verify correctness of vector operations. 
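A code generation test of the kind requested could run a popcount loop long enough for C2 to compile it, then compare every result against an independent scalar reference. This is a hypothetical sketch, not the test added in the webrev: the class name and @run lines are assumptions, and whether the loop is actually vectorized depends on the CPU supporting AVX512_VPOPCNTDQ. The second @run line restricts vectors to 8 bytes, the kind of smaller-vector variant discussed in this thread.

```java
/*
 * Hypothetical jtreg-style test; the class name and @run lines are
 * illustrative and are not taken from the webrev.
 *
 * @test
 * @summary Vectorized Integer.bitCount must match a scalar reference
 * @run main/othervm -XX:-TieredCompilation TestPopCountVectorSketch
 * @run main/othervm -XX:-TieredCompilation -XX:MaxVectorSize=8 TestPopCountVectorSketch
 */
import java.util.Random;

class TestPopCountVectorSketch {
    static final int LEN = 1024;

    // Loop the superword pass may vectorize on hardware with VPOPCNTDQ.
    static void popcount(int[] src, int[] dst) {
        for (int i = 0; i < src.length; i++) {
            dst[i] = Integer.bitCount(src[i]);
        }
    }

    // Scalar reference using a different algorithm (clear the lowest set bit
    // until zero), so a miscompiled intrinsic cannot hide its own bug.
    static int referenceBitCount(int x) {
        int n = 0;
        while (x != 0) {
            x &= x - 1;
            n++;
        }
        return n;
    }

    static void verify(int[] src, int[] dst) {
        for (int i = 0; i < src.length; i++) {
            int expected = referenceBitCount(src[i]);
            if (dst[i] != expected) {
                throw new RuntimeException("Mismatch at " + i + ": got "
                        + dst[i] + ", expected " + expected);
            }
        }
    }

    public static void main(String[] args) {
        Random rnd = new Random(42);
        int[] src = new int[LEN];
        int[] dst = new int[LEN];
        for (int i = 0; i < LEN; i++) {
            src[i] = rnd.nextInt();
        }
        src[0] = 0;                 // edge cases
        src[1] = -1;
        src[2] = Integer.MIN_VALUE;
        // Iterate enough for C2 to compile (and ideally vectorize) popcount().
        for (int iter = 0; iter < 20_000; iter++) {
            popcount(src, dst);
        }
        verify(src, dst);
        System.out.println("PASSED");
    }
}
```

Using a different algorithm for the reference is a deliberate choice: comparing the intrinsic against itself would pass even if both paths were wrong.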
> > Thanks, > Vladimir > > On 3/9/18 2:54 PM, Lupusoru, Razvan A wrote: >> Hi everyone, >> >> As per "Intel Architecture Instruction Set Extensions and Future >> Features Programming Reference" manual [1], vector popcount >> instruction will be supported in future Intel ISA. I have updated the >> superword vectorizer to take advantage of this instruction. I have >> tested with Intel SDE [2] to confirm encoding and semantics are >> correctly implemented. Please take a look and let me know if you have >> any questions or comments. >> >> http://cr.openjdk.java.net/~rlupusoru/jdk_hs/webrev_vpopcount_01/index >> .html >> >> Thanks, >> >> Razvan >> >> [1] >> https://software.intel.com/sites/default/files/managed/c5/15/architect >> ure-instruction-set-extensions-programming-reference.pdf >> >> [2] >> https://software.intel.com/en-us/articles/intel-software-development-e >> mulator >> >> [3] https://bugs.openjdk.java.net/browse/JDK-8199421 >> From vparfinenko at excelsior-usa.com Tue Mar 13 09:39:53 2018 From: vparfinenko at excelsior-usa.com (Vladimir Parfinenko) Date: Tue, 13 Mar 2018 16:39:53 +0700 Subject: Changed code generation in JDK 9 Message-ID:

Hi all,

I am trying to investigate how C2 generates code for this method:

    public static boolean invert(boolean x) {
        return !x;
    }

I have found that in JDK 8 it generates just "xor arg, 1", which does not properly handle arbitrary integers passed as booleans:

  0x0000000004af4a8c: mov %edx,%eax
  0x0000000004af4a8e: xor $0x1,%eax ;*ireturn
                                    ; - Inverter::invert@9 (line 3)

In JDK 9 it truncates the argument to the range {0, 1} and then does "xor arg, 1".
  0x000001d92cc9f0ac: test %edx,%edx
  0x000001d92cc9f0ae: setne %al
  0x000001d92cc9f0b1: movzbl %al,%eax
  0x000001d92cc9f0b4: xor $0x1,%eax ;*ireturn {reexecute=0 rethrow=0 return_oop=0}
                                    ; - Inverter::invert@9 (line 3)

Full logs are available here: https://gist.github.com/cypok/24b2f30060958e10321f44784a4187c0

So now I am trying to find the commit responsible for this change, or the bug that motivated it. Could anyone help me?

-- Vladimir Parfinenko From shade at redhat.com Tue Mar 13 09:46:41 2018 From: shade at redhat.com (Aleksey Shipilev) Date: Tue, 13 Mar 2018 10:46:41 +0100 Subject: Changed code generation in JDK 9 In-Reply-To: References: Message-ID: On 03/13/2018 10:39 AM, Vladimir Parfinenko wrote: > Hi all, > > I am trying to investigate how C2 generates code for method: > > public static boolean invert(boolean x) { > return !x; > } > > I have found out that in JDK 8 it generates just "xor arg, 1" which does > not properly handle random integers as booleans: > > 0x0000000004af4a8c: mov %edx,%eax > 0x0000000004af4a8e: xor $0x1,%eax ;*ireturn > ; - Inverter::invert at 9 > (line 3) > > In JDK 9 it performs truncation of argument to range {0, 1} and then > "xor arg, 1". > > 0x000001d92cc9f0ac: test %edx,%edx > 0x000001d92cc9f0ae: setne %al > 0x000001d92cc9f0b1: movzbl %al,%eax > 0x000001d92cc9f0b4: xor $0x1,%eax ;*ireturn {reexecute=0 > rethrow=0 return_oop=0} > ; - Inverter::invert at 9 > (line 3) > > Full logs are available here: > https://gist.github.com/cypok/24b2f30060958e10321f44784a4187c0 > > So now I am trying to find the commit responsible for this change or > motivational bug. > Could anyone help me? I would speculate it is related to boolean value normalization: https://bugs.openjdk.java.net/browse/JDK-8161720 It is puzzling how would that leak to normal loads though. -Aleksey
From vparfinenko at excelsior-usa.com Tue Mar 13 11:13:21 2018 From: vparfinenko at excelsior-usa.com (Vladimir Parfinenko) Date: Tue, 13 Mar 2018 18:13:21 +0700 Subject: Changed code generation in JDK 9 References: Message-ID: -----Original Message----- From: Aleksey Shipilev [mailto:shade at redhat.com] Sent: Tuesday, March 13, 2018 4:47 PM To: Vladimir Parfinenko; hotspot-compiler-dev at openjdk.java.net Subject: Re: Changed code generation in JDK 9 On 03/13/2018 10:39 AM, Vladimir Parfinenko wrote: > Hi all, > > I am trying to investigate how C2 generates code for method: > > public static boolean invert(boolean x) { > return !x; > } > > I have found out that in JDK 8 it generates just "xor arg, 1" which does > not properly handle random integers as booleans: > > 0x0000000004af4a8c: mov %edx,%eax > 0x0000000004af4a8e: xor $0x1,%eax ;*ireturn > ; - Inverter::invert at 9 > (line 3) > > In JDK 9 it performs truncation of argument to range {0, 1} and then > "xor arg, 1". > > 0x000001d92cc9f0ac: test %edx,%edx > 0x000001d92cc9f0ae: setne %al > 0x000001d92cc9f0b1: movzbl %al,%eax > 0x000001d92cc9f0b4: xor $0x1,%eax ;*ireturn {reexecute=0 > rethrow=0 return_oop=0} > ; - Inverter::invert at 9 > (line 3) > > Full logs are available here: > https://gist.github.com/cypok/24b2f30060958e10321f44784a4187c0 > > So now I am trying to find the commit responsible for this change or > motivational bug. > Could anyone help me? I would speculate it is related to boolean value normalization: https://bugs.openjdk.java.net/browse/JDK-8161720 It is puzzling how would that leak to normal loads though.
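The two instruction sequences in the question can be modeled in plain Java, operating on the raw int that arrives in the argument register for a boolean parameter. The method names below are hypothetical, and the connection to JDK-8161720 is Aleksey's speculation rather than something this sketch shows. The only point is what each sequence computes for a non-canonical value.

```java
// Plain-Java model of the quoted C2 output; not HotSpot code.
final class BooleanInvertModel {

    // JDK 8 shape: "xor $0x1" applied to the raw value.
    static int invertJdk8(int raw) {
        return raw ^ 1;
    }

    // JDK 9 shape: "test; setne; movzbl" maps the value to {0, 1},
    // then "xor $0x1" flips it.
    static int invertJdk9(int raw) {
        int normalized = (raw != 0) ? 1 : 0;
        return normalized ^ 1;
    }

    public static void main(String[] args) {
        // Canonical booleans give the same answer either way.
        System.out.println(invertJdk8(0) + " " + invertJdk9(0)); // 1 1
        System.out.println(invertJdk8(1) + " " + invertJdk9(1)); // 0 0
        // A non-canonical value such as 2 (for example, passed in by
        // hand-written bytecode) yields 3 under the JDK 8 shape, which is not
        // a valid boolean, but 0 under the JDK 9 shape.
        System.out.println(invertJdk8(2) + " " + invertJdk9(2)); // 3 0
    }
}
```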
-Aleksey From vparfinenko at excelsior-usa.com Tue Mar 13 11:16:50 2018 From: vparfinenko at excelsior-usa.com (Vladimir Parfinenko) Date: Tue, 13 Mar 2018 18:16:50 +0700 Subject: Changed code generation in JDK 9 References: Message-ID: >> I would speculate it is related to boolean value normalization: >> https://bugs.openjdk.java.net/browse/JDK-8161720 Yes, it looks very related. Thank you! >> It is puzzling how would that leak to normal loads though. You could always just write and execute handmade bytecode: bipush 2 invokestatic Inverter.invert(Z)Z -- Vladimir Parfinenko P.S. Sorry for previous empty message. -----Original Message----- From: Vladimir Parfinenko Sent: Tuesday, March 13, 2018 6:13 PM To: 'Aleksey Shipilev'; hotspot-compiler-dev at openjdk.java.net Subject: RE: Changed code generation in JDK 9 -----Original Message----- From: Aleksey Shipilev [mailto:shade at redhat.com] Sent: Tuesday, March 13, 2018 4:47 PM To: Vladimir Parfinenko; hotspot-compiler-dev at openjdk.java.net Subject: Re: Changed code generation in JDK 9 On 03/13/2018 10:39 AM, Vladimir Parfinenko wrote: > Hi all, > > I am trying to investigate how C2 generates code for method: > > public static boolean invert(boolean x) { > return !x; > } > > I have found out that in JDK 8 it generates just "xor arg, 1" which does > not properly handle random integers as booleans: > > 0x0000000004af4a8c: mov %edx,%eax > 0x0000000004af4a8e: xor $0x1,%eax ;*ireturn > ; - Inverter::invert at 9 > (line 3) > > In JDK 9 it performs truncation of argument to range {0, 1} and then > "xor arg, 1". 
> > 0x000001d92cc9f0ac: test %edx,%edx > 0x000001d92cc9f0ae: setne %al > 0x000001d92cc9f0b1: movzbl %al,%eax > 0x000001d92cc9f0b4: xor $0x1,%eax ;*ireturn {reexecute=0 > rethrow=0 return_oop=0} > ; - Inverter::invert at 9 > (line 3) > > Full logs are available here: > https://gist.github.com/cypok/24b2f30060958e10321f44784a4187c0 > > So now I am trying to find the commit responsible for this change or > motivational bug. > Could anyone help me? I would speculate it is related to boolean value normalization: https://bugs.openjdk.java.net/browse/JDK-8161720 It is puzzling how would that leak to normal loads though. -Aleksey From rahul.v.raghavan at oracle.com Tue Mar 13 13:18:27 2018 From: rahul.v.raghavan at oracle.com (Rahul Raghavan) Date: Tue, 13 Mar 2018 18:48:27 +0530 Subject: [11] RFR: 8071282: remove misc dead code Message-ID: Hi, Please review the following cleanup changes. - http://cr.openjdk.java.net/~rraghavan/8071282/webrev.00/ - https://bugs.openjdk.java.net/browse/JDK-8071282- 'remove misc dead code' -- Removed following unused methods - frame::min_local_offset_for_compiler frame::monitor_offset_for_compiler frame::pd_oop_map_offset_adjustment frame::local_offset_for_compiler frame::volatile_across_calls C1_MacroAssembler::unverified_entry void LIRGenerator::cmp_reg_mem(LIR_Condition condition, LIR_Opr reg, LIR_Opr base, LIR_Opr disp, BasicType type, CodeEmitInfo* info) -- No issues with local builds, pre-integration testing in progress. 
Thanks, Rahul From razvan.a.lupusoru at intel.com Tue Mar 13 18:15:20 2018 From: razvan.a.lupusoru at intel.com (Lupusoru, Razvan A) Date: Tue, 13 Mar 2018 18:15:20 +0000 Subject: RFR(S): Vector popcount support In-Reply-To: <83bd1990-afe4-f581-3993-d0944d040212@oracle.com> References: <48D92A70936F7946BE99F3FF5ECB6461F1957FA0@ORSMSX105.amr.corp.intel.com> <3f80df98-6bb8-bbe9-238d-8b2ee5750ab4@oracle.com> <48D92A70936F7946BE99F3FF5ECB6461F1958964@ORSMSX105.amr.corp.intel.com> <83bd1990-afe4-f581-3993-d0944d040212@oracle.com> Message-ID: <48D92A70936F7946BE99F3FF5ECB6461F1958BAB@ORSMSX105.amr.corp.intel.com> Hi Vladimir, I wanted to add at least one test variant that will exercise a smaller vector size (in this case 8 bytes). That is because when AVX512 is supported, it is possible to use smaller vectors with EVEX encoding. If you find it unnecessary, please feel free to remove it before integration. Thanks! --Razvan -----Original Message----- From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] Sent: Monday, March 12, 2018 6:33 PM To: Lupusoru, Razvan A ; hotspot-compiler-dev at openjdk.java.net Subject: Re: RFR(S): Vector popcount support Looks good. Thank you for explanation about b,w and q variants. I will start testing. Can you explain why you have additional test with MaxVectorSize=8? Thanks, Vladimir On 3/12/18 5:47 PM, Lupusoru, Razvan A wrote: > Hi Vladimir, > > Thank you so much for the quick review. > > I have addressed following issues: > - renamed support_avx512_vpopcntdq to supports_vpopcntdq > - No longer check UseAVX > 2 following your suggestion to clear flag > when AVX512 is not available > - Added length checking in predicate > - Added appropriate compiler tests to exercise functionality > > Update patch is available at: > http://cr.openjdk.java.net/~rlupusoru/jdk_hs/webrev_vpopcount_03/ > > Regarding b and w variants, semantics of popcount are not consistent with other instructions which do allow narrowing to subword types. 
Namely, since popcount counts bits, narrowing ends up losing bit count information. Potentially, this can be fixed with some additional instructions being generated that test top bit and add appropriate count to each lane based on that top bit. But given users are likely to use Integer.bitCount without casting result to byte/short, I don't believe adding these variants is useful. If you believe otherwise, I can try to see what I can do. > > Regarding q variant, it is currently not easily supported in vectorizer without some fundamental changes. That is because Long.bitCount returns int instead of long. The type mismatch in same chain of operations does not pass vectorizer alignment checking. > > Let me know if you have any more comments or suggestions! > > Thanks, > Razvan > > -----Original Message----- > From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] > Sent: Friday, March 09, 2018 4:03 PM > To: Lupusoru, Razvan A ; > hotspot-compiler-dev at openjdk.java.net > Subject: Re: RFR(S): Vector popcount support > > Hi Razvan, > > In general changes are good. Do you plans to add vpopcntb,w too? > > Use 'supports' with 's' in name as in other support functions names: > supports_avx512_vpopcntdq() > > Also why use avx512 in function name? I know it is CPUID bit name. But do you have other vpopcntdq instructions, not avx512? > > In assembler_x86.cpp and other places you don't need to check UseAVX, > support_avx512_vpopcntdq() is enough. 
You can clear feature bit in vm_version_x86.cpp when AVX < 3: > > if (UseAVX < 3) { > _features &= ~CPU_AVX512F; > _features &= ~CPU_AVX512DQ; > _features &= ~CPU_AVX512CD; > _features &= ~CPU_AVX512BW; > _features &= ~CPU_AVX512VL; > + _features &= ~CPU_AVX512_VPOPCNTDQ; > } > > In x86.ad you forgot to add length check in predicate() like next: > > instruct vadd2I_reg(vecD dst, vecD src1, vecD src2) %{ > predicate(UseAVX > 0 && n->as_Vector()->length() == 2); > > And, please, add code generation test to test/hotspot/jtreg/compiler/vectorization/ tests to verify correctness of vector operations. > > Thanks, > Vladimir > > On 3/9/18 2:54 PM, Lupusoru, Razvan A wrote: >> Hi everyone, >> >> As per "Intel Architecture Instruction Set Extensions and Future >> Features Programming Reference" manual [1], vector popcount >> instruction will be supported in future Intel ISA. I have updated the >> superword vectorizer to take advantage of this instruction. I have >> tested with Intel SDE [2] to confirm encoding and semantics are >> correctly implemented. Please take a look and let me know if you have >> any questions or comments. >> >> http://cr.openjdk.java.net/~rlupusoru/jdk_hs/webrev_vpopcount_01/inde >> x >> .html >> >> Thanks, >> >> Razvan >> >> [1] >> https://software.intel.com/sites/default/files/managed/c5/15/architec >> t ure-instruction-set-extensions-programming-reference.pdf >> >> [2] >> https://software.intel.com/en-us/articles/intel-software-development- >> e >> mulator >> >> [3] https://bugs.openjdk.java.net/browse/JDK-8199421 >> From vladimir.kozlov at oracle.com Tue Mar 13 18:22:13 2018 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 13 Mar 2018 11:22:13 -0700 Subject: [11] RFR: 8071282: remove misc dead code In-Reply-To: References: Message-ID: <20837740-32e1-f20e-60f1-ddb8612a7dc7@oracle.com> Looks good. Thanks, Vladimir On 3/13/18 6:18 AM, Rahul Raghavan wrote: > Hi, > > Please review the following cleanup changes. 
> > - > http://cr.openjdk.java.net/~rraghavan/8071282/webrev.00/ > > - > https://bugs.openjdk.java.net/browse/JDK-8071282- > 'remove misc dead code' > > > -- Removed following unused methods - > frame::min_local_offset_for_compiler > frame::monitor_offset_for_compiler > frame::pd_oop_map_offset_adjustment > frame::local_offset_for_compiler > frame::volatile_across_calls > C1_MacroAssembler::unverified_entry > void LIRGenerator::cmp_reg_mem(LIR_Condition condition, LIR_Opr reg, > LIR_Opr base, LIR_Opr disp, BasicType type, CodeEmitInfo* info) > > -- No issues with local builds, pre-integration testing in progress. > > > Thanks, > Rahul From vladimir.kozlov at oracle.com Tue Mar 13 18:25:47 2018 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 13 Mar 2018 11:25:47 -0700 Subject: RFR(S): Vector popcount support In-Reply-To: <48D92A70936F7946BE99F3FF5ECB6461F1958BAB@ORSMSX105.amr.corp.intel.com> References: <48D92A70936F7946BE99F3FF5ECB6461F1957FA0@ORSMSX105.amr.corp.intel.com> <3f80df98-6bb8-bbe9-238d-8b2ee5750ab4@oracle.com> <48D92A70936F7946BE99F3FF5ECB6461F1958964@ORSMSX105.amr.corp.intel.com> <83bd1990-afe4-f581-3993-d0944d040212@oracle.com> <48D92A70936F7946BE99F3FF5ECB6461F1958BAB@ORSMSX105.amr.corp.intel.com> Message-ID: On 3/13/18 11:15 AM, Lupusoru, Razvan A wrote: > Hi Vladimir, > > I wanted to add at least one test variant that will exercise a smaller vector size (in this case 8 bytes). That is because when AVX512 is supported, it is possible to use smaller vectors with EVEX encoding. If you find it unnecessary, please feel free to remove it before integration. Good to know. Testing passed and I pushed changes. Thanks, Vladimir > > Thanks! > > --Razvan > > -----Original Message----- > From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] > Sent: Monday, March 12, 2018 6:33 PM > To: Lupusoru, Razvan A ; hotspot-compiler-dev at openjdk.java.net > Subject: Re: RFR(S): Vector popcount support > > Looks good.
Thank you for explanation about b,w and q variants. I will start testing. > > Can you explain why you have additional test with MaxVectorSize=8? > > Thanks, > Vladimir > > On 3/12/18 5:47 PM, Lupusoru, Razvan A wrote: >> Hi Vladimir, >> >> Thank you so much for the quick review. >> >> I have addressed following issues: >> - renamed support_avx512_vpopcntdq to supports_vpopcntdq >> - No longer check UseAVX > 2 following your suggestion to clear flag >> when AVX512 is not available >> - Added length checking in predicate >> - Added appropriate compiler tests to exercise functionality >> >> Update patch is available at: >> http://cr.openjdk.java.net/~rlupusoru/jdk_hs/webrev_vpopcount_03/ >> >> Regarding b and w variants, semantics of popcount are not consistent with other instructions which do allow narrowing to subword types. Namely, since popcount counts bits, narrowing ends up losing bit count information. Potentially, this can be fixed with some additional instructions being generated that test top bit and add appropriate count to each lane based on that top bit. But given users are likely to use Integer.bitCount without casting result to byte/short, I don't believe adding these variants is useful. If you believe otherwise, I can try to see what I can do. >> >> Regarding q variant, it is currently not easily supported in vectorizer without some fundamental changes. That is because Long.bitCount returns int instead of long. The type mismatch in same chain of operations does not pass vectorizer alignment checking. >> >> Let me know if you have any more comments or suggestions! >> >> Thanks, >> Razvan >> >> -----Original Message----- >> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >> Sent: Friday, March 09, 2018 4:03 PM >> To: Lupusoru, Razvan A ; >> hotspot-compiler-dev at openjdk.java.net >> Subject: Re: RFR(S): Vector popcount support >> >> Hi Razvan, >> >> In general changes are good. Do you plans to add vpopcntb,w too? 
>> >> Use 'supports' with 's' in name as in other support functions names: >> supports_avx512_vpopcntdq() >> >> Also why use avx512 in function name? I know it is CPUID bit name. But do you have other vpopcntdq instructions, not avx512? >> >> In assembler_x86.cpp and other places you don't need to check UseAVX, >> support_avx512_vpopcntdq() is enough. You can clear feature bit in vm_version_x86.cpp when AVX < 3: >> >> if (UseAVX < 3) { >> _features &= ~CPU_AVX512F; >> _features &= ~CPU_AVX512DQ; >> _features &= ~CPU_AVX512CD; >> _features &= ~CPU_AVX512BW; >> _features &= ~CPU_AVX512VL; >> + _features &= ~CPU_AVX512_VPOPCNTDQ; >> } >> >> In x86.ad you forgot to add length check in predicate() like next: >> >> instruct vadd2I_reg(vecD dst, vecD src1, vecD src2) %{ >> predicate(UseAVX > 0 && n->as_Vector()->length() == 2); >> >> And, please, add code generation test to test/hotspot/jtreg/compiler/vectorization/ tests to verify correctness of vector operations. >> >> Thanks, >> Vladimir >> >> On 3/9/18 2:54 PM, Lupusoru, Razvan A wrote: >>> Hi everyone, >>> >>> As per "Intel Architecture Instruction Set Extensions and Future >>> Features Programming Reference" manual [1], vector popcount >>> instruction will be supported in future Intel ISA. I have updated the >>> superword vectorizer to take advantage of this instruction. I have >>> tested with Intel SDE [2] to confirm encoding and semantics are >>> correctly implemented. Please take a look and let me know if you have >>> any questions or comments. 
>>> >>> http://cr.openjdk.java.net/~rlupusoru/jdk_hs/webrev_vpopcount_01/inde >>> x >>> .html >>> >>> Thanks, >>> >>> Razvan >>> >>> [1] >>> https://software.intel.com/sites/default/files/managed/c5/15/architec >>> t ure-instruction-set-extensions-programming-reference.pdf >>> >>> [2] >>> https://software.intel.com/en-us/articles/intel-software-development- >>> e >>> mulator >>> >>> [3] https://bugs.openjdk.java.net/browse/JDK-8199421 >>> From tobias.hartmann at oracle.com Wed Mar 14 07:51:29 2018 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Wed, 14 Mar 2018 08:51:29 +0100 Subject: [11] RFR: 8071282: remove misc dead code In-Reply-To: References: Message-ID: <0145383e-bdf7-e967-3396-30ee6a0b90fa@oracle.com> Hi Rahul, looks good to me too. If you get a chance, please update the copyright dates before pushing (no new webrev required). Thanks, Tobias On 13.03.2018 14:18, Rahul Raghavan wrote: > Hi, > > Please review the following cleanup changes. > > - > http://cr.openjdk.java.net/~rraghavan/8071282/webrev.00/ > > - > https://bugs.openjdk.java.net/browse/JDK-8071282- > 'remove misc dead code' > > > -- Removed following unused methods - > frame::min_local_offset_for_compiler > frame::monitor_offset_for_compiler > frame::pd_oop_map_offset_adjustment > frame::local_offset_for_compiler > frame::volatile_across_calls > C1_MacroAssembler::unverified_entry > void LIRGenerator::cmp_reg_mem(LIR_Condition condition, LIR_Opr reg, LIR_Opr base, LIR_Opr disp, > BasicType type, CodeEmitInfo* info) > > -- No issues with local builds, pre-integration testing in progress.
> > > Thanks, > Rahul From nils.eliasson at oracle.com Wed Mar 14 10:17:15 2018 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Wed, 14 Mar 2018 11:17:15 +0100 Subject: RFR(S): 8192992: Test8007294.java failed: attempted to spill a non-spillable item In-Reply-To: References: <9249e5ae-3043-da1d-80de-fcecdabdfad2@oracle.com> Message-ID: <2121dc39-1f4f-d428-1de5-3e62b90e75c4@oracle.com>

Hi Vladimir,

After taking a very close look, I found that the anti-dependency check that hoists the testN_mem_reg away from the jmpCon is broken, and that the hoisting is unnecessary. So this is not a case where we need anti-dependency checking for loads before matching.

Generally insert_anti_dependences looks good, except for the store->is_Phi() clause, which is full of holes (overly conservative). I don't think I fully understand what the graph looks like when the clause is needed, but it tries to find stores upwards that are otherwise unreachable from the downward memory flow search. I found these three flaws:

1) A Phi in a block that is preceded by a store: even though the store is dominated by the load's LCA, it will force the testN up! We don't check where the stores are located.
2) A Phi that consumes the same memory as the load may force it up, even though no stores are involved.
3) A Phi that consumes a mergemem, which itself has already been processed and passed as irrelevant, may force the testN up.

One could add that any predecessor of the phi would have to be a store/call to affect the load placement.

I have also added some additional debugging printouts to the patch.

http://cr.openjdk.java.net/~neliasso/8192992/webrev.03/
https://bugs.openjdk.java.net/browse/JDK-8192992

Regards,
// Nils

On 2018-03-05 18:02, Vladimir Kozlov wrote: > Hi Nils, > > Yes, it is legal workaround but this way you removed all subsuming > loads in code.
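The scheduling discussion in this thread turns on where PhaseCFG::schedule_late() may place a node. That placement is bounded by the lowest common ancestor (LCA) of the node's uses in the dominator tree, and "raising the LCA" moves the allowed placement toward the root. The following is a generic sketch of dominator-tree LCA and dominance queries under simplifying assumptions (an array of immediate dominators, with the root marked -1). It is not HotSpot's implementation, and the class name is hypothetical.

```java
// Generic dominator-tree helpers; a simplified model, not PhaseCFG code.
final class DomTreeSketch {
    private final int[] idom;   // idom[b] = immediate dominator of block b; root is -1
    private final int[] depth;  // depth[b] = distance from the root in the dominator tree

    DomTreeSketch(int[] idom) {
        this.idom = idom.clone();
        this.depth = new int[idom.length];
        for (int b = 0; b < idom.length; b++) {
            int d = 0;
            for (int x = b; idom[x] != -1; x = idom[x]) {
                d++;
            }
            depth[b] = d;
        }
    }

    // Deepest block that dominates both a and b.
    int lca(int a, int b) {
        while (depth[a] > depth[b]) a = idom[a];
        while (depth[b] > depth[a]) b = idom[b];
        while (a != b) {
            a = idom[a];
            b = idom[b];
        }
        return a;
    }

    // True if block a dominates block b (a is b or one of its ancestors).
    boolean dominates(int a, int b) {
        while (depth[b] > depth[a]) b = idom[b];
        return a == b;
    }

    public static void main(String[] args) {
        // Chain 0 -> 1 -> 2; block 2 immediately dominates both 3 and 4.
        DomTreeSketch t = new DomTreeSketch(new int[] {-1, 0, 1, 2, 2});
        int useLca = t.lca(3, 4);              // deepest legal block for a node used in 3 and 4
        System.out.println(useLca);            // 2
        // Raising the LCA to an early block (here the root) hoists the node
        // far above its uses, away from a flag-consuming user in block 3 or 4.
        int raised = t.lca(useLca, 0);
        System.out.println(raised);            // 0
        System.out.println(t.dominates(2, 3)); // true
        System.out.println(t.dominates(3, 4)); // false
    }
}
```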
> > Should we do anti-dependency check for loads during matching when > shared nodes are marked?: > > http://hg.openjdk.java.net/jdk/hs/file/4e82736053ae/src/hotspot/share/opto/matcher.cpp#l2159 > > > How expensive would be that? > > Vladimir > > On 3/5/18 7:50 AM, Nils Eliasson wrote: >> Hi, >> >> This patch is a workaround for a scheduling problem encountered in >> some rare circumstances. Instead of hitting the assert we retry the >> compilation without subsuming loads. >> >> To quote Tobias: >> >> "The crash happens because a testN_mem_reg0 (CmpN(LoadN(mem), NULL)) >> is scheduled in a different block than its jmpCon user and the >> register allocator tries to spill the flag register. The problem is >> that PhaseCFG::schedule_late() detects an anti-dependency for the >> testN_mem_reg0 on a bottom memory Phi and therefore raises the LCA to >> the early block (see PhaseCFG::insert_anti_dependences()) which is >> "far away" from its jmpCon user. " >> >> Thanks to Roland who suggested the workaround. >> >> https://bugs.openjdk.java.net/browse/JDK-8192992 >> >> http://cr.openjdk.java.net/~neliasso/8192992/webrev.01/ >> >> Regards, >> >> Nils >>
From martin.doerr at sap.com Wed Mar 14 11:03:33 2018 From: martin.doerr at sap.com (Doerr, Martin) Date: Wed, 14 Mar 2018 11:03:33 +0000 Subject: RFR(S): Vector popcount support In-Reply-To: References: <48D92A70936F7946BE99F3FF5ECB6461F1957FA0@ORSMSX105.amr.corp.intel.com> <3f80df98-6bb8-bbe9-238d-8b2ee5750ab4@oracle.com> <48D92A70936F7946BE99F3FF5ECB6461F1958964@ORSMSX105.amr.corp.intel.com> <83bd1990-afe4-f581-3993-d0944d040212@oracle.com> <48D92A70936F7946BE99F3FF5ECB6461F1958BAB@ORSMSX105.amr.corp.intel.com> Message-ID:

Hi,

unfortunately, this change breaks the Windows 32 bit build:

ad_x86.obj : error LNK2019: unresolved external symbol "public: void __thiscall Assembler::vpopcntd(class XMMRegisterImpl *,class XMMRegisterImpl *,int)" (?vpopcntd@Assembler@@QAEXPAVXMMRegisterImpl@@0H@Z) referenced in function "private: virtual void __thiscall vpopcount16INode::emit(class CodeBuffer &,class PhaseRegAlloc *)const " (?emit@vpopcount16INode@@EBEXAAVCodeBuffer@@PAVPhaseRegAlloc@@@Z)

Should the new nodes not be in x86_64.ad? The declaration of vpopcntd is in shared 32/64 bit code, while the definition is in 64 bit code.

Is somebody already working on a fix?

Best regards,
Martin

-----Original Message----- From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Vladimir Kozlov Sent: Dienstag, 13. März 2018 19:26 To: Lupusoru, Razvan A ; hotspot-compiler-dev at openjdk.java.net Subject: Re: RFR(S): Vector popcount support On 3/13/18 11:15 AM, Lupusoru, Razvan A wrote: > Hi Vladimir, > > I wanted to add at least one test variant that will exercise a smaller vector size (in this case 8 bytes). That is because when AVX512 is supported, it is possible to use smaller vectors with EVEX encoding. If you find it unnecessary, please feel free to remove it before integration. Good to know. Testing passed and I pushed changes. Thanks, Vladimir > > Thanks!
> > --Razvan > > -----Original Message----- > From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] > Sent: Monday, March 12, 2018 6:33 PM > To: Lupusoru, Razvan A ; hotspot-compiler-dev at openjdk.java.net > Subject: Re: RFR(S): Vector popcount support > > Looks good. Thank you for explanation about b,w and q variants. I will start testing. > > Can you explain why you have additional test with MaxVectorSize=8? > > Thanks, > Vladimir > > On 3/12/18 5:47 PM, Lupusoru, Razvan A wrote: >> Hi Vladimir, >> >> Thank you so much for the quick review. >> >> I have addressed following issues: >> - renamed support_avx512_vpopcntdq to supports_vpopcntdq >> - No longer check UseAVX > 2 following your suggestion to clear flag >> when AVX512 is not available >> - Added length checking in predicate >> - Added appropriate compiler tests to exercise functionality >> >> Update patch is available at: >> http://cr.openjdk.java.net/~rlupusoru/jdk_hs/webrev_vpopcount_03/ >> >> Regarding b and w variants, semantics of popcount are not consistent with other instructions which do allow narrowing to subword types. Namely, since popcount counts bits, narrowing ends up losing bit count information. Potentially, this can be fixed with some additional instructions being generated that test top bit and add appropriate count to each lane based on that top bit. But given users are likely to use Integer.bitCount without casting result to byte/short, I don't believe adding these variants is useful. If you believe otherwise, I can try to see what I can do. >> >> Regarding q variant, it is currently not easily supported in vectorizer without some fundamental changes. That is because Long.bitCount returns int instead of long. The type mismatch in same chain of operations does not pass vectorizer alignment checking. >> >> Let me know if you have any more comments or suggestions! 
>> >> Thanks, >> Razvan >> >> -----Original Message----- >> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >> Sent: Friday, March 09, 2018 4:03 PM >> To: Lupusoru, Razvan A ; >> hotspot-compiler-dev at openjdk.java.net >> Subject: Re: RFR(S): Vector popcount support >> >> Hi Razvan, >> >> In general changes are good. Do you plans to add vpopcntb,w too? >> >> Use 'supports' with 's' in name as in other support functions names: >> supports_avx512_vpopcntdq() >> >> Also why use avx512 in function name? I know it is CPUID bit name. But do you have other vpopcntdq instructions, not avx512? >> >> In assembler_x86.cpp and other places you don't need to check UseAVX, >> support_avx512_vpopcntdq() is enough. You can clear feature bit in vm_version_x86.cpp when AVX < 3: >> >> if (UseAVX < 3) { >> _features &= ~CPU_AVX512F; >> _features &= ~CPU_AVX512DQ; >> _features &= ~CPU_AVX512CD; >> _features &= ~CPU_AVX512BW; >> _features &= ~CPU_AVX512VL; >> + _features &= ~CPU_AVX512_VPOPCNTDQ; >> } >> >> In x86.ad you forgot to add length check in predicate() like next: >> >> instruct vadd2I_reg(vecD dst, vecD src1, vecD src2) %{ >> predicate(UseAVX > 0 && n->as_Vector()->length() == 2); >> >> And, please, add code generation test to test/hotspot/jtreg/compiler/vectorization/ tests to verify correctness of vector operations. >> >> Thanks, >> Vladimir >> >> On 3/9/18 2:54 PM, Lupusoru, Razvan A wrote: >>> Hi everyone, >>> >>> As per "Intel Architecture Instruction Set Extensions and Future >>> Features Programming Reference" manual [1], vector popcount >>> instruction will be supported in future Intel ISA. I have updated the >>> superword vectorizer to take advantage of this instruction. I have >>> tested with Intel SDE [2] to confirm encoding and semantics are >>> correctly implemented. Please take a look and let me know if you have >>> any questions or comments. 
>>> >>> http://cr.openjdk.java.net/~rlupusoru/jdk_hs/webrev_vpopcount_01/inde >>> x >>> .html >>> >>> Thanks, >>> >>> Razvan >>> >>> [1] >>> https://software.intel.com/sites/default/files/managed/c5/15/architec >>> t ure-instruction-set-extensions-programming-reference.pdf >>> >>> [2] >>> https://software.intel.com/en-us/articles/intel-software-development- >>> e >>> mulator >>> >>> [3] https://bugs.openjdk.java.net/browse/JDK-8199421 >>> From shade at redhat.com Wed Mar 14 11:05:52 2018 From: shade at redhat.com (Aleksey Shipilev) Date: Wed, 14 Mar 2018 12:05:52 +0100 Subject: RFR(S): Vector popcount support In-Reply-To: References: <48D92A70936F7946BE99F3FF5ECB6461F1957FA0@ORSMSX105.amr.corp.intel.com> <3f80df98-6bb8-bbe9-238d-8b2ee5750ab4@oracle.com> <48D92A70936F7946BE99F3FF5ECB6461F1958964@ORSMSX105.amr.corp.intel.com> <83bd1990-afe4-f581-3993-d0944d040212@oracle.com> <48D92A70936F7946BE99F3FF5ECB6461F1958BAB@ORSMSX105.amr.corp.intel.com> Message-ID: I filed: https://bugs.openjdk.java.net/browse/JDK-8199603 ...and trying to fix it. -Aleksey On 03/14/2018 12:03 PM, Doerr, Martin wrote: > Hi, > > unfortunately, this change breaks the Windows 32 bit build: > ad_x86.obj : error LNK2019: unresolved external symbol "public: void __thiscall Assembler::vpopcntd(class XMMRegisterImpl *,class XMMRegisterImpl *,int)" (?vpopcntd at Assembler@@QAEXPAVXMMRegisterImpl@@0H at Z) referenced in function "private: virtual void __thiscall vpopcount16INode::emit(class CodeBuffer &,class PhaseRegAlloc *)const " (?emit at vpopcount16INode@@EBEXAAVCodeBuffer@@PAVPhaseRegAlloc@@@Z) > > Should the new nodes not be in x86_64.ad? > The declaration of vpopcntd is in shared 32/64 bit code while the definition is in 64 bit code. > > Is already somebody working on a fix? > > Best regards, > Martin > > > -----Original Message----- > From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Vladimir Kozlov > Sent: Dienstag, 13. 
März 2018 19:26 > To: Lupusoru, Razvan A ; hotspot-compiler-dev at openjdk.java.net > Subject: Re: RFR(S): Vector popcount support > > On 3/13/18 11:15 AM, Lupusoru, Razvan A wrote: >> Hi Vladimir, >> >> I wanted to add at least one test variant that will exercise a smaller vector size (in this case 8 bytes). That is because when AVX512 is supported, it is possible to use smaller vectors with EVEX encoding. If you find it unnecessary, please feel free to remove it before integration. > > Good to know. > > Testing passed and I pushed changes. > > Thanks, > Vladimir > >> >> Thanks! >> >> --Razvan >> >> -----Original Message----- >> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >> Sent: Monday, March 12, 2018 6:33 PM >> To: Lupusoru, Razvan A ; hotspot-compiler-dev at openjdk.java.net >> Subject: Re: RFR(S): Vector popcount support >> >> Looks good. Thank you for explanation about b,w and q variants. I will start testing. >> >> Can you explain why you have additional test with MaxVectorSize=8? >> >> Thanks, >> Vladimir >> >> On 3/12/18 5:47 PM, Lupusoru, Razvan A wrote: >>> Hi Vladimir, >>> >>> Thank you so much for the quick review. >>> >>> I have addressed following issues: >>> - renamed support_avx512_vpopcntdq to supports_vpopcntdq >>> - No longer check UseAVX > 2 following your suggestion to clear flag >>> when AVX512 is not available >>> - Added length checking in predicate >>> - Added appropriate compiler tests to exercise functionality >>> >>> Update patch is available at: >>> http://cr.openjdk.java.net/~rlupusoru/jdk_hs/webrev_vpopcount_03/ >>> >>> Regarding b and w variants, semantics of popcount are not consistent with other instructions which do allow narrowing to subword types. Namely, since popcount counts bits, narrowing ends up losing bit count information. Potentially, this can be fixed with some additional instructions being generated that test top bit and add appropriate count to each lane based on that top bit.
But given users are likely to use Integer.bitCount without casting result to byte/short, I don't believe adding these variants is useful. If you believe otherwise, I can try to see what I can do. >>> >>> Regarding q variant, it is currently not easily supported in vectorizer without some fundamental changes. That is because Long.bitCount returns int instead of long. The type mismatch in same chain of operations does not pass vectorizer alignment checking. >>> >>> Let me know if you have any more comments or suggestions! >>> >>> Thanks, >>> Razvan >>> >>> -----Original Message----- >>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >>> Sent: Friday, March 09, 2018 4:03 PM >>> To: Lupusoru, Razvan A ; >>> hotspot-compiler-dev at openjdk.java.net >>> Subject: Re: RFR(S): Vector popcount support >>> >>> Hi Razvan, >>> >>> In general changes are good. Do you plans to add vpopcntb,w too? >>> >>> Use 'supports' with 's' in name as in other support functions names: >>> supports_avx512_vpopcntdq() >>> >>> Also why use avx512 in function name? I know it is CPUID bit name. But do you have other vpopcntdq instructions, not avx512? >>> >>> In assembler_x86.cpp and other places you don't need to check UseAVX, >>> support_avx512_vpopcntdq() is enough. You can clear feature bit in vm_version_x86.cpp when AVX < 3: >>> >>> if (UseAVX < 3) { >>> _features &= ~CPU_AVX512F; >>> _features &= ~CPU_AVX512DQ; >>> _features &= ~CPU_AVX512CD; >>> _features &= ~CPU_AVX512BW; >>> _features &= ~CPU_AVX512VL; >>> + _features &= ~CPU_AVX512_VPOPCNTDQ; >>> } >>> >>> In x86.ad you forgot to add length check in predicate() like next: >>> >>> instruct vadd2I_reg(vecD dst, vecD src1, vecD src2) %{ >>> predicate(UseAVX > 0 && n->as_Vector()->length() == 2); >>> >>> And, please, add code generation test to test/hotspot/jtreg/compiler/vectorization/ tests to verify correctness of vector operations. 
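The correctness test requested here amounts to comparing a loop that the superword pass can vectorize against an independent scalar reference. Below is a minimal sketch under that assumption. The class is hypothetical and is not the compiler/vectorization/TestPopCountVector test that was later added. Whether C2 actually vectorizes the loop depends on the CPU and on flags such as -XX:MaxVectorSize.

```java
import java.util.Random;

// Hypothetical correctness check for a popcount loop.
final class PopCountCheck {
    // Loop of the shape the superword vectorizer targets:
    // a straight int[] -> int[] map through Integer.bitCount.
    static void vectorCandidate(int[] src, int[] dst) {
        for (int i = 0; i < src.length; i++) {
            dst[i] = Integer.bitCount(src[i]);
        }
    }

    // Independent scalar reference: clear the lowest set bit until none remain.
    static int reference(int x) {
        int n = 0;
        while (x != 0) {
            x &= x - 1;
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        int[] src = new int[1024];
        int[] dst = new int[src.length];
        Random r = new Random(42);
        for (int i = 0; i < src.length; i++) {
            src[i] = r.nextInt();
        }
        // Call the loop many times so that it gets JIT-compiled.
        for (int iter = 0; iter < 10_000; iter++) {
            vectorCandidate(src, dst);
        }
        for (int i = 0; i < src.length; i++) {
            if (dst[i] != reference(src[i])) {
                throw new AssertionError("mismatch at index " + i);
            }
        }
        System.out.println("OK");
    }
}
```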
>>> >>> Thanks, >>> Vladimir >>> >>> On 3/9/18 2:54 PM, Lupusoru, Razvan A wrote: >>>> Hi everyone, >>>> >>>> As per "Intel Architecture Instruction Set Extensions and Future >>>> Features Programming Reference" manual [1], vector popcount >>>> instruction will be supported in future Intel ISA. I have updated the >>>> superword vectorizer to take advantage of this instruction. I have >>>> tested with Intel SDE [2] to confirm encoding and semantics are >>>> correctly implemented. Please take a look and let me know if you have >>>> any questions or comments. >>>> >>>> http://cr.openjdk.java.net/~rlupusoru/jdk_hs/webrev_vpopcount_01/inde >>>> x >>>> .html >>>> >>>> Thanks, >>>> >>>> Razvan >>>> >>>> [1] >>>> https://software.intel.com/sites/default/files/managed/c5/15/architec >>>> t ure-instruction-set-extensions-programming-reference.pdf >>>> >>>> [2] >>>> https://software.intel.com/en-us/articles/intel-software-development- >>>> e >>>> mulator >>>> >>>> [3] https://bugs.openjdk.java.net/browse/JDK-8199421 >>>> From martin.doerr at sap.com Wed Mar 14 11:10:47 2018 From: martin.doerr at sap.com (Doerr, Martin) Date: Wed, 14 Mar 2018 11:10:47 +0000 Subject: RFR(S): Vector popcount support In-Reply-To: References: <48D92A70936F7946BE99F3FF5ECB6461F1957FA0@ORSMSX105.amr.corp.intel.com> <3f80df98-6bb8-bbe9-238d-8b2ee5750ab4@oracle.com> <48D92A70936F7946BE99F3FF5ECB6461F1958964@ORSMSX105.amr.corp.intel.com> <83bd1990-afe4-f581-3993-d0944d040212@oracle.com> <48D92A70936F7946BE99F3FF5ECB6461F1958BAB@ORSMSX105.amr.corp.intel.com> Message-ID: Hi Aleksey, awesome. Thank you. Best regards, Martin -----Original Message----- From: Aleksey Shipilev [mailto:shade at redhat.com] Sent: Mittwoch, 14.
März 2018 12:06 To: Doerr, Martin ; Vladimir Kozlov ; Lupusoru, Razvan A ; hotspot-compiler-dev at openjdk.java.net Subject: Re: RFR(S): Vector popcount support I filed: https://bugs.openjdk.java.net/browse/JDK-8199603 ...and trying to fix it. -Aleksey On 03/14/2018 12:03 PM, Doerr, Martin wrote: > Hi, > > unfortunately, this change breaks the Windows 32 bit build: > ad_x86.obj : error LNK2019: unresolved external symbol "public: void __thiscall Assembler::vpopcntd(class XMMRegisterImpl *,class XMMRegisterImpl *,int)" (?vpopcntd at Assembler@@QAEXPAVXMMRegisterImpl@@0H at Z) referenced in function "private: virtual void __thiscall vpopcount16INode::emit(class CodeBuffer &,class PhaseRegAlloc *)const " (?emit at vpopcount16INode@@EBEXAAVCodeBuffer@@PAVPhaseRegAlloc@@@Z) > > Should the new nodes not be in x86_64.ad? > The declaration of vpopcntd is in shared 32/64 bit code while the definition is in 64 bit code. > > Is already somebody working on a fix? > > Best regards, > Martin > > > -----Original Message----- > From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Vladimir Kozlov > Sent: Dienstag, 13. März 2018 19:26 > To: Lupusoru, Razvan A ; hotspot-compiler-dev at openjdk.java.net > Subject: Re: RFR(S): Vector popcount support > > On 3/13/18 11:15 AM, Lupusoru, Razvan A wrote: >> Hi Vladimir, >> >> I wanted to add at least one test variant that will exercise a smaller vector size (in this case 8 bytes). That is because when AVX512 is supported, it is possible to use smaller vectors with EVEX encoding. If you find it unnecessary, please feel free to remove it before integration. > > Good to know. > > Testing passed and I pushed changes. > > Thanks, > Vladimir > >> >> Thanks!
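Razvan's 8-byte test variant follows from simple lane arithmetic. MaxVectorSize is given in bytes and an int lane is 4 bytes wide, so MaxVectorSize=8 limits the vectorizer to 2-lane int vectors. That is the case a predicate with length() == 2 selects. The sketch below just computes the lane counts. The vpopcount2I/4I/8I rule names are an assumption by analogy with vpopcount16INode from the link error quoted here; the patch itself is not shown to confirm them.

```java
// Hypothetical lane-count arithmetic for int popcount vectors.
final class PopCountLanes {
    // Number of int lanes in a vector of the given size in bytes.
    static int intLanes(int vectorBytes) {
        return vectorBytes / Integer.BYTES;
    }

    // Assumed rule name, by analogy with vpopcount16I.
    static String ruleFor(int vectorBytes) {
        return "vpopcount" + intLanes(vectorBytes) + "I";
    }

    public static void main(String[] args) {
        for (int bytes : new int[] {8, 16, 32, 64}) {
            // e.g. "8 bytes -> 2 lanes, vpopcount2I"
            System.out.println(bytes + " bytes -> " + intLanes(bytes) + " lanes, " + ruleFor(bytes));
        }
    }
}
```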
>> >> --Razvan >> >> -----Original Message----- >> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >> Sent: Monday, March 12, 2018 6:33 PM >> To: Lupusoru, Razvan A ; hotspot-compiler-dev at openjdk.java.net >> Subject: Re: RFR(S): Vector popcount support >> >> Looks good. Thank you for explanation about b,w and q variants. I will start testing. >> >> Can you explain why you have additional test with MaxVectorSize=8? >> >> Thanks, >> Vladimir >> >> On 3/12/18 5:47 PM, Lupusoru, Razvan A wrote: >>> Hi Vladimir, >>> >>> Thank you so much for the quick review. >>> >>> I have addressed following issues: >>> - renamed support_avx512_vpopcntdq to supports_vpopcntdq >>> - No longer check UseAVX > 2 following your suggestion to clear flag >>> when AVX512 is not available >>> - Added length checking in predicate >>> - Added appropriate compiler tests to exercise functionality >>> >>> Update patch is available at: >>> http://cr.openjdk.java.net/~rlupusoru/jdk_hs/webrev_vpopcount_03/ >>> >>> Regarding b and w variants, semantics of popcount are not consistent with other instructions which do allow narrowing to subword types. Namely, since popcount counts bits, narrowing ends up losing bit count information. Potentially, this can be fixed with some additional instructions being generated that test top bit and add appropriate count to each lane based on that top bit. But given users are likely to use Integer.bitCount without casting result to byte/short, I don't believe adding these variants is useful. If you believe otherwise, I can try to see what I can do. >>> >>> Regarding q variant, it is currently not easily supported in vectorizer without some fundamental changes. That is because Long.bitCount returns int instead of long. The type mismatch in same chain of operations does not pass vectorizer alignment checking. >>> >>> Let me know if you have any more comments or suggestions! 
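The q-variant limitation Razvan mentions comes from the signature of Long.bitCount, which takes a long but returns an int. A kernel like the hypothetical one below therefore loads 8-byte elements and stores 4-byte results within one chain of operations. According to the thread, that mix is what the vectorizer's alignment checks reject. The sketch only demonstrates the Java-level types and values.

```java
import java.util.Arrays;

// Hypothetical kernel showing the long -> int type change of Long.bitCount.
final class LongPopCount {
    // Each iteration reads an 8-byte element and writes a 4-byte result,
    // so element sizes differ within one chain of operations.
    static void countBits(long[] src, int[] dst) {
        for (int i = 0; i < src.length; i++) {
            dst[i] = Long.bitCount(src[i]);
        }
    }

    public static void main(String[] args) {
        long[] src = {0L, -1L, 0x0F0FL, Long.MIN_VALUE};
        int[] dst = new int[src.length];
        countBits(src, dst);
        System.out.println(Arrays.toString(dst)); // [0, 64, 8, 1]
    }
}
```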
>>> >>> Thanks, >>> Razvan >>> >>> -----Original Message----- >>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >>> Sent: Friday, March 09, 2018 4:03 PM >>> To: Lupusoru, Razvan A ; >>> hotspot-compiler-dev at openjdk.java.net >>> Subject: Re: RFR(S): Vector popcount support >>> >>> Hi Razvan, >>> >>> In general changes are good. Do you plans to add vpopcntb,w too? >>> >>> Use 'supports' with 's' in name as in other support functions names: >>> supports_avx512_vpopcntdq() >>> >>> Also why use avx512 in function name? I know it is CPUID bit name. But do you have other vpopcntdq instructions, not avx512? >>> >>> In assembler_x86.cpp and other places you don't need to check UseAVX, >>> support_avx512_vpopcntdq() is enough. You can clear feature bit in vm_version_x86.cpp when AVX < 3: >>> >>> if (UseAVX < 3) { >>> _features &= ~CPU_AVX512F; >>> _features &= ~CPU_AVX512DQ; >>> _features &= ~CPU_AVX512CD; >>> _features &= ~CPU_AVX512BW; >>> _features &= ~CPU_AVX512VL; >>> + _features &= ~CPU_AVX512_VPOPCNTDQ; >>> } >>> >>> In x86.ad you forgot to add length check in predicate() like next: >>> >>> instruct vadd2I_reg(vecD dst, vecD src1, vecD src2) %{ >>> predicate(UseAVX > 0 && n->as_Vector()->length() == 2); >>> >>> And, please, add code generation test to test/hotspot/jtreg/compiler/vectorization/ tests to verify correctness of vector operations. >>> >>> Thanks, >>> Vladimir >>> >>> On 3/9/18 2:54 PM, Lupusoru, Razvan A wrote: >>>> Hi everyone, >>>> >>>> As per "Intel Architecture Instruction Set Extensions and Future >>>> Features Programming Reference" manual [1], vector popcount >>>> instruction will be supported in future Intel ISA. I have updated the >>>> superword vectorizer to take advantage of this instruction. I have >>>> tested with Intel SDE [2] to confirm encoding and semantics are >>>> correctly implemented. Please take a look and let me know if you have >>>> any questions or comments. 
>>>> >>>> http://cr.openjdk.java.net/~rlupusoru/jdk_hs/webrev_vpopcount_01/inde >>>> x >>>> .html >>>> >>>> Thanks, >>>> >>>> Razvan >>>> >>>> [1] >>>> https://software.intel.com/sites/default/files/managed/c5/15/architec >>>> t ure-instruction-set-extensions-programming-reference.pdf >>>> >>>> [2] >>>> https://software.intel.com/en-us/articles/intel-software-development- >>>> e >>>> mulator >>>> >>>> [3] https://bugs.openjdk.java.net/browse/JDK-8199421 >>>> From tobias.hartmann at oracle.com Wed Mar 14 11:06:47 2018 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Wed, 14 Mar 2018 12:06:47 +0100 Subject: RFR(S): 8192992: Test8007294.java failed: attempted to spill a non-spillable item In-Reply-To: <2121dc39-1f4f-d428-1de5-3e62b90e75c4@oracle.com> References: <9249e5ae-3043-da1d-80de-fcecdabdfad2@oracle.com> <2121dc39-1f4f-d428-1de5-3e62b90e75c4@oracle.com> Message-ID: <45e89f3a-76fc-c82d-9946-6aee5a39c7be@oracle.com> Hi Nils, very nice work! I also like the new debugging points. Small typo in gcm.cpp:711: "actully" -> "actually", "skill" -> "skip"? Best regards, Tobias On 14.03.2018 11:17, Nils Eliasson wrote: > Hi Vladimir, > > After taking a very close look I found that the anti-dependency checking that hoists the > testN_mem_reg from the jmpCon is broken, and that the hoisting is unnecessary. So this is not a case > where we need anti-depenency checking for loads before matching. > > Generally the insert_anti_dependences looks good, except the store->is_Phi() clause that is full of > holes (overly conservative). I don't think I fully understand how the graph looks when the clause is > needed, but it tries to find stores upwards that is otherwise unreachable from the downward memory > flow search. > > I found these three flaws: > > 1) A Phi in a block that is preceded by a store - even though the store is dominated by the loads > LCA it will force the testN up! We don't check where the stores are located. 
> > 2) A Phi that consumes the same memory as the load may force it up, even though no stores are involved. > > 3) A Phi that consumes a mergemem, which in itself has already has been processed and passed as > irrelevant, may force the testN up. > > One could add that any predecessor to the phi would have to be a store/call to affect the load > placement. > > I have also added some additional debugging printouts to the patch. > > http://cr.openjdk.java.net/~neliasso/8192992/webrev.03/ > https://bugs.openjdk.java.net/browse/JDK-8192992 > > Regards, > > // Nils > > > > On 2018-03-05 18:02, Vladimir Kozlov wrote: >> Hi Nils, >> >> Yes, it is legal workaround but this way you removed all subsuming loads in code. >> >> Should we do anti-dependency check for loads during matching when shared nodes are marked?: >> >> http://hg.openjdk.java.net/jdk/hs/file/4e82736053ae/src/hotspot/share/opto/matcher.cpp#l2159 >> >> How expensive would be that? >> >> Vladimir >> >> On 3/5/18 7:50 AM, Nils Eliasson wrote: >>> Hi, >>> >>> This patch is a workaround for a scheduling problem encountered in some rare circumstances. >>> Instead of hitting the assert we retry the compilation without subsuming loads. >>> >>> To quote Tobias: >>> >>> "The crash happens because a testN_mem_reg0 (CmpN(LoadN(mem), NULL)) is scheduled in a different >>> block than its jmpCon user and the register allocator tries to spill the flag register. The >>> problem is that PhaseCFG::schedule_late() detects an anti-dependency for the testN_mem_reg0 on a >>> bottom memory Phi and therefore raises the LCA to the early block (see >>> PhaseCFG::insert_anti_dependences()) which is "far away" from its jmpCon user. " >>> >>> Thanks to Roland who suggested the workaround. 
>>> >>> https://bugs.openjdk.java.net/browse/JDK-8192992 >>> >>> http://cr.openjdk.java.net/~neliasso/8192992/webrev.01/ >>> >>> Regards, >>> >>> Nils >>> > From tobias.hartmann at oracle.com Wed Mar 14 11:07:47 2018 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Wed, 14 Mar 2018 12:07:47 +0100 Subject: RFR(S): Vector popcount support In-Reply-To: References: <48D92A70936F7946BE99F3FF5ECB6461F1957FA0@ORSMSX105.amr.corp.intel.com> <3f80df98-6bb8-bbe9-238d-8b2ee5750ab4@oracle.com> <48D92A70936F7946BE99F3FF5ECB6461F1958964@ORSMSX105.amr.corp.intel.com> <83bd1990-afe4-f581-3993-d0944d040212@oracle.com> <48D92A70936F7946BE99F3FF5ECB6461F1958BAB@ORSMSX105.amr.corp.intel.com> Message-ID: <17287760-ddfd-c4e5-5f4a-7817bc9ec77a@oracle.com> Hi Martin, On 14.03.2018 12:03, Doerr, Martin wrote: > Is already somebody working on a fix? Aleksey filed: https://bugs.openjdk.java.net/browse/JDK-8199603 Best regards, Tobias From shade at redhat.com Wed Mar 14 11:18:58 2018 From: shade at redhat.com (Aleksey Shipilev) Date: Wed, 14 Mar 2018 12:18:58 +0100 Subject: RFR 8199603: Build failures after JDK-8199421 "Add support for vector popcount" Message-ID: <5db89b74-167b-e151-fba9-22d3e2ea6ad8@redhat.com> Bug: https://bugs.openjdk.java.net/browse/JDK-8199603 Fix: http://cr.openjdk.java.net/~shade/8199603/webrev.01/ The problem with the offending change was that the declaration for vpopcntd comes in shared 32/64 assembler, but definition is in 64 assembler only. Assembler method is being used from shared 32/64 x86.ad, so the simplest fix is moving the definition close to shared 32/64, near existing popcntl. Testing: x86_32 build, x86_64 compiler/vectorization/TestPopCountVector test (can run it through submit-hs, but I think it is too trivial to bother) Thanks, -Aleksey
From tobias.hartmann at oracle.com Wed Mar 14 11:30:28 2018 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Wed, 14 Mar 2018 12:30:28 +0100 Subject: RFR 8199603: Build failures after JDK-8199421 "Add support for vector popcount" In-Reply-To: <5db89b74-167b-e151-fba9-22d3e2ea6ad8@redhat.com> References: <5db89b74-167b-e151-fba9-22d3e2ea6ad8@redhat.com> Message-ID: <7eccde5a-c856-b9a7-af0b-9e054d8fb668@oracle.com> Hi Aleksey, looks good to me! Best regards, Tobias On 14.03.2018 12:18, Aleksey Shipilev wrote: > Bug: > https://bugs.openjdk.java.net/browse/JDK-8199603 > > Fix: > http://cr.openjdk.java.net/~shade/8199603/webrev.01/ > > The problem with the offending change was that the declaration for vpopcntd comes in shared 32/64 > assembler, but definition is in 64 assembler only. Assembler method is being used from shared 32/64 > x86.ad, so the simplest fix is moving the definition close to shared 32/64, near existing popcntl. > > Testing: x86_32 build, x86_64 compiler/vectorization/TestPopCountVector test > (can run it through submit-hs, but I think it is too trivial to bother) > > Thanks, > -Aleksey > From shade at redhat.com Wed Mar 14 11:33:52 2018 From: shade at redhat.com (Aleksey Shipilev) Date: Wed, 14 Mar 2018 12:33:52 +0100 Subject: RFR 8199603: Build failures after JDK-8199421 "Add support for vector popcount" In-Reply-To: <7eccde5a-c856-b9a7-af0b-9e054d8fb668@oracle.com> References: <5db89b74-167b-e151-fba9-22d3e2ea6ad8@redhat.com> <7eccde5a-c856-b9a7-af0b-9e054d8fb668@oracle.com> Message-ID: <58dc83b3-b061-81e9-ada1-795264fdc4eb@redhat.com> Thanks Tobias, With your Reviewer hat on, you consider this trivial? Should I push it now? -Aleksey On 03/14/2018 12:30 PM, Tobias Hartmann wrote: > Hi Aleksey, > > looks good to me!
> > Best regards, > Tobias > > On 14.03.2018 12:18, Aleksey Shipilev wrote: >> Bug: >> https://bugs.openjdk.java.net/browse/JDK-8199603 >> >> Fix: >> http://cr.openjdk.java.net/~shade/8199603/webrev.01/ >> >> The problem with the offending change was that the declaration for vpopcntd comes in shared 32/64 >> assembler, but definition is in 64 assembler only. Assembler method is being used from shared 32/64 >> x86.ad, so the simplest fix is moving the definition close to shared 32/64, near existing popcntl. >> >> Testing: x86_32 build, x86_64 compiler/vectorization/TestPopCountVector test >> (can run it through submit-hs, but I think it is too trivial to bother) >> >> Thanks, >> -Aleksey >> From tobias.hartmann at oracle.com Wed Mar 14 11:41:23 2018 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Wed, 14 Mar 2018 12:41:23 +0100 Subject: RFR 8199603: Build failures after JDK-8199421 "Add support for vector popcount" In-Reply-To: <58dc83b3-b061-81e9-ada1-795264fdc4eb@redhat.com> References: <5db89b74-167b-e151-fba9-22d3e2ea6ad8@redhat.com> <7eccde5a-c856-b9a7-af0b-9e054d8fb668@oracle.com> <58dc83b3-b061-81e9-ada1-795264fdc4eb@redhat.com> Message-ID: <5877a101-f52f-817b-9094-cb5ccfecd6be@oracle.com> Hi Aleksey, On 14.03.2018 12:33, Aleksey Shipilev wrote: > With your Reviewer hat on, you consider this trivial? Should I push it now? I would recommend running this through the submit repo just to make sure. Thanks, Tobias > On 03/14/2018 12:30 PM, Tobias Hartmann wrote: >> Hi Aleksey, >> >> looks good to me!
>> >> Best regards, >> Tobias >> >> On 14.03.2018 12:18, Aleksey Shipilev wrote: >>> Bug: >>> https://bugs.openjdk.java.net/browse/JDK-8199603 >>> >>> Fix: >>> http://cr.openjdk.java.net/~shade/8199603/webrev.01/ >>> >>> The problem with the offending change was that the declaration for vpopcntd comes in shared 32/64 >>> assembler, but definition is in 64 assembler only. Assembler method is being used from shared 32/64 >>> x86.ad, so the simplest fix is moving the definition close to shared 32/64, near existing popcntl. >>> >>> Testing: x86_32 build, x86_64 compiler/vectorization/TestPopCountVector test >>> (can run it through submit-hs, but I think it is too trivial to bother) >>> >>> Thanks, >>> -Aleksey >>> > > From nils.eliasson at oracle.com Wed Mar 14 13:43:17 2018 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Wed, 14 Mar 2018 14:43:17 +0100 Subject: RFR(S): 8192992: Test8007294.java failed: attempted to spill a non-spillable item In-Reply-To: <45e89f3a-76fc-c82d-9946-6aee5a39c7be@oracle.com> References: <9249e5ae-3043-da1d-80de-fcecdabdfad2@oracle.com> <2121dc39-1f4f-d428-1de5-3e62b90e75c4@oracle.com> <45e89f3a-76fc-c82d-9946-6aee5a39c7be@oracle.com> Message-ID: <722e4130-ba52-e2fc-be07-580c9e493dae@oracle.com> Thanks Tobias, Take a look at version 3, it has the comments fixed. I can add that I am running precheckin testing. Best regards, Nils On 2018-03-14 12:06, Tobias Hartmann wrote: > Hi Nils, > > very nice work! I also like the new debugging points. > > Small typo in gcm.cpp:711: "actully" -> "actually", "skill" -> "skip"? > > Best regards, > Tobias > > > On 14.03.2018 11:17, Nils Eliasson wrote: >> Hi Vladimir, >> >> After taking a very close look I found that the anti-dependency checking that hoists the >> testN_mem_reg from the jmpCon is broken, and that the hoisting is unnecessary. So this is not a case >> where we need anti-depenency checking for loads before matching. 
>> >> Generally the insert_anti_dependences looks good, except the store->is_Phi() clause that is full of >> holes (overly conservative). I don't think I fully understand how the graph looks when the clause is >> needed, but it tries to find stores upwards that is otherwise unreachable from the downward memory >> flow search. >> >> I found these three flaws: >> >> 1) A Phi in a block that is preceded by a store - even though the store is dominated by the loads >> LCA it will force the testN up! We don't check where the stores are located. >> >> 2) A Phi that consumes the same memory as the load may force it up, even though no stores are involved. >> >> 3) A Phi that consumes a mergemem, which in itself has already has been processed and passed as >> irrelevant, may force the testN up. >> >> One could add that any predecessor to the phi would have to be a store/call to affect the load >> placement. >> >> I have also added some additional debugging printouts to the patch. >> >> http://cr.openjdk.java.net/~neliasso/8192992/webrev.03/ >> https://bugs.openjdk.java.net/browse/JDK-8192992 >> >> Regards, >> >> // Nils >> >> >> >> On 2018-03-05 18:02, Vladimir Kozlov wrote: >>> Hi Nils, >>> >>> Yes, it is legal workaround but this way you removed all subsuming loads in code. >>> >>> Should we do anti-dependency check for loads during matching when shared nodes are marked?: >>> >>> http://hg.openjdk.java.net/jdk/hs/file/4e82736053ae/src/hotspot/share/opto/matcher.cpp#l2159 >>> >>> How expensive would be that? >>> >>> Vladimir >>> >>> On 3/5/18 7:50 AM, Nils Eliasson wrote: >>>> Hi, >>>> >>>> This patch is a workaround for a scheduling problem encountered in some rare circumstances. >>>> Instead of hitting the assert we retry the compilation without subsuming loads. 
>>>> >>>> To quote Tobias: >>>> >>>> "The crash happens because a testN_mem_reg0 (CmpN(LoadN(mem), NULL)) is scheduled in a different >>>> block than its jmpCon user and the register allocator tries to spill the flag register. The >>>> problem is that PhaseCFG::schedule_late() detects an anti-dependency for the testN_mem_reg0 on a >>>> bottom memory Phi and therefore raises the LCA to the early block (see >>>> PhaseCFG::insert_anti_dependences()) which is "far away" from its jmpCon user. " >>>> >>>> Thanks to Roland who suggested the workaround. >>>> >>>> https://bugs.openjdk.java.net/browse/JDK-8192992 >>>> >>>> http://cr.openjdk.java.net/~neliasso/8192992/webrev.01/ >>>> >>>> Regards, >>>> >>>> Nils >>>> From tobias.hartmann at oracle.com Wed Mar 14 13:59:47 2018 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Wed, 14 Mar 2018 14:59:47 +0100 Subject: RFR(S): 8192992: Test8007294.java failed: attempted to spill a non-spillable item In-Reply-To: <722e4130-ba52-e2fc-be07-580c9e493dae@oracle.com> References: <9249e5ae-3043-da1d-80de-fcecdabdfad2@oracle.com> <2121dc39-1f4f-d428-1de5-3e62b90e75c4@oracle.com> <45e89f3a-76fc-c82d-9946-6aee5a39c7be@oracle.com> <722e4130-ba52-e2fc-be07-580c9e493dae@oracle.com> Message-ID: <7a4d51cd-fe35-f702-c965-39273e52d62d@oracle.com> On 14.03.2018 14:43, Nils Eliasson wrote: > Take a look at version 3, it has the comments fixed. Looks good (the link in your previous email says webrev.03 but links to webrev.02, that's why I looked at the wrong one). 
Thanks, Tobias From shade at redhat.com Wed Mar 14 14:22:16 2018 From: shade at redhat.com (Aleksey Shipilev) Date: Wed, 14 Mar 2018 15:22:16 +0100 Subject: RFR 8199603: Build failures after JDK-8199421 "Add support for vector popcount" In-Reply-To: <5877a101-f52f-817b-9094-cb5ccfecd6be@oracle.com> References: <5db89b74-167b-e151-fba9-22d3e2ea6ad8@redhat.com> <7eccde5a-c856-b9a7-af0b-9e054d8fb668@oracle.com> <58dc83b3-b061-81e9-ada1-795264fdc4eb@redhat.com> <5877a101-f52f-817b-9094-cb5ccfecd6be@oracle.com> Message-ID: <65341d72-49e6-12fc-cd37-7d6fd4ad2078@redhat.com> On 03/14/2018 12:41 PM, Tobias Hartmann wrote: > On 14.03.2018 12:33, Aleksey Shipilev wrote: >> With your Reviewer hat on, you consider this trivial? Should I push it now? > > I would recommend running this through the submit repo just to make sure. submit-hs returned fine, except for apparently known failure in java/lang/invoke/condy/CondyInterfaceWithOverpassMethods.java -Aleksey From tobias.hartmann at oracle.com Wed Mar 14 14:26:30 2018 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Wed, 14 Mar 2018 15:26:30 +0100 Subject: RFR 8199603: Build failures after JDK-8199421 "Add support for vector popcount" In-Reply-To: <65341d72-49e6-12fc-cd37-7d6fd4ad2078@redhat.com> References: <5db89b74-167b-e151-fba9-22d3e2ea6ad8@redhat.com> <7eccde5a-c856-b9a7-af0b-9e054d8fb668@oracle.com> <58dc83b3-b061-81e9-ada1-795264fdc4eb@redhat.com> <5877a101-f52f-817b-9094-cb5ccfecd6be@oracle.com> <65341d72-49e6-12fc-cd37-7d6fd4ad2078@redhat.com> Message-ID: On 14.03.2018 15:22, Aleksey Shipilev wrote: > submit-hs returned fine, except for apparently known failure in > java/lang/invoke/condy/CondyInterfaceWithOverpassMethods.java Yes, that's https://bugs.openjdk.java.net/browse/JDK-8199515. You are good to go (push) then!
Thanks, Tobias From rahul.v.raghavan at oracle.com Wed Mar 14 15:32:34 2018 From: rahul.v.raghavan at oracle.com (Rahul Raghavan) Date: Wed, 14 Mar 2018 21:02:34 +0530 Subject: [11] RFR: 8071282: remove misc dead code In-Reply-To: <0145383e-bdf7-e967-3396-30ee6a0b90fa@oracle.com> References: <0145383e-bdf7-e967-3396-30ee6a0b90fa@oracle.com> Message-ID: <3ee74489-0750-a54d-49c6-866135430d53@oracle.com> Thanks Tobias, Vladimir for the review. Confirmed no issues with pre-integration testing; and updated Copyright year at source files headers. http://cr.openjdk.java.net/~rraghavan/8071282/webrev.01/ Thanks, Rahul On Wednesday 14 March 2018 01:21 PM, Tobias Hartmann wrote: > Hi Rahul, > > looks good to me too. If you get a chance, please update the copyright dates before pushing (no new > webrev required). > > Thanks, > Tobias > > On 13.03.2018 14:18, Rahul Raghavan wrote: >> Hi, >> >> Please review the following cleanup changes. >> >> - >> http://cr.openjdk.java.net/~rraghavan/8071282/webrev.00/ >> >> - >> https://bugs.openjdk.java.net/browse/JDK-8071282- >> 'remove misc dead code' >> >> >> -- Removed following unused methods - >> frame::min_local_offset_for_compiler >> frame::monitor_offset_for_compiler >> frame::pd_oop_map_offset_adjustment >> frame::local_offset_for_compiler >> frame::volatile_across_calls >> C1_MacroAssembler::unverified_entry >> void LIRGenerator::cmp_reg_mem(LIR_Condition condition, LIR_Opr reg, LIR_Opr base, LIR_Opr disp, >> BasicType type, CodeEmitInfo* info) >> >> -- No issues with local builds, pre-integration testing in progress.
>> >> >> Thanks, >> Rahul From tobias.hartmann at oracle.com Wed Mar 14 15:33:51 2018 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Wed, 14 Mar 2018 16:33:51 +0100 Subject: [11] RFR: 8071282: remove misc dead code In-Reply-To: <3ee74489-0750-a54d-49c6-866135430d53@oracle.com> References: <0145383e-bdf7-e967-3396-30ee6a0b90fa@oracle.com> <3ee74489-0750-a54d-49c6-866135430d53@oracle.com> Message-ID: Looks good. Best regards, Tobias On 14.03.2018 16:32, Rahul Raghavan wrote: > Thanks Tobias, Vladimir for the review. > > Confirmed no issues with pre-integration testing; > and updated Copyright year at source files headers. > > http://cr.openjdk.java.net/~rraghavan/8071282/webrev.01/ > > > Thanks, > Rahul > > On Wednesday 14 March 2018 01:21 PM, Tobias Hartmann wrote: >> Hi Rahul, >> >> looks good to me too. If you get a chance, please update the copyright dates before pushing (no new >> webrev required). >> >> Thanks, >> Tobias >> >> On 13.03.2018 14:18, Rahul Raghavan wrote: >>> Hi, >>> >>> Please review the following cleanup changes. >>> >>> - >>> http://cr.openjdk.java.net/~rraghavan/8071282/webrev.00/ >>> >>> - >>> https://bugs.openjdk.java.net/browse/JDK-8071282- >>> 'remove misc dead code' >>> >>> >>> -- Removed following unused methods - >>> frame::min_local_offset_for_compiler >>> frame::monitor_offset_for_compiler >>> frame::pd_oop_map_offset_adjustment >>> frame::local_offset_for_compiler >>> frame::volatile_across_calls >>> C1_MacroAssembler::unverified_entry >>> void LIRGenerator::cmp_reg_mem(LIR_Condition condition, LIR_Opr reg, LIR_Opr base, LIR_Opr disp, >>> BasicType type, CodeEmitInfo* info) >>> >>> -- No issues with local builds, pre-integration testing in progress.
>>> >>> >>> Thanks, >>> Rahul From vladimir.kozlov at oracle.com Wed Mar 14 16:57:57 2018 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 14 Mar 2018 09:57:57 -0700 Subject: RFR 8199603: Build failures after JDK-8199421 "Add support for vector popcount" In-Reply-To: <5db89b74-167b-e151-fba9-22d3e2ea6ad8@redhat.com> References: <5db89b74-167b-e151-fba9-22d3e2ea6ad8@redhat.com> Message-ID: <23193b47-32c8-83c9-8673-7b26d53ae494@oracle.com> Thank you, Aleksey, for fixing it fast! Vladimir On 3/14/18 4:18 AM, Aleksey Shipilev wrote: > Bug: > https://bugs.openjdk.java.net/browse/JDK-8199603 > > Fix: > http://cr.openjdk.java.net/~shade/8199603/webrev.01/ > > The problem with the offending change was that the declaration for vpopcntd comes in shared 32/64 > assembler, but definition is in 64 assembler only. Assembler method is being used from shared 32/64 > x86.ad, so the simplest fix is moving the definition close to shared 32/64, near existing popcntl. > > Testing: x86_32 build, x86_64 compiler/vectorization/TestPopCountVector test > (can run it through submit-hs, but I think it is too trivial to bother) > > Thanks, > -Aleksey > From vladimir.kozlov at oracle.com Wed Mar 14 17:04:30 2018 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 14 Mar 2018 10:04:30 -0700 Subject: RFR(S): Vector popcount support In-Reply-To: References: <48D92A70936F7946BE99F3FF5ECB6461F1957FA0@ORSMSX105.amr.corp.intel.com> <3f80df98-6bb8-bbe9-238d-8b2ee5750ab4@oracle.com> <48D92A70936F7946BE99F3FF5ECB6461F1958964@ORSMSX105.amr.corp.intel.com> <83bd1990-afe4-f581-3993-d0944d040212@oracle.com> <48D92A70936F7946BE99F3FF5ECB6461F1958BAB@ORSMSX105.amr.corp.intel.com> Message-ID: <71fdcf82-2e00-0981-779c-c4ecbab49575@oracle.com> Yes, big thanks to Aleksey. And sorry for breaking 32-bit. Vladimir On 3/14/18 4:10 AM, Doerr, Martin wrote: > Hi Aleksey, > > awesome. Thank you. 
> > Best regards, > Martin > > > -----Original Message----- > From: Aleksey Shipilev [mailto:shade at redhat.com] > Sent: Wednesday, 14 March 2018 12:06 > To: Doerr, Martin ; Vladimir Kozlov ; Lupusoru, Razvan A ; hotspot-compiler-dev at openjdk.java.net > Subject: Re: RFR(S): Vector popcount support > > I filed: > https://bugs.openjdk.java.net/browse/JDK-8199603 > > ...and trying to fix it. > > -Aleksey > > On 03/14/2018 12:03 PM, Doerr, Martin wrote: >> Hi, >> >> unfortunately, this change breaks the Windows 32 bit build: >> ad_x86.obj : error LNK2019: unresolved external symbol "public: void __thiscall Assembler::vpopcntd(class XMMRegisterImpl *,class XMMRegisterImpl *,int)" (?vpopcntd@Assembler@@QAEXPAVXMMRegisterImpl@@0H@Z) referenced in function "private: virtual void __thiscall vpopcount16INode::emit(class CodeBuffer &,class PhaseRegAlloc *)const " (?emit@vpopcount16INode@@EBEXAAVCodeBuffer@@PAVPhaseRegAlloc@@@Z) >> >> Should the new nodes not be in x86_64.ad? >> The declaration of vpopcntd is in shared 32/64 bit code while the definition is in 64 bit code. >> >> Is somebody already working on a fix? >> >> Best regards, >> Martin >> >> >> -----Original Message----- >> From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Vladimir Kozlov >> Sent: Tuesday, 13 March 2018 19:26 >> To: Lupusoru, Razvan A ; hotspot-compiler-dev at openjdk.java.net >> Subject: Re: RFR(S): Vector popcount support >> >> On 3/13/18 11:15 AM, Lupusoru, Razvan A wrote: >>> Hi Vladimir, >>> >>> I wanted to add at least one test variant that will exercise a smaller vector size (in this case 8 bytes). That is because when AVX512 is supported, it is possible to use smaller vectors with EVEX encoding. If you find it unnecessary, please feel free to remove it before integration. >> >> Good to know. >> >> Testing passed and I pushed changes. >> >> Thanks, >> Vladimir >> >>> >>> Thanks!
>>> >>> --Razvan >>> >>> -----Original Message----- >>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >>> Sent: Monday, March 12, 2018 6:33 PM >>> To: Lupusoru, Razvan A ; hotspot-compiler-dev at openjdk.java.net >>> Subject: Re: RFR(S): Vector popcount support >>> >>> Looks good. Thank you for the explanation about the b, w and q variants. I will start testing. >>> >>> Can you explain why you have an additional test with MaxVectorSize=8? >>> >>> Thanks, >>> Vladimir >>> >>> On 3/12/18 5:47 PM, Lupusoru, Razvan A wrote: >>>> Hi Vladimir, >>>> >>>> Thank you so much for the quick review. >>>> >>>> I have addressed the following issues: >>>> - renamed support_avx512_vpopcntdq to supports_vpopcntdq >>>> - No longer check UseAVX > 2 following your suggestion to clear flag >>>> when AVX512 is not available >>>> - Added length checking in predicate >>>> - Added appropriate compiler tests to exercise functionality >>>> >>>> Updated patch is available at: >>>> http://cr.openjdk.java.net/~rlupusoru/jdk_hs/webrev_vpopcount_03/ >>>> >>>> Regarding b and w variants, semantics of popcount are not consistent with other instructions which do allow narrowing to subword types. Namely, since popcount counts bits, narrowing ends up losing bit count information. Potentially, this can be fixed with some additional instructions being generated that test the top bit and add the appropriate count to each lane based on that top bit. But given users are likely to use Integer.bitCount without casting the result to byte/short, I don't believe adding these variants is useful. If you believe otherwise, I can try to see what I can do. >>>> >>>> Regarding q variant, it is currently not easily supported in vectorizer without some fundamental changes. That is because Long.bitCount returns int instead of long. The type mismatch in the same chain of operations does not pass the vectorizer's alignment checking. >>>> >>>> Let me know if you have any more comments or suggestions!
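Razvan's argument about the b and w variants can be checked at the Java level. The following is a minimal sketch with hypothetical class and method names (not code from the webrev): Integer.bitCount sees a byte only after sign extension, so an 8-bit lane popcount undercounts negative bytes by the 24 sign-extension bits, and the top-bit fix-up he describes restores the Java result.

```java
// Hypothetical sketch: why an 8-bit lane popcount (vpopcntb-style) does not
// directly implement Integer.bitCount applied to byte[] elements.
final class PopCountSketch {

    // int-in, int-out loop: the shape the patch under review lets the
    // superword pass turn into vpopcntd when VPOPCNTDQ is available.
    static void bitCountInts(int[] src, int[] dst) {
        for (int i = 0; i < src.length; i++) {
            dst[i] = Integer.bitCount(src[i]);
        }
    }

    // Java semantics: b is sign-extended to int first, so a negative
    // byte contributes 24 extra one bits.
    static int javaBitCount(byte b) {
        return Integer.bitCount(b);
    }

    // What a per-byte lane popcount computes: only the 8 stored bits.
    static int laneBitCount(byte b) {
        return Integer.bitCount(b & 0xFF);
    }

    // Fix-up described in the thread: test the lane's top bit and add the
    // count of the sign-extension bits (32 - 8 = 24) when it is set.
    static int laneBitCountFixedUp(byte b) {
        return laneBitCount(b) + (b < 0 ? 24 : 0);
    }

    public static void main(String[] args) {
        byte b = (byte) 0x80;                       // -128
        System.out.println(javaBitCount(b));        // 25
        System.out.println(laneBitCount(b));        // 1
        System.out.println(laneBitCountFixedUp(b)); // 25
    }
}
```

For the q variant, a loop such as dst[i] = Long.bitCount(src[i]) with a long[] source and an int[] destination mixes long and int lanes in one chain, which, according to Razvan, the vectorizer's alignment checks currently reject.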
>>>> >>>> Thanks, >>>> Razvan >>>> >>>> -----Original Message----- >>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >>>> Sent: Friday, March 09, 2018 4:03 PM >>>> To: Lupusoru, Razvan A ; >>>> hotspot-compiler-dev at openjdk.java.net >>>> Subject: Re: RFR(S): Vector popcount support >>>> >>>> Hi Razvan, >>>> >>>> In general changes are good. Do you plan to add vpopcntb,w too? >>>> >>>> Use 'supports' with 's' in name as in other support function names: >>>> supports_avx512_vpopcntdq() >>>> >>>> Also why use avx512 in function name? I know it is CPUID bit name. But do you have other vpopcntdq instructions, not avx512? >>>> >>>> In assembler_x86.cpp and other places you don't need to check UseAVX, >>>> support_avx512_vpopcntdq() is enough. You can clear feature bit in vm_version_x86.cpp when AVX < 3: >>>> >>>> if (UseAVX < 3) { >>>> _features &= ~CPU_AVX512F; >>>> _features &= ~CPU_AVX512DQ; >>>> _features &= ~CPU_AVX512CD; >>>> _features &= ~CPU_AVX512BW; >>>> _features &= ~CPU_AVX512VL; >>>> + _features &= ~CPU_AVX512_VPOPCNTDQ; >>>> } >>>> >>>> In x86.ad you forgot to add length check in predicate() like this: >>>> >>>> instruct vadd2I_reg(vecD dst, vecD src1, vecD src2) %{ >>>> predicate(UseAVX > 0 && n->as_Vector()->length() == 2); >>>> >>>> And, please, add code generation test to test/hotspot/jtreg/compiler/vectorization/ tests to verify correctness of vector operations. >>>> >>>> Thanks, >>>> Vladimir >>>> >>>> On 3/9/18 2:54 PM, Lupusoru, Razvan A wrote: >>>>> Hi everyone, >>>>> >>>>> As per "Intel Architecture Instruction Set Extensions and Future >>>>> Features Programming Reference" manual [1], vector popcount >>>>> instruction will be supported in future Intel ISA. I have updated the >>>>> superword vectorizer to take advantage of this instruction. I have >>>>> tested with Intel SDE [2] to confirm encoding and semantics are >>>>> correctly implemented. Please take a look and let me know if you have >>>>> any questions or comments.
>>>>> >>>>> http://cr.openjdk.java.net/~rlupusoru/jdk_hs/webrev_vpopcount_01/index.html >>>>> >>>>> Thanks, >>>>> >>>>> Razvan >>>>> >>>>> [1] >>>>> https://software.intel.com/sites/default/files/managed/c5/15/architecture-instruction-set-extensions-programming-reference.pdf >>>>> >>>>> [2] >>>>> https://software.intel.com/en-us/articles/intel-software-development-emulator >>>>> >>>>> [3] https://bugs.openjdk.java.net/browse/JDK-8199421 >>>>> > > From vladimir.kozlov at oracle.com Wed Mar 14 17:42:54 2018 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 14 Mar 2018 10:42:54 -0700 Subject: RFR(S): 8192992: Test8007294.java failed: attempted to spill a non-spillable item In-Reply-To: <2121dc39-1f4f-d428-1de5-3e62b90e75c4@oracle.com> References: <9249e5ae-3043-da1d-80de-fcecdabdfad2@oracle.com> <2121dc39-1f4f-d428-1de5-3e62b90e75c4@oracle.com> Message-ID: <14a9a030-15b0-aab3-d840-3f12b80db275@oracle.com> Very good! Thank you for doing this analysis. Please, run our usual mach5 set of tests. Thanks, Vladimir On 3/14/18 3:17 AM, Nils Eliasson wrote: > Hi Vladimir, > > After taking a very close look I found that the anti-dependency checking > that hoists the testN_mem_reg from the jmpCon is broken, and that the > hoisting is unnecessary. So this is not a case where we need > anti-dependency checking for loads before matching. > > Generally the insert_anti_dependences looks good, except the > store->is_Phi() clause that is full of holes (overly conservative). I > don't think I fully understand how the graph looks when the clause is > needed, but it tries to find stores upwards that are otherwise > unreachable from the downward memory flow search. > > I found these three flaws: > > 1) A Phi in a block that is preceded by a store - even though the store > is dominated by the load's LCA it will force the testN up! We don't check > where the stores are located.
> > 2) A Phi that consumes the same memory as the load may force it up, even > though no stores are involved. > > 3) A Phi that consumes a mergemem, which in itself has already been > processed and passed as irrelevant, may force the testN up. > > One could add that any predecessor to the phi would have to be a > store/call to affect the load placement. > > I have also added some additional debugging printouts to the patch. > > http://cr.openjdk.java.net/~neliasso/8192992/webrev.03/ > https://bugs.openjdk.java.net/browse/JDK-8192992 > > Regards, > > // Nils > > > > On 2018-03-05 18:02, Vladimir Kozlov wrote: >> Hi Nils, >> >> Yes, it is a legal workaround but this way you removed all subsuming >> loads in code. >> >> Should we do anti-dependency check for loads during matching when >> shared nodes are marked?: >> >> http://hg.openjdk.java.net/jdk/hs/file/4e82736053ae/src/hotspot/share/opto/matcher.cpp#l2159 >> >> >> How expensive would that be? >> >> Vladimir >> >> On 3/5/18 7:50 AM, Nils Eliasson wrote: >>> Hi, >>> >>> This patch is a workaround for a scheduling problem encountered in >>> some rare circumstances. Instead of hitting the assert we retry the >>> compilation without subsuming loads. >>> >>> To quote Tobias: >>> >>> "The crash happens because a testN_mem_reg0 (CmpN(LoadN(mem), NULL)) >>> is scheduled in a different block than its jmpCon user and the >>> register allocator tries to spill the flag register. The problem is >>> that PhaseCFG::schedule_late() detects an anti-dependency for the >>> testN_mem_reg0 on a bottom memory Phi and therefore raises the LCA to >>> the early block (see PhaseCFG::insert_anti_dependences()) which is >>> "far away" from its jmpCon user. " >>> >>> Thanks to Roland who suggested the workaround.
>>> >>> https://bugs.openjdk.java.net/browse/JDK-8192992 >>> >>> http://cr.openjdk.java.net/~neliasso/8192992/webrev.01/ >>> >>> Regards, >>> >>> Nils >>> > From ningsheng.jian at linaro.org Thu Mar 15 03:03:19 2018 From: ningsheng.jian at linaro.org (Ningsheng Jian) Date: Thu, 15 Mar 2018 11:03:19 +0800 Subject: [aarch64-port-dev ] RFR: 8191954: AArch64: disable UseCISCSpill in C2 In-Reply-To: <9c3ad319-39dc-61e4-0f38-b5c4ff27fc37@redhat.com> References: <2bbe989e-558f-7cf6-a453-1fefd8ba9551@redhat.com> <9c3ad319-39dc-61e4-0f38-b5c4ff27fc37@redhat.com> Message-ID: Hi, Is this trivial patch OK for jdk11 now? http://cr.openjdk.java.net/~njian/8191954/webrev.00/ If it is OK, could someone please help to push the patch? Thanks, Ningsheng On 28 November 2017 at 18:11, Andrew Haley wrote: > On 28/11/17 09:35, Andrew Dinn wrote: >> On 28/11/17 08:49, Andrew Haley wrote: >>> On 28/11/17 05:54, Ningsheng Jian wrote: >> >> So, leaving UseCISCSpill set to true causes a small amount of extra >> checking to be done but no action is ever taken. Of course that also >> means the change is not going to be /critical/ to performance i.e. >> rushing to squeeze it into jdk10 is not justified. > > OK, so there's no hurry to change it, then. :-) > > -- > Andrew Haley > Java Platform Lead Engineer > Red Hat UK Ltd. > EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From ningsheng.jian at linaro.org Thu Mar 15 07:42:19 2018 From: ningsheng.jian at linaro.org (Ningsheng Jian) Date: Thu, 15 Mar 2018 15:42:19 +0800 Subject: RFR: 8173100: AArch64: -XX:-UseOnStackReplacement does not work together with -XX:+TieredCompilation. Message-ID: (Resend of [1]) Hi, Please help to review this fix. Bug: https://bugs.openjdk.java.net/browse/JDK-8173100 Webrev: http://cr.openjdk.java.net/~njian/8173100/webrev.01/ JDK-8159620 included a test case compiler/interpreter/DisableOSRTest.java which exposes the same issue on AArch64. 
Basically, if we specify -XX:-UseOnStackReplacement and -XX:+TieredCompilation (default) options, OSR compilations still occur. The root cause is that, even with -XX:-UseOnStackReplacement, the interpreter will still count the backedge and jump to backedge_counter_overflow to request an OSR compilation. With the correct label passed to increment_mask_and_jump, it will either jump to OSR or dispatch to the next target instruction. This fix also covers an old fix of [2] and makes the code align with x86 code (on x86, increment_mask_and_jump just jumps to the next instruction if backedge_counter_overflow is not generated, so there is no such issue there.) JTreg tests passed. [1] http://mail.openjdk.java.net/pipermail/aarch64-port-dev/2017-January/004119.html [2] http://mail.openjdk.java.net/pipermail/aarch64-port-dev/2014-February/000764.html Thanks, Ningsheng From vladimir.kozlov at oracle.com Fri Mar 16 00:18:01 2018 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 15 Mar 2018 17:18:01 -0700 Subject: [11] RFR(S) 8199212: [TESTBUG] don't run compiler/aot tests with -Xcomp In-Reply-To: References: <1c0527ac-acda-8124-b2de-99e5b058c185@oracle.com> <68B11413-38BE-424C-B7A0-5C2E544BA8D7@oracle.com> Message-ID: http://cr.openjdk.java.net/~kvn/8199212/webrev.01/ After additional testing I decided to redo the changes by also splitting Runtime and GC tests when we run with Xcomp. I also doubled timeouts for a few Hotspot tests which hit time limits on SPARC in some configurations. Thanks, Vladimir On 3/12/18 12:39 PM, Vladimir Kozlov wrote: > Thank you, Igor > > Vladimir > > On 3/12/18 12:20 PM, Igor Ignatev wrote: >> Hi Vladimir, >> >> The fix looks good. >> >> - Igor >> >>> On Mar 7, 2018, at 9:02 PM, Vladimir Kozlov >>> wrote: >>> >>> http://cr.openjdk.java.net/~kvn/8199212/webrev.00/ >>> https://bugs.openjdk.java.net/browse/JDK-8199212 >>> >>> Our testing shows that AOT jtreg tests consume a lot of time when >>> running with -Xcomp. Running AOT compiler in these tests with -Xcomp
Running AOT compiler in these tests with -Xcomp >>> is not helpful. >>> >>> -- >>> Thanks, >>> Vladimir >> From aph at redhat.com Fri Mar 16 09:51:56 2018 From: aph at redhat.com (Andrew Haley) Date: Fri, 16 Mar 2018 09:51:56 +0000 Subject: RFR: 8173100: AArch64: -XX:-UseOnStackReplacement does not work together with -XX:+TieredCompilation. In-Reply-To: References: Message-ID: On 15/03/18 07:42, Ningsheng Jian wrote: > Please help to review this fix. > > Bug: https://bugs.openjdk.java.net/browse/JDK-8173100 > Webrev: http://cr.openjdk.java.net/~njian/8173100/webrev.01/ That looks good. Thank you. -- Andrew Haley Java Platform Lead Engineer Red Hat UK Ltd. EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From nils.eliasson at oracle.com Fri Mar 16 15:37:56 2018 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Fri, 16 Mar 2018 16:37:56 +0100 Subject: RFR(S): 8192992: Test8007294.java failed: attempted to spill a non-spillable item In-Reply-To: <14a9a030-15b0-aab3-d840-3f12b80db275@oracle.com> References: <9249e5ae-3043-da1d-80de-fcecdabdfad2@oracle.com> <2121dc39-1f4f-d428-1de5-3e62b90e75c4@oracle.com> <14a9a030-15b0-aab3-d840-3f12b80db275@oracle.com> Message-ID: <1a7f06b3-bffe-cd1e-3142-0167443dbae4@oracle.com> Hi Vladimir, I managed to smash right into JDK-6843752 "missing code for an anti-dependent Phi in GCM". Thanks for adding that test :) I'll be back next week with a solution. Regards, // Nils On 2018-03-14 18:42, Vladimir Kozlov wrote: > Very good! Thank you for doing this analysis. > Please, run our usual mach5 set of tests. > > Thanks, > Vladimir > > On 3/14/18 3:17 AM, Nils Eliasson wrote: >> Hi Vladimir, >> >> After taking a very close look I found that the anti-dependency >> checking that hoists the testN_mem_reg from the jmpCon is broken, and >> that the hoisting is unnecessary. So this is not a case where we need >> anti-depenency checking for loads before matching. 
>> >> Generally the insert_anti_dependences looks good, except the >> store->is_Phi() clause that is full of holes (overly conservative). I >> don't think I fully understand how the graph looks when the clause is >> needed, but it tries to find stores upwards that are otherwise >> unreachable from the downward memory flow search. >> >> I found these three flaws: >> >> 1) A Phi in a block that is preceded by a store - even though the >> store is dominated by the load's LCA it will force the testN up! We >> don't check where the stores are located. >> >> 2) A Phi that consumes the same memory as the load may force it up, >> even though no stores are involved. >> >> 3) A Phi that consumes a mergemem, which in itself has already >> been processed and passed as irrelevant, may force the testN up. >> >> One could add that any predecessor to the phi would have to be a >> store/call to affect the load placement. >> >> I have also added some additional debugging printouts to the patch. >> >> http://cr.openjdk.java.net/~neliasso/8192992/webrev.03/ >> https://bugs.openjdk.java.net/browse/JDK-8192992 >> >> Regards, >> >> // Nils >> >> >> >> On 2018-03-05 18:02, Vladimir Kozlov wrote: >>> Hi Nils, >>> >>> Yes, it is a legal workaround but this way you removed all subsuming >>> loads in code. >>> >>> Should we do anti-dependency check for loads during matching when >>> shared nodes are marked?: >>> >>> http://hg.openjdk.java.net/jdk/hs/file/4e82736053ae/src/hotspot/share/opto/matcher.cpp#l2159 >>> >>> >>> How expensive would that be? >>> >>> Vladimir >>> >>> On 3/5/18 7:50 AM, Nils Eliasson wrote: >>>> Hi, >>>> >>>> This patch is a workaround for a scheduling problem encountered in >>>> some rare circumstances. Instead of hitting the assert we retry the >>>> compilation without subsuming loads.
>>>> >>>> To quote Tobias: >>>> >>>> "The crash happens because a testN_mem_reg0 (CmpN(LoadN(mem), >>>> NULL)) is scheduled in a different block than its jmpCon user and >>>> the register allocator tries to spill the flag register. The >>>> problem is that PhaseCFG::schedule_late() detects an >>>> anti-dependency for the testN_mem_reg0 on a bottom memory Phi and >>>> therefore raises the LCA to the early block (see >>>> PhaseCFG::insert_anti_dependences()) which is "far away" from its >>>> jmpCon user. " >>>> >>>> Thanks to Roland who suggested the workaround. >>>> >>>> https://bugs.openjdk.java.net/browse/JDK-8192992 >>>> >>>> http://cr.openjdk.java.net/~neliasso/8192992/webrev.01/ >>>> >>>> Regards, >>>> >>>> Nils >>>> >> From dmitrij.pochepko at bell-sw.com Fri Mar 16 16:21:38 2018 From: dmitrij.pochepko at bell-sw.com (Dmitrij Pochepko) Date: Fri, 16 Mar 2018 19:21:38 +0300 Subject: [aarch64-port-dev ] RFR: 8187472 - AARCH64: array_equals intrinsic doesn't use prefetch for large arrays In-Reply-To: References: <02e62700-efaf-918e-f04e-031dd4788aa0@bell-sw.com> Message-ID: Hi Andrew, all, I modified the patch according to your comments. First, regarding size statistics: I looked into the string size statistics that were measured for JEP 254: Compact Strings (http://docs.huihoo.com/javaone/2015/CON5483-Compact-Strings-A-Memory-Efficient-Internal-Representation-for-Strings.pdf). The conclusion is that 75% of strings are <= 32 characters, so we can take this number into account. This also means that changing the string equals code to be capable of prefetching (and thus adding a branch for that) is not practical. Arrays are another story. So, I split array equals and string equals. String equals remains basically unchanged (the only noticeable change is that I removed the size calculations for UTF strings: asr to convert length-in-bytes into length-in-characters, and then converting it back via lsr). It saves 2 instructions on this code path: about 5% improvement on small sizes.
Array equals was completely re-written. For array equals I have 2 algorithms, controlled by the VM option UseSimpleArrayEquals. The first algorithm (the simple one) is basically the same as the original one, with slightly different branch logic. It seems the Cortex A7* series prefers shorter code, which fits speculative execution better. All other CPUs I was able to check (Cortex A53 and Cavium h/w) prefer the algorithm with a large loop and possible prefetching. Regarding your comment about addresses at the end of a memory page: you're right. It is indeed possible in theory to use this code for some substring; however, due to the nature of the array equals algorithm (it reads the array length from the array header directly and then jumps to the first array element), memory accesses will always be 8-byte aligned, so it's safe to use 8-byte loads for array tails. So, I believe this issue is now naturally resolved, since I left the string equals logic unchanged. Now, benchmarks: I modified the benchmark to improve measurement accuracy by increasing the amount of data processed in each iteration (http://cr.openjdk.java.net/~dpochepk/8187472/ArrayAltEquals.java) and got the following average improvements for array equals: Cavium "ThunderX 2": length 1..8: 10-20%; length 9..16: 1%; length 17..32: 5-10%; large (512+): almost 2 times. Cavium "Thunder X": length 1..8: 1%; length 9..16: 2-3%; length 17..32: 5%; large arrays (512+): up to 5 times. Cortex A53 (R-Pi): all ranges are about 5% on average (note: large result dispersion (about 10%) on R-Pi). Cortex A73: basically no changes because the implementation is almost the same (better by about 0.5% on average, but dispersion is about 4%, so it's not a statistically significant result). More detailed benchmarking results can be found here: http://cr.openjdk.java.net/~dpochepk/8187472/array_equals_total.xls updated webrev: http://cr.openjdk.java.net/~dpochepk/8187472/webrev.08/ Testing: passed jtreg hotspot tests on AArch64 fastdebug build.
No new failures were found in comparison with the non-patched build. I also used a "brute-force" test which checks all array equality combinations for any given length range: http://cr.openjdk.java.net/~dpochepk/8187472/ArrayEqTest.java Thanks, Dmitrij On 08.02.2018 13:11, Andrew Haley wrote: > On 07/02/18 19:39, Dmitrij Pochepko wrote: >> In general, this patch changes very short arrays handling (performing >> 8-byte read instead of few smaller reads, using the fact of 8-byte >> alignment) and jumping into stub with large 64-byte read loop for larger >> arrays). >> >> Measurements (measured array length 7,64,128,256,512,1024,100000. >> Improvement in %. 80% improvement means that new version is 80% faster, >> i.e. 5 times.): >> >> >> ThunderX: 2%, -4%, 0%, 2%, 32%, 55%, 80% >> >> ThunderX2: 0%, -3%, 17%, 19%, 29%, 31%, 47% >> >> Cortex A53 at 533MHz: 8%, -1%, -2%, 4%, 6%, 5%, 3% >> >> Cortex A73 at 903MHz: 8%, -3%, 0%, 7%, 8%, 9%, 8% >> >> Note: medium sizes are a bit slower because of additional branch >> added (which checks size and jumps to stub). > This indentation is messed up: > > @@ -5201,40 +5217,23 @@ > // length == 4. > if (log_elem_size > 0) > lsl(cnt1, cnt1, log_elem_size); > - ldr(tmp1, Address(a1, cnt1)); > - ldr(tmp2, Address(a2, cnt1)); > + ldr(tmp1, Address(a1, cnt1)); > + ldr(tmp2, Address(a2, cnt1)); > > I'm not convinced that this works correctly if passed the address of a pair > of arrays at the end of a page. Maybe it isn't used on sub-arrays today > in HotSpot, but one day it might be. > > It pessimizes a very common case of strings, those of about 32 characters. > Please think again. Please also think about strings that are long enough > for the SIMD loop but differ in their early substrings. >
From igor.veresov at oracle.com Sat Mar 17 01:45:32 2018 From: igor.veresov at oracle.com (Igor Veresov) Date: Fri, 16 Mar 2018 18:45:32 -0700 Subject: RFR 8198969: Update Graal Message-ID: <799A1314-9FBC-4CBD-B2E0-E2F47F6AB465@oracle.com> Webrev: http://cr.openjdk.java.net/~iveresov/8198969/webrev.00/ JBS: https://bugs.openjdk.java.net/browse/JDK-8198969 Please refer to the JBS entry for the list of the changes included in the update. Thanks, igor From vladimir.kozlov at oracle.com Sat Mar 17 02:06:50 2018 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 16 Mar 2018 19:06:50 -0700 Subject: RFR 8198969: Update Graal In-Reply-To: <799A1314-9FBC-4CBD-B2E0-E2F47F6AB465@oracle.com> References: <799A1314-9FBC-4CBD-B2E0-E2F47F6AB465@oracle.com> Message-ID: <4f5fdd5c-a4b1-4c11-0b9e-1cae75072aa4@oracle.com> Looks good. I thought you need AOT and JVMCI changes too. Vladimir On 3/16/18 6:45 PM, Igor Veresov wrote: > Webrev: http://cr.openjdk.java.net/~iveresov/8198969/webrev.00/ > JBS: https://bugs.openjdk.java.net/browse/JDK-8198969 > > Please refer to the JBS entry for the list of the changes included in the update.
>> Thanks, >> igor From vladimir.kozlov at oracle.com Sat Mar 17 02:37:51 2018 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 16 Mar 2018 19:37:51 -0700 Subject: RFR 8198969: Update Graal In-Reply-To: References: <799A1314-9FBC-4CBD-B2E0-E2F47F6AB465@oracle.com> <4f5fdd5c-a4b1-4c11-0b9e-1cae75072aa4@oracle.com> Message-ID: Okay > On Mar 16, 2018, at 7:09 PM, Igor Veresov wrote: > > No, I ended up avoiding touching JVMCI. > > igor > >> On Mar 16, 2018, at 7:06 PM, Vladimir Kozlov wrote: >> >> Looks good. I thought you need AOT and JVMCI changes too. >> >> Vladimir >> >>> On 3/16/18 6:45 PM, Igor Veresov wrote: >>> Webrev: http://cr.openjdk.java.net/~iveresov/8198969/webrev.00/ >>> JBS: https://bugs.openjdk.java.net/browse/JDK-8198969 >>> Please refer to the JBS entry for the list of the changes included in the update. >>> Thanks, >>> igor > From ningsheng.jian at linaro.org Mon Mar 19 01:28:50 2018 From: ningsheng.jian at linaro.org (Ningsheng Jian) Date: Mon, 19 Mar 2018 09:28:50 +0800 Subject: RFR: 8173100: AArch64: -XX:-UseOnStackReplacement does not work together with -XX:+TieredCompilation. In-Reply-To: References: Message-ID: Thank you Andrew! Regards, Ningsheng On 16 March 2018 at 17:51, Andrew Haley wrote: > On 15/03/18 07:42, Ningsheng Jian wrote: >> Please help to review this fix. >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8173100 >> Webrev: http://cr.openjdk.java.net/~njian/8173100/webrev.01/ > > That looks good. Thank you. > > -- > Andrew Haley > Java Platform Lead Engineer > Red Hat UK Ltd. > EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 -- Ningsheng From nils.eliasson at oracle.com Mon Mar 19 09:17:28 2018 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Mon, 19 Mar 2018 10:17:28 +0100 Subject: 8193935: RFR(S): Illegal countedLoops transformation Message-ID: <609b86db-1b0c-9fba-f7c4-0e22764ce205@oracle.com> Hi, This bug was found in mpegaudio hiding behind the loop predication. 
The counted loop transformation may lose a significant truncation, which changes the behaviour of the program. CountedLoopNode::match_incr_with_optional_truncation finds the truncation Op_AndI(0x7fff) and sets trunc_t = TypeInt::CHAR. However, the program does not use it for a char truncation, but for accessing an array as a circular buffer. (Any other mask would have hidden this problem, since char truncation is the only one matched.) A loop is successfully matched as a counted loop, and when the trip counter is cloned it drops the truncation. In the intended char case that is OK. In this case, however, the truncation prevents the program from hitting an AIOOBE (ArrayIndexOutOfBoundsException). In the general case, if a truncated loop counter is compared to an array length (or any variable), it must be provable that the array length is less than the truncation, in which case the truncation can be omitted. If the array length can be longer, the exit may never be taken - the loop may never terminate - and a counted loop transformation cannot be performed. One additional topic of discussion is whether we really want to do counted loop transformations with a RangeCheck as the exit point, especially if profiling shows that the RangeCheck never fails. In the loop that fails there are multiple exits, many of which are RangeChecks. For additional optimization opportunities, we could consider rotating the loop until a normal compare is the loop exit condition.
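The loop shape at issue can be sketched as a hypothetical reduction (not the mpegaudio source; all names are invented for illustration). The index is the induction variable, the implicit range check on buf[i] is a loop exit, and the 0x7fff mask is the AndI that the matcher records as TypeInt::CHAR. Because the array has 0x8000 elements, the masked index can never fail the range check, so that exit is never taken. Dropping the mask, as the cloned trip counter does, turns a successful wrap-around search into an ArrayIndexOutOfBoundsException.

```java
// Hypothetical reduction of the miscompiled loop shape (not the mpegaudio
// source). The index i is the induction variable; buf[i] carries an implicit
// range check (unsigned i < buf.length) that C2 may treat as a loop exit.
final class CircularBufferSketch {
    static final int MASK = 0x7fff;   // the AndI constant matched as TypeInt::CHAR
    static final int SIZE = MASK + 1; // 0x8000 elements: longer than the mask allows

    // Correct Java semantics: the mask wraps i back to 0, so buf[i] never
    // fails and the range-check exit is never taken. The search ends only
    // when target is found (it would spin forever if target were absent).
    static int findWrapping(int[] buf, int start, int target) {
        int i = start;
        while (true) {
            if (buf[i] == target) {
                return i;
            }
            i = (i + 1) & MASK;
        }
    }

    // What the loop computes if the truncation is dropped: i runs past
    // MASK and the range check throws.
    static int findWithoutMask(int[] buf, int start, int target) {
        int i = start;
        while (true) {
            if (buf[i] == target) {
                return i;
            }
            i = i + 1;
        }
    }

    public static void main(String[] args) {
        int[] buf = new int[SIZE];
        buf[5] = 42; // lies "behind" the start index, so a wrap is required
        System.out.println(findWrapping(buf, 0x7ff0, 42)); // 5
        try {
            findWithoutMask(buf, 0x7ff0, 42);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("AIOOBE without the mask");
        }
    }
}
```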
Image of significant parts of the node graph (the entire loop with its multiple exits is omitted): http://cr.openjdk.java.net/~neliasso/8193935/mpegaudio.png bug: https://bugs.openjdk.java.net/browse/JDK-8193935 webrev: http://cr.openjdk.java.net/~neliasso/8193935/webrev.01 Please review, Nils Eliasson From lutz.schmidt at sap.com Mon Mar 19 16:00:08 2018 From: lutz.schmidt at sap.com (Schmidt, Lutz) Date: Mon, 19 Mar 2018 16:00:08 +0000 Subject: RFR(L): 8198691: CodeHeap State Analytics In-Reply-To: References: <250CD3F0-F4AB-4EA1-9EA9-8136442FDFF9@sap.com> <1b411eee-29c8-b977-96b0-45145ebbfcf9@oracle.com> <8EB0096D-1020-4E86-82E1-2D7D564601AE@sap.com> <14c67b9d-4e1d-cb49-b29a-0d4cd02fdcc9@oracle.com> Message-ID: Dear all, this is the next (second) iteration of my CodeHeap State Analytics effort. It reflects all the comments and suggestions I received on my initial RFR (sent out on March 1st). Please read on to learn what was changed and what was kept as is. May I please request reviews for Bug: https://bugs.openjdk.java.net/browse/JDK-8198691 Webrev: http://cr.openjdk.java.net/~lucy/webrevs/8198691.01/ Instead of keeping the long tail of comments and responses, I decided to provide a summary of what happened. - Most of the new code was moved to new files: share/code/codeHeapState.cpp and share/code/codeHeapState.hpp - I have added, as requested, an abbreviated version of the "General Description" chapter to codeHeapState.cpp - In case of SegmentedCodeCache, the iteration is limited to FOR_ALL_ALLOCABLE_HEAPS(). There were issues in aot tests when using FOR_ALL_HEAPS(). - All references to the RFE id should be gone. - In share/runtime/java.cpp, the call to CompileBroker::print_heapinfo() is now placed close to "PrintCodeCache" for both product and non-product cases. - The edited/updated documentation is available as an attachment to the bug (in PDF format).
- I added code to share/code/codeCache.cpp (report_codemem_full()) to print the CodeHeap state for the first occurrence of the "full" condition. - The code style hiccups noted by Tobias Hartmann are gone. - The compile-time warnings and errors are resolved. -XX:+PrintCodeHeapState vs. -Xlog:codecache=Trace I clearly understand and support the intention to get rid of the Print* command line arguments. Therefore, the PrintCodeHeapState command line argument is gone. You can request the CodeHeap state analytics with the -Xlog:codecache=Trace (VM shutdown) or -Xlog:codecache=Debug (CodeCache full and VM shutdown) switches. The output is directed to tty, not to the log stream. The reason for not using the log stream is simple: UL prefixes every line with a timestamp and the trace tags. Unfortunately, that badly messes up my formatting. The jcmd output, on the other hand, will not have the UL prefixes, so I would have to distinguish between UL and jcmd output when formatting. In addition, I do not see a benefit in adding the same UL prefix to thousands of lines. Comments are very welcome! Best Regards, Lutz From rwestrel at redhat.com Mon Mar 19 16:40:24 2018 From: rwestrel at redhat.com (Roland Westrelin) Date: Mon, 19 Mar 2018 17:40:24 +0100 Subject: RFR(XS): 8196294: LoopStripMiningIterShortLoop is set to zero by default Message-ID: http://cr.openjdk.java.net/~roland/8196294/webrev.00/ When loop strip mining is enabled by default for G1, the LoopStripMiningIterShortLoop option should be set too. Roland.
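The underlying problem is one of ordering: a default derived from LoopStripMiningIter is only meaningful if it is computed after the GC-specific ergonomics have set LoopStripMiningIter. Below is a minimal Java model of that ordering, not HotSpot code. The field names mirror the VM flags, the derivation LoopStripMiningIter / 10 is the "blind guess" found in Arguments::check_vm_args_consistency(), and the value 1000 is purely illustrative.

```java
// Hypothetical model of flag ergonomics ordering; not HotSpot code.
class StripMiningFlags {
    long loopStripMiningIter = 0;            // 0 means strip mining is off
    long loopStripMiningIterShortLoop = 0;
    boolean shortLoopSetByUser = false;      // models !FLAG_IS_DEFAULT(LoopStripMiningIterShortLoop)

    // Models GC-specific ergonomics turning strip mining on by default.
    void applyGcErgonomics(long iter) {
        loopStripMiningIter = iter;
    }

    // Models the derived default, applied only if the user did not set the flag.
    void deriveShortLoopDefault() {
        if (!shortLoopSetByUser) {
            loopStripMiningIterShortLoop = loopStripMiningIter / 10;
        }
    }
}
```

If deriveShortLoopDefault() runs before applyGcErgonomics(1000), the short-loop value stays 0 even though strip mining ends up enabled; in the opposite order it becomes 100. Either the derivation moves after GC argument initialization, or each GC sets both flags.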
From shade at redhat.com Mon Mar 19 16:50:41 2018 From: shade at redhat.com (Aleksey Shipilev) Date: Mon, 19 Mar 2018 17:50:41 +0100 Subject: RFR(XS): 8196294: LoopStripMiningIterShortLoop is set to zero by default In-Reply-To: References: Message-ID: On 03/19/2018 05:40 PM, Roland Westrelin wrote: > http://cr.openjdk.java.net/~roland/8196294/webrev.00/ > > When loop strip mining is enabled by default for G1, the > LoopStripMiningIterShortLoop option should be set too. Um. So there is a block in Arguments::check_vm_args_consistency(): if (FLAG_IS_DEFAULT(LoopStripMiningIterShortLoop)) { // blind guess LoopStripMiningIterShortLoop = LoopStripMiningIter / 10; } Is that block misplaced? Should it be removed if we initialize this per GC? Or should it be moved somewhere after the GC argument initialization? Thanks, -Aleksey From dean.long at oracle.com Mon Mar 19 20:32:27 2018 From: dean.long at oracle.com (dean.long at oracle.com) Date: Mon, 19 Mar 2018 13:32:27 -0700 Subject: RFR(S) 8146201: [AOT] Class static initializers that are not pure should not be executed during static compilation Message-ID: <78cd3635-89f1-6a2d-1679-1d930aef6c7e@oracle.com> https://bugs.openjdk.java.net/browse/JDK-8146201 http://cr.openjdk.java.net/~dlong/8146201/webrev Previously, jaotc would run static initializers on classes it accessed during compilation, which is undesirable if those initializers are not pure. This change does not attempt to identify whether an initializer is pure. Instead, the compiler avoids triggering any initialization at all, while still doing eager resolution and linking. The upstream Graal changes have been pushed here: https://github.com/oracle/graal/commit/8411a80308c4dea31b05897c3bbb1c8e642fdd67 where a new HotSpotLazyInitializationTest test was also added.
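At the Java level, the distinction this change relies on (loading a class without running its static initializer) can be illustrated with Class.forName(String, boolean, ClassLoader) and initialize = false. This is only an analogy: jaotc and Graal work through JVMCI rather than this API, and the class names below are hypothetical. A class literal such as Impure.class does not trigger initialization either.

```java
// Hypothetical classes illustrating loading without initialization.
class InitLog {
    static int initializerRuns = 0;
}

class Impure {
    static {
        InitLog.initializerRuns++;   // a side effect, so this initializer is not pure
    }
}

class LazyInitDemo {
    // Loads Impure but does not run its static initializer.
    static Class<?> loadOnly() {
        return load(false);
    }

    // Loads Impure and initializes it if that has not happened yet.
    static Class<?> loadAndInit() {
        return load(true);
    }

    private static Class<?> load(boolean initialize) {
        try {
            // Neither the class literal nor getName() triggers initialization.
            return Class.forName(Impure.class.getName(), initialize, Impure.class.getClassLoader());
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Calling loadOnly() leaves InitLog.initializerRuns at 0. The first loadAndInit() raises it to 1, and later calls leave it at 1 because a class is initialized at most once.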
dl From jcbeyler at google.com Mon Mar 19 21:06:22 2018 From: jcbeyler at google.com (JC Beyler) Date: Mon, 19 Mar 2018 21:06:22 +0000 Subject: JDK-8171119: Low-Overhead Heap Profiling In-Reply-To: References: <5A819F10.8040201@oracle.com> <5A8414AC.3020209@oracle.com> Message-ID: Hi all, The incremental webrev update is here: http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event4_5/ The full webrev is here: http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event5/ The major changes are: - I've removed the heapMonitoring.cpp code in favor of just having the sampling events, as per Serguei's request; I still have to do some overhead measurements, but the tests prove the concept can work - Most of the TLAB code is unchanged; the only major difference is that samples now get sent off to event collectors when used and enabled. - Added the interpreter collectors to handle interpreter execution - Renamed SetTlabHeapSampling to SetHeapSampling to be more generic - Added a mutex for the thread sampling so that we can initialize an internal static array safely - Ported the tests from the old system to this new one I've also updated the JEP and CSR to reflect these changes: https://bugs.openjdk.java.net/browse/JDK-8194905 https://bugs.openjdk.java.net/browse/JDK-8171119 In order to make forward progress, I've removed the heap sampling code entirely and now rely entirely on the event sampling system. The tests reflect this by using a simplified implementation of what an agent could do: http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event5/raw_files/new/test/hotspot/jtreg/serviceability/jvmti/HeapMonitor/libHeapMonitor.c (search for anything mentioning event_storage). I have not taken the time to port all of the code we originally had in heapMonitoring to this. I hesitate only because that code was in C++, so I'd have to port it to C, and since this is only for tests, perhaps what I have now is good enough?
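For readers new to the thread, the sampling idea can be sketched as a per-thread countdown of bytes until the next sample: each allocation subtracts its size, and the allocation that exhausts the budget is reported before the countdown is reset. This is a hypothetical, simplified Java model (fixed interval, a List standing in for the event collector, one instance per thread and therefore not synchronized), not the HotSpot ThreadHeapSampler. In the patch itself the check is folded into the TLAB end pointer, which is moved to an earlier spot so that allocation trips into the slow path where the sample is taken.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of per-thread allocation sampling; not HotSpot code.
class HeapSamplerSketch {
    private final long samplingInterval;               // bytes between samples (fixed, no randomization)
    private long bytesUntilSample;                     // countdown to the next sample
    final List<Long> sampledSizes = new ArrayList<>(); // stand-in for an event collector

    HeapSamplerSketch(long samplingInterval) {
        this.samplingInterval = samplingInterval;
        this.bytesUntilSample = samplingInterval;
    }

    // Called once per allocation; the common case is one subtraction and one compare.
    void onAllocation(long sizeInBytes) {
        bytesUntilSample -= sizeInBytes;
        if (bytesUntilSample <= 0) {
            sampledSizes.add(sizeInBytes);       // report the allocation that crossed the threshold
            bytesUntilSample = samplingInterval; // restart the countdown
        }
    }
}
```

With a 100-byte interval, ten 30-byte allocations produce samples on the 4th and 8th allocation, and a single 500-byte allocation is sampled immediately.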
As far as testing goes, I've ported all the relevant tests and then added a few: - Turning the system on/off - Testing using various GCs - Testing using the interpreter - Testing the sampling rate - Testing with objects and arrays - Testing with various threads Finally, as overhead goes, I have the numbers of the system off vs a clean build and I have 0% overhead, which is what we'd want. This was using the Dacapo benchmarks. I am now preparing to run a version with the events on using dacapo and will report back here. Any comments are welcome :) Jc On Thu, Mar 8, 2018 at 4:00 PM JC Beyler wrote: > Hi all, > > I apologize for the delay but I wanted to add an event system and that > took a bit longer than expected and I also reworked the code to take into > account the deprecation of FastTLABRefill. > > This update has four parts: > > A) I moved the implementation from Thread to ThreadHeapSampler inside of > Thread. Would you prefer it as a pointer inside of Thread or like this > works for you? Second question would be would you rather have an > association outside of Thread altogether that tries to remember when > threads are live and then we would have something like: > ThreadHeapSampler::get_sampling_size(this_thread); > > I worry about the overhead of this but perhaps it is not too too bad? > > B) I also have been working on the Allocation event system that sends out > a notification at each sampled event. This will be practical when wanting > to do something at the allocation point. I'm also looking at if the whole > heapMonitoring code could not reside in the agent code and not in the JDK. > I'm not convinced but I'm talking to Serguei about it to see/assess :) > - Also added two tests for the new event subsystem > > C) Removed the slow_path fields inside the TLAB code since now > FastTLABRefill is deprecated > > D) Updated the JVMTI documentation and specification for the methods. 
> > So the incremental webrev is here: > http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.09_10/ > > and the full webrev is here: > http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.10 > > I believe I have updated the various JIRA issues that track this :) > > Thanks for your input, > Jc > > > On Wed, Feb 14, 2018 at 10:34 PM, JC Beyler wrote: > >> Hi Erik, >> >> I inlined my answers; the last one seems to answer Robbin's >> concerns about the same thing (adding things to Thread). >> >> On Wed, Feb 14, 2018 at 2:51 AM, Erik Österlund < >> erik.osterlund at oracle.com> wrote: >> >>> Hi JC, >>> >>> Comments are inlined below. >>> >>> >>> On 2018-02-13 06:18, JC Beyler wrote: >>> >>> Hi Erik, >>> >>> Thanks for your answers, I've now inlined my own answers/comments. >>> >>> I've done a new webrev here: >>> http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.08/ >>> >>> The incremental is here: >>> http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.07_08/ >>> >>> Note to all: >>> - I've been integrating changes from Erin/Serguei/David comments, so >>> this incremental webrev is a bit of an answer to all comments in one. I >>> apologize for that :) >>> >>> >>> On Mon, Feb 12, 2018 at 6:05 AM, Erik Österlund < >>> erik.osterlund at oracle.com> wrote: >>> >>>> Hi JC, >>>> >>>> Sorry for the delayed reply. >>>> >>>> Inlined answers: >>>> >>>> >>>> On 2018-02-06 00:04, JC Beyler wrote: >>>> >>>>> Hi Erik, >>>>> >>>>> (Renaming this to be folded into the newly renamed thread :)) >>>>> >>>>> First off, thanks a lot for reviewing the webrev! I appreciate it!
>>>>> >>>>> I updated the webrev to: >>>>> http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.05a/ >>>>> >>>>> And the incremental one is here: >>>>> http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.04_05a/ >>>>> >>>>> It contains: >>>>> - The change of "since" from 9 to 11 in jvmti.xml >>>>> - The use of OrderAccess for initialized >>>>> - Clearing the oop >>>>> >>>>> I also have inlined my answers to your comments. The biggest question >>>>> will come from the multiple *_end variables. A bit of the logic there >>>>> is due to handling the slow path refill vs fast path refill and >>>>> checking that the rug was not pulled out from under the slow path. I >>>>> believe that a previous comment was that TlabFastRefill was going to >>>>> be deprecated. >>>>> >>>>> If this is true, we could revert this code a bit and just do: if >>>>> TlabFastRefill is enabled, disable this. And then deprecate that when >>>>> TlabFastRefill is deprecated. >>>>> >>>>> This might simplify this webrev, and I can work on a follow-up that >>>>> either removes TlabFastRefill, if Robbin does not have the time to do >>>>> it, or adds the support to the assembly side to handle this correctly. >>>>> What do you think? >>>>> >>>> >>>> I support removing TlabFastRefill, but I think it is good to not depend >>>> on that happening first. >>>> >>>> >>> >>> I'm slowly pushing on the FastTLABRefill deprecation ( >>> >>> https://bugs.openjdk.java.net/browse/JDK-8194084); I agree on keeping >>> both separate for now, though, so that we can think about them separately >>> >>> >>> >>>> Now, below, inlined are my answers: >>>>> >>>>> On Fri, Feb 2, 2018 at 8:44 AM, Erik Österlund >>>>> wrote: >>>>> >>>>>> Hi JC, >>>>>> >>>>>> Hope I am reviewing the right version of your work. Here goes...
>>>>>> >>>>>> src/hotspot/share/gc/shared/collectedHeap.inline.hpp: >>>>>> >>>>>> 159 AllocTracer::send_allocation_outside_tlab(klass, result, >>>>>> size * >>>>>> HeapWordSize, THREAD); >>>>>> 160 >>>>>> 161 THREAD->tlab().handle_sample(THREAD, result, size); >>>>>> 162 return result; >>>>>> 163 } >>>>>> >>>>>> Should not call tlab()->X without checking if (UseTLAB) IMO. >>>>>> >>>>>> Done! >>>>> >>>> >>>> More about this later. >>>> >>>> >>>> >>>>> src/hotspot/share/gc/shared/threadLocalAllocBuffer.cpp: >>>>>> >>>>>> So first of all, there seems to quite a few ends. There is an "end", >>>>>> a "hard >>>>>> end", a "slow path end", and an "actual end". Moreover, it seems like >>>>>> the >>>>>> "hard end" is actually further away than the "actual end". So the >>>>>> "hard end" >>>>>> seems like more of a "really definitely actual end" or something. I >>>>>> don't >>>>>> know about you, but I think it looks kind of messy. In particular, I >>>>>> don't >>>>>> feel like the name "actual end" reflects what it represents, >>>>>> especially when >>>>>> there is another end that is behind the "actual end". >>>>>> >>>>>> 413 HeapWord* ThreadLocalAllocBuffer::hard_end() { >>>>>> 414 // Did a fast TLAB refill occur? >>>>>> 415 if (_slow_path_end != _end) { >>>>>> 416 // Fix up the actual end to be now the end of this TLAB. >>>>>> 417 _slow_path_end = _end; >>>>>> 418 _actual_end = _end; >>>>>> 419 } >>>>>> 420 >>>>>> 421 return _actual_end + alignment_reserve(); >>>>>> 422 } >>>>>> >>>>>> I really do not like making getters unexpectedly have these kind of >>>>>> side >>>>>> effects. It is not expected that when you ask for the "hard end", you >>>>>> implicitly update the "slow path end" and "actual end" to new values. >>>>>> >>>>>> As I said, a lot of this is due to the FastTlabRefill. If I make this >>>>> not supporting FastTlabRefill, this goes away. 
The reason the system >>>>> needs to update itself at the get is that you only know at that get if >>>>> things have shifted underneath the tlab slow path. I am not sure of >>>>> really better names (naming is hard!), perhaps we could do these >>>>> names: >>>>> >>>>> - current_tlab_end // Either the allocated tlab end or a >>>>> sampling point >>>>> - last_allocation_address // The end of the tlab allocation >>>>> - last_slowpath_allocated_end // In case a fast refill occurred the >>>>> end might have changed, this is to remember slow vs fast past refills >>>>> >>>>> the hard_end method can be renamed to something like: >>>>> tlab_end_pointer() // The end of the lab including a bit of >>>>> alignment reserved bytes >>>>> >>>> >>>> Those names sound better to me. Could you please provide a mapping from >>>> the old names to the new names so I understand which one is which please? >>>> >>>> This is my current guess of what you are proposing: >>>> >>>> end -> current_tlab_end >>>> actual_end -> last_allocation_address >>>> slow_path_end -> last_slowpath_allocated_end >>>> hard_end -> tlab_end_pointer >>>> >>>> >>> Yes that is correct, that was what I was proposing. >>> >>> >>>> I would prefer this naming: >>>> >>>> end -> slow_path_end // the end for taking a slow path; either due to >>>> sampling or refilling >>>> actual_end -> allocation_end // the end for allocations >>>> slow_path_end -> last_slow_path_end // last address for slow_path_end >>>> (as opposed to allocation_end) >>>> hard_end -> reserved_end // the end of the reserved space of the TLAB >>>> >>>> About setting things in the getter... that still seems like a very >>>> unpleasant thing to me. It would be better to inspect the call hierarchy >>>> and explicitly update the ends where they need updating, and assert in the >>>> getter that they are in sync, rather than implicitly setting various ends >>>> as a surprising side effect in a getter. It looks like the call hierarchy >>>> is very small. 
With my new naming convention, reserved_end() would >>>> presumably return _allocation_end + alignment_reserve(), and have an assert >>>> checking that _allocation_end == _last_slow_path_allocation_end, >>>> complaining that this invariant must hold, and that a caller to this >>>> function, such as make_parsable(), must first explicitly synchronize the >>>> ends as required, to honor that invariant. >>>> >>>> >>> >>> I've renamed the variables to how you preferred it except for the _end >>> one. I did: >>> current_end >>> last_allocation_address >>> tlab_end_ptr >>> >>> The reason is that the architecture dependent code use the thread.hpp >>> API and it already has tlab included into the name so it becomes >>> tlab_current_end (which is better that tlab_current_tlab_end in my opinion). >>> >>> I also moved the update into a separate method with a TODO that says to >>> remove it when FastTLABRefill is deprecated >>> >>> >>> This looks a lot better now. Thanks. >>> >>> Note that the following comment now needs updating accordingly in >>> threadLocalAllocBuffer.hpp: >>> >>> 41 // Heap sampling is performed via the end/actual_end fields. 42 // actual_end contains the real end of the tlab allocation, 43 // whereas end can be set to an arbitrary spot in the tlab to 44 // trip the return and sample the allocation. 45 // slow_path_end is used to track if a fast tlab refill occured 46 // between slowpath calls. >>> >>> There might be other comments too, I have not looked in detail. >>> >> >> This was the only spot that still had an actual_end, I fixed it now. I'll >> do a sweep to double check other comments. >> >> >>> >>> >>> >>> >>> >>>> >>>> Not sure it's better but before updating the webrev, I wanted to try >>>>> to get input/consensus :) >>>>> >>>>> (Note hard_end was always further off than end). >>>>> >>>>> src/hotspot/share/prims/jvmti.xml: >>>>>> >>>>>> 10357 >>>>>> 10358 >>>>>> 10359 Can sample the heap. 
>>>>>> 10360 If this capability is enabled then the heap sampling >>>>>> methods >>>>>> can be called. >>>>>> 10361 >>>>>> 10362 >>>>>> >>>>>> Looks like this capability should not be "since 9" if it gets >>>>>> integrated >>>>>> now. >>>>>> >>>>> Updated now to 11, crossing my fingers :) >>>>> >>>>> >>>>> src/hotspot/share/runtime/heapMonitoring.cpp: >>>>>> >>>>>> 448 if (is_alive->do_object_b(value)) { >>>>>> 449 // Update the oop to point to the new object if it is >>>>>> still >>>>>> alive. >>>>>> 450 f->do_oop(&(trace.obj)); >>>>>> 451 >>>>>> 452 // Copy the old trace, if it is still live. >>>>>> 453 _allocated_traces->at_put(curr_pos++, trace); >>>>>> 454 >>>>>> 455 // Store the live trace in a cache, to be served up on >>>>>> /heapz. >>>>>> 456 _traces_on_last_full_gc->append(trace); >>>>>> 457 >>>>>> 458 count++; >>>>>> 459 } else { >>>>>> 460 // If the old trace is no longer live, add it to the >>>>>> list of >>>>>> 461 // recently collected garbage. >>>>>> 462 store_garbage_trace(trace); >>>>>> 463 } >>>>>> >>>>>> In the case where the oop was not live, I would like it to be >>>>>> explicitly >>>>>> cleared. >>>>>> >>>>> Done I think how you wanted it. Let me know because I'm not familiar >>>>> with the RootAccess API. I'm unclear if I'm doing this right or not so >>>>> reviews of these parts are highly appreciated. Robbin had talked of >>>>> perhaps later pushing this all into a OopStorage, should I do this now >>>>> do you think? Or can that wait a second webrev later down the road? >>>>> >>>> >>>> I think using handles can and should be done later. You can use the >>>> Access API now. >>>> I noticed that you are missing an #include "oops/access.inline.hpp" in >>>> your heapMonitoring.cpp file. >>>> >>>> >>> The missing header is there for me so I don't know, I made sure it is >>> present in the latest webrev. Sorry about that. >>> >>> >>> >>>> + Did I clear it the way you wanted me to or were you thinking of >>>>> something else? 
>>>>> >>>> >>>> That is precisely how I wanted it to be cleared. Thanks. >>>> >>>> + Final question here, seems like if I were to want to not do the >>>>> f->do_oop directly on the trace.obj, I'd need to do something like: >>>>> >>>>> f->do_oop(&value); >>>>> ... >>>>> trace->store_oop(value); >>>>> >>>>> to update the oop internally. Is that right/is that one of the >>>>> advantages of going to the Oopstorage sooner than later? >>>>> >>>> >>>> I think you really want to do the do_oop on the root directly. Is there >>>> a particular reason why you would not want to do that? >>>> Otherwise, yes - the benefit with using the handle approach is that you >>>> do not need to call do_oop explicitly in your code. >>>> >>>> >>> There is no reason except that now we have a load_oop and a >>> get_oop_addr, I was not sure what you would think of that. >>> >>> >>> That's fine. >>> >>> >>> >>>> >>>>> Also I see a lot of concurrent-looking use of the following field: >>>>>> 267 volatile bool _initialized; >>>>>> >>>>>> Please note that the "volatile" qualifier does not help with >>>>>> reordering >>>>>> here. Reordering between volatile and non-volatile fields is >>>>>> completely free >>>>>> for both compiler and hardware, except for windows with MSVC, where >>>>>> volatile >>>>>> semantics is defined to use acquire/release semantics, and the >>>>>> hardware is >>>>>> TSO. But for the general case, I would expect this field to be stored >>>>>> with >>>>>> OrderAccess::release_store and loaded with OrderAccess::load_acquire. >>>>>> Otherwise it is not thread safe. >>>>>> >>>>> Because everything is behind a mutex, I wasn't really worried about >>>>> this. I have a test that has multiple threads trying to hit this >>>>> corner case and it passes. >>>>> >>>>> However, to be paranoid, I updated it to using the OrderAccess API >>>>> now, thanks! Let me know what you think there too! 
>>>>> >>>> >>>> If it is indeed always supposed to be read and written under a mutex, >>>> then I would strongly prefer to have it accessed as a normal non-volatile >>>> member, and have an assertion that given lock is held or we are in a >>>> safepoint, as we do in many other places. Something like this: >>>> >>>> assert(HeapMonitorStorage_lock->owned_by_self() || >>>> (SafepointSynchronize::is_at_safepoint() && >>>> Thread::current()->is_VM_thread()), "this should not be accessed >>>> concurrently"); >>>> >>>> It would be confusing to people reading the code if there are uses of >>>> OrderAccess that are actually always protected under a mutex. >>>> >>>> >>> Thank you for the exact example to be put in the code! I put it around >>> each access/assignment of the _initialized method and found one case where >>> yes you can touch it and not have the lock. It actually is "ok" because you >>> don't act on the storage until later and only when you really want to >>> modify the storage (see the object_alloc_do_sample method which calls the >>> add_trace method). >>> >>> But, because of this, I'm going to put the OrderAccess here, I'll do >>> some performance numbers later and if there are issues, I might add a >>> "unsafe" read and a "safe" one to make it explicit to the reader. But I >>> don't think it will come to that. >>> >>> >>> Okay. This double return in heapMonitoring.cpp looks wrong: >>> >>> 283 bool initialized() { >>> 284 return OrderAccess::load_acquire(&_initialized) != 0; >>> 285 return _initialized; >>> 286 } >>> >>> Since you said object_alloc_do_sample() is the only place where you do >>> not hold the mutex while reading initialized(), I had a closer look at >>> that. It looks like in its current shape, the lack of a mutex may lead to a >>> memory leak. In particular, it first checks if (initialized()). Let's >>> assume this is now true. It then allocates a bunch of stuff, and checks if >>> the number of frames were over 0. 
If they were, it calls >>> StackTraceStorage::storage()->add_trace(), seemingly hoping that after >>> grabbing the lock in there, initialized() will still return true. But it >>> could now return false and skip doing anything, in which case the allocated >>> stuff will never be freed. >>> >> >> I fixed this now by making add_trace return a boolean and checking for >> that. It will be in the next webrev. Thanks; the truth is that in our >> implementation the system is always on or off, so this never really occurs >> :). In this version though, that is not true and it's important to handle, >> so thanks again! >> >> >> >>> >>> So the analysis seems to be that _initialized is only used outside of >>> the mutex in one instance, where it is used to perform double-checked >>> locking, which actually causes a memory leak. >>> >>> I am not proposing how to fix that, just raising the issue. If you still >>> want to perform this double-checked locking somehow, then the use of >>> acquire/release still seems odd, because its memory ordering restrictions >>> never come into play in this particular case. If they ever did, then >>> the use of destroy_stuff(); release_store(_initialized, 0) would be broken >>> anyway, as that would imply that whatever concurrent reader there ever was >>> could, after reading _initialized with load_acquire(), *never* read the >>> data that is concurrently destroyed anyway. I would be biased to think that >>> RawAccess::load/store looks like a more appropriate solution, >>> given that the memory leak issue is resolved. I do not know how painful it >>> would be to not perform this double-checked locking. >>> >> >> So I agree with this entirely. I also looked a bit more, and the >> difference in the code really stems from our internal version. In this version >> however, there are actually a lot of things going on that I did not go >> entirely through in my head, but this comment made me ponder a bit more on >> it.
>> >> Since every object_alloc_do_sample is protected by a check to >> HeapMonitoring::enabled(), there is only a small chance that the call is >> happening when things have been disabled. So there is no real need to do a >> first check on the initialized, it is a rare occurence that a call happens >> to object_alloc_do_sample and the initialized of the storage returns false. >> >> (By the way, even if you did call object_alloc_do_sample without looking >> at HeapMonitoring::enabled(), that would be ok too. You would gather the >> stacktrace and get nowhere at the add_trace call, which would return false; >> so though not optimal performance wise, nothing would break). >> >> Furthermore, the add_trace is really the moment of no return and we have >> the mutex lock and then the initialized check. So, in the end, I did two >> things: I removed that first check and then I removed the OrderAccess for >> the storage initialized. I think now I have a better grasp and >> understanding why it was done in our code and why it is not needed here. >> Thanks for pointing it out :). This now still passes my JTREG tests, >> especially the threaded one. >> >> >> >> >> >>> >>> >>> >>> >>> >>>> As a kind of meta comment, I wonder if it would make sense to add >>>>>> sampling >>>>>> for non-TLAB allocations. Seems like if someone is rapidly allocating >>>>>> a >>>>>> whole bunch of 1 MB objects that never fit in a TLAB, I might still be >>>>>> interested in seeing that in my traces, and not get surprised that the >>>>>> allocation rate is very high yet not showing up in any profiles. >>>>>> >>>>>> That is handled by the handle_sample where you wanted me to put a >>>>> UseTlab because you hit that case if the allocation is too big. >>>>> >>>> >>>> I see. It was not obvious to me that non-TLAB sampling is done in the >>>> TLAB class. That seems like an abstraction crime. >>>> What I wanted in my previous comment was that we do not call into the >>>> TLAB when we are not using TLABs. 
If there is sampling logic in the TLAB >>>> that is used for something else than TLABs, then it seems like that logic >>>> simply does not belong inside of the TLAB. It should be moved out of the >>>> TLAB, and instead have the TLAB call this common abstraction that makes >>>> sense. >>>> >>>> >>> So in the incremental version: >>> http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.07_08/, this is >>> still a "crime". The reason is that the system has to have the >>> bytes_until_sample on a per-thread level and it made "sense" to have it >>> with the TLAB implementation. Also, I was not sure how people felt about >>> adding something to the thread instance instead. >>> >>> Do you think it fits better at the Thread level? I can see how difficult >>> it is to make it happen there and add some logic there. Let me know what >>> you think. >>> >>> >>> We have an unfortunate situation where everyone that has some fields >>> that are thread local tend to dump them right into Thread, making the size >>> and complexity of Thread grow as it becomes tightly coupled with various >>> unrelated subsystems. It would be desirable to have a separate class for >>> this instead that encapsulates the sampling logic. That class could >>> possibly reside in Thread though as a value object of Thread. >>> >> >> I imagined that would be the case but was not sure. I will look at the >> example that Robbin is talking about (ThreadSMR) and will see how to >> refactor my code to use that. >> >> Thanks again for your help, >> Jc >> >> >>> >>> >>> >>> >>> >>>> Hope I have answered your questions and that my feedback makes sense to >>>> you. >>>> >>>> >>> You have and thank you for them, I think we are getting to a cleaner >>> implementation and things are getting better and more readable :) >>> >>> >>> Yes it is getting better. >>> >>> Thanks, >>> /Erik >>> >>> >>> Thanks for your help! 
>>> Jc >>> >>> >>> >>>> Thanks, >>>> /Erik >>>> >>>> >>>> I double checked by changing the test >>>>> >>>>> http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.05a/raw_files/new/test/hotspot/jtreg/serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorStatObjectCorrectnessTest.java >>>>> >>>>> to use a smaller Tlab (2048) and made the object bigger and it goes >>>>> through that and passes. >>>>> >>>>> Thanks again for your review and I look forward to your pointers for >>>>> the questions I now have raised! >>>>> Jc >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> Thanks, >>>>>> /Erik >>>>>> >>>>>> >>>>>> On 2018-01-26 06:45, JC Beyler wrote: >>>>>> >>>>>>> Thanks Robbin for the reviews :) >>>>>>> >>>>>>> The new full webrev is here: >>>>>>> http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.03/ >>>>>>> The incremental webrev is here: >>>>>>> http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.02_03/ >>>>>>> >>>>>>> I inlined my answers: >>>>>>> >>>>>>> On Thu, Jan 25, 2018 at 1:15 AM, Robbin Ehn < >>>>>>> robbin.ehn at oracle.com> wrote: >>>>>>> >>>>>>>> Hi JC, great to see another revision! >>>>>>>> >>>>>>>> #### >>>>>>>> heapMonitoring.cpp >>>>>>>> >>>>>>>> StackTraceData should not contain the oop for 'safety' reasons. >>>>>>>> When StackTraceData is moved from _allocated_traces: >>>>>>>> L452 store_garbage_trace(trace); >>>>>>>> it contains a dead oop. >>>>>>>> _allocated_traces could instead be a tupel of oop and >>>>>>>> StackTraceData thus >>>>>>>> dead oops are not kept. >>>>>>>> >>>>>>> Done I used inheritance to make the copier work regardless but the >>>>>>> idea is the same. >>>>>>> >>>>>>> You should use the new Access API for loading the oop, something like >>>>>>>> this: >>>>>>>> RootAccess::load(...) >>>>>>>> I don't think you need to use Access API for clearing the oop, but >>>>>>>> it >>>>>>>> would >>>>>>>> look nicer. 
And you shouldn't probably be using: >>>>>>>> Universe::heap()->is_in_reserved(value) >>>>>>>> >>>>>>> I am unfamiliar with this but I think I did do it like you wanted me >>>>>>> to (all tests pass so that's a start). I'm not sure how to clear the >>>>>>> oop exactly, is there somewhere that does that, which I can use to do >>>>>>> the same? >>>>>>> >>>>>>> I removed the is_in_reserved, this came from our internal version, I >>>>>>> don't know why it was there but my tests work without so I removed it >>>>>>> :) >>>>>>> >>>>>>> >>>>>>> The lock: >>>>>>>> L424 MutexLocker mu(HeapMonitorStorage_lock); >>>>>>>> Is not needed as far as I can see. >>>>>>>> weak_oops_do is called in a safepoint, no TLAB allocation can >>>>>>>> happen and >>>>>>>> JVMTI thread can't access these data-structures. Is there something >>>>>>>> more >>>>>>>> to >>>>>>>> this lock that I'm missing? >>>>>>>> >>>>>>> Since a thread can call the JVMTI getLiveTraces (or any of the other >>>>>>> ones), it can get to the point of trying to copying the >>>>>>> _allocated_traces. I imagine it is possible that this is happening >>>>>>> during a GC or that it can be started and a GC happens afterwards. >>>>>>> Therefore, it seems to me that you want this protected, no? >>>>>>> >>>>>>> >>>>>>> #### >>>>>>>> You have 6 files without any changes in them (any more): >>>>>>>> g1CollectedHeap.cpp >>>>>>>> psMarkSweep.cpp >>>>>>>> psParallelCompact.cpp >>>>>>>> genCollectedHeap.cpp >>>>>>>> referenceProcessor.cpp >>>>>>>> thread.hpp >>>>>>>> >>>>>>>> Done. >>>>>>> >>>>>>> #### >>>>>>>> I have not looked closely, but is it possible to hide heap sampling >>>>>>>> in >>>>>>>> AllocTracer ? (with some minor changes to the AllocTracer API) >>>>>>>> >>>>>>>> I am imagining that you are saying to move the code that does the >>>>>>> sampling code (change the tlab end, do the call to HeapMonitoring, >>>>>>> etc.) into the AllocTracer code itself? 
I think that is right and >>>>>>> I'll >>>>>>> look into whether that is possible and prepare a webrev to show what would be >>>>>>> needed to make that happen. >>>>>>> >>>>>>> #### >>>>>>>> Minor nit: when declaring pointers there is a little mix of having >>>>>>>> the >>>>>>>> pointer adjacent to the type name and to the data name. (Most hotspot code is >>>>>>>> by >>>>>>>> type >>>>>>>> name) >>>>>>>> E.g. >>>>>>>> heapMonitoring.cpp:711 jvmtiStackTrace *trace = .... >>>>>>>> heapMonitoring.cpp:733 Method* m = vfst.method(); >>>>>>>> (not just this file) >>>>>>>> >>>>>>>> Done! >>>>>>> >>>>>>> #### >>>>>>>> HeapMonitorThreadOnOffTest.java:77 >>>>>>>> I would make g_tmp volatile, otherwise the assignment in the loop may >>>>>>>> theoretically be skipped. >>>>>>>> >>>>>>>> Also done! >>>>>>> >>>>>>> Thanks again! >>>>>>> Jc >>>>>>> >>>>>> >>>>>> >>>> >>> >>> >> > From vladimir.kozlov at oracle.com Mon Mar 19 21:16:21 2018 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 19 Mar 2018 14:16:21 -0700 Subject: RFR(S) 8146201: [AOT] Class static initializers that are not pure should not be executed during static compilation In-Reply-To: <78cd3635-89f1-6a2d-1679-1d930aef6c7e@oracle.com> References: <78cd3635-89f1-6a2d-1679-1d930aef6c7e@oracle.com> Message-ID: Changes are fine I think, but I don't see changes for the AOT code mentioned in the bug report: http://hg.openjdk.java.net/jdk/hs/file/00992d4e8a23/src/jdk.aot/share/classes/jdk.tools.jaotc/src/jdk/tools/jaotc/AOTBackend.java#l158 Thanks, Vladimir On 3/19/18 1:32 PM, dean.long at oracle.com wrote: > https://bugs.openjdk.java.net/browse/JDK-8146201 > http://cr.openjdk.java.net/~dlong/8146201/webrev > > Previously, jaotc would run static initializers on classes it accessed during compilation, which is > undesirable if those initializers are not pure. This change does not attempt to identify if an > initializer is pure.
Instead, the compiler avoids triggering any initialization at all, while still > doing eager resolution and linking. The upstream Graal changes have been pushed here: > > https://github.com/oracle/graal/commit/8411a80308c4dea31b05897c3bbb1c8e642fdd67 > > where a new HotSpotLazyInitializationTest test was also added. > > dl From vladimir.kozlov at oracle.com Mon Mar 19 21:31:35 2018 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 19 Mar 2018 14:31:35 -0700 Subject: 8193935: RFR(S): Illegal countedLoops transformation In-Reply-To: <609b86db-1b0c-9fba-f7c4-0e22764ce205@oracle.com> References: <609b86db-1b0c-9fba-f7c4-0e22764ce205@oracle.com> Message-ID: Hi Nils, Am I missing something in the explanation? The check you added should be true for all normal loops: if (limit_t->_hi > incr_t->_hi) { return false; // limit might be a value that incr never can reach For example: for (int i = 0; i < 10; i++) {} 10 > 1 and you will not convert this loop into counted. Thanks, Vladimir On 3/19/18 2:17 AM, Nils Eliasson wrote: > Hi, > > This bug was found in mpegaudio hiding behind the loop predication. The Counted loop transformation > may lose a significant truncation which changes the behaviour of the program. > > CountedLoopNode::match_incr_with_optional_truncation finds the truncation Op_AndI(0x7fff) and sets > trunc_t = TypeInt::CHAR. However the program does not use it for a char truncation, but for accessing > an array as a circular buffer. (Any other mask would have hidden this problem since char truncation > is the only one matched). > > A loop is successfully matched as a counted loop, and when the trip counter is cloned it drops the > truncation. In the intended char case that is ok. In this case the truncation prevents the program > from hitting an AIOOB.
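A hypothetical Java reduction makes the point about the truncation concrete (it is not the mpegaudio code). The masked walk stays inside a 0x8000-element array indefinitely; the same walk with the mask dropped, which is roughly what cloning the trip counter without the AndI amounts to, runs off the end. The explicit step bound exists only so the sketch terminates; in the loop from the report, the range check is the only exit.

```java
import java.util.Arrays;

// Hypothetical reduction of the pattern in the report, not the mpegaudio source.
class RingIndexDemo {
    static final int MASK = 0x7fff; // same constant as the Op_AndI(0x7fff) in the report

    // Walks the buffer as a ring: the AND wraps index 0x8000 back to 0.
    static long sumMasked(int[] buf, int steps) {
        long sum = 0;
        int i = 0;
        for (int n = 0; n < steps; n++) {
            sum += buf[i];
            i = (i + 1) & MASK;
        }
        return sum;
    }

    // The same walk with the truncation dropped: the index grows without bound.
    static long sumUnmasked(int[] buf, int steps) {
        long sum = 0;
        int i = 0;
        for (int n = 0; n < steps; n++) {
            sum += buf[i]; // throws ArrayIndexOutOfBoundsException once i == buf.length
            i = i + 1;
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] buf = new int[MASK + 1]; // 0x8000 elements
        Arrays.fill(buf, 1);
        System.out.println("masked, two laps: " + sumMasked(buf, 2 * buf.length));
        try {
            sumUnmasked(buf, 2 * buf.length);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("unmasked walk left the array");
        }
    }
}
```

With a 0x8000-element buffer the masked version completes two full laps, while the unmasked one fails on its 0x8001st access.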
> > In the general case, if a truncated loop counter is compared to an array length (or any variable) it > must be provable that the array length is less than the truncation, and then the truncation can be > omitted. If the array length can be longer, the exit may never be taken, so the loop may never > terminate and a counted loop transform cannot be performed. > > One additional topic of discussion is whether we really want to do counted loop transformations with a > RangeCheck as the exit point, especially if the profiling shows that the RangeCheck never fails. In the > loop that fails there are multiple exits, many of which are RangeChecks. > > For additional optimization opportunities we could consider rotating the loop until a normal compare > is the loop exit condition. > > > Image of significant parts of the node graph (the entire loop, with its multiple exits, is omitted): > http://cr.openjdk.java.net/~neliasso/8193935/mpegaudio.png > > bug: https://bugs.openjdk.java.net/browse/JDK-8193935 > > webrev: http://cr.openjdk.java.net/~neliasso/8193935/webrev.01 > > > Please review, > > Nils Eliasson > From dean.long at oracle.com Mon Mar 19 22:39:53 2018 From: dean.long at oracle.com (dean.long at oracle.com) Date: Mon, 19 Mar 2018 15:39:53 -0700 Subject: RFR(S) 8146201: [AOT] Class static initializers that are not pure should not be executed during static compilation In-Reply-To: References: <78cd3635-89f1-6a2d-1679-1d930aef6c7e@oracle.com> Message-ID: <59c82953-83af-43a7-abc5-36cd35eaba42@oracle.com> Hi Vladimir. Thanks for looking at it. On 3/19/18 2:16 PM, Vladimir Kozlov wrote: > Changes are fine I think but I don't see changes for AOT code > mentioned in bug report: > > http://hg.openjdk.java.net/jdk/hs/file/00992d4e8a23/src/jdk.aot/share/classes/jdk.tools.jaotc/src/jdk/tools/jaotc/AOTBackend.java#l158 > > withEagerResolving(true) by itself is not the problem, except that it used to imply "eager initialization" as well.
Now, we allow HotSpotClassInitializationPlugin to override the default eagerResolving --> implies eagerInitializing behavior, so that for AOT we can have eagerResolving == true and eagerInitializing == false. My first idea was to add withEagerInitializing() to GraphBuilderConfiguration, but I decided to go with just the HotSpotClassInitializationPlugin changes instead. dl > Thanks, > Vladimir > > On 3/19/18 1:32 PM, dean.long at oracle.com wrote: >> https://bugs.openjdk.java.net/browse/JDK-8146201 >> http://cr.openjdk.java.net/~dlong/8146201/webrev >> >> Previously, jaotc would run static initializers on classes it >> accessed during compilation, which is undesirable if those >> initializers are not pure. This change does not attempt to identify >> if an initializer is pure. Instead, the compiler avoids triggering >> any initialization at all, while still doing eager resolution and >> linking. The upstream Graal changes have been pushed here: >> >> https://github.com/oracle/graal/commit/8411a80308c4dea31b05897c3bbb1c8e642fdd67 >> >> >> where a new HotSpotLazyInitializationTest test was also added. >> >> dl From vladimir.kozlov at oracle.com Mon Mar 19 23:01:01 2018 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 19 Mar 2018 16:01:01 -0700 Subject: RFR(S) 8146201: [AOT] Class static initializers that are not pure should not be executed during static compilation In-Reply-To: <59c82953-83af-43a7-abc5-36cd35eaba42@oracle.com> References: <78cd3635-89f1-6a2d-1679-1d930aef6c7e@oracle.com> <59c82953-83af-43a7-abc5-36cd35eaba42@oracle.com> Message-ID: Okay, thank you for the explanation. Vladimir On 3/19/18 3:39 PM, dean.long at oracle.com wrote: > Hi Vladimir. Thanks for looking at it.
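The distinction dean describes, resolving classes eagerly without ever running their static initializers at compile time, has a plain-Java analogue in Class.forName(name, initialize, loader). The sketch below shows only that analogue; it does not model jaotc, Graal or HotSpotClassInitializationPlugin, and the class names are invented for the example.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo; the class names are invented for illustration.
class LazyInitDemo {
    static final AtomicInteger INIT_COUNT = new AtomicInteger();

    // A class whose static initializer has an observable side effect (it is not "pure").
    static class Impure {
        static {
            INIT_COUNT.incrementAndGet();
        }
    }

    private static final String IMPURE = LazyInitDemo.class.getName() + "$Impure";

    private static void load(boolean initialize) {
        try {
            Class.forName(IMPURE, initialize, LazyInitDemo.class.getClassLoader());
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // initialize == false: the class is loaded, but its static initializer does not run.
    static int loadOnly() {
        load(false);
        return INIT_COUNT.get();
    }

    // initialize == true: the static initializer runs, and only the first time.
    static int loadAndInitialize() {
        load(true);
        return INIT_COUNT.get();
    }

    public static void main(String[] args) {
        System.out.println("after loadOnly:          " + loadOnly());
        System.out.println("after loadAndInitialize: " + loadAndInitialize());
    }
}
```

The side effect becomes visible only once initialization is explicitly requested, which is the property the AOT change wants for impure initializers.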
> > On 3/19/18 2:16 PM, Vladimir Kozlov wrote: >> Changes are fine I think but I don't see changes for AOT code mentioned in bug report: >> >> http://hg.openjdk.java.net/jdk/hs/file/00992d4e8a23/src/jdk.aot/share/classes/jdk.tools.jaotc/src/jdk/tools/jaotc/AOTBackend.java#l158 >> >> > > withEagerResolving(true) by itself is not the problem, except that it used to imply "eager > initialization" as well. Now, we allow HotSpotClassInitializationPlugin to override the default > eagerResolving --> implies eagerInitializing behavior, so that for AOT we can have eagerResolving == > true and eagerInitializing == false. > > My first idea was to add withEagerInitializing() to GraphBuilderConfiguration, but I decided to go > with just the HotSpotClassInitializationPlugin changes instead. > > dl > >> Thanks, >> Vladimir >> >> On 3/19/18 1:32 PM, dean.long at oracle.com wrote: >>> https://bugs.openjdk.java.net/browse/JDK-8146201 >>> http://cr.openjdk.java.net/~dlong/8146201/webrev >>> >>> Previously, jaotc would run static initializers on classes it accessed during compilation, which >>> is undesirable if those initializers are not pure. This change does not attempt to identify if >>> an initializer is pure. Instead, the compiler avoids triggering any initialization at all, while >>> still doing eager resolution and linking. The upstream Graal changes have been pushed here: >>> >>> https://github.com/oracle/graal/commit/8411a80308c4dea31b05897c3bbb1c8e642fdd67 >>> >>> where a new HotSpotLazyInitializationTest test was also added.
>>> >>> dl > From vladimir.kozlov at oracle.com Tue Mar 20 00:21:39 2018 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 19 Mar 2018 17:21:39 -0700 Subject: RFR(L): 8198691: CodeHeap State Analytics In-Reply-To: References: <250CD3F0-F4AB-4EA1-9EA9-8136442FDFF9@sap.com> <1b411eee-29c8-b977-96b0-45145ebbfcf9@oracle.com> <8EB0096D-1020-4E86-82E1-2D7D564601AE@sap.com> <14c67b9d-4e1d-cb49-b29a-0d4cd02fdcc9@oracle.com> Message-ID: Hi Lutz, Looks good. You left commented-out code in heap.hpp not related to UL, and in codeCache.cpp and java.cpp for the log stream. Personally I agree with avoiding the UL prefix (it takes half of a line on the terminal). UL should allow you to specify whether you want it in your output or not. Thanks, Vladimir On 3/19/18 9:00 AM, Schmidt, Lutz wrote: > Dear all, > > this is the next (second) iteration of my CodeHeap State Analytics effort. It reflects all the comments and suggestions I received on my initial RFR (sent out on March 1st). Please read on to learn what was changed and what was kept as is. > > May I please request reviews for > > Bug: https://bugs.openjdk.java.net/browse/JDK-8198691 > Webrev: http://cr.openjdk.java.net/~lucy/webrevs/8198691.01/ > > Instead of keeping the long tail of comments and responses, I decided to provide a summary of what happened. > - Most of the new code was moved to new files: share/code/codeHeapState.cpp and share/code/codeHeapState.hpp > - I have added, as requested, an abbreviated version of the "General Description" chapter to codeHeapState.cpp > - In case of SegmentedCodeCache, the iteration is limited to FOR_ALL_ALLOCABLE_HEAPS(). There were issues in aot tests when using FOR_ALL_HEAPS(). > - All references to the RFE id should be gone. > - In share/runtime/java.cpp, the call to CompileBroker::print_heapinfo() now is close to "PrintCodeCache" for both, product and nonproduct cases. > - The edited/updated documentation is available as an attachment to the bug (in PDF format).
> - I added code to share/code/codeCache.cpp (report_codemem_full()) to print the CodeHeap state for the first occurrence of the "full" condition. > - The code style "hiccups", noted by Tobias Hartmann, are gone. > - The compile time warnings and errors are resolved. > > -XX:+PrintCodeHeapState vs. -Xlog:codecache=Trace > I clearly understand and support the intention to get rid of the Print* command line arguments. Therefore, the PrintCodeHeapState command line argument is gone. You can request the CodeHeap state analytics with the -Xlog:codecache=Trace (vm shutdown) or -Xlog:codecache=Debug (CodeCache full and vm shutdown) switches. The output is directed to tty, not to the log stream. > > The reason for not using the log stream is simple: UL prefixes every line with a timestamp and the trace tags. Unfortunately, that messes up my formatting big time. The jcmd output, on the other hand, will not have the UL prefixes. I would have to distinguish between UL and jcmd output when formatting. In addition, I do not see a benefit from adding the same UL prefix to thousands of lines. > > Comments are very welcome! > > Best Regards, > Lutz > > > > From felix.yang at huawei.com Tue Mar 20 01:02:04 2018 From: felix.yang at huawei.com (Yangfei (Felix)) Date: Tue, 20 Mar 2018 01:02:04 +0000 Subject: [aarch64-port-dev ] RFR: 8173100: AArch64: -XX:-UseOnStackReplacement does not work together with -XX:+TieredCompilation. In-Reply-To: References: Message-ID: LGTM. Pushed. Thanks, Felix > -XX:-UseOnStackReplacement does not work together with > -XX:+TieredCompilation. > > Thank you Andrew! > > Regards, > Ningsheng > > On 16 March 2018 at 17:51, Andrew Haley wrote: > > On 15/03/18 07:42, Ningsheng Jian wrote: > >> Please help to review this fix. > >> > >> Bug: https://bugs.openjdk.java.net/browse/JDK-8173100 > >> Webrev: http://cr.openjdk.java.net/~njian/8173100/webrev.01/ > > > > That looks good. Thank you.
> > > > -- > > Andrew Haley > > Java Platform Lead Engineer > > Red Hat UK Ltd. > > EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 > > > > -- > Ningsheng From felix.yang at huawei.com Tue Mar 20 01:15:09 2018 From: felix.yang at huawei.com (Yangfei (Felix)) Date: Tue, 20 Mar 2018 01:15:09 +0000 Subject: [aarch64-port-dev ] RFR: 8191954: AArch64: disable UseCISCSpill in C2 In-Reply-To: References: <2bbe989e-558f-7cf6-a453-1fefd8ba9551@redhat.com> <9c3ad319-39dc-61e4-0f38-b5c4ff27fc37@redhat.com> Message-ID: As this patch is trivial and has been reviewed before, I think it is time for it to go. Pushed. Thanks, Felix > > Hi, > > Is this trivial patch OK for jdk11 now? > > http://cr.openjdk.java.net/~njian/8191954/webrev.00/ > > If it is OK, could someone please help to push the patch? > > Thanks, > Ningsheng > > On 28 November 2017 at 18:11, Andrew Haley wrote: > > On 28/11/17 09:35, Andrew Dinn wrote: > >> On 28/11/17 08:49, Andrew Haley wrote: > >>> On 28/11/17 05:54, Ningsheng Jian wrote: > >> > >> So, leaving UseCISCSpill set to true causes a small amount of extra > >> checking to be done but no action is ever taken. Of course that also > >> means the change is not going to be /critical/ to performance i.e. > >> rushing to squeeze it into jdk10 is not justified. > > > > OK, so there's no hurry to change it, then. :-) > > > > -- > > Andrew Haley > > Java Platform Lead Engineer > > Red Hat UK Ltd. > > EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From ningsheng.jian at linaro.org Tue Mar 20 01:27:40 2018 From: ningsheng.jian at linaro.org (Ningsheng Jian) Date: Tue, 20 Mar 2018 09:27:40 +0800 Subject: [aarch64-port-dev ] RFR: 8191954: AArch64: disable UseCISCSpill in C2 In-Reply-To: References: <2bbe989e-558f-7cf6-a453-1fefd8ba9551@redhat.com> <9c3ad319-39dc-61e4-0f38-b5c4ff27fc37@redhat.com> Message-ID: Thank you Felix! 
Regards, Ningsheng On 20 March 2018 at 09:15, Yangfei (Felix) wrote: > As this patch is trivial and has been reviewed before, I think it is time for it to go. Pushed. > > Thanks, > Felix > >> >> Hi, >> >> Is this trivial patch OK for jdk11 now? >> >> http://cr.openjdk.java.net/~njian/8191954/webrev.00/ >> >> If it is OK, could someone please help to push the patch? >> >> Thanks, >> Ningsheng >> >> On 28 November 2017 at 18:11, Andrew Haley wrote: >> > On 28/11/17 09:35, Andrew Dinn wrote: >> >> On 28/11/17 08:49, Andrew Haley wrote: >> >>> On 28/11/17 05:54, Ningsheng Jian wrote: >> >> >> >> So, leaving UseCISCSpill set to true causes a small amount of extra >> >> checking to be done but no action is ever taken. Of course that also >> >> means the change is not going to be /critical/ to performance i.e. >> >> rushing to squeeze it into jdk10 is not justified. >> > >> > OK, so there's no hurry to change it, then. :-) >> > >> > -- >> > Andrew Haley >> > Java Platform Lead Engineer >> > Red Hat UK Ltd. >> > EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 -- Ningsheng From ningsheng.jian at linaro.org Tue Mar 20 01:28:27 2018 From: ningsheng.jian at linaro.org (Ningsheng Jian) Date: Tue, 20 Mar 2018 09:28:27 +0800 Subject: [aarch64-port-dev ] RFR: 8173100: AArch64: -XX:-UseOnStackReplacement does not work together with -XX:+TieredCompilation. In-Reply-To: References: Message-ID: Thanks! Regards, Ningsheng On 20 March 2018 at 09:02, Yangfei (Felix) wrote: > LGTM. Pushed. > > Thanks, > Felix > >> -XX:-UseOnStackReplacement does not work together with >> -XX:+TieredCompilation. >> >> Thank you Andrew! >> >> Regards, >> Ningsheng >> >> On 16 March 2018 at 17:51, Andrew Haley wrote: >> > On 15/03/18 07:42, Ningsheng Jian wrote: >> >> Please help to review this fix. >> >> >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8173100 >> >> Webrev: http://cr.openjdk.java.net/~njian/8173100/webrev.01/ >> > >> > That looks good. Thank you. 
>> > >> > -- >> > Andrew Haley >> > Java Platform Lead Engineer >> > Red Hat UK Ltd. >> > EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 >> >> >> >> -- >> Ningsheng -- Ningsheng From rwestrel at redhat.com Tue Mar 20 09:25:57 2018 From: rwestrel at redhat.com (Roland Westrelin) Date: Tue, 20 Mar 2018 10:25:57 +0100 Subject: RFR(S): 8199784: PhaseIdealLoop::place_near_use() might return wrong control with loop strip mining Message-ID: http://cr.openjdk.java.net/~roland/8199784/webrev.00/ When a Load is sunk out of a loop, if one of the uses is a Phi of a strip mined CountedLoop, PhaseIdealLoop::place_near_use() should return the entry control of the OuterStripMinedLoop. I couldn't write a test case that would trigger this and I'm not sure it's even possible. We're seeing this with some graph patterns that are specific to Shenandoah but I think it's better to have this fixed anyway. Roland. From shade at redhat.com Tue Mar 20 09:28:34 2018 From: shade at redhat.com (Aleksey Shipilev) Date: Tue, 20 Mar 2018 10:28:34 +0100 Subject: RFR(S): 8199784: PhaseIdealLoop::place_near_use() might return wrong control with loop strip mining In-Reply-To: References: Message-ID: On 03/20/2018 10:25 AM, Roland Westrelin wrote: > > http://cr.openjdk.java.net/~roland/8199784/webrev.00/ > > When a Load is sunk out of a loop, if one of the uses is a Phi of a > strip mined CountedLoop, PhaseIdealLoop::place_near_use() should return > the entry control of the OuterStripMinedLoop. > > I couldn't write a test case that would trigger this and I'm not > sure it's even possible. We're seeing this with some graph patterns that > are specific to Shenandoah but I think it's better to have this fixed > anyway. Yes. Also, the patch in this form was in Shenandoah repos for a while, and survived the testing we threw at it. Thanks, -Aleksey From tobias.hartmann at oracle.com Tue Mar 20 09:46:05 2018 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 20 Mar 2018 10:46:05 +0100 Subject: RFR(L): 8198691: CodeHeap State Analytics In-Reply-To: <2c4bac01-9775-2abc-01b6-f20bb580c257@oracle.com> References: <250CD3F0-F4AB-4EA1-9EA9-8136442FDFF9@sap.com> <1b411eee-29c8-b977-96b0-45145ebbfcf9@oracle.com> <8EB0096D-1020-4E86-82E1-2D7D564601AE@sap.com> <14c67b9d-4e1d-cb49-b29a-0d4cd02fdcc9@oracle.com> Message-ID: <2c4bac01-9775-2abc-01b6-f20bb580c257@oracle.com> Hi Lutz, very nice work! Thanks for incorporating the requested changes. I think you can remove the commented-out LogStream code. I'll re-run the tests that failed with the last webrev. Best regards, Tobias On 19.03.2018 17:00, Schmidt, Lutz wrote: > Dear all, > > this is the next (second) iteration of my CodeHeap State Analytics effort. It reflects all the comments and suggestions I received on my initial RFR (sent out on March 1st). Please read on to learn what was changed and what kept as is. > > May I please request reviews for > > Bug: https://bugs.openjdk.java.net/browse/JDK-8198691 > Webrev: http://cr.openjdk.java.net/~lucy/webrevs/8198691.01/ > > Instead of keeping the long tail of comments and responses, I decided to provide a summary of what happened. > - Most of the new code was moved to new files: share/code/codeHeapState.cpp and share/code/codeHeapState.hpp > - I have added, as requested, an abbreviated version of the "General Description" chapter to codeHeapState.cpp > - In case of SegmentedCodeCache, the iteration is limited to FOR_ALL_ALLOCABLE_HEAPS(). There were issues in aot tests when using FOR_ALL_HEAPS(). > - All references to the RFE id should be gone. > - In share/runtime/java.cpp, the call to CompileBroker::print_heapinfo() now is close to "PrintCodeCache" for both, product and nonproduct cases.
> - The edited/updated documentation is available as an attachment to the bug (in PDF format). > - I added code to share/code/codeCache.cpp (report_codemem_full()) to print the CodeHeap state for the first occurrence of the "full" condition. > - The code style "hiccups", noted by Tobias Hartmann, are gone. > - The compile time warnings and errors are resolved. > > -XX:+PrintCodeHeapState vs. -Xlog:codecache=Trace > I clearly understand and support the intention to get rid of the Print* command line arguments. Therefore, the PrintCodeHeapState command line argument is gone. You can request the CodeHeap state analytics with the -Xlog:codecache=Trace (vm shutdown) or -Xlog:codecache=Debug (CodeCache full and vm shutdown) switches. The output is directed to tty, not to the log stream. > > The reason for not using the log stream is simple: UL prefixes every line with a timestamp and the trace tags. Unfortunately, that messes up my formatting big time. The jcmd output, on the other hand, will not have the UL prefixes. I would have to distinguish between UL and jcmd output when formatting. In addition, I do not see a benefit from adding the same UL prefix to thousands of lines. > > Comments are very welcome! > > Best Regards, > Lutz > > > > From rwestrel at redhat.com Tue Mar 20 10:11:10 2018 From: rwestrel at redhat.com (Roland Westrelin) Date: Tue, 20 Mar 2018 11:11:10 +0100 Subject: RFR(XS): 8196294: LoopStripMiningIterShortLoop is set to zero by default In-Reply-To: References: Message-ID: > Um. So there is a block in Arguments::check_vm_args_consistency(): > > if (FLAG_IS_DEFAULT(LoopStripMiningIterShortLoop)) { > // blind guess > LoopStripMiningIterShortLoop = LoopStripMiningIter / 10; > } > > Is that block misplaced? Should it be removed, if we init this per-GC? Or should it be moved somewhere > after GC argument initializations? Right.
We can move it after GC initialization code so there's no duplicate code: http://cr.openjdk.java.net/~roland/8196294/webrev.01/ Roland. From shade at redhat.com Tue Mar 20 10:14:25 2018 From: shade at redhat.com (Aleksey Shipilev) Date: Tue, 20 Mar 2018 11:14:25 +0100 Subject: RFR(XS): 8196294: LoopStripMiningIterShortLoop is set to zero by default In-Reply-To: References: Message-ID: On 03/20/2018 11:11 AM, Roland Westrelin wrote: > >> Um. So there is a block in Arguments::check_vm_args_consistency(): >> >> if (FLAG_IS_DEFAULT(LoopStripMiningIterShortLoop)) { >> // blind guess >> LoopStripMiningIterShortLoop = LoopStripMiningIter / 10; >> } >> >> Is that block misplaced? Should it be removed, if we init this per-GC? Or should it be moved somewhere >> after GC argument initializations? > > Right. We can move it after GC initialization code so there's no > duplicate code: > > http://cr.openjdk.java.net/~roland/8196294/webrev.01/ Yes, that makes much more sense, and it covers the Shenandoah case too. Thanks! Looks good. -Aleksey From tobias.hartmann at oracle.com Tue Mar 20 10:16:50 2018 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 20 Mar 2018 11:16:50 +0100 Subject: RFR(XS): 8196294: LoopStripMiningIterShortLoop is set to zero by default In-Reply-To: References: Message-ID: Hi Roland, looks good to me but I would prefer using the 'getUintxVMFlag' WhiteBox API method in the test (see test/lib/sun/hotspot/WhiteBox.java). Thanks, Tobias On 20.03.2018 11:11, Roland Westrelin wrote: > >> Um. So there is a block in Arguments::check_vm_args_consistency(): >> >> if (FLAG_IS_DEFAULT(LoopStripMiningIterShortLoop)) { >> // blind guess >> LoopStripMiningIterShortLoop = LoopStripMiningIter / 10; >> } >> >> Is that block misplaced? Should it be removed, if we init this per-GC?
Or should it be moved somewhere >> after GC argument initializations? > > Right. We can move it after GC initialization code so there's no > duplicate code: > > http://cr.openjdk.java.net/~roland/8196294/webrev.01/ > > Roland. > From thomas.stuefe at gmail.com Tue Mar 20 10:23:27 2018 From: thomas.stuefe at gmail.com (Thomas Stüfe) Date: Tue, 20 Mar 2018 11:23:27 +0100 Subject: RFR(L): 8198691: CodeHeap State Analytics In-Reply-To: <2c4bac01-9775-2abc-01b6-f20bb580c257@oracle.com> References: <250CD3F0-F4AB-4EA1-9EA9-8136442FDFF9@sap.com> <1b411eee-29c8-b977-96b0-45145ebbfcf9@oracle.com> <8EB0096D-1020-4E86-82E1-2D7D564601AE@sap.com> <14c67b9d-4e1d-cb49-b29a-0d4cd02fdcc9@oracle.com> <2c4bac01-9775-2abc-01b6-f20bb580c257@oracle.com> Message-ID: Hi Guys, I just talked with Lutz off-list about this UL printing thing. As far as printing at exit is concerned, I am not sure this is a good fit for UL. All the other "PrintSomethingAtExit" flags in print_statistics() (java.cpp) print to tty and are controlled by switches. Why should CodeHeap printing be different? Best Regards, Thomas On Tue, Mar 20, 2018 at 10:46 AM, Tobias Hartmann < tobias.hartmann at oracle.com> wrote: > Hi Lutz, > > very nice work! Thanks for incorporating the requested changes. I think > you can remove the commented > LogStream code. > > I'll re-run the tests that failed with the last webrev. > > Best regards, > Tobias > > On 19.03.2018 17:00, Schmidt, Lutz wrote: > > Dear all, > > > > this is the next (second) iteration of my CodeHeap State Analytics > effort. It reflects all the comments and suggestions I received on my > initial RFR (sent out on March 1st). Please read on to learn what was > changed and what kept as is.
> > > > May I please request reviews for > > > > Bug: https://bugs.openjdk.java.net/browse/JDK-8198691 > > Webrev: http://cr.openjdk.java.net/~lucy/webrevs/8198691.01/ > > > > Instead of keeping the long tail of comments and responses, I decided to > provide a summary of what happened. > > - Most of the new code was moved to new files: > share/code/codeHeapState.cpp and share/code/codeHeapState.hpp > > - I have added, as requested, an abbreviated version of the "General > Description" chapter to codeHeapState.cpp > > - In case of SegmentedCodeCache, the iteration is limited to > FOR_ALL_ALLOCABLE_HEAPS(). There were issues in aot tests when using > FOR_ALL_HEAPS(). > > - All references to the RFE id should be gone. > > - In share/runtime/java.cpp, the call to CompileBroker::print_heapinfo() > now is close to "PrintCodeCache" for both, product and nonproduct cases. > > - The edited/updated documentation is available as an attachment to the > bug (in PDF format). > > - I added code to share/code/codeCache.cpp (report_codemem_full()) to > print the CodeHeap state for the first occurrence of the "full" condition. > > - The code style "hiccups", noted by Tobias Hartmann, are gone. > > - The compile time warnings and errors are resolved. > > > > -XX:+PrintCodeHeapState vs. -Xlog:codecache=Trace > > I clearly understand and support the intention to get rid of the Print* > command line arguments. Therefore, the PrintCodeHeapState command line > argument is gone. You can request the CodeHeap state analytics with the > -Xlog:codecache=Trace (vm shutdown) or -Xlog:codecache=Debug (CodeCache > full and vm shutdown) switches. The output is directed to tty, not to the > log stream. > > > > The reason for not using the log stream is simple: UL prefixes every > line with a timestamp and the trace tags. Unfortunately, that messes up my > formatting big time. The jcmd output, on the other hand, will not have the > UL prefixes.
I would have to distinguish between UL and jcmd output when > formatting. In addition, I do not see a benefit from adding the same UL > prefix to thousands of lines. > > > > Comments are very welcome! > > > > Best Regards, > > Lutz > > > > > > > > > From lutz.schmidt at sap.com Tue Mar 20 10:25:09 2018 From: lutz.schmidt at sap.com (Schmidt, Lutz) Date: Tue, 20 Mar 2018 10:25:09 +0000 Subject: RFR(L): 8198691: CodeHeap State Analytics In-Reply-To: <2c4bac01-9775-2abc-01b6-f20bb580c257@oracle.com> References: <250CD3F0-F4AB-4EA1-9EA9-8136442FDFF9@sap.com> <1b411eee-29c8-b977-96b0-45145ebbfcf9@oracle.com> <8EB0096D-1020-4E86-82E1-2D7D564601AE@sap.com> <14c67b9d-4e1d-cb49-b29a-0d4cd02fdcc9@oracle.com> <2c4bac01-9775-2abc-01b6-f20bb580c257@oracle.com> Message-ID: <201E8551-1024-4BF2-996D-3387E6C1D52A@sap.com> Hi Tobias, thank you! If you haven't started yet, you may want to hold off on testing for a moment. I will remove the comments Vladimir and you complained about and update the webrev. It's comments only, but you never know... Thanks, Lutz On 20.03.18, 10:46, "Tobias Hartmann" wrote: Hi Lutz, very nice work! Thanks for incorporating the requested changes. I think you can remove the commented LogStream code. I'll re-run the tests that failed with the last webrev. Best regards, Tobias On 19.03.2018 17:00, Schmidt, Lutz wrote: > Dear all, > > this is the next (second) iteration of my CodeHeap State Analytics effort. It reflects all the comments and suggestions I received on my initial RFR (sent out on March 1st). Please read on to learn what was changed and what kept as is. > > May I please request reviews for > > Bug: https://bugs.openjdk.java.net/browse/JDK-8198691 > Webrev: http://cr.openjdk.java.net/~lucy/webrevs/8198691.01/ > > Instead of keeping the long tail of comments and responses, I decided to provide a summary of what happened.
> - Most of the new code was moved to new files: share/code/codeHeapState.cpp and share/code/codeHeapState.hpp > - I have added, as requested, an abbreviated version of the "General Description" chapter to codeHeapState.cpp > - In case of SegmentedCodeCache, the iteration is limited to FOR_ALL_ALLOCABLE_HEAPS(). There were issues in aot tests when using FOR_ALL_HEAPS(). > - All references to the RFE id should be gone. > - In share/runtime/java.cpp, the call to CompileBroker::print_heapinfo() now is close to "PrintCodeCache" for both, product and nonproduct cases. > - The edited/updated documentation is available as an attachment to the bug (in PDF format). > - I added code to share/code/codeCache.cpp (report_codemem_full()) to print the CodeHeap state for the first occurrence of the "full" condition. > - The code style "hiccups", noted by Tobias Hartmann, are gone. > - The compile time warnings and errors are resolved. > > -XX:+PrintCodeHeapState vs. -Xlog:codecache=Trace > I clearly understand and support the intention to get rid of the Print* command line arguments. Therefore, the PrintCodeHeapState command line argument is gone. You can request the CodeHeap state analytics with the -Xlog:codecache=Trace (vm shutdown) or -Xlog:codecache=Debug (CodeCache full and vm shutdown) switches. The output is directed to tty, not to the log stream. > > The reason for not using the log stream is simple: UL prefixes every line with a timestamp and the trace tags. Unfortunately, that messes up my formatting big time. The jcmd output, on the other hand, will not have the UL prefixes. I would have to distinguish between UL and jcmd output when formatting. In addition, I do not see a benefit from adding the same UL prefix to thousands of lines. > > Comments are very welcome!
> > Best Regards, > Lutz > > > > From nils.eliasson at oracle.com Tue Mar 20 11:49:03 2018 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Tue, 20 Mar 2018 12:49:03 +0100 Subject: 8193935: RFR(S): Illegal countedLoops transformation In-Reply-To: References: <609b86db-1b0c-9fba-f7c4-0e22764ce205@oracle.com> Message-ID: <6e239d9d-282f-769d-2f32-c1126b658175@oracle.com> Hi Vladimir, On 2018-03-19 22:31, Vladimir Kozlov wrote: > Hi Nils, > > Do I miss something in explanation? The check you added should be true > for all normal loops: Yes, I need to add some proper documentation for this. > > if (limit_t->_hi > incr_t->_hi) { > return false; // limit might be a value that incr never can reach > > For example: > > for (int i = 0; i < 10; i++) {} This would be: i1 <- 0; loop: i2 <- phi(0 , i3) i3 <- add(i2, 1) cmp (i3, 10) jlt loop incr_t is the type for i3 - IntType[0, 10] limit_t is IntType[10, 10] My case is something like: void testMethod(int[] array) { int sum = 0; int i = 0; while (true) { sum += array[i]; i++; i = i & 0x7fff; } } In this case incr_t is the type of i at the end of the loop (or before the loop phi at the start), which is IntType[0,0x7fff]. If array has at least 0x8000 elements, this code should never terminate. If array is shorter, it will get an AIOOB. The counted loop transformation turns this into: int j = 0; while (true) { sum += array[j]; j++; } Best regards, Nils > > 10 > 1 and you will not convert this loop into counted. > > Thanks, > Vladimir > > On 3/19/18 2:17 AM, Nils Eliasson wrote: >> Hi, >> >> This bug was found in mpegaudio hiding behind the loop predication. >> The Counted loop transformation may lose a significant truncation >> which changes the behaviour of the program. >> >> CountedLoopNode::match_incr_with_optional_truncation finds the >> truncation Op_AndI(0x7fff) and sets trunc_t = TypeInt::CHAR. However >> the program does not use it for a char truncation, but for accessing an >> array as a circular buffer. (Any other mask would have hidden this >> problem since char truncation is the only one matched).
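A toy restatement of the bail-out, using the ranges quoted above, may help: the simple loop passes (the limit's upper bound 10 is not greater than the increment's upper bound 10), while the masked counter compared against an array length is rejected. IntRange is a hypothetical stand-in that is far simpler than C2's integer types, and the [0, Integer.MAX_VALUE] range for an array length is an assumption for illustration.

```java
// Toy model only: IntRange is hypothetical and much simpler than C2's TypeInt.
class LimitReachability {
    static final class IntRange {
        final int lo;
        final int hi;

        IntRange(int lo, int hi) {
            this.lo = lo;
            this.hi = hi;
        }
    }

    // Same comparison as the added bail-out: if the limit's upper bound exceeds the largest
    // value the (possibly truncated) increment can take, the exit test might never succeed.
    static boolean limitMayBeUnreachable(IntRange incr, IntRange limit) {
        return limit.hi > incr.hi;
    }

    public static void main(String[] args) {
        // for (int i = 0; i < 10; i++): incr_t [0, 10], limit_t [10, 10] -> still convertible
        System.out.println(limitMayBeUnreachable(new IntRange(0, 10), new IntRange(10, 10)));
        // i = (i + 1) & 0x7fff compared against an array length assumed to span [0, max int]
        System.out.println(limitMayBeUnreachable(new IntRange(0, 0x7fff),
                                                 new IntRange(0, Integer.MAX_VALUE)));
    }
}
```

The first call prints false and the second prints true, matching the two cases discussed in the thread.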
(Any other mask would have hidden this >> problem since char truncation is the only one matched.) >> >> A loop is successfully matched as a counted loop, and when the trip >> counter is cloned it drops the truncation. In the intended char case >> that is OK. In this case, however, the truncation prevents the program from >> hitting an AIOOBE. >> >> In the general case, if a truncated loop counter is compared to an >> array length (or any variable), it must be provable that the array >> length is less than the truncation; only then can the truncation be >> omitted. If the array length can be longer, the exit may never be >> taken (the loop may never terminate), and a counted loop transform >> cannot be performed. >> >> One additional topic of discussion is whether we really want to do counted >> loop transformations with a RangeCheck as the exit point, especially if >> the profiling shows that the RangeCheck never fails. In the loop that >> fails there are multiple exits, many of which are RangeChecks. >> >> For additional optimization opportunities, we could consider rotating >> the loop until a normal compare is the loop exit condition. >> >> >> Image of significant parts of the node graph (the entire loop, with its >> multiple exits, is omitted): >> http://cr.openjdk.java.net/~neliasso/8193935/mpegaudio.png >> >> bug: https://bugs.openjdk.java.net/browse/JDK-8193935 >> >> webrev: http://cr.openjdk.java.net/~neliasso/8193935/webrev.01 >> >> >> Please review, >> >> Nils Eliasson >> From rwestrel at redhat.com Tue Mar 20 12:20:07 2018 From: rwestrel at redhat.com (Roland Westrelin) Date: Tue, 20 Mar 2018 13:20:07 +0100 Subject: RFR(XS): 8196294: LoopStripMiningIterShortLoop is set to zero by default In-Reply-To: References: Message-ID: Hi Tobias, > looks good to me but I would prefer using the 'getUintxVMFlag' WhiteBox API method in the test (see > test/lib/sun/hotspot/WhiteBox.java). Thanks for looking at this. What about: http://cr.openjdk.java.net/~roland/8196294/webrev.02/ ? Roland.
From tobias.hartmann at oracle.com Tue Mar 20 12:29:37 2018 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 20 Mar 2018 13:29:37 +0100 Subject: RFR(XS): 8196294: LoopStripMiningIterShortLoop is set to zero by default In-Reply-To: References: Message-ID: Hi Roland, Looks a bit overly complicated. I would just go with something like this:

    /*
     * @test
     * @bug 8196294
     * @summary when loop strip mining is enabled, LoopStripMiningIterShortLoop should not be zero
     * @library /test/lib /
     * @build sun.hotspot.WhiteBox
     * @run driver ClassFileInstaller sun.hotspot.WhiteBox sun.hotspot.WhiteBox$WhiteBoxPermission
     * @run main/othervm -Xbootclasspath/a:. -XX:+UnlockDiagnosticVMOptions -XX:+WhiteBoxAPI
     *                   -XX:+UseG1GC compiler.loopstripmining.CheckLoopStripMiningIterShortLoop
     */

    package compiler.loopstripmining;

    import sun.hotspot.WhiteBox;

    public class CheckLoopStripMiningIterShortLoop {
        private static final WhiteBox WHITE_BOX = WhiteBox.getWhiteBox();

        public static void main(String[] args) throws Exception {
            long iterShort = WHITE_BOX.getUintxVMFlag("LoopStripMiningIterShortLoop");
            if (iterShort <= 0) {
                throw new RuntimeException("LoopStripMiningIterShortLoop is not set");
            }
        }
    }

Best regards, Tobias On 20.03.2018 13:20, Roland Westrelin wrote: > > Hi Tobias, > >> looks good to me but I would prefer using the 'getUintxVMFlag' WhiteBox API method in the test (see >> test/lib/sun/hotspot/WhiteBox.java). > > Thanks for looking at this. > What about: > > http://cr.openjdk.java.net/~roland/8196294/webrev.02/ > > ? > > Roland. > From rwestrel at redhat.com Tue Mar 20 13:41:11 2018 From: rwestrel at redhat.com (Roland Westrelin) Date: Tue, 20 Mar 2018 14:41:11 +0100 Subject: RFR(XS): 8196294: LoopStripMiningIterShortLoop is set to zero by default In-Reply-To: References: Message-ID: > Looks a bit overly complicated. I would just go with something like this: The problem then is that if the jtreg tests are run with non-default arguments, the test might fail. Roland.
From tobias.hartmann at oracle.com Tue Mar 20 14:03:11 2018 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 20 Mar 2018 15:03:11 +0100 Subject: RFR(XS): 8196294: LoopStripMiningIterShortLoop is set to zero by default In-Reply-To: References: Message-ID: On 20.03.2018 14:41, Roland Westrelin wrote: > The problem then is that if the jtreg tests are run with non default > arguments, the test might fail. Right. For simplicity, it's probably best to go with webrev.01 then. Thanks, Tobias From tobias.hartmann at oracle.com Tue Mar 20 14:45:15 2018 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 20 Mar 2018 15:45:15 +0100 Subject: RFR(L): 8198691: CodeHeap State Analytics In-Reply-To: <201E8551-1024-4BF2-996D-3387E6C1D52A@sap.com> References: <250CD3F0-F4AB-4EA1-9EA9-8136442FDFF9@sap.com> <1b411eee-29c8-b977-96b0-45145ebbfcf9@oracle.com> <8EB0096D-1020-4E86-82E1-2D7D564601AE@sap.com> <14c67b9d-4e1d-cb49-b29a-0d4cd02fdcc9@oracle.com> <2c4bac01-9775-2abc-01b6-f20bb580c257@oracle.com> <201E8551-1024-4BF2-996D-3387E6C1D52A@sap.com> Message-ID: Hi Lutz, I've already started testing with -Xlog:codecache=Debug and found a problem: The following tests compiler/whitebox/AllocationCodeBlobTest.java compiler/codecache/OverflowCodeCacheTest.java compiler/codecache/stress/ReturnBlobToWrongHeapTest.java compiler/codecache/stress/RandomAllocationTest.java compiler/profiling/spectrapredefineclass_classloaders/Launcher.java compiler/profiling/spectrapredefineclass/Launcher.java fail with # fatal error: acquiring lock CodeHeapStateAnalytics_lock/6 out of order with lock tty_lock/0 -- possible deadlock Let me know if you need more information to reproduce! Thanks, Tobias On 20.03.2018 11:25, Schmidt, Lutz wrote: > Hi Tobias, > > thank you! If you haven't started yet, you may want to wait with testing a moment. I will remove the comments Vladimir and you complained about and update the webrev. It's comments only, but you never know... 
> > Thanks, > Lutz > > On 20.03.18, 10:46, "Tobias Hartmann" wrote: > > Hi Lutz, > > very nice work! Thanks for incorporating the requested changes. I think you can remove the commented > LogStream code. > > I'll re-run the tests that failed with the last webrev. > > Best regards, > Tobias > > On 19.03.2018 17:00, Schmidt, Lutz wrote: > > Dear all, > > > > this is the next (second) iteration of my CodeHeap State Analytics effort. It reflects all the comments and suggestions I received on my initial RFR (sent out on March 1st). Please read on to learn what was changed and what was kept as is. > > > > May I please request reviews for > > > > Bug: https://bugs.openjdk.java.net/browse/JDK-8198691 > > Webrev: http://cr.openjdk.java.net/~lucy/webrevs/8198691.01/ > > > > Instead of keeping the long tail of comments and responses, I decided to provide a summary of what happened. > > - Most of the new code was moved to new files: share/code/codeHeapState.cpp and share/code/codeHeapState.hpp > > - I have added, as requested, an abbreviated version of the "General Description" chapter to codeHeapState.cpp > > - In case of SegmentedCodeCache, the iteration is limited to FOR_ALL_ALLOCABLE_HEAPS(). There were issues in aot tests when using FOR_ALL_HEAPS(). > > - All references to the RFE id should be gone. > > - In share/runtime/java.cpp, the call to CompileBroker::print_heapinfo() now is close to "PrintCodeCache" for both the product and non-product cases. > > - The edited/updated documentation is available as an attachment to the bug (in PDF format). > > - I added code to share/code/codeCache.cpp (report_codemem_full()) to print the CodeHeap state for the first occurrence of the "full" condition. > > - The code style "hiccups", noted by Tobias Hartmann, are gone. > > - The compile time warnings and errors are resolved. > > > > -XX:+PrintCodeHeapState vs. -Xlog:codecache=Trace > > I clearly understand and support the intention to get rid of the Print* command line arguments.
Therefore, the PrintCodeHeapState command line argument is gone. You can request the CodeHeap state analytics with the -Xlog:codecache=Trace (vm shutdown) or -Xlog:codecache=Debug (CodeCache full and vm shutdown) switches. The output is directed to tty, not to the log stream. > > > > The reason for not using the log stream is simple: UL prefixes every line with a timestamp and the trace tags. Unfortunately, that messes up my formatting big time. The jcmd output, on the other hand, will not have the UL prefixes. I would have to distinguish between UL and jcmd output when formatting. In addition, I do not see a benefit from adding the same UL prefix to thousands of lines. > > > > Comments are very welcome! > > > > Best Regards, > > Lutz > > > > > > > > > > From rwestrel at redhat.com Tue Mar 20 14:46:22 2018 From: rwestrel at redhat.com (Roland Westrelin) Date: Tue, 20 Mar 2018 15:46:22 +0100 Subject: RFR(XS): 8197931: Null pointer dereference in Unique_Node_List::push of node.hpp:1510 Message-ID: http://cr.openjdk.java.net/~roland/8197931/webrev.00/ I need confirmation that this does fix the problem as I can't test it myself. Roland. From dean.long at oracle.com Tue Mar 20 18:13:57 2018 From: dean.long at oracle.com (dean.long at oracle.com) Date: Tue, 20 Mar 2018 11:13:57 -0700 Subject: RFR(XS): 8197931: Null pointer dereference in Unique_Node_List::push of node.hpp:1510 In-Reply-To: References: Message-ID: I'll test it for you. dl On 3/20/18 7:46 AM, Roland Westrelin wrote: > http://cr.openjdk.java.net/~roland/8197931/webrev.00/ > > I need confirmation that this does fix the problem as I can't test it > myself. > > Roland. 
From lutz.schmidt at sap.com Tue Mar 20 18:29:47 2018 From: lutz.schmidt at sap.com (Schmidt, Lutz) Date: Tue, 20 Mar 2018 18:29:47 +0000 Subject: RFR(L): 8198691: CodeHeap State Analytics In-Reply-To: References: <250CD3F0-F4AB-4EA1-9EA9-8136442FDFF9@sap.com> <1b411eee-29c8-b977-96b0-45145ebbfcf9@oracle.com> <8EB0096D-1020-4E86-82E1-2D7D564601AE@sap.com> <14c67b9d-4e1d-cb49-b29a-0d4cd02fdcc9@oracle.com> <2c4bac01-9775-2abc-01b6-f20bb580c257@oracle.com> <201E8551-1024-4BF2-996D-3387E6C1D52A@sap.com> Message-ID: Hi Tobias, thank you for uncovering this. In CodeCache::report_codemem_full() I overlooked that the tty lock is already held at the place where I inserted the call to CompileBroker::print_heapinfo(). That bug triggered some thoughts, resulting in a question or two: Given the complex output of CompileBroker::print_heapinfo(), what would be the OpenJDK approach to tty locking? Should I do "micro locking", keeping together only small blocks? That's what is implemented now. Or should I lock tty before each call to a print function (like print_usedSpace, print_freeSpace, ...)? Either approach has its advantages, so I'm more or less neutral. What do you all think? Depending on what the community prefers, I will move the locks accordingly. Thanks, Lutz On 20.03.18, 15:45, "Tobias Hartmann" wrote: Hi Lutz, I've already started testing with -Xlog:codecache=Debug and found a problem: The following tests
compiler/whitebox/AllocationCodeBlobTest.java
compiler/codecache/OverflowCodeCacheTest.java
compiler/codecache/stress/ReturnBlobToWrongHeapTest.java
compiler/codecache/stress/RandomAllocationTest.java
compiler/profiling/spectrapredefineclass_classloaders/Launcher.java
compiler/profiling/spectrapredefineclass/Launcher.java
fail with
# fatal error: acquiring lock CodeHeapStateAnalytics_lock/6 out of order with lock tty_lock/0 -- possible deadlock
Let me know if you need more information to reproduce!
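The fatal error Tobias reports comes from the VM's lock-ordering check. The model below is a rough one, and an assumption read off the error text: HotSpot's actual Mutex rank checking has additional special cases. In this model, a thread may only acquire a lock whose rank is strictly lower than every lock it already holds. Holding tty_lock (rank 0) and then requesting CodeHeapStateAnalytics_lock (rank 6) therefore violates the rule. `RankedLock` and `LockTracker` are hypothetical names.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical lock descriptor: a name and a rank, as in the error message.
struct RankedLock {
  std::string name;
  int rank;
};

// Simplified model of rank-ordered locking: a new lock must have a rank
// strictly lower than every lock the thread already holds.
class LockTracker {
 public:
  // Returns false (instead of aborting like the VM) if acquiring `lock`
  // would violate the ordering rule; otherwise records it as held.
  bool try_acquire(const RankedLock& lock) {
    for (const RankedLock* held : held_) {
      if (held->rank <= lock.rank) {
        std::fprintf(stderr,
                     "acquiring lock %s/%d out of order with lock %s/%d -- possible deadlock\n",
                     lock.name.c_str(), lock.rank, held->name.c_str(), held->rank);
        return false;
      }
    }
    held_.push_back(&lock);
    return true;
  }

  // Locks are released in LIFO order in this model.
  void release(const RankedLock& lock) {
    assert(!held_.empty() && held_.back() == &lock);
    held_.pop_back();
  }

 private:
  std::vector<const RankedLock*> held_;
};
```

In this model the reverse order is accepted: CodeHeapStateAnalytics_lock first, then tty_lock. The problem is specifically calling into the analytics code while tty_lock is already held.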
Thanks, Tobias On 20.03.2018 11:25, Schmidt, Lutz wrote: > Hi Tobias, > > thank you! If you haven't started yet, you may want to wait with testing a moment. I will remove the comments Vladimir and you complained about and update the webrev. It's comments only, but you never know... > > Thanks, > Lutz > > On 20.03.18, 10:46, "Tobias Hartmann" wrote: > > Hi Lutz, > > very nice work! Thanks for incorporating the requested changes. I think you can remove the commented > LogStream code. > > I'll re-run the tests that failed with the last webrev. > > Best regards, > Tobias > > On 19.03.2018 17:00, Schmidt, Lutz wrote: > > Dear all, > > > > this is the next (second) iteration of my CodeHeap State Analytics effort. It reflects all the comments and suggestions I received on my initial RFR (sent out on March 1st). Please read on to learn what was changed and what kept as is. > > > > May I please request reviews for > > > > Bug: https://bugs.openjdk.java.net/browse/JDK-8198691 > > Webrev: http://cr.openjdk.java.net/~lucy/webrevs/8198691.01/ > > > > Instead of keeping the long tail of comments and responses, I decided to provide a summary of what happened. > > - Most of the new code was moved to new files: share/code/codeHeapState.cpp and share/code/codeHeapState.hpp > > - I have added, as requested, an abbreviated version of the "General Description" chapter to codeHeapState.cpp > > - In case of SegmentedCodeCache, the iteration is limited to FOR_ALL_ALLOCABLE_HEAPS(). There were issues in aot tests when using FOR_ALL_HEAPS(). > > - All references to the RFE id should be gone. > > - In share/runtime/java.cpp, the call to CompileBroker::print_heapinfo() now is close to "PrintCodeCache" for both, product and nonproduct cases. > > - The edited/updated documentation is available as an attachment to the bug (in PDF format). > > - I added code to share/code/codeCache.cpp (report_codemem_full()) to print the CodeHeap state for the first occurrence of the "full" condition. 
> > - The code style "hiccups", noted by Tobias Hartmann, are gone. > > - The compile time warnings and errors are resolved. > > > > -XX:+PrintCodeHeapState vs. -Xlog:codecache=Trace > > I clearly understand and support the intention to get rid of the Print* command line arguments. Therefore, the PrintCodeHeapState command line argument is gone. You can request the CodeHeap state analytics with the -Xlog:codecache=Trace (vm shutdown) or -Xlog:codecache=Debug (CodeCache full and vm shutdown) switches. The output is directed to tty, not to the log stream. > > > > The reason for not using the log stream is simple: UL prefixes every line with a timestamp and the trace tags. Unfortunately, that messes up my formatting big time. The jcmd output, on the other hand, will not have the UL prefixes. I would have to distinguish between UL and jcmd output when formatting. In addition, I do not see a benefit from adding the same UL prefix to thousands of lines. > > > > Comments are very welcome! > > > > Best Regards, > > Lutz > > > > > > > > > > From vladimir.kozlov at oracle.com Tue Mar 20 18:42:13 2018 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 20 Mar 2018 11:42:13 -0700 Subject: RFR(L): 8198691: CodeHeap State Analytics In-Reply-To: References: <250CD3F0-F4AB-4EA1-9EA9-8136442FDFF9@sap.com> <1b411eee-29c8-b977-96b0-45145ebbfcf9@oracle.com> <8EB0096D-1020-4E86-82E1-2D7D564601AE@sap.com> <14c67b9d-4e1d-cb49-b29a-0d4cd02fdcc9@oracle.com> <2c4bac01-9775-2abc-01b6-f20bb580c257@oracle.com> <201E8551-1024-4BF2-996D-3387E6C1D52A@sap.com> Message-ID: <1ed9df48-cf5e-8755-f0f5-c25ed0efe8fd@oracle.com> I think you should follow what we do with CodeCache::print_summary(): http://hg.openjdk.java.net/jdk/hs/file/74db2b7cec75/src/hotspot/share/code/codeCache.cpp#l1359 First, print into a local stringStream buffer, and then lock tty while printing that buffer. Thanks, Vladimir On 3/20/18 11:29 AM, Schmidt, Lutz wrote: > Hi Tobias, > > thank you for uncovering this.
In CodeCache::report_codemem_full() I overlooked that the tty lock is already held at the place where I inserted the call to CompileBroker::print_heapinfo(). > > That bug triggered some thoughts, resulting in a question or two: > > Given the complex output of CompileBroker::print_heapinfo(), what would be the OpenJDK approach to tty locking? > > Should I do "micro locking", keeping together only small blocks? That's what is implemented now. > Or should I lock tty before each call to a print function (like print_usedSpace, print_freeSpace, ...)? > > Either approach has its advantages, so I'm more or less neutral. What do you all think? > > Depending on what the community prefers, I will move the locks accordingly. > > Thanks, > Lutz > > > On 20.03.18, 15:45, "Tobias Hartmann" wrote: > > Hi Lutz, > > I've already started testing with -Xlog:codecache=Debug and found a problem: > > The following tests > compiler/whitebox/AllocationCodeBlobTest.java > compiler/codecache/OverflowCodeCacheTest.java > compiler/codecache/stress/ReturnBlobToWrongHeapTest.java > compiler/codecache/stress/RandomAllocationTest.java > compiler/profiling/spectrapredefineclass_classloaders/Launcher.java > compiler/profiling/spectrapredefineclass/Launcher.java > > fail with > # fatal error: acquiring lock CodeHeapStateAnalytics_lock/6 out of order with lock tty_lock/0 -- > possible deadlock > > Let me know if you need more information to reproduce! > > Thanks, > Tobias > > On 20.03.2018 11:25, Schmidt, Lutz wrote: > > Hi Tobias, > > > > thank you! If you haven't started yet, you may want to wait with testing a moment. I will remove the comments Vladimir and you complained about and update the webrev. It's comments only, but you never know... > > > > Thanks, > > Lutz > > > > On 20.03.18, 10:46, "Tobias Hartmann" wrote: > > > > Hi Lutz, > > > > very nice work! Thanks for incorporating the requested changes. I think you can remove the commented > > LogStream code.
> > > > I'll re-run the tests that failed with the last webrev. > > > > Best regards, > > Tobias > > > > On 19.03.2018 17:00, Schmidt, Lutz wrote: > > > Dear all, > > > > > > this is the next (second) iteration of my CodeHeap State Analytics effort. It reflects all the comments and suggestions I received on my initial RFR (sent out on March 1st). Please read on to learn what was changed and what kept as is. > > > > > > May I please request reviews for > > > > > > Bug: https://bugs.openjdk.java.net/browse/JDK-8198691 > > > Webrev: http://cr.openjdk.java.net/~lucy/webrevs/8198691.01/ > > > > > > Instead of keeping the long tail of comments and responses, I decided to provide a summary of what happened. > > > - Most of the new code was moved to new files: share/code/codeHeapState.cpp and share/code/codeHeapState.hpp > > > - I have added, as requested, an abbreviated version of the "General Description" chapter to codeHeapState.cpp > > > - In case of SegmentedCodeCache, the iteration is limited to FOR_ALL_ALLOCABLE_HEAPS(). There were issues in aot tests when using FOR_ALL_HEAPS(). > > > - All references to the RFE id should be gone. > > > - In share/runtime/java.cpp, the call to CompileBroker::print_heapinfo() now is close to "PrintCodeCache" for both, product and nonproduct cases. > > > - The edited/updated documentation is available as an attachment to the bug (in PDF format). > > > - I added code to share/code/codeCache.cpp (report_codemem_full()) to print the CodeHeap state for the first occurrence of the "full" condition. > > > - The code style "hickups", noted by Tobias Hartmann, are gone. > > > - The compile time warnings and errors are resolved. > > > > > > -XX:+PrintCodeHeapState vs. -Xlog:codecache=Trace > > > I clearly understand and support the intention to get rid of the Print* command line arguments. Therefore, the PrintCodeHeapState command line argument is gone. 
You can request the CodeHeap state analytics with the -Xlog:codecache=Trace (vm shutdown) or -Xlog:codecache=Debug (CodeCache full and vm shutdown) switches. The output is directed to tty, not to the log stream. > > > > > > The reason for not using the log stream is simple: UL prefixes every line with a timestamp and the trace tags. Unfortunately, that messes up my formatting big time. The jcmd output, on the other hand, will not have the UL prefixes. I would have to distinguish between UL and jcmd output when formatting. In addition, I do not see a benefit from adding the same UL prefix to thousands of lines. > > > > > > Comments are very welcome! > > > > > > Best Regards, > > > Lutz > > > > > > > > > > > > > > > > > > From lutz.schmidt at sap.com Tue Mar 20 18:57:35 2018 From: lutz.schmidt at sap.com (Schmidt, Lutz) Date: Tue, 20 Mar 2018 18:57:35 +0000 Subject: RFR(L): 8198691: CodeHeap State Analytics In-Reply-To: <1ed9df48-cf5e-8755-f0f5-c25ed0efe8fd@oracle.com> References: <250CD3F0-F4AB-4EA1-9EA9-8136442FDFF9@sap.com> <1b411eee-29c8-b977-96b0-45145ebbfcf9@oracle.com> <8EB0096D-1020-4E86-82E1-2D7D564601AE@sap.com> <14c67b9d-4e1d-cb49-b29a-0d4cd02fdcc9@oracle.com> <2c4bac01-9775-2abc-01b6-f20bb580c257@oracle.com> <201E8551-1024-4BF2-996D-3387E6C1D52A@sap.com> <1ed9df48-cf5e-8755-f0f5-c25ed0efe8fd@oracle.com> Message-ID: <0C762F33-344E-49AB-B5DC-0553BFB67533@sap.com> Hi Vladimir, I already saw that code but was a little hesitant to code the same way. Why? In my case, the stringStream buffer could become fairly large. Actual size depends on CodeHeap size and contents as well as printing parameters. If you tell me a few MB are OK, I can change my code. Thanks, Lutz On 20.03.18, 19:42, "Vladimir Kozlov" wrote: I think you should follow what we do with CodeCache::print_summary(): http://hg.openjdk.java.net/jdk/hs/file/74db2b7cec75/src/hotspot/share/code/codeCache.cpp#l1359 First, print into a local stringStream buffer, and then lock tty while printing that buffer.
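Vladimir's suggestion follows the CodeCache::print_summary() pattern. All output is formatted into a local buffer first, and the tty lock is held only for the final write. The following is a minimal sketch in standard C++, not HotSpot code. `std::ostringstream` stands in for HotSpot's stringStream, and a `std::mutex` stands in for the tty lock; both substitutions are assumptions. `print_buffered` is a hypothetical name.

```cpp
#include <cassert>
#include <mutex>
#include <ostream>
#include <sstream>

// Hypothetical stand-in for the lock guarding the shared tty.
std::mutex tty_mutex;

// Sketch of the "format first, lock briefly" pattern: everything is
// formatted into a private buffer without holding the tty lock, then the
// lock is held only while the finished text is written in one piece.
template <typename Formatter>
void print_buffered(std::ostream& out, Formatter format) {
  std::ostringstream buf;  // plays the role of a local stringStream
  format(buf);             // potentially expensive formatting, no lock held
  std::lock_guard<std::mutex> guard(tty_mutex);
  out << buf.str();        // single write while holding the lock
}
```

The sketch also shows the cost Lutz raises: `buf` holds the entire report before anything is written. Its size therefore grows with the CodeHeap size and the printing options.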
Thanks, Vladimir On 3/20/18 11:29 AM, Schmidt, Lutz wrote: > Hi Tobias, > > thank you for uncovering this. In CodeCache::report_codemem_full() I oversaw that the tty lock is held at the place I inserted the call to CompileBroker::print_heapinfo(). > > That bug triggered some thoughts in my brain, resulting in a question or two: > > Given the complex output of CompileBroker::print_heapinfo(), what would be the OpenJDK approach to tty locking? > > Should I do "micro locking", trying to keep together only small blocks? That's what is implemented now. > Should I lock tty before each call to a print function (like print_usedSpace, print_freeSpace, ...)? > > Either approach has its advantages, so I'm more or less neutral. What do you all think? > > Depending on what's in favor by the community, I will move the locks accordingly. > > Thanks, > Lutz > > > On 20.03.18, 15:45, "Tobias Hartmann" wrote: > > Hi Lutz, > > I've already started testing with -Xlog:codecache=Debug and found a problem: > > The following tests > compiler/whitebox/AllocationCodeBlobTest.java > compiler/codecache/OverflowCodeCacheTest.java > compiler/codecache/stress/ReturnBlobToWrongHeapTest.java > compiler/codecache/stress/RandomAllocationTest.java > compiler/profiling/spectrapredefineclass_classloaders/Launcher.java > compiler/profiling/spectrapredefineclass/Launcher.java > > fail with > # fatal error: acquiring lock CodeHeapStateAnalytics_lock/6 out of order with lock tty_lock/0 -- > possible deadlock > > Let me know if you need more information to reproduce! > > Thanks, > Tobias > > On 20.03.2018 11:25, Schmidt, Lutz wrote: > > Hi Tobias, > > > > thank you! If you haven't started yet, you may want to wait with testing a moment. I will remove the comments Vladimir and you complained about and update the webrev. It's comments only, but you never know... > > > > Thanks, > > Lutz > > > > On 20.03.18, 10:46, "Tobias Hartmann" wrote: > > > > Hi Lutz, > > > > very nice work! 
Thanks for incorporating the requested changes. I think you can remove the commented > > LogStream code. > > > > I'll re-run the tests that failed with the last webrev. > > > > Best regards, > > Tobias > > > > On 19.03.2018 17:00, Schmidt, Lutz wrote: > > > Dear all, > > > > > > this is the next (second) iteration of my CodeHeap State Analytics effort. It reflects all the comments and suggestions I received on my initial RFR (sent out on March 1st). Please read on to learn what was changed and what kept as is. > > > > > > May I please request reviews for > > > > > > Bug: https://bugs.openjdk.java.net/browse/JDK-8198691 > > > Webrev: http://cr.openjdk.java.net/~lucy/webrevs/8198691.01/ > > > > > > Instead of keeping the long tail of comments and responses, I decided to provide a summary of what happened. > > > - Most of the new code was moved to new files: share/code/codeHeapState.cpp and share/code/codeHeapState.hpp > > > - I have added, as requested, an abbreviated version of the "General Description" chapter to codeHeapState.cpp > > > - In case of SegmentedCodeCache, the iteration is limited to FOR_ALL_ALLOCABLE_HEAPS(). There were issues in aot tests when using FOR_ALL_HEAPS(). > > > - All references to the RFE id should be gone. > > > - In share/runtime/java.cpp, the call to CompileBroker::print_heapinfo() now is close to "PrintCodeCache" for both, product and nonproduct cases. > > > - The edited/updated documentation is available as an attachment to the bug (in PDF format). > > > - I added code to share/code/codeCache.cpp (report_codemem_full()) to print the CodeHeap state for the first occurrence of the "full" condition. > > > - The code style "hickups", noted by Tobias Hartmann, are gone. > > > - The compile time warnings and errors are resolved. > > > > > > -XX:+PrintCodeHeapState vs. -Xlog:codecache=Trace > > > I clearly understand and support the intention to get rid of the Print* command line arguments. 
Therefore, the PrintCodeHeapState command line argument is gone. You can request the CodeHeap state analytics with the -Xlog:codecache=Trace (vm shutdown) or -Xlog:codecache=Debug (CodeCache full and vm shutdown) switches. The output is directed to tty, not to the log stream. > > > > > > The reason for not using the log stream is simple: UL prefixes every line with a timestamp and the trace tags. Unfortunately, that messes up my formatting big time. The jcmd output, on the other hand, will not have the UL prefixes. I would have to distinguish between UL and jcmd output when formatting. In addition, I do not see a benefit from adding the same UL prefix to thousands of lines. > > > > > > Comments are very welcome! > > > > > > Best Regards, > > > Lutz > > > > > > > > > > > > > > > > > > From vladimir.kozlov at oracle.com Tue Mar 20 22:01:49 2018 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 20 Mar 2018 15:01:49 -0700 Subject: RFR(L): 8198691: CodeHeap State Analytics In-Reply-To: <0C762F33-344E-49AB-B5DC-0553BFB67533@sap.com> References: <250CD3F0-F4AB-4EA1-9EA9-8136442FDFF9@sap.com> <1b411eee-29c8-b977-96b0-45145ebbfcf9@oracle.com> <8EB0096D-1020-4E86-82E1-2D7D564601AE@sap.com> <14c67b9d-4e1d-cb49-b29a-0d4cd02fdcc9@oracle.com> <2c4bac01-9775-2abc-01b6-f20bb580c257@oracle.com> <201E8551-1024-4BF2-996D-3387E6C1D52A@sap.com> <1ed9df48-cf5e-8755-f0f5-c25ed0efe8fd@oracle.com> <0C762F33-344E-49AB-B5DC-0553BFB67533@sap.com> Message-ID: <65dd8816-620f-68e9-7f3b-f221aa0a43de@oracle.com> As I remember, we try to lock tty outside of the print functions. Yes, it could be troublesome if it is MBs of output, especially when you do it for the "full codecache" event while the VM is still running. You also hold CodeCache_lock in print_heapinfo(), and it would not be good to hold both locks at the same time. I think "micro locking" (with comments) in print_heapinfo() is better than taking the lock in each print function.
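Vladimir's two constraints can be combined roughly as follows. First, CodeCache_lock and the tty lock are never held at the same time. Second, each tty-locked region stays small. This is a sketch under stated assumptions. `code_cache_mutex` and `tty_mutex` stand in for the VM locks, and `BlockInfo` is a hypothetical per-block record. Copying a snapshot may itself be too costly for a large CodeHeap, which is the same size concern Lutz raises.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <ostream>
#include <sstream>
#include <vector>

// Hypothetical stand-ins for CodeCache_lock and the tty lock.
std::mutex code_cache_mutex;
std::mutex tty_mutex;

// Hypothetical per-block record collected from the heap.
struct BlockInfo {
  std::size_t size;
  bool used;
};

// Step 1: copy the data while holding only the data lock.
std::vector<BlockInfo> snapshot(const std::vector<BlockInfo>& heap) {
  std::lock_guard<std::mutex> guard(code_cache_mutex);
  return heap;  // the copy is made while the lock is held
}

// Step 2: print the snapshot in chunks of `chunk` blocks. Each chunk is
// formatted without any lock, then written while holding only the tty lock,
// which is released again before the next chunk ("micro locking").
void print_blocks(std::ostream& out, const std::vector<BlockInfo>& blocks,
                  std::size_t chunk) {
  assert(chunk > 0);
  for (std::size_t start = 0; start < blocks.size(); start += chunk) {
    std::ostringstream buf;
    for (std::size_t i = start; i < blocks.size() && i < start + chunk; ++i) {
      buf << (blocks[i].used ? "used " : "free ") << blocks[i].size << '\n';
    }
    std::lock_guard<std::mutex> guard(tty_mutex);  // held for one chunk only
    out << buf.str();
  }
}
```

The trade-off is that other threads' tty output may appear between chunks. In exchange, the memory needed for formatting is bounded by one chunk rather than the whole report.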
Vladimir On 3/20/18 11:57 AM, Schmidt, Lutz wrote: > Hi Vladimir, > I already saw that code but was a little hesitant to code the same way. Why? In my case, the stringStream buffer could become fairly large. Actual size depends on CodeHeap size and contents as well as printing parameters. If you tell me a few MB are OK, I can change my code. > Thanks, > Lutz > > On 20.03.18, 19:42, "Vladimir Kozlov" wrote: > > I think you should follow what we do with CodeCache::print_summary(): > > http://hg.openjdk.java.net/jdk/hs/file/74db2b7cec75/src/hotspot/share/code/codeCache.cpp#l1359 > > First, print into a local stringStream buffer, and then lock tty while printing that buffer. > > Thanks, > Vladimir > > On 3/20/18 11:29 AM, Schmidt, Lutz wrote: > > Hi Tobias, > > > > thank you for uncovering this. In CodeCache::report_codemem_full() I overlooked that the tty lock is already held at the place where I inserted the call to CompileBroker::print_heapinfo(). > > > > That bug triggered some thoughts, resulting in a question or two: > > > > Given the complex output of CompileBroker::print_heapinfo(), what would be the OpenJDK approach to tty locking? > > > > Should I do "micro locking", keeping together only small blocks? That's what is implemented now. > > Or should I lock tty before each call to a print function (like print_usedSpace, print_freeSpace, ...)? > > > > Either approach has its advantages, so I'm more or less neutral. What do you all think? > > > > Depending on what the community prefers, I will move the locks accordingly.
> > > > Thanks, > > Lutz > > > > > > On 20.03.18, 15:45, "Tobias Hartmann" wrote: > > > > Hi Lutz, > > > > I've already started testing with -Xlog:codecache=Debug and found a problem: > > > > The following tests > > compiler/whitebox/AllocationCodeBlobTest.java > > compiler/codecache/OverflowCodeCacheTest.java > > compiler/codecache/stress/ReturnBlobToWrongHeapTest.java > > compiler/codecache/stress/RandomAllocationTest.java > > compiler/profiling/spectrapredefineclass_classloaders/Launcher.java > > compiler/profiling/spectrapredefineclass/Launcher.java > > > > fail with > > # fatal error: acquiring lock CodeHeapStateAnalytics_lock/6 out of order with lock tty_lock/0 -- > > possible deadlock > > > > Let me know if you need more information to reproduce! > > > > Thanks, > > Tobias > > > > On 20.03.2018 11:25, Schmidt, Lutz wrote: > > > Hi Tobias, > > > > > > thank you! If you haven't started yet, you may want to wait with testing a moment. I will remove the comments Vladimir and you complained about and update the webrev. It's comments only, but you never know... > > > > > > Thanks, > > > Lutz > > > > > > On 20.03.18, 10:46, "Tobias Hartmann" wrote: > > > > > > Hi Lutz, > > > > > > very nice work! Thanks for incorporating the requested changes. I think you can remove the commented > > > LogStream code. > > > > > > I'll re-run the tests that failed with the last webrev. > > > > > > Best regards, > > > Tobias > > > > > > On 19.03.2018 17:00, Schmidt, Lutz wrote: > > > > Dear all, > > > > > > > > this is the next (second) iteration of my CodeHeap State Analytics effort. It reflects all the comments and suggestions I received on my initial RFR (sent out on March 1st). Please read on to learn what was changed and what kept as is. 
> > > > > > > > May I please request reviews for > > > > > > > > Bug: https://bugs.openjdk.java.net/browse/JDK-8198691 > > > > Webrev: http://cr.openjdk.java.net/~lucy/webrevs/8198691.01/ > > > > > > > > Instead of keeping the long tail of comments and responses, I decided to provide a summary of what happened. > > > > - Most of the new code was moved to new files: share/code/codeHeapState.cpp and share/code/codeHeapState.hpp > > > > - I have added, as requested, an abbreviated version of the "General Description" chapter to codeHeapState.cpp > > > > - In case of SegmentedCodeCache, the iteration is limited to FOR_ALL_ALLOCABLE_HEAPS(). There were issues in aot tests when using FOR_ALL_HEAPS(). > > > > - All references to the RFE id should be gone. > > > > - In share/runtime/java.cpp, the call to CompileBroker::print_heapinfo() now is close to "PrintCodeCache" for both, product and nonproduct cases. > > > > - The edited/updated documentation is available as an attachment to the bug (in PDF format). > > > > - I added code to share/code/codeCache.cpp (report_codemem_full()) to print the CodeHeap state for the first occurrence of the "full" condition. > > > > - The code style "hickups", noted by Tobias Hartmann, are gone. > > > > - The compile time warnings and errors are resolved. > > > > > > > > -XX:+PrintCodeHeapState vs. -Xlog:codecache=Trace > > > > I clearly understand and support the intention to get rid of the Print* command line arguments. Therefore, the PrintCodeHeapState command line argument is gone. You can request the CodeHeap state analytics with the -Xlog:codecache=Trace (vm shutdown) or -Xlog:codecache=Debug (CodeCache full and vm shutdown) switches. The output is directed to tty, not to the log stream. > > > > > > > > The reason for not using the log stream is simple: UL prefixes every line with a timestamp and the trace tags. Unfortunately, that messes up my formatting big time. The jcmd output, on the other hand, will not have the UL prefixes. 
I would have to distinguish between UL and jcmd output when formatting. In addition, I do not see a benefit from adding the same UL prefix to thousands of lines. > > > > > > > > Comments are very welcome! > > > > > > > > Best Regards, > > > > Lutz > > > > > > > > > > > > > > > > > > > > > > > > > > > > From vladimir.kozlov at oracle.com Tue Mar 20 23:36:27 2018 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 20 Mar 2018 16:36:27 -0700 Subject: RFR(S): 8199784: PhaseIdealLoop::place_near_use() might return wrong control with loop strip mining In-Reply-To: References: Message-ID: I agree with the changes. But can you explain why you need the additional check useblock == u_loop->_head? Since x (the clone) comes from outside the loop, it should be safe to place it outside the OuterStripMinedLoop. Thanks, Vladimir On 3/20/18 2:25 AM, Roland Westrelin wrote: > > http://cr.openjdk.java.net/~roland/8199784/webrev.00/ > > When a Load is sunk out of a loop, if one of the uses is a Phi of a > strip mined CountedLoop, PhaseIdealLoop::place_near_use() should return > the entry control of the OuterStripMinedLoop. > > I couldn't write a test case that would trigger this and I'm not > sure it's even possible. We're seeing this with some graph patterns that > are specific to Shenandoah but I think it's better to have this fixed > anyway. > > Roland. > From vladimir.kozlov at oracle.com Wed Mar 21 00:30:41 2018 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 20 Mar 2018 17:30:41 -0700 Subject: 8193935: RFR(S): Illegal countedLoops transformation In-Reply-To: <6e239d9d-282f-769d-2f32-c1126b658175@oracle.com> References: <609b86db-1b0c-9fba-f7c4-0e22764ce205@oracle.com> <6e239d9d-282f-769d-2f32-c1126b658175@oracle.com> Message-ID: <8b320e0f-b990-22cb-43d9-13870fe279c2@oracle.com> Okay. My mistake was that I thought incr_t is the type of 'stride' (in the example it is '1'). The change seems good then. Can you submit performance testing to make sure we did not miss something?
This is very performance-sensitive code. Thanks, Vladimir On 3/20/18 4:49 AM, Nils Eliasson wrote: > Hi Vladimir, > > > On 2018-03-19 22:31, Vladimir Kozlov wrote: >> Hi Nils, >> >> Am I missing something in the explanation? The check you added should be true for all normal loops: > > Yes, I need to add some proper documentation for this. > >> >> if (limit_t->_hi > incr_t->_hi) { >> return false; // limit might be a value that incr never can reach >> >> For example: >> >> for (int i = 0; i < 10; i++) {} > > This would be: > > i1 <- 0; > loop: > i2 <- phi(0 , i3) > i3 <- add(i2, 1) > cmp (i3, 10) > jlt loop > > incr_t is the type for i3 - IntType[0, 10] > limit_t is IntType[10, 10] > > My case is something like: > > void testMethod(int[] array) { > int i = 0; > while (true) { > sum += array[i]; > i++; > i = i & 0x7fff; > } > } > > In this case incr_t is the type of i at the end of the loop (or before the loop phi at the start), > which is IntType[0,0x7fff] > > If array is at least 0x8000 elements long this code should never terminate. If array is shorter, it will get an > AIOOB. > > Counted loops transform this into: > > int j = 0; > while (true) { > sum += array[j]; > j++; > } > > Best regards, > Nils > >> >> 10 > 1 and you will not convert this loop into counted. >> >> Thanks, >> Vladimir >> >> On 3/19/18 2:17 AM, Nils Eliasson wrote: >>> Hi, >>> >>> This bug was found in mpegaudio hiding behind the loop predication. The counted loop >>> transformation may lose a significant truncation, which changes the behaviour of the program. >>> >>> CountedLoopNode::match_incr_with_optional_truncation finds the truncation Op_AndI(0x7fff) and >>> sets trunc_t = TypeInt::CHAR. However, the program does not use it for a char truncation, but for >>> accessing an array as a circular buffer. (Any other mask would have hidden this problem since >>> char truncation is the only one matched).
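The masked counter in the testMethod example can be checked in isolation. The standalone sketch below shows that i = (i + 1) & 0x7fff wraps to 0 and never reaches 0x8000. That is why, once the truncation is kept, an exit test against a longer array.length is never taken, while the unmasked counter keeps growing. MaskedCounterDemo and its methods are hypothetical names:

```java
// Toy illustration of the loop shape from 8193935 (not C2 code): the masked
// counter never exceeds 0x7fff, so an exit test against a larger limit
// (for example an array.length of 0x8000 or more) is never taken.
class MaskedCounterDemo {
    static final int MASK = 0x7fff;

    // Value of i after `steps` iterations of: i++; i = i & MASK;
    static int maskedIndexAfter(int steps) {
        int i = 0;
        for (int s = 0; s < steps; s++) {
            i = (i + 1) & MASK;
        }
        return i;
    }

    // Largest value the masked counter reaches within `steps` iterations.
    static int maxMaskedIndex(int steps) {
        int i = 0;
        int max = 0;
        for (int s = 0; s < steps; s++) {
            i = (i + 1) & MASK;
            max = Math.max(max, i);
        }
        return max;
    }

    public static void main(String[] args) {
        // After 0x8000 increments the counter has wrapped back to 0.
        System.out.println(maskedIndexAfter(0x8000)); // 0
        // However long it runs, it tops out at 0x7fff.
        System.out.println(maxMaskedIndex(1 << 20));   // 32767
        // Without the mask (what the illegal transformation effectively
        // produced), the counter runs past 0x7fff.
        int unmasked = 0;
        for (int s = 0; s < 0x8000; s++) {
            unmasked++;
        }
        System.out.println(unmasked);                  // 32768
    }
}
```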
>>> >>> A loop is successfully matched as a counted loop, and when the trip counter is cloned it drops the >>> truncation. In the intended char case that is ok. In this case, however, the truncation prevents the >>> program from hitting an AIOOB. >>> >>> In the general case, if a truncated loop counter is compared to an array length (or any variable) >>> it must be provable that the array length is less than the truncation, and then the truncation >>> can be omitted. If the array length can be longer, the exit may never be taken - the loop may >>> never terminate, and a counted loop transform cannot be performed. >>> >>> One additional topic of discussion is whether we really want to do counted loop transformations with a >>> RangeCheck as the exit point, especially if the profiling shows that the RangeCheck never fails. In >>> the loop that fails there are multiple exits, many of which are RangeChecks. >>> >>> For additional optimization opportunities we could consider rotating the loop until a normal >>> compare is the loop exit condition. >>> >>> >>> Image of significant parts of node graph (the entire loop, with its multiple exits, is omitted): >>> http://cr.openjdk.java.net/~neliasso/8193935/mpegaudio.png >>> >>> bug: https://bugs.openjdk.java.net/browse/JDK-8193935 >>> >>> webrev: http://cr.openjdk.java.net/~neliasso/8193935/webrev.01 >>> >>> >>> Please review, >>> >>> Nils Eliasson >>> > From vladimir.kozlov at oracle.com Wed Mar 21 00:33:59 2018 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 20 Mar 2018 17:33:59 -0700 Subject: RFR(XS): 8196294: LoopStripMiningIterShortLoop is set to zero by default In-Reply-To: References: Message-ID: +1 Vladimir On 3/20/18 7:03 AM, Tobias Hartmann wrote: > > On 20.03.2018 14:41, Roland Westrelin wrote: >> The problem then is that if the jtreg tests are run with non-default >> arguments, the test might fail. > > Right. For simplicity, it's probably best to go with webrev.01 then.
> > Thanks, > Tobias > From vladimir.kozlov at oracle.com Wed Mar 21 01:15:22 2018 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 20 Mar 2018 18:15:22 -0700 Subject: [11] RFR(XS) 8199896: [Graal] build Graal on all x86 platforms Message-ID: <53d0725a-ad8e-7d52-8fc5-b0c674fcbcd8@oracle.com> https://bugs.openjdk.java.net/browse/JDK-8199896 Extend Graal build to all x64 platforms. diff -r 7fa5375fa6fd make/autoconf/hotspot.m4 --- a/make/autoconf/hotspot.m4 +++ b/make/autoconf/hotspot.m4 @@ -347,11 +347,10 @@ fi INCLUDE_GRAAL="true" else - # By default enable graal build on linux-x64 or where AOT is available. + # By default enable graal build on x64 or where AOT is available. # graal build requires jvmci. if test "x$JVM_FEATURES_jvmci" = "xjvmci" && \ - (test "x$OPENJDK_TARGET_CPU" = "xx86_64" && \ - test "x$OPENJDK_TARGET_OS" = "xlinux" || \ + (test "x$OPENJDK_TARGET_CPU" = "xx86_64" || \ test "x$ENABLE_AOT" = "xtrue") ; then AC_MSG_RESULT([yes]) JVM_FEATURES_graal="graal" Thanks, Vladimir From rwestrel at redhat.com Wed Mar 21 09:18:57 2018 From: rwestrel at redhat.com (Roland Westrelin) Date: Wed, 21 Mar 2018 10:18:57 +0100 Subject: RFR(S): 8199784: PhaseIdealLoop::place_near_use() might return wrong control with loop strip mining In-Reply-To: References: Message-ID: > I agree with changes. But can you explain why you need additional check useblock == u_loop->_head? > Since x (clone) comes from outside loop it should be safe place it outside OuterStripMinedLoop. The outer strip mined loop is only a skeleton. It has no Phi. They are added to match those of the counted loop only after loop opts. If the use of the load is a counted loop Phi, once the outer loop phis are added, we want the load to be an input of one of those Phis. So the load must not be sunk between the counted loop and the outer strip mined loop. if (useblock == u_loop->_head && u_loop->_head->is_OuterStripMinedLoop()) { tests for that case. Roland. 
From nils.eliasson at oracle.com Wed Mar 21 15:00:47 2018 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Wed, 21 Mar 2018 16:00:47 +0100 Subject: 8193935: RFR(S): Illegal countedLoops transformation In-Reply-To: <8b320e0f-b990-22cb-43d9-13870fe279c2@oracle.com> References: <609b86db-1b0c-9fba-f7c4-0e22764ce205@oracle.com> <6e239d9d-282f-769d-2f32-c1126b658175@oracle.com> <8b320e0f-b990-22cb-43d9-13870fe279c2@oracle.com> Message-ID: <8504b46a-66ac-d568-843d-19579cbfb6fb@oracle.com> I decided to make it a bit safer. Since this only happens when there is truncation, I guard the new check so that it is only applied when there is truncation. I also added some comments. http://cr.openjdk.java.net/~neliasso/8193935/webrev.03 I will be running this patch together with the 8192992 fix through some extensive testing. Regards, Nils On 2018-03-21 01:30, Vladimir Kozlov wrote: > Okay. My mistake was that I thought incr_t is the type of 'stride' (in > the example it is '1'). The change seems good then. > > Can you submit performance testing to make sure we did not miss > something? This is very performance-sensitive code. > > Thanks, > Vladimir > > On 3/20/18 4:49 AM, Nils Eliasson wrote: >> Hi Vladimir, >> >> >> On 2018-03-19 22:31, Vladimir Kozlov wrote: >>> Hi Nils, >>> >>> Am I missing something in the explanation? The check you added should be >>> true for all normal loops: >> >> Yes, I need to add some proper documentation for this.
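The guarded condition Nils describes can be sketched with plain integer ranges. This is a hypothetical model: Range stands in for C2's TypeInt, and mayBecomeCountedLoop is not a real C2 function. It only shows the limit_t->_hi > incr_t->_hi test being applied when the increment is truncated, using the two loops discussed in this thread:

```java
// Toy model (not C2 code) of the guarded exit-limit check: only when the
// increment is truncated must the limit's upper bound be reachable by the
// truncated counter.
class TruncationGuardDemo {
    // Closed integer range [lo, hi], standing in for a C2 TypeInt.
    static final class Range {
        final long lo;
        final long hi;
        Range(long lo, long hi) { this.lo = lo; this.hi = hi; }
    }

    static boolean mayBecomeCountedLoop(boolean truncated, Range incr, Range limit) {
        if (truncated && limit.hi > incr.hi) {
            return false; // limit might be a value that the truncated incr never reaches
        }
        return true; // all other counted-loop checks are omitted in this sketch
    }

    public static void main(String[] args) {
        // for (int i = 0; i < 10; i++): incr in [0, 10], limit exactly 10, no mask.
        System.out.println(mayBecomeCountedLoop(false, new Range(0, 10), new Range(10, 10))); // true
        // Masked counter from the bug: incr in [0, 0x7fff], limit = array.length
        // (upper bound assumed here to be Integer.MAX_VALUE).
        System.out.println(mayBecomeCountedLoop(true, new Range(0, 0x7fff),
                new Range(0, Integer.MAX_VALUE))); // false
        // Masked counter whose limit is known to stay below the mask.
        System.out.println(mayBecomeCountedLoop(true, new Range(0, 0x7fff),
                new Range(0, 100))); // true
    }
}
```

Guarding on truncation keeps the ordinary, untruncated loops on the existing path, which matches the "a bit safer" intent stated in the email.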
>> >>> >>> if (limit_t->_hi > incr_t->_hi) { >>> return false; // limit might be a value that incr never can reach >>> >>> For example: >>> >>> for (int i = 0; i < 10; i++) {} >> >> This would be: >> >> i1 <- 0; >> loop: >> i2 <- phi(0 , i3) >> i3 <- add(i2, 1) >> cmp (i3, 10) >> jlt loop >> >> incr_t is the type for i3 - IntType[0, 10] >> limit_t is IntType[10, 10] >> >> My case is something like: >> >> void testMethod(int[] array) { >> int i = 0; >> while (true) { >> sum += array[i]; >> i++; >> i = i & 0x7fff; >> } >> } >> >> In this case incr_t is the type of i at the end of the loop (or >> before the loop phi at the start), which is IntType[0,0x7fff] >> >> If array is at least 0x8000 elements long this code should never terminate. If >> array is shorter, it will get an AIOOB. >> >> Counted loops transform this into: >> >> int j = 0; >> while (true) { >> sum += array[j]; >> j++; >> } >> >> Best regards, >> Nils >> >>> >>> 10 > 1 and you will not convert this loop into counted. >>> >>> Thanks, >>> Vladimir >>> >>> On 3/19/18 2:17 AM, Nils Eliasson wrote: >>>> Hi, >>>> >>>> This bug was found in mpegaudio hiding behind the loop predication. >>>> The counted loop transformation may lose a significant truncation, >>>> which changes the behaviour of the program. >>>> >>>> CountedLoopNode::match_incr_with_optional_truncation finds the >>>> truncation Op_AndI(0x7fff) and sets trunc_t = TypeInt::CHAR. >>>> However, the program does not use it for a char truncation, but for >>>> accessing an array as a circular buffer. (Any other mask would have >>>> hidden this problem since char truncation is the only one matched). >>>> >>>> A loop is successfully matched as a counted loop, and when the trip >>>> counter is cloned it drops the truncation. In the intended >>>> char case that is ok. In this case, however, the truncation prevents the >>>> program from hitting an AIOOB.
>>>> >>>> In the general case, if a truncated loop counter is compared to an >>>> array length (or any variable) it must be provable that the array >>>> length is less than the truncation, and then the truncation can be >>>> omitted. If the array length can be longer, the exit may never be >>>> taken - the loop may never terminate, and a counted loop transform >>>> cannot be performed. >>>> >>>> One additional topic of discussion is whether we really want to do >>>> counted loop transformations with a RangeCheck as the exit point, >>>> especially if the profiling shows that the RangeCheck never fails. >>>> In the loop that fails there are multiple exits, many of which are >>>> RangeChecks. >>>> >>>> For additional optimization opportunities we could consider >>>> rotating the loop until a normal compare is the loop exit condition. >>>> >>>> >>>> Image of significant parts of node graph (the entire loop, with its >>>> multiple exits, is omitted): >>>> http://cr.openjdk.java.net/~neliasso/8193935/mpegaudio.png >>>> >>>> bug: https://bugs.openjdk.java.net/browse/JDK-8193935 >>>> >>>> webrev: http://cr.openjdk.java.net/~neliasso/8193935/webrev.01 >>>> >>>> >>>> Please review, >>>> >>>> Nils Eliasson >>>> >> From nils.eliasson at oracle.com Wed Mar 21 15:28:25 2018 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Wed, 21 Mar 2018 16:28:25 +0100 Subject: RFR(S): 8192992: Test8007294.java failed: attempted to spill a non-spillable item In-Reply-To: <1a7f06b3-bffe-cd1e-3142-0167443dbae4@oracle.com> References: <9249e5ae-3043-da1d-80de-fcecdabdfad2@oracle.com> <2121dc39-1f4f-d428-1de5-3e62b90e75c4@oracle.com> <14a9a030-15b0-aab3-d840-3f12b80db275@oracle.com> <1a7f06b3-bffe-cd1e-3142-0167443dbae4@oracle.com> Message-ID: <46d72d36-3a9e-1c5b-3606-724d7e9e1bb1@oracle.com> This is the new solution: http://cr.openjdk.java.net/~neliasso/8192992/webrev.05 I am removing some code I found redundant. In the original code, when finding a phi we visit the phi's predecessors.
The problem is that we will always find the node we visited before the phi. And since no extra checks are done, this will be the new LCA. If that node is the same as the original load's memory, the LCA will always be the early block. Firstly, I remove the iteration over the phi's predecessors. If the phi is in a loop, we must follow edges around; just checking the immediate predecessors isn't enough. There might be a store further away. It works only because we always raise the LCA; we don't have to traverse the preds just to do it anyway. I add a check to see if the phi belongs to a loop. This is the only case where we can find phis in the memory flow that have predecessors that affect us. The second change is to just raise the LCA above the Phi's block. That is enough: the load is then placed before the loop. http://cr.openjdk.java.net/~neliasso/8192992/webrev.05 I will do extensive testing of this. Regards, Nils On 2018-03-16 16:37, Nils Eliasson wrote: > Hi Vladimir, > > I managed to smash right into JDK-6843752 "missing code for an > anti-dependent Phi in GCM". > > Thanks for adding that test :) > > I'll be back next week with a solution. > > Regards, > > // Nils > > > On 2018-03-14 18:42, Vladimir Kozlov wrote: >> Very good! Thank you for doing this analysis. >> Please run our usual mach5 set of tests. >> >> Thanks, >> Vladimir >> >> On 3/14/18 3:17 AM, Nils Eliasson wrote: >>> Hi Vladimir, >>> >>> After taking a very close look I found that the anti-dependency >>> checking that hoists the testN_mem_reg from the jmpCon is broken, >>> and that the hoisting is unnecessary. So this is not a case where we >>> need anti-dependency checking for loads before matching. >>> >>> Generally the insert_anti_dependences looks good, except the >>> store->is_Phi() clause that is full of holes (overly conservative).
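Raising the LCA above the block of a loop-carried memory Phi can be illustrated on a toy dominator tree. In this sketch, blocks are integers with an immediate-dominator array. Assuming the Phi sits in the loop header, taking the LCA with the Phi block's immediate dominator places the load before the loop. RaiseLcaDemo is a hypothetical name; C2's PhaseCFG uses its own block and dominator structures:

```java
// Toy dominator tree (not PhaseCFG): blocks are ints, idom[b] is b's immediate
// dominator, and the root is its own immediate dominator.
class RaiseLcaDemo {
    final int[] idom;
    final int[] depth;

    RaiseLcaDemo(int[] idom) {
        this.idom = idom;
        this.depth = new int[idom.length];
        for (int b = 0; b < idom.length; b++) {
            depth[b] = computeDepth(b);
        }
    }

    private int computeDepth(int b) {
        int d = 0;
        while (idom[b] != b) {
            b = idom[b];
            d++;
        }
        return d;
    }

    // Lowest common ancestor of two blocks in the dominator tree.
    int lca(int a, int b) {
        while (depth[a] > depth[b]) a = idom[a];
        while (depth[b] > depth[a]) b = idom[b];
        while (a != b) {
            a = idom[a];
            b = idom[b];
        }
        return a;
    }

    // Raise the current LCA so that it dominates the loop whose header holds
    // the memory Phi: take the LCA with the header's immediate dominator.
    int raiseAbovePhiBlock(int currentLca, int phiBlock) {
        return lca(currentLca, idom[phiBlock]);
    }

    public static void main(String[] args) {
        // 0 = entry, 1 = loop preheader, 2 = loop header (holds the memory Phi),
        // 3 = loop body, 4 = block holding the load's uses.
        RaiseLcaDemo cfg = new RaiseLcaDemo(new int[] {0, 0, 1, 2, 3});
        System.out.println(cfg.raiseAbovePhiBlock(4, 2)); // 1 (the preheader)
    }
}
```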
>>> I don't think I fully understand how the graph looks when the clause >>> is needed, but it tries to find stores upwards that is otherwise >>> unreachable from the downward memory flow search. >>> >>> I found these three flaws: >>> >>> 1) A Phi in a block that is preceded by a store - even though the >>> store is dominated by the loads LCA it will force the testN up! We >>> don't check where the stores are located. >>> >>> 2) A Phi that consumes the same memory as the load may force it up, >>> even though no stores are involved. >>> >>> 3) A Phi that consumes a mergemem, which in itself has already has >>> been processed and passed as irrelevant, may force the testN up. >>> >>> One could add that any predecessor to the phi would have to be a >>> store/call to affect the load placement. >>> >>> I have also added some additional debugging printouts to the patch. >>> >>> http://cr.openjdk.java.net/~neliasso/8192992/webrev.03/ >>> https://bugs.openjdk.java.net/browse/JDK-8192992 >>> >>> Regards, >>> >>> // Nils >>> >>> >>> >>> On 2018-03-05 18:02, Vladimir Kozlov wrote: >>>> Hi Nils, >>>> >>>> Yes, it is legal workaround but this way you removed all subsuming >>>> loads in code. >>>> >>>> Should we do anti-dependency check for loads during matching when >>>> shared nodes are marked?: >>>> >>>> http://hg.openjdk.java.net/jdk/hs/file/4e82736053ae/src/hotspot/share/opto/matcher.cpp#l2159 >>>> >>>> >>>> How expensive would be that? >>>> >>>> Vladimir >>>> >>>> On 3/5/18 7:50 AM, Nils Eliasson wrote: >>>>> Hi, >>>>> >>>>> This patch is a workaround for a scheduling problem encountered in >>>>> some rare circumstances. Instead of hitting the assert we retry >>>>> the compilation without subsuming loads. >>>>> >>>>> To quote Tobias: >>>>> >>>>> "The crash happens because a testN_mem_reg0 (CmpN(LoadN(mem), >>>>> NULL)) is scheduled in a different block than its jmpCon user and >>>>> the register allocator tries to spill the flag register. 
The >>>>> problem is that PhaseCFG::schedule_late() detects an >>>>> anti-dependency for the testN_mem_reg0 on a bottom memory Phi and >>>>> therefore raises the LCA to the early block (see >>>>> PhaseCFG::insert_anti_dependences()) which is "far away" from its >>>>> jmpCon user. " >>>>> >>>>> Thanks to Roland who suggested the workaround. >>>>> >>>>> https://bugs.openjdk.java.net/browse/JDK-8192992 >>>>> >>>>> http://cr.openjdk.java.net/~neliasso/8192992/webrev.01/ >>>>> >>>>> Regards, >>>>> >>>>> Nils >>>>> >>> > From vladimir.kozlov at oracle.com Wed Mar 21 15:42:16 2018 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 21 Mar 2018 08:42:16 -0700 Subject: 8193935: RFR(S): Illegal countedLoops transformation In-Reply-To: <8504b46a-66ac-d568-843d-19579cbfb6fb@oracle.com> References: <609b86db-1b0c-9fba-f7c4-0e22764ce205@oracle.com> <6e239d9d-282f-769d-2f32-c1126b658175@oracle.com> <8b320e0f-b990-22cb-43d9-13870fe279c2@oracle.com> <8504b46a-66ac-d568-843d-19579cbfb6fb@oracle.com> Message-ID: <50089DAC-FB53-4182-A22A-540CD6040103@oracle.com> Good. Thanks, Vladimir > On Mar 21, 2018, at 8:00 AM, Nils Eliasson wrote: > > I decided to make it a little more safe. Since this only happens when there is truncation, I guard the new check so it is only applied when there is truncation. > > Also added some comments. > > http://cr.openjdk.java.net/~neliasso/8193935/webrev.03 > > I will be running this patch together with the 8192992 fix through some extensive testing. > > Regards, > Nils > > On 2018-03-21 01:30, Vladimir Kozlov wrote: >> Okay. My mistake was that I thought incr_t is type of 'stride' (in example it is '1'). Change seems good then. >> >> Can you submit performance testing to make sure we did not miss something? This is very performance sensitive code. 
>> >> Thanks, >> Vladimir >> >> On 3/20/18 4:49 AM, Nils Eliasson wrote: >>> Hi Vladimir, >>> >>> >>> On 2018-03-19 22:31, Vladimir Kozlov wrote: >>>> Hi Nils, >>>> >>>> Am I missing something in the explanation? The check you added should be true for all normal loops: >>> >>> Yes, I need to add some proper documentation for this. >>> >>>> >>>> if (limit_t->_hi > incr_t->_hi) { >>>> return false; // limit might be a value that incr never can reach >>>> >>>> For example: >>>> >>>> for (int i = 0; i < 10; i++) {} >>> >>> This would be: >>> >>> i1 <- 0; >>> loop: >>> i2 <- phi(0 , i3) >>> i3 <- add(i2, 1) >>> cmp (i3, 10) >>> jlt loop >>> >>> incr_t is the type for i3 - IntType[0, 10] >>> limit_t is IntType[10, 10] >>> >>> My case is something like: >>> >>> void testMethod(int[] array) { >>> int i = 0; >>> while (true) { >>> sum += array[i]; >>> i++; >>> i = i & 0x7fff; >>> } >>> } >>> >>> In this case incr_t is the type of i at the end of the loop (or before the loop phi at the start), which is IntType[0,0x7fff] >>> >>> If array is at least 0x8000 elements long this code should never terminate. If array is shorter, it will get an AIOOB. >>> >>> Counted loops transform this into: >>> >>> int j = 0; >>> while (true) { >>> sum += array[j]; >>> j++; >>> } >>> >>> Best regards, >>> Nils >>> >>>> >>>> 10 > 1 and you will not convert this loop into counted. >>>> >>>> Thanks, >>>> Vladimir >>>> >>>> On 3/19/18 2:17 AM, Nils Eliasson wrote: >>>>> Hi, >>>>> >>>>> This bug was found in mpegaudio hiding behind the loop predication. The counted loop transformation may lose a significant truncation, which changes the behaviour of the program. >>>>> >>>>> CountedLoopNode::match_incr_with_optional_truncation finds the truncation Op_AndI(0x7fff) and sets trunc_t = TypeInt::CHAR. However, the program does not use it for a char truncation, but for accessing an array as a circular buffer. (Any other mask would have hidden this problem since char truncation is the only one matched).
>>>>> >>>>> A loop is successfully matched as a counted loop, and when the trip counter is cloned it drops the truncation. In the intended char case that is ok. In this case, however, the truncation prevents the program from hitting an AIOOB. >>>>> >>>>> In the general case, if a truncated loop counter is compared to an array length (or any variable) it must be provable that the array length is less than the truncation, and then the truncation can be omitted. If the array length can be longer, the exit may never be taken - the loop may never terminate, and a counted loop transform cannot be performed. >>>>> >>>>> One additional topic of discussion is whether we really want to do counted loop transformations with a RangeCheck as the exit point, especially if the profiling shows that the RangeCheck never fails. In the loop that fails there are multiple exits, many of which are RangeChecks. >>>>> >>>>> For additional optimization opportunities we could consider rotating the loop until a normal compare is the loop exit condition.
>>>>> >>>>> >>>>> Image of significant parts of node graph (the entire loop, with its multiple exits, is omitted): http://cr.openjdk.java.net/~neliasso/8193935/mpegaudio.png >>>>> >>>>> bug: https://bugs.openjdk.java.net/browse/JDK-8193935 >>>>> >>>>> webrev: http://cr.openjdk.java.net/~neliasso/8193935/webrev.01 >>>>> >>>>> >>>>> Please review, >>>>> >>>>> Nils Eliasson >>>>> >>> > From vladimir.kozlov at oracle.com Wed Mar 21 16:12:27 2018 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 21 Mar 2018 09:12:27 -0700 Subject: RFR(S): 8192992: Test8007294.java failed: attempted to spill a non-spillable item In-Reply-To: <46d72d36-3a9e-1c5b-3606-724d7e9e1bb1@oracle.com> References: <9249e5ae-3043-da1d-80de-fcecdabdfad2@oracle.com> <2121dc39-1f4f-d428-1de5-3e62b90e75c4@oracle.com> <14a9a030-15b0-aab3-d840-3f12b80db275@oracle.com> <1a7f06b3-bffe-cd1e-3142-0167443dbae4@oracle.com> <46d72d36-3a9e-1c5b-3606-724d7e9e1bb1@oracle.com> Message-ID: <45B9FE31-1191-4CB5-92BE-1BFF2260B9CB@oracle.com> I understand that if the Phi is in a loop you want to place the load above the loop. Do I understand correctly that for regular Phis we don't need to traverse their inputs because we will visit those inputs anyway when traversing the graph? Thanks, Vladimir > On Mar 21, 2018, at 8:28 AM, Nils Eliasson wrote: > > This is the new solution: > > http://cr.openjdk.java.net/~neliasso/8192992/webrev.05 > > I am removing some code I found redundant. > > In the original code, when finding a phi we visit the phi's predecessors. The problem is that we will always find the node we visited before the phi. And since no extra checks are done, this will be the new LCA. If that node is the same as the original load's memory, the LCA will always be the early block. > > Firstly, I remove the iteration over the phi's predecessors. If the phi is in a loop, we must follow edges around; just checking the immediate predecessors isn't enough. There might be a store further away.
It works only because we always raise the LCA; we don't have to traverse the preds just to do it anyway. > > I add a check to see if the phi belongs to a loop. This is the only case where we can find phis in the memory flow that have predecessors that affect us. > > The second change is to just raise the LCA above the Phi's block. That is enough: the load is then placed before the loop. > > http://cr.openjdk.java.net/~neliasso/8192992/webrev.05 > > I will do extensive testing of this. > > Regards, > Nils > > > > > On 2018-03-16 16:37, Nils Eliasson wrote: >> Hi Vladimir, >> >> I managed to smash right into JDK-6843752 "missing code for an anti-dependent Phi in GCM". >> >> Thanks for adding that test :) >> >> I'll be back next week with a solution. >> >> Regards, >> >> // Nils >> >> >> On 2018-03-14 18:42, Vladimir Kozlov wrote: >>> Very good! Thank you for doing this analysis. >>> Please run our usual mach5 set of tests. >>> >>> Thanks, >>> Vladimir >>> >>> On 3/14/18 3:17 AM, Nils Eliasson wrote: >>>> Hi Vladimir, >>>> >>>> After taking a very close look I found that the anti-dependency checking that hoists the testN_mem_reg from the jmpCon is broken, and that the hoisting is unnecessary. So this is not a case where we need anti-dependency checking for loads before matching. >>>> >>>> Generally the insert_anti_dependences looks good, except the store->is_Phi() clause that is full of holes (overly conservative). I don't think I fully understand how the graph looks when the clause is needed, but it tries to find stores upwards that are otherwise unreachable from the downward memory flow search. >>>> >>>> I found these three flaws: >>>> >>>> 1) A Phi in a block that is preceded by a store - even though the store is dominated by the load's LCA it will force the testN up! We don't check where the stores are located. >>>> >>>> 2) A Phi that consumes the same memory as the load may force it up, even though no stores are involved.
>>>> >>>> 3) A Phi that consumes a mergemem, which in itself has already has been processed and passed as irrelevant, may force the testN up. >>>> >>>> One could add that any predecessor to the phi would have to be a store/call to affect the load placement. >>>> >>>> I have also added some additional debugging printouts to the patch. >>>> >>>> http://cr.openjdk.java.net/~neliasso/8192992/webrev.03/ >>>> https://bugs.openjdk.java.net/browse/JDK-8192992 >>>> >>>> Regards, >>>> >>>> // Nils >>>> >>>> >>>> >>>> On 2018-03-05 18:02, Vladimir Kozlov wrote: >>>>> Hi Nils, >>>>> >>>>> Yes, it is legal workaround but this way you removed all subsuming loads in code. >>>>> >>>>> Should we do anti-dependency check for loads during matching when shared nodes are marked?: >>>>> >>>>> http://hg.openjdk.java.net/jdk/hs/file/4e82736053ae/src/hotspot/share/opto/matcher.cpp#l2159 >>>>> >>>>> How expensive would be that? >>>>> >>>>> Vladimir >>>>> >>>>> On 3/5/18 7:50 AM, Nils Eliasson wrote: >>>>>> Hi, >>>>>> >>>>>> This patch is a workaround for a scheduling problem encountered in some rare circumstances. Instead of hitting the assert we retry the compilation without subsuming loads. >>>>>> >>>>>> To quote Tobias: >>>>>> >>>>>> "The crash happens because a testN_mem_reg0 (CmpN(LoadN(mem), NULL)) is scheduled in a different block than its jmpCon user and the register allocator tries to spill the flag register. The problem is that PhaseCFG::schedule_late() detects an anti-dependency for the testN_mem_reg0 on a bottom memory Phi and therefore raises the LCA to the early block (see PhaseCFG::insert_anti_dependences()) which is "far away" from its jmpCon user. " >>>>>> >>>>>> Thanks to Roland who suggested the workaround. 
>>>>>> >>>>>> https://bugs.openjdk.java.net/browse/JDK-8192992 >>>>>> >>>>>> http://cr.openjdk.java.net/~neliasso/8192992/webrev.01/ >>>>>> >>>>>> Regards, >>>>>> >>>>>> Nils >>>>>> >>>> >> > From vladimir.kozlov at oracle.com Wed Mar 21 16:52:28 2018 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 21 Mar 2018 09:52:28 -0700 Subject: RFR(S): 8199784: PhaseIdealLoop::place_near_use() might return wrong control with loop strip mining In-Reply-To: References: Message-ID: I am asking why not just do the following: + if (u_loop->_child) { + if (u_loop->_head->is_OuterStripMinedLoop()) { + return u_loop->_head->in(LoopNode::EntryControl); Vladimir > On Mar 21, 2018, at 2:18 AM, Roland Westrelin wrote: > > >> I agree with the changes. But can you explain why you need the additional check useblock == u_loop->_head? >> Since x (the clone) comes from outside the loop, it should be safe to place it outside the OuterStripMinedLoop. > > The outer strip mined loop is only a skeleton. It has no Phis. They are > added to match those of the counted loop only after loop opts. If the > use of the load is a counted loop Phi, once the outer loop phis are > added, we want the load to be an input of one of those Phis. So the load > must not be sunk between the counted loop and the outer strip mined > loop. > > if (useblock == u_loop->_head && u_loop->_head->is_OuterStripMinedLoop()) { > > tests for that case. > > Roland. From rwestrel at redhat.com Wed Mar 21 16:53:10 2018 From: rwestrel at redhat.com (Roland Westrelin) Date: Wed, 21 Mar 2018 17:53:10 +0100 Subject: RFR(XS): 8197931: Null pointer dereference in Unique_Node_List::push of node.hpp:1510 In-Reply-To: References: Message-ID: > I'll test it for you. Thanks. Roland.
From rwestrel at redhat.com Wed Mar 21 17:01:56 2018 From: rwestrel at redhat.com (Roland Westrelin) Date: Wed, 21 Mar 2018 18:01:56 +0100 Subject: RFR(S): 8199784: PhaseIdealLoop::place_near_use() might return wrong control with loop strip mining In-Reply-To: References: Message-ID: > I am asking why not just do the following: > + if (u_loop->_child) { > + if (u_loop->_head->is_OuterStripMinedLoop()) { > + return u_loop->_head->in(LoopNode::EntryControl); That would cover the case where the load is moved from inside the strip mined counted loop to the outer loop, that is, after the counted loop end. Roland. From vladimir.kozlov at oracle.com Wed Mar 21 17:46:45 2018 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 21 Mar 2018 10:46:45 -0700 Subject: RFR(S): 8199784: PhaseIdealLoop::place_near_use() might return wrong control with loop strip mining In-Reply-To: <75CED442-A1C6-4458-8E75-A79E81918CD5@oracle.com> References: Message-ID: <75CED442-A1C6-4458-8E75-A79E81918CD5@oracle.com> Okay. I thought you would have only a Safepoint in the outer loop, and it would be filtered out in the caller. But thinking again, you can have other nodes sunk there from the strip mined inner loop, which could be users, or some Phi nodes. On the other hand, you are missing the case where the user is in a second strip mined loop that follows the first. Anyway, your code is conservative and good. Reviewed. It has to be tested more than what submit_hs provides. I will start testing. Thanks, Vladimir > On Mar 21, 2018, at 10:01 AM, Roland Westrelin wrote: > > >> I am asking why not just do the following: >> + if (u_loop->_child) { >> + if (u_loop->_head->is_OuterStripMinedLoop()) { >> + return u_loop->_head->in(LoopNode::EntryControl); > > That would cover the case where the load is moved from inside the strip > mined counted loop to the outer loop, that is, after the counted > loop end. > > Roland.
From rwestrel at redhat.com Wed Mar 21 19:24:56 2018 From: rwestrel at redhat.com (Roland Westrelin) Date: Wed, 21 Mar 2018 20:24:56 +0100 Subject: RFR(XS): 8196294: LoopStripMiningIterShortLoop is set to zero by default In-Reply-To: References: Message-ID: Thanks for the reviews, Aleksey, Tobias and Vladimir. Roland. From dean.long at oracle.com Wed Mar 21 19:37:54 2018 From: dean.long at oracle.com (dean.long at oracle.com) Date: Wed, 21 Mar 2018 12:37:54 -0700 Subject: RFR(XS): 8197931: Null pointer dereference in Unique_Node_List::push of node.hpp:1510 In-Reply-To: References: Message-ID: <8c14035d-686b-d753-6753-4ad3c21b3430@oracle.com> Testing is done. Your fix is good. On 3/21/18 9:53 AM, Roland Westrelin wrote: >> I'll test it for you. > Thanks. > > Roland. From ekaterina.pavlova at oracle.com Wed Mar 21 19:56:46 2018 From: ekaterina.pavlova at oracle.com (Ekaterina Pavlova) Date: Wed, 21 Mar 2018 12:56:46 -0700 Subject: RFR (XXS) : 8200071: test/hotspot/jtreg/ProblemList-graal.txt Message-ID: <4e97b6fe-db03-45a0-d7a2-f26cba8aabe3@oracle.com> Hi all, please review this small change, which fixes a misprint in test/hotspot/jtreg/ProblemList-graal.txt. It also deletes one test record for a bug which has been fixed and adds a new one for a recently filed issue. JBS: https://bugs.openjdk.java.net/browse/JDK-8200071 webrev: http://cr.openjdk.java.net/~epavlova//8200071/webrev.00/index.html Thanks, -katya p.s. Igor Ignatyev volunteered to sponsor this change. From igor.ignatyev at oracle.com Wed Mar 21 20:21:40 2018 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Wed, 21 Mar 2018 13:21:40 -0700 Subject: RFR (XXS) : 8200071: test/hotspot/jtreg/ProblemList-graal.txt In-Reply-To: <4e97b6fe-db03-45a0-d7a2-f26cba8aabe3@oracle.com> References: <4e97b6fe-db03-45a0-d7a2-f26cba8aabe3@oracle.com> Message-ID: the patch looks good.
Thanks, -- Igor > On Mar 21, 2018, at 12:56 PM, Ekaterina Pavlova wrote: > > Hi all, > > please review this small change which fixes misprint in test/hotspot/jtreg/ProblemList-graal.txt. > Also deleted one test record for the bug which has been fixed and added new one for recently filed new issue. > > JBS: https://bugs.openjdk.java.net/browse/JDK-8200071 > webrev: http://cr.openjdk.java.net/~epavlova//8200071/webrev.00/index.html > > Thanks, > -katya > > p.s. > Igor Ignatyev volunteered to sponsor this change. From vladimir.kozlov at oracle.com Wed Mar 21 22:36:11 2018 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 21 Mar 2018 15:36:11 -0700 Subject: RFR(S): 8198756: Limit number of compiler threads for small code cache In-Reply-To: <0597e4be52684e168ba30adc2ad84b7b@sap.com> References: <8c77470008d343f19d032de8f48f1bbb@sap.com> <0597e4be52684e168ba30adc2ad84b7b@sap.com> Message-ID: <8e054d5d-1092-c20b-a93e-d964a5d97954@oracle.com> I hijacked and changed the Subject of this RFE to implement dynamic allocation of compiler threads. I wrote a proposal in the RFE's comment. Please, look. It is still assigned to Martin ;) but we can take ownership when we finalize design. Thanks, Vladimir On 3/2/18 2:28 AM, Doerr, Martin wrote: > Hi Derek, Igor and Vladimir, > > thanks for all replies. > > I agree with that it would be good to have something like > UseDynamicNumberOfGCThreads. > > Btw. I have recently requested to activate that one by default > (JDK-8198547). > > If we can't get it for jdk11, I'd like at least to make it easier for > customers to save memory without explicitly setting CICompilerCount. > > Best regards, > > Martin > > *From:*White, Derek [mailto:Derek.White at cavium.com] > *Sent:* Freitag, 2.
März 2018 04:25 > *To:* Igor Veresov ; Doerr, Martin > > *Cc:* Vladimir Kozlov ; > hotspot-compiler-dev at openjdk.java.net > *Subject:* RE: RFR(S): 8198756: Limit number of compiler threads for > small code cache > > Hi Igor, Martin, > > Just to throw out some other user experience: > > I'm typically running on machines with 98 to 224 CPUs. It's not the case > that **every** Java app needs to use all the CPUs for compiler threads. > The JVM may not be the only JVM running on the system (Hadoop, > microservices, etc), let alone the only important app on the system. > > Historically the GC threads have been the worst offenders in this > regard. The GC thread's "scaling factor" is much higher than the > compiler thread's scaling factor. But with options like > UseDynamicNumberOfGCThreads, the GC tries to adjust the number of GC > threads to the work to be done. I think it's important that the JVM > figure out how to scale the number of compiler threads as well. > > I won't claim that Martin's scheme is the best approach, or that it > should be on by default, but unless a better solution is going into JDK > 11, I'd support this scheme as an experimental flag. FWIW. > > * Derek White, Cavium (Purveyor of fine 224 cpu systems for the > discerning developer). > > *From:*hotspot-compiler-dev > [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] *On Behalf Of > *Igor Veresov > *Sent:* Thursday, March 01, 2018 7:46 PM > *To:* Doerr, Martin > > *Cc:* Vladimir Kozlov >; > hotspot-compiler-dev at openjdk.java.net > > *Subject:* Re: RFR(S): 8198756: Limit number of compiler threads for > small code cache > > Doerr, > > I think the optimal number of compiler threads is such that it keeps the > length of the compiler queues as minimal. During startup typically the > optimal number of compiler threads is equal to the number of the CPUs, > may be even more than that considering threads are either C1 or C2 and > compiles typically happen in waves using one and then the other.
The > fact that some users see code cache filling slower with fewer threads is > just an indication of how huge their compile queues are, and this is > certainly not good for startup. The problem of resource holding is real, > since after startup we don't need that many threads (unless you're > running something that does dynamic code generation). Perhaps the > solution to all of this is having a dynamic pool of compiler threads > that could expand/shrink depending on the load (the length of the > compile queues). > > igor > > On Mar 1, 2018, at 12:31 AM, Doerr, Martin > wrote: > > Hi Igor, > > we observed that the compiler threads fill up the code cache faster > than the sweeper can clean when using a small code cache. > > This doesn't seem beneficial at all. > > Some customers try to save memory by using a very small code cache. > It's very annoying that so much memory gets wasted for such a large > number of idle compiler threads which hold their arenas etc. > > Maybe the current formula was optimized for a special scenario with > many slow cores? Maybe SPARC Niagara? > > Shouldn't such scenarios use a large code cache? Maybe much more > than 240MB? > > Best regards, > > Martin > > *From:*Igor Veresov [mailto:igor.veresov at oracle.com] > *Sent:*Donnerstag, 1. März 2018 08:05 > *To:*Vladimir Kozlov > > *Cc:*Doerr, Martin >; > hotspot-compiler-dev at openjdk.java.net > > *Subject:*Re: RFR(S): 8198756: Limit number of compiler threads for > small code cache > > I'm curious about the rationale for tying the number of thread to > the size of the code cache. Is it because you don't want them to > keep holding the space for their code buffers when they are idle? > > igor > > > > On Feb 27, 2018, at 10:19 AM, Vladimir Kozlov > > > wrote: > > Hi Doerr, > > The problem with your proposal is that we don't use scale number > of compiler threads when we have a lot of cpus (>1000 on big > "slow" machines). > By default for tiered compilation we have 240Mb for CodeCache.
> With your formula we always will have 7 threads (2 C1 and 5 C2) > which could be fine if machine has < total 32 procs/threads. But > for big machines it may be bottleneck for JIT compilation > intensive applications (and for startup when most JIT > compilations happened). > > Main motivation of current approach was to reach peak > performance (c2 compilations) as fast as possible. What we > usually observed before is large compilation queue for C2 > compilation because slow throughput of C2. It was especially > visible with tiered compilation when compilation thresholds > reached faster with first tier compiled profiling code. > > And I agree that we may have problem with number of compiler > threads at the beginning of graph (< 32 cpu threads) when the > number grows too fast: > > [Graph of 3*log2(x)*log2(log2(x))/2 omitted; marked point x: 32.0711217, y: 17.4325495] > > Maybe we should have a formula which takes into account code > cache size and number of cpu threads. > > Igor Veresov was original developer of current formula. It would > be nice to hear his opinion. > > Thanks, > Vladimir > > On 2/27/18 8:10 AM, Doerr, Martin wrote: > > Hi, > > the VM currently starts a large amount of compiler threads > on systems with many CPUs regardless of the code cache size. > > This doesn't make sense for very small code cache sizes. > > The dynamically determined number of compiler threads can be > observed by: > > jdk/bin/java -XX:ReservedCodeCacheSize=128m > -XX:+PrintFlagsFinal -version|grep CICompiler > > I suggest not to use more than 1 compiler thread per 32MB of > code cache: > > http://cr.openjdk.java.net/~mdoerr/8198756_CompilerCount/webrev.00/ > > > This seems to be conservative. > > Please review and let me know if you have a different > limitation proposal.
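The two sizing rules debated in this thread can be compared with a small C++ sketch. It makes several assumptions. The CPU-based count is taken to be an integer version of Vladimir's formula 3*log2(x)*log2(log2(x))/2. That count is split one third C1 and the rest C2, a split chosen because it reproduces his "2 C1 and 5 C2" figure. Martin's proposed cap of one compiler thread per 32MB of reserved code cache is applied optionally. The function names, the floor of two threads and the integer rounding are illustrative only and are not the code in webrev.00.

```cpp
#include <algorithm>
#include <cassert>

// floor(log2(x)) for x >= 1, integer arithmetic only.
static int log2_floor(long x) {
  int r = 0;
  while (x > 1) {
    x >>= 1;
    ++r;
  }
  return r;
}

struct CompilerCounts {
  int total;
  int c1;
  int c2;
};

// Hypothetical sketch, not HotSpot code: integer form of
// 3 * log2(cpus) * log2(log2(cpus)) / 2, optionally capped at one thread
// per 32 MB of reserved code cache, then split 1/3 C1 and the rest C2.
CompilerCounts compiler_counts(long cpus, long code_cache_mb, bool cap_by_cache) {
  int log_cpu = log2_floor(std::max(cpus, 1L));
  int loglog_cpu = log2_floor(std::max(log_cpu, 1));
  int count = std::max(log_cpu * loglog_cpu * 3 / 2, 2);
  if (cap_by_cache) {
    // Assumed floor of 2 so that each tier keeps at least one thread.
    long cache_limit = std::max(code_cache_mb / 32, 2L);
    count = static_cast<int>(std::min<long>(count, cache_limit));
  }
  int c1 = std::max(count / 3, 1);
  int c2 = std::max(count - c1, 1);
  return CompilerCounts{c1 + c2, c1, c2};
}
```

Under these assumptions, 1024 CPUs give 45 threads uncapped (15 C1, 30 C2). With the 240MB default tiered code cache mentioned above, the cap reduces this to 7 (2 C1, 5 C2), which matches Vladimir's figure. Because log2 is truncated here, 32 CPUs yield 15 rather than the roughly 17.4 on the real-valued curve.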
> > Best regards, > > Martin > From vladimir.kozlov at oracle.com Thu Mar 22 01:04:51 2018 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 21 Mar 2018 18:04:51 -0700 Subject: [11] RFR(XS) 8199896: [Graal] build Graal on all x86 platforms In-Reply-To: <53d0725a-ad8e-7d52-8fc5-b0c674fcbcd8@oracle.com> References: <53d0725a-ad8e-7d52-8fc5-b0c674fcbcd8@oracle.com> Message-ID: <18b854dc-d9a1-e1d0-7d80-2e9509cb0352@oracle.com> Forgot to CC to build-dev. Note, this change does not have effect unless you disable AOT build. We are already building Graal on all x64 platforms as part of AOT build currently. Vladimir On 3/20/18 6:15 PM, Vladimir Kozlov wrote: > https://bugs.openjdk.java.net/browse/JDK-8199896 > > Extend Graal build to all x64 platforms. > > diff -r 7fa5375fa6fd make/autoconf/hotspot.m4 > --- a/make/autoconf/hotspot.m4 > +++ b/make/autoconf/hotspot.m4 > @@ -347,11 +347,10 @@ >      fi >      INCLUDE_GRAAL="true" >    else > -    # By default enable graal build on linux-x64 or where AOT is available. > +    # By default enable graal build on x64 or where AOT is available. >      # graal build requires jvmci. >      if test "x$JVM_FEATURES_jvmci" = "xjvmci" && \ > -        (test "x$OPENJDK_TARGET_CPU" = "xx86_64" && \ > -         test "x$OPENJDK_TARGET_OS" = "xlinux" || \ > +        (test "x$OPENJDK_TARGET_CPU" = "xx86_64" || \ >          test "x$ENABLE_AOT" = "xtrue") ; then >        AC_MSG_RESULT([yes]) >        JVM_FEATURES_graal="graal" > > Thanks, > Vladimir From leonid.mesnik at oracle.com Thu Mar 22 02:50:30 2018 From: leonid.mesnik at oracle.com (Leonid Mesnik) Date: Wed, 21 Mar 2018 19:50:30 -0700 Subject: RFR(XS): 8200091: [TESTBUG] Update jittester for jdk11 Message-ID: <50E40630-3030-4CA8-AD1B-0F3C3D6391E4@oracle.com> Hi Could you please review this small fix which fix jittester make/property files to work correctly in current jdk/hs repo. Also this fix add 2 groups so aot and jit tests could be separated easily.
Tested by running "make install" and running tests for jit/aot groups (OEL only). webrev: http://cr.openjdk.java.net/~lmesnik/8200091/webrev.00/ bug: https://bugs.openjdk.java.net/browse/JDK-8200091 Leonid -------------- next part -------------- An HTML attachment was scrubbed... URL: From igor.ignatyev at oracle.com Thu Mar 22 03:32:16 2018 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Wed, 21 Mar 2018 20:32:16 -0700 Subject: RFR(XS): 8200091: [TESTBUG] Update jittester for jdk11 In-Reply-To: <50E40630-3030-4CA8-AD1B-0F3C3D6391E4@oracle.com> References: <50E40630-3030-4CA8-AD1B-0F3C3D6391E4@oracle.com> Message-ID: <99A7266A-FCE3-4E71-9A1C-6DFE04AD2E69@oracle.com> Leonid, I didn't get why System::runFinalizersOnExit has to be excluded, otherwise the fix looks good. Thanks, -- Igor > On Mar 21, 2018, at 7:50 PM, Leonid Mesnik wrote: > > Hi > > Could you please review this small fix which fix jittester make/property files to work correctly in current jdk/hs repo. > Also this fix add 2 groups so aot and jit tests could be separated easily. > > Tested by running "make install" and running tests for jit/aot groups (OEL only). > > webrev: http://cr.openjdk.java.net/~lmesnik/8200091/webrev.00/ > bug: https://bugs.openjdk.java.net/browse/JDK-8200091 > > Leonid -------------- next part -------------- An HTML attachment was scrubbed... URL:
URL: From leonid.mesnik at oracle.com Thu Mar 22 03:45:47 2018 From: leonid.mesnik at oracle.com (Leonid Mesnik) Date: Wed, 21 Mar 2018 20:45:47 -0700 Subject: RFR(XS): 8200091: [TESTBUG] Update jittester for jdk11 In-Reply-To: <99A7266A-FCE3-4E71-9A1C-6DFE04AD2E69@oracle.com> References: <50E40630-3030-4CA8-AD1B-0F3C3D6391E4@oracle.com> <99A7266A-FCE3-4E71-9A1C-6DFE04AD2E69@oracle.com> Message-ID: <966A7EA9-C0E4-4DD9-8C04-064A5C535082@oracle.com> It was deprecated and finally removed by http://hg.openjdk.java.net/jdk/jdk/rev/a6c4b85163c1 Leonid > On Mar 21, 2018, at 8:32 PM, Igor Ignatyev wrote: > > Leonid, > > I didn't get why System::runFinalizersOnExit has to be excluded, otherwise the fix looks good. > > Thanks, > -- Igor > >> On Mar 21, 2018, at 7:50 PM, Leonid Mesnik > wrote: >> >> Hi >> >> Could you please review this small fix which fix jittester make/property files to work correctly in current jdk/hs repo. >> Also this fix add 2 groups so aot and jit tests could be separated easily. >> >> Tested by running "make install" and running tests for jit/aot groups (OEL only). >> >> webrev: http://cr.openjdk.java.net/~lmesnik/8200091/webrev.00/ >> bug: https://bugs.openjdk.java.net/browse/JDK-8200091 >> >> Leonid > -------------- next part -------------- An HTML attachment was scrubbed... URL: From igor.ignatyev at oracle.com Thu Mar 22 03:50:16 2018 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Wed, 21 Mar 2018 20:50:16 -0700 Subject: RFR(XS): 8200091: [TESTBUG] Update jittester for jdk11 In-Reply-To: <966A7EA9-C0E4-4DD9-8C04-064A5C535082@oracle.com> References: <50E40630-3030-4CA8-AD1B-0F3C3D6391E4@oracle.com> <99A7266A-FCE3-4E71-9A1C-6DFE04AD2E69@oracle.com> <966A7EA9-C0E4-4DD9-8C04-064A5C535082@oracle.com> Message-ID: if it's removed you shouldn't list it at all as you are supposed to use the same/similar JDK (at least from libs point of view) for test generating as for test execution.
-- Igor > On Mar 21, 2018, at 8:45 PM, Leonid Mesnik wrote: > > It was deprecated and finally removed by > http://hg.openjdk.java.net/jdk/jdk/rev/a6c4b85163c1 > > Leonid > >> On Mar 21, 2018, at 8:32 PM, Igor Ignatyev > wrote: >> >> Leonid, >> >> I didn't get why System::runFinalizersOnExit has to be excluded, otherwise the fix looks good. >> >> Thanks, >> -- Igor >> >>> On Mar 21, 2018, at 7:50 PM, Leonid Mesnik > wrote: >>> >>> Hi >>> >>> Could you please review this small fix which fix jittester make/property files to work correctly in current jdk/hs repo. >>> Also this fix add 2 groups so aot and jit tests could be separated easily. >>> >>> Tested by running "make install" and running tests for jit/aot groups (OEL only). >>> >>> webrev: http://cr.openjdk.java.net/~lmesnik/8200091/webrev.00/ >>> bug: https://bugs.openjdk.java.net/browse/JDK-8200091 >>> >>> Leonid >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From leonid.mesnik at oracle.com Thu Mar 22 04:11:10 2018 From: leonid.mesnik at oracle.com (Leonid Mesnik) Date: Wed, 21 Mar 2018 21:11:10 -0700 Subject: RFR(XS): 8200091: [TESTBUG] Update jittester for jdk11 In-Reply-To: References: <50E40630-3030-4CA8-AD1B-0F3C3D6391E4@oracle.com> <99A7266A-FCE3-4E71-9A1C-6DFE04AD2E69@oracle.com> <966A7EA9-C0E4-4DD9-8C04-064A5C535082@oracle.com> Message-ID: <65AE33BF-52ED-4666-8F40-848FD275CCDB@oracle.com> I see. Then I just left exclude list unchanged. Leonid > On Mar 21, 2018, at 8:50 PM, Igor Ignatyev wrote: > > if it's removed you shouldn't list it at all as you are supposed to use the same/similar JDK (at least from libs point of view) for test generating as for test execution.
> -- Igor >> On Mar 21, 2018, at 8:45 PM, Leonid Mesnik > wrote: >> >> It was deprecated and finally removed by >> http://hg.openjdk.java.net/jdk/jdk/rev/a6c4b85163c1 >> >> Leonid >> >>> On Mar 21, 2018, at 8:32 PM, Igor Ignatyev > wrote: >>> >>> Leonid, >>> >>> I didn't get why System::runFinalizersOnExit has to be excluded, otherwise the fix looks good. >>> >>> Thanks, >>> -- Igor >>> >>>> On Mar 21, 2018, at 7:50 PM, Leonid Mesnik > wrote: >>>> >>>> Hi >>>> >>>> Could you please review this small fix which fix jittester make/property files to work correctly in current jdk/hs repo. >>>> Also this fix add 2 groups so aot and jit tests could be separated easily. >>>> >>>> Tested by running "make install" and running tests for jit/aot groups (OEL only). >>>> >>>> webrev: http://cr.openjdk.java.net/~lmesnik/8200091/webrev.00/ >>>> bug: https://bugs.openjdk.java.net/browse/JDK-8200091 >>>> >>>> Leonid >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From tobias.hartmann at oracle.com Thu Mar 22 08:03:21 2018 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Thu, 22 Mar 2018 09:03:21 +0100 Subject: RFR(XS): 8197931: Null pointer dereference in Unique_Node_List::push of node.hpp:1510 In-Reply-To: <8c14035d-686b-d753-6753-4ad3c21b3430@oracle.com> References: <8c14035d-686b-d753-6753-4ad3c21b3430@oracle.com> Message-ID: <4f61db3b-a74a-3ac8-4a5c-374d8259b3a3@oracle.com> Looks good to me too. Feel free to push after testing through the submit repo passed (see [1]). Best regards, Tobias [1] http://mail.openjdk.java.net/pipermail/hotspot-dev/2018-March/030656.html On 21.03.2018 20:37, dean.long at oracle.com wrote: > Testing is done. Your fix is good. > > On 3/21/18 9:53 AM, Roland Westrelin wrote: >>> I'll test it for you. >> Thanks. >> >> Roland.
> From tobias.hartmann at oracle.com Thu Mar 22 08:27:56 2018 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Thu, 22 Mar 2018 09:27:56 +0100 Subject: RFR(L): 8198691: CodeHeap State Analytics In-Reply-To: <65dd8816-620f-68e9-7f3b-f221aa0a43de@oracle.com> References: <250CD3F0-F4AB-4EA1-9EA9-8136442FDFF9@sap.com> <1b411eee-29c8-b977-96b0-45145ebbfcf9@oracle.com> <8EB0096D-1020-4E86-82E1-2D7D564601AE@sap.com> <14c67b9d-4e1d-cb49-b29a-0d4cd02fdcc9@oracle.com> <2c4bac01-9775-2abc-01b6-f20bb580c257@oracle.com> <201E8551-1024-4BF2-996D-3387E6C1D52A@sap.com> <1ed9df48-cf5e-8755-f0f5-c25ed0efe8fd@oracle.com> <0C762F33-344E-49AB-B5DC-0553BFB67533@sap.com> <65dd8816-620f-68e9-7f3b-f221aa0a43de@oracle.com> Message-ID: <509e600e-df4d-2f29-abeb-5b5ceb92647b@oracle.com> Hi Lutz, On 20.03.2018 23:01, Vladimir Kozlov wrote: > I think to have "micro locking" (with comments) in print_heapinfo() is better then to have lock in > each print function. Yes, I think so too. Best regards, Tobias From lutz.schmidt at sap.com Thu Mar 22 08:38:41 2018 From: lutz.schmidt at sap.com (Schmidt, Lutz) Date: Thu, 22 Mar 2018 08:38:41 +0000 Subject: RFR(L): 8198691: CodeHeap State Analytics In-Reply-To: <509e600e-df4d-2f29-abeb-5b5ceb92647b@oracle.com> References: <250CD3F0-F4AB-4EA1-9EA9-8136442FDFF9@sap.com> <1b411eee-29c8-b977-96b0-45145ebbfcf9@oracle.com> <8EB0096D-1020-4E86-82E1-2D7D564601AE@sap.com> <14c67b9d-4e1d-cb49-b29a-0d4cd02fdcc9@oracle.com> <2c4bac01-9775-2abc-01b6-f20bb580c257@oracle.com> <201E8551-1024-4BF2-996D-3387E6C1D52A@sap.com> <1ed9df48-cf5e-8755-f0f5-c25ed0efe8fd@oracle.com> <0C762F33-344E-49AB-B5DC-0553BFB67533@sap.com> <65dd8816-620f-68e9-7f3b-f221aa0a43de@oracle.com> <509e600e-df4d-2f29-abeb-5b5ceb92647b@oracle.com> Message-ID: <7272D938-AC93-4631-B59D-6B5BF55DFD1C@sap.com> Hi Tobias, Vladimir, I have a new webrev available, but have to request some more patience. Our nightly tests did not run last night. 
I don't want to send the RFR update around with zero advance test coverage. I had misunderstood Vladimir's comment in the first place. Based on that, I implemented what you would probably call "nano locking". It's not too complicated. Maybe you just want to have a look. If you don't like it, I can acquire the tty_lock before calling each individual print function. Please stay tuned! Lutz On 22.03.18, 09:27, "Tobias Hartmann" wrote: Hi Lutz, On 20.03.2018 23:01, Vladimir Kozlov wrote: > I think to have "micro locking" (with comments) in print_heapinfo() is better then to have lock in > each print function. Yes, I think so too. Best regards, Tobias From rwestrel at redhat.com Thu Mar 22 08:43:11 2018 From: rwestrel at redhat.com (Roland Westrelin) Date: Thu, 22 Mar 2018 09:43:11 +0100 Subject: RFR(XS): 8197931: Null pointer dereference in Unique_Node_List::push of node.hpp:1510 In-Reply-To: <4f61db3b-a74a-3ac8-4a5c-374d8259b3a3@oracle.com> References: <8c14035d-686b-d753-6753-4ad3c21b3430@oracle.com> <4f61db3b-a74a-3ac8-4a5c-374d8259b3a3@oracle.com> Message-ID: > Looks good to me too. Feel free to push after testing through the submit repo passed (see [1]). Ok. Thanks for the reviews. Roland. From rwestrel at redhat.com Thu Mar 22 08:45:48 2018 From: rwestrel at redhat.com (Roland Westrelin) Date: Thu, 22 Mar 2018 09:45:48 +0100 Subject: RFR(S): 8199784: PhaseIdealLoop::place_near_use() might return wrong control with loop strip mining In-Reply-To: <75CED442-A1C6-4458-8E75-A79E81918CD5@oracle.com> References: <75CED442-A1C6-4458-8E75-A79E81918CD5@oracle.com> Message-ID: > It has to be tested more then what submit_hs provides. I will start testing. Thanks for taking care of that. Roland.
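For the CodeHeap State Analytics (8198691) locking question earlier in this thread, the "micro locking" that Vladimir and Tobias prefer can be sketched with std::mutex. In this arrangement the driver (print_heapinfo() in the review) takes the output lock around each individual print step, instead of each print function locking internally. Everything below (out_lock, out, the step functions) is a hypothetical stand-in, not HotSpot's tty or tty_lock.

```cpp
#include <cassert>
#include <mutex>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical stand-ins for a shared output stream and the lock guarding it.
std::mutex out_lock;
std::ostringstream out;

// Print steps write to the stream without taking the lock themselves.
void print_usage_summary(std::ostream& os) { os << "usage summary\n"; }
void print_free_space(std::ostream& os) { os << "free space\n"; }

// Driver-side "micro locking": the lock is held for one step at a time, so
// output from other threads may appear between steps but never inside one.
void print_heapinfo_sketch() {
  {
    std::lock_guard<std::mutex> guard(out_lock);
    print_usage_summary(out);
  }
  {
    std::lock_guard<std::mutex> guard(out_lock);
    print_free_space(out);
  }
}
```

Holding the lock per step keeps each section intact without blocking other threads' output for the whole, potentially long, report. The trade-off, compared with one lock for the entire report, is that other output may appear between sections.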
From per.liden at oracle.com Thu Mar 22 10:07:46 2018 From: per.liden at oracle.com (Per Liden) Date: Thu, 22 Mar 2018 11:07:46 +0100 Subject: RFR(XS): 8197931: Null pointer dereference in Unique_Node_List::push of node.hpp:1510 In-Reply-To: References: Message-ID: Hi, On 03/20/2018 03:46 PM, Roland Westrelin wrote: > > http://cr.openjdk.java.net/~roland/8197931/webrev.00/ > > I need confirmation that this does fix the problem as I can't test it > myself. It seems this patch only adds a number of asserts, so I'm curious as to how this avoids the NULL pointer issue? Am I missing something here? cheers, Per > > Roland. > From rwestrel at redhat.com Thu Mar 22 10:53:13 2018 From: rwestrel at redhat.com (Roland Westrelin) Date: Thu, 22 Mar 2018 11:53:13 +0100 Subject: RFR(XS): 8197931: Null pointer dereference in Unique_Node_List::push of node.hpp:1510 In-Reply-To: References: Message-ID: Hi Per, > It seems this patch only adds a number of asserts, so I'm curious as to > how this avoids the NULL pointer issue? Am I missing something here? There's no NULL pointer issue. The asserts were added to appease a static code analysis tool (parfait). Roland. From per.liden at oracle.com Thu Mar 22 11:14:59 2018 From: per.liden at oracle.com (Per Liden) Date: Thu, 22 Mar 2018 12:14:59 +0100 Subject: RFC: Remove DONT_USE_REGISTER_DEFINES on Sparc? Message-ID: <904ad73a-14d6-7d37-4e53-bfeeeb25a709@oracle.com> Hi, We recently ran into an unfortunate naming conflict, concerning "G1" the garbage collector vs. "G1" the sparc register. We'd like to be able to use "G1" as an enum value in various GC code, but register_sparc.hpp defines "G1" as a macro, which obviously breaks things. We're very reluctant to sprinkling #define DONT_USE_REGISTER_DEFINES in GC code. An alternative would be to simply remove this optimization in the sparc code. The comment in register_sparc.hpp suggests that this was done to reduce the size of libjvm.so. 
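To illustrate the naming conflict Per describes, here is a minimal C++ sketch. `Register`, `sparc::G1` and `CollectorName` are hypothetical stand-ins, and the actual definitions in register_sparc.hpp are not reproduced. The point is only that the preprocessor substitutes a macro named G1 wherever that token appears, whereas a namespaced constant follows normal scoping rules and can coexist with an enumerator of the same name.

```cpp
#include <cassert>

// Hypothetical stand-in for a register descriptor (not HotSpot's RegisterImpl).
struct Register {
  int encoding;
};

// With a register macro named G1 visible, the preprocessor would rewrite every
// later G1 token, so a declaration such as `enum CollectorName { Serial, G1 };`
// would stop compiling. Here no macro is defined; the register is an ordinary
// scoped constant instead.
namespace sparc {
constexpr Register G1{1};
}

// An unrelated enumerator named G1 can now coexist with sparc::G1.
enum class CollectorName { Serial, Parallel, G1 };

int register_encoding_of_g1() { return sparc::G1.encoding; }
```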
I applied a patch[1] to remove the sparc macros and libjvm.so grew by ~0.3% (66450K->66682K). Builds available here[2] and here[3]. Given that the libjvm.so growth doesn't seem that bad, would people be ok with removing the register defines on sparc? If so I'll file a bug and send out the patch for formal review. The patch currently removes all register defines. There are of course alternatives here in case someone thinks the libjvm growth is unacceptable, like only remove general register, only G* registers, etc. /Per [1] http://cr.openjdk.java.net/~pliden/remove_G1_define_on_sparc/webrev.0 [2] https://java.se.oracle.com/artifactory/jdk-dev-local/jdk/personal/per.liden/2018-03-22-0935410.per.liden.hs/bundles/solaris-sparcv9/ [3] https://java.se.oracle.com/artifactory/jdk-dev-local/jdk/personal/per.liden/2018-03-22-0952297.per.liden.hs/bundles/solaris-sparcv9/ From erik.osterlund at oracle.com Thu Mar 22 13:39:14 2018 From: erik.osterlund at oracle.com (Erik Österlund) Date: Thu, 22 Mar 2018 14:39:14 +0100 Subject: RFC: Remove DONT_USE_REGISTER_DEFINES on Sparc? In-Reply-To: <904ad73a-14d6-7d37-4e53-bfeeeb25a709@oracle.com> References: <904ad73a-14d6-7d37-4e53-bfeeeb25a709@oracle.com> Message-ID: <5AB3B202.5090609@oracle.com> Hi Per, I welcome this change.
Stealing the identifier G1 from the global name space seems like a problem. Inflating the libjvm.so size by ~0.3% on a SPARC machine does not seem like a problem. Thanks, /Erik On 2018-03-22 12:14, Per Liden wrote: > Hi, > > We recently ran into an unfortunate naming conflict, concerning "G1" > the garbage collector vs. "G1" the sparc register. We'd like to be > able to use "G1" as an enum value in various GC code, but > register_sparc.hpp defines "G1" as a macro, which obviously breaks > things. > > We're very reluctant to sprinkling #define DONT_USE_REGISTER_DEFINES > in GC code. An alternative would be to simply remove this optimization > in the sparc code. The comment in register_sparc.hpp suggests that > this was done to reduce the size of libjvm.so. > > I applied a patch[1] to remove the sparc macros and libjvm.so grew by > ~0.3% (66450K->66682K). Builds available here[2] and here[3]. > > Given that the libjvm.so growth doesn't seem that bad, would people be > ok with removing the register defines on sparc? If so I'll file a bug > and send out the patch for formal review. > > The patch currently removes all register defines. There are of course > alternatives here in case someone thinks the libjvm growth is > unacceptable, like only remove general register, only G* registers, etc.
> > /Per > > [1] http://cr.openjdk.java.net/~pliden/remove_G1_define_on_sparc/webrev.0 > > [2] > https://java.se.oracle.com/artifactory/jdk-dev-local/jdk/personal/per.liden/2018-03-22-0935410.per.liden.hs/bundles/solaris-sparcv9/ > > [3] > https://java.se.oracle.com/artifactory/jdk-dev-local/jdk/personal/per.liden/2018-03-22-0952297.per.liden.hs/bundles/solaris-sparcv9/ From erik.joelsson at oracle.com Thu Mar 22 14:07:38 2018 From: erik.joelsson at oracle.com (Erik Joelsson) Date: Thu, 22 Mar 2018 07:07:38 -0700 Subject: [11] RFR(XS) 8199896: [Graal] build Graal on all x86 platforms In-Reply-To: <18b854dc-d9a1-e1d0-7d80-2e9509cb0352@oracle.com> References: <53d0725a-ad8e-7d52-8fc5-b0c674fcbcd8@oracle.com> <18b854dc-d9a1-e1d0-7d80-2e9509cb0352@oracle.com> Message-ID: <20a5339f-f765-008d-841a-734d32f51f72@oracle.com> Looks good. /Erik On 2018-03-21 18:04, Vladimir Kozlov wrote: > Forgot to CC to build-dev. > > Note, this change does not have effect unless you disable AOT build. > We are already building Graal on all x64 platforms as part of AOT > build currently. > > Vladimir > > On 3/20/18 6:15 PM, Vladimir Kozlov wrote: >> https://bugs.openjdk.java.net/browse/JDK-8199896 >> >> Extend Graal build to all x64 platforms. >> >> diff -r 7fa5375fa6fd make/autoconf/hotspot.m4 >> --- a/make/autoconf/hotspot.m4 >> +++ b/make/autoconf/hotspot.m4 >> @@ -347,11 +347,10 @@ >>       fi >>       INCLUDE_GRAAL="true" >>     else >> -    # By default enable graal build on linux-x64 or where AOT is >> available. >> +    # By default enable graal build on x64 or where AOT is available. >>       # graal build requires jvmci. >>       if test "x$JVM_FEATURES_jvmci" = "xjvmci" && \ >> -        (test "x$OPENJDK_TARGET_CPU" = "xx86_64" && \ >> -         test "x$OPENJDK_TARGET_OS" = "xlinux" || \ >> +        (test "x$OPENJDK_TARGET_CPU" = "xx86_64" || \ >>           test "x$ENABLE_AOT" = "xtrue") ; then >>         AC_MSG_RESULT([yes]) >>
JVM_FEATURES_graal="graal" >> >> Thanks, >> Vladimir From lesliezhai at llvm.org.cn Thu Mar 22 16:32:13 2018 From: lesliezhai at llvm.org.cn (Leslie Zhai) Date: Fri, 23 Mar 2018 00:32:13 +0800 Subject: How to use gdb to debug C1 compiler's internal error? Message-ID: <98099178-e21f-b7e8-444c-a5f4f7d5790f@llvm.org.cn> Hi HotSpot compiler developers, I am new to the HotSpot C1 compiler, and I am trying to implement a new greedy register allocation skeleton for academic research, but I might have wrongly modified some code, for example, `Runtime1::generate_handle_exception` in jdk/src/hotspot/cpu/x86/c1_Runtime1_x86.cpp; then `install_code` failed and threw this internal error: ... [Stub Code] 0x00007fffe13752a0: mov $0x0,%rbx ; {no_reloc} 0x00007fffe13752aa: jmpq 0x00007fffe13752aa ; {runtime_call} [Exception Handler] 0x00007fffe13752af: jmpq 0x00007fffe1004ee0 ; {runtime_call} [Deopt Handler Code] 0x00007fffe13752b4: callq 0x00007fffe13752b9 0x00007fffe13752b9: subq $0x5,(%rsp) 0x00007fffe13752be: jmpq 0x00007fffe11072e0 ; {runtime_call} 0x00007fffe13752c3: hlt 0x00007fffe13752c4: hlt 0x00007fffe13752c5: hlt 0x00007fffe13752c6: hlt 0x00007fffe13752c7: hlt Decoding compiled method 0x00007fffe136d310: Code: [Entry Point] # {method} {0x00007fffe015e0e0} 'fillInStackTrace' '(I)Ljava/lang/Throwable;' in 'java/lang/Throwable' # this: rsi:rsi = 'java/lang/Throwable' # parm0: rdx = int # [sp+0x50] (sp of caller) 0x00007fffe136d4a0: mov 0x8(%rsi),%r10d 0x00007fffe136d4a4: shl $0x3,%r10 0x00007fffe136d4a8: cmp %r10,%rax 0x00007fffe136d4ab: je 0x00007fffe136d4b8 0x00007fffe136d4b1: jmpq 0x00007fffe1105c40 ; {runtime_call} 0x00007fffe136d4b6: nop 0x00007fffe136d4b7: nop [Verified Entry Point] 0x00007fffe136d4b8: mov %eax,-0x16000(%rsp) 0x00007fffe136d4bf: push %rbp 0x00007fffe136d4c0: mov %rsp,%rbp 0x00007fffe136d4c3: sub $0x40,%rsp
0x00007fffe136d4c7: mov %rsp,%rax 0x00007fffe136d4ca: and $0xfffffffffffffff0,%rax 0x00007fffe136d4ce: cmp %rsp,%rax 0x00007fffe136d4d1: je 0x00007fffe136d54e 0x00007fffe136d4d7: mov %rsp,-0x28(%rsp) 0x00007fffe136d4dc: sub $0x80,%rsp 0x00007fffe136d4e3: mov %rax,0x78(%rsp) 0x00007fffe136d4e8: mov %rcx,0x70(%rsp) 0x00007fffe136d4ed: mov %rdx,0x68(%rsp) 0x00007fffe136d4f2: mov %rbx,0x60(%rsp) 0x00007fffe136d4f7: mov %rbp,0x50(%rsp) 0x00007fffe136d4fc: mov %rsi,0x48(%rsp) 0x00007fffe136d501: mov %rdi,0x40(%rsp) 0x00007fffe136d506: mov %r8,0x38(%rsp) 0x00007fffe136d50b: mov %r9,0x30(%rsp) 0x00007fffe136d510: mov %r10,0x28(%rsp) 0x00007fffe136d515: mov %r11,0x20(%rsp) 0x00007fffe136d51a: mov %r12,0x18(%rsp) 0x00007fffe136d51f: mov %r13,0x10(%rsp) 0x00007fffe136d524: mov %r14,0x8(%rsp) 0x00007fffe136d529: mov %r15,(%rsp) 0x00007fffe136d52d: mov $0x7ffff6dbea09,%rdi ; {external_word} 0x00007fffe136d537: mov $0x7fffe136d4d7,%rsi ; {internal_word} 0x00007fffe136d541: mov %rsp,%rdx 0x00007fffe136d544: and $0xfffffffffffffff0,%rsp 0x00007fffe136d548: callq 0x00007ffff68211ee ; {runtime_call} 0x00007fffe136d54d: hlt ;; move 1 -> 2 ;; move 0 -> 1 0x00007fffe136d54e: mov %rsi,(%rsp) 0x00007fffe136d552: cmp $0x0,%rsi 0x00007fffe136d556: lea (%rsp),%rsi 0x00007fffe136d55a: cmove (%rsp),%rsi ; OopMap{[0]=Oop off=191} 0x00007fffe136d55f: mov $0x7fffe136d55f,%r10 ; {section_word} 0x00007fffe136d569: mov %r10,0x208(%r15) 0x00007fffe136d570: mov %rsp,0x200(%r15) 0x00007fffe136d577: cmpb $0x0,0x1602de2c(%rip) # 0x00007ffff739b3aa ; {external_word} 0x00007fffe136d57e: je 0x00007fffe136d5b8 0x00007fffe136d584: push %rsi 0x00007fffe136d585: push %rdx 0x00007fffe136d586: mov $0x7fffe015e0e0,%rsi
; {metadata({method} {0x00007fffe015e0e0} 'fillInStackTrace' '(I)Ljava/lang/Throwable;' in 'java/lang/Throwable')} 0x00007fffe136d590: mov %r15,%rdi 0x00007fffe136d593: test $0xf,%esp 0x00007fffe136d599: je 0x00007fffe136d5b1 0x00007fffe136d59f: sub $0x8,%rsp 0x00007fffe136d5a3: callq 0x00007ffff69c48ae ; {runtime_call} 0x00007fffe136d5a8: add $0x8,%rsp 0x00007fffe136d5ac: jmpq 0x00007fffe136d5b6 0x00007fffe136d5b1: callq 0x00007ffff69c48ae ; {runtime_call} 0x00007fffe136d5b6: pop %rdx 0x00007fffe136d5b7: pop %rsi 0x00007fffe136d5b8: lea 0x220(%r15),%rdi 0x00007fffe136d5bf: movl $0x4,0x298(%r15) 0x00007fffe136d5ca: callq 0x00007ffff4f55fef ; {runtime_call} 0x00007fffe136d5cf: vzeroupper 0x00007fffe136d5d2: movl $0x5,0x298(%r15) 0x00007fffe136d5dd: mov %r15d,%ecx 0x00007fffe136d5e0: shr $0x4,%ecx 0x00007fffe136d5e3: and $0xffc,%ecx 0x00007fffe136d5e9: mov $0x7ffff7ff3000,%r10 ; {external_word} 0x00007fffe136d5f3: mov %ecx,(%r10,%rcx,1) 0x00007fffe136d5f7: cmpl $0x0,0x1603f89f(%rip) # 0x00007ffff73acea0 ; {external_word} 0x00007fffe136d601: jne 0x00007fffe136d615 0x00007fffe136d607: cmpl $0x0,0x30(%r15) 0x00007fffe136d60f: je 0x00007fffe136d636 0x00007fffe136d615: mov %rax,-0x8(%rbp) 0x00007fffe136d619: mov %r15,%rdi 0x00007fffe136d61c: mov %rsp,%r12 0x00007fffe136d61f: sub $0x0,%rsp 0x00007fffe136d623: and $0xfffffffffffffff0,%rsp 0x00007fffe136d627: callq 0x00007ffff6a691da ; {runtime_call} 0x00007fffe136d62c: mov %r12,%rsp 0x00007fffe136d62f: xor %r12,%r12 0x00007fffe136d632: mov -0x8(%rbp),%rax 0x00007fffe136d636: movl $0x8,0x298(%r15) 0x00007fffe136d641: cmpl $0x1,0x2c4(%r15) 0x00007fffe136d64c: je 0x00007fffe136d6e8 0x00007fffe136d652: cmpb $0x0,0x1602dd51(%rip) # 0x00007ffff739b3aa
                                                ; {external_word}
  0x00007fffe136d659: je     0x00007fffe136d697
  0x00007fffe136d65f: mov    %rax,-0x8(%rbp)
  0x00007fffe136d663: mov    $0x7fffe015e0e0,%rsi  ; {metadata({method} {0x00007fffe015e0e0} 'fillInStackTrace' '(I)Ljava/lang/Throwable;' in 'java/lang/Throwable')}
  0x00007fffe136d66d: mov    %r15,%rdi
  0x00007fffe136d670: test   $0xf,%esp
  0x00007fffe136d676: je     0x00007fffe136d68e
  0x00007fffe136d67c: sub    $0x8,%rsp
  0x00007fffe136d680: callq  0x00007ffff69c4ab8  ; {runtime_call}
  0x00007fffe136d685: add    $0x8,%rsp
  0x00007fffe136d689: jmpq   0x00007fffe136d693
  0x00007fffe136d68e: callq  0x00007ffff69c4ab8  ; {runtime_call}
  0x00007fffe136d693: mov    -0x8(%rbp),%rax
  0x00007fffe136d697: mov    $0x0,%r10
  0x00007fffe136d6a1: mov    %r10,0x200(%r15)
  0x00007fffe136d6a8: mov    $0x0,%r10
  0x00007fffe136d6b2: mov    %r10,0x208(%r15)
  0x00007fffe136d6b9: test   %rax,%rax
  0x00007fffe136d6bc: je     0x00007fffe136d6c5
  0x00007fffe136d6c2: mov    (%rax),%rax
  0x00007fffe136d6c5: mov    0x38(%r15),%rcx
  0x00007fffe136d6c9: movl   $0x0,0x108(%rcx)
  0x00007fffe136d6d3: leaveq
  0x00007fffe136d6d4: cmpq   $0x0,0x8(%r15)
  0x00007fffe136d6dc: jne    0x00007fffe136d6e3
  0x00007fffe136d6e2: retq
  0x00007fffe136d6e3: jmpq   Stub::forward exception  ; {runtime_call}
  0x00007fffe136d6e8: mov    %rax,-0x8(%rbp)
  0x00007fffe136d6ec: mov    %rsp,%r12
  0x00007fffe136d6ef: sub    $0x0,%rsp
  0x00007fffe136d6f3: and    $0xfffffffffffffff0,%rsp
  0x00007fffe136d6f7: callq  0x00007ffff69c8b64  ; {runtime_call}
  0x00007fffe136d6fc: mov    %r12,%rsp
  0x00007fffe136d6ff: xor    %r12,%r12
  0x00007fffe136d702: mov    -0x8(%rbp),%rax
  0x00007fffe136d706: jmpq   0x00007fffe136d652
  0x00007fffe136d70b: hlt
  0x00007fffe136d70c: hlt
  0x00007fffe136d70d: hlt
  0x00007fffe136d70e: hlt
  0x00007fffe136d70f: hlt

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x000000000000dead, pid=2174, tid=0x00007ffff7fc8700
#
# JRE version: OpenJDK Runtime Environment (8.0) (build 1.8.0-internal-debug-xiangzhai_2018_03_19_20_27-b00)
# Java VM: OpenJDK 64-Bit Server VM (25.71-b00-debug compiled mode linux-amd64 compressed oops)
# Problematic frame:
# C  0x000000000000dead
#
# Core dump written. Default location: /data/project/openjdk/jdk8u/hotspot/test/compiler/5057225/core or core.2174
#
# An error report file with more information is saved as:
# /data/project/openjdk/jdk8u/hotspot/test/compiler/5057225/hs_err_pid2174.log
Compiled method (c1)   21870  156   !   3 java.lang.ClassLoader::loadClass (122 bytes)
 total in heap  [0x00007fffe12bcc90,0x00007fffe12beee0] = 8784
 relocation     [0x00007fffe12bcdc0,0x00007fffe12bcfb8] = 504
 main code      [0x00007fffe12bcfc0,0x00007fffe12be2c0] = 4864
 stub code      [0x00007fffe12be2c0,0x00007fffe12be460] = 416
 metadata       [0x00007fffe12be460,0x00007fffe12be4a0] = 64
 scopes data    [0x00007fffe12be4a0,0x00007fffe12be848] = 936
 scopes pcs     [0x00007fffe12be848,0x00007fffe12becd8] = 1168
 dependencies   [0x00007fffe12becd8,0x00007fffe12bece0] = 8
 handler table  [0x00007fffe12bece0,0x00007fffe12beea8] = 456
 nul chk table  [0x00007fffe12beea8,0x00007fffe12beee0] = 56
Compiled method (c1)   21871  156   !   3 java.lang.ClassLoader::loadClass (122 bytes)
 total in heap  [0x00007fffe12bcc90,0x00007fffe12beee0] = 8784
 relocation     [0x00007fffe12bcdc0,0x00007fffe12bcfb8] = 504
 main code      [0x00007fffe12bcfc0,0x00007fffe12be2c0] = 4864
 stub code      [0x00007fffe12be2c0,0x00007fffe12be460] = 416
 metadata       [0x00007fffe12be460,0x00007fffe12be4a0] = 64
 scopes data    [0x00007fffe12be4a0,0x00007fffe12be848] = 936
 scopes pcs     [0x00007fffe12be848,0x00007fffe12becd8] = 1168
 dependencies   [0x00007fffe12becd8,0x00007fffe12bece0] = 8
 handler table  [0x00007fffe12bece0,0x00007fffe12beea8] = 456
 nul chk table  [0x00007fffe12beea8,0x00007fffe12beee0] = 56
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.java.com/bugreport/crash.jsp
#
Current thread is 140737353910016
Dumping core ...

[Switching to Thread 0x7ffff7fc8700 (LWP 2178)]
__GI_raise (sig=sig at entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
51      }
(gdb) bt
#0  __GI_raise (sig=sig at entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1  0x00007ffff740c4da in __GI_abort () at abort.c:89
#2  0x00007ffff6905d0b in os::abort (dump_core=true)
    at /data/project/openjdk/jdk8u/hotspot/src/os/linux/vm/os_linux.cpp:1515
#3  0x00007ffff6ac75fc in VMError::report_and_die (this=0x7ffff7fc6400)
    at /data/project/openjdk/jdk8u/hotspot/src/share/vm/utilities/vmError.cpp:1060
#4  0x00007ffff6ac7d29 in crash_handler (sig=11, info=0x7ffff7fc6630, ucVoid=0x7ffff7fc6500)
    at /data/project/openjdk/jdk8u/hotspot/src/os/linux/vm/vmError_linux.cpp:106
#5
#6  0x00007ffff690071a in os::print_hex_dump (st=0x7ffff7fc6c30,
    start=0xde8d ,
    end=0xdecd , unitsize=1)
    at /data/project/openjdk/jdk8u/hotspot/src/share/vm/runtime/os.cpp:802
#7  0x00007ffff691328e in os::print_context (st=0x7ffff7fc6c30, context=0x7ffff7fc6f00)
    at /data/project/openjdk/jdk8u/hotspot/src/os_cpu/linux_x86/vm/os_linux_x86.cpp:823
#8  0x00007ffff6ac5adb in VMError::report (this=0x7ffff7fc6d50, st=0x7ffff7fc6c30)
    at /data/project/openjdk/jdk8u/hotspot/src/share/vm/utilities/vmError.cpp:534
#9  0x00007ffff6ac70cc in VMError::report_and_die (this=0x7ffff7fc6d50)
    at /data/project/openjdk/jdk8u/hotspot/src/share/vm/utilities/vmError.cpp:971
#10 0x00007ffff6912bde in JVM_handle_linux_signal (sig=11, info=0x7ffff7fc7030, ucVoid=0x7ffff7fc6f00,
    abort_if_unrecognized=1)
    at /data/project/openjdk/jdk8u/hotspot/src/os_cpu/linux_x86/vm/os_linux_x86.cpp:541
#11 0x00007ffff690be1d in signalHandler (sig=11, info=0x7ffff7fc7030, uc=0x7ffff7fc6f00)
    at /data/project/openjdk/jdk8u/hotspot/src/os/linux/vm/os_linux.cpp:4435
#12
...

So a backtrace or a breakpoint might be helpful for debugging the compiling thread, but doesn't work for the running thread? I am reading "Analyzing and Debugging the HotSpot VM at the OS Level"[1]; please give me some advice, thanks a lot!

[1] http://www.progdoc.de/papers/JavaOne2014/javaone2014_con3138.html

--
Regards,
Leslie Zhai

From vladimir.kozlov at oracle.com Thu Mar 22 17:37:49 2018
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 22 Mar 2018 10:37:49 -0700
Subject: [11] RFR(XS) 8199896: [Graal] build Graal on all x86 platforms
In-Reply-To: <20a5339f-f765-008d-841a-734d32f51f72@oracle.com>
References: <53d0725a-ad8e-7d52-8fc5-b0c674fcbcd8@oracle.com>
 <18b854dc-d9a1-e1d0-7d80-2e9509cb0352@oracle.com>
 <20a5339f-f765-008d-841a-734d32f51f72@oracle.com>
Message-ID: <42e48e0e-42cf-f728-7278-8339ee620d0c@oracle.com>

Thank you, Erik

Vladimir

On 3/22/18 7:07 AM, Erik Joelsson wrote:
> Looks good.
>
> /Erik
>
>
> On 2018-03-21 18:04, Vladimir Kozlov wrote:
>> Forgot to CC build-dev.
>>
>> Note, this change has no effect unless you disable the AOT build. We are already building Graal
>> on all x64 platforms as part of the AOT build currently.
>>
>> Vladimir
>>
>> On 3/20/18 6:15 PM, Vladimir Kozlov wrote:
>>> https://bugs.openjdk.java.net/browse/JDK-8199896
>>>
>>> Extend Graal build to all x64 platforms.
>>>
>>> diff -r 7fa5375fa6fd make/autoconf/hotspot.m4
>>> --- a/make/autoconf/hotspot.m4
>>> +++ b/make/autoconf/hotspot.m4
>>> @@ -347,11 +347,10 @@
>>>       fi
>>>       INCLUDE_GRAAL="true"
>>>     else
>>> -    # By default enable graal build on linux-x64 or where AOT is available.
>>> +    # By default enable graal build on x64 or where AOT is available.
>>>       # graal build requires jvmci.
>>>       if test "x$JVM_FEATURES_jvmci" = "xjvmci" && \
>>> -        (test "x$OPENJDK_TARGET_CPU" = "xx86_64" && \
>>> -         test "x$OPENJDK_TARGET_OS" = "xlinux" || \
>>> +        (test "x$OPENJDK_TARGET_CPU" = "xx86_64" || \
>>>           test "x$ENABLE_AOT" = "xtrue") ; then
>>>         AC_MSG_RESULT([yes])
>>>         JVM_FEATURES_graal="graal"
>>>
>>> Thanks,
>>> Vladimir
>
From vladimir.kozlov at oracle.com Thu Mar 22 18:38:36 2018
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 22 Mar 2018 11:38:36 -0700
Subject: RFC: Remove DONT_USE_REGISTER_DEFINES on Sparc?
In-Reply-To: <5AB3B202.5090609@oracle.com>
References: <904ad73a-14d6-7d37-4e53-bfeeeb25a709@oracle.com>
 <5AB3B202.5090609@oracle.com>
Message-ID: <638845f4-eab0-d96f-00a6-4097eccc58e9@oracle.com>

Yes, you can remove this code. Please build a fastdebug version of the VM too, to verify that it works. And run tier1.

Thanks,
Vladimir

On 3/22/18 6:39 AM, Erik Österlund wrote:
> Hi Per,
>
> I welcome this change. Stealing the identifier G1 from the global name space seems like a problem.
> Inflating the libjvm.so size by ~0.3% on a SPARC machine does not seem like a problem.
>
> Thanks,
> /Erik
>
> On 2018-03-22 12:14, Per Liden wrote:
>> Hi,
>>
>> We recently ran into an unfortunate naming conflict, concerning "G1" the garbage collector vs.
>> "G1" the sparc register. We'd like to be able to use "G1" as an enum value in various GC code, but
>> register_sparc.hpp defines "G1" as a macro, which obviously breaks things.
>>
>> We're very reluctant to sprinkle #define DONT_USE_REGISTER_DEFINES in GC code. An alternative
>> would be to simply remove this optimization in the sparc code. The comment in register_sparc.hpp
>> suggests that this was done to reduce the size of libjvm.so.
>>
>> I applied a patch[1] to remove the sparc macros and libjvm.so grew by ~0.3% (66450K->66682K).
>> Builds available here[2] and here[3].
>>
>> Given that the libjvm.so growth doesn't seem that bad, would people be ok with removing the
>> register defines on sparc? If so I'll file a bug and send out the patch for formal review.
>>
>> The patch currently removes all register defines. There are of course alternatives here in case
>> someone thinks the libjvm growth is unacceptable, like only removing general registers, only G*
>> registers, etc.
>>
>> /Per
>>
>> [1] http://cr.openjdk.java.net/~pliden/remove_G1_define_on_sparc/webrev.0
>>
>> [2]
>> https://java.se.oracle.com/artifactory/jdk-dev-local/jdk/personal/per.liden/2018-03-22-0935410.per.liden.hs/bundles/solaris-sparcv9/
>>
>>
>> [3]
>> https://java.se.oracle.com/artifactory/jdk-dev-local/jdk/personal/per.liden/2018-03-22-0952297.per.liden.hs/bundles/solaris-sparcv9/
>>
>
From shravya.rukmannagari at intel.com Thu Mar 22 19:11:54 2018
From: shravya.rukmannagari at intel.com (Rukmannagari, Shravya)
Date: Thu, 22 Mar 2018 19:11:54 +0000
Subject: RFR(S): Vector Carry-less Multiplication support
Message-ID: <8D6F463991A1574A8A803B8DA605414F3A748871@ORSMSX111.amr.corp.intel.com>

Hi everyone,

As per the "Intel Architecture Instruction Set Extensions and Future Features Programming Reference" manual [1], the vector carry-less multiplication (vpclmulqdq) instruction will be supported in a future Intel ISA. I have updated the CRC32 algorithm to take advantage of this instruction. I have tested with Intel SDE [2] to confirm that the encoding and semantics are correctly implemented.

Please take a look and let me know if you have any questions or comments.

http://cr.openjdk.java.net/~vdeshpande/ICL_crc32/webrev.00/

Thanks,
Shravya.

[1] https://software.intel.com/sites/default/files/managed/c5/15/architecture-instruction-set-extensions-programming-reference.pdf
[2] https://software.intel.com/en-us/articles/intel-software-development-emulator
[3] https://bugs.openjdk.java.net/browse/JDK-8200067
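For readers unfamiliar with the instruction: a carry-less multiply treats its operands as polynomials over GF(2), so the shifted partial products are combined with XOR instead of addition and no carries propagate. The portable C++ sketch below models a single 64x64-bit product of the kind (V)PCLMULQDQ computes. It is only an illustration, not the intrinsic from the webrev, and the names `Clmul128` and `clmul64` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 128-bit result of a 64x64-bit carry-less multiply.
struct Clmul128 {
    uint64_t lo;  // bits 0..63 of the product
    uint64_t hi;  // bits 64..127 of the product
};

// Software model of one 64x64 -> 128-bit carry-less product, i.e. the
// operation PCLMULQDQ performs on one selected pair of quadwords.
// Each set bit i of b contributes (a << i); contributions are XORed,
// so no carries propagate between bit positions.
Clmul128 clmul64(uint64_t a, uint64_t b) {
    Clmul128 r{0, 0};
    for (int i = 0; i < 64; ++i) {
        if ((b >> i) & 1u) {
            r.lo ^= a << i;
            if (i != 0) {
                // Bits of a shifted past bit 63 land in the high half.
                // i == 0 is skipped because a >> 64 is undefined in C++.
                r.hi ^= a >> (64 - i);
            }
        }
    }
    return r;
}
```

For example, `clmul64(3, 3)` yields `lo == 5`, because (x + 1)^2 = x^2 + 1 over GF(2), whereas ordinary multiplication gives 9. Folding-based CRC32 implementations combine such products with precomputed constants; the vector form of the instruction applies the same 64x64-bit product independently within each 128-bit lane.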
From dean.long at oracle.com Thu Mar 22 20:59:57 2018
From: dean.long at oracle.com (dean.long at oracle.com)
Date: Thu, 22 Mar 2018 13:59:57 -0700
Subject: How to use gdb to debug C1 compiler's internal error?
In-Reply-To: <98099178-e21f-b7e8-444c-a5f4f7d5790f@llvm.org.cn>
References: <98099178-e21f-b7e8-444c-a5f4f7d5790f@llvm.org.cn>
Message-ID: <119dcd61-401e-1cc5-41b0-0dd88c98543f@oracle.com>

Gdb is not very useful for getting stack backtraces in generated JIT code, and it wouldn't know where to start because it apparently jumped to 0x000000000000dead.  I suggest trying -XX:+C1Breakpoint and then single-stepping through the generated code.

dl

On 3/22/18 9:32 AM, Leslie Zhai wrote:
> Hi HotSpot compiler developers,
>
> I am new to HotSpot C1 compiler, and I am trying to implement a new
> greedy register allocation skeleton for academy research, but might
> wrongly modified some code, for example,
> `Runtime1::generate_handle_exception` in
> jdk/src/hotspot/cpu/x86/c1_Runtime1_x86.cpp, then `install_code`
> failed to work and thrown such internal error:
>
> ...
>
> [Stub Code]
>   0x00007fffe13752a0: mov    $0x0,%rbx          ;   {no_reloc}
>   0x00007fffe13752aa: jmpq   0x00007fffe13752aa  ; {runtime_call}
> [Exception Handler]
>   0x00007fffe13752af: jmpq   0x00007fffe1004ee0  ; {runtime_call}
> [Deopt Handler Code]
>   0x00007fffe13752b4: callq  0x00007fffe13752b9
>   0x00007fffe13752b9: subq   $0x5,(%rsp)
>   0x00007fffe13752be: jmpq   0x00007fffe11072e0  ; {runtime_call}
>   0x00007fffe13752c3: hlt
>   0x00007fffe13752c4: hlt
>   0x00007fffe13752c5: hlt
>   0x00007fffe13752c6: hlt
>   0x00007fffe13752c7: hlt
> Decoding compiled method 0x00007fffe136d310:
> Code:
> [Entry Point]
>   # {method} {0x00007fffe015e0e0} 'fillInStackTrace' '(I)Ljava/lang/Throwable;' in 'java/lang/Throwable'
>   # this:     rsi:rsi   = 'java/lang/Throwable'
>   # parm0:    rdx       = int
>   #           [sp+0x50]  (sp of caller)
>   0x00007fffe136d4a0: mov    0x8(%rsi),%r10d
>   0x00007fffe136d4a4: shl    $0x3,%r10
>   0x00007fffe136d4a8: cmp    %r10,%rax
>   0x00007fffe136d4ab: je     0x00007fffe136d4b8
>   0x00007fffe136d4b1: jmpq   0x00007fffe1105c40  ; {runtime_call}
>   0x00007fffe136d4b6: nop
>   0x00007fffe136d4b7: nop
> [Verified Entry Point]
>   0x00007fffe136d4b8: mov    %eax,-0x16000(%rsp)
>   0x00007fffe136d4bf: push   %rbp
>   0x00007fffe136d4c0: mov    %rsp,%rbp
>   0x00007fffe136d4c3: sub    $0x40,%rsp
>   0x00007fffe136d4c7: mov    %rsp,%rax
>   0x00007fffe136d4ca: and    $0xfffffffffffffff0,%rax
>   0x00007fffe136d4ce: cmp    %rsp,%rax
>   0x00007fffe136d4d1: je     0x00007fffe136d54e
>   0x00007fffe136d4d7: mov    %rsp,-0x28(%rsp)
>   0x00007fffe136d4dc: sub    $0x80,%rsp
>   0x00007fffe136d4e3: mov    %rax,0x78(%rsp)
>   0x00007fffe136d4e8: mov    %rcx,0x70(%rsp)
>   0x00007fffe136d4ed: mov    %rdx,0x68(%rsp)
>   0x00007fffe136d4f2: mov    %rbx,0x60(%rsp)
>   0x00007fffe136d4f7: mov    %rbp,0x50(%rsp)
>   0x00007fffe136d4fc: mov    %rsi,0x48(%rsp)
>   0x00007fffe136d501: mov    %rdi,0x40(%rsp)
>   0x00007fffe136d506: mov    %r8,0x38(%rsp)
>   0x00007fffe136d50b: mov    %r9,0x30(%rsp)
>   0x00007fffe136d510: mov    %r10,0x28(%rsp)
>   0x00007fffe136d515: mov    %r11,0x20(%rsp)
>   0x00007fffe136d51a: mov    %r12,0x18(%rsp)
>   0x00007fffe136d51f: mov    %r13,0x10(%rsp)
>   0x00007fffe136d524: mov    %r14,0x8(%rsp)
>   0x00007fffe136d529: mov    %r15,(%rsp)
>   0x00007fffe136d52d: mov    $0x7ffff6dbea09,%rdi  ; {external_word}
>   0x00007fffe136d537: mov    $0x7fffe136d4d7,%rsi  ; {internal_word}
>   0x00007fffe136d541: mov    %rsp,%rdx
>   0x00007fffe136d544: and    $0xfffffffffffffff0,%rsp
>   0x00007fffe136d548: callq  0x00007ffff68211ee  ; {runtime_call}
>   0x00007fffe136d54d: hlt
>   ;; move 1 -> 2
>   ;; move 0 -> 1
>   0x00007fffe136d54e: mov    %rsi,(%rsp)
>   0x00007fffe136d552: cmp    $0x0,%rsi
>   0x00007fffe136d556: lea    (%rsp),%rsi
>   0x00007fffe136d55a: cmove  (%rsp),%rsi        ; OopMap{[0]=Oop off=191}
>   0x00007fffe136d55f: mov    $0x7fffe136d55f,%r10  ; {section_word}
>   0x00007fffe136d569: mov    %r10,0x208(%r15)
>   0x00007fffe136d570: mov    %rsp,0x200(%r15)
>   0x00007fffe136d577: cmpb   $0x0,0x1602de2c(%rip)        # 0x00007ffff739b3aa
>                                                 ; {external_word}
>   0x00007fffe136d57e: je     0x00007fffe136d5b8
>   0x00007fffe136d584: push   %rsi
>   0x00007fffe136d585: push   %rdx
>   0x00007fffe136d586: mov    $0x7fffe015e0e0,%rsi  ; {metadata({method} {0x00007fffe015e0e0} 'fillInStackTrace' '(I)Ljava/lang/Throwable;' in 'java/lang/Throwable')}
>   0x00007fffe136d590: mov    %r15,%rdi
>   0x00007fffe136d593: test   $0xf,%esp
>   0x00007fffe136d599: je     0x00007fffe136d5b1
>   0x00007fffe136d59f: sub    $0x8,%rsp
>   0x00007fffe136d5a3: callq  0x00007ffff69c48ae  ; {runtime_call}
>   0x00007fffe136d5a8: add    $0x8,%rsp
>   0x00007fffe136d5ac: jmpq   0x00007fffe136d5b6
>   0x00007fffe136d5b1: callq  0x00007ffff69c48ae  ; {runtime_call}
>   0x00007fffe136d5b6: pop    %rdx
>   0x00007fffe136d5b7: pop    %rsi
>   0x00007fffe136d5b8: lea    0x220(%r15),%rdi
>   0x00007fffe136d5bf: movl   $0x4,0x298(%r15)
>   0x00007fffe136d5ca: callq  0x00007ffff4f55fef  ; {runtime_call}
>   0x00007fffe136d5cf: vzeroupper
>   0x00007fffe136d5d2: movl   $0x5,0x298(%r15)
>   0x00007fffe136d5dd: mov    %r15d,%ecx
>   0x00007fffe136d5e0: shr    $0x4,%ecx
>   0x00007fffe136d5e3: and    $0xffc,%ecx
>   0x00007fffe136d5e9: mov    $0x7ffff7ff3000,%r10  ; {external_word}
>   0x00007fffe136d5f3: mov    %ecx,(%r10,%rcx,1)
>   0x00007fffe136d5f7: cmpl   $0x0,0x1603f89f(%rip)        # 0x00007ffff73acea0
>                                                 ; {external_word}
>   0x00007fffe136d601: jne    0x00007fffe136d615
>   0x00007fffe136d607: cmpl   $0x0,0x30(%r15)
>   0x00007fffe136d60f: je     0x00007fffe136d636
>   0x00007fffe136d615: mov    %rax,-0x8(%rbp)
>   0x00007fffe136d619: mov    %r15,%rdi
>   0x00007fffe136d61c: mov    %rsp,%r12
>   0x00007fffe136d61f: sub    $0x0,%rsp
>   0x00007fffe136d623: and    $0xfffffffffffffff0,%rsp
>   0x00007fffe136d627: callq  0x00007ffff6a691da  ; {runtime_call}
>   0x00007fffe136d62c: mov    %r12,%rsp
>   0x00007fffe136d62f: xor    %r12,%r12
>   0x00007fffe136d632: mov    -0x8(%rbp),%rax
>   0x00007fffe136d636: movl   $0x8,0x298(%r15)
>   0x00007fffe136d641: cmpl   $0x1,0x2c4(%r15)
>   0x00007fffe136d64c: je     0x00007fffe136d6e8
>   0x00007fffe136d652: cmpb   $0x0,0x1602dd51(%rip)        # 0x00007ffff739b3aa
>                                                 ; {external_word}
>   0x00007fffe136d659: je     0x00007fffe136d697
>   0x00007fffe136d65f: mov    %rax,-0x8(%rbp)
>   0x00007fffe136d663: mov    $0x7fffe015e0e0,%rsi  ; {metadata({method} {0x00007fffe015e0e0} 'fillInStackTrace' '(I)Ljava/lang/Throwable;' in 'java/lang/Throwable')}
>   0x00007fffe136d66d: mov    %r15,%rdi
>   0x00007fffe136d670: test   $0xf,%esp
>   0x00007fffe136d676: je     0x00007fffe136d68e
>   0x00007fffe136d67c: sub    $0x8,%rsp
>   0x00007fffe136d680: callq  0x00007ffff69c4ab8  ; {runtime_call}
>   0x00007fffe136d685: add    $0x8,%rsp
>   0x00007fffe136d689: jmpq   0x00007fffe136d693
>   0x00007fffe136d68e: callq  0x00007ffff69c4ab8  ; {runtime_call}
>   0x00007fffe136d693: mov    -0x8(%rbp),%rax
>   0x00007fffe136d697: mov    $0x0,%r10
>   0x00007fffe136d6a1: mov    %r10,0x200(%r15)
>   0x00007fffe136d6a8: mov    $0x0,%r10
>   0x00007fffe136d6b2: mov    %r10,0x208(%r15)
>   0x00007fffe136d6b9: test   %rax,%rax
>   0x00007fffe136d6bc: je     0x00007fffe136d6c5
>   0x00007fffe136d6c2: mov    (%rax),%rax
>   0x00007fffe136d6c5: mov    0x38(%r15),%rcx
>   0x00007fffe136d6c9: movl   $0x0,0x108(%rcx)
>   0x00007fffe136d6d3: leaveq
>   0x00007fffe136d6d4: cmpq   $0x0,0x8(%r15)
>   0x00007fffe136d6dc: jne    0x00007fffe136d6e3
>   0x00007fffe136d6e2: retq
>   0x00007fffe136d6e3: jmpq   Stub::forward exception  ; {runtime_call}
>   0x00007fffe136d6e8: mov    %rax,-0x8(%rbp)
>   0x00007fffe136d6ec: mov    %rsp,%r12
>   0x00007fffe136d6ef: sub    $0x0,%rsp
>   0x00007fffe136d6f3: and    $0xfffffffffffffff0,%rsp
>   0x00007fffe136d6f7: callq  0x00007ffff69c8b64  ; {runtime_call}
>   0x00007fffe136d6fc: mov    %r12,%rsp
>   0x00007fffe136d6ff: xor    %r12,%r12
>   0x00007fffe136d702: mov    -0x8(%rbp),%rax
>   0x00007fffe136d706: jmpq   0x00007fffe136d652
>   0x00007fffe136d70b: hlt
>   0x00007fffe136d70c: hlt
>   0x00007fffe136d70d: hlt
>   0x00007fffe136d70e: hlt
>   0x00007fffe136d70f: hlt
>
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  SIGSEGV (0xb) at pc=0x000000000000dead, pid=2174, tid=0x00007ffff7fc8700
> #
> # JRE version: OpenJDK Runtime Environment (8.0) (build 1.8.0-internal-debug-xiangzhai_2018_03_19_20_27-b00)
> # Java VM: OpenJDK 64-Bit Server VM (25.71-b00-debug compiled mode linux-amd64 compressed oops)
> # Problematic frame:
> # C  0x000000000000dead
> #
> # Core dump written. Default location: /data/project/openjdk/jdk8u/hotspot/test/compiler/5057225/core or core.2174
> #
> # An error report file with more information is saved as:
> # /data/project/openjdk/jdk8u/hotspot/test/compiler/5057225/hs_err_pid2174.log
>
> Compiled method (c1)   21870  156   !   3 java.lang.ClassLoader::loadClass (122 bytes)
>  total in heap  [0x00007fffe12bcc90,0x00007fffe12beee0] = 8784
>  relocation     [0x00007fffe12bcdc0,0x00007fffe12bcfb8] = 504
>  main code      [0x00007fffe12bcfc0,0x00007fffe12be2c0] = 4864
>  stub code      [0x00007fffe12be2c0,0x00007fffe12be460] = 416
>  metadata       [0x00007fffe12be460,0x00007fffe12be4a0] = 64
>  scopes data    [0x00007fffe12be4a0,0x00007fffe12be848] = 936
>  scopes pcs     [0x00007fffe12be848,0x00007fffe12becd8] = 1168
>  dependencies   [0x00007fffe12becd8,0x00007fffe12bece0] = 8
>  handler table  [0x00007fffe12bece0,0x00007fffe12beea8] = 456
>  nul chk table  [0x00007fffe12beea8,0x00007fffe12beee0] = 56
> Compiled method (c1)   21871  156   !   3 java.lang.ClassLoader::loadClass (122 bytes)
>  total in heap  [0x00007fffe12bcc90,0x00007fffe12beee0] = 8784
>  relocation     [0x00007fffe12bcdc0,0x00007fffe12bcfb8] = 504
>  main code      [0x00007fffe12bcfc0,0x00007fffe12be2c0] = 4864
>  stub code      [0x00007fffe12be2c0,0x00007fffe12be460] = 416
>  metadata       [0x00007fffe12be460,0x00007fffe12be4a0] = 64
>  scopes data    [0x00007fffe12be4a0,0x00007fffe12be848] = 936
>  scopes pcs     [0x00007fffe12be848,0x00007fffe12becd8] = 1168
>  dependencies   [0x00007fffe12becd8,0x00007fffe12bece0] = 8
>  handler table  [0x00007fffe12bece0,0x00007fffe12beea8] = 456
>  nul chk table  [0x00007fffe12beea8,0x00007fffe12beee0] = 56
> #
> # If you would like to submit a bug report, please visit:
> #   http://bugreport.java.com/bugreport/crash.jsp
> #
> Current thread is 140737353910016
> Dumping core ...
>
> [Switching to Thread 0x7ffff7fc8700 (LWP 2178)]
> __GI_raise (sig=sig at entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
> 51      }
> (gdb) bt
> #0  __GI_raise (sig=sig at entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
> #1  0x00007ffff740c4da in __GI_abort () at abort.c:89
> #2  0x00007ffff6905d0b in os::abort (dump_core=true)
>     at /data/project/openjdk/jdk8u/hotspot/src/os/linux/vm/os_linux.cpp:1515
> #3  0x00007ffff6ac75fc in VMError::report_and_die (this=0x7ffff7fc6400)
>     at /data/project/openjdk/jdk8u/hotspot/src/share/vm/utilities/vmError.cpp:1060
> #4  0x00007ffff6ac7d29 in crash_handler (sig=11, info=0x7ffff7fc6630, ucVoid=0x7ffff7fc6500)
>     at /data/project/openjdk/jdk8u/hotspot/src/os/linux/vm/vmError_linux.cpp:106
> #5
> #6  0x00007ffff690071a in os::print_hex_dump (st=0x7ffff7fc6c30,
>     start=0xde8d ,
>     end=0xdecd , unitsize=1)
>     at /data/project/openjdk/jdk8u/hotspot/src/share/vm/runtime/os.cpp:802
> #7  0x00007ffff691328e in os::print_context (st=0x7ffff7fc6c30, context=0x7ffff7fc6f00)
>     at /data/project/openjdk/jdk8u/hotspot/src/os_cpu/linux_x86/vm/os_linux_x86.cpp:823
> #8  0x00007ffff6ac5adb in VMError::report (this=0x7ffff7fc6d50, st=0x7ffff7fc6c30)
>     at /data/project/openjdk/jdk8u/hotspot/src/share/vm/utilities/vmError.cpp:534
> #9  0x00007ffff6ac70cc in VMError::report_and_die (this=0x7ffff7fc6d50)
>     at /data/project/openjdk/jdk8u/hotspot/src/share/vm/utilities/vmError.cpp:971
> #10 0x00007ffff6912bde in JVM_handle_linux_signal (sig=11, info=0x7ffff7fc7030, ucVoid=0x7ffff7fc6f00,
>     abort_if_unrecognized=1)
>     at /data/project/openjdk/jdk8u/hotspot/src/os_cpu/linux_x86/vm/os_linux_x86.cpp:541
> #11 0x00007ffff690be1d in signalHandler (sig=11, info=0x7ffff7fc7030, uc=0x7ffff7fc6f00)
>     at /data/project/openjdk/jdk8u/hotspot/src/os/linux/vm/os_linux.cpp:4435
> #12
> ...
>
> So a backtrace or a breakpoint might be helpful for debugging the compiling thread, but
> doesn't work for the running thread? I am reading "Analyzing and Debugging the HotSpot VM
> at the OS Level"[1]; please give me some advice, thanks a lot!
>
> [1] http://www.progdoc.de/papers/JavaOne2014/javaone2014_con3138.html
>
From vladimir.kozlov at oracle.com Thu Mar 22 23:09:54 2018
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 22 Mar 2018 16:09:54 -0700
Subject: RFR(M): 8193130: Bad graph when unrolled loop bounds conflicts with range checks
In-Reply-To:
References: <908c1b6d-367a-2928-bb6b-203e2d6cb7bb@oracle.com>
 <3da586e3-f363-c47e-e106-ffdccafd7e1b@oracle.com>
Message-ID: <6862ee3b-58e5-b56e-6945-1d2a78606f02@oracle.com>

http://cr.openjdk.java.net/~roland/8193130/webrev.01/

After going back and forth and debugging code myself I came almost to the same ideas as in this fix. I agree with it now. And sorry that it took a long time.

I submitted new testing for the latest jdk/hs.
I had to remove all Copyright year updates from the patch - they were already updated in the sources.

I am not sure about a backport into JDK 10, which will be short-lived. The changes are too complex and I think they should be "baked" in the current JDK first.

Thanks,
Vladimir

On 1/18/18 12:12 AM, Roland Westrelin wrote:
>
>> First, I am suggesting to defer it to jdk 11 since it is present in jdk
>> 9 (not new in jdk 10) and the fix is not simple. It is just a 6 months delay.
>
> That's fine with me.
>
> Roland.
>
From igor.veresov at oracle.com Thu Mar 22 23:18:15 2018
From: igor.veresov at oracle.com (Igor Veresov)
Date: Thu, 22 Mar 2018 16:18:15 -0700
Subject: [11] RFR(S) 8199212: [TESTBUG] don't run compiler/aot tests with -Xcomp
In-Reply-To:
References: <1c0527ac-acda-8124-b2de-99e5b058c185@oracle.com>
 <68B11413-38BE-424C-B7A0-5C2E544BA8D7@oracle.com>
Message-ID:

Looks good.

igor

> On Mar 15, 2018, at 5:18 PM, Vladimir Kozlov wrote:
>
> http://cr.openjdk.java.net/~kvn/8199212/webrev.01/
>
> After additional testing I decided to redo the changes by also splitting the Runtime and GC tests when we run with Xcomp. I also doubled the timeouts for a few Hotspot tests which hit time limits on SPARC in some configurations.
>
> Thanks,
> Vladimir
>
> On 3/12/18 12:39 PM, Vladimir Kozlov wrote:
>> Thank you, Igor
>> Vladimir
>> On 3/12/18 12:20 PM, Igor Ignatev wrote:
>>> Hi Vladimir,
>>>
>>> The fix looks good.
>>>
>>> - Igor
>>>
>>>> On Mar 7, 2018, at 9:02 PM, Vladimir Kozlov wrote:
>>>>
>>>> http://cr.openjdk.java.net/~kvn/8199212/webrev.00/
>>>> https://bugs.openjdk.java.net/browse/JDK-8199212
>>>>
>>>> Our testing shows that AOT jtreg tests consume a lot of time when running with -Xcomp. Running the AOT compiler in these tests with -Xcomp is not helpful.
>>>>
>>>> --
>>>> Thanks,
>>>> Vladimir
>>>
From vladimir.kozlov at oracle.com Thu Mar 22 23:33:52 2018
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 22 Mar 2018 16:33:52 -0700
Subject: [11] RFR(S) 8199212: [TESTBUG] don't run compiler/aot tests with -Xcomp
In-Reply-To:
References: <1c0527ac-acda-8124-b2de-99e5b058c185@oracle.com>
 <68B11413-38BE-424C-B7A0-5C2E544BA8D7@oracle.com>
Message-ID: <2cdb7e3a-ce4c-027d-f76c-0e0578058136@oracle.com>

Thank you, Igor

Vladimir

On 3/22/18 4:18 PM, Igor Veresov wrote:
> Looks good.
>
> igor
>
>> On Mar 15, 2018, at 5:18 PM, Vladimir Kozlov wrote:
>>
>> http://cr.openjdk.java.net/~kvn/8199212/webrev.01/
>>
>> After additional testing I decided to redo the changes by also splitting the Runtime and GC tests when we run with Xcomp. I also doubled the timeouts for a few Hotspot tests which hit time limits on SPARC in some configurations.
>>
>> Thanks,
>> Vladimir
>>
>> On 3/12/18 12:39 PM, Vladimir Kozlov wrote:
>>> Thank you, Igor
>>> Vladimir
>>> On 3/12/18 12:20 PM, Igor Ignatev wrote:
>>>> Hi Vladimir,
>>>>
>>>> The fix looks good.
>>>>
>>>> - Igor
>>>>
>>>>> On Mar 7, 2018, at 9:02 PM, Vladimir Kozlov wrote:
>>>>>
>>>>> http://cr.openjdk.java.net/~kvn/8199212/webrev.00/
>>>>> https://bugs.openjdk.java.net/browse/JDK-8199212
>>>>>
>>>>> Our testing shows that AOT jtreg tests consume a lot of time when running with -Xcomp. Running the AOT compiler in these tests with -Xcomp is not helpful.
>>>>>
>>>>> --
>>>>> Thanks,
>>>>> Vladimir
>>>>
>
From per.liden at oracle.com Fri Mar 23 09:44:09 2018
From: per.liden at oracle.com (Per Liden)
Date: Fri, 23 Mar 2018 10:44:09 +0100
Subject: RFC: Remove DONT_USE_REGISTER_DEFINES on Sparc?
In-Reply-To: <638845f4-eab0-d96f-00a6-4097eccc58e9@oracle.com>
References: <904ad73a-14d6-7d37-4e53-bfeeeb25a709@oracle.com>
 <5AB3B202.5090609@oracle.com>
 <638845f4-eab0-d96f-00a6-4097eccc58e9@oracle.com>
Message-ID: <90dc4049-7253-1509-bd2b-2e4eefb346bc@oracle.com>

Ok, thanks!
I'll file an RFE and send the patch out for formal review. And yes, it passed hs-tier{1,2} in mach5.

/Per

On 2018-03-22 19:38, Vladimir Kozlov wrote:
> Yes, you can remove this code. Please build a fastdebug version of the VM
> too, to verify that it works. And run tier1.
>
> Thanks,
> Vladimir
>
> On 3/22/18 6:39 AM, Erik Österlund wrote:
>> Hi Per,
>>
>> I welcome this change. Stealing the identifier G1 from the global
>> name space seems like a problem. Inflating the libjvm.so size by
>> ~0.3% on a SPARC machine does not seem like a problem.
>>
>> Thanks,
>> /Erik
>>
>> On 2018-03-22 12:14, Per Liden wrote:
>>> Hi,
>>>
>>> We recently ran into an unfortunate naming conflict, concerning "G1"
>>> the garbage collector vs. "G1" the sparc register. We'd like to be
>>> able to use "G1" as an enum value in various GC code, but
>>> register_sparc.hpp defines "G1" as a macro, which obviously breaks
>>> things.
>>>
>>> We're very reluctant to sprinkle #define DONT_USE_REGISTER_DEFINES
>>> in GC code. An alternative would be to simply remove this
>>> optimization in the sparc code. The comment in register_sparc.hpp
>>> suggests that this was done to reduce the size of libjvm.so.
>>>
>>> I applied a patch[1] to remove the sparc macros and libjvm.so grew
>>> by ~0.3% (66450K->66682K). Builds available here[2] and here[3].
>>>
>>> Given that the libjvm.so growth doesn't seem that bad, would people
>>> be ok with removing the register defines on sparc? If so I'll file a
>>> bug and send out the patch for formal review.
>>>
>>> The patch currently removes all register defines. There are of
>>> course alternatives here in case someone thinks the libjvm growth is
>>> unacceptable, like only removing general registers, only G* registers,
>>> etc.
>>>
>>> /Per
>>>
>>> [1]
>>> http://cr.openjdk.java.net/~pliden/remove_G1_define_on_sparc/webrev.0
>>>
>>> [2]
>>> https://java.se.oracle.com/artifactory/jdk-dev-local/jdk/personal/per.liden/2018-03-22-0935410.per.liden.hs/bundles/solaris-sparcv9/
>>>
>>>
>>> [3]
>>> https://java.se.oracle.com/artifactory/jdk-dev-local/jdk/personal/per.liden/2018-03-22-0952297.per.liden.hs/bundles/solaris-sparcv9/
>>>
>>
From per.liden at oracle.com Fri Mar 23 10:29:07 2018
From: per.liden at oracle.com (Per Liden)
Date: Fri, 23 Mar 2018 11:29:07 +0100
Subject: RFR: 8200168: Remove DONT_USE_REGISTER_DEFINES on Sparc
Message-ID: <1359f2a0-e62b-f953-f053-fdcf99e8e87e@oracle.com>

Hi,

Please review this patch to remove register macros on Sparc, as discussed here:

http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2018-March/028541.html

Bug: https://bugs.openjdk.java.net/browse/JDK-8200168
Webrev: http://cr.openjdk.java.net/~pliden/8200168/webrev.0

Testing: Passed hs-tier{1,2}

/Per

From rwestrel at redhat.com Fri Mar 23 12:19:29 2018
From: rwestrel at redhat.com (Roland Westrelin)
Date: Fri, 23 Mar 2018 13:19:29 +0100
Subject: RFR(M): 8193130: Bad graph when unrolled loop bounds conflicts with range checks
In-Reply-To: <6862ee3b-58e5-b56e-6945-1d2a78606f02@oracle.com>
References: <908c1b6d-367a-2928-bb6b-203e2d6cb7bb@oracle.com>
 <3da586e3-f363-c47e-e106-ffdccafd7e1b@oracle.com>
 <6862ee3b-58e5-b56e-6945-1d2a78606f02@oracle.com>
Message-ID:

Hi Vladimir,

> http://cr.openjdk.java.net/~roland/8193130/webrev.01/
>
> After going back and forth and debugging code myself I came almost to the same ideas as in this fix.
> I agree with it now. And sorry that it took a long time.

Cool! Thanks for the review, testing and sponsoring.

> I am not sure about a backport into JDK 10, which will be short-lived. The changes are too complex and I
> think they should be "baked" in the current JDK first.

Not backporting is fine with me.

Roland.
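The conflict behind JDK-8200168 comes from the fact that preprocessor macros ignore C++ scopes. The sketch below is a heavily simplified, hypothetical model: the names `RegImpl`, `Register`, `GCHeap` and `register_encoding` are illustrative and are not HotSpot's actual declarations, and the macro shown in the comment is only roughly what the register header defines. It shows that once the register is an ordinary namespace-scope constant instead of a macro, a class-scoped enumerator can reuse the name `G1` without clashing.

```cpp
#include <cassert>

// Hypothetical, heavily simplified register model (not HotSpot's RegisterImpl).
struct RegImpl {
    int encoding;
};
typedef const RegImpl* Register;

static const RegImpl g1_impl = {1};

// Macro style (what DONT_USE_REGISTER_DEFINES suppresses), roughly:
//   #define G1 ((Register)(G1_RegisterEnumValue))
// Once such a macro is visible, the preprocessor rewrites every later G1
// token, including the enumerator in GCHeap::Name below, which then fails
// to compile. This sketch therefore uses a plain constant instead:
const Register G1 = &g1_impl;

// The enumerator G1 lives in class scope, so it coexists with ::G1 above.
class GCHeap {
public:
    enum Name { Serial, Parallel, G1 };
};

// Returns the (hypothetical) hardware encoding stored for a register.
int register_encoding(Register r) {
    assert(r != nullptr);
    return r->encoding;
}
```

If the macro were defined before `GCHeap`, the enumerator list would expand to something like `{ Serial, Parallel, ((Register)(G1_RegisterEnumValue)) }`, which is not a valid enumerator declaration; that is the kind of breakage described in the RFC.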
From martin.doerr at sap.com Fri Mar 23 15:04:11 2018 From: martin.doerr at sap.com (Doerr, Martin) Date: Fri, 23 Mar 2018 15:04:11 +0000 Subject: RFR(M): 8198756: Lazy allocation of compiler threads Message-ID: Hi Vladimir, thanks for updating the RFE. I already had similar ideas so I've implemented a prototype. I'll be glad if you can support this effort. My implementation starts only one thread per type (C1/C2) initially. Compiler threads start additional threads depending on the compile queue size, the available memory and the predetermined maximum. The Java Thread objects get created during startup so the Compiler Threads don't need to call Java. The heuristics (in possibly_add_compiler_threads()) are just an initial proposal and we may want to add tuning parameters or different numbers. Threads get stopped in the reverse order of their creation when their compile queue has been empty for some time. The feature can be switched on and off with -XX:+/-UseDynamicNumberOfCompilerThreads. Thread creation and removal can be traced with -XX:+TraceCompilerThreads. Webrev is here: http://cr.openjdk.java.net/~mdoerr/8198756_CompilerCount/webrev.01/ The following issues still need to be addressed: -Test JVMCI support. (I'm not familiar with it.) -Possible memory leaks. I've added some delete calls when a thread dies, but they may be incomplete. -Logging. -Performance and memory consumption evaluation. It would be great to get support and advice for these issues. Best regards, Martin From lesliezhai at llvm.org.cn Fri Mar 23 15:11:44 2018 From: lesliezhai at llvm.org.cn (Leslie Zhai) Date: Fri, 23 Mar 2018 23:11:44 +0800 Subject: How to use gdb to debug C1 compiler's internal error?
In-Reply-To: <119dcd61-401e-1cc5-41b0-0dd88c98543f@oracle.com> References: <98099178-e21f-b7e8-444c-a5f4f7d5790f@llvm.org.cn> <119dcd61-401e-1cc5-41b0-0dd88c98543f@oracle.com> Message-ID: <9f0312a6-16bf-2d29-c207-ca2d688b92e3@llvm.org.cn> Hi Dean, Thanks for your response! On 2018-03-23 04:59, dean.long at oracle.com wrote: > Gdb is not very useful for getting stack backtraces in generated JIT > code, and it wouldn't know where to start because it apparently jumped > to 0x000000000000dead. I suggest trying -XX:+C1Breakpoint and then > single-stepping through the generated code. It works :) Thread 2 "java" received signal SIGSEGV, Segmentation fault. 0x00007fffe1214138 in ?? () (gdb) call help() "Executing help" basic pp(void* p) - try to make sense of p pv(intptr_t p)- ((PrintableResourceObj*) p)->print() ps() - print current thread stack pss() - print all thread stacks pm(int pc) - print Method* given compiled PC findm(intptr_t pc) - finds Method* find(intptr_t x) - finds & prints nmethod/stub/bytecode/oop based on pointer into it pns(void* sp, void* fp, void* pc) - print native (i.e. mixed) stack trace. E.g. pns($sp, $rbp, $pc) on Linux/amd64 and Solaris/amd64 or pns($sp, $ebp, $pc) on Linux/x86 or pns($sp, 0, $pc) on Linux/ppc64 or pns($sp + 0x7ff, 0, $pc) on Solaris/SPARC - in gdb do 'set overload-resolution off' before calling pns() - in dbx do 'frame 1' before calling pns() misc. flush() - flushes the log file events() - dump events from ring buffers compiler debugging debug() - to set things up for compiler debugging ndebug() - undo debug (gdb) call find(0x00007fffe1214138) "Executing find" 0x00007fffe1214138 is at entry_point+56 in (nmethod*)0x00007fffe1213f90 Compiled method (c2) 17107 33 4 java.util.Properties::getProperty (46 bytes) total in heap
[0x00007fffe1213f90,0x00007fffe1214370] = 992 relocation [0x00007fffe12140c0,0x00007fffe12140f0] = 48 main code [0x00007fffe1214100,0x00007fffe12141a0] = 160 stub code [0x00007fffe12141a0,0x00007fffe12141d8] = 56 metadata [0x00007fffe12141d8,0x00007fffe12141e8] = 16 scopes data [0x00007fffe12141e8,0x00007fffe1214248] = 96 scopes pcs [0x00007fffe1214248,0x00007fffe1214328] = 224 dependencies [0x00007fffe1214328,0x00007fffe1214330] = 8 handler table [0x00007fffe1214330,0x00007fffe1214360] = 48 nul chk table [0x00007fffe1214360,0x00007fffe1214370] = 16 (gdb) call pns($sp, $rbp, $pc) "Executing pns" Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) ... And I found that someone else experienced a similar case: http://llvm.org/docs/DebuggingJITedCode.html > > > dl > > On 3/22/18 9:32 AM, Leslie Zhai wrote: >> Hi HotSpot compiler developers, >> >> I am new to the HotSpot C1 compiler, and I am trying to implement a new >> greedy register allocation skeleton for academic research, but I might have >> wrongly modified some code, for example, >> `Runtime1::generate_handle_exception` in >> jdk/src/hotspot/cpu/x86/c1_Runtime1_x86.cpp, and then `install_code` >> failed to work and threw this internal error: >> >> ... >> >> [Stub Code] >> 0x00007fffe13752a0: mov $0x0,%rbx ; {no_reloc} >> 0x00007fffe13752aa: jmpq 0x00007fffe13752aa ; {runtime_call} >> [Exception Handler] >> 0x00007fffe13752af: jmpq 0x00007fffe1004ee0 ; {runtime_call} >> [Deopt Handler Code] >> 0x00007fffe13752b4: callq 0x00007fffe13752b9 >> 0x00007fffe13752b9: subq $0x5,(%rsp) >> 0x00007fffe13752be: jmpq 0x00007fffe11072e0 ; {runtime_call} >> 0x00007fffe13752c3: hlt >> 0x00007fffe13752c4: hlt >> 0x00007fffe13752c5: hlt >> 0x00007fffe13752c6: hlt >> 0x00007fffe13752c7: hlt >> Decoding compiled method 0x00007fffe136d310: >> Code: >> [Entry Point] >>
# {method} {0x00007fffe015e0e0} 'fillInStackTrace' >> '(I)Ljava/lang/Throwable;' in 'java/lang/Throwable' >> # this: rsi:rsi = 'java/lang/Throwable' >> # parm0: rdx = int >> # [sp+0x50] (sp of caller) >> 0x00007fffe136d4a0: mov 0x8(%rsi),%r10d >> 0x00007fffe136d4a4: shl $0x3,%r10 >> 0x00007fffe136d4a8: cmp %r10,%rax >> 0x00007fffe136d4ab: je 0x00007fffe136d4b8 >> 0x00007fffe136d4b1: jmpq 0x00007fffe1105c40 ; {runtime_call} >> 0x00007fffe136d4b6: nop >> 0x00007fffe136d4b7: nop >> [Verified Entry Point] >> 0x00007fffe136d4b8: mov %eax,-0x16000(%rsp) >> 0x00007fffe136d4bf: push %rbp >> 0x00007fffe136d4c0: mov %rsp,%rbp >> 0x00007fffe136d4c3: sub $0x40,%rsp >> 0x00007fffe136d4c7: mov %rsp,%rax >> 0x00007fffe136d4ca: and $0xfffffffffffffff0,%rax >> 0x00007fffe136d4ce: cmp %rsp,%rax >> 0x00007fffe136d4d1: je 0x00007fffe136d54e >> 0x00007fffe136d4d7: mov %rsp,-0x28(%rsp) >> 0x00007fffe136d4dc: sub $0x80,%rsp >> 0x00007fffe136d4e3: mov %rax,0x78(%rsp) >> 0x00007fffe136d4e8: mov %rcx,0x70(%rsp) >> 0x00007fffe136d4ed: mov %rdx,0x68(%rsp) >> 0x00007fffe136d4f2: mov %rbx,0x60(%rsp) >> 0x00007fffe136d4f7: mov %rbp,0x50(%rsp) >> 0x00007fffe136d4fc: mov %rsi,0x48(%rsp) >> 0x00007fffe136d501: mov %rdi,0x40(%rsp) >> 0x00007fffe136d506: mov %r8,0x38(%rsp) >> 0x00007fffe136d50b: mov %r9,0x30(%rsp) >> 0x00007fffe136d510: mov %r10,0x28(%rsp) >> 0x00007fffe136d515: mov %r11,0x20(%rsp) >> 0x00007fffe136d51a: mov %r12,0x18(%rsp) >> 0x00007fffe136d51f: mov %r13,0x10(%rsp) >> 0x00007fffe136d524: mov %r14,0x8(%rsp) >> 0x00007fffe136d529: mov %r15,(%rsp) >> 0x00007fffe136d52d: mov $0x7ffff6dbea09,%rdi ; {external_word} >> 0x00007fffe136d537: mov $0x7fffe136d4d7,%rsi ; {internal_word} >> 0x00007fffe136d541: mov %rsp,%rdx >> 0x00007fffe136d544: and $0xfffffffffffffff0,%rsp >>
0x00007fffe136d548: callq 0x00007ffff68211ee ; {runtime_call} >> 0x00007fffe136d54d: hlt >> ;; move 1 -> 2 >> ;; move 0 -> 1 >> 0x00007fffe136d54e: mov %rsi,(%rsp) >> 0x00007fffe136d552: cmp $0x0,%rsi >> 0x00007fffe136d556: lea (%rsp),%rsi >> 0x00007fffe136d55a: cmove (%rsp),%rsi ; OopMap{[0]=Oop >> off=191} >> 0x00007fffe136d55f: mov $0x7fffe136d55f,%r10 ; {section_word} >> 0x00007fffe136d569: mov %r10,0x208(%r15) >> 0x00007fffe136d570: mov %rsp,0x200(%r15) >> 0x00007fffe136d577: cmpb $0x0,0x1602de2c(%rip) # >> 0x00007ffff739b3aa >> ; {external_word} >> 0x00007fffe136d57e: je 0x00007fffe136d5b8 >> 0x00007fffe136d584: push %rsi >> 0x00007fffe136d585: push %rdx >> 0x00007fffe136d586: mov $0x7fffe015e0e0,%rsi ; >> {metadata({method} {0x00007fffe015e0e0} 'fillInStackTrace' >> '(I)Ljava/lang/Throwable;' in 'java/lang/Throwable')} >> 0x00007fffe136d590: mov %r15,%rdi >> 0x00007fffe136d593: test $0xf,%esp >> 0x00007fffe136d599: je 0x00007fffe136d5b1 >> 0x00007fffe136d59f: sub $0x8,%rsp >> 0x00007fffe136d5a3: callq 0x00007ffff69c48ae ; {runtime_call} >> 0x00007fffe136d5a8: add $0x8,%rsp >> 0x00007fffe136d5ac: jmpq 0x00007fffe136d5b6 >> 0x00007fffe136d5b1: callq 0x00007ffff69c48ae ; {runtime_call} >> 0x00007fffe136d5b6: pop %rdx >> 0x00007fffe136d5b7: pop %rsi >> 0x00007fffe136d5b8: lea 0x220(%r15),%rdi >> 0x00007fffe136d5bf: movl $0x4,0x298(%r15) >> 0x00007fffe136d5ca: callq 0x00007ffff4f55fef ; {runtime_call} >> 0x00007fffe136d5cf: vzeroupper >> 0x00007fffe136d5d2: movl $0x5,0x298(%r15) >> 0x00007fffe136d5dd: mov %r15d,%ecx >> 0x00007fffe136d5e0: shr $0x4,%ecx >> 0x00007fffe136d5e3: and $0xffc,%ecx >> 0x00007fffe136d5e9: mov $0x7ffff7ff3000,%r10 ; {external_word} >> 0x00007fffe136d5f3: mov %ecx,(%r10,%rcx,1) >> 0x00007fffe136d5f7: cmpl
$0x0,0x1603f89f(%rip) # >> 0x00007ffff73acea0 >> ; {external_word} >> 0x00007fffe136d601: jne 0x00007fffe136d615 >> 0x00007fffe136d607: cmpl $0x0,0x30(%r15) >> 0x00007fffe136d60f: je 0x00007fffe136d636 >> 0x00007fffe136d615: mov %rax,-0x8(%rbp) >> 0x00007fffe136d619: mov %r15,%rdi >> 0x00007fffe136d61c: mov %rsp,%r12 >> 0x00007fffe136d61f: sub $0x0,%rsp >> 0x00007fffe136d623: and $0xfffffffffffffff0,%rsp >> 0x00007fffe136d627: callq 0x00007ffff6a691da ; {runtime_call} >> 0x00007fffe136d62c: mov %r12,%rsp >> 0x00007fffe136d62f: xor %r12,%r12 >> 0x00007fffe136d632: mov -0x8(%rbp),%rax >> 0x00007fffe136d636: movl $0x8,0x298(%r15) >> 0x00007fffe136d641: cmpl $0x1,0x2c4(%r15) >> 0x00007fffe136d64c: je 0x00007fffe136d6e8 >> 0x00007fffe136d652: cmpb $0x0,0x1602dd51(%rip) # >> 0x00007ffff739b3aa >> ; {external_word} >> 0x00007fffe136d659: je 0x00007fffe136d697 >> 0x00007fffe136d65f: mov %rax,-0x8(%rbp) >> 0x00007fffe136d663: mov $0x7fffe015e0e0,%rsi ; >> {metadata({method} {0x00007fffe015e0e0} 'fillInStackTrace' >> '(I)Ljava/lang/Throwable;' in 'java/lang/Throwable')} >> 0x00007fffe136d66d: mov %r15,%rdi >> 0x00007fffe136d670: test $0xf,%esp >> 0x00007fffe136d676: je 0x00007fffe136d68e >> 0x00007fffe136d67c: sub $0x8,%rsp >> 0x00007fffe136d680: callq 0x00007ffff69c4ab8 ; {runtime_call} >> 0x00007fffe136d685: add $0x8,%rsp >> 0x00007fffe136d689: jmpq 0x00007fffe136d693 >> 0x00007fffe136d68e: callq 0x00007ffff69c4ab8 ; {runtime_call} >> 0x00007fffe136d693: mov -0x8(%rbp),%rax >> 0x00007fffe136d697: mov $0x0,%r10 >> 0x00007fffe136d6a1: mov %r10,0x200(%r15) >> 0x00007fffe136d6a8: mov $0x0,%r10 >> 0x00007fffe136d6b2: mov %r10,0x208(%r15) >> 0x00007fffe136d6b9: test %rax,%rax >> 0x00007fffe136d6bc: je
0x00007fffe136d6c5 >> 0x00007fffe136d6c2: mov (%rax),%rax >> 0x00007fffe136d6c5: mov 0x38(%r15),%rcx >> 0x00007fffe136d6c9: movl $0x0,0x108(%rcx) >> 0x00007fffe136d6d3: leaveq >> 0x00007fffe136d6d4: cmpq $0x0,0x8(%r15) >> 0x00007fffe136d6dc: jne 0x00007fffe136d6e3 >> 0x00007fffe136d6e2: retq >> 0x00007fffe136d6e3: jmpq Stub::forward exception ; {runtime_call} >> 0x00007fffe136d6e8: mov %rax,-0x8(%rbp) >> 0x00007fffe136d6ec: mov %rsp,%r12 >> 0x00007fffe136d6ef: sub $0x0,%rsp >> 0x00007fffe136d6f3: and $0xfffffffffffffff0,%rsp >> 0x00007fffe136d6f7: callq 0x00007ffff69c8b64 ; {runtime_call} >> 0x00007fffe136d6fc: mov %r12,%rsp >> 0x00007fffe136d6ff: xor %r12,%r12 >> 0x00007fffe136d702: mov -0x8(%rbp),%rax >> 0x00007fffe136d706: jmpq 0x00007fffe136d652 >> 0x00007fffe136d70b: hlt >> 0x00007fffe136d70c: hlt >> 0x00007fffe136d70d: hlt >> 0x00007fffe136d70e: hlt >> 0x00007fffe136d70f: hlt >> >> # >> # A fatal error has been detected by the Java Runtime Environment: >> # >> # SIGSEGV (0xb) at pc=0x000000000000dead, pid=2174, >> tid=0x00007ffff7fc8700 >> # >> # JRE version: OpenJDK Runtime Environment (8.0) (build >> 1.8.0-internal-debug-xiangzhai_2018_03_19_20_27-b00) >> # Java VM: OpenJDK 64-Bit Server VM (25.71-b00-debug compiled mode >> linux-amd64 compressed oops) >> # Problematic frame: >> # C 0x000000000000dead >> # >> # Core dump written. Default location: >> /data/project/openjdk/jdk8u/hotspot/test/compiler/5057225/core or >> core.2174 >> # >> # An error report file with more information is saved as: >> # >> /data/project/openjdk/jdk8u/hotspot/test/compiler/5057225/hs_err_pid2174.log >> >> Compiled method (c1) 21870 156 ! 3 >> java.lang.ClassLoader::loadClass (122 bytes) >> total in heap [0x00007fffe12bcc90,0x00007fffe12beee0] = 8784 >> relocation [0x00007fffe12bcdc0,0x00007fffe12bcfb8] = 504 >> main code
[0x00007fffe12bcfc0,0x00007fffe12be2c0] = 4864 >> stub code [0x00007fffe12be2c0,0x00007fffe12be460] = 416 >> metadata [0x00007fffe12be460,0x00007fffe12be4a0] = 64 >> scopes data [0x00007fffe12be4a0,0x00007fffe12be848] = 936 >> scopes pcs [0x00007fffe12be848,0x00007fffe12becd8] = 1168 >> dependencies [0x00007fffe12becd8,0x00007fffe12bece0] = 8 >> handler table [0x00007fffe12bece0,0x00007fffe12beea8] = 456 >> nul chk table [0x00007fffe12beea8,0x00007fffe12beee0] = 56 >> Compiled method (c1) 21871 156 ! 3 >> java.lang.ClassLoader::loadClass (122 bytes) >> total in heap [0x00007fffe12bcc90,0x00007fffe12beee0] = 8784 >> relocation [0x00007fffe12bcdc0,0x00007fffe12bcfb8] = 504 >> main code [0x00007fffe12bcfc0,0x00007fffe12be2c0] = 4864 >> stub code [0x00007fffe12be2c0,0x00007fffe12be460] = 416 >> metadata [0x00007fffe12be460,0x00007fffe12be4a0] = 64 >> scopes data [0x00007fffe12be4a0,0x00007fffe12be848] = 936 >> scopes pcs [0x00007fffe12be848,0x00007fffe12becd8] = 1168 >> dependencies [0x00007fffe12becd8,0x00007fffe12bece0] = 8 >> handler table [0x00007fffe12bece0,0x00007fffe12beea8] = 456 >> nul chk table [0x00007fffe12beea8,0x00007fffe12beee0] = 56 >> # >> # If you would like to submit a bug report, please visit: >> # http://bugreport.java.com/bugreport/crash.jsp >> # >> Current thread is 140737353910016 >> Dumping core ... >> >> [Switching to Thread 0x7ffff7fc8700 (LWP 2178)] >> __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51 >> 51 } >> (gdb) bt >> #0 __GI_raise (sig=sig@entry=6) at >> ../sysdeps/unix/sysv/linux/raise.c:51 >> #1 0x00007ffff740c4da in __GI_abort () at abort.c:89 >> #2 0x00007ffff6905d0b in os::abort (dump_core=true) >> at >> /data/project/openjdk/jdk8u/hotspot/src/os/linux/vm/os_linux.cpp:1515 >> #3 0x00007ffff6ac75fc in VMError::report_and_die (this=0x7ffff7fc6400) >>
at >> /data/project/openjdk/jdk8u/hotspot/src/share/vm/utilities/vmError.cpp:1060 >> #4 0x00007ffff6ac7d29 in crash_handler (sig=11, info=0x7ffff7fc6630, >> ucVoid=0x7ffff7fc6500) >> at >> /data/project/openjdk/jdk8u/hotspot/src/os/linux/vm/vmError_linux.cpp:106 >> #5 >> #6 0x00007ffff690071a in os::print_hex_dump (st=0x7ffff7fc6c30, >> start=0xde8d , >> end=0xdecd , >> unitsize=1) >> at >> /data/project/openjdk/jdk8u/hotspot/src/share/vm/runtime/os.cpp:802 >> #7 0x00007ffff691328e in os::print_context (st=0x7ffff7fc6c30, >> context=0x7ffff7fc6f00) >> at >> /data/project/openjdk/jdk8u/hotspot/src/os_cpu/linux_x86/vm/os_linux_x86.cpp:823 >> #8 0x00007ffff6ac5adb in VMError::report (this=0x7ffff7fc6d50, >> st=0x7ffff7fc6c30) >> at >> /data/project/openjdk/jdk8u/hotspot/src/share/vm/utilities/vmError.cpp:534 >> #9 0x00007ffff6ac70cc in VMError::report_and_die (this=0x7ffff7fc6d50) >> at >> /data/project/openjdk/jdk8u/hotspot/src/share/vm/utilities/vmError.cpp:971 >> #10 0x00007ffff6912bde in JVM_handle_linux_signal (sig=11, >> info=0x7ffff7fc7030, ucVoid=0x7ffff7fc6f00, >> abort_if_unrecognized=1) >> at >> /data/project/openjdk/jdk8u/hotspot/src/os_cpu/linux_x86/vm/os_linux_x86.cpp:541 >> #11 0x00007ffff690be1d in signalHandler (sig=11, info=0x7ffff7fc7030, >> uc=0x7ffff7fc6f00) >> at >> /data/project/openjdk/jdk8u/hotspot/src/os/linux/vm/os_linux.cpp:4435 >> #12 >> ... >> >> So a backtrace or breakpoint might be helpful for debugging the >> compiling thread, but doesn't work for the running thread? I am reading >> Analyzing and Debugging the HotSpot VM at the OS Level[1]; please give >> me some advice, thanks a lot!
>> >> [1] http://www.progdoc.de/papers/JavaOne2014/javaone2014_con3138.html >> > -- Regards, Leslie Zhai From lutz.schmidt at sap.com Fri Mar 23 16:30:13 2018 From: lutz.schmidt at sap.com (Schmidt, Lutz) Date: Fri, 23 Mar 2018 16:30:13 +0000 Subject: RFR(L): 8198691: CodeHeap State Analytics In-Reply-To: <65dd8816-620f-68e9-7f3b-f221aa0a43de@oracle.com> References: <250CD3F0-F4AB-4EA1-9EA9-8136442FDFF9@sap.com> <1b411eee-29c8-b977-96b0-45145ebbfcf9@oracle.com> <8EB0096D-1020-4E86-82E1-2D7D564601AE@sap.com> <14c67b9d-4e1d-cb49-b29a-0d4cd02fdcc9@oracle.com> <2c4bac01-9775-2abc-01b6-f20bb580c257@oracle.com> <201E8551-1024-4BF2-996D-3387E6C1D52A@sap.com> <1ed9df48-cf5e-8755-f0f5-c25ed0efe8fd@oracle.com> <0C762F33-344E-49AB-B5DC-0553BFB67533@sap.com> <65dd8816-620f-68e9-7f3b-f221aa0a43de@oracle.com> Message-ID: <7B32C15D-EF32-4544-BAA4-3C7AAD99ED55@sap.com> Hi Vladimir, Tobias, I have worked on your comments for quite some time. There were changes to - share/code/codeCache.cpp - share/code/codeHeapState.cpp - share/compiler/compileBroker.cpp - share/memory/heap.hpp - share/runtime/java.cpp Here is a summary of what I changed/reworked/adapted: - The lock order problem is solved. - The CodeHeapStateAnalytics_lock o is acquired before the "aggregate" step begins. o is held continuously during the aggregate and print function. o is released at function return (after all work is done). o only protects the static variables from modification by other threads. - The CodeCache_lock o is acquired after the CodeHeapStateAnalytics_lock and only if an "aggregate" step is to be performed. o its hold time was never observed to be more than one second. Not a guarantee, though. - The tty_lock is never acquired during the "aggregate" step, so there is no interference with the CodeCache_lock. - In the print* functions, blocks that need to stay together are first composed into a bufferedStream (size 4k). They are then printed to the given outputStream under tty_lock.
- The remaining out->print_cr() calls are left in on purpose. They print diagnostic info if some internal inconsistency is found. - The OpenJDK code style wrt. if-then-else should now be respected everywhere. - The commented lines you mentioned (codeHeapState.cpp/.hpp) are gone. - The "coding alternatives" for printing to the log stream are gone. SAP-internal testing against SAP JVM did not reveal any problems. Testing OpenJDK (jdk/hs repo, linuxx86_64) was all green. Other platforms did not run due to system issues. There is a new webrev at http://cr.openjdk.java.net/~lucy/webrevs/8198691.02/ Thanks for spending some time, again, on this RFR. Best regards, Lutz On 20.03.18, 23:01, "Vladimir Kozlov" wrote: As I remember, we try to lock tty outside of the print functions. Yes, it could be troublesome if it is MBs of output, especially when you do it for the "full codecache" event while the VM is still running. You also take CodeCache_lock in print_heapinfo(), and it would not be good to hold both locks at the same time. I think having "micro locking" (with comments) in print_heapinfo() is better than having a lock in each print function. Vladimir On 3/20/18 11:57 AM, Schmidt, Lutz wrote: > Hi Vladimir, > I already saw that code but was a little hesitant to code it the same way. Why? In my case, the stringStream buffer could become fairly large. The actual size depends on the CodeHeap size and contents as well as the printing parameters. If you tell me some MB are OK, I can change my code. > Thanks, > Lutz > > On 20.03.18, 19:42, "Vladimir Kozlov" wrote: > > I think you should follow what we do with CodeCache::print_summary(): > > http://hg.openjdk.java.net/jdk/hs/file/74db2b7cec75/src/hotspot/share/code/codeCache.cpp#l1359 > > First, print into a local stringStream buffer and then lock tty when printing that buffer. > > Thanks, > Vladimir > > On 3/20/18 11:29 AM, Schmidt, Lutz wrote: > > Hi Tobias, > > > > thank you for uncovering this.
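Vladimir's recommendation is to format a block into a local buffer without holding any lock, then take the output lock once to emit it. That pattern can be sketched in portable C++. This is an illustration only, under two assumptions: `std::ostringstream` and `std::mutex` stand in for HotSpot's `stringStream` and `tty_lock`, and `print_block` is a hypothetical helper, not a HotSpot function.

```cpp
#include <cassert>
#include <mutex>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Stand-in for tty_lock: serializes writes to the shared output stream.
static std::mutex output_lock;

// Emits one block of output that must stay together. All formatting happens
// in a local buffer before the lock is taken, so no other lock is ever
// acquired while output_lock is held.
void print_block(std::ostream& out, const std::string& title,
                 const std::vector<int>& values) {
  std::ostringstream buf;  // local buffer, like a stringStream
  buf << title << '\n';
  for (int v : values) {
    buf << "  " << v << '\n';
  }
  // Hold the lock only for the single write of the finished block.
  std::lock_guard<std::mutex> guard(output_lock);
  out << buf.str();
}
```

Because the lock is held for a single write and nothing else is acquired inside it, this shape also avoids the kind of lock-ordering failure reported later in this thread. The trade-off Lutz raises still applies: the buffer grows with the size of the block being printed.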
In CodeCache::report_codemem_full() I overlooked that the tty lock is held at the place where I inserted the call to CompileBroker::print_heapinfo(). > > > > That bug triggered some thoughts in my brain, resulting in a question or two: > > > > Given the complex output of CompileBroker::print_heapinfo(), what would be the OpenJDK approach to tty locking? > > > > Should I do "micro locking", trying to keep together only small blocks? That's what is implemented now. > > Should I lock tty before each call to a print function (like print_usedSpace, print_freeSpace, ...)? > > > > Either approach has its advantages, so I'm more or less neutral. What do you all think? > > > > Depending on what the community favors, I will move the locks accordingly. > > > > Thanks, > > Lutz > > > > > > On 20.03.18, 15:45, "Tobias Hartmann" wrote: > > > > Hi Lutz, > > > > I've already started testing with -Xlog:codecache=Debug and found a problem: > > > > The following tests > > compiler/whitebox/AllocationCodeBlobTest.java > > compiler/codecache/OverflowCodeCacheTest.java > > compiler/codecache/stress/ReturnBlobToWrongHeapTest.java > > compiler/codecache/stress/RandomAllocationTest.java > > compiler/profiling/spectrapredefineclass_classloaders/Launcher.java > > compiler/profiling/spectrapredefineclass/Launcher.java > > > > fail with > > # fatal error: acquiring lock CodeHeapStateAnalytics_lock/6 out of order with lock tty_lock/0 -- > > possible deadlock > > > > Let me know if you need more information to reproduce! > > > > Thanks, > > Tobias > > > > On 20.03.2018 11:25, Schmidt, Lutz wrote: > > > Hi Tobias, > > > > > > thank you! If you haven't started yet, you may want to hold off testing for a moment. I will remove the comments Vladimir and you complained about and update the webrev. It's comments only, but you never know... > > > > > > Thanks, > > > Lutz > > > > > > On 20.03.18, 10:46, "Tobias Hartmann" wrote: > > > > > > Hi Lutz, > > > > > > very nice work!
Thanks for incorporating the requested changes. I think you can remove the commented-out > > > LogStream code. > > > > > > I'll re-run the tests that failed with the last webrev. > > > > > > Best regards, > > > Tobias > > > > > > On 19.03.2018 17:00, Schmidt, Lutz wrote: > > > > Dear all, > > > > > > > > this is the next (second) iteration of my CodeHeap State Analytics effort. It reflects all the comments and suggestions I received on my initial RFR (sent out on March 1st). Please read on to learn what was changed and what was kept as is. > > > > > > > > May I please request reviews for > > > > > > > > Bug: https://bugs.openjdk.java.net/browse/JDK-8198691 > > > > Webrev: http://cr.openjdk.java.net/~lucy/webrevs/8198691.01/ > > > > > > > > Instead of keeping the long tail of comments and responses, I decided to provide a summary of what happened. > > > > - Most of the new code was moved to new files: share/code/codeHeapState.cpp and share/code/codeHeapState.hpp > > > > - I have added, as requested, an abbreviated version of the "General Description" chapter to codeHeapState.cpp > > > > - In case of SegmentedCodeCache, the iteration is limited to FOR_ALL_ALLOCABLE_HEAPS(). There were issues in aot tests when using FOR_ALL_HEAPS(). > > > > - All references to the RFE id should be gone. > > > > - In share/runtime/java.cpp, the call to CompileBroker::print_heapinfo() is now close to "PrintCodeCache" for both the product and non-product cases. > > > > - The edited/updated documentation is available as an attachment to the bug (in PDF format). > > > > - I added code to share/code/codeCache.cpp (report_codemem_full()) to print the CodeHeap state for the first occurrence of the "full" condition. > > > > - The code style "hiccups" noted by Tobias Hartmann are gone. > > > > - The compile time warnings and errors are resolved. > > > > > > > > -XX:+PrintCodeHeapState vs.
-Xlog:codecache=Trace > > > > I clearly understand and support the intention to get rid of the Print* command line arguments. Therefore, the PrintCodeHeapState command line argument is gone. You can request the CodeHeap state analytics with the -Xlog:codecache=Trace (vm shutdown) or -Xlog:codecache=Debug (CodeCache full and vm shutdown) switches. The output is directed to tty, not to the log stream. > > > > > > > > The reason for not using the log stream is simple: UL prefixes every line with a timestamp and the trace tags. Unfortunately, that messes up my formatting big time. The jcmd output, on the other hand, will not have the UL prefixes. I would have to distinguish between UL and jcmd output when formatting. In addition, I do not see a benefit in adding the same UL prefix to thousands of lines. > > > > > > > > Comments are very welcome! > > > > > > > > Best Regards, > > > > Lutz > > > > From vladimir.kozlov at oracle.com Fri Mar 23 17:16:49 2018 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 23 Mar 2018 10:16:49 -0700 Subject: RFR(M): 8198756: Lazy allocation of compiler threads In-Reply-To: References: Message-ID: <864a492b-6f14-2372-2783-5f687bc43638@oracle.com> Very cool! A few thoughts. You can't delete the thread when it is NULL (missing check, or refactor the code): if (thread == NULL || thread->osthread() == NULL) { + if (UseDynamicNumberOfCompilerThreads && comp->num_compiler_threads() > 0) { + delete thread; Why not keep a handle instead of returning a naked oop from create_thread_oop()?
You create the Handle again. Start field names with _ to distinguish them from local variables: + static int c1_count, c2_count; In possibly_add_compiler_threads() you can use c2_count instead of calling compile_count() again, and the array size is already fixed: + int new_c2_count = MIN3(_c2_compile_queue->size() / 2, + CompilationPolicy::policy()->compiler_count(CompLevel_full_optimization), And I thought we would need to add only one thread each time we hit some queue size threshold. At startup the queues fill up very fast, so you may end up creating all compiler threads. Or we may need a more complex formula. We may need to free the corresponding Java thread object when we remove compiler threads. Thanks, Vladimir On 3/23/18 8:04 AM, Doerr, Martin wrote: > Hi Vladimir, > > thanks for updating the RFE. I already had similar ideas so I've implemented a prototype. > > I'll be glad if you can support this effort. > > My implementation starts only one thread per type (C1/C2) initially. Compiler threads start additional threads depending > on the compile queue size, the available memory and the predetermined maximum. The Java Thread objects get created > during startup so the Compiler Threads don't need to call Java. > > The heuristics (in possibly_add_compiler_threads()) are just an initial proposal and we may want to add tuning > parameters or different numbers. > > Threads get stopped in the reverse order of their creation when their compile queue has been empty for some time. > > The feature can be switched on and off with -XX:+/-UseDynamicNumberOfCompilerThreads. Thread creation and removal can be traced with > -XX:+TraceCompilerThreads. > > Webrev is here: > > http://cr.openjdk.java.net/~mdoerr/8198756_CompilerCount/webrev.01/ > > The following issues still need to be addressed: > > -Test JVMCI support. (I'm not familiar with it.) > > -Possible memory leaks. I've added some delete calls when a thread dies, but they may be incomplete. > > -Logging.
> > -Performance and memory consumption evaluation. > > It would be great to get support and advice for these issues. > > Best regards, > > Martin > From martin.doerr at sap.com Fri Mar 23 17:37:51 2018 From: martin.doerr at sap.com (Doerr, Martin) Date: Fri, 23 Mar 2018 17:37:51 +0000 Subject: RFR(M): 8198756: Lazy allocation of compiler threads In-Reply-To: <864a492b-6f14-2372-2783-5f687bc43638@oracle.com> References: <864a492b-6f14-2372-2783-5f687bc43638@oracle.com> Message-ID: <41d9a441f84e41919f4566df78b46a0f@sap.com> Hi Vladimir, thanks for the quick reply. Just a few answers. I'll take a closer look next week. > You can't delete the thread when it is NULL C++ supports calling delete on NULL, so I think it would be unusual to check for it. If there's a problem, I think the delete operator should be fixed. "If expression evaluates to a null pointer value, no destructors are called, and the deallocation function may or may not be called (it's implementation-defined), but the default deallocation functions are guaranteed to do nothing when handed a null pointer." [1] > We may need to free the corresponding Java thread object when we remove compiler threads. I think it would be bad to remove the Java Thread objects because we'd need to recreate them, which is rather expensive and violates the design principle that Compiler Threads are not allowed to call Java. Removing them wouldn't save much memory. Keeping them in global handles seems to be beneficial and makes this change easier. >And I thought we would need to add only one thread each time we hit some queue size threshold. At startup the queues fill up very fast, so you may end up creating all compiler threads. My current formula only creates as many compiler threads as are needed to have 2 compile jobs per thread. I think this is better for startup, but we can reevaluate this. Thanks for the improvement proposals. I'll implement them next week. Nevertheless, the current version can already be tested.
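Below is a minimal sketch of the sizing rule discussed in this exchange. It assumes the behaviour Martin describes:
- about two queued compile jobs per active thread;
- a cap from the configured maximum and from the memory available for new threads;
- threads stopping in reverse creation order after idling.

The struct, the function names and the idle threshold are hypothetical. This is not the code in the webrev.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical snapshot of the inputs for one compiler (C1 or C2).
struct CompilerLoad {
  int queue_size;           // compile tasks currently waiting
  int max_threads;          // predetermined maximum for this compiler (>= 1)
  std::size_t free_memory;  // memory available for additional threads
  std::size_t per_thread;   // estimated footprint of one compiler thread
};

// Number of threads that should be active: about two queued jobs per
// thread, never fewer than one, and never more than the configured
// maximum or what the free memory can accommodate.
int desired_threads(const CompilerLoad& load) {
  int by_queue = load.queue_size / 2;
  int by_memory = load.max_threads;
  if (load.per_thread > 0) {
    std::size_t fits = load.free_memory / load.per_thread;
    if (fits < static_cast<std::size_t>(load.max_threads)) {
      by_memory = static_cast<int>(fits);
    }
  }
  int wanted = std::min(by_queue, std::min(load.max_threads, by_memory));
  return std::max(1, wanted);
}

// Threads stop in reverse creation order: only the most recently started
// thread (index active - 1) may exit, and only after idling long enough.
// Thread 0 always stays, so each compiler keeps at least one thread.
bool may_stop(int thread_index, int active, long idle_ms, long idle_limit_ms) {
  return thread_index > 0 && thread_index == active - 1 &&
         idle_ms >= idle_limit_ms;
}
```

Vladimir's alternative of adding a single thread per check would change only how quickly the active count approaches this target. It trades a slower ramp-up for fewer threads created during the startup burst.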
Best regards, Martin [1] http://en.cppreference.com/w/cpp/language/delete -----Original Message----- From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] Sent: Friday, 23 March 2018 18:17 To: Doerr, Martin Cc: Igor Veresov (igor.veresov at oracle.com) ; White, Derek ; 'hotspot-compiler-dev at openjdk.java.net' Subject: Re: RFR(M): 8198756: Lazy allocation of compiler threads Very cool! A few thoughts. You can't delete the thread when it is NULL (missing check, or refactor the code): if (thread == NULL || thread->osthread() == NULL) { + if (UseDynamicNumberOfCompilerThreads && comp->num_compiler_threads() > 0) { + delete thread; Why not keep a handle instead of returning a naked oop from create_thread_oop()? You create the Handle again. Start field names with _ to distinguish them from local variables: + static int c1_count, c2_count; In possibly_add_compiler_threads() you can use c2_count instead of calling compile_count() again, and the array size is already fixed: + int new_c2_count = MIN3(_c2_compile_queue->size() / 2, + CompilationPolicy::policy()->compiler_count(CompLevel_full_optimization), And I thought we would need to add only one thread each time we hit some queue size threshold. At startup the queues fill up very fast, so you may end up creating all compiler threads. Or we may need a more complex formula. We may need to free the corresponding Java thread object when we remove compiler threads. Thanks, Vladimir On 3/23/18 8:04 AM, Doerr, Martin wrote: > Hi Vladimir, > > thanks for updating the RFE. I already had similar ideas so I've implemented a prototype. > > I'll be glad if you can support this effort. > > My implementation starts only one thread per type (C1/C2) initially. Compiler threads start additional threads depending > on the compile queue size, the available memory and the predetermined maximum. The Java Thread objects get created > during startup so the Compiler Threads don't need to call Java.
> > The heuristics (in possibly_add_compiler_threads()) are just an initial proposal and we may want to add tuning > parameters or different numbers. > > Threads get stopped in reverse order as they were created when their compile queue is empty for some time. > > The feature can be switched by -XX:+/-UseDynamicNumberOfCompilerThreads. Thread creating and removal can be traced by > -XX:+TraceCompilerThreads. > > Webrev is here: > > http://cr.openjdk.java.net/~mdoerr/8198756_CompilerCount/webrev.01/ > > The following issues need to get addressed, yet: > > -Test JVMCI support. (I'm not familiar with it.) > > -Possible memory leaks. I've added some delete calls when a thread dies, but they may be incomplete. > > -Logging. > > -Performance and memory consumption evaluation. > > It would be great to get support and advice for these issues. > > Best regards, > > Martin > From vladimir.kozlov at oracle.com Fri Mar 23 21:46:58 2018 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 23 Mar 2018 14:46:58 -0700 Subject: RFR(S) 8200067: Vector Carry-less Multiplication support In-Reply-To: <8D6F463991A1574A8A803B8DA605414F3A748871@ORSMSX111.amr.corp.intel.com> References: <8D6F463991A1574A8A803B8DA605414F3A748871@ORSMSX111.amr.corp.intel.com> Message-ID: Hi Shravya, macroAssembler_x86.cpp: Why did you place the xmm0 initialization before the size check? + movdqu(xmm0, ExternalAddress(StubRoutines::x86::crc_by128_masks_addr() + 32)); I think the initialization and the check should be inside the code guarded by supports_vpclmulqdq(). L_Parallel is not used - there is no jump to it. Thanks, Vladimir On 3/22/18 12:11 PM, Rukmannagari, Shravya wrote: > Hi everyone, > > As per "Intel Architecture Instruction Set Extensions and Future Features Programming Reference" > manual [1], vector carry-less multiplication (vpclmulqdq) instruction will be supported in future > Intel ISA. I have updated the CRC32 algorithm to take advantage of this instruction.
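For readers unfamiliar with the instruction: PCLMULQDQ computes a 64x64-bit carry-less product, which is multiplication of polynomials over GF(2). As we understand the Intel reference, VPCLMULQDQ performs the same operation independently in each 128-bit lane of a wider vector register, and CRC folding is built on this primitive. The scalar C++ reference below (`clmul64` and `U128` are hypothetical helpers, not code from the webrev) could serve to check results.

```cpp
#include <cassert>
#include <cstdint>

struct U128 {
  uint64_t hi;
  uint64_t lo;
};

// Carry-less multiplication of two 64-bit values: partial products are combined
// with XOR instead of addition, so no carries propagate between bit positions.
// This is the per-lane operation of PCLMULQDQ (one 128-bit result per lane).
U128 clmul64(uint64_t a, uint64_t b) {
  U128 r{0, 0};
  for (int i = 0; i < 64; ++i) {
    if ((b >> i) & 1) {
      r.lo ^= a << i;
      if (i != 0) {
        r.hi ^= a >> (64 - i);  // bits of a that were shifted past bit 63
      }
    }
  }
  return r;
}
```

Ordinary multiplication gives 3 * 3 = 9, while the carry-less product is 5 (x^2 + 1): the two middle partial products cancel under XOR.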
I have tested > with Intel SDE [2] to confirm encoding and semantics are correctly implemented. Please take a look > and let me know if you have any questions or comments. > > http://cr.openjdk.java.net/~vdeshpande/ICL_crc32/webrev.00/ > > Thanks, > > Shravya. > > [1] > https://software.intel.com/sites/default/files/managed/c5/15/architecture-instruction-set-extensions-programming-reference.pdf > > [2] https://software.intel.com/en-us/articles/intel-software-development-emulator > > [3] https://bugs.openjdk.java.net/browse/JDK-8200067 > From vladimir.kozlov at oracle.com Fri Mar 23 22:40:01 2018 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 23 Mar 2018 15:40:01 -0700 Subject: RFR(L): 8198691: CodeHeap State Analytics In-Reply-To: <7B32C15D-EF32-4544-BAA4-3C7AAD99ED55@sap.com> References: <250CD3F0-F4AB-4EA1-9EA9-8136442FDFF9@sap.com> <1b411eee-29c8-b977-96b0-45145ebbfcf9@oracle.com> <8EB0096D-1020-4E86-82E1-2D7D564601AE@sap.com> <14c67b9d-4e1d-cb49-b29a-0d4cd02fdcc9@oracle.com> <2c4bac01-9775-2abc-01b6-f20bb580c257@oracle.com> <201E8551-1024-4BF2-996D-3387E6C1D52A@sap.com> <1ed9df48-cf5e-8755-f0f5-c25ed0efe8fd@oracle.com> <0C762F33-344E-49AB-B5DC-0553BFB67533@sap.com> <65dd8816-620f-68e9-7f3b-f221aa0a43de@oracle.com> <7B32C15D-EF32-4544-BAA4-3C7AAD99ED55@sap.com> Message-ID: <439573c8-d24c-59e2-1355-d08df369792b@oracle.com> This looks good. NMethodSweeper::print(out) may be called twice in java.cpp because print_heapinfo() also calls it through print_info(). You removed ttyLocker from NMethodSweeper::print(), and you don't have a lock in CompileBroker::print_info(), which is a significant output block. Other places are fine AFAICS. Thank you for fixing the coding style. Thanks, Vladimir On 3/23/18 9:30 AM, Schmidt, Lutz wrote: > Hi Vladimir, Tobias, > > I have worked on your comments quite some time.
There were changes to > - share/code/codeCache.cpp > - share/code/codeHeapState.cpp > - share/compiler/compileBroker.cpp > - share/memory/heap.hpp > - share/runtime/java.cpp > > Here is a summary of what I changed/reworked/adapted: > - The lock order problem is solved. > - The CodeHeapStateAnalytics_lock > o is acquired before the "aggregate" step is begun. > o is held continuously during the aggregate and print function. > o is released at function return (after all work is done). > o just protects from modification of the static variables by other threads. > - The CodeCache_lock > o is acquired after the CodeHeapStateAnalytics_lock and only if an "aggregate" step is to be performed. > o was never observed to be held for more than one second. Not a guarantee, though. > - The tty_lock is never acquired during the "aggregate" step, so there is no interference with the CodeCache_lock. > - In the print* functions, blocks that need to stay together are first composed into a bufferedStream (size 4k). They are then printed to the given outputStream under tty_lock. > - The remaining out->print_cr() calls are left intentionally. They print diagnostic info if some internal inconsistency is found. > - The OpenJDK code style wrt. if-then-else should now be respected everywhere. > - The commented lines you mentioned (codeHeapState.cpp/.hpp) are gone. > - The "coding alternatives" for printing to the log stream are gone. > > SAP-internal testing against SAP JVM did not reveal any problems. > Testing OpenJDK (jdk/hs repo, linuxx86_64) was all green. Other platforms did not run due to system issues. > > There is a new webrev at http://cr.openjdk.java.net/~lucy/webrevs/8198691.02/ > > Thanks for spending some time, again, on this RFR. > > Best regards, > Lutz > > On 20.03.18, 23:01, "Vladimir Kozlov" wrote: > > As I remember we are trying to lock tty outside print functions. > > Yes, it could be troublesome if it is MBs of output.
Especially when you do it for "full codecache" > event when the VM is still running. You also have CodeCache_lock in print_heapinfo() and it would not be > good to hold both locks at the same time. I think having "micro locking" (with comments) in > print_heapinfo() is better than having a lock in each print function. > > Vladimir > > On 3/20/18 11:57 AM, Schmidt, Lutz wrote: > > Hi Vladimir, > > I already saw that code but was a little hesitant to code the same way. Why? In my case, the stringStream buffer could become fairly large. Actual size depends on CodeHeap size and contents as well as printing parameters. If you tell me a few MB are OK, I can change my code. > > Thanks, > > Lutz > > > > On 20.03.18, 19:42, "Vladimir Kozlov" wrote: > > > > I think you should follow what we do with CodeCache::print_summary(): > > > > http://hg.openjdk.java.net/jdk/hs/file/74db2b7cec75/src/hotspot/share/code/codeCache.cpp#l1359 > > > > First, print into a local stringStream buffer and then lock tty when printing that buffer. > > > > Thanks, > > Vladimir > > > > On 3/20/18 11:29 AM, Schmidt, Lutz wrote: > > > Hi Tobias, > > > > > > thank you for uncovering this. In CodeCache::report_codemem_full() I overlooked that the tty lock is held at the place I inserted the call to CompileBroker::print_heapinfo(). > > > > > > That bug triggered some thoughts in my brain, resulting in a question or two: > > > > > > Given the complex output of CompileBroker::print_heapinfo(), what would be the OpenJDK approach to tty locking? > > > > > > Should I do "micro locking", trying to keep together only small blocks? That's what is implemented now. > > > Should I lock tty before each call to a print function (like print_usedSpace, print_freeSpace, ...)? > > > > > > Either approach has its advantages, so I'm more or less neutral. What do you all think? > > > > > > Depending on what the community favors, I will move the locks accordingly.
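The buffer-then-lock pattern recommended here can be illustrated with standard C++. This is an analogy, not HotSpot code: `std::ostringstream` and `std::mutex` stand in for HotSpot's `stringStream` and tty lock, and `print_block` is a hypothetical function. The block is composed with no lock held, and the output lock is taken only for the single write.

```cpp
#include <cassert>
#include <mutex>
#include <ostream>
#include <sstream>
#include <string>

std::mutex g_output_lock;  // stand-in for the tty lock

// Composes a block of related lines privately, then writes it in one short
// critical section, so output from other threads cannot interleave with it
// and no other lock needs to be held while the output lock is taken.
void print_block(std::ostream& out, const std::string& title,
                 long used_kb, long free_kb) {
  std::ostringstream buf;  // analogous to a local stringStream
  buf << "== " << title << " ==\n";
  buf << "  used: " << used_kb << " KB\n";
  buf << "  free: " << free_kb << " KB\n";

  std::lock_guard<std::mutex> guard(g_output_lock);
  out << buf.str();
}
```

The concern about very large buffers is handled the same way in the revision summarized earlier in this thread: compose one bounded block at a time (4k bufferedStreams there) and lock only around each block's output.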
> > > > > > Thanks, > > > Lutz > > > > > > > > > On 20.03.18, 15:45, "Tobias Hartmann" wrote: > > > > > > Hi Lutz, > > > > > > I've already started testing with -Xlog:codecache=Debug and found a problem: > > > > > > The following tests > > > compiler/whitebox/AllocationCodeBlobTest.java > > > compiler/codecache/OverflowCodeCacheTest.java > > > compiler/codecache/stress/ReturnBlobToWrongHeapTest.java > > > compiler/codecache/stress/RandomAllocationTest.java > > > compiler/profiling/spectrapredefineclass_classloaders/Launcher.java > > > compiler/profiling/spectrapredefineclass/Launcher.java > > > > > > fail with > > > # fatal error: acquiring lock CodeHeapStateAnalytics_lock/6 out of order with lock tty_lock/0 -- > > > possible deadlock > > > > > > Let me know if you need more information to reproduce! > > > > > > Thanks, > > > Tobias > > > > > > On 20.03.2018 11:25, Schmidt, Lutz wrote: > > > > Hi Tobias, > > > > > > > > thank you! If you haven't started yet, you may want to wait with testing a moment. I will remove the comments Vladimir and you complained about and update the webrev. It's comments only, but you never know... > > > > > > > > Thanks, > > > > Lutz > > > > > > > > On 20.03.18, 10:46, "Tobias Hartmann" wrote: > > > > > > > > Hi Lutz, > > > > > > > > very nice work! Thanks for incorporating the requested changes. I think you can remove the commented > > > > LogStream code. > > > > > > > > I'll re-run the tests that failed with the last webrev. > > > > > > > > Best regards, > > > > Tobias > > > > > > > > On 19.03.2018 17:00, Schmidt, Lutz wrote: > > > > > Dear all, > > > > > > > > > > this is the next (second) iteration of my CodeHeap State Analytics effort. It reflects all the comments and suggestions I received on my initial RFR (sent out on March 1st). Please read on to learn what was changed and what kept as is. 
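The fatal error above comes from HotSpot's lock-rank checking. A simplified model of the rule fits in a few lines of C++: a thread may only acquire a lock whose rank is lower than every lock it already holds. This is an illustrative sketch, not HotSpot's Mutex implementation, which has additional rules and exceptions. `RankedMutex` is a hypothetical class, and the ranks 0 for tty_lock and 6 for CodeHeapStateAnalytics_lock are taken from the error message.

```cpp
#include <cassert>
#include <mutex>
#include <vector>

// Simplified model of rank-ordered locking: while a thread holds locks, it may
// only acquire a lock whose rank is strictly lower than the lowest rank it holds.
class RankedMutex {
 public:
  RankedMutex(const char* name, int rank) : name_(name), rank_(rank) {}

  const char* name() const { return name_; }
  int rank() const { return rank_; }

  // Because acquisitions are forced into decreasing rank order, the most recently
  // acquired lock is always the lowest-ranked one held by this thread.
  bool can_acquire() const {
    return held_.empty() || rank_ < held_.back()->rank_;
  }

  void lock() {
    assert(can_acquire() && "acquiring lock out of rank order");
    m_.lock();
    held_.push_back(this);
  }

  // Assumes locks are released in LIFO order, as with scoped lockers.
  void unlock() {
    held_.pop_back();
    m_.unlock();
  }

 private:
  const char* name_;
  int rank_;
  std::mutex m_;
  static thread_local std::vector<RankedMutex*> held_;
};

thread_local std::vector<RankedMutex*> RankedMutex::held_;
```

In this model, acquiring the analytics lock (rank 6) first and a lower-ranked lock afterwards is allowed. Acquiring it while tty_lock (rank 0) is held trips the check, which matches the situation the failing tests reported.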
> > > > > > > > > > May I please request reviews for > > > > > > > > > > Bug: https://bugs.openjdk.java.net/browse/JDK-8198691 > > > > > Webrev: http://cr.openjdk.java.net/~lucy/webrevs/8198691.01/ > > > > > > > > > > Instead of keeping the long tail of comments and responses, I decided to provide a summary of what happened. > > > > > - Most of the new code was moved to new files: share/code/codeHeapState.cpp and share/code/codeHeapState.hpp > > > > > - I have added, as requested, an abbreviated version of the "General Description" chapter to codeHeapState.cpp > > > > > - In case of SegmentedCodeCache, the iteration is limited to FOR_ALL_ALLOCABLE_HEAPS(). There were issues in aot tests when using FOR_ALL_HEAPS(). > > > > > - All references to the RFE id should be gone. > > > > > - In share/runtime/java.cpp, the call to CompileBroker::print_heapinfo() is now close to "PrintCodeCache" for both product and nonproduct cases. > > > > > - The edited/updated documentation is available as an attachment to the bug (in PDF format). > > > > > - I added code to share/code/codeCache.cpp (report_codemem_full()) to print the CodeHeap state for the first occurrence of the "full" condition. > > > > > - The code style "hiccups", noted by Tobias Hartmann, are gone. > > > > > - The compile time warnings and errors are resolved. > > > > > > > > > > -XX:+PrintCodeHeapState vs. -Xlog:codecache=Trace > > > > > I clearly understand and support the intention to get rid of the Print* command line arguments. Therefore, the PrintCodeHeapState command line argument is gone. You can request the CodeHeap state analytics with the -Xlog:codecache=Trace (vm shutdown) or -Xlog:codecache=Debug (CodeCache full and vm shutdown) switches. The output is directed to tty, not to the log stream. > > > > > > > > > > The reason for not using the log stream is simple: UL prefixes every line with a timestamp and the trace tags. Unfortunately, that messes up my formatting big time.
The jcmd output, on the other hand, will not have the UL prefixes. I would have to distinguish between UL and jcmd output when formatting. In addition, I do not see a benefit from adding the same UL prefix to thousands of lines. > > > > > > > > > > Comments are very welcome! > > > > > > > > > > Best Regards, > > > > > Lutz From vladimir.kozlov at oracle.com Sat Mar 24 00:58:27 2018 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 23 Mar 2018 17:58:27 -0700 Subject: RFR(M): 8198756: Lazy allocation of compiler threads In-Reply-To: <41d9a441f84e41919f4566df78b46a0f@sap.com> References: <864a492b-6f14-2372-2783-5f687bc43638@oracle.com> <41d9a441f84e41919f4566df78b46a0f@sap.com> Message-ID: <295c3925-9605-42d7-aac8-a7074b237aa0@oracle.com> On 3/23/18 10:37 AM, Doerr, Martin wrote: > Hi Vladimir, > > thanks for the quick reply. Just a few answers. I'll take a closer look next week. > >> You can't delete thread when it is NULL > C++ supports calling delete NULL so I think it would be uncommon to check it. If there's a problem, I think the delete operator should get fixed. > > "If expression evaluates to a null pointer value, no destructors are called, and the deallocation function may or may not be called (it's implementation-defined), but the default deallocation functions are guaranteed to do nothing when handed a null pointer." [1] I am sure our code analysis tool, which we use to check code correctness, will complain about it. > >> We may need to free corresponding java thread object when we remove compiler threads. > I think it would be bad to remove the Java Thread objects because we'd need to recreate them which is rather expensive and violates the design principle that Compiler Threads are not allowed to call Java. Removing them wouldn't save much memory. Keeping them in global handles seems to be beneficial and makes this change easier. Okay.
> >> And I thought we would need to add only one threads each time when we hit some queue size threshold. At the start queues filled up very fast so you may end up creating all compiler threads. > My current formula only creates as much compiler threads so that there exist 2 compile jobs per thread. I think this is better for startup, but we can reevaluate this. Could you show a graph of how the number of compiler threads changes over time depending on load for some applications (for example, jbb2005 and specjvm2008 if you have them)? > > Thanks for the improvement proposals. I'll implement them next week. Nevertheless, the current version can already be tested. I started our testing. I just remembered that we may need to treat -Xcomp and CTW cases specially. I also ran jtreg testing locally on x64 Linux for compiler/jvmci tests, as well as tier1 compiler tests with Graal as JIT (-XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCI -XX:+TieredCompilation -XX:+UseJVMCICompiler -Djvmci.Compiler=graal). They passed. I think the JVMCI code is fine. But I see a crash in the CompileLog::finish_log_on_error() function in compiler/compilercontrol jtreg tests (they are not in tier1) with normal jtreg runs: FAILED: compiler/compilercontrol/commandfile/LogTest.java FAILED: compiler/compilercontrol/commands/LogTest.java FAILED: compiler/compilercontrol/directives/LogTest.java FAILED: compiler/compilercontrol/jcmd/AddLogTest.java FAILED: compiler/compilercontrol/jcmd/StressAddMultiThreadedTest.java FAILED: compiler/compilercontrol/logcompilation/LogTest.java I started performance testing too. Thanks, Vladimir > > Best regards, > Martin > > > [1] http://en.cppreference.com/w/cpp/language/delete > > > -----Original Message----- > From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] > Sent: Friday, 23 March 2018 18:17
> To: Doerr, Martin > Cc: Igor Veresov (igor.veresov at oracle.com) ; White, Derek ; 'hotspot-compiler-dev at openjdk.java.net' > Subject: Re: RFR(M): 8198756: Lazy allocation of compiler threads > > Very cool! > > Few thoughts. > > You can't delete thread when it is NULL (missing check or refactor code): > > if (thread == NULL || thread->osthread() == NULL) { > + if (UseDynamicNumberOfCompilerThreads && comp->num_compiler_threads() > 0) { > + delete thread; > > Why not keep handle instead of returning naked oop from create_thread_oop()? You create Handle again > > Start fields names with _ to distinguish them from local variable: > > + static int c1_count, c2_count; > > In possibly_add_compiler_threads() you can use c2_count instead of calling compile_count() again and array size is fixed > already: > > + int new_c2_count = MIN3(_c2_compile_queue->size() / 2, > + CompilationPolicy::policy()->compiler_count(CompLevel_full_optimization), > > And I thought we would need to add only one threads each time when we hit some queue size threshold. At the start queues > filled up very fast so you may end up creating all compiler threads. Or we may need more complex formula. > > We may need to free corresponding java thread object when we remove compiler threads. > > Thanks, > Vladimir > > On 3/23/18 8:04 AM, Doerr, Martin wrote: >> Hi Vladimir, >> >> thanks for updating the RFE. I already had similar ideas so I've implemented a prototype. >> >> I'll be glad if you can support this effort. >> >> My implementation starts only one thread per type (C1/C2) initially. Compiler threads start additional threads depending >> on the compile queue size, the available memory and the predetermined maximum. The Java Thread objects get created >> during startup so the Compiler Threads don't need to call Java. >> >> The heuristics (in possibly_add_compiler_threads()) are just an initial proposal and we may want to add tuning >> parameters or different numbers.
>> >> Threads get stopped in reverse order as they were created when their compile queue is empty for some time. >> >> The feature can be switched by -XX:+/-UseDynamicNumberOfCompilerThreads. Thread creating and removal can be traced by >> -XX:+TraceCompilerThreads. >> >> Webrev is here: >> >> http://cr.openjdk.java.net/~mdoerr/8198756_CompilerCount/webrev.01/ >> >> The following issues need to get addressed, yet: >> >> -Test JVMCI support. (I'm not familiar with it.) >> >> -Possible memory leaks. I've added some delete calls when a thread dies, but they may be incomplete. >> >> -Logging. >> >> -Performance and memory consumption evaluation. >> >> It would be great to get support and advice for these issues. >> >> Best regards, >> >> Martin >> From erik.osterlund at oracle.com Sat Mar 24 15:49:29 2018 From: erik.osterlund at oracle.com (Erik Osterlund) Date: Sat, 24 Mar 2018 16:49:29 +0100 Subject: RFR(M): 8198756: Lazy allocation of compiler threads In-Reply-To: <295c3925-9605-42d7-aac8-a7074b237aa0@oracle.com> References: <864a492b-6f14-2372-2783-5f687bc43638@oracle.com> <41d9a441f84e41919f4566df78b46a0f@sap.com> <295c3925-9605-42d7-aac8-a7074b237aa0@oracle.com> Message-ID: Hi, Just thought I should mention that no JavaThread (including compiler threads) that has been added to the Threads list may be deleted directly with delete. Instead, it should be deleted through SMR by calling smr_delete(). Thanks, /Erik > On 24 Mar 2018, at 01:58, Vladimir Kozlov wrote: > >> On 3/23/18 10:37 AM, Doerr, Martin wrote: >> Hi Vladimir, >> thanks for the quick reply. Just a few answers. I'll take a closer look next week. >>> You can't delete thread when it is NULL >> C++ supports calling delete NULL so I think it would be uncommon to check it. If there's a problem, I think the delete operator should get fixed.
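Thread-SMR protects readers of the Threads list with hazard pointers. That is why a JavaThread that was on the list cannot simply be deleted: a concurrent reader may still be using it. The sketch below is a simplified, single-threaded illustration of the hazard-pointer idea (publish before use, defer deletion while published). `ThreadObj`, `protect()`, `retire()` and the fixed reader count are hypothetical. This is not HotSpot's `smr_delete()`, which, as we understand it, waits until the thread is no longer protected rather than queueing it.

```cpp
#include <atomic>
#include <cassert>
#include <vector>

struct ThreadObj { int id; };  // hypothetical stand-in for a JavaThread

constexpr int kMaxReaders = 4;
std::atomic<ThreadObj*> g_hazard[kMaxReaders];  // one published pointer per reader
std::vector<ThreadObj*> g_retired;               // unlinked, not yet deleted
int g_deleted = 0;

// Reader side: publish the pointer before dereferencing it. A complete protocol
// re-reads the shared list afterwards to confirm the object is still linked.
void protect(int reader, ThreadObj* p) { g_hazard[reader].store(p); }
void unprotect(int reader) { g_hazard[reader].store(nullptr); }

bool is_protected(ThreadObj* p) {
  for (auto& h : g_hazard) {
    if (h.load() == p) return true;
  }
  return false;
}

// Deleter side, called after p has been removed from the shared list:
// instead of `delete p`, free only objects that no reader has published.
void retire(ThreadObj* p) {
  g_retired.push_back(p);
  std::vector<ThreadObj*> still_pending;
  for (ThreadObj* r : g_retired) {
    if (is_protected(r)) {
      still_pending.push_back(r);
    } else {
      delete r;
      ++g_deleted;
    }
  }
  g_retired.swap(still_pending);
}
```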
>> "If expression evaluates to a null pointer value, no destructors are called, and the deallocation function may or may not be called (it's implementation-defined), but the default deallocation functions are guaranteed to do nothing when handed a null pointer." [1] > > I am sure our code analyzing tool, which we use to check code correctness, will compliant about it. > >>> We may need to free corresponding java thread object when we remove compiler threads. >> I think it would be bad to remove the Java Thread objects because we'd need to recreate them which is rather expensive and violates the design principle that Compiler Threads are not allowed to call Java. Removing them wouldn't save much memory. Keeping them in global handles seems to be beneficial and makes this change easier. > > Okay. > >>> And I thought we would need to add only one threads each time when we hit some queue size threshold. At the start queues filled up very fast so you may end up creating all compiler threads. >> My current formula only creates as much compiler threads so that there exist 2 compile jobs per thread. I think this is better for startup, but we can reevaluate this. > > Would be nice to see graph how number of compiler threads change with time depending on load for some applications (for example, jbb2005 and specjvm2008 if you have them)? > >> Thanks for the improvement proposals. I'll implement them next week. Nevertheless, the current version can already be tested. > > I started our testing. > > I just remember that we may need to treat -Xcomp and CTW cases specially. > > I also ran jtreg testing locally on x64 linux for compiler/jvmci tests. And also tier1 compiler tests with Graal as JIT (-XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCI -XX:+TieredCompilation -XX:+UseJVMCICompiler -Djvmci.Compiler=graal). > They passed. I think JVMCI code is fine. 
> > But I see crash in CompileLog::finish_log_on_error() function in compiler/compilercontrol jtreg tests (they are not in tier1) with normal jtreg runs: > > FAILED: compiler/compilercontrol/commandfile/LogTest.java > FAILED: compiler/compilercontrol/commands/LogTest.java > FAILED: compiler/compilercontrol/directives/LogTest.java > FAILED: compiler/compilercontrol/jcmd/AddLogTest.java > FAILED: compiler/compilercontrol/jcmd/StressAddMultiThreadedTest.java > FAILED: compiler/compilercontrol/logcompilation/LogTest.java > > I started performance testing too. > > Thanks, > Vladimir > >> Best regards, >> Martin >> [1] http://en.cppreference.com/w/cpp/language/delete >> -----Original Message----- >> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >> Sent: Friday, 23 March 2018 18:17 >> To: Doerr, Martin >> Cc: Igor Veresov (igor.veresov at oracle.com) ; White, Derek ; 'hotspot-compiler-dev at openjdk.java.net' >> Subject: Re: RFR(M): 8198756: Lazy allocation of compiler threads >> Very cool! >> Few thoughts. >> You can't delete thread when it is NULL (missing check or refactor code): >> if (thread == NULL || thread->osthread() == NULL) { >> + if (UseDynamicNumberOfCompilerThreads && comp->num_compiler_threads() > 0) { >> + delete thread; >> Why not keep handle instead of returning naked oop from create_thread_oop()? You create Handle again >> Start fields names with _ to distinguish them from local variable: >> + static int c1_count, c2_count; >> In possibly_add_compiler_threads() you can use c2_count instead of calling compile_count() again and array size is fixed >> already: >> + int new_c2_count = MIN3(_c2_compile_queue->size() / 2, >> + CompilationPolicy::policy()->compiler_count(CompLevel_full_optimization), >> And I thought we would need to add only one threads each time when we hit some queue size threshold. At the start queues >> filled up very fast so you may end up creating all compiler threads. Or we may need more complex formula.
>> We may need to free corresponding java thread object when we remove compiler threads. >> Thanks, >> Vladimir >>> On 3/23/18 8:04 AM, Doerr, Martin wrote: >>> Hi Vladimir, >>> >>> thanks for updating the RFE. I already had similar ideas so I've implemented a prototype. >>> >>> I'll be glad if you can support this effort. >>> >>> My implementation starts only one thread per type (C1/C2) initially. Compiler threads start additional threads depending >>> on the compile queue size, the available memory and the predetermined maximum. The Java Thread objects get created >>> during startup so the Compiler Threads don't need to call Java. >>> >>> The heuristics (in possibly_add_compiler_threads()) are just an initial proposal and we may want to add tuning >>> parameters or different numbers. >>> >>> Threads get stopped in reverse order as they were created when their compile queue is empty for some time. >>> >>> The feature can be switched by -XX:+/-UseDynamicNumberOfCompilerThreads. Thread creating and removal can be traced by >>> -XX:+TraceCompilerThreads. >>> >>> Webrev is here: >>> >>> http://cr.openjdk.java.net/~mdoerr/8198756_CompilerCount/webrev.01/ >>> >>> The following issues need to get addressed, yet: >>> >>> -Test JVMCI support. (I'm not familiar with it.) >>> >>> -Possible memory leaks. I've added some delete calls when a thread dies, but they may be incomplete. >>> >>> -Logging. >>> >>> -Performance and memory consumption evaluation. >>> >>> It would be great to get support and advice for these issues. 
>>> >>> Best regards, >>> >>> Martin >>> From tobias.hartmann at oracle.com Mon Mar 26 09:16:59 2018 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Mon, 26 Mar 2018 11:16:59 +0200 Subject: [11] RFR(XS): 8200227: [Graal] Test times out with Graal due to low compile threshold Message-ID: <05af2d4f-a72b-1dda-e984-51d13482483c@oracle.com> Hi, please review the following test patch: https://bugs.openjdk.java.net/browse/JDK-8200227 http://cr.openjdk.java.net/~thartmann/8200227/webrev.00/ The test times out with Graal as JIT because it sets -Xbatch -XX:-TieredCompilation -XX:CompileThreshold=100 which extremely slows down execution due to many blocking compilations of Graal internal code that needs to be compiled by Graal itself. The test should be executed with TieredCompilation enabled to allow Graal code to be C1 compiled. I've verified that all intrinsified methods are still compiled (i.e., the test still does what it's supposed to do). I've also searched for other tests that use the same flag combination but we don't have any. Thanks, Tobias From aph at redhat.com Mon Mar 26 09:48:47 2018 From: aph at redhat.com (Andrew Haley) Date: Mon, 26 Mar 2018 10:48:47 +0100 Subject: How to use gdb to debug C1 compiler's internal error? In-Reply-To: <98099178-e21f-b7e8-444c-a5f4f7d5790f@llvm.org.cn> References: <98099178-e21f-b7e8-444c-a5f4f7d5790f@llvm.org.cn> Message-ID: <38ae05b3-91ed-f3c9-0478-4112dd06edb2@redhat.com> On 03/22/2018 04:32 PM, Leslie Zhai wrote: > So backtrace or set breakpoint might be helpful for debugging compiling > thread, but doesn't work for running thread? I am reading Analyzing and > Debugging the HotSpot VM at the OS Level[1] please give me some advice, > thanks a lot! 
> > [1] http://www.progdoc.de/papers/JavaOne2014/javaone2014_con3138.html You'll first need to set a breakpoint in the segfault handler here in JVM_handle_linux_signal: VMError::report_and_die(t, sig, pc, info, ucVoid); Then you can use gdb to go up the stack to the point of the crash. At that point you'll be inspecting the stack to see what's there. If you can't tell, then your next plan should be to instrument the code you're generating to add tracing information so that when it crashes, you know where you are. -- Andrew Haley Java Platform Lead Engineer Red Hat UK Ltd. EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From erik.osterlund at oracle.com Mon Mar 26 10:37:17 2018 From: erik.osterlund at oracle.com (Erik Österlund) Date: Mon, 26 Mar 2018 12:37:17 +0200 Subject: RFR: 8200168: Remove DONT_USE_REGISTER_DEFINES on Sparc In-Reply-To: <1359f2a0-e62b-f953-f053-fdcf99e8e87e@oracle.com> References: <1359f2a0-e62b-f953-f053-fdcf99e8e87e@oracle.com> Message-ID: <5AB8CD5D.6090103@oracle.com> Hi Per, Looks good.
Thanks, /Erik On 2018-03-23 11:29, Per Liden wrote: > Hi, > > Please review this patch to remove register macros on Sparc, as > discussed here: > http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2018-March/028541.html > > Bug: https://bugs.openjdk.java.net/browse/JDK-8200168 > Webrev: http://cr.openjdk.java.net/~pliden/8200168/webrev.0 > > Testing: Passed hs-tier{1,2} > > /Per From tobias.hartmann at oracle.com Mon Mar 26 10:35:33 2018 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Mon, 26 Mar 2018 12:35:33 +0200 Subject: [11] RFR(S): 8200230: [Graal] Compilations should not be enqueued before Graal is initialized Message-ID: Hi, please review the following patch: https://bugs.openjdk.java.net/browse/JDK-8200230 http://cr.openjdk.java.net/~thartmann/8200230/webrev.00/ The PrintCompilation output when running Graal with -Xbatch and -XX:-TieredCompilation shows lots of blocking compilations that time out because Graal is not yet initialized. Execution with -version therefore takes 41 seconds on my machine with a fastdebug build. With -Xbatch, we should only allow compilations to be enqueued when Graal is fully initialized. This reduces execution time with -version to 2.5 seconds on my machine. Thanks to Doug Simon for providing the patch. Best regards, Tobias From per.liden at oracle.com Mon Mar 26 10:39:52 2018 From: per.liden at oracle.com (Per Liden) Date: Mon, 26 Mar 2018 12:39:52 +0200 Subject: RFR: 8200168: Remove DONT_USE_REGISTER_DEFINES on Sparc In-Reply-To: <5AB8CD5D.6090103@oracle.com> References: <1359f2a0-e62b-f953-f053-fdcf99e8e87e@oracle.com> <5AB8CD5D.6090103@oracle.com> Message-ID: <9ee881c0-1e91-ca2d-26ab-f1e7676b31d4@oracle.com> Thanks Erik! /Per On 03/26/2018 12:37 PM, Erik Österlund wrote: > Hi Per, > > Looks good.
> > Thanks, > /Erik > > On 2018-03-23 11:29, Per Liden wrote: >> Hi, >> >> Please review this patch to remove register macros on Sparc, as >> discussed here: >> http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2018-March/028541.html >> >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8200168 >> Webrev: http://cr.openjdk.java.net/~pliden/8200168/webrev.0 >> >> Testing: Passed hs-tier{1,2} >> >> /Per > From thomas.stuefe at gmail.com Mon Mar 26 11:20:26 2018 From: thomas.stuefe at gmail.com (Thomas Stüfe) Date: Mon, 26 Mar 2018 13:20:26 +0200 Subject: How to use gdb to debug C1 compiler's internal error? In-Reply-To: <38ae05b3-91ed-f3c9-0478-4112dd06edb2@redhat.com> References: <98099178-e21f-b7e8-444c-a5f4f7d5790f@llvm.org.cn> <38ae05b3-91ed-f3c9-0478-4112dd06edb2@redhat.com> Message-ID: On Mon, Mar 26, 2018 at 11:48 AM, Andrew Haley wrote: > On 03/22/2018 04:32 PM, Leslie Zhai wrote: > > So backtrace or set breakpoint might be helpful for debugging compiling > > thread, but doesn't work for running thread? I am reading Analyzing and > > Debugging the HotSpot VM at the OS Level[1] please give me some advice, > > thanks a lot! > > > > [1] http://www.progdoc.de/papers/JavaOne2014/javaone2014_con3138.html > > You'll first need to set a breakpoint in the segfault handler here in > JVM_handle_linux_signal: > > VMError::report_and_die(t, sig, pc, info, ucVoid); > > Then you can use gdb to go up the stack to the point of the crash. > > At that point you'll be inspecting the stack to see what's there. If > you can't tell, then your next plan should be to instrument the code > you're generating to add tracing information so that when it does, you > know where you are. > > A small addition: until you hit the breakpoint at VMError::report_and_die(), gdb may stop on any number of SIGSEGV or SIGBUS signals. That is usually normal, since the VM also uses these signals internally for non-error purposes.
Just continue until you hit VMError::report_and_die(); reaching it indicates a real error. Alternatively, tell gdb not to stop on SIGSEGV (handle SIGSEGV nostop). Best Regards, Thomas > -- > Andrew Haley > Java Platform Lead Engineer > Red Hat UK Ltd. > EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 > From lutz.schmidt at sap.com Mon Mar 26 13:57:54 2018 From: lutz.schmidt at sap.com (Schmidt, Lutz) Date: Mon, 26 Mar 2018 13:57:54 +0000 Subject: RFR(L): 8198691: CodeHeap State Analytics In-Reply-To: <439573c8-d24c-59e2-1355-d08df369792b@oracle.com> References: <250CD3F0-F4AB-4EA1-9EA9-8136442FDFF9@sap.com> <1b411eee-29c8-b977-96b0-45145ebbfcf9@oracle.com> <8EB0096D-1020-4E86-82E1-2D7D564601AE@sap.com> <14c67b9d-4e1d-cb49-b29a-0d4cd02fdcc9@oracle.com> <2c4bac01-9775-2abc-01b6-f20bb580c257@oracle.com> <201E8551-1024-4BF2-996D-3387E6C1D52A@sap.com> <1ed9df48-cf5e-8755-f0f5-c25ed0efe8fd@oracle.com> <0C762F33-344E-49AB-B5DC-0553BFB67533@sap.com> <65dd8816-620f-68e9-7f3b-f221aa0a43de@oracle.com> <7B32C15D-EF32-4544-BAA4-3C7AAD99ED55@sap.com> <439573c8-d24c-59e2-1355-d08df369792b@oracle.com> Message-ID: <7870010E-543F-450B-BB9A-6255D31D1BC1@sap.com> Thank you Vladimir! The missing ttyLocker sneaked out of the code unintentionally and unnoticed. Sorry for that. I have put it back in. From our experience, it is very helpful to have NMethodSweeper information available whenever you look at CodeHeap state analytics. I changed the code in java.cpp such that NMethodSweeper::print(out) isn't called twice. I have created a new webrev at http://cr.openjdk.java.net/~lucy/webrevs/8198691.03/index.html Thank you! Lutz On 23.03.18, 23:40, "Vladimir Kozlov" wrote: This looks good. NMethodSweeper::print(out) may be called twice in java.cpp because print_heapinfo() also calls it through print_info().
You removed ttyLocker from NMethodSweeper::print() and you don't have lock in CompileBroker::print_info() which is significant output block. Other places are fine AFAIS. Thank you for fixing coding style. Thanks, Vladimir On 3/23/18 9:30 AM, Schmidt, Lutz wrote: > Hi Vladimir, Tobias, > > I have worked on your comments quite some time. There were changes to > - share/code/codeCache.cpp > - share/code/codeHeapState.cpp > - share/compiler/compileBroker.cpp > - share/memory/heap.hpp > - share/runtime/java.cpp > > Here is a summary of what I changed/reworked/adapted: > - The lock order problem is solved. > - The CodeHeapStateAnalytics_lock > o is acquired before the "aggregate" step is begun. > o is held continuously during the aggregate and print function. > o is released at function return (after all work is done. > o just protects from modification of the static variables by other threads. > - The CodeCache_lock > o is acquired after the CodeHeapStateAnalytics_lock and only if an "aggregate" step is to be performed. > o hold time was never observed to be more than one second. Not a guarantee, though. > - The tty_lock is never acquired during the "aggregate" step, so there is no interference with the CodeCache_lock. > - In the print* functions, blocks that need to stay together are first composed into a bufferedStream (size 4k). They are then printed to the given outputStream under tty_lock. > - The remaining out->print_cr() are left by intention. They print diagnostic info if some internal inconsistency is found. > - The OpenJDK code style wrt. if-then-else should now be respected everywhere. > - The commented lines you mentioned (codeHeapState.cpp/.hpp) are gone. > - The "coding alternatives" for printing to the log stream are gone. > > SAP-internal testing against SAP JVM did not reveal any problems. > Testing OpenJDK (jdk/hs repo, linuxx86_64) was all green. Other platforms did not run due to system issues. 
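The lock-order failure reported in this thread ("acquiring lock CodeHeapStateAnalytics_lock/6 out of order with lock tty_lock/0 -- possible deadlock") can be modeled with a small rank-checked mutex. This is a sketch under one assumption: a thread may only acquire a lock whose rank is strictly lower than the rank of every lock it already holds. RankedMutex and lock_checked are hypothetical names, and a rejected acquisition returns false here rather than aborting the VM.

```cpp
#include <cassert>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical rank-checked mutex. Assumed rule: a thread may acquire a lock
// only if its rank is strictly lower than the rank of every lock it holds.
class RankedMutex {
 public:
  RankedMutex(std::string name, int rank) : name_(std::move(name)), rank_(rank) {}

  // Refuses (returns false, lock not taken) instead of risking a deadlock.
  bool lock_checked() {
    for (const RankedMutex* held : held_locks()) {
      if (rank_ >= held->rank_) return false;  // out of order with `held`
    }
    mutex_.lock();
    held_locks().push_back(this);
    return true;
  }

  // Assumes locks are released in reverse order of acquisition.
  void unlock() {
    std::vector<const RankedMutex*>& held = held_locks();
    assert(!held.empty() && held.back() == this);
    held.pop_back();
    mutex_.unlock();
  }

  const std::string& name() const { return name_; }
  int rank() const { return rank_; }

 private:
  // Locks held by the calling thread, in acquisition order.
  static std::vector<const RankedMutex*>& held_locks() {
    thread_local std::vector<const RankedMutex*> held;
    return held;
  }

  std::string name_;
  int rank_;
  std::mutex mutex_;
};
```

With the ranks from the error message (6 and 0), taking CodeHeapStateAnalytics_lock first and a lower-ranked lock second succeeds. Trying to take it while tty_lock is held is rejected, which is consistent with the revised ordering described above, where tty_lock is never acquired during the "aggregate" step.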
> > There is a new webrev at http://cr.openjdk.java.net/~lucy/webrevs/8198691.02/ > > Thanks for spending some time, again, on this RFR. > > Best regards, > Lutz > > On 20.03.18, 23:01, "Vladimir Kozlov" wrote: > > As I remember we are trying to lock tty outside print functions. > > Yes, it could be troublesome if it is Mbs of output. Especially when you do it for "full codecache" > event when VM is still running. You also have CodeCache_lock in print_heapinfo() and it would not be > good to hold both locks at the same time. I think to have "micro locking" (with comments) in > print_heapinfo() is better then to have lock in each print function. > > Vladimir > > On 3/20/18 11:57 AM, Schmidt, Lutz wrote: > > Hi Vladimir, > > I already saw that code but was a little hesitant to code the same way. Why? In my case, the stringStream buffer could become fairly large. Actual size depends on CodeHeap size and contents as well as printing parameters. If you tell me some MB are OK, I can change my code. > > Thanks, > > Lutz > > > > On 20.03.18, 19:42, "Vladimir Kozlov" wrote: > > > > I think you should follow what we do with CodeCache::print_summary(): > > > > http://hg.openjdk.java.net/jdk/hs/file/74db2b7cec75/src/hotspot/share/code/codeCache.cpp#l1359 > > > > First, print into local buffer stringStream and then lock tty when print that buffer. > > > > Thanks, > > Vladimir > > > > On 3/20/18 11:29 AM, Schmidt, Lutz wrote: > > > Hi Tobias, > > > > > > thank you for uncovering this. In CodeCache::report_codemem_full() I oversaw that the tty lock is held at the place I inserted the call to CompileBroker::print_heapinfo(). > > > > > > That bug triggered some thoughts in my brain, resulting in a question or two: > > > > > > Given the complex output of CompileBroker::print_heapinfo(), what would be the OpenJDK approach to tty locking? > > > > > > Should I do "micro locking", trying to keep together only small blocks? That's what is implemented now. 
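The "compose locally, then lock only for the write" approach suggested here, combined with a bounded buffer like the 4k bufferedStream mentioned above, can be sketched with standard C++ stand-ins. In this sketch std::ostringstream plays the local buffer, std::mutex plays tty_lock, and std::ostream plays tty. BlockPrinter and its flush threshold are hypothetical. The threshold keeps memory bounded (the concern about MBs of output), at the cost of splitting a very large report into several chunks, each of which is written atomically.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper: compose a block of lines locally, then write it to the
// shared stream while holding the lock only for the duration of the write.
class BlockPrinter {
 public:
  BlockPrinter(std::ostream& out, std::mutex& out_lock, std::size_t flush_threshold)
      : out_(out), out_lock_(out_lock), threshold_(flush_threshold) {}

  ~BlockPrinter() { flush(); }

  // Buffers one line without taking the lock. Once the buffer reaches the
  // threshold, the pending chunk is flushed so memory use stays bounded.
  void line(const std::string& text) {
    buf_ << text << '\n';
    pending_ += text.size() + 1;
    if (pending_ >= threshold_) flush();
  }

  // Writes the pending chunk under the lock; other threads' output can only
  // appear between chunks, never inside one.
  void flush() {
    if (pending_ == 0) return;
    const std::string chunk = buf_.str();
    {
      std::lock_guard<std::mutex> guard(out_lock_);
      out_ << chunk;
    }
    ++flushes_;
    buf_.str("");
    pending_ = 0;
  }

  std::size_t flushes() const { return flushes_; }

 private:
  std::ostream& out_;
  std::mutex& out_lock_;
  std::size_t threshold_;
  std::ostringstream buf_;
  std::size_t pending_ = 0;
  std::size_t flushes_ = 0;
};
```

Choosing the threshold is effectively choosing between "micro locking" (small chunks) and holding the lock for the whole report (unbounded buffer).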
> > > Should I lock tty before each call to a print function (like print_usedSpace, print_freeSpace, ...)? > > > > > > Either approach has its advantages, so I'm more or less neutral. What do you all think? > > > > > > Depending on what's in favor by the community, I will move the locks accordingly. > > > > > > Thanks, > > > Lutz > > > > > > > > > On 20.03.18, 15:45, "Tobias Hartmann" wrote: > > > > > > Hi Lutz, > > > > > > I've already started testing with -Xlog:codecache=Debug and found a problem: > > > > > > The following tests > > > compiler/whitebox/AllocationCodeBlobTest.java > > > compiler/codecache/OverflowCodeCacheTest.java > > > compiler/codecache/stress/ReturnBlobToWrongHeapTest.java > > > compiler/codecache/stress/RandomAllocationTest.java > > > compiler/profiling/spectrapredefineclass_classloaders/Launcher.java > > > compiler/profiling/spectrapredefineclass/Launcher.java > > > > > > fail with > > > # fatal error: acquiring lock CodeHeapStateAnalytics_lock/6 out of order with lock tty_lock/0 -- > > > possible deadlock > > > > > > Let me know if you need more information to reproduce! > > > > > > Thanks, > > > Tobias > > > > > > On 20.03.2018 11:25, Schmidt, Lutz wrote: > > > > Hi Tobias, > > > > > > > > thank you! If you haven't started yet, you may want to wait with testing a moment. I will remove the comments Vladimir and you complained about and update the webrev. It's comments only, but you never know... > > > > > > > > Thanks, > > > > Lutz > > > > > > > > On 20.03.18, 10:46, "Tobias Hartmann" wrote: > > > > > > > > Hi Lutz, > > > > > > > > very nice work! Thanks for incorporating the requested changes. I think you can remove the commented > > > > LogStream code. > > > > > > > > I'll re-run the tests that failed with the last webrev. > > > > > > > > Best regards, > > > > Tobias > > > > > > > > On 19.03.2018 17:00, Schmidt, Lutz wrote: > > > > > Dear all, > > > > > > > > > > this is the next (second) iteration of my CodeHeap State Analytics effort. 
It reflects all the comments and suggestions I received on my initial RFR (sent out on March 1st). Please read on to learn what was changed and what kept as is. > > > > > > > > > > May I please request reviews for > > > > > > > > > > Bug: https://bugs.openjdk.java.net/browse/JDK-8198691 > > > > > Webrev: http://cr.openjdk.java.net/~lucy/webrevs/8198691.01/ > > > > > > > > > > Instead of keeping the long tail of comments and responses, I decided to provide a summary of what happened. > > > > > - Most of the new code was moved to new files: share/code/codeHeapState.cpp and share/code/codeHeapState.hpp > > > > > - I have added, as requested, an abbreviated version of the "General Description" chapter to codeHeapState.cpp > > > > > - In case of SegmentedCodeCache, the iteration is limited to FOR_ALL_ALLOCABLE_HEAPS(). There were issues in aot tests when using FOR_ALL_HEAPS(). > > > > > - All references to the RFE id should be gone. > > > > > - In share/runtime/java.cpp, the call to CompileBroker::print_heapinfo() now is close to "PrintCodeCache" for both, product and nonproduct cases. > > > > > - The edited/updated documentation is available as an attachment to the bug (in PDF format). > > > > > - I added code to share/code/codeCache.cpp (report_codemem_full()) to print the CodeHeap state for the first occurrence of the "full" condition. > > > > > - The code style "hickups", noted by Tobias Hartmann, are gone. > > > > > - The compile time warnings and errors are resolved. > > > > > > > > > > -XX:+PrintCodeHeapState vs. -Xlog:codecache=Trace > > > > > I clearly understand and support the intention to get rid of the Print* command line arguments. Therefore, the PrintCodeHeapState command line argument is gone. You can request the CodeHeap state analytics with the -Xlog:codecache=Trace (vm shutdown) or -Xlog:codecache=Debug (CodeCache full and vm shutdown) switches. The output is directed to tty, not to the log stream. 
> > > > > > > > > > The reason for not using the log stream is simple: UL prefixes every line with a timestamp and the trace tags. Unfortunately, that messes up my formatting big time. The jcmd output, on the other hand, will not have the UL prefixes. I would have to distinguish between UL and jcmd output when formatting. In addition, I do not see a benefit from adding the same UL prefix to thousands of lines. > > > > > > > > > > Comments are very welcome! > > > > > > > > > > Best Regards, > > > > > Lutz > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > From lesliezhai at llvm.org.cn Mon Mar 26 15:06:06 2018 From: lesliezhai at llvm.org.cn (Leslie Zhai) Date: Mon, 26 Mar 2018 23:06:06 +0800 Subject: How to use gdb to debug C1 compiler's internal error? In-Reply-To: <38ae05b3-91ed-f3c9-0478-4112dd06edb2@redhat.com> References: <98099178-e21f-b7e8-444c-a5f4f7d5790f@llvm.org.cn> <38ae05b3-91ed-f3c9-0478-4112dd06edb2@redhat.com> Message-ID: +3E92E3C28A21A14A Hi Andrew, Thanks for your response! ? 2018?03?26? 17:48, Andrew Haley ??: > On 03/22/2018 04:32 PM, Leslie Zhai wrote: >> So backtrace or set breakpoint might be helpful for debugging compiling >> thread, but doesn't work for running thread? I am reading Analyzing and >> Debugging the HotSpot VM at the OS Level[1] please give me some advice, >> thanks a lot! >> >> [1] http://www.progdoc.de/papers/JavaOne2014/javaone2014_con3138.html > You'll first need to set a breakpoint in the segfault handler here in > JVM_handle_linux_signal: > > VMError::report_and_die(t, sig, pc, info, ucVoid); > > Then you can use gdb to go up the stack to the point of the crash. Work :) (gdb) b VMError::report_and_die Breakpoint 1 at 0x7ffff65c8f2c: VMError::report_and_die. (7 locations) (gdb) r Thread 14 "C1 CompilerThre" hit Breakpoint 1, VMError::report_and_die (thread=0x7ffff0220000, ??? filename=0x7ffff66dd4c0 "/home/xiangzhai/project/jdk/src/hotspot/share/utilities/growableArray.hpp", ??? 
lineno=230, message=0x7ffff66de828 "assert(0 <= i && i < _len) failed", detail_fmt=0x7ffff66de819 "illegal index", detail_args=0x7fffc89aafe8) at /home/xiangzhai/project/jdk/src/hotspot/share/utilities/vmError.cpp:1244 1244 report_and_die(INTERNAL_ERROR, message, detail_fmt, detail_args, thread, NULL, NULL, NULL, filename, lineno, 0); (gdb) bt ... #3 0x00007ffff5ba4124 in GlobalValueNumbering::value_map_of (this=0x7fffc89ab300, block=0x7fff9c0135d0) at /home/xiangzhai/project/jdk/src/hotspot/share/c1/c1_ValueMap.hpp:244 #4 0x00007ffff5ba4151 in GlobalValueNumbering::set_value_map_of (this=0x7fffc89ab300, block=0x7fff9c0135d0, map=0x7fff9c0142a0) at /home/xiangzhai/project/jdk/src/hotspot/share/c1/c1_ValueMap.hpp:245 #5 0x00007ffff5ba3459 in GlobalValueNumbering::GlobalValueNumbering (this=0x7fffc89ab300, ir=0x7fffa0040c60) at /home/xiangzhai/project/jdk/src/hotspot/share/c1/c1_ValueMap.cpp:514 #6 0x00007ffff5aa1812 in Compilation::build_hir (this=0x7fffc89ab610) at /home/xiangzhai/project/jdk/src/hotspot/share/c1/c1_Compilation.cpp:204 #7 0x00007ffff5aa241f in Compilation::compile_java_method (this=0x7fffc89ab610) at /home/xiangzhai/project/jdk/src/hotspot/share/c1/c1_Compilation.cpp:411 #8 0x00007ffff5aa284e in Compilation::compile_method (this=0x7fffc89ab610) at /home/xiangzhai/project/jdk/src/hotspot/share/c1/c1_Compilation.cpp:484 #9 0x00007ffff5aa3007 in Compilation::Compilation (this=0x7fffc89ab610, compiler=0x7ffff021ad80, env=0x7fffc89ab9b0, method=0x7fff9c001a20, osr_bci=-1, buffer_blob=0x7fffd8fa0790, directive=0x7ffff01740f0) at /home/xiangzhai/project/jdk/src/hotspot/share/c1/c1_Compilation.cpp:609 #10 0x00007ffff5aa68ef in Compiler::compile_method (this=0x7ffff021ad80, env=0x7fffc89ab9b0, method=0x7fff9c001a20, entry_bci=-1, directive=0x7ffff01740f0)
at /home/xiangzhai/project/jdk/src/hotspot/share/c1/c1_Compiler.cpp:254 #11 0x00007ffff5cfdc54 in CompileBroker::invoke_compiler_on_method (task=0x7ffff0231b40) at /home/xiangzhai/project/jdk/src/hotspot/share/compiler/compileBroker.cpp:1913 #12 0x00007ffff5cfcbbc in CompileBroker::compiler_thread_loop () at /home/xiangzhai/project/jdk/src/hotspot/share/compiler/compileBroker.cpp:1618 ... It is helpful to find my careless bug: diff -r 1f0838e3cebe src/hotspot/share/c1/c1_ValueMap.cpp --- a/src/hotspot/share/c1/c1_ValueMap.cpp Sat Mar 24 22:11:30 2018 +0800 +++ b/src/hotspot/share/c1/c1_ValueMap.cpp Mon Mar 26 22:37:22 2018 +0800 @@ -486,14 +486,18 @@ GlobalValueNumbering::GlobalValueNumbering(IR* ir) : _current_map(NULL) - , _value_maps(ir->linear_scan_order()->length(), ir->linear_scan_order()->length(), NULL) + , _value_maps(UseGreedy ? ir->greedy_order()->length() : + ir->linear_scan_order()->length(), + UseGreedy ? ir->greedy_order()->length() : + ir->linear_scan_order()->length(), + NULL) , _compilation(ir->compilation()) { TRACE_VALUE_NUMBERING(tty->print_cr("****** start of global value numbering")); ShortLoopOptimizer short_loop_optimizer(this); - BlockList* blocks = ir->linear_scan_order(); + BlockList* blocks = UseGreedy ? ir->greedy_order() : ir->linear_scan_order(); int num_blocks = blocks->length(); BlockBegin* start_block = blocks->at(0); @@ -527,7 +531,8 @@ assert(dominator == block->pred_at(0), "dominator must be equal to predecessor"); // nothing to do here - } else if (block->is_set(BlockBegin::linear_scan_loop_header_flag)) { + } else if (block->is_set(UseGreedy ? BlockBegin::greedy_loop_header_flag : + BlockBegin::linear_scan_loop_header_flag)) { // block has incoming backward branches -> try to optimize short loops if (!short_loop_optimizer.process(block)) {
// loop is too complicated, so kill all memory loads because there might be diff -r 1f0838e3cebe src/hotspot/share/c1/c1_ValueMap.hpp --- a/src/hotspot/share/c1/c1_ValueMap.hpp Sat Mar 24 22:11:30 2018 +0800 +++ b/src/hotspot/share/c1/c1_ValueMap.hpp Mon Mar 26 22:37:22 2018 +0800 @@ -241,8 +241,8 @@ // accessors Compilation* compilation() const { return _compilation; } ValueMap* current_map() { return _current_map; } - ValueMap* value_map_of(BlockBegin* block) { return _value_maps.at(block->linear_scan_number()); } - void set_value_map_of(BlockBegin* block, ValueMap* map) { assert(value_map_of(block) == NULL, ""); _value_maps.at_put(block->linear_scan_number(), map); } + ValueMap* value_map_of(BlockBegin* block) { return _value_maps.at(UseGreedy ? block->greedy_number() : block->linear_scan_number()); } + void set_value_map_of(BlockBegin* block, ValueMap* map) { assert(value_map_of(block) == NULL, ""); _value_maps.at_put(UseGreedy ? block->greedy_number() : block->linear_scan_number(), map); } bool is_processed(Value v) { return _processed_values.contains(v); } void set_processed(Value v) { _processed_values.put(v); } > > At that point you'll be inspecting the stack to see what's there. If > you can't tell, then your next plan should be to instrument the code > you're generating to add tracing information so that when it does, you > know where you are. > Yes, it is not easy to debug, for example, wrongly modified X86's Runtime1::generate_handle_exception: Thread 2 "java" received signal SIGSEGV, Segmentation fault. 0x00007fffe094e884 in ?? () (gdb) x /200i 0x00007fffe094e800 0x7fffe094e800: add %al,(%rax) 0x7fffe094e802: add %al,(%rax) 0x7fffe094e804: int3 0x7fffe094e805: int3 0x7fffe094e806: int3 0x7fffe094e807: int3 0x7fffe094e808: add %al,(%rax)
0x7fffe094e80a:????? add??? %al,(%rax) ?? 0x7fffe094e80c:????? add??? %al,(%rax) ?? 0x7fffe094e80e:????? add??? %al,(%rax) ?? 0x7fffe094e810:????? loopne 0x7fffe094e813 ?? 0x7fffe094e812:????? add??? %al,(%rax) ?? 0x7fffe094e814:????? (bad) ?? 0x7fffe094e815:????? (bad) ?? 0x7fffe094e816:????? (bad) ?? 0x7fffe094e817:????? (bad) ?? 0x7fffe094e818:????? (bad) ?? 0x7fffe094e819:????? (bad) ?? 0x7fffe094e81a:????? (bad) ?? 0x7fffe094e81b:????? dec??? %esp ?? 0x7fffe094e81d:????? int3 ?? 0x7fffe094e81e:????? int3 ?? 0x7fffe094e81f:????? int3 ?? 0x7fffe094e820:????? cmove? (%rax),%edx ?? 0x7fffe094e823:????? xor??? $0x2,%al ?? 0x7fffe094e825:????? lock or %ecx,%esp ?? 0x7fffe094e828:????? adc??? $0x10640ab0,%eax ---Type to continue, or q to quit--- ?? 0x7fffe094e82d:????? fs adc $0x64,%al ?? 0x7fffe094e830:????? or???? $0x64,%al ?? 0x7fffe094e832:????? or???? %eax,(%rax) ?? 0x7fffe094e834:????? add??? %edi,%eax ?? 0x7fffe094e836:????? icebp ?? 0x7fffe094e837:????? incl?? (%rax) ?? 0x7fffe094e839:????? push?? %rax ?? 0x7fffe094e83a:????? add??? %al,%al ?? 0x7fffe094e83c:????? or???? -0x8(%rcx,%rax,1),%ah ?? 0x7fffe094e840:????? loope? 0x7fffe094e841 ?? 0x7fffe094e842:????? add??? $0xac00050,%eax ?? 0x7fffe094e847:????? fs add $0x640f64,%eax ?? 0x7fffe094e84d:????? add??? %al,(%rax) ?? 0x7fffe094e84f:????? add??? %cl,%ah ?? 0x7fffe094e851:????? int3 ?? 0x7fffe094e852:????? int3 ?? 0x7fffe094e853:????? int3 ?? 0x7fffe094e854:????? int3 ?? 0x7fffe094e855:????? int3 ?? 0x7fffe094e856:????? int3 ?? 0x7fffe094e857:????? int3 ?? 0x7fffe094e858:????? int3 ?? 0x7fffe094e859:????? int3 ?? 0x7fffe094e85a:????? int3 ?? 0x7fffe094e85b:????? int3 ?? 0x7fffe094e85c:????? int3 ?? 0x7fffe094e85d:????? int3 ---Type to continue, or q to quit--- ?? 0x7fffe094e85e:????? int3 ?? 0x7fffe094e85f:????? int3 ?? 0x7fffe094e860:????? mov??? %eax,-0x16000(%rsp) ?? 0x7fffe094e867:????? push?? %rbp ?? 0x7fffe094e868:????? sub??? $0x10,%rsp ?? 0x7fffe094e86c:????? mov??? %rsi,%rbp ?? 
0x7fffe094e86f:????? callq? 0x7fffd9661ea0 ?? 0x7fffe094e874:????? test?? %rax,%rax ?? 0x7fffe094e877:????? je???? 0x7fffe094e8a1 ?? 0x7fffe094e879:????? mov??? %rax,%rsi ?? 0x7fffe094e87c:????? mov??? %rbp,%rdx ?? 0x7fffe094e87f:????? callq? 0x7fffd96624c0 => 0x7fffe094e884:????? mov??? 0x8(%rax),%r10d ?? 0x7fffe094e888:????? cmp??? $0x200002dd,%r10d ?? 0x7fffe094e88f:????? jne??? 0x7fffe094e8ae ?? 0x7fffe094e891:????? add??? $0x10,%rsp ?? 0x7fffe094e895:????? pop??? %rbp ?? 0x7fffe094e896:????? mov??? 0x80(%r15),%r10 ?? 0x7fffe094e89d:????? test?? %eax,(%r10) ?? 0x7fffe094e8a0:????? retq ?? 0x7fffe094e8a1:????? mov??? $0xfffffff6,%esi ?? 0x7fffe094e8a6:????? nop ?? 0x7fffe094e8a7:????? callq? 0x7fffd8ec7220 ?? 0x7fffe094e8ac:????? ud2 ?? 0x7fffe094e8ae:????? mov??? $0xffffffde,%esi ?? 0x7fffe094e8b3:????? mov??? %rax,%rbp ?? 0x7fffe094e8b6:????? nop ---Type to continue, or q to quit--- ?? 0x7fffe094e8b7:????? callq? 0x7fffd8ec7220 ?? 0x7fffe094e8bc:????? ud2 ?? 0x7fffe094e8be:????? mov??? %rax,%rsi ?? 0x7fffe094e8c1:????? jmp??? 0x7fffe094e8c6 ?? 0x7fffe094e8c3:????? mov??? %rax,%rsi ?? 0x7fffe094e8c6:????? add??? $0x10,%rsp ?? 0x7fffe094e8ca:????? pop??? %rbp ?? 0x7fffe094e8cb:????? jmpq?? 0x7fffd903b7a0 ?? 0x7fffe094e8d0:????? mov??? $0xfffffff4,%esi ?? 0x7fffe094e8d5:????? nop ?? 0x7fffe094e8d6:????? nop ?? 0x7fffe094e8d7:????? callq? 0x7fffd8ec7220 ?? 0x7fffe094e8dc:????? ud2 ?? 0x7fffe094e8de:????? hlt ?? 0x7fffe094e8df:????? hlt ?? 0x7fffe094e8e0:????? movabs $0x0,%rbx ?? 0x7fffe094e8ea:????? jmpq?? 0x7fffe094e8ea ?? 0x7fffe094e8ef:????? movabs $0x0,%rbx ?? 0x7fffe094e8f9:????? jmpq?? 0x7fffe094e8f9 ?? 0x7fffe094e8fe:????? jmpq?? 0x7fffd902dfa0 ?? 0x7fffe094e903:????? callq? 0x7fffe094e908 ?? 0x7fffe094e908:????? subq?? $0x5,(%rsp) ?? 0x7fffe094e90d:????? jmpq?? 0x7fffd8ec6c60 ?? 0x7fffe094e912:????? hlt ?? 0x7fffe094e913:????? hlt ?? 0x7fffe094e914:????? hlt ?? 0x7fffe094e915:????? hlt ---Type to continue, or q to quit--- ?? 0x7fffe094e916:????? hlt ?? 
0x7fffe094e917:????? hlt ?? 0x7fffe094e918:????? nop ?? 0x7fffe094e919:????? push?? %rbp ?? 0x7fffe094e91a:????? xor??? %cl,%cl ?? 0x7fffe094e91c:????? (bad) ?? 0x7fffe094e91d:????? jg???? 0x7fffe094e91f ?? 0x7fffe094e91f:????? add??? %ch,%al ?? 0x7fffe094e921:????? (bad) ?? 0x7fffe094e922:????? add??? %al,(%rax) ?? 0x7fffe094e924:????? add??? %eax,(%rax) ?? 0x7fffe094e926:????? add??? %al,(%rax) ?? 0x7fffe094e928:????? sub??? $0x13,%bh ?? 0x7fffe094e92b:????? leaveq ?? 0x7fffe094e92c:????? (bad) ?? 0x7fffe094e92d:????? jg???? 0x7fffe094e92f ?? 0x7fffe094e92f:????? add??? %bh,%bh ?? 0x7fffe094e931:????? add??? %al,(%rcx) ?? 0x7fffe094e933:????? add??? %al,(%rax) ?? 0x7fffe094e935:????? add??? %al,(%rax) ?? 0x7fffe094e937:????? add??? %eax,(%rax) ?? 0x7fffe094e939:????? rolb?? %cl,(%rdx) ?? 0x7fffe094e93b:????? add??? %al,(%rcx) ?? 0x7fffe094e93d:????? add??? %eax,(%rdi) ?? 0x7fffe094e93f:????? add??? %al,(%rax) ?? 0x7fffe094e941:????? add??? %eax,(%rax) ?? 0x7fffe094e943:????? add??? %al,(%rax) ---Type to continue, or q to quit--- ?? 0x7fffe094e945:????? add??? %eax,0x11(%rip)??????? # 0x7fffe094e95c ?? 0x7fffe094e94b:????? add??? %ecx,(%rax) ?? 0x7fffe094e94d:????? add??? %al,(%rax) ?? 0x7fffe094e94f:????? add??? %al,(%rdx) ?? 0x7fffe094e951:????? add??? (%rax),%al ?? 0x7fffe094e953:????? add??? %dl,%dl ?? 0x7fffe094e955:????? add??? (%rax),%al ?? 0x7fffe094e957:????? add??? %eax,0x2011(%rip)??????? # 0x7fffe095096e ?? 0x7fffe094e95d:????? add??? %eax,0x0(%rip)??????? # 0x7fffe094e963 ?? 0x7fffe094e963:????? add??? %ecx,(%rax) ?? 0x7fffe094e965:????? adc??? %eax,(%rdi) ?? 0x7fffe094e967:????? add??? %al,(%rax) ?? 0x7fffe094e969:????? add??? %eax,(%rcx) ?? 0x7fffe094e96b:????? add??? %al,(%rax) ?? 0x7fffe094e96d:????? add??? %al,(%rcx) ?? 0x7fffe094e96f:????? add??? (%rax),%al ?? 0x7fffe094e971:????? add??? %al,(%rcx) ?? 0x7fffe094e973:????? or???? %dl,(%rcx) ?? 0x7fffe094e975:????? ds add %cl,%ah ?? 0x7fffe094e978:????? (bad) ?? 0x7fffe094e979:????? (bad) ?? 
0x7fffe094e97a:????? (bad) ?? 0x7fffe094e97b:????? incl?? (%rax) ?? 0x7fffe094e97d:????? add??? %al,(%rax) ?? 0x7fffe094e97f:????? add??? %al,(%rax) ?? 0x7fffe094e981:????? add??? %al,(%rax) ?? 0x7fffe094e983:????? add??? %al,(%rax) ---Type to continue, or q to quit--- ?? 0x7fffe094e985:????? add??? %al,(%rax) ?? 0x7fffe094e987:????? add??? %cl,(%rax,%rax,1) ?? 0x7fffe094e98a:????? add??? %al,(%rax) ?? 0x7fffe094e98c:????? add??? %eax,(%rax) ?? 0x7fffe094e98e:????? add??? %al,(%rax) ?? 0x7fffe094e990:????? add??? %al,(%rax) ?? 0x7fffe094e992:????? add??? %al,(%rax) ?? 0x7fffe094e994:????? add??? %al,(%rax) ?? 0x7fffe094e996:????? add??? %al,(%rax) ?? 0x7fffe094e998:????? adc??? $0x0,%al ?? 0x7fffe094e99a:????? add??? %al,(%rax) ?? 0x7fffe094e99c:????? or???? (%rax),%eax ?? 0x7fffe094e99e:????? add??? %al,(%rax) ?? 0x7fffe094e9a0:????? add??? %al,(%rax) ?? 0x7fffe094e9a2:????? add??? %al,(%rax) ?? 0x7fffe094e9a4:????? add??? $0x0,%al ?? 0x7fffe094e9a6:????? add??? %al,(%rax) ?? 0x7fffe094e9a8:????? and??? $0x0,%al ?? 0x7fffe094e9aa:????? add??? %al,(%rax) ?? 0x7fffe094e9ac:????? adc??? $0x0,%al ?? 0x7fffe094e9ae:????? add??? %al,(%rax) ?? 0x7fffe094e9b0:????? add??? %al,(%rax) ?? 0x7fffe094e9b2:????? add??? %al,(%rax) ?? 0x7fffe094e9b4:????? add??? $0x0,%al ?? 0x7fffe094e9b6:????? add??? %al,(%rax) ?? 0x7fffe094e9b8:????? xor??? %eax,(%rax) ?? 0x7fffe094e9ba:????? add??? %al,(%rax) ---Type to continue, or q to quit--- ?? 0x7fffe094e9bc:????? sbb??? (%rax),%al ?? 0x7fffe094e9be:????? add??? %al,(%rax) ?? 0x7fffe094e9c0:????? add??? %al,(%rax) ?? 0x7fffe094e9c2:????? add??? %al,(%rax) ?? 0x7fffe094e9c4:????? add??? %al,(%rax) ?? 0x7fffe094e9c6:????? add??? %al,(%rax) ?? 0x7fffe094e9c8:????? rex.WR add %r8b,(%rax) ?? 0x7fffe094e9cb:????? add??? %ah,(%rsi) ?? 0x7fffe094e9cd:????? add??? %al,(%rax) ?? 0x7fffe094e9cf:????? add??? %al,(%rax) ?? 0x7fffe094e9d1:????? add??? %al,(%rax) Then how to add tracing information to find the WHERE? 
please give me some hint, thanks a lot! diff -r 1f0838e3cebe src/hotspot/cpu/x86/c1_Runtime1_x86.cpp --- a/src/hotspot/cpu/x86/c1_Runtime1_x86.cpp Sat Mar 24 22:11:30 2018 +0800 +++ b/src/hotspot/cpu/x86/c1_Runtime1_x86.cpp Mon Mar 26 23:04:12 2018 +0800 @@ -750,7 +750,6 @@ // Pop the return address. __ leave(); - __ pop(rcx); __ jmp(rcx); // jump to exception handler break; default: ShouldNotReachHere(); -- Regards, Leslie Zhai From lesliezhai at llvm.org.cn Mon Mar 26 15:11:57 2018 From: lesliezhai at llvm.org.cn (Leslie Zhai) Date: Mon, 26 Mar 2018 23:11:57 +0800 Subject: How to use gdb to debug C1 compiler's internal error? In-Reply-To: References: <98099178-e21f-b7e8-444c-a5f4f7d5790f@llvm.org.cn> <38ae05b3-91ed-f3c9-0478-4112dd06edb2@redhat.com> Message-ID: <69ed851c-c2ac-e1d2-bfbf-c19180d7d383@llvm.org.cn> Hi Thomas, Thanks for your response! On 2018-03-26 19:20, Thomas Stüfe wrote: > > > On Mon, Mar 26, 2018 at 11:48 AM, Andrew Haley > wrote: > > On 03/22/2018 04:32 PM, Leslie Zhai wrote: > > So backtrace or set breakpoint might be helpful for debugging > compiling > > thread, but doesn't work for running thread? I am reading > Analyzing and > > Debugging the HotSpot VM at the OS Level[1] please give me some > advice, > > thanks a lot! > > > > [1] > http://www.progdoc.de/papers/JavaOne2014/javaone2014_con3138.html > > > You'll first need to set a breakpoint in the segfault handler here in > JVM_handle_linux_signal: > > VMError::report_and_die(t, sig, pc, info, ucVoid); > > Then you can use gdb to go up the stack to the point of the crash. > > At that point you'll be inspecting the stack to see what's there. If > you can't tell, then your next plan should be to instrument the code > you're generating to add tracing information so that when it does, you > know where you are. > > > small addition, until you hit the breakpoint at > VMError::report_and_die() gdb may trip over any number of SIGSEGV or > SIGBUS.
That is usually normal, since signals are also used internally > for non-error purposes. Just continue until you hit > VMError::report_and_die(), which when you hit it indicates a real > error. Or, set the SIGSEGV handler to nostop. Yes, Fedora's wiki mentions that https://fedoraproject.org/wiki/Java/StackTraces And I am able to discover the normal SIGSEGV or SIGBUS with my eyes now :P Thread 2 "java" received signal SIGSEGV, Segmentation fault. 0x00007fffd8f5319c in ?? () > > Best Regards, Thomas > > -- > Andrew Haley > Java Platform Lead Engineer > Red Hat UK Ltd. > EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 > > -- Regards, Leslie Zhai From vladimir.kozlov at oracle.com Mon Mar 26 16:46:34 2018 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 26 Mar 2018 09:46:34 -0700 Subject: RFR(L): 8198691: CodeHeap State Analytics In-Reply-To: <7870010E-543F-450B-BB9A-6255D31D1BC1@sap.com> References: <250CD3F0-F4AB-4EA1-9EA9-8136442FDFF9@sap.com> <1b411eee-29c8-b977-96b0-45145ebbfcf9@oracle.com> <8EB0096D-1020-4E86-82E1-2D7D564601AE@sap.com> <14c67b9d-4e1d-cb49-b29a-0d4cd02fdcc9@oracle.com> <2c4bac01-9775-2abc-01b6-f20bb580c257@oracle.com> <201E8551-1024-4BF2-996D-3387E6C1D52A@sap.com> <1ed9df48-cf5e-8755-f0f5-c25ed0efe8fd@oracle.com> <0C762F33-344E-49AB-B5DC-0553BFB67533@sap.com> <65dd8816-620f-68e9-7f3b-f221aa0a43de@oracle.com> <7B32C15D-EF32-4544-BAA4-3C7AAD99ED55@sap.com> <439573c8-d24c-59e2-1355-d08df369792b@oracle.com> <7870010E-543F-450B-BB9A-6255D31D1BC1@sap.com> Message-ID: On 3/26/18 6:57 AM, Schmidt, Lutz wrote: > Thank you Vladimir! > > The missing ttyLocker sneaked out of the code unintentionally and unnoticed. Sorry for that. I have put it back in. Okay. > > From our experience, it is very helpful to have NMethodSweeper information available whenever you look at CodeHeap state analytics. I changed the code in java.cpp such that NMethodSweeper::print(out) isn't called twice. Good. 
> > I have created a new webrev at http://cr.openjdk.java.net/~lucy/webrevs/8198691.03/index.html This looks good. I will start testing this version. Thanks, Vladimir > > Thank you! > Lutz > > > ?On 23.03.18, 23:40, "Vladimir Kozlov" wrote: > > This looks good. > > NMethodSweeper::print(out) may be called twice in java.cpp because print_heapinfo() also calls it > through print_info(). > > You removed ttyLocker from NMethodSweeper::print() and you don't have lock in > CompileBroker::print_info() which is significant output block. > > Other places are fine AFAIS. Thank you for fixing coding style. > > Thanks, > Vladimir > > On 3/23/18 9:30 AM, Schmidt, Lutz wrote: > > Hi Vladimir, Tobias, > > > > I have worked on your comments quite some time. There were changes to > > - share/code/codeCache.cpp > > - share/code/codeHeapState.cpp > > - share/compiler/compileBroker.cpp > > - share/memory/heap.hpp > > - share/runtime/java.cpp > > > > Here is a summary of what I changed/reworked/adapted: > > - The lock order problem is solved. > > - The CodeHeapStateAnalytics_lock > > o is acquired before the "aggregate" step is begun. > > o is held continuously during the aggregate and print function. > > o is released at function return (after all work is done. > > o just protects from modification of the static variables by other threads. > > - The CodeCache_lock > > o is acquired after the CodeHeapStateAnalytics_lock and only if an "aggregate" step is to be performed. > > o hold time was never observed to be more than one second. Not a guarantee, though. > > - The tty_lock is never acquired during the "aggregate" step, so there is no interference with the CodeCache_lock. > > - In the print* functions, blocks that need to stay together are first composed into a bufferedStream (size 4k). They are then printed to the given outputStream under tty_lock. > > - The remaining out->print_cr() are left by intention. They print diagnostic info if some internal inconsistency is found. 
> > - The OpenJDK code style wrt. if-then-else should now be respected everywhere. > > - The commented lines you mentioned (codeHeapState.cpp/.hpp) are gone. > > - The "coding alternatives" for printing to the log stream are gone. > > > > SAP-internal testing against SAP JVM did not reveal any problems. > > Testing OpenJDK (jdk/hs repo, linuxx86_64) was all green. Other platforms did not run due to system issues. > > > > There is a new webrev at http://cr.openjdk.java.net/~lucy/webrevs/8198691.02/ > > > > Thanks for spending some time, again, on this RFR. > > > > Best regards, > > Lutz > > > > On 20.03.18, 23:01, "Vladimir Kozlov" wrote: > > > > As I remember we are trying to lock tty outside print functions. > > > > Yes, it could be troublesome if it is Mbs of output. Especially when you do it for "full codecache" > > event when VM is still running. You also have CodeCache_lock in print_heapinfo() and it would not be > > good to hold both locks at the same time. I think to have "micro locking" (with comments) in > > print_heapinfo() is better then to have lock in each print function. > > > > Vladimir > > > > On 3/20/18 11:57 AM, Schmidt, Lutz wrote: > > > Hi Vladimir, > > > I already saw that code but was a little hesitant to code the same way. Why? In my case, the stringStream buffer could become fairly large. Actual size depends on CodeHeap size and contents as well as printing parameters. If you tell me some MB are OK, I can change my code. > > > Thanks, > > > Lutz > > > > > > On 20.03.18, 19:42, "Vladimir Kozlov" wrote: > > > > > > I think you should follow what we do with CodeCache::print_summary(): > > > > > > http://hg.openjdk.java.net/jdk/hs/file/74db2b7cec75/src/hotspot/share/code/codeCache.cpp#l1359 > > > > > > First, print into local buffer stringStream and then lock tty when print that buffer. > > > > > > Thanks, > > > Vladimir > > > > > > On 3/20/18 11:29 AM, Schmidt, Lutz wrote: > > > > Hi Tobias, > > > > > > > > thank you for uncovering this. 
In CodeCache::report_codemem_full() I oversaw that the tty lock is held at the place I inserted the call to CompileBroker::print_heapinfo(). > > > > > > > > That bug triggered some thoughts in my brain, resulting in a question or two: > > > > > > > > Given the complex output of CompileBroker::print_heapinfo(), what would be the OpenJDK approach to tty locking? > > > > > > > > Should I do "micro locking", trying to keep together only small blocks? That's what is implemented now. > > > > Should I lock tty before each call to a print function (like print_usedSpace, print_freeSpace, ...)? > > > > > > > > Either approach has its advantages, so I'm more or less neutral. What do you all think? > > > > > > > > Depending on what's in favor by the community, I will move the locks accordingly. > > > > > > > > Thanks, > > > > Lutz > > > > > > > > > > > > On 20.03.18, 15:45, "Tobias Hartmann" wrote: > > > > > > > > Hi Lutz, > > > > > > > > I've already started testing with -Xlog:codecache=Debug and found a problem: > > > > > > > > The following tests > > > > compiler/whitebox/AllocationCodeBlobTest.java > > > > compiler/codecache/OverflowCodeCacheTest.java > > > > compiler/codecache/stress/ReturnBlobToWrongHeapTest.java > > > > compiler/codecache/stress/RandomAllocationTest.java > > > > compiler/profiling/spectrapredefineclass_classloaders/Launcher.java > > > > compiler/profiling/spectrapredefineclass/Launcher.java > > > > > > > > fail with > > > > # fatal error: acquiring lock CodeHeapStateAnalytics_lock/6 out of order with lock tty_lock/0 -- > > > > possible deadlock > > > > > > > > Let me know if you need more information to reproduce! > > > > > > > > Thanks, > > > > Tobias > > > > > > > > On 20.03.2018 11:25, Schmidt, Lutz wrote: > > > > > Hi Tobias, > > > > > > > > > > thank you! If you haven't started yet, you may want to wait with testing a moment. I will remove the comments Vladimir and you complained about and update the webrev. 
It's comments only, but you never know... > > > > > > > > > > Thanks, > > > > > Lutz > > > > > > > > > > On 20.03.18, 10:46, "Tobias Hartmann" wrote: > > > > > > > > > > Hi Lutz, > > > > > > > > > > very nice work! Thanks for incorporating the requested changes. I think you can remove the commented > > > > > LogStream code. > > > > > > > > > > I'll re-run the tests that failed with the last webrev. > > > > > > > > > > Best regards, > > > > > Tobias > > > > > > > > > > On 19.03.2018 17:00, Schmidt, Lutz wrote: > > > > > > Dear all, > > > > > > > > > > > > this is the next (second) iteration of my CodeHeap State Analytics effort. It reflects all the comments and suggestions I received on my initial RFR (sent out on March 1st). Please read on to learn what was changed and what kept as is. > > > > > > > > > > > > May I please request reviews for > > > > > > > > > > > > Bug: https://bugs.openjdk.java.net/browse/JDK-8198691 > > > > > > Webrev: http://cr.openjdk.java.net/~lucy/webrevs/8198691.01/ > > > > > > > > > > > > Instead of keeping the long tail of comments and responses, I decided to provide a summary of what happened. > > > > > > - Most of the new code was moved to new files: share/code/codeHeapState.cpp and share/code/codeHeapState.hpp > > > > > > - I have added, as requested, an abbreviated version of the "General Description" chapter to codeHeapState.cpp > > > > > > - In case of SegmentedCodeCache, the iteration is limited to FOR_ALL_ALLOCABLE_HEAPS(). There were issues in aot tests when using FOR_ALL_HEAPS(). > > > > > > - All references to the RFE id should be gone. > > > > > > - In share/runtime/java.cpp, the call to CompileBroker::print_heapinfo() now is close to "PrintCodeCache" for both, product and nonproduct cases. > > > > > > - The edited/updated documentation is available as an attachment to the bug (in PDF format). 
> > > > > > - I added code to share/code/codeCache.cpp (report_codemem_full()) to print the CodeHeap state for the first occurrence of the "full" condition. > > > > > > - The code style "hickups", noted by Tobias Hartmann, are gone. > > > > > > - The compile time warnings and errors are resolved. > > > > > > > > > > > > -XX:+PrintCodeHeapState vs. -Xlog:codecache=Trace > > > > > > I clearly understand and support the intention to get rid of the Print* command line arguments. Therefore, the PrintCodeHeapState command line argument is gone. You can request the CodeHeap state analytics with the -Xlog:codecache=Trace (vm shutdown) or -Xlog:codecache=Debug (CodeCache full and vm shutdown) switches. The output is directed to tty, not to the log stream. > > > > > > > > > > > > The reason for not using the log stream is simple: UL prefixes every line with a timestamp and the trace tags. Unfortunately, that messes up my formatting big time. The jcmd output, on the other hand, will not have the UL prefixes. I would have to distinguish between UL and jcmd output when formatting. In addition, I do not see a benefit from adding the same UL prefix to thousands of lines. > > > > > > > > > > > > Comments are very welcome! > > > > > > > > > > > > Best Regards, > > > > > > Lutz > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > From vladimir.kozlov at oracle.com Mon Mar 26 16:53:03 2018 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 26 Mar 2018 09:53:03 -0700 Subject: [11] RFR(XS): 8200227: [Graal] Test times out with Graal due to low compile threshold In-Reply-To: <05af2d4f-a72b-1dda-e984-51d13482483c@oracle.com> References: <05af2d4f-a72b-1dda-e984-51d13482483c@oracle.com> Message-ID: Looks good. 
Thanks, Vladimir On 3/26/18 2:16 AM, Tobias Hartmann wrote: > Hi, > > please review the following test patch: > https://bugs.openjdk.java.net/browse/JDK-8200227 > http://cr.openjdk.java.net/~thartmann/8200227/webrev.00/ > > The test times out with Graal as JIT because it sets -Xbatch -XX:-TieredCompilation > -XX:CompileThreshold=100 which extremely slows down execution due to many blocking compilations of > Graal internal code that needs to be compiled by Graal itself. The test should be executed with > TieredCompilation enabled to allow Graal code to be C1 compiled. I've verified that all intrinsified > methods are still compiled (i.e., the test still does what it's supposed to do). > > I've also searched for other tests that use the same flag combination but we don't have any. > > Thanks, > Tobias > From vladimir.kozlov at oracle.com Mon Mar 26 17:01:47 2018 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 26 Mar 2018 10:01:47 -0700 Subject: [11] RFR(S): 8200230: [Graal] Compilations should not be enqueued before Graal is initialized In-Reply-To: References: Message-ID: Looks good. Did you test these changes when Graal is enabled as JIT? Thanks, Vladimir On 3/26/18 3:35 AM, Tobias Hartmann wrote: > Hi, > > please review the following patch: > https://bugs.openjdk.java.net/browse/JDK-8200230 > http://cr.openjdk.java.net/~thartmann/8200230/webrev.00/ > > Looking at the PrintCompilation output when running Graal with -Xbatch and -XX:-TieredCompilation > shows lots of blocking compilations that time out because Graal is not yet initialized. Execution > with -version therefore takes 41 seconds on my machine with a fastdebug build. > > With -Xbatch, we should only allow compilations to be enqueued when Graal is fully initialized. This > reduces execution time with -version to 2.5 seconds on my machine. > > Thanks to Doug Simon for providing the patch. 
> > Best regards, > Tobias > From vladimir.kozlov at oracle.com Mon Mar 26 17:29:29 2018 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 26 Mar 2018 10:29:29 -0700 Subject: RFR(M): 8198756: Lazy allocation of compiler threads In-Reply-To: References: <864a492b-6f14-2372-2783-5f687bc43638@oracle.com> <41d9a441f84e41919f4566df78b46a0f@sap.com> <295c3925-9605-42d7-aac8-a7074b237aa0@oracle.com> Message-ID: <12ed808d-6efb-800a-e18a-3dd9c191390d@oracle.com> Thank you, Erik, for pointing that. Vladimir On 3/24/18 8:49 AM, Erik Osterlund wrote: > Hi, > > Just thought I should mention that no JavaThread (hence including compiler threads) that has been added to the Threads list may be deleted directly with delete. Instead you should call it with SMR by calling smr_delete(). > > Thanks, > /Erik > >> On 24 Mar 2018, at 01:58, Vladimir Kozlov wrote: >> >>> On 3/23/18 10:37 AM, Doerr, Martin wrote: >>> Hi Vladimir, >>> thanks for the quick reply. Just a few answers. I'll take a closer look next week. >>>> You can't delete thread when it is NULL >>> C++ supports calling delete NULL so I think it would be uncommon to check it. If there's a problem, I think the delete operator should get fixed. >>> "If expression evaluates to a null pointer value, no destructors are called, and the deallocation function may or may not be called (it's implementation-defined), but the default deallocation functions are guaranteed to do nothing when handed a null pointer." [1] >> >> I am sure our code analyzing tool, which we use to check code correctness, will compliant about it. >> >>>> We may need to free corresponding java thread object when we remove compiler threads. >>> I think it would be bad to remove the Java Thread objects because we'd need to recreate them which is rather expensive and violates the design principle that Compiler Threads are not allowed to call Java. Removing them wouldn't save much memory. 
Keeping them in global handles seems to be beneficial and makes this change easier. >> >> Okay. >> >>>> And I thought we would need to add only one threads each time when we hit some queue size threshold. At the start queues filled up very fast so you may end up creating all compiler threads. >>> My current formula only creates as much compiler threads so that there exist 2 compile jobs per thread. I think this is better for startup, but we can reevaluate this. >> >> Would be nice to see graph how number of compiler threads change with time depending on load for some applications (for example, jbb2005 and specjvm2008 if you have them)? >> >>> Thanks for the improvement proposals. I'll implement them next week. Nevertheless, the current version can already be tested. >> >> I started our testing. >> >> I just remember that we may need to treat -Xcomp and CTW cases specially. >> >> I also ran jtreg testing locally on x64 linux for compiler/jvmci tests. And also tier1 compiler tests with Graal as JIT (-XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCI -XX:+TieredCompilation -XX:+UseJVMCICompiler -Djvmci.Compiler=graal). >> They passed. I think JVMCI code is fine. >> >> But I see crash in CompileLog::finish_log_on_error() function in compiler/compilercontrol jtreg tests (they are not in tier1) with normal jtreg runs: >> >> FAILED: compiler/compilercontrol/commandfile/LogTest.java >> FAILED: compiler/compilercontrol/commands/LogTest.java >> FAILED: compiler/compilercontrol/directives/LogTest.java >> FAILED: compiler/compilercontrol/jcmd/AddLogTest.java >> FAILED: compiler/compilercontrol/jcmd/StressAddMultiThreadedTest.java >> FAILED: compiler/compilercontrol/logcompilation/LogTest.java >> >> I started performance testing too. >> >> Thanks, >> Vladimir >> >>> Best regards, >>> Martin >>> [1] http://en.cppreference.com/w/cpp/language/delete >>> -----Original Message----- >>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >>> Sent: Freitag, 23. 
März 2018 18:17 >>> To: Doerr, Martin >>> Cc: Igor Veresov (igor.veresov at oracle.com) ; White, Derek ; 'hotspot-compiler-dev at openjdk.java.net' >>> Subject: Re: RFR(M): 8198756: Lazy allocation of compiler threads >>> Very cool! >>> Few thoughts. >>> You can't delete thread when it is NULL (missing check or refactor code): >>> if (thread == NULL || thread->osthread() == NULL) { >>> + if (UseDynamicNumberOfCompilerThreads && comp->num_compiler_threads() > 0) { >>> + delete thread; >>> Why not keep handle instead of returning naked oop from create_thread_oop()? You create Handle again >>> Start fields names with _ to distinguish them from local variable: >>> + static int c1_count, c2_count; >>> In possibly_add_compiler_threads() you can use c2_count instead of calling compile_count() again and array size is fixed >>> already: >>> + int new_c2_count = MIN3(_c2_compile_queue->size() / 2, >>> + CompilationPolicy::policy()->compiler_count(CompLevel_full_optimization), >>> And I thought we would need to add only one threads each time when we hit some queue size threshold. At the start queues >>> filled up very fast so you may end up creating all compiler threads. Or we may need more complex formula. >>> We may need to free corresponding java thread object when we remove compiler threads. >>> Thanks, >>> Vladimir >>>> On 3/23/18 8:04 AM, Doerr, Martin wrote: >>>> Hi Vladimir, >>>> >>>> thanks for updating the RFE. I already had similar ideas so I've implemented a prototype. >>>> >>>> I'll be glad if you can support this effort. >>>> >>>> My implementation starts only one thread per type (C1/C2) initially. Compiler threads start additional threads depending >>>> on the compile queue size, the available memory and the predetermined maximum. The Java Thread objects get created >>>> during startup so the Compiler Threads don't need to call Java.
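The two growth policies discussed in this exchange can be compared as pure functions. Martin's approach sizes the pool so there are about two queued compile tasks per thread, capped by the predetermined maximum and by available memory. Vladimir's alternative adds one thread each time a queue-size threshold is crossed. This is a hypothetical sketch, not the webrev's possibly_add_compiler_threads(): the struct, its field names, the per-thread memory cost and the threshold are all assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical snapshot of one compiler's (C1 or C2) thread pool.
struct CompilerPoolState {
  std::size_t queue_size;         // compile tasks waiting in the queue
  std::size_t active_threads;     // threads currently running, at least 1
  std::size_t max_threads;        // predetermined maximum for this compiler
  std::size_t available_memory;   // bytes that could still be committed
  std::size_t per_thread_memory;  // assumed cost of one more thread (> 0)
};

// Proportional policy: about two queued tasks per thread, capped by the
// maximum and by how many threads free memory could support. Growth only;
// removing threads is a separate decision.
std::size_t proportional_target(const CompilerPoolState& s) {
  assert(s.per_thread_memory > 0);
  std::size_t by_queue = s.queue_size / 2;
  std::size_t by_memory = s.available_memory / s.per_thread_memory;
  std::size_t target = std::min({by_queue, s.max_threads, by_memory});
  return std::max(target, s.active_threads);
}

// Incremental policy: at most one new thread per check, and only when the
// queue holds more than `tasks_per_thread` tasks for each active thread.
std::size_t incremental_target(const CompilerPoolState& s, std::size_t tasks_per_thread) {
  bool overloaded = s.queue_size > tasks_per_thread * s.active_threads;
  bool room = s.active_threads < s.max_threads &&
              s.available_memory >= s.per_thread_memory;
  return (overloaded && room) ? s.active_threads + 1 : s.active_threads;
}
```

With 40 queued tasks and a cap of 12, the proportional policy jumps straight to 12 threads if memory allows, while the incremental one adds a single thread per check. That difference is the startup concern raised in the review.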
>>>> >>>> The heuristics (in possibly_add_compiler_threads()) are just an initial proposal and we may want to add tuning >>>> parameters or different numbers. >>>> >>>> Threads get stopped in reverse order as they were created when their compile queue is empty for some time. >>>> >>>> The feature can be switched by -XX:+/-UseDynamicNumberOfCompilerThreads. Thread creating and removal can be traced by >>>> -XX:+TraceCompilerThreads. >>>> >>>> Webrev is here: >>>> >>>> http://cr.openjdk.java.net/~mdoerr/8198756_CompilerCount/webrev.01/ >>>> >>>> The following issues need to get addressed, yet: >>>> >>>> -Test JVMCI support. (I'm not familiar with it.) >>>> >>>> -Possible memory leaks. I've added some delete calls when a thread dies, but they may be incomplete. >>>> >>>> -Logging. >>>> >>>> -Performance and memory consumption evaluation. >>>> >>>> It would be great to get support and advice for these issues. >>>> >>>> Best regards, >>>> >>>> Martin >>>> > From shravya.rukmannagari at intel.com Mon Mar 26 18:51:41 2018 From: shravya.rukmannagari at intel.com (Rukmannagari, Shravya) Date: Mon, 26 Mar 2018 18:51:41 +0000 Subject: RFR(S) 8200067: Vector Carry-less Multiplication support In-Reply-To: References: <8D6F463991A1574A8A803B8DA605414F3A748871@ORSMSX111.amr.corp.intel.com> Message-ID: <8D6F463991A1574A8A803B8DA605414F3A74BEA4@ORSMSX111.amr.corp.intel.com> Hi Vladimir, Thanks a lot for reviewing it. I have made the suggested changes. Please find the latest changes below and let me know if you have any questions or comments. http://cr.openjdk.java.net/~srukmannagar/ICL_crc32/webrev.01/ Thanks, Shravya. 
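Whatever folding scheme the vpclmulqdq path uses, its result must match java.util.zip.CRC32 bit for bit for every input length, including lengths just below and above the size check that selects the parallel loop. One simple way to test a fast path is to compare it against a bitwise reference on every prefix of a test buffer. The sketch below is standalone C++, not the HotSpot stub. crc32_table stands in for an optimized path, and all function names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Bitwise CRC-32 with the reflected polynomial 0xEDB88320, the CRC that
// java.util.zip.CRC32 computes. `crc` is the previous finalized value
// (0 to start), so calls can be chained over consecutive chunks.
uint32_t crc32_bitwise(uint32_t crc, const uint8_t* data, std::size_t len) {
  crc = ~crc;
  for (std::size_t i = 0; i < len; ++i) {
    crc ^= data[i];
    for (int bit = 0; bit < 8; ++bit) {
      crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
  }
  return ~crc;
}

// Byte-at-a-time table-driven variant, standing in for any "fast path"
// that must agree bit for bit with the reference.
uint32_t crc32_table(uint32_t crc, const uint8_t* data, std::size_t len) {
  static const std::array<uint32_t, 256> table = [] {
    std::array<uint32_t, 256> t{};
    for (uint32_t n = 0; n < 256; ++n) {
      uint32_t c = n;
      for (int k = 0; k < 8; ++k) c = (c & 1u) ? (c >> 1) ^ 0xEDB88320u : c >> 1;
      t[n] = c;
    }
    return t;
  }();
  crc = ~crc;
  for (std::size_t i = 0; i < len; ++i) {
    crc = table[(crc ^ data[i]) & 0xFFu] ^ (crc >> 8);
  }
  return ~crc;
}

// Compares both implementations on every prefix length 0..len, which covers
// short inputs as well as any length threshold a fast path branches on.
bool crc32_agree(const uint8_t* data, std::size_t len) {
  for (std::size_t n = 0; n <= len; ++n) {
    if (crc32_bitwise(0, data, n) != crc32_table(0, data, n)) return false;
  }
  return true;
}
```

The standard CRC-32 check value for the ASCII string "123456789" is 0xCBF43926, which gives a quick sanity check for either function.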
-----Original Message----- From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] Sent: Friday, March 23, 2018 2:47 PM To: Rukmannagari, Shravya ; hotspot compiler Cc: Kamath, Smita Subject: Re: RFR(S) 8200067: Vector Carry-less Multiplication support Hi Shravya, macroAssembler_x86.cpp: Why you placed xmm0 initialization before size check?: + movdqu(xmm0, + ExternalAddress(StubRoutines::x86::crc_by128_masks_addr() + 32)); I think initialization and the check should be inside code guarded by supports_vpclmulqdq(). L_Parallel is not used - no jump to it. Thanks, Vladimir On 3/22/18 12:11 PM, Rukmannagari, Shravya wrote: > Hi everyone, > > As per "Intel Architecture Instruction Set Extensions and Future Features Programming Reference" > manual [1], vector carry-less multiplication (vpclmulqdq) instruction > will be supported in future Intel ISA. I have updated the CRC32 > algorithm to take advantage of this instruction. I have tested with > Intel SDE [2] to confirm encoding and semantics are correctly implemented. Please take a look and let me know if you have any questions or comments. > > http://cr.openjdk.java.net/~vdeshpande/ICL_crc32/webrev.00/ > > Thanks, > > Shravya. 
> > [1] > https://software.intel.com/sites/default/files/managed/c5/15/architecture-instruction-set-extensions-programming-reference.pdf > > [2] > https://software.intel.com/en-us/articles/intel-software-development-emulator > > [3] https://bugs.openjdk.java.net/browse/JDK-8200067 > From vladimir.kozlov at oracle.com Mon Mar 26 19:58:38 2018 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 26 Mar 2018 12:58:38 -0700 Subject: RFR(L): 8198691: CodeHeap State Analytics In-Reply-To: References: <250CD3F0-F4AB-4EA1-9EA9-8136442FDFF9@sap.com> <1b411eee-29c8-b977-96b0-45145ebbfcf9@oracle.com> <8EB0096D-1020-4E86-82E1-2D7D564601AE@sap.com> <14c67b9d-4e1d-cb49-b29a-0d4cd02fdcc9@oracle.com> <2c4bac01-9775-2abc-01b6-f20bb580c257@oracle.com> <201E8551-1024-4BF2-996D-3387E6C1D52A@sap.com> <1ed9df48-cf5e-8755-f0f5-c25ed0efe8fd@oracle.com> <0C762F33-344E-49AB-B5DC-0553BFB67533@sap.com> <65dd8816-620f-68e9-7f3b-f221aa0a43de@oracle.com> <7B32C15D-EF32-4544-BAA4-3C7AAD99ED55@sap.com> <439573c8-d24c-59e2-1355-d08df369792b@oracle.com> <7870010E-543F-450B-BB9A-6255D31D1BC1@sap.com> Message-ID: All tests passed! I pushed changes. But we forgot to add tests to check functionality. Please, file new RFE and add few jtreg tests. There are exmaples in compiler/codecache/cli/ and other places. Thanks, Vladimir On 3/26/18 9:46 AM, Vladimir Kozlov wrote: > On 3/26/18 6:57 AM, Schmidt, Lutz wrote: >> Thank you Vladimir! >> >> The missing ttyLocker sneaked out of the code unintentionally and >> unnoticed. Sorry for that. I have put it back in. > > Okay. > >> >> From our experience, it is very helpful to have NMethodSweeper >> information available whenever you look at CodeHeap state analytics. I >> changed the code in java.cpp such that NMethodSweeper::print(out) >> isn't called twice. > > Good. > >> >> I have created a new webrev at >> http://cr.openjdk.java.net/~lucy/webrevs/8198691.03/index.html > > This looks good. I will start testing this version.
> > Thanks, > Vladimir > >> >> Thank you! >> Lutz >> >> >> On 23.03.18, 23:40, "Vladimir Kozlov" >> wrote: >> >> This looks good. >> NMethodSweeper::print(out) may be called twice in java.cpp >> because print_heapinfo() also calls it >> through print_info(). >> You removed ttyLocker from NMethodSweeper::print() and you don't >> have lock in >> CompileBroker::print_info() which is significant output block. >> Other places are fine AFAIS. Thank you for fixing coding style. >> Thanks, >> Vladimir >> On 3/23/18 9:30 AM, Schmidt, Lutz wrote: >> > Hi Vladimir, Tobias, >> > >> > I have worked on your comments quite some time. There were >> changes to >> > - share/code/codeCache.cpp >> > - share/code/codeHeapState.cpp >> > - share/compiler/compileBroker.cpp >> > - share/memory/heap.hpp >> > - share/runtime/java.cpp >> > >> > Here is a summary of what I changed/reworked/adapted: >> > - The lock order problem is solved. >> > - The CodeHeapStateAnalytics_lock >> > o is acquired before the "aggregate" step is begun. >> > o is held continuously during the aggregate and print >> function. >> > o is released at function return (after all work is done. >> > o just protects from modification of the static variables >> by other threads. >> > - The CodeCache_lock >> > o is acquired after the CodeHeapStateAnalytics_lock and >> only if an "aggregate" step is to be performed. >> > o hold time was never observed to be more than one >> second. Not a guarantee, though. >> > - The tty_lock is never acquired during the "aggregate" step, >> so there is no interference with the CodeCache_lock. >> > - In the print* functions, blocks that need to stay together >> are first composed into a bufferedStream (size 4k). They are then >> printed to the given outputStream under tty_lock. >> >
- The remaining out->print_cr() are left by intention. They >> print diagnostic info if some internal inconsistency is found. >> > - The OpenJDK code style wrt. if-then-else should now be >> respected everywhere. >> > - The commented lines you mentioned (codeHeapState.cpp/.hpp) >> are gone. >> > - The "coding alternatives" for printing to the log stream >> are gone. >> > >> > SAP-internal testing against SAP JVM did not reveal any problems. >> > Testing OpenJDK (jdk/hs repo, linuxx86_64) was all green. Other >> platforms did not run due to system issues. >> > >> > There is a new webrev at >> http://cr.openjdk.java.net/~lucy/webrevs/8198691.02/ >> > >> > Thanks for spending some time, again, on this RFR. >> > >> > Best regards, >> > Lutz >> > >> > On 20.03.18, 23:01, "Vladimir Kozlov" >> wrote: >> > >> > As I remember we are trying to lock tty outside print >> functions. >> > >> > Yes, it could be troublesome if it is Mbs of output. >> Especially when you do it for "full codecache" >> > event when VM is still running. You also have >> CodeCache_lock in print_heapinfo() and it would not be >> > good to hold both locks at the same time. I think to have >> "micro locking" (with comments) in >> > print_heapinfo() is better then to have lock in each print >> function. >> > >> > Vladimir >> > >> > On 3/20/18 11:57 AM, Schmidt, Lutz wrote: >> > > Hi Vladimir, >> > > I already saw that code but was a little hesitant to >> code the same way. Why? In my case, the stringStream buffer could >> become fairly large. Actual size depends on CodeHeap size and contents >> as well as printing parameters. If you tell me some MB are OK, I can >> change my code. >> > > Thanks, >> > > Lutz >> > > >> > > On 20.03.18, 19:42, "Vladimir Kozlov" >> wrote: >>
> > >> > > I think you should follow what we do with >> CodeCache::print_summary(): >> > > >> > > >> http://hg.openjdk.java.net/jdk/hs/file/74db2b7cec75/src/hotspot/share/code/codeCache.cpp#l1359 >> >> > > >> > > First, print into local buffer stringStream and >> then lock tty when print that buffer. >> > > >> > > Thanks, >> > > Vladimir >> > > >> > > On 3/20/18 11:29 AM, Schmidt, Lutz wrote: >> > > > Hi Tobias, >> > > > >> > > > thank you for uncovering this. In >> CodeCache::report_codemem_full() I oversaw that the tty lock is held >> at the place I inserted the call to CompileBroker::print_heapinfo(). >> > > > >> > > > That bug triggered some thoughts in my brain, >> resulting in a question or two: >> > > > >> > > > Given the complex output of >> CompileBroker::print_heapinfo(), what would be the OpenJDK approach to >> tty locking? >> > > > >> > > > Should I do "micro locking", trying to keep >> together only small blocks? That's what is implemented now. >> > > > Should I lock tty before each call to a print >> function (like print_usedSpace, print_freeSpace, ...)? >> > > > >> > > > Either approach has its advantages, so I'm more >> or less neutral. What do you all think? >> > > > >> > > > Depending on what's in favor by the community, I >> will move the locks accordingly. >> > > > >> > > > Thanks, >> > > > Lutz >> > > > >> > > > >> > > > On 20.03.18, 15:45, "Tobias Hartmann" >> wrote: >> > > > >> > > > Hi Lutz, >> > > > >> > > >
I've already started testing with >> -Xlog:codecache=Debug and found a problem: >> > > > >> > > > The following tests >> > > > compiler/whitebox/AllocationCodeBlobTest.java >> > > > compiler/codecache/OverflowCodeCacheTest.java >> > > > >> compiler/codecache/stress/ReturnBlobToWrongHeapTest.java >> > > > >> compiler/codecache/stress/RandomAllocationTest.java >> > > > >> compiler/profiling/spectrapredefineclass_classloaders/Launcher.java >> > > > >> compiler/profiling/spectrapredefineclass/Launcher.java >> > > > >> > > > fail with >> > > > # fatal error: acquiring lock >> CodeHeapStateAnalytics_lock/6 out of order with lock tty_lock/0 -- >> > > > possible deadlock >> > > > >> > > > Let me know if you need more information to >> reproduce! >> > > > >> > > > Thanks, >> > > > Tobias >> > > > >> > > > On 20.03.2018 11:25, Schmidt, Lutz wrote: >> > > > > Hi Tobias, >> > > > > >> > > > > thank you! If you haven't started yet, you >> may want to wait with testing a moment. I will remove the comments >> Vladimir and you complained about and update the webrev. It's comments >> only, but you never know... >> > > > > >> > > > > Thanks, >> > > > > Lutz >> > > > > >> > > > > On 20.03.18, 10:46, "Tobias Hartmann" >> wrote: >> > > > > >> > > > > Hi Lutz, >> > > > > >> > > > > very nice work! Thanks for >> incorporating the requested changes. I think you can remove the commented >> > > > >
> LogStream code. >> > > > > >> > > > > I'll re-run the tests that failed with >> the last webrev. >> > > > > >> > > > > Best regards, >> > > > > Tobias >> > > > > >> > > > > On 19.03.2018 17:00, Schmidt, Lutz wrote: >> > > > > > Dear all, >> > > > > > >> > > > > > this is the next (second) iteration >> of my CodeHeap State Analytics effort. It reflects all the comments >> and suggestions I received on my initial RFR (sent out on March 1st). >> Please read on to learn what was changed and what kept as is. >> > > > > > >> > > > > > May I please request reviews for >> > > > > > >> > > > > > Bug: >> https://bugs.openjdk.java.net/browse/JDK-8198691 >> > > > > > Webrev: >> http://cr.openjdk.java.net/~lucy/webrevs/8198691.01/ >> > > > > > >> > > > > > Instead of keeping the long tail of >> comments and responses, I decided to provide a summary of what happened. >> > > > > > - Most of the new code was moved to >> new files: share/code/codeHeapState.cpp and share/code/codeHeapState.hpp >> > > > > > - I have added, as requested, an >> abbreviated version of the "General Description" chapter to >> codeHeapState.cpp >> > > > > > - In case of SegmentedCodeCache, >> the iteration is limited to FOR_ALL_ALLOCABLE_HEAPS(). There were >> issues in aot tests when using FOR_ALL_HEAPS(). >> > > > > > - All references to the RFE id >> should be gone. >> > > > > >
- In share/runtime/java.cpp, the >> call to CompileBroker::print_heapinfo() now is close to >> "PrintCodeCache" for both, product and nonproduct cases. >> > > > > > - The edited/updated documentation >> is available as an attachment to the bug (in PDF format). >> > > > > > - I added code to >> share/code/codeCache.cpp (report_codemem_full()) to print the CodeHeap >> state for the first occurrence of the "full" condition. >> > > > > > - The code style "hickups", noted >> by Tobias Hartmann, are gone. >> > > > > > - The compile time warnings and >> errors are resolved. >> > > > > > >> > > > > > -XX:+PrintCodeHeapState vs. >> -Xlog:codecache=Trace >> > > > > > I clearly understand and support the >> intention to get rid of the Print* command line arguments. Therefore, >> the PrintCodeHeapState command line argument is gone. You can request >> the CodeHeap state analytics with the -Xlog:codecache=Trace (vm >> shutdown) or -Xlog:codecache=Debug (CodeCache full and vm shutdown) >> switches. The output is directed to tty, not to the log stream. >> > > > > > >> > > > > > The reason for not using the log >> stream is simple: UL prefixes every line with a timestamp and the >> trace tags. Unfortunately, that messes up my formatting big time. The >> jcmd output, on the other hand, will not have the UL prefixes. I would >> have to distinguish between UL and jcmd output when formatting. In >> addition, I do not see a benefit from adding the same UL prefix to >> thousands of lines. >> > > > > > >> > > > > > Comments are very welcome! >> > > > > > >> > > > > > Best Regards, >> > > > > > Lutz >> > > > > > >> >
> > > > >> > > > > > >> > > > > > >> > > > > > >> > > > > >> > > > > >> > > > >> > > > >> > > >> > > >> > >> > >> From vladimir.kozlov at oracle.com Mon Mar 26 20:36:27 2018 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 26 Mar 2018 13:36:27 -0700 Subject: RFR(S) 8200067: Vector Carry-less Multiplication support In-Reply-To: <8D6F463991A1574A8A803B8DA605414F3A74BEA4@ORSMSX111.amr.corp.intel.com> References: <8D6F463991A1574A8A803B8DA605414F3A748871@ORSMSX111.amr.corp.intel.com> <8D6F463991A1574A8A803B8DA605414F3A74BEA4@ORSMSX111.amr.corp.intel.com> Message-ID: <08b488ac-8cb6-caa4-0e26-bc5a220d9511@oracle.com> I was talking about next change since you need new check only when vpclmulqdq is supported: + if (VM_Version::supports_vpclmulqdq()) { + Label Parallel_loop, L_No_Parallel; + + cmpl(len, 8); + jccb(Assembler::less, L_No_Parallel); + + movdqu(xmm0, ExternalAddress(StubRoutines::x86::crc_by128_masks_addr() + 32)); + evmovdquq(xmm1, Address(buf, 0), Assembler::AVX_512bit); + movdl(xmm5, crc); + evpxorq(xmm1, xmm1, xmm5, Assembler::AVX_512bit); + addptr(buf, 64); + subl(len, 7); + evshufi64x2(xmm0, xmm0, xmm0, 0x00, Assembler::AVX_512bit); //propagate the mask from 128 bits to 512 bits + + BIND(Parallel_loop); + fold_128bit_crc32_avx512(xmm1, xmm0, xmm5, buf, 0); + addptr(buf, 64); + subl(len, 4); + jcc(Assembler::greater, Parallel_loop); + + vextracti64x2(xmm2, xmm1, 0x01); + vextracti64x2(xmm3, xmm1, 0x02); + vextracti64x2(xmm4, xmm1, 0x03); + jmp(L_fold_512b); + + BIND(L_No_Parallel); + } Please, update webrev. I will start testing with my change and let you know results. Thanks, Vladimir On 3/26/18 11:51 AM, Rukmannagari, Shravya wrote: > Hi Vladimir, > Thanks a lot for reviewing it. I have made the suggested changes.
Please find the latest changes below and let me know if you have any questions or comments. > http://cr.openjdk.java.net/~srukmannagar/ICL_crc32/webrev.01/ > > Thanks, > Shravya. > > -----Original Message----- > From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] > Sent: Friday, March 23, 2018 2:47 PM > To: Rukmannagari, Shravya ; hotspot compiler > Cc: Kamath, Smita > Subject: Re: RFR(S) 8200067: Vector Carry-less Multiplication support > > Hi Shravya, > > macroAssembler_x86.cpp: > > Why you placed xmm0 initialization before size check?: > > + movdqu(xmm0, > + ExternalAddress(StubRoutines::x86::crc_by128_masks_addr() + 32)); > > I think initialization and the check should be inside code guarded by supports_vpclmulqdq(). > > L_Parallel is not used - no jump to it. > > Thanks, > Vladimir > > On 3/22/18 12:11 PM, Rukmannagari, Shravya wrote: >> Hi everyone, >> >> As per "Intel Architecture Instruction Set Extensions and Future Features Programming Reference" >> manual [1], vector carry-less multiplication (vpclmulqdq) instruction >> will be supported in future Intel ISA. I have updated the CRC32 >> algorithm to take advantage of this instruction. I have tested with >> Intel SDE [2] to confirm encoding and semantics are correctly implemented. Please take a look and let me know if you have any questions or comments. >> >> http://cr.openjdk.java.net/~vdeshpande/ICL_crc32/webrev.00/ >> >> Thanks, >> >> Shravya. 
>> >> [1] >> https://software.intel.com/sites/default/files/managed/c5/15/architecture-instruction-set-extensions-programming-reference.pdf >> >> [2] >> https://software.intel.com/en-us/articles/intel-software-development-emulator >> >> [3] https://bugs.openjdk.java.net/browse/JDK-8200067 >> From vladimir.kozlov at oracle.com Mon Mar 26 21:15:20 2018 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 26 Mar 2018 14:15:20 -0700 Subject: RFR(M): 8198756: Lazy allocation of compiler threads In-Reply-To: <295c3925-9605-42d7-aac8-a7074b237aa0@oracle.com> References: <864a492b-6f14-2372-2783-5f687bc43638@oracle.com> <41d9a441f84e41919f4566df78b46a0f@sap.com> <295c3925-9605-42d7-aac8-a7074b237aa0@oracle.com> Message-ID: Hi Martin, We can't delete _log when deleting CompilerThread. Log is referenced globally and used on VM exit to generate final log file when -XX:+LogCompilation is specified. compilercontrol tests passed after I change it: +CompilerThread::~CompilerThread() { + // Delete objects which were allocated on heap. + delete _counters; + // _log is referenced in global CompileLog::_first chain and used on exit. +} I also see that we C1 compiler threads are removed too soon which cause their re-activation again. This may eat memory: $ java -XX:+TraceCompilerThreads -XX:+LogCompilation t Added initial compiler thread C2 CompilerThread0 Added initial compiler thread C1 CompilerThread0 Warning: TraceDependencies results may be inflated by VerifyDependencies Added compiler thread C1 CompilerThread1 (available memory: 37040MB) Added compiler thread C1 CompilerThread2 (available memory: 37033MB) Added compiler thread C1 CompilerThread3 (available memory: 37032MB) Removing compiler thread C1 CompilerThread3 Removing compiler thread C1 CompilerThread2 Removing compiler thread C1 CompilerThread1 Added compiler thread C1 CompilerThread1 (available memory: 37027MB) May be we should take into account for how long these threads are not used.
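One way to act on the suggestion of considering how long a thread has been unused is to add hysteresis. A thread becomes removable only after it has had no work for longer than a threshold, not as soon as its queue is empty. The sketch also keeps Martin's rules that threads stop in reverse creation order and that the initial thread stays. It is a hypothetical illustration: IdleTracker, the millisecond clock value passed in by the caller, and idle_threshold_ms are assumptions, not HotSpot code or flags.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-compiler bookkeeping. Index 0 is the initial thread;
// higher indices were added later, so the last entry is the newest thread.
struct IdleTracker {
  std::vector<int64_t> last_active_ms;  // when each thread last finished a task
  int64_t idle_threshold_ms;            // assumed tuning knob, not a real flag

  void on_task_done(std::size_t thread_index, int64_t now_ms) {
    assert(thread_index < last_active_ms.size());
    last_active_ms[thread_index] = now_ms;
  }

  void add_thread(int64_t now_ms) { last_active_ms.push_back(now_ms); }

  // The newest thread may exit only if the queue is empty now AND it has been
  // idle for longer than the threshold; the initial thread never exits.
  bool should_remove_newest(bool queue_empty, int64_t now_ms) const {
    if (!queue_empty || last_active_ms.size() <= 1) return false;
    return now_ms - last_active_ms.back() > idle_threshold_ms;
  }

  void remove_newest() {
    assert(last_active_ms.size() > 1);
    last_active_ms.pop_back();
  }
};
```

Under such a rule, the C1 threads in the trace would survive a short quiet period instead of being removed and recreated moments later. The cost is that idle threads, and their memory, are kept for up to idle_threshold_ms.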
Thanks, Vladimir On 3/23/18 5:58 PM, Vladimir Kozlov wrote: > On 3/23/18 10:37 AM, Doerr, Martin wrote: >> Hi Vladimir, >> >> thanks for the quick reply. Just a few answers. I'll take a closer >> look next week. >> >>> You can't delete thread when it is NULL >> C++ supports calling delete NULL so I think it would be uncommon to >> check it. If there's a problem, I think the delete operator should get >> fixed. >> >> "If expression evaluates to a null pointer value, no destructors are >> called, and the deallocation function may or may not be called (it's >> implementation-defined), but the default deallocation functions are >> guaranteed to do nothing when handed a null pointer." [1] > > I am sure our code analyzing tool, which we use to check code > correctness, will compliant about it. > >> >>> We may need to free corresponding java thread object when we remove >>> compiler threads. >> I think it would be bad to remove the Java Thread objects because we'd >> need to recreate them which is rather expensive and violates the >> design principle that Compiler Threads are not allowed to call Java. >> Removing them wouldn't save much memory. Keeping them in global >> handles seems to be beneficial and makes this change easier. > > Okay. > >> >>> And I thought we would need to add only one threads each time when we >>> hit some queue size threshold. At the start queues filled up very >>> fast so you may end up creating all compiler threads. >> My current formula only creates as much compiler threads so that there >> exist 2 compile jobs per thread. I think this is better for startup, >> but we can reevaluate this. > > Would be nice to see graph how number of compiler threads change with > time depending on load for some applications (for example, jbb2005 and > specjvm2008 if you have them)? > >> >> Thanks for the improvement proposals. I'll implement them next week. >> Nevertheless, the current version can already be tested. > > I started our testing. 
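The C++ rule Martin cites from [1] can be checked with a small standalone program. `Tracked` and `release()` are hypothetical: `release()` mirrors the reviewed pattern of deleting a possibly-null thread pointer without a guard. Because the standard leaves it implementation-defined whether the deallocation function is even called for a null operand, a class-specific `operator delete` should tolerate `nullptr` itself. The sketch says nothing about whether a particular static-analysis tool flags the pattern, which was Vladimir's actual concern.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical type with its own allocation functions, in the spirit of
// classes that route allocation through a custom allocator.
struct Tracked {
  static int live;           // constructed but not yet destroyed
  static int dealloc_calls;  // real (non-null) deallocations
  Tracked() { ++live; }
  ~Tracked() { --live; }
  static void* operator new(std::size_t size) { return ::operator new(size); }
  static void operator delete(void* p) noexcept {
    // May or may not be invoked for `delete nullptr`; tolerate it either way.
    if (p == nullptr) return;
    ++dealloc_calls;
    ::operator delete(p);
  }
};
int Tracked::live = 0;
int Tracked::dealloc_calls = 0;

// Mirrors the reviewed code: no explicit null check before delete.
// For a null pointer no destructor runs and nothing is freed.
void release(Tracked* t) {
  delete t;
}
```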
> > I just remember that we may need to treat -Xcomp and CTW cases specially. > > I also ran jtreg testing locally on x64 linux for compiler/jvmci tests. > And also tier1 compiler tests with Graal as JIT > (-XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCI > -XX:+TieredCompilation -XX:+UseJVMCICompiler -Djvmci.Compiler=graal). > They passed. I think JVMCI code is fine. > > But I see crash in CompileLog::finish_log_on_error() function in > compiler/compilercontrol jtreg tests (they are not in tier1) with normal > jtreg runs: > > FAILED: compiler/compilercontrol/commandfile/LogTest.java > FAILED: compiler/compilercontrol/commands/LogTest.java > FAILED: compiler/compilercontrol/directives/LogTest.java > FAILED: compiler/compilercontrol/jcmd/AddLogTest.java > FAILED: compiler/compilercontrol/jcmd/StressAddMultiThreadedTest.java > FAILED: compiler/compilercontrol/logcompilation/LogTest.java > > I started performance testing too. > > Thanks, > Vladimir > >> >> Best regards, >> Martin >> >> >> [1] http://en.cppreference.com/w/cpp/language/delete >> >> >> -----Original Message----- >> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >> Sent: Friday, March 23, 2018 18:17 >> To: Doerr, Martin >> Cc: Igor Veresov (igor.veresov at oracle.com) ; >> White, Derek ; >> 'hotspot-compiler-dev at openjdk.java.net' >> >> Subject: Re: RFR(M): 8198756: Lazy allocation of compiler threads >> >> Very cool! >> >> Few thoughts. >> >> You can't delete thread when it is NULL (missing check or refactor code): >> >>     if (thread == NULL || thread->osthread() == NULL) { >> +    if (UseDynamicNumberOfCompilerThreads && >> comp->num_compiler_threads() > 0) { >> +      delete thread; >> >> Why not keep handle instead of returning naked oop from >> create_thread_oop()? You create Handle again >> >> Start fields names with _ to distinguish them from local variable: >> >> +
 static int c1_count, c2_count; >> >> In possibly_add_compiler_threads() you can use c2_count instead of >> calling compile_count() again and array size is fixed >> already: >> >> +    int new_c2_count = MIN3(_c2_compile_queue->size() / 2, >> + >> CompilationPolicy::policy()->compiler_count(CompLevel_full_optimization), >> >> And I thought we would need to add only one threads each time when we >> hit some queue size threshold. At the start queues >> filled up very fast so you may end up creating all compiler threads. >> Or we may need more complex formula. >> >> We may need to free corresponding java thread object when we remove >> compiler threads. >> >> Thanks, >> Vladimir >> >> On 3/23/18 8:04 AM, Doerr, Martin wrote: >>> Hi Vladimir, >>> >>> thanks for updating the RFE. I already had similar ideas so I've >>> implemented a prototype. >>> >>> I'll be glad if you can support this effort. >>> >>> My implementation starts only one thread per type (C1/C2) initially. >>> Compiler threads start additional threads depending >>> on the compile queue size, the available memory and the predetermined >>> maximum. The Java Thread objects get created >>> during startup so the Compiler Threads don't need to call Java. >>> >>> The heuristics (in possibly_add_compiler_threads()) are just an >>> initial proposal and we may want to add tuning >>> parameters or different numbers. >>> >>> Threads get stopped in reverse order as they were created when their >>> compile queue is empty for some time. >>> >>> The feature can be switched by >>> -XX:+/-UseDynamicNumberOfCompilerThreads. Thread creating and removal >>> can be traced by >>> -XX:+TraceCompilerThreads. >>> >>> Webrev is here: >>> >>> http://cr.openjdk.java.net/~mdoerr/8198756_CompilerCount/webrev.01/ >>> >>> The following issues need to get addressed, yet: >>> >>> -Test JVMCI support. (I'm not familiar with it.) >>> >>> -Possible memory leaks. I've added some delete calls when a thread >>> dies, but they may be incomplete.
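The add/remove policy discussed in this exchange can be sketched in isolation. This is not `possibly_add_compiler_threads()`: every name is hypothetical and the per-thread memory cost is an assumed parameter. `desired_threads()` follows Martin's description (about two queued tasks per thread, capped by the configured maximum and by available memory, never below the one initial thread). `next_count_one_at_a_time()` shows Vladimir's alternative of growing by a single thread per check. `should_remove_last()` adds the idle-time condition Vladimir later suggests to avoid removing and then re-adding C1 threads.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical inputs for one compiler type (C1 or C2).
struct PoolState {
  int queue_size;              // pending compile tasks
  int max_threads;             // predetermined maximum
  std::int64_t free_mem_mb;    // currently available memory
  std::int64_t per_thread_mb;  // assumed cost of one extra thread
};

// Target count: about two queued tasks per thread, capped by the maximum
// and by how many threads the free memory could back; at least one.
int desired_threads(const PoolState& s) {
  std::int64_t by_mem = s.free_mem_mb / s.per_thread_mb;
  std::int64_t want =
      std::min<std::int64_t>({s.queue_size / 2, s.max_threads, by_mem});
  return static_cast<int>(std::max<std::int64_t>(1, want));
}

// Alternative: step toward the target by at most one thread per check,
// which damps the burst when queues fill up quickly at startup.
// Shrinking is left to should_remove_last().
int next_count_one_at_a_time(int active, const PoolState& s) {
  return std::min(active + 1, std::max(active, desired_threads(s)));
}

// Removal: only the most recently added thread may exit, never the last
// one, and only after its queue has stayed empty for min_idle_ms.
bool should_remove_last(int active, int queue_size, std::int64_t idle_ms,
                        std::int64_t min_idle_ms) {
  return active > 1 && queue_size == 0 && idle_ms >= min_idle_ms;
}
```

The idle threshold is the knob that trades memory for churn: a longer `min_idle_ms` keeps threads around through short lulls at the cost of holding their stacks and buffers longer.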
>>> >>> -Logging. >>> >>> -Performance and memory consumption evaluation. >>> >>> It would be great to get support and advice for these issues. >>> >>> Best regards, >>> >>> Martin >>> From gromero at linux.vnet.ibm.com Mon Mar 26 21:37:02 2018 From: gromero at linux.vnet.ibm.com (Gustavo Romero) Date: Mon, 26 Mar 2018 18:37:02 -0300 Subject: [8u] RFR for backport of JDK-8164920: ppc: enhancement of CRC32 intrinsic to jdk8u-dev Message-ID: <31e036a0-a7b7-70f2-422f-addd4049436f@linux.vnet.ibm.com> Hi Goetz, Lutz, and Hiroshi, I would like to backport the CRC32 intrinsics [1,3] and so I've backported the following patchset necessary to accomplish that: [0] 8086069: Adapt runtime calls to recent intrinsics to pass ints as long (Needed by 8131048. Changes in shared code were removed) http://cr.openjdk.java.net/~gromero/crc32_jdk8u/8086069/v1/ Author: goetz [1] 8131048: ppc: implement CRC32 intrinsic http://cr.openjdk.java.net/~gromero/crc32_jdk8u/8131048/v1/ Author: lutz [2] 8077838: Recent developments for ppc. (Just the implementation of has_vpmsumb() is needed by 8164920) http://cr.openjdk.java.net/~gromero/crc32_jdk8u/8077838/v1/ Author: goetz [3] 8164920: ppc: enhancement of CRC32 intrinsic (Hiroshi's intrinsic for C2) http://cr.openjdk.java.net/~gromero/crc32_jdk8u/8164920/v1/ Author: hiroshi None applied cleanly so could you please review them? I kept them not squashed in order to ease the review. Please let me know if a single patch is better. The patchset is PPC64-only and I tested it on Linux PPC64 LE, so I need a help to test on AIX and Linux BE as well. The bulk of changes are in [1] and [3]. I'm aware that SAP is heavily working on JDK11 right now but since the absence of the CRC32 intrinsics hurts workloads like Apache Cassandra I would be very glad if that change could make its way into jdk8u. Thank you. 
Best regards, Gustavo [0] 8086069: Adapt runtime calls to recent intrinsics to pass ints as long (https://bugs.openjdk.java.net/browse/JDK-8086069) [1] 8131048: ppc: implement CRC32 intrinsic (https://bugs.openjdk.java.net/browse/JDK-8131048) [2] 8077838: Recent developments for ppc. (https://bugs.openjdk.java.net/browse/JDK-8077838) [3] 8164920: ppc: enhancement of CRC32 intrinsic (https://bugs.openjdk.java.net/browse/JDK-8164920) From shravya.rukmannagari at intel.com Tue Mar 27 00:43:18 2018 From: shravya.rukmannagari at intel.com (Rukmannagari, Shravya) Date: Tue, 27 Mar 2018 00:43:18 +0000 Subject: RFR(S) 8200067: Vector Carry-less Multiplication support In-Reply-To: <08b488ac-8cb6-caa4-0e26-bc5a220d9511@oracle.com> References: <8D6F463991A1574A8A803B8DA605414F3A748871@ORSMSX111.amr.corp.intel.com> <8D6F463991A1574A8A803B8DA605414F3A74BEA4@ORSMSX111.amr.corp.intel.com> <08b488ac-8cb6-caa4-0e26-bc5a220d9511@oracle.com> Message-ID: <8D6F463991A1574A8A803B8DA605414F3A74C004@ORSMSX111.amr.corp.intel.com> Hi Vladimir, I have made the suggested changes. Please let me know if you have any questions or comments. http://cr.openjdk.java.net/~srukmannagar/ICL_crc32/webrev.02/ Thanks, Shravya. 
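Both the PPC64 backport and the x86 VPCLMULQDQ change accelerate the checksum defined by `java.util.zip.CRC32` (CRC-32 with the reflected polynomial 0xEDB88320). A plain bit-at-a-time version is a convenient oracle for cross-checking an intrinsic, for example when running under an emulator such as Intel SDE. The sketch below is a generic textbook formulation, not HotSpot code. `crc32_update` is a hypothetical name, and it takes a running CRC so data can be fed in chunks.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Bit-at-a-time CRC-32, reflected polynomial 0xEDB88320, with the usual
// pre- and post-inversion so that results can be chained across calls.
std::uint32_t crc32_update(std::uint32_t crc, const unsigned char* buf,
                           std::size_t len) {
  crc = ~crc;
  for (std::size_t i = 0; i < len; ++i) {
    crc ^= buf[i];
    for (int bit = 0; bit < 8; ++bit) {
      // Shift right; if the bit shifted out was 1, reduce by the polynomial.
      crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
  }
  return ~crc;
}
```

The standard check value for the ASCII string "123456789" is 0xCBF43926, and splitting the input across two calls must give the same result as one call.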
-----Original Message----- From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] Sent: Monday, March 26, 2018 1:36 PM To: Rukmannagari, Shravya ; hotspot compiler Cc: Kamath, Smita Subject: Re: RFR(S) 8200067: Vector Carry-less Multiplication support I was talking about next change since you need new check only when vpclmulqdq is supported: + if (VM_Version::supports_vpclmulqdq()) { + Label Parallel_loop, L_No_Parallel; + + cmpl(len, 8); + jccb(Assembler::less, L_No_Parallel); + + movdqu(xmm0, ExternalAddress(StubRoutines::x86::crc_by128_masks_addr() + 32)); + evmovdquq(xmm1, Address(buf, 0), Assembler::AVX_512bit); + movdl(xmm5, crc); + evpxorq(xmm1, xmm1, xmm5, Assembler::AVX_512bit); + addptr(buf, 64); + subl(len, 7); + evshufi64x2(xmm0, xmm0, xmm0, 0x00, Assembler::AVX_512bit); //propagate the mask from 128 bits to 512 bits + + BIND(Parallel_loop); + fold_128bit_crc32_avx512(xmm1, xmm0, xmm5, buf, 0); + addptr(buf, 64); + subl(len, 4); + jcc(Assembler::greater, Parallel_loop); + + vextracti64x2(xmm2, xmm1, 0x01); + vextracti64x2(xmm3, xmm1, 0x02); + vextracti64x2(xmm4, xmm1, 0x03); + jmp(L_fold_512b); + + BIND(L_No_Parallel); + } Please, update webrev. I will start testing with my change and let you know results. Thanks, Vladimir On 3/26/18 11:51 AM, Rukmannagari, Shravya wrote: > Hi Vladimir, > Thanks a lot for reviewing it. I have made the suggested changes. Please find the latest changes below and let me know if you have any questions or comments. > http://cr.openjdk.java.net/~srukmannagar/ICL_crc32/webrev.01/ > > Thanks, > Shravya. 
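The folding loop above (`fold_128bit_crc32_avx512`) is built on carry-less multiplication: polynomial multiplication over GF(2), where partial products are combined with XOR, so no carries propagate. PCLMULQDQ produces one 64x64 to 128-bit product of this kind. VPCLMULQDQ performs one per 128-bit lane of a wider register, which is what lets the new path consume 64 bytes (four lanes) per iteration. A portable emulation makes the semantics concrete; `U128` and `clmul64` are illustrative names only and make no claim about how HotSpot structures this.

```cpp
#include <cassert>
#include <cstdint>

struct U128 {
  std::uint64_t lo, hi;
};

// Carry-less product of two 64-bit operands: schoolbook multiplication in
// which partial products are XORed instead of added.
U128 clmul64(std::uint64_t a, std::uint64_t b) {
  U128 r{0, 0};
  for (int i = 0; i < 64; ++i) {
    if ((b >> i) & 1u) {
      r.lo ^= a << i;
      // Bits of a shifted past position 63 land in the high half.
      if (i != 0) r.hi ^= a >> (64 - i);
    }
  }
  return r;
}
```

For example, 7 x 3 is 21 in ordinary arithmetic, but carry-less it is (x^2 + x + 1)(x + 1) = x^3 + 1, i.e. 9.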
> > -----Original Message----- > From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] > Sent: Friday, March 23, 2018 2:47 PM > To: Rukmannagari, Shravya ; hotspot > compiler > Cc: Kamath, Smita > Subject: Re: RFR(S) 8200067: Vector Carry-less Multiplication support > > Hi Shravya, > > macroAssembler_x86.cpp: > > Why you placed xmm0 initialization before size check?: > > + movdqu(xmm0, > + ExternalAddress(StubRoutines::x86::crc_by128_masks_addr() + 32)); > > I think initialization and the check should be inside code guarded by supports_vpclmulqdq(). > > L_Parallel is not used - no jump to it. > > Thanks, > Vladimir > > On 3/22/18 12:11 PM, Rukmannagari, Shravya wrote: >> Hi everyone, >> >> As per "Intel Architecture Instruction Set Extensions and Future Features Programming Reference" >> manual [1], vector carry-less multiplication (vpclmulqdq) instruction >> will be supported in future Intel ISA. I have updated the CRC32 >> algorithm to take advantage of this instruction. I have tested with >> Intel SDE [2] to confirm encoding and semantics are correctly implemented. Please take a look and let me know if you have any questions or comments. >> >> http://cr.openjdk.java.net/~vdeshpande/ICL_crc32/webrev.00/ >> >> Thanks, >> >> Shravya. 
>> >> [1] >> https://software.intel.com/sites/default/files/managed/c5/15/architecture-instruction-set-extensions-programming-reference.pdf >> >> [2] >> https://software.intel.com/en-us/articles/intel-software-development-emulator >> >> [3] https://bugs.openjdk.java.net/browse/JDK-8200067 >> From vladimir.kozlov at oracle.com Tue Mar 27 00:47:09 2018 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 26 Mar 2018 17:47:09 -0700 Subject: RFR(S) 8200067: Vector Carry-less Multiplication support In-Reply-To: <8D6F463991A1574A8A803B8DA605414F3A74C004@ORSMSX111.amr.corp.intel.com> References: <8D6F463991A1574A8A803B8DA605414F3A748871@ORSMSX111.amr.corp.intel.com> <8D6F463991A1574A8A803B8DA605414F3A74BEA4@ORSMSX111.amr.corp.intel.com> <08b488ac-8cb6-caa4-0e26-bc5a220d9511@oracle.com> <8D6F463991A1574A8A803B8DA605414F3A74C004@ORSMSX111.amr.corp.intel.com> Message-ID: <83db6b42-f8a9-f049-cb45-c756ffa5284b@oracle.com> Good. Testing passed with these changes. I will push it. Thanks, Vladimir On 3/26/18 5:43 PM, Rukmannagari, Shravya wrote: > Hi Vladimir, > I have made the suggested changes. Please let me know if you have any questions or comments. > http://cr.openjdk.java.net/~srukmannagar/ICL_crc32/webrev.02/ > > Thanks, > Shravya.
> > > -----Original Message----- > From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] > Sent: Monday, March 26, 2018 1:36 PM > To: Rukmannagari, Shravya ; hotspot compiler > Cc: Kamath, Smita > Subject: Re: RFR(S) 8200067: Vector Carry-less Multiplication support > > I was talking about next change since you need new check only when vpclmulqdq is supported: > > + if (VM_Version::supports_vpclmulqdq()) { > + Label Parallel_loop, L_No_Parallel; > + > + cmpl(len, 8); > + jccb(Assembler::less, L_No_Parallel); > + > + movdqu(xmm0, > ExternalAddress(StubRoutines::x86::crc_by128_masks_addr() + 32)); > + evmovdquq(xmm1, Address(buf, 0), Assembler::AVX_512bit); > + movdl(xmm5, crc); > + evpxorq(xmm1, xmm1, xmm5, Assembler::AVX_512bit); > + addptr(buf, 64); > + subl(len, 7); > + evshufi64x2(xmm0, xmm0, xmm0, 0x00, Assembler::AVX_512bit); > //propagate the mask from 128 bits to 512 bits > + > + BIND(Parallel_loop); > + fold_128bit_crc32_avx512(xmm1, xmm0, xmm5, buf, 0); > + addptr(buf, 64); > + subl(len, 4); > + jcc(Assembler::greater, Parallel_loop); > + > + vextracti64x2(xmm2, xmm1, 0x01); > + vextracti64x2(xmm3, xmm1, 0x02); > + vextracti64x2(xmm4, xmm1, 0x03); > + jmp(L_fold_512b); > + > + BIND(L_No_Parallel); > + } > > Please, update webrev. I will start testing with my change and let you know results. > > Thanks, > Vladimir > > On 3/26/18 11:51 AM, Rukmannagari, Shravya wrote: >> Hi Vladimir, >> Thanks a lot for reviewing it. I have made the suggested changes. Please find the latest changes below and let me know if you have any questions or comments. >> http://cr.openjdk.java.net/~srukmannagar/ICL_crc32/webrev.01/ >> >> Thanks, >> Shravya. 
>> >> -----Original Message----- >> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >> Sent: Friday, March 23, 2018 2:47 PM >> To: Rukmannagari, Shravya ; hotspot >> compiler >> Cc: Kamath, Smita >> Subject: Re: RFR(S) 8200067: Vector Carry-less Multiplication support >> >> Hi Shravya, >> >> macroAssembler_x86.cpp: >> >> Why you placed xmm0 initialization before size check?: >> >> + movdqu(xmm0, >> + ExternalAddress(StubRoutines::x86::crc_by128_masks_addr() + 32)); >> >> I think initialization and the check should be inside code guarded by supports_vpclmulqdq(). >> >> L_Parallel is not used - no jump to it. >> >> Thanks, >> Vladimir >> >> On 3/22/18 12:11 PM, Rukmannagari, Shravya wrote: >>> Hi everyone, >>> >>> As per "Intel Architecture Instruction Set Extensions and Future Features Programming Reference" >>> manual [1], vector carry-less multiplication (vpclmulqdq) instruction >>> will be supported in future Intel ISA. I have updated the CRC32 >>> algorithm to take advantage of this instruction. I have tested with >>> Intel SDE [2] to confirm encoding and semantics are correctly implemented. Please take a look and let me know if you have any questions or comments. >>> >>> http://cr.openjdk.java.net/~vdeshpande/ICL_crc32/webrev.00/ >>> >>> Thanks, >>> >>> Shravya. 
>>> >>> [1] >>> https://software.intel.com/sites/default/files/managed/c5/15/architecture-instruction-set-extensions-programming-reference.pdf >>> >>> [2] >>> https://software.intel.com/en-us/articles/intel-software-development-emulator >>> >>> [3] https://bugs.openjdk.java.net/browse/JDK-8200067 >>> From per.liden at oracle.com Tue Mar 27 08:27:59 2018 From: per.liden at oracle.com (Per Liden) Date: Tue, 27 Mar 2018 10:27:59 +0200 Subject: RFR: 8200168: Remove DONT_USE_REGISTER_DEFINES on Sparc In-Reply-To: <1359f2a0-e62b-f953-f053-fdcf99e8e87e@oracle.com> References: <1359f2a0-e62b-f953-f053-fdcf99e8e87e@oracle.com> Message-ID: <22590c02-934c-e924-04d8-6ff20dd91b59@oracle.com> Could I please get a second review on this? Vladimir? /Per On 03/23/2018 11:29 AM, Per Liden wrote: > Hi, > > Please review this patch to remove register macros on Sparc, as > discussed here: > http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2018-March/028541.html > > > Bug: https://bugs.openjdk.java.net/browse/JDK-8200168 > Webrev: http://cr.openjdk.java.net/~pliden/8200168/webrev.0 > > Testing: Passed hs-tier{1,2} > > /Per From tobias.hartmann at oracle.com Tue Mar 27 09:02:24 2018 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 27 Mar 2018 11:02:24 +0200 Subject: [11] RFR(XS): 8200227: [Graal] Test times out with Graal due to low compile threshold In-Reply-To: References: <05af2d4f-a72b-1dda-e984-51d13482483c@oracle.com> Message-ID: <2d747967-8c55-15d9-e7a4-1773a4773f54@oracle.com> Thanks Vladimir. Best regards, Tobias On 26.03.2018 18:53, Vladimir Kozlov wrote: > Looks good.
> > Thanks, > Vladimir > > On 3/26/18 2:16 AM, Tobias Hartmann wrote: >> Hi, >> >> please review the following test patch: >> https://bugs.openjdk.java.net/browse/JDK-8200227 >> http://cr.openjdk.java.net/~thartmann/8200227/webrev.00/ >> >> The test times out with Graal as JIT because it sets -Xbatch -XX:-TieredCompilation >> -XX:CompileThreshold=100 which extremely slows down execution due to many blocking compilations of >> Graal internal code that needs to be compiled by Graal itself. The test should be executed with >> TieredCompilation enabled to allow Graal code to be C1 compiled. I've verified that all intrinsified >> methods are still compiled (i.e., the test still does what it's supposed to do). >> >> I've also searched for other tests that use the same flag combination but we don't have any. >> >> Thanks, >> Tobias >> From tobias.hartmann at oracle.com Tue Mar 27 09:08:51 2018 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 27 Mar 2018 11:08:51 +0200 Subject: [11] RFR(S): 8200290: Scratch buffer creation fails with "assert(!current_thread_in_native()) failed: must not be in native" on SPARC Message-ID: Hi, please review the following patch: https://bugs.openjdk.java.net/browse/JDK-8200290 http://cr.openjdk.java.net/~thartmann/8200290/webrev.00/ Similar to JDK-8193699, the code in MacroAssembler::constant_oop_address() needs to be changed after JDK-8167372 to transition from native. This only reproduces on older SPARC machines (I've hit it on a SPARC64 VII+) and therefore didn't show up in our regular testing. I've verified the fix on this machine. Thanks, Tobias From tobias.hartmann at oracle.com Tue Mar 27 09:10:53 2018 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 27 Mar 2018 11:10:53 +0200 Subject: [11] RFR(S): 8200230: [Graal] Compilations should not be enqueued before Graal is initialized In-Reply-To: References: Message-ID: <4b85c99e-a48b-4064-d5a2-5475899f2a77@oracle.com> Thanks Vladimir! 
On 26.03.2018 19:01, Vladimir Kozlov wrote: > Looks good. Did you test these changes when Graal is enabled as JIT? Yes, I've tested with Graal but since we have lots of known issues, it's a bit hard to filter out the real ones. I haven't seen any related issues. Best regards, Tobias > On 3/26/18 3:35 AM, Tobias Hartmann wrote: >> Hi, >> >> please review the following patch: >> https://bugs.openjdk.java.net/browse/JDK-8200230 >> http://cr.openjdk.java.net/~thartmann/8200230/webrev.00/ >> >> Looking at the PrintCompilation output when running Graal with -Xbatch and -XX:-TieredCompilation >> shows lots of blocking compilations that time out because Graal is not yet initialized. Execution >> with -version therefore takes 41 seconds on my machine with a fastdebug build. >> >> With -Xbatch, we should only allow compilations to be enqueued when Graal is fully initialized. This >> reduces execution time with -version to 2.5 seconds on my machine. >> >> Thanks to Doug Simon for providing the patch. >> >> Best regards, >> Tobias >> From lutz.schmidt at sap.com Tue Mar 27 09:30:35 2018 From: lutz.schmidt at sap.com (Schmidt, Lutz) Date: Tue, 27 Mar 2018 09:30:35 +0000 Subject: RFR(L): 8198691: CodeHeap State Analytics In-Reply-To: References: <250CD3F0-F4AB-4EA1-9EA9-8136442FDFF9@sap.com> <1b411eee-29c8-b977-96b0-45145ebbfcf9@oracle.com> <8EB0096D-1020-4E86-82E1-2D7D564601AE@sap.com> <14c67b9d-4e1d-cb49-b29a-0d4cd02fdcc9@oracle.com> <2c4bac01-9775-2abc-01b6-f20bb580c257@oracle.com> <201E8551-1024-4BF2-996D-3387E6C1D52A@sap.com> <1ed9df48-cf5e-8755-f0f5-c25ed0efe8fd@oracle.com> <0C762F33-344E-49AB-B5DC-0553BFB67533@sap.com> <65dd8816-620f-68e9-7f3b-f221aa0a43de@oracle.com> <7B32C15D-EF32-4544-BAA4-3C7AAD99ED55@sap.com> <439573c8-d24c-59e2-1355-d08df369792b@oracle.com> <7870010E-543F-450B-BB9A-6255D31D1BC1@sap.com> Message-ID: <1DFFA964-DE2E-48E4-BABD-BA36A95C067A@sap.com> Hi Vladimir, thank you for testing and pushing! 
I will spend some thoughts on how to automatically test functionality, open an RFE (https://bugs.openjdk.java.net/browse/JDK-8200296), and create some tests. But, please, give me some days grace period before you expect an RFR. Thank you, Lutz On 26.03.18, 21:58, "Vladimir Kozlov" wrote: All tests passed! I pushed changes. But we forgot to add tests to check functionality. Please, file new RFE and add few jtreg tests. There are examples in compiler/codecache/cli/ and other places. Thanks, Vladimir On 3/26/18 9:46 AM, Vladimir Kozlov wrote: > On 3/26/18 6:57 AM, Schmidt, Lutz wrote: >> Thank you Vladimir! >> >> The missing ttyLocker sneaked out of the code unintentionally and >> unnoticed. Sorry for that. I have put it back in. > > Okay. > >> >> From our experience, it is very helpful to have NMethodSweeper >> information available whenever you look at CodeHeap state analytics. I >> changed the code in java.cpp such that NMethodSweeper::print(out) >> isn't called twice. > > Good. > >> >> I have created a new webrev at >> http://cr.openjdk.java.net/~lucy/webrevs/8198691.03/index.html > > This looks good. I will start testing this version. > > Thanks, > Vladimir > >> >> Thank you! >> Lutz >> >> >> On 23.03.18, 23:40, "Vladimir Kozlov" >> wrote: >> >> This looks good. >> NMethodSweeper::print(out) may be called twice in java.cpp >> because print_heapinfo() also calls it >> through print_info(). >> You removed ttyLocker from NMethodSweeper::print() and you don't >> have lock in >> CompileBroker::print_info() which is significant output block. >> Other places are fine AFAIS. Thank you for fixing coding style. >> Thanks, >> Vladimir >> On 3/23/18 9:30 AM, Schmidt, Lutz wrote: >> > Hi Vladimir, Tobias, >> > >> > I have worked on your comments quite some time.
There were >> changes to >> > - share/code/codeCache.cpp >> > - share/code/codeHeapState.cpp >> > - share/compiler/compileBroker.cpp >> > - share/memory/heap.hpp >> > - share/runtime/java.cpp >> > >> > Here is a summary of what I changed/reworked/adapted: >> > - The lock order problem is solved. >> > - The CodeHeapStateAnalytics_lock >> > o is acquired before the "aggregate" step is begun. >> > o is held continuously during the aggregate and print >> function. >> > o is released at function return (after all work is done. >> > o just protects from modification of the static variables >> by other threads. >> > - The CodeCache_lock >> > o is acquired after the CodeHeapStateAnalytics_lock and >> only if an "aggregate" step is to be performed. >> > o hold time was never observed to be more than one >> second. Not a guarantee, though. >> > - The tty_lock is never acquired during the "aggregate" step, >> so there is no interference with the CodeCache_lock. >> > - In the print* functions, blocks that need to stay together >> are first composed into a bufferedStream (size 4k). They are then >> printed to the given outputStream under tty_lock. >> > - The remaining out->print_cr() are left by intention. They >> print diagnostic info if some internal inconsistency is found. >> > - The OpenJDK code style wrt. if-then-else should now be >> respected everywhere. >> > - The commented lines you mentioned (codeHeapState.cpp/.hpp) >> are gone. >> > - The "coding alternatives" for printing to the log stream >> are gone. >> > >> > SAP-internal testing against SAP JVM did not reveal any problems. >> > Testing OpenJDK (jdk/hs repo, linuxx86_64) was all green. Other >> platforms did not run due to system issues. >> > >> > There is a new webrev at >> http://cr.openjdk.java.net/~lucy/webrevs/8198691.02/ >> > >> > Thanks for spending some time, again, on this RFR. 
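The lock ordering summarized above can be modeled with standard mutexes. This is a hypothetical sketch using `std::mutex`, not HotSpot's `Mutex` ranks or `ttyLocker`, and the variable names only echo the locks in the thread. The analytics lock is taken first and held throughout; the code cache lock is held only for the aggregate step; the output lock is taken only after the code cache lock is released. The failure reported earlier in the thread came from acquiring the analytics lock while `tty_lock` was already held, so callers must also not hold the output lock on entry.

```cpp
#include <cassert>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical stand-ins for the three locks discussed in the thread.
std::mutex analytics_lock;   // protects the static aggregation results
std::mutex code_cache_lock;  // protects the structure being scanned
std::mutex tty_lock;         // serializes console output

std::vector<int> code_cache = {64, 128, 32};  // pretend block sizes
int aggregated_total = 0;                     // "static" analytics state

// Order: analytics_lock, then code_cache_lock only around the aggregate
// step. tty_lock is taken only after code_cache_lock has been released,
// and callers must not already hold tty_lock when calling this function.
void aggregate_and_print(std::vector<std::string>& out) {
  std::lock_guard<std::mutex> analytics(analytics_lock);
  {
    std::lock_guard<std::mutex> cc(code_cache_lock);
    aggregated_total = 0;
    for (int size : code_cache) aggregated_total += size;
  }  // code_cache_lock released before any printing
  std::lock_guard<std::mutex> tty(tty_lock);
  out.push_back("total=" + std::to_string(aggregated_total));
}
```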
>> > >> > Best regards, >> > Lutz >> > >> > On 20.03.18, 23:01, "Vladimir Kozlov" >> wrote: >> > >> > As I remember we are trying to lock tty outside print >> functions. >> > >> > Yes, it could be troublesome if it is Mbs of output. >> Especially when you do it for "full codecache" >> > event when VM is still running. You also have >> CodeCache_lock in print_heapinfo() and it would not be >> > good to hold both locks at the same time. I think to have >> "micro locking" (with comments) in >> > print_heapinfo() is better then to have lock in each print >> function. >> > >> > Vladimir >> > >> > On 3/20/18 11:57 AM, Schmidt, Lutz wrote: >> > > Hi Vladimir, >> > > I already saw that code but was a little hesitant to >> code the same way. Why? In my case, the stringStream buffer could >> become fairly large. Actual size depends on CodeHeap size and contents >> as well as printing parameters. If you tell me some MB are OK, I can >> change my code. >> > > Thanks, >> > > Lutz >> > > >> > > On 20.03.18, 19:42, "Vladimir Kozlov" >> wrote: >> > > >> > > I think you should follow what we do with >> CodeCache::print_summary(): >> > > >> > > >> http://hg.openjdk.java.net/jdk/hs/file/74db2b7cec75/src/hotspot/share/code/codeCache.cpp#l1359 >> >> > > >> > > First, print into local buffer stringStream and >> then lock tty when print that buffer. >> > > >> > > Thanks, >> > > Vladimir >> > > >> > > On 3/20/18 11:29 AM, Schmidt, Lutz wrote: >> > > > Hi Tobias, >> > > > >> > > > thank you for uncovering this. In >> CodeCache::report_codemem_full() I oversaw that the tty lock is held >> at the place I inserted the call to CompileBroker::print_heapinfo(). >> > > > >> > > > That bug triggered some thoughts in my brain, >> resulting in a question or two: >> > > > >> > > > Given the complex output of >> CompileBroker::print_heapinfo(), what would be the OpenJDK approach to >> tty locking? >> > > > >> > > > Should I do "micro locking", trying to keep >> together only small blocks? 
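Vladimir's suggestion (format into a local `stringStream`, then lock tty only to emit the finished text) and Lutz's compromise (bounded per-block buffers) follow a common pattern. The sketch uses `std::ostringstream` and `std::mutex` as stand-ins for HotSpot's `stringStream`/`bufferedStream` and `ttyLocker`; `print_block`, `output_lock` and `shared_output` are hypothetical names. Formatting happens without any lock, and each block is published with a single locked append, so concurrent printers cannot interleave lines inside a block and the lock is held only briefly.

```cpp
#include <cassert>
#include <mutex>
#include <sstream>
#include <string>

std::mutex output_lock;     // stand-in for tty_lock
std::string shared_output;  // stand-in for the tty

// Compose one logical block without holding any lock, then publish it
// with a single locked write. The buffer holds only this block, so its
// size stays bounded even when the full report is very large.
void print_block(const std::string& title, int used_kb, int free_kb) {
  std::ostringstream buf;  // local buffer, not shared
  buf << "== " << title << " ==\n";
  buf << "used: " << used_kb << "\n";
  buf << "free: " << free_kb << "\n";
  std::lock_guard<std::mutex> guard(output_lock);
  shared_output += buf.str();
}
```

Buffering per block rather than per report is what addresses Lutz's worry about a multi-megabyte buffer: memory use is bounded by the largest block, while each block still reaches the output contiguously.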
That's what is implemented now. >> > > > Should I lock tty before each call to a print >> function (like print_usedSpace, print_freeSpace, ...)? >> > > > >> > > > Either approach has its advantages, so I'm more >> or less neutral. What do you all think? >> > > > >> > > > Depending on what's in favor by the community, I >> will move the locks accordingly. >> > > > >> > > > Thanks, >> > > > Lutz >> > > > >> > > > >> > > > On 20.03.18, 15:45, "Tobias Hartmann" >> wrote: >> > > > >> > > > Hi Lutz, >> > > > >> > > > I've already started testing with >> -Xlog:codecache=Debug and found a problem: >> > > > >> > > > The following tests >> > > > compiler/whitebox/AllocationCodeBlobTest.java >> > > > compiler/codecache/OverflowCodeCacheTest.java >> > > > >> compiler/codecache/stress/ReturnBlobToWrongHeapTest.java >> > > > >> compiler/codecache/stress/RandomAllocationTest.java >> > > > >> compiler/profiling/spectrapredefineclass_classloaders/Launcher.java >> > > > >> compiler/profiling/spectrapredefineclass/Launcher.java >> > > > >> > > > fail with >> > > > # fatal error: acquiring lock >> CodeHeapStateAnalytics_lock/6 out of order with lock tty_lock/0 -- >> > > > possible deadlock >> > > > >> > > > Let me know if you need more information to >> reproduce! >> > > > >> > > > Thanks, >> > > > Tobias >> > > > >> > > > On 20.03.2018 11:25, Schmidt, Lutz wrote: >> > > > > Hi Tobias, >> > > > > >> > > > > thank you! If you haven't started yet, you >> may want to wait with testing a moment. I will remove the comments >> Vladimir and you complained about and update the webrev. It's comments >> only, but you never know... >> > > > > >> > > > > Thanks, >> > > > > Lutz >> > > > > >> > > > > On 20.03.18, 10:46, "Tobias Hartmann" >> wrote: >> > > > > >> > > > > Hi Lutz, >> > > > > >> > > > > very nice work! Thanks for >> incorporating the requested changes. I think you can remove the commented >> > > > > LogStream code. 
>> > > > > >> > > > > I'll re-run the tests that failed with >> the last webrev. >> > > > > >> > > > > Best regards, >> > > > > Tobias >> > > > > >> > > > > On 19.03.2018 17:00, Schmidt, Lutz wrote: >> > > > > > Dear all, >> > > > > > >> > > > > > this is the next (second) iteration >> of my CodeHeap State Analytics effort. It reflects all the comments >> and suggestions I received on my initial RFR (sent out on March 1st). >> Please read on to learn what was changed and what kept as is. >> > > > > > >> > > > > > May I please request reviews for >> > > > > > >> > > > > > Bug: >> https://bugs.openjdk.java.net/browse/JDK-8198691 >> > > > > > Webrev: >> http://cr.openjdk.java.net/~lucy/webrevs/8198691.01/ >> > > > > > >> > > > > > Instead of keeping the long tail of >> comments and responses, I decided to provide a summary of what happened. >> > > > > > - Most of the new code was moved to >> new files: share/code/codeHeapState.cpp and share/code/codeHeapState.hpp >> > > > > > - I have added, as requested, an >> abbreviated version of the "General Description" chapter to >> codeHeapState.cpp >> > > > > > - In case of SegmentedCodeCache, >> the iteration is limited to FOR_ALL_ALLOCABLE_HEAPS(). There were >> issues in aot tests when using FOR_ALL_HEAPS(). >> > > > > > - All references to the RFE id >> should be gone. >> > > > > > - In share/runtime/java.cpp, the >> call to CompileBroker::print_heapinfo() now is close to >> "PrintCodeCache" for both, product and nonproduct cases. >> > > > > > - The edited/updated documentation >> is available as an attachment to the bug (in PDF format). >> > > > > > - I added code to >> share/code/codeCache.cpp (report_codemem_full()) to print the CodeHeap >> state for the first occurrence of the "full" condition. >> > > > > > - The code style "hickups", noted >> by Tobias Hartmann, are gone. >> > > > > > - The compile time warnings and >> errors are resolved. >> > > > > > >> > > > > > -XX:+PrintCodeHeapState vs. 
>> -Xlog:codecache=Trace >> > > > > > I clearly understand and support the >> intention to get rid of the Print* command line arguments. Therefore, >> the PrintCodeHeapState command line argument is gone. You can request >> the CodeHeap state analytics with the -Xlog:codecache=Trace (vm >> shutdown) or -Xlog:codecache=Debug (CodeCache full and vm shutdown) >> switches. The output is directed to tty, not to the log stream. >> > > > > > >> > > > > > The reason for not using the log >> stream is simple: UL prefixes every line with a timestamp and the >> trace tags. Unfortunately, that messes up my formatting big time. The >> jcmd output, on the other hand, will not have the UL prefixes. I would >> have to distinguish between UL and jcmd output when formatting. In >> addition, I do not see a benefit from adding the same UL prefix to >> thousands of lines. >> > > > > > >> > > > > > Comments are very welcome! >> > > > > > >> > > > > > Best Regards, >> > > > > > Lutz >> > > > > > >> > > > > > >> > > > > > >> > > > > > >> > > > > >> > > > > >> > > > >> > > > >> > > >> > > >> > >> > >> From shade at redhat.com Tue Mar 27 09:37:36 2018 From: shade at redhat.com (Aleksey Shipilev) Date: Tue, 27 Mar 2018 11:37:36 +0200 Subject: RFR(L): 8198691: CodeHeap State Analytics In-Reply-To: References: <250CD3F0-F4AB-4EA1-9EA9-8136442FDFF9@sap.com> <8EB0096D-1020-4E86-82E1-2D7D564601AE@sap.com> <14c67b9d-4e1d-cb49-b29a-0d4cd02fdcc9@oracle.com> <2c4bac01-9775-2abc-01b6-f20bb580c257@oracle.com> <201E8551-1024-4BF2-996D-3387E6C1D52A@sap.com> <1ed9df48-cf5e-8755-f0f5-c25ed0efe8fd@oracle.com> <0C762F33-344E-49AB-B5DC-0553BFB67533@sap.com> <65dd8816-620f-68e9-7f3b-f221aa0a43de@oracle.com> <7B32C15D-EF32-4544-BAA4-3C7AAD99ED55@sap.com> <439573c8-d24c-59e2-1355-d08df369792b@oracle.com> <7870010E-543F-450B-BB9A-6255D31D1BC1@sap.com> Message-ID: <2baa3057-16d1-0f28-3ac7-ce7b0a9fc647@redhat.com> On 03/26/2018 09:58 PM, Vladimir Kozlov wrote: > All tests passed! I pushed changes. 
This change broke the x86_32 build, see: https://bugs.openjdk.java.net/browse/JDK-8200297 Lutz, can you please fix those ASAP? -Aleksey From martin.doerr at sap.com Tue Mar 27 13:44:55 2018 From: martin.doerr at sap.com (Doerr, Martin) Date: Tue, 27 Mar 2018 13:44:55 +0000 Subject: RFR(M): 8198756: Lazy allocation of compiler threads In-Reply-To: References: <864a492b-6f14-2372-2783-5f687bc43638@oracle.com> <41d9a441f84e41919f4566df78b46a0f@sap.com> <295c3925-9605-42d7-aac8-a7074b237aa0@oracle.com> Message-ID: <046e7d38c5a24655ab66feab57812def@sap.com> Hi Vladimir, thank you very much for looking into these problems. I think we should keep and reuse the log objects like we keep the Compiler Thread Java Objects. I will think about how to keep the C1 threads alive longer. About the delete NULL question: I think it's a common coding style to do the NULL check inside of operator delete in order to be consistent with C++'s default operators. In my opinion, code checking tools should complain about free(NULL), but accept delete NULL (of a certain type). Are you sure that delete NULL is not acceptable? I'm currently not feeling well, but I'll continue to work on this RFE when I feel better. Best regards, Martin -----Original Message----- From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] Sent: Monday, March 26, 2018 23:15 To: Doerr, Martin Cc: 'hotspot-compiler-dev at openjdk.java.net' Subject: Re: RFR(M): 8198756: Lazy allocation of compiler threads Hi Martin, We can't delete _log when deleting CompilerThread. Log is referenced globally and used on VM exit to generate the final log file when -XX:+LogCompilation is specified. compilercontrol tests passed after I changed it: +CompilerThread::~CompilerThread() { + // Delete objects which were allocated on heap.
+ delete _counters; + // _log is referenced in global CompileLog::_first chain and used on exit. +} I also see that the C1 compiler threads are removed too soon, which causes them to be re-activated again. This may eat memory: $ java -XX:+TraceCompilerThreads -XX:+LogCompilation t Added initial compiler thread C2 CompilerThread0 Added initial compiler thread C1 CompilerThread0 Warning: TraceDependencies results may be inflated by VerifyDependencies Added compiler thread C1 CompilerThread1 (available memory: 37040MB) Added compiler thread C1 CompilerThread2 (available memory: 37033MB) Added compiler thread C1 CompilerThread3 (available memory: 37032MB) Removing compiler thread C1 CompilerThread3 Removing compiler thread C1 CompilerThread2 Removing compiler thread C1 CompilerThread1 Added compiler thread C1 CompilerThread1 (available memory: 37027MB) Maybe we should take into account how long these threads have been unused. Thanks, Vladimir On 3/23/18 5:58 PM, Vladimir Kozlov wrote: > On 3/23/18 10:37 AM, Doerr, Martin wrote: >> Hi Vladimir, >> >> thanks for the quick reply. Just a few answers. I'll take a closer >> look next week. >> >>> You can't delete thread when it is NULL >> C++ supports calling delete NULL so I think it would be uncommon to >> check it. If there's a problem, I think the delete operator should get >> fixed. >> >> "If expression evaluates to a null pointer value, no destructors are >> called, and the deallocation function may or may not be called (it's >> implementation-defined), but the default deallocation functions are >> guaranteed to do nothing when handed a null pointer." [1] > > I am sure our code analyzing tool, which we use to check code > correctness, will complain about it. > >> >>> We may need to free corresponding java thread object when we remove >>> compiler threads.
>> I think it would be bad to remove the Java Thread objects because we'd >> need to recreate them which is rather expensive and violates the >> design principle that Compiler Threads are not allowed to call Java. >> Removing them wouldn't save much memory. Keeping them in global >> handles seems to be beneficial and makes this change easier. > > Okay. > >> >>> And I thought we would need to add only one threads each time when we >>> hit some queue size threshold. At the start queues filled up very >>> fast so you may end up creating all compiler threads. >> My current formula only creates as much compiler threads so that there >> exist 2 compile jobs per thread. I think this is better for startup, >> but we can reevaluate this. > > Would be nice to see graph how number of compiler threads change with > time depending on load for some applications (for example, jbb2005 and > specjvm2008 if you have them)? > >> >> Thanks for the improvement proposals. I'll implement them next week. >> Nevertheless, the current version can already be tested. > > I started our testing. > > I just remember that we may need to treat -Xcomp and CTW cases specially. > > I also ran jtreg testing locally on x64 linux for compiler/jvmci tests. > And also tier1 compiler tests with Graal as JIT > (-XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCI > -XX:+TieredCompilation -XX:+UseJVMCICompiler -Djvmci.Compiler=graal). > They passed. I think JVMCI code is fine. 
> > But I see crash in CompileLog::finish_log_on_error() function in > compiler/compilercontrol jtreg tests (they are not in tier1) with normal > jtreg runs: > > FAILED: compiler/compilercontrol/commandfile/LogTest.java > FAILED: compiler/compilercontrol/commands/LogTest.java > FAILED: compiler/compilercontrol/directives/LogTest.java > FAILED: compiler/compilercontrol/jcmd/AddLogTest.java > FAILED: compiler/compilercontrol/jcmd/StressAddMultiThreadedTest.java > FAILED: compiler/compilercontrol/logcompilation/LogTest.java > > I started performance testing too. > > Thanks, > Vladimir > >> >> Best regards, >> Martin >> >> >> [1] http://en.cppreference.com/w/cpp/language/delete >> >> >> -----Original Message----- >> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >> Sent: Friday, March 23, 2018 18:17 >> To: Doerr, Martin >> Cc: Igor Veresov (igor.veresov at oracle.com) ; >> White, Derek ; >> 'hotspot-compiler-dev at openjdk.java.net' >> >> Subject: Re: RFR(M): 8198756: Lazy allocation of compiler threads >> >> Very cool! >> >> Few thoughts. >> >> You can't delete thread when it is NULL (missing check or refactor code): >> >> if (thread == NULL || thread->osthread() == NULL) { >> + if (UseDynamicNumberOfCompilerThreads && >> comp->num_compiler_threads() > 0) { >> + delete thread; >> >> Why not keep handle instead of returning naked oop from >> create_thread_oop()? You create Handle again >> >> Start fields names with _ to distinguish them from local variable: >> >> + static int c1_count, c2_count; >> >> In possibly_add_compiler_threads() you can use c2_count instead of >> calling compile_count() again and array size is fixed >> already: >> >> + int new_c2_count = MIN3(_c2_compile_queue->size() / 2, >> + >> CompilationPolicy::policy()->compiler_count(CompLevel_full_optimization), >> >> And I thought we would need to add only one threads each time when we >> hit some queue size threshold. At the start queues
At the start queues >> filled up very fast so you may end up creating all compiler threads. >> Or we may need more complex formula. >> >> We may need to free corresponding java thread object when we remove >> compiler threads. >> >> Thanks, >> Vladimir >> >> On 3/23/18 8:04 AM, Doerr, Martin wrote: >>> Hi Vladimir, >>> >>> thanks for updating the RFE. I already had similar ideas so I've >>> implemented a prototype. >>> >>> I'll be glad if you can support this effort. >>> >>> My implementation starts only one thread per type (C1/C2) initially. >>> Compiler threads start additional threads depending >>> on the compile queue size, the available memory and the predetermined >>> maximum. The Java Thread objects get created >>> during startup so the Compiler Threads don't need to call Java. >>> >>> The heuristics (in possibly_add_compiler_threads()) are just an >>> initial proposal and we may want to add tuning >>> parameters or different numbers. >>> >>> Threads get stopped in reverse order as they were created when their >>> compile queue is empty for some time. >>> >>> The feature can be switched by >>> -XX:+/-UseDynamicNumberOfCompilerThreads. Thread creating and removal >>> can be traced by >>> -XX:+TraceCompilerThreads. >>> >>> Webrev is here: >>> >>> http://cr.openjdk.java.net/~mdoerr/8198756_CompilerCount/webrev.01/ >>> >>> The following issues need to get addressed, yet: >>> >>> -Test JVMCI support. (I'm not familiar with it.) >>> >>> -Possible memory leaks. I've added some delete calls when a thread >>> dies, but they may be incomplete. >>> >>> -Logging. >>> >>> -Performance and memory consumption evaluation. >>> >>> It would be great to get support and advice for these issues. 
>>> >>> Best regards, >>> >>> Martin >>> From martin.doerr at sap.com Tue Mar 27 13:49:44 2018 From: martin.doerr at sap.com (Doerr, Martin) Date: Tue, 27 Mar 2018 13:49:44 +0000 Subject: RFR(M): 8198756: Lazy allocation of compiler threads In-Reply-To: <12ed808d-6efb-800a-e18a-3dd9c191390d@oracle.com> References: <864a492b-6f14-2372-2783-5f687bc43638@oracle.com> <41d9a441f84e41919f4566df78b46a0f@sap.com> <295c3925-9605-42d7-aac8-a7074b237aa0@oracle.com> <12ed808d-6efb-800a-e18a-3dd9c191390d@oracle.com> Message-ID: <18b4cfea1fcd4979a3db3f0120045156@sap.com> Ok. I'll try to use smr_delete(). Thank you. -----Original Message----- From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] Sent: Monday, March 26, 2018 19:29 To: Erik Osterlund Cc: Doerr, Martin ; hotspot-compiler-dev at openjdk.java.net Subject: Re: RFR(M): 8198756: Lazy allocation of compiler threads Thank you, Erik, for pointing that out. Vladimir On 3/24/18 8:49 AM, Erik Osterlund wrote: > Hi, > > Just thought I should mention that no JavaThread (hence including compiler threads) that has been added to the Threads list may be deleted directly with delete. Instead you should delete it with SMR by calling smr_delete(). > > Thanks, > /Erik > >> On 24 Mar 2018, at 01:58, Vladimir Kozlov wrote: >> >>> On 3/23/18 10:37 AM, Doerr, Martin wrote: >>> Hi Vladimir, >>> thanks for the quick reply. Just a few answers. I'll take a closer look next week. >>>> You can't delete thread when it is NULL >>> C++ supports calling delete NULL so I think it would be uncommon to check it. If there's a problem, I think the delete operator should get fixed. >>> "If expression evaluates to a null pointer value, no destructors are called, and the deallocation function may or may not be called (it's implementation-defined), but the default deallocation functions are guaranteed to do nothing when handed a null pointer." [1] >> >> I am sure our code analyzing tool, which we use to check code correctness, will complain about it.
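The `delete NULL` question can be checked with a small, self-contained C++ sketch. The `Tracked` type and helper functions are hypothetical, not HotSpot code. Deleting a null pointer is well-defined and runs no destructor. As the quoted cppreference text notes, a deallocation function may or may not be invoked for a null pointer, so a class-specific `operator delete` should tolerate `nullptr` itself, which is the convention Martin describes. Whether a particular static analyzer flags the unchecked form is a separate, tool-specific question. This sketch also does not model Erik's point: a `JavaThread` already added to the Threads list must go through `smr_delete()` rather than plain `delete`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical type with class-specific allocation functions
// (not HotSpot's actual C-heap object code).
struct Tracked {
  static int destroyed;  // destructor calls observed
  static int freed;      // non-null deallocations observed

  ~Tracked() { ++destroyed; }

  static void* operator new(std::size_t size) {
    void* p = std::malloc(size);
    if (p == nullptr) throw std::bad_alloc();
    return p;
  }

  // Whether this is invoked for a null pointer is not guaranteed,
  // so it must handle nullptr itself.
  static void operator delete(void* p) noexcept {
    if (p == nullptr) return;
    ++freed;
    std::free(p);
  }
};

int Tracked::destroyed = 0;
int Tracked::freed = 0;

// Deleting a null pointer is well-defined: no destructor runs.
void delete_null() {
  Tracked* t = nullptr;
  delete t;
}

// Deleting a real object runs the destructor, then the deallocation function.
void delete_object() {
  Tracked* t = new Tracked();
  delete t;
}
```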
>> >>>> We may need to free corresponding java thread object when we remove compiler threads. >>> I think it would be bad to remove the Java Thread objects because we'd need to recreate them which is rather expensive and violates the design principle that Compiler Threads are not allowed to call Java. Removing them wouldn't save much memory. Keeping them in global handles seems to be beneficial and makes this change easier. >> >> Okay. >> >>>> And I thought we would need to add only one threads each time when we hit some queue size threshold. At the start queues filled up very fast so you may end up creating all compiler threads. >>> My current formula only creates as much compiler threads so that there exist 2 compile jobs per thread. I think this is better for startup, but we can reevaluate this. >> >> Would be nice to see graph how number of compiler threads change with time depending on load for some applications (for example, jbb2005 and specjvm2008 if you have them)? >> >>> Thanks for the improvement proposals. I'll implement them next week. Nevertheless, the current version can already be tested. >> >> I started our testing. >> >> I just remember that we may need to treat -Xcomp and CTW cases specially. >> >> I also ran jtreg testing locally on x64 linux for compiler/jvmci tests. And also tier1 compiler tests with Graal as JIT (-XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCI -XX:+TieredCompilation -XX:+UseJVMCICompiler -Djvmci.Compiler=graal). >> They passed. I think JVMCI code is fine. 
>> >> But I see crash in CompileLog::finish_log_on_error() function in compiler/compilercontrol jtreg tests (they are not in tier1) with normal jtreg runs: >> >> FAILED: compiler/compilercontrol/commandfile/LogTest.java >> FAILED: compiler/compilercontrol/commands/LogTest.java >> FAILED: compiler/compilercontrol/directives/LogTest.java >> FAILED: compiler/compilercontrol/jcmd/AddLogTest.java >> FAILED: compiler/compilercontrol/jcmd/StressAddMultiThreadedTest.java >> FAILED: compiler/compilercontrol/logcompilation/LogTest.java >> >> I started performance testing too. >> >> Thanks, >> Vladimir >> >>> Best regards, >>> Martin >>> [1] http://en.cppreference.com/w/cpp/language/delete >>> -----Original Message----- >>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >>> Sent: Friday, March 23, 2018 18:17 >>> To: Doerr, Martin >>> Cc: Igor Veresov (igor.veresov at oracle.com) ; White, Derek ; 'hotspot-compiler-dev at openjdk.java.net' >>> Subject: Re: RFR(M): 8198756: Lazy allocation of compiler threads >>> Very cool! >>> Few thoughts. >>> You can't delete thread when it is NULL (missing check or refactor code): >>> if (thread == NULL || thread->osthread() == NULL) { >>> + if (UseDynamicNumberOfCompilerThreads && comp->num_compiler_threads() > 0) { >>> + delete thread; >>> Why not keep handle instead of returning naked oop from create_thread_oop()? You create Handle again >>> Start fields names with _ to distinguish them from local variable: >>> + static int c1_count, c2_count; >>> In possibly_add_compiler_threads() you can use c2_count instead of calling compile_count() again and array size is fixed >>> already: >>> + int new_c2_count = MIN3(_c2_compile_queue->size() / 2, >>> + CompilationPolicy::policy()->compiler_count(CompLevel_full_optimization), >>> And I thought we would need to add only one threads each time when we hit some queue size threshold. 
At the start queues >>> filled up very fast so you may end up creating all compiler threads.
Or we may need more complex formula. >>> We may need to free corresponding java thread object when we remove compiler threads. >>> Thanks, >>> Vladimir >>>> On 3/23/18 8:04 AM, Doerr, Martin wrote: >>>> Hi Vladimir, >>>> >>>> thanks for updating the RFE. I already had similar ideas so I've implemented a prototype. >>>> >>>> I'll be glad if you can support this effort. >>>> >>>> My implementation starts only one thread per type (C1/C2) initially. Compiler threads start additional threads depending >>>> on the compile queue size, the available memory and the predetermined maximum. The Java Thread objects get created >>>> during startup so the Compiler Threads don't need to call Java. >>>> >>>> The heuristics (in possibly_add_compiler_threads()) are just an initial proposal and we may want to add tuning >>>> parameters or different numbers. >>>> >>>> Threads get stopped in reverse order as they were created when their compile queue is empty for some time. >>>> >>>> The feature can be switched by -XX:+/-UseDynamicNumberOfCompilerThreads. Thread creating and removal can be traced by >>>> -XX:+TraceCompilerThreads. >>>> >>>> Webrev is here: >>>> >>>> http://cr.openjdk.java.net/~mdoerr/8198756_CompilerCount/webrev.01/ >>>> >>>> The following issues need to get addressed, yet: >>>> >>>> -Test JVMCI support. (I'm not familiar with it.) >>>> >>>> -Possible memory leaks. I've added some delete calls when a thread dies, but they may be incomplete. >>>> >>>> -Logging. >>>> >>>> -Performance and memory consumption evaluation. >>>> >>>> It would be great to get support and advice for these issues. 
>>>> >>>> Best regards, >>>> >>>> Martin >>>> > From thomas.stuefe at gmail.com Tue Mar 27 14:20:49 2018 From: thomas.stuefe at gmail.com (Thomas Stüfe) Date: Tue, 27 Mar 2018 16:20:49 +0200 Subject: RFR(xxs): 8200297: Build failures after JDK-8198691 (CodeHeap State Analytics) Message-ID: Hi, could I please have reviews for this tiny but urgent fix to x86-32: Bug: https://bugs.openjdk.java.net/browse/JDK-8200297 webrev: http://cr.openjdk.java.net/~stuefe/webrevs/8200297-builderrors-on-x86-32-after-codeheap-lucy/webrev.00/webrev/ I built on linux x64 and x86. Currently tier1 tests are running. Thanks, Thomas From shade at redhat.com Tue Mar 27 14:30:44 2018 From: shade at redhat.com (Aleksey Shipilev) Date: Tue, 27 Mar 2018 16:30:44 +0200 Subject: RFR(xxs): 8200297: Build failures after JDK-8198691 (CodeHeap State Analytics) In-Reply-To: References: Message-ID: On 03/27/2018 04:20 PM, Thomas Stüfe wrote: > could I please have reviews for this tiny but urgent fix to x86-32: > > Bug: https://bugs.openjdk.java.net/browse/JDK-8200297 > webrev: http://cr.openjdk.java.net/~stuefe/webrevs/8200297-builderrors-on-x86-32-after-codeheap-lucy/webrev.00/webrev/ Not sure if we want to cast to size_t, instead of selecting the format specifiers for the expressions. Have you printed the tables before/after? Because it seems you have lost a space here ("K"/"M" goes there in other lines): here | v - ast->print("[%5d ..%5d ): " - ,(SizeDistributionArray[i].rangeStart<print("[" SIZE_FORMAT_W(5) ".." SIZE_FORMAT_W(5) "): " + ,(size_t)(SizeDistributionArray[i].rangeStart<print("[%5d ..%5d ): " - ,(SizeDistributionArray[i].rangeStart<print("[" SIZE_FORMAT_W(5) ".."
SIZE_FORMAT_W(5) "): " + ,(size_t)(SizeDistributionArray[i].rangeStart< From rwestrel at redhat.com Tue Mar 27 14:35:16 2018 From: rwestrel at redhat.com (Roland Westrelin) Date: Tue, 27 Mar 2018 16:35:16 +0200 Subject: RFR(M): 8200303: C2 should leverage profiling for lookupswitch/tableswitch Message-ID: http://cr.openjdk.java.net/~roland/8200303/webrev.00/ Over in the Shenandoah project, Aleksey found that: 1) a counted loop with a switch: for (...) { switch (..) { ... default: throw SomeException(); } } where some cases break out of the loop would not perform as well when loop strip mining is enabled, even if the cases that exit the loop are never taken in practice. Because C2 gives all branches out of a JumpNode the same probability, exiting the loop has a non null probability and GCM computes (wrongly) that scheduling the loop strip mining book keeping logic in the loop is cheaper than out of the loop. 2) Shenandoah write barriers in some of the cases should be hoisted but are not because C2 can't tell that only a single case of the switch is ever hit. http://mail.openjdk.java.net/pipermail/shenandoah-dev/2017-December/004535.html In the Shenandoah repo, we have a change that makes C2 leverage profiling for switch. Experiments showed that 1) and 2) above are fixed and that some common benchmarks run with parallel gc benefit as well (~+7% on Serial): http://mail.openjdk.java.net/pipermail/shenandoah-dev/2018-February/004886.html The patch I'm proposing here is based on the patch we've been using for a couple months in Shenandoah: - it fixes profile collection in c1 for lookupswitch/tableswitch - it sets profiling information on IfNodes and JumpNodes emitted from lookupswitch/tableswitch and propagate it after matching so GCM can take advantage of it - it takes advantage of profiling to find never taken cases and trim down the cases (or ranges as they're called in the code). A never taken range can now cause an uncommon trap. 
and also has some improvements: - if some ranges are a lot more common than others, it might pay off to check for them one after the other before going to the binary search. The patch has some logic to evaluate the number of steps in the binary search and determine whether checking for the most common case upfront would pay off (from profile data). - the binary search doesn't always keep the tree balanced but instead picks a mid point that splits frequencies in half. Roland. From vladimir.kozlov at oracle.com Tue Mar 27 14:42:59 2018 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 27 Mar 2018 07:42:59 -0700 Subject: RFR(L): 8198691: CodeHeap State Analytics In-Reply-To: <1DFFA964-DE2E-48E4-BABD-BA36A95C067A@sap.com> References: <250CD3F0-F4AB-4EA1-9EA9-8136442FDFF9@sap.com> <8EB0096D-1020-4E86-82E1-2D7D564601AE@sap.com> <14c67b9d-4e1d-cb49-b29a-0d4cd02fdcc9@oracle.com> <2c4bac01-9775-2abc-01b6-f20bb580c257@oracle.com> <201E8551-1024-4BF2-996D-3387E6C1D52A@sap.com> <1ed9df48-cf5e-8755-f0f5-c25ed0efe8fd@oracle.com> <0C762F33-344E-49AB-B5DC-0553BFB67533@sap.com> <65dd8816-620f-68e9-7f3b-f221aa0a43de@oracle.com> <7B32C15D-EF32-4544-BAA4-3C7AAD99ED55@sap.com> <439573c8-d24c-59e2-1355-d08df369792b@oracle.com> <7870010E-543F-450B-BB9A-6255D31D1BC1@sap.com> <1DFFA964-DE2E-48E4-BABD-BA36A95C067A@sap.com> Message-ID: On 3/27/18 2:30 AM, Schmidt, Lutz wrote: > Hi Vladimir, > thank you for testing and pushing! > > I will spend some thoughts on how to automatically test functionality, open an RFE (https://bugs.openjdk.java.net/browse/JDK-8200296), and create some tests. But, please, give me a few days' grace period before you expect an RFR. Okay, no problem. Vladimir > > Thank you, > Lutz > > On 26.03.18, 21:58, "Vladimir Kozlov" wrote: > > All tests passed! I pushed changes. > > But we forgot to add tests to check functionality. Please file a new RFE > and add a few jtreg tests. There are examples in compiler/codecache/cli/ > and other places.
> > Thanks, > Vladimir > > On 3/26/18 9:46 AM, Vladimir Kozlov wrote: > > On 3/26/18 6:57 AM, Schmidt, Lutz wrote: > >> Thank you Vladimir! > >> > >> The missing ttyLocker sneaked out of the code unintentionally and > >> unnoticed. Sorry for that. I have put it back in. > > > > Okay. > > > >> > >> From our experience, it is very helpful to have NMethodSweeper > >> information available whenever you look at CodeHeap state analytics. I > >> changed the code in java.cpp such that NMethodSweeper::print(out) > >> isn't called twice. > > > > Good. > > > >> > >> I have created a new webrev at > >> http://cr.openjdk.java.net/~lucy/webrevs/8198691.03/index.html > > > > This looks good. I will start testing this version. > > > > Thanks, > > Vladimir > > > >> > >> Thank you! > >> Lutz > >> > >> > >> On 23.03.18, 23:40, "Vladimir Kozlov" > >> wrote: > >> > >> This looks good. > >> NMethodSweeper::print(out) may be called twice in java.cpp > >> because print_heapinfo() also calls it > >> through print_info(). > >> You removed ttyLocker from NMethodSweeper::print() and you don't > >> have lock in > >> CompileBroker::print_info() which is significant output block. > >> Other places are fine AFAIS. Thank you for fixing coding style. > >> Thanks, > >> Vladimir > >> On 3/23/18 9:30 AM, Schmidt, Lutz wrote: > >> > Hi Vladimir, Tobias, > >> > > >> > I have worked on your comments quite some time. There were > >> changes to > >> > - share/code/codeCache.cpp > >> > - share/code/codeHeapState.cpp > >> > - share/compiler/compileBroker.cpp > >> > - share/memory/heap.hpp > >> > - share/runtime/java.cpp > >> > > >> > Here is a summary of what I changed/reworked/adapted: > >> > - The lock order problem is solved. > >> > - The CodeHeapStateAnalytics_lock > >> > o is acquired before the "aggregate" step is begun. > >> > o is held continuously during the aggregate and print > >> function. > >> > o is released at function return (after all work is done. 
> >> > o just protects from modification of the static variables > >> by other threads. > >> > - The CodeCache_lock > >> > o is acquired after the CodeHeapStateAnalytics_lock and > >> only if an "aggregate" step is to be performed. > >> > o hold time was never observed to be more than one > >> second. Not a guarantee, though. > >> > - The tty_lock is never acquired during the "aggregate" step, > >> so there is no interference with the CodeCache_lock. > >> > - In the print* functions, blocks that need to stay together > >> are first composed into a bufferedStream (size 4k). They are then > >> printed to the given outputStream under tty_lock. > >> > - The remaining out->print_cr() are left by intention. They > >> print diagnostic info if some internal inconsistency is found. > >> > - The OpenJDK code style wrt. if-then-else should now be > >> respected everywhere. > >> > - The commented lines you mentioned (codeHeapState.cpp/.hpp) > >> are gone. > >> > - The "coding alternatives" for printing to the log stream > >> are gone. > >> > > >> > SAP-internal testing against SAP JVM did not reveal any problems. > >> > Testing OpenJDK (jdk/hs repo, linuxx86_64) was all green. Other > >> platforms did not run due to system issues. > >> > > >> > There is a new webrev at > >> http://cr.openjdk.java.net/~lucy/webrevs/8198691.02/ > >> > > >> > Thanks for spending some time, again, on this RFR. > >> > > >> > Best regards, > >> > Lutz > >> > > >> > On 20.03.18, 23:01, "Vladimir Kozlov" > >> wrote: > >> > > >> > As I remember we are trying to lock tty outside print > >> functions. > >> > > >> > Yes, it could be troublesome if it is Mbs of output. > >> Especially when you do it for "full codecache" > >> > event when VM is still running. You also have > >> CodeCache_lock in print_heapinfo() and it would not be > >> > good to hold both locks at the same time. 
I think to have > >> "micro locking" (with comments) in > >> > print_heapinfo() is better then to have lock in each print > >> function. > >> > > >> > Vladimir > >> > > >> > On 3/20/18 11:57 AM, Schmidt, Lutz wrote: > >> > > Hi Vladimir, > >> > > I already saw that code but was a little hesitant to > >> code the same way. Why? In my case, the stringStream buffer could > >> become fairly large. Actual size depends on CodeHeap size and contents > >> as well as printing parameters. If you tell me some MB are OK, I can > >> change my code. > >> > > Thanks, > >> > > Lutz > >> > > > >> > > On 20.03.18, 19:42, "Vladimir Kozlov" > >> wrote: > >> > > > >> > > I think you should follow what we do with > >> CodeCache::print_summary(): > >> > > > >> > > > >> http://hg.openjdk.java.net/jdk/hs/file/74db2b7cec75/src/hotspot/share/code/codeCache.cpp#l1359 > >> > >> > > > >> > > First, print into local buffer stringStream and > >> then lock tty when print that buffer. > >> > > > >> > > Thanks, > >> > > Vladimir > >> > > > >> > > On 3/20/18 11:29 AM, Schmidt, Lutz wrote: > >> > > > Hi Tobias, > >> > > > > >> > > > thank you for uncovering this. In > >> CodeCache::report_codemem_full() I oversaw that the tty lock is held > >> at the place I inserted the call to CompileBroker::print_heapinfo(). > >> > > > > >> > > > That bug triggered some thoughts in my brain, > >> resulting in a question or two: > >> > > > > >> > > > Given the complex output of > >> CompileBroker::print_heapinfo(), what would be the OpenJDK approach to > >> tty locking? > >> > > > > >> > > > Should I do "micro locking", trying to keep > >> together only small blocks? That's what is implemented now. > >> > > > Should I lock tty before each call to a print > >> function (like print_usedSpace, print_freeSpace, ...)? > >> > > > > >> > > > Either approach has its advantages, so I'm more > >> or less neutral. What do you all think? 
> >> > > > > >> > > > Depending on what's in favor by the community, I > >> will move the locks accordingly. > >> > > > > >> > > > Thanks, > >> > > > Lutz > >> > > > > >> > > > > >> > > > On 20.03.18, 15:45, "Tobias Hartmann" > >> wrote: > >> > > > > >> > > > Hi Lutz, > >> > > > > >> > > > I've already started testing with > >> -Xlog:codecache=Debug and found a problem: > >> > > > > >> > > > The following tests > >> > > > compiler/whitebox/AllocationCodeBlobTest.java > >> > > > compiler/codecache/OverflowCodeCacheTest.java > >> > > > > >> compiler/codecache/stress/ReturnBlobToWrongHeapTest.java > >> > > > > >> compiler/codecache/stress/RandomAllocationTest.java > >> > > > > >> compiler/profiling/spectrapredefineclass_classloaders/Launcher.java > >> > > > > >> compiler/profiling/spectrapredefineclass/Launcher.java > >> > > > > >> > > > fail with > >> > > > # fatal error: acquiring lock > >> CodeHeapStateAnalytics_lock/6 out of order with lock tty_lock/0 -- > >> > > > possible deadlock > >> > > > > >> > > > Let me know if you need more information to > >> reproduce! > >> > > > > >> > > > Thanks, > >> > > > Tobias > >> > > > > >> > > > On 20.03.2018 11:25, Schmidt, Lutz wrote: > >> > > > > Hi Tobias, > >> > > > > > >> > > > > thank you! If you haven't started yet, you > >> may want to wait with testing a moment. I will remove the comments > >> Vladimir and you complained about and update the webrev. It's comments > >> only, but you never know... > >> > > > > > >> > > > > Thanks, > >> > > > > Lutz > >> > > > > > >> > > > > On 20.03.18, 10:46, "Tobias Hartmann" > >> wrote: > >> > > > > > >> > > > > Hi Lutz, > >> > > > > > >> > > > > very nice work! Thanks for > >> incorporating the requested changes. I think you can remove the commented > >> > > > > LogStream code. > >> > > > > > >> > > > > I'll re-run the tests that failed with > >> the last webrev. 
> >> > > > > > >> > > > > Best regards, > >> > > > > Tobias > >> > > > > > >> > > > > On 19.03.2018 17:00, Schmidt, Lutz wrote: > >> > > > > > Dear all, > >> > > > > > > >> > > > > > this is the next (second) iteration > >> of my CodeHeap State Analytics effort. It reflects all the comments > >> and suggestions I received on my initial RFR (sent out on March 1st). > >> Please read on to learn what was changed and what kept as is. > >> > > > > > > >> > > > > > May I please request reviews for > >> > > > > > > >> > > > > > Bug: > >> https://bugs.openjdk.java.net/browse/JDK-8198691 > >> > > > > > Webrev: > >> http://cr.openjdk.java.net/~lucy/webrevs/8198691.01/ > >> > > > > > > >> > > > > > Instead of keeping the long tail of > >> comments and responses, I decided to provide a summary of what happened. > >> > > > > > - Most of the new code was moved to > >> new files: share/code/codeHeapState.cpp and share/code/codeHeapState.hpp > >> > > > > > - I have added, as requested, an > >> abbreviated version of the "General Description" chapter to > >> codeHeapState.cpp > >> > > > > > - In case of SegmentedCodeCache, > >> the iteration is limited to FOR_ALL_ALLOCABLE_HEAPS(). There were > >> issues in aot tests when using FOR_ALL_HEAPS(). > >> > > > > > - All references to the RFE id > >> should be gone. > >> > > > > > - In share/runtime/java.cpp, the > >> call to CompileBroker::print_heapinfo() now is close to > >> "PrintCodeCache" for both, product and nonproduct cases. > >> > > > > > - The edited/updated documentation > >> is available as an attachment to the bug (in PDF format). > >> > > > > > - I added code to > >> share/code/codeCache.cpp (report_codemem_full()) to print the CodeHeap > >> state for the first occurrence of the "full" condition. > >> > > > > > - The code style "hickups", noted > >> by Tobias Hartmann, are gone. > >> > > > > > - The compile time warnings and > >> errors are resolved. > >> > > > > > > >> > > > > > -XX:+PrintCodeHeapState vs. 
> >> -Xlog:codecache=Trace > >> > > > > > I clearly understand and support the > >> intention to get rid of the Print* command line arguments. Therefore, > >> the PrintCodeHeapState command line argument is gone. You can request > >> the CodeHeap state analytics with the -Xlog:codecache=Trace (vm > >> shutdown) or -Xlog:codecache=Debug (CodeCache full and vm shutdown) > >> switches. The output is directed to tty, not to the log stream. > >> > > > > > > >> > > > > > The reason for not using the log > >> stream is simple: UL prefixes every line with a timestamp and the > >> trace tags. Unfortunately, that messes up my formatting big time. The > >> jcmd output, on the other hand, will not have the UL prefixes. I would > >> have to distinguish between UL and jcmd output when formatting. In > >> addition, I do not see a benefit from adding the same UL prefix to > >> thousands of lines. > >> > > > > > > >> > > > > > Comments are very welcome! > >> > > > > > > >> > > > > > Best Regards, > >> > > > > > Lutz > >> > > > > > > >> > > > > > > >> > > > > > > >> > > > > > > >> > > > > > >> > > > > > >> > > > > >> > > > > >> > > > >> > > > >> > > >> > > >> > > From vladimir.kozlov at oracle.com Tue Mar 27 14:46:36 2018 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 27 Mar 2018 07:46:36 -0700 Subject: RFR: 8200168: Remove DONT_USE_REGISTER_DEFINES on Sparc In-Reply-To: <22590c02-934c-e924-04d8-6ff20dd91b59@oracle.com> References: <1359f2a0-e62b-f953-f053-fdcf99e8e87e@oracle.com> <22590c02-934c-e924-04d8-6ff20dd91b59@oracle.com> Message-ID: Looks good. I thought I reviewed it ;) Thanks, Vladimir On 3/27/18 1:27 AM, Per Liden wrote: > Could I please get a second review on this? Vladimir? 
> > /Per > > On 03/23/2018 11:29 AM, Per Liden wrote: >> Hi, >> >> Please review this patch to remove register macros on Sparc, as >> discussed here: >> http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2018-March/028541.html >> >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8200168 >> Webrev: http://cr.openjdk.java.net/~pliden/8200168/webrev.0 >> >> Testing: Passed hs-tier{1,2} >> >> /Per From vladimir.kozlov at oracle.com Tue Mar 27 14:48:35 2018 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 27 Mar 2018 07:48:35 -0700 Subject: [11] RFR(S): 8200230: [Graal] Compilations should not be enqueued before Graal is initialized In-Reply-To: <4b85c99e-a48b-4064-d5a2-5475899f2a77@oracle.com> References: <4b85c99e-a48b-4064-d5a2-5475899f2a77@oracle.com> Message-ID: On 3/27/18 2:10 AM, Tobias Hartmann wrote: > Thanks Vladimir! > > On 26.03.2018 19:01, Vladimir Kozlov wrote: >> Looks good. Did you test these changes when Graal is enabled as JIT? > > Yes, I've tested with Graal but since we have lots of known issues, it's a bit hard to filter out > the real ones. I haven't seen any related issues. Okay. Thanks, Vladimir > > Best regards, > Tobias > > >> On 3/26/18 3:35 AM, Tobias Hartmann wrote: >>> Hi, >>> >>> please review the following patch: >>> https://bugs.openjdk.java.net/browse/JDK-8200230 >>> http://cr.openjdk.java.net/~thartmann/8200230/webrev.00/ >>> >>> Looking at the PrintCompilation output when running Graal with -Xbatch and -XX:-TieredCompilation >>> shows lots of blocking compilations that time out because Graal is not yet initialized. Execution >>> with -version therefore takes 41 seconds on my machine with a fastdebug build. >>> >>> With -Xbatch, we should only allow compilations to be enqueued when Graal is fully initialized. This >>> reduces execution time with -version to 2.5 seconds on my machine. >>> >>> Thanks to Doug Simon for providing the patch. 
>>> >>> Best regards, >>> Tobias >>> From vladimir.kozlov at oracle.com Tue Mar 27 14:49:46 2018 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 27 Mar 2018 07:49:46 -0700 Subject: [11] RFR(S): 8200290: Scratch buffer creation fails with "assert(!current_thread_in_native()) failed: must not be in native" on SPARC In-Reply-To: References: Message-ID: Looks good. Thanks, Vladimir On 3/27/18 2:08 AM, Tobias Hartmann wrote: > Hi, > > please review the following patch: > https://bugs.openjdk.java.net/browse/JDK-8200290 > http://cr.openjdk.java.net/~thartmann/8200290/webrev.00/ > > Similar to JDK-8193699, the code in MacroAssembler::constant_oop_address() needs to be changed after > JDK-8167372 to transition from native. > > This only reproduces on older SPARC machines (I've hit it on a SPARC64 VII+) and therefore didn't > show up in our regular testing. I've verified the fix on this machine. > > Thanks, > Tobias > From per.liden at oracle.com Tue Mar 27 14:57:14 2018 From: per.liden at oracle.com (Per Liden) Date: Tue, 27 Mar 2018 16:57:14 +0200 Subject: RFR: 8200168: Remove DONT_USE_REGISTER_DEFINES on Sparc In-Reply-To: References: <1359f2a0-e62b-f953-f053-fdcf99e8e87e@oracle.com> <22590c02-934c-e924-04d8-6ff20dd91b59@oracle.com> Message-ID: Awesome! Thanks Vladimir! /Per On 03/27/2018 04:46 PM, Vladimir Kozlov wrote: > Looks good. I thought I reviewed it ;) > > Thanks, > Vladimir > > On 3/27/18 1:27 AM, Per Liden wrote: >> Could I please get a second review on this? Vladimir? 
>> >> /Per >> >> On 03/23/2018 11:29 AM, Per Liden wrote: >>> Hi, >>> >>> Please review this patch to remove register macros on Sparc, as >>> discussed here: >>> http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2018-March/028541.html >>> >>> >>> Bug: https://bugs.openjdk.java.net/browse/JDK-8200168 >>> Webrev: http://cr.openjdk.java.net/~pliden/8200168/webrev.0 >>> >>> Testing: Passed hs-tier{1,2} >>> >>> /Per From tobias.hartmann at oracle.com Tue Mar 27 15:15:57 2018 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 27 Mar 2018 17:15:57 +0200 Subject: [11] RFR(S): 8200290: Scratch buffer creation fails with "assert(!current_thread_in_native()) failed: must not be in native" on SPARC In-Reply-To: References: Message-ID: Thanks, Vladimir! Best regards, Tobias On 27.03.2018 16:49, Vladimir Kozlov wrote: > Looks good. > > Thanks, > Vladimir > > On 3/27/18 2:08 AM, Tobias Hartmann wrote: >> Hi, >> >> please review the following patch: >> https://bugs.openjdk.java.net/browse/JDK-8200290 >> http://cr.openjdk.java.net/~thartmann/8200290/webrev.00/ >> >> Similar to JDK-8193699, the code in MacroAssembler::constant_oop_address() needs to be changed after >> JDK-8167372 to transition from native. >> >> This only reproduces on older SPARC machines (I've hit it on a SPARC64 VII+) and therefore didn't >> show up in our regular testing. I've verified the fix on this machine. >> >> Thanks, >> Tobias >> From vladimir.kozlov at oracle.com Tue Mar 27 15:20:51 2018 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 27 Mar 2018 08:20:51 -0700 Subject: RFR(M): 8200303: C2 should leverage profiling for lookupswitch/tableswitch In-Reply-To: References: Message-ID: <85ee42ef-1323-8513-164a-392592bed7d4@oracle.com> Thank you, Roland, for contributing this. Changes look reasonable. I will start testing including performance. 
Thanks, Vladimir On 3/27/18 7:35 AM, Roland Westrelin wrote: > > http://cr.openjdk.java.net/~roland/8200303/webrev.00/ > > Over in the Shenandoah project, Aleksey found that: > > 1) a counted loop with a switch: > > for (...) { > switch (..) { > ... > default: > throw SomeException(); > } > } > > where some cases break out of the loop would not perform as well when > loop strip mining is enabled, even if the cases that exit the loop are > never taken in practice. > > Because C2 gives all branches out of a JumpNode the same probability, > exiting the loop has a non null probability and GCM computes (wrongly) > that scheduling the loop strip mining book keeping logic in the loop is > cheaper than out of the loop. > > 2) Shenandoah write barriers in some of the cases should be hoisted but > are not because C2 can't tell that only a single case of the switch is > ever hit. > > http://mail.openjdk.java.net/pipermail/shenandoah-dev/2017-December/004535.html > > In the Shenandoah repo, we have a change that makes C2 leverage > profiling for switch. Experiments showed that 1) and 2) above are fixed > and that some common benchmarks run with parallel gc benefit as well > (~+7% on Serial): > > http://mail.openjdk.java.net/pipermail/shenandoah-dev/2018-February/004886.html > > The patch I'm proposing here is based on the patch we've been using for > a couple months in Shenandoah: > > - it fixes profile collection in c1 for lookupswitch/tableswitch > > - it sets profiling information on IfNodes and JumpNodes emitted from > lookupswitch/tableswitch and propagate it after matching so GCM can > take advantage of it > > - it takes advantage of profiling to find never taken cases and trim > down the cases (or ranges as they're called in the code). A never > taken range can now cause an uncommon trap. 
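The range-trimming step described in the quoted bullet can be illustrated with a small, self-contained model. This is a sketch under stated assumptions, not the webrev's code. `SwitchRange`, `kUncommonTrap` and `trim_never_taken` are hypothetical names. Ranges are assumed sorted and contiguous. A range with a zero profile count is retargeted to a trap marker, and then adjacent ranges that share a target are merged.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical model of one switch range [lo, hi] with its successor and
// the number of times the profile saw that range taken.
struct SwitchRange {
  int32_t lo;
  int32_t hi;
  int dest;        // successor block index (illustrative only)
  uint64_t count;  // profiled hit count
};

// Hypothetical marker meaning "deoptimize here" (an uncommon trap).
constexpr int kUncommonTrap = -1;

// Sends never-taken ranges to the trap marker, then merges neighbouring
// ranges that now have the same destination, so fewer ranges remain to test.
std::vector<SwitchRange> trim_never_taken(const std::vector<SwitchRange>& ranges) {
  std::vector<SwitchRange> out;
  for (SwitchRange r : ranges) {
    if (r.count == 0) {
      r.dest = kUncommonTrap;  // cold range: no compiled code for it
    }
    // Widen to 64 bits so hi + 1 cannot overflow.
    if (!out.empty() && out.back().dest == r.dest &&
        static_cast<int64_t>(out.back().hi) + 1 == r.lo) {
      out.back().hi = r.hi;        // extend the previous range
      out.back().count += r.count;
    } else {
      out.push_back(r);
    }
  }
  return out;
}
```

Merging matters because each surviving range costs a comparison in the lowered dispatch. Collapsing adjacent cold ranges into a single trap range can therefore shorten the binary search.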
> > and also has some improvements: > > - if some ranges are a lot more common than others, it might pay off to > check for them one after the other before going to the binary > search. The patch has some logic to evaluate the number of steps in > the binary search and determine whether checking for the most common > case upfront would pay off (from profile data) > > - the binary search doesn't always keep the tree balanced but instead > picks a mid point that split frequencies in half > > Roland. > From thomas.stuefe at gmail.com Tue Mar 27 15:27:41 2018 From: thomas.stuefe at gmail.com (Thomas Stüfe) Date: Tue, 27 Mar 2018 17:27:41 +0200 Subject: RFR(xxs): 8200297: Build failures after JDK-8198691 (CodeHeap State Analytics) In-Reply-To: References: Message-ID: New webrev: http://cr.openjdk.java.net/~stuefe/webrevs/8200297-builderrors-on-x86-32-after-codeheap-lucy/webrev.01/webrev/ Comments inline. On Tue, Mar 27, 2018 at 4:30 PM, Aleksey Shipilev wrote: > On 03/27/2018 04:20 PM, Thomas Stüfe wrote: > > could I please have reviews for this tiny but urgent fix to x86-32: > > > > Bug: https://bugs.openjdk.java.net/browse/JDK-8200297 > > webrev: http://cr.openjdk.java.net/~stuefe/webrevs/8200297-builderrors-on-x86-32-after-codeheap-lucy/webrev.00/webrev/ > > Not sure if we want to cast to size_t, instead of selecting the format > specifiers for the expressions. > > Okay. Since almost all printed expressions divide by "K" and "M", which are size_t, the expression type is converted to size_t as well, so I removed the extraneous size_t casts for those cases. Let's hope this works for all platforms and compilers. For the byte-size cases I kept the casts. Have you printed the tables before/after? Because it seems you have lost a > space here ("K"/"M" goes > there in other lines): > > here > | > v > - ast->print("[%5d ..%5d ): " > - ,(SizeDistributionArray[i].rangeStart< - ,(SizeDistributionArray[i].rangeEnd< + ast->print("[" SIZE_FORMAT_W(5) ".."
SIZE_FORMAT_W(5) "): " > + ,(size_t)(SizeDistributionArray[i]. > rangeStart< + ,(size_t)(SizeDistributionArray[i]. > rangeEnd< ); > > - ast->print("[%5d ..%5d ): " > - ,(SizeDistributionArray[i].rangeStart< - ,(SizeDistributionArray[i].rangeEnd< + ast->print("[" SIZE_FORMAT_W(5) ".." SIZE_FORMAT_W(5) "): " > + ,(size_t)(SizeDistributionArray[i]. > rangeStart< + ,(size_t)(SizeDistributionArray[i]. > rangeEnd< ); > > Fixed. Thanks, Thomas > > Thanks, > -Aleksey > > From shade at redhat.com Tue Mar 27 15:31:03 2018 From: shade at redhat.com (Aleksey Shipilev) Date: Tue, 27 Mar 2018 17:31:03 +0200 Subject: RFR(xxs): 8200297: Build failures after JDK-8198691 (CodeHeap State Analytics) In-Reply-To: References: Message-ID: <2f6bdfe4-f89a-c77c-ee11-6a825be6eeca@redhat.com> On 03/27/2018 05:27 PM, Thomas Stüfe wrote: > http://cr.openjdk.java.net/~stuefe/webrevs/8200297-builderrors-on-x86-32-after-codeheap-lucy/webrev.01/webrev/ OK from me, assuming it still builds x86_32, and passes submit-hs. Thanks, -Aleksey From forax at univ-mlv.fr Tue Mar 27 15:54:57 2018 From: forax at univ-mlv.fr (Remi Forax) Date: Tue, 27 Mar 2018 17:54:57 +0200 (CEST) Subject: RFR(M): 8200303: C2 should leverage profiling for lookupswitch/tableswitch In-Reply-To: References: Message-ID: <594841547.994777.1522166097222.JavaMail.zimbra@u-pem.fr> Roland, thanks for doing that, not having a switch correctly profiled and the profile not taken into account by c2 was a serious headache when trying to optimize a parser generator (10 years ago). I think you can also close JDK-8058192 as a dup.
Now, we are just missing the equivalent method handle combinator :) Rémi ----- Original Message ----- > From: "Roland Westrelin" > To: "hotspot compiler" > Sent: Tuesday, 27 March 2018 16:35:16 > Subject: RFR(M): 8200303: C2 should leverage profiling for lookupswitch/tableswitch > http://cr.openjdk.java.net/~roland/8200303/webrev.00/ > > Over in the Shenandoah project, Aleksey found that: > > 1) a counted loop with a switch: > > for (...) { > switch (..) { > ... > default: > throw SomeException(); > } > } > > where some cases break out of the loop would not perform as well when > loop strip mining is enabled, even if the cases that exit the loop are > never taken in practice. > > Because C2 gives all branches out of a JumpNode the same probability, > exiting the loop has a non null probability and GCM computes (wrongly) > that scheduling the loop strip mining book keeping logic in the loop is > cheaper than out of the loop. > > 2) Shenandoah write barriers in some of the cases should be hoisted but > are not because C2 can't tell that only a single case of the switch is > ever hit. > > http://mail.openjdk.java.net/pipermail/shenandoah-dev/2017-December/004535.html > > In the Shenandoah repo, we have a change that makes C2 leverage > profiling for switch. Experiments showed that 1) and 2) above are fixed > and that some common benchmarks run with parallel gc benefit as well > (~+7% on Serial): > > http://mail.openjdk.java.net/pipermail/shenandoah-dev/2018-February/004886.html > > The patch I'm proposing here is based on the patch we've been using for > a couple months in Shenandoah: > > - it fixes profile collection in c1 for lookupswitch/tableswitch > > - it sets profiling information on IfNodes and JumpNodes emitted from > lookupswitch/tableswitch and propagate it after matching so GCM can > take advantage of it > > - it takes advantage of profiling to find never taken cases and trim > down the cases (or ranges as they're called in the code). A never > taken range can now cause an uncommon trap.
A never > taken range can now cause an uncommon trap. > > and also has some improvements: > > - if some ranges are a lot more common than others, it might pay off to > check for them one after the other before going to the binary > search. The patch has some logic to evaluate the number of steps in > the binary search and determine whether checking for the most common > case upfront would pay off (from profile data) > > - the binary search doesn't always keep the tree balanced but instead > picks a mid point that split frequencies in half > > Roland. From lesliezhai at llvm.org.cn Tue Mar 27 15:57:35 2018 From: lesliezhai at llvm.org.cn (Leslie Zhai) Date: Tue, 27 Mar 2018 23:57:35 +0800 Subject: How to migrate hs17's LIR_Assembler::emit_exception_handler to hs25? Message-ID: <6c76544b-0830-bd5d-d1af-538372b1a269@llvm.org.cn>+29BEC09D0E7F7F4E Hi HotSpot compiler developers, I am new to HotSpot, and trying to migrate hs17 to hs25, but JDK-6919934 changed: int LIR_Assembler::emit_exception_handler() { ? // if the last instruction is a call (typically to do a throw which ? // is coming at the end after block reordering) the return address ? // must still point into the code area in order to avoid assertion ? // failures when searching for the corresponding bci => add a nop ? // (was bug 5/14/1999 - gri) ? __ nop(); ? // generate code for exception handler ? address handler_base = __ start_a_stub(exception_handler_size); ? if (handler_base == NULL) { ??? // not enough space left for the handler ??? bailout("exception handler overflow"); ??? return -1; ? } ? int offset = code_offset(); ? // if the method does not have an exception handler, then there is ? // no reason to search for one ? if (compilation()->has_exception_handlers() || compilation()->env()->jvmti_can_post_on_exceptions()) { ??? // the exception oop and pc are in rax, and rdx ??? // no other registers need to be preserved, so invalidate them ??? 
__ invalidate_registers(false, true, true, false, true, true); // check that there is really an exception __ verify_not_null_oop(rax); // search an exception handler (rax: exception oop, rdx: throwing pc) __ call(RuntimeAddress(Runtime1::entry_for(Runtime1::handle_exception_nofpu_id))); // if the call returns here, then the exception handler for particular // exception doesn't exist -> unwind activation and forward exception to caller } // the exception oop is in rax, // no other registers need to be preserved, so invalidate them __ invalidate_registers(false, true, true, true, true, true); // check that there is really an exception __ verify_not_null_oop(rax); // unlock the receiver/klass if necessary // rax,: exception ciMethod* method = compilation()->method(); if (method->is_synchronized() && GenerateSynchronizationCode) { monitorexit(FrameMap::rbx_oop_opr, FrameMap::rcx_opr, SYNC_header, 0, rax); } // unwind activation and forward exception to caller // rax,: exception __ jump(RuntimeAddress(Runtime1::entry_for(Runtime1::unwind_exception_id))); assert(code_offset() - offset <= exception_handler_size, "overflow"); __ end_a_stub(); return offset; } To: int LIR_Assembler::emit_exception_handler() { // if the last instruction is a call (typically to do a throw which // is coming at the end after block reordering) the return address // must still point into the code area in order to avoid assertion // failures when searching for the corresponding bci => add a nop // (was bug 5/14/1999 - gri) __ nop(); // generate code for exception handler address handler_base = __ start_a_stub(exception_handler_size); if (handler_base == NULL) { // not enough space left for the handler bailout("exception handler overflow"); return -1; } int offset = code_offset(); // the exception oop and pc are in rax, and rdx // no other registers need to be preserved, so invalidate them
__ invalidate_registers(false, true, true, false, true, true); // check that there is really an exception __ verify_not_null_oop(rax); // search an exception handler (rax: exception oop, rdx: throwing pc) __ call(RuntimeAddress(Runtime1::entry_for(Runtime1::handle_exception_nofpu_id))); __ stop("should not reach here"); assert(code_offset() - offset <= exception_handler_size, "overflow"); __ end_a_stub(); return offset; } I have no idea how to check whether or not the Java method has an exception handler *without* the `compilation()->has_exception_handlers()` condition. And after JDK-6939930 and JDK-7012914, the `unwind` handler was separated from the `exception` handler, and `LIR_Assembler::emit_unwind_handler` and the `handle_exception_from_callee` StubID were added: int LIR_Assembler::emit_exception_handler() { // if the last instruction is a call (typically to do a throw which // is coming at the end after block reordering) the return address // must still point into the code area in order to avoid assertion // failures when searching for the corresponding bci => add a nop // (was bug 5/14/1999 - gri) __ nop(); // generate code for exception handler address handler_base = __ start_a_stub(exception_handler_size()); if (handler_base == NULL) { // not enough space left for the handler bailout("exception handler overflow"); return -1; } int offset = code_offset(); // the exception oop and pc are in rax, and rdx // no other registers need to be preserved, so invalidate them __ invalidate_registers(false, true, true, false, true, true); // check that there is really an exception __ verify_not_null_oop(rax); // search an exception handler (rax: exception oop, rdx: throwing pc) __ call(RuntimeAddress(Runtime1::entry_for(Runtime1::handle_exception_from_callee_id))); __ should_not_reach_here(); guarantee(code_offset() - offset <= exception_handler_size(), "overflow"); __ end_a_stub();
return offset; } How can I migrate `if (compilation()->has_exception_handlers()) { } else { }` in `Runtime1::generate_handle_exception` or `Runtime1::generate_unwind_exception` so that it is equivalent for hs25, or even jdk11? My workaround is to override `generate_handle_exception`: diff -r 33a61051088d src/share/vm/c1/c1_Runtime1.hpp --- a/src/share/vm/c1/c1_Runtime1.hpp Sat Mar 24 12:18:54 2018 +0800 +++ b/src/share/vm/c1/c1_Runtime1.hpp Tue Mar 27 23:32:54 2018 +0800 @@ -127,6 +127,7 @@ static OopMapSet* generate_code_for(StubID id, StubAssembler* sasm); static OopMapSet* generate_exception_throw(StubAssembler* sasm, address target, bool has_argument); static OopMapSet* generate_handle_exception(StubID id, StubAssembler* sasm); + static OopMapSet* generate_handle_exception(StubAssembler *sasm, OopMapSet* oop_maps, OopMap* oop_map, bool save_fpu_registers); static void generate_unwind_exception(StubAssembler *sasm); static OopMapSet* generate_patching(StubAssembler* sasm, address target); Then I reuse hs17's `LIR_Assembler::emit_exception_handler`, `Runtime1::generate_handle_exception`, `Runtime1::generate_code_for`... in hs25, but it is a monkey patch and not easy to merge upstream. Please share your porting experience and give me some advice. Thanks a lot! -- Regards, Leslie Zhai From vladimir.kozlov at oracle.com Tue Mar 27 16:04:47 2018 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 27 Mar 2018 09:04:47 -0700 Subject: RFR(M): 8200303: C2 should leverage profiling for lookupswitch/tableswitch In-Reply-To: <85ee42ef-1323-8513-164a-392592bed7d4@oracle.com> References: <85ee42ef-1323-8513-164a-392592bed7d4@oracle.com> Message-ID: Failed to build on SPARC (PCH?): Undefined first referenced symbol in file log2f /workspace/build/solaris-sparcv9/hotspot/variant-server/libjvm/objs/parse2.o Vladimir On 3/27/18 8:20 AM, Vladimir Kozlov wrote: > Thank you, Roland, for contributing this. > > Changes look reasonable.
I will start testing including performance. > > Thanks, > Vladimir > > On 3/27/18 7:35 AM, Roland Westrelin wrote: >> >> http://cr.openjdk.java.net/~roland/8200303/webrev.00/ >> >> Over in the Shenandoah project, Aleksey found that: >> >> 1) a counted loop with a switch: >> >> for (...) { >> switch (..) { >> ... >> default: >> throw SomeException(); >> } >> } >> >> where some cases break out of the loop would not perform as well when >> loop strip mining is enabled, even if the cases that exit the loop are >> never taken in practice. >> >> Because C2 gives all branches out of a JumpNode the same probability, >> exiting the loop has a non null probability and GCM computes (wrongly) >> that scheduling the loop strip mining book keeping logic in the loop is >> cheaper than out of the loop. >> >> 2) Shenandoah write barriers in some of the cases should be hoisted but >> are not because C2 can't tell that only a single case of the switch is >> ever hit. >> >> http://mail.openjdk.java.net/pipermail/shenandoah-dev/2017-December/004535.html >> >> >> In the Shenandoah repo, we have a change that makes C2 leverage >> profiling for switch. Experiments showed that 1) and 2) above are fixed >> and that some common benchmarks run with parallel gc benefit as well >> (~+7% on Serial): >> >> http://mail.openjdk.java.net/pipermail/shenandoah-dev/2018-February/004886.html >> >> >> The patch I'm proposing here is based on the patch we've been using for >> a couple months in Shenandoah: >> >> - it fixes profile collection in c1 for lookupswitch/tableswitch >> >> - it sets profiling information on IfNodes and JumpNodes emitted from >> lookupswitch/tableswitch and propagate it after matching so GCM can >> take advantage of it >> >> - it takes advantage of profiling to find never taken cases and trim >> down the cases (or ranges as they're called in the code). A never >> taken range can now cause an uncommon trap.
>> >> and also has some improvements: >> >> - if some ranges are a lot more common than others, it might pay off to >> check for them one after the other before going to the binary >> search. The patch has some logic to evaluate the number of steps in >> the binary search and determine whether checking for the most common >> case upfront would pay off (from profile data) >> >> - the binary search doesn't always keep the tree balanced but instead >> picks a mid point that split frequencies in half >> >> Roland. >> From vladimir.kozlov at oracle.com Tue Mar 27 19:03:04 2018 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 27 Mar 2018 12:03:04 -0700 Subject: RFR(xxs): 8200297: Build failures after JDK-8198691 (CodeHeap State Analytics) In-Reply-To: <2f6bdfe4-f89a-c77c-ee11-6a825be6eeca@redhat.com> References: <2f6bdfe4-f89a-c77c-ee11-6a825be6eeca@redhat.com> Message-ID: <0bc0b507-f9fb-318b-5eed-be5eb48400ba@oracle.com> Looks good to me too.
> > But it should be verified by someone. Our (Oracle) Java test system and > submit-hs (which use it) does not build or test on any x86_32 systems. > > Well, I did build on Linux 32bit. Is that sufficient or do you need someone else to test? Thanks, Thomas > Regards, > Vladimir > > > On 3/27/18 8:31 AM, Aleksey Shipilev wrote: > >> On 03/27/2018 05:27 PM, Thomas St?fe wrote: >> >>> http://cr.openjdk.java.net/~stuefe/webrevs/8200297-builderro >>> rs-on-x86-32-after-codeheap-lucy/webrev.01/webrev/ >>> >> >> OK from me, assuming it still builds x86_32, and passes submit-hs. >> >> Thanks, >> -Aleksey >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From vladimir.kozlov at oracle.com Tue Mar 27 20:18:51 2018 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 27 Mar 2018 13:18:51 -0700 Subject: RFR(xxs): 8200297: Build failures after JDK-8198691 (CodeHeap State Analytics) In-Reply-To: References: <2f6bdfe4-f89a-c77c-ee11-6a825be6eeca@redhat.com> <0bc0b507-f9fb-318b-5eed-be5eb48400ba@oracle.com> Message-ID: <42b84c01-3467-32cf-1c57-5d3d0e991c1b@oracle.com> On 3/27/18 12:34 PM, Thomas St?fe wrote: > > > On Tue, Mar 27, 2018 at 9:03 PM, Vladimir Kozlov > > wrote: > > Looks good to me too. > > But it should be verified by someone. Our (Oracle) Java test system > and submit-hs (which use it) does not build or test on any x86_32 > systems. > > > Well, I did build on Linux 32bit. Is that sufficient or do you need > someone else to test? If you can run HelloWorld.java with -Xlog:codecache=Trace to use stat code, it will be sufficient for me. Thanks, Vladimir > > Thanks, Thomas > > Regards, > Vladimir > > > On 3/27/18 8:31 AM, Aleksey Shipilev wrote: > > On 03/27/2018 05:27 PM, Thomas St?fe wrote: > > http://cr.openjdk.java.net/~stuefe/webrevs/8200297-builderrors-on-x86-32-after-codeheap-lucy/webrev.01/webrev/ > > > > OK from me, assuming it still builds x86_32, and passes submit-hs. 
> > Thanks, > -Aleksey > > From thomas.stuefe at gmail.com Wed Mar 28 07:18:52 2018 From: thomas.stuefe at gmail.com (=?UTF-8?Q?Thomas_St=C3=BCfe?=) Date: Wed, 28 Mar 2018 09:18:52 +0200 Subject: RFR(xxs): 8200297: Build failures after JDK-8198691 (CodeHeap State Analytics) In-Reply-To: <42b84c01-3467-32cf-1c57-5d3d0e991c1b@oracle.com> References: <2f6bdfe4-f89a-c77c-ee11-6a825be6eeca@redhat.com> <0bc0b507-f9fb-318b-5eed-be5eb48400ba@oracle.com> <42b84c01-3467-32cf-1c57-5d3d0e991c1b@oracle.com> Message-ID: Hi, On Tue, Mar 27, 2018 at 10:18 PM, Vladimir Kozlov < vladimir.kozlov at oracle.com> wrote: > On 3/27/18 12:34 PM, Thomas St?fe wrote: > >> >> >> On Tue, Mar 27, 2018 at 9:03 PM, Vladimir Kozlov < >> vladimir.kozlov at oracle.com > wrote: >> >> Looks good to me too. >> >> But it should be verified by someone. Our (Oracle) Java test system >> and submit-hs (which use it) does not build or test on any x86_32 >> systems. >> >> >> Well, I did build on Linux 32bit. Is that sufficient or do you need >> someone else to test? >> > > If you can run HelloWorld.java with -Xlog:codecache=Trace to use stat > code, it will be sufficient for me. > > Thanks, > Vladimir > > Tested on ubuntu 16.4 32bit. Looks good. Pushed. Thanks, Thomas > > >> Thanks, Thomas >> >> Regards, >> Vladimir >> >> >> On 3/27/18 8:31 AM, Aleksey Shipilev wrote: >> >> On 03/27/2018 05:27 PM, Thomas St?fe wrote: >> >> http://cr.openjdk.java.net/~stuefe/webrevs/8200297-builderro >> rs-on-x86-32-after-codeheap-lucy/webrev.01/webrev/ >> > ors-on-x86-32-after-codeheap-lucy/webrev.01/webrev/> >> >> >> OK from me, assuming it still builds x86_32, and passes submit-hs. >> >> Thanks, >> -Aleksey >> >> >> -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From rwestrel at redhat.com Wed Mar 28 07:42:17 2018 From: rwestrel at redhat.com (Roland Westrelin) Date: Wed, 28 Mar 2018 09:42:17 +0200 Subject: RFR(M): 8200303: C2 should leverage profiling for lookupswitch/tableswitch In-Reply-To: References: <85ee42ef-1323-8513-164a-392592bed7d4@oracle.com> Message-ID: Hi Vladimir, Thanks for helping again. > Failed to build on SPARC (PCH?): > > Undefined first referenced > symbol in file > log2f > /workspace/build/solaris-sparcv9/hotspot/variant-server/libjvm/objs/parse2.o It builds fine without precompiled headers on linux. I added a: #include http://cr.openjdk.java.net/~roland/8200303/webrev.01/ Can you try that one? Roland. From lutz.schmidt at sap.com Wed Mar 28 07:50:28 2018 From: lutz.schmidt at sap.com (Schmidt, Lutz) Date: Wed, 28 Mar 2018 07:50:28 +0000 Subject: RFR(xxs): 8200297: Build failures after JDK-8198691 (CodeHeap State Analytics) In-Reply-To: References: <2f6bdfe4-f89a-c77c-ee11-6a825be6eeca@redhat.com> <0bc0b507-f9fb-318b-5eed-be5eb48400ba@oracle.com> <42b84c01-3467-32cf-1c57-5d3d0e991c1b@oracle.com> Message-ID: <88880294-A626-4B1C-8D67-58206BCE85DE@sap.com> Dear all, first of all a big THANK YOU to all involved for detecting, handling and fixing this issue. I was in really bad shape yesterday. Regards, Lutz From: hotspot-compiler-dev on behalf of Thomas St?fe Date: Wednesday, 28. March 2018 at 09:18 To: Vladimir Kozlov Cc: hotspot compiler Subject: Re: RFR(xxs): 8200297: Build failures after JDK-8198691 (CodeHeap State Analytics) Hi, On Tue, Mar 27, 2018 at 10:18 PM, Vladimir Kozlov > wrote: On 3/27/18 12:34 PM, Thomas St?fe wrote: On Tue, Mar 27, 2018 at 9:03 PM, Vladimir Kozlov >> wrote: Looks good to me too. But it should be verified by someone. Our (Oracle) Java test system and submit-hs (which use it) does not build or test on any x86_32 systems. Well, I did build on Linux 32bit. Is that sufficient or do you need someone else to test? 
If you can run HelloWorld.java with -Xlog:codecache=Trace to use stat code, it will be sufficient for me. Thanks, Vladimir Tested on ubuntu 16.4 32bit. Looks good. Pushed. Thanks, Thomas Thanks, Thomas Regards, Vladimir On 3/27/18 8:31 AM, Aleksey Shipilev wrote: On 03/27/2018 05:27 PM, Thomas St?fe wrote: http://cr.openjdk.java.net/~stuefe/webrevs/8200297-builderrors-on-x86-32-after-codeheap-lucy/webrev.01/webrev/ OK from me, assuming it still builds x86_32, and passes submit-hs. Thanks, -Aleksey -------------- next part -------------- An HTML attachment was scrubbed... URL: From jcbeyler at google.com Wed Mar 28 15:43:28 2018 From: jcbeyler at google.com (JC Beyler) Date: Wed, 28 Mar 2018 15:43:28 +0000 Subject: JDK-8171119: Low-Overhead Heap Profiling In-Reply-To: References: <5A819F10.8040201@oracle.com> <5A8414AC.3020209@oracle.com> Message-ID: Hi all, I've been working on deflaking the tests mostly and the wording in the JVMTI spec. Here is the two incremental webrevs: http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.5_6/ http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.06_07/ Here is the total webrev: http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.07/ Here are the notes of this change: - Currently the tests pass 100 times in a row, I am working on checking if they pass 1000 times in a row. - The default sampling rate is set to 512k, this is what we use internally and having a default means that to enable the sampling with the default, the user only has to do a enable event/disable event via JVMTI (instead of enable + set sample rate). - I deprecated the code that was handling the fast path tlab refill if it happened since this is now deprecated - Though I saw that Graal is still using it so I have to see what needs to be done there exactly Finally, using the Dacapo benchmark suite, I noted a 1% overhead for when the event system is turned on and the callback to the native agent is just empty. 
I got a 3% overhead with a 512k sampling rate with the code I put in the native side of my tests. Thanks and comments are appreciated, Jc On Mon, Mar 19, 2018 at 2:06 PM JC Beyler wrote: > Hi all, > > The incremental webrev update is here: > http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event4_5/ > > The full webrev is here: > http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event5/ > > Major change here is: > - I've removed the heapMonitoring.cpp code in favor of just having the > sampling events as per Serguei's request; I still have to do some overhead > measurements but the tests prove the concept can work > - Most of the tlab code is unchanged, the only major part is that > now things get sent off to event collectors when used and enabled. > - Added the interpreter collectors to handle interpreter execution > - Updated the name from SetTlabHeapSampling to SetHeapSampling to be > more generic > - Added a mutex for the thread sampling so that we can initialize an > internal static array safely > - Ported the tests from the old system to this new one > > I've also updated the JEP and CSR to reflect these changes: > https://bugs.openjdk.java.net/browse/JDK-8194905 > https://bugs.openjdk.java.net/browse/JDK-8171119 > > In order to make this have some forward progress, I've removed the heap > sampling code entirely and now rely entirely on the event sampling system. > The tests reflect this by using a simplified implementation of what an > agent could do: > > http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event5/raw_files/new/test/hotspot/jtreg/serviceability/jvmti/HeapMonitor/libHeapMonitor.c > (Search for anything mentioning event_storage). > > I have not taken the time to port the whole code we had originally in > heapMonitoring to this. I hesitate only because that code was in C++, I'd > have to port it to C and this is for tests so perhaps what I have now is > good enough? 
> > As far as testing goes, I've ported all the relevant tests and then added > a few: > - Turning the system on/off > - Testing using various GCs > - Testing using the interpreter > - Testing the sampling rate > - Testing with objects and arrays > - Testing with various threads > > Finally, as overhead goes, I have the numbers of the system off vs a clean > build and I have 0% overhead, which is what we'd want. This was using the > Dacapo benchmarks. I am now preparing to run a version with the events on > using dacapo and will report back here. > > Any comments are welcome :) > Jc > > > > > On Thu, Mar 8, 2018 at 4:00 PM JC Beyler wrote: > >> Hi all, >> >> I apologize for the delay but I wanted to add an event system and that >> took a bit longer than expected and I also reworked the code to take into >> account the deprecation of FastTLABRefill. >> >> This update has four parts: >> >> A) I moved the implementation from Thread to ThreadHeapSampler inside of >> Thread. Would you prefer it as a pointer inside of Thread or like this >> works for you? Second question would be would you rather have an >> association outside of Thread altogether that tries to remember when >> threads are live and then we would have something like: >> ThreadHeapSampler::get_sampling_size(this_thread); >> >> I worry about the overhead of this but perhaps it is not too too bad? >> >> B) I also have been working on the Allocation event system that sends out >> a notification at each sampled event. This will be practical when wanting >> to do something at the allocation point. I'm also looking at if the whole >> heapMonitoring code could not reside in the agent code and not in the JDK. >> I'm not convinced but I'm talking to Serguei about it to see/assess :) >> - Also added two tests for the new event subsystem >> >> C) Removed the slow_path fields inside the TLAB code since now >> FastTLABRefill is deprecated >> >> D) Updated the JVMTI documentation and specification for the methods. 
>> >> So the incremental webrev is here: >> http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.09_10/ >> >> and the full webrev is here: >> http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.10 >> >> I believe I have updated the various JIRA issues that track this :) >> >> Thanks for your input, >> Jc >> >> >> On Wed, Feb 14, 2018 at 10:34 PM, JC Beyler wrote: >> >>> Hi Erik, >>> >>> I inlined my answers, which the last one seems to answer Robbin's >>> concerns about the same thing (adding things to Thread). >>> >>> On Wed, Feb 14, 2018 at 2:51 AM, Erik Österlund < >>> erik.osterlund at oracle.com> wrote: >>> >>>> Hi JC, >>>> >>>> Comments are inlined below. >>>> >>>> >>>> On 2018-02-13 06:18, JC Beyler wrote: >>>> >>>> Hi Erik, >>>> >>>> Thanks for your answers, I've now inlined my own answers/comments. >>>> >>>> I've done a new webrev here: >>>> http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.08/ >>>> >>>> The incremental is here: >>>> http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.07_08/ >>>> >>>> Note to all: >>>> - I've been integrating changes from Erin/Serguei/David comments so >>>> this webrev incremental is a bit an answer to all comments in one. I >>>> apologize for that :) >>>> >>>> >>>> On Mon, Feb 12, 2018 at 6:05 AM, Erik Österlund < >>>> erik.osterlund at oracle.com> wrote: >>>> >>>>> Hi JC, >>>>> >>>>> Sorry for the delayed reply. >>>>> >>>>> Inlined answers: >>>>> >>>>> >>>>> On 2018-02-06 00:04, JC Beyler wrote: >>>>> >>>>>> Hi Erik, >>>>>> >>>>>> (Renaming this to be folded into the newly renamed thread :)) >>>>>> >>>>>> First off, thanks a lot for reviewing the webrev! I appreciate it!
>>>>>> >>>>>> I updated the webrev to: >>>>>> http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.05a/ >>>>>> >>>>>> And the incremental one is here: >>>>>> http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.04_05a/ >>>>>> >>>>>> It contains: >>>>>> - The change for since from 9 to 11 for the jvmti.xml >>>>>> - The use of the OrderAccess for initialized >>>>>> - Clearing the oop >>>>>> >>>>>> I also have inlined my answers to your comments. The biggest question >>>>>> will come from the multiple *_end variables. A bit of the logic there >>>>>> is due to handling the slow path refill vs fast path refill and >>>>>> checking that the rug was not pulled underneath the slowpath. I >>>>>> believe that a previous comment was that TlabFastRefill was going to >>>>>> be deprecated. >>>>>> >>>>>> If this is true, we could revert this code a bit and just do a : if >>>>>> TlabFastRefill is enabled, disable this. And then deprecate that when >>>>>> TlabFastRefill is deprecated. >>>>>> >>>>>> This might simplify this webrev and I can work on a follow-up that >>>>>> either: removes TlabFastRefill if Robbin does not have the time to do >>>>>> it or add the support to the assembly side to handle this correctly. >>>>>> What do you think? >>>>>> >>>>> >>>>> I support removing TlabFastRefill, but I think it is good to not >>>>> depend on that happening first. >>>>> >>>>> >>>> >>>> I'm slowly pushing on the FastTLABRefill ( >>>> >>>> https://bugs.openjdk.java.net/browse/JDK-8194084), I agree on keeping >>>> both separate for now though so that we can think of both differently >>>> >>>> >>>> >>>>> Now, below, inlined are my answers: >>>>>> >>>>>> On Fri, Feb 2, 2018 at 8:44 AM, Erik Österlund >>>>>> wrote: >>>>>> >>>>>>> Hi JC, >>>>>>> >>>>>>> Hope I am reviewing the right version of your work. Here goes...
>>>>>>> >>>>>>> src/hotspot/share/gc/shared/collectedHeap.inline.hpp: >>>>>>> >>>>>>> 159 AllocTracer::send_allocation_outside_tlab(klass, result, >>>>>>> size * >>>>>>> HeapWordSize, THREAD); >>>>>>> 160 >>>>>>> 161 THREAD->tlab().handle_sample(THREAD, result, size); >>>>>>> 162 return result; >>>>>>> 163 } >>>>>>> >>>>>>> Should not call tlab()->X without checking if (UseTLAB) IMO. >>>>>>> >>>>>>> Done! >>>>>> >>>>> >>>>> More about this later. >>>>> >>>>> >>>>> >>>>>> src/hotspot/share/gc/shared/threadLocalAllocBuffer.cpp: >>>>>>> >>>>>>> So first of all, there seems to quite a few ends. There is an "end", >>>>>>> a "hard >>>>>>> end", a "slow path end", and an "actual end". Moreover, it seems >>>>>>> like the >>>>>>> "hard end" is actually further away than the "actual end". So the >>>>>>> "hard end" >>>>>>> seems like more of a "really definitely actual end" or something. I >>>>>>> don't >>>>>>> know about you, but I think it looks kind of messy. In particular, I >>>>>>> don't >>>>>>> feel like the name "actual end" reflects what it represents, >>>>>>> especially when >>>>>>> there is another end that is behind the "actual end". >>>>>>> >>>>>>> 413 HeapWord* ThreadLocalAllocBuffer::hard_end() { >>>>>>> 414 // Did a fast TLAB refill occur? >>>>>>> 415 if (_slow_path_end != _end) { >>>>>>> 416 // Fix up the actual end to be now the end of this TLAB. >>>>>>> 417 _slow_path_end = _end; >>>>>>> 418 _actual_end = _end; >>>>>>> 419 } >>>>>>> 420 >>>>>>> 421 return _actual_end + alignment_reserve(); >>>>>>> 422 } >>>>>>> >>>>>>> I really do not like making getters unexpectedly have these kind of >>>>>>> side >>>>>>> effects. It is not expected that when you ask for the "hard end", you >>>>>>> implicitly update the "slow path end" and "actual end" to new values. >>>>>>> >>>>>>> As I said, a lot of this is due to the FastTlabRefill. If I make this >>>>>> not supporting FastTlabRefill, this goes away. 
The reason the system >>>>>> needs to update itself at the get is that you only know at that get if >>>>>> things have shifted underneath the tlab slow path. I am not sure of >>>>>> really better names (naming is hard!), perhaps we could do these >>>>>> names: >>>>>> >>>>>> - current_tlab_end // Either the allocated tlab end or a >>>>>> sampling point >>>>>> - last_allocation_address // The end of the tlab allocation >>>>>> - last_slowpath_allocated_end // In case a fast refill occurred the >>>>>> end might have changed, this is to remember slow vs fast past refills >>>>>> >>>>>> the hard_end method can be renamed to something like: >>>>>> tlab_end_pointer() // The end of the lab including a bit of >>>>>> alignment reserved bytes >>>>>> >>>>> >>>>> Those names sound better to me. Could you please provide a mapping >>>>> from the old names to the new names so I understand which one is which >>>>> please? >>>>> >>>>> This is my current guess of what you are proposing: >>>>> >>>>> end -> current_tlab_end >>>>> actual_end -> last_allocation_address >>>>> slow_path_end -> last_slowpath_allocated_end >>>>> hard_end -> tlab_end_pointer >>>>> >>>>> >>>> Yes that is correct, that was what I was proposing. >>>> >>>> >>>>> I would prefer this naming: >>>>> >>>>> end -> slow_path_end // the end for taking a slow path; either due to >>>>> sampling or refilling >>>>> actual_end -> allocation_end // the end for allocations >>>>> slow_path_end -> last_slow_path_end // last address for slow_path_end >>>>> (as opposed to allocation_end) >>>>> hard_end -> reserved_end // the end of the reserved space of the TLAB >>>>> >>>>> About setting things in the getter... that still seems like a very >>>>> unpleasant thing to me. It would be better to inspect the call hierarchy >>>>> and explicitly update the ends where they need updating, and assert in the >>>>> getter that they are in sync, rather than implicitly setting various ends >>>>> as a surprising side effect in a getter. 
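Here is a sketch of Erik's alternative to the side-effecting getter, using the names he proposes. It illustrates the idea under stated assumptions and is not the TLAB code. `TlabEndsSketch` is hypothetical, addresses are modelled as word offsets instead of `HeapWord*`, and the asserted condition is the same one the quoted `hard_end()` tested before it fixed up the ends.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical model of the proposed naming; addresses are plain word
// offsets rather than HeapWord* to keep the sketch self-contained.
struct TlabEndsSketch {
  std::size_t slow_path_end;       // where the fast path bails out (refill or sample)
  std::size_t allocation_end;      // the real end for allocations
  std::size_t last_slow_path_end;  // slow_path_end as last seen by the slow path
  std::size_t alignment_reserve;   // words kept back at the end of the TLAB

  // Explicit synchronization step: a caller such as make_parsable() would
  // run this first, instead of the getter silently doing it.
  void sync_ends_after_fast_refill() {
    if (last_slow_path_end != slow_path_end) {
      // A fast refill moved slow_path_end underneath the slow path.
      allocation_end = slow_path_end;
      last_slow_path_end = slow_path_end;
    }
  }

  // Side-effect-free getter: it only checks that the ends are in sync.
  std::size_t reserved_end() const {
    assert(last_slow_path_end == slow_path_end &&
           "ends must be synchronized before calling reserved_end()");
    return allocation_end + alignment_reserve;
  }
};
```

This design keeps the fix-up visible at the call site, and a caller that forgets the sync step trips the assert in debug builds.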
It looks like the call hierarchy >>>>> is very small. With my new naming convention, reserved_end() would >>>>> presumably return _allocation_end + alignment_reserve(), and have an assert >>>>> checking that _allocation_end == _last_slow_path_allocation_end, >>>>> complaining that this invariant must hold, and that a caller to this >>>>> function, such as make_parsable(), must first explicitly synchronize the >>>>> ends as required, to honor that invariant. >>>>> >>>>> >>>> >>>> I've renamed the variables to how you preferred it except for the _end >>>> one. I did: >>>> current_end >>>> last_allocation_address >>>> tlab_end_ptr >>>> >>>> The reason is that the architecture dependent code use the thread.hpp >>>> API and it already has tlab included into the name so it becomes >>>> tlab_current_end (which is better that tlab_current_tlab_end in my opinion). >>>> >>>> I also moved the update into a separate method with a TODO that says to >>>> remove it when FastTLABRefill is deprecated >>>> >>>> >>>> This looks a lot better now. Thanks. >>>> >>>> Note that the following comment now needs updating accordingly in >>>> threadLocalAllocBuffer.hpp: >>>> >>>> 41 // Heap sampling is performed via the end/actual_end fields. 42 // actual_end contains the real end of the tlab allocation, 43 // whereas end can be set to an arbitrary spot in the tlab to 44 // trip the return and sample the allocation. 45 // slow_path_end is used to track if a fast tlab refill occured 46 // between slowpath calls. >>>> >>>> There might be other comments too, I have not looked in detail. >>>> >>> >>> This was the only spot that still had an actual_end, I fixed it now. >>> I'll do a sweep to double check other comments. >>> >>> >>>> >>>> >>>> >>>> >>>> >>>>> >>>>> Not sure it's better but before updating the webrev, I wanted to try >>>>>> to get input/consensus :) >>>>>> >>>>>> (Note hard_end was always further off than end). 
>>>>>> >>>>>> src/hotspot/share/prims/jvmti.xml: >>>>>>> >>>>>>> 10357 >>>>>>> 10358 >>>>>>> 10359 Can sample the heap. >>>>>>> 10360 If this capability is enabled then the heap sampling >>>>>>> methods >>>>>>> can be called. >>>>>>> 10361 >>>>>>> 10362 >>>>>>> >>>>>>> Looks like this capability should not be "since 9" if it gets >>>>>>> integrated >>>>>>> now. >>>>>>> >>>>>> Updated now to 11, crossing my fingers :) >>>>>> >>>>>> >>>>>> src/hotspot/share/runtime/heapMonitoring.cpp: >>>>>>> >>>>>>> 448 if (is_alive->do_object_b(value)) { >>>>>>> 449 // Update the oop to point to the new object if it is >>>>>>> still >>>>>>> alive. >>>>>>> 450 f->do_oop(&(trace.obj)); >>>>>>> 451 >>>>>>> 452 // Copy the old trace, if it is still live. >>>>>>> 453 _allocated_traces->at_put(curr_pos++, trace); >>>>>>> 454 >>>>>>> 455 // Store the live trace in a cache, to be served up on >>>>>>> /heapz. >>>>>>> 456 _traces_on_last_full_gc->append(trace); >>>>>>> 457 >>>>>>> 458 count++; >>>>>>> 459 } else { >>>>>>> 460 // If the old trace is no longer live, add it to the >>>>>>> list of >>>>>>> 461 // recently collected garbage. >>>>>>> 462 store_garbage_trace(trace); >>>>>>> 463 } >>>>>>> >>>>>>> In the case where the oop was not live, I would like it to be >>>>>>> explicitly >>>>>>> cleared. >>>>>>> >>>>>> Done I think how you wanted it. Let me know because I'm not familiar >>>>>> with the RootAccess API. I'm unclear if I'm doing this right or not so >>>>>> reviews of these parts are highly appreciated. Robbin had talked of >>>>>> perhaps later pushing this all into a OopStorage, should I do this now >>>>>> do you think? Or can that wait a second webrev later down the road? >>>>>> >>>>> >>>>> I think using handles can and should be done later. You can use the >>>>> Access API now. >>>>> I noticed that you are missing an #include "oops/access.inline.hpp" in >>>>> your heapMonitoring.cpp file. 
>>>>> >>>>> >>>> The missing header is there for me so I don't know, I made sure it is >>>> present in the latest webrev. Sorry about that. >>>> >>>> >>>> >>>>> + Did I clear it the way you wanted me to or were you thinking of >>>>>> something else? >>>>>> >>>>> >>>>> That is precisely how I wanted it to be cleared. Thanks. >>>>> >>>>> + Final question here, seems like if I were to want to not do the >>>>>> f->do_oop directly on the trace.obj, I'd need to do something like: >>>>>> >>>>>> f->do_oop(&value); >>>>>> ... >>>>>> trace->store_oop(value); >>>>>> >>>>>> to update the oop internally. Is that right/is that one of the >>>>>> advantages of going to the Oopstorage sooner than later? >>>>>> >>>>> >>>>> I think you really want to do the do_oop on the root directly. Is >>>>> there a particular reason why you would not want to do that? >>>>> Otherwise, yes - the benefit with using the handle approach is that >>>>> you do not need to call do_oop explicitly in your code. >>>>> >>>>> >>>> There is no reason except that now we have a load_oop and a >>>> get_oop_addr, I was not sure what you would think of that. >>>> >>>> >>>> That's fine. >>>> >>>> >>>> >>>>> >>>>>> Also I see a lot of concurrent-looking use of the following field: >>>>>>> 267 volatile bool _initialized; >>>>>>> >>>>>>> Please note that the "volatile" qualifier does not help with >>>>>>> reordering >>>>>>> here. Reordering between volatile and non-volatile fields is >>>>>>> completely free >>>>>>> for both compiler and hardware, except for windows with MSVC, where >>>>>>> volatile >>>>>>> semantics is defined to use acquire/release semantics, and the >>>>>>> hardware is >>>>>>> TSO. But for the general case, I would expect this field to be >>>>>>> stored with >>>>>>> OrderAccess::release_store and loaded with OrderAccess::load_acquire. >>>>>>> Otherwise it is not thread safe. >>>>>>> >>>>>> Because everything is behind a mutex, I wasn't really worried about >>>>>> this. 
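Erik's distinction between `volatile` and ordering is easier to see in standard C++. There, `std::atomic` with release/acquire takes the role of `OrderAccess::release_store` / `OrderAccess::load_acquire`. `StorageSketch` is a hypothetical stand-in, not HotSpot code. The point it shows is that the flag must be written with release semantics after the data it guards, and read with acquire semantics before that data is used.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical storage published through a flag. In standard C++ a plain
// `volatile bool` gives no inter-thread ordering guarantee, so the flag is
// an atomic written with release and read with acquire.
class StorageSketch {
 public:
  void initialize(int capacity) {
    _capacity = capacity;                                  // plain write...
    _initialized.store(true, std::memory_order_release);  // ...published by release
  }

  bool initialized() const {
    // Acquire pairs with the release store: a reader that sees `true`
    // is guaranteed to also see the _capacity written before it.
    return _initialized.load(std::memory_order_acquire);
  }

  // Only meaningful after initialized() has returned true.
  int capacity() const { return _capacity; }

 private:
  int _capacity = 0;
  std::atomic<bool> _initialized{false};
};
```

If the field is in fact always accessed under a mutex, as Erik points out, none of this is needed. A plain field plus a lock-ownership assert documents the intent better.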
I have a test that has multiple threads trying to hit this >>>>>> corner case and it passes. >>>>>> >>>>>> However, to be paranoid, I updated it to using the OrderAccess API >>>>>> now, thanks! Let me know what you think there too! >>>>>> >>>>> >>>>> If it is indeed always supposed to be read and written under a mutex, >>>>> then I would strongly prefer to have it accessed as a normal non-volatile >>>>> member, and have an assertion that given lock is held or we are in a >>>>> safepoint, as we do in many other places. Something like this: >>>>> >>>>> assert(HeapMonitorStorage_lock->owned_by_self() || >>>>> (SafepointSynchronize::is_at_safepoint() && >>>>> Thread::current()->is_VM_thread()), "this should not be accessed >>>>> concurrently"); >>>>> >>>>> It would be confusing to people reading the code if there are uses of >>>>> OrderAccess that are actually always protected under a mutex. >>>>> >>>>> >>>> Thank you for the exact example to be put in the code! I put it around >>>> each access/assignment of the _initialized method and found one case where >>>> yes you can touch it and not have the lock. It actually is "ok" because you >>>> don't act on the storage until later and only when you really want to >>>> modify the storage (see the object_alloc_do_sample method which calls the >>>> add_trace method). >>>> >>>> But, because of this, I'm going to put the OrderAccess here, I'll do >>>> some performance numbers later and if there are issues, I might add a >>>> "unsafe" read and a "safe" one to make it explicit to the reader. But I >>>> don't think it will come to that. >>>> >>>> >>>> Okay. This double return in heapMonitoring.cpp looks wrong: >>>> >>>> 283 bool initialized() { >>>> 284 return OrderAccess::load_acquire(&_initialized) != 0; >>>> 285 return _initialized; >>>> 286 } >>>> >>>> Since you said object_alloc_do_sample() is the only place where you do >>>> not hold the mutex while reading initialized(), I had a closer look at >>>> that. 
It looks like in its current shape, the lack of a mutex may lead to a >>>> memory leak. In particular, it first checks if (initialized()). Let's >>>> assume this is now true. It then allocates a bunch of stuff, and checks if >>>> the number of frames were over 0. If they were, it calls >>>> StackTraceStorage::storage()->add_trace() seemingly hoping that after >>>> grabbing the lock in there, initialized() will still return true. But it >>>> could now return false and skip doing anything, in which case the allocated >>>> stuff will never be freed. >>>> >>> >>> I fixed this now by making add_trace return a boolean and checking for >>> that. It will be in the next webrev. Thanks, the truth is that in our >>> implementation the system is always on or off, so this never really occurs >>> :). In this version though, that is not true and it's important to handle >>> so thanks again! >>> >>> >>> >>>> >>>> So the analysis seems to be that _initialized is only used outside of >>>> the mutex in once instance, where it is used to perform double-checked >>>> locking, that actually causes a memory leak. >>>> >>>> I am not proposing how to fix that, just raising the issue. If you >>>> still want to perform this double-checked locking somehow, then the use of >>>> acquire/release still seems odd. Because the memory ordering restrictions >>>> of it never comes into play in this particular case. If it ever did, then >>>> the use of destroy_stuff(); release_store(_initialized, 0) would be broken >>>> anyway as that would imply that whatever concurrent reader there ever was >>>> would after reading _initialized with load_acquire() could *never* read the >>>> data that is concurrently destroyed anyway. I would be biased to think that >>>> RawAccess::load/store looks like a more appropriate solution, >>>> given that the memory leak issue is resolved. I do not know how painful it >>>> would be to not perform this double-checked locking. >>>> >>> >>> So I agree with this entirely. 
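The fix JC describes, where `add_trace` reports whether it stored the trace, can be sketched as below. The types are hypothetical stand-ins for the heapMonitoring.cpp classes. Raw `new`/`delete` is used on purpose to mirror the C-heap ownership question: the storage owns the trace only if `add_trace` returns true, the caller frees it otherwise, and `_initialized` is only read or written under the lock.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

struct TraceSketch { std::vector<void*> frames; };  // hypothetical trace

class TraceStorageSketch {
 public:
  void set_initialized(bool value) {
    std::lock_guard<std::mutex> guard(_lock);
    _initialized = value;
  }

  // Takes ownership of `trace` only when it returns true.
  bool add_trace(TraceSketch* trace) {
    std::lock_guard<std::mutex> guard(_lock);
    if (!_initialized) return false;  // storage was torn down concurrently
    _traces.push_back(trace);
    return true;
  }

  std::size_t size() {
    std::lock_guard<std::mutex> guard(_lock);
    return _traces.size();
  }

  ~TraceStorageSketch() {
    for (TraceSketch* t : _traces) delete t;
  }

 private:
  std::mutex _lock;
  bool _initialized = false;  // only accessed while holding _lock
  std::vector<TraceSketch*> _traces;
};

// Sampling path: no unlocked pre-check of the initialized flag;
// add_trace() makes the decision under the lock.
bool sample_allocation(TraceStorageSketch& storage) {
  TraceSketch* trace = new TraceSketch{};
  trace->frames.push_back(nullptr);  // stand-in for the walked stack frames
  if (!storage.add_trace(trace)) {
    delete trace;  // not stored: free it here, otherwise it leaks
    return false;
  }
  return true;
}
```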
I looked also a bit more and the >>> difference and code really stems from our internal version. In this version >>> however, there are actually a lot of things going on that I did not go >>> entirely through in my head but this comment made me ponder a bit more on >>> it. >>> >>> Since every object_alloc_do_sample is protected by a check to >>> HeapMonitoring::enabled(), there is only a small chance that the call is >>> happening when things have been disabled. So there is no real need to do a >>> first check on the initialized, it is a rare occurence that a call happens >>> to object_alloc_do_sample and the initialized of the storage returns false. >>> >>> (By the way, even if you did call object_alloc_do_sample without looking >>> at HeapMonitoring::enabled(), that would be ok too. You would gather the >>> stacktrace and get nowhere at the add_trace call, which would return false; >>> so though not optimal performance wise, nothing would break). >>> >>> Furthermore, the add_trace is really the moment of no return and we have >>> the mutex lock and then the initialized check. So, in the end, I did two >>> things: I removed that first check and then I removed the OrderAccess for >>> the storage initialized. I think now I have a better grasp and >>> understanding why it was done in our code and why it is not needed here. >>> Thanks for pointing it out :). This now still passes my JTREG tests, >>> especially the threaded one. >>> >>> >>> >>> >>> >>>> >>>> >>>> >>>> >>>> >>>>> As a kind of meta comment, I wonder if it would make sense to add >>>>>>> sampling >>>>>>> for non-TLAB allocations. Seems like if someone is rapidly >>>>>>> allocating a >>>>>>> whole bunch of 1 MB objects that never fit in a TLAB, I might still >>>>>>> be >>>>>>> interested in seeing that in my traces, and not get surprised that >>>>>>> the >>>>>>> allocation rate is very high yet not showing up in any profiles. 
>>>>>>> >>>>>>> That is handled by the handle_sample where you wanted me to put a >>>>>> UseTlab because you hit that case if the allocation is too big. >>>>>> >>>>> >>>>> I see. It was not obvious to me that non-TLAB sampling is done in the >>>>> TLAB class. That seems like an abstraction crime. >>>>> What I wanted in my previous comment was that we do not call into the >>>>> TLAB when we are not using TLABs. If there is sampling logic in the TLAB >>>>> that is used for something else than TLABs, then it seems like that logic >>>>> simply does not belong inside of the TLAB. It should be moved out of the >>>>> TLAB, and instead have the TLAB call this common abstraction that makes >>>>> sense. >>>>> >>>>> >>>> So in the incremental version: >>>> http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.07_08/, this is >>>> still a "crime". The reason is that the system has to have the >>>> bytes_until_sample on a per-thread level and it made "sense" to have it >>>> with the TLAB implementation. Also, I was not sure how people felt about >>>> adding something to the thread instance instead. >>>> >>>> Do you think it fits better at the Thread level? I can see how >>>> difficult it is to make it happen there and add some logic there. Let me >>>> know what you think. >>>> >>>> >>>> We have an unfortunate situation where everyone that has some fields >>>> that are thread local tend to dump them right into Thread, making the size >>>> and complexity of Thread grow as it becomes tightly coupled with various >>>> unrelated subsystems. It would be desirable to have a separate class for >>>> this instead that encapsulates the sampling logic. That class could >>>> possibly reside in Thread though as a value object of Thread. >>>> >>> >>> I imagined that would be the case but was not sure. I will look at the >>> example that Robbin is talking about (ThreadSMR) and will see how to >>> refactor my code to use that. 
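Erik's suggestion of a separate class held by value in `Thread` might look roughly like this. All names are hypothetical, and the interval is fixed for brevity (the actual proposal samples at a configurable rate). The sketch only shows the ownership split: the thread embeds a sampler object, and both the TLAB path and the outside-TLAB path call the same hook, so no sampling state lives in the TLAB.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical sampler state, kept out of both Thread and the TLAB.
class ThreadHeapSamplerSketch {
 public:
  explicit ThreadHeapSamplerSketch(std::size_t interval_bytes)
      : _interval(interval_bytes), _bytes_until_sample(interval_bytes) {}

  // Single hook for every allocation path (inside or outside a TLAB).
  // Returns true when the allocation should be reported as a sample.
  bool check_sample(std::size_t bytes) {
    if (bytes < _bytes_until_sample) {
      _bytes_until_sample -= bytes;
      return false;
    }
    _bytes_until_sample = _interval;  // fixed interval for brevity
    return true;
  }

  std::size_t bytes_until_sample() const { return _bytes_until_sample; }

 private:
  std::size_t _interval;
  std::size_t _bytes_until_sample;
};

// Hypothetical thread type: it only embeds the sampler by value.
class ThreadSketch {
 public:
  explicit ThreadSketch(std::size_t interval_bytes)
      : _heap_sampler(interval_bytes) {}
  ThreadHeapSamplerSketch& heap_sampler() { return _heap_sampler; }

 private:
  ThreadHeapSamplerSketch _heap_sampler;  // value object, no extra allocation
};
```

A large allocation that never fits in a TLAB goes through the same `check_sample` call, which covers the non-TLAB case Erik raised without putting that logic in the TLAB class.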
>>> >>> Thanks again for your help, >>> Jc >>> >>> >>>> >>>> >>>> >>>> >>>> >>>>> Hope I have answered your questions and that my feedback makes sense >>>>> to you. >>>>> >>>>> >>>> You have and thank you for them, I think we are getting to a cleaner >>>> implementation and things are getting better and more readable :) >>>> >>>> >>>> Yes it is getting better. >>>> >>>> Thanks, >>>> /Erik >>>> >>>> >>>> Thanks for your help! >>>> Jc >>>> >>>> >>>> >>>>> Thanks, >>>>> /Erik >>>>> >>>>> >>>>> I double checked by changing the test >>>>>> >>>>>> http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.05a/raw_files/new/test/hotspot/jtreg/serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorStatObjectCorrectnessTest.java >>>>>> >>>>>> to use a smaller Tlab (2048) and made the object bigger and it goes >>>>>> through that and passes. >>>>>> >>>>>> Thanks again for your review and I look forward to your pointers for >>>>>> the questions I now have raised! >>>>>> Jc >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> Thanks, >>>>>>> /Erik >>>>>>> >>>>>>> >>>>>>> On 2018-01-26 06:45, JC Beyler wrote: >>>>>>> >>>>>>>> Thanks Robbin for the reviews :) >>>>>>>> >>>>>>>> The new full webrev is here: >>>>>>>> http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.03/ >>>>>>>> The incremental webrev is here: >>>>>>>> http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.02_03/ >>>>>>>> >>>>>>>> I inlined my answers: >>>>>>>> >>>>>>>> On Thu, Jan 25, 2018 at 1:15 AM, Robbin Ehn < >>>>>>>> robbin.ehn at oracle.com> wrote: >>>>>>>> >>>>>>>>> Hi JC, great to see another revision! >>>>>>>>> >>>>>>>>> #### >>>>>>>>> heapMonitoring.cpp >>>>>>>>> >>>>>>>>> StackTraceData should not contain the oop for 'safety' reasons. >>>>>>>>> When StackTraceData is moved from _allocated_traces: >>>>>>>>> L452 store_garbage_trace(trace); >>>>>>>>> it contains a dead oop. 
>>>>>>>>> _allocated_traces could instead be a tupel of oop and >>>>>>>>> StackTraceData thus >>>>>>>>> dead oops are not kept. >>>>>>>>> >>>>>>>> Done I used inheritance to make the copier work regardless but the >>>>>>>> idea is the same. >>>>>>>> >>>>>>>> You should use the new Access API for loading the oop, something >>>>>>>>> like >>>>>>>>> this: >>>>>>>>> RootAccess::load(...) >>>>>>>>> I don't think you need to use Access API for clearing the oop, but >>>>>>>>> it >>>>>>>>> would >>>>>>>>> look nicer. And you shouldn't probably be using: >>>>>>>>> Universe::heap()->is_in_reserved(value) >>>>>>>>> >>>>>>>> I am unfamiliar with this but I think I did do it like you wanted me >>>>>>>> to (all tests pass so that's a start). I'm not sure how to clear the >>>>>>>> oop exactly, is there somewhere that does that, which I can use to >>>>>>>> do >>>>>>>> the same? >>>>>>>> >>>>>>>> I removed the is_in_reserved, this came from our internal version, I >>>>>>>> don't know why it was there but my tests work without so I removed >>>>>>>> it >>>>>>>> :) >>>>>>>> >>>>>>>> >>>>>>>> The lock: >>>>>>>>> L424 MutexLocker mu(HeapMonitorStorage_lock); >>>>>>>>> Is not needed as far as I can see. >>>>>>>>> weak_oops_do is called in a safepoint, no TLAB allocation can >>>>>>>>> happen and >>>>>>>>> JVMTI thread can't access these data-structures. Is there >>>>>>>>> something more >>>>>>>>> to >>>>>>>>> this lock that I'm missing? >>>>>>>>> >>>>>>>> Since a thread can call the JVMTI getLiveTraces (or any of the other >>>>>>>> ones), it can get to the point of trying to copying the >>>>>>>> _allocated_traces. I imagine it is possible that this is happening >>>>>>>> during a GC or that it can be started and a GC happens afterwards. >>>>>>>> Therefore, it seems to me that you want this protected, no? 
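Robbin's point about not keeping dead oops in `StackTraceData` can be modelled by pairing each object reference with a trace that holds no reference of its own. The sketch is hypothetical. Plain pointers stand in for oops, and `is_alive`/`update` stand in for the closures passed to `weak_oops_do`. Live entries are compacted in place, and only the trace of a dead object moves to the garbage list, so no stale reference survives the pass.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

struct StackTraceDataSketch { std::string frames; };  // no object reference inside

struct SampledObjectSketch {
  const void* obj;              // stand-in for the oop
  StackTraceDataSketch trace;
};

// Hypothetical weak-root pass: keep and update entries whose object is
// alive; for dead objects, keep only the trace. Returns the live count.
std::size_t weak_oops_do_sketch(
    std::vector<SampledObjectSketch>& allocated,
    std::vector<StackTraceDataSketch>& garbage,
    const std::function<bool(const void*)>& is_alive,
    const std::function<const void*(const void*)>& update) {
  std::size_t curr_pos = 0;
  for (std::size_t i = 0; i < allocated.size(); ++i) {
    SampledObjectSketch& entry = allocated[i];
    if (is_alive(entry.obj)) {
      entry.obj = update(entry.obj);  // the object may have moved
      if (curr_pos != i) allocated[curr_pos] = std::move(entry);
      ++curr_pos;
    } else {
      garbage.push_back(std::move(entry.trace));  // trace only, no reference
    }
  }
  allocated.erase(allocated.begin() + static_cast<std::ptrdiff_t>(curr_pos),
                  allocated.end());
  return curr_pos;
}
```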
>>>>>>>> >>>>>>>> >>>>>>>> #### >>>>>>>>> You have 6 files without any changes in them (any more): >>>>>>>>> g1CollectedHeap.cpp >>>>>>>>> psMarkSweep.cpp >>>>>>>>> psParallelCompact.cpp >>>>>>>>> genCollectedHeap.cpp >>>>>>>>> referenceProcessor.cpp >>>>>>>>> thread.hpp >>>>>>>>> >>>>>>>>> Done. >>>>>>>> >>>>>>>> #### >>>>>>>>> I have not looked closely, but is it possible to hide heap >>>>>>>>> sampling in >>>>>>>>> AllocTracer ? (with some minor changes to the AllocTracer API) >>>>>>>>> >>>>>>>>> I am imagining that you are saying to move the code that does the >>>>>>>> sampling code (change the tlab end, do the call to HeapMonitoring, >>>>>>>> etc.) into the AllocTracer code itself? I think that is right and >>>>>>>> I'll >>>>>>>> look if that is possible and prepare a webrev to show what would be >>>>>>>> needed to make that happen. >>>>>>>> >>>>>>>> #### >>>>>>>>> Minor nit, when declaring pointer there is a little mix of having >>>>>>>>> the >>>>>>>>> pointer adjacent by type name and data name. (Most hotspot code is >>>>>>>>> by >>>>>>>>> type >>>>>>>>> name) >>>>>>>>> E.g. >>>>>>>>> heapMonitoring.cpp:711 jvmtiStackTrace *trace = .... >>>>>>>>> heapMonitoring.cpp:733 Method* m = vfst.method(); >>>>>>>>> (not just this file) >>>>>>>>> >>>>>>>>> Done! >>>>>>>> >>>>>>>> #### >>>>>>>>> HeapMonitorThreadOnOffTest.java:77 >>>>>>>>> I would make g_tmp volatile, otherwise the assignment in loop may >>>>>>>>> theoretical be skipped. >>>>>>>>> >>>>>>>>> Also done! >>>>>>>> >>>>>>>> Thanks again! >>>>>>>> Jc >>>>>>>> >>>>>>> >>>>>>> >>>>> >>>> >>>> >>> >>
From martin.doerr at sap.com Wed Mar 28 16:02:13 2018 From: martin.doerr at sap.com (Doerr, Martin) Date: Wed, 28 Mar 2018 16:02:13 +0000 Subject: RFR(M): 8198756: Lazy allocation of compiler threads In-Reply-To: References: <864a492b-6f14-2372-2783-5f687bc43638@oracle.com> <41d9a441f84e41919f4566df78b46a0f@sap.com> <295c3925-9605-42d7-aac8-a7074b237aa0@oracle.com> Message-ID: <589e4fdca2bc47f197066a0f110e7d34@sap.com> Hi Vladimir, I have addressed your proposals with this new webrev: http://cr.openjdk.java.net/~mdoerr/8198756_CompilerCount/webrev.02/ Please take a look. Thanks for your support. Best regards, Martin -----Original Message----- From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] Sent: Monday, 26. March 2018 23:15 To: Doerr, Martin Cc: 'hotspot-compiler-dev at openjdk.java.net' Subject: Re: RFR(M): 8198756: Lazy allocation of compiler threads Hi Martin, We can't delete _log when deleting CompilerThread. The log is referenced globally and used on VM exit to generate the final log file when -XX:+LogCompilation is specified. compilercontrol tests passed after I changed it: +CompilerThread::~CompilerThread() { + // Delete objects which were allocated on heap. + delete _counters; + // _log is referenced in global CompileLog::_first chain and used on exit. +} I also see that C1 compiler threads are removed too soon, which causes their re-activation.
This may eat memory: $ java -XX:+TraceCompilerThreads -XX:+LogCompilation t Added initial compiler thread C2 CompilerThread0 Added initial compiler thread C1 CompilerThread0 Warning: TraceDependencies results may be inflated by VerifyDependencies Added compiler thread C1 CompilerThread1 (available memory: 37040MB) Added compiler thread C1 CompilerThread2 (available memory: 37033MB) Added compiler thread C1 CompilerThread3 (available memory: 37032MB) Removing compiler thread C1 CompilerThread3 Removing compiler thread C1 CompilerThread2 Removing compiler thread C1 CompilerThread1 Added compiler thread C1 CompilerThread1 (available memory: 37027MB) May be we should take into account for how long these threads are not used. Thanks, Vladimir On 3/23/18 5:58 PM, Vladimir Kozlov wrote: > On 3/23/18 10:37 AM, Doerr, Martin wrote: >> Hi Vladimir, >> >> thanks for the quick reply. Just a few answers. I'll take a closer >> look next week. >> >>> You can't delete thread when it is NULL >> C++ supports calling delete NULL so I think it would be uncommon to >> check it. If there's a problem, I think the delete operator should get >> fixed. >> >> "If expression evaluates to a null pointer value, no destructors are >> called, and the deallocation function may or may not be called (it's >> implementation-defined), but the default deallocation functions are >> guaranteed to do nothing when handed a null pointer." [1] > > I am sure our code analyzing tool, which we use to check code > correctness, will compliant about it. > >> >>> We may need to free corresponding java thread object when we remove >>> compiler threads. >> I think it would be bad to remove the Java Thread objects because we'd >> need to recreate them which is rather expensive and violates the >> design principle that Compiler Threads are not allowed to call Java. >> Removing them wouldn't save much memory. Keeping them in global >> handles seems to be beneficial and makes this change easier. > > Okay. 
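One way to act on the idea of taking into account how long a thread has been unused is to require a minimum idle time before removal. This is a hypothetical policy, not what the webrev does. A dynamically added thread is removed only if its queue is empty and it has not completed a task for at least `min_idle`. That would damp the add/remove/add sequence visible in the -XX:+TraceCompilerThreads output above.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical removal policy for a dynamically added compiler thread.
class IdleRemovalPolicySketch {
 public:
  using Clock = std::chrono::steady_clock;

  explicit IdleRemovalPolicySketch(Clock::duration min_idle)
      : _min_idle(min_idle) {}

  // Record the time the thread last finished a compile task.
  void on_task_completed(Clock::time_point now) { _last_busy = now; }

  // Remove only after the queue is empty AND the thread has been idle
  // for at least min_idle; a briefly empty queue is not enough.
  bool should_remove(bool queue_empty, Clock::time_point now) const {
    return queue_empty && (now - _last_busy) >= _min_idle;
  }

 private:
  Clock::duration _min_idle;
  Clock::time_point _last_busy{};
};
```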
> >> >>> And I thought we would need to add only one threads each time when we >>> hit some queue size threshold. At the start queues filled up very >>> fast so you may end up creating all compiler threads. >> My current formula only creates as much compiler threads so that there >> exist 2 compile jobs per thread. I think this is better for startup, >> but we can reevaluate this. > > Would be nice to see graph how number of compiler threads change with > time depending on load for some applications (for example, jbb2005 and > specjvm2008 if you have them)? > >> >> Thanks for the improvement proposals. I'll implement them next week. >> Nevertheless, the current version can already be tested. > > I started our testing. > > I just remember that we may need to treat -Xcomp and CTW cases specially. > > I also ran jtreg testing locally on x64 linux for compiler/jvmci tests. > And also tier1 compiler tests with Graal as JIT > (-XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCI > -XX:+TieredCompilation -XX:+UseJVMCICompiler -Djvmci.Compiler=graal). > They passed. I think JVMCI code is fine. > > But I see crash in CompileLog::finish_log_on_error() function in > compiler/compilercontrol jtreg tests (they are not in tier1) with normal > jtreg runs: > > FAILED: compiler/compilercontrol/commandfile/LogTest.java > FAILED: compiler/compilercontrol/commands/LogTest.java > FAILED: compiler/compilercontrol/directives/LogTest.java > FAILED: compiler/compilercontrol/jcmd/AddLogTest.java > FAILED: compiler/compilercontrol/jcmd/StressAddMultiThreadedTest.java > FAILED: compiler/compilercontrol/logcompilation/LogTest.java > > I started performance testing too. > > Thanks, > Vladimir > >> >> Best regards, >> Martin >> >> >> [1] http://en.cppreference.com/w/cpp/language/delete >> >> >> -----Original Message----- >> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >> Sent: Freitag, 23. 
März 2018 18:17 >> To: Doerr, Martin >> Cc: Igor Veresov (igor.veresov at oracle.com) ; >> White, Derek ; >> 'hotspot-compiler-dev at openjdk.java.net' >> >> Subject: Re: RFR(M): 8198756: Lazy allocation of compiler threads >> >> Very cool! >> >> Few thoughts. >> >> You can't delete thread when it is NULL (missing check or refactor code): >> >> if (thread == NULL || thread->osthread() == NULL) { >> + if (UseDynamicNumberOfCompilerThreads && >> comp->num_compiler_threads() > 0) { >> + delete thread; >> >> Why not keep handle instead of returning naked oop from >> create_thread_oop()? You create Handle again >> >> Start fields names with _ to distinguish them from local variable: >> >> + static int c1_count, c2_count; >> >> In possibly_add_compiler_threads() you can use c2_count instead of >> calling compile_count() again and array size is fixed >> already: >> >> + int new_c2_count = MIN3(_c2_compile_queue->size() / 2, >> + >> CompilationPolicy::policy()->compiler_count(CompLevel_full_optimization), >> >> And I thought we would need to add only one threads each time when we >> hit some queue size threshold. At the start queues >> filled up very fast so you may end up creating all compiler threads. >> Or we may need more complex formula. >> >> We may need to free corresponding java thread object when we remove >> compiler threads. >> >> Thanks, >> Vladimir >> >> On 3/23/18 8:04 AM, Doerr, Martin wrote: >>> Hi Vladimir, >>> >>> thanks for updating the RFE. I already had similar ideas so I've >>> implemented a prototype. >>> >>> I'll be glad if you can support this effort. >>> >>> My implementation starts only one thread per type (C1/C2) initially. >>> Compiler threads start additional threads depending >>> on the compile queue size, the available memory and the predetermined >>> maximum. The Java Thread objects get created >>> during startup so the Compiler Threads don't need to call Java.
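The sizing rule described here, together with the `MIN3(...)` expression quoted above, can be sketched as a pure function. This is an illustration under stated assumptions, not the webrev's code: the third `MIN3` argument is cut off in the quote, so the memory limit is modeled with a hypothetical fixed per-thread budget (`mem_per_thread`), and `ThreadBudgetInput`, `target_thread_count` and `next_thread_count` are hypothetical names. `next_thread_count` models the alternative Vladimir raises, growing by at most one thread per check.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Inputs to the sizing decision. In the webrev these come from the compile
// queue, the compilation policy and the OS; here they are plain fields.
struct ThreadBudgetInput {
  int queue_size;           // pending compile tasks for this compiler
  int max_threads;          // configured maximum for this compiler (C1 or C2)
  uint64_t available_mem;   // bytes currently available
  uint64_t mem_per_thread;  // hypothetical cost of one extra compiler thread
};

// Target count: roughly one thread per two queued tasks, capped by the
// configured maximum and by the memory budget, but never below one thread.
int target_thread_count(const ThreadBudgetInput& in) {
  assert(in.max_threads >= 1);
  int by_queue = in.queue_size / 2;
  int by_memory = in.max_threads;
  if (in.mem_per_thread > 0) {
    uint64_t fit = in.available_mem / in.mem_per_thread;
    by_memory = static_cast<int>(
        std::min<uint64_t>(fit, static_cast<uint64_t>(in.max_threads)));
  }
  int target = std::min({by_queue, in.max_threads, by_memory});
  return std::max(target, 1);
}

// Alternative raised in review: grow by at most one thread per check.
// Shrinking is not handled here.
int next_thread_count(int current, const ThreadBudgetInput& in) {
  int target = target_thread_count(in);
  return target > current ? current + 1 : current;
}
```

With 10 queued tasks, a maximum of 4 and room for 8 threads, `target_thread_count` returns 4, while `next_thread_count(1, ...)` returns 2. Removal is deliberately left out, since it is driven by idle time rather than by this formula.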
>>> >>> The heuristics (in possibly_add_compiler_threads()) are just an >>> initial proposal and we may want to add tuning >>> parameters or different numbers. >>> >>> Threads get stopped in reverse order as they were created when their >>> compile queue is empty for some time. >>> >>> The feature can be switched by >>> -XX:+/-UseDynamicNumberOfCompilerThreads. Thread creating and removal >>> can be traced by >>> -XX:+TraceCompilerThreads. >>> >>> Webrev is here: >>> >>> http://cr.openjdk.java.net/~mdoerr/8198756_CompilerCount/webrev.01/ >>> >>> The following issues need to get addressed, yet: >>> >>> -Test JVMCI support. (I'm not familiar with it.) >>> >>> -Possible memory leaks. I've added some delete calls when a thread >>> dies, but they may be incomplete. >>> >>> -Logging. >>> >>> -Performance and memory consumption evaluation. >>> >>> It would be great to get support and advice for these issues. >>> >>> Best regards, >>> >>> Martin >>> From vladimir.kozlov at oracle.com Wed Mar 28 16:26:06 2018 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 28 Mar 2018 09:26:06 -0700 Subject: RFR(xxs): 8200297: Build failures after JDK-8198691 (CodeHeap State Analytics) In-Reply-To: <88880294-A626-4B1C-8D67-58206BCE85DE@sap.com> References: <2f6bdfe4-f89a-c77c-ee11-6a825be6eeca@redhat.com> <0bc0b507-f9fb-318b-5eed-be5eb48400ba@oracle.com> <42b84c01-3467-32cf-1c57-5d3d0e991c1b@oracle.com> <88880294-A626-4B1C-8D67-58206BCE85DE@sap.com> Message-ID: <785983ad-fcb0-5290-a998-f1d8e909cfbf@oracle.com> Hi Lutz, I hope you feel better today. We hit another bug https://bugs.openjdk.java.net/browse/JDK-8200366 but I was not able to reproduce crash on linux-x64. Regards, Vladimir On 3/28/18 12:50 AM, Schmidt, Lutz wrote: > Dear all, > > first of all a big THANK YOU to all involved for detecting, handling and > fixing this issue. I was in really bad shape yesterday. 
> > Regards, > > Lutz > > *From: *hotspot-compiler-dev > on behalf of Thomas > Stüfe > *Date: *Wednesday, 28. March 2018 at 09:18 > *To: *Vladimir Kozlov > *Cc: *hotspot compiler > *Subject: *Re: RFR(xxs): 8200297: Build failures after JDK-8198691 > (CodeHeap State Analytics) > > Hi, > > On Tue, Mar 27, 2018 at 10:18 PM, Vladimir Kozlov > > wrote: > > On 3/27/18 12:34 PM, Thomas Stüfe wrote: > > > > On Tue, Mar 27, 2018 at 9:03 PM, Vladimir Kozlov > >> > wrote: > > Looks good to me too. > > But it should be verified by someone. Our (Oracle) Java > test system > and submit-hs (which use it) does not build or test on any > x86_32 > systems. > > > Well, I did build on Linux 32bit. Is that sufficient or do you > need someone else to test? > > > If you can run HelloWorld.java with -Xlog:codecache=Trace to use > stat code, it will be sufficient for me. > > Thanks, > Vladimir > > Tested on ubuntu 16.4 32bit. Looks good. Pushed. > > Thanks, Thomas > > > Thanks, Thomas > > Regards, > Vladimir > > > On 3/27/18 8:31 AM, Aleksey Shipilev wrote: > > On 03/27/2018 05:27 PM, Thomas Stüfe wrote: > > http://cr.openjdk.java.net/~stuefe/webrevs/8200297-builderrors-on-x86-32-after-codeheap-lucy/webrev.01/webrev/ > > > > > OK from me, assuming it still builds x86_32, and passes > submit-hs. > > Thanks, >
-Aleksey > From lutz.schmidt at sap.com Wed Mar 28 16:54:51 2018 From: lutz.schmidt at sap.com (Schmidt, Lutz) Date: Wed, 28 Mar 2018 16:54:51 +0000 Subject: RFR(xxs): 8200297: Build failures after JDK-8198691 (CodeHeap State Analytics) In-Reply-To: <785983ad-fcb0-5290-a998-f1d8e909cfbf@oracle.com> References: <2f6bdfe4-f89a-c77c-ee11-6a825be6eeca@redhat.com> <0bc0b507-f9fb-318b-5eed-be5eb48400ba@oracle.com> <42b84c01-3467-32cf-1c57-5d3d0e991c1b@oracle.com> <88880294-A626-4B1C-8D67-58206BCE85DE@sap.com> <785983ad-fcb0-5290-a998-f1d8e909cfbf@oracle.com> Message-ID: <9B42650E-65B6-41DF-BC95-F11E58931BC5@sap.com> Hi Vladimir, yes, I'm feeling much better. It was a short-lived, out of nothing virus attack. I will look into this more deeply asap (this evening). A short glance at the hs_err* file did not reveal anything helpful. Regards, Lutz On 28.03.18, 18:26, "Vladimir Kozlov" wrote: Hi Lutz, I hope you feel better today. We hit another bug https://bugs.openjdk.java.net/browse/JDK-8200366 but I was not able to reproduce crash on linux-x64. Regards, Vladimir On 3/28/18 12:50 AM, Schmidt, Lutz wrote: > Dear all, > > first of all a big THANK YOU to all involved for detecting, handling and > fixing this issue. I was in really bad shape yesterday. > > Regards, > > Lutz > > *From: *hotspot-compiler-dev > on behalf of Thomas > Stüfe > *Date: *Wednesday, 28. March 2018 at 09:18 > *To: *Vladimir Kozlov > *Cc: *hotspot compiler > *Subject: *Re: RFR(xxs): 8200297: Build failures after JDK-8198691 > (CodeHeap State Analytics) > > Hi, > > On Tue, Mar 27, 2018 at 10:18 PM, Vladimir Kozlov > > wrote: > > On 3/27/18 12:34 PM, Thomas Stüfe wrote: > > > > On Tue, Mar 27, 2018 at 9:03 PM, Vladimir Kozlov > >> > wrote: > > Looks good to me too. > > But it should be verified by someone. Our (Oracle) Java > test system > and submit-hs (which use it) does not build or test on any > x86_32 > systems. > > > Well, I did build on Linux 32bit. Is that sufficient or do you
Is that sufficient or do you > need someone else to test? > > > If you can run HelloWorld.java with -Xlog:codecache=Trace to use > stat code, it will be sufficient for me. > > Thanks, > Vladimir > > Tested on ubuntu 16.4 32bit. Looks good. Pushed. > > Thanks, Thomas > > > Thanks, Thomas > > Regards, > Vladimir > > > On 3/27/18 8:31 AM, Aleksey Shipilev wrote: > > On 03/27/2018 05:27 PM, Thomas St?fe wrote: > > http://cr.openjdk.java.net/~stuefe/webrevs/8200297-builderrors-on-x86-32-after-codeheap-lucy/webrev.01/webrev/ > > > > > OK from me, assuming it still builds x86_32, and passes > submit-hs. > > Thanks, > -Aleksey > From vladimir.kozlov at oracle.com Wed Mar 28 18:26:58 2018 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 28 Mar 2018 11:26:58 -0700 Subject: RFR(M): 8200303: C2 should leverage profiling for lookupswitch/tableswitch In-Reply-To: References: <85ee42ef-1323-8513-164a-392592bed7d4@oracle.com> Message-ID: <0bb06579-9429-c4da-2b3d-52fcf65f07dc@oracle.com> Did not help :( Some kind of linker weirdness: log2f workspace/build/solaris-sparcv9-debug/hotspot/variant-server/libjvm/objs/parse2.o (symbol belongs to implicit dependency /work_dir/jib-master/install/jpg/infra/builddeps/devkit-solaris_sparcv9/SS12u4-Solaris11u1+1.1/devkit-solaris_sparcv9-SS12u4-Solaris11u1+1.1.tar.gz/SS12u4-Solaris11u1/sysroot/usr/lib/sparcv9/libm.so.2) Vladimir On 3/28/18 12:42 AM, Roland Westrelin wrote: > > Hi Vladimir, > > Thanks for helping again. > >> Failed to build on SPARC (PCH?): >> >> Undefined first referenced >> symbol in file >> log2f >> /workspace/build/solaris-sparcv9/hotspot/variant-server/libjvm/objs/parse2.o > > It builds fine without precompiled headers on linux. > > I added a: > > #include > > http://cr.openjdk.java.net/~roland/8200303/webrev.01/ > > Can you try that one? > > Roland. 
> From vladimir.kozlov at oracle.com Wed Mar 28 19:26:16 2018 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 28 Mar 2018 12:26:16 -0700 Subject: RFR(M): 8200303: C2 should leverage profiling for lookupswitch/tableswitch In-Reply-To: <0bb06579-9429-c4da-2b3d-52fcf65f07dc@oracle.com> References: <85ee42ef-1323-8513-164a-392592bed7d4@oracle.com> <0bb06579-9429-c4da-2b3d-52fcf65f07dc@oracle.com> Message-ID: We do link with /lib/sparcv9/libm.so.1 but it does not have log2f() :( http://hg.openjdk.java.net/jdk/hs/file/6d5bd76650df/make/autoconf/libraries.m4#l116 I will file a bug. Vladimir On 3/28/18 11:26 AM, Vladimir Kozlov wrote: > Did not help :( > > Some kind of linker weirdness: > > log2f > workspace/build/solaris-sparcv9-debug/hotspot/variant-server/libjvm/objs/parse2.o > (symbol belongs to implicit dependency > /work_dir/jib-master/install/jpg/infra/builddeps/devkit-solaris_sparcv9/SS12u4-Solaris11u1+1.1/devkit-solaris_sparcv9-SS12u4-Solaris11u1+1.1.tar.gz/SS12u4-Solaris11u1/sysroot/usr/lib/sparcv9/libm.so.2) > > > Vladimir > > On 3/28/18 12:42 AM, Roland Westrelin wrote: >> >> Hi Vladimir, >> >> Thanks for helping again. >> >>> Failed to build on SPARC (PCH?): >>> >>> Undefined first referenced >>> symbol in file >>> log2f >>> /workspace/build/solaris-sparcv9/hotspot/variant-server/libjvm/objs/parse2.o >>> >> >> It builds fine without precompiled headers on linux. >> >> I added a: >> >> #include >> >> http://cr.openjdk.java.net/~roland/8200303/webrev.01/ >> >> Can you try that one? >> >> Roland.
>> From vladimir.kozlov at oracle.com Wed Mar 28 21:12:00 2018 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 28 Mar 2018 14:12:00 -0700 Subject: RFR(M): 8200303: C2 should leverage profiling for lookupswitch/tableswitch In-Reply-To: References: <85ee42ef-1323-8513-164a-392592bed7d4@oracle.com> <0bb06579-9429-c4da-2b3d-52fcf65f07dc@oracle.com> Message-ID: <578bc5fc-516c-91b1-6434-4e146e91195b@oracle.com> Hi Roland, I filed 8200383 [1] and sent fix for review. With that fix you don't need to include math.h - build passed with webrev.00 version. I think you need an other Reviewer to look on your changes. Thanks, Vladimir [1] https://bugs.openjdk.java.net/browse/JDK-8200383 On 3/28/18 12:26 PM, Vladimir Kozlov wrote: > We do link with /lib/sparcv9/libm.so.1 but it does not have log2f() :( > > http://hg.openjdk.java.net/jdk/hs/file/6d5bd76650df/make/autoconf/libraries.m4#l116 > > > I will file a bug. > > Vladimir > > On 3/28/18 11:26 AM, Vladimir Kozlov wrote: >> Did not help :( >> >> Some kind of linker weirdness: >> >> log2f >> workspace/build/solaris-sparcv9-debug/hotspot/variant-server/libjvm/objs/parse2.o >> (symbol belongs to implicit dependency >> /work_dir/jib-master/install/jpg/infra/builddeps/devkit-solaris_sparcv9/SS12u4-Solaris11u1+1.1/devkit-solaris_sparcv9-SS12u4-Solaris11u1+1.1.tar.gz/SS12u4-Solaris11u1/sysroot/usr/lib/sparcv9/libm.so.2) >> >> >> Vladimir >> >> On 3/28/18 12:42 AM, Roland Westrelin wrote: >>> >>> Hi Vladimir, >>> >>> Thanks for helping again. >>> >>>> Failed to build on SPARC (PCH?): >>>> >>>> Undefined first referenced >>>> symbol in file >>>> log2f >>>> /workspace/build/solaris-sparcv9/hotspot/variant-server/libjvm/objs/parse2.o >>>> >>> >>> It builds fine without precompiled headers on linux. >>> >>> I added a: >>> >>> #include >>> >>> http://cr.openjdk.java.net/~roland/8200303/webrev.01/ >>> >>> Can you try that one? >>> >>> Roland.
>>> From rwestrel at redhat.com Thu Mar 29 07:18:06 2018 From: rwestrel at redhat.com (Roland Westrelin) Date: Thu, 29 Mar 2018 09:18:06 +0200 Subject: RFR(M): 8200303: C2 should leverage profiling for lookupswitch/tableswitch In-Reply-To: <578bc5fc-516c-91b1-6434-4e146e91195b@oracle.com> References: <85ee42ef-1323-8513-164a-392592bed7d4@oracle.com> <0bb06579-9429-c4da-2b3d-52fcf65f07dc@oracle.com> <578bc5fc-516c-91b1-6434-4e146e91195b@oracle.com> Message-ID: > I filed 8200383 [1] and sent fix for review. With that fix you don't > need to include math.h - build passed with webrev.00 version. Thanks for chasing that down! Roland. From tobias.hartmann at oracle.com Thu Mar 29 10:05:30 2018 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Thu, 29 Mar 2018 12:05:30 +0200 Subject: RFR(M): 8200303: C2 should leverage profiling for lookupswitch/tableswitch In-Reply-To: References: Message-ID: Hi Roland, this looks good to me! It's hard to review but here are some comments: cfgnode.hpp - "probabily" -> "probability" machnode.hpp - Do we need a forward declaration of MachJumpNode at the beginning of the header file (like we do have for the other MachNodes)? - I think you also need to define MachJumpNode in vmStructs.cpp parse2.cpp - I would prefer "always/never taken" over "taken always/never" - 1160: what is 'ifff' used for? Best regards, Tobias On 27.03.2018 16:35, Roland Westrelin wrote: > > http://cr.openjdk.java.net/~roland/8200303/webrev.00/ > > Over in the Shenandoah project, Aleksey found that: > > 1) a counted loop with a switch: > > for (...) { > switch (..) { > ... > default: > throw SomeException(); > } > } > > where some cases break out of the loop would not perform as well when > loop strip mining is enabled, even if the cases that exit the loop are > never taken in practice. 
> > Because C2 gives all branches out of a JumpNode the same probability, > exiting the loop has a non null probability and GCM computes (wrongly) > that scheduling the loop strip mining book keeping logic in the loop is > cheaper than out of the loop. > > 2) Shenandoah write barriers in some of the cases should be hoisted but > are not because C2 can't tell that only a single case of the switch is > ever hit. > > http://mail.openjdk.java.net/pipermail/shenandoah-dev/2017-December/004535.html > > In the Shenandoah repo, we have a change that makes C2 leverage > profiling for switch. Experiments showed that 1) and 2) above are fixed > and that some common benchmarks run with parallel gc benefit as well > (~+7% on Serial): > > http://mail.openjdk.java.net/pipermail/shenandoah-dev/2018-February/004886.html > > The patch I'm proposing here is based on the patch we've been using for > a couple months in Shenandoah: > > - it fixes profile collection in c1 for lookupswitch/tableswitch > > - it sets profiling information on IfNodes and JumpNodes emitted from > lookupswitch/tableswitch and propagate it after matching so GCM can > take advantage of it > > - it takes advantage of profiling to find never taken cases and trim > down the cases (or ranges as they're called in the code). A never > taken range can now cause an uncommon trap. > > and also has some improvements: > > - if some ranges are a lot more common than others, it might pay off to > check for them one after the other before going to the binary > search. The patch has some logic to evaluate the number of steps in > the binary search and determine whether checking for the most common > case upfront would pay off (from profile data) > > - the binary search doesn't always keep the tree balanced but instead > picks a mid point that split frequencies in half > > Roland. 
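The two switch-lowering ideas in Roland's description (splitting the binary search where the profiled frequency halves, and testing the most common range first when that is cheaper) can be sketched in isolation. This is not code from the 8200303 webrev: `SwitchRange`, `weighted_split` and `should_peel_hottest` are hypothetical, and the cost model (about log2(n) compares for a search over n ranges) is a simplifying assumption standing in for whatever estimate the patch actually uses.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// A contiguous range of switch keys and its profiled hit count.
struct SwitchRange {
  int lo, hi;    // inclusive key bounds
  double count;  // profiled frequency
};

// Split point for ranges[begin, end): the first index at which the cumulative
// frequency reaches half of the total, instead of the middle index. Both
// halves stay non-empty. Returns the start of the right half.
std::size_t weighted_split(const std::vector<SwitchRange>& r,
                           std::size_t begin, std::size_t end) {
  assert(end > begin + 1);
  double total = 0;
  for (std::size_t i = begin; i < end; ++i) total += r[i].count;
  double acc = 0;
  for (std::size_t i = begin; i + 1 < end; ++i) {
    acc += r[i].count;
    if (acc >= total / 2) return i + 1;
  }
  return end - 1;  // the last range alone holds more than half
}

// Simplified cost model: searching n ranges costs about log2(n) compares.
// Testing the hottest range first costs one compare, plus a search over the
// other n - 1 ranges whenever that test misses (probability 1 - p).
bool should_peel_hottest(const std::vector<SwitchRange>& r) {
  if (r.size() < 3) return false;
  double total = 0, hottest = 0;
  for (const SwitchRange& x : r) {
    total += x.count;
    hottest = std::max(hottest, x.count);
  }
  if (total <= 0) return false;
  double p = hottest / total;
  double n = static_cast<double>(r.size());
  double plain = std::log2(n);
  double peeled = 1.0 + (1.0 - p) * std::log2(n - 1);
  return peeled < plain;
}
```

For profile counts {90, 5, 3, 2}, the split lands after the first range and peeling pays off (about 1.16 expected compares versus 2); for four equally frequent ranges, the split stays in the middle and peeling does not (about 2.19 versus 2).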
> From rwestrel at redhat.com Thu Mar 29 13:58:17 2018 From: rwestrel at redhat.com (Roland Westrelin) Date: Thu, 29 Mar 2018 15:58:17 +0200 Subject: RFR(M): 8200303: C2 should leverage profiling for lookupswitch/tableswitch In-Reply-To: <594841547.994777.1522166097222.JavaMail.zimbra@u-pem.fr> References: <594841547.994777.1522166097222.JavaMail.zimbra@u-pem.fr> Message-ID: > I think you can also close JDK-8058192 as a dup. There are some comments about graal in this CR. So I'll let the owner of the bug (Dean) decide whether he wants to close it or not. Roland. From vladimir.kozlov at oracle.com Thu Mar 29 16:42:14 2018 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 29 Mar 2018 09:42:14 -0700 Subject: RFR(M): 8198756: Lazy allocation of compiler threads In-Reply-To: <589e4fdca2bc47f197066a0f110e7d34@sap.com> References: <864a492b-6f14-2372-2783-5f687bc43638@oracle.com> <41d9a441f84e41919f4566df78b46a0f@sap.com> <295c3925-9605-42d7-aac8-a7074b237aa0@oracle.com> <589e4fdca2bc47f197066a0f110e7d34@sap.com> Message-ID: <13980bc9-3a12-23d3-477d-596b4c2432ee@oracle.com> Hi Martin, Thank you for update. Does using thread->smr_delete() solve runtime/whitebox/WBStackSize.java test problem I reported in RFE? I see you kept _c1_compile_queue->size() / 2. I think it create too many C1 threads reaching max number _c1_count very fast. I will start testing with 02 changes and let you know results. Thanks, Vladimir On 3/28/18 9:02 AM, Doerr, Martin wrote: > Hi Vladimir, > > I have addressed your proposals with this new webrev: > http://cr.openjdk.java.net/~mdoerr/8198756_CompilerCount/webrev.02/ > > Please take a look. > > Thanks for your support. Best regards, > Martin > > > -----Original Message----- > From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] > Sent: Montag, 26. 
März 2018 23:15 > To: Doerr, Martin > Cc: 'hotspot-compiler-dev at openjdk.java.net' > Subject: Re: RFR(M): 8198756: Lazy allocation of compiler threads > > Hi Martin, > > We can't delete _log when deleting CompilerThread. Log is referenced > globally and used on VM exit to generate final log file when > -XX:+LogCompilation is specified. compilercontrol tests passed after I > change it: > > +CompilerThread::~CompilerThread() { > + // Delete objects which were allocated on heap. > + delete _counters; > + // _log is referenced in global CompileLog::_first chain and used on > exit. > +} > > I also see that we C1 compiler threads are removed too soon which cause > their re-activation again. This may eat memory: > > $ java -XX:+TraceCompilerThreads -XX:+LogCompilation t > Added initial compiler thread C2 CompilerThread0 > Added initial compiler thread C1 CompilerThread0 > Warning: TraceDependencies results may be inflated by VerifyDependencies > Added compiler thread C1 CompilerThread1 (available memory: 37040MB) > Added compiler thread C1 CompilerThread2 (available memory: 37033MB) > Added compiler thread C1 CompilerThread3 (available memory: 37032MB) > Removing compiler thread C1 CompilerThread3 > Removing compiler thread C1 CompilerThread2 > Removing compiler thread C1 CompilerThread1 > Added compiler thread C1 CompilerThread1 (available memory: 37027MB) > > May be we should take into account for how long these threads are not used. > > Thanks, > Vladimir > > On 3/23/18 5:58 PM, Vladimir Kozlov wrote: >> On 3/23/18 10:37 AM, Doerr, Martin wrote: >>> Hi Vladimir, >>> >>> thanks for the quick reply. Just a few answers. I'll take a closer >>> look next week. >>> >>>> You can't delete thread when it is NULL >>> C++ supports calling delete NULL so I think it would be uncommon to >>> check it. If there's a problem, I think the delete operator should get >>> fixed.
>>> >>> "If expression evaluates to a null pointer value, no destructors are >>> called, and the deallocation function may or may not be called (it's >>> implementation-defined), but the default deallocation functions are >>> guaranteed to do nothing when handed a null pointer." [1] >> >> I am sure our code analyzing tool, which we use to check code >> correctness, will compliant about it. >> >>> >>>> We may need to free corresponding java thread object when we remove >>>> compiler threads. >>> I think it would be bad to remove the Java Thread objects because we'd >>> need to recreate them which is rather expensive and violates the >>> design principle that Compiler Threads are not allowed to call Java. >>> Removing them wouldn't save much memory. Keeping them in global >>> handles seems to be beneficial and makes this change easier. >> >> Okay. >> >>> >>>> And I thought we would need to add only one threads each time when we >>>> hit some queue size threshold. At the start queues filled up very >>>> fast so you may end up creating all compiler threads. >>> My current formula only creates as much compiler threads so that there >>> exist 2 compile jobs per thread. I think this is better for startup, >>> but we can reevaluate this. >> >> Would be nice to see graph how number of compiler threads change with >> time depending on load for some applications (for example, jbb2005 and >> specjvm2008 if you have them)? >> >>> >>> Thanks for the improvement proposals. I'll implement them next week. >>> Nevertheless, the current version can already be tested. >> >> I started our testing. >> >> I just remember that we may need to treat -Xcomp and CTW cases specially. >> >> I also ran jtreg testing locally on x64 linux for compiler/jvmci tests. >> And also tier1 compiler tests with Graal as JIT >> (-XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCI >> -XX:+TieredCompilation -XX:+UseJVMCICompiler -Djvmci.Compiler=graal). >> They passed. I think JVMCI code is fine. 
>> >> But I see crash in CompileLog::finish_log_on_error() function in >> compiler/compilercontrol jtreg tests (they are not in tier1) with normal >> jtreg runs: >> >> FAILED: compiler/compilercontrol/commandfile/LogTest.java >> FAILED: compiler/compilercontrol/commands/LogTest.java >> FAILED: compiler/compilercontrol/directives/LogTest.java >> FAILED: compiler/compilercontrol/jcmd/AddLogTest.java >> FAILED: compiler/compilercontrol/jcmd/StressAddMultiThreadedTest.java >> FAILED: compiler/compilercontrol/logcompilation/LogTest.java >> >> I started performance testing too. >> >> Thanks, >> Vladimir >> >>> >>> Best regards, >>> Martin >>> >>> >>> [1] http://en.cppreference.com/w/cpp/language/delete >>> >>> >>> -----Original Message----- >>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >>> Sent: Freitag, 23. März 2018 18:17 >>> To: Doerr, Martin >>> Cc: Igor Veresov (igor.veresov at oracle.com) ; >>> White, Derek ; >>> 'hotspot-compiler-dev at openjdk.java.net' >>> >>> Subject: Re: RFR(M): 8198756: Lazy allocation of compiler threads >>> >>> Very cool! >>> >>> Few thoughts. >>> >>> You can't delete thread when it is NULL (missing check or refactor code): >>> >>> if (thread == NULL || thread->osthread() == NULL) { >>> + if (UseDynamicNumberOfCompilerThreads && >>> comp->num_compiler_threads() > 0) { >>> + delete thread; >>> >>> Why not keep handle instead of returning naked oop from >>> create_thread_oop()? You create Handle again >>> >>> Start fields names with _ to distinguish them from local variable: >>> >>> + static int c1_count, c2_count; >>> >>> In possibly_add_compiler_threads() you can use c2_count instead of >>> calling compile_count() again and array size is fixed >>> already: >>> >>> +
int new_c2_count = MIN3(_c2_compile_queue->size() / 2, >>> + >>> CompilationPolicy::policy()->compiler_count(CompLevel_full_optimization), >>> >>> And I thought we would need to add only one threads each time when we >>> hit some queue size threshold. At the start queues >>> filled up very fast so you may end up creating all compiler threads. >>> Or we may need more complex formula. >>> >>> We may need to free corresponding java thread object when we remove >>> compiler threads. >>> >>> Thanks, >>> Vladimir >>> >>> On 3/23/18 8:04 AM, Doerr, Martin wrote: >>>> Hi Vladimir, >>>> >>>> thanks for updating the RFE. I already had similar ideas so I've >>>> implemented a prototype. >>>> >>>> I'll be glad if you can support this effort. >>>> >>>> My implementation starts only one thread per type (C1/C2) initially. >>>> Compiler threads start additional threads depending >>>> on the compile queue size, the available memory and the predetermined >>>> maximum. The Java Thread objects get created >>>> during startup so the Compiler Threads don't need to call Java. >>>> >>>> The heuristics (in possibly_add_compiler_threads()) are just an >>>> initial proposal and we may want to add tuning >>>> parameters or different numbers. >>>> >>>> Threads get stopped in reverse order as they were created when their >>>> compile queue is empty for some time. >>>> >>>> The feature can be switched by >>>> -XX:+/-UseDynamicNumberOfCompilerThreads. Thread creating and removal >>>> can be traced by >>>> -XX:+TraceCompilerThreads. >>>> >>>> Webrev is here: >>>> >>>> http://cr.openjdk.java.net/~mdoerr/8198756_CompilerCount/webrev.01/ >>>> >>>> The following issues need to get addressed, yet: >>>> >>>> -Test JVMCI support. (I'm not familiar with it.) >>>> >>>> -Possible memory leaks. I've added some delete calls when a thread >>>> dies, but they may be incomplete. >>>> >>>> -Logging. >>>> >>>> -Performance and memory consumption evaluation. 
>>>> >>>> It would be great to get support and advice for these issues. >>>> >>>> Best regards, >>>> >>>> Martin >>>> From martin.doerr at sap.com Thu Mar 29 17:15:13 2018 From: martin.doerr at sap.com (Doerr, Martin) Date: Thu, 29 Mar 2018 17:15:13 +0000 Subject: RFR(M): 8198756: Lazy allocation of compiler threads In-Reply-To: <13980bc9-3a12-23d3-477d-596b4c2432ee@oracle.com> References: <864a492b-6f14-2372-2783-5f687bc43638@oracle.com> <41d9a441f84e41919f4566df78b46a0f@sap.com> <295c3925-9605-42d7-aac8-a7074b237aa0@oracle.com> <589e4fdca2bc47f197066a0f110e7d34@sap.com> <13980bc9-3a12-23d3-477d-596b4c2432ee@oracle.com> Message-ID: <767c3e87c23246e89c2c6d368aa30bcb@sap.com> Hi Vladimir, sorry, I had missed your proposals you have added to the bug while I was sick. webrev.02 is only based on the emails. I'll think about the ideas which were written in the bug next week. Best regards, Martin -----Original Message----- From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] Sent: Donnerstag, 29. März 2018 18:42 To: Doerr, Martin Cc: 'hotspot-compiler-dev at openjdk.java.net' Subject: Re: RFR(M): 8198756: Lazy allocation of compiler threads Hi Martin, Thank you for update. Does using thread->smr_delete() solve runtime/whitebox/WBStackSize.java test problem I reported in RFE? I see you kept _c1_compile_queue->size() / 2. I think it create too many C1 threads reaching max number _c1_count very fast. I will start testing with 02 changes and let you know results. Thanks, Vladimir On 3/28/18 9:02 AM, Doerr, Martin wrote: > Hi Vladimir, > > I have addressed your proposals with this new webrev: > http://cr.openjdk.java.net/~mdoerr/8198756_CompilerCount/webrev.02/ > > Please take a look. > > Thanks for your support. Best regards, > Martin > > > -----Original Message----- > From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] > Sent: Montag, 26.
März 2018 23:15 > To: Doerr, Martin > Cc: 'hotspot-compiler-dev at openjdk.java.net' > Subject: Re: RFR(M): 8198756: Lazy allocation of compiler threads > > Hi Martin, > > We can't delete _log when deleting CompilerThread. Log is referenced > globally and used on VM exit to generate final log file when > -XX:+LogCompilation is specified. compilercontrol tests passed after I > change it: > > +CompilerThread::~CompilerThread() { > + // Delete objects which were allocated on heap. > + delete _counters; > + // _log is referenced in global CompileLog::_first chain and used on > exit. > +} > > I also see that we C1 compiler threads are removed too soon which cause > their re-activation again. This may eat memory: > > $ java -XX:+TraceCompilerThreads -XX:+LogCompilation t > Added initial compiler thread C2 CompilerThread0 > Added initial compiler thread C1 CompilerThread0 > Warning: TraceDependencies results may be inflated by VerifyDependencies > Added compiler thread C1 CompilerThread1 (available memory: 37040MB) > Added compiler thread C1 CompilerThread2 (available memory: 37033MB) > Added compiler thread C1 CompilerThread3 (available memory: 37032MB) > Removing compiler thread C1 CompilerThread3 > Removing compiler thread C1 CompilerThread2 > Removing compiler thread C1 CompilerThread1 > Added compiler thread C1 CompilerThread1 (available memory: 37027MB) > > May be we should take into account for how long these threads are not used. > > Thanks, > Vladimir > > On 3/23/18 5:58 PM, Vladimir Kozlov wrote: >> On 3/23/18 10:37 AM, Doerr, Martin wrote: >>> Hi Vladimir, >>> >>> thanks for the quick reply. Just a few answers. I'll take a closer >>> look next week. >>> >>>> You can't delete thread when it is NULL >>> C++ supports calling delete NULL so I think it would be uncommon to >>> check it. If there's a problem, I think the delete operator should get >>> fixed.
>>> >>> "If expression evaluates to a null pointer value, no destructors are >>> called, and the deallocation function may or may not be called (it's >>> implementation-defined), but the default deallocation functions are >>> guaranteed to do nothing when handed a null pointer." [1] >> >> I am sure our code analyzing tool, which we use to check code >> correctness, will compliant about it. >> >>> >>>> We may need to free corresponding java thread object when we remove >>>> compiler threads. >>> I think it would be bad to remove the Java Thread objects because we'd >>> need to recreate them which is rather expensive and violates the >>> design principle that Compiler Threads are not allowed to call Java. >>> Removing them wouldn't save much memory. Keeping them in global >>> handles seems to be beneficial and makes this change easier. >> >> Okay. >> >>> >>>> And I thought we would need to add only one threads each time when we >>>> hit some queue size threshold. At the start queues filled up very >>>> fast so you may end up creating all compiler threads. >>> My current formula only creates as much compiler threads so that there >>> exist 2 compile jobs per thread. I think this is better for startup, >>> but we can reevaluate this. >> >> Would be nice to see graph how number of compiler threads change with >> time depending on load for some applications (for example, jbb2005 and >> specjvm2008 if you have them)? >> >>> >>> Thanks for the improvement proposals. I'll implement them next week. >>> Nevertheless, the current version can already be tested. >> >> I started our testing. >> >> I just remember that we may need to treat -Xcomp and CTW cases specially. >> >> I also ran jtreg testing locally on x64 linux for compiler/jvmci tests. >> And also tier1 compiler tests with Graal as JIT >> (-XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCI >> -XX:+TieredCompilation -XX:+UseJVMCICompiler -Djvmci.Compiler=graal). >> They passed. I think JVMCI code is fine. 
>> >> But I see crash in CompileLog::finish_log_on_error() function in >> compiler/compilercontrol jtreg tests (they are not in tier1) with normal >> jtreg runs: >> >> FAILED: compiler/compilercontrol/commandfile/LogTest.java >> FAILED: compiler/compilercontrol/commands/LogTest.java >> FAILED: compiler/compilercontrol/directives/LogTest.java >> FAILED: compiler/compilercontrol/jcmd/AddLogTest.java >> FAILED: compiler/compilercontrol/jcmd/StressAddMultiThreadedTest.java >> FAILED: compiler/compilercontrol/logcompilation/LogTest.java >> >> I started performance testing too. >> >> Thanks, >> Vladimir >> >>> >>> Best regards, >>> Martin >>> >>> >>> [1] http://en.cppreference.com/w/cpp/language/delete >>> >>> >>> -----Original Message----- >>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >>> Sent: Friday, 23 March 2018 18:17 >>> To: Doerr, Martin >>> Cc: Igor Veresov (igor.veresov at oracle.com) ; >>> White, Derek ; >>> 'hotspot-compiler-dev at openjdk.java.net' >>> >>> Subject: Re: RFR(M): 8198756: Lazy allocation of compiler threads >>> >>> Very cool! >>> >>> Few thoughts. >>> >>> You can't delete thread when it is NULL (missing check or refactor code): >>> >>>     if (thread == NULL || thread->osthread() == NULL) { >>> +    if (UseDynamicNumberOfCompilerThreads && >>> comp->num_compiler_threads() > 0) { >>> +      delete thread; >>> >>> Why not keep handle instead of returning naked oop from >>> create_thread_oop()? You create Handle again >>> >>> Start fields names with _ to distinguish them from local variable: >>> >>> +  static int c1_count, c2_count; >>> >>> In possibly_add_compiler_threads() you can use c2_count instead of >>> calling compile_count() again and array size is fixed >>> already: >>> >>> +    
int new_c2_count = MIN3(_c2_compile_queue->size() / 2, >>> + >>> CompilationPolicy::policy()->compiler_count(CompLevel_full_optimization), >>> >>> And I thought we would need to add only one threads each time when we >>> hit some queue size threshold. At the start queues >>> filled up very fast so you may end up creating all compiler threads. >>> Or we may need more complex formula. >>> >>> We may need to free corresponding java thread object when we remove >>> compiler threads. >>> >>> Thanks, >>> Vladimir >>> >>> On 3/23/18 8:04 AM, Doerr, Martin wrote: >>>> Hi Vladimir, >>>> >>>> thanks for updating the RFE. I already had similar ideas so I've >>>> implemented a prototype. >>>> >>>> I'll be glad if you can support this effort. >>>> >>>> My implementation starts only one thread per type (C1/C2) initially. >>>> Compiler threads start additional threads depending >>>> on the compile queue size, the available memory and the predetermined >>>> maximum. The Java Thread objects get created >>>> during startup so the Compiler Threads don't need to call Java. >>>> >>>> The heuristics (in possibly_add_compiler_threads()) are just an >>>> initial proposal and we may want to add tuning >>>> parameters or different numbers. >>>> >>>> Threads get stopped in reverse order as they were created when their >>>> compile queue is empty for some time. >>>> >>>> The feature can be switched by >>>> -XX:+/-UseDynamicNumberOfCompilerThreads. Thread creating and removal >>>> can be traced by >>>> -XX:+TraceCompilerThreads. >>>> >>>> Webrev is here: >>>> >>>> http://cr.openjdk.java.net/~mdoerr/8198756_CompilerCount/webrev.01/ >>>> >>>> The following issues need to get addressed, yet: >>>> >>>> -Test JVMCI support. (I'm not familiar with it.) >>>> >>>> -Possible memory leaks. I've added some delete calls when a thread >>>> dies, but they may be incomplete. >>>> >>>> -Logging. >>>> >>>> -Performance and memory consumption evaluation. 
>>>> >>>> It would be great to get support and advice for these issues. >>>> >>>> Best regards, >>>> >>>> Martin >>>> From vladimir.kozlov at oracle.com Thu Mar 29 17:53:20 2018 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 29 Mar 2018 10:53:20 -0700 Subject: RFR(M): 8198756: Lazy allocation of compiler threads In-Reply-To: <767c3e87c23246e89c2c6d368aa30bcb@sap.com> References: <864a492b-6f14-2372-2783-5f687bc43638@oracle.com> <41d9a441f84e41919f4566df78b46a0f@sap.com> <295c3925-9605-42d7-aac8-a7074b237aa0@oracle.com> <589e4fdca2bc47f197066a0f110e7d34@sap.com> <13980bc9-3a12-23d3-477d-596b4c2432ee@oracle.com> <767c3e87c23246e89c2c6d368aa30bcb@sap.com> Message-ID: Okay. I posted webrev.02 testing failures in bug report. Thanks, Vladimir K On 3/29/18 10:15 AM, Doerr, Martin wrote: > Hi Vladimir, > > sorry, I had missed your proposals you have added to the bug while I was sick. webrev.02 is only based on the emails. > I'll think about the ideas which were written in the bug next week. > > Best regards, > Martin > > > -----Original Message----- > From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] > Sent: Thursday, 29 March 2018 18:42 > To: Doerr, Martin > Cc: 'hotspot-compiler-dev at openjdk.java.net' > Subject: Re: RFR(M): 8198756: Lazy allocation of compiler threads > > Hi Martin, > > Thank you for update. > > Does using thread->smr_delete() solve runtime/whitebox/WBStackSize.java > test problem I reported in RFE? > > I see you kept _c1_compile_queue->size() / 2. I think it create too many > C1 threads reaching max number _c1_count very fast. > > I will start testing with 02 changes and let you know results. > > Thanks, > Vladimir > > On 3/28/18 9:02 AM, Doerr, Martin wrote: >> Hi Vladimir, >> >> I have addressed your proposals with this new webrev: >> http://cr.openjdk.java.net/~mdoerr/8198756_CompilerCount/webrev.02/ >> >> Please take a look. >> >> Thanks for your support.
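The sizing rule discussed in this thread (about two queued compile tasks per thread, capped by a predetermined maximum and by available memory) can be contrasted with the one-thread-at-a-time alternative raised in review. The sketch below is hypothetical: `PoolState`, both function names, the per-thread memory budget and the queue threshold are illustrative assumptions, not the webrev's `possibly_add_compiler_threads()` code, whose memory term is not quoted in this thread.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical snapshot of one compiler's thread pool (C1 or C2).
struct PoolState {
  int queue_size;          // pending compile tasks in this compiler's queue
  int active_threads;      // compiler threads currently running
  int max_threads;         // predetermined maximum for this compiler
  long available_mem_mb;   // free memory as reported by some OS query
  long mem_per_thread_mb;  // assumed memory budget per compiler thread
};

// Rule 1: aim for about two queued tasks per thread, capped by the maximum
// and by the assumed memory budget; never below one thread.
int target_two_jobs_per_thread(const PoolState& s) {
  assert(s.max_threads >= 1);
  long by_memory = s.mem_per_thread_mb > 0
                       ? s.available_mem_mb / s.mem_per_thread_mb
                       : static_cast<long>(s.max_threads);
  long target = std::min({s.queue_size / 2L,
                          static_cast<long>(s.max_threads), by_memory});
  return static_cast<int>(std::max(target, 1L));
}

// Rule 2 (the alternative raised in review): add at most one thread per
// check, and only while the queue is longer than a threshold.
int target_one_at_a_time(const PoolState& s, int threshold) {
  assert(s.max_threads >= 1);
  if (s.queue_size > threshold && s.active_threads < s.max_threads) {
    return s.active_threads + 1;
  }
  return s.active_threads;
}
```

With 100 queued tasks and a maximum of 12, the first rule jumps straight to 12 threads, which is the startup burst the review was concerned about; the second adds one thread per check and ramps up gradually.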
Best regards, >> Martin >> >> >> -----Original Message----- >> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >> Sent: Monday, 26 March 2018 23:15 >> To: Doerr, Martin >> Cc: 'hotspot-compiler-dev at openjdk.java.net' >> Subject: Re: RFR(M): 8198756: Lazy allocation of compiler threads >> >> Hi Martin, >> >> We can't delete _log when deleting CompilerThread. Log is referenced >> globally and used on VM exit to generate final log file when >> -XX:+LogCompilation is specified. compilercontrol tests passed after I >> change it: >> >> +CompilerThread::~CompilerThread() { >> + // Delete objects which were allocated on heap. >> + delete _counters; >> + // _log is referenced in global CompileLog::_first chain and used on >> exit. >> +} >> >> I also see that we C1 compiler threads are removed too soon which cause >> their re-activation again. This may eat memory: >> >> $ java -XX:+TraceCompilerThreads -XX:+LogCompilation t >> Added initial compiler thread C2 CompilerThread0 >> Added initial compiler thread C1 CompilerThread0 >> Warning: TraceDependencies results may be inflated by VerifyDependencies >> Added compiler thread C1 CompilerThread1 (available memory: 37040MB) >> Added compiler thread C1 CompilerThread2 (available memory: 37033MB) >> Added compiler thread C1 CompilerThread3 (available memory: 37032MB) >> Removing compiler thread C1 CompilerThread3 >> Removing compiler thread C1 CompilerThread2 >> Removing compiler thread C1 CompilerThread1 >> Added compiler thread C1 CompilerThread1 (available memory: 37027MB) >> >> May be we should take into account for how long these threads are not used. >> >> Thanks, >> Vladimir >> >> On 3/23/18 5:58 PM, Vladimir Kozlov wrote: >>> On 3/23/18 10:37 AM, Doerr, Martin wrote: >>>> Hi Vladimir, >>>> >>>> thanks for the quick reply. Just a few answers. I'll take a closer >>>> look next week.
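The trace above shows C1 threads being removed and then re-added shortly afterwards. One way to act on the suggestion to consider how long a thread has been unused is an idle-time limit on the removal side. The sketch keeps the reverse-creation-order rule described for the prototype; `ThreadSlot`, `thread_to_remove` and the millisecond timestamps are hypothetical, not code from the webrev.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical per-thread bookkeeping for one compiler (C1 or C2).
// Slots are kept in creation order; slot 0 is the initial thread.
struct ThreadSlot {
  bool active;           // thread currently running
  int64_t last_work_ms;  // when it last finished a compile task
};

// Returns the index of the thread that may exit now, or -1 if none.
// Only the most recently started active thread is a candidate (reverse
// creation order), and only once it has been idle for idle_limit_ms, so a
// short lull in the queue does not cause remove/re-add churn.
int thread_to_remove(const std::vector<ThreadSlot>& slots,
                     int64_t now_ms, int64_t idle_limit_ms) {
  assert(idle_limit_ms >= 0);
  for (int i = static_cast<int>(slots.size()) - 1; i > 0; --i) {
    if (!slots[i].active) continue;
    return (now_ms - slots[i].last_work_ms >= idle_limit_ms) ? i : -1;
  }
  return -1;  // only the initial thread is left, and it is kept
}
```

A larger limit trades some idle memory for fewer create/destroy cycles.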
>>>> >>>>> You can't delete thread when it is NULL >>>> C++ supports calling delete NULL so I think it would be uncommon to >>>> check it. If there's a problem, I think the delete operator should get >>>> fixed. >>>> >>>> "If expression evaluates to a null pointer value, no destructors are >>>> called, and the deallocation function may or may not be called (it's >>>> implementation-defined), but the default deallocation functions are >>>> guaranteed to do nothing when handed a null pointer." [1] >>> >>> I am sure our code analyzing tool, which we use to check code >>> correctness, will compliant about it. >>> >>>> >>>>> We may need to free corresponding java thread object when we remove >>>>> compiler threads. >>>> I think it would be bad to remove the Java Thread objects because we'd >>>> need to recreate them which is rather expensive and violates the >>>> design principle that Compiler Threads are not allowed to call Java. >>>> Removing them wouldn't save much memory. Keeping them in global >>>> handles seems to be beneficial and makes this change easier. >>> >>> Okay. >>> >>>> >>>>> And I thought we would need to add only one threads each time when we >>>>> hit some queue size threshold. At the start queues filled up very >>>>> fast so you may end up creating all compiler threads. >>>> My current formula only creates as much compiler threads so that there >>>> exist 2 compile jobs per thread. I think this is better for startup, >>>> but we can reevaluate this. >>> >>> Would be nice to see graph how number of compiler threads change with >>> time depending on load for some applications (for example, jbb2005 and >>> specjvm2008 if you have them)? >>> >>>> >>>> Thanks for the improvement proposals. I'll implement them next week. >>>> Nevertheless, the current version can already be tested. >>> >>> I started our testing. >>> >>> I just remember that we may need to treat -Xcomp and CTW cases specially. 
>>> >>> I also ran jtreg testing locally on x64 linux for compiler/jvmci tests. >>> And also tier1 compiler tests with Graal as JIT >>> (-XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCI >>> -XX:+TieredCompilation -XX:+UseJVMCICompiler -Djvmci.Compiler=graal). >>> They passed. I think JVMCI code is fine. >>> >>> But I see crash in CompileLog::finish_log_on_error() function in >>> compiler/compilercontrol jtreg tests (they are not in tier1) with normal >>> jtreg runs: >>> >>> FAILED: compiler/compilercontrol/commandfile/LogTest.java >>> FAILED: compiler/compilercontrol/commands/LogTest.java >>> FAILED: compiler/compilercontrol/directives/LogTest.java >>> FAILED: compiler/compilercontrol/jcmd/AddLogTest.java >>> FAILED: compiler/compilercontrol/jcmd/StressAddMultiThreadedTest.java >>> FAILED: compiler/compilercontrol/logcompilation/LogTest.java >>> >>> I started performance testing too. >>> >>> Thanks, >>> Vladimir >>> >>>> >>>> Best regards, >>>> Martin >>>> >>>> >>>> [1] http://en.cppreference.com/w/cpp/language/delete >>>> >>>> >>>> -----Original Message----- >>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >>>> Sent: Friday, 23 March 2018 18:17 >>>> To: Doerr, Martin >>>> Cc: Igor Veresov (igor.veresov at oracle.com) ; >>>> White, Derek ; >>>> 'hotspot-compiler-dev at openjdk.java.net' >>>> >>>> Subject: Re: RFR(M): 8198756: Lazy allocation of compiler threads >>>> >>>> Very cool! >>>> >>>> Few thoughts. >>>> >>>> You can't delete thread when it is NULL (missing check or refactor code): >>>> >>>>     if (thread == NULL || thread->osthread() == NULL) { >>>> +    if (UseDynamicNumberOfCompilerThreads && >>>> comp->num_compiler_threads() > 0) { >>>> +      delete thread; >>>> >>>> Why not keep handle instead of returning naked oop from >>>> create_thread_oop()? You create Handle again >>>> >>>> Start fields names with _ to distinguish them from local variable: >>>> >>>> +  
static int c1_count, c2_count; >>>> >>>> In possibly_add_compiler_threads() you can use c2_count instead of >>>> calling compile_count() again and array size is fixed >>>> already: >>>> >>>> +    int new_c2_count = MIN3(_c2_compile_queue->size() / 2, >>>> + >>>> CompilationPolicy::policy()->compiler_count(CompLevel_full_optimization), >>>> >>>> And I thought we would need to add only one threads each time when we >>>> hit some queue size threshold. At the start queues >>>> filled up very fast so you may end up creating all compiler threads. >>>> Or we may need more complex formula. >>>> >>>> We may need to free corresponding java thread object when we remove >>>> compiler threads. >>>> >>>> Thanks, >>>> Vladimir >>>> >>>> On 3/23/18 8:04 AM, Doerr, Martin wrote: >>>>> Hi Vladimir, >>>>> >>>>> thanks for updating the RFE. I already had similar ideas so I've >>>>> implemented a prototype. >>>>> >>>>> I'll be glad if you can support this effort. >>>>> >>>>> My implementation starts only one thread per type (C1/C2) initially. >>>>> Compiler threads start additional threads depending >>>>> on the compile queue size, the available memory and the predetermined >>>>> maximum. The Java Thread objects get created >>>>> during startup so the Compiler Threads don't need to call Java. >>>>> >>>>> The heuristics (in possibly_add_compiler_threads()) are just an >>>>> initial proposal and we may want to add tuning >>>>> parameters or different numbers. >>>>> >>>>> Threads get stopped in reverse order as they were created when their >>>>> compile queue is empty for some time. >>>>> >>>>> The feature can be switched by >>>>> -XX:+/-UseDynamicNumberOfCompilerThreads. Thread creating and removal >>>>> can be traced by >>>>> -XX:+TraceCompilerThreads. >>>>> >>>>> Webrev is here: >>>>> >>>>> http://cr.openjdk.java.net/~mdoerr/8198756_CompilerCount/webrev.01/ >>>>> >>>>> The following issues need to get addressed, yet: >>>>> >>>>> -Test JVMCI support. (I'm not familiar with it.)
>>>>> >>>>> -Possible memory leaks. I've added some delete calls when a thread >>>>> dies, but they may be incomplete. >>>>> >>>>> -Logging. >>>>> >>>>> -Performance and memory consumption evaluation. >>>>> >>>>> It would be great to get support and advice for these issues. >>>>> >>>>> Best regards, >>>>> >>>>> Martin >>>>> From lutz.schmidt at sap.com Thu Mar 29 18:31:13 2018 From: lutz.schmidt at sap.com (Schmidt, Lutz) Date: Thu, 29 Mar 2018 18:31:13 +0000 Subject: RFR(S): 8200366: SIGSEGV in CodeHeapState::print_names() Message-ID: <9D50320A-9962-4326-96B5-B385F72806E9@sap.com> Dear All, may I please request reviews for this small fix. It resolves the subject issue, found during testing, by simply removing the failing function (change in diagnosticCommand.cpp). The changes in codeHeapState.cpp target the root cause of the problem. It was shown that they are helpful, but they could not yet be proven to be sufficient. Bug: https://bugs.openjdk.java.net/browse/JDK-8200366 Webrev: http://cr.openjdk.java.net/~lucy/webrevs/8200366.02/ Thank you very much! Lutz From vladimir.kozlov at oracle.com Thu Mar 29 18:35:54 2018 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 29 Mar 2018 11:35:54 -0700 Subject: RFR(S): 8200366: SIGSEGV in CodeHeapState::print_names() In-Reply-To: <9D50320A-9962-4326-96B5-B385F72806E9@sap.com> References: <9D50320A-9962-4326-96B5-B385F72806E9@sap.com> Message-ID: <6c5515db-0a06-026b-9cc4-ddd865097653@oracle.com> Good. I will test it and push. Thanks, Vladimir On 3/29/18 11:31 AM, Schmidt, Lutz wrote: > Dear All, > > may I please request reviews for this small fix. It resolves the subject > issue, found during testing, by simply removing the failing function > (change in diagnosticCommand.cpp). The changes in codeHeapState.cpp > target the root cause of the problem. It was shown that they are
It was shown that they are > helpful, but they could not yet be proven to be sufficient. > > Bug: https://bugs.openjdk.java.net/browse/JDK-8200366 > > Webrev: http://cr.openjdk.java.net/~lucy/webrevs/8200366.02/ > > Thank you very much! > > Lutz > From lutz.schmidt at sap.com Thu Mar 29 18:52:21 2018 From: lutz.schmidt at sap.com (Schmidt, Lutz) Date: Thu, 29 Mar 2018 18:52:21 +0000 Subject: RFR(S): 8200366: SIGSEGV in CodeHeapState::print_names() In-Reply-To: <6c5515db-0a06-026b-9cc4-ddd865097653@oracle.com> References: <9D50320A-9962-4326-96B5-B385F72806E9@sap.com> <6c5515db-0a06-026b-9cc4-ddd865097653@oracle.com> Message-ID: <74549D4C-32B2-47AA-B55B-FEB4E350AE02@sap.com> Hi Vladimir, I have created https://bugs.openjdk.java.net/browse/JDK-8200450 for follow-on investigation. Thanks, Lutz On 29.03.18, 20:35, "Vladimir Kozlov" wrote: Good. I will test it and push. Thanks, Vladimir On 3/29/18 11:31 AM, Schmidt, Lutz wrote: > Dear All, > > may I please request reviews for this small fix. It resolves the subject > issue, found during testing, by simply removing the failing function > (change in diagnosticCommand.cpp). The changes in codeHeapState.cpp > target the root cause of the problem. It was shown that they are > helpful, but they could not yet be proven to be sufficient. > > Bug: https://bugs.openjdk.java.net/browse/JDK-8200366 > > Webrev: http://cr.openjdk.java.net/~lucy/webrevs/8200366.02/ > > Thank you very much!
> > Lutz > From vladimir.kozlov at oracle.com Thu Mar 29 19:07:20 2018 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 29 Mar 2018 12:07:20 -0700 Subject: RFR(S): 8200366: SIGSEGV in CodeHeapState::print_names() In-Reply-To: <74549D4C-32B2-47AA-B55B-FEB4E350AE02@sap.com> References: <9D50320A-9962-4326-96B5-B385F72806E9@sap.com> <6c5515db-0a06-026b-9cc4-ddd865097653@oracle.com> <74549D4C-32B2-47AA-B55B-FEB4E350AE02@sap.com> Message-ID: Thank you, Lutz Vladimir On 3/29/18 11:52 AM, Schmidt, Lutz wrote: > Hi Vladimir, > > I have created https://bugs.openjdk.java.net/browse/JDK-8200450 for follow-on investigation. > > Thanks, > Lutz > > On 29.03.18, 20:35, "Vladimir Kozlov" wrote: > > Good. I will test it and push. > > Thanks, > Vladimir > > On 3/29/18 11:31 AM, Schmidt, Lutz wrote: > > Dear All, > > > > may I please request reviews for this small fix. It resolves the subject > > issue, found during testing, by simply removing the failing function > > (change in diagnosticCommand.cpp). The changes in codeHeapState.cpp > > target the root cause of the problem. It was shown that they are > > helpful, but they could not yet be proven to be sufficient. > > > > Bug: https://bugs.openjdk.java.net/browse/JDK-8200366 > > > > Webrev: http://cr.openjdk.java.net/~lucy/webrevs/8200366.02/ > > > > Thank you very much! > > > > Lutz > > > > From tobias.hartmann at oracle.com Thu Mar 29 19:07:27 2018 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Thu, 29 Mar 2018 21:07:27 +0200 Subject: RFR(S): 8200366: SIGSEGV in CodeHeapState::print_names() In-Reply-To: <9D50320A-9962-4326-96B5-B385F72806E9@sap.com> References: <9D50320A-9962-4326-96B5-B385F72806E9@sap.com> Message-ID: Hi Lutz, looks good to me as well! Best regards, Tobias On 29.03.2018 20:31, Schmidt, Lutz wrote: > Dear All, > > may I please request reviews for this small fix.
It resolves the subject issue, found during > testing, by simply removing the failing function (change in diagnosticCommand.cpp). The changes in > codeHeapState.cpp target the root cause of the problem. It was shown that they are helpful, but they > could not yet be proven to be sufficient. > > Bug: https://bugs.openjdk.java.net/browse/JDK-8200366 > > Webrev: http://cr.openjdk.java.net/~lucy/webrevs/8200366.02/ > > Thank you very much! > > Lutz > From poonam.bajaj at oracle.com Thu Mar 29 20:31:57 2018 From: poonam.bajaj at oracle.com (Poonam Parhar) Date: Thu, 29 Mar 2018 13:31:57 -0700 Subject: RFR: JDK-8199406: Performance drop with Java JDK 1.8.0_162-b32 Message-ID: <430b7c16-ab5f-601f-5da0-8e15d098c847@oracle.com> Hello, Please review the changes for the following bug that improve the nmethod unloading times with a couple of optimizations. JDK-8199406 : Performance drop with Java JDK 1.8.0_162-b32 Webrev: http://cr.openjdk.java.net/~poonam/8199406/webrev.00/ This changeset includes two changes: 1. In /compiledIC.cpp, CompiledIC::is_icholder_entry()/ , we need to determine if the code blob is an itable stub. With this change, before linearly searching through all the VtableStub entries, we first check whether the codeblob is a vtable or not. We now also parse through the list entries only once rather than doing it twice in /VtableStubs::is_entry_point()/ and /VtableStubs::stub_containing()/. 2. The second change helps avoid the virtual function calls in CompiledICHolder::is_loader_alive(). CompiledICHolder now stores information whether the metadata it holds is a method or a klass. Testing: - Customer testing confirming that their class-unloading times drop from 10s of seconds to an average of 0.75 secs. - mach5 jdk-tier1,jdk-tier2,jdk-tier3,hs-tier1,hs-tier2 Thanks, Poonam
From vladimir.kozlov at oracle.com Thu Mar 29 21:23:20 2018 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 29 Mar 2018 14:23:20 -0700 Subject: RFR: JDK-8199406: Performance drop with Java JDK 1.8.0_162-b32 In-Reply-To: <430b7c16-ab5f-601f-5da0-8e15d098c847@oracle.com> References: <430b7c16-ab5f-601f-5da0-8e15d098c847@oracle.com> Message-ID: Looks good. I have few comments. You can remove old is_entry_point() which have the same code as new entry_point(). After changes it is used only in one place in CompiledIC::is_megamorphic(). It can be replace with NULL check of entry_point() result there. Add comment in vtableStubs.hpp for new method. is_metadata_method field name should start with _ (this is Hotspot convention). is_loader_alive() could be simplified since you have field now: inline bool is_loader_alive(BoolObjectClosure* is_alive) { Klass* k = is_metadata_method ? ((Method*)_holder_metadata)->method_holder() : (Klass*)_holder_metadata; if (!k->is_loader_alive(is_alive)) { return false; } if (!_holder_klass->is_loader_alive(is_alive)) { return false; } return true; } Thanks, Vladimir On 3/29/18 1:31 PM, Poonam Parhar wrote: > Hello, > > Please review the changes for the following bug that improve the nmethod > unloading times with a couple of optimizations. > > JDK-8199406 : > Performance drop with Java JDK 1.8.0_162-b32 > Webrev: http://cr.openjdk.java.net/~poonam/8199406/webrev.00/ > > This changeset includes two changes: > 1. In /compiledIC.cpp, CompiledIC::is_icholder_entry()/ , we need to > determine if the code blob is an itable stub. With this change, before > linearly searching through all the VtableStub entries, we first check > whether the codeblob is a vtable or not. We now also parse through the > list entries only once rather than doing it twice in > /VtableStubs::is_entry_point()/ and /VtableStubs::stub_containing()/. > 2. The second change helps avoid the virtual function calls in > CompiledICHolder::is_loader_alive().
CompiledICHolder now stores > information whether the metadata it holds is a method or a klass. > > Testing: > - Customer testing confirming that their class-unloading times drop from > 10s of seconds to an average of 0.75 secs. > - mach5 jdk-tier1,jdk-tier2,jdk-tier3,hs-tier1,hs-tier2 > > Thanks, > Poonam > From vladimir.kozlov at oracle.com Thu Mar 29 22:00:52 2018 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 29 Mar 2018 15:00:52 -0700 Subject: [11] RFR(XS) 8200461: MeetIncompatibleInterfaceArrays fails with -Xcomp Message-ID: http://cr.openjdk.java.net/~kvn/8200461/webrev.00/ https://bugs.openjdk.java.net/browse/JDK-8200461 Test needs to be run by Server VM in mixed mode. Add @requires. Tested locally in different modes. -- Thanks, Vladimir From poonam.bajaj at oracle.com Thu Mar 29 22:14:54 2018 From: poonam.bajaj at oracle.com (Poonam Parhar) Date: Thu, 29 Mar 2018 15:14:54 -0700 Subject: RFR: JDK-8199406: Performance drop with Java JDK 1.8.0_162-b32 In-Reply-To: References: <430b7c16-ab5f-601f-5da0-8e15d098c847@oracle.com> Message-ID: Hello Vladimir, Thanks for reviewing the changes. Please find the updated webrev here: http://cr.openjdk.java.net/~poonam/8199406/webrev.01/ Regards, Poonam On 3/29/2018 2:23 PM, Vladimir Kozlov wrote: > Looks good. I have few comments. > > You can remove old is_entry_point() which have the same code as new > entry_point(). After changes it is used only in one place in > CompiledIC::is_megamorphic(). It can be replace with NULL check of > entry_point() result there. Add comment in vtableStubs.hpp for new > method. > > is_metadata_method field name should start with _ (this is Hotspot > convention). > > is_loader_alive() could be simplified since you have field now: > >   inline bool is_loader_alive(BoolObjectClosure* is_alive) { >     Klass* k = is_metadata_method ? > ((Method*)_holder_metadata)->method_holder() : (Klass*)_holder_metadata; >     if (!k->is_loader_alive(is_alive)) { >       return false; >     
} >     if (!_holder_klass->is_loader_alive(is_alive)) { >       return false; >     } >     return true; >   } > > Thanks, > Vladimir > > On 3/29/18 1:31 PM, Poonam Parhar wrote: >> Hello, >> >> Please review the changes for the following bug that improve the >> nmethod unloading times with a couple of optimizations. >> >> JDK-8199406 : >> Performance drop with Java JDK 1.8.0_162-b32 >> Webrev: http://cr.openjdk.java.net/~poonam/8199406/webrev.00/ >> >> This changeset includes two changes: >> 1. In /compiledIC.cpp, CompiledIC::is_icholder_entry()/ , we need to >> determine if the code blob is an itable stub. With this change, >> before linearly searching through all the VtableStub entries, we >> first check whether the codeblob is a vtable or not. We now also >> parse through the list entries only once rather than doing it twice >> in /VtableStubs::is_entry_point()/ and /VtableStubs::stub_containing()/. >> 2. The second change helps avoid the virtual function calls in >> CompiledICHolder::is_loader_alive(). CompiledICHolder now stores >> information whether the metadata it holds is a method or a klass. >> >> Testing: >> - Customer testing confirming that their class-unloading times drop >> from 10s of seconds to an average of 0.75 secs. >> - mach5 jdk-tier1,jdk-tier2,jdk-tier3,hs-tier1,hs-tier2 >> >> Thanks, >> Poonam >> From vladimir.kozlov at oracle.com Thu Mar 29 22:21:11 2018 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 29 Mar 2018 15:21:11 -0700 Subject: RFR: JDK-8199406: Performance drop with Java JDK 1.8.0_162-b32 In-Reply-To: References: <430b7c16-ab5f-601f-5da0-8e15d098c847@oracle.com> Message-ID: <5d2a0cf5-c277-4979-b9f1-b6b359f4ab61@oracle.com> Looks good. You need other reviewers to look too. Thanks, Vladimir On 3/29/18 3:14 PM, Poonam Parhar wrote: > Hello Vladimir, > > Thanks for reviewing the changes.
Please find the updated webrev here: > http://cr.openjdk.java.net/~poonam/8199406/webrev.01/ > > Regards, > Poonam > > On 3/29/2018 2:23 PM, Vladimir Kozlov wrote: >> Looks good. I have few comments. >> >> You can remove old is_entry_point() which have the same code as new >> entry_point(). After changes it is used only in one place in >> CompiledIC::is_megamorphic(). It can be replace with NULL check of >> entry_point() result there. Add comment in vtableStubs.hpp for new >> method. >> >> is_metadata_method field name should start with _ (this is Hotspot >> convention). >> >> is_loader_alive() could be simplified since you have field now: >> >>   inline bool is_loader_alive(BoolObjectClosure* is_alive) { >>     Klass* k = is_metadata_method ? >> ((Method*)_holder_metadata)->method_holder() : (Klass*)_holder_metadata; >>     if (!k->is_loader_alive(is_alive)) { >>       return false; >>     } >>     if (!_holder_klass->is_loader_alive(is_alive)) { >>       return false; >>     } >>     return true; >>   } >> >> Thanks, >> Vladimir >> >> On 3/29/18 1:31 PM, Poonam Parhar wrote: >>> Hello, >>> >>> Please review the changes for the following bug that improve the >>> nmethod unloading times with a couple of optimizations. >>> >>> JDK-8199406 : >>> Performance drop with Java JDK 1.8.0_162-b32 >>> Webrev: http://cr.openjdk.java.net/~poonam/8199406/webrev.00/ >>> >>> This changeset includes two changes: >>> 1. In /compiledIC.cpp, CompiledIC::is_icholder_entry()/ , we need to >>> determine if the code blob is an itable stub. With this change, >>> before linearly searching through all the VtableStub entries, we >>> first check whether the codeblob is a vtable or not. We now also >>> parse through the list entries only once rather than doing it twice >>> in /VtableStubs::is_entry_point()/ and /VtableStubs::stub_containing()/. >>> 2. The second change helps avoid the virtual function calls in >>> CompiledICHolder::is_loader_alive().
CompiledICHolder now stores >>> information whether the metadata it holds is a method or a klass. >>> >>> Testing: >>> - Customer testing confirming that their class-unloading times drop >>> from 10s of seconds to an average of 0.75 secs. >>> - mach5 jdk-tier1,jdk-tier2,jdk-tier3,hs-tier1,hs-tier2 >>> >>> Thanks, >>> Poonam >>> > From volker.simonis at gmail.com Fri Mar 30 07:02:55 2018 From: volker.simonis at gmail.com (Volker Simonis) Date: Fri, 30 Mar 2018 07:02:55 +0000 Subject: [11] RFR(XS) 8200461: MeetIncompatibleInterfaceArrays fails with -Xcomp In-Reply-To: References: Message-ID: Hi Vladimir, the change looks good. Thanks for fixing this! Regards, Volker Vladimir Kozlov wrote on Fri, 30 March 2018 at 00:01: > http://cr.openjdk.java.net/~kvn/8200461/webrev.00/ > https://bugs.openjdk.java.net/browse/JDK-8200461 > > Test needs to be run by Server VM in mixed mode. Add @requires. > > Tested locally in different modes. > > -- > Thanks, > Vladimir > From vladimir.kozlov at oracle.com Fri Mar 30 14:41:22 2018 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 30 Mar 2018 07:41:22 -0700 Subject: [11] RFR(XS) 8200461: MeetIncompatibleInterfaceArrays fails with -Xcomp In-Reply-To: References: Message-ID: <487d3ffc-89ac-27dd-287e-0a6e6e9135b5@oracle.com> Thank you, Volker Vladimir On 3/30/18 12:02 AM, Volker Simonis wrote: > Hi Vladimir, > > the change looks good. > Thanks for fixing this! > > Regards, > Volker > > Vladimir Kozlov > > wrote on Fri, 30 March 2018 at 00:01: > > http://cr.openjdk.java.net/~kvn/8200461/webrev.00/ > https://bugs.openjdk.java.net/browse/JDK-8200461 > > Test needs to be run by Server VM in mixed mode. Add @requires. > > Tested locally in different modes.
> > -- > Thanks, > Vladimir > From thomas.schatzl at oracle.com Fri Mar 30 15:34:37 2018 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Fri, 30 Mar 2018 17:34:37 +0200 Subject: RFR: JDK-8199406: Performance drop with Java JDK 1.8.0_162-b32 In-Reply-To: <430b7c16-ab5f-601f-5da0-8e15d098c847@oracle.com> References: <430b7c16-ab5f-601f-5da0-8e15d098c847@oracle.com> Message-ID: <1522424077.2390.12.camel@oracle.com> Hi Poonam, On Thu, 2018-03-29 at 13:31 -0700, Poonam Parhar wrote: > Hello, > > Please review the changes for the following bug that improve the > nmethod unloading times with a couple of optimizations. > > JDK-8199406: Performance drop with Java JDK 1.8.0_162-b32 > Webrev: http://cr.openjdk.java.net/~poonam/8199406/webrev.00/ > > This changeset includes two changes: > 1. In compiledIC.cpp, CompiledIC::is_icholder_entry() , we need to > determine if the code blob is an itable stub. With this change, > before linearly searching through all the VtableStub entries, we > first check whether the codeblob is a vtable or not. We now also > parse through the list entries only once rather than doing it twice > in VtableStubs::is_entry_point() and VtableStubs::stub_containing(). > 2. The second change helps avoid the virtual function calls in > CompiledICHolder::is_loader_alive(). CompiledICHolder now stores > information whether the metadata it holds is a method or a klass. > > Testing: > - Customer testing confirming that their class-unloading times drop > from 10s of seconds to an average of 0.75 secs. > - mach5 jdk-tier1,jdk-tier2,jdk-tier3,hs-tier1,hs-tier2 does the change recover all class-unloading times as they were with pre-1.8.0_162-b32? The figures you show seem to only compare 1.8.0_162-b32 to 1.8.0_162-b32 + patch. If not, do you have any idea what the problem could be?
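Poonam's first change (a cheap code-blob check before the search, then a single walk over the stub entries instead of `is_entry_point()` followed by `stub_containing()`), together with Vladimir's suggestion to replace `is_entry_point()` with a `NULL` check on the result of `entry_point()`, can be sketched as follows. The `Stub` struct, the single flat chain (the VM hashes stubs into buckets) and the `in_stub_blob` flag are simplified assumptions, not the real `VtableStubs` interface.

```cpp
#include <cassert>

// Hypothetical, simplified stand-in for a vtable/itable stub record. The
// VM keeps stubs in hashed buckets; this sketch uses one plain chain.
struct Stub {
  const void* entry;  // entry address of the stub's code
  bool is_vtable;     // true: vtable stub, false: itable stub
  Stub* next;         // next stub in the chain
};

// Single walk: return the stub whose entry is pc, or nullptr. A caller
// that only needs a yes/no answer checks the result against nullptr
// instead of calling a separate membership test and then searching again.
const Stub* entry_point(const Stub* head, const void* pc) {
  assert(pc != nullptr);
  for (const Stub* s = head; s != nullptr; s = s->next) {
    if (s->entry == pc) return s;
  }
  return nullptr;
}

// Cheap filter first: in_stub_blob stands in for asking the code blob
// whether it holds stubs at all, so most lookups never walk the chain.
bool is_itable_entry(const Stub* head, const void* pc, bool in_stub_blob) {
  if (!in_stub_blob) return false;
  const Stub* s = entry_point(head, pc);
  return s != nullptr && !s->is_vtable;
}
```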
Thanks, Thomas From vladimir.kozlov at oracle.com Fri Mar 30 16:25:02 2018 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 30 Mar 2018 09:25:02 -0700 Subject: RFR: JDK-8199406: Performance drop with Java JDK 1.8.0_162-b32 In-Reply-To: <03535698-5dd3-a5cf-3c13-3ca70ea7a035@oracle.com> References: <430b7c16-ab5f-601f-5da0-8e15d098c847@oracle.com> <03535698-5dd3-a5cf-3c13-3ca70ea7a035@oracle.com> Message-ID: <87ac4ad9-a0ba-ed68-0720-8699d293f5db@oracle.com> On 3/30/18 8:31 AM, coleen.phillimore at oracle.com wrote: > > http://cr.openjdk.java.net/~poonam/8199406/webrev.01/src/hotspot/share/code/vtableStubs.cpp.udiff.html > > >   uint hash = VtableStubs::hash(stub->is_vtable_stub(), stub->index()); > > > Isn't stub->is_vtable_stub() always true here, so you could avoid > another virtual call?  Can you assert this and pass true here? Not true. It could be also itable. They differentiate by value of _is_vtable_stub field. It is not virtual call, it is check of the field: bool is_vtable_stub() { return _is_vtable_stub; } > > The whole subclassing of BufferBlob with an virtual call > is_vtable_blob() seems inefficient if you have to call this all the time > during unloading and otherwise, and spending a word in one of the > alignment gaps in CodeBlob to point to the type of code blob would be > more efficient.  But that might be an RFE for the compiler team.  Also > since the only virtual functions are "is_" functions. I think we discussed that during rewriting this code for AOT. I don't remember why we decided to keep it this way. May be because it did not show up on our code performance profiling. On x86 it is really fast. May be we should review this code again if you think it should be optimized. > > You should add this new type at > http://cr.openjdk.java.net/~poonam/8199406/webrev.01/src/hotspot/share/code/codeBlob.hpp.html I thought it was there in one of the working versions. Yes, it needs to be added to that comment.
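Coleen's remark about virtual `is_*` queries and Poonam's second change come down to the same trade: store a small type tag so a hot path can branch on a field instead of making a virtual call. A minimal sketch with made-up stand-in types (not the real `CompiledICHolder`, `Method` or `Klass`, and without the GC's `is_alive` closure), following the shape of the `is_loader_alive()` simplification suggested in review:

```cpp
#include <cassert>

// Hypothetical, heavily simplified stand-ins for the VM's metadata types.
struct Klass {
  bool loader_alive;  // in the VM this is answered via the GC's is_alive closure
  bool is_loader_alive() const { return loader_alive; }
};

struct Method {
  Klass* holder;
  Klass* method_holder() const { return holder; }
};

// The holder records at construction which kind of metadata it stores, so
// the liveness check is a field test plus a cast rather than a virtual
// is_method()/is_klass() query on every call.
class ICHolderSketch {
 public:
  ICHolderSketch(const Method* m, Klass* k)
      : _holder_metadata(m), _is_metadata_method(true), _holder_klass(k) {}
  ICHolderSketch(const Klass* mk, Klass* k)
      : _holder_metadata(mk), _is_metadata_method(false), _holder_klass(k) {}

  bool is_loader_alive() const {
    assert(_holder_metadata != nullptr && _holder_klass != nullptr);
    const Klass* k = _is_metadata_method
        ? static_cast<const Method*>(_holder_metadata)->method_holder()
        : static_cast<const Klass*>(_holder_metadata);
    return k->is_loader_alive() && _holder_klass->is_loader_alive();
  }

 private:
  const void* _holder_metadata;  // either a Method* or a Klass*
  bool _is_metadata_method;      // tag: which of the two it is
  Klass* _holder_klass;
};
```

The tag costs a byte (plus padding) per holder; what it buys is one fewer virtual dispatch per holder visited during unloading.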
Thanks, Vladimir > > > line 61. > > Thanks, > Coleen > > On 3/29/18 6:14 PM, Poonam Parhar wrote: >> Hello Vladimir, >> >> Thanks for reviewing the changes. Please find the updated webrev here: >> http://cr.openjdk.java.net/~poonam/8199406/webrev.01/ >> >> Regards, >> Poonam >> >> On 3/29/2018 2:23 PM, Vladimir Kozlov wrote: >>> Looks good. I have few comments. >>> >>> You can remove old is_entry_point() which have the same code as new >>> entry_point(). After changes it is used only in one place in >>> CompiledIC::is_megamorphic(). It can be replace with NULL check of >>> entry_point() result there. Add comment in vtableStubs.hpp for new >>> method. >>> >>> is_metadata_method field name should start with _ (this is Hotspot >>> convention). >>> >>> is_loader_alive() could be simplified since you have field now: >>> >>> inline bool is_loader_alive(BoolObjectClosure* is_alive) { >>>   Klass* k = is_metadata_method ? >>> ((Method*)_holder_metadata)->method_holder() : (Klass*)_holder_metadata; >>>   if (!k->is_loader_alive(is_alive)) { >>>     return false; >>>   } >>>   if (!_holder_klass->is_loader_alive(is_alive)) { >>>     return false; >>>   } >>>   return true; >>> } >>> >>> Thanks, >>> Vladimir >>> >>> On 3/29/18 1:31 PM, Poonam Parhar wrote: >>>> Hello, >>>> >>>> Please review the changes for the following bug that improve the >>>> nmethod unloading times with a couple of optimizations. >>>> >>>> JDK-8199406 : >>>> Performance drop with Java JDK 1.8.0_162-b32 >>>> Webrev: http://cr.openjdk.java.net/~poonam/8199406/webrev.00/ >>>> >>>> This changeset includes two changes: >>>> 1. In /compiledIC.cpp, CompiledIC::is_icholder_entry()/ , we need to >>>> determine if the code blob is an itable stub. With this change, >>>> before linearly searching through all the VtableStub entries, we >>>> first check whether the codeblob is a vtable or not.
We now also >>>> parse through the list entries only once rather than doing it twice >>>> in /VtableStubs::is_entry_point()/ and >>>> /VtableStubs::stub_containing()/. >>>> 2. The second change helps avoid the virtual function calls in >>>> CompiledICHolder::is_loader_alive(). CompiledICHolder now stores >>>> information whether the metadata it holds is a method or a klass. >>>> >>>> Testing: >>>> - Customer testing confirming that their class-unloading times drop >>>> from 10s of seconds to an average of 0.75 secs. >>>> - mach5 jdk-tier1,jdk-tier2,jdk-tier3,hs-tier1,hs-tier2 >>>> >>>> Thanks, >>>> Poonam >>>> >> > From poonam.bajaj at oracle.com Fri Mar 30 16:38:45 2018 From: poonam.bajaj at oracle.com (Poonam Parhar) Date: Fri, 30 Mar 2018 09:38:45 -0700 Subject: RFR: JDK-8199406: Performance drop with Java JDK 1.8.0_162-b32 In-Reply-To: <1522424077.2390.12.camel@oracle.com> References: <430b7c16-ab5f-601f-5da0-8e15d098c847@oracle.com> <1522424077.2390.12.camel@oracle.com> Message-ID: Hello Thomas, On 3/30/2018 8:34 AM, Thomas Schatzl wrote: > Hi Poonam, > > On Thu, 2018-03-29 at 13:31 -0700, Poonam Parhar wrote: >> Hello, >> >> Please review the changes for the following bug that improve the >> nmethod unloading times with a couple of optimizations. >> >> JDK-8199406: Performance drop with Java JDK 1.8.0_162-b32 >> Webrev: http://cr.openjdk.java.net/~poonam/8199406/webrev.00/ >> >> This changeset includes two changes: >> 1. In compiledIC.cpp, CompiledIC::is_icholder_entry() , we need to >> determine if the code blob is an itable stub. With this change, >> before linearly searching through all the VtableStub entries, we >> first check whether the codeblob is a vtable or not. We now also >> parse through the list entries only once rather than doing it twice >> in VtableStubs::is_entry_point() and VtableStubs::stub_containing(). >> 2. The second change helps avoid the virtual function calls in >> CompiledICHolder::is_loader_alive(). 
CompiledICHolder now stores >> information whether the metadata it holds is a method or a klass. >> >> Testing: >> - Customer testing confirming that their class-unloading times drop >> from 10s of seconds to an average of 0.75 secs. >> - mach5 jdk-tier1,jdk-tier2,jdk-tier3,hs-tier1,hs-tier2 > does the change recover all class-unloading times as they were with > pre-1.8.0_162-b32? This is a regression that got introduced in 8u161 that increased the class-unloading times from an average of 0.65 secs to 10s of seconds for the customer. From the data received from the customer testing with this fix, on an average we lose around 0.05 secs when compared with pre-8u161 runs. And this lost time is attributed towards the itable stub scanning that we still need to do in is_icholder_entry(). > The figures you show seem to only compare 1.8.0_162-b32 to 1.8.0_162-b32 + patch. > > If not, do you have any idea what the problem could be? Large number of Interfaces, complex Interfaces/classes graph that causes large number of entries in the VtableStubs array can make this issue appear. Thanks, Poonam > Thanks, > Thomas > From poonam.bajaj at oracle.com Fri Mar 30 16:43:36 2018 From: poonam.bajaj at oracle.com (Poonam Parhar) Date: Fri, 30 Mar 2018 09:43:36 -0700 Subject: RFR: JDK-8199406: Performance drop with Java JDK 1.8.0_162-b32 In-Reply-To: <87ac4ad9-a0ba-ed68-0720-8699d293f5db@oracle.com> References: <430b7c16-ab5f-601f-5da0-8e15d098c847@oracle.com> <03535698-5dd3-a5cf-3c13-3ca70ea7a035@oracle.com> <87ac4ad9-a0ba-ed68-0720-8699d293f5db@oracle.com> Message-ID: Thanks for looking at the changes, Coleen! On 3/30/2018 9:25 AM, Vladimir Kozlov wrote: > On 3/30/18 8:31 AM, coleen.phillimore at oracle.com wrote: >> >> http://cr.openjdk.java.net/~poonam/8199406/webrev.01/src/hotspot/share/code/vtableStubs.cpp.udiff.html >> >> >>
uint hash = VtableStubs::hash(stub->is_vtable_stub(), >> stub->index()); >> >> >> Isn't stub->is_vtable_stub() always true here, so you could avoid >> another virtual call? Can you assert this and pass true here? > > Not true. It could be also itable. They differentiate by value of > _is_vtable_stub field. It is not virtual call, it is check of the field: > > bool is_vtable_stub() { return _is_vtable_stub; } > >> >> The whole subclassing of BufferBlob with an virtual call >> is_vtable_blob() seems inefficient if you have to call this all the >> time during unloading and otherwise, and spending a word in one of >> the alignment gaps in CodeBlob to point to the type of code blob >> would be more efficient. But that might be an RFE for the compiler >> team. Also since the only virtual functions are "is_<code type>" functions. > > I think we discussed that during rewriting this code for AOT. I don't > remember why we decided to keep it this way. May be because it did not > show up on our code performance profiling. On x86 it is really fast. > > May be we should review this code again if you think it should be > optimized. > >> >> You should add this new type at >> http://cr.openjdk.java.net/~poonam/8199406/webrev.01/src/hotspot/share/code/codeBlob.hpp.html > > > I thought it was there in one of the working versions. Yes, it needs > to be added to that comment. > Sure, I will add the new type to the comments. Thanks, Poonam > Thanks, > Vladimir > >> >> >> line 61. >> >> Thanks, >> Coleen >> >> On 3/29/18 6:14 PM, Poonam Parhar wrote: >>> Hello Vladimir, >>> >>> Thanks for reviewing the changes. Please find the updated webrev here: >>> http://cr.openjdk.java.net/~poonam/8199406/webrev.01/ >>> >>> Regards, >>> Poonam >>> >>> On 3/29/2018 2:23 PM, Vladimir Kozlov wrote: >>>> Looks good. I have few comments. >>>> >>>> You can remove old is_entry_point() which have the same code as new >>>> entry_point().
After changes it is used only in one place in >>>> CompiledIC::is_megamorphic(). It can be replace with NULL check of >>>> entry_point() result there. Add comment in vtableStubs.hpp for new >>>> method. >>>> >>>> is_metadata_method field name should start with _ (this is Hotspot >>>> convention). >>>> >>>> is_loader_alive() could be simplified since you have field now: >>>> >>>> inline bool is_loader_alive(BoolObjectClosure* is_alive) { >>>>   Klass* k = is_metadata_method ? >>>> ((Method*)_holder_metadata)->method_holder() : >>>> (Klass*)_holder_metadata; >>>>   if (!k->is_loader_alive(is_alive)) { >>>>     return false; >>>>   } >>>>   if (!_holder_klass->is_loader_alive(is_alive)) { >>>>     return false; >>>>   } >>>>   return true; >>>> } >>>> >>>> Thanks, >>>> Vladimir >>>> >>>> On 3/29/18 1:31 PM, Poonam Parhar wrote: >>>>> Hello, >>>>> >>>>> Please review the changes for the following bug that improve the >>>>> nmethod unloading times with a couple of optimizations. >>>>> >>>>> JDK-8199406 : >>>>> Performance drop with Java JDK 1.8.0_162-b32 >>>>> Webrev: http://cr.openjdk.java.net/~poonam/8199406/webrev.00/ >>>>> >>>>> This changeset includes two changes: >>>>> 1. In /compiledIC.cpp, CompiledIC::is_icholder_entry()/ , we need >>>>> to determine if the code blob is an itable stub. With this change, >>>>> before linearly searching through all the VtableStub entries, we >>>>> first check whether the codeblob is a vtable or not. We now also >>>>> parse through the list entries only once rather than doing it >>>>> twice in /VtableStubs::is_entry_point()/ and >>>>> /VtableStubs::stub_containing()/. >>>>> 2. The second change helps avoid the virtual function calls in >>>>> CompiledICHolder::is_loader_alive(). CompiledICHolder now stores >>>>> information whether the metadata it holds is a method or a klass.
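Vladimir's suggested is_loader_alive() relies on the new flag to pick the holder Klass directly, avoiding the virtual calls mentioned in the change description. The sketch below models that idea with hypothetical stand-in types (Metadata, Klass, Method and ICHolderSketch are simplified, not HotSpot's classes). Unlike the quoted code, it drops the BoolObjectClosure argument and uses static_cast instead of C-style casts.

```cpp
#include <cassert>

// Hypothetical stand-ins for HotSpot's Metadata, Klass and Method.
struct Metadata {};

struct Klass : Metadata {
  bool loader_alive;
  explicit Klass(bool alive) : loader_alive(alive) {}
  bool is_loader_alive() const { return loader_alive; }
};

struct Method : Metadata {
  Klass* holder;
  explicit Method(Klass* k) : holder(k) {}
  Klass* method_holder() const { return holder; }
};

// Sketch of an IC holder that records at construction whether its metadata
// is a Method or a Klass, so is_loader_alive() needs no type query.
class ICHolderSketch {
 public:
  ICHolderSketch(Method* m, Klass* k)
      : _holder_metadata(m), _holder_klass(k), _is_metadata_method(true) {}
  ICHolderSketch(Klass* m, Klass* k)
      : _holder_metadata(m), _holder_klass(k), _is_metadata_method(false) {}

  bool is_loader_alive() const {
    Klass* k = _is_metadata_method
                   ? static_cast<Method*>(_holder_metadata)->method_holder()
                   : static_cast<Klass*>(_holder_metadata);
    return k->is_loader_alive() && _holder_klass->is_loader_alive();
  }

 private:
  Metadata* _holder_metadata;
  Klass* _holder_klass;
  bool _is_metadata_method;  // leading underscore, following HotSpot's field naming
};
```

The two constructors make the flag impossible to get out of step with the stored pointer, which is what allows the unchecked static_cast in is_loader_alive().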
>>>>> >>>>> Testing: >>>>> - Customer testing confirming that their class-unloading times >>>>> drop from 10s of seconds to an average of 0.75 secs. >>>>> - mach5 jdk-tier1,jdk-tier2,jdk-tier3,hs-tier1,hs-tier2 >>>>> >>>>> Thanks, >>>>> Poonam >>>>> >>> >> From coleen.phillimore at oracle.com Fri Mar 30 17:31:52 2018 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Fri, 30 Mar 2018 13:31:52 -0400 Subject: RFR: JDK-8199406: Performance drop with Java JDK 1.8.0_162-b32 In-Reply-To: References: <430b7c16-ab5f-601f-5da0-8e15d098c847@oracle.com> <03535698-5dd3-a5cf-3c13-3ca70ea7a035@oracle.com> <87ac4ad9-a0ba-ed68-0720-8699d293f5db@oracle.com> Message-ID: <9ed26d49-4aea-5989-8a25-b93703dd4aed@oracle.com> On 3/30/18 12:43 PM, Poonam Parhar wrote: > Thanks for looking at the changes, Coleen! > > On 3/30/2018 9:25 AM, Vladimir Kozlov wrote: >> On 3/30/18 8:31 AM, coleen.phillimore at oracle.com wrote: >>> >>> http://cr.openjdk.java.net/~poonam/8199406/webrev.01/src/hotspot/share/code/vtableStubs.cpp.udiff.html >>> >>> >>> uint hash = VtableStubs::hash(stub->is_vtable_stub(), >>> stub->index()); >>> >>> >>> Isn't stub->is_vtable_stub() always true here, so you could avoid >>> another virtual call? Can you assert this and pass true here? >> >> Not true. It could be also itable. They differentiate by value of >> _is_vtable_stub field. It is not virtual call, it is check of the field: >> >> bool is_vtable_stub() { return _is_vtable_stub; } Thank you for the clarification. I had it mixed up with is_vtable_blob(). Glad it's not virtual. >> >>> >>> The whole subclassing of BufferBlob with an virtual call >>> is_vtable_blob() seems inefficient if you have to call this all the >>> time during unloading and otherwise, and spending a word in one of >>> the alignment gaps in CodeBlob to point to the type of code blob >>> would be more efficient. But that might be an RFE for the compiler >>> team.
Also since the only virtual functions are "is_<code type>" functions. >> >> I think we discussed that during rewriting this code for AOT. I don't >> remember why we decided to keep it this way. May be because it did >> not show up on our code performance profiling. On x86 it is really fast. >> >> May be we should review this code again if you think it should be >> optimized. >> It might be worth discussing, if this code continues to matter for unloading method performance. >>> >>> You should add this new type at >>> http://cr.openjdk.java.net/~poonam/8199406/webrev.01/src/hotspot/share/code/codeBlob.hpp.html >> >> >> >> I thought it was there in one of the working versions. Yes, it needs >> to be added to that comment. >> > Sure, I will add the new type to the comments. Thanks! Thank you for fixing this issue! Coleen > > Thanks, > Poonam > >> Thanks, >> Vladimir >> >>> >>> >>> line 61. >>> >>> Thanks, >>> Coleen >>> >>> On 3/29/18 6:14 PM, Poonam Parhar wrote: >>>> Hello Vladimir, >>>> >>>> Thanks for reviewing the changes. Please find the updated webrev here: >>>> http://cr.openjdk.java.net/~poonam/8199406/webrev.01/ >>>> >>>> Regards, >>>> Poonam >>>> >>>> On 3/29/2018 2:23 PM, Vladimir Kozlov wrote: >>>>> Looks good. I have few comments. >>>>> >>>>> You can remove old is_entry_point() which have the same code as >>>>> new entry_point(). After changes it is used only in one place in >>>>> CompiledIC::is_megamorphic(). It can be replace with NULL check of >>>>> entry_point() result there. Add comment in vtableStubs.hpp for new >>>>> method. >>>>> >>>>> is_metadata_method field name should start with _ (this is Hotspot >>>>> convention). >>>>> >>>>> is_loader_alive() could be simplified since you have field now: >>>>> >>>>> inline bool is_loader_alive(BoolObjectClosure* is_alive) { >>>>>   Klass* k = is_metadata_method ? >>>>> ((Method*)_holder_metadata)->method_holder() : >>>>> (Klass*)_holder_metadata; >>>>>   if (!k->is_loader_alive(is_alive)) { >>>>>
return false; >>>>>   } >>>>>   if (!_holder_klass->is_loader_alive(is_alive)) { >>>>>     return false; >>>>>   } >>>>>   return true; >>>>> } >>>>> >>>>> Thanks, >>>>> Vladimir >>>>> >>>>> On 3/29/18 1:31 PM, Poonam Parhar wrote: >>>>>> Hello, >>>>>> >>>>>> Please review the changes for the following bug that improve the >>>>>> nmethod unloading times with a couple of optimizations. >>>>>> >>>>>> JDK-8199406 : >>>>>> Performance drop with Java JDK 1.8.0_162-b32 >>>>>> Webrev: http://cr.openjdk.java.net/~poonam/8199406/webrev.00/ >>>>>> >>>>>> This changeset includes two changes: >>>>>> 1. In /compiledIC.cpp, CompiledIC::is_icholder_entry()/ , we need >>>>>> to determine if the code blob is an itable stub. With this >>>>>> change, before linearly searching through all the VtableStub >>>>>> entries, we first check whether the codeblob is a vtable or not. >>>>>> We now also parse through the list entries only once rather than >>>>>> doing it twice in /VtableStubs::is_entry_point()/ and >>>>>> /VtableStubs::stub_containing()/. >>>>>> 2. The second change helps avoid the virtual function calls in >>>>>> CompiledICHolder::is_loader_alive(). CompiledICHolder now stores >>>>>> information whether the metadata it holds is a method or a klass. >>>>>> >>>>>> Testing: >>>>>> - Customer testing confirming that their class-unloading times >>>>>> drop from 10s of seconds to an average of 0.75 secs.
>>>>>> - mach5 jdk-tier1,jdk-tier2,jdk-tier3,hs-tier1,hs-tier2 >>>>>> >>>>>> Thanks, >>>>>> Poonam >>>>>> >>>> >>> > From vladimir.kozlov at oracle.com Fri Mar 30 18:06:25 2018 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 30 Mar 2018 11:06:25 -0700 Subject: RFR: JDK-8199406: Performance drop with Java JDK 1.8.0_162-b32 In-Reply-To: <9ed26d49-4aea-5989-8a25-b93703dd4aed@oracle.com> References: <430b7c16-ab5f-601f-5da0-8e15d098c847@oracle.com> <03535698-5dd3-a5cf-3c13-3ca70ea7a035@oracle.com> <87ac4ad9-a0ba-ed68-0720-8699d293f5db@oracle.com> <9ed26d49-4aea-5989-8a25-b93703dd4aed@oracle.com> Message-ID: <43ef6c2c-ef9d-1089-f457-89be37b79be9@oracle.com> On 3/30/18 10:31 AM, coleen.phillimore at oracle.com wrote: > > > On 3/30/18 12:43 PM, Poonam Parhar wrote: >> Thanks for looking at the changes, Coleen! >> >> On 3/30/2018 9:25 AM, Vladimir Kozlov wrote: >>> On 3/30/18 8:31 AM, coleen.phillimore at oracle.com wrote: >>>> >>>> http://cr.openjdk.java.net/~poonam/8199406/webrev.01/src/hotspot/share/code/vtableStubs.cpp.udiff.html >>>> >>>> >>>> uint hash = VtableStubs::hash(stub->is_vtable_stub(), >>>> stub->index()); >>>> >>>> >>>> Isn't stub->is_vtable_stub() always true here, so you could avoid >>>> another virtual call? Can you assert this and pass true here? >>> >>> Not true. It could be also itable. They differentiate by value of >>> _is_vtable_stub field. It is not virtual call, it is check of the field: >>> >>> bool is_vtable_stub() { return _is_vtable_stub; } > > Thank you for the clarification. I had it mixed up with > is_vtable_blob(). Glad it's not virtual. >>> >>>> >>>> The whole subclassing of BufferBlob with an virtual call >>>> is_vtable_blob() seems inefficient if you have to call this all the >>>> time during unloading and otherwise, and spending a word in one of >>>> the alignment gaps in CodeBlob to point to the type of code blob >>>> would be more efficient. But that might be an RFE for the compiler >>>> team.
Also since the only virtual functions are "is_<code type>" functions. >>> >>> I think we discussed that during rewriting this code for AOT. I don't >>> remember why we decided to keep it this way. May be because it did >>> not show up on our code performance profiling. On x86 it is really fast. >>> >>> May be we should review this code again if you think it should be >>> optimized. >>> > > It might be worth discussing, if this code continues to matter for > unloading method performance. Devirtualizing does not always help unless it is in very critical code but it may complicate things for partially constructed subclass objects (virtual pointer is initialized first but we don't know when fields will be initialized). The first thing which was suggested and Poonam implemented was devirtualize CompiledICHolder(). But it did not help at all. Vladimir > > >>>> >>>> You should add this new type at >>>> http://cr.openjdk.java.net/~poonam/8199406/webrev.01/src/hotspot/share/code/codeBlob.hpp.html >>> >>> >>> >>> >>> I thought it was there in one of the working versions. Yes, it needs >>> to be added to that comment. >>> >> Sure, I will add the new type to the comments. > > Thanks! Thank you for fixing this issue! > Coleen >> >> Thanks, >> Poonam >> >>> Thanks, >>> Vladimir >>> >>>> >>>> >>>> line 61. >>>> >>>> Thanks, >>>> Coleen >>>> >>>> On 3/29/18 6:14 PM, Poonam Parhar wrote: >>>>> Hello Vladimir, >>>>> >>>>> Thanks for reviewing the changes. Please find the updated webrev here: >>>>> http://cr.openjdk.java.net/~poonam/8199406/webrev.01/ >>>>> >>>>> Regards, >>>>> Poonam >>>>> >>>>> On 3/29/2018 2:23 PM, Vladimir Kozlov wrote: >>>>>> Looks good. I have few comments. >>>>>> >>>>>> You can remove old is_entry_point() which have the same code as >>>>>> new entry_point(). After changes it is used only in one place in >>>>>> CompiledIC::is_megamorphic(). It can be replace with NULL check of >>>>>> entry_point() result there.
Add comment in vtableStubs.hpp for new >>>>>> method. >>>>>> >>>>>> is_metadata_method field name should start with _ (this is Hotspot >>>>>> convention). >>>>>> >>>>>> is_loader_alive() could be simplified since you have field now: >>>>>> >>>>>> inline bool is_loader_alive(BoolObjectClosure* is_alive) { >>>>>>   Klass* k = is_metadata_method ? >>>>>> ((Method*)_holder_metadata)->method_holder() : >>>>>> (Klass*)_holder_metadata; >>>>>>   if (!k->is_loader_alive(is_alive)) { >>>>>>     return false; >>>>>>   } >>>>>>   if (!_holder_klass->is_loader_alive(is_alive)) { >>>>>>     return false; >>>>>>   } >>>>>>   return true; >>>>>> } >>>>>> >>>>>> Thanks, >>>>>> Vladimir >>>>>> >>>>>> On 3/29/18 1:31 PM, Poonam Parhar wrote: >>>>>>> Hello, >>>>>>> >>>>>>> Please review the changes for the following bug that improve the >>>>>>> nmethod unloading times with a couple of optimizations. >>>>>>> >>>>>>> JDK-8199406 : >>>>>>> Performance drop with Java JDK 1.8.0_162-b32 >>>>>>> Webrev: http://cr.openjdk.java.net/~poonam/8199406/webrev.00/ >>>>>>> >>>>>>> This changeset includes two changes: >>>>>>> 1. In /compiledIC.cpp, CompiledIC::is_icholder_entry()/ , we need >>>>>>> to determine if the code blob is an itable stub. With this >>>>>>> change, before linearly searching through all the VtableStub >>>>>>> entries, we first check whether the codeblob is a vtable or not. >>>>>>> We now also parse through the list entries only once rather than >>>>>>> doing it twice in /VtableStubs::is_entry_point()/ and >>>>>>> /VtableStubs::stub_containing()/. >>>>>>> 2. The second change helps avoid the virtual function calls in >>>>>>> CompiledICHolder::is_loader_alive(). CompiledICHolder now stores >>>>>>> information whether the metadata it holds is a method or a klass. >>>>>>> >>>>>>> Testing: >>>>>>> - Customer testing confirming that their class-unloading times >>>>>>> drop from 10s of seconds to an average of 0.75 secs.
>>>>>>> - mach5 jdk-tier1,jdk-tier2,jdk-tier3,hs-tier1,hs-tier2 >>>>>>> >>>>>>> Thanks, >>>>>>> Poonam >>>>>>> >>>>> >>>> >> > From poonam.bajaj at oracle.com Fri Mar 30 19:12:20 2018 From: poonam.bajaj at oracle.com (Poonam Parhar) Date: Fri, 30 Mar 2018 12:12:20 -0700 Subject: RFR(8u-dev): JDK-8199406: Performance drop with Java JDK 1.8.0_162-b32 Message-ID: <606ecb8e-5fb3-fd52-4fae-223414c28c05@oracle.com> Hello Vladimir, Coleen, Thomas, Could I request you to take a look at the 8u changes for JDK-8199406 as well: Bug: JDK-8199406 : Performance drop with Java JDK 1.8.0_162-b32 Webrev: http://cr.openjdk.java.net/~poonam/8199406/webrev.8u/ jdk/hs changeset: http://hg.openjdk.java.net/jdk/hs/rev/d6893a76c554 The changes mostly apply cleanly from JDK 11 changeset except for the file path shuffling, and a few other minor changes such as calling round_to() instead of align_up(). Thanks, Poonam From vladimir.kozlov at oracle.com Fri Mar 30 19:37:45 2018 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 30 Mar 2018 12:37:45 -0700 Subject: RFR(8u-dev): JDK-8199406: Performance drop with Java JDK 1.8.0_162-b32 In-Reply-To: <606ecb8e-5fb3-fd52-4fae-223414c28c05@oracle.com> References: <606ecb8e-5fb3-fd52-4fae-223414c28c05@oracle.com> Message-ID: <44962991-902a-4e8c-9707-cb7bdf2b7971@oracle.com> Looks good. Thanks, Vladimir On 3/30/18 12:12 PM, Poonam Parhar wrote: > Hello Vladimir, Coleen, Thomas, > > Could I request you to take a look at the 8u changes for JDK-8199406 as > well: > > Bug: JDK-8199406 : > Performance drop with Java JDK 1.8.0_162-b32 > Webrev: http://cr.openjdk.java.net/~poonam/8199406/webrev.8u/ > jdk/hs changeset: http://hg.openjdk.java.net/jdk/hs/rev/d6893a76c554 > > The changes mostly apply cleanly from JDK 11 changeset except for the > file path shuffling, and a few other minor changes such as calling > round_to() instead of align_up().
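The 8u backport calls round_to() where the JDK 11 code calls align_up(). For a power-of-two alignment, both round a value up to the next multiple of that alignment. The sketch below shows that shared arithmetic using a hypothetical helper, align_up_pow2. It does not reproduce either HotSpot function, whose exact signatures differ between releases.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper showing the arithmetic that round_to() (8u) and
// align_up() (JDK 11) share for power-of-two alignments. It is not
// either HotSpot function.
inline bool is_pow2(uint64_t x) { return x != 0 && (x & (x - 1)) == 0; }

inline uint64_t align_up_pow2(uint64_t value, uint64_t alignment) {
  assert(is_pow2(alignment) && "alignment must be a power of two");
  const uint64_t mask = alignment - 1;
  return (value + mask) & ~mask;  // step past the boundary, then clear the low bits
}
```

Values that are already aligned are returned unchanged. All others move up to the next boundary, so replacing one call with the other is a mechanical change.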
> > Thanks, > Poonam > > > From Derek.White at cavium.com Fri Mar 30 23:24:09 2018 From: Derek.White at cavium.com (White, Derek) Date: Fri, 30 Mar 2018 23:24:09 +0000 Subject: JDK-8171119: Low-Overhead Heap Profiling In-Reply-To: References: <5A819F10.8040201@oracle.com> <5A8414AC.3020209@oracle.com> Message-ID: Hi Jc, I've been having trouble getting your patch to apply correctly. I may have based it on the wrong version. In any case, I think there's a missing update to macroAssembler_aarch64.cpp, in MacroAssembler::tlab_allocate(), where "JavaThread::tlab_end_offset()" should become "JavaThread::tlab_current_end_offset()". This should correspond to the other port's changes in templateTable_.cpp files. Thanks! - Derek From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of JC Beyler Sent: Wednesday, March 28, 2018 11:43 AM To: Erik Österlund Cc: serviceability-dev at openjdk.java.net; hotspot-compiler-dev Subject: Re: JDK-8171119: Low-Overhead Heap Profiling Hi all, I've been working on deflaking the tests mostly and the wording in the JVMTI spec. Here are the two incremental webrevs: http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.5_6/ http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.06_07/ Here is the total webrev: http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.07/ Here are the notes of this change: - Currently the tests pass 100 times in a row, I am working on checking if they pass 1000 times in a row. - The default sampling rate is set to 512k, this is what we use internally and having a default means that to enable the sampling with the default, the user only has to do an enable event/disable event via JVMTI (instead of enable + set sample rate).
- I deprecated the code that was handling the fast path tlab refill if it happened since this is now deprecated - Though I saw that Graal is still using it so I have to see what needs to be done there exactly Finally, using the Dacapo benchmark suite, I noted a 1% overhead for when the event system is turned on and the callback to the native agent is just empty. I got a 3% overhead with a 512k sampling rate with the code I put in the native side of my tests. Thanks and comments are appreciated, Jc On Mon, Mar 19, 2018 at 2:06 PM JC Beyler > wrote: Hi all, The incremental webrev update is here: http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event4_5/ The full webrev is here: http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event5/ Major change here is: - I've removed the heapMonitoring.cpp code in favor of just having the sampling events as per Serguei's request; I still have to do some overhead measurements but the tests prove the concept can work - Most of the tlab code is unchanged, the only major part is that now things get sent off to event collectors when used and enabled. - Added the interpreter collectors to handle interpreter execution - Updated the name from SetTlabHeapSampling to SetHeapSampling to be more generic - Added a mutex for the thread sampling so that we can initialize an internal static array safely - Ported the tests from the old system to this new one I've also updated the JEP and CSR to reflect these changes: https://bugs.openjdk.java.net/browse/JDK-8194905 https://bugs.openjdk.java.net/browse/JDK-8171119 In order to make this have some forward progress, I've removed the heap sampling code entirely and now rely entirely on the event sampling system. The tests reflect this by using a simplified implementation of what an agent could do: http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event5/raw_files/new/test/hotspot/jtreg/serviceability/jvmti/HeapMonitor/libHeapMonitor.c (Search for anything mentioning event_storage). 
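To illustrate what a 512k sampling rate means on the allocation path, here is a hypothetical per-thread counter. ByteSampler is not a class from the webrev. It fires once per 512 KiB of allocation, reading "512k" as 512 * 1024 bytes. It also assumes a fixed interval, because the thread does not say whether the real implementation randomizes the distance between samples.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical per-thread sampler (not from the webrev): reports each
// allocation that crosses a multiple of `interval` allocated bytes.
// A fixed interval is an assumption made for clarity.
class ByteSampler {
 public:
  explicit ByteSampler(size_t interval = 512 * 1024)  // "512k" read as 512 KiB
      : _interval(interval), _bytes_until_sample(interval) {
    assert(interval > 0);
  }

  // Records one allocation; returns true if it should be sampled.
  bool on_allocation(size_t size) {
    if (size < _bytes_until_sample) {
      _bytes_until_sample -= size;
      return false;
    }
    // This allocation reaches or crosses the sampling point: sample it and
    // re-arm, carrying the overshoot so later points stay evenly spaced
    // (an allocation that spans several points is sampled only once).
    size_t overshoot = size - _bytes_until_sample;
    _bytes_until_sample = _interval - (overshoot % _interval);
    return true;
  }

 private:
  size_t _interval;
  size_t _bytes_until_sample;
};
```

Keeping the countdown per thread means that the common case is a single subtract and compare. That is consistent with the low overhead reported above, though the sketch does not measure it.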
I have not taken the time to port the whole code we had originally in heapMonitoring to this. I hesitate only because that code was in C++, I'd have to port it to C and this is for tests so perhaps what I have now is good enough? As far as testing goes, I've ported all the relevant tests and then added a few: - Turning the system on/off - Testing using various GCs - Testing using the interpreter - Testing the sampling rate - Testing with objects and arrays - Testing with various threads Finally, as overhead goes, I have the numbers of the system off vs a clean build and I have 0% overhead, which is what we'd want. This was using the Dacapo benchmarks. I am now preparing to run a version with the events on using dacapo and will report back here. Any comments are welcome :) Jc On Thu, Mar 8, 2018 at 4:00 PM JC Beyler > wrote: Hi all, I apologize for the delay but I wanted to add an event system and that took a bit longer than expected and I also reworked the code to take into account the deprecation of FastTLABRefill. This update has four parts: A) I moved the implementation from Thread to ThreadHeapSampler inside of Thread. Would you prefer it as a pointer inside of Thread or like this works for you? Second question would be would you rather have an association outside of Thread altogether that tries to remember when threads are live and then we would have something like: ThreadHeapSampler::get_sampling_size(this_thread); I worry about the overhead of this but perhaps it is not too too bad? B) I also have been working on the Allocation event system that sends out a notification at each sampled event. This will be practical when wanting to do something at the allocation point. I'm also looking at if the whole heapMonitoring code could not reside in the agent code and not in the JDK. 
I'm not convinced but I'm talking to Serguei about it to see/assess :) - Also added two tests for the new event subsystem C) Removed the slow_path fields inside the TLAB code since now FastTLABRefill is deprecated D) Updated the JVMTI documentation and specification for the methods. So the incremental webrev is here: http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.09_10/ and the full webrev is here: http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.10 I believe I have updated the various JIRA issues that track this :) Thanks for your input, Jc On Wed, Feb 14, 2018 at 10:34 PM, JC Beyler > wrote: Hi Erik, I inlined my answers, which the last one seems to answer Robbin's concerns about the same thing (adding things to Thread). On Wed, Feb 14, 2018 at 2:51 AM, Erik Österlund > wrote: Hi JC, Comments are inlined below. On 2018-02-13 06:18, JC Beyler wrote: Hi Erik, Thanks for your answers, I've now inlined my own answers/comments. I've done a new webrev here: http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.08/ The incremental is here: http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.07_08/ Note to all: - I've been integrating changes from Erin/Serguei/David comments so this webrev incremental is a bit an answer to all comments in one. I apologize for that :) On Mon, Feb 12, 2018 at 6:05 AM, Erik Österlund > wrote: Hi JC, Sorry for the delayed reply. Inlined answers: On 2018-02-06 00:04, JC Beyler wrote: Hi Erik, (Renaming this to be folded into the newly renamed thread :)) First off, thanks a lot for reviewing the webrev! I appreciate it! I updated the webrev to: http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.05a/ And the incremental one is here: http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.04_05a/ It contains: - The change for since from 9 to 11 for the jvmti.xml - The use of the OrderAccess for initialized - Clearing the oop I also have inlined my answers to your comments. The biggest question will come from the multiple *_end variables.
A bit of the logic there is due to handling the slow path refill vs fast path refill and checking that the rug was not pulled underneath the slowpath. I believe that a previous comment was that TlabFastRefill was going to be deprecated. If this is true, we could revert this code a bit and just do a : if TlabFastRefill is enabled, disable this. And then deprecate that when TlabFastRefill is deprecated. This might simplify this webrev and I can work on a follow-up that either: removes TlabFastRefill if Robbin does not have the time to do it or add the support to the assembly side to handle this correctly. What do you think? I support removing TlabFastRefill, but I think it is good to not depend on that happening first. I'm slowly pushing on the FastTLABRefill (https://bugs.openjdk.java.net/browse/JDK-8194084), I agree on keeping both separate for now though so that we can think of both differently Now, below, inlined are my answers: On Fri, Feb 2, 2018 at 8:44 AM, Erik Österlund > wrote: Hi JC, Hope I am reviewing the right version of your work. Here goes... src/hotspot/share/gc/shared/collectedHeap.inline.hpp: 159 AllocTracer::send_allocation_outside_tlab(klass, result, size * HeapWordSize, THREAD); 160 161 THREAD->tlab().handle_sample(THREAD, result, size); 162 return result; 163 } Should not call tlab()->X without checking if (UseTLAB) IMO. Done! More about this later. src/hotspot/share/gc/shared/threadLocalAllocBuffer.cpp: So first of all, there seems to be quite a few ends. There is an "end", a "hard end", a "slow path end", and an "actual end". Moreover, it seems like the "hard end" is actually further away than the "actual end". So the "hard end" seems like more of a "really definitely actual end" or something. I don't know about you, but I think it looks kind of messy. In particular, I don't feel like the name "actual end" reflects what it represents, especially when there is another end that is behind the "actual end".
    HeapWord* ThreadLocalAllocBuffer::hard_end() {
      // Did a fast TLAB refill occur?
      if (_slow_path_end != _end) {
        // Fix up the actual end to be now the end of this TLAB.
        _slow_path_end = _end;
        _actual_end = _end;
      }

      return _actual_end + alignment_reserve();
    }

I really do not like making getters unexpectedly have these kinds of side effects. It is not expected that when you ask for the "hard end", you implicitly update the "slow path end" and "actual end" to new values.

As I said, a lot of this is due to FastTLABRefill. If I make this not support FastTLABRefill, this goes away. The reason the system needs to update itself in the getter is that only at that point do you know whether things have shifted underneath the TLAB slow path. I am not sure of really better names (naming is hard!); perhaps we could use these names:
- current_tlab_end // Either the allocated TLAB end or a sampling point
- last_allocation_address // The end of the TLAB allocation
- last_slowpath_allocated_end // In case a fast refill occurred, the end might have changed; this is to distinguish slow-path from fast-path refills

The hard_end method can be renamed to something like:
- tlab_end_pointer() // The end of the TLAB, including a few bytes of alignment reserve

Those names sound better to me. Could you please provide a mapping from the old names to the new names so I understand which one is which? This is my current guess of what you are proposing:
- end -> current_tlab_end
- actual_end -> last_allocation_address
- slow_path_end -> last_slowpath_allocated_end
- hard_end -> tlab_end_pointer

Yes, that is correct; that is what I was proposing.
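As a sketch of an alternative to the side-effecting hard_end() getter quoted above (hypothetical code, not from the webrev), the ends can be synchronized by an explicit call, and the getter can merely assert that they are in sync. The invariant it checks mirrors the condition hard_end() tests, namely that the last end installed by the slow path still equals the current end. Member names loosely follow the renaming discussed in this thread, kAlignmentReserve is an arbitrary illustrative constant, and byte offsets stand in for HeapWord* addresses.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical model of the TLAB ends with a side-effect-free getter.
struct TlabEnds {
  static constexpr size_t kAlignmentReserve = 8;  // illustrative value only

  size_t current_end = 0;         // fast-path limit: a sampling point or the real end
  size_t allocation_end = 0;      // real end usable for allocation
  size_t last_slow_path_end = 0;  // current_end as last installed by the slow path

  // The slow path installs a new fast-path limit and records that it did so.
  void set_from_slow_path(size_t end, size_t alloc_end) {
    current_end = end;
    allocation_end = alloc_end;
    last_slow_path_end = end;
  }

  // Explicit synchronization for the few callers that need reserved_end()
  // (e.g. a make_parsable()-like operation). If something other than the slow
  // path (such as a fast refill) replaced current_end, adopt it as the new
  // allocation end, which is what hard_end() did implicitly.
  void sync_ends() {
    if (last_slow_path_end != current_end) {
      last_slow_path_end = current_end;
      allocation_end = current_end;
    }
  }

  // Pure getter: it only checks that the caller synchronized first.
  size_t reserved_end() const {
    assert(last_slow_path_end == current_end && "call sync_ends() first");
    return allocation_end + kAlignmentReserve;
  }
};
```

The design point is that the mutation is visible at the call site (sync_ends()), while debug builds still catch a caller that forgot to synchronize.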
I would prefer this naming:
- end -> slow_path_end // the end for taking a slow path; either due to sampling or refilling
- actual_end -> allocation_end // the end for allocations
- slow_path_end -> last_slow_path_end // last address for slow_path_end (as opposed to allocation_end)
- hard_end -> reserved_end // the end of the reserved space of the TLAB

About setting things in the getter... that still seems like a very unpleasant thing to me. It would be better to inspect the call hierarchy and explicitly update the ends where they need updating, and assert in the getter that they are in sync, rather than implicitly setting various ends as a surprising side effect in a getter. It looks like the call hierarchy is very small. With my new naming convention, reserved_end() would presumably return _allocation_end + alignment_reserve(), and have an assert checking that _allocation_end == _last_slow_path_allocation_end, complaining that this invariant must hold, and that a caller to this function, such as make_parsable(), must first explicitly synchronize the ends as required, to honor that invariant.

I've renamed the variables as you preferred, except for the _end one. I did:
- current_end
- last_allocation_address
- tlab_end_ptr

The reason is that the architecture-dependent code uses the thread.hpp API, which already has tlab in the name, so it becomes tlab_current_end (which is better than tlab_current_tlab_end, in my opinion).

I also moved the update into a separate method with a TODO that says to remove it when FastTLABRefill is deprecated.

This looks a lot better now. Thanks. Note that the following comment now needs updating accordingly in threadLocalAllocBuffer.hpp:

    // Heap sampling is performed via the end/actual_end fields.
    // actual_end contains the real end of the tlab allocation,
    // whereas end can be set to an arbitrary spot in the tlab to
    // trip the return and sample the allocation.
    // slow_path_end is used to track if a fast tlab refill occurred
    // between slowpath calls.

There might be other comments too; I have not looked in detail.

This was the only spot that still had an actual_end; I fixed it now. I'll do a sweep to double-check other comments.

Not sure it's better, but before updating the webrev I wanted to try to get input/consensus :) (Note that hard_end was always further off than end.)

src/hotspot/share/prims/jvmti.xml:

    Can sample the heap.
    If this capability is enabled then the heap sampling methods can be called.

Looks like this capability should not be "since 9" if it gets integrated now.

Updated now to 11, crossing my fingers :)

src/hotspot/share/runtime/heapMonitoring.cpp:

      if (is_alive->do_object_b(value)) {
        // Update the oop to point to the new object if it is still alive.
        f->do_oop(&(trace.obj));

        // Copy the old trace, if it is still live.
        _allocated_traces->at_put(curr_pos++, trace);

        // Store the live trace in a cache, to be served up on /heapz.
        _traces_on_last_full_gc->append(trace);

        count++;
      } else {
        // If the old trace is no longer live, add it to the list of
        // recently collected garbage.
        store_garbage_trace(trace);
      }

In the case where the oop was not live, I would like it to be explicitly cleared.

Done, I think, the way you wanted. Let me know, because I'm not familiar with the RootAccess API. I'm unclear whether I'm doing this right, so reviews of these parts are highly appreciated. Robbin had talked about perhaps later pushing this all into an OopStorage; do you think I should do this now, or can that wait for a later webrev?

I think using handles can and should be done later. You can use the Access API now. I noticed that you are missing an #include "oops/access.inline.hpp" in your heapMonitoring.cpp file.

The missing header is there for me, so I don't know; I made sure it is present in the latest webrev.
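The weak-root pass quoted above, with the explicit clearing requested for dead entries, can be modeled outside HotSpot as follows. This is a sketch under stated assumptions: plain void* pointers replace oops, and two std::function callbacks stand in for BoolObjectClosure::do_object_b and OopClosure::do_oop. Trace and process_weak_traces are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical trace entry: obj stands in for the oop, id for the trace data.
struct Trace {
  void* obj;
  int id;
};

// Live entries are updated through `forward` (the object may have moved) and
// compacted to the front of `allocated`; dead entries have their reference
// cleared explicitly before being moved to `garbage`.
size_t process_weak_traces(std::vector<Trace>& allocated,
                           std::vector<Trace>& garbage,
                           const std::function<bool(void*)>& is_alive,
                           const std::function<void*(void*)>& forward) {
  size_t live = 0;
  for (Trace& trace : allocated) {
    if (is_alive(trace.obj)) {
      trace.obj = forward(trace.obj);  // point at the object's new location
      allocated[live++] = trace;       // compact live traces in place
    } else {
      trace.obj = nullptr;             // never keep a dead reference around
      garbage.push_back(trace);
    }
  }
  allocated.resize(live);
  return live;
}
```

Clearing before the copy means the garbage list can never hand out a stale pointer, regardless of how it is later inspected.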
Sorry about that.

+ Did I clear it the way you wanted me to, or were you thinking of something else?

That is precisely how I wanted it to be cleared. Thanks.

+ Final question here: it seems that if I did not want to do the f->do_oop directly on trace.obj, I'd need to do something like:

    f->do_oop(&value);
    ...
    trace->store_oop(value);

to update the oop internally. Is that right? Is that one of the advantages of moving to OopStorage sooner rather than later?

I think you really want to do the do_oop on the root directly. Is there a particular reason why you would not want to do that? Otherwise, yes: the benefit of the handle approach is that you do not need to call do_oop explicitly in your code.

There is no reason, except that now we have a load_oop and a get_oop_addr, and I was not sure what you would think of that.

That's fine.

Also I see a lot of concurrent-looking use of the following field:

    volatile bool _initialized;

Please note that the "volatile" qualifier does not help with reordering here. Reordering between volatile and non-volatile fields is completely free for both compiler and hardware, except for Windows with MSVC, where volatile semantics is defined to use acquire/release semantics, and the hardware is TSO. But for the general case, I would expect this field to be stored with OrderAccess::release_store and loaded with OrderAccess::load_acquire. Otherwise it is not thread safe.

Because everything is behind a mutex, I wasn't really worried about this. I have a test that has multiple threads trying to hit this corner case, and it passes. However, to be paranoid, I updated it to use the OrderAccess API now, thanks! Let me know what you think there too!

If it is indeed always supposed to be read and written under a mutex, then I would strongly prefer to have it accessed as a normal non-volatile member, and have an assertion that the given lock is held or that we are in a safepoint, as we do in many other places.
Something like this:

    assert(HeapMonitorStorage_lock->owned_by_self() || (SafepointSynchronize::is_at_safepoint() && Thread::current()->is_VM_thread()), "this should not be accessed concurrently");

It would be confusing to people reading the code if there are uses of OrderAccess that are actually always protected under a mutex.

Thank you for the exact example to put in the code! I put it around each access/assignment of the _initialized field and found one case where, yes, you can touch it without holding the lock. It actually is "ok", because you don't act on the storage until later, and only when you really want to modify the storage (see the object_alloc_do_sample method, which calls the add_trace method). But because of this, I'm going to keep the OrderAccess here. I'll get some performance numbers later and, if there are issues, I might add an "unsafe" read and a "safe" one to make it explicit to the reader. But I don't think it will come to that.

Okay. This double return in heapMonitoring.cpp looks wrong:

    bool initialized() {
      return OrderAccess::load_acquire(&_initialized) != 0;
      return _initialized;
    }

Since you said object_alloc_do_sample() is the only place where you do not hold the mutex while reading initialized(), I had a closer look at that. It looks like, in its current shape, the lack of a mutex may lead to a memory leak. In particular, it first checks if (initialized()). Let's assume this is now true. It then allocates a bunch of stuff and checks whether the number of frames is over 0. If it is, it calls StackTraceStorage::storage()->add_trace(), seemingly hoping that after grabbing the lock in there, initialized() will still return true. But it could now return false and skip doing anything, in which case the allocated stuff will never be freed.

I fixed this now by making add_trace return a boolean and checking for that. It will be in the next webrev.
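A minimal sketch of where this exchange ends up, assuming a plain std::mutex-based model rather than HotSpot's Mutex and OrderAccess: the initialized flag is an ordinary field that is only read under the lock (with an assertion in the spirit of the owned_by_self() check suggested above), and add_trace() reports whether it took ownership, so the sampling path can free a trace the storage refused. OwnedMutex, sample_allocation, and the g_live_traces counter are hypothetical, and StackTraceStorage here is a simplified stand-in named after the class in the webrev; the counter only exists to make a leak visible.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

int g_live_traces = 0;  // counts live StackTrace objects so a leak is observable

struct StackTrace {
  std::vector<int> frames;  // stand-in for the collected stack frames
  StackTrace() { ++g_live_traces; }
  ~StackTrace() { --g_live_traces; }
};

// std::mutex cannot report its owner, so this wrapper tracks it
// (a stand-in for an owned_by_self() style check).
class OwnedMutex {
 public:
  void lock() {
    m_.lock();
    owner_.store(std::this_thread::get_id(), std::memory_order_relaxed);
  }
  void unlock() {
    owner_.store(std::thread::id(), std::memory_order_relaxed);
    m_.unlock();
  }
  bool owned_by_self() const {
    return owner_.load(std::memory_order_relaxed) == std::this_thread::get_id();
  }

 private:
  std::mutex m_;
  std::atomic<std::thread::id> owner_{std::thread::id()};
};

class StackTraceStorage {
 public:
  ~StackTraceStorage() {
    for (StackTrace* t : traces_) delete t;
  }

  // Enabling/disabling happens under the lock; disabling frees stored traces.
  void set_initialized(bool value) {
    std::lock_guard<OwnedMutex> guard(lock_);
    initialized_ = value;
    if (!value) {
      for (StackTrace* t : traces_) delete t;
      traces_.clear();
    }
  }

  // Takes ownership of trace only if the storage is initialized; the check
  // and the insertion happen under the same lock, so no double-checking.
  bool add_trace(StackTrace* trace) {
    std::lock_guard<OwnedMutex> guard(lock_);
    if (!initialized()) return false;
    traces_.push_back(trace);
    return true;
  }

  size_t size() {
    std::lock_guard<OwnedMutex> guard(lock_);
    return traces_.size();
  }

 private:
  // Plain field: the mutex provides the ordering, and the assert documents
  // (and checks in debug builds) that it is never read without the lock.
  bool initialized() const {
    assert(lock_.owned_by_self() && "this should not be accessed concurrently");
    return initialized_;
  }

  OwnedMutex lock_;
  bool initialized_ = false;
  std::vector<StackTrace*> traces_;
};

// Hypothetical sampling path: build the trace without holding the lock, then
// hand it over; if the storage refused it, free it here instead of leaking.
bool sample_allocation(StackTraceStorage& storage, int depth) {
  if (depth <= 0) return false;
  StackTrace* trace = new StackTrace;
  for (int i = 0; i < depth; ++i) trace->frames.push_back(i);
  if (!storage.add_trace(trace)) {
    delete trace;
    return false;
  }
  return true;
}
```

If the flag ever had to be read without the lock, std::atomic<bool> with release stores and acquire loads would be the portable counterpart of OrderAccess::release_store/load_acquire; a volatile qualifier alone provides no such ordering.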
Thanks. The truth is that in our implementation the system is always on or off, so this never really occurs :). In this version, though, that is not true, and it's important to handle it, so thanks again!

So the analysis seems to be that _initialized is only used outside of the mutex in one instance, where it is used to perform double-checked locking, and that this actually causes a memory leak.

I am not proposing how to fix that, just raising the issue. If you still want to perform this double-checked locking somehow, then the use of acquire/release still seems odd, because its memory ordering restrictions never come into play in this particular case. If they ever did, then the use of destroy_stuff(); release_store(_initialized, 0) would be broken anyway, as that would imply that whatever concurrent reader there ever was, after reading _initialized with load_acquire(), could *never* read the data that is concurrently destroyed anyway. I would be inclined to think that RawAccess::load/store is a more appropriate solution, given that the memory leak issue is resolved. I do not know how painful it would be to not perform this double-checked locking.

So I agree with this entirely. I also looked a bit more, and the difference in the code really stems from our internal version. In this version, however, there are a lot of things going on that I had not entirely thought through, but this comment made me ponder it a bit more. Since every call to object_alloc_do_sample is protected by a check of HeapMonitoring::enabled(), there is only a small chance that the call happens when things have been disabled. So there is no real need to do a first check on initialized: it is rare for a call to object_alloc_do_sample to find that the storage's initialized returns false. (By the way, even if you did call object_alloc_do_sample without looking at HeapMonitoring::enabled(), that would be ok too.
You would gather the stack trace and get nowhere at the add_trace call, which would return false; so, though not optimal performance-wise, nothing would break.) Furthermore, the add_trace call is really the point of no return, and there we have the mutex lock and then the initialized check. So, in the end, I did two things: I removed that first check, and I removed the OrderAccess for the storage's initialized flag. I think I now have a better grasp of why it was done in our code and why it is not needed here. Thanks for pointing it out :). This still passes my JTREG tests, especially the threaded one.

As a kind of meta comment, I wonder if it would make sense to add sampling for non-TLAB allocations. It seems like if someone is rapidly allocating a whole bunch of 1 MB objects that never fit in a TLAB, I might still be interested in seeing that in my traces, and not be surprised that the allocation rate is very high yet does not show up in any profiles.

That is handled by handle_sample, where you wanted me to add the UseTLAB check: you hit that case when the allocation is too big for the TLAB.

I see. It was not obvious to me that non-TLAB sampling is done in the TLAB class. That seems like an abstraction crime. What I wanted in my previous comment was that we do not call into the TLAB when we are not using TLABs. If there is sampling logic in the TLAB that is used for something other than TLABs, then it seems like that logic simply does not belong inside the TLAB. It should be moved out of the TLAB, and the TLAB should instead call a common abstraction that makes sense.

So in the incremental version (http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.07_08/), this is still a "crime". The reason is that the system has to keep bytes_until_sample on a per-thread level, and it made "sense" to have it with the TLAB implementation. Also, I was not sure how people felt about adding something to the Thread instance instead. Do you think it fits better at the Thread level?
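As a sketch of how bytes_until_sample could live outside the TLAB, here is a small per-thread value object that counts every allocation of the thread, whether it was served from a TLAB or not, and reports when a sampling point is crossed. The class name ThreadAllocationSampler, the fixed interval, and the reset-after-sample policy are assumptions made to keep the example short and deterministic; a profiler would typically randomize the interval around a mean.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical per-thread sampler, independent of the TLAB implementation.
class ThreadAllocationSampler {
 public:
  explicit ThreadAllocationSampler(size_t interval)
      : interval_(interval), bytes_until_sample_(interval) {}

  // Called for every allocation (inside or outside a TLAB).
  // Returns true if this allocation crosses the sampling point.
  bool record_allocation(size_t bytes) {
    if (bytes < bytes_until_sample_) {
      bytes_until_sample_ -= bytes;
      return false;
    }
    bytes_until_sample_ = interval_;  // start a fresh interval after sampling
    return true;
  }

  // A TLAB could consult this to place its fast-path end at the sampling point.
  size_t bytes_until_sample() const { return bytes_until_sample_; }

 private:
  size_t interval_;
  size_t bytes_until_sample_;
};
```

In this arrangement the TLAB would only read bytes_until_sample() to position its fast-path end, while the outside-TLAB path would call record_allocation() directly, so a large object that never fits in a TLAB is still sampled without the TLAB knowing about it.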
I can see how difficult it would be to move it there and add the logic there. Let me know what you think.

We have an unfortunate situation where everyone who has some thread-local fields tends to dump them right into Thread, making the size and complexity of Thread grow as it becomes tightly coupled with various unrelated subsystems. It would be desirable to have a separate class for this instead that encapsulates the sampling logic. That class could possibly reside in Thread, though, as a value object of Thread.

I imagined that would be the case but was not sure. I will look at the example that Robbin is talking about (ThreadSMR) and see how to refactor my code to use that.

Thanks again for your help,
Jc

Hope I have answered your questions and that my feedback makes sense to you.

You have, and thank you for them. I think we are getting to a cleaner implementation, and things are getting better and more readable :)

Yes, it is getting better.

Thanks,
/Erik

Thanks for your help!
Jc

Thanks,
/Erik

I double-checked by changing the test http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.05a/raw_files/new/test/hotspot/jtreg/serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorStatObjectCorrectnessTest.java to use a smaller TLAB (2048) and made the object bigger, and it goes through that path and passes.

Thanks again for your review, and I look forward to your pointers on the questions I have now raised!
Jc

Thanks,
/Erik

On 2018-01-26 06:45, JC Beyler wrote:

Thanks Robbin for the reviews :)

The new full webrev is here: http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.03/ The incremental webrev is here: http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.02_03/

I inlined my answers:

On Thu, Jan 25, 2018 at 1:15 AM, Robbin Ehn wrote:

Hi JC,

great to see another revision!

#### heapMonitoring.cpp

StackTraceData should not contain the oop for 'safety' reasons.
When StackTraceData is moved from _allocated_traces:

    L452 store_garbage_trace(trace);

it contains a dead oop. _allocated_traces could instead be a tuple of oop and StackTraceData, so that dead oops are not kept.

Done. I used inheritance to make the copier work regardless, but the idea is the same.

You should use the new Access API for loading the oop, something like this: RootAccess::load(...) I don't think you need to use the Access API for clearing the oop, but it would look nicer. And you probably shouldn't be using: Universe::heap()->is_in_reserved(value)

I am unfamiliar with this, but I think I did it the way you wanted (all tests pass, so that's a start). I'm not sure how to clear the oop exactly; is there somewhere that does that, which I can use to do the same? I removed the is_in_reserved check; it came from our internal version, and I don't know why it was there, but my tests work without it, so I removed it :)

The lock:

    L424 MutexLocker mu(HeapMonitorStorage_lock);

is not needed as far as I can see. weak_oops_do is called in a safepoint, no TLAB allocation can happen, and the JVMTI thread can't access these data structures. Is there something more to this lock that I'm missing?

Since a thread can call the JVMTI getLiveTraces (or any of the other ones), it can get to the point of trying to copy the _allocated_traces. I imagine it is possible that this is happening during a GC, or that it can be started and a GC happens afterwards. Therefore, it seems to me that you want this protected, no?

#### You have 6 files without any changes in them (any more): g1CollectedHeap.cpp, psMarkSweep.cpp, psParallelCompact.cpp, genCollectedHeap.cpp, referenceProcessor.cpp, thread.hpp

Done.

#### I have not looked closely, but is it possible to hide heap sampling in AllocTracer? (with some minor changes to the AllocTracer API)

I am imagining that you are suggesting moving the code that does the sampling (changing the TLAB end, calling into HeapMonitoring, etc.) into the AllocTracer code itself?
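Robbin's suggestion of keeping the oop out of StackTraceData, which JC says he implemented with inheritance, can be sketched like this. The types and the sweep function are hypothetical; const void* stands in for an oop. The point is that copying a dead entry into the garbage list slices it down to StackTraceData, so no dead reference is retained.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Per-trace data that never holds an object reference.
struct StackTraceData {
  int trace_id = 0;
  int frame_count = 0;
};

// Entries of the "allocated" list pair the data with the object.
struct StackTraceDataWithObject : StackTraceData {
  const void* obj = nullptr;  // stand-in for an oop
};

// Keeps live entries in `allocated`; dead ones are copied to `garbage` as
// plain StackTraceData (object slicing drops obj), so no dead reference survives.
template <typename IsAlive>
size_t sweep(std::vector<StackTraceDataWithObject>& allocated,
             std::vector<StackTraceData>& garbage, IsAlive is_alive) {
  std::vector<StackTraceDataWithObject> live;
  for (const StackTraceDataWithObject& entry : allocated) {
    if (is_alive(entry.obj)) {
      live.push_back(entry);
    } else {
      garbage.push_back(entry);  // copies only the StackTraceData base part
    }
  }
  allocated.swap(live);
  return allocated.size();
}
```

Because the garbage list's element type has no object field at all, the "no dead oops" property is enforced by the type system rather than by remembering to clear a field.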
I think that is right, and I'll look at whether that is possible and prepare a webrev to show what would be needed to make that happen.

#### Minor nit: when declaring pointers there is a bit of a mix of putting the * adjacent to the type name and adjacent to the variable name. (Most HotSpot code puts it by the type name.) E.g.
heapMonitoring.cpp:711 jvmtiStackTrace *trace = ....
heapMonitoring.cpp:733 Method* m = vfst.method();
(not just this file)

Done!

#### HeapMonitorThreadOnOffTest.java:77
I would make g_tmp volatile; otherwise the assignment in the loop may theoretically be skipped.

Also done!

Thanks again!
Jc

From martin.doerr at sap.com Sat Mar 31 14:49:20 2018 From: martin.doerr at sap.com (Doerr, Martin) Date: Sat, 31 Mar 2018 14:49:20 +0000 Subject: RFR(M): 8198756: Lazy allocation of compiler threads In-Reply-To: References: <864a492b-6f14-2372-2783-5f687bc43638@oracle.com> <41d9a441f84e41919f4566df78b46a0f@sap.com> <295c3925-9605-42d7-aac8-a7074b237aa0@oracle.com> <589e4fdca2bc47f197066a0f110e7d34@sap.com> <13980bc9-3a12-23d3-477d-596b4c2432ee@oracle.com> <767c3e87c23246e89c2c6d368aa30bcb@sap.com> Message-ID:

Hi Vladimir,

I have added your changes, but switched on ReduceNumberOfCompilerThreads by default: http://cr.openjdk.java.net/~mdoerr/8198756_CompilerCount/webrev.03/

I have also added a fix for the stack size test.

I will run testing and look at further issues on Tuesday (the next business day in Germany).

Best regards,
Martin

-----Original Message-----
From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
Sent: Thursday, March 29, 2018 19:53
To: Doerr, Martin
Cc: 'hotspot-compiler-dev at openjdk.java.net'
Subject: Re: RFR(M): 8198756: Lazy allocation of compiler threads

Okay. I posted the webrev.02 testing failures in the bug report.

Thanks,
Vladimir K

On 3/29/18 10:15 AM, Doerr, Martin wrote:
> Hi Vladimir,
>
> sorry, I had missed the proposals you added to the bug while I was sick.
> webrev.02 is only based on the emails.
> I'll think about the ideas which were written in the bug next week.
>
> Best regards,
> Martin
>
>
> -----Original Message-----
> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
> Sent: Thursday, March 29, 2018 18:42
> To: Doerr, Martin
> Cc: 'hotspot-compiler-dev at openjdk.java.net'
> Subject: Re: RFR(M): 8198756: Lazy allocation of compiler threads
>
> Hi Martin,
>
> Thank you for the update.
>
> Does using thread->smr_delete() solve the runtime/whitebox/WBStackSize.java
> test problem I reported in the RFE?
>
> I see you kept _c1_compile_queue->size() / 2. I think it creates too many
> C1 threads, reaching the max number _c1_count very fast.
>
> I will start testing with the 02 changes and let you know the results.
>
> Thanks,
> Vladimir
>
> On 3/28/18 9:02 AM, Doerr, Martin wrote:
>> Hi Vladimir,
>>
>> I have addressed your proposals with this new webrev:
>> http://cr.openjdk.java.net/~mdoerr/8198756_CompilerCount/webrev.02/
>>
>> Please take a look.
>>
>> Thanks for your support. Best regards,
>> Martin
>>
>>
>> -----Original Message-----
>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>> Sent: Monday, March 26, 2018 23:15
>> To: Doerr, Martin
>> Cc: 'hotspot-compiler-dev at openjdk.java.net'
>> Subject: Re: RFR(M): 8198756: Lazy allocation of compiler threads
>>
>> Hi Martin,
>>
>> We can't delete _log when deleting CompilerThread. Log is referenced
>> globally and used on VM exit to generate final log file wh