From ysr1729 at gmail.com Sat Aug 1 00:39:03 2015
From: ysr1729 at gmail.com (ysr1729 at gmail.com)
Date: Fri, 31 Jul 2015 17:39:03 -0700
Subject: Cost of single-threaded nmethod hotness updates at each safepoint (in JDK 8)
In-Reply-To:
References:

<55BBC1C0.3030709@oracle.com> <55BBE883.1080308@oracle.com>

Message-ID: <9B8024CB-7044-4E97-96B4-C44147C1FE1B@gmail.com>

Hi Vitaly --

Which jdk 8 version were you testing? It's a bit of the proverbial curate's egg at the moment (albeit not in the original sense, i assure you!) but if i may be allowed to mix my metaphors, I would be inclined not to throw out the baby with the bath water, yet. There are services that have seen benefits and some that haven't, and the picture overall is still a bit fuzzy. Maybe someone out there has done a more disciplined epidemiological study...

PS: a couple of services were running tiered when it wasn't the default (in jdk 7)...

-- ramki

Sent from my iPhone

> On Jul 31, 2015, at 3:08 PM, Vitaly Davidovich wrote:
>
> Ramki, are you actually seeing better peak perf with tiered than C2? I experimented with it on a real workload and it was a net loss for peak perf (anywhere from 8-20% worse than C2, but also quite unstable); this was with a very large code cache to play it safe, but no other tuning.
>
> sent from my phone
>
>> On Jul 31, 2015 6:02 PM, "Srinivas Ramakrishna" wrote:
>> OK, will do and add you as watcher; thanks Vladimir! (don't yet know if with tiered and a necessarily bounded, if large, code cache whether flushing will in fact eventually become necessary, wrt yr suggested temporary workaround.)
>>
>> Have a good weekend!
>> -- ramki
>>
>>> On Fri, Jul 31, 2015 at 2:28 PM, Vladimir Kozlov wrote:
>>> Got it. Yes, it is issue with thousands java threads.
>>> You are the first pointing this problem. File bug on compiler. We will look what we can do. Most likely we need parallelize this work.
>>>
>>> Method's hotness is used only for UseCodeCacheFlushing. You can try to guard Threads::nmethods_do(&set_hotness_closure); with this flag and switch it off.
>>>
>>> We need mark_as_seen_on_stack so leave it.
>>>
>>> Thanks,
>>> Vladimir
>>>
>>>> On 7/31/15 11:48 AM, Srinivas Ramakrishna wrote:
>>>> Hi Vladimir --
>>>>
>>>> I noticed the increase even with Initial and Reserved set to the default of 240 MB, but actual usage much lower (less than a quarter).
>>>>
>>>> Look at this code path. Note that this is invoked at every safepoint (although it says "periodically" in the comment).
>>>> In the mark_active_nmethods() method, there's a thread iteration in both branches of the if. I haven't checked to see which of the two was the culprit here, yet (if either).
>>>>
>>>> // Various cleaning tasks that should be done periodically at safepoints
>>>> void SafepointSynchronize::do_cleanup_tasks() {
>>>>   ....
>>>>   {
>>>>     TraceTime t4("mark nmethods", TraceSafepointCleanupTime);
>>>>     NMethodSweeper::mark_active_nmethods();
>>>>   }
>>>>   ..
>>>> }
>>>>
>>>> void NMethodSweeper::mark_active_nmethods() {
>>>>   ...
>>>>   if (!sweep_in_progress()) {
>>>>     _seen = 0;
>>>>     _sweep_fractions_left = NmethodSweepFraction;
>>>>     _current = CodeCache::first_nmethod();
>>>>     _traversals += 1;
>>>>     _total_time_this_sweep = Tickspan();
>>>>
>>>>     if (PrintMethodFlushing) {
>>>>       tty->print_cr("### Sweep: stack traversal %d", _traversals);
>>>>     }
>>>>     Threads::nmethods_do(&mark_activation_closure);
>>>>
>>>>   } else {
>>>>     // Only set hotness counter
>>>>     Threads::nmethods_do(&set_hotness_closure);
>>>>   }
>>>>
>>>>   OrderAccess::storestore();
>>>> }
>>>>
>>>> On Fri, Jul 31, 2015 at 11:43 AM, Vladimir Kozlov wrote:
>>>>
>>>> Hi Ramki,
>>>>
>>>> Did you fill up CodeCache? It start scanning aggressive only with full CodeCache:
>>>>
>>>> // Force stack scanning if there is only 10% free space in the code cache.
>>>> // We force stack scanning only non-profiled code heap gets full, since critical
>>>> // allocation go to the non-profiled heap and we must be make sure that there is
>>>> // enough space.
>>>> double free_percent = 1 / CodeCache::reverse_free_ratio(CodeBlobType::MethodNonProfiled) * 100;
>>>> if (free_percent <= StartAggressiveSweepingAt) {
>>>>   do_stack_scanning();
>>>> }
>>>>
>>>> Vladimir
>>>>
>>>> On 7/31/15 11:33 AM, Srinivas Ramakrishna wrote:
>>>>
>>>> Yes.
>>>>
>>>> On Fri, Jul 31, 2015 at 11:31 AM, Vitaly Davidovich wrote:
>>>>
>>>> Ramki, are you running tiered compilation?
>>>>
>>>> sent from my phone
>>>>
>>>> On Jul 31, 2015 2:19 PM, "Srinivas Ramakrishna" wrote:
>>>>
>>>> Hello GC and Compiler teams!
>>>>
>>>> One of our services that runs with several thousand threads recently noticed an increase in safepoint stop times, but not gc times, upon transitioning to JDK 8.
>>>>
>>>> Further investigation revealed that most of the delta was related to the so-called pre-gc/vmop "cleanup" phase when various book-keeping activities are performed, and more specifically in the portion that walks java thread stacks single-threaded (!) and updates the hotness counters for the active nmethods. This code appears to be new to JDK 8 (in jdk 7 one would walk the stacks only during code cache sweeps).
>>>>
>>>> I have two questions:
>>>> (1) has anyone else (typically, I'd expect applications with many hundreds or thousands of threads) noticed this regression?
>>>> (2) Can we do better, for example, by:
>>>>     (a) doing these updates by walking thread stacks in multiple worker threads in parallel, or best of all:
>>>>     (b) doing these updates when we walk the thread stacks during GC, and skipping this phase entirely for non-GC safepoints (with attendant loss in frequency of this update in low GC frequency scenarios).
>>>>
>>>> It seems kind of silly to do GC's with many multiple worker threads, but do these thread stack walks single-threaded when it is embarrassingly parallel (one could predicate the parallelization based on the measured stack sizes and thread population, if there was concern on the overhead of activating and deactivating the thread gangs for the work).
>>>>
>>>> A followup question: Any guesses as to how code cache sweep/eviction quality might be compromised if one were to dispense with these hotness updates entirely (or at a much reduced frequency), as a temporary workaround to the performance problem?
>>>>
>>>> Thoughts/Comments? In particular, has this issue been addressed perhaps in newer JVMs?
>>>>
>>>> Thanks for any comments, feedback, pointers!
>>>> -- ramki
>>>>
>>>> PS: for comparison, here's data with +TraceSafepointCleanupTime from JDK 7 (first, where this isn't done) vs JDK 8 (where this is done) with a program that has a few thousands of threads:
>>>>
>>>> JDK 7:
>>>> ..
>>>> 2827.308: [sweeping nmethods, 0.0000020 secs]
>>>> 2828.679: [sweeping nmethods, 0.0000030 secs]
>>>> 2829.984: [sweeping nmethods, 0.0000030 secs]
>>>> 2830.956: [sweeping nmethods, 0.0000030 secs]
>>>> ..
>>>>
>>>> JDK 8:
>>>> ..
>>>> 7368.634: [mark nmethods, 0.0177030 secs]
>>>> 7369.587: [mark nmethods, 0.0178305 secs]
>>>> 7370.479: [mark nmethods, 0.0180260 secs]
>>>> 7371.503: [mark nmethods, 0.0186494 secs]
>>>> ..
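Option (2a) from the report, combined with the idea of predicating the parallel walk on the thread population, can be sketched in standard C++. This is a minimal illustrative model under stated assumptions, not HotSpot code: `JavaThreadStub`, `mark_hotness` and `kParallelThreshold` are hypothetical names, each stub simply lists the ids of the nmethods a stack walk would find in its frames, and resetting every on-stack nmethod's counter to one shared reset value is an assumption about what `set_hotness_closure` does. Worker threads claim Java threads one at a time from a shared atomic index, so a few unusually deep stacks do not leave one worker with a disproportionate share of the work.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for a Java thread: only the ids of the nmethods
// that a stack walk would find in its frames.
struct JavaThreadStub {
  std::vector<int> frames_nmethod_ids;
};

// Hypothetical cut-off: below this many threads, a single-threaded walk is
// assumed to be cheaper than waking up a gang of workers.
constexpr std::size_t kParallelThreshold = 1024;

// Resets the hotness counter of every nmethod found on any thread's stack.
void mark_hotness(const std::vector<JavaThreadStub>& threads,
                  std::vector<std::atomic<int>>& hotness,
                  int reset_value,
                  unsigned workers) {
  auto walk_one = [&](const JavaThreadStub& t) {
    for (int id : t.frames_nmethod_ids) {
      assert(id >= 0 && static_cast<std::size_t>(id) < hotness.size());
      // Every walker stores the same value, so relaxed ordering is enough.
      hotness[id].store(reset_value, std::memory_order_relaxed);
    }
  };

  // Small thread populations: keep the single-threaded walk.
  if (workers <= 1 || threads.size() < kParallelThreshold) {
    for (const JavaThreadStub& t : threads) walk_one(t);
    return;
  }

  // Large populations: each worker repeatedly claims the next unvisited
  // thread from a shared index until all threads have been walked.
  std::atomic<std::size_t> next{0};
  auto worker = [&] {
    for (std::size_t i = next.fetch_add(1); i < threads.size();
         i = next.fetch_add(1)) {
      walk_one(threads[i]);
    }
  };
  std::vector<std::thread> gang;
  for (unsigned w = 0; w < workers; ++w) gang.emplace_back(worker);
  for (std::thread& th : gang) th.join();
}
```

Because every walker writes the same value, the order in which workers visit threads does not matter. The threshold of 1024 is an arbitrary placeholder; in a real VM it would have to be weighed against the cost of activating and deactivating the worker gang, which is the overhead concern raised in the report. The sketch also does not model whether walking a real Java thread's frames is safe to do concurrently.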
From vitalyd at gmail.com Sat Aug 1 02:17:29 2015
From: vitalyd at gmail.com (Vitaly Davidovich)
Date: Fri, 31 Jul 2015 22:17:29 -0400
Subject: Cost of single-threaded nmethod hotness updates at each safepoint (in JDK 8)
In-Reply-To: <9B8024CB-7044-4E97-96B4-C44147C1FE1B@gmail.com>
References:

<55BBC1C0.3030709@oracle.com> <55BBE883.1080308@oracle.com>

<9B8024CB-7044-4E97-96B4-C44147C1FE1B@gmail.com>
Message-ID:

Hi Ramki,

That experiment was performed on 7u60, not 8; I may revisit this with 8 or perhaps wait for segregated code cache to be available before trying again. One thing that worried me was the tuning aspect of tiered, which is a bit opaque as compared to, say, GC logs - it's a bit too black boxey for me. Also, the servers I was running this on have tightly chosen cpu affinity masks and there aren't many spare cores to dedicate to C1 and C2 compiler threads. But, I may look at this again in the near future.

sent from my phone
On Jul 31, 2015 8:39 PM, wrote:

> Hi Vitaly -- Which jdk 8 version were you testing? It's a bit of the
> proverbial curate's egg at the moment (albeit not in the original sense, i
> assure you!) but if i may be allowed to mix my metaphors, I would be
> inclined not to throw out the baby with the bath water, yet. There are
> services that have seen benefits and some that haven't, and the picture
> overall is still a bit fuzzy. Maybe someone out there has done a more
> disciplined epidemiological study...
>
> PS: a couple of services were running tiered when it wasn't the default
> (in jdk 7)...
>
> -- ramki
>
> Sent from my iPhone

From filipp.zhinkin at gmail.com Sun Aug 2 14:10:44 2015
From: filipp.zhinkin at gmail.com (Filipp Zhinkin)
Date: Sun, 2 Aug 2015 17:10:44 +0300
Subject: RFR (S): 8067014: LinearScan::is_sorted significantly slows down fastdebug builds' performance
In-Reply-To:
References: <54F99281.7020101@oracle.com>

Message-ID:

ping

On Mon, Mar 23, 2015 at 1:40 PM, Filipp Zhinkin wrote:
> Hi all,
>
> sorry for a late reply.
>
> I don't think that it's possible to remove is_sorted assertion from
> create_unhandled_lists, because it's crucial condition for a linear
> scan allocation algorithm and it's pretty easy to break it incidentally.
> Existing assertion could significantly reduce time required to locate
> an issue when something will go wrong.
>
> However, I believe that it could be relaxed to check only that
> intervals in _sorted_intervals list are actually ordered and that
> _new_intervals_from_allocation list is empty (in sorting methods
> we still will be verifying that sorted and unsorted lists contain
> same intervals).
>
> What do you guys think about that?
>
> Regards,
> Filipp.
>
>
> On Fri, Mar 6, 2015 at 9:24 PM, Filipp Zhinkin wrote:
>> Hi Aleksey,
>>
>> thanks for looking at it!
>>
>> On Fri, Mar 6, 2015 at 2:41 PM, Aleksey Shipilev wrote:
>>> Hi Filipp,
>>>
>>> On 06.03.2015 14:33, Filipp Zhinkin wrote:
>>>> In certain cases (like -client -Xcomp) C1 compilation is very slow
>>>> w/ fastdebug builds. A place where we spent enormous amount of time
>>>> is LinearScan::is_sorted method, which simply verifies that a list
>>>> that should be sorted is actually sorted and that both sorted and
>>>> unsorted lists contains same intervals.
>>>
>>> Okay, what caller of is_sorted dominates? Maybe instead of optimizing
>>> the is_sorted itself, you need to move/relax the assert in some selected
>>> places?
>>
>> Well, the dominating caller is LinearScan::create_unhandled_lists [1].
>>
>>>
>>> That is to say I am not fond of complicating the non-product code that
>>> does verification without a compelling reason to do so; let's first
>>> figure out if we "just" do excess asserts.
>>
>> That's a good point. I'll try to figure out if an assertion is placed to be
>> sure that all methods called in the right sequence and if it's true, then
>> it may be better to use less expensive approach.
>>
>> Thanks,
>> Filipp.
>>
>> [1] http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/de7ca28f8b7d/src/share/vm/c1/c1_LinearScan.cpp#l1486
>>
>>>
>>> Thanks,
>>> -Aleksey.
>>>

From ysr1729 at gmail.com Sun Aug 2 18:11:31 2015
From: ysr1729 at gmail.com (Srinivas Ramakrishna)
Date: Sun, 2 Aug 2015 11:11:31 -0700
Subject: Cost of single-threaded nmethod hotness updates at each safepoint (in JDK 8)
In-Reply-To: <55BBE883.1080308@oracle.com>
References:

<55BBC1C0.3030709@oracle.com> <55BBE883.1080308@oracle.com>
Message-ID:

I filed:
https://bugs.openjdk.java.net/browse/JDK-8132849

thanks!
-- ramki

On Fri, Jul 31, 2015 at 2:28 PM, Vladimir Kozlov wrote:

> Got it. Yes, it is issue with thousands java threads.
> You are the first pointing this problem. File bug on compiler. We will
> look what we can do. Most likely we need parallelize this work.
>
> Method's hotness is used only for UseCodeCacheFlushing. You can try to
> guard Threads::nmethods_do(&set_hotness_closure); with this flag and switch
> it off.
>
> We need mark_as_seen_on_stack so leave it.
>
> Thanks,
> Vladimir

From zoltan.majo at oracle.com Mon Aug 3 07:22:38 2015
From: zoltan.majo at oracle.com (Zoltán Majó)
Date: Mon, 03 Aug 2015 09:22:38 +0200
Subject: [9] RFR(S): 8132457: Unify command-line flags controlling the usage of compiler intrinsics
In-Reply-To: <55BB91B4.805@oracle.com>
References: <55BB479E.8000402@oracle.com> <55BB91B4.805@oracle.com>
Message-ID: <55BF16BE.7010001@oracle.com>

Thank you, Vladimir!

Best regards,


Zoltan

On 07/31/2015 05:18 PM, Vladimir Kozlov wrote:
> Very nice cleanup. Thank you, Zoltan.
>
> Vladimir
>
> On 7/31/15 3:02 AM, Zoltán Majó wrote:
>> Hi,
>>
>>
>> please review the following patch for JDK-8132457.
>>
>> Bug: https://bugs.openjdk.java.net/browse/JDK-8132457
>>
>> Problem: There are four cases when flags controlling intrinsics for
>> C1 and C2 behave inconsistently:
>> 1) The DisableIntrinsic flag is C2-specific.
>> 2) The InlineNatives flag disables most but not all intrinsics. Some
>> intrinsics (implemented by both C1 and C2) are
>> turned off by -XX:-InlineNatives for C1 but are left on for C2.
>> 3) The _getClass intrinsic (implemented by both C1 and C2) is turned
>> off by -XX:-InlineClassNatives for C1 and is left
>> unaffected by C2.
>> 4) The _loadfence, _storefence, _fullfence, _compareAndSwapObject,
>> _compareAndSwapLong, and _compareAndSwapInt
>> intrinsics are turned off by -XX:-InlineUnsafeOps for C2 and are
>> unaffected by C1.
>>
>>
>> Solution: Unify command-line flags controlling intrinsic processing.
>> Processing of command-line flags is now done only
>> in vmIntrinsics::is_disabled_by_flags and there is no
>> compiler-specific flag processing.
>>
>> The inconsistencies listed in the problem description were addressed
>> the following way:
>> 1) Extend the C1 compiler to consider the DisableIntrinsic flag when
>> checking if an intrinsic is available.
>> 2) -XX:-InlineNatives turns off most intrinsics but leaves on some
>> intrinsics (the same set of intrinsics are left on
>> for both C1 and C2).
>> 3) -XX:-InlineClassNatives turns off the _getClass intrinsic for both
>> C1 and C2.
>> 4) -XX:-InlineUnsafeOps turns off the _loadfence, _storefence,
>> _fullfence, _compareAndSwapObject, _compareAndSwapLong,
>> and _compareAndSwapInt intrinsics for both C1 and C2.
>>
>> Webrev:
>> http://cr.openjdk.java.net/~zmajo/8132457/webrev.00/
>>
>> Testing:
>> - JPRT run, testset hotspot, all tests pass;
>> - all JTREG tests in hotspot/test, all tests pass;
>> - local testing of DisableIntrinsic with both C1 and C2.
>>
>> Thank you and best regards,
>>
>>
>> Zoltan
>>

From adinn at redhat.com Mon Aug 3 09:28:48 2015
From: adinn at redhat.com (Andrew Dinn)
Date: Mon, 03 Aug 2015 10:28:48 +0100
Subject: RFR: AArch64: Fix error introduced into AArch64 CodeCache by commit for 8130309
Message-ID: <55BF3450.5020008@redhat.com>

The following /AArch64-only/ webrev fixes some problems introduced into
the AArch64 codecache routines by the recent fix for JDK-8130309
committed to hs-comp

http://cr.openjdk.java.net/~adinn/8132875/webrev.00/

With this patch the hs-comp tree compiles and runs correctly on AArch64.
Reviews welcome.

regards,


Andrew Dinn
-----------

From aph at redhat.com Mon Aug 3 10:05:12 2015
From: aph at redhat.com (Andrew Haley)
Date: Mon, 03 Aug 2015 11:05:12 +0100
Subject: [aarch64-port-dev ] RFR: AArch64: Fix error introduced into AArch64 CodeCache by commit for 8130309
In-Reply-To: <55BF3450.5020008@redhat.com>
References: <55BF3450.5020008@redhat.com>
Message-ID: <55BF3CD8.6020905@redhat.com>

On 03/08/15 10:28, Andrew Dinn wrote:
> With this patch the hs-comp tree compiles and runs correctly on AArch64.
> Reviews welcome.

That looks right to me.

Thanks,

Andrew.

From adinn at redhat.com Mon Aug 3 11:03:13 2015
From: adinn at redhat.com (Andrew Dinn)
Date: Mon, 03 Aug 2015 12:03:13 +0100
Subject: [aarch64-port-dev ] RFR: AArch64: Fix error introduced into AArch64 CodeCache by commit for 8130309
In-Reply-To: <55BF3CD8.6020905@redhat.com>
References: <55BF3450.5020008@redhat.com> <55BF3CD8.6020905@redhat.com>
Message-ID: <55BF4A71.2040308@redhat.com>

On 03/08/15 11:05, Andrew Haley wrote:
> On 03/08/15 10:28, Andrew Dinn wrote:
>> With this patch the hs-comp tree compiles and runs correctly on AArch64.
>> Reviews welcome.
>
> That looks right to me.

Thanks for the review.

Could someone from the compiler team with the relevant access right also
please review and then sponsor this patch for inclusion into hs-comp? It
would be good to get this fix into that repo before the original patch
goes up into jdk9. Thanks.

regards,


Andrew Dinn
-----------

From tobias.hartmann at oracle.com Mon Aug 3 11:35:01 2015
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Mon, 03 Aug 2015 13:35:01 +0200
Subject: [aarch64-port-dev ] RFR: AArch64: Fix error introduced into AArch64 CodeCache by commit for 8130309
In-Reply-To: <55BF4A71.2040308@redhat.com>
References: <55BF3450.5020008@redhat.com> <55BF3CD8.6020905@redhat.com> <55BF4A71.2040308@redhat.com>
Message-ID: <55BF51E5.5090708@oracle.com>

Hi Andrew,

thanks for fixing that! Seems like I forgot the manual aarch64 testing for my latest webrev..

The changes look good. I can sponsor and push them into hs-comp after an official reviewer approved them.

Best,
Tobias

On 03.08.2015 13:03, Andrew Dinn wrote:
> On 03/08/15 11:05, Andrew Haley wrote:
>> On 03/08/15 10:28, Andrew Dinn wrote:
>>> With this patch the hs-comp tree compiles and runs correctly on AArch64.
>>> Reviews welcome.
>>
>> That looks right to me.
>
> Thanks for the review.
>
> Could someone from the compiler team with the relevant access right also
> please review and then sponsor this patch for inclusion into hs-comp? It
> would be good to get this fix into that repo before the original patch
> goes up into jdk9. Thanks.
>
> regards,
>
>
> Andrew Dinn
> -----------
>

From adinn at redhat.com Mon Aug 3 11:42:02 2015
From: adinn at redhat.com (Andrew Dinn)
Date: Mon, 03 Aug 2015 12:42:02 +0100
Subject: [aarch64-port-dev ] RFR: AArch64: Fix error introduced into AArch64 CodeCache by commit for 8130309
In-Reply-To: <55BF51E5.5090708@oracle.com>
References: <55BF3450.5020008@redhat.com> <55BF3CD8.6020905@redhat.com> <55BF4A71.2040308@redhat.com> <55BF51E5.5090708@oracle.com>
Message-ID: <55BF538A.9080409@redhat.com>

Hi Tobias,

On 03/08/15 12:35, Tobias Hartmann wrote:
> thanks for fixing that! Seems like I forgot the manual aarch64
> testing for my latest webrev..
>
> The changes look good. I can sponsor and push them into hs-comp after
> an official reviewer approved them.

Thanks, Tobias.

Do we need another reviewer for an AArch64-only change? If so then could
someone from hs-comp (or a hotspot dev) volunteer -- Ed Nevill is on
holiday so we don't have another AArch64 port dev to review?

Thanks!

regards,


Andrew Dinn
-----------
Senior Principal Software Engineer
Red Hat UK Ltd
Registered in UK and Wales under Company Registration No. 3798903
Directors: Michael Cunningham (USA), Matt Parson (USA), Charlie Peters (USA), Michael O'Neill (Ireland)

From tobias.hartmann at oracle.com Mon Aug 3 13:20:46 2015
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Mon, 03 Aug 2015 15:20:46 +0200
Subject: [aarch64-port-dev ] RFR: AArch64: Fix error introduced into AArch64 CodeCache by commit for 8130309
In-Reply-To: <55BF538A.9080409@redhat.com>
References: <55BF3450.5020008@redhat.com> <55BF3CD8.6020905@redhat.com> <55BF4A71.2040308@redhat.com> <55BF51E5.5090708@oracle.com> <55BF538A.9080409@redhat.com>
Message-ID: <55BF6AAE.2030101@oracle.com>

On 03.08.2015 13:42, Andrew Dinn wrote:
> Hi Tobias,
>
> On 03/08/15 12:35, Tobias Hartmann wrote:
>> thanks for fixing that! Seems like I forgot the manual aarch64
>> testing for my latest webrev..
>>
>> The changes look good. I can sponsor and push them into hs-comp after
>> an official reviewer approved them.
>
> Thanks, Tobias.
>
> Do we need another reviewer for an AArch64-only change? If so then could
> someone from hs-comp (or a hotspot dev) volunteer -- Ed Nevill is on
> holiday so we don't have another AArch64 port dev to review?

I think we need at least one JDK 9 reviewer (I'm not an official reviewer).

Best,
Tobias

>
> Thanks!
>
> regards,
>
>
> Andrew Dinn
> -----------

From vladimir.kozlov at oracle.com Mon Aug 3 16:07:03 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Mon, 3 Aug 2015 09:07:03 -0700
Subject: RFR: AArch64: Fix error introduced into AArch64 CodeCache by commit for 8130309
In-Reply-To: <55BF3450.5020008@redhat.com>
References: <55BF3450.5020008@redhat.com>
Message-ID: <55BF91A7.7030608@oracle.com>

Looks good.
Thanks, Vladimir On 8/3/15 2:28 AM, Andrew Dinn wrote: > The following /AArch64-only/ webrev fixes some problems introduced into > the AArch64 codecache routines by the recent fix for JDK-8130309 > committed to hs-comp > > http://cr.openjdk.java.net/~adinn/8132875/webrev.00/ > > With this patch the hs-comp tree compiles and runs correctly on AArch64. > Reviews welcome. > > regards, > > > Andrew Dinn > ----------- > From dmitry.dmitriev at oracle.com Wed Aug 5 16:55:44 2015 From: dmitry.dmitriev at oracle.com (Dmitry Dmitriev) Date: Wed, 5 Aug 2015 19:55:44 +0300 Subject: RFR (XXS): 8132728: Memory leak in MethodHandles::verify_ref_kind function(fastdebug build) Message-ID: <55C24010.8030901@oracle.com> Hello, Please review this fix which remove small memory leak in debug build. Also, I need a sponsor for this fix, who can push it. MethodHandles::verify_ref_kind contains memory leak. Memory for 'buf' is allocated by NEW_C_HEAP_ARRAY but not freed after '__ STOP(buf);'. Webrev: http://cr.openjdk.java.net/~ddmitriev/8132728/webrev.00/ JBS: https://bugs.openjdk.java.net/browse/JDK-8132728 Tested: JPRT(hotspot test set), hotspot all, vm.quick Thanks, Dmitry From vladimir.kozlov at oracle.com Wed Aug 5 17:54:10 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 5 Aug 2015 10:54:10 -0700 Subject: RFR (XXS): 8132728: Memory leak in MethodHandles::verify_ref_kind function(fastdebug build) In-Reply-To: <55C24010.8030901@oracle.com> References: <55C24010.8030901@oracle.com> Message-ID: <55C24DC2.9030902@oracle.com> Looks good. Note, it is not real memory leak - code does not return from STOP call. It either produce assert and exit or wait to attach debugger (ShowMessageBoxOnError). See MacroAssembler::debug64() for example. Thanks, Vladimir On 8/5/15 9:55 AM, Dmitry Dmitriev wrote: > Hello, > > Please review this fix which remove small memory leak in debug build. Also, I need a sponsor for this fix, who can push it.
> > MethodHandles::verify_ref_kind contains memory leak. Memory for 'buf' is allocated by NEW_C_HEAP_ARRAY but not freed > after '__ STOP(buf);'. > > Webrev: http://cr.openjdk.java.net/~ddmitriev/8132728/webrev.00/ > JBS: https://bugs.openjdk.java.net/browse/JDK-8132728 > Tested: JPRT(hotspot test set), hotspot all, vm.quick > > Thanks, > Dmitry From adinn at redhat.com Wed Aug 5 19:55:26 2015 From: adinn at redhat.com (Andrew Dinn) Date: Wed, 05 Aug 2015 20:55:26 +0100 Subject: RFR: AArch64: Fix error introduced into AArch64 CodeCache by commit for 8130309 In-Reply-To: <55BF91A7.7030608@oracle.com> References: <55BF3450.5020008@redhat.com> <55BF91A7.7030608@oracle.com> Message-ID: <55C26A2E.1060902@redhat.com> On 03/08/15 17:07, Vladimir Kozlov wrote: > Looks good. Thanks for the review Vladimir (and apologies for the delay in replying -- I was traveling for a meeting). regards, Andrew Dinn ----------- Senior Principal Software Engineer Red Hat UK Ltd Registered in UK and Wales under Company Registration No. 3798903 Directors: Michael Cunningham (USA), Matt Parson (USA), Charlie Peters (USA), Michael O'Neill (Ireland) From dmitry.dmitriev at oracle.com Wed Aug 5 21:41:55 2015 From: dmitry.dmitriev at oracle.com (Dmitry Dmitriev) Date: Thu, 6 Aug 2015 00:41:55 +0300 Subject: RFR (XXS): 8132728: Memory leak in MethodHandles::verify_ref_kind function(fastdebug build) In-Reply-To: <55C24DC2.9030902@oracle.com> References: <55C24010.8030901@oracle.com> <55C24DC2.9030902@oracle.com> Message-ID: <55C28323.3080907@oracle.com> Hello Vladimir, Thank you for review and explanation! I looked at the code and see that code does not return from STOP and this block executed only when ref kind not equal to expected. But it is possible that debug64 will not be called and execution continues? For example at VM start-up? 
Here a call chain which I see: JVM_RegisterMethodHandleMethods->MethodHandles::generate_adapters->MethodHandlesAdapterGenerator::generate->MethodHandles::generate_method_handle_interpreter_entry->MethodHandles::verify_ref_kind For quick experiment I add tty->print_cr() to the MethodHandles::verify_ref_kind, MacroAssembler::stop and MacroAssembler::debug64 and see that block with memory allocation is executed in this case, stop method is called, but debug64 is not executed and stop successfully finished. So, it explains why I see memory leak... Correct me if I am wrong. Thanks! Dmitry On 05.08.2015 20:54, Vladimir Kozlov wrote: > Looks good. > > Note, it is not real memory leak - code does not return from STOP > call. It either produce assert and exit or wait to attach debugger > (ShowMessageBoxOnError). See MacroAssembler::debug64() for example. > > Thanks, > Vladimir > > On 8/5/15 9:55 AM, Dmitry Dmitriev wrote: >> Hello, >> >> Please review this fix which remove small memory leak in debug build. >> Also, I need a sponsor for this fix, who can push it. >> >> MethodHandles::verify_ref_kind contains memory leak. Memory for 'buf' >> is allocated by NEW_C_HEAP_ARRAY but not freed >> after '__ STOP(buf);'. 
>> >> Webrev: http://cr.openjdk.java.net/~ddmitriev/8132728/webrev.00/ >> JBS: https://bugs.openjdk.java.net/browse/JDK-8132728 >> Tested: JPRT(hotspot test set), hotspot all, vm.quick >> >> Thanks, >> Dmitry From vladimir.kozlov at oracle.com Wed Aug 5 22:13:18 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 5 Aug 2015 15:13:18 -0700 Subject: RFR (XXS): 8132728: Memory leak in MethodHandles::verify_ref_kind function(fastdebug build) In-Reply-To: <55C28323.3080907@oracle.com> References: <55C24010.8030901@oracle.com> <55C24DC2.9030902@oracle.com> <55C28323.3080907@oracle.com> Message-ID: <55C28A7E.1090406@oracle.com> I don't see how debug64 is not executed if stop is called:

void MacroAssembler::stop(const char* msg) {
  address rip = pc();
  pusha();                // get regs on stack
  lea(c_rarg0, ExternalAddress((address) msg));
  lea(c_rarg1, InternalAddress(rip));
  movq(c_rarg2, rsp);     // pass pointer to regs array
  andq(rsp, -16);         // align stack as required by ABI
  call(RuntimeAddress(CAST_FROM_FN_PTR(address, MacroAssembler::debug64)));
  hlt();
}

Looks like you misunderstand how this code works. You can't use tty->print_cr() in these cases. It produces output when that assembler code is *generated* and NOT when it is *executed*. Saying that, I realized that your fix is totally wrong. Buffer allocation happens during assembler code generation but the buffer is used when that code is executed. If you free it (during code generation) you will get a bad pointer during execution because the corresponding memory is freed. In this regard it is NOT a memory leak. We need this memory during the whole run until JVM exit (end of program). This code is used for adapter generation, and adapters are never removed from the CodeCache. Regards, Vladimir On 8/5/15 2:41 PM, Dmitry Dmitriev wrote: > Hello Vladimir, > > Thank you for review and explanation! > > I looked at the code and see that code does not return from STOP and this block executed only when ref kind not equal to > expected.
But it is possible that debug64 will not be called and execution continues? For example at VM start-up? Here a > call chain which I see: > JVM_RegisterMethodHandleMethods->MethodHandles::generate_adapters->MethodHandlesAdapterGenerator::generate->MethodHandles::generate_method_handle_interpreter_entry->MethodHandles::verify_ref_kind > > > For quick experiment I add tty->print_cr() to the MethodHandles::verify_ref_kind, MacroAssembler::stop and > MacroAssembler::debug64 and see that block with memory allocation is executed in this case, stop method is called, but > debug64 is not executed and stop successfully finished. So, it explains why I see memory leak... Correct me if I am > wrong. Thanks! > > Dmitry > > On 05.08.2015 20:54, Vladimir Kozlov wrote: >> Looks good. >> >> Note, it is not real memory leak - code does not return from STOP call. It either produce assert and exit or wait to >> attach debugger (ShowMessageBoxOnError). See MacroAssembler::debug64() for example. >> >> Thanks, >> Vladimir >> >> On 8/5/15 9:55 AM, Dmitry Dmitriev wrote: >>> Hello, >>> >>> Please review this fix which remove small memory leak in debug build. Also, I need a sponsor for this fix, who can >>> push it. >>> >>> MethodHandles::verify_ref_kind contains memory leak. Memory for 'buf' is allocated by NEW_C_HEAP_ARRAY but not freed >>> after '__ STOP(buf);'. 
>>> >>> Webrev: http://cr.openjdk.java.net/~ddmitriev/8132728/webrev.00/ >>> JBS: https://bugs.openjdk.java.net/browse/JDK-8132728 >>> Tested: JPRT(hotspot test set), hotspot all, vm.quick >>> >>> Thanks, >>> Dmitry > From vladimir.x.ivanov at oracle.com Wed Aug 5 22:17:41 2015 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Thu, 06 Aug 2015 01:17:41 +0300 Subject: RFR (XXS): 8132728: Memory leak in MethodHandles::verify_ref_kind function(fastdebug build) In-Reply-To: <55C24DC2.9030902@oracle.com> References: <55C24010.8030901@oracle.com> <55C24DC2.9030902@oracle.com> Message-ID: <55C28B85.9090704@oracle.com> Don't we reference freed memory from generated code after this fix? stop() doesn't copy the message, but uses it as is:

void MacroAssembler::stop(const char* msg) {
  ExternalAddress message((address)msg);
  // push address of message
  pushptr(message.addr());
  ...
}

So, JVM can print garbage when hitting STOP if the memory was reused. A proper fix would be to store the message somewhere in corresponding nmethod. Best regards, Vladimir Ivanov On 8/5/15 8:54 PM, Vladimir Kozlov wrote: > Looks good. > > Note, it is not real memory leak - code does not return from STOP call. > It either produce assert and exit or wait to attach debugger > (ShowMessageBoxOnError). See MacroAssembler::debug64() for example. > > Thanks, > Vladimir > > On 8/5/15 9:55 AM, Dmitry Dmitriev wrote: >> Hello, >> >> Please review this fix which remove small memory leak in debug build. >> Also, I need a sponsor for this fix, who can push it. >> >> MethodHandles::verify_ref_kind contains memory leak. Memory for 'buf' >> is allocated by NEW_C_HEAP_ARRAY but not freed >> after '__ STOP(buf);'.
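Vladimir Ivanov's suggested direction (keep the message alive for as long as the code that refers to it) can be sketched in portable C++. This is an illustration under stated assumptions, not HotSpot code: `ToyCodeBlob`, `intern`, `emit` and `generate_stop` are hypothetical names, and a closure stands in for an instruction that embeds the message address. The formatting buffer is temporary; only a blob-owned copy is referenced by the emitted "code", so freeing the temporary is harmless.

```cpp
#include <cassert>
#include <cstdio>
#include <deque>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model (not HotSpot code): a code blob that owns the strings its
// generated code refers to, so their lifetime matches the code's lifetime.
class ToyCodeBlob {
 public:
  // Copies msg into blob-owned storage. std::deque never relocates existing
  // elements on emplace_back, so previously returned pointers stay valid.
  const char* intern(const char* msg) {
    assert(msg != nullptr);
    strings_.emplace_back(msg);
    return strings_.back().c_str();
  }

  // "Emits" an instruction; it only runs when the blob is executed.
  void emit(std::function<void()> insn) { code_.push_back(std::move(insn)); }

  // "Executes" the generated code.
  void run() const {
    for (const auto& insn : code_) insn();
  }

 private:
  std::deque<std::string> strings_;
  std::vector<std::function<void()>> code_;
};

// Analogue of emitting a STOP with a formatted message (hypothetical helper).
void generate_stop(ToyCodeBlob& blob, std::string* out, int ref_kind) {
  char buf[64];  // temporary formatting buffer, gone when this function returns
  std::snprintf(buf, sizeof buf, "verify_ref_kind: expected %d", ref_kind);
  const char* msg = blob.intern(buf);     // copy owned by the blob
  blob.emit([out, msg] { *out = msg; });  // code refers only to the blob's copy
}
```

Calling `generate_stop(blob, &seen, 5)` leaves `seen` empty; only `blob.run()` reads the message. For the method handle adapters discussed here, which per Vladimir Kozlov are never removed from the CodeCache, simply never freeing the buffer gives the same lifetime.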
>> >> Webrev: http://cr.openjdk.java.net/~ddmitriev/8132728/webrev.00/ >> JBS: https://bugs.openjdk.java.net/browse/JDK-8132728 >> Tested: JPRT(hotspot test set), hotspot all, vm.quick >> >> Thanks, >> Dmitry From dmitry.dmitriev at oracle.com Wed Aug 5 22:33:16 2015 From: dmitry.dmitriev at oracle.com (Dmitry Dmitriev) Date: Thu, 6 Aug 2015 01:33:16 +0300 Subject: RFR (XXS): 8132728: Memory leak in MethodHandles::verify_ref_kind function(fastdebug build) In-Reply-To: <55C28A7E.1090406@oracle.com> References: <55C24010.8030901@oracle.com> <55C24DC2.9030902@oracle.com> <55C28323.3080907@oracle.com> <55C28A7E.1090406@oracle.com> Message-ID: <55C28F2C.7090108@oracle.com> Vladimir, thank you for explanation! That makes things clear. Regards, Dmitry On 06.08.2015 1:13, Vladimir Kozlov wrote: > I don't see how debug64 is not executed if stop is called: > > void MacroAssembler::stop(const char* msg) { > address rip = pc(); > pusha(); // get regs on stack > lea(c_rarg0, ExternalAddress((address) msg)); > lea(c_rarg1, InternalAddress(rip)); > movq(c_rarg2, rsp); // pass pointer to regs array > andq(rsp, -16); // align stack as required by ABI > call(RuntimeAddress(CAST_FROM_FN_PTR(address, > MacroAssembler::debug64))); > hlt(); > } > > Looks like you misunderstand how this code works. You can't use > tty->print_cr() in these cases. It produce output when that assembler > code is *generated* and NOT when it is *executed*. > > Saying that I realized that your fix is totally wrong. Buffer > allocation happens during assembler code generation but it is used > when that code is executed. If you free it (during code generation) > you will get bad pointer during execution because corresponding memory > is freed. > > In this regards it is NOT memory leak. We need this memory during > whole run until JVM exit (end of program). > This code is used for adapter generation which are never not removed > from CodeCache. 
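The generation-time versus execution-time distinction Vladimir Kozlov describes can be made concrete with a toy model. This sketch assumes nothing about HotSpot internals: `ToyAssembler` and its log are hypothetical, and emitting an "instruction" just appends a closure that runs later. A print placed in the generator fires once, when the stub is emitted. The message pointer is dereferenced only when the stub is executed, so it must still be valid at that point.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for a macro assembler (not HotSpot code). "Emitting"
// an instruction appends a closure; nothing in the closure runs until the
// generated stub is executed.
class ToyAssembler {
 public:
  explicit ToyAssembler(std::vector<std::string>* log) : log_(log) {}

  // Analogue of MacroAssembler::stop(msg): only the pointer is baked into
  // the "code"; its characters are read when the stub runs.
  void stop(const char* msg) {
    assert(msg != nullptr);
    log_->push_back("generate: stop emitted");  // like a tty->print_cr() in the generator
    std::vector<std::string>* log = log_;
    code_.push_back([log, msg] { log->push_back(std::string("execute: ") + msg); });
  }

  // Analogue of the generated stub actually being executed.
  void run() const {
    for (const auto& insn : code_) insn();
  }

 private:
  std::vector<std::string>* log_;
  std::vector<std::function<void()>> code_;
};
```

The `generate:` entry is logged as soon as `stop()` is emitted, even if the stub never runs. This matches Dmitry's experiment, where his print in `MacroAssembler::stop` fired but the one in `debug64` never did. Because `msg` is read only inside `run()`, freeing it right after emission would leave the stub holding a dangling pointer.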
> > Regards, > Vladimir > > On 8/5/15 2:41 PM, Dmitry Dmitriev wrote: >> Hello Vladimir, >> >> Thank you for review and explanation! >> >> I looked at the code and see that code does not return from STOP and >> this block executed only when ref kind not equal to >> expected. But it is possible that debug64 will not be called and >> execution continues? For example at VM start-up? Here a >> call chain which I see: >> JVM_RegisterMethodHandleMethods->MethodHandles::generate_adapters->MethodHandlesAdapterGenerator::generate->MethodHandles::generate_method_handle_interpreter_entry->MethodHandles::verify_ref_kind >> >> >> >> For quick experiment I add tty->print_cr() to the >> MethodHandles::verify_ref_kind, MacroAssembler::stop and >> MacroAssembler::debug64 and see that block with memory allocation is >> executed in this case, stop method is called, but >> debug64 is not executed and stop successfully finished. So, it >> explains why I see memory leak... Correct me if I am >> wrong. Thanks! >> >> Dmitry >> >> On 05.08.2015 20:54, Vladimir Kozlov wrote: >>> Looks good. >>> >>> Note, it is not real memory leak - code does not return from STOP >>> call. It either produce assert and exit or wait to >>> attach debugger (ShowMessageBoxOnError). See >>> MacroAssembler::debug64() for example. >>> >>> Thanks, >>> Vladimir >>> >>> On 8/5/15 9:55 AM, Dmitry Dmitriev wrote: >>>> Hello, >>>> >>>> Please review this fix which remove small memory leak in debug >>>> build. Also, I need a sponsor for this fix, who can >>>> push it. >>>> >>>> MethodHandles::verify_ref_kind contains memory leak. Memory for >>>> 'buf' is allocated by NEW_C_HEAP_ARRAY but not freed >>>> after '__ STOP(buf);'. 
>>>> >>>> Webrev: http://cr.openjdk.java.net/~ddmitriev/8132728/webrev.00/ >>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8132728 >>>> Tested: JPRT(hotspot test set), hotspot all, vm.quick >>>> >>>> Thanks, >>>> Dmitry >> From rickard.backman at oracle.com Thu Aug 6 08:24:41 2015 From: rickard.backman at oracle.com (Rickard Bäckman) Date: Thu, 6 Aug 2015 10:24:41 +0200 Subject: [aarch64-port-dev ] RFR: AArch64: Fix error introduced into AArch64 CodeCache by commit for 8130309 In-Reply-To: <55BF538A.9080409@redhat.com> References: <55BF3450.5020008@redhat.com> <55BF3CD8.6020905@redhat.com> <55BF4A71.2040308@redhat.com> <55BF51E5.5090708@oracle.com> <55BF538A.9080409@redhat.com> Message-ID: <20150806082441.GK12948@rbackman> Looks good. On 08/03, Andrew Dinn wrote: > Hi Tobias, > > On 03/08/15 12:35, Tobias Hartmann wrote: > > thanks for fixing that! Seems like I forgot the manual aarch64 > > testing for my latest webrev.. > > > > The changes look good. I can sponsor and push them into hs-comp after > > an official reviewer approved them. > > Thanks, Tobias. > > Do we need another reviewer for an AArch64-only change? If so then could > someone from hs-comp (or a hotspot dev) volunteer -- Ed Nevill is on > holiday so we don't have another AArch64 port dev to review? > > Thanks! > > regards, > > > Andrew Dinn > ----------- > Senior Principal Software Engineer > Red Hat UK Ltd > Registered in UK and Wales under Company Registration No.
3798903 > Directors: Michael Cunningham (USA), Matt Parson (USA), Charlie Peters > (USA), Michael O'Neill (Ireland) /R From adinn at redhat.com Thu Aug 6 14:04:49 2015 From: adinn at redhat.com (Andrew Dinn) Date: Thu, 06 Aug 2015 15:04:49 +0100 Subject: [aarch64-port-dev ] RFR: AArch64: Fix error introduced into AArch64 CodeCache by commit for 8130309 In-Reply-To: <20150806082441.GK12948@rbackman> References: <55BF3450.5020008@redhat.com> <55BF3CD8.6020905@redhat.com> <55BF4A71.2040308@redhat.com> <55BF51E5.5090708@oracle.com> <55BF538A.9080409@redhat.com> <20150806082441.GK12948@rbackman> Message-ID: <55C36981.5070001@redhat.com> On 06/08/15 09:24, Rickard Bäckman wrote: > Looks good. Thanks, Rickard! regards, Andrew Dinn ----------- Senior Principal Software Engineer Red Hat UK Ltd Registered in UK and Wales under Company Registration No. 3798903 Directors: Michael Cunningham (USA), Matt Parson (USA), Charlie Peters (USA), Michael O'Neill (Ireland) From zoltan.majo at oracle.com Fri Aug 7 13:14:44 2015 From: zoltan.majo at oracle.com (Zoltán Majó) Date: Fri, 07 Aug 2015 15:14:44 +0200 Subject: [9] RFR(M): 8076373: In 32-bit VM interpreter and compiled code process signaling NaN values inconsistently Message-ID: <55C4AF44.3060907@oracle.com> Hi, please review the following patch for JDK-8076373. Bug: https://bugs.openjdk.java.net/browse/JDK-8076373 Problem: On x86_32 systems with XMM instructions available, the compilers and the interpreter behave inconsistently as far as signalling NaNs (sNaNs) are concerned. For example, the following statement: start == doubleToRawLongBits(longBitsToDouble(start)) can be true or false, assuming that the variable 'start' contains a bit pattern corresponding to a sNaN. The result is true if the statement is executed by compiled code and longBitsToDouble/doubleToRawLongBits have been replaced by compiler intrinsics.
The result is false if the native library version of the functions is used (either by compiled or by interpreted code). The inconsistency happens because the interpreter/native ABI relies on x87 instructions to process floating point numbers, whereas the compilers use XMM registers for the same purpose. x87 instructions silently convert signaling NaNs to quiet NaNs, XMM instructions preserve sNaNs. Solution: - Add intrinsics (stubs) for java.lang.Float.intBitsToFloat, java.lang.Float.floatToRawIntBits, java.lang.Double.longBitsToDouble, and java.lang.Double.doubleToRawLongBits. The stubs use XMM registers and therefore preserve sNaNs. The stubs are used by both the interpreter and the compilers. - Change the interpreter to use XMM registers instead of x87 registers to internally cache floating point values. As a result, sNaNs are preserved within the interpreter. Webrev: http://cr.openjdk.java.net/~zmajo/8076373/webrev.00/ Testing: - JPRT run, testset hotspot (including the newly added test, NaNTest.java); all tests pass; - all JTREG tests in hotspot/test on x86_32 and x86_64; all tests pass that pass with the default version of the VM. Thank you and best regards, Zoltan From vladimir.kozlov at oracle.com Fri Aug 7 19:33:14 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 7 Aug 2015 12:33:14 -0700 Subject: [9] RFR(M): 8076373: In 32-bit VM interpreter and compiled code process signaling NaN values inconsistently In-Reply-To: <55C4AF44.3060907@oracle.com> References: <55C4AF44.3060907@oracle.com> Message-ID: <55C507FA.1090507@oracle.com> I think this is good. You need second review since changes are big and complex. Thanks, Vladimir On 8/7/15 6:14 AM, Zoltán Majó wrote: > Hi, > > > please review the following patch for JDK-8076373.
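The bit-level effect Zoltan describes can be illustrated without any JVM code. This sketch assumes IEEE 754 binary64 and the x86 convention that the most significant fraction bit is the quiet bit (bit 51 for double; bit 22 for float). `roundtrip_x87_model` is a hypothetical model of an x87 load/store with the invalid-operation exception masked, not an actual x87 round trip. `0x7FF0000000000001` is a signaling NaN. The bit-preserving path returns it unchanged, while the x87 model returns the quiet NaN `0x7FF8000000000001`, which is why the `start == doubleToRawLongBits(longBitsToDouble(start))` check can come out false on that path.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// IEEE 754 binary64 layout: exponent bits 62..52, fraction bits 51..0.
// On x86 the most significant fraction bit (bit 51) is the "quiet" bit.
constexpr std::uint64_t kExpMask  = 0x7FF0000000000000ULL;
constexpr std::uint64_t kFracMask = 0x000FFFFFFFFFFFFFULL;
constexpr std::uint64_t kQuietBit = 0x0008000000000000ULL;

// A NaN has an all-ones exponent and a non-zero fraction.
bool is_nan_bits(std::uint64_t b) {
  return (b & kExpMask) == kExpMask && (b & kFracMask) != 0;
}

// A signaling NaN is a NaN whose quiet bit is clear.
bool is_signaling_nan_bits(std::uint64_t b) {
  return is_nan_bits(b) && (b & kQuietBit) == 0;
}

// Bit-preserving round trip through a double object (pure copies, no FP
// arithmetic), analogous to moving the bits through an XMM register.
std::uint64_t roundtrip_preserving(std::uint64_t b) {
  double d;
  std::memcpy(&d, &b, sizeof d);
  std::uint64_t out;
  std::memcpy(&out, &d, sizeof out);
  return out;
}

// Hypothetical model of an x87 load/store with the invalid-operation exception
// masked: a signaling NaN comes back quiet, with the quiet bit set.
std::uint64_t roundtrip_x87_model(std::uint64_t b) {
  std::uint64_t q = is_signaling_nan_bits(b) ? (b | kQuietBit) : b;
  assert(!is_signaling_nan_bits(q));
  return q;
}
```

Note that both results are still NaNs, so `Double.isNaN` cannot tell the two paths apart. Only a raw-bits comparison exposes the difference.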
> > Bug: https://bugs.openjdk.java.net/browse/JDK-8076373 > > > Problem: On x86_32 systems with XMM instructions available, the > compilers and the interpreter behave inconsistently as far as signalling > NaNs (sNaNs) are concerned. For example, the following statement: > > start == doubleToRawLongBits(longBitsToDouble(start)) > > can be true or false, assuming that the variable 'start' contains a bit > pattern corresponding to a sNaN. > > The result is true if the statement is executed by compiled code and > longBitsToDouble/doubleToRawLongBits have been replaced by compiler > intrinsics. The result is false if the native library version of the > functions is used (either by compiled or by interpreted code). > > The inconsistency happens because the interpreter/native ABI relies on > x87 instructions to process floating point numbers, whereas the > compilers use XMM registers for the same purpose. x87 instructions > silently convert signaling NaNs to quiet NaNs, XMM instructions preserve > sNaNs. > > > Solution: > - Add intrinsics (stubs) for java.lang.Float.intBitsToFloat, > java.lang.Float.floatToRawIntBits, java.lang.Double.longBitsToDouble, > and java.lang.Double.doubleToRawLongBits. The stubs use XMM registers > and therefore preserve sNaNs. The stubs are used by both the interpreter > and the compilers. > - Change the interpreter to use XMM registers instead of x87 registers > to internally cache floating point values. As a result, sNaNs are > preserved within the interpreter. > > > Webrev: > http://cr.openjdk.java.net/~zmajo/8076373/webrev.00/ > > Testing: > - JPRT run, testset hotspot (including the newly added test, > NaNTest.java); all tests pass; > - all JTREG tests in hotspot/test on x86_32 and x86_64; all tests pass > that pass with the default version of the VM.
> > Thank you and best regards, > > > Zoltan > From michael.c.berg at intel.com Fri Aug 7 20:37:54 2015 From: michael.c.berg at intel.com (Berg, Michael C) Date: Fri, 7 Aug 2015 20:37:54 +0000 Subject: [9] RFR(M): 8076373: In 32-bit VM interpreter and compiled code process signaling NaN values inconsistently In-Reply-To: <55C507FA.1090507@oracle.com> References: <55C4AF44.3060907@oracle.com> <55C507FA.1090507@oracle.com> Message-ID: Zoltan, the code looks ok. I have reviewed it in detail. Thanks, -Michael -----Original Message----- From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Vladimir Kozlov Sent: Friday, August 07, 2015 12:33 PM To: hotspot-compiler-dev at openjdk.java.net Subject: Re: [9] RFR(M): 8076373: In 32-bit VM interpreter and compiled code process signaling NaN values inconsistently I think this is good. You need second review since changes are big and complex. Thanks, Vladimir On 8/7/15 6:14 AM, Zoltán Majó wrote: > Hi, > > > please review the following patch for JDK-8076373. > > Bug: https://bugs.openjdk.java.net/browse/JDK-8076373 > > > Problem: On x86_32 systems with XMM instructions available, the > compilers and the interpreter behave inconsistently as far as > signalling NaNs (sNaNs) are concerned. For example, the following > statement: > > start == doubleToRawLongBits(longBitsToDouble(start)) > > can be true or false, assuming that the variable 'start' contains a > bit pattern corresponding to a sNaN. > > The result is true if the statement is executed by compiled code and > longBitsToDouble/doubleToRawLongBits have been replaced by compiler > intrinsics. The result is false if the native library version of the > functions is used (either by compiled or by interpreted code). > > The inconsistency happens because the interpreter/native ABI relies on > x87 instructions to process floating point numbers, whereas the > compilers use XMM registers for the same purpose.
x87 instructions > silently convert signaling NaNs to quiet NaNs, XMM instructions > preserve sNaNs. > > > Solution: > - Add intrinsics (stubs) for java.lang.Float.intBitsToFloat, > java.lang.Float.floatToRawIntBits, java.lang.Double.longBitsToDouble, > and java.lang.Double.doubleToRawLongBits. The stubs use XMM registers > and therefore preserve sNaNs. The stubs are used by both the > interpreter and the compilers. > - Change the interpreter to use XMM registers instead of x87 registers > to internally cache floating point values. As a result, sNaNs are > preserved within the interpreter. > > > Webrev: > http://cr.openjdk.java.net/~zmajo/8076373/webrev.00/ > > Testing: > - JPRT run, testset hotspot (including the newly added test, > NaNTest.java); all tests pass; > - all JTREG tests in hotspot/test on x86_32 and x86_64; all tests pass > that pass with the default version of the VM. > > Thank you and best regards, > > > Zoltan > From ahmed.khawaja at oracle.com Fri Aug 7 20:44:39 2015 From: ahmed.khawaja at oracle.com (Ahmed Khawaja) Date: Fri, 7 Aug 2015 13:44:39 -0700 Subject: Safepointing in HotSpot Message-ID: <55C518B7.3010006@oracle.com> Greetings, I am looking into when HotSpot decides to insert code for safepointing. My goal is to understand the decision process of when a safepoint is inserted and also to relay to an analysis tool that a certain instruction was inserted due to safepointing. I am looking into what criteria merit the insertion of a safepoint and how code can be optimized to avoid that. Can anyone point me in the direction of the source code in HotSpot responsible for this? I am able to identify manually the code sequences that result in a safepoint and realize they must be inserted somewhere before code motion is applied since they don't always show up as contiguous instructions. 
Thank you, Ahmed Khawaja From aleksey.shipilev at oracle.com Mon Aug 10 08:17:08 2015 From: aleksey.shipilev at oracle.com (Aleksey Shipilev) Date: Mon, 10 Aug 2015 11:17:08 +0300 Subject: [aarch64-port-dev ] RFR (S) 8131682: C1 should use multibyte nops everywhere In-Reply-To: <55B8FFA0.4070105@oracle.com> References: <55A9033C.2030302@oracle.com> <55AA0567.6070602@oracle.com> <55AD0AE0.3060803@oracle.com> <55AEAB5C.8050307@oracle.com> <55AF500B.9000505@oracle.com> <55B20A16.7020300@oracle.com> <4295855A5C1DE049A61835A1887419CC2D00A472@DEWDFEMB12A.global.corp.sap> <55B5F651.5090509@oracle.com> <55B6063B.3070604@redhat.com> <55B8DC9C.7010003@oracle.com> <55B8FFA0.4070105@oracle.com> Message-ID: <55C85E04.7060002@oracle.com> On 07/29/2015 07:30 PM, Dean Long wrote: > On 7/29/2015 7:01 AM, Aleksey Shipilev wrote: >> On 07/27/2015 01:21 PM, Andrew Haley wrote: >>> On 27/07/15 10:13, Aleksey Shipilev wrote: >>>> Thanks Goetz! Fixed the assembler_ppc.inline.hpp. >>>> >>>> Andrew/Edward, are you OK with AArch64 part? >>>> http://cr.openjdk.java.net/~shade/8131682/webrev.02/ >>> I agree that it looks good. >> So, we have reviews from Dean Long, Goetz Lindenmaier, Andrew Dinn, and >> Andrew Haley. Still no Capital (R)eviewers. >> >> Otherwise, I think we are good to go. I respinned the JPRT with >> open+closed sources, and it would seem the changes in closed sources are >> not required. > > The changes to sparc and ppc may not be required anymore. Excellent, please sponsor! http://cr.openjdk.java.net/~shade/8131682/8131682.changeset Thanks, -Aleksey
From aleksey.shipilev at oracle.com Mon Aug 10 09:13:37 2015 From: aleksey.shipilev at oracle.com (Aleksey Shipilev) Date: Mon, 10 Aug 2015 12:13:37 +0300 Subject: RFR (S) 8019968: Reference CAS induces GC store barrier even on failure In-Reply-To: <55BAE566.5020904@oracle.com> References: <55B895CB.4060105@oracle.com> <55B89BD6.7050209@redhat.com> <55B8A390.4000203@oracle.com> <55BAE566.5020904@oracle.com> Message-ID: <55C86B41.9010909@oracle.com> Hi Vladimir! On 07/31/2015 06:03 AM, Vladimir Kozlov wrote: > I think the test is wrong. It should be: > > if_then(load_store, BoolTest::eq, oldval, PROB_STATIC_FREQUENT); Um, no? I remember eyeballing the assembly to confirm this. For LS_cmpxchg, we are inlining "*boolean* cas(...)", so the load_store seems to have a boolean value, but "oldval" is oop. In other words, "load_store != 0" tests "(boolean)load_store != false". Current VM produces: 13.46% 45.85% 0x00007fa06809cdaf: lock cmpxchg %r8d,(%rdi) 14.91% 4.41% 0x00007fa06809cdb4: sete %r11b 0.07% 0x00007fa06809cdb8: movzbl %r11b,%r11d 0.06% 0x00007fa06809cdd1: test %r11d,%r11d ; CAS fail, jump to respin 0x00007fa06809cdd4: je 0x00007fa06809ccf0 Patched VM piggybacks on the same result: 1.97% 0.05% 0x00007fe618af4115: lock cmpxchg %ebx,(%r9) 50.59% 90.01% 0x00007fe618af411a: sete %r11b 0.05% 0.01% 0x00007fe618af411e: movzbl %r11b,%r11d 3.02% 1.80% 0x00007fe618af4122: test %r11d,%r11d ; CAS success, jump to store barrier 0x00007fe618af4125: jne 0x00007fe618af4070 ; CAS fail, jump to respin 0x00007fe618af412b: jmpq 0x00007fe618af4082 Your suggestion seems to ignore the test completely (GVN helped?), and while it's still technically correct with emitting the barrier always, it defeats the purpose of the change: 2.35% 4.97% 0x00007f7790aefd26: lock cmpxchg %r10d,(%rbx) 48.54% 86.32%
0x00007f7790aefd2b: mov $0x1,%eax 0.03% 0x00007f7790aefd30: je 0x00007f7790aefd3b 0x00007f7790aefd36: mov $0x0,%eax 2.16% 0.01% 0x00007f7790aefd53: cmp $0x0,%eax ; CAS fail, jump back to respin 0x00007f7790aefd56: je 0x00007f7790aefd10 ; CAS success, follow to exit Also, AFAIU, performance results would look different if we screwed the success check. But they seem to be coherent with our expectations: when CAS fails, either the conditional card marking or this change helps, and the change does not help when CAS succeeds. Thanks, -Aleksey > Thanks, > Vladimir > > On 7/29/15 2:57 AM, Aleksey Shipilev wrote: >> On 07/29/2015 12:24 PM, Andrew Dinn wrote: >>> On 29/07/15 09:58, Aleksey Shipilev wrote: >>>> I would like to suggest a fix for: >>>> https://bugs.openjdk.java.net/browse/JDK-8019968 >>> >>>> In short, current reference CAS intrinsic blindly emits >>>> post_barrier, ignoring the CAS result. In some cases, notably >>>> contended CAS spin-loops, we fail the CAS a lot, and thus go for a >>>> post_barrier excessively. Instead, we can conditionalize on the >>>> result of the store itself, and put the post_barrier only on >>>> success path: http://cr.openjdk.java.net/~shade/8019968/webrev.01/ >>> >>>> More performance results here: >>>> http://cr.openjdk.java.net/~shade/8019968/notes.txt >>> >>> Nice! The code looks fine and your test results are very convincing. >>> I'll be interested to see how this looks on AArch64. >> >> Thanks Andrew! >> >> The change passes JPRT, so AArch64 build is available. The benchmark JAR >> mentioned in the issue comments would run without intervention, taking >> around 40 minutes. You are very welcome to try, while Reviewers are >> taking a look. I can do that only next week. >> >>> That said, I am afraid you still need a Reviewer!
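The idea behind 8019968 is to emit the card-marking post-barrier only on the CAS success path. That can be modelled in a few lines of portable C++. This is a toy, not the C2 intrinsic: `ToyRegion`, `post_barrier` and the slots-per-card layout are hypothetical, and a real collector's post-barrier does more than set a byte. The sketch contrasts the unconditional barrier with the success-only variant. In a contended spin loop, most attempts fail, so the second variant avoids touching the card table on those attempts.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical toy model (not HotSpot code): reference slots plus one "card"
// byte per group of kSlotsPerCard slots.
constexpr std::size_t kSlots = 64;
constexpr std::size_t kSlotsPerCard = 8;

struct ToyRegion {
  std::array<std::atomic<int*>, kSlots> slots{};               // all nullptr initially
  std::array<unsigned char, kSlots / kSlotsPerCard> cards{};   // all clean (0)
  std::size_t barrier_count = 0;

  void post_barrier(std::size_t i) {
    assert(i < kSlots);
    cards[i / kSlotsPerCard] = 1;  // dirty the card covering slot i
    ++barrier_count;
  }

  // Unconditional variant: the barrier runs even when the CAS fails.
  bool cas_always_barrier(std::size_t i, int* expected, int* desired) {
    bool ok = slots[i].compare_exchange_strong(expected, desired);
    post_barrier(i);
    return ok;
  }

  // Success-only variant: a failed CAS stored nothing, so there is no new
  // reference for the card to record and the barrier is skipped.
  bool cas_barrier_on_success(std::size_t i, int* expected, int* desired) {
    bool ok = slots[i].compare_exchange_strong(expected, desired);
    if (ok) post_barrier(i);
    return ok;
  }
};
```

For example, a failing `cas_barrier_on_success(3, &a, &b)` on an empty slot leaves `barrier_count` at 0. The same call with `expected == nullptr` succeeds and dirties card 0.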
>> >> That reminds me I haven't spelled out what testing was done: >> >> * JPRT on all open platforms >> * Targeted benchmarks >> * Eyeballing the generated x86 assembly >> >> Thanks, >> -Aleksey >> >> From dean.long at oracle.com Mon Aug 10 19:42:48 2015 From: dean.long at oracle.com (Dean) Date: Mon, 10 Aug 2015 12:42:48 -0700 Subject: aarch64-port-dev ] RFR (S) 8131682: C1 should use multibyte nops everywhere Message-ID: I can sponsor this. How about removing the ppc, aarch64, and sparc changes and making sure it still builds? dl Aleksey Shipilev wrote: >On 07/29/2015 07:30 PM, Dean Long wrote: >> On 7/29/2015 7:01 AM, Aleksey Shipilev wrote: >>> On 07/27/2015 01:21 PM, Andrew Haley wrote: >>>> On 27/07/15 10:13, Aleksey Shipilev wrote: >>>>> Thanks Goetz! Fixed the assembler_ppc.inline.hpp. >>>>> >>>>> Andrew/Edward, are you OK with AArch64 part? >>>>> http://cr.openjdk.java.net/~shade/8131682/webrev.02/ >>>> I agree that it looks good. >>> So, we have reviews from Dean Long, Goetz Lindenmaier, Andrew Dinn, and >>> Andrew Haley. Still no Capital (R)eviewers. >>> >>> Otherwise, I think we are good to go. I respinned the JPRT with >>> open+closed sources, and it would seem the changes in closed sources are >>> not required. >> >> The changes to sparc and ppc may not be required anymore. > >Excellent, please sponsor! > http://cr.openjdk.java.net/~shade/8131682/8131682.changeset > >Thanks, >-Aleksey > > From dean.long at oracle.com Mon Aug 10 19:57:03 2015 From: dean.long at oracle.com (Dean) Date: Mon, 10 Aug 2015 12:57:03 -0700 Subject: aarch64-port-dev ] aarch64-port-dev ] RFR (S) 8131682: C1 should use multibyte nops everywhere Message-ID: Did you get a Reviewer yet? dl Dean wrote: >I can sponsor this.
How about removing the ppc, aarch64, and sparc changes and making sure it still builds? > >dl > > >Aleksey Shipilev wrote: >>On 07/29/2015 07:30 PM, Dean Long wrote: >>> On 7/29/2015 7:01 AM, Aleksey Shipilev wrote: >>>> On 07/27/2015 01:21 PM, Andrew Haley wrote: >>>>> On 27/07/15 10:13, Aleksey Shipilev wrote: >>>>>> Thanks Goetz! Fixed the assembler_ppc.inline.hpp. >>>>>> >>>>>> Andrew/Edward, are you OK with AArch64 part? >>>>>> http://cr.openjdk.java.net/~shade/8131682/webrev.02/ >>>>> I agree that it looks good. >>>> So, we have reviews from Dean Long, Goetz Lindenmaier, Andrew Dinn, and >>>> Andrew Haley. Still no Capital (R)eviewers. >>>> >>>> Otherwise, I think we are good to go. I respinned the JPRT with >>>> open+closed sources, and it would seem the changes in closed sources are >>>> not required. >>> >>> The changes to sparc and ppc may not be required anymore. >> >>Excellent, please sponsor! >> http://cr.openjdk.java.net/~shade/8131682/8131682.changeset >> >>Thanks, >>-Aleksey >> >> From vladimir.kozlov at oracle.com Tue Aug 11 02:21:33 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 10 Aug 2015 19:21:33 -0700 Subject: RFR (S) 8019968: Reference CAS induces GC store barrier even on failure In-Reply-To: <55C86B41.9010909@oracle.com> References: <55B895CB.4060105@oracle.com> <55B89BD6.7050209@redhat.com> <55B8A390.4000203@oracle.com> <55BAE566.5020904@oracle.com> <55C86B41.9010909@oracle.com> Message-ID: <55C95C2D.9050900@oracle.com> My bad, I forgot that CompareAndSwapP assembler code produces Boolean value in register. I mistook it for StorePConditional which produces flag. But I think you can get better code since you want to generate test and main point of having specialized CompareAndSwapP is to avoid test instruction. If we use StorePConditional instead of CompareAndSwapP we may remove second branch: > 0x00007fa06809cdd1: test %r11d,%r11d > ; CAS fail, jump to respin >
>                   0x00007fe618af412b: jeq 0x00007fe618af4082
>                   ; CAS success, jump to store barrier
>

But C2 changes will be much larger. We would need new Ideal::if_then()
which take in result of StorePConditional and set load_store on both
paths to 0/1.

We may need to play with probability of if_then() to get barrier in
follow code.

Thanks,
Vladimir

On 8/10/15 2:13 AM, Aleksey Shipilev wrote:
> Hi Vladimir!
>
> On 07/31/2015 06:03 AM, Vladimir Kozlov wrote:
>> I think the test is wrong. It should be:
>>
>> if_then(load_store, BoolTest::eq, oldval, PROB_STATIC_FREQUENT);
>
> Um, no? I remember eyeballing the assembly to confirm this.
>
> For LS_cmpxchg, we are inlining "*boolean* cas(...)", so the load_store
> seems to have a boolean value, but "oldval" is oop. In other words,
> "load_store != 0" tests "(boolean)load_store != false".
>
> Current VM produces:
>
>  13.46%   45.85%  0x00007fa06809cdaf: lock cmpxchg %r8d,(%rdi)
>  14.91%    4.41%  0x00007fa06809cdb4: sete %r11b
>   0.07%           0x00007fa06809cdb8: movzbl %r11b,%r11d
>
>   0.06%           0x00007fa06809cdd1: test %r11d,%r11d
>
>                   ; CAS fail, jump to respin
>                   0x00007fa06809cdd4: je 0x00007fa06809ccf0
>
>
> Patched VM piggybacks on the same result:
>
>   1.97%    0.05%  0x00007fe618af4115: lock cmpxchg %ebx,(%r9)
>  50.59%   90.01%  0x00007fe618af411a: sete %r11b
>   0.05%    0.01%  0x00007fe618af411e: movzbl %r11b,%r11d
>   3.02%    1.80%  0x00007fe618af4122: test %r11d,%r11d
>
>                   ; CAS success, jump to store barrier
>                   0x00007fe618af4125: jne 0x00007fe618af4070
>
>                   ; CAS fail, jump to respin
>                   0x00007fe618af412b: jmpq 0x00007fe618af4082
>
>
> Your suggestion seems to ignore the test completely (GVN helped?), and
> while it's still technically correct with emitting the barrier always,
> it defeats the purpose of the change:
>
>   2.35%    4.97%  0x00007f7790aefd26: lock cmpxchg %r10d,(%rbx)
>  48.54%   86.32%  0x00007f7790aefd2b: mov $0x1,%eax
>   0.03%           0x00007f7790aefd30: je 0x00007f7790aefd3b
>                   0x00007f7790aefd36: mov $0x0,%eax
>
>   2.16%    0.01%  0x00007f7790aefd53: cmp $0x0,%eax
>
>                   ; CAS fail, jump back to respin
>                   0x00007f7790aefd56: je 0x00007f7790aefd10
>
>                   ; CAS success, follow to exit
>
> Also, AFAIU, performance results would look different if we screwed the
> success check. But they seem to be coherent with our expectations: when
> CAS fails, either the conditional card marking or this change helps, and
> the change does not help when CAS succeeds.
>
> Thanks,
> -Aleksey
>
>> Thanks,
>> Vladimir
>>
>> On 7/29/15 2:57 AM, Aleksey Shipilev wrote:
>>> On 07/29/2015 12:24 PM, Andrew Dinn wrote:
>>>> On 29/07/15 09:58, Aleksey Shipilev wrote:
>>>>> I would like to suggest a fix for:
>>>>>   https://bugs.openjdk.java.net/browse/JDK-8019968
>>>>
>>>>> In short, current reference CAS intrinsic blindly emits
>>>>> post_barrier, ignoring the CAS result. In some cases, notably
>>>>> contended CAS spin-loops, we fail the CAS a lot, and thus go for a
>>>>> post_barrier excessively. Instead, we can conditionalize on the
>>>>> result of the store itself, and put the post_barrier only on
>>>>> success path: http://cr.openjdk.java.net/~shade/8019968/webrev.01/
>>>>
>>>>> More performance results here:
>>>>>   http://cr.openjdk.java.net/~shade/8019968/notes.txt
>>>>
>>>> Nice! The code looks fine and your test results are very convincing.
>>>> I'll be interested to see how this looks on AArch64.
>>>
>>> Thanks Andrew!
>>>
>>> The change passes JPRT, so AArch64 build is available. The benchmark JAR
>>> mentioned in the issue comments would run without intervention, taking
>>> around 40 minutes. You are very welcome to try, while Reviewers are
>>> taking a look. I can do that only next week.
>>>
>>>> That said, I am afraid you still need a Reviewer!
>>>
>>> That reminds me I haven't spelled out what testing was done:
>>>
>>>  * JPRT on all open platforms
>>>  * Targeted benchmarks
>>>  * Eyeballing the generated x86 assembly
>>>
>>> Thanks,
>>> -Aleksey
>>>
>>>
>
>

From aleksey.shipilev at oracle.com Tue Aug 11 09:22:58 2015
From: aleksey.shipilev at oracle.com (Aleksey Shipilev)
Date: Tue, 11 Aug 2015 12:22:58 +0300
Subject: RFR (S) 8019968: Reference CAS induces GC store barrier even on failure
In-Reply-To: <55C95C2D.9050900@oracle.com>
References: <55B895CB.4060105@oracle.com> <55B89BD6.7050209@redhat.com> <55B8A390.4000203@oracle.com> <55BAE566.5020904@oracle.com> <55C86B41.9010909@oracle.com> <55C95C2D.9050900@oracle.com>
Message-ID: <55C9BEF2.2030100@oracle.com>

Hi Vladimir,

My previous disassembly demonstrated the code generated for CAS
spinloop. There, it's easy to confuse the "second" branch with a proper
backbranch in the loop. Here is the disassembly for the "one-off"
failing CAS with patched VM:

                   0x00007fa3acba446c: lock cmpxchg %r11d,(%r10)
  46.63%   83.18%  0x00007fa3acba4471: sete %r8b
   0.03%           0x00007fa3acba4475: movzbl %r8b,%r8d
   2.23%           0x00007fa3acba4479: test %r8d,%r8d    <- removable?
                   0x00007fa3acba447c: je 0x00007fa3acba4490
                   0x00007fa3acba447e: shr $0x9,%r10
                   0x00007fa3acba4482: movabs $0x7fa3a0dbf000,%r11
                   0x00007fa3acba448c: mov %r12b,(%r11,%r10,1)
   0.93%           0x00007fa3acba4490: mov %r8d,%eax
   0.04%           0x00007fa3acba4493: add $0x20,%rsp
   1.05%           0x00007fa3acba4497: pop %rbp
   0.98%           0x00007fa3acba4498: test %eax,0x11af2b62(%rip)
                   0x00007fa3acba449e: retq

...compare this to baseline VM that does an unconditional barrier:

   2.31%    3.64%  0x00007fcf595fd4f9: lock cmpxchg %r10d,(%r11)
  43.22%   78.37%  0x00007fcf595fd4fe: sete %r8b
   0.04%           0x00007fcf595fd502: movzbl %r8b,%r8d
   2.20%           0x00007fcf595fd506: mov %r11,%r10
                   0x00007fcf595fd509: shr $0x9,%r10
                   0x00007fcf595fd50d: movabs $0x7fcf4dd0c000,%r11
                   0x00007fcf595fd517: mov %r12b,(%r11,%r10,1)
   2.20%           0x00007fcf595fd51b: mov %r8d,%eax
                   0x00007fcf595fd51e: add $0x20,%rsp
                   0x00007fcf595fd522: pop %rbp
   1.82%           0x00007fcf595fd523: test %eax,0x12383ad7(%rip)
                   0x00007fcf595fd529: retq

Well, yeah, I can see that test at 0x00007fa3acba4479 is avoidable,
since cmpxchg already sets the flag. But, I doubt it actually matters,
since:
 a) test-je are routinely macrofused into single uop on modern x86;
 b) the flag is materialized in register anyway for method return;
 c) as you predicted, my quick exploration blows up considerably;

Notably, handling native oops require missing StoreNConditionalNode,
which spreads all the way to AD and various places in compiler that
match StorePConditionalNode. Also, my naive attempts of using Ideal to
pick up StoreNConditional result and produce 0/1 yields full branches,
not the "sete" that is coming from CompareAndSwapP AD encoding -- with
terrible performance results.

With that, I think we should play it safe, and push the existing
obviously correct version that improves performance a lot, instead of
blowing up the complexity for purely theoretical improvement.

Thanks,
-Aleksey

On 08/11/2015 05:21 AM, Vladimir Kozlov wrote:
> My bad, I forgot that CompareAndSwapP assembler code produces Boolean
> value in register. I mistook it for StorePConditional which produces flag.
>
> But I think you can get better code since you want to generate test and
> main point of having specialized CompareAndSwapP is to avoid test
> instruction.
>
> If we use StorePConditional instead of CompareAndSwapP we may remove
> second branch:
>
>>                   0x00007fa06809cdd1: test %r11d,%r11d
>>                   ; CAS fail, jump to respin
>>                   0x00007fe618af412b: jeq 0x00007fe618af4082
>>                   ; CAS success, jump to store barrier
>>
>
> But C2 changes will be much larger. We would need new Ideal::if_then()
> which take in result of StorePConditional and set load_store on both
> paths to 0/1.
>
> We may need to play with probability of if_then() to get barrier in
> follow code.
>
> Thanks,
> Vladimir
>
> On 8/10/15 2:13 AM, Aleksey Shipilev wrote:
>> Hi Vladimir!
>>
>> On 07/31/2015 06:03 AM, Vladimir Kozlov wrote:
>>> I think the test is wrong. It should be:
>>>
>>> if_then(load_store, BoolTest::eq, oldval, PROB_STATIC_FREQUENT);
>>
>> Um, no? I remember eyeballing the assembly to confirm this.
>>
>> For LS_cmpxchg, we are inlining "*boolean* cas(...)", so the load_store
>> seems to have a boolean value, but "oldval" is oop. In other words,
>> "load_store != 0" tests "(boolean)load_store != false".
>>
>> Current VM produces:
>>
>>  13.46%   45.85%  0x00007fa06809cdaf: lock cmpxchg %r8d,(%rdi)
>>  14.91%    4.41%  0x00007fa06809cdb4: sete %r11b
>>   0.07%           0x00007fa06809cdb8: movzbl %r11b,%r11d
>>
>>   0.06%           0x00007fa06809cdd1: test %r11d,%r11d
>>
>>                   ; CAS fail, jump to respin
>>                   0x00007fa06809cdd4: je 0x00007fa06809ccf0
>>
>>
>> Patched VM piggybacks on the same result:
>>
>>   1.97%    0.05%  0x00007fe618af4115: lock cmpxchg %ebx,(%r9)
>>  50.59%   90.01%  0x00007fe618af411a: sete %r11b
>>   0.05%    0.01%  0x00007fe618af411e: movzbl %r11b,%r11d
>>   3.02%    1.80%  0x00007fe618af4122: test %r11d,%r11d
>>
>>                   ; CAS success, jump to store barrier
>>                   0x00007fe618af4125: jne 0x00007fe618af4070
>>
>>                   ; CAS fail, jump to respin
>>                   0x00007fe618af412b: jmpq 0x00007fe618af4082
>>
>>
>> Your suggestion seems to ignore the test completely (GVN helped?), and
>> while it's still technically correct with emitting the barrier always,
>> it defeats the purpose of the change:
>>
>>   2.35%    4.97%  0x00007f7790aefd26: lock cmpxchg %r10d,(%rbx)
>>  48.54%   86.32%  0x00007f7790aefd2b: mov $0x1,%eax
>>   0.03%           0x00007f7790aefd30: je 0x00007f7790aefd3b
>>                   0x00007f7790aefd36: mov $0x0,%eax
>>
>>   2.16%    0.01%  0x00007f7790aefd53: cmp $0x0,%eax
>>
>>                   ; CAS fail, jump back to respin
>>                   0x00007f7790aefd56: je 0x00007f7790aefd10
>>
>>                   ; CAS success, follow to exit
>>
>> Also, AFAIU, performance results would look different if we screwed the
>> success check. But they seem to be coherent with our expectations: when
>> CAS fails, either the conditional card marking or this change helps, and
>> the change does not help when CAS succeeds.
>>
>> Thanks,
>> -Aleksey
>>
>>> Thanks,
>>> Vladimir
>>>
>>> On 7/29/15 2:57 AM, Aleksey Shipilev wrote:
>>>> On 07/29/2015 12:24 PM, Andrew Dinn wrote:
>>>>> On 29/07/15 09:58, Aleksey Shipilev wrote:
>>>>>> I would like to suggest a fix for:
>>>>>>   https://bugs.openjdk.java.net/browse/JDK-8019968
>>>>>
>>>>>> In short, current reference CAS intrinsic blindly emits
>>>>>> post_barrier, ignoring the CAS result. In some cases, notably
>>>>>> contended CAS spin-loops, we fail the CAS a lot, and thus go for a
>>>>>> post_barrier excessively. Instead, we can conditionalize on the
>>>>>> result of the store itself, and put the post_barrier only on
>>>>>> success path: http://cr.openjdk.java.net/~shade/8019968/webrev.01/
>>>>>
>>>>>> More performance results here:
>>>>>>   http://cr.openjdk.java.net/~shade/8019968/notes.txt
>>>>>
>>>>> Nice! The code looks fine and your test results are very convincing.
>>>>> I'll be interested to see how this looks on AArch64.
>>>>
>>>> Thanks Andrew!
>>>>
>>>> The change passes JPRT, so AArch64 build is available. The benchmark JAR
>>>> mentioned in the issue comments would run without intervention, taking
>>>> around 40 minutes. You are very welcome to try, while Reviewers are
>>>> taking a look. I can do that only next week.
>>>>
>>>>> That said, I am afraid you still need a Reviewer!
>>>>
>>>> That reminds me I haven't spelled out what testing was done:
>>>>
>>>>  * JPRT on all open platforms
>>>>  * Targeted benchmarks
>>>>  * Eyeballing the generated x86 assembly
>>>>
>>>> Thanks,
>>>> -Aleksey
>>>>
>>>>
>>
>>

From aleksey.shipilev at oracle.com Tue Aug 11 09:27:38 2015
From: aleksey.shipilev at oracle.com (Aleksey Shipilev)
Date: Tue, 11 Aug 2015 12:27:38 +0300
Subject: aarch64-port-dev ] aarch64-port-dev ] RFR (S) 8131682: C1 should use multibyte nops everywhere
In-Reply-To:
References:
Message-ID: <55C9C00A.3040302@oracle.com>

Hi Dean,

Ah yes, since we now use MacroAssembler::align to produce the effective
alignment, we can drop the platform-specific changes. ARM and PPC ports
may rewire their own MacroAssemblers if there are potentially better nop
sequences.

New changeset:
  http://cr.openjdk.java.net/~shade/8131682/8131682.changeset

Tested it builds and runs with full JPRT. See the "Reviewed-by" line
there. I think there are Reviewers there...

Thanks,
-Aleksey

On 08/10/2015 10:57 PM, Dean wrote:
> Did you get a Reviewer yet?
>
> dl
>
>
> Dean wrote:
>> I can sponsor this. How about removing the ppc, aarch64, and sparc changes and making sure it still builds?
>>
>> dl
>>
>>
>> Aleksey Shipilev wrote:
>>> On 07/29/2015 07:30 PM, Dean Long wrote:
>>>> On 7/29/2015 7:01 AM, Aleksey Shipilev wrote:
>>>>> On 07/27/2015 01:21 PM, Andrew Haley wrote:
>>>>>> On 27/07/15 10:13, Aleksey Shipilev wrote:
>>>>>>> Thanks Goetz! Fixed the assembler_ppc.inline.hpp.
>>>>>>>
>>>>>>> Andrew/Edward, are you OK with AArch64 part?
>>>>>>> http://cr.openjdk.java.net/~shade/8131682/webrev.02/
>>>>>> I agree that it looks good.
>>>>> So, we have reviews from Dean Long, Goetz Lindenmaier, Andrew Dinn, and
>>>>> Andrew Haley. Still no Capital (R)eviewers.
>>>>>
>>>>> Otherwise, I think we are good to go. I respinned the JPRT with
>>>>> open+closed sources, and it would seem the changes in closed sources are
>>>>> not required.
>>>>
>>>> The changes to sparc and ppc may not be required anymore.
>>>
>>> Excellent, please sponsor!
>>>   http://cr.openjdk.java.net/~shade/8131682/8131682.changeset
>>>
>>> Thanks,
>>> -Aleksey
>>>
>>>

From dawid.weiss at gmail.com Tue Aug 11 14:25:48 2015
From: dawid.weiss at gmail.com (Dawid Weiss)
Date: Tue, 11 Aug 2015 16:25:48 +0200
Subject: Transient miscompilation problem on 1.8 (invalid AIOOB/NPE thrown from the method body).
Message-ID:

Hello,

We have encountered a transient miscompilation problem (on 1.8u40). We
get an AIOOB exception from a snippet of code which (provably) cannot
throw it. The AIOOB is thrown without a stack trace. What's
interesting is that when we set:

-XX:-OmitStackTraceInFastThrow

we get an NPE exception (which, again, is provably impossible at Java
code level).

The problem does not reproduce on my machine with i7 3770K (at least
so far), but does reproduce consistently on i7 2600K (and our
customer's machine; exact spec unknown).

I will be looking into isolating this issue as it is in our
proprietary code, but the pattern seems to be as follows:

1) new instance of A is created, with a new instance of B, which is a
single-implementation of interface C.

2) there is a tight loop which calls A (and B) methods.

There is no way for an AIOOB (or NPE) to be present in any of A or B,
but the stack trace indicates A.

I suspect an OSR miscompilation somewhere, but since I can't reproduce
it locally it's a bit of a problem to experiment with JVM versions and
internal flags.

Any hints on what it can be related to (flags to try, etc.) would be
appreciated.

Dawid

From edward.nevill at gmail.com Tue Aug 11 15:57:33 2015
From: edward.nevill at gmail.com (Edward Nevill)
Date: Tue, 11 Aug 2015 16:57:33 +0100
Subject: 8133352: aarch64: generates constrained unpredictable instructions
Message-ID: <1439308653.5920.16.camel@mylittlepony.linaroharston>

Hi,

Webrev http://cr.openjdk.java.net/~enevill/8133352/

fixes an issue reported by one of our partners where aarch64 jdk9 is still generating constrained unpredictable instructions.

The two cases being generated are

STXR Rs, Rt, [Rn] where Rs == Rt

and

LDAXP Rt1, Rt2, [Rn] where Rt1 == Rt2 (this case is only generated in the assembler smoke test and is not generated in real code)

On the particular vendor's HW the behavior for these instructions is to generate a SIGILL.

Unfortunately the fix for this is non-trivial, the reason being that

STXR Rs, Rt, [Rn]

requires Rs to differ from both Rt and Rn; however, we only have 2 scratch registers.

The solution I have adopted is to add an additional temp arg to the routines in macroAssembler and pass down an additional temp from the top-level usage, but this involves a non-trivial amount of code changes.

The alternative solution would be to create a temp by pushing a register on the stack.

I am in the process of testing this but I want to get people's feedback as to whether this is the right solution or whether there is some better way of doing it.

Thanks for your help,
Ed.
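The register constraint described above can be made concrete with a small operand check. This is a hypothetical C++ sketch, not HotSpot's MacroAssembler API: registers are modelled as plain integers, encoding 31 in the base field stands for SP, and treating rscratch1/rscratch2 as r8/r9 in the usage below is an assumption for the example. It encodes the rules for STXR Ws, Xt, [Xn] (the status register must differ from the data register, and from the base register unless the base is SP) and for LDAXP (the two destination registers must differ). It also shows how a pool of two scratch registers can run out when one already holds the value being stored and the other holds the address.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <optional>

// Hypothetical register model for illustration only: general-purpose
// registers are numbered 0..30, and 31 in the base-register field means SP.
using Reg = int;
constexpr Reg kSP = 31;

// STXR Ws, Xt, [Xn]: the status register s must not alias the data
// register t, nor the base register n (unless n is SP). Either alias
// makes the instruction CONSTRAINED UNPREDICTABLE.
constexpr bool stxr_operands_ok(Reg s, Reg t, Reg n) {
  return s != t && !(s == n && n != kSP);
}

// LDAXP Xt1, Xt2, [Xn]: the two destination registers must differ.
constexpr bool ldaxp_operands_ok(Reg t1, Reg t2) {
  return t1 != t2;
}

// Choose a status register for STXR from a pool of temps, skipping any temp
// that would alias the data or base register. Returns std::nullopt when no
// temp qualifies, i.e. when the caller must supply another temp (or spill one).
template <std::size_t N>
std::optional<Reg> pick_status_reg(const std::array<Reg, N>& temps, Reg t, Reg n) {
  for (Reg candidate : temps) {
    if (stxr_operands_ok(candidate, t, n)) {
      return candidate;
    }
  }
  return std::nullopt;
}
```

For example, with a pool of {8, 9} where r8 holds the value and r9 the address, `pick_status_reg` returns `std::nullopt`. Adding r10 as a third temp makes it return r10, which mirrors the extra-temp approach in the webrev; pushing a register to free it up would have the same effect in this model.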

From vladimir.kozlov at oracle.com Tue Aug 11 15:58:31 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Tue, 11 Aug 2015 08:58:31 -0700
Subject: RFR (S) 8019968: Reference CAS induces GC store barrier even on failure
In-Reply-To: <55C9BEF2.2030100@oracle.com>
References: <55B895CB.4060105@oracle.com> <55B89BD6.7050209@redhat.com> <55B8A390.4000203@oracle.com> <55BAE566.5020904@oracle.com> <55C86B41.9010909@oracle.com> <55C95C2D.9050900@oracle.com> <55C9BEF2.2030100@oracle.com>
Message-ID: <55CA1BA7.4080907@oracle.com>

Thank you for doing additional experiments, Aleksey, and explanation.
Now I agree with your changes. Reviewed.

Thanks,
Vladimir

On 8/11/15 2:22 AM, Aleksey Shipilev wrote:
> Hi Vladimir,
>
> My previous disassembly demonstrated the code generated for CAS
> spinloop. There, it's easy to confuse the "second" branch with a proper
> backbranch in the loop. Here is the disassembly for the "one-off"
> failing CAS with patched VM:
>
>                   0x00007fa3acba446c: lock cmpxchg %r11d,(%r10)
>  46.63%   83.18%  0x00007fa3acba4471: sete %r8b
>   0.03%           0x00007fa3acba4475: movzbl %r8b,%r8d
>   2.23%           0x00007fa3acba4479: test %r8d,%r8d    <- removable?
>                   0x00007fa3acba447c: je 0x00007fa3acba4490
>                   0x00007fa3acba447e: shr $0x9,%r10
>                   0x00007fa3acba4482: movabs $0x7fa3a0dbf000,%r11
>                   0x00007fa3acba448c: mov %r12b,(%r11,%r10,1)
>   0.93%           0x00007fa3acba4490: mov %r8d,%eax
>   0.04%           0x00007fa3acba4493: add $0x20,%rsp
>   1.05%           0x00007fa3acba4497: pop %rbp
>   0.98%           0x00007fa3acba4498: test %eax,0x11af2b62(%rip)
>                   0x00007fa3acba449e: retq
>
> ...compare this to baseline VM that does an unconditional barrier:
>
>   2.31%    3.64%  0x00007fcf595fd4f9: lock cmpxchg %r10d,(%r11)
>  43.22%   78.37%  0x00007fcf595fd4fe: sete %r8b
>   0.04%           0x00007fcf595fd502: movzbl %r8b,%r8d
>   2.20%           0x00007fcf595fd506: mov %r11,%r10
>                   0x00007fcf595fd509: shr $0x9,%r10
>                   0x00007fcf595fd50d: movabs $0x7fcf4dd0c000,%r11
>                   0x00007fcf595fd517: mov %r12b,(%r11,%r10,1)
>   2.20%           0x00007fcf595fd51b: mov %r8d,%eax
>                   0x00007fcf595fd51e: add $0x20,%rsp
>                   0x00007fcf595fd522: pop %rbp
>   1.82%           0x00007fcf595fd523: test %eax,0x12383ad7(%rip)
>                   0x00007fcf595fd529: retq
>
> Well, yeah, I can see that test at 0x00007fa3acba4479 is avoidable,
> since cmpxchg already sets the flag. But, I doubt it actually matters,
> since:
>  a) test-je are routinely macrofused into single uop on modern x86;
>  b) the flag is materialized in register anyway for method return;
>  c) as you predicted, my quick exploration blows up considerably;
>
> Notably, handling native oops require missing StoreNConditionalNode,
> which spreads all the way to AD and various places in compiler that
> match StorePConditionalNode. Also, my naive attempts of using Ideal to
> pick up StoreNConditional result and produce 0/1 yields full branches,
> not the "sete" that is coming from CompareAndSwapP AD encoding -- with
> terrible performance results.
>
> With that, I think we should play it safe, and push the existing
> obviously correct version that improves performance a lot, instead of
> blowing up the complexity for purely theoretical improvement.
>
> Thanks,
> -Aleksey
>
> On 08/11/2015 05:21 AM, Vladimir Kozlov wrote:
>> My bad, I forgot that CompareAndSwapP assembler code produces Boolean
>> value in register. I mistook it for StorePConditional which produces flag.
>>
>> But I think you can get better code since you want to generate test and
>> main point of having specialized CompareAndSwapP is to avoid test
>> instruction.
>>
>> If we use StorePConditional instead of CompareAndSwapP we may remove
>> second branch:
>>
>>>                   0x00007fa06809cdd1: test %r11d,%r11d
>>>                   ; CAS fail, jump to respin
>>>                   0x00007fe618af412b: jeq 0x00007fe618af4082
>>>                   ; CAS success, jump to store barrier
>>>
>>
>> But C2 changes will be much larger.
>> We would need new Ideal::if_then()
>> which take in result of StorePConditional and set load_store on both
>> paths to 0/1.
>>
>> We may need to play with probability of if_then() to get barrier in
>> follow code.
>>
>> Thanks,
>> Vladimir
>>
>> On 8/10/15 2:13 AM, Aleksey Shipilev wrote:
>>> Hi Vladimir!
>>>
>>> On 07/31/2015 06:03 AM, Vladimir Kozlov wrote:
>>>> I think the test is wrong. It should be:
>>>>
>>>> if_then(load_store, BoolTest::eq, oldval, PROB_STATIC_FREQUENT);
>>>
>>> Um, no? I remember eyeballing the assembly to confirm this.
>>>
>>> For LS_cmpxchg, we are inlining "*boolean* cas(...)", so the load_store
>>> seems to have a boolean value, but "oldval" is oop. In other words,
>>> "load_store != 0" tests "(boolean)load_store != false".
>>>
>>> Current VM produces:
>>>
>>>  13.46%   45.85%  0x00007fa06809cdaf: lock cmpxchg %r8d,(%rdi)
>>>  14.91%    4.41%  0x00007fa06809cdb4: sete %r11b
>>>   0.07%           0x00007fa06809cdb8: movzbl %r11b,%r11d
>>>
>>>   0.06%           0x00007fa06809cdd1: test %r11d,%r11d
>>>
>>>                   ; CAS fail, jump to respin
>>>                   0x00007fa06809cdd4: je 0x00007fa06809ccf0
>>>
>>>
>>> Patched VM piggybacks on the same result:
>>>
>>>   1.97%    0.05%  0x00007fe618af4115: lock cmpxchg %ebx,(%r9)
>>>  50.59%   90.01%  0x00007fe618af411a: sete %r11b
>>>   0.05%    0.01%  0x00007fe618af411e: movzbl %r11b,%r11d
>>>   3.02%    1.80%  0x00007fe618af4122: test %r11d,%r11d
>>>
>>>                   ; CAS success, jump to store barrier
>>>                   0x00007fe618af4125: jne 0x00007fe618af4070
>>>
>>>                   ; CAS fail, jump to respin
>>>                   0x00007fe618af412b: jmpq 0x00007fe618af4082
>>>
>>>
>>> Your suggestion seems to ignore the test completely (GVN helped?), and
>>> while it's still technically correct with emitting the barrier always,
>>> it defeats the purpose of the change:
>>>
>>>   2.35%    4.97%  0x00007f7790aefd26: lock cmpxchg %r10d,(%rbx)
>>>  48.54%   86.32%  0x00007f7790aefd2b: mov $0x1,%eax
>>>   0.03%           0x00007f7790aefd30: je 0x00007f7790aefd3b
>>>                   0x00007f7790aefd36: mov $0x0,%eax
>>>
>>>   2.16%    0.01%  0x00007f7790aefd53: cmp $0x0,%eax
>>>
>>>                   ; CAS fail, jump back to respin
>>>                   0x00007f7790aefd56: je 0x00007f7790aefd10
>>>
>>>                   ; CAS success, follow to exit
>>>
>>> Also, AFAIU, performance results would look different if we screwed the
>>> success check. But they seem to be coherent with our expectations: when
>>> CAS fails, either the conditional card marking or this change helps, and
>>> the change does not help when CAS succeeds.
>>>
>>> Thanks,
>>> -Aleksey
>>>
>>>> Thanks,
>>>> Vladimir
>>>>
>>>> On 7/29/15 2:57 AM, Aleksey Shipilev wrote:
>>>>> On 07/29/2015 12:24 PM, Andrew Dinn wrote:
>>>>>> On 29/07/15 09:58, Aleksey Shipilev wrote:
>>>>>>> I would like to suggest a fix for:
>>>>>>>   https://bugs.openjdk.java.net/browse/JDK-8019968
>>>>>>
>>>>>>> In short, current reference CAS intrinsic blindly emits
>>>>>>> post_barrier, ignoring the CAS result. In some cases, notably
>>>>>>> contended CAS spin-loops, we fail the CAS a lot, and thus go for a
>>>>>>> post_barrier excessively. Instead, we can conditionalize on the
>>>>>>> result of the store itself, and put the post_barrier only on
>>>>>>> success path: http://cr.openjdk.java.net/~shade/8019968/webrev.01/
>>>>>>
>>>>>>> More performance results here:
>>>>>>>   http://cr.openjdk.java.net/~shade/8019968/notes.txt
>>>>>>
>>>>>> Nice! The code looks fine and your test results are very convincing.
>>>>>> I'll be interested to see how this looks on AArch64.
>>>>>
>>>>> Thanks Andrew!
>>>>>
>>>>> The change passes JPRT, so AArch64 build is available. The benchmark JAR
>>>>> mentioned in the issue comments would run without intervention, taking
>>>>> around 40 minutes. You are very welcome to try, while Reviewers are
>>>>> taking a look. I can do that only next week.
>>>>>
>>>>>> That said, I am afraid you still need a Reviewer!
>>>>>
>>>>> That reminds me I haven't spelled out what testing was done:
>>>>>
>>>>>  * JPRT on all open platforms
>>>>>  * Targeted benchmarks
>>>>>  * Eyeballing the generated x86 assembly
>>>>>
>>>>> Thanks,
>>>>> -Aleksey
>>>>>
>>>>>
>>>
>>>
>
>

From vladimir.kozlov at oracle.com Tue Aug 11 16:55:08 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Tue, 11 Aug 2015 09:55:08 -0700
Subject: 8133352: aarch64: generates constrained unpredictable instructions
In-Reply-To: <1439308653.5920.16.camel@mylittlepony.linaroharston>
References: <1439308653.5920.16.camel@mylittlepony.linaroharston>
Message-ID: <55CA28EC.7060109@oracle.com>

I think it depends on how expensive push/pop is on arm64. In C2-generated
code you may introduce spills to the stack around GetAndAdd code since you
use an additional register (in .ad). So you are saving on stack anyway.

On the other hand, your changes (third temp) are not so big and I think
they are acceptable.

Thanks,
Vladimir

On 8/11/15 8:57 AM, Edward Nevill wrote:
> Hi,
>
> Webrev http://cr.openjdk.java.net/~enevill/8133352/
>
> fixes an issue reported by one of our partners where aarch64 jdk9 is still generating constrained unpredictable instructions.
>
> The two cases being generated are
>
> STXR Rs, Rt, [Rn] where Rs == Rt
>
> and
>
> LDAXP Rt1, Rt2, [Rn] where Rt1 == Rt2 (this case is only generated in the assembler smoke test and is not generated in real code)
>
> On the particular vendor's HW the behavior for these instructions is to generate a SIGILL.
>
> Unfortunately the fix for this is non-trivial, the reason being that
>
> STXR Rs, Rt, [Rn]
>
> requires Rs to differ from both Rt and Rn; however, we only have 2 scratch registers.
>
> The solution I have adopted is to add an additional temp arg to the routines in macroAssembler and pass down an additional temp from the top-level usage, but this involves a non-trivial amount of code changes.
>
> The alternative solution would be to create a temp by pushing a register on the stack.
>
> I am in the process of testing this but I want to get people's feedback as to whether this is the right solution or whether there is some better way of doing it.
>
> Thanks for your help,
> Ed.
>
>

From dawid.weiss at gmail.com Tue Aug 11 21:27:08 2015
From: dawid.weiss at gmail.com (Dawid Weiss)
Date: Tue, 11 Aug 2015 23:27:08 +0200
Subject: Transient miscompilation problem on 1.8 (invalid AIOOB/NPE thrown from the method body).
In-Reply-To:
References:
Message-ID:

We tried to narrow it down. The problem is tied to tiered compilation
somehow because turning it off makes the test pass with flying colors:

# 1.8.0_45-b14
PASSES -Xint
PASSES -Xmx4g -Xbatch -XX:CICompilerCount=1 -XX:-TieredCompilation
PASSES -Xmx4g -XX:-TieredCompilation
FAILS  -Xmx4g -XX:+TieredCompilation
FAILS  -Xmx4g -Xbatch -XX:CICompilerCount=2 -XX:+TieredCompilation

What's more interesting is that 1.9 and the most recent ea of 1.8 (u60)
also pass, even with tiered compilation turned on:

# 1.9.0-ea-b71
PASSES -Xmx4g -Xbatch -XX:CICompilerCount=2 -XX:+TieredCompilation

# 1.8.0_60-ea-b25
PASSES [always, regardless of options]

I can't tell whether it's something masking the original problem or
whether the bug has been fixed in between. I looked at JIRA logs, but
can't find anything specific. If somebody knows what this could be, I'd
appreciate a pointer.

Dawid

On Tue, Aug 11, 2015 at 4:25 PM, Dawid Weiss wrote:
> Hello,
>
> We have encountered a transient miscompilation problem (on 1.8u40). We
> get an AIOOB exception from a snippet of code which (provably) cannot
> throw it. The AIOOB is thrown without a stack trace. What's
> interesting is that when we set:
>
> -XX:-OmitStackTraceInFastThrow
>
> we get an NPE exception (which, again, is provably impossible at Java
> code level).
>
> The problem does not reproduce on my machine with i7 3770K (at least
> so far), but does reproduce consistently on i7 2600K (and our
> customer's machine; exact spec unknown).
>
> I will be looking into isolating this issue as it is in our
> proprietary code, but the pattern seems to be as follows:
>
> 1) new instance of A is created, with a new instance of B, which is a
> single-implementation of interface C.
>
> 2) there is a tight loop which calls A (and B) methods.
>
> There is no way for an AIOOB (or NPE) to be present in any of A or B,
> but the stack trace indicates A.
>
> I suspect an OSR miscompilation somewhere, but since I can't reproduce
> it locally it's a bit of a problem to experiment with JVM versions and
> internal flags.
>
> Any hints on what it can be related to (flags to try, etc.) would be
> appreciated.
>
> Dawid

From vladimir.kozlov at oracle.com Tue Aug 11 23:57:59 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Tue, 11 Aug 2015 16:57:59 -0700
Subject: [9] RFR (M): 8011858: Use Compile::live_nodes() instead of Compile::unique() in appropriate places
In-Reply-To: <55B9EF79.1040907@oracle.com>
References: <55A9AAB6.50505@oracle.com> <55B9EF79.1040907@oracle.com>
Message-ID: <55CA8C07.2080404@oracle.com>

I pushed changes:

http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/rev/af60f1cb36f2

Thanks,
Vladimir K

On 7/30/15 2:33 AM, Vladimir Ivanov wrote:
> Looks good.
> I'll sponsor the change.
>
> Best regards,
> Vladimir Ivanov
>
> On 7/18/15 4:24 AM, Vladimir Kozlov wrote:
>> Thank you, Vlad
>>
>> It looks good. We usually don't put bug id into comments. So your
>> previous version on cr.openjdk is fine.
>>
>> Second reviewer should look on and sponsor it with you listed as
>> contributor (I see you signed OCA already).
>>
>> Thanks,
>> Vladimir
>>
>> On 7/17/15 3:47 PM, Vlad Ureche wrote:
>>> Hi,
>>>
>>> Please review the following patch for JDK-8011858. Big thanks to
>>> Vladimir Kozlov for his patient guidance while working on this!
>>>
>>> *Bug:* https://bugs.openjdk.java.net/browse/JDK-8011858
>>>
>>> *Problem:* Throughout C2, local stacks are used to prevent recursive
>>> calls from blowing up the system stack. These are sized based on the
>>> total number of nodes in the compilation run (e.g. C->unique()).
>>> Instead, they should be sized based on the live node count
>>> (C->live_nodes()).
>>>
>>> Now, with the increased difference between live_nodes (limited at
>>> LiveNodeCountInliningCutoff, set to 40K) and unique nodes (which can go
>>> up to 240K), it is important to not over-estimate the size of stacks.
>>>
>>> *Solution:* This patch mirrors a patch written by Vladimir Kozlov for
>>> JDK8u. It replaces the initial sizes from C->unique() to
>>> C->live_nodes(), preserving any shifts (divisions) and offsets. For
>>> example, in the compile.cpp patch:
>>>
>>>   - Node_Stack nstack(unique() >> 1);
>>>   + Node_Stack nstack(live_nodes() >> 1);
>>>
>>> There is an issue described at
>>> https://bugs.openjdk.java.net/browse/JDK-8131702 where I took the
>>> workaround from Vladimir's patch.
>>>
>>> *Webrev:* http://cr.openjdk.java.net/~kvn/8011858/webrev/ or
>>> http://vladureche.ro/webrev/8011858
>>> (updated, includes a link to bug 8121702)
>>>
>>> *Tests:* Running jtreg with the compiler, runtime and gc tests on the
>>> dev branch shows the same status before and after the patch: 808 tests
>>> passed, 16 failed and 6 errors. What would be a stable point where all
>>> tests are expected to pass, so I can test the patch there? Maybe jdk9?
>>> >>> Thanks, >>> Vlad >>> From aleksey.shipilev at oracle.com Wed Aug 12 07:04:12 2015 From: aleksey.shipilev at oracle.com (Aleksey Shipilev) Date: Wed, 12 Aug 2015 10:04:12 +0300 Subject: RFR (S) 8019968: Reference CAS induces GC store barrier even on failure In-Reply-To: <55CA1BA7.4080907@oracle.com> References: <55B895CB.4060105@oracle.com> <55B89BD6.7050209@redhat.com> <55B8A390.4000203@oracle.com> <55BAE566.5020904@oracle.com> <55C86B41.9010909@oracle.com> <55C95C2D.9050900@oracle.com> <55C9BEF2.2030100@oracle.com> <55CA1BA7.4080907@oracle.com> Message-ID: <55CAEFEC.6090005@oracle.com> Thanks, Vladimir! Here's a changeset: http://cr.openjdk.java.net/~shade/8019968/8019968.changeset Please sponsor! -Aleksey On 08/11/2015 06:58 PM, Vladimir Kozlov wrote: > Thank you for doing additional experiments, Aleksey, and explanation. > Now I agree with your changes. Reviewed. > > Thanks, > Vladimir > > On 8/11/15 2:22 AM, Aleksey Shipilev wrote: >> Hi Vladimir, >> >> My previous disassembly demonstrated the code generated for CAS >> spinloop. There, it's easy to confuse the "second" branch with a proper >> backbranch in the loop. Here is the disassembly for the "one-off" >> failing CAS with patched VM: >> >> ? 0x00007fa3acba446c: lock cmpxchg %r11d,(%r10) >> 46.63% 83.18% ? 0x00007fa3acba4471: sete %r8b >> 0.03% ? 0x00007fa3acba4475: movzbl %r8b,%r8d >> 2.23% ? 0x00007fa3acba4479: test %r8d,%r8d <- removable? >> ?? 0x00007fa3acba447c: je 0x00007fa3acba4490 >> ?? 0x00007fa3acba447e: shr $0x9,%r10 >> ?? 0x00007fa3acba4482: movabs $0x7fa3a0dbf000,%r11 >> ?? 0x00007fa3acba448c: mov %r12b,(%r11,%r10,1) >> 0.93% ?? 0x00007fa3acba4490: mov %r8d,%eax >> 0.04% ? 0x00007fa3acba4493: add $0x20,%rsp >> 1.05% ? 0x00007fa3acba4497: pop %rbp >> 0.98% ? 0x00007fa3acba4498: test %eax,0x11af2b62(%rip) >> ? >> ? 0x00007fa3acba449e: retq >> >> ...compare this to baseline VM that does an unconditional barrier: >> >> 2.31% 3.64% ? 
0x00007fcf595fd4f9: lock cmpxchg %r10d,(%r11) >> 43.22% 78.37% 0x00007fcf595fd4fe: sete %r8b >> 0.04% 0x00007fcf595fd502: movzbl %r8b,%r8d >> 2.20% 0x00007fcf595fd506: mov %r11,%r10 >> 0x00007fcf595fd509: shr $0x9,%r10 >> 0x00007fcf595fd50d: movabs $0x7fcf4dd0c000,%r11 >> 0x00007fcf595fd517: mov %r12b,(%r11,%r10,1) >> 2.20% 0x00007fcf595fd51b: mov %r8d,%eax >> 0x00007fcf595fd51e: add $0x20,%rsp >> 0x00007fcf595fd522: pop %rbp >> 1.82% 0x00007fcf595fd523: test %eax,0x12383ad7(%rip) >> 0x00007fcf595fd529: retq >> >> Well, yeah, I can see that test at 0x00007fa3acba4479 is avoidable, >> since cmpxchg already sets the flag. But, I doubt it actually matters, >> since: >> a) test-je are routinely macrofused into single uop on modern x86; >> b) the flag is materialized in register anyway for method return; >> c) as you predicted, my quick exploration blows up considerably; >> >> Notably, handling native oops require missing StoreNConditionalNode, >> which spreads all the way to AD and various places in compiler that >> match StorePConditionalNode. Also, my naive attempts of using Ideal to >> pick up StoreNConditional result and produce 0/1 yields full branches, >> not the "sete" that is coming from CompareAndSwapP AD encoding -- with >> terrible performance results. >> >> With that, I think we should play it safe, and push the existing >> obviously correct version that improves performance a lot, instead of >> blowing up the complexity for purely theoretical improvement. >> >> Thanks, >> -Aleksey >> >> On 08/11/2015 05:21 AM, Vladimir Kozlov wrote: >>> My bad, I forgot that CompareAndSwapP assembler code produces Boolean >>> value in register. I mistook it for StorePConditional which produces >>> flag. >>> >>> But I think you can get better code since you want to generate test and >>> main point of having specialized CompareAndSwapP is to avoid test >>> instruction.
>>> >>> If we use StorePConditional instead of CompareAndSwapP we may remove >>> second branch: >>> >>>> 0x00007fa06809cdd1: test %r11d,%r11d >>>> ; CAS fail, jump to respin >>>> 0x00007fe618af412b: jeq 0x00007fe618af4082 >>>> ; CAS success, jump to store barrier >>>> >>> >>> But C2 changes will be much larger. We would need new Ideal::if_then() >>> which take in result of StorePConditional and set load_store on both >>> paths to 0/1. >>> >>> We may need to play with probability of if_then() to get barrier in >>> follow code. >>> >>> Thanks, >>> Vladimir >>> >>> On 8/10/15 2:13 AM, Aleksey Shipilev wrote: >>>> Hi Vladimir! >>>> >>>> On 07/31/2015 06:03 AM, Vladimir Kozlov wrote: >>>>> I think the test is wrong. It should be: >>>>> >>>>> if_then(load_store, BoolTest::eq, oldval, PROB_STATIC_FREQUENT); >>>> >>>> Um, no? I remember eyeballing the assembly to confirm this. >>>> >>>> For LS_cmpxchg, we are inlining "*boolean* cas(...)", so the load_store >>>> seems to have a boolean value, but "oldval" is oop. In other words, >>>> "load_store != 0" tests "(boolean)load_store != false". >>>> >>>> Current VM produces: >>>> >>>> 13.46% 45.85% 0x00007fa06809cdaf: lock cmpxchg %r8d,(%rdi) >>>> 14.91% 4.41% 0x00007fa06809cdb4: sete %r11b >>>> 0.07% 0x00007fa06809cdb8: movzbl %r11b,%r11d >>>> >>>> 0.06% 0x00007fa06809cdd1: test %r11d,%r11d >>>> >>>> ; CAS fail, jump to respin >>>> 0x00007fa06809cdd4: je >>>> 0x00007fa06809ccf0 >>>> >>>> >>>> Patched VM piggybacks on the same result: >>>> >>>> 1.97% 0.05% 0x00007fe618af4115: lock cmpxchg %ebx,(%r9) >>>> 50.59% 90.01% 0x00007fe618af411a: sete %r11b >>>> 0.05% 0.01% 0x00007fe618af411e: movzbl %r11b,%r11d >>>> 3.02% 1.80% 0x00007fe618af4122: test %r11d,%r11d >>>> >>>> ; CAS success, jump to store barrier >>>> 0x00007fe618af4125: jne 0x00007fe618af4070 >>>> >>>> ; CAS fail, jump to respin >>>>
0x00007fe618af412b: jmpq 0x00007fe618af4082 >>>> >>>> >>>> Your suggestion seems to ignore the test completely (GVN helped?), and >>>> while it's still technically correct with emitting the barrier always, >>>> it defeats the purpose of the change: >>>> >>>> 2.35% 4.97% 0x00007f7790aefd26: lock cmpxchg %r10d,(%rbx) >>>> 48.54% 86.32% 0x00007f7790aefd2b: mov $0x1,%eax >>>> 0.03% 0x00007f7790aefd30: je 0x00007f7790aefd3b >>>> 0x00007f7790aefd36: mov $0x0,%eax >>>> >>>> 2.16% 0.01% 0x00007f7790aefd53: cmp $0x0,%eax >>>> >>>> ; CAS fail, jump back to respin >>>> 0x00007f7790aefd56: je 0x00007f7790aefd10 >>>> >>>> ; CAS success, follow to exit >>>> >>>> Also, AFAIU, performance results would look different if we screwed the >>>> success check. But they seem to be coherent with our expectations: when >>>> CAS fails, either the conditional card marking or this change helps, >>>> and >>>> the change does not help when CAS succeeds. >>>> >>>> Thanks, >>>> -Aleksey >>>> >>>>> Thanks, >>>>> Vladimir >>>>> >>>>> On 7/29/15 2:57 AM, Aleksey Shipilev wrote: >>>>>> On 07/29/2015 12:24 PM, Andrew Dinn wrote: >>>>>>> On 29/07/15 09:58, Aleksey Shipilev wrote: >>>>>>>> I would like to suggest a fix for: >>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8019968 >>>>>>> >>>>>>>> In short, current reference CAS intrinsic blindly emits >>>>>>>> post_barrier, ignoring the CAS result. In some cases, notably >>>>>>>> contended CAS spin-loops, we fail the CAS a lot, and thus go for a >>>>>>>> post_barrier excessively. Instead, we can conditionalize on the >>>>>>>> result of the store itself, and put the post_barrier only on >>>>>>>> success path: http://cr.openjdk.java.net/~shade/8019968/webrev.01/ >>>>>>> >>>>>>>> More performance results here: >>>>>>>> http://cr.openjdk.java.net/~shade/8019968/notes.txt >>>>>>> >>>>>>> Nice! The code looks fine and your test results are very convincing. >>>>>>> I'll be interested to see how this looks on AArch64.
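A toy C++ model of the change proposed for JDK-8019968, for illustration only: the card mark (post_barrier) is applied only on the CAS success path, so a failing CAS in a contended spin-loop no longer dirties a card. It assumes a byte-per-card table with 512-byte cards (suggested by the `shr $0x9` in the listings) and 0 as the dirty value; both are assumptions. `ToyCardTable` and `cas_with_conditional_barrier` are hypothetical names, not HotSpot code.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical card table: one byte per 512-byte card of a toy "heap".
class ToyCardTable {
 public:
  static constexpr int kCardShift = 9;               // 512-byte cards (assumption)
  enum : std::uint8_t { kClean = 0xff, kDirty = 0 };  // 0 == dirty (assumption)

  ToyCardTable(std::uintptr_t heap_base, std::size_t heap_bytes)
      : base_(heap_base), cards_((heap_bytes >> kCardShift) + 1, kClean) {}

  void mark(const void* field) { cards_[index(field)] = kDirty; }
  bool is_dirty(const void* field) const { return cards_[index(field)] == kDirty; }

  std::size_t dirty_count() const {
    std::size_t n = 0;
    for (std::uint8_t c : cards_) {
      if (c == kDirty) ++n;
    }
    return n;
  }

 private:
  std::size_t index(const void* field) const {
    return (reinterpret_cast<std::uintptr_t>(field) - base_) >> kCardShift;
  }
  std::uintptr_t base_;
  std::vector<std::uint8_t> cards_;
};

// Reference CAS with the post barrier conditionalized on the CAS result.
// The pre-fix shape would call ct.mark(field) unconditionally.
template <typename T>
bool cas_with_conditional_barrier(std::atomic<T*>* field, T* expected,
                                  T* new_value, ToyCardTable& ct) {
  bool ok = field->compare_exchange_strong(expected, new_value);
  if (ok) {
    ct.mark(field);  // only a successful store installs a new reference
  }
  return ok;
}
```

The sketch models only a card-marking post barrier; collector-specific details such as G1's pre-barrier are deliberately left out.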
>>>>>> >>>>>> Thanks Andrew! >>>>>> >>>>>> The change passes JPRT, so AArch64 build is available. The benchmark >>>>>> JAR >>>>>> mentioned in the issue comments would run without intervention, >>>>>> taking >>>>>> around 40 minutes. You are very welcome to try, while Reviewers are >>>>>> taking a look. I can do that only next week. >>>>>> >>>>>>> That said, I am afraid you still need a Reviewer! >>>>>> >>>>>> That reminds me I haven't spelled out what testing was done: >>>>>> >>>>>> * JPRT on all open platforms >>>>>> * Targeted benchmarks >>>>>> * Eyeballing the generated x86 assembly >>>>>> >>>>>> Thanks, >>>>>> -Aleksey >>>>>> >>>>>> >>>> >>>> >> >> From adinn at redhat.com Wed Aug 12 08:14:07 2015 From: adinn at redhat.com (Andrew Dinn) Date: Wed, 12 Aug 2015 09:14:07 +0100 Subject: RFR (S) 8019968: Reference CAS induces GC store barrier even on failure In-Reply-To: <55CAEFEC.6090005@oracle.com> References: <55B895CB.4060105@oracle.com> <55B89BD6.7050209@redhat.com> <55B8A390.4000203@oracle.com> <55BAE566.5020904@oracle.com> <55C86B41.9010909@oracle.com> <55C95C2D.9050900@oracle.com> <55C9BEF2.2030100@oracle.com> <55CA1BA7.4080907@oracle.com> <55CAEFEC.6090005@oracle.com> Message-ID: <55CB004F.9030903@redhat.com> On 12/08/15 08:04, Aleksey Shipilev wrote: > Thanks, Vladimir! > > Here's a changeset: > http://cr.openjdk.java.net/~shade/8019968/8019968.changeset > > Please sponsor! The patch is fine by me but I think you still need another (capital R) Reviewer. regards, Andrew Dinn ----------- Senior Principal Software Engineer Red Hat UK Ltd Registered in UK and Wales under Company Registration No.
3798903 Directors: Michael Cunningham (USA), Matt Parson (USA), Charlie Peters (USA), Michael O'Neill (Ireland) From dawid.weiss at gmail.com Wed Aug 12 11:09:47 2015 From: dawid.weiss at gmail.com (Dawid Weiss) Date: Wed, 12 Aug 2015 13:09:47 +0200 Subject: Transient miscompilation problem on 1.8 (invalid AIOOB/NPE thrown from the method body). In-Reply-To: References: Message-ID: FYI. Found it by bisecting hotspot changes and recompiling in fastdebug. The problem is present consistently before this commit: $ hg log -r 7381 changeset: 7381:03596ae35800 user: aeriksso date: Thu May 21 16:49:11 2015 +0200 summary: 8060036: C2: CmpU nodes can end up with wrong type information I cannot explain why -XX:-TieredCompilation helps here, perhaps it collects different stats and the compilation graph is different (?). In any case, the bug issue [1] has incorrect "Affect" field of "8u60"; should be at least "8x45", perhaps lower than that (and a related bug [2] has it set correctly). Dawid [1] https://bugs.openjdk.java.net/browse/JDK-8060036 [2] https://bugs.openjdk.java.net/browse/JDK-8080156 On Tue, Aug 11, 2015 at 11:27 PM, Dawid Weiss wrote: > We tried to narrow it down.
The problem is tied to tiered compilation > somehow because turning it off makes the test pass with flying colors: > > # 1.8.0_45-b14 > PASSES -Xint > PASSES -Xmx4g -Xbatch -XX:CICompilerCount=1 -XX:-TieredCompilation > PASSES -Xmx4g -XX:-TieredCompilation > FAILS -Xmx4g -XX:+TieredCompilation > FAILS -Xmx4g -Xbatch -XX:CICompilerCount=2 -XX:+TieredCompilation > > What's more interesting is that 1.9 and the most recent ea of 1.8 > (u60) also pass, even with tiered compilation turned on: > > # 1.9.0-ea-b71 > PASSES -Xmx4g -Xbatch -XX:CICompilerCount=2 -XX:+TieredCompilation > > # 1.8.0_60-ea-b25 > PASSES [always, regardless of options] > > I can't tell whether it's something masking the original problem or > whether the bug has been fixed in between. I looked at JIRA logs, but > can't find anything specific. If somebody knows what this could be, > I'd appreciate a pointer. > > Dawid > > On Tue, Aug 11, 2015 at 4:25 PM, Dawid Weiss wrote: >> Hello, >> >> We have encountered a transient miscompilation problem (on 1.8u40). We >> get an AIOOB exception from a snippet of code which (provably) cannot >> throw it. The AIOOB is thrown without a stack trace. What's >> interesting is that when we set: >> >> -XX:-OmitStackTraceInFastThrow >> >> we get an NPE exception (which, again, is provably impossible at Java >> code level). >> >> The problem does not reproduce on my machine with i7 3770K (at least >> so far), but does reproduce consistently on i7 2600K (and our >> customer's machine; exact spec unknown). >> >> I will be looking into isolating this issue as it is in our >> proprietary code, but the pattern seems to be as follows: >> >> 1) new instance of A is created, with a new instance of B, which is a >> single-implementation of interface C. >> >> 2) there is a tight loop which calls A (and B) methods. >> >> There is no way for an AIOOB (or NPE) to be present in any of A or B, >> but the stack trace indicates A. 
>> >> I suspect an OSR miscompilation somewhere, but since I can't reproduce >> it locally it's a bit of a problem to experiment with JVM versions and >> internal flags. >> >> Any hints on what it can be related to (flags to try, etc.) would be >> appreciated. >> >> Dawid From adinn at redhat.com Wed Aug 12 12:45:32 2015 From: adinn at redhat.com (Andrew Dinn) Date: Wed, 12 Aug 2015 13:45:32 +0100 Subject: RFR: 8078743: AARCH64: Extend use of stlr to cater for volatile object stores In-Reply-To: <55BA78B7.7030300@oracle.com> References: <55B8B134.2050803@redhat.com> <55BA78B7.7030300@oracle.com> Message-ID: <55CB3FEC.1070709@redhat.com> Hi Vladimir, Apologies for the delay in responding to your feedback -- I was traveling for a team meeting all of last week. Here is a revised webrev which includes all the code changes you suggested http://cr.openjdk.java.net/~adinn/8078743/webrev.04 Also, as requested I did some testing on the two AArch64 machines to which I have access. Does it help? Short answer: yes it is well worth doing as it causes no harm on the sort of architecture where you would expect no benefit and helps a lot on the sort of architecture where you would expect it to help. More details below. regards, Andrew Dinn ----------- The Tests --------- I ran some simple tests using the jmh micro-benchmark harness, first using the old style dmb based implementation (i.e. passing -XX:+UseBarriersForVolatile) and then using the new style stlr-based implementation (using -XX:-UseBarriersForVolatile). Each test was run in each of the 5 relevant GC configs: +G1GC; +CMS +UseCondCardMark; +CMS -UseCondCardMark; +Par +UseCondCardMark; +Par -UseCondCardMark. The tests were derived from Alexey Shipilev's recently posted CAS test, tweaked to do volatile stores instead of CASes. Each test employs a single thread which repeatedly writes a volatile field (AtomicReference