From ysr1729 at gmail.com Sat Aug 1 00:39:03 2015
From: ysr1729 at gmail.com (ysr1729 at gmail.com)
Date: Fri, 31 Jul 2015 17:39:03 -0700
Subject: Cost of single-threaded nmethod hotness updates at each safepoint (in JDK 8)
In-Reply-To:
References: <55BBC1C0.3030709@oracle.com> <55BBE883.1080308@oracle.com>
Message-ID: <9B8024CB-7044-4E97-96B4-C44147C1FE1B@gmail.com>

Hi Vitaly -- Which jdk 8 version were you testing? It's a bit of the proverbial curate's egg at the moment (albeit not in the original sense, I assure you!) but if I may be allowed to mix my metaphors, I would be inclined not to throw out the baby with the bath water, yet. There are services that have seen benefits and some that haven't, and the picture overall is still a bit fuzzy. Maybe someone out there has done a more disciplined epidemiological study...

PS: a couple of services were running tiered when it wasn't the default (in jdk 7)...

-- ramki

Sent from my iPhone

> On Jul 31, 2015, at 3:08 PM, Vitaly Davidovich wrote:
>
> Ramki, are you actually seeing better peak perf with tiered than C2? I experimented with it on a real workload and it was a net loss for peak perf (anywhere from 8-20% worse than C2, but also quite unstable); this was with a very large code cache to play it safe, but no other tuning.
>
> sent from my phone
>
>> On Jul 31, 2015 6:02 PM, "Srinivas Ramakrishna" wrote:
>> OK, will do and add you as watcher; thanks Vladimir! (don't yet know if with tiered and a necessarily bounded, if large, code cache whether flushing will in fact eventually become necessary, wrt yr suggested temporary workaround.)
>>
>> Have a good weekend!
>> -- ramki
>>
>>> On Fri, Jul 31, 2015 at 2:28 PM, Vladimir Kozlov wrote:
>>> Got it. Yes, it is issue with thousands java threads.
>>> You are the first pointing this problem. File bug on compiler. We will look what we can do. Most likely we need parallelize this work.
>>>
>>> Method's hotness is used only for UseCodeCacheFlushing.
>>> You can try to guard Threads::nmethods_do(&set_hotness_closure); with this flag and switch it off.
>>>
>>> We need mark_as_seen_on_stack so leave it.
>>>
>>> Thanks,
>>> Vladimir
>>>
>>>
>>>> On 7/31/15 11:48 AM, Srinivas Ramakrishna wrote:
>>>> Hi Vladimir --
>>>>
>>>> I noticed the increase even with Initial and Reserved set to the default
>>>> of 240 MB, but actual usage much lower (less than a quarter).
>>>>
>>>> Look at this code path. Note that this is invoked at every safepoint
>>>> (although it says "periodically" in the comment).
>>>> In the mark_active_nmethods() method, there's a thread iteration in both
>>>> branches of the if. I haven't checked to
>>>> see which of the two was the culprit here, yet (if either).
>>>>
>>>> // Various cleaning tasks that should be done periodically at safepoints
>>>> void SafepointSynchronize::do_cleanup_tasks() {
>>>>   ....
>>>>   {
>>>>     TraceTime t4("mark nmethods", TraceSafepointCleanupTime);
>>>>     NMethodSweeper::mark_active_nmethods();
>>>>   }
>>>>   ..
>>>> }
>>>>
>>>> void NMethodSweeper::mark_active_nmethods() {
>>>>   ...
>>>>   if (!sweep_in_progress()) {
>>>>     _seen = 0;
>>>>     _sweep_fractions_left = NmethodSweepFraction;
>>>>     _current = CodeCache::first_nmethod();
>>>>     _traversals += 1;
>>>>     _total_time_this_sweep = Tickspan();
>>>>
>>>>     if (PrintMethodFlushing) {
>>>>       tty->print_cr("### Sweep: stack traversal %d", _traversals);
>>>>     }
>>>>     Threads::nmethods_do(&mark_activation_closure);
>>>>
>>>>   } else {
>>>>     // Only set hotness counter
>>>>     Threads::nmethods_do(&set_hotness_closure);
>>>>   }
>>>>
>>>>   OrderAccess::storestore();
>>>> }
>>>>
>>>>
>>>> On Fri, Jul 31, 2015 at 11:43 AM, Vladimir Kozlov wrote:
>>>>
>>>>     Hi Ramki,
>>>>
>>>>     Did you fill up CodeCache?
>>>>     It starts scanning aggressively only with a full CodeCache:
>>>>
>>>>     // Force stack scanning if there is only 10% free space in the code cache.
>>>>     // We force stack scanning only non-profiled code heap gets full, since critical
>>>>     // allocation go to the non-profiled heap and we must be make sure that there is
>>>>     // enough space.
>>>>     double free_percent = 1 / CodeCache::reverse_free_ratio(CodeBlobType::MethodNonProfiled) * 100;
>>>>     if (free_percent <= StartAggressiveSweepingAt) {
>>>>       do_stack_scanning();
>>>>     }
>>>>
>>>>     Vladimir
>>>>
>>>>     On 7/31/15 11:33 AM, Srinivas Ramakrishna wrote:
>>>>
>>>>         Yes.
>>>>
>>>>         On Fri, Jul 31, 2015 at 11:31 AM, Vitaly Davidovich wrote:
>>>>
>>>>             Ramki, are you running tiered compilation?
>>>>
>>>>             sent from my phone
>>>>
>>>>             On Jul 31, 2015 2:19 PM, "Srinivas Ramakrishna" wrote:
>>>>
>>>>                 Hello GC and Compiler teams!
>>>>
>>>>                 One of our services that runs with several thousand threads
>>>>                 recently noticed an increase in safepoint stop times, but not
>>>>                 gc times, upon transitioning to JDK 8.
>>>>
>>>>                 Further investigation revealed that most of the delta was
>>>>                 related to the so-called pre-gc/vmop "cleanup" phase when
>>>>                 various book-keeping activities are performed, and more
>>>>                 specifically in the portion that walks java thread stacks
>>>>                 single-threaded (!) and updates the hotness counters for the
>>>>                 active nmethods. This code appears to be new to JDK 8 (in
>>>>                 jdk 7 one would walk the stacks only during code cache sweeps).
>>>>
>>>>                 I have two questions:
>>>>                 (1) has anyone else (typically, I'd expect applications with
>>>>                     many hundreds or thousands of threads) noticed this regression?
>>>>                 (2) Can we do better, for example, by:
>>>>                     (a) doing these updates by walking thread stacks in
>>>>                         multiple worker threads in parallel, or best of all:
>>>>                     (b) doing these updates when we walk the thread stacks
>>>>                         during GC, and skipping this phase entirely for non-GC
>>>>                         safepoints (with attendant loss in frequency of this
>>>>                         update in low GC frequency scenarios).
>>>>
>>>>                 It seems kind of silly to do GC's with many multiple worker
>>>>                 threads, but do these thread stack walks single-threaded when
>>>>                 it is embarrassingly parallel (one could predicate the
>>>>                 parallelization based on the measured stack sizes and thread
>>>>                 population, if there was concern on the overhead of activating
>>>>                 and deactivating the thread gangs for the work).
>>>>
>>>>                 A followup question: Any guesses as to how code cache
>>>>                 sweep/eviction quality might be compromised if one were to
>>>>                 dispense with these hotness updates entirely (or at a much
>>>>                 reduced frequency), as a temporary workaround to the
>>>>                 performance problem?
>>>>
>>>>                 Thoughts/Comments? In particular, has this issue been
>>>>                 addressed perhaps in newer JVMs?
>>>>
>>>>                 Thanks for any comments, feedback, pointers!
>>>>                 -- ramki
>>>>
>>>>                 PS: for comparison, here's data with +TraceSafepointCleanup
>>>>                 from JDK 7 (first, where this isn't done) vs JDK 8 (where this
>>>>                 is done) with a program that has a few thousands of threads:
>>>>
>>>>                 JDK 7:
>>>>                 ..
>>>>                 2827.308: [sweeping nmethods, 0.0000020 secs]
>>>>                 2828.679: [sweeping nmethods, 0.0000030 secs]
>>>>                 2829.984: [sweeping nmethods, 0.0000030 secs]
>>>>                 2830.956: [sweeping nmethods, 0.0000030 secs]
>>>>                 ..
>>>>
>>>>                 JDK 8:
>>>>                 ..
>>>>                 7368.634: [mark nmethods, 0.0177030 secs]
>>>>                 7369.587: [mark nmethods, 0.0178305 secs]
>>>>                 7370.479: [mark nmethods, 0.0180260 secs]
>>>>                 7371.503: [mark nmethods, 0.0186494 secs]
>>>>                 ..
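
Option (2a) in the message above, walking the thread stacks with several workers instead of one, can be illustrated with a toy model. This is only a sketch under stated assumptions: ToyNMethod, ToyJavaThread, walk_range and set_hotness_parallel are hypothetical names that do not exist in HotSpot, the "stack" is just a vector of pointers, and the hotness update is modelled as storing a caller-supplied value. It says nothing about how HotSpot would actually schedule such work at a safepoint.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical stand-in for a compiled method; not a HotSpot type.
struct ToyNMethod {
    std::atomic<int> hotness{0};
};

// Hypothetical stand-in for a Java thread: the compiled methods on its stack.
struct ToyJavaThread {
    std::vector<ToyNMethod*> frames;
};

// Walk the stacks of threads[begin, end) and store reset_value into the
// hotness counter of every method found there.
static void walk_range(const std::vector<ToyJavaThread>& threads,
                       std::size_t begin, std::size_t end, int reset_value) {
    assert(begin <= end && end <= threads.size());
    for (std::size_t i = begin; i < end; ++i) {
        for (ToyNMethod* nm : threads[i].frames) {
            // Several workers may hit the same method; the atomic store keeps
            // that race benign in this model.
            nm->hotness.store(reset_value, std::memory_order_relaxed);
        }
    }
}

// Split the thread list into contiguous chunks and walk each chunk in its
// own worker. A worker count of 0 is treated as 1.
void set_hotness_parallel(const std::vector<ToyJavaThread>& threads,
                          unsigned workers, int reset_value) {
    if (workers == 0) workers = 1;
    const std::size_t n = threads.size();
    const std::size_t chunk = (n + workers - 1) / workers;
    std::vector<std::thread> gang;
    for (unsigned w = 0; w < workers; ++w) {
        const std::size_t begin = std::min(n, w * chunk);
        const std::size_t end = std::min(n, begin + chunk);
        if (begin == end) break;  // no work left for further workers
        gang.emplace_back(walk_range, std::cref(threads), begin, end, reset_value);
    }
    for (std::thread& t : gang) t.join();
}
```

The chunking is static and by thread count only; the message above suggests that a real implementation might instead decide whether to parallelize at all based on measured stack sizes and thread population, which this sketch does not attempt.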
From vitalyd at gmail.com Sat Aug 1 02:17:29 2015
From: vitalyd at gmail.com (Vitaly Davidovich)
Date: Fri, 31 Jul 2015 22:17:29 -0400
Subject: Cost of single-threaded nmethod hotness updates at each safepoint (in JDK 8)
In-Reply-To: <9B8024CB-7044-4E97-96B4-C44147C1FE1B@gmail.com>
References: <55BBC1C0.3030709@oracle.com> <55BBE883.1080308@oracle.com> <9B8024CB-7044-4E97-96B4-C44147C1FE1B@gmail.com>
Message-ID:

Hi Ramki,

That experiment was performed on 7u60, not 8; I may revisit this with 8 or perhaps wait for segregated code cache to be available before trying again. One thing that worried me was the tuning aspect of tiered, which is a bit opaque as compared to, say, GC logs - it's a bit too black boxey for me. Also, the servers I was running this on have tightly chosen cpu affinity masks and there aren't many spare cores to dedicate to C1 and C2 compiler threads. But, I may look at this again in the near future.

sent from my phone

On Jul 31, 2015 8:39 PM, wrote:
> Hi Vitaly -- Which jdk 8 version were you testing? It's a bit of the
> proverbial curate's egg at the moment (albeit not in the original sense, I
> assure you!) but if I may be allowed to mix my metaphors, I would be
> inclined not to throw out the baby with the bath water, yet. There are
> services that have seen benefits and some that haven't, and the picture
> overall is still a bit fuzzy. Maybe someone out there has done a more
> disciplined epidemiological study...
>
> PS: a couple of services were running tiered when it wasn't the default
> (in jdk 7)...
>
> -- ramki
>
> Sent from my iPhone
>
> On Jul 31, 2015, at 3:08 PM, Vitaly Davidovich wrote:
>
> Ramki, are you actually seeing better peak perf with tiered than C2? I
> experimented with it on a real workload and it was a net loss for peak perf
> (anywhere from 8-20% worse than C2, but also quite unstable); this was with
> a very large code cache to play it safe, but no other tuning.

From filipp.zhinkin at gmail.com Sun Aug 2 14:10:44 2015
From: filipp.zhinkin at gmail.com (Filipp Zhinkin)
Date: Sun, 2 Aug 2015 17:10:44 +0300
Subject: RFR (S): 8067014: LinearScan::is_sorted significantly slows down fastdebug builds' performance
In-Reply-To:
References: <54F99281.7020101@oracle.com>
Message-ID:

ping

On Mon, Mar 23, 2015 at 1:40 PM, Filipp Zhinkin wrote:
> Hi all,
>
> sorry for a late reply.
>
> I don't think that it's possible to remove the is_sorted assertion from
> create_unhandled_lists, because it's a crucial condition for the linear
> scan allocation algorithm and it's pretty easy to break it incidentally.
> The existing assertion could significantly reduce the time required to
> locate an issue when something goes wrong.
>
> However, I believe that it could be relaxed to check only that
> intervals in the _sorted_intervals list are actually ordered and that
> the _new_intervals_from_allocation list is empty (in the sorting methods
> we will still be verifying that the sorted and unsorted lists contain the
> same intervals).
>
> What do you guys think about that?
>
> Regards,
> Filipp.
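
The relaxation proposed in the message above (check only that the sorted list is ordered and that the list of newly allocated intervals is empty) might look roughly like the following. This is a minimal sketch, not C1 code: ToyInterval, is_ordered_by_from and relaxed_precondition are hypothetical names, and ordering by a single `from` field is an assumption made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a C1 interval; only the start position is modelled.
struct ToyInterval {
    int from;
};

// Cheap check: a single linear pass over the sorted list, with no
// cross-checking against the unsorted list.
bool is_ordered_by_from(const std::vector<const ToyInterval*>& sorted) {
    for (std::size_t i = 1; i < sorted.size(); ++i) {
        assert(sorted[i - 1] != nullptr && sorted[i] != nullptr);
        if (sorted[i - 1]->from > sorted[i]->from) return false;
    }
    return true;
}

// Relaxed precondition in the spirit of the proposal: nothing is pending in
// the "new intervals" list, and the sorted list really is ordered.
bool relaxed_precondition(const std::vector<const ToyInterval*>& sorted,
                          const std::vector<const ToyInterval*>& new_from_allocation) {
    return new_from_allocation.empty() && is_ordered_by_from(sorted);
}
```

The point of the design is cost: this check is linear in the number of sorted intervals, whereas also verifying that the sorted and unsorted lists hold the same intervals costs more, and per the proposal that stricter check would remain in the sorting methods only.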
>
>
> On Fri, Mar 6, 2015 at 9:24 PM, Filipp Zhinkin wrote:
>> Hi Aleksey,
>>
>> thanks for looking at it!
>>
>> On Fri, Mar 6, 2015 at 2:41 PM, Aleksey Shipilev
>> wrote:
>>> Hi Filipp,
>>>
>>> On 06.03.2015 14:33, Filipp Zhinkin wrote:
>>>> In certain cases (like -client -Xcomp) C1 compilation is very slow
>>>> w/ fastdebug builds. A place where we spend an enormous amount of time
>>>> is the LinearScan::is_sorted method, which simply verifies that a list
>>>> that should be sorted is actually sorted and that both sorted and
>>>> unsorted lists contain the same intervals.
>>>
>>> Okay, what caller of is_sorted dominates? Maybe instead of optimizing
>>> the is_sorted itself, you need to move/relax the assert in some selected
>>> places?
>>
>> Well, the dominating caller is LinearScan::create_unhandled_lists [1].
>>
>>>
>>> That is to say I am not fond of complicating the non-product code that
>>> does verification without a compelling reason to do so; let's first
>>> figure out if we "just" do excess asserts.
>>
>> That's a good point. I'll try to figure out if the assertion is placed to be
>> sure that all methods are called in the right sequence and, if that's true,
>> then it may be better to use a less expensive approach.
>>
>> Thanks,
>> Filipp.
>>
>> [1] http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/de7ca28f8b7d/src/share/vm/c1/c1_LinearScan.cpp#l1486
>>
>>>
>>> Thanks,
>>> -Aleksey.
>>>

From ysr1729 at gmail.com Sun Aug 2 18:11:31 2015
From: ysr1729 at gmail.com (Srinivas Ramakrishna)
Date: Sun, 2 Aug 2015 11:11:31 -0700
Subject: Cost of single-threaded nmethod hotness updates at each safepoint (in JDK 8)
In-Reply-To: <55BBE883.1080308@oracle.com>
References: <55BBC1C0.3030709@oracle.com> <55BBE883.1080308@oracle.com>
Message-ID:

I filed: https://bugs.openjdk.java.net/browse/JDK-8132849

thanks!
-- ramki

On Fri, Jul 31, 2015 at 2:28 PM, Vladimir Kozlov wrote:
> Got it. Yes, it is issue with thousands java threads.
> You are the first pointing this problem.

From zoltan.majo at oracle.com Mon Aug 3 07:22:38 2015
From: zoltan.majo at oracle.com (Zoltán Majó)
Date: Mon, 03 Aug 2015 09:22:38 +0200
Subject: [9] RFR(S): 8132457: Unify command-line flags controlling the usage of compiler intrinsics
In-Reply-To: <55BB91B4.805@oracle.com>
References: <55BB479E.8000402@oracle.com> <55BB91B4.805@oracle.com>
Message-ID: <55BF16BE.7010001@oracle.com>

Thank you, Vladimir!

Best regards,

Zoltan

On 07/31/2015 05:18 PM, Vladimir Kozlov wrote:
> Very nice cleanup. Thank you, Zoltan.
>
> Vladimir
>
> On 7/31/15 3:02 AM, Zoltán Majó wrote:
>> Hi,
>>
>>
>> please review the following patch for JDK-8132457.
>>
>> Bug: https://bugs.openjdk.java.net/browse/JDK-8132457
>>
>> Problem: There are four cases when flags controlling intrinsics for
>> C1 and C2 behave inconsistently:
>> 1) The DisableIntrinsic flag is C2-specific.
>> 2) The InlineNatives flag disables most but not all intrinsics. Some
>>    intrinsics (implemented by both C1 and C2) are turned off by
>>    -XX:-InlineNatives for C1 but are left on for C2.
>> 3) The _getClass intrinsic (implemented by both C1 and C2) is turned
>>    off by -XX:-InlineClassNatives for C1 and is left unaffected for C2.
>> 4) The _loadfence, _storefence, _fullfence, _compareAndSwapObject,
>>    _compareAndSwapLong, and _compareAndSwapInt intrinsics are turned
>>    off by -XX:-InlineUnsafeOps for C2 and are unaffected for C1.
>>
>>
>> Solution: Unify command-line flags controlling intrinsic processing.
>> Processing of command-line flags is now done only
>> in vmIntrinsics::is_disabled_by_flags and there is no
>> compiler-specific flag processing.
>>
>> The inconsistencies listed in the problem description were addressed
>> the following way:
>> 1) Extend the C1 compiler to consider the DisableIntrinsic flag when
>>    checking if an intrinsic is available.
>> 2) -XX:-InlineNatives turns off most intrinsics but leaves on some
>>    intrinsics (the same set of intrinsics is left on
>>    for both C1 and C2).
>> 3) -XX:-InlineClassNatives turns off the _getClass intrinsic for both
>>    C1 and C2.
>> 4) -XX:-InlineUnsafeOps turns off the _loadfence, _storefence,
>>    _fullfence, _compareAndSwapObject, _compareAndSwapLong,
>>    and _compareAndSwapInt intrinsics for both C1 and C2.
>>
>> Webrev:
>> http://cr.openjdk.java.net/~zmajo/8132457/webrev.00/
>>
>> Testing:
>> - JPRT run, testset hotspot, all tests pass;
>> - all JTREG tests in hotspot/test, all tests pass;
>> - local testing of DisableIntrinsic with both C1 and C2.
>>
>> Thank you and best regards,
>>
>>
>> Zoltan
>>

From adinn at redhat.com Mon Aug 3 09:28:48 2015
From: adinn at redhat.com (Andrew Dinn)
Date: Mon, 03 Aug 2015 10:28:48 +0100
Subject: RFR: AArch64: Fix error introduced into AArch64 CodeCache by commit for 8130309
Message-ID: <55BF3450.5020008@redhat.com>

The following /AArch64-only/ webrev fixes some problems introduced into
the AArch64 codecache routines by the recent fix for JDK-8130309
committed to hs-comp

http://cr.openjdk.java.net/~adinn/8132875/webrev.00/

With this patch the hs-comp tree compiles and runs correctly on AArch64.
Reviews welcome.

regards,


Andrew Dinn
-----------

From aph at redhat.com Mon Aug 3 10:05:12 2015
From: aph at redhat.com (Andrew Haley)
Date: Mon, 03 Aug 2015 11:05:12 +0100
Subject: [aarch64-port-dev ] RFR: AArch64: Fix error introduced into AArch64 CodeCache by commit for 8130309
In-Reply-To: <55BF3450.5020008@redhat.com>
References: <55BF3450.5020008@redhat.com>
Message-ID: <55BF3CD8.6020905@redhat.com>

On 03/08/15 10:28, Andrew Dinn wrote:
> With this patch the hs-comp tree compiles and runs correctly on AArch64.
> Reviews welcome.

That looks right to me.

Thanks,
Andrew.

From adinn at redhat.com Mon Aug 3 11:03:13 2015
From: adinn at redhat.com (Andrew Dinn)
Date: Mon, 03 Aug 2015 12:03:13 +0100
Subject: [aarch64-port-dev ] RFR: AArch64: Fix error introduced into AArch64 CodeCache by commit for 8130309
In-Reply-To: <55BF3CD8.6020905@redhat.com>
References: <55BF3450.5020008@redhat.com> <55BF3CD8.6020905@redhat.com>
Message-ID: <55BF4A71.2040308@redhat.com>

On 03/08/15 11:05, Andrew Haley wrote:
> On 03/08/15 10:28, Andrew Dinn wrote:
>> With this patch the hs-comp tree compiles and runs correctly on AArch64.
>> Reviews welcome.
>
> That looks right to me.

Thanks for the review.

Could someone from the compiler team with the relevant access right also
please review and then sponsor this patch for inclusion into hs-comp? It
would be good to get this fix into that repo before the original patch
goes up into jdk9. Thanks.

regards,


Andrew Dinn
-----------

From tobias.hartmann at oracle.com Mon Aug 3 11:35:01 2015
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Mon, 03 Aug 2015 13:35:01 +0200
Subject: [aarch64-port-dev ] RFR: AArch64: Fix error introduced into AArch64 CodeCache by commit for 8130309
In-Reply-To: <55BF4A71.2040308@redhat.com>
References: <55BF3450.5020008@redhat.com> <55BF3CD8.6020905@redhat.com> <55BF4A71.2040308@redhat.com>
Message-ID: <55BF51E5.5090708@oracle.com>

Hi Andrew,

thanks for fixing that! Seems like I forgot the manual aarch64 testing for my latest webrev..

The changes look good. I can sponsor and push them into hs-comp after an official reviewer approved them.

Best,
Tobias

On 03.08.2015 13:03, Andrew Dinn wrote:
> On 03/08/15 11:05, Andrew Haley wrote:
>> On 03/08/15 10:28, Andrew Dinn wrote:
>>> With this patch the hs-comp tree compiles and runs correctly on AArch64.
>>> Reviews welcome.
>>
>> That looks right to me.
>
> Thanks for the review.
>
> Could someone from the compiler team with the relevant access right also
> please review and then sponsor this patch for inclusion into hs-comp?
> It would be good to get this fix into that repo before the original patch
> goes up into jdk9. Thanks.
>
> regards,
>
>
> Andrew Dinn
> -----------
>

From adinn at redhat.com Mon Aug 3 11:42:02 2015
From: adinn at redhat.com (Andrew Dinn)
Date: Mon, 03 Aug 2015 12:42:02 +0100
Subject: [aarch64-port-dev ] RFR: AArch64: Fix error introduced into AArch64 CodeCache by commit for 8130309
In-Reply-To: <55BF51E5.5090708@oracle.com>
References: <55BF3450.5020008@redhat.com> <55BF3CD8.6020905@redhat.com> <55BF4A71.2040308@redhat.com> <55BF51E5.5090708@oracle.com>
Message-ID: <55BF538A.9080409@redhat.com>

Hi Tobias,

On 03/08/15 12:35, Tobias Hartmann wrote:
> thanks for fixing that! Seems like I forgot the manual aarch64
> testing for my latest webrev..
>
> The changes look good. I can sponsor and push them into hs-comp after
> an official reviewer approved them.

Thanks, Tobias.

Do we need another reviewer for an AArch64-only change? If so then could
someone from hs-comp (or a hotspot dev) volunteer -- Ed Nevill is on
holiday so we don't have another AArch64 port dev to review?

Thanks!

regards,


Andrew Dinn
-----------
Senior Principal Software Engineer
Red Hat UK Ltd
Registered in UK and Wales under Company Registration No. 3798903
Directors: Michael Cunningham (USA), Matt Parson (USA), Charlie Peters
(USA), Michael O'Neill (Ireland)

From tobias.hartmann at oracle.com Mon Aug 3 13:20:46 2015
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Mon, 03 Aug 2015 15:20:46 +0200
Subject: [aarch64-port-dev ] RFR: AArch64: Fix error introduced into AArch64 CodeCache by commit for 8130309
In-Reply-To: <55BF538A.9080409@redhat.com>
References: <55BF3450.5020008@redhat.com> <55BF3CD8.6020905@redhat.com> <55BF4A71.2040308@redhat.com> <55BF51E5.5090708@oracle.com> <55BF538A.9080409@redhat.com>
Message-ID: <55BF6AAE.2030101@oracle.com>

On 03.08.2015 13:42, Andrew Dinn wrote:
> Hi Tobias,
>
> On 03/08/15 12:35, Tobias Hartmann wrote:
>> thanks for fixing that!
Seems like I forgot the manual aarch64 >> testing for my latest webrev.. >> >> The changes look good. I can sponsor and push them into hs-comp after >> an official reviewer approved them. > > Thanks, Tobias. > > Do we need another reviewer for an AArch64-only change? If so then could > someone from hs-comp (or a hotspot dev) volunteer -- Ed Nevill is on > holiday so we don't have another AArch64 port dev to review? I think we need at least one JDK 9 reviewer (I'm not an official reviewer). Best, Tobias > > Thanks! > > regards, > > > Andrew Dinn > ----------- > Senior Principal Software Engineer > Red Hat UK Ltd > Registered in UK and Wales under Company Registration No. 3798903 > Directors: Michael Cunningham (USA), Matt Parson (USA), Charlie Peters > (USA), Michael O'Neill (Ireland) > From vladimir.kozlov at oracle.com Mon Aug 3 16:07:03 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 3 Aug 2015 09:07:03 -0700 Subject: RFR: AArch64: Fix error introduced into AArch64 CodeCache by commit for 8130309 In-Reply-To: <55BF3450.5020008@redhat.com> References: <55BF3450.5020008@redhat.com> Message-ID: <55BF91A7.7030608@oracle.com> Looks good. Thanks, Vladimir On 8/3/15 2:28 AM, Andrew Dinn wrote: > The following /AArch64-only/ webrev fixes some problems introduced into > the AArch64 codecache routines by the recent fix for JDK-8130309 > committed to to hs-comp > > http://cr.openjdk.java.net/~adinn/8132875/webrev.00/ > > With this patch the hs-comp tree compiles and runs correctly on AArch64. > Reviews welcome. > > regards, > > > Andrew Dinn > ----------- > From dmitry.dmitriev at oracle.com Wed Aug 5 16:55:44 2015 From: dmitry.dmitriev at oracle.com (Dmitry Dmitriev) Date: Wed, 5 Aug 2015 19:55:44 +0300 Subject: RFR (XXS): 8132728: Memory leak in MethodHandles::verify_ref_kind function(fastdebug build) Message-ID: <55C24010.8030901@oracle.com> Hello, Please review this fix which remove small memory leak in debug build. 
Also, I need a sponsor for this fix, who can push it.

MethodHandles::verify_ref_kind contains memory leak. Memory for 'buf' is allocated by NEW_C_HEAP_ARRAY but not freed after '__ STOP(buf);'.

Webrev: http://cr.openjdk.java.net/~ddmitriev/8132728/webrev.00/
JBS: https://bugs.openjdk.java.net/browse/JDK-8132728
Tested: JPRT(hotspot test set), hotspot all, vm.quick

Thanks,
Dmitry

From vladimir.kozlov at oracle.com Wed Aug 5 17:54:10 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Wed, 5 Aug 2015 10:54:10 -0700
Subject: RFR (XXS): 8132728: Memory leak in MethodHandles::verify_ref_kind function(fastdebug build)
In-Reply-To: <55C24010.8030901@oracle.com>
References: <55C24010.8030901@oracle.com>
Message-ID: <55C24DC2.9030902@oracle.com>

Looks good.

Note, it is not real memory leak - code does not return from STOP call. It either produce assert and exit or wait to attach debugger (ShowMessageBoxOnError). See MacroAssembler::debug64() for example.

Thanks,
Vladimir

From adinn at redhat.com Wed Aug 5 19:55:26 2015
From: adinn at redhat.com (Andrew Dinn)
Date: Wed, 05 Aug 2015 20:55:26 +0100
Subject: RFR: AArch64: Fix error introduced into AArch64 CodeCache by commit for 8130309
In-Reply-To: <55BF91A7.7030608@oracle.com>
References: <55BF3450.5020008@redhat.com> <55BF91A7.7030608@oracle.com>
Message-ID: <55C26A2E.1060902@redhat.com>

On 03/08/15 17:07, Vladimir Kozlov wrote:
> Looks good.
Thanks for the review Vladimir (and apologies for the delay in replying -- I was traveling for a meeting).

regards,


Andrew Dinn
-----------

From dmitry.dmitriev at oracle.com Wed Aug 5 21:41:55 2015
From: dmitry.dmitriev at oracle.com (Dmitry Dmitriev)
Date: Thu, 6 Aug 2015 00:41:55 +0300
Subject: RFR (XXS): 8132728: Memory leak in MethodHandles::verify_ref_kind function(fastdebug build)
In-Reply-To: <55C24DC2.9030902@oracle.com>
References: <55C24010.8030901@oracle.com> <55C24DC2.9030902@oracle.com>
Message-ID: <55C28323.3080907@oracle.com>

Hello Vladimir,

Thank you for review and explanation!

I looked at the code and see that code does not return from STOP and this block executed only when ref kind not equal to expected. But it is possible that debug64 will not be called and execution continues? For example at VM start-up? Here a call chain which I see:
JVM_RegisterMethodHandleMethods->MethodHandles::generate_adapters->MethodHandlesAdapterGenerator::generate->MethodHandles::generate_method_handle_interpreter_entry->MethodHandles::verify_ref_kind

For quick experiment I add tty->print_cr() to the MethodHandles::verify_ref_kind, MacroAssembler::stop and MacroAssembler::debug64 and see that block with memory allocation is executed in this case, stop method is called, but debug64 is not executed and stop successfully finished. So, it explains why I see memory leak... Correct me if I am wrong. Thanks!

Dmitry

From vladimir.kozlov at oracle.com Wed Aug 5 22:13:18 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Wed, 5 Aug 2015 15:13:18 -0700
Subject: RFR (XXS): 8132728: Memory leak in MethodHandles::verify_ref_kind function(fastdebug build)
In-Reply-To: <55C28323.3080907@oracle.com>
References: <55C24010.8030901@oracle.com> <55C24DC2.9030902@oracle.com> <55C28323.3080907@oracle.com>
Message-ID: <55C28A7E.1090406@oracle.com>

I don't see how debug64 is not executed if stop is called:

void MacroAssembler::stop(const char* msg) {
  address rip = pc();
  pusha(); // get regs on stack
  lea(c_rarg0, ExternalAddress((address) msg));
  lea(c_rarg1, InternalAddress(rip));
  movq(c_rarg2, rsp); // pass pointer to regs array
  andq(rsp, -16); // align stack as required by ABI
  call(RuntimeAddress(CAST_FROM_FN_PTR(address, MacroAssembler::debug64)));
  hlt();
}

Looks like you misunderstand how this code works. You can't use tty->print_cr() in these cases. It produce output when that assembler code is *generated* and NOT when it is *executed*.

Saying that I realized that your fix is totally wrong. Buffer allocation happens during assembler code generation but it is used when that code is executed. If you free it (during code generation) you will get bad pointer during execution because corresponding memory is freed.

In this regards it is NOT memory leak.
We need this memory during whole run until JVM exit (end of program).
This code is used for adapter generation which are never removed from CodeCache.

Regards,
Vladimir

From vladimir.x.ivanov at oracle.com Wed Aug 5 22:17:41 2015
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Thu, 06 Aug 2015 01:17:41 +0300
Subject: RFR (XXS): 8132728: Memory leak in MethodHandles::verify_ref_kind function(fastdebug build)
In-Reply-To: <55C24DC2.9030902@oracle.com>
References: <55C24010.8030901@oracle.com> <55C24DC2.9030902@oracle.com>
Message-ID: <55C28B85.9090704@oracle.com>

Don't we reference freed memory from generated code after this fix?

stop() doesn't copy the message, but uses it as is:

void MacroAssembler::stop(const char* msg) {
  ExternalAddress message((address)msg);
  // push address of message
  pushptr(message.addr());
  ...
}

So, JVM can print garbage when hitting STOP if the memory was reused.

A proper fix would be to store the message somewhere in corresponding nmethod.

Best regards,
Vladimir Ivanov

From dmitry.dmitriev at oracle.com Wed Aug 5 22:33:16 2015
From: dmitry.dmitriev at oracle.com (Dmitry Dmitriev)
Date: Thu, 6 Aug 2015 01:33:16 +0300
Subject: RFR (XXS): 8132728: Memory leak in MethodHandles::verify_ref_kind function(fastdebug build)
In-Reply-To: <55C28A7E.1090406@oracle.com>
References: <55C24010.8030901@oracle.com> <55C24DC2.9030902@oracle.com> <55C28323.3080907@oracle.com> <55C28A7E.1090406@oracle.com>
Message-ID: <55C28F2C.7090108@oracle.com>

Vladimir, thank you for explanation! That makes things clear.

Regards,
Dmitry

From rickard.backman at oracle.com Thu Aug 6 08:24:41 2015
From: rickard.backman at oracle.com (Rickard Bäckman)
Date: Thu, 6 Aug 2015 10:24:41 +0200
Subject: [aarch64-port-dev ] RFR: AArch64: Fix error introduced into AArch64 CodeCache by commit for 8130309
In-Reply-To: <55BF538A.9080409@redhat.com>
References: <55BF3450.5020008@redhat.com> <55BF3CD8.6020905@redhat.com> <55BF4A71.2040308@redhat.com> <55BF51E5.5090708@oracle.com> <55BF538A.9080409@redhat.com>
Message-ID: <20150806082441.GK12948@rbackman>

Looks good.

/R

From adinn at redhat.com Thu Aug 6 14:04:49 2015
From: adinn at redhat.com (Andrew Dinn)
Date: Thu, 06 Aug 2015 15:04:49 +0100
Subject: [aarch64-port-dev ] RFR: AArch64: Fix error introduced into AArch64 CodeCache by commit for 8130309
In-Reply-To: <20150806082441.GK12948@rbackman>
References: <55BF3450.5020008@redhat.com> <55BF3CD8.6020905@redhat.com> <55BF4A71.2040308@redhat.com> <55BF51E5.5090708@oracle.com> <55BF538A.9080409@redhat.com> <20150806082441.GK12948@rbackman>
Message-ID: <55C36981.5070001@redhat.com>

On 06/08/15 09:24, Rickard Bäckman wrote:
> Looks good.

Thanks, Rickard!

regards,


Andrew Dinn
-----------

From zoltan.majo at oracle.com Fri Aug 7 13:14:44 2015
From: zoltan.majo at oracle.com (Zoltán Majó)
Date: Fri, 07 Aug 2015 15:14:44 +0200
Subject: [9] RFR(M): 8076373: In 32-bit VM interpreter and compiled code process signaling NaN values inconsistently
Message-ID: <55C4AF44.3060907@oracle.com>

Hi,


please review the following patch for JDK-8076373.

Bug: https://bugs.openjdk.java.net/browse/JDK-8076373


Problem: On x86_32 systems with XMM instructions available, the compilers and the interpreter behave inconsistently as far as signalling NaNs (sNaNs) are concerned. For example, the following statement

  start == doubleToRawLongBits(longBitsToDouble(start))

can be true or false, assuming that the variable 'start' contains a bit pattern corresponding to a sNaN.

The result is true if the statement is executed by compiled code and longBitsToDouble/doubleToRawLongBits have been replaced by compiler intrinsics.
The result is false if the native library version of the functions is used (either by compiled or by interpreted code).

The inconsistency happens because the interpreter/native ABI relies on x87 instructions to process floating point numbers, whereas the compilers use XMM registers for the same purpose. x87 instructions silently convert signaling NaNs to quiet NaNs, XMM instructions preserve sNaNs.


Solution:
- Add intrinsics (stubs) for java.lang.Float.intBitsToFloat, java.lang.Float.floatToRawIntBits, java.lang.Double.longBitsToDouble, and java.lang.Double.doubleToRawLongBits. The stubs use XMM registers and therefore preserve sNaNs. The stubs are used by both the interpreter and the compilers.
- Change the interpreter to use XMM registers instead of x87 registers to internally cache floating point values. As a result, sNaNs are preserved within the interpreter.


Webrev:
http://cr.openjdk.java.net/~zmajo/8076373/webrev.00/

Testing:
- JPRT run, testset hotspot (including the newly added test, NaNTest.java); all tests pass;
- all JTREG tests in hotspot/test on x86_32 and x86_64; all tests pass that pass with the default version of the VM.

Thank you and best regards,


Zoltan

From vladimir.kozlov at oracle.com Fri Aug 7 19:33:14 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Fri, 7 Aug 2015 12:33:14 -0700
Subject: [9] RFR(M): 8076373: In 32-bit VM interpreter and compiled code process signaling NaN values inconsistently
In-Reply-To: <55C4AF44.3060907@oracle.com>
References: <55C4AF44.3060907@oracle.com>
Message-ID: <55C507FA.1090507@oracle.com>

I think this is good. You need second review since changes are big and complex.

Thanks,
Vladimir

From michael.c.berg at intel.com Fri Aug 7 20:37:54 2015
From: michael.c.berg at intel.com (Berg, Michael C)
Date: Fri, 7 Aug 2015 20:37:54 +0000
Subject: [9] RFR(M): 8076373: In 32-bit VM interpreter and compiled code process signaling NaN values inconsistently
In-Reply-To: <55C507FA.1090507@oracle.com>
References: <55C4AF44.3060907@oracle.com> <55C507FA.1090507@oracle.com>
Message-ID:

Zoltan, the code looks ok. I have reviewed it in detail.

Thanks,
-Michael

From ahmed.khawaja at oracle.com Fri Aug 7 20:44:39 2015
From: ahmed.khawaja at oracle.com (Ahmed Khawaja)
Date: Fri, 7 Aug 2015 13:44:39 -0700
Subject: Safepointing in HotSpot
Message-ID: <55C518B7.3010006@oracle.com>

Greetings,

I am looking into when HotSpot decides to insert code for safepointing. My goal is to understand the decision process of when a safepoint is inserted and also to relay to an analysis tool that a certain instruction was inserted due to safepointing. I am looking into what criteria merit the insertion of a safepoint and how code can be optimized to avoid that. Can anyone point me in the direction of the source code in HotSpot responsible for this? I am able to identify manually the code sequences that result in a safepoint and realize they must be inserted somewhere before code motion is applied since they don't always show up as contiguous instructions.
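
To make the question above concrete, here is a minimal Java sketch of two loop shapes that are often compared when looking for safepoint polls in generated code. It rests on an assumption that is not confirmed anywhere in this thread: that C2 of this era treats a loop with an int induction variable and constant stride as a "counted loop" and may drop the poll on its back-edge, while a loop with a long induction variable keeps it. The class and method names are hypothetical and exist only for illustration.

```java
// Hypothetical illustration; none of these names come from HotSpot.
public class SafepointLoopShapes {

    // int induction variable, constant stride.
    // Assumption: C2 may recognise this as a counted loop and omit the
    // safepoint poll on the loop back-edge.
    static long sumIntCounted(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += i;
        }
        return sum;
    }

    // long induction variable.
    // Assumption: not recognised as a counted loop by JDK 8/9-era C2, so a
    // poll is expected to stay on the back-edge.
    static long sumLongCounted(long n) {
        long sum = 0;
        for (long i = 0; i < n; i++) {
            sum += i;
        }
        return sum;
    }

    public static void main(String[] args) {
        // Both loops compute the same value; only the generated code may differ.
        System.out.println(sumIntCounted(1_000) == sumLongCounted(1_000L));
    }
}
```

One way to compare the two methods would be to run with -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly (which needs the hsdis disassembler plugin) and look at the loop back-edges; what exactly shows up there depends on the VM version and flags.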

Thank you,
Ahmed Khawaja

From aleksey.shipilev at oracle.com Mon Aug 10 08:17:08 2015
From: aleksey.shipilev at oracle.com (Aleksey Shipilev)
Date: Mon, 10 Aug 2015 11:17:08 +0300
Subject: [aarch64-port-dev ] RFR (S) 8131682: C1 should use multibyte nops everywhere
In-Reply-To: <55B8FFA0.4070105@oracle.com>
References: <55A9033C.2030302@oracle.com> <55AA0567.6070602@oracle.com> <55AD0AE0.3060803@oracle.com> <55AEAB5C.8050307@oracle.com> <55AF500B.9000505@oracle.com> <55B20A16.7020300@oracle.com> <4295855A5C1DE049A61835A1887419CC2D00A472@DEWDFEMB12A.global.corp.sap> <55B5F651.5090509@oracle.com> <55B6063B.3070604@redhat.com> <55B8DC9C.7010003@oracle.com> <55B8FFA0.4070105@oracle.com>
Message-ID: <55C85E04.7060002@oracle.com>

On 07/29/2015 07:30 PM, Dean Long wrote:
> On 7/29/2015 7:01 AM, Aleksey Shipilev wrote:
>> On 07/27/2015 01:21 PM, Andrew Haley wrote:
>>> On 27/07/15 10:13, Aleksey Shipilev wrote:
>>>> Thanks Goetz! Fixed the assembler_ppc.inline.hpp.
>>>>
>>>> Andrew/Edward, are you OK with AArch64 part?
>>>> http://cr.openjdk.java.net/~shade/8131682/webrev.02/
>>> I agree that it looks good.
>> So, we have reviews from Dean Long, Goetz Lindenmaier, Andrew Dinn, and
>> Andrew Haley. Still no Capital (R)eviewers.
>>
>> Otherwise, I think we are good to go. I respinned the JPRT with
>> open+closed sources, and it would seem the changes in closed sources are
>> not required.
>
> The changes to sparc and ppc may not be required anymore.

Excellent, please sponsor!
  http://cr.openjdk.java.net/~shade/8131682/8131682.changeset

Thanks,
-Aleksey

From aleksey.shipilev at oracle.com Mon Aug 10 09:13:37 2015
From: aleksey.shipilev at oracle.com (Aleksey Shipilev)
Date: Mon, 10 Aug 2015 12:13:37 +0300
Subject: RFR (S) 8019968: Reference CAS induces GC store barrier even on failure
In-Reply-To: <55BAE566.5020904@oracle.com>
References: <55B895CB.4060105@oracle.com> <55B89BD6.7050209@redhat.com> <55B8A390.4000203@oracle.com> <55BAE566.5020904@oracle.com>
Message-ID: <55C86B41.9010909@oracle.com>

Hi Vladimir!

On 07/31/2015 06:03 AM, Vladimir Kozlov wrote:
> I think the test is wrong. It should be:
>
> if_then(load_store, BoolTest::eq, oldval, PROB_STATIC_FREQUENT);

Um, no? I remember eyeballing the assembly to confirm this.

For LS_cmpxchg, we are inlining "*boolean* cas(...)", so the load_store
seems to have a boolean value, but "oldval" is oop. In other words,
"load_store != 0" tests "(boolean)load_store != false".

Current VM produces:

  13.46% 45.85% 0x00007fa06809cdaf: lock cmpxchg %r8d,(%rdi)
  14.91% 4.41% 0x00007fa06809cdb4: sete %r11b
  0.07% 0x00007fa06809cdb8: movzbl %r11b,%r11d

  0.06% 0x00007fa06809cdd1: test %r11d,%r11d

  ; CAS fail, jump to respin
  0x00007fa06809cdd4: je 0x00007fa06809ccf0


Patched VM piggybacks on the same result:

  1.97% 0.05% 0x00007fe618af4115: lock cmpxchg %ebx,(%r9)
  50.59% 90.01% 0x00007fe618af411a: sete %r11b
  0.05% 0.01% 0x00007fe618af411e: movzbl %r11b,%r11d
  3.02% 1.80% 0x00007fe618af4122: test %r11d,%r11d

  ; CAS success, jump to store barrier
  0x00007fe618af4125: jne 0x00007fe618af4070

  ; CAS fail, jump to respin
  0x00007fe618af412b: jmpq 0x00007fe618af4082


Your suggestion seems to ignore the test completely (GVN helped?), and
while it's still technically correct with emitting the barrier always,
it defeats the purpose of the change:

  2.35% 4.97% 0x00007f7790aefd26: lock cmpxchg %r10d,(%rbx)
  48.54% 86.32% 0x00007f7790aefd2b: mov $0x1,%eax
  0.03% 0x00007f7790aefd30: je 0x00007f7790aefd3b
  0x00007f7790aefd36: mov $0x0,%eax
  2.16% 0.01% 0x00007f7790aefd53: cmp $0x0,%eax

  ; CAS fail, jump back to respin
  0x00007f7790aefd56: je 0x00007f7790aefd10

  ; CAS success, follow to exit

Also, AFAIU, performance results would look different if we screwed the
success check. But they seem to be coherent with our expectations: when
CAS fails, either the conditional card marking or this change helps, and
the change does not help when CAS succeeds.

Thanks,
-Aleksey

> Thanks,
> Vladimir
>
> On 7/29/15 2:57 AM, Aleksey Shipilev wrote:
>> On 07/29/2015 12:24 PM, Andrew Dinn wrote:
>>> On 29/07/15 09:58, Aleksey Shipilev wrote:
>>>> I would like to suggest a fix for:
>>>> https://bugs.openjdk.java.net/browse/JDK-8019968
>>>
>>>> In short, current reference CAS intrinsic blindly emits
>>>> post_barrier, ignoring the CAS result. In some cases, notably
>>>> contended CAS spin-loops, we fail the CAS a lot, and thus go for a
>>>> post_barrier excessively. Instead, we can conditionalize on the
>>>> result of the store itself, and put the post_barrier only on
>>>> success path: http://cr.openjdk.java.net/~shade/8019968/webrev.01/
>>>
>>>> More performance results here:
>>>> http://cr.openjdk.java.net/~shade/8019968/notes.txt
>>>
>>> Nice! The code looks fine and your test results are very convincing.
>>> I'll be interested to see how this looks on AArch64.
>>
>> Thanks Andrew!
>>
>> The change passes JPRT, so AArch64 build is available. The benchmark JAR
>> mentioned in the issue comments would run without intervention, taking
>> around 40 minutes. You are very welcome to try, while Reviewers are
>> taking a look. I can do that only next week.
>>
>>> That said, I am afraid you still need a Reviewer!
>>
>> That reminds me I haven't spelled out what testing was done:
>>
>> * JPRT on all open platforms
>> * Targeted benchmarks
>> * Eyeballing the generated x86 assembly
>>
>> Thanks,
>> -Aleksey

From dean.long at oracle.com Mon Aug 10 19:42:48 2015
From: dean.long at oracle.com (Dean)
Date: Mon, 10 Aug 2015 12:42:48 -0700
Subject: aarch64-port-dev ] RFR (S) 8131682: C1 should use multibyte nops everywhere
Message-ID:

I can sponsor this. How about removing the ppc, aarch64, and sparc changes and making sure it still builds?

dl

From dean.long at oracle.com Mon Aug 10 19:57:03 2015
From: dean.long at oracle.com (Dean)
Date: Mon, 10 Aug 2015 12:57:03 -0700
Subject: aarch64-port-dev ] aarch64-port-dev ] RFR (S) 8131682: C1 should use multibyte nops everywhere
Message-ID:

Did you get a Reviewer yet?

dl

From vladimir.kozlov at oracle.com Tue Aug 11 02:21:33 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Mon, 10 Aug 2015 19:21:33 -0700
Subject: RFR (S) 8019968: Reference CAS induces GC store barrier even on failure
In-Reply-To: <55C86B41.9010909@oracle.com>
References: <55B895CB.4060105@oracle.com> <55B89BD6.7050209@redhat.com> <55B8A390.4000203@oracle.com> <55BAE566.5020904@oracle.com> <55C86B41.9010909@oracle.com>
Message-ID: <55C95C2D.9050900@oracle.com>

My bad, I forgot that CompareAndSwapP assembler code produces Boolean value in register. I mistook it for StorePConditional which produces flag.

But I think you can get better code since you want to generate test and main point of having specialized CompareAndSwapP is to avoid test instruction.

If we use StorePConditional instead of CompareAndSwapP we may remove second branch:

> 0x00007fa06809cdd1: test %r11d,%r11d
> ; CAS fail, jump to respin
> 0x00007fe618af412b: jeq 0x00007fe618af4082
> ; CAS success, jump to store barrier
>

But C2 changes will be much larger. We would need new Ideal::if_then() which take in result of StorePConditional and set load_store on both paths to 0/1.

We may need to play with probability of if_then() to get barrier in follow code.

Thanks,
Vladimir

On 8/10/15 2:13 AM, Aleksey Shipilev wrote:
> Hi Vladimir!
>
> On 07/31/2015 06:03 AM, Vladimir Kozlov wrote:
>> I think the test is wrong. It should be:
>>
>> if_then(load_store, BoolTest::eq, oldval, PROB_STATIC_FREQUENT);
>
> Um, no? I remember eyeballing the assembly to confirm this.
>
> For LS_cmpxchg, we are inlining "*boolean* cas(...)", so the load_store
> seems to have a boolean value, but "oldval" is oop. In other words,
> "load_store != 0" tests "(boolean)load_store != false".
>
> Current VM produces:
>
> 13.46% 45.85% 0x00007fa06809cdaf: lock cmpxchg %r8d,(%rdi)
> 14.91% 4.41% 0x00007fa06809cdb4: sete %r11b
> 0.07% 0x00007fa06809cdb8: movzbl %r11b,%r11d
>
> 0.06% 0x00007fa06809cdd1: test %r11d,%r11d
>
> ; CAS fail, jump to respin
> 0x00007fa06809cdd4: je 0x00007fa06809ccf0
>
>
> Patched VM piggybacks on the same result:
>
> 1.97% 0.05% 0x00007fe618af4115: lock cmpxchg %ebx,(%r9)
> 50.59% 90.01% 0x00007fe618af411a: sete %r11b
> 0.05% 0.01% 0x00007fe618af411e: movzbl %r11b,%r11d
> 3.02% 1.80% 0x00007fe618af4122: test %r11d,%r11d
>
> ; CAS success, jump to store barrier
> 0x00007fe618af4125: jne 0x00007fe618af4070
>
> ; CAS fail, jump to respin
> 0x00007fe618af412b: jmpq 0x00007fe618af4082
>
>
> Your suggestion seems to ignore the test completely (GVN helped?), and
> while it's still technically correct with emitting the barrier always,
> it defeats the purpose of the change:
>
> 2.35% 4.97% 0x00007f7790aefd26: lock cmpxchg %r10d,(%rbx)
> 48.54% 86.32% 0x00007f7790aefd2b: mov $0x1,%eax
> 0.03% 0x00007f7790aefd30: je 0x00007f7790aefd3b
> ????
0x00007f7790aefd36: mov $0x0,%eax > > 2.16% 0.01% ? ?? 0x00007f7790aefd53: cmp $0x0,%eax > > ; CAS fail, jump back to respin > ? ?? 0x00007f7790aefd56: je 0x00007f7790aefd10 > > ; CAS success, follow to exit > > Also, AFAIU, performance results would look different if we screwed the > success check. But they seem to be coherent with our expectations: when > CAS fails, either the conditional card marking or this change helps, and > the change does not help when CAS succeeds. > > Thanks, > -Aleksey > >> Thanks, >> Vladimir >> >> On 7/29/15 2:57 AM, Aleksey Shipilev wrote: >>> On 07/29/2015 12:24 PM, Andrew Dinn wrote: >>>> On 29/07/15 09:58, Aleksey Shipilev wrote: >>>>> I would like to suggest a fix for: >>>>> https://bugs.openjdk.java.net/browse/JDK-8019968 >>>> >>>>> In short, current reference CAS intrinsic blindly emits >>>>> post_barrier, ignoring the CAS result. In some cases, notably >>>>> contended CAS spin-loops, we fail the CAS a lot, and thus go for a >>>>> post_barrier excessively. Instead, we can conditionalize on the >>>>> result of the store itself, and put the post_barrier only on >>>>> success path: http://cr.openjdk.java.net/~shade/8019968/webrev.01/ >>>> >>>>> More performance results here: >>>>> http://cr.openjdk.java.net/~shade/8019968/notes.txt >>>> >>>> Nice! The code looks fine and your test results are very convincing. >>>> I'll be interested to see how this looks on AArch64. >>> >>> Thanks Andrew! >>> >>> The change passes JPRT, so AArch64 build is available. The benchmark JAR >>> mentioned in the issue comments would run without intervention, taking >>> around 40 minutes. You are very welcome to try, while Reviewers are >>> taking a look. I can do that only next week. >>> >>>> That said, I am afraid you still need a Reviewer! 
>>>
>>> That reminds me I haven't spelled out what testing was done:
>>>
>>> * JPRT on all open platforms
>>> * Targeted benchmarks
>>> * Eyeballing the generated x86 assembly
>>>
>>> Thanks,
>>> -Aleksey
>>>
>>>
>
>

From aleksey.shipilev at oracle.com  Tue Aug 11 09:22:58 2015
From: aleksey.shipilev at oracle.com (Aleksey Shipilev)
Date: Tue, 11 Aug 2015 12:22:58 +0300
Subject: RFR (S) 8019968: Reference CAS induces GC store barrier even on failure
In-Reply-To: <55C95C2D.9050900@oracle.com>
References: <55B895CB.4060105@oracle.com> <55B89BD6.7050209@redhat.com>
 <55B8A390.4000203@oracle.com> <55BAE566.5020904@oracle.com>
 <55C86B41.9010909@oracle.com> <55C95C2D.9050900@oracle.com>
Message-ID: <55C9BEF2.2030100@oracle.com>

Hi Vladimir,

My previous disassembly demonstrated the code generated for a CAS
spinloop. There, it's easy to confuse the "second" branch with a proper
backbranch in the loop. Here is the disassembly for the "one-off"
failing CAS with patched VM:

  0x00007fa3acba446c: lock cmpxchg %r11d,(%r10)
  46.63% 83.18%  0x00007fa3acba4471: sete %r8b
  0.03%  0x00007fa3acba4475: movzbl %r8b,%r8d
  2.23%  0x00007fa3acba4479: test %r8d,%r8d   <- removable?
  0x00007fa3acba447c: je 0x00007fa3acba4490
  0x00007fa3acba447e: shr $0x9,%r10
  0x00007fa3acba4482: movabs $0x7fa3a0dbf000,%r11
  0x00007fa3acba448c: mov %r12b,(%r11,%r10,1)
  0.93%  0x00007fa3acba4490: mov %r8d,%eax
  0.04%  0x00007fa3acba4493: add $0x20,%rsp
  1.05%  0x00007fa3acba4497: pop %rbp
  0.98%  0x00007fa3acba4498: test %eax,0x11af2b62(%rip)
  0x00007fa3acba449e: retq

...compare this to baseline VM that does an unconditional barrier:

  2.31% 3.64%  0x00007fcf595fd4f9: lock cmpxchg %r10d,(%r11)
  43.22% 78.37%  0x00007fcf595fd4fe: sete %r8b
  0.04%  0x00007fcf595fd502: movzbl %r8b,%r8d
  2.20%  0x00007fcf595fd506: mov %r11,%r10
  0x00007fcf595fd509: shr $0x9,%r10
  0x00007fcf595fd50d: movabs $0x7fcf4dd0c000,%r11
  0x00007fcf595fd517: mov %r12b,(%r11,%r10,1)
  2.20%  0x00007fcf595fd51b: mov %r8d,%eax
  0x00007fcf595fd51e: add $0x20,%rsp
  0x00007fcf595fd522: pop %rbp
  1.82%  0x00007fcf595fd523: test %eax,0x12383ad7(%rip)
  0x00007fcf595fd529: retq

Well, yeah, I can see that the test at 0x00007fa3acba4479 is avoidable,
since cmpxchg already sets the flag. But I doubt it actually matters,
since:
 a) test-je are routinely macrofused into a single uop on modern x86;
 b) the flag is materialized in a register anyway for method return;
 c) as you predicted, my quick exploration blows up considerably.

Notably, handling narrow oops requires the missing StoreNConditionalNode,
which spreads all the way to AD and various places in the compiler that
match StorePConditionalNode. Also, my naive attempts at using Ideal to
pick up the StoreNConditional result and produce 0/1 yield full branches,
not the "sete" that is coming from the CompareAndSwapP AD encoding, with
terrible performance results.

With that, I think we should play it safe, and push the existing
obviously correct version that improves performance a lot, instead of
blowing up the complexity for a purely theoretical improvement.

Thanks,
-Aleksey

On 08/11/2015 05:21 AM, Vladimir Kozlov wrote:
> My bad, I forgot that CompareAndSwapP assembler code produces Boolean
> value in register. I mistook it for StorePConditional which produces flag.
>
> But I think you can get better code since you want to generate test and
> main point of having specialized CompareAndSwapP is to avoid test
> instruction.
>
> If we use StorePConditional instead of CompareAndSwapP we may remove
> second branch:
>
>>   0x00007fa06809cdd1: test %r11d,%r11d
>>   ; CAS fail, jump to respin
>>   0x00007fe618af412b: jeq 0x00007fe618af4082
>>   ; CAS success, jump to store barrier
>>
>
> But C2 changes will be much larger. We would need new Ideal::if_then()
> which take in result of StorePConditional and set load_store on both
> paths to 0/1.
>
> We may need to play with probability of if_then() to get barrier in
> follow code.
> > Thanks, > Vladimir > > On 8/10/15 2:13 AM, Aleksey Shipilev wrote: >> Hi Vladimir! >> >> On 07/31/2015 06:03 AM, Vladimir Kozlov wrote: >>> I think the test is wrong. It should be: >>> >>> if_then(load_store, BoolTest::eq, oldval, PROB_STATIC_FREQUENT); >> >> Um, no? I remember eyeballing the assembly to confirm this. >> >> For LS_cmpxchg, we are inlining "*boolean* cas(...)", so the load_store >> seems to have a boolean value, but "oldval" is oop. In other words, >> "load_store != 0" tests "(boolean)load_store != false". >> >> Current VM produces: >> >> 13.46% 45.85% ??? 0x00007fa06809cdaf: lock cmpxchg %r8d,(%rdi) >> 14.91% 4.41% ??? 0x00007fa06809cdb4: sete %r11b >> 0.07% ??? 0x00007fa06809cdb8: movzbl %r11b,%r11d >> >> 0.06% ??? 0x00007fa06809cdd1: test %r11d,%r11d >> >> ; CAS fail, jump to respin >> ??? 0x00007fa06809cdd4: je 0x00007fa06809ccf0 >> >> >> Patched VM piggybacks on the same result: >> >> 1.97% 0.05% ??? 0x00007fe618af4115: lock cmpxchg %ebx,(%r9) >> 50.59% 90.01% ??? 0x00007fe618af411a: sete %r11b >> 0.05% 0.01% ??? 0x00007fe618af411e: movzbl %r11b,%r11d >> 3.02% 1.80% ??? 0x00007fe618af4122: test %r11d,%r11d >> >> ; CAS success, jump to store barrier >> ??? 0x00007fe618af4125: jne 0x00007fe618af4070 >> >> ; CAS fail, jump to respin >> ? ? 0x00007fe618af412b: jmpq 0x00007fe618af4082 >> >> >> Your suggestion seems to ignore the test completely (GVN helped?), and >> while it's still technically correct with emitting the barrier always, >> it defeats the purpose of the change: >> >> 2.35% 4.97% ? ?? 0x00007f7790aefd26: lock cmpxchg %r10d,(%rbx) >> 48.54% 86.32% ? ?? 0x00007f7790aefd2b: mov $0x1,%eax >> 0.03% ???? 0x00007f7790aefd30: je 0x00007f7790aefd3b >> ???? 0x00007f7790aefd36: mov $0x0,%eax >> >> 2.16% 0.01% ? ?? 0x00007f7790aefd53: cmp $0x0,%eax >> >> ; CAS fail, jump back to respin >> ? ?? 
0x00007f7790aefd56: je 0x00007f7790aefd10 >> >> ; CAS success, follow to exit >> >> Also, AFAIU, performance results would look different if we screwed the >> success check. But they seem to be coherent with our expectations: when >> CAS fails, either the conditional card marking or this change helps, and >> the change does not help when CAS succeeds. >> >> Thanks, >> -Aleksey >> >>> Thanks, >>> Vladimir >>> >>> On 7/29/15 2:57 AM, Aleksey Shipilev wrote: >>>> On 07/29/2015 12:24 PM, Andrew Dinn wrote: >>>>> On 29/07/15 09:58, Aleksey Shipilev wrote: >>>>>> I would like to suggest a fix for: >>>>>> https://bugs.openjdk.java.net/browse/JDK-8019968 >>>>> >>>>>> In short, current reference CAS intrinsic blindly emits >>>>>> post_barrier, ignoring the CAS result. In some cases, notably >>>>>> contended CAS spin-loops, we fail the CAS a lot, and thus go for a >>>>>> post_barrier excessively. Instead, we can conditionalize on the >>>>>> result of the store itself, and put the post_barrier only on >>>>>> success path: http://cr.openjdk.java.net/~shade/8019968/webrev.01/ >>>>> >>>>>> More performance results here: >>>>>> http://cr.openjdk.java.net/~shade/8019968/notes.txt >>>>> >>>>> Nice! The code looks fine and your test results are very convincing. >>>>> I'll be interested to see how this looks on AArch64. >>>> >>>> Thanks Andrew! >>>> >>>> The change passes JPRT, so AArch64 build is available. The benchmark >>>> JAR >>>> mentioned in the issue comments would run without intervention, taking >>>> around 40 minutes. You are very welcome to try, while Reviewers are >>>> taking a look. I can do that only next week. >>>> >>>>> That said, I am afraid you still need a Reviewer! 
>>>>
>>>> That reminds me I haven't spelled out what testing was done:
>>>>
>>>> * JPRT on all open platforms
>>>> * Targeted benchmarks
>>>> * Eyeballing the generated x86 assembly
>>>>
>>>> Thanks,
>>>> -Aleksey
>>>>
>>>>
>>
>>

From aleksey.shipilev at oracle.com  Tue Aug 11 09:27:38 2015
From: aleksey.shipilev at oracle.com (Aleksey Shipilev)
Date: Tue, 11 Aug 2015 12:27:38 +0300
Subject: aarch64-port-dev ] aarch64-port-dev ] RFR (S) 8131682: C1 should use multibyte nops everywhere
In-Reply-To: 
References: 
Message-ID: <55C9C00A.3040302@oracle.com>

Hi Dean,

Ah yes, since we now use MacroAssembler::align to produce the effective
alignment, we can drop the platform-specific changes. ARM and PPC ports
may rewire their own MacroAssemblers if there are potentially better nop
sequences.

New changeset:
  http://cr.openjdk.java.net/~shade/8131682/8131682.changeset

Tested it builds and runs with full JPRT.

See the "Reviewed-by" line there. I think there are Reviewers there...

Thanks,
-Aleksey

On 08/10/2015 10:57 PM, Dean wrote:
> Did you get a Reviewer yet?
>
> dl
>
>
> Dean wrote:
>> I can sponsor this.  How about removing the ppc, aarch64, and sparc changes and making sure it still builds?
>>
>> dl
>>
>>
>> Aleksey Shipilev wrote:
>>> On 07/29/2015 07:30 PM, Dean Long wrote:
>>>> On 7/29/2015 7:01 AM, Aleksey Shipilev wrote:
>>>>> On 07/27/2015 01:21 PM, Andrew Haley wrote:
>>>>>> On 27/07/15 10:13, Aleksey Shipilev wrote:
>>>>>>> Thanks Goetz! Fixed the assembler_ppc.inline.hpp.
>>>>>>>
>>>>>>> Andrew/Edward, are you OK with AArch64 part?
>>>>>>> http://cr.openjdk.java.net/~shade/8131682/webrev.02/
>>>>>> I agree that it looks good.
>>>>> So, we have reviews from Dean Long, Goetz Lindenmaier, Andrew Dinn, and
>>>>> Andrew Haley. Still no Capital (R)eviewers.
>>>>>
>>>>> Otherwise, I think we are good to go. I respinned the JPRT with
>>>>> open+closed sources, and it would seem the changes in closed sources are
>>>>> not required.
>>>>
>>>> The changes to sparc and ppc may not be required anymore.
>>>
>>> Excellent, please sponsor!
>>>   http://cr.openjdk.java.net/~shade/8131682/8131682.changeset
>>>
>>> Thanks,
>>> -Aleksey
>>>
>>>

From dawid.weiss at gmail.com  Tue Aug 11 14:25:48 2015
From: dawid.weiss at gmail.com (Dawid Weiss)
Date: Tue, 11 Aug 2015 16:25:48 +0200
Subject: Transient miscompilation problem on 1.8 (invalid AIOOB/NPE thrown from the method body).
Message-ID: 

Hello,

We have encountered a transient miscompilation problem (on 1.8u40). We
get an AIOOB exception from a snippet of code which (provably) cannot
throw it. The AIOOB is thrown without a stack trace. What's
interesting is that when we set:

-XX:-OmitStackTraceInFastThrow

we get an NPE exception (which, again, is provably impossible at Java
code level).

The problem does not reproduce on my machine with i7 3770K (at least
so far), but does reproduce consistently on i7 2600K (and our
customer's machine; exact spec unknown).

I will be looking into isolating this issue as it is in our
proprietary code, but the pattern seems to be as follows:

1) new instance of A is created, with a new instance of B, which is a
single-implementation of interface C.

2) there is a tight loop which calls A (and B) methods.

There is no way for an AIOOB (or NPE) to be present in any of A or B,
but the stack trace indicates A.

I suspect an OSR miscompilation somewhere, but since I can't reproduce
it locally it's a bit of a problem to experiment with JVM versions and
internal flags.

Any hints on what it can be related to (flags to try, etc.) would be
appreciated.
Dawid

From edward.nevill at gmail.com  Tue Aug 11 15:57:33 2015
From: edward.nevill at gmail.com (Edward Nevill)
Date: Tue, 11 Aug 2015 16:57:33 +0100
Subject: 8133352: aarch64: generates constrained unpredictable instructions
Message-ID: <1439308653.5920.16.camel@mylittlepony.linaroharston>

Hi,

Webrev http://cr.openjdk.java.net/~enevill/8133352/

fixes an issue reported by one of our partners where aarch64 jdk9 is still
generating constrained unpredictable instructions.

The two cases being generated are

  STXR Rs, Rt, [Rn] where Rs == Rt

and

  LDAXP Rt1, Rt2, [Rn] where Rt1 == Rt2
  (this case is only generated in the assembler smoke test and is not
  generated in real code)

On the particular vendor's HW the behavior for these instructions is to
generate a SIGILL.

Unfortunately the fix for this is non-trivial, the reason being that

  STXR Rs, Rt, [Rn]

requires Rs != Rt != Rn; however, we only have 2 scratch registers.

The solution I have adopted is to add an additional temp arg to the routines
in macroAssembler and pass down an additional temp from the top level usage,
but this involves a non-trivial amount of code changes.

The alternative solution would be to create a temp by pushing a register on
the stack.

I am in the process of testing this but I want to get people's feedback as to
whether this is the right solution or whether there is some better way of
doing it.

Thanks for your help,
Ed.
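
A minimal sketch of the operand rule described in the message above, written
in Java only so that it can be run standalone. It is not HotSpot code: the
class and method names are hypothetical, registers are modelled as plain
integers, and "Rs != Rt != Rn" is read here as "the status register Rs must
not alias the data register Rt or the base register Rn", which is an
assumption about the intended meaning.

```java
// Hypothetical helper, not part of HotSpot or any assembler API.
// Registers are modelled as plain register numbers.
final class ExclusiveOperandCheck {
    private ExclusiveOperandCheck() {}

    // STXR Rs, Rt, [Rn]: assumed rule is that the status register Rs
    // must differ from both the data register Rt and the base register Rn.
    static boolean stxrOperandsOk(int rs, int rt, int rn) {
        return rs != rt && rs != rn;
    }

    // LDAXP Rt1, Rt2, [Rn]: the two destination registers must differ.
    static boolean ldaxpOperandsOk(int rt1, int rt2) {
        return rt1 != rt2;
    }

    public static void main(String[] args) {
        // Status register reused as the data register: the case reported above.
        System.out.println(stxrOperandsOk(8, 8, 0));  // prints false
        // A separate status register, which is what the extra temp provides.
        System.out.println(stxrOperandsOk(8, 9, 0));  // prints true
        // Same destination twice: the smoke-test case reported above.
        System.out.println(ldaxpOperandsOk(1, 1));    // prints false
    }
}
```

Under this reading, a caller that already holds Rt and Rn needs a further
register for Rs, which is the situation the extra temp argument (or a
register saved on the stack) is meant to address.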
From vladimir.kozlov at oracle.com Tue Aug 11 15:58:31 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 11 Aug 2015 08:58:31 -0700 Subject: RFR (S) 8019968: Reference CAS induces GC store barrier even on failure In-Reply-To: <55C9BEF2.2030100@oracle.com> References: <55B895CB.4060105@oracle.com> <55B89BD6.7050209@redhat.com> <55B8A390.4000203@oracle.com> <55BAE566.5020904@oracle.com> <55C86B41.9010909@oracle.com> <55C95C2D.9050900@oracle.com> <55C9BEF2.2030100@oracle.com> Message-ID: <55CA1BA7.4080907@oracle.com> Thank you for doing additional experiments, Aleksey, and explanation. Now I agree with your changes. Reviewed. Thanks, Vladimir On 8/11/15 2:22 AM, Aleksey Shipilev wrote: > Hi Vladimir, > > My previous disassembly demonstrated the code generated for CAS > spinloop. There, it's easy to confuse the "second" branch with a proper > backbranch in the loop. Here is the disassembly for the "one-off" > failing CAS with patched VM: > > ? 0x00007fa3acba446c: lock cmpxchg %r11d,(%r10) > 46.63% 83.18% ? 0x00007fa3acba4471: sete %r8b > 0.03% ? 0x00007fa3acba4475: movzbl %r8b,%r8d > 2.23% ? 0x00007fa3acba4479: test %r8d,%r8d <- removable? > ?? 0x00007fa3acba447c: je 0x00007fa3acba4490 > ?? 0x00007fa3acba447e: shr $0x9,%r10 > ?? 0x00007fa3acba4482: movabs $0x7fa3a0dbf000,%r11 > ?? 0x00007fa3acba448c: mov %r12b,(%r11,%r10,1) > 0.93% ?? 0x00007fa3acba4490: mov %r8d,%eax > 0.04% ? 0x00007fa3acba4493: add $0x20,%rsp > 1.05% ? 0x00007fa3acba4497: pop %rbp > 0.98% ? 0x00007fa3acba4498: test %eax,0x11af2b62(%rip) > ? > ? 0x00007fa3acba449e: retq > > ...compare this to baseline VM that does an unconditional barrier: > > 2.31% 3.64% ? 0x00007fcf595fd4f9: lock cmpxchg %r10d,(%r11) > 43.22% 78.37% ? 0x00007fcf595fd4fe: sete %r8b > 0.04% ? 0x00007fcf595fd502: movzbl %r8b,%r8d > 2.20% ? 0x00007fcf595fd506: mov %r11,%r10 > ? 0x00007fcf595fd509: shr $0x9,%r10 > ? 0x00007fcf595fd50d: movabs $0x7fcf4dd0c000,%r11 > ? 
0x00007fcf595fd517: mov %r12b,(%r11,%r10,1) > 2.20% ? 0x00007fcf595fd51b: mov %r8d,%eax > ? 0x00007fcf595fd51e: add $0x20,%rsp > ? 0x00007fcf595fd522: pop %rbp > 1.82% ? 0x00007fcf595fd523: test %eax,0x12383ad7(%rip) > ? > ? 0x00007fcf595fd529: retq > > Well, yeah, I can see that test at 0x00007fa3acba4479 is avoidable, > since cmpxchg already sets the flag. But, I doubt it actually matters, > since: > a) test-je are routinely macrofused into single uop on modern x86; > b) the flag is materialized in register anyway for method return; > c) as you predicted, my quick exploration blows up considerably; > > Notably, handling native oops require missing StoreNConditionalNode, > which spreads all the way to AD and various places in compiler that > match StorePConditionalNode. Also, my naive attempts of using Ideal to > pick up StoreNConditional result and produce 0/1 yields full branches, > not the "sete" that is coming from CompareAndSwapP AD encoding -- with > terrible performance results. > > With that, I think we should play it safe, and push the existing > obviously correct version that improves performance a lot, instead of > blowing up the complexity for purely theoretical improvement. > > Thanks, > -Aleksey > > On 08/11/2015 05:21 AM, Vladimir Kozlov wrote: >> My bad, I forgot that CompareAndSwapP assembler code produces Boolean >> value in register. I mistook it for StorePConditional which produces flag. >> >> But I think you can get better code since you want to generate test and >> main point of having specialized CompareAndSwapP is to avoid test >> instruction. >> >> If we use StorePConditional instead of CompareAndSwapP we may remove >> second branch: >> >>> ??? 0x00007fa06809cdd1: test %r11d,%r11d >>> ; CAS fail, jump to respin >>> ? ? 0x00007fe618af412b: jeq 0x00007fe618af4082 >>> ; CAS success, jump to store barrier >>> >> >> But C2 changes will be much larger. 
We would need new Ideal::if_then() >> which take in result of StorePConditional and set load_store on both >> paths to 0/1. >> >> We may need to play with probability of if_then() to get barrier in >> follow code. >> >> Thanks, >> Vladimir >> >> On 8/10/15 2:13 AM, Aleksey Shipilev wrote: >>> Hi Vladimir! >>> >>> On 07/31/2015 06:03 AM, Vladimir Kozlov wrote: >>>> I think the test is wrong. It should be: >>>> >>>> if_then(load_store, BoolTest::eq, oldval, PROB_STATIC_FREQUENT); >>> >>> Um, no? I remember eyeballing the assembly to confirm this. >>> >>> For LS_cmpxchg, we are inlining "*boolean* cas(...)", so the load_store >>> seems to have a boolean value, but "oldval" is oop. In other words, >>> "load_store != 0" tests "(boolean)load_store != false". >>> >>> Current VM produces: >>> >>> 13.46% 45.85% ??? 0x00007fa06809cdaf: lock cmpxchg %r8d,(%rdi) >>> 14.91% 4.41% ??? 0x00007fa06809cdb4: sete %r11b >>> 0.07% ??? 0x00007fa06809cdb8: movzbl %r11b,%r11d >>> >>> 0.06% ??? 0x00007fa06809cdd1: test %r11d,%r11d >>> >>> ; CAS fail, jump to respin >>> ??? 0x00007fa06809cdd4: je 0x00007fa06809ccf0 >>> >>> >>> Patched VM piggybacks on the same result: >>> >>> 1.97% 0.05% ??? 0x00007fe618af4115: lock cmpxchg %ebx,(%r9) >>> 50.59% 90.01% ??? 0x00007fe618af411a: sete %r11b >>> 0.05% 0.01% ??? 0x00007fe618af411e: movzbl %r11b,%r11d >>> 3.02% 1.80% ??? 0x00007fe618af4122: test %r11d,%r11d >>> >>> ; CAS success, jump to store barrier >>> ??? 0x00007fe618af4125: jne 0x00007fe618af4070 >>> >>> ; CAS fail, jump to respin >>> ? ? 0x00007fe618af412b: jmpq 0x00007fe618af4082 >>> >>> >>> Your suggestion seems to ignore the test completely (GVN helped?), and >>> while it's still technically correct with emitting the barrier always, >>> it defeats the purpose of the change: >>> >>> 2.35% 4.97% ? ?? 0x00007f7790aefd26: lock cmpxchg %r10d,(%rbx) >>> 48.54% 86.32% ? ?? 0x00007f7790aefd2b: mov $0x1,%eax >>> 0.03% ???? 0x00007f7790aefd30: je 0x00007f7790aefd3b >>> ???? 
0x00007f7790aefd36: mov $0x0,%eax >>> >>> 2.16% 0.01% ? ?? 0x00007f7790aefd53: cmp $0x0,%eax >>> >>> ; CAS fail, jump back to respin >>> ? ?? 0x00007f7790aefd56: je 0x00007f7790aefd10 >>> >>> ; CAS success, follow to exit >>> >>> Also, AFAIU, performance results would look different if we screwed the >>> success check. But they seem to be coherent with our expectations: when >>> CAS fails, either the conditional card marking or this change helps, and >>> the change does not help when CAS succeeds. >>> >>> Thanks, >>> -Aleksey >>> >>>> Thanks, >>>> Vladimir >>>> >>>> On 7/29/15 2:57 AM, Aleksey Shipilev wrote: >>>>> On 07/29/2015 12:24 PM, Andrew Dinn wrote: >>>>>> On 29/07/15 09:58, Aleksey Shipilev wrote: >>>>>>> I would like to suggest a fix for: >>>>>>> https://bugs.openjdk.java.net/browse/JDK-8019968 >>>>>> >>>>>>> In short, current reference CAS intrinsic blindly emits >>>>>>> post_barrier, ignoring the CAS result. In some cases, notably >>>>>>> contended CAS spin-loops, we fail the CAS a lot, and thus go for a >>>>>>> post_barrier excessively. Instead, we can conditionalize on the >>>>>>> result of the store itself, and put the post_barrier only on >>>>>>> success path: http://cr.openjdk.java.net/~shade/8019968/webrev.01/ >>>>>> >>>>>>> More performance results here: >>>>>>> http://cr.openjdk.java.net/~shade/8019968/notes.txt >>>>>> >>>>>> Nice! The code looks fine and your test results are very convincing. >>>>>> I'll be interested to see how this looks on AArch64. >>>>> >>>>> Thanks Andrew! >>>>> >>>>> The change passes JPRT, so AArch64 build is available. The benchmark >>>>> JAR >>>>> mentioned in the issue comments would run without intervention, taking >>>>> around 40 minutes. You are very welcome to try, while Reviewers are >>>>> taking a look. I can do that only next week. >>>>> >>>>>> That said, I am afraid you still need a Reviewer! 
>>>>> >>>>> That reminds me I haven't spelled out what testing was done: >>>>> >>>>> * JPRT on all open platforms >>>>> * Targeted benchmarks >>>>> * Eyeballing the generated x86 assembly >>>>> >>>>> Thanks, >>>>> -Aleksey >>>>> >>>>> >>> >>> > > From vladimir.kozlov at oracle.com Tue Aug 11 16:55:08 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 11 Aug 2015 09:55:08 -0700 Subject: 8133352: aarch64: generates constrained unpredictable instructions In-Reply-To: <1439308653.5920.16.camel@mylittlepony.linaroharston> References: <1439308653.5920.16.camel@mylittlepony.linaroharston> Message-ID: <55CA28EC.7060109@oracle.com> I think it depends how expensive push/pop on arm64. In c2 generated code you may introduced spills to stack around GetAndAdd code since you use additional register (in .ad). So you are saving on stack anyway. On other hand your changes (third temp) are not so big and I think acceptable. Thanks, Vladimir On 8/11/15 8:57 AM, Edward Nevill wrote: > Hi, > > Webrev http://cr.openjdk.java.net/~enevill/8133352/ > > fixes an issue reported by one of our partners where aarch64 jdk9 is still generating constrained unpredictable instructions. > > The two cases being generates are > > STXR Rs, Rt, [Rn] where Rs == Rt > > and > > LDAXP Rt1, Rt2, [Rn] where Rt1 == Rt2 (this case is only generated in the assembler smoke test and is not generated in real code) > > On the particular vendors HW the behavior for these instructions is to generate a SIGILL. > > Unfortunately the fix for this is non trivial, the reason being that > > STXR Rs, Rt, [Rn] > > requires Rs != Rt != Rn however we only have 2 scratch registers. > > The solution I have adopted is to add an addition temp arg to the routines in macroAssembler and pass down an additional temp from the top level usage, but this involves a non trivial amount of code changes. > > The alternative solution would be create a temp by pushing a register on the stack. 
>
> I am in the process of testing this but I want to get people's feedback as to whether this is the right solution or whether there is some better way of doing it.
>
> Thanks for your help,
> Ed.
>
>

From dawid.weiss at gmail.com  Tue Aug 11 21:27:08 2015
From: dawid.weiss at gmail.com (Dawid Weiss)
Date: Tue, 11 Aug 2015 23:27:08 +0200
Subject: Transient miscompilation problem on 1.8 (invalid AIOOB/NPE thrown from the method body).
In-Reply-To: 
References: 
Message-ID: 

We tried to narrow it down. The problem is tied to tiered compilation
somehow because turning it off makes the test pass with flying colors:

# 1.8.0_45-b14
PASSES  -Xint
PASSES  -Xmx4g -Xbatch -XX:CICompilerCount=1 -XX:-TieredCompilation
PASSES  -Xmx4g -XX:-TieredCompilation
FAILS   -Xmx4g -XX:+TieredCompilation
FAILS   -Xmx4g -Xbatch -XX:CICompilerCount=2 -XX:+TieredCompilation

What's more interesting is that 1.9 and the most recent ea of 1.8
(u60) also pass, even with tiered compilation turned on:

# 1.9.0-ea-b71
PASSES  -Xmx4g -Xbatch -XX:CICompilerCount=2 -XX:+TieredCompilation

# 1.8.0_60-ea-b25
PASSES  [always, regardless of options]

I can't tell whether it's something masking the original problem or
whether the bug has been fixed in between. I looked at JIRA logs, but
can't find anything specific. If somebody knows what this could be,
I'd appreciate a pointer.

Dawid

On Tue, Aug 11, 2015 at 4:25 PM, Dawid Weiss wrote:
> Hello,
>
> We have encountered a transient miscompilation problem (on 1.8u40). We
> get an AIOOB exception from a snippet of code which (provably) cannot
> throw it. The AIOOB is thrown without a stack trace. What's
> interesting is that when we set:
>
> -XX:-OmitStackTraceInFastThrow
>
> we get an NPE exception (which, again, is provably impossible at Java
> code level).
>
> The problem does not reproduce on my machine with i7 3770K (at least
> so far), but does reproduce consistently on i7 2600K (and our
> customer's machine; exact spec unknown).
> > I will be looking into isolating this issue as it is in our > proprietary code, but the pattern seems to be as follows: > > 1) new instance of A is created, with a new instance of B, which is a > single-implementation of interface C. > > 2) there is a tight loop which calls A (and B) methods. > > There is no way for an AIOOB (or NPE) to be present in any of A or B, > but the stack trace indicates A. > > I suspect an OSR miscompilation somewhere, but since I can't reproduce > it locally it's a bit of a problem to experiment with JVM versions and > internal flags. > > Any hints on what it can be related to (flags to try, etc.) would be > appreciated. > > Dawid From vladimir.kozlov at oracle.com Tue Aug 11 23:57:59 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 11 Aug 2015 16:57:59 -0700 Subject: [9] RFR (M): 8011858: Use Compile::live_nodes() instead of Compile::unique() in appropriate places In-Reply-To: <55B9EF79.1040907@oracle.com> References: <55A9AAB6.50505@oracle.com> <55B9EF79.1040907@oracle.com> Message-ID: <55CA8C07.2080404@oracle.com> I pushed changes: http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/rev/af60f1cb36f2 Thanks, Vladimir K On 7/30/15 2:33 AM, Vladimir Ivanov wrote: > Looks good. > I'll sponsor the change. > > Best regards, > Vladimir Ivanov > > On 7/18/15 4:24 AM, Vladimir Kozlov wrote: >> Thank you, Vlad >> >> It looks good. We usually don't put bug id into comments. So your >> previous version on cr.openjdk is fine. >> >> Second reviewer should look on and sponsor it with you listed as >> contributor (I see you signed OCA already). >> >> Thanks, >> Vladimir >> >> On 7/17/15 3:47 PM, Vlad Ureche wrote: >>> Hi, >>> >>> Please review the following patch for JDK-8011858. Big thanks to >>> Vladimir Kozlov for his patient guidance while working on this! 
>>> >>> *Bug:* https://bugs.openjdk.java.net/browse/JDK-8011858 >>> >>> *Problem:* Throughout C2, local stacks are used to prevent recursive >>> calls from blowing up the system stack. These are sized based on the >>> total number of nodes in the compilation run (e.g. C->unique()). >>> Instead, they should be sized based on the live node count >>> (C->live_nodes()). >>> >>> Now, with the increased difference between live_nodes (limited at >>> LiveNodeCountInliningCutoff, set to 40K) and unique nodes (which can go >>> up to 240K), it is important to not over-estimate the size of stacks. >>> >>> *Solution:* This patch mirrors a patch written by Vladimir Kozlov for >>> JDK8u. It replaces the initial sizes from C->unique() to >>> C->live_nodes(), preserving any shifts (divisions) and offsets. For >>> example, in the compile.cpp patch >>> : >>> >>> >>> >>> |- Node_Stack nstack(unique() >> 1); >>> + Node_Stack nstack(live_nodes() >> 1); >>> | >>> >>> There is an issue described at >>> https://bugs.openjdk.java.net/browse/JDK-8131702 where I took the >>> workaround from Vladimir?s patch. >>> >>> *Webrev:* http://cr.openjdk.java.net/~kvn/8011858/webrev/ or >>> http://vladureche.ro/webrev/8011858 >>> (updated, includes a link to bug >>> 8121702) >>> >>> *Tests:* Running jtreg with the compiler, runtime and gc tests on the >>> dev branch shows the same status >>> before and after the patch: 808 tests passed, 16 failed and 6 errors >>> . What >>> would be a stable point where all tests are expected to pass, so I can >>> test the patch there? Maybe jdk9 ? 
>>> >>> Thanks, >>> Vlad >>> From aleksey.shipilev at oracle.com Wed Aug 12 07:04:12 2015 From: aleksey.shipilev at oracle.com (Aleksey Shipilev) Date: Wed, 12 Aug 2015 10:04:12 +0300 Subject: RFR (S) 8019968: Reference CAS induces GC store barrier even on failure In-Reply-To: <55CA1BA7.4080907@oracle.com> References: <55B895CB.4060105@oracle.com> <55B89BD6.7050209@redhat.com> <55B8A390.4000203@oracle.com> <55BAE566.5020904@oracle.com> <55C86B41.9010909@oracle.com> <55C95C2D.9050900@oracle.com> <55C9BEF2.2030100@oracle.com> <55CA1BA7.4080907@oracle.com> Message-ID: <55CAEFEC.6090005@oracle.com> Thanks, Vladimir! Here's a changeset: http://cr.openjdk.java.net/~shade/8019968/8019968.changeset Please sponsor! -Aleksey On 08/11/2015 06:58 PM, Vladimir Kozlov wrote: > Thank you for doing additional experiments, Aleksey, and explanation. > Now I agree with your changes. Reviewed. > > Thanks, > Vladimir > > On 8/11/15 2:22 AM, Aleksey Shipilev wrote: >> Hi Vladimir, >> >> My previous disassembly demonstrated the code generated for CAS >> spinloop. There, it's easy to confuse the "second" branch with a proper >> backbranch in the loop. Here is the disassembly for the "one-off" >> failing CAS with patched VM: >> >> ? 0x00007fa3acba446c: lock cmpxchg %r11d,(%r10) >> 46.63% 83.18% ? 0x00007fa3acba4471: sete %r8b >> 0.03% ? 0x00007fa3acba4475: movzbl %r8b,%r8d >> 2.23% ? 0x00007fa3acba4479: test %r8d,%r8d <- removable? >> ?? 0x00007fa3acba447c: je 0x00007fa3acba4490 >> ?? 0x00007fa3acba447e: shr $0x9,%r10 >> ?? 0x00007fa3acba4482: movabs $0x7fa3a0dbf000,%r11 >> ?? 0x00007fa3acba448c: mov %r12b,(%r11,%r10,1) >> 0.93% ?? 0x00007fa3acba4490: mov %r8d,%eax >> 0.04% ? 0x00007fa3acba4493: add $0x20,%rsp >> 1.05% ? 0x00007fa3acba4497: pop %rbp >> 0.98% ? 0x00007fa3acba4498: test %eax,0x11af2b62(%rip) >> ? >> ? 0x00007fa3acba449e: retq >> >> ...compare this to baseline VM that does an unconditional barrier: >> >> 2.31% 3.64% ? 
0x00007fcf595fd4f9: lock cmpxchg %r10d,(%r11) >> 43.22% 78.37% ? 0x00007fcf595fd4fe: sete %r8b >> 0.04% ? 0x00007fcf595fd502: movzbl %r8b,%r8d >> 2.20% ? 0x00007fcf595fd506: mov %r11,%r10 >> ? 0x00007fcf595fd509: shr $0x9,%r10 >> ? 0x00007fcf595fd50d: movabs $0x7fcf4dd0c000,%r11 >> ? 0x00007fcf595fd517: mov %r12b,(%r11,%r10,1) >> 2.20% ? 0x00007fcf595fd51b: mov %r8d,%eax >> ? 0x00007fcf595fd51e: add $0x20,%rsp >> ? 0x00007fcf595fd522: pop %rbp >> 1.82% ? 0x00007fcf595fd523: test %eax,0x12383ad7(%rip) >> ? >> ? 0x00007fcf595fd529: retq >> >> Well, yeah, I can see that test at 0x00007fa3acba4479 is avoidable, >> since cmpxchg already sets the flag. But, I doubt it actually matters, >> since: >> a) test-je are routinely macrofused into single uop on modern x86; >> b) the flag is materialized in register anyway for method return; >> c) as you predicted, my quick exploration blows up considerably; >> >> Notably, handling native oops require missing StoreNConditionalNode, >> which spreads all the way to AD and various places in compiler that >> match StorePConditionalNode. Also, my naive attempts of using Ideal to >> pick up StoreNConditional result and produce 0/1 yields full branches, >> not the "sete" that is coming from CompareAndSwapP AD encoding -- with >> terrible performance results. >> >> With that, I think we should play it safe, and push the existing >> obviously correct version that improves performance a lot, instead of >> blowing up the complexity for purely theoretical improvement. >> >> Thanks, >> -Aleksey >> >> On 08/11/2015 05:21 AM, Vladimir Kozlov wrote: >>> My bad, I forgot that CompareAndSwapP assembler code produces Boolean >>> value in register. I mistook it for StorePConditional which produces >>> flag. >>> >>> But I think you can get better code since you want to generate test and >>> main point of having specialized CompareAndSwapP is to avoid test >>> instruction. 
>>> >>> If we use StorePConditional instead of CompareAndSwapP we may remove >>> second branch: >>> >>>> ??? 0x00007fa06809cdd1: test %r11d,%r11d >>>> ; CAS fail, jump to respin >>>> ? ? 0x00007fe618af412b: jeq 0x00007fe618af4082 >>>> ; CAS success, jump to store barrier >>>> >>> >>> But C2 changes will be much larger. We would need new Ideal::if_then() >>> which take in result of StorePConditional and set load_store on both >>> paths to 0/1. >>> >>> We may need to play with probability of if_then() to get barrier in >>> follow code. >>> >>> Thanks, >>> Vladimir >>> >>> On 8/10/15 2:13 AM, Aleksey Shipilev wrote: >>>> Hi Vladimir! >>>> >>>> On 07/31/2015 06:03 AM, Vladimir Kozlov wrote: >>>>> I think the test is wrong. It should be: >>>>> >>>>> if_then(load_store, BoolTest::eq, oldval, PROB_STATIC_FREQUENT); >>>> >>>> Um, no? I remember eyeballing the assembly to confirm this. >>>> >>>> For LS_cmpxchg, we are inlining "*boolean* cas(...)", so the load_store >>>> seems to have a boolean value, but "oldval" is oop. In other words, >>>> "load_store != 0" tests "(boolean)load_store != false". >>>> >>>> Current VM produces: >>>> >>>> 13.46% 45.85% ??? 0x00007fa06809cdaf: lock cmpxchg %r8d,(%rdi) >>>> 14.91% 4.41% ??? 0x00007fa06809cdb4: sete %r11b >>>> 0.07% ??? 0x00007fa06809cdb8: movzbl %r11b,%r11d >>>> >>>> 0.06% ??? 0x00007fa06809cdd1: test %r11d,%r11d >>>> >>>> ; CAS fail, jump to respin >>>> ??? 0x00007fa06809cdd4: je >>>> 0x00007fa06809ccf0 >>>> >>>> >>>> Patched VM piggybacks on the same result: >>>> >>>> 1.97% 0.05% ??? 0x00007fe618af4115: lock cmpxchg %ebx,(%r9) >>>> 50.59% 90.01% ??? 0x00007fe618af411a: sete %r11b >>>> 0.05% 0.01% ??? 0x00007fe618af411e: movzbl %r11b,%r11d >>>> 3.02% 1.80% ??? 0x00007fe618af4122: test %r11d,%r11d >>>> >>>> ; CAS success, jump to store barrier >>>> ??? 0x00007fe618af4125: jne 0x00007fe618af4070 >>>> >>>> ; CAS fail, jump to respin >>>> ? ? 
0x00007fe618af412b: jmpq 0x00007fe618af4082 >>>> >>>> >>>> Your suggestion seems to ignore the test completely (GVN helped?), and >>>> while it's still technically correct with emitting the barrier always, >>>> it defeats the purpose of the change: >>>> >>>> 2.35% 4.97% ? ?? 0x00007f7790aefd26: lock cmpxchg %r10d,(%rbx) >>>> 48.54% 86.32% ? ?? 0x00007f7790aefd2b: mov $0x1,%eax >>>> 0.03% ???? 0x00007f7790aefd30: je 0x00007f7790aefd3b >>>> ???? 0x00007f7790aefd36: mov $0x0,%eax >>>> >>>> 2.16% 0.01% ? ?? 0x00007f7790aefd53: cmp $0x0,%eax >>>> >>>> ; CAS fail, jump back to respin >>>> ? ?? 0x00007f7790aefd56: je 0x00007f7790aefd10 >>>> >>>> ; CAS success, follow to exit >>>> >>>> Also, AFAIU, performance results would look different if we screwed the >>>> success check. But they seem to be coherent with our expectations: when >>>> CAS fails, either the conditional card marking or this change helps, >>>> and >>>> the change does not help when CAS succeeds. >>>> >>>> Thanks, >>>> -Aleksey >>>> >>>>> Thanks, >>>>> Vladimir >>>>> >>>>> On 7/29/15 2:57 AM, Aleksey Shipilev wrote: >>>>>> On 07/29/2015 12:24 PM, Andrew Dinn wrote: >>>>>>> On 29/07/15 09:58, Aleksey Shipilev wrote: >>>>>>>> I would like to suggest a fix for: >>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8019968 >>>>>>> >>>>>>>> In short, current reference CAS intrinsic blindly emits >>>>>>>> post_barrier, ignoring the CAS result. In some cases, notably >>>>>>>> contended CAS spin-loops, we fail the CAS a lot, and thus go for a >>>>>>>> post_barrier excessively. Instead, we can conditionalize on the >>>>>>>> result of the store itself, and put the post_barrier only on >>>>>>>> success path: http://cr.openjdk.java.net/~shade/8019968/webrev.01/ >>>>>>> >>>>>>>> More performance results here: >>>>>>>> http://cr.openjdk.java.net/~shade/8019968/notes.txt >>>>>>> >>>>>>> Nice! The code looks fine and your test results are very convincing. >>>>>>> I'll be interested to see how this looks on AArch64. 
>>>>>> >>>>>> Thanks Andrew! >>>>>> >>>>>> The change passes JPRT, so AArch64 build is available. The benchmark >>>>>> JAR >>>>>> mentioned in the issue comments would run without intervention, >>>>>> taking >>>>>> around 40 minutes. You are very welcome to try, while Reviewers are >>>>>> taking a look. I can do that only next week. >>>>>> >>>>>>> That said, I am afraid you still need a Reviewer! >>>>>> >>>>>> That reminds me I haven't spelled out what testing was done: >>>>>> >>>>>> * JPRT on all open platforms >>>>>> * Targeted benchmarks >>>>>> * Eyeballing the generated x86 assembly >>>>>> >>>>>> Thanks, >>>>>> -Aleksey >>>>>> >>>>>> >>>> >>>> >> >> -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: OpenPGP digital signature URL: From adinn at redhat.com Wed Aug 12 08:14:07 2015 From: adinn at redhat.com (Andrew Dinn) Date: Wed, 12 Aug 2015 09:14:07 +0100 Subject: RFR (S) 8019968: Reference CAS induces GC store barrier even on failure In-Reply-To: <55CAEFEC.6090005@oracle.com> References: <55B895CB.4060105@oracle.com> <55B89BD6.7050209@redhat.com> <55B8A390.4000203@oracle.com> <55BAE566.5020904@oracle.com> <55C86B41.9010909@oracle.com> <55C95C2D.9050900@oracle.com> <55C9BEF2.2030100@oracle.com> <55CA1BA7.4080907@oracle.com> <55CAEFEC.6090005@oracle.com> Message-ID: <55CB004F.9030903@redhat.com> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 12/08/15 08:04, Aleksey Shipilev wrote: > Thanks, Vladimir! > > Here's a changeset: > http://cr.openjdk.java.net/~shade/8019968/8019968.changeset > > Please sponsor! The patch is fine by me but I think you still need another (capital R) Reviewer. regards, Andrew Dinn - ----------- Senior Principal Software Engineer Red Hat UK Ltd Registered in UK and Wales under Company Registration No. 
3798903
Directors: Michael Cunningham (USA), Matt Parson (USA), Charlie Peters (USA), Michael O'Neill (Ireland)
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2

iQEcBAEBAgAGBQJVywBPAAoJEGnaNq4xxcSzQR8IAI+dnFW1n4DgRrLQdmehqGqk
RrEwAi+JpEGrcX+r5fQtn0KYPZcl8Jse1DfQS22FmmJOkYlx+OxhhDInrEv4ig0z
xCBO8/gKEegLqjNy706Jet3CUOzsX3xeFhgfQoUCwVt5opVmvLhNBV9vuJp6j3eW
RzDCKJG1Utve5RQ61ncbro4N1Xh17FDfZ854rgAm76JPQallyeTqlPadXr8gbk0q
PDDY4n+PqcGxf2jhlinI7IIkuv8V83d6eLE/kHR+41WOHvtwKTLq0g7llDJsqKtm
VX5BUi+93+HipNdtZBYuhDEluRns5R+YTodxZeyeTrQDiRsTUszXlRXB+77cJDY=
=jY4r
-----END PGP SIGNATURE-----

From dawid.weiss at gmail.com Wed Aug 12 11:09:47 2015
From: dawid.weiss at gmail.com (Dawid Weiss)
Date: Wed, 12 Aug 2015 13:09:47 +0200
Subject: Transient miscompilation problem on 1.8 (invalid AIOOB/NPE thrown from the method body).
In-Reply-To:
References:
Message-ID:

FYI. Found it by bisecting hotspot changes and recompiling in
fastdebug. The problem is present consistently before this commit:

$ hg log -r 7381
changeset: 7381:03596ae35800
user: aeriksso
date: Thu May 21 16:49:11 2015 +0200
summary: 8060036: C2: CmpU nodes can end up with wrong type information

I cannot explain why -XX:-TieredCompilation helps here; perhaps it
collects different stats and the compilation graph is different (?).

In any case, the bug issue [1] has an incorrect "Affects" field of "8u60";
it should be at least "8u45", perhaps lower than that (and a related bug
[2] has it set correctly).

Dawid

[1] https://bugs.openjdk.java.net/browse/JDK-8060036
[2] https://bugs.openjdk.java.net/browse/JDK-8080156

On Tue, Aug 11, 2015 at 11:27 PM, Dawid Weiss wrote:
> We tried to narrow it down.
The problem is tied to tiered compilation > somehow because turning it off makes the test pass with flying colors: > > # 1.8.0_45-b14 > PASSES -Xint > PASSES -Xmx4g -Xbatch -XX:CICompilerCount=1 -XX:-TieredCompilation > PASSES -Xmx4g -XX:-TieredCompilation > FAILS -Xmx4g -XX:+TieredCompilation > FAILS -Xmx4g -Xbatch -XX:CICompilerCount=2 -XX:+TieredCompilation > > What's more interesting is that 1.9 and the most recent ea of 1.8 > (u60) also pass, even with tiered compilation turned on: > > # 1.9.0-ea-b71 > PASSES -Xmx4g -Xbatch -XX:CICompilerCount=2 -XX:+TieredCompilation > > # 1.8.0_60-ea-b25 > PASSES [always, regardless of options] > > I can't tell whether it's something masking the original problem or > whether the bug has been fixed in between. I looked at JIRA logs, but > can't find anything specific. If somebody knows what this could be, > I'd appreciate a pointer. > > Dawid > > On Tue, Aug 11, 2015 at 4:25 PM, Dawid Weiss wrote: >> Hello, >> >> We have encountered a transient miscompilation problem (on 1.8u40). We >> get an AIOOB exception from a snippet of code which (provably) cannot >> throw it. The AIOOB is thrown without a stack trace. What's >> interesting is that when we set: >> >> -XX:-OmitStackTraceInFastThrow >> >> we get an NPE exception (which, again, is provably impossible at Java >> code level). >> >> The problem does not reproduce on my machine with i7 3770K (at least >> so far), but does reproduce consistently on i7 2600K (and our >> customer's machine; exact spec unknown). >> >> I will be looking into isolating this issue as it is in our >> proprietary code, but the pattern seems to be as follows: >> >> 1) new instance of A is created, with a new instance of B, which is a >> single-implementation of interface C. >> >> 2) there is a tight loop which calls A (and B) methods. >> >> There is no way for an AIOOB (or NPE) to be present in any of A or B, >> but the stack trace indicates A. 
>> >> I suspect an OSR miscompilation somewhere, but since I can't reproduce >> it locally it's a bit of a problem to experiment with JVM versions and >> internal flags. >> >> Any hints on what it can be related to (flags to try, etc.) would be >> appreciated. >> >> Dawid From adinn at redhat.com Wed Aug 12 12:45:32 2015 From: adinn at redhat.com (Andrew Dinn) Date: Wed, 12 Aug 2015 13:45:32 +0100 Subject: RFR: 8078743: AARCH64: Extend use of stlr to cater for volatile object stores In-Reply-To: <55BA78B7.7030300@oracle.com> References: <55B8B134.2050803@redhat.com> <55BA78B7.7030300@oracle.com> Message-ID: <55CB3FEC.1070709@redhat.com> Hi Vladimir, Apologies for the delay in responding to your feedback -- I was traveling for a team meeting all of last week. Here is a revised webrev which includes all the code changes you suggested http://cr.openjdk.java.net/~adinn/8078743/webrev.04 Also, as requested I did some testing on the two AArch64 machines to which I have access. Does it help? Short answer: yes it is well worth doing as it causes no harm on the sort of architecture where you would expect no benefit and helps a lot on the sort of architecture where you would expect it to help. More details below. regards, Andrew Dinn ----------- The Tests --------- I ran some simple tests using the jmh micro-benchmark harness, first using the old style dmb based implementation (i.e. passing -XX:+UseBarriersForVolatile) and then using the new style stlr-based implementation (using -XX:-UseBarriersForVolatile). Each test was run in each of the 5 relevant GC configs: +G1GC +CMS +UseCondCardMark +CMS -UseCondCardMark +Par +UseCondCardMark +Par -UseCondCardMark The tests were derived from Alexey Shipilev's recently posted CAS test, tweaked to do volatile stores instead of CASes. Each test employs a single thread which repeatedly writes a volatile field (AtomicReference.set). A delay call follows each write (BlackHole.consumeCPU) with the delay argument varying from 0 to 64. 
A single AtomicReference instance is employed throughout the test.

Test one always writes null; test two always writes a fixed object; test
three writes an object newly allocated at each write (example source for
the null write test is included below). This range of tests allows
various elements of the write barrier to be omitted at code generation
time or at run time, depending upon the GC config.

In each case the result was recorded as the average number of
nanoseconds per write operation (ns/op). I am afraid I am not in a
position to give the actual timings on any specific architecture or,
indeed, name what hardware was used. However, I can give a qualitative
account of what I found and it pretty much accords with Andrew Haley's
expectations.

Main Results
------------

With the first (O-O-O CPU) implementation of AArch64 there was no
statistically significant variation in the ns/op.

With the other (simple pipeline CPU) implementation, for most of the
test space there was a very significant improvement (decrease) in ns/op
for the stlr version when compared against the equivalent barrier
implementation.

Detailed Results
----------------

The second machine showed some interesting variations in performance
improvement which are worth mentioning:

- in the best case ns/op was cut by 50% (CMS - UseCondCardMark,
backoff 0, old value write)

- at backoff 0 in most cases ns/op was cut by ~30-35% for null/old
value write and ~15-20% for young value write

- at backoff 64 in most cases ns/op was cut by ~5-10% (n.b.
this is mostly to do with the addition of wasted backoff time -- there was only a small decrease in the absolute times) - with most GC configs greatest improvement was with old value write, least improvement with young value write the above general results did not apply for 2 specific data points - with CMS + UseCondCardMark no significant %ge change was seen for old value writes - with Par + UseCondCardMark no significant %ge change was seen for young value writes These last 2 results are a bit odd. For both old and young puts CMS + UseCondCardMark requires a dmb ish after the stlr to ensure the card read does not float above the volatile store. For null puts the dmb gets elided (type info tells the compiler no card mark needed). So, the difference here between old and young writes is unexpected but must be down to the effect of conditional card marking rather than the barriers vs stlr. Par + UseCondCardMark employs no synchronization for the card mark. Once again the null write case will not need a card mark but the other two cases will. So, once again the disparity in the improvement between these two cases is unexpected but must be down to the effect of conditional card marking rather than the barriers vs stlr. 
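
As an aside on the conditional card marking discussed above: the following is a minimal Java sketch, not HotSpot code, of the difference between an unconditional and a conditional card mark. The class name CardTableSketch, the store counter and the heap size are invented for illustration. The 512-byte card size is taken from the "shr $0x9" in the x86 disassembly quoted earlier in this thread, and the dirty/clean byte values are an assumption of the model.

```java
import java.util.Arrays;

// Simplified model of a card-marking post-barrier. This is NOT HotSpot code:
// all names here are hypothetical and exist only for illustration.
final class CardTableSketch {
    static final int CARD_SHIFT = 9;   // 512-byte cards, matching "shr $0x9"
    static final byte DIRTY = 0;       // assumed value for a dirty card
    static final byte CLEAN = -1;      // assumed value for a clean card

    final byte[] cards;
    int cardStores;                    // number of stores actually made to the card table

    CardTableSketch(int heapBytes) {
        cards = new byte[(heapBytes >>> CARD_SHIFT) + 1];
        Arrays.fill(cards, CLEAN);
    }

    // Models -UseCondCardMark: every reference store dirties the card.
    void markUnconditional(int heapOffset) {
        cards[heapOffset >>> CARD_SHIFT] = DIRTY;
        cardStores++;
    }

    // Models +UseCondCardMark: read the card first, store only if it is not dirty yet.
    void markConditional(int heapOffset) {
        int index = heapOffset >>> CARD_SHIFT;
        if (cards[index] != DIRTY) {
            cards[index] = DIRTY;
            cardStores++;
        }
    }

    public static void main(String[] args) {
        CardTableSketch always = new CardTableSketch(4096);
        CardTableSketch conditional = new CardTableSketch(4096);
        // Repeatedly "write" a reference into the same heap location.
        for (int i = 0; i < 100; i++) {
            always.markUnconditional(1024);
            conditional.markConditional(1024);
        }
        System.out.println("unconditional card stores: " + always.cardStores);
        System.out.println("conditional card stores:   " + conditional.cardStores);
    }
}
```

In this model a repeated write to an already dirty card costs a load and a compare instead of a store, so how much conditional marking matters depends on how often the same card is hit. That is one plausible reason, though not a confirmed one, why it can interact with the old/young write cases differently from the barrier choice itself.
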
Example Test Class ------------------- package org.openjdk; import org.openjdk.jmh.annotations.*; import org.openjdk.jmh.infra.Blackhole; import java.util.concurrent.TimeUnit; import java.util.concurrent.atomic.AtomicReference; @Warmup(iterations = 10, time = 1, timeUnit = TimeUnit.SECONDS) @Measurement(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS) @Fork(3) @BenchmarkMode(Mode.AverageTime) @OutputTimeUnit(TimeUnit.NANOSECONDS) @State(Scope.Benchmark) public class VolSetNull { AtomicReference ref; @Param({"0", "1", "2", "4", "8", "16", "32", "64"}) int backoff; @Setup public void setup() { ref = new AtomicReference<>(); ref.set(new Object()); } @Benchmark public boolean test() { Blackhole.consumeCPU(backoff); ref.set(null); return true; } } From adinn at redhat.com Wed Aug 12 14:44:07 2015 From: adinn at redhat.com (Andrew Dinn) Date: Wed, 12 Aug 2015 15:44:07 +0100 Subject: RFR: 8078743: AARCH64: Extend use of stlr to cater for volatile object stores In-Reply-To: <55CB3FEC.1070709@redhat.com> References: <55B8B134.2050803@redhat.com> <55BA78B7.7030300@oracle.com> <55CB3FEC.1070709@redhat.com> Message-ID: <55CB5BB7.6010006@redhat.com> On 12/08/15 13:45, Andrew Dinn wrote: > Hi Vladimir, > > Apologies for the delay in responding to your feedback -- I was > traveling for a team meeting all of last week. > > Here is a revised webrev which includes all the code changes you suggested > > http://cr.openjdk.java.net/~adinn/8078743/webrev.04 > > Also, as requested I did some testing on the two AArch64 machines to > which I have access. Does it help? Short answer: yes it is well worth > doing as it causes no harm on the sort of architecture where you would > expect no benefit and helps a lot on the sort of architecture where you > would expect it to help. More details below. Oops, I mislabelled the GC configs in my test script so as to swap the two configs Par +/- UseCondCardmark. 
Which means the anomalous result for Par + UseCondCardMark with young writes is actually an anomalous result for Par - UseCondCardMark with young writes. Not that this makes the anomaly that much less unexpected. > The Tests > --------- > > I ran some simple tests using the jmh micro-benchmark harness, first > using the old style dmb based implementation (i.e. passing > -XX:+UseBarriersForVolatile) and then using the new style stlr-based > implementation (using -XX:-UseBarriersForVolatile). Each test was run in > each of the 5 relevant GC configs: > > +G1GC > +CMS +UseCondCardMark > +CMS -UseCondCardMark > +Par +UseCondCardMark > +Par -UseCondCardMark > > The tests were derived from Alexey Shipilev's recently posted CAS test, > tweaked to do volatile stores instead of CASes. Each test employs a > single thread which repeatedly writes a volatile field > (AtomicReference.set). A delay call follows each write > (BlackHole.consumeCPU) with the delay argument varying from 0 to 64. A > single AtomicReference instance is employed throughout the test. > > Test one always writes null; test two always writes a fixed object; test > three writes an object newly allocated at each write (example source for > the null write test is included below). This range of tests allows > various elements of the write barrier to be omitted at generate time or > run time, depending upon the GC config. > > In each case the result was recorded as the average number of > nanoseconds per write operation (ns/op). I am afraid I am not in a > position to give the actual timings on any specific architecture or, > indeed, name what hardware was used. However, I can give a qualitiative > account of what I found and it pretty much accords with Andrew Haley's > expectations. > > Main Results > ------------ > > With the first (O-O-O CPU) implementation of AArch64 there was no > statistically significant variation in the ns/op. 
> > With the other (simple pipeline CPU) implementation for most of the > test space there was a very significant improvement (decrease) in ns/op > for the stlr version when compared against the equivalent barrier > implementation > > Detailed Results > ---------------- > > The second machine showed some interesting variations in performance > improvement which are worth mentioning: > > - in the best case ns/op was cut by 50% (CMS - UseCondCardMark, > backoff 0, old value write) > > - at backoff 0 in most cases ns/op was cut by ~30-35% for null/old > value write and ~15-20% for young value write > > - at backoff 64 in most cases ns/op was cut by ~5-10% (n.b. this is > mostly to do with the addition of wasted backoff time -- there was only > a small decrease in the absolute times) > > - with most GC configs greatest improvement was with old value write, > least improvement with young value write > > the above general results did not apply for 2 specific data points > > - with CMS + UseCondCardMark no significant %ge change was seen for > old value writes > > - with Par + UseCondCardMark no significant %ge change was seen for > young value writes > > These last 2 results are a bit odd. > > For both old and young puts CMS + UseCondCardMark requires a dmb ish > after the stlr to ensure the card read does not float above the volatile > store. For null puts the dmb gets elided (type info tells the compiler > no card mark needed). So, the difference here between old and young > writes is unexpected but must be down to the effect of conditional card > marking rather than the barriers vs stlr. > > Par + UseCondCardMark employs no synchronization for the card mark. > Once again the null write case will not need a card mark but the other > two cases will. So, once again the disparity in the improvement between > these two cases is unexpected but must be down to the effect of > conditional card marking rather than the barriers vs stlr. 
> > Example Test Class > ------------------- > > package org.openjdk; > > import org.openjdk.jmh.annotations.*; > import org.openjdk.jmh.infra.Blackhole; > > import java.util.concurrent.TimeUnit; > import java.util.concurrent.atomic.AtomicReference; > > @Warmup(iterations = 10, time = 1, timeUnit = TimeUnit.SECONDS) > @Measurement(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS) > @Fork(3) > @BenchmarkMode(Mode.AverageTime) > @OutputTimeUnit(TimeUnit.NANOSECONDS) > @State(Scope.Benchmark) > public class VolSetNull { > > AtomicReference ref; > > @Param({"0", "1", "2", "4", "8", "16", "32", "64"}) > int backoff; > > @Setup > public void setup() { > ref = new AtomicReference<>(); > ref.set(new Object()); > } > > @Benchmark > public boolean test() { > Blackhole.consumeCPU(backoff); > ref.set(null); > return true; > } > } > -- regards, Andrew Dinn ----------- Senior Principal Software Engineer Red Hat UK Ltd Registered in UK and Wales under Company Registration No. 3798903 Directors: Michael Cunningham (USA), Matt Parson (USA), Charlie Peters (USA), Michael O'Neill (Ireland) From edward.nevill at gmail.com Wed Aug 12 16:23:32 2015 From: edward.nevill at gmail.com (Edward Nevill) Date: Wed, 12 Aug 2015 17:23:32 +0100 Subject: 8133352: aarch64: generates constrained unpredictable instructions In-Reply-To: <55CA28EC.7060109@oracle.com> References: <1439308653.5920.16.camel@mylittlepony.linaroharston> <55CA28EC.7060109@oracle.com> Message-ID: <1439396612.4820.31.camel@mylittlepony.linaroharston> On Tue, 2015-08-11 at 09:55 -0700, Vladimir Kozlov wrote: > I think it depends how expensive push/pop on arm64. > In c2 generated code you may introduced spills to stack around GetAndAdd code since you use additional register (in > .ad). So you are saving on stack anyway. > On other hand your changes (third temp) are not so big and I think acceptable. 
> On 8/11/15 8:57 AM, Edward Nevill wrote: > > Hi, > > > > Webrev http://cr.openjdk.java.net/~enevill/8133352/ Hi Vladimir, Thanks for that. Another possibility is to use the inverse operation to restore the result after it has been corrupted. EG. -#define ATOMIC_OP(LDXR, OP, STXR) \ +#define ATOMIC_OP(LDXR, OP, IOP, STXR) \ void MacroAssembler::atomic_##OP(Register prev, RegisterOrConstant incr, Register addr) { \ Register result = rscratch2; \ if (prev->is_valid()) \ @@ -2120,14 +2125,15 @@ bind(retry_load); \ LDXR(result, addr); \ OP(rscratch1, result, incr); \ - STXR(rscratch1, rscratch1, addr); \ - cbnzw(rscratch1, retry_load); \ - if (prev->is_valid() && prev != result) \ - mov(prev, result); \ + STXR(rscratch2, rscratch1, addr); \ + cbnzw(rscratch2, retry_load); \ + if (prev->is_valid() && prev != result) { \ + IOP(prev, rscratch1, incr); \ + } \ } -ATOMIC_OP(ldxr, add, stxr) -ATOMIC_OP(ldxrw, addw, stxrw) +ATOMIC_OP(ldxr, add, sub, stxr) +ATOMIC_OP(ldxrw, addw, subw, stxrw) This essentially creates the extra register we need by using the inverse operation to restore the result. It doesn't win any beauty contests, but it is probably the most optimal. All the best, Ed. From dean.long at oracle.com Thu Aug 13 06:25:03 2015 From: dean.long at oracle.com (Dean Long) Date: Wed, 12 Aug 2015 23:25:03 -0700 Subject: aarch64-port-dev ] aarch64-port-dev ] RFR (S) 8131682: C1 should use multibyte nops everywhere In-Reply-To: <55C9C00A.3040302@oracle.com> References: <55C9C00A.3040302@oracle.com> Message-ID: <55CC383F.2070105@oracle.com> OK, I pushed it for you. dl On 8/11/2015 2:27 AM, Aleksey Shipilev wrote: > Hi Dean, > > Ah yes, since we now use MacroAssembler::align to produce the effective > alignment, we can drop the platform-specific changes. ARM and PPC ports > may rewire their own MacroAssemblers if there are potentially better nop > sequences. 
>
> New changeset:
> http://cr.openjdk.java.net/~shade/8131682/8131682.changeset
>
> Tested it builds and runs with full JPRT.
>
> See the "Reviewed-by" line there. I think there are Reviewers there...
>
> Thanks,
> -Aleksey
>
> On 08/10/2015 10:57 PM, Dean wrote:
>> Did you get a Reviewer yet?
>>
>> dl
>>
>>
>> Dean wrote:
>>> I can sponsor this. How about removing the ppc, aarch64, and sparc changes and making sure it still builds?
>>>
>>> dl
>>>
>>>
>>> Aleksey Shipilev wrote:
>>>> On 07/29/2015 07:30 PM, Dean Long wrote:
>>>>> On 7/29/2015 7:01 AM, Aleksey Shipilev wrote:
>>>>>> On 07/27/2015 01:21 PM, Andrew Haley wrote:
>>>>>>> On 27/07/15 10:13, Aleksey Shipilev wrote:
>>>>>>>> Thanks Goetz! Fixed the assembler_ppc.inline.hpp.
>>>>>>>>
>>>>>>>> Andrew/Edward, are you OK with AArch64 part?
>>>>>>>> http://cr.openjdk.java.net/~shade/8131682/webrev.02/
>>>>>>> I agree that it looks good.
>>>>>> So, we have reviews from Dean Long, Goetz Lindenmaier, Andrew Dinn, and
>>>>>> Andrew Haley. Still no Capital (R)eviewers.
>>>>>>
>>>>>> Otherwise, I think we are good to go. I respinned the JPRT with
>>>>>> open+closed sources, and it would seem the changes in closed sources are
>>>>>> not required.
>>>>> The changes to sparc and ppc may not be required anymore.
>>>> Excellent, please sponsor!
>>>> http://cr.openjdk.java.net/~shade/8131682/8131682.changeset
>>>>
>>>> Thanks,
>>>> -Aleksey
>>>>
>>>>
>

From adinn at redhat.com Fri Aug 14 09:01:02 2015
From: adinn at redhat.com (Andrew Dinn)
Date: Fri, 14 Aug 2015 10:01:02 +0100
Subject: RFR: 8078743: AARCH64: Extend use of stlr to cater for volatile object stores
In-Reply-To: <55CB3FEC.1070709@redhat.com>
References: <55B8B134.2050803@redhat.com> <55BA78B7.7030300@oracle.com> <55CB3FEC.1070709@redhat.com>
Message-ID: <55CDAE4E.5060301@redhat.com>

Any chance of getting my updated patch reviewed by 2 people and
sponsored to go into hs-comp?
I now have the follow on patch for issue ready for review 8080293: AARCH64: Remove unnecessary dmbs from generated CAS code So it would be very helpful if this one could be checked. Thanks! regards, Andrew Dinn ----------- On 12/08/15 13:45, Andrew Dinn wrote: > Hi Vladimir, > > Apologies for the delay in responding to your feedback -- I was > traveling for a team meeting all of last week. > > Here is a revised webrev which includes all the code changes you suggested > > http://cr.openjdk.java.net/~adinn/8078743/webrev.04 > > Also, as requested I did some testing on the two AArch64 machines to > which I have access. Does it help? Short answer: yes it is well worth > doing as it causes no harm on the sort of architecture where you would > expect no benefit and helps a lot on the sort of architecture where you > would expect it to help. More details below. > > regards, > > > Andrew Dinn > ----------- > > The Tests > --------- > > I ran some simple tests using the jmh micro-benchmark harness, first > using the old style dmb based implementation (i.e. passing > -XX:+UseBarriersForVolatile) and then using the new style stlr-based > implementation (using -XX:-UseBarriersForVolatile). Each test was run in > each of the 5 relevant GC configs: > > +G1GC > +CMS +UseCondCardMark > +CMS -UseCondCardMark > +Par +UseCondCardMark > +Par -UseCondCardMark > > The tests were derived from Alexey Shipilev's recently posted CAS test, > tweaked to do volatile stores instead of CASes. Each test employs a > single thread which repeatedly writes a volatile field > (AtomicReference.set). A delay call follows each write > (BlackHole.consumeCPU) with the delay argument varying from 0 to 64. A > single AtomicReference instance is employed throughout the test. > > Test one always writes null; test two always writes a fixed object; test > three writes an object newly allocated at each write (example source for > the null write test is included below). 
This range of tests allows > various elements of the write barrier to be omitted at generate time or > run time, depending upon the GC config. > > In each case the result was recorded as the average number of > nanoseconds per write operation (ns/op). I am afraid I am not in a > position to give the actual timings on any specific architecture or, > indeed, name what hardware was used. However, I can give a qualitiative > account of what I found and it pretty much accords with Andrew Haley's > expectations. > > Main Results > ------------ > > With the first (O-O-O CPU) implementation of AArch64 there was no > statistically significant variation in the ns/op. > > With the other (simple pipeline CPU) implementation for most of the > test space there was a very significant improvement (decrease) in ns/op > for the stlr version when compared against the equivalent barrier > implementation > > Detailed Results > ---------------- > > The second machine showed some interesting variations in performance > improvement which are worth mentioning: > > - in the best case ns/op was cut by 50% (CMS - UseCondCardMark, > backoff 0, old value write) > > - at backoff 0 in most cases ns/op was cut by ~30-35% for null/old > value write and ~15-20% for young value write > > - at backoff 64 in most cases ns/op was cut by ~5-10% (n.b. this is > mostly to do with the addition of wasted backoff time -- there was only > a small decrease in the absolute times) > > - with most GC configs greatest improvement was with old value write, > least improvement with young value write > > the above general results did not apply for 2 specific data points > > - with CMS + UseCondCardMark no significant %ge change was seen for > old value writes > > - with Par + UseCondCardMark no significant %ge change was seen for > young value writes > > These last 2 results are a bit odd. 
> > For both old and young puts CMS + UseCondCardMark requires a dmb ish > after the stlr to ensure the card read does not float above the volatile > store. For null puts the dmb gets elided (type info tells the compiler > no card mark needed). So, the difference here between old and young > writes is unexpected but must be down to the effect of conditional card > marking rather than the barriers vs stlr. > > Par + UseCondCardMark employs no synchronization for the card mark. > Once again the null write case will not need a card mark but the other > two cases will. So, once again the disparity in the improvement between > these two cases is unexpected but must be down to the effect of > conditional card marking rather than the barriers vs stlr. > > Example Test Class > ------------------- > > package org.openjdk; > > import org.openjdk.jmh.annotations.*; > import org.openjdk.jmh.infra.Blackhole; > > import java.util.concurrent.TimeUnit; > import java.util.concurrent.atomic.AtomicReference; > > @Warmup(iterations = 10, time = 1, timeUnit = TimeUnit.SECONDS) > @Measurement(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS) > @Fork(3) > @BenchmarkMode(Mode.AverageTime) > @OutputTimeUnit(TimeUnit.NANOSECONDS) > @State(Scope.Benchmark) > public class VolSetNull { > > AtomicReference ref; > > @Param({"0", "1", "2", "4", "8", "16", "32", "64"}) > int backoff; > > @Setup > public void setup() { > ref = new AtomicReference<>(); > ref.set(new Object()); > } > > @Benchmark > public boolean test() { > Blackhole.consumeCPU(backoff); > ref.set(null); > return true; > } > } > -- regards, Andrew Dinn ----------- Senior Principal Software Engineer Red Hat UK Ltd Registered in UK and Wales under Company Registration No. 
3798903 Directors: Michael Cunningham (USA), Matt Parson (USA), Charlie Peters (USA), Michael O'Neill (Ireland) From aleksey.shipilev at oracle.com Fri Aug 14 11:13:16 2015 From: aleksey.shipilev at oracle.com (Aleksey Shipilev) Date: Fri, 14 Aug 2015 14:13:16 +0300 Subject: RFR (S) 8019968: Reference CAS induces GC store barrier even on failure In-Reply-To: <55CB004F.9030903@redhat.com> References: <55B895CB.4060105@oracle.com> <55B89BD6.7050209@redhat.com> <55B8A390.4000203@oracle.com> <55BAE566.5020904@oracle.com> <55C86B41.9010909@oracle.com> <55C95C2D.9050900@oracle.com> <55C9BEF2.2030100@oracle.com> <55CA1BA7.4080907@oracle.com> <55CAEFEC.6090005@oracle.com> <55CB004F.9030903@redhat.com> Message-ID: <55CDCD4C.9040404@oracle.com> On 08/12/2015 11:14 AM, Andrew Dinn wrote: > On 12/08/15 08:04, Aleksey Shipilev wrote: >> Thanks, Vladimir! > >> Here's a changeset: >> http://cr.openjdk.java.net/~shade/8019968/8019968.changeset > >> Please sponsor! > > The patch is fine by me but I think you still need another (capital R) > Reviewer. Ah. I can never remember the rules. One more Reviewer and a sponsort, please? This patch implicitly collides with other CAS changes Andrew has in pipeline, so it would be good to push this sooner. Thanks, -Aleksey -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: OpenPGP digital signature URL: From roland.westrelin at oracle.com Fri Aug 14 17:55:51 2015 From: roland.westrelin at oracle.com (Roland Westrelin) Date: Fri, 14 Aug 2015 19:55:51 +0200 Subject: RFR(XS): 8133599: Unsafe.getAndSetObject() is no longer intrinsified by c2 Message-ID: <874mk1lpp4.fsf@oracle.com> http://cr.openjdk.java.net/~roland/8133599/webrev.00/ Last intrinsic (Unsafe.getAndSetObject() currently) is wrongly filtered out. Roland. 
From aleksey.shipilev at oracle.com Fri Aug 14 18:04:39 2015 From: aleksey.shipilev at oracle.com (Aleksey Shipilev) Date: Fri, 14 Aug 2015 21:04:39 +0300 Subject: RFR(XS): 8133599: Unsafe.getAndSetObject() is no longer intrinsified by c2 In-Reply-To: <874mk1lpp4.fsf@oracle.com> References: <874mk1lpp4.fsf@oracle.com> Message-ID: <55CE2DB7.6050606@oracle.com> On 14.08.2015 20:55, Roland Westrelin wrote: > > http://cr.openjdk.java.net/~roland/8133599/webrev.00/ > > Last intrinsic (Unsafe.getAndSetObject() currently) is wrongly filtered out. Ouch. Fix looks good, but what's the point for bounds-checking the $id anyway? Let the "default" clause in switch(id) to handle "return false", assuming no library-compiler-inline intrinsics appear in switch? Thanks, -Aleksey -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: OpenPGP digital signature URL: From dean.long at oracle.com Fri Aug 14 18:05:12 2015 From: dean.long at oracle.com (Dean Long) Date: Fri, 14 Aug 2015 11:05:12 -0700 Subject: RFR(XS): 8133599: Unsafe.getAndSetObject() is no longer intrinsified by c2 In-Reply-To: <874mk1lpp4.fsf@oracle.com> References: <874mk1lpp4.fsf@oracle.com> Message-ID: <55CE2DD8.5030400@oracle.com> Looks good. dl On 8/14/2015 10:55 AM, Roland Westrelin wrote: > http://cr.openjdk.java.net/~roland/8133599/webrev.00/ > > Last intrinsic (Unsafe.getAndSetObject() currently) is wrongly filtered out. > > Roland. From vladimir.kozlov at oracle.com Fri Aug 14 18:22:01 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 14 Aug 2015 11:22:01 -0700 Subject: RFR(XS): 8133599: Unsafe.getAndSetObject() is no longer intrinsified by c2 In-Reply-To: <874mk1lpp4.fsf@oracle.com> References: <874mk1lpp4.fsf@oracle.com> Message-ID: <55CE31C9.1020609@oracle.com> Looks good. 
Thanks, Vladimir On 8/14/15 10:55 AM, Roland Westrelin wrote: > > http://cr.openjdk.java.net/~roland/8133599/webrev.00/ > > Last intrinsic (Unsafe.getAndSetObject() currently) is wrongly filtered out. > > Roland. > From roland.westrelin at oracle.com Fri Aug 14 18:44:07 2015 From: roland.westrelin at oracle.com (Roland Westrelin) Date: Fri, 14 Aug 2015 20:44:07 +0200 Subject: RFR(XS): 8133599: Unsafe.getAndSetObject() is no longer intrinsified by c2 In-Reply-To: <55CE2DB7.6050606@oracle.com> References: <874mk1lpp4.fsf@oracle.com> <55CE2DB7.6050606@oracle.com> Message-ID: <87y4hdk8w8.fsf@oracle.com> > Fix looks good, but what's the point for bounds-checking the $id anyway? > Let the "default" clause in switch(id) to handle "return false", > assuming no library-compiler-inline intrinsics appear in switch? Thanks for looking at this, Aleksey. I guess bound checking the id is a bit faster than going through the switch and maybe less error prone. Roland. From vladimir.x.ivanov at oracle.com Fri Aug 14 18:45:59 2015 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Fri, 14 Aug 2015 11:45:59 -0700 Subject: RFR(M): 8080289: Intermediate writes in a loop not eliminated by optimizer In-Reply-To: References: <55789088.5050405@oracle.com> <9DE849F6-9E4A-4649-A174-793726D67FD5@oracle.com> <5580738D.9070900@oracle.com> <5581C465.7070803@oracle.com> Message-ID: <55CE3767.7010408@oracle.com> Looks good. Best regards, Vladimir Ivanov On 7/31/15 3:20 AM, Roland Westrelin wrote: > Here is a new webrev for this that takes Vladimir?s comments into account: > > http://cr.openjdk.java.net/~roland/8080289/webrev.01/ > > Roland. > > >> On Jun 17, 2015, at 9:03 PM, Vladimir Kozlov wrote: >> >>> http://gee.cs.oswego.edu/dl/jmm/cookbook.html >>> >>> it?s allowed to reorder normal stores with normal stores >> >> If we can guarantee that all passed stores are normal (I assume we will have barriers otherwise in between) then I agree. 
I am not sure why we didn't do it before, there could be a counterargument for that which I don't remember. To make sure, ask John. >> >>>> We need to call Ideal() again if store inputs are changed. So if st2 is removed then inputs of st1 are changed so we need to rerun Ideal(). This allow to avoid having your new loop in the Ideal(). >>> >>> Sorry, I don't understand this. Are you saying there's no need for a loop at all? Or are you saying that as soon as there's a similar store that is found we should return from Ideal that will be called again to maybe find other similar stores? >> >> Yes, it may simplify the code of Ideal. You may still need a loop to look for previous store which could be eliminated but you don't need to have 'prev'. As soon you remove one node, you exit Ideal returning 'this' and it will be called again so you can search for another previous store. >> >>>> BOTTOM (all slices) Phi? >>> >>> Wouldn't there be a MergeMem between the store and the Phi then? >> >> Yes. Okay, you can keep the check as assert we will see if Nightly testing hit it or not. >> >> Thanks, >> Vladimir >> >> On 6/17/15 1:35 AM, Roland Westrelin wrote: >>> >>>>> That's what I think the code does. That is if you have: >>>>> >>>>> st1->st2->st3->st4 >>>> >>>> I assume st4 is first store and st1 is last. Right? >>> >>> Program order is: >>> st4 >>> st3 >>> st2 >>> st1 >>> >>>>> and st3 is redundant with st1, the chain should become: >>>>> >>>>> st1->st2->st4 >>>> >>>> I am not sure it is correct optimization. On some machines result of st3 could be visible before result of st2. And you change it. >>>> I am suggesting not do that. Do you need that for stores move from loop? >>> >>> It's not required.
It cleans up the graph in some cases like this: >>> >>> static void test_after_5(int idx) { >>> for (int i = 0; i < 1000; i++) { >>> array[idx] = i; >>> array[idx+1] = i; >>> array[idx+2] = i; >>> array[idx+3] = i; >>> array[idx+4] = i; >>> array[idx+5] = i; >>> } >>> } >>> >>> all stores are sunk out of the loop but that happens after iteration splitting and so there are multiple redundant copies of each store that are not collapsed. >>> >>> This said, we currently reorder the stores even if it's less aggressive than what I'm proposing. With program: >>> >>> st4 >>> st3 >>> st2 >>> st1 >>> >>> If st1, st3 and st4 are on one slice and st2 is on another and if st1 and st3 store to the same address we optimize st3 out: >>> >>> st4 >>> st2 >>> st1 >>> >>> so st3=st1 may only be visible after st2. >>> >>> Also, the way I read the first table in this: >>> >>> http://gee.cs.oswego.edu/dl/jmm/cookbook.html >>> >>> it's allowed to reorder normal stores with normal stores >>> >>>>> so we need to change the memory input of st2 when we find st3 can be removed. In the code, at that point, this=st1, st = st3 and prev=st2. >>>> >>>> In this case the code should be: >>>> >>>> if (st->in(MemNode::Address)->eqv_uncast(address) && >>>> ... >>>> } else { >>>> prev = st; >>>> } >>>> >>>> to update 'prev' with 'st' only if 'st' is not removed. >>> >>> You're right. >>> >>>> Also, I think, st->in(MemNode::Memory) could be put in local var since it is used several times in this code. >>>> >>>>> >>>>>> You need to set improved = true since 'this' will not change. We also use 'make_progress' variable's name in such cases. >>>>> >>>>> In the example above, if we remove st2, we modify this, right? >>>> >>>> We need to call Ideal() again if store inputs are changed. So if st2 is removed then inputs of st1 are changed so we need to rerun Ideal(). This allow to avoid having your new loop in the Ideal(). >>> >>> Sorry, I don't understand this. Are you saying there's no need for a loop at all?
Or are you saying that as soon as there's a similar store that is found we should return from Ideal that will be called again to maybe find other similar stores? >>> >>>>> We'll find a path from the head that doesn't go through the store and that exits the loop. What the comment doesn't say is that with example 2 below: >>>>> >>>>> for (int i = 0; i < 10; i++) { >>>>> if (some_condition) { >>>>> uncommon_trap(); >>>>> } >>>>> array[idx] = 999; >>>>> } >>>>> >>>>> my verification code would find the early exit as well. >>>>> >>>>> It's verification code only because if we have example 1 above, then we have a memory Phi to merge both branches of the if. So the pattern that we look for in PhaseIdealLoop::try_move_store_before_loop() won't match: the loop's memory Phi backedge won't be the store. If we have example 2 above, then the loop's memory Phi doesn't have a single memory use. So I don't think we need to check that the store post dominate the loop head in product. That's my reasoning anyway and the verification code is there to verify it. >>>> >>>> I missed 'mem->in(LoopNode::LoopBackControl) == n' condition. Which reduce cases only to one store to this address in the loop - good. >>>> >>>> How you check in product VM that there are no other exits from a loop (your example 2)? Is it guarded by mem->outcnt() == 1 check? >>> >>> Yes. >>> >>>>>> Should you check phi == NULL instead of assert to make sure you have only one Phi node? >>>>> >>>>> Can there be more than one memory Phi for a particular slice that has in(0) == n_loop->_head? >>>>> I would have expected that to be impossible. >>>> >>>> BOTTOM (all slices) Phi? >>> >>> Wouldn't there be a MergeMem between the store and the Phi then? >>> >>> For the record, the webrev: >>> >>> http://cr.openjdk.java.net/~roland/8080289/webrev.00/ >>> >>> Roland.
>>> > From roland.westrelin at oracle.com Fri Aug 14 18:48:24 2015 From: roland.westrelin at oracle.com (Roland Westrelin) Date: Fri, 14 Aug 2015 20:48:24 +0200 Subject: RFR (S) 8019968: Reference CAS induces GC store barrier even on failure In-Reply-To: <55CDCD4C.9040404@oracle.com> References: <55B895CB.4060105@oracle.com> <55B89BD6.7050209@redhat.com> <55B8A390.4000203@oracle.com> <55BAE566.5020904@oracle.com> <55C86B41.9010909@oracle.com> <55C95C2D.9050900@oracle.com> <55C9BEF2.2030100@oracle.com> <55CA1BA7.4080907@oracle.com> <55CAEFEC.6090005@oracle.com> <55CB004F.9030903@redhat.com> <55CDCD4C.9040404@oracle.com> Message-ID: <87tws1k8p3.fsf@oracle.com> > One more Reviewer and a sponsort, please? That looks good to me. Will push it. Roland. From aleksey.shipilev at oracle.com Fri Aug 14 18:51:27 2015 From: aleksey.shipilev at oracle.com (Aleksey Shipilev) Date: Fri, 14 Aug 2015 21:51:27 +0300 Subject: RFR(XS): 8133599: Unsafe.getAndSetObject() is no longer intrinsified by c2 In-Reply-To: <87y4hdk8w8.fsf@oracle.com> References: <874mk1lpp4.fsf@oracle.com> <55CE2DB7.6050606@oracle.com> <87y4hdk8w8.fsf@oracle.com> Message-ID: <55CE38AF.4030103@oracle.com> On 14.08.2015 21:44, Roland Westrelin wrote: > >> Fix looks good, but what's the point for bounds-checking the $id anyway? >> Let the "default" clause in switch(id) to handle "return false", >> assuming no library-compiler-inline intrinsics appear in switch? > > Thanks for looking at this, Aleksey. I guess bound checking the id is a > bit faster than going through the switch and maybe less error prone. Well, "less error prone" is kinda contradicted by empirical evidence here. But I don't have strong opinion about this. -Aleksey -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: OpenPGP digital signature URL: From aleksey.shipilev at oracle.com Fri Aug 14 18:51:41 2015 From: aleksey.shipilev at oracle.com (Aleksey Shipilev) Date: Fri, 14 Aug 2015 21:51:41 +0300 Subject: RFR (S) 8019968: Reference CAS induces GC store barrier even on failure In-Reply-To: <87tws1k8p3.fsf@oracle.com> References: <55B895CB.4060105@oracle.com> <55B89BD6.7050209@redhat.com> <55B8A390.4000203@oracle.com> <55BAE566.5020904@oracle.com> <55C86B41.9010909@oracle.com> <55C95C2D.9050900@oracle.com> <55C9BEF2.2030100@oracle.com> <55CA1BA7.4080907@oracle.com> <55CAEFEC.6090005@oracle.com> <55CB004F.9030903@redhat.com> <55CDCD4C.9040404@oracle.com> <87tws1k8p3.fsf@oracle.com> Message-ID: <55CE38BD.5010103@oracle.com> On 14.08.2015 21:48, Roland Westrelin wrote: > >> One more Reviewer and a sponsort, please? > > That looks good to me. Will push it. Thank you, Roland! -Aleksey -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: OpenPGP digital signature URL: From vladimir.x.ivanov at oracle.com Fri Aug 14 18:57:54 2015 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Fri, 14 Aug 2015 11:57:54 -0700 Subject: RFR(M): 8130847: Cloned object's fields observed as null after C2 escape analysis In-Reply-To: References: <071D5DCC-9B77-4A86-9DA1-6C97BC99F496@oracle.com> <55B7ADFE.3060109@oracle.com> <9A22C312-E1B5-4851-82C2-CE5FBCF75869@oracle.com> <55B97467.3000404@oracle.com> Message-ID: <55CE3A32.7020600@oracle.com> Looks good. Best regards, Vladimir Ivanov On 7/30/15 11:29 AM, Roland Westrelin wrote: > Updated webrev with Vladimir?s comments: > > http://cr.openjdk.java.net/~roland/8130847/webrev.01/ > > Roland. 
> >> On Jul 30, 2015, at 2:48 AM, Vladimir Kozlov wrote: >> >> On 7/29/15 6:57 AM, Roland Westrelin wrote: >>>> The next change puzzles me: >>>> >>>> - if (!call->may_modify(tinst, phase)) { >>>> + if (call->may_modify(tinst, phase)) { >>>> - mem = call->in(TypeFunc::Memory); >>>> + assert(call->is_ArrayCopy(), "ArrayCopy is the only call node that doesn't make allocation escape"); >>>> >>>> Why only ArrayCopy? I think it is most of calls. What set of tests you ran? >>>> >>>> Methods naming is confusing. membar_for_arraycopy() does not check for membar but for calls which can modify. handle_arraycopy() could be make_arraycopy_load(). >>> >>> What about: >>> >>> static bool may_modify(const TypeOopPtr *t_oop, MemBarNode* mb, PhaseTransform *phase); >>> >>> instead of membar_for_arraycopy() >>> >>> So ArrayCopyNode would have: >>> >>> virtual bool may_modify(const TypeOopPtr *t_oop, PhaseTransform *phase); >>> >>> and >>> >>> static bool may_modify(const TypeOopPtr *t_oop, MemBarNode* mb, PhaseTransform *phase); >>> >>> that do the same thing except the static method also looks for a graph pattern starting from a MemBar. >> >> Yes, it is better. >> >> Thanks, >> Vladimir >> >>> >>> Roland. >>> >>>> >>>> Add explicit check: >>>> && strcmp(_name, "unsafe_arraycopy") != 0) >>>> >>>> Thanks, >>>> Vladimir >>>> >>>> On 7/28/15 7:05 AM, Roland Westrelin wrote: >>>>> http://cr.openjdk.java.net/~roland/8130847/webrev.00/ >>>>> >>>>> When an allocation which is the destination of an ArrayCopyNode is eliminated, field's values recorded at a safepoint (to reallocate the object) do not take the ArrayCopyNode into account at all and the effect of the ArrayCopyNode is lost on a deoptimization. This fix records values from the source of the ArrayCopyNode, emitting new loads if necessary.
>>>>> >>>>> I also use the opportunity to pin the loads generated in LoadNode::can_see_arraycopy_value() because they depend on all checks that validate the array copy and not only on the check that immediately dominates. >>>>> >>>>> Roland. > From roland.westrelin at oracle.com Fri Aug 14 18:59:47 2015 From: roland.westrelin at oracle.com (Roland Westrelin) Date: Fri, 14 Aug 2015 20:59:47 +0200 Subject: RFR(XS): 8133599: Unsafe.getAndSetObject() is no longer intrinsified by c2 In-Reply-To: <55CE38AF.4030103@oracle.com> References: <874mk1lpp4.fsf@oracle.com> <55CE2DB7.6050606@oracle.com> <87y4hdk8w8.fsf@oracle.com> <55CE38AF.4030103@oracle.com> Message-ID: <87r3n5k864.fsf@oracle.com> > Well, "less error prone" is kinda contradicted by empirical evidence > here. But I don't have strong opinion about this. I actually don't have a strong opinion on this either but Vladimir said to go with the proposed change. :-) Roland. From roland.westrelin at oracle.com Sat Aug 15 00:52:57 2015 From: roland.westrelin at oracle.com (Roland Westrelin) Date: Sat, 15 Aug 2015 02:52:57 +0200 Subject: RFR(M): 8130847: Cloned object's fields observed as null after C2 escape analysis In-Reply-To: <55CE3A32.7020600@oracle.com> References: <071D5DCC-9B77-4A86-9DA1-6C97BC99F496@oracle.com> <55B7ADFE.3060109@oracle.com> <9A22C312-E1B5-4851-82C2-CE5FBCF75869@oracle.com> <55B97467.3000404@oracle.com> <55CE3A32.7020600@oracle.com> Message-ID: <87lhddjrti.fsf@oracle.com> Thanks Vladimir & Vladimir for the reviews. Roland. From roland.westrelin at oracle.com Tue Aug 18 07:30:32 2015 From: roland.westrelin at oracle.com (Roland Westrelin) Date: Tue, 18 Aug 2015 09:30:32 +0200 Subject: RFR(XS): 8133599: Unsafe.getAndSetObject() is no longer intrinsified by c2 In-Reply-To: <55CE31C9.1020609@oracle.com> References: <874mk1lpp4.fsf@oracle.com> <55CE31C9.1020609@oracle.com> Message-ID: Thanks for the reviews Dean & Vladimir. Roland. 
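Editorial sketch for the 8133599 thread above: the messages do not show the filter itself, so the following Java is only an illustration of how a range check can drop the last entry, and of the switch-default alternative Aleksey raises. It assumes the bug was an off-by-one upper bound, which the thread implies but does not state. The enum, its constants, and all three methods are hypothetical names and are not taken from the webrev or from HotSpot.

```java
// Hypothetical illustration only: none of these names come from HotSpot or
// from the 8133599 webrev. Assumption: the filter was an off-by-one range check.
final class IntrinsicFilterSketch {
    // The final constant plays the role of the "last intrinsic".
    enum Id { NONE, GET_OBJECT, PUT_OBJECT, GET_AND_SET_OBJECT }

    static final Id FIRST = Id.GET_OBJECT;
    static final Id LAST = Id.GET_AND_SET_OBJECT;

    // Exclusive upper bound: the last intrinsic is wrongly filtered out.
    static boolean isAvailableBuggy(Id id) {
        return id.ordinal() >= FIRST.ordinal() && id.ordinal() < LAST.ordinal();
    }

    // Inclusive upper bound: the last intrinsic is accepted.
    static boolean isAvailableFixed(Id id) {
        return id.ordinal() >= FIRST.ordinal() && id.ordinal() <= LAST.ordinal();
    }

    // Alternative without a range check: the default clause rejects unknown ids.
    static boolean isAvailableSwitch(Id id) {
        switch (id) {
            case GET_OBJECT:
            case PUT_OBJECT:
            case GET_AND_SET_OBJECT:
                return true;
            default:
                return false;
        }
    }
}
```

The switch variant has no bound to get wrong, at the cost of listing every id; the range check is shorter but depends on the bound staying in step with the enum.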
From zoltan.majo at oracle.com Tue Aug 18 12:15:58 2015 From: zoltan.majo at oracle.com (Zoltán Majó) Date: Tue, 18 Aug 2015 14:15:58 +0200 Subject: [9] RFR(M): 8076373: In 32-bit VM interpreter and compiled code process signaling NaN values inconsistently In-Reply-To: References: <55C4AF44.3060907@oracle.com> <55C507FA.1090507@oracle.com> Message-ID: <55D321FE.3040808@oracle.com> Thank you, Vladimir and Michael, for the review! I plan to push the changes tomorrow. Best regards, Zoltan On 08/07/2015 10:37 PM, Berg, Michael C wrote: > Zoltan, the code looks ok. I have reviewed it in detail. > > Thanks, > -Michael > > -----Original Message----- > From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Vladimir Kozlov > Sent: Friday, August 07, 2015 12:33 PM > To: hotspot-compiler-dev at openjdk.java.net > Subject: Re: [9] RFR(M): 8076373: In 32-bit VM interpreter and compiled code process signaling NaN values inconsistently > > I think this is good. You need second review since changes are big and complex. > > Thanks, > Vladimir > > On 8/7/15 6:14 AM, Zoltán Majó wrote: >> Hi, >> >> >> please review the following patch for JDK-8076373. >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8076373 >> >> >> Problem: On x86_32 systems with XMM instructions available, the >> compilers and the interpreter behave inconsistently as far as >> signalling NaNs (sNaNs) are concerned. For example, the following >> statement >> >> start == doubleToRawLongBits(longBitsToDouble(start)) >> >> can be true or false, assuming that the variable 'start' contains a >> bit pattern corresponding to a sNaN. >> >> The result is true if the statement is executed by compiled code and >> longBitsToDouble/doubleToRawLongBits have been replaced by compiler >> intrinsics. The result is false if the native library version of the >> functions is used (either by compiled or by interpreted code).
>> >> The inconsistency happens because the interpreter/native ABI relies on >> x87 instructions to process floating point numbers, whereas the >> compilers use XMM registers for the same purpose. x87 instructions >> silently convert signaling NaNs to quiet NaNs, XMM instructions >> preserve sNaNs. >> >> >> Solution: >> - Add intrinsics (stubs) for java.lang.Float.intBitsToFloat, >> java.lang.Float.floatToRawIntBits, java.lang.Double.longBitsToDouble, >> and java.lang.Double.doubleToRawLongBits. The stubs use XMM registers >> and therefore preserve sNaNs. The stubs are used by both the >> interpreter and the compilers. >> - Change the interpreter to use XMM registers instead of x87 registers >> to internally cache floating point values. As a result, sNaNs are >> preserved within the interpreter. >> >> >> Webrev: >> http://cr.openjdk.java.net/~zmajo/8076373/webrev.00/ >> >> Testing: >> - JPRT run, testset hotspot (including the newly added test, >> NaNTest.java); all tests pass; >> - all JTREG tests in hotspot/test on x86_32 and x86_64; all tests pass >> that pass with the default version of the VM. >> >> Thank you and best regards, >> >> >> Zoltan >> From roland.westrelin at oracle.com Tue Aug 18 12:47:14 2015 From: roland.westrelin at oracle.com (Roland Westrelin) Date: Tue, 18 Aug 2015 14:47:14 +0200 Subject: RFR(M): 8080289: Intermediate writes in a loop not eliminated by optimizer In-Reply-To: <55CE3767.7010408@oracle.com> References: <55789088.5050405@oracle.com> <9DE849F6-9E4A-4649-A174-793726D67FD5@oracle.com> <5580738D.9070900@oracle.com> <5581C465.7070803@oracle.com> <55CE3767.7010408@oracle.com> Message-ID: Thanks Vladimir & Vladimir for the reviews. Roland. 
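Editorial sketch for the 8076373 review above: the round trip Zoltan describes can be tried with a few lines of Java. This is a minimal probe, not the NaNTest.java from the webrev; the class name, the helper names, and the chosen bit pattern are assumptions made for illustration. It relies on the IEEE 754 double layout as used on x86, where the top mantissa bit (bit 51) is set in a quiet NaN and clear in a signaling NaN.

```java
// Minimal probe, not the NaNTest.java from the webrev. Class and helper names
// are illustrative assumptions.
final class SignalingNaNSketch {
    static final long EXPONENT_MASK = 0x7FF0000000000000L;
    static final long MANTISSA_MASK = 0x000FFFFFFFFFFFFFL;
    // Top mantissa bit: set for quiet NaNs, clear for signaling NaNs (x86 convention).
    static final long QUIET_BIT = 0x0008000000000000L;

    static boolean isSignalingNaN(long bits) {
        boolean isNaN = (bits & EXPONENT_MASK) == EXPONENT_MASK
                && (bits & MANTISSA_MASK) != 0;
        return isNaN && (bits & QUIET_BIT) == 0;
    }

    // What an sNaN turns into once the quiet bit has been set.
    static long quieted(long bits) {
        return bits | QUIET_BIT;
    }

    // The expression from the bug report, minus the comparison.
    static long roundTrip(long bits) {
        return Double.doubleToRawLongBits(Double.longBitsToDouble(bits));
    }

    public static void main(String[] args) {
        long start = 0x7FF0000000000001L; // one possible sNaN bit pattern
        long result = roundTrip(start);
        System.out.println("start  = 0x" + Long.toHexString(start));
        System.out.println("result = 0x" + Long.toHexString(result));
        System.out.println(result == start ? "sNaN preserved" : "bit pattern changed");
    }
}
```

Which line is printed depends on the VM, the platform, and whether the calls run interpreted or compiled, which is exactly the inconsistency the patch targets; no particular output is claimed here.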
From edward.nevill at gmail.com Tue Aug 18 13:07:11 2015 From: edward.nevill at gmail.com (Edward Nevill) Date: Tue, 18 Aug 2015 14:07:11 +0100 Subject: 8133352: aarch64: generates constrained unpredictable instructions In-Reply-To: <1439396612.4820.31.camel@mylittlepony.linaroharston> References: <1439308653.5920.16.camel@mylittlepony.linaroharston> <55CA28EC.7060109@oracle.com> <1439396612.4820.31.camel@mylittlepony.linaroharston> Message-ID: <1439903231.5709.5.camel@mylittlepony.linaroharston> Hi, Given that there has been no objections to my proposed solution I have prepared a webrev based on this. http://cr.openjdk.java.net/~enevill/8133352/webrev.01 The original jira issue is here https://bugs.openjdk.java.net/browse/JDK-8133352 I have tested with jtreg hotspot and langtools. Results before and after were identical. Hotspot: Test results: passed: 883; failed: 2; error: 10 Langtools: Test results: passed: 3,260; failed: 2 Please review and if OK I will push, Thanks, Ed. On Wed, 2015-08-12 at 17:23 +0100, Edward Nevill wrote: > On Tue, 2015-08-11 at 09:55 -0700, Vladimir Kozlov wrote: > > I think it depends how expensive push/pop on arm64. > > In c2 generated code you may introduced spills to stack around GetAndAdd code since you use additional register (in > > .ad). So you are saving on stack anyway. > > On other hand your changes (third temp) are not so big and I think acceptable. 
> > On 8/11/15 8:57 AM, Edward Nevill wrote:
> -#define ATOMIC_OP(LDXR, OP, STXR) \
> +#define ATOMIC_OP(LDXR, OP, IOP, STXR) \
>  void MacroAssembler::atomic_##OP(Register prev, RegisterOrConstant incr, Register addr) { \
>    Register result = rscratch2; \
>    if (prev->is_valid()) \
> @@ -2120,14 +2125,15 @@
>    bind(retry_load); \
>    LDXR(result, addr); \
>    OP(rscratch1, result, incr); \
> -  STXR(rscratch1, rscratch1, addr); \
> -  cbnzw(rscratch1, retry_load); \
> -  if (prev->is_valid() && prev != result) \
> -    mov(prev, result); \
> +  STXR(rscratch2, rscratch1, addr); \
> +  cbnzw(rscratch2, retry_load); \
> +  if (prev->is_valid() && prev != result) { \
> +    IOP(prev, rscratch1, incr); \
> +  } \
>  }
>
> -ATOMIC_OP(ldxr, add, stxr)
> -ATOMIC_OP(ldxrw, addw, stxrw)
> +ATOMIC_OP(ldxr, add, sub, stxr)
> +ATOMIC_OP(ldxrw, addw, subw, stxrw)
>
> This essentially creates the extra register we need by using the inverse operation to restore the result.
>
> It doesn't win any beauty contests, but it is probably the most optimal.

From claes.redestad at oracle.com Tue Aug 18 12:02:07 2015
From: claes.redestad at oracle.com (Claes Redestad)
Date: Tue, 18 Aug 2015 14:02:07 +0200
Subject: Smoother tiered compilation thread ergonomics
Message-ID: <55D31EBF.5030305@oracle.com>

Hi,

I noticed the thread ergonomics for tiered compilation have a few odd jumps that perhaps could be improved. The calculation used to derive CICompilerCount for Tiered in vm/runtime/advancedThresholdPolicy.cpp:

    int log_cpu = log2_intptr(os::active_processor_count());
    int loglog_cpu = log2_intptr(MAX2(log_cpu, 1));
    count = MAX2(log_cpu * loglog_cpu, 1) * 3 / 2;

Seems to evaluate to:

    #CPUs   CICompilerCount
    <4      2
    4       3
    8       4
    16      12
    32      15
    64      18
    128     21
    256     36
    512     40
    1024    45
    2048    49

The jump from 4 to 12 threads at 16 processors doesn't look very elegant (there's a small bump going from 128->256, too).
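
Editorial sketch, not part of the original message and not HotSpot source: the three quoted lines can be transcribed to Java to reproduce the table. The class and method names are hypothetical, and `log2Floor` stands in for `log2_intptr`, assumed here to be a floor base-2 logarithm (the table values are consistent with that).

```java
// Hypothetical transcription of the quoted formula; not HotSpot code.
final class CompilerCountSketch {
    // floor(log2(x)) for x >= 1, standing in for log2_intptr (an assumption).
    static int log2Floor(int x) {
        if (x < 1) {
            throw new IllegalArgumentException("x must be >= 1");
        }
        return 31 - Integer.numberOfLeadingZeros(x);
    }

    // count = MAX2(log_cpu * loglog_cpu, 1) * 3 / 2, using integer arithmetic.
    static int compilerCount(int cpus) {
        int logCpu = log2Floor(cpus);
        int loglogCpu = log2Floor(Math.max(logCpu, 1));
        return Math.max(logCpu * loglogCpu, 1) * 3 / 2;
    }

    public static void main(String[] args) {
        for (int cpus = 4; cpus <= 2048; cpus *= 2) {
            System.out.println(cpus + " -> " + compilerCount(cpus));
        }
    }
}
```

For 4 to 2048 CPUs this matches the table row for row. For fewer than 4 CPUs the three lines alone give 1 rather than 2, so the table's first row presumably reflects a minimum applied elsewhere that the message does not quote. The steps come from the floor of the inner logarithm moving from 1 to 2 at 16 CPUs and from 2 to 3 at 256 CPUs.
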
It seems reasonable the ratio of compiler threads to actual CPUs should diminish as CPUs increase, but going from 8 (or 15) to 16 actually increases the ratio. If we'd replace log2_intptr with some non-discrete function instead of using log2_intptr, we could smooth out the curve, which I think would be beneficial for the rather common cases where systems have somewhere between 8 and 32 CPUs. /Claes -------------- next part -------------- An HTML attachment was scrubbed... URL: From adinn at redhat.com Tue Aug 18 14:50:07 2015 From: adinn at redhat.com (Andrew Dinn) Date: Tue, 18 Aug 2015 15:50:07 +0100 Subject: [aarch64-port-dev ] 8133352: aarch64: generates constrained unpredictable instructions In-Reply-To: <1439903231.5709.5.camel@mylittlepony.linaroharston> References: <1439308653.5920.16.camel@mylittlepony.linaroharston> <55CA28EC.7060109@oracle.com> <1439396612.4820.31.camel@mylittlepony.linaroharston> <1439903231.5709.5.camel@mylittlepony.linaroharston> Message-ID: <55D3461F.8070405@redhat.com> Hi Ed, On 18/08/15 14:07, Edward Nevill wrote: > Given that there has been no objections to my proposed solution I > have prepared a webrev based on this. > > http://cr.openjdk.java.net/~enevill/8133352/webrev.01 > > The original jira issue is here > > https://bugs.openjdk.java.net/browse/JDK-8133352 > > I have tested with jtreg hotspot and langtools. Results before and > after were identical. > > Hotspot: Test results: passed: 883; failed: 2; error: 10 Langtools: > Test results: passed: 3,260; failed: 2 > > Please review and if OK I will push, Your change looks good to me. regards, Andrew Dinn ----------- From ichoran at gmail.com Tue Aug 18 15:25:03 2015 From: ichoran at gmail.com (Rex Kerr) Date: Tue, 18 Aug 2015 08:25:03 -0700 Subject: Inliner? error, perhaps JDK-8129397 Message-ID: I have run into what may be JDK-8129397 in some data analysis code. This is especially concerning because it manifests as corrupting integer values. 
These are used as indices into an array, so I eventually get an ArrayIndexOutOfBoundsError, but similarly structured code that does not ultimately result in array indexing could be wrong without warning. I can provide the files needed to reproduce the failure (tested on Java(TM) SE Runtime Environment (build 1.8.0_51-b16); Java HotSpot(TM) 64-Bit Server VM (build 25.51-b03, mixed mode) on both Linux and Mac OS), but they're not small: about 15 MB of jars (some compiled with Scala) and a 40 MB data file. The problem does not appear in 1.8.0_25; I have not bisected further. Although the problem is solved by -XX:-Inline, I am not entirely sure whether it is an inlining error alone, or whether it is an interaction between inlining and other optimizations. I can't pinpoint it by disabling inlining of every method in the problematic area. It also is fixed with -XX:-DoEscapeAnalysis. The bytecode for the problematic method(s) is/are rather hairy, but it uses inner methods, which Scala implements by using boxing-classes to hold variable quantities referred to by the methods. It also uses tail recursion, implemented by fixing up the stack and using goto to return to the start of the method. (I cannot be 100% certain that the logic of the bytecode produced by Scala is correct, but it was correct-enough as of u25.) Note that although the run always (so far!) crashes, the results _are not identical_ from run to run. This strongly suggests that there is some interaction with uninitialized values or memory. How can I proceed? Thanks. --Rex -------------- next part -------------- An HTML attachment was scrubbed... URL: From roland.westrelin at oracle.com Tue Aug 18 16:07:39 2015 From: roland.westrelin at oracle.com (Roland Westrelin) Date: Tue, 18 Aug 2015 18:07:39 +0200 Subject: Inliner? error, perhaps JDK-8129397 In-Reply-To: References: Message-ID: <87E5FC83-038F-47BF-888F-659AA6BCE933@oracle.com> Hi Rex, > How can I proceed? Thanks. Can you try an 8u60 build? 
https://jdk8.java.net/download.html If it still reproduces, you should file a bug: http://bugreport.java.com/ Roland. From zoltan.majo at oracle.com Tue Aug 18 17:02:08 2015 From: zoltan.majo at oracle.com (Zoltan Majo) Date: Tue, 18 Aug 2015 19:02:08 +0200 Subject: [9] RFR(XS): 8133625: src/share/vm/opto/compile.hpp:96: error: integer constant is too large for ‘long’ type Message-ID: <55D36510.4050601@oracle.com> Hi, please review the following patch for JDK-8133625. Bug: https://bugs.openjdk.java.net/browse/JDK-8133625 Problem: On certain platforms (e.g., x86_32) and with certain GCC versions (e.g., GCC-4.4), the following compilation error appears: hotspot/src/share/vm/opto/compile.hpp:96: error: integer constant is too large for ‘long’ type Solution: Fold the constant on line 96 of compile.hpp into CONST64() (as suggested in bug description). Also slightly changed the comments for CONST64(). Webrev: http://cr.openjdk.java.net/~zmajo/8133625/webrev.00/ Testing: - JPRT run with testset hotspot; all tests pass; - build locally with GCC-4.4 (fails without patch, passes with patch). Thank you and best regards, Zoltan From aph at redhat.com Tue Aug 18 17:56:19 2015 From: aph at redhat.com (Andrew Haley) Date: Tue, 18 Aug 2015 18:56:19 +0100 Subject: [aarch64-port-dev ] 8133352: aarch64: generates constrained unpredictable instructions In-Reply-To: <55D3461F.8070405@redhat.com> References: <1439308653.5920.16.camel@mylittlepony.linaroharston> <55CA28EC.7060109@oracle.com> <1439396612.4820.31.camel@mylittlepony.linaroharston> <1439903231.5709.5.camel@mylittlepony.linaroharston> <55D3461F.8070405@redhat.com> Message-ID: <55D371C3.6030703@redhat.com> On 08/18/2015 03:50 PM, Andrew Dinn wrote: > Your change looks good to me. Me too. Andrew.
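Editorial sketch for the 8133352 thread above: Ed's patch lets the store-exclusive status overwrite the register that held the loaded value, then rebuilds the previous value by applying the inverse operation (sub for add) to the stored result. The same data flow can be shown in Java, assuming only the standard `AtomicLong.addAndGet`; the class and method names are hypothetical, and this does not model the AArch64 retry loop or register allocation.

```java
// Hypothetical illustration of the inverse-operation idea; it does not model
// the AArch64 load-exclusive/store-exclusive loop itself.
final class InverseOpSketch {
    // Behaves like getAndAdd, but keeps only the updated value and recovers
    // the previous one by subtracting the increment again.
    static long getAndAddViaInverse(java.util.concurrent.atomic.AtomicLong cell, long incr) {
        long updated = cell.addAndGet(incr); // the value that was stored
        return updated - incr;               // inverse op restores the old value
    }
}
```

Because Java long arithmetic wraps on overflow, the subtraction undoes the addition exactly even when the add overflows, which mirrors why add/sub and addw/subw can be paired in the macro.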
From vladimir.kozlov at oracle.com Tue Aug 18 18:24:07 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 18 Aug 2015 11:24:07 -0700 Subject: Re: [9] RFR(XS): 8133625: src/share/vm/opto/compile.hpp:96: error: integer constant is too large for ‘long’ type In-Reply-To: <55D36510.4050601@oracle.com> References: <55D36510.4050601@oracle.com> Message-ID: <55D37847.4070409@oracle.com> Looks good. Thanks, Vladimir On 8/18/15 10:02 AM, Zoltan Majo wrote: > Hi, > > > please review the following patch for JDK-8133625. > > Bug: https://bugs.openjdk.java.net/browse/JDK-8133625 > > > Problem: On certain platforms (e.g., x86_32) and with certain GCC versions (e.g., GCC-4.4), the following compilation > error appears: > > hotspot/src/share/vm/opto/compile.hpp:96: error: integer constant is too large for ‘long’ type > > > Solution: Fold the constant on line 96 of compile.hpp into CONST64() (as suggested in bug description). Also slightly > changed the comments for CONST64(). > > Webrev: > http://cr.openjdk.java.net/~zmajo/8133625/webrev.00/ > > Testing: > - JPRT run with testset hotspot; all tests pass; > - build locally with GCC-4.4 (fails without patch, passes with patch). > > Thank you and best regards, > > > Zoltan > From ichoran at gmail.com Tue Aug 18 18:45:57 2015 From: ichoran at gmail.com (Rex Kerr) Date: Tue, 18 Aug 2015 11:45:57 -0700 Subject: Inliner? error, perhaps JDK-8129397 In-Reply-To: <87E5FC83-038F-47BF-888F-659AA6BCE933@oracle.com> References: <87E5FC83-038F-47BF-888F-659AA6BCE933@oracle.com> Message-ID: It does reproduce, all the way back to 8u40 (8u31 is fine), and I have just filed a bug. Unfortunately, I typed 'https' instead of 'http' in the URL, so the files needed to reproduce aren't instantly clickable-on. --Rex On Tue, Aug 18, 2015 at 9:07 AM, Roland Westrelin < roland.westrelin at oracle.com> wrote: > > Hi Rex, > > > How can I proceed? Thanks. > > Can you try an 8u60 build?
> > https://jdk8.java.net/download.html > > If it still reproduces, you should file a bug: > > http://bugreport.java.com/ > > Roland. > From roland.westrelin at oracle.com Wed Aug 19 10:30:37 2015 From: roland.westrelin at oracle.com (Roland Westrelin) Date: Wed, 19 Aug 2015 12:30:37 +0200 Subject: RFR(S): 8131969: jit/FloatingPoint/gen_math/Loops05 assert(2 <= size && size <= 16) failed: update low bits table Message-ID: <879D8683-02D2-46F5-BA4D-E08BE2C95943@oracle.com> http://cr.openjdk.java.net/~roland/8131969/webrev.00/ This register allocator code processes the inputs of a vector Phi for a Loop with the expectation that all node inputs are already processed, which is impossible: the logic assumes no vector Phi can be encountered. The vector Phi was created by the split through phi optimization (see test case) when optimizing a replicateD node: the Phi's control is the outer loop. So having a vector Phi is not the problem here and the fix relaxes the assert. Roland. From edward.nevill at gmail.com Wed Aug 19 13:30:13 2015 From: edward.nevill at gmail.com (Edward Nevill) Date: Wed, 19 Aug 2015 14:30:13 +0100 Subject: RFR: 8078743: AARCH64: Extend use of stlr to cater for volatile object stores In-Reply-To: <55CB3FEC.1070709@redhat.com> References: <55B8B134.2050803@redhat.com> <55BA78B7.7030300@oracle.com> <55CB3FEC.1070709@redhat.com> Message-ID: <1439991013.23660.72.camel@mint> Hi Andrew, I have tested this on two different partner platforms with G1GC, CMS and Parallel GC with and without UseCondCardMark. I have also reviewed the code and am happy with the code and comments. Vladimir, if you are still happy with this may I go ahead and push this on behalf of Andrew Dinn. The changes only affect aarch64.ad. Many thanks, Ed.
On Wed, 2015-08-12 at 13:45 +0100, Andrew Dinn wrote: > Hi Vladimir, > > Apologies for the delay in responding to your feedback -- I was > traveling for a team meeting all of last week. > > Here is a revised webrev which includes all the code changes you suggested > > http://cr.openjdk.java.net/~adinn/8078743/webrev.04 > From vladimir.kozlov at oracle.com Wed Aug 19 16:33:49 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 19 Aug 2015 09:33:49 -0700 Subject: 8133352: aarch64: generates constrained unpredictable instructions In-Reply-To: <1439903231.5709.5.camel@mylittlepony.linaroharston> References: <1439308653.5920.16.camel@mylittlepony.linaroharston> <55CA28EC.7060109@oracle.com> <1439396612.4820.31.camel@mylittlepony.linaroharston> <1439903231.5709.5.camel@mylittlepony.linaroharston> Message-ID: <55D4AFED.5000305@oracle.com> Looks fine to me. I did not see any comments from our colleagues (from RH) who work on arm64. Do they agree with this change? Thanks, Vladimir On 8/18/15 6:07 AM, Edward Nevill wrote: > Hi, > > Given that there have been no objections to my proposed solution I have prepared a webrev based on this. > > http://cr.openjdk.java.net/~enevill/8133352/webrev.01 > > The original jira issue is here > > https://bugs.openjdk.java.net/browse/JDK-8133352 > > I have tested with jtreg hotspot and langtools. Results before and after were identical. > > Hotspot: Test results: passed: 883; failed: 2; error: 10 > Langtools: Test results: passed: 3,260; failed: 2 > > Please review and if OK I will push, > > Thanks, > Ed. > > On Wed, 2015-08-12 at 17:23 +0100, Edward Nevill wrote: >> On Tue, 2015-08-11 at 09:55 -0700, Vladimir Kozlov wrote: >>> I think it depends how expensive push/pop is on arm64. >>> In c2 generated code you may have introduced spills to stack around GetAndAdd code since you use an additional register (in >>> .ad). So you are saving on stack anyway. >>> On the other hand your changes (third temp) are not so big and I think acceptable.
>>> On 8/11/15 8:57 AM, Edward Nevill wrote: >> -#define ATOMIC_OP(LDXR, OP, STXR) \ >> +#define ATOMIC_OP(LDXR, OP, IOP, STXR) \ >> void MacroAssembler::atomic_##OP(Register prev, RegisterOrConstant incr, Register addr) { \ >> Register result = rscratch2; \ >> if (prev->is_valid()) \ >> @@ -2120,14 +2125,15 @@ >> bind(retry_load); \ >> LDXR(result, addr); \ >> OP(rscratch1, result, incr); \ >> - STXR(rscratch1, rscratch1, addr); \ >> - cbnzw(rscratch1, retry_load); \ >> - if (prev->is_valid() && prev != result) \ >> - mov(prev, result); \ >> + STXR(rscratch2, rscratch1, addr); \ >> + cbnzw(rscratch2, retry_load); \ >> + if (prev->is_valid() && prev != result) { \ >> + IOP(prev, rscratch1, incr); \ >> + } \ >> } >> >> -ATOMIC_OP(ldxr, add, stxr) >> -ATOMIC_OP(ldxrw, addw, stxrw) >> +ATOMIC_OP(ldxr, add, sub, stxr) >> +ATOMIC_OP(ldxrw, addw, subw, stxrw) >> >> This essentially creates the extra register we need by using the inverse operation to restore the result. >> >> It doesn't win any beauty contests, but it is probably the most optimal. > > > From aph at redhat.com Wed Aug 19 16:37:26 2015 From: aph at redhat.com (Andrew Haley) Date: Wed, 19 Aug 2015 17:37:26 +0100 Subject: [aarch64-port-dev ] 8133352: aarch64: generates constrained unpredictable instructions In-Reply-To: <55D4AFED.5000305@oracle.com> References: <1439308653.5920.16.camel@mylittlepony.linaroharston> <55CA28EC.7060109@oracle.com> <1439396612.4820.31.camel@mylittlepony.linaroharston> <1439903231.5709.5.camel@mylittlepony.linaroharston> <55D4AFED.5000305@oracle.com> Message-ID: <55D4B0C6.7050803@redhat.com> On 08/19/2015 05:33 PM, Vladimir Kozlov wrote: > I did not see any comments from our colleges (from RH) who works on > arm64. Are they agree with this change? Oh yes, absolutely. Andrew. 
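The inverse-operation trick in the ATOMIC_OP patch quoted above can be illustrated outside the assembler. The following is only a minimal Java sketch, assuming a compare-and-set loop as a stand-in for the LDXR/STXR pair; the class and method names (InverseOpSketch, fetchAndAdd) are hypothetical and not part of HotSpot. The loaded value goes out of scope after the loop, mirroring the register that the store status overwrites, so the previous value is recomputed from the sum:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class InverseOpSketch {
    // Hypothetical stand-in for the LDXR / OP / STXR retry loop.
    // Only the sum survives the loop, as in the patched macro where the
    // store status lands in the register that held the loaded value.
    static int fetchAndAdd(AtomicInteger cell, int incr) {
        int sum;
        boolean stored;
        do {
            int loaded = cell.get();                   // LDXR
            sum = loaded + incr;                       // OP (add)
            stored = cell.compareAndSet(loaded, sum);  // STXR plus status
        } while (!stored);                             // retry on failure
        // IOP (sub): recompute the previous value instead of keeping it.
        // Two's-complement wraparound makes this exact even on overflow.
        return sum - incr;
    }

    public static void main(String[] args) {
        AtomicInteger cell = new AtomicInteger(40);
        int previous = fetchAndAdd(cell, 2);
        System.out.println(previous + " -> " + cell.get());
    }
}
```

The subtraction undoes the addition for every int pair, including values that wrap, which is why the sketch needs no third temporary.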
From adinn at redhat.com Wed Aug 19 16:38:59 2015 From: adinn at redhat.com (Andrew Dinn) Date: Wed, 19 Aug 2015 17:38:59 +0100 Subject: [aarch64-port-dev ] 8133352: aarch64: generates constrained unpredictable instructions In-Reply-To: <55D4AFED.5000305@oracle.com> References: <1439308653.5920.16.camel@mylittlepony.linaroharston> <55CA28EC.7060109@oracle.com> <1439396612.4820.31.camel@mylittlepony.linaroharston> <1439903231.5709.5.camel@mylittlepony.linaroharston> <55D4AFED.5000305@oracle.com> Message-ID: <55D4B123.5060403@redhat.com> On 19/08/15 17:33, Vladimir Kozlov wrote: > Looks fine to me. > > I did not see any comments from our colleagues (from RH) who work on > arm64. Do they agree with this change? Hmm, maybe the mail server is being slow. Andrew Haley and I both gave it a thumbs up earlier today. regards, Andrew Dinn ----------- Senior Principal Software Engineer Red Hat UK Ltd Registered in UK and Wales under Company Registration No. 3798903 Directors: Michael Cunningham (USA), Matt Parson (USA), Charlie Peters (USA), Michael O'Neill (Ireland) From vladimir.kozlov at oracle.com Wed Aug 19 16:48:53 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 19 Aug 2015 09:48:53 -0700 Subject: [aarch64-port-dev ] 8133352: aarch64: generates constrained unpredictable instructions In-Reply-To: <55D4B123.5060403@redhat.com> References: <1439308653.5920.16.camel@mylittlepony.linaroharston> <55CA28EC.7060109@oracle.com> <1439396612.4820.31.camel@mylittlepony.linaroharston> <1439903231.5709.5.camel@mylittlepony.linaroharston> <55D4AFED.5000305@oracle.com> <55D4B123.5060403@redhat.com> Message-ID: <55D4B375.4010307@oracle.com> Good. Yes, maybe it is our internal Oracle server. But I found your comments in the OpenJDK mailing archive: http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2015-August/018700.html Thanks, Vladimir On 8/19/15 9:38 AM, Andrew Dinn wrote: > On 19/08/15 17:33, Vladimir Kozlov wrote: >> Looks fine to me.
>> >> I did not see any comments from our colleagues (from RH) who work on >> arm64. Do they agree with this change? > > Hmm, maybe the mail server is being slow. Andrew Haley and I both gave > it a thumbs up earlier today. > > regards, > > > Andrew Dinn > ----------- > Senior Principal Software Engineer > Red Hat UK Ltd > Registered in UK and Wales under Company Registration No. 3798903 > Directors: Michael Cunningham (USA), Matt Parson (USA), Charlie Peters > (USA), Michael O'Neill (Ireland) > From vladimir.kozlov at oracle.com Wed Aug 19 18:22:18 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 19 Aug 2015 11:22:18 -0700 Subject: RFR: 8078743: AARCH64: Extend use of stlr to cater for volatile object stores In-Reply-To: <1439991013.23660.72.camel@mint> References: <55B8B134.2050803@redhat.com> <55BA78B7.7030300@oracle.com> <55CB3FEC.1070709@redhat.com> <1439991013.23660.72.camel@mint> Message-ID: <55D4C95A.4060308@oracle.com> Yes, I agree with latest changes. Thank you, Andrew, for running performance tests! Regards, Vladimir On 8/19/15 6:30 AM, Edward Nevill wrote: > Hi Andrew, > > I have tested this on two different partner platforms with G1GC, CMS and > Parallel GC with and without UseCondCardMark. > > I have also reviewed the code and am happy with the code and comments. > > Vladimir, if you are still happy with this may I go ahead and push this > on behalf of Andrew Dinn. The changes only affect aarch64.ad. > > Many thanks, > Ed. > > On Wed, 2015-08-12 at 13:45 +0100, Andrew Dinn wrote: >> Hi Vladimir, >> >> Apologies for the delay in responding to your feedback -- I was >> traveling for a team meeting all of last week.
>> >> Here is a revised webrev which includes all the code changes you suggested >> >> http://cr.openjdk.java.net/~adinn/8078743/webrev.04 >> > > From vladimir.kozlov at oracle.com Wed Aug 19 18:26:42 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 19 Aug 2015 11:26:42 -0700 Subject: RFR(S): 8131969: jit/FloatingPoint/gen_math/Loops05 assert(2 <= size && size <= 16) failed: update low bits table In-Reply-To: <879D8683-02D2-46F5-BA4D-E08BE2C95943@oracle.com> References: <879D8683-02D2-46F5-BA4D-E08BE2C95943@oracle.com> Message-ID: <55D4CA62.307@oracle.com> Very good. Thanks, Vladimir On 8/19/15 3:30 AM, Roland Westrelin wrote: > http://cr.openjdk.java.net/~roland/8131969/webrev.00/ > > This register allocator code processes the inputs of a vector Phi for a Loop with the expectation that all node inputs are already processed which is impossible: the logic assumes no vector Phi can be encountered. The vector Phi was created by the split through phi optimization (see test case) when optimizing a replicateD node: the Phi?s control is the outer loop. So having a vector Phi is not the problem here and the fix relaxes the assert. > > Roland. > From igor.veresov at oracle.com Thu Aug 20 00:48:35 2015 From: igor.veresov at oracle.com (Igor Veresov) Date: Wed, 19 Aug 2015 17:48:35 -0700 Subject: Smoother tiered compilation thread ergonomics In-Reply-To: <55D31EBF.5030305@oracle.com> References: <55D31EBF.5030305@oracle.com> Message-ID: Claes, Sure, I agree. Since you seem to be already looking at the problem, could you experiment with smoother functions and check how it affects the startup? Thanks, igor > On Aug 18, 2015, at 5:02 AM, Claes Redestad wrote: > > Hi, > > I noticed the thread ergonomics for tiered compilation have a few odd jumps > that perhaps could be improved. 
> > The calculation used to derive CICompilerCount for Tiered in > vm/runtime/advancedThresholdPolicy.cpp: > > int log_cpu = log2_intptr(os::active_processor_count()); > int loglog_cpu = log2_intptr(MAX2(log_cpu, 1)); > count = MAX2(log_cpu * loglog_cpu, 1) * 3 / 2; > > Seems to evaluate to: > > #CPUs CICompilerCount > <4 2 > 4 3 > 8 4 > 16 12 > 32 15 > 64 18 > 128 21 > 256 36 > 512 40 > 1024 45 > 2048 49 > > The jump from 4 to 12 threads at 16 processors doesn't look very elegant (there's > a small bump going from 128->256, too). It seems reasonable the ratio of compiler > threads to actual CPUs should diminish as CPUs increase, but going from 8 (or 15) > to 16 actually increases the ratio. > > If we'd replace log2_intptr with some non-discrete function instead of using > log2_intptr, we could smooth out the curve, which I think would be beneficial > for the rather common cases where systems have somewhere between 8 and 32 CPUs. > > /Claes From edward.nevill at gmail.com Thu Aug 20 10:47:44 2015 From: edward.nevill at gmail.com (Edward Nevill) Date: Thu, 20 Aug 2015 11:47:44 +0100 Subject: RFR: aarch64: 8133842: C2 generates illegal instructions with int shifts >=32 Message-ID: <1440067664.5989.21.camel@mylittlepony.linaroharston> Hi, The following webrev http://cr.openjdk.java.net/~enevill/8133842/webrev.01/ fixes a problem reported by one of our partners whereby C2 can generate illegal instructions on certain partners HW. JIRA issue here https://bugs.openjdk.java.net/browse/JDK-8133842 The problem occurs when you have a logical or arithmetic instruction with a RHS which is shifted by a constant where (const & 32) != 0, ie the constant is 32..63 or 96..127 etc. For example the following res = i | (j >> 53) generates the instruction orrw Rd, Rn, Rm, ASR #53 This instruction has a 6 bit field for the shift so this would appear to be a legal encoding. However certain partner HW treats this as an undefined instruction generating a SIGILL. 
The problem was that the rules in aarch64.ad were always anding the constant with 0x3f for both ints and longs. The above webrev fixes this to mask with 0x1f for ints and 0x3f for longs. Tested with hotspot and langtools. Results the same in both cases. Hotspot: Test results: passed: 863; failed: 2; error: 10 Langtools: Test results: passed: 3,263 Thanks for your help with this review, Ed. From aph at redhat.com Thu Aug 20 10:56:20 2015 From: aph at redhat.com (Andrew Haley) Date: Thu, 20 Aug 2015 11:56:20 +0100 Subject: [aarch64-port-dev ] RFR: aarch64: 8133842: C2 generates illegal instructions with int shifts >=32 In-Reply-To: <1440067664.5989.21.camel@mylittlepony.linaroharston> References: <1440067664.5989.21.camel@mylittlepony.linaroharston> Message-ID: <55D5B254.1090303@redhat.com> Looks good for C2. OK if you already checked C1! Thanks, Andrew. From adinn at redhat.com Thu Aug 20 11:24:11 2015 From: adinn at redhat.com (Andrew Dinn) Date: Thu, 20 Aug 2015 12:24:11 +0100 Subject: [aarch64-port-dev ] RFR: aarch64: 8133842: C2 generates illegal instructions with int shifts >=32 In-Reply-To: <55D5B254.1090303@redhat.com> References: <1440067664.5989.21.camel@mylittlepony.linaroharston> <55D5B254.1090303@redhat.com> Message-ID: <55D5B8DB.8030508@redhat.com> On 20/08/15 11:56, Andrew Haley wrote: > Looks good for C2. OK if you already checked C1! > > Thanks, > Andrew. What he said :-) regards, Andrew Dinn ----------- From edward.nevill at gmail.com Thu Aug 20 13:10:20 2015 From: edward.nevill at gmail.com (Edward Nevill) Date: Thu, 20 Aug 2015 14:10:20 +0100 Subject: [aarch64-port-dev ] RFR: aarch64: 8133842: C2 generates illegal instructions with int shifts >=32 In-Reply-To: <55D5B254.1090303@redhat.com> References: <1440067664.5989.21.camel@mylittlepony.linaroharston> <55D5B254.1090303@redhat.com> Message-ID: <1440076220.5989.27.camel@mylittlepony.linaroharston> On Thu, 2015-08-20 at 11:56 +0100, Andrew Haley wrote: > Looks good for C2. 
OK if you already checked C1! Yes. For my test case res += i | (j >> 53); C1 generates 0x000003ff90eb0d68: lsr w3, w1, #21 0x000003ff90eb0d6c: orr w3, w0, w3 0x000003ff90eb0d70: add w3, w3, w2 IE. It doesn't attempt to merge the shift and the or, and it masks the shift with 0x1f getting 21 instead of 53. Below is the code in C1 that does it (from c1_LIRGenerator_aarch64.cpp). You can see it ands ints with 0x1f and longs with 0x3f. All the best, Ed. --- cut here --- if (right.is_constant()) { right.dont_load_item(); switch (x->op()) { case Bytecodes::_ishl: { int c = right.get_jint_constant() & 0x1f; __ shift_left(left.result(), c, x->operand()); break; } case Bytecodes::_ishr: { int c = right.get_jint_constant() & 0x1f; __ shift_right(left.result(), c, x->operand()); break; } case Bytecodes::_iushr: { int c = right.get_jint_constant() & 0x1f; __ unsigned_shift_right(left.result(), c, x->operand()); break; } case Bytecodes::_lshl: { int c = right.get_jint_constant() & 0x3f; __ shift_left(left.result(), c, x->operand()); break; } case Bytecodes::_lshr: { int c = right.get_jint_constant() & 0x3f; __ shift_right(left.result(), c, x->operand()); break; } case Bytecodes::_lushr: { int c = right.get_jint_constant() & 0x3f; __ unsigned_shift_right(left.result(), c, x->operand()); break; } default: ShouldNotReachHere(); } } else { --- cut here --- From claes.redestad at oracle.com Thu Aug 20 14:07:42 2015 From: claes.redestad at oracle.com (Claes Redestad) Date: Thu, 20 Aug 2015 16:07:42 +0200 Subject: Smoother tiered compilation thread ergonomics In-Reply-To: References: <55D31EBF.5030305@oracle.com> Message-ID: <55D5DF2E.9000806@oracle.com> Sure, no code change is needed to experiment (-XX:CICompilerCount=xx) so I'm already running a few benchmarking experiments to try and evaluate what we should really aim for in the 8-32 CPU range. /Claes On 2015-08-20 02:48, Igor Veresov wrote: > Claes, > > Sure, I agree. 
Since you seem to be already looking at the problem, could you experiment with smoother functions and check how it affects the startup? > > Thanks, > igor > >> On Aug 18, 2015, at 5:02 AM, Claes Redestad wrote: >> >> Hi, >> >> I noticed the thread ergonomics for tiered compilation have a few odd jumps >> that perhaps could be improved. >> >> The calculation used to derive CICompilerCount for Tiered in >> vm/runtime/advancedThresholdPolicy.cpp: >> >> int log_cpu = log2_intptr(os::active_processor_count()); >> int loglog_cpu = log2_intptr(MAX2(log_cpu, 1)); >> count = MAX2(log_cpu * loglog_cpu, 1) * 3 / 2; >> >> Seems to evaluate to: >> >> #CPUs CICompilerCount >> <4 2 >> 4 3 >> 8 4 >> 16 12 >> 32 15 >> 64 18 >> 128 21 >> 256 36 >> 512 40 >> 1024 45 >> 2048 49 >> >> The jump from 4 to 12 threads at 16 processors doesn't look very elegant (there's >> a small bump going from 128->256, too). It seems reasonable the ratio of compiler >> threads to actual CPUs should diminish as CPUs increase, but going from 8 (or 15) >> to 16 actually increases the ratio. >> >> If we'd replace log2_intptr with some non-discrete function instead of using >> log2_intptr, we could smooth out the curve, which I think would be beneficial >> for the rather common cases where systems have somewhere between 8 and 32 CPUs. >> >> /Claes From roland.westrelin at oracle.com Thu Aug 20 14:12:25 2015 From: roland.westrelin at oracle.com (Roland Westrelin) Date: Thu, 20 Aug 2015 16:12:25 +0200 Subject: RFR(S): 8131969: jit/FloatingPoint/gen_math/Loops05 assert(2 <= size && size <= 16) failed: update low bits table In-Reply-To: <55D4CA62.307@oracle.com> References: <879D8683-02D2-46F5-BA4D-E08BE2C95943@oracle.com> <55D4CA62.307@oracle.com> Message-ID: <16CF1112-419C-4A7E-95D5-905C67E6B213@oracle.com> Thanks for the review, Vladimir. Roland. 
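The CICompilerCount calculation quoted in the tiered-ergonomics message above can be replayed to see where the jumps come from. This is a minimal Java sketch, assuming log2_intptr behaves as a floor log2 for positive inputs; the class and method names are hypothetical, and any later clamping done by the VM (for example a minimum of two threads) is deliberately left out:

```java
public class CompilerCountSketch {
    // Floor of log2(x) for x >= 1; assumed to match log2_intptr here.
    static int log2Floor(int x) {
        return 31 - Integer.numberOfLeadingZeros(x);
    }

    // Transcription of the three quoted lines, without any later clamping.
    static int tieredCompilerCount(int cpus) {
        int logCpu = log2Floor(cpus);
        int loglogCpu = log2Floor(Math.max(logCpu, 1));
        return Math.max(logCpu * loglogCpu, 1) * 3 / 2;
    }

    public static void main(String[] args) {
        // Print the count for each power of two from 4 to 2048 CPUs.
        for (int cpus = 4; cpus <= 2048; cpus *= 2) {
            System.out.println(cpus + " CPUs: " + tieredCompilerCount(cpus));
        }
    }
}
```

Under these assumptions the steps at 16 and 256 CPUs fall exactly where the inner floor log2 moves from 1 to 2 and from 2 to 3, which is consistent with the suggestion to use a non-discrete function.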
From vladimir.kozlov at oracle.com Thu Aug 20 15:55:36 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 20 Aug 2015 08:55:36 -0700 Subject: RFR: aarch64: 8133842: C2 generates illegal instructions with int shifts >=32 In-Reply-To: <1440067664.5989.21.camel@mylittlepony.linaroharston> References: <1440067664.5989.21.camel@mylittlepony.linaroharston> Message-ID: <55D5F878.6030901@oracle.com> Changes are fine but I am curious why results will be the same even if you mask less bits - I don't see predicates which checks shift's value (to select different instructions for different shift values) and you use general immI operand type for it. Thanks, Vladimir On 8/20/15 3:47 AM, Edward Nevill wrote: > Hi, > > The following webrev > > http://cr.openjdk.java.net/~enevill/8133842/webrev.01/ > > fixes a problem reported by one of our partners whereby C2 can generate illegal instructions on certain partners HW. > > JIRA issue here > > https://bugs.openjdk.java.net/browse/JDK-8133842 > > The problem occurs when you have a logical or arithmetic instruction with a RHS which is shifted by a constant where (const & 32) != 0, ie the constant is 32..63 or 96..127 etc. > > For example the following > > res = i | (j >> 53) > > generates the instruction > > orrw Rd, Rn, Rm, ASR #53 > > This instruction has a 6 bit field for the shift so this would appear to be a legal encoding. However certain partner HW treats this as an undefined instruction generating a SIGILL. > > The problem was that the rules in aarch64.ad were always anding the constant with 0x3f for both ints and longs. > > The above webrev fixes this to mask with 0x1f for ints and 0x3f for longs. > > Tested with hotspot and langtools. Results the same in both cases. > > Hotspot: Test results: passed: 863; failed: 2; error: 10 > Langtools: Test results: passed: 3,263 > > Thanks for your help with this review, > Ed. 
> > From aph at redhat.com Thu Aug 20 16:04:51 2015 From: aph at redhat.com (Andrew Haley) Date: Thu, 20 Aug 2015 17:04:51 +0100 Subject: [aarch64-port-dev ] RFR: aarch64: 8133842: C2 generates illegal instructions with int shifts >=32 In-Reply-To: <55D5F878.6030901@oracle.com> References: <1440067664.5989.21.camel@mylittlepony.linaroharston> <55D5F878.6030901@oracle.com> Message-ID: <55D5FAA3.4050200@redhat.com> On 08/20/2015 04:55 PM, Vladimir Kozlov wrote: > Changes are fine but I am curious why results will be the same even if you mask less bits Java VM spec says: The shift distance actually used is always in the range 0 to 31, inclusive, as if value2 were subjected to a bitwise logical AND with the mask value 0x1f. Andrew. From edward.nevill at gmail.com Thu Aug 20 16:06:22 2015 From: edward.nevill at gmail.com (Edward Nevill) Date: Thu, 20 Aug 2015 17:06:22 +0100 Subject: RFR: aarch64: 8133842: C2 generates illegal instructions with int shifts >=32 In-Reply-To: <55D5F878.6030901@oracle.com> References: <1440067664.5989.21.camel@mylittlepony.linaroharston> <55D5F878.6030901@oracle.com> Message-ID: <1440086782.1907.6.camel@mylittlepony.linaroharston> On Thu, 2015-08-20 at 08:55 -0700, Vladimir Kozlov wrote: > Changes are fine but I am curious why results will be the same even if you mask less bits - I don't see predicates which > checks shift's value (to select different instructions for different shift values) and you use general immI operand type > for it. Not sure I understand the question. Currently it generates orrw Rd, Rn, Rm, ASR #53 Rd, Rn and Rm are 32 bit here (for orrw as opposed to orr). So on our partner's implementation this generates a SIGILL. The webrev changes this to orrw Rd, Rn, Rm, ASR #21 which is correct (for Java). All the best, Ed.
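The masking rule cited from the Java VM spec in this exchange can be checked directly from Java source. A minimal sketch, with hypothetical class and method names, showing that an int shift by 53 behaves like a shift by 21 (53 & 0x1f), while a long shift keeps the full distance (53 & 0x3f):

```java
public class ShiftMaskSketch {
    // Same shape as the test case in the thread: i | (j >> distance) on ints.
    static int orShifted(int i, int j, int distance) {
        return i | (j >> distance);
    }

    // Long variant: only the low six bits of the distance are used.
    static long orShiftedLong(long i, long j, int distance) {
        return i | (j >> distance);
    }

    public static void main(String[] args) {
        int i = 5;
        int j = Integer.MIN_VALUE;
        // For ints, distance 53 is reduced to 53 & 0x1f == 21.
        System.out.println(orShifted(i, j, 53) == orShifted(i, j, 21));
        // For longs, distance 53 is kept, since 53 & 0x3f == 53.
        System.out.println(orShiftedLong(0L, 1L << 60, 53));
    }
}
```

This is why emitting ASR #21 for the int case yields the same Java-visible result as the unmasked constant would under the spec.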
From vladimir.kozlov at oracle.com Thu Aug 20 16:34:30 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 20 Aug 2015 09:34:30 -0700 Subject: RFR: aarch64: 8133842: C2 generates illegal instructions with int shifts >=32 In-Reply-To: <1440086782.1907.6.camel@mylittlepony.linaroharston> References: <1440067664.5989.21.camel@mylittlepony.linaroharston> <55D5F878.6030901@oracle.com> <1440086782.1907.6.camel@mylittlepony.linaroharston> Message-ID: <55D60196.7000505@oracle.com> Thank you. Andrew answered my question (only 0-31 values are meaningful). I thought you had on low level asm instruction masking already. For example on sparc we use u_field(imm5a, 4, 0): void sll( Register s1, int imm5a, Register d ) { emit_int32( op(arith_op) | rd(d) | op3(sll_op3) | rs1(s1) | sx(0) | immed(true) | u_field(imm5a, 4, 0) ); } It looks like you have to mask on more high level then we do on SPARC. Again, changes are good. Thanks, Vladimir On 8/20/15 9:06 AM, Edward Nevill wrote: > On Thu, 2015-08-20 at 08:55 -0700, Vladimir Kozlov wrote: >> Changes are fine but I am curious why results will be the same even if you mask less bits - I don't see predicates which >> checks shift's value (to select different instructions for different shift values) and you use general immI operand type >> for it. > > Not sure I understand the question. > > Currently it generate > > orrw Rd, Rn, Rm, ASR #53 > > Rd, Rn and Rm are 32 bit here (for orrw as opposed to orr). > > So on our partners implementation this generates a SIGILL. > > The webrev change this to > > orrw Rd, Rn, Rm, ASR #21 > > which is correct (for Java). > > All the best, > Ed. > > From hui.shi at linaro.org Fri Aug 21 12:21:12 2015 From: hui.shi at linaro.org (Hui Shi) Date: Fri, 21 Aug 2015 20:21:12 +0800 Subject: aarch64: C2 fast lock/unlock issues Message-ID: Hi JIT members, Attached fast_lock.patch fixes issues in fast lock/unlock on aarch64 platform (in both aarch64-jdk8 and jdk9/hs-comp/hotspot). 
Could someone help comments, review or sponsor? A small test case and PrintAssembly log with/without fix are also attached for reference. To reproduce this issue on aarch64, command line is "java -XX:-TieredCompilation -XX:+UnlockDiagnosticVMOptions -XX:-BackgroundCompilation -XX:CompileCommand="compileonly,TestSync.f*" -XX:+PrintAssembly TestSync" There are three Issues in aarch64 fast lock/unlock: *1. Duplicated biased lock checking* When option UseBiasedLocking and UseOptoBiasInlining are both true, it doesn't need emit biased_locking_enter in aarch64_enc_fast_lock. This is redundant as biased locking enter check is already inlined in PhaseMacroExpand::expand_lock_node. Checking assembly code in orig.asm [Inlined biased lock check in PhaseMacroExpand::expand_lock_node] 0x000003ff88320d94: str x1, [sp] 0x000003ff88320d98: ldr x10, [x1] 0x000003ff88320d9c: and x11, x10, #0x7 0x000003ff88320da0: cmp x11, #0x5 0x000003ff88320da4: b.ne 0x000003ff88320e18 [Biased lock check expanded in aarch64_enc_fast_lock] 0x000003ff88320e18: add x12, sp, #0x10 0x000003ff88320e1c: ldr x10, [x1] 0x000003ff88320e20: and x11, x12, #0x7 0x000003ff88320e24: cmp x11, #0x5 0x000003ff88320e28: b.ne 0x000003ff88320eec *2. Incorrect parameter used in biased_locking_enter in aarch64_enc_fast_lock* Checking above code [Biased lock check expanded in aarch64_enc_fast_lock], x12 is the box register and holding the address of the lock record on stack. However it is mis-used as mark word in biased lock checking here. As a result, biased pattern check always fails because stack pointer is 8 bytes align and x11 must be zero. Current implementation in aarch64_enc_fast_lock. 
*biased_locking_enter(disp_hdr, oop, box, tmp, true, cont);* This should be *biased_locking_enter(box, oop, disp_hdr, tmp, true, cont); //swap disp_hdr and box register, disp_hdr is already loaded with object mark word* This issue might cause a problem when running with option -XX:-UseOptoBiasInlining in the following scenario. Let's check the code above in [Biased lock check expanded in aarch64_enc_fast_lock], where x12 is box and x10 is disp_hdr. 1. Suppose the object's mark word (loaded into register x10) is in biased mode, with content "[biased_thread |epoch|age| 101]", and biased_thread is executing its synchronized block. 2. Another thread tries to acquire the same lock. First, it performs the biased pattern check and fails, because the "mark word" register used here is x12 (the correct register would be x10). 3. As x12 is not "biased" (the three least significant bits of SP + 0x10 can never be 101), execution goes to the thin lock CAS acquire code instead of the biased lock revoke/rebias code. 4. The thin lock CAS acquire will succeed because x10's two least significant bits are 01 (the thin lock CAS code uses disp_hdr (x10) as the mark word). Two threads then hold the same lock at the same time, which is incorrect behavior. *3. Inflated monitor code has a typo in aarch64_enc_fast_lock* The inflated lock test is generated under condition (EmitSync & 0x02), while the inflated lock fast path is generated under condition "if ((EmitSync & 0x02) == 0))". At both locations, the condition should be "if ((EmitSync & 0x02) == 0) ". In orig.asm, no instruction branches to the inflated lock acquire fast path at 0x000003ff88320f24. Issues #1 and #3 do not impact correctness; they introduce redundant code (double biased lock check) and skip the inflated lock fast path check (_owner is null case). The fix is in aarch64_enc_fast_lock/aarch64_enc_fast_unlock, so it will not impact C1 or the interpreter. Attached patch includes: 1. Disable generating biased lock handling code in fast_lock/fast_unlock when UseOptoBiasInlining is true. 2.
Adjust biased_locking_enter?s actual parameters, swap disp_hdr and box register. 3. Fix typo in inflated monitor handling. Regards Shi Hui -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- [Disassembling for mach='aarch64'] [Entry Point] [Verified Entry Point] [Constants] # {method} {0x0000007f1264b4e8} 'foo' '(LTestSync;)I' in 'TestSync' # parm0: c_rarg1:c_rarg1 = 'TestSync' # [sp+0x30] (sp of caller) 0x0000007f950e8000: nop 0x0000007f950e8004: orr x9, xzr, #0xffffffffffffc000 0x0000007f950e8008: str xzr, [sp,x9] 0x0000007f950e800c: sub sp, sp, #0x30 0x0000007f950e8010: stp x29, x30, [sp,#32] ;*synchronization entry ; - TestSync::foo at -1 (line 27) 0x0000007f950e8014: str x1, [sp] 0x0000007f950e8018: ldr x10, [x1] ; implicit exception: dispatches to 0x0000007f950e81e4 0x0000007f950e801c: and x11, x10, #0x7 0x0000007f950e8020: cmp x11, #0x5 0x0000007f950e8024: b.ne 0x0000007f950e8098 0x0000007f950e8028: mov x11, #0x60000 // #393216 ; {metadata('TestSync')} 0x0000007f950e802c: movk x11, #0x28 0x0000007f950e8030: eor x11, x11, #0x800000000 0x0000007f950e8034: ldr x11, [x11,#168] 0x0000007f950e8038: mov x12, x28 0x0000007f950e803c: orr x12, x12, x11 0x0000007f950e8040: eor x13, x12, x10 0x0000007f950e8044: and x14, x13, #0xffffffffffffff87 0x0000007f950e8048: cbnz x14, 0x0000007f950e819c 0x0000007f950e804c: dmb ishld ;*synchronization entry ; - TestSync::fgetQuantity at -1 (line 5) ; - TestSync::foo at 1 (line 27) 0x0000007f950e8050: ldr x10, [sp] 0x0000007f950e8054: ldrsh w20, [x10,#12] ;*getfield quantity ; - TestSync::fgetQuantity at 1 (line 5) ; - TestSync::foo at 1 (line 27) 0x0000007f950e8058: dmb ish 0x0000007f950e805c: ldr x10, [x10] 0x0000007f950e8060: and x10, x10, #0x7 0x0000007f950e8064: cmp x10, #0x5 0x0000007f950e8068: b.ne 0x0000007f950e810c ;*ireturn ; - TestSync::fgetQuantity at 6 (line 6) ; - TestSync::foo at 1 (line 27) 0x0000007f950e806c: mov w0, w20 0x0000007f950e8070: ldp x29, 
x30, [sp,#32] 0x0000007f950e8074: add sp, sp, #0x30 0x0000007f950e8078: adrp x8, 0x0000007f9e847000 ; {poll_return} 0x0000007f950e807c: ldr wzr, [x8] ; {poll_return} 0x0000007f950e8080: ret 0x0000007f950e8084: ldxr x8, [x1] 0x0000007f950e8088: cmp x8, x10 0x0000007f950e808c: b.ne 0x0000007f950e8098 0x0000007f950e8090: stlxr w8, x11, [x1] 0x0000007f950e8094: cbnz w8, 0x0000007f950e8084 0x0000007f950e8098: add x12, sp, #0x10 0x0000007f950e809c: ldr x10, [x1] 0x0000007f950e80a0: tbnz w10, #1, 0x0000007f950e80dc 0x0000007f950e80a4: orr x10, x10, #0x1 0x0000007f950e80a8: str x10, [x12] 0x0000007f950e80ac: ldxr x11, [x1] 0x0000007f950e80b0: cmp x11, x10 0x0000007f950e80b4: b.ne 0x0000007f950e80c4 0x0000007f950e80b8: stlxr w11, x12, [x1] 0x0000007f950e80bc: cbz w11, 0x0000007f950e80fc 0x0000007f950e80c0: b 0x0000007f950e80ac 0x0000007f950e80c4: mov x8, sp 0x0000007f950e80c8: sub x10, x10, x8 0x0000007f950e80cc: orr x11, xzr, #0xfffffffffffff003 0x0000007f950e80d0: ands x11, x10, x11 0x0000007f950e80d4: str x11, [x12] 0x0000007f950e80d8: b 0x0000007f950e80fc 0x0000007f950e80dc: add x11, x10, #0x16 0x0000007f950e80e0: mov x10, xzr 0x0000007f950e80e4: ldxr x8, [x11] 0x0000007f950e80e8: cmp x10, x8 0x0000007f950e80ec: b.ne 0x0000007f950e80f8 0x0000007f950e80f0: stlxr w8, x28, [x11] 0x0000007f950e80f4: cbnz w8, 0x0000007f950e80e4 0x0000007f950e80f8: str x12, [x12] 0x0000007f950e80fc: b.eq 0x0000007f950e804c 0x0000007f950e8100: add x2, sp, #0x10 0x0000007f950e8104: bl 0x0000007f950dcb40 ; OopMap{[0]=Oop off=264} ;*synchronization entry ; - TestSync::fgetQuantity at -1 (line 5) ; - TestSync::foo at 1 (line 27) ; {runtime_call} 0x0000007f950e8108: b 0x0000007f950e804c 0x0000007f950e810c: add x12, sp, #0x10 0x0000007f950e8110: ldr x13, [sp] 0x0000007f950e8114: ldr x11, [x12] 0x0000007f950e8118: cmp x11, xzr 0x0000007f950e811c: b.eq 0x0000007f950e817c 0x0000007f950e8120: ldr x10, [x13] 0x0000007f950e8124: tbnz w11, #1, 0x0000007f950e8144 0x0000007f950e8128: ldxr x10, [x13] 
0x0000007f950e812c: cmp x12, x10 0x0000007f950e8130: b.ne 0x0000007f950e8140 0x0000007f950e8134: stlxr w10, x11, [x13] 0x0000007f950e8138: cbz w10, 0x0000007f950e817c 0x0000007f950e813c: b 0x0000007f950e8128 0x0000007f950e8140: b 0x0000007f950e817c 0x0000007f950e8144: sub x10, x10, #0x2 0x0000007f950e8148: ldr x8, [x10,#24] 0x0000007f950e814c: ldr x11, [x10,#40] 0x0000007f950e8150: eor x8, x8, x28 0x0000007f950e8154: orr x8, x8, x11 0x0000007f950e8158: cmp x8, xzr 0x0000007f950e815c: b.ne 0x0000007f950e817c 0x0000007f950e8160: ldr x8, [x10,#64] 0x0000007f950e8164: ldr x11, [x10,#56] 0x0000007f950e8168: orr x8, x8, x11 0x0000007f950e816c: cmp x8, xzr 0x0000007f950e8170: b.ne 0x0000007f950e817c 0x0000007f950e8174: add x10, x10, #0x18 0x0000007f950e8178: stlr x8, [x10] 0x0000007f950e817c: b.eq 0x0000007f950e806c 0x0000007f950e8180: add x1, sp, #0x10 ;*synchronization entry ; - TestSync::fgetQuantity at -1 (line 5) ; - TestSync::foo at 1 (line 27) 0x0000007f950e8184: mov x0, x13 0x0000007f950e8188: mov x8, #0xebbc // #60348 ; {runtime_call} 0x0000007f950e818c: movk x8, #0x9e35, lsl #16 0x0000007f950e8190: movk x8, #0x7f, lsl #32 0x0000007f950e8194: blr x8 ;*ireturn ; - TestSync::fgetQuantity at 6 (line 6) ; - TestSync::foo at 1 (line 27) 0x0000007f950e8198: b 0x0000007f950e806c 0x0000007f950e819c: and x14, x13, #0x7 0x0000007f950e81a0: cbnz x14, 0x0000007f950e8084 0x0000007f950e81a4: mov x14, #0x37f // #895 0x0000007f950e81a8: and x14, x10, x14 0x0000007f950e81ac: mov x11, x28 0x0000007f950e81b0: and x13, x13, #0x300 0x0000007f950e81b4: orr x11, x11, x14 0x0000007f950e81b8: cbnz x13, 0x0000007f950e81d8 0x0000007f950e81bc: ldxr x8, [x1] 0x0000007f950e81c0: cmp x8, x14 0x0000007f950e81c4: b.ne 0x0000007f950e81d0 0x0000007f950e81c8: stlxr w8, x11, [x1] 0x0000007f950e81cc: cbnz w8, 0x0000007f950e81bc 0x0000007f950e81d0: b.ne 0x0000007f950e8100 0x0000007f950e81d4: b 0x0000007f950e804c 0x0000007f950e81d8: mov x11, x12 0x0000007f950e81dc: mov x14, x10 0x0000007f950e81e0: b 
0x0000007f950e81bc 0x0000007f950e81e4: mov w1, #0xfffffff6 // #-10 0x0000007f950e81e8: bl 0x0000007f9507efc0 ; OopMap{off=492} ;*invokevirtual fgetQuantity ; - TestSync::foo at 1 (line 27) ; {runtime_call} 0x0000007f950e81ec: brk #0x3e7 ;*invokevirtual fgetQuantity ; - TestSync::foo at 1 (line 27) 0x0000007f950e81f0: .inst 0x00000000 ; undefined 0x0000007f950e81f4: .inst 0x00000000 ; undefined 0x0000007f950e81f8: .inst 0x00000000 ; undefined 0x0000007f950e81fc: .inst 0x00000000 ; undefined [Exception Handler] [Stub Code] 0x0000007f950e8200: b 0x0000007f950d9b80 ; {no_reloc} [Deopt Handler Code] 0x0000007f950e8204: adr x30, 0x0000007f950e8204 0x0000007f950e8208: b 0x0000007f950b35c0 ; {runtime_call} 0x0000007f950e820c: .inst 0x00000000 ; undefined -------------- next part -------------- A non-text attachment was scrubbed... Name: fast_lock.patch Type: application/octet-stream Size: 949 bytes Desc: not available URL: -------------- next part -------------- [Entry Point] [Verified Entry Point] [Constants] # {method} {0x0000007f818004e8} 'foo' '(LTestSync;)I' in 'TestSync' # parm0: c_rarg1:c_rarg1 = 'TestSync' # [sp+0x30] (sp of caller) 0x0000007f9c3ad480: nop 0x0000007f9c3ad484: orr x9, xzr, #0xffffffffffffc000 0x0000007f9c3ad488: str xzr, [sp,x9] 0x0000007f9c3ad48c: sub sp, sp, #0x30 0x0000007f9c3ad490: stp x29, x30, [sp,#32] ;*synchronization entry ; - TestSync::foo at -1 (line 31) 0x0000007f9c3ad494: str x1, [sp] 0x0000007f9c3ad498: cbz x1, 0x0000007f9c3ad680 ;*invokevirtual fgetQuantity ; - TestSync::foo at 1 (line 31) 0x0000007f9c3ad49c: add x12, sp, #0x10 0x0000007f9c3ad4a0: ldr x10, [x1] 0x0000007f9c3ad4a4: and x11, x10, #0x7 0x0000007f9c3ad4a8: cmp x11, #0x5 0x0000007f9c3ad4ac: b.ne 0x0000007f9c3ad570 0x0000007f9c3ad4b0: ldr w11, [x1,#8] 0x0000007f9c3ad4b4: eor x11, x11, #0x800000000 0x0000007f9c3ad4b8: ldr x11, [x11,#168] 0x0000007f9c3ad4bc: orr x11, x11, x28 0x0000007f9c3ad4c0: eor x11, x10, x11 0x0000007f9c3ad4c4: and x11, x11, #0xffffffffffffff87 
0x0000007f9c3ad4c8: cbz x11, 0x0000007f9c3ad5d0 0x0000007f9c3ad4cc: and x8, x11, #0x7 0x0000007f9c3ad4d0: cbnz x8, 0x0000007f9c3ad540 0x0000007f9c3ad4d4: and x8, x11, #0x300 0x0000007f9c3ad4d8: cbnz x8, 0x0000007f9c3ad50c 0x0000007f9c3ad4dc: mov x8, #0x37f // #895 0x0000007f9c3ad4e0: and x10, x10, x8 0x0000007f9c3ad4e4: orr x11, x10, x28 0x0000007f9c3ad4e8: ldaxr x8, [x1] 0x0000007f9c3ad4ec: cmp x8, x10 0x0000007f9c3ad4f0: b.ne 0x0000007f9c3ad500 0x0000007f9c3ad4f4: stlxr w8, x11, [x1] 0x0000007f9c3ad4f8: cbz w8, 0x0000007f9c3ad508 0x0000007f9c3ad4fc: b 0x0000007f9c3ad4e8 0x0000007f9c3ad500: dmb ish 0x0000007f9c3ad504: mov x10, x8 0x0000007f9c3ad508: b 0x0000007f9c3ad5d0 0x0000007f9c3ad50c: ldr w11, [x1,#8] 0x0000007f9c3ad510: eor x11, x11, #0x800000000 0x0000007f9c3ad514: ldr x11, [x11,#168] 0x0000007f9c3ad518: orr x11, x28, x11 0x0000007f9c3ad51c: ldaxr x8, [x1] 0x0000007f9c3ad520: cmp x8, x10 0x0000007f9c3ad524: b.ne 0x0000007f9c3ad534 0x0000007f9c3ad528: stlxr w8, x11, [x1] 0x0000007f9c3ad52c: cbz w8, 0x0000007f9c3ad53c 0x0000007f9c3ad530: b 0x0000007f9c3ad51c 0x0000007f9c3ad534: dmb ish 0x0000007f9c3ad538: mov x10, x8 0x0000007f9c3ad53c: b 0x0000007f9c3ad5d0 0x0000007f9c3ad540: ldr w11, [x1,#8] 0x0000007f9c3ad544: eor x11, x11, #0x800000000 0x0000007f9c3ad548: ldr x11, [x11,#168] 0x0000007f9c3ad54c: ldaxr x8, [x1] 0x0000007f9c3ad550: cmp x8, x10 0x0000007f9c3ad554: b.ne 0x0000007f9c3ad564 0x0000007f9c3ad558: stlxr w8, x11, [x1] 0x0000007f9c3ad55c: cbz w8, 0x0000007f9c3ad570 0x0000007f9c3ad560: b 0x0000007f9c3ad54c 0x0000007f9c3ad564: dmb ish 0x0000007f9c3ad568: mov x10, x8 0x0000007f9c3ad56c: b 0x0000007f9c3ad570 0x0000007f9c3ad570: ldr x10, [x1] 0x0000007f9c3ad574: tbnz w10, #1, 0x0000007f9c3ad5b0 0x0000007f9c3ad578: orr x10, x10, #0x1 0x0000007f9c3ad57c: str x10, [x12] 0x0000007f9c3ad580: ldxr x11, [x1] 0x0000007f9c3ad584: cmp x11, x10 0x0000007f9c3ad588: b.ne 0x0000007f9c3ad598 0x0000007f9c3ad58c: stlxr w11, x12, [x1] 0x0000007f9c3ad590: cbz w11, 
0x0000007f9c3ad5d0 0x0000007f9c3ad594: b 0x0000007f9c3ad580 0x0000007f9c3ad598: mov x8, sp 0x0000007f9c3ad59c: sub x10, x10, x8 0x0000007f9c3ad5a0: orr x11, xzr, #0xfffffffffffff003 0x0000007f9c3ad5a4: ands x11, x10, x11 0x0000007f9c3ad5a8: str x11, [x12] 0x0000007f9c3ad5ac: b 0x0000007f9c3ad5d0 0x0000007f9c3ad5b0: add x11, x10, #0x16 0x0000007f9c3ad5b4: mov x10, xzr 0x0000007f9c3ad5b8: ldxr x8, [x11] 0x0000007f9c3ad5bc: cmp x10, x8 0x0000007f9c3ad5c0: b.ne 0x0000007f9c3ad5cc 0x0000007f9c3ad5c4: stlxr w8, x28, [x11] 0x0000007f9c3ad5c8: cbnz w8, 0x0000007f9c3ad5b8 0x0000007f9c3ad5cc: str x12, [x12] 0x0000007f9c3ad5d0: b.ne 0x0000007f9c3ad68c 0x0000007f9c3ad5d4: dmb ishld ;*synchronization entry ; - TestSync::fgetQuantity at -1 (line 6) ; - TestSync::foo at 1 (line 31) 0x0000007f9c3ad5d8: ldr x10, [sp] 0x0000007f9c3ad5dc: ldrsh w19, [x10,#12] ;*getfield quantity ; - TestSync::fgetQuantity at 1 (line 6) ; - TestSync::foo at 1 (line 31) 0x0000007f9c3ad5e0: dmb ish ;*ireturn ; - TestSync::fgetQuantity at 6 (line 7) ; - TestSync::foo at 1 (line 31) 0x0000007f9c3ad5e4: add x12, sp, #0x10 ;*synchronization entry ; - TestSync::fgetQuantity at -1 (line 6) ; - TestSync::foo at 1 (line 31) 0x0000007f9c3ad5e8: ldr x13, [sp] 0x0000007f9c3ad5ec: ldr x10, [x13] 0x0000007f9c3ad5f0: and x10, x10, #0x7 0x0000007f9c3ad5f4: cmp x10, #0x5 0x0000007f9c3ad5f8: b.eq 0x0000007f9c3ad664 0x0000007f9c3ad5fc: ldr x11, [x12] 0x0000007f9c3ad600: cmp x11, xzr 0x0000007f9c3ad604: b.eq 0x0000007f9c3ad664 0x0000007f9c3ad608: ldr x10, [x13] 0x0000007f9c3ad60c: tbnz w11, #1, 0x0000007f9c3ad62c 0x0000007f9c3ad610: ldxr x10, [x13] 0x0000007f9c3ad614: cmp x12, x10 0x0000007f9c3ad618: b.ne 0x0000007f9c3ad628 0x0000007f9c3ad61c: stlxr w10, x11, [x13] 0x0000007f9c3ad620: cbz w10, 0x0000007f9c3ad664 0x0000007f9c3ad624: b 0x0000007f9c3ad610 0x0000007f9c3ad628: b 0x0000007f9c3ad664 0x0000007f9c3ad62c: sub x10, x10, #0x2 0x0000007f9c3ad630: ldr x8, [x10,#24] 0x0000007f9c3ad634: ldr x11, [x10,#40] 
0x0000007f9c3ad638: eor x8, x8, x28 0x0000007f9c3ad63c: orr x8, x8, x11 0x0000007f9c3ad640: cmp x8, xzr 0x0000007f9c3ad644: b.ne 0x0000007f9c3ad664 0x0000007f9c3ad648: ldr x8, [x10,#64] 0x0000007f9c3ad64c: ldr x11, [x10,#56] 0x0000007f9c3ad650: orr x8, x8, x11 0x0000007f9c3ad654: cmp x8, xzr 0x0000007f9c3ad658: b.ne 0x0000007f9c3ad664 0x0000007f9c3ad65c: add x10, x10, #0x18 0x0000007f9c3ad660: stlr x8, [x10] 0x0000007f9c3ad664: b.ne 0x0000007f9c3ad698 ;*ireturn ; - TestSync::fgetQuantity at 6 (line 7) ; - TestSync::foo at 1 (line 31) 0x0000007f9c3ad668: mov w0, w19 0x0000007f9c3ad66c: ldp x29, x30, [sp,#32] 0x0000007f9c3ad670: add sp, sp, #0x30 0x0000007f9c3ad674: adrp x8, 0x0000007fa9389000 ; {poll_return} 0x0000007f9c3ad678: ldr wzr, [x8] ; {poll_return} 0x0000007f9c3ad67c: ret 0x0000007f9c3ad680: mov w1, #0xfffffff6 // #-10 0x0000007f9c3ad684: bl 0x0000007f9c07efc0 ; OopMap{off=520} ;*invokevirtual fgetQuantity ; - TestSync::foo at 1 (line 31) ; {runtime_call} 0x0000007f9c3ad688: brk #0x3e7 ;*invokevirtual fgetQuantity ; - TestSync::foo at 1 (line 31) 0x0000007f9c3ad68c: add x2, sp, #0x10 0x0000007f9c3ad690: bl 0x0000007f9c0dcb40 ; OopMap{[0]=Oop off=532} ;*synchronization entry ; - TestSync::fgetQuantity at -1 (line 6) ; - TestSync::foo at 1 (line 31) ; {runtime_call} 0x0000007f9c3ad694: b 0x0000007f9c3ad5d4 0x0000007f9c3ad698: add x1, sp, #0x10 ;*synchronization entry ; - TestSync::fgetQuantity at -1 (line 6) ; - TestSync::foo at 1 (line 31) 0x0000007f9c3ad69c: mov x0, x13 0x0000007f9c3ad6a0: mov x8, #0xbbc // #3004 ; {runtime_call} 0x0000007f9c3ad6a4: movk x8, #0xa8ea, lsl #16 0x0000007f9c3ad6a8: movk x8, #0x7f, lsl #32 0x0000007f9c3ad6ac: blr x8 ;*ireturn ; - TestSync::fgetQuantity at 6 (line 7) ; - TestSync::foo at 1 (line 31) 0x0000007f9c3ad6b0: b 0x0000007f9c3ad668 0x0000007f9c3ad6b4: .inst 0x00000000 ; undefined 0x0000007f9c3ad6b8: .inst 0x00000000 ; undefined 0x0000007f9c3ad6bc: .inst 0x00000000 ; undefined [Exception Handler] [Stub Code] 
0x0000007f9c3ad6c0: b 0x0000007f9c0d9b80 ; {no_reloc} [Deopt Handler Code] 0x0000007f9c3ad6c4: adr x30, 0x0000007f9c3ad6c4 0x0000007f9c3ad6c8: b 0x0000007f9c0b35c0 ; {runtime_call} 0x0000007f9c3ad6cc: .inst 0x00000000 ; undefined -------------- next part -------------- Decoding compiled method 0x000003ff88320c10: Code: [Entry Point] [Verified Entry Point] [Constants] # {method} {0x000003ff574004e8} 'foo' '(LTestSync;)I' in 'TestSync' # parm0: c_rarg1:c_rarg1 = 'TestSync' # [sp+0x30] (sp of caller) 0x000003ff88320d80: nop 0x000003ff88320d84: orr x9, xzr, #0xffffffffffff0000 0x000003ff88320d88: str xzr, [sp,x9] 0x000003ff88320d8c: sub sp, sp, #0x30 0x000003ff88320d90: stp x29, x30, [sp,#32] ;*synchronization entry ; - TestSync::foo at -1 (line 31) 0x000003ff88320d94: str x1, [sp] 0x000003ff88320d98: ldr x10, [x1] ; implicit exception: dispatches to 0x000003ff8832103c 0x000003ff88320d9c: and x11, x10, #0x7 0x000003ff88320da0: cmp x11, #0x5 0x000003ff88320da4: b.ne 0x000003ff88320e18 0x000003ff88320da8: mov x11, #0x60000 // #393216 ; {metadata('TestSync')} 0x000003ff88320dac: movk x11, #0x28 0x000003ff88320db0: eor x11, x11, #0x800000000 0x000003ff88320db4: ldr x11, [x11,#168] 0x000003ff88320db8: mov x12, x28 0x000003ff88320dbc: orr x12, x12, x11 0x000003ff88320dc0: eor x13, x12, x10 0x000003ff88320dc4: and x14, x13, #0xffffffffffffff87 0x000003ff88320dc8: cbnz x14, 0x000003ff88320ff4 0x000003ff88320dcc: dmb ishld ;*synchronization entry ; - TestSync::fgetQuantity at -1 (line 6) ; - TestSync::foo at 1 (line 31) 0x000003ff88320dd0: ldr x10, [sp] 0x000003ff88320dd4: ldrsh w20, [x10,#12] ;*getfield quantity ; - TestSync::fgetQuantity at 1 (line 6) ; - TestSync::foo at 1 (line 31) 0x000003ff88320dd8: dmb ish 0x000003ff88320ddc: ldr x10, [x10] 0x000003ff88320de0: and x10, x10, #0x7 0x000003ff88320de4: cmp x10, #0x5 0x000003ff88320de8: b.ne 0x000003ff88320f54 ;*ireturn ; - TestSync::fgetQuantity at 6 (line 7) ; - TestSync::foo at 1 (line 31) 0x000003ff88320dec: mov w0, w20 
0x000003ff88320df0: ldp x29, x30, [sp,#32] 0x000003ff88320df4: add sp, sp, #0x30 0x000003ff88320df8: adrp x8, 0x000003ff97090000 ; {poll_return} 0x000003ff88320dfc: ldr wzr, [x8] ; {poll_return} 0x000003ff88320e00: ret 0x000003ff88320e04: ldxr x8, [x1] 0x000003ff88320e08: cmp x8, x10 0x000003ff88320e0c: b.ne 0x000003ff88320e18 0x000003ff88320e10: stlxr w8, x11, [x1] 0x000003ff88320e14: cbnz w8, 0x000003ff88320e04 0x000003ff88320e18: add x12, sp, #0x10 0x000003ff88320e1c: ldr x10, [x1] 0x000003ff88320e20: and x11, x12, #0x7 0x000003ff88320e24: cmp x11, #0x5 0x000003ff88320e28: b.ne 0x000003ff88320eec 0x000003ff88320e2c: ldr w11, [x1,#8] 0x000003ff88320e30: eor x11, x11, #0x800000000 0x000003ff88320e34: ldr x11, [x11,#168] 0x000003ff88320e38: orr x11, x11, x28 0x000003ff88320e3c: eor x11, x12, x11 0x000003ff88320e40: and x11, x11, #0xffffffffffffff87 0x000003ff88320e44: cbz x11, 0x000003ff88320f44 0x000003ff88320e48: and x8, x11, #0x7 0x000003ff88320e4c: cbnz x8, 0x000003ff88320ebc 0x000003ff88320e50: and x8, x11, #0x300 0x000003ff88320e54: cbnz x8, 0x000003ff88320e88 0x000003ff88320e58: mov x8, #0x37f // #895 0x000003ff88320e5c: and x12, x12, x8 0x000003ff88320e60: orr x11, x12, x28 0x000003ff88320e64: ldaxr x8, [x1] 0x000003ff88320e68: cmp x8, x12 0x000003ff88320e6c: b.ne 0x000003ff88320e7c 0x000003ff88320e70: stlxr w8, x11, [x1] 0x000003ff88320e74: cbz w8, 0x000003ff88320e84 0x000003ff88320e78: b 0x000003ff88320e64 0x000003ff88320e7c: dmb ish 0x000003ff88320e80: mov x12, x8 0x000003ff88320e84: b 0x000003ff88320f44 0x000003ff88320e88: ldr w11, [x1,#8] 0x000003ff88320e8c: eor x11, x11, #0x800000000 0x000003ff88320e90: ldr x11, [x11,#168] 0x000003ff88320e94: orr x11, x28, x11 0x000003ff88320e98: ldaxr x8, [x1] 0x000003ff88320e9c: cmp x8, x12 0x000003ff88320ea0: b.ne 0x000003ff88320eb0 0x000003ff88320ea4: stlxr w8, x11, [x1] 0x000003ff88320ea8: cbz w8, 0x000003ff88320eb8 0x000003ff88320eac: b 0x000003ff88320e98 0x000003ff88320eb0: dmb ish 0x000003ff88320eb4: mov x12, 
x8 0x000003ff88320eb8: b 0x000003ff88320f44 0x000003ff88320ebc: ldr w11, [x1,#8] 0x000003ff88320ec0: eor x11, x11, #0x800000000 0x000003ff88320ec4: ldr x11, [x11,#168] 0x000003ff88320ec8: ldaxr x8, [x1] 0x000003ff88320ecc: cmp x8, x12 0x000003ff88320ed0: b.ne 0x000003ff88320ee0 0x000003ff88320ed4: stlxr w8, x11, [x1] 0x000003ff88320ed8: cbz w8, 0x000003ff88320eec 0x000003ff88320edc: b 0x000003ff88320ec8 0x000003ff88320ee0: dmb ish 0x000003ff88320ee4: mov x12, x8 0x000003ff88320ee8: b 0x000003ff88320eec 0x000003ff88320eec: orr x10, x10, #0x1 0x000003ff88320ef0: str x10, [x12] 0x000003ff88320ef4: ldxr x11, [x1] 0x000003ff88320ef8: cmp x11, x10 0x000003ff88320efc: b.ne 0x000003ff88320f0c 0x000003ff88320f00: stlxr w11, x12, [x1] 0x000003ff88320f04: cbz w11, 0x000003ff88320f44 0x000003ff88320f08: b 0x000003ff88320ef4 0x000003ff88320f0c: mov x8, sp 0x000003ff88320f10: sub x10, x10, x8 0x000003ff88320f14: orr x11, xzr, #0xffffffffffff0003 0x000003ff88320f18: ands x11, x10, x11 0x000003ff88320f1c: str x11, [x12] 0x000003ff88320f20: b 0x000003ff88320f44 0x000003ff88320f24: add x11, x10, #0x16 0x000003ff88320f28: mov x10, xzr 0x000003ff88320f2c: ldxr x8, [x11] 0x000003ff88320f30: cmp x10, x8 0x000003ff88320f34: b.ne 0x000003ff88320f40 0x000003ff88320f38: stlxr w8, x28, [x11] 0x000003ff88320f3c: cbnz w8, 0x000003ff88320f2c 0x000003ff88320f40: str x12, [x12] 0x000003ff88320f44: b.eq 0x000003ff88320dcc 0x000003ff88320f48: add x2, sp, #0x10 0x000003ff88320f4c: bl 0x000003ff88313b00 ; OopMap{[0]=Oop off=464} ;*synchronization entry ; - TestSync::fgetQuantity at -1 (line 6) ; - TestSync::foo at 1 (line 31) ; {runtime_call} 0x000003ff88320f50: b 0x000003ff88320dcc 0x000003ff88320f54: add x12, sp, #0x10 0x000003ff88320f58: ldr x13, [sp] 0x000003ff88320f5c: ldr x10, [x13] 0x000003ff88320f60: and x10, x10, #0x7 0x000003ff88320f64: cmp x10, #0x5 0x000003ff88320f68: b.eq 0x000003ff88320fd4 0x000003ff88320f6c: ldr x11, [x12] 0x000003ff88320f70: cmp x11, xzr 0x000003ff88320f74: b.eq 
0x000003ff88320fd4 0x000003ff88320f78: ldr x10, [x13] 0x000003ff88320f7c: tbnz w11, #1, 0x000003ff88320f9c 0x000003ff88320f80: ldxr x10, [x13] 0x000003ff88320f84: cmp x12, x10 0x000003ff88320f88: b.ne 0x000003ff88320f98 0x000003ff88320f8c: stlxr w10, x11, [x13] 0x000003ff88320f90: cbz w10, 0x000003ff88320fd4 0x000003ff88320f94: b 0x000003ff88320f80 0x000003ff88320f98: b 0x000003ff88320fd4 0x000003ff88320f9c: sub x10, x10, #0x2 0x000003ff88320fa0: ldr x8, [x10,#24] 0x000003ff88320fa4: ldr x11, [x10,#40] 0x000003ff88320fa8: eor x8, x8, x28 0x000003ff88320fac: orr x8, x8, x11 0x000003ff88320fb0: cmp x8, xzr 0x000003ff88320fb4: b.ne 0x000003ff88320fd4 0x000003ff88320fb8: ldr x8, [x10,#64] 0x000003ff88320fbc: ldr x11, [x10,#56] 0x000003ff88320fc0: orr x8, x8, x11 0x000003ff88320fc4: cmp x8, xzr 0x000003ff88320fc8: b.ne 0x000003ff88320fd4 0x000003ff88320fcc: add x10, x10, #0x18 0x000003ff88320fd0: stlr x8, [x10] 0x000003ff88320fd4: b.eq 0x000003ff88320dec 0x000003ff88320fd8: add x1, sp, #0x10 ;*synchronization entry ; - TestSync::fgetQuantity at -1 (line 6) ; - TestSync::foo at 1 (line 31) 0x000003ff88320fdc: mov x0, x13 0x000003ff88320fe0: mov x8, #0x5b9c // #23452 ; {runtime_call} 0x000003ff88320fe4: movk x8, #0x97c2, lsl #16 0x000003ff88320fe8: movk x8, #0x3ff, lsl #32 0x000003ff88320fec: blr x8 ;*ireturn ; - TestSync::fgetQuantity at 6 (line 7) ; - TestSync::foo at 1 (line 31) 0x000003ff88320ff0: b 0x000003ff88320dec 0x000003ff88320ff4: and x14, x13, #0x7 0x000003ff88320ff8: cbnz x14, 0x000003ff88320e04 0x000003ff88320ffc: mov x14, #0x37f // #895 0x000003ff88321000: and x14, x10, x14 0x000003ff88321004: mov x11, x28 0x000003ff88321008: and x13, x13, #0x300 0x000003ff8832100c: orr x11, x11, x14 0x000003ff88321010: cbnz x13, 0x000003ff88321030 0x000003ff88321014: ldxr x8, [x1] 0x000003ff88321018: cmp x8, x14 0x000003ff8832101c: b.ne 0x000003ff88321028 0x000003ff88321020: stlxr w8, x11, [x1] 0x000003ff88321024: cbnz w8, 0x000003ff88321014 0x000003ff88321028: b.ne 
0x000003ff88320f48 0x000003ff8832102c: b 0x000003ff88320dcc 0x000003ff88321030: mov x11, x12 0x000003ff88321034: mov x14, x10 0x000003ff88321038: b 0x000003ff88321014 0x000003ff8832103c: mov w1, #0xfffffff6 // #-10 0x000003ff88321040: bl 0x000003ff880b15c0 ; OopMap{off=708} ;*invokevirtual fgetQuantity ; - TestSync::foo at 1 (line 31) ; {runtime_call} 0x000003ff88321044: brk #0x3e7 ;*invokevirtual fgetQuantity ; - TestSync::foo at 1 (line 31) 0x000003ff88321048: .inst 0x00000000 ; undefined 0x000003ff8832104c: .inst 0x00000000 ; undefined 0x000003ff88321050: .inst 0x00000000 ; undefined 0x000003ff88321054: .inst 0x00000000 ; undefined 0x000003ff88321058: .inst 0x00000000 ; undefined 0x000003ff8832105c: .inst 0x00000000 ; undefined 0x000003ff88321060: .inst 0x00000000 ; undefined 0x000003ff88321064: .inst 0x00000000 ; undefined 0x000003ff88321068: .inst 0x00000000 ; undefined 0x000003ff8832106c: .inst 0x00000000 ; undefined 0x000003ff88321070: .inst 0x00000000 ; undefined 0x000003ff88321074: .inst 0x00000000 ; undefined 0x000003ff88321078: .inst 0x00000000 ; undefined 0x000003ff8832107c: .inst 0x00000000 ; undefined [Exception Handler] [Stub Code] 0x000003ff88321080: b 0x000003ff880d9b00 ; {no_reloc} [Deopt Handler Code] 0x000003ff88321084: adr x30, 0x000003ff88321084 0x000003ff88321088: b 0x000003ff880b35c0 ; {runtime_call} 0x000003ff8832108c: .inst 0x00000000 ; undefined -------------- next part -------------- A non-text attachment was scrubbed... Name: TestSync.java Type: application/octet-stream Size: 560 bytes Desc: not available URL: From tobias.hartmann at oracle.com Fri Aug 21 14:36:30 2015 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Fri, 21 Aug 2015 16:36:30 +0200 Subject: [9] RFR(S): 8075805: Crash while trying to release CompiledICHolder Message-ID: <55D7376E.90103@oracle.com> Hi, please review the following patch. 
https://bugs.openjdk.java.net/browse/JDK-8075805
http://cr.openjdk.java.net/~thartmann/8075805/webrev.00/

Problem:
The VM crashes at a safepoint while trying to free a CompiledICHolder object that was enqueued for release after flushing a nmethod. The crash happens because the object is not a CompiledICHolder but Metadata which should not be removed. The problem is that at nmethod release, "CompiledIC::is_icholder_entry" is used to determine if the ICs of that nmethod still reference CompiledICHolder objects and if so, those objects are enqueued for release at the next safepoint. The method returns true if the destination of the IC is a C2I adapter, assuming that in this case the IC is in the to-interpreter state and the cached value must be a CompiledICHolder object. However, there are very rare cases where the IC is actually in the to-compiled state but the destination nmethod was already flushed and replaced by another allocation. Since the IC is still pointing to the same address in the code cache, the state of the IC is confused.

Cleaning of inline caches that point to dead nmethods should prevent this. However, we do not clean ICs of nmethods that are converted to zombie themselves. Usually, that's okay because a zombie nmethod will be flushed before any dead nmethod it references. This is guaranteed because each nmethod goes through the states alive -> non-entrant -> zombie -> marked-for-reclamation before being flushed.

Suppose we have two nmethods A and B, where A references B through an IC and B is always processed first by the sweeper. The following table shows the state transitions from top to bottom where lines marked with "S" show a transition in the corresponding iteration of the sweeper.

     state of A         state of B
-----------------------------------------
     non-entrant        non-entrant
S    [not on stack]     [not on stack]
S    zombie             zombie
S    marked             marked
S    flushed            flushed/re-allocated

The IC of A will be cleaned in the first sweeper cycle because B is non-entrant so we don't need to clean ICs again if A is converted to zombie.

Let's look at the following setting:

     state of A         state of B
-----------------------------------------
     non-entrant
S    [not on stack]
                        non-entrant
S    zombie             [not on stack]
                        zombie
S    marked             marked
S    flushed            flushed/re-allocated

There are two problems here:
- the IC of A is not cleaned because B is not yet non-entrant in the first iteration of the sweeper and afterwards A becomes zombie itself,
- the transition from B to zombie happens outside the sweeper in 'CodeCache::make_marked_nmethods_zombies()' because the previous sweeper iteration already determined that the nmethod is not on the stack.

The VM now crashes while flushing A because it still references B. Since B was replaced by a C2I adapter, we assume that A's IC is in the to-interpreter state and try to free a CompiledICHolder object which is actually Klass-Metadata for B. The detailed logs are below [1].

A similar problem occurs with nmethod unloading because unloaded nmethods transition directly to zombie:

     state of A         state of B
-----------------------------------------
     unloaded           unloaded
S    zombie             zombie
S    marked             marked
S    flushed            flushed/re-allocated

Again, we crash while flushing A.

Solution:
I removed the 'make_marked_nmethods_zombies()' and replaced it by calls to 'make_marked_nmethods_not_entrant()'. This avoids the non-entrant -> zombie transition outside of the sweeper. The only impact is that unloaded nmethods that are already non-entrant and not on the stack need another iteration of the sweeper to become zombie. I verified that this has no impact on performance. I also removed the code that was added by JDK-8059735 because now only the sweeper can set a nmethod to zombie.

To fix the nmethod unloading case, I changed the implementation of CodeCache::gc_epilogue to clean ICs of unloaded nmethods as well.

Testing:
- Executed failing tests for a week (still running)
- JPRT
- Performance (SPECjbb2005, SPECjbb2013, SPECjvm2008), no differences

Thanks,
Tobias

[1] Detailed logs for nmethod A (1178) and nmethod B (552):

Inline cache at 0xffff80ffad89b017, calling 0xffff80ffad1063c0 cached_value 0x0000000000000000 changing destination to 0xffff80ffad66ae20 changing cached metadata to 0x0000000800034a18
IC at 0xffff80ffad89b017: monomorphic to compiled (rcvr klass) 'java/util/concurrent/ConcurrentHashMap':
### IC at 0xffff80ffad89b017: set to Nmethod 552/
### Nmethod 1178/0xffff80ffad89ac50 (not entrant) being made zombie
### Nmethod 552/0xffff80ffad66ac10 (not entrant) being made zombie from make_marked_nmethods_zombies()
### Nmethod 552/0xffff80ffad66ac10 (zombie) being marked for reclamation
### Nmethod 1178/0xffff80ffad89ac50 (zombie) being marked for reclamation
### Nmethod 552/0xffff80ffad66ac10 (marked for reclamation) being flushed
*flushing nmethod 552/0xffff80ffad66ac10. Live blobs:2325/Free CodeCache:235986Kb
### I2C/C2I adapter 0xffff80ffad66ac10 allocated
### Nmethod 1178/0xffff80ffad89ac50 (marked for reclamation) being flushed
cleanup_call_site 0x0000000800034a18 to be freed, destination 0xffff80ffad66ae20 inline cache at 0xffff80ffad89b017
enqueueing icholder 0x0000000800034a18 to be freed
*flushing nmethod 1178/0xffff80ffad89ac50. Live blobs:2346/Free CodeCache:235955Kb
deleting icholder 0x0000000800034a18
## nof_mallocs = 211209, nof_frees = 105760
## memory stomp:
GuardedMemory(0xffff80ff623ca180) base_addr=0x00000008000349f8 tag=0x0000000800034a18 user_size=18446604433140746280 user_data=0x0000000800034a18
Header guard @0x00000008000349f8 is BROKEN

From vladimir.kozlov at oracle.com  Fri Aug 21 17:28:36 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Fri, 21 Aug 2015 10:28:36 -0700
Subject: [9] RFR(S): 8075805: Crash while trying to release CompiledICHolder
In-Reply-To: <55D7376E.90103@oracle.com>
References: <55D7376E.90103@oracle.com>
Message-ID: <55D75FC4.1040208@oracle.com>

Nice work. Thank you Tobias.

During our discussion about this problem we thought that we may need an additional call to nm->cleanup_inline_caches() by the sweeper when we convert not_entrant to zombie, to prevent a zombie from pointing to other nmethods. Do you think we don't need it?

Please clarify the changes in state transitions of unloaded nmethods - "The only impact is that unloaded nmethods that are already non-entrant and not on the stack need another iteration of the sweeper to become zombie."

I don't see how changing the make_marked_nmethods_zombies() call to make_marked_nmethods_not_entrant() affects unloaded nmethods. Both make_*() methods iterate only over is_alive() nmethods (iter.next_alive()), so they skip unloaded ones.

What spacing did you change in compiledIC.cpp? The webrev does not show it.

Please fix the comment in vm_operations.cpp:

     // Make the dependent methods zombies
-    CodeCache::make_marked_nmethods_zombies();
+    CodeCache::make_marked_nmethods_not_entrant();

Thanks,
Vladimir

On 8/21/15 7:36 AM, Tobias Hartmann wrote:
> Hi,
>
> please review the following patch.
>
> https://bugs.openjdk.java.net/browse/JDK-8075805
> http://cr.openjdk.java.net/~thartmann/8075805/webrev.00/
>
> Problem:
> The VM crashes at a safepoint while trying to free a CompiledICHolder object that was enqueued for release after flushing a nmethod. The crash happens because the object is not a CompiledICHolder but Metadata which should not be removed. The problem is that at nmethod release, "CompiledIC::is_icholder_entry" is used to determine if the ICs of that nmethod still reference CompiledICHolder objects and if so, those objects are enqueued for release at the next safepoint. The method returns true if the destination of the IC is a C2I adapter, assuming that in this case the IC is in the to-interpreter state and the cached value must be a CompiledICHolder object. However, there are very rare cases where the IC is actually in the to-compiled state but the destination nmethod was already flushed and replaced by another allocation. Since the IC is still pointing to the same address in the code cache, the state of the IC is confused.
>
> Cleaning of inline caches that point to dead nmethods should prevent this. However, we do not clean ICs of nmethods that are converted to zombie themselves. Usually, that's okay because a zombie nmethod will be flushed before any dead nmethod it references. This is guaranteed because each nmethod goes through the states alive -> non-entrant -> zombie -> marked-for-reclamation before being flushed.
>
> Suppose we have two nmethods A and B, where A references B through an IC and B is always processed first by the sweeper. The following table shows the state transitions from top to bottom where lines marked with "S" show a transition in the corresponding iteration of the sweeper.
>
>      state of A         state of B
> -----------------------------------------
>      non-entrant        non-entrant
> S    [not on stack]     [not on stack]
> S    zombie             zombie
> S    marked             marked
> S    flushed            flushed/re-allocated
>
> The IC of A will be cleaned in the first sweeper cycle because B is non-entrant so we don't need to clean ICs again if A is converted to zombie.
>
> Let's look at the following setting:
>
>      state of A         state of B
> -----------------------------------------
>      non-entrant
> S    [not on stack]
>                         non-entrant
> S    zombie             [not on stack]
>                         zombie
> S    marked             marked
> S    flushed            flushed/re-allocated
>
> There are two problems here:
> - the IC of A is not cleaned because B is not yet non-entrant in the first iteration of the sweeper and afterwards A becomes zombie itself,
> - the transition from B to zombie happens outside the sweeper in 'CodeCache::make_marked_nmethods_zombies()' because the previous sweeper iteration already determined that the nmethod is not on the stack.
>
> The VM now crashes while flushing A because it still references B. Since B was replaced by a C2I adapter, we assume that A's IC is in the to-interpreter state and try to free a CompiledICHolder object which is actually Klass-Metadata for B. The detailed logs are below [1].
>
> A similar problem occurs with nmethod unloading because unloaded nmethods transition directly to zombie:
>
>      state of A         state of B
> -----------------------------------------
>      unloaded           unloaded
> S    zombie             zombie
> S    marked             marked
> S    flushed            flushed/re-allocated
>
> Again, we crash while flushing A.
>
> Solution:
> I removed the 'make_marked_nmethods_zombies()' and replaced it by calls to 'make_marked_nmethods_not_entrant()'. This avoids the non-entrant -> zombie transition outside of the sweeper. The only impact is that unloaded nmethods that are already non-entrant and not on the stack need another iteration of the sweeper to become zombie. I verified that this has no impact on performance. I also removed the code that was added by JDK-8059735 because now only the sweeper can set a nmethod to zombie.
>
> To fix the nmethod unloading case, I changed the implementation of CodeCache::gc_epilogue to clean ICs of unloaded nmethods as well.
>
> Testing:
> - Executed failing tests for a week (still running)
> - JPRT
> - Performance (SPECjbb2005, SPECjbb2013, SPECjvm2008), no differences
>
> Thanks,
> Tobias
>
>
> [1] Detailed logs for nmethod A (1178) and nmethod B (552):
>
> Inline cache at 0xffff80ffad89b017, calling 0xffff80ffad1063c0 cached_value 0x0000000000000000 changing destination to 0xffff80ffad66ae20 changing cached metadata to 0x0000000800034a18
> IC at 0xffff80ffad89b017: monomorphic to compiled (rcvr klass) 'java/util/concurrent/ConcurrentHashMap':
> ### IC at 0xffff80ffad89b017: set to Nmethod 552/
> ### Nmethod 1178/0xffff80ffad89ac50 (not entrant) being made zombie
> ### Nmethod 552/0xffff80ffad66ac10 (not entrant) being made zombie from make_marked_nmethods_zombies()
> ### Nmethod 552/0xffff80ffad66ac10 (zombie) being marked for reclamation
> ### Nmethod 1178/0xffff80ffad89ac50 (zombie) being marked for reclamation
> ### Nmethod 552/0xffff80ffad66ac10 (marked for reclamation) being flushed
> *flushing nmethod 552/0xffff80ffad66ac10. Live blobs:2325/Free CodeCache:235986Kb
> ### I2C/C2I adapter 0xffff80ffad66ac10 allocated
> ### Nmethod 1178/0xffff80ffad89ac50 (marked for reclamation) being flushed
> cleanup_call_site 0x0000000800034a18 to be freed, destination 0xffff80ffad66ae20 inline cache at 0xffff80ffad89b017
> enqueueing icholder 0x0000000800034a18 to be freed
> *flushing nmethod 1178/0xffff80ffad89ac50. Live blobs:2346/Free CodeCache:235955Kb
> deleting icholder 0x0000000800034a18
> ## nof_mallocs = 211209, nof_frees = 105760
> ## memory stomp:
> GuardedMemory(0xffff80ff623ca180) base_addr=0x00000008000349f8 tag=0x0000000800034a18 user_size=18446604433140746280 user_data=0x0000000800034a18
> Header guard @0x00000008000349f8 is BROKEN
>

From vladimir.kozlov at oracle.com  Fri Aug 21 21:24:17 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Fri, 21 Aug 2015 14:24:17 -0700
Subject: aarch64: C2 fast lock/unlock issues
In-Reply-To: 
References: 
Message-ID: <55D79701.9090407@oracle.com>

Thank you for the report and suggested fixes. CC to aarch64 port developers.

Thanks,
Vladimir

On 8/21/15 5:21 AM, Hui Shi wrote:
> Hi JIT members,
> Attached fast_lock.patch fixes issues in fast lock/unlock on aarch64
> platform (in both aarch64-jdk8 and jdk9/hs-comp/hotspot). Could someone
> help comments, review or sponsor?
> A small test case and PrintAssembly log with/without fix are also
> attached for reference.
>
> To reproduce this issue on aarch64, command line is "java
> -XX:-TieredCompilation -XX:+UnlockDiagnosticVMOptions
> -XX:-BackgroundCompilation -XX:CompileCommand="compileonly,TestSync.f*"
> -XX:+PrintAssembly TestSync"
> There are three Issues in aarch64 fast lock/unlock:
> *1. Duplicated biased lock checking*
> When option UseBiasedLocking and UseOptoBiasInlining are both true, it
> doesn't need emit biased_locking_enter in aarch64_enc_fast_lock. This is
> redundant as biased locking enter check is already inlined in
> PhaseMacroExpand::expand_lock_node.
Checking assembly code in orig.asm
>
> [Inlined biased lock check in PhaseMacroExpand::expand_lock_node]
> 0x000003ff88320d94: str x1, [sp]
> 0x000003ff88320d98: ldr x10, [x1]
> 0x000003ff88320d9c: and x11, x10, #0x7
> 0x000003ff88320da0: cmp x11, #0x5
> 0x000003ff88320da4: b.ne 0x000003ff88320e18
> [Biased lock check expanded in aarch64_enc_fast_lock]
> 0x000003ff88320e18: add x12, sp, #0x10
> 0x000003ff88320e1c: ldr x10, [x1]
> 0x000003ff88320e20: and x11, x12, #0x7
> 0x000003ff88320e24: cmp x11, #0x5
> 0x000003ff88320e28: b.ne 0x000003ff88320eec
>
> *2. Incorrect parameter used in biased_locking_enter in aarch64_enc_fast_lock*
> In the code above [Biased lock check expanded in aarch64_enc_fast_lock], x12 is the box register and holds the address of the lock record on the stack. However, it is misused as the mark word in the biased lock check. As a result, the biased pattern check always fails: the stack pointer is 8-byte aligned, so x11 must be zero.
> Current implementation in aarch64_enc_fast_lock:
> /biased_locking_enter(disp_hdr, oop, box, tmp, true, cont);/
> Which should be:
> /biased_locking_enter(box, oop, disp_hdr, tmp, true, cont); //swap disp_hdr and box register, disp_hdr is already loaded with object mark word/
> This issue might cause a problem when running with option -XX:-UseOptoBiasInlining in the following scenario. In the code above [Biased lock check expanded in aarch64_enc_fast_lock], x12 is box and x10 is disp_hdr.
> 1. Suppose the object's mark word (loaded into register x10) is in biased mode, with content "[biased_thread |epoch|age| 101]", and biased_thread is executing its synchronized block.
> 2. Another thread tries to acquire the same lock. First, it performs the biased pattern check, which fails because the "mark word" register used here is x12 (the correct register is x10).
> 3. As x12 is not "biased" (the three least significant bits of SP + 0x10 would never be 101), execution goes to the thin lock CAS acquire code instead of the biased lock revoke/rebias code.
> 4. The thin lock CAS acquire will succeed because the two least significant bits of x10 are 01 (the thin lock CAS code uses disp_hdr (x10) as the mark word). Two threads acquire the same lock at the same time, which is incorrect behavior.
>
> *3. Inflate monitor code has typo in aarch64_enc_fast_lock*
> The inflated lock test is generated under condition (EmitSync & 0x02), while the inflated lock fast path is generated under condition "if ((EmitSync & 0x02) == 0))". At both locations, the condition should be "if ((EmitSync & 0x02) == 0) ". In orig.asm, no instruction branches to the inflated lock acquire fast path at 0x000003ff88320f24.
> Issues #1 and #3 do not impact correctness; they introduce redundant code (double biased lock check) and skip the inflated lock fast path check (_owner is null case).
> The fix is in aarch64_enc_fast_lock/aarch64_enc_fast_unlock; it will not impact C1 or the interpreter. The attached patch includes:
> 1. Disable generating biased lock handling code in fast_lock/fast_unlock when UseOptoBiasInlining is true.
> 2. Adjust biased_locking_enter's actual parameters, swapping the disp_hdr and box registers.
> 3. Fix the typo in inflated monitor handling.
>
> Regards
> Shi Hui

From tobias.hartmann at oracle.com Mon Aug 24 07:58:49 2015
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Mon, 24 Aug 2015 09:58:49 +0200
Subject: [9] RFR(S): 8075805: Crash while trying to release CompiledICHolder
In-Reply-To: <55D75FC4.1040208@oracle.com>
References: <55D7376E.90103@oracle.com> <55D75FC4.1040208@oracle.com>
Message-ID: <55DACEB9.5070107@oracle.com>

Thanks, Vladimir! Please see comments inline.

On 21.08.2015 19:28, Vladimir Kozlov wrote:
> During our discussion about this problem we thought that we may need additional call nm->cleanup_inline_caches() by sweeper when we convert not_entrant to zombie to prevent zombie pointing to an other nmethods. You think we don't need it?

I looked at all possible nmethod transitions and came to the conclusion that the problem is only possible if there is a direct transition from non-entrant to zombie without a sweeper cycle in-between.

The solution we discussed, i.e., always cleaning ICs for the non-entrant -> zombie transition, would fix the problem as well but would be more invasive than the proposed solution because it affects all nmethods.

> Please, clarify changes in states transition of unloaded nmethods - "The only impact is that unloaded nmethods that are already non-entrant and not on the stack need another iteration of the sweeper to become zombie."
> I don't see how changing make_marked_nmethods_zombies() call to make_marked_nmethods_not_entrant() affects unloaded nmethods. Both make_*() methods iterates only over is_alive() nmethods (iter.next_alive()), so they skip unloaded.

Yes, my description is wrong. It should be "The only impact is that _marked_ nmethods that are already non-entrant and not on the stack need another iteration of the sweeper to become zombie." The change does not affect the state transitions of unloaded nmethods.

In other words, the change removes the shortcut that allowed nmethods that were marked for deoptimization to be converted to zombie at a safepoint if they were already non-entrant and not on the stack before. These nmethods now need an additional sweeper cycle to be converted to zombie.

> What spacing you changed in compiledIC.cpp because webrev does not show them?

I fixed the wrong indentation in CompiledIC::set_to_clean().
You can see it in the webrev if you click on "Patch": http://cr.openjdk.java.net/~thartmann/8075805/webrev.00/src/share/vm/code/compiledIC.cpp.patch > Please, fix comment in vm_operations.cpp > > // Make the dependent methods zombies > - CodeCache::make_marked_nmethods_zombies(); > + CodeCache::make_marked_nmethods_not_entrant(); Fixed. New webrev: http://cr.openjdk.java.net/~thartmann/8075805/webrev.01/ Thanks, Tobias > > Thanks, > Vladimir > > On 8/21/15 7:36 AM, Tobias Hartmann wrote: >> Hi, >> >> please review the following patch. >> >> https://bugs.openjdk.java.net/browse/JDK-8075805 >> http://cr.openjdk.java.net/~thartmann/8075805/webrev.00/ >> >> Problem: >> The VM crashes at a safepoint while trying to free a CompiledICHolder object that was enqueued for release after flushing a nmethod. The crash happens because the object is not a CompiledICHolder but Metadata which should not be removed. The problem is that at nmethod release, "CompiledIC::is_icholder_entry" is used to determine if the ICs of that nmethod still reference CompiledICHolder objects and if so, those objects are enqueued for release at the next safepoint. The method returns true if the destination of the IC is a C2I adapter, assuming that in this case the IC is in the to-interpreter state and the cached value must be a CompiledICHolder object. However, there are very rare cases where the IC is actually in the to-compiled state but the destination nmethod was already flushed and replaced by another allocation. Since the IC is still pointing to the same address in the code cache, the state of the IC is confused. >> >> Cleaning of inline caches that point to dead nmethods should prevent this. However, we do not clean ICs of nmethods that are converted to zombie themselves. Usually, that's okay because a zombie nmethod will be flushed before any dead nmethod it references. 
This is guaranteed because each nmethod goes through the states alive -> non-entrant -> zombie -> marked-for-reclamation before being flushed. >> >> Suppose we have two nmethods A and B, where A references B through an IC and B is always processed first by the sweeper. The following table shows the state transitions from top to bottom where lines marked with "S" show a transition in the corresponding iteration of the sweeper. >> >> state of A state of B >> ----------------------------------------- >> non-entrant non-entrant >> S [not on stack] [not on stack] >> S zombie zombie >> S marked marked >> S flushed flushed/re-allocated >> >> The IC of A will be cleaned in the first sweeper cycle because B is non-entrant so we don't need to clean ICs again if A is converted to zombie. >> >> Let's look at the following setting: >> >> state of A state of B >> ----------------------------------------- >> non-entrant >> S [not on stack] >> non-entrant >> S zombie [not on stack] >> zombie >> S marked marked >> S flushed flushed/re-allocated >> >> There are two problems here: >> - the IC of A is not cleaned because B is not yet non-entrant in the first iteration of the sweeper and afterwards A becomes zombie itself, >> - the transition from B to zombie happens outside the sweeper in 'CodeCache::make_marked_nmethods_zombies()' because the previous sweeper iteration already determined that the nmethod is not on the stack. >> >> The VM now crashes while flushing A because it still references B. Since B was replaced by an C2I adapter, we assume that A's IC is in the to-interpreter state and try to free a CompiledICHolder object which is actually Klass-Metadata for B. The detailed logs are below [1]. 
>> >> A similar problem occurs with nmethod unloading because unloaded nmethods transition directly to zombie: >> >> state of A state of B >> ----------------------------------------- >> unloaded unloaded >> S zombie zombie >> S marked marked >> S flushed flushed/re-allocated >> >> Again, we crash while flushing A. >> >> Solution: >> I removed the 'make_marked_nmethods_zombies()' and replaced it by calls to 'make_marked_nmethods_not_entrant()'. This avoids the non-entrant -> zombie transition outside of the sweeper. The only impact is that unloaded nmethods that are already non-entrant and not on the stack need another iteration of the sweeper to become zombie. I verified that this has no impact on performance. I also removed the code that was added by JDK-8059735 because now only the sweeper can set a nmethod to zombie. >> >> To fix the nmethod unloading case, I changed the implementation of CodeCache::gc_epilogue to clean ICs of unloaded nmethods as well. >> >> Testing: >> - Executed failing tests for a week (still running) >> - JPRT >> - Performance (SPECjbb2005, SPECjbb2013, SPECjvm2008), no differences >> >> Thanks, >> Tobias >> >> >> [1] Detailed logs for nmethod A (1178) and nmethod B (552): >> >> Inline cache at 0xffff80ffad89b017, calling 0xffff80ffad1063c0 cached_value 0x0000000000000000 changing destination to 0xffff80ffad66ae20 changing cached metadata to 0x0000000800034a18 >> IC at 0xffff80ffad89b017: monomorphic to compiled (rcvr klass) 'java/util/concurrent/ConcurrentHashMap': >> ### IC at 0xffff80ffad89b017: set to Nmethod 552/ >> ### Nmethod 1178/0xffff80ffad89ac50 (not entrant) being made zombie >> ### Nmethod 552/0xffff80ffad66ac10 (not entrant) being made zombie from make_marked_nmethods_zombies() >> ### Nmethod 552/0xffff80ffad66ac10 (zombie) being marked for reclamation >> ### Nmethod 1178/0xffff80ffad89ac50 (zombie) being marked for reclamation >> ### Nmethod 552/0xffff80ffad66ac10 (marked for reclamation) being flushed >> *flushing nmethod 
552/0xffff80ffad66ac10. Live blobs:2325/Free CodeCache:235986Kb
>> ### I2C/C2I adapter 0xffff80ffad66ac10 allocated
>> ### Nmethod 1178/0xffff80ffad89ac50 (marked for reclamation) being flushed
>> cleanup_call_site 0x0000000800034a18 to be freed, destination 0xffff80ffad66ae20 inline cache at 0xffff80ffad89b017
>> enqueueing icholder 0x0000000800034a18 to be freed
>> *flushing nmethod 1178/0xffff80ffad89ac50. Live blobs:2346/Free CodeCache:235955Kb
>> deleting icholder 0x0000000800034a18
>> ## nof_mallocs = 211209, nof_frees = 105760
>> ## memory stomp:
>> GuardedMemory(0xffff80ff623ca180) base_addr=0x00000008000349f8 tag=0x0000000800034a18 user_size=18446604433140746280 user_data=0x0000000800034a18
>> Header guard @0x00000008000349f8 is BROKEN
>>

From adinn at redhat.com Mon Aug 24 14:31:29 2015
From: adinn at redhat.com (Andrew Dinn)
Date: Mon, 24 Aug 2015 15:31:29 +0100
Subject: RFR: 8080293: AARCH64: Remove unnecessary dmbs from generated CAS code
Message-ID: <55DB2AC1.50208@redhat.com>

The following webrev against hs-comp head fixes 8080293

  http://cr.openjdk.java.net/~adinn/8080293/webrev.00/

It is a follow-on to the prior volatile object patch

  8078743: AARCH64: Extend use of stlr to cater for volatile object stores
  http://cr.openjdk.java.net/~adinn/8078743/webrev.04/

and requires that previous patch to be applied first.

Testing
-------

The patch is sensitive to GC configuration so it was tested against 5 relevant configs

  G1
  CMS+UseCondCardMark
  CMS-UseCondCardMark
  Par+UseCondCardMark
  Par-UseCondCardMark

The validity of the transformation was verified by:

  generating and eyeballing compiled code for simple test programs
  successfully running a fairly large program (netbeans)
  generating and eyeballing HashMap code compiled on a fairly large program run

The fix was performance tested on 2 implementations of the AArch64 architecture (more details below). On an O-O-O CPU it gave no noticeable benefit. On a simple pipeline CPU it gave a very significant benefit in specific cases.

regards,

Andrew Dinn
-----------
Senior Principal Software Engineer
Red Hat UK Ltd
Registered in UK and Wales under Company Registration No. 3798903
Directors: Michael Cunningham (USA), Matt Parson (USA), Charlie Peters (USA), Michael O'Neill (Ireland)

The Test
--------

As with the prior patch I tested the original vs new code generation strategy by running a jmh test first with -XX:+UseBarriersForVolatile and then with -XX:-UseBarriersForVolatile. Four different test programs were run in each of the 5 GC configs. Each test executed repeated CAS operations on an object field in a single thread, with a Blackhole backoff between CASes varying from 0 to 64.

Test one performed a CAS guaranteed to fail; test two performed a successful CAS from a fixed object to null and then back; test three performed a successful CAS from a fixed object to another fixed object and then back; test four performed a successful CAS from a fixed object to a newly allocated object and then back.

The average time per CAS operation (ns/op) -- actually per 2 CAS operations for the latter 3 tests -- was used as a score.

The Results
-----------

On an O-O-O CPU there was no significant difference in the time taken.

On a simple pipeline CPU the optimization gave a very significant benefit for the Fail tests on all GC configurations except CMS + UseCondCardMark. In all other cases there was no significant measurable benefit.
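
The four CAS variants described under "The Test" can be sketched in plain Java, without JMH, to make the difference between them concrete. This is only an illustration, not the benchmark code that was run: the class name CasVariants, the method names and the two fixed objects are hypothetical, and passing an expected value that is never stored is just one assumed way of making the first CAS "guaranteed to fail". The sketch checks the outcome of each CAS and measures nothing.

```java
import java.util.concurrent.atomic.AtomicReference;

public class CasVariants {
    // Two fixed objects standing in for the "fixed object" values (hypothetical names).
    static final Object FIRST = new Object();
    static final Object SECOND = new Object();

    // Test one: the reference holds FIRST, so a CAS expecting SECOND cannot succeed.
    static boolean failingCas(AtomicReference<Object> ref) {
        return ref.compareAndSet(SECOND, null);
    }

    // Test two: fixed object -> null -> fixed object.
    static boolean nullRoundTrip(AtomicReference<Object> ref) {
        return ref.compareAndSet(FIRST, null) && ref.compareAndSet(null, FIRST);
    }

    // Test three: fixed object -> another fixed object -> back.
    static boolean fixedRoundTrip(AtomicReference<Object> ref) {
        return ref.compareAndSet(FIRST, SECOND) && ref.compareAndSet(SECOND, FIRST);
    }

    // Test four: fixed object -> newly allocated object -> back.
    static boolean freshRoundTrip(AtomicReference<Object> ref) {
        Object fresh = new Object();
        return ref.compareAndSet(FIRST, fresh) && ref.compareAndSet(fresh, FIRST);
    }

    public static void main(String[] args) {
        AtomicReference<Object> ref = new AtomicReference<>(FIRST);
        System.out.println("failing CAS succeeded:    " + failingCas(ref));
        System.out.println("null round trip worked:   " + nullRoundTrip(ref));
        System.out.println("fixed round trip worked:  " + fixedRoundTrip(ref));
        System.out.println("fresh round trip worked:  " + freshRoundTrip(ref));
    }
}
```

Run as written, the first call returns false and leaves the reference unchanged, while the other three return true and restore the original value. That is why the round-trip tests are scored per 2 CAS operations and the failing test per single operation.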

Example Test
------------

package org.openjdk;

import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.infra.Blackhole;

import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

@Warmup(iterations = 10, time = 1, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
@Fork(3)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@State(Scope.Benchmark)
public class CasNull {

    Object tombstone;
    AtomicReference<Object> ref;

    // Amount of Blackhole work done between CAS pairs
    @Param({"0", "1", "2", "4", "8", "16", "32", "64"})
    int backoff;

    @Setup
    public void setup() {
        tombstone = new Object();
        ref = new AtomicReference<>();
        ref.set(tombstone);
    }

    @Benchmark
    public boolean test() {
        Blackhole.consumeCPU(backoff);
        // CAS from the fixed object to null and then back
        ref.compareAndSet(tombstone, null);
        ref.compareAndSet(null, tombstone);
        return true;
    }
}

From vladimir.kozlov at oracle.com Mon Aug 24 15:23:13 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Mon, 24 Aug 2015 08:23:13 -0700
Subject: [9] RFR(S): 8075805: Crash while trying to release CompiledICHolder
In-Reply-To: <55DACEB9.5070107@oracle.com>
References: <55D7376E.90103@oracle.com> <55D75FC4.1040208@oracle.com> <55DACEB9.5070107@oracle.com>
Message-ID: <55DB36E1.8000908@oracle.com>

Looks good. Thank you for explanations.

Thanks,
Vladimir

On 8/24/15 12:58 AM, Tobias Hartmann wrote:
> Thanks, Vladimir! Please see comments inline.
>
> On 21.08.2015 19:28, Vladimir Kozlov wrote:
>> During our discussion about this problem we thought that we may need additional call nm->cleanup_inline_caches() by sweeper when we convert not_entrant to zombie to prevent zombie pointing to an other nmethods. You think we don't need it?
>
> I looked at all possible nmethod transitions and came to the conclusion that the problem is only possible if there is a direct transition from non-entrant to zombie without a sweeper cycle in-between.
> > The solution we discussed, i.e., always cleaning ICs for the non-entrant -> zombie transition, would fix the problem as well but would be more invasive than the proposed solution because it affects all nmethods. > >> Please, clarify changes in states transition of unloaded nmethods - "The only impact is that unloaded nmethods that are already non-entrant and not on the stack need another iteration of the sweeper to become zombie." >> I don't see how changing make_marked_nmethods_zombies() call to make_marked_nmethods_not_entrant() affects unloaded nmethods. Both make_*() methods iterates only over is_alive() nmethods (iter.next_alive()), so they skip unloaded. > > Yes, my description is wrong. It should be "The only impact is that _marked_ nmethods that are already non-entrant and not on the stack need another iteration of the sweeper to become zombie." The change does not affect the state transitions of unloaded nmethods. > > In other words, the change removes the shortcut that allowed nmethods that were marked for deoptimization to be converted to zombie at a safepoint if they were already non-entrant and not on the stack before. These nmethods now need an additional sweeper cycle to be converted to zombie. > >> What spacing you changed in compiledIC.cpp because webrev does not show them? > > I fixed the wrong indentation in CompiledIC::set_to_clean(). You can see it in the webrev if you click on "Patch": > http://cr.openjdk.java.net/~thartmann/8075805/webrev.00/src/share/vm/code/compiledIC.cpp.patch > >> Please, fix comment in vm_operations.cpp >> >> // Make the dependent methods zombies >> - CodeCache::make_marked_nmethods_zombies(); >> + CodeCache::make_marked_nmethods_not_entrant(); > > Fixed. > > New webrev: > http://cr.openjdk.java.net/~thartmann/8075805/webrev.01/ > > Thanks, > Tobias > >> >> Thanks, >> Vladimir >> >> On 8/21/15 7:36 AM, Tobias Hartmann wrote: >>> Hi, >>> >>> please review the following patch. 
>>> >>> https://bugs.openjdk.java.net/browse/JDK-8075805 >>> http://cr.openjdk.java.net/~thartmann/8075805/webrev.00/ >>> >>> Problem: >>> The VM crashes at a safepoint while trying to free a CompiledICHolder object that was enqueued for release after flushing a nmethod. The crash happens because the object is not a CompiledICHolder but Metadata which should not be removed. The problem is that at nmethod release, "CompiledIC::is_icholder_entry" is used to determine if the ICs of that nmethod still reference CompiledICHolder objects and if so, those objects are enqueued for release at the next safepoint. The method returns true if the destination of the IC is a C2I adapter, assuming that in this case the IC is in the to-interpreter state and the cached value must be a CompiledICHolder object. However, there are very rare cases where the IC is actually in the to-compiled state but the destination nmethod was already flushed and replaced by another allocation. Since the IC is still pointing to the same address in the code cache, the state of the IC is confused. >>> >>> Cleaning of inline caches that point to dead nmethods should prevent this. However, we do not clean ICs of nmethods that are converted to zombie themselves. Usually, that's okay because a zombie nmethod will be flushed before any dead nmethod it references. This is guaranteed because each nmethod goes through the states alive -> non-entrant -> zombie -> marked-for-reclamation before being flushed. >>> >>> Suppose we have two nmethods A and B, where A references B through an IC and B is always processed first by the sweeper. The following table shows the state transitions from top to bottom where lines marked with "S" show a transition in the corresponding iteration of the sweeper. 
>>> >>> state of A state of B >>> ----------------------------------------- >>> non-entrant non-entrant >>> S [not on stack] [not on stack] >>> S zombie zombie >>> S marked marked >>> S flushed flushed/re-allocated >>> >>> The IC of A will be cleaned in the first sweeper cycle because B is non-entrant so we don't need to clean ICs again if A is converted to zombie. >>> >>> Let's look at the following setting: >>> >>> state of A state of B >>> ----------------------------------------- >>> non-entrant >>> S [not on stack] >>> non-entrant >>> S zombie [not on stack] >>> zombie >>> S marked marked >>> S flushed flushed/re-allocated >>> >>> There are two problems here: >>> - the IC of A is not cleaned because B is not yet non-entrant in the first iteration of the sweeper and afterwards A becomes zombie itself, >>> - the transition from B to zombie happens outside the sweeper in 'CodeCache::make_marked_nmethods_zombies()' because the previous sweeper iteration already determined that the nmethod is not on the stack. >>> >>> The VM now crashes while flushing A because it still references B. Since B was replaced by an C2I adapter, we assume that A's IC is in the to-interpreter state and try to free a CompiledICHolder object which is actually Klass-Metadata for B. The detailed logs are below [1]. >>> >>> A similar problem occurs with nmethod unloading because unloaded nmethods transition directly to zombie: >>> >>> state of A state of B >>> ----------------------------------------- >>> unloaded unloaded >>> S zombie zombie >>> S marked marked >>> S flushed flushed/re-allocated >>> >>> Again, we crash while flushing A. >>> >>> Solution: >>> I removed the 'make_marked_nmethods_zombies()' and replaced it by calls to 'make_marked_nmethods_not_entrant()'. This avoids the non-entrant -> zombie transition outside of the sweeper. The only impact is that unloaded nmethods that are already non-entrant and not on the stack need another iteration of the sweeper to become zombie. 
I verified that this has no impact on performance. I also removed the code that was added by JDK-8059735 because now only the sweeper can set a nmethod to zombie. >>> >>> To fix the nmethod unloading case, I changed the implementation of CodeCache::gc_epilogue to clean ICs of unloaded nmethods as well. >>> >>> Testing: >>> - Executed failing tests for a week (still running) >>> - JPRT >>> - Performance (SPECjbb2005, SPECjbb2013, SPECjvm2008), no differences >>> >>> Thanks, >>> Tobias >>> >>> >>> [1] Detailed logs for nmethod A (1178) and nmethod B (552): >>> >>> Inline cache at 0xffff80ffad89b017, calling 0xffff80ffad1063c0 cached_value 0x0000000000000000 changing destination to 0xffff80ffad66ae20 changing cached metadata to 0x0000000800034a18 >>> IC at 0xffff80ffad89b017: monomorphic to compiled (rcvr klass) 'java/util/concurrent/ConcurrentHashMap': >>> ### IC at 0xffff80ffad89b017: set to Nmethod 552/ >>> ### Nmethod 1178/0xffff80ffad89ac50 (not entrant) being made zombie >>> ### Nmethod 552/0xffff80ffad66ac10 (not entrant) being made zombie from make_marked_nmethods_zombies() >>> ### Nmethod 552/0xffff80ffad66ac10 (zombie) being marked for reclamation >>> ### Nmethod 1178/0xffff80ffad89ac50 (zombie) being marked for reclamation >>> ### Nmethod 552/0xffff80ffad66ac10 (marked for reclamation) being flushed >>> *flushing nmethod 552/0xffff80ffad66ac10. Live blobs:2325/Free CodeCache:235986Kb >>> ### I2C/C2I adapter 0xffff80ffad66ac10 allocated >>> ### Nmethod 1178/0xffff80ffad89ac50 (marked for reclamation) being flushed >>> cleanup_call_site 0x0000000800034a18 to be freed, destination 0xffff80ffad66ae20 inline cache at 0xffff80ffad89b017 >>> enqueueing icholder 0x0000000800034a18 to be freed >>> *flushing nmethod 1178/0xffff80ffad89ac50. 
Live blobs:2346/Free CodeCache:235955Kb >>> deleting icholder 0x0000000800034a18 >>> ## nof_mallocs = 211209, nof_frees = 105760 >>> ## memory stomp: >>> GuardedMemory(0xffff80ff623ca180) base_addr=0x00000008000349f8 tag=0x0000000800034a18 user_size=18446604433140746280 user_data=0x0000000800034a18 >>> Header guard @0x00000008000349f8 is BROKEN >>> From adinn at redhat.com Mon Aug 24 15:25:08 2015 From: adinn at redhat.com (Andrew Dinn) Date: Mon, 24 Aug 2015 16:25:08 +0100 Subject: [aarch64-port-dev ] aarch64: C2 fast lock/unlock issues In-Reply-To: <55D79701.9090407@oracle.com> References: <55D79701.9090407@oracle.com> Message-ID: <55DB3754.9090504@redhat.com> Thank you for proposing these fixes. Your analysis and patch both look to me to be completely correct. I applied the patch and it appears to work fine when running programs with biased locking enabled. I have raised the following JIRA for this fix: https://bugs.openjdk.java.net/browse/JDK-8134322 I will post a webrev containing your patch for review as soon as possible. regards, Andrew Dinn ----------- Senior Principal Software Engineer Red Hat UK Ltd Registered in UK and Wales under Company Registration No. 3798903 Directors: Michael Cunningham (USA), Matt Parson (USA), Charlie Peters (USA), Michael O'Neill (Ireland) From tobias.hartmann at oracle.com Mon Aug 24 15:30:29 2015 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Mon, 24 Aug 2015 17:30:29 +0200 Subject: [9] RFR(S): 8075805: Crash while trying to release CompiledICHolder In-Reply-To: <55DB36E1.8000908@oracle.com> References: <55D7376E.90103@oracle.com> <55D75FC4.1040208@oracle.com> <55DACEB9.5070107@oracle.com> <55DB36E1.8000908@oracle.com> Message-ID: <55DB3895.1030105@oracle.com> Thanks, Vladimir. Best, Tobias On 24.08.2015 17:23, Vladimir Kozlov wrote: > Looks good. Thank you for explanations. > > Thanks, > Vladimir > > On 8/24/15 12:58 AM, Tobias Hartmann wrote: >> Thanks, Vladimir! Please see comments inline. 
>>
>> On 21.08.2015 19:28, Vladimir Kozlov wrote:
>>> During our discussion about this problem we thought that we may need additional call nm->cleanup_inline_caches() by sweeper when we convert not_entrant to zombie to prevent zombie pointing to an other nmethods. You think we don't need it?
>>
>> I looked at all possible nmethod transitions and came to the conclusion that the problem is only possible if there is a direct transition from non-entrant to zombie without a sweeper cycle in-between.
>>
>> The solution we discussed, i.e., always cleaning ICs for the non-entrant -> zombie transition, would fix the problem as well but would be more invasive than the proposed solution because it affects all nmethods.
>>
>>> Please, clarify changes in states transition of unloaded nmethods - "The only impact is that unloaded nmethods that are already non-entrant and not on the stack need another iteration of the sweeper to become zombie."
>>> I don't see how changing make_marked_nmethods_zombies() call to make_marked_nmethods_not_entrant() affects unloaded nmethods. Both make_*() methods iterates only over is_alive() nmethods (iter.next_alive()), so they skip unloaded.
>>
>> Yes, my description is wrong. It should be "The only impact is that _marked_ nmethods that are already non-entrant and not on the stack need another iteration of the sweeper to become zombie." The change does not affect the state transitions of unloaded nmethods.
>>
>> In other words, the change removes the shortcut that allowed nmethods that were marked for deoptimization to be converted to zombie at a safepoint if they were already non-entrant and not on the stack before. These nmethods now need an additional sweeper cycle to be converted to zombie.
>>
>>> What spacing you changed in compiledIC.cpp because webrev does not show them?
>>
>> I fixed the wrong indentation in CompiledIC::set_to_clean().
>> You can see it in the webrev if you click on "Patch":
>> http://cr.openjdk.java.net/~thartmann/8075805/webrev.00/src/share/vm/code/compiledIC.cpp.patch
>>
>>> Please, fix comment in vm_operations.cpp
>>>
>>>      // Make the dependent methods zombies
>>> -    CodeCache::make_marked_nmethods_zombies();
>>> +    CodeCache::make_marked_nmethods_not_entrant();
>>
>> Fixed.
>>
>> New webrev:
>> http://cr.openjdk.java.net/~thartmann/8075805/webrev.01/
>>
>> Thanks,
>> Tobias
>>
>>>
>>> Thanks,
>>> Vladimir
>>>
>>> On 8/21/15 7:36 AM, Tobias Hartmann wrote:
>>>> Hi,
>>>>
>>>> please review the following patch.
>>>>
>>>> https://bugs.openjdk.java.net/browse/JDK-8075805
>>>> http://cr.openjdk.java.net/~thartmann/8075805/webrev.00/
>>>>
>>>> Problem:
>>>> The VM crashes at a safepoint while trying to free a CompiledICHolder object that was enqueued for release after flushing an nmethod. The crash happens because the object is not a CompiledICHolder but Metadata which should not be removed. The problem is that at nmethod release, "CompiledIC::is_icholder_entry" is used to determine if the ICs of that nmethod still reference CompiledICHolder objects and if so, those objects are enqueued for release at the next safepoint. The method returns true if the destination of the IC is a C2I adapter, assuming that in this case the IC is in the to-interpreter state and the cached value must be a CompiledICHolder object. However, there are very rare cases where the IC is actually in the to-compiled state but the destination nmethod was already flushed and replaced by another allocation. Since the IC is still pointing to the same address in the code cache, the state of the IC is confused.
>>>>
>>>> Cleaning of inline caches that point to dead nmethods should prevent this. However, we do not clean ICs of nmethods that are converted to zombie themselves. Usually, that's okay because a zombie nmethod will be flushed before any dead nmethod it references. This is guaranteed because each nmethod goes through the states alive -> non-entrant -> zombie -> marked-for-reclamation before being flushed.
>>>>
>>>> Suppose we have two nmethods A and B, where A references B through an IC and B is always processed first by the sweeper. The following table shows the state transitions from top to bottom where lines marked with "S" show a transition in the corresponding iteration of the sweeper.
>>>>
>>>>        state of A         state of B
>>>> -----------------------------------------
>>>>        non-entrant        non-entrant
>>>> S      [not on stack]     [not on stack]
>>>> S      zombie             zombie
>>>> S      marked             marked
>>>> S      flushed            flushed/re-allocated
>>>>
>>>> The IC of A will be cleaned in the first sweeper cycle because B is non-entrant so we don't need to clean ICs again if A is converted to zombie.
>>>>
>>>> Let's look at the following setting:
>>>>
>>>>        state of A         state of B
>>>> -----------------------------------------
>>>>        non-entrant
>>>> S      [not on stack]
>>>>                           non-entrant
>>>> S      zombie             [not on stack]
>>>>                           zombie
>>>> S      marked             marked
>>>> S      flushed            flushed/re-allocated
>>>>
>>>> There are two problems here:
>>>> - the IC of A is not cleaned because B is not yet non-entrant in the first iteration of the sweeper and afterwards A becomes zombie itself,
>>>> - the transition from B to zombie happens outside the sweeper in 'CodeCache::make_marked_nmethods_zombies()' because the previous sweeper iteration already determined that the nmethod is not on the stack.
>>>>
>>>> The VM now crashes while flushing A because it still references B. Since B was replaced by a C2I adapter, we assume that A's IC is in the to-interpreter state and try to free a CompiledICHolder object which is actually Klass-Metadata for B. The detailed logs are below [1].
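
The misclassification described in the Problem section can be reproduced in a few lines. This is a toy sketch only, not HotSpot code: ToyCodeCache, ToyInlineCache and looks_like_icholder_entry are hypothetical stand-ins, the address is arbitrary, and the model ignores that a real IC destination is an entry point inside the blob rather than the blob address itself.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Toy model only: every type and function here is hypothetical, not HotSpot code.
enum class BlobKind { Nmethod, C2IAdapter };

// Maps a code cache address to whatever blob currently occupies it.
struct ToyCodeCache {
    std::map<std::uintptr_t, BlobKind> blobs;

    void allocate(std::uintptr_t addr, BlobKind kind) { blobs[addr] = kind; }
    void flush(std::uintptr_t addr) { blobs.erase(addr); }
};

// What the inline cache was really set to when it was last updated.
enum class IcState { ToCompiled, ToInterpreter };

struct ToyInlineCache {
    std::uintptr_t destination;  // address the call site jumps to
    IcState real_state;          // ground truth, invisible to the heuristic
};

// Stand-in for the check described in the mail: the IC is taken to hold an
// icholder whenever its destination currently is a C2I adapter.
bool looks_like_icholder_entry(const ToyCodeCache& cache, const ToyInlineCache& ic) {
    auto it = cache.blobs.find(ic.destination);
    return it != cache.blobs.end() && it->second == BlobKind::C2IAdapter;
}

// Replays the address reuse from the log: B is flushed and an adapter lands
// on the same address while A's stale IC still points there.
bool stale_ic_is_misclassified() {
    const std::uintptr_t addr_of_b = 0x1000;  // arbitrary illustrative address
    ToyCodeCache cache;
    cache.allocate(addr_of_b, BlobKind::Nmethod);
    ToyInlineCache ic_of_a{addr_of_b, IcState::ToCompiled};
    assert(!looks_like_icholder_entry(cache, ic_of_a));  // correct while B exists

    cache.flush(addr_of_b);
    cache.allocate(addr_of_b, BlobKind::C2IAdapter);

    // The heuristic now says "icholder" although the IC caches Klass metadata.
    return looks_like_icholder_entry(cache, ic_of_a) &&
           ic_of_a.real_state == IcState::ToCompiled;
}
```

Calling stale_ic_is_misclassified() returns true in this model: a check that looks only at what currently lives at the destination address cannot tell a genuine to-interpreter IC from a stale to-compiled one.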
>>>>
>>>> A similar problem occurs with nmethod unloading because unloaded nmethods transition directly to zombie:
>>>>
>>>>        state of A         state of B
>>>> -----------------------------------------
>>>>        unloaded           unloaded
>>>> S      zombie             zombie
>>>> S      marked             marked
>>>> S      flushed            flushed/re-allocated
>>>>
>>>> Again, we crash while flushing A.
>>>>
>>>> Solution:
>>>> I removed the 'make_marked_nmethods_zombies()' and replaced it by calls to 'make_marked_nmethods_not_entrant()'. This avoids the non-entrant -> zombie transition outside of the sweeper. The only impact is that unloaded nmethods that are already non-entrant and not on the stack need another iteration of the sweeper to become zombie. I verified that this has no impact on performance. I also removed the code that was added by JDK-8059735 because now only the sweeper can set a nmethod to zombie.
>>>>
>>>> To fix the nmethod unloading case, I changed the implementation of CodeCache::gc_epilogue to clean ICs of unloaded nmethods as well.
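
The effect of removing the shortcut can be replayed against the second table with a toy state machine. This is a minimal sketch under simplifying assumptions, not the real sweeper: ToyNmethod, clean_ic, sweep_one and second_scenario_goes_wrong are hypothetical names, each nmethod has a single inline cache, and "flushed" stands in for "flushed and possibly re-allocated".

```cpp
#include <cassert>

// Toy model of the transitions in the tables above. All names are
// hypothetical; real HotSpot sweeping is considerably more involved.
enum class State { Alive, NotEntrant, Zombie, Marked, Flushed };

struct ToyNmethod {
    State state = State::Alive;
    bool seen_not_on_stack = false;    // recorded by the first sweep after going non-entrant
    ToyNmethod* ic_target = nullptr;   // one inline cache; nullptr means clean
    bool flushed_with_stale_ic = false;
};

// Clean the IC if it points at a callee that is no longer alive.
void clean_ic(ToyNmethod& nm) {
    if (nm.ic_target != nullptr && nm.ic_target->state != State::Alive) {
        nm.ic_target = nullptr;
    }
}

// One sweeper visit: advance the nmethod by at most one step.
void sweep_one(ToyNmethod& nm) {
    switch (nm.state) {
    case State::Alive:
        clean_ic(nm);
        break;
    case State::NotEntrant:
        if (nm.seen_not_on_stack) {
            nm.state = State::Zombie;  // ICs are not cleaned on this transition
        } else {
            clean_ic(nm);
            nm.seen_not_on_stack = true;
        }
        break;
    case State::Zombie:
        nm.state = State::Marked;
        break;
    case State::Marked:
        // Flushing with an IC that targets an already flushed (and possibly
        // re-allocated) callee is the situation that crashed.
        nm.flushed_with_stale_ic =
            nm.ic_target != nullptr && nm.ic_target->state == State::Flushed;
        nm.state = State::Flushed;
        break;
    case State::Flushed:
        break;
    }
}

// Replays the second table. Returns true if A is flushed with a stale IC.
bool second_scenario_goes_wrong(bool zombie_shortcut_at_safepoint) {
    ToyNmethod a, b;
    a.ic_target = &b;
    a.state = State::NotEntrant;
    auto sweep = [&]() { sweep_one(b); sweep_one(a); };  // B is always visited first

    sweep();                        // A: [not on stack]; IC kept because B is alive
    b.state = State::NotEntrant;    // B is invalidated between sweeps
    sweep();                        // B: [not on stack]; A: zombie
    assert(a.state == State::Zombie && a.ic_target == &b);

    if (zombie_shortcut_at_safepoint) {
        b.state = State::Zombie;    // old make_marked_nmethods_zombies() behaviour
    }                               // otherwise B simply stays non-entrant

    for (int i = 0; i < 4; ++i) {
        sweep();
    }
    return a.flushed_with_stale_ic;
}
```

In this model second_scenario_goes_wrong(true) reports the stale IC, while second_scenario_goes_wrong(false) does not, because B then trails A by one sweeper cycle and is still unflushed when A is flushed. The sketch does not cover the unloading case, which the patch handles separately in CodeCache::gc_epilogue.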
>>>>
>>>> Testing:
>>>> - Executed failing tests for a week (still running)
>>>> - JPRT
>>>> - Performance (SPECjbb2005, SPECjbb2013, SPECjvm2008), no differences
>>>>
>>>> Thanks,
>>>> Tobias
>>>>
>>>>
>>>> [1] Detailed logs for nmethod A (1178) and nmethod B (552):
>>>>
>>>> Inline cache at 0xffff80ffad89b017, calling 0xffff80ffad1063c0 cached_value 0x0000000000000000 changing destination to 0xffff80ffad66ae20 changing cached metadata to 0x0000000800034a18
>>>> IC at 0xffff80ffad89b017: monomorphic to compiled (rcvr klass) 'java/util/concurrent/ConcurrentHashMap':
>>>> ### IC at 0xffff80ffad89b017: set to Nmethod 552/
>>>> ### Nmethod 1178/0xffff80ffad89ac50 (not entrant) being made zombie
>>>> ### Nmethod 552/0xffff80ffad66ac10 (not entrant) being made zombie from make_marked_nmethods_zombies()
>>>> ### Nmethod 552/0xffff80ffad66ac10 (zombie) being marked for reclamation
>>>> ### Nmethod 1178/0xffff80ffad89ac50 (zombie) being marked for reclamation
>>>> ### Nmethod 552/0xffff80ffad66ac10 (marked for reclamation) being flushed
>>>> *flushing nmethod 552/0xffff80ffad66ac10. Live blobs:2325/Free CodeCache:235986Kb
>>>> ### I2C/C2I adapter 0xffff80ffad66ac10 allocated
>>>> ### Nmethod 1178/0xffff80ffad89ac50 (marked for reclamation) being flushed
>>>> cleanup_call_site 0x0000000800034a18 to be freed, destination 0xffff80ffad66ae20 inline cache at 0xffff80ffad89b017
>>>> enqueueing icholder 0x0000000800034a18 to be freed
>>>> *flushing nmethod 1178/0xffff80ffad89ac50.
Live blobs:2346/Free CodeCache:235955Kb
>>>> deleting icholder 0x0000000800034a18
>>>> ## nof_mallocs = 211209, nof_frees = 105760
>>>> ## memory stomp:
>>>> GuardedMemory(0xffff80ff623ca180) base_addr=0x00000008000349f8 tag=0x0000000800034a18 user_size=18446604433140746280 user_data=0x0000000800034a18
>>>> Header guard @0x00000008000349f8 is BROKEN
>>>>

From igor.veresov at oracle.com  Mon Aug 24 20:10:14 2015
From: igor.veresov at oracle.com (Igor Veresov)
Date: Mon, 24 Aug 2015 13:10:14 -0700
Subject: [9] RFR(S): 8075805: Crash while trying to release CompiledICHolder
In-Reply-To: <55DACEB9.5070107@oracle.com>
References: <55D7376E.90103@oracle.com> <55D75FC4.1040208@oracle.com> <55DACEB9.5070107@oracle.com>
Message-ID:

Seems good to me.

Btw, did you find why there is a need for "marked for reclamation" state?

igor

From hui.shi at linaro.org  Tue Aug 25 00:16:28 2015
From: hui.shi at linaro.org (hui.shi)
Date: Tue, 25 Aug 2015 08:16:28 +0800
Subject: Re: [aarch64-port-dev ] aarch64: C2 fast lock/unlock issues
References: <55DB3754.9090504@redhat.com><55D79701.9090407@oracle.com>
Message-ID:

An HTML attachment was scrubbed...
URL:

From vladimir.kozlov at oracle.com  Tue Aug 25 00:24:29 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Mon, 24 Aug 2015 17:24:29 -0700
Subject: RFR: 8080293: AARCH64: Remove unnecessary dmbs from generated CAS code
In-Reply-To: <55DB2AC1.50208@redhat.com>
References: <55DB2AC1.50208@redhat.com>
Message-ID: <55DBB5BD.4040400@oracle.com>

I did not look deep on code logic.
Few comments only:

Use {} for all conditional code (it cause a lot of pain in the past):

  if (is_cas)
    return NULL;

You don't need #ifndef PRODUCT:

+#ifndef PRODUCT
+#ifdef ASSERT

New mach instructs missing predicate:

  predicate(needs_acquiring_load_exclusive(n));

You use higher ins_cost to avoid their generation when predicate is false. So why not explicit predicate?

Thanks,
Vladimir

On 8/24/15 7:31 AM, Andrew Dinn wrote:
> The following webrev against hs-comp head fixes 8080293
>
>   http://cr.openjdk.java.net/~adinn/8080293/webrev.00/
>
> It is a follow on to the prior volatile object patch
>
>   8078743: AARCH64: Extend use of stlr to cater for volatile object stores
>   http://cr.openjdk.java.net/~adinn/8078743/webrev.04/
>
> and requires that previous patch to be applied first.
>
> Testing
> -------
>
> The patch is sensitive to GC configuration so it was tested against 5
> relevant configs
>
>   G1
>   CMS+UseCondCardMark
>   CMS-UseCondCardMark
>   Par+UseCondCardMark
>   Par-UseCondCardMark
>
> The validity of the transformation was verified by:
>
>   generating and eyeballing compiled code for simple test programs
>   successfully running a fairly large program (netbeans)
>   generating and eyeballing HashMap code compiled on a fairly large
>   program run
>
> The fix was performance tested on 2 implementations of the AArch64
> architecture (more details below). On an O-O-O CPU it gave no noticeable
> benefit. On a simple pipeline CPU it gave a very significant benefit in
> specific cases.
>
> regards,
>
>
> Andrew Dinn
> -----------
> Senior Principal Software Engineer
> Red Hat UK Ltd
> Registered in UK and Wales under Company Registration No. 3798903
> Directors: Michael Cunningham (USA), Matt Parson (USA), Charlie Peters
> (USA), Michael O'Neill (Ireland)
>
> The Test
> --------
>
> As with the prior patch I tested the original vs new code generation
> strategy by running a jmh test first with -XX:+UseBarriersForVolatile
> and then with -XX:-UseBarriersForVolatile. Four different test programs
> were run in all 5 GC configs. Each test executed repeated CAS
> operations to an object field in a single thread with a BlackHole
> backoff between CASes varying from 0 to 64.
>
> Test one performed a CAS guaranteed to fail; test two performed a
> successful CAS from a fixed object to null and then back; test three
> performed a successful CAS from a fixed object to another fixed object
> and then back; test four performed a successful CAS from a fixed
> object to a newly allocated object and then back. The average time per
> CAS operation (ns/op) -- actually per 2 CAS operations for the latter 3
> tests -- was used as a score.
>
> The Results
> -----------
>
> On an O-O-O CPU there was no significant difference in the time taken.
>
> On a simple pipeline CPU the optimization gave a very significant
> benefit for the Fail tests on all GC configurations except CMS
> + UseCondCardMark. In all other cases there was no significant
> measurable benefit.
>
> Example Test
> ------------
>
> package org.openjdk;
>
> import org.openjdk.jmh.annotations.*;
> import org.openjdk.jmh.infra.Blackhole;
>
> import java.util.concurrent.TimeUnit;
> import java.util.concurrent.atomic.AtomicReference;
>
> @Warmup(iterations = 10, time = 1, timeUnit = TimeUnit.SECONDS)
> @Measurement(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
> @Fork(3)
> @BenchmarkMode(Mode.AverageTime)
> @OutputTimeUnit(TimeUnit.NANOSECONDS)
> @State(Scope.Benchmark)
> public class CasNull {
>
>     Object tombstone;
>
>     AtomicReference ref;
>
>     @Param({"0", "1", "2", "4", "8", "16", "32", "64"})
>     int backoff;
>
>     @Setup
>     public void setup() {
>         tombstone = new Object();
>
>         ref = new AtomicReference<>();
>         ref.set(tombstone);
>     }
>
>     @Benchmark
>     public boolean test() {
>         Blackhole.consumeCPU(backoff);
>         ref.compareAndSet(tombstone, null);
>         ref.compareAndSet(null, tombstone);
>         return true;
>     }
> }
>

From tobias.hartmann at oracle.com  Tue Aug 25 05:42:51 2015
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Tue, 25 Aug 2015 07:42:51 +0200
Subject: [9] RFR(S): 8075805: Crash while trying to release CompiledICHolder
In-Reply-To:
References: <55D7376E.90103@oracle.com> <55D75FC4.1040208@oracle.com> <55DACEB9.5070107@oracle.com>
Message-ID: <55DC005B.9010109@oracle.com>

On 24.08.2015 22:10, Igor Veresov wrote:
> Seems good to me.

Thanks, Igor.

> Btw, did you find why there is a need for "marked for reclamation" state?

No, I couldn't find a reason yet. I did some testing without this state and didn't run into any obvious problems. I'll file a bug and further investigate. It would be nice if we could save this transition.

Best,
Tobias

From adinn at redhat.com  Tue Aug 25 08:20:47 2015
From: adinn at redhat.com (Andrew Dinn)
Date: Tue, 25 Aug 2015 09:20:47 +0100
Subject: RFR: 8080293: AARCH64: Remove unnecessary dmbs from generated CAS code
In-Reply-To: <55DBB5BD.4040400@oracle.com>
References: <55DB2AC1.50208@redhat.com> <55DBB5BD.4040400@oracle.com>
Message-ID: <55DC255F.7070903@redhat.com>

Hi Vladimir,

Thank you very much for your review.

On 25/08/15 01:24, Vladimir Kozlov wrote:
> I did not look deep on code logic.
> Few comments only:
>
> Use {} for all conditional code (it cause a lot of pain in the past):
>
>   if (is_cas)
>     return NULL;

Ok, I'll correct all of those.

> You don't need #ifndef PRODUCT:
>
> +#ifndef PRODUCT
> +#ifdef ASSERT

Ah, that's good to know. Thanks.

> New mach instructs missing predicate:
>
>   predicate(needs_acquiring_load_exclusive(n));
>
> You use higher ins_cost to avoid their generation when predicate is
> false. So why not explicit predicate?

I had two separate reasons for not repeating the predicates:

1 They do quite a lot of work crawling the graph. So, calling the predicate in the lower cost case and omitting it in the higher cost case attempts to avoid the expense of executing it twice in some cases.

2 The ins_cost for the lower and higher cost cases is meant to reflect a difference in the expected execution cost associated with the instruction. [n.b.
I adopted the same strategy for the new membar generation rules which were added as part of the volatile put optimization patch -- compare membar_release and unnecessary_membar_release]

However, looking again at the code I believe I have the costs (and hence the predicates) attached to the wrong rules in each pair. For example, currently the rules include the following details

  compareAndSwapIAcq -- does not emit dmb instructions
    no predicate
    cost (2 * VOLATILE_REF_COST)

  compareAndSwapI -- emits dmb instructions
    predicate(!needs_acquiring_load_exclusive(n))
    cost VOLATILE_REF_COST

The patch presumes that the first rule will have a lower (or at least no higher) cost than the second. So a correct version would be either, calling the predicate once:

A:
  compareAndSwapIAcq -- does not emit dmb instructions
    predicate(needs_acquiring_load_exclusive(n))
    cost VOLATILE_REF_COST

  compareAndSwapI -- emits dmb instructions
    no predicate
    cost (2 * VOLATILE_REF_COST)

or, calling the predicate twice:

B:
  compareAndSwapIAcq -- does not emit dmb instructions
    predicate(needs_acquiring_load_exclusive(n))
    cost VOLATILE_REF_COST

  compareAndSwapI -- emits dmb instructions
    predicate(!needs_acquiring_load_exclusive(n))
    cost (2 * VOLATILE_REF_COST)

I would prefer to retain A over B. A is guaranteed to give the same result as B with potentially less execution overhead. However, if you feel B is required I will adopt that format for all rules.

regards,


Andrew Dinn
-----------
Senior Principal Software Engineer
Red Hat UK Ltd
Registered in UK and Wales under Company Registration No.
3798903
Directors: Michael Cunningham (USA), Matt Parson (USA), Charlie Peters (USA), Michael O'Neill (Ireland)

From dawid.weiss at gmail.com  Tue Aug 25 09:59:17 2015
From: dawid.weiss at gmail.com (Dawid Weiss)
Date: Tue, 25 Aug 2015 11:59:17 +0200
Subject: NPE on tiered compilation only (seeking help to narrow it down)
Message-ID:

Hello,

Uwe Schindler (CC) has been trying to narrow down a Lucene bug which manifests itself on recent hotspot builds and results in odd NPEs. It looks very strange -- similar to what I reported a while ago (and thought to have been fixed by JDK-8060036).

The problem is that an NPE starts to appear at numerous places after the JVM warms up a bit during a test run. What is odd is that:

1. it happens on 32 bit and 64 bit JVMs (32 bit is easier to reproduce),
2. it does *not* happen with -client,
3. it does happen with -server,
4. it does *not* seem to happen with -server and -XX:-TieredCompilation....

I am a bit lost about (4) and (3) -- both these should end up in C2 being used (with -server using tiered compilation). Is there any way we can turn something on or off to narrow the scope there? What's going on during tiered compilation that can be causing this?

Dawid

From dawid.weiss at gmail.com  Tue Aug 25 10:05:38 2015
From: dawid.weiss at gmail.com (Dawid Weiss)
Date: Tue, 25 Aug 2015 12:05:38 +0200
Subject: NPE on tiered compilation only (seeking help to narrow it down)
In-Reply-To:
References:
Message-ID:

> 1. it happens on 32 bit and 64 bit JVMs (32 bit is easier to reproduce),

Correction: so far we can only reproduce it on the 32-bit build. Uwe has been trying to make it fail on 64 bit.
Dawid From martin.doerr at sap.com Tue Aug 25 10:34:30 2015 From: martin.doerr at sap.com (Doerr, Martin) Date: Tue, 25 Aug 2015 10:34:30 +0000 Subject: [9] RFR(S): 8075805: Crash while trying to release CompiledICHolder In-Reply-To: <55DC005B.9010109@oracle.com> References: <55D7376E.90103@oracle.com> <55D75FC4.1040208@oracle.com> <55DACEB9.5070107@oracle.com> <55DC005B.9010109@oracle.com> Message-ID: <7C9B87B351A4BA4AA9EC95BB418116566AD5F743@DEWDFEMB19A.global.corp.sap> Hi all, we appreciate that this code gets cleaned up. Iterating over all nmethods in gc_epilogue should fix the problem. Did anybody check the impact on the safepoint duration? We have also fixed this problem. We use the other approach and added following code to nmethod::make_not_entrant_or_zombie: if (state == zombie) { MutexLockerEx ml(SafepointSynchronize::is_at_safepoint() ? NULL : CompiledIC_lock); address low_boundary = verified_entry_point () + NativeJump::instruction_size; // See cleanup_inline_caches. RelocIterator iter(this, low_boundary); while (iter.next()) { if (iter.type() == relocInfo::virtual_call_type) { CompiledIC *ic = CompiledIC_at(&iter); ic->set_ic_destination_and_value(SharedRuntime::get_resolve_virtual_call_stub(), (Metadata*)NULL); } } } (Note: set_ic_destination_and_value is currently private.) As discussed in earlier emails, this also fixes the problem. An advantage is that this approach does the job in a concurrent phase without impacting the safepoint duration. Not sure which approach is the better one. Best regards, Martin -----Original Message----- From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Tobias Hartmann Sent: Dienstag, 25. August 2015 07:43 To: Igor Veresov Cc: Vladimir Kozlov; hotspot-compiler-dev at openjdk.java.net Subject: Re: [9] RFR(S): 8075805: Crash while trying to release CompiledICHolder On 24.08.2015 22:10, Igor Veresov wrote: > Seems good to me. Thanks, Igor. 
> Btw, did you find why there is a need for the "marked for reclamation" state? No, I couldn't find a reason yet. I did some testing without this state and didn't run into any obvious problems. I'll file a bug and further investigate. It would be nice if we could save this transition. Best, Tobias > > igor > >> On Aug 24, 2015, at 12:58 AM, Tobias Hartmann wrote: >> >> Thanks, Vladimir! Please see comments inline. >> >> On 21.08.2015 19:28, Vladimir Kozlov wrote: >>> During our discussion about this problem we thought that we may need an additional call to nm->cleanup_inline_caches() by the sweeper when we convert not_entrant to zombie, to prevent a zombie from pointing to other nmethods. You think we don't need it? >> >> I looked at all possible nmethod transitions and came to the conclusion that the problem is only possible if there is a direct transition from non-entrant to zombie without a sweeper cycle in-between. >> >> The solution we discussed, i.e., always cleaning ICs for the non-entrant -> zombie transition, would fix the problem as well but would be more invasive than the proposed solution because it affects all nmethods. >> >>> Please, clarify changes in states transition of unloaded nmethods - "The only impact is that unloaded nmethods that are already non-entrant and not on the stack need another iteration of the sweeper to become zombie." >>> I don't see how changing the make_marked_nmethods_zombies() call to make_marked_nmethods_not_entrant() affects unloaded nmethods. Both make_*() methods iterate only over is_alive() nmethods (iter.next_alive()), so they skip unloaded. >> >> Yes, my description is wrong. It should be "The only impact is that _marked_ nmethods that are already non-entrant and not on the stack need another iteration of the sweeper to become zombie." The change does not affect the state transitions of unloaded nmethods.
>> >> In other words, the change removes the shortcut that allowed nmethods that were marked for deoptimization to be converted to zombie at a safepoint if they were already non-entrant and not on the stack before. These nmethods now need an additional sweeper cycle to be converted to zombie. >> >>> What spacing you changed in compiledIC.cpp because webrev does not show them? >> >> I fixed the wrong indentation in CompiledIC::set_to_clean(). You can see it in the webrev if you click on "Patch": >> http://cr.openjdk.java.net/~thartmann/8075805/webrev.00/src/share/vm/code/compiledIC.cpp.patch >> >>> Please, fix comment in vm_operations.cpp >>> >>> // Make the dependent methods zombies >>> - CodeCache::make_marked_nmethods_zombies(); >>> + CodeCache::make_marked_nmethods_not_entrant(); >> >> Fixed. >> >> New webrev: >> http://cr.openjdk.java.net/~thartmann/8075805/webrev.01/ >> >> Thanks, >> Tobias >> >>> >>> Thanks, >>> Vladimir >>> >>> On 8/21/15 7:36 AM, Tobias Hartmann wrote: >>>> Hi, >>>> >>>> please review the following patch. >>>> >>>> https://bugs.openjdk.java.net/browse/JDK-8075805 >>>> http://cr.openjdk.java.net/~thartmann/8075805/webrev.00/ >>>> >>>> Problem: >>>> The VM crashes at a safepoint while trying to free a CompiledICHolder object that was enqueued for release after flushing a nmethod. The crash happens because the object is not a CompiledICHolder but Metadata which should not be removed. The problem is that at nmethod release, "CompiledIC::is_icholder_entry" is used to determine if the ICs of that nmethod still reference CompiledICHolder objects and if so, those objects are enqueued for release at the next safepoint. The method returns true if the destination of the IC is a C2I adapter, assuming that in this case the IC is in the to-interpreter state and the cached value must be a CompiledICHolder object. 
However, there are very rare cases where the IC is actually in the to-compiled state but the destination nmethod was already flushed and replaced by another allocation. Since the IC is still pointing to the same address in the code cache, the state of the IC is confused. >>>> >>>> Cleaning of inline caches that point to dead nmethods should prevent this. However, we do not clean ICs of nmethods that are converted to zombie themselves. Usually, that's okay because a zombie nmethod will be flushed before any dead nmethod it references. This is guaranteed because each nmethod goes through the states alive -> non-entrant -> zombie -> marked-for-reclamation before being flushed. >>>> >>>> Suppose we have two nmethods A and B, where A references B through an IC and B is always processed first by the sweeper. The following table shows the state transitions from top to bottom where lines marked with "S" show a transition in the corresponding iteration of the sweeper. >>>> >>>> state of A state of B >>>> ----------------------------------------- >>>> non-entrant non-entrant >>>> S [not on stack] [not on stack] >>>> S zombie zombie >>>> S marked marked >>>> S flushed flushed/re-allocated >>>> >>>> The IC of A will be cleaned in the first sweeper cycle because B is non-entrant so we don't need to clean ICs again if A is converted to zombie. 
>>>> >>>> Let's look at the following setting: >>>> >>>> state of A state of B >>>> ----------------------------------------- >>>> non-entrant >>>> S [not on stack] >>>> non-entrant >>>> S zombie [not on stack] >>>> zombie >>>> S marked marked >>>> S flushed flushed/re-allocated >>>> >>>> There are two problems here: >>>> - the IC of A is not cleaned because B is not yet non-entrant in the first iteration of the sweeper and afterwards A becomes zombie itself, >>>> - the transition from B to zombie happens outside the sweeper in 'CodeCache::make_marked_nmethods_zombies()' because the previous sweeper iteration already determined that the nmethod is not on the stack. >>>> >>>> The VM now crashes while flushing A because it still references B. Since B was replaced by an C2I adapter, we assume that A's IC is in the to-interpreter state and try to free a CompiledICHolder object which is actually Klass-Metadata for B. The detailed logs are below [1]. >>>> >>>> A similar problem occurs with nmethod unloading because unloaded nmethods transition directly to zombie: >>>> >>>> state of A state of B >>>> ----------------------------------------- >>>> unloaded unloaded >>>> S zombie zombie >>>> S marked marked >>>> S flushed flushed/re-allocated >>>> >>>> Again, we crash while flushing A. >>>> >>>> Solution: >>>> I removed the 'make_marked_nmethods_zombies()' and replaced it by calls to 'make_marked_nmethods_not_entrant()'. This avoids the non-entrant -> zombie transition outside of the sweeper. The only impact is that unloaded nmethods that are already non-entrant and not on the stack need another iteration of the sweeper to become zombie. I verified that this has no impact on performance. I also removed the code that was added by JDK-8059735 because now only the sweeper can set a nmethod to zombie. >>>> >>>> To fix the nmethod unloading case, I changed the implementation of CodeCache::gc_epilogue to clean ICs of unloaded nmethods as well. 
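As a rough illustration of the second scenario and of the proposed fix, here is a minimal C++ sketch. It is a toy model and not HotSpot code: `ToyNmethod`, `clean_ics`, `sweep_one` and `dangling_ic_at_flush` are hypothetical names, and the sweeper rules are simplified assumptions taken from the description above (ICs are cleaned only while an nmethod is alive or on its first non-entrant sweep, and B is always swept before A).

```cpp
// Toy model only; none of these names exist in HotSpot.
enum class State { Alive, NotEntrant, Zombie, Marked, Flushed };

struct ToyNmethod {
    State state = State::Alive;
    ToyNmethod* ic_target = nullptr;  // callee reached through an inline cache
    bool seen_not_on_stack = false;   // set by the first sweep that sees it non-entrant
};

// Drop the IC if its target is no longer alive.
static void clean_ics(ToyNmethod& nm) {
    if (nm.ic_target != nullptr && nm.ic_target->state != State::Alive) {
        nm.ic_target = nullptr;
    }
}

// One sweeper visit to a single nmethod (assumed never to be on a stack).
static void sweep_one(ToyNmethod& nm) {
    switch (nm.state) {
    case State::Alive:
        clean_ics(nm);
        break;
    case State::NotEntrant:
        if (nm.seen_not_on_stack) {
            nm.state = State::Zombie;  // ICs are NOT cleaned on this transition
        } else {
            clean_ics(nm);
            nm.seen_not_on_stack = true;
        }
        break;
    case State::Zombie:
        nm.state = State::Marked;
        break;
    case State::Marked:
        nm.state = State::Flushed;
        break;
    case State::Flushed:
        break;
    }
}

// Replays the second table: A is non-entrant first, B follows one sweep later.
// Returns true if A still has an IC pointing at an already flushed B when A
// itself is about to be flushed.
bool dangling_ic_at_flush(bool zombie_shortcut_at_safepoint) {
    ToyNmethod b, a;
    a.ic_target = &b;
    a.state = State::NotEntrant;
    ToyNmethod* order[] = { &b, &a };  // B is always processed first
    bool dangling = false;

    auto sweep = [&]() {
        for (ToyNmethod* nm : order) {
            if (nm->state == State::Marked && nm->ic_target != nullptr &&
                nm->ic_target->state == State::Flushed) {
                dangling = true;  // about to flush with a stale IC
            }
            sweep_one(*nm);
        }
    };

    sweep();                       // A: [not on stack]; IC kept because B is alive
    b.state = State::NotEntrant;   // B becomes non-entrant between sweeps
    sweep();                       // B: [not on stack]; A: zombie
    if (zombie_shortcut_at_safepoint) {
        b.state = State::Zombie;   // old behaviour: transition outside the sweeper
    }                              // new behaviour: B simply stays non-entrant
    sweep();
    sweep();
    return dangling;
}
```

In this model, `dangling_ic_at_flush(true)` reports the stale IC, while `dangling_ic_at_flush(false)` does not, because B then lags one sweeper cycle behind A and is still present when A is flushed.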
>>>> >>>> Testing: >>>> - Executed failing tests for a week (still running) >>>> - JPRT >>>> - Performance (SPECjbb2005, SPECjbb2013, SPECjvm2008), no differences >>>> >>>> Thanks, >>>> Tobias >>>> >>>> >>>> [1] Detailed logs for nmethod A (1178) and nmethod B (552): >>>> >>>> Inline cache at 0xffff80ffad89b017, calling 0xffff80ffad1063c0 cached_value 0x0000000000000000 changing destination to 0xffff80ffad66ae20 changing cached metadata to 0x0000000800034a18 >>>> IC at 0xffff80ffad89b017: monomorphic to compiled (rcvr klass) 'java/util/concurrent/ConcurrentHashMap': >>>> ### IC at 0xffff80ffad89b017: set to Nmethod 552/ >>>> ### Nmethod 1178/0xffff80ffad89ac50 (not entrant) being made zombie >>>> ### Nmethod 552/0xffff80ffad66ac10 (not entrant) being made zombie from make_marked_nmethods_zombies() >>>> ### Nmethod 552/0xffff80ffad66ac10 (zombie) being marked for reclamation >>>> ### Nmethod 1178/0xffff80ffad89ac50 (zombie) being marked for reclamation >>>> ### Nmethod 552/0xffff80ffad66ac10 (marked for reclamation) being flushed >>>> *flushing nmethod 552/0xffff80ffad66ac10. Live blobs:2325/Free CodeCache:235986Kb >>>> ### I2C/C2I adapter 0xffff80ffad66ac10 allocated >>>> ### Nmethod 1178/0xffff80ffad89ac50 (marked for reclamation) being flushed >>>> cleanup_call_site 0x0000000800034a18 to be freed, destination 0xffff80ffad66ae20 inline cache at 0xffff80ffad89b017 >>>> enqueueing icholder 0x0000000800034a18 to be freed >>>> *flushing nmethod 1178/0xffff80ffad89ac50. 
Live blobs:2346/Free CodeCache:235955Kb >>>> deleting icholder 0x0000000800034a18 >>>> ## nof_mallocs = 211209, nof_frees = 105760 >>>> ## memory stomp: >>>> GuardedMemory(0xffff80ff623ca180) base_addr=0x00000008000349f8 tag=0x0000000800034a18 user_size=18446604433140746280 user_data=0x0000000800034a18 >>>> Header guard @0x00000008000349f8 is BROKEN >>>> > From uschindler at apache.org Tue Aug 25 11:19:53 2015 From: uschindler at apache.org (Uwe Schindler) Date: Tue, 25 Aug 2015 13:19:53 +0200 Subject: NPE on tiered compilation only (seeking help to narrow it down) In-Reply-To: References: Message-ID: <022801d0df27$f94b61d0$ebe22570$@apache.org> Hi, thetaphi@serv1:~$ java -version java version "1.9.0-ea" Java(TM) SE Runtime Environment (build 1.9.0-ea-b78) Java HotSpot(TM) Server VM (build 1.9.0-ea-b78, mixed mode) thetaphi@serv1:~$ It also fails without tiered compilation. It just takes longer (maybe because the statistics are different). I was not able to make the 64 bit JVM fail; on 32 bit it's very easy if you run the full Lucene test suite and give the right command line options for the JVM: - It does not fail with 64 bits (I tried like 20 runs of the test suite, no success). - It does not fail with 32 bits and "-client" (I also tried 20 runs, no success). - It fails ASAP on 32 bits (in the first few Lucene tests with messages about random NPEs or failed assertions) with: "-server -Xbatch -XX:+TieredCompilation" - It also fails on 32 bits, but takes longer (like 60 tests of the suite ran): "-server -Xbatch -XX:+TieredCompilation" - Because it was not possible to reproduce it on another machine (running Windows), which does not have the AVX2 instruction set, I tried to disable AVX(2) on 32 bits, but it still fails tests: "-server -Xbatch -XX:UseAVX=0" / "-server -Xbatch -XX:-UseSuperWord" I will open a bug report with Rory and Balchandra giving the steps how to make the Lucene tests fail. We may help with trying stuff out, but we have no clue what's wrong.
Especially every test run fails at a different place with various NPEs or test assertions failed. The problems started around build 68. Uwe ----- Uwe Schindler uschindler at apache.org ASF Member, Apache Lucene PMC / Committer Bremen, Germany http://lucene.apache.org/ > -----Original Message----- > From: Dawid Weiss [mailto:dawid.weiss at gmail.com] > Sent: Tuesday, August 25, 2015 12:06 PM > To: hotspot compiler > Cc: Uwe Schindler; Balchandra Vaidya > Subject: Re: NPE on tiered compilation only (seeking help to narrow it down) > > > 1. it happens on 32 bit and 64 bit JVMs (32 bit is easier to > > reproduce), > > Correction: so far we can only reproduce it on the 32-bit build. Uwe has been > trying to fail on 64 bit. > > Dawid From uschindler at apache.org Tue Aug 25 11:27:30 2015 From: uschindler at apache.org (Uwe Schindler) Date: Tue, 25 Aug 2015 13:27:30 +0200 Subject: NPE on tiered compilation only (seeking help to narrow it down) In-Reply-To: <022801d0df27$f94b61d0$ebe22570$@apache.org> References: <022801d0df27$f94b61d0$ebe22570$@apache.org> Message-ID: <023801d0df29$09c54200$1d4fc600$@apache.org> Sorry small typo: > Hi, > > thetaphi at serv1:~$ java -version > java version "1.9.0-ea" > Java(TM) SE Runtime Environment (build 1.9.0-ea-b78) > Java HotSpot(TM) Server VM (build 1.9.0-ea-b78, mixed mode) > thetaphi at serv1:~$ > > I also fails without tiered compilation. It just takes longer (maybe because of > statistics are different). I was not able to make 64 bit JVM fail, on 32 bit it's > very easy if you run the full Lucene test suite and give the right command line > options for the JVM: > - It does not fail with 64 bits (I tried like 20 runs of test suite, no success). > - It does not fail with 32 bits and "-client" (I also tried 20 runs, no success). 
> - It fails ASAP on 32 bits (in the first few Lucene tests with messages about > random NPEs or failed assertions) with: "-server -Xbatch - > XX:+TieredCompilation" > - It also fails on 32 bits, but takes longer (like 60 tests of the suite ran): "- > server -Xbatch -XX:+TieredCompilation" Of course should be: - It also fails on 32 bits without tiered compilation, but takes longer (like 60 tests of the suite ran): "-server -Xbatch -XX:-TieredCompilation" > - Because it was not possible to reproducible on another machine (running > Windows), which does not have AVX2 instruction set, so I tried to disable > AVX(2) on 32 bits, but it still fails tests: "-server -Xbatch -XX:UseAVX=0" / "- > server -Xbatch -XX:-UseSuperWord" > > I will open a bug report with Rory and Balchandra giving the steps how to > make the Lucene tests fail. We may help with trying stuff out, but we have > no clue what's wrong. Especially every test run fails at a different place with > various NPEs or test assertions failed. > > The problems started around build 68. > > Uwe > > ----- > Uwe Schindler > uschindler at apache.org > ASF Member, Apache Lucene PMC / Committer > Bremen, Germany > http://lucene.apache.org/ > > > -----Original Message----- > > From: Dawid Weiss [mailto:dawid.weiss at gmail.com] > > Sent: Tuesday, August 25, 2015 12:06 PM > > To: hotspot compiler > > Cc: Uwe Schindler; Balchandra Vaidya > > Subject: Re: NPE on tiered compilation only (seeking help to narrow it > down) > > > > > 1. it happens on 32 bit and 64 bit JVMs (32 bit is easier to > > > reproduce), > > > > Correction: so far we can only reproduce it on the 32-bit build. Uwe has > been > > trying to fail on 64 bit. 
> > > > Dawid From adinn at redhat.com Tue Aug 25 12:50:43 2015 From: adinn at redhat.com (Andrew Dinn) Date: Tue, 25 Aug 2015 13:50:43 +0100 Subject: [aarch64-port-dev ] RFR: 8080293: AARCH64: Remove unnecessary dmbs from generated CAS code In-Reply-To: <55DC255F.7070903@redhat.com> References: <55DB2AC1.50208@redhat.com> <55DBB5BD.4040400@oracle.com> <55DC255F.7070903@redhat.com> Message-ID: <55DC64A3.9000907@redhat.com> A webrev in the light of Vladimir's feedback has been uploaded at the following URL http://cr.openjdk.java.net/~adinn/8080293/webrev.01/ With this version I have made the changes Vladimir requested except for the one detail I queried in my previous response. I have only called the needs_acquiring_load_exclusive predicate in one of each of the pairs of CompareAndSwapX rules. However, I have corrected the rule costs so that they actually reflect the expected cost of the alternative generated code segments (and adjusted the location+sense of the predicate expression accordingly). [Vladimir, if you really require the predicate to be called in both rules I will prepare another webrev] I still need another reviewer and an hs-comp committer to look at this. Could someone from the aarch64-port-dev group (or maybe Alexey Shipilev) please provide a second review and agree to sponsor the patch? regards, Andrew Dinn ----------- Senior Principal Software Engineer Red Hat UK Ltd Registered in UK and Wales under Company Registration No. 3798903 Directors: Michael Cunningham (USA), Matt Parson (USA), Charlie Peters (USA), Michael O'Neill (Ireland) On 25/08/15 09:20, Andrew Dinn wrote: > Hi Vladimir, > > Thank you very much for your review. > > On 25/08/15 01:24, Vladimir Kozlov wrote: >> I did not look deep on code logic. >> Few comments only: >> >> Use {} for all conditional code (it cause a lot of pain in the past): >> >> if (is_cas) >> return NULL; > > Ok, I'll correct all of those. 
> >> You don't need #ifndef PRODUCT: >> >> +#ifndef PRODUCT >> +#ifdef ASSERT > > Ah, that's good to know. Thanks. > >> New mach instructs missing predicate: >> >> predicate(needs_acquiring_load_exclusive(n)); >> >> You use higher ins_cost to avoid their generation when predicate is >> false. So why not explicit predicate? > > I had two separate reasons for not repeating the predicates: > > 1 They do quite a lot of work crawling the graph. So, calling the > predicate in the lower cost case and omitting it in the higher cost case > attempts to avoid the expense of executing it twice in some cases. > > 2 The ins_cost for the lower and higher cost cases is meant to reflect > a difference in the expected execution cost associated with the instruction. > > [n.b. I adopted the same strategy for the new membar generation rules > which were added as part of the volatile put optimization patch -- > compare membar_release and unnecessary_membar_release] > > However, looking again at the code I believe I have the costs (and hence > the predicates) attached to the wrong rules in each pair. For example, > currently the rules include the following details > > compareAndSwapIAcq -- does not emit dmb instructions > no predicate > cost (2 * VOLATILE_REF_COST ) > > compareAndSwapI -- emits dmb instructions > predicate(!needs_acquiring_load_exclusive(n)) > cost VOLATILE_REF_COST > > The patch presumes that the first rule will have a lower (or at least no > higher) cost than the second. 
So a correct version would be either, > calling the predicate once: > > A: > > compareAndSwapIAcq -- does not emit dmb instructions > predicate(needs_acquiring_load_exclusive(n)) > cost VOLATILE_REF_COST > > compareAndSwapI-- emits dmb instructions > no predicate > cost (2 * VOLATILE_REF_COST) > > or, calling the predicate twice: > > B: > > compareAndSwapIAcq -- does not emit dmb instructions > predicate(needs_acquiring_load_exclusive(n)) > cost VOLATILE_REF_COST > > compareAndSwapI-- emits dmb instructions > predicate(!needs_acquiring_load_exclusive(n)) > cost (2 * VOLATILE_REF_COST) > > I would prefer to retain A over B. A is guaranteed to give the same > result as B with potentially less execution overhead. However, if you > feel B is required I will adopt that format for all rules. > > regards, > > > Andrew Dinn > ----------- > Senior Principal Software Engineer > Red Hat UK Ltd > Registered in UK and Wales under Company Registration No. 3798903 > Directors: Michael Cunningham (USA), Matt Parson (USA), Charlie Peters > (USA), Michael O'Neill (Ireland) > > From tobias.hartmann at oracle.com Tue Aug 25 12:52:19 2015 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 25 Aug 2015 14:52:19 +0200 Subject: [9] RFR(S): 8075805: Crash while trying to release CompiledICHolder In-Reply-To: <7C9B87B351A4BA4AA9EC95BB418116566AD5F743@DEWDFEMB19A.global.corp.sap> References: <55D7376E.90103@oracle.com> <55D75FC4.1040208@oracle.com> <55DACEB9.5070107@oracle.com> <55DC005B.9010109@oracle.com> <7C9B87B351A4BA4AA9EC95BB418116566AD5F743@DEWDFEMB19A.global.corp.sap> Message-ID: <55DC6503.20301@oracle.com> Hi Martin, thanks for looking at this! It's interesting that you encountered and fixed the same problem. Were you able to create a regression test? I did not evaluate the impact on the safepoint duration but you are right that it may be affected. 
I talked to Mikael Gerdin from the GC team and he told me that they had issues before because the scanning of nmethods and their relocations took a significant amount of time. I would therefore like to go with the alternative solution in the case of unloaded nmethods, i.e., cleaning their ICs in the sweeper. Unfortunately, I already pushed the fix so here is the incremental webrev: http://cr.openjdk.java.net/~thartmann/8075805/webrev.02/ I left the fix for the nmethods that are marked-for-deoptimization as it is, because it avoids cleaning the IC's of all zombie nmethods and simplifies the possible state transitions. If you guys are fine with the change, I open a new bug/enhancement and send out a separate RFR. Thanks, Tobias On 25.08.2015 12:34, Doerr, Martin wrote: > Hi all, > > we appreciate that this code gets cleaned up. > > Iterating over all nmethods in gc_epilogue should fix the problem. Did anybody check the impact on the safepoint duration? > We have also fixed this problem. We use the other approach and added following code to nmethod::make_not_entrant_or_zombie: > > if (state == zombie) { > MutexLockerEx ml(SafepointSynchronize::is_at_safepoint() ? NULL : CompiledIC_lock); > address low_boundary = verified_entry_point () + NativeJump::instruction_size; // See cleanup_inline_caches. > RelocIterator iter(this, low_boundary); > while (iter.next()) { > if (iter.type() == relocInfo::virtual_call_type) { > CompiledIC *ic = CompiledIC_at(&iter); > ic->set_ic_destination_and_value(SharedRuntime::get_resolve_virtual_call_stub(), (Metadata*)NULL); > } > } > } > > (Note: set_ic_destination_and_value is currently private.) > > As discussed in earlier emails, this also fixes the problem. An advantage is that this approach does the job in a concurrent phase without impacting the safepoint duration. > Not sure which approach is the better one. 
> > Best regards, > Martin > > > -----Original Message----- > From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Tobias Hartmann > Sent: Dienstag, 25. August 2015 07:43 > To: Igor Veresov > Cc: Vladimir Kozlov; hotspot-compiler-dev at openjdk.java.net > Subject: Re: [9] RFR(S): 8075805: Crash while trying to release CompiledICHolder > > On 24.08.2015 22:10, Igor Veresov wrote: >> Seems good to me. > > Thanks, Igor. > >> Btw, did you find why there is a need for the "marked for reclamation" state? > > No, I couldn't find a reason yet. I did some testing without this state and didn't run into any obvious problems. I'll file a bug and further investigate. It would be nice if we could save this transition. > > Best, > Tobias > >> >> igor >> >>> On Aug 24, 2015, at 12:58 AM, Tobias Hartmann wrote: >>> >>> Thanks, Vladimir! Please see comments inline. >>> >>> On 21.08.2015 19:28, Vladimir Kozlov wrote: >>>> During our discussion about this problem we thought that we may need an additional call to nm->cleanup_inline_caches() by the sweeper when we convert not_entrant to zombie, to prevent a zombie from pointing to other nmethods. You think we don't need it? >>> >>> I looked at all possible nmethod transitions and came to the conclusion that the problem is only possible if there is a direct transition from non-entrant to zombie without a sweeper cycle in-between. >>> >>> The solution we discussed, i.e., always cleaning ICs for the non-entrant -> zombie transition, would fix the problem as well but would be more invasive than the proposed solution because it affects all nmethods. >>> >>>> Please, clarify changes in states transition of unloaded nmethods - "The only impact is that unloaded nmethods that are already non-entrant and not on the stack need another iteration of the sweeper to become zombie." >>>> I don't see how changing the make_marked_nmethods_zombies() call to make_marked_nmethods_not_entrant() affects unloaded nmethods.
Both make_*() methods iterates only over is_alive() nmethods (iter.next_alive()), so they skip unloaded. >>> >>> Yes, my description is wrong. It should be "The only impact is that _marked_ nmethods that are already non-entrant and not on the stack need another iteration of the sweeper to become zombie." The change does not affect the state transitions of unloaded nmethods. >>> >>> In other words, the change removes the shortcut that allowed nmethods that were marked for deoptimization to be converted to zombie at a safepoint if they were already non-entrant and not on the stack before. These nmethods now need an additional sweeper cycle to be converted to zombie. >>> >>>> What spacing you changed in compiledIC.cpp because webrev does not show them? >>> >>> I fixed the wrong indentation in CompiledIC::set_to_clean(). You can see it in the webrev if you click on "Patch": >>> http://cr.openjdk.java.net/~thartmann/8075805/webrev.00/src/share/vm/code/compiledIC.cpp.patch >>> >>>> Please, fix comment in vm_operations.cpp >>>> >>>> // Make the dependent methods zombies >>>> - CodeCache::make_marked_nmethods_zombies(); >>>> + CodeCache::make_marked_nmethods_not_entrant(); >>> >>> Fixed. >>> >>> New webrev: >>> http://cr.openjdk.java.net/~thartmann/8075805/webrev.01/ >>> >>> Thanks, >>> Tobias >>> >>>> >>>> Thanks, >>>> Vladimir >>>> >>>> On 8/21/15 7:36 AM, Tobias Hartmann wrote: >>>>> Hi, >>>>> >>>>> please review the following patch. >>>>> >>>>> https://bugs.openjdk.java.net/browse/JDK-8075805 >>>>> http://cr.openjdk.java.net/~thartmann/8075805/webrev.00/ >>>>> >>>>> Problem: >>>>> The VM crashes at a safepoint while trying to free a CompiledICHolder object that was enqueued for release after flushing a nmethod. The crash happens because the object is not a CompiledICHolder but Metadata which should not be removed. 
The problem is that at nmethod release, "CompiledIC::is_icholder_entry" is used to determine if the ICs of that nmethod still reference CompiledICHolder objects and if so, those objects are enqueued for release at the next safepoint. The method returns true if the destination of the IC is a C2I adapter, assuming that in this case the IC is in the to-interpreter state and the cached value must be a CompiledICHolder object. However, there are very rare cases where the IC is actually in the to-compiled state but the destination nmethod was already flushed and replaced by another allocation. Since the IC is still pointing to the same address in the code cache, the state of the IC is confused. >>>>> >>>>> Cleaning of inline caches that point to dead nmethods should prevent this. However, we do not clean ICs of nmethods that are converted to zombie themselves. Usually, that's okay because a zombie nmethod will be flushed before any dead nmethod it references. This is guaranteed because each nmethod goes through the states alive -> non-entrant -> zombie -> marked-for-reclamation before being flushed. >>>>> >>>>> Suppose we have two nmethods A and B, where A references B through an IC and B is always processed first by the sweeper. The following table shows the state transitions from top to bottom where lines marked with "S" show a transition in the corresponding iteration of the sweeper. >>>>> >>>>> state of A state of B >>>>> ----------------------------------------- >>>>> non-entrant non-entrant >>>>> S [not on stack] [not on stack] >>>>> S zombie zombie >>>>> S marked marked >>>>> S flushed flushed/re-allocated >>>>> >>>>> The IC of A will be cleaned in the first sweeper cycle because B is non-entrant so we don't need to clean ICs again if A is converted to zombie. 
>>>>> >>>>> Let's look at the following setting: >>>>> >>>>> state of A state of B >>>>> ----------------------------------------- >>>>> non-entrant >>>>> S [not on stack] >>>>> non-entrant >>>>> S zombie [not on stack] >>>>> zombie >>>>> S marked marked >>>>> S flushed flushed/re-allocated >>>>> >>>>> There are two problems here: >>>>> - the IC of A is not cleaned because B is not yet non-entrant in the first iteration of the sweeper and afterwards A becomes zombie itself, >>>>> - the transition from B to zombie happens outside the sweeper in 'CodeCache::make_marked_nmethods_zombies()' because the previous sweeper iteration already determined that the nmethod is not on the stack. >>>>> >>>>> The VM now crashes while flushing A because it still references B. Since B was replaced by an C2I adapter, we assume that A's IC is in the to-interpreter state and try to free a CompiledICHolder object which is actually Klass-Metadata for B. The detailed logs are below [1]. >>>>> >>>>> A similar problem occurs with nmethod unloading because unloaded nmethods transition directly to zombie: >>>>> >>>>> state of A state of B >>>>> ----------------------------------------- >>>>> unloaded unloaded >>>>> S zombie zombie >>>>> S marked marked >>>>> S flushed flushed/re-allocated >>>>> >>>>> Again, we crash while flushing A. >>>>> >>>>> Solution: >>>>> I removed the 'make_marked_nmethods_zombies()' and replaced it by calls to 'make_marked_nmethods_not_entrant()'. This avoids the non-entrant -> zombie transition outside of the sweeper. The only impact is that unloaded nmethods that are already non-entrant and not on the stack need another iteration of the sweeper to become zombie. I verified that this has no impact on performance. I also removed the code that was added by JDK-8059735 because now only the sweeper can set a nmethod to zombie. >>>>> >>>>> To fix the nmethod unloading case, I changed the implementation of CodeCache::gc_epilogue to clean ICs of unloaded nmethods as well. 
>>>>> >>>>> Testing: >>>>> - Executed failing tests for a week (still running) >>>>> - JPRT >>>>> - Performance (SPECjbb2005, SPECjbb2013, SPECjvm2008), no differences >>>>> >>>>> Thanks, >>>>> Tobias >>>>> >>>>> >>>>> [1] Detailed logs for nmethod A (1178) and nmethod B (552): >>>>> >>>>> Inline cache at 0xffff80ffad89b017, calling 0xffff80ffad1063c0 cached_value 0x0000000000000000 changing destination to 0xffff80ffad66ae20 changing cached metadata to 0x0000000800034a18 >>>>> IC at 0xffff80ffad89b017: monomorphic to compiled (rcvr klass) 'java/util/concurrent/ConcurrentHashMap': >>>>> ### IC at 0xffff80ffad89b017: set to Nmethod 552/ >>>>> ### Nmethod 1178/0xffff80ffad89ac50 (not entrant) being made zombie >>>>> ### Nmethod 552/0xffff80ffad66ac10 (not entrant) being made zombie from make_marked_nmethods_zombies() >>>>> ### Nmethod 552/0xffff80ffad66ac10 (zombie) being marked for reclamation >>>>> ### Nmethod 1178/0xffff80ffad89ac50 (zombie) being marked for reclamation >>>>> ### Nmethod 552/0xffff80ffad66ac10 (marked for reclamation) being flushed >>>>> *flushing nmethod 552/0xffff80ffad66ac10. Live blobs:2325/Free CodeCache:235986Kb >>>>> ### I2C/C2I adapter 0xffff80ffad66ac10 allocated >>>>> ### Nmethod 1178/0xffff80ffad89ac50 (marked for reclamation) being flushed >>>>> cleanup_call_site 0x0000000800034a18 to be freed, destination 0xffff80ffad66ae20 inline cache at 0xffff80ffad89b017 >>>>> enqueueing icholder 0x0000000800034a18 to be freed >>>>> *flushing nmethod 1178/0xffff80ffad89ac50. 
Live blobs:2346/Free CodeCache:235955Kb
>>>>> deleting icholder 0x0000000800034a18
>>>>> ## nof_mallocs = 211209, nof_frees = 105760
>>>>> ## memory stomp:
>>>>> GuardedMemory(0xffff80ff623ca180) base_addr=0x00000008000349f8 tag=0x0000000800034a18 user_size=18446604433140746280 user_data=0x0000000800034a18
>>>>> Header guard @0x00000008000349f8 is BROKEN
>>>>>
>>

From tobias.hartmann at oracle.com Tue Aug 25 14:34:09 2015
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Tue, 25 Aug 2015 16:34:09 +0200
Subject: [9] RFR(S): 8075805: Crash while trying to release CompiledICHolder
In-Reply-To: <55DC6503.20301@oracle.com>
References: <55D7376E.90103@oracle.com> <55D75FC4.1040208@oracle.com> <55DACEB9.5070107@oracle.com> <55DC005B.9010109@oracle.com> <7C9B87B351A4BA4AA9EC95BB418116566AD5F743@DEWDFEMB19A.global.corp.sap> <55DC6503.20301@oracle.com>
Message-ID: <55DC7CE1.90805@oracle.com>

I missed that we have to be careful when cleaning ICs of a zombie nmethod to not create transition stubs because backpatching of those is unnecessary and fails with "unsafe access to zombie method".

I changed the code to not create a transition stub if the corresponding nmethod is dead and updated the webrev in place.

Best,
Tobias

From martin.doerr at sap.com Tue Aug 25 14:53:27 2015
From: martin.doerr at sap.com (Doerr, Martin)
Date: Tue, 25 Aug 2015 14:53:27 +0000
Subject: [9] RFR(S): 8075805: Crash while trying to release CompiledICHolder
In-Reply-To: <55DC6503.20301@oracle.com>
References: <55D7376E.90103@oracle.com> <55D75FC4.1040208@oracle.com> <55DACEB9.5070107@oracle.com> <55DC005B.9010109@oracle.com> <7C9B87B351A4BA4AA9EC95BB418116566AD5F743@DEWDFEMB19A.global.corp.sap> <55DC6503.20301@oracle.com>
Message-ID: <7C9B87B351A4BA4AA9EC95BB418116566AD5F7D6@DEWDFEMB19A.global.corp.sap>

Hi Tobias,

thanks for your quick response.

Unfortunately, we neither have a good reproduction case nor a regression test which is actually the reason why we did not post this earlier. We had observed very sporadic assertions or freeing of unallocated memory.

Basically, I believe that cleaning the inline caches before transitioning from unloaded to zombie is the right thing. However, there's still the problem that it's hard to test.

Additionally, it may be required to adapt a couple of assertions. We modified the following ASSERT code (based on hotspot 25)
1. in CompiledIC::is_call_to_compiled(): Accessing cached_metadata() may be unsafe if !caller->is_alive().
2. in CompiledIC::is_call_to_interpreted(): CodeBlob* db may be gone if cb is unloaded.
3. in CompiledIC::verify(): is_call_to_compiled() has crashed. Seems to be unsafe in megamorphic case so we changed the order of the checks.
4. in nmethod::verify_clean_inline_caches(): In case of relocInfo::virtual_call_type CompiledIC may still point to a zombie method.
For details, please see below.

Best regards,
Martin


1.
  assert( is_c1_method ||
         !is_monomorphic ||
         is_optimized() ||
         !caller->is_alive() ||
         (cached_metadata() != NULL && cached_metadata()->is_klass()), "sanity check");

2.
#ifdef ASSERT
  {
    CodeBlob* db = CodeCache::find_blob_unsafe(dest);
    if (!db) {
      nmethod *nm = cb->as_nmethod_or_null();
      assert(nm, "sanity");
      if ( nm->is_in_use() ||
          (nm->is_not_entrant() && (!SafepointSynchronize::is_at_safepoint() || !nm->is_marked_for_deoptimization()))
         ) {
        { // Dump some information.
          ttyLocker ttyl;
          tty->print_cr("ERROR: Did not find codeblob for destination %p", dest);
          nm->print(tty);
          Method *m = nm->method();
          if (m) { m->print_on(tty); }
        }
        assert(false, err_msg("nmethod is in state %d but destination blob is gone", (int)(nm->state())));
      }
    } else
      assert(!db->is_adapter_blob(), "must use stub!");
  }
#endif /* ASSERT */

3.
  assert(is_clean() || is_optimized() || is_megamorphic() || is_call_to_compiled() || is_call_to_interpreted() , "sanity check");

4.
      case relocInfo::virtual_call_type: {
        CompiledIC *ic = CompiledIC_at(&iter);
        // Ok, to lookup references to zombies here
        CodeBlob *cb = CodeCache::find_blob_unsafe(ic->ic_destination());
        if( cb != NULL && cb->is_nmethod() ) {
          nmethod* nm = (nmethod*)cb;
          // Verify that inline caches pointing to not_entrant methods are clean
          if (nm->is_not_entrant()) {
            assert(ic->is_clean(), "IC should be clean");
          }
        }
        break;
      }
      case relocInfo::opt_virtual_call_type: {

From martin.doerr at sap.com Tue Aug 25 15:01:55 2015
From: martin.doerr at sap.com (Doerr, Martin)
Date: Tue, 25 Aug 2015 15:01:55 +0000
Subject: [9] RFR(S): 8075805: Crash while trying to release CompiledICHolder
In-Reply-To: <55DC7CE1.90805@oracle.com>
References: <55D7376E.90103@oracle.com> <55D75FC4.1040208@oracle.com> <55DACEB9.5070107@oracle.com> <55DC005B.9010109@oracle.com> <7C9B87B351A4BA4AA9EC95BB418116566AD5F743@DEWDFEMB19A.global.corp.sap> <55DC6503.20301@oracle.com> <55DC7CE1.90805@oracle.com>
Message-ID: <7C9B87B351A4BA4AA9EC95BB418116566AD5F7EB@DEWDFEMB19A.global.corp.sap>

Good catch. That was actually the reason why we use the loop below instead of calling cleanup_inline_caches().

Pasting my previous mail again so we have everything in one thread.

Hi Tobias,

thanks for your quick response.

Unfortunately, we neither have a good reproduction case nor a regression test which is actually the reason why we did not post this earlier. We had observed very sporadic assertions or freeing of unallocated memory.

Basically, I believe that cleaning the inline caches before transitioning from unloaded to zombie is the right thing. However, there's still the problem that it's hard to test.

Additionally, it may be required to adapt a couple of assertions. We modified the following ASSERT code (based on hotspot 25)
1. in CompiledIC::is_call_to_compiled(): Accessing cached_metadata() may be unsafe if !caller->is_alive().
2. in CompiledIC::is_call_to_interpreted(): CodeBlob* db may be gone if cb is unloaded.
3. in CompiledIC::verify(): is_call_to_compiled() has crashed.
Seems to be unsafe in megamorphic case so we changed the order of the checks.
4. in nmethod::verify_clean_inline_caches(): In case of relocInfo::virtual_call_type CompiledIC may still point to a zombie method.

For details, please see below.

Best regards,
Martin


1.
  assert( is_c1_method ||
         !is_monomorphic ||
         is_optimized() ||
         !caller->is_alive() ||
         (cached_metadata() != NULL && cached_metadata()->is_klass()), "sanity check");

2.
#ifdef ASSERT
  {
    CodeBlob* db = CodeCache::find_blob_unsafe(dest);
    if (!db) {
      nmethod *nm = cb->as_nmethod_or_null();
      assert(nm, "sanity");
      if ( nm->is_in_use() ||
          (nm->is_not_entrant() && (!SafepointSynchronize::is_at_safepoint() || !nm->is_marked_for_deoptimization()))
         ) {
        { // Dump some information.
          ttyLocker ttyl;
          tty->print_cr("ERROR: Did not find codeblob for destination %p", dest);
          nm->print(tty);
          Method *m = nm->method();
          if (m) { m->print_on(tty); }
        }
        assert(false, err_msg("nmethod is in state %d but destination blob is gone", (int)(nm->state())));
      }
    } else
      assert(!db->is_adapter_blob(), "must use stub!");
  }
#endif /* ASSERT */

3.
  assert(is_clean() || is_optimized() || is_megamorphic() || is_call_to_compiled() || is_call_to_interpreted() , "sanity check");

4.
      case relocInfo::virtual_call_type: {
        CompiledIC *ic = CompiledIC_at(&iter);
        // Ok, to lookup references to zombies here
        CodeBlob *cb = CodeCache::find_blob_unsafe(ic->ic_destination());
        if( cb != NULL && cb->is_nmethod() ) {
          nmethod* nm = (nmethod*)cb;
          // Verify that inline caches pointing to not_entrant methods are clean
          if (nm->is_not_entrant()) {
            assert(ic->is_clean(), "IC should be clean");
          }
        }
        break;
      }
      case relocInfo::opt_virtual_call_type: {

-----Original Message-----
From: Tobias Hartmann [mailto:tobias.hartmann at oracle.com]
Sent: Dienstag, 25.
August 2015 16:34
To: Doerr, Martin; Igor Veresov
Cc: Vladimir Kozlov; hotspot-compiler-dev at openjdk.java.net
Subject: Re: [9] RFR(S): 8075805: Crash while trying to release CompiledICHolder

I missed that we have to be careful when cleaning ICs of a zombie nmethod to not create transition stubs because backpatching of those is unnecessary and fails with "unsafe access to zombie method".

I changed the code to not create a transition stub if the corresponding nmethod is dead and updated the webrev in place.

Best,
Tobias

On 25.08.2015 14:52, Tobias Hartmann wrote:
> Hi Martin,
>
> thanks for looking at this!
>
> It's interesting that you encountered and fixed the same problem. Were you able to create a regression test?
>
> I did not evaluate the impact on the safepoint duration but you are right that it may be affected. I talked to Mikael Gerdin from the GC team and he told me that they had issues before because the scanning of nmethods and their relocations took a significant amount of time.
>
> I would therefore like to go with the alternative solution in the case of unloaded nmethods, i.e., cleaning their ICs in the sweeper. Unfortunately, I already pushed the fix so here is the incremental webrev:
> http://cr.openjdk.java.net/~thartmann/8075805/webrev.02/
>
> I left the fix for the nmethods that are marked-for-deoptimization as it is, because it avoids cleaning the IC's of all zombie nmethods and simplifies the possible state transitions.
>
> If you guys are fine with the change, I open a new bug/enhancement and send out a separate RFR.
>
> Thanks,
> Tobias
>
> On 25.08.2015 12:34, Doerr, Martin wrote:
>> Hi all,
>>
>> we appreciate that this code gets cleaned up.
>>
>> Iterating over all nmethods in gc_epilogue should fix the problem. Did anybody check the impact on the safepoint duration?
>> We have also fixed this problem.
We use the other approach and added the following code to nmethod::make_not_entrant_or_zombie: >> >> if (state == zombie) { >> MutexLockerEx ml(SafepointSynchronize::is_at_safepoint() ? NULL : CompiledIC_lock); >> address low_boundary = verified_entry_point () + NativeJump::instruction_size; // See cleanup_inline_caches. >> RelocIterator iter(this, low_boundary); >> while (iter.next()) { >> if (iter.type() == relocInfo::virtual_call_type) { >> CompiledIC *ic = CompiledIC_at(&iter); >> ic->set_ic_destination_and_value(SharedRuntime::get_resolve_virtual_call_stub(), (Metadata*)NULL); >> } >> } >> } >> >> (Note: set_ic_destination_and_value is currently private.) >> >> As discussed in earlier emails, this also fixes the problem. An advantage is that this approach does the job in a concurrent phase without impacting the safepoint duration. >> Not sure which approach is the better one. >> >> Best regards, >> Martin >> >> >> -----Original Message----- >> From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Tobias Hartmann >> Sent: Dienstag, 25. August 2015 07:43 >> To: Igor Veresov >> Cc: Vladimir Kozlov; hotspot-compiler-dev at openjdk.java.net >> Subject: Re: [9] RFR(S): 8075805: Crash while trying to release CompiledICHolder >> >> On 24.08.2015 22:10, Igor Veresov wrote: >>> Seems good to me. >> >> Thanks, Igor. >> >>> Btw, did you find why there is a need for "marked for reclamation" state? >> >> No, I couldn't find a reason yet. I did some testing without this state and didn't run into any obvious problems. I'll file a bug and further investigate. It would be nice if we could save this transition. >> >> Best, >> Tobias >> >>> >>> igor >>> >>>> On Aug 24, 2015, at 12:58 AM, Tobias Hartmann wrote: >>>> >>>> Thanks, Vladimir! Please see comments inline.
>>>> >>>> On 21.08.2015 19:28, Vladimir Kozlov wrote: >>>>> During our discussion about this problem we thought that we may need additional call nm->cleanup_inline_caches() by sweeper when we convert not_entrant to zombie to prevent zombie pointing to an other nmethods. You think we don't need it? >>>> >>>> I looked at all possible nmethod transitions and came to the conclusion that the problem is only possible if there is a direct transition from non-entrant to zombie without a sweeper cylce in-between. >>>> >>>> The solution we discussed, i.e., always cleaning ICs for the non-entrant -> zombie transition, would fix the problem as well but would be more invasive than the proposed solution because it affects all nmethods. >>>> >>>>> Please, clarify changes in states transition of unloaded nmethods - "The only impact is that unloaded nmethods that are already non-entrant and not on the stack need another iteration of the sweeper to become zombie." >>>>> I don't see how changing make_marked_nmethods_zombies() call to make_marked_nmethods_not_entrant() affects unloaded nmethods. Both make_*() methods iterates only over is_alive() nmethods (iter.next_alive()), so they skip unloaded. >>>> >>>> Yes, my description is wrong. It should be "The only impact is that _marked_ nmethods that are already non-entrant and not on the stack need another iteration of the sweeper to become zombie." The change does not affect the state transitions of unloaded nmethods. >>>> >>>> In other words, the change removes the shortcut that allowed nmethods that were marked for deoptimization to be converted to zombie at a safepoint if they were already non-entrant and not on the stack before. These nmethods now need an additional sweeper cycle to be converted to zombie. >>>> >>>>> What spacing you changed in compiledIC.cpp because webrev does not show them? >>>> >>>> I fixed the wrong indentation in CompiledIC::set_to_clean(). 
You can see it in the webrev if you click on "Patch": >>>> http://cr.openjdk.java.net/~thartmann/8075805/webrev.00/src/share/vm/code/compiledIC.cpp.patch >>>> >>>>> Please, fix comment in vm_operations.cpp >>>>> >>>>> // Make the dependent methods zombies >>>>> - CodeCache::make_marked_nmethods_zombies(); >>>>> + CodeCache::make_marked_nmethods_not_entrant(); >>>> >>>> Fixed. >>>> >>>> New webrev: >>>> http://cr.openjdk.java.net/~thartmann/8075805/webrev.01/ >>>> >>>> Thanks, >>>> Tobias >>>> >>>>> >>>>> Thanks, >>>>> Vladimir >>>>> >>>>> On 8/21/15 7:36 AM, Tobias Hartmann wrote: >>>>>> Hi, >>>>>> >>>>>> please review the following patch. >>>>>> >>>>>> https://bugs.openjdk.java.net/browse/JDK-8075805 >>>>>> http://cr.openjdk.java.net/~thartmann/8075805/webrev.00/ >>>>>> >>>>>> Problem: >>>>>> The VM crashes at a safepoint while trying to free a CompiledICHolder object that was enqueued for release after flushing a nmethod. The crash happens because the object is not a CompiledICHolder but Metadata which should not be removed. The problem is that at nmethod release, "CompiledIC::is_icholder_entry" is used to determine if the ICs of that nmethod still reference CompiledICHolder objects and if so, those objects are enqueued for release at the next safepoint. The method returns true if the destination of the IC is a C2I adapter, assuming that in this case the IC is in the to-interpreter state and the cached value must be a CompiledICHolder object. However, there are very rare cases where the IC is actually in the to-compiled state but the destination nmethod was already flushed and replaced by another allocation. Since the IC is still pointing to the same address in the code cache, the state of the IC is confused. >>>>>> >>>>>> Cleaning of inline caches that point to dead nmethods should prevent this. However, we do not clean ICs of nmethods that are converted to zombie themselves. 
Usually, that's okay because a zombie nmethod will be flushed before any dead nmethod it references. This is guaranteed because each nmethod goes through the states alive -> non-entrant -> zombie -> marked-for-reclamation before being flushed.
>>>>>>
>>>>>> Suppose we have two nmethods A and B, where A references B through an IC and B is always processed first by the sweeper. The following table shows the state transitions from top to bottom where lines marked with "S" show a transition in the corresponding iteration of the sweeper.
>>>>>>
>>>>>>      state of A         state of B
>>>>>> -----------------------------------------
>>>>>>      non-entrant        non-entrant
>>>>>> S    [not on stack]     [not on stack]
>>>>>> S    zombie             zombie
>>>>>> S    marked             marked
>>>>>> S    flushed            flushed/re-allocated
>>>>>>
>>>>>> The IC of A will be cleaned in the first sweeper cycle because B is non-entrant so we don't need to clean ICs again if A is converted to zombie.
>>>>>>
>>>>>> Let's look at the following setting:
>>>>>>
>>>>>>      state of A         state of B
>>>>>> -----------------------------------------
>>>>>>      non-entrant
>>>>>> S    [not on stack]
>>>>>>                         non-entrant
>>>>>> S    zombie             [not on stack]
>>>>>>                         zombie
>>>>>> S    marked             marked
>>>>>> S    flushed            flushed/re-allocated
>>>>>>
>>>>>> There are two problems here:
>>>>>> - the IC of A is not cleaned because B is not yet non-entrant in the first iteration of the sweeper and afterwards A becomes zombie itself,
>>>>>> - the transition from B to zombie happens outside the sweeper in 'CodeCache::make_marked_nmethods_zombies()' because the previous sweeper iteration already determined that the nmethod is not on the stack.
>>>>>>
>>>>>> The VM now crashes while flushing A because it still references B. Since B was replaced by a C2I adapter, we assume that A's IC is in the to-interpreter state and try to free a CompiledICHolder object which is actually Klass-Metadata for B. The detailed logs are below [1].
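To make the failure mode described above more concrete, here is a toy C++ model of it. This is an illustration only, not HotSpot code: ToyCodeCache, ToyInlineCache, looks_like_icholder_entry and flush_a_frees_wrong_object are hypothetical names, and the model assumes the adapter is allocated at exactly the address the IC points to (in the logs the IC points at an entry point inside the re-used blob). It only shows why a "destination is a C2I adapter" check can misclassify a stale to-compiled IC, and why cleaning A's IC when A becomes zombie (the kind of cleaning quoted earlier in this thread) would avoid the misclassification in this simplified setting.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Toy model only: none of these types or functions exist in HotSpot.
enum class BlobKind { Nmethod, C2IAdapter };
enum class CachedValue { None, KlassMetadata, ICHolder };

// Maps a code cache address to whatever is currently allocated there.
struct ToyCodeCache {
    std::map<std::uintptr_t, BlobKind> blobs;
    void allocate(std::uintptr_t addr, BlobKind kind) { blobs[addr] = kind; }
    void flush(std::uintptr_t addr) { blobs.erase(addr); }
};

// An inline cache: a call destination plus a cached value.
struct ToyInlineCache {
    std::uintptr_t destination = 0;
    CachedValue cached = CachedValue::None;

    // Monomorphic call to compiled code: the cached value is klass metadata.
    void set_to_compiled(std::uintptr_t callee) {
        destination = callee;
        cached = CachedValue::KlassMetadata;
    }
    void clean() {
        destination = 0;
        cached = CachedValue::None;
    }
};

// Stand-in for the check described in the mail: if the destination is a
// C2I adapter, assume the cached value is an icholder.
bool looks_like_icholder_entry(const ToyCodeCache& cache, const ToyInlineCache& ic) {
    auto it = cache.blobs.find(ic.destination);
    return it != cache.blobs.end() && it->second == BlobKind::C2IAdapter;
}

// Replays the problematic sequence. Returns true if flushing A would
// enqueue something for release that is not actually an icholder.
bool flush_a_frees_wrong_object(bool clean_ic_when_a_becomes_zombie) {
    ToyCodeCache cache;
    const std::uintptr_t addr_b = 0x1000;  // arbitrary toy address
    cache.allocate(addr_b, BlobKind::Nmethod);

    ToyInlineCache ic_of_a;
    ic_of_a.set_to_compiled(addr_b);  // A calls B through this IC

    // A turns zombie before B is non-entrant, so the sweeper never cleaned
    // this IC. Optionally clean it at this point instead.
    if (clean_ic_when_a_becomes_zombie) {
        ic_of_a.clean();
    }

    // B is flushed first and its address is re-used by an adapter.
    cache.flush(addr_b);
    cache.allocate(addr_b, BlobKind::C2IAdapter);

    // A is flushed: decide whether its IC holds an icholder to release.
    bool treated_as_icholder = looks_like_icholder_entry(cache, ic_of_a);
    return treated_as_icholder && ic_of_a.cached != CachedValue::ICHolder;
}
```

In this sketch, flush_a_frees_wrong_object(false) reports the misclassification (klass metadata treated as an icholder), while flush_a_frees_wrong_object(true) does not, because the cleaned IC no longer points into the re-used region.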
>>>>>> >>>>>> A similar problem occurs with nmethod unloading because unloaded nmethods transition directly to zombie: >>>>>> >>>>>> state of A state of B >>>>>> ----------------------------------------- >>>>>> unloaded unloaded >>>>>> S zombie zombie >>>>>> S marked marked >>>>>> S flushed flushed/re-allocated >>>>>> >>>>>> Again, we crash while flushing A. >>>>>> >>>>>> Solution: >>>>>> I removed the 'make_marked_nmethods_zombies()' and replaced it by calls to 'make_marked_nmethods_not_entrant()'. This avoids the non-entrant -> zombie transition outside of the sweeper. The only impact is that unloaded nmethods that are already non-entrant and not on the stack need another iteration of the sweeper to become zombie. I verified that this has no impact on performance. I also removed the code that was added by JDK-8059735 because now only the sweeper can set a nmethod to zombie. >>>>>> >>>>>> To fix the nmethod unloading case, I changed the implementation of CodeCache::gc_epilogue to clean ICs of unloaded nmethods as well. 
>>>>>> >>>>>> Testing: >>>>>> - Executed failing tests for a week (still running) >>>>>> - JPRT >>>>>> - Performance (SPECjbb2005, SPECjbb2013, SPECjvm2008), no differences >>>>>> >>>>>> Thanks, >>>>>> Tobias >>>>>> >>>>>> >>>>>> [1] Detailed logs for nmethod A (1178) and nmethod B (552): >>>>>> >>>>>> Inline cache at 0xffff80ffad89b017, calling 0xffff80ffad1063c0 cached_value 0x0000000000000000 changing destination to 0xffff80ffad66ae20 changing cached metadata to 0x0000000800034a18 >>>>>> IC at 0xffff80ffad89b017: monomorphic to compiled (rcvr klass) 'java/util/concurrent/ConcurrentHashMap': >>>>>> ### IC at 0xffff80ffad89b017: set to Nmethod 552/ >>>>>> ### Nmethod 1178/0xffff80ffad89ac50 (not entrant) being made zombie >>>>>> ### Nmethod 552/0xffff80ffad66ac10 (not entrant) being made zombie from make_marked_nmethods_zombies() >>>>>> ### Nmethod 552/0xffff80ffad66ac10 (zombie) being marked for reclamation >>>>>> ### Nmethod 1178/0xffff80ffad89ac50 (zombie) being marked for reclamation >>>>>> ### Nmethod 552/0xffff80ffad66ac10 (marked for reclamation) being flushed >>>>>> *flushing nmethod 552/0xffff80ffad66ac10. Live blobs:2325/Free CodeCache:235986Kb >>>>>> ### I2C/C2I adapter 0xffff80ffad66ac10 allocated >>>>>> ### Nmethod 1178/0xffff80ffad89ac50 (marked for reclamation) being flushed >>>>>> cleanup_call_site 0x0000000800034a18 to be freed, destination 0xffff80ffad66ae20 inline cache at 0xffff80ffad89b017 >>>>>> enqueueing icholder 0x0000000800034a18 to be freed >>>>>> *flushing nmethod 1178/0xffff80ffad89ac50. 
Live blobs:2346/Free CodeCache:235955Kb >>>>>> deleting icholder 0x0000000800034a18 >>>>>> ## nof_mallocs = 211209, nof_frees = 105760 >>>>>> ## memory stomp: >>>>>> GuardedMemory(0xffff80ff623ca180) base_addr=0x00000008000349f8 tag=0x0000000800034a18 user_size=18446604433140746280 user_data=0x0000000800034a18 >>>>>> Header guard @0x00000008000349f8 is BROKEN >>>>>> >>> From vladimir.kozlov at oracle.com Tue Aug 25 17:14:19 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 25 Aug 2015 10:14:19 -0700 Subject: RFR: 8080293: AARCH64: Remove unnecessary dmbs from generated CAS code In-Reply-To: <55DC255F.7070903@redhat.com> References: <55DB2AC1.50208@redhat.com> <55DBB5BD.4040400@oracle.com> <55DC255F.7070903@redhat.com> Message-ID: <55DCA26B.3020508@oracle.com> Okay, I agree to have only one predicate. So I am fine with version A). Thanks, Vladimir PS: "first rule will have a lower" - should compareAndSwapI be first then? > However, looking again at the code I believe I have the costs (and hence > the predicates) attached to the wrong rules in each pair. For example, > currently the rules include the following details > > compareAndSwapIAcq -- does not emit dmb instructions > no predicate > cost (2 * VOLATILE_REF_COST ) > > compareAndSwapI -- emits dmb instructions > predicate(!needs_acquiring_load_exclusive(n)) > cost VOLATILE_REF_COST > > The patch presumes that the first rule will have a lower (or at least no > higher) cost than the second. So a correct version would be either, > calling the predicate once: On 8/25/15 1:20 AM, Andrew Dinn wrote: > Hi Vladimir, > > Thank you very much for your review. > > On 25/08/15 01:24, Vladimir Kozlov wrote: >> I did not look deep on code logic. >> Few comments only: >> >> Use {} for all conditional code (it cause a lot of pain in the past): >> >> if (is_cas) >> return NULL; > > Ok, I'll correct all of those. 
> >> You don't need #ifndef PRODUCT: >> >> +#ifndef PRODUCT >> +#ifdef ASSERT > > Ah, that's good to know. Thanks. > >> New mach instructs missing predicate: >> >> predicate(needs_acquiring_load_exclusive(n)); >> >> You use higher ins_cost to avoid their generation when predicate is >> false. So why not explicit predicate? > > I had two separate reasons for not repeating the predicates: > > 1 They do quite a lot of work crawling the graph. So, calling the > predicate in the lower cost case and omitting it in the higher cost case > attempts to avoid the expense of executing it twice in some cases. > > 2 The ins_cost for the lower and higher cost cases is meant to reflect > a difference in the expected execution cost associated with the instruction. > > [n.b. I adopted the same strategy for the new membar generation rules > which were added as part of the volatile put optimization patch -- > compare membar_release and unnecessary_membar_release] > > However, looking again at the code I believe I have the costs (and hence > the predicates) attached to the wrong rules in each pair. For example, > currently the rules include the following details > > compareAndSwapIAcq -- does not emit dmb instructions > no predicate > cost (2 * VOLATILE_REF_COST ) > > compareAndSwapI -- emits dmb instructions > predicate(!needs_acquiring_load_exclusive(n)) > cost VOLATILE_REF_COST > > The patch presumes that the first rule will have a lower (or at least no > higher) cost than the second. 
So a correct version would be either, > calling the predicate once: > > A: > > compareAndSwapIAcq -- does not emit dmb instructions > predicate(needs_acquiring_load_exclusive(n)) > cost VOLATILE_REF_COST > > compareAndSwapI -- emits dmb instructions > no predicate > cost (2 * VOLATILE_REF_COST) > > or, calling the predicate twice: > > B: > > compareAndSwapIAcq -- does not emit dmb instructions > predicate(needs_acquiring_load_exclusive(n)) > cost VOLATILE_REF_COST > > compareAndSwapI -- emits dmb instructions > predicate(!needs_acquiring_load_exclusive(n)) > cost (2 * VOLATILE_REF_COST) > > I would prefer to retain A over B. A is guaranteed to give the same > result as B with potentially less execution overhead. However, if you > feel B is required I will adopt that format for all rules. > > regards, > > > Andrew Dinn > ----------- > Senior Principal Software Engineer > Red Hat UK Ltd > Registered in UK and Wales under Company Registration No. 3798903 > Directors: Michael Cunningham (USA), Matt Parson (USA), Charlie Peters > (USA), Michael O'Neill (Ireland) > From vladimir.kozlov at oracle.com Tue Aug 25 22:45:20 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 25 Aug 2015 15:45:20 -0700 Subject: 9 RFR [L] 8132081: C2 support for Adler32 on SPARC Message-ID: <55DCF000.9000708@oracle.com> http://cr.openjdk.java.net/~kvn/8132081/webrev.jdk/ http://cr.openjdk.java.net/~kvn/8132081/webrev.hotspot/ https://bugs.openjdk.java.net/browse/JDK-8132081 Add C2 intrinsic support for Adler32 checksum on SPARC. Average improvement: 38% Contributed by ahmed.khawaja at oracle.com From vladimir.kozlov at oracle.com Tue Aug 25 22:48:34 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 25 Aug 2015 15:48:34 -0700 Subject: 9 RFR [L] 8132081: C2 support for Adler32 on SPARC In-Reply-To: <55DCF000.9000708@oracle.com> References: <55DCF000.9000708@oracle.com> Message-ID: <55DCF0C2.3000506@oracle.com> I reviewed these changes. Need second review.
Thanks, Vladimir On 8/25/15 3:45 PM, Vladimir Kozlov wrote: > http://cr.openjdk.java.net/~kvn/8132081/webrev.jdk/ > http://cr.openjdk.java.net/~kvn/8132081/webrev.hotspot/ > > https://bugs.openjdk.java.net/browse/JDK-8132081 > > Add C2 intrinsic support for Adler32 checksum on SPARC. > > Average improvement: 38% > > Contributed by ahmed.khawaja at oracle.com From adinn at redhat.com Wed Aug 26 09:35:55 2015 From: adinn at redhat.com (Andrew Dinn) Date: Wed, 26 Aug 2015 10:35:55 +0100 Subject: RFR: 8080293: AARCH64: Remove unnecessary dmbs from generated CAS code In-Reply-To: <55DCA26B.3020508@oracle.com> References: <55DB2AC1.50208@redhat.com> <55DBB5BD.4040400@oracle.com> <55DC255F.7070903@redhat.com> <55DCA26B.3020508@oracle.com> Message-ID: <55DD887B.5090403@redhat.com> On 25/08/15 18:14, Vladimir Kozlov wrote: > Okay, I agree to have only one predicate. So I am fine with version A). Thanks, Vladimir. So, that is now as provided in the latest posted webrev: http://cr.openjdk.java.net/~adinn/8080293/webrev.01/ > PS: "first rule will have a lower" - should compareAndSwapI be first then? Sorry, I think the problem here is that I explained the status of the original patch in a rather confusing way. I am not sure it matters all that much which rule appears first. Or do you really want the lower cost rule to appear before the higher cost one? . . . >> However, looking again at the code I believe I have the costs (and hence >> the predicates) attached to the wrong rules in each pair. For example, >> currently the rules include the following details >> >> compareAndSwapIAcq -- does not emit dmb instructions >> no predicate >> cost (2 * VOLATILE_REF_COST ) >> >> compareAndSwapI -- emits dmb instructions >> predicate(!needs_acquiring_load_exclusive(n)) >> cost VOLATILE_REF_COST . . . what I meant by that comment was this: - The optimization implemented in this patch is based on an assumption that a generation strategy using dmb -- i.e.
the one encoded by compareAndSwapI -- will execute more slowly, or at least no faster, than a generation strategy using stlr -- i.e. the one encoded by compareAndSwapIAcq. - The text above displays the original costs and predicates used to enforce the required rule selection. - In that version the /costs/ are the wrong way round with respect to the /motivating assumption/ i.e. compareAndSwapI has a lower cost than compareAndSwapIAcq. In version A the costs reflect the motivating assumption i.e. for each X in {I, L, P, N} rule compareAndSwapXAcq has a lower cost than compareAndSwapX. However, it is also true that for each X in {I, L, P, N} rule compareAndSwapX appears earlier than compareAndSwapXAcq. regards, Andrew Dinn ----------- Senior Principal Software Engineer Red Hat UK Ltd Registered in UK and Wales under Company Registration No. 3798903 Directors: Michael Cunningham (USA), Matt Parson (USA), Charlie Peters (USA), Michael O'Neill (Ireland) From tobias.hartmann at oracle.com Wed Aug 26 10:53:04 2015 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Wed, 26 Aug 2015 12:53:04 +0200 Subject: [9] RFR(S): 8075805: Crash while trying to release CompiledICHolder In-Reply-To: <7C9B87B351A4BA4AA9EC95BB418116566AD5F7EB@DEWDFEMB19A.global.corp.sap> References: <55D7376E.90103@oracle.com> <55D75FC4.1040208@oracle.com> <55DACEB9.5070107@oracle.com> <55DC005B.9010109@oracle.com> <7C9B87B351A4BA4AA9EC95BB418116566AD5F743@DEWDFEMB19A.global.corp.sap> <55DC6503.20301@oracle.com> <55DC7CE1.90805@oracle.com> <7C9B87B351A4BA4AA9EC95BB418116566AD5F7EB@DEWDFEMB19A.global.corp.sap> Message-ID: <55DD9A90.3010802@oracle.com> Hi Martin, thanks for the hints, please see comments inline. On 25.08.2015 17:01, Doerr, Martin wrote: > thanks for your quick response. Unfortunately, we neither have a good reproduction case nor a regression test which is actually the reason why we did not post this earlier. > We had observed very sporadic assertions or freeing of unallocated memory. 
Okay, same here. > Basically, I believe that cleaning the inline caches before transitioning from unloaded to zombie is the right thing. However, there's still the problem that it's hard to test. > > Additionally, it may be required to adapt a couple of assertions. > We modified the following ASSERT code (based on hotspot 25) > 1. in CompiledIC::is_call_to_compiled(): > Accessing cached_metadata() may be unsafe if !caller->is_alive(). I assume this could be a problem in the following case: state of A state of B ------------------------------- not-entrant S [not-on-stack] S zombie unloaded Now the IC of A still references the unloaded nmethod B and is_call_to_compiled() will access the unloaded metadata. Right? I added an assert to cached_metadata() to make sure we don't access metadata that belongs to unloaded nmethods. > 2. in CompiledIC::is_call_to_interpreted(): > CodeBlob* db may be gone if cb is unloaded. How can that happen? I think if db is really flushed, it may as well be replaced by another allocation (for example, an adapter blob) and then your if(!db) check returns false and still triggers the assert(!db->is_adapter_blob(), ...). > 3. in CompiledIC::verify(): > is_call_to_compiled() has crashed. Seems to be unsafe in megamorphic case so we changed the order of the checks. I don't see why is_call_to_compiled() is unsafe in the megamorphic case. Could you explain that? > 4. in nmethod::verify_clean_inline_caches(): > In case of relocInfo::virtual_call_type CompiledIC may still point to a zombie method. How can that be? After method unloading all ICs of live nmethods were cleaned in CodeCache::gc_epilogue(). Transitions to zombie in the sweeper also lead to IC cleaning of other nmethods. Thanks, Tobias > For details, please see below. > > Best regards, > Martin > > > > 1. > assert( is_c1_method || > !is_monomorphic || > is_optimized() || > !caller->is_alive() || > (cached_metadata() != NULL && cached_metadata()->is_klass()), "sanity check"); > > 2. 
> #ifdef ASSERT > { > CodeBlob* db = CodeCache::find_blob_unsafe(dest); > if (!db) { > nmethod *nm = cb->as_nmethod_or_null(); > assert(nm, "sanity"); > if ( nm->is_in_use() || > (nm->is_not_entrant() && (!SafepointSynchronize::is_at_safepoint() || !nm->is_marked_for_deoptimization())) ) { > { // Dump some information. > ttyLocker ttyl; > tty->print_cr("ERROR: Did not find codeblob for destination %p", dest); > nm->print(tty); > Method *m = nm->method(); > if (m) { > m->print_on(tty); > } > } > assert(false, err_msg("nmethod is in state %d but destination blob is gone", (int)(nm->state()))); > } > } else > assert(!db->is_adapter_blob(), "must use stub!"); > } > #endif /* ASSERT */ > > 3. > assert(is_clean() > || is_optimized() || is_megamorphic() > || is_call_to_compiled() || is_call_to_interpreted() > , "sanity check"); > > 4. > case relocInfo::virtual_call_type: { > CompiledIC *ic = CompiledIC_at(&iter); > // Ok, to lookup references to zombies here > CodeBlob *cb = CodeCache::find_blob_unsafe(ic->ic_destination()); > if( cb != NULL && cb->is_nmethod() ) { > nmethod* nm = (nmethod*)cb; > // Verify that inline caches pointing to not_entrant methods are clean > if (nm->is_not_entrant()) { > assert(ic->is_clean(), "IC should be clean"); > } > } > break; > } > case relocInfo::opt_virtual_call_type: { > > > -----Original Message----- > From: Tobias Hartmann [mailto:tobias.hartmann at oracle.com] > Sent: Dienstag, 25. August 2015 16:34 > To: Doerr, Martin; Igor Veresov > Cc: Vladimir Kozlov; hotspot-compiler-dev at openjdk.java.net > Subject: Re: [9] RFR(S): 8075805: Crash while trying to release CompiledICHolder > > I missed that we have to be careful when cleaning ICs of a zombie nmethod to not create transition stubs because backpatching of those is unnecessary and fails with "unsafe access to zombie method". I changed the code to not create a transition stub if the corresponding nmethod is dead and updated the webrev in place. 
> > Best, > Tobias > > On 25.08.2015 14:52, Tobias Hartmann wrote: >> Hi Martin, >> >> thanks for looking at this! >> >> It's interesting that you encountered and fixed the same problem. Were you able to create a regression test? >> >> I did not evaluate the impact on the safepoint duration but you are right that it may be affected. I talked to Mikael Gerdin from the GC team and he told me that they had issues before because the scanning of nmethods and their relocations took a significant amount of time. >> >> I would therefore like to go with the alternative solution in the case of unloaded nmethods, i.e., cleaning their ICs in the sweeper. Unfortunately, I already pushed the fix so here is the incremental webrev: >> http://cr.openjdk.java.net/~thartmann/8075805/webrev.02/ >> >> I left the fix for the nmethods that are marked-for-deoptimization as it is, because it avoids cleaning the IC's of all zombie nmethods and simplifies the possible state transitions. >> >> If you guys are fine with the change, I open a new bug/enhancement and send out a separate RFR. >> >> Thanks, >> Tobias >> >> On 25.08.2015 12:34, Doerr, Martin wrote: >>> Hi all, >>> >>> we appreciate that this code gets cleaned up. >>> >>> Iterating over all nmethods in gc_epilogue should fix the problem. Did anybody check the impact on the safepoint duration? >>> We have also fixed this problem. We use the other approach and added following code to nmethod::make_not_entrant_or_zombie: >>> >>> if (state == zombie) { >>> MutexLockerEx ml(SafepointSynchronize::is_at_safepoint() ? NULL : CompiledIC_lock); >>> address low_boundary = verified_entry_point () + NativeJump::instruction_size; // See cleanup_inline_caches. 
>>> RelocIterator iter(this, low_boundary); >>> while (iter.next()) { >>> if (iter.type() == relocInfo::virtual_call_type) { >>> CompiledIC *ic = CompiledIC_at(&iter); >>> ic->set_ic_destination_and_value(SharedRuntime::get_resolve_virtual_call_stub(), (Metadata*)NULL); >>> } >>> } >>> } >>> >>> (Note: set_ic_destination_and_value is currently private.) >>> >>> As discussed in earlier emails, this also fixes the problem. An advantage is that this approach does the job in a concurrent phase without impacting the safepoint duration. >>> Not sure which approach is the better one. >>> >>> Best regards, >>> Martin >>> >>> >>> -----Original Message----- >>> From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Tobias Hartmann >>> Sent: Dienstag, 25. August 2015 07:43 >>> To: Igor Veresov >>> Cc: Vladimir Kozlov; hotspot-compiler-dev at openjdk.java.net >>> Subject: Re: [9] RFR(S): 8075805: Crash while trying to release CompiledICHolder >>> >>> On 24.08.2015 22:10, Igor Veresov wrote: >>>> Seems good to me. >>> >>> Thanks, Igor. >>> >>>> Btw, did you find why there is a need for ?marked for reclamation? state? >>> >>> No, I couldn't find a reason yet. I did some testing without this state and didn't run into any obvious problems. I'll file a bug and further investigate. It would be nice if we could save this transition. >>> >>> Best, >>> Tobias >>> >>>> >>>> igor >>>> >>>>> On Aug 24, 2015, at 12:58 AM, Tobias Hartmann wrote: >>>>> >>>>> Thanks, Vladimir! Please see comments inline. >>>>> >>>>> On 21.08.2015 19:28, Vladimir Kozlov wrote: >>>>>> During our discussion about this problem we thought that we may need additional call nm->cleanup_inline_caches() by sweeper when we convert not_entrant to zombie to prevent zombie pointing to an other nmethods. You think we don't need it? 
>>>>> >>>>> I looked at all possible nmethod transitions and came to the conclusion that the problem is only possible if there is a direct transition from non-entrant to zombie without a sweeper cylce in-between. >>>>> >>>>> The solution we discussed, i.e., always cleaning ICs for the non-entrant -> zombie transition, would fix the problem as well but would be more invasive than the proposed solution because it affects all nmethods. >>>>> >>>>>> Please, clarify changes in states transition of unloaded nmethods - "The only impact is that unloaded nmethods that are already non-entrant and not on the stack need another iteration of the sweeper to become zombie." >>>>>> I don't see how changing make_marked_nmethods_zombies() call to make_marked_nmethods_not_entrant() affects unloaded nmethods. Both make_*() methods iterates only over is_alive() nmethods (iter.next_alive()), so they skip unloaded. >>>>> >>>>> Yes, my description is wrong. It should be "The only impact is that _marked_ nmethods that are already non-entrant and not on the stack need another iteration of the sweeper to become zombie." The change does not affect the state transitions of unloaded nmethods. >>>>> >>>>> In other words, the change removes the shortcut that allowed nmethods that were marked for deoptimization to be converted to zombie at a safepoint if they were already non-entrant and not on the stack before. These nmethods now need an additional sweeper cycle to be converted to zombie. >>>>> >>>>>> What spacing you changed in compiledIC.cpp because webrev does not show them? >>>>> >>>>> I fixed the wrong indentation in CompiledIC::set_to_clean(). 
You can see it in the webrev if you click on "Patch": >>>>> http://cr.openjdk.java.net/~thartmann/8075805/webrev.00/src/share/vm/code/compiledIC.cpp.patch >>>>> >>>>>> Please, fix comment in vm_operations.cpp >>>>>> >>>>>> // Make the dependent methods zombies >>>>>> - CodeCache::make_marked_nmethods_zombies(); >>>>>> + CodeCache::make_marked_nmethods_not_entrant(); >>>>> >>>>> Fixed. >>>>> >>>>> New webrev: >>>>> http://cr.openjdk.java.net/~thartmann/8075805/webrev.01/ >>>>> >>>>> Thanks, >>>>> Tobias >>>>> >>>>>> >>>>>> Thanks, >>>>>> Vladimir >>>>>> >>>>>> On 8/21/15 7:36 AM, Tobias Hartmann wrote: >>>>>>> Hi, >>>>>>> >>>>>>> please review the following patch. >>>>>>> >>>>>>> https://bugs.openjdk.java.net/browse/JDK-8075805 >>>>>>> http://cr.openjdk.java.net/~thartmann/8075805/webrev.00/ >>>>>>> >>>>>>> Problem: >>>>>>> The VM crashes at a safepoint while trying to free a CompiledICHolder object that was enqueued for release after flushing a nmethod. The crash happens because the object is not a CompiledICHolder but Metadata which should not be removed. The problem is that at nmethod release, "CompiledIC::is_icholder_entry" is used to determine if the ICs of that nmethod still reference CompiledICHolder objects and if so, those objects are enqueued for release at the next safepoint. The method returns true if the destination of the IC is a C2I adapter, assuming that in this case the IC is in the to-interpreter state and the cached value must be a CompiledICHolder object. However, there are very rare cases where the IC is actually in the to-compiled state but the destination nmethod was already flushed and replaced by another allocation. Since the IC is still pointing to the same address in the code cache, the state of the IC is confused. >>>>>>> >>>>>>> Cleaning of inline caches that point to dead nmethods should prevent this. However, we do not clean ICs of nmethods that are converted to zombie themselves. 
Usually, that's okay because a zombie nmethod will be flushed before any dead nmethod it references. This is guaranteed because each nmethod goes through the states alive -> non-entrant -> zombie -> marked-for-reclamation before being flushed.
>>>>>>>
>>>>>>> Suppose we have two nmethods A and B, where A references B through an IC and B is always processed first by the sweeper. The following table shows the state transitions from top to bottom where lines marked with "S" show a transition in the corresponding iteration of the sweeper.
>>>>>>>
>>>>>>>      state of A         state of B
>>>>>>> -----------------------------------------
>>>>>>>      non-entrant        non-entrant
>>>>>>> S    [not on stack]     [not on stack]
>>>>>>> S    zombie             zombie
>>>>>>> S    marked             marked
>>>>>>> S    flushed            flushed/re-allocated
>>>>>>>
>>>>>>> The IC of A will be cleaned in the first sweeper cycle because B is non-entrant so we don't need to clean ICs again if A is converted to zombie.
>>>>>>>
>>>>>>> Let's look at the following setting:
>>>>>>>
>>>>>>>      state of A         state of B
>>>>>>> -----------------------------------------
>>>>>>>      non-entrant
>>>>>>> S    [not on stack]
>>>>>>>                         non-entrant
>>>>>>> S    zombie             [not on stack]
>>>>>>>                         zombie
>>>>>>> S    marked             marked
>>>>>>> S    flushed            flushed/re-allocated
>>>>>>>
>>>>>>> There are two problems here:
>>>>>>> - the IC of A is not cleaned because B is not yet non-entrant in the first iteration of the sweeper and afterwards A becomes zombie itself,
>>>>>>> - the transition from B to zombie happens outside the sweeper in 'CodeCache::make_marked_nmethods_zombies()' because the previous sweeper iteration already determined that the nmethod is not on the stack.
>>>>>>>
>>>>>>> The VM now crashes while flushing A because it still references B. Since B was replaced by a C2I adapter, we assume that A's IC is in the to-interpreter state and try to free a CompiledICHolder object which is actually Klass-Metadata for B. The detailed logs are below [1].
>>>>>>> >>>>>>> A similar problem occurs with nmethod unloading because unloaded nmethods transition directly to zombie: >>>>>>> >>>>>>> state of A state of B >>>>>>> ----------------------------------------- >>>>>>> unloaded unloaded >>>>>>> S zombie zombie >>>>>>> S marked marked >>>>>>> S flushed flushed/re-allocated >>>>>>> >>>>>>> Again, we crash while flushing A. >>>>>>> >>>>>>> Solution: >>>>>>> I removed the 'make_marked_nmethods_zombies()' and replaced it by calls to 'make_marked_nmethods_not_entrant()'. This avoids the non-entrant -> zombie transition outside of the sweeper. The only impact is that unloaded nmethods that are already non-entrant and not on the stack need another iteration of the sweeper to become zombie. I verified that this has no impact on performance. I also removed the code that was added by JDK-8059735 because now only the sweeper can set a nmethod to zombie. >>>>>>> >>>>>>> To fix the nmethod unloading case, I changed the implementation of CodeCache::gc_epilogue to clean ICs of unloaded nmethods as well. 
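The lifecycle described in the quoted message (alive -> non-entrant -> zombie -> marked-for-reclamation -> flushed) can be pictured as a small state machine. What follows is only a toy sketch under simplifying assumptions, not HotSpot code: the names `State`, `make_not_entrant`, `sweep_step` and `sweeps_until_flushed` are hypothetical, stack scanning is reduced to a boolean, and each sweeper visit advances a dead nmethod by at most one state. It illustrates the property the fix relies on, namely that nothing outside the sweeper moves an nmethod to zombie.

```cpp
#include <cassert>

// Toy model of the nmethod lifecycle. All names are illustrative and
// hypothetical; they do not correspond to the real HotSpot declarations.
enum class State { alive, not_entrant, zombie, marked_for_reclamation, flushed };

// Safepoint-side transition as described for the fix: a marked nmethod is
// only made not entrant here, never zombie.
inline State make_not_entrant(State s) {
  return s == State::alive ? State::not_entrant : s;
}

// One sweeper visit (simplified): advance by at most one state, and only
// turn a not-entrant nmethod into a zombie if it is not on any stack.
inline State sweep_step(State s, bool on_stack) {
  switch (s) {
    case State::alive:                  return State::alive;
    case State::not_entrant:            return on_stack ? State::not_entrant
                                                        : State::zombie;
    case State::zombie:                 return State::marked_for_reclamation;
    case State::marked_for_reclamation: return State::flushed;
    case State::flushed:                return State::flushed;
  }
  return s;  // unreachable, keeps compilers quiet
}

// Number of sweeper visits needed before the memory may be reused,
// assuming the nmethod is never found on a stack. Returns -1 for a live
// nmethod, which this model never flushes.
inline int sweeps_until_flushed(State s) {
  if (s == State::alive) return -1;
  int visits = 0;
  while (s != State::flushed) {
    s = sweep_step(s, /*on_stack=*/false);
    ++visits;
  }
  return visits;
}
```

In this model a not-entrant nmethod is visited by the sweeper three times before it is flushed, and each of those visits is a point at which the sweeper could clean inline caches that still refer to it.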
>>>>>>> >>>>>>> Testing: >>>>>>> - Executed failing tests for a week (still running) >>>>>>> - JPRT >>>>>>> - Performance (SPECjbb2005, SPECjbb2013, SPECjvm2008), no differences >>>>>>> >>>>>>> Thanks, >>>>>>> Tobias >>>>>>> >>>>>>> >>>>>>> [1] Detailed logs for nmethod A (1178) and nmethod B (552): >>>>>>> >>>>>>> Inline cache at 0xffff80ffad89b017, calling 0xffff80ffad1063c0 cached_value 0x0000000000000000 changing destination to 0xffff80ffad66ae20 changing cached metadata to 0x0000000800034a18 >>>>>>> IC at 0xffff80ffad89b017: monomorphic to compiled (rcvr klass) 'java/util/concurrent/ConcurrentHashMap': >>>>>>> ### IC at 0xffff80ffad89b017: set to Nmethod 552/ >>>>>>> ### Nmethod 1178/0xffff80ffad89ac50 (not entrant) being made zombie >>>>>>> ### Nmethod 552/0xffff80ffad66ac10 (not entrant) being made zombie from make_marked_nmethods_zombies() >>>>>>> ### Nmethod 552/0xffff80ffad66ac10 (zombie) being marked for reclamation >>>>>>> ### Nmethod 1178/0xffff80ffad89ac50 (zombie) being marked for reclamation >>>>>>> ### Nmethod 552/0xffff80ffad66ac10 (marked for reclamation) being flushed >>>>>>> *flushing nmethod 552/0xffff80ffad66ac10. Live blobs:2325/Free CodeCache:235986Kb >>>>>>> ### I2C/C2I adapter 0xffff80ffad66ac10 allocated >>>>>>> ### Nmethod 1178/0xffff80ffad89ac50 (marked for reclamation) being flushed >>>>>>> cleanup_call_site 0x0000000800034a18 to be freed, destination 0xffff80ffad66ae20 inline cache at 0xffff80ffad89b017 >>>>>>> enqueueing icholder 0x0000000800034a18 to be freed >>>>>>> *flushing nmethod 1178/0xffff80ffad89ac50. 
Live blobs:2346/Free CodeCache:235955Kb >>>>>>> deleting icholder 0x0000000800034a18 >>>>>>> ## nof_mallocs = 211209, nof_frees = 105760 >>>>>>> ## memory stomp: >>>>>>> GuardedMemory(0xffff80ff623ca180) base_addr=0x00000008000349f8 tag=0x0000000800034a18 user_size=18446604433140746280 user_data=0x0000000800034a18 >>>>>>> Header guard @0x00000008000349f8 is BROKEN >>>>>>> >>>> From roland.westrelin at oracle.com Wed Aug 26 11:31:42 2015 From: roland.westrelin at oracle.com (Roland Westrelin) Date: Wed, 26 Aug 2015 13:31:42 +0200 Subject: RFR(XS): 8134288 compiler/runtime/6859338/Test6859338.java crashes in PhaseIdealLoop::try_move_store_after_loop Message-ID: <81A5D258-6987-4B13-8F99-9B13FB4C9C7D@oracle.com> http://cr.openjdk.java.net/~roland/8134288/webrev.00/ Stores from code generated by c2 to update profiling (profile_taken_branch() called from Parse::do_if() if ProfileInterpreter is off) don't have a control. This looks like a corner case so I went for the simplest fix and excluded stores with no controls from the logic that tries to move stores out of loops. Roland. From roland.westrelin at oracle.com Wed Aug 26 13:52:24 2015 From: roland.westrelin at oracle.com (Roland Westrelin) Date: Wed, 26 Aug 2015 15:52:24 +0200 Subject: 8134321: tools/pack200/Pack200Test.java crashes in the VM PIT jdk9 b79 Message-ID: <7BEE1D97-997F-44DD-B3CC-63122C6FD50A@oracle.com> http://cr.openjdk.java.net/~roland/8134321/webrev.00/ At a safepoint, when we capture the field values for an object whose allocation is about to be eliminated, if one of these field values is behind a Phi and on one branch of the Phi the field value may be written to by arraycopy but we don't know for sure, then we should not eliminate the allocation but with the current code we simply proceed with a broken field value. Roland.
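The rule stated in the 8134321 message (do not eliminate the allocation when a field value behind a Phi may have been written by arraycopy on some branch) can be sketched as a conservative check over the Phi's inputs. This is only an illustration under stated assumptions, not the code in the webrev: `FieldValue` and `can_eliminate_allocation` are hypothetical names, and the compiler's knowledge about each branch is reduced to a two-valued enum.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for what the compiler knows about a field value on
// one control-flow path feeding a Phi. Not a real C2 type.
enum class FieldValue {
  known,                      // the value on this path is known exactly
  maybe_written_by_arraycopy  // an arraycopy may have overwritten it
};

// Conservative rule from the message: if any input of the Phi is uncertain,
// keep the allocation instead of capturing a possibly broken field value.
inline bool can_eliminate_allocation(const std::vector<FieldValue>& phi_inputs) {
  for (FieldValue v : phi_inputs) {
    if (v == FieldValue::maybe_written_by_arraycopy) {
      return false;  // bail out: the value at the safepoint is not reliable
    }
  }
  return true;
}
```

The point of the sketch is only the shape of the decision: a single uncertain input is enough to give up on the optimization, rather than proceeding with whatever value happens to be visible.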
From claes.redestad at oracle.com Wed Aug 26 14:47:59 2015 From: claes.redestad at oracle.com (Claes Redestad) Date: Wed, 26 Aug 2015 16:47:59 +0200 Subject: Poll: Remove per-compiler thread perf. counters? Message-ID: <55DDD19F.4020003@oracle.com> Hi, I want to raise the question if there are any known users of these per-compiler thread perf. counters, or if they should be removed? sun.ci.compilerThread.#.compiles sun.ci.compilerThread.#.method sun.ci.compilerThread.#.time sun.ci.compilerThread.#.type For detailed information about compilation there are better tools available (JFR, PrintCompilation), whereas the older perf.counters are useful mostly for their aggregated values. /Claes From martin.doerr at sap.com Wed Aug 26 14:52:27 2015 From: martin.doerr at sap.com (Doerr, Martin) Date: Wed, 26 Aug 2015 14:52:27 +0000 Subject: [9] RFR(S): 8075805: Crash while trying to release CompiledICHolder In-Reply-To: <55DD9A90.3010802@oracle.com> References: <55D7376E.90103@oracle.com> <55D75FC4.1040208@oracle.com> <55DACEB9.5070107@oracle.com> <55DC005B.9010109@oracle.com> <7C9B87B351A4BA4AA9EC95BB418116566AD5F743@DEWDFEMB19A.global.corp.sap> <55DC6503.20301@oracle.com> <55DC7CE1.90805@oracle.com> <7C9B87B351A4BA4AA9EC95BB418116566AD5F7EB@DEWDFEMB19A.global.corp.sap> <55DD9A90.3010802@oracle.com> Message-ID: <7C9B87B351A4BA4AA9EC95BB418116566AD5F983@DEWDFEMB19A.global.corp.sap> Hi Tobias, here are my answers to your questions below: 1. Yes. You got the point. 2. This happened before we made the fix. I just pasted it for the case you run into crashes when accessing db->is_adapter_blob(). If the new fix is working correctly, this one may no longer be needed. 3. 
The crash in is_call_to_compiled() happened after creating a transition stub with the following stack: libjvm::CompiledIC::is_call_to_compiled() libjvm::CompiledIC::verify() libjvm::SharedRuntime::handle_ic_miss_helper(JavaThread*, Thread*) libjvm::SharedRuntime::handle_wrong_method_ic_miss(JavaThread*) It's hard to find out what exactly went wrong, but I don't trust cb = CodeCache::find_blob_unsafe() cb->is_nmethod() 4. Assume the destination method has just reached zombie state via unloaded state. At this point of time, only the inline caches of the destination method were cleaned. The caller method may still be in unloaded state with the inline cache pointing to this destination method. This situation can happen because the gc_epilogue doesn't clean the inline caches of unloaded methods. Your first webrev prevents this situation. The new one deals with it by cleaning them when the caller method becomes a zombie (which is the case in which the assertion needs to allow this). Hope this helps a little bit. I guess nobody can answer all inline cache related questions ;-) Best regards, Martin -----Original Message----- From: Tobias Hartmann [mailto:tobias.hartmann at oracle.com] Sent: Mittwoch, 26. August 2015 12:53 To: Doerr, Martin; Igor Veresov Cc: Vladimir Kozlov; hotspot-compiler-dev at openjdk.java.net Subject: Re: [9] RFR(S): 8075805: Crash while trying to release CompiledICHolder Hi Martin, thanks for the hints, please see comments inline. On 25.08.2015 17:01, Doerr, Martin wrote: > thanks for your quick response. Unfortunately, we neither have a good reproduction case nor a regression test which is actually the reason why we did not post this earlier. > We had observed very sporadic assertions or freeing of unallocated memory. Okay, same here. > Basically, I believe that cleaning the inline caches before transitioning from unloaded to zombie is the right thing. However, there's still the problem that it's hard to test. 
> > Additionally, it may be required to adapt a couple of assertions. > We modified the following ASSERT code (based on hotspot 25) > 1. in CompiledIC::is_call_to_compiled(): > Accessing cached_metadata() may be unsafe if !caller->is_alive(). I assume this could be a problem in the following case: state of A state of B ------------------------------- not-entrant S [not-on-stack] S zombie unloaded Now the IC of A still references the unloaded nmethod B and is_call_to_compiled() will access the unloaded metadata. Right? I added an assert to cached_metadata() to make sure we don't access metadata that belongs to unloaded nmethods. > 2. in CompiledIC::is_call_to_interpreted(): > CodeBlob* db may be gone if cb is unloaded. How can that happen? I think if db is really flushed, it may as well be replaced by another allocation (for example, an adapter blob) and then your if(!db) check returns false and still triggers the assert(!db->is_adapter_blob(), ...). > 3. in CompiledIC::verify(): > is_call_to_compiled() has crashed. Seems to be unsafe in megamorphic case so we changed the order of the checks. I don't see why is_call_to_compiled() is unsafe in the megamorphic case. Could you explain that? > 4. in nmethod::verify_clean_inline_caches(): > In case of relocInfo::virtual_call_type CompiledIC may still point to a zombie method. How can that be? After method unloading all ICs of live nmethods were cleaned in CodeCache::gc_epilogue(). Transitions to zombie in the sweeper also lead to IC cleaning of other nmethods. Thanks, Tobias > For details, please see below. > > Best regards, > Martin > > > > 1. > assert( is_c1_method || > !is_monomorphic || > is_optimized() || > !caller->is_alive() || > (cached_metadata() != NULL && cached_metadata()->is_klass()), "sanity check"); > > 2. 
> #ifdef ASSERT > { > CodeBlob* db = CodeCache::find_blob_unsafe(dest); > if (!db) { > nmethod *nm = cb->as_nmethod_or_null(); > assert(nm, "sanity"); > if ( nm->is_in_use() || > (nm->is_not_entrant() && (!SafepointSynchronize::is_at_safepoint() || !nm->is_marked_for_deoptimization())) ) { > { // Dump some information. > ttyLocker ttyl; > tty->print_cr("ERROR: Did not find codeblob for destination %p", dest); > nm->print(tty); > Method *m = nm->method(); > if (m) { > m->print_on(tty); > } > } > assert(false, err_msg("nmethod is in state %d but destination blob is gone", (int)(nm->state()))); > } > } else > assert(!db->is_adapter_blob(), "must use stub!"); > } > #endif /* ASSERT */ > > 3. > assert(is_clean() > || is_optimized() || is_megamorphic() > || is_call_to_compiled() || is_call_to_interpreted() > , "sanity check"); > > 4. > case relocInfo::virtual_call_type: { > CompiledIC *ic = CompiledIC_at(&iter); > // Ok, to lookup references to zombies here > CodeBlob *cb = CodeCache::find_blob_unsafe(ic->ic_destination()); > if( cb != NULL && cb->is_nmethod() ) { > nmethod* nm = (nmethod*)cb; > // Verify that inline caches pointing to not_entrant methods are clean > if (nm->is_not_entrant()) { > assert(ic->is_clean(), "IC should be clean"); > } > } > break; > } > case relocInfo::opt_virtual_call_type: { > > > -----Original Message----- > From: Tobias Hartmann [mailto:tobias.hartmann at oracle.com] > Sent: Dienstag, 25. August 2015 16:34 > To: Doerr, Martin; Igor Veresov > Cc: Vladimir Kozlov; hotspot-compiler-dev at openjdk.java.net > Subject: Re: [9] RFR(S): 8075805: Crash while trying to release CompiledICHolder > > I missed that we have to be careful when cleaning ICs of a zombie nmethod to not create transition stubs because backpatching of those is unnecessary and fails with "unsafe access to zombie method". I changed the code to not create a transition stub if the corresponding nmethod is dead and updated the webrev in place. 
> > Best, > Tobias > > On 25.08.2015 14:52, Tobias Hartmann wrote: >> Hi Martin, >> >> thanks for looking at this! >> >> It's interesting that you encountered and fixed the same problem. Were you able to create a regression test? >> >> I did not evaluate the impact on the safepoint duration but you are right that it may be affected. I talked to Mikael Gerdin from the GC team and he told me that they had issues before because the scanning of nmethods and their relocations took a significant amount of time. >> >> I would therefore like to go with the alternative solution in the case of unloaded nmethods, i.e., cleaning their ICs in the sweeper. Unfortunately, I already pushed the fix so here is the incremental webrev: >> http://cr.openjdk.java.net/~thartmann/8075805/webrev.02/ >> >> I left the fix for the nmethods that are marked-for-deoptimization as it is, because it avoids cleaning the IC's of all zombie nmethods and simplifies the possible state transitions. >> >> If you guys are fine with the change, I open a new bug/enhancement and send out a separate RFR. >> >> Thanks, >> Tobias >> >> On 25.08.2015 12:34, Doerr, Martin wrote: >>> Hi all, >>> >>> we appreciate that this code gets cleaned up. >>> >>> Iterating over all nmethods in gc_epilogue should fix the problem. Did anybody check the impact on the safepoint duration? >>> We have also fixed this problem. We use the other approach and added following code to nmethod::make_not_entrant_or_zombie: >>> >>> if (state == zombie) { >>> MutexLockerEx ml(SafepointSynchronize::is_at_safepoint() ? NULL : CompiledIC_lock); >>> address low_boundary = verified_entry_point () + NativeJump::instruction_size; // See cleanup_inline_caches. 
>>> RelocIterator iter(this, low_boundary); >>> while (iter.next()) { >>> if (iter.type() == relocInfo::virtual_call_type) { >>> CompiledIC *ic = CompiledIC_at(&iter); >>> ic->set_ic_destination_and_value(SharedRuntime::get_resolve_virtual_call_stub(), (Metadata*)NULL); >>> } >>> } >>> } >>> >>> (Note: set_ic_destination_and_value is currently private.) >>> >>> As discussed in earlier emails, this also fixes the problem. An advantage is that this approach does the job in a concurrent phase without impacting the safepoint duration. >>> Not sure which approach is the better one. >>> >>> Best regards, >>> Martin >>> >>> >>> -----Original Message----- >>> From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Tobias Hartmann >>> Sent: Dienstag, 25. August 2015 07:43 >>> To: Igor Veresov >>> Cc: Vladimir Kozlov; hotspot-compiler-dev at openjdk.java.net >>> Subject: Re: [9] RFR(S): 8075805: Crash while trying to release CompiledICHolder >>> >>> On 24.08.2015 22:10, Igor Veresov wrote: >>>> Seems good to me. >>> >>> Thanks, Igor. >>> >>>> Btw, did you find why there is a need for "marked for reclamation" state? >>> >>> No, I couldn't find a reason yet. I did some testing without this state and didn't run into any obvious problems. I'll file a bug and further investigate. It would be nice if we could save this transition. >>> >>> Best, >>> Tobias >>> >>>> >>>> igor >>>> >>>>> On Aug 24, 2015, at 12:58 AM, Tobias Hartmann wrote: >>>>> >>>>> Thanks, Vladimir! Please see comments inline. >>>>> >>>>> On 21.08.2015 19:28, Vladimir Kozlov wrote: >>>>>> During our discussion about this problem we thought that we may need additional call nm->cleanup_inline_caches() by sweeper when we convert not_entrant to zombie to prevent zombie pointing to other nmethods. You think we don't need it?
>>>>> >>>>> I looked at all possible nmethod transitions and came to the conclusion that the problem is only possible if there is a direct transition from non-entrant to zombie without a sweeper cycle in-between. >>>>> >>>>> The solution we discussed, i.e., always cleaning ICs for the non-entrant -> zombie transition, would fix the problem as well but would be more invasive than the proposed solution because it affects all nmethods. >>>>> >>>>>> Please, clarify changes in states transition of unloaded nmethods - "The only impact is that unloaded nmethods that are already non-entrant and not on the stack need another iteration of the sweeper to become zombie." >>>>>> I don't see how changing make_marked_nmethods_zombies() call to make_marked_nmethods_not_entrant() affects unloaded nmethods. Both make_*() methods iterate only over is_alive() nmethods (iter.next_alive()), so they skip unloaded. >>>>> >>>>> Yes, my description is wrong. It should be "The only impact is that _marked_ nmethods that are already non-entrant and not on the stack need another iteration of the sweeper to become zombie." The change does not affect the state transitions of unloaded nmethods. >>>>> >>>>> In other words, the change removes the shortcut that allowed nmethods that were marked for deoptimization to be converted to zombie at a safepoint if they were already non-entrant and not on the stack before. These nmethods now need an additional sweeper cycle to be converted to zombie. >>>>> >>>>>> What spacing you changed in compiledIC.cpp because webrev does not show them? >>>>> >>>>> I fixed the wrong indentation in CompiledIC::set_to_clean().
You can see it in the webrev if you click on "Patch": >>>>> http://cr.openjdk.java.net/~thartmann/8075805/webrev.00/src/share/vm/code/compiledIC.cpp.patch >>>>> >>>>>> Please, fix comment in vm_operations.cpp >>>>>> >>>>>> // Make the dependent methods zombies >>>>>> - CodeCache::make_marked_nmethods_zombies(); >>>>>> + CodeCache::make_marked_nmethods_not_entrant(); >>>>> >>>>> Fixed. >>>>> >>>>> New webrev: >>>>> http://cr.openjdk.java.net/~thartmann/8075805/webrev.01/ >>>>> >>>>> Thanks, >>>>> Tobias >>>>> >>>>>> >>>>>> Thanks, >>>>>> Vladimir >>>>>> >>>>>> On 8/21/15 7:36 AM, Tobias Hartmann wrote: >>>>>>> Hi, >>>>>>> >>>>>>> please review the following patch. >>>>>>> >>>>>>> https://bugs.openjdk.java.net/browse/JDK-8075805 >>>>>>> http://cr.openjdk.java.net/~thartmann/8075805/webrev.00/ >>>>>>> >>>>>>> Problem: >>>>>>> The VM crashes at a safepoint while trying to free a CompiledICHolder object that was enqueued for release after flushing a nmethod. The crash happens because the object is not a CompiledICHolder but Metadata which should not be removed. The problem is that at nmethod release, "CompiledIC::is_icholder_entry" is used to determine if the ICs of that nmethod still reference CompiledICHolder objects and if so, those objects are enqueued for release at the next safepoint. The method returns true if the destination of the IC is a C2I adapter, assuming that in this case the IC is in the to-interpreter state and the cached value must be a CompiledICHolder object. However, there are very rare cases where the IC is actually in the to-compiled state but the destination nmethod was already flushed and replaced by another allocation. Since the IC is still pointing to the same address in the code cache, the state of the IC is confused. >>>>>>> >>>>>>> Cleaning of inline caches that point to dead nmethods should prevent this. However, we do not clean ICs of nmethods that are converted to zombie themselves. 
Usually, that's okay because a zombie nmethod will be flushed before any dead nmethod it references. This is guaranteed because each nmethod goes through the states alive -> non-entrant -> zombie -> marked-for-reclamation before being flushed. >>>>>>> >>>>>>> Suppose we have two nmethods A and B, where A references B through an IC and B is always processed first by the sweeper. The following table shows the state transitions from top to bottom where lines marked with "S" show a transition in the corresponding iteration of the sweeper. >>>>>>> >>>>>>> state of A state of B >>>>>>> ----------------------------------------- >>>>>>> non-entrant non-entrant >>>>>>> S [not on stack] [not on stack] >>>>>>> S zombie zombie >>>>>>> S marked marked >>>>>>> S flushed flushed/re-allocated >>>>>>> >>>>>>> The IC of A will be cleaned in the first sweeper cycle because B is non-entrant so we don't need to clean ICs again if A is converted to zombie. >>>>>>> >>>>>>> Let's look at the following setting: >>>>>>> >>>>>>> state of A state of B >>>>>>> ----------------------------------------- >>>>>>> non-entrant >>>>>>> S [not on stack] >>>>>>> non-entrant >>>>>>> S zombie [not on stack] >>>>>>> zombie >>>>>>> S marked marked >>>>>>> S flushed flushed/re-allocated >>>>>>> >>>>>>> There are two problems here: >>>>>>> - the IC of A is not cleaned because B is not yet non-entrant in the first iteration of the sweeper and afterwards A becomes zombie itself, >>>>>>> - the transition from B to zombie happens outside the sweeper in 'CodeCache::make_marked_nmethods_zombies()' because the previous sweeper iteration already determined that the nmethod is not on the stack. >>>>>>> >>>>>>> The VM now crashes while flushing A because it still references B. Since B was replaced by a C2I adapter, we assume that A's IC is in the to-interpreter state and try to free a CompiledICHolder object which is actually Klass-Metadata for B. The detailed logs are below [1].
>>>>>>> >>>>>>> A similar problem occurs with nmethod unloading because unloaded nmethods transition directly to zombie: >>>>>>> >>>>>>> state of A state of B >>>>>>> ----------------------------------------- >>>>>>> unloaded unloaded >>>>>>> S zombie zombie >>>>>>> S marked marked >>>>>>> S flushed flushed/re-allocated >>>>>>> >>>>>>> Again, we crash while flushing A. >>>>>>> >>>>>>> Solution: >>>>>>> I removed the 'make_marked_nmethods_zombies()' and replaced it by calls to 'make_marked_nmethods_not_entrant()'. This avoids the non-entrant -> zombie transition outside of the sweeper. The only impact is that unloaded nmethods that are already non-entrant and not on the stack need another iteration of the sweeper to become zombie. I verified that this has no impact on performance. I also removed the code that was added by JDK-8059735 because now only the sweeper can set a nmethod to zombie. >>>>>>> >>>>>>> To fix the nmethod unloading case, I changed the implementation of CodeCache::gc_epilogue to clean ICs of unloaded nmethods as well. 
>>>>>>> >>>>>>> Testing: >>>>>>> - Executed failing tests for a week (still running) >>>>>>> - JPRT >>>>>>> - Performance (SPECjbb2005, SPECjbb2013, SPECjvm2008), no differences >>>>>>> >>>>>>> Thanks, >>>>>>> Tobias >>>>>>> >>>>>>> >>>>>>> [1] Detailed logs for nmethod A (1178) and nmethod B (552): >>>>>>> >>>>>>> Inline cache at 0xffff80ffad89b017, calling 0xffff80ffad1063c0 cached_value 0x0000000000000000 changing destination to 0xffff80ffad66ae20 changing cached metadata to 0x0000000800034a18 >>>>>>> IC at 0xffff80ffad89b017: monomorphic to compiled (rcvr klass) 'java/util/concurrent/ConcurrentHashMap': >>>>>>> ### IC at 0xffff80ffad89b017: set to Nmethod 552/ >>>>>>> ### Nmethod 1178/0xffff80ffad89ac50 (not entrant) being made zombie >>>>>>> ### Nmethod 552/0xffff80ffad66ac10 (not entrant) being made zombie from make_marked_nmethods_zombies() >>>>>>> ### Nmethod 552/0xffff80ffad66ac10 (zombie) being marked for reclamation >>>>>>> ### Nmethod 1178/0xffff80ffad89ac50 (zombie) being marked for reclamation >>>>>>> ### Nmethod 552/0xffff80ffad66ac10 (marked for reclamation) being flushed >>>>>>> *flushing nmethod 552/0xffff80ffad66ac10. Live blobs:2325/Free CodeCache:235986Kb >>>>>>> ### I2C/C2I adapter 0xffff80ffad66ac10 allocated >>>>>>> ### Nmethod 1178/0xffff80ffad89ac50 (marked for reclamation) being flushed >>>>>>> cleanup_call_site 0x0000000800034a18 to be freed, destination 0xffff80ffad66ae20 inline cache at 0xffff80ffad89b017 >>>>>>> enqueueing icholder 0x0000000800034a18 to be freed >>>>>>> *flushing nmethod 1178/0xffff80ffad89ac50. 
Live blobs:2346/Free CodeCache:235955Kb >>>>>>> deleting icholder 0x0000000800034a18 >>>>>>> ## nof_mallocs = 211209, nof_frees = 105760 >>>>>>> ## memory stomp: >>>>>>> GuardedMemory(0xffff80ff623ca180) base_addr=0x00000008000349f8 tag=0x0000000800034a18 user_size=18446604433140746280 user_data=0x0000000800034a18 >>>>>>> Header guard @0x00000008000349f8 is BROKEN >>>>>>> >>>> From vladimir.kozlov at oracle.com Wed Aug 26 15:41:14 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 26 Aug 2015 08:41:14 -0700 Subject: RFR: 8080293: AARCH64: Remove unnecessary dmbs from generated CAS code In-Reply-To: <55DD887B.5090403@redhat.com> References: <55DB2AC1.50208@redhat.com> <55DBB5BD.4040400@oracle.com> <55DC255F.7070903@redhat.com> <55DCA26B.3020508@oracle.com> <55DD887B.5090403@redhat.com> Message-ID: <55DDDE1A.7030008@oracle.com> Looks good. Thank you for clarification. Vladimir On 8/26/15 2:35 AM, Andrew Dinn wrote: > On 25/08/15 18:14, Vladimir Kozlov wrote: >> Okay, I agree to have only one predicate. So I am fine with version A). > > Thanks, Vladimir. So, that is now as provided in the latest posted webrev: > > http://cr.openjdk.java.net/~adinn/8080293/webrev.01/ > >> PS: "first rule will have a lower" - should compareAndSwapI be first then? > > Sorry, I think the problem here is that I explained the status of the > original patch in a rather confusing way. I am not sure it matters all > that much which rule appears first. Or do you really want the lower cost > rule to appear before the higher cost one? . . . > >>> However, looking again at the code I believe I have the costs (and hence >>> the predicates) attached to the wrong rules in each pair. 
For example, >>> currently the rules include the following details >>> >>> compareAndSwapIAcq -- does not emit dmb instructions >>> no predicate >>> cost (2 * VOLATILE_REF_COST ) >>> >>> compareAndSwapI -- emits dmb instructions >>> predicate(!needs_acquiring_load_exclusive(n)) >>> cost VOLATILE_REF_COST > > > . . . what I meant by that comment was that this: > > - The optimization implemented in this patch is based on an assumption > that a generation strategy using dmb -- i.e. the one encoded by > compareAndSwapI -- will execute more slowly, or at least no faster, than > a generation strategy using stlr -- i.e. the one encoded by > compareAndSwapIAcq. > > - The text above displays the original costs and predicates used to > enforce the required rule selection. > > - In that version the /costs/ are the wrong way round with respect to > the /motivating assumption/ i.e. compareAndSwapI has a lower cost than > compareAndSwapIAcq. > > In version A the costs reflect the motivating assumption i.e. for each > X in {I, L, P, N} rule compareAndSwapXAcq has a lower cost than > compareAndSwapX. > > However, it is also true that for each X in {I, L, P, N} rule > compareAndSwapX appears earlier than compareAndSwapXAcq. > > regards, > > > Andrew Dinn > ----------- > Senior Principal Software Engineer > Red Hat UK Ltd > Registered in UK and Wales under Company Registration No. 
3798903 > Directors: Michael Cunningham (USA), Matt Parson (USA), Charlie Peters > (USA), Michael O'Neill (Ireland) > From tobias.hartmann at oracle.com Wed Aug 26 16:14:45 2015 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Wed, 26 Aug 2015 18:14:45 +0200 Subject: [9] RFR(S): 8075805: Crash while trying to release CompiledICHolder In-Reply-To: <7C9B87B351A4BA4AA9EC95BB418116566AD5F983@DEWDFEMB19A.global.corp.sap> References: <55D7376E.90103@oracle.com> <55D75FC4.1040208@oracle.com> <55DACEB9.5070107@oracle.com> <55DC005B.9010109@oracle.com> <7C9B87B351A4BA4AA9EC95BB418116566AD5F743@DEWDFEMB19A.global.corp.sap> <55DC6503.20301@oracle.com> <55DC7CE1.90805@oracle.com> <7C9B87B351A4BA4AA9EC95BB418116566AD5F7EB@DEWDFEMB19A.global.corp.sap> <55DD9A90.3010802@oracle.com> <7C9B87B351A4BA4AA9EC95BB418116566AD5F983@DEWDFEMB19A.global.corp.sap> Message-ID: <55DDE5F5.4020407@oracle.com> Hi Martin, On 26.08.2015 16:52, Doerr, Martin wrote: > 2. This happened before we made the fix. I just pasted it for the case you run into crashes when accessing db->is_adapter_blob(). If the new fix is working correctly, this one may no longer be needed. Okay, I think it's no longer necessary. > 3. The crash in is_call_to_compiled() happened after creating a transition stub with the following stack: > libjvm::CompiledIC::is_call_to_compiled() > libjvm::CompiledIC::verify() > libjvm::SharedRuntime::handle_ic_miss_helper(JavaThread*, Thread*) > libjvm::SharedRuntime::handle_wrong_method_ic_miss(JavaThread*) > > It's hard to find out what exactly went wrong, but I don't trust > cb = CodeCache::find_blob_unsafe() > cb->is_nmethod() Okay, interesting. I think it should be safe to call CodeCache::find_blob_unsafe() because the VtableStub is just a BufferBlob in the code cache and cb->is_nmethod() should return false. > 4. Assume the destination method has just reached zombie state via unloaded state. At this point of time, only the inline caches of the destination method were cleaned. 
> The caller method may still be in unloaded state with the inline cache pointing to this destination method. > This situation can happen because the gc_epilogue doesn't clean the inline caches of unloaded methods. > Your first webrev prevents this situation. The new one deals with it by cleaning them when the caller method becomes a zombie (which is the case in which the assertion needs to allow this). Right, but we only call nmethod::verify_clean_inline_caches() for alive nmethods (see CodeCache::verify_clean_inline_caches()), so this should not be possible. > Hope this helps a little bit. I guess nobody can answer all inline cache related questions ;-) Sure, thanks a lot! Best, Tobias > > Best regards, > Martin > > > -----Original Message----- > From: Tobias Hartmann [mailto:tobias.hartmann at oracle.com] > Sent: Mittwoch, 26. August 2015 12:53 > To: Doerr, Martin; Igor Veresov > Cc: Vladimir Kozlov; hotspot-compiler-dev at openjdk.java.net > Subject: Re: [9] RFR(S): 8075805: Crash while trying to release CompiledICHolder > > Hi Martin, > > thanks for the hints, please see comments inline. > > On 25.08.2015 17:01, Doerr, Martin wrote: >> thanks for your quick response. Unfortunately, we neither have a good reproduction case nor a regression test which is actually the reason why we did not post this earlier. >> We had observed very sporadic assertions or freeing of unallocated memory. > > Okay, same here. > >> Basically, I believe that cleaning the inline caches before transitioning from unloaded to zombie is the right thing. However, there's still the problem that it's hard to test. >> >> Additionally, it may be required to adapt a couple of assertions. >> We modified the following ASSERT code (based on hotspot 25) >> 1. in CompiledIC::is_call_to_compiled(): >> Accessing cached_metadata() may be unsafe if !caller->is_alive(). 
> > I assume this could be a problem in the following case: > > state of A state of B > ------------------------------- > not-entrant > S [not-on-stack] > S zombie > unloaded > > Now the IC of A still references the unloaded nmethod B and is_call_to_compiled() will access the unloaded metadata. Right? > > I added an assert to cached_metadata() to make sure we don't access metadata that belongs to unloaded nmethods. > >> 2. in CompiledIC::is_call_to_interpreted(): >> CodeBlob* db may be gone if cb is unloaded. > > How can that happen? > > I think if db is really flushed, it may as well be replaced by another allocation (for example, an adapter blob) and then your if(!db) check returns false and still triggers the assert(!db->is_adapter_blob(), ...). > >> 3. in CompiledIC::verify(): >> is_call_to_compiled() has crashed. Seems to be unsafe in megamorphic case so we changed the order of the checks. > > I don't see why is_call_to_compiled() is unsafe in the megamorphic case. Could you explain that? > >> 4. in nmethod::verify_clean_inline_caches(): >> In case of relocInfo::virtual_call_type CompiledIC may still point to a zombie method. > > How can that be? After method unloading all ICs of live nmethods were cleaned in CodeCache::gc_epilogue(). Transitions to zombie in the sweeper also lead to IC cleaning of other nmethods. > > Thanks, > Tobias > >> For details, please see below. >> >> Best regards, >> Martin >> >> >> >> 1. >> assert( is_c1_method || >> !is_monomorphic || >> is_optimized() || >> !caller->is_alive() || >> (cached_metadata() != NULL && cached_metadata()->is_klass()), "sanity check"); >> >> 2. >> #ifdef ASSERT >> { >> CodeBlob* db = CodeCache::find_blob_unsafe(dest); >> if (!db) { >> nmethod *nm = cb->as_nmethod_or_null(); >> assert(nm, "sanity"); >> if ( nm->is_in_use() || >> (nm->is_not_entrant() && (!SafepointSynchronize::is_at_safepoint() || !nm->is_marked_for_deoptimization())) ) { >> { // Dump some information. 
>> ttyLocker ttyl; >> tty->print_cr("ERROR: Did not find codeblob for destination %p", dest); >> nm->print(tty); >> Method *m = nm->method(); >> if (m) { >> m->print_on(tty); >> } >> } >> assert(false, err_msg("nmethod is in state %d but destination blob is gone", (int)(nm->state()))); >> } >> } else >> assert(!db->is_adapter_blob(), "must use stub!"); >> } >> #endif /* ASSERT */ >> >> 3. >> assert(is_clean() >> || is_optimized() || is_megamorphic() >> || is_call_to_compiled() || is_call_to_interpreted() >> , "sanity check"); >> >> 4. >> case relocInfo::virtual_call_type: { >> CompiledIC *ic = CompiledIC_at(&iter); >> // Ok, to lookup references to zombies here >> CodeBlob *cb = CodeCache::find_blob_unsafe(ic->ic_destination()); >> if( cb != NULL && cb->is_nmethod() ) { >> nmethod* nm = (nmethod*)cb; >> // Verify that inline caches pointing to not_entrant methods are clean >> if (nm->is_not_entrant()) { >> assert(ic->is_clean(), "IC should be clean"); >> } >> } >> break; >> } >> case relocInfo::opt_virtual_call_type: { >> >> >> -----Original Message----- >> From: Tobias Hartmann [mailto:tobias.hartmann at oracle.com] >> Sent: Dienstag, 25. August 2015 16:34 >> To: Doerr, Martin; Igor Veresov >> Cc: Vladimir Kozlov; hotspot-compiler-dev at openjdk.java.net >> Subject: Re: [9] RFR(S): 8075805: Crash while trying to release CompiledICHolder >> >> I missed that we have to be careful when cleaning ICs of a zombie nmethod to not create transition stubs because backpatching of those is unnecessary and fails with "unsafe access to zombie method". I changed the code to not create a transition stub if the corresponding nmethod is dead and updated the webrev in place. >> >> Best, >> Tobias >> >> On 25.08.2015 14:52, Tobias Hartmann wrote: >>> Hi Martin, >>> >>> thanks for looking at this! >>> >>> It's interesting that you encountered and fixed the same problem. Were you able to create a regression test? 
>>> >>> I did not evaluate the impact on the safepoint duration but you are right that it may be affected. I talked to Mikael Gerdin from the GC team and he told me that they had issues before because the scanning of nmethods and their relocations took a significant amount of time. >>> >>> I would therefore like to go with the alternative solution in the case of unloaded nmethods, i.e., cleaning their ICs in the sweeper. Unfortunately, I already pushed the fix so here is the incremental webrev: >>> http://cr.openjdk.java.net/~thartmann/8075805/webrev.02/ >>> >>> I left the fix for the nmethods that are marked-for-deoptimization as it is, because it avoids cleaning the IC's of all zombie nmethods and simplifies the possible state transitions. >>> >>> If you guys are fine with the change, I open a new bug/enhancement and send out a separate RFR. >>> >>> Thanks, >>> Tobias >>> >>> On 25.08.2015 12:34, Doerr, Martin wrote: >>>> Hi all, >>>> >>>> we appreciate that this code gets cleaned up. >>>> >>>> Iterating over all nmethods in gc_epilogue should fix the problem. Did anybody check the impact on the safepoint duration? >>>> We have also fixed this problem. We use the other approach and added following code to nmethod::make_not_entrant_or_zombie: >>>> >>>> if (state == zombie) { >>>> MutexLockerEx ml(SafepointSynchronize::is_at_safepoint() ? NULL : CompiledIC_lock); >>>> address low_boundary = verified_entry_point () + NativeJump::instruction_size; // See cleanup_inline_caches. >>>> RelocIterator iter(this, low_boundary); >>>> while (iter.next()) { >>>> if (iter.type() == relocInfo::virtual_call_type) { >>>> CompiledIC *ic = CompiledIC_at(&iter); >>>> ic->set_ic_destination_and_value(SharedRuntime::get_resolve_virtual_call_stub(), (Metadata*)NULL); >>>> } >>>> } >>>> } >>>> >>>> (Note: set_ic_destination_and_value is currently private.) >>>> >>>> As discussed in earlier emails, this also fixes the problem. 
An advantage is that this approach does the job in a concurrent phase without impacting the safepoint duration.
>>>> Not sure which approach is the better one.
>>>>
>>>> Best regards,
>>>> Martin
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Tobias Hartmann
>>>> Sent: Dienstag, 25. August 2015 07:43
>>>> To: Igor Veresov
>>>> Cc: Vladimir Kozlov; hotspot-compiler-dev at openjdk.java.net
>>>> Subject: Re: [9] RFR(S): 8075805: Crash while trying to release CompiledICHolder
>>>>
>>>> On 24.08.2015 22:10, Igor Veresov wrote:
>>>>> Seems good to me.
>>>>
>>>> Thanks, Igor.
>>>>
>>>>> Btw, did you find why there is a need for "marked for reclamation" state?
>>>>
>>>> No, I couldn't find a reason yet. I did some testing without this state and didn't run into any obvious problems. I'll file a bug and further investigate. It would be nice if we could save this transition.
>>>>
>>>> Best,
>>>> Tobias
>>>>
>>>>>
>>>>> igor
>>>>>
>>>>>> On Aug 24, 2015, at 12:58 AM, Tobias Hartmann wrote:
>>>>>>
>>>>>> Thanks, Vladimir! Please see comments inline.
>>>>>>
>>>>>> On 21.08.2015 19:28, Vladimir Kozlov wrote:
>>>>>>> During our discussion about this problem we thought that we may need additional call nm->cleanup_inline_caches() by sweeper when we convert not_entrant to zombie to prevent zombie pointing to other nmethods. You think we don't need it?
>>>>>>
>>>>>> I looked at all possible nmethod transitions and came to the conclusion that the problem is only possible if there is a direct transition from non-entrant to zombie without a sweeper cycle in-between.
>>>>>>
>>>>>> The solution we discussed, i.e., always cleaning ICs for the non-entrant -> zombie transition, would fix the problem as well but would be more invasive than the proposed solution because it affects all nmethods.
>>>>>> >>>>>>> Please, clarify changes in states transition of unloaded nmethods - "The only impact is that unloaded nmethods that are already non-entrant and not on the stack need another iteration of the sweeper to become zombie." >>>>>>> I don't see how changing make_marked_nmethods_zombies() call to make_marked_nmethods_not_entrant() affects unloaded nmethods. Both make_*() methods iterates only over is_alive() nmethods (iter.next_alive()), so they skip unloaded. >>>>>> >>>>>> Yes, my description is wrong. It should be "The only impact is that _marked_ nmethods that are already non-entrant and not on the stack need another iteration of the sweeper to become zombie." The change does not affect the state transitions of unloaded nmethods. >>>>>> >>>>>> In other words, the change removes the shortcut that allowed nmethods that were marked for deoptimization to be converted to zombie at a safepoint if they were already non-entrant and not on the stack before. These nmethods now need an additional sweeper cycle to be converted to zombie. >>>>>> >>>>>>> What spacing you changed in compiledIC.cpp because webrev does not show them? >>>>>> >>>>>> I fixed the wrong indentation in CompiledIC::set_to_clean(). You can see it in the webrev if you click on "Patch": >>>>>> http://cr.openjdk.java.net/~thartmann/8075805/webrev.00/src/share/vm/code/compiledIC.cpp.patch >>>>>> >>>>>>> Please, fix comment in vm_operations.cpp >>>>>>> >>>>>>> // Make the dependent methods zombies >>>>>>> - CodeCache::make_marked_nmethods_zombies(); >>>>>>> + CodeCache::make_marked_nmethods_not_entrant(); >>>>>> >>>>>> Fixed. >>>>>> >>>>>> New webrev: >>>>>> http://cr.openjdk.java.net/~thartmann/8075805/webrev.01/ >>>>>> >>>>>> Thanks, >>>>>> Tobias >>>>>> >>>>>>> >>>>>>> Thanks, >>>>>>> Vladimir >>>>>>> >>>>>>> On 8/21/15 7:36 AM, Tobias Hartmann wrote: >>>>>>>> Hi, >>>>>>>> >>>>>>>> please review the following patch. 
>>>>>>>> >>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8075805 >>>>>>>> http://cr.openjdk.java.net/~thartmann/8075805/webrev.00/ >>>>>>>> >>>>>>>> Problem: >>>>>>>> The VM crashes at a safepoint while trying to free a CompiledICHolder object that was enqueued for release after flushing a nmethod. The crash happens because the object is not a CompiledICHolder but Metadata which should not be removed. The problem is that at nmethod release, "CompiledIC::is_icholder_entry" is used to determine if the ICs of that nmethod still reference CompiledICHolder objects and if so, those objects are enqueued for release at the next safepoint. The method returns true if the destination of the IC is a C2I adapter, assuming that in this case the IC is in the to-interpreter state and the cached value must be a CompiledICHolder object. However, there are very rare cases where the IC is actually in the to-compiled state but the destination nmethod was already flushed and replaced by another allocation. Since the IC is still pointing to the same address in the code cache, the state of the IC is confused. >>>>>>>> >>>>>>>> Cleaning of inline caches that point to dead nmethods should prevent this. However, we do not clean ICs of nmethods that are converted to zombie themselves. Usually, that's okay because a zombie nmethod will be flushed before any dead nmethod it references. This is guaranteed because each nmethod goes through the states alive -> non-entrant -> zombie -> marked-for-reclamation before being flushed. >>>>>>>> >>>>>>>> Suppose we have two nmethods A and B, where A references B through an IC and B is always processed first by the sweeper. The following table shows the state transitions from top to bottom where lines marked with "S" show a transition in the corresponding iteration of the sweeper. 
>>>>>>>>
>>>>>>>>       state of A       state of B
>>>>>>>> -----------------------------------------
>>>>>>>>       non-entrant      non-entrant
>>>>>>>> S     [not on stack]   [not on stack]
>>>>>>>> S     zombie           zombie
>>>>>>>> S     marked           marked
>>>>>>>> S     flushed          flushed/re-allocated
>>>>>>>>
>>>>>>>> The IC of A will be cleaned in the first sweeper cycle because B is non-entrant so we don't need to clean ICs again if A is converted to zombie.
>>>>>>>>
>>>>>>>> Let's look at the following setting:
>>>>>>>>
>>>>>>>>       state of A       state of B
>>>>>>>> -----------------------------------------
>>>>>>>>       non-entrant
>>>>>>>> S     [not on stack]
>>>>>>>>                        non-entrant
>>>>>>>> S     zombie           [not on stack]
>>>>>>>>                        zombie
>>>>>>>> S     marked           marked
>>>>>>>> S     flushed          flushed/re-allocated
>>>>>>>>
>>>>>>>> There are two problems here:
>>>>>>>> - the IC of A is not cleaned because B is not yet non-entrant in the first iteration of the sweeper and afterwards A becomes zombie itself,
>>>>>>>> - the transition from B to zombie happens outside the sweeper in 'CodeCache::make_marked_nmethods_zombies()' because the previous sweeper iteration already determined that the nmethod is not on the stack.
>>>>>>>>
>>>>>>>> The VM now crashes while flushing A because it still references B. Since B was replaced by a C2I adapter, we assume that A's IC is in the to-interpreter state and try to free a CompiledICHolder object which is actually Klass-Metadata for B. The detailed logs are below [1].
>>>>>>>>
>>>>>>>> A similar problem occurs with nmethod unloading because unloaded nmethods transition directly to zombie:
>>>>>>>>
>>>>>>>>       state of A       state of B
>>>>>>>> -----------------------------------------
>>>>>>>>       unloaded         unloaded
>>>>>>>> S     zombie           zombie
>>>>>>>> S     marked           marked
>>>>>>>> S     flushed          flushed/re-allocated
>>>>>>>>
>>>>>>>> Again, we crash while flushing A.
>>>>>>>>
>>>>>>>> Solution:
>>>>>>>> I removed the 'make_marked_nmethods_zombies()' and replaced it by calls to 'make_marked_nmethods_not_entrant()'.
This avoids the non-entrant -> zombie transition outside of the sweeper. The only impact is that unloaded nmethods that are already non-entrant and not on the stack need another iteration of the sweeper to become zombie. I verified that this has no impact on performance. I also removed the code that was added by JDK-8059735 because now only the sweeper can set a nmethod to zombie. >>>>>>>> >>>>>>>> To fix the nmethod unloading case, I changed the implementation of CodeCache::gc_epilogue to clean ICs of unloaded nmethods as well. >>>>>>>> >>>>>>>> Testing: >>>>>>>> - Executed failing tests for a week (still running) >>>>>>>> - JPRT >>>>>>>> - Performance (SPECjbb2005, SPECjbb2013, SPECjvm2008), no differences >>>>>>>> >>>>>>>> Thanks, >>>>>>>> Tobias >>>>>>>> >>>>>>>> >>>>>>>> [1] Detailed logs for nmethod A (1178) and nmethod B (552): >>>>>>>> >>>>>>>> Inline cache at 0xffff80ffad89b017, calling 0xffff80ffad1063c0 cached_value 0x0000000000000000 changing destination to 0xffff80ffad66ae20 changing cached metadata to 0x0000000800034a18 >>>>>>>> IC at 0xffff80ffad89b017: monomorphic to compiled (rcvr klass) 'java/util/concurrent/ConcurrentHashMap': >>>>>>>> ### IC at 0xffff80ffad89b017: set to Nmethod 552/ >>>>>>>> ### Nmethod 1178/0xffff80ffad89ac50 (not entrant) being made zombie >>>>>>>> ### Nmethod 552/0xffff80ffad66ac10 (not entrant) being made zombie from make_marked_nmethods_zombies() >>>>>>>> ### Nmethod 552/0xffff80ffad66ac10 (zombie) being marked for reclamation >>>>>>>> ### Nmethod 1178/0xffff80ffad89ac50 (zombie) being marked for reclamation >>>>>>>> ### Nmethod 552/0xffff80ffad66ac10 (marked for reclamation) being flushed >>>>>>>> *flushing nmethod 552/0xffff80ffad66ac10. 
Live blobs:2325/Free CodeCache:235986Kb >>>>>>>> ### I2C/C2I adapter 0xffff80ffad66ac10 allocated >>>>>>>> ### Nmethod 1178/0xffff80ffad89ac50 (marked for reclamation) being flushed >>>>>>>> cleanup_call_site 0x0000000800034a18 to be freed, destination 0xffff80ffad66ae20 inline cache at 0xffff80ffad89b017 >>>>>>>> enqueueing icholder 0x0000000800034a18 to be freed >>>>>>>> *flushing nmethod 1178/0xffff80ffad89ac50. Live blobs:2346/Free CodeCache:235955Kb >>>>>>>> deleting icholder 0x0000000800034a18 >>>>>>>> ## nof_mallocs = 211209, nof_frees = 105760 >>>>>>>> ## memory stomp: >>>>>>>> GuardedMemory(0xffff80ff623ca180) base_addr=0x00000008000349f8 tag=0x0000000800034a18 user_size=18446604433140746280 user_data=0x0000000800034a18 >>>>>>>> Header guard @0x00000008000349f8 is BROKEN >>>>>>>> >>>>> From vladimir.kozlov at oracle.com Wed Aug 26 16:27:52 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 26 Aug 2015 09:27:52 -0700 Subject: 8134321: tools/pack200/Pack200Test.java crashes in the VM PIT jdk9 b79 In-Reply-To: <7BEE1D97-997F-44DD-B3CC-63122C6FD50A@oracle.com> References: <7BEE1D97-997F-44DD-B3CC-63122C6FD50A@oracle.com> Message-ID: <55DDE908.60300@oracle.com> Reasonable fix. Looks good. Thanks, Vladimir On 8/26/15 6:52 AM, Roland Westrelin wrote: > http://cr.openjdk.java.net/~roland/8134321/webrev.00/ > > At a safepoint, when we capture the field values for an object whose allocation is about to be eliminated, if one of these field values is behind a Phi and on one branch of the Phi the field value may be written to by arraycopy but we don't know for sure, then we should not eliminate the allocation but with the current code we simply proceed with a broken field value. > > Roland. 
>

From adinn at redhat.com Wed Aug 26 16:32:58 2015
From: adinn at redhat.com (Andrew Dinn)
Date: Wed, 26 Aug 2015 17:32:58 +0100
Subject: RFR: AArch64: 8134322: Fix several errors in C2 biased locking implementation
In-Reply-To: <55DB3754.9090504@redhat.com>
References: <55D79701.9090407@oracle.com> <55DB3754.9090504@redhat.com>
Message-ID: <55DDEA3A.8070603@redhat.com>

The following AArch64-only webrev against hs-comp (fix contributed by Hui Sha of Linaro) fixes several problems with biased locking on AArch64

http://cr.openjdk.java.net/~adinn/8134322/webrev.00/

I have reviewed the patch. It requires one more AArch64 reviewer who can also commit it to hs-comp (Andrew Haley?).

When I tested it running netbeans with biased locking enabled it seemed (just by feel) to improve performance. A follow up to enable biased locking by default might be worth investigating.

n.b. I built this on hs-comp on top of my (almost but not yet committed) patch for 8080293 so that patch also gets shown in the webrev. The two patches are independent and should both commit cleanly with or without the other.

From vladimir.kozlov at oracle.com Wed Aug 26 16:33:12 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Wed, 26 Aug 2015 09:33:12 -0700
Subject: RFR(XS): 8134288 compiler/runtime/6859338/Test6859338.java crashes in PhaseIdealLoop::try_move_store_after_loop
In-Reply-To: <81A5D258-6987-4B13-8F99-9B13FB4C9C7D@oracle.com>
References: <81A5D258-6987-4B13-8F99-9B13FB4C9C7D@oracle.com>
Message-ID: <55DDEA48.8060601@oracle.com>

Looks fine.

Thanks,
Vladimir

On 8/26/15 4:31 AM, Roland Westrelin wrote:
> http://cr.openjdk.java.net/~roland/8134288/webrev.00/
>
> Stores from code generated by c2 to update profiling (profile_taken_branch() called from Parse::do_if() if ProfileInterpreter is off) don't have a control. This looks like a corner case so I went for the simplest fix and excluded stores with no controls from the logic that tries to move stores out of loops.
>
> Roland.
>

From aph at redhat.com Wed Aug 26 16:40:34 2015
From: aph at redhat.com (Andrew Haley)
Date: Wed, 26 Aug 2015 17:40:34 +0100
Subject: [aarch64-port-dev ] RFR: AArch64: 8134322: Fix several errors in C2 biased locking implementation
In-Reply-To: <55DDEA3A.8070603@redhat.com>
References: <55D79701.9090407@oracle.com> <55DB3754.9090504@redhat.com> <55DDEA3A.8070603@redhat.com>
Message-ID: <55DDEC02.6080502@redhat.com>

On 08/26/2015 05:32 PM, Andrew Dinn wrote:
> The following AArch64-only webrev against hs-comp (fix contributed by
> Hui Sha of Linaro) fixes several problems with biased locking on AArch64
>
> http://cr.openjdk.java.net/~adinn/8134322/webrev.00/
>
> I have reviewed the patch. It requires one more AArch64 reviewer who can
> also commit it to hs-comp (Andrew Haley?).

This is OK.

Andrew.

From roland.westrelin at oracle.com Wed Aug 26 18:36:28 2015
From: roland.westrelin at oracle.com (Roland Westrelin)
Date: Wed, 26 Aug 2015 20:36:28 +0200
Subject: 8134321: tools/pack200/Pack200Test.java crashes in the VM PIT jdk9 b79
In-Reply-To: <55DDE908.60300@oracle.com>
References: <7BEE1D97-997F-44DD-B3CC-63122C6FD50A@oracle.com> <55DDE908.60300@oracle.com>
Message-ID:

Thanks for the review, Vladimir.

Roland.

> On Aug 26, 2015, at 6:27 PM, Vladimir Kozlov wrote:
>
> Reasonable fix. Looks good.
>
> Thanks,
> Vladimir
>
> On 8/26/15 6:52 AM, Roland Westrelin wrote:
>> http://cr.openjdk.java.net/~roland/8134321/webrev.00/
>>
>> At a safepoint, when we capture the field values for an object whose allocation is about to be eliminated, if one of these field values is behind a Phi and on one branch of the Phi the field value may be written to by arraycopy but we don't know for sure, then we should not eliminate the allocation but with the current code we simply proceed with a broken field value.
>>
>> Roland.
>>

From tobias.hartmann at oracle.com Thu Aug 27 07:10:24 2015
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Thu, 27 Aug 2015 09:10:24 +0200
Subject: [9] RFR(S): 8134493: Cleaning inline caches of unloaded nmethods should be done in sweeper
Message-ID: <55DEB7E0.6040002@oracle.com>

Hi,

please review the following patch.

https://bugs.openjdk.java.net/browse/JDK-8134493
http://cr.openjdk.java.net/~thartmann/8134493/webrev.00/

Problem:
This is a follow up on JDK-8075805 [1] which modified CodeCache::gc_epilogue() to clean the ICs of unloaded nmethods as well. The problem is that this code is executed at a safepoint and may affect safepoint duration. The other changes of JDK-8075805 are fine.

Solution:
We do the cleaning of unloaded nmethods at the unloaded -> zombie transition in the sweeper. I also modified nmethod::cleanup_inline_caches() to not emit any transition stubs if the nmethod is already dead.

As Martin Doerr pointed out in another thread, we have to be careful with accessing CompiledIC::cached_metadata() of unloaded nmethods. For example, the following scenario may happen (IC of A references B):

  state of A       state of B
-------------------------------
  not-entrant
S [not-on-stack]
S zombie
                   unloaded

Now the IC of A still references the unloaded nmethod B and is_call_to_compiled() will access the unloaded metadata. I fixed this by checking caller->is_alive().
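
As a rough illustration of the idea in the solution above, here is a toy model of cleaning inline caches at the unloaded -> zombie transition. This is not HotSpot code: ToyNMethod, ToyIC, sweeper_make_zombie() and count_dangling_ics() are hypothetical names made up for this sketch, and the state set is simplified to four states. It only shows why cleaning at the transition leaves no IC pointing at a dead callee.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model only; every type and function here is hypothetical, not HotSpot API.
enum class State { InUse, NotEntrant, Unloaded, Zombie };

struct ToyNMethod;

// A toy inline cache: a call site that may point at a callee nmethod.
struct ToyIC {
  ToyNMethod* destination = nullptr;  // nullptr models a "clean" IC
  bool is_clean() const { return destination == nullptr; }
  void set_to_clean() { destination = nullptr; }
};

struct ToyNMethod {
  State state = State::InUse;
  std::vector<ToyIC> ics;

  // In this model only in-use and not-entrant nmethods count as alive.
  bool is_alive() const {
    return state == State::InUse || state == State::NotEntrant;
  }

  // Resets every IC of this nmethod so that none references another nmethod.
  void cleanup_inline_caches() {
    for (ToyIC& ic : ics) ic.set_to_clean();
  }

  // Models the sweeper step: an unloaded nmethod has its ICs cleaned
  // before it becomes a zombie.
  void sweeper_make_zombie() {
    if (state == State::Unloaded) cleanup_inline_caches();
    state = State::Zombie;
  }
};

// Counts ICs of 'caller' that still point at a callee that is not alive.
std::size_t count_dangling_ics(const ToyNMethod& caller) {
  std::size_t n = 0;
  for (const ToyIC& ic : caller.ics) {
    if (!ic.is_clean() && !ic.destination->is_alive()) ++n;
  }
  return n;
}
```

In this model, a caller A whose IC points at an unloaded callee B has one dangling IC while A is itself unloaded, and none after A.sweeper_make_zombie(), so a later flush of A would never look at B's (possibly re-allocated) address.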
Thanks, Tobias [1] https://bugs.openjdk.java.net/browse/JDK-8075805 From martin.doerr at sap.com Thu Aug 27 09:17:19 2015 From: martin.doerr at sap.com (Doerr, Martin) Date: Thu, 27 Aug 2015 09:17:19 +0000 Subject: [9] RFR(S): 8075805: Crash while trying to release CompiledICHolder In-Reply-To: <55DDE5F5.4020407@oracle.com> References: <55D7376E.90103@oracle.com> <55D75FC4.1040208@oracle.com> <55DACEB9.5070107@oracle.com> <55DC005B.9010109@oracle.com> <7C9B87B351A4BA4AA9EC95BB418116566AD5F743@DEWDFEMB19A.global.corp.sap> <55DC6503.20301@oracle.com> <55DC7CE1.90805@oracle.com> <7C9B87B351A4BA4AA9EC95BB418116566AD5F7EB@DEWDFEMB19A.global.corp.sap> <55DD9A90.3010802@oracle.com> <7C9B87B351A4BA4AA9EC95BB418116566AD5F983@DEWDFEMB19A.global.corp.sap> <55DDE5F5.4020407@oracle.com> Message-ID: <7C9B87B351A4BA4AA9EC95BB418116566AD5FABA@DEWDFEMB19A.global.corp.sap> Hi Tobias, about 3.: It should normally work as you described, yes. I guess the inline cache destination was a dangling pointer and find_blob_unsafe found something else. about 4.: Right, the assertion must have been hit before the caller transitioned to unloaded state. Observed on SPARC: assert(ic->is_clean()) failed: IC should be clean Stack: void nmethod::verify_clean_inline_caches() void CodeCache::verify_clean_inline_caches() G1CodeCacheUnloadingTask::~G1CodeCacheUnloadingTask #Nvariant 1() void G1CollectedHeap::parallel_cleaning(BoolObjectClosure*,bool,bool,bool) void ConcurrentMark::weakRefsWorkParallelPart(BoolObjectClosure*,bool) void ConcurrentMark::weakRefsWork(bool) void ConcurrentMark::checkpointRootsFinal(bool) void CMCheckpointRootsFinalClosure::do_void() void VM_CGC_Operation::doit() void VM_Operation::evaluate() void VMThread::evaluate_operation(VM_Operation*) void VMThread::loop() void VMThread::run() Best regards, Martin -----Original Message----- From: Tobias Hartmann [mailto:tobias.hartmann at oracle.com] Sent: Mittwoch, 26. 
August 2015 18:15 To: Doerr, Martin; Igor Veresov Cc: Vladimir Kozlov; hotspot-compiler-dev at openjdk.java.net Subject: Re: [9] RFR(S): 8075805: Crash while trying to release CompiledICHolder Hi Martin, On 26.08.2015 16:52, Doerr, Martin wrote: > 2. This happened before we made the fix. I just pasted it for the case you run into crashes when accessing db->is_adapter_blob(). If the new fix is working correctly, this one may no longer be needed. Okay, I think it's no longer necessary. > 3. The crash in is_call_to_compiled() happened after creating a transition stub with the following stack: > libjvm::CompiledIC::is_call_to_compiled() > libjvm::CompiledIC::verify() > libjvm::SharedRuntime::handle_ic_miss_helper(JavaThread*, Thread*) > libjvm::SharedRuntime::handle_wrong_method_ic_miss(JavaThread*) > > It's hard to find out what exactly went wrong, but I don't trust > cb = CodeCache::find_blob_unsafe() > cb->is_nmethod() Okay, interesting. I think it should be safe to call CodeCache::find_blob_unsafe() because the VtableStub is just a BufferBlob in the code cache and cb->is_nmethod() should return false. > 4. Assume the destination method has just reached zombie state via unloaded state. At this point of time, only the inline caches of the destination method were cleaned. > The caller method may still be in unloaded state with the inline cache pointing to this destination method. > This situation can happen because the gc_epilogue doesn't clean the inline caches of unloaded methods. > Your first webrev prevents this situation. The new one deals with it by cleaning them when the caller method becomes a zombie (which is the case in which the assertion needs to allow this). Right, but we only call nmethod::verify_clean_inline_caches() for alive nmethods (see CodeCache::verify_clean_inline_caches()), so this should not be possible. > Hope this helps a little bit. I guess nobody can answer all inline cache related questions ;-) Sure, thanks a lot! 
Best, Tobias > > Best regards, > Martin > > > -----Original Message----- > From: Tobias Hartmann [mailto:tobias.hartmann at oracle.com] > Sent: Mittwoch, 26. August 2015 12:53 > To: Doerr, Martin; Igor Veresov > Cc: Vladimir Kozlov; hotspot-compiler-dev at openjdk.java.net > Subject: Re: [9] RFR(S): 8075805: Crash while trying to release CompiledICHolder > > Hi Martin, > > thanks for the hints, please see comments inline. > > On 25.08.2015 17:01, Doerr, Martin wrote: >> thanks for your quick response. Unfortunately, we neither have a good reproduction case nor a regression test which is actually the reason why we did not post this earlier. >> We had observed very sporadic assertions or freeing of unallocated memory. > > Okay, same here. > >> Basically, I believe that cleaning the inline caches before transitioning from unloaded to zombie is the right thing. However, there's still the problem that it's hard to test. >> >> Additionally, it may be required to adapt a couple of assertions. >> We modified the following ASSERT code (based on hotspot 25) >> 1. in CompiledIC::is_call_to_compiled(): >> Accessing cached_metadata() may be unsafe if !caller->is_alive(). > > I assume this could be a problem in the following case: > > state of A state of B > ------------------------------- > not-entrant > S [not-on-stack] > S zombie > unloaded > > Now the IC of A still references the unloaded nmethod B and is_call_to_compiled() will access the unloaded metadata. Right? > > I added an assert to cached_metadata() to make sure we don't access metadata that belongs to unloaded nmethods. > >> 2. in CompiledIC::is_call_to_interpreted(): >> CodeBlob* db may be gone if cb is unloaded. > > How can that happen? > > I think if db is really flushed, it may as well be replaced by another allocation (for example, an adapter blob) and then your if(!db) check returns false and still triggers the assert(!db->is_adapter_blob(), ...). > >> 3. 
in CompiledIC::verify(): >> is_call_to_compiled() has crashed. Seems to be unsafe in megamorphic case so we changed the order of the checks. > > I don't see why is_call_to_compiled() is unsafe in the megamorphic case. Could you explain that? > >> 4. in nmethod::verify_clean_inline_caches(): >> In case of relocInfo::virtual_call_type CompiledIC may still point to a zombie method. > > How can that be? After method unloading all ICs of live nmethods were cleaned in CodeCache::gc_epilogue(). Transitions to zombie in the sweeper also lead to IC cleaning of other nmethods. > > Thanks, > Tobias > >> For details, please see below. >> >> Best regards, >> Martin >> >> >> >> 1. >> assert( is_c1_method || >> !is_monomorphic || >> is_optimized() || >> !caller->is_alive() || >> (cached_metadata() != NULL && cached_metadata()->is_klass()), "sanity check"); >> >> 2. >> #ifdef ASSERT >> { >> CodeBlob* db = CodeCache::find_blob_unsafe(dest); >> if (!db) { >> nmethod *nm = cb->as_nmethod_or_null(); >> assert(nm, "sanity"); >> if ( nm->is_in_use() || >> (nm->is_not_entrant() && (!SafepointSynchronize::is_at_safepoint() || !nm->is_marked_for_deoptimization())) ) { >> { // Dump some information. >> ttyLocker ttyl; >> tty->print_cr("ERROR: Did not find codeblob for destination %p", dest); >> nm->print(tty); >> Method *m = nm->method(); >> if (m) { >> m->print_on(tty); >> } >> } >> assert(false, err_msg("nmethod is in state %d but destination blob is gone", (int)(nm->state()))); >> } >> } else >> assert(!db->is_adapter_blob(), "must use stub!"); >> } >> #endif /* ASSERT */ >> >> 3. >> assert(is_clean() >> || is_optimized() || is_megamorphic() >> || is_call_to_compiled() || is_call_to_interpreted() >> , "sanity check"); >> >> 4. 
>> case relocInfo::virtual_call_type: { >> CompiledIC *ic = CompiledIC_at(&iter); >> // Ok, to lookup references to zombies here >> CodeBlob *cb = CodeCache::find_blob_unsafe(ic->ic_destination()); >> if( cb != NULL && cb->is_nmethod() ) { >> nmethod* nm = (nmethod*)cb; >> // Verify that inline caches pointing to not_entrant methods are clean >> if (nm->is_not_entrant()) { >> assert(ic->is_clean(), "IC should be clean"); >> } >> } >> break; >> } >> case relocInfo::opt_virtual_call_type: { >> >> >> -----Original Message----- >> From: Tobias Hartmann [mailto:tobias.hartmann at oracle.com] >> Sent: Dienstag, 25. August 2015 16:34 >> To: Doerr, Martin; Igor Veresov >> Cc: Vladimir Kozlov; hotspot-compiler-dev at openjdk.java.net >> Subject: Re: [9] RFR(S): 8075805: Crash while trying to release CompiledICHolder >> >> I missed that we have to be careful when cleaning ICs of a zombie nmethod to not create transition stubs because backpatching of those is unnecessary and fails with "unsafe access to zombie method". I changed the code to not create a transition stub if the corresponding nmethod is dead and updated the webrev in place. >> >> Best, >> Tobias >> >> On 25.08.2015 14:52, Tobias Hartmann wrote: >>> Hi Martin, >>> >>> thanks for looking at this! >>> >>> It's interesting that you encountered and fixed the same problem. Were you able to create a regression test? >>> >>> I did not evaluate the impact on the safepoint duration but you are right that it may be affected. I talked to Mikael Gerdin from the GC team and he told me that they had issues before because the scanning of nmethods and their relocations took a significant amount of time. >>> >>> I would therefore like to go with the alternative solution in the case of unloaded nmethods, i.e., cleaning their ICs in the sweeper. 
Unfortunately, I already pushed the fix so here is the incremental webrev: >>> http://cr.openjdk.java.net/~thartmann/8075805/webrev.02/ >>> >>> I left the fix for the nmethods that are marked-for-deoptimization as it is, because it avoids cleaning the IC's of all zombie nmethods and simplifies the possible state transitions. >>> >>> If you guys are fine with the change, I open a new bug/enhancement and send out a separate RFR. >>> >>> Thanks, >>> Tobias >>> >>> On 25.08.2015 12:34, Doerr, Martin wrote: >>>> Hi all, >>>> >>>> we appreciate that this code gets cleaned up. >>>> >>>> Iterating over all nmethods in gc_epilogue should fix the problem. Did anybody check the impact on the safepoint duration? >>>> We have also fixed this problem. We use the other approach and added following code to nmethod::make_not_entrant_or_zombie: >>>> >>>> if (state == zombie) { >>>> MutexLockerEx ml(SafepointSynchronize::is_at_safepoint() ? NULL : CompiledIC_lock); >>>> address low_boundary = verified_entry_point () + NativeJump::instruction_size; // See cleanup_inline_caches. >>>> RelocIterator iter(this, low_boundary); >>>> while (iter.next()) { >>>> if (iter.type() == relocInfo::virtual_call_type) { >>>> CompiledIC *ic = CompiledIC_at(&iter); >>>> ic->set_ic_destination_and_value(SharedRuntime::get_resolve_virtual_call_stub(), (Metadata*)NULL); >>>> } >>>> } >>>> } >>>> >>>> (Note: set_ic_destination_and_value is currently private.) >>>> >>>> As discussed in earlier emails, this also fixes the problem. An advantage is that this approach does the job in a concurrent phase without impacting the safepoint duration. >>>> Not sure which approach is the better one. >>>> >>>> Best regards, >>>> Martin >>>> >>>> >>>> -----Original Message----- >>>> From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Tobias Hartmann >>>> Sent: Dienstag, 25. 
August 2015 07:43
>>>> To: Igor Veresov
>>>> Cc: Vladimir Kozlov; hotspot-compiler-dev at openjdk.java.net
>>>> Subject: Re: [9] RFR(S): 8075805: Crash while trying to release CompiledICHolder
>>>>
>>>> On 24.08.2015 22:10, Igor Veresov wrote:
>>>>> Seems good to me.
>>>>
>>>> Thanks, Igor.
>>>>
>>>>> Btw, did you find why there is a need for "marked for reclamation" state?
>>>>
>>>> No, I couldn't find a reason yet. I did some testing without this state and didn't run into any obvious problems. I'll file a bug and further investigate. It would be nice if we could save this transition.
>>>>
>>>> Best,
>>>> Tobias
>>>>
>>>>>
>>>>> igor
>>>>>
>>>>>> On Aug 24, 2015, at 12:58 AM, Tobias Hartmann wrote:
>>>>>>
>>>>>> Thanks, Vladimir! Please see comments inline.
>>>>>>
>>>>>> On 21.08.2015 19:28, Vladimir Kozlov wrote:
>>>>>>> During our discussion about this problem we thought that we may need additional call nm->cleanup_inline_caches() by sweeper when we convert not_entrant to zombie to prevent zombie pointing to other nmethods. You think we don't need it?
>>>>>>
>>>>>> I looked at all possible nmethod transitions and came to the conclusion that the problem is only possible if there is a direct transition from non-entrant to zombie without a sweeper cycle in-between.
>>>>>>
>>>>>> The solution we discussed, i.e., always cleaning ICs for the non-entrant -> zombie transition, would fix the problem as well but would be more invasive than the proposed solution because it affects all nmethods.
>>>>>>
>>>>>>> Please, clarify changes in states transition of unloaded nmethods - "The only impact is that unloaded nmethods that are already non-entrant and not on the stack need another iteration of the sweeper to become zombie."
>>>>>>> I don't see how changing make_marked_nmethods_zombies() call to make_marked_nmethods_not_entrant() affects unloaded nmethods. Both make_*() methods iterate only over is_alive() nmethods (iter.next_alive()), so they skip unloaded.
>>>>>> >>>>>> Yes, my description is wrong. It should be "The only impact is that _marked_ nmethods that are already non-entrant and not on the stack need another iteration of the sweeper to become zombie." The change does not affect the state transitions of unloaded nmethods. >>>>>> >>>>>> In other words, the change removes the shortcut that allowed nmethods that were marked for deoptimization to be converted to zombie at a safepoint if they were already non-entrant and not on the stack before. These nmethods now need an additional sweeper cycle to be converted to zombie. >>>>>> >>>>>>> What spacing you changed in compiledIC.cpp because webrev does not show them? >>>>>> >>>>>> I fixed the wrong indentation in CompiledIC::set_to_clean(). You can see it in the webrev if you click on "Patch": >>>>>> http://cr.openjdk.java.net/~thartmann/8075805/webrev.00/src/share/vm/code/compiledIC.cpp.patch >>>>>> >>>>>>> Please, fix comment in vm_operations.cpp >>>>>>> >>>>>>> // Make the dependent methods zombies >>>>>>> - CodeCache::make_marked_nmethods_zombies(); >>>>>>> + CodeCache::make_marked_nmethods_not_entrant(); >>>>>> >>>>>> Fixed. >>>>>> >>>>>> New webrev: >>>>>> http://cr.openjdk.java.net/~thartmann/8075805/webrev.01/ >>>>>> >>>>>> Thanks, >>>>>> Tobias >>>>>> >>>>>>> >>>>>>> Thanks, >>>>>>> Vladimir >>>>>>> >>>>>>> On 8/21/15 7:36 AM, Tobias Hartmann wrote: >>>>>>>> Hi, >>>>>>>> >>>>>>>> please review the following patch. >>>>>>>> >>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8075805 >>>>>>>> http://cr.openjdk.java.net/~thartmann/8075805/webrev.00/ >>>>>>>> >>>>>>>> Problem: >>>>>>>> The VM crashes at a safepoint while trying to free a CompiledICHolder object that was enqueued for release after flushing a nmethod. The crash happens because the object is not a CompiledICHolder but Metadata which should not be removed. 
The problem is that at nmethod release, "CompiledIC::is_icholder_entry" is used to determine if the ICs of that nmethod still reference CompiledICHolder objects and if so, those objects are enqueued for release at the next safepoint. The method returns true if the destination of the IC is a C2I adapter, assuming that in this case the IC is in the to-interpreter state and the cached value must be a CompiledICHolder object. However, there are very rare cases where the IC is actually in the to-compiled state but the destination nmethod was already flushed and replaced by another allocation. Since the IC is still pointing to the same address in the code cache, the state of the IC is confused. >>>>>>>> >>>>>>>> Cleaning of inline caches that point to dead nmethods should prevent this. However, we do not clean ICs of nmethods that are converted to zombie themselves. Usually, that's okay because a zombie nmethod will be flushed before any dead nmethod it references. This is guaranteed because each nmethod goes through the states alive -> non-entrant -> zombie -> marked-for-reclamation before being flushed. >>>>>>>> >>>>>>>> Suppose we have two nmethods A and B, where A references B through an IC and B is always processed first by the sweeper. The following table shows the state transitions from top to bottom where lines marked with "S" show a transition in the corresponding iteration of the sweeper. >>>>>>>> >>>>>>>> state of A state of B >>>>>>>> ----------------------------------------- >>>>>>>> non-entrant non-entrant >>>>>>>> S [not on stack] [not on stack] >>>>>>>> S zombie zombie >>>>>>>> S marked marked >>>>>>>> S flushed flushed/re-allocated >>>>>>>> >>>>>>>> The IC of A will be cleaned in the first sweeper cycle because B is non-entrant so we don't need to clean ICs again if A is converted to zombie. 
>>>>>>>> >>>>>>>> Let's look at the following setting: >>>>>>>> >>>>>>>> state of A state of B >>>>>>>> ----------------------------------------- >>>>>>>> non-entrant >>>>>>>> S [not on stack] >>>>>>>> non-entrant >>>>>>>> S zombie [not on stack] >>>>>>>> zombie >>>>>>>> S marked marked >>>>>>>> S flushed flushed/re-allocated >>>>>>>> >>>>>>>> There are two problems here: >>>>>>>> - the IC of A is not cleaned because B is not yet non-entrant in the first iteration of the sweeper and afterwards A becomes zombie itself, >>>>>>>> - the transition from B to zombie happens outside the sweeper in 'CodeCache::make_marked_nmethods_zombies()' because the previous sweeper iteration already determined that the nmethod is not on the stack. >>>>>>>> >>>>>>>> The VM now crashes while flushing A because it still references B. Since B was replaced by an C2I adapter, we assume that A's IC is in the to-interpreter state and try to free a CompiledICHolder object which is actually Klass-Metadata for B. The detailed logs are below [1]. >>>>>>>> >>>>>>>> A similar problem occurs with nmethod unloading because unloaded nmethods transition directly to zombie: >>>>>>>> >>>>>>>> state of A state of B >>>>>>>> ----------------------------------------- >>>>>>>> unloaded unloaded >>>>>>>> S zombie zombie >>>>>>>> S marked marked >>>>>>>> S flushed flushed/re-allocated >>>>>>>> >>>>>>>> Again, we crash while flushing A. >>>>>>>> >>>>>>>> Solution: >>>>>>>> I removed the 'make_marked_nmethods_zombies()' and replaced it by calls to 'make_marked_nmethods_not_entrant()'. This avoids the non-entrant -> zombie transition outside of the sweeper. The only impact is that unloaded nmethods that are already non-entrant and not on the stack need another iteration of the sweeper to become zombie. I verified that this has no impact on performance. I also removed the code that was added by JDK-8059735 because now only the sweeper can set a nmethod to zombie. 
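(Illustration, not part of the original mail.) The failing interleaving and the effect of removing the shortcut can be replayed in a toy model. This is only a sketch under simplifying assumptions: every name below (NMethod, sweep, clean_ic, crash_scenario) is hypothetical and not HotSpot code, each nmethod has a single inline cache, and the "[not on stack]" determination is reduced to a flag set by the first sweep that visits a non-entrant nmethod.

```cpp
#include <cassert>

// Simplified nmethod life cycle, following the tables in the mail above.
enum class State { InUse, NotEntrant, Zombie, Marked, Flushed };

struct NMethod {
  State state = State::InUse;
  bool not_on_stack = false;            // set by a sweep that saw no activations
  NMethod* ic_target = nullptr;         // single inline cache; nullptr means clean
  bool flushed_with_dangling_ic = false;
};

// Clean the IC if it points to a callee that is no longer in use.
void clean_ic(NMethod& nm) {
  if (nm.ic_target != nullptr && nm.ic_target->state != State::InUse) {
    nm.ic_target = nullptr;
  }
}

// One sweeper visit of one nmethod (hypothetical model).
void sweep(NMethod& nm) {
  switch (nm.state) {
    case State::InUse:
      clean_ic(nm);
      break;
    case State::NotEntrant:
      if (nm.not_on_stack) {
        nm.state = State::Zombie;       // ICs of the new zombie are NOT cleaned
      } else {
        nm.not_on_stack = true;
        clean_ic(nm);
      }
      break;
    case State::Zombie:
      nm.state = State::Marked;
      break;
    case State::Marked:
      // Record whether the IC still points at already flushed (reusable) memory.
      nm.flushed_with_dangling_ic =
          nm.ic_target != nullptr && nm.ic_target->state == State::Flushed;
      nm.state = State::Flushed;
      break;
    case State::Flushed:
      break;
  }
}

// Replays the second table: A references B, B is always swept first.
// zombie_shortcut models the non-entrant -> zombie transition outside the sweeper.
bool crash_scenario(bool zombie_shortcut) {
  NMethod a, b;
  a.ic_target = &b;
  a.state = State::NotEntrant;          // A is non-entrant, B is still in use
  auto cycle = [&] { sweep(b); sweep(a); };
  cycle();                              // A: [not on stack]; IC kept, B is in use
  b.state = State::NotEntrant;          // B becomes non-entrant between sweeps
  cycle();                              // B: [not on stack]; A: zombie
  if (zombie_shortcut) {
    b.state = State::Zombie;            // stands in for make_marked_nmethods_zombies()
  }
  for (int i = 0; i < 4; ++i) cycle();  // run until both are flushed
  return a.flushed_with_dangling_ic;
}
```

In this model, crash_scenario(true) ends with A being flushed while its IC still points at the flushed B, whereas crash_scenario(false) flushes A one sweep before B, so the stale IC is never interpreted against re-allocated memory.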
>>>>>>>> >>>>>>>> To fix the nmethod unloading case, I changed the implementation of CodeCache::gc_epilogue to clean ICs of unloaded nmethods as well. >>>>>>>> >>>>>>>> Testing: >>>>>>>> - Executed failing tests for a week (still running) >>>>>>>> - JPRT >>>>>>>> - Performance (SPECjbb2005, SPECjbb2013, SPECjvm2008), no differences >>>>>>>> >>>>>>>> Thanks, >>>>>>>> Tobias >>>>>>>> >>>>>>>> >>>>>>>> [1] Detailed logs for nmethod A (1178) and nmethod B (552): >>>>>>>> >>>>>>>> Inline cache at 0xffff80ffad89b017, calling 0xffff80ffad1063c0 cached_value 0x0000000000000000 changing destination to 0xffff80ffad66ae20 changing cached metadata to 0x0000000800034a18 >>>>>>>> IC at 0xffff80ffad89b017: monomorphic to compiled (rcvr klass) 'java/util/concurrent/ConcurrentHashMap': >>>>>>>> ### IC at 0xffff80ffad89b017: set to Nmethod 552/ >>>>>>>> ### Nmethod 1178/0xffff80ffad89ac50 (not entrant) being made zombie >>>>>>>> ### Nmethod 552/0xffff80ffad66ac10 (not entrant) being made zombie from make_marked_nmethods_zombies() >>>>>>>> ### Nmethod 552/0xffff80ffad66ac10 (zombie) being marked for reclamation >>>>>>>> ### Nmethod 1178/0xffff80ffad89ac50 (zombie) being marked for reclamation >>>>>>>> ### Nmethod 552/0xffff80ffad66ac10 (marked for reclamation) being flushed >>>>>>>> *flushing nmethod 552/0xffff80ffad66ac10. Live blobs:2325/Free CodeCache:235986Kb >>>>>>>> ### I2C/C2I adapter 0xffff80ffad66ac10 allocated >>>>>>>> ### Nmethod 1178/0xffff80ffad89ac50 (marked for reclamation) being flushed >>>>>>>> cleanup_call_site 0x0000000800034a18 to be freed, destination 0xffff80ffad66ae20 inline cache at 0xffff80ffad89b017 >>>>>>>> enqueueing icholder 0x0000000800034a18 to be freed >>>>>>>> *flushing nmethod 1178/0xffff80ffad89ac50. 
Live blobs:2346/Free CodeCache:235955Kb >>>>>>>> deleting icholder 0x0000000800034a18 >>>>>>>> ## nof_mallocs = 211209, nof_frees = 105760 >>>>>>>> ## memory stomp: >>>>>>>> GuardedMemory(0xffff80ff623ca180) base_addr=0x00000008000349f8 tag=0x0000000800034a18 user_size=18446604433140746280 user_data=0x0000000800034a18 >>>>>>>> Header guard @0x00000008000349f8 is BROKEN >>>>>>>> >>>>> From martin.doerr at sap.com Thu Aug 27 10:26:55 2015 From: martin.doerr at sap.com (Doerr, Martin) Date: Thu, 27 Aug 2015 10:26:55 +0000 Subject: [9] RFR(S): 8134493: Cleaning inline caches of unloaded nmethods should be done in sweeper In-Reply-To: <55DEB7E0.6040002@oracle.com> References: <55DEB7E0.6040002@oracle.com> Message-ID: <7C9B87B351A4BA4AA9EC95BB418116566AD5FAEB@DEWDFEMB19A.global.corp.sap> Hi, one question about always avoiding transition stubs for unloaded nmethods in cleanup_inline_caches() came into my mind when taking a second look at the change. When we transition to zombie state, this should be safe because it's guaranteed that no activations exist. However, cleanup_inline_caches() is used at other places, too. The comment about the unloaded state only says "// there should be no activations". Is it guaranteed that there is no activation of an nmethod which is in unloaded state? If not, the change may have undesired side effects. Besides this, the change looks good to me. Best regards, Martin -----Original Message----- From: Tobias Hartmann [mailto:tobias.hartmann at oracle.com] Sent: Donnerstag, 27. August 2015 09:10 To: hotspot-compiler-dev at openjdk.java.net Cc: Doerr, Martin Subject: [9] RFR(S): 8134493: Cleaning inline caches of unloaded nmethods should be done in sweeper Hi, please review the following patch. https://bugs.openjdk.java.net/browse/JDK-8134493 http://cr.openjdk.java.net/~thartmann/8134493/webrev.00/ Problem: This is a follow up on JDK-8075805 [1] which modified CodeCache::gc_epilogue() to clean the ICs of unloaded nmethods as well. 
The problem is that this code is executed at a safepoint and may affect safepoint duration. The other changes of JDK-8134493 are fine. Solution: We do the cleaning of unloaded nmethods at the unloaded -> zombie transition in the sweeper. I also modified nmethod::cleanup_inline_caches() to not emit any transition stubs if the nmethod is already dead. As Martin Doerr pointed out in another thread, we have to be careful with accessing CompiledIC::cached_metadata() of unloaded nmethods. For example, the following scenario may happen (IC of A references B): state of A state of B ------------------------------- not-entrant S [not-on-stack] S zombie unloaded Now the IC of A still references the unloaded nmethod B and is_call_to_compiled() will access the unloaded metadata. I fixed this by checking caller->is_alive(). Thanks, Tobias [1] https://bugs.openjdk.java.net/browse/JDK-8075805 From tobias.hartmann at oracle.com Thu Aug 27 12:02:46 2015 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Thu, 27 Aug 2015 14:02:46 +0200 Subject: [9] RFR(S): 8075805: Crash while trying to release CompiledICHolder In-Reply-To: <7C9B87B351A4BA4AA9EC95BB418116566AD5FABA@DEWDFEMB19A.global.corp.sap> References: <55D7376E.90103@oracle.com> <55D75FC4.1040208@oracle.com> <55DACEB9.5070107@oracle.com> <55DC005B.9010109@oracle.com> <7C9B87B351A4BA4AA9EC95BB418116566AD5F743@DEWDFEMB19A.global.corp.sap> <55DC6503.20301@oracle.com> <55DC7CE1.90805@oracle.com> <7C9B87B351A4BA4AA9EC95BB418116566AD5F7EB@DEWDFEMB19A.global.corp.sap> <55DD9A90.3010802@oracle.com> <7C9B87B351A4BA4AA9EC95BB418116566AD5F983@DEWDFEMB19A.global.corp.sap> <55DDE5F5.4020407@oracle.com> <7C9B87B351A4BA4AA9EC95BB418116566AD5FABA@DEWDFEMB19A.global.corp.sap> Message-ID: <55DEFC66.2060607@oracle.com> Hi Martin, On 27.08.2015 11:17, Doerr, Martin wrote: > It should normally work as you described, yes. I guess the inline cache destination was a dangling pointer and find_blob_unsafe found something else. Yes, that could be. 
> about 4.: > Right, the assertion must have been hit before the caller transitioned to unloaded state. Observed on SPARC: > > assert(ic->is_clean()) failed: IC should be clean > > Stack: > void nmethod::verify_clean_inline_caches() > void CodeCache::verify_clean_inline_caches() > G1CodeCacheUnloadingTask::~G1CodeCacheUnloadingTask #Nvariant 1() > void G1CollectedHeap::parallel_cleaning(BoolObjectClosure*,bool,bool,bool) > void ConcurrentMark::weakRefsWorkParallelPart(BoolObjectClosure*,bool) > void ConcurrentMark::weakRefsWork(bool) > void ConcurrentMark::checkpointRootsFinal(bool) > void CMCheckpointRootsFinalClosure::do_void() > void VM_CGC_Operation::doit() > void VM_Operation::evaluate() > void VMThread::evaluate_operation(VM_Operation*) > void VMThread::loop() > void VMThread::run() I think at this stage all the "-> unloaded" transitions already happened and no alive nmethods should point to dead nmethods. The method verify_clean_inline_caches() is called to verify this. Changing it to ignore zombie/unloaded destination methods may only hide an underlying problem and reduce the verification coverage. I looked at G1CodeCacheUnloadingTask but couldn't spot any obvious problems. Thanks, Tobias > > Best regards, > Martin > > > -----Original Message----- > From: Tobias Hartmann [mailto:tobias.hartmann at oracle.com] > Sent: Mittwoch, 26. August 2015 18:15 > To: Doerr, Martin; Igor Veresov > Cc: Vladimir Kozlov; hotspot-compiler-dev at openjdk.java.net > Subject: Re: [9] RFR(S): 8075805: Crash while trying to release CompiledICHolder > > Hi Martin, > > On 26.08.2015 16:52, Doerr, Martin wrote: >> 2. This happened before we made the fix. I just pasted it for the case you run into crashes when accessing db->is_adapter_blob(). If the new fix is working correctly, this one may no longer be needed. > > Okay, I think it's no longer necessary. > >> 3. 
The crash in is_call_to_compiled() happened after creating a transition stub with the following stack: >> libjvm::CompiledIC::is_call_to_compiled() >> libjvm::CompiledIC::verify() >> libjvm::SharedRuntime::handle_ic_miss_helper(JavaThread*, Thread*) >> libjvm::SharedRuntime::handle_wrong_method_ic_miss(JavaThread*) >> >> It's hard to find out what exactly went wrong, but I don't trust >> cb = CodeCache::find_blob_unsafe() >> cb->is_nmethod() > > Okay, interesting. I think it should be safe to call CodeCache::find_blob_unsafe() because the VtableStub is just a BufferBlob in the code cache and cb->is_nmethod() should return false. > >> 4. Assume the destination method has just reached zombie state via unloaded state. At this point of time, only the inline caches of the destination method were cleaned. >> The caller method may still be in unloaded state with the inline cache pointing to this destination method. >> This situation can happen because the gc_epilogue doesn't clean the inline caches of unloaded methods. >> Your first webrev prevents this situation. The new one deals with it by cleaning them when the caller method becomes a zombie (which is the case in which the assertion needs to allow this). > > Right, but we only call nmethod::verify_clean_inline_caches() for alive nmethods (see CodeCache::verify_clean_inline_caches()), so this should not be possible. > >> Hope this helps a little bit. I guess nobody can answer all inline cache related questions ;-) > > Sure, thanks a lot! > > Best, > Tobias > >> >> Best regards, >> Martin >> >> >> -----Original Message----- >> From: Tobias Hartmann [mailto:tobias.hartmann at oracle.com] >> Sent: Mittwoch, 26. August 2015 12:53 >> To: Doerr, Martin; Igor Veresov >> Cc: Vladimir Kozlov; hotspot-compiler-dev at openjdk.java.net >> Subject: Re: [9] RFR(S): 8075805: Crash while trying to release CompiledICHolder >> >> Hi Martin, >> >> thanks for the hints, please see comments inline. 
>> >> On 25.08.2015 17:01, Doerr, Martin wrote: >>> thanks for your quick response. Unfortunately, we neither have a good reproduction case nor a regression test which is actually the reason why we did not post this earlier. >>> We had observed very sporadic assertions or freeing of unallocated memory. >> >> Okay, same here. >> >>> Basically, I believe that cleaning the inline caches before transitioning from unloaded to zombie is the right thing. However, there's still the problem that it's hard to test. >>> >>> Additionally, it may be required to adapt a couple of assertions. >>> We modified the following ASSERT code (based on hotspot 25) >>> 1. in CompiledIC::is_call_to_compiled(): >>> Accessing cached_metadata() may be unsafe if !caller->is_alive(). >> >> I assume this could be a problem in the following case: >> >> state of A state of B >> ------------------------------- >> not-entrant >> S [not-on-stack] >> S zombie >> unloaded >> >> Now the IC of A still references the unloaded nmethod B and is_call_to_compiled() will access the unloaded metadata. Right? >> >> I added an assert to cached_metadata() to make sure we don't access metadata that belongs to unloaded nmethods. >> >>> 2. in CompiledIC::is_call_to_interpreted(): >>> CodeBlob* db may be gone if cb is unloaded. >> >> How can that happen? >> >> I think if db is really flushed, it may as well be replaced by another allocation (for example, an adapter blob) and then your if(!db) check returns false and still triggers the assert(!db->is_adapter_blob(), ...). >> >>> 3. in CompiledIC::verify(): >>> is_call_to_compiled() has crashed. Seems to be unsafe in megamorphic case so we changed the order of the checks. >> >> I don't see why is_call_to_compiled() is unsafe in the megamorphic case. Could you explain that? >> >>> 4. in nmethod::verify_clean_inline_caches(): >>> In case of relocInfo::virtual_call_type CompiledIC may still point to a zombie method. >> >> How can that be? 
After method unloading all ICs of live nmethods were cleaned in CodeCache::gc_epilogue(). Transitions to zombie in the sweeper also lead to IC cleaning of other nmethods. >> >> Thanks, >> Tobias >> >>> For details, please see below. >>> >>> Best regards, >>> Martin >>> >>> >>> >>> 1. >>> assert( is_c1_method || >>> !is_monomorphic || >>> is_optimized() || >>> !caller->is_alive() || >>> (cached_metadata() != NULL && cached_metadata()->is_klass()), "sanity check"); >>> >>> 2. >>> #ifdef ASSERT >>> { >>> CodeBlob* db = CodeCache::find_blob_unsafe(dest); >>> if (!db) { >>> nmethod *nm = cb->as_nmethod_or_null(); >>> assert(nm, "sanity"); >>> if ( nm->is_in_use() || >>> (nm->is_not_entrant() && (!SafepointSynchronize::is_at_safepoint() || !nm->is_marked_for_deoptimization())) ) { >>> { // Dump some information. >>> ttyLocker ttyl; >>> tty->print_cr("ERROR: Did not find codeblob for destination %p", dest); >>> nm->print(tty); >>> Method *m = nm->method(); >>> if (m) { >>> m->print_on(tty); >>> } >>> } >>> assert(false, err_msg("nmethod is in state %d but destination blob is gone", (int)(nm->state()))); >>> } >>> } else >>> assert(!db->is_adapter_blob(), "must use stub!"); >>> } >>> #endif /* ASSERT */ >>> >>> 3. >>> assert(is_clean() >>> || is_optimized() || is_megamorphic() >>> || is_call_to_compiled() || is_call_to_interpreted() >>> , "sanity check"); >>> >>> 4. 
>>> case relocInfo::virtual_call_type: { >>> CompiledIC *ic = CompiledIC_at(&iter); >>> // Ok, to lookup references to zombies here >>> CodeBlob *cb = CodeCache::find_blob_unsafe(ic->ic_destination()); >>> if( cb != NULL && cb->is_nmethod() ) { >>> nmethod* nm = (nmethod*)cb; >>> // Verify that inline caches pointing to not_entrant methods are clean >>> if (nm->is_not_entrant()) { >>> assert(ic->is_clean(), "IC should be clean"); >>> } >>> } >>> break; >>> } >>> case relocInfo::opt_virtual_call_type: { >>> >>> >>> -----Original Message----- >>> From: Tobias Hartmann [mailto:tobias.hartmann at oracle.com] >>> Sent: Dienstag, 25. August 2015 16:34 >>> To: Doerr, Martin; Igor Veresov >>> Cc: Vladimir Kozlov; hotspot-compiler-dev at openjdk.java.net >>> Subject: Re: [9] RFR(S): 8075805: Crash while trying to release CompiledICHolder >>> >>> I missed that we have to be careful when cleaning ICs of a zombie nmethod to not create transition stubs because backpatching of those is unnecessary and fails with "unsafe access to zombie method". I changed the code to not create a transition stub if the corresponding nmethod is dead and updated the webrev in place. >>> >>> Best, >>> Tobias >>> >>> On 25.08.2015 14:52, Tobias Hartmann wrote: >>>> Hi Martin, >>>> >>>> thanks for looking at this! >>>> >>>> It's interesting that you encountered and fixed the same problem. Were you able to create a regression test? >>>> >>>> I did not evaluate the impact on the safepoint duration but you are right that it may be affected. I talked to Mikael Gerdin from the GC team and he told me that they had issues before because the scanning of nmethods and their relocations took a significant amount of time. >>>> >>>> I would therefore like to go with the alternative solution in the case of unloaded nmethods, i.e., cleaning their ICs in the sweeper. 
Unfortunately, I already pushed the fix so here is the incremental webrev: >>>> http://cr.openjdk.java.net/~thartmann/8075805/webrev.02/ >>>> >>>> I left the fix for the nmethods that are marked-for-deoptimization as it is, because it avoids cleaning the IC's of all zombie nmethods and simplifies the possible state transitions. >>>> >>>> If you guys are fine with the change, I open a new bug/enhancement and send out a separate RFR. >>>> >>>> Thanks, >>>> Tobias
From claes.redestad at oracle.com Thu Aug 27 12:41:57 2015 From: claes.redestad at oracle.com (Claes Redestad) Date: Thu, 27 Aug 2015 14:41:57 +0200 Subject: RFR(XS): 8134583: sun.management.HotspotCompilation should handle absence of per-thread perf counters Message-ID: <55DF0595.4020805@oracle.com> Hi, please review this patch to clean up and make sun.management.HotspotCompilation behave nice if the VM would decide to no longer expose per-compiler thread perf counters: webrev: http://cr.openjdk.java.net/~redestad/jdk9/8134583/webrev.00/ bug: https://bugs.openjdk.java.net/browse/JDK-8134583 /Claes From nils.eliasson at oracle.com Thu Aug 27 12:51:23 2015 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Thu, 27 Aug 2015 14:51:23 +0200 Subject: RFR(XS): 8134583: sun.management.HotspotCompilation should handle absence of per-thread perf counters In-Reply-To: <55DF0595.4020805@oracle.com> References: <55DF0595.4020805@oracle.com> Message-ID: <55DF07CB.7080206@oracle.com> Hi Claes, Looks good.
Best regards, Nils (Not a reviewer) On 2015-08-27 14:41, Claes Redestad wrote: > Hi, > > please review this patch to clean up and make > sun.management.HotspotCompilation > behave nicely if the VM would decide to no longer expose per-compiler > thread perf counters: > > webrev: http://cr.openjdk.java.net/~redestad/jdk9/8134583/webrev.00/ > bug: https://bugs.openjdk.java.net/browse/JDK-8134583 > > /Claes From tobias.hartmann at oracle.com Thu Aug 27 13:05:44 2015 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Thu, 27 Aug 2015 15:05:44 +0200 Subject: [9] RFR(S): 8134493: Cleaning inline caches of unloaded nmethods should be done in sweeper In-Reply-To: <7C9B87B351A4BA4AA9EC95BB418116566AD5FAEB@DEWDFEMB19A.global.corp.sap> References: <55DEB7E0.6040002@oracle.com> <7C9B87B351A4BA4AA9EC95BB418116566AD5FAEB@DEWDFEMB19A.global.corp.sap> Message-ID: <55DF0B28.4000901@oracle.com> Hi Martin, thanks for looking at this. On 27.08.2015 12:26, Doerr, Martin wrote: > Hi, > > one question about always avoiding transition stubs for unloaded nmethods in cleanup_inline_caches() came to my mind when taking a second look at the change. > When we transition to zombie state, this should be safe because it's guaranteed that no activations exist. > However, cleanup_inline_caches() is used at other places, too. > > The comment about the unloaded state only says "// there should be no activations". Is it guaranteed that there is no activation of an nmethod which is in unloaded state? > If not, the change may have undesired side effects. I think it is guaranteed that unloaded nmethods are not on the stack and it should be safe to clean their ICs without a transition stub. We also rely on this assumption in the sweeper by directly flushing unloaded OSR nmethods (see line 644 in sweeper.cpp [1]). Maybe someone from the GC team (CC'ed) can clarify this.
Thanks, Tobias [1] http://cr.openjdk.java.net/~thartmann/8134493/webrev.00/src/share/vm/runtime/sweeper.cpp.html > > Besides this, the change looks good to me. > > Best regards, > Martin > > > -----Original Message----- > From: Tobias Hartmann [mailto:tobias.hartmann at oracle.com] > Sent: Donnerstag, 27. August 2015 09:10 > To: hotspot-compiler-dev at openjdk.java.net > Cc: Doerr, Martin > Subject: [9] RFR(S): 8134493: Cleaning inline caches of unloaded nmethods should be done in sweeper > > Hi, > > please review the following patch. > > https://bugs.openjdk.java.net/browse/JDK-8134493 > http://cr.openjdk.java.net/~thartmann/8134493/webrev.00/ > > Problem: > This is a follow up on JDK-8075805 [1] which modified CodeCache::gc_epilogue() to clean the ICs of unloaded nmethods as well. The problem is that this code is executed at a safepoint and may affect safepoint duration. The other changes of JDK-8134493 are fine. > > Solution: > We do the cleaning of unloaded nmethods at the unloaded -> zombie transition in the sweeper. I also modified nmethod::cleanup_inline_caches() to not emit any transition stubs if the nmethod is already dead. > > As Martin Doerr pointed out in another thread, we have to be careful with accessing CompiledIC::cached_metadata() of unloaded nmethods. For example, the following scenario may happen (IC of A references B): > > state of A state of B > ------------------------------- > not-entrant > S [not-on-stack] > S zombie > unloaded > > Now the IC of A still references the unloaded nmethod B and is_call_to_compiled() will access the unloaded metadata. I fixed this by checking caller->is_alive(). 
> > Thanks, > Tobias > > [1] https://bugs.openjdk.java.net/browse/JDK-8075805 > > From claes.redestad at oracle.com Thu Aug 27 14:42:30 2015 From: claes.redestad at oracle.com (Claes Redestad) Date: Thu, 27 Aug 2015 16:42:30 +0200 Subject: RFR(XS): 8134583: sun.management.HotspotCompilation should handle absence of per-thread perf counters In-Reply-To: <55DF0ECF.9000908@oracle.com> References: <55DF0595.4020805@oracle.com> <55DF0AD1.4080001@oracle.com> <55DF0D42.9040803@oracle.com> <55DF0ECF.9000908@oracle.com> Message-ID: <55DF21D6.5030407@oracle.com> Updated webrev after comments and discussion with Jaroslav: http://cr.openjdk.java.net/~redestad/jdk9/8134583/webrev.03 Changes: - convert 'threads' from array to list - simplified further by removing old code dealing with adapterThread /Claes On 2015-08-27 15:21, Jaroslav Bachorik wrote: > On 27.8.2015 15:14, Claes Redestad wrote: >> >> >> On 2015-08-27 15:04, Jaroslav Bachorik wrote: >>> Hi, >>> >>> On 27.8.2015 14:41, Claes Redestad wrote: >>>> Hi, >>>> >>>> please review this patch to clean up and make >>>> sun.management.HotspotCompilation >>>> behave nice if the VM would decide to no longer expose per-compiler >>>> thread perf counters: >>>> >>>> webrev: http://cr.openjdk.java.net/~redestad/jdk9/8134583/webrev.00/ >>>> bug: https://bugs.openjdk.java.net/browse/JDK-8134583 >>> >>> When already changing this wouldn't it be easier to convert the >>> 'threads' variable to List and only add the info >>> for existing compilers threads (eg. not leaving NULL slots in the >>> array). >>> >>> In 'getCompilerThreadStats' method the 'threads' array is converted to >>> a list anyway. >> >> The CompilerThreadStat object needs to be created on demand (since it >> polls the underlying counters), thus we still need to maintain either an >> array or list of CompilerThreadInfo. 
Converting CompilerThreadInfo[] to >> a compact (or empty) List may or may not save a few >> bytes, but we'd still have to create a new list every time >> getCompilerThreadStats() is called. > > Right. Still could save some null value juggling by storing > CompilerThreadInfo instances into a list instead of an array. > > -JB- > >> >> /Claes >> >>> >>> -JB- >>> >>>> >>>> /Claes >>> >> > From aph at redhat.com Thu Aug 27 15:19:15 2015 From: aph at redhat.com (Andrew Haley) Date: Thu, 27 Aug 2015 16:19:15 +0100 Subject: aarch64: C2 fast lock/unlock issues In-Reply-To: <55D79701.9090407@oracle.com> References: <55D79701.9090407@oracle.com> Message-ID: <55DF2A73.2060207@redhat.com> On 08/21/2015 10:24 PM, Vladimir Kozlov wrote: > Thank you for report and suggested fixes. > > CC to aarch64 port developers. May I push this, or do I need you to review it? Andrew. From tomasz.wojtowicz at intel.com Thu Aug 27 18:47:10 2015 From: tomasz.wojtowicz at intel.com (Wojtowicz, Tomasz) Date: Thu, 27 Aug 2015 18:47:10 +0000 Subject: RFR (M): 8134553: CRC32C implementations for Nehalem x86/amd64 & Westmere+ x86/amd64 Message-ID: <3616187E21868C40AD1B36D41D29F4C1368E6278@FMSMSX106.amr.corp.intel.com> I would like to contribute the following change: Review details Review Title: CRC32C implementations for Nehalem x86/amd64 & Westmere+ x86/amd64 Review ID: #8134553 Diff: http://cr.openjdk.java.net/~mcberg/8134553/webrev.01/ Description: Efficient use of the crc32 hardware instruction: the input is divided into predefined chunks of increasing size, and each chunk is further split into 3 parts that are computed together to hide instruction latencies. x86 delivers up to 8x improvement vs. the Java library; amd64 delivers even more, up to 16x. Performance data are attached to this message for your convenience. No regressions have been observed on hotspot/compiler on x86_64.
Link: https://bugs.openjdk.java.net/browse/JDK-8134553 Author: Tomasz, Wojtowicz -- Thank you, Tomek -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: perf.pdf Type: application/pdf Size: 41305 bytes Desc: perf.pdf URL: From vladimir.kozlov at oracle.com Fri Aug 28 02:51:01 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 27 Aug 2015 19:51:01 -0700 Subject: [9] RFR(S): 8134493: Cleaning inline caches of unloaded nmethods should be done in sweeper In-Reply-To: <55DEB7E0.6040002@oracle.com> References: <55DEB7E0.6040002@oracle.com> Message-ID: <55DFCC95.5080802@oracle.com> CodeCache::gc_epilogue() could be optimized more. When needs_cache_clean() is false we need to execute loop only in debug VM. Otherwise it looks good. Thanks, Vladimir On 8/27/15 12:10 AM, Tobias Hartmann wrote: > Hi, > > please review the following patch. > > https://bugs.openjdk.java.net/browse/JDK-8134493 > http://cr.openjdk.java.net/~thartmann/8134493/webrev.00/ > > Problem: > This is a follow up on JDK-8075805 [1] which modified CodeCache::gc_epilogue() to clean the ICs of unloaded nmethods as well. The problem is that this code is executed at a safepoint and may affect safepoint duration. The other changes of JDK-8134493 are fine. > > Solution: > We do the cleaning of unloaded nmethods at the unloaded -> zombie transition in the sweeper. I also modified nmethod::cleanup_inline_caches() to not emit any transition stubs if the nmethod is already dead. > > As Martin Doerr pointed out in another thread, we have to be careful with accessing CompiledIC::cached_metadata() of unloaded nmethods. 
For example, the following scenario may happen (IC of A references B): > > state of A state of B > ------------------------------- > not-entrant > S [not-on-stack] > S zombie > unloaded > > Now the IC of A still references the unloaded nmethod B and is_call_to_compiled() will access the unloaded metadata. I fixed this by checking caller->is_alive(). > > Thanks, > Tobias > > [1] https://bugs.openjdk.java.net/browse/JDK-8075805 > > From vladimir.kozlov at oracle.com Fri Aug 28 02:58:04 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 27 Aug 2015 19:58:04 -0700 Subject: RFR: AArch64: 8134322: Fix several errors in C2 biased locking implementation In-Reply-To: <55DDEA3A.8070603@redhat.com> References: <55D79701.9090407@oracle.com> <55DB3754.9090504@redhat.com> <55DDEA3A.8070603@redhat.com> Message-ID: <55DFCE3C.3040201@oracle.com> Looks good to me. Vladimir On 8/26/15 9:32 AM, Andrew Dinn wrote: > The following AArch64-only webrev against hs-comp (fix contributed by > Hui Sha of Linaro) fixes several problems with biased locking on AArch64 > > http://cr.openjdk.java.net/~adinn/8134322/webrev.00/ > > I have reviewed the patch. It requires one more AArch64 reviewer who can > also commit it to hs-comp (Andrew Haley?). > > When I tested it running netbeans with biased locking enabled it seemed > (just by feel) to improve performance. A follow up to enable biased > locking by default might be worth investigating. > > n.b. I built this on hs-comp on top of my (almost but not yet committed) > patch for 8080293 so that patch also gets shown in the webrev. The two > patches are independent and should both commit cleanly with or without > the other. 
> > > > > From vladimir.kozlov at oracle.com Fri Aug 28 03:00:56 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 27 Aug 2015 20:00:56 -0700 Subject: aarch64: C2 fast lock/unlock issues In-Reply-To: <55DF2A73.2060207@redhat.com> References: <55D79701.9090407@oracle.com> <55DF2A73.2060207@redhat.com> Message-ID: <55DFCEE8.3060806@oracle.com> Reviewed. Push. We need at least one official *Reviewer* for all changesets. Thanks, Vladimir On 8/27/15 8:19 AM, Andrew Haley wrote: > On 08/21/2015 10:24 PM, Vladimir Kozlov wrote: >> Thank you for report and suggested fixes. >> >> CC to aarch64 port developers. > > May I push this, or do I need you to review it? > > Andrew. > From vladimir.kozlov at oracle.com Fri Aug 28 03:14:18 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 27 Aug 2015 20:14:18 -0700 Subject: [9] RFR(S): 8134493: Cleaning inline caches of unloaded nmethods should be done in sweeper In-Reply-To: <7C9B87B351A4BA4AA9EC95BB418116566AD5FAEB@DEWDFEMB19A.global.corp.sap> References: <55DEB7E0.6040002@oracle.com> <7C9B87B351A4BA4AA9EC95BB418116566AD5FAEB@DEWDFEMB19A.global.corp.sap> Message-ID: <55DFD20A.4060608@oracle.com> On 8/27/15 3:26 AM, Doerr, Martin wrote: > Hi, > > one question about always avoiding transition stubs for unloaded nmethods in cleanup_inline_caches() came into my mind when taking a second look at the change. > When we transition to zombie state, this should be safe because it's guaranteed that no activations exist. > However, cleanup_inline_caches() is used at other places, too. > > The comment about the unloaded state only says "// there should be no activations". Is it guaranteed that there is no activation of an nmethod which is in unloaded state? Yes, it is guaranteed. Method could be unloaded only if there were no any activations. Note, klass which holds method is unloaded too, so you can't use this method any more. Vladimir > If not, the change may have undesired side effects. 
> > Besides this, the change looks good to me. > > Best regards, > Martin > > > -----Original Message----- > From: Tobias Hartmann [mailto:tobias.hartmann at oracle.com] > Sent: Donnerstag, 27. August 2015 09:10 > To: hotspot-compiler-dev at openjdk.java.net > Cc: Doerr, Martin > Subject: [9] RFR(S): 8134493: Cleaning inline caches of unloaded nmethods should be done in sweeper > > Hi, > > please review the following patch. > > https://bugs.openjdk.java.net/browse/JDK-8134493 > http://cr.openjdk.java.net/~thartmann/8134493/webrev.00/ > > Problem: > This is a follow up on JDK-8075805 [1] which modified CodeCache::gc_epilogue() to clean the ICs of unloaded nmethods as well. The problem is that this code is executed at a safepoint and may affect safepoint duration. The other changes of JDK-8134493 are fine. > > Solution: > We do the cleaning of unloaded nmethods at the unloaded -> zombie transition in the sweeper. I also modified nmethod::cleanup_inline_caches() to not emit any transition stubs if the nmethod is already dead. > > As Martin Doerr pointed out in another thread, we have to be careful with accessing CompiledIC::cached_metadata() of unloaded nmethods. For example, the following scenario may happen (IC of A references B): > > state of A state of B > ------------------------------- > not-entrant > S [not-on-stack] > S zombie > unloaded > > Now the IC of A still references the unloaded nmethod B and is_call_to_compiled() will access the unloaded metadata. I fixed this by checking caller->is_alive(). 
> > Thanks, > Tobias > > [1] https://bugs.openjdk.java.net/browse/JDK-8075805 > > From tobias.hartmann at oracle.com Fri Aug 28 09:47:57 2015 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Fri, 28 Aug 2015 11:47:57 +0200 Subject: [9] RFR(S): 8134493: Cleaning inline caches of unloaded nmethods should be done in sweeper In-Reply-To: <55DFCC95.5080802@oracle.com> References: <55DEB7E0.6040002@oracle.com> <55DFCC95.5080802@oracle.com> Message-ID: <55E02E4D.5010105@oracle.com> Thanks, Vladimir. On 28.08.2015 04:51, Vladimir Kozlov wrote: > CodeCache::gc_epilogue() could be optimized more. When needs_cache_clean() is false we need to execute loop only in debug VM. Right, I changed the implementation to only execute the loop in the product VM if needs_cache_clean() is set and always execute it in the debug VM. Like this we also save the needs_cache_clean() checks in each loop iteration in the product VM. I changed nmethod::can_not_entrant_be_converted() to can_convert_to_zombie() because the name was misleading and caused some misconceptions in the past. http://cr.openjdk.java.net/~thartmann/8134493/webrev.01/ Best, Tobias > > Otherwise it looks good. > > Thanks, > Vladimir > > On 8/27/15 12:10 AM, Tobias Hartmann wrote: >> Hi, >> >> please review the following patch. >> >> https://bugs.openjdk.java.net/browse/JDK-8134493 >> http://cr.openjdk.java.net/~thartmann/8134493/webrev.00/ >> >> Problem: >> This is a follow up on JDK-8075805 [1] which modified CodeCache::gc_epilogue() to clean the ICs of unloaded nmethods as well. The problem is that this code is executed at a safepoint and may affect safepoint duration. The other changes of JDK-8134493 are fine. >> >> Solution: >> We do the cleaning of unloaded nmethods at the unloaded -> zombie transition in the sweeper. I also modified nmethod::cleanup_inline_caches() to not emit any transition stubs if the nmethod is already dead. 
>> >> As Martin Doerr pointed out in another thread, we have to be careful with accessing CompiledIC::cached_metadata() of unloaded nmethods. For example, the following scenario may happen (IC of A references B): >> >> state of A state of B >> ------------------------------- >> not-entrant >> S [not-on-stack] >> S zombie >> unloaded >> >> Now the IC of A still references the unloaded nmethod B and is_call_to_compiled() will access the unloaded metadata. I fixed this by checking caller->is_alive(). >> >> Thanks, >> Tobias >> >> [1] https://bugs.openjdk.java.net/browse/JDK-8075805 >> >> From aleksey.shipilev at oracle.com Fri Aug 28 10:12:20 2015 From: aleksey.shipilev at oracle.com (Aleksey Shipilev) Date: Fri, 28 Aug 2015 13:12:20 +0300 Subject: RFC (S) 8076758: new StringBuilder().append(String).toString() should be recognized by OptimizeStringConcat Message-ID: <55E03404.2090405@oracle.com> Hi, I would like to see this one fixed, because it touches on pending String improvements in JDK: https://bugs.openjdk.java.net/browse/JDK-8076758 Here is a proof-of-concept patch: http://cr.openjdk.java.net/~shade/8076758/webrev.00/ It passes JPRT and fixes the performance issues in microbenchmarks, but I'm not sure the code change is completely correct, given the history of OptimizeStringConcat bugs. Could anyone from a compiler team chime in? Thanks! -Aleksey -------------- next part -------------- A non-text attachment was scrubbed... 
From roland.westrelin at oracle.com Fri Aug 28 13:15:08 2015 From: roland.westrelin at oracle.com (Roland Westrelin) Date: Fri, 28 Aug 2015 15:15:08 +0200 Subject: RFR(XS): 8134288 compiler/runtime/6859338/Test6859338.java crashes in PhaseIdealLoop::try_move_store_after_loop In-Reply-To: <55DDEA48.8060601@oracle.com> References: <81A5D258-6987-4B13-8F99-9B13FB4C9C7D@oracle.com> <55DDEA48.8060601@oracle.com> Message-ID: <881E5D92-9C3B-488D-805C-531E96EC726B@oracle.com> Thanks for the review, Vladimir. Roland. > On Aug 26, 2015, at 6:33 PM, Vladimir Kozlov wrote: > > Looks fine. > > Thanks, > Vladimir > > On 8/26/15 4:31 AM, Roland Westrelin wrote: >> http://cr.openjdk.java.net/~roland/8134288/webrev.00/ >> >> Stores from code generated by C2 to update profiling (profile_taken_branch() called from Parse::do_if() if ProfileInterpreter is off) don't have a control. This looks like a corner case so I went for the simplest fix and excluded stores with no controls from the logic that tries to move stores out of loops. >> >> Roland. >> From aph at redhat.com Fri Aug 28 15:21:52 2015 From: aph at redhat.com (Andrew Haley) Date: Fri, 28 Aug 2015 16:21:52 +0100 Subject: RFR: AArch64: 8134322: Fix several errors in C2 biased locking implementation In-Reply-To: <55DDEA3A.8070603@redhat.com> References: <55D79701.9090407@oracle.com> <55DB3754.9090504@redhat.com> <55DDEA3A.8070603@redhat.com> Message-ID: <55E07C90.6000905@redhat.com> On 08/26/2015 05:32 PM, Andrew Dinn wrote: > n.b. I built this on hs-comp on top of my (almost but not yet committed) > patch for 8080293 so that patch also gets shown in the webrev. The two > patches are independent and should both commit cleanly with or without > the other. Sure, but I have no way to get the changesets. All that is in the webrev is one patch. Please send me the changesets from "hg export". Andrew.
From vladimir.kozlov at oracle.com Fri Aug 28 16:28:15 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 28 Aug 2015 09:28:15 -0700 Subject: [9] RFR(S): 8134493: Cleaning inline caches of unloaded nmethods should be done in sweeper In-Reply-To: <55E02E4D.5010105@oracle.com> References: <55DEB7E0.6040002@oracle.com> <55DFCC95.5080802@oracle.com> <55E02E4D.5010105@oracle.com> Message-ID: <55E08C1F.6090809@oracle.com> Nice! Looks good. Thanks, Vladimir On 8/28/15 2:47 AM, Tobias Hartmann wrote: > Thanks, Vladimir. > > On 28.08.2015 04:51, Vladimir Kozlov wrote: >> CodeCache::gc_epilogue() could be optimized more. When needs_cache_clean() is false we need to execute loop only in debug VM. > > Right, I changed the implementation to only execute the loop in the product VM if needs_cache_clean() is set and always execute it in the debug VM. Like this we also save the needs_cache_clean() checks in each loop iteration in the product VM. > > I changed nmethod::can_not_entrant_be_converted() to can_convert_to_zombie() because the name was misleading and caused some misconceptions in the past. > > http://cr.openjdk.java.net/~thartmann/8134493/webrev.01/ > > Best, > Tobias > >> >> Otherwise it looks good. >> >> Thanks, >> Vladimir >> >> On 8/27/15 12:10 AM, Tobias Hartmann wrote: >>> Hi, >>> >>> please review the following patch. >>> >>> https://bugs.openjdk.java.net/browse/JDK-8134493 >>> http://cr.openjdk.java.net/~thartmann/8134493/webrev.00/ >>> >>> Problem: >>> This is a follow up on JDK-8075805 [1] which modified CodeCache::gc_epilogue() to clean the ICs of unloaded nmethods as well. The problem is that this code is executed at a safepoint and may affect safepoint duration. The other changes of JDK-8134493 are fine. >>> >>> Solution: >>> We do the cleaning of unloaded nmethods at the unloaded -> zombie transition in the sweeper. I also modified nmethod::cleanup_inline_caches() to not emit any transition stubs if the nmethod is already dead.
>>> >>> As Martin Doerr pointed out in another thread, we have to be careful with accessing CompiledIC::cached_metadata() of unloaded nmethods. For example, the following scenario may happen (IC of A references B): >>> >>> state of A state of B >>> ------------------------------- >>> not-entrant >>> S [not-on-stack] >>> S zombie >>> unloaded >>> >>> Now the IC of A still references the unloaded nmethod B and is_call_to_compiled() will access the unloaded metadata. I fixed this by checking caller->is_alive(). >>> >>> Thanks, >>> Tobias >>> >>> [1] https://bugs.openjdk.java.net/browse/JDK-8075805 >>> >>> From vladimir.kozlov at oracle.com Fri Aug 28 16:45:02 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 28 Aug 2015 09:45:02 -0700 Subject: RFC (S) 8076758: new StringBuilder().append(String).toString() should be recognized by OptimizeStringConcat In-Reply-To: <55E03404.2090405@oracle.com> References: <55E03404.2090405@oracle.com> Message-ID: <55E0900E.50309@oracle.com> Tobias should look at this. I am fine with the idea. I think these are related too: https://bugs.openjdk.java.net/browse/JDK-6969165 https://bugs.openjdk.java.net/browse/JDK-7179968 Thanks, Vladimir On 8/28/15 3:12 AM, Aleksey Shipilev wrote: > Hi, > > I would like to see this one fixed, because it touches on pending String > improvements in JDK: > https://bugs.openjdk.java.net/browse/JDK-8076758 > > Here is a proof-of-concept patch: > http://cr.openjdk.java.net/~shade/8076758/webrev.00/ > > It passes JPRT and fixes the performance issues in microbenchmarks, but > I'm not sure the code change is completely correct, given the history of > OptimizeStringConcat bugs. Could anyone from a compiler team chime in? > > Thanks!
> -Aleksey > From aleksey.shipilev at oracle.com Fri Aug 28 17:12:24 2015 From: aleksey.shipilev at oracle.com (Aleksey Shipilev) Date: Fri, 28 Aug 2015 20:12:24 +0300 Subject: RFC (S) 8076758: new StringBuilder().append(String).toString() should be recognized by OptimizeStringConcat In-Reply-To: <55E0900E.50309@oracle.com> References: <55E03404.2090405@oracle.com> <55E0900E.50309@oracle.com> Message-ID: <55E09678.6080007@oracle.com> Good, thanks! Waiting for Tobias. -Aleksey On 08/28/2015 07:45 PM, Vladimir Kozlov wrote: > Tobias should look at this. I am fine with the idea. > I think these are related too: > > https://bugs.openjdk.java.net/browse/JDK-6969165 > https://bugs.openjdk.java.net/browse/JDK-7179968 > > Thanks, > Vladimir > > On 8/28/15 3:12 AM, Aleksey Shipilev wrote: >> Hi, >> >> I would like to see this one fixed, because it touches on pending String >> improvements in JDK: >> https://bugs.openjdk.java.net/browse/JDK-8076758 >> >> Here is a proof-of-concept patch: >> http://cr.openjdk.java.net/~shade/8076758/webrev.00/ >> >> It passes JPRT and fixes the performance issues in microbenchmarks, but >> I'm not sure the code change is completely correct, given the history of >> OptimizeStringConcat bugs. Could anyone from a compiler team chime in? >> >> Thanks! >> -Aleksey >> From christian.thalinger at oracle.com Fri Aug 28 17:19:28 2015 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Fri, 28 Aug 2015 07:19:28 -1000 Subject: Safepointing in HotSpot In-Reply-To: <55C518B7.3010006@oracle.com> References: <55C518B7.3010006@oracle.com> Message-ID: It's been 20 days so I'm sure you found out yourself already. If not, start with SafePointNode in callnode.hpp and work your way from there.
> On Aug 7, 2015, at 10:44 AM, Ahmed Khawaja wrote: > > Greetings, > > I am looking into when HotSpot decides to insert code for safepointing. My goal is to understand the decision process of when a safepoint is inserted and also to relay to an analysis tool that a certain instruction was inserted due to safepointing. I am looking into what criteria merit the insertion of a safepoint and how code can be optimized to avoid that. Can anyone point me in the direction of the source code in HotSpot responsible for this? I am able to identify manually the code sequences that result in a safepoint and realize they must be inserted somewhere before code motion is applied since they don't always show up as contiguous instructions. > > Thank you, > Ahmed Khawaja From adinn at redhat.com Sat Aug 29 08:53:21 2015 From: adinn at redhat.com (Andrew Dinn) Date: Sat, 29 Aug 2015 09:53:21 +0100 Subject: RFR: AArch64: 8134322: Fix several errors in C2 biased locking implementation In-Reply-To: <55E07C90.6000905@redhat.com> References: <55D79701.9090407@oracle.com> <55DB3754.9090504@redhat.com> <55DDEA3A.8070603@redhat.com> <55E07C90.6000905@redhat.com> Message-ID: <55E17301.2050406@redhat.com> On 28/08/15 16:21, Andrew Haley wrote: > On 08/26/2015 05:32 PM, Andrew Dinn wrote: >> n.b. I built this on hs-comp on top of my (almost but not yet committed) >> patch for 8080293 so that patch also gets shown in the webrev. The two >> patches are independent and should both commit cleanly with or without >> the other. > > Sure, but I have no way to get the changesets. All that is > in the webrev is one patch. > > Please send me the changesets from "hg export" change set below regards, Andrew Dinn ----------- Senior Principal Software Engineer Red Hat UK Ltd Registered in UK and Wales under Company Registration No.
3798903 Directors: Michael Cunningham (USA), Matt Parson (USA), Charlie Peters (USA), Michael O'Neill (Ireland)

----- 8< -------- 8< -------- 8< -------- 8< -------- 8< -------- 8< ---
# HG changeset patch
# User adinn
# Date 1440605639 -3600
#      Wed Aug 26 17:13:59 2015 +0100
# Node ID e699c7a83f8e9630f79ef83066a64d1e38c72ce2
# Parent  d7185fbee7e5e0f40ae4286f29055bed5da6bd58
8134322: AArch64: Fix several errors in C2 biased locking implementation
Summary: Several errors in C2 biased locking require fixing
Reviewed-by: adinn
Contributed-by: Hui Shi (hui.shi at linaro.org)

diff -r d7185fbee7e5 -r e699c7a83f8e src/cpu/aarch64/vm/aarch64.ad
--- a/src/cpu/aarch64/vm/aarch64.ad Tue Aug 25 08:25:38 2015 -0400
+++ b/src/cpu/aarch64/vm/aarch64.ad Wed Aug 26 17:13:59 2015 +0100
@@ -4896,12 +4896,12 @@
       return;
     }

-    if (UseBiasedLocking) {
-      __ biased_locking_enter(disp_hdr, oop, box, tmp, true, cont);
+    if (UseBiasedLocking && !UseOptoBiasInlining) {
+      __ biased_locking_enter(box, oop, disp_hdr, tmp, true, cont);
     }

     // Handle existing monitor
-    if (EmitSync & 0x02) {
+    if ((EmitSync & 0x02) == 0) {
       // we can use AArch64's bit test and branch here but
       // markoopDesc does not define a bit index just the bit value
       // so assert in case the bit pos changes
@@ -5041,7 +5041,7 @@
       return;
     }

-    if (UseBiasedLocking) {
+    if (UseBiasedLocking && !UseOptoBiasInlining) {
       __ biased_locking_exit(oop, tmp, cont);
     }

From felix.yang at linaro.org Sat Aug 29 13:13:27 2015
From: felix.yang at linaro.org (Felix Yang)
Date: Sat, 29 Aug 2015 21:13:27 +0800
Subject: RFR: Disable C2 peephole by default for aarch64
Message-ID:

Hi JIT members,

Currently, the C2 peephole optimization is only enabled by default for x86 & aarch64 port.
But we don't have any peephole rules for the aarch64 port, so scanning the instruction stream in PhasePeephole::do_transform makes no sense and is a waste of time for this port. So I am disabling this pass for the aarch64 port by default. Can anyone sponsor this if approved?

PATCH:

diff -r a6acc533dfef src/cpu/aarch64/vm/c2_globals_aarch64.hpp
--- a/src/cpu/aarch64/vm/c2_globals_aarch64.hpp Wed Aug 19 16:16:54 2015 +0100
+++ b/src/cpu/aarch64/vm/c2_globals_aarch64.hpp Fri Aug 28 09:28:07 2015 +0800
@@ -69,7 +69,7 @@

 // Peephole and CISC spilling both break the graph, and so makes the
 // scheduler sick.
-define_pd_global(bool, OptoPeephole, true);
+define_pd_global(bool, OptoPeephole, false);
 define_pd_global(bool, UseCISCSpill, true);
 define_pd_global(bool, OptoScheduling, false);
 define_pd_global(bool, OptoBundling, false);

From crofevil at qq.com Sun Aug 30 16:04:36 2015
From: crofevil at qq.com (Crofevil)
Date: Mon, 31 Aug 2015 00:04:36 +0800
Subject: Redundant instructions MOVK on aarch64?
Message-ID:

Hi all,

We found a potential optimization involving MOVK with a jdk9 build on the aarch64 platform, but we don't know whether it can be done, so we are asking for some help or guidance. Thanks!

We may get a situation like this when moving a pointer or generating stub code. This one was found in SPECjbb2005:

0x000003ff8c313f90: mov x4, #0x1610 // #5648 ; {metadata('java/lang/String')}
0x000003ff8c313f94: movk x4, #0x0, lsl, #16
0x000003ff8c313f98: movk x4, #0x8, lsl, #32
0x000003ff8c313f98: ldr w8, [x2, #8]

We see a sequence of MOV & MOVK. The MOVK is useless if it tries to move 0 into the register: it doesn't change anything. The sequence was generated by the function MacroAssembler::movptr:

// Move a constant pointer into r. In AArch64 mode the virtual
// address space is 48 bits in size, so we only need three
// instruction to create a patchable instruction sequence that can
// reach anywhere.
void MacroAssembler::movptr(Register r, uintptr_t imm64) {
#ifndef PRODUCT
  {
    char buffer[64];
    snprintf(buffer, sizeof(buffer), "0x%"PRIX64, imm64);
    block_comment(buffer);
  }
#endif
  assert(imm64 < (1ul << 48), "48-bit overflow in address constant");
  movz(r, imm64 & 0xffff);
  imm64 >>= 16;
  movk(r, imm64 & 0xffff, 16);
  imm64 >>= 16;
  movk(r, imm64 & 0xffff, 32);
}

We can't simply remove the MOVK in this function because the JVM may patch the pointer later:

int MacroAssembler::patch_oop(address insn_addr, address o) {
  ......
  } else {
    // move wide OOP
    assert(nativeInstruction_at(insn_addr+8)->is_movk(), "wrong insns is patch");
    uintptr_t dest = (uintptr_t)o;
    Instruction_aarch64::patch(insn_addr, 20, 5, dest & 0xffff);
    Instruction_aarch64::patch(insn_addr+4, 20, 5, (dest >>= 16) & 0xffff);
    Instruction_aarch64::patch(insn_addr+8, 20, 5, (dest >>= 16) & 0xffff);
    instructions = 3;
  }
  return instructions * NativeInstruction::instruction_size;
}

This means that the initial value of the pointer may be 0 and will be changed to a real address later. So we can't remove the MOVK, because we don't know when it will be patched.

I made a simple test that replaces MOVK with NOP when the patched address is 0:

/* [JVM] Replace MOVK to NOP if the address we patched is 0. */
static inline void patch_movk(address a, int msb, int lsb, unsigned long val) {
#define aarch64_NOP (0xd503201f)
  if (val == 0) {
    Instruction_aarch64::patch(a, 31, 0, aarch64_NOP);
  } else {
    Instruction_aarch64::patch(a, msb, lsb, val);
  }
}

Meanwhile I modified some asserts. But it still causes a JVM crash, so I think I missed something. Can these MOVKs be eliminated?
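
The immediate split discussed in the MOVK question above can be sketched as follows. This is only an illustrative sketch under stated assumptions, not HotSpot code: MovChunks, split48 and join48 are hypothetical names, and the 48-bit limit is taken from the assert in the quoted movptr. It shows which 16-bit immediate each of the three instructions would carry, and why the middle slot is zero for the metadata address in the disassembly.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper type (not part of HotSpot): the three 16-bit
// immediates carried by a movz/movk/movk sequence for a 48-bit constant.
struct MovChunks {
  uint16_t movz_imm;    // bits  0..15, written by MOVZ (which zeroes the other bits)
  uint16_t movk16_imm;  // bits 16..31, written by MOVK with a 16-bit shift
  uint16_t movk32_imm;  // bits 32..47, written by MOVK with a 32-bit shift
};

// Split a constant the same way the quoted movptr does: mask the low
// 16 bits, shift right by 16, and repeat.
inline MovChunks split48(uint64_t imm64) {
  assert(imm64 < (UINT64_C(1) << 48) && "48-bit overflow in address constant");
  MovChunks c;
  c.movz_imm   = static_cast<uint16_t>(imm64 & 0xffff);
  c.movk16_imm = static_cast<uint16_t>((imm64 >> 16) & 0xffff);
  c.movk32_imm = static_cast<uint16_t>((imm64 >> 32) & 0xffff);
  return c;
}

// Reassemble the value the register would hold after all three instructions.
inline uint64_t join48(const MovChunks& c) {
  return static_cast<uint64_t>(c.movz_imm)
       | (static_cast<uint64_t>(c.movk16_imm) << 16)
       | (static_cast<uint64_t>(c.movk32_imm) << 32);
}
```

For the address in the disassembly, split48(0x800001610) gives the immediates 0x1610, 0x0 and 0x8, matching the three instructions shown. Because patch_oop, as quoted above, rewrites all three immediate fields in place, a slot that holds zero for one address may have to hold a non-zero value after patching, which is presumably why the sequence keeps a fixed length.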
From hui.shi at linaro.org Mon Aug 31 00:35:09 2015
From: hui.shi at linaro.org (hui.shi)
Date: Mon, 31 Aug 2015 08:35:09 +0800
Subject: Re: Redundant instructions MOVK on aarch64?
References:
Message-ID:

An HTML attachment was scrubbed...

From tobias.hartmann at oracle.com Mon Aug 31 04:49:57 2015
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Mon, 31 Aug 2015 06:49:57 +0200
Subject: [9] RFR(S): 8134493: Cleaning inline caches of unloaded nmethods should be done in sweeper
In-Reply-To: <55E08C1F.6090809@oracle.com>
References: <55DEB7E0.6040002@oracle.com> <55DFCC95.5080802@oracle.com> <55E02E4D.5010105@oracle.com> <55E08C1F.6090809@oracle.com>
Message-ID: <55E3DCF5.8020101@oracle.com>

Thanks, Vladimir!

Best,
Tobias

On 28.08.2015 18:28, Vladimir Kozlov wrote:
> Nice! Looks good.
>
> Thanks,
> Vladimir
>
> On 8/28/15 2:47 AM, Tobias Hartmann wrote:
>> Thanks, Vladimir.
>>
>> On 28.08.2015 04:51, Vladimir Kozlov wrote:
>>> CodeCache::gc_epilogue() could be optimized more. When needs_cache_clean() is false we need to execute loop only in debug VM.
>>
>> Right, I changed the implementation to only execute the loop in the product VM if needs_cache_clean() is set and always execute it in the debug VM. Like this we also save the needs_cache_clean() checks in each loop iteration in the product VM.
>>
>> I changed nmethod::can_not_entrant_be_converted() to can_convert_to_zombie() because the name was misleading and caused some misconceptions in the past.
>>
>> http://cr.openjdk.java.net/~thartmann/8134493/webrev.01/
>>
>> Best,
>> Tobias
>>
>>>
>>> Otherwise it looks good.
>>>
>>> Thanks,
>>> Vladimir
>>>
>>> On 8/27/15 12:10 AM, Tobias Hartmann wrote:
>>>> Hi,
>>>>
>>>> please review the following patch.
>>>>
>>>> https://bugs.openjdk.java.net/browse/JDK-8134493
>>>> http://cr.openjdk.java.net/~thartmann/8134493/webrev.00/
>>>>
>>>> Problem:
>>>> This is a follow up on JDK-8075805 [1] which modified CodeCache::gc_epilogue() to clean the ICs of unloaded nmethods as well. The problem is that this code is executed at a safepoint and may affect safepoint duration. The other changes of JDK-8134493 are fine.
>>>>
>>>> Solution:
>>>> We do the cleaning of unloaded nmethods at the unloaded -> zombie transition in the sweeper. I also modified nmethod::cleanup_inline_caches() to not emit any transition stubs if the nmethod is already dead.
>>>>
>>>> As Martin Doerr pointed out in another thread, we have to be careful with accessing CompiledIC::cached_metadata() of unloaded nmethods. For example, the following scenario may happen (IC of A references B):
>>>>
>>>>   state of A         state of B
>>>> -------------------------------
>>>>   not-entrant
>>>> S [not-on-stack]
>>>> S zombie
>>>>                      unloaded
>>>>
>>>> Now the IC of A still references the unloaded nmethod B and is_call_to_compiled() will access the unloaded metadata. I fixed this by checking caller->is_alive().
>>>>
>>>> Thanks,
>>>> Tobias
>>>>
>>>> [1] https://bugs.openjdk.java.net/browse/JDK-8075805
>>>>
>>>>

From tobias.hartmann at oracle.com Mon Aug 31 05:24:51 2015
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Mon, 31 Aug 2015 07:24:51 +0200
Subject: RFC (S) 8076758: new StringBuilder().append(String).toString() should be recognized by OptimizeStringConcat
In-Reply-To: <55E03404.2090405@oracle.com>
References: <55E03404.2090405@oracle.com>
Message-ID: <55E3E523.9020509@oracle.com>

Hi Aleksey,

I don't think you need the has_offset_field() check because the String.offset field was removed long time ago. We're going to remove the legacy VM side code with the compact strings JEP [1].

Otherwise it looks good.
Please also execute the following test [2] from compact strings because it contains some string concat correctness checks.

Best,
Tobias

[1] http://hg.openjdk.java.net/jdk9/sandbox/hotspot/rev/2da46f06bfba
[2] http://hg.openjdk.java.net/jdk9/sandbox/hotspot/file/b50bebd0085b/test/compiler/intrinsics/string/TestStringIntrinsics.java

On 28.08.2015 12:12, Aleksey Shipilev wrote:
> Hi,
>
> I would like to see this one fixed, because it touches on pending String
> improvements in JDK:
> https://bugs.openjdk.java.net/browse/JDK-8076758
>
> Here is a proof-of-concept patch:
> http://cr.openjdk.java.net/~shade/8076758/webrev.00/
>
> It passes JPRT and fixes the performance issues in microbenchmarks, but
> I'm not sure the code change is completely correct, given the history of
> OptimizeStringConcat bugs. Could anyone from a compiler team chime in?
>
> Thanks!
> -Aleksey
>

From aleksey.shipilev at oracle.com Mon Aug 31 11:32:37 2015
From: aleksey.shipilev at oracle.com (Aleksey Shipilev)
Date: Mon, 31 Aug 2015 14:32:37 +0300
Subject: RFC (S) 8076758: new StringBuilder().append(String).toString() should be recognized by OptimizeStringConcat
In-Reply-To: <55E3E523.9020509@oracle.com>
References: <55E03404.2090405@oracle.com> <55E3E523.9020509@oracle.com>
Message-ID: <55E43B55.6080402@oracle.com>

Thanks for taking a look, Tobias!

On 08/31/2015 08:24 AM, Tobias Hartmann wrote:
> I don't think you need the has_offset_field() check because the
> String.offset field was removed long time ago. We're going to remove
> the legacy VM side code with the compact strings JEP [1].

Okay, removed. I actually wondered about that, but decided to make the code synonymous to the existing one. Anyway:
http://cr.openjdk.java.net/~shade/8076758/webrev.01/

> Otherwise it looks good. Please also execute the following test [2]
> from compact strings because it contains some string concat
> correctness checks.

Done, the test passes.
> [1] http://hg.openjdk.java.net/jdk9/sandbox/hotspot/rev/2da46f06bfba
> [2] http://hg.openjdk.java.net/jdk9/sandbox/hotspot/file/b50bebd0085b/test/compiler/intrinsics/string/TestStringIntrinsics.java

Thanks,
-Aleksey

From tobias.hartmann at oracle.com Mon Aug 31 13:33:39 2015
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Mon, 31 Aug 2015 15:33:39 +0200
Subject: RFC (S) 8076758: new StringBuilder().append(String).toString() should be recognized by OptimizeStringConcat
In-Reply-To: <55E43B55.6080402@oracle.com>
References: <55E03404.2090405@oracle.com> <55E3E523.9020509@oracle.com> <55E43B55.6080402@oracle.com>
Message-ID: <55E457B3.2020208@oracle.com>

Hi Aleksey,

looks good.

Best,
Tobias

On 31.08.2015 13:32, Aleksey Shipilev wrote:
> Thanks for taking a look, Tobias!
>
> On 08/31/2015 08:24 AM, Tobias Hartmann wrote:
>> I don't think you need the has_offset_field() check because the
>> String.offset field was removed long time ago. We're going to remove
>> the legacy VM side code with the compact strings JEP [1].
>
> Okay, removed. I actually wondered about that, but decided to make the
> code synonymous to the existing one. Anyway:
> http://cr.openjdk.java.net/~shade/8076758/webrev.01/
>
>> Otherwise it looks good. Please also execute the following test [2]
>> from compact strings because it contains some string concat
>> correctness checks.
>
> Done, the test passes.
>
>> [1] http://hg.openjdk.java.net/jdk9/sandbox/hotspot/rev/2da46f06bfba
>> [2] http://hg.openjdk.java.net/jdk9/sandbox/hotspot/file/b50bebd0085b/test/compiler/intrinsics/string/TestStringIntrinsics.java
>
> Thanks,
> -Aleksey
>
>

From aleksey.shipilev at oracle.com Mon Aug 31 14:59:15 2015
From: aleksey.shipilev at oracle.com (Aleksey Shipilev)
Date: Mon, 31 Aug 2015 17:59:15 +0300
Subject: RFR (S) 8134758: Final String field values should be trusted as stable
Message-ID: <55E46BC3.3040203@oracle.com>

Hi,

I would like to make a forward move and make the VM trust all final String fields. I cannot quickly find the scenario where it helps current JDK -- there is only String.value field, whose components are not treated as constants anyway. But, it helps a lot the upcoming Compact Strings change, which introduces String.coder field.

String.value is actually handled as stable in the GraphKit with UseImplicitStableValues, but it does not affect "normal" Java code. Therefore, in a way, this change extends the same behavior to the normal code. See more here:
https://bugs.openjdk.java.net/browse/JDK-8134758

Here is a patch:
http://cr.openjdk.java.net/~shade/8134758/webrev.00/

Passes JPRT and eyeballed assembly looks fine on Linux x86_64.

Does the change itself look generic enough to consider straight in the mainline? Otherwise, we can keep it in Compact Strings sandbox, but it will eventually arrive back entangled in a much larger code change, and shall still require review.

Thanks,
-Aleksey
From aleksey.shipilev at oracle.com Mon Aug 31 15:02:42 2015
From: aleksey.shipilev at oracle.com (Aleksey Shipilev)
Date: Mon, 31 Aug 2015 18:02:42 +0300
Subject: RFC (S) 8076758: new StringBuilder().append(String).toString() should be recognized by OptimizeStringConcat
In-Reply-To: <55E457B3.2020208@oracle.com>
References: <55E03404.2090405@oracle.com> <55E3E523.9020509@oracle.com> <55E43B55.6080402@oracle.com> <55E457B3.2020208@oracle.com>
Message-ID: <55E46C92.4050703@oracle.com>

Thank you, Tobias! Looking for sponsors.

-Aleksey

On 08/31/2015 04:33 PM, Tobias Hartmann wrote:
> Hi Aleksey,
>
> looks good.
>
> Best,
> Tobias
>
> On 31.08.2015 13:32, Aleksey Shipilev wrote:
>> Thanks for taking a look, Tobias!
>>
>> On 08/31/2015 08:24 AM, Tobias Hartmann wrote:
>>> I don't think you need the has_offset_field() check because the
>>> String.offset field was removed long time ago. We're going to remove
>>> the legacy VM side code with the compact strings JEP [1].
>>
>> Okay, removed. I actually wondered about that, but decided to make the
>> code synonymous to the existing one. Anyway:
>> http://cr.openjdk.java.net/~shade/8076758/webrev.01/
>>
>>> Otherwise it looks good. Please also execute the following test [2]
>>> from compact strings because it contains some string concat
>>> correctness checks.
>>
>> Done, the test passes.
>>
>>> [1] http://hg.openjdk.java.net/jdk9/sandbox/hotspot/rev/2da46f06bfba
>>> [2] http://hg.openjdk.java.net/jdk9/sandbox/hotspot/file/b50bebd0085b/test/compiler/intrinsics/string/TestStringIntrinsics.java
>>
>> Thanks,
>> -Aleksey
>>
>>
From roland.westrelin at oracle.com Mon Aug 31 15:33:04 2015
From: roland.westrelin at oracle.com (Roland Westrelin)
Date: Mon, 31 Aug 2015 17:33:04 +0200
Subject: RFR(S): 8134031: Incorrect JIT compilation of complex code with inlining and escape analysis
Message-ID:

That bug is caused by a bad rewiring of memory edges during ConnectionGraph::split_unique_types()

http://cr.openjdk.java.net/~roland/8134031/webrev.00/

See test case. Before EA, the "test" method has 3 stores to 3 different Box Objects:

108 StoreI === 97 47 107 12 [[ 110 ]] @TestEABadMergeMem$Box+12 *, name=i, idx=4; Memory: @TestEABadMergeMem$Box:NotNull+12 *, name=i, idx=4; !jvms: TestEABadMergeMem::test @ bci:11
110 StoreI === 97 108 109 13 [[ 125 113 ]] @TestEABadMergeMem$Box+12 *, name=i, idx=4; Memory: @TestEABadMergeMem$Box:NotNull:exact+12 *, name=i, idx=4; !jvms: TestEABadMergeMem::test @ bci:17
125 StoreI === 116 110 124 14 [[ 57 ]] @TestEABadMergeMem$Box+12 *, name=i, idx=4; Memory: @TestEABadMergeMem$Box:NotNull+12 *, name=i, idx=4; !jvms: TestEABadMergeMem::test @ bci:23

chained through their memory inputs. It also has 2 loads from 2 different Box objects:

248 LoadI === _ 233 109 [[ 249 ]] @TestEABadMergeMem$Box+12 *, name=i, idx=4; #int !jvms: TestEABadMergeMem::test @ bci:87
246 LoadI === _ 233 124 [[ 249 ]] @TestEABadMergeMem$Box+12 *, name=i, idx=4; #int !jvms: TestEABadMergeMem::test @ bci:82

The memory input of both loads is a Phi:

233 Phi === 153 274 129 [[ 150 21 246 248 ]] #memory Memory: @BotPTR *+bot, idx=Bot; !jvms: TestEABadMergeMem::test @ bci:76

Input 1 of the Phi is:

274 MergeMem === _ 1 129 1 1 1 1 275 276 [[ 233 ]] { - - - - N275:java/lang/Class:exact+104 * N276:java/lang/Class:exact+108 * } Memory: @BotPTR *+bot, idx=Bot; !orig=[196] !jvms: TestEABadMergeMem::test @ bci:47

The Box that is allocated in the "test" method doesn't escape.
ConnectionGraph::split_unique_types() assigns a unique type to that allocation and rewires the memory edges. In:

// Phase 2: Process MemNode's from memnode_worklist. compute new address type and
// compute new values for Memory inputs (the Memory inputs are not
// actually updated until phase 4.)

LoadI 248 is processed with:

Node *mem = find_inst_mem(n->in(MemNode::Memory), alias_idx, orig_phis);

which goes through the Phi 233 to MergeMem 274 and in ConnectionGraph::find_inst_mem(), an edge is added to the MergeMem node:

result = find_inst_mem(result, alias_idx, orig_phis);
mmem->set_memory_at(alias_idx, result);

274 MergeMem === _ 1 129 1 1 1 1 275 276 110 [[ 233 ]] { - - - - N275:java/lang/Class:exact+104 * N276:java/lang/Class:exact+108 * N110:TestEABadMergeMem$Box:NotNull:exact+12 *,iid=32 } Memory: @BotPTR *+bot, idx=Bot; !orig=[196] !jvms: TestEABadMergeMem::test @ bci:47

for the new unique type. In:

// Phase 3: Process MergeMem nodes from mergemem_worklist.
// Walk each memory slice moving the first node encountered of each
// instance type to the input corresponding to its alias index.

MergeMem 274 is processed. We go over all of its inputs, including the newly added one above. When that input is processed, the:

while (mem->is_Mem()) {

loop iterates until store 108 (110 is for the new unique type) and the MergeMem is updated again:

MergeMem === _ 1 129 1 108 1 1 275 276 110 [[ 233 ]] { - N108:TestEABadMergeMem$Box+12 * - - N275:java/lang/Class:exact+104 * N276:java/lang/Class:exact+108 * N110:TestEABadMergeMem$Box:NotNull:exact+12 *,iid=32 } Memory: @BotPTR *+bot, idx=Bot; !orig=[196] !jvms: TestEABadMergeMem::test @ bci:47

But now the MergeMem is incorrect: LoadI 246 now takes its memory state from that MergeMem through Phi 233 and for LoadI 246, the memory state is StoreI 108 which is before StoreI 125 that sets the field.
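The effect of handing a load a memory state that is too early in the chain can be shown with a toy model. This is only a sketch under simplifying assumptions, not C2's IR: ToyStore and toy_load are hypothetical names, a "Box" is identified by the id of its address node from the dumps above (107, 109, 124), and stored values are arbitrary.

```cpp
#include <cassert>

// Toy stand-in for a StoreI node: which Box it writes (named by the id of
// its address node), the value stored, and its memory input, i.e. the
// previous store in the chain.
struct ToyStore {
  int id;
  int box;
  int value;
  const ToyStore* mem;
};

// Toy stand-in for a LoadI: walk the memory chain backwards from `mem` and
// return the most recent store to `box`, or `initial` if none is reachable.
inline int toy_load(const ToyStore* mem, int box, int initial) {
  for (const ToyStore* s = mem; s != nullptr; s = s->mem) {
    if (s->box == box) {
      return s->value;
    }
  }
  return initial;
}
```

With a chain 108 -> 110 -> 125 built the same way as in the dumps (each store's `mem` pointing at the previous one), a load from box 124 that starts at store 125 sees the stored value, while the same load started at store 108 can never reach store 125 and falls back to the initial value. That mirrors the stale value LoadI 246 ends up with.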
The compiler then uses split through Phi to optimize the

if (flag3) {

test in the test case, which causes LoadI 246 to move through Phi 233 and a bad value to be loaded.

I think the root cause is that, in Phase 3, we process the input that was added to the MergeMem in Phase 2. The fix I propose prevents that.

Roland.

From vladimir.kozlov at oracle.com Mon Aug 31 16:25:33 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Mon, 31 Aug 2015 09:25:33 -0700
Subject: RFR (S) 8134758: Final String field values should be trusted as stable
In-Reply-To: <55E46BC3.3040203@oracle.com>
References: <55E46BC3.3040203@oracle.com>
Message-ID: <55E47FFD.7050805@oracle.com>

On 8/31/15 7:59 AM, Aleksey Shipilev wrote:
> Hi,
>
> I would like to make a forward move and make the VM trust all final
> String fields. I cannot quickly find the scenario where it helps current
> JDK -- there is only String.value field, whose components are not
> treated as constants anyway. But, it helps a lot the upcoming Compact
> Strings change, which introduces String.coder field.

Reflection?

>
> String.value is actually handled as stable in the GraphKit with
> UseImplicitStableValues, but it does not affect "normal" Java code.

What do you mean by "normal" code?

> Therefore, in a way, this change extends the same behavior to the normal
> code. See more here:
> https://bugs.openjdk.java.net/browse/JDK-8134758
>
> Here is a patch:
> http://cr.openjdk.java.net/~shade/8134758/webrev.00/

Do we still need UseImplicitStableValues code in GraphKit::load_String_value() and library_call.cpp?

Thanks,
Vladimir

>
> Passes JPRT and eyeballed assembly looks fine on Linux x86_64.
>
> Does the change itself look generic enough to consider straight in the
> mainline? Otherwise, we can keep it in Compact Strings sandbox, but it
> will eventually arrive back entangled in a much larger code change, and
> shall still require review.
>
> Thanks,
> -Aleksey
>

From aleksey.shipilev at oracle.com Mon Aug 31 17:15:51 2015
From: aleksey.shipilev at oracle.com (Aleksey Shipilev)
Date: Mon, 31 Aug 2015 20:15:51 +0300
Subject: RFR (S) 8134758: Final String field values should be trusted as stable
In-Reply-To: <55E47FFD.7050805@oracle.com>
References: <55E46BC3.3040203@oracle.com> <55E47FFD.7050805@oracle.com>
Message-ID: <55E48BC7.5080307@oracle.com>

On 08/31/2015 07:25 PM, Vladimir Kozlov wrote:
> On 8/31/15 7:59 AM, Aleksey Shipilev wrote:
>> I would like to make a forward move and make the VM trust all final
>> String fields. I cannot quickly find the scenario where it helps current
>> JDK -- there is only String.value field, whose components are not
>> treated as constants anyway. But, it helps a lot the upcoming Compact
>> Strings change, which introduces String.coder field.
>
> Reflection?

Sorry? In Compact Strings, the String.coder field defines how to interpret the String.value byte[] array. See the example in the bug:

public int length() {
  return value.length >> coder;
}

Ah, and if you worry about changing the field via Reflection, then like String.value, which is already treated as "stable" by intrinsics, String.coder is not supposed to be changed via Reflection (that is, without potentially devastating consequences). Strings are also deserialized via StringBuilder, so we are safe there as well.

>>
>> String.value is actually handled as stable in the GraphKit with
>> UseImplicitStableValues, but it does not affect "normal" Java code.
>
> What do you mean by "normal" code?

The code that is not handled by intrinsics or any other special explicit magic in a compiler. Just plain Java code, like the length() method above. What do you call it?

>> Therefore, in a way, this change extends the same behavior to the normal
>> code.
>> See more here:
>> https://bugs.openjdk.java.net/browse/JDK-8134758
>>
>> Here is a patch:
>> http://cr.openjdk.java.net/~shade/8134758/webrev.00/
>
> Do we still need UseImplicitStableValues code in
> GraphKit::load_String_value() and library_call.cpp?

I think so. These seem to enforce that array components are also treated as constants? Vladimir I., can you take a look?

Thanks,
-Aleksey

From vladimir.kozlov at oracle.com Mon Aug 31 17:25:59 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Mon, 31 Aug 2015 10:25:59 -0700
Subject: RFR (S) 8134758: Final String field values should be trusted as stable
In-Reply-To: <55E48BC7.5080307@oracle.com>
References: <55E46BC3.3040203@oracle.com> <55E47FFD.7050805@oracle.com> <55E48BC7.5080307@oracle.com>
Message-ID: <55E48E27.1010204@oracle.com>

On 8/31/15 10:15 AM, Aleksey Shipilev wrote:
> On 08/31/2015 07:25 PM, Vladimir Kozlov wrote:
>> On 8/31/15 7:59 AM, Aleksey Shipilev wrote:
>>> I would like to make a forward move and make the VM trust all final
>>> String fields. I cannot quickly find the scenario where it helps current
>>> JDK -- there is only String.value field, whose components are not
>>> treated as constants anyway. But, it helps a lot the upcoming Compact
>>> Strings change, which introduces String.coder field.
>>
>> Reflection?
>
> Sorry? In Compact Strings, the String.coder field defines how to interpret
> the String.value byte[] array. See the example in the bug:
>
> public int length() {
>   return value.length >> coder;
> }
>
> Ah, and if you worry about changing the field via Reflection, then like
> String.value, which is already treated as "stable" by intrinsics,
> String.coder is not supposed to be changed via Reflection (that is,
> without potentially devastating consequences).
> Strings are also deserialized via StringBuilder, so we are safe there
> as well.

Yes, I meant changing fields. Thank you for explaining.

>
>>>
>>> String.value is actually handled as stable in the GraphKit with
>>> UseImplicitStableValues, but it does not affect "normal" Java code.
>>
>> What do you mean by "normal" code?
>
> The code that is not handled by intrinsics or any other special explicit
> magic in a compiler. Just plain Java code, like the length() method
> above. What do you call it?

Got it. LoadNode::Value() has a special case for String fields when the string is constant. We can add UseImplicitStableValues code there for non-constant Strings.

Thanks,
Vladimir

>
>
>>> Therefore, in a way, this change extends the same behavior to the normal
>>> code. See more here:
>>> https://bugs.openjdk.java.net/browse/JDK-8134758
>>>
>>> Here is a patch:
>>> http://cr.openjdk.java.net/~shade/8134758/webrev.00/
>>
>> Do we still need UseImplicitStableValues code in
>> GraphKit::load_String_value() and library_call.cpp?
>
> I think so. These seem to enforce that array components are also treated
> as constants? Vladimir I., can you take a look?
>
> Thanks,
> -Aleksey
>
>

From tomasz.wojtowicz at intel.com Mon Aug 31 17:57:26 2015
From: tomasz.wojtowicz at intel.com (Wojtowicz, Tomasz)
Date: Mon, 31 Aug 2015 17:57:26 +0000
Subject: assembler_solaris_x86.cpp uses r8-r11 for 32-bit compilation
Message-ID: <3616187E21868C40AD1B36D41D29F4C136900398@FMSMSX106.amr.corp.intel.com>

Hi,

Shouldn't src/os_cpu/solaris_x86/vm/assembler_solaris_x86.cpp

http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/9df4555d2d7d/src/os_cpu/solaris_x86/vm/assembler_solaris_x86.cpp

void MacroAssembler::get_thread(Register thread) {

be placed (at least in part) under an #ifdef that depends on the bit width of the compiled target?

I see

push(r8);
push(r9);
push(r10);
push(r11);

for registers which are not defined for 32-bit, which causes a compilation error and subsequent failure.
--
Thank you,
Tomek

From aleksey.shipilev at oracle.com Mon Aug 31 20:36:21 2015
From: aleksey.shipilev at oracle.com (Aleksey Shipilev)
Date: Mon, 31 Aug 2015 23:36:21 +0300
Subject: RFR (S) 8134758: Final String field values should be trusted as stable
In-Reply-To: <55E48E27.1010204@oracle.com>
References: <55E46BC3.3040203@oracle.com> <55E47FFD.7050805@oracle.com> <55E48BC7.5080307@oracle.com> <55E48E27.1010204@oracle.com>
Message-ID: <55E4BAC5.4040904@oracle.com>

On 08/31/2015 08:25 PM, Vladimir Kozlov wrote:
> On 8/31/15 10:15 AM, Aleksey Shipilev wrote:
>> On 08/31/2015 07:25 PM, Vladimir Kozlov wrote:
>>> On 8/31/15 7:59 AM, Aleksey Shipilev wrote:
>>>> String.value is actually handled as stable in the GraphKit with
>>>> UseImplicitStableValues, but it does not affect "normal" Java code.
>>>
>>> What do you mean by "normal" code?
>>
>> The code that is not handled by intrinsics or any other special explicit
>> magic in a compiler. Just plain Java code, like the length() method
>> above. What do you call it?
>
> Got it. LoadNode::Value() has a special case for String fields when the
> string is constant. We can add UseImplicitStableValues code there for
> non-constant Strings.

Ohhhh, thanks! In fact, that is where it should be fixed. Current JDK 9 code captures T_INT constants there, but we need T_BYTE as well. The Compact Strings version has even more constant types handled, but not T_BYTE [1] (this should be the leftovers from CompressedStrings). I tried to hack T_BYTE in there for Compact Strings, and the same performance effect was achieved.

I'll see how and where to fix this properly. Tentatively, I'd say all types should be handled in the mainline version, and Compact Strings should purge its own version, to avoid further omissions.
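To make the T_BYTE point concrete: String.coder is a byte, and length() in the bug is value.length >> coder. A minimal C++ analogue, not HotSpot code and with made-up names, shows what folding buys once the coder is a known constant. It assumes the Compact Strings convention of coder 0 for Latin-1 and 1 for UTF-16.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Coder values, assuming the Compact Strings convention:
// 0 for Latin-1 (one byte per char), 1 for UTF-16 (two bytes per char).
constexpr std::uint8_t LATIN1 = 0;
constexpr std::uint8_t UTF16 = 1;

// Mirrors `value.length >> coder` from the bug report.
constexpr std::size_t string_length(std::size_t value_length, std::uint8_t coder) {
  return value_length >> coder;
}

// With a constant coder the shift amount is known at compile time, so the
// whole expression folds; static_assert only accepts constant expressions.
static_assert(string_length(10, LATIN1) == 10, "one byte per char");
static_assert(string_length(10, UTF16) == 5, "two bytes per char");
```

The JIT case is analogous only in spirit: if the load of a byte-sized final field is not treated as a constant, the shift has to be done with a value loaded at run time.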
Thanks,
-Aleksey

[1] http://hg.openjdk.java.net/jdk9/sandbox/hotspot/file/c3a11189c852/src/share/vm/opto/memnode.cpp#l1732

From vladimir.kozlov at oracle.com Mon Aug 31 23:33:45 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Mon, 31 Aug 2015 16:33:45 -0700
Subject: assembler_solaris_x86.cpp uses r8-r11 for 32-bit compilation
In-Reply-To: <3616187E21868C40AD1B36D41D29F4C136900398@FMSMSX106.amr.corp.intel.com>
References: <3616187E21868C40AD1B36D41D29F4C136900398@FMSMSX106.amr.corp.intel.com>
Message-ID: <55E4E459.5080400@oracle.com>

Hi, Tomek

Note that Oracle has not supported a 32-bit JDK (JVM) on Solaris since jdk8. Why do you need a 32-bit JVM on Solaris?

As much as we would like the 8130212 changes to be done for 32-bit too, it is not as simple as it looks. Putting in an #ifdef is not enough for the 32-bit code to work correctly. And it will rot anyway since we don't test it.

Regards,
Vladimir

On 8/31/15 10:57 AM, Wojtowicz, Tomasz wrote:
> Hi,
>
> Shouldn't src/os_cpu/solaris_x86/vm/assembler_solaris_x86.cpp
>
> http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/9df4555d2d7d/src/os_cpu/solaris_x86/vm/assembler_solaris_x86.cpp
>
> void MacroAssembler::get_thread(Register thread) {
>
> be included (at least part) under #ifdef depending on bit width of a
> compiled target?
>
> I see
>
> push(r8);
>
> push(r9);
>
> push(r10);
>
> push(r11);
>
> for registers which are not defined for 32-bit which is causing
> compilation error and subsequent failure.
>
> --
>
> Thank you,
>
> Tomek
>