From vladimir.kozlov at oracle.com Mon Aug 2 14:57:25 2010
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Mon, 02 Aug 2010 14:57:25 -0700
Subject: Request for reviews (M): 6973963: SEGV in ciBlock::start_bci() with EA
Message-ID: <4C573F45.4060309@oracle.com>

http://cr.openjdk.java.net/~kvn/6973963/webrev

Fixed 6973963: SEGV in ciBlock::start_bci() with EA

I added stress recompilation during CompileTheWorld and found this case.
It is similar to 6968368. BCEscapeAnalyzer::do_analysis() calls
ciMethod::get_method_blocks(), which calls the ciMethodBlocks constructor.
This constructor allocates GrowableArray elements on stack (thread
local resource area). As a result, when the method is recompiled without EA,
_blocks->_data is NULL.

Solution:
Added stress recompilation during CompileTheWorld: recompile with
subsume_loads = false and do_escape_analysis = false.
Added more checks into ResourceObj and growableArray to verify correctness
of allocation. I have to relax the new assert in GrowableArray when
elements are allocated on arena to allow allocation of a GrowableArray object
as a part of another object (for example, in ConnectionGraph and SuperWord).

Tested with failed cases, CTW.

From vladimir.kozlov at oracle.com Mon Aug 2 22:45:10 2010
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Mon, 02 Aug 2010 22:45:10 -0700
Subject: [Fwd: Request for reviews (M): 6973963: SEGV in ciBlock::start_bci() with EA]
In-Reply-To: <4C574904.2050307@oracle.com>
References: <4C57401E.5040204@oracle.com> <4C574904.2050307@oracle.com>
Message-ID: <4C57ACE6.6090702@oracle.com>

I updated webrev:

http://cr.openjdk.java.net/~kvn/6973963/webrev.01

Additional changes:

1. Used new option StressRecompilation instead of CompileTheWorld.
   I am still not sure if it is OK since we never test our stress options.

2. Added new allocation type: STACK_OR_EMBEDDED for cases when new() is
   not called.

3. Added ResourceObj destructor to zap _allocation field.

4. Added assert into get_allocation_type() to check that 'this' address
   is still encoded in _allocation.
   Found several cases where it was not true, so I had to add a copy
   constructor and assignment operator.

5. Moved all new methods with asserts into allocation.cpp.

6. The added assert failed for CodeBuffer since it destroys itself
   inside its destructor before the ResourceObj destructor is called.
   Moved Copy::fill_to_bytes(badResourceValue) into CodeBuffer::operator
   delete().

7. Replaced PhaseCFG::_node_latency field with pointer since it is valid
   only inside resource mark in GlobalCodeMotion().

Thanks,
Vladimir

Vladimir Kozlov wrote:
> Thank you, Tom
>
> RESOURCE_AREA case combines 3 cases:
>
> GrowableArray *foo = new GrowableArray(size); // resource array allocation
>
> GrowableArray foo; // stack allocation
>
> class A : public ResourceObj {
>   GrowableArray foo; // embedding allocation
> }
>
> In all these cases the GrowableArray::_data array is allocated in the thread
> local resource area. GrowableArray calls all these cases stack allocation
> and I followed this naming.
>
> But you are right, I can separate the last 2 cases into "stack" (stack or
> embedding) allocation, I still have 1 bit in allocation_type :)
>
> Thanks,
> Vladimir
>
> Tom Rodriguez wrote:
>> So doesn't this triple the amount of time taken by CTW? That seems
>> kind of extreme for the default mode. Maybe it should be under a
>> flag?
>>
>> The allocation.hpp changes look a little sketchy.
>>
>> + if (~(_allocation | allocation_mask) != (uintptr_t)this) {
>> +   set_allocation_type((address)this, RESOURCE_AREA);
>>
>> Why is this case called RESOURCE_AREA? Isn't this the stack or
>> embedding case?
>>
>> This doesn't make sense either.
>>
>> ! bool allocated_on_stack() { return get_allocation_type() ==
>> RESOURCE_AREA; }
>>
>> Anyway, the existing logic around this seemed sketchy so I can't quite
>> say whether this is better or not. I'll have to leave that to someone
>> else.
>>
>> The GrowableArray changes themselves look fine.
>>
>> tom
>>
>> On Aug 2, 2010, at 3:01 PM, Vladimir Kozlov wrote:
>>
>>> Forwarding to GC and Runtime groups since it is common code.
>>>
>>> Vladimir
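The check Tom quotes, `~(_allocation | allocation_mask) != (uintptr_t)this`, implies that the object stores a negated copy of its own address with the low bits carrying the allocation type. A minimal sketch of that idea follows, assuming objects are at least 4-byte aligned so two low bits are free. The enum values and helper names (`encode_allocation`, `encodes_address`, `decode_type`) are hypothetical illustrations, not the HotSpot code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for the allocation kinds named in the thread.
enum AllocationType : std::uintptr_t {
  STACK_OR_EMBEDDED = 0,
  RESOURCE_AREA     = 1,
  C_HEAP            = 2,
  ARENA             = 3
};

// Assumption: the two low address bits are free to hold the type.
const std::uintptr_t allocation_mask = 0x3;

// Pack the negated (address + type) into one word.
inline std::uintptr_t encode_allocation(const void* obj, AllocationType type) {
  std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(obj);
  assert((addr & allocation_mask) == 0 && "object must be 4-byte aligned");
  return ~(addr + type);
}

// Same shape as the quoted check: true if the word still encodes obj's address.
inline bool encodes_address(std::uintptr_t allocation, const void* obj) {
  return ~(allocation | allocation_mask) == reinterpret_cast<std::uintptr_t>(obj);
}

// Recover the type from the low bits of the un-negated word.
inline AllocationType decode_type(std::uintptr_t allocation) {
  return static_cast<AllocationType>(~allocation & allocation_mask);
}
```

Storing the negated address means a word that was never initialized by the constructor is unlikely to pass `encodes_address`, which is what lets an assert tell a properly constructed object from stack garbage.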
From David.Holmes at oracle.com Mon Aug 2 23:23:12 2010
From: David.Holmes at oracle.com (David Holmes)
Date: Tue, 03 Aug 2010 16:23:12 +1000
Subject: [Fwd: Request for reviews (M): 6973963: SEGV in ciBlock::start_bci() with EA]
In-Reply-To: <4C57ACE6.6090702@oracle.com>
References: <4C57401E.5040204@oracle.com> <4C574904.2050307@oracle.com> <4C57ACE6.6090702@oracle.com>
Message-ID: <4C57B5D0.4090807@oracle.com>

Hi Vladimir,

I always get nervous when people start tweaking these fundamental
allocation classes - there are always unforeseen interactions. :)

This is only a partial review.

src/share/vm/asm/codeBuffer.cpp

+ void CodeBuffer::operator delete(void* p) {
+   ResourceObj::operator delete(p);
  #ifdef ASSERT
!   Copy::fill_to_bytes(p, sizeof(CodeBuffer), badResourceValue);
  #endif
  }

You moved the copy from the destructor to operator delete, but now you
access p after you have deleted it.

src/share/vm/utilities/growableArray.hpp

+ // on stack or ebedded into an other object.

Typo: embedded

src/share/vm/memory/allocation.hpp

! ~ResourceObj();

You've added a destructor, but subclasses define their own destructors -
so doesn't this need to be virtual?

src/share/vm/memory/allocation.cpp

+ #ifdef ASSERT
+ ((ResourceObj *)p)->_allocation = badHeapOopVal;
+ #endif // ASSERT

For consistency this should be a DEBUG_ONLY(...)

+ assert((allocation & allocation_mask) == 0, "");
+ assert(type <= allocation_mask, "");

Please give meaningful messages to assertion failures.

David
From vladimir.kozlov at oracle.com Tue Aug 3 02:33:28 2010
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Tue, 03 Aug 2010 02:33:28 -0700
Subject: [Fwd: Request for reviews (M): 6973963: SEGV in ciBlock::start_bci() with EA]
In-Reply-To: <4C57B5D0.4090807@oracle.com>
References: <4C57401E.5040204@oracle.com> <4C574904.2050307@oracle.com> <4C57ACE6.6090702@oracle.com> <4C57B5D0.4090807@oracle.com>
Message-ID: <4C57E268.5060602@oracle.com>

Thank you, Dave

On 8/2/10 11:23 PM, David Holmes wrote:
> Hi Vladimir,
>
> I always get nervous when people start tweaking these fundamental allocation classes - there are always unforeseen
> interactions. :)

I hate to touch it also. But my patience was broken when I found this problem.

>
> This is only a partial review.
>
> src/share/vm/asm/codeBuffer.cpp
>
> + void CodeBuffer::operator delete(void* p) {
> +   ResourceObj::operator delete(p);
>   #ifdef ASSERT
> !   Copy::fill_to_bytes(p, sizeof(CodeBuffer), badResourceValue);
>   #endif
>   }
>
> You moved the copy from the destructor to operator delete, but now you access p after you have deleted it.

Yes, you are right. But CodeBuffer is never allocated on C heap and ResourceObj::delete(p)
should be called only for allocation on C_HEAP. So this code should never be executed,
I will replace it with ShouldNotCallThis() in CodeBuffer::delete(void* p).
Which leaves the original problem: I need to find how to zap CodeBuffer after ResourceObj destructor
which is called after CodeBuffer destructor.
Note: the problem here is not CodeBuffer but its field: OopRecorder _default_oop_recorder;

>
> src/share/vm/utilities/growableArray.hpp
>
> + // on stack or ebedded into an other object.
>
> Typo: embedded

OK.

>
> src/share/vm/memory/allocation.hpp
>
> ! ~ResourceObj();
>
> You've added a destructor, but subclasses define their own destructors - so doesn't this need to be virtual?

Constructor and destructor are special methods. Super class's default destructor are always called from
subclass's destructor. The only case when you need to declare it as virtual if subclass object is assigned to
local/field of super class type and compiler does not know what the original subclass was.

  ResourceObj *f = new OopRecorder();
  ...
  delete f;

We don't use such constructions.

>
> src/share/vm/memory/allocation.cpp
>
> + #ifdef ASSERT
> + ((ResourceObj *)p)->_allocation = badHeapOopVal;
> + #endif // ASSERT
>
> For consistency this should be a DEBUG_ONLY(...)

OK. I had problems recently with some macros when I passed expressions with nested () so I was too conservative here.

>
> + assert((allocation & allocation_mask) == 0, "");
> + assert(type <= allocation_mask, "");
>
> Please give meaningful messages to assertion failures.

OK.

I will update webrev today (during work hours :) ).

Thanks,
Vladimir
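The virtual-destructor question in the exchange above comes down to what happens on `delete` through a base-class pointer. A minimal sketch, using hypothetical `Base` and `Derived` classes rather than ResourceObj and OopRecorder, shows the case where the base destructor is virtual and the subclass destructor therefore runs:

```cpp
#include <cassert>

// Hypothetical classes for illustration; not HotSpot code.
struct Base {
  virtual ~Base() {}   // virtual: delete through Base* dispatches on the dynamic type
};

struct Derived : Base {
  explicit Derived(int* counter) : _counter(counter) {}
  ~Derived() override { ++*_counter; }   // records that the subclass destructor ran
  int* _counter;
};

// Deletes a Derived through a Base pointer and returns how many
// Derived destructors ran.
inline int destroy_through_base() {
  int derived_dtor_calls = 0;
  Base* f = new Derived(&derived_dtor_calls);
  delete f;            // well-defined only because ~Base() is virtual
  return derived_dtor_calls;
}
```

If `~Base()` were not virtual, `delete f` would be undefined behavior in C++, and in practice the subclass destructor is typically skipped. For stack-allocated or embedded objects the static type is known, so the destructor chain runs correctly either way.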
From David.Holmes at oracle.com Tue Aug 3 05:13:37 2010
From: David.Holmes at oracle.com (David Holmes)
Date: Tue, 03 Aug 2010 22:13:37 +1000
Subject: [Fwd: Request for reviews (M): 6973963: SEGV in ciBlock::start_bci() with EA]
In-Reply-To: <4C57E268.5060602@oracle.com>
References: <4C57401E.5040204@oracle.com> <4C574904.2050307@oracle.com> <4C57ACE6.6090702@oracle.com> <4C57B5D0.4090807@oracle.com> <4C57E268.5060602@oracle.com>
Message-ID: <4C5807F1.1070607@oracle.com>

Vladimir Kozlov said the following on 08/03/10 19:33:
> On 8/2/10 11:23 PM, David Holmes wrote:
>> src/share/vm/asm/codeBuffer.cpp
>>
>> + void CodeBuffer::operator delete(void* p) {
>> +   ResourceObj::operator delete(p);
>>   #ifdef ASSERT
>> !   Copy::fill_to_bytes(p, sizeof(CodeBuffer), badResourceValue);
>>   #endif
>>   }
>>
>> You moved the copy from the destructor to operator delete, but now you
>> access p after you have deleted it.
>
> Yes, you are right. But CodeBuffer is never allocated on C heap and
> ResourceObj::delete(p)
> should be called only for allocation on C_HEAP. So this code should
> never be executed,

I'm confused. CodeBuffer is a StackObj, not a ResourceObj, so why would
you call ResourceObj::delete in the first place ??

> I will replace it with ShouldNotCallThis() in CodeBuffer::delete(void* p).
> Which leaves the original problem: I need to find how to zap CodeBuffer
> after ResourceObj destructor
> which is called after CodeBuffer destructor.

Again I'm missing something - what is the connection between CodeBuffer
and ResourceObj ?

> Note: the problem here is not CodeBuffer but its field: OopRecorder
> _default_oop_recorder;

Explain that to me off-list if you like, I'm not familiar with this part
of the code.

>> src/share/vm/memory/allocation.hpp
>>
>> ! ~ResourceObj();
>>
>> You've added a destructor, but subclasses define their own destructors
>> - so doesn't this need to be virtual?
>
> Constructor and destructor are special methods. Super class's default
> destructor are always called from
> subclass's destructor. The only case when you need to declare it as
> virtual if subclass object is assigned to
> local/field of super class type and compiler does not know what the
> original subclass was.
>
> ResourceObj *f = new OopRecorder();
> ...
> delete f;
>
> We don't use such constructions.

Ok. Just wanted to check. In another codebase we got bitten by a cleanup
method in the superclass that did "delete this;" which did not invoke
the destructor for the dynamic type of 'this' because the destructor was
not virtual.

Cheers,
David

From vladimir.kozlov at oracle.com Tue Aug 3 11:33:38 2010
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Tue, 03 Aug 2010 11:33:38 -0700
Subject: [Fwd: Request for reviews (M): 6973963: SEGV in ciBlock::start_bci() with EA]
In-Reply-To: <4C5807F1.1070607@oracle.com>
References: <4C57401E.5040204@oracle.com> <4C574904.2050307@oracle.com> <4C57ACE6.6090702@oracle.com> <4C57B5D0.4090807@oracle.com> <4C57E268.5060602@oracle.com> <4C5807F1.1070607@oracle.com>
Message-ID: <4C586102.3050508@oracle.com>

I updated webrev

http://cr.openjdk.java.net/~kvn/6973963/webrev.02

I added 'virtual' to ~ResourceObj() to avoid any surprises, as Dave pointed out.

I save/restore the allocation type around Copy::fill_to_bytes() in ~CodeBuffer()
to solve my problem.

Vladimir

From vladimir.kozlov at oracle.com Tue Aug 3 17:56:50 2010
From: vladimir.kozlov at oracle.com (vladimir.kozlov at oracle.com)
Date: Wed, 04 Aug 2010 00:56:50 +0000
Subject: hg: jdk7/hotspot-comp/hotspot: 6973963: SEGV in ciBlock::start_bci() with EA
Message-ID: <20100804005652.5BC8A47EBE@hg.openjdk.java.net>

Changeset: 0e35fa8ebccd
Author: kvn
Date: 2010-08-03 15:55 -0700
URL: http://hg.openjdk.java.net/jdk7/hotspot-comp/hotspot/rev/0e35fa8ebccd

6973963: SEGV in ciBlock::start_bci() with EA
Summary: Added more checks into ResourceObj and growableArray to verify correctness of allocation type.
Reviewed-by: never, coleenp, dholmes

! src/share/vm/asm/codeBuffer.cpp
! src/share/vm/asm/codeBuffer.hpp
! src/share/vm/ci/ciInstanceKlass.cpp
! src/share/vm/ci/ciMethodBlocks.cpp
! src/share/vm/ci/ciTypeFlow.cpp
! src/share/vm/classfile/classFileParser.cpp
! src/share/vm/memory/allocation.cpp
! src/share/vm/memory/allocation.hpp
! src/share/vm/opto/block.cpp
! src/share/vm/opto/block.hpp
! src/share/vm/opto/c2_globals.hpp
! src/share/vm/opto/c2compiler.cpp
! src/share/vm/opto/chaitin.cpp
! src/share/vm/opto/compile.cpp
! src/share/vm/opto/gcm.cpp
! src/share/vm/opto/lcm.cpp
! src/share/vm/utilities/growableArray.hpp

From vladimir.kozlov at oracle.com Wed Aug 4 15:03:48 2010
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Wed, 04 Aug 2010 15:03:48 -0700
Subject: Request for reviews (XS): 6974682: CTW: assert(target != NULL) failed: must not be null
Message-ID: <4C59E3C4.4090102@oracle.com>

http://cr.openjdk.java.net/~kvn/6974682/webrev

Fixed 6974682: CTW: assert(target != NULL) failed: must not be null

The address table for an indirect branch is allocated in the constant section,
but its size is not taken into account when the constant section's
size is computed.

Solution:
Add address table size to constant section.

Tested with failed case, CTW.
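The sizing problem described in 6974682 can be pictured with a small sketch: anything placed in the constant section must be counted when the section is sized, including the address table for indirect branches. The struct, the field names, and the per-entry sizes below are assumptions made for illustration and do not reflect the actual code in output.cpp.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical inputs to a constant-section size estimate.
struct ConstSectionEstimate {
  std::size_t double_consts;     // number of double constants
  std::size_t float_consts;      // number of float constants
  std::size_t jump_table_slots;  // address-table entries for indirect branches
};

inline std::size_t const_section_size(const ConstSectionEstimate& e) {
  std::size_t size = e.double_consts * sizeof(double)
                   + e.float_consts * sizeof(float);
  // The kind of fix described above: the address table lives in the
  // constant section too, so its entries are added to the total.
  size += e.jump_table_slots * sizeof(void*);
  return size;
}
```

Leaving out the last term would undersize the section whenever a method contains an indirect branch with a non-empty address table.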
From tom.rodriguez at oracle.com Wed Aug 4 17:03:06 2010
From: tom.rodriguez at oracle.com (Tom Rodriguez)
Date: Wed, 4 Aug 2010 17:03:06 -0700
Subject: Request for reviews (XS): 6974682: CTW: assert(target != NULL) failed: must not be null
In-Reply-To: <4C59E3C4.4090102@oracle.com>
References: <4C59E3C4.4090102@oracle.com>
Message-ID:

Your change seems ok but that const_size code is crap. It always greatly overestimates the space needed.

  #ifdef SPARC
    // Sparc doubles entries in the constant table require more space for
    // alignment. (expires 9/98)
    int table_entries = (3 * instr->num_consts( _globalNames, Form::idealD ))
      + instr->num_consts( _globalNames, Form::idealF );
  #else
    int table_entries = instr->num_consts( _globalNames, Form::idealD )
      + instr->num_consts( _globalNames, Form::idealF );
  #endif

So on sparc a double reserves 6 words. It can/should be cleaned up as part of the changes Christian is working on.

tom

From vladimir.kozlov at oracle.com Wed Aug 4 21:13:03 2010
From: vladimir.kozlov at oracle.com (vladimir.kozlov at oracle.com)
Date: Thu, 05 Aug 2010 04:13:03 +0000
Subject: hg: jdk7/hotspot-comp/hotspot: 6974682: CTW: assert(target != NULL) failed: must not be null
Message-ID: <20100805041305.5BCE147EFF@hg.openjdk.java.net>

Changeset: 0e09207fc81b
Author: kvn
Date: 2010-08-04 17:42 -0700
URL: http://hg.openjdk.java.net/jdk7/hotspot-comp/hotspot/rev/0e09207fc81b

6974682: CTW: assert(target != NULL) failed: must not be null
Summary: Add address table size to constant section size.
Reviewed-by: never

! src/share/vm/opto/output.cpp

From tom.rodriguez at oracle.com Thu Aug 5 13:25:09 2010
From: tom.rodriguez at oracle.com (Tom Rodriguez)
Date: Thu, 5 Aug 2010 13:25:09 -0700
Subject: review (S) for 6975006: assert(check.is_deoptimized_frame()) failed: missed deopt
Message-ID:

http://cr.openjdk.java.net/~never/6975006

6975006: assert(check.is_deoptimized_frame()) failed: missed deopt
Reviewed-by:

The safepointing logic treats threads that are thread_in_native as if
they are halted since the Java state is safe while we are in native
state. If the thread happens to return from native during the
safepoint it will simply come to a halt. On sparc this creates some
complexity when patching for deoptimization because the return address
is kept in a register and only flushed to stack by the chip. We force
flushing of the windows in the JNI stub but because of the way
register windows work this doesn't help the frame that is just above a
native wrapper since the window might be on chip while the native
wrapper itself is executing. There's machinery in the deopt code that
detects the case where the caller of a native wrapper is the one being
deoptimized and arranges for the native wrapper to rewrite the return
address when it comes out of native. The problem is that this code
examines the current state of the thread at the time the deopt occurs,
not what the state was when the safepoint started. This creates a little
race where a native wrapper might come to a halt on its own after the
safepoint started but before the deopt patching occurred, which
sidesteps the deopt suspend logic because it's not in one of the
thread_in_native states. The fix is to record the state of the thread
at the beginning of the safepoint and consult that when triggering the
deopt suspend logic.

Tested by repeatedly running test with -XX:+DeoptimizeALot.
Previously it would fail within 5 minutes but after the fix it ran
overnight until I simply killed it.
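The race described for 6975006 can be modeled in a few lines: a decision made from the thread's live state can differ from one made from the state recorded when the safepoint began. The sketch below is a simplified model with hypothetical names (`ThreadModel`, `begin_safepoint`, and the two predicate functions); it is not the HotSpot safepoint or deoptimization code.

```cpp
#include <cassert>

// Hypothetical thread states for the model.
enum ThreadState { thread_in_Java, thread_in_native, thread_blocked };

struct ThreadModel {
  ThreadState state;               // live state, may change during the safepoint
  ThreadState state_at_safepoint;  // snapshot taken when the safepoint begins
};

inline void begin_safepoint(ThreadModel& t) {
  t.state_at_safepoint = t.state;  // record once, up front
}

// Racy variant: consults the live state at deopt time.
inline bool needs_deopt_suspend_live(const ThreadModel& t) {
  return t.state == thread_in_native;
}

// Variant matching the described fix: consults the recorded state.
inline bool needs_deopt_suspend_snapshot(const ThreadModel& t) {
  return t.state_at_safepoint == thread_in_native;
}
```

In the failing interleaving, the thread is in native when the safepoint starts, returns and blocks before the deopt patching happens, and the live-state check then misses it while the snapshot check still reports that the deopt suspend path is needed.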
From vladimir.kozlov at oracle.com Thu Aug 5 15:59:48 2010
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 05 Aug 2010 15:59:48 -0700
Subject: Request for reviews (S): 6975078: assert(allocated_on_res_area() || allocated_on_C_heap() || allocated_on_arena()
Message-ID: <4C5B4264.2020004@oracle.com>

Sending also to GC since I touched G1 code :)

http://cr.openjdk.java.net/~kvn/6975078/webrev

Fixed 6975078: assert(allocated_on_res_area() || allocated_on_C_heap() || allocated_on_arena()

The assert is from my fix for 6973963 and I can't reproduce this failure.

  void emit_call_reloc() {
    MacroAssembler _masm(&cbuf); <<< asserts here, allocation on stack.

It could be because the ~ResourceObj() destructor is not called for _masm
but I doubt it.

In the 6973963 changes, to track correctness of the allocation type (to separate
it from garbage on stack) I encoded the (negated) 'this' address into the
_allocation value and zap it in the ~ResourceObj() destructor.

Most likely it is because the garbage value on stack is equal to
~(address of _masm on stack). For example, for 0xffffffff613fb4d0
(sp from hs_err file) it could be 0x9ec04b20.
I thought it would be impossible but I was wrong, it seems.

Solution:
Pass the check in the ResourceObj() constructor if _allocation has a value
which looks like an allocation on stack and it is really allocated on stack.

I also did cleanup:
- added 'const' to ResourceObj access methods,
- fixed a few typos and comments,
- replaced in G1 the call to ResourceObj::new() with ResourceObj::set_allocation_type().

JPRT.

From vladimir.kozlov at oracle.com Thu Aug 5 16:25:23 2010
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 05 Aug 2010 16:25:23 -0700
Subject: review (S) for 6975006: assert(check.is_deoptimized_frame()) failed: missed deopt
In-Reply-To:
References:
Message-ID: <4C5B4863.3050408@oracle.com>

Looks good.

Vladimir
From tom.rodriguez at oracle.com Thu Aug 5 13:24:17 2010 From: tom.rodriguez at oracle.com (Tom Rodriguez) Date: Thu, 5 Aug 2010 13:24:17 -0700 Subject: review (XS) for 6975027: use of movptr to set length of array Message-ID: <79CD3A82-8384-4CB8-B6C8-B45CE016A988@oracle.com> http://cr.openjdk.java.net/~never/6975027/ 6975027: use of movptr to set length of array Reviewed-by: Some C1 slow path code for setting the length of an array that's used to fill dead space in a tlab is using movptr instead of movl. From vladimir.kozlov at oracle.com Fri Aug 6 08:38:10 2010 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 06 Aug 2010 08:38:10 -0700 Subject: review (XS) for 6975027: use of movptr to set length of array In-Reply-To: <79CD3A82-8384-4CB8-B6C8-B45CE016A988@oracle.com> References: <79CD3A82-8384-4CB8-B6C8-B45CE016A988@oracle.com> Message-ID: <4C5C2C62.4070203@oracle.com> Something strange with mail server. I received this mail and several others only today. Changes look good. Vladimir On 8/5/10 1:24 PM, Tom Rodriguez wrote: > http://cr.openjdk.java.net/~never/6975027/ > > 6975027: use of movptr to set length of array > Reviewed-by: > > Some C1 slow path code for setting the length of an array that's used > to fill dead space in a tlab is using movptr instead of movl. From igor.veresov at oracle.com Fri Aug 6 10:36:31 2010 From: igor.veresov at oracle.com (Igor Veresov) Date: Fri, 06 Aug 2010 10:36:31 -0700 Subject: review (XS) for 6975027: use of movptr to set length of array In-Reply-To: <79CD3A82-8384-4CB8-B6C8-B45CE016A988@oracle.com> References: <79CD3A82-8384-4CB8-B6C8-B45CE016A988@oracle.com> Message-ID: <4C5C481F.3@oracle.com> Looks good. On 8/5/10 1:24 PM, Tom Rodriguez wrote: > http://cr.openjdk.java.net/~never/6975027/ > > 6975027: use of movptr to set length of array > Reviewed-by: > > Some C1 slow path code for setting the length of an array that's used > to fill dead space in a tlab is using movptr instead of movl. 
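A small sketch of why the width of the store matters here, assuming a 64-bit target where a pointer-sized move writes 8 bytes while the array length field is 32 bits wide. FakeHeader and its 8-byte layout are hypothetical and chosen only for illustration; the messages above do not say which neighbouring data was affected.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical 8-byte layout, for illustration only:
// bytes [0..3] hold a 32-bit length, bytes [4..7] hold the next field.
struct FakeHeader {
    unsigned char bytes[8];
};

void set_next_field(FakeHeader& h, std::uint32_t v) {
    std::memcpy(h.bytes + 4, &v, sizeof v);
}

std::uint32_t next_field(const FakeHeader& h) {
    std::uint32_t v;
    std::memcpy(&v, h.bytes + 4, sizeof v);
    return v;
}

// 32-bit store (analogous to movl): touches only the length field.
void store_length_32(FakeHeader& h, std::uint32_t len) {
    std::memcpy(h.bytes, &len, sizeof len);
}

// 64-bit store (what a pointer-sized move amounts to on a 64-bit target):
// also overwrites the four bytes that follow the length.
void store_length_64(FakeHeader& h, std::uint64_t len) {
    std::memcpy(h.bytes, &len, sizeof len);
}

void demo_length_store() {
    FakeHeader h{};
    set_next_field(h, 0xCAFEBABEu);
    store_length_32(h, 10);
    assert(next_field(h) == 0xCAFEBABEu);  // neighbour left intact
    store_length_64(h, 10);
    assert(next_field(h) != 0xCAFEBABEu);  // neighbour overwritten
}
```

On a 32-bit target the two stores would be the same width, so a mismatch of this kind only shows up on 64-bit builds.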
From vladimir.kozlov at oracle.com Fri Aug 6 13:51:20 2010 From: vladimir.kozlov at oracle.com (vladimir.kozlov at oracle.com) Date: Fri, 06 Aug 2010 20:51:20 +0000 Subject: hg: jdk7/hotspot-comp/hotspot: 6975049: nsk/regression/b4287029 crashes with -Xss64 on solaris-i586 Message-ID: <20100806205123.E0D9747F65@hg.openjdk.java.net> Changeset: fb8abd207dbe Author: kvn Date: 2010-08-06 11:53 -0700 URL: http://hg.openjdk.java.net/jdk7/hotspot-comp/hotspot/rev/fb8abd207dbe 6975049: nsk/regression/b4287029 crashes with -Xss64 on solaris-i586 Summary: Tell C++ to not inline so much by using flag -xspace. Reviewed-by: ysr ! make/solaris/makefiles/sparcWorks.make From christian.thalinger at oracle.com Mon Aug 9 04:52:06 2010 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Mon, 09 Aug 2010 13:52:06 +0200 Subject: review (S) for 6975006: assert(check.is_deoptimized_frame()) failed: missed deopt In-Reply-To: References: Message-ID: <1281354726.1119.266.camel@macbook> On Thu, 2010-08-05 at 13:25 -0700, Tom Rodriguez wrote: > http://cr.openjdk.java.net/~never/6975006 > > 6975006: assert(check.is_deoptimized_frame()) failed: missed deopt > Reviewed-by: > > The safepointing logic treats threads that are thread_in_native as if > they are halted since the Java state is safe while we are in native > state. If the thread happens to return from native during the > safepoint it will simply come to a halt. On sparc this creates some > complexity when patching for deoptimization because the return address > is kept in a register and only flushed to stack by the chip. We force > flushing of the windows in the JNI stub but because of the way > register windows work this doesn't help the frame that is just above a > native wrapper since the window might be on chip while the native > wrapper itself is executing. 
There's machinery in the deopt code that > detects the case where the caller of a native wrapper is the one being > deoptimized and arranges for the native wrapper to rewrite the return > address when it comes out of native. The problem is that this code > examines the current state of the thread at the time the deopt occurs, > not what the state was when the safepoint started. This creates a little > race where a native wrapper might come to a halt on its own after the > safepoint started but before the deopt patching occurred, which > sidesteps the deopt suspend logic because it's not in one of the > thread_in_native states. The fix is to record the state of the thread > at the beginning of the safepoint and consult that when triggering the > deopt suspend logic. > > Tested by repeatedly running test with -XX:+DeoptimizeALot. > Previously it would fail within 5 minutes but after the fix it ran > overnight until I simply killed it. Looks good. -- Christian From vladimir.kozlov at oracle.com Mon Aug 9 17:18:32 2010 From: vladimir.kozlov at oracle.com (vladimir.kozlov at oracle.com) Date: Tue, 10 Aug 2010 00:18:32 +0000 Subject: hg: jdk7/hotspot-comp/hotspot: 6975078: assert(allocated_on_res_area() || allocated_on_C_heap() || allocated_on_arena() Message-ID: <20100810001838.818D147030@hg.openjdk.java.net> Changeset: 2dfd013a7465 Author: kvn Date: 2010-08-09 15:17 -0700 URL: http://hg.openjdk.java.net/jdk7/hotspot-comp/hotspot/rev/2dfd013a7465 6975078: assert(allocated_on_res_area() || allocated_on_C_heap() || allocated_on_arena() Summary: Pass the check in ResourceObj() if _allocation value is already set and object is allocated on stack. Reviewed-by: dholmes, johnc ! src/share/vm/gc_implementation/g1/collectionSetChooser.cpp ! src/share/vm/gc_implementation/g1/heapRegionSeq.cpp ! src/share/vm/memory/allocation.cpp ! src/share/vm/memory/allocation.hpp ! src/share/vm/runtime/thread.cpp !
src/share/vm/runtime/thread.hpp From tom.rodriguez at oracle.com Mon Aug 9 21:42:44 2010 From: tom.rodriguez at oracle.com (tom.rodriguez at oracle.com) Date: Tue, 10 Aug 2010 04:42:44 +0000 Subject: hg: jdk7/hotspot-comp/hotspot: 8 new changesets Message-ID: <20100810044300.82DF947039@hg.openjdk.java.net> Changeset: a81afd9c293c Author: alanb Date: 2010-07-16 13:14 +0100 URL: http://hg.openjdk.java.net/jdk7/hotspot-comp/hotspot/rev/a81afd9c293c 6649594: Intermittent IOExceptions during dynamic attach on linux and solaris Reviewed-by: dcubed, dholmes ! src/os/linux/vm/attachListener_linux.cpp ! src/os/solaris/vm/attachListener_solaris.cpp Changeset: 920aa833fd16 Author: apangin Date: 2010-07-17 21:49 -0700 URL: http://hg.openjdk.java.net/jdk7/hotspot-comp/hotspot/rev/920aa833fd16 Merge Changeset: a5c9d63a187d Author: apangin Date: 2010-07-20 08:41 -0700 URL: http://hg.openjdk.java.net/jdk7/hotspot-comp/hotspot/rev/a5c9d63a187d 6964170: Verifier crashes Summary: Check if klassOop != NULL rather than klass_part != NULL Reviewed-by: kamg, never ! src/share/vm/classfile/verificationType.cpp ! src/share/vm/classfile/verifier.cpp Changeset: 7f0fdccac34f Author: apangin Date: 2010-07-25 07:31 -0700 URL: http://hg.openjdk.java.net/jdk7/hotspot-comp/hotspot/rev/7f0fdccac34f Merge ! src/share/vm/classfile/verifier.cpp Changeset: 3d90023429ec Author: aph Date: 2010-07-28 17:38 +0100 URL: http://hg.openjdk.java.net/jdk7/hotspot-comp/hotspot/rev/3d90023429ec 6888526: Linux getCurrentThreadCpuTime is drastically slower than Windows Reviewed-by: dcubed, dholmes ! src/os/linux/vm/globals_linux.hpp ! src/share/vm/runtime/arguments.cpp Changeset: a64438a2b7e8 Author: coleenp Date: 2010-07-28 17:57 -0400 URL: http://hg.openjdk.java.net/jdk7/hotspot-comp/hotspot/rev/a64438a2b7e8 6958465: Sparc aten build24.0: openjdk-7.ea-b96 failed Error: Formal argument ... requires an lvalue Summary: Fix compilation errors. Made non-const references const so can be assigned with lvalue. 
Reviewed-by: phh, xlu ! src/cpu/sparc/vm/assembler_sparc.cpp ! src/cpu/sparc/vm/assembler_sparc.hpp ! src/cpu/sparc/vm/assembler_sparc.inline.hpp Changeset: 126ea7725993 Author: bobv Date: 2010-08-03 08:13 -0400 URL: http://hg.openjdk.java.net/jdk7/hotspot-comp/hotspot/rev/126ea7725993 6953477: Increase portability and flexibility of building Hotspot Summary: A collection of portability improvements including shared code support for PPC, ARM platforms, software floating point, cross compilation support and improvements in error crash detail. Reviewed-by: phh, never, coleenp, dholmes ! agent/src/os/linux/ps_proc.c ! make/Makefile ! make/defs.make ! make/linux/makefiles/build_vm_def.sh ! make/linux/makefiles/buildtree.make ! make/linux/makefiles/defs.make ! make/linux/makefiles/gcc.make ! make/linux/makefiles/product.make ! make/linux/makefiles/sa.make ! make/linux/makefiles/saproc.make ! make/linux/makefiles/vm.make ! make/solaris/makefiles/defs.make ! src/cpu/sparc/vm/bytecodeInterpreter_sparc.inline.hpp ! src/cpu/sparc/vm/c1_LIRGenerator_sparc.cpp ! src/cpu/sparc/vm/c1_Runtime1_sparc.cpp ! src/cpu/sparc/vm/interpreterRT_sparc.cpp ! src/cpu/sparc/vm/javaFrameAnchor_sparc.hpp ! src/cpu/sparc/vm/templateTable_sparc.cpp ! src/cpu/x86/vm/bytecodeInterpreter_x86.inline.hpp ! src/cpu/x86/vm/c1_LIRGenerator_x86.cpp ! src/cpu/x86/vm/c1_Runtime1_x86.cpp ! src/cpu/x86/vm/frame_x86.cpp ! src/cpu/x86/vm/interpreterRT_x86_32.cpp ! src/cpu/x86/vm/javaFrameAnchor_x86.hpp ! src/cpu/x86/vm/templateTable_x86_32.cpp ! src/cpu/x86/vm/templateTable_x86_64.cpp ! src/os/linux/launcher/java_md.c ! src/os/linux/vm/os_linux.cpp ! src/os/solaris/vm/os_solaris.cpp ! src/os/windows/vm/os_windows.cpp ! src/os_cpu/linux_sparc/vm/thread_linux_sparc.cpp ! src/os_cpu/linux_x86/vm/os_linux_x86.cpp ! src/os_cpu/linux_x86/vm/thread_linux_x86.cpp ! src/os_cpu/linux_zero/vm/thread_linux_zero.cpp ! src/os_cpu/solaris_sparc/vm/os_solaris_sparc.cpp ! src/os_cpu/solaris_sparc/vm/thread_solaris_sparc.cpp ! 
src/os_cpu/solaris_x86/vm/os_solaris_x86.cpp ! src/os_cpu/solaris_x86/vm/thread_solaris_x86.cpp ! src/os_cpu/windows_x86/vm/os_windows_x86.cpp ! src/os_cpu/windows_x86/vm/thread_windows_x86.cpp ! src/share/vm/asm/codeBuffer.hpp ! src/share/vm/c1/c1_CodeStubs.hpp ! src/share/vm/c1/c1_Compilation.hpp ! src/share/vm/c1/c1_FrameMap.cpp ! src/share/vm/c1/c1_FrameMap.hpp ! src/share/vm/c1/c1_LIR.cpp ! src/share/vm/c1/c1_LIR.hpp ! src/share/vm/c1/c1_LIRGenerator.cpp ! src/share/vm/c1/c1_LIRGenerator.hpp ! src/share/vm/c1/c1_LinearScan.cpp ! src/share/vm/c1/c1_Runtime1.cpp ! src/share/vm/c1/c1_Runtime1.hpp ! src/share/vm/code/codeBlob.cpp ! src/share/vm/code/codeBlob.hpp ! src/share/vm/code/nmethod.hpp ! src/share/vm/code/vtableStubs.cpp ! src/share/vm/code/vtableStubs.hpp ! src/share/vm/compiler/disassembler.cpp ! src/share/vm/includeDB_compiler1 ! src/share/vm/includeDB_core ! src/share/vm/interpreter/bytecodeInterpreter.cpp ! src/share/vm/interpreter/bytecodeInterpreter.hpp ! src/share/vm/interpreter/bytecodeInterpreter.inline.hpp ! src/share/vm/interpreter/interpreter.cpp ! src/share/vm/interpreter/interpreter.hpp ! src/share/vm/interpreter/oopMapCache.cpp ! src/share/vm/memory/allocation.cpp ! src/share/vm/memory/allocation.hpp ! src/share/vm/memory/genCollectedHeap.cpp ! src/share/vm/memory/generation.hpp ! src/share/vm/oops/arrayKlass.cpp ! src/share/vm/oops/arrayKlass.hpp ! src/share/vm/oops/arrayKlassKlass.cpp ! src/share/vm/oops/arrayKlassKlass.hpp ! src/share/vm/oops/compiledICHolderKlass.cpp ! src/share/vm/oops/compiledICHolderKlass.hpp ! src/share/vm/oops/constMethodKlass.cpp ! src/share/vm/oops/constMethodKlass.hpp ! src/share/vm/oops/constantPoolKlass.cpp ! src/share/vm/oops/constantPoolKlass.hpp ! src/share/vm/oops/cpCacheKlass.cpp ! src/share/vm/oops/cpCacheKlass.hpp ! src/share/vm/oops/generateOopMap.cpp ! src/share/vm/oops/klass.cpp ! src/share/vm/oops/klass.hpp ! src/share/vm/oops/klassKlass.cpp ! src/share/vm/oops/klassKlass.hpp ! 
src/share/vm/oops/oop.cpp ! src/share/vm/prims/jni.cpp ! src/share/vm/prims/jvmtiEnvThreadState.hpp ! src/share/vm/runtime/arguments.cpp ! src/share/vm/runtime/frame.cpp ! src/share/vm/runtime/frame.hpp ! src/share/vm/runtime/globals.hpp ! src/share/vm/runtime/java.cpp ! src/share/vm/runtime/javaFrameAnchor.hpp ! src/share/vm/runtime/os.cpp ! src/share/vm/runtime/os.hpp ! src/share/vm/runtime/sharedRuntime.cpp ! src/share/vm/runtime/sharedRuntime.hpp ! src/share/vm/runtime/sharedRuntimeTrans.cpp ! src/share/vm/runtime/signature.hpp ! src/share/vm/runtime/stubCodeGenerator.cpp ! src/share/vm/runtime/stubCodeGenerator.hpp ! src/share/vm/runtime/thread.cpp ! src/share/vm/runtime/thread.hpp ! src/share/vm/runtime/vm_version.cpp ! src/share/vm/runtime/vm_version.hpp ! src/share/vm/utilities/debug.cpp ! src/share/vm/utilities/globalDefinitions_gcc.hpp ! src/share/vm/utilities/macros.hpp ! src/share/vm/utilities/vmError.cpp ! src/share/vm/utilities/vmError.hpp Changeset: f4f596978298 Author: never Date: 2010-08-09 17:51 -0700 URL: http://hg.openjdk.java.net/jdk7/hotspot-comp/hotspot/rev/f4f596978298 Merge ! src/share/vm/asm/codeBuffer.hpp ! src/share/vm/memory/allocation.cpp ! src/share/vm/memory/allocation.hpp ! src/share/vm/runtime/globals.hpp ! src/share/vm/runtime/thread.cpp ! src/share/vm/runtime/thread.hpp ! 
src/share/vm/utilities/vmError.cpp From christian.thalinger at oracle.com Tue Aug 10 03:10:33 2010 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Tue, 10 Aug 2010 12:10:33 +0200 Subject: Request for review(XXS): 6962980 C1: stub area should take into account method handle deopt stub In-Reply-To: <1277200845.27950.42.camel@macbook> References: <4C20851E.3030802@oracle.com> <1277200845.27950.42.camel@macbook> Message-ID: <1281435033.1022.9.camel@macbook> On Tue, 2010-06-22 at 12:00 +0200, Christian Thalinger wrote: > On Tue, 2010-06-22 at 02:40 -0700, Igor Veresov wrote: > > We should reserve additional space in the code buffer for method handle > > deopt stub. > > > > Webrev: http://cr.openjdk.java.net/~iveresov/6962980/webrev/ > > Good point. That reminds me of something I wanted to do for C1 as I did > for C2: only emit the handler when it's really required. > > C2 does this: > > if (has_method_handle_invokes()) > total_req += deopt_handler_req; // deopt MH handler > > I should file a CR for that. 6975855: don't emit deopt MH handler in C1 if not required -- Christian From christian.thalinger at oracle.com Tue Aug 10 06:40:55 2010 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Tue, 10 Aug 2010 15:40:55 +0200 Subject: Request for reviews (S): 6975855: don't emit deopt MH handler in C1 if not required Message-ID: <1281447655.17347.1.camel@macbook> http://cr.openjdk.java.net/~twisti/6975855/webrev.01/ 6975855: don't emit deopt MH handler in C1 if not required Summary: This CR implements the same for C1 as 6926782 for C2. Reviewed-by: The changes of 6926782 don't emit the deopt MH handler in C2. This CR implements the same for C1. 
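The C2 snippet quoted earlier in this thread (`if (has_method_handle_invokes()) total_req += deopt_handler_req;`) can be sketched in isolation as follows. This is only an illustration of the sizing idea under stated assumptions: HandlerSizes, stub_space_required and the byte counts are hypothetical and are not taken from the webrev.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical handler sizes in bytes; real values are platform specific.
struct HandlerSizes {
    std::size_t exception_handler;
    std::size_t deopt_handler;
};

// Reserve room for the regular handlers, and for the additional
// method handle deopt handler only when the method actually needs it.
std::size_t stub_space_required(const HandlerSizes& s,
                                bool has_method_handle_invokes) {
    std::size_t total = s.exception_handler + s.deopt_handler;
    if (has_method_handle_invokes) {
        total += s.deopt_handler;  // deopt MH handler
    }
    return total;
}

void demo_stub_space() {
    HandlerSizes sizes{64, 32};
    assert(stub_space_required(sizes, false) == 96);   // no MH invokes
    assert(stub_space_required(sizes, true) == 128);   // one extra handler
}
```

Methods without method handle invokes then pay neither the space reservation nor the emission cost for the extra handler.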
From vladimir.kozlov at oracle.com Tue Aug 10 11:17:39 2010 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 10 Aug 2010 11:17:39 -0700 Subject: Request for reviews (S): 6973329: C2 with Zero based COOP produces code with broken anti-dependency on x86 Message-ID: <4C6197C3.2070808@oracle.com> http://cr.openjdk.java.net/~kvn/6973329/webrev Fixed 6973329: C2 with Zero based COOP produces code with broken anti-dependency on x86 Main problem: RA ignores anti-dependence when placing a clone of a node which produces flags (or any rematerializable nodes). The code generating implicit_null_check may move such nodes above nodes modifying flags which will force RA to clone them. Solution: Recompile without subsuming loads if RA tries to clone a node with anti_dependence. Do not use nodes which produce flags in implicit null checks. Added a regression test based on a failure I saw in SPECjEnterprise2010. JPRT, SPECjEnterprise2010 Notes: I collected statistics on how many dependences are found per call to insert_anti_dependences(), and on how many recompilations without subsuming loads and total bailouts happened in the new RA code. CTW rt.jar: 1431559 made anti_dependence checks, 5497392 found anti_dependences (384%), 128477 did not find anti_dependences ( 8%) jvm2008: 200020 made anti_dependence checks, 724530 found anti_dependences (362%), 10339 did not find anti_dependences ( 5%) Originally I thought about adding a new Node flag has_anti_dependence, which I would set in insert_anti_dependences() and check in clone_node() in RA. But looking at these statistics I decided to use the existing flag needs_anti_dependence_check. After that, the number of recompilations without subsuming loads changed from 115 to 127 (in CTW rt.jar). I think it is acceptable. Next I found that this recompilation number could be significantly reduced if I exclude nodes which produce flags from implicit null checks.
Before: CTW rt.jar: RA: 127 recompile without subsume_loads, 0 bailout compilation jvm2008: RA: 140 recompile without subsume_loads, 0 bailout compilation After: CTW rt.jar: RA: 39 recompile without subsume_loads, 0 bailout compilation jvm2008: RA: 3 recompile without subsume_loads, 0 bailout compilation From tom.rodriguez at oracle.com Tue Aug 10 11:32:19 2010 From: tom.rodriguez at oracle.com (Tom Rodriguez) Date: Tue, 10 Aug 2010 11:32:19 -0700 Subject: Request for reviews (S): 6973329: C2 with Zero based COOP produces code with broken anti-dependency on x86 In-Reply-To: <4C6197C3.2070808@oracle.com> References: <4C6197C3.2070808@oracle.com> Message-ID: I think that looks good. Not using must_clone nodes in implicit null checks is a good fix. tom On Aug 10, 2010, at 11:17 AM, Vladimir Kozlov wrote: > http://cr.openjdk.java.net/~kvn/6973329/webrev > > Fixed 6973329: C2 with Zero based COOP produces code with broken anti-dependency on x86 > > Main problem: RA ignores anti-dependence when placing a clone of > a node which produces flags (or any rematerializable nodes). > The code generating implicit_null_check may move such nodes > above nodes modifying flags which will force RA to clone them. > > Solution: > Recompile without subsuming loads if RA tries to clone a node with anti_dependence. > Do not use nodes which produce flags in implicit null checks. > > Added a regression test based on a failure I saw in SPECjEnterprise2010. > > JPRT, SPECjEnterprise2010 > > Notes: > I collected statistics on how many dependences are found per > call to insert_anti_dependences(), and on how many recompilations > without subsuming loads and total bailouts happened in the new RA code.
> > CTW rt.jar: > 1431559 made anti_dependence checks, 5497392 found anti_dependences (384%), > 128477 did not find anti_dependences ( 8%) > > jvm2008: > 200020 made anti_dependence checks, 724530 found anti_dependences (362%), > 10339 did not find anti_dependences ( 5%) > > Originally I thought about adding a new Node flag has_anti_dependence, which I > would set in insert_anti_dependences() and check in clone_node() in RA. > But looking at these statistics I decided to use the existing flag > needs_anti_dependence_check. After that, the number of recompilations without > subsuming loads changed from 115 to 127 (in CTW rt.jar). I think it is acceptable. > > Next I found that this recompilation number could be significantly reduced > if I exclude nodes which produce flags from implicit null checks. > > Before: > CTW rt.jar: > RA: 127 recompile without subsume_loads, 0 bailout compilation > jvm2008: > RA: 140 recompile without subsume_loads, 0 bailout compilation > > After: > CTW rt.jar: > RA: 39 recompile without subsume_loads, 0 bailout compilation > jvm2008: > RA: 3 recompile without subsume_loads, 0 bailout compilation > > From vladimir.kozlov at oracle.com Tue Aug 10 11:33:35 2010 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 10 Aug 2010 11:33:35 -0700 Subject: Request for reviews (S): 6973329: C2 with Zero based COOP produces code with broken anti-dependency on x86 In-Reply-To: References: <4C6197C3.2070808@oracle.com> Message-ID: <4C619B7F.1090508@oracle.com> Thank you, Tom Vladimir Tom Rodriguez wrote: > I think that looks good. Not using must_clone nodes in implicit null checks is a good fix. > > tom > > On Aug 10, 2010, at 11:17 AM, Vladimir Kozlov wrote: > >> http://cr.openjdk.java.net/~kvn/6973329/webrev >> >> Fixed 6973329: C2 with Zero based COOP produces code with broken anti-dependency on x86 >> >> Main problem: RA ignores anti-dependence when placing a clone of >> a node which produces flags (or any rematerializable nodes).
>> The code generating implicit_null_check may move such nodes >> above nodes modifying flags which will force RA to clone them. >> >> Solution: >> Recompile without subsuming loads if RA tries to clone a node with anti_dependence. >> Do not use nodes which produce flags in implicit null checks. >> >> Added a regression test based on a failure I saw in SPECjEnterprise2010. >> >> JPRT, SPECjEnterprise2010 >> >> Notes: >> I collected statistics on how many dependences are found per >> call to insert_anti_dependences(), and on how many recompilations >> without subsuming loads and total bailouts happened in the new RA code. >> >> CTW rt.jar: >> 1431559 made anti_dependence checks, 5497392 found anti_dependences (384%), >> 128477 did not find anti_dependences ( 8%) >> >> jvm2008: >> 200020 made anti_dependence checks, 724530 found anti_dependences (362%), >> 10339 did not find anti_dependences ( 5%) >> >> Originally I thought about adding a new Node flag has_anti_dependence, which I >> would set in insert_anti_dependences() and check in clone_node() in RA. >> But looking at these statistics I decided to use the existing flag >> needs_anti_dependence_check. After that, the number of recompilations without >> subsuming loads changed from 115 to 127 (in CTW rt.jar). I think it is acceptable. >> >> Next I found that this recompilation number could be significantly reduced >> if I exclude nodes which produce flags from implicit null checks.
>> >> Before: >> CTW rt.jar: >> RA: 127 recompile without subsume_loads, 0 bailout compilation >> jvm2008: >> RA: 140 recompile without subsume_loads, 0 bailout compilation >> >> After: >> CTW rt.jar: >> RA: 39 recompile without subsume_loads, 0 bailout compilation >> jvm2008: >> RA: 3 recompile without subsume_loads, 0 bailout compilation >> >> > From tom.rodriguez at oracle.com Tue Aug 10 11:36:44 2010 From: tom.rodriguez at oracle.com (Tom Rodriguez) Date: Tue, 10 Aug 2010 11:36:44 -0700 Subject: Request for reviews (S): 6975855: don't emit deopt MH handler in C1 if not required In-Reply-To: <1281447655.17347.1.camel@macbook> References: <1281447655.17347.1.camel@macbook> Message-ID: <571D377C-B835-4456-9A1C-FBE8D6B235AC@oracle.com> Looks ok. tom On Aug 10, 2010, at 6:40 AM, Christian Thalinger wrote: > http://cr.openjdk.java.net/~twisti/6975855/webrev.01/ > > 6975855: don't emit deopt MH handler in C1 if not required > Summary: This CR implements the same for C1 as 6926782 for C2. > Reviewed-by: > > The changes of 6926782 don't emit the deopt MH handler in C2. This CR > implements the same for C1. > From tom.rodriguez at oracle.com Tue Aug 10 11:59:08 2010 From: tom.rodriguez at oracle.com (Tom Rodriguez) Date: Tue, 10 Aug 2010 11:59:08 -0700 Subject: review (S) for 6975006: assert(check.is_deoptimized_frame()) failed: missed deopt In-Reply-To: <1281354726.1119.266.camel@macbook> References: <1281354726.1119.266.camel@macbook> Message-ID: <95FA4159-DCB2-4A78-A6D6-064960C87D7A@oracle.com> I discovered a problem with this code that necessitated a small change. Basically the code shouldn't be used when frame::deoptimize isn't called from a safepoint, which occurs as part of the fix for 6902182 where we stopped forcing a safepoint for all deoptimization. Basically the whole deopt suspend logic can be skipped in that case because you can't be in thread_in_native when deoptimizing yourself. Tested with full nsk with -XX:+DeoptimizeALot -XX:CompileThreshold=100. 
http://cr.openjdk.java.net/~never/6975006. tom On Aug 9, 2010, at 4:52 AM, Christian Thalinger wrote: > On Thu, 2010-08-05 at 13:25 -0700, Tom Rodriguez wrote: >> http://cr.openjdk.java.net/~never/6975006 >> >> 6975006: assert(check.is_deoptimized_frame()) failed: missed deopt >> Reviewed-by: >> >> The safepointing logic treats threads that are thread_in_native as if >> they are halted since the Java state is safe while we are in native >> state. If the thread happens to return from native during the >> safepoint it will simply come to a halt. On sparc this creates some >> complexity when patching for deoptimization because the return address >> is kept in a register and only flushed to stack by the chip. We force >> flushing of the windows in the JNI stub but because of the way >> register windows work this doesn't help the frame that is just above a >> native wrapper since the window might be on chip while the native >> wrapper itself is executing. There's machinery in the deopt code that >> detects the case where the caller of a native wrapper is the one being >> deoptimized and arranges for the native wrapper to rewrite the return >> address when it comes out of native. The problem is that this code >> examines the current state of the thread at the time the deopt occurs, >> not what the state was when the safepoint started. This creates a little >> race where a native wrapper might come to a halt on its own after the >> safepoint started but before the deopt patching occurred, which >> sidesteps the deopt suspend logic because it's not in one of the >> thread_in_native states. The fix is to record the state of the thread >> at the beginning of the safepoint and consult that when triggering the >> deopt suspend logic. >> >> Tested by repeatedly running test with -XX:+DeoptimizeALot. >> Previously it would fail within 5 minutes but after the fix it ran >> overnight until I simply killed it. > > Looks good.
-- Christian > From vladimir.kozlov at oracle.com Tue Aug 10 14:18:02 2010 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 10 Aug 2010 14:18:02 -0700 Subject: review (S) for 6975006: assert(check.is_deoptimized_frame()) failed: missed deopt In-Reply-To: <95FA4159-DCB2-4A78-A6D6-064960C87D7A@oracle.com> References: <1281354726.1119.266.camel@macbook> <95FA4159-DCB2-4A78-A6D6-064960C87D7A@oracle.com> Message-ID: <4C61C20A.2040901@oracle.com> Looks good. Vladimir Tom Rodriguez wrote: > I discovered a problem with this code that necessitated a small change. Basically the code shouldn't be used when frame::deoptimize isn't called from a safepoint, which occurs as part of the fix for 6902182 where we stopped forcing a safepoint for all deoptimization. Basically the whole deopt suspend logic can be skipped in that case because you can't be in thread_in_native when deoptimizing yourself. Tested with full nsk with -XX:+DeoptimizeALot -XX:CompileThreshold=100. http://cr.openjdk.java.net/~never/6975006. > > tom > > On Aug 9, 2010, at 4:52 AM, Christian Thalinger wrote: > >> On Thu, 2010-08-05 at 13:25 -0700, Tom Rodriguez wrote: >>> http://cr.openjdk.java.net/~never/6975006 >>> >>> 6975006: assert(check.is_deoptimized_frame()) failed: missed deopt >>> Reviewed-by: >>> >>> The safepointing logic treats threads that are thread_in_native as if >>> they are halted since the Java state is safe while we are in native >>> state. If the thread happens to return from native during the >>> safepoint it will simply come to a halt. On sparc this creates some >>> complexity when patching for deoptimization because the return address >>> is kept in a register and only flushed to stack by the chip. We force >>> flushing of the windows in the JNI stub but because of the way >>> register windows work this doesn't help the frame that is just above a >>> native wrapper since the window might be on chip while the native >>> wrapper itself is executing. 
There's machinery in the deopt code that >>> detects the case where the caller of a native wrapper is the one being >>> deoptimized and arranges for the native wrapper to rewrite the return >>> address when it comes out of native. The problem is that this code >>> examines the current state of the thread at the time the deopt occurs, >>> not what the state was when the safepoint started. This creates a little >>> race where a native wrapper might come to a halt on its own after the >>> safepoint started but before the deopt patching occurred, which >>> sidesteps the deopt suspend logic because it's not in one of the >>> thread_in_native states. The fix is to record the state of the thread >>> at the beginning of the safepoint and consult that when triggering the >>> deopt suspend logic. >>> >>> Tested by repeatedly running test with -XX:+DeoptimizeALot. >>> Previously it would fail within 5 minutes but after the fix it ran >>> overnight until I simply killed it. >> Looks good. -- Christian >> > From tom.rodriguez at oracle.com Tue Aug 10 15:35:42 2010 From: tom.rodriguez at oracle.com (tom.rodriguez at oracle.com) Date: Tue, 10 Aug 2010 22:35:42 +0000 Subject: hg: jdk7/hotspot-comp/hotspot: 6975027: use of movptr to set length of array Message-ID: <20100810223543.BFD534707A@hg.openjdk.java.net> Changeset: 36519c19beeb Author: never Date: 2010-08-10 12:15 -0700 URL: http://hg.openjdk.java.net/jdk7/hotspot-comp/hotspot/rev/36519c19beeb 6975027: use of movptr to set length of array Reviewed-by: kvn, iveresov !
src/cpu/x86/vm/assembler_x86.cpp From christian.thalinger at oracle.com Wed Aug 11 01:24:42 2010 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Wed, 11 Aug 2010 10:24:42 +0200 Subject: Review Request: Shark In-Reply-To: <4C365A67.6070402@oracle.com> References: <20100611141655.GI3674@redhat.com> <1277195726.27950.40.camel@macbook> <20100622132939.GC3420@redhat.com> <1277216100.27950.46.camel@macbook> <20100622144841.GD3420@redhat.com> <1278582298.19588.22.camel@macbook> <4C365A67.6070402@oracle.com> Message-ID: <1281515082.17347.16.camel@macbook> On Thu, 2010-07-08 at 16:08 -0700, Vladimir Kozlov wrote: > Changes in our files looks fine. Sorry for the delay, I'll push it today. -- Christian From christian.thalinger at oracle.com Wed Aug 11 01:41:06 2010 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Wed, 11 Aug 2010 10:41:06 +0200 Subject: Review Request: Shark In-Reply-To: <20100611141655.GI3674@redhat.com> References: <20100611141655.GI3674@redhat.com> Message-ID: <1281516066.17347.17.camel@macbook> On Fri, 2010-06-11 at 15:16 +0100, Gary Benson wrote: > And this webrev adds a little bit of build stuff to the > non-HotSpot parts of the JDK: > > http://cr.openjdk.java.net/~gbenson/shark-build-01/ Did you send this one out to the JDK people for review? I will not push these changes. 
-- Christian From christian.thalinger at oracle.com Wed Aug 11 01:46:11 2010 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Wed, 11 Aug 2010 10:46:11 +0200 Subject: Review Request: Shark In-Reply-To: <1281515082.17347.16.camel@macbook> References: <20100611141655.GI3674@redhat.com> <1277195726.27950.40.camel@macbook> <20100622132939.GC3420@redhat.com> <1277216100.27950.46.camel@macbook> <20100622144841.GD3420@redhat.com> <1278582298.19588.22.camel@macbook> <4C365A67.6070402@oracle.com> <1281515082.17347.16.camel@macbook> Message-ID: <1281516371.17347.18.camel@macbook> On Wed, 2010-08-11 at 10:24 +0200, Christian Thalinger wrote: > On Thu, 2010-07-08 at 16:08 -0700, Vladimir Kozlov wrote: > > Changes in our files looks fine. > > Sorry for the delay, I'll push it today. -- Christian ...and it goes in as: 6976186: integrate Shark HotSpot changes -- Christian From gbenson at redhat.com Wed Aug 11 02:10:10 2010 From: gbenson at redhat.com (Gary Benson) Date: Wed, 11 Aug 2010 10:10:10 +0100 Subject: Review Request: Shark In-Reply-To: <1281516371.17347.18.camel@macbook> References: <20100611141655.GI3674@redhat.com> <1277195726.27950.40.camel@macbook> <20100622132939.GC3420@redhat.com> <1277216100.27950.46.camel@macbook> <20100622144841.GD3420@redhat.com> <1278582298.19588.22.camel@macbook> <4C365A67.6070402@oracle.com> <1281515082.17347.16.camel@macbook> <1281516371.17347.18.camel@macbook> Message-ID: <20100811091010.GA3420@redhat.com> Christian Thalinger wrote: > On Wed, 2010-08-11 at 10:24 +0200, Christian Thalinger wrote: > > On Thu, 2010-07-08 at 16:08 -0700, Vladimir Kozlov wrote: > > > Changes in our files looks fine. > > > > Sorry for the delay, I'll push it today. > > ...and it goes in as: > > 6976186: integrate Shark HotSpot changes Awesome, thanks! 
Cheers, Gary -- http://gbenson.net/ From gbenson at redhat.com Wed Aug 11 02:11:03 2010 From: gbenson at redhat.com (Gary Benson) Date: Wed, 11 Aug 2010 10:11:03 +0100 Subject: Review Request: Shark In-Reply-To: <1281516066.17347.17.camel@macbook> References: <20100611141655.GI3674@redhat.com> <1281516066.17347.17.camel@macbook> Message-ID: <20100811091103.GB3420@redhat.com> Christian Thalinger wrote: > On Fri, 2010-06-11 at 15:16 +0100, Gary Benson wrote: > > And this webrev adds a little bit of build stuff to the > > non-HotSpot parts of the JDK: > > > > http://cr.openjdk.java.net/~gbenson/shark-build-01/ > > Did you send this one out to the JDK people for review? > I will not push these changes. Ah, I didn't send it, no. I meant to once the main bit was underway, and then I forgot... :) I'll do it now! Cheers, Gary -- http://gbenson.net/ From christian.thalinger at oracle.com Wed Aug 11 02:40:21 2010 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Wed, 11 Aug 2010 11:40:21 +0200 Subject: Review Request: Shark In-Reply-To: <20100811091010.GA3420@redhat.com> References: <20100611141655.GI3674@redhat.com> <1277195726.27950.40.camel@macbook> <20100622132939.GC3420@redhat.com> <1277216100.27950.46.camel@macbook> <20100622144841.GD3420@redhat.com> <1278582298.19588.22.camel@macbook> <4C365A67.6070402@oracle.com> <1281515082.17347.16.camel@macbook> <1281516371.17347.18.camel@macbook> <20100811091010.GA3420@redhat.com> Message-ID: <1281519621.17347.20.camel@macbook> On Wed, 2010-08-11 at 10:10 +0100, Gary Benson wrote: > Christian Thalinger wrote: > > On Wed, 2010-08-11 at 10:24 +0200, Christian Thalinger wrote: > > > On Thu, 2010-07-08 at 16:08 -0700, Vladimir Kozlov wrote: > > > > Changes in our files looks fine. > > > > > > Sorry for the delay, I'll push it today. > > > > ...and it goes in as: > > > > 6976186: integrate Shark HotSpot changes > > Awesome, thanks! 
I needed to fix some copyright years, one copyright header (make/linux/makefiles/shark.make) and a couple of trailing spaces. It's in the JPRT queue. -- Christian From Christian.Thalinger at Sun.COM Wed Aug 11 03:46:15 2010 From: Christian.Thalinger at Sun.COM (Christian.Thalinger at Sun.COM) Date: Wed, 11 Aug 2010 10:46:15 +0000 Subject: hg: jdk7/hotspot-comp/hotspot: 6975855: don't emit deopt MH handler in C1 if not required Message-ID: <20100811104619.7581547095@hg.openjdk.java.net> Changeset: 4a665be40fd3 Author: twisti Date: 2010-08-11 01:17 -0700 URL: http://hg.openjdk.java.net/jdk7/hotspot-comp/hotspot/rev/4a665be40fd3 6975855: don't emit deopt MH handler in C1 if not required Summary: This CR implements the same for C1 as 6926782 for C2. Reviewed-by: never ! src/share/vm/c1/c1_Compilation.cpp ! src/share/vm/c1/c1_Compilation.hpp ! src/share/vm/c1/c1_LIRAssembler.cpp ! src/share/vm/code/nmethod.cpp From Christian.Thalinger at Sun.COM Wed Aug 11 09:01:54 2010 From: Christian.Thalinger at Sun.COM (Christian.Thalinger at Sun.COM) Date: Wed, 11 Aug 2010 16:01:54 +0000 Subject: hg: jdk7/hotspot-comp/hotspot: 6976186: integrate Shark HotSpot changes Message-ID: <20100811160157.CF641470A1@hg.openjdk.java.net> Changeset: d2ede61b7a12 Author: twisti Date: 2010-08-11 05:51 -0700 URL: http://hg.openjdk.java.net/jdk7/hotspot-comp/hotspot/rev/d2ede61b7a12 6976186: integrate Shark HotSpot changes Summary: Shark is a JIT compiler for Zero that uses the LLVM compiler infrastructure. Reviewed-by: kvn, twisti Contributed-by: Gary Benson ! make/Makefile ! make/linux/Makefile ! make/linux/makefiles/gcc.make + make/linux/makefiles/shark.make ! make/linux/makefiles/top.make ! make/linux/makefiles/vm.make ! src/cpu/zero/vm/disassembler_zero.hpp + src/cpu/zero/vm/shark_globals_zero.hpp ! src/share/vm/ci/ciMethod.cpp ! src/share/vm/ci/ciMethod.hpp ! src/share/vm/code/nmethod.cpp ! src/share/vm/code/nmethod.hpp ! src/share/vm/compiler/abstractCompiler.hpp ! 
src/share/vm/compiler/compileBroker.cpp ! src/share/vm/compiler/disassembler.cpp + src/share/vm/includeDB_shark ! src/share/vm/memory/cardTableModRefBS.hpp ! src/share/vm/oops/methodOop.cpp ! src/share/vm/runtime/deoptimization.cpp ! src/share/vm/runtime/frame.cpp ! src/share/vm/runtime/globals.cpp ! src/share/vm/runtime/globals.hpp ! src/share/vm/runtime/vm_version.cpp + src/share/vm/shark/llvmHeaders.hpp + src/share/vm/shark/llvmValue.hpp + src/share/vm/shark/sharkBlock.cpp + src/share/vm/shark/sharkBlock.hpp + src/share/vm/shark/sharkBuilder.cpp + src/share/vm/shark/sharkBuilder.hpp + src/share/vm/shark/sharkCacheDecache.cpp + src/share/vm/shark/sharkCacheDecache.hpp + src/share/vm/shark/sharkCodeBuffer.hpp + src/share/vm/shark/sharkCompiler.cpp + src/share/vm/shark/sharkCompiler.hpp + src/share/vm/shark/sharkConstant.cpp + src/share/vm/shark/sharkConstant.hpp + src/share/vm/shark/sharkContext.cpp + src/share/vm/shark/sharkContext.hpp + src/share/vm/shark/sharkEntry.hpp + src/share/vm/shark/sharkFunction.cpp + src/share/vm/shark/sharkFunction.hpp + src/share/vm/shark/sharkInliner.cpp + src/share/vm/shark/sharkInliner.hpp + src/share/vm/shark/sharkIntrinsics.cpp + src/share/vm/shark/sharkIntrinsics.hpp + src/share/vm/shark/sharkInvariants.cpp + src/share/vm/shark/sharkInvariants.hpp + src/share/vm/shark/sharkMemoryManager.cpp + src/share/vm/shark/sharkMemoryManager.hpp + src/share/vm/shark/sharkNativeWrapper.cpp + src/share/vm/shark/sharkNativeWrapper.hpp + src/share/vm/shark/sharkRuntime.cpp + src/share/vm/shark/sharkRuntime.hpp + src/share/vm/shark/sharkStack.cpp + src/share/vm/shark/sharkStack.hpp + src/share/vm/shark/sharkState.cpp + src/share/vm/shark/sharkState.hpp + src/share/vm/shark/sharkStateScanner.cpp + src/share/vm/shark/sharkStateScanner.hpp + src/share/vm/shark/sharkTopLevelBlock.cpp + src/share/vm/shark/sharkTopLevelBlock.hpp + src/share/vm/shark/sharkType.hpp + src/share/vm/shark/sharkValue.cpp + src/share/vm/shark/sharkValue.hpp + 
src/share/vm/shark/shark_globals.cpp + src/share/vm/shark/shark_globals.hpp ! src/share/vm/utilities/macros.hpp From tom.rodriguez at oracle.com Wed Aug 11 12:00:36 2010 From: tom.rodriguez at oracle.com (Tom Rodriguez) Date: Wed, 11 Aug 2010 12:00:36 -0700 Subject: review (S) for 6976372: # assert(_owner == Thread::current()) failed: invariant Message-ID: http://cr.openjdk.java.net/~never/6976372/ 6976372: # assert(_owner == Thread::current()) failed: invariant Reviewed-by: The full code cache path in the native wrapper has an unnecessary call to MutexUnlocker which asserts when executed. The fix is simply to remove the call. Additionally handle_full_code_cache assumes it's called from the CompilerThread but this isn't true for native wrappers. The fix is to log to xtty if it exists, which also covers the LogCompilation case when called from a compiler thread. Tested by faking code cache full in the debugger. src/share/vm/runtime/sharedRuntime.cpp src/share/vm/compiler/compileBroker.cpp From tom.rodriguez at oracle.com Wed Aug 11 12:02:38 2010 From: tom.rodriguez at oracle.com (Tom Rodriguez) Date: Wed, 11 Aug 2010 12:02:38 -0700 Subject: review (S) for 6974176: ShouldNotReachHere, instanceKlass.cpp:1426 Message-ID: <19C98D00-9241-40E6-9A42-8D9ABD710E4E@oracle.com> http://cr.openjdk.java.net/~never/6974176 6974176: ShouldNotReachHere, instanceKlass.cpp:1426 Reviewed-by: The changes for 6965184 reordered the flush_dependencies and post_compiled_method_unload which allowed a safepoint to happen in between. Zombie nmethods don't have their oops scanned so when flush_dependencies was run it was reading stale oops. If the oops didn't move then it would work as intended but otherwise you might get various weird failures. The fix is to restore the original order. I also added a No_SafePoint_Verifier and some logic to mark the oops as potentially stale. 
Tested with failing test on SQE machine that reproduced it reliably, plus nsk suites with -XX:+DeoptimizeALot -XX:CompileThreshold=100.

From vladimir.kozlov at oracle.com Wed Aug 11 12:13:39 2010
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Wed, 11 Aug 2010 12:13:39 -0700
Subject: review (S) for 6974176: ShouldNotReachHere, instanceKlass.cpp:1426
In-Reply-To: <19C98D00-9241-40E6-9A42-8D9ABD710E4E@oracle.com>
References: <19C98D00-9241-40E6-9A42-8D9ABD710E4E@oracle.com>
Message-ID: <4C62F663.5020700@oracle.com>

Tom,

Should the following code be in its own scope {}?

  // zombie only - if a JVMTI agent has enabled the CompiledMethodUnload event
  // and it hasn't already been reported for this nmethod then report it now.
  // (the event may have been reported earlier if the GC marked it for unloading).
+ Pause_No_Safepoint_Verifier pnsv(&nsv);
  post_compiled_method_unload();

  MutexLockerEx mu(CodeCache_lock, Mutex::_no_safepoint_check_flag);
  flush_dependencies(NULL);
+ #ifdef ASSERT
+ // It's no longer safe to access the oops section since zombie
+ // nmethods aren't scanned for GC.
+ _oops_are_stale = true;
+ #endif

Thanks,
Vladimir

Tom Rodriguez wrote:
> http://cr.openjdk.java.net/~never/6974176
>
> 6974176: ShouldNotReachHere, instanceKlass.cpp:1426
> Reviewed-by:
>
> The changes for 6965184 reordered the flush_dependencies and
> post_compiled_method_unload which allowed a safepoint to happen in
> between. Zombie nmethods don't have their oops scanned so when
> flush_dependencies was run it was reading stale oops. If the oops
> didn't move then it would work as intended but otherwise you might get
> various weird failures. The fix is to restore the original order. I
> also added a No_SafePoint_Verifier and some logic to mark the oops as
> potentially stale. Tested with failing test on SQE machine that
> reproduced it reliably, plus nsk suites with -XX:+DeoptimizeALot
> -XX:CompileThreshold=100.
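
To illustrate the scoping question raised in the message above, here is a minimal, self-contained C++ sketch of how an explicit {} block changes the point at which a scoped (RAII) guard is released. ScopedGuard and the step names are hypothetical stand-ins, not HotSpot classes, and the order of the steps is only illustrative; it does not reproduce the actual nmethod code.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a scoped guard such as a lock holder or a
// verifier pause. It records its construction and destruction in a log.
class ScopedGuard {
 public:
  ScopedGuard(const std::string& name, std::vector<std::string>& log)
      : name_(name), log_(log) {
    assert(!name_.empty());
    log_.push_back("enter " + name_);
  }
  ~ScopedGuard() { log_.push_back("leave " + name_); }

  ScopedGuard(const ScopedGuard&) = delete;
  ScopedGuard& operator=(const ScopedGuard&) = delete;

 private:
  std::string name_;
  std::vector<std::string>& log_;
};

// With explicit braces the guard is destroyed at the closing brace,
// before the later step runs.
void with_explicit_scope(std::vector<std::string>& log) {
  {
    ScopedGuard guard("guard", log);
    log.push_back("guarded step");
  }
  log.push_back("later step");
}

// Without braces the guard stays alive until the end of the function,
// so the later step also runs while the guard is active.
void without_explicit_scope(std::vector<std::string>& log) {
  ScopedGuard guard("guard", log);
  log.push_back("guarded step");
  log.push_back("later step");
}
```

Calling both functions with an empty vector and comparing the logs shows the difference: in the first case "leave guard" is recorded before "later step", in the second case after it.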
From tom.rodriguez at oracle.com Wed Aug 11 12:22:03 2010
From: tom.rodriguez at oracle.com (Tom Rodriguez)
Date: Wed, 11 Aug 2010 12:22:03 -0700
Subject: review (S) for 6974176: ShouldNotReachHere, instanceKlass.cpp:1426
In-Reply-To: <4C62F663.5020700@oracle.com>
References: <19C98D00-9241-40E6-9A42-8D9ABD710E4E@oracle.com> <4C62F663.5020700@oracle.com>
Message-ID: <9884C083-FD16-4F65-BB9B-7DF44E5DBECD@oracle.com>

I could explicitly wrap it but I figured it was ok since it terminated the scope. It's probably clearer if I wrap it. (copy paste made that code look wrong, since the flush_dep call has actually been moved). I've updated the webrev.

tom

On Aug 11, 2010, at 12:13 PM, Vladimir Kozlov wrote:

> Tom,
>
> Should the following code be in its own scope {}?
>
>   // zombie only - if a JVMTI agent has enabled the CompiledMethodUnload event
>   // and it hasn't already been reported for this nmethod then report it now.
>   // (the event may have been reported earlier if the GC marked it for unloading).
> + Pause_No_Safepoint_Verifier pnsv(&nsv);
>   post_compiled_method_unload();
>
>   MutexLockerEx mu(CodeCache_lock, Mutex::_no_safepoint_check_flag);
>   flush_dependencies(NULL);
> + #ifdef ASSERT
> + // It's no longer safe to access the oops section since zombie
> + // nmethods aren't scanned for GC.
> + _oops_are_stale = true;
> + #endif
>
> Thanks,
> Vladimir
>
> Tom Rodriguez wrote:
>> http://cr.openjdk.java.net/~never/6974176
>> 6974176: ShouldNotReachHere, instanceKlass.cpp:1426
>> Reviewed-by:
>> The changes for 6965184 reordered the flush_dependencies and
>> post_compiled_method_unload which allowed a safepoint to happen in
>> between. Zombie nmethods don't have their oops scanned so when
>> flush_dependencies was run it was reading stale oops. If the oops
>> didn't move then it would work as intended but otherwise you might get
>> various weird failures. The fix is to restore the original order. I
>> also added a No_SafePoint_Verifier and some logic to mark the oops as
>> potentially stale. Tested with failing test on SQE machine that
>> reproduced it reliably, plus nsk suites with -XX:+DeoptimizeALot
>> -XX:CompileThreshold=100.

From vladimir.kozlov at oracle.com Wed Aug 11 12:40:48 2010
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Wed, 11 Aug 2010 12:40:48 -0700
Subject: review (S) for 6974176: ShouldNotReachHere, instanceKlass.cpp:1426
In-Reply-To: <9884C083-FD16-4F65-BB9B-7DF44E5DBECD@oracle.com>
References: <19C98D00-9241-40E6-9A42-8D9ABD710E4E@oracle.com> <4C62F663.5020700@oracle.com> <9884C083-FD16-4F65-BB9B-7DF44E5DBECD@oracle.com>
Message-ID: <4C62FCC0.3080505@oracle.com>

Looks good.

Vladimir

Tom Rodriguez wrote:
> I could explicitly wrap it but I figured it was ok since it terminated the scope. It's probably clearer if I wrap it. (copy paste made that code look wrong, since the flush_dep call has actually been moved). I've updated the webrev.
>
> tom
>
> On Aug 11, 2010, at 12:13 PM, Vladimir Kozlov wrote:
>
>> Tom,
>>
>> Should the following code be in its own scope {}?
>>
>>   // zombie only - if a JVMTI agent has enabled the CompiledMethodUnload event
>>   // and it hasn't already been reported for this nmethod then report it now.
>>   // (the event may have been reported earlier if the GC marked it for unloading).
>> + Pause_No_Safepoint_Verifier pnsv(&nsv);
>>   post_compiled_method_unload();
>>
>>   MutexLockerEx mu(CodeCache_lock, Mutex::_no_safepoint_check_flag);
>>   flush_dependencies(NULL);
>> + #ifdef ASSERT
>> + // It's no longer safe to access the oops section since zombie
>> + // nmethods aren't scanned for GC.
>> + _oops_are_stale = true; >> + #endif >> >> Thanks, >> Vladimir >> >> Tom Rodriguez wrote: >>> http://cr.openjdk.java.net/~never/6974176 >>> 6974176: ShouldNotReachHere, instanceKlass.cpp:1426 >>> Reviewed-by: >>> The changes for 6965184 reordered the flush_dependencies and >>> post_compiled_method_unload which allowed a safepoint to happen in >>> between. Zombie nmethods don't have their oops scanned so when >>> flush_dependencies was run it was reading stale oops. If the oops >>> didn't move then it would work as intended but otherwise you might get >>> various weird failures. The fix is to restore the original order. I >>> also added a No_SafePoint_Verifier and some logic to mark the oops as >>> potentially stale. Tested with failing test on SQE machine that >>> reproduced it reliably, plus nsk suites with -XX:+DeoptimizeALot >>> -XX:CompileThreshold=100. > From vladimir.kozlov at oracle.com Wed Aug 11 14:10:42 2010 From: vladimir.kozlov at oracle.com (vladimir.kozlov at oracle.com) Date: Wed, 11 Aug 2010 21:10:42 +0000 Subject: hg: jdk7/hotspot-comp/hotspot: 6973329: C2 with Zero based COOP produces code with broken anti-dependency on x86 Message-ID: <20100811211054.29BE3470B3@hg.openjdk.java.net> Changeset: 6c9cc03d8726 Author: kvn Date: 2010-08-11 10:48 -0700 URL: http://hg.openjdk.java.net/jdk7/hotspot-comp/hotspot/rev/6c9cc03d8726 6973329: C2 with Zero based COOP produces code with broken anti-dependency on x86 Summary: Recompile without subsuming loads if RA try to clone a node with anti_dependence. Reviewed-by: never ! src/share/vm/includeDB_compiler2 ! src/share/vm/opto/lcm.cpp ! 
src/share/vm/opto/reg_split.cpp + test/compiler/6973329/Test.java From christian.thalinger at oracle.com Thu Aug 12 00:29:12 2010 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Thu, 12 Aug 2010 09:29:12 +0200 Subject: review (S) for 6974176: ShouldNotReachHere, instanceKlass.cpp:1426 In-Reply-To: <9884C083-FD16-4F65-BB9B-7DF44E5DBECD@oracle.com> References: <19C98D00-9241-40E6-9A42-8D9ABD710E4E@oracle.com> <4C62F663.5020700@oracle.com> <9884C083-FD16-4F65-BB9B-7DF44E5DBECD@oracle.com> Message-ID: <1281598152.17347.33.camel@macbook> On Wed, 2010-08-11 at 12:22 -0700, Tom Rodriguez wrote: > I could explicitly wrap it but I figured it was ok since it terminated > the scope. It's probably clearer if I wrap it. (copy paste made that > code look wrong, since the flush_dep call has actually been moved). > I've updated the webrev. Looks good. -- Christian From christian.thalinger at oracle.com Thu Aug 12 00:45:58 2010 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Thu, 12 Aug 2010 09:45:58 +0200 Subject: review (S) for 6976372: # assert(_owner == Thread::current()) failed: invariant In-Reply-To: References: Message-ID: <1281599158.17347.34.camel@macbook> On Wed, 2010-08-11 at 12:00 -0700, Tom Rodriguez wrote: > http://cr.openjdk.java.net/~never/6976372/ > > 6976372: # assert(_owner == Thread::current()) failed: invariant > Reviewed-by: > > The full code cache path in the native wrapper has an unnecessary call > to MutexUnlocker which asserts when executed. The fix is simply to > remove the call. Additionally handle_full_code_cache assumes it's > called from the CompilerThread but this isn't true for native > wrappers. The fix is to log to xtty if it exists, which also covers > the LogCompilation case when called from a compiler thread. Tested by > faking code cache full in the debugger. > > src/share/vm/runtime/sharedRuntime.cpp > src/share/vm/compiler/compileBroker.cpp Looks good. 
-- Christian From gbenson at redhat.com Thu Aug 12 01:56:11 2010 From: gbenson at redhat.com (Gary Benson) Date: Thu, 12 Aug 2010 09:56:11 +0100 Subject: Review Request: Shark In-Reply-To: <1281519621.17347.20.camel@macbook> References: <1277195726.27950.40.camel@macbook> <20100622132939.GC3420@redhat.com> <1277216100.27950.46.camel@macbook> <20100622144841.GD3420@redhat.com> <1278582298.19588.22.camel@macbook> <4C365A67.6070402@oracle.com> <1281515082.17347.16.camel@macbook> <1281516371.17347.18.camel@macbook> <20100811091010.GA3420@redhat.com> <1281519621.17347.20.camel@macbook> Message-ID: <20100812085610.GA3263@redhat.com> Christian Thalinger wrote: > On Wed, 2010-08-11 at 10:10 +0100, Gary Benson wrote: > > Christian Thalinger wrote: > > > On Wed, 2010-08-11 at 10:24 +0200, Christian Thalinger wrote: > > > > On Thu, 2010-07-08 at 16:08 -0700, Vladimir Kozlov wrote: > > > > > Changes in our files looks fine. > > > > > > > > Sorry for the delay, I'll push it today. > > > > > > ...and it goes in as: > > > > > > 6976186: integrate Shark HotSpot changes > > > > Awesome, thanks! > > I needed to fix some copyright years, one copyright header > (make/linux/makefiles/shark.make) and a couple of trailing spaces. > It's in the JPRT queue. Thank you. And sorry for all the copyrights and whitespace! Cheers, Gary -- http://gbenson.net/ From gbenson at redhat.com Thu Aug 12 03:52:42 2010 From: gbenson at redhat.com (Gary Benson) Date: Thu, 12 Aug 2010 11:52:42 +0100 Subject: Review Request: Zero and Shark fixes Message-ID: <20100812105242.GD3263@redhat.com> Hi all, This webrev contains a number of fixes for Zero and Shark: http://cr.openjdk.java.net/~gbenson/zero-shark-fixes-01/ Firstly, the two bugs 6953477 and 6730276 required changes to be made to Zero. Secondly, an assertion that is invalid on Shark was removed, which allows debug builds of Shark to be built on ia32. I don't have a bug id for this. 
Cheers, Gary -- http://gbenson.net/ From vladimir.kozlov at oracle.com Thu Aug 12 10:48:23 2010 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 12 Aug 2010 10:48:23 -0700 Subject: Request for reviews (XS): 6976400: "Meet Not Symmetric" Message-ID: <4C6433E7.1000804@oracle.com> http://cr.openjdk.java.net/~kvn/6976400/webrev Fixed 6976400: "Meet Not Symmetric" Meet of integer array pointer type with array pointer which has j.l.Object klass incorrectly falls to bottom: t = byte[int:>=0]:NotNull:exact+12 * this= bottom[int:>=0]+12 * mt=(t meet this)= bottom[int:>=0]+12 * t_dual= int:127..-128:www[int:max..0]:AnyNull:exact+12 *,iid=top mt_dual= top[int:max..0]:TopPTR+12 *,iid=top mt_dual meet t_dual= bottom[int:max..0]:AnyNull:exact+12 * [narrow] Solution: Add missing checks for j.l.Object klass in TypeAryPtr::xmeet(). Tested with failing cases, CTW, java/lang regression tests. From tom.rodriguez at oracle.com Thu Aug 12 12:55:33 2010 From: tom.rodriguez at oracle.com (Tom Rodriguez) Date: Thu, 12 Aug 2010 12:55:33 -0700 Subject: Request for reviews (XS): 6976400: "Meet Not Symmetric" In-Reply-To: <4C6433E7.1000804@oracle.com> References: <4C6433E7.1000804@oracle.com> Message-ID: I'm having some trouble wrapping my head around this. The output below is for when it fails, right? What does it return for these after your fix? I guess I don't see why checking for j.l.O is correct. 
tom On Aug 12, 2010, at 10:48 AM, Vladimir Kozlov wrote: > http://cr.openjdk.java.net/~kvn/6976400/webrev > > Fixed 6976400: "Meet Not Symmetric" > > Meet of integer array pointer type with array pointer > which has j.l.Object klass incorrectly falls to bottom: > > t = byte[int:>=0]:NotNull:exact+12 * > this= bottom[int:>=0]+12 * > mt=(t meet this)= bottom[int:>=0]+12 * > t_dual= int:127..-128:www[int:max..0]:AnyNull:exact+12 *,iid=top > mt_dual= top[int:max..0]:TopPTR+12 *,iid=top > mt_dual meet t_dual= bottom[int:max..0]:AnyNull:exact+12 * [narrow] > > Solution: > Add missing checks for j.l.Object klass in TypeAryPtr::xmeet(). > > Tested with failing cases, CTW, java/lang regression tests. From vladimir.kozlov at oracle.com Thu Aug 12 13:15:22 2010 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 12 Aug 2010 13:15:22 -0700 Subject: Request for reviews (XS): 6976400: "Meet Not Symmetric" In-Reply-To: References: <4C6433E7.1000804@oracle.com> Message-ID: <4C64565A.6040001@oracle.com> Tom, Tom Rodriguez wrote: > I'm having some trouble wrapping my head around this. The output below is for when it fails, right? What does it return for these after your fix? I guess I don't see why checking for j.l.O is correct. Yes, the output below is for failing case. The assert is triggered because 'mt_dual meet t_dual' != 't_dual'. After the fix only 'mt_dual meet t_dual' was changed: t2t->dump() // mt_dual meet t_dual int:127..-128:www[int:max..0]:AnyNull:exact+12 *,iid=top Based on the comments at the part which I changed the resulting meet type should be bottom only when we have klasses which can't be subclasses. But if one klass is j.l.O then an other klass can be its subclass. This is how I'm interpreting it. 
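
The symmetry check behind the "Meet Not Symmetric" assert can be sketched on a toy lattice. The following C++ example is only an illustration under stated assumptions: it uses plain integer ranges rather than C2's Type classes, and Range, dual, meet, broken_meet and is_symmetric are names made up for this sketch. It shows a meet that passes the check and a deliberately wrong meet that falls to the widest range too eagerly and therefore fails it.

```cpp
#include <algorithm>
#include <cassert>
#include <climits>

// Toy lattice element: an integer range. A "dual" range has lo > hi,
// loosely mirroring the dumps above where a dual prints as int:127..-128.
struct Range {
  int lo;
  int hi;
};

bool same(Range a, Range b) { return a.lo == b.lo && a.hi == b.hi; }

// Dual: swap the two bounds.
Range dual(Range r) { return Range{r.hi, r.lo}; }

// Meet: take the smaller lower bound and the larger upper bound.
Range meet(Range a, Range b) {
  return Range{std::min(a.lo, b.lo), std::max(a.hi, b.hi)};
}

// Deliberately wrong meet: returns the widest range ("bottom")
// whenever the two inputs are not identical.
Range broken_meet(Range a, Range b) {
  return same(a, b) ? a : Range{INT_MIN, INT_MAX};
}

// Symmetry check: with mt = meet(a, b), meeting dual(mt) with the dual
// of either input must give back that input's dual.
template <typename MeetFn>
bool is_symmetric(MeetFn m, Range a, Range b) {
  Range mt = m(a, b);
  return same(m(dual(mt), dual(a)), dual(a)) &&
         same(m(dual(mt), dual(b)), dual(b));
}
```

For the ranges {0, 10} and {5, 20}, is_symmetric(meet, ...) holds, while is_symmetric(broken_meet, ...) does not, because the broken meet of the two duals lands on the widest range instead of returning the input's dual.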
Thanks, Vladimir > > tom > > On Aug 12, 2010, at 10:48 AM, Vladimir Kozlov wrote: > >> http://cr.openjdk.java.net/~kvn/6976400/webrev >> >> Fixed 6976400: "Meet Not Symmetric" >> >> Meet of integer array pointer type with array pointer >> which has j.l.Object klass incorrectly falls to bottom: >> >> t = byte[int:>=0]:NotNull:exact+12 * >> this= bottom[int:>=0]+12 * >> mt=(t meet this)= bottom[int:>=0]+12 * >> t_dual= int:127..-128:www[int:max..0]:AnyNull:exact+12 *,iid=top >> mt_dual= top[int:max..0]:TopPTR+12 *,iid=top >> mt_dual meet t_dual= bottom[int:max..0]:AnyNull:exact+12 * [narrow] >> >> Solution: >> Add missing checks for j.l.Object klass in TypeAryPtr::xmeet(). >> >> Tested with failing cases, CTW, java/lang regression tests. > From tom.rodriguez at oracle.com Thu Aug 12 18:35:52 2010 From: tom.rodriguez at oracle.com (tom.rodriguez at oracle.com) Date: Fri, 13 Aug 2010 01:35:52 +0000 Subject: hg: jdk7/hotspot-comp/hotspot: 6974176: ShouldNotReachHere, instanceKlass.cpp:1426 Message-ID: <20100813013554.B78A247117@hg.openjdk.java.net> Changeset: 71faaa8e3ccc Author: never Date: 2010-08-12 16:38 -0700 URL: http://hg.openjdk.java.net/jdk7/hotspot-comp/hotspot/rev/71faaa8e3ccc 6974176: ShouldNotReachHere, instanceKlass.cpp:1426 Reviewed-by: kvn, twisti ! src/share/vm/code/nmethod.cpp ! src/share/vm/code/nmethod.hpp From tom.rodriguez at oracle.com Fri Aug 13 01:31:47 2010 From: tom.rodriguez at oracle.com (tom.rodriguez at oracle.com) Date: Fri, 13 Aug 2010 08:31:47 +0000 Subject: hg: jdk7/hotspot-comp/hotspot: 6975006: assert(check.is_deoptimized_frame()) failed: missed deopt Message-ID: <20100813083149.9F6CA47142@hg.openjdk.java.net> Changeset: da877bdc9000 Author: never Date: 2010-08-12 23:34 -0700 URL: http://hg.openjdk.java.net/jdk7/hotspot-comp/hotspot/rev/da877bdc9000 6975006: assert(check.is_deoptimized_frame()) failed: missed deopt Reviewed-by: kvn, twisti ! src/share/vm/runtime/frame.cpp ! src/share/vm/runtime/frame.hpp ! 
src/share/vm/runtime/safepoint.cpp ! src/share/vm/runtime/safepoint.hpp ! src/share/vm/runtime/thread.cpp From tom.rodriguez at oracle.com Fri Aug 13 20:20:42 2010 From: tom.rodriguez at oracle.com (tom.rodriguez at oracle.com) Date: Sat, 14 Aug 2010 03:20:42 +0000 Subject: hg: jdk7/hotspot-comp/hotspot: 6976372: # assert(_owner == Thread::current()) failed: invariant Message-ID: <20100814032051.ACE5747177@hg.openjdk.java.net> Changeset: a62d332029cf Author: never Date: 2010-08-13 15:14 -0700 URL: http://hg.openjdk.java.net/jdk7/hotspot-comp/hotspot/rev/a62d332029cf 6976372: # assert(_owner == Thread::current()) failed: invariant Reviewed-by: kvn, twisti ! src/share/vm/compiler/compileBroker.cpp ! src/share/vm/runtime/sharedRuntime.cpp From christian.thalinger at oracle.com Tue Aug 17 01:30:28 2010 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Tue, 17 Aug 2010 10:30:28 +0200 Subject: Review Request: Zero and Shark fixes In-Reply-To: <20100812105242.GD3263@redhat.com> References: <20100812105242.GD3263@redhat.com> Message-ID: <1282033828.20216.8.camel@macbook> On Thu, 2010-08-12 at 11:52 +0100, Gary Benson wrote: > Hi all, > > This webrev contains a number of fixes for Zero and Shark: > > http://cr.openjdk.java.net/~gbenson/zero-shark-fixes-01/ 6977640: Zero and Shark fixes > > Firstly, the two bugs 6953477 and 6730276 required changes > to be made to Zero. Secondly, an assertion that is invalid > on Shark was removed, which allows debug builds of Shark to > be built on ia32. Hmm. Two questions: 1. Why does the assert not hold? 2. Why would anyone build Shark on x86? 
-- Christian From gbenson at redhat.com Tue Aug 17 02:03:50 2010 From: gbenson at redhat.com (Gary Benson) Date: Tue, 17 Aug 2010 10:03:50 +0100 Subject: Review Request: Zero and Shark fixes In-Reply-To: <1282033828.20216.8.camel@macbook> References: <20100812105242.GD3263@redhat.com> <1282033828.20216.8.camel@macbook> Message-ID: <20100817090349.GA5828@redhat.com> Christian Thalinger wrote: > On Thu, 2010-08-12 at 11:52 +0100, Gary Benson wrote: > > Hi all, > > > > This webrev contains a number of fixes for Zero and Shark: > > > > http://cr.openjdk.java.net/~gbenson/zero-shark-fixes-01/ > > 6977640: Zero and Shark fixes > > > Firstly, the two bugs 6953477 and 6730276 required changes > > to be made to Zero. Secondly, an assertion that is invalid > > on Shark was removed, which allows debug builds of Shark to > > be built on ia32. > > Hmm. Two questions: > > 1. Why does the assert not hold? It's checking something about the stack layout which is true with the native x86 stack but false with the Zero stack. > 2. Why would anyone build Shark on x86? Testing. If you can reproduce an ARM issue on x86 then you can build it in 30 seconds instead of 30 hours :) Cheers, Gary -- http://gbenson.net/ From gbenson at redhat.com Tue Aug 17 02:17:08 2010 From: gbenson at redhat.com (Gary Benson) Date: Tue, 17 Aug 2010 10:17:08 +0100 Subject: Review Request: Final Shark buildsystem piece Message-ID: <20100817091708.GB5828@redhat.com> Hi all, This webrev contains the final piece that integrates Shark into the OpenJDK build: http://cr.openjdk.java.net/~gbenson/shark-build-03/ I kept this separate from the main commit because I thought it needed approval by build-dev, but Kelly said it needs approval by the HotSpot team. I don't have a bug id for this. 
Cheers,
Gary

--
http://gbenson.net/

From christian.thalinger at oracle.com Tue Aug 17 02:19:17 2010
From: christian.thalinger at oracle.com (Christian Thalinger)
Date: Tue, 17 Aug 2010 11:19:17 +0200
Subject: Review Request: Zero and Shark fixes
In-Reply-To: <20100817090349.GA5828@redhat.com>
References: <20100812105242.GD3263@redhat.com> <1282033828.20216.8.camel@macbook> <20100817090349.GA5828@redhat.com>
Message-ID: <1282036757.20216.14.camel@macbook>

On Tue, 2010-08-17 at 10:03 +0100, Gary Benson wrote:
> > 1. Why does the assert not hold?
>
> It's checking something about the stack layout which is true
> with the native x86 stack but false with the Zero stack.

OK.

>
> > 2. Why would anyone build Shark on x86?
>
> Testing. If you can reproduce an ARM issue on x86 then you
> can build it in 30 seconds instead of 30 hours :)

Haha! That makes sense. The changes look good. -- Christian

From kelly.ohair at oracle.com Tue Aug 17 13:51:21 2010
From: kelly.ohair at oracle.com (Kelly O'Hair)
Date: Tue, 17 Aug 2010 13:51:21 -0700
Subject: Review Request: Final Shark buildsystem piece
In-Reply-To: <20100817091708.GB5828@redhat.com>
References: <20100817091708.GB5828@redhat.com>
Message-ID:

Gary,

I had assumed this was a file in the hotspot repo, but it's actually in the top repo.
Feel free to push this change into the tl forest, or wherever you pushed its cohort changeset.

Sorry.

-kto

On Aug 17, 2010, at 2:17 AM, Gary Benson wrote:

> Hi all,
>
> This webrev contains the final piece that integrates Shark into
> the OpenJDK build:
>
> http://cr.openjdk.java.net/~gbenson/shark-build-03/
>
> I kept this separate from the main commit because I thought it
> needed approval by build-dev, but Kelly said it needs approval
> by the HotSpot team.
>
> I don't have a bug id for this.
>
> Cheers,
> Gary
>
> --
> http://gbenson.net/

From christian.thalinger at oracle.com Wed Aug 18 01:17:29 2010
From: christian.thalinger at oracle.com (Christian Thalinger)
Date: Wed, 18 Aug 2010 10:17:29 +0200
Subject: Review Request: Final Shark buildsystem piece
In-Reply-To:
References: <20100817091708.GB5828@redhat.com>
Message-ID: <1282119450.25224.7.camel@macbook>

On Tue, 2010-08-17 at 13:51 -0700, Kelly O'Hair wrote:
> Gary,
>
> I had assumed this was a file in the hotspot repo, but it's actually
> in the top repo.
> Feel free to push this change into the tl forest, or wherever you
> pushed its cohort changeset.

Gary, do you need a new CR? -- Christian

From gbenson at redhat.com Wed Aug 18 01:53:53 2010
From: gbenson at redhat.com (Gary Benson)
Date: Wed, 18 Aug 2010 09:53:53 +0100
Subject: Review Request: Final Shark buildsystem piece
In-Reply-To: <1282119450.25224.7.camel@macbook>
References: <20100817091708.GB5828@redhat.com> <1282119450.25224.7.camel@macbook>
Message-ID: <20100818085353.GA3280@redhat.com>

Christian Thalinger wrote:
> On Tue, 2010-08-17 at 13:51 -0700, Kelly O'Hair wrote:
> > I had assumed this was a file in the hotspot repo, but it's
> > actually in the top repo. Feel free to push this change into
> > the tl forest, or wherever you pushed its cohort changeset.
>
> Gary, do you need a new CR? -- Christian

Yes please :)

Cheers,
Gary

--
http://gbenson.net/

From Christian.Thalinger at Sun.COM Wed Aug 18 06:09:47 2010
From: Christian.Thalinger at Sun.COM (Christian.Thalinger at Sun.COM)
Date: Wed, 18 Aug 2010 13:09:47 +0000
Subject: hg: jdk7/hotspot-comp/hotspot: 6977640: Zero and Shark fixes
Message-ID: <20100818130955.2C9BB4727A@hg.openjdk.java.net>

Changeset: 13b87063b4d8
Author: twisti
Date: 2010-08-18 01:22 -0700
URL: http://hg.openjdk.java.net/jdk7/hotspot-comp/hotspot/rev/13b87063b4d8

6977640: Zero and Shark fixes
Summary: A number of fixes for Zero and Shark.
Reviewed-by: twisti
Contributed-by: Gary Benson

! src/cpu/zero/vm/bytecodeInterpreter_zero.inline.hpp
! src/cpu/zero/vm/javaFrameAnchor_zero.hpp
! src/os_cpu/linux_zero/vm/os_linux_zero.cpp
! src/os_cpu/linux_zero/vm/thread_linux_zero.cpp
! src/share/vm/interpreter/bytecodeInterpreter.cpp

From tom.rodriguez at oracle.com Wed Aug 18 15:19:14 2010
From: tom.rodriguez at oracle.com (Tom Rodriguez)
Date: Wed, 18 Aug 2010 15:19:14 -0700
Subject: review (S) for 6978249: spill between cpu and fpu registers when those moves are fast
Message-ID:

http://cr.openjdk.java.net/~never/6978249

6978249: spill between cpu and fpu registers when those moves are fast
Reviewed-by:

On some architectures moves between CPU and FPU registers are fast so they can be used for spilling instead of the stack. This change adds a new flag UseFPUForSpilling and sets up the spill reg masks to allow this when the flag is on. Currently for Nehalem class chips it seems to be a uniform win but we'll keep it under AggressiveOpts for now. There are some minor changes to spilling logic that are currently guarded until we determine that they are generally a good idea. I also moved the logic for PrintFlagsFinal since the initialization of several subsystems may change some flag values which will be missed by the current location. Tested with scimark, ctw and the nsk tests on 32 and 64 bit.

From vladimir.kozlov at oracle.com Wed Aug 18 16:36:18 2010
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Wed, 18 Aug 2010 16:36:18 -0700
Subject: review (S) for 6978249: spill between cpu and fpu registers when those moves are fast
In-Reply-To:
References:
Message-ID: <4C6C6E72.9050909@oracle.com>

Tom,

c2_globals.hpp:

+ "Spill integer registers to XMM instead of stack when possible") \
                              ^FPU registers

coalesce.cpp:
How is it related to FPU spilling?

matcher.cpp:
this code is x86 specific, for sparc it would be different. Need a comment so we will not forget to fix it for sparc.
Why did you add a spill mask for F/D to I/L?
Are these changes allow to spill FPU regs to GPU regs? Also could you use one #ifdef? reg_split.cpp: The next comment does not explain why we need to split for Calls: // These need a Split regardless of overlap or pressure On 8/18/10 3:19 PM, Tom Rodriguez wrote: > http://cr.openjdk.java.net/~never/6978249 > > 6978249: spill between cpu and fpu registers when those moves are fast > Reviewed-by: > > On some architectures moves between CPU and FPU registers are fast so > they can be used for spilling instead of the stack. This change adds > a new flag UseFPUForSpilling and sets up the spill reg masks to allow > this when the flag is on. Currently for Nehalem class chips it seems > to be a uniform win but we'll keep it under AggressiveOpts for now. > There are some minor changes to spilling logic that are currently > guarded until we determine that they are generally a good idea. I > also moved the logic for PrintFlagsFinal since the initialization of > several subsystems may change some flag values which will be missed by > the current location. Tested with scimark, ctw and the nsk tests on > 32 and 64 bit. From tom.rodriguez at oracle.com Wed Aug 18 17:01:36 2010 From: tom.rodriguez at oracle.com (Tom Rodriguez) Date: Wed, 18 Aug 2010 17:01:36 -0700 Subject: review (S) for 6978249: spill between cpu and fpu registers when those moves are fast In-Reply-To: <4C6C6E72.9050909@oracle.com> References: <4C6C6E72.9050909@oracle.com> Message-ID: On Aug 18, 2010, at 4:36 PM, Vladimir Kozlov wrote: > Tom, > > c2_globals.hpp: > > + "Spill integer registers to XMM instead of stack when possible") \ > ^FPU registers ok. > > coalesce.cpp: > How it is related to FPU spilling? It's not directly related but there were some spilling tweaks done as part of the change. I need to go back and evaluate them more carefully but I want to include them for now which is why I left them guarded. > > matcher.cpp: > this code is x86 specific, for sparc it would be different. 
Need comment so we will not forget to fix it for sparc. I think it might have to become AD specific because the sparc to instructions aren't symmetric. I added some comments explaining the logic. > Why you added spill mask for F/D to I/L? Are these changes allow to spill FPU regs to GPU regs? Yes, it allows spilling in both directions. If they are fast enough for one why not allow the other? > Also could you use one #ifdef? > > reg_split.cpp: > The next comment does not explain why we need to split for Calls: > // These need a Split regardless of overlap or pressure I thought I understood this one but reading it again I'm not sure. I'll look at it some more. tom > > > On 8/18/10 3:19 PM, Tom Rodriguez wrote: >> http://cr.openjdk.java.net/~never/6978249 >> >> 6978249: spill between cpu and fpu registers when those moves are fast >> Reviewed-by: >> >> On some architectures moves between CPU and FPU registers are fast so >> they can be used for spilling instead of the stack. This change adds >> a new flag UseFPUForSpilling and sets up the spill reg masks to allow >> this when the flag is on. Currently for Nehalem class chips it seems >> to be a uniform win but we'll keep it under AggressiveOpts for now. >> There are some minor changes to spilling logic that are currently >> guarded until we determine that they are generally a good idea. I >> also moved the logic for PrintFlagsFinal since the initialization of >> several subsystems may change some flag values which will be missed by >> the current location. Tested with scimark, ctw and the nsk tests on >> 32 and 64 bit. 
From vladimir.kozlov at oracle.com Wed Aug 18 17:06:26 2010 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 18 Aug 2010 17:06:26 -0700 Subject: review (S) for 6978249: spill between cpu and fpu registers when those moves are fast In-Reply-To: References: <4C6C6E72.9050909@oracle.com> Message-ID: <4C6C7582.8000402@oracle.com> On 8/18/10 5:01 PM, Tom Rodriguez wrote: >> coalesce.cpp: >> How it is related to FPU spilling? > > It's not directly related but there were some spilling tweaks done as part of the change. I need to go back and evaluate them more carefully but I want to include them for now which is why I left them guarded. OK. >> matcher.cpp: >> this code is x86 specific, for sparc it would be different. Need comment so we will not forget to fix it for sparc. > > I think it might have to become AD specific because the sparc to instructions aren't symmetric. I added some comments explaining the logic. Good. >> Why you added spill mask for F/D to I/L? Are these changes allow to spill FPU regs to GPU regs? > > Yes, it allows spilling in both directions. If they are fast enough for one why not allow the other? OK. >> reg_split.cpp: >> The next comment does not explain why we need to split for Calls: >> // These need a Split regardless of overlap or pressure > > I thought I understood this one but reading it again I'm not sure. I'll look at it some more. Thanks, Vladimir > > tom > >> >> >> On 8/18/10 3:19 PM, Tom Rodriguez wrote: >>> http://cr.openjdk.java.net/~never/6978249 >>> >>> 6978249: spill between cpu and fpu registers when those moves are fast >>> Reviewed-by: >>> >>> On some architectures moves between CPU and FPU registers are fast so >>> they can be used for spilling instead of the stack. This change adds >>> a new flag UseFPUForSpilling and sets up the spill reg masks to allow >>> this when the flag is on. Currently for Nehalem class chips it seems >>> to be a uniform win but we'll keep it under AggressiveOpts for now. 
>>> There are some minor changes to spilling logic that are currently >>> guarded until we determine that they are generally a good idea. I >>> also moved the logic for PrintFlagsFinal since the initialization of >>> several subsystems may change some flag values which will be missed by >>> the current location. Tested with scimark, ctw and the nsk tests on >>> 32 and 64 bit. > From christian.thalinger at oracle.com Thu Aug 19 09:39:54 2010 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Thu, 19 Aug 2010 18:39:54 +0200 Subject: hg: jdk7/hotspot-comp/hotspot: 6973963: SEGV in ciBlock::start_bci() with EA In-Reply-To: <20100804005652.5BC8A47EBE@hg.openjdk.java.net> References: <20100804005652.5BC8A47EBE@hg.openjdk.java.net> Message-ID: <1282235995.29965.2.camel@macbook> On Wed, 2010-08-04 at 00:56 +0000, vladimir.kozlov at oracle.com wrote: > Changeset: 0e35fa8ebccd > Author: kvn > Date: 2010-08-03 15:55 -0700 > URL: http://hg.openjdk.java.net/jdk7/hotspot-comp/hotspot/rev/0e35fa8ebccd > > 6973963: SEGV in ciBlock::start_bci() with EA > Summary: Added more checks into ResourceObj and growableArray to verify correctness of allocation type. > Reviewed-by: never, coleenp, dholmes I get a compiler warning with GCC 4.1.2 after this change: src/share/vm/memory/allocation.cpp: In static member function ‘static void ResourceObj::operator delete(void*)’: src/share/vm/memory/allocation.cpp:61: warning: negative integer implicitly converted to unsigned type src/share/vm/memory/allocation.cpp: In destructor ‘ResourceObj::~ResourceObj()’: src/share/vm/memory/allocation.cpp:107: warning: negative integer implicitly converted to unsigned type Should I fix it in one of my changes?
-- Christian From paul.hohensee at oracle.com Thu Aug 19 09:58:16 2010 From: paul.hohensee at oracle.com (Paul Hohensee) Date: Thu, 19 Aug 2010 12:58:16 -0400 Subject: hg: jdk7/hotspot-comp/hotspot: 6973963: SEGV in ciBlock::start_bci() with EA In-Reply-To: <1282235995.29965.2.camel@macbook> References: <20100804005652.5BC8A47EBE@hg.openjdk.java.net> <1282235995.29965.2.camel@macbook> Message-ID: <4C6D62A8.7030704@oracle.com> Already fixed by JohnC2 in the gc repo, cr# 6977924. On 8/19/10 12:39 PM, Christian Thalinger wrote: > On Wed, 2010-08-04 at 00:56 +0000, vladimir.kozlov at oracle.com wrote: >> Changeset: 0e35fa8ebccd >> Author: kvn >> Date: 2010-08-03 15:55 -0700 >> URL: http://hg.openjdk.java.net/jdk7/hotspot-comp/hotspot/rev/0e35fa8ebccd >> >> 6973963: SEGV in ciBlock::start_bci() with EA >> Summary: Added more checks into ResourceObj and growableArray to verify correctness of allocation type. >> Reviewed-by: never, coleenp, dholmes > I get a compiler warning with GCC 4.1.2 after this change: > > src/share/vm/memory/allocation.cpp: In static member function ‘static void ResourceObj::operator delete(void*)’: > src/share/vm/memory/allocation.cpp:61: warning: negative integer implicitly converted to unsigned type > src/share/vm/memory/allocation.cpp: In destructor ‘ResourceObj::~ResourceObj()’: > src/share/vm/memory/allocation.cpp:107: warning: negative integer implicitly converted to unsigned type > > Should I fix it in one of my changes? > > -- Christian > From vladimir.kozlov at oracle.com Thu Aug 19 10:01:12 2010 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 19 Aug 2010 10:01:12 -0700 Subject: hg: jdk7/hotspot-comp/hotspot: 6973963: SEGV in ciBlock::start_bci() with EA In-Reply-To: <1282235995.29965.2.camel@macbook> References: <20100804005652.5BC8A47EBE@hg.openjdk.java.net> <1282235995.29965.2.camel@macbook> Message-ID: <4C6D6358.4000000@oracle.com> You mean casting to uintptr_t? + _allocation = (uintptr_t)badHeapOopVal; Yes, please.
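The cast discussed above can be sketched in isolation. This is a minimal illustration only, assuming a hypothetical negative constant `kZapValue` and a hypothetical `Tracked` struct; neither is the real `badHeapOopVal` or `ResourceObj`, and the actual value used in HotSpot is not reproduced here. The explicit cast states that the signed-to-unsigned conversion is intended, which is the kind of change the GCC warning asks for.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a debug "zap" pattern; NOT the real HotSpot value.
const intptr_t kZapValue = -0x45524546;

// Hypothetical object with an unsigned bookkeeping field, loosely modelled
// on the _allocation field mentioned in the thread.
struct Tracked {
  uintptr_t _allocation;

  // Assigning a negative signed constant to an unsigned field is what the
  // warning complains about; the explicit cast makes the conversion deliberate.
  void zap() { _allocation = (uintptr_t)kZapValue; }

  // Compare in the unsigned domain so both sides have the same type.
  bool is_zapped() const { return _allocation == (uintptr_t)kZapValue; }
};
```

The stored bit pattern is the same with or without the cast; only the intent becomes visible to the compiler and to the reader.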
Thanks, Vladimir Christian Thalinger wrote: > On Wed, 2010-08-04 at 00:56 +0000, vladimir.kozlov at oracle.com wrote: >> Changeset: 0e35fa8ebccd >> Author: kvn >> Date: 2010-08-03 15:55 -0700 >> URL: http://hg.openjdk.java.net/jdk7/hotspot-comp/hotspot/rev/0e35fa8ebccd >> >> 6973963: SEGV in ciBlock::start_bci() with EA >> Summary: Added more checks into ResourceObj and growableArray to verify correctness of allocation type. >> Reviewed-by: never, coleenp, dholmes > > I get a compiler warning with GCC 4.1.2 after this change: > > src/share/vm/memory/allocation.cpp: In static member function ‘static void ResourceObj::operator delete(void*)’: > src/share/vm/memory/allocation.cpp:61: warning: negative integer implicitly converted to unsigned type > src/share/vm/memory/allocation.cpp: In destructor ‘ResourceObj::~ResourceObj()’: > src/share/vm/memory/allocation.cpp:107: warning: negative integer implicitly converted to unsigned type > > Should I fix it in one of my changes? > > -- Christian > From vladimir.kozlov at oracle.com Thu Aug 19 10:06:27 2010 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 19 Aug 2010 10:06:27 -0700 Subject: hg: jdk7/hotspot-comp/hotspot: 6973963: SEGV in ciBlock::start_bci() with EA In-Reply-To: <4C6D62A8.7030704@oracle.com> References: <20100804005652.5BC8A47EBE@hg.openjdk.java.net> <1282235995.29965.2.camel@macbook> <4C6D62A8.7030704@oracle.com> Message-ID: <4C6D6493.7000303@oracle.com> Thank you, JohnC2 :) Sorry, I totally missed it (6977924) because of the office move. Thanks, Vladimir Paul Hohensee wrote: > Already fixed by JohnC2 in the gc repo, cr# 6977924.
> > On 8/19/10 12:39 PM, Christian Thalinger wrote: >> On Wed, 2010-08-04 at 00:56 +0000, vladimir.kozlov at oracle.com wrote: >>> Changeset: 0e35fa8ebccd >>> Author: kvn >>> Date: 2010-08-03 15:55 -0700 >>> URL: >>> http://hg.openjdk.java.net/jdk7/hotspot-comp/hotspot/rev/0e35fa8ebccd >>> >>> 6973963: SEGV in ciBlock::start_bci() with EA >>> Summary: Added more checks into ResourceObj and growableArray to >>> verify correctness of allocation type. >>> Reviewed-by: never, coleenp, dholmes >> I get a compiler warning with GCC 4.1.2 after this change: >> >> src/share/vm/memory/allocation.cpp: In static member function ‘static >> void ResourceObj::operator delete(void*)’: >> src/share/vm/memory/allocation.cpp:61: warning: negative integer >> implicitly converted to unsigned type >> src/share/vm/memory/allocation.cpp: In destructor >> ‘ResourceObj::~ResourceObj()’: >> src/share/vm/memory/allocation.cpp:107: warning: negative integer >> implicitly converted to unsigned type >> >> Should I fix it in one of my changes? >> >> -- Christian >> From tom.rodriguez at oracle.com Thu Aug 19 10:20:06 2010 From: tom.rodriguez at oracle.com (Tom Rodriguez) Date: Thu, 19 Aug 2010 10:20:06 -0700 Subject: review (S) for 6978249: spill between cpu and fpu registers when those moves are fast In-Reply-To: <4C6C7582.8000402@oracle.com> References: <4C6C6E72.9050909@oracle.com> <4C6C7582.8000402@oracle.com> Message-ID: <53F5614A-C015-44CF-96BD-8668C92B4386@oracle.com> On Aug 18, 2010, at 5:06 PM, Vladimir Kozlov wrote: > On 8/18/10 5:01 PM, Tom Rodriguez wrote: >>> coalesce.cpp: >>> How it is related to FPU spilling? >> >> It's not directly related but there were some spilling tweaks done as part of the change. I need to go back and evaluate them more carefully but I want to include them for now which is why I left them guarded. > > OK. > >>> matcher.cpp: >>> this code is x86 specific, for sparc it would be different. Need comment so we will not forget to fix it for sparc.
>> >> I think it might have to become AD specific because the sparc to instructions aren't symmetric. I added some comments explaining the logic. > > Good. > >>> Why you added spill mask for F/D to I/L? Are these changes allow to spill FPU regs to GPU regs? >> >> Yes, it allows spilling in both directions. If they are fast enough for one why not allow the other? > > OK. > >>> reg_split.cpp: >>> The next comment does not explain why we need to split for Calls: >>> // These need a Split regardless of overlap or pressure >> >> I thought I understood this one but reading it again I'm not sure. I'll look at it some more. So the umask at a call is generally the debug info mask which is pretty much stack only so if you execute this code: // Both are either up or down, and there is overlap, No Split n->set_req(inpidx, def); you force the def down. If you insert a split at the use before the call you keep the call from forcing the def down which is what the new code is doing. I suspect the new code should replace the code above instead of skipping the mem-mem case that follows but I'd like to leave it as is for now. I'm going to go back and evaluate this change and the coalesce change later. I updated the code with this comment: if (UseFPUForSpilling && n->is_Call() && !uup && !dup ) { // The use at the call can force the def down so insert // a split before the use to allow the def more freedom. maxlrg = split_USE(def,b,n,inpidx,maxlrg,dup,false, splits,slidx); Part of the problem here is that up and down are slightly strange notions. up means register only and down means not register only, so down can actually include some registers. RegMask::is_AllStack covers the case where it's stack only. Adding FPU registers to the spill mask makes this distinction somewhat more illogical which is what motivated the tweaks I think.
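The up/down distinction described above can be illustrated with a toy model. This is only a sketch under stated assumptions: `ToyMask`, the bit constants and the single stack bit are hypothetical and are not HotSpot's `RegMask`, which is considerably richer. It shows why a spill mask that contains FPU registers plus the stack counts as "down" without being stack-only.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical bit layout for a toy register mask (not HotSpot's RegMask).
constexpr uint32_t kGprBits  = 0x000000FF;  // general-purpose registers
constexpr uint32_t kFpuBits  = 0x0000FF00;  // FPU/XMM registers
constexpr uint32_t kStackBit = 0x00010000;  // "any stack slot" marker

struct ToyMask {
  uint32_t bits;

  // "up": only registers are allowed, no stack.
  bool is_up() const { return bits != 0 && (bits & kStackBit) == 0; }

  // "down": anything that is not register-only, so it may still
  // contain some registers alongside the stack.
  bool is_down() const { return !is_up(); }

  // Stack-only, the case the thread attributes to RegMask::is_AllStack.
  bool is_all_stack() const { return bits == kStackBit; }
};
```

In this model a call's debug-info mask would be `ToyMask{kStackBit}` (down and stack-only), while a spill mask widened with FPU registers would be `ToyMask{kFpuBits | kStackBit}` (down, but not stack-only).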
tom > > Thanks, > Vladimir > >> >> tom >> >>> >>> >>> On 8/18/10 3:19 PM, Tom Rodriguez wrote: >>>> http://cr.openjdk.java.net/~never/6978249 >>>> >>>> 6978249: spill between cpu and fpu registers when those moves are fast >>>> Reviewed-by: >>>> >>>> On some architectures moves between CPU and FPU registers are fast so >>>> they can be used for spilling instead of the stack. This change adds >>>> a new flag UseFPUForSpilling and sets up the spill reg masks to allow >>>> this when the flag is on. Currently for Nehalem class chips it seems >>>> to be a uniform win but we'll keep it under AggressiveOpts for now. >>>> There are some minor changes to spilling logic that are currently >>>> guarded until we determine that they are generally a good idea. I >>>> also moved the logic for PrintFlagsFinal since the initialization of >>>> several subsystems may change some flag values which will be missed by >>>> the current location. Tested with scimark, ctw and the nsk tests on >>>> 32 and 64 bit. >> From vladimir.kozlov at oracle.com Thu Aug 19 10:24:33 2010 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 19 Aug 2010 10:24:33 -0700 Subject: review (S) for 6978249: spill between cpu and fpu registers when those moves are fast In-Reply-To: <53F5614A-C015-44CF-96BD-8668C92B4386@oracle.com> References: <4C6C6E72.9050909@oracle.com> <4C6C7582.8000402@oracle.com> <53F5614A-C015-44CF-96BD-8668C92B4386@oracle.com> Message-ID: <4C6D68D1.90908@oracle.com> Thank you, Tom, for explanation. The changes are good to push. Thanks, Vladimir Tom Rodriguez wrote: > On Aug 18, 2010, at 5:06 PM, Vladimir Kozlov wrote: > >> On 8/18/10 5:01 PM, Tom Rodriguez wrote: >>>> reg_split.cpp: >>>> The next comment does not explain why we need to split for Calls: >>>> // These need a Split regardless of overlap or pressure >>> I thought I understood this one but reading it again I'm not sure. I'll look at it some more. 
> > So the umask at a call is generally the debug info mask which is pretty much stack only so if you execute this code: > > // Both are either up or down, and there is overlap, No Split > n->set_req(inpidx, def); > > you force the def down. If you insert a split at the use before the call you keep the call from forcing the def down which is what the new code is doing. I suspect the new code should replace the code above instead of skipping the mem-mem case that follows but I'd like to leave it as for now. I'm going to go back and evaluate this change and the coalesce change later. I updated the code with this comment: > > if (UseFPUForSpilling && n->is_Call() && !uup && !dup ) { > // The use at the call can force the def down so insert > // a split before the use to allow the def more freedom. > maxlrg = split_USE(def,b,n,inpidx,maxlrg,dup,false, splits,slidx); > > Part of the problem here is that up and down are slightly strange notions. up means register only and down means not register only, so down can actually include some registers. RegMask::is_AllStack covers the case where it's stack only. Adding FPU registers to the spill mask makes this distinction some more illogical which is what motivated the tweaks I think. > > tom > >> Thanks, >> Vladimir >> >>> tom >>> >>>> >>>> On 8/18/10 3:19 PM, Tom Rodriguez wrote: >>>>> http://cr.openjdk.java.net/~never/6978249 >>>>> >>>>> 6978249: spill between cpu and fpu registers when those moves are fast >>>>> Reviewed-by: >>>>> >>>>> On some architectures moves between CPU and FPU registers are fast so >>>>> they can be used for spilling instead of the stack. This change adds >>>>> a new flag UseFPUForSpilling and sets up the spill reg masks to allow >>>>> this when the flag is on. Currently for Nehalem class chips it seems >>>>> to be a uniform win but we'll keep it under AggressiveOpts for now. 
>>>>> There are some minor changes to spilling logic that are currently >>>>> guarded until we determine that they are generally a good idea. I >>>>> also moved the logic for PrintFlagsFinal since the initialization of >>>>> several subsystems may change some flag values which will be missed by >>>>> the current location. Tested with scimark, ctw and the nsk tests on >>>>> 32 and 64 bit. > From tom.rodriguez at oracle.com Thu Aug 19 10:30:58 2010 From: tom.rodriguez at oracle.com (Tom Rodriguez) Date: Thu, 19 Aug 2010 10:30:58 -0700 Subject: review (S) for 6978249: spill between cpu and fpu registers when those moves are fast In-Reply-To: <4C6D68D1.90908@oracle.com> References: <4C6C6E72.9050909@oracle.com> <4C6C7582.8000402@oracle.com> <53F5614A-C015-44CF-96BD-8668C92B4386@oracle.com> <4C6D68D1.90908@oracle.com> Message-ID: Thanks! tom On Aug 19, 2010, at 10:24 AM, Vladimir Kozlov wrote: > Thank you, Tom, for explanation. > > The changes are good to push. > > Thanks, > > Vladimir > > Tom Rodriguez wrote: >> On Aug 18, 2010, at 5:06 PM, Vladimir Kozlov wrote: >>> On 8/18/10 5:01 PM, Tom Rodriguez wrote: >>>>> reg_split.cpp: >>>>> The next comment does not explain why we need to split for Calls: >>>>> // These need a Split regardless of overlap or pressure >>>> I thought I understood this one but reading it again I'm not sure. I'll look at it some more. >> So the umask at a call is generally the debug info mask which is pretty much stack only so if you execute this code: >> // Both are either up or down, and there is overlap, No Split >> n->set_req(inpidx, def); >> you force the def down. If you insert a split at the use before the call you keep the call from forcing the def down which is what the new code is doing. I suspect the new code should replace the code above instead of skipping the mem-mem case that follows but I'd like to leave it as for now. I'm going to go back and evaluate this change and the coalesce change later. 
I updated the code with this comment: >> if (UseFPUForSpilling && n->is_Call() && !uup && !dup ) { >> // The use at the call can force the def down so insert // a split before the use to allow the def more freedom. maxlrg = split_USE(def,b,n,inpidx,maxlrg,dup,false, splits,slidx); >> Part of the problem here is that up and down are slightly strange notions. up means register only and down means not register only, so down can actually include some registers. RegMask::is_AllStack covers the case where it's stack only. Adding FPU registers to the spill mask makes this distinction some more illogical which is what motivated the tweaks I think. >> tom >>> Thanks, >>> Vladimir >>> >>>> tom >>>> >>>>> >>>>> On 8/18/10 3:19 PM, Tom Rodriguez wrote: >>>>>> http://cr.openjdk.java.net/~never/6978249 >>>>>> >>>>>> 6978249: spill between cpu and fpu registers when those moves are fast >>>>>> Reviewed-by: >>>>>> >>>>>> On some architectures moves between CPU and FPU registers are fast so >>>>>> they can be used for spilling instead of the stack. This change adds >>>>>> a new flag UseFPUForSpilling and sets up the spill reg masks to allow >>>>>> this when the flag is on. Currently for Nehalem class chips it seems >>>>>> to be a uniform win but we'll keep it under AggressiveOpts for now. >>>>>> There are some minor changes to spilling logic that are currently >>>>>> guarded until we determine that they are generally a good idea. I >>>>>> also moved the logic for PrintFlagsFinal since the initialization of >>>>>> several subsystems may change some flag values which will be missed by >>>>>> the current location. Tested with scimark, ctw and the nsk tests on >>>>>> 32 and 64 bit. 
From christian.thalinger at oracle.com Thu Aug 19 10:34:01 2010 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Thu, 19 Aug 2010 19:34:01 +0200 Subject: hg: jdk7/hotspot-comp/hotspot: 6973963: SEGV in ciBlock::start_bci() with EA In-Reply-To: <4C6D6493.7000303@oracle.com> References: <20100804005652.5BC8A47EBE@hg.openjdk.java.net> <1282235995.29965.2.camel@macbook> <4C6D62A8.7030704@oracle.com> <4C6D6493.7000303@oracle.com> Message-ID: <1282239241.29965.3.camel@macbook> On Thu, 2010-08-19 at 10:06 -0700, Vladimir Kozlov wrote: > Thank you, JohnC2 :) > > Sorry, I totally missed it (6977924) because of the office move. I missed it too. Good it's already fixed. -- Christian From y.s.ramakrishna at oracle.com Thu Aug 19 10:36:53 2010 From: y.s.ramakrishna at oracle.com (Y. Srinivas Ramakrishna) Date: Thu, 19 Aug 2010 10:36:53 -0700 Subject: hg: jdk7/hotspot-comp/hotspot: 6973963: SEGV in ciBlock::start_bci() with EA In-Reply-To: <1282235995.29965.2.camel@macbook> References: <20100804005652.5BC8A47EBE@hg.openjdk.java.net> <1282235995.29965.2.camel@macbook> Message-ID: <4C6D6BB5.3010207@oracle.com> i think John Cuthbertson (cc'd) has fixed it in hotspot-gc. It should be promoted to hotspot/hotspot sometime this morning, and thence will find its way to hotspot-comp. -- ramki Christian Thalinger wrote: > On Wed, 2010-08-04 at 00:56 +0000, vladimir.kozlov at oracle.com wrote: >> Changeset: 0e35fa8ebccd >> Author: kvn >> Date: 2010-08-03 15:55 -0700 >> URL: http://hg.openjdk.java.net/jdk7/hotspot-comp/hotspot/rev/0e35fa8ebccd >> >> 6973963: SEGV in ciBlock::start_bci() with EA >> Summary: Added more checks into ResourceObj and growableArray to verify correctness of allocation type. 
>> Reviewed-by: never, coleenp, dholmes > > I get a compiler warning with GCC 4.1.2 after this change: > > src/share/vm/memory/allocation.cpp: In static member function ‘static void ResourceObj::operator delete(void*)’: > src/share/vm/memory/allocation.cpp:61: warning: negative integer implicitly converted to unsigned type > src/share/vm/memory/allocation.cpp: In destructor ‘ResourceObj::~ResourceObj()’: > src/share/vm/memory/allocation.cpp:107: warning: negative integer implicitly converted to unsigned type > > Should I fix it in one of my changes? > > -- Christian > From christian.thalinger at oracle.com Thu Aug 19 10:53:22 2010 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Thu, 19 Aug 2010 19:53:22 +0200 Subject: Request for reviews (XL): 6978355: renaming for 6961697 Message-ID: <1282240402.29965.10.camel@macbook> http://cr.openjdk.java.net/~twisti/6978355/webrev.01/ 6978355: renaming for 6961697 Summary: This is the renaming part of 6961697 to keep the actual changes small for review. Reviewed-by: This is a split-off of 6961697 to make the review of the actual changes easier. This CR only does a couple of renames, moves code_* methods into CodeBlob, adds a new CodeBuffer constructor to make code simpler, adds some emit functions to CodeSection to make AD code simpler, ... I did the rename in two steps to make sure that no new CodeBlob::code_* method is called instead of an old nmethod::code_* method (which are now called nmethod::insts_*).
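The hazard that the two-step rename guards against can be sketched with toy classes. Everything here is an assumption for illustration: `ToyBlob`, `ToyMethod`, the offsets and `stale_caller` are hypothetical and do not reflect the real `CodeBlob` or `nmethod` layout. If the subclass accessor were renamed and a same-named base accessor with a different meaning were introduced in one step, a call site that was missed would still compile and would silently bind to the base method.

```cpp
#include <cassert>

// Toy stand-ins; sizes and names are illustrative only.
struct ToyBlob {
  int header_size = 16;
  int consts_size = 8;

  // Step 2 introduces this accessor on the base class, with a meaning
  // that differs from the old subclass accessor of the same name.
  int code_begin() const { return header_size; }
};

struct ToyMethod : ToyBlob {
  // Step 1 renamed the old ToyMethod::code_begin() to insts_begin().
  int insts_begin() const { return header_size + consts_size; }
};

// A call site missed during step 1. After step 2 it compiles again,
// but now resolves to ToyBlob::code_begin() and returns a different offset.
int stale_caller(const ToyMethod& m) { return m.code_begin(); }
```

Doing step 1 alone first means any missed call site fails to compile, so it can be found and updated before the base-class accessor is added.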
From christian.thalinger at oracle.com Thu Aug 19 11:38:31 2010 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Thu, 19 Aug 2010 20:38:31 +0200 Subject: Request for reviews .02 (M): 6961697: move nmethod constants section before instruction section In-Reply-To: <1278616722.1475.134.camel@macbook> References: <1278520980.17142.3.camel@macbook> <4C351442.1090907@oracle.com> <1278576028.19588.13.camel@macbook> <4C3608FA.2040406@oracle.com> <1278616722.1475.134.camel@macbook> Message-ID: <1282243111.29965.24.camel@macbook> On Thu, 2010-07-08 at 21:18 +0200, Christian Thalinger wrote: > I agree on that. When I've finished the new webrev I will merge it with > the other workspace and do some performance evaluation. So here's the new version: http://cr.openjdk.java.net/~twisti/6961697/webrev.02/ This one is completely different to the ones before and much simpler. The renaming part was moved to 6978355 and I only changed the order of the code sections: consts, insts, stubs. The CodeBuffer works as before and the memory layout is the same. But when the CodeBuffer is later copied into a CodeBlob the consts sections moves to the beginning. I did a lot of testing with the changes of 6961690. From dmdabbs at gmail.com Thu Aug 19 13:49:43 2010 From: dmdabbs at gmail.com (David Dabbs) Date: Thu, 19 Aug 2010 15:49:43 -0500 Subject: review (S) for 6978249: spill between cpu and fpu registers when those moves are fast In-Reply-To: References: Message-ID: <00a601cb3fe0$0f44c360$2dce4a20$@com> Tom Rodriguez wrote: --8< snip [in addition to the register spilling changes] I also moved the logic for PrintFlagsFinal since the initialization of several subsystems may change some flag values which will be missed by the current location. Tested with scimark, ctw and the nsk tests on 32 and 64 bit. Does the "bundling" of the PrintFlagsFinal tweaks with the spilling mods mean the former won't make it into JDK6 until HS19 lands there? Thank you, David p.s. 
Would this be the right list to post questions regarding HS flags appropriate for maximizing HotSpot performance on Nehalem CPUs? From tom.rodriguez at oracle.com Thu Aug 19 14:05:09 2010 From: tom.rodriguez at oracle.com (Tom Rodriguez) Date: Thu, 19 Aug 2010 14:05:09 -0700 Subject: review (S) for 6978249: spill between cpu and fpu registers when those moves are fast In-Reply-To: <00a601cb3fe0$0f44c360$2dce4a20$@com> References: <00a601cb3fe0$0f44c360$2dce4a20$@com> Message-ID: <63117F40-83A9-4FDE-BCEF-4957D2FA6B1D@oracle.com> On Aug 19, 2010, at 1:49 PM, David Dabbs wrote: > > Tom Rodriguez wrote: > > --8< snip > [in addition to the register spilling changes] > I also moved the logic for PrintFlagsFinal since the initialization of > several subsystems may change some flag values which will be missed by > the current location. Tested with scimark, ctw and the nsk tests on > 32 and 64 bit. > > > Does the "bundling" of the PrintFlagsFinal tweaks with the spilling mods > mean the former won't make it into JDK6 until HS19 lands there? Yes. They could always be backported separately under a new bug id. At the time it didn't seem worth a new bug id. > > > Thank you, > > David > > > p.s. Would this be the right list to post questions regarding HS flags > appropriate for > maximizing HotSpot performance on Nehalem CPUs? Yes it would be fine to ask about that here. Mostly we autodetect and use the appropriate settings so there's not a lot which is or should be tuned. tom > > > > From tom.rodriguez at oracle.com Thu Aug 19 15:11:29 2010 From: tom.rodriguez at oracle.com (Tom Rodriguez) Date: Thu, 19 Aug 2010 15:11:29 -0700 Subject: review (M) for 4809552: Optimize Arrays.fill(...) Message-ID: <8DF430F7-BA3F-40EC-958F-9A5DDFE09227@oracle.com> 4809552: Optimize Arrays.fill(...) Reviewed-by: This adds new logic to recognize fill idioms and convert them into a call to an optimized fill routine. 
Loop predication creates easily matched loops that are simply replaced with calls to the new assembly stubs. Currently only 1,2 and 4 byte primitive types are supported. Objects and longs/double will be supported in a later putback. Tested with runthese, nsk and ctw plus jbb2005. http://cr.openjdk.java.net/~never/4809552 From vladimir.kozlov at oracle.com Thu Aug 19 19:46:33 2010 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 19 Aug 2010 19:46:33 -0700 Subject: review (M) for 4809552: Optimize Arrays.fill(...) In-Reply-To: <8DF430F7-BA3F-40EC-958F-9A5DDFE09227@oracle.com> References: <8DF430F7-BA3F-40EC-958F-9A5DDFE09227@oracle.com> Message-ID: <4C6DEC89.8030202@oracle.com> Tom, First, I would not call these changes Medium. They are Large at least. Should we allow OptimizeFill only when UseLoopPredicate is true? loopTransform.cpp: In match_fill_loop() should we exclude StoreCMNode also? RAW store check is hidden in as_AddP()->unpack_offsets(). Should we do it explicitly? store and store_value is not set for "copy candidate": + if (value->is_Load() && lpt->_body.contains(value)) { + // tty->print_cr("possible copy candidate"); + } else { + msg = "variant store value"; + } Why you assume that on 'else' it is mem_phi?: + if (n == head->phi()) { + // ok + } else { + // mem_phi + } Should we also skip proj node (ifFalse) or it is not part of loop body? + } else if (n->is_CountedLoopEnd()) { + // ok so skip it. 
+ msg = "node used outside loop"; ^ is How you translate next assert message?: + assert(store_value->is_Load(), "shouldn't only happen for this case"); the next dump should be under flag and 'msg' should reflect "possible copy" or set msg_node: + #ifdef ASSERT + tty->print_cr("possible copy"); + store_value->dump(); + store->dump(); + #endif + msg = "variant store in loop"; For Op_LShiftX there is no check (n->in(1) == head->phi()): + } else if (n->Opcode() == Op_LShiftX) { + shift = n; + assert(type2aelembytes(store->as_Mem()->memory_type(), true) == 1 << shift->in(2)->get_int(), "scale should match"); s_offs already includes base_offset, see GraphKit::array_element_address(): + aligned = ((arrayOopDesc::base_offset_in_bytes(t) + s_offs * element_size) % HeapWordSize == 0); Also the above expression is wrong if initial index != 0. And actually you don't need to calculate it in match_fill_loop() since it is used only in call to StubRoutines::select_fill_function() to verify that element type is supported. In intrinsify_fill() initial index value is taking into account for aligned but base_offset_in_bytes could be already part of offset and you need to multiply by element_size only initial index: + if (offset != NULL && head->init_trip()->is_Con()) { + intptr_t offs = offset->find_intptr_t_type()->get_con() + head->init_trip()->get_int(); + int element_size = type2aelembytes(t); + aligned = ((arrayOopDesc::base_offset_in_bytes(t) + offs * element_size) % HeapWordSize == 0); + } stubRoutines.cpp: why you have specialized copies for testing _jint_fill and _jbyte_fill. Is not it covered by TEST_FILL already? stubGenerator_sparc.cpp: + // Generate stub for disjoint short fill. If "aligned" is true, the ^ Generate stub for array fill. 
+ // from: O0
     ^ to
+ // to: O1
     ^ value

O5 is not used and is not an input argument:

+ const Register offset = O5; // offset from start of arrays

Stubs are generated only for byte, short and int, so allowing boolean, char and float is wrong:

+ switch (t) {
+   case T_BOOLEAN:
+   case T_BYTE:
+     shift = 2;
+     break;
+   case T_CHAR:
+   case T_SHORT:
+     shift = 1;
+     break;
+   case T_FLOAT:
+   case T_INT:
+     shift = 0;
+     break;
+   default: ShouldNotReachHere();
+ }

The same in assembler_x86.cpp.

In stubGenerator_x86_64.cpp the new fill_32_bytes_forward() is not used.

Remove the commented code for T_LONG in both stubGenerator_x86_??.cpp files.

I did not look at the assembler. Maybe tomorrow.

Thanks,
Vladimir

Tom Rodriguez wrote:
> 4809552: Optimize Arrays.fill(...)
> Reviewed-by:
>
> This adds new logic to recognize fill idioms and convert them into a
> call to an optimized fill routine. Loop predication creates easily
> matched loops that are simply replaced with calls to the new assembly
> stubs. Currently only 1,2 and 4 byte primitive types are supported.
> Objects and longs/double will be supported in a later putback. Tested
> with runthese, nsk and ctw plus jbb2005.
>
> http://cr.openjdk.java.net/~never/4809552

From tom.rodriguez at oracle.com Thu Aug 19 20:59:50 2010
From: tom.rodriguez at oracle.com (tom.rodriguez at oracle.com)
Date: Fri, 20 Aug 2010 03:59:50 +0000
Subject: hg: jdk7/hotspot-comp/hotspot: 6978249: spill between cpu and fpu registers when those moves are fast
Message-ID: <20100820035954.114D3472EB@hg.openjdk.java.net>

Changeset: f55c4f82ab9d
Author: never
Date: 2010-08-19 14:51 -0700
URL: http://hg.openjdk.java.net/jdk7/hotspot-comp/hotspot/rev/f55c4f82ab9d

6978249: spill between cpu and fpu registers when those moves are fast
Reviewed-by: kvn

! src/cpu/sparc/vm/vm_version_sparc.cpp
! src/cpu/x86/vm/vm_version_x86.cpp
! src/cpu/x86/vm/x86_32.ad
! src/cpu/x86/vm/x86_64.ad
! src/share/vm/opto/c2_globals.hpp
! src/share/vm/opto/coalesce.cpp
! src/share/vm/opto/matcher.cpp
! src/share/vm/opto/reg_split.cpp
! src/share/vm/runtime/arguments.cpp
! src/share/vm/runtime/init.cpp

From vladimir.kozlov at oracle.com Thu Aug 19 22:04:13 2010
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 19 Aug 2010 22:04:13 -0700
Subject: Request for reviews .02 (M): 6961697: move nmethod constants section before instruction section
In-Reply-To: <1282243111.29965.24.camel@macbook>
References: <1278520980.17142.3.camel@macbook> <4C351442.1090907@oracle.com> <1278576028.19588.13.camel@macbook> <4C3608FA.2040406@oracle.com> <1278616722.1475.134.camel@macbook> <1282243111.29965.24.camel@macbook>
Message-ID: <4C6E0CCD.8080800@oracle.com>

Christian,

Are these changes made on top of the 6978355 changes? I did not find the code_offset() definition in these changes.

Why did you remove the oops section print in nmethod.cpp?

What about the FIXME commented assert in relocInfo.cpp?

Otherwise this looks good.

Thanks,
Vladimir

On 8/19/10 11:38 AM, Christian Thalinger wrote:
> On Thu, 2010-07-08 at 21:18 +0200, Christian Thalinger wrote:
>> I agree on that. When I've finished the new webrev I will merge it with
>> the other workspace and do some performance evaluation.
>
> So here's the new version:
>
> http://cr.openjdk.java.net/~twisti/6961697/webrev.02/
>
> This one is completely different to the ones before and much simpler.
> The renaming part was moved to 6978355 and I only changed the order of
> the code sections: consts, insts, stubs. The CodeBuffer works as
> before and the memory layout is the same. But when the CodeBuffer is
> later copied into a CodeBlob the consts sections moves to the beginning.
>
> I did a lot of testing with the changes of 6961690.
>

From Ulf.Zibis at gmx.de Fri Aug 20 03:20:51 2010
From: Ulf.Zibis at gmx.de (Ulf Zibis)
Date: Fri, 20 Aug 2010 12:20:51 +0200
Subject: review (M) for 4809552: Optimize Arrays.fill(...)
In-Reply-To: <8DF430F7-BA3F-40EC-958F-9A5DDFE09227@oracle.com>
References: <8DF430F7-BA3F-40EC-958F-9A5DDFE09227@oracle.com>
Message-ID: <4C6E5703.6000004@gmx.de>

A comment aside:
Having:
    int[] ia = new int[SIZE];
    Arrays.fill(ia, 1234);

The 1st line causes the array to be first filled with zeroes according to the JLS.
The 2nd line causes the array to be filled *again*, this time with 1234's.

This situation could be optimized by the JIT by skipping the zero-filling.
Maybe HotSpot already does this; if so, forget my 2 cents.

1 cent more:
To give the programmer better control of that, and to enhance the interpreter in the same way, we would need an additional syntax, something like:
    int [] ia = new int[SIZE](1234); // Project Coin candidate!

-Ulf


On 20.08.2010 00:11, Tom Rodriguez wrote:
> 4809552: Optimize Arrays.fill(...)
> Reviewed-by:
>
> This adds new logic to recognize fill idioms and convert them into a
> call to an optimized fill routine. Loop predication creates easily
> matched loops that are simply replaced with calls to the new assembly
> stubs. Currently only 1,2 and 4 byte primitive types are supported.
> Objects and longs/double will be supported in a later putback. Tested
> with runthese, nsk and ctw plus jbb2005.
>
> http://cr.openjdk.java.net/~never/4809552
>
>

From Ulf.Zibis at gmx.de Fri Aug 20 03:30:47 2010
From: Ulf.Zibis at gmx.de (Ulf Zibis)
Date: Fri, 20 Aug 2010 12:30:47 +0200
Subject: review (M) for 4809552: Optimize Arrays.fill(...)
In-Reply-To: <8DF430F7-BA3F-40EC-958F-9A5DDFE09227@oracle.com>
References: <8DF430F7-BA3F-40EC-958F-9A5DDFE09227@oracle.com>
Message-ID: <4C6E5957.3020207@gmx.de>

Couldn't we intrinsify loops like:
    for (int i = fromIndex; i < toIndex; i++)
        a[i] = val;
    for (int i = 0, len = a.length; i < len; i++)
        a[i] = val;

So all similar Java-coded loops would benefit from the performance gain of using the REP STOS operation, especially in the case of the client compiler and maybe the interpreter too.

-Ulf


On 20.08.2010 00:11, Tom Rodriguez wrote:
> 4809552: Optimize Arrays.fill(...)
> Reviewed-by:
>
> This adds new logic to recognize fill idioms and convert them into a
> call to an optimized fill routine. Loop predication creates easily
> matched loops that are simply replaced with calls to the new assembly
> stubs. Currently only 1,2 and 4 byte primitive types are supported.
> Objects and longs/double will be supported in a later putback. Tested
> with runthese, nsk and ctw plus jbb2005.
>
> http://cr.openjdk.java.net/~never/4809552
>
>
>

From opinali at gmail.com Fri Aug 20 05:48:54 2010
From: opinali at gmail.com (Osvaldo Doederlein)
Date: Fri, 20 Aug 2010 09:48:54 -0300
Subject: review (M) for 4809552: Optimize Arrays.fill(...)
In-Reply-To: <4C6E5703.6000004@gmx.de>
References: <8DF430F7-BA3F-40EC-958F-9A5DDFE09227@oracle.com> <4C6E5703.6000004@gmx.de>
Message-ID:

2010/8/20 Ulf Zibis:
> A comment aside:
> Having:
>     int[] ia = new int[SIZE];
>     Arrays.fill(ia, 1234);
>
> The 1st line causes the array to be first filled with zeroes according the
> JLS.
> The 2nd line causes the array to be *again* filled with 1234's.
>
> This situation could be optimized by the JIT in skipping the zero-filling.
> Maybe this is just done by HotSpot, then forget my 2 cents.

I think the zero-filling is already performed as part of memory management & GC, so new[] never needs to zero-fill?

A+
Osvaldo

> 1 cent more:
> To give programmer better control of that and to enhance the interpreter in
> the same way, we would need to have an additional syntax, something like:
> int [] ia = new int[SIZE](1234); // Project Coin candidate!
> > -Ulf > > > Am 20.08.2010 00:11, schrieb Tom Rodriguez: > > 4809552: Optimize Arrays.fill(...) > Reviewed-by: > > This adds new logic to recognize fill idioms and convert them into a > call to an optimized fill routine. Loop predication creates easily > matched loops that are simply replaced with calls to the new assembly > stubs. Currently only 1,2 and 4 byte primitive types are supported. > Objects and longs/double will be supported in a later putback. Tested > with runthese, nsk and ctw plus jbb2005. > > http://cr.openjdk.java.net/~never/4809552 > > > From Ulf.Zibis at gmx.de Fri Aug 20 06:06:07 2010 From: Ulf.Zibis at gmx.de (Ulf Zibis) Date: Fri, 20 Aug 2010 15:06:07 +0200 Subject: review (M) for 4809552: Optimize Arrays.fill(...) In-Reply-To: References: <8DF430F7-BA3F-40EC-958F-9A5DDFE09227@oracle.com> <4C6E5703.6000004@gmx.de> Message-ID: <4C6E7DBF.5030303@gmx.de> Am 20.08.2010 14:48, schrieb Osvaldo Doederlein: > 2010/8/20 Ulf Zibis: > >> A comment aside: >> Having: >> int[] ia = new int[SIZE]; >> Arrays.fill(ia, 1234); >> >> The 1st line causes the array to be first filled with zeroes according the >> JLS. >> The 2nd line causes the array to be *again* filled with 1234's. >> >> This situation could be optimized by the JIT in skipping the zero-filling. >> Maybe this is just done by HotSpot, then forget my 2 cents. >> > I think the zero-filling is already performed as part of memory > management& GC, so new[] never needs to zero-fill? > I think, irrespectively from which management already performs the zero-filling, it wastes CPU-time, and therefore should be avoided here. -Ulf > A+ > Osvaldo > > >> 1 cent more: >> To give programmer better control of that and to enhance the interpreter in >> the same way, we would need to have an additional syntax, something like: >> int [] ia = new int[SIZE](1234); // Project Coin candidate! >> >> -Ulf >> >> >> Am 20.08.2010 00:11, schrieb Tom Rodriguez: >> >> 4809552: Optimize Arrays.fill(...) 
>> Reviewed-by: >> >> This adds new logic to recognize fill idioms and convert them into a >> call to an optimized fill routine. Loop predication creates easily >> matched loops that are simply replaced with calls to the new assembly >> stubs. Currently only 1,2 and 4 byte primitive types are supported. >> Objects and longs/double will be supported in a later putback. Tested >> with runthese, nsk and ctw plus jbb2005. >> >> http://cr.openjdk.java.net/~never/4809552 >> >> >> >> > > From john.cuthbertson at oracle.com Thu Aug 19 11:16:50 2010 From: john.cuthbertson at oracle.com (John Cuthbertson) Date: Thu, 19 Aug 2010 11:16:50 -0700 Subject: hg: jdk7/hotspot-comp/hotspot: 6973963: SEGV in ciBlock::start_bci() with EA In-Reply-To: <4C6D6BB5.3010207@oracle.com> References: <20100804005652.5BC8A47EBE@hg.openjdk.java.net> <1282235995.29965.2.camel@macbook> <4C6D6BB5.3010207@oracle.com> Message-ID: <4C6D7512.7050909@oracle.com> Hi All, Yes it's already fixed. I discovered it after building on intelsdv03 which has gcc 4.1.2 installed. The jprt machines have gcc 4.3 which doesn't flag the assignments. As Ramki said it should be propagated to hotspot/hotspot sometime today. JohnC On 08/19/10 10:36, Y. Srinivas Ramakrishna wrote: > i think John Cuthbertson (cc'd) has fixed it in hotspot-gc. It should > be promoted > to hotspot/hotspot sometime this morning, and thence will find its way > to hotspot-comp. > > -- ramki > > Christian Thalinger wrote: >> On Wed, 2010-08-04 at 00:56 +0000, vladimir.kozlov at oracle.com wrote: >>> Changeset: 0e35fa8ebccd >>> Author: kvn >>> Date: 2010-08-03 15:55 -0700 >>> URL: >>> http://hg.openjdk.java.net/jdk7/hotspot-comp/hotspot/rev/0e35fa8ebccd >>> >>> 6973963: SEGV in ciBlock::start_bci() with EA >>> Summary: Added more checks into ResourceObj and growableArray to >>> verify correctness of allocation type. 
>>> Reviewed-by: never, coleenp, dholmes
>>
>> I get a compiler warning with GCC 4.1.2 after this change:
>>
>> src/share/vm/memory/allocation.cpp: In static member function ‘static
>> void ResourceObj::operator delete(void*)’:
>> src/share/vm/memory/allocation.cpp:61: warning: negative integer
>> implicitly converted to unsigned type
>> src/share/vm/memory/allocation.cpp: In destructor
>> ‘ResourceObj::~ResourceObj()’:
>> src/share/vm/memory/allocation.cpp:107: warning: negative integer
>> implicitly converted to unsigned type
>>
>> Should I fix it in one of my changes?
>>
>> -- Christian
>>
>

From tom.rodriguez at oracle.com Fri Aug 20 09:00:00 2010
From: tom.rodriguez at oracle.com (Tom Rodriguez)
Date: Fri, 20 Aug 2010 09:00:00 -0700
Subject: review (M) for 4809552: Optimize Arrays.fill(...)
In-Reply-To: <4C6E5703.6000004@gmx.de>
References: <8DF430F7-BA3F-40EC-958F-9A5DDFE09227@oracle.com> <4C6E5703.6000004@gmx.de>
Message-ID:

On Aug 20, 2010, at 3:20 AM, Ulf Zibis wrote:

> A comment aside:
> Having:
>     int[] ia = new int[SIZE];
>     Arrays.fill(ia, 1234);
>
> The 1st line causes the array to be first filled with zeroes according the JLS.
> The 2nd line causes the array to be *again* filled with 1234's.
>
> This situation could be optimized by the JIT in skipping the zero-filling.
> Maybe this is just done by HotSpot, then forget my 2 cents.

My change will skip the initial zeroing and replace it by the work done in the fill.

tom

>
> 1 cent more:
> To give programmer better control of that and to enhance the interpreter in the same way, we would need to have an additional syntax, something like:
> int [] ia = new int[SIZE](1234); // Project Coin candidate!
>
> -Ulf
>
>
> On 20.08.2010 00:11, Tom Rodriguez wrote:
>> 4809552: Optimize Arrays.fill(...)
>> Reviewed-by:
>>
>> This adds new logic to recognize fill idioms and convert them into a
>> call to an optimized fill routine.
Loop predication creates easily >> matched loops that are simply replaced with calls to the new assembly >> stubs. Currently only 1,2 and 4 byte primitive types are supported. >> Objects and longs/double will be supported in a later putback. Tested >> with runthese, nsk and ctw plus jbb2005. >> >> >> http://cr.openjdk.java.net/~never/4809552 >> >> >> >> >> From tom.rodriguez at Oracle.COM Fri Aug 20 09:00:56 2010 From: tom.rodriguez at Oracle.COM (Tom Rodriguez) Date: Fri, 20 Aug 2010 09:00:56 -0700 Subject: review (M) for 4809552: Optimize Arrays.fill(...) In-Reply-To: <4C6E5957.3020207@gmx.de> References: <8DF430F7-BA3F-40EC-958F-9A5DDFE09227@oracle.com> <4C6E5957.3020207@gmx.de> Message-ID: <7E5F3267-8CD8-4DE5-BE7D-B17A8D07EC5D@oracle.com> On Aug 20, 2010, at 3:30 AM, Ulf Zibis wrote: > Couldn't we intrinsify loops like? : > for (int i = fromIndex; i < toIndex; i++) > a[i] = val; > for (int i = 0, len = a.length; i < len; i++) > a[i] = val; It will. tom > > So all similar Java-coded loops would benefit from the performance gain of using REP STOS operation, especially in case of client compiler and maybe interpreter too. > > -Ulf > > > Am 20.08.2010 00:11, schrieb Tom Rodriguez: >> 4809552: Optimize Arrays.fill(...) >> Reviewed-by: >> >> This adds new logic to recognize fill idioms and convert them into a >> call to an optimized fill routine. Loop predication creates easily >> matched loops that are simply replaced with calls to the new assembly >> stubs. Currently only 1,2 and 4 byte primitive types are supported. >> Objects and longs/double will be supported in a later putback. Tested >> with runthese, nsk and ctw plus jbb2005. >> >> http://cr.openjdk.java.net/~never/4809552 >> >> >> > From y.s.ramakrishna at oracle.com Fri Aug 20 09:59:11 2010 From: y.s.ramakrishna at oracle.com (Y. Srinivas Ramakrishna) Date: Fri, 20 Aug 2010 09:59:11 -0700 Subject: review (M) for 4809552: Optimize Arrays.fill(...) 
In-Reply-To: References: <8DF430F7-BA3F-40EC-958F-9A5DDFE09227@oracle.com> <4C6E5703.6000004@gmx.de> Message-ID: <4C6EB45F.7000207@oracle.com> Osvaldo Doederlein wrote: > 2010/8/20 Ulf Zibis : >> A comment aside: >> Having: >> int[] ia = new int[SIZE]; >> Arrays.fill(ia, 1234); >> >> The 1st line causes the array to be first filled with zeroes according the >> JLS. >> The 2nd line causes the array to be *again* filled with 1234's. >> >> This situation could be optimized by the JIT in skipping the zero-filling. >> Maybe this is just done by HotSpot, then forget my 2 cents. > > I think the zero-filling is already performed as part of memory > management & GC, so new[] never needs to zero-fill? As you might have surmised from Tom's response, no GC's (except perhaps G1?) now do 0-fill. (Although there is an option to enable 0-filling of TLAB's, it's off by default.) -- ramki > > A+ > Osvaldo > >> 1 cent more: >> To give programmer better control of that and to enhance the interpreter in >> the same way, we would need to have an additional syntax, something like: >> int [] ia = new int[SIZE](1234); // Project Coin candidate! >> >> -Ulf >> >> >> Am 20.08.2010 00:11, schrieb Tom Rodriguez: >> >> 4809552: Optimize Arrays.fill(...) >> Reviewed-by: >> >> This adds new logic to recognize fill idioms and convert them into a >> call to an optimized fill routine. Loop predication creates easily >> matched loops that are simply replaced with calls to the new assembly >> stubs. Currently only 1,2 and 4 byte primitive types are supported. >> Objects and longs/double will be supported in a later putback. Tested >> with runthese, nsk and ctw plus jbb2005. >> >> http://cr.openjdk.java.net/~never/4809552 >> >> >> From vladimir.kozlov at oracle.com Fri Aug 20 10:49:32 2010 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 20 Aug 2010 10:49:32 -0700 Subject: review (M) for 4809552: Optimize Arrays.fill(...) 
In-Reply-To: <4C6DEC89.8030202@oracle.com>
References: <8DF430F7-BA3F-40EC-958F-9A5DDFE09227@oracle.com> <4C6DEC89.8030202@oracle.com>
Message-ID: <4C6EC02C.7050300@oracle.com>

Assembler part review.

In stubGenerator_sparc.cpp

Move next lines above 64 bit value construction:

+ __ cmp(count, 2<
+ __ brx(Assembler::lessUnsigned, false, Assembler::pn, L_copy_4_bytes); // use unsigned cmp
+ __ delayed()->nop();

In assembler_x86.cpp

You don't need next:
+ jmpb(L_copy_4_bytes); // all dwords were copied

Use next movdqa since you aligned address:
+ movdqa(Address(to, 0), xtmp);
+ movdqa(Address(to, 16), xtmp);

instead of
+ movq(Address(to, 0), xtmp);
+ movq(Address(to, 8), xtmp);
+ movq(Address(to, 16), xtmp);
+ movq(Address(to, 24), xtmp);

Vladimir

Vladimir Kozlov wrote:
> Tom,
>
> First, I would not call these changes Medium. They are Large at least.
>
> Should we allow OptimizeFill only when UseLoopPredicate is true?
>
> loopTransform.cpp:
>
> In match_fill_loop() should we exclude StoreCMNode also?
>
> RAW store check is hidden in as_AddP()->unpack_offsets(). Should we do
> it explicitly?
>
> store and store_value is not set for "copy candidate":
>
> + if (value->is_Load() && lpt->_body.contains(value)) {
> +   // tty->print_cr("possible copy candidate");
> + } else {
> +   msg = "variant store value";
> + }
>
> Why you assume that on 'else' it is mem_phi?:
>
> + if (n == head->phi()) {
> +   // ok
> + } else {
> +   // mem_phi
> + }
>
> Should we also skip proj node (ifFalse) or it is not part of loop body?
>
> + } else if (n->is_CountedLoopEnd()) {
> +   // ok so skip it.
> > + msg = "node used outside loop"; > ^ is > > How you translate next assert message?: > + assert(store_value->is_Load(), "shouldn't only happen for this > case"); > > the next dump should be under flag and 'msg' should reflect "possible > copy" or set msg_node: > > + #ifdef ASSERT > + tty->print_cr("possible copy"); > + store_value->dump(); > + store->dump(); > + #endif > + msg = "variant store in loop"; > > For Op_LShiftX there is no check (n->in(1) == head->phi()): > > + } else if (n->Opcode() == Op_LShiftX) { > + shift = n; > + assert(type2aelembytes(store->as_Mem()->memory_type(), true) > == 1 << shift->in(2)->get_int(), "scale should match"); > > > s_offs already includes base_offset, see GraphKit::array_element_address(): > + aligned = ((arrayOopDesc::base_offset_in_bytes(t) + s_offs * > element_size) % HeapWordSize == 0); > Also the above expression is wrong if initial index != 0. > And actually you don't need to calculate it in match_fill_loop() since > it is used only in call to StubRoutines::select_fill_function() to verify > that element type is supported. > > > In intrinsify_fill() initial index value is taking into account for aligned > but base_offset_in_bytes could be already part of offset and you need > to multiply by element_size only initial index: > > + if (offset != NULL && head->init_trip()->is_Con()) { > + intptr_t offs = offset->find_intptr_t_type()->get_con() + > head->init_trip()->get_int(); > + int element_size = type2aelembytes(t); > + aligned = ((arrayOopDesc::base_offset_in_bytes(t) + offs * > element_size) % HeapWordSize == 0); > + } > > stubRoutines.cpp: > why you have specialized copies for testing _jint_fill and _jbyte_fill. > Is not it covered by TEST_FILL already? > > stubGenerator_sparc.cpp: > + // Generate stub for disjoint short fill. If "aligned" is true, the > ^ Generate stub for array fill. 
> > + // from: O0 > ^ to > + // to: O1 > ^ value > > O5 is not used and not input argument: > + const Register offset = O5; // offset from start of arrays > > stubs are generated only for byte,short and int, so allowing bollean, > char and float is wrong: > + switch (t) { > + case T_BOOLEAN: > + case T_BYTE: > + shift = 2; > + break; > + case T_CHAR: > + case T_SHORT: > + shift = 1; > + break; > + case T_FLOAT: > + case T_INT: > + shift = 0; > + break; > + default: ShouldNotReachHere(); > + } > > The same in assembler_x86.cpp > > In stubGenerator_x86_64.cpp > new fill_32_bytes_forward() is not used. > > Remove commented code for T_LONG in both stubGenerator_x86_??.cpp > > I did not look on assembler. May be tomorrow. > > Thanks, > Vladimir > > > Tom Rodriguez wrote: >> 4809552: Optimize Arrays.fill(...) >> Reviewed-by: >> >> This adds new logic to recognize fill idioms and convert them into a >> call to an optimized fill routine. Loop predication creates easily >> matched loops that are simply replaced with calls to the new assembly >> stubs. Currently only 1,2 and 4 byte primitive types are supported. >> Objects and longs/double will be supported in a later putback. Tested >> with runthese, nsk and ctw plus jbb2005. >> >> http://cr.openjdk.java.net/~never/4809552 From vladimir.kozlov at oracle.com Fri Aug 20 10:57:59 2010 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 20 Aug 2010 10:57:59 -0700 Subject: review (M) for 4809552: Optimize Arrays.fill(...) In-Reply-To: <4C6EC02C.7050300@oracle.com> References: <8DF430F7-BA3F-40EC-958F-9A5DDFE09227@oracle.com> <4C6DEC89.8030202@oracle.com> <4C6EC02C.7050300@oracle.com> Message-ID: <4C6EC227.3050807@oracle.com> Actually I think it does not make sense to use movdqu() since it still slower on cache line boundary. Align to 8 bytes always and use only movdqa(). Vladimir Vladimir Kozlov wrote: > Assembler part review. 
>
> In stubGenerator_sparc.cpp
>
> Move next lines above 64 bit value construction:
>
> + __ cmp(count, 2<
> + __ brx(Assembler::lessUnsigned, false, Assembler::pn,
> L_copy_4_bytes); // use unsigned cmp
> + __ delayed()->nop();
>
> In assembler_x86.cpp
>
> You don't need next:
> + jmpb(L_copy_4_bytes); // all dwords were copied
>
> Use next movdqa since you aligned address:
> + movdqa(Address(to, 0), xtmp);
> + movdqa(Address(to, 16), xtmp);
>
> instead of
> + movq(Address(to, 0), xtmp);
> + movq(Address(to, 8), xtmp);
> + movq(Address(to, 16), xtmp);
> + movq(Address(to, 24), xtmp);
>
> Vladimir
>
> Vladimir Kozlov wrote:
>> Tom,
>>
>> First, I would not call these changes Medium. They are Large at least.
>>
>> Should we allow OptimizeFill only when UseLoopPredicate is true?
>>
>> loopTransform.cpp:
>>
>> In match_fill_loop() should we exclude StoreCMNode also?
>>
>> RAW store check is hidden in as_AddP()->unpack_offsets(). Should we do
>> it explicitly?
>>
>> store and store_value is not set for "copy candidate":
>>
>> + if (value->is_Load() && lpt->_body.contains(value)) {
>> +   // tty->print_cr("possible copy candidate");
>> + } else {
>> +   msg = "variant store value";
>> + }
>>
>> Why you assume that on 'else' it is mem_phi?:
>>
>> + if (n == head->phi()) {
>> +   // ok
>> + } else {
>> +   // mem_phi
>> + }
>>
>> Should we also skip proj node (ifFalse) or it is not part of loop body?
>>
>> + } else if (n->is_CountedLoopEnd()) {
>> +   // ok so skip it.
>> >> + msg = "node used outside loop"; >> ^ is >> >> How you translate next assert message?: >> + assert(store_value->is_Load(), "shouldn't only happen for this >> case"); >> >> the next dump should be under flag and 'msg' should reflect "possible >> copy" or set msg_node: >> >> + #ifdef ASSERT >> + tty->print_cr("possible copy"); >> + store_value->dump(); >> + store->dump(); >> + #endif >> + msg = "variant store in loop"; >> >> For Op_LShiftX there is no check (n->in(1) == head->phi()): >> >> + } else if (n->Opcode() == Op_LShiftX) { >> + shift = n; >> + assert(type2aelembytes(store->as_Mem()->memory_type(), true) >> == 1 << shift->in(2)->get_int(), "scale should match"); >> >> >> s_offs already includes base_offset, see >> GraphKit::array_element_address(): >> + aligned = ((arrayOopDesc::base_offset_in_bytes(t) + s_offs * >> element_size) % HeapWordSize == 0); >> Also the above expression is wrong if initial index != 0. >> And actually you don't need to calculate it in match_fill_loop() since >> it is used only in call to StubRoutines::select_fill_function() to verify >> that element type is supported. >> >> >> In intrinsify_fill() initial index value is taking into account for >> aligned >> but base_offset_in_bytes could be already part of offset and you need >> to multiply by element_size only initial index: >> >> + if (offset != NULL && head->init_trip()->is_Con()) { >> + intptr_t offs = offset->find_intptr_t_type()->get_con() + >> head->init_trip()->get_int(); >> + int element_size = type2aelembytes(t); >> + aligned = ((arrayOopDesc::base_offset_in_bytes(t) + offs * >> element_size) % HeapWordSize == 0); >> + } >> >> stubRoutines.cpp: >> why you have specialized copies for testing _jint_fill and >> _jbyte_fill. Is not it covered by TEST_FILL already? >> >> stubGenerator_sparc.cpp: >> + // Generate stub for disjoint short fill. If "aligned" is true, the >> ^ Generate stub for array fill. 
>> >> + // from: O0 >> ^ to >> + // to: O1 >> ^ value >> >> O5 is not used and not input argument: >> + const Register offset = O5; // offset from start of arrays >> >> stubs are generated only for byte,short and int, so allowing bollean, >> char and float is wrong: >> + switch (t) { >> + case T_BOOLEAN: >> + case T_BYTE: >> + shift = 2; >> + break; >> + case T_CHAR: >> + case T_SHORT: >> + shift = 1; >> + break; >> + case T_FLOAT: >> + case T_INT: >> + shift = 0; >> + break; >> + default: ShouldNotReachHere(); >> + } >> >> The same in assembler_x86.cpp >> >> In stubGenerator_x86_64.cpp >> new fill_32_bytes_forward() is not used. >> >> Remove commented code for T_LONG in both stubGenerator_x86_??.cpp >> >> I did not look on assembler. May be tomorrow. >> >> Thanks, >> Vladimir >> >> >> Tom Rodriguez wrote: >>> 4809552: Optimize Arrays.fill(...) >>> Reviewed-by: >>> >>> This adds new logic to recognize fill idioms and convert them into a >>> call to an optimized fill routine. Loop predication creates easily >>> matched loops that are simply replaced with calls to the new assembly >>> stubs. Currently only 1,2 and 4 byte primitive types are supported. >>> Objects and longs/double will be supported in a later putback. Tested >>> with runthese, nsk and ctw plus jbb2005. >>> >>> http://cr.openjdk.java.net/~never/4809552 From yamauchi at google.com Fri Aug 20 11:13:25 2010 From: yamauchi at google.com (Hiroshi Yamauchi) Date: Fri, 20 Aug 2010 11:13:25 -0700 Subject: review (S) for 6978249: spill between cpu and fpu registers when those moves are fast In-Reply-To: <63117F40-83A9-4FDE-BCEF-4957D2FA6B1D@oracle.com> References: <00a601cb3fe0$0f44c360$2dce4a20$@com> <63117F40-83A9-4FDE-BCEF-4957D2FA6B1D@oracle.com> Message-ID: Hi Tom, I'm just curious - how much win is this spilling technique? 
Hiroshi

On Thu, Aug 19, 2010 at 2:05 PM, Tom Rodriguez wrote:
>
> On Aug 19, 2010, at 1:49 PM, David Dabbs wrote:
>
>>
>> Tom Rodriguez wrote:
>>
>> --8< snip
>> [in addition to the register spilling changes]
>> I also moved the logic for PrintFlagsFinal since the initialization of
>> several subsystems may change some flag values which will be missed by
>> the current location. Tested with scimark, ctw and the nsk tests on
>> 32 and 64 bit.
>>
>>
>> Does the "bundling" of the PrintFlagsFinal tweaks with the spilling mods
>> mean the former won't make it into JDK6 until HS19 lands there?
>
> Yes. They could always be backported separately under a new bug id. At the time it didn't seem worth a new bug id.
>
>>
>>
>> Thank you,
>>
>> David
>>
>>
>> p.s. Would this be the right list to post questions regarding HS flags
>> appropriate for
>> maximizing HotSpot performance on Nehalem CPUs?
>
> Yes it would be fine to ask about that here. Mostly we autodetect and use the appropriate settings so there's not a lot which is or should be tuned.
>
> tom
>
>>
>>
>>
>>
>
>

From tom.rodriguez at oracle.com Fri Aug 20 11:14:56 2010
From: tom.rodriguez at oracle.com (Tom Rodriguez)
Date: Fri, 20 Aug 2010 11:14:56 -0700
Subject: review (M) for 4809552: Optimize Arrays.fill(...)
In-Reply-To: <4C6EC02C.7050300@oracle.com>
References: <8DF430F7-BA3F-40EC-958F-9A5DDFE09227@oracle.com> <4C6DEC89.8030202@oracle.com> <4C6EC02C.7050300@oracle.com>
Message-ID:

On Aug 20, 2010, at 10:49 AM, Vladimir Kozlov wrote:

> Assembler part review.
>
> In stubGenerator_sparc.cpp
>
> Move next lines above 64 bit value construction:
>
> + __ cmp(count, 2<
> + __ brx(Assembler::lessUnsigned, false, Assembler::pn, L_copy_4_bytes); // use unsigned cmp
> + __ delayed()->nop();

Actually I moved the 64 bit construction down to just before the fill 32 loop, like the x86 version.

> In assembler_x86.cpp
>
> You don't need next:
> + jmpb(L_copy_4_bytes); // all dwords were copied

yep.
> Use next movdqa since you aligned address: > + movdqa(Address(to, 0), xtmp); > + movdqa(Address(to, 16), xtmp); > > instead of > + movq(Address(to, 0), xtmp); > + movq(Address(to, 8), xtmp); > + movq(Address(to, 16), xtmp); > + movq(Address(to, 24), xtmp); But it's only aligned to 8 bytes, not 16. maybe it would be worth it to align to 16? tom > > Vladimir > > Vladimir Kozlov wrote: >> Tom, >> First, I would not call these changes Medium. They are Large at least. >> Should we allow OptimizeFill only when UseLoopPredicate is true? >> loopTransform.cpp: >> In match_fill_loop() should we exclude StoreCMNode also? >> RAW store check is hidden in as_AddP()->unpack_offsets(). Should we do it explicitly? >> store and store_value is not set for "copy candidate": >> + if (value->is_Load() && lpt->_body.contains(value)) { >> + // tty->print_cr("possible copy candidate"); >> + } else { >> + msg = "variant store value"; >> + } >> Why you assume that on 'else' it is mem_phi?: >> + if (n == head->phi()) { >> + // ok >> + } else { >> + // mem_phi >> + } >> Should we also skip proj node (ifFalse) or it is not part of loop body? >> + } else if (n->is_CountedLoopEnd()) { >> + // ok so skip it. 
>> + msg = "node used outside loop"; >> ^ is >> How you translate next assert message?: >> + assert(store_value->is_Load(), "shouldn't only happen for this case"); >> the next dump should be under flag and 'msg' should reflect "possible copy" or set msg_node: >> + #ifdef ASSERT >> + tty->print_cr("possible copy"); >> + store_value->dump(); >> + store->dump(); >> + #endif >> + msg = "variant store in loop"; >> For Op_LShiftX there is no check (n->in(1) == head->phi()): >> + } else if (n->Opcode() == Op_LShiftX) { >> + shift = n; >> + assert(type2aelembytes(store->as_Mem()->memory_type(), true) == 1 << shift->in(2)->get_int(), "scale should match"); >> s_offs already includes base_offset, see GraphKit::array_element_address(): >> + aligned = ((arrayOopDesc::base_offset_in_bytes(t) + s_offs * element_size) % HeapWordSize == 0); >> Also the above expression is wrong if initial index != 0. >> And actually you don't need to calculate it in match_fill_loop() since >> it is used only in call to StubRoutines::select_fill_function() to verify >> that element type is supported. >> In intrinsify_fill() initial index value is taking into account for aligned >> but base_offset_in_bytes could be already part of offset and you need >> to multiply by element_size only initial index: >> + if (offset != NULL && head->init_trip()->is_Con()) { >> + intptr_t offs = offset->find_intptr_t_type()->get_con() + head->init_trip()->get_int(); >> + int element_size = type2aelembytes(t); >> + aligned = ((arrayOopDesc::base_offset_in_bytes(t) + offs * element_size) % HeapWordSize == 0); >> + } >> stubRoutines.cpp: >> why you have specialized copies for testing _jint_fill and _jbyte_fill. Is not it covered by TEST_FILL already? >> stubGenerator_sparc.cpp: >> + // Generate stub for disjoint short fill. If "aligned" is true, the >> ^ Generate stub for array fill. 
>> + // from: O0 >> ^ to >> + // to: O1 >> ^ value >> O5 is not used and not input argument: >> + const Register offset = O5; // offset from start of arrays >> stubs are generated only for byte,short and int, so allowing bollean, char and float is wrong: >> + switch (t) { >> + case T_BOOLEAN: >> + case T_BYTE: >> + shift = 2; >> + break; >> + case T_CHAR: >> + case T_SHORT: >> + shift = 1; >> + break; >> + case T_FLOAT: >> + case T_INT: >> + shift = 0; >> + break; >> + default: ShouldNotReachHere(); >> + } >> The same in assembler_x86.cpp >> In stubGenerator_x86_64.cpp >> new fill_32_bytes_forward() is not used. >> Remove commented code for T_LONG in both stubGenerator_x86_??.cpp >> I did not look on assembler. May be tomorrow. >> Thanks, >> Vladimir >> Tom Rodriguez wrote: >>> 4809552: Optimize Arrays.fill(...) >>> Reviewed-by: >>> >>> This adds new logic to recognize fill idioms and convert them into a >>> call to an optimized fill routine. Loop predication creates easily >>> matched loops that are simply replaced with calls to the new assembly >>> stubs. Currently only 1,2 and 4 byte primitive types are supported. >>> Objects and longs/double will be supported in a later putback. Tested >>> with runthese, nsk and ctw plus jbb2005. >>> >>> http://cr.openjdk.java.net/~never/4809552 From tom.rodriguez at oracle.com Fri Aug 20 11:24:54 2010 From: tom.rodriguez at oracle.com (Tom Rodriguez) Date: Fri, 20 Aug 2010 11:24:54 -0700 Subject: review (S) for 6978249: spill between cpu and fpu registers when those moves are fast In-Reply-To: References: <00a601cb3fe0$0f44c360$2dce4a20$@com> <63117F40-83A9-4FDE-BCEF-4957D2FA6B1D@oracle.com> Message-ID: <8A7EAA6B-E8F6-4C46-911F-A3DB50B9C1C9@oracle.com> It's very program and processor dependent. If the moves are fast enough then it never appears to hurt and it can bring a small benefit for real programs. Tight kernels with more live values can benefit more. 
We're still getting a handle on the exact performance characteristics and hopefully will turn it on by default in some configs once we're more confident. Anyway, your mileage may vary. tom On Aug 20, 2010, at 11:13 AM, Hiroshi Yamauchi wrote: > Hi Tom, > > I'm just curious - how much win is this spilling technique? > > Hiroshi > > On Thu, Aug 19, 2010 at 2:05 PM, Tom Rodriguez wrote: >> >> On Aug 19, 2010, at 1:49 PM, David Dabbs wrote: >> >>> >>> Tom Rodriguez wrote: >>> >>> --8< snip >>> [in addition to the register spilling changes] >>> I also moved the logic for PrintFlagsFinal since the initialization of >>> several subsystems may change some flag values which will be missed by >>> the current location. Tested with scimark, ctw and the nsk tests on >>> 32 and 64 bit. >>> >>> >>> Does the "bundling" of the PrintFlagsFinal tweaks with the spilling mods >>> mean the former won't make it into JDK6 until HS19 lands there? >> >> Yes. They could always be backported separately under a new bug id. At the time it didn't seem worth a new bug id. >> >>> >>> >>> Thank you, >>> >>> David >>> >>> >>> p.s. Would this be the right list to post questions regarding HS flags >>> appropriate for >>> maximizing HotSpot performance on Nehalem CPUs? >> >> Yes it would be fine to ask about that here. Mostly we autodetect and use the appropriate settings so there's not a lot which is or should be tuned. >> >> tom >> >>> >>> >>> >>> >> >> From tom.rodriguez at oracle.com Fri Aug 20 11:26:33 2010 From: tom.rodriguez at oracle.com (Tom Rodriguez) Date: Fri, 20 Aug 2010 11:26:33 -0700 Subject: review (M) for 4809552: Optimize Arrays.fill(...) 
In-Reply-To: <4C6EC227.3050807@oracle.com> References: <8DF430F7-BA3F-40EC-958F-9A5DDFE09227@oracle.com> <4C6DEC89.8030202@oracle.com> <4C6EC02C.7050300@oracle.com> <4C6EC227.3050807@oracle.com> Message-ID: <567AC6D9-5B70-4302-8993-DD80323B21B0@oracle.com> On Aug 20, 2010, at 10:57 AM, Vladimir Kozlov wrote: > Actually I think it does not make sense to use movdqu() > since it still slower on cache line boundary. Agreed. > Align to 8 bytes always and use only movdqa(). You mean 16 byte? tom > > Vladimir > > Vladimir Kozlov wrote: >> Assembler part review. >> In stubGenerator_sparc.cpp >> Move next lines above 64 bit value construction: >> + __ cmp(count, 2<<shift); >> + __ brx(Assembler::lessUnsigned, false, Assembler::pn, L_copy_4_bytes); // use unsigned cmp >> + __ delayed()->nop(); >> In assembler_x86.cpp >> You don't need next: >> + jmpb(L_copy_4_bytes); // all dwords were copied >> Use next movdqa since you aligned address: >> + movdqa(Address(to, 0), xtmp); >> + movdqa(Address(to, 16), xtmp); >> instead of >> + movq(Address(to, 0), xtmp); >> + movq(Address(to, 8), xtmp); >> + movq(Address(to, 16), xtmp); >> + movq(Address(to, 24), xtmp); >> Vladimir >> Vladimir Kozlov wrote: >>> Tom, >>> >>> First, I would not call these changes Medium. They are Large at least. >>> >>> Should we allow OptimizeFill only when UseLoopPredicate is true? >>> >>> loopTransform.cpp: >>> >>> In match_fill_loop() should we exclude StoreCMNode also? >>> >>> RAW store check is hidden in as_AddP()->unpack_offsets(). Should we do it explicitly? >>> >>> store and store_value is not set for "copy candidate": >>> >>> + if (value->is_Load() && lpt->_body.contains(value)) { >>> + // tty->print_cr("possible copy candidate"); >>> + } else { >>> + msg = "variant store value"; >>> + } >>> >>> Why you assume that on 'else' it is mem_phi?: >>> >>> + if (n == head->phi()) { >>> + // ok >>> + } else { >>> + // mem_phi >>> + } >>> >>> Should we also skip proj node (ifFalse) or it is not part of loop body?
>>> >>> + } else if (n->is_CountedLoopEnd()) { >>> + // ok so skip it. >>> >>> + msg = "node used outside loop"; >>> ^ is >>> >>> How you translate next assert message?: >>> + assert(store_value->is_Load(), "shouldn't only happen for this case"); >>> >>> the next dump should be under flag and 'msg' should reflect "possible copy" or set msg_node: >>> >>> + #ifdef ASSERT >>> + tty->print_cr("possible copy"); >>> + store_value->dump(); >>> + store->dump(); >>> + #endif >>> + msg = "variant store in loop"; >>> >>> For Op_LShiftX there is no check (n->in(1) == head->phi()): >>> >>> + } else if (n->Opcode() == Op_LShiftX) { >>> + shift = n; >>> + assert(type2aelembytes(store->as_Mem()->memory_type(), true) == 1 << shift->in(2)->get_int(), "scale should match"); >>> >>> >>> s_offs already includes base_offset, see GraphKit::array_element_address(): >>> + aligned = ((arrayOopDesc::base_offset_in_bytes(t) + s_offs * element_size) % HeapWordSize == 0); >>> Also the above expression is wrong if initial index != 0. >>> And actually you don't need to calculate it in match_fill_loop() since >>> it is used only in call to StubRoutines::select_fill_function() to verify >>> that element type is supported. >>> >>> >>> In intrinsify_fill() initial index value is taking into account for aligned >>> but base_offset_in_bytes could be already part of offset and you need >>> to multiply by element_size only initial index: >>> >>> + if (offset != NULL && head->init_trip()->is_Con()) { >>> + intptr_t offs = offset->find_intptr_t_type()->get_con() + head->init_trip()->get_int(); >>> + int element_size = type2aelembytes(t); >>> + aligned = ((arrayOopDesc::base_offset_in_bytes(t) + offs * element_size) % HeapWordSize == 0); >>> + } >>> >>> stubRoutines.cpp: >>> why you have specialized copies for testing _jint_fill and _jbyte_fill. Is not it covered by TEST_FILL already? >>> >>> stubGenerator_sparc.cpp: >>> + // Generate stub for disjoint short fill. 
If "aligned" is true, the >>> ^ Generate stub for array fill. >>> >>> + // from: O0 >>> ^ to >>> + // to: O1 >>> ^ value >>> >>> O5 is not used and not input argument: >>> + const Register offset = O5; // offset from start of arrays >>> >>> stubs are generated only for byte,short and int, so allowing bollean, char and float is wrong: >>> + switch (t) { >>> + case T_BOOLEAN: >>> + case T_BYTE: >>> + shift = 2; >>> + break; >>> + case T_CHAR: >>> + case T_SHORT: >>> + shift = 1; >>> + break; >>> + case T_FLOAT: >>> + case T_INT: >>> + shift = 0; >>> + break; >>> + default: ShouldNotReachHere(); >>> + } >>> >>> The same in assembler_x86.cpp >>> >>> In stubGenerator_x86_64.cpp >>> new fill_32_bytes_forward() is not used. >>> >>> Remove commented code for T_LONG in both stubGenerator_x86_??.cpp >>> >>> I did not look on assembler. May be tomorrow. >>> >>> Thanks, >>> Vladimir >>> >>> >>> Tom Rodriguez wrote: >>>> 4809552: Optimize Arrays.fill(...) >>>> Reviewed-by: >>>> >>>> This adds new logic to recognize fill idioms and convert them into a >>>> call to an optimized fill routine. Loop predication creates easily >>>> matched loops that are simply replaced with calls to the new assembly >>>> stubs. Currently only 1,2 and 4 byte primitive types are supported. >>>> Objects and longs/double will be supported in a later putback. Tested >>>> with runthese, nsk and ctw plus jbb2005. >>>> >>>> http://cr.openjdk.java.net/~never/4809552 From yamauchi at google.com Fri Aug 20 11:36:20 2010 From: yamauchi at google.com (Hiroshi Yamauchi) Date: Fri, 20 Aug 2010 11:36:20 -0700 Subject: review (S) for 6978249: spill between cpu and fpu registers when those moves are fast In-Reply-To: <8A7EAA6B-E8F6-4C46-911F-A3DB50B9C1C9@oracle.com> References: <00a601cb3fe0$0f44c360$2dce4a20$@com> <63117F40-83A9-4FDE-BCEF-4957D2FA6B1D@oracle.com> <8A7EAA6B-E8F6-4C46-911F-A3DB50B9C1C9@oracle.com> Message-ID: Interesting. Thanks for the info. 
On Fri, Aug 20, 2010 at 11:24 AM, Tom Rodriguez wrote: > It's very program and processor dependent. If the moves are fast enough then it never appears to hurt and it can bring a small benefit for real programs. Tight kernels with more live values can benefit more. We're still getting a handle on the exact performance characteristics and hopefully will turn it on by default in some configs once we're more confident. Anyway, your mileage may vary. > > tom > > On Aug 20, 2010, at 11:13 AM, Hiroshi Yamauchi wrote: > >> Hi Tom, >> >> I'm just curious - how much win is this spilling technique? >> >> Hiroshi >> >> On Thu, Aug 19, 2010 at 2:05 PM, Tom Rodriguez wrote: >>> >>> On Aug 19, 2010, at 1:49 PM, David Dabbs wrote: >>> >>>> >>>> Tom Rodriguez wrote: >>>> >>>> --8< snip >>>> [in addition to the register spilling changes] >>>> I also moved the logic for PrintFlagsFinal since the initialization of >>>> several subsystems may change some flag values which will be missed by >>>> the current location. Tested with scimark, ctw and the nsk tests on >>>> 32 and 64 bit. >>>> >>>> >>>> Does the "bundling" of the PrintFlagsFinal tweaks with the spilling mods >>>> mean the former won't make it into JDK6 until HS19 lands there? >>> >>> Yes. They could always be backported separately under a new bug id. At the time it didn't seem worth a new bug id. >>> >>>> >>>> >>>> Thank you, >>>> >>>> David >>>> >>>> >>>> p.s. Would this be the right list to post questions regarding HS flags >>>> appropriate for >>>> maximizing HotSpot performance on Nehalem CPUs? >>> >>> Yes it would be fine to ask about that here. Mostly we autodetect and use the appropriate settings so there's not a lot which is or should be tuned. >>> >>> tom >>> >>>> >>>> >>>> >>>> >>> >>> > > From tom.rodriguez at oracle.com Fri Aug 20 11:47:09 2010 From: tom.rodriguez at oracle.com (Tom Rodriguez) Date: Fri, 20 Aug 2010 11:47:09 -0700 Subject: review (M) for 4809552: Optimize Arrays.fill(...)
In-Reply-To: <4C6DEC89.8030202@oracle.com>
References: <8DF430F7-BA3F-40EC-958F-9A5DDFE09227@oracle.com> <4C6DEC89.8030202@oracle.com>
Message-ID: <1519F924-6633-4AB2-B8A2-CCF1E1532DA5@oracle.com>

On Aug 19, 2010, at 7:46 PM, Vladimir Kozlov wrote:

> Tom,
>
> First, I would not call these changes Medium. They are Large at least.
>
> Should we allow OptimizeFill only when UseLoopPredicate is true?

I could guard it.

>
> loopTransform.cpp:
>
> In match_fill_loop() should we exclude StoreCMNode also?

Probably.

>
> RAW store check is hidden in as_AddP()->unpack_offsets(). Should we do it explicitly?

It gets filtered out currently by the StoreP check but once I add oop support I'll filter it out explicitly.

>
> store and store_value is not set for "copy candidate":
>
> +   if (value->is_Load() && lpt->_body.contains(value)) {
> +     // tty->print_cr("possible copy candidate");
> +   } else {
> +     msg = "variant store value";
> +   }

I deleted the copy candidate logic. We may want to expand the fill match to match arraycopy idioms which was part of what I was looking for. It didn't seem that common though.

>
> Why you assume that on 'else' it is mem_phi?:
>
> +   if (n == head->phi()) {
> +     // ok
> +   } else {
> +     // mem_phi
> +   }

I test explicitly for that case now and bailout if I see another phi.

>
> Should we also skip proj node (ifFalse) or it is not part of loop body?
>
> +   } else if (n->is_CountedLoopEnd()) {
> +     // ok so skip it.

I should probably test for it. I really should be more careful in this loop. I'm going to extend the checks in this loop to make sure I'm really looking at everything.

> +   msg = "node used outside loop";
>                 ^ is

ok.

>
> How you translate next assert message?:
> +   assert(store_value->is_Load(), "shouldn't only happen for this case");

this is gone now.
>
> the next dump should be under flag and 'msg' should reflect "possible copy" or set msg_node:
>
> + #ifdef ASSERT
> +   tty->print_cr("possible copy");
> +   store_value->dump();
> +   store->dump();
> + #endif
> +   msg = "variant store in loop";

This is gone too.

> For Op_LShiftX there is no check (n->in(1) == head->phi()):
>
> +   } else if (n->Opcode() == Op_LShiftX) {
> +     shift = n;
> +     assert(type2aelembytes(store->as_Mem()->memory_type(), true) == 1 << shift->in(2)->get_int(), "scale should match");

I've added a check.

> s_offs already includes base_offset, see GraphKit::array_element_address():
> +   aligned = ((arrayOopDesc::base_offset_in_bytes(t) + s_offs * element_size) % HeapWordSize == 0);
> Also the above expression is wrong if initial index != 0.
> And actually you don't need to calculate it in match_fill_loop() since
> it is used only in call to StubRoutines::select_fill_function() to verify
> that element type is supported.

True. I'll drop that copy.

> In intrinsify_fill() initial index value is taking into account for aligned
> but base_offset_in_bytes could be already part of offset and you need
> to multiply by element_size only initial index:
>
> +   if (offset != NULL && head->init_trip()->is_Con()) {
> +     intptr_t offs = offset->find_intptr_t_type()->get_con() + head->init_trip()->get_int();
> +     int element_size = type2aelembytes(t);
> +     aligned = ((arrayOopDesc::base_offset_in_bytes(t) + offs * element_size) % HeapWordSize == 0);
> +   }

Actually I don't think I should be including arrayOopDesc::base_offset_in_bytes(t) since it should be included in offset already. It really should be:

  aligned = ((offset->find_intptr_t_type()->get_con() + head->init_trip()->get_int() * element_size) % HeapWordSize == 0);

>
> stubRoutines.cpp:
> why you have specialized copies for testing _jint_fill and _jbyte_fill. Is not it covered by TEST_FILL already?

That was leftover debug code. I've deleted the extra copy.
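The corrected alignment test for intrinsify_fill() discussed above can be checked in isolation. What follows is only a minimal standalone sketch, not the HotSpot code: `is_fill_start_aligned` and `kHeapWordSize` are hypothetical names, the heap word is assumed to be 8 bytes, and the header sizes in the examples (16 and 12 bytes) are illustrative values rather than what any particular VM configuration uses. The point it shows is that when the byte offset already contains the array base offset, only the initial index gets scaled by the element size.

```cpp
#include <cstdint>

// Hypothetical stand-in for HotSpot's HeapWordSize (assumed 8 bytes here).
constexpr std::intptr_t kHeapWordSize = 8;

// Returns true when the first element written by the fill starts on a
// heap-word boundary. byte_offset is assumed to already include the array
// base offset, so only init_index is multiplied by element_size.
constexpr bool is_fill_start_aligned(std::intptr_t byte_offset,
                                     std::intptr_t init_index,
                                     std::intptr_t element_size) {
  return (byte_offset + init_index * element_size) % kHeapWordSize == 0;
}

// Worked examples with assumed header sizes:
// byte[] with a 16-byte header, filled from index 0: 16 % 8 == 0.
static_assert(is_fill_start_aligned(16, 0, 1), "aligned start");
// byte[] with a 16-byte header, filled from index 3: 19 % 8 != 0.
static_assert(!is_fill_start_aligned(16, 3, 1), "unaligned start");
// int[] with a 12-byte header, filled from index 1: 16 % 8 == 0.
static_assert(is_fill_start_aligned(12, 1, 4), "aligned start");
```

Scaling the whole offset (base plus index) by the element size, as the first version of the patch did, would count the header bytes once per element size, which is why the expression was wrong for element sizes other than 1 and for a nonzero initial index.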
>
> stubGenerator_sparc.cpp:
> +  //  Generate stub for disjoint short fill.  If "aligned" is true, the
>        ^ Generate stub for array fill.
>
> +  //      from:  O0
>          ^ to
> +  //      to:    O1
>          ^ value

ok.

>
> O5 is not used and not input argument:
> +   const Register offset = O5; // offset from start of arrays

ok. I also corrected the comment below it to indicate that only O3 is used as a temp.

>
> stubs are generated only for byte,short and int, so allowing bollean, char and float is wrong:
> +   switch (t) {
> +     case T_BOOLEAN:
> +     case T_BYTE:
> +       shift = 2;
> +       break;
> +     case T_CHAR:
> +     case T_SHORT:
> +       shift = 1;
> +       break;
> +     case T_FLOAT:
> +     case T_INT:
> +       shift = 0;
> +       break;
> +     default: ShouldNotReachHere();
> +   }
>
> The same in assembler_x86.cpp

Yep.

>
> In stubGenerator_x86_64.cpp
> new fill_32_bytes_forward() is not used.

I didn't even notice that was there. Deleted.

>
> Remove commented code for T_LONG in both stubGenerator_x86_??.cpp

yep.

>
> I did not look on assembler. May be tomorrow.

Ok. The sparc version of the code is pretty much an exact duplicate structurally of the x86 code. I just realized I need to do a little delay slot filling on sparc. I'll fix the rest of this stuff and send out a new webrev.

tom

>
> Thanks,
> Vladimir
>
>
> Tom Rodriguez wrote:
>> 4809552: Optimize Arrays.fill(...)
>> Reviewed-by:
>> This adds new logic to recognize fill idioms and convert them into a
>> call to an optimized fill routine. Loop predication creates easily
>> matched loops that are simply replaced with calls to the new assembly
>> stubs. Currently only 1,2 and 4 byte primitive types are supported.
>> Objects and longs/double will be supported in a later putback. Tested
>> with runthese, nsk and ctw plus jbb2005.
>> http://cr.openjdk.java.net/~never/4809552

From vladimir.kozlov at oracle.com Fri Aug 20 11:46:35 2010
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Fri, 20 Aug 2010 11:46:35 -0700
Subject: review (M) for 4809552: Optimize Arrays.fill(...)
In-Reply-To: References: <8DF430F7-BA3F-40EC-958F-9A5DDFE09227@oracle.com> <4C6DEC89.8030202@oracle.com> <4C6EC02C.7050300@oracle.com> Message-ID: <4C6ECD8B.7080706@oracle.com> Tom Rodriguez wrote: > >> Use next movdqa since you aligned address: >> + movdqa(Address(to, 0), xtmp); >> + movdqa(Address(to, 16), xtmp); >> >> instead of >> + movq(Address(to, 0), xtmp); >> + movq(Address(to, 8), xtmp); >> + movq(Address(to, 16), xtmp); >> + movq(Address(to, 24), xtmp); > > But it's only aligned to 8 bytes, not 16. maybe it would be worth it to align to 16? Sorry, you are right, it requires 16 not 8 bytes. :( I think it worth to align to 16 since it will benefit all x86. movdl(xtmp, value); pshufd(xtmp, xtmp, 0); + // align to 16 bytes, we know we are 8 byte aligned to start + Label L_skip_align16; + testptr(to, 8); + jccb(Assembler::zero, L_skip_align16); + subl(count, 2<<shift); + jcc(Assembler::below, L_copy_4_bytes); // Short arrays (< 8 bytes) + movq(Address(to, 0), xtmp); + addptr(to, 8); + BIND(L_skip_align16); subl(count, 8 << shift); jcc(Assembler::less, L_check_fill_8_bytes); align(16); Vladimir > > tom > >> Vladimir >> >> Vladimir Kozlov wrote: >>> Tom, >>> First, I would not call these changes Medium. They are Large at least. >>> Should we allow OptimizeFill only when UseLoopPredicate is true? >>> loopTransform.cpp: >>> In match_fill_loop() should we exclude StoreCMNode also? >>> RAW store check is hidden in as_AddP()->unpack_offsets(). Should we do it explicitly? >>> store and store_value is not set for "copy candidate": >>> + if (value->is_Load() && lpt->_body.contains(value)) { >>> + // tty->print_cr("possible copy candidate"); >>> + } else { >>> + msg = "variant store value"; >>> + } >>> Why you assume that on 'else' it is mem_phi?: >>> + if (n == head->phi()) { >>> + // ok >>> + } else { >>> + // mem_phi >>> + } >>> Should we also skip proj node (ifFalse) or it is not part of loop body? >>> + } else if (n->is_CountedLoopEnd()) { >>> + // ok so skip it.
>>> + msg = "node used outside loop"; >>> ^ is >>> How you translate next assert message?: >>> + assert(store_value->is_Load(), "shouldn't only happen for this case"); >>> the next dump should be under flag and 'msg' should reflect "possible copy" or set msg_node: >>> + #ifdef ASSERT >>> + tty->print_cr("possible copy"); >>> + store_value->dump(); >>> + store->dump(); >>> + #endif >>> + msg = "variant store in loop"; >>> For Op_LShiftX there is no check (n->in(1) == head->phi()): >>> + } else if (n->Opcode() == Op_LShiftX) { >>> + shift = n; >>> + assert(type2aelembytes(store->as_Mem()->memory_type(), true) == 1 << shift->in(2)->get_int(), "scale should match"); >>> s_offs already includes base_offset, see GraphKit::array_element_address(): >>> + aligned = ((arrayOopDesc::base_offset_in_bytes(t) + s_offs * element_size) % HeapWordSize == 0); >>> Also the above expression is wrong if initial index != 0. >>> And actually you don't need to calculate it in match_fill_loop() since >>> it is used only in call to StubRoutines::select_fill_function() to verify >>> that element type is supported. >>> In intrinsify_fill() initial index value is taking into account for aligned >>> but base_offset_in_bytes could be already part of offset and you need >>> to multiply by element_size only initial index: >>> + if (offset != NULL && head->init_trip()->is_Con()) { >>> + intptr_t offs = offset->find_intptr_t_type()->get_con() + head->init_trip()->get_int(); >>> + int element_size = type2aelembytes(t); >>> + aligned = ((arrayOopDesc::base_offset_in_bytes(t) + offs * element_size) % HeapWordSize == 0); >>> + } >>> stubRoutines.cpp: >>> why you have specialized copies for testing _jint_fill and _jbyte_fill. Is not it covered by TEST_FILL already? >>> stubGenerator_sparc.cpp: >>> + // Generate stub for disjoint short fill. If "aligned" is true, the >>> ^ Generate stub for array fill. 
>>> + // from: O0 >>> ^ to >>> + // to: O1 >>> ^ value >>> O5 is not used and not input argument: >>> + const Register offset = O5; // offset from start of arrays >>> stubs are generated only for byte,short and int, so allowing bollean, char and float is wrong: >>> + switch (t) { >>> + case T_BOOLEAN: >>> + case T_BYTE: >>> + shift = 2; >>> + break; >>> + case T_CHAR: >>> + case T_SHORT: >>> + shift = 1; >>> + break; >>> + case T_FLOAT: >>> + case T_INT: >>> + shift = 0; >>> + break; >>> + default: ShouldNotReachHere(); >>> + } >>> The same in assembler_x86.cpp >>> In stubGenerator_x86_64.cpp >>> new fill_32_bytes_forward() is not used. >>> Remove commented code for T_LONG in both stubGenerator_x86_??.cpp >>> I did not look on assembler. May be tomorrow. >>> Thanks, >>> Vladimir >>> Tom Rodriguez wrote: >>>> 4809552: Optimize Arrays.fill(...) >>>> Reviewed-by: >>>> >>>> This adds new logic to recognize fill idioms and convert them into a >>>> call to an optimized fill routine. Loop predication creates easily >>>> matched loops that are simply replaced with calls to the new assembly >>>> stubs. Currently only 1,2 and 4 byte primitive types are supported. >>>> Objects and longs/double will be supported in a later putback. Tested >>>> with runthese, nsk and ctw plus jbb2005. >>>> >>>> http://cr.openjdk.java.net/~never/4809552 > From tom.rodriguez at oracle.com Fri Aug 20 11:58:48 2010 From: tom.rodriguez at oracle.com (Tom Rodriguez) Date: Fri, 20 Aug 2010 11:58:48 -0700 Subject: review (M) for 4809552: Optimize Arrays.fill(...) 
In-Reply-To: <4C6ECD8B.7080706@oracle.com> References: <8DF430F7-BA3F-40EC-958F-9A5DDFE09227@oracle.com> <4C6DEC89.8030202@oracle.com> <4C6EC02C.7050300@oracle.com> <4C6ECD8B.7080706@oracle.com> Message-ID: <371AE61D-E3ED-47E0-B946-4D7C23D05AF6@oracle.com> On Aug 20, 2010, at 11:46 AM, Vladimir Kozlov wrote: > Tom Rodriguez wrote: >>> Use next movdqa since you aligned address: >>> + movdqa(Address(to, 0), xtmp); >>> + movdqa(Address(to, 16), xtmp); >>> >>> instead of >>> + movq(Address(to, 0), xtmp); >>> + movq(Address(to, 8), xtmp); >>> + movq(Address(to, 16), xtmp); >>> + movq(Address(to, 24), xtmp); >> But it's only aligned to 8 bytes, not 16. maybe it would be worth it to align to 16? > > Sorry, you are right, it requires 16 not 8 bytes. :( > I think it worth to align to 16 since it will benefit all x86. > > movdl(xtmp, value); > pshufd(xtmp, xtmp, 0); > > + // align to 16 bytes, we know we are 8 byte aligned to start > + Label L_skip_align16; > + testptr(to, 8); > + jccb(Assembler::zero, L_skip_align16); > + subl(count, 2<<shift); > + jcc(Assembler::below, L_copy_4_bytes); // Short arrays (< 8 bytes) > + movq(Address(to, 0), xtmp); > + addptr(to, 8); > + BIND(L_skip_align16); > > subl(count, 8 << shift); > jcc(Assembler::less, L_check_fill_8_bytes); > align(16); I'll test that out. tom > > Vladimir > >> tom >>> Vladimir >>> >>> Vladimir Kozlov wrote: >>>> Tom, >>>> First, I would not call these changes Medium. They are Large at least. >>>> Should we allow OptimizeFill only when UseLoopPredicate is true? >>>> loopTransform.cpp: >>>> In match_fill_loop() should we exclude StoreCMNode also? >>>> RAW store check is hidden in as_AddP()->unpack_offsets(). Should we do it explicitly?
>>>> store and store_value is not set for "copy candidate": >>>> + if (value->is_Load() && lpt->_body.contains(value)) { >>>> + // tty->print_cr("possible copy candidate"); >>>> + } else { >>>> + msg = "variant store value"; >>>> + } >>>> Why you assume that on 'else' it is mem_phi?: >>>> + if (n == head->phi()) { >>>> + // ok >>>> + } else { >>>> + // mem_phi >>>> + } >>>> Should we also skip proj node (ifFalse) or it is not part of loop body? >>>> + } else if (n->is_CountedLoopEnd()) { >>>> + // ok so skip it. >>>> + msg = "node used outside loop"; >>>> ^ is >>>> How you translate next assert message?: >>>> + assert(store_value->is_Load(), "shouldn't only happen for this case"); >>>> the next dump should be under flag and 'msg' should reflect "possible copy" or set msg_node: >>>> + #ifdef ASSERT >>>> + tty->print_cr("possible copy"); >>>> + store_value->dump(); >>>> + store->dump(); >>>> + #endif >>>> + msg = "variant store in loop"; >>>> For Op_LShiftX there is no check (n->in(1) == head->phi()): >>>> + } else if (n->Opcode() == Op_LShiftX) { >>>> + shift = n; >>>> + assert(type2aelembytes(store->as_Mem()->memory_type(), true) == 1 << shift->in(2)->get_int(), "scale should match"); >>>> s_offs already includes base_offset, see GraphKit::array_element_address(): >>>> + aligned = ((arrayOopDesc::base_offset_in_bytes(t) + s_offs * element_size) % HeapWordSize == 0); >>>> Also the above expression is wrong if initial index != 0. >>>> And actually you don't need to calculate it in match_fill_loop() since >>>> it is used only in call to StubRoutines::select_fill_function() to verify >>>> that element type is supported. 
>>>> In intrinsify_fill() initial index value is taking into account for aligned >>>> but base_offset_in_bytes could be already part of offset and you need >>>> to multiply by element_size only initial index: >>>> + if (offset != NULL && head->init_trip()->is_Con()) { >>>> + intptr_t offs = offset->find_intptr_t_type()->get_con() + head->init_trip()->get_int(); >>>> + int element_size = type2aelembytes(t); >>>> + aligned = ((arrayOopDesc::base_offset_in_bytes(t) + offs * element_size) % HeapWordSize == 0); >>>> + } >>>> stubRoutines.cpp: >>>> why you have specialized copies for testing _jint_fill and _jbyte_fill. Is not it covered by TEST_FILL already? >>>> stubGenerator_sparc.cpp: >>>> + // Generate stub for disjoint short fill. If "aligned" is true, the >>>> ^ Generate stub for array fill. >>>> + // from: O0 >>>> ^ to >>>> + // to: O1 >>>> ^ value >>>> O5 is not used and not input argument: >>>> + const Register offset = O5; // offset from start of arrays >>>> stubs are generated only for byte,short and int, so allowing bollean, char and float is wrong: >>>> + switch (t) { >>>> + case T_BOOLEAN: >>>> + case T_BYTE: >>>> + shift = 2; >>>> + break; >>>> + case T_CHAR: >>>> + case T_SHORT: >>>> + shift = 1; >>>> + break; >>>> + case T_FLOAT: >>>> + case T_INT: >>>> + shift = 0; >>>> + break; >>>> + default: ShouldNotReachHere(); >>>> + } >>>> The same in assembler_x86.cpp >>>> In stubGenerator_x86_64.cpp >>>> new fill_32_bytes_forward() is not used. >>>> Remove commented code for T_LONG in both stubGenerator_x86_??.cpp >>>> I did not look on assembler. May be tomorrow. >>>> Thanks, >>>> Vladimir >>>> Tom Rodriguez wrote: >>>>> 4809552: Optimize Arrays.fill(...) >>>>> Reviewed-by: >>>>> >>>>> This adds new logic to recognize fill idioms and convert them into a >>>>> call to an optimized fill routine. Loop predication creates easily >>>>> matched loops that are simply replaced with calls to the new assembly >>>>> stubs. 
Currently only 1,2 and 4 byte primitive types are supported. >>>>> Objects and longs/double will be supported in a later putback. Tested >>>>> with runthese, nsk and ctw plus jbb2005. >>>>> >>>>> http://cr.openjdk.java.net/~never/4809552 From tom.rodriguez at oracle.com Fri Aug 20 12:49:25 2010 From: tom.rodriguez at oracle.com (Tom Rodriguez) Date: Fri, 20 Aug 2010 12:49:25 -0700 Subject: review (M) for 4809552: Optimize Arrays.fill(...) In-Reply-To: <4C6ECD8B.7080706@oracle.com> References: <8DF430F7-BA3F-40EC-958F-9A5DDFE09227@oracle.com> <4C6DEC89.8030202@oracle.com> <4C6EC02C.7050300@oracle.com> <4C6ECD8B.7080706@oracle.com> Message-ID: <296DD407-B744-48C2-9C37-318CC69BA251@oracle.com> It seems like a bit of a mixed bag. Moderately sized fills are slightly slower because of the extra alignment but larger fills are faster. I'll play with it some more. tom On Aug 20, 2010, at 11:46 AM, Vladimir Kozlov wrote: > Tom Rodriguez wrote: >>> Use next movdqa since you aligned address: >>> + movdqa(Address(to, 0), xtmp); >>> + movdqa(Address(to, 16), xtmp); >>> >>> instead of >>> + movq(Address(to, 0), xtmp); >>> + movq(Address(to, 8), xtmp); >>> + movq(Address(to, 16), xtmp); >>> + movq(Address(to, 24), xtmp); >> But it's only aligned to 8 bytes, not 16. maybe it would be worth it to align to 16? > > Sorry, you are right, it requires 16 not 8 bytes. :( > I think it worth to align to 16 since it will benefit all x86. 
> > movdl(xtmp, value); > pshufd(xtmp, xtmp, 0); > > + // align to 16 bytes, we know we are 8 byte aligned to start > + Label L_skip_align16; > + testptr(to, 8); > + jccb(Assembler::zero, L_skip_align16); > + subl(count, 2<<shift); > + jcc(Assembler::below, L_copy_4_bytes); // Short arrays (< 8 bytes) > + movq(Address(to, 0), xtmp); > + addptr(to, 8); > + BIND(L_skip_align16); > > subl(count, 8 << shift); > jcc(Assembler::less, L_check_fill_8_bytes); > align(16); > > Vladimir > >> tom >>> Vladimir >>> >>> Vladimir Kozlov wrote: >>>> Tom, >>>> First, I would not call these changes Medium. They are Large at least. >>>> Should we allow OptimizeFill only when UseLoopPredicate is true? >>>> loopTransform.cpp: >>>> In match_fill_loop() should we exclude StoreCMNode also? >>>> RAW store check is hidden in as_AddP()->unpack_offsets(). Should we do it explicitly? >>>> store and store_value is not set for "copy candidate": >>>> + if (value->is_Load() && lpt->_body.contains(value)) { >>>> + // tty->print_cr("possible copy candidate"); >>>> + } else { >>>> + msg = "variant store value"; >>>> + } >>>> Why you assume that on 'else' it is mem_phi?: >>>> + if (n == head->phi()) { >>>> + // ok >>>> + } else { >>>> + // mem_phi >>>> + } >>>> Should we also skip proj node (ifFalse) or it is not part of loop body? >>>> + } else if (n->is_CountedLoopEnd()) { >>>> + // ok so skip it.
>>>> + msg = "node used outside loop"; >>>> ^ is >>>> How you translate next assert message?: >>>> + assert(store_value->is_Load(), "shouldn't only happen for this case"); >>>> the next dump should be under flag and 'msg' should reflect "possible copy" or set msg_node: >>>> + #ifdef ASSERT >>>> + tty->print_cr("possible copy"); >>>> + store_value->dump(); >>>> + store->dump(); >>>> + #endif >>>> + msg = "variant store in loop"; >>>> For Op_LShiftX there is no check (n->in(1) == head->phi()): >>>> + } else if (n->Opcode() == Op_LShiftX) { >>>> + shift = n; >>>> + assert(type2aelembytes(store->as_Mem()->memory_type(), true) == 1 << shift->in(2)->get_int(), "scale should match"); >>>> s_offs already includes base_offset, see GraphKit::array_element_address(): >>>> + aligned = ((arrayOopDesc::base_offset_in_bytes(t) + s_offs * element_size) % HeapWordSize == 0); >>>> Also the above expression is wrong if initial index != 0. >>>> And actually you don't need to calculate it in match_fill_loop() since >>>> it is used only in call to StubRoutines::select_fill_function() to verify >>>> that element type is supported. >>>> In intrinsify_fill() initial index value is taking into account for aligned >>>> but base_offset_in_bytes could be already part of offset and you need >>>> to multiply by element_size only initial index: >>>> + if (offset != NULL && head->init_trip()->is_Con()) { >>>> + intptr_t offs = offset->find_intptr_t_type()->get_con() + head->init_trip()->get_int(); >>>> + int element_size = type2aelembytes(t); >>>> + aligned = ((arrayOopDesc::base_offset_in_bytes(t) + offs * element_size) % HeapWordSize == 0); >>>> + } >>>> stubRoutines.cpp: >>>> why you have specialized copies for testing _jint_fill and _jbyte_fill. Is not it covered by TEST_FILL already? >>>> stubGenerator_sparc.cpp: >>>> + // Generate stub for disjoint short fill. If "aligned" is true, the >>>> ^ Generate stub for array fill. 
>>>> + // from: O0 >>>> ^ to >>>> + // to: O1 >>>> ^ value >>>> O5 is not used and is not an input argument: >>>> + const Register offset = O5; // offset from start of arrays >>>> stubs are generated only for byte, short and int, so allowing boolean, char and float is wrong: >>>> + switch (t) { >>>> + case T_BOOLEAN: >>>> + case T_BYTE: >>>> + shift = 2; >>>> + break; >>>> + case T_CHAR: >>>> + case T_SHORT: >>>> + shift = 1; >>>> + break; >>>> + case T_FLOAT: >>>> + case T_INT: >>>> + shift = 0; >>>> + break; >>>> + default: ShouldNotReachHere(); >>>> + } >>>> The same in assembler_x86.cpp >>>> In stubGenerator_x86_64.cpp >>>> new fill_32_bytes_forward() is not used. >>>> Remove commented code for T_LONG in both stubGenerator_x86_??.cpp >>>> I did not look at the assembler. Maybe tomorrow. >>>> Thanks, >>>> Vladimir >>>> Tom Rodriguez wrote: >>>>> 4809552: Optimize Arrays.fill(...) >>>>> Reviewed-by: >>>>> >>>>> This adds new logic to recognize fill idioms and convert them into a >>>>> call to an optimized fill routine. Loop predication creates easily >>>>> matched loops that are simply replaced with calls to the new assembly >>>>> stubs. Currently only 1,2 and 4 byte primitive types are supported. >>>>> Objects and longs/double will be supported in a later putback. Tested >>>>> with runthese, nsk and ctw plus jbb2005. >>>>> >>>>> http://cr.openjdk.java.net/~never/4809552 From dmdabbs at gmail.com Fri Aug 20 14:29:21 2010 From: dmdabbs at gmail.com (David Dabbs) Date: Fri, 20 Aug 2010 16:29:21 -0500 Subject: Question about HS build version in jdk7-b106 In-Reply-To: <63117F40-83A9-4FDE-BCEF-4957D2FA6B1D@oracle.com> References: <00a601cb3fe0$0f44c360$2dce4a20$@com> <63117F40-83A9-4FDE-BCEF-4957D2FA6B1D@oracle.com> Message-ID: <005401cb40ae$c2744cc0$475ce640$@com> Hello.
The b106 change notes indicate that the HS build was bumped to 06, but java -version indicates otherwise: java version "1.7.0-ea" Java(TM) SE Runtime Environment (build 1.7.0-ea-b106) Java HotSpot(TM) 64-Bit Server VM (build 19.0-b05, mixed mode) From vladimir.kozlov at oracle.com Fri Aug 20 17:20:45 2010 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 20 Aug 2010 17:20:45 -0700 Subject: Request for reviews (S): 6896381: CTW fails share/vm/ci/bcEscapeAnalyzer.cpp:99, assert(_stack_height < _max_stack,"stack overflow") Message-ID: <4C6F1BDD.4000000@oracle.com> http://cr.openjdk.java.net/~kvn/6896381/webrev Fixed 6896381: CTW fails share/vm/ci/bcEscapeAnalyzer.cpp:99, assert(_stack_height < _max_stack,"stack overflow") The bytecode escape analyzer emulates stack usage to track object references. For constant loads (ldc) it loads the constant from the constant pool to check its type, but it does not check for the T_ILLEGAL type returned for unloaded strings and klasses when no space is left in PermGen. Solution: Check the constant's tag type instead, since we only need to know the constant's type. Also changed the asserts to guarantees to avoid a memory stomp in the product VM. Tested with CTW. From tom.rodriguez at oracle.com Fri Aug 20 17:28:57 2010 From: tom.rodriguez at oracle.com (Tom Rodriguez) Date: Fri, 20 Aug 2010 17:28:57 -0700 Subject: Request for reviews (S): 6896381: CTW fails share/vm/ci/bcEscapeAnalyzer.cpp:99, assert(_stack_height < _max_stack, "stack overflow") In-Reply-To: <4C6F1BDD.4000000@oracle.com> References: <4C6F1BDD.4000000@oracle.com> Message-ID: Looks good. tom On Aug 20, 2010, at 5:20 PM, Vladimir Kozlov wrote: > http://cr.openjdk.java.net/~kvn/6896381/webrev > > Fixed 6896381: CTW fails share/vm/ci/bcEscapeAnalyzer.cpp:99, assert(_stack_height < _max_stack,"stack overflow") > > Bytecode Analyzer emulates stack usage to track > objects reference.
For constants load (ldc) it > loads a constant from constant pool to check its > type but it does not check T_ILLEGAL type returned > for unloaded strings and klasses when no space left > in PermGen. > > Solution: > Check constant Tag type instead since we need > to know only constant's type. > Also changed asserts to guarantee to avoid memory > stomp in product VM. > > Tested with CTW. > From vladimir.kozlov at oracle.com Fri Aug 20 17:29:30 2010 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 20 Aug 2010 17:29:30 -0700 Subject: Request for reviews (S): 6896381: CTW fails share/vm/ci/bcEscapeAnalyzer.cpp:99, assert(_stack_height < _max_stack,"stack overflow") In-Reply-To: References: <4C6F1BDD.4000000@oracle.com> Message-ID: <4C6F1DEA.2020002@oracle.com> Thank you, Tom Vladimir Tom Rodriguez wrote: > Looks good. > > tom > > On Aug 20, 2010, at 5:20 PM, Vladimir Kozlov wrote: > >> http://cr.openjdk.java.net/~kvn/6896381/webrev >> >> Fixed 6896381: CTW fails share/vm/ci/bcEscapeAnalyzer.cpp:99, assert(_stack_height < _max_stack,"stack overflow") >> >> Bytecode Analyzer emulates stack usage to track >> objects reference. For constants load (ldc) it >> loads a constant from constant pool to check its >> type but it does not check T_ILLEGAL type returned >> for unloaded strings and klasses when no space left >> in PermGen. >> >> Solution: >> Check constant Tag type instead since we need >> to know only constant's type. >> Also changed asserts to guarantee to avoid memory >> stomp in product VM. >> >> Tested with CTW. 
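The 6896381 fix quoted above (look only at the constant's tag, and check the emulated stack height in every build) can be pictured with a toy model. This is a minimal sketch, not the code from bcEscapeAnalyzer.cpp: ConstTag, AbstractStack, toy_guarantee, slots_for and emulate_ldc are hypothetical names, and the assumption that only long and double constants take two stack slots comes from the JVM's operand stack rules rather than from the webrev.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>

// Hypothetical constant-pool tags; the real VM has its own tag type.
enum class ConstTag { Int, Float, Long, Double, String, Klass };

// Stand-in for a guarantee-style check: evaluated in every build, whereas
// assert() disappears when NDEBUG is defined (as in a product build).
inline void toy_guarantee(bool cond, const char* msg) {
  if (!cond) {
    std::fprintf(stderr, "guarantee failed: %s\n", msg);
    std::abort();
  }
}

// Emulated operand stack that tracks only its height.
struct AbstractStack {
  int height = 0;
  int max_stack;
  explicit AbstractStack(int max) : max_stack(max) {}

  void push_slot() {
    // Same condition as the assert named in the bug title.
    toy_guarantee(height < max_stack, "stack overflow");
    ++height;
  }
};

// The slot count depends only on the tag, so the constant itself never has
// to be resolved (which is what could come back as an illegal type).
int slots_for(ConstTag tag) {
  return (tag == ConstTag::Long || tag == ConstTag::Double) ? 2 : 1;
}

// Emulate the stack effect of loading one constant.
void emulate_ldc(AbstractStack& st, ConstTag tag) {
  const int n = slots_for(tag);
  for (int i = 0; i < n; ++i) {
    st.push_slot();
  }
}
```

In this sketch an unloaded String or Klass entry still pushes exactly one slot, because nothing but the tag is consulted.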
>> > From david.cox at oracle.com Fri Aug 20 18:25:32 2010 From: david.cox at oracle.com (David Cox) Date: Fri, 20 Aug 2010 18:25:32 -0700 Subject: Question about HS build version in jdk7-b106 In-Reply-To: <005401cb40ae$c2744cc0$475ce640$@com> References: <00a601cb3fe0$0f44c360$2dce4a20$@com> <63117F40-83A9-4FDE-BCEF-4957D2FA6B1D@oracle.com> <005401cb40ae$c2744cc0$475ce640$@com> Message-ID: <4C6F2B0C.20803@oracle.com> Unfortunately, HotSpot version hs19.0-b06 was not included in Java SE Runtime Environment build 1.7.0-ea-b106 due to a problem that should be resolved with the next build. Dave On 8/20/10 2:29 PM, David Dabbs wrote: > Hello. > > The b106 change notes indicate that the HS build was bumped to 06, but java > -version indicates otherwise: > > > java version "1.7.0-ea" > Java(TM) SE Runtime Environment (build 1.7.0-ea-b106) > Java HotSpot(TM) 64-Bit Server VM (build 19.0-b05, mixed mode) > > > > > From john.r.rose at oracle.com Fri Aug 20 23:48:33 2010 From: john.r.rose at oracle.com (John Rose) Date: Fri, 20 Aug 2010 23:48:33 -0700 Subject: review request (M): 6912064: type profiles need to be more thorough for dynamic language support In-Reply-To: <4C40AB70.2050201@oracle.com> References: <4C408F9D.8010006@oracle.com> <05989C3B-12EA-4608-B8EC-2BD91BA6D340@oracle.com> <4C40AB70.2050201@oracle.com> Message-ID: <3B35424C-53CF-4875-B54F-B4A2109DF116@oracle.com> FYI: I attempted to commit this last month; the jprt sync failed and I timed out for the Summit. I just started a jprt job to integrate this change... -- John On Jul 16, 2010, at 11:56 AM, Vladimir Kozlov wrote: > OK. > > Vladimir > > John Rose wrote: >> On Jul 16, 2010, at 9:58 AM, Vladimir Kozlov wrote: >>> About last changes. Why you not do 'stopped()' check after that call?: >>> >>> 2668 cast_obj = maybe_cast_profiled_receiver(not_null_obj, data, tk->klass()); >> Because the code (in gen_checkcast) works correctly in the case of dead control after the profile cast. 
When the obj_path control is dead, the null path might still be live, and the region/phi collect the right results in any case. >> Should I add a comment? (It's not the only way in which the two routines fail to be parallel, BTW.) >> Or, the other way to fix it might be to have load_object_klass and gen_subtype_check DTRT with dead control, and remove the explicit check in gen_instanceof. I took the path of least resistance. >> -- John From john.r.rose at oracle.com Sat Aug 21 01:43:11 2010 From: john.r.rose at oracle.com (john.r.rose at oracle.com) Date: Sat, 21 Aug 2010 08:43:11 +0000 Subject: hg: jdk7/hotspot-comp/hotspot: 6912064: type profiles need to be exploited more for dynamic language support Message-ID: <20100821084313.49F6F47345@hg.openjdk.java.net> Changeset: 4b29a725c43c Author: jrose Date: 2010-08-20 23:40 -0700 URL: http://hg.openjdk.java.net/jdk7/hotspot-comp/hotspot/rev/4b29a725c43c 6912064: type profiles need to be exploited more for dynamic language support Reviewed-by: kvn ! src/share/vm/includeDB_compiler2 ! src/share/vm/opto/graphKit.cpp ! src/share/vm/opto/graphKit.hpp ! src/share/vm/opto/library_call.cpp ! src/share/vm/opto/parse.hpp ! src/share/vm/opto/parse2.cpp ! src/share/vm/opto/parseHelper.cpp ! src/share/vm/runtime/globals.hpp From vladimir.kozlov at oracle.com Sat Aug 21 19:03:23 2010 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Sat, 21 Aug 2010 19:03:23 -0700 Subject: Request for reviews (XL): 6978355: renaming for 6961697 In-Reply-To: <1282240402.29965.10.camel@macbook> References: <1282240402.29965.10.camel@macbook> Message-ID: <4C70856B.9060005@oracle.com> This looks good. Vladimir On 8/19/10 10:53 AM, Christian Thalinger wrote: > http://cr.openjdk.java.net/~twisti/6978355/webrev.01/ > > 6978355: renaming for 6961697 > Summary: This is the renaming part of 6961697 to keep the actual changes small for review. > Reviewed-by: > > This is a split-off of 6961697 to make the review of the actual > changes easier. 
This CR only does a couple of renames, moves code_* > methods into CodeBlob, adds a new CodeBuffer constructor to make code > simpler, adds some emit functions to CodeSection to make AD code > simpler, ... > > I did the rename in two steps to make sure that no new > CodeBlob::code_* method is called instead of an old nmethod::code_* > method (which are now called nmethod::insts_*). > From tom.rodriguez at oracle.com Mon Aug 23 08:34:37 2010 From: tom.rodriguez at oracle.com (Tom Rodriguez) Date: Mon, 23 Aug 2010 08:34:37 -0700 Subject: Request for reviews .02 (M): 6961697: move nmethod constants section before instruction section In-Reply-To: <1282243111.29965.24.camel@macbook> References: <1278520980.17142.3.camel@macbook> <4C351442.1090907@oracle.com> <1278576028.19588.13.camel@macbook> <4C3608FA.2040406@oracle.com> <1278616722.1475.134.camel@macbook> <1282243111.29965.24.camel@macbook> Message-ID: I think rearranging all the section names wasn't really needed, particularly in CodeBuffer, since the ordering in the buffer didn't change. You don't need to undo it though. Please correct the comment here, since the address ordering in CodeBuffers hasn't changed: // Here is the list of all possible sections, in order of ascending address. + SECT_FIRST = 0, + SECT_CONSTS = SECT_FIRST, // Non-instruction data: Floats, jump tables, etc. SECT_INSTS, // Executable instructions. SECT_STUBS, // Outbound trampolines for supporting call sites. SECT_CONSTS, // Non-instruction data: Floats, jump tables, etc. SECT_LIMIT, SECT_NONE = -1 I think relocate_code_to should have some more comments explaining that reordering is occurring too. Otherwise this looks ok. tom On Aug 19, 2010, at 11:38 AM, Christian Thalinger wrote: > On Thu, 2010-07-08 at 21:18 +0200, Christian Thalinger wrote: >> I agree on that. When I've finished the new webrev I will merge it with >> the other workspace and do some performance evaluation.
> > So here's the new version: > > http://cr.openjdk.java.net/~twisti/6961697/webrev.02/ > > This one is completely different to the ones before and much simpler. > The renaming part was moved to 6978355 and I only changed the order of > the code sections: consts, insts, stubs. The CodeBuffer works as > before and the memory layout is the same. But when the CodeBuffer is > later copied into a CodeBlob the consts sections moves to the beginning. > > I did a lot of testing with the changes of 6961690. > From tom.rodriguez at oracle.com Mon Aug 23 10:42:15 2010 From: tom.rodriguez at oracle.com (Tom Rodriguez) Date: Mon, 23 Aug 2010 10:42:15 -0700 Subject: Request for reviews (XL): 6978355: renaming for 6961697 In-Reply-To: <1282240402.29965.10.camel@macbook> References: <1282240402.29965.10.camel@macbook> Message-ID: <7F51D3E8-E980-4583-A01F-0CA246F7A43F@oracle.com> In the jniFastGetField can you switch: + address fast_entry = blob->code_begin(); to: + address fast_entry = __ pc(); Just so it matches how we normally do it. Don't forget to fix CodeBlob.java. Otherwise it looks ok. tom On Aug 19, 2010, at 10:53 AM, Christian Thalinger wrote: > http://cr.openjdk.java.net/~twisti/6978355/webrev.01/ > > 6978355: renaming for 6961697 > Summary: This is the renaming part of 6961697 to keep the actual changes small for review. > Reviewed-by: > > This is a split-off of 6961697 to make the review of the actual > changes easier. This CR only does a couple of renames, moves code_* > methods into CodeBlob, adds a new CodeBuffer constructor to make code > simpler, adds some emit functions to CodeSection to make AD code > simpler, ... > > I did the rename in two steps to make sure that no new > CodeBlob::code_* method is called instead of an old nmethod::code_* > method (which are now called nmethod::insts_*). 
> From vladimir.kozlov at oracle.com Mon Aug 23 15:34:22 2010 From: vladimir.kozlov at oracle.com (vladimir.kozlov at oracle.com) Date: Mon, 23 Aug 2010 22:34:22 +0000 Subject: hg: jdk7/hotspot-comp/hotspot: 6896381: CTW fails share/vm/ci/bcEscapeAnalyzer.cpp:99, assert(_stack_height < _max_stack, "stack overflow") Message-ID: <20100823223425.45456473CD@hg.openjdk.java.net> Changeset: 53dbe853fb3a Author: kvn Date: 2010-08-23 09:09 -0700 URL: http://hg.openjdk.java.net/jdk7/hotspot-comp/hotspot/rev/53dbe853fb3a 6896381: CTW fails share/vm/ci/bcEscapeAnalyzer.cpp:99, assert(_stack_height < _max_stack,"stack overflow") Summary: Check constant Tag type instead of calling get_constant(). Reviewed-by: never ! src/share/vm/ci/bcEscapeAnalyzer.cpp From christian.thalinger at oracle.com Tue Aug 24 09:06:03 2010 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Tue, 24 Aug 2010 18:06:03 +0200 Subject: Request for reviews .02 (XL): 6978355: renaming for 6961697 In-Reply-To: <7F51D3E8-E980-4583-A01F-0CA246F7A43F@oracle.com> References: <1282240402.29965.10.camel@macbook> <7F51D3E8-E980-4583-A01F-0CA246F7A43F@oracle.com> Message-ID: <1282665964.1282.32.camel@macbook> On Mon, 2010-08-23 at 10:42 -0700, Tom Rodriguez wrote: > In the jniFastGetField can you switch: > > + address fast_entry = blob->code_begin(); > > to: > > + address fast_entry = __ pc(); > > Just so it matches how we normally do it. OK. > > Don't forget to fix CodeBlob.java. Ohh, I forgot that. > > Otherwise it looks ok. 
Here is the updated webrev: http://cr.openjdk.java.net/~twisti/6978355/webrev.02/ -- Christian From tom.rodriguez at oracle.com Tue Aug 24 09:07:50 2010 From: tom.rodriguez at oracle.com (Tom Rodriguez) Date: Tue, 24 Aug 2010 09:07:50 -0700 Subject: Request for reviews .02 (XL): 6978355: renaming for 6961697 In-Reply-To: <1282665964.1282.32.camel@macbook> References: <1282240402.29965.10.camel@macbook> <7F51D3E8-E980-4583-A01F-0CA246F7A43F@oracle.com> <1282665964.1282.32.camel@macbook> Message-ID: <6C73E5F0-BA5F-4CD2-AAD9-E389968089F2@oracle.com> Looks good. tom On Aug 24, 2010, at 9:06 AM, Christian Thalinger wrote: > On Mon, 2010-08-23 at 10:42 -0700, Tom Rodriguez wrote: >> In the jniFastGetField can you switch: >> >> + address fast_entry = blob->code_begin(); >> >> to: >> >> + address fast_entry = __ pc(); >> >> Just so it matches how we normally do it. > > OK. > >> >> Don't forget to fix CodeBlob.java. > > Ohh, I forgot that. > >> >> Otherwise it looks ok. > > Here is the updated webrev: > > http://cr.openjdk.java.net/~twisti/6978355/webrev.02/ > > -- Christian > From vladimir.kozlov at oracle.com Tue Aug 24 10:11:11 2010 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 24 Aug 2010 10:11:11 -0700 Subject: Request for reviews .02 (XL): 6978355: renaming for 6961697 In-Reply-To: <1282665964.1282.32.camel@macbook> References: <1282240402.29965.10.camel@macbook> <7F51D3E8-E980-4583-A01F-0CA246F7A43F@oracle.com> <1282665964.1282.32.camel@macbook> Message-ID: <4C73FD2F.4000803@oracle.com> Good. Vladimir Christian Thalinger wrote: > On Mon, 2010-08-23 at 10:42 -0700, Tom Rodriguez wrote: >> In the jniFastGetField can you switch: >> >> + address fast_entry = blob->code_begin(); >> >> to: >> >> + address fast_entry = __ pc(); >> >> Just so it matches how we normally do it. > > OK. > >> Don't forget to fix CodeBlob.java. > > Ohh, I forgot that. > >> Otherwise it looks ok. 
> > Here is the updated webrev: > > http://cr.openjdk.java.net/~twisti/6978355/webrev.02/ > > -- Christian > From vladimir.kozlov at oracle.com Tue Aug 24 10:34:21 2010 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 24 Aug 2010 10:34:21 -0700 Subject: Update: Request for reviews (S): 6976400: "Meet Not Symmetric" In-Reply-To: <4C6433E7.1000804@oracle.com> References: <4C6433E7.1000804@oracle.com> Message-ID: <4C74029D.9030807@oracle.com> http://cr.openjdk.java.net/~kvn/6976400/webrev.03 After discussing with Tom and investigating further, the problem was identified as the way we define TypeAryPtr::RANGE, which is bottom[int:>=0]+12 * : TypeAryPtr::RANGE = TypeAryPtr::make( TypePtr::BotPTR, TypeAry::make(Type::BOTTOM,TypeInt::POS), current->env()->Object_klass(), false, arrayOopDesc::length_offset_in_bytes()); It has a bottom element type and a defined klass, but the C2 type system expects a NULL klass for bottom[] and top[]. Solution: Use NULL as the klass for TypeAryPtr::RANGE. Add verification to the TypeAryPtr constructor to make sure that the specified klass and TypeAryPtr::klass() follow the same rules. Tested with CTW, java/lang regression tests, nsk tests. Vladimir Vladimir Kozlov wrote: > http://cr.openjdk.java.net/~kvn/6976400/webrev > > Fixed 6976400: "Meet Not Symmetric" > > Meet of integer array pointer type with array pointer > which has j.l.Object klass incorrectly falls to bottom: > > t = byte[int:>=0]:NotNull:exact+12 * > this= bottom[int:>=0]+12 * > mt=(t meet this)= bottom[int:>=0]+12 * > t_dual= int:127..-128:www[int:max..0]:AnyNull:exact+12 *,iid=top > mt_dual= top[int:max..0]:TopPTR+12 *,iid=top > mt_dual meet t_dual= bottom[int:max..0]:AnyNull:exact+12 * [narrow] > > Solution: > Add missing checks for j.l.Object klass in TypeAryPtr::xmeet(). > > Tested with failing cases, CTW, java/lang regression tests.
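The dump in the message above prints t, mt = (t meet this), their duals, and mt_dual meet t_dual. A toy lattice makes it easier to see what a symmetry check of that shape looks for. This is a minimal sketch under stated assumptions, not C2's Type code: Range, dual, meet, broken_meet and meet_is_symmetric are hypothetical names, the lattice is plain integer ranges where meet is the convex hull, and the dual of a range swaps its bounds (compare int:127..-128 and int:max..0 in the dump). The property checked is an assumption read off the dump: meet must commute, and meeting the dual of the result with the dual of either input must give back that input's dual.

```cpp
#include <algorithm>
#include <cassert>

// Toy lattice element: an integer range lo..hi. A dual range has lo > hi.
struct Range {
  int lo;
  int hi;
  bool operator==(const Range& o) const { return lo == o.lo && hi == o.hi; }
};

// Dual of a range: swap the bounds.
Range dual(Range r) { return Range{r.hi, r.lo}; }

// Well-behaved meet: smallest lower bound, largest upper bound.
Range meet(Range a, Range b) {
  return Range{std::min(a.lo, b.lo), std::max(a.hi, b.hi)};
}

// Deliberately broken meet that ignores b.hi, so the result depends on
// argument order.
Range broken_meet(Range a, Range b) {
  return Range{std::min(a.lo, b.lo), a.hi};
}

using MeetFn = Range (*)(Range, Range);

// Check one pair of inputs: commutativity first, then the dual round trip.
bool meet_is_symmetric(MeetFn m, Range a, Range b) {
  const Range mt = m(a, b);
  if (!(mt == m(b, a))) {
    return false;  // result depends on argument order
  }
  const Range mt_dual = dual(mt);
  return m(mt_dual, dual(a)) == dual(a) && m(mt_dual, dual(b)) == dual(b);
}
```

With a byte range of -128..127 and a non-negative int range, meet passes the check and broken_meet fails it at the commutativity step.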
From christian.thalinger at oracle.com Wed Aug 25 06:16:23 2010 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Wed, 25 Aug 2010 15:16:23 +0200 Subject: Request for reviews .02 (M): 6961697: move nmethod constants section before instruction section In-Reply-To: <4C6E0CCD.8080800@oracle.com> References: <1278520980.17142.3.camel@macbook> <4C351442.1090907@oracle.com> <1278576028.19588.13.camel@macbook> <4C3608FA.2040406@oracle.com> <1278616722.1475.134.camel@macbook> <1282243111.29965.24.camel@macbook> <4C6E0CCD.8080800@oracle.com> Message-ID: <1282742183.28481.19.camel@macbook> On Thu, 2010-08-19 at 22:04 -0700, Vladimir Kozlov wrote: > Christian, > > Are these changes made above 6978355 changes? I did not find code_offset() definition in this changes? Yes, it's on top of 6978355. > Why you removed oops section print in nmethod.cpp? Because it was printed twice. That is a bug I introduced with: 6951083: oops and relocations should part of nmethod not CodeBlob > What about FIXME commented assert in relocInfo.cpp? I removed that in 6978355 but added it again here. I can't remember if that was by accident or I really hit the assert. Should I uncomment it again? -- Christian From christian.thalinger at oracle.com Wed Aug 25 06:40:43 2010 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Wed, 25 Aug 2010 15:40:43 +0200 Subject: Request for reviews .02 (M): 6961697: move nmethod constants section before instruction section In-Reply-To: References: <1278520980.17142.3.camel@macbook> <4C351442.1090907@oracle.com> <1278576028.19588.13.camel@macbook> <4C3608FA.2040406@oracle.com> <1278616722.1475.134.camel@macbook> <1282243111.29965.24.camel@macbook> Message-ID: <1282743643.28481.28.camel@macbook> On Mon, 2010-08-23 at 08:34 -0700, Tom Rodriguez wrote: > I think rearranging the all the section names wasn't really needed > particularly in CodeBuffer since the ordering the buffer didn't > change. You don't need to undo though it. 
Yeah, right. > Please correct the comment here: > > // Here is the list of all possible sections, in order of ascending address. > + SECT_FIRST = 0, > + SECT_CONSTS = SECT_FIRST, // Non-instruction data: Floats, jump tables, etc. > SECT_INSTS, // Executable instructions. > SECT_STUBS, // Outbound trampolines for supporting call sites. > SECT_CONSTS, // Non-instruction data: Floats, jump tables, etc. > SECT_LIMIT, SECT_NONE = -1 Done. > > Since the address ordering in CodeBuffers hasn't changed. I think > relocate relocate_code_to should have some more comments explaining > that reordering is occurring too. Added a comment. http://cr.openjdk.java.net/~twisti/6961697/webrev.03/ -- Christian From Christian.Thalinger at Sun.COM Wed Aug 25 07:26:16 2010 From: Christian.Thalinger at Sun.COM (Christian.Thalinger at Sun.COM) Date: Wed, 25 Aug 2010 14:26:16 +0000 Subject: hg: jdk7/hotspot-comp/hotspot: 6978355: renaming for 6961697 Message-ID: <20100825142618.226CE47429@hg.openjdk.java.net> Changeset: 3e8fbc61cee8 Author: twisti Date: 2010-08-25 05:27 -0700 URL: http://hg.openjdk.java.net/jdk7/hotspot-comp/hotspot/rev/3e8fbc61cee8 6978355: renaming for 6961697 Summary: This is the renaming part of 6961697 to keep the actual changes small for review. Reviewed-by: kvn, never ! agent/src/share/classes/sun/jvm/hotspot/CommandProcessor.java ! agent/src/share/classes/sun/jvm/hotspot/c1/Runtime1.java ! agent/src/share/classes/sun/jvm/hotspot/code/CodeBlob.java ! agent/src/share/classes/sun/jvm/hotspot/code/NMethod.java ! agent/src/share/classes/sun/jvm/hotspot/code/PCDesc.java ! agent/src/share/classes/sun/jvm/hotspot/ui/FindInCodeCachePanel.java ! agent/src/share/classes/sun/jvm/hotspot/ui/classbrowser/HTMLGenerator.java ! agent/src/share/classes/sun/jvm/hotspot/utilities/PointerFinder.java ! agent/src/share/classes/sun/jvm/hotspot/utilities/PointerLocation.java ! src/cpu/sparc/vm/assembler_sparc.cpp ! src/cpu/sparc/vm/codeBuffer_sparc.hpp ! src/cpu/sparc/vm/frame_sparc.cpp ! 
src/cpu/sparc/vm/jniFastGetField_sparc.cpp ! src/cpu/sparc/vm/nativeInst_sparc.cpp ! src/cpu/sparc/vm/sparc.ad ! src/cpu/x86/vm/frame_x86.cpp ! src/cpu/x86/vm/frame_x86.inline.hpp ! src/cpu/x86/vm/jniFastGetField_x86_32.cpp ! src/cpu/x86/vm/jniFastGetField_x86_64.cpp ! src/cpu/x86/vm/vm_version_x86.cpp ! src/cpu/x86/vm/x86_32.ad ! src/cpu/x86/vm/x86_64.ad ! src/os/solaris/dtrace/generateJvmOffsets.cpp ! src/os/solaris/dtrace/libjvm_db.c ! src/os_cpu/windows_x86/vm/os_windows_x86.cpp ! src/os_cpu/windows_x86/vm/windows_x86_32.ad ! src/os_cpu/windows_x86/vm/windows_x86_64.ad ! src/share/vm/adlc/output_c.cpp ! src/share/vm/asm/codeBuffer.cpp ! src/share/vm/asm/codeBuffer.hpp ! src/share/vm/c1/c1_Compilation.cpp ! src/share/vm/c1/c1_Runtime1.cpp ! src/share/vm/c1/c1_Runtime1.hpp ! src/share/vm/ci/ciMethod.cpp ! src/share/vm/code/codeBlob.cpp ! src/share/vm/code/codeBlob.hpp ! src/share/vm/code/codeCache.cpp ! src/share/vm/code/exceptionHandlerTable.cpp ! src/share/vm/code/nmethod.cpp ! src/share/vm/code/nmethod.hpp ! src/share/vm/code/pcDesc.cpp ! src/share/vm/code/relocInfo.cpp ! src/share/vm/code/scopeDesc.cpp ! src/share/vm/code/stubs.cpp ! src/share/vm/code/vtableStubs.cpp ! src/share/vm/compiler/compileBroker.cpp ! src/share/vm/compiler/disassembler.cpp ! src/share/vm/interpreter/interpreter.hpp ! src/share/vm/interpreter/interpreterRuntime.cpp ! src/share/vm/opto/compile.cpp ! src/share/vm/opto/graphKit.cpp ! src/share/vm/opto/lcm.cpp ! src/share/vm/opto/library_call.cpp ! src/share/vm/opto/output.cpp ! src/share/vm/opto/stringopts.cpp ! src/share/vm/prims/jvmtiCodeBlobEvents.cpp ! src/share/vm/prims/jvmtiExport.cpp ! src/share/vm/prims/methodHandles.cpp ! src/share/vm/runtime/compilationPolicy.cpp ! src/share/vm/runtime/frame.cpp ! src/share/vm/runtime/icache.cpp ! src/share/vm/runtime/rframe.cpp ! src/share/vm/runtime/sharedRuntime.cpp ! src/share/vm/runtime/sharedRuntime.hpp ! src/share/vm/runtime/stubRoutines.cpp ! 
src/share/vm/runtime/vmStructs.cpp From tom.rodriguez at oracle.com Wed Aug 25 10:20:48 2010 From: tom.rodriguez at oracle.com (Tom Rodriguez) Date: Wed, 25 Aug 2010 10:20:48 -0700 Subject: Request for reviews .02 (M): 6961697: move nmethod constants section before instruction section In-Reply-To: <1282743643.28481.28.camel@macbook> References: <1278520980.17142.3.camel@macbook> <4C351442.1090907@oracle.com> <1278576028.19588.13.camel@macbook> <4C3608FA.2040406@oracle.com> <1278616722.1475.134.camel@macbook> <1282243111.29965.24.camel@macbook> <1282743643.28481.28.camel@macbook> Message-ID: Looks good. tom On Aug 25, 2010, at 6:40 AM, Christian Thalinger wrote: > On Mon, 2010-08-23 at 08:34 -0700, Tom Rodriguez wrote: >> I think rearranging the all the section names wasn't really needed >> particularly in CodeBuffer since the ordering the buffer didn't >> change. You don't need to undo though it. > > Yeah, right. > >> Please correct the comment here: >> >> // Here is the list of all possible sections, in order of ascending address. >> + SECT_FIRST = 0, >> + SECT_CONSTS = SECT_FIRST, // Non-instruction data: Floats, jump tables, etc. >> SECT_INSTS, // Executable instructions. >> SECT_STUBS, // Outbound trampolines for supporting call sites. >> SECT_CONSTS, // Non-instruction data: Floats, jump tables, etc. >> SECT_LIMIT, SECT_NONE = -1 > > Done. > >> >> Since the address ordering in CodeBuffers hasn't changed. I think >> relocate relocate_code_to should have some more comments explaining >> that reordering is occurring too. > > Added a comment. 
> > http://cr.openjdk.java.net/~twisti/6961697/webrev.03/ > > -- Christian > From tom.rodriguez at oracle.com Wed Aug 25 10:28:01 2010 From: tom.rodriguez at oracle.com (Tom Rodriguez) Date: Wed, 25 Aug 2010 10:28:01 -0700 Subject: Update: Request for reviews (S): 6976400: "Meet Not Symmetric" In-Reply-To: <4C74029D.9030807@oracle.com> References: <4C6433E7.1000804@oracle.com> <4C74029D.9030807@oracle.com> Message-ID: Looks good. tom On Aug 24, 2010, at 10:34 AM, Vladimir Kozlov wrote: > http://cr.openjdk.java.net/~kvn/6976400/webrev.03 > > After discussion with Tom and investigating further > the problem was identified as how we define TypeAryPtr::RANGE > which is bottom[int:>=0]+12 * : > > TypeAryPtr::RANGE = TypeAryPtr::make( TypePtr::BotPTR, TypeAry::make(Type::BOTTOM,TypeInt::POS), current->env()->Object_klass(), false, arrayOopDesc::length_offset_in_bytes()); > > It has bottom element type and defined klass but C2 > type system expect NULL klass for bottom[] and top[]. > > Solution: > Use NULL as klass for TypeAryPtr::RANGE. > Add verification into TypeAryPtr constructor to make sure > that specified klass and TypeAryPtr::klass() follow > the same rules. > > Tested CTW, java/lang regression tests, nsk tests > > Vladimir > > Vladimir Kozlov wrote: >> http://cr.openjdk.java.net/~kvn/6976400/webrev >> Fixed 6976400: "Meet Not Symmetric" >> Meet of integer array pointer type with array pointer >> which has j.l.Object klass incorrectly falls to bottom: >> t = byte[int:>=0]:NotNull:exact+12 * >> this= bottom[int:>=0]+12 * >> mt=(t meet this)= bottom[int:>=0]+12 * >> t_dual= int:127..-128:www[int:max..0]:AnyNull:exact+12 *,iid=top >> mt_dual= top[int:max..0]:TopPTR+12 *,iid=top >> mt_dual meet t_dual= bottom[int:max..0]:AnyNull:exact+12 * [narrow] >> Solution: >> Add missing checks for j.l.Object klass in TypeAryPtr::xmeet(). >> Tested with failing cases, CTW, java/lang regression tests. 
From vladimir.kozlov at oracle.com Wed Aug 25 10:56:54 2010 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 25 Aug 2010 10:56:54 -0700 Subject: Request for reviews .02 (M): 6961697: move nmethod constants section before instruction section In-Reply-To: <1282742183.28481.19.camel@macbook> References: <1278520980.17142.3.camel@macbook> <4C351442.1090907@oracle.com> <1278576028.19588.13.camel@macbook> <4C3608FA.2040406@oracle.com> <1278616722.1475.134.camel@macbook> <1282243111.29965.24.camel@macbook> <4C6E0CCD.8080800@oracle.com> <1282742183.28481.19.camel@macbook> Message-ID: <4C755966.5030206@oracle.com> Christian Thalinger wrote: > >> What about FIXME commented assert in relocInfo.cpp? > > I removed that in 6978355 but added it again here. I can't remember if > that was by accident or I really hit the assert. Should I uncomment it > again? Could you investigate it. If you hit the assert then you need to have at least the comment which explains what happened. Thanks, Vladimir > > -- Christian > From vladimir.kozlov at oracle.com Wed Aug 25 12:10:41 2010 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 25 Aug 2010 12:10:41 -0700 Subject: Update: Request for reviews (S): 6976400: "Meet Not Symmetric" In-Reply-To: References: <4C6433E7.1000804@oracle.com> <4C74029D.9030807@oracle.com> Message-ID: <4C756AB1.6020205@oracle.com> Thank you, Tom Vladimir Tom Rodriguez wrote: > Looks good. 
> > tom > > On Aug 24, 2010, at 10:34 AM, Vladimir Kozlov wrote: > >> http://cr.openjdk.java.net/~kvn/6976400/webrev.03 >> >> After discussion with Tom and investigating further >> the problem was identified as how we define TypeAryPtr::RANGE >> which is bottom[int:>=0]+12 * : >> >> TypeAryPtr::RANGE = TypeAryPtr::make( TypePtr::BotPTR, TypeAry::make(Type::BOTTOM,TypeInt::POS), current->env()->Object_klass(), false, arrayOopDesc::length_offset_in_bytes()); >> >> It has bottom element type and defined klass but C2 >> type system expect NULL klass for bottom[] and top[]. >> >> Solution: >> Use NULL as klass for TypeAryPtr::RANGE. >> Add verification into TypeAryPtr constructor to make sure >> that specified klass and TypeAryPtr::klass() follow >> the same rules. >> >> Tested CTW, java/lang regression tests, nsk tests >> >> Vladimir >> >> Vladimir Kozlov wrote: >>> http://cr.openjdk.java.net/~kvn/6976400/webrev >>> Fixed 6976400: "Meet Not Symmetric" >>> Meet of integer array pointer type with array pointer >>> which has j.l.Object klass incorrectly falls to bottom: >>> t = byte[int:>=0]:NotNull:exact+12 * >>> this= bottom[int:>=0]+12 * >>> mt=(t meet this)= bottom[int:>=0]+12 * >>> t_dual= int:127..-128:www[int:max..0]:AnyNull:exact+12 *,iid=top >>> mt_dual= top[int:max..0]:TopPTR+12 *,iid=top >>> mt_dual meet t_dual= bottom[int:max..0]:AnyNull:exact+12 * [narrow] >>> Solution: >>> Add missing checks for j.l.Object klass in TypeAryPtr::xmeet(). >>> Tested with failing cases, CTW, java/lang regression tests. 
> From tom.rodriguez at oracle.com Wed Aug 25 12:32:55 2010 From: tom.rodriguez at oracle.com (tom.rodriguez at oracle.com) Date: Wed, 25 Aug 2010 19:32:55 +0000 Subject: hg: jdk7/hotspot-comp/hotspot: 48 new changesets Message-ID: <20100825193417.8F3F047438@hg.openjdk.java.net> Changeset: ab3fd720516c Author: rasbold Date: 2010-08-10 19:17 -0400 URL: http://hg.openjdk.java.net/jdk7/hotspot-comp/hotspot/rev/ab3fd720516c 6378314: Bad warning message when agent library not found. local directory is not searched. Summary: Print a more detailed error message for agent library load failure. Reviewed-by: jcoomes, never, ohair, coleenp Contributed-by: jeremymanson at google.com ! src/share/vm/runtime/thread.cpp Changeset: 21e519b91576 Author: dcubed Date: 2010-08-13 07:33 -0700 URL: http://hg.openjdk.java.net/jdk7/hotspot-comp/hotspot/rev/21e519b91576 Merge ! src/share/vm/runtime/thread.cpp Changeset: f6f3eef8a521 Author: kevinw Date: 2010-07-30 22:43 +0100 URL: http://hg.openjdk.java.net/jdk7/hotspot-comp/hotspot/rev/f6f3eef8a521 6581734: CMS Old Gen's collection usage is zero after GC which is incorrect Summary: Management code enabled for use by a concurrent collector. Reviewed-by: mchung, ysr ! src/share/vm/gc_implementation/concurrentMarkSweep/concurrentMarkSweepGeneration.cpp ! src/share/vm/gc_implementation/concurrentMarkSweep/concurrentMarkSweepGeneration.hpp ! src/share/vm/gc_implementation/includeDB_gc_concurrentMarkSweep ! src/share/vm/services/management.cpp ! src/share/vm/services/memoryManager.cpp ! src/share/vm/services/memoryManager.hpp ! src/share/vm/services/memoryService.cpp ! src/share/vm/services/memoryService.hpp + test/gc/6581734/Test6581734.java Changeset: 63f4675ac87d Author: kevinw Date: 2010-07-31 15:10 +0100 URL: http://hg.openjdk.java.net/jdk7/hotspot-comp/hotspot/rev/63f4675ac87d Merge - src/os/linux/vm/vtune_linux.cpp - src/os/solaris/vm/vtune_solaris.cpp - src/os/windows/vm/vtune_windows.cpp ! 
src/share/vm/gc_implementation/concurrentMarkSweep/concurrentMarkSweepGeneration.cpp
- src/share/vm/runtime/vtune.hpp

Changeset: 2d160770d2e5
Author: johnc
Date: 2010-08-02 12:51 -0700
URL: http://hg.openjdk.java.net/jdk7/hotspot-comp/hotspot/rev/2d160770d2e5

6814437: G1: remove the _new_refs array
Summary: The per-worker _new_refs array is used to hold references that point into the collection set. It is populated during RSet updating and subsequently processed. In the event of an evacuation failure it is processed again to recreate the RSets of regions in the collection set. Remove the per-worker _new_refs array by processing the references directly. Use a DirtyCardQueue to hold the cards containing the references so that the RSets of regions in the collection set can be recreated when handling an evacuation failure.
Reviewed-by: iveresov, jmasa, tonyp

! src/share/vm/gc_implementation/g1/concurrentG1Refine.cpp
! src/share/vm/gc_implementation/g1/concurrentG1Refine.hpp
! src/share/vm/gc_implementation/g1/dirtyCardQueue.cpp
! src/share/vm/gc_implementation/g1/dirtyCardQueue.hpp
! src/share/vm/gc_implementation/g1/g1CollectedHeap.cpp
! src/share/vm/gc_implementation/g1/g1CollectedHeap.hpp
! src/share/vm/gc_implementation/g1/g1CollectorPolicy.cpp
! src/share/vm/gc_implementation/g1/g1CollectorPolicy.hpp
! src/share/vm/gc_implementation/g1/g1OopClosures.inline.hpp
! src/share/vm/gc_implementation/g1/g1RemSet.cpp
! src/share/vm/gc_implementation/g1/g1RemSet.hpp
! src/share/vm/gc_implementation/g1/g1RemSet.inline.hpp
! src/share/vm/gc_implementation/g1/heapRegion.cpp
! src/share/vm/gc_implementation/includeDB_gc_g1

Changeset: 9d7a8ab3736b
Author: tonyp
Date: 2010-07-22 10:27 -0400
URL: http://hg.openjdk.java.net/jdk7/hotspot-comp/hotspot/rev/9d7a8ab3736b

6962589: remove breadth first scanning code from parallel gc
Summary: Remove the breadth-first copying order from ParallelScavenge and use depth-first by default.
Reviewed-by: jcoomes, ysr, johnc

!
src/share/vm/gc_implementation/includeDB_gc_parallelScavenge
! src/share/vm/gc_implementation/parallelScavenge/cardTableExtension.cpp
- src/share/vm/gc_implementation/parallelScavenge/prefetchQueue.hpp
! src/share/vm/gc_implementation/parallelScavenge/psPromotionManager.cpp
! src/share/vm/gc_implementation/parallelScavenge/psPromotionManager.hpp
! src/share/vm/gc_implementation/parallelScavenge/psPromotionManager.inline.hpp
! src/share/vm/gc_implementation/parallelScavenge/psScavenge.cpp
! src/share/vm/gc_implementation/parallelScavenge/psScavenge.inline.hpp
! src/share/vm/gc_implementation/parallelScavenge/psTasks.cpp
! src/share/vm/oops/arrayKlassKlass.cpp
! src/share/vm/oops/compiledICHolderKlass.cpp
! src/share/vm/oops/constMethodKlass.cpp
! src/share/vm/oops/constantPoolKlass.cpp
! src/share/vm/oops/cpCacheKlass.cpp
! src/share/vm/oops/instanceKlass.cpp
! src/share/vm/oops/instanceKlass.hpp
! src/share/vm/oops/instanceKlassKlass.cpp
! src/share/vm/oops/instanceRefKlass.cpp
! src/share/vm/oops/klassKlass.cpp
! src/share/vm/oops/klassPS.hpp
! src/share/vm/oops/methodDataKlass.cpp
! src/share/vm/oops/methodKlass.cpp
! src/share/vm/oops/objArrayKlass.cpp
! src/share/vm/oops/objArrayKlassKlass.cpp
! src/share/vm/oops/oop.hpp
! src/share/vm/oops/oop.psgc.inline.hpp
! src/share/vm/oops/symbolKlass.cpp
! src/share/vm/oops/typeArrayKlass.cpp
! src/share/vm/runtime/arguments.cpp
! src/share/vm/runtime/globals.hpp

Changeset: 0ce1569c90e5
Author: tonyp
Date: 2010-08-04 13:03 -0400
URL: http://hg.openjdk.java.net/jdk7/hotspot-comp/hotspot/rev/0ce1569c90e5

6963209: G1: remove the concept of abandoned pauses
Summary: As part of 6944166 we disabled the concept of abandoned pauses (i.e., if the collection set is empty, we would still try to do a pause even if it is to update the RSets and scan the roots). This changeset removes the code and structures associated with abandoned pauses.
Reviewed-by: iveresov, johnc

! src/share/vm/gc_implementation/g1/g1CollectedHeap.cpp
!
src/share/vm/gc_implementation/g1/g1CollectorPolicy.cpp
! src/share/vm/gc_implementation/g1/g1CollectorPolicy.hpp

Changeset: a03ae377b2e8
Author: johnc
Date: 2010-08-06 10:17 -0700
URL: http://hg.openjdk.java.net/jdk7/hotspot-comp/hotspot/rev/a03ae377b2e8

6930581: G1: assert(ParallelGCThreads > 1 || n_yielded() == _hrrs->occupied(),"Should have yielded all the ..
Summary: During RSet updating, when ParallelGCThreads is zero, references that point into the collection set are added directly to the referenced region's RSet. This can cause the sparse table in the RSet to expand. RSet scanning and the "occupied" routine will then operate on different instances of the sparse table, causing the assert to trip. This may also cause some cards added post expansion to be missed during RSet scanning. When ParallelGCThreads is non-zero such references are recorded on the "references to be scanned" queue and the card containing the reference is recorded in a dirty card queue for use in the event of an evacuation failure. Employ the parallel code in the serial case to avoid expanding the RSets of regions in the collection set.
Reviewed-by: iveresov, ysr, tonyp

! src/share/vm/gc_implementation/g1/g1RemSet.cpp
! src/share/vm/gc_implementation/g1/g1RemSet.hpp
! src/share/vm/gc_implementation/g1/g1RemSet.inline.hpp
! src/share/vm/gc_implementation/g1/sparsePRT.cpp

Changeset: 5f429ee79634
Author: jcoomes
Date: 2010-08-09 05:41 -0700
URL: http://hg.openjdk.java.net/jdk7/hotspot-comp/hotspot/rev/5f429ee79634

6966222: G1: simplify TaskQueue overflow handling
Reviewed-by: tonyp, ysr

! src/share/vm/gc_implementation/g1/g1CollectedHeap.cpp
! src/share/vm/gc_implementation/g1/g1CollectedHeap.hpp
! src/share/vm/gc_implementation/g1/g1CollectedHeap.inline.hpp
! src/share/vm/utilities/taskqueue.cpp
!
src/share/vm/utilities/taskqueue.hpp

Changeset: 94251661de76
Author: jcoomes
Date: 2010-08-09 18:03 -0700
URL: http://hg.openjdk.java.net/jdk7/hotspot-comp/hotspot/rev/94251661de76

6970376: ParNew: shared TaskQueue statistics
Reviewed-by: ysr

! src/share/vm/gc_implementation/parNew/parNewGeneration.cpp
! src/share/vm/gc_implementation/parNew/parNewGeneration.hpp

Changeset: a6bff45449bc
Author: ysr
Date: 2010-08-10 14:53 -0700
URL: http://hg.openjdk.java.net/jdk7/hotspot-comp/hotspot/rev/a6bff45449bc

6973570: OrderAccess::storestore() scales poorly on multi-socket x64 and sparc: cache-line ping-ponging
Summary: volatile store to static variable removed in favour of a volatile store to stack to avoid excessive cache coherency traffic; verified that the volatile store is not elided by any of our current compilers.
Reviewed-by: dholmes, dice, jcoomes, kvn

! src/os_cpu/linux_sparc/vm/orderAccess_linux_sparc.inline.hpp
! src/os_cpu/linux_x86/vm/orderAccess_linux_x86.inline.hpp
! src/os_cpu/solaris_sparc/vm/orderAccess_solaris_sparc.inline.hpp
! src/os_cpu/solaris_x86/vm/orderAccess_solaris_x86.inline.hpp
! src/os_cpu/windows_x86/vm/orderAccess_windows_x86.inline.hpp
! src/share/vm/runtime/orderAccess.cpp
! src/share/vm/runtime/orderAccess.hpp

Changeset: 2d6b74c9a797
Author: jcoomes
Date: 2010-08-11 13:12 -0700
URL: http://hg.openjdk.java.net/jdk7/hotspot-comp/hotspot/rev/2d6b74c9a797

6976378: ParNew: stats are printed unconditionally in debug builds
Reviewed-by: tonyp

! src/share/vm/gc_implementation/parNew/parNewGeneration.cpp

Changeset: 7fcd5f39bd7a
Author: johnc
Date: 2010-08-14 00:47 -0700
URL: http://hg.openjdk.java.net/jdk7/hotspot-comp/hotspot/rev/7fcd5f39bd7a

Merge

- src/share/vm/gc_implementation/parallelScavenge/prefetchQueue.hpp
! src/share/vm/oops/arrayKlassKlass.cpp
! src/share/vm/oops/compiledICHolderKlass.cpp
! src/share/vm/oops/constMethodKlass.cpp
! src/share/vm/oops/constantPoolKlass.cpp
! src/share/vm/oops/cpCacheKlass.cpp
!
src/share/vm/oops/klassKlass.cpp
! src/share/vm/runtime/arguments.cpp
! src/share/vm/runtime/globals.hpp

Changeset: cb4250ef73b2
Author: mikejwre
Date: 2010-07-23 16:42 -0700
URL: http://hg.openjdk.java.net/jdk7/hotspot-comp/hotspot/rev/cb4250ef73b2

Added tag jdk7-b102 for changeset c5cadf1a0771

! .hgtags

Changeset: efd4401fab1d
Author: cl
Date: 2010-07-29 13:33 -0700
URL: http://hg.openjdk.java.net/jdk7/hotspot-comp/hotspot/rev/efd4401fab1d

Added tag jdk7-b103 for changeset cb4250ef73b2

! .hgtags

Changeset: cc3fdfeb54b0
Author: trims
Date: 2010-07-29 23:14 -0700
URL: http://hg.openjdk.java.net/jdk7/hotspot-comp/hotspot/rev/cc3fdfeb54b0

Merge

Changeset: fd2645290e89
Author: trims
Date: 2010-07-30 06:56 -0700
URL: http://hg.openjdk.java.net/jdk7/hotspot-comp/hotspot/rev/fd2645290e89

6973381: Bump the HS19 build number to 05
Summary: Update the HS19 build number to 05
Reviewed-by: jcoomes

! make/hotspot_version

Changeset: 28abe3f6a5f6
Author: trims
Date: 2010-08-03 19:01 -0700
URL: http://hg.openjdk.java.net/jdk7/hotspot-comp/hotspot/rev/28abe3f6a5f6

Merge

Changeset: b4acf10eb134
Author: trims
Date: 2010-08-05 02:48 -0700
URL: http://hg.openjdk.java.net/jdk7/hotspot-comp/hotspot/rev/b4acf10eb134

Added tag hs19-b04 for changeset e55900b5c1b8

! .hgtags

Changeset: 6709c14587c2
Author: cl
Date: 2010-08-06 12:51 -0700
URL: http://hg.openjdk.java.net/jdk7/hotspot-comp/hotspot/rev/6709c14587c2

Added tag jdk7-b104 for changeset b4acf10eb134

! .hgtags

Changeset: 3dc64719cf18
Author: cl
Date: 2010-08-13 11:38 -0700
URL: http://hg.openjdk.java.net/jdk7/hotspot-comp/hotspot/rev/3dc64719cf18

Added tag jdk7-b105 for changeset 6709c14587c2

!
.hgtags

Changeset: 688a538aa654
Author: trims
Date: 2010-08-13 10:55 -0700
URL: http://hg.openjdk.java.net/jdk7/hotspot-comp/hotspot/rev/688a538aa654

Merge

Changeset: 5f3c8db59d83
Author: trims
Date: 2010-08-13 10:56 -0700
URL: http://hg.openjdk.java.net/jdk7/hotspot-comp/hotspot/rev/5f3c8db59d83

6977051: Bump the HS19 build number to 06
Summary: Update the HS19 build number to 06
Reviewed-by: jcoomes

! make/hotspot_version

Changeset: 1b81ca701fa5
Author: trims
Date: 2010-08-17 09:43 -0700
URL: http://hg.openjdk.java.net/jdk7/hotspot-comp/hotspot/rev/1b81ca701fa5

Merge

Changeset: f121b2772674
Author: trims
Date: 2010-08-18 16:11 -0700
URL: http://hg.openjdk.java.net/jdk7/hotspot-comp/hotspot/rev/f121b2772674

Merge

- src/share/vm/gc_implementation/parallelScavenge/prefetchQueue.hpp

Changeset: 495caa35b1b5
Author: asaha
Date: 2010-08-17 22:52 -0700
URL: http://hg.openjdk.java.net/jdk7/hotspot-comp/hotspot/rev/495caa35b1b5

6977952: Test: Sync missing tests from hs16.3 to hs17.x
Reviewed-by: wrockett

+ test/compiler/6894807/IsInstanceTest.java
+ test/compiler/6894807/Test6894807.sh
+ test/runtime/6626217/IFace.java
+ test/runtime/6626217/Loader2.java
+ test/runtime/6626217/Test6626217.sh
+ test/runtime/6626217/You_Have_Been_P0wned.java
+ test/runtime/6626217/bug_21227.java
+ test/runtime/6626217/from_loader2.java
+ test/runtime/6626217/many_loader1.java.foo
+ test/runtime/6626217/many_loader2.java.foo

Changeset: be3f9c242c9d
Author: ysr
Date: 2010-08-16 15:58 -0700
URL: http://hg.openjdk.java.net/jdk7/hotspot-comp/hotspot/rev/be3f9c242c9d

6948538: CMS: BOT walkers can fall into object allocation and initialization cracks
Summary: GC workers now recognize an intermediate transient state of blocks which are allocated but have not yet completed initialization. blk_start() calls do not attempt to determine the size of a block in the transient state, rather waiting for the block to become initialized so that it is safe to query its size.
Audited and ensured the order of initialization of object fields (klass, free bit and size) to respect block state transition protocol. Also included some new assertion checking code enabled in debug mode.
Reviewed-by: chrisphi, johnc, poonam

! src/share/vm/gc_implementation/concurrentMarkSweep/compactibleFreeListSpace.cpp
! src/share/vm/gc_implementation/concurrentMarkSweep/compactibleFreeListSpace.hpp
! src/share/vm/gc_implementation/concurrentMarkSweep/concurrentMarkSweepGeneration.cpp
! src/share/vm/gc_implementation/concurrentMarkSweep/concurrentMarkSweepGeneration.hpp
! src/share/vm/gc_implementation/concurrentMarkSweep/freeChunk.hpp
! src/share/vm/gc_implementation/concurrentMarkSweep/promotionInfo.cpp
! src/share/vm/gc_implementation/includeDB_gc_concurrentMarkSweep
! src/share/vm/includeDB_core
! src/share/vm/memory/blockOffsetTable.cpp
! src/share/vm/memory/blockOffsetTable.hpp
! src/share/vm/memory/blockOffsetTable.inline.hpp
! src/share/vm/runtime/globals.hpp

Changeset: 688c3755d7af
Author: tonyp
Date: 2010-08-17 14:40 -0400
URL: http://hg.openjdk.java.net/jdk7/hotspot-comp/hotspot/rev/688c3755d7af

6959014: G1: assert(minimum_desired_capacity <= maximum_desired_capacity) failed: sanity check
Summary: There are a few issues in the code that calculates whether to resize the heap and by how much: a) some calculations can overflow 32-bit size_t's, b) min_desired_capacity is not bounded by the max heap size, and c) the assert that fires is in the wrong place. The fix also includes some tidying up of the related verbose code.
Reviewed-by: ysr, jmasa

! src/share/vm/gc_implementation/g1/g1CollectedHeap.cpp

Changeset: bb847e31b836
Author: tonyp
Date: 2010-08-17 14:40 -0400
URL: http://hg.openjdk.java.net/jdk7/hotspot-comp/hotspot/rev/bb847e31b836

6974928: G1: sometimes humongous objects are allocated in young regions
Summary: as the title says, sometimes we are allocating humongous objects in young regions and we shouldn't.
Reviewed-by: ysr, johnc

!
src/share/vm/gc_implementation/g1/g1CollectedHeap.cpp
! src/share/vm/gc_implementation/g1/g1CollectedHeap.hpp
! src/share/vm/gc_implementation/g1/g1CollectedHeap.inline.hpp
! src/share/vm/gc_implementation/g1/heapRegion.cpp

Changeset: b63010841f78
Author: tonyp
Date: 2010-08-17 14:40 -0400
URL: http://hg.openjdk.java.net/jdk7/hotspot-comp/hotspot/rev/b63010841f78

6975964: G1: print out a more descriptive message for evacuation failure when +PrintGCDetails is set
Summary: we're renaming "evacuation failure" to "to-space overflow". I'm also piggy-backing a small additional change which removes the "Mark closure took..." output.
Reviewed-by: ysr, johnc

! src/share/vm/gc_implementation/g1/concurrentMark.cpp
! src/share/vm/gc_implementation/g1/g1CollectedHeap.cpp

Changeset: 5ed703250bff
Author: ysr
Date: 2010-08-18 11:39 -0700
URL: http://hg.openjdk.java.net/jdk7/hotspot-comp/hotspot/rev/5ed703250bff

6977970: CMS: concurrentMarkSweepGeneration.cpp:7947 assert(addr <= _limit) failed: sweep invariant
Summary: Allow for the possibility (when the heap is expanding) that the sweep might skip over and past, rather than necessarily step on, the sweep limit determined at the beginning of a concurrent marking cycle.
Reviewed-by: jmasa, tonyp

! src/share/vm/gc_implementation/concurrentMarkSweep/concurrentMarkSweepGeneration.cpp

Changeset: 413ad0331a0c
Author: johnc
Date: 2010-08-18 10:59 -0700
URL: http://hg.openjdk.java.net/jdk7/hotspot-comp/hotspot/rev/413ad0331a0c

6977924: Changes for 6975078 produce build error with certain gcc versions
Summary: The changes introduced for 6975078 assign badHeapOopVal to the _allocation field in the ResourceObj class. In 32 bit linux builds with certain versions of gcc this assignment will be flagged as an error while compiling allocation.cpp. In 32 bit builds the constant value badHeapOopVal (which is cast to an intptr_t) is negative. The _allocation field is typed as an unsigned intptr_t and gcc catches this as an error.
Reviewed-by: jcoomes, ysr, phh

! src/share/vm/memory/allocation.cpp

Changeset: effb55808a18
Author: johnc
Date: 2010-08-18 17:44 -0400
URL: http://hg.openjdk.java.net/jdk7/hotspot-comp/hotspot/rev/effb55808a18

Merge

Changeset: 1b0104ab1e5e
Author: tonyp
Date: 2010-08-19 14:08 -0400
URL: http://hg.openjdk.java.net/jdk7/hotspot-comp/hotspot/rev/1b0104ab1e5e

Merge

Changeset: 30266066c77c
Author: cl
Date: 2010-08-19 15:13 -0700
URL: http://hg.openjdk.java.net/jdk7/hotspot-comp/hotspot/rev/30266066c77c

Added tag jdk7-b106 for changeset 1b81ca701fa5

! .hgtags

Changeset: 295c3ae4ab5b
Author: trims
Date: 2010-08-19 18:51 -0700
URL: http://hg.openjdk.java.net/jdk7/hotspot-comp/hotspot/rev/295c3ae4ab5b

Added tag hs19-b05 for changeset cc3fdfeb54b0

! .hgtags

Changeset: bf496cbe9b74
Author: trims
Date: 2010-08-19 18:51 -0700
URL: http://hg.openjdk.java.net/jdk7/hotspot-comp/hotspot/rev/bf496cbe9b74

Added tag hs19-b06 for changeset 688a538aa654

! .hgtags

Changeset: 0e509ddd9962
Author: trims
Date: 2010-08-20 03:47 -0700
URL: http://hg.openjdk.java.net/jdk7/hotspot-comp/hotspot/rev/0e509ddd9962

6978726: Bump the HS19 build number to 07
Summary: Update the HS19 build number to 07
Reviewed-by: jcoomes

! make/hotspot_version

Changeset: 09cdb1e1c77b
Author: trims
Date: 2010-08-20 04:08 -0700
URL: http://hg.openjdk.java.net/jdk7/hotspot-comp/hotspot/rev/09cdb1e1c77b

Merge

- src/share/vm/gc_implementation/parallelScavenge/prefetchQueue.hpp

Changeset: ee5cc9e78493
Author: never
Date: 2010-08-20 09:55 -0700
URL: http://hg.openjdk.java.net/jdk7/hotspot-comp/hotspot/rev/ee5cc9e78493

Merge

! src/share/vm/runtime/arguments.cpp
! src/share/vm/runtime/thread.cpp

Changeset: 52f2bc645da5
Author: ysr
Date: 2010-08-19 12:02 -0700
URL: http://hg.openjdk.java.net/jdk7/hotspot-comp/hotspot/rev/52f2bc645da5

6978533: CMS: Elide BOT update asserts until 6977974 is fixed correctly
Reviewed-by: jcoomes, jmasa, tonyp

!
src/share/vm/memory/blockOffsetTable.hpp

Changeset: 66b9f90a9211
Author: tonyp
Date: 2010-08-20 13:17 -0400
URL: http://hg.openjdk.java.net/jdk7/hotspot-comp/hotspot/rev/66b9f90a9211

Merge

Changeset: 26faca352942
Author: tonyp
Date: 2010-08-20 12:01 -0700
URL: http://hg.openjdk.java.net/jdk7/hotspot-comp/hotspot/rev/26faca352942

Merge

Changeset: 571f6b35140b
Author: trims
Date: 2010-08-20 12:57 -0700
URL: http://hg.openjdk.java.net/jdk7/hotspot-comp/hotspot/rev/571f6b35140b

6978889: Remove premature change of build number to Hotspot 19 Build 07
Summary: Change the build number back to 06
Reviewed-by: jcoomes

! make/hotspot_version

Changeset: b0b9d64ed9bc
Author: trims
Date: 2010-08-20 14:24 -0700
URL: http://hg.openjdk.java.net/jdk7/hotspot-comp/hotspot/rev/b0b9d64ed9bc

6978915: Remove Mercurial tags for Hotspot 19 Build 06
Summary: Delete the hs19-b06 Hg tag, as it was put on incorrectly
Reviewed-by: jcoomes

! .hgtags

Changeset: f8c5d1bdaad4
Author: ptisnovs
Date: 2010-08-19 14:23 -0400
URL: http://hg.openjdk.java.net/jdk7/hotspot-comp/hotspot/rev/f8c5d1bdaad4

6885308: The incorrect -XX:StackRedPages, -XX:StackShadowPages, -XX:StackYellowPages could cause VM crash
Summary: Test minimal stack sizes given (also fixed linux compilation error)
Reviewed-by: never, phh, coleenp

! src/share/vm/memory/allocation.cpp
! src/share/vm/runtime/arguments.cpp
! src/share/vm/runtime/arguments.hpp

Changeset: ebfb7c68865e
Author: dcubed
Date: 2010-08-23 08:44 -0700
URL: http://hg.openjdk.java.net/jdk7/hotspot-comp/hotspot/rev/ebfb7c68865e

Merge

! src/share/vm/memory/allocation.cpp
! src/share/vm/runtime/arguments.cpp

Changeset: b4099f5786da
Author: never
Date: 2010-08-25 10:31 -0700
URL: http://hg.openjdk.java.net/jdk7/hotspot-comp/hotspot/rev/b4099f5786da

Merge

!
src/share/vm/runtime/globals.hpp

From christian.thalinger at oracle.com Thu Aug 26 00:47:20 2010
From: christian.thalinger at oracle.com (Christian Thalinger)
Date: Thu, 26 Aug 2010 09:47:20 +0200
Subject: Request for reviews .02 (M): 6961697: move nmethod constants section before instruction section
In-Reply-To: <4C755966.5030206@oracle.com>
References: <1278520980.17142.3.camel@macbook> <4C351442.1090907@oracle.com> <1278576028.19588.13.camel@macbook> <4C3608FA.2040406@oracle.com> <1278616722.1475.134.camel@macbook> <1282243111.29965.24.camel@macbook> <4C6E0CCD.8080800@oracle.com> <1282742183.28481.19.camel@macbook> <4C755966.5030206@oracle.com>
Message-ID: <1282808840.4740.18.camel@macbook>

On Wed, 2010-08-25 at 10:56 -0700, Vladimir Kozlov wrote:
> Christian Thalinger wrote:
> >
> >> What about FIXME commented assert in relocInfo.cpp?
> >
> > I removed that in 6978355 but added it again here. I can't remember if
> > that was by accident or I really hit the assert. Should I uncomment it
> > again?
>
> Could you investigate it. If you hit the assert then you need to have
> at least the comment which explains what happened.

I'll try. I actually never hit that assert and John Rose didn't add a comment when commenting it out.

-- Christian

From christian.thalinger at oracle.com Thu Aug 26 02:41:03 2010
From: christian.thalinger at oracle.com (Christian Thalinger)
Date: Thu, 26 Aug 2010 11:41:03 +0200
Subject: Request for reviews (XS): 6974682: CTW: assert(target != NULL) failed: must not be null
In-Reply-To:
References: <4C59E3C4.4090102@oracle.com>
Message-ID: <1282815663.4740.20.camel@macbook>

On Wed, 2010-08-04 at 17:03 -0700, Tom Rodriguez wrote:
> Your change seems ok but that const_size code is crap. It always greatly overestimates the space needed.
>
> #ifdef SPARC
> // Sparc doubles entries in the constant table require more space for
> // alignment. (expires 9/98)
> int table_entries = (3 * instr->num_consts( _globalNames, Form::idealD ))
> + instr->num_consts( _globalNames, Form::idealF );
> #else
> int table_entries = instr->num_consts( _globalNames, Form::idealD )
> + instr->num_consts( _globalNames, Form::idealF );
> #endif
>
> So on sparc a double reserves 6 words. It can/should be cleaned up as part of the changes Christian is working on.

Working on that.

-- Christian

From vladimir.kozlov at oracle.com Thu Aug 26 11:09:40 2010
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 26 Aug 2010 11:09:40 -0700
Subject: Request for reviews .02 (M): 6961697: move nmethod constants section before instruction section
In-Reply-To: <1282808840.4740.18.camel@macbook>
References: <1278520980.17142.3.camel@macbook> <4C351442.1090907@oracle.com> <1278576028.19588.13.camel@macbook> <4C3608FA.2040406@oracle.com> <1278616722.1475.134.camel@macbook> <1282243111.29965.24.camel@macbook> <4C6E0CCD.8080800@oracle.com> <1282742183.28481.19.camel@macbook> <4C755966.5030206@oracle.com> <1282808840.4740.18.camel@macbook>
Message-ID: <4C76ADE4.80600@oracle.com>

Christian,

I thought you hit it before. Then just uncomment it and push your changes. We can file a new bug if we hit it again.

Thanks,
Vladimir

Christian Thalinger wrote:
> On Wed, 2010-08-25 at 10:56 -0700, Vladimir Kozlov wrote:
>> Christian Thalinger wrote:
>>>> What about FIXME commented assert in relocInfo.cpp?
>>> I removed that in 6978355 but added it again here. I can't remember if
>>> that was by accident or I really hit the assert. Should I uncomment it
>>> again?
>> Could you investigate it. If you hit the assert then you need to have
>> at least the comment which explains what happened.
>
> I'll try. I actually never hit that assert and John Rose didn't add a
> comment when commenting it out.
>
> -- Christian
>

From tom.rodriguez at oracle.com Thu Aug 26 12:45:47 2010
From: tom.rodriguez at oracle.com (Tom Rodriguez)
Date: Thu, 26 Aug 2010 12:45:47 -0700
Subject: higher_equal and the widen bits
Message-ID: <12C664EE-276D-4906-86FE-7BF8B92185B3@oracle.com>

I was using higher_equal(TypeInt::POS) to identify positive values and came across a weird bug. I have a type which is int:>=0:www but t->higher_equal(TypeInt::POS) returns false because they have differing widen bits. Shouldn't the widen bits be ignored in the higher_equal tests or could that create monotonicity problems? I've written it in a more explicit fashion but it makes me suspicious of our other uses of this idiom.

tom

From john.r.rose at oracle.com Thu Aug 26 13:05:22 2010
From: john.r.rose at oracle.com (John Rose)
Date: Thu, 26 Aug 2010 13:05:22 -0700
Subject: higher_equal and the widen bits
In-Reply-To: <12C664EE-276D-4906-86FE-7BF8B92185B3@oracle.com>
References: <12C664EE-276D-4906-86FE-7BF8B92185B3@oracle.com>
Message-ID:

On Aug 26, 2010, at 12:45 PM, Tom Rodriguez wrote:

> I was using higher_equal(TypeInt::POS) to identify positive values and came across a weird bug. I have a type which is int:>=0:www but t->higher_equal(TypeInt::POS) returns false because they have differing widen bits. Shouldn't the widen bits be ignored in the higher_equal tests or could that create monotonicity problems? I've written it in a more explicit fashion but it makes me suspicious of our other uses of this idiom.

The global constant TypeInt::POS is already a very wide type, in the general sense of "wide".

(FTR, "wide" here means "tends to absorb other, 'narrower' types in Type::meet". And that terminology FTR is backwards with respect to much of the literature, a fact which confuses even veteran C2-ers. It helps if you think concretely in terms of assertion sets, which are dual to value sets. The more assertions, the fewer values, and vice versa.
Joining a pair of types unions a pair of assertion sets, which narrows the resulting value set to the intersection of the original two value sets.)

I doubt whether it is worthwhile allowing it to be less than WidenMax. If it is always being used as a lower/wider bound (as in the case you mention) there is no value to allowing the global constant to be less than WidenMax.

This is probably true of POS, POS1, and SYMINT also. If they are only used as higher_equal bounds and operands to join, perhaps they should all have WidenMax instead of WidenMin bits.

(FTR, the purpose of the widen bits is to detect data flow loops in CCP which tend to create sequences of slowly widening type values of the form [0..1], [0..2], [0..3], ..., [0..maxint], [minint..maxint]. This is typical with loop iteration variables, as the dataflow solver works around the loop. The intermediate steps are probably not interesting, and the widen bits help us decide to skip most of the steps after the first few. See TypeInt::widen.)

-- John

From tom.rodriguez at oracle.com Thu Aug 26 16:17:33 2010
From: tom.rodriguez at oracle.com (Tom Rodriguez)
Date: Thu, 26 Aug 2010 16:17:33 -0700
Subject: higher_equal and the widen bits
In-Reply-To:
References: <12C664EE-276D-4906-86FE-7BF8B92185B3@oracle.com>
Message-ID: <4ED669A2-4874-4C7B-B3C3-1045E238170B@oracle.com>

On Aug 26, 2010, at 1:05 PM, John Rose wrote:

> On Aug 26, 2010, at 12:45 PM, Tom Rodriguez wrote:
>
>> I was using higher_equal(TypeInt::POS) to identify positive values and came across a weird bug. I have a type which is int:>=0:www but t->higher_equal(TypeInt::POS) returns false because they have differing widen bits. Shouldn't the widen bits be ignored in the higher_equal tests or could that create monotonicity problems? I've written it in a more explicit fashion but it makes me suspicious of our other uses of this idiom.
>
> The global constant TypeInt::POS is already a very wide type, in the general sense of "wide".
>
> (FTR, "wide" here means "tends to absorb other, 'narrower' types in Type::meet". And that terminology FTR is backwards with respect to much of the literature, a fact which confuses even veteran C2-ers. It helps if you think concretely in terms of assertion sets, which are dual to value sets. The more assertions, the fewer values, and vice versa. Joining a pair of types unions a pair of assertion sets, which narrows the resulting value set to the intersection of the original two value sets.)
>
> I doubt whether it is worthwhile allowing it to be less than WidenMax. If it is always being used as a lower/wider bound (as in the case you mention) there is no value to allowing the global constant to be less than WidenMax.
>
> This is probably true of POS, POS1, and SYMINT also. If they are only used as higher_equal bounds and operands to join, perhaps they should all have WidenMax instead of WidenMin bits.

So since higher_equal is cmp(meet(t), t), if t, which is TypeInt::POS, had WidenMax, then this->meet(t) would also have WidenMax, so it would work as expected for any widen bits. I notice that the widest integer types TypeInt::INT and TypeLong::LONG are initialized with WidenMax.

TypeInt::INT = TypeInt::make(min_jint,max_jint, WidenMax); // 32-bit integers
TypeLong::LONG = TypeLong::make(min_jlong,max_jlong,WidenMax); // 64-bit integers

Presumably this is because they are already as wide as they can get. Making POS, POS1 and SYMINT WidenMax seems ok since more detailed widening of those types isn't that interesting.

It still bugs me that higher_equal considers the widen bits. Some of our tests are against singletons so there's no problem there but many of the tests are against computed types. I'm going to add a little logic to higher_equal to see if it fails very often for types which differ only by the widen bits.

tom

>
> (FTR, the purpose of the widen bits is to detect data flow loops in CCP which tend to create sequences of slowly widening type values of the form [0..1], [0..2], [0..3], ..., [0..maxint], [minint..maxint]. This is typical with loop iteration variables, as the dataflow solver works around the loop. The intermediate steps are probably not interesting, and the widen bits help us decide to skip most of the steps after the first few. See TypeInt::widen.)
>
> -- John

From vladimir.kozlov at oracle.com Thu Aug 26 16:23:30 2010
From: vladimir.kozlov at oracle.com (vladimir.kozlov at oracle.com)
Date: Thu, 26 Aug 2010 23:23:30 +0000
Subject: hg: jdk7/hotspot-comp/hotspot: 6976400: "Meet Not Symmetric"
Message-ID: <20100826232333.83D8347480@hg.openjdk.java.net>

Changeset: 14b92b91f460
Author: kvn
Date: 2010-08-26 11:05 -0700
URL: http://hg.openjdk.java.net/jdk7/hotspot-comp/hotspot/rev/14b92b91f460

6976400: "Meet Not Symmetric"
Summary: Use NULL as klass for TypeAryPtr::RANGE. Add klass verification into TypeAryPtr ctor.
Reviewed-by: never

! src/share/vm/opto/type.cpp
! src/share/vm/opto/type.hpp

From Christian.Thalinger at Sun.COM Fri Aug 27 04:20:34 2010
From: Christian.Thalinger at Sun.COM (Christian.Thalinger at Sun.COM)
Date: Fri, 27 Aug 2010 11:20:34 +0000
Subject: hg: jdk7/hotspot-comp/hotspot: 6961697: move nmethod constants section before instruction section
Message-ID: <20100827112036.7E14F474A8@hg.openjdk.java.net>

Changeset: 0878d7bae69f
Author: twisti
Date: 2010-08-27 01:51 -0700
URL: http://hg.openjdk.java.net/jdk7/hotspot-comp/hotspot/rev/0878d7bae69f

6961697: move nmethod constants section before instruction section
Summary: This is a preparation for 6961690.
Reviewed-by: kvn, never

! src/share/vm/asm/codeBuffer.cpp
! src/share/vm/asm/codeBuffer.hpp
! src/share/vm/code/codeBlob.cpp
! src/share/vm/code/nmethod.cpp
! src/share/vm/code/nmethod.hpp
! src/share/vm/code/relocInfo.cpp
!
src/share/vm/code/relocInfo.hpp

From tom.rodriguez at oracle.com Fri Aug 27 10:18:38 2010
From: tom.rodriguez at oracle.com (Tom Rodriguez)
Date: Fri, 27 Aug 2010 10:18:38 -0700
Subject: review (M) for 4809552: Optimize Arrays.fill(...)
In-Reply-To: <296DD407-B744-48C2-9C37-318CC69BA251@oracle.com>
References: <8DF430F7-BA3F-40EC-958F-9A5DDFE09227@oracle.com> <4C6DEC89.8030202@oracle.com> <4C6EC02C.7050300@oracle.com> <4C6ECD8B.7080706@oracle.com> <296DD407-B744-48C2-9C37-318CC69BA251@oracle.com>
Message-ID:

On Aug 20, 2010, at 12:49 PM, Tom Rodriguez wrote:

> It seems like a bit of a mixed bag. Moderately sized fills are slightly slower because of the extra alignment but larger fills are faster. I'll play with it some more.

16 byte alignment is better in some ways and worse in others so I'd like to leave it as is with the 8 byte alignment. I've made all the changes from your earlier review and rewritten match_fill_loop to check all the conditions required. I made a minor change to AddPNode::unpack_offsets to make sure it only claims success if it can fully unpack the offsets. Previously it would just give up if it encountered something unexpected but still claim success. The sparc code now has most delay slots filled. I also reran all the tests and everything looks good.

tom

>
> tom
>
> On Aug 20, 2010, at 11:46 AM, Vladimir Kozlov wrote:
>
>> Tom Rodriguez wrote:
>>>> Use next movdqa since you aligned address:
>>>> + movdqa(Address(to, 0), xtmp);
>>>> + movdqa(Address(to, 16), xtmp);
>>>>
>>>> instead of
>>>> + movq(Address(to, 0), xtmp);
>>>> + movq(Address(to, 8), xtmp);
>>>> + movq(Address(to, 16), xtmp);
>>>> + movq(Address(to, 24), xtmp);
>>> But it's only aligned to 8 bytes, not 16. Maybe it would be worth it to align to 16?
>>
>> Sorry, you are right, it requires 16 not 8 bytes. :(
>> I think it is worth aligning to 16 since it will benefit all x86.
>>
>> movdl(xtmp, value);
>> pshufd(xtmp, xtmp, 0);
>>
>> + // align to 16 bytes, we know we are 8 byte aligned to start
>> + Label L_skip_align16;
>> + testptr(to, 8);
>> + jccb(Assembler::zero, L_skip_align16);
>> + subl(count, 2<<shift);
>> + jcc(Assembler::below, L_copy_4_bytes); // Short arrays (< 8 bytes)
>> + movq(Address(to, 0), xtmp);
>> + addptr(to, 8);
>> + BIND(L_skip_align16);
>>
>> subl(count, 8 << shift);
>> jcc(Assembler::less, L_check_fill_8_bytes);
>> align(16);
>>
>> Vladimir
>>
>>> tom
>>>> Vladimir
>>>>
>>>> Vladimir Kozlov wrote:
>>>>> Tom,
>>>>> First, I would not call these changes Medium. They are Large at least.
>>>>> Should we allow OptimizeFill only when UseLoopPredicate is true?
>>>>> loopTransform.cpp:
>>>>> In match_fill_loop() should we exclude StoreCMNode also?
>>>>> RAW store check is hidden in as_AddP()->unpack_offsets(). Should we do it explicitly?
>>>>> store and store_value is not set for "copy candidate":
>>>>> + if (value->is_Load() && lpt->_body.contains(value)) {
>>>>> + // tty->print_cr("possible copy candidate");
>>>>> + } else {
>>>>> + msg = "variant store value";
>>>>> + }
>>>>> Why you assume that on 'else' it is mem_phi?:
>>>>> + if (n == head->phi()) {
>>>>> + // ok
>>>>> + } else {
>>>>> + // mem_phi
>>>>> + }
>>>>> Should we also skip proj node (ifFalse) or it is not part of loop body?
>>>>> + } else if (n->is_CountedLoopEnd()) {
>>>>> + // ok so skip it.
>>>>> + msg = "node used outside loop";
>>>>> ^ is
>>>>> How you translate next assert message?:
>>>>> + assert(store_value->is_Load(), "shouldn't only happen for this case");
>>>>> the next dump should be under flag and 'msg' should reflect "possible copy" or set msg_node:
>>>>> + #ifdef ASSERT
>>>>> + tty->print_cr("possible copy");
>>>>> + store_value->dump();
>>>>> + store->dump();
>>>>> + #endif
>>>>> + msg = "variant store in loop";
>>>>> For Op_LShiftX there is no check (n->in(1) == head->phi()):
>>>>> + } else if (n->Opcode() == Op_LShiftX) {
>>>>> + shift = n;
>>>>> + assert(type2aelembytes(store->as_Mem()->memory_type(), true) == 1 << shift->in(2)->get_int(), "scale should match");
>>>>> s_offs already includes base_offset, see GraphKit::array_element_address():
>>>>> + aligned = ((arrayOopDesc::base_offset_in_bytes(t) + s_offs * element_size) % HeapWordSize == 0);
>>>>> Also the above expression is wrong if initial index != 0.
>>>>> And actually you don't need to calculate it in match_fill_loop() since
>>>>> it is used only in call to StubRoutines::select_fill_function() to verify
>>>>> that element type is supported.
>>>>> In intrinsify_fill() initial index value is taking into account for aligned
>>>>> but base_offset_in_bytes could be already part of offset and you need
>>>>> to multiply by element_size only initial index:
>>>>> + if (offset != NULL && head->init_trip()->is_Con()) {
>>>>> + intptr_t offs = offset->find_intptr_t_type()->get_con() + head->init_trip()->get_int();
>>>>> + int element_size = type2aelembytes(t);
>>>>> + aligned = ((arrayOopDesc::base_offset_in_bytes(t) + offs * element_size) % HeapWordSize == 0);
>>>>> + }
>>>>> stubRoutines.cpp:
>>>>> why you have specialized copies for testing _jint_fill and _jbyte_fill. Is not it covered by TEST_FILL already?
>>>>> stubGenerator_sparc.cpp:
>>>>> + // Generate stub for disjoint short fill. If "aligned" is true, the
>>>>> ^ Generate stub for array fill.
>>>>> + // from: O0
>>>>> ^ to
>>>>> + // to: O1
>>>>> ^ value
>>>>> O5 is not used and not input argument:
>>>>> + const Register offset = O5; // offset from start of arrays
>>>>> stubs are generated only for byte, short and int, so allowing boolean, char and float is wrong:
>>>>> + switch (t) {
>>>>> + case T_BOOLEAN:
>>>>> + case T_BYTE:
>>>>> + shift = 2;
>>>>> + break;
>>>>> + case T_CHAR:
>>>>> + case T_SHORT:
>>>>> + shift = 1;
>>>>> + break;
>>>>> + case T_FLOAT:
>>>>> + case T_INT:
>>>>> + shift = 0;
>>>>> + break;
>>>>> + default: ShouldNotReachHere();
>>>>> + }
>>>>> The same in assembler_x86.cpp
>>>>> In stubGenerator_x86_64.cpp
>>>>> new fill_32_bytes_forward() is not used.
>>>>> Remove commented code for T_LONG in both stubGenerator_x86_??.cpp
>>>>> I did not look on assembler. May be tomorrow.
>>>>> Thanks,
>>>>> Vladimir
>>>>> Tom Rodriguez wrote:
>>>>>> 4809552: Optimize Arrays.fill(...)
>>>>>> Reviewed-by:
>>>>>>
>>>>>> This adds new logic to recognize fill idioms and convert them into a
>>>>>> call to an optimized fill routine. Loop predication creates easily
>>>>>> matched loops that are simply replaced with calls to the new assembly
>>>>>> stubs. Currently only 1,2 and 4 byte primitive types are supported.
>>>>>> Objects and longs/double will be supported in a later putback. Tested
>>>>>> with runthese, nsk and ctw plus jbb2005.
>>>>>>
>>>>>> http://cr.openjdk.java.net/~never/4809552

From vladimir.kozlov at oracle.com Fri Aug 27 11:56:40 2010
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Fri, 27 Aug 2010 11:56:40 -0700
Subject: review (M) for 4809552: Optimize Arrays.fill(...)
In-Reply-To:
References: <8DF430F7-BA3F-40EC-958F-9A5DDFE09227@oracle.com> <4C6DEC89.8030202@oracle.com> <4C6EC02C.7050300@oracle.com> <4C6ECD8B.7080706@oracle.com> <296DD407-B744-48C2-9C37-318CC69BA251@oracle.com>
Message-ID: <4C780A68.1030600@oracle.com>

Tom,

loopTransform.cpp:
In PhaseIdealLoop::match_fill_loop() you are filtering out If nodes which are not the loop exit during the store search. And later you do a more robust search for unhandled nodes in the loop.
Could you move at least part of it into the first loop? You have a lot of code in between which would be wasted if you find later that the loop has unhandled nodes.
But it is fine with me if you don't want to do it, the current code works.

addnode.cpp:
Your changes will not work with a raw object's field reference, I think you should allow base == top.

+ if (addr != base && !base->is_top()) {
+ return -1;
+ }

Otherwise it looks good.

Vladimir

Tom Rodriguez wrote:
> On Aug 20, 2010, at 12:49 PM, Tom Rodriguez wrote:
>
>> It seems like a bit of a mixed bag. Moderately sized fills are slightly slower because of the extra alignment but larger fills are faster. I'll play with it some more.
>
> 16 byte alignment is better in some ways and worse in others so I'd like to leave as is with the 8 byte alignment. I've made all the changes from your earlier review and rewrote match_fill_loop to check all the conditions required. I made a minor change to AddPNode::unpack_offsets to make sure it only claims success if it can fully unpack the offsets. Previously it would just give up if it encountered something unexpected but still claim success. The sparc code now has most delay slots filled. I also reran all the tests and everything looks good.
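
The contract described in the quoted message for AddPNode::unpack_offsets (report success only when every level of the address could be unpacked, and give up otherwise) can be illustrated with a minimal, self-contained C++ sketch. This is not HotSpot code: the `Addr` struct, its fields, and the free function `unpack_offsets` below are hypothetical stand-ins. Modelling a `top` base as `nullptr` and rejecting it is an assumption about how a raw address would be treated, which is the open question in this message.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for an address expression; NOT HotSpot's Node class.
struct Addr {
  const Addr* base;     // object the address is relative to; nullptr models "top" (raw memory)
  const Addr* address;  // inner address this level adds to; nullptr for a plain base object
  long offset;          // offset added at this level
  bool is_add() const { return address != nullptr; }
};

// Collects the offsets of a chain of address additions into `out`.
// Returns the number of offsets collected, or -1 if the chain cannot be
// fully unpacked. A partial result is never reported as success.
int unpack_offsets(const Addr* addr, long out[], int capacity) {
  const Addr* base = addr->base;
  if (base == nullptr) return -1;        // assumption: raw address has no well-defined base
  int count = 0;
  while (addr->is_add()) {
    if (addr->base != base) return -1;   // levels disagree about the base
    if (count == capacity) return -1;    // more offsets than the caller can hold
    out[count++] = addr->offset;
    addr = addr->address;
  }
  if (addr != base) return -1;           // chain did not bottom out at the base
  return count;
}
```

For a chain such as `obj + 16 + 4` the sketch reports two offsets, outermost first. A chain whose levels disagree about the base, or whose base is missing, yields -1 rather than a partial list.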

From tom.rodriguez at oracle.com Fri Aug 27 14:06:47 2010
From: tom.rodriguez at oracle.com (Tom Rodriguez)
Date: Fri, 27 Aug 2010 14:06:47 -0700
Subject: review (M) for 4809552: Optimize Arrays.fill(...)
In-Reply-To: <4C780A68.1030600@oracle.com>
References: <8DF430F7-BA3F-40EC-958F-9A5DDFE09227@oracle.com> <4C6DEC89.8030202@oracle.com> <4C6EC02C.7050300@oracle.com> <4C6ECD8B.7080706@oracle.com> <296DD407-B744-48C2-9C37-318CC69BA251@oracle.com> <4C780A68.1030600@oracle.com>
Message-ID:

On Aug 27, 2010, at 11:56 AM, Vladimir Kozlov wrote:

> Tom,
>
> loopTransform.cpp:
> In PhaseIdealLoop::match_fill_loop() you are filtering out If nodes which is not loopexit during store search. And later you do more robust search for unhandled nodes in loop.
> Could you move at least part of it into the first loop? You have a lot of code in between which would be wasted if you find later that loop has unhandled nodes.

Originally I kept the initial loop as simple as possible but I decided to add the single check for Ifs so it would rule out simple cases more quickly. Usually if you get past that it will succeed except in a few rare cases. So I'd like to leave it alone.

> But it is fine with me if you don't want to do it, current code works.
>
> addnode.cpp:
> you changes will not work with raw object's field reference, I think you should allow base == top.
>
> + if (addr != base && !base->is_top()) {
> + return -1;
> + }

It's never used for raw right now and I'm not sure it can be safely used with raw either.
I think the contract for unpack_offsets must be that it returns the components that go together to make up the offsets relative to some base and I don't see any way to do that for raw since it doesn't have a well defined base. I can either assert that !base->is_top() or check for it early and return -1. What do you think?

tom

From vladimir.kozlov at oracle.com Fri Aug 27 14:44:09 2010
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Fri, 27 Aug 2010 14:44:09 -0700
Subject: review (M) for 4809552: Optimize Arrays.fill(...)
In-Reply-To:
References: <8DF430F7-BA3F-40EC-958F-9A5DDFE09227@oracle.com> <4C6DEC89.8030202@oracle.com> <4C6EC02C.7050300@oracle.com> <4C6ECD8B.7080706@oracle.com> <296DD407-B744-48C2-9C37-318CC69BA251@oracle.com> <4C780A68.1030600@oracle.com>
Message-ID: <4C7831A9.3010901@oracle.com>

Tom Rodriguez wrote:
> On Aug 27, 2010, at 11:56 AM, Vladimir Kozlov wrote:
>
>> Tom,
>>
>> loopTransform.cpp:
>> In PhaseIdealLoop::match_fill_loop() you are filtering out If nodes which is not loopexit during store search. And later you do more robust search for unhandled nodes in loop.
>> Could you move at least part of it into the first loop? You have a lot of code in between which would be wasted if you find later that loop has unhandled nodes.
>
> Originally I kept the initial loop as simple as possible but I decided to add the single check for Ifs so it would rule out simple cases more quickly. Usually if you get past that it will succeed except in a few rare cases. So I'd like to leave it alone.

OK.

>
>> But it is fine with me if you don't want to do it, current code works.
>>
>> addnode.cpp:
>> you changes will not work with raw object's field reference, I think you should allow base == top.
>>
>> + if (addr != base && !base->is_top()) {
>> + return -1;
>> + }
>
> It's never used for raw right now and I'm not sure it can be safely used with raw either. I think the contract for unpack_offsets must be that it returns the components that go together to make up the offsets relative to some base and I don't see any way to do that for raw since it doesn't have a well defined base. I can either assert that !base->is_top() or check for it early and return -1. What do you think?

Sorry, I somehow thought it should behave like Ideal_base_and_offset() (yes, weird :) ). But unpack_offsets() is only used in eliminate autobox and in the new code. In both cases it should return -1 for Raw. So leave the code as it is now.

Your changes are good to push.

Thanks,
Vladimir

From tom.rodriguez at oracle.com Fri Aug 27 14:52:12 2010
From: tom.rodriguez at oracle.com (Tom Rodriguez)
Date: Fri, 27 Aug 2010 14:52:12 -0700
Subject: review (M) for 4809552: Optimize Arrays.fill(...)
In-Reply-To: <4C7831A9.3010901@oracle.com>
References: <8DF430F7-BA3F-40EC-958F-9A5DDFE09227@oracle.com> <4C6DEC89.8030202@oracle.com> <4C6EC02C.7050300@oracle.com> <4C6ECD8B.7080706@oracle.com> <296DD407-B744-48C2-9C37-318CC69BA251@oracle.com> <4C780A68.1030600@oracle.com> <4C7831A9.3010901@oracle.com>
Message-ID:

On Aug 27, 2010, at 2:44 PM, Vladimir Kozlov wrote:

>
>
> Tom Rodriguez wrote:
>> On Aug 27, 2010, at 11:56 AM, Vladimir Kozlov wrote:
>>> Tom,
>>>
>>> loopTransform.cpp:
>>> In PhaseIdealLoop::match_fill_loop() you are filtering out If nodes which is not loopexit during store search. And later you do more robust search for unhandled nodes in loop.
>>> Could you move at least part of it into the first loop? You have a lot of code in between which would be wasted if you find later that loop has unhandled nodes.
>> Originally I kept the initial loop as simple as possible but I decided to add the single check for Ifs so it would rule out simple cases more quickly. Usually if you get past that it will succeed except in a few rare cases. So I'd like to leave it alone.
>
> OK.
>
>>> But it is fine with me if you don't want to do it, current code works.
>>>
>>> addnode.cpp:
>>> you changes will not work with raw object's field reference, I think you should allow base == top.
>>>
>>> + if (addr != base && !base->is_top()) {
>>> + return -1;
>>> + }
>> It's never used for raw right now and I'm not sure it can be safely used with raw either. I think the contract for unpack_offsets must be that it returns the components that go together to make up the offsets relative to some base and I don't see any way to do that for raw since it doesn't have a well defined base. I can either assert that !base->is_top() or check for it early and return -1. What do you think?
>
> Sorry, I somehow thought it should behave line Ideal_base_and_offset() (yes, weird :) ).

That interface doesn't do anything useful for array addressing.

> But unpack_offsets() is only used in eliminate autobox and in the new code. In both case it should return -1 for Raw. So leave code as it now.
>
> Your changes good to push.

Thanks for the review.

tom

>
> Thanks,
> Vladimir
>
>> tom
>>> Otherwise it looks good.
>>>
>>> Vladimir
>>>
>>> Tom Rodriguez wrote:
>>>> On Aug 20, 2010, at 12:49 PM, Tom Rodriguez wrote:
>>>>> It seems like a bit of a mixed bag. Moderately sized fills are slightly slower because of the extra alignment but larger fills are faster. I'll play with it some more.
>>>> 16 byte alignment is better in some ways and worse in others so I'd like to leave as is with the 8 byte alignment. I've made all the changes from your earlier review and rewrote match_fill_loop to check all the conditions required. I made a minor change to AddPNode::unpack_offsets to make sure it only claims success if it can fully unpack the offsets. Previously it would just give up if it encountered something unexpected but still claim success. The sparc code now has most delay slots filled. I also reran all the tests and everything looks good.
>>>> tom
>>>>> tom
>>>>>
>>>>> On Aug 20, 2010, at 11:46 AM, Vladimir Kozlov wrote:
>>>>>
>>>>>> Tom Rodriguez wrote:
>>>>>>>> Use next movdqa since you aligned address:
>>>>>>>> + movdqa(Address(to, 0), xtmp);
>>>>>>>> + movdqa(Address(to, 16), xtmp);
>>>>>>>>
>>>>>>>> instead of
>>>>>>>> + movq(Address(to, 0), xtmp);
>>>>>>>> + movq(Address(to, 8), xtmp);
>>>>>>>> + movq(Address(to, 16), xtmp);
>>>>>>>> + movq(Address(to, 24), xtmp);
>>>>>>> But it's only aligned to 8 bytes, not 16. maybe it would be worth it to align to 16?
>>>>>> Sorry, you are right, it requires 16 not 8 bytes. :(
>>>>>> I think it worth to align to 16 since it will benefit all x86.
>>>>>> >>>>>> movdl(xtmp, value); >>>>>> pshufd(xtmp, xtmp, 0); >>>>>> >>>>>> + // align to 16 bytes, we know we are 8 byte aligned to start >>>>>> + Label L_skip_align16; >>>>>> + testptr(to, 8); >>>>>> + jccb(Assembler::zero, L_skip_align16); >>>>>> + subl(count, 2<<shift); >>>>>> + jcc(Assembler::below, L_copy_4_bytes); // Short arrays (< 8 bytes) >>>>>> + movq(Address(to, 0), xtmp); >>>>>> + addptr(to, 8); >>>>>> + BIND(L_skip_align16); >>>>>> >>>>>> subl(count, 8 << shift); >>>>>> jcc(Assembler::less, L_check_fill_8_bytes); >>>>>> align(16); >>>>>> >>>>>> Vladimir >>>>>> >>>>>>> tom >>>>>>>> Vladimir >>>>>>>> >>>>>>>> Vladimir Kozlov wrote: >>>>>>>>> Tom, >>>>>>>>> First, I would not call these changes Medium. They are Large at least. >>>>>>>>> Should we allow OptimizeFill only when UseLoopPredicate is true? >>>>>>>>> loopTransform.cpp: >>>>>>>>> In match_fill_loop() should we exclude StoreCMNode also? >>>>>>>>> RAW store check is hidden in as_AddP()->unpack_offsets(). Should we do it explicitly? >>>>>>>>> store and store_value is not set for "copy candidate": >>>>>>>>> + if (value->is_Load() && lpt->_body.contains(value)) { >>>>>>>>> + // tty->print_cr("possible copy candidate"); >>>>>>>>> + } else { >>>>>>>>> + msg = "variant store value"; >>>>>>>>> + } >>>>>>>>> Why you assume that on 'else' it is mem_phi?: >>>>>>>>> + if (n == head->phi()) { >>>>>>>>> + // ok >>>>>>>>> + } else { >>>>>>>>> + // mem_phi >>>>>>>>> + } >>>>>>>>> Should we also skip proj node (ifFalse) or it is not part of loop body? >>>>>>>>> + } else if (n->is_CountedLoopEnd()) { >>>>>>>>> + // ok so skip it.
>>>>>>>>> + msg = "node used outside loop"; >>>>>>>>> ^ is >>>>>>>>> How you translate next assert message?: >>>>>>>>> + assert(store_value->is_Load(), "shouldn't only happen for this case"); >>>>>>>>> the next dump should be under flag and 'msg' should reflect "possible copy" or set msg_node: >>>>>>>>> + #ifdef ASSERT >>>>>>>>> + tty->print_cr("possible copy"); >>>>>>>>> + store_value->dump(); >>>>>>>>> + store->dump(); >>>>>>>>> + #endif >>>>>>>>> + msg = "variant store in loop"; >>>>>>>>> For Op_LShiftX there is no check (n->in(1) == head->phi()): >>>>>>>>> + } else if (n->Opcode() == Op_LShiftX) { >>>>>>>>> + shift = n; >>>>>>>>> + assert(type2aelembytes(store->as_Mem()->memory_type(), true) == 1 << shift->in(2)->get_int(), "scale should match"); >>>>>>>>> s_offs already includes base_offset, see GraphKit::array_element_address(): >>>>>>>>> + aligned = ((arrayOopDesc::base_offset_in_bytes(t) + s_offs * element_size) % HeapWordSize == 0); >>>>>>>>> Also the above expression is wrong if initial index != 0. >>>>>>>>> And actually you don't need to calculate it in match_fill_loop() since >>>>>>>>> it is used only in call to StubRoutines::select_fill_function() to verify >>>>>>>>> that element type is supported. >>>>>>>>> In intrinsify_fill() initial index value is taking into account for aligned >>>>>>>>> but base_offset_in_bytes could be already part of offset and you need >>>>>>>>> to multiply by element_size only initial index: >>>>>>>>> + if (offset != NULL && head->init_trip()->is_Con()) { >>>>>>>>> + intptr_t offs = offset->find_intptr_t_type()->get_con() + head->init_trip()->get_int(); >>>>>>>>> + int element_size = type2aelembytes(t); >>>>>>>>> + aligned = ((arrayOopDesc::base_offset_in_bytes(t) + offs * element_size) % HeapWordSize == 0); >>>>>>>>> + } >>>>>>>>> stubRoutines.cpp: >>>>>>>>> why you have specialized copies for testing _jint_fill and _jbyte_fill. Is not it covered by TEST_FILL already? 
>>>>>>>>> stubGenerator_sparc.cpp: >>>>>>>>> + // Generate stub for disjoint short fill. If "aligned" is true, the >>>>>>>>> ^ Generate stub for array fill. >>>>>>>>> + // from: O0 >>>>>>>>> ^ to >>>>>>>>> + // to: O1 >>>>>>>>> ^ value >>>>>>>>> O5 is not used and not input argument: >>>>>>>>> + const Register offset = O5; // offset from start of arrays >>>>>>>>> stubs are generated only for byte,short and int, so allowing bollean, char and float is wrong: >>>>>>>>> + switch (t) { >>>>>>>>> + case T_BOOLEAN: >>>>>>>>> + case T_BYTE: >>>>>>>>> + shift = 2; >>>>>>>>> + break; >>>>>>>>> + case T_CHAR: >>>>>>>>> + case T_SHORT: >>>>>>>>> + shift = 1; >>>>>>>>> + break; >>>>>>>>> + case T_FLOAT: >>>>>>>>> + case T_INT: >>>>>>>>> + shift = 0; >>>>>>>>> + break; >>>>>>>>> + default: ShouldNotReachHere(); >>>>>>>>> + } >>>>>>>>> The same in assembler_x86.cpp >>>>>>>>> In stubGenerator_x86_64.cpp >>>>>>>>> new fill_32_bytes_forward() is not used. >>>>>>>>> Remove commented code for T_LONG in both stubGenerator_x86_??.cpp >>>>>>>>> I did not look on assembler. May be tomorrow. >>>>>>>>> Thanks, >>>>>>>>> Vladimir >>>>>>>>> Tom Rodriguez wrote: >>>>>>>>>> 4809552: Optimize Arrays.fill(...) >>>>>>>>>> Reviewed-by: >>>>>>>>>> >>>>>>>>>> This adds new logic to recognize fill idioms and convert them into a >>>>>>>>>> call to an optimized fill routine. Loop predication creates easily >>>>>>>>>> matched loops that are simply replaced with calls to the new assembly >>>>>>>>>> stubs. Currently only 1,2 and 4 byte primitive types are supported. >>>>>>>>>> Objects and longs/double will be supported in a later putback. Tested >>>>>>>>>> with runthese, nsk and ctw plus jbb2005. 
>>>>>>>>>> >>>>>>>>>> http://cr.openjdk.java.net/~never/4809552 From tom.rodriguez at oracle.com Sat Aug 28 12:37:57 2010 From: tom.rodriguez at oracle.com (tom.rodriguez at oracle.com) Date: Sat, 28 Aug 2010 19:37:57 +0000 Subject: hg: jdk7/hotspot-comp/hotspot: 2 new changesets Message-ID: <20100828193800.AA1F8474FA@hg.openjdk.java.net> Changeset: d6f45b55c972 Author: never Date: 2010-08-27 17:33 -0700 URL: http://hg.openjdk.java.net/jdk7/hotspot-comp/hotspot/rev/d6f45b55c972 4809552: Optimize Arrays.fill(...) Reviewed-by: kvn ! src/cpu/sparc/vm/stubGenerator_sparc.cpp ! src/cpu/x86/vm/assembler_x86.cpp ! src/cpu/x86/vm/assembler_x86.hpp ! src/cpu/x86/vm/stubGenerator_x86_32.cpp ! src/cpu/x86/vm/stubGenerator_x86_64.cpp ! src/share/vm/includeDB_compiler2 ! src/share/vm/opto/addnode.cpp ! src/share/vm/opto/c2_globals.hpp ! src/share/vm/opto/loopTransform.cpp ! src/share/vm/opto/loopnode.cpp ! src/share/vm/opto/loopnode.hpp ! src/share/vm/opto/runtime.cpp ! src/share/vm/opto/runtime.hpp ! src/share/vm/runtime/arguments.cpp ! src/share/vm/runtime/stubRoutines.cpp ! src/share/vm/runtime/stubRoutines.hpp ! src/share/vm/utilities/globalDefinitions.hpp Changeset: 14197af1010e Author: never Date: 2010-08-27 17:35 -0700 URL: http://hg.openjdk.java.net/jdk7/hotspot-comp/hotspot/rev/14197af1010e Merge ! src/share/vm/includeDB_compiler2 ! src/share/vm/runtime/arguments.cpp ! src/share/vm/runtime/stubRoutines.cpp From vladimir.kozlov at oracle.com Mon Aug 30 10:40:44 2010 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 30 Aug 2010 10:40:44 -0700 Subject: Request for reviews (S): 6980978: assert(mt == t->xmeet(this)) failed: meet not commutative Message-ID: <4C7BED1C.2030301@oracle.com> http://cr.openjdk.java.net/~kvn/6980978/webrev Fixed 6980978: assert(mt == t->xmeet(this)) failed: meet not commutative TypeAryPtr klass exactness is not symmetric when one of the pointers is a constant array. Also, a non-NULL klass is set for bottom[] in the same case.
Solution: Fix code in TypeAryPtr::xmeet() for constant array. Tested with failed java regression tests. From tom.rodriguez at oracle.com Mon Aug 30 10:47:43 2010 From: tom.rodriguez at oracle.com (Tom Rodriguez) Date: Mon, 30 Aug 2010 10:47:43 -0700 Subject: Request for reviews (S): 6980978: assert(mt == t->xmeet(this)) failed: meet not commutative In-Reply-To: <4C7BED1C.2030301@oracle.com> References: <4C7BED1C.2030301@oracle.com> Message-ID: <922C2BCF-95BD-4F40-9A7D-29D098529E76@oracle.com> Ok. tom On Aug 30, 2010, at 10:40 AM, Vladimir Kozlov wrote: > http://cr.openjdk.java.net/~kvn/6980978/webrev > > Fixed 6980978: assert(mt == t->xmeet(this)) failed: meet not commutative > > TypeAryPtr klass exactness is not symmetric when > one of the pointers is a constant array. > Also, a non-NULL klass is set for bottom[] in the same case. > > Solution: > Fix code in TypeAryPtr::xmeet() for constant array. > > Tested with failed java regression tests. From vladimir.kozlov at oracle.com Mon Aug 30 10:57:36 2010 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 30 Aug 2010 10:57:36 -0700 Subject: Request for reviews (S): 6980978: assert(mt == t->xmeet(this)) failed: meet not commutative In-Reply-To: <922C2BCF-95BD-4F40-9A7D-29D098529E76@oracle.com> References: <4C7BED1C.2030301@oracle.com> <922C2BCF-95BD-4F40-9A7D-29D098529E76@oracle.com> Message-ID: <4C7BF110.8080703@oracle.com> Thank you, Tom Vladimir Tom Rodriguez wrote: > Ok. > > tom > > On Aug 30, 2010, at 10:40 AM, Vladimir Kozlov wrote: > >> http://cr.openjdk.java.net/~kvn/6980978/webrev >> >> Fixed 6980978: assert(mt == t->xmeet(this)) failed: meet not commutative >> >> TypeAryPtr klass exactness is not symmetric when >> one of the pointers is a constant array. >> Also, a non-NULL klass is set for bottom[] in the same case. >> >> Solution: >> Fix code in TypeAryPtr::xmeet() for constant array. >> >> Tested with failed java regression tests.
> From tom.rodriguez at oracle.com Mon Aug 30 13:59:13 2010 From: tom.rodriguez at oracle.com (Tom Rodriguez) Date: Mon, 30 Aug 2010 13:59:13 -0700 Subject: higher_equal and the widen bits In-Reply-To: <4ED669A2-4874-4C7B-B3C3-1045E238170B@oracle.com> References: <12C664EE-276D-4906-86FE-7BF8B92185B3@oracle.com> <4ED669A2-4874-4C7B-B3C3-1045E238170B@oracle.com> Message-ID: <6EAAF6C5-0120-4D82-A86A-3D1BD87CC84D@oracle.com> > So since higher_equal is cmp(meet(t), t), if t, which is TypeInt::POS, had WidenMax, then this->meet(t) would also have WidenMax so it would work as expected for any widen bits. I notice that the widest integer types TypeInt::INT and TypeLong::LONG are initialized with WidenMax. > > TypeInt::INT = TypeInt::make(min_jint,max_jint, WidenMax); // 32-bit integers > TypeLong::LONG = TypeLong::make(min_jlong,max_jlong,WidenMax); // 64-bit integers > > Presumably this is because they are already as wide as they can get. Making POS, POS1 and SYMINT WidenMax seems ok since more detailed widening of those types isn't that interesting. > > It still bugs me that higher_equal considers the widen bits. Some of our tests are against singletons so there's no problem there but many of the tests are against computed types. I'm going to add a little logic to higher_equal to see if it fails very often for types which differ only by the widen bits. It seems fairly rare in real programs, though there's a systematic failure in some superword code where we're looking at the values being stored and see ((i << 24) >> 24) with the type int:-128..127:www which fails the higher_equal(TypeInt::BYTE) test because of the widen bits. I think I'm going to leave this alone. tom > > tom > >> >> (FTR, the purpose of the widen bits is to detect data flow loops in CCP which tend to create sequences of slowly widening type values of the form [0..1], [0..2], [0..3], ..., [0..maxint], [minint..maxint].
This is typical with loop iteration variables, as the dataflow solver works around the loop. The intermediate steps are probably not interesting, and the widen bits help us decide to skip most of the steps after the first few. See TypeInt::widen.) >> >> -- John > From vladimir.kozlov at oracle.com Mon Aug 30 15:48:57 2010 From: vladimir.kozlov at oracle.com (vladimir.kozlov at oracle.com) Date: Mon, 30 Aug 2010 22:48:57 +0000 Subject: hg: jdk7/hotspot-comp/hotspot: 6980978: assert(mt == t->xmeet(this)) failed: meet not commutative Message-ID: <20100830224859.88E1C47563@hg.openjdk.java.net> Changeset: 114e6b93e9e1 Author: kvn Date: 2010-08-30 11:02 -0700 URL: http://hg.openjdk.java.net/jdk7/hotspot-comp/hotspot/rev/114e6b93e9e1 6980978: assert(mt == t->xmeet(this)) failed: meet not commutative Summary: Fix code in TypeAryPtr::xmeet() for constant array. Reviewed-by: never ! src/share/vm/opto/type.cpp From tom.rodriguez at oracle.com Mon Aug 30 17:06:42 2010 From: tom.rodriguez at oracle.com (Tom Rodriguez) Date: Mon, 30 Aug 2010 17:06:42 -0700 Subject: review (XS) for 6969586: OptimizeStringConcat: SIGSEGV in LoadNode::Value() Message-ID: http://cr.openjdk.java.net/~never/6969586 6969586: OptimizeStringConcat: SIGSEGV in LoadNode::Value() Reviewed-by: The logic in memnode that tries to constant fold loads from constant Strings doesn't handle a NULL base properly leading to a SEGV. The fix is to test if the type is a really a constant TypeOopPtr instead of testing that the node is a ConP. Tested with failing CTW test case. From vladimir.kozlov at oracle.com Mon Aug 30 17:13:20 2010 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 30 Aug 2010 17:13:20 -0700 Subject: review (XS) for 6969586: OptimizeStringConcat: SIGSEGV in LoadNode::Value() In-Reply-To: References: Message-ID: <4C7C4920.3010701@oracle.com> Looks good. 
Vladimir Tom Rodriguez wrote: > http://cr.openjdk.java.net/~never/6969586 > > 6969586: OptimizeStringConcat: SIGSEGV in LoadNode::Value() > Reviewed-by: > > The logic in memnode that tries to constant fold loads from constant > Strings doesn't handle a NULL base properly leading to a SEGV. The > fix is to test if the type is a really a constant TypeOopPtr instead > of testing that the node is a ConP. Tested with failing CTW test > case. From tom.rodriguez at oracle.com Mon Aug 30 17:25:05 2010 From: tom.rodriguez at oracle.com (Tom Rodriguez) Date: Mon, 30 Aug 2010 17:25:05 -0700 Subject: review (XS) for 6969586: OptimizeStringConcat: SIGSEGV in LoadNode::Value() In-Reply-To: <4C7C4920.3010701@oracle.com> References: <4C7C4920.3010701@oracle.com> Message-ID: <28537EE4-43C0-404A-8E87-B26846C79C66@oracle.com> Thanks. tom On Aug 30, 2010, at 5:13 PM, Vladimir Kozlov wrote: > Looks good. > > Vladimir > > Tom Rodriguez wrote: >> http://cr.openjdk.java.net/~never/6969586 >> 6969586: OptimizeStringConcat: SIGSEGV in LoadNode::Value() >> Reviewed-by: >> The logic in memnode that tries to constant fold loads from constant >> Strings doesn't handle a NULL base properly leading to a SEGV. The >> fix is to test if the type is a really a constant TypeOopPtr instead >> of testing that the node is a ConP. Tested with failing CTW test >> case. From tom.rodriguez at oracle.com Tue Aug 31 03:14:14 2010 From: tom.rodriguez at oracle.com (tom.rodriguez at oracle.com) Date: Tue, 31 Aug 2010 10:14:14 +0000 Subject: hg: jdk7/hotspot-comp/hotspot: 6969586: OptimizeStringConcat: SIGSEGV in LoadNode::Value() Message-ID: <20100831101426.1B4B44757D@hg.openjdk.java.net> Changeset: 02f0a9b6f654 Author: never Date: 2010-08-30 17:27 -0700 URL: http://hg.openjdk.java.net/jdk7/hotspot-comp/hotspot/rev/02f0a9b6f654 6969586: OptimizeStringConcat: SIGSEGV in LoadNode::Value() Reviewed-by: kvn ! 
src/share/vm/opto/memnode.cpp From vladimir.kozlov at oracle.com Tue Aug 31 18:30:31 2010 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 31 Aug 2010 18:30:31 -0700 Subject: Request for reviews (S): 6981431: IdealKit should support I_O projections Message-ID: <4C7DACB7.9040803@oracle.com> http://cr.openjdk.java.net/~kvn/6981431/webrev Fixed 6981431: IdealKit should support I_O projections IdealKit should support I_O projections to be able generate allocations by IdealKit. Please, verify the new code in IdealKit which merges i_o. Thanks, Vladimir From tom.rodriguez at oracle.com Tue Aug 31 19:48:02 2010 From: tom.rodriguez at oracle.com (Tom Rodriguez) Date: Tue, 31 Aug 2010 19:48:02 -0700 Subject: Request for reviews (S): 6981431: IdealKit should support I_O projections In-Reply-To: <4C7DACB7.9040803@oracle.com> References: <4C7DACB7.9040803@oracle.com> Message-ID: <275AAD03-D7A5-4137-BD37-2D131E45BB4B@oracle.com> Looks good. Maybe the IdealKit should take a GraphKit directly instead of each constructor call unpacking the arguments? Instead of: IdealKit kit(gvn(), control(), merged_memory(), i_o(), false, true); this: IdealKit kit(this, false, true); I'm ok with how it is though. tom On Aug 31, 2010, at 6:30 PM, Vladimir Kozlov wrote: > http://cr.openjdk.java.net/~kvn/6981431/webrev > > Fixed 6981431: IdealKit should support I_O projections > > IdealKit should support I_O projections to be able > generate allocations by IdealKit. > > Please, verify the new code in IdealKit which merges i_o. 
> > Thanks, > Vladimir From vladimir.kozlov at oracle.com Tue Aug 31 20:14:53 2010 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 31 Aug 2010 20:14:53 -0700 Subject: Request for reviews (S): 6981431: IdealKit should support I_O projections In-Reply-To: <275AAD03-D7A5-4137-BD37-2D131E45BB4B@oracle.com> References: <4C7DACB7.9040803@oracle.com> <275AAD03-D7A5-4137-BD37-2D131E45BB4B@oracle.com> Message-ID: <4C7DC52D.2060405@oracle.com> Thank you, Tom I agree with passing GraphKit into IdealKit constructor. Also if I keep GraphKit pointer in IdealKit I can use ~IdealKit destructor for final sync instead of having separate method GraphKit::sync_kit(). But it is for an other round. Thanks, Vladimir On 8/31/10 7:48 PM, Tom Rodriguez wrote: > Looks good. Maybe the IdealKit should take a GraphKit directly instead of each constructor call unpacking the arguments? Instead of: > > IdealKit kit(gvn(), control(), merged_memory(), i_o(), false, true); > > this: > > IdealKit kit(this, false, true); > > I'm ok with how it is though. > > tom > > On Aug 31, 2010, at 6:30 PM, Vladimir Kozlov wrote: > >> http://cr.openjdk.java.net/~kvn/6981431/webrev >> >> Fixed 6981431: IdealKit should support I_O projections >> >> IdealKit should support I_O projections to be able >> generate allocations by IdealKit. >> >> Please, verify the new code in IdealKit which merges i_o. >> >> Thanks, >> Vladimir > From tom.rodriguez at oracle.com Tue Aug 31 20:19:51 2010 From: tom.rodriguez at oracle.com (Tom Rodriguez) Date: Tue, 31 Aug 2010 20:19:51 -0700 Subject: Request for reviews (S): 6981431: IdealKit should support I_O projections In-Reply-To: <4C7DC52D.2060405@oracle.com> References: <4C7DACB7.9040803@oracle.com> <275AAD03-D7A5-4137-BD37-2D131E45BB4B@oracle.com> <4C7DC52D.2060405@oracle.com> Message-ID: Sounds good. tom On Aug 31, 2010, at 8:14 PM, Vladimir Kozlov wrote: > Thank you, Tom > > I agree with passing GraphKit into IdealKit constructor. 
Also if I keep GraphKit pointer in IdealKit I can use > ~IdealKit destructor for final sync instead of having separate method GraphKit::sync_kit(). But it is for an other round. > > Thanks, > Vladimir > > On 8/31/10 7:48 PM, Tom Rodriguez wrote: >> Looks good. Maybe the IdealKit should take a GraphKit directly instead of each constructor call unpacking the arguments? Instead of: >> >> IdealKit kit(gvn(), control(), merged_memory(), i_o(), false, true); >> >> this: >> >> IdealKit kit(this, false, true); >> >> I'm ok with how it is though. >> >> tom >> >> On Aug 31, 2010, at 6:30 PM, Vladimir Kozlov wrote: >> >>> http://cr.openjdk.java.net/~kvn/6981431/webrev >>> >>> Fixed 6981431: IdealKit should support I_O projections >>> >>> IdealKit should support I_O projections to be able >>> generate allocations by IdealKit. >>> >>> Please, verify the new code in IdealKit which merges i_o. >>> >>> Thanks, >>> Vladimir >>
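The idea discussed in this thread, passing the GraphKit itself to the IdealKit constructor and letting ~IdealKit perform the final sync, can be sketched roughly as below. This is only an illustration under stated assumptions: ToyGraphKit, ToyIdealKit and Node are hypothetical stand-ins invented for this sketch, not the HotSpot classes, and the real kits track far more state (memory, GVN, variables) than the two fields shown here.

```cpp
#include <cassert>

// Hypothetical stand-ins for illustration only; these are NOT the HotSpot classes.
struct Node { int id; };

// Toy "outer" kit that owns the current control and I/O state.
class ToyGraphKit {
 public:
  Node* control() const { return _control; }
  Node* i_o() const { return _i_o; }
  void set_control(Node* n) { _control = n; }
  void set_i_o(Node* n) { _i_o = n; }
 private:
  Node* _control = nullptr;
  Node* _i_o = nullptr;
};

// Toy "inner" kit: takes the outer kit directly instead of unpacked
// arguments, works on private copies of the state, and writes them
// back when it goes out of scope.
class ToyIdealKit {
 public:
  explicit ToyIdealKit(ToyGraphKit* gkit)
      : _gkit(gkit), _control(gkit->control()), _i_o(gkit->i_o()) {}

  // Final sync happens here, so no separate sync call is needed.
  ~ToyIdealKit() {
    _gkit->set_control(_control);
    _gkit->set_i_o(_i_o);
  }

  // Copying would cause two destructors to sync the same outer kit.
  ToyIdealKit(const ToyIdealKit&) = delete;
  ToyIdealKit& operator=(const ToyIdealKit&) = delete;

  void set_control(Node* n) { _control = n; }
  void set_i_o(Node* n) { _i_o = n; }

 private:
  ToyGraphKit* _gkit;
  Node* _control;
  Node* _i_o;
};
```

With this shape, a caller would write `ToyIdealKit kit(&gkit);` inside a block, and the outer kit sees the updated control and I/O only after the block ends. One trade-off of syncing in a destructor is that the write-back point becomes implicit, which is presumably part of why it was deferred to a later round.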