From peter.helfer.java at gmail.com Tue Oct 9 08:15:25 2007 From: peter.helfer.java at gmail.com (Peter Helfer) Date: Tue, 9 Oct 2007 17:15:25 +0200 Subject: Interpreter calling a C method Message-ID: Hi all I would like to call a C function in the interpreter when I invoke a new method; I assume I have to change the TemplateTable::invokeZZZ, by inserting some assembly instructions. I figured out the calls are finally done by InterpreterMacroAssembler::jump_from_interpreted(Register method, Register temp). I thought I could use call_VM_leaf(CAST_FROM_FN_PTR(address, SharedRuntime::do_funnymethod), 0) - but I get the error message 'InterpreterMacroAssembler::call_VM_leaf_base: last_sp != NULL'. Now the questions.. a) how can I call a C method from the runtime-generated assembly without destroying the current frame ? b) how do I figure out which regs are used for what ? Do I have to go with that OopMap and let it iterate over the frames ? c) is there a definite 'calling convention guide' for Java ? I have found some information dispersed in serveral places (ok, they are at least all in the same folder) about the frame layout = stack layout, about some regs and their usage Thanks, Peter -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/attachments/20071009/85e6abee/attachment.html From Steve.Goldman at Sun.COM Tue Oct 9 08:44:43 2007 From: Steve.Goldman at Sun.COM (steve goldman) Date: Tue, 09 Oct 2007 11:44:43 -0400 Subject: Interpreter calling a C method In-Reply-To: References: Message-ID: <470BA1EB.7000703@sun.com> Peter Helfer wrote: > Hi all > > I would like to call a C function in the interpreter when I invoke a new > method; I assume I have to change the TemplateTable::invokeZZZ, by inserting > some assembly instructions. I figured out the calls are finally done by > InterpreterMacroAssembler::jump_from_interpreted(Register method, Register > temp). 
You do want to modify TemplateTable::invokeZZZ.

>
> I thought I could use call_VM_leaf(CAST_FROM_FN_PTR(address,
> SharedRuntime::do_funnymethod), 0) - but I get the error message
> 'InterpreterMacroAssembler::call_VM_leaf_base: last_sp != NULL'.

First you need to understand the difference between leaf calls and non-leaf call_VM's. You can only do a call_VM_leaf if there is no possibility of blocking for a safepoint. So if do_funnymethod does much in the vm you don't want to be using leaf.

The next thing to learn about is what is called the frame anchor (or last Java frame). Whenever we are leaving Java mode for either the vm or native (jni) we need to store a description of where the last Java frame was so that the stack walker can find all the frames should a safepoint occur while in vm/native. This will happen automatically when you use call_VM (call_VM_leaf doesn't do this because you promised not to block).

The error you are getting appears to be happening because you tried to jam your call_VM_leaf into jump_from_interpreted after this code:

  // set sender sp
  leal(rsi, Address(rsp, wordSize));
  // record last_sp
  movl(Address(rbp, frame::interpreter_frame_last_sp_offset * wordSize), rsi);

which is incorrect. call_VM_leaf expects that no blocking will occur and that no one will need to see last_sp for the interpreter frame, so it complains. This assert is mostly there to ensure that the interpreter frame's last_sp is properly set/cleared around Java calls. If you moved your code ahead of the leal then it should work as long as you don't destroy the register "method".

>
> Now the questions..
> a) how can I call a C method from the runtime-generated assembly without
> destroying the current frame ?

call_VM or call_VM_leaf

>
> b) how do I figure out which regs are used for what ? Do I have to go with
> that OopMap and let it iterate over the frames ?

I don't begin to really understand the question. So I'll ask more: what regs are we talking about, and where?
Since you are calling C the only regs you care about are the ones the interpreter is already using. call_VM* will save/restore them around the call. If do_funnymethod takes any parameters you'll need to develop them into regs that the interpreter isn't using (rsi/rdi should be avoided).

You don't have to worry about oopMaps. For interpreter frames there is a fixed layout and the jvm figures out what is live based on the bci (bytecode index) stored in the frame. oopMaps are typically used for stubs and for jit compiled code.

>
> c) is there a definite 'calling convention guide' for Java ? I have found
> some information dispersed in several places (ok, they are at least all in
> the same folder) about the frame layout = stack layout, about some regs and
> their usage
>

I don't think so. There's some stuff in my blog (http://blogs.sun.com/fatcatair) about calling conventions but I don't think it is exactly what you want.

--
Steve

From Steve.Goldman at Sun.COM Tue Oct 9 11:29:22 2007
From: Steve.Goldman at Sun.COM (steve goldman)
Date: Tue, 09 Oct 2007 14:29:22 -0400
Subject: Interpreter calling a C method
In-Reply-To:
References: <470BA1EB.7000703@sun.com>
Message-ID: <470BC882.7080200@sun.com>

Peter Helfer wrote:
> @Steve: thanks for the quick reply!
>
> Ok, firstly my do_funnymethod is actually just calling tty->print_cr("Here I
> am") which should not cause a safepoint, I assume; so far it worked when
> compiling the rest of make debug_build with that message.

In some sense that is illegal since it is doing i/o and, should it block, the jvm will halt. In this exploratory instance it is mostly ok. In general you need to use valid thread state transitions. While we are in Java code the thread_state is "in_Java". When you transition to somewhere where you essentially leave Java (a safepoint could occur) you need to change the state so that the jvm won't deadlock.
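As a rough illustration of the state transitions described above, the following C++ sketch models a scoped guard that switches a thread's state on entry and restores it on exit. It is a toy under stated assumptions: ToyThread, ThreadState, ScopedInVM and toy_runtime_entry are hypothetical names made up for this example, not HotSpot identifiers, and the real transitions do considerably more work (safepoint checks among other things).

```cpp
#include <cassert>

// Hypothetical stand-ins for illustration; not the HotSpot definitions.
enum class ThreadState { InJava, InVM, InNative };

struct ToyThread {
    ThreadState state = ThreadState::InJava;
};

// Scoped transition: switch to InVM on construction and restore the
// previous state on destruction, so every exit path undoes the change.
class ScopedInVM {
public:
    explicit ScopedInVM(ToyThread& t) : thread_(t), saved_(t.state) {
        thread_.state = ThreadState::InVM;
    }
    ~ScopedInVM() { thread_.state = saved_; }

    ScopedInVM(const ScopedInVM&) = delete;
    ScopedInVM& operator=(const ScopedInVM&) = delete;

private:
    ToyThread& thread_;
    ThreadState saved_;
};

// A toy "runtime entry": the thread is in InVM for the duration of the body.
ThreadState toy_runtime_entry(ToyThread& t) {
    ScopedInVM guard(t);
    return t.state;  // observed while the guard is alive
}
```

The point of the guard shape is only that the transition back cannot be forgotten on an early return.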
In general in entering the jvm you'll see some sort of entry macro (JRT_ENTRY, IRT_ENTRY) that defines the type of entry and does the proper state transitions. > > I want to be able to tinker around with the frames just left by the > interpreter (later on, it should be as well on compiled frames...), so I > guess I have to stick to call_VM_base. What I've found is the interpreter > frame layout in frame_i486.hpp: You want to use call_VM (or call_VM_leaf) leave the _base methods alone. > > // A frame represents a physical stack frame (an activation). Frames can be > // C or Java frames, and the Java frames can be interpreted or compiled. > // In contrast, vframes represent source-level activations, so that one > physical frame > // can correspond to multiple source level frames because of inlining. > // A frame is comprised of {pc, fp, sp} > > // Layout of interpreter frame: > // [expression stack ] * <- sp > // [monitors ] \ > // ... | monitor block size > // [monitors ] / > // [monitor block size ] > // [byte code index/pointr] = bcx() > bcx_offset > // [pointer to locals ] = locals() > locals_offset > // [constant pool cache ] = cache() > cache_offset > // [methodData ] = mdp() > mdx_offset > // [methodOop ] = method() > method_offset > // [last sp ] = last_sp() > last_sp_offset > // [old stack pointer ] (sender_sp) > sender_sp_offset > // [old frame pointer ] <- fp = link() > // [return pc ] > // [oop temp ] (only for native calls) > // [locals and parameters ] > // <- sender sp > > As well I figured out that the interpreter tries to leave things in regs > (RAX, RDX) rather than storing it always on stack; this is encoded in > TosState, what it is. Now when calling call_vm, it should take care of > those, as you are telling me, right ? Not if you are doing arbitrary calls at arbitrary places. In the case of invoke the tosca (top-of-stack-cache) has been flushed (i.e. stored to the expression stack so that we're in "vtos" mode). 
If you try this in arbitrary places the tosca must be flushed so that if a gc occurs it will see the proper stack state.

>
> To get the vframes beneath the current frame (which is now marked as native
> (=in VM) I believe), I call thread->last_frame to get all the frames (which
> can comprise multiple vframes), or thread->last_java_vframe to get only the
> last java frames, without any native threads ?

The stack walkers can only see a subset of frames. They won't see any c++ frames. They can see interpreter frames, compiled frames, and stub frames. When you use a vanilla frame to walk the stack (via sender calls) you'll see individual frames as they exist on the stack (i.e. an actual activation). Typically you do that kind of walk by starting with thread->last_frame(). In the case of a compiled frame that actual activation may contain multiple Java method frames because of inlining. You can see every one of these by using a vframeStream. The stream is constructed using the thread and you don't need to do sender type calls.

>
> In my case is this the right assumption about the stack layout ?
>
> (up, growing to 0x0000)
>
> | local vars   | <- %ESP
> | saved ebp    | <- %EBP
> | return addr  |
> | param1       |
> | ...          |  the call_vm frame (which is cdecl)
> | param N      |
> ++++++++++++++++++
> | java_argN    | [ADDR1]
> | ...          |
> | java_arg1    |
> | objectref    | (as long as we are not static)
> | rest of the  |
> | expression-  |
> | stack        |
> | monitorN     |
> | ....         |
> | monitor0     |
> | monitorblsize|

I don't know what monitorblsize is. I think this is wrong.
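The difference between walking physical frames and walking source-level activations, as described in the reply above, can be sketched with a small model. This is only an illustration: PhysicalFrame, walk_physical and walk_virtual are hypothetical names, and the vectors stand in for the real stack and for what a vframeStream would yield.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model: one physical frame (an actual activation on the stack)
// may represent several Java methods when a compiler has inlined callees.
struct PhysicalFrame {
    std::string kind;                  // "interpreted", "compiled" or "stub"
    std::vector<std::string> methods;  // innermost first; more than one only if inlined
};

// Walk the physical frames youngest-first, analogous to following sender links.
std::vector<std::string> walk_physical(const std::vector<PhysicalFrame>& stack) {
    std::vector<std::string> out;
    for (const auto& frame : stack) {
        out.push_back(frame.kind);
    }
    return out;
}

// Walk the source-level activations, analogous to iterating a vframe stream:
// every inlined method appears as its own entry, stubs contribute nothing.
std::vector<std::string> walk_virtual(const std::vector<PhysicalFrame>& stack) {
    std::vector<std::string> out;
    for (const auto& frame : stack) {
        for (const auto& method : frame.methods) {
            out.push_back(method);
        }
    }
    return out;
}
```

With a compiled frame that inlined one callee, the physical walk reports one entry for it while the virtual walk reports two.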
> | bci/bcp |-> pointing to the instruction invokeZZZ in > methodOop->constMethodOop->codes_offset()+ ~bci~ > | ptrToLocals |-> pointing to ADDR0 > | cpCache | > | methodData | > | methodOop | > | lastSP |-> pointing to ADDR1 > | oldSP |-> pointing to ADDR2 (senderSP) > | oldFP |-> pointing to ADDR3 > | returnPC |-> the PC of the caller, either interp, c2i or deopt > -stubcode > | local0 | [ADDR0] > | locals1-N | > | paramN | > | ... | > | param0 | > ++++++++++++++ This last section is almost all wrong. The params of the caller become the locals of the callee. Since this is a stack and param0 is pushed first it is at a higher address and it is local[0] for the callee. param1 is essentially local[-1], etc. Since the callee can have more locals than params the caller's stack is typically extended by the "extra locals". There might be some more things to learn at this blog: http://gbenson.livejournal.com/ which is a guy from RedHat doing a ppc port. He's using the c++ interpreter but a lot of the basics (like local layout) is the same. I've left some comments in his blog about questions he has raised trying to make this clear. > | either: | [ADDR2] > | -expression stack > | -compiled stuff stack > | | > | oldFP | [ADDR3] > | > | > 0xFFFF: > > Finally, to mess up totally, I could use in call_VM_base > thread->frame_anchor->set_last_Java_pc(address pc) to change the return > point to something totally different (to a compiled stub, which knows about > how to continue with that current expression stack for example) ? No. The anchor pc is there just to make the frame recognizable. That mutator is probably only used by the c++ interpreter and if you change the value you mostly only succeed in crashing the stack walker. In order to modify a return address you need to use frame::patch_pc(). This only works in very certain circumstances. You can't use it to change where the VM is going to return to a stub. In general you are very safe to modify the pc of last_frame(). 
In even more generality you are likely to crash something unless you are very careful about when you apply a patch like this.

>
>
> One additional question: is there a general rule when a safepoint can happen
> ? I've seen the runtime/interfaceSupport.hpp, which declares all the macros
> JRT_ and IRT_ - which are allocating the
> ThreadInVMfromJava, and HandleMarkCleaner onstack to provoke a transition in
> the constructor/destructor
> Now by whom are safepoints placed ?

Safepoint is a very overloaded term in the vm. A thread is at a "safe point" whenever it is stack walkable and safe for the gc to modify its stack. A safepoint is also used to mean when the vm thread has brought all the Java threads to a safepoint. Reaching a safe point is a cooperative process. If a Java thread transitions to native then it is at a safepoint. Its stack is walkable and it will be blocked from further execution if it tries to modify Java state. Threads in "vm" state are not at a safepoint but they do have a stack that is walkable (in the sense we can find it all). Since the thread is still executing it isn't quite safe yet. If the vm thread decided to bring the world to a stop for something like a gc then if a thread in vm attempts to return to Java (say compiled/interpreted code) it will block itself. Threads running in Java mode poll to see whether they should block.

--
Steve

From peter.helfer.java at gmail.com Fri Oct 12 12:51:15 2007
From: peter.helfer.java at gmail.com (Peter Helfer)
Date: Fri, 12 Oct 2007 21:51:15 +0200
Subject: 3 questions
Message-ID:

Hi all

1) how far is the release of the disassembler to the public, specifically: x86 ?

2) I'd like to allocate a pool of threads (JavaThreads) in the VM, and keep them waiting until I figure out what entrypoint they should take.

Now the plan would be to keep the threads preallocated, and let them wait on a condition variable to be released.
It seems only JavaThread(ThreadFunction entry_point, size_t stack_size = 0) is to be used, the other constructor is only for the main thread & jni, right ? Now I would provide a function matching 'typedef void (*ThreadFunction)(JavaThread*, TRAPS)' to the constructor, add to my pool (just an array so far), and invoke Thread::start(): // assume JavaThread is extended by // - a monitor (runtime/mutex.hpp) '_sleepVar' // - an additional entry point of type address '_newentry' which should be either the entry point of the interpreter when jumping back from a method (assumed that bcp is correctly updated), or any instruction in a compiled version of a java method. I assume (for now) that the frame is correctly initialized to continue at that point. while(true){ _sleepVar.wait(no_safepoint_check = false, timeout = 0, as_suspend_equivalent = !_as_suspend_equivalent_flag); // what is that last flag doing ? if(_newentry != NULL){ // In GCC AT&T syntax: Jump to _newentry (clobbers eax) asm ("movl %0, %%eax; \n\t" "jmp %eax" : /* output: none */ :"r"(_newentry) /* input: _newentry */ : "%eax" /* clobbered register */ ); } } // return point of function _newentry = NULL; // do some housecleaning run_housecleaning(); } .. and some starting function: jbool start_entrypoint(address entrypoint){ assert(entrypoint); JavaThread* thread = _singleton_pool.getThread(); if(thread != NULL){ thread->set_new_entry(entrypoint); // setter for entry point thread->getSleepVar()->notify(); // getter for sleep var return true; } return false; } Does this look feasible or is there a better way to go for ? Is there a thread pool around (apart from java.util.concurrent.Executor et al.) ? 3) I know that the interpreter jumps away using jump_from(Method, temp) to jump to either the compiled entry (_code->entry()) or again the interpreter (_i2i_entry, _from_compiled initially). 
This entry corresponds to the type of method (native, synchronized, accessors, empty, intrinsic aka math functions, or zerolocals aka normal), and has been determined at link time (methodOopDesc:link_method). I know as well, that many return stubs are generated, in order to jump back into the interpreter and pick up where it left, as described in AbstractInterpreterGenerator::generate_return_entry_for(TosState state, int step) and stored into 'static Entrypoint Interpreter::_return_entry[number_of_return_entries = 9]. If I'm not totally mistaken, the _return_entry[3] are for invokespecial, static, virtual and [5] for invokeinterface, because of: address AbstractInterpreter::return_entry(TosState state, int length) { guarantee(0 <= length && length < Interpreter::number_of_return_entries, "illegal length"); return _return_entry[length].entry(state); } .. and in TemplateTable_i486.cpp, prepare_invoke: // compute return type __ shrl(flags, ConstantPoolCacheEntry::tosBits); // Make sure we don't need to mask flags for tosBits after the above shift ConstantPoolCacheEntry::verify_tosBits(); // load return address { const int table = is_invokeinterface ? (int)Interpreter::return_5_addrs_by_index_table() : (int)Interpreter::return_3_addrs_by_index_table(); __ movl(flags, Address(noreg, flags, Address::times_4, table)); } // push return address __ pushl(flags); // Restore flag value from the constant pool cache, and restore rsi // for later null checks. rsi is the bytecode pointer if (save_flags) { __ movl(flags, rsi); __ restore_bcp(); } So this code determines by checking the TosBits in the child method, what kind of return value it has to expect, computes the offset in either return_X_addrs_by_index_table and pushes that value on the stack ? So this means that it expects the result of that method in RAX (+RDX for long/double), irregarding of whether the child method is compiled or the interpreter ? 
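The lookup asked about above can be modelled roughly as follows: shift the flags word so that only the result-type bits remain, then use that value to index one of two tables of return entries. This is a sketch under assumptions, not the HotSpot code. The shift value kTosShift, the state ordering, and the helper names make_flags and select_return_entry are all hypothetical, and plain integers stand in for code addresses.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical values for illustration only; the real constants live in HotSpot.
constexpr unsigned kTosShift = 28;        // assumed position of the result-type bits
constexpr std::size_t kNumTosStates = 9;  // one table slot per result type

// Assumed ordering of result types, ending with "no value cached" (vtos).
enum TosState : std::uint32_t { btos, ctos, stos, itos, ltos, ftos, dtos, atos, vtos };

// One return entry per result type; integers stand in for code addresses.
using ReturnTable = std::array<std::uintptr_t, kNumTosStates>;

// Build a flags word with the result type in the top bits (test helper).
constexpr std::uint32_t make_flags(TosState result_type, std::uint32_t low_bits) {
    return (static_cast<std::uint32_t>(result_type) << kTosShift) |
           (low_bits & ((1u << kTosShift) - 1u));
}

// Mirror of the described sequence: shift to get the result type, pick the
// table by invoke kind, and return the entry that would be pushed as the
// return address.
std::uintptr_t select_return_entry(std::uint32_t flags, bool is_invokeinterface,
                                   const ReturnTable& table3, const ReturnTable& table5) {
    const std::uint32_t tos = flags >> kTosShift;  // no mask needed after the shift
    assert(tos < kNumTosStates);
    const ReturnTable& table = is_invokeinterface ? table5 : table3;
    return table[tos];
}
```

In this model the callee never appears: the caller decides from its own cached flags which return entry to use, which matches the reading in the question that the result is expected in the same place whether the callee is compiled or interpreted.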
Now if I wanted to reroute the return call, I could change this pushed return address to another stub, which would save the result (RAX/RDX), do some freaky stuff like call the VM again, and finally return to the entry beforehand exchanged ? Thanks, Peter PS @Steve: your hint helped really well, thanks! To bring Steve's answer again to the list - I had to add the save of the BCP before leaving it, otherwise the assertion would fail in methodOop::bcp_from(int bci) void InterpreterMacroAssembler::jump_from_interpreted(Register method, Register temp) { if(MyMagicEnabled) Label ignore; cmpw(Address(method, methodOopDesc::myFlag_offset()), myFlagValue); jcc(Assembler::aboveEqual, ignore); restore_bcp(); // this saves the current BCP into the frame and allows to jump into the VM call_VM(temp, CAST_FROM_FN_PTR(address, MyCode::setMyFlagValueRight), temp, true); bind(ignore); } // add the custom code BEFORE moving the last_sp into place // set sender sp leal(rsi, Address(rsp, wordSize)); // record last_sp movl(Address(rbp, frame::interpreter_frame_last_sp_offset * wordSize), rsi); //here is the jvti in between //finally jump! jmp(Address(method, methodOopDesc::from_interpreted_offset())); -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/attachments/20071012/1f499761/attachment.html From David.Holmes at Sun.COM Fri Oct 12 21:01:48 2007 From: David.Holmes at Sun.COM (David Holmes - Sun Microsystems) Date: Sat, 13 Oct 2007 14:01:48 +1000 Subject: VM thread pool ( was: 3 questions) In-Reply-To: References: Message-ID: <4710432C.80306@sun.com> Peter, One answer, or more comment :) to #2. There is no existing native thread pool in the VM. I'm sure it must have been considered though, ... which means there are probably some non-obvious "gotchas" waiting out there. Someone else may be able to chime in with more info there. 
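For readers who want to see the general shape of the idea in question 2 (a preallocated thread parked on a condition variable until it is given something to run), here is a minimal sketch in standard C++. It is an assumption-laden illustration only: it uses std::thread, std::mutex and std::condition_variable rather than JavaThread and the VM's Monitor, the class name PooledWorker and its methods are hypothetical, and it calls the entry as an ordinary function and returns to the wait loop instead of jumping to an address.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical single pooled worker: parked until start() hands it an entry.
class PooledWorker {
public:
    PooledWorker() : thread_([this] { run(); }) {}

    ~PooledWorker() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            shutdown_ = true;
        }
        wake_.notify_one();
        thread_.join();
    }

    // Hand an entry to the idle worker; returns false if it is not idle.
    bool start(std::function<void()> entry) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            if (entry_ || busy_ || shutdown_) return false;
            entry_ = std::move(entry);
        }
        wake_.notify_one();
        return true;
    }

    // Block until the worker has finished and is parked again.
    void wait_idle() {
        std::unique_lock<std::mutex> lock(mutex_);
        idle_.wait(lock, [this] { return !entry_ && !busy_; });
    }

private:
    void run() {
        std::unique_lock<std::mutex> lock(mutex_);
        for (;;) {
            wake_.wait(lock, [this] { return shutdown_ || entry_; });
            if (!entry_) return;  // shutdown requested and nothing pending
            std::function<void()> task = std::move(entry_);
            entry_ = nullptr;
            busy_ = true;
            lock.unlock();
            task();               // ordinary call; control comes back here
            lock.lock();
            busy_ = false;        // per-task cleanup would go here
            idle_.notify_all();
        }
    }

    std::mutex mutex_;
    std::condition_variable wake_;
    std::condition_variable idle_;
    std::function<void()> entry_;
    bool busy_ = false;
    bool shutdown_ = false;
    std::thread thread_;  // declared last so the members above exist before run() starts
};
```

A caller would construct the worker once, call start() with the work to run, and optionally wait_idle() before handing it the next piece of work.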
If you are going to try this then there are a couple of things you need to watch for:

1. Cleanup and reinitialization of thread-local state (native level I mean, not Java ThreadLocal).

2. Removing the threads from the ThreadsList while they are idle and adding them back when needed. (As you need the ThreadsLock for this you might as well use it to protect the queue of pool threads too.)

3. I'm not sure I see the need for a new entry point - as long as these are always going to execute Java threads. You just need to turn the existing entry into a loop that will return to waiting when a logical-thread has completed - though some of the initialization done when a new thread is created will have to be moved to the thread itself as part of the entry logic. I don't see that you would want to "jump" to a new entry rather than just calling it and returning normally.

Other than the TLS issue I think this would be a fairly straight-forward exercise. The complexities come with all of the thread management policies you might want, and how to expose them - as with ThreadPoolExecutor. :)

Cheers,
David Holmes

PS. I'll be traveling over the next few days so if there's any follow-up I may not see it for a while.

Peter Helfer said the following on 13/10/07 05:51 AM:
> Hi all
>
> 1) how far is the release of the disassembler to the public,
> specifically: x86 ?
>
> 2) I'd like to allocate a pool of threads (JavaThreads) in the VM, and
> keep them waiting until I figure out, what entrypoint they should take.
>
> Now the plan would be to keep the threads preallocated, and let them
> wait on a condition variable to be released. It seems only
> JavaThread(ThreadFunction entry_point, size_t stack_size = 0) is to be
> used, the other constructor is only for the main thread & jni, right ?
> > Now I would provide a function matching 'typedef void > (*ThreadFunction)(JavaThread*, TRAPS)' to the constructor, add to my > pool (just an array so far), and invoke Thread::start(): > > > // assume JavaThread is extended by > // - a monitor (runtime/mutex.hpp) '_sleepVar' > // - an additional entry point of type address '_newentry' which should > be either the entry point of the interpreter when jumping back from a > method (assumed that bcp is correctly updated), or any instruction in a > compiled version of a java method. I assume (for now) that the frame is > correctly initialized to continue at that point. > > while(true){ > _sleepVar.wait(no_safepoint_check = false, timeout = 0, > as_suspend_equivalent = !_as_suspend_equivalent_flag); > // what is that last flag doing ? > > if(_newentry != NULL){ > // In GCC AT&T syntax: Jump to _newentry (clobbers eax) > asm ("movl %0, %%eax; \n\t" > "jmp %eax" > > : /* output: none */ > :"r"(_newentry) /* input: _newentry */ > : "%eax" /* clobbered register */ > ); > > } > } > > // return point of function > _newentry = NULL; > > // do some housecleaning > run_housecleaning(); > > } > > > .. and some starting function: > > jbool start_entrypoint(address entrypoint){ > assert(entrypoint); > JavaThread* thread = _singleton_pool.getThread(); > if(thread != NULL){ > thread->set_new_entry(entrypoint); // setter for entry point > thread->getSleepVar()->notify(); // getter for sleep var > return true; > } > return false; > } > > > Does this look feasible or is there a better way to go for ? Is there a > thread pool around (apart from java.util.concurrent.Executor et al.) ? > > > 3) > I know that the interpreter jumps away using jump_from(Method, temp) to > jump to either the compiled entry (_code->entry()) or again the > interpreter (_i2i_entry, _from_compiled initially). 
This entry > corresponds to the type of method (native, synchronized, accessors, > empty, intrinsic aka math functions, or zerolocals aka normal), and has > been determined at link time (methodOopDesc:link_method). > > I know as well, that many return stubs are generated, in order to jump > back into the interpreter and pick up where it left, as described in > AbstractInterpreterGenerator::generate_return_entry_for(TosState state, > int step) and stored into 'static Entrypoint > Interpreter::_return_entry[number_of_return_entries = 9]. > > If I'm not totally mistaken, the _return_entry[3] are for invokespecial, > static, virtual and [5] for invokeinterface, because of: > > address AbstractInterpreter::return_entry(TosState state, int length) { > guarantee(0 <= length && length < > Interpreter::number_of_return_entries, "illegal length"); > return _return_entry[length].entry(state); > } > > .. and in TemplateTable_i486.cpp, prepare_invoke: > > // compute return type > __ shrl(flags, ConstantPoolCacheEntry::tosBits); > // Make sure we don't need to mask flags for tosBits after the above shift > ConstantPoolCacheEntry::verify_tosBits(); > // load return address > { const int table = > is_invokeinterface > ? (int)Interpreter::return_5_addrs_by_index_table() > : (int)Interpreter::return_3_addrs_by_index_table(); > __ movl(flags, Address(noreg, flags, Address::times_4, table)); > } > > // push return address > __ pushl(flags); > > // Restore flag value from the constant pool cache, and restore rsi > // for later null checks. rsi is the bytecode pointer > if (save_flags) { > __ movl(flags, rsi); > __ restore_bcp(); > } > > > So this code determines by checking the TosBits in the child method, > what kind of return value it has to expect, computes the offset in > either return_X_addrs_by_index_table and pushes that value on the stack ? 
> So this means that it expects the result of that method in RAX (+RDX for > long/double), irregarding of whether the child method is compiled or the > interpreter ? > > > Now if I wanted to reroute the return call, I could change this pushed > return address to another stub, which would save the result (RAX/RDX), > do some freaky stuff like call the VM again, and finally return to the > entry beforehand exchanged ? > > > Thanks, Peter > > > > > > PS @Steve: your hint helped really well, thanks! > To bring Steve's answer again to the list - I had to add the save of the > BCP before leaving it, otherwise the assertion would fail in > methodOop::bcp_from(int bci) > > void InterpreterMacroAssembler::jump_from_interpreted(Register method, > Register temp) { > if(MyMagicEnabled) > Label ignore; > > cmpw(Address(method, methodOopDesc::myFlag_offset()), myFlagValue); > jcc(Assembler::aboveEqual, ignore); > > restore_bcp(); // this saves the current BCP into the frame and > allows to jump into the VM > call_VM(temp, CAST_FROM_FN_PTR(address, > MyCode::setMyFlagValueRight), temp, true); > > bind(ignore); > } > // add the custom code BEFORE moving the last_sp into place > > // set sender sp > leal(rsi, Address(rsp, wordSize)); > // record last_sp > movl(Address(rbp, frame::interpreter_frame_last_sp_offset * wordSize), > rsi); > > //here is the jvti in between > > //finally jump! > jmp(Address(method, methodOopDesc::from_interpreted_offset())); From linuxhippy at gmail.com Sat Oct 13 03:43:21 2007 From: linuxhippy at gmail.com (Clemens Eisserer) Date: Sat, 13 Oct 2007 12:43:21 +0200 Subject: VM thread pool ( was: 3 questions) In-Reply-To: <4710432C.80306@sun.com> References: <4710432C.80306@sun.com> Message-ID: <194f62550710130343t340bacddvf0ddf80d95bc372d@mail.gmail.com> > 3. I'm not sure I see the need for a new entry point - as long as these > are always going to execute Java threads. 
You just need to turn the > existing entry into a loop that will return to waiting when a > logical-thread has completed - though some of the initialization done > when a new thread is created will have to be moved to the thread itself > as part of the entry logic. I don't see that you would want to "jump" to > a new entry rather than just calling it and returning normally. Well wouldn't it be great too if the thread-pool could be accessed from native code too? I just imagine that for some quite heavy, but "concurrent-ready" work it would be maybe good to have such a system. I just imagine Java2D where large workloads (e.g. interpolate large image, fill antialiased path) could benefit from the additional processors available. lg Clemens From avinash.lakshman at gmail.com Sun Oct 14 20:25:26 2007 From: avinash.lakshman at gmail.com (Avinash Lakshman) Date: Sun, 14 Oct 2007 20:25:26 -0700 Subject: Dolphin release and Escape Analysis Message-ID: Hi All I recently downloaded the latest Dolphin release. I was curious to check out the much talked about stack allocation feature. Is this available in the Dolphin release and if so how do I turn it on. Please advice Thanks A -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/attachments/20071014/c81489c1/attachment.html From peter.helfer.java at gmail.com Mon Oct 15 04:15:28 2007 From: peter.helfer.java at gmail.com (Peter Helfer) Date: Mon, 15 Oct 2007 13:15:28 +0200 Subject: The different stubs, what they are for.. Message-ID: I'm trying to compile a small overview about the interpreter, and how it works all together. I stumbled across some Stubs I don't know yet the precise meaning or intention behind, could somebody correct my assumptions ? - early_ret(TosState) // forced return by debugger/JVMTI, removes activation frame, puts assignment compatible result on stack ? - slow signature handler // what is that for ? 
- continuation handler(TosState) // it sets interpreter mode (by setting last_sp = NULL_WORD), and continues dispatching Thanks, Peter -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/attachments/20071015/95d1451e/attachment.html From Steve.Goldman at Sun.COM Mon Oct 15 06:09:00 2007 From: Steve.Goldman at Sun.COM (steve goldman) Date: Mon, 15 Oct 2007 09:09:00 -0400 Subject: The different stubs, what they are for.. In-Reply-To: References: Message-ID: <4713666C.5030602@sun.com> Peter Helfer wrote: > I'm trying to compile a small overview about the interpreter, and how it > works all together. I stumbled across some Stubs I don't know yet the > precise meaning or intention behind, could somebody correct my assumptions ? > > - early_ret(TosState) // forced return by debugger/JVMTI, removes > activation frame, puts assignment compatible result on stack ? jvmti can ask the jvm to abort the current activation as if it were complete and return a result of the type expected. > - slow signature handler // what is that for ? Passing native arguments is done by signature handlers that are separate little pieces of code for particular signatures. For signatures that are too wide (many parameters) there is a generic handler to copy the args from the location that Java put them to where the native call expects them. You could in theory run with only the slow signature handler. It is slow since it is a jvm entry and a safepoint could occur. The latter point has caused some bugs in the past because the youngest frame is at an interesting state. > - continuation handler(TosState) // it sets interpreter mode (by setting > last_sp = NULL_WORD), and continues dispatching This is used as part of deoptimization. When we create an interpreter frame(s) to replace a compiled frame we need to come up with pc's to return to. 
Depending on the exact state we were in when the deopt happened we may need various spots in the interpreter to resume execution. This is one of those. Look at the code in vframeArrayElement::unpack_on_stack().

--
Steve

From peter.helfer.java at gmail.com Mon Oct 15 09:33:10 2007
From: peter.helfer.java at gmail.com (Peter Helfer)
Date: Mon, 15 Oct 2007 18:33:10 +0200
Subject: My view of the interpreter
Message-ID:

Ok, I thought it might be of interest to others..

- how it is generated, the stubs..
- how the interpreter works (how it is jumping around..)
- some words about registers/frame layout

I would greatly appreciate any comments / corrections / extensions !

Peter

Licensing: I believe that this should be under CC license.. any objections ?
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/attachments/20071015/514431d4/attachment.html
-------------- next part --------------
A non-text attachment was scrubbed...
Name: hotspot_interpreter.pdf
Type: application/pdf
Size: 370015 bytes
Desc: not available
Url : http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/attachments/20071015/514431d4/attachment.pdf

From linuxhippy at gmail.com Mon Oct 15 11:10:12 2007
From: linuxhippy at gmail.com (Clemens Eisserer)
Date: Mon, 15 Oct 2007 20:10:12 +0200
Subject: My view of the interpreter
In-Reply-To:
References:
Message-ID: <194f62550710151110r3726fb7uc55237022c5c34ee@mail.gmail.com>

Thanks a lot, very interesting :)

lg Clemens

From Thomas.Rodriguez at Sun.COM Tue Oct 16 16:17:20 2007
From: Thomas.Rodriguez at Sun.COM (Tom Rodriguez)
Date: Tue, 16 Oct 2007 16:17:20 -0700
Subject: Dolphin release and Escape Analysis
In-Reply-To:
References:
Message-ID: <47154680.9010805@sun.com>

Currently there is no support for stack allocation in hotspot. EA currently is only used for lock elision and there's work in progress to support scalar replacement of objects.
Because hotspot GC is precise, true stack allocation trickles out into the rest of the system since code which was expecting to see a pointer into the heap might instead see pointers into the stack. It's tractable but somewhat tricky. An alternative would be to have a thread local area in the heap which can be managed directly by compiled code for the purposes of stack allocation. We have an ongoing research project with the University of Linz in Austria around the hotspot compilers and as part that they developed an escape analysis algorithm for the client compiler along with the needed runtime support for rematerialization of objects which was needed to support deoptimization. The runtime support has been integrated into hotspot already but C2 uses a different algorithm than was used in the C1 work. Anyway, you might find the papers at http://www.ssw.uni-linz.ac.at/General/Staff/TK/Research/Publications interesting as they discuss supporting true stack allocation. tom Avinash Lakshman wrote: > Hi All > > I recently downloaded the latest Dolphin release. I was curious to check > out the much talked about stack allocation feature. Is this available in > the Dolphin release and if so how do I turn it on. Please advice > > Thanks > A From peter.helfer.java at gmail.com Wed Oct 17 08:43:27 2007 From: peter.helfer.java at gmail.com (Peter Helfer) Date: Wed, 17 Oct 2007 17:43:27 +0200 Subject: Bug when walking entry frame...? Message-ID: Hi all I'm seeing this error.. I can make a workaround, but is this the intended behavior ? 
#>cd openjdk/control/build/linux-i586-debug/bin #>./java ------------------- Frame ID: b7db7c04 Testers: is_interpreted_frame(): true is_java_frame(): true is_entry_frame(): false is_native_frame(): false is_runtime_frame(): false is_compiled_frame(): false is_safepoint_blob_frame(): false is_deoptimized(): false is_first_frame(): false is_first_java_frame(): true is_interpreted_frame_valid(): true should_be_deoptimized(): false can_be_deoptimized(): false frame size: 11 sender frame: b7db7c30 real sender frame: b7db7c30 ------------------- Frame ID: b7db7c30 Testers: is_interpreted_frame(): false is_java_frame(): false is_entry_frame(): true is_native_frame(): false is_runtime_frame(): false is_compiled_frame(): false is_safepoint_blob_frame(): false is_deoptimized(): false is_first_frame(): true # To suppress the following error report, specify this argument # after -XX: or in .hotspotrc: SuppressErrorAt=/frame_i486.cpp:148 # # An unexpected error has been detected by Java Runtime Environment: # # Internal Error (/home/phelfer/workspace/openjdk/hotspot/src/cpu/i486/vm/frame_i486.cpp:148), pid=29739, tid=3084618640 # Error: assert(!entry_frame_is_first(),"next Java fp must be non zero") # The code that leads to it: print_custom(){ [...] RegisterMap(thread, false); // happens as well with 'true' tty->print_cr(" is_first_java_frame():\t\t%s", is_first_java_frame() ? "true" : "false"); tty->print_cr(" is_interpreted_frame_valid():\t%s", is_interpreted_frame_valid() ? "true" : "false"); tty->print_cr(" should_be_deoptimized():\t%s", should_be_deoptimized() ? "true" : "false"); tty->print_cr(" can_be_deoptimized():\t\t%s", can_be_deoptimized() ? "true" : "false"); tty->print_cr("frame size:\t\t%d", frame_size()); tty->print_cr("sender frame:\t\t%x", sender(&map).id()); tty->print_cr("real sender frame:\t%x", real_sender(&map).id()); } frame frame::sender(RegisterMap* map) const { // Default is we done have to follow them. 
The sender_for_xxx will // update it accordingly map->set_include_argument_oops(false); if (is_entry_frame()) return sender_for_entry_frame(map); if (is_interpreted_frame()) return sender_for_interpreter_frame(map); assert(_cb == CodeCache::find_blob(pc()),"Must be the same"); if (_cb != NULL) { return sender_for_compiled_frame(map); } // Must be native-compiled frame, i.e. the marshaling code for native // methods that exists in the core system. return frame(sender_sp(), link(), sender_pc()); } frame frame::sender_for_entry_frame(RegisterMap* map) const { assert(map != NULL, "map must be set"); // Java frame called from C; skip all C frames and return top C // frame of that chunk as the sender JavaFrameAnchor* jfa = entry_frame_call_wrapper()->anchor(); assert(!entry_frame_is_first(), "next Java fp must be non zero"); assert(jfa->last_Java_sp() > _sp, "must be above this frame on stack"); map->clear(); Regards, Peter -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/attachments/20071017/6c4c10a6/attachment.html From Thomas.Rodriguez at Sun.COM Wed Oct 17 09:40:51 2007 From: Thomas.Rodriguez at Sun.COM (Tom Rodriguez) Date: Wed, 17 Oct 2007 09:40:51 -0700 Subject: Bug when walking entry frame...? In-Reply-To: References: Message-ID: <47163B13.409@sun.com> I think you're asking for the sender of the oldest frame, which doesn't have a sender. It is only safe to call sender if !is_first_frame(), which is basically what the assert is complaining about. By the way, it also may not be safe to call is_interpreted_frame_valid() on something that isn't an interpreter frame. tom Peter Helfer wrote: > Hi all > > I'm seeing this error.. I can make a workaround, but is this the > intended behavior ?
> > #>cd openjdk/control/build/linux-i586-debug/bin > #>./java > ------------------- > Frame ID: b7db7c04 > Testers: > is_interpreted_frame(): true > is_java_frame(): true > is_entry_frame(): false > is_native_frame(): false > is_runtime_frame(): false > is_compiled_frame(): false > is_safepoint_blob_frame(): false > is_deoptimized(): false > is_first_frame(): false > is_first_java_frame(): true > is_interpreted_frame_valid(): true > should_be_deoptimized(): false > can_be_deoptimized(): false > frame size: 11 > sender frame: b7db7c30 > real sender frame: b7db7c30 > ------------------- > Frame ID: b7db7c30 > Testers: > is_interpreted_frame(): false > is_java_frame(): false > is_entry_frame(): true > is_native_frame(): false > is_runtime_frame(): false > is_compiled_frame(): false > is_safepoint_blob_frame(): false > is_deoptimized(): false > is_first_frame(): true > # To suppress the following error report, specify this argument > # after -XX: or in .hotspotrc: SuppressErrorAt=/frame_i486.cpp:148 > # > # An unexpected error has been detected by Java Runtime Environment: > # > # Internal Error > (/home/phelfer/workspace/openjdk/hotspot/src/cpu/i486/vm/frame_i486.cpp:148), > pid=29739, tid=3084618640 > # Error: assert(!entry_frame_is_first(),"next Java fp must be non zero") > # > > The code that leads to it: > > print_custom(){ > [...] > RegisterMap(thread, false); // happens as well with 'true' > tty->print_cr(" is_first_java_frame():\t\t%s", is_first_java_frame() ? > "true" : "false"); > tty->print_cr(" is_interpreted_frame_valid():\t%s", > is_interpreted_frame_valid() ? "true" : "false"); > tty->print_cr(" should_be_deoptimized():\t%s", should_be_deoptimized() > ? "true" : "false"); > tty->print_cr(" can_be_deoptimized():\t\t%s", can_be_deoptimized() ? 
> "true" : "false"); > tty->print_cr("frame size:\t\t%d", frame_size()); > tty->print_cr("sender frame:\t\t%x", sender(&map).id()); > tty->print_cr("real sender frame:\t%x", real_sender(&map).id()); > > } > > frame frame::sender(RegisterMap* map) const { > // Default is we done have to follow them. The sender_for_xxx will > // update it accordingly > map->set_include_argument_oops(false); > > if (is_entry_frame()) return sender_for_entry_frame(map); > if (is_interpreted_frame()) return sender_for_interpreter_frame(map); > assert(_cb == CodeCache::find_blob(pc()),"Must be the same"); > > if (_cb != NULL) { > return sender_for_compiled_frame(map); > } > // Must be native-compiled frame, i.e. the marshaling code for native > // methods that exists in the core system. > return frame(sender_sp(), link(), sender_pc()); > } > > frame frame::sender_for_entry_frame(RegisterMap* map) const { > assert(map != NULL, "map must be set"); > // Java frame called from C; skip all C frames and return top C > // frame of that chunk as the sender > JavaFrameAnchor* jfa = entry_frame_call_wrapper()->anchor(); > assert(!entry_frame_is_first(), "next Java fp must be non zero"); > assert(jfa->last_Java_sp() > _sp, "must be above this frame on stack"); > map->clear(); > > > > Regards, Peter > > > > > > From peter.helfer.java at gmail.com Thu Oct 18 08:54:49 2007 From: peter.helfer.java at gmail.com (Peter Helfer) Date: Thu, 18 Oct 2007 17:54:49 +0200 Subject: The right locks for frame rewriting / Exceptions Message-ID: Hi all I'm on the way to do some frame-hacking (patch_pc, copy some frames into another thread..) Now I need to be sure, that what I am doing is safe with regard to: - any GC operation working on either of the two thread stacks - neither thread is running currently on that part of the stack = one thread must be sleeping, the other might call a VM function to do exactly that stack-changing operation (even in its own stack..) 
Q1) what kind of lock(s) is suited for this operation ? Additionally, assume I have an interpreted method calling an interpreted method; this one rethrows an exception - that is, the local exception handler couldn't handle the exception. Did I get that right: (x86-specific) - Bytecode 'athrow' calls Interpreter::throw_exception_entry() with the exception object (oop) in RAX. This empties the expression stack and FPU stack, and calls InterpreterRuntime::exception_handler_for_exception. This resolves the exception into the handler for it, returning the exception in RDX; and the handler in RAX. The resolving process checks whether a 'catch' is around for that (methodOop->fast_exception_handler_bci_for greater zero), adds this BCI to the BCP, (handler_pc = h_method->code_base() + handler_bci) and returns the dispatch table entry for that one: continuation = Interpreter::dispatch_table(vtos)[*handler_pc]; Oh, I'm getting off topic: if there is no handler around, it returns the Interpreter::remove_activation_entry(). This remove_activation_entry does save the exception from the stack into RAX, and saves RAX again in currentThread::vmResult. Now it calls masm->remove_activation(TosState=vtos, returnaddr=rdx, throwMonitorException=false, installMonitorException=true, notifyJVMDI=false). After the removal, it restores the exception into RAX (verifyOop as well), and calls just again (save temporarily RAX, RDX) InterpreterRuntime::exception_handler_for_exception, save this result into RBX (restore RDX, RAX) and jump there.. Q2) Now what does remove_activation do exactly ? It does unlocking of objects under certain circumstances... now could somebody literate please shed some light on that ? I haven't figured out how that beast works...
Q3) implicit exceptions are being generated by some fancy path leading to THROW_MSG/THROW_OOP which finally creates a exception oop of the desired type, and sets that for the thread: thread->set_pending_exception(h_exception(), file, line) - but where are they picked up again ? Regards, Peter PS: the next questions will be most probably about deoptimizing... be prepared for nasty questions :-) ! -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/attachments/20071018/f1c45f14/attachment.html From peter.helfer.java at gmail.com Fri Oct 19 09:50:12 2007 From: peter.helfer.java at gmail.com (Peter Helfer) Date: Fri, 19 Oct 2007 18:50:12 +0200 Subject: The right locks for frame rewriting / Exceptions In-Reply-To: References: Message-ID: Ok, I found one answer myself... for the rest I am still trying to figure out. One additional question which came to my mind: When creating a new JavaThread, do I have to pass a java.lang.Thread-Oop to the newly created thread ? And if so, what is the right way to create that ? JavaThread* TPool::createThread(ThreadFunction entrypoint){ JavaThread* result_thread = NULL; { MutexLocker ml_thread(Threads_lock); result_thread = new JavaThread(entrypoint); //size = 0 if (result_thread->osthread() != NULL) { //jobject javalangThreadObj = DO I NEED THAT ? // result_thread->prepare(javalangThreadObj); } else { delete result_thread; result_thread = NULL; // no message of failed thread creation... } // Thread::start(result_thread) -- nope, we are just creating it... } } Regards, Peter 2007/10/18, Peter Helfer : > > Hi all > > I'm on the way to do some frame-hacking (patch_pc, copy some frames into > another thread..) 
> > Now I need to be sure, that what I am doing is safe with regard to: > - any GC operation working on either of the two thread stacks > - neither thread is running currently on that part of the stack > = one thread must be sleeping, the other might call a VM function to do > exactly that stack-changing operation (even in its own stack..) > Q1) what kind of lock(s) is suited for this operation ? > > > > Additionally, assume I have an interpreted method calling an interpreted > method; this one rethrows an exception - that is, the local exception > handler couldn't handle the exception. Did I get that right: (x86-specific) > > - Bytecode 'athrow' calls Interpreter::throw_exception_entry() with the > exception object (oop) in RAX. This empties the expression stack and FPU > stack, and calls InterpreterRuntime::exception_handler_for_exception. This > resolves the exception into the handler for it, returning the exception in > RDX; and the handler in RAX. > The resolving process checks whether a 'catch' is around for that > (methodOop->fast_exception_handler_bci_for greater zero), adds this BCI to > the BCP, (handler_pc = h_method->code_base() + handler_bci) and returns the > dispatch table entry for that one: continuation = > Interpreter::dispatch_table(vtos)[*handler_pc]; > Oh, Im getting off topic: if there is no handler around, it returns the > Interpreter::remove_activation_entry(). > > This remove_activation_entry does save the exception from the stack into > RAX, and saves RAX again in currentThread::vmResult. Now it calls > masm->remove_activation(TosState=vtos, returnaddr=rdx, > throwMonitorException=false, installMonitorException=true, > notifyJVMDI=false). After the removal, it restores the exception into RAX > (verifyOop as well), and calls just again (save temporarly RAX, RDX) > InterpreterRuntime::exception_handler_for_exception, save this result into > RBX (restore RDX, RAX) and jump there.. > > Q2) Now what does remove_activation exactly ? 
It does unlocking of objects > under certain circumstances... now could somebody literate please shed some > light on that ? I haven't figured out, how that beast works... Doh, I can answer that myself after scrolling around... Read the source, luke :-) interp_masm_i486.cpp says: // remove activation // // Unlock the receiver if this is a synchronized method. // Unlock any Java monitors from synchronized blocks. // Remove the activation from the stack. // // If there are locked Java monitors // If throw_monitor_exception // throws IllegalMonitorStateException // Else if install_monitor_exception // installs IllegalMonitorStateException // Else // no error processing Q3) implicit exceptions are being generated by some fancy path leading to > THROW_MSG/THROW_OOP which finally creates a exception oop of the desired > type, and sets that for the thread: > thread->set_pending_exception(h_exception(), file, line) - but where are > they picked up again ? > > > > Regards, Peter > > > > PS: the next questions will be most probably about deoptimizing... be > prepared for nasty questions :-) ! > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/attachments/20071019/91c2d728/attachment.html From David.Holmes at Sun.COM Sat Oct 20 06:13:43 2007 From: David.Holmes at Sun.COM (David Holmes) Date: Sat, 20 Oct 2007 23:13:43 +1000 Subject: The right locks for frame rewriting / Exceptions In-Reply-To: References: Message-ID: <4719FF07.7010604@sun.com> Peter, Regarding the thread question. Take a look at attach_current_thread to see how a Java Thread object is created for an existing native thread. There should always be an associated Thread oop while a JavaThread is active. Of course when you are starting a java.lang.Thread you already have the Thread oop. David Holmes Peter Helfer wrote: > Ok, I found one answer myself... for the rest I am still trying to > figure out. 
One additional question which came to my mind: > > When creating a new JavaThread, do I have to pass a java.lang.Thread-Oop > to the newly created thread ? And if so, what is the right way to create > that ? > > > JavaThread* TPool::createThread(ThreadFunction entrypoint){ > JavaThread* result_thread = NULL; > { > MutexLocker ml_thread(Threads_lock); > result_thread = new JavaThread(entrypoint); //size = 0 > if (result_thread->osthread() != NULL) { > //jobject javalangThreadObj = DO I NEED THAT ? > // result_thread->prepare(javalangThreadObj); > } else { > delete result_thread; > result_thread = NULL; > // no message of failed thread creation... > } > // Thread::start(result_thread) -- nope, we are just creating it... > } > } > > > > Regards, Peter > > > > 2007/10/18, Peter Helfer < peter.helfer.java at gmail.com > >: > > Hi all > > I'm on the way to do some frame-hacking (patch_pc, copy some frames > into another thread..) > > Now I need to be sure, that what I am doing is safe with regard to: > - any GC operation working on either of the two thread stacks > - neither thread is running currently on that part of the stack > = one thread must be sleeping, the other might call a VM function > to do exactly that stack-changing operation (even in its own stack..) > Q1) what kind of lock(s) is suited for this operation ? > > > > Additionally, assume I have an interpreted method calling an > interpreted method; this one rethrows an exception - that is, the > local exception handler couldn't handle the exception. Did I get > that right: (x86-specific) > > - Bytecode 'athrow' calls Interpreter::throw_exception_entry() with > the exception object (oop) in RAX. This empties the expression stack > and FPU stack, and calls > InterpreterRuntime::exception_handler_for_exception. This resolves > the exception into the handler for it, returning the exception in > RDX; and the handler in RAX. 
> The resolving process checks whether a 'catch' is around for that > (methodOop->fast_exception_handler_bci_for greater zero), adds this > BCI to the BCP, (handler_pc = h_method->code_base() + handler_bci) > and returns the dispatch table entry for that one: continuation = > Interpreter::dispatch_table(vtos)[*handler_pc]; > Oh, Im getting off topic: if there is no handler around, it returns > the Interpreter::remove_activation_entry(). > > This remove_activation_entry does save the exception from the stack > into RAX, and saves RAX again in currentThread::vmResult. Now it > calls masm->remove_activation(TosState=vtos, returnaddr=rdx, > throwMonitorException=false, installMonitorException=true, > notifyJVMDI=false). After the removal, it restores the exception > into RAX (verifyOop as well), and calls just again (save temporarly > RAX, RDX) InterpreterRuntime::exception_handler_for_exception, save > this result into RBX (restore RDX, RAX) and jump there.. > > Q2) Now what does remove_activation exactly ? It does unlocking of > objects under certain circumstances... now could somebody literate > please shed some light on that ? I haven't figured out, how that > beast works... > > > Doh, I can answer that myself after scrolling around... Read the source, > luke :-) interp_masm_i486.cpp says: > > // remove activation > // > // Unlock the receiver if this is a synchronized method. > // Unlock any Java monitors from synchronized blocks. > // Remove the activation from the stack. 
> // > // If there are locked Java monitors > // If throw_monitor_exception > // throws IllegalMonitorStateException > // Else if install_monitor_exception > // installs IllegalMonitorStateException > // Else > // no error processing > > > Q3) implicit exceptions are being generated by some fancy path > leading to THROW_MSG/THROW_OOP which finally creates a exception oop > of the desired type, and sets that for the thread: > thread->set_pending_exception(h_exception(), file, line) - but where > are they picked up again ? > > > > Regards, Peter > > > > PS: the next questions will be most probably about deoptimizing... > be prepared for nasty questions :-) ! > > > > > From Thomas.Rodriguez at Sun.COM Mon Oct 22 10:06:02 2007 From: Thomas.Rodriguez at Sun.COM (Tom Rodriguez) Date: Mon, 22 Oct 2007 10:06:02 -0700 Subject: The right locks for frame rewriting / Exceptions In-Reply-To: References: Message-ID: <471CD87A.2010209@sun.com> > I'm on the way to do some frame-hacking (patch_pc, copy some frames into > another thread..) > > Now I need to be sure, that what I am doing is safe with regard to: > - any GC operation working on either of the two thread stacks > - neither thread is running currently on that part of the stack > = one thread must be sleeping, the other might call a VM function to > do exactly that stack-changing operation (even in its own stack..) > Q1) what kind of lock(s) is suited for this operation ? The main rule for modifying thread stacks in arbitrary ways is that any changes which aren't thread safe need to guarantee that no safepoint checks are performed while the stack is in an unsafe state. This means that you can't perform VM transitions or acquire most locks. The standard MutexLocker includes safepoint checks, though those can be skipped if you use a MutexLockerEx. You need to make sure you don't stop the system for an arbitrarily long time either, since you'll be stopping all threads if a GC is required during this period.
The best example of this is the deoptimization code which has an initial setup which is done without any special care in the VM and then once it's collected all the information it needs it proceeds carefully while constructing the new interpreter frames and populating them. A NoSafepointVerifier can be used to make sure you aren't getting safepoint checks in situations you don't want. It's possible you'd need your own lock to coordinate your work though I don't know whether that's true. You'd need to create what's called a leaf lock, meaning that no other locks can be acquired while it's held and that it's generally held for a relatively short period of time. The Patching_lock is an example of this. Obviously both threads would have to be blocked at the point you are doing this. How you coordinate them is up to you. You could use a Monitor to coordinate them, though offhand I can't remember whether you'd have to worry about any safepoint issues or if that's taken care of for you. > Q2) Now what does remove_activation exactly ? It does unlocking of > objects under certain circumstances... now could somebody literate > please shed some light on that ? I haven't figured out, how that beast > works... It basically removes an existing frame either so that you can resume in the caller or so that you can reexecute it. In the case of throwing an exception where there is no handler in the current frame it unlocks any locks held and then execution should move into the exception dispatch code to figure out how to handle exceptions in the caller frame. The basic model of exception dispatch is that you check the current frame for a handler and resume execution at that handler if it exists, otherwise you remove the current frame and then find the exception handler for the current return address, which might be an interpreter or compiled frame.
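The dispatch model described above (look for a handler in the current frame, otherwise remove the frame and retry in the caller) can be sketched as a small toy program. This is not HotSpot code: ToyFrame, HandlerEntry, DispatchResult, find_handler and dispatch_exception are hypothetical names made up for illustration, and the pop_back() merely stands in for what remove_activation does (no monitor unlocking is modelled).

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy model only: none of these types exist in HotSpot.
struct HandlerEntry {
    int start_bci;    // first bytecode index covered (inclusive)
    int end_bci;      // end of the covered range (exclusive)
    int handler_bci;  // where execution resumes if this entry matches
};

struct ToyFrame {
    std::string method;                  // method name, for illustration
    int bci;                             // current bci (return point for callers)
    std::vector<HandlerEntry> handlers;  // the method's exception table
};

struct DispatchResult {
    bool handled;          // false if the exception left the oldest frame
    std::string method;    // method whose handler was selected
    int handler_bci;       // bci to resume at, or -1
    int frames_removed;    // how many activations were popped
};

// Returns the handler bci covering the frame's current bci, or -1.
int find_handler(const ToyFrame& f) {
    for (const HandlerEntry& h : f.handlers) {
        assert(h.start_bci <= h.end_bci);
        if (f.bci >= h.start_bci && f.bci < h.end_bci) return h.handler_bci;
    }
    return -1;
}

// Youngest frame is at the back of the vector.
DispatchResult dispatch_exception(std::vector<ToyFrame>& stack) {
    int removed = 0;
    while (!stack.empty()) {
        ToyFrame& top = stack.back();
        int h = find_handler(top);
        if (h >= 0) {
            top.bci = h;  // resume execution at the handler
            return {true, top.method, h, removed};
        }
        stack.pop_back();  // stands in for remove_activation
        ++removed;
    }
    return {false, "", -1, removed};
}
```

With a two-frame stack where only the caller has a covering handler, dispatch_exception removes the callee frame and reports the caller's handler bci.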
> > Q3) implicit exceptions are being generated by some fancy path leading > to THROW_MSG/THROW_OOP which finally creates a exception oop of the > desired type, and sets that for the thread: > thread->set_pending_exception(h_exception(), file, line) - but where are > they picked up again ? The runtime sets up newly thrown exceptions in the _pending_exception field of the thread. Runtime code written in C++ checks this field directly. Generated code normally checks this field on return from calls to the runtime and the value is moved out into a special register if it's non-null and then we jump to the exception dispatch code. The call_VM code normally takes care of this and it's a requirement that generated code check this on return, otherwise the exception could hang around forever. tom > > > Regards, Peter > > > > PS: the next questions will be most probably about deoptimizing... be > prepared for nasty questions :-) ! > > > > From Thomas.Rodriguez at Sun.COM Tue Oct 23 18:23:09 2007 From: Thomas.Rodriguez at Sun.COM (Tom Rodriguez) Date: Tue, 23 Oct 2007 18:23:09 -0700 Subject: The right locks for frame rewriting / Exceptions In-Reply-To: References: <471CD87A.2010209@sun.com> Message-ID: <471E9E7D.1010204@sun.com> > I guess I found that in sharedRuntime_i486.cpp - > SharedRuntime::generate_deopt_blob(). I could see the > "save_live_registers" and the work thereafter. I assume a safepoint > is circumvented by calling the C function directly, without going > through callVM, and JRT_ENTRY, which would cause a ThreadInVMfromJava > to be allocated. That's right. That's what's generically called a leaf call and it's used in quite a few places like exception dispatch and for various helper functions which are implemented in C but aren't allowed to safepoint. > Well thats exactly my problem :-) I wanted to something like: > > [..jumping here from a return of a interpreter / compiled frame...] 
> otherThread = thisThread.otherThreadReference; > if(otherThread != NULL){ > adjustSP(-16); // 2 words, plus double word from XMM0 > save_RAX_into_Stack(); > save_RDX_into_Stack(); > save_XMM0_into_Stack(); > seen = Atomic::cmpxchg(1, otherThread.syncReq, 0); > if(seen == 0){ > this.monitor.wait(); // here, the returnPC could be changed in > between... > restore_XMM0_from_Stack(); > restore_RDX_from_Stack(); > restore_RAX_from_Stack(); > } > adjustSP(+16); > } > > What happens if I am having an object reference as return value, and the > GC decides to move just that object around, while my monitor is sleeping > ? As opposed to deopt, I cannot just run through this unsafe passage... I'm a little confused by your code and I can't quite make out what you are trying to accomplish. Is this piece of code part of every return or are you only jumping into this code sometimes? Are you expecting all C++ code or some mix? I think it would have to be a mix. You cannot do anything that blocks from within generated assembly. You always have to call into C++ code in the runtime if you want to perform a blocking operation. One important point to keep in mind is that you can rarely do tricky stuff the same way in both interpreted and compiled code. In the interpreter if you want to call some special piece of code as part of the execution of a return bytecode then you don't really need to tell the GC anything special for it to find the return value since you should be at a return bytecode with the value on the top of stack. You just need to flush the frame state and call into the VM and block. For compiled code it's more tricky and you need something like the SafepointBlob to handle the state saving. It is not safe for the VM to stop on the actual return instruction of compiled code since this creates various unpleasant states we don't want to deal with.
The way this works for safepointing is that there's a poll right before the return and if we stop there then we pop that frame off and then call into the runtime so it looks like we're stopped at the call in the caller frame. GC of the return value is handled specially since it doesn't really belong to any frame at this point. Look at the code in safepoint.cpp in the method handle_polling_page_exception, in particular the code guarded by is_at_poll_return. Are you expecting to check otherThreadReference for every return bytecode? That seems very expensive... Also because of inlining in compiled code you would only be checking it on return from the whole compile unless you modify the compiler to emit checks for every inlined return. Can you give me the 10 second explanation of what you are trying to do? > I am still a bit confused when it comes to that magic RegisterMap / GC > thing ... If I assume that my return value is a reference to a GC-able > object, and I am saving it in the stack, how do I tell this to the frame > walker? If you are referencing oops from generated code this is usually accomplished by describing it in an OopMap or by saving it in special fields in the JavaThread named _vm_result and _vm_result2. sharedRuntime_ and c1_Runtime1_ have examples of this. call_VM allows you to pass in registers which should be saved and restored for the GC in the special fields. The complexity in your case is that if you use one stub for all return types you don't know statically whether the return value is an oop or not so it's impossible to know whether it's ok to store them as oops. That's why the safepoint blob works the way it does. 
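To illustrate why a stub has to know statically which saved words are oops, here is a toy model of a moving collector that only rewrites the words a map marks as references. It is a sketch under loud assumptions: ToySavedSlots and toy_relocate are hypothetical names, and HotSpot's real OopMap and GC are far richer than this.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Toy model only; not HotSpot's OopMap.
struct ToySavedSlots {
    std::vector<std::uintptr_t> slots;  // raw saved register/stack words
    std::vector<bool> is_oop;           // the "map": which words hold references
};

// A moving collector in miniature: rewrites only the words that the
// map marks as references, using an old-address -> new-address table.
void toy_relocate(ToySavedSlots& saved,
                  const std::unordered_map<std::uintptr_t, std::uintptr_t>& forwarding) {
    assert(saved.slots.size() == saved.is_oop.size());
    for (std::size_t i = 0; i < saved.slots.size(); ++i) {
        if (!saved.is_oop[i]) continue;  // plain value: must not be touched
        auto it = forwarding.find(saved.slots[i]);
        if (it != forwarding.end()) saved.slots[i] = it->second;
    }
}
```

If two saved words hold the same bit pattern but only one is marked as a reference, only that one is updated; an integer that merely looks like an address stays intact. A stub that cannot tell the two cases apart has no safe choice, which is the problem with one stub for all return types.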
tom From peter.helfer.java at gmail.com Wed Oct 24 06:29:31 2007 From: peter.helfer.java at gmail.com (Peter Helfer) Date: Wed, 24 Oct 2007 15:29:31 +0200 Subject: The right locks for frame rewriting / Exceptions In-Reply-To: <471E9E7D.1010204@sun.com> References: <471CD87A.2010209@sun.com> <471E9E7D.1010204@sun.com> Message-ID: 2007/10/24, Tom Rodriguez : > > > I guess I found that in sharedRuntime_i486.cpp - > > SharedRuntime::generate_deopt_blob(). I could see the > > "save_live_registers" and the work thereafter. I assume a safepoint > > is circumvented by calling the C function directly, without going > > through callVM, and JRT_ENTRY, which would cause a ThreadInVMfromJava > > to be allocated. > > That's right. That's what's generically called a leaf call and it's > used in quite a few places like exception dispatch and for various > helper functions which are implemented in C but aren't allowed to > safepoint. > > > Well thats exactly my problem :-) I wanted to something like: > > > > [..jumping here from a return of a interpreter / compiled frame...] > > otherThread = thisThread.otherThreadReference; > > if(otherThread != NULL){ > > adjustSP(-16); // 2 words, plus double word from XMM0 > > save_RAX_into_Stack(); > > save_RDX_into_Stack(); > > save_XMM0_into_Stack(); > > seen = Atomic::cmpxchg(1, otherThread.syncReq, 0); > > if(seen == 0){ > > this.monitor.wait(); // here, the returnPC could be changed in > > between... > > restore_XMM0_from_Stack(); > > restore_RDX_from_Stack(); > > restore_RAX_from_Stack(); > > } > > adjustSP(+16); > > } > > > > What happens if I am having an object reference as return value, and the > > GC decides to move just that object around, while my monitor is sleeping > > ? As opposed to deopt, I cannot just run through this unsafe passage... > > I'm a little confused by your code and I can't quite make out what you > are trying to accomplish. 
Is this piece of code part of every return or > are you only jumping into this code sometimes? It is only used once in a while, i.e. not between every frame. The way it works is that two threads are started to work concurrently within the same method; the original frame gets a "join frame" inserted, which saves the original returnPC, and makes space for the results to be saved while joining. The newly created thread starts out with a ThreadFunction which calls a wait first; in the meantime the parent thread, in the VM, gives the child thread the new entrypoint and notifies it. The ThreadFunction then calls that entrypoint, and on return runs a similar "join function" in order to try synchronizing with the parent. (Cf. discussion "VM thread pool") Are you expecting all > C++ code or some mix? I think it would have to a mix. You cannot do > anything that blocks from within generated assembly. You always have to > call into C++ code in the runtime if you want to perform a blocking > operation. The plan for this piece of code (mix of generated asm, and calls into VM for blocking) is: it's inserted (at least for now) between two interpreter frames, by patching the return PC of the younger frame, forcing it to return through here. I haven't yet figured out how to influence the return from a "compiled frame" properly, but thought of inserting it into the i2c adapter, as this is the last piece of code under my control that touches the returnPC, before leaving off into compiled code. To make sure that I am getting every return, I will insert some code as well into exception handling for the interpreter - in the remove_activation handler, where synchronization will be aborted properly. If I got it right, the RuntimeExceptions force deoptimization (looking at DeoptReason) - but I can't recall what happens for user generated exceptions.. One important point to keep in mind is that you can rarely do tricky > stuff the same way in both interpreted and compiled code.
> In the
> interpreter if you want to call some special piece of code as part of the
> execution of a return bytecode then you don't really need to tell the GC
> anything special for it to find the return value since you should be at
> a return bytecode with the value on the top of stack. You just need to
> flush the frame state and call into the VM and block.
>
> For compiled code it's more tricky and you need something like the
> SafepointBlob to handle the state saving. It's not safe for the VM to
> stop on the actual return instruction of compiled code since this
> creates various unpleasant states we don't want to deal with. The way
> this works for safepointing is that there's a poll right before the
> return and if we stop there then we pop that frame off and then call
> into the runtime so it looks like we're stopped at the call in the
> caller frame. GC of the return value is handled specially since it
> doesn't really belong to any frame at this point. Look at the code in
> safepoint.cpp in the method handle_polling_page_exception, in particular
> the code guarded by is_at_poll_return.
>
> Are you expecting to check otherThreadReference for every return
> bytecode? That seems very expensive... Also because of inlining in
> compiled code you would only be checking it on return from the whole
> compile unless you modify the compiler to emit checks for every inlined
> return.

As explained above, the otherThreadRef is only checked in the join
frames. (Well, the child really checks another flag more often..)

> Can you give me the 10 second explanation of what you are trying to do?

Hopefully the explanation above is ok for you.

> > I am still a bit confused when it comes to that magic RegisterMap / GC
> > thing ... If I assume that my return value is a reference to a GC-able
> > object, and I am saving it in the stack, how do I tell this to the frame
> > walker?
>
> If you are referencing oops from generated code this is usually
> accomplished by describing it in an OopMap or by saving it in special
> fields in the JavaThread named _vm_result and _vm_result2.
> sharedRuntime_ and c1_Runtime1_ have examples of this.
> call_VM allows you to pass in registers which should be saved and
> restored for the GC in the special fields. The complexity in your case
> is that if you use one stub for all return types you don't know
> statically whether the return value is an oop or not so it's impossible
> to know whether it's ok to store them as oops. That's why the safepoint
> blob works the way it does.
>
> tom
>

Well, what I could do is generate two different join stubs, one for
value types and one for oops - but I still need to be sure that this
saved oop is not going to be changed.

----------------------------------------------

So let's assume I take the OopMap path. Maybe I'm stubborn or an idiot...
but I still don't get the point of the following code:

On entry into the code, we first save all registers onto the stack, then
we note down frame_complete (I don't know what it is supposed to mark);
call C code, but outside of the VM; then we add this OopMap to the set of
OopMaps by calling add_gc_map; what the range current_offset - start
should denote is not clear to me.
The new_runtime_stub allocates a ThreadInVMfromUnknown, in order to get
the CodeCache_lock, and then generates this RuntimeStub by calling the
constructor. Essentially it is calling CodeBlob(name, cb,
sizeof(RuntimeStub), size, frame_complete, frame_size, oop_maps) - which
in turn really sets up that piece of code by compacting it, writing down
the relocs, header, and instruction start.

RuntimeStub* generate_my_stub() {
  ResourceMark rm;
  const char* name = "my cool join stub";
  CodeBuffer buffer(name, 1000, 512);
  MacroAssembler* masm = new MacroAssembler(&buffer);
  int frame_size_words;
  OopMapSet* oop_maps = new OopMapSet();
  OopMap* map = NULL;

  int start = __ offset();

  // second argument: extra_words = 0
  map = RegisterSaver::save_live_registers(masm, 0, &frame_size_words);

  int frame_complete = __ offset();

  __ get_thread(rdi);
  __ pushl(rdi);
  __ set_last_Java_frame(rdi, noreg, rbp, NULL);  // thread is in rdi

  // calls static (whatever) Static::myFancyCFunction(JavaThread* thread);
  __ call(RuntimeAddress(CAST_FROM_FN_PTR(address, Static::myFancyCFunction)));

  // Set an oopmap for the call site.
  // We need this not only for callee-saved registers, but also for volatile
  // registers that the compiler might be keeping live across a safepoint.
  oop_maps->add_gc_map(__ offset() - start, map);

  [... do something with the results from RAX ]

  // make sure all code is generated
  masm->flush();

  // return the blob
  // frame_size_words or bytes??
  return RuntimeStub::new_runtime_stub(name, &buffer, frame_complete,
                                       frame_size_words, oop_maps, true);
}

----------------------------------------------

On the other hand, if I'm taking the save-oop-in-thread path, how does GC
make sure it doesn't touch those objects? Are you keeping a
don't-touch list? Does this code look safe?

[return from method, whose methodOop.is_returning_oop() == true]

proposal_stub_for_oop() {
  enum layout_for_all_join_stubs {
    returnPC = 0,
    rax,
    rdx,
    xmm0_l,
    xmm0_h,
    extra
  }

  // RAX has the oop
  get_thread(rcx)
  movl(Address(rcx, vm_result_offset()), rax);    // save the result Oop
  // load saved returnPC
  movl(rax, Address(rsp, returnPC*wordsize));
  movl(Address(rcx, vm_result_offset_2()), rax);  // save returnPC

  callVM(.. sleep_on_monitor()..);
  // set_vm_result_2 could be set to a new return addr!
  // RAX is ignored, i.e.
  // void
  get_thread(rcx)
  movl(rax, Address(rcx, vm_result_offset()));    // restore Oop
  movl(rcx, Address(rcx, vm_result_offset_2()));  // restore returnPC
  jmp(rcx);  // and jump to return pc (whether patched or not)
}

Ok, I hope it's not too much in one mail..
Thank you very much for working out all the details.. it really helps
push my research further!

Regards, Peter
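
For reference, the join handshake from the pseudocode quoted at the top of this mail (a cmpxchg on a sync flag, followed by a monitor wait) can be tried outside the VM in standard C++. This is only a rough sketch under stated assumptions: JoinPoint, arrive and run_join_demo are made-up names, std::atomic and std::condition_variable stand in for Atomic::cmpxchg and the VM monitor, and it assumes that whichever side reaches the join first is the one that blocks. It deliberately ignores everything that makes the real case hard (safepoints, oops held in registers, patched return PCs).

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical per-join state; none of these names come from HotSpot.
struct JoinPoint {
    std::atomic<int> sync_req{0};   // 0 = nobody arrived yet, 1 = one side arrived
    std::mutex lock;
    std::condition_variable cv;
    bool partner_arrived = false;   // guarded by lock
    int child_result = 0;           // written by the child before it arrives
};

// Called by each side when it reaches the join. Returns true if this
// caller was the first to arrive (and therefore had to wait).
bool arrive(JoinPoint& jp) {
    int expected = 0;
    // Similar in spirit to: seen = cmpxchg(1, &sync_req, 0); first == (seen == 0)
    bool first = jp.sync_req.compare_exchange_strong(expected, 1);
    std::unique_lock<std::mutex> guard(jp.lock);
    if (first) {
        // First arriver: block until the partner has shown up.
        jp.cv.wait(guard, [&jp] { return jp.partner_arrived; });
    } else {
        // Second arriver: publish the arrival and wake the waiter.
        jp.partner_arrived = true;
        jp.cv.notify_one();
    }
    return first;
}

// Runs a parent and a child through one join and returns the child's
// result as seen by the parent afterwards.
int run_join_demo(int child_value) {
    JoinPoint jp;
    bool child_first = false;
    std::thread child([&] {
        jp.child_result = child_value;  // "result saved while joining"
        child_first = arrive(jp);
    });
    bool parent_first = arrive(jp);
    child.join();
    assert(parent_first != child_first);  // exactly one side won the CAS
    return jp.child_result;
}
```

Calling run_join_demo(42) starts the child, sends both sides through arrive, and hands back the child's value once both have passed the join. In the VM the blocking branch would have to live in C++ runtime code reached via call_VM, since, as Tom says above, nothing that blocks can be done from generated assembly.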