From peter.helfer.java at gmail.com  Tue Oct  9 08:15:25 2007
From: peter.helfer.java at gmail.com (Peter Helfer)
Date: Tue, 9 Oct 2007 17:15:25 +0200
Subject: Interpreter calling a C method
Message-ID: <ee73e03b0710090815u1503964dn4a0be5ebe43ef66e@mail.gmail.com>

Hi all

I would like to call a C function in the interpreter when I invoke a new
method; I assume I have to change the TemplateTable::invokeZZZ, by inserting
some assembly instructions. I figured out the calls are finally done by
InterpreterMacroAssembler::jump_from_interpreted(Register method, Register
temp).

I thought I could use call_VM_leaf(CAST_FROM_FN_PTR(address,
SharedRuntime::do_funnymethod), 0) - but I get the error message
'InterpreterMacroAssembler::call_VM_leaf_base: last_sp != NULL'.

Now the questions..
a) how can I call a C method from the runtime-generated assembly without
destroying the current frame ?

b) how do I figure out which regs are used for what ? Do I have to go with
that OopMap and let it iterate over the frames ?

c) is there a definitive 'calling convention guide' for Java ? I have found
some information dispersed in several places (ok, they are at least all in
the same folder) about the frame layout = stack layout, about some regs and
their usage


Thanks, Peter

From Steve.Goldman at Sun.COM  Tue Oct  9 08:44:43 2007
From: Steve.Goldman at Sun.COM (steve goldman)
Date: Tue, 09 Oct 2007 11:44:43 -0400
Subject: Interpreter calling a C method
In-Reply-To: <ee73e03b0710090815u1503964dn4a0be5ebe43ef66e@mail.gmail.com>
References: <ee73e03b0710090815u1503964dn4a0be5ebe43ef66e@mail.gmail.com>
Message-ID: <470BA1EB.7000703@sun.com>

Peter Helfer wrote:
> Hi all
> 
> I would like to call a C function in the interpreter when I invoke a new
> method; I assume I have to change the TemplateTable::invokeZZZ, by inserting
> some assembly instructions. I figured out the calls are finally done by
> InterpreterMacroAssembler::jump_from_interpreted(Register method, Register
> temp).

You do want to modify TemplateTable::invokeZZZ.

> 
> I thought I could use call_VM_leaf(CAST_FROM_FN_PTR(address,
> SharedRuntime::do_funnymethod), 0) - but I get the error message
> 'InterpreterMacroAssembler::call_VM_leaf_base: last_sp != NULL'.

First you need to understand the difference between leaf calls and
non-leaf call_VM's. You can only do a call_VM_leaf if there is no
possibility of blocking for a safepoint. So if do_funnymethod does much
in the vm you don't want to be using leaf.

The next thing to learn about is what is called the frame anchor (or
last Java frame). Whenever we are leaving Java mode for either the vm or
native (jni) we need to store a description of where the last Java frame
was so that the stack walker can find all the frames should a safepoint
occur while in vm/native. This happens automatically when you use
call_VM (call_VM_leaf doesn't do this because you promised not to block).

The error you are getting appears to be happening because you tried to
jam your call_VM_leaf into jump_from_interpreted after this code:

   // set sender sp
   leal(rsi, Address(rsp, wordSize));
   // record last_sp
   movl(Address(rbp, frame::interpreter_frame_last_sp_offset * wordSize), rsi);

which is incorrect. call_VM_leaf expects that no blocking will occur, and
therefore that no one will need to see last_sp for the interpreter frame,
so it complains when last_sp is set. This assert is mostly there to ensure
that the interpreter frame's last_sp is properly set/cleared around Java
calls.

If you move your code ahead of the leal then it should work, as long as
you don't destroy the register "method".
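
As a rough illustration of the two behaviours described above, here is a
toy C++ model. It is not HotSpot code: every Toy* name is hypothetical, and
it only models that a non-leaf call records the frame anchor while a leaf
call records nothing and refuses to run once last_sp has been set.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Toy model, not HotSpot code; every Toy* name is hypothetical.
struct ToyInterpreterFrame {
    std::uintptr_t last_sp = 0;   // 0 plays the role of NULL
};

struct ToyThread {
    std::uintptr_t anchor_sp = 0; // "last Java frame" seen by the stack walker
    bool walkable() const { return anchor_sp != 0; }
};

// Non-leaf call: record the anchor, run the callee, clear the anchor again.
template <typename Fn>
void toy_call_VM(ToyThread& t, std::uintptr_t java_sp, Fn fn) {
    t.anchor_sp = java_sp;
    fn(t);
    t.anchor_sp = 0;
}

// Leaf call: no anchor is recorded, and a set last_sp is treated as an error.
template <typename Fn>
void toy_call_VM_leaf(const ToyInterpreterFrame& f, ToyThread& t, Fn fn) {
    if (f.last_sp != 0) {
        throw std::logic_error("call_VM_leaf: last_sp != NULL");
    }
    fn(t);
}
```

Inside toy_call_VM the callee sees a walkable thread; inside
toy_call_VM_leaf it does not, which is why the leaf variant must never block.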

> 
> Now the questions..
> a) how can I call a C method from the runtime-generated assembly without
> destroying the current frame ?

call_VM or call_VM_leaf
> 
> b) how do I figure out which regs are used for what ? Do I have to go with
> that OopMap and let it iterate over the frames ?

I don't really understand the question, so I'll ask more:

what regs are we talking about, and where? Since you are calling C the
only regs you care about are the ones the interpreter is already using.
call_VM* will save/restore them around the call. If do_funnymethod takes
any parameters you'll need to load them into regs that the interpreter
isn't using (rsi/rdi should be avoided).

You don't have to worry about oopMaps. For an interpreter frame there is
a fixed layout and the jvm figures out what is live based on the bci
(bytecode index) stored in the frame. oopMaps are typically used for
stubs and for jit compiled code.

> 
> c) is there a definite 'calling convention guide' for Java ? I have found
> some information dispersed in several places (ok, they are at least all in
> the same folder) about the frame layout = stack layout, about some regs and
> their usage
> 

I don't think so. There's some stuff in my blog 
(http://blogs.sun.com/fatcatair) about calling conventions but I don't 
think it is exactly what you want.


-- 
Steve


From Steve.Goldman at Sun.COM  Tue Oct  9 11:29:22 2007
From: Steve.Goldman at Sun.COM (steve goldman)
Date: Tue, 09 Oct 2007 14:29:22 -0400
Subject: Interpreter calling a C method
In-Reply-To: <ee73e03b0710091040h3991e925jc8f07538b171c67@mail.gmail.com>
References: <ee73e03b0710090815u1503964dn4a0be5ebe43ef66e@mail.gmail.com>
	<470BA1EB.7000703@sun.com>
	<ee73e03b0710091040h3991e925jc8f07538b171c67@mail.gmail.com>
Message-ID: <470BC882.7080200@sun.com>

Peter Helfer wrote:
> @Steve: thanks for the quick reply!
> 
> Ok, firstly my do_funnymethod is actually just calling tty->print_cr("Here I
> am") which should not cause a safepoint, I assume; so far it worked when
> compiling the rest of make debug_build with that message.

In some sense that is illegal since it is doing i/o and should it block 
the jvm will halt. In this exploratory instance it is mostly ok.

In general you need to use valid thread state transitions. While we are
in Java code the thread_state is "in_Java". When you transition to
somewhere where you essentially leave Java (a safepoint could occur) you
need to change the state so that the jvm won't deadlock. In general, on
entering the jvm you'll see some sort of entry macro (JRT_ENTRY,
IRT_ENTRY) that defines the type of entry and does the proper state
transitions.
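
To make the idea of a scoped state transition concrete, here is a toy C++
sketch. It is not HotSpot code and every Toy* name is hypothetical; it only
assumes that the entry macros place a guard object on the stack whose
constructor and destructor perform the transition, which is how
ThreadInVMfromJava is described later in this thread.

```cpp
#include <cassert>

// Toy model, not HotSpot code; every Toy* name is hypothetical.
enum class ToyThreadState { in_Java, in_vm };

struct ToyJavaThread {
    ToyThreadState state = ToyThreadState::in_Java;
};

// RAII guard: enter the "vm" state on construction, go back on destruction.
class ToyThreadInVMfromJava {
public:
    explicit ToyThreadInVMfromJava(ToyJavaThread& t) : thread_(t) {
        assert(thread_.state == ToyThreadState::in_Java);
        thread_.state = ToyThreadState::in_vm;
    }
    ~ToyThreadInVMfromJava() { thread_.state = ToyThreadState::in_Java; }

    ToyThreadInVMfromJava(const ToyThreadInVMfromJava&) = delete;
    ToyThreadInVMfromJava& operator=(const ToyThreadInVMfromJava&) = delete;

private:
    ToyJavaThread& thread_;
};

// Stand-in for a runtime entry: reports the state seen while inside the vm.
inline ToyThreadState toy_runtime_entry(ToyJavaThread& t) {
    ToyThreadInVMfromJava transition(t);
    return t.state;  // evaluated before the guard is destroyed
}
```

The point of the guard is that the state is restored on every exit path of
the entry function, not only on the normal return.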
> 
> I want to be able to tinker around with the frames just left by the
> interpreter (later on, it should be as well on compiled frames...), so I
> guess I have to stick to call_VM_base. What I've found is the interpreter
> frame layout in frame_i486.hpp:

You want to use call_VM (or call_VM_leaf); leave the _base methods alone.

> 
> // A frame represents a physical stack frame (an activation).  Frames can be
> // C or Java frames, and the Java frames can be interpreted or compiled.
> // In contrast, vframes represent source-level activations, so that one physical frame
> // can correspond to multiple source level frames because of inlining.
> // A frame is comprised of {pc, fp, sp}
> 
> // Layout of interpreter frame:
> //    [expression stack      ] * <- sp
> //    [monitors              ]   \
> //     ...                        | monitor block size
> //    [monitors              ]   /
> //    [monitor block size    ]
> //    [byte code index/pointr]                   = bcx()                bcx_offset
> //    [pointer to locals     ]                   = locals()             locals_offset
> //    [constant pool cache   ]                   = cache()              cache_offset
> //    [methodData            ]                   = mdp()                mdx_offset
> //    [methodOop             ]                   = method()             method_offset
> //    [last sp               ]                   = last_sp()            last_sp_offset
> //    [old stack pointer     ]                     (sender_sp)          sender_sp_offset
> //    [old frame pointer     ]   <- fp           = link()
> //    [return pc             ]
> //    [oop temp              ]                     (only for native calls)
> //    [locals and parameters ]
> //                               <- sender sp
> 
> As well I figured out that the interpreter tries to leave things in regs
> (RAX, RDX) rather than storing it always on stack; this is encoded in
> TosState, what it is. Now when calling call_vm, it should take care of
> those, as you are telling me, right ?

Not if you are doing arbitrary calls at arbitrary places. In the case of
invoke the tosca (top-of-stack-cache) has been flushed (i.e. stored to
the expression stack so that we're in "vtos" mode). If you try this in
arbitrary places the tosca must be flushed so that if a gc occurs it will
see the proper stack state.

> 
> To get the vframes beneath the current frame (which is now marked as native
> (=in VM) I believe), I call thread->last_frame to get all the frames (which
> can comprise multiple vframes), or thread->last_java_vframe to get only the
> last java frames, without any native threads ?

The stack walkers can only see a subset of frames. They won't see any
c++ frames. They can see interpreter frames, compiled frames, and stub
frames. When you use a vanilla frame to walk the stack (via sender calls)
you'll see individual frames as they exist on the stack (i.e. an
actual activation). Typically you do that kind of walk by starting with
thread->last_frame(). In the case of a compiled frame that actual
activation may contain multiple Java method frames because of inlining.
You can see every one of these by using a vframeStream. The stream is
constructed using the thread and you don't need to do sender type calls.
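
The difference between the two walks can be sketched with a toy C++ model.
This is not the HotSpot frame or vframeStream API; the Toy* names are
hypothetical and a frame is reduced to the list of Java methods it
represents.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy model, not HotSpot code; every Toy* name is hypothetical.
struct ToyFrame {
    // Java methods represented by this activation, innermost first.
    // An interpreted frame has one entry; a compiled frame may have several
    // because of inlining.
    std::vector<std::string> methods;
};

// Physical walk: one step per activation (like following sender()).
inline std::size_t toy_count_frames(const std::vector<ToyFrame>& stack) {
    return stack.size();
}

// Source-level walk: one entry per Java method, inlined methods expanded.
inline std::vector<std::string> toy_vframes(const std::vector<ToyFrame>& stack) {
    std::vector<std::string> out;
    for (const ToyFrame& f : stack) {
        for (const std::string& m : f.methods) {
            out.push_back(m);
        }
    }
    return out;
}
```

With one interpreted frame on top of a compiled frame that inlined one
callee, the physical walk sees two frames and the source-level walk sees
three methods.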

> 
> In my case is this the right assumption about the stack layout ?
> 
> (up, growing to 0x0000)
> 
> | local vars   | <- %ESP
> | saved ebp    | <- %EBP
> | return addr  |
> | param1       |
> | ...          |         the call_vm frame (which is cdecl)
> | param N      |
> ++++++++++++++++++
> | java_argN    | [ADDR1]
> | ...          |
> | java_arg1    |
> | objectref    | (as long as we are not static)
> | rest of the  |
> | expression-  |
> | stack        |
> | monitorN     |
> | ....         |
> | monitor0     |
> | monitorblsize|

I don't know what monitorblsize is. I think this is wrong.

> | bci/bcp      |-> pointing to the instruction invokeZZZ in methodOop->constMethodOop->codes_offset()+ ~bci~
> | ptrToLocals  |-> pointing to ADDR0
> | cpCache      |
> | methodData   |
> | methodOop    |
> | lastSP       |-> pointing to ADDR1
> | oldSP        |-> pointing to ADDR2 (senderSP)
> | oldFP        |-> pointing to ADDR3
> | returnPC     |-> the PC of the caller, either interp, c2i or deopt -stubcode
> | local0       | [ADDR0]
> | locals1-N    |
> | paramN       | 
> | ...          |
> | param0       |
> ++++++++++++++

This last section is almost all wrong. The params of the caller become 
the locals of the callee. Since this is a stack and param0 is pushed 
first it is at a higher address and it is local[0] for the callee. 
param1 is essentially local[-1], etc. Since the callee can have more 
locals than params the caller's stack is typically extended by the 
"extra locals".
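
The addressing rule above can be sketched in a few lines of C++. This is a
toy model with hypothetical names, not HotSpot code: the stack is a vector
of words in which a higher index stands for a higher address, and pushing
moves toward lower indices.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model, not HotSpot code; every name here is hypothetical.
struct ToyStack {
    std::vector<long> words;
    std::size_t sp;  // index of the most recently pushed word

    explicit ToyStack(std::size_t n) : words(n, 0), sp(n) {}

    // The stack grows toward lower indices (lower addresses).
    void push(long v) {
        assert(sp > 0);
        words[--sp] = v;
    }
};

// local[i] lives i words below local[0], i.e. at a lower address.
inline std::size_t toy_local_slot(std::size_t locals_base, std::size_t i) {
    assert(i <= locals_base);
    return locals_base - i;
}
```

If the caller pushes param0, param1, param2 and the callee then extends the
stack by two extra locals, local[0] is param0 at the highest address and
local[4] is the lowest slot.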

There might be some more things to learn at this blog: 
http://gbenson.livejournal.com/

which is a guy from RedHat doing a ppc port. He's using the c++
interpreter but a lot of the basics (like local layout) are the same.
I've left some comments in his blog about questions he has raised, trying
to make this clear.

> | either:      | [ADDR2]
> | -expression stack
> | -compiled stuff stack
> |              |
> | oldFP        | [ADDR3]
> |
> |
> 0xFFFF:
> 
> Finally, to mess up totally, I could use in call_VM_base
> thread->frame_anchor->set_last_Java_pc(address pc) to change the return
> point to something totally different (to a compiled stub, which knows about
> how to continue with that current expression stack for example) ?

No. The anchor pc is there just to make the frame recognizable. That 
mutator is probably only used by the c++ interpreter and if you change 
the value you mostly only succeed in crashing the stack walker.

In order to modify a return address you need to use frame::patch_pc().
This only works in very certain circumstances. You can't use it to
change where the VM is going to return to a stub. In general you are
very safe to modify the pc of last_frame(). More generally, you are
likely to crash something unless you are very careful about when you
apply a patch like this.
> 
> 
> One additional question: is there a general rule when a safepoint can happen
> ? I've seen the runtime/interfaceSupport.hpp, which declares all the macros
> JRT_ and IRT_ - which are allocating the
> ThreadInVMfromJava, and HandleMarkCleaner onstack to provoke a transition in
> the constructor/destructor
> Now by whom are safepoints placed ?

safepoint is a very overloaded term in the vm. A thread is at a "safe
point" whenever it is stack walkable and safe for the gc to modify its
stack. A safepoint is also used to mean when the vm thread has brought
all the Java threads to a safepoint. Reaching a safe point is a
cooperative process. If a Java thread transitions to native then it is at
a safepoint. Its stack is walkable and it will be blocked from further
execution if it tries to modify Java state. Threads in "vm" state are
not at a safepoint but they do have a stack that is walkable (in the
sense we can find it all). Since the thread is still executing it isn't
quite safe yet. If the vm thread decides to bring the world to a stop
for something like a gc, then a thread in vm that attempts to return to
Java (say compiled/interpreted code) will block itself. Threads
running in Java mode poll to see whether they should block.
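
The cooperative rules above can be written down as a small state model.
This is a toy C++ sketch with hypothetical names, not the HotSpot safepoint
code; it only encodes what is described in the previous paragraph.

```cpp
#include <cassert>
#include <vector>

// Toy model, not HotSpot code; every Toy* name is hypothetical.
enum class ToyState { in_Java, in_vm, in_native, blocked };

// A thread counts as "at a safepoint" if it is in native or already blocked.
inline bool toy_at_safepoint(ToyState s) {
    return s == ToyState::in_native || s == ToyState::blocked;
}

// State a thread ends up in when it wants to be in `wanted`.
// Wanting in_Java while a safepoint is requested models both a poll from
// Java code and an attempted return to Java from vm or native.
inline ToyState toy_next_state(ToyState wanted, bool safepoint_requested) {
    if (safepoint_requested && wanted == ToyState::in_Java) {
        return ToyState::blocked;
    }
    return wanted;
}

// The world is stopped once every thread is at a safepoint.
inline bool toy_world_stopped(const std::vector<ToyState>& threads) {
    for (ToyState s : threads) {
        if (!toy_at_safepoint(s)) {
            return false;
        }
    }
    return true;
}
```

In this model a thread that stays in vm keeps the world from being fully
stopped until it tries to go back to Java and blocks itself.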

-- 
Steve


From peter.helfer.java at gmail.com  Fri Oct 12 12:51:15 2007
From: peter.helfer.java at gmail.com (Peter Helfer)
Date: Fri, 12 Oct 2007 21:51:15 +0200
Subject: 3 questions
Message-ID: <ee73e03b0710121251p7ea5bbecyd0eb14ca1d66c4f0@mail.gmail.com>

Hi all

1) how far is the release of the disassembler to the public, specifically:
x86 ?

2) I'd like to allocate a pool of threads (JavaThreads) in the VM, and keep
them waiting until I figure out, what entrypoint they should take.

Now the plan would be to keep the threads preallocated, and let them wait on
a condition variable to be released. It seems only JavaThread(ThreadFunction
entry_point, size_t stack_size = 0) is to be used, the other constructor is
only for the main thread & jni, right ?

Now I would provide a function matching 'typedef void
(*ThreadFunction)(JavaThread*, TRAPS)' to the constructor, add to my pool
(just an array so far), and invoke Thread::start():


// assume JavaThread is extended by
// - a monitor (runtime/mutex.hpp) '_sleepVar'
// - an additional entry point of type address '_newentry' which should be
//   either the entry point of the interpreter when jumping back from a
//   method (assumed that bcp is correctly updated), or any instruction in a
//   compiled version of a java method. I assume (for now) that the frame is
//   correctly initialized to continue at that point.

while (true) {
      _sleepVar.wait(false /* no_safepoint_check */, 0 /* timeout */,
                     !_as_suspend_equivalent_flag /* as_suspend_equivalent */);
      // what is that last flag doing ?

      if (_newentry != NULL) {
            // In GCC AT&T syntax: jump to _newentry (clobbers eax)
            asm ("movl %0, %%eax; \n\t"
                 "jmp *%%eax"
                 :                  /* output: none */
                 : "r"(_newentry)   /* input: _newentry */
                 : "%eax"           /* clobbered register */
                 );
      }

      // return point of function
      _newentry = NULL;

      // do some housecleaning
      run_housecleaning();
}


.. and some starting function:

jbool start_entrypoint(address entrypoint){
     assert(entrypoint);
     JavaThread* thread = _singleton_pool.getThread();
     if(thread != NULL){
        thread->set_new_entry(entrypoint);  // setter for entry point
        thread->getSleepVar()->notify();    // getter for sleep var
        return true;
     }
     return false;
}


Does this look feasible or is there a better way to go for ? Is there a
thread pool around (apart from java.util.concurrent.Executor et al.) ?


3)
I know that the interpreter jumps away using jump_from(Method, temp) to jump
to either the compiled entry (_code->entry()) or again the interpreter
(_i2i_entry, _from_compiled initially). This entry corresponds to the type
of method (native, synchronized, accessors, empty, intrinsic aka math
functions, or zerolocals aka normal), and has been determined at link time
(methodOopDesc::link_method).

I know as well that many return stubs are generated in order to jump back
into the interpreter and pick up where it left off, as described in
AbstractInterpreterGenerator::generate_return_entry_for(TosState state, int
step), and stored into 'static Entrypoint
Interpreter::_return_entry[number_of_return_entries = 9]'.

If I'm not totally mistaken, the _return_entry[3] are for invokespecial,
static, virtual and [5] for invokeinterface, because of:

address AbstractInterpreter::return_entry(TosState state, int length) {
  guarantee(0 <= length && length < Interpreter::number_of_return_entries,
            "illegal length");
  return _return_entry[length].entry(state);
}

.. and in TemplateTable_i486.cpp, prepare_invoke:

  // compute return type
  __ shrl(flags, ConstantPoolCacheEntry::tosBits);
  // Make sure we don't need to mask flags for tosBits after the above shift
  ConstantPoolCacheEntry::verify_tosBits();
  // load return address
  { const int table =
      is_invokeinterface
      ? (int)Interpreter::return_5_addrs_by_index_table()
      : (int)Interpreter::return_3_addrs_by_index_table();
    __ movl(flags, Address(noreg, flags, Address::times_4, table));
  }

  // push return address
  __ pushl(flags);

  // Restore flag value from the constant pool cache, and restore rsi
  // for later null checks.  rsi is the bytecode pointer
  if (save_flags) {
    __ movl(flags, rsi);
    __ restore_bcp();
  }


So this code determines, by checking the TosBits in the child method, what
kind of return value it has to expect, computes the offset in either
return_X_addrs_by_index_table and pushes that value on the stack ?
So this means that it expects the result of that method in RAX (+RDX for
long/double), regardless of whether the child method is compiled or
interpreted ?
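
To check my own understanding of the lookup, here is a toy C++ model of a
return-entry table indexed first by invoke length and then by TosState. It
is not the HotSpot code: the Toy* names and the fake "addresses" (length *
100 + state) are hypothetical, and the order of the states in the enum is
an assumption.

```cpp
#include <array>
#include <cassert>
#include <stdexcept>

// Toy model, not HotSpot code; names, values and enum order are assumptions.
enum ToyTosState { btos, ctos, stos, itos, ltos, ftos, dtos, atos, vtos,
                   toy_number_of_states };

const int toy_number_of_return_entries = 9;

// One entry point per TosState; an int stands in for a code address.
struct ToyEntryPoint {
    std::array<int, toy_number_of_states> entry_for;
    int entry(ToyTosState state) const { return entry_for[state]; }
};

typedef std::array<ToyEntryPoint, toy_number_of_return_entries> ToyReturnTable;

// Fill the table with recognisable fake addresses: length * 100 + state.
inline ToyReturnTable toy_build_table() {
    ToyReturnTable t;
    for (int len = 0; len < toy_number_of_return_entries; ++len) {
        for (int s = 0; s < toy_number_of_states; ++s) {
            t[len].entry_for[s] = len * 100 + s;
        }
    }
    return t;
}

// Same shape as return_entry(state, length): pick by length, then by state.
inline int toy_return_entry(const ToyReturnTable& t, ToyTosState state,
                            int length) {
    if (length < 0 || length >= toy_number_of_return_entries) {
        throw std::out_of_range("illegal length");
    }
    return t[length].entry(state);
}
```

A 3-byte invoke returning an int would then resume at the entry for
(length 3, itos), and invokeinterface (5 bytes) at the entry for length 5.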


Now if I wanted to reroute the return call, I could change this pushed
return address to another stub, which would save the result (RAX/RDX), do
some freaky stuff like call the VM again, and finally return to the entry
that was replaced beforehand ?


Thanks, Peter


PS @Steve: your hint helped really well, thanks!
To bring Steve's answer again to the list - I had to add the save of the BCP
before leaving it, otherwise the assertion would fail in
methodOop::bcp_from(int bci)

void InterpreterMacroAssembler::jump_from_interpreted(Register method,
                                                      Register temp) {
  if (MyMagicEnabled) {
    Label ignore;

    cmpw(Address(method, methodOopDesc::myFlag_offset()), myFlagValue);
    jcc(Assembler::aboveEqual, ignore);

    restore_bcp(); // this saves the current BCP into the frame and allows
                   // to jump into the VM
    call_VM(temp, CAST_FROM_FN_PTR(address, MyCode::setMyFlagValueRight),
            temp, true);

    bind(ignore);
  }
  // add the custom code BEFORE moving the last_sp into place

  // set sender sp
  leal(rsi, Address(rsp, wordSize));
  // record last_sp
  movl(Address(rbp, frame::interpreter_frame_last_sp_offset * wordSize), rsi);

  // here is the jvmti in between

  // finally jump!
  jmp(Address(method, methodOopDesc::from_interpreted_offset()));
}

From David.Holmes at Sun.COM  Fri Oct 12 21:01:48 2007
From: David.Holmes at Sun.COM (David Holmes - Sun Microsystems)
Date: Sat, 13 Oct 2007 14:01:48 +1000
Subject: VM thread pool ( was:  3 questions)
In-Reply-To: <ee73e03b0710121251p7ea5bbecyd0eb14ca1d66c4f0@mail.gmail.com>
References: <ee73e03b0710121251p7ea5bbecyd0eb14ca1d66c4f0@mail.gmail.com>
Message-ID: <4710432C.80306@sun.com>

Peter,

One answer, or more comment :) to #2.

There is no existing native thread pool in the VM. I'm sure it must have 
been considered though, ... which means there are probably some 
non-obvious "gotchas" waiting out there. Someone else may be able to 
chime in with more info there.

If you are going to try this then a couple of things you need to watch for:

1. Cleanup and reinitialization of thread-local state (native level I 
mean not Java ThreadLocal).

2. Removing the threads from the ThreadsList while they are idle and 
adding back when needed. (As you need the ThreadsLock for this you might 
as well use it to protect the queue of pool threads too.)

3. I'm not sure I see the need for a new entry point - as long as these 
are always going to execute Java threads. You just need to turn the 
existing entry into a loop that will return to waiting when a 
logical-thread has completed - though some of the initialization done 
when a new thread is created will have to be moved to the thread itself 
as part of the entry logic. I don't see that you would want to "jump" to 
a new entry rather than just calling it and returning normally.

Other than the TLS issue I think this would be a fairly straight-forward 
exercise. The complexities come with all of the thread management 
policies you might want, and how to expose them - as with 
ThreadPoolExecutor. :)
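
A minimal sketch of the "loop that returns to waiting" shape from point 3,
written with standard C++ threads rather than VM threads. The class and its
members are hypothetical, and it ignores the ThreadsList and thread-local
cleanup issues from points 1 and 2; the only thing it shows is a worker
that calls the task and returns normally instead of jumping to an entry
point.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Toy worker, not VM code; every name here is hypothetical.
class ToyPoolWorker {
public:
    ToyPoolWorker() : thread_([this] { run(); }) {}

    ~ToyPoolWorker() {
        {
            std::lock_guard<std::mutex> g(m_);
            stop_ = true;
        }
        cv_.notify_all();
        thread_.join();
    }

    ToyPoolWorker(const ToyPoolWorker&) = delete;
    ToyPoolWorker& operator=(const ToyPoolWorker&) = delete;

    // Hand a task to the worker; returns false if it is not idle.
    bool start(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> g(m_);
            if (task_ || busy_) return false;
            task_ = std::move(task);
        }
        cv_.notify_all();
        return true;
    }

    // Block until the worker has no pending or running task.
    void wait_idle() {
        std::unique_lock<std::mutex> l(m_);
        cv_.wait(l, [this] { return !task_ && !busy_; });
    }

private:
    void run() {
        std::unique_lock<std::mutex> l(m_);
        for (;;) {
            cv_.wait(l, [this] { return stop_ || task_; });
            if (stop_) return;
            std::function<void()> t = std::move(task_);
            task_ = nullptr;
            busy_ = true;
            l.unlock();
            t();              // plain call; returns here when the task ends
            l.lock();
            busy_ = false;    // per-task cleanup would go here
            cv_.notify_all();
        }
    }

    std::mutex m_;
    std::condition_variable cv_;
    std::function<void()> task_;
    bool busy_ = false;
    bool stop_ = false;
    std::thread thread_;      // declared last so it starts after the rest
};
```

A caller would hand over work with start(...) and, if needed, use
wait_idle() to find out when the worker has returned to waiting.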

Cheers,
David Holmes

PS. I'll be traveling over the next few days so if there's any follow-up
I may not see it for a while.



From linuxhippy at gmail.com  Sat Oct 13 03:43:21 2007
From: linuxhippy at gmail.com (Clemens Eisserer)
Date: Sat, 13 Oct 2007 12:43:21 +0200
Subject: VM thread pool ( was: 3 questions)
In-Reply-To: <4710432C.80306@sun.com>
References: <ee73e03b0710121251p7ea5bbecyd0eb14ca1d66c4f0@mail.gmail.com>
	<4710432C.80306@sun.com>
Message-ID: <194f62550710130343t340bacddvf0ddf80d95bc372d@mail.gmail.com>

> 3. I'm not sure I see the need for a new entry point - as long as these
> are always going to execute Java threads. You just need to turn the
> existing entry into a loop that will return to waiting when a
> logical-thread has completed - though some of the initialization done
> when a new thread is created will have to be moved to the thread itself
> as part of the entry logic. I don't see that you would want to "jump" to
> a new entry rather than just calling it and returning normally.

Well, wouldn't it be great if the thread pool could be accessed from
native code too?
I imagine that for some quite heavy but "concurrent-ready" work it
might be good to have such a system. I am thinking of Java2D, where
large workloads (e.g. interpolating a large image, filling an antialiased
path) could benefit from the additional processors available.

lg Clemens


From avinash.lakshman at gmail.com  Sun Oct 14 20:25:26 2007
From: avinash.lakshman at gmail.com (Avinash Lakshman)
Date: Sun, 14 Oct 2007 20:25:26 -0700
Subject: Dolphin release and Escape Analysis
Message-ID: <a06de5520710142025t30551109je5294a3fd42d0efb@mail.gmail.com>

Hi All

I recently downloaded the latest Dolphin release. I was curious to check out
the much talked about stack allocation feature. Is this available in the
Dolphin release, and if so, how do I turn it on? Please advise.

Thanks
A

From peter.helfer.java at gmail.com  Mon Oct 15 04:15:28 2007
From: peter.helfer.java at gmail.com (Peter Helfer)
Date: Mon, 15 Oct 2007 13:15:28 +0200
Subject: The different stubs, what they are for..
Message-ID: <ee73e03b0710150415j72635244t8967c96473b4cd27@mail.gmail.com>

I'm trying to compile a small overview of the interpreter and how it all
works together. I stumbled across some stubs whose precise meaning or
intention I don't know yet; could somebody correct my assumptions ?

- early_ret(TosState) //  forced return by debugger/JVMTI, removes
activation frame, puts assignment compatible result on stack ?
- slow signature handler // what is that for ?
- continuation handler(TosState) // it sets interpreter mode (by setting
last_sp = NULL_WORD), and continues dispatching


Thanks, Peter

From Steve.Goldman at Sun.COM  Mon Oct 15 06:09:00 2007
From: Steve.Goldman at Sun.COM (steve goldman)
Date: Mon, 15 Oct 2007 09:09:00 -0400
Subject: The different stubs, what they are for..
In-Reply-To: <ee73e03b0710150415j72635244t8967c96473b4cd27@mail.gmail.com>
References: <ee73e03b0710150415j72635244t8967c96473b4cd27@mail.gmail.com>
Message-ID: <4713666C.5030602@sun.com>

Peter Helfer wrote:
> I'm trying to compile a small overview about the interpreter, and how it
> works all together. I stumbled across some Stubs I don't know yet the
> precise meaning or intention behind, could somebody correct my assumptions ?
> 
> - early_ret(TosState) //  forced return by debugger/JVMTI, removes
> activation frame, puts assignment compatible result on stack ?

jvmti can ask the jvm to abort the current activation as if it were 
complete and return a result of the type expected.

> - slow signature handler // what is that for ?

Passing native arguments is done by signature handlers that are separate 
little pieces of code for particular signatures. For signatures that are 
too wide (many parameters) there is a generic handler to copy the args 
from the location that Java put them to where the native call expects 
them. You could in theory run with only the slow signature handler. It 
is slow since it is a jvm entry and a safepoint could occur. The latter 
point has caused some bugs in the past because the youngest frame is at 
an interesting state.

> - continuation handler(TosState) // it sets interpreter mode (by setting
> last_sp = NULL_WORD), and continues dispatching

This is used as part of deoptimization. When we create an interpreter 
frame(s) to replace a compiled frame we need to come up with pc's to 
return to. Depending on the exact state we were in when the deopt 
happened we may need various spots in the interpreter to resume 
execution. This is one of those. Look at the code in 
vframeArrayElement::unpack_on_stack().

-- 
Steve


From peter.helfer.java at gmail.com  Mon Oct 15 09:33:10 2007
From: peter.helfer.java at gmail.com (Peter Helfer)
Date: Mon, 15 Oct 2007 18:33:10 +0200
Subject: My view of the interpreter
Message-ID: <ee73e03b0710150933r3093c112l185a5c26408ab4d@mail.gmail.com>

Ok, I thought it might be of interest to others..
- how it is generated, the stubs..
- how the interpreter works (how it is jumping around..)
- some words about registers/frame layout

I would greatly appreciate any comments / corrections / extensions !

Peter


Licensing: I believe that this should be under CC license.. any objections ?
-------------- next part --------------
A non-text attachment was scrubbed...
Name: hotspot_interpreter.pdf
Type: application/pdf
Size: 370015 bytes
Desc: not available
Url : http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/attachments/20071015/514431d4/attachment.pdf 

From linuxhippy at gmail.com  Mon Oct 15 11:10:12 2007
From: linuxhippy at gmail.com (Clemens Eisserer)
Date: Mon, 15 Oct 2007 20:10:12 +0200
Subject: My view of the interpreter
In-Reply-To: <ee73e03b0710150933r3093c112l185a5c26408ab4d@mail.gmail.com>
References: <ee73e03b0710150933r3093c112l185a5c26408ab4d@mail.gmail.com>
Message-ID: <194f62550710151110r3726fb7uc55237022c5c34ee@mail.gmail.com>

Thanks a lot, very interesting :)

lg Clemens


From Thomas.Rodriguez at Sun.COM  Tue Oct 16 16:17:20 2007
From: Thomas.Rodriguez at Sun.COM (Tom Rodriguez)
Date: Tue, 16 Oct 2007 16:17:20 -0700
Subject: Dolphin release and Escape Analysis
In-Reply-To: <a06de5520710142025t30551109je5294a3fd42d0efb@mail.gmail.com>
References: <a06de5520710142025t30551109je5294a3fd42d0efb@mail.gmail.com>
Message-ID: <47154680.9010805@sun.com>

Currently there is no support for stack allocation in hotspot.  EA currently is 
only used for lock elision and there's work in progress to support scalar 
replacement of objects.

Because hotspot GC is precise, true stack allocation trickles out into the rest 
of the system since code which was expecting to see a pointer into the heap 
might instead see pointers into the stack.  It's tractable but somewhat tricky.
An alternative would be to have a thread local area in the heap which can be 
managed directly by compiled code for the purposes of stack allocation.

We have an ongoing research project with the University of Linz in Austria 
around the hotspot compilers and as part of that they developed an escape analysis 
algorithm for the client compiler along with the needed runtime support for 
rematerialization of objects which was needed to support deoptimization.  The 
runtime support has been integrated into hotspot already but C2 uses a different 
algorithm than was used in the C1 work.  Anyway, you might find the papers at 
http://www.ssw.uni-linz.ac.at/General/Staff/TK/Research/Publications interesting 
as they discuss supporting true stack allocation.

tom

Avinash Lakshman wrote:
> Hi All
> 
> I recently downloaded the latest Dolphin release. I was curious to check 
> out the much talked about stack allocation feature. Is this available in 
> the Dolphin release and if so how do I turn it on. Please advise
> 
> Thanks
> A


From peter.helfer.java at gmail.com  Wed Oct 17 08:43:27 2007
From: peter.helfer.java at gmail.com (Peter Helfer)
Date: Wed, 17 Oct 2007 17:43:27 +0200
Subject: Bug when walking entry frame...?
Message-ID: <ee73e03b0710170843q494288f9p4ca4018f1ad8dc69@mail.gmail.com>

Hi all

I'm seeing this error.. I can make a workaround, but is this the intended
behavior ?

#>cd openjdk/control/build/linux-i586-debug/bin
#>./java
-------------------
Frame ID:               b7db7c04
Testers:
 is_interpreted_frame():        true
 is_java_frame():               true
 is_entry_frame():              false
 is_native_frame():             false
 is_runtime_frame():            false
 is_compiled_frame():           false
 is_safepoint_blob_frame():     false
 is_deoptimized():              false
 is_first_frame():                      false
 is_first_java_frame():         true
 is_interpreted_frame_valid():  true
 should_be_deoptimized():       false
 can_be_deoptimized():          false
frame size:             11
sender frame:           b7db7c30
real sender frame:      b7db7c30
-------------------
Frame ID:               b7db7c30
Testers:
 is_interpreted_frame():        false
 is_java_frame():               false
 is_entry_frame():              true
 is_native_frame():             false
 is_runtime_frame():            false
 is_compiled_frame():           false
 is_safepoint_blob_frame():     false
 is_deoptimized():              false
 is_first_frame():                      true
# To suppress the following error report, specify this argument
# after -XX: or in .hotspotrc:  SuppressErrorAt=/frame_i486.cpp:148
#
# An unexpected error has been detected by Java Runtime Environment:
#
#  Internal Error
(/home/phelfer/workspace/openjdk/hotspot/src/cpu/i486/vm/frame_i486.cpp:148),
pid=29739, tid=3084618640
#  Error: assert(!entry_frame_is_first(),"next Java fp must be non zero")
#

The code that leads to it:

print_custom(){
 [...]
 RegisterMap map(thread, false); // happens as well with 'true'
 tty->print_cr(" is_first_java_frame():\t\t%s", is_first_java_frame() ?
"true" : "false");
 tty->print_cr(" is_interpreted_frame_valid():\t%s",
is_interpreted_frame_valid() ? "true" : "false");
 tty->print_cr(" should_be_deoptimized():\t%s", should_be_deoptimized() ?
"true" : "false");
 tty->print_cr(" can_be_deoptimized():\t\t%s", can_be_deoptimized() ? "true"
: "false");
 tty->print_cr("frame size:\t\t%d", frame_size());
 tty->print_cr("sender frame:\t\t%x", sender(&map).id());
 tty->print_cr("real sender frame:\t%x", real_sender(&map).id());

}

frame frame::sender(RegisterMap* map) const {
  // Default is we done have to follow them. The sender_for_xxx will
  // update it accordingly
  map->set_include_argument_oops(false);

  if (is_entry_frame())       return sender_for_entry_frame(map);
  if (is_interpreted_frame()) return sender_for_interpreter_frame(map);
  assert(_cb == CodeCache::find_blob(pc()),"Must be the same");

  if (_cb != NULL) {
    return sender_for_compiled_frame(map);
  }
  // Must be native-compiled frame, i.e. the marshaling code for native
  // methods that exists in the core system.
  return frame(sender_sp(), link(), sender_pc());
}

frame frame::sender_for_entry_frame(RegisterMap* map) const {
  assert(map != NULL, "map must be set");
  // Java frame called from C; skip all C frames and return top C
  // frame of that chunk as the sender
  JavaFrameAnchor* jfa = entry_frame_call_wrapper()->anchor();
  assert(!entry_frame_is_first(), "next Java fp must be non zero");
  assert(jfa->last_Java_sp() > _sp, "must be above this frame on stack");
  map->clear();


Regards, Peter

From Thomas.Rodriguez at Sun.COM  Wed Oct 17 09:40:51 2007
From: Thomas.Rodriguez at Sun.COM (Tom Rodriguez)
Date: Wed, 17 Oct 2007 09:40:51 -0700
Subject: Bug when walking entry frame...?
In-Reply-To: <ee73e03b0710170843q494288f9p4ca4018f1ad8dc69@mail.gmail.com>
References: <ee73e03b0710170843q494288f9p4ca4018f1ad8dc69@mail.gmail.com>
Message-ID: <47163B13.409@sun.com>

I think you're asking for the sender of the oldest frame, which doesn't have a 
sender.  It's only safe to call sender() if !is_first_frame(), which is basically 
what the assert is complaining about.  By the way, it may also not be safe to 
call is_interpreted_frame_valid() on something that isn't an interpreter frame.
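
A minimal sketch of that guard, written against a toy frame type rather than
HotSpot's real frame class. ToyFrame, its caller pointer and walk() are made
up for illustration; only the rule "check is_first_frame() before asking for
the sender" comes from the discussion above.

```cpp
#include <cassert>
#include <vector>

// Toy stand-in for a stack frame; NOT HotSpot's frame class.
struct ToyFrame {
  int id;
  const ToyFrame* caller;  // nullptr for the oldest frame on the stack

  bool is_first_frame() const { return caller == nullptr; }

  // Same precondition as the failing assert: the oldest frame has no sender.
  const ToyFrame& sender() const {
    assert(!is_first_frame() && "oldest frame has no sender");
    return *caller;
  }
};

// Collect frame ids from youngest to oldest, asking for a sender
// only when the current frame actually has one.
std::vector<int> walk(const ToyFrame& youngest) {
  std::vector<int> ids;
  const ToyFrame* f = &youngest;
  while (true) {
    ids.push_back(f->id);
    if (f->is_first_frame()) break;  // guard before calling sender()
    f = &f->sender();
  }
  return ids;
}
```

A printing routine like print_custom() would apply the same check before the
two sender lines instead of calling them unconditionally.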

tom

Peter Helfer wrote:
> Hi all
> 
> I'm seeing this error.. I can make a workaround, but is this the 
> intended behavior ?
> 
> #>cd openjdk/control/build/linux-i586-debug/bin
> #>./java
> -------------------
> Frame ID:               b7db7c04
> Testers:
>  is_interpreted_frame():        true
>  is_java_frame():               true
>  is_entry_frame():              false
>  is_native_frame():             false
>  is_runtime_frame():            false
>  is_compiled_frame():           false
>  is_safepoint_blob_frame():     false
>  is_deoptimized():              false
>  is_first_frame():                      false
>  is_first_java_frame():         true
>  is_interpreted_frame_valid():  true
>  should_be_deoptimized():       false
>  can_be_deoptimized():          false
> frame size:             11
> sender frame:           b7db7c30
> real sender frame:      b7db7c30
> -------------------
> Frame ID:               b7db7c30
> Testers:
>  is_interpreted_frame():        false
>  is_java_frame():               false
>  is_entry_frame():              true
>  is_native_frame():             false
>  is_runtime_frame():            false
>  is_compiled_frame():           false
>  is_safepoint_blob_frame():     false
>  is_deoptimized():              false
>  is_first_frame():                      true
> # To suppress the following error report, specify this argument
> # after -XX: or in .hotspotrc:  SuppressErrorAt=/frame_i486.cpp:148
> #
> # An unexpected error has been detected by Java Runtime Environment:
> #
> #  Internal Error 
> (/home/phelfer/workspace/openjdk/hotspot/src/cpu/i486/vm/frame_i486.cpp:148), 
> pid=29739, tid=3084618640
> #  Error: assert(!entry_frame_is_first(),"next Java fp must be non zero")
> #
> 
> The code that leads to it:
> 
> print_custom(){
>  [...]
>  RegisterMap(thread, false); // happens as well with 'true'
>  tty->print_cr(" is_first_java_frame():\t\t%s", is_first_java_frame() ? 
> "true" : "false");
>  tty->print_cr(" is_interpreted_frame_valid():\t%s", 
> is_interpreted_frame_valid() ? "true" : "false");
>  tty->print_cr(" should_be_deoptimized():\t%s", should_be_deoptimized() 
> ? "true" : "false");
>  tty->print_cr(" can_be_deoptimized():\t\t%s", can_be_deoptimized() ? 
> "true" : "false");
>  tty->print_cr("frame size:\t\t%d", frame_size());
>  tty->print_cr("sender frame:\t\t%x", sender(&map).id());
>  tty->print_cr("real sender frame:\t%x", real_sender(&map).id());
> 
> }
> 
> frame frame::sender(RegisterMap* map) const {
>   // Default is we done have to follow them. The sender_for_xxx will
>   // update it accordingly
>   map->set_include_argument_oops(false);
> 
>   if (is_entry_frame())       return sender_for_entry_frame(map);
>   if (is_interpreted_frame()) return sender_for_interpreter_frame(map);
>   assert(_cb == CodeCache::find_blob(pc()),"Must be the same");
> 
>   if (_cb != NULL) {
>     return sender_for_compiled_frame(map);
>   }
>   // Must be native-compiled frame, i.e. the marshaling code for native
>   // methods that exists in the core system.
>   return frame(sender_sp(), link(), sender_pc());
> }
> 
> frame frame::sender_for_entry_frame(RegisterMap* map) const {
>   assert(map != NULL, "map must be set");
>   // Java frame called from C; skip all C frames and return top C
>   // frame of that chunk as the sender
>   JavaFrameAnchor* jfa = entry_frame_call_wrapper()->anchor();
>   assert(!entry_frame_is_first(), "next Java fp must be non zero");
>   assert(jfa->last_Java_sp() > _sp, "must be above this frame on stack");
>   map->clear();
>  
> 
> 
> Regards, Peter


From peter.helfer.java at gmail.com  Thu Oct 18 08:54:49 2007
From: peter.helfer.java at gmail.com (Peter Helfer)
Date: Thu, 18 Oct 2007 17:54:49 +0200
Subject: The right locks for frame rewriting / Exceptions
Message-ID: <ee73e03b0710180854gf0d0e93s66ddd8b3cffea591@mail.gmail.com>

Hi all

I'm about to do some frame hacking (patch_pc, copying some frames into
another thread..)

Now I need to be sure, that what I am doing is safe with regard to:
- any GC operation working on either of the two thread stacks
- neither thread is running currently on that part of the stack
  = one thread must be sleeping, the other might call a VM function to do
exactly that stack-changing operation (even in its own stack..)
Q1) what kind of lock(s) is suited for this operation ?


Additionally, assume I have an interpreted method calling an interpreted
method; this one rethrows an exception - that is, the local exception
handler couldn't handle the exception. Did I get that right: (x86-specific)

- Bytecode 'athrow' calls Interpreter::throw_exception_entry() with the
exception object (oop) in RAX. This empties the expression stack and FPU
stack, and calls InterpreterRuntime::exception_handler_for_exception. This
resolves the exception into the handler for it, returning the exception in
RDX; and the handler in RAX.
The resolving process checks whether a 'catch' is around for that
(methodOop->fast_exception_handler_bci_for greater zero), adds this BCI to
the BCP, (handler_pc = h_method->code_base() + handler_bci) and returns the
dispatch table entry for that one: continuation =
Interpreter::dispatch_table(vtos)[*handler_pc];
Oh, I'm getting off topic: if there is no handler around, it returns the
Interpreter::remove_activation_entry().

This remove_activation_entry does save the exception from the stack into
RAX, and saves RAX again in currentThread::vmResult. Now it calls
masm->remove_activation(TosState=vtos, returnaddr=rdx,
throwMonitorException=false, installMonitorException=true,
notifyJVMDI=false). After the removal, it restores the exception into RAX
(verifyOop as well), and calls just again (save temporarily RAX, RDX)
InterpreterRuntime::exception_handler_for_exception, save this result into
RBX (restore RDX, RAX) and jump there..

Q2) Now what does remove_activation do exactly ? It does unlocking of objects
under certain circumstances... now could somebody literate please shed some
light on that ? I haven't figured out, how that beast works...


Q3) implicit exceptions are being generated by some fancy path leading to
THROW_MSG/THROW_OOP which finally creates an exception oop of the desired
type, and sets that for the thread:
thread->set_pending_exception(h_exception(), file, line) - but where are
they picked up again ?


Regards, Peter


PS: the next questions will be most probably about deoptimizing... be
prepared for nasty questions :-) !

From peter.helfer.java at gmail.com  Fri Oct 19 09:50:12 2007
From: peter.helfer.java at gmail.com (Peter Helfer)
Date: Fri, 19 Oct 2007 18:50:12 +0200
Subject: The right locks for frame rewriting / Exceptions
In-Reply-To: <ee73e03b0710180854gf0d0e93s66ddd8b3cffea591@mail.gmail.com>
References: <ee73e03b0710180854gf0d0e93s66ddd8b3cffea591@mail.gmail.com>
Message-ID: <ee73e03b0710190950p43dff3dn85483b7c870bf513@mail.gmail.com>

Ok, I found one answer myself... for the rest I am still trying to figure
out. One additional question which came to my mind:

When creating a new JavaThread, do I have to pass a java.lang.Thread-Oop to
the newly created thread ? And if so, what is the right way to create that ?


JavaThread* TPool::createThread(ThreadFunction entrypoint){
        JavaThread* result_thread = NULL;
        {
         MutexLocker ml_thread(Threads_lock);
         result_thread = new JavaThread(entrypoint); //size = 0
         if (result_thread->osthread() != NULL) {
                //jobject javalangThreadObj = DO I NEED THAT ?
                // result_thread->prepare(javalangThreadObj);
         } else {
                delete result_thread;
                result_thread = NULL;
                // no message of failed thread creation...
         }
         // Thread::start(result_thread) -- nope, we are just creating it...
        }
        return result_thread; // NULL if thread creation failed
}


Regards, Peter


2007/10/18, Peter Helfer <peter.helfer.java at gmail.com>:
>
> Hi all
>
> I'm on the way to do some frame-hacking (patch_pc, copy some frames into
> another thread..)
>
> Now I need to be sure, that what I am doing is safe with regard to:
> - any GC operation working on either of the two thread stacks
> - neither thread is running currently on that part of the stack
>   = one thread must be sleeping, the other might call a VM function to do
> exactly that stack-changing operation (even in its own stack..)
> Q1) what kind of lock(s) is suited for this operation ?
>
>
>
> Additionally, assume I have an interpreted method calling an interpreted
> method; this one rethrows an exception - that is, the local exception
> handler couldn't handle the exception. Did I get that right: (x86-specific)
>
> - Bytecode 'athrow' calls Interpreter::throw_exception_entry() with the
> exception object (oop) in RAX. This empties the expression stack and FPU
> stack, and calls InterpreterRuntime::exception_handler_for_exception. This
> resolves the exception into the handler for it, returning the exception in
> RDX; and the handler in RAX.
> The resolving process checks whether a 'catch' is around for that
> (methodOop->fast_exception_handler_bci_for greater zero), adds this BCI to
> the BCP, (handler_pc = h_method->code_base() + handler_bci) and returns the
> dispatch table entry for that one: continuation =
> Interpreter::dispatch_table(vtos)[*handler_pc];
> Oh, Im getting off topic: if there is no handler around, it returns the
> Interpreter::remove_activation_entry().
>
> This remove_activation_entry does save the exception from the stack into
> RAX, and saves RAX again in currentThread::vmResult. Now it calls
> masm->remove_activation(TosState=vtos, returnaddr=rdx,
> throwMonitorException=false, installMonitorException=true,
> notifyJVMDI=false). After the removal, it restores the exception into RAX
> (verifyOop as well), and calls just again (save temporarly RAX, RDX)
> InterpreterRuntime::exception_handler_for_exception, save this result into
> RBX (restore RDX, RAX) and jump there..
>
> Q2) Now what does remove_activation exactly ? It does unlocking of objects
> under certain circumstances... now could somebody literate please shed some
> light on that ? I haven't figured out, how that beast works...


Doh, I can answer that myself after scrolling around... Read the source,
luke :-)  interp_masm_i486.cpp says:

// remove activation
//
// Unlock the receiver if this is a synchronized method.
// Unlock any Java monitors from synchronized blocks.
// Remove the activation from the stack.
//
// If there are locked Java monitors
//    If throw_monitor_exception
//       throws IllegalMonitorStateException
//    Else if install_monitor_exception
//       installs IllegalMonitorStateException
//    Else
//       no error processing


> Q3) implicit exceptions are being generated by some fancy path leading to
> THROW_MSG/THROW_OOP which finally creates a exception oop of the desired
> type, and sets that for the thread:
> thread->set_pending_exception(h_exception(), file, line) - but where are
> they picked up again ?
>
>
>
> Regards, Peter
>
>
>
> PS: the next questions will be most probably about deoptimizing... be
> prepared for nasty questions :-) !
>
>
>
>
>

From David.Holmes at Sun.COM  Sat Oct 20 06:13:43 2007
From: David.Holmes at Sun.COM (David Holmes)
Date: Sat, 20 Oct 2007 23:13:43 +1000
Subject: The right locks for frame rewriting / Exceptions
In-Reply-To: <ee73e03b0710190950p43dff3dn85483b7c870bf513@mail.gmail.com>
References: <ee73e03b0710180854gf0d0e93s66ddd8b3cffea591@mail.gmail.com>
	<ee73e03b0710190950p43dff3dn85483b7c870bf513@mail.gmail.com>
Message-ID: <4719FF07.7010604@sun.com>

Peter,

Regarding the thread question. Take a look at attach_current_thread to see 
how a Java Thread object is created for an existing native thread. There 
should always be an associated Thread oop while a JavaThread is active. Of 
course when you are starting a java.lang.Thread you already have the Thread oop.

David Holmes

Peter Helfer wrote:
> Ok, I found one answer myself... for the rest I am still trying to 
> figure out. One additional question which came to my mind:
> 
> When creating a new JavaThread, do I have to pass a java.lang.Thread-Oop 
> to the newly created thread ? And if so, what is the right way to create 
> that ?
> 
>  
> JavaThread* TPool::createThread(ThreadFunction entrypoint){
>         JavaThread* result_thread = NULL;
>         {
>          MutexLocker ml_thread(Threads_lock);
>          result_thread = new JavaThread(entrypoint); //size = 0
>          if (result_thread->osthread() != NULL) {
>                 //jobject javalangThreadObj = DO I NEED THAT ?
>                 // result_thread->prepare(javalangThreadObj);
>          } else {
>                 delete result_thread;
>                 result_thread = NULL;
>                 // no message of failed thread creation...
>          }
>          // Thread::start(result_thread) -- nope, we are just creating it...
>         }
> }
>  
> 
> 
> Regards, Peter
> 
> 
> 
> 2007/10/18, Peter Helfer < peter.helfer.java at gmail.com 
> <mailto:peter.helfer.java at gmail.com>>:
> 
>     Hi all
> 
>     I'm on the way to do some frame-hacking (patch_pc, copy some frames
>     into another thread..)
> 
>     Now I need to be sure, that what I am doing is safe with regard to:
>     - any GC operation working on either of the two thread stacks
>     - neither thread is running currently on that part of the stack
>       = one thread must be sleeping, the other might call a VM function
>     to do exactly that stack-changing operation (even in its own stack..)
>     Q1) what kind of lock(s) is suited for this operation ?
> 
> 
> 
>     Additionally, assume I have an interpreted method calling an
>     interpreted method; this one rethrows an exception - that is, the
>     local exception handler couldn't handle the exception. Did I get
>     that right: (x86-specific)
> 
>     - Bytecode 'athrow' calls Interpreter::throw_exception_entry() with
>     the exception object (oop) in RAX. This empties the expression stack
>     and FPU stack, and calls
>     InterpreterRuntime::exception_handler_for_exception. This resolves
>     the exception into the handler for it, returning the exception in
>     RDX; and the handler in RAX.
>     The resolving process checks whether a 'catch' is around for that
>     (methodOop->fast_exception_handler_bci_for greater zero), adds this
>     BCI to the BCP, (handler_pc = h_method->code_base() + handler_bci)
>     and returns the dispatch table entry for that one: continuation =
>     Interpreter::dispatch_table(vtos)[*handler_pc];
>     Oh, Im getting off topic: if there is no handler around, it returns
>     the Interpreter::remove_activation_entry().
> 
>     This remove_activation_entry does save the exception from the stack
>     into RAX, and saves RAX again in currentThread::vmResult. Now it
>     calls masm->remove_activation(TosState=vtos, returnaddr=rdx,
>     throwMonitorException=false, installMonitorException=true,
>     notifyJVMDI=false). After the removal, it restores the exception
>     into RAX (verifyOop as well), and calls just again (save temporarly
>     RAX, RDX) InterpreterRuntime::exception_handler_for_exception, save
>     this result into RBX (restore RDX, RAX) and jump there..
> 
>     Q2) Now what does remove_activation exactly ? It does unlocking of
>     objects under certain circumstances... now could somebody literate
>     please shed some light on that ? I haven't figured out, how that
>     beast works... 
> 
> 
> Doh, I can answer that myself after scrolling around... Read the source, 
> luke :-)  interp_masm_i486.cpp says:
> 
> // remove activation
> //
> // Unlock the receiver if this is a synchronized method.
> // Unlock any Java monitors from synchronized blocks.
> // Remove the activation from the stack.
> //
> // If there are locked Java monitors
> //    If throw_monitor_exception
> //       throws IllegalMonitorStateException
> //    Else if install_monitor_exception
> //       installs IllegalMonitorStateException
> //    Else
> //       no error processing
> 
> 
>     Q3) implicit exceptions are being generated by some fancy path
>     leading to THROW_MSG/THROW_OOP which finally creates a exception oop
>     of the desired type, and sets that for the thread: 
>     thread->set_pending_exception(h_exception(), file, line) - but where
>     are they picked up again ?
> 
> 
> 
>     Regards, Peter
> 
> 
> 
>     PS: the next questions will be most probably about deoptimizing...
>     be prepared for nasty questions :-) !
> 
> 
> 
> 
> 


From Thomas.Rodriguez at Sun.COM  Mon Oct 22 10:06:02 2007
From: Thomas.Rodriguez at Sun.COM (Tom Rodriguez)
Date: Mon, 22 Oct 2007 10:06:02 -0700
Subject: The right locks for frame rewriting / Exceptions
In-Reply-To: <ee73e03b0710180854gf0d0e93s66ddd8b3cffea591@mail.gmail.com>
References: <ee73e03b0710180854gf0d0e93s66ddd8b3cffea591@mail.gmail.com>
Message-ID: <471CD87A.2010209@sun.com>

 > I'm on the way to do some frame-hacking (patch_pc, copy some frames into
 > another thread..)
 >
 > Now I need to be sure, that what I am doing is safe with regard to:
 > - any GC operation working on either of the two thread stacks
 > - neither thread is running currently on that part of the stack
 >   = one thread must be sleeping, the other might call a VM function to
 > do exactly that stack-changing operation (even in its own stack..)
 > Q1) what kind of lock(s) is suited for this operation ?

The main rule for modifying thread stacks in arbitrary ways is that any 
changes which aren't thread safe need to guarantee that no safepoint 
checks are performed while the stack is in an unsafe state.  This means 
that you can't perform VM transitions or acquire most locks.  The standard 
MutexLocker includes safepoint checks, though those can be skipped if you 
use a MutexLockerEx.  You need to make sure you don't stop the system 
for an arbitrarily long time either, since you'll be stopping all threads 
if a GC is required during this period.  The best example of this is the 
deoptimization code, which has an initial setup which is done without any 
special care in the VM, and then once it's collected all the information 
it needs it proceeds carefully while constructing the new interpreter 
frames and populating them.  A NoSafepointVerifier can be used to make 
sure you aren't getting safepoint checks in situations you don't want.
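
To illustrate the idea (not the real class), here is a toy scope guard that
flags safepoint checks made while the stack is in an unsafe state.  ToyThread,
ToyNoSafepointScope and rewrite_frames() are invented for this sketch, and it
counts violations instead of asserting the way a debug-build verifier would.

```cpp
#include <cassert>

// Toy model only; these names do not match HotSpot's classes.
struct ToyThread {
  int no_safepoint_depth = 0;  // > 0 while the stack is in an unsafe state
  int violations = 0;          // safepoint checks attempted while unsafe

  // Called wherever the thread would poll for a safepoint.
  void safepoint_check() {
    if (no_safepoint_depth > 0) ++violations;
  }
};

// RAII guard marking a scope in which safepoint checks are not allowed.
class ToyNoSafepointScope {
 public:
  explicit ToyNoSafepointScope(ToyThread& t) : _t(t) { ++_t.no_safepoint_depth; }
  ~ToyNoSafepointScope() { --_t.no_safepoint_depth; }
  ToyNoSafepointScope(const ToyNoSafepointScope&) = delete;
  ToyNoSafepointScope& operator=(const ToyNoSafepointScope&) = delete;

 private:
  ToyThread& _t;
};

// Setup may safepoint; the frame rewrite itself must not.
int rewrite_frames(ToyThread& t, bool buggy) {
  t.safepoint_check();  // fine: stack is still consistent
  {
    ToyNoSafepointScope guard(t);
    if (buggy) t.safepoint_check();  // flagged: stack is half rewritten
    // ... patch frames here ...
  }
  t.safepoint_check();  // fine again
  return t.violations;
}
```

The split mirrors the deoptimization pattern described above: collect
everything you need first, then do the delicate part inside the guarded scope.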

It's possible you'd need your own lock to coordinate your work though I 
don't know whether that's true.  You'd need to create what's called a 
leaf lock, meaning that no other locks can be acquired while it's held 
and that it's generally held for a relatively short period of time.  The 
Patching_lock is an example of this.

Obviously both threads would have to be blocked at the point you are 
doing this.  How you coordinate them is up to you.  You could use a 
Monitor to coordinate them, though off hand I can't remember whether 
you'd have to worry about any safepoint issues or if that's taken care of 
for you.

> Q2) Now what does remove_activation exactly ? It does unlocking of 
> objects under certain circumstances... now could somebody literate 
> please shed some light on that ? I haven't figured out, how that beast 
> works...

It basically removes an existing frame either so that you can resume in 
the caller or so that you can reexecute it.  In the case of throwing an 
exception where there is no handler in the current frame it unlocks any 
locks held and then execution should move into the exception dispatch 
code to figure out how to handle exceptions in the caller frame.  The 
basic model of exception dispatch is that you check the current frame 
for a handler and resume execution at that handler if it exists, 
otherwise you remove the current frame and then find the exception 
handler for the current return address, which might be an interpreter or 
compiled frame.
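
As a rough sketch of that model, with a toy stack and handler tables standing
in for real frames (ToyActivation, ToyHandler and toy_dispatch_exception() are
made up, and the interpreted/compiled distinction is ignored):

```cpp
#include <cassert>
#include <string>
#include <vector>

// One entry of a method's exception table (toy version).
struct ToyHandler {
  int start_bci;
  int end_bci;      // exclusive
  int handler_bci;
  std::string type;
};

// One activation: the method, where it currently is, and its handlers.
struct ToyActivation {
  std::string method;
  int bci;
  std::vector<ToyHandler> handlers;
};

struct ToyDispatch {
  std::string method;  // empty if nobody handled the exception
  int handler_bci;     // -1 if nobody handled the exception
};

// Returns the handler bci covering the frame's current bci, or -1.
int toy_handler_for(const ToyActivation& f, const std::string& ex_type) {
  for (const ToyHandler& h : f.handlers) {
    if (h.type == ex_type && f.bci >= h.start_bci && f.bci < h.end_bci) {
      return h.handler_bci;
    }
  }
  return -1;
}

// stack.back() is the youngest frame.  Check it for a handler; if there is
// none, remove the activation and retry in the caller.
ToyDispatch toy_dispatch_exception(std::vector<ToyActivation>& stack,
                                   const std::string& ex_type) {
  while (!stack.empty()) {
    int bci = toy_handler_for(stack.back(), ex_type);
    if (bci >= 0) {
      stack.back().bci = bci;  // resume execution at the handler
      return ToyDispatch{stack.back().method, bci};
    }
    stack.pop_back();  // the "remove_activation" step (unlocking omitted)
  }
  return ToyDispatch{"", -1};
}
```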

> 
> Q3) implicit exceptions are being generated by some fancy path leading 
> to THROW_MSG/THROW_OOP which finally creates a exception oop of the 
> desired type, and sets that for the thread:  
> thread->set_pending_exception(h_exception(), file, line) - but where are 
> they picked up again ?

The runtime sets up newly thrown exceptions in the _pending_exception 
field of the thread.  Runtime code written in C++ checks this field 
directly.  Generated code normally checks this field on return from 
calls to the runtime and the value is moved out into a special register 
if it's non-null and then we jump to the exception dispatch code.  The 
call_VM code normally takes care of this and it's a requirement that 
generated code check this on return, otherwise the exception could hang 
around forever.
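
A toy version of that protocol, to make the "check on return" step concrete.
None of these names are HotSpot's: ToyJavaThread, toy_runtime_divide() and
toy_call_vm_divide() are invented, and the "special register" is just a field
in the result struct.

```cpp
#include <cassert>
#include <string>

// Toy stand-in for a thread; only the pending-exception slot is modelled.
struct ToyJavaThread {
  std::string pending_exception;  // empty means nothing is pending
  bool has_pending_exception() const { return !pending_exception.empty(); }
};

// A "runtime" function: on failure it records the exception in the thread
// and returns a dummy value, like code using the THROW macros.
int toy_runtime_divide(ToyJavaThread& thread, int a, int b) {
  if (b == 0) {
    thread.pending_exception = "java.lang.ArithmeticException";
    return 0;
  }
  return a / b;
}

struct ToyCallResult {
  bool threw;
  int value;
  std::string exception;  // plays the role of the exception register
};

// What a call_VM-style wrapper does for generated code: call the runtime,
// then check the pending-exception field before using the result.
ToyCallResult toy_call_vm_divide(ToyJavaThread& thread, int a, int b) {
  int value = toy_runtime_divide(thread, a, b);
  if (thread.has_pending_exception()) {
    ToyCallResult r{true, 0, thread.pending_exception};
    thread.pending_exception.clear();  // so it does not hang around forever
    return r;                          // caller goes to exception dispatch
  }
  return ToyCallResult{false, value, ""};
}
```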

tom

> 
> 
> Regards, Peter
> 
> 
> 
> PS: the next questions will be most probably about deoptimizing... be 
> prepared for nasty questions :-) !
> 
> 
> 
> 


From Thomas.Rodriguez at Sun.COM  Tue Oct 23 18:23:09 2007
From: Thomas.Rodriguez at Sun.COM (Tom Rodriguez)
Date: Tue, 23 Oct 2007 18:23:09 -0700
Subject: The right locks for frame rewriting / Exceptions
In-Reply-To: <ee73e03b0710231227u5ba91701yeb08d3f1a2f25bbf@mail.gmail.com>
References: <ee73e03b0710180854gf0d0e93s66ddd8b3cffea591@mail.gmail.com>
	<471CD87A.2010209@sun.com>
	<ee73e03b0710231227u5ba91701yeb08d3f1a2f25bbf@mail.gmail.com>
Message-ID: <471E9E7D.1010204@sun.com>

> I guess I found that in sharedRuntime_i486.cpp -
> SharedRuntime::generate_deopt_blob(). I could see the
> "save_live_registers" and the work thereafter. I assume a safepoint
> is circumvented by calling the C function directly, without going
> through callVM, and JRT_ENTRY, which would cause a ThreadInVMfromJava
> to be allocated.

That's right.  That's what's generically called a leaf call and it's 
used in quite a few places like exception dispatch and for various 
helper functions which are implemented in C but aren't allowed to safepoint.

> Well that's exactly my problem :-)   I wanted to do something like:
> 
> [..jumping here from a return of a interpreter / compiled frame...]
> otherThread = thisThread.otherThreadReference;
> if(otherThread != NULL){
>   adjustSP(-16);  // 2 words,  plus double word from XMM0
>   save_RAX_into_Stack();
>   save_RDX_into_Stack();
>   save_XMM0_into_Stack();
>   seen = Atomic::cmpxchg(1, otherThread.syncReq, 0);
>   if(seen == 0){
>      this.monitor.wait();  // here, the returnPC could be changed in 
> between...
>      restore_XMM0_from_Stack();
>      restore_RDX_from_Stack();
>      restore_RAX_from_Stack();
>   }
>   adjustSP(+16);
> }
> 
> What happens if I am having an object reference as return value, and the 
> GC decides to move just that object around, while my monitor is sleeping 
> ? As opposed to deopt, I cannot just run through this unsafe passage...

I'm a little confused by your code and I can't quite make out what you 
are trying to accomplish.  Is this piece of code part of every return or 
are you only jumping into this code sometimes?  Are you expecting all 
C++ code or some mix?  I think it would have to be a mix.  You cannot do 
anything that blocks from within generated assembly.  You always have to 
call into C++ code in the runtime if you want to perform a blocking 
operation.

One important point to keep in mind is that you can rarely do tricky 
stuff the same way in both interpreted and compiled code.  In the 
interpreter if you want to call some special piece of code as part of the 
execution of a return bytecode then you don't really need to tell the GC 
anything special for it to find the return value since you should be at 
a return bytecode with the value on the top of stack.  You just need to 
flush the frame state and call into the VM and block.

For compiled code it's more tricky and you need something like the 
SafepointBlob to handle the state saving.  It's not safe for the VM to 
stop on the actual return instruction of compiled code since this 
creates various unpleasant states we don't want to deal with.  The way 
this works for safepointing is that there's a poll right before the 
return and if we stop there then we pop that frame off and then call 
into the runtime so it looks like we're stopped at the call in the 
caller frame.  GC of the return value is handled specially since it 
doesn't really belong to any frame at this point.  Look at the code in 
safepoint.cpp in the method handle_polling_page_exception, in particular 
the code guarded by is_at_poll_return.

Are you expecting to check otherThreadReference for every return 
bytecode?  That seems very expensive...  Also because of inlining in 
compiled code you would only be checking it on return from the whole 
compile unless you modify the compiler to emit checks for every inlined 
return.

Can you give me the 10 second explanation of what you are trying to do?

> I am still a bit confused when it comes to that magic RegisterMap / GC 
> thing ... If I assume that my return value is a reference to a GC-able 
> object, and I am saving it in the stack, how do I tell this to the frame 
> walker? 

If you are referencing oops from generated code this is usually 
accomplished by describing it in an OopMap or by saving it in special 
fields in the JavaThread named _vm_result and _vm_result2. 
sharedRuntime_<arch> and c1_Runtime1_<arch> have examples of this. 
call_VM allows you to pass in registers which should be saved and 
restored for the GC in the special fields.  The complexity in your case 
is that if you use one stub for all return types you don't know 
statically whether the return value is an oop or not so it's impossible 
to know whether it's ok to store them as oops.  That's why the safepoint 
blob works the way it does.
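
To picture why the oop has to live somewhere the GC knows about, here is a toy model of a moving collector. It is a sketch under loud assumptions: ToyHeap, ToyThread and raw_copy_goes_stale are invented for illustration, nothing here is HotSpot's API, and the "collector" just copies each rooted object into a second buffer and rewrites the root.

```cpp
#include <vector>

// Toy model only; none of these types are HotSpot's.
struct ToyObject { int payload; };

struct ToyThread {
    ToyObject* vm_result = nullptr;  // a slot the collector scans (like _vm_result)
};

class ToyHeap {
public:
    // Fixed capacity so pointers stay valid; the toy supports at most 16 objects.
    ToyHeap() { from_.reserve(16); to_.reserve(16); }

    ToyObject* allocate(int payload) {
        from_.push_back(ToyObject{payload});
        return &from_.back();
    }

    // "Moving" collection: copy every object reachable from a known root
    // into the other space and rewrite the root to the new address.
    void collect(const std::vector<ToyObject**>& roots) {
        to_.clear();
        for (ToyObject** root : roots) {
            if (*root == nullptr) continue;
            to_.push_back(**root);
            *root = &to_.back();
        }
        from_.swap(to_);
    }

private:
    std::vector<ToyObject> from_, to_;
};

// Returns true if a raw, undescribed copy of the reference went stale
// while the registered slot was updated to the object's new location.
bool raw_copy_goes_stale() {
    ToyHeap heap;
    ToyThread thread;
    ToyObject* raw_copy = heap.allocate(7);  // e.g. a value spilled to an undescribed stack slot
    thread.vm_result = raw_copy;             // same reference, in a slot the GC scans
    std::vector<ToyObject**> roots{&thread.vm_result};
    heap.collect(roots);
    return thread.vm_result != raw_copy && thread.vm_result->payload == 7;
}
```

The registered slot follows the object; the raw copy still holds the old address. A stub that handles every return type cannot know whether its saved value needs this treatment, which is the difficulty described above.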

tom


From peter.helfer.java at gmail.com  Wed Oct 24 06:29:31 2007
From: peter.helfer.java at gmail.com (Peter Helfer)
Date: Wed, 24 Oct 2007 15:29:31 +0200
Subject: The right locks for frame rewriting / Exceptions
In-Reply-To: <471E9E7D.1010204@sun.com>
References: <ee73e03b0710180854gf0d0e93s66ddd8b3cffea591@mail.gmail.com>
	<471CD87A.2010209@sun.com>
	<ee73e03b0710231227u5ba91701yeb08d3f1a2f25bbf@mail.gmail.com>
	<471E9E7D.1010204@sun.com>
Message-ID: <ee73e03b0710240629w1a3f67d3md89bd99da6d30405@mail.gmail.com>

2007/10/24, Tom Rodriguez <Thomas.Rodriguez at sun.com>:
>
> > I guess I found that in sharedRuntime_i486.cpp -
> > SharedRuntime::generate_deopt_blob(). I could see the
> > "save_live_registers" and the work thereafter. I assume a safepoint
> > is circumvented by calling the C function directly, without going
> > through callVM, and JRT_ENTRY, which would cause a ThreadInVMfromJava
> > to be allocated.
>
> That's right.  That's what's generically called a leaf call and it's
> used in quite a few places like exception dispatch and for various
> helper functions which are implemented in C but aren't allowed to
> safepoint.
>
> > Well that's exactly my problem :-)   I wanted to do something like:
> >
> > [..jumping here from a return of an interpreter / compiled frame...]
> > otherThread = thisThread.otherThreadReference;
> > if(otherThread != NULL){
> >   adjustSP(-16);  // 2 words,  plus double word from XMM0
> >   save_RAX_into_Stack();
> >   save_RDX_into_Stack();
> >   save_XMM0_into_Stack();
> >   seen = Atomic::cmpxchg(1, otherThread.syncReq, 0);
> >   if(seen == 0){
> >      this.monitor.wait();  // here, the returnPC could be changed in
> > between...
> >      restore_XMM0_from_Stack();
> >      restore_RDX_from_Stack();
> >      restore_RAX_from_Stack();
> >   }
> >   adjustSP(+16);
> > }
> >
> > What happens if I am having an object reference as return value, and the
> > GC decides to move just that object around, while my monitor is sleeping
> > ? As opposed to deopt, I cannot just run through this unsafe passage...
>
> I'm a little confused by your code and I can't quite make out what you
> are trying to accomplish.  Is this piece of code part of every return or
> are you only jumping into this code sometimes?


It is only used once in a while, i.e. not between every frame. The way it
works is that two threads are started to work concurrently within the same
method; the original frame gets a "join frame" inserted, which saves the
original returnPC and makes space for the results to be saved while
joining.
The newly created thread starts out with a ThreadFunction which first calls
wait; in the meantime the parent thread, in the VM, gives the child
thread the new entrypoint and notifies it. The ThreadFunction then calls
that entrypoint, and on return runs a similar "join function" in order to
try synchronizing with the parent. (Cf. discussion "VM thread pool")
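
Stripped of all VM details, the handoff looks roughly like the sketch below. It is written with plain C++ standard-library threads rather than HotSpot threads, and ToyWorker with its start/join members is a hypothetical name used only for illustration; the frame patching and return-value saving are left out entirely.

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical names throughout; this models the handoff only.
// Assumes start() is always called before the worker is destroyed.
class ToyWorker {
public:
    ToyWorker() : thread_([this] { thread_function(); }) {}
    ~ToyWorker() { if (thread_.joinable()) thread_.join(); }

    // Parent side: publish the entry point and wake the waiting child.
    void start(std::function<int()> entry) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            entry_ = std::move(entry);
        }
        cv_.notify_all();
    }

    // Parent side: the "join"; block until the child has stored its result.
    int join() {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return done_; });
        return result_;
    }

private:
    // Child side: wait first, then call the entry point, then synchronize.
    void thread_function() {
        std::function<int()> entry;
        {
            std::unique_lock<std::mutex> lock(mutex_);
            cv_.wait(lock, [this] { return static_cast<bool>(entry_); });
            entry = entry_;
        }
        int r = entry();  // run outside the lock
        {
            std::lock_guard<std::mutex> lock(mutex_);
            result_ = r;
            done_ = true;
        }
        cv_.notify_all();
    }

    std::mutex mutex_;
    std::condition_variable cv_;
    std::function<int()> entry_;
    int result_ = 0;
    bool done_ = false;
    std::thread thread_;  // declared last so the other members exist before it runs
};
```

A parent would create a ToyWorker, call start with the work to run, continue with its own half, and call join when it reaches its join point. Both waits use a predicate, so a notification that arrives before the wait is not lost.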


> Are you expecting all
> C++ code or some mix?  I think it would have to be a mix.  You cannot do
> anything that blocks from within generated assembly.  You always have to
> call into C++ code in the runtime if you want to perform a blocking
> operation.


The plan for this piece of code (a mix of generated asm and calls into the VM
for blocking) is: it's inserted (at least for now) between two interpreter
frames, by patching the return PC of the younger frame, forcing it to return
through here. I haven't yet figured out how to influence the return from a
"compiled frame" properly, but thought of inserting it into the i2c adapter,
as this is the last piece of code under my control that touches the returnPC
before leaving off into compiled code.

To make sure that I am getting every return, I will insert some code as well
into exception handling for the interpreter - in the remove_activation
handler, where synchronization will be aborted properly.
If I got it right, the RuntimeExceptions force deoptimization (looking at
DeoptReason) - but I can't recall what happens for user-generated
exceptions..


> One important point to keep in mind is that you can rarely do tricky
> stuff the same way in both interpreted and compiled code.  In the
> interpreter if you want to call some special piece of code as part of the
> execution of a return bytecode then you don't really need to tell the GC
> anything special for it to find the return value since you should be at
> a return bytecode with the value on the top of stack.  You just need to
> flush the frame state and call into the VM and block.
>
> For compiled code it's more tricky and you need something like the
> SafepointBlob to handle the state saving.  It's not safe for the VM to
> stop on the actual return instruction of compiled code since this
> creates various unpleasant states we don't want to deal with.  The way
> this works for safepointing is that there's a poll right before the
> return and if we stop there then we pop that frame off and then call
> into the runtime so it looks like we're stopped at the call in the
> caller frame.  GC of the return value is handled specially since it
> doesn't really belong to any frame at this point.  Look at the code in
> safepoint.cpp in the method handle_polling_page_exception, in particular
> the code guarded by is_at_poll_return.
>
> Are you expecting to check otherThreadReference for every return
> bytecode?  That seems very expensive...  Also because of inlining in
> compiled code you would only be checking it on return from the whole
> compile unless you modify the compiler to emit checks for every inlined
> return.


As explained above, the otherThreadRef is only checked in the join frames
(well, the child really checks another flag more often..)

> Can you give me the 10 second explanation of what you are trying to do?


Hopefully the explanation above is ok for you.

> > I am still a bit confused when it comes to that magic RegisterMap / GC
> > thing ... If I assume that my return value is a reference to a GC-able
> > object, and I am saving it in the stack, how do I tell this to the frame
> > walker?
>
> If you are referencing oops from generated code this is usually
> accomplished by describing it in an OopMap or by saving it in special
> fields in the JavaThread named _vm_result and _vm_result2.
> sharedRuntime_<arch> and c1_Runtime1_<arch> have examples of this.
> call_VM allows you to pass in registers which should be saved and
> restored for the GC in the special fields.  The complexity in your case
> is that if you use one stub for all return types you don't know
> statically whether the return value is an oop or not so it's impossible
> to know whether it's ok to store them as oops.  That's why the safepoint
> blob works the way it does.
>
> tom
>


Well, what I could do is generate two different join stubs, one for value
types and one for oops - but still I need to be sure, that this saved oop is
not gonna be changed.

----------------------------------------------

So let's assume I take the OopMap path. Maybe I'm stubborn or an idiot... but
I still don't get the point of the following code:

On entry into the code, we first save all registers into the stack, then we
note down frame_complete (which I don't know what it should mark); call C
code, but outside of the VM; then we add this OopMap to the set of OopMaps
by calling add_gc_map; what the range current_offset-start should denote is
not clear to me.

The new_runtime_stub allocates a ThreadInVMfromUnknown, in order to get the
CodeCache_lock; and then generates this RuntimeStub by calling the
constructor. Essentially it is calling CodeBlob(name, cb,
sizeof(RuntimeStub), size, frame_complete, frame_size, oop_maps) - which in
turn really sets up that piece of code by compacting it, writing down the
relocs, header, and instruction start.


#define __ masm->

RuntimeStub* generate_my_stub() {
  ResourceMark rm;
  const char* name = "my cool join stub";
  CodeBuffer buffer(name, 1000, 512);
  MacroAssembler* masm = new MacroAssembler(&buffer);
  int frame_size_words;
  OopMapSet* oop_maps = new OopMapSet();
  OopMap* map = NULL;

  int start = __ offset();
  // save all live registers; the returned map records where each one went
  map = RegisterSaver::save_live_registers(masm, 0 /* extra_words */,
                                           &frame_size_words);

  int frame_complete = __ offset();

  __ get_thread(rdi);
  __ pushl(rdi);  // the thread is the argument for the C function
  __ set_last_Java_frame(rdi, noreg, rbp, NULL);
  // calls static (whatever) Static::myFancyCFunction(JavaThread* thread);
  __ call(RuntimeAddress(CAST_FROM_FN_PTR(address, Static::myFancyCFunction)));

  // Set an oopmap for the call site.
  // We need this not only for callee-saved registers, but also for volatile
  // registers that the compiler might be keeping live across a safepoint.
  oop_maps->add_gc_map(__ offset() - start, map);

  // [... do something with the results from RAX ]

  // make sure all code is generated
  masm->flush();

  // return the blob
  // frame_size_words or bytes??
  return RuntimeStub::new_runtime_stub(name, &buffer, frame_complete,
                                       frame_size_words, oop_maps, true);
}

----------------------------------------------

On the other hand, if I'm taking the save-oop-in-thread path, how does GC
make sure it doesn't touch those objects? Are you keeping a don't-touch-list?
Does this code look safe?

[return from method, whose methodOop.is_returning_oop() == true]
proposal_stub_for_oop(){
  enum layout_for_all_join_stubs {
    returnPC = 0,
    rax,
    rdx,
    xmm0_l,
    xmm0_h,
    extra
  };

  // RAX has the oop
  get_thread(rcx);
  movl(Address(rcx, vm_result_offset()), rax); // save the result Oop

  // load saved returnPC and park it in the second slot
  movl(rax, Address(rsp, returnPC*wordSize));
  movl(Address(rcx, vm_result_offset_2()), rax);

  // set_vm_result_2 could be set to a new return addr!
  callVM(.. sleep_on_monitor()..);
  // RAX is ignored, i.e. void

  get_thread(rcx);
  movl(rax, Address(rcx, vm_result_offset())); // restore Oop
  movl(rcx, Address(rcx, vm_result_offset_2())); // restore returnPC
  jmp(rcx); // and jump to return pc (whether patched or not)
}


Ok, I hope it's not too much in one mail.. Thank you very much for working
out all the details.. it really helps push my research further!

Regards, Peter