The right locks for frame rewriting / Exceptions

Wed Oct 24 06:29:31 PDT 2007

2007/10/24, Tom Rodriguez <Thomas.Rodriguez at sun.com>:
>
> > I think I found it in sharedRuntime_i486.cpp, in
> > SharedRuntime::generate_deopt_blob(). I could see the
> > save_live_registers call and the work after it. I assume a safepoint
> > is avoided by calling the C function directly, without going
> > through call_VM and JRT_ENTRY, which would create a ThreadInVMfromJava
> > transition.
>
> That's right.  That's what's generically called a leaf call and it's
> used in quite a few places like exception dispatch and for various
> helper functions which are implemented in C but aren't allowed to
> safepoint.
>
> > Well, that's exactly my problem :-) I wanted to do something like:
> >
> > [...jumping here from the return of an interpreted / compiled frame...]
> > otherThread = thisThread.otherThreadReference;
> > if (otherThread != NULL) {
> >   adjustSP(-16);  // 2 words, plus a double word for XMM0
> >   save_RAX_into_Stack();
> >   save_RDX_into_Stack();
> >   save_XMM0_into_Stack();
> >   seen = Atomic::cmpxchg(1, &otherThread.syncReq, 0);
> >   if (seen == 0) {
> >     this.monitor.wait();  // here, the returnPC could be changed in between...
> >   }
> >   // restore on both paths: an x86 cmpxchg needs RAX for the compare value
> >   restore_XMM0_from_Stack();
> >   restore_RDX_from_Stack();
> >   restore_RAX_from_Stack();
> >   adjustSP(+16);
> > }
> >
> > What happens if I have an object reference as the return value, and the
> > GC decides to move exactly that object while my thread is waiting on the
> > monitor? Unlike deopt, I cannot just run through this unsafe passage...
>
> I'm a little confused by your code and I can't quite make out what you
> are trying to accomplish.  Is this piece of code part of every return or
> are you only jumping into this code sometimes?
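The rendezvous in the pseudocode above can be modelled with ordinary threads to pin down the intended protocol: whichever thread wins the compare-and-swap on the sync flag waits, and the other one releases it. This is only a sketch under assumptions: std::thread stands in for Java threads, and JoinPoint, sync_req, arrive() and count_waiters() are hypothetical names. In the VM itself the blocking wait would have to happen in C++ runtime code reached through a VM call, not in generated code.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical join point: the first thread to arrive flips sync_req from
// 0 to 1 and blocks; the second sees 1, releases the first, and both go on.
struct JoinPoint {
  std::atomic<int> sync_req{0};
  std::mutex m;
  std::condition_variable cv;
  bool released = false;

  // Returns true if the caller arrived first and therefore had to wait.
  bool arrive() {
    int expected = 0;
    if (sync_req.compare_exchange_strong(expected, 1)) {
      // First arrival: block until the partner releases us.
      std::unique_lock<std::mutex> lock(m);
      cv.wait(lock, [this] { return released; });
      return true;
    }
    // Second arrival: release the waiting partner.
    {
      std::lock_guard<std::mutex> lock(m);
      released = true;
    }
    cv.notify_one();
    return false;
  }
};

// Runs two threads through one JoinPoint and returns how many of them waited.
int count_waiters() {
  JoinPoint jp;
  std::atomic<int> waiters{0};
  auto body = [&] {
    if (jp.arrive()) waiters.fetch_add(1);
  };
  std::thread a(body), b(body);
  a.join();
  b.join();
  return waiters.load();
}
```

Using the predicate form of condition_variable::wait means the first thread does not miss the wake-up even if the second thread arrives before the first one has actually started waiting.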

It is only used once in a while, i.e. not between every frame. The way it
works is that two threads are started to work concurrently within the same
method; the original frame gets a "join frame" inserted, which saves the
original returnPC and makes space for the results to be saved while
joining.
The newly created thread starts out with a ThreadFunction that first calls
wait; meanwhile the parent thread, in the VM, gives the child thread the new
entry point and notifies it. The ThreadFunction then calls that entry point,
and on return runs a similar "join function" to synchronize with the
parent. (Cf. the discussion "VM thread pool".)
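The hand-off between the parent and the pooled child thread described above can be sketched with a condition variable. This is a plain C++ model, not VM code: ChildThread, start(), join_result() and thread_function() are hypothetical names, and a std::function<int()> stands in for the real entry point.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>

// Hypothetical model of a pooled child thread: its thread function first
// waits until the parent publishes an entry point, then runs it and hands
// the result back. start() must be called before the object is destroyed.
class ChildThread {
 public:
  ChildThread() : worker_([this] { thread_function(); }) {}
  ~ChildThread() { worker_.join(); }

  // Parent side: publish the entry point and wake the child.
  void start(std::function<int()> entry) {
    {
      std::lock_guard<std::mutex> lock(m_);
      entry_ = std::move(entry);
    }
    cv_.notify_one();
  }

  // Parent side: block until the child has finished (the "join").
  int join_result() {
    std::unique_lock<std::mutex> lock(m_);
    cv_.wait(lock, [this] { return done_; });
    return result_;
  }

 private:
  void thread_function() {
    std::function<int()> entry;
    {
      std::unique_lock<std::mutex> lock(m_);
      cv_.wait(lock, [this] { return static_cast<bool>(entry_); });
      entry = entry_;
    }
    int r = entry();  // run the handed-over entry point outside the lock
    {
      std::lock_guard<std::mutex> lock(m_);
      result_ = r;
      done_ = true;
    }
    cv_.notify_all();
  }

  std::mutex m_;
  std::condition_variable cv_;
  std::function<int()> entry_;
  bool done_ = false;
  int result_ = 0;
  std::thread worker_;  // declared last so every member above exists before it runs
};
```

For example, `ChildThread child; child.start([] { return 6 * 7; });` followed by `child.join_result()` yields 42.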

> Are you expecting all C++ code or some mix?  I think it would have to be a
> mix.  You cannot do anything that blocks from within generated assembly.
> You always have to call into C++ code in the runtime if you want to perform
> a blocking operation.
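The difference discussed in this thread between a leaf call and a call that goes through call_VM/JRT_ENTRY can be illustrated with a toy model. It is a sketch, not HotSpot code: ToyThread, ToyTransitionFromJava, leaf_helper and vm_helper are hypothetical, and the "safepoint check" only increments a counter instead of blocking. It assumes, as described above, that a VM entry performs a thread-state transition where a pending safepoint can be honoured, while a leaf call performs none.

```cpp
#include <atomic>
#include <cassert>

// Toy thread states; HotSpot has more of them.
enum class ThreadState { in_java, in_vm };

struct ToyThread {
  ThreadState state = ThreadState::in_java;
  int safepoint_checks = 0;  // how often this thread offered to stop
};

std::atomic<bool> safepoint_requested{false};

// RAII transition in the spirit of ThreadInVMfromJava: entering marks the
// thread as in-VM; going back to Java is where a pending safepoint would be
// honoured (here it is only counted).
class ToyTransitionFromJava {
 public:
  explicit ToyTransitionFromJava(ToyThread& t) : t_(t) { t_.state = ThreadState::in_vm; }
  ~ToyTransitionFromJava() {
    if (safepoint_requested.load()) ++t_.safepoint_checks;
    t_.state = ThreadState::in_java;
  }

 private:
  ToyThread& t_;
};

// Leaf call: a plain C function with no transition, so it never stops.
int leaf_helper(ToyThread&, int x) { return x + 1; }

// VM call: wrapped in a transition, so it may stop for a safepoint.
int vm_helper(ToyThread& t, int x) {
  ToyTransitionFromJava transition(t);
  return x + 1;
}
```

This is also why a leaf call must not block: nothing on its path lets the safepoint machinery know the thread is safe to stop.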

The plan for this piece of code (a mix of generated asm and calls into the VM
for blocking) is: it's inserted (at least for now) between two interpreter
frames by patching the return PC of the younger frame, forcing it to return
through here. I haven't yet figured out how to handle the return from a
compiled frame properly, but I thought of inserting it into the i2c adapter,
as this is the last piece of code under my control that touches the returnPC
before control passes into compiled code.
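The return-PC patching described above can be modelled with function pointers standing in for code addresses. This is a sketch under assumptions: ToyFrame, ReturnTarget, join_stub, patch_return_pc and return_through are hypothetical names, and a real frame keeps the original returnPC in the inserted join frame rather than in a struct field.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct ToyFrame;
// A "code address" the callee returns to.
using ReturnTarget = void (*)(ToyFrame& frame, std::vector<std::string>& log);

struct ToyFrame {
  ReturnTarget return_pc = nullptr;        // slot the callee returns through
  ReturnTarget saved_return_pc = nullptr;  // original target, kept by the join frame
};

void caller_continuation(ToyFrame&, std::vector<std::string>& log) {
  log.push_back("caller resumes");
}

// The join stub runs first, then continues at whatever the saved slot holds
// at that moment (it may have been re-patched while the thread was waiting).
void join_stub(ToyFrame& frame, std::vector<std::string>& log) {
  log.push_back("join stub runs");
  frame.saved_return_pc(frame, log);
}

// Redirect the return through the join stub, remembering the original target.
void patch_return_pc(ToyFrame& frame) {
  frame.saved_return_pc = frame.return_pc;
  frame.return_pc = join_stub;
}

// The callee's "return": jump through the frame's return slot.
void return_through(ToyFrame& frame, std::vector<std::string>& log) {
  frame.return_pc(frame, log);
}
```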

To make sure that I catch every return, I will also insert some code into
the interpreter's exception handling, in the remove_activation handler,
where the synchronization will be aborted properly.
If I understand it correctly, RuntimeExceptions force deoptimization (judging
by DeoptReason), but I can't recall what happens for user-generated
exceptions...

> One important point to keep in mind is that you can rarely do tricky
> stuff the same way in both interpreted and compiled code.  In the
> interpreter if you want to call some special piece of code as part of the
> execution of a return bytecode then you don't really need to tell the GC
> anything special for it to find the return value since you should be at
> a return bytecode with the value on the top of stack.  You just need to
> flush the frame state and call into the VM and block.
>
> For compiled code it's more tricky and you need something like the
> SafepointBlob to handle the state saving.  It's not safe for the VM to
> stop on the actual return instruction of compiled code since this
> creates various unpleasant states we don't want to deal with.  The way
> this works for safepointing is that there's a poll right before the
> return and if we stop there then we pop that frame off and then call
> into the runtime so it looks like we're stopped at the call in the
> caller frame.  GC of the return value is handled specially since it
> doesn't really belong to any frame at this point.  Look at the code in
> safepoint.cpp in the method handle_polling_page_exception, in particular
> the code guarded by is_at_poll_return.
>
> Are you expecting to check otherThreadReference for every return
> bytecode?  That seems very expensive...  Also because of inlining in
> compiled code you would only be checking it on return from the whole
> compile unless you modify the compiler to emit checks for every inlined
> return.

As explained above, otherThreadRef is only checked in the join frames
(well, the child actually checks another flag more often...).

> Can you give me the 10-second explanation of what you are trying to do?

Hopefully the explanation above is ok for you.

> > I am still a bit confused when it comes to that magic RegisterMap / GC
> > thing ... If I assume that my return value is a reference to a GC-able
> > object, and I am saving it on the stack, how do I tell this to the frame
> > walker?
>
> If you are referencing oops from generated code this is usually
> accomplished by describing it in an OopMap or by saving it in special
> fields in the JavaThread named _vm_result and _vm_result2.
> sharedRuntime_<arch> and c1_Runtime1_<arch> have examples of this.
> call_VM allows you to pass in registers which should be saved and
> restored for the GC in the special fields.  The complexity in your case
> is that if you use one stub for all return types you don't know
> statically whether the return value is an oop or not so it's impossible
> to know whether it's ok to store them as oops.  That's why the safepoint
> blob works the way it does.
>
> tom
>

Well, I could generate two different join stubs, one for primitive return
values and one for oops, but I still need to be sure that the saved oop is
not going to be changed.
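Splitting the join stub by return type works because the callee's result type is known statically at the point where the return PC is patched. A minimal sketch of that selection follows; the enumerators mimic HotSpot's BasicType names, but their values here are made up, and JoinStubKind, returns_oop and select_join_stub are hypothetical.

```cpp
#include <cassert>

// Hypothetical subset of HotSpot's BasicType names (values differ from HotSpot's).
enum BasicType { T_BOOLEAN, T_CHAR, T_FLOAT, T_DOUBLE, T_BYTE, T_SHORT,
                 T_INT, T_LONG, T_OBJECT, T_ARRAY, T_VOID };

// value_result: the saved RAX/RDX/XMM0 bits are never oops.
// oop_result:   the saved RAX must be visible to the GC while the thread waits.
enum class JoinStubKind { value_result, oop_result };

inline bool returns_oop(BasicType result_type) {
  return result_type == T_OBJECT || result_type == T_ARRAY;
}

// Pick the stub once, when the frame is patched, instead of inspecting the
// return value at run time.
JoinStubKind select_join_stub(BasicType result_type) {
  return returns_oop(result_type) ? JoinStubKind::oop_result
                                  : JoinStubKind::value_result;
}
```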

----------------------------------------------

So let's assume I take the OopMap path. Maybe I'm stubborn or an idiot... but
I still don't get the point of the code below.

On entry, we first save all registers on the stack; then we record
frame_complete (I don't know what it is supposed to mark); then we call the
C code, but without entering the VM; then we add this OopMap to the set of
OopMaps by calling add_gc_map. It is not clear to me what the offset
__ offset() - start is supposed to denote.

new_runtime_stub creates a ThreadInVMfromUnknown in order to take the
CodeCache_lock, and then creates the RuntimeStub by calling its
constructor. Essentially it calls CodeBlob(name, cb, sizeof(RuntimeStub),
size, frame_complete, frame_size, oop_maps), which in turn sets up the code
by compacting it and writing out the relocations, the header, and the
instruction start.

#define __ masm->

RuntimeStub* generate_my_stub() {
  ResourceMark rm;
  CodeBuffer buffer("my cool join stub", 1000, 512);
  MacroAssembler* masm = new MacroAssembler(&buffer);
  int frame_size_words;
  OopMapSet* oop_maps = new OopMapSet();
  OopMap* map = NULL;

  int start = __ offset();
  map = RegisterSaver::save_live_registers(masm, 0 /* additional_frame_words */,
                                           &frame_size_words);

  int frame_complete = __ offset();

  __ get_thread(rdi);
  __ pushl(rdi);                                  // JavaThread* argument
  __ set_last_Java_frame(rdi, noreg, rbp, NULL);
  __ call(RuntimeAddress(CAST_FROM_FN_PTR(address, Static::myFancyCFunction)));
  // calls static (whatever) Static::myFancyCFunction(JavaThread* thread);

  // Set an oopmap for the call site.
  // We need this not only for callee-saved registers, but also for volatile
  // registers that the compiler might be keeping live across a safepoint.
  oop_maps->add_gc_map(__ offset() - start, map);

  __ popl(rcx);                                   // pop the JavaThread* argument

  // [... do something with the results from RAX ...]

  // make sure all code is generated
  masm->flush();

  // return the blob
  // frame_size_words or bytes??
  return RuntimeStub::new_runtime_stub("my cool join stub", &buffer, frame_complete,
                                       frame_size_words, oop_maps, true);
}
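One way to read the add_gc_map(__ offset() - start, map) call, offered as an assumption to check against the oopMap sources: the offset is that of the instruction right after the call, relative to the start of the generated code, i.e. the return address that sits in the stub's frame while myFancyCFunction runs. A stack walker that finds this return pc subtracts the blob's code start and uses the result to look up which frame slots hold oops. The toy model below illustrates only that lookup; ToyOopMap, ToyOopMapSet and update_oops_in_frame are hypothetical and much simpler than HotSpot's classes.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Toy oop map: which frame slots hold oops at one particular pc.
struct ToyOopMap {
  std::vector<int> oop_slots;
};

// Toy oop map set keyed by pc offset relative to the start of the code.
class ToyOopMapSet {
 public:
  void add_gc_map(int pc_offset, const ToyOopMap& map) { maps_[pc_offset] = map; }
  const ToyOopMap* find_map_at_offset(int pc_offset) const {
    auto it = maps_.find(pc_offset);
    return it == maps_.end() ? nullptr : &it->second;
  }

 private:
  std::map<int, ToyOopMap> maps_;
};

// GC-side view of a stopped stub frame: rewrite every slot the map marks as
// an oop if it points at the moved object. Returns the number of slots
// updated, or -1 if there is no map for that return pc.
int update_oops_in_frame(const ToyOopMapSet& maps, std::uintptr_t code_begin,
                         std::uintptr_t return_pc, std::vector<std::uintptr_t>& frame,
                         std::uintptr_t old_addr, std::uintptr_t new_addr) {
  const ToyOopMap* map = maps.find_map_at_offset(static_cast<int>(return_pc - code_begin));
  if (map == nullptr) return -1;
  int updated = 0;
  for (int slot : map->oop_slots) {
    if (frame[slot] == old_addr) {
      frame[slot] = new_addr;
      ++updated;
    }
  }
  return updated;
}
```

Slots not listed in the map are left alone even if their bits happen to look like the moved object's address, which is why the map must describe the exact state at that pc.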

----------------------------------------------

On the other hand, if I take the save-oop-in-thread path, how does the GC
make sure it doesn't touch those objects? Are you keeping a don't-touch
list? Does this code look safe?
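A moving collector generally does not pin objects; it treats certain slots as roots and rewrites them when the objects move. As far as I know, HotSpot scans the JavaThread's _vm_result and _vm_result_2 fields this way, so an oop saved there may change its value across a VM call and has to be reloaded afterwards; by the same reasoning, a non-oop value such as a return address would probably be misinterpreted if kept in such a slot. The toy copying collector below illustrates only the root-update idea; ToyHeap, ToyObject and ToyJavaThread are hypothetical, and objects are addressed by index instead of by pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct ToyObject {
  int payload;
};

// Toy copying heap: a collection moves every object (here it simply
// reverses their order) and fixes up only the registered root slots.
struct ToyHeap {
  std::vector<ToyObject> space;
  std::vector<std::size_t*> roots;  // slots the GC knows about

  void collect() {
    std::vector<ToyObject> to_space(space.rbegin(), space.rend());
    for (std::size_t* root : roots) {
      *root = space.size() - 1 - *root;  // new index of the same object
    }
    space.swap(to_space);
  }
};

// Plays the role of a thread-local result slot such as _vm_result.
struct ToyJavaThread {
  std::size_t vm_result = 0;
};
```

A copy of the reference kept in an unregistered slot still holds the old index after collect() and now designates a different object, which is the stale-oop problem raised earlier in the thread.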

[return from a method whose methodOop->is_returning_oop() == true]
proposal_stub_for_oop() {
  enum layout_for_all_join_stubs {
    returnPC = 0,
    rax_off,
    rdx_off,
    xmm0_l,
    xmm0_h,
    extra
  };

  // RAX holds the oop result
  get_thread(rcx);
  movl(Address(rcx, JavaThread::vm_result_offset()), rax);    // save the result oop

  // load the saved returnPC
  movl(rax, Address(rsp, returnPC * wordSize));
  movl(Address(rcx, JavaThread::vm_result_2_offset()), rax);  // save the returnPC

  callVM(.. sleep_on_monitor() ..);  // vm_result_2 could be set to a new return addr!
  // RAX is ignored, i.e. void

  get_thread(rcx);
  movl(rax, Address(rcx, JavaThread::vm_result_offset()));    // restore the oop
  movl(rcx, Address(rcx, JavaThread::vm_result_2_offset()));  // restore the returnPC
  jmp(rcx);  // and jump to the return pc (whether patched or not)
}

OK, I hope it's not too much for one mail. Thank you very much for working
through all the details; it really helps push my research forward!

Regards, Peter