RFR: Parallelize safepoint cleanup

Thu Jul 6 17:14:38 UTC 2017

On 7/6/17 10:53 AM, Roman Kennke wrote:
> Am 06.07.2017 um 18:47 schrieb Igor Veresov:
>>
>>> On Jul 6, 2017, at 3:14 AM, Tobias Hartmann 
>>> <tobias.hartmann at oracle.com <mailto:tobias.hartmann at oracle.com>> wrote:
>>>
>>> Hi,
>>>
>>> On 05.07.2017 20:30, Daniel D. Daugherty wrote:
>>>> JDK-8132849 is assigned to Tobias; it would be good to get Tobias'
>>>> review of this fix also.
>>>
>>> Thanks for the notification. The sweeper/safepoint changes look good 
>>> to me!
>>>
>>>> src/share/vm/runtime/sweeper.cpp
>>>>    L205:     // TODO: Is this really needed?
>>>>    L206:     OrderAccess::storestore();
>>>>        That's a good question. Looks like that storestore() was
>>>>        added by this changeset:
>>>>
>>>>        $ hg log -r 5357 src/share/vm/runtime/sweeper.cpp
>>>>        changeset:   5357:510fbd28919c
>>>>        user:        anoll
>>>>        date:        Fri Sep 27 10:50:55 2013 +0200
>>>>        summary:     8020151: PSR:PERF Large performance regressions 
>>>> when code cache is filled
>>>>
>>>>        The changeset is not small and it looks like two
>>>>        OrderAccess::storestore() calls were added (and one
>>>>        load_ptr_acquire() was deleted):
>>>>
>>>>        $ hg diff -r 5356 -r 5357 | grep OrderAccess
>>>>        +      OrderAccess::storestore();
>>>>        -  nmethod *code = (nmethod 
>>>> *)OrderAccess::load_ptr_acquire(&_code);
>>>>        +  OrderAccess::storestore();
>>>>
>>>>        It could be that the storestore() is matching an existing
>>>>        OrderAccess operation or it could have been added in an
>>>>        abundance of caution. We definitely need a Compiler team
>>>>        person to take a look here.
>>>
>>> Unfortunately, I'm also not sure if that barrier is required. 
>>> Looking at the old RFR thread:
>>> http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2013-September/011588.html
>>>
>>> It seems that Igor V. suggested this:
>>> "You definitely need a store-store barrier for non-TSO architectures 
>>> after the mark_as_seen_on_stack() call on line 1360. Otherwise it 
>>> still can be reordered by the CPU with respect to the following 
>>> state assignment. Also neither of these state variables are volatile 
>>> in nmethod, so even the compiler may reorder the stores."
>>> http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2013-September/011729.html
>>>
>>> The requested OrderAccess::storestore() was added to 
>>> nmethod::make_not_entrant_or_zombie() but seems like Albert also 
>>> added one to NMethodSweeper::mark_active_nmethods().
>>>
>>> I'll ping Igor, maybe he knows more.
>>
>>
>> I think the reason is explained in the comment:
>>
>> // Must happen before state change. Otherwise we have a race condition in
>> // nmethod::can_not_entrant_be_converted(). I.e., a method can 
>> immediately
>> // transition its state from 'not_entrant' to 'zombie' without having 
>> to wait
>> // for stack scanning.
>> if (state == not_entrant) {
>> mark_as_seen_on_stack();
>> OrderAccess::storestore();
>>     }
>>
>> // Change state
>> _state = state;
>>
>> Although can_not_entrant_be_converted() is now called 
>> can_convert_to_zombie(). The scenario can so like this:
>> 1. We’re setting the state to not_entrant. But the _state assignment 
>> happens before setting the traversal count in mark_as_seen_on_stack().
>> 2. While we’re doing this, the sweeper scans nmethods and is in 
>> process_compiled_method():
>>
>>   } else if (cm->is_not_entrant()) {
>> // If there are no current activations of this method on the
>> // stack we can safely convert it to a zombie method
>> if (cm->can_convert_to_zombie()) {
>> // Clear ICStubs to prevent back patching stubs of zombie or flushed
>> // nmethods during the next safepoint (see ICStub::finalize).
>>       {
>> MutexLocker cl(CompiledIC_lock);
>> cm->clear_ic_stubs();
>>       }
>> // Code cache state change is tracked in make_zombie()
>> cm->make_zombie();
>>
>>
>> So if state change happens before setting the traversal mark, the 
>> sweeper can go ahead and make it a zombie.
>>
>>
>> Makes sense? Or am I missing something?
>
> I have probably not fully digged the code. As far as I can see:
> - sweeper thread runs outside safepoint
> - VMThread (which is doing the nmethod marking in the case that I'm 
> looking at) runs while all other threads (incl. the sweeper) is 
> holding still.
>
> In between we have a guaranteed fence().
>
> There should be no need for a storestore() (at least in sweeper.cpp... 
> in nmethod.cpp it seems to actually make sense as you pointed out 
> above). *However* it doesn't really hurt to OrderAccess::storestore() 
> there... so play it conservative and leave it in, as RFR'd in my last 
> patch?

If we are going to have the OrderAccess::storestore() calls, then
we have to have a proper comment explaining why they are needed.

Unfortunately, the OrderAccess::storestore() call that was added
by anoll to src/share/vm/runtime/sweeper.cpp back in 2013 was not
properly documented and we're bumping into that with this review.

I'm not happy about this change:

+  ~ParallelSPCleanupThreadClosure() {
+    // This is here to be consistent with sweeper.cpp
NMethodSweeper::mark_active_nmethods().
+    // TODO: Is this really needed?
+    OrderAccess::storestore();
+  }

because we're adding an OrderAccess::storestore() to be consistent
with an OrderAccess::storestore() that's not properly documented
which is only increasing the technical debt.

So a couple of things above don't make sense to me:

 > - sweeper thread runs outside safepoint
 > - VMThread (which is doing the nmethod marking in the case that
 >   I'm looking at) runs while all other threads (incl. the sweeper)
 >   is holding still.

and:

 > There should be no need for a storestore() (at least in sweeper.cpp...

If the sweeper thread is running "outside safepoint", then how is
the sweeper thread "holding still" while the VMThread is doing the
nmethod marking? Those two points are contradictory.

If the sweeper thread is indeed executing outside a safepoint, then
a storestore() is needed for its memory changes to be seen by the
VMThread which is doing things in parallel. That means that the
comment that sweeper.cpp doesn't need the storestore() is also
contradictory.

So what do you mean by this comment:

 > - sweeper thread runs outside safepoint

and once we know that we can figure out the rest...

Dan

>
> Roman
>
>>
>> igor
>>
>>
>>>
>>> Thanks,
>>> Tobias
>>
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.org/pipermail/hotspot-gc-dev/attachments/20170706/b127206b/attachment.htm>