RFR JDK-8232222: Set state to 'linked' when an archived class is restored at runtime

Jiangli Zhou jianglizhou at google.com
Fri Jun 12 02:10:39 UTC 2020


Correction. The Before and After in the previous email were labeled wrong.

After
------
             82.37 msec task-clock:u              #    1.568 CPUs
utilized            ( +-  0.35% )
                 0      context-switches:u        #    0.000 K/sec
                 0      cpu-migrations:u          #    0.000 K/sec
             4,020      page-faults:u             #    0.049 M/sec
               ( +-  0.08% )
        93,940,085      cycles:u                  #    1.140 GHz
               ( +-  0.17% )
        89,093,125      instructions:u            #    0.95  insn per
cycle           ( +-  0.08% )
        17,585,478      branches:u                #  213.484 M/sec
               ( +-  0.09% )
           643,748      branch-misses:u           #    3.66% of all
branches          ( +-  0.13% )

          0.052522 +- 0.000244 seconds time elapsed  ( +-  0.47% )

Before
------
             82.73 msec task-clock:u              #    1.559 CPUs
utilized            ( +-  0.38% )
                 0      context-switches:u        #    0.000 K/sec
                 0      cpu-migrations:u          #    0.000 K/sec
             4,007      page-faults:u             #    0.048 M/sec
               ( +-  0.09% )
        96,329,114      cycles:u                  #    1.164 GHz
               ( +-  0.21% )
        89,985,873      instructions:u            #    0.93  insn per
cycle           ( +-  0.08% )
        17,770,854      branches:u                #  214.814 M/sec
               ( +-  0.08% )
           644,142      branch-misses:u           #    3.62% of all
branches          ( +-  0.12% )
          0.053056 +- 0.000263 seconds time elapsed  ( +-  0.49% )

On Thu, Jun 11, 2020 at 6:35 PM Jiangli Zhou <jianglizhou at google.com> wrote:
>
> Hi Serguei, Coleen, David and Ioi,
>
> Thank you for all the responses!
>
> Sorry for the delay. I found time today to collect data with a JVMTI
> agent enabled.
>
> To have a more controlled measurement, I created a micro benchmark
> with an agent that registers a callback to handle the ClassPrepare
> event. The agent code is the same as the libSimpleClassPrepare.c
> (included in the webrev). The main app is a simple HelloWorld app.
> Following are the Before and After (with the lastest code). The saving
> for each class is a constant, and is proportional to the number of
> classes with early linked state.
>
> Before
> ---------
>              82.37 msec task-clock:u              #    1.568 CPUs
> utilized            ( +-  0.35% )
>                  0      context-switches:u        #    0.000 K/sec
>                  0      cpu-migrations:u          #    0.000 K/sec
>              4,020      page-faults:u             #    0.049 M/sec
>                ( +-  0.08% )
>         93,940,085      cycles:u                  #    1.140 GHz
>                ( +-  0.17% )
>         89,093,125      instructions:u            #    0.95  insn per
> cycle           ( +-  0.08% )
>         17,585,478      branches:u                #  213.484 M/sec
>                ( +-  0.09% )
>            643,748      branch-misses:u           #    3.66% of all
> branches          ( +-  0.13% )
>
>           0.052522 +- 0.000244 seconds time elapsed  ( +-  0.47% )
>
> After
> ------
>              82.73 msec task-clock:u              #    1.559 CPUs
> utilized            ( +-  0.38% )
>                  0      context-switches:u        #    0.000 K/sec
>                  0      cpu-migrations:u          #    0.000 K/sec
>              4,007      page-faults:u             #    0.048 M/sec
>                ( +-  0.09% )
>         96,329,114      cycles:u                  #    1.164 GHz
>                ( +-  0.21% )
>         89,985,873      instructions:u            #    0.93  insn per
> cycle           ( +-  0.08% )
>         17,770,854      branches:u                #  214.814 M/sec
>                ( +-  0.08% )
>            644,142      branch-misses:u           #    3.62% of all
> branches          ( +-  0.12% )
>           0.053056 +- 0.000263 seconds time elapsed  ( +-  0.49% )
>
> David, thanks for the details on the potential event race. It makes
> sense to me. Coleen thanks for pointing to Erik's new binding changes.
> I will find more details from Erik on if it is workable to have early
> link states with his changes, and put the update for the current
> change temporarily on hold again before that can be worked out.
>
> Thanks again to everyone's feedback!
>
> Best,
> Jiangli
>
> On Thu, Jun 11, 2020 at 2:10 AM serguei.spitsyn at oracle.com
> <serguei.spitsyn at oracle.com> wrote:
> >
> > Hi Jiangli,
> >
> > I'm sorry for being that late to the party.
> > I had a problem to follow all the details in this email thread discussion.
> >
> > It is hard to notice race issues from simple webrev reading.
> > So, thanks a lot to Ioi and David for catching it.
> > As I get from the review comments this fix is not mature enough and more
> > work and discussions are necessary.
> > I'll try to better track this discussion in the future.
> >
> > Thanks,
> > Serguei
> >
> >
> > On 6/9/20 05:43, coleen.phillimore at oracle.com wrote:
> > > (Posting on the right thread and list now...)
> > >
> > > On 6/9/20 2:26 AM, David Holmes wrote:
> > >> Hi Jiangli,
> > >>
> > >> >    http://cr.openjdk.java.net/~jiangli/8232222/webrev.03/
> > >>
> > >> I'm having trouble keeping track of all the issues, so let me walk
> > >> through the changes as I see them:
> > >>
> > >> - InstanceKlass::restore_unshareable_info
> > >>
> > >> For boot loader classes, when no verification is enabled, we mark the
> > >> class as linked immediately. By doing this in
> > >> restore_unshareable_info there are no races (as the class is not
> > >> exposed to anyone yet) and it allows later checks for is_linked to be
> > >> by-passed (under the assumption that the class and its supertypes
> > >> truly are in a state that appears linked). However, this doesn't
> > >> generate the JVM TI class prepare event, and we can't do it here as
> > >> that would introduce a number of potential issues with JVM TI.
> > >>
> > >> I see in the bug report some metrics from HelloWorld, but really this
> > >> needs to be backed up by a lot more performance measurements to
> > >> establish this is actually a worthwhile optimisation.
> > >>
> > >> - SystemDictionary::define_instance_class
> > >>
> > >> This is where we catch up with the JVM TI requirements and
> > >> immediately after posting the class load event we post the class
> > >> prepare event.
> > >>
> > >> As we have discussed, this earlier posting of the event is observable
> > >> to a JVMTI agent and although permitted by the specification it is a
> > >> change in behaviour that might impact existing agents.
> > >>
> > >> Ioi has raised an issue about there being a race here with the
> > >> potential for the event being delivered multiple times. I agree this
> > >> code is not adequate:
> > >>
> > >> 1718   if (k->is_shared() && k->is_linked()) {
> > >>
> > >> You only want to fire the event for exactly those classes that you
> > >> pre-linked, so at a minimum this has to be restricted to boot classes
> > >> only. Even then as Ioi points out once the class is exported to the
> > >> SystemDictionary and visibly seen to be loaded, then other threads
> > >> may race to link it and so have already posted the class prepare
> > >> event. In normal linking this race is avoided by the use of the
> > >> init_lock to check the linked state, do the linking and issue the
> > >> class prepare event, atomically. But your approach cannot do this as
> > >> it stands, you would need to add an additional flag to track whether
> > >> the prepare event had already be issued.
> > >>
> > >
> > > Thanks to Ioi and David for seeing this race.  As I looked at the
> > > change, it looked fairly simple and almost straightforward, but very
> > > scary how these changes interact in such surprising ways. Without this
> > > careful review, these changes cause endless work later on.  The area
> > > of class loading and our code for doing so has all sorts of subtle
> > > details that are hard to reason about.  I wish this weren't so and we
> > > can have code that we're not afraid of.
> > >
> > > The CSR is a nice writeup but I didn't see the race from the CSR either.
> > >
> > > We need to take the opportunity to look at this from the top down in a
> > > project like Leyden.
> > >
> > > There are still some opportunities to speed up class loading in the
> > > context of CDS and finding places that we can simplify, but this was
> > > alarmingly not simple.  I'm grateful to Ioi and David for doing this
> > > work, and yours, for thorougly discussing this change.
> > >
> > > Thanks,
> > > Coleen
> > >> ---
> > >>
> > >> So the change as it stands is incomplete, and introduces a
> > >> behavioural change to JVM TI, and the benefits of it have not been
> > >> clearly established.
> > >>
> > >> The JBS issue states this is a first step towards pre-initialization
> > >> and other optimisations, and it is certainly a pre-requisite to
> > >> pre-link before you can pre-initialize, but I don't think pulling out
> > >> pre-linking as a separate optimisation is really a worthwhile first
> > >> step. I have grave reservations about the ability to pre-initialize
> > >> in general and those issues have to be fleshed out in a project like
> > >> Leyden. Further, as Coleen points out this pre-linking optimisation
> > >> is incompatible with proposed vtable changes. Additionally, this
> > >> seems it will be incompatible with changes proposed in Valhalla, as
> > >> additional link-time actions will be needed that can't be done at the
> > >> time of restore_unshareable_info.
> > >>
> > >> Bottom line for me is that I just don't think this change is worth
> > >> pursuing as a stand-alone optimisation at this time. Sorry.
> > >>
> > >> Cheers,
> > >> David
> > >> -----
> > >>
> > >> On 5/06/2020 8:14 am, Jiangli Zhou wrote:
> > >>> Hi David,
> > >>>
> > >>> On Wed, Jun 3, 2020 at 9:59 PM David Holmes
> > >>> <david.holmes at oracle.com> wrote:
> > >>>>
> > >>>> Ioi pointed out that my proposal was incomplete and that it would need
> > >>>> to be more like:
> > >>>>
> > >>>> if (is_shared() &&
> > >>>>       JvmtiExport::should_post_class_prepare() &&
> > >>>>       !BytecodeVerificationLocal &&
> > >>>>       loader_data->is_the_null_class_loader_data()) {
> > >>>>       Handle h_init_lock(THREAD, init_lock());
> > >>>>       ObjectLocker ol(h_init_lock, THREAD, h_init_lock() != NULL);
> > >>>>       set_init_state(linked);
> > >>>>       >>> call JVMTI
> > >>>>       return true;
> > >>>>     }
> > >>>>
> > >>>> This alleviates any concerns about behavioural changes to JVM TI, and
> > >>>> also allows JVM TI enabled code to partially benefit from the
> > >>>> pre-linking optimisation.
> > >>>>
> > >>>> Otherwise I agree with Ioi that any behaviour change to JVM TI
> > >>>> needs to
> > >>>> be justified by significant performance gains.
> > >>>>
> > >>>
> > >>> Thanks a lot for the input and suggestion! Locking the init_lock for
> > >>> the JVMTI ClassPrepre event here sounds ok to me. The ClassDefine is
> > >>> normally posted before the ClassPrepare. That's why the change was
> > >>> made in systemDictionary.cpp instead of within
> > >>> InstanceKlass::restore_unshareable_info() function, to keep the same
> > >>> events ordering for any given class. I added the 'init_lock' locking
> > >>> code for post_class_prepare(), and kept the code in
> > >>> systemDictionary.cpp in webreve.03 below.  Not changing the JVMTI
> > >>> events ordering feels safer to me. Would the following be ok to
> > >>> everyone?
> > >>>
> > >>>    http://cr.openjdk.java.net/~jiangli/8232222/webrev.03/
> > >>>
> > >>> I also changed the InstanceKlass::restore_unshareable_info() to set
> > >>> _init_state via set_init_state API as you suggested. We can get away
> > >>> without locking the init_lock for setting the flag itself.
> > >>>
> > >>> Best regards,
> > >>>
> > >>> Jiangli
> > >>>
> > >>>
> > >>>> David
> > >>>> -----
> > >>>>
> > >>>> On 4/06/2020 8:42 am, David Holmes wrote:
> > >>>>> Correction ...
> > >>>>>
> > >>>>> On 3/06/2020 5:19 pm, David Holmes wrote:
> > >>>>>> On 3/06/2020 3:44 pm, Ioi Lam wrote:
> > >>>>>>> On 6/2/20 10:16 PM, David Holmes wrote:
> > >>>>>>>> Hi Ioi,
> > >>>>>>>>
> > >>>>>>>> On 3/06/2020 2:55 pm, Ioi Lam wrote:
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> On 5/27/20 11:13 PM, David Holmes wrote:
> > >>>>>>>>>> Hi Jiangli,
> > >>>>>>>>>>
> > >>>>>>>>>> On 28/05/2020 11:35 am, Ioi Lam wrote:
> > >>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>> I was going to take the suggestion, but realized that it
> > >>>>>>>>>>>> would add
> > >>>>>>>>>>>> unnecessary complications for archived boot classes with class
> > >>>>>>>>>>>> pre-initialization support. Some agents may set
> > >>>>>>>>>>>> JvmtiExport::should_post_class_prepare(). It's worthwhile to
> > >>>>>>>>>>>> support
> > >>>>>>>>>>>> class pre-init uniformly for archived boot classes with
> > >>>>>>>>>>>> JvmtiExport::should_post_class_prepare() enabled or disabled.
> > >>>>>>>>>>>
> > >>>>>>>>>>> This would introduce behavioral changes when JVMTI is enabled:
> > >>>>>>>>>>>
> > >>>>>>>>>>> + The order of JvmtiExport::post_class_prepare is different
> > >>>>>>>>>>> than
> > >>>>>>>>>>> before
> > >>>>>>>>>>> + JvmtiExport::post_class_prepare may be called for a class
> > >>>>>>>>>>> that
> > >>>>>>>>>>> was not called before (if the class is never linked during
> > >>>>>>>>>>> run time)
> > >>>>>>>>>>> + JvmtiExport::post_class_prepare was called inside the
> > >>>>>>>>>>> init_lock, now it's called outside of the init_lock
> > >>>>>>>>>>
> > >>>>>>>>>> I have to say I share Ioi's concerns here. This change will
> > >>>>>>>>>> impact
> > >>>>>>>>>> JVM TI agents in a way we can't be sure of. From a specification
> > >>>>>>>>>> perspective I think we are fine as linking can be lazy or eager,
> > >>>>>>>>>> so there's no implied order either. But this would be a
> > >>>>>>>>>> behavioural change that will be observable by agents. (I'm less
> > >>>>>>>>>> concerned about the init_lock situation as it seems potentially
> > >>>>>>>>>> buggy to me to call out to an agent with the init_lock held
> > >>>>>>>>>> in the
> > >>>>>>>>>> first place! I find it hard to imagine an agent only working
> > >>>>>>>>>> correctly if the init_lock is held.)
> > >>>>>>>>>
> > >>>>>>>>> David,
> > >>>>>>>>>
> > >>>>>>>>> The init_lock has a serializing effect. The callback for a
> > >>>>>>>>> subclass
> > >>>>>>>>> will not be executed until the callback for its super class has
> > >>>>>>>>> been finished.
> > >>>>>>>>
> > >>>>>>>> Sorry I don't see that is the case. The init_lock for the subclass
> > >>>>>>>> is distinct from the init_lock of the superclass, and linking of
> > >>>>>>>> subclasses and superclasses is independent.
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> In InstanceKlass::link_class_impl, you first link all of your super
> > >>>>>>> classes.
> > >>>>>>>
> > >>>>>>> If another thread is already linking your super class, you will
> > >>>>>>> block
> > >>>>>>> on that superclass's init_lock.
> > >>>>>>
> > >>>>>> The point is that there is already a race in terms of the
> > >>>>>> execution of
> > >>>>>> the two callbacks. So while this change can certainly produce a
> > >>>>>> different result to what would previously be seen, such a result is
> > >>>>>> already possible in the general case.
> > >>>>>>
> > >>>>>>> Of course, I may be wrong and my analysis may be bogus. But I hope
> > >>>>>>> you can appreciate that this is not going to be a trivial change to
> > >>>>>>> analyze.
> > >>>>>>
> > >>>>>> Yes I agree. While in general ordering of the class_prepare
> > >>>>>> callbacks
> > >>>>>> is not guaranteed for independent classes, if a given application
> > >>>>>> explicitly loads and links classes in a known order then it can
> > >>>>>> (reasonably) expect its callbacks to execute in that order. If this
> > >>>>>> change means classes will now be linked in an order independent of
> > >>>>>> what the normal runtime order would be then that could be a problem
> > >>>>>> for existing agents.
> > >>>>>>
> > >>>>>> So where does this leave us? The change is within spec, but could
> > >>>>>> trigger changes in agent behaviour that we can't really evaluate
> > >>>>>> a-priori. So as you say we should have a fairly good reason for
> > >>>>>> doing
> > >>>>>> this. I can easily envisage that pre-linking when no callbacks are
> > >>>>>> enabled would be a performance boost. But with callbacks enabled and
> > >>>>>> consuming CPU cycles any benefit from pre-linking could be lost
> > >>>>>> in the
> > >>>>>> noise.
> > >>>>>>
> > >>>>>> What if we did as Ioi suggested and only set the class as linked in
> > >>>>>> restore_unshareable_info if
> > >>>>>> !JvmtiExport::should_post_class_prepare();
> > >>>>>> and in addition in InstanceKlass::link_class_imp we added an
> > >>>>>> additional check at the start:
> > >>>>>>
> > >>>>>> // Pre-linking at load time may have been disabled for shared
> > >>>>>> classes,
> > >>>>>> // but we may be able to do it now.
> > >>>>>> if (JvmtiExport::should_post_class_prepare() &&
> > >>>>>>       !BytecodeVerificationLocal &&
> > >>>>>>       loader_data->is_the_null_class_loader_data()) {
> > >>>>>>     _init_state = linked;
> > >>>>>> }
> > >>>>>
> > >>>>> There should obviously be a check for is_shared() in there as well.
> > >>>>>
> > >>>>> David
> > >>>>> -----
> > >>>>>
> > >>>>>> ?
> > >>>>>>
> > >>>>>> That avoids the problem of changing the JVM TI callback
> > >>>>>> behaviour, but
> > >>>>>> also shortens the link time path when the callbacks are enabled.
> > >>>>>>
> > >>>>>> Hope I got that right. :)
> > >>>>>>
> > >>>>>> David
> > >>>>>> -----
> > >>>>>>
> > >>>>>>> Thanks
> > >>>>>>> - Ioi
> > >>>>>>>>
> > >>>>>>>> David
> > >>>>>>>> -----
> > >>>>>>>>
> > >>>>>>>>> With the proposed patch, the callback for both the super class
> > >>>>>>>>> and
> > >>>>>>>>> subclass can proceed in parallel. So if an agent performs class
> > >>>>>>>>> hierarchy analysis, for example, it may need to perform extra
> > >>>>>>>>> synchronization.
> > >>>>>>>>>
> > >>>>>>>>> This is just one example that I can think of. I am sure there are
> > >>>>>>>>> other issues that we have not thought about.
> > >>>>>>>>>
> > >>>>>>>>> The fact is we are dealing with arbitrary code in the callbacks,
> > >>>>>>>>> and we are changing the conditions of how they are called. The
> > >>>>>>>>> calls happen inside very delicate code (class loading, system
> > >>>>>>>>> dictionary). I am reluctant to do the due diligence, which is
> > >>>>>>>>> substantial, of verifying that this is a safe change, unless we
> > >>>>>>>>> have a really compelling reason to do so.
> > >>>>>>>>>
> > >>>>>>>>> Thanks
> > >>>>>>>>> - Ioi
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>
> > >
> >


More information about the serviceability-dev mailing list