Pls review 7091418: FX priority class from Solaris should be available to JVM

Tue Jan 24 12:16:27 PST 2012

I've filed

7132966: Allow mapping of Java thread priorities to OS scheduling classes as
                 well as priorities

to track your recommendation.

Paul
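
For illustration only, one shape such a mapping could take is a table of
(scheduling class, priority) pairs indexed by Java priority. This is a
sketch under assumptions, not the HotSpot code and not what 7132966
specifies: SchedClass, OSPriority, kJavaToOS, os_priority_for() and all
of the numeric values below are made up for the example.

```cpp
#include <cassert>

// Hypothetical scheduling-class tags; a real VM would resolve the Solaris
// class ids at startup rather than hard-code them.
enum class SchedClass { ProcessDefault, TS, IA, FX, RT };

struct OSPriority {
    SchedClass sched_class;  // class the thread should be placed in
    int        priority;     // priority within that class
};

const int kMinJavaPriority  = 1;
const int kNormJavaPriority = 5;
const int kCriticalPriority = 11;  // pseudo-priority just above MAX_PRIORITY (10)

// Index = Java priority. All values are placeholders for illustration.
const OSPriority kJavaToOS[kCriticalPriority + 1] = {
    {SchedClass::ProcessDefault,   0},  //  0: unused
    {SchedClass::ProcessDefault,   0},  //  1: MIN_PRIORITY
    {SchedClass::ProcessDefault,  16},  //  2
    {SchedClass::ProcessDefault,  32},  //  3
    {SchedClass::ProcessDefault,  48},  //  4
    {SchedClass::ProcessDefault,  64},  //  5: NORM_PRIORITY
    {SchedClass::ProcessDefault,  80},  //  6
    {SchedClass::ProcessDefault,  96},  //  7
    {SchedClass::ProcessDefault, 112},  //  8
    {SchedClass::ProcessDefault, 120},  //  9
    {SchedClass::ProcessDefault, 127},  // 10: MAX_PRIORITY
    {SchedClass::FX,              60},  // 11: critical priority
};

// Looks up the OS class/priority pair for a Java priority.
// Out-of-range inputs fall back to the NORM_PRIORITY entry.
inline OSPriority os_priority_for(int java_priority) {
    if (java_priority < kMinJavaPriority || java_priority > kCriticalPriority) {
        java_priority = kNormJavaPriority;
    }
    return kJavaToOS[java_priority];
}
```

With a table like this, a switch such as UseCriticalJavaThreadPriority
would amount to looking up entry 11 wherever entry 10 would otherwise be
used, and no sentinel value is needed to carry the scheduling class.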

On 1/24/12 2:55 PM, Paul Hohensee wrote:
> Thanks for the review.
>
> Inline...
>
> On 1/23/12 11:53 PM, David Holmes wrote:
>> Hi Paul,
>>
>> Warning: long-winded response below
>
> Looking forward to it. :)
>
>>
>> Summary:
>> - file RFE to fully support "system" priority for VM internal threads
>> - Fully implement UseCriticalJavaThreadPriority on non-Solaris
>>
>> You'll have to read on to understand what I mean by both of these :)
>>
>> On 24/01/2012 8:00 AM, Paul Hohensee wrote:
>>> I just found
>>>
>>> 7082553: Interpret Thread.setPriority(Thread.MAX_PRIORITY) to mean FX60
>>> on Solaris 10 and 11
>>>
>>> which seems like the CR I should actually be using.
>>>
>>> Ref your last comment on 7082553, we could indeed change the definition
>>> of JavaPriority10_To_OSPriority, but that would cover only Java threads,
>>> not compiler or the CMS background threads.
>>
>> Thanks for reminding me about this CR - I knew there had been further 
>> discussion on this. That CR is not public, which seems a little 
>> pointless now.
>
> Yes.  I've made 7082553 public and made 7091418 a dup of 7082553.
>
>>
>> I'll restate my concerns as outlined in that CR. It is very difficult 
>> for an application programmer to determine that FX60 will provide any 
>> performance benefit, whether assigned to a compiler thread, GC thread 
>> or Java thread. The benefits of FX60 only exist when a system has 
>> idle cores, so the application programmer and/or the application 
>> "deployer" has to know what overall load will exist on the system. 
>> For the GC and compiler threads this is not such an issue as the 
>> person deploying the app can simply experiment with the flag being on 
>> and off and see if it helps in their situation.
>
> Undoubtedly true that only experts should be using it.  There are
> significant Oracle applications running on T4 that contain serial
> bottlenecks, so by definition there are idle cores available.
>
>>
>> For Java threads however the application programmer has to use 
>> Thread.MAX_PRIORITY. This has two negative consequences. First it 
>> encourages people to start thinking about Thread priorities when we 
>> have spent a dozen years telling them to forget about them because in 
>> 99% of the cases they have no meaning. For them to have meaning you 
>> have to select non default ThreadPriorityPolicy settings and use the 
>> right OS - even then you get different effects on different OSes.
>> Which leads me to the second consequence: a loss of portability. 
>> Using Thread.MAX_PRIORITY can have different consequences on 
>> different platforms, even today.
>
> That's true.  Imo, the correct solution is to invent a QOS API at a
> higher abstraction level, e.g., something like
> Thread.prefer(SYSTEM_THROUGHPUT) or Thread.prefer(FAST_RESPONSE_TIME).
> We wanted to get something useful sooner than that though, hence this
> RFE.  Note that it's very much experimental, which is why its use is
> gated by -XX:+UnlockExperimentalVMOptions.  There's no guarantee at all
> that we'll continue to support these switches.
>
>>
>> Ok - so the response to the above is "These are experimental options
>> which must be unlocked, and it's for advanced users and 'caveat
>> emptor'". I can live with that, so let's look at the details of the
>> implementation.
>
> Ah, didn't need to write the above. :)
>
>>
>> There are really two distinct parts to this. First the ability to 
>> define a special system thread priority. In that regard I like the 
>> idea of creating a pseudo-priority of 11 and using that to map to 
>> this system, aka critical, priority (though this could be better 
>> documented). You then have flags that say "use the system priority for
>> threads of type X". This is cross-platform. There are two things I 
>> find problematic with this though:
>>
>> 1. As priority is simply a number we can't directly use this to 
>> implement FX60. Instead we have to put in a hack where a negative 
>> value on Solaris tells the system specific code "that means FX 60".
>
> Yes.  I could make things a little better by having a negative number
> mean "FX" and using the positive equivalent directly (i.e., not scaling
> it).  An alternative would be to add a second mapping array (or make an
> array of structs) that contains the desired scheduling class.  This
> seemed like overkill to me given the limited goal of this change.
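
The "negative means FX" convention described above could be decoded
roughly as follows. This is only a sketch of the idea: SchedClass,
DecodedPriority and decode_os_priority() are hypothetical names, and the
actual change may encode and apply the value differently.

```cpp
#include <cassert>

// Hypothetical tags: either stay in the process's scheduling class or
// move the thread to the FX class.
enum class SchedClass { ProcessDefault, FX };

struct DecodedPriority {
    SchedClass sched_class;
    int        priority;
};

// Assumed convention: a negative table entry means "FX class, use the
// absolute value as the priority without scaling"; a non-negative entry
// is an ordinary priority within the process's current class.
inline DecodedPriority decode_os_priority(int encoded) {
    if (encoded < 0) {
        return {SchedClass::FX, -encoded};
    }
    return {SchedClass::ProcessDefault, encoded};
}
```

One limitation of this encoding is that FX priority 0 cannot be
expressed, since 0 is not negative. An explicit class field, as in the
array-of-structs alternative, avoids that.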
>
>> 2. There is no mechanism, apart from recompiling the VM, to actually 
>> control what this system/critical priority is.
>
> True.
>
>>
>> So this general purpose mechanism falls somewhat short of being 
>> generally useful as a cross-platform mechanism. I find both of these 
>> unsatisfactory and an RFE should be filed to address them.
>
> It's intended to be Solaris-only right now, which is why I didn't
> implement the full-blown solution on non-Solaris platforms.  Rather I
> made it incidental to an understandable Solaris implementation, esp.
> the common code parts in compileBroker.cpp and
> concurrentMarkSweepThread.cpp.
>
> I'll file an RFE along the lines of being able to specify a scheduling
> class and priority for each Java thread priority.  Either extend
> JavaPriority<n>_To_OSPriority to take a CSV list argument (which is
> coming to the command line parser at some point anyway), or add 10 more
> switches to specify the scheduling class.
>
>>
>> The second part is the "map Thread.MAX_PRIORITY to the critical 
>> priority" part. On Solaris you use UseCriticalJavaThreadPriority to 
>> actually update the priority mapping 10 -> critical. But you don't do 
>> this on the other platforms. This is what I object to - if you are 
>> going to make this a cross-platform setting then it should be 
>> implemented fully on all platforms. It doesn't matter that by default 
>> the end result is the same, the code should be there so that if you 
>> could define the critical priority at runtime things would work as 
>> expected.
>
> I didn't do it on non-Solaris platforms (at least not in the code)
> because on those platforms CriticalPriority is the same as MaxPriority.
> No runtime mapping needed.  For compiler threads and the CMS background
> thread on non-Solaris platforms, using critical priority will up their
> priority, which is new.
>
> I don't know what other equivalent (i.e., scheduling classes) to use on
> non-Solaris platforms.  Can you recommend something so I can put it in
> an RFE?  I hesitate to do more than I have because we're at the end of
> the development cycle for 7u4/hs23.
>
> Paul
>
>>
>> Cheers,
>> David
>> -----
>>
>>>
>>> Paul
>>>
>>> On 1/23/12 2:11 PM, Paul Hohensee wrote:
>>>> Thanks for the review.
>>>>
>>>> Inline...
>>>>
>>>> On 1/22/12 7:39 PM, David Holmes wrote:
>>>>> Hi Paul,
>>>>>
>>>>> The meta-comment here is that there needs to be a clear description
>>>>> of what "critical priority" means and what constraints there are on
>>>>> setting it to some OS specific value. For example the current change
>>>>> uses the FX scheduling class, but what if someone used the RT
>>>>> scheduling class instead? Would that work? Probably not, in which
>>>>> case we should document that this selection of the "critical
>>>>> priority" is not an arbitrary choice that can be made.
>>>>>
>>>>> Even for FX/60 I'm not certain that using this for Java threads won't
>>>>> prevent safepoints from being reached or induce some other form
>>>>> of livelock.
>>>>
>>>> I added material to the Comments field of the CR.
>>>>
>>>> I don't think there's a livelock problem with Java threads, because
>>>> Solaris takes FX60 as advisory, not as a command. All that should
>>>> happen is that a critical priority Java thread will get to the
>>>> safepoint earlier than non-critical ones. I suppose it's possible for
>>>> critical priority CMS or compiler threads to starve non-critical Java
>>>> threads, but they run at NearMaxPriority by default now, which can do
>>>> the same thing. This is definitely an "expert-only" feature though,
>>>> which is why it's experimental for the time being.
>>>>>
>>>>> On 21/01/2012 3:13 AM, Paul Hohensee wrote:
>>>>>> Webrev here
>>>>>>
>>>>>> http://cr.openjdk.java.net/~phh/7091418.00/
>>>>>>
>>>>>> This change defines a new Java pseudo-priority called
>>>>>> CriticalPriority, just above MaxPriority. Compiler threads, the CMS
>>>>>> background thread, and Java threads can have the os equivalent of
>>>>>> this priority. On Solaris, this is the FX/60 scheduling
>>>>>> class/priority. On other platforms, it's the same as MaxPriority's
>>>>>> os priority.
>>>>>
>>>>> For reference this is why the mapping to FX/60 has been proposed:
>>>>>
>>>>> http://blogs.oracle.com/observatory/entry/critical_threads_optimization
>>>>>
>>>>> I still don't fully grok what this optimization does in a general
>>>>> sense and it seems to be geared to providing better single-threaded
>>>>> performance on near-idle systems - which doesn't make any sense to me
>>>>> in a JVM context. But FX/60 also gives you true priority over TS/IA
>>>>> threads so that may be where the gain comes from. I wonder if any
>>>>> experiments were actually done using FX/59 rather than the "magical"
>>>>> FX/60?
>>>>
>>>> It's meant to be Solaris-Sparc-specific, but it was easier to
>>>> implement as a general feature than to specialize it. Given enough
>>>> cores, FX60 does indeed give you true priority over TS/IA threads. If
>>>> there aren't enough cores to run both critical threads in
>>>> single-thread mode and non-critical threads at the same time, Solaris
>>>> will allow non-critical threads to run on the same core(s) as critical
>>>> ones.
>>>>
>>>> I don't know of any FX59 experiments, but given the amount of work
>>>> it's taken for the Solaris folks to get FX60 working, I doubt using
>>>> FX59 would have any positive effect.
>>>>>
>>>>>> There are 3 new command line switches, all gated by
>>>>>> UnlockExperimentalVMOptions.
>>>>>>
>>>>>> -XX:+UseCriticalJavaThreadPriority
>>>>>>
>>>>>> Maps Java MAX_PRIORITY to critical priority.
>>>>>
>>>>> I found what you have done here to be very confusing. The only place
>>>>> UseCriticalJavaThreadPriority is used is on Solaris. There you re-map
>>>>> the priority mapping for priority 10 to the "critical priority" as
>>>>> described.
>>>>
>>>> It's actually used on the other OSs. It just maps to MaxPriority on those.
>>>>
>>>>>
>>>>> On all platforms you added an entry to the priority mapping table(s)
>>>>> for a non-existent Java priority 11. This provides a way to lookup
>>>>> the "critical priority" for the CMS/Compiler threads - in essence use
>>>>> of critical priority for those threads says "pretend these have Java
>>>>> priority 11" and then you've added a mapping for a priority 11 that
>>>>> is the same as for priority 10 except on Solaris. On Solaris you had
>>>>> to use a sentinel value to say "this really means use the critical
>>>>> priority" because there is no way to convey a change of scheduling
>>>>> class.
>>>>>
>>>>> It seems to me that we are pretending to have "critical priority"
>>>>> support on all platforms when in reality we don't. If we want to go
>>>>> that way then we should extend it to the
>>>>> UseCriticalJavaThreadPriority case as well. It should be all or 
>>>>> nothing.
>>>>
>>>> Extend it beyond making CriticalPriority == MaxPriority on non-Solaris
>>>> platforms? I.e., we can now change the compiler and CMS thread
>>>> priority to MaxPriority on non-Solaris platforms. I don't know how to
>>>> make CriticalPriority higher than that on non-Solaris platforms.
>>>>
>>>>>
>>>>> Further it needs to be made clear that these may still be dependent
>>>>> on the value of ThreadPriorityPolicy.
>>>>
>>>> I added a comment to the CR to that effect.
>>>>
>>>>>
>>>>>> -XX:+UseCriticalCompilerThreadPriority
>>>>>>
>>>>>> All compiler threads run at critical priority.
>>>>>
>>>>> It should be more clear that UseCriticalCompilerThreadPriority only
>>>>> applies if CompilerThreadPriority is not set. Perhaps there should
>>>>> also be a startup check for both being used?
>>>>
>>>> I could, but making CompilerThreadPriority take precedence is what I
>>>> intended. I'll add a comment to globals.hpp and the CR.
>>>>
>>>>>
>>>>> Thinking more though we really shouldn't need both flags. The basic
>>>>> problem is that the current "api" only supports setting a simple
>>>>> number and to use FX/60 also requires a change of scheduling class.
>>>>> You could add a hack that CompilerThreadPriority=60 means FX/60. Or,
>>>>> as I've suggested in past email we could generalize the format of the
>>>>> option to allow both a scheduling class designator and priority to be
>>>>> passed - that would be a more general mechanism.
>>>>
>>>> I didn't want to remove CompilerThreadPriority or change its effect.
>>>> I can file a CR to do that though. Current uses of
>>>> CompilerThreadPriority=60 should work like they always have.
>>>>
>>>> I wanted to confine the change as much as possible to Solaris _and_ to
>>>> limit it to just scheduling classes where we know we're not likely to
>>>> provoke thread starvation. I can file a CR to add the ability to
>>>> specify a scheduling class for Java threads. It would probably add 10
>>>> switches for scheduling class corresponding to the existing 10 Java
>>>> priority switches. I don't have any ideas on how to designate
>>>> particular threads for particular class/priorities.
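
As a rough sketch of the "scheduling class designator plus priority"
option format suggested above, a value such as "FX/60" could be parsed
like this. The "CLASS/PRIORITY" syntax, the 0..127 range check and the
names ClassAndPriority and parse_priority_spec() are all assumptions made
for the example; no such format exists in the VM's option parser.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstdlib>
#include <string>

struct ClassAndPriority {
    std::string sched_class;  // empty means "keep the process's class"
    int         priority;
};

// Parses "CLASS/PRIORITY" (e.g. "FX/60") or a bare "PRIORITY" (e.g. "60").
// Returns false, leaving *out untouched, if the spec is malformed.
inline bool parse_priority_spec(const std::string& spec, ClassAndPriority* out) {
    std::string cls;
    std::string num = spec;
    const std::size_t slash = spec.find('/');
    if (slash != std::string::npos) {
        cls = spec.substr(0, slash);
        num = spec.substr(slash + 1);
        if (cls.empty()) return false;  // "/60" has no class name
    }
    // Require the numeric part to start with a digit (no sign, no spaces).
    if (num.empty() || !std::isdigit(static_cast<unsigned char>(num[0]))) {
        return false;
    }
    char* end = nullptr;
    const long value = std::strtol(num.c_str(), &end, 10);
    if (*end != '\0') return false;   // trailing junk such as "6x"
    if (value > 127) return false;    // assumed upper bound for the example
    out->sched_class = cls;
    out->priority = static_cast<int>(value);
    return true;
}
```

A bare number keeps today's meaning, so an existing setting like
CompilerThreadPriority=60 would parse to an empty class and priority 60,
while "FX/60" would carry the class change explicitly.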
>>>>
>>>>>
>>>>> Adding a pseudo-priority 11 is just a means to work within the current
>>>>> limitations of the priority scheme.
>>>>
>>>> Correct.
>>>>
>>>>>
>>>>>> -XX:+UseCriticalCMSThreadPriority
>>>>>>
>>>>>> The CMS background thread runs at critical priority.
>>>>>
>>>>> This doesn't make a lot of sense when you consider the comments in
>>>>>
>>>>> src/share/vm/gc_implementation/concurrentMarkSweep/concurrentMarkSweepThread.cpp
>>>>>
>>>>> which still states:
>>>>>
>>>>> "Priority should be just less than that of VMThread"
>>>>>
>>>>> This seems to indicate that we don't really understand what the
>>>>> priority relationship between GC threads and the VMThread should be.
>>>>
>>>> No, we don't. That's why this is experimental.
>>>>
>>>>>
>>>>> Should we be able to run the VMThread at FX/60?
>>>>
>>>> Perhaps. It only matters for things like serial gc, which isn't used
>>>> on big iron.
>>>>
>>>>>
>>>>>> On Solaris, one must in addition use -XX:+UseThreadPriorities to use
>>>>>> native priorities at all. Otherwise, Hotspot just accepts whatever
>>>>>> Solaris decides.
>>>>>
>>>>> Is it also dependent on the value of ThreadPriorityPolicy? Should it
>>>>> be? Does it make sense to use it with either policy value?
>>>>
>>>> No, it's not dependent on ThreadPriorityPolicy. Critical priority is
>>>> the same no matter what the default MaxPriority java_to_os_priority
>>>> is. I think that's the right thing to do.
>>>>
>>>>>
>>>>>>
>>>>>> Before this change, the Solaris implementation could only change
>>>>>> priorities within the process scheduling class. It didn't change
>>>>>> scheduling classes on a per-thread basis. I added that capability
>>>>>> and used it for the critical thread work. I also fixed a bug where
>>>>>> we were using thr_setprio() to save the original native priority
>>>>>> during thread creation and reading it back when the thread started
>>>>>> via thr_getprio(). Since thr_setprio() can change the user-supplied
>>>>>> priority, this resulted in an unintended (lower) priority being
>>>>>> used.
>>>>>
>>>>> I don't quite follow this. We used thr_setprio to set the native OS
>>>>> priority, and we then read it back using thr_getprio and then used
>>>>> that to pass to thr_setprio again (and also set_lwp_priority). If
>>>>> thr_setprio can change the user-supplied priority then it can make
>>>>> that change on the second call too can't it? Does the fact we now
>>>>> have a lwp affect this? I'm curious about the fact we still both use
>>>>> thr_setprio and set the LWP priority directly ???
>>>>
>>>> Possibly someone like Dave Dice can answer that question. We were
>>>> already using both thr_setprio and set_lwp_priority together. Likely
>>>> that was in case set_lwp_priority wasn't available.
>>>>
>>>> thr_setprio takes a value between 0 and 127 and maps that to "some
>>>> priority" that may not be the same as its argument. You can, for
>>>> example, pass it 127 and get 60 back from thr_getprio. So if we set it
>>>> once with 127 and then set it again with 60, we can ultimately get
>>>> back 0. Which is what actually used to happen.
>>>>
>>>> Paul
>>>>>
>>>>> Cheers,
>>>>> David
>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Paul
>>>>>>
>>>>