RFR: JDK-8086056: ParNew: auto-tune ParGCCardsPerStrideChunk

Tony Printezis tprintezis at twitter.com
Thu Jun 25 13:29:17 UTC 2015


Bengt,

Thanks for looking at this again. See inline.

On June 25, 2015 at 4:06:40 AM, Bengt Rutisson (bengt.rutisson at oracle.com) wrote:


Hi Tony,

No worries about the delay, I'm also quite busy with other things here. ;)

On 2015-06-24 20:41, Tony Printezis wrote:
New webrev that covers all the suggestions:

http://cr.openjdk.java.net/~tonyp/8086056/webrev.1/

Thanks for providing a new webrev!

Some comments:

parNewGeneration.cpp

The flag validation code on lines 886-912 can be replaced by the recently added support for command line verification. See JEP-245:

JEP 245: Validate JVM Command-Line Flag Arguments
http://openjdk.java.net/jeps/245
http://hg.openjdk.java.net/jdk9/hs-rt/hotspot/rev/5bbf25472731

You can simply specify a constraint function in the definition of the flags in globals.hpp.


Thanks for pointing me to this. Given the flags are manageable, do the constraints fire when the flags are updated dynamically? If they do, what happens if the contrain fails?




I have a very hard time following all the type conversions going on in adjust_cards_per_stride().

The capacity flags are size_t, the DynamicParGCStrides[Min/Max]Size flags are uintx, we then convert the result through intptr_t, 


(to my defense!) I had to convert to intptr_t to use the log2_intptr() function. And I assume the stride size values will be relatively small (i.e., maybe several K). So, casting to intptr_t should be OK. But I do like to contrain things as much as possible. Should we add an artificial upper bound just in case?



int and uintx to eventually assign back to ParGCCardsPerStrideChunk as intx.


So, I was going for the capacities being size_t and the stride sizes being uintx. ParGCCardsPerStrideChunk is intx and I thought that uintx would be more appropriate as it cannot be negative. I suggested to also make ParGCCardsPerStrideChunk uintx but was I encouraged against it. To keep the calculations simpler I’ll cast everything to size_t locally and use log2_long() instead of log2_intrptr(). 

What do I do with the types of the args? Let’s go for consistency: if you guys want me to leave ParGCCardsPerStrideChunk as intx, I’ll leave everything as intx. My preference would be to change everything to uintx.



 The value of ParGCCardsPerStrideChunk is then used in the code to offset pointers of the jbyte* type. How can we be sure that this works correctly?


I’d feel better if it was uintx. I’ll also check what happens if the arg is not aligned properly.




globalDefinitions.hpp

I don't think that globalDefinitions.hpp is the right place to put linearly_interpolate_bounded(). I would prefer to have it as a local method in parNewGeneration.cpp. Or if you think it is actually going to be used more widely I think it should go somewhere in the utilities/ directory.


Well, it is a general-purpose method, if you want to do such an interpolation. Whether it will be needed by anything else, I don’t know. If you don’t think I should package it up in this general-purpose way, I’ll just go back to what I had before and not factor it out, as the ParNewGeneration class would not be the right place for it either.


Does it really make sense to use different types for X and Y in linearly_interpolate_bounded()? 


I think it does. The two ranges are separate and don’t have to be of the same type. In this case, you can have the strides size as, say, ushort and the capacity as size_t and it’d work. Note that the ratio is calculated based on one range and then applied to the other range so I think the calculation should be safe for separate types.



I would think that one type is enough. Also, it does not look safe to just cast the type to double for a general purpose template method.


Why (and how else would you do this)? This should be safe for numetic types, which X and Y will be.

Tony




Thanks,
Bengt


(and I also found a version of hg that works OK with webrev; thanks to Jon Masa for the help)

I’m not really happy with the way I deal with the 32-bit case. Better suggestions are welcome. Also, I’m not attached to what the cmd line args are called. Better suggestions on that are welcome too.

Tony

On June 23, 2015 at 12:19:12 PM, Tony Printezis (tprintezis at twitter.com) wrote:

Hi Bengt,

I’ll try factoring out the interpolation and tell me what you think. Sorry, I’m a bit behind on this as I’ve been busy with a couple of more urgent tasks hee. I’ll get this done shortly.

Tony

On June 22, 2015 at 3:36:10 AM, Bengt Rutisson (bengt.rutisson at oracle.com) wrote:


Hi Tony,

I agree with your comments. Thanks for fixing this!

One answer at the end of this message...

On 2015-06-18 15:50, Tony Printezis wrote:
Hi Bengt,

Thanks for looking at it. Inline.

On June 18, 2015 at 3:02:45 AM, Bengt Rutisson (bengt.rutisson at oracle.com) wrote:


Hi Tony,

NIce to hear from you!


On 18/06/15 00:30, Tony Printezis wrote:
Hi all,

A small patch for your consideration:

http://cr.openjdk.java.net/~tonyp/8086056/webrev.0/

A couple of style comments:

 891   const size_t MinOldGenCapacity =      G;
 892   const size_t MaxOldGenCapacity = 16 * G;
 893   assert(MinOldGenCapacity <= MaxOldGenCapacity, "sanity");

Local variables are normmaly named using lower case and underscore. 


Yeah, I of course usually do that. But as they are constants I thought I’d make them look like constants. :-)



So, min_old_gen_capacity and max_old_gen_capacity. Same for the rest of the local variables in ParNewGeneration::adjust_cards_per_stride().

Having said that, I think it would be good to turn some of these constants into flags (which means that the naming should be as it is now ;) ). 


Hang on! Are you and Masa encourating me to add 5 (!!!) more cmd line args to HotSpot? ;-)



Especially since you say that you haven't done much performance testing of these values.


To be clear: I did performance testing with chunks sizes in the 256-8k range and with three old gen sizes (2g, 10g, and 20g) and auto-tuning comes pretty close to optimal; I have numbers on the JIRA. But, yes, that was only with a couple of (synthetic) tests and only on one architecture (Linux/x64). If someone there could try this on a couple more apps and maybe on sparc, as we don’t happen to have any sparc boxes lying around :-), it’d be helpful.

But, yes, your suggestion of making the chunk size / old gen capacity limits into cmd line args definitely makes sense and I had also considered that myself. But I had decided against it as I thought you wouldn’t want more cmd line args. :-) How about:

+UseDynamicParGCStrides

DynamicParGCStrides{Min,Max}OldGenCapacity

DynamicParGCStrides{Min,Max}Size

I should also make them all manageable.



 I would prefer that we could adjust them at runtime. Best would of course be if you could take the time to do the performance testing.


So, as I said earlier, I did. The numbers are on the JIRA. Is there anything else I could maybe run? Maybe larger old gen than 20g? (but that’s reaching the memory limits of my workstation, FWIW)




I realize that the assert on line 893 might have been useful if you were playing around with the numbers when you were experimenting with the patch. But now it looks mostly like line noise to me. Same with the assert on line 898.


Actually, it as basically stating the invariant for those parameters (in case someone wants to play around with them). But if we make the constants into cmd line args this will go away...




I would also consider factoring the actual computation out into a separate helper method.


You mean the interpolation? Like:

(base_min, base_max, value_min, value_max, value) -> result in range [base_min, base_max]?


Yes, that's what I meant. But if you, as suggested above, get rid of most (all?) of the constant definitions in adjust_cards_per_stride() then I guess it is mostly only the interpolation left. So, in that case it may not be necessary to factor it out. I would have to see a new webrev to have more comments.

Thanks,
Bengt

Tony




I also second Jon's comment on that it would be good with a more explicit way of turning this feature off.

Thanks,
Bengt


(BTW, for some reason some of the webrev output is a bit messed up. Not sure why, maybe some hg incompatibility I guess. The diffs look OK though. I also attached the patch to this e-mail.)

There’s a bit of info on the JIRA on the rationale for the patch:

https://bugs.openjdk.java.net/browse/JDK-8086056

The min / max values for the old gen capacity and ParGCCardsPerStrideChunk were chosen empircally after running a few (mostly synthetic tests) on Linux x64. If someone has the cycles to do a more extensive performance study, I’d be happy to revise them accordingly.

Regards,

Tony

-----

Tony Printezis | JVM/GC Engineer / VM Team | Twitter

@TonyPrintezis
tprintezis at twitter.com







-----

Tony Printezis | JVM/GC Engineer / VM Team | Twitter

@TonyPrintezis
tprintezis at twitter.com


-----

Tony Printezis | JVM/GC Engineer / VM Team | Twitter

@TonyPrintezis
tprintezis at twitter.com

-----

Tony Printezis | JVM/GC Engineer / VM Team | Twitter

@TonyPrintezis
tprintezis at twitter.com







-----

Tony Printezis | JVM/GC Engineer / VM Team | Twitter

@TonyPrintezis
tprintezis at twitter.com

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.org/pipermail/hotspot-gc-dev/attachments/20150625/21ed3a7f/attachment.htm>


More information about the hotspot-gc-dev mailing list