String Deduplication in JEP192

Per Liden per.liden at oracle.com
Mon Mar 3 09:50:01 UTC 2014


Hi,

I think Charlie has already answered most, if not all, of the questions;
I just have one additional comment below.

On 2014-03-01 14:59, charlie hunt wrote:
> On 03/01/2014 07:45 AM, Kirk Pepperdine wrote:
>> On Mar 1, 2014, at 2:42 PM, charlie hunt <charlie.hunt at oracle.com> 
>> wrote:
>>
>>> Hi Kirk,
>>>
>>> Thanks for the good questions. :-)
>>>
>>> At the risk of jumping in ahead of the JEP authors and reviewers ...
>>>
>>>> where did you get the statistics from?
>>> 900+ profiles
>> Customer apps I presume?
> Yep
>>> If you or others have a repository of profiles, (if you can) please 
>>> share your observations.
>> Sorry but I generally don’t keep customers code. ;-)
> Understood
>>>> If the weak generational hypothesis holds will you not be spending 
>>>> more time deduping soon to be garbage which potentially would place 
>>>> more load on the GC threads?
>>> As mentioned in the JEP, you can manipulate the age at which a 
>>> String becomes a candidate with -XX:StringDeduplicationAgeThreshold.
>> Missed that one. I guess an obvious threshold would be promotion 
>> to tenured?
> A reasonable place to start. ;-)  As you know, one's goals (i.e. 
> throughput, latency or footprint) will drive you to the best setting, 
> assuming a representative workload and monitoring of production behavior.

It would have been nice to have the deduplication age threshold 
automatically follow the tenuring threshold. However, some technical 
details make this problematic. To avoid inspecting the same String more 
than once, we want to be able to quickly filter out Strings that have 
already been inspected. This check has to be cheap because it's in the 
hot path (i.e. we don't want to look the String up in the dedup table). 
With a fixed deduplication threshold we can do this simply by looking at 
the String's age and the type of region it's in. The tenuring threshold, 
on the other hand, is dynamic and recalculated at every GC. If it were 
also used as the deduplication threshold, we would no longer be able to 
cheaply tell whether a String has already been inspected. I've 
prototyped different approaches where the tenuring threshold was used as 
the dedup threshold, but in the end they all turned out to be less 
attractive options.
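To make the argument above concrete, here is a rough Java sketch of such a filter. It is a simplified, hypothetical model, not the HotSpot source: the names DedupFilterSketch, isCandidate and timesQueued are made up, ages are uncapped, and the promotion rule (a young String is promoted once its age has reached that GC's tenuring threshold, otherwise it is copied to a survivor region and ages by one) is an assumption. The fromYoung/toYoung flags stand in for "the type of region it's in".

```java
public class DedupFilterSketch {

    /**
     * Hypothetical cheap filter using a FIXED dedup age threshold.
     * age is the String's age after this collection's copy.
     * A String is queued at most once in its life:
     *  - copied young -> young and its age has just reached the threshold, or
     *  - promoted young -> old before it ever reached the threshold.
     * Strings that are already old are never queued again.
     */
    static boolean isCandidate(boolean fromYoung, boolean toYoung, int age, int threshold) {
        if (!fromYoung) {
            return false;            // already old: handled earlier in its life
        }
        if (toYoung) {
            return age == threshold; // reached the threshold during this GC
        }
        return age < threshold;      // promoted before ever reaching the threshold
    }

    /**
     * Follows one long-lived String through a sequence of GCs and counts how
     * often the filter queues it. Promotion rule is a simplifying assumption.
     */
    static int timesQueued(int[] tenuringPerGc, int[] dedupPerGc) {
        int age = 0;
        boolean young = true;
        int queued = 0;
        for (int gc = 0; gc < tenuringPerGc.length; gc++) {
            boolean fromYoung = young;
            boolean toYoung = young && age < tenuringPerGc[gc];
            if (toYoung) {
                age++;               // survivor copy ages the object
            }
            if (isCandidate(fromYoung, toYoung, age, dedupPerGc[gc])) {
                queued++;
            }
            young = toYoung;
        }
        return queued;
    }

    public static void main(String[] args) {
        // Fixed dedup threshold while the tenuring threshold changes.
        int[] fixedDedup = {3, 3, 3, 3, 3, 3};
        System.out.println(timesQueued(new int[] {5, 5, 5, 2, 2, 2}, fixedDedup)); // 1

        // Dedup threshold tracking a falling tenuring threshold.
        int[] falling = {5, 5, 5, 2, 2, 2};
        System.out.println(timesQueued(falling, falling)); // 0

        // Dedup threshold tracking a rising tenuring threshold.
        int[] rising = {2, 2, 4, 4, 4, 4};
        System.out.println(timesQueued(rising, rising)); // 2
    }
}
```

In this model the fixed threshold queues the String exactly once no matter how the tenuring threshold moves, while letting the dedup threshold track the tenuring threshold can skip a String entirely (falling threshold) or queue it twice (rising threshold). Avoiding that would need extra per-String state or a table lookup, which is exactly what the hot path can't afford.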

/Per

>>
>> I guess that only partially addresses the cost issues.
>
> Agreed ... no replacement for measurements (on representative data). :-)
>
> charlie
>>
>> — Kirk
>>
>




More information about the hotspot-gc-dev mailing list