Breadth-first vs. Depth-first

Tue Jun 10 18:22:24 UTC 2008

'Ergo writ large' proposes a repository that would contain history of
actual runs, so no explicit training runs would be necessary.  No
resources though.

Paul

Tony Printezis wrote:
> Martin,
>
> Martin Buchholz wrote:
>> Tony,
>>
>> Thanks for the reply.
>>
>> I continue to see parallels between code hotspots and data hotspots.
>> In both cases there are few opportunities to rectify previous bad
>> decisions. 
> Yes, but in the case of code generation, recompiling a method can be 
> (generally) done in a quite low-overhead manner. However, re-arranging 
> the location of objects is much much more intrusive.
>>  There is no way to declare that the initialization phase of
>> the application is over, and the "money phase" is starting.
>>   
> Maybe, there should be?
>> Identifying code hotspots pays off better, because code inlining
>> may give rise to further optimizations, unlike data colocation.
>>
>> I imagine that data colocation will increase in value over time,
>> with memory access becoming a more important part of
>> application performance.
>>   
> I totally agree with this.
>> I wonder why they y'all never seem to use a technique from the
>> AOT compiler world - provide a profiling mode to allow users
>> to run the app in for a "typical" run, then use that profiling
>> information in future runs.
>>   
> I've never been enamored with the idea, I have to say. I just don't 
> know how willing customers will be to do training runs. And remember 
> to redo the training runs when their code changes. And do the training 
> run with a realistic input ("Hey, your GC sucks. I did a training run 
> with a 1MB DB input and when I loaded 50GB, the application went 
> slower." Well, yeah....).
>
> Given that some of our customers are lurking here, I'd love to hear 
> from them on this, actually...
>
> Tony
>> Martin (definitely *not* a hotspot engineer)
>>
>> On Mon, Jun 9, 2008 at 8:51 AM, Tony Printezis 
>> <tony.printezis at sun.com> wrote:
>>  
>>> Martin,
>>>
>>> I'm personally skeptical on whether we should pursue this further. 
>>> There are
>>> a couple of reasons why.
>>>
>>> Such profiling could be expensive and I'm not convinced that we will 
>>> be able
>>> to reclaim back all the extra cost (the benefit we got from 
>>> depth-first was
>>> very dependent on the architecture; some architectures got a lot, 
>>> some got
>>> peanuts; the latter might be penalized, instead of benefit, due to 
>>> the extra
>>> profiling).
>>>
>>> Additionally, the benefit from depth-first really depends on the GC 
>>> used.
>>> For the GCs in HotSpot, the only opportunity we have to re-arrange 
>>> objects
>>> is when new objects are copied from the eden to the survivor spaces 
>>> and from
>>> the survivor spaces to the old generation. After that, you're stuck 
>>> with the
>>> order in which objects moved to the old generation, as the old 
>>> generation
>>> typically doesn't re-order objects (UseParallelGC and 
>>> UseParallelOldGC do
>>> sliding compaction and they maintain object order, CMS doesn't 
>>> typically do
>>> compaction; in fact, due to its in-place de-allocation policy, CMS 
>>> cannot
>>> even guarantee that two objects that are allocated one after the 
>>> other, will
>>> actually be located in the same order in the old gen).
>>>
>>> As an aside, Garbage-First:
>>>
>>> http://developers.sun.com/learning/javaoneonline/2008/pdf/TS-5419.pdf
>>>
>>> might have the edge over the other GCs in that respect, as it does 
>>> copying
>>> (not sliding) GC in the old generation too and it might have more
>>> opportunities to re-order objects.
>>>
>>> Now, think about the following (very realistic!) scenario. An 
>>> application
>>> has a non-trivial initialization phase and a lot of objects created 
>>> during
>>> it will survive the lifetime of that application. After the 
>>> initialization
>>> phase, the application goes into a steady-state phase (this is the 
>>> "request
>>> serving" phase). Now, you would think that the initialization and
>>> steady-state phases will have quite different access patterns (you 
>>> see where
>>> this is going, aren't you?). So, the objects created during the
>>> initialization phase will be copied according to the initialization 
>>> access
>>> pattern, which might or might not be what you want during the 
>>> steady-state
>>> phase. I think this is an example that shows that, in some (many?) 
>>> cases,
>>> when the decision would be made on which objects to cluster 
>>> together, it
>>> would be made prematurely.
>>>
>>> Now, do I think depth-first was a bad idea? Nope. I think it was a good
>>> idea, not because depth-first is really good, but instead because
>>> breadth-first was pathological in many many cases.
>>>
>>> Regarding, the GC times vs. mutator speed trade-off, I was actually
>>> expecting depth-first to have exactly this effect: slower GCs, but 
>>> faster
>>> mutator threads. Interestingly, in most tests I ran, GCs didn't slow 
>>> down
>>> much and, in some cases, they actually speeded up. I think this 
>>> could have
>>> been due to the new order also benefiting the GC too, as well as the
>>> mutators threads.
>>>
>>> My two cents,
>>>
>>> Tony, HS GC Group
>>>
>>> Martin Buchholz wrote:
>>>    
>>>> JavaOne 2008 technical session PDFs are now available
>>>>
>>>>
>>>> http://developers.sun.com/learning/javaoneonline/j1online.jsp?track=javase&yr=2008 
>>>>
>>>>
>>>> I enjoyed the discussion of breadh-first vs. depth-first GC copying
>>>> in the JavaOne talk
>>>> http://developers.sun.com/learning/javaoneonline/2008/pdf/TS-6434.pdf
>>>>
>>>> I'd like to suggest that this is a fruitful avenue for investigation,
>>>> if hotspot engineers go further than simply switching whole-hog
>>>> to depth-first.
>>>>
>>>> Profiling can find reference fields that are rarely dereferenced,
>>>> and also ones that are likely to be dereferenced soon after the object
>>>> itself is accessed.
>>>>
>>>> The former can be sequestered by the next copying GC into
>>>> an unloved-object ghetto, while the latter can be stored contiguously
>>>> with its parent object.  Tricky to implement, and will certainly
>>>> make GC slower, but might make the runtime of the mutator win big.
>>>> And should be easier to implement than true object inlining.
>>>>
>>>> This ought to be very similar to the moving of dead or cold code
>>>> fragments inside a method away from the hot code, that hotspot
>>>> already does.
>>>>
>>>> Martin
>>>>
>>>>       
>>> -- 
>>> ----------------------------------------------------------------------
>>> | Tony Printezis, Staff Engineer    | Sun Microsystems Inc.          |
>>> |                                   | MS BUR02-311                   |
>>> | e-mail: tony.printezis at sun.com    | 35 Network Drive               |
>>> | office: +1 781 442 0998 (x20998)  | Burlington, MA01803-0902, USA  |
>>> ----------------------------------------------------------------------
>>> e-mail client: Thunderbird (Solaris)
>>>
>>>
>>>
>>>     
>