<div dir="ltr"><div><div>Thanks, Thomas!</div><div><br></div><div>I will refine my code carefully according to your suggestions as soon as possible.</div><div><br></div><div>Since I'm new in contributing to OpenJDK, actually I'm not sure how to upload when my new code is ready?</div><div> Just upload an attach of diffs as I did, or some other ways?</div><div><br></div><div>Besides, I'm currently processing my signed OCA for the committer rights, </div><div>I'm not sure how long it will take, does it matter?</div><div><br></div><div>As for the VM Thread, it does a serial adjust_roots() within the current thread (VM thread) during PSParallelCompact::invoke_no_policy().</div><div> I think psCompactionManager.cpp:117 and the surrounding comments confirm the use of VMThread in full GC.</div><div>Do you agree with my above understanding of the use of VMThread?</div></div><div><br></div><div><br></div><div><div>Sincerely,</div><div>  Tianyang</div><div><div class="gmail_extra"><br><div class="gmail_quote">2015-12-14 22:48 GMT+08:00 Thomas Schatzl <span dir="ltr"><<a href="mailto:thomas.schatzl@oracle.com" target="_blank">thomas.schatzl@oracle.com</a>></span>:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">Hi Tianyang,<br>

<span><br>

On Mon, 2015-12-14 at 15:55 +0800, ray alex wrote:<br>

> Hi all,<br>

><br>

> Can I have reviews for the following enhancement?<br>

><br>

<br>

</span>Thanks for contributing :) Here is a quick review.<br>

<br>

Note that I am typically not working on Parallel GC, so I am not very<br>

proficient in this code and did not take the time to look into it in<br>

more detail yet. Also, I did not do any performance testing, so I am<br>

mostly criticizing coding style :)<br>

<br>

- please remove obvious debug code, like most of the code in<br>

psParallelCompact.cpp. Also, for whatever reason the TraceTime<br>

constructor gets passed a print_phases() method that does not exist.<br>

<br>

- I am not sure why the VM thread needs to have the cached<br>

live-words-in-range information. For that matter, I think that putting<br>

new members into the Thread class is a no-go because these members are<br>

only used by the GC threads, penalizing everyone else. I think the<br>

correct way of putting GC task thread specific data is adding these<br>

members to the GCTaskThread class.<br>

<br>

Since for worker threads Thread::current() is a GCTaskThread(), you can<br>

still use it to retrieve the values (and add a<br>

Thread::as_gctask_thread() method like as_worker_thread() to retrieve<br>

it).<br>

<br>

Also, please create appropriate getters and/or setters for the<br>

variables, taking into account below comments.<br>

<br>

- in the new ParMarkBitMap::live_words_in_range() method, please do not<br>

use l-value references. Hotspot code does not use them, and it does not<br>

seem worth to change this "do as the surrounding code does" rule.<br>

<br>

Following comments are for ParMarkBitMap::live_words_in_range():<br>

<br>

- I do not see a point in opening extra scopes for straight-line code<br>

(like in parMarkBitMap.cpp:161)<br>

<br>

- the second part of the new live_words_in_range() looks like a verbatim<br>

copy of (the now) live_words_in_range_helper(). Please refactor the code<br>

appropriately.<br>

<br>

E.g. I would prefer if the code looked like the following (actual naming<br>

up to debate, I think you will find better names):<br>

<br>

live_words_in_range() {<br>

  if (live_words_in_range_in_cache()) {<br>

    return cached-values();<br>

  }<br>

  return live_words_in_range_helper();<br>

}<br>

<br>

or something similar. That would be a lot better to read and maintain<br>

instead of the copy&paste.<br>

<br>

- use of uint64_t for casting addresses/comparisons is wrong and just<br>

unnecessary type conversion on 32 bit. The type the language provides<br>

for this purpose is (u)intptr_t.<br>

<br>

Also, you probably want to use the pointer_delta function to calculate<br>

pointer deltas without overflow.<br>

<span><br>

> Description:<br>

> The change addresses the inefficiency in the live_words_in_range()<br>

> routine in ParallelScavenge Full GC,<br>

> which generally costs most of the time during the compacting phase.<br>

><br>

>Our idea records last query information of live_words_in_range as well<br>

>as the query result.<br>

> Our approach reuses last query result so that can reduce much<br>

> computation of bitmap searching.<br>

<br>

</span>:)<br>

<span><br>

><br>

> Patch:  as attached.<br>

><br>

> Testing:<br>

> Manual testing of full GC intensive applications including JOlden,<br>

> Dacapo, SPECjvm2008 benchmarks.<br>

><br>

> Results show that our enhancement can reduce much full GC time by<br>

> 20~50%, depending on application scenarios.<br>

<br>

</span>I am not sure if JOlden (benchmark from '95), or Dacapo (from '09) is<br>

representative of today's workloads (or specjvm2008), but the idea<br>

sounds timeless :)<br>

<br>

I will run (a slightly modified version, see pointer_delta() comment<br>

above) through some basic testing, and then some benchmarks.<br>

<span><br>

> Lei Tianyang<br>

> Institute of Parallel and Distributed System (IPADS)<br>

> Shanghai Jiao Tong University<br>

<br>

</span>Another procedural comment, you do not seem to have signed an Oracle's<br>

Contributor agreement. We need that to be able to take your patch,<br>

unless you are covered by some companies' OCA (but then we would need<br>

you to use an email address from that company I think).<br>

<br>

Please see <a href="http://www.oracle.com/technetwork/community/oca-486395.html#l" rel="noreferrer" target="_blank">http://www.oracle.com/technetwork/community/oca-486395.html#l</a><br>

for more information.<br>

<br>

Thanks,<br>

  Thomas<br>

<br>

><br>

<br>

<br>

</blockquote></div><br><br clear="all"><div><br></div>-- <br><div>Lei Tianyang<div><br></div></div>

</div></div></div></div>