RFR (XS): 8033922: G1: Back out 8033601 and go back to use the to-obj for chunked arrays.
Bengt Rutisson
bengt.rutisson at oracle.com
Fri Feb 7 20:27:41 UTC 2014
Hi Tony,
On 2014-02-07 19:42, Tony Printezis wrote:
> Hi guys,
>
> First, apologies for this. :-) We have done a bunch of G1 testing here
> with said fix (it's been in our repo for a few months) and we also
> tested it with production loads without any failures. But of course we
> don't have all your tests, so clearly we can't reach your level of
> coverage. FWIW, I'm glad you caught the problem since we know that we
> have to come up with a fix (at least, for our repo).
No problem. Running with VerifyBeforeGC and VerifyAfterGC caught the
issue on my local machine. Might be a way for you to reproduce it if you
want to investigate this further.
>
> I agree with Bengt here, best approach right now is to undo the fix
> and I'll work on coming up with an alternative.
Good :) I already pushed the patch to revert the change.
> I'll post some suggestions on the CR.
Great! Thanks!
Bengt
>
> Tony
>
> On 2/7/14, 8:24 AM, Bengt Rutisson wrote:
>>
>> Hi Thomas,
>>
>> Thanks for the quick review!
>>
>> Yes, there are ways to fix it, but I don't think it is worth the
>> effort since the original fix was just a fix to make G1 look more
>> like the other collectors. It was not a correctness or performance fix.
>>
>> Bengt
>>
>> On 2/7/14 1:51 PM, Thomas Schatzl wrote:
>>> Hi,
>>>
>>> On Fri, 2014-02-07 at 13:29 +0100, Bengt Rutisson wrote:
>>>> Hi all,
>>>>
>>>> Can I have a couple of reviews for backing out the changes for
>>>> 8033601.
>>>> Those changes cause verification failures in our nightly testing.
>>>>
>>>> http://cr.openjdk.java.net/~brutisso/8033922/webrev.00/
>>>>
>>>> https://bugs.openjdk.java.net/browse/JDK-8033922
>>>>
>>>> Background from the CR:
>>>>
>>>> The fix for 8033601 changed the code for chunked arrays to use the
>>>> from
>>>> obj length as index rather than the to object.
>>>>
>>>> This was nice since it made G1 look more like the other collectors.
>>>>
>>>> However, another difference in G1 is that it writes the forward
>>>> pointer
>>>> before it writes the to-object. That means that there are times
>>>> when an
>>>> object is forwarded, but there is no to-object image that can be
>>>> trusted.
>>>>
>>>> One such place is
>>>>
>>>> void G1ParCopyClosure<barrier, do_mark_object>
>>>> ::do_oop_work(T* p)
>>>>
>>>> We can find that the object is forwarded, but it may not have been
>>>> marked yet. So, we end up calling mark_forwarded_object() which needs
>>>> the size to call _cm->grayRoot(to_obj, (size_t) from_obj->size(),
>>>> _worker_id);
>>>>
>>>> mark_forwarded_object() has a comment that tries to describe this
>>>> situation:
>>>>
>>>> // The object might be in the process of being copied by another
>>>> // worker so we cannot trust that its to-space image is
>>>> // well-formed. So we have to read its size from its from-space
>>>> // image which we know should not be changing.
>>>> _cm->grayRoot(to_obj, (size_t) from_obj->size(), _worker_id);
>>>>
>>>> So, just as the comment say, we can not trust the to-space image.
>>>> However, after the change for 8033601 we can not trust the from-space
>>>> image either.
>>> I.e. the problem is that there is a small time window where another
>>> thread might read a wrong length field.
>>>
>>>> This seems like a show-stopper for the suggested change for 8033601 so
>>>> we need to back it out.
>>> An alternative would be to write/copy the object header (with the
>>> length
>>> field) in the to_obj before forwarding (the atomic forwarding should
>>> take care of memory ordering issues).
>>>
>>> This seems to be too much hassle though, as it has at least one
>>> disadvantage: it widens the opportunity for multiple threads working on
>>> the same object (potentially more undo).
>>>
>>> Another alternative could be to not continue with the rest of the body
>>> of G1ParCopyClosure<barrier, do_mark_object>::do_oop_work() if a thread
>>> in copy_to_survivor_space() looses the race for installing the forward
>>> pointer.
>>>
>>> So I agree with undoing this change at least for now unless somebody
>>> (Tony :)) comes up with a clearly better working solution.
>>>
>>> Thanks,
>>> Thomas
>>>
>>>
>>
>
More information about the hotspot-gc-dev
mailing list