Request for review (M): 7016112 CMS: crash during promotion testing

Thu Jun 23 05:49:22 UTC 2011

Coleen,

Thanks for the review!

On 2011-06-23 03:46, Coleen Phillimore wrote:
> http://cr.openjdk.java.net/~brutisso/7016112/webrev.01/src/share/vm/utilities/quicksort.cpp.html  <http://cr.openjdk.java.net/%7Ebrutisso/7016112/webrev.01/src/share/vm/utilities/quicksort.cpp.html>
>
>    46   bool aIsOdd = a % 2 == 1;
>    47   bool bIsOdd = b % 2 == 1;
> Can you add () around the bit operations?  Since the order precedence 
> of these operators is always surprising (at least to me).

Yes, I'll do that.

> I like the way you set the pattern for the internal testing 
> framework.  Thanks for getting this started.

Thanks, it was really useful to have the unit tests when I made this fix.

> When we move these methodOops out of permgen, we might want to go back 
> to stdlib::quicksort.    It would be less code for us to maintain.  
> Otherwise, this looks pretty straightforward and I like this fix.

I agree that we should get back to stdlib::quicksort when it is safe to 
use it again.

Thanks,
Bengt

>
> Thanks,
> Coleen
>
>
> On 6/21/2011 5:32 PM, Bengt Rutisson wrote:
>>
>> Hi again,
>>
>> For completeness. Here is the graph for sorting maximum length arrays 
>> on Linux x64 (run on my laptop). These runs show that my 
>> implementation takes twice as long as the stdlib version. I am not 
>> happy about that, but I don't know how much effort it is worth to 
>> optimize for this case.
>>
>> Bengt
>>
>>
>> On 2011-06-21 14:50, Bengt Rutisson wrote:
>>>
>>> Hi Runtime and GC,
>>>
>>> Sending this review request to both groups. I fixed a GC bug, but 
>>> the changes are in runtime code.
>>>
>>> The bug that I fixed is this one:
>>> 7016112 CMS: crash during promotion testing
>>> http://monaco.sfbay.sun.com/detail.jsf?cr=7016112
>>>
>>> And here is the webrev:
>>> http://cr.openjdk.java.net/~brutisso/7016112/webrev.01/
>>>
>>> The investigation to find the root of the crashes reported in 
>>> 7016112 was quite lengthy. It is not that easy to read the CR and 
>>> figure out what is going on, so here is some background:
>>>
>>> When we load classes we store references to the methods of a class 
>>> in an object array. When we have loaded all methods we sort the 
>>> object array to allow binary search within the array. To do this 
>>> sort we use stdlib::qsort(), which is the standard library quicksort 
>>> implementation.
>>>
>>> If we are using CMS we might be doing concurrent marking while we 
>>> are sorting the object array. The object array can be found by the 
>>> concurrent marking code and it may start iterating over the array 
>>> while we are sorting it. The problem is that on Windows the 
>>> stdlib::qsort() is not implemented to do atomic updates of elements 
>>> in the array that it sorts. Instead it does a byte-by-byte swap when 
>>> it needs to swap two values. That is an easy way to implement 
>>> different element widths, but it means that at some point in time 
>>> one element may contain a few bytes from the element above or below. 
>>> If this happens at the same time as the marking code is trying to 
>>> read that element, we will be marking some random address and not 
>>> the method that was supposed to be marked.
>>>
>>> On Solaris and Linux the stdlib::qsort() implementations try to swap 
>>> as wide data types as possible so this issue should not occur there. 
>>> On the other hand we have no guarantees that this will always be how 
>>> stdlib::qsort() is implemented.
>>>
>>> After some discussions about different ways of solving this we came 
>>> to the conclusion that the simplest way is to implement our own 
>>> quicksort that operates on the correct pointer width (oop or 
>>> narrowOop).
>>>
>>> So, this is what I have done to fix this bug.
>>>
>>> Also, it is likely that this problem will go away when the perm gen 
>>> removal project is finished. Right now it looks like we will not be 
>>> tracing and marking methods at all after that change.
>>>
>>> * Questions *
>>>
>>> - Should we keep the bubble sort that is done before calling 
>>> quicksort in methodOopDesc::sort_methods() ?
>>>
>>> - Should we keep the idempotent option or should we try to always 
>>> use idempotent sorting (see performance test below)?
>>>
>>> - What is the best way to handle unit tests? I added a flag called 
>>> ExecuteInternalVMTests to run unit tests. This is in line with the 
>>> existing ErrorHandlerTest flag. My thought is that we can use this 
>>> same flag for other unit tests than just the quicksort tests. Would 
>>> be good if we could get these tests executed by JPRT as well. I 
>>> simply run these with "java -XX:+ExecuteInternalVMTests -version".
>>>
>>>
>>> * Testing *
>>>
>>> Did the obvious testing: Ran JPRT with the changes in the webrev and 
>>> ran the failing nsk test from the bug 
>>> (nsk/sysdict/vm/stress/jck12a/sysdictj12a008) repeatedly for sevaral 
>>> days without failing.
>>>
>>> I created some unit tests for the quicksort implementation and they 
>>> all pass.
>>>
>>> I also made a build that sorts both with my own quicksort and with 
>>> stdlib::qsort and then compares that the arrays have the same sort 
>>> order. Ran this through JPRT and it seems to work on all platforms. 
>>> That run also included the unit tests. If anybody wants to see how 
>>> this testing was done, there is a separate webrev for that build. 
>>> The interesting code is in methodOop.cpp:
>>>
>>> http://cr.openjdk.java.net/~brutisso/7016112/webrev-verify-sorting/
>>>
>>> * Performance *
>>>
>>> I am a bit unsure how to get any relevant performance data for this 
>>> change. What I have done so far is to create a class that has 65535 
>>> methods (which is the maximum - the length of the method array is 
>>> u2) and I have measured how long it takes to sort this method array. 
>>> The methods have random names but every hundredth method has 4 
>>> overloaded version. This makes sure that there are some duplicates 
>>> in the array.
>>>
>>> For now I have run this on my Windows x64 work station with 4 cpus 
>>> and on a Solaris Sparc machine with 2 cpus (uname says: SunOS 
>>> sthsparc24 5.10 Generic_139555-08 sun4us sparc FJSV,GPUZC-M Solaris).
>>>
>>> I am attaching graphs for the results. The Y-axis has time in nano 
>>> seconds. Judging by this my quicksort is a bit faster on Windows and 
>>> a bit slower on Solaris. But the interesting thing is that the 
>>> idempotent version is faster than the default behavior on Windows 
>>> and on par with the default on Solaris. I assume that this is due to 
>>> the fact that some stores can be avoided if we don't do swap of 
>>> duplicates. This means that (at least on Windows) swap is more 
>>> expensive than compare. If this is true we should probably remove 
>>> the special treatment of idempotent.
>>>
>>> I could run this on more machines, but I am not sure how relevant 
>>> this type of data is. Most method arrays will be much shorter and 
>>> compared to reading the class from a file the sort will be in the 
>>> noise.
>>>
>>> Long email...I hope I covered most of the issues here. Let me know 
>>> if you have any questions.
>>>
>>> Thanks,
>>> Bengt
>>
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.org/pipermail/hotspot-gc-dev/attachments/20110623/3d18fd0c/attachment.htm>