Request for review (M): 7016112 CMS: crash during promotion testing

Mon Jun 27 23:22:15 UTC 2011

Tom, Coleen and Ramki,

Thanks for the reviews! Here is an updated webrev that I hope includes 
fixes to all the comments that you had:

http://cr.openjdk.java.net/~brutisso/7016112/webrev.03/

The main change compared to the previous version is that I skipped the 
"middle layer sort" in objArrayOop. This was based on a comment from 
Ramki that this sort would not be safe for all collectors anyway. Better 
in that case to do the dirtying of cards directly in methodOop.

I also looked some more at performance. It was bugging me that the Linux 
implementation of stdlib::qsort was so much faster than my version. I 
picked up some of the optimization that the Linux stdlib::qsort does and 
this made my implementation quite a bit faster. Not just on Linux, but 
also on Windows and Solaris.

As always there is a trade-off between clean code and fast code. I 
basically tweaked quickSort.hpp in four ways:

1) Use insertion sort if the length of the array is less than 7
2) Use insertion sort if it appears that the array is almost sorted 
(partition did not have to do any calls to swap())
3) Added inline to all inner methods
4) Use more samples for calculating the pivot element if the length of 
the array is larger than 40

Of these, 4) is the one that has the biggest impact. But they all 
contribute in some extent to better performance.

Here is a webrev with these optimizations. It just contains the changes 
compared to the webrev above:

http://cr.openjdk.java.net/~brutisso/7016112/webrev.03-opt/

I would like to push the optimized version, but I would like to hear 
your thougths. If you think it is too ugly I am ok with pusing the 
non-optimized version.

I did some runs on Win x64, Linux x64 and Solaris x64 using my test 
class with 65535 methods. As you can see from the graphs that I attach, 
my implementation is still behind on Linux but not quite as bad as before.

The Linux and Windows runs are made on the same dual-booted computer. So 
they have identical hardware. It is interesting to see that both my 
implementations (non-optimized and optimized) take about the same time 
on Windows and Linux, but stdlib:qsort is much faster on Linux than on 
Windows.

The Y-axis in the graphs now has time in seconds.

I also ran RefWorkLoad a bit. I will run it some more, but so far I have 
not seen any significant differences there. I'll probably focus my runs 
on Linux to see if I can see any regressions.

Any thoughts on this? I would like to push this within the next few days 
since I will be going on vacation next week.

Thanks,
Bengt

On 2011-06-22 23:55, Tom Rodriguez wrote:
> On Jun 22, 2011, at 5:17 AM, Bengt Rutisson wrote:
>
>> Hi Tom,
>>
>> Thanks for the review!
>>
>>> Overall this looks fine to me.  quicksort should really be QuickSort since its a class name and normally we'd make it inherit from AllStatic to indicate that's just a collection of functions.  method names should be lower case with underscores though that isn't universally followed.
>> Good points. I am still getting used to the coding standard. I made the changes you proposed. Here is a new webrev:
>>
>> http://cr.openjdk.java.net/~brutisso/7016112/webrev.02/
>>
>> FYI: I changed the name of the files quicksort.cpp/hpp to quickSort.cpp/hpp. This is not trivial on Windows, since it does not care about case in file names. I got mercurial, Visual Studio and the file system to accept my change.  But for some reason webrev still thinks the files are named with all lower case. I'll make sure they are correctly named when I push.
> Ok.
>
>>> I assume we need to keep the idempotent stuff for the original reasons stated in that comment.  We could leave the bubble sort in place or just do some measurement to decide if it's really worth it.  It would be nice to get rid of it though.
>> Just to be clear; What I proposed was not to remove the idempotent sorting but to remove the parameter for it and always sort in an idempotent way. I have not seen any performance issues with always sorting with idempotent=true. On the other hand I am still a bit uncertain about performance so it might be best to keep it.
> Sorry I misunderstood.  I would probably keep it as is.  You could use an algorithm which is stable in first place instead but that opens a whole other can of worms.
>
> tom
>
>> Bengt
>>
>>> tom
>>>
>>> On Jun 21, 2011, at 5:50 AM, Bengt Rutisson wrote:
>>>
>>>> Hi Runtime and GC,
>>>>
>>>> Sending this review request to both groups. I fixed a GC bug, but the changes are in runtime code.
>>>>
>>>> The bug that I fixed is this one:
>>>> 7016112 CMS: crash during promotion testing
>>>> http://monaco.sfbay.sun.com/detail.jsf?cr=7016112
>>>>
>>>> And here is the webrev:
>>>> http://cr.openjdk.java.net/~brutisso/7016112/webrev.01/
>>>>
>>>> The investigation to find the root of the crashes reported in 7016112 was quite lengthy. It is not that easy to read the CR and figure out what is going on, so here is some background:
>>>>
>>>> When we load classes we store references to the methods of a class in an object array. When we have loaded all methods we sort the object array to allow binary search within the array. To do this sort we use stdlib::qsort(), which is the standard library quicksort implementation.
>>>>
>>>> If we are using CMS we might be doing concurrent marking while we are sorting the object array. The object array can be found by the concurrent marking code and it may start iterating over the array while we are sorting it. The problem is that on Windows the stdlib::qsort() is not implemented to do atomic updates of elements in the array that it sorts. Instead it does a byte-by-byte swap when it needs to swap two values. That is an easy way to implement different element widths, but it means that at some point in time one element may contain a few bytes from the element above or below. If this happens at the same time as the marking code is trying to read that element, we will be marking some random address and not the method that was supposed to be marked.
>>>>
>>>> On Solaris and Linux the stdlib::qsort() implementations try to swap as wide data types as possible so this issue should not occur there. On the other hand we have no guarantees that this will always be how stdlib::qsort() is implemented.
>>>>
>>>> After some discussions about different ways of solving this we came to the conclusion that the simplest way is to implement our own quicksort that operates on the correct pointer width (oop or narrowOop).
>>>>
>>>> So, this is what I have done to fix this bug.
>>>>
>>>> Also, it is likely that this problem will go away when the perm gen removal project is finished. Right now it looks like we will not be tracing and marking methods at all after that change.
>>>>
>>>> * Questions *
>>>>
>>>> - Should we keep the bubble sort that is done before calling quicksort in methodOopDesc::sort_methods() ?
>>>>
>>>> - Should we keep the idempotent option or should we try to always use idempotent sorting (see performance test below)?
>>>>
>>>> - What is the best way to handle unit tests? I added a flag called ExecuteInternalVMTests to run unit tests. This is in line with the existing ErrorHandlerTest flag. My thought is that we can use this same flag for other unit tests than just the quicksort tests. Would be good if we could get these tests executed by JPRT as well. I simply run these with "java -XX:+ExecuteInternalVMTests -version".
>>>>
>>>>
>>>> * Testing *
>>>>
>>>> Did the obvious testing: Ran JPRT with the changes in the webrev and ran the failing nsk test from the bug (nsk/sysdict/vm/stress/jck12a/sysdictj12a008) repeatedly for sevaral days without failing.
>>>>
>>>> I created some unit tests for the quicksort implementation and they all pass.
>>>>
>>>> I also made a build that sorts both with my own quicksort and with stdlib::qsort and then compares that the arrays have the same sort order. Ran this through JPRT and it seems to work on all platforms. That run also included the unit tests. If anybody wants to see how this testing was done, there is a separate webrev for that build. The interesting code is in methodOop.cpp:
>>>>
>>>> http://cr.openjdk.java.net/~brutisso/7016112/webrev-verify-sorting/
>>>>
>>>> * Performance *
>>>>
>>>> I am a bit unsure how to get any relevant performance data for this change. What I have done so far is to create a class that has 65535 methods (which is the maximum - the length of the method array is u2) and I have measured how long it takes to sort this method array. The methods have random names but every hundredth method has 4 overloaded version. This makes sure that there are some duplicates in the array.
>>>>
>>>> For now I have run this on my Windows x64 work station with 4 cpus and on a Solaris Sparc machine with 2 cpus (uname says: SunOS sthsparc24 5.10 Generic_139555-08 sun4us sparc FJSV,GPUZC-M Solaris).
>>>>
>>>> I am attaching graphs for the results. The Y-axis has time in nano seconds. Judging by this my quicksort is a bit faster on Windows and a bit slower on Solaris. But the interesting thing is that the idempotent version is faster than the default behavior on Windows and on par with the default on Solaris. I assume that this is due to the fact that some stores can be avoided if we don't do swap of duplicates. This means that (at least on Windows) swap is more expensive than compare. If this is true we should probably remove the special treatment of idempotent.
>>>>
>>>> I could run this on more machines, but I am not sure how relevant this type of data is. Most method arrays will be much shorter and compared to reading the class from a file the sort will be in the noise.
>>>>
>>>> Long email...I hope I covered most of the issues here. Let me know if you have any questions.
>>>>
>>>> Thanks,
>>>> Bengt
>>>> <win_x64_quicksort-perf.png><sparc_quicksort-perf.png>

-------------- next part --------------
A non-text attachment was scrubbed...
Name: solarisx64.png
Type: image/png
Size: 16422 bytes
Desc: not available
URL: <https://mail.openjdk.org/pipermail/hotspot-gc-dev/attachments/20110628/14479ce3/solarisx64.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: winx64.png
Type: image/png
Size: 25386 bytes
Desc: not available
URL: <https://mail.openjdk.org/pipermail/hotspot-gc-dev/attachments/20110628/14479ce3/winx64.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: linuxx64.png
Type: image/png
Size: 16601 bytes
Desc: not available
URL: <https://mail.openjdk.org/pipermail/hotspot-gc-dev/attachments/20110628/14479ce3/linuxx64.png>