Proxy.isProxyClass scalability

Wed Apr 17 14:18:50 UTC 2013

Hi Mandy,

Here's the updated webrev:

https://dl.dropboxusercontent.com/u/101777488/jdk8-tl/proxy-wc/webrev.02/index.html

This adds TwoLevelWeakCache to the scene, with the following performance
compared to the other alternatives:

Summary in ns/op (4 Cores x 2 Threads i7 CPU):

Test                     Threads  Original  Patch(CL field)  FlattenedWeakCache  TwoLevelWeakCache
=======================  =======  ========  ===============  ==================  =================
Proxy_getProxyClass            1  2,403.27           163.70              206.88             252.89
                               4  3,039.01           202.77              303.38             327.62
                               8  5,193.58           314.47              442.58             510.63

Proxy_isProxyClassTrue         1     95.02            10.78               41.85              42.03
                               4  2,266.29            10.80               42.32              42.07
                               8  4,782.29            20.53               72.29              69.25

Proxy_isProxyClassFalse        1     95.02             1.36                1.36               1.36
                               4  2,186.59             1.36                1.37               1.40
                               8  4,891.15             2.72                2.94               2.72

Annotation_equals              1    240.10           152.29              193.27             200.45
                               4  1,864.06           153.81              195.60             202.45
                               8  8,639.20           262.09              384.72             338.70
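The benchmark sources are not part of this message. For a rough feel of what the Proxy_isProxyClassTrue/False rows exercise, here is a hypothetical single-threaded loop (class and method names are made up; it is not the harness behind the figures above, and something like JMH is needed for trustworthy numbers):

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;

// Hypothetical crude harness; NOT the benchmark used for the figures above.
final class IsProxyClassLoop {
    static volatile int lastHits; // published once so the loop is not dead code

    // Creates (or reuses) a proxy class for Runnable in this class's loader.
    static Class<?> proxyClass() {
        InvocationHandler h = (proxy, method, args) -> null;
        Object p = Proxy.newProxyInstance(IsProxyClassLoop.class.getClassLoader(),
                new Class<?>[] { Runnable.class }, h);
        return p.getClass();
    }

    // Average ns per Proxy.isProxyClass(c) call; JIT warm-up makes this indicative only.
    static double nsPerOp(Class<?> c, int iterations) {
        int hits = 0;
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            if (Proxy.isProxyClass(c)) {
                hits++;
            }
        }
        long elapsed = System.nanoTime() - start;
        lastHits = hits;
        return (double) elapsed / iterations;
    }
}
```

Passing proxyClass() approximates the "True" case, and passing an ordinary class such as String.class approximates the "False" case.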

As expected, Proxy.getProxyClass() is a little slower with TwoLevelWeakCache
than with FlattenedWeakCache, but still much faster than the original Proxy
implementation. The extra lookup in a ConcurrentHashMap and the extra
indirection have a price, but we also get something in return: space.
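To make the "extra lookup and extra indirection" point concrete, here is a minimal sketch of a two-level lookup path. It is hypothetical and simplified, not the webrev's TwoLevelWeakCache: the first-level keys (standing in for ClassLoaders) are held strongly, cleared entries are never expunged, and under contention the factory may run more than once, which a real proxy cache must prevent.

```java
import java.lang.ref.WeakReference;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.BiFunction;

// Hypothetical, simplified two-level cache; NOT the webrev's TwoLevelWeakCache.
// 1st-level keys are held strongly here, cleared entries are never expunged,
// and under contention the factory may run more than once.
final class TwoLevelCacheSketch<K, P, V> {
    private final ConcurrentMap<K, ConcurrentMap<P, WeakReference<V>>> map =
        new ConcurrentHashMap<>();
    private final BiFunction<K, P, V> factory;

    TwoLevelCacheSketch(BiFunction<K, P, V> factory) {
        this.factory = factory;
    }

    V get(K key, P param) {
        // 1st lookup: the per-key 2nd-level map, allocated on first use
        ConcurrentMap<P, WeakReference<V>> subMap =
            map.computeIfAbsent(key, k -> new ConcurrentHashMap<>());
        while (true) {
            // 2nd lookup, then the extra indirection through the WeakReference
            WeakReference<V> ref = subMap.get(param);
            V value = (ref == null) ? null : ref.get();
            if (value != null) {
                return value;
            }
            V newValue = factory.apply(key, param);
            WeakReference<V> newRef = new WeakReference<>(newValue);
            boolean installed = (ref == null)
                ? subMap.putIfAbsent(param, newRef) == null
                : subMap.replace(param, ref, newRef); // swap out a cleared reference
            if (installed) {
                return newValue;
            }
            // another thread won the race; retry and pick up its value
        }
    }
}
```

The first-level map is what creates the per-ClassLoader "step" in the space figures, while the flattened variant would key a single map by a composite (loader, interfaces) key instead.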

All of this was measured on the latest lambda build (with the new
segment-less ConcurrentHashMap). I also added a second ClassLoader to see
what happens when it is added to the cache:

# Original Proxy, 32 bit addressing

class     proxy     size of   delta to
loaders   classes   caches    prev.ln.
--------  --------  --------  --------
        0         0       400       400
        1         1       768       368
        1         2       920       152
        1         3      1072       152
        1         4      1224       152
        1         5      1376       152
        1         6      1528       152
        1         7      1680       152
        1         8      1832       152
        1         9      1984       152
        1        10      2136       152
        2        11      2456       320
        2        12      2672       216
        2        13      2824       152
        2        14      2976       152
        2        15      3128       152
        2        16      3280       152
        2        17      3432       152
        2        18      3584       152
        2        19      3736       152
        2        20      3888       152

# Original Proxy, 64 bit addressing

class     proxy     size of   delta to
loaders   classes   caches    prev.ln.
--------  --------  --------  --------
        0         0       632       632
        1         1      1216       584
        1         2      1448       232
        1         3      1680       232
        1         4      1912       232
        1         5      2144       232
        1         6      2376       232
        1         7      2608       232
        1         8      2840       232
        1         9      3072       232
        1        10      3304       232
        2        11      3832       528
        2        12      4192       360
        2        13      4424       232
        2        14      4656       232
        2        15      4888       232
        2        16      5120       232
        2        17      5352       232
        2        18      5584       232
        2        19      5816       232
        2        20      6048       232

# Patched Proxy (FlattenedWeakCache), 32 bit addressing

class     proxy     size of   delta to
loaders   classes   caches    prev.ln.
--------  --------  --------  --------
        0         0       240       240
        1         1       584       344
        1         2       768       184
        1         3       952       184
        1         4      1136       184
        1         5      1320       184
        1         6      1504       184
        1         7      1688       184
        1         8      1872       184
        1         9      2056       184
        1        10      2240       184
        2        11      2424       184
        2        12      2736       312
        2        13      2920       184
        2        14      3104       184
        2        15      3288       184
        2        16      3472       184
        2        17      3656       184
        2        18      3840       184
        2        19      4024       184
        2        20      4208       184

# Patched Proxy (FlattenedWeakCache), 64 bit addressing

class     proxy     size of   delta to
loaders   classes   caches    prev.ln.
--------  --------  --------  --------
        0         0       336       336
        1         1       920       584
        1         2      1200       280
        1         3      1480       280
        1         4      1760       280
        1         5      2040       280
        1         6      2320       280
        1         7      2600       280
        1         8      2880       280
        1         9      3160       280
        1        10      3440       280
        2        11      3720       280
        2        12      4256       536
        2        13      4536       280
        2        14      4816       280
        2        15      5096       280
        2        16      5376       280
        2        17      5656       280
        2        18      5936       280
        2        19      6216       280
        2        20      6496       280

# Patched Proxy (TwoLevelWeakCache), 32 bit addressing

class     proxy     size of   delta to
loaders   classes   caches    prev.ln.
--------  --------  --------  --------
        0         0       240       240
        1         1       752       512
        1         2       896       144
        1         3      1040       144
        1         4      1184       144
        1         5      1328       144
        1         6      1472       144
        1         7      1616       144
        1         8      1760       144
        1         9      1904       144
        1        10      2048       144
        2        11      2400       352
        2        12      2608       208
        2        13      2752       144
        2        14      2896       144
        2        15      3040       144
        2        16      3184       144
        2        17      3328       144
        2        18      3472       144
        2        19      3616       144
        2        20      3760       144

# Patched Proxy (TwoLevelWeakCache), 64 bit addressing

class     proxy     size of   delta to
loaders   classes   caches    prev.ln.
--------  --------  --------  --------
        0         0       336       336
        1         1      1216       880
        1         2      1440       224
        1         3      1664       224
        1         4      1888       224
        1         5      2112       224
        1         6      2336       224
        1         7      2560       224
        1         8      2784       224
        1         9      3008       224
        1        10      3232       224
        2        11      3808       576
        2        12      4160       352
        2        13      4384       224
        2        14      4608       224
        2        15      4832       224
        2        16      5056       224
        2        17      5280       224
        2        18      5504       224
        2        19      5728       224
        2        20      5952       224

So compared to the original code, FlattenedWeakCache costs approx. 32 bytes
(32-bit addresses) or 48 bytes (64-bit addresses) more per cached proxy
class, while TwoLevelWeakCache saves 8 bytes per cached proxy class (with
both 32-bit and 64-bit addresses). So which should we favour, space or time?
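These per-class figures are simply the steady-state "delta to prev.ln." values from the tables subtracted from each other. A trivial check of that arithmetic (class and constant names are made up for illustration):

```java
// Steady-state per-proxy-class deltas (bytes), read off the tables above.
final class PerClassCost {
    static final int ORIGINAL_32 = 152, ORIGINAL_64 = 232;
    static final int FLATTENED_32 = 184, FLATTENED_64 = 280;
    static final int TWO_LEVEL_32 = 144, TWO_LEVEL_64 = 224;

    // Positive result = more bytes per cached proxy class than the original Proxy.
    static int overOriginal(int patched, int original) {
        return patched - original;
    }
}
```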

Other comments in-line...

On 04/17/2013 07:31 AM, Mandy Chung wrote:
> On 4/16/2013 7:18 AM, Peter Levart wrote:
>> Hi Mandy,
>>
>> I prepared a preview variant of j.l.r.Proxy using WeakCache (turned
>> into an interface and a special FlattenedWeakCache implementation, in
>> anticipation of creating another variant using two levels of
>> ConcurrentHashMaps for backing storage, but with the same API) just to
>> compare performance:
>>
>> https://dl.dropboxusercontent.com/u/101777488/jdk8-tl/proxy-wc/webrev.01/index.html
>>
>
>
> thanks for getting this prototype done quickly.
>
>> As the values (Class objects of proxy classes) must be wrapped in a
>> WeakReference, the same WeakReference instance can be re-used as a
>> key in another ConcurrentHashMap to implement a quick look-up for the
>> Proxy.isProxyClass() method, eliminating the need for ClassValue,
>> which is quite space-hungry.
>>
>
> I also think maintaining another ConcurrentHashMap is a good
> replacement for ClassValue, avoiding its memory overhead.
>
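For reference, a minimal sketch of the trick quoted above, assuming a WeakReference subclass whose equals/hashCode are based on the referent's identity. Such a key can sit in a ConcurrentHashMap, so the same instance that wraps a proxy class in the cache can also be registered in a reverse map consulted by an isProxyClass-style check. The class names are hypothetical, and cleared keys are never expunged here (a real implementation would poll a ReferenceQueue).

```java
import java.lang.ref.WeakReference;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical: a WeakReference usable as a hash key, compared by referent identity.
final class IdentityWeakKey<T> extends WeakReference<T> {
    private final int hash; // captured up front, so it survives clearing

    IdentityWeakKey(T referent) {
        super(referent);
        this.hash = System.identityHashCode(referent);
    }

    @Override
    public int hashCode() {
        return hash;
    }

    @Override
    public boolean equals(Object o) {
        if (o == this) {
            return true; // a cleared key can still find (and remove) itself
        }
        if (!(o instanceof IdentityWeakKey)) {
            return false;
        }
        Object mine = get();
        return mine != null && mine == ((IdentityWeakKey<?>) o).get();
    }
}

// Hypothetical reverse map; stale entries are NOT removed in this sketch.
final class ReverseMapSketch {
    private final ConcurrentMap<IdentityWeakKey<Class<?>>, Boolean> reverseMap =
        new ConcurrentHashMap<>();

    // The returned key could also serve as the value wrapper in the forward cache.
    IdentityWeakKey<Class<?>> register(Class<?> proxyClass) {
        IdentityWeakKey<Class<?>> key = new IdentityWeakKey<>(proxyClass);
        reverseMap.put(key, Boolean.TRUE);
        return key;
    }

    boolean isRegistered(Class<?> c) {
        // A temporary probe key finds the entry because equality is by referent identity.
        return reverseMap.containsKey(new IdentityWeakKey<>(c));
    }
}
```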
>> Comparing the performance, here's a summary of all 3 variants 
>> (original, patched using a field in ClassLoader and this variant):
>>
>> [...]
>>
>> The improvement is still quite satisfactory, although a little slower 
>> than the direct-field variant. The scalability is the same as with 
>> direct-field variant.
>>
>
> Agree - the improvement is quite good.
>
>> Space consumption of the cache structure, calculated as the deep size of
>> the structure (ignoring interned Strings, Class and ClassLoader objects),
>> using a single non-bootstrap ClassLoader for defining the proxy
>> classes and 32-bit addressing, is the following:
>>
>> [...]
>>
>> So with new ConcurrentHashMap the patched Proxy uses about 32 bytes 
>> more per proxy class.
>>
>> Is this satisfactory, or should we also try a variant with two levels
>> of ConcurrentHashMaps?
>>
>
> The overhead seems an acceptable trade-off for the scalability.
>
> Since you have prepared for doing another variant, it'd be good to
> compare the two prototypes if it isn't too much work :)  I would
> imagine that there might be a slight difference in your measurements
> when comparing with proxies defined by a single class loader, but the
> code might be simpler (or might not be, if you keep the same API with a
> different implementation).

With TwoLevelWeakCache, there is a "step" of 108 bytes (32-bit addresses)
when a new ClassLoader is encountered (a new 2nd-level ConcurrentHashMap is
allocated and a new entry is added to the 1st-level CHM). There's no such
"step" in FlattenedWeakCache (modulo the steps when the CHMs themselves are
resized). So we roughly waste 108 bytes for each new ClassLoader
encountered with TwoLevelWeakCache vs. FlattenedWeakCache, but we also
save 40 bytes for each proxy class cached. TwoLevelWeakCache starts to pay
off if there are, on average, at least 3 proxy classes defined per
ClassLoader.
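The break-even point follows from those two 32-bit figures: the 108-byte per-ClassLoader step is recovered once 40 bytes have been saved on enough proxy classes, i.e. at the smallest n with 40 * n >= 108. A trivial sketch (names are hypothetical):

```java
// Break-even of TwoLevelWeakCache vs. FlattenedWeakCache, 32-bit figures from above.
final class BreakEven {
    static final int STEP_PER_LOADER = 108; // bytes, extra per new ClassLoader
    static final int SAVED_PER_CLASS = 40;  // bytes, 184 - 144 per cached proxy class

    // Smallest n such that n * savedPerClass >= stepPerLoader (ceiling division).
    static int minClassesPerLoader(int stepPerLoader, int savedPerClass) {
        return (stepPerLoader + savedPerClass - 1) / savedPerClass;
    }
}
```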

>
> Regardless of which approach we use: you have added a general purpose
> WeakCache and the implementation class in the sun.misc package.  While
> it's good to have such a class for other JDK classes to use, I am more
> comfortable keeping it as a private class for the proxy implementation
> to use.  We need existing applications to migrate away from sun.misc
> and other private APIs to prepare for modularization.

What about making it package-private in java.lang.reflect? Keeping it in a
separate class makes Proxy itself much easier to read. When we decide which
way to go, I can remove the interface and leave only a single
package-private class...

>
> Nits: can you wrap the lines at around 80 columns, including comments?
> try-catch-finally statements need some formatting fixes.  Our
> convention is to put 'catch' or 'finally' on the same line as the
> closing bracket '}'.  Your editor breaks 'catch' or 'finally'
> onto the next line.

Fixed.
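For illustration only, the convention described above looks like this in a small, hypothetical helper (the method is made up; only the brace placement matters):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

final class TryCatchStyle {
    // Returns the first line of 'text' (null if empty); 'catch' and 'finally'
    // share the line with the preceding '}'.
    static String firstLine(String text) {
        BufferedReader reader = new BufferedReader(new StringReader(text));
        try {
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            try {
                reader.close();
            } catch (IOException ignored) {
                // closing a reader over a String is not expected to fail
            }
        }
    }
}
```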

Regards, Peter

>
>>
>> Even without a SecurityManager installed, the native getClassLoader0
>> was a performance hog. I don't know why. Isn't there an implicit
>> reference to the defining ClassLoader from every Class object?
>
> That's right; it looks for the caller class only if the security
> manager is installed.   The defining class loader is kept in the VM's
> Klass object (language-level Class instance representation in the VM)
> and there is no computation needed to obtain the defining class loader
> of a given Class object.  I can only think of the Java <-> native
> transition overhead as one possible factor.  Class.getClassLoader0
> is not intrinsified.  I'll find out (others on this mailing list may
> know).
>
> Mandy
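As a side note on the defining loader discussed above, the public Class.getClassLoader() simply reports it: null for classes defined by the bootstrap loader, and for a proxy class the loader passed to Proxy.newProxyInstance. A small sketch (hypothetical class name; it does not address the timing question itself):

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;

final class DefiningLoaderDemo {
    // Defines (or reuses) a proxy class for Runnable in 'loader'
    // and returns that proxy class's defining loader.
    static ClassLoader proxyClassLoader(ClassLoader loader) {
        InvocationHandler h = (proxy, method, args) -> null;
        Object p = Proxy.newProxyInstance(loader, new Class<?>[] { Runnable.class }, h);
        return p.getClass().getClassLoader();
    }

    // Classes defined by the bootstrap loader report a null loader.
    static boolean isBootstrapClass(Class<?> c) {
        return c.getClassLoader() == null;
    }
}
```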