Proxy.isProxyClass scalability
Peter Levart
peter.levart at gmail.com
Wed Apr 17 14:18:50 UTC 2013
Hi Mandy,
Here's the updated webrev:
https://dl.dropboxusercontent.com/u/101777488/jdk8-tl/proxy-wc/webrev.02/index.html
This adds TwoLevelWeakCache to the mix. Here is how it performs
compared to the other alternatives:
Summary (4 Cores x 2 Threads i7 CPU), all times in ns/op:

Test                     Threads    Original  Patch(CL field)  FlattenedWeakCache  TwoLevelWeakCache
=======================  =======  ==========  ===============  ==================  =================
Proxy_getProxyClass            1    2,403.27           163.70              206.88             252.89
                               4    3,039.01           202.77              303.38             327.62
                               8    5,193.58           314.47              442.58             510.63
Proxy_isProxyClassTrue         1       95.02            10.78               41.85              42.03
                               4    2,266.29            10.80               42.32              42.07
                               8    4,782.29            20.53               72.29              69.25
Proxy_isProxyClassFalse        1       95.02             1.36                1.36               1.36
                               4    2,186.59             1.36                1.37               1.40
                               8    4,891.15             2.72                2.94               2.72
Annotation_equals              1      240.10           152.29              193.27             200.45
                               4    1,864.06           153.81              195.60             202.45
                               8    8,639.20           262.09              384.72             338.70
As expected, Proxy.getProxyClass() is a little slower again than with
FlattenedWeakCache, but still much faster than the original Proxy
implementation. Another lookup in the ConcurrentHashMap and another
indirection have a price, but we also get something in return: space.
All of this was measured on the latest lambda build (with the new
segment-less ConcurrentHashMap). I also added another ClassLoader to see
what happens when the 2nd one is added to the cache:
# Original Proxy, 32 bit addressing
class proxy size of delta to
loaders classes caches prev.ln.
-------- -------- -------- --------
0 0 400 400
1 1 768 368
1 2 920 152
1 3 1072 152
1 4 1224 152
1 5 1376 152
1 6 1528 152
1 7 1680 152
1 8 1832 152
1 9 1984 152
1 10 2136 152
2 11 2456 320
2 12 2672 216
2 13 2824 152
2 14 2976 152
2 15 3128 152
2 16 3280 152
2 17 3432 152
2 18 3584 152
2 19 3736 152
2 20 3888 152
# Original Proxy, 64 bit addressing
class proxy size of delta to
loaders classes caches prev.ln.
-------- -------- -------- --------
0 0 632 632
1 1 1216 584
1 2 1448 232
1 3 1680 232
1 4 1912 232
1 5 2144 232
1 6 2376 232
1 7 2608 232
1 8 2840 232
1 9 3072 232
1 10 3304 232
2 11 3832 528
2 12 4192 360
2 13 4424 232
2 14 4656 232
2 15 4888 232
2 16 5120 232
2 17 5352 232
2 18 5584 232
2 19 5816 232
2 20 6048 232
# Patched Proxy (FlattenedWeakCache), 32 bit addressing
class proxy size of delta to
loaders classes caches prev.ln.
-------- -------- -------- --------
0 0 240 240
1 1 584 344
1 2 768 184
1 3 952 184
1 4 1136 184
1 5 1320 184
1 6 1504 184
1 7 1688 184
1 8 1872 184
1 9 2056 184
1 10 2240 184
2 11 2424 184
2 12 2736 312
2 13 2920 184
2 14 3104 184
2 15 3288 184
2 16 3472 184
2 17 3656 184
2 18 3840 184
2 19 4024 184
2 20 4208 184
# Patched Proxy (FlattenedWeakCache), 64 bit addressing
class proxy size of delta to
loaders classes caches prev.ln.
-------- -------- -------- --------
0 0 336 336
1 1 920 584
1 2 1200 280
1 3 1480 280
1 4 1760 280
1 5 2040 280
1 6 2320 280
1 7 2600 280
1 8 2880 280
1 9 3160 280
1 10 3440 280
2 11 3720 280
2 12 4256 536
2 13 4536 280
2 14 4816 280
2 15 5096 280
2 16 5376 280
2 17 5656 280
2 18 5936 280
2 19 6216 280
2 20 6496 280
# Patched Proxy (TwoLevelWeakCache), 32 bit addressing
class proxy size of delta to
loaders classes caches prev.ln.
-------- -------- -------- --------
0 0 240 240
1 1 752 512
1 2 896 144
1 3 1040 144
1 4 1184 144
1 5 1328 144
1 6 1472 144
1 7 1616 144
1 8 1760 144
1 9 1904 144
1 10 2048 144
2 11 2400 352
2 12 2608 208
2 13 2752 144
2 14 2896 144
2 15 3040 144
2 16 3184 144
2 17 3328 144
2 18 3472 144
2 19 3616 144
2 20 3760 144
# Patched Proxy (TwoLevelWeakCache), 64 bit addressing
class proxy size of delta to
loaders classes caches prev.ln.
-------- -------- -------- --------
0 0 336 336
1 1 1216 880
1 2 1440 224
1 3 1664 224
1 4 1888 224
1 5 2112 224
1 6 2336 224
1 7 2560 224
1 8 2784 224
1 9 3008 224
1 10 3232 224
2 11 3808 576
2 12 4160 352
2 13 4384 224
2 14 4608 224
2 15 4832 224
2 16 5056 224
2 17 5280 224
2 18 5504 224
2 19 5728 224
2 20 5952 224
So we lose approx. 32 bytes (32 bit addresses) or 48 bytes (64 bit
addresses) for each proxy class compared to the original code when using
FlattenedWeakCache, but we gain 8 bytes (32 bit or 64 bit addresses) for
each proxy class cached compared to the original code when using
TwoLevelWeakCache. So which to favour, space or time?
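
To make the trade-off concrete, here is a rough sketch of the two
layouts being compared. It is not the code from the webrev: the names
CacheLayouts, FlatKey, getFlat and getTwoLevel are made up for
illustration, a plain Object stands in for the ClassLoader, strings
stand in for interfaces and proxy classes, and all weak-reference
handling is left out.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Supplier;

public class CacheLayouts {

    /** Hypothetical composite key for the flattened layout: one per entry. */
    static final class FlatKey {
        final Object loader;            // stand-in for a ClassLoader
        final List<String> interfaces;  // stand-in for the interface list

        FlatKey(Object loader, List<String> interfaces) {
            this.loader = loader;
            this.interfaces = interfaces;
        }

        @Override public boolean equals(Object o) {
            if (!(o instanceof FlatKey)) return false;
            FlatKey k = (FlatKey) o;
            // loaders are compared by identity, interface lists by content
            return loader == k.loader && interfaces.equals(k.interfaces);
        }

        @Override public int hashCode() {
            return 31 * System.identityHashCode(loader) + interfaces.hashCode();
        }
    }

    // Flattened: a single map, so a single lookup per call.
    static final ConcurrentMap<FlatKey, String> FLAT = new ConcurrentHashMap<>();

    // Two-level: first map keyed by loader, second map keyed by interfaces.
    static final ConcurrentMap<Object, ConcurrentMap<List<String>, String>> TWO_LEVEL =
            new ConcurrentHashMap<>();

    static String getFlat(Object loader, List<String> ifaces, Supplier<String> factory) {
        return FLAT.computeIfAbsent(new FlatKey(loader, ifaces), k -> factory.get());
    }

    static String getTwoLevel(Object loader, List<String> ifaces, Supplier<String> factory) {
        return TWO_LEVEL
                // the inner map is allocated once per loader
                .computeIfAbsent(loader, l -> new ConcurrentHashMap<>())
                // the extra lookup and indirection happen here
                .computeIfAbsent(ifaces, k -> factory.get());
    }

    public static void main(String[] args) {
        Object loader = new Object();
        List<String> ifaces = Arrays.asList("java.lang.Runnable");
        System.out.println(getFlat(loader, ifaces, () -> "$Proxy0"));
        System.out.println(getTwoLevel(loader, ifaces, () -> "$Proxy0"));
    }
}
```

In this sketch the flattened variant needs a composite key object for
every entry, while the two-level variant pays once per loader for the
inner map and then stores only the sub-key.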
Other comments in-line...
On 04/17/2013 07:31 AM, Mandy Chung wrote:
> On 4/16/2013 7:18 AM, Peter Levart wrote:
>> Hi Mandy,
>>
>> I prepared a preview variant of j.l.r.Proxy using WeakCache (turned
>> into an interface and a special FlattenedWeakCache implementation in
>> anticipation to create another variant using two-levels of
>> ConcurrentHashMaps for backing storage, but with same API) just to
>> compare performance:
>>
>> https://dl.dropboxusercontent.com/u/101777488/jdk8-tl/proxy-wc/webrev.01/index.html
>>
>
>
> thanks for getting this prototype done quickly.
>
>> As the values (Class objects of proxy classes) must be wrapped in a
>> WeakReference, the same instance of WeakReference can be re-used as a
>> key in another ConcurrentHashMap to implement quick look-up for
>> Proxy.isProxyClass() method eliminating the need to use ClassValue,
>> which is quite space-hungry.
>>
>
> I also think maintaining another ConcurrentHashMap is a good
> replacement with the use of ClassValue to avoid its memory overhead.
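
The idea of re-using the WeakReference as a key can be sketched like
this. It is only an illustration under a few assumptions, not the
webrev code: ReverseLookup, WeakValue, put and isCached are
hypothetical names, the cache key is a plain String, and expunging of
cleared references (e.g. via a ReferenceQueue) is left out.

```java
import java.lang.ref.WeakReference;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class ReverseLookup {

    /** Weak wrapper whose equality is the identity of its referent. */
    static final class WeakValue<T> extends WeakReference<T> {
        private final int hash;

        WeakValue(T referent) {
            super(referent);
            // cached so the hash stays stable after the referent is cleared
            this.hash = System.identityHashCode(referent);
        }

        @Override public int hashCode() {
            return hash;
        }

        @Override public boolean equals(Object o) {
            if (o == this) return true;
            if (!(o instanceof WeakValue)) return false;
            Object mine = get();
            Object other = ((WeakValue<?>) o).get();
            return mine != null && mine == other;
        }
    }

    // Forward map: cache key -> weakly held value.
    static final ConcurrentMap<String, WeakValue<Class<?>>> BY_KEY =
            new ConcurrentHashMap<>();

    // Reverse map: the very same WeakValue instance, re-used as a key.
    static final ConcurrentMap<WeakValue<Class<?>>, Boolean> KNOWN =
            new ConcurrentHashMap<>();

    static void put(String key, Class<?> value) {
        WeakValue<Class<?>> ref = new WeakValue<Class<?>>(value);
        BY_KEY.put(key, ref);
        KNOWN.put(ref, Boolean.TRUE);
    }

    /** Plays the role of isProxyClass(): is this class a cached value? */
    static boolean isCached(Class<?> candidate) {
        // a short-lived wrapper serves as the lookup key
        return KNOWN.containsKey(new WeakValue<Class<?>>(candidate));
    }
}
```

With this shape, ReverseLookup.put("k", String.class) makes
ReverseLookup.isCached(String.class) answer true with a single hash
lookup, while a class that was never stored answers false.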
>
>> Comparing the performance, here's a summary of all 3 variants
>> (original, patched using a field in ClassLoader and this variant):
>>
>> [...]
>>
>> The improvement is still quite satisfactory, although a little slower
>> than the direct-field variant. The scalability is the same as with
>> direct-field variant.
>>
>
> Agree - the improvement is quite good.
>
>> Space consumption of cache structure, calculated as deep-size of the
>> structure, ignoring interned Strings, Class and ClassLoader objects
>> using a single non-bootstrap ClassLoader for defining the proxy
>> classes and using 32 bit addressing is the following:
>>
>> [...]
>>
>> So with new ConcurrentHashMap the patched Proxy uses about 32 bytes
>> more per proxy class.
>>
>> Is this satisfactory or should we also try a variant with two-levels
>> of ConcurrentHashMaps?
>>
>
> The overhead seems okay to trade off the scalability.
>
> Since you have prepared for doing another variant, it'd be good to
> compare two prototypes if this doesn't bring too much work :) I would
> imagine that there might be slight difference in your measurement when
> comparing with proxies defined by a single class loader but the code
> might be simpler (might not be if you keep the same API but different
> implementation).
With TwoLevelWeakCache, there is a "step" of 108 bytes (32 bit addresses)
when a new ClassLoader is encountered (a new 2nd-level ConcurrentHashMap
is allocated and a new entry is added to the 1st-level CHM). There's no
such "step" in FlattenedWeakCache (modulo the steps when the CHMs
themselves are resized). So we roughly have 108 bytes wasted for each new
ClassLoader encountered with TwoLevelWeakCache vs. FlattenedWeakCache,
but we also have 40 bytes spared for each proxy class cached.
TwoLevelWeakCache starts to pay off if there are at least 3 proxy classes
defined per ClassLoader on average.
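
The break-even point follows from the two figures above (108 bytes per
new ClassLoader, 40 bytes saved per proxy class, 32 bit addresses). A
small sketch of the arithmetic, with BreakEven, breakEvenClasses and
netSaving being names made up for this example:

```java
public class BreakEven {

    /** Smallest number of classes per loader whose savings cover the per-loader cost. */
    static int breakEvenClasses(int perLoaderCost, int perClassSaving) {
        // ceiling division for positive ints
        return (perLoaderCost + perClassSaving - 1) / perClassSaving;
    }

    /** Net bytes saved per loader by the two-level layout for n classes. */
    static int netSaving(int n, int perLoaderCost, int perClassSaving) {
        return n * perClassSaving - perLoaderCost;
    }

    public static void main(String[] args) {
        System.out.println(breakEvenClasses(108, 40)); // 3
        System.out.println(netSaving(2, 108, 40));     // -28
        System.out.println(netSaving(3, 108, 40));     // 12
    }
}
```

With 2 proxy classes per loader the two-level layout is still 28 bytes
behind; with 3 it is 12 bytes ahead.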
>
> Regardless of which approach to use - you have added a general purpose
> WeakCache and the implementation class in the sun.misc package. While
> it's good to have such class for other jdk class to use, I am more
> comfortable in keeping it as a private class for proxy implementation
> to use. We need existing applications to migrate away from sun.misc
> and other private APIs to prepare for modularization.
What about package-private in java.lang.reflect? It makes Proxy itself
much easier to read. When we decide which way to go, I can remove the
interface and only leave a single package-private class...
>
> Nits: can you wrap the lines around 80 columns including comments?
> try-catch-finally statements need some formatting fixes. Our
> convention is to have 'catch', or 'finally' following the closing
> bracket '}' in the same line. Your editor breaks 'catch' or 'finally'
> into the next line.
Fixed.
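
For reference, the layout asked for above looks like the following in a
small stand-alone example (CatchStyle and parseOrDefault are made-up
names, not code from the patch):

```java
public class CatchStyle {

    /** Parses s as an int, returning fallback if s is not a valid number. */
    static int parseOrDefault(String s, int fallback) {
        try {
            return Integer.parseInt(s);
        } catch (NumberFormatException e) {   // 'catch' on the same line as '}'
            return fallback;
        } finally {                           // 'finally' on the same line as '}'
            // cleanup would go here
        }
    }

    public static void main(String[] args) {
        System.out.println(parseOrDefault("42", 0));
        System.out.println(parseOrDefault("not a number", 7));
    }
}
```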
Regards, Peter
>
>>
>> Even without SecurityManager installed the performance of native
>> getClassLoader0 was a hog. I don't know why? Isn't there an implicit
>> reference to defining ClassLoader from every Class object?
>
> That's right - it looks for the caller class only if the security
> manager is installed. The defining class loader is kept in the VM's
> Klass object (language-level Class instance representation in the VM)
> and there is no computation needed to obtain a defining class loader
> of a given Class object. I can only think of the Java <-> native
> transition overhead that could be one factor. Class.getClassLoader0
> is not intrinsified. I'll find out (others on this mailing list may
> probably know).
>
> Mandy