Proxy.isProxyClass scalability

Tue Apr 16 14:18:27 UTC 2013

Hi Mandy,

I prepared a preview variant of j.l.r.Proxy using WeakCache just to 
compare performance. WeakCache is now an interface with a special 
FlattenedWeakCache implementation, in anticipation of another variant 
that uses two levels of ConcurrentHashMaps for backing storage behind 
the same API:

https://dl.dropboxusercontent.com/u/101777488/jdk8-tl/proxy-wc/webrev.01/index.html

As the values (Class objects of proxy classes) must be wrapped in a 
WeakReference anyway, the same WeakReference instance can be re-used as 
a key in another ConcurrentHashMap to implement a quick look-up for the 
Proxy.isProxyClass() method, eliminating the need to use ClassValue, 
which is quite space-hungry.
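
To illustrate the idea, here is a rough sketch (not the code in the 
webrev). The names ReverseLookupSketch, WeakValue and Probe are made up 
for this example, and expunging of cleared references is left out:

```java
import java.lang.ref.WeakReference;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical sketch: names and structure are illustrative only.
final class ReverseLookupSketch {

    // Weak wrapper that a cache would store as its value.
    // It hashes and compares by referent identity.
    static final class WeakValue extends WeakReference<Class<?>> {
        private final int hash;

        WeakValue(Class<?> referent) {
            super(referent);
            this.hash = System.identityHashCode(referent);
        }

        @Override public int hashCode() { return hash; }

        @Override public boolean equals(Object other) {
            if (other == this) return true;
            Class<?> mine = get();
            if (mine == null) return false; // cleared: equal only to itself
            if (other instanceof WeakValue) return mine == ((WeakValue) other).get();
            if (other instanceof Probe) return mine == ((Probe) other).referent;
            return false;
        }
    }

    // Short-lived strong key used only for look-ups, so that a query
    // does not have to allocate a Reference object.
    static final class Probe {
        final Class<?> referent;

        Probe(Class<?> referent) { this.referent = referent; }

        @Override public int hashCode() { return System.identityHashCode(referent); }

        @Override public boolean equals(Object other) {
            if (other instanceof WeakValue) return referent == ((WeakValue) other).get();
            if (other instanceof Probe) return referent == ((Probe) other).referent;
            return false;
        }
    }

    // Reverse map: the WeakValue instances double as keys here.
    private final ConcurrentMap<Object, Boolean> reverse = new ConcurrentHashMap<>();

    // Called when a class is put into the (not shown) forward cache.
    WeakValue register(Class<?> c) {
        WeakValue value = new WeakValue(c);
        reverse.put(value, Boolean.TRUE);
        return value;
    }

    // The quick look-up an isProxyClass()-style method could delegate to.
    boolean contains(Class<?> c) {
        return c != null && reverse.containsKey(new Probe(c));
    }

    public static void main(String[] args) {
        ReverseLookupSketch registry = new ReverseLookupSketch();
        registry.register(Runnable.class);
        System.out.println(registry.contains(Runnable.class)); // true
        System.out.println(registry.contains(String.class));   // false
    }
}
```

A real implementation would also need a ReferenceQueue to remove 
entries whose referent has been cleared.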

Comparing the performance, here's a summary of all 3 variants (original, 
patched using a field in ClassLoader and this variant):

Summary (4 Cores x 2 Threads i7 CPU):

Test                     Threads  ns/op Original  Patched (CL field)  Patched (WeakCache)
=======================  =======  ==============  ==================  ===================
Proxy_getProxyClass            1        2,403.27              163.70               206.88
                               4        3,039.01              202.77               303.38
                               8        5,193.58              314.47               442.58

Proxy_isProxyClassTrue         1           95.02               10.78                41.85
                               4        2,266.29               10.80                42.32
                               8        4,782.29               20.53                72.29

Proxy_isProxyClassFalse        1           95.02                1.36                 1.36
                               4        2,186.59                1.36                 1.37
                               8        4,891.15                2.72                 2.94

Annotation_equals              1          240.10              152.29               193.27
                               4        1,864.06              153.81               195.60
                               8        8,639.20              262.09               384.72

The improvement is still quite satisfactory, although a little slower 
than the direct-field variant. The scalability is the same as with 
direct-field variant.

Space consumption of the cache structure is shown below. It is 
calculated as the deep size of the structure (ignoring interned 
Strings, Class and ClassLoader objects), using a single non-bootstrap 
ClassLoader to define the proxy classes and 32-bit addressing:

original Proxy code:

proxy     size of   delta to
classes   caches    prev.ln.
--------  --------  --------
        0       400       400
        1       768       368
        2       920       152
        3      1072       152
        4      1224       152
        5      1376       152
        6      1528       152
        7      1680       152
        8      1832       152
        9      1984       152
       10      2136       152

Proxy patched with the variant using FlattenedWeakCache, run on current 
JDK8/tl tip (still uses old ConcurrentHashMap implementation with segments):

proxy     size of   delta to
classes   caches    prev.ln.
--------  --------  --------
        0       560       560
        1       936       376
        2      1312       376
        3      1688       376
        4      2064       376
        5      2352       288
        6      2728       376
        7      3016       288
        8      3392       376
        9      3592       200
       10      3872       280

...and the same with current JDK8/lambda tip (using new segment-less 
ConcurrentHashMap):

proxy     size of   delta to
classes   caches    prev.ln.
--------  --------  --------
        0       240       240
        1       584       344
        2       768       184
        3       952       184
        4      1136       184
        5      1320       184
        6      1504       184
        7      1688       184
        8      1872       184
        9      2056       184
       10      2240       184

So with the new ConcurrentHashMap the patched Proxy uses about 32 bytes 
more per proxy class than the original code (184 vs. 152 bytes for each 
additional class).

Is this satisfactory or should we also try a variant with two levels of 
ConcurrentHashMaps?
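
A minimal sketch of what such a two-level variant might look like 
follows. It is only an illustration under several assumptions: the 
names TwoLevelWeakCacheSketch and WeakKey are hypothetical, generic 
type parameters stand in for ClassLoader, the interface names and 
Class, a null (bootstrap) loader is not supported, and the factory may 
run more than once under a race, which real proxy class definition 
could not tolerate:

```java
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Supplier;

// Hypothetical sketch: K ~ ClassLoader, N ~ interface names, V ~ Class.
final class TwoLevelWeakCacheSketch<K, N, V> {

    // Weak first-level key that hashes and compares by referent identity.
    private static final class WeakKey<T> extends WeakReference<T> {
        private final int hash;

        WeakKey(T referent, ReferenceQueue<? super T> queue) {
            super(referent, queue);
            this.hash = System.identityHashCode(referent);
        }

        @Override public int hashCode() { return hash; }

        @Override public boolean equals(Object other) {
            if (other == this) return true;
            if (!(other instanceof WeakKey)) return false;
            Object mine = get();
            return mine != null && mine == ((WeakKey<?>) other).get();
        }
    }

    private final ReferenceQueue<K> staleKeys = new ReferenceQueue<>();
    private final ConcurrentMap<WeakKey<K>, ConcurrentMap<N, WeakReference<V>>> outer =
            new ConcurrentHashMap<>();

    V get(K loader, N names, Supplier<V> factory) {
        Objects.requireNonNull(loader, "a null loader would need a sentinel key");
        expungeStaleKeys();

        // First look-up: per-loader map. The probe key is not registered with the queue.
        ConcurrentMap<N, WeakReference<V>> inner = outer.get(new WeakKey<>(loader, null));
        if (inner == null) {
            ConcurrentMap<N, WeakReference<V>> fresh = new ConcurrentHashMap<>();
            inner = outer.putIfAbsent(new WeakKey<>(loader, staleKeys), fresh);
            if (inner == null) inner = fresh;
        }

        // Second look-up: value for this set of names; retry if another thread wins.
        while (true) {
            WeakReference<V> ref = inner.get(names);
            V value = (ref == null) ? null : ref.get();
            if (value != null) return value;

            V created = factory.get();
            WeakReference<V> createdRef = new WeakReference<>(created);
            boolean installed = (ref == null)
                    ? inner.putIfAbsent(names, createdRef) == null
                    : inner.replace(names, ref, createdRef);
            if (installed) return created;
        }
    }

    // Removing a cleared first-level key drops the whole per-loader map at once.
    private void expungeStaleKeys() {
        Object stale;
        while ((stale = staleKeys.poll()) != null) {
            outer.remove(stale);
        }
    }

    public static void main(String[] args) {
        TwoLevelWeakCacheSketch<Object, List<String>, Object> cache =
                new TwoLevelWeakCacheSketch<>();
        Object loader = new Object(); // stand-in for a ClassLoader
        Object a = cache.get(loader, Arrays.asList("p.A", "p.B"), Object::new);
        Object b = cache.get(loader, Arrays.asList("p.A", "p.B"), Object::new);
        System.out.println(a == b); // true
    }
}
```

The fast path costs two map look-ups here, compared to one in the 
flattened layout.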

Regards, Peter

P.S. My reply to your comment is in-line below...

On 04/16/2013 12:58 AM, Mandy Chung wrote:
>
> On 4/13/2013 2:59 PM, Peter Levart wrote:
>>
>>>>
>>>> I also devised an alternative caching mechanism with scalability in 
>>>> mind which uses WeakReferences for keys (for example ClassLoader) 
>>>> and values (for example Class) that could be used in this situation 
>>>> in case adding a field to ClassLoader is not an option:
>>>>
>>>
>>> I would also consider any alternative to avoid adding the 
>>> proxyClassCache field in ClassLoader as Alan commented previously.
>>>
>>> My observation of the typical usage of proxies is to use the 
>>> interface's class loader to define the proxy class. So is it 
>>> necessary to maintain a per-loader cache?  The per-loader cache maps 
>>> from the interface names to a proxy class defined by one loader. I 
>>> would think it's reasonable to assume the number of loaders to 
>>> define proxy class with the same set of interfaces is small.  What 
>>> if we make the cache as "interface names" as the key to a set of 
>>> proxy class suppliers that can have only one proxy class per one 
>>> unique defining loader.  If the proxy class is being generated i.e. 
>>> ProxyClassFactory supplier, the loader is available for comparison. 
>>> When there are more than one matching proxy classes, it would have 
>>> to iterate all in the set.
>>
>> I would assume yes, proxy class for a particular set of interfaces is 
>> typically defined by one classloader only. But the API allows to 
>> specify different loaders as long as the interfaces implemented by 
>> proxy class are "visible" from the loader that defines the proxy 
>> class. If we're talking about interface names - as opposed to 
>> interfaces - then the possibility that a particular set of interface 
>> names would want to be used to define proxy classes with different 
>> loaders is even bigger, since an interface name can refer to 
>> different interfaces with same name (think of interfaces deployed as 
>> part of an app in an application server, say a set of annotations 
>> used by different apps but deployed as part of each individual app).
>>
>
> Agree.  I was tempted to consider making weak reference to the 
> interface classes as the key but in any case the overhead of 
> Class.getClassLoader() is still a performance hog.   Let's move 
> forward with the alternative you propose.
>
>> The scheme you're proposing might be possible, though not simple: The 
>> factory Supplier<Class> would become a Function<ClassLoader, Class> 
>> and would have to maintain its own set of cached proxy classes. 
>> There would be a single ConcurrentMap<List<String>, 
>> Function<ClassLoader, Class>> to map sets of interface names to 
>> factory Functions, but the cached classes in a particular factory 
>> Function would still have to be weakly referenced. I see some 
>> difficulties in implementing such a scheme:
>> - expunging cleared WeakReferences could only reliably clear the 
>> cache inside each factory Function but removing the entry from the 
>> map of  factory Functions when last proxy class for a particular set 
>> of interface names is expunged  would become a difficult task if not 
>> impossible with all the scalability constraints in mind (just 
>> thinking about concurrent requests into same factory Function where 
>> one is requesting new proxy class and the other is expunging cleared 
>> WeakReference which represents the last element in the set of cached 
>> proxy classes).
>> - one of my past ideas of implementing scalable Proxy.isProxyClass() 
>> was to maintain a Set<Class> in each ClassLoader populated with all 
>> the proxy classes defined by a particular ClassLoader. Benchmarking 
>> such solution showed that Class.getClassLoader() is a performance hog, 
>> so I scrapped it in favor of ClassValue<Boolean> that is now 
>> incorporated in the patch. In order to "choose" the right proxy class 
>> from the set of proxy classes inside a particular factory Function, 
>> the Class.getClassLoader() method would have to be used, or entries 
>> would have to (weakly) reference a particular ClassLoader associated 
>> with each proxy class.
>>
>
> Thanks for reminding me of your earlier prototype.  I suspect the cost of 
> Class.getClassLoader() is due to its lookup of the caller class every 
> time it's called.

Even without a SecurityManager installed, the performance of the native 
getClassLoader0 was a hog. I don't know why. Isn't there an implicit 
reference to the defining ClassLoader from every Class object?

>
>> Considering all that, such solution starts to look unappealing. It 
>> might even be more space-hungry than the presented WeakCache.
>>
>> WeakCache is currently the following:
>>
>> ConcurrentMap<WeakReferenceWithInterfaceNames<ClassLoader>, 
>> WeakReference<Class>>
>>
>> another alternative would be:
>>
>> ConcurrentMap<WeakReference<ClassLoader>, 
>> ConcurrentMap<InterfaceNames, WeakReference<Class>>>
>>
>> ...which might need a little less space than WeakCache (only one 
>> WeakReference per proxy class + one per ClassLoader instead of two 
>> WeakReferences per proxy class) but would require two map lookups 
>> during fast-path retrieval. It might not be performance critical and 
>> the expunging could be performed easily too.
>>
>
> I am fine with either of these alternatives.  As you noted, the latter 
> one would save little bit of memory for the cases when several proxy 
> classes are defined per loader e.g. one per each annotation type.
>
> Mandy