Proxy.isProxyClass scalability
Hello, Recently I stumbled upon the scalability problem of j.l.r.Proxy.isProxyClass(Class) method when trying to find-out why the equals() method of annotations is not scalable. The reason is that j.l.r.Proxy uses a single synchronized wrapper around WeakHashMap to keep track of proxy classes in order to implement isProxyClass method: 244 /** set of all generated proxy classes, for isProxyClass implementation */ 245 private static Map<Class<?>, Void> proxyClasses = 246 Collections.synchronizedMap(new WeakHashMap<Class<?>, Void>()); 631 public static boolean isProxyClass(Class<?> cl) { 632 if (cl == null) { 633 throw new NullPointerException(); 634 } 635 636 return proxyClasses.containsKey(cl); 637 } In the discussion on the cuncurrency-interest list: * http://cs.oswego.edu/pipermail/concurrency-interest/2013-January/010642.html ...it was indicated that other places in JDK besides the annotation's equals() method suffer from this scalability bottleneck, mainly in the field of serialization, RMI and JMX. With the help from this discussion, I prepared a possible fix which I'm proposing here: http://dl.dropbox.com/u/101777488/jdk8-tl/isProxyClass/webrev.01/index.html The fix drops usage of synchronized wrapper and WeakHashMap and replaces them with a ClassValue and a little ThreadLocal trick to initialize it. The ThreadLocal trick could be replaced with a simple static variable guarded by a synchronized block if desired. I have some results from a micro-benchmark that exercises the annotation's equals method. The results were obtained on two different systems: * https://raw.github.com/plevart/jdk8-tl/proxy/test/benchmark_results.txt The benchmark sources: * https://github.com/plevart/jdk8-tl/blob/proxy/test/src/test/AnnotationEquals... * https://github.com/plevart/micro-bench/blob/master/src/si/pele/microbench/Te... Thanks to Aleksey Shipilev for showing me the obvious pitfalls when designing a micro-benchmark. I hope this time it does the job correctly... What do you thik? Should I file a RFE first? Regards, Peter
On 24/01/2013 13:49, Peter Levart wrote:
Should I file a RFE first? Sorry I don't have time at the moment to study the proposed patch but just to mention that it has come up a few times, its just that it never bubbled up to the top of anyone's list. Here's the bug tracking it:
On 01/24/2013 03:10 PM, Alan Bateman wrote:
On 24/01/2013 13:49, Peter Levart wrote:
Should I file a RFE first? Sorry I don't have time at the moment to study the proposed patch but just to mention that it has come up a few times, its just that it never bubbled up to the top of anyone's list. Here's the bug tracking it:
http://bugs.sun.com/view_bug.do?bug_id=7123493
-Alan. I belive that is another bottleneck. It is mentioning the Proxy.getProxyClass method which also uses synchronization for maintaining a cache of proxy classes by request parameters. I could as well try to fix this too in the same patch if there is interest.
Regards, Peter
Peter, I know this was brought up on your c-i mailing list thread, but it really feels like a new boolean field in j.l.Class is the cleaner solution (and infinitely scalable and doesn't require bookkeeping in the Proxy class). Is there really no existing alignment gap in j.l.Class layout that a boolean can slide into? Sent from my phone On Jan 24, 2013 9:35 AM, "Peter Levart" <peter.levart@gmail.com> wrote:
On 01/24/2013 03:10 PM, Alan Bateman wrote:
On 24/01/2013 13:49, Peter Levart wrote:
Should I file a RFE first?
Sorry I don't have time at the moment to study the proposed patch but just to mention that it has come up a few times, its just that it never bubbled up to the top of anyone's list. Here's the bug tracking it:
http://bugs.sun.com/view_bug.**do?bug_id=7123493<http://bugs.sun.com/view_bug.do?bug_id=7123493>
-Alan.
I belive that is another bottleneck. It is mentioning the Proxy.getProxyClass method which also uses synchronization for maintaining a cache of proxy classes by request parameters. I could as well try to fix this too in the same patch if there is interest.
Regards, Peter
On 01/24/2013 03:40 PM, Vitaly Davidovich wrote:
Peter,
I know this was brought up on your c-i mailing list thread, but it really feels like a new boolean field in j.l.Class is the cleaner solution (and infinitely scalable and doesn't require bookkeeping in the Proxy class). Is there really no existing alignment gap in j.l.Class layout that a boolean can slide into?
There might be, I will check. The ClassValue is scalable too (check the benchmarks), and bookkeeping is performed in the ClassValue.ClassValueMap that is referenced from the Class - not in the Proxy class. Unfortunately the j.l.r.Proxy class is in another package from j.l.Class, so the solution with a simple boolean field would require JavaLangAccess or Unsafe acrobatics. Another thing is this separate bug: http://bugs.sun.com/view_bug.do?bug_id=7123493 To solve it, a reference field in j.l.ClassLoader or a hypothetical ClassLoaderValue implementation would be required. I looked at the bug reporter's solution (ConcurrentProxy). He guards the WeakHashMap<ClassLoader, ...> with a ReadWriteLock: /* * Find or create the proxy class cache for the class loader. */ Map<Object,Reference<Class>> cache; try{ loaderToCacheLock.readLock().lock(); cache = loaderToCache.get(loader); }finally{ loaderToCacheLock.readLock().unlock(); } // window of opportunity here between locks that would result in duplicate put(..) call if (cache == null) { try{ loaderToCacheLock.writeLock().lock(); cache = new ConcurrentHashMap<Object,Reference<Class>>(); loaderToCache.put(loader, cache); }finally{ loaderToCacheLock.writeLock().unlock(); } } That code is broken twice. First, it is not double-checking the existence of cache in the writeLock guarded section. That's not so serious and could be fixed. The other fault is using ReadWriteLock.readLock() to guard the WeakHashMap.get(). This is dangerous, since WeakHashMap.get() is a mutating method (expunging stale entries). I don't think fixing this bug without either: - new field in ClassLoader or - ClassLoaderValue or - WeakConcurrentHashMap ...is possible. Since we don't have the later two at the moment, the best bet for it unfortunately seems to be the first solution. Regards, Peter
Sent from my phone
On Jan 24, 2013 9:35 AM, "Peter Levart" <peter.levart@gmail.com <mailto:peter.levart@gmail.com>> wrote:
On 01/24/2013 03:10 PM, Alan Bateman wrote:
On 24/01/2013 13:49, Peter Levart wrote:
Should I file a RFE first?
Sorry I don't have time at the moment to study the proposed patch but just to mention that it has come up a few times, its just that it never bubbled up to the top of anyone's list. Here's the bug tracking it:
http://bugs.sun.com/view_bug.do?bug_id=7123493
-Alan.
I belive that is another bottleneck. It is mentioning the Proxy.getProxyClass method which also uses synchronization for maintaining a cache of proxy classes by request parameters. I could as well try to fix this too in the same patch if there is interest.
Regards, Peter
The boolean seems better from an intent standpoint as well - all this wants to do is tag a Class as having been internally generated by the proxy. This information is present at the point of generation, obviously (and the new. field can be marked final). Doing the tagging via indirection (ClassValue or otherwise) feels circuitous. If the new boolean would actually increase footprint, then OK - tagging via indirection to avoid footprint increase for a relatively uncommon thing seems worthwhile. I'm not that familiar with ClassValue but I don't doubt it scales better than a synch map, and it may scale "well enough" for the expected usage. But a read of a final field is obviously the best case. Are there any downsides to using ClassValue? That is, what could go wrong? Thanks Sent from my phone On Jan 24, 2013 10:46 AM, "Peter Levart" <peter.levart@gmail.com> wrote:
On 01/24/2013 03:40 PM, Vitaly Davidovich wrote:
Peter,
I know this was brought up on your c-i mailing list thread, but it really feels like a new boolean field in j.l.Class is the cleaner solution (and infinitely scalable and doesn't require bookkeeping in the Proxy class). Is there really no existing alignment gap in j.l.Class layout that a boolean can slide into?
There might be, I will check. The ClassValue is scalable too (check the benchmarks), and bookkeeping is performed in the ClassValue.ClassValueMap that is referenced from the Class - not in the Proxy class. Unfortunately the j.l.r.Proxy class is in another package from j.l.Class, so the solution with a simple boolean field would require JavaLangAccess or Unsafe acrobatics.
Another thing is this separate bug:
http://bugs.sun.com/view_bug.do?bug_id=7123493
To solve it, a reference field in j.l.ClassLoader or a hypothetical ClassLoaderValue implementation would be required. I looked at the bug reporter's solution (ConcurrentProxy). He guards the WeakHashMap<ClassLoader, ...> with a ReadWriteLock:
/* * Find or create the proxy class cache for the class loader. */ Map<Object,Reference<Class>> cache; try{ loaderToCacheLock.readLock().lock(); cache = loaderToCache.get(loader); }finally{ loaderToCacheLock.readLock().unlock(); } // window of opportunity here between locks that would result in duplicate put(..) call if (cache == null) { try{ loaderToCacheLock.writeLock().lock(); & nbsp; cache = new ConcurrentHashMap<Object,Reference<Class>>(); loaderToCache.put(loader, cache); }finally{ & nbsp; loaderToCacheLock.writeLock().unlock(); } }
That code is broken twice. First, it is not double-checking the existence of cache in the writeLock guarded section. That's not so serious and could be fixed. The other fault is using ReadWriteLock.readLock() to guard the WeakHashMap.get(). This is dangerous, since WeakHashMap.get() is a mutating method (expunging stale entries).
I don't think fixing this bug without either: - new field in ClassLoader or - ClassLoaderValue or - WeakConcurrentHashMap
...is possible. Since we don't have the later two at the moment, the best bet for it unfortunately seems to be the first solution.
Regards, Peter
Sent from my phone On Jan 24, 2013 9:35 AM, "Peter Levart" <peter.levart@gmail.com> wrote:
On 01/24/2013 03:10 PM, Alan Bateman wrote:
On 24/01/2013 13:49, Peter Levart wrote:
Should I file a RFE first?
Sorry I don't have time at the moment to study the proposed patch but just to mention that it has come up a few times, its just that it never bubbled up to the top of anyone's list. Here's the bug tracking it:
http://bugs.sun.com/view_bug.do?bug_id=7123493
-Alan.
I belive that is another bottleneck. It is mentioning the Proxy.getProxyClass method which also uses synchronization for maintaining a cache of proxy classes by request parameters. I could as well try to fix this too in the same patch if there is interest.
Regards, Peter
On 01/24/2013 04:55 PM, Vitaly Davidovich wrote:
I'm not that familiar with ClassValue but I don't doubt it scales better than a synch map,
It should be quite a bit faster. Here's an example: <http://mail.openjdk.java.net/pipermail/mlvm-dev/2011-December/004209.html> It's quite a bit code, but no synchronization is needed, and all branches should be predictable. -- Florian Weimer / Red Hat Product Security Team
On 25/01/2013 2:36 AM, Peter Levart wrote:
On 01/24/2013 04:45 PM, Peter Levart wrote:
Is there really no existing alignment gap in j.l.Class layout that a boolean can slide into?
There might be, I will check. All instance fields in j.l.Class are either references or ints.
Instance are also 8-byte aligned though so does that leave any slop where an extra field would not make an actual difference. (I should know the answer to that after the ReflectionData changes but don't recall.) Note: I have not looked at this, just considering the "add a field" aspect of it. David
Regards, Peter
On 01/25/2013 06:35 AM, David Holmes wrote:
On 25/01/2013 2:36 AM, Peter Levart wrote:
On 01/24/2013 04:45 PM, Peter Levart wrote:
Is there really no existing alignment gap in j.l.Class layout that a boolean can slide into?
There might be, I will check. All instance fields in j.l.Class are either references or ints.
Instance are also 8-byte aligned though so does that leave any slop where an extra field would not make an actual difference. (I should know the answer to that after the ReflectionData changes but don't recall.)
Note: I have not looked at this, just considering the "add a field" aspect of it.
David
Here's what the Unsafe reports about current layout of j.l.Class (ReflectionData changes already taken into account): 32 bit pointers: java.lang.Class instance field offsets: Field Type Field Name Offset ---------- ---------- ------ Constructor cachedConstructor 12 Class newInstanceCallerCache 16 String name 20 SoftReference reflectionData 24 ClassRepository genericInfo 28 Object[] enumConstants 32 Map enumConstantDirectory 36 Map annotations 40 Map declaredAnnotations 44 AnnotationType annotationType 48 ClassValueMap classValueMap 52 int classRedefinedCount 80 int lastAnnotationsRedefinedCount 84 java.lang.String static field offsets: Field Type Field Name Offset ---------- ---------- ------ ObjectStreamField[] serialPersistentFields 96 Comparator CASE_INSENSITIVE_ORDER 100 long serialVersionUID 104 int HASHING_SEED 112 64 bit pointers: java.lang.Class instance field offsets: Field Type Field Name Offset ---------- ---------- ------ Constructor cachedConstructor 16 Class newInstanceCallerCache 24 String name 32 SoftReference reflectionData 40 ClassRepository genericInfo 48 Object[] enumConstants 56 Map enumConstantDirectory 64 Map annotations 72 Map declaredAnnotations 80 AnnotationType annotationType 88 ClassValueMap classValueMap 96 int classRedefinedCount 128 int lastAnnotationsRedefinedCount 132 java.lang.String static field offsets: Field Type Field Name Offset ---------- ---------- ------ ObjectStreamField[] serialPersistentFields 144 Comparator CASE_INSENSITIVE_ORDER 152 long serialVersionUID 160 int HASHING_SEED 168 If I add a boolean instance field "isProxy" to j.l.Class the report changes to: 32 bit pointers: java.lang.Class instance field offsets: Field Type Field Name Offset ---------- ---------- ------ Constructor cachedConstructor 12 Class newInstanceCallerCache 16 String name 20 SoftReference reflectionData 24 ClassRepository genericInfo 28 Object[] enumConstants 32 Map enumConstantDirectory 36 Map annotations 40 Map declaredAnnotations 44 AnnotationType annotationType 48 ClassValueMap classValueMap 52 int classRedefinedCount 80 int lastAnnotationsRedefinedCount 84 boolean isProxy 96 java.lang.String static field offsets: Field Type Field Name Offset ---------- ---------- ------ ObjectStreamField[] serialPersistentFields 104 Comparator CASE_INSENSITIVE_ORDER 108 long serialVersionUID 112 int HASHING_SEED 120 64 bit pointers: java.lang.Class instance field offsets: Field Type Field Name Offset ---------- ---------- ------ Constructor cachedConstructor 16 Class newInstanceCallerCache 24 String name 32 SoftReference reflectionData 40 ClassRepository genericInfo 48 Object[] enumConstants 56 Map enumConstantDirectory 64 Map annotations 72 Map declaredAnnotations 80 AnnotationType annotationType 88 ClassValueMap classValueMap 96 int classRedefinedCount 128 int lastAnnotationsRedefinedCount 132 boolean isProxy 144 java.lang.String static field offsets: Field Type Field Name Offset ---------- ---------- ------ ObjectStreamField[] serialPersistentFields 152 Comparator CASE_INSENSITIVE_ORDER 160 long serialVersionUID 168 int HASHING_SEED 176 ...so it seems that in both cases, adding a boolean to j.l.Class wastes 8 bytes per Class object :-( Regards, Peter
That's unfortunate. Can you steal a bit in one of the int fields? E.g. lastAnnotationsRedefinedCount surely doesn't need the full range right? :) Sent from my phone On Jan 25, 2013 11:02 AM, "Peter Levart" <peter.levart@gmail.com> wrote:
On 01/25/2013 06:35 AM, David Holmes wrote:
On 25/01/2013 2:36 AM, Peter Levart wrote:
On 01/24/2013 04:45 PM, Peter Levart wrote:
Is there really no existing alignment gap in j.l.Class layout that a boolean can slide into?
There might be, I will check.
All instance fields in j.l.Class are either references or ints.
Instance are also 8-byte aligned though so does that leave any slop where an extra field would not make an actual difference. (I should know the answer to that after the ReflectionData changes but don't recall.)
Note: I have not looked at this, just considering the "add a field" aspect of it.
David
Here's what the Unsafe reports about current layout of j.l.Class (ReflectionData changes already taken into account):
32 bit pointers:
java.lang.Class instance field offsets:
Field Type Field Name Offset ---------- ---------- ------ Constructor cachedConstructor 12 Class newInstanceCallerCache 16 String name 20 SoftReference reflectionData 24 ClassRepository genericInfo 28 Object[] enumConstants 32 Map enumConstantDirectory 36 Map annotations 40 Map declaredAnnotations 44 AnnotationType annotationType 48 ClassValueMap classValueMap 52 int classRedefinedCount 80 int lastAnnotationsRedefinedCount 84
java.lang.String static field offsets:
Field Type Field Name Offset ---------- ---------- ------ ObjectStreamField[] serialPersistentFields 96 Comparator CASE_INSENSITIVE_ORDER 100 long serialVersionUID 104 int HASHING_SEED 112
64 bit pointers:
java.lang.Class instance field offsets:
Field Type Field Name Offset ---------- ---------- ------ Constructor cachedConstructor 16 Class newInstanceCallerCache 24 String name 32 SoftReference reflectionData 40 ClassRepository genericInfo 48 Object[] enumConstants 56 Map enumConstantDirectory 64 Map annotations 72 Map declaredAnnotations 80 AnnotationType annotationType 88 ClassValueMap classValueMap 96 int classRedefinedCount 128 int lastAnnotationsRedefinedCount 132
java.lang.String static field offsets:
Field Type Field Name Offset ---------- ---------- ------ ObjectStreamField[] serialPersistentFields 144 Comparator CASE_INSENSITIVE_ORDER 152 long serialVersionUID 160 int HASHING_SEED 168
If I add a boolean instance field "isProxy" to j.l.Class the report changes to:
32 bit pointers:
java.lang.Class instance field offsets:
Field Type Field Name Offset ---------- ---------- ------ Constructor cachedConstructor 12 Class newInstanceCallerCache 16 String name 20 SoftReference reflectionData 24 ClassRepository genericInfo 28 Object[] enumConstants 32 Map enumConstantDirectory 36 Map annotations 40 Map declaredAnnotations 44 AnnotationType annotationType 48 ClassValueMap classValueMap 52 int classRedefinedCount 80 int lastAnnotationsRedefinedCount 84 boolean isProxy 96
java.lang.String static field offsets:
Field Type Field Name Offset ---------- ---------- ------ ObjectStreamField[] serialPersistentFields 104 Comparator CASE_INSENSITIVE_ORDER 108 long serialVersionUID 112 int HASHING_SEED 120
64 bit pointers:
java.lang.Class instance field offsets:
Field Type Field Name Offset ---------- ---------- ------ Constructor cachedConstructor 16 Class newInstanceCallerCache 24 String name 32 SoftReference reflectionData 40 ClassRepository genericInfo 48 Object[] enumConstants 56 Map enumConstantDirectory 64 Map annotations 72 Map declaredAnnotations 80 AnnotationType annotationType 88 ClassValueMap classValueMap 96 int classRedefinedCount 128 int lastAnnotationsRedefinedCount 132 boolean isProxy 144
java.lang.String static field offsets:
Field Type Field Name Offset ---------- ---------- ------ ObjectStreamField[] serialPersistentFields 152 Comparator CASE_INSENSITIVE_ORDER 160 long serialVersionUID 168 int HASHING_SEED 176
...so it seems that in both cases, adding a boolean to j.l.Class wastes 8 bytes per Class object :-(
Regards, Peter
On 01/25/2013 08:02 PM, Peter Levart wrote:
On 01/25/2013 06:35 AM, David Holmes wrote:
On 25/01/2013 2:36 AM, Peter Levart wrote:
On 01/24/2013 04:45 PM, Peter Levart wrote: ...so it seems that in both cases, adding a boolean to j.l.Class wastes 8 bytes per Class object :-(
Seems to be the case, we are hitting the 8-byte alignment boundary. java-object-layout [1] on jdk7u12: $ java -jar ~/projects/java-object-layout/target/java-object-layout.jar java.lang.Class Running 64-bit HotSpot VM. Using compressed references with 3-bit shift. Objects are 8 bytes aligned. java.lang.Class offset size type description 0 12 (assumed to be the object header) 12 4 Constructor Class.cachedConstructor 16 4 Class Class.newInstanceCallerCache 20 4 String Class.name 24 4 SoftReference Class.declaredFields 28 4 SoftReference Class.publicFields 32 4 SoftReference Class.declaredMethods 36 4 SoftReference Class.publicMethods 40 4 SoftReference Class.declaredConstructors 44 4 SoftReference Class.publicConstructors 48 4 SoftReference Class.declaredPublicFields 52 4 SoftReference Class.declaredPublicMethods 56 4 ClassRepository Class.genericInfo 60 4 Object[] Class.enumConstants 64 4 Map Class.enumConstantDirectory 68 4 Map Class.annotations 72 4 Map Class.declaredAnnotations 76 4 AnnotationType Class.annotationType 80 4 ClassValueMap Class.classValueMap 84 12 (alignment/padding gap) 96 4 int Class.classRedefinedCount 100 4 int Class.lastRedefinedCount 104 (object boundary, size estimate) But, otherwise, can't we use java.lang.ClassValue to associate this flag with each class? -Aleksey. [1] https://github.com/shipilev/java-field-layout
On 01/25/2013 05:34 PM, Aleksey Shipilev wrote:
On 01/25/2013 08:02 PM, Peter Levart wrote:
On 01/25/2013 06:35 AM, David Holmes wrote:
On 25/01/2013 2:36 AM, Peter Levart wrote:
On 01/24/2013 04:45 PM, Peter Levart wrote: ...so it seems that in both cases, adding a boolean to j.l.Class wastes 8 bytes per Class object :-( Seems to be the case, we are hitting the 8-byte alignment boundary.
java-object-layout [1] on jdk7u12:
$ java -jar ~/projects/java-object-layout/target/java-object-layout.jar java.lang.Class Running 64-bit HotSpot VM. Using compressed references with 3-bit shift. Objects are 8 bytes aligned.
java.lang.Class offset size type description 0 12 (assumed to be the object header) 12 4 Constructor Class.cachedConstructor 16 4 Class Class.newInstanceCallerCache 20 4 String Class.name 24 4 SoftReference Class.declaredFields 28 4 SoftReference Class.publicFields 32 4 SoftReference Class.declaredMethods 36 4 SoftReference Class.publicMethods 40 4 SoftReference Class.declaredConstructors 44 4 SoftReference Class.publicConstructors 48 4 SoftReference Class.declaredPublicFields 52 4 SoftReference Class.declaredPublicMethods 56 4 ClassRepository Class.genericInfo 60 4 Object[] Class.enumConstants 64 4 Map Class.enumConstantDirectory 68 4 Map Class.annotations 72 4 Map Class.declaredAnnotations 76 4 AnnotationType Class.annotationType 80 4 ClassValueMap Class.classValueMap 84 12 (alignment/padding gap) What's this? why 12 bytes? 96 4 int Class.classRedefinedCount 100 4 int Class.lastRedefinedCount 104 (object boundary, size estimate)
But, otherwise, can't we use java.lang.ClassValue to associate this flag with each class?
-Aleksey.
On 01/25/2013 08:37 PM, Peter Levart wrote:
On 01/25/2013 05:34 PM, Aleksey Shipilev wrote:
80 4 ClassValueMap Class.classValueMap 84 12 (alignment/padding gap)
What's this? why 12 bytes?
96 4 int Class.classRedefinedCount
Beats me, some voodoo VM magic? This is what Unsafe reports anyway. Your data have the same gap (though not immediately evident because you don't calculate the offset differences). -Aleksey.
On 01/25/2013 05:42 PM, Aleksey Shipilev wrote:
On 01/25/2013 08:37 PM, Peter Levart wrote:
On 01/25/2013 05:34 PM, Aleksey Shipilev wrote:
80 4 ClassValueMap Class.classValueMap 84 12 (alignment/padding gap)
What's this? why 12 bytes?
96 4 int Class.classRedefinedCount
Beats me, some voodoo VM magic? This is what Unsafe reports anyway. Your data have the same gap (though not immediately evident because you don't calculate the offset differences).
-Aleksey. j.l.Class seems to be very special with it's own layout lo(ma)gic. For example, I copied the source of j.l.Class into j.l.MyClass and chopped-out all the methods, which gives the following (32 bit pointers):
java.lang.MyClass instance field offsets: Field Type Field Name Offset ---------- ---------- ------ * int classRedefinedCount 12** ** int lastAnnotationsRedefinedCount 16* Constructor cachedConstructor 20 Class newInstanceCallerCache 24 String name 28 SoftReference reflectionData 32 ClassRepository genericInfo 36 Object[] enumConstants 40 Map enumConstantDirectory 44 Map annotations 48 Map declaredAnnotations 52 AnnotationType annotationType 56 ClassValueMap classValueMap 60 the primitive fields come before pointers whereas in j.l.Class: java.lang.Class instance field offsets: Field Type Field Name Offset ---------- ---------- ------ Constructor cachedConstructor 12 Class newInstanceCallerCache 16 String name 20 SoftReference reflectionData 24 ClassRepository genericInfo 28 Object[] enumConstants 32 Map enumConstantDirectory 36 Map annotations 40 Map declaredAnnotations 44 AnnotationType annotationType 48 ClassValueMap classValueMap 52 * int classRedefinedCount 80** ** int lastAnnotationsRedefinedCount 84* ...they come after the pointers and the first one has a strange alignment... Regards, Peter
On 01/25/2013 05:34 PM, Aleksey Shipilev wrote:
But, otherwise, can't we use java.lang.ClassValue to associate this flag with each class? That is my proposed patch. It tries to be space saving and does not associate the flag with each class, but only with each subclass of java.lang.reflect.Proxy.
Regards, Peter
On 01/25/2013 08:40 PM, Peter Levart wrote:
On 01/25/2013 05:34 PM, Aleksey Shipilev wrote:
But, otherwise, can't we use java.lang.ClassValue to associate this flag with each class? That is my proposed patch. It tries to be space saving and does not associate the flag with each class, but only with each subclass of java.lang.reflect.Proxy.
Ah, I see. Sorry, I wasn't following the thread carefully. Cramming the boolean field into j.l.Class when we have ClassValue does not seem worth considering. -Aleksey.
Hi David, I was surprised to see Usafe report these offsets. See below: java.lang.Class *instance* field offsets: Field Type Field Name Offset ---------- ---------- ------ Constructor cachedConstructor 12 Class newInstanceCallerCache 16 String name 20 SoftReference reflectionData 24 ClassRepository genericInfo 28 Object[] enumConstants 32 Map enumConstantDirectory 36 Map annotations 40 Map declaredAnnotations 44 AnnotationType annotationType 48 ClassValueMap classValueMap *52* Why the *24 bytes* gap between /classValueMap/ and /classRedefinedCount/ fields? int classRedefinedCount *80* int lastAnnotationsRedefinedCount 84 java.lang.String *static* field offsets: Field Type Field Name Offset ---------- ---------- ------ ObjectStreamField[] serialPersistentFields 96 Comparator CASE_INSENSITIVE_ORDER 100 long serialVersionUID 104 int HASHING_SEED 112 The 64 bit pointers variant: java.lang.Class instance field offsets: Field Type Field Name Offset ---------- ---------- ------ Constructor cachedConstructor 16 Class newInstanceCallerCache 24 String name 32 SoftReference reflectionData 40 ClassRepository genericInfo 48 Object[] enumConstants 56 Map enumConstantDirectory 64 Map annotations 72 Map declaredAnnotations 80 AnnotationType annotationType 88 ClassValueMap classValueMap *96* *24 bytes* gap here too! int classRedefinedCount *128* int lastAnnotationsRedefinedCount 132 java.lang.String static field offsets: Field Type Field Name Offset ---------- ---------- ------ ObjectStreamField[] serialPersistentFields 144 Comparator CASE_INSENSITIVE_ORDER 152 long serialVersionUID 160 int HASHING_SEED 168 Might it be that the "classRedefinedCount" field offset is fixed somehow in VM, since the VM has to update it? Should there be VM changes also to accomodate ReflectionData changes? Are there VM fields inserted here that don't have a Java mapping? Regards, Peter
Hi David, I think I already have a kind of answer. You wrote it in "RFR: 8005232 (JEP-149) Class Instance size reduction": On 01/06/2013 11:46 PM, David Holmes wrote:
In Java 8, using a 32-bit example, a java.lang.Class instance is 112 bytes consisting of:
- 8 byte object header - 20 declared fields (mostly references, some int) *- 5 injected fields (3 references, 2 ints) *
That gives: 8 + (20*4) +(5*4) = 108 bytes. But as we need 8-byte alignment that increases to 112 bytes.
Regards, Peter On 01/25/2013 05:34 PM, Peter Levart wrote:
Hi David,
I was surprised to see Usafe report these offsets. See below:
java.lang.Class *instance* field offsets:
Field Type Field Name Offset ---------- ---------- ------ Constructor cachedConstructor 12 Class newInstanceCallerCache 16 String name 20 SoftReference reflectionData 24 ClassRepository genericInfo 28 Object[] enumConstants 32 Map enumConstantDirectory 36 Map annotations 40 Map declaredAnnotations 44 AnnotationType annotationType 48 ClassValueMap classValueMap *52*
Why the *24 bytes* gap between /classValueMap/ and /classRedefinedCount/ fields?
int classRedefinedCount *80* int lastAnnotationsRedefinedCount 84
java.lang.String *static* field offsets:
Field Type Field Name Offset ---------- ---------- ------ ObjectStreamField[] serialPersistentFields 96 Comparator CASE_INSENSITIVE_ORDER 100 long serialVersionUID 104 int HASHING_SEED 112
The 64 bit pointers variant:
java.lang.Class instance field offsets:
Field Type Field Name Offset ---------- ---------- ------ Constructor cachedConstructor 16 Class newInstanceCallerCache 24 String name 32 SoftReference reflectionData 40 ClassRepository genericInfo 48 Object[] enumConstants 56 Map enumConstantDirectory 64 Map annotations 72 Map declaredAnnotations 80 AnnotationType annotationType 88 ClassValueMap classValueMap *96*
*24 bytes* gap here too!
int classRedefinedCount *128* int lastAnnotationsRedefinedCount 132
java.lang.String static field offsets:
Field Type Field Name Offset ---------- ---------- ------ ObjectStreamField[] serialPersistentFields 144 Comparator CASE_INSENSITIVE_ORDER 152 long serialVersionUID 160 int HASHING_SEED 168
Might it be that the "classRedefinedCount" field offset is fixed somehow in VM, since the VM has to update it? Should there be VM changes also to accomodate ReflectionData changes? Are there VM fields inserted here that don't have a Java mapping?
Regards, Peter
Right - I was going to ask if those tools took into account injected fields. David On 26/01/2013 3:32 AM, Peter Levart wrote:
Hi David,
I think I already have a kind of answer. You wrote it in "RFR: 8005232 (JEP-149) Class Instance size reduction":
On 01/06/2013 11:46 PM, David Holmes wrote:
In Java 8, using a 32-bit example, a java.lang.Class instance is 112 bytes consisting of:
- 8 byte object header - 20 declared fields (mostly references, some int) *- 5 injected fields (3 references, 2 ints) *
That gives: 8 + (20*4) +(5*4) = 108 bytes. But as we need 8-byte alignment that increases to 112 bytes.
Regards, Peter
On 01/25/2013 05:34 PM, Peter Levart wrote:
Hi David,
I was surprised to see Usafe report these offsets. See below:
java.lang.Class *instance* field offsets:
Field Type Field Name Offset ---------- ---------- ------ Constructor cachedConstructor 12 Class newInstanceCallerCache 16 String name 20 SoftReference reflectionData 24 ClassRepository genericInfo 28 Object[] enumConstants 32 Map enumConstantDirectory 36 Map annotations 40 Map declaredAnnotations 44 AnnotationType annotationType 48 ClassValueMap classValueMap *52*
Why the *24 bytes* gap between /classValueMap/ and /classRedefinedCount/ fields?
int classRedefinedCount *80* int lastAnnotationsRedefinedCount 84
java.lang.String *static* field offsets:
Field Type Field Name Offset ---------- ---------- ------ ObjectStreamField[] serialPersistentFields 96 Comparator CASE_INSENSITIVE_ORDER 100 long serialVersionUID 104 int HASHING_SEED 112
The 64 bit pointers variant:
java.lang.Class instance field offsets:
Field Type Field Name Offset ---------- ---------- ------ Constructor cachedConstructor 16 Class newInstanceCallerCache 24 String name 32 SoftReference reflectionData 40 ClassRepository genericInfo 48 Object[] enumConstants 56 Map enumConstantDirectory 64 Map annotations 72 Map declaredAnnotations 80 AnnotationType annotationType 88 ClassValueMap classValueMap *96*
*24 bytes* gap here too!
int classRedefinedCount *128* int lastAnnotationsRedefinedCount 132
java.lang.String static field offsets:
Field Type Field Name Offset ---------- ---------- ------ ObjectStreamField[] serialPersistentFields 144 Comparator CASE_INSENSITIVE_ORDER 152 long serialVersionUID 160 int HASHING_SEED 168
Might it be that the "classRedefinedCount" field offset is fixed somehow in VM, since the VM has to update it? Should there be VM changes also to accomodate ReflectionData changes? Are there VM fields inserted here that don't have a Java mapping?
Regards, Peter
On 01/24/2013 03:34 PM, Peter Levart wrote:
On 01/24/2013 03:10 PM, Alan Bateman wrote:
On 24/01/2013 13:49, Peter Levart wrote:
Should I file a RFE first? Sorry I don't have time at the moment to study the proposed patch but just to mention that it has come up a few times, its just that it never bubbled up to the top of anyone's list. Here's the bug tracking it:
http://bugs.sun.com/view_bug.do?bug_id=7123493
-Alan. I belive that is another bottleneck. It is mentioning the Proxy.getProxyClass method which also uses synchronization for maintaining a cache of proxy classes by request parameters. I could as well try to fix this too in the same patch if there is interest.
Regards, Peter
Hi Alan, David, I thought about the ways to fix Proxy.isProxyClass() scalability and the Proxy.getProxyClass() scalability. While they are different methods, each with it's own data structure, I think that both problems can be solved with a single solution and that solution does not involve neither adding fields to j.l.Class nor ClassValue. The solution is actually very simple. I just want to validate my reasoning before jumping to implement it: - for solving scalability of getProxyClass cache, a field with a reference to ConcurrentHashMap<List<String>, Class<? extends Proxy>> is added to j.l.ClassLoader - for solving scalability of isProxyClass, a field with a reference to ConcurrentHashMap<Class<? extends Proxy>, Boolean> is added to j.l.ClassLoader Both maps hold strong references to Class objects, but only for the classes that are loaded by the ClassLoader that references them. Each ClassLoader already holds a strong reference to all the Class objects for the classes that were loaded by it in a Vector. Holding another reference does not present any problem, right? I think this would be the best solution and it would solve both scalability problems of j.l.Proxy in one go. Am I missing something? Regards, Peter
Hi Alan, David, I think I have a solution for scalability problems of java.lang.reflect.Proxy: * http://dl.dropbox.com/u/101777488/jdk8-tl/proxy/webrev.01/index.html I designed it along the lines mentioned in previous mail (see below). First I thought that both caches (a cache mapping request parameters of Proxy.getProxyClass to a proxy Class and a cache for resolving the Proxy.isProxyClass method) could be placed inside the j.l.ClassLoader. And they actually could. I tried this and it worked correctly, but the raw (single-threaded) performance of Proxy.isProxyClass was half the performance of current JDK8 isProxyClass method that way. The main performance hog I identified was the Class.getClassLoader() method. This method delegates to a native JVM method after checking the callers permissions. So the plan to keep the isProxyClass cache inside the ClassLoader failed. But then the ClassValue solution that I proposed in previous revision works beautifully and fast, so I kept this solution for isProxyClass cache. The ThreadLocal trick to initialize the ClassValue for a particular proxy class is necessary because ClassValue does not have a put() method. It does have actually, but it's package private and the comment suggests that it might not be there forever: // Possible functionality for JSR 292 MR 1 /*public*/ void put(Class<?> type, T value) { ClassValueMap map = getMap(type); map.changeEntry(this, value); } I could use the JavaLangAccess to access it from Proxy code though, so this is another refinement that is possible and would eliminate the need for ThreadLocal trick. The cache for getProxyClass method on the other hand is most naturally hosted by j.l.ClassLoader and that's where I put it in the proposed patch. Currently the cross-package access to it is implemented using Unsafe, but the performance aspect is not so critical that It couldn't be changed to using JavaLangAccess instead if desired. I did some performance testing too. Here are the results taken on 3 different systems (i7 Linux PC, Raspberry Pi and Sun-Blade-T6320 Solaris machine): * https://raw.github.com/plevart/jdk8-tl/proxy/test/proxy_benchmark_results.tx... These are the test sources i used: * https://github.com/plevart/jdk8-tl/blob/proxy/test/src/test/ProxyBenchmarkTe... * https://github.com/plevart/micro-bench/blob/master/src/si/pele/microbench/Te... Results show scalability and raw performance improvements for both critical Proxy methods. So what do you think? Regards, Peter On 01/25/2013 06:55 PM, Peter Levart wrote:
On 01/24/2013 03:34 PM, Peter Levart wrote:
On 01/24/2013 03:10 PM, Alan Bateman wrote:
On 24/01/2013 13:49, Peter Levart wrote:
Should I file a RFE first? Sorry I don't have time at the moment to study the proposed patch but just to mention that it has come up a few times, its just that it never bubbled up to the top of anyone's list. Here's the bug tracking it:
http://bugs.sun.com/view_bug.do?bug_id=7123493
-Alan. I belive that is another bottleneck. It is mentioning the Proxy.getProxyClass method which also uses synchronization for maintaining a cache of proxy classes by request parameters. I could as well try to fix this too in the same patch if there is interest.
Regards, Peter
Hi Alan, David,
I thought about the ways to fix Proxy.isProxyClass() scalability and the Proxy.getProxyClass() scalability. While they are different methods, each with it's own data structure, I think that both problems can be solved with a single solution and that solution does not involve neither adding fields to j.l.Class nor ClassValue.
The solution is actually very simple. I just want to validate my reasoning before jumping to implement it:
- for solving scalability of getProxyClass cache, a field with a reference to ConcurrentHashMap<List<String>, Class<? extends Proxy>> is added to j.l.ClassLoader - for solving scalability of isProxyClass, a field with a reference to ConcurrentHashMap<Class<? extends Proxy>, Boolean> is added to j.l.ClassLoader
Both maps hold strong references to Class objects, but only for the classes that are loaded by the ClassLoader that references them. Each ClassLoader already holds a strong reference to all the Class objects for the classes that were loaded by it in a Vector. Holding another reference does not present any problem, right?
I think this would be the best solution and it would solve both scalability problems of j.l.Proxy in one go.
Am I missing something?
Regards, Peter
Hi again, By simple rearrangement of code in Proxy.getProxyClass (moved the parameters validation to slow-path) I managed to speed-up the fast-path getProxyClass so that now it's raw speed is 40x the speed of current getProxyClass method: * http://dl.dropbox.com/u/101777488/jdk8-tl/proxy/webrev.02/index.html Here are quick measurements (only for i7 this time): * https://raw.github.com/plevart/jdk8-tl/proxy/test/proxy_benchmark_results_v2... Regards, Peter On 01/27/2013 02:32 PM, Peter Levart wrote:
Hi Alan, David,
I think I have a solution for scalability problems of java.lang.reflect.Proxy:
* http://dl.dropbox.com/u/101777488/jdk8-tl/proxy/webrev.01/index.html
I designed it along the lines mentioned in previous mail (see below). First I thought that both caches (a cache mapping request parameters of Proxy.getProxyClass to a proxy Class and a cache for resolving the Proxy.isProxyClass method) could be placed inside the j.l.ClassLoader. And they actually could. I tried this and it worked correctly, but the raw (single-threaded) performance of Proxy.isProxyClass was half the performance of current JDK8 isProxyClass method that way. The main performance hog I identified was the Class.getClassLoader() method. This method delegates to a native JVM method after checking the callers permissions. So the plan to keep the isProxyClass cache inside the ClassLoader failed. But then the ClassValue solution that I proposed in previous revision works beautifully and fast, so I kept this solution for isProxyClass cache. The ThreadLocal trick to initialize the ClassValue for a particular proxy class is necessary because ClassValue does not have a put() method. It does have actually, but it's package private and the comment suggests that it might not be there forever:
// Possible functionality for JSR 292 MR 1 /*public*/ void put(Class<?> type, T value) { ClassValueMap map = getMap(type); map.changeEntry(this, value); }
I could use the JavaLangAccess to access it from Proxy code though, so this is another refinement that is possible and would eliminate the need for ThreadLocal trick. The cache for getProxyClass method on the other hand is most naturally hosted by j.l.ClassLoader and that's where I put it in the proposed patch. Currently the cross-package access to it is implemented using Unsafe, but the performance aspect is not so critical that It couldn't be changed to using JavaLangAccess instead if desired. I did some performance testing too. Here are the results taken on 3 different systems (i7 Linux PC, Raspberry Pi and Sun-Blade-T6320 Solaris machine):
* https://raw.github.com/plevart/jdk8-tl/proxy/test/proxy_benchmark_results.tx...
These are the test sources i used:
* https://github.com/plevart/jdk8-tl/blob/proxy/test/src/test/ProxyBenchmarkTe... * https://github.com/plevart/micro-bench/blob/master/src/si/pele/microbench/Te...
Results show scalability and raw performance improvements for both critical Proxy methods.
So what do you think?
Regards, Peter
On 01/25/2013 06:55 PM, Peter Levart wrote:
On 01/24/2013 03:34 PM, Peter Levart wrote:
On 01/24/2013 03:10 PM, Alan Bateman wrote:
On 24/01/2013 13:49, Peter Levart wrote:
Should I file a RFE first? Sorry I don't have time at the moment to study the proposed patch but just to mention that it has come up a few times, its just that it never bubbled up to the top of anyone's list. Here's the bug tracking it:
http://bugs.sun.com/view_bug.do?bug_id=7123493
-Alan. I belive that is another bottleneck. It is mentioning the Proxy.getProxyClass method which also uses synchronization for maintaining a cache of proxy classes by request parameters. I could as well try to fix this too in the same patch if there is interest.
Regards, Peter
Hi Alan, David,
I thought about the ways to fix Proxy.isProxyClass() scalability and the Proxy.getProxyClass() scalability. While they are different methods, each with it's own data structure, I think that both problems can be solved with a single solution and that solution does not involve neither adding fields to j.l.Class nor ClassValue.
The solution is actually very simple. I just want to validate my reasoning before jumping to implement it:
- for solving scalability of getProxyClass cache, a field with a reference to ConcurrentHashMap<List<String>, Class<? extends Proxy>> is added to j.l.ClassLoader - for solving scalability of isProxyClass, a field with a reference to ConcurrentHashMap<Class<? extends Proxy>, Boolean> is added to j.l.ClassLoader
Both maps hold strong references to Class objects, but only for the classes that are loaded by the ClassLoader that references them. Each ClassLoader already holds a strong reference to all the Class objects for the classes that were loaded by it in a Vector. Holding another reference does not present any problem, right?
I think this would be the best solution and it would solve both scalability problems of j.l.Proxy in one go.
Am I missing something?
Regards, Peter
On 25/01/2013 17:55, Peter Levart wrote:
:
The solution is actually very simple. I just want to validate my reasoning before jumping to implement it:
- for solving scalability of getProxyClass cache, a field with a reference to ConcurrentHashMap<List<String>, Class<? extends Proxy>> is added to j.l.ClassLoader - for solving scalability of isProxyClass, a field with a reference to ConcurrentHashMap<Class<? extends Proxy>, Boolean> is added to j.l.ClassLoader
I haven't had time to look very closely as your more recent changes (you are clearly doing very good work here). The only thing I wonder if whether it would be possible to avoid adding to ClassLoader. I can't say what percentage of frameworks and applications use proxies but it would be nice if the impact on applications that don't use proxies is zero. -Alan
On 01/28/2013 12:49 PM, Alan Bateman wrote:
On 25/01/2013 17:55, Peter Levart wrote:
:
The solution is actually very simple. I just want to validate my reasoning before jumping to implement it:
- for solving scalability of getProxyClass cache, a field with a reference to ConcurrentHashMap<List<String>, Class<? extends Proxy>> is added to j.l.ClassLoader - for solving scalability of isProxyClass, a field with a reference to ConcurrentHashMap<Class<? extends Proxy>, Boolean> is added to j.l.ClassLoader
I haven't had time to look very closely as your more recent changes (you are clearly doing very good work here). The only thing I wonder if whether it would be possible to avoid adding to ClassLoader. I can't say what percentage of frameworks and applications use proxies but it would be nice if the impact on applications that don't use proxies is zero. Hi Alan,
Hm, well. Any application that uses run-time annotations, is implicitly using Proxies. But I agree that there are applications that don't use either. Such applications usually don't use many ClassLoaders either. Applications that use many ClassLoaders are typically application servers or applications written for modular systems (such as OSGI or NetBeans) and all those applications are also full of runtime annotations nowadays. So a typical application that does not use neither Proxies nor runtime annotations is composed of bootstrap classloader, AppClassLoader and ExtClassLoader. The ConcurrentHashMap for the bootstrap classloader is hosted by j.l.r.Proxy class and is only initialized when the j.l.r.Proxy class is initialized - so in this case never. The overhead for such applications is therefore an empty ConcurrentHashMap instance plus the overhead for a pointer slot in the ClassLoader object multiplied by the number of ClassLoaders (typically 2). An empty ConcurrentHashMap in JDK8 is only pre-allocating a single internal Segment: java.util.concurrent.ConcurrentHashMap@642b6fc7(48 bytes) { keySet: null values: null hashSeed: int segmentMask: int segmentShift: int segments: java.util.concurrent.ConcurrentHashMap$Segment[16]@8e1dfb1(80 bytes) { java.util.concurrent.ConcurrentHashMap$Segment@2524e205(40 bytes) { sync: java.util.concurrent.locks.ReentrantLock$NonfairSync@17feafba(32 bytes) { exclusiveOwnerThread: null head: null tail: null state: int }->(32 deep bytes) table: java.util.concurrent.ConcurrentHashMap$HashEntry[2]@1c3aacb4(24 bytes) { null null }->(24 deep bytes) count: int modCount: int threshold: int loadFactor: float }->(96 deep bytes) null null null null null null null null null null null null null null null }->(176 deep bytes) keySet: null entrySet: null values: null }->(224 deep bytes) ...therefore the overhead is approx. 224+4 = 228 bytes (on 32 bit pointer environments) per ClassLoader. In typical application (with 2 ClassLoader objects) this amounts to approx. 456 bytes. Is 456 bytes overhead too much? If it is, I could do lazy initialization of per-classloader CHM instances, but then the fast-path would incur a little additional penalty (not to be taken seriously though). Regards, Peter P.S. I was inspecting the ClassValue internal implementation. This is a very nice piece of Java code. Without using any Unsafe magic, it provides a perfect performant an scalable map of lazily initialized independent data structures associated with Class instances. There's nothing special about ClassValue/ClassValueMap that would tie it to Class instances. In fact I think the ClassValueMap could be made generic so it could be reused for implementing a ClasLoaderValue, for example. This would provide a more performant and scalable alternative to using WeakHashMap<ClassLoader, ...> wrapped by synchronized wrappers for other uses. If anything like that happens in the future, the proposed patch can be quickly adapted to using that infrastructure instead of a direct reference in the ClassLoader.
-Alan
Hi Alan, I prepared the variant with lazy initialization of ConcurrentHashMaps per ClassLoader and performance measurements show no differences. So here's this variant: * http://dl.dropbox.com/u/101777488/jdk8-tl/proxy/webrev.03/index.html I also checked the ClassLoader layout and as it happens the additional pointer slot increases the ClassLoader object size by 8 bytes in both addressing modes: 32bit and 64bit. But that's a small overhead compared to for example the deep-size of AppClassLoader at the beginning of the main method: 14648 bytes (32bit) / 22432 bytes (64bit). Regards, Peter On 01/28/2013 01:57 PM, Peter Levart wrote:
On 01/28/2013 12:49 PM, Alan Bateman wrote:
On 25/01/2013 17:55, Peter Levart wrote:
:
The solution is actually very simple. I just want to validate my reasoning before jumping to implement it:
- for solving scalability of getProxyClass cache, a field with a reference to ConcurrentHashMap<List<String>, Class<? extends Proxy>> is added to j.l.ClassLoader - for solving scalability of isProxyClass, a field with a reference to ConcurrentHashMap<Class<? extends Proxy>, Boolean> is added to j.l.ClassLoader
I haven't had time to look very closely as your more recent changes (you are clearly doing very good work here). The only thing I wonder if whether it would be possible to avoid adding to ClassLoader. I can't say what percentage of frameworks and applications use proxies but it would be nice if the impact on applications that don't use proxies is zero. Hi Alan,
Hm, well. Any application that uses run-time annotations, is implicitly using Proxies. But I agree that there are applications that don't use either. Such applications usually don't use many ClassLoaders either. Applications that use many ClassLoaders are typically application servers or applications written for modular systems (such as OSGI or NetBeans) and all those applications are also full of runtime annotations nowadays. So a typical application that does not use neither Proxies nor runtime annotations is composed of bootstrap classloader, AppClassLoader and ExtClassLoader. The ConcurrentHashMap for the bootstrap classloader is hosted by j.l.r.Proxy class and is only initialized when the j.l.r.Proxy class is initialized - so in this case never. The overhead for such applications is therefore an empty ConcurrentHashMap instance plus the overhead for a pointer slot in the ClassLoader object multiplied by the number of ClassLoaders (typically 2). An empty ConcurrentHashMap in JDK8 is only pre-allocating a single internal Segment:
java.util.concurrent.ConcurrentHashMap@642b6fc7(48 bytes) { keySet: null values: null hashSeed: int segmentMask: int segmentShift: int segments: java.util.concurrent.ConcurrentHashMap$Segment[16]@8e1dfb1(80 bytes) { java.util.concurrent.ConcurrentHashMap$Segment@2524e205(40 bytes) { sync: java.util.concurrent.locks.ReentrantLock$NonfairSync@17feafba(32 bytes) { exclusiveOwnerThread: null head: null tail: null state: int }->(32 deep bytes) table: java.util.concurrent.ConcurrentHashMap$HashEntry[2]@1c3aacb4(24 bytes) { null null }->(24 deep bytes) count: int modCount: int threshold: int loadFactor: float }->(96 deep bytes) null null null null null null null null null null null null null null null }->(176 deep bytes) keySet: null entrySet: null values: null }->(224 deep bytes)
...therefore the overhead is approx. 224+4 = 228 bytes (on 32 bit pointer environments) per ClassLoader. In typical application (with 2 ClassLoader objects) this amounts to approx. 456 bytes.
Is 456 bytes overhead too much?
If it is, I could do lazy initialization of per-classloader CHM instances, but then the fast-path would incur a little additional penalty (not to be taken seriously though).
Regards, Peter
P.S. I was inspecting the ClassValue internal implementation. This is a very nice piece of Java code. Without using any Unsafe magic, it provides a perfect performant an scalable map of lazily initialized independent data structures associated with Class instances. There's nothing special about ClassValue/ClassValueMap that would tie it to Class instances. In fact I think the ClassValueMap could be made generic so it could be reused for implementing a ClasLoaderValue, for example. This would provide a more performant and scalable alternative to using WeakHashMap<ClassLoader, ...> wrapped by synchronized wrappers for other uses. If anything like that happens in the future, the proposed patch can be quickly adapted to using that infrastructure instead of a direct reference in the ClassLoader.
-Alan
Hi Alan, Is this still being considered? Recently there were some changes in the j.l.r.Proxy (the @CallerSensitive annotation), so my final webrev of the patch will have to be rebased. Do you still think we should not add new fields to ClassLoader? I have some ideas how to do it without adding a field to ClassLoader but for the price of some additional space overhead for each Proxy class produced. On the other hand I think that in the future, more platform-internal data structures would want to be "attached" to ClassLoaders so perhaps a single ConcurrentHashMap per ClassLoader could be re-used for different purposes by using distinct (never-equal) keys for each purpose (fox example, the lambda metafactory currently does not do any caching for it's own spun SAM proxy classes or CallSite objects, but could benefit quite a bit from doing so). Regards, Peter On 01/28/2013 05:13 PM, Peter Levart wrote:
Hi Alan,
I prepared the variant with lazy initialization of ConcurrentHashMaps per ClassLoader and performance measurements show no differences. So here's this variant:
* http://dl.dropbox.com/u/101777488/jdk8-tl/proxy/webrev.03/index.html
I also checked the ClassLoader layout and as it happens the additional pointer slot increases the ClassLoader object size by 8 bytes in both addressing modes: 32bit and 64bit. But that's a small overhead compared to for example the deep-size of AppClassLoader at the beginning of the main method: 14648 bytes (32bit) / 22432 bytes (64bit).
Regards, Peter
On 01/28/2013 01:57 PM, Peter Levart wrote:
On 01/28/2013 12:49 PM, Alan Bateman wrote:
On 25/01/2013 17:55, Peter Levart wrote:
:
The solution is actually very simple. I just want to validate my reasoning before jumping to implement it:
- for solving scalability of getProxyClass cache, a field with a reference to ConcurrentHashMap<List<String>, Class<? extends Proxy>> is added to j.l.ClassLoader - for solving scalability of isProxyClass, a field with a reference to ConcurrentHashMap<Class<? extends Proxy>, Boolean> is added to j.l.ClassLoader
I haven't had time to look very closely as your more recent changes (you are clearly doing very good work here). The only thing I wonder if whether it would be possible to avoid adding to ClassLoader. I can't say what percentage of frameworks and applications use proxies but it would be nice if the impact on applications that don't use proxies is zero. Hi Alan,
Hm, well. Any application that uses run-time annotations, is implicitly using Proxies. But I agree that there are applications that don't use either. Such applications usually don't use many ClassLoaders either. Applications that use many ClassLoaders are typically application servers or applications written for modular systems (such as OSGI or NetBeans) and all those applications are also full of runtime annotations nowadays. So a typical application that does not use neither Proxies nor runtime annotations is composed of bootstrap classloader, AppClassLoader and ExtClassLoader. The ConcurrentHashMap for the bootstrap classloader is hosted by j.l.r.Proxy class and is only initialized when the j.l.r.Proxy class is initialized - so in this case never. The overhead for such applications is therefore an empty ConcurrentHashMap instance plus the overhead for a pointer slot in the ClassLoader object multiplied by the number of ClassLoaders (typically 2). An empty ConcurrentHashMap in JDK8 is only pre-allocating a single internal Segment:
java.util.concurrent.ConcurrentHashMap@642b6fc7(48 bytes) { keySet: null values: null hashSeed: int segmentMask: int segmentShift: int segments: java.util.concurrent.ConcurrentHashMap$Segment[16]@8e1dfb1(80 bytes) { java.util.concurrent.ConcurrentHashMap$Segment@2524e205(40 bytes) { sync: java.util.concurrent.locks.ReentrantLock$NonfairSync@17feafba(32 bytes) { exclusiveOwnerThread: null head: null tail: null state: int }->(32 deep bytes) table: java.util.concurrent.ConcurrentHashMap$HashEntry[2]@1c3aacb4(24 bytes) { null null }->(24 deep bytes) count: int modCount: int threshold: int loadFactor: float }->(96 deep bytes) null null null null null null null null null null null null null null null }->(176 deep bytes) keySet: null entrySet: null values: null }->(224 deep bytes)
...therefore the overhead is approx. 224+4 = 228 bytes (on 32 bit pointer environments) per ClassLoader. In typical application (with 2 ClassLoader objects) this amounts to approx. 456 bytes.
Is 456 bytes overhead too much?
If it is, I could do lazy initialization of per-classloader CHM instances, but then the fast-path would incur a little additional penalty (not to be taken seriously though).
Regards, Peter
P.S. I was inspecting the ClassValue internal implementation. This is a very nice piece of Java code. Without using any Unsafe magic, it provides a perfect performant an scalable map of lazily initialized independent data structures associated with Class instances. There's nothing special about ClassValue/ClassValueMap that would tie it to Class instances. In fact I think the ClassValueMap could be made generic so it could be reused for implementing a ClasLoaderValue, for example. This would provide a more performant and scalable alternative to using WeakHashMap<ClassLoader, ...> wrapped by synchronized wrappers for other uses. If anything like that happens in the future, the proposed patch can be quickly adapted to using that infrastructure instead of a direct reference in the ClassLoader.
-Alan
Hi Alan, I have prepared new webrev of the patch rebased to current tip of jdk8/tl repo: https://dl.dropboxusercontent.com/u/101777488/jdk8-tl/proxy/webrev.04/index.... I noticed there were quite a few changes to the j.l.r.Proxy in the meanwhile regarding new security checks. But @CallerSensitive changes seem not to have been pushed yet. Anyway, I have also simplified the caching logic a bit so that it's now easier to reason about it's correctness. I have re-run the performance benchmarks that are still very favourable: https://raw.github.com/plevart/jdk8-tl/proxy/test/proxy_benchmark_results.tx... Proxy.getProxyClass(): is almost 15x faster single-threaded with same or even slightly better scalability Proxy.isProxyClass() == true: is about 9x faster for single-threaded execution and much more scalable Proxy.isProxyClass() == false: is about 1600x faster single-threaded and infinitely scalable Annotation.equals(): is 1.6x faster single-threaded and much more scalable I also devised an alternative caching mechanism with scalability in mind which uses WeakReferences for keys (for example ClassLoader) and values (for example Class) that could be used in this situation in case adding a field to ClassLoader is not an option: https://github.com/plevart/jdk8-tl/blob/proxy/test/src/test/WeakCache.java Regards, Peter On 01/28/2013 05:13 PM, Peter Levart wrote:
Hi Alan,
I prepared the variant with lazy initialization of ConcurrentHashMaps per ClassLoader and performance measurements show no differences. So here's this variant:
* http://dl.dropbox.com/u/101777488/jdk8-tl/proxy/webrev.03/index.html
I also checked the ClassLoader layout and as it happens the additional pointer slot increases the ClassLoader object size by 8 bytes in both addressing modes: 32bit and 64bit. But that's a small overhead compared to for example the deep-size of AppClassLoader at the beginning of the main method: 14648 bytes (32bit) / 22432 bytes (64bit).
Regards, Peter
On 01/28/2013 01:57 PM, Peter Levart wrote:
On 01/28/2013 12:49 PM, Alan Bateman wrote:
On 25/01/2013 17:55, Peter Levart wrote:
:
The solution is actually very simple. I just want to validate my reasoning before jumping to implement it:
- for solving scalability of getProxyClass cache, a field with a reference to ConcurrentHashMap<List<String>, Class<? extends Proxy>> is added to j.l.ClassLoader - for solving scalability of isProxyClass, a field with a reference to ConcurrentHashMap<Class<? extends Proxy>, Boolean> is added to j.l.ClassLoader
I haven't had time to look very closely as your more recent changes (you are clearly doing very good work here). The only thing I wonder if whether it would be possible to avoid adding to ClassLoader. I can't say what percentage of frameworks and applications use proxies but it would be nice if the impact on applications that don't use proxies is zero. Hi Alan,
Hm, well. Any application that uses run-time annotations, is implicitly using Proxies. But I agree that there are applications that don't use either. Such applications usually don't use many ClassLoaders either. Applications that use many ClassLoaders are typically application servers or applications written for modular systems (such as OSGI or NetBeans) and all those applications are also full of runtime annotations nowadays. So a typical application that does not use neither Proxies nor runtime annotations is composed of bootstrap classloader, AppClassLoader and ExtClassLoader. The ConcurrentHashMap for the bootstrap classloader is hosted by j.l.r.Proxy class and is only initialized when the j.l.r.Proxy class is initialized - so in this case never. The overhead for such applications is therefore an empty ConcurrentHashMap instance plus the overhead for a pointer slot in the ClassLoader object multiplied by the number of ClassLoaders (typically 2). An empty ConcurrentHashMap in JDK8 is only pre-allocating a single internal Segment:
java.util.concurrent.ConcurrentHashMap@642b6fc7(48 bytes) { keySet: null values: null hashSeed: int segmentMask: int segmentShift: int segments: java.util.concurrent.ConcurrentHashMap$Segment[16]@8e1dfb1(80 bytes) { java.util.concurrent.ConcurrentHashMap$Segment@2524e205(40 bytes) { sync: java.util.concurrent.locks.ReentrantLock$NonfairSync@17feafba(32 bytes) { exclusiveOwnerThread: null head: null tail: null state: int }->(32 deep bytes) table: java.util.concurrent.ConcurrentHashMap$HashEntry[2]@1c3aacb4(24 bytes) { null null }->(24 deep bytes) count: int modCount: int threshold: int loadFactor: float }->(96 deep bytes) null null null null null null null null null null null null null null null }->(176 deep bytes) keySet: null entrySet: null values: null }->(224 deep bytes)
...therefore the overhead is approx. 224+4 = 228 bytes (on 32 bit pointer environments) per ClassLoader. In typical application (with 2 ClassLoader objects) this amounts to approx. 456 bytes.
Is 456 bytes overhead too much?
If it is, I could do lazy initialization of per-classloader CHM instances, but then the fast-path would incur a little additional penalty (not to be taken seriously though).
Regards, Peter
P.S. I was inspecting the ClassValue internal implementation. This is a very nice piece of Java code. Without using any Unsafe magic, it provides a perfect performant an scalable map of lazily initialized independent data structures associated with Class instances. There's nothing special about ClassValue/ClassValueMap that would tie it to Class instances. In fact I think the ClassValueMap could be made generic so it could be reused for implementing a ClasLoaderValue, for example. This would provide a more performant and scalable alternative to using WeakHashMap<ClassLoader, ...> wrapped by synchronized wrappers for other uses. If anything like that happens in the future, the proposed patch can be quickly adapted to using that infrastructure instead of a direct reference in the ClassLoader.
-Alan
On 4/10/2013 5:35 AM, Peter Levart wrote:
Hi Alan,
I have prepared new webrev of the patch rebased to current tip of jdk8/tl repo:
https://dl.dropboxusercontent.com/u/101777488/jdk8-tl/proxy/webrev.04/index....
I noticed there were quite a few changes to the j.l.r.Proxy in the meanwhile regarding new security checks. But @CallerSensitive changes seem not to have been pushed yet.
This has been waiting for the VM support which is now in jdk8/hotspot. Once TL is synced up jdk8 master (soon be this week), I'll push the changes.
Anyway, I have also simplified the caching logic a bit so that it's now easier to reason about it's correctness. I have re-run the performance benchmarks that are still very favourable:
https://raw.github.com/plevart/jdk8-tl/proxy/test/proxy_benchmark_results.tx...
Proxy.getProxyClass(): is almost 15x faster single-threaded with same or even slightly better scalability Proxy.isProxyClass() == true: is about 9x faster for single-threaded execution and much more scalable Proxy.isProxyClass() == false: is about 1600x faster single-threaded and infinitely scalable Annotation.equals(): is 1.6x faster single-threaded and much more scalable
That's great improvement. I have another patch in Proxy coming out for review soon. When I get several urgent things on my plate done, I'll take a look at your patch and get back to you this week hopefully. Mandy
I also devised an alternative caching mechanism with scalability in mind which uses WeakReferences for keys (for example ClassLoader) and values (for example Class) that could be used in this situation in case adding a field to ClassLoader is not an option:
https://github.com/plevart/jdk8-tl/blob/proxy/test/src/test/WeakCache.java
Regards, Peter
On 01/28/2013 05:13 PM, Peter Levart wrote:
Hi Alan,
I prepared the variant with lazy initialization of ConcurrentHashMaps per ClassLoader and performance measurements show no differences. So here's this variant:
* http://dl.dropbox.com/u/101777488/jdk8-tl/proxy/webrev.03/index.html
I also checked the ClassLoader layout and as it happens the additional pointer slot increases the ClassLoader object size by 8 bytes in both addressing modes: 32bit and 64bit. But that's a small overhead compared to for example the deep-size of AppClassLoader at the beginning of the main method: 14648 bytes (32bit) / 22432 bytes (64bit).
Regards, Peter
On 01/28/2013 01:57 PM, Peter Levart wrote:
On 01/28/2013 12:49 PM, Alan Bateman wrote:
On 25/01/2013 17:55, Peter Levart wrote:
:
The solution is actually very simple. I just want to validate my reasoning before jumping to implement it:
- for solving scalability of getProxyClass cache, a field with a reference to ConcurrentHashMap<List<String>, Class<? extends Proxy>> is added to j.l.ClassLoader - for solving scalability of isProxyClass, a field with a reference to ConcurrentHashMap<Class<? extends Proxy>, Boolean> is added to j.l.ClassLoader
I haven't had time to look very closely as your more recent changes (you are clearly doing very good work here). The only thing I wonder if whether it would be possible to avoid adding to ClassLoader. I can't say what percentage of frameworks and applications use proxies but it would be nice if the impact on applications that don't use proxies is zero. Hi Alan,
Hm, well. Any application that uses run-time annotations, is implicitly using Proxies. But I agree that there are applications that don't use either. Such applications usually don't use many ClassLoaders either. Applications that use many ClassLoaders are typically application servers or applications written for modular systems (such as OSGI or NetBeans) and all those applications are also full of runtime annotations nowadays. So a typical application that does not use neither Proxies nor runtime annotations is composed of bootstrap classloader, AppClassLoader and ExtClassLoader. The ConcurrentHashMap for the bootstrap classloader is hosted by j.l.r.Proxy class and is only initialized when the j.l.r.Proxy class is initialized - so in this case never. The overhead for such applications is therefore an empty ConcurrentHashMap instance plus the overhead for a pointer slot in the ClassLoader object multiplied by the number of ClassLoaders (typically 2). An empty ConcurrentHashMap in JDK8 is only pre-allocating a single internal Segment:
java.util.concurrent.ConcurrentHashMap@642b6fc7(48 bytes) { keySet: null values: null hashSeed: int segmentMask: int segmentShift: int segments: java.util.concurrent.ConcurrentHashMap$Segment[16]@8e1dfb1(80 bytes) { java.util.concurrent.ConcurrentHashMap$Segment@2524e205(40 bytes) { sync: java.util.concurrent.locks.ReentrantLock$NonfairSync@17feafba(32 bytes) { exclusiveOwnerThread: null head: null tail: null state: int }->(32 deep bytes) table: java.util.concurrent.ConcurrentHashMap$HashEntry[2]@1c3aacb4(24 bytes) { null null }->(24 deep bytes) count: int modCount: int threshold: int loadFactor: float }->(96 deep bytes) null null null null null null null null null null null null null null null }->(176 deep bytes) keySet: null entrySet: null values: null }->(224 deep bytes)
...therefore the overhead is approx. 224+4 = 228 bytes (on 32 bit pointer environments) per ClassLoader. In typical application (with 2 ClassLoader objects) this amounts to approx. 456 bytes.
Is 456 bytes overhead too much?
If it is, I could do lazy initialization of per-classloader CHM instances, but then the fast-path would incur a little additional penalty (not to be taken seriously though).
Regards, Peter
P.S. I was inspecting the ClassValue internal implementation. This is a very nice piece of Java code. Without using any Unsafe magic, it provides a perfect performant an scalable map of lazily initialized independent data structures associated with Class instances. There's nothing special about ClassValue/ClassValueMap that would tie it to Class instances. In fact I think the ClassValueMap could be made generic so it could be reused for implementing a ClasLoaderValue, for example. This would provide a more performant and scalable alternative to using WeakHashMap<ClassLoader, ...> wrapped by synchronized wrappers for other uses. If anything like that happens in the future, the proposed patch can be quickly adapted to using that infrastructure instead of a direct reference in the ClassLoader.
-Alan
Hi Peter, Thank you for rebasing the patch. This is very good work and I hope to make time to work with you to get your patch to jdk8 over the next couple weeks. On 4/10/2013 5:35 AM, Peter Levart wrote:
Hi Alan,
I have prepared new webrev of the patch rebased to current tip of jdk8/tl repo:
https://dl.dropboxusercontent.com/u/101777488/jdk8-tl/proxy/webrev.04/index....
[...]
I also devised an alternative caching mechanism with scalability in mind which uses WeakReferences for keys (for example ClassLoader) and values (for example Class) that could be used in this situation in case adding a field to ClassLoader is not an option:
I would also consider any alternative to avoid adding the proxyClassCache field in ClassLoader as Alan commented previously. My observation of the typical usage of proxies is to use the interface's class loader to define the proxy class. So is it necessary to maintain a per-loader cache? The per-loader cache maps from the interface names to a proxy class defined by one loader. I would think it's reasonable to assume the number of loaders to define proxy class with the same set of interfaces is small. What if we make the cache as "interface names" as the key to a set of proxy class suppliers that can have only one proxy class per one unique defining loader. If the proxy class is being generated i.e. ProxyClassFactory supplier, the loader is available for comparison. When there are more than one matching proxy classes, it would have to iterate all in the set. This alternative is sub-optimal than the per-loader cache. I'd be interested in your thought of this alternative and any rough idea of the performance difference with and without the per-loader cache approach for the annotation case. Coding convention: we use /** ... */ for javadoc and /*...*/ or // for comments. Your patch uses /**...*/ style as comments that need fixing. The style you have for try { } catch { } finally { } Mandy
https://github.com/plevart/jdk8-tl/blob/proxy/test/src/test/WeakCache.java
Regards, Peter
Hi Mandy, On 04/12/2013 11:31 PM, Mandy Chung wrote:
Hi Peter,
Thank you for rebasing the patch. This is very good work and I hope to make time to work with you to get your patch to jdk8 over the next couple weeks.
I hope I can be of assistance,
On 4/10/2013 5:35 AM, Peter Levart wrote:
Hi Alan,
I have prepared new webrev of the patch rebased to current tip of jdk8/tl repo:
https://dl.dropboxusercontent.com/u/101777488/jdk8-tl/proxy/webrev.04/index....
[...]
I also devised an alternative caching mechanism with scalability in mind which uses WeakReferences for keys (for example ClassLoader) and values (for example Class) that could be used in this situation in case adding a field to ClassLoader is not an option:
I would also consider any alternative to avoid adding the proxyClassCache field in ClassLoader as Alan commented previously.
My observation of the typical usage of proxies is to use the interface's class loader to define the proxy class. So is it necessary to maintain a per-loader cache? The per-loader cache maps from the interface names to a proxy class defined by one loader. I would think it's reasonable to assume the number of loaders to define proxy class with the same set of interfaces is small. What if we make the cache as "interface names" as the key to a set of proxy class suppliers that can have only one proxy class per one unique defining loader. If the proxy class is being generated i.e. ProxyClassFactory supplier, the loader is available for comparison. When there are more than one matching proxy classes, it would have to iterate all in the set.
I would assume yes, proxy class for a particular set of interfaces is typically defined by one classloader only. But the API allows to specify different loaders as long as the interfaces implemented by proxy class are "visible" from the loader that defines the proxy class. If we're talking about interface names - as opposed to interfaces - then the possibility that a particular set of interface names would want to be used to define proxy classes with different loaders is even bigger, since an interface name can refer to different interfaces with same name (think of interfaces deployed as part of an app in an application server, say a set of annotations used by different apps but deployed as part of each individual app). The scheme you're proposing might be possible, though not simple: The factory Supplier<Class> would become a Function<ClassLoader, Class> and would have to maintain it's own set of cached proxy classes. There would be a single ConcurrentMap<List<String>, Function<ClassLoader, Class>> to map sets of interface names to factory Functions, but the cached classes in a particular factory Function would still have to be weakly referenced. I see some difficulties in implementing such a scheme: - expunging cleared WeakReferences could only reliably clear the cache inside each factory Function but removing the entry from the map of factory Functions when last proxy class for a particular set of interface names is expunged would become a difficult task if not impossible with all the scalability constraints in mind (just thinking about concurrent requests into same factory Function where one is requesting new proxy class and the other is expunging cleared WeakReference which represents the last element in the set of cached proxy classes). - one of my past ideas of implementing scalable Proxy.isProxyClass() was to maintain a Set<Class> in each ClassLoader populated with all the proxy classes defined by a particular ClassLoader. Benchmarking such solution showed that Class.getClassLoader() is a peformance hog, so I scraped it in favor of ClassValue<Boolean> that is now incorporated in the patch. In order to "choose" the right proxy class from the set of proxy classes inside a particular factory Function, the Class.getClassLoader() method would have to be used, or entries would have to (weakly) reference a particular ClassLoader associated with each proxy class. Considering all that, such solution starts to look unappealing. It might even be more space-hungry then the presented WeakCache. WeakCache is currently the following: ConcurrentMap<WeakReferenceWithInterfaceNames<ClassLoader>, WeakReference<Class>> another alternative would be: ConcurrentMap<WeakReference<ClassLoader>, ConcurrentMap<InterfaceNames, WeakReference<Class>>> ...which might need a little less space than WeakCache (only one WeakReference per proxy class + one per ClassLoader instead of two WeakReferences per proxy class) but would require two map lookups during fast-path retrieval. It might not be performance critical and the expunging could be performed easily too. Regards, Peter
This alternative is sub-optimal than the per-loader cache. I'd be interested in your thought of this alternative and any rough idea of the performance difference with and without the per-loader cache approach for the annotation case.
Coding convention: we use /** ... */ for javadoc and /*...*/ or // for comments. Your patch uses /**...*/ style as comments that need fixing. The style you have for try { } catch { } finally { }
Mandy
https://github.com/plevart/jdk8-tl/blob/proxy/test/src/test/WeakCache.java
Regards, Peter
On 4/13/2013 2:59 PM, Peter Levart wrote:
I also devised an alternative caching mechanism with scalability in mind which uses WeakReferences for keys (for example ClassLoader) and values (for example Class) that could be used in this situation in case adding a field to ClassLoader is not an option:
I would also consider any alternative to avoid adding the proxyClassCache field in ClassLoader as Alan commented previously.
My observation of the typical usage of proxies is to use the interface's class loader to define the proxy class. So is it necessary to maintain a per-loader cache? The per-loader cache maps from the interface names to a proxy class defined by one loader. I would think it's reasonable to assume the number of loaders to define proxy class with the same set of interfaces is small. What if we make the cache as "interface names" as the key to a set of proxy class suppliers that can have only one proxy class per one unique defining loader. If the proxy class is being generated i.e. ProxyClassFactory supplier, the loader is available for comparison. When there are more than one matching proxy classes, it would have to iterate all in the set.
I would assume yes, proxy class for a particular set of interfaces is typically defined by one classloader only. But the API allows to specify different loaders as long as the interfaces implemented by proxy class are "visible" from the loader that defines the proxy class. If we're talking about interface names - as opposed to interfaces - then the possibility that a particular set of interface names would want to be used to define proxy classes with different loaders is even bigger, since an interface name can refer to different interfaces with same name (think of interfaces deployed as part of an app in an application server, say a set of annotations used by different apps but deployed as part of each individual app).
Agree. I was tempted to consider making weak reference to the interface classes as the key but in any case the overhead of Class.getClassLoader() is still a performance hog. Let's move forward with the alternative you propose.
The scheme you're proposing might be possible, though not simple: The factory Supplier<Class> would become a Function<ClassLoader, Class> and would have to maintain it's own set of cached proxy classes. There would be a single ConcurrentMap<List<String>, Function<ClassLoader, Class>> to map sets of interface names to factory Functions, but the cached classes in a particular factory Function would still have to be weakly referenced. I see some difficulties in implementing such a scheme: - expunging cleared WeakReferences could only reliably clear the cache inside each factory Function but removing the entry from the map of factory Functions when last proxy class for a particular set of interface names is expunged would become a difficult task if not impossible with all the scalability constraints in mind (just thinking about concurrent requests into same factory Function where one is requesting new proxy class and the other is expunging cleared WeakReference which represents the last element in the set of cached proxy classes). - one of my past ideas of implementing scalable Proxy.isProxyClass() was to maintain a Set<Class> in each ClassLoader populated with all the proxy classes defined by a particular ClassLoader. Benchmarking such solution showed that Class.getClassLoader() is a peformance hog, so I scraped it in favor of ClassValue<Boolean> that is now incorporated in the patch. In order to "choose" the right proxy class from the set of proxy classes inside a particular factory Function, the Class.getClassLoader() method would have to be used, or entries would have to (weakly) reference a particular ClassLoader associated with each proxy class.
Thanks for reminding me your earlier prototype. I suspect the cost of Class.getClassLoader() is due to its lookup of the caller class every time it's called.
Considering all that, such solution starts to look unappealing. It might even be more space-hungry then the presented WeakCache.
WeakCache is currently the following:
ConcurrentMap<WeakReferenceWithInterfaceNames<ClassLoader>, WeakReference<Class>>
another alternative would be:
ConcurrentMap<WeakReference<ClassLoader>, ConcurrentMap<InterfaceNames, WeakReference<Class>>>
...which might need a little less space than WeakCache (only one WeakReference per proxy class + one per ClassLoader instead of two WeakReferences per proxy class) but would require two map lookups during fast-path retrieval. It might not be performance critical and the expunging could be performed easily too.
I am fine with either of these alternatives. As you noted, the latter one would save little bit of memory for the cases when several proxy classes are defined per loader e.g. one per each annotation type. Mandy
Hi Mandy, I prepared a preview variant of j.l.r.Proxy using WeakCache (turned into an interface and a special FlattenedWeakCache implementation in anticipation to create another variant using two-levels of ConcurrentHashMaps for backing storage, but with same API) just to compare performance: https://dl.dropboxusercontent.com/u/101777488/jdk8-tl/proxy-wc/webrev.01/ind... As the values (Class objects of proxy classes) must be wrapped in a WeakReference, the same instance of WeakReference can be re-used as a key in another ConcurrentHashMap to implement quick look-up for Proxy.isProxyClass() method eliminating the need to use ClassValue, which is quite space-hungry. Comparing the performance, here's a summary of all 3 variants (original, patched using a field in ClassLoader and this variant): Summary (4 Cores x 2 Threads i7 CPU): Test Threads ns/op Original Patched (CL field) Patched (WeakCache) ======================= ======= ============== ================== =================== Proxy_getProxyClass 1 2,403.27 163.70 206.88 4 3,039.01 202.77 303.38 8 5,193.58 314.47 442.58 Proxy_isProxyClassTrue 1 95.02 10.78 41.85 4 2,266.29 10.80 42.32 8 4,782.29 20.53 72.29 Proxy_isProxyClassFalse 1 95.02 1.36 1.36 4 2,186.59 1.36 1.37 8 4,891.15 2.72 2.94 Annotation_equals 1 240.10 152.29 193.27 4 1,864.06 153.81 195.60 8 8,639.20 262.09 384.72 The improvement is still quite satisfactory, although a little slower than the direct-field variant. The scalability is the same as with direct-field variant. Space consumption of cache structure, calculated as deep-size of the structure, ignoring interned Strings, Class and ClassLoader objects unsing single non-bootstrap ClassLoader for defining the proxy classes and using 32 bit addressing is the following: original Proxy code: proxy size of delta to classes caches prev.ln. -------- -------- -------- 0 400 400 1 768 368 2 920 152 3 1072 152 4 1224 152 5 1376 152 6 1528 152 7 1680 152 8 1832 152 9 1984 152 10 2136 152 Proxy patched with the variant using FlattenedWeakCache, run on current JDK8/tl tip (still uses old ConcurrentHashMap implementation with segments): proxy size of delta to classes caches prev.ln. -------- -------- -------- 0 560 560 1 936 376 2 1312 376 3 1688 376 4 2064 376 5 2352 288 6 2728 376 7 3016 288 8 3392 376 9 3592 200 10 3872 280 ...and the same with current JDK8/lambda tip (using new segment-less ConcurrentHashMap): proxy size of delta to classes caches prev.ln. -------- -------- -------- 0 240 240 1 584 344 2 768 184 3 952 184 4 1136 184 5 1320 184 6 1504 184 7 1688 184 8 1872 184 9 2056 184 10 2240 184 So with new ConcurrentHashMap the patched Proxy uses about 32 bytes more per proxy class. Is this satisfactory or should we also try a variant with two-levels of ConcurrentHashMaps? Regards, Peter P.S. Comment to your comment in-line... On 04/16/2013 12:58 AM, Mandy Chung wrote:
On 4/13/2013 2:59 PM, Peter Levart wrote:
I also devised an alternative caching mechanism with scalability in mind which uses WeakReferences for keys (for example ClassLoader) and values (for example Class) that could be used in this situation in case adding a field to ClassLoader is not an option:
I would also consider any alternative to avoid adding the proxyClassCache field in ClassLoader as Alan commented previously.
My observation of the typical usage of proxies is to use the interface's class loader to define the proxy class. So is it necessary to maintain a per-loader cache? The per-loader cache maps from the interface names to a proxy class defined by one loader. I would think it's reasonable to assume the number of loaders to define proxy class with the same set of interfaces is small. What if we make the cache as "interface names" as the key to a set of proxy class suppliers that can have only one proxy class per one unique defining loader. If the proxy class is being generated i.e. ProxyClassFactory supplier, the loader is available for comparison. When there are more than one matching proxy classes, it would have to iterate all in the set.
I would assume yes, proxy class for a particular set of interfaces is typically defined by one classloader only. But the API allows to specify different loaders as long as the interfaces implemented by proxy class are "visible" from the loader that defines the proxy class. If we're talking about interface names - as opposed to interfaces - then the possibility that a particular set of interface names would want to be used to define proxy classes with different loaders is even bigger, since an interface name can refer to different interfaces with same name (think of interfaces deployed as part of an app in an application server, say a set of annotations used by different apps but deployed as part of each individual app).
Agree. I was tempted to consider making weak reference to the interface classes as the key but in any case the overhead of Class.getClassLoader() is still a performance hog. Let's move forward with the alternative you propose.
The scheme you're proposing might be possible, though not simple: The factory Supplier<Class> would become a Function<ClassLoader, Class> and would have to maintain it's own set of cached proxy classes. There would be a single ConcurrentMap<List<String>, Function<ClassLoader, Class>> to map sets of interface names to factory Functions, but the cached classes in a particular factory Function would still have to be weakly referenced. I see some difficulties in implementing such a scheme: - expunging cleared WeakReferences could only reliably clear the cache inside each factory Function but removing the entry from the map of factory Functions when last proxy class for a particular set of interface names is expunged would become a difficult task if not impossible with all the scalability constraints in mind (just thinking about concurrent requests into same factory Function where one is requesting new proxy class and the other is expunging cleared WeakReference which represents the last element in the set of cached proxy classes). - one of my past ideas of implementing scalable Proxy.isProxyClass() was to maintain a Set<Class> in each ClassLoader populated with all the proxy classes defined by a particular ClassLoader. Benchmarking such solution showed that Class.getClassLoader() is a peformance hog, so I scraped it in favor of ClassValue<Boolean> that is now incorporated in the patch. In order to "choose" the right proxy class from the set of proxy classes inside a particular factory Function, the Class.getClassLoader() method would have to be used, or entries would have to (weakly) reference a particular ClassLoader associated with each proxy class.
Thanks for reminding me your earlier prototype. I suspect the cost of Class.getClassLoader() is due to its lookup of the caller class every time it's called.
Even without SecurityManager installed the performance of native getClassLoader0 was a hog. I don't know why? Isn't there an implicit reference to defining ClassLoader from every Class object?
Considering all that, such solution starts to look unappealing. It might even be more space-hungry then the presented WeakCache.
WeakCache is currently the following:
ConcurrentMap<WeakReferenceWithInterfaceNames<ClassLoader>, WeakReference<Class>>
another alternative would be:
ConcurrentMap<WeakReference<ClassLoader>, ConcurrentMap<InterfaceNames, WeakReference<Class>>>
...which might need a little less space than WeakCache (only one WeakReference per proxy class + one per ClassLoader instead of two WeakReferences per proxy class) but would require two map lookups during fast-path retrieval. It might not be performance critical and the expunging could be performed easily too.
I am fine with either of these alternatives. As you noted, the latter one would save little bit of memory for the cases when several proxy classes are defined per loader e.g. one per each annotation type.
Mandy
On 4/16/2013 7:18 AM, Peter Levart wrote:
Hi Mandy,
I prepared a preview variant of j.l.r.Proxy using WeakCache (turned into an interface and a special FlattenedWeakCache implementation in anticipation to create another variant using two-levels of ConcurrentHashMaps for backing storage, but with same API) just to compare performance:
https://dl.dropboxusercontent.com/u/101777488/jdk8-tl/proxy-wc/webrev.01/ind...
thanks for getting this prototype done quickly.
As the values (Class objects of proxy classes) must be wrapped in a WeakReference, the same instance of WeakReference can be re-used as a key in another ConcurrentHashMap to implement quick look-up for Proxy.isProxyClass() method eliminating the need to use ClassValue, which is quite space-hungry.
I also think maintaining another ConcurrentHashMap is a good replacement with the use of ClassValue to avoid its memory overhead.
Comparing the performance, here's a summary of all 3 variants (original, patched using a field in ClassLoader and this variant):
[...]
The improvement is still quite satisfactory, although a little slower than the direct-field variant. The scalability is the same as with direct-field variant.
Agree - the improvement is quite good.
Space consumption of cache structure, calculated as deep-size of the structure, ignoring interned Strings, Class and ClassLoader objects unsing single non-bootstrap ClassLoader for defining the proxy classes and using 32 bit addressing is the following:
[...]
So with new ConcurrentHashMap the patched Proxy uses about 32 bytes more per proxy class.
Is this satisfactory or should we also try a variant with two-levels of ConcurrentHashMaps?
The overhead seems okay to trade off the scalability. Since you have prepared for doing another variant, it'd be good to compare two prototypes if this doesn't bring too much work :) I would imagine that there might be slight difference in your measurement when comparing with proxies defined by a single class loader but the code might be simpler (might not be if you keep the same API but different implementation). Regardless of which approach to use - you have added a general purpose WeakCache and the implementation class in the sun.misc package. While it's good to have such class for other jdk class to use, I am more comfortable in keeping it as a private class for proxy implementation to use. We need existing applications to migrate away from sun.misc and other private APIs to prepare for modularization. Nits: can you wrap the lines around 80 columns including comments? try-catch-finally statements need some formatting fixes. Our convention is to have 'catch', or 'finally' following the closing bracket '}' in the same line. Your editor breaks 'catch' or 'finally' into the next line.
Even without SecurityManager installed the performance of native getClassLoader0 was a hog. I don't know why? Isn't there an implicit reference to defining ClassLoader from every Class object?
That's right - it looks for the caller class only if the security manager is installed. The defining class loader is kept in the VM's Klass object (language-level Class instance representation in the VM) and there is no computation needed to obtain a defining class loader of a given Class object. I can only think of the Java <-> native transition overhead that could be one factor. Class.getClassLoader0 is not intrinsified. I'll find out (others on this mailing list may probably know). Mandy
Hi Mandy, Here's the updated webrev: https://dl.dropboxusercontent.com/u/101777488/jdk8-tl/proxy-wc/webrev.02/ind... This adds TwoLevelWeakCache to the scene with following performance compared to other alternatives: Summary (4 Cores x 2 Threads i7 CPU): Test Threads ns/op Original Patch(CL field) FlattenedWeakCache TwoLevelWeakCache ======================= ======= ============== =============== ================== ================= Proxy_getProxyClass 1 2,403.27 163.70 206.88 252.89 4 3,039.01 202.77 303.38 327.62 8 5,193.58 314.47 442.58 510.63 Proxy_isProxyClassTrue 1 95.02 10.78 41.85 42.03 4 2,266.29 10.80 42.32 42.07 8 4,782.29 20.53 72.29 69.25 Proxy_isProxyClassFalse 1 95.02 1.36 1.36 1.36 4 2,186.59 1.36 1.37 1.40 8 4,891.15 2.72 2.94 2.72 Annotation_equals 1 240.10 152.29 193.27 200.45 4 1,864.06 153.81 195.60 202.45 8 8,639.20 262.09 384.72 338.70 As expected, the Proxy.getProxyClass() is yet a little slower than with FlattenedWeakCache, but still much faster than original Proxy implementation. Another lookup in the ConcurrentHashMap and another indirection have a price, but we also get something in return - space. This is all obtained on latest lambda build (with new segment-less ConcurrentHashMap). I also added another ClassLoader to see what happens when the 2nd is added to the cache: # Original Proxy, 32 bit addressing class proxy size of delta to loaders classes caches prev.ln. -------- -------- -------- -------- 0 0 400 400 1 1 768 368 1 2 920 152 1 3 1072 152 1 4 1224 152 1 5 1376 152 1 6 1528 152 1 7 1680 152 1 8 1832 152 1 9 1984 152 1 10 2136 152 2 11 2456 320 2 12 2672 216 2 13 2824 152 2 14 2976 152 2 15 3128 152 2 16 3280 152 2 17 3432 152 2 18 3584 152 2 19 3736 152 2 20 3888 152 # Original Proxy, 64 bit addressing class proxy size of delta to loaders classes caches prev.ln. -------- -------- -------- -------- 0 0 632 632 1 1 1216 584 1 2 1448 232 1 3 1680 232 1 4 1912 232 1 5 2144 232 1 6 2376 232 1 7 2608 232 1 8 2840 232 1 9 3072 232 1 10 3304 232 2 11 3832 528 2 12 4192 360 2 13 4424 232 2 14 4656 232 2 15 4888 232 2 16 5120 232 2 17 5352 232 2 18 5584 232 2 19 5816 232 2 20 6048 232 # Patched Proxy (FlattenedWeakCache), 32 bit addressing class proxy size of delta to loaders classes caches prev.ln. -------- -------- -------- -------- 0 0 240 240 1 1 584 344 1 2 768 184 1 3 952 184 1 4 1136 184 1 5 1320 184 1 6 1504 184 1 7 1688 184 1 8 1872 184 1 9 2056 184 1 10 2240 184 2 11 2424 184 2 12 2736 312 2 13 2920 184 2 14 3104 184 2 15 3288 184 2 16 3472 184 2 17 3656 184 2 18 3840 184 2 19 4024 184 2 20 4208 184 # Patched Proxy (FlattenedWeakCache), 64 bit addressing class proxy size of delta to loaders classes caches prev.ln. -------- -------- -------- -------- 0 0 336 336 1 1 920 584 1 2 1200 280 1 3 1480 280 1 4 1760 280 1 5 2040 280 1 6 2320 280 1 7 2600 280 1 8 2880 280 1 9 3160 280 1 10 3440 280 2 11 3720 280 2 12 4256 536 2 13 4536 280 2 14 4816 280 2 15 5096 280 2 16 5376 280 2 17 5656 280 2 18 5936 280 2 19 6216 280 2 20 6496 280 # Patched Proxy (TwoLevelWeakCache), 32 bit addressing class proxy size of delta to loaders classes caches prev.ln. -------- -------- -------- -------- 0 0 240 240 1 1 752 512 1 2 896 144 1 3 1040 144 1 4 1184 144 1 5 1328 144 1 6 1472 144 1 7 1616 144 1 8 1760 144 1 9 1904 144 1 10 2048 144 2 11 2400 352 2 12 2608 208 2 13 2752 144 2 14 2896 144 2 15 3040 144 2 16 3184 144 2 17 3328 144 2 18 3472 144 2 19 3616 144 2 20 3760 144 # Patched Proxy (TwoLevelWeakCache), 64 bit addressing class proxy size of delta to loaders classes caches prev.ln. -------- -------- -------- -------- 0 0 336 336 1 1 1216 880 1 2 1440 224 1 3 1664 224 1 4 1888 224 1 5 2112 224 1 6 2336 224 1 7 2560 224 1 8 2784 224 1 9 3008 224 1 10 3232 224 2 11 3808 576 2 12 4160 352 2 13 4384 224 2 14 4608 224 2 15 4832 224 2 16 5056 224 2 17 5280 224 2 18 5504 224 2 19 5728 224 2 20 5952 224 So we loose approx. 32 bytes (32bit addresses) or 48 bytes (64 bit addresses) for each proxy class compared to original code when using FlattenedWeakCache, but we gain 8 bytes (32 bit or 64 bit addresses) for each proxy class cached compared to original code when using TwoLevelWeakCache. So which to favour, space or time? Other comments in-line... On 04/17/2013 07:31 AM, Mandy Chung wrote:
On 4/16/2013 7:18 AM, Peter Levart wrote:
Hi Mandy,
I prepared a preview variant of j.l.r.Proxy using WeakCache (turned into an interface and a special FlattenedWeakCache implementation in anticipation to create another variant using two-levels of ConcurrentHashMaps for backing storage, but with same API) just to compare performance:
https://dl.dropboxusercontent.com/u/101777488/jdk8-tl/proxy-wc/webrev.01/ind...
thanks for getting this prototype done quickly.
As the values (Class objects of proxy classes) must be wrapped in a WeakReference, the same instance of WeakReference can be re-used as a key in another ConcurrentHashMap to implement quick look-up for Proxy.isProxyClass() method eliminating the need to use ClassValue, which is quite space-hungry.
I also think maintaining another ConcurrentHashMap is a good replacement with the use of ClassValue to avoid its memory overhead.
Comparing the performance, here's a summary of all 3 variants (original, patched using a field in ClassLoader and this variant):
[...]
The improvement is still quite satisfactory, although a little slower than the direct-field variant. The scalability is the same as with direct-field variant.
Agree - the improvement is quite good.
Space consumption of cache structure, calculated as deep-size of the structure, ignoring interned Strings, Class and ClassLoader objects unsing single non-bootstrap ClassLoader for defining the proxy classes and using 32 bit addressing is the following:
[...]
So with new ConcurrentHashMap the patched Proxy uses about 32 bytes more per proxy class.
Is this satisfactory or should we also try a variant with two-levels of ConcurrentHashMaps?
The overhead seems okay to trade off the scalability.
Since you have prepared for doing another variant, it'd be good to compare two prototypes if this doesn't bring too much work :) I would imagine that there might be slight difference in your measurement when comparing with proxies defined by a single class loader but the code might be simpler (might not be if you keep the same API but different implementation).
With TwoLevelWeakCache, there is a "step" of 108 bytes (32bit addresses) when new ClassLoader is encoutered (new 2nd level ConcurrentHashMap is allocated and new entry added to 1st level CHM. There's no such "step" in FlattenedWeakCache (modulo the steps when the CHMs are itself resized). So we roughly have 108 bytes wasted for each new ClassLoader encountered with TwoLevelWeakCache vs. FlattenedWeakCache, but we also have 40 bytes spared for each proxy class cached. TwoLevelWeakCache starts to pay off if there are at least 3 proxy classes defined per ClassLoader in average.
Regardless of which approach to use - you have added a general purpose WeakCache and the implementation class in the sun.misc package. While it's good to have such class for other jdk class to use, I am more comfortable in keeping it as a private class for proxy implementation to use. We need existing applications to migrate away from sun.misc and other private APIs to prepare for modularization.
What about package-private in java.lang.reflect? It makes Proxy itself much easier to read. When we decide which way to go, I can remove the interface and only leave a single package-private class...
Nits: can you wrap the lines around 80 columns including comments? try-catch-finally statements need some formatting fixes. Our convention is to have 'catch', or 'finally' following the closing bracket '}' in the same line. Your editor breaks 'catch' or 'finally' into the next line.
Fixed. Regards, Peter
Even without SecurityManager installed the performance of native getClassLoader0 was a hog. I don't know why? Isn't there an implicit reference to defining ClassLoader from every Class object?
That's right - it looks for the caller class only if the security manager is installed. The defining class loader is kept in the VM's Klass object (language-level Class instance representation in the VM) and there is no computation needed to obtain a defining class loader of a given Class object. I can only think of the Java <-> native transition overhead that could be one factor. Class.getClassLoader0 is not intrinsified. I'll find out (others on this mailing list may probably know).
Mandy
Hi Peter, On 4/17/13 7:18 AM, Peter Levart wrote:
As expected, the Proxy.getProxyClass() is yet a little slower than with FlattenedWeakCache, but still much faster than original Proxy implementation. Another lookup in the ConcurrentHashMap and another indirection have a price, but we also get something in return - space.
[...]
So we loose approx. 32 bytes (32bit addresses) or 48 bytes (64 bit addresses) for each proxy class compared to original code when using FlattenedWeakCache, but we gain 8 bytes (32 bit or 64 bit addresses) for each proxy class cached compared to original code when using TwoLevelWeakCache. So which to favour, space or time?
With TwoLevelWeakCache, there is a "step" of 108 bytes (32bit addresses) when new ClassLoader is encoutered (new 2nd level ConcurrentHashMap is allocated and new entry added to 1st level CHM. There's no such "step" in FlattenedWeakCache (modulo the steps when the CHMs are itself resized). So we roughly have 108 bytes wasted for each new ClassLoader encountered with TwoLevelWeakCache vs. FlattenedWeakCache, but we also have 40 bytes spared for each proxy class cached. TwoLevelWeakCache starts to pay off if there are at least 3 proxy classes defined per ClassLoader in average.
Thanks for the detailed measurement and analysis. Although the extra lookup on the per-loader cache incurs additional overhead on Proxy.getProxyClass, it still shows 10x speed on your performance measurement result which is very good. Since the performance improvement is significant on both approaches, the memory saving would be the desirable choice.Especially Java SE 8 profiles [1] can run on small devices and we should be cautious about the space and speed tradeoff.I'll support going for per-loader cache (i.e. two-level weak cache). Another point - Proxies are used in the JDK to implement annotations, java.beans.EventHandler, JMX MXBeans mapping, JMX MBean proxies, Invocable interface by script engines, and RMI/serialization. JAXWS also uses proxies to support @javax.xml.bind.annotation.XmlElement. For the case of annotations and JMX, the number of generated proxy classes would typically not be a couple that depends on what interfaces application define. In EE environment, there will be many class loaders running. It'd be better to prepare to work the moderate number, if not high, of proxy classes and multiple loaders case.
https://dl.dropboxusercontent.com/u/101777488/jdk8-tl/proxy-wc/webrev.02/ind... What about package-private in java.lang.reflect? It makes Proxy itself much easier to read. When we decide which way to go, I can remove the interface and only leave a single package-private class...
thanks. Moving it to a single package-private classin java.lang.reflectand remove the interface sounds good. I have merged your patch with the recent TL repo and did some clean up while reviewing it. Some comments: 1. getProxyClass0 should validate the input interfaces and throw IAE if any illegal argument (e.g. interfaces are not visible to the given loader) before calling proxyClassCache.get(loader, interfaces). I moved back the validation code from ProxyClassFactory.apply to getProxyClass0. 2. I did some cleanup to restore some original comments to make the diffs clearer where the change is. 3. I removed the newInstance method which is dead code after my previous code. Since we are in this code, I took the chance to clean that up and also change a couple for-loop to use for-each. I have put the merged and slightly modified Proxy.java and webrev at: http://cr.openjdk.java.net/~mchung/jdk8/webrevs/7123493/webrev.00/ We will use this bug for your contribution: 7123493 : (proxy) Proxy.getProxyClass doesn't scale under high load For the weak cache class, since it's for proxy implementation use, I suggest to take out the supportContainsValueOperation flagas it always keeps the reverse map for isProxyClass lookup. You can simply call Objects.requireNonNull(param) instead of requireNonNull(param, "param-name") since the proxy implementation should have made sure the argument is non-null. Formatting nits: 68 Object cacheKey = CacheKey.valueOf( 69 key, 70 refQueue 71 ); should be: all in one line or line break on a long-line. Same for method signature. 237 void expungeFrom( 238 ConcurrentMap<?, ? extends ConcurrentMap<?, ?>> map, 239 ConcurrentMap<?, Boolean> reverseMap 240 ); should be: void expungeFrom(ConcurrentMap<?, ? extends ConcurrentMap<?, ?>> map, ConcurrentMap<?, Boolean> reverseMap); so that it'll be more consistent with the existing code. I'll do a detailed review on the weak cache class as you will finalize the code per the decision to go with the two-level weak cache. Thanks again for the good work. Mandy [1] http://openjdk.java.net/jeps/161
Hi Mandy, On 04/19/2013 07:33 AM, Mandy Chung wrote:
https://dl.dropboxusercontent.com/u/101777488/jdk8-tl/proxy-wc/webrev.02/ind...
What about package-private in java.lang.reflect? It makes Proxy itself much easier to read. When we decide which way to go, I can remove the interface and only leave a single package-private class...
thanks. Moving it to a single package-private classin java.lang.reflectand remove the interface sounds good.
Right.
I have merged your patch with the recent TL repo and did some clean up while reviewing it. Some comments: 1. getProxyClass0 should validate the input interfaces and throw IAE if any illegal argument (e.g. interfaces are not visible to the given loader) before calling proxyClassCache.get(loader, interfaces). I moved back the validation code from ProxyClassFactory.apply to getProxyClass0.
Ops, you're right. There could be a request with interface(s) with same name(s) but loaded by different loader(s) and such code could return wrong pre-cached proxy class instead of throwing exception. Unfortunately, moving validation to slow-path was the cause of major performance and scalability improvement observed. With validation moved back to getProxyClass0 (in spite of using two-level WeakCache), we have the following performance (WeakCache#1): Summary (4 Cores x 2 Threads i7 CPU): Test Threads ns/op Original WeakCache#1 ======================= ======= ============== =========== Proxy_getProxyClass 1 2,403.27 2,174.51 4 3,039.01 2,555.00 8 5,193.58 4,273.42 Proxy_isProxyClassTrue 1 95.02 43.14 4 2,266.29 43.23 8 4,782.29 72.06 Proxy_isProxyClassFalse 1 95.02 1.36 4 2,186.59 1.36 8 4,891.15 2.72 Annotation_equals 1 240.10 195.68 4 1,864.06 201.41 8 8,639.20 337.46 It's a little better than original getProxyClass(), but not much. The isProxyClass() and consequently Annotation.equals() are far better though. But as you might have guessed, I kind of solved that too... My first attempt was to optimize the interface validation. Only the "visibility" part is necessary to be performed on fast-path. Uniqueness and other things can be performed on slow-path. With split-validation and hacks like: private static final MethodHandle findLoadedClass0MH, findBootstrapClassMH; private static final ClassLoader dummyCL = new ClassLoader() {}; static { try { Method method = ClassLoader.class.getDeclaredMethod("findLoadedClass0", String.class); method.setAccessible(true); findLoadedClass0MH = MethodHandles.lookup().unreflect(method); method = ClassLoader.class.getDeclaredMethod("findBootstrapClass", String.class); method.setAccessible(true); findBootstrapClassMH = MethodHandles.lookup().unreflect(method); } catch (NoSuchMethodException e) { throw (Error) new NoSuchMethodError(e.getMessage()).initCause(e); } catch (IllegalAccessException e) { throw (Error) new IllegalAccessError(e.getMessage()).initCause(e); } } static Class<?> findLoadedClass(ClassLoader loader, String name) { try { if (VM.isSystemDomainLoader(loader)) { return (Class<?>) findBootstrapClassMH.invokeExact(dummyCL, name); } else { return (Class<?>) findLoadedClass0MH.invokeExact(loader, name); } } catch (RuntimeException | Error e) { throw e; } catch (Throwable t) { throw new UndeclaredThrowableException(t); } } ... using findLoadedClass(loader, intf.getName()) and only doing Class.forName(intf.getName(), false, loader) if the former returned null ... I managed to reclaim some performance (WeakCache#1+): Test Threads ns/op Original WeakCache#1 WeakCache#1+ ======================= ======= ============== =========== ============ Proxy_getProxyClass 1 2,403.27 2,174.51 1,589.36 4 3,039.01 2,555.00 1,929.11 8 5,193.58 4,273.42 3,409.77 ...but that was still not very satisfactory. I modified the KeyFactory to create keys that weakly-reference interface Class objects and implement hashCode/equals in terms of comparing those Class objects. With such keys, no false aliasing can occur and the whole validation can be pushed back to slow-path. I tried hard to create keys with as little space overhead as possible: http://dl.dropboxusercontent.com/u/101777488/jdk8-tl/proxy-wc-wi/webrev.01/i... ...but there can be no miracles. The good news is that the performance is back (WeakCache#2): Summary (4 Cores x 2 Threads i7 CPU): Test Threads ns/op Original WeakCache#1 WeakCache#2 ======================= ======= ============== =========== =========== Proxy_getProxyClass 1 2,403.27 2,174.51 163.57 4 3,039.01 2,555.00 211.70 8 5,193.58 4,273.42 322.14 Proxy_isProxyClassTrue 1 95.02 43.14 41.23 4 2,266.29 43.23 42.20 8 4,782.29 72.06 72.21 Proxy_isProxyClassFalse 1 95.02 1.36 1.36 4 2,186.59 1.36 1.36 8 4,891.15 2.72 2.72 Annotation_equals 1 240.10 195.68 194.61 4 1,864.06 201.41 198.81 8 8,639.20 337.46 342.90 ... and the most common usage (proxy class implementing exactly one interface) uses even less space than with interface-names-key - 16 bytes saved per proxy class vs. 8 bytes saved (32 bit addressing mode): -------------------------------------- Original j.l.r.Proxy 1 interfaces / proxy class class proxy size of delta to loaders classes caches prev.ln. -------- -------- -------- -------- 0 0 400 400 1 1 768 368 1 2 920 152 1 3 1072 152 1 4 1224 152 1 5 1376 152 1 6 1528 152 1 7 1680 152 1 8 1832 152 2 9 2152 320 2 10 2304 152 2 11 2456 152 2 12 2672 216 2 13 2824 152 2 14 2976 152 2 15 3128 152 2 16 3280 152 -------------------------------------- Patched j.l.r.Proxy 1 interfaces / proxy class class proxy size of delta to loaders classes caches prev.ln. -------- -------- -------- -------- 0 0 240 240 1 1 792 552 1 2 928 136 1 3 1064 136 1 4 1200 136 1 5 1336 136 1 6 1472 136 1 7 1608 136 1 8 1744 136 2 9 2088 344 2 10 2224 136 2 11 2360 136 2 12 2560 200 2 13 2696 136 2 14 2832 136 2 15 2968 136 2 16 3104 136 Did you know, that Proxy.getProxyClass() can generate proxy classes implementing 0 interfaces? It can. There's exactly one such class generated per class loader: -------------------------------------- Original j.l.r.Proxy 0 interfaces / proxy class class proxy size of delta to loaders classes caches prev.ln. -------- -------- -------- -------- 0 0 400 400 1 1 760 360 2 2 1072 312 -------------------------------------- Patched j.l.r.Proxy 0 interfaces / proxy class class proxy size of delta to loaders classes caches prev.ln. -------- -------- -------- -------- 0 0 240 240 1 1 728 488 2 2 1040 312 With 2 or more interfaces implemented by proxy class the space overhead increases faster than with original Proxy: -------------------------------------- Original j.l.r.Proxy 2 interfaces / proxy class class proxy size of delta to loaders classes caches prev.ln. -------- -------- -------- -------- 0 0 400 400 1 1 768 368 1 2 920 152 1 3 1072 152 1 4 1224 152 1 5 1376 152 1 6 1528 152 1 7 1680 152 1 8 1832 152 2 9 2152 320 2 10 2304 152 2 11 2456 152 2 12 2672 216 2 13 2824 152 2 14 2976 152 2 15 3128 152 2 16 3280 152 -------------------------------------- Patched j.l.r.Proxy 2 interfaces / proxy class class proxy size of delta to loaders classes caches prev.ln. -------- -------- -------- -------- 0 0 240 240 1 1 832 592 1 2 1008 176 1 3 1184 176 1 4 1360 176 1 5 1536 176 1 6 1712 176 1 7 1888 176 1 8 2064 176 2 9 2448 384 2 10 2624 176 2 11 2800 176 2 12 3040 240 2 13 3216 176 2 14 3392 176 2 15 3568 176 2 16 3744 176 -------------------------------------- Original j.l.r.Proxy 3 interfaces / proxy class class proxy size of delta to loaders classes caches prev.ln. -------- -------- -------- -------- 0 0 400 400 1 1 776 376 1 2 936 160 1 3 1096 160 1 4 1256 160 1 5 1416 160 1 6 1576 160 1 7 1736 160 1 8 1896 160 2 9 2224 328 2 10 2384 160 2 11 2544 160 2 12 2768 224 2 13 2928 160 2 14 3088 160 2 15 3248 160 2 16 3408 160 -------------------------------------- Patched j.l.r.Proxy 3 interfaces / proxy class class proxy size of delta to loaders classes caches prev.ln. -------- -------- -------- -------- 0 0 240 240 1 1 864 624 1 2 1072 208 1 3 1280 208 1 4 1488 208 1 5 1696 208 1 6 1904 208 1 7 2112 208 1 8 2320 208 2 9 2736 416 2 10 2944 208 2 11 3152 208 2 12 3424 272 2 13 3632 208 2 14 3840 208 2 15 4048 208 2 16 4256 208 -------------------------------------- Original j.l.r.Proxy 4 interfaces / proxy class class proxy size of delta to loaders classes caches prev.ln. -------- -------- -------- -------- 0 0 400 400 1 1 776 376 1 2 936 160 1 3 1096 160 1 4 1256 160 1 5 1416 160 1 6 1576 160 1 7 1736 160 1 8 1896 160 2 9 2224 328 2 10 2384 160 2 11 2544 160 2 12 2768 224 2 13 2928 160 2 14 3088 160 2 15 3248 160 2 16 3408 160 -------------------------------------- Patched j.l.r.Proxy 4 interfaces / proxy class class proxy size of delta to loaders classes caches prev.ln. -------- -------- -------- -------- 0 0 240 240 1 1 896 656 1 2 1136 240 1 3 1376 240 1 4 1616 240 1 5 1856 240 1 6 2096 240 1 7 2336 240 1 8 2576 240 2 9 3024 448 2 10 3264 240 2 11 3504 240 2 12 3808 304 2 13 4048 240 2 14 4288 240 2 15 4528 240 2 16 4768 240 There's an increase of 8 bytes per proxy class key for each 2 interfaces added to proxy class in original Proxy code, but there's an increase of 32 bytes per proxy class key for each single interface added in patched Proxy code. I think the most common usage is to implement a single interface and there is 16 bytes gained for each such usage compared to original Proxy code.
2. I did some cleanup to restore some original comments to make the diffs clearer where the change is. 3. I removed the newInstance method which is dead code after my previous code. Since we are in this code, I took the chance to clean that up and also change a couple for-loop to use for-each.
I have put the merged and slightly modified Proxy.java and webrev at: http://cr.openjdk.java.net/~mchung/jdk8/webrevs/7123493/webrev.00/
We will use this bug for your contribution: 7123493 : (proxy) Proxy.getProxyClass doesn't scale under high load
I took j.l.r.Proxy file from your webrev and just changed the KeyFactory implementation. WeakCache is generic and can be used unchanged with either implementation of KeyFactory.
For the weak cache class, since it's for proxy implementation use, I suggest to take out the supportContainsValueOperation flagas it always keeps the reverse map for isProxyClass lookup.
You can simply call Objects.requireNonNull(param) instead of requireNonNull(param, "param-name") since the proxy implementation should have made sure the argument is non-null.
Formatting nits:
68 Object cacheKey = CacheKey.valueOf( 69 key, 70 refQueue 71 );
should be: all in one line or line break on a long-line. Same for method signature.
237 void expungeFrom( 238 ConcurrentMap<?, ? extends ConcurrentMap<?, ?>> map, 239 ConcurrentMap<?, Boolean> reverseMap 240 );
should be:
void expungeFrom(ConcurrentMap<?, ? extends ConcurrentMap<?, ?>> map, ConcurrentMap<?, Boolean> reverseMap);
so that it'll be more consistent with the existing code. I'll do a detailed review on the weak cache class as you will finalize the code per the decision to go with the two-level weak cache.
I hope I have addressed all that in above webrev. Regards, Peter
Thanks again for the good work.
Mandy [1] http://openjdk.java.net/jeps/161
On 04/19/2013 04:36 PM, Peter Levart wrote:
http://dl.dropboxusercontent.com/u/101777488/jdk8-tl/proxy-wc-wi/webrev.01/i...
Hi Mandy, I changed the copyright header of WeakCache to GPLv2 with ClassPath exception and corrected a minor formatting inconsistency: http://dl.dropboxusercontent.com/u/101777488/jdk8-tl/proxy-wc-wi/webrev.02/i... Regards, Peter
Hi Mandy, I just noticed the following (have been thinking of it already, but then forgot) in original code: /* 512 * Note that we need not worry about reaping the cache for 513 * entries with cleared weak references because if a proxy class 514 * has been garbage collected, its class loader will have been 515 * garbage collected as well, so the entire cache will be reaped 516 * from the loaderToCache map. 517 */ Each ClassLoader maintains explicit hard-references to all Class objects for classes defined by the loader. So proxy Class object can not be GC-ed until the ClassLoader is GC-ed. So we need not register the CacheValue objects in WeakCache with a refQueue. The expunging of reverseMap entries is already performed with CacheKey when it is cleared and equeued. There's no harm as it is, since the clean-up is performed with all the checks and is idempotent, but it need not be done for ClassValue objects holding weak references to proxy Class objects. I'll do that change to WeakCache in next webrev together with any other possible changes that might be needed. Regards, Peter On 04/19/2013 05:43 PM, Peter Levart wrote:
On 04/19/2013 04:36 PM, Peter Levart wrote:
http://dl.dropboxusercontent.com/u/101777488/jdk8-tl/proxy-wc-wi/webrev.01/i...
Hi Mandy,
I changed the copyright header of WeakCache to GPLv2 with ClassPath exception and corrected a minor formatting inconsistency:
http://dl.dropboxusercontent.com/u/101777488/jdk8-tl/proxy-wc-wi/webrev.02/i...
Regards, Peter
Hi Mandy, I have another idea. Before jumping to implement it, I will first ask what do you think of it. For example: - have an optimal interface names key calculated from interfaces. - visibility of interfaces and other validations are pushed to slow-path (inside ProxyClassFactory.apply) - the proxy Class object returned from WeakCache.get() is post-validated with a check like: for (Class<?> intf : interfaces) { if (!intf.isAssignableFrom(proxyClass)) { throw new IllegalArgumentException(); } } // return post-validated proxyClass from getProxyClass0()... I feel that Class.isAssignableFrom(Class) check could be much faster and that the only reason the check can fail is if some interface is not visible from the class loader. Am I correct? Regards, Peter On 04/19/2013 04:36 PM, Peter Levart wrote:
Hi Mandy,
On 04/19/2013 07:33 AM, Mandy Chung wrote:
https://dl.dropboxusercontent.com/u/101777488/jdk8-tl/proxy-wc/webrev.02/ind...
What about package-private in java.lang.reflect? It makes Proxy itself much easier to read. When we decide which way to go, I can remove the interface and only leave a single package-private class...
thanks. Moving it to a single package-private classin java.lang.reflectand remove the interface sounds good.
Right.
I have merged your patch with the recent TL repo and did some clean up while reviewing it. Some comments: 1. getProxyClass0 should validate the input interfaces and throw IAE if any illegal argument (e.g. interfaces are not visible to the given loader) before calling proxyClassCache.get(loader, interfaces). I moved back the validation code from ProxyClassFactory.apply to getProxyClass0.
Ops, you're right. There could be a request with interface(s) with same name(s) but loaded by different loader(s) and such code could return wrong pre-cached proxy class instead of throwing exception. Unfortunately, moving validation to slow-path was the cause of major performance and scalability improvement observed. With validation moved back to getProxyClass0 (in spite of using two-level WeakCache), we have the following performance (WeakCache#1):
Summary (4 Cores x 2 Threads i7 CPU):
Test Threads ns/op Original WeakCache#1 ======================= ======= ============== =========== Proxy_getProxyClass 1 2,403.27 2,174.51 4 3,039.01 2,555.00 8 5,193.58 4,273.42
Proxy_isProxyClassTrue 1 95.02 43.14 4 2,266.29 43.23 8 4,782.29 72.06
Proxy_isProxyClassFalse 1 95.02 1.36 4 2,186.59 1.36 8 4,891.15 2.72
Annotation_equals 1 240.10 195.68 4 1,864.06 201.41 8 8,639.20 337.46
It's a little better than original getProxyClass(), but not much. The isProxyClass() and consequently Annotation.equals() are far better though. But as you might have guessed, I kind of solved that too...
My first attempt was to optimize the interface validation. Only the "visibility" part is necessary to be performed on fast-path. Uniqueness and other things can be performed on slow-path. With split-validation and hacks like:
private static final MethodHandle findLoadedClass0MH, findBootstrapClassMH; private static final ClassLoader dummyCL = new ClassLoader() {};
static { try { Method method = ClassLoader.class.getDeclaredMethod("findLoadedClass0", String.class); method.setAccessible(true); findLoadedClass0MH = MethodHandles.lookup().unreflect(method);
method = ClassLoader.class.getDeclaredMethod("findBootstrapClass", String.class); method.setAccessible(true); findBootstrapClassMH = MethodHandles.lookup().unreflect(method); } catch (NoSuchMethodException e) { throw (Error) new NoSuchMethodError(e.getMessage()).initCause(e); } catch (IllegalAccessException e) { throw (Error) new IllegalAccessError(e.getMessage()).initCause(e); } }
static Class<?> findLoadedClass(ClassLoader loader, String name) { try { if (VM.isSystemDomainLoader(loader)) { return (Class<?>) findBootstrapClassMH.invokeExact(dummyCL, name); } else { return (Class<?>) findLoadedClass0MH.invokeExact(loader, name); } } catch (RuntimeException | Error e) { throw e; } catch (Throwable t) { throw new UndeclaredThrowableException(t); } }
... using findLoadedClass(loader, intf.getName()) and only doing Class.forName(intf.getName(), false, loader) if the former returned null ... I managed to reclaim some performance (WeakCache#1+):
Test Threads ns/op Original WeakCache#1 WeakCache#1+ ======================= ======= ============== =========== ============ Proxy_getProxyClass 1 2,403.27 2,174.51 1,589.36 4 3,039.01 2,555.00 1,929.11 8 5,193.58 4,273.42 3,409.77
...but that was still not very satisfactory.
I modified the KeyFactory to create keys that weakly-reference interface Class objects and implement hashCode/equals in terms of comparing those Class objects. With such keys, no false aliasing can occur and the whole validation can be pushed back to slow-path. I tried hard to create keys with as little space overhead as possible:
http://dl.dropboxusercontent.com/u/101777488/jdk8-tl/proxy-wc-wi/webrev.01/i...
...but there can be no miracles. The good news is that the performance is back (WeakCache#2):
Summary (4 Cores x 2 Threads i7 CPU):
Test Threads ns/op Original WeakCache#1 WeakCache#2 ======================= ======= ============== =========== =========== Proxy_getProxyClass 1 2,403.27 2,174.51 163.57 4 3,039.01 2,555.00 211.70 8 5,193.58 4,273.42 322.14
Proxy_isProxyClassTrue 1 95.02 43.14 41.23 4 2,266.29 43.23 42.20 8 4,782.29 72.06 72.21
Proxy_isProxyClassFalse 1 95.02 1.36 1.36 4 2,186.59 1.36 1.36 8 4,891.15 2.72 2.72
Annotation_equals 1 240.10 195.68 194.61 4 1,864.06 201.41 198.81 8 8,639.20 337.46 342.90
... and the most common usage (proxy class implementing exactly one interface) uses even less space than with interface-names-key - 16 bytes saved per proxy class vs. 8 bytes saved (32 bit addressing mode):
-------------------------------------- Original j.l.r.Proxy 1 interfaces / proxy class
class proxy size of delta to loaders classes caches prev.ln. -------- -------- -------- -------- 0 0 400 400 1 1 768 368 1 2 920 152 1 3 1072 152 1 4 1224 152 1 5 1376 152 1 6 1528 152 1 7 1680 152 1 8 1832 152 2 9 2152 320 2 10 2304 152 2 11 2456 152 2 12 2672 216 2 13 2824 152 2 14 2976 152 2 15 3128 152 2 16 3280 152
-------------------------------------- Patched j.l.r.Proxy 1 interfaces / proxy class
class proxy size of delta to loaders classes caches prev.ln. -------- -------- -------- -------- 0 0 240 240 1 1 792 552 1 2 928 136 1 3 1064 136 1 4 1200 136 1 5 1336 136 1 6 1472 136 1 7 1608 136 1 8 1744 136 2 9 2088 344 2 10 2224 136 2 11 2360 136 2 12 2560 200 2 13 2696 136 2 14 2832 136 2 15 2968 136 2 16 3104 136
Did you know, that Proxy.getProxyClass() can generate proxy classes implementing 0 interfaces? It can. There's exactly one such class generated per class loader:
-------------------------------------- Original j.l.r.Proxy 0 interfaces / proxy class
class proxy size of delta to loaders classes caches prev.ln. -------- -------- -------- -------- 0 0 400 400 1 1 760 360 2 2 1072 312
-------------------------------------- Patched j.l.r.Proxy 0 interfaces / proxy class
class proxy size of delta to loaders classes caches prev.ln. -------- -------- -------- -------- 0 0 240 240 1 1 728 488 2 2 1040 312
With 2 or more interfaces implemented by proxy class the space overhead increases faster than with original Proxy:
-------------------------------------- Original j.l.r.Proxy 2 interfaces / proxy class
class proxy size of delta to loaders classes caches prev.ln. -------- -------- -------- -------- 0 0 400 400 1 1 768 368 1 2 920 152 1 3 1072 152 1 4 1224 152 1 5 1376 152 1 6 1528 152 1 7 1680 152 1 8 1832 152 2 9 2152 320 2 10 2304 152 2 11 2456 152 2 12 2672 216 2 13 2824 152 2 14 2976 152 2 15 3128 152 2 16 3280 152
-------------------------------------- Patched j.l.r.Proxy 2 interfaces / proxy class
class proxy size of delta to loaders classes caches prev.ln. -------- -------- -------- -------- 0 0 240 240 1 1 832 592 1 2 1008 176 1 3 1184 176 1 4 1360 176 1 5 1536 176 1 6 1712 176 1 7 1888 176 1 8 2064 176 2 9 2448 384 2 10 2624 176 2 11 2800 176 2 12 3040 240 2 13 3216 176 2 14 3392 176 2 15 3568 176 2 16 3744 176
-------------------------------------- Original j.l.r.Proxy 3 interfaces / proxy class
class proxy size of delta to loaders classes caches prev.ln. -------- -------- -------- -------- 0 0 400 400 1 1 776 376 1 2 936 160 1 3 1096 160 1 4 1256 160 1 5 1416 160 1 6 1576 160 1 7 1736 160 1 8 1896 160 2 9 2224 328 2 10 2384 160 2 11 2544 160 2 12 2768 224 2 13 2928 160 2 14 3088 160 2 15 3248 160 2 16 3408 160
-------------------------------------- Patched j.l.r.Proxy 3 interfaces / proxy class
class proxy size of delta to loaders classes caches prev.ln. -------- -------- -------- -------- 0 0 240 240 1 1 864 624 1 2 1072 208 1 3 1280 208 1 4 1488 208 1 5 1696 208 1 6 1904 208 1 7 2112 208 1 8 2320 208 2 9 2736 416 2 10 2944 208 2 11 3152 208 2 12 3424 272 2 13 3632 208 2 14 3840 208 2 15 4048 208 2 16 4256 208
-------------------------------------- Original j.l.r.Proxy 4 interfaces / proxy class
class proxy size of delta to loaders classes caches prev.ln. -------- -------- -------- -------- 0 0 400 400 1 1 776 376 1 2 936 160 1 3 1096 160 1 4 1256 160 1 5 1416 160 1 6 1576 160 1 7 1736 160 1 8 1896 160 2 9 2224 328 2 10 2384 160 2 11 2544 160 2 12 2768 224 2 13 2928 160 2 14 3088 160 2 15 3248 160 2 16 3408 160
-------------------------------------- Patched j.l.r.Proxy 4 interfaces / proxy class
class proxy size of delta to loaders classes caches prev.ln. -------- -------- -------- -------- 0 0 240 240 1 1 896 656 1 2 1136 240 1 3 1376 240 1 4 1616 240 1 5 1856 240 1 6 2096 240 1 7 2336 240 1 8 2576 240 2 9 3024 448 2 10 3264 240 2 11 3504 240 2 12 3808 304 2 13 4048 240 2 14 4288 240 2 15 4528 240 2 16 4768 240
There's an increase of 8 bytes per proxy class key for each 2 interfaces added to proxy class in original Proxy code, but there's an increase of 32 bytes per proxy class key for each single interface added in patched Proxy code.
I think the most common usage is to implement a single interface and there is 16 bytes gained for each such usage compared to original Proxy code.
2. I did some cleanup to restore some original comments to make the diffs clearer where the change is. 3. I removed the newInstance method which is dead code after my previous code. Since we are in this code, I took the chance to clean that up and also change a couple for-loop to use for-each.
I have put the merged and slightly modified Proxy.java and webrev at: http://cr.openjdk.java.net/~mchung/jdk8/webrevs/7123493/webrev.00/
We will use this bug for your contribution: 7123493 : (proxy) Proxy.getProxyClass doesn't scale under high load
I took j.l.r.Proxy file from your webrev and just changed the KeyFactory implementation. WeakCache is generic and can be used unchanged with either implementation of KeyFactory.
For the weak cache class, since it's for proxy implementation use, I suggest to take out the supportContainsValueOperation flagas it always keeps the reverse map for isProxyClass lookup.
You can simply call Objects.requireNonNull(param) instead of requireNonNull(param, "param-name") since the proxy implementation should have made sure the argument is non-null.
Formatting nits:
68 Object cacheKey = CacheKey.valueOf( 69 key, 70 refQueue 71 );
should be: all in one line or line break on a long-line. Same for method signature.
237 void expungeFrom( 238 ConcurrentMap<?, ? extends ConcurrentMap<?, ?>> map, 239 ConcurrentMap<?, Boolean> reverseMap 240 );
should be:
void expungeFrom(ConcurrentMap<?, ? extends ConcurrentMap<?, ?>> map, ConcurrentMap<?, Boolean> reverseMap);
so that it'll be more consistent with the existing code. I'll do a detailed review on the weak cache class as you will finalize the code per the decision to go with the two-level weak cache.
I hope I have addressed all that in above webrev.
Regards, Peter
Thanks again for the good work.
Mandy [1] http://openjdk.java.net/jeps/161
On 04/20/2013 09:31 AM, Peter Levart wrote:
Hi Mandy,
I have another idea. Before jumping to implement it, I will first ask what do you think of it. For example:
- have an optimal interface names key calculated from interfaces. - visibility of interfaces and other validations are pushed to slow-path (inside ProxyClassFactory.apply) - the proxy Class object returned from WeakCache.get() is post-validated with a check like:
for (Class<?> intf : interfaces) { if (!intf.isAssignableFrom(proxyClass)) { throw new IllegalArgumentException(); } } // return post-validated proxyClass from getProxyClass0()...
I just did it: http://dl.dropboxusercontent.com/u/101777488/jdk8-tl/proxy-wc/webrev.03/inde... I also incorporated the expunging optimization that I talked about in one of previous mails. And the keys, composed of interface names, are now more space-friendly too: - a key for 0 interface proxy is a singleton Object - a key for 1 interface proxy is the name of the interface itself (an already interned String) - a key for 2 interface proxy is a special class with 2 references to interned Strings - a key for 3+ interface proxy is a special class wrapping an array of interned Strings The performance is screaming again (WeakCache+post-validation): ns/op WeakCache+ WeakCache+ Test Threads Original pre-valid. post-valid. ======================= ======= =========== =========== =========== Proxy_getProxyClass 1 2,420.99 2,258.00 211.04 4 3,075.39 2,644.26 282.93 8 5,374.45 5,159.71 432.14 Proxy_isProxyClassTrue 1 97.75 45.37 42.67 4 2,505.92 42.59 42.77 8 5,042.61 75.44 73.20 Proxy_isProxyClassFalse 1 89.20 1.40 1.40 4 2,548.61 1.40 1.40 8 4,901.56 2.82 2.80 Annotation_equals 1 224.39 201.82 202.97 4 2,046.21 200.61 204.89 8 3,564.78 347.27 344.57 And the space savings are now even greater. Patched code is always better space-wise. The savings are: 32 bit addressing: - 56 bytes per proxy class implementing 1 interface - 32 bytes per proxy class implementing 2 interfaces - 16 bytes per proxy class implementing 3 or more interfaces 64 bit addressing: - 80 bytes per proxy class implementing 1 interface - 56 bytes per proxy class implementing 2 interfaces - 24 bytes per proxy class implementing 3 or more interfaces Regards, Peter P.S. Details: 32 bit addressing: ** Original j.l.r.Proxy ** 0 interfaces / proxy class class proxy size of delta to loaders classes caches prev.ln. -------- -------- -------- -------- 0 0 400 400 1 1 760 360 2 9 1072 312 -------- -------- -------- -------- ** Original j.l.r.Proxy ** 1 interfaces / proxy class class proxy size of delta to loaders classes caches prev.ln. -------- -------- -------- -------- 0 0 400 400 1 1 768 368 1 2 920 152 1 3 1072 152 1 4 1224 152 1 5 1376 152 1 6 1528 152 1 7 1680 152 1 8 1832 152 2 9 2152 320 2 10 2304 152 2 11 2456 152 2 12 2672 216 2 13 2824 152 2 14 2976 152 2 15 3128 152 2 16 3280 152 -------- -------- -------- -------- ** Original j.l.r.Proxy ** 2 interfaces / proxy class class proxy size of delta to loaders classes caches prev.ln. -------- -------- -------- -------- 0 0 400 400 1 1 768 368 1 2 920 152 1 3 1072 152 1 4 1224 152 1 5 1376 152 1 6 1528 152 1 7 1680 152 1 8 1832 152 2 9 2152 320 2 10 2304 152 2 11 2456 152 2 12 2672 216 2 13 2824 152 2 14 2976 152 2 15 3128 152 2 16 3280 152 -------- -------- -------- -------- ** Original j.l.r.Proxy ** 3 interfaces / proxy class class proxy size of delta to loaders classes caches prev.ln. -------- -------- -------- -------- 0 0 400 400 1 1 776 376 1 2 936 160 1 3 1096 160 1 4 1256 160 1 5 1416 160 1 6 1576 160 1 7 1736 160 1 8 1896 160 2 9 2224 328 2 10 2384 160 2 11 2544 160 2 12 2768 224 2 13 2928 160 2 14 3088 160 2 15 3248 160 2 16 3408 160 -------- -------- -------- -------- ** Original j.l.r.Proxy ** 4 interfaces / proxy class class proxy size of delta to loaders classes caches prev.ln. -------- -------- -------- -------- 0 0 400 400 1 1 776 376 1 2 936 160 1 3 1096 160 1 4 1256 160 1 5 1416 160 1 6 1576 160 1 7 1736 160 1 8 1896 160 2 9 2224 328 2 10 2384 160 2 11 2544 160 2 12 2768 224 2 13 2928 160 2 14 3088 160 2 15 3248 160 2 16 3408 160 -------- -------- -------- -------- ** Patched j.l.r.Proxy ** 0 interfaces / proxy class class proxy size of delta to loaders classes caches prev.ln. -------- -------- -------- -------- 0 0 240 240 1 1 768 528 2 9 1072 304 -------- -------- -------- -------- ** Patched j.l.r.Proxy ** 1 interfaces / proxy class class proxy size of delta to loaders classes caches prev.ln. -------- -------- -------- -------- 0 0 240 240 1 1 752 512 1 2 848 96 1 3 944 96 1 4 1040 96 1 5 1136 96 1 6 1232 96 1 7 1328 96 1 8 1424 96 2 9 1728 304 2 10 1824 96 2 11 1920 96 2 12 2080 160 2 13 2176 96 2 14 2272 96 2 15 2368 96 2 16 2464 96 -------- -------- -------- -------- ** Patched j.l.r.Proxy ** 2 interfaces / proxy class class proxy size of delta to loaders classes caches prev.ln. -------- -------- -------- -------- 0 0 240 240 1 1 776 536 1 2 896 120 1 3 1016 120 1 4 1136 120 1 5 1256 120 1 6 1376 120 1 7 1496 120 1 8 1616 120 2 9 1944 328 2 10 2064 120 2 11 2184 120 2 12 2368 184 2 13 2488 120 2 14 2608 120 2 15 2728 120 2 16 2848 120 -------- -------- -------- -------- ** Patched j.l.r.Proxy ** 3 interfaces / proxy class class proxy size of delta to loaders classes caches prev.ln. -------- -------- -------- -------- 0 0 240 240 1 1 800 560 1 2 944 144 1 3 1088 144 1 4 1232 144 1 5 1376 144 1 6 1520 144 1 7 1664 144 1 8 1808 144 2 9 2160 352 2 10 2304 144 2 11 2448 144 2 12 2656 208 2 13 2800 144 2 14 2944 144 2 15 3088 144 2 16 3232 144 -------- -------- -------- -------- ** Patched j.l.r.Proxy ** 4 interfaces / proxy class class proxy size of delta to loaders classes caches prev.ln. -------- -------- -------- -------- 0 0 240 240 1 1 800 560 1 2 944 144 1 3 1088 144 1 4 1232 144 1 5 1376 144 1 6 1520 144 1 7 1664 144 1 8 1808 144 2 9 2160 352 2 10 2304 144 2 11 2448 144 2 12 2656 208 2 13 2800 144 2 14 2944 144 2 15 3088 144 2 16 3232 144 -------- -------- -------- -------- 64 bit addressing: ** Original j.l.r.Proxy ** 0 interfaces / proxy class class proxy size of delta to loaders classes caches prev.ln. -------- -------- -------- -------- 0 0 632 632 1 1 1208 576 2 9 1728 520 -------- -------- -------- -------- ** Original j.l.r.Proxy ** 1 interfaces / proxy class class proxy size of delta to loaders classes caches prev.ln. -------- -------- -------- -------- 0 0 632 632 1 1 1216 584 1 2 1448 232 1 3 1680 232 1 4 1912 232 1 5 2144 232 1 6 2376 232 1 7 2608 232 1 8 2840 232 2 9 3368 528 2 10 3600 232 2 11 3832 232 2 12 4192 360 2 13 4424 232 2 14 4656 232 2 15 4888 232 2 16 5120 232 -------- -------- -------- -------- ** Original j.l.r.Proxy ** 2 interfaces / proxy class class proxy size of delta to loaders classes caches prev.ln. -------- -------- -------- -------- 0 0 632 632 1 1 1224 592 1 2 1464 240 1 3 1704 240 1 4 1944 240 1 5 2184 240 1 6 2424 240 1 7 2664 240 1 8 2904 240 2 9 3440 536 2 10 3680 240 2 11 3920 240 2 12 4288 368 2 13 4528 240 2 14 4768 240 2 15 5008 240 2 16 5248 240 -------- -------- -------- -------- ** Original j.l.r.Proxy ** 3 interfaces / proxy class class proxy size of delta to loaders classes caches prev.ln. -------- -------- -------- -------- 0 0 632 632 1 1 1232 600 1 2 1480 248 1 3 1728 248 1 4 1976 248 1 5 2224 248 1 6 2472 248 1 7 2720 248 1 8 2968 248 2 9 3512 544 2 10 3760 248 2 11 4008 248 2 12 4384 376 2 13 4632 248 2 14 4880 248 2 15 5128 248 2 16 5376 248 -------- -------- -------- -------- ** Original j.l.r.Proxy ** 4 interfaces / proxy class class proxy size of delta to loaders classes caches prev.ln. -------- -------- -------- -------- 0 0 632 632 1 1 1240 608 1 2 1496 256 1 3 1752 256 1 4 2008 256 1 5 2264 256 1 6 2520 256 1 7 2776 256 1 8 3032 256 2 9 3584 552 2 10 3840 256 2 11 4096 256 2 12 4480 384 2 13 4736 256 2 14 4992 256 2 15 5248 256 2 16 5504 256 -------- -------- -------- -------- ** Patched j.l.r.Proxy ** 0 interfaces / proxy class class proxy size of delta to loaders classes caches prev.ln. -------- -------- -------- -------- 0 0 336 336 1 1 1216 880 2 9 1720 504 -------- -------- -------- -------- ** Patched j.l.r.Proxy ** 1 interfaces / proxy class class proxy size of delta to loaders classes caches prev.ln. -------- -------- -------- -------- 0 0 336 336 1 1 1200 864 1 2 1352 152 1 3 1504 152 1 4 1656 152 1 5 1808 152 1 6 1960 152 1 7 2112 152 1 8 2264 152 2 9 2768 504 2 10 2920 152 2 11 3072 152 2 12 3352 280 2 13 3504 152 2 14 3656 152 2 15 3808 152 2 16 3960 152 -------- -------- -------- -------- ** Patched j.l.r.Proxy ** 2 interfaces / proxy class class proxy size of delta to loaders classes caches prev.ln. -------- -------- -------- -------- 0 0 336 336 1 1 1232 896 1 2 1416 184 1 3 1600 184 1 4 1784 184 1 5 1968 184 1 6 2152 184 1 7 2336 184 1 8 2520 184 2 9 3056 536 2 10 3240 184 2 11 3424 184 2 12 3736 312 2 13 3920 184 2 14 4104 184 2 15 4288 184 2 16 4472 184 -------- -------- -------- -------- ** Patched j.l.r.Proxy ** 3 interfaces / proxy class class proxy size of delta to loaders classes caches prev.ln. -------- -------- -------- -------- 0 0 336 336 1 1 1272 936 1 2 1496 224 1 3 1720 224 1 4 1944 224 1 5 2168 224 1 6 2392 224 1 7 2616 224 1 8 2840 224 2 9 3416 576 2 10 3640 224 2 11 3864 224 2 12 4216 352 2 13 4440 224 2 14 4664 224 2 15 4888 224 2 16 5112 224 -------- -------- -------- -------- ** Patched j.l.r.Proxy ** 4 interfaces / proxy class class proxy size of delta to loaders classes caches prev.ln. -------- -------- -------- -------- 0 0 336 336 1 1 1280 944 1 2 1512 232 1 3 1744 232 1 4 1976 232 1 5 2208 232 1 6 2440 232 1 7 2672 232 1 8 2904 232 2 9 3488 584 2 10 3720 232 2 11 3952 232 2 12 4312 360 2 13 4544 232 2 14 4776 232 2 15 5008 232 2 16 5240 232 -------- -------- -------- -------- On 04/20/2013 09:31 AM, Peter Levart wrote:
Hi Mandy,
I have another idea. Before jumping to implement it, I will first ask what do you think of it. For example:
- have an optimal interface names key calculated from interfaces. - visibility of interfaces and other validations are pushed to slow-path (inside ProxyClassFactory.apply) - the proxy Class object returned from WeakCache.get() is post-validated with a check like:
for (Class<?> intf : interfaces) { if (!intf.isAssignableFrom(proxyClass)) { throw new IllegalArgumentException(); } } // return post-validated proxyClass from getProxyClass0()...
I feel that Class.isAssignableFrom(Class) check could be much faster and that the only reason the check can fail is if some interface is not visible from the class loader.
Am I correct?
Regards, Peter
On 04/19/2013 04:36 PM, Peter Levart wrote:
Hi Mandy,
On 04/19/2013 07:33 AM, Mandy Chung wrote:
https://dl.dropboxusercontent.com/u/101777488/jdk8-tl/proxy-wc/webrev.02/ind...
What about package-private in java.lang.reflect? It makes Proxy itself much easier to read. When we decide which way to go, I can remove the interface and only leave a single package-private class...
thanks. Moving it to a single package-private classin java.lang.reflectand remove the interface sounds good.
Right.
I have merged your patch with the recent TL repo and did some clean up while reviewing it. Some comments: 1. getProxyClass0 should validate the input interfaces and throw IAE if any illegal argument (e.g. interfaces are not visible to the given loader) before calling proxyClassCache.get(loader, interfaces). I moved back the validation code from ProxyClassFactory.apply to getProxyClass0.
Ops, you're right. There could be a request with interface(s) with same name(s) but loaded by different loader(s) and such code could return wrong pre-cached proxy class instead of throwing exception. Unfortunately, moving validation to slow-path was the cause of major performance and scalability improvement observed. With validation moved back to getProxyClass0 (in spite of using two-level WeakCache), we have the following performance (WeakCache#1):
Summary (4 Cores x 2 Threads i7 CPU):
Test Threads ns/op Original WeakCache#1 ======================= ======= ============== =========== Proxy_getProxyClass 1 2,403.27 2,174.51 4 3,039.01 2,555.00 8 5,193.58 4,273.42
Proxy_isProxyClassTrue 1 95.02 43.14 4 2,266.29 43.23 8 4,782.29 72.06
Proxy_isProxyClassFalse 1 95.02 1.36 4 2,186.59 1.36 8 4,891.15 2.72
Annotation_equals 1 240.10 195.68 4 1,864.06 201.41 8 8,639.20 337.46
It's a little better than original getProxyClass(), but not much. The isProxyClass() and consequently Annotation.equals() are far better though. But as you might have guessed, I kind of solved that too...
My first attempt was to optimize the interface validation. Only the "visibility" part is necessary to be performed on fast-path. Uniqueness and other things can be performed on slow-path. With split-validation and hacks like:
private static final MethodHandle findLoadedClass0MH, findBootstrapClassMH; private static final ClassLoader dummyCL = new ClassLoader() {};
static { try { Method method = ClassLoader.class.getDeclaredMethod("findLoadedClass0", String.class); method.setAccessible(true); findLoadedClass0MH = MethodHandles.lookup().unreflect(method);
method = ClassLoader.class.getDeclaredMethod("findBootstrapClass", String.class); method.setAccessible(true); findBootstrapClassMH = MethodHandles.lookup().unreflect(method); } catch (NoSuchMethodException e) { throw (Error) new NoSuchMethodError(e.getMessage()).initCause(e); } catch (IllegalAccessException e) { throw (Error) new IllegalAccessError(e.getMessage()).initCause(e); } }
static Class<?> findLoadedClass(ClassLoader loader, String name) { try { if (VM.isSystemDomainLoader(loader)) { return (Class<?>) findBootstrapClassMH.invokeExact(dummyCL, name); } else { return (Class<?>) findLoadedClass0MH.invokeExact(loader, name); } } catch (RuntimeException | Error e) { throw e; } catch (Throwable t) { throw new UndeclaredThrowableException(t); } }
... using findLoadedClass(loader, intf.getName()) and only doing Class.forName(intf.getName(), false, loader) if the former returned null ... I managed to reclaim some performance (WeakCache#1+):
Test Threads ns/op Original WeakCache#1 WeakCache#1+ ======================= ======= ============== =========== ============ Proxy_getProxyClass 1 2,403.27 2,174.51 1,589.36 4 3,039.01 2,555.00 1,929.11 8 5,193.58 4,273.42 3,409.77
...but that was still not very satisfactory.
I modified the KeyFactory to create keys that weakly-reference interface Class objects and implement hashCode/equals in terms of comparing those Class objects. With such keys, no false aliasing can occur and the whole validation can be pushed back to slow-path. I tried hard to create keys with as little space overhead as possible:
http://dl.dropboxusercontent.com/u/101777488/jdk8-tl/proxy-wc-wi/webrev.01/i...
...but there can be no miracles. The good news is that the performance is back (WeakCache#2):
Summary (4 Cores x 2 Threads i7 CPU):
Test Threads ns/op Original WeakCache#1 WeakCache#2 ======================= ======= ============== =========== =========== Proxy_getProxyClass 1 2,403.27 2,174.51 163.57 4 3,039.01 2,555.00 211.70 8 5,193.58 4,273.42 322.14
Proxy_isProxyClassTrue 1 95.02 43.14 41.23 4 2,266.29 43.23 42.20 8 4,782.29 72.06 72.21
Proxy_isProxyClassFalse 1 95.02 1.36 1.36 4 2,186.59 1.36 1.36 8 4,891.15 2.72 2.72
Annotation_equals 1 240.10 195.68 194.61 4 1,864.06 201.41 198.81 8 8,639.20 337.46 342.90
... and the most common usage (proxy class implementing exactly one interface) uses even less space than with interface-names-key - 16 bytes saved per proxy class vs. 8 bytes saved (32 bit addressing mode):
-------------------------------------- Original j.l.r.Proxy 1 interfaces / proxy class
class proxy size of delta to loaders classes caches prev.ln. -------- -------- -------- -------- 0 0 400 400 1 1 768 368 1 2 920 152 1 3 1072 152 1 4 1224 152 1 5 1376 152 1 6 1528 152 1 7 1680 152 1 8 1832 152 2 9 2152 320 2 10 2304 152 2 11 2456 152 2 12 2672 216 2 13 2824 152 2 14 2976 152 2 15 3128 152 2 16 3280 152
-------------------------------------- Patched j.l.r.Proxy 1 interfaces / proxy class
class proxy size of delta to loaders classes caches prev.ln. -------- -------- -------- -------- 0 0 240 240 1 1 792 552 1 2 928 136 1 3 1064 136 1 4 1200 136 1 5 1336 136 1 6 1472 136 1 7 1608 136 1 8 1744 136 2 9 2088 344 2 10 2224 136 2 11 2360 136 2 12 2560 200 2 13 2696 136 2 14 2832 136 2 15 2968 136 2 16 3104 136
Did you know, that Proxy.getProxyClass() can generate proxy classes implementing 0 interfaces? It can. There's exactly one such class generated per class loader:
-------------------------------------- Original j.l.r.Proxy 0 interfaces / proxy class
class proxy size of delta to loaders classes caches prev.ln. -------- -------- -------- -------- 0 0 400 400 1 1 760 360 2 2 1072 312
-------------------------------------- Patched j.l.r.Proxy 0 interfaces / proxy class
class proxy size of delta to loaders classes caches prev.ln. -------- -------- -------- -------- 0 0 240 240 1 1 728 488 2 2 1040 312
With 2 or more interfaces implemented by proxy class the space overhead increases faster than with original Proxy:
-------------------------------------- Original j.l.r.Proxy 2 interfaces / proxy class
class proxy size of delta to loaders classes caches prev.ln. -------- -------- -------- -------- 0 0 400 400 1 1 768 368 1 2 920 152 1 3 1072 152 1 4 1224 152 1 5 1376 152 1 6 1528 152 1 7 1680 152 1 8 1832 152 2 9 2152 320 2 10 2304 152 2 11 2456 152 2 12 2672 216 2 13 2824 152 2 14 2976 152 2 15 3128 152 2 16 3280 152
-------------------------------------- Patched j.l.r.Proxy 2 interfaces / proxy class
class proxy size of delta to loaders classes caches prev.ln. -------- -------- -------- -------- 0 0 240 240 1 1 832 592 1 2 1008 176 1 3 1184 176 1 4 1360 176 1 5 1536 176 1 6 1712 176 1 7 1888 176 1 8 2064 176 2 9 2448 384 2 10 2624 176 2 11 2800 176 2 12 3040 240 2 13 3216 176 2 14 3392 176 2 15 3568 176 2 16 3744 176
-------------------------------------- Original j.l.r.Proxy 3 interfaces / proxy class
class proxy size of delta to loaders classes caches prev.ln. -------- -------- -------- -------- 0 0 400 400 1 1 776 376 1 2 936 160 1 3 1096 160 1 4 1256 160 1 5 1416 160 1 6 1576 160 1 7 1736 160 1 8 1896 160 2 9 2224 328 2 10 2384 160 2 11 2544 160 2 12 2768 224 2 13 2928 160 2 14 3088 160 2 15 3248 160 2 16 3408 160
-------------------------------------- Patched j.l.r.Proxy 3 interfaces / proxy class
class proxy size of delta to loaders classes caches prev.ln. -------- -------- -------- -------- 0 0 240 240 1 1 864 624 1 2 1072 208 1 3 1280 208 1 4 1488 208 1 5 1696 208 1 6 1904 208 1 7 2112 208 1 8 2320 208 2 9 2736 416 2 10 2944 208 2 11 3152 208 2 12 3424 272 2 13 3632 208 2 14 3840 208 2 15 4048 208 2 16 4256 208
-------------------------------------- Original j.l.r.Proxy 4 interfaces / proxy class
class proxy size of delta to loaders classes caches prev.ln. -------- -------- -------- -------- 0 0 400 400 1 1 776 376 1 2 936 160 1 3 1096 160 1 4 1256 160 1 5 1416 160 1 6 1576 160 1 7 1736 160 1 8 1896 160 2 9 2224 328 2 10 2384 160 2 11 2544 160 2 12 2768 224 2 13 2928 160 2 14 3088 160 2 15 3248 160 2 16 3408 160
-------------------------------------- Patched j.l.r.Proxy 4 interfaces / proxy class
class proxy size of delta to loaders classes caches prev.ln. -------- -------- -------- -------- 0 0 240 240 1 1 896 656 1 2 1136 240 1 3 1376 240 1 4 1616 240 1 5 1856 240 1 6 2096 240 1 7 2336 240 1 8 2576 240 2 9 3024 448 2 10 3264 240 2 11 3504 240 2 12 3808 304 2 13 4048 240 2 14 4288 240 2 15 4528 240 2 16 4768 240
There's an increase of 8 bytes per proxy class key for each 2 interfaces added to proxy class in original Proxy code, but there's an increase of 32 bytes per proxy class key for each single interface added in patched Proxy code.
I think the most common usage is to implement a single interface and there is 16 bytes gained for each such usage compared to original Proxy code.
2. I did some cleanup to restore some original comments to make the diffs clearer where the change is. 3. I removed the newInstance method which is dead code after my previous code. Since we are in this code, I took the chance to clean that up and also change a couple for-loop to use for-each.
I have put the merged and slightly modified Proxy.java and webrev at: http://cr.openjdk.java.net/~mchung/jdk8/webrevs/7123493/webrev.00/
We will use this bug for your contribution: 7123493 : (proxy) Proxy.getProxyClass doesn't scale under high load
I took j.l.r.Proxy file from your webrev and just changed the KeyFactory implementation. WeakCache is generic and can be used unchanged with either implementation of KeyFactory.
For the weak cache class, since it's for proxy implementation use, I suggest to take out the supportContainsValueOperation flagas it always keeps the reverse map for isProxyClass lookup.
You can simply call Objects.requireNonNull(param) instead of requireNonNull(param, "param-name") since the proxy implementation should have made sure the argument is non-null.
Formatting nits:
68 Object cacheKey = CacheKey.valueOf( 69 key, 70 refQueue 71 );
should be: all in one line or line break on a long-line. Same for method signature.
237 void expungeFrom( 238 ConcurrentMap<?, ? extends ConcurrentMap<?, ?>> map, 239 ConcurrentMap<?, Boolean> reverseMap 240 );
should be:
void expungeFrom(ConcurrentMap<?, ? extends ConcurrentMap<?, ?>> map, ConcurrentMap<?, Boolean> reverseMap);
so that it'll be more consistent with the existing code. I'll do a detailed review on the weak cache class as you will finalize the code per the decision to go with the two-level weak cache.
I hope I have addressed all that in above webrev.
Regards, Peter
Thanks again for the good work.
Mandy [1] http://openjdk.java.net/jeps/161
Hi Peter, I want to give it some more thought and discuss with you next week. As for the zero number of interface, I think it's a bug and it should throw IAE if the given interfaces array is empty. Thanks for finding it and I'll file a separate bug for that since it requires spec update/clarification. Mandy On 4/20/2013 12:31 AM, Peter Levart wrote:
Hi Mandy,
I have another idea. Before jumping to implement it, I will first ask what do you think of it. For example:
- have an optimal interface names key calculated from interfaces. - visibility of interfaces and other validations are pushed to slow-path (inside ProxyClassFactory.apply) - the proxy Class object returned from WeakCache.get() is post-validated with a check like:
for (Class<?> intf : interfaces) { if (!intf.isAssignableFrom(proxyClass)) { throw new IllegalArgumentException(); } } // return post-validated proxyClass from getProxyClass0()...
I feel that Class.isAssignableFrom(Class) check could be much faster and that the only reason the check can fail is if some interface is not visible from the class loader.
Am I correct?
Regards, Peter
On 04/19/2013 04:36 PM, Peter Levart wrote:
Hi Mandy,
On 04/19/2013 07:33 AM, Mandy Chung wrote:
https://dl.dropboxusercontent.com/u/101777488/jdk8-tl/proxy-wc/webrev.02/ind...
What about package-private in java.lang.reflect? It makes Proxy itself much easier to read. When we decide which way to go, I can remove the interface and only leave a single package-private class...
thanks. Moving it to a single package-private classin java.lang.reflectand remove the interface sounds good.
Right.
I have merged your patch with the recent TL repo and did some clean up while reviewing it. Some comments: 1. getProxyClass0 should validate the input interfaces and throw IAE if any illegal argument (e.g. interfaces are not visible to the given loader) before calling proxyClassCache.get(loader, interfaces). I moved back the validation code from ProxyClassFactory.apply to getProxyClass0.
Ops, you're right. There could be a request with interface(s) with same name(s) but loaded by different loader(s) and such code could return wrong pre-cached proxy class instead of throwing exception. Unfortunately, moving validation to slow-path was the cause of major performance and scalability improvement observed. With validation moved back to getProxyClass0 (in spite of using two-level WeakCache), we have the following performance (WeakCache#1):
Summary (4 Cores x 2 Threads i7 CPU):
Test Threads ns/op Original WeakCache#1 ======================= ======= ============== =========== Proxy_getProxyClass 1 2,403.27 2,174.51 4 3,039.01 2,555.00 8 5,193.58 4,273.42
Proxy_isProxyClassTrue 1 95.02 43.14 4 2,266.29 43.23 8 4,782.29 72.06
Proxy_isProxyClassFalse 1 95.02 1.36 4 2,186.59 1.36 8 4,891.15 2.72
Annotation_equals 1 240.10 195.68 4 1,864.06 201.41 8 8,639.20 337.46
It's a little better than original getProxyClass(), but not much. The isProxyClass() and consequently Annotation.equals() are far better though. But as you might have guessed, I kind of solved that too...
My first attempt was to optimize the interface validation. Only the "visibility" part is necessary to be performed on fast-path. Uniqueness and other things can be performed on slow-path. With split-validation and hacks like:
private static final MethodHandle findLoadedClass0MH, findBootstrapClassMH; private static final ClassLoader dummyCL = new ClassLoader() {};
static { try { Method method = ClassLoader.class.getDeclaredMethod("findLoadedClass0", String.class); method.setAccessible(true); findLoadedClass0MH = MethodHandles.lookup().unreflect(method);
method = ClassLoader.class.getDeclaredMethod("findBootstrapClass", String.class); method.setAccessible(true); findBootstrapClassMH = MethodHandles.lookup().unreflect(method); } catch (NoSuchMethodException e) { throw (Error) new NoSuchMethodError(e.getMessage()).initCause(e); } catch (IllegalAccessException e) { throw (Error) new IllegalAccessError(e.getMessage()).initCause(e); } }
static Class<?> findLoadedClass(ClassLoader loader, String name) { try { if (VM.isSystemDomainLoader(loader)) { return (Class<?>) findBootstrapClassMH.invokeExact(dummyCL, name); } else { return (Class<?>) findLoadedClass0MH.invokeExact(loader, name); } } catch (RuntimeException | Error e) { throw e; } catch (Throwable t) { throw new UndeclaredThrowableException(t); } }
... using findLoadedClass(loader, intf.getName()) and only doing Class.forName(intf.getName(), false, loader) if the former returned null ... I managed to reclaim some performance (WeakCache#1+):
Test Threads ns/op Original WeakCache#1 WeakCache#1+ ======================= ======= ============== =========== ============ Proxy_getProxyClass 1 2,403.27 2,174.51 1,589.36 4 3,039.01 2,555.00 1,929.11 8 5,193.58 4,273.42 3,409.77
...but that was still not very satisfactory.
I modified the KeyFactory to create keys that weakly-reference interface Class objects and implement hashCode/equals in terms of comparing those Class objects. With such keys, no false aliasing can occur and the whole validation can be pushed back to slow-path. I tried hard to create keys with as little space overhead as possible:
http://dl.dropboxusercontent.com/u/101777488/jdk8-tl/proxy-wc-wi/webrev.01/i...
...but there can be no miracles. The good news is that the performance is back (WeakCache#2):
Summary (4 Cores x 2 Threads i7 CPU):
Test Threads ns/op Original WeakCache#1 WeakCache#2 ======================= ======= ============== =========== =========== Proxy_getProxyClass 1 2,403.27 2,174.51 163.57 4 3,039.01 2,555.00 211.70 8 5,193.58 4,273.42 322.14
Proxy_isProxyClassTrue 1 95.02 43.14 41.23 4 2,266.29 43.23 42.20 8 4,782.29 72.06 72.21
Proxy_isProxyClassFalse 1 95.02 1.36 1.36 4 2,186.59 1.36 1.36 8 4,891.15 2.72 2.72
Annotation_equals 1 240.10 195.68 194.61 4 1,864.06 201.41 198.81 8 8,639.20 337.46 342.90
... and the most common usage (proxy class implementing exactly one interface) uses even less space than with interface-names-key - 16 bytes saved per proxy class vs. 8 bytes saved (32 bit addressing mode):
-------------------------------------- Original j.l.r.Proxy 1 interfaces / proxy class
class proxy size of delta to loaders classes caches prev.ln. -------- -------- -------- -------- 0 0 400 400 1 1 768 368 1 2 920 152 1 3 1072 152 1 4 1224 152 1 5 1376 152 1 6 1528 152 1 7 1680 152 1 8 1832 152 2 9 2152 320 2 10 2304 152 2 11 2456 152 2 12 2672 216 2 13 2824 152 2 14 2976 152 2 15 3128 152 2 16 3280 152
-------------------------------------- Patched j.l.r.Proxy 1 interfaces / proxy class
class proxy size of delta to loaders classes caches prev.ln. -------- -------- -------- -------- 0 0 240 240 1 1 792 552 1 2 928 136 1 3 1064 136 1 4 1200 136 1 5 1336 136 1 6 1472 136 1 7 1608 136 1 8 1744 136 2 9 2088 344 2 10 2224 136 2 11 2360 136 2 12 2560 200 2 13 2696 136 2 14 2832 136 2 15 2968 136 2 16 3104 136
Did you know, that Proxy.getProxyClass() can generate proxy classes implementing 0 interfaces? It can. There's exactly one such class generated per class loader:
-------------------------------------- Original j.l.r.Proxy 0 interfaces / proxy class
class proxy size of delta to loaders classes caches prev.ln. -------- -------- -------- -------- 0 0 400 400 1 1 760 360 2 2 1072 312
-------------------------------------- Patched j.l.r.Proxy 0 interfaces / proxy class
class proxy size of delta to loaders classes caches prev.ln. -------- -------- -------- -------- 0 0 240 240 1 1 728 488 2 2 1040 312
With 2 or more interfaces implemented by proxy class the space overhead increases faster than with original Proxy:
-------------------------------------- Original j.l.r.Proxy 2 interfaces / proxy class
class proxy size of delta to loaders classes caches prev.ln. -------- -------- -------- -------- 0 0 400 400 1 1 768 368 1 2 920 152 1 3 1072 152 1 4 1224 152 1 5 1376 152 1 6 1528 152 1 7 1680 152 1 8 1832 152 2 9 2152 320 2 10 2304 152 2 11 2456 152 2 12 2672 216 2 13 2824 152 2 14 2976 152 2 15 3128 152 2 16 3280 152
-------------------------------------- Patched j.l.r.Proxy 2 interfaces / proxy class
class proxy size of delta to loaders classes caches prev.ln. -------- -------- -------- -------- 0 0 240 240 1 1 832 592 1 2 1008 176 1 3 1184 176 1 4 1360 176 1 5 1536 176 1 6 1712 176 1 7 1888 176 1 8 2064 176 2 9 2448 384 2 10 2624 176 2 11 2800 176 2 12 3040 240 2 13 3216 176 2 14 3392 176 2 15 3568 176 2 16 3744 176
-------------------------------------- Original j.l.r.Proxy 3 interfaces / proxy class
class proxy size of delta to loaders classes caches prev.ln. -------- -------- -------- -------- 0 0 400 400 1 1 776 376 1 2 936 160 1 3 1096 160 1 4 1256 160 1 5 1416 160 1 6 1576 160 1 7 1736 160 1 8 1896 160 2 9 2224 328 2 10 2384 160 2 11 2544 160 2 12 2768 224 2 13 2928 160 2 14 3088 160 2 15 3248 160 2 16 3408 160
-------------------------------------- Patched j.l.r.Proxy 3 interfaces / proxy class
class proxy size of delta to loaders classes caches prev.ln. -------- -------- -------- -------- 0 0 240 240 1 1 864 624 1 2 1072 208 1 3 1280 208 1 4 1488 208 1 5 1696 208 1 6 1904 208 1 7 2112 208 1 8 2320 208 2 9 2736 416 2 10 2944 208 2 11 3152 208 2 12 3424 272 2 13 3632 208 2 14 3840 208 2 15 4048 208 2 16 4256 208
-------------------------------------- Original j.l.r.Proxy 4 interfaces / proxy class
class proxy size of delta to loaders classes caches prev.ln. -------- -------- -------- -------- 0 0 400 400 1 1 776 376 1 2 936 160 1 3 1096 160 1 4 1256 160 1 5 1416 160 1 6 1576 160 1 7 1736 160 1 8 1896 160 2 9 2224 328 2 10 2384 160 2 11 2544 160 2 12 2768 224 2 13 2928 160 2 14 3088 160 2 15 3248 160 2 16 3408 160
-------------------------------------- Patched j.l.r.Proxy 4 interfaces / proxy class
class proxy size of delta to loaders classes caches prev.ln. -------- -------- -------- -------- 0 0 240 240 1 1 896 656 1 2 1136 240 1 3 1376 240 1 4 1616 240 1 5 1856 240 1 6 2096 240 1 7 2336 240 1 8 2576 240 2 9 3024 448 2 10 3264 240 2 11 3504 240 2 12 3808 304 2 13 4048 240 2 14 4288 240 2 15 4528 240 2 16 4768 240
There's an increase of 8 bytes per proxy class key for each 2 interfaces added to proxy class in original Proxy code, but there's an increase of 32 bytes per proxy class key for each single interface added in patched Proxy code.
I think the most common usage is to implement a single interface and there is 16 bytes gained for each such usage compared to original Proxy code.
2. I did some cleanup to restore some original comments to make the diffs clearer where the change is. 3. I removed the newInstance method which is dead code after my previous code. Since we are in this code, I took the chance to clean that up and also change a couple for-loop to use for-each.
I have put the merged and slightly modified Proxy.java and webrev at: http://cr.openjdk.java.net/~mchung/jdk8/webrevs/7123493/webrev.00/
We will use this bug for your contribution: 7123493 : (proxy) Proxy.getProxyClass doesn't scale under high load
I took j.l.r.Proxy file from your webrev and just changed the KeyFactory implementation. WeakCache is generic and can be used unchanged with either implementation of KeyFactory.
For the weak cache class, since it's for proxy implementation use, I suggest to take out the supportContainsValueOperation flagas it always keeps the reverse map for isProxyClass lookup.
You can simply call Objects.requireNonNull(param) instead of requireNonNull(param, "param-name") since the proxy implementation should have made sure the argument is non-null.
Formatting nits:
68 Object cacheKey = CacheKey.valueOf( 69 key, 70 refQueue 71 );
should be: all in one line or line break on a long-line. Same for method signature.
237 void expungeFrom( 238 ConcurrentMap<?, ? extends ConcurrentMap<?, ?>> map, 239 ConcurrentMap<?, Boolean> reverseMap 240 );
should be:
void expungeFrom(ConcurrentMap<?, ? extends ConcurrentMap<?, ?>> map, ConcurrentMap<?, Boolean> reverseMap);
so that it'll be more consistent with the existing code. I'll do a detailed review on the weak cache class as you will finalize the code per the decision to go with the two-level weak cache.
I hope I have addressed all that in above webrev.
Regards, Peter
Thanks again for the good work.
Mandy [1] http://openjdk.java.net/jeps/161
On 04/21/2013 04:52 AM, Mandy Chung wrote:
Hi Peter,
I want to give it some more thought and discuss with you next week. As for the zero number of interface, I think it's a bug and it should throw IAE if the given interfaces array is empty. Thanks for finding it and I'll file a separate bug for that since it requires spec update/clarification.
I think it's a feature. It's useful, since it forwards Object methods to InvocationHandler (equals, hashCode, ...). Sometimes that's all you need. Regards, Peter
Mandy
On 4/20/2013 12:31 AM, Peter Levart wrote:
Hi Mandy,
I have another idea. Before jumping to implement it, I will first ask what do you think of it. For example:
- have an optimal interface names key calculated from interfaces. - visibility of interfaces and other validations are pushed to slow-path (inside ProxyClassFactory.apply) - the proxy Class object returned from WeakCache.get() is post-validated with a check like:
for (Class<?> intf : interfaces) { if (!intf.isAssignableFrom(proxyClass)) { throw new IllegalArgumentException(); } } // return post-validated proxyClass from getProxyClass0()...
I feel that Class.isAssignableFrom(Class) check could be much faster and that the only reason the check can fail is if some interface is not visible from the class loader.
Am I correct?
Regards, Peter
On 04/19/2013 04:36 PM, Peter Levart wrote:
Hi Mandy,
On 04/19/2013 07:33 AM, Mandy Chung wrote:
https://dl.dropboxusercontent.com/u/101777488/jdk8-tl/proxy-wc/webrev.02/ind...
What about package-private in java.lang.reflect? It makes Proxy itself much easier to read. When we decide which way to go, I can remove the interface and only leave a single package-private class...
thanks. Moving it to a single package-private classin java.lang.reflectand remove the interface sounds good.
Right.
I have merged your patch with the recent TL repo and did some clean up while reviewing it. Some comments: 1. getProxyClass0 should validate the input interfaces and throw IAE if any illegal argument (e.g. interfaces are not visible to the given loader) before calling proxyClassCache.get(loader, interfaces). I moved back the validation code from ProxyClassFactory.apply to getProxyClass0.
Ops, you're right. There could be a request with interface(s) with same name(s) but loaded by different loader(s) and such code could return wrong pre-cached proxy class instead of throwing exception. Unfortunately, moving validation to slow-path was the cause of major performance and scalability improvement observed. With validation moved back to getProxyClass0 (in spite of using two-level WeakCache), we have the following performance (WeakCache#1):
Summary (4 Cores x 2 Threads i7 CPU):
Test Threads ns/op Original WeakCache#1 ======================= ======= ============== =========== Proxy_getProxyClass 1 2,403.27 2,174.51 4 3,039.01 2,555.00 8 5,193.58 4,273.42
Proxy_isProxyClassTrue 1 95.02 43.14 4 2,266.29 43.23 8 4,782.29 72.06
Proxy_isProxyClassFalse 1 95.02 1.36 4 2,186.59 1.36 8 4,891.15 2.72
Annotation_equals 1 240.10 195.68 4 1,864.06 201.41 8 8,639.20 337.46
It's a little better than original getProxyClass(), but not much. The isProxyClass() and consequently Annotation.equals() are far better though. But as you might have guessed, I kind of solved that too...
My first attempt was to optimize the interface validation. Only the "visibility" part is necessary to be performed on fast-path. Uniqueness and other things can be performed on slow-path. With split-validation and hacks like:
private static final MethodHandle findLoadedClass0MH, findBootstrapClassMH; private static final ClassLoader dummyCL = new ClassLoader() {};
static { try { Method method = ClassLoader.class.getDeclaredMethod("findLoadedClass0", String.class); method.setAccessible(true); findLoadedClass0MH = MethodHandles.lookup().unreflect(method);
method = ClassLoader.class.getDeclaredMethod("findBootstrapClass", String.class); method.setAccessible(true); findBootstrapClassMH = MethodHandles.lookup().unreflect(method); } catch (NoSuchMethodException e) { throw (Error) new NoSuchMethodError(e.getMessage()).initCause(e); } catch (IllegalAccessException e) { throw (Error) new IllegalAccessError(e.getMessage()).initCause(e); } }
static Class<?> findLoadedClass(ClassLoader loader, String name) { try { if (VM.isSystemDomainLoader(loader)) { return (Class<?>) findBootstrapClassMH.invokeExact(dummyCL, name); } else { return (Class<?>) findLoadedClass0MH.invokeExact(loader, name); } } catch (RuntimeException | Error e) { throw e; } catch (Throwable t) { throw new UndeclaredThrowableException(t); } }
... using findLoadedClass(loader, intf.getName()) and only doing Class.forName(intf.getName(), false, loader) if the former returned null ... I managed to reclaim some performance (WeakCache#1+):
Test Threads ns/op Original WeakCache#1 WeakCache#1+ ======================= ======= ============== =========== ============ Proxy_getProxyClass 1 2,403.27 2,174.51 1,589.36 4 3,039.01 2,555.00 1,929.11 8 5,193.58 4,273.42 3,409.77
...but that was still not very satisfactory.
I modified the KeyFactory to create keys that weakly-reference interface Class objects and implement hashCode/equals in terms of comparing those Class objects. With such keys, no false aliasing can occur and the whole validation can be pushed back to slow-path. I tried hard to create keys with as little space overhead as possible:
http://dl.dropboxusercontent.com/u/101777488/jdk8-tl/proxy-wc-wi/webrev.01/i...
...but there can be no miracles. The good news is that the performance is back (WeakCache#2):
Summary (4 Cores x 2 Threads i7 CPU):
Test Threads ns/op Original WeakCache#1 WeakCache#2 ======================= ======= ============== =========== =========== Proxy_getProxyClass 1 2,403.27 2,174.51 163.57 4 3,039.01 2,555.00 211.70 8 5,193.58 4,273.42 322.14
Proxy_isProxyClassTrue 1 95.02 43.14 41.23 4 2,266.29 43.23 42.20 8 4,782.29 72.06 72.21
Proxy_isProxyClassFalse 1 95.02 1.36 1.36 4 2,186.59 1.36 1.36 8 4,891.15 2.72 2.72
Annotation_equals 1 240.10 195.68 194.61 4 1,864.06 201.41 198.81 8 8,639.20 337.46 342.90
... and the most common usage (proxy class implementing exactly one interface) uses even less space than with interface-names-key - 16 bytes saved per proxy class vs. 8 bytes saved (32 bit addressing mode):
-------------------------------------- Original j.l.r.Proxy 1 interfaces / proxy class
class proxy size of delta to loaders classes caches prev.ln. -------- -------- -------- -------- 0 0 400 400 1 1 768 368 1 2 920 152 1 3 1072 152 1 4 1224 152 1 5 1376 152 1 6 1528 152 1 7 1680 152 1 8 1832 152 2 9 2152 320 2 10 2304 152 2 11 2456 152 2 12 2672 216 2 13 2824 152 2 14 2976 152 2 15 3128 152 2 16 3280 152
-------------------------------------- Patched j.l.r.Proxy 1 interfaces / proxy class
class proxy size of delta to loaders classes caches prev.ln. -------- -------- -------- -------- 0 0 240 240 1 1 792 552 1 2 928 136 1 3 1064 136 1 4 1200 136 1 5 1336 136 1 6 1472 136 1 7 1608 136 1 8 1744 136 2 9 2088 344 2 10 2224 136 2 11 2360 136 2 12 2560 200 2 13 2696 136 2 14 2832 136 2 15 2968 136 2 16 3104 136
Did you know, that Proxy.getProxyClass() can generate proxy classes implementing 0 interfaces? It can. There's exactly one such class generated per class loader:
-------------------------------------- Original j.l.r.Proxy 0 interfaces / proxy class
class proxy size of delta to loaders classes caches prev.ln. -------- -------- -------- -------- 0 0 400 400 1 1 760 360 2 2 1072 312
-------------------------------------- Patched j.l.r.Proxy 0 interfaces / proxy class
class proxy size of delta to loaders classes caches prev.ln. -------- -------- -------- -------- 0 0 240 240 1 1 728 488 2 2 1040 312
With 2 or more interfaces implemented by proxy class the space overhead increases faster than with original Proxy:
-------------------------------------- Original j.l.r.Proxy 2 interfaces / proxy class
class proxy size of delta to loaders classes caches prev.ln. -------- -------- -------- -------- 0 0 400 400 1 1 768 368 1 2 920 152 1 3 1072 152 1 4 1224 152 1 5 1376 152 1 6 1528 152 1 7 1680 152 1 8 1832 152 2 9 2152 320 2 10 2304 152 2 11 2456 152 2 12 2672 216 2 13 2824 152 2 14 2976 152 2 15 3128 152 2 16 3280 152
-------------------------------------- Patched j.l.r.Proxy 2 interfaces / proxy class
class proxy size of delta to loaders classes caches prev.ln. -------- -------- -------- -------- 0 0 240 240 1 1 832 592 1 2 1008 176 1 3 1184 176 1 4 1360 176 1 5 1536 176 1 6 1712 176 1 7 1888 176 1 8 2064 176 2 9 2448 384 2 10 2624 176 2 11 2800 176 2 12 3040 240 2 13 3216 176 2 14 3392 176 2 15 3568 176 2 16 3744 176
-------------------------------------- Original j.l.r.Proxy 3 interfaces / proxy class
class proxy size of delta to loaders classes caches prev.ln. -------- -------- -------- -------- 0 0 400 400 1 1 776 376 1 2 936 160 1 3 1096 160 1 4 1256 160 1 5 1416 160 1 6 1576 160 1 7 1736 160 1 8 1896 160 2 9 2224 328 2 10 2384 160 2 11 2544 160 2 12 2768 224 2 13 2928 160 2 14 3088 160 2 15 3248 160 2 16 3408 160
-------------------------------------- Patched j.l.r.Proxy 3 interfaces / proxy class
class proxy size of delta to loaders classes caches prev.ln. -------- -------- -------- -------- 0 0 240 240 1 1 864 624 1 2 1072 208 1 3 1280 208 1 4 1488 208 1 5 1696 208 1 6 1904 208 1 7 2112 208 1 8 2320 208 2 9 2736 416 2 10 2944 208 2 11 3152 208 2 12 3424 272 2 13 3632 208 2 14 3840 208 2 15 4048 208 2 16 4256 208
-------------------------------------- Original j.l.r.Proxy 4 interfaces / proxy class
class proxy size of delta to loaders classes caches prev.ln. -------- -------- -------- -------- 0 0 400 400 1 1 776 376 1 2 936 160 1 3 1096 160 1 4 1256 160 1 5 1416 160 1 6 1576 160 1 7 1736 160 1 8 1896 160 2 9 2224 328 2 10 2384 160 2 11 2544 160 2 12 2768 224 2 13 2928 160 2 14 3088 160 2 15 3248 160 2 16 3408 160
-------------------------------------- Patched j.l.r.Proxy 4 interfaces / proxy class
class proxy size of delta to loaders classes caches prev.ln. -------- -------- -------- -------- 0 0 240 240 1 1 896 656 1 2 1136 240 1 3 1376 240 1 4 1616 240 1 5 1856 240 1 6 2096 240 1 7 2336 240 1 8 2576 240 2 9 3024 448 2 10 3264 240 2 11 3504 240 2 12 3808 304 2 13 4048 240 2 14 4288 240 2 15 4528 240 2 16 4768 240
There's an increase of 8 bytes per proxy class key for each 2 interfaces added to proxy class in original Proxy code, but there's an increase of 32 bytes per proxy class key for each single interface added in patched Proxy code.
I think the most common usage is to implement a single interface and there is 16 bytes gained for each such usage compared to original Proxy code.
2. I did some cleanup to restore some original comments to make the diffs clearer where the change is. 3. I removed the newInstance method which is dead code after my previous code. Since we are in this code, I took the chance to clean that up and also change a couple for-loop to use for-each.
I have put the merged and slightly modified Proxy.java and webrev at: http://cr.openjdk.java.net/~mchung/jdk8/webrevs/7123493/webrev.00/
We will use this bug for your contribution: 7123493 : (proxy) Proxy.getProxyClass doesn't scale under high load
I took j.l.r.Proxy file from your webrev and just changed the KeyFactory implementation. WeakCache is generic and can be used unchanged with either implementation of KeyFactory.
For the weak cache class, since it's for proxy implementation use, I suggest to take out the supportContainsValueOperation flagas it always keeps the reverse map for isProxyClass lookup.
You can simply call Objects.requireNonNull(param) instead of requireNonNull(param, "param-name") since the proxy implementation should have made sure the argument is non-null.
Formatting nits:
68 Object cacheKey = CacheKey.valueOf( 69 key, 70 refQueue 71 );
should be: all in one line or line break on a long-line. Same for method signature.
237 void expungeFrom( 238 ConcurrentMap<?, ? extends ConcurrentMap<?, ?>> map, 239 ConcurrentMap<?, Boolean> reverseMap 240 );
should be:
void expungeFrom(ConcurrentMap<?, ? extends ConcurrentMap<?, ?>> map, ConcurrentMap<?, Boolean> reverseMap);
so that it'll be more consistent with the existing code. I'll do a detailed review on the weak cache class as you will finalize the code per the decision to go with the two-level weak cache.
I hope I have addressed all that in above webrev.
Regards, Peter
Thanks again for the good work.
Mandy [1] http://openjdk.java.net/jeps/161
On 4/21/2013 6:39 AM, Peter Levart wrote:
On 04/21/2013 04:52 AM, Mandy Chung wrote:
Hi Peter,
I want to give it some more thought and discuss with you next week. As for the zero number of interface, I think it's a bug and it should throw IAE if the given interfaces array is empty. Thanks for finding it and I'll file a separate bug for that since it requires spec update/clarification.
I think it's a feature. It's useful, since it forwards Object methods to InvocationHandler (equals, hashCode, ...). Sometimes that's all you need.
Do you have specific use case or know any existing applications doing that? What's the reason one would prefer to create a proxy class with InvocationHandler rather than defining its own class that implements equals, hashCode, or toString() method? Mandy
On 04/23/2013 05:27 AM, Mandy Chung wrote:
On 4/21/2013 6:39 AM, Peter Levart wrote:
On 04/21/2013 04:52 AM, Mandy Chung wrote:
Hi Peter,
I want to give it some more thought and discuss with you next week. As for the zero number of interface, I think it's a bug and it should throw IAE if the given interfaces array is empty. Thanks for finding it and I'll file a separate bug for that since it requires spec update/clarification.
I think it's a feature. It's useful, since it forwards Object methods to InvocationHandler (equals, hashCode, ...). Sometimes that's all you need.
Do you have specific use case or know any existing applications doing that? What's the reason one would prefer to create a proxy class with InvocationHandler rather than defining its own class that implements equals, hashCode, or toString() method?
I really don't have a use-case here. I can only think of a hypothetical case where one would already have an InvocationHandler capable of serving proxies for various interfaces, including equals/hashCode methods. Now to be able to re-use that logic for constructing keys for Maps, one could simply request a proxy with no interfaces. The fact is that currently this is allowed and although there might not be any usages, it doesn't hurt to allow it in the future and there's no performance cost supporting it. I think that not putting artificial constraints on API usage without a reason is a good design. It's similar in a way to some things in language, like for example the support for empty arrays. Why would anyone want to have an empty array? Regards, Peter P.S. What do you think of the latest webrev? I did some further simplifications to code (removed checks for reverseMap != null) and re-worded some javadocs. I can publish another webrev together with possible further changes when you review it.
Mandy
On 4/22/2013 11:24 PM, Peter Levart wrote:
On 04/23/2013 05:27 AM, Mandy Chung wrote:
On 4/21/2013 6:39 AM, Peter Levart wrote:
On 04/21/2013 04:52 AM, Mandy Chung wrote:
Hi Peter,
I want to give it some more thought and discuss with you next week. As for the zero number of interface, I think it's a bug and it should throw IAE if the given interfaces array is empty. Thanks for finding it and I'll file a separate bug for that since it requires spec update/clarification.
I think it's a feature. It's useful, since it forwards Object methods to InvocationHandler (equals, hashCode, ...). Sometimes that's all you need.
Do you have specific use case or know any existing applications doing that? What's the reason one would prefer to create a proxy class with InvocationHandler rather than defining its own class that implements equals, hashCode, or toString() method?
I really don't have a use-case here. I can only think of a hypothetical case where one would already have an InvocationHandler capable of serving proxies for various interfaces, including equals/hashCode methods. Now to be able to re-use that logic for constructing keys for Maps, one could simply request a proxy with no interfaces. The fact is that currently this is allowed and although there might not be any usages, it doesn't hurt to allow it in the future and there's no performance cost supporting it. I think that not putting artificial constraints on API usage without a reason is a good design.
I can't say for sure allowing to define a proxy class with no interface is a good design and thus I propose to file a separate issue to follow up and determine if there is any issue with and without such constraint (interfaces.length() > 0 or not). Yes - I am in a position not to allow creating a proxy class with no interface since that was not the original use case (for RMI). But I see your point and we need to investigate before we decide one way or the other. If we determine no need to constraint that, we can close that bug as will not fix.
It's similar in a way to some things in language, like for example the support for empty arrays. Why would anyone want to have an empty array?
I don't think the Proxy API can be compared with empty array language support :)
Regards, Peter
P.S. What do you think of the latest webrev? I did some further simplifications to code (removed checks for reverseMap != null) and re-worded some javadocs. I can publish another webrev together with possible further changes when you review it.
Sorry I didn't get the chance to look at your past 2 webrevs (I was distracted with higher priority tasks). Mandy
Mandy
Hi Peter, Sorry for the delay. On 4/20/2013 12:31 AM, Peter Levart wrote:
Hi Mandy,
I have another idea. Before jumping to implement it, I will first ask what do you think of it. For example:
- have an optimal interface names key calculated from interfaces. - visibility of interfaces and other validations are pushed to slow-path (inside ProxyClassFactory.apply) - the proxy Class object returned from WeakCache.get() is post-validated with a check like:
for (Class<?> intf : interfaces) { if (!intf.isAssignableFrom(proxyClass)) { throw new IllegalArgumentException(); } } // return post-validated proxyClass from getProxyClass0()...
I feel that Class.isAssignableFrom(Class) check could be much faster and that the only reason the check can fail is if some interface is not visible from the class loader.
Am I correct?
The isAssignableFrom check should be correct for well-behaved class loaders [1]. However, for non well-behaved class loaders, I'm not absolutely confident that this is right. The case that I was concerned is when intf.isAssignableFrom(proxyClass) returns true but the proxy class doesn't implement the runtime types (i.e. given interfaces). Precise check should be to validate if the given interfaces == the proxy interfaces implemented by the cached proxy class (i.e. proxyClass.getInterfaces()). Class.isAssignableFrom method checks if the proxy class is a subtype of the interface and no class loading/resolution involved. It's definitely much cheaper than Class.forName which involves checking if a class is loaded (class loader delegation and class file search if not loaded - which is not the case you measure). The existing implementation uses the interface names as the key to the per-loader proxy class cache. I think we should use the Class<?>[] as the key (your webrev.02). I have quickly modified the version I sent you (simply change the Key class to use Class<?>[] rather than String[] but didn't change the concurrent map cache to keep weak references) and the benchmark shows comparable speedup.
I just noticed the following (have been thinking of it already, but then forgot) in original code:
/* 512 * Note that we need not worry about reaping the cache for 513 * entries with cleared weak references because if a proxy class 514 * has been garbage collected, its class loader will have been 515 * garbage collected as well, so the entire cache will be reaped 516 * from the loaderToCache map. 517 */
Hmm... I think the original code has the class loader leak. The WeakHashMap<ClassLoader, HashMap<List<String>, Class>> keeps a weak reference to the ClassLoader but a strong reference to the cache of the proxy classes that are defined by that class loader. The proxy classes are not GC'ed and thus the class loader is still kept alive.
Each ClassLoader maintains explicit hard-references to all Class objects for classes defined by the loader. So proxy Class object can not be GC-ed until the ClassLoader is GC-ed. So we need not register the CacheValue objects in WeakCache with a refQueue. The expunging of reverseMap entries is already performed with CacheKey when it is cleared and equeued. There's no harm as it is, since the clean-up is performed with all the checks and is idempotent, but it need not be done for ClassValue objects holding weak references to proxy Class objects.
As explained above, for the per-loader proxy class cache, both the key (interfaces) and the proxy class should be wrapped with weak reference. In your revisions, you optimize for 0-interface and 1-interface proxy class. What I hacked up earlier was just to use Class<?>[] as the key (need to make a copy of the array to prevent that being mutated during runtime) that is a simpler and straightforward implementation. I didn't measure the footprint and compare the performance of your versions. Have you seen any performance difference which led you to make the recent changes? Mandy [1] http://docs.oracle.com/javase/specs/jvms/se7/html/jvms-5.html#jvms-5.3
More comments in addition to what I replied earlier.... On 4/23/2013 4:43 PM, Mandy Chung wrote:
Each ClassLoader maintains explicit hard-references to all Class objects for classes defined by the loader. So proxy Class object can not be GC-ed until the ClassLoader is GC-ed.
AFAIU, a class loader will be GC'ed as long as there is no reference to any of its loaded classes. The ClassLoader's internal vector of keeping the loaded classes is to prevent the classes from being GC'ed until the class loader is being GC'ed which will unload the classes. So we should make the class loader and the proxy classes weak referenced so that they will be GC'ed properly.
So we need not register the CacheValue objects in WeakCache with a refQueue. The expunging of reverseMap entries is already performed with CacheKey when it is cleared and equeued. There's no harm as it is, since the clean-up is performed with all the checks and is idempotent, but it need not be done for ClassValue objects holding weak references to proxy Class objects.
I actually think we don't need to make CacheKey as a weak reference but the CacheValue object should still be registered in the refQueue. The proxy class in the CacheValue implementing the given interfaces always reference the interfaces in the CacheKey and it means that the classes in the CacheKey will never be GC'ed before the proxy class. When there is no reference to the proxy class, it will be added to the reference queue. Once the entry with the CacheValue holding the proxy class is expunged, the interfaces will not be referenced in the cache. Does this make sense to you? Mandy
As explained above, for the per-loader proxy class cache, both the key (interfaces) and the proxy class should be wrapped with weak reference.
In your revisions, you optimize for 0-interface and 1-interface proxy class. What I hacked up earlier was just to use Class<?>[] as the key (need to make a copy of the array to prevent that being mutated during runtime) that is a simpler and straightforward implementation. I didn't measure the footprint and compare the performance of your versions. Have you seen any performance difference which led you to make the recent changes?
Mandy [1] http://docs.oracle.com/javase/specs/jvms/se7/html/jvms-5.html#jvms-5.3
On 04/24/2013 07:33 AM, Mandy Chung wrote:
More comments in addition to what I replied earlier....
On 4/23/2013 4:43 PM, Mandy Chung wrote:
Each ClassLoader maintains explicit hard-references to all Class objects for classes defined by the loader. So proxy Class object can not be GC-ed until the ClassLoader is GC-ed.
AFAIU, a class loader will be GC'ed as long as there is no reference to any of its loaded classes. The ClassLoader's internal vector of keeping the loaded classes is to prevent the classes from being GC'ed until the class loader is being GC'ed which will unload the classes.
So we should make the class loader and the proxy classes weak referenced so that they will be GC'ed properly.
That's right.
So we need not register the CacheValue objects in WeakCache with a refQueue. The expunging of reverseMap entries is already performed with CacheKey when it is cleared and equeued. There's no harm as it is, since the clean-up is performed with all the checks and is idempotent, but it need not be done for ClassValue objects holding weak references to proxy Class objects.
I actually think we don't need to make CacheKey as a weak reference but the CacheValue object should still be registered in the refQueue. The proxy class in the CacheValue implementing the given interfaces always reference the interfaces in the CacheKey and it means that the classes in the CacheKey will never be GC'ed before the proxy class. When there is no reference to the proxy class, it will be added to the reference queue. Once the entry with the CacheValue holding the proxy class is expunged, the interfaces will not be referenced in the cache.
But! If the interface in the sub-key (the CacheKey in WeakCache is the 1st-level key and is a WeakReference<ClassLoader>, the sub-key holds interfaces or interface names) happens to be loaded by the same ClassLoader as the proxy class, the ClassLoader will never be GC-ed and consequently the ClassValue will never be cleared and the entry will never be expunged... We have to wrap with a WeakReference: - the ClassLoader - each individual interface Class object - proxy Class object All 3 types of objects can have implicit or explicit strong references among them. Regards, Peter
Does this make sense to you? Mandy
As explained above, for the per-loader proxy class cache, both the key (interfaces) and the proxy class should be wrapped with weak reference.
In your revisions, you optimize for 0-interface and 1-interface proxy class. What I hacked up earlier was just to use Class<?>[] as the key (need to make a copy of the array to prevent that being mutated during runtime) that is a simpler and straightforward implementation. I didn't measure the footprint and compare the performance of your versions. Have you seen any performance difference which led you to make the recent changes?
Mandy [1] http://docs.oracle.com/javase/specs/jvms/se7/html/jvms-5.html#jvms-5.3
Hi Mandy, On 04/24/2013 01:43 AM, Mandy Chung wrote:
Hi Peter,
Sorry for the delay.
On 4/20/2013 12:31 AM, Peter Levart wrote:
Hi Mandy,
I have another idea. Before jumping to implement it, I will first ask what do you think of it. For example:
- have an optimal interface names key calculated from interfaces. - visibility of interfaces and other validations are pushed to slow-path (inside ProxyClassFactory.apply) - the proxy Class object returned from WeakCache.get() is post-validated with a check like:
for (Class<?> intf : interfaces) { if (!intf.isAssignableFrom(proxyClass)) { throw new IllegalArgumentException(); } } // return post-validated proxyClass from getProxyClass0()...
I feel that Class.isAssignableFrom(Class) check could be much faster and that the only reason the check can fail is if some interface is not visible from the class loader.
Am I correct?
The isAssignableFrom check should be correct for well-behaved class loaders [1]. However, for non well-behaved class loaders, I'm not absolutely confident that this is right. The case that I was concerned is when intf.isAssignableFrom(proxyClass) returns true but the proxy class doesn't implement the runtime types (i.e. given interfaces). Precise check should be to validate if the given interfaces == the proxy interfaces implemented by the cached proxy class (i.e. proxyClass.getInterfaces()).
Really? This can happen? Could you describe a situation when?
Class.isAssignableFrom method checks if the proxy class is a subtype of the interface and no class loading/resolution involved. It's definitely much cheaper than Class.forName which involves checking if a class is loaded (class loader delegation and class file search if not loaded - which is not the case you measure).
The existing implementation uses the interface names as the key to the per-loader proxy class cache. I think we should use the Class<?>[] as the key (your webrev.02). I have quickly modified the version I sent you (simply change the Key class to use Class<?>[] rather than String[] but didn't change the concurrent map cache to keep weak references) and the benchmark shows comparable speedup.
I just noticed the following (have been thinking of it already, but then forgot) in original code:
/* 512 * Note that we need not worry about reaping the cache for 513 * entries with cleared weak references because if a proxy class 514 * has been garbage collected, its class loader will have been 515 * garbage collected as well, so the entire cache will be reaped 516 * from the loaderToCache map. 517 */
Hmm... I think the original code has the class loader leak. The WeakHashMap<ClassLoader, HashMap<List<String>, Class>> keeps a weak reference to the ClassLoader but a strong reference to the cache of the proxy classes that are defined by that class loader. The proxy classes are not GC'ed and thus the class loader is still kept alive.
Original code is actualy using the following structure: WeakHashMap<ClassLoader, HashMap<List<String>, WeakReference<Class>>> ... so no ClassLoader leak here... The TwoLevelWeakCache is an analogous structure: ConcurrentHashMap<WeakReference<ClassLoader>, ConcurrentHashMap<SubKey, WeakReference<Class>>> The point I was trying to make is that it is not needed to register a ReferenceQueue with WeakReference<Class> values and remove entries as they are cleared and enqueued, because they will not be cleared until the ClassLoader that defines the Class-es is GC-ed and the expunging logic can work solely on the grounds of cleared and enqueued WeakReference<ClassLoader> keys, which is what my latest webrev using String-based sub-keys does (I have to update the other too).
Each ClassLoader maintains explicit hard-references to all Class objects for classes defined by the loader. So proxy Class object can not be GC-ed until the ClassLoader is GC-ed. So we need not register the CacheValue objects in WeakCache with a refQueue. The expunging of reverseMap entries is already performed with CacheKey when it is cleared and equeued. There's no harm as it is, since the clean-up is performed with all the checks and is idempotent, but it need not be done for ClassValue objects holding weak references to proxy Class objects.
As explained above, for the per-loader proxy class cache, both the key (interfaces) and the proxy class should be wrapped with weak reference.
Not the key (interfaces array) but the individual interface Class object - each has to be separately wrapped with a WeakReference. If you just do the WeakReference<Class[]> it will quickly be cleared since no-one is holding a strong reference to the array object. The following webrev was an attempt in that direction (notice different name in URL: proxy-wc-wi - WeakCache-WeakInterfaces - I maintain separate numbering for webrevs in this branch): http://dl.dropboxusercontent.com/u/101777488/jdk8-tl/proxy-wc-wi/webrev.02/i...
In your revisions, you optimize for 0-interface and 1-interface proxy class. What I hacked up earlier was just to use Class<?>[] as the key (need to make a copy of the array to prevent that being mutated during runtime) that is a simpler and straightforward implementation. I didn't measure the footprint and compare the performance of your versions. Have you seen any performance difference which led you to make the recent changes?
I developed two different approaches: 1. Key made of WeakReference-s to interface Class objects. Strong points: - no key aliasing, validation can be pushed entirely to slow-path - quick and scalable - less memory usage than original code for 0 and 1-interface proxies Weak points: - more memory usage for 2+ -interface proxies Latest webrev: http://dl.dropboxusercontent.com/u/101777488/jdk8-tl/proxy-wc-wi/webrev.02/i... Comments: I like this one. If 2+ -interface proxies are relatively rare and you don't mind extra space consumption for them, I would go with this approach. 2. Key made of interned Strings (names of interfaces) Strong points: - quick and scalable - much less memory usage than original code for all variations of interface counts and in particular for 0, 1 and 2-interface proxies Weak points: - key is aliased, so the validation of interfaces has to be done - I tried to do it post-festum with intf.isAssignableFrom(proxyClass) but you say that is not reliable. If the validation is performed on fast-path before proxy class is obtained, using Class.forName, the slow-down is huge. Latest webrev: http://dl.dropboxusercontent.com/u/101777488/jdk8-tl/proxy-wc/webrev.03/inde... Comments: This is the best space-saving approach. With tricks for single and two-interface keys, the savings are noticable. So, I would really like to understand the situation where intf.isAssignableFrom(proxyClass) returns true, but proxyClass does not implement the interface. Maybe we could work-it out somehow. Are you thinking of a situation like: - intf1: pkg.Interface, loaded by classloader1 - intf2: pkg.SubInterface extends pkg.Interface, loaded by classloader1 - intf3: pkg.Interface extends pkg.SubInterface, loaded by classloader2 which is child of classloader1 Now you call: proxy3 = Proxy.getProxyClass(classloader2, intf3); followed by: proxy1 = Proxy.getProxyClass(classloader2, intf1); Is it possible that the second call succeeds and returns proxy1 == proxy3 ? Regards, Peter
Mandy [1] http://docs.oracle.com/javase/specs/jvms/se7/html/jvms-5.html#jvms-5.3
On 04/24/2013 08:58 AM, Peter Levart wrote:
In your revisions, you optimize for 0-interface and 1-interface proxy class. What I hacked up earlier was just to use Class<?>[] as the key (need to make a copy of the array to prevent that being mutated during runtime) that is a simpler and straightforward implementation. I didn't measure the footprint and compare the performance of your versions. *Have you seen any performance difference which led you to make the recent changes?*
I developed two different approaches:
1. Key made of WeakReference-s to interface Class objects. Strong points: - no key aliasing, validation can be pushed entirely to slow-path - quick and scalable - less memory usage than original code for 0 and 1-interface proxies
That's the sole reason for optimized: WekaReferece<Class> -> WeakReference<Class> -> ... -> WeakReference<Class> linked list instead of the WeakReference<Class>[] array approach. And it's not using any recursion for equals/hashCode, so the peformance is comparable to array approach and it doesn't suffer from StackOverflowExceptions for lots of interfaces.
Weak points: - more memory usage for 2+ -interface proxies Latest webrev: http://dl.dropboxusercontent.com/u/101777488/jdk8-tl/proxy-wc-wi/webrev.02/i... Comments: I like this one. If 2+ -interface proxies are relatively rare and you don't mind extra space consumption for them, I would go with this approach.
2. Key made of interned Strings (names of interfaces) Strong points: - quick and scalable - much less memory usage than original code for all variations of interface counts and in particular for 0, 1 and 2-interface proxies
The special-cased keys for 1 and 2-interface proxies (the most frequent) make for additional space savings.
Weak points: - key is aliased, so the validation of interfaces has to be done - I tried to do it post-festum with intf.isAssignableFrom(proxyClass) but you say that is not reliable. If the validation is performed on fast-path before proxy class is obtained, using Class.forName, the slow-down is huge. Latest webrev: http://dl.dropboxusercontent.com/u/101777488/jdk8-tl/proxy-wc/webrev.03/inde... Comments: This is the best space-saving approach. With tricks for single and two-interface keys, the savings are noticable.
Regards, Peter
On 4/23/2013 11:58 PM, Peter Levart wrote:
The isAssignableFrom check should be correct for well-behaved class loaders [1]. However, for non well-behaved class loaders, I'm not absolutely confident that this is right. The case that I was concerned is when intf.isAssignableFrom(proxyClass) returns true but the proxy class doesn't implement the runtime types (i.e. given interfaces). Precise check should be to validate if the given interfaces == the proxy interfaces implemented by the cached proxy class (i.e. proxyClass.getInterfaces()).
Really? This can happen? Could you describe a situation when? Are you thinking of a situation like:
- intf1: pkg.Interface, loaded by classloader1 - intf2: pkg.SubInterface extends pkg.Interface, loaded by classloader1 - intf3: pkg.Interface extends pkg.SubInterface, loaded by classloader2 which is child of classloader1
Similar but classloader2 is non well-behaved classloader e.g. doesn't have relationship with classloader1; otherwise I don't think it's possible to define intf3 (classloader1.loadClass("pkg.Interface") should be resolved with intf1 instead (not its own defined pkg.Interface). I haven't been able to identify such a case but I wasn't able to prove it not possible either as it is subject to non well-behaved class loader behavior :) The isAssignableFrom method returns true not only if it's identical but also if it's a superinterface that a class implements.
Now you call:
proxy3 = Proxy.getProxyClass(classloader2, intf3);
followed by:
proxy1 = Proxy.getProxyClass(classloader2, intf1);
Is it possible that the second call succeeds and returns proxy1 == proxy3 ?
If it's possible to have intf1 and intf3 different runtime type of the same name, the second getProxyclass call would return proxy3 since intf1.isAssignableFrom(proxy3).What I'm not certain is - how would classloader2 be able to define intf3 with classloader1 defining intf1? We can't really predict how a non well-behaved class loader and thus I wouldn't exclude the possibility. It seems that the key matching the list of interface names should already be a safe guard but we would need a proof for a name + isAssignableFrom is equivalent to that runtime type to determine its correctness with the consideration of all possible class loader behaviors that I don't have. That's what my feedback was about. Mandy
Hi Peter, We both prefer the interface types as the subKey. See my comment inlined below. On 4/23/2013 11:58 PM, Peter Levart wrote:
I developed two different approaches:
1. Key made of WeakReference-s to interface Class objects. Strong points: - no key aliasing, validation can be pushed entirely to slow-path - quick and scalable - less memory usage than original code for 0 and 1-interface proxies Weak points: - more memory usage for 2+ -interface proxies Latest webrev: http://dl.dropboxusercontent.com/u/101777488/jdk8-tl/proxy-wc-wi/webrev.02/i... Comments: I like this one. If 2+ -interface proxies are relatively rare and you don't mind extra space consumption for them, I would go with this approach.
I like this one too. The mapping is straight forward and clean that avoids the fallback and post validation path. Let's proceed with this one. It's good to optimize for the common case (1-interface). For 2 or more interfaces, we can improve the memory usage in the future when it turns out to be an issue. I'm fine with the zero-interface proxy which is the current implementation too.
That's the sole reason for optimized: WekaReferece<Class> -> WeakReference<Class> -> ... -> WeakReference<Class> linked list instead of the WeakReference<Class>[] array approach. And it's not using any recursion for equals/hashCode, so the peformance is comparable to array approach and it doesn't suffer from StackOverflowExceptions for lots of interfaces.
I'm not objecting to build its own linked list but I'm not convinced if it's worth the extra complexity (although it's simple, this adds KeyNode, MidKeyNode, Key1, KeyX classes). For 1-interface proxy, you can make it one single weak reference as you have right now. I would simply think the array approach for 2+ interface proxy is preferable and the performance is comparable as you noted. Did I miss any important reason you chose the linked list approach? If not, let's do the simple approach that will help the future maintainers of this code. The change in Proxy.java looks good to me with the comment about the array vs custom linked list. WeakCache.java The javadoc for the get(K,P) method - @throws RuntimeException and @throws Error are not needed here since any method being called in the implementation may throw unchecked exceptions. There are a couple places that checks if (reverseMap != null) .... that check is not needed since it's always non-null. But I'm fine if you keep it as it is - just wanted to mention it in case it was just leftover from the previous version. I think we're very close of getting this pushed. Below are my comments to follow up the discussion we had last night and this morning just for clarification. We should be now on the same page.
Original code is actualy using the following structure: WeakHashMap<ClassLoader, HashMap<List<String>, WeakReference<Class>>> ... so no ClassLoader leak here...
Oops... somehow I thought the original code didn't make a weak reference of the proxy class. I should have looked at the original code before I replied :(
The TwoLevelWeakCache is an analogous structure: ConcurrentHashMap<WeakReference<ClassLoader>, ConcurrentHashMap<SubKey, WeakReference<Class>>>
The point I was trying to make is that it is not needed to register a ReferenceQueue with WeakReference<Class> values and remove entries as they are cleared and enqueued, because they will not be cleared until the ClassLoader that defines the Class-es is GC-ed and the expunging logic can work solely on the grounds of cleared and enqueued WeakReference<ClassLoader> keys, which is what my latest webrev using String-based sub-keys does (I have to update the other too).
Agree. That makes sense. I got confused with CacheKey and the subkey. I am now reading the WeakCache code closely. You got it right that the key (defining ClassLoader), subkey (individual interfaces), and value (proxy class) have to be weakly referenced to ensure that classes loaded by any classloader cached as a key are not strongly reachable not causing the class loader leak.
Not the key (interfaces array) but the individual interface Class object - each has to be separately wrapped with a WeakReference.
Sorry I should have been more precise - that's indeed what I meant. thanks Mandy
Hi Mandy, Here's another update that changes the sub-key back to WeakReference<Class> based: http://dl.dropboxusercontent.com/u/101777488/jdk8-tl/proxy-wc/webrev.05/inde... On 04/25/2013 03:38 AM, Mandy Chung wrote:
I like this one too. The mapping is straight forward and clean that avoids the fallback and post validation path. Let's proceed with this one. It's good to optimize for the common case (1-interface). For 2 or more interfaces, we can improve the memory usage in the future when it turns out to be an issue. I'm fine with the zero-interface proxy which is the current implementation too.
I made 3 straight-forward implementations of sub-keys: - Key1 - single interface - Key2 - two interfaces - KeyX - any number of interfaces The cache-structure size increment for each new cached proxy-class is (32 bit OOPS): #of intfcs original patched ---------- -------- ------------ 1 152 128(Key1) 2 152 168(Key2), 208(KeyX) 3 160 248(KeyX) 4 160 280(KeyX) ...so you can decide if Key2 is worth having or not.
WeakCache.java The javadoc for the get(K,P) method - @throws RuntimeException and @throws Error are not needed here since any method being called in the implementation may throw unchecked exceptions. There are a couple places that checks if (reverseMap != null) .... that check is not needed since it's always non-null. But I'm fine if you keep it as it is - just wanted to mention it in case it was just leftover from the previous version.
Removed the unneeded @throws and reverseMap != null checks (the later was already removed in latest string-based webrev and I used that version here).
I think we're very close of getting this pushed.
Do you think this could also get backported to JDK7? The WeakCache uses two interfaces from java.util.function. Should we make private equivalents in this patch or do we leave that for the possible back-porting effort. I should note that JDK7's ConcurrentHashMap is not that space-efficient as proposed JDK8's, so space-usage would be different on JDK7... Regards, Peter
thanks Mandy
Hi Peter, This looks great. I have imported your patch with slight modification in WeakCache: http://cr.openjdk.java.net/~mchung/jdk8/webrevs/7123493/webrev.01/ I believe WeakCache.get(K, P) should throw NPE if key is null and I fixed that. I changed refQueue to be of type ReferenceQueue<K> rather than ReferenceQueue<Object> since CacheValue no longer added to the ref queue. In the expungeStaleEntries method, change CacheKey<?> to CacheKey<K>. WeakCache.get(K, P) takes instance of K and P but subKeyFactory and valueFactory take superclasses of K and P - is that what you really want? I have changed them to BiFunction<K,P,...>. I also fixed a few typos and that's all. The performance improvement is significant and I want to push this version to jdk8/tl. We can tune the memory usage in the future if that turns out to be an issue. I don't have plan to backport to jdk7u-dev unless there are customers escalating this :) This should be easy to convert without using BiFunction and Supplier and I will leave it as it is until there is a request to backport. I keep Key2 class since jdk also creates a proxy of 2 interfaces and you have already implemented it. If we think of a better way to replace the generation of the subkey in an optimized way, we could improve that in the future. The first and second level maps maintained in the WeakCache have Object as the type for the key which I think we should look for a specific type (CacheKey and SubKey type). To make the key of the first-level map to CacheKey would need to keep a separate cache for null key. For the second-level map, the subkey type would need to be declared as a parameter type to WeakCache. They are something that we should look at and clean up in the future if appropriate. I think what you have is good work that shouldn't be delayed any further. I'm running more tests. If the above webrev looks fine with you, I'll push the changeset tomorrow. thanks Mandy On 4/25/13 8:40 AM, Peter Levart wrote:
Hi Mandy,
Here's another update that changes the sub-key back to WeakReference<Class> based:
http://dl.dropboxusercontent.com/u/101777488/jdk8-tl/proxy-wc/webrev.05/inde...
On 04/25/2013 03:38 AM, Mandy Chung wrote:
I like this one too. The mapping is straight forward and clean that avoids the fallback and post validation path. Let's proceed with this one. It's good to optimize for the common case (1-interface). For 2 or more interfaces, we can improve the memory usage in the future when it turns out to be an issue. I'm fine with the zero-interface proxy which is the current implementation too.
I made 3 straight-forward implementations of sub-keys: - Key1 - single interface - Key2 - two interfaces - KeyX - any number of interfaces
The cache-structure size increment for each new cached proxy-class is (32 bit OOPS):
#of intfcs original patched ---------- -------- ------------ 1 152 128(Key1) 2 152 168(Key2), 208(KeyX) 3 160 248(KeyX) 4 160 280(KeyX)
...so you can decide if Key2 is worth having or not.
WeakCache.java The javadoc for the get(K,P) method - @throws RuntimeException and @throws Error are not needed here since any method being called in the implementation may throw unchecked exceptions. There are a couple places that checks if (reverseMap != null) .... that check is not needed since it's always non-null. But I'm fine if you keep it as it is - just wanted to mention it in case it was just leftover from the previous version.
Removed the unneeded @throws and reverseMap != null checks (the later was already removed in latest string-based webrev and I used that version here).
I think we're very close of getting this pushed.
Do you think this could also get backported to JDK7? The WeakCache uses two interfaces from java.util.function. Should we make private equivalents in this patch or do we leave that for the possible back-porting effort. I should note that JDK7's ConcurrentHashMap is not that space-efficient as proposed JDK8's, so space-usage would be different on JDK7...
Regards, Peter
thanks Mandy
Hi Mandy, Just a note on null keys (1st level)... On 04/25/2013 11:53 PM, Mandy Chung wrote:
Hi Peter,
This looks great. I have imported your patch with slight modification in WeakCache: http://cr.openjdk.java.net/~mchung/jdk8/webrevs/7123493/webrev.01/
I believe WeakCache.get(K, P) should throw NPE if key is null and I fixed that.
The null key support is necessary to support bootstrap classloader as a key. It could be supported by a separate 2nd-level map internally (instead of using singleton NULL_KEY substitute Object), but the external API must support null. And as I can see the webrev at above URL still supports it.
I changed refQueue to be of type ReferenceQueue<K> rather than ReferenceQueue<Object> since CacheValue no longer added to the ref queue. In the expungeStaleEntries method, change CacheKey<?> to CacheKey<K>. WeakCache.get(K, P) takes instance of K and P but subKeyFactory and valueFactory take superclasses of K and P - is that what you really want? I have changed them to BiFunction<K,P,...>. I also fixed a few typos and that's all.
It's just standard wildcards to allow functions with contra-varint parameters and co-variant returns. It's useful as a generic API but in our concrete usage doesn't have a value.
The performance improvement is significant and I want to push this version to jdk8/tl. We can tune the memory usage in the future if that turns out to be an issue. I don't have plan to backport to jdk7u-dev unless there are customers escalating this :) This should be easy to convert without using BiFunction and Supplier and I will leave it as it is until there is a request to backport.
I keep Key2 class since jdk also creates a proxy of 2 interfaces and you have already implemented it. If we think of a better way to replace the generation of the subkey in an optimized way, we could improve that in the future. The first and second level maps maintained in the WeakCache have Object as the type for the key which I think we should look for a specific type (CacheKey and SubKey type). To make the key of the first-level map to CacheKey would need to keep a separate cache for null key. For the second-level map, the subkey type would need to be declared as a parameter type to WeakCache. They are something that we should look at and clean up in the future if appropriate. I think what you have is good work that shouldn't be delayed any further.
The CacheKey is only private and internal to WeekCache, so making the 1st-level map's key as Object allows for NULL_KEY trick and makes logic more uniform. If bootstrap classloader is used a lot to define proxy classes, then a separate map can be viewed as a little speed-up for a special case though (saves one Map.get, but introduces complications with lazy instantiation - with AtomicReference perhaps). The sub-key as a type parameter would only have some value if we would want the user of WeakCache to constrain himself on the type of sub-keys returned by the supplied subKeyFunction - so the constraint (the type of sub-keys) would be specified together with the constrained function - at the WeakCache constructor call site. In our case we would have to instantiate it as Object (the common supertype of key0, Key1, Key2 and KeyX). The type parameter for sub-key has little value in general, since WeakCache only needs them to be Objects and sub-keys are never "returned" from the methods of the public API...
I'm running more tests. If the above webrev looks fine with you, I'll push the changeset tomorrow.
thanks Mandy
Fingers crossed. Regards, Peter
On 4/25/13 8:40 AM, Peter Levart wrote:
Hi Mandy,
Here's another update that changes the sub-key back to WeakReference<Class> based:
http://dl.dropboxusercontent.com/u/101777488/jdk8-tl/proxy-wc/webrev.05/inde...
On 04/25/2013 03:38 AM, Mandy Chung wrote:
I like this one too. The mapping is straight forward and clean that avoids the fallback and post validation path. Let's proceed with this one. It's good to optimize for the common case (1-interface). For 2 or more interfaces, we can improve the memory usage in the future when it turns out to be an issue. I'm fine with the zero-interface proxy which is the current implementation too.
I made 3 straight-forward implementations of sub-keys: - Key1 - single interface - Key2 - two interfaces - KeyX - any number of interfaces
The cache-structure size increment for each new cached proxy-class is (32 bit OOPS):
#of intfcs original patched ---------- -------- ------------ 1 152 128(Key1) 2 152 168(Key2), 208(KeyX) 3 160 248(KeyX) 4 160 280(KeyX)
...so you can decide if Key2 is worth having or not.
WeakCache.java The javadoc for the get(K,P) method - @throws RuntimeException and @throws Error are not needed here since any method being called in the implementation may throw unchecked exceptions. There are a couple places that checks if (reverseMap != null) .... that check is not needed since it's always non-null. But I'm fine if you keep it as it is - just wanted to mention it in case it was just leftover from the previous version.
Removed the unneeded @throws and reverseMap != null checks (the later was already removed in latest string-based webrev and I used that version here).
I think we're very close of getting this pushed.
Do you think this could also get backported to JDK7? The WeakCache uses two interfaces from java.util.function. Should we make private equivalents in this patch or do we leave that for the possible back-porting effort. I should note that JDK7's ConcurrentHashMap is not that space-efficient as proposed JDK8's, so space-usage would be different on JDK7...
Regards, Peter
thanks Mandy
On 04/26/2013 08:57 AM, Mandy Chung wrote:
On 4/25/2013 11:53 PM, Peter Levart wrote:
I believe WeakCache.get(K, P) should throw NPE if key is null and I fixed that.
Oops... typo sorry for the confusion.- I meant WeakCache.containsKey(V value) should throw NPE if value is null.
I agree. Analogous to ConcurrentHashMap, for example.
Mandy
Peter, I have pushed the changeset: Changeset: 5e7ae178b24d Author: plevart Date: 2013-04-26 16:09 -0700 URL:http://hg.openjdk.java.net/jdk8/tl/jdk/rev/5e7ae178b24d Mandy
On 27/04/2013 00:13, Mandy Chung wrote:
Peter,
I have pushed the changeset:
Changeset: 5e7ae178b24d Author: plevart Date: 2013-04-26 16:09 -0700 URL:http://hg.openjdk.java.net/jdk8/tl/jdk/rev/5e7ae178b24d
Mandy This is great work, thank you!
-Alan.
On 04/27/2013 11:37 AM, Alan Bateman wrote:
On 27/04/2013 00:13, Mandy Chung wrote:
Peter,
I have pushed the changeset:
Changeset: 5e7ae178b24d Author: plevart Date: 2013-04-26 16:09 -0700 URL:http://hg.openjdk.java.net/jdk8/tl/jdk/rev/5e7ae178b24d
Mandy This is great work, thank you!
+1. -Aleksey. P.S. Dear Santa, please get more people like Peter in this project, please?
On 04/24/2013 01:43 AM, Mandy Chung wrote:
Precise check should be to validate if the given interfaces == the proxy interfaces implemented by the cached proxy class (i.e. proxyClass.getInterfaces()).
Hi Mandy, I will try to profile this approach as a post-validation and let you know the results. Regards, Peter
On 04/24/2013 09:16 AM, Peter Levart wrote:
On 04/24/2013 01:43 AM, Mandy Chung wrote:
Precise check should be to validate if the given interfaces == the proxy interfaces implemented by the cached proxy class (i.e. proxyClass.getInterfaces()).
Hi Mandy,
I will try to profile this approach as a post-validation and let you know the results.
Hi Mandy, Here's the modified webrev using optimized String-based sub-keys and pos-validation of proxyClass using proxyClass.getInterfaces(): https://dl.dropboxusercontent.com/u/101777488/jdk8-tl/proxy-wc/webrev.04/ind... The benchmark results are the following: Summary (4 Cores x 2 Threads i7 CPU): WeakCache+ ns/op Strings key+ Test Threads Original getIntfcs() post-valid. ======================= ======= =========== =========== Proxy_getProxyClass 1 2,453.77 565.96 4 3,051.12 653.81 8 5,113.27 1,174.33 Proxy_isProxyClassTrue 1 94.96 41.47 4 2,102.29 41.57 8 4,823.50 71.80 Proxy_isProxyClassFalse 1 86.59 1.36 4 2,139.05 1.36 8 4,639.17 2.72 Annotation_equals 1 222.86 195.39 4 2,972.93 197.66 8 9,658.96 338.62 ... not that bad, but not so great either, compared to Weakly referenced interfaces-based sub-key and no post-validation: WeakCache+ Test Threads ns/op Original intfcs key ======================= ======= ============== =========== Proxy_getProxyClass 1 2,403.27 163.57 4 3,039.01 211.70 8 5,193.58 322.14 Proxy_isProxyClassTrue 1 95.02 41.23 4 2,266.29 42.20 8 4,782.29 72.21 Proxy_isProxyClassFalse 1 95.02 1.36 4 2,186.59 1.36 8 4,891.15 2.72 Annotation_equals 1 240.10 194.61 4 1,864.06 198.81 8 8,639.20 342.90 Regards, Peter
Hi Mandy, I have noticed recently on the hotspot-runime-dev mailing list: "RFR(XXS): 8012714: Assign the unique traceid directly to the Klass upon creation"... Could this be accessed from Java? It looks like a perfect key to identify a class within a JVM. Can it be represented as int or long? That would map to int[] or long[] as a sub-key in the WeakCache?... Regards, Peter
participants (7)
-
Alan Bateman
-
Aleksey Shipilev
-
David Holmes
-
Florian Weimer
-
Mandy Chung
-
Peter Levart
-
Vitaly Davidovich