bottleneck by java.lang.Class.getAnnotations() - a better patch

Fri Nov 23 17:09:08 UTC 2012

Hi again,

I have found an inconsistency in the last patch I posted. Specifically 
the Method.getAnnotation(Class) and Constructor.getAnnotation(Class) did 
not delegate to the root instance as .getAnnotations() did. I corrected 
that in new revision of the patch:

http://dl.dropbox.com/u/101777488/jdk8-hacks/JEP-149/webrev.03/index.html

I also modified the benchmarks somewhat so that now they exercise the 
.getAnnotation(Class) methods on various objects instead of bulk 
getAnnotations() methods:

https://raw.github.com/plevart/jdk8-hacks/annotation-arrays/src/test/ReflectionTest.java

I got results for 2 different machines:

Desktop PC (one socket i7-2600K, 4 cores), Linux 3.6 64bit:
https://raw.github.com/plevart/jdk8-hacks/annotation-arrays/benchmark_results_i7-2600K.txt

and:

Sun Blade T6320 (one socket T2, 8 cores), Solaris 10 64bit:
https://raw.github.com/plevart/jdk8-hacks/annotation-arrays/benchmark_results_T2.txt

On the T2, the concurrency bottleneck is even more visible.

Regards, Peter

On 11/23/2012 12:08 PM, Peter Levart wrote:
> Hi David, Alan, Alexander and others,
>
> Here's a refinement of a patch I proposed in this thread a couple of 
> weeks ago:
>
> http://dl.dropbox.com/u/101777488/jdk8-hacks/JEP-149/webrev.02/index.html
>
> The changed sources can be browsed here:
>
> https://github.com/plevart/jdk8-hacks/tree/annotation-arrays/src/java/lang
>
> The main improvement is the replacement of a filed:
>
> Map<Class<? extends Annotation>, Annotation> declaredAnnotations;
>
> with plain:
>
> Annotation[] declaredAnnotations;
>
> ...in the Class.VolatileData structure. It can be seen from the public 
> API that caching a HashMap of declared annotations is never needed 
> since the only public method (getDeclaredAnnotations()) returns an 
> array. Therefore it is much more efficient to cache the array and only 
> clone it on every public method invocation.
>
> Unfortunately it is not so with "annotations" field. There is a public 
> method (getAnnotation(Class<? extends Annotation>)) that has O(1) time 
> complexity because of keeping a HashMap in the cache.
>
> I did an experiment with keeping annotations (declared + inherited) in 
> an array, sorted by the hashCode of annotationType and the lookup then 
> was a binary search by hashCode of annotationType followed by linear 
> search of the equal hashCode neighborhood. This has O(log N) time 
> complexity, which is not so bad, but the constant factor was pretty 
> high because invoking Annotation.annotationType() method goes through 
> dynamic Proxy to an InvocationHandler...
>
> Speaking of that, I think there's plenty of opportunity for JEP-149 
> related optimization in the field of annotations. For example, 
> generating a special class for every annotationType that would hold 
> it's own annotation data and answer to method requests directly would 
> give a large time and space gain.
>
> It can also be noticed that there's a discrepancy of how lookup is 
> implemented for example:
> - lookup of Annotation by annotationType on a class is a HashMap.get() 
> which has O(1) time complexity
> - lookup of a of Field by fieldName on a class is a linear search in 
> an array which has O(N) time complexity
> - similar for Method by methodName and argumentTypes - O(N)
>
> The time complexity improvement for Field and Method lookup could be 
> performed by keeping some cached arrays pre-sorted by the name of 
> Field or Method and employing a binary search to locate or 
> almost-locate the object. Since public API explicitly states that 
> methods returning arrays of Fields and Methods do not specify any 
> order, pre-sorting the arrays would not break the contract.
>
> Regards, Peter
>
> On 11/07/2012 11:39 AM, Peter Levart wrote:
>> On 11/07/2012 03:10 AM, David Holmes wrote:
>>> Hi Peter,
>>>
>>> The movement of the reflection caches to a helper object is exactly 
>>> what I had previously proposed here (some differences in the details 
>>> of course):
>>>
>>> http://cr.openjdk.java.net/~dholmes/JEP-149/webrev/
>>>
>>> and discussed here:
>>>
>>> http://mail.openjdk.java.net/pipermail/core-libs-dev/2012-April/009749.html 
>>>
>>>
>>> but this did not touch the annotations fields.
>>>
>>> David
>>
>> Hi David,
>>
>> Thanks for the pointer. There is a discussion between Brian and you 
>> (to quote some of it):
>>
>> On 5/04/2012 1:28 PM, Brian Goetz wrote:
>> >/  Reducing the number of SoftReferences in ReflectionHelper also seems an
>> />/  attractive target for memory reduction. Rather than eight soft
>> />/  references (eight extra objects), maintaining a SoftRef to the entire
>> />/  RH, or at least to the part of the RH that is currently SR'ed if the two
>> />/  non-SR'ed fields can't be recomputed, would save you a whole pile of
>> />/  objects per class (and might also reduce pressure on GC, having 8x fewer
>> />/  SRs to process.)
>> /
>> I'd have to consider the intended semantics of these soft references
>> before considering any change here. It would hard to predict how this
>> might impact runtime performance if we have coarser-grained soft
>> references. The current changes should be semantically null.
>>
>> >/  Finally, you may be able save an extra field per Class by storing the
>> />/  ReflectionHelper in a ClassValue on Java SE 8, rather than a field.
>> /
>> ClassValue is something I'm keeping an eye on, but an entry in
>> ClassValue is going to consume more dynamic memory than a single direct
>> field.
>>
>> Thanks,
>> David
>>
>>
>> ...the 8 SoftReferences refer to arrays which are never hard 
>> referenced by the outside world (arrays are always defensively 
>> copied), so it's reasonable to expect that all SoftReferences would 
>> be cleared at the same time anyway. And if 8 SoftReferences are 
>> replaced with 1, the worst case scenario (to quote Hinkmond Wong):
>>
>> Hi Brian,
>>
>> One of the issues we have in the Java Embedded group (as David points
>> out in his summary), is that while on paper the theoretical max savings
>> seems great (as you point out also), in practice as David points out in
>> his note, this might be a wash if there are a lot more reflection using
>> classes vs. non-reflection using classes in "typical" real-world
>> applications, not the low or zero reflection using class ratio that
>> happens in the theoretical "best case".
>>
>> So, a question comes up if we should judge the merit of this change on
>> the theoretical "best case" scenario, or should we judge it on
>> real-world applicability to "typical" apps (such as a finite set of
>> customer surveyed embedded apps that we feel represent a real-world
>> scenario).
>>
>>
>> Thanks,
>> Hinkmond
>>
>>
>> ...actually becomes even more favourable. We reduce huge overhead 
>> (each SoftReference is 4 OOPs and 1 long). And if this single 
>> SoftReference is ever cleared, more memory is released - the whole 
>> structure (ReflectionHelper / VolatileData)
>>
>> Other differences in details between your proposal and mine:
>>
>> In your proposal, the method ReflectionHelper rh() is equivalent to 
>> mine VolatileData volatileData() - it lazily constructs the structure 
>> and returns it. My implementation also incorporates the logic of 
>> clearCachesOnClassRedefinition() by returning and installing a new 
>> instance of the structure in case a redefinition is detected. This 
>> has a profound impact on the correctness of the cached data in face 
>> of races that can occur.
>>
>> In your proposal, even if the VM could atomically publish changes to 
>> raw reflection data and the classRedefinitionCount at the same time 
>> (we hope that at least the order of publishing is such that 
>> classRedefinitionCount is updated last), it can theoretically be that 
>> 2 or more redefinitions of the same class happen in close proximity:
>>
>> VM thread: redefines the class to version=1
>> thread 1: clears the cache and takes  version=1 raw data and computes 
>> derived data but gets pre-empted before installing it
>> VM thread: redefines the class to version=2
>> thread 2: clears the cache and takes  version=2 raw data and computes 
>> derived data and installs it
>> thread 1: ...gets back and installs version=1 derived data over 
>> version=2 data
>>
>> ...if there are no more class redefinitions, the stale version of 
>> derived data can persist indefinitely.
>>
>> In my proposal, each thread will get it's own copy of the structure 
>> in the above scenario and install the derived data into it. It can 
>> happen that a particular instance of the structure does not represent 
>> a "snapshot" view of the world, but that is not important, since that 
>> particular inconsistent instance is only used for the in-flight call 
>> and only in that part that is consistent. Other callers will get a 
>> fresh instance.
>>
>> There is also one thing I overlooked and you haven't: the 
>> cachedConstructor and newInstanceCallerCache fields.
>>
>> I'll have to look at how to incorporate them into my scheme. They are 
>> currently neither SoftReferenced nor cleared at class redefinition. 
>> As the cachedConstructor is only used to implement the .newInstance() 
>> method, I wonder if it is safe not to clear it when the class is 
>> redefined. Are old versions of Constructors still valid for invoking 
>> in a redefined class? I guess they must be, since user code is free 
>> to cache it's own versions and class redefinition should not prevent 
>> invoking them...
>>
>> Since cachedConstructor/newInstanceCallerCache are used to optimize 
>> .newInstance() method. That alone suggests that calling this method 
>> is more common use-case than others. Perhaps leaving this pair of 
>> fields out of the game is a better approach space-saving wise.
>>
>> Regards, Peter
>>
>>>
>>> On 6/11/2012 11:12 PM, Peter Levart wrote:
>>>> On 11/05/2012 06:23 AM, David Holmes wrote:
>>>>> Hi Peter,
>>>>>
>>>>> Moving the annotations fields into a helper object would tie in with
>>>>> the Class-instance size reduction effort that was investigated as 
>>>>> part
>>>>> of "JEP 149: Reduce Core-Library Memory Usage":
>>>>>
>>>>> http://openjdk.java.net/jeps/149
>>>>>
>>>>> The investigations there to date only looked at relocating the
>>>>> reflection related fields, though the JEP mentions the annotations as
>>>>> well.
>>>>>
>>>>> Any such effort requires extensive benchmarking and performance
>>>>> analysis before being accepted though.
>>>>>
>>>>> David
>>>>> -----
>>>>>
>>>>
>>>> On 11/05/2012 10:25 AM, Alan Bateman wrote:
>>>>> Here's a good starting place on the interaction of runtime visible
>>>>> attributes and RedefineClasses/RetransformClasses:
>>>>>
>>>>> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=5002251
>>>>>
>>>>> -Alan.
>>>>
>>>> Hi all,
>>>>
>>>> Presented here is a patch mainly against java.lang.Class and also
>>>> against java.lang.reflect.[Field,Method,Constructor,Executable] 
>>>> classes.
>>>>
>>>> Currently java.lang.Class uses the following fields to maintain caches
>>>> of reflection data that are invalidated as a result of class or
>>>> superclass redefinition/re-transformation:
>>>>
>>>> private volatile transient SoftReference<Field[]> declaredFields;
>>>> private volatile transient SoftReference<Field[]> publicFields;
>>>> private volatile transient SoftReference<Method[]> declaredMethods;
>>>> private volatile transient SoftReference<Method[]> publicMethods;
>>>> private volatile transient SoftReference<Constructor<T>[]>
>>>> declaredConstructors;
>>>> private volatile transient SoftReference<Constructor<T>[]>
>>>> publicConstructors;
>>>> private volatile transient SoftReference<Field[]> 
>>>> declaredPublicFields;
>>>> private volatile transient SoftReference<Method[]> 
>>>> declaredPublicMethods;
>>>>
>>>> // Value of classRedefinedCount when we last cleared the cached values
>>>> // that are sensitive to class redefinition.
>>>> private volatile transient int lastRedefinedCount = 0;
>>>>
>>>> // Annotations cache
>>>> private transient Map<Class<? extends Annotation>, Annotation> 
>>>> annotations;
>>>> private transient Map<Class<? extends Annotation>, Annotation>
>>>> declaredAnnotations;
>>>>
>>>> If I understand Alan's references correctly, current VM can 
>>>> redefine the
>>>> class in a way that changes method bodies. Also new methods can be
>>>> added. And the set of annotations can also be altered. And future
>>>> improvements could allow even more.
>>>>
>>>> Because annotations are cached on Field/Method/Constructor instances,
>>>> all the above fields must be invalidated when the class or 
>>>> superclass is
>>>> redefined.
>>>>
>>>> It can also be observed that Field/Method/Constructor caches are
>>>> maintained using SoftReferences but annotations are hard references. I
>>>> don't know if this is intentional. I believe that annotations could 
>>>> also
>>>> be SoftReferenced, so that in the event of memory pressure they get
>>>> cleared. Many applications retrieve annotations only in the early 
>>>> stages
>>>> of their life-cycle and then either cache them themselves or forget
>>>> about them.
>>>>
>>>> So I designed the patch to equalize this. If this is undesirable, the
>>>> patch could be modified to make a distinction again.
>>>>
>>>> The patch replaces the above-mentioned java.lang.Class fields with a
>>>> single field:
>>>>
>>>> private volatile transient SoftReference<VolatileData<T>> 
>>>> volatileData;
>>>>
>>>> ...which is a SoftReference to the following structure:
>>>>
>>>> // volatile data that might get invalid when JVM TI 
>>>> RedefineClasses() is
>>>> called
>>>> static class VolatileData<T> {
>>>> volatile Field[] declaredFields;
>>>> volatile Field[] publicFields;
>>>> volatile Method[] declaredMethods;
>>>> volatile Method[] publicMethods;
>>>> volatile Constructor<T>[] declaredConstructors;
>>>> volatile Constructor<T>[] publicConstructors;
>>>> // Intermediate results for getFields and getMethods
>>>> volatile Field[] declaredPublicFields;
>>>> volatile Method[] declaredPublicMethods;
>>>> // Annotations
>>>> volatile Map<Class<? extends Annotation>, Annotation> annotations;
>>>> volatile Map<Class<? extends Annotation>, Annotation> 
>>>> declaredAnnotations;
>>>> // Value of classRedefinedCount when we created this VolatileData 
>>>> instance
>>>> final int redefinedCount;
>>>>
>>>> So let's look at static overhead differences using 64 bit addressing
>>>> (useful load - arrays, Maps not counted since the patched code uses 
>>>> the
>>>> same amount of same types of each).
>>>>
>>>> * Fresh java.lang.Class instance:
>>>>
>>>> current JDK8 code:
>>>>
>>>> 10 OOPs + 1 int = 10*8+4 = 84 bytes in 1 instance
>>>>
>>>> vs. patched code :
>>>>
>>>> 1 OOP = 8 bytes in 1 instance
>>>>
>>>> * Fully loaded java.lang.Class (Fields, Methods, Constructors, 
>>>> annotations):
>>>>
>>>> current JDK8 code:
>>>>
>>>> 10 OOPs + 1 int = 84 bytes
>>>> 8 SoftReference instances = 8*(header + 4 OOPs + 1 long) = 
>>>> 8*(24+32+8) =
>>>> 8*64 = 512 bytes
>>>> total: 84+512 = 596 bytes in 9 instances
>>>>
>>>> vs. patched code :
>>>>
>>>> 1 OOP = 8 bytes
>>>> 1 SoftReference = 64 bytes
>>>> 1 VolatileData = header + 10 OOPs + 1 int = 24+84 = 108 bytes
>>>> total: 8+64+108 = 180 bytes in 3 instances
>>>>
>>>> So we have 84 vs. 8 (empty) or 596 vs. 180 (fully loaded) byte 
>>>> overheads and
>>>> 1 vs. 1 (empty) or 9 vs. 3 (fully loaded) instance overheads
>>>>
>>>> Other than that, the patch also removes synchronized blocks for lazy
>>>> initialization of annotations in Class, Field, Method and Constructor
>>>> and replaces them with volatile fields. In case of Class.volatileData,
>>>> this field is initialized using a CAS so there is no race which could
>>>> install an already stale instance over more recent. Although such race
>>>> would quickly be corrected at next call to any retrieval method, 
>>>> because
>>>> redefinedCount is now an integral part of the cached structure not an
>>>> individual volatile field.
>>>>
>>>> There is also a change in how annotations are cached in Field, Method
>>>> and Constructor. Originally they are cached in each copy of the
>>>> Field/Method/Constructor that is returned to the outside world at each
>>>> invocation of Class.getFields() etc. Such caching is not very 
>>>> effective
>>>> if the annotations are only retrieved once per instance. The patch
>>>> changes this and delegates caching to the "root" instance which is 
>>>> held
>>>> inside Class so caching becomes more effective in certain usage
>>>> patterns. There's now a possible hot-spot on the "root" instance but
>>>> that seems not to be a bottleneck since the fast-path does not involve
>>>> blocking synchronization (just volatile read). The effects of this
>>>> change are clearly visible in one of the benchmarks.
>>>>
>>>> I have tried to create 3 micro benchmarks which exercise concurrent 
>>>> load
>>>> on 3 Class instances.
>>>>
>>>> Here's the benchmark code:
>>>>
>>>> https://raw.github.com/plevart/jdk8-hacks/master/src/test/ReflectionTest.java 
>>>>
>>>>
>>>> And here are the results when run on an Intel i7 CPU (4 cores, 2
>>>> threads/core) Linux machine using -Xmx4G VM option:
>>>>
>>>> https://raw.github.com/plevart/jdk8-hacks/master/benchmark_results.txt
>>>>
>>>>
>>>> The huge difference of Test1 results is a direct consequence of 
>>>> patched
>>>> code delegating caching of annotations in Field/Method/Constructor to
>>>> the "root" instance.
>>>>
>>>> Test2 results show no noticeable difference between original and 
>>>> patched
>>>> code. This, I believe, is the most common usage of the API, so another
>>>> level of indirection does not appear to present any noticeable
>>>> performance overhead.
>>>>
>>>> The Test3 on the other hand shows the synchronization overhead of
>>>> current jdk8 code in comparison with non-blocking synchronization in
>>>> patched code.
>>>>
>>>> JEP 149 also mentions testing with SPECjbb2005 and SPECjvm98, but that
>>>> exceeds my possibilities.
>>>>
>>>> The patch against jdk8/jdk8/jdk hg repository is here:
>>>>
>>>> https://raw.github.com/plevart/jdk8-hacks/master/volatile_class_data_caching.patch 
>>>>
>>>>
>>>> You can also browse the changed sources:
>>>>
>>>> https://github.com/plevart/jdk8-hacks
>>>>
>>>>
>>>> Regards, Peter
>>>>
>