RFR: 8247532: Records deserialization is slow
Claes Redestad
claes.redestad at oracle.com
Tue Jun 23 08:23:10 UTC 2020
On 2020-06-23 10:06, Claes Redestad wrote:
> Hi,
>
> On 2020-06-23 09:49, Peter Levart wrote:
>> Hi Chris, Claes,
>>
>>
>> Ok then, here's with benchmark included:
>>
>>
>> http://cr.openjdk.java.net/~plevart/jdk-dev/RecordsDeserialization/webrev.07/
>>
>>
>>
>> I haven't been able to run the benchmark with "make test" though. I
>> have tried various ways to pass javac options to build like:
>>
>>
>> make test TEST='micro:org.openjdk.bench.java.io.RecordDeserialization'
>> TEST_OPTS="VM_OPTIONS=--enable-preview --release=16"
>>
>>
>> ...but javac doesn't seem to get them. Is there some secret option to
>> achieve that?
>
> Hmm, we might as well have the microbenchmarks build with
> --enable-preview on by default. Try this:
Fixed:
diff -r f2e1cd498381 make/test/BuildMicrobenchmark.gmk
--- a/make/test/BuildMicrobenchmark.gmk Tue Jun 23 10:08:35 2020 +0200
+++ b/make/test/BuildMicrobenchmark.gmk Tue Jun 23 10:33:17 2020 +0200
@@ -90,10 +90,11 @@
TARGET_RELEASE := $(TARGET_RELEASE_NEWJDK_UPGRADED), \
SMALL_JAVA := false, \
CLASSPATH := $(MICROBENCHMARK_CLASSPATH), \
- DISABLED_WARNINGS := processing rawtypes cast serial, \
+ DISABLED_WARNINGS := processing rawtypes cast serial preview, \
SRC := $(MICROBENCHMARK_SRC), \
BIN := $(MICROBENCHMARK_CLASSES), \
JAVA_FLAGS := --add-modules jdk.unsupported --limit-modules
java.management, \
+ JAVAC_FLAGS := --enable-preview, \
))
$(BUILD_JDK_MICROBENCHMARK): $(JMH_COMPILE_JARS)
I verified this works with your micro, and doesn't seem to cause
any issues elsewhere:
make test TEST=micro:RecordDeserialization
I can shepherd this as a separate fix for documentation purposes, but
feel free to include it in your patch and ping build-dev at ..
/Claes
>
> diff -r 52741f85bf23 make/test/BuildMicrobenchmark.gmk
> --- a/make/test/BuildMicrobenchmark.gmk Tue Jun 23 09:54:42 2020 +0200
> +++ b/make/test/BuildMicrobenchmark.gmk Tue Jun 23 09:59:29 2020 +0200
> @@ -93,7 +93,7 @@
> DISABLED_WARNINGS := processing rawtypes cast serial, \
> SRC := $(MICROBENCHMARK_SRC), \
> BIN := $(MICROBENCHMARK_CLASSES), \
> - JAVA_FLAGS := --add-modules jdk.unsupported --limit-modules
> java.management, \
> + JAVA_FLAGS := --enable-preview --add-modules jdk.unsupported
> --limit-modules java.management, \
> ))
>
> $(BUILD_JDK_MICROBENCHMARK): $(JMH_COMPILE_JARS)
>
>>
>>
>> Otherwise, I simulated what would happen when there are more then 10
>> ObjectStreamClass layouts for same class rapidly interchanging, so
>> that they push each other out of the cache, by temporarily setting
>> cache's MAX_SIZE = 0. Note that this is worst case scenario:
>>
>>
>> Benchmark (length) Mode Cnt
>> Score Error Units
>> RecordDeserializationBench.deserializeClasses 10 avgt 10
>> 9.393 ± 0.287 us/op
>> RecordDeserializationBench.deserializeClasses 100 avgt 10
>> 35.642 ± 0.977 us/op
>> RecordDeserializationBench.deserializeClasses 1000 avgt 10
>> 293.769 ± 7.321 us/op
>> RecordDeserializationBench.deserializeRecords 10 avgt 10
>> 15.335 ± 0.496 us/op
>> RecordDeserializationBench.deserializeRecords 100 avgt 10
>> 211.427 ± 11.908 us/op
>> RecordDeserializationBench.deserializeRecords 1000 avgt 10
>> 990.398 ± 26.681 us/op
>>
>>
>> This is using JMH option '-gc true' to force GC after each iteration
>> of benchmark. Without it, I get a big (~4s) full-GC pause just in the
>> middle of a run with length=100:
>>
>>
>> Iteration 1: 528.577 us/op
>> Iteration 2: 580.404 us/op
>> Iteration 3: 4438.228 us/op
>> Iteration 4: 644.532 us/op
>> Iteration 5: 698.493 us/op
>> Iteration 6: 800.738 us/op
>> Iteration 7: 929.791 us/op
>> Iteration 8: 870.946 us/op
>> Iteration 9: 863.416 us/op
>> Iteration 10: 916.508 us/op
>>
>>
>> ...so results are a bit off because of that:
>>
>>
>> Benchmark (length) Mode Cnt
>> Score Error Units
>> RecordDeserializationBench.deserializeClasses 10 avgt 10
>> 8.263 ± 0.043 us/op
>> RecordDeserializationBench.deserializeClasses 100 avgt 10
>> 33.406 ± 0.160 us/op
>> RecordDeserializationBench.deserializeClasses 1000 avgt 10
>> 287.595 ± 0.960 us/op
>> RecordDeserializationBench.deserializeRecords 10 avgt 10
>> 15.270 ± 0.080 us/op
>> RecordDeserializationBench.deserializeRecords 100 avgt 10
>> 1127.163 ± 1771.892 us/op
>> RecordDeserializationBench.deserializeRecords 1000 avgt 10
>> 2003.235 ± 227.159 us/op
>>
>>
>> Yes, there is quite a bit of GCing going on when cache is thrashing.
>> Note that I haven't tuned GC in any way and I'm running this on a
>> machine with 64GiB of RAM so heap is allowed to grow quite big and G1
>> is used by default I think.
>>
>>
>> This is still no worse than before the patch:
>>
>>
>> Benchmark (length) Mode Cnt
>> Score Error Units
>> RecordDeserialization.deserializeClasses 10 avgt 10
>> 8.382 : 0.013 us/op
>> RecordDeserialization.deserializeClasses 100 avgt 10
>> 33.736 : 0.171 us/op
>> RecordDeserialization.deserializeClasses 1000 avgt 10
>> 271.224 : 0.953 us/op
>> RecordDeserialization.deserializeRecords 10 avgt 10
>> 58.606 : 0.446 us/op
>> RecordDeserialization.deserializeRecords 100 avgt 10
>> 530.044 : 1.752 us/op
>> RecordDeserialization.deserializeRecords 1000 avgt 10
>> 5335.624 : 44.942 us/op
>>
>>
>> ... since caching of adapted method handle for multiple objects
>> withing single stream is still effective.
>>
>> I doubt there will ever be more than 10 variants/versions of the same
>> record class deserialized by the same VM and in rapid succession, so I
>> think this should not be an issue. We could add a system property to
>> control the MAX_SIZE of cache if you think it is needed.
>
> Thanks for running the numbers on this, and I agree - it seems
> outlandishly improbable (most will only see one, and if you have to
> maintain 10+ different serialized shapes of some record you likely have
> bigger problems).
>
> I'd say let's keep it constant unless someone actually asks for it.
>
> /Claes
>
>>
>>
>> Regards, Peter
>>
>>
>> On 6/22/20 1:04 AM, Claes Redestad wrote:
>>> Hi Peter,
>>>
>>> patch and results look great!
>>>
>>> My only real comment on this is that I think the microbenchmark would be
>>> a valuable contribution, too.
>>>
>>> It'd also be interesting to explore how poor performance would become if
>>> we'd hit the (artificial) 11 layouts limit, e.g, by cycling through
>>> 10, 11, or 12 different shapes.
>>>
>>> /Claes
>>>
>>> On 2020-06-21 19:16, Peter Levart wrote:
>>>> Hi,
>>>>
>>>>
>>>> When re-running the benchmark [1] with different lengths of
>>>> serialized arrays of records, I found that, compared to classical
>>>> classes, lookup into the cache of adapted method handles starts to
>>>> show when the length of array is larger (# of instances of same
>>>> record type deserialized in single stream). Each record deserialized
>>>> must lookup the method handle in a ConcurrentHashMap:
>>>>
>>>>
>>>> Benchmark (length) Mode Cnt
>>>> Score Error Units
>>>> RecordSerializationBench.deserializeClasses 10 avgt 10 8.088
>>>> ± 0.013 us/op
>>>> RecordSerializationBench.deserializeClasses 100 avgt 10
>>>> 32.171 ± 0.324 us/op
>>>> RecordSerializationBench.deserializeClasses 1000 avgt 10
>>>> 279.762 ± 3.072 us/op
>>>> RecordSerializationBench.deserializeRecords 10 avgt 10 9.011
>>>> ± 0.027 us/op
>>>> RecordSerializationBench.deserializeRecords 100 avgt 10
>>>> 33.206 ± 0.514 us/op
>>>> RecordSerializationBench.deserializeRecords 1000 avgt 10
>>>> 325.137 ± 0.969 us/op
>>>>
>>>>
>>>> ...so keeping the correctly shaped adapted method handle in the
>>>> per-serialization-session ObjectStreamClass instance [2] starts to
>>>> make sense:
>>>>
>>>>
>>>> Benchmark (length) Mode Cnt
>>>> Score Error Units
>>>> RecordSerializationBench.deserializeClasses 10 avgt 10 8.681
>>>> ± 0.155 us/op
>>>> RecordSerializationBench.deserializeClasses 100 avgt 10
>>>> 32.496 ± 0.087 us/op
>>>> RecordSerializationBench.deserializeClasses 1000 avgt 10
>>>> 279.014 ± 1.189 us/op
>>>> RecordSerializationBench.deserializeRecords 10 avgt 10 8.537
>>>> ± 0.032 us/op
>>>> RecordSerializationBench.deserializeRecords 100 avgt 10
>>>> 31.451 ± 0.083 us/op
>>>> RecordSerializationBench.deserializeRecords 1000 avgt 10
>>>> 250.854 ± 2.772 us/op
>>>>
>>>>
>>>> With that, more objects means advantage over classical classes
>>>> instead of disadvantage.
>>>>
>>>>
>>>> [1]
>>>> http://cr.openjdk.java.net/~plevart/jdk-dev/RecordsDeserialization/RecordSerializationBench.java
>>>>
>>>>
>>>> [2]
>>>> http://cr.openjdk.java.net/~plevart/jdk-dev/RecordsDeserialization/webrev.06/
>>>>
>>>>
>>>>
>>>> Regards, Peter
>>>>
>>>>
More information about the core-libs-dev
mailing list