Biased locking Obsoletion
Patricio Chilano
patricio.chilano.mateo at oracle.com
Mon Nov 9 04:23:09 UTC 2020
Hi Andrew,
On 11/6/20 1:36 PM, Andrew Haley wrote:
> On 11/6/20 5:29 PM, Doerr, Martin wrote:
>> My impression is that modern workloads are fine without BL, so typical JDK15 users will probably not notice it was switched off.
> I don't believe they are, because there are several (many?) places in
> the Java library that perform badly. In particular, see JDK-8254078,
> DataOutputStream is very slow post-disabling of Biased Locking.
I looked at JDK-8254078 and run some small tests on the JMH benchmarks
defined in DataOutputStreamTest.java so I could have a better picture of
the issue.
I see that in each of those @benchmarks we are essentially calling in a
loop the write() method of BufferedOutputStream, ByteArrayOutputStream
or FileOutputStream respectively. In both BufferedOutputStream and
ByteArrayOutputStream write() is synchronized and does very little work
(buf[count++] = byte) so for those cases the benchmark can basically be
reduced to:
while (true) {
synchronized (BufferedOutputStream/ByteArrayOutputStream object) {
buf[count++] = byte;
}
}
Since the workload is single-threaded this particular benchmark is ideal
for biased locking. (Although from the performance numbers posted seems
BufferedOutputStream didn't get that much worst with BL disabled. Maybe
it was an outlier). On the other hand FileOutputStream's write() is not
synchronized, but rather is a native call that ends up in a write()
syscall, so makes sense that performance didn't change with or without
BL for the dataOutputStreamOverRawFileStream case.
I run a trace of all synchronization attempts for the 3 benchmarks just
to look at the numbers. I uploaded all the results in
http://cr.openjdk.java.net/~pchilanomate/jmh%208254078trace/ but here is
a summary for the "short" case (only 1iteration/1s - 1warmup/1s):
dataOutputStreamOverBufferedFileStream:
Count Thread Class
417792 0x00007fb0a05b3a60 java.io.BufferedOutputStream
3831 0x00007fb0a0026ec0 java.lang.Object
2111 0x00007fb0a0026ec0 java.util.concurrent.ConcurrentHashMap$Node
838 0x00007fb0a0026ec0 java.util.zip.Inflater$InflaterZStreamRef
dataOutputStreamOverByteArray:
Count Thread Class
417894 0x00007fd7cc5b3850 java.io.ByteArrayOutputStream
3830 0x00007fd7cc026ec0 java.lang.Object
2252 0x00007fd7cc026ec0 java.util.concurrent.ConcurrentHashMap$Node
838 0x00007fd7cc026ec0 java.util.zip.Inflater$InflaterZStreamRef
dataOutputStreamOverRawFileStream:
Count Thread Class
3831 0x00007eff78026ec0 java.lang.Object
2113 0x00007eff78026ec0 java.util.concurrent.ConcurrentHashMap$Node
838 0x00007eff78026ec0 java.util.zip.Inflater$InflaterZStreamRef
So again, it makes sense benchmark dataOutputStreamOverRawFileStream is
not affected with BL disabled but the other two are.
With your patch to DataOutputStream we keep calling a synchronized
write() method for the BufferedOutputStream and ByteArrayOutputStream
cases, but now we copy more bytes each time we synchronize. The results
again for the "short" case (only 1iteration/1s - 1warmup/1s):
(all uploaded to
http://cr.openjdk.java.net/~pchilanomate/jmh%208254078trace/patched/)
dataOutputStreamOverBufferedFileStream:
Count Thread Class
354304 0x00007f540c5d3d50 java.io.BufferedOutputStream
3830 0x00007f540c026ec0 java.lang.Object
2110 0x00007f540c026ec0 java.util.concurrent.ConcurrentHashMap$Node
838 0x00007f540c026ec0 java.util.zip.Inflater$InflaterZStreamRef
dataOutputStreamOverByteArray:
Count Thread Class
387261 0x00007fd80c5cb750 java.io.ByteArrayOutputStream
3830 0x00007fd80c026ec0 java.lang.Object
2252 0x00007fd80c026ec0 java.util.concurrent.ConcurrentHashMap$Node
838 0x00007fd80c026ec0 java.util.zip.Inflater$InflaterZStreamRef
dataOutputStreamOverRawFileStream:
Count Thread Class
3830 0x00007efd28026ec0 java.lang.Object
2112 0x00007efd28026ec0 java.util.concurrent.ConcurrentHashMap$Node
838 0x00007efd28026ec0 java.util.zip.Inflater$InflaterZStreamRef
Still a big synchronization count on java.io.BufferedOutputStream and
java.io.ByteArrayOutputStream as expected, but since we are copying more
bytes each time performance still improves relative to the unpatched
version. dataOutputStreamOverRawFileStream remains unaltered in terms of
synchronization also as expected, but it gets greatly benefited
indirectly because we are now making less native calls for each run of
the benchmark.
Now, the way I see it is that the problem appears because we are calling
a synchronized method when we really shouldn't. If we are in a
single-threaded case the DataOutputStream objects should have been
backed by an output stream that is unsynchronized. So seems to me that
the issue is we are missing an unsynchronized version of
BufferedOutputStream and ByteArrayOutputStream. Shouldn't we provide
that in the JDK library to solve for these cases ? That way these kind
of workflows will run even faster than with biased locking.
Thanks,
Patricio
> This is not a solved problem, and I don't know what a "typical"
> user is, but some users may experience significant performance
> degradation. This is not a solved problem.
More information about the hotspot-dev
mailing list