Biased locking Obsoletion

Patricio Chilano patricio.chilano.mateo at oracle.com
Mon Nov 9 04:23:09 UTC 2020


Hi Andrew,

On 11/6/20 1:36 PM, Andrew Haley wrote:
> On 11/6/20 5:29 PM, Doerr, Martin wrote:
>> My impression is that modern workloads are fine without BL, so typical JDK15 users will probably not notice it was switched off.
> I don't believe they are, because there are several (many?) places in
> the Java library that perform badly. In particular, see JDK-8254078,
> DataOutputStream is very slow post-disabling of Biased Locking.
I looked at JDK-8254078 and run some small tests on the JMH benchmarks 
defined in DataOutputStreamTest.java so I could have a better picture of 
the issue.

I see that in each of those @benchmarks we are essentially calling in a 
loop the write() method of BufferedOutputStream, ByteArrayOutputStream 
or FileOutputStream respectively. In both BufferedOutputStream and 
ByteArrayOutputStream write() is synchronized and does very little work 
(buf[count++] = byte) so for those cases the benchmark can basically be 
reduced to:

while (true) {
     synchronized (BufferedOutputStream/ByteArrayOutputStream object) {
         buf[count++] = byte;
     }
}

Since the workload is single-threaded this particular benchmark is ideal 
for biased locking. (Although from the performance numbers posted seems 
BufferedOutputStream didn't get that much worst with BL disabled. Maybe 
it was an outlier). On the other hand FileOutputStream's write() is not 
synchronized, but rather is a native call that ends up in a write() 
syscall, so makes sense that performance didn't change with or without 
BL for the dataOutputStreamOverRawFileStream case.

I run a trace of all synchronization attempts for the 3 benchmarks just 
to look at the numbers. I uploaded all the results in 
http://cr.openjdk.java.net/~pchilanomate/jmh%208254078trace/ but here is 
a summary for the "short" case (only 1iteration/1s - 1warmup/1s):

dataOutputStreamOverBufferedFileStream:
Count     Thread                 Class
417792     0x00007fb0a05b3a60     java.io.BufferedOutputStream
    3831     0x00007fb0a0026ec0     java.lang.Object
    2111     0x00007fb0a0026ec0 java.util.concurrent.ConcurrentHashMap$Node
     838     0x00007fb0a0026ec0 java.util.zip.Inflater$InflaterZStreamRef

dataOutputStreamOverByteArray:
Count     Thread                 Class
417894     0x00007fd7cc5b3850     java.io.ByteArrayOutputStream
    3830     0x00007fd7cc026ec0     java.lang.Object
    2252     0x00007fd7cc026ec0 java.util.concurrent.ConcurrentHashMap$Node
     838     0x00007fd7cc026ec0 java.util.zip.Inflater$InflaterZStreamRef

dataOutputStreamOverRawFileStream:
Count     Thread                 Class
   3831     0x00007eff78026ec0     java.lang.Object
    2113     0x00007eff78026ec0 java.util.concurrent.ConcurrentHashMap$Node
     838     0x00007eff78026ec0 java.util.zip.Inflater$InflaterZStreamRef

So again, it makes sense benchmark dataOutputStreamOverRawFileStream is 
not affected with BL disabled but the other two are.

With your patch to DataOutputStream we keep calling a synchronized 
write() method for the BufferedOutputStream and ByteArrayOutputStream 
cases, but now we copy more bytes each time we synchronize. The results 
again for the "short" case (only 1iteration/1s - 1warmup/1s):
(all uploaded to 
http://cr.openjdk.java.net/~pchilanomate/jmh%208254078trace/patched/)

dataOutputStreamOverBufferedFileStream:
Count     Thread                 Class
354304     0x00007f540c5d3d50     java.io.BufferedOutputStream
    3830     0x00007f540c026ec0     java.lang.Object
    2110     0x00007f540c026ec0 java.util.concurrent.ConcurrentHashMap$Node
     838     0x00007f540c026ec0 java.util.zip.Inflater$InflaterZStreamRef

dataOutputStreamOverByteArray:
Count     Thread                 Class
387261     0x00007fd80c5cb750     java.io.ByteArrayOutputStream
    3830     0x00007fd80c026ec0     java.lang.Object
    2252     0x00007fd80c026ec0 java.util.concurrent.ConcurrentHashMap$Node
     838     0x00007fd80c026ec0 java.util.zip.Inflater$InflaterZStreamRef

dataOutputStreamOverRawFileStream:
Count     Thread                     Class
3830     0x00007efd28026ec0     java.lang.Object
    2112     0x00007efd28026ec0 java.util.concurrent.ConcurrentHashMap$Node
     838     0x00007efd28026ec0 java.util.zip.Inflater$InflaterZStreamRef

Still a big synchronization count on java.io.BufferedOutputStream and 
java.io.ByteArrayOutputStream as expected, but since we are copying more 
bytes each time performance still improves relative to the unpatched 
version. dataOutputStreamOverRawFileStream remains unaltered in terms of 
synchronization also as expected, but it gets greatly benefited 
indirectly because we are now making less native calls for each run of 
the benchmark.

Now, the way I see it is that the problem appears because we are calling 
a synchronized method when we really shouldn't. If we are in a 
single-threaded case the DataOutputStream objects should have been 
backed by an output stream that is unsynchronized. So seems to me that 
the issue is we are missing an unsynchronized version of 
BufferedOutputStream and ByteArrayOutputStream. Shouldn't we provide 
that in the JDK library to solve for these cases ? That way these kind 
of workflows will run even faster than with biased locking.


Thanks,
Patricio
> This is not a solved problem, and I don't know what a "typical"
> user is, but some users may experience significant performance
> degradation. This is not a solved problem.


More information about the hotspot-dev mailing list