RFR: 6478546: FileInputStream.read() throws OutOfMemoryError when there is plenty available [v2]

Thu May 5 23:39:31 UTC 2022

On Thu, 28 Apr 2022 20:02:36 GMT, Brian Burkhalter <bpb at openjdk.org> wrote:

>> Modify native multi-byte read-write code used by the `java.io` classes to limit the size of the allocated native buffer thereby decreasing off-heap memory footprint and increasing throughput.
>
> Brian Burkhalter has updated the pull request incrementally with one additional commit since the last revision:
> 
>   6478546: Decrease malloc'ed buffer maximum size to 64 kB

Further performance testing was conducted for the case where the native read and write functions use a fixed, stack-allocated buffer of 8192 bytes and the loops are moved up into the Java code of `FileInputStream`, `FileOutputStream`, and `RandomAccessFile`. This introduced some code duplication, because `RandomAccessFile` needs both the read and the write loops. With this approach, write throughput was approximately half of what it had been, so the approach was abandoned for writing.
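For illustration only, the Java-side read loop described above might be sketched as follows. This is not the code from the pull request: the class and method names are hypothetical, a plain `InputStream` stands in for the native fixed-buffer read, and stopping on a short read is an assumption made here to avoid blocking on a partially available stream.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Objects;

// Hypothetical sketch; names are illustrative and not taken from the JDK sources.
final class ChunkedReadSketch {
    // Chunk size matching the 8192-byte fixed buffer mentioned in the text.
    static final int CHUNK = 8192;

    // Reads up to len bytes into b starting at off, issuing one underlying
    // read per chunk of at most CHUNK bytes. Returns the number of bytes
    // read, or -1 if end of stream is hit before any byte is read.
    static int readChunked(InputStream in, byte[] b, int off, int len) throws IOException {
        Objects.checkFromIndexSize(off, len, b.length);
        int total = 0;
        while (total < len) {
            int want = Math.min(CHUNK, len - total);
            int n = in.read(b, off + total, want);
            if (n < 0) {
                // End of stream: report -1 only if nothing was read at all.
                return total == 0 ? -1 : total;
            }
            total += n;
            if (n < want) {
                break; // Short read: return what is available (assumption).
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] src = new byte[20000];
        byte[] dst = new byte[src.length];
        int n = readChunked(new ByteArrayInputStream(src), dst, 0, dst.length);
        System.out.println("bytes read: " + n);
    }
}
```

Each pass through the loop corresponds to one call into the underlying read, so a 1 GB read makes roughly 122,000 such calls with an 8192-byte chunk. That per-call overhead is one plausible reason this variant trails a loop kept entirely in native code.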

Here are some updated performance measurements:

<img width="721" alt="FileInputStream-read-perf" src="https://user-images.githubusercontent.com/71468245/167041493-6d4c421c-c2ec-4a8a-8b32-09b2a902a77c.png">

<img width="720" alt="FileOutputStream-write-perf" src="https://user-images.githubusercontent.com/71468245/167041541-94e5806c-de86-4e62-a117-4cfafac82e87.png">

The performance measurements shown are for the following cases:

1. Master: unmodified code as it exists in the mainline
2. Java: fixed-size stack buffer in native read, read loops in Java, write as in the mainline but with malloc buffer size limit
3. Native: read loop in native read with malloc buffer size limit, write as in the mainline but with malloc buffer size limit
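The "malloc buffer size limit" mentioned in cases 2 and 3 can be pictured with a Java analogy. The real logic lives in native C code, so everything below is a hypothetical sketch: a scratch array stands in for the malloc'ed native buffer, the 64 kB cap is taken from the commit message quoted above, and the names are invented for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.Objects;

// Hypothetical sketch; a Java analogy of a size-capped transfer buffer.
final class BoundedBufferWriteSketch {
    // Upper bound on the transfer buffer, per the 64 kB limit in the commit message.
    static final int MAX_BUFFER = 64 * 1024;

    // Writes len bytes of b starting at off through a scratch buffer whose
    // size never exceeds MAX_BUFFER. Returns the number of passes needed.
    static int writeBounded(OutputStream out, byte[] b, int off, int len) throws IOException {
        Objects.checkFromIndexSize(off, len, b.length);
        if (len == 0) {
            return 0;
        }
        // The buffer is sized to the request, but capped at MAX_BUFFER.
        byte[] buf = new byte[Math.min(len, MAX_BUFFER)];
        int written = 0;
        int passes = 0;
        while (written < len) {
            int n = Math.min(buf.length, len - written);
            System.arraycopy(b, off + written, buf, 0, n); // stage one slice
            out.write(buf, 0, n);                          // hand the slice to the sink
            written += n;
            passes++;
        }
        return passes;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        int passes = writeBounded(sink, new byte[200_000], 0, 200_000);
        System.out.println("passes: " + passes + ", bytes written: " + sink.size());
    }
}
```

The point of the cap is that the temporary allocation stays bounded no matter how large the caller's array is, at the cost of more passes for large requests.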

The horizontal axis shows lengths from 8192 bytes to 1 GB; the vertical axis is throughput (ops/s) on a log10 scale. The "Native" lines in the charts correspond to the code proposed for integration.

As can be seen, the performance of reading is quite similar until the lengths become large, at which point the mainline version presumably starts to suffer the effect of large allocations. The native read loop performs the best throughout: for lengths of 10 MB and above, its throughput ranges from about 50% higher than the mainline version to about 3 times the mainline throughput. The native read loop is about 40% faster than the Java read loop for these larger lengths.

Due to the log scale of the charts, the reading performance detail cannot be seen exactly and so is given here for the larger lengths:

               Throughput of read(byte[]) (ops/s)
   Length      Master         Java        Native
   1048576    11341.39      6124.482    11371.091
  10485760      356.893      376.326      557.906
 251503002       10.036       14.27        19.869
 524288000        5.005        6.857        9.552
1000000000        1.675        3.527        4.997
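The relative speedups can be recomputed directly from the table. The following is a small sketch whose class and method names are arbitrary; the numbers are copied from the table above.

```java
import java.util.Locale;

// Recomputes throughput ratios from the read(byte[]) table.
final class ReadSpeedupSketch {
    static final long[] LENGTHS = {1048576L, 10485760L, 251503002L, 524288000L, 1000000000L};
    static final double[] MASTER = {11341.39, 356.893, 10.036, 5.005, 1.675};
    static final double[] JAVA   = {6124.482, 376.326, 14.27, 6.857, 3.527};
    static final double[] NATIVE = {11371.091, 557.906, 19.869, 9.552, 4.997};

    // Ratio of Native throughput to Master throughput for table row i.
    static double nativeOverMaster(int i) {
        return NATIVE[i] / MASTER[i];
    }

    // Ratio of Native throughput to Java-loop throughput for table row i.
    static double nativeOverJava(int i) {
        return NATIVE[i] / JAVA[i];
    }

    public static void main(String[] args) {
        System.out.printf(Locale.ROOT, "%12s %15s %13s%n", "Length", "Native/Master", "Native/Java");
        for (int i = 0; i < LENGTHS.length; i++) {
            System.out.printf(Locale.ROOT, "%12d %15.2f %13.2f%n",
                    LENGTHS[i], nativeOverMaster(i), nativeOverJava(i));
        }
    }
}
```

For lengths of 10485760 and above, the Native/Master ratio runs from roughly 1.56 to 2.98 and the Native/Java ratio from roughly 1.39 to 1.48.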

The performance of writing is about the same for the Java and Native versions, as it should be since the implementations are the same. Any difference is likely due to measurement noise. The mainline version again suffers for larger lengths.

As the native write loop was already present in the mainline code, the principal complexity proposed to be added is the native read loop. Given the improved throughput and the vastly reduced native memory allocation, this seems to be justified.

-------------

PR: https://git.openjdk.java.net/jdk/pull/8235