Proposal: Optimizing Efficiency using Read-only Arrays
Brian Goetz
brian.goetz at oracle.com
Thu Dec 29 17:10:43 UTC 2022
This speaks to an important design decision underlying the freezing proposal. The ecosystem is full of API points that take arrays; we wouldn’t want to bifurcate those into those that take mutable arrays and those that take read-only arrays. So a new bytecode descriptor for readonly arrays would be problematic not from a platform perspective, but from an ecosystem perspective. What we want is to reuse the world of existing non-evil, array-consuming code without disruption. So it makes sense that writability be a dynamic property enforced by the VM.
Fortunately, the VM already has to validate array writes (for reference arrays), because you can cast a String[] to an Object[] and then try to put an Integer in it — and you’ll get an ArrayStoreException. For a readonly array, you’d always get ASE.
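To illustrate the existing store check mentioned above, here is a minimal sketch. The class name ArrayStoreCheckDemo and the helper storeRejected are made up for this example; only the covariant array assignment and the ArrayStoreException are standard Java behavior. It does not show read-only arrays, which do not exist today, only the dynamic check they could piggyback on.

```java
public class ArrayStoreCheckDemo {
    // Hypothetical helper: returns true if the VM rejects the store
    // into array[0] with an ArrayStoreException.
    static boolean storeRejected(Object[] array, Object value) {
        try {
            array[0] = value; // the VM checks the runtime component type here
            return false;
        } catch (ArrayStoreException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Legal at compile time: arrays are covariant, so a String[]
        // can be viewed as an Object[].
        Object[] objects = new String[1];

        System.out.println(storeRejected(objects, "hello")); // prints false
        System.out.println(storeRejected(objects, 42));      // prints true (Integer into String[])
    }
}
```

The static type Object[] says nothing about what may be stored; the decision is made per store at run time, which is the same place a writability check could live.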
On Dec 29, 2022, at 12:04 PM, Nathan Reynolds <numeralnathan at gmail.com> wrote:
In one project, we changed the code to be zero copy I/O. This dramatically improved the performance of I/O intensive applications. So, getting to zero copy I/O is a tremendous win.
How would the Java language prevent a rogue custom method from altering the array? Someone could write a method that receives a read-only array and then change the bytecode in the .class to say it receives a writable array. The enforcement would have to be at the JVM level. Furthermore, reflection would have to deal with enforcing the read-only attribute.
On Thu, Dec 29, 2022 at 7:46 AM Markus Karg <markus at headcrashing.eu> wrote:
Proposal: Optimizing Efficiency using Read-only Arrays
TL;DR: Read-only Arrays will improve speed, reduce memory and power consumption, provide security by default, and make programming and reviews easier and quicker.
Looking at the profile of any average real-world application, it is apparent that a lot of memory activity stems from allocating byte arrays.
Byte arrays are a core building block of several APIs in OpenJDK.
Just to name two of them: First and foremost Strings, as they are ubiquitous, but also I/O, as byte arrays are the buckets which carry all data through any InputStream/OutputStream.
While I was authoring several java.io optimizations in the past months, the latter became the driver for me to write down this proposal.
Nevertheless, the proposal is focusing on a general solution, applicable to all Java APIs, beyond I/O.
To perform any I/O in Java, all data MUST pass through one or more byte arrays, each and every day.
As it is easy to imagine, we can easily talk about multiple Gigabytes per day for an average server product.
Once this array reference is passed to a custom method, it leaves the safe harbor of the JDK and enters the possibly evil outside world: it becomes compromised.
The called custom method ("Mr Evil") could either read private data sitting in the array beyond the passed lower and upper read limits, or write poisoned data into the passed array, which the JDK code picks up afterwards (and hence treats as "safe" data).
To mitigate these risks, typically byte arrays are duplicated (at least within limits) before being forwarded to the outer world, so the "evil" receiver will only see a temporary / trimmed copy of the array.
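As a minimal sketch of that defensive-copy pattern (the class DefensiveCopyDemo and the method passSafely are hypothetical names for illustration, not JDK code; Arrays.copyOfRange and Consumer are standard):

```java
import java.util.Arrays;
import java.util.function.Consumer;

public class DefensiveCopyDemo {
    // Hypothetical helper: hands only the range [from, to) to untrusted
    // code, as a private copy, so the internal buffer stays out of reach.
    static void passSafely(byte[] buffer, int from, int to, Consumer<byte[]> untrusted) {
        byte[] copy = Arrays.copyOfRange(buffer, from, to); // allocation + copy on every call
        untrusted.accept(copy);
    }

    public static void main(String[] args) {
        byte[] internal = {1, 2, 3, 4, 5};

        passSafely(internal, 1, 3, received -> {
            // The receiver sees only two bytes, and tampering hits the copy.
            received[0] = 99;
        });

        System.out.println(Arrays.toString(internal)); // prints [1, 2, 3, 4, 5]
    }
}
```

The internal buffer survives untouched, but each call pays for one extra array allocation and one copy of the forwarded range.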
Just due to that single safety means alone, each day tens of thousands of Java servers are squandering precious memory and power, producing considerable amounts of carbon dioxide in turn.
While copying buffers is effective, it also is inefficient.
"Inefficiency" is definitely not a term we want Java to be associated with in the age of climate change.
N.B.: As soon as we omit explicit creation of an array copy, either due to a human programming fault, or due to an unexpected technical failure, security is ineffective! Hence relying on explicit copies is also a suboptimal ("flaky") safety means. Due to that risk, reviews of I/O code often become complex, lengthy and exhausting, making them rather expensive.
This is just one single example. You could easily find lots more in the JDK.
If the Java language had a means to mark arrays as "read-only" to the compiler / JVM (just like it already has for final variables), then there would be no more need for an explicit copy.
Several benefits would arise from the fact that no copy of the array is created (and removed) in turn:
* Speed is improved. While System.arraycopy() is quick, not calling it at all is quicker.
* GC pressure is reduced. While it might be low already, not creating a copy of an array makes it zero.
* Security by default. As the JVM cannot write "read-only" arrays, there is no harm when an explicit copy is omitted.
* Reduced memory consumption. No copy at all means literally zero additional memory.
* Reduced power consumption. No power to invest into squandered CPU cycles.
* Easier programming. No need to remember explicit creation of copies.
* Simpler code. No copies means no code to create them, making the remainder simpler to understand.
* Quicker reviews. Reviewer does not have to take care to check for compromised buffers, which is easily forgotten.
While each single effect might be small, keep in mind that all these effects will happen together at once, and are massively applied each and every day, as arrays are building blocks of the JDK.
To sum up, I'd like to propose to add a means to the Java language which turns arrays into "read-only" arrays.