RFR: 8293170: Improve encoding of the debuginfo nmethod section [v18]
John R Rose
jrose at openjdk.org
Thu Dec 22 20:20:57 UTC 2022
On Thu, 15 Dec 2022 13:51:45 GMT, Boris Ulasevich <bulasevich at openjdk.org> wrote:
>> The nmethod "scopes data" section is 10% of the size of an nmethod. Currently the data is compressed using the Pack200 algorithm, which is good for encoding small integers (LineNumberTable, etc.). Using the fact that half of the data in the section consists of zeros, I reduce its size by another 30%.
>>
>> Testing: jtreg hotspot&jdk, Renaissance benchmarks
>
> Boris Ulasevich has updated the pull request incrementally with one additional commit since the last revision:
>
> cleanup, rename and some testing
I am sorry, but we need to pump the brakes here. This proposal is not acceptable in its current form.
There's a classic tradeoff between writing specialized encodings for each kind of data versus using shared encodings. The specialized encodings might be more compact and/or faster to read or write because they are tuned for a particular kind of data. The shared encodings might be less compact, etc., but because they are shared, the work of learning, documenting, analyzing, and testing them is done once. That's why people use LEB128 as a varint in so many places: it is not the best varint for most purposes, but it is easy to learn and well understood. You know what you are getting when you use it.
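For concreteness, here is roughly what an LEB128-style unsigned varint looks like (a minimal sketch of the general idea, not tied to any particular code base): each byte carries 7 payload bits, and the high bit says whether another byte follows.

```c++
#include <cstdint>
#include <vector>

// Minimal LEB128-style unsigned varint (illustrative only):
// 7 payload bits per byte; the high bit (0x80) means "more bytes follow".
static void leb128_write(std::vector<uint8_t>& out, uint32_t value) {
  while (value >= 0x80) {
    out.push_back(uint8_t(value) | 0x80);  // low 7 bits, continuation bit set
    value >>= 7;
  }
  out.push_back(uint8_t(value));           // final byte, continuation bit clear
}

static uint32_t leb128_read(const uint8_t*& p) {
  uint32_t value = 0;
  int shift = 0;
  uint8_t b;
  do {
    b = *p++;
    value |= uint32_t(b & 0x7f) << shift;
    shift += 7;
  } while (b & 0x80);
  return value;
}
```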
These varint formats are rather easy to create. Here we have one that has good statistics for a particular kind of data, in the cases we have been able to test today. But if we accept it into our source base, maintainers will have to learn to deal with it in the future. If the statistics of the data change (and this will surely happen at some point), then what is today a nice local optimization will become an analysis problem, a black box that maintainers will not want to touch.
To address such maintenance problems, we have picked off-the-shelf algorithms, documented and implemented them carefully, and reused them in our code base.
In HotSpot, we are using UNSIGNED5 (from Pack200 in the Java ecosystem) as our internal varint, yet another of these formats. It is slightly better than LEB128. (We also use UTF8, of course, for similar purposes.) We have factored out the algorithm so that it can be shared in more places in HotSpot. For example, there is a line of work where field-info structures will be compressed with varints. And there may be more to come; the reloc-info streams come to mind as candidates for re-encoding.
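For reference, the shape of an UNSIGNED5-style decode is roughly as follows (a simplified sketch modeled loosely on HotSpot's compressed-stream reader; the names and constants here are illustrative, not the production code): byte values below L terminate the sequence, each byte contributes a radix-64 digit, and at most five bytes are needed for a 32-bit value.

```c++
#include <cstdint>

// Sketch of an UNSIGNED5-style read (illustrative; see HotSpot's
// compressed-stream / unsigned5 code for the real thing).
// lg_H = 6 bits of weight per byte; byte values below L end the sequence.
static const int lg_H  = 6;
static const int L     = 256 - (1 << lg_H);  // 192 "low" (terminating) codes
static const int MAX_i = 4;                  // at most 5 bytes per value

static uint32_t read_uint5(const uint8_t*& p) {
  uint32_t sum = *p++;
  if (sum < (uint32_t)L) return sum;         // common one-byte case
  int shift = lg_H;
  for (int i = 1; ; i++) {
    uint32_t b = *p++;
    sum += b << shift;                       // 32-bit overflow is tolerated
    if (b < (uint32_t)L || i == MAX_i) return sum;
    shift += lg_H;
  }
}
```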
Having a centralized varint mechanism is the smart call, because it can be maintained on behalf of multiple use cases. And if it needs improvement or tuning, such changes will benefit all users.
By contrast, using a local compression scheme for a particular kind of data, as proposed here, creates a separate account of technical debt in each place.
So, let's not.
Instead, if we notice that we have zero-rich data, let's first try to do something that composes with the existing varint scheme (UNSIGNED5) and works on top of it, or underneath it. That is, stack a zero-reduction scheme with our varint scheme, instead of simultaneously inventing a zero-reduction scheme and a new varint scheme.
As a very simple example of this, the first N (N=16, say) code points `J=[0..N-1]` of UNSIGNED5 can be special-cased to mean "there is a run of zero values here". A byte J in that range means that there are J+1 zeroes to decode. A byte beyond that range is first decoded as X0 and then adjusted as `X=X0-N+1`, so that X is a full-range 32-bit integer. The last N-1 values of the full 32-bit range, `[1-N..-1]`, will decode from 5-byte encodings, using 32-bit overflow, which the UNSIGNED5 algorithm is tolerant of.
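Reusing the `read_uint5` sketch above, the decoder side of that would look roughly like this (again just a sketch; `N`, `Token`, and `decode_token` are names made up for illustration):

```c++
#include <cstdint>

static const int N = 16;             // code points [0..N-1] are reserved for zero runs

struct Token {
  uint32_t zero_run;                 // number of zero values (0 if this token is a literal)
  uint32_t value;                    // the literal value (valid only when zero_run == 0)
};

static Token decode_token(const uint8_t*& p) {
  uint8_t b = *p;                    // peek at the first byte
  if (b < N) {
    p++;                             // a single reserved byte J ...
    return { uint32_t(b) + 1, 0 };   // ... means J+1 zero values
  }
  uint32_t x0 = read_uint5(p);       // normal UNSIGNED5 decode (may wrap mod 2^32)
  return { 0, x0 - (N - 1) };        // shift back; wraps correctly for [1-N..-1]
}
```

The encoder side would correspondingly write a nonzero value X as `X0 = X + N - 1`, and for the top N-1 values it would force a 5-byte encoding, so that their first byte cannot be mistaken for a zero-run code and the 32-bit-overflowing decode wraps back to the intended X.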
As another simple example, probably overkill but more similar to the present proposal, the first `N=2^K` code points could be reserved as above (say, `K=4,N=16`), and the decoder state would have not just a count of extra zero values, but rather a bitmask of them. In that case, the bitmask for some byte `J<2^K` would always be initialized to `2*J+1` so that there is always at least one bit set.
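Under one possible reading of that (the bit assignment and the reuse of the `X0 = X + N - 1` shift are my assumptions, purely for illustration): a reserved tag byte J contributes the (K+1)-bit mask `2*J+1`, where bit i set means "the i-th value starting here is zero"; bit 0 is always set because a tag is only emitted when the very next value is zero.

```c++
#include <cstdint>

static const int K = 4;               // code points [0 .. 2^K-1] are reserved tags

// Decoder state for the bitmask variant (one possible reading; sketch only,
// reusing the read_uint5 sketch above).
struct MaskDecoder {
  uint32_t pending_mask = 0;          // bit 0 = next value, bit 1 = the one after, ...

  uint32_t next(const uint8_t*& p) {
    if (pending_mask == 0) {
      uint8_t b = *p;
      if (b < (1 << K)) {             // reserved tag byte J
        p++;
        pending_mask = (uint32_t(b) << 1) | 1;   // mask 2*J+1, bit 0 always set
      }
    }
    if (pending_mask != 0) {
      bool is_zero = (pending_mask & 1) != 0;
      pending_mask >>= 1;
      if (is_zero) return 0;          // this position was flagged as zero
    }
    // Otherwise a literal follows, shifted by 2^K-1 as in the previous sketch.
    return read_uint5(p) - ((1u << K) - 1);
  }
};
```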
And as a different suggestion, the most attractive off-the-shelf zero-reduction scheme I am aware of is in Cap'n Proto [1].
[1]: <https://capnproto.org/encoding.html#packing>
It is similar to the zero-reduction scheme proposed here, but is more battle-tested and probably better thought through. I suggest we investigate its use, as a standard compression technique, to compress these streams, preferably as-is, or perhaps adapted; preferably in conjunction with our existing varint scheme, or stand-alone.
I suspect (though I am not certain) that Cap'n Proto's packing scheme would work fine as a byte source for UNSIGNED5. It is organized as a word-wise algorithm, which allows it to use fully-loaded registers as temps. Stacking the two together would involve streaming through packed words, sticking those words into a small buffer (one cache line is enough), and decoding from there as UNSIGNED5. Again, doing something like this (or either of the two suggestions above) would compose known-good techniques, would reduce maintenance burden, would solve the problem at hand (probably), and would avoid the technical debt of having multiple overlapping varint schemes in our code base.
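To make the stacking idea concrete, here is a rough sketch of the word unpacking step (my own simplification; see the Cap'n Proto spec linked above for the real rules, and note that I omit its 0xff "literal word plus uncompressed run" case): each 8-byte word is preceded by a tag byte whose bits say which bytes of the word are nonzero, all-zero words can be run-length coded, and the unpacked words land in a small buffer from which UNSIGNED5 values are then decoded.

```c++
#include <cstdint>
#include <cstring>

// Simplified Cap'n-Proto-style unpacker (sketch only; the real format also
// has a 0xff tag meaning "literal word followed by a run of uncompressed words").
// Each word is preceded by a tag byte: bit i set => byte i of the word is
// nonzero and present in the stream. A 0x00 tag is followed by a count of
// additional all-zero words.
struct WordUnpacker {
  const uint8_t* src;
  uint32_t zero_run = 0;              // pending all-zero words from a 0x00 tag

  explicit WordUnpacker(const uint8_t* s) : src(s) {}

  void next_word(uint8_t word[8]) {
    if (zero_run > 0) {
      zero_run--;
      memset(word, 0, 8);
      return;
    }
    uint8_t tag = *src++;
    for (int i = 0; i < 8; i++) {
      word[i] = (tag & (1 << i)) ? *src++ : 0;
    }
    if (tag == 0x00) {
      zero_run = *src++;              // that many more all-zero words follow
    }
  }
};

// Stacking: feed next_word() into a small buffer (one cache line or so) and
// decode UNSIGNED5 values from it with read_uint5() as sketched earlier.
```

The only fiddly part of the stacking is a value that straddles the buffer boundary; refilling whenever fewer than five bytes (the UNSIGNED5 maximum length) remain unconsumed would avoid it.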
-------------
Changes requested by jrose (Reviewer).
PR: https://git.openjdk.org/jdk/pull/10025