Java value layout constants
Uwe Schindler
uschindler at apache.org
Sat Nov 27 14:24:15 UTC 2021
Hi,
Lucene's file formats have always been big-endian (in all currently released
versions). With the soon-to-be-released 9.0, we changed the files to little-endian.
We made the change in 9.0 because most platforms are little-endian.
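For illustration, here is a minimal Java sketch (the class and method names are hypothetical, not Lucene code). It decodes the same four on-disk bytes under both byte orders with java.nio.ByteBuffer and prints the platform's native order via ByteOrder.nativeOrder():

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class EndianDemo {
    // Decode a 32-bit int from the first four bytes of `bytes` using the given byte order.
    static int readInt(byte[] bytes, ByteOrder order) {
        return ByteBuffer.wrap(bytes).order(order).getInt(0);
    }

    public static void main(String[] args) {
        byte[] onDisk = {0x01, 0x02, 0x03, 0x04};
        // Pre-9.0 style: big-endian, most significant byte first.
        System.out.printf("big-endian:    0x%08x%n", readInt(onDisk, ByteOrder.BIG_ENDIAN));    // 0x01020304
        // 9.0 style: little-endian, least significant byte first.
        System.out.printf("little-endian: 0x%08x%n", readInt(onDisk, ByteOrder.LITTLE_ENDIAN)); // 0x04030201
        // Reports the byte order of the platform this JVM runs on.
        System.out.println("native order:  " + ByteOrder.nativeOrder());
    }
}
```

For a value like this, the only difference between the old and new formats is which end of the byte sequence holds the most significant byte. On typical x86-64 and AArch64 systems, the little-endian read matches the native order.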
For reading older index files (e.g. mixed indexes, where some segments are old
and others new), the backwards-compatibility codecs put an endian-switching
wrapper on top of all index inputs. This slows down reading old indexes because
of the extra method calls. (This is also the reason why the Panama directory
implementation can't be used with old Lucene versions, so it's a
backwards-incompatible change of the API definition.)
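Here is a rough sketch of what such an endian-switching wrapper amounts to. It assumes a hypothetical SimpleInput interface in place of Lucene's real IndexInput, and the actual backwards-compatibility classes differ. Every read is routed through an extra delegating call plus a byte swap, which is where the overhead comes from.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical minimal input abstraction (NOT Lucene's real IndexInput API).
interface SimpleInput {
    short readShort();
    int readInt();
    long readLong();
}

// Hypothetical reader that decodes little-endian values (the 9.0 on-disk order).
final class LittleEndianInput implements SimpleInput {
    private final ByteBuffer buf;

    LittleEndianInput(byte[] data) {
        this.buf = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
    }

    public short readShort() { return buf.getShort(); }
    public int readInt() { return buf.getInt(); }
    public long readLong() { return buf.getLong(); }
}

// Endian-switching wrapper: each call adds one indirection plus a byte swap,
// so big-endian (pre-9.0) data read through a little-endian input decodes correctly.
final class EndianSwitchingInput implements SimpleInput {
    private final SimpleInput delegate;

    EndianSwitchingInput(SimpleInput delegate) { this.delegate = delegate; }

    public short readShort() { return Short.reverseBytes(delegate.readShort()); }
    public int readInt() { return Integer.reverseBytes(delegate.readInt()); }
    public long readLong() { return Long.reverseBytes(delegate.readLong()); }
}

public class EndianWrapperDemo {
    public static void main(String[] args) {
        // 0x01020304 written big-endian, as an old index would store it.
        byte[] oldSegment = {0x01, 0x02, 0x03, 0x04};
        SimpleInput in = new EndianSwitchingInput(new LittleEndianInput(oldSegment));
        System.out.printf("0x%08x%n", in.readInt()); // prints 0x01020304
    }
}
```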
There are no plans (and there never will be) to make endianness
platform-dependent; maintenance would be too hard. We just use the most common one.
Uwe
On 27 November 2021 14:04:12 UTC, Rado Smogura <mail at smogura.eu> wrote:
>Hi Uwe,
>
>
>I've got just one thing, more about Lucene (I guess it was already
>thought through). In this code you specify the byte order. I guess
>overall performance could be better if everything were in native
>order, unless there's a requirement for Lucene files to be
>cross-platform copyable (I know Lucene only from its wrapper, ES).
>
>
>BR,
>
>Rado
>
>On 27.11.2021 11:00, Uwe Schindler wrote:
>> Hi Maurizio,
>>
>>> For this reason, I'd like to propose a small tweak, which would
>>> essentially revert alignment constraints for Java layout constants to
>>> what they were in 17. In other words, let's keep the "good" JAVA_XYZ
>>> names for the _true_ Java layouts (including alignment as seen by VM).
>>> If clients want to create unaligned constants they can do so, as they
>>> can also create big-endian constants where needed. In the majority of
>>> cases, since access will be aligned (for performance reasons), this will
>>> not really change much for clients. But some of those clients that need
>>> to pack data structures more (Lucene?) will need to define their own
>>> packed/unaligned layout constants.
>> That's all fine. In my first JDK 18 branch of Lucene's new MMapDirectory, I did it like this:
>> https://github.com/uschindler/lucene/blob/ad3a81e3d348d6aa417aa785bdfe9e7a39c1ee53/lucene/core/src/java/org/apache/lucene/store/MemorySegmentIndexInput.java#L36-L40
>>
>> Basically, we have layout constants anyway (as the file format has a defined byte order). When I wrote this (in September), I was already thinking: "maybe add withBitAlignment(8) everywhere?"
>>
>> In general, when you define your own constants for on-disk layouts, it is always advisable to be specific about byte order and alignment.
>>
>> Anyway, we are working on Lucene to have alignment in our files, at least for the non-packed formats. E.g., we now always align file slices (we call them CFS files) to 8 bytes, so as not to add further misalignment there. At some point in the future we may enable the alignment checks, but that will be several years from now (file format compatibility).
>>
>> Uwe
>>
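The 8-byte slice alignment mentioned in the quoted message can be illustrated with a small Java sketch. The names and offsets are hypothetical, and this is not Lucene's actual compound-file writer. If each slice start is padded up to a multiple of 8, a value that is naturally aligned within its slice (for sizes up to 8 bytes) stays aligned relative to the start of the file.

```java
public class SliceAlignment {
    // Round `offset` up to the next multiple of `alignment` (must be a positive power of two).
    static long alignUp(long offset, long alignment) {
        if (alignment <= 0 || Long.bitCount(alignment) != 1) {
            throw new IllegalArgumentException("alignment must be a positive power of two");
        }
        // -alignment is the two's-complement mask that clears the low bits.
        return (offset + alignment - 1) & -alignment;
    }

    // True if an access of `size` bytes at `offset` is naturally aligned.
    static boolean isAligned(long offset, int size) {
        return offset % size == 0;
    }

    public static void main(String[] args) {
        long endOfPreviousSlice = 13; // hypothetical end offset of the previous slice
        long sliceStart = alignUp(endOfPreviousSlice, 8);
        System.out.println("slice starts at " + sliceStart);             // 16
        // A long at slice-relative offset 8 stays 8-byte aligned in the file.
        System.out.println(isAligned(sliceStart + 8, Long.BYTES));         // true
        // Without padding, the same long would sit at file offset 21: misaligned.
        System.out.println(isAligned(endOfPreviousSlice + 8, Long.BYTES)); // false
    }
}
```

File-relative alignment only translates into aligned memory addresses if the mapping itself starts on a suitably aligned address. That holds when a file is mapped from offset 0, because memory mappings begin at page boundaries. Values that remain unaligned (packed formats) still need layout constants that permit unaligned access, such as the withBitAlignment(8) variants discussed above.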
--
Uwe Schindler
Achterdiek 19, 28357 Bremen
https://www.thetaphi.de
More information about the panama-dev mailing list