RFR: 8328995: launcher can't open jar files where the offset of the manifest is >4GB [v4]
Liam Miller-Cushon
cushon at openjdk.org
Sat Mar 30 03:49:31 UTC 2024
On Sat, 30 Mar 2024 03:21:39 GMT, Jiangli Zhou <jiangli at openjdk.org> wrote:
>> The other loop uses `readAt` to read in additional data and advance through the extra fields, this loop already has access to a buffer that contains all of the data for the extra fields and doesn't need to do that.
>>
>> I considered having the other loop read in all of the data for the extra fields so there could be a shared helper, but the data is variable length, so that would have required having a large fixed-size buffer or allocating memory, which this code seems to be trying to avoid.
>>
>> Do you see a good way to share the loop?
>
> How about something like the following (I haven't tried to compile or test, please double check if everything is correct)? The size of the extra fields is recorded in the 2-byte field, `extra field length`, so we can just read all extra fields in a temp buffer. It causes less overhead with just one read.
>
>
> jboolean read_zip64_ext(int fd, jlong censtart, jlong base_offset, Byte *cenhdr,
> jboolean check_offset_only, jboolean read_extra_fields) {
> jlong cenlen = CENLEN(cenhdr);
> jlong censiz = CENSIZ(cenhdr);
> jlong cenoff = CENOFF(cenhdr);
> if (cenoff == ZIP64_MAGICVAL ||
> (!check_offset_only && (censiz == ZIP64_MAGICVAL || cenlen == ZIP64_MAGICVAL))) {
> jlong cenext = CENEXT(cenhdr);
> if (cenext == 0) {
> return JNI_FALSE;
> }
>
> jlong start = censtart + CENHDR + CENNAM(cenhdr);
> jlong offset = 0;
> Byte *p = start;
>
> // Read the extra fields if needed.
> if (read_extra_fields) {
> *p = (Byte*)malloc(cenext);
> if (p == NULL) {
> return JNI_FALSE;
> }
> if (!readAt(fd, start + offset, cenext, p)) {
> free(p);
> return JNI_FALSE;
> }
> }
>
> while (offset < cenext) {
> short headerId = ZIPEXT_HDR(p + offset);
> short headerSize = ZIPEXT_SIZ(p + offset);
> if (headerId == ZIP64_EXTID) {
> if (!read_zip64_ext(p, &cenlen, &censiz, &cenoff, CENDSK(cenhdr))) {
> if (p != start) {
> free(p);
> }
> return JNI_FALSE;
> }
> break;
> }
> offset += 4 + headerSize;
> }
> }
>
> if (p != start) {
> free(p);
> }
> return JNI_TRUE;
> }
I think that's similar idea to one of the alternatives I mentioned earlier, won't that allocate for every central directory entry? This callsite has already read the data we need into a buffer, if we end up doing something like that it might be better to do the allocation only for the call site in `validate_lochdr`, and have the shared helper take the pointer to the buffer instead of having a duplicate read.
It seems like there are some tradeoffs here, I can definitely restructure this, but I'd also like to get some feedback from a core-libs owner on the overall approach before iterating too much more.
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/18479#discussion_r1545070043
More information about the core-libs-dev
mailing list