RFR: 8328995: launcher can't open jar files where the offset of the manifest is >4GB [v4]

Sat Mar 30 03:49:31 UTC 2024

On Sat, 30 Mar 2024 03:21:39 GMT, Jiangli Zhou <jiangli at openjdk.org> wrote:

>> The other loop uses `readAt` to read in additional data and advance through the extra fields, this loop already has access to a buffer that contains all of the data for the extra fields and doesn't need to do that.
>> 
>> I considered having the other loop read in all of the data for the extra fields so there could be a shared helper, but the data is variable length, so that would have required having a large fixed-size buffer or allocating memory, which this code seems to be trying to avoid.
>> 
>> Do you see a good way to share the loop?
>
> How about something like the following (I haven't tried to compile or test, please double check if everything is correct)? The size of the extra fields is recorded in the 2-byte field, `extra field length`, so we can just read all extra fields in a temp buffer. It causes less overhead with just one read. 
> 
> 
> jboolean read_zip64_ext(int fd, jlong censtart, jlong base_offset, Byte *cenhdr,
>                                            jboolean check_offset_only, jboolean read_extra_fields) {
>   jlong cenlen = CENLEN(cenhdr);
>   jlong censiz = CENSIZ(cenhdr);
>   jlong cenoff = CENOFF(cenhdr);
>   if (cenoff == ZIP64_MAGICVAL ||
>       (!check_offset_only && (censiz == ZIP64_MAGICVAL || cenlen == ZIP64_MAGICVAL))) {
>     jlong cenext = CENEXT(cenhdr);
>     if (cenext == 0) {
>       return JNI_FALSE;
>     }
> 
>     jlong start = censtart + CENHDR + CENNAM(cenhdr);
>     jlong offset = 0;
>     Byte *p = start;
> 
>     // Read the extra fields if needed.
>     if (read_extra_fields) {
>       *p = (Byte*)malloc(cenext);
>       if (p == NULL) {
>         return JNI_FALSE;
>       }
>       if (!readAt(fd, start + offset, cenext, p)) {
>         free(p);
>         return JNI_FALSE;
>       } 
>     }
>     
>     while (offset < cenext) { 
>       short headerId = ZIPEXT_HDR(p + offset);
>       short headerSize = ZIPEXT_SIZ(p + offset);
>       if (headerId == ZIP64_EXTID) {
>         if (!read_zip64_ext(p, &cenlen, &censiz, &cenoff, CENDSK(cenhdr))) {
>           if (p != start) {
>             free(p);
>           }
>           return JNI_FALSE;
>         }
>         break;
>       }
>       offset += 4 + headerSize;
>     }
>   }
>   
>   if (p != start) {
>     free(p);
>   }
>   return JNI_TRUE;
> }

I think that's similar idea to one of the alternatives I mentioned earlier, won't that allocate for every central directory entry? This callsite has already read the data we need into a buffer, if we end up doing something like that it might be better to do the allocation only for the call site in `validate_lochdr`, and have the shared helper take the pointer to the buffer instead of having a duplicate read.

It seems like there are some tradeoffs here, I can definitely restructure this, but I'd also like to get some feedback from a core-libs owner on the overall approach before iterating too much more.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/18479#discussion_r1545070043