HttpURLConnection cache issues leading to crashes in JGSS w/ native GSS introduced by 8303809

Michael McMahon michael.x.mcmahon at oracle.com
Mon Oct 23 15:26:42 UTC 2023


Hi,

Thanks for bringing this to our attention. You are right that this is a 
misuse of the authentication cache in the case of Kerberos (Negotiate) 
authentication. Though that is not the case for other auth schemes, 
because normally what gets cached are credentials, rather than security 
tokens.

It makes no sense to cache GSS contexts either, outside the scope of any 
individual request (being authenticated through multiple 
request/responses). We don't need to cache it in this situation as it is 
already kept as a local variable in the HttpURLConnection impl class.

So, my first impression is that the fix here needs to disable the cache 
permanently for the Negotiate scheme, which is basically what the system 
property workaround is doing. But, we need to just be sure about that 
before we publish a PR.

Thanks,

Michael.

On 19/10/2023 23:09, Nico Williams wrote:
> # Crashes
>
> We recently upgrade to OpenJDK 17.0.8.1 and started observing crashes
> resulting from double-frees via `gss_delete_sec_context()`.
>
> Adding `-Djdk.spnego.cache=false` to our java invocations stops the
> crashes.  We believe this is due to a race condition that has long been
> in `HttpURLConnection` that was mostly harmless before but which with
> the addition of this commit:
>
>      16843770b5a 8303809: Dispose context in SPNEGO NegotiatorImpl
>
> became a use-after-free / double-free bug, leading to crashes.
>
> This happens with stock 17.0.8.1+1 from Adoptium.
>
> Attached is a reproducer, Test.java, which you can run like this:
>
>      $ # credendials are kinituser at jgssbug.twosigma.com, password password
>      $ #
>      $ "$JAVA_HOME/bin/java" \
>                -Dsun.security.jgss.native=true \
>                -Dsun.security.jgss.lib=/usr/lib/libgssapi_krb5.so \
>                Test.javahttps://kerberos.club/tmp/jgssbug.txt
>
> It doesn't crash every time, but it crashes often.
>
> A typical Java stack trace upon crashing looks like:
>
> j  sun.security.jgss.wrapper.GSSLibStub.deleteContext(J)J+0java.security.jgss at 17.0.8.1
> j  sun.security.jgss.wrapper.NativeGSSContext.dispose()V+76java.security.jgss at 17.0.8.1
> j  sun.security.jgss.GSSContextImpl.dispose()V+16java.security.jgss at 17.0.8.1
> j  sun.net.www.protocol.http.spnego.NegotiatorImpl.disposeContext()V+11java.security.jgss at 17.0.8.1
> J 4456 c1 sun.net.www.protocol.http.NegotiateAuthentication.disposeContext()Vjava.base at 17.0.8.1  (24 bytes) @ 0x00007f909dc33834 [0x00007f909dc337c0+0x0000000
> 000000074]
> j  sun.net.www.protocol.http.HttpURLConnection.getInputStream0()Ljava/io/InputStream;+1923java.base at 17.0.8.1
> j  sun.net.www.protocol.http.HttpURLConnection.getInputStream()Ljava/io/InputStream;+62java.base at 17.0.8.1
> j  sun.net.www.protocol.https.HttpsURLConnectionImpl.getInputStream()Ljava/io/InputStream;+4java.base at 17.0.8.1
> j  Test.lambda$main$0()V+16
> j  Test$$Lambda$207+0x00007f9020144208.run()V+0
> j  java.lang.Thread.run()V+11java.base at 17.0.8.1
> v  ~StubRoutines::call_stub
>
> On the C side the crash typically happens in the allocator, typically in
> `free()`:
>
> #0  __GI_raise (sig=sig at entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
> #1  0x00007fb6e1185535 in __GI_abort () at abort.c:79
> #2  0x00007fb6e11dc648 in __libc_message (action=action at entry=do_abort, fmt=fmt at entry=0x7fb6e12e62a0 "%s\n") at ../sysdeps/posix/libc_fatal.c:181
> #3  0x00007fb6e11e2d6a in malloc_printerr (str=str at entry=0x7fb6e12e444e "free(): invalid pointer") at malloc.c:5359
> #4  0x00007fb6e11e459c in _int_free (av=<optimized out>, p=<optimized out>, have_lock=<optimized out>) at malloc.c:4180
> #5  0x00007fb582ba5b6d in krb5_free_address (context=context at entry=0x7fb5d80cbf60, val=0x7fb5d8000080) at kfree.c:62
> #6  0x00007fb582b93d9e in krb5_auth_con_free (context=context at entry=0x7fb5d80cbf60, auth_context=0x7fb5d80778a0) at auth_con.c:60
> #7  0x00007fb6440a3eec in krb5_gss_delete_sec_context (minor_status=0x7fb586868504, context_handle=0x7fb5d80e22d0, output_token=<optimized out>)
>      at delete_sec_context.c:77
> #8  0x00007fb6440964b0 in gssint_delete_internal_sec_context (minor_status=0x7fb586868504, mech_type=<optimized out>,
>      internal_ctx=internal_ctx at entry=0x7fb5d80e22d0, output_token=0x0) at g_glue.c:606
> #9  0x00007fb6440946dd in gss_delete_sec_context (minor_status=<optimized out>, context_handle=0x7fb5d80d2be8, output_token=<optimized out>)
>      at g_delete_sec_context.c:91
> #10 0x00007fb6440b8dfe in spnego_gss_delete_sec_context (minor_status=<optimized out>, context_handle=0x7fb5d807bd70, output_token=<optimized out>)
>      at spnego_mech.c:2161
> #11 0x00007fb6440964b0 in gssint_delete_internal_sec_context (minor_status=0x7fb586868504, mech_type=<optimized out>,
>      internal_ctx=internal_ctx at entry=0x7fb5d807bd70, output_token=0x0) at g_glue.c:606
> #12 0x00007fb6440946dd in gss_delete_sec_context (minor_status=<optimized out>, context_handle=0x7fb586868508, output_token=<optimized out>)
>      at g_delete_sec_context.c:91
> #13 0x00007fb6dc0559ff in Java_sun_security_jgss_wrapper_GSSLibStub_deleteContext () from /home/nico/tmp/jdk-17.0.8.1+1/lib/libj2gss.so
> #14 0x00007fb6c89a653a in ?? ()
> #15 0x0000000000000000 in ?? ()
>
> We've seen other variations, but always they can be traced to
> `HttpURLConnection`.
>
> (We do maintain local patches to OpenJDK JGSS code as well.  But that's
> a story for another time.  We do want to try again to upstream those
> patches.  Because we maintain such patches we developed an interim fix
> where we use `synchronized (lock)` in the JGSS `dispose()` methods and
> also `Reference.reachabilityFence(this)`, which worked to prevent the
> crashes, but that was before we learned that `-Djdk.spnego.cache=false`
> also works.  In any case, stock, unpatched OpenJDK 17.0.8.1 evinces this
> crasher.)
>
> # Root cause
>
> What 8303809 did was add calls to `dispose()` on `GSSContext` objects in
> NegotiatorImpl.java in a new method called `disposeContext()` that all
> HTTP authentication mechanisms now implement and which 8303809 also
> added.
>
> It took us a while to realize that the use of the `AuthCache` in
> `HttpURLConnection` implies concurrent access to the cached contexts via
> concurrent HTTP requests to the same origin.
>
> Concurrent access to and disposal of those context objects then leads to
> use-after-free / double-free errors when using sun.security.jgss.native.
>
> There's really several issues going on here:
>
>   - The `dispose()` methods in 17.x in the JGSS native C GSS bindings
>     clases are dangerous, as are finalizers generally (and these
>     `dispose()` methods are called from finalizers).
>
>     This is fixed in later OpenJDK releases by making use of `Cleaner`,
>     so there's not much more to say about this here.
>
>   - The `AuthCache` in `src/java.base/share/classes/sun/net/www/protocol/http/`
>     implies concurrency, and it's not clear that concurrency is allowable
>     in all cases.
>
>   - In the case of RFC 4559 "Negotiate" and GSS concurrency is very much
>     not permitted.  The GSS APIs can and should be thread-safe, but it is
>     still an error to invoke `initSecContext()` concurrently on the same
>     `GSSContext`!
>
>     GSS-API `initSecContext()`/`acceptSecContext()` are strictly
>     synchronous and serial.
>
>   - The `AuthCache` in `src/java.base/share/classes/sun/net/www/protocol/http/`
>     is likely a misfeature altogether, at least as far as sharing GSS-
>     like HTTP authentication mechanism contexts goes, and probably for
>     all other HTTP auth methods too.
>
>     It's not ok to take a reply token from a `WWW-Authenticate:` response
>     header and consume it together with a "context" created by a
>     unrelated request [to the same origin].
>
>     Every `WWW-Authenticate:`/`Authorization:` token must be matched to
>     the same [serial train of] request[s].
>
>     Thus if we have an RFC 4559 Negotiate exchange requiring 5 tokens, so
>     something like:
>
>     0: request..
>     1: 401 response w/ WWW-Authenticate: Negotiate
>     2: request w/ Authorization: Negotiate ...           (continue needed)
>     3: 200 response w/ WWW-Authenticate: Negotiate ...   (continue needed)
>     4: request w/ Authorization: Negotiate ...           (continue needed)
>     5: 200 response w/ WWW-Authenticate: Negotiate ...   (complete)
>
>     it would not be permissible for one of those reply tokens to be fed
>     to another request's GSSContext, nor would it be OK to have more than
>     one serial set of requests share the same GSSContext.
>
>   - The foregoing also applies to SCRAM, DIGEST-MD5, and just about any
>     and all HTTP authentication methods that I know.  I believe that
>     there is no value in caching HTTP authentication mechanism contexts
>     on the client side.
>
>     HTTP Authentication mechanisms like SCRAM and DIGEST-MD5 are
>     three-message mechanisms when counting the initial challenge:
>
>     0: request...
>     1: 401 response w/ WWW-Authenticate: DIGEST-MD5 <challenge>
>     2: request w/ Authorization: DIGEST-MD5 ...
>     3: 200 w/ WWW-Authenticate: DIGEST-MD5 ... (complete)
>
>     There is no case in which caching and sharing a SCRAM or DIGEST-MD5
>     authentication context is useful because there is only one request/
>     response in which there is a context needed at all: between 2 and 3
>     inclusive in the above diagram.
>
>     Perhaps a single challenge can be shared with multiple subsequent
>     requests that will optimistically use SCRAM or DIGEST-MD5 knowing
>     that one response demanded it, but one should probably not do this,
>     though I could find nothing in RFC 5802 and 7677.
>
>   - Negotiate is really a bearer token system, albeit with the
>     possibility of using channel bindings.  Meaning we never use the
>     resulting GSSContext to encrypt or sign any part of the HTTP requests
>     and responses authenticated with that context.  Therefore there's
>     never any point in sharing a GSSContext in HTTP, not even once it's
>     fully-established -- there is no point in doing anything other than
>     disposing of a fully-established GSSContext in HTTP/Negotiate.
>
>     (Attempts to develop standards for encrypting/signing of HTTP
>     requests and responses at the HTTP layer rather than in TLS have
>     historically failed.  If one attempt to standardize such a thing
>     succeeds, _then_ there will be something worth sharing in a cache
>     with care to prevent use-after-dispose.)
>
>   - Because of the preceding, in HTTP/Negotiate it is desirable to "free"
>     any GSSContext immediately after it is complete.
>
>     This is presumably how you came to add the `disposeContext()` method:
>     because those cached contexts otherwise sit there using lots of
>     memory for no reason; someone must have noticed.  However, rather
>     than have an explicit `disposeContext()` these contexts should just
>     immediately be disposed of when complete or failed.
>
> # Possible fixes
>
>   - Backport the use of `Cleaner` in the JGSS `dispose()` methods.
>
>     This would not be very satisfying because the `HttpURLConnection`
>     `AuthCache` would remain buggy, but at least the JVM wouldn't crash.
>
>   - Default `jdk.spnego.cache` to `false`.
>
>   - Rethink or remove the `AuthCache` in `HttpURLConnection`.
>
>   - Something else?
>
> Thanks,
>
> Nico
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.org/pipermail/security-dev/attachments/20231023/eaa05ccf/attachment.htm>


More information about the security-dev mailing list