HttpURLConnection cache issues leading to crashes in JGSS w/ native GSS introduced by 8303809

Nico Williams Nico.Williams at twosigma.com
Thu Oct 19 22:09:16 UTC 2023


# Crashes

We recently upgrade to OpenJDK 17.0.8.1 and started observing crashes
resulting from double-frees via `gss_delete_sec_context()`. 

Adding `-Djdk.spnego.cache=false` to our java invocations stops the
crashes.  We believe this is due to a race condition that has long been
in `HttpURLConnection` that was mostly harmless before but which with
the addition of this commit:

    16843770b5a 8303809: Dispose context in SPNEGO NegotiatorImpl

became a use-after-free / double-free bug, leading to crashes.

This happens with stock 17.0.8.1+1 from Adoptium.

Attached is a reproducer, Test.java, which you can run like this:

    $ # credendials are kinit user at jgssbug.twosigma.com, password password
    $ #
    $ "$JAVA_HOME/bin/java" \
              -Dsun.security.jgss.native=true \
              -Dsun.security.jgss.lib=/usr/lib/libgssapi_krb5.so \
              Test.java https://kerberos.club/tmp/jgssbug.txt

It doesn't crash every time, but it crashes often.

A typical Java stack trace upon crashing looks like:

j  sun.security.jgss.wrapper.GSSLibStub.deleteContext(J)J+0 java.security.jgss at 17.0.8.1
j  sun.security.jgss.wrapper.NativeGSSContext.dispose()V+76 java.security.jgss at 17.0.8.1
j  sun.security.jgss.GSSContextImpl.dispose()V+16 java.security.jgss at 17.0.8.1
j  sun.net.www.protocol.http.spnego.NegotiatorImpl.disposeContext()V+11 java.security.jgss at 17.0.8.1
J 4456 c1 sun.net.www.protocol.http.NegotiateAuthentication.disposeContext()V java.base at 17.0.8.1 (24 bytes) @ 0x00007f909dc33834 [0x00007f909dc337c0+0x0000000
000000074]
j  sun.net.www.protocol.http.HttpURLConnection.getInputStream0()Ljava/io/InputStream;+1923 java.base at 17.0.8.1
j  sun.net.www.protocol.http.HttpURLConnection.getInputStream()Ljava/io/InputStream;+62 java.base at 17.0.8.1
j  sun.net.www.protocol.https.HttpsURLConnectionImpl.getInputStream()Ljava/io/InputStream;+4 java.base at 17.0.8.1
j  Test.lambda$main$0()V+16
j  Test$$Lambda$207+0x00007f9020144208.run()V+0
j  java.lang.Thread.run()V+11 java.base at 17.0.8.1
v  ~StubRoutines::call_stub

On the C side the crash typically happens in the allocator, typically in
`free()`:

#0  __GI_raise (sig=sig at entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007fb6e1185535 in __GI_abort () at abort.c:79
#2  0x00007fb6e11dc648 in __libc_message (action=action at entry=do_abort, fmt=fmt at entry=0x7fb6e12e62a0 "%s\n") at ../sysdeps/posix/libc_fatal.c:181
#3  0x00007fb6e11e2d6a in malloc_printerr (str=str at entry=0x7fb6e12e444e "free(): invalid pointer") at malloc.c:5359
#4  0x00007fb6e11e459c in _int_free (av=<optimized out>, p=<optimized out>, have_lock=<optimized out>) at malloc.c:4180
#5  0x00007fb582ba5b6d in krb5_free_address (context=context at entry=0x7fb5d80cbf60, val=0x7fb5d8000080) at kfree.c:62
#6  0x00007fb582b93d9e in krb5_auth_con_free (context=context at entry=0x7fb5d80cbf60, auth_context=0x7fb5d80778a0) at auth_con.c:60
#7  0x00007fb6440a3eec in krb5_gss_delete_sec_context (minor_status=0x7fb586868504, context_handle=0x7fb5d80e22d0, output_token=<optimized out>)
    at delete_sec_context.c:77
#8  0x00007fb6440964b0 in gssint_delete_internal_sec_context (minor_status=0x7fb586868504, mech_type=<optimized out>,
    internal_ctx=internal_ctx at entry=0x7fb5d80e22d0, output_token=0x0) at g_glue.c:606
#9  0x00007fb6440946dd in gss_delete_sec_context (minor_status=<optimized out>, context_handle=0x7fb5d80d2be8, output_token=<optimized out>)
    at g_delete_sec_context.c:91
#10 0x00007fb6440b8dfe in spnego_gss_delete_sec_context (minor_status=<optimized out>, context_handle=0x7fb5d807bd70, output_token=<optimized out>)
    at spnego_mech.c:2161
#11 0x00007fb6440964b0 in gssint_delete_internal_sec_context (minor_status=0x7fb586868504, mech_type=<optimized out>,
    internal_ctx=internal_ctx at entry=0x7fb5d807bd70, output_token=0x0) at g_glue.c:606
#12 0x00007fb6440946dd in gss_delete_sec_context (minor_status=<optimized out>, context_handle=0x7fb586868508, output_token=<optimized out>)
    at g_delete_sec_context.c:91
#13 0x00007fb6dc0559ff in Java_sun_security_jgss_wrapper_GSSLibStub_deleteContext () from /home/nico/tmp/jdk-17.0.8.1+1/lib/libj2gss.so
#14 0x00007fb6c89a653a in ?? ()
#15 0x0000000000000000 in ?? ()

We've seen other variations, but always they can be traced to
`HttpURLConnection`.

(We do maintain local patches to OpenJDK JGSS code as well.  But that's
a story for another time.  We do want to try again to upstream those
patches.  Because we maintain such patches we developed an interim fix
where we use `synchronized (lock)` in the JGSS `dispose()` methods and
also `Reference.reachabilityFence(this)`, which worked to prevent the
crashes, but that was before we learned that `-Djdk.spnego.cache=false`
also works.  In any case, stock, unpatched OpenJDK 17.0.8.1 evinces this
crasher.)

# Root cause

What 8303809 did was add calls to `dispose()` on `GSSContext` objects in
NegotiatorImpl.java in a new method called `disposeContext()` that all
HTTP authentication mechanisms now implement and which 8303809 also
added.

It took us a while to realize that the use of the `AuthCache` in
`HttpURLConnection` implies concurrent access to the cached contexts via
concurrent HTTP requests to the same origin.

Concurrent access to and disposal of those context objects then leads to
use-after-free / double-free errors when using sun.security.jgss.native.

There's really several issues going on here:

 - The `dispose()` methods in 17.x in the JGSS native C GSS bindings
   clases are dangerous, as are finalizers generally (and these
   `dispose()` methods are called from finalizers).

   This is fixed in later OpenJDK releases by making use of `Cleaner`,
   so there's not much more to say about this here.

 - The `AuthCache` in `src/java.base/share/classes/sun/net/www/protocol/http/`
   implies concurrency, and it's not clear that concurrency is allowable
   in all cases.

 - In the case of RFC 4559 "Negotiate" and GSS concurrency is very much
   not permitted.  The GSS APIs can and should be thread-safe, but it is
   still an error to invoke `initSecContext()` concurrently on the same
   `GSSContext`!

   GSS-API `initSecContext()`/`acceptSecContext()` are strictly
   synchronous and serial.

 - The `AuthCache` in `src/java.base/share/classes/sun/net/www/protocol/http/`
   is likely a misfeature altogether, at least as far as sharing GSS-
   like HTTP authentication mechanism contexts goes, and probably for
   all other HTTP auth methods too.

   It's not ok to take a reply token from a `WWW-Authenticate:` response
   header and consume it together with a "context" created by a
   unrelated request [to the same origin].

   Every `WWW-Authenticate:`/`Authorization:` token must be matched to
   the same [serial train of] request[s].

   Thus if we have an RFC 4559 Negotiate exchange requiring 5 tokens, so
   something like:

   0: request..
   1: 401 response w/ WWW-Authenticate: Negotiate
   2: request w/ Authorization: Negotiate ...           (continue needed)
   3: 200 response w/ WWW-Authenticate: Negotiate ...   (continue needed)
   4: request w/ Authorization: Negotiate ...           (continue needed)
   5: 200 response w/ WWW-Authenticate: Negotiate ...   (complete)

   it would not be permissible for one of those reply tokens to be fed
   to another request's GSSContext, nor would it be OK to have more than
   one serial set of requests share the same GSSContext.

 - The foregoing also applies to SCRAM, DIGEST-MD5, and just about any
   and all HTTP authentication methods that I know.  I believe that
   there is no value in caching HTTP authentication mechanism contexts
   on the client side.

   HTTP Authentication mechanisms like SCRAM and DIGEST-MD5 are
   three-message mechanisms when counting the initial challenge:

   0: request...
   1: 401 response w/ WWW-Authenticate: DIGEST-MD5 <challenge>
   2: request w/ Authorization: DIGEST-MD5 ...
   3: 200 w/ WWW-Authenticate: DIGEST-MD5 ... (complete)

   There is no case in which caching and sharing a SCRAM or DIGEST-MD5
   authentication context is useful because there is only one request/
   response in which there is a context needed at all: between 2 and 3
   inclusive in the above diagram.

   Perhaps a single challenge can be shared with multiple subsequent
   requests that will optimistically use SCRAM or DIGEST-MD5 knowing
   that one response demanded it, but one should probably not do this,
   though I could find nothing in RFC 5802 and 7677.

 - Negotiate is really a bearer token system, albeit with the
   possibility of using channel bindings.  Meaning we never use the
   resulting GSSContext to encrypt or sign any part of the HTTP requests
   and responses authenticated with that context.  Therefore there's
   never any point in sharing a GSSContext in HTTP, not even once it's
   fully-established -- there is no point in doing anything other than
   disposing of a fully-established GSSContext in HTTP/Negotiate.

   (Attempts to develop standards for encrypting/signing of HTTP
   requests and responses at the HTTP layer rather than in TLS have
   historically failed.  If one attempt to standardize such a thing
   succeeds, _then_ there will be something worth sharing in a cache
   with care to prevent use-after-dispose.)

 - Because of the preceding, in HTTP/Negotiate it is desirable to "free"
   any GSSContext immediately after it is complete.

   This is presumably how you came to add the `disposeContext()` method:
   because those cached contexts otherwise sit there using lots of
   memory for no reason; someone must have noticed.  However, rather
   than have an explicit `disposeContext()` these contexts should just
   immediately be disposed of when complete or failed.

# Possible fixes

 - Backport the use of `Cleaner` in the JGSS `dispose()` methods.

   This would not be very satisfying because the `HttpURLConnection`
   `AuthCache` would remain buggy, but at least the JVM wouldn't crash.

 - Default `jdk.spnego.cache` to `false`.

 - Rethink or remove the `AuthCache` in `HttpURLConnection`.

 - Something else?

Thanks,

Nico
-- 



More information about the security-dev mailing list