[Update]: JEP 249 (OCSP Stapling for TLS)

Thu Jul 2 14:26:18 UTC 2015

On 07/02/2015 05:05 AM, Xuelei Fan wrote:
> sun/security/ssl/ServerHandshaker.java
> ======================================
> OCSP stapling only used for certificate-based server authentication at
> present.  I was wondering,  may be better to make a check before wrap
> the ServerHello OCSP extension and CertificateStatus message that
> Certificate message would be actually used.
>
> -------------------
>   860    for (CertStatusReqItemV2 csriv2 :
>   861        statReqExtV2.getRequestItems()) {
>   862        // Favor OCSP_MULTI over OCSP
>   863        statReqType = csriv2.getType();
>   864        statReqData = csriv2.getRequest();
>   865        if (statReqType == StatusRequestType.OCSP_MULTI) {
>   866            break;
>   867        }
>   868     }
>
> Client's preference (favorite choice first) should be respected. For
> type OCSP, line 863-864 prefer the last item rather than the client
> preference.  And line 863-864 does not consider whether the
> OCSPStatusRequest is suitable for this server or not, just select the
> last one.  As may result in ocsp stapling failure if the selected item
> is not for the right responder.
We talked about this back when the original design was done.  Back then 
I had gone with client-ordering as the selection criteria and the 
feedback I got was to favor ocsp_multi for those clients that can do it, 
since it gives a more complete set of revocation information.  While the 
client is ordering it's preference, I don't see in the spec any area 
that says the server has to strictly adhere to the client preference.  
If we want to favor ocsp_multi as a server we should be able to do that 
legally.  As for the discussion where we talked about this, I don't 
recall if it was a conversation or in email.  I'll have to check.  If 
folks feel strongly about strictly honoring client ordering I can make 
the change.

The current code snippet doesn't strictly prefer the last item.  If the 
first item is OCSP_MULTI, then it will stop right there, but your point 
is still true that we're not going on client ordering (as explained above).
>
> Line 865-867 does not consider whether the OCSPStatusRequest is suitable
> for this server or not, either.
>
> In general, client knows nothing about server certificates.  So client
> would offer responders it trusted.  The responders is not necessary
> suitable for the target server.  The server side should select a
> suitable one.  It is not necessary the first one or the last one.  See
> also section 2.2, RFC 6961:
>
>    "Servers that support a client's selection of responders using the
>     ResponderID field could implement this selection by matching the
>     responder ID values from the client's list with the ResponderIDs of
>     known OCSP responders, either by using a binary compare of the values
>     or a hash calculation and compare method."
>
> If the update of your implementation is pretty significant or it is
> unclean how we should move forward at present, I'm OK if you want to
> remove the support of client specified responders.
As implied from the above paragraph servers aren't required to support 
client ResponderId selection.  I don't have support for this feature 
yet, neither from the client or the server.  The OCSPStatusRequest has 
the hooks in it to populate ResponderIds because it was fairly easy to 
do.  I would like to leave that code in, since providing that capability 
to the client is something I want to do in the future, as well as handle 
the server selection based on Responder Id.
>
> sun/security/ssl/StatusResponseManager.java
> ===========================================
> line 207-213:
>
> // It is assumed that the caller has ordered the certs in the chain
> // from end-entity to trust anchor.  In order to create the CertId
> // necessary for OCSP requests the subject and issuer certs must
> // be present.
> - if (chain.length < 2) {
> + if (chain.length == 0) {
>      return Collections.emptyMap();
> }
>
> This assumption is not actually correct in general.  The trust anchor is
> not necessarily be included in the Certificate message.  See page 48,
> section 7.4.2, RFC 5246.
I believe this chain comes in from the KeyManager via 
ServerHandshaker.setupPrivateKeyAndChain(), so it should be a complete 
cert chain in the correct order (or as complete as the server can do, I 
suppose).  The Certificate message is not a factor here.
>
> ----------------------------
>   436         List<ResponderId> responderIds;
>
> You don't actually use the client specified responders, do you?  It's OK
> to me if we don't support none-empty OCSPStatusRequest.  If we do not
> want to do that, I would suggest to remove the related implementation
> code for that.  As would simplify the job, I think.
No, not today.  In the future we will.  I don't believe Apache does 
ResponderId selection either.  Not sure about other implementations.  I 
know Apache will handle extension forwarding, which we also do (and can 
opt not to do  with a property).  I don't want to remove the code 
though.  Those are the hooks for the feature in the future.  Same for 
the OCSPStatusRequest...it has a constructor that allows population of 
extensions and ResponderIds. I'd like to leave this in, even if we're 
not populating those fields from the client today.
>
> -----------------------------
>   252    // Set a bunch of threads to go do the fetching
>   253    List<Future<StatusInfo>> resultList =
>   254        threadMgr.invokeAll(requestList, delay, unit);
>
> The current implementation of OCSP response getter, looks like for every
> cert in the chain, a fetch thread would be created to get the
> corresponding OCSP response.  In the fetch thread, if the response is
> cached, use it; otherwise, connect to OCSP server and fetch it.
>
> Performance would be a very big concern.  A connection may kick start
> many threads.  Fetch threads may be a bottleneck of the server. In
> general, a server may be expected to serve 10K+ connections, but may
> only be able to hit around 10K- threads.  It's likely to face DoS
> challenges if every ClientHello trigger 1+ fetch threads.
If a Server has to hit that many connections simultaneously, it's 
probably already doing multi-threading to accomplish that goal, wouldn't 
you think?  If it cannot hit 10K threads to begin with, it seems to me 
like some system-level tuning needs to be done . Second, a new thread 
isn't created for each client connection.  This is a thread pool, so a 
core number of threads are created when the SRM is created, new ones are 
made as needed, and old ones are reused whenever their task is finished 
and goes into an idle state.  So we not always making a new thread.

Further, time to bring the response back to the caller from the cache is 
very fast, between 2-5 msec.  So there would most likely be some thread 
reuse on a very busy server.  That very busy server is not going to be a 
single-core/single-thread machine; it will be capable of servicing 
multiple threads at the same time, so that should help aggregate throughput.

It of course takes significantly longer to bring the response back from 
the network.  I didn't get exact numbers, but in an environment where 
the responders are on the same machine as the server I could get them 
back in about 150msec.  So in extremely rough terms it's network IO time 
+ 150msec for each fetch.
>
> Another performance concern is that when there is no cache, the client
> would have to wait for the server to fetch the response from OCSP
> servers.  It's easy to run into client timeout issue.
As clients do with other implementations.  If the server doesn't have 
the response, it has to get it and the client has to wait for it.  I see 
no problem here.  The connection delay isn't technically the time taken 
to retrieve all responses, it is the time taken to retrieve all 
responses or the server fetch timeout, whichever is smaller.  A server 
can opt to wait 500 msec, 1000msec, or whatever the admin thinks is 
appropriate given the distance of the responders and the client load.  
If the timer is exceeded, the server will include any responses it has 
and will not include the ones it is still retrieving.  If it cannot get 
any, then it doesn't even assert status_requst[_v2].

What's more, even if the server has to get 3 responses, but only has one 
by the timeout threshold the remaining two fetches will continue in the 
background and complete, even if the connection has gone on. This way a 
future connection coming in has a greater chance of getting a cache hit.
>
> I may suggest to use a single thread to get responses from OCSP servers.
>   And any TLS connection would only get responses from cache.  The
> scenarios may looks like:
> 1. Per every SSLContext instance, create a scheduled single thread to
> fetch the response from OCSP servers, and cache them for each server
> certificate chains.  The server certificates can be got from key manager.
> 2. For every client request, look for the response from the cache.
> 3. if responder is not cached and if we want to support none-empty
> OCSPStatusRequest, fetch the response from OCSP servers in the same
> thread, and cache the result.  We may also want to add the task to the
> scheduled thread in #1.  Not create new thread for every connection is
> important for SSLEngine in some circumstances.
>
> #3 is also not performance friendly.  I think none-empty
> OCSPStatusRequest is not common in practice.
I'll have to think about your proposal a bit.  It's a significant change 
from what I'm doing, but it is an interesting alternative.

As stated above, we don't create new threads for each connection. 
Threads are reused when they are idle/available.

You are correct, non-empty OCSPStatusRequests are rare.  I've never seen 
a client do one.  I've only looked at browsers, but none of them 
populate the request.
>
>
> I will stop here for the first round code review.  Looking forward for
> the next round webrev.
>
> Thanks,
> Xuelei
>
>
> On 6/27/2015 11:06 PM, Jamil Nimeh wrote:
>> Hello all, I've posted an updated webrev based on comments I've received
>> so far:
>>
>> http://cr.openjdk.java.net/~jnimeh/reviews/8046321/webrev.1
>>
>> Thanks,
>> --Jamil
>>
>> On 06/18/2015 05:27 PM, Jamil Nimeh wrote:
>>> Hello all,
>>>
>>> I have a first cut at the OCSP stapling webrev posted for your review:
>>>
>>> JEP: https://bugs.openjdk.java.net/browse/JDK-8046321
>>> Webrev: http://cr.openjdk.java.net/~jnimeh/reviews/8046321/webrev.0/
>>>
>>> A couple items to note:
>>>
>>>    * I'm in the process of updating the JEP with some more details.  I
>>>      should be done with these changes by tonight (PDT).
>>>    * Missing are some of the TLS end-to-end tests.  These tests have
>>>      been coded and run outside the jtreg framework, but for some
>>>      reason things hang in jtreg.  I've included some of the supporting
>>>      classes that these tests will use (CertificateBuilder.java and
>>>      SimpleOCSPResponder.java) so folks could review those if they're
>>>      interested.  I will update the webrev and notify the list as soon
>>>      as I've got the tests working in jtreg.
>>>
>>> Thanks to everyone who has helped along the way.
>>>
>>> --Jamil
>>>
>>>