RFR 6913047: SunPKCS11 memory leak

Wed Oct 3 20:36:37 UTC 2018

Hi Martin,

I found the problem causing the one regression test failure and fixed 
it. Now Mach5 run is clean.

http://cr.openjdk.java.net/~valeriep/6913047Exp/webrev.02

I also made various changes hoping to improve things. You can compare 
the files in above webrev with yours for differences. General principal 
is to minimize the changes as all new code may introduce regressions 
especially with changes of this scale.
One key difference is that in your code, you destroy the native key 
handle after extracting native key info in the key constructor, and then 
re-creating the key handle when increase the reference count. I use a 
different approach, I keep the key handle in the key constructor which 
will then be destroyed when the reference count goes down to 0. From the 
regression test output that I observed, most keys are created and used 
once. Keeping the keyID in constructor seems more efficient. Besides, I 
also disable native key info extraction for all token keys.

One thing that I am debating is whether we should add some property to 
disable this. I am aware that this will only be enabled if the key info 
extraction succeeds. However, would there be cases where the extraction 
succeeds but the re-creation fails? P11 Key objects are quite widely 
used, if something goes wrong, the impact may be significant.

Thanks,
Valerie

On 10/1/2018 6:48 PM, Valerie Peng wrote:
>
> Hi Martin,
>
> For the KeyStore case, they are mostly token objects which the extract 
> key info approach does not apply?
>
> For your changes in p11_keymgmt.c, I ran into compiler error and 
> SIGBUS errors on two OS (mac and solaris sparc), I ended up changing 
> variable initializations as well as memset(...). With the updated 
> native changes, I adapted and re-tested my prototype changes. For 
> reference, you can find the updated prototype changes at: 
> http://cr.openjdk.java.net/~valeriep/6913047Exp/webrev.01/
>
> Besides making changes in the keymgmet.c for getting rid of 
> platform-specific compilation error and SIGBUS error, I noticed that 
> you hardcoded the key wrapping mechanism in native code for both 
> getNativeKeyInfo(...)/createNativeKey() methods, it seems better to 
> storing the mechanism object at java side, i.e. P11Key and its 
> associated classes, and then pass the object to JNI code (please also 
> see my webrev.01)
>
> In addition, I switched the reference counting to your model, i.e. 
> increase in init() and decrease in reset(), instead of the 
> try-n-finally model in prototype webrev.00. My earlier comment on 
> P11Cipher class which you should not replace the initialize() call 
> with ensureInitialized() call applies to all other PKCS11 classes as well.
>
> With this approach, the KeyID field of P11Key should not be freely 
> accessible and directly referenced outside of P11Key class. Also, the 
> increase and decrease of reference counting must be paired up. 
> Supposedly, the reference count should not go negative, right? If the 
> reference counting isn't correct, the key may be freed pre-maturely? 
> Lastly, the reference counting is an implementation detail and I think 
> it's better to keep it inside the P11Key class/file and not exposing 
> it, i.e. through method names.
>
> I have spent time verifying my updated prototype changes and trace the 
> reference counting. All look fine, except there is one regression test 
> failure (sun/security/tools/keytool/NssTest.java) on linux-x64 which I 
> am still troubleshooting. However, I will be on vacation from 10/4 to 
> 10/21, so I want to update you on what I have so you can continue 
> during my vacation.
>
> Thanks,
>
> Valerie
>
>
>
> On 9/18/2018 4:48 PM, Valerie Peng wrote:
>>
>> Hi Martin,
>>
>> I am ok with your conservation choice of only applying this when 
>> using NSS. If we are only applying this for NSS, we should really 
>> refactor the code to minimize the impact on callers and P11Key class. 
>> My prototype code may be on the extreme end of minimizing changes. 
>> But the current webrev can use some refactoring also. With your 
>> explanation, I now understand your model better. How about the 
>> refactoring in P11Key class? Is there a reason for not doing this? I 
>> did test my prototype code against existing regression tests (except 
>> the KeyStore ones as more API changes are needed for persistent keys 
>> which I have not covered in prototype) but I ran into some strange 
>> errors in some native p11 calls which I did not touch so I commented 
>> them out and just checked the part of reference count, etc.
>>
>> I will take a closer look at the KeyStore case and let you know.
>> Thanks,
>> Valerie
>>
>> On 9/18/2018 7:29 AM, Martin Balao wrote:
>>> Hi Valerie,
>>>
>>> Thanks for your comments.
>>>
>>> Here it is Webrev.11:
>>>
>>>  * 
>>> http://cr.openjdk.java.net/~mbalao/webrevs/6913047/6913047.webrev.11/ 
>>> <http://cr.openjdk.java.net/%7Embalao/webrevs/6913047/6913047.webrev.11/>
>>>  * 
>>> http://cr.openjdk.java.net/~mbalao/webrevs/6913047/6913047.webrev.11.zip 
>>> <http://cr.openjdk.java.net/%7Embalao/webrevs/6913047/6913047.webrev.11.zip>
>>>
>>> <src/jdk.crypto.cryptoki/share/classes/sun/security/pkcs11/P11Cipher.java>
>>> L397: That's right. I was trying to simplify the code but missed 
>>> this. Thanks.
>>> L471: The key reference counter has to be decremented under any 
>>> exception (P11Key.decNativeKeyRef method call). But, yes, no 
>>> exception different than PKCS11Exception should be thrown. Reverted 
>>> this change.
>>>
>>> <src/jdk.crypto.cryptoki/share/classes/sun/security/pkcs11/P11Key.java>
>>> L99: Comment changed. It should be better now.
>>> L148-L149: In fact, I'd enforce this and disable the feature for all 
>>> token keys. Token keys are permanent and extracting them is risky. 
>>> This criteria was already applied when dealing with key stores 
>>> (P11Keystore class).
>>>
>>> Yes, this feature is enabled for NSS only because it's the only 
>>> backend we currently know that is affected by this memory "leak" 
>>> issue. If there were any other software-token backend affected, we 
>>> can try this feature there too. HSMs shouldn't have any problem. I 
>>> prefer to take a more conservative approach and enable the feature 
>>> only in those cases in which it's really necessary. All other cases, 
>>> default to the previous mechanism for freeing memory.
>>>
>>> This does not replace the PhantomReference approach; both work 
>>> together and are complementary. In cases where temporary keys 
>>> feature is disabled or when a temporary key client is not behaving 
>>> correctly (i.e.: leaking stateful operations like "cipher" or 
>>> "signature" in an intermediate state with the native key 
>>> initialized), PhantomReference approach will be the last chance to 
>>> free memory. The native key object can be destroyed (C_DestroyObject 
>>> call) either from the PhantomReference mechanism or from the 
>>> temporary keys mechanism. There shouldn't be any conflict between 
>>> them. If it's destroyed through temporary keys mechanism, then we 
>>> know that the P11Key object is alive (refereced) and thus 
>>> PhantomReference destruction won't be taking place at the same time. 
>>> Once the key is deleted, keyID is set to 0 and session to null. 
>>> Thus, PhantomReference destruction won't have any effect when 
>>> executed later. If we think of the other case (when the key is freed 
>>> by PhantomReference), we have a P11Key object with a native key 
>>> initialized but with no references to it. Thus, destroyNativeKey 
>>> method won't be called and SessionKeyRef.disposeNative is the only 
>>> method that will delete the key.
>>>
>>> L157: that's right, synchronization has to be at class level. Fixed.
>>>
>>> L1343: It's not the same session: this.session was assigned a new 
>>> value (this.session = session;) before calling addObject.
>>>
>>> L1363: removeObject is called for the session, inside 
>>> setKeyIDAndSession: "this.session.removeObject();". Null is set to 
>>> this.session instance variable after this call.
>>>
>>> In regards to the refactorings you proposed, the problem I see with 
>>> moving key reference incrementing/decrementing to PKCS11.java is 
>>> that some operations are stateful. I.e.: encryption. When we 
>>> initialize the operation with C_EncryptInit, the key id is the 3rd 
>>> parameter. Destroying the key id and then doing C_EncryptUpdate 
>>> sounds incorrect to me. Have you tried the regression testing suite 
>>> after this refactoring? (I see some parts commented). In regards to 
>>> removing the tmpNativeKey parameter (used to explicitly disable the 
>>> feature for new P11Key objects), how do you handle the P11KeyStore 
>>> case? We don't want temporary keys there.
>>>
>>> Kind regards,
>>> Martin.-
>>>
>>
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.org/pipermail/security-dev/attachments/20181003/1e279944/attachment.htm>