Adding an Intrinsic for MD5

Ludovic Henry luhenry at microsoft.com
Wed Jul 29 19:13:32 UTC 2020


To add some more information, I've uploaded one of the `hs_err_pid*.log` files at [1].

--
Ludovic

[1] http://cr.openjdk.java.net/~burban/luhenry/md5-intrinsics/hs_err_pid28286.log

-----Original Message-----
From: hotspot-compiler-dev <hotspot-compiler-dev-retn at openjdk.java.net> On Behalf Of Ludovic Henry
Sent: Wednesday, July 29, 2020 9:55 AM
To: hotspot-compiler-dev at openjdk.java.net
Subject: Adding an Intrinsic for MD5

Hi,

After profiling some applications on Azure, I noticed that MD5 takes a significant amount of time when verifying the content of large amounts of downloaded data (see [1] for a flamegraph of some Spark operations pulling data from Azure Storage; look at the topmost `Lsun/security/pro..` entry, which represents 11.68% of the samples). I then looked into the code generated for `sun.security.provider.MD5.implCompress` (the hottest method). I observed that the generated code contains many branches that are never taken and are not even necessary (array-bounds checks on a fixed-size array whose size we have already checked, for example). On top of that, MD5 itself doesn't require any branches (there are no conditions and no loops), making all these branches pure overhead. Accelerating MD5 will benefit not only Azure workloads but anyone doing any sort of content hashing/verification with MD5 (whose continued use is quite unfortunate, given the known flaws of MD5 and the availability of faster alternatives with better cryptographic properties).
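To make the point about redundant bounds checks concrete, here is a minimal sketch of the MD5 compression function for a single 64-byte block. It is not the JDK's `implCompress`; the class name `Md5Sketch` and its helper methods are made up for illustration, and the loop and `switch` are used only to keep the sketch short. Unrolling the 64 steps removes both and leaves straight-line arithmetic, which is the shape described above. Every access to the message array `x` uses an index in the range 0..15, so once the length of `x` is known to be 16, no per-access bounds check can ever fail.

```java
final class Md5Sketch {
    // Per-step left-rotation amounts, four per round (standard MD5 constants).
    private static final int[] S = {
        7, 12, 17, 22,  5, 9, 14, 20,  4, 11, 16, 23,  6, 10, 15, 21
    };

    // Additive constants: K[i] = floor(2^32 * |sin(i + 1)|).
    private static final int[] K = new int[64];
    static {
        for (int i = 0; i < 64; i++) {
            K[i] = (int) (long) (Math.abs(Math.sin(i + 1)) * 4294967296.0);
        }
    }

    // Standard MD5 initial chaining value.
    static int[] initialState() {
        return new int[] {0x67452301, 0xefcdab89, 0x98badcfe, 0x10325476};
    }

    // Compresses one block of 16 little-endian 32-bit words into the state.
    static int[] compress(int[] state, int[] x) {
        if (x.length != 16) {
            throw new IllegalArgumentException("block must be 16 words");
        }
        int a = state[0], b = state[1], c = state[2], d = state[3];
        for (int i = 0; i < 64; i++) {
            int round = i >>> 4;
            int f;
            int g; // message word index, always in 0..15
            switch (round) {
                case 0:  f = (b & c) | (~b & d); g = i;                break;
                case 1:  f = (b & d) | (c & ~d); g = (5 * i + 1) & 15; break;
                case 2:  f = b ^ c ^ d;          g = (3 * i + 5) & 15; break;
                default: f = c ^ (b | ~d);       g = (7 * i) & 15;     break;
            }
            int rotated = Integer.rotateLeft(a + f + K[i] + x[g],
                                             S[(round << 2) | (i & 3)]);
            int next = b + rotated;
            a = d;
            d = c;
            c = b;
            b = next;
        }
        return new int[] {state[0] + a, state[1] + b, state[2] + c, state[3] + d};
    }

    // Serializes the four state words as little-endian bytes in hex.
    static String toHex(int[] state) {
        StringBuilder sb = new StringBuilder(32);
        for (int word : state) {
            for (int k = 0; k < 4; k++) {
                sb.append(String.format("%02x", (word >>> (8 * k)) & 0xff));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Padded block for the empty message: a single 0x80 byte, length 0.
        int[] x = new int[16];
        x[0] = 0x80;
        System.out.println(toHex(compress(initialState(), x)));
    }
}
```

Running `main` hashes the padded block for the empty message and should print the well-known MD5 of the empty input, `d41d8cd98f00b204e9800998ecf8427e`.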

I worked last night on a prototype of an intrinsic, which I've uploaded at [2]. It's a very rough draft and I want to have your input before I invest further into it.

As this is the first time I have done such work (adding an intrinsic, generating assembly by hand, adding support for one instruction in the assembler), I'm still running into a crash and I am not sure how to debug it further. I would really appreciate any pointers on how to approach debugging such an issue, or even for an expert to look into my change and help me pinpoint what's going wrong. So far, I have used the disassembly and the hs_err*.log file to see the generated code and the machine state at the time of the crash. I expect the problem to be around calling conventions and assumptions about the shape/content of the parameters. I'll keep debugging in the meantime.

Thank you very much,

--
Ludovic

[1] http://cr.openjdk.java.net/~burban/luhenry/md5-intrinsics/flamegraph-45235.svg
[2] http://cr.openjdk.java.net/~burban/luhenry/md5-intrinsics/webrev.00/

