Adding an Intrinsic for MD5

Vladimir Ivanov vladimir.x.ivanov at oracle.com
Wed Jul 29 23:10:34 UTC 2020


Hi Ludovic,

It's a crash due to an out-of-bounds Java heap access (right at the upper
heap boundary). Something is wrong either with the initial buf value
(r15) or with the limit check:

  if (multi_block) {
    // increment data pointer and loop if more to process
    addptr(buf, 64);
    movptr(rsi, ofs);
    addptr(rsi, 64);
    movptr(ofs, rsi);
    cmpptr(rsi, limit);
    jcc(Assembler::belowEqual, loop0);
  }
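
To make the loop's exit condition concrete, here is a minimal plain-C++ model of the tail quoted above. It is only a sketch: `model_multi_block`, `LoopTrace`, and `kBlockSize` are hypothetical names, not HotSpot code, and it assumes that each iteration reads exactly one 64-byte block starting at the current offset and that `limit` is meant to be the offset of the last full block.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical model, not HotSpot code: one iteration reads the 64-byte
// block at [ofs, ofs + 64), then the tail advances ofs and re-tests it.
constexpr std::size_t kBlockSize = 64;

struct LoopTrace {
    std::size_t blocks;    // number of 64-byte blocks processed
    std::size_t read_end;  // one past the highest byte offset read
};

inline LoopTrace model_multi_block(std::size_t ofs, std::size_t limit) {
    LoopTrace trace{0, ofs};
    do {
        trace.read_end = ofs + kBlockSize;  // body reads [ofs, ofs + 64)
        ++trace.blocks;
        ofs += kBlockSize;                  // addptr(rsi, 64); movptr(ofs, rsi)
    } while (ofs <= limit);                 // cmpptr(rsi, limit); jcc(belowEqual, loop0)
    return trace;
}
```

Under those assumptions, `model_multi_block(0, 128)` processes three blocks and stops reading at offset 192, while a `limit` that is 64 too large makes the loop read one extra block past the data.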

From the hs_err log:

#  SIGSEGV (0xb) at pc=0x00007f34f10354a1, pid=28286, tid=28305

siginfo: si_signo: 11 (SIGSEGV), si_code: 2 (SEGV_ACCERR), si_addr: 0x0000000510800000


   0x00007f34f10354a1:   add     0x18(%r15),%ecx


R15=0x00000005107fffe8 points into unknown readable memory: 0x0000000000000000 | 00 00 00 00 00 00 00 00


| 100|0x0000000510000000, 0x0000000510800000, 0x0000000510800000|100%| E|CS|TAMS 0x0000000510000000, 0x0000000510000000| Complete
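
The arithmetic behind that reading of the log can be checked mechanically. The sketch below copies the values from the excerpt and assumes that the first and third addresses on the region line are the region's bottom and end; the constant and function names are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Values copied from the hs_err excerpt; names are illustrative only.
constexpr std::uint64_t kR15        = 0x00000005107fffe8ULL;  // R15 at the crash
constexpr std::uint64_t kDisp       = 0x18;                   // add 0x18(%r15),%ecx
constexpr std::uint64_t kSiAddr     = 0x0000000510800000ULL;  // si_addr
constexpr std::uint64_t kRegionBase = 0x0000000510000000ULL;  // assumed region bottom
constexpr std::uint64_t kRegionEnd  = 0x0000000510800000ULL;  // assumed region end

// Effective address of a base + displacement memory operand.
constexpr std::uint64_t effective_address(std::uint64_t base, std::uint64_t disp) {
    return base + disp;
}

// True if addr lies in the half-open range [kRegionBase, kRegionEnd).
constexpr bool in_region(std::uint64_t addr) {
    return addr >= kRegionBase && addr < kRegionEnd;
}

static_assert(effective_address(kR15, kDisp) == kSiAddr,
              "faulting operand address equals si_addr");
static_assert(in_region(kR15), "R15 itself is still inside the region");
static_assert(!in_region(kSiAddr), "the accessed address is the first byte past it");
```

So R15 is 0x18 bytes below the end of region 100, and the load at offset 0x18 touches the first byte past it.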

Regarding ways to debug it, I'd first put a breakpoint right at the
beginning of the stub to verify that the parameters are valid. Then I'd
dump the parameters on the stack in order to simplify post-mortem
analysis. (If the problem is with the limit check, then many iterations
should pass before it reaches the end of the Java heap.) Also, inserting
debug checks in the stub itself can catch an inconsistency much closer
to the actual place where the bug lurks.
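
As a rough illustration of what such an entry check could test, here is a plain-C++ sketch. Everything in it is hypothetical (`validate_stub_args`, `ArgCheck`, and the `readable_end` parameter are not part of the stub). It assumes that buf already points at the first block to read, that limit is the last offset at which a block may start, and that the loop always processes at least one block.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical result codes for the entry check.
enum class ArgCheck { Ok, NullBuf, OfsPastLimit, ReadsPastEnd };

// Hypothetical check of the stub's inputs, under the stated assumptions:
// buf is the address of the first block, and the loop reads one 64-byte
// block per iteration for offsets ofs, ofs + 64, ... while offset <= limit.
inline ArgCheck validate_stub_args(std::uint64_t buf, std::uint64_t ofs,
                                   std::uint64_t limit, std::uint64_t readable_end) {
    if (buf == 0) return ArgCheck::NullBuf;
    if (ofs > limit) return ArgCheck::OfsPastLimit;
    // Blocks processed by a do-while loop that advances by 64 while ofs <= limit.
    const std::uint64_t blocks = 1 + (limit - ofs) / 64;
    // One past the last byte the loop would read.
    const std::uint64_t last_read_end = buf + blocks * 64;
    if (last_read_end > readable_end) return ArgCheck::ReadsPastEnd;
    return ArgCheck::Ok;
}
```

A check like this fails at stub entry when the parameters imply a read past `readable_end`, instead of many iterations later at the heap boundary.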

Best regards,
Vladimir Ivanov

On 29.07.2020 22:13, Ludovic Henry wrote:
> To add some more information, I've uploaded one of the `hs_err_pid*.log` files at [1].
> 
> --
> Ludovic
> 
> [1] http://cr.openjdk.java.net/~burban/luhenry/md5-intrinsics/hs_err_pid28286.log
> 
> -----Original Message-----
> From: hotspot-compiler-dev <hotspot-compiler-dev-retn at openjdk.java.net> On Behalf Of Ludovic Henry
> Sent: Wednesday, July 29, 2020 9:55 AM
> To: hotspot-compiler-dev at openjdk.java.net
> Subject: Adding an Intrinsic for MD5
> 
> Hi,
> 
> After profiling some applications on Azure, I noticed that MD5 takes a significant amount of time when verifying the content of large amounts of downloaded data (see [1] for a flamegraph of some Spark operations pulling data from Azure Storage; look at the topmost `Lsun/securitu/pro..` entry, representing 11.68% of the samples). I then looked into the code generated for `sun.security.provider.MD5.implCompress` (the hottest method). I observed that the generated code contains many branches that are never taken and are not even necessary (array-bounds checks on a fixed-size array whose size we have already checked, for example). On top of that, MD5 doesn't require any branching (there are no conditions and no loops), making all these branches pure overhead. Accelerating MD5 will benefit not only Azure workloads but anyone doing any sort of content hashing/verification with MD5 (which is quite unfortunate given the known flaws of MD5 and the availability of faster alternatives with better cryptographic properties).
> 
> I worked last night on a prototype of an intrinsic, which I've uploaded at [2]. It's a very rough draft and I want to have your input before I invest further into it.
> 
> As this is the first time I have done such work (adding an intrinsic, generating assembly by hand, adding support for one instruction in the assembler), I'm still running into a crash and I am not sure how to debug it further. I would really appreciate any pointers on how to approach debugging such an issue, or even for an expert to look into my change and help me pinpoint what's going wrong. So far, I have used the disassembly and the hs_err*.log file to see the generated code and the machine state at the time of the crash. I expect the problem to be around calling conventions and assumptions about the shape/content of the parameters. I'll keep debugging in the meantime.
> 
> Thank you very much,
> 
> --
> Ludovic
> 
> [1] http://cr.openjdk.java.net/~burban/luhenry/md5-intrinsics/flamegraph-45235.svg
> [2] http://cr.openjdk.java.net/~burban/luhenry/md5-intrinsics/webrev.00/
> 

