RFR: 8231717: Improve performance of EBCDIC charset decoding for COMPACT_STRINGS

Andrew Leonard andrew_m_leonard at uk.ibm.com
Wed Oct 2 16:54:58 UTC 2019


Hi Roger,
Thank you for the feedback.
So your point about constructor performance is a good one, in my 
performance tests in which I have run both the StrCodingBenchmark.java 
test and also some simple standalone tests, I did not see any noticeable 
degradation for charsets that are not alwaysCompactable(), but I will look 
closer at that one. It has made me think though that I can do the 
isAlwaysCompactable determination at static init time, since the table is 
obviously static...I will investigate that optimization.

The change is actually not just EBCDIC, as I point out in the bug it's any 
SingleByte charset that is determined as "alwaysCompactable", I specified 
EBCDIC in the bug title as it is actually just about only the EBCDIC 
charsets that fall into this category. I can rename the "title" maybe?

The "negatives" check for isASCIICompatible is done only on the input 
bytes when the given charset "isASCIICompatible" as you said. This change 
is not related to hasNegatives() so not sure I quite understand the point 
about moving it? hasNegatives() is relavent to the input byte[] being 
decoded as if it contains negatives it will need complete mapping even for 
an isASCIICompatible charset. isAlwaysCompactable is about the charset in 
general, and can (and I think should) be computed at static init time, as 
it's simply determined by the fact that the static b2c[] table maps only 
to values 0->0xff.

EBCDIC codepages are not ASCIICompatible as they don't map directly, so 
you cannot combine the two, they are independent scenario fastpaths.

If I add a default impl for decodeToLatin1() I could add the methods to 
ArrayDecoder, I will look at that. I think I did it that way as I didn't 
want to affect anyone that may have implemented ArrayDecoder separately...

Thanks for the review, I will look at the static init of 
alwaysCompactable, and move the methods to ArrayDecoder, and do a closer 
check on constructor performance at the end of that.
Cheers
Andrew

Andrew Leonard
Java Runtimes Development
IBM Hursley
IBM United Kingdom Ltd
internet email: andrew_m_leonard at uk.ibm.com 




From:   Roger Riggs <Roger.Riggs at oracle.com>
To:     core-libs-dev at openjdk.java.net
Date:   02/10/2019 14:52
Subject:        Re: RFR: 8231717: Improve performance of EBCDIC charset 
decoding for COMPACT_STRINGS
Sent by:        "core-libs-dev" <core-libs-dev-bounces at openjdk.java.net>



Hi Andrew,

This would seem to have an impact on the performance of every Decoder 
constructor
because it determines dynamically the isAlwaysCompactable attribute.
Are there regressions in the non-EBCDIC cases?

Did your performance tests include the constructor overhead?

The isASCIICompatible case is handled by checking for negatives only 
when needed.

Unless the information is used more than once it looks like you've moved 
the computation
from StringCoding hasNegatives to the Decoder constructor.

Is there a change that would benefit both the isASCIICompatible case and 
EBCDIC?

Is there a reason you didn't add the isAlwaysCompactable method to the 
existing ArrayDecoder interface.
I don't think there there is a need for a new interface.

Thanks, Roger

On 10/2/19 4:10 AM, Andrew Leonard wrote:
> Hi,
> Please can I request a review of this performance enhancement for EBCDIC
> (and any SingleByte, always compactable) charsets? I've explained the
> theory in the bug (
https://urldefense.proofpoint.com/v2/url?u=https-3A__bugs.openjdk.java.net_browse_JDK-2D8231717&d=DwICaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=NaV8Iy8Ld-vjpXZFDdTbgGlRTghGHnwM75wUPd5_NUQ&m=-rKUlQuiG0c80g_6taHF1xicVgTGLhZC0DQZzIrZC54&s=rZFusfU4b1q1rCMGEv_Ex05kXpMHHWdHsZdEbue6Oz0&e= 
), but
> essentially it optimizes any SingleByte charset that is always 
compactable
> due to all mappings being to <=0xff and avoids unnecessary char[] to
> internal Latin1 byte[] arraycopy as a result. This leads to up to a 100%
> performance gain for decoding these charsets.
> I have run the complete tier1 and also the complete sun/nio/cs testcases
> successfully.
>
> Webrev: 
https://urldefense.proofpoint.com/v2/url?u=http-3A__cr.openjdk.java.net_-7Ealeonard_8231717_webrev.00_&d=DwICaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=NaV8Iy8Ld-vjpXZFDdTbgGlRTghGHnwM75wUPd5_NUQ&m=-rKUlQuiG0c80g_6taHF1xicVgTGLhZC0DQZzIrZC54&s=LvQIS6ORN5ZsNhwROkLLLEy_LXKv_v3FLrSCuxhPeDc&e= 

>
> Thoughts and comments welcome please?
> Thanks
> Andrew
>
> Andrew Leonard
> Java Runtimes Development
> IBM Hursley
> IBM United Kingdom Ltd
> internet email: andrew_m_leonard at uk.ibm.com
>
> Unless stated otherwise above:
> IBM United Kingdom Limited - Registered in England and Wales with number
> 741598.
> Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 
3AU
>





Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 
741598. 
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU



More information about the core-libs-dev mailing list