RFR: 8231717: Improve performance of EBCDIC charset decoding for COMPACT_STRINGS

Wed Oct 2 18:26:55 UTC 2019

Hi Andrew,

On 10/2/19 12:54 PM, Andrew Leonard wrote:
> Hi Roger,
> Thank you for the feedback.
> So your point about constructor performance is a good one, in my 
> performance tests in which I have run both the StrCodingBenchmark.java 
> test and also some simple standalone tests, I did not see any 
> noticeable degradation for charsets that are not alwaysCompactable(), 
> but I will look closer at that one. It has made me think though that I 
> can do the isAlwaysCompactable determination at static init time, 
> since the table is obviously static...I will investigate that 
> optimization.
>
> The change is actually not just EBCDIC, as I point out in the bug it's 
> any SingleByte charset that is determined as "alwaysCompactable", I 
> specified EBCDIC in the bug title as it is actually just about only 
> the EBCDIC charsets that fall into this category. I can rename the 
> "title" maybe?

So I think the name should be closer to everything that can be 
decoded/mapped to Latin1 or 8-bit clean.
>
>
> The "negatives" check for isASCIICompatible is done only on the input 
> bytes when the given charset "isASCIICompatible" as you said. This 
> change is not related to hasNegatives() so not sure I quite understand 
> the point about moving it? hasNegatives() is relavent to the input 
> byte[] being decoded as if it contains negatives it will need complete 
> mapping even for an isASCIICompatible charset.
ok
> isAlwaysCompactable is about the charset in general, and can (and I 
> think should) be computed at static init time, as it's simply 
> determined by the fact that the static b2c[] table maps only to values 
> 0->0xff.
>
> EBCDIC codepages are not ASCIICompatible as they don't map directly, 
> so you cannot combine the two, they are independent scenario fastpaths.
rigth, there is a mapping needed.
>
>
> If I add a default impl for decodeToLatin1() I could add the methods 
> to ArrayDecoder, I will look at that. I think I did it that way as I 
> didn't want to affect anyone that may have implemented ArrayDecoder 
> separately...
I was also thinking of ISO_8859_1, that doesn't need a mapping,
and trying to avoid a multiple separate decisions and data paths.
(while keeping the code simple)

Thanks, Roger

>
>
> Thanks for the review, I will look at the static init of 
> alwaysCompactable, and move the methods to ArrayDecoder, and do a 
> closer check on constructor performance at the end of that.
> Cheers
> Andrew
>
> Andrew Leonard
> Java Runtimes Development
> IBM Hursley
> IBM United Kingdom Ltd
> internet email: andrew_m_leonard at uk.ibm.com
>
>
>
>
> From: Roger Riggs <Roger.Riggs at oracle.com>
> To: core-libs-dev at openjdk.java.net
> Date: 02/10/2019 14:52
> Subject: Re: RFR: 8231717: Improve performance of EBCDIC charset 
> decoding for COMPACT_STRINGS
> Sent by: "core-libs-dev" <core-libs-dev-bounces at openjdk.java.net>
> ------------------------------------------------------------------------
>
>
>
> Hi Andrew,
>
> This would seem to have an impact on the performance of every Decoder
> constructor
> because it determines dynamically the isAlwaysCompactable attribute.
> Are there regressions in the non-EBCDIC cases?
>
> Did your performance tests include the constructor overhead?
>
> The isASCIICompatible case is handled by checking for negatives only
> when needed.
>
> Unless the information is used more than once it looks like you've moved
> the computation
> from StringCoding hasNegatives to the Decoder constructor.
>
> Is there a change that would benefit both the isASCIICompatible case and
> EBCDIC?
>
> Is there a reason you didn't add the isAlwaysCompactable method to the
> existing ArrayDecoder interface.
> I don't think there there is a need for a new interface.
>
> Thanks, Roger
>
> On 10/2/19 4:10 AM, Andrew Leonard wrote:
> > Hi,
> > Please can I request a review of this performance enhancement for EBCDIC
> > (and any SingleByte, always compactable) charsets? I've explained the
> > theory in the bug 
> (https://bugs.openjdk.java.net/browse/JDK-8231717), but
> > essentially it optimizes any SingleByte charset that is always 
> compactable
> > due to all mappings being to <=0xff and avoids unnecessary char[] to
> > internal Latin1 byte[] arraycopy as a result. This leads to up to a 100%
> > performance gain for decoding these charsets.
> > I have run the complete tier1 and also the complete sun/nio/cs testcases
> > successfully.
> >
> > Webrev: http://cr.openjdk.java.net/~aleonard/8231717/webrev.00/
> >
> > Thoughts and comments welcome please?
> > Thanks
> > Andrew
> >
> > Andrew Leonard
> > Java Runtimes Development
> > IBM Hursley
> > IBM United Kingdom Ltd
> > internet email: andrew_m_leonard at uk.ibm.com
> >
> > Unless stated otherwise above:
> > IBM United Kingdom Limited - Registered in England and Wales with number
> > 741598.
> > Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire 
> PO6 3AU
> >
>
>
>
>
>
> Unless stated otherwise above:
> IBM United Kingdom Limited - Registered in England and Wales with 
> number 741598.
> Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU