[threeten-dev] Ser performance

Fri Mar 8 10:43:52 PST 2013

oops:-) updated.

understood the (i!=0). the assumption is that this check should
not be reached in most use scenario (consider all "normal" zid
names, and we are trying to improve the performance for the
"normal" use scenario, in which, the zid names are those tzdb
names, with [A-Z][a-z/]+ pattern). With those "if"s inside the
loop, hotspot probably can do nothing to optimize, with  or
without this check. It looks a little "ugly", but it might be the
most straightforward/fastest one:-)

-Sherman

On 03/08/2013 10:28 AM, roger riggs wrote:
> Hi Sherman,
>
> Check the loop initial value, I think char(0) is not being checked.
>
> Personal preference is to avoid redundant and complex expressions in the loop
> (i != 0).
>
> Roger
>
> On 3/8/2013 1:05 PM, Xueming Shen wrote:
>>
>> I would assume "check first char first" is something similar to
>>
>>         char c = zoneId.charAt(0);
>>         if (c < 'A' || c > 'z' || (c > 'Z' && c < 'a'))
>>             throw new DateTimeException("Invalid ID for region-based ZoneId, invalid format: " + zoneId);
>>         for (int i = 1; i < n; i++) {
>>             c = zoneId.charAt(i);
>>             if (c >= 'a' && c <= 'z') continue;
>>             if (c >= 'A' && c <= 'Z') continue;
>>             if (c == '/') continue;
>>             if (c >= '0' && c <= '9') continue;
>>             if (c == '~') continue;
>>             if (c == '.') continue;
>>             if (c == '_') continue;
>>             if (c == '+') continue;
>>             if (c == '-') continue;
>>             throw new DateTimeException("Invalid ID for region-based ZoneId, invalid format: " + zoneId);
>>         }
>>
>> There is no change in speed, and personally I prefer the proposed one.
>> Moving the lower case the first check obviously brings in 5%+ boost,
>> with the assumption that the "original" tzdb names should be used
>> in most scenario, which has a capital first character followed by lower
>> case characters as the rest.
>>
>> Webrev has been updated to
>> (1) moved up lower case
>> (2) removed pattern
>> (3) moved the checkName after ofId()
>>
>> http://cr.openjdk.java.net/~sherman/jdk8_threeten/zidPerm/
>>
>> Added j.u.Date, it is around the expected number.
>>
>> --------LocalDate--------
>> serSize=16
>> Writing: 6696
>> Reading: 3218
>> --------LocalTime--------
>> serSize=17
>> Writing: 6166
>> Reading: 2959
>> --------LocalDateTime--------
>> serSize=23
>> Writing: 6268
>> Reading: 3606
>> --------ZonedDateTime--------
>> serSize=46
>> Writing: 7543
>> Reading: 8450
>> --------Date--------
>> serSize=17
>> Writing: 2951
>> Reading: 3485
>> --------GregorianCal--------
>> serSize=266
>> Writing: 29136
>> Reading: 43823
>> --------ZoneId.of()--------
>> Reading: 987
>>
>> -Sherman
>>
>> On 03/08/2013 08:27 AM, Xueming Shen wrote:
>>> On 3/8/13 2:02 AM, Stephen Colebourne wrote:
>>>> On 8 March 2013 05:10, Xueming Shen <xueming.shen at oracle.com> wrote:
>>>>> However I noticed that the ZDT reading/de-serialization time is kinda
>>>>> "unproportional"
>>>>> bigger, given it only has a LDT, a ZoneOffset and ZoneId to read in. It
>>>>> appears we "spend"
>>>>> most of the ZoneRegion.ofId() time on the name validation via the regex. the
>>>>> regex's
>>>>> matches() is too slow for this "simple" validation, unfortunately. Though I
>>>>> hate to propose,
>>>>> as the proud maintainer of jdk regex:-), but I think it might worth
>>>>> considering the simple
>>>>> char by char direct match, as we did in java.nio.Charset for the similar
>>>>> name validation
>>>>> situation, as showed in the following webrev.
>>>>>
>>>>> http://cr.openjdk.java.net/~sherman/jdk8_threeten/zidPerm/
>>>> I'm happy with the principle of the change. I think it might be
>>>> cleaner to read and faster if you checked the first character, then
>>>> checked the remaining characters. This is safe as the string is at
>>>> least length 2.
>>>
>>> The assumption of current impl is that in "normal" scenario the tested
>>> zoneId name is valid and  most of the character is A-Z and a-z. Maybe
>>> we should move _ and / up before 0-9. But I would guess the suggested
>>> approach would not be faster.
>>>
>>>>
>>>> You should also delete the pattern rather than just commenting it out.
>>>>
>>>> I'd also prefer the checkName() method to be located after ofId(), not
>>>> before, in the source file.
>>>>
>>>> thanks
>>>> Stephen
>>>
>>
>