[threeten-dev] Ser performance

Fri Mar 8 10:05:43 PST 2013

I would assume "check first char first" is something similar to

         char c = zoneId.charAt(0);
         if (c < 'A' || c > 'z' || (c > 'Z' && c < 'a'))
             throw new DateTimeException("Invalid ID for region-based ZoneId, invalid format: " + zoneId);
         for (int i = 1; i < n; i++) {
             c = zoneId.charAt(i);
             if (c >= 'a' && c <= 'z') continue;
             if (c >= 'A' && c <= 'Z') continue;
             if (c == '/') continue;
             if (c >= '0' && c <= '9') continue;
             if (c == '~') continue;
             if (c == '.') continue;
             if (c == '_') continue;
             if (c == '+') continue;
             if (c == '-') continue;
             throw new DateTimeException("Invalid ID for region-based ZoneId, invalid format: " + zoneId);
         }

There is no change in speed, and personally I prefer the proposed one.
Moving the lower case the first check obviously brings in 5%+ boost,
with the assumption that the "original" tzdb names should be used
in most scenario, which has a capital first character followed by lower
case characters as the rest.

Webrev has been updated to
(1) moved up lower case
(2) removed pattern
(3) moved the checkName after ofId()

http://cr.openjdk.java.net/~sherman/jdk8_threeten/zidPerm/

Added j.u.Date, it is around the expected number.

--------LocalDate--------
serSize=16
Writing: 6696
Reading: 3218
--------LocalTime--------
serSize=17
Writing: 6166
Reading: 2959
--------LocalDateTime--------
serSize=23
Writing: 6268
Reading: 3606
--------ZonedDateTime--------
serSize=46
Writing: 7543
Reading: 8450
--------Date--------
serSize=17
Writing: 2951
Reading: 3485
--------GregorianCal--------
serSize=266
Writing: 29136
Reading: 43823
--------ZoneId.of()--------
Reading: 987

-Sherman

On 03/08/2013 08:27 AM, Xueming Shen wrote:
> On 3/8/13 2:02 AM, Stephen Colebourne wrote:
>> On 8 March 2013 05:10, Xueming Shen <xueming.shen at oracle.com> wrote:
>>> However I noticed that the ZDT reading/de-serialization time is kinda
>>> "unproportional"
>>> bigger, given it only has a LDT, a ZoneOffset and ZoneId to read in. It
>>> appears we "spend"
>>> most of the ZoneRegion.ofId() time on the name validation via the regex. the
>>> regex's
>>> matches() is too slow for this "simple" validation, unfortunately. Though I
>>> hate to propose,
>>> as the proud maintainer of jdk regex:-), but I think it might worth
>>> considering the simple
>>> char by char direct match, as we did in java.nio.Charset for the similar
>>> name validation
>>> situation, as showed in the following webrev.
>>>
>>> http://cr.openjdk.java.net/~sherman/jdk8_threeten/zidPerm/
>> I'm happy with the principle of the change. I think it might be
>> cleaner to read and faster if you checked the first character, then
>> checked the remaining characters. This is safe as the string is at
>> least length 2.
>
> The assumption of current impl is that in "normal" scenario the tested
> zoneId name is valid and  most of the character is A-Z and a-z. Maybe
> we should move _ and / up before 0-9. But I would guess the suggested
> approach would not be faster.
>
>>
>> You should also delete the pattern rather than just commenting it out.
>>
>> I'd also prefer the checkName() method to be located after ofId(), not
>> before, in the source file.
>>
>> thanks
>> Stephen
>