RFR:JDK-8032051:"ZonedDateTime" class "parse" method fails with short time zone offset ("+01")

Thu Mar 3 18:37:03 UTC 2016

Hi,

Stephen, Roger Thanks for the comments.

Please see the updated webrev 
http://cr.openjdk.java.net/~ntv/8032051/webrev.04/

Regards,
Nadeesh

On 3/1/2016 12:29 AM, Stephen Colebourne wrote:
> I'm happy to go back to the spec I proposed before. That spec would
> determine colons dynamically only for pattern HH. Otherwise, it would
> use the presence/absence of a colon in the pattern as the signal. That
> would deal with the ISO-8601 problem and resolve the original issue
> (as ISO_OFFSET_DATE_TIME uses HH:MM:ss, which would leniently parse
> using colons).
>
> Writing the spec wording is not easy however. I had:
>
> When parsing in lenient mode, only the hours are mandatory - minutes
> and seconds are optional. The colons are required if the specified
> pattern contains a colon. If the specified pattern is "+HH", the
> presence of colons is determined by whether the character after the
> hour digits is a colon or not. If the offset cannot be parsed then an
> exception is thrown unless the section of the formatter is optional.
>
> which isn't too bad but alternatives are possible.
>
> Stephen
>
>
>
>
> On 29 February 2016 at 15:52, Roger Riggs <Roger.Riggs at oracle.com> wrote:
>> Hi Stephen,
>>
>> As a fix for the original issue[1], not correctly parsing a ISO defined
>> offset, the use of lenient
>> was a convenient implementation technique (hack).  But with the expanded
>> definition of lenient,
>> it will allow many forms of the offset that are not allowed by the ISO
>> specification
>> and should not be accepted forDateTimeFormatter. ISO_OFFSET_DATE_TIME.
>> In particular, ISO requires the ":" to separate the minutes.
>> I'm not sure how to correctly fix the original issue with the new
>> specification of lenient offset
>> parsing without introducing some more specific implementation information.
>>
>>
>> WRT the lenient parsing mode for appendOffset:
>>
>> I was considering that the subfields of the offset were to be treated
>> leniently but it seems
>> you were treating the entire offset field and text as the unit to be treated
>> leniently.
>> The spec for lenient parsing would be clearer if it were specified as
>> allowing any
>> of the patterns of appendOffset.  The current wording around the character
>> after the hour
>> may be confusing.
>>
>> In the specification of appendOffset(pattern, noOffsetText) how about:
>>
>> "When parsing in lenient mode, the longest valid pattern that matches the
>> input is used. Only the hours are mandatory, minutes  and seconds are
>> optional."
>>
>> Roger
>>
>>
>> [1] https://bugs.openjdk.java.net/browse/JDK-8032051
>>
>>
>>
>>
>>
>> On 2/26/2016 1:10 PM, Stephen Colebourne wrote:
>>> It is important to also consider the case where the user wants to
>>> format using HH:MM but parse seconds if they are provided.
>>>
>>> As I said above, this is no different to SignStyle, where the user
>>> requests something specific on format, but accepts anything on input.
>>>
>>> The pattern is still used for formatting and strict parsing under
>>> these changes. It is effectively ignored in lenient parsing (which is
>>> the very definition of leniency).
>>>
>>> Another way to look at it:
>>>
>>> using a pattern of HH:MM and strict:
>>> +02 - disallowed
>>> +02:00 - allowed
>>> +02:00:00 - disallowed
>>>
>>> using a pattern of HH:mm and strict:
>>> +02 - allowed
>>> +02:00 - allowed
>>> +02:00:00 - disallowed
>>>
>>> using any pattern and lenient:
>>> +02 - allowed
>>> +02:00 - allowed
>>> +02:00:00 - allowed
>>>
>>> This covers pretty much anything a user needs when parsing.
>>>
>>> Stephen
>>>
>>>
>>> On 26 February 2016 at 17:38, Roger Riggs <Roger.Riggs at oracle.com> wrote:
>>>> Hi Stephen,
>>>>
>>>> Even in lenient mode the parser needs to stick to the fields provided in
>>>> the
>>>> pattern.
>>>> If the caller intends to parse seconds, the pattern should include
>>>> seconds.
>>>> Otherwise the caller has not been able to specify their intent.
>>>> That's consistent with lenient mode used in the other fields.
>>>> Otherwise, the pattern is irrelevant except for whether it contains a ":"
>>>> and makes
>>>> the spec nearly useless.
>>>>
>>>> Roger
>>>>
>>>>
>>>>
>>>> On 2/26/2016 12:09 PM, Stephen Colebourne wrote:
>>>>> On 26 February 2016 at 15:00, Roger Riggs <Roger.Riggs at oracle.com>
>>>>> wrote:
>>>>>> Hi Stephen,
>>>>>>
>>>>>> It does not seem natural to me with a pattern of HHMM for it to parse
>>>>>> more
>>>>>> than 4 digits.
>>>>>> I can see lenient modifying the behavior as it it were HHmm, but there
>>>>>> is
>>>>>> no
>>>>>> indication in the pattern
>>>>>> that seconds would be considered.  How it would it be implied from the
>>>>>> spec?
>>>>> The spec is being expanded to define what happens. Previously it
>>>>> didn't define it at all, and would throw an error.
>>>>>
>>>>> Lenient parsing typically accepts much more than the strict parsing.
>>>>>
>>>>> When parsing numbers, you may set the SignStyle to NEVER, but the sign
>>>>> will still be parsed in lenient mode
>>>>>
>>>>> When parsing text, you may select the short output format, but any
>>>>> length of text will be parsed in lenient mode.
>>>>>
>>>>> As such, it is very much in line with the behavour of the API to parse
>>>>> a much broader format than the one requested in lenient mode. (None of
>>>>> this affects strict mode).
>>>>>
>>>>> Stephen
>>>>>
>>>>>
>>>>>> In the original issue, appendOffsetId is defined as using the +HH:MM:ss
>>>>>> pattern and
>>>>>> specific to ISO the MM should be allowed to be optional.   There is no
>>>>>> question of parsing
>>>>>> extra digits not included in the requested pattern.
>>>>>>
>>>>>> Separately, this is specifying the new lenient behavior of
>>>>>> appendOffset(pattern, noffsetText).
>>>>>> In that case, I don't think it will be understood that patterns
>>>>>> 'shorter'
>>>>>> than the input will
>>>>>> gobble up extra digits and ':'s.
>>>>>>
>>>>>> Roger
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 2/26/2016 9:42 AM, Stephen Colebourne wrote:
>>>>>>
>>>>>> Lenient can be however lenient we define it to be. Allowing minutes
>>>>>> and seconds to be parsed when not specified in the pattern is the key
>>>>>> part of the change. Whether the parser copes with both colons and
>>>>>> no-colons is the choice at hand here. It seems to me that since the
>>>>>> parser can easily handle figuring out whether the colon is present or
>>>>>> not, we should just allow the parser to be fully lenient.
>>>>>>
>>>>>> Stephen
>>>>>>
>>>>>>
>>>>>> On 26 February 2016 at 14:15, Roger Riggs <Roger.Riggs at oracle.com>
>>>>>> wrote:
>>>>>>
>>>>>> HI Stephen,
>>>>>>
>>>>>> How lenient is lenient supposed to be? Looking at the offset test
>>>>>> cases,
>>>>>> it
>>>>>> seems to allow  minutes
>>>>>> and seconds digits to be parsed even if the pattern did not include
>>>>>> them.
>>>>>>
>>>>>> + @DataProvider(name="lenientOffsetParseData")
>>>>>> + Object[][] data_lenient_offset_parse() {
>>>>>> + return new Object[][] {
>>>>>> + {"+HH", "+01", 3600},
>>>>>> + {"+HH", "+0101", 3660},
>>>>>> + {"+HH", "+010101", 3661},
>>>>>> + {"+HH", "+01", 3600},
>>>>>> + {"+HH", "+01:01", 3660},
>>>>>> + {"+HH", "+01:01:01", 3661},
>>>>>>
>>>>>> Thanks, Roger
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 2/26/2016 6:16 AM, Stephen Colebourne wrote:
>>>>>>
>>>>>> I don't think this is quite right.
>>>>>>
>>>>>> if ((length > position + 3) && (text.charAt(position + 3) == ':')) {
>>>>>>        parseType = 10;
>>>>>> }
>>>>>>
>>>>>> This code will *always* select type 10 (colons) if a colon is found at
>>>>>> position+3. Whereas the spec now says that it should only do this if
>>>>>> the pattern is "HH". For other patterns, the colon/no-colon choice is
>>>>>> defined to be based on the pattern.
>>>>>>
>>>>>> That said, I'm thinking it is better to make the spec more lenient to
>>>>>> match the behaviour as implemented:
>>>>>>
>>>>>>
>>>>>> When parsing in lenient mode, only the hours are mandatory - minutes
>>>>>> and seconds are optional. If the character after the hour digits is a
>>>>>> colon
>>>>>> then the parser will parse using the pattern "HH:mm:ss", otherwise the
>>>>>> parser will parse using the pattern "HHmmss".
>>>>>>
>>>>>>
>>>>>> Additional TCKDateTimeFormatterBuilder tests will be needed to
>>>>>> demonstrate the above. There should also be a test for data following
>>>>>> the lenient parse. The following should all succeed:
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> DateTimeFormatterBuilder().parseLenient().appendOffset("HH:MM").appendZoneId();
>>>>>> "+01:00Europe/London"
>>>>>> "+0100Europe/London"
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> DateTimeFormatterBuilder().parseLenient().appendOffset("HH:MM").appendLiteral(":").appendZoneId();
>>>>>> "+01:Europe/London"
>>>>>>
>>>>>> Note this special case, where the colon affects the parse type, but is
>>>>>> not ultimately part of the offset, thus it is left to match the
>>>>>> appendLiteral(":")
>>>>>>
>>>>>> You may want to think of some additional nasty edge cases!
>>>>>>
>>>>>> Stephen
>>>>>>
>>>>>> On 25 February 2016 at 15:44, nadeesh tv <nadeesh.tv at oracle.com> wrote:
>>>>>>
>>>>>> Hi all,
>>>>>> Please see the updated webrev
>>>>>> http://cr.openjdk.java.net/~ntv/8032051/webrev.02/
>>>>>>
>>>>>> Thanks and Regards,
>>>>>> Nadeesh
>>>>>>
>>>>>> On 2/23/2016 5:17 PM, Stephen Colebourne wrote:
>>>>>>
>>>>>> Thanks for the changes.
>>>>>>
>>>>>> In `DateTimeFormatter`, the code should be
>>>>>>
>>>>>> .parseLenient()
>>>>>> .appendOffsetId()
>>>>>> .parseStrict()
>>>>>>
>>>>>> and the same in the other case. This ensures that existing callers who
>>>>>> then embed the formatter in another formatter (like the
>>>>>> ZONED_DATE_TIME constant) are unaffected.
>>>>>>
>>>>>>
>>>>>> The logic for lenient parsing does not look right as it only handles
>>>>>> types 5 and 6. This table shows the mappings needed:
>>>>>>
>>>>>> "+HH",  -> "+HHmmss" or "+HH:mm:ss"
>>>>>> "+HHmm",  -> "+HHmmss",
>>>>>> "+HH:mm",  -> "+HH:mm:ss",
>>>>>> "+HHMM",  -> "+HHmmss",
>>>>>> "+HH:MM",  -> "+HH:mm:ss",
>>>>>> "+HHMMss",  -> "+HHmmss",
>>>>>> "+HH:MM:ss",  -> "+HH:mm:ss",
>>>>>> "+HHMMSS",  -> "+HHmmss",
>>>>>> "+HH:MM:SS",  -> "+HH:mm:ss",
>>>>>> "+HHmmss",
>>>>>> "+HH:mm:ss",
>>>>>>
>>>>>> Note that the "+HH" pattern is a special case, as we don't know
>>>>>> whether to use the colon or non-colon pattern. Whether to require
>>>>>> colon or not is based on whether the next character after the HH is a
>>>>>> colon or not.
>>>>>>
>>>>>> Proposed appendOffsetId() Javadoc:
>>>>>>
>>>>>> * Appends the zone offset, such as '+01:00', to the formatter.
>>>>>> * <p>
>>>>>> * This appends an instruction to format/parse the offset ID to the
>>>>>> builder.
>>>>>> * This is equivalent to calling {@code appendOffset("+HH:MM:ss", "Z")}.
>>>>>> * See {@link #appendOffset(String, String)} for details on formatting
>>>>>> and parsing.
>>>>>>
>>>>>> Proposed appendOffset(String, String) Javadoc:
>>>>>>
>>>>>> * During parsing, the offset...
>>>>>>
>>>>>> changed to:
>>>>>>
>>>>>> * When parsing in strict mode, the input must contain the mandatory
>>>>>> and optional elements are defined by the specified pattern.
>>>>>> * If the offset cannot be parsed then an exception is thrown unless
>>>>>> the section of the formatter is optional.
>>>>>> * <p>
>>>>>> * When parsing in lenient mode, only the hours are mandatory - minutes
>>>>>> and seconds are optional.
>>>>>> * The colons are required if the specified pattern contains a colon.
>>>>>> * If the specified pattern is "+HH", the presence of colons is
>>>>>> determined by whether the character after the hour digits is a colon
>>>>>> or not.
>>>>>> * If the offset cannot be parsed then an exception is thrown unless
>>>>>> the section of the formatter is optional.
>>>>>>
>>>>>> thanks and sorry for delay
>>>>>> Stephen
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 11 February 2016 at 20:22, nadeesh tv <nadeesh.tv at oracle.com> wrote:
>>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> Please review a fix for
>>>>>>
>>>>>> Bug Id  https://bugs.openjdk.java.net/browse/JDK-8032051
>>>>>>
>>>>>> webrev http://cr.openjdk.java.net/~ntv/8032051/webrev.01/
>>>>>>
>>>>>> --
>>>>>> Thanks and Regards,
>>>>>> Nadeesh TV
>>>>>>
>>>>>> --
>>>>>> Thanks and Regards,
>>>>>> Nadeesh TV
>>>>>>
>>>>>>

-- 
Thanks and Regards,
Nadeesh TV