RFR:JDK-8032051:"ZonedDateTime" class "parse" method fails with short time zone offset ("+01")

Mon Feb 29 18:59:15 UTC 2016

I'm happy to go back to the spec I proposed before. That spec would
determine colons dynamically only for pattern HH. Otherwise, it would
use the presence/absence of a colon in the pattern as the signal. That
would deal with the ISO-8601 problem and resolve the original issue
(as ISO_OFFSET_DATE_TIME uses HH:MM:ss, which would leniently parse
using colons).

Writing the spec wording is not easy however. I had:

When parsing in lenient mode, only the hours are mandatory - minutes
and seconds are optional. The colons are required if the specified
pattern contains a colon. If the specified pattern is "+HH", the
presence of colons is determined by whether the character after the
hour digits is a colon or not. If the offset cannot be parsed then an
exception is thrown unless the section of the formatter is optional.

which isn't too bad but alternatives are possible.

Stephen

On 29 February 2016 at 15:52, Roger Riggs <Roger.Riggs at oracle.com> wrote:
> Hi Stephen,
>
> As a fix for the original issue[1], not correctly parsing a ISO defined
> offset, the use of lenient
> was a convenient implementation technique (hack).  But with the expanded
> definition of lenient,
> it will allow many forms of the offset that are not allowed by the ISO
> specification
> and should not be accepted forDateTimeFormatter. ISO_OFFSET_DATE_TIME.
> In particular, ISO requires the ":" to separate the minutes.
> I'm not sure how to correctly fix the original issue with the new
> specification of lenient offset
> parsing without introducing some more specific implementation information.
>
>
> WRT the lenient parsing mode for appendOffset:
>
> I was considering that the subfields of the offset were to be treated
> leniently but it seems
> you were treating the entire offset field and text as the unit to be treated
> leniently.
> The spec for lenient parsing would be clearer if it were specified as
> allowing any
> of the patterns of appendOffset.  The current wording around the character
> after the hour
> may be confusing.
>
> In the specification of appendOffset(pattern, noOffsetText) how about:
>
> "When parsing in lenient mode, the longest valid pattern that matches the
> input is used. Only the hours are mandatory, minutes  and seconds are
> optional."
>
> Roger
>
>
> [1] https://bugs.openjdk.java.net/browse/JDK-8032051
>
>
>
>
>
> On 2/26/2016 1:10 PM, Stephen Colebourne wrote:
>>
>> It is important to also consider the case where the user wants to
>> format using HH:MM but parse seconds if they are provided.
>>
>> As I said above, this is no different to SignStyle, where the user
>> requests something specific on format, but accepts anything on input.
>>
>> The pattern is still used for formatting and strict parsing under
>> these changes. It is effectively ignored in lenient parsing (which is
>> the very definition of leniency).
>>
>> Another way to look at it:
>>
>> using a pattern of HH:MM and strict:
>> +02 - disallowed
>> +02:00 - allowed
>> +02:00:00 - disallowed
>>
>> using a pattern of HH:mm and strict:
>> +02 - allowed
>> +02:00 - allowed
>> +02:00:00 - disallowed
>>
>> using any pattern and lenient:
>> +02 - allowed
>> +02:00 - allowed
>> +02:00:00 - allowed
>>
>> This covers pretty much anything a user needs when parsing.
>>
>> Stephen
>>
>>
>> On 26 February 2016 at 17:38, Roger Riggs <Roger.Riggs at oracle.com> wrote:
>>>
>>> Hi Stephen,
>>>
>>> Even in lenient mode the parser needs to stick to the fields provided in
>>> the
>>> pattern.
>>> If the caller intends to parse seconds, the pattern should include
>>> seconds.
>>> Otherwise the caller has not been able to specify their intent.
>>> That's consistent with lenient mode used in the other fields.
>>> Otherwise, the pattern is irrelevant except for whether it contains a ":"
>>> and makes
>>> the spec nearly useless.
>>>
>>> Roger
>>>
>>>
>>>
>>> On 2/26/2016 12:09 PM, Stephen Colebourne wrote:
>>>>
>>>> On 26 February 2016 at 15:00, Roger Riggs <Roger.Riggs at oracle.com>
>>>> wrote:
>>>>>
>>>>> Hi Stephen,
>>>>>
>>>>> It does not seem natural to me with a pattern of HHMM for it to parse
>>>>> more
>>>>> than 4 digits.
>>>>> I can see lenient modifying the behavior as it it were HHmm, but there
>>>>> is
>>>>> no
>>>>> indication in the pattern
>>>>> that seconds would be considered.  How it would it be implied from the
>>>>> spec?
>>>>
>>>> The spec is being expanded to define what happens. Previously it
>>>> didn't define it at all, and would throw an error.
>>>>
>>>> Lenient parsing typically accepts much more than the strict parsing.
>>>>
>>>> When parsing numbers, you may set the SignStyle to NEVER, but the sign
>>>> will still be parsed in lenient mode
>>>>
>>>> When parsing text, you may select the short output format, but any
>>>> length of text will be parsed in lenient mode.
>>>>
>>>> As such, it is very much in line with the behavour of the API to parse
>>>> a much broader format than the one requested in lenient mode. (None of
>>>> this affects strict mode).
>>>>
>>>> Stephen
>>>>
>>>>
>>>>> In the original issue, appendOffsetId is defined as using the +HH:MM:ss
>>>>> pattern and
>>>>> specific to ISO the MM should be allowed to be optional.   There is no
>>>>> question of parsing
>>>>> extra digits not included in the requested pattern.
>>>>>
>>>>> Separately, this is specifying the new lenient behavior of
>>>>> appendOffset(pattern, noffsetText).
>>>>> In that case, I don't think it will be understood that patterns
>>>>> 'shorter'
>>>>> than the input will
>>>>> gobble up extra digits and ':'s.
>>>>>
>>>>> Roger
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On 2/26/2016 9:42 AM, Stephen Colebourne wrote:
>>>>>
>>>>> Lenient can be however lenient we define it to be. Allowing minutes
>>>>> and seconds to be parsed when not specified in the pattern is the key
>>>>> part of the change. Whether the parser copes with both colons and
>>>>> no-colons is the choice at hand here. It seems to me that since the
>>>>> parser can easily handle figuring out whether the colon is present or
>>>>> not, we should just allow the parser to be fully lenient.
>>>>>
>>>>> Stephen
>>>>>
>>>>>
>>>>> On 26 February 2016 at 14:15, Roger Riggs <Roger.Riggs at oracle.com>
>>>>> wrote:
>>>>>
>>>>> HI Stephen,
>>>>>
>>>>> How lenient is lenient supposed to be? Looking at the offset test
>>>>> cases,
>>>>> it
>>>>> seems to allow  minutes
>>>>> and seconds digits to be parsed even if the pattern did not include
>>>>> them.
>>>>>
>>>>> + @DataProvider(name="lenientOffsetParseData")
>>>>> + Object[][] data_lenient_offset_parse() {
>>>>> + return new Object[][] {
>>>>> + {"+HH", "+01", 3600},
>>>>> + {"+HH", "+0101", 3660},
>>>>> + {"+HH", "+010101", 3661},
>>>>> + {"+HH", "+01", 3600},
>>>>> + {"+HH", "+01:01", 3660},
>>>>> + {"+HH", "+01:01:01", 3661},
>>>>>
>>>>> Thanks, Roger
>>>>>
>>>>>
>>>>>
>>>>> On 2/26/2016 6:16 AM, Stephen Colebourne wrote:
>>>>>
>>>>> I don't think this is quite right.
>>>>>
>>>>> if ((length > position + 3) && (text.charAt(position + 3) == ':')) {
>>>>>       parseType = 10;
>>>>> }
>>>>>
>>>>> This code will *always* select type 10 (colons) if a colon is found at
>>>>> position+3. Whereas the spec now says that it should only do this if
>>>>> the pattern is "HH". For other patterns, the colon/no-colon choice is
>>>>> defined to be based on the pattern.
>>>>>
>>>>> That said, I'm thinking it is better to make the spec more lenient to
>>>>> match the behaviour as implemented:
>>>>>
>>>>>
>>>>> When parsing in lenient mode, only the hours are mandatory - minutes
>>>>> and seconds are optional. If the character after the hour digits is a
>>>>> colon
>>>>> then the parser will parse using the pattern "HH:mm:ss", otherwise the
>>>>> parser will parse using the pattern "HHmmss".
>>>>>
>>>>>
>>>>> Additional TCKDateTimeFormatterBuilder tests will be needed to
>>>>> demonstrate the above. There should also be a test for data following
>>>>> the lenient parse. The following should all succeed:
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> DateTimeFormatterBuilder().parseLenient().appendOffset("HH:MM").appendZoneId();
>>>>> "+01:00Europe/London"
>>>>> "+0100Europe/London"
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> DateTimeFormatterBuilder().parseLenient().appendOffset("HH:MM").appendLiteral(":").appendZoneId();
>>>>> "+01:Europe/London"
>>>>>
>>>>> Note this special case, where the colon affects the parse type, but is
>>>>> not ultimately part of the offset, thus it is left to match the
>>>>> appendLiteral(":")
>>>>>
>>>>> You may want to think of some additional nasty edge cases!
>>>>>
>>>>> Stephen
>>>>>
>>>>> On 25 February 2016 at 15:44, nadeesh tv <nadeesh.tv at oracle.com> wrote:
>>>>>
>>>>> Hi all,
>>>>> Please see the updated webrev
>>>>> http://cr.openjdk.java.net/~ntv/8032051/webrev.02/
>>>>>
>>>>> Thanks and Regards,
>>>>> Nadeesh
>>>>>
>>>>> On 2/23/2016 5:17 PM, Stephen Colebourne wrote:
>>>>>
>>>>> Thanks for the changes.
>>>>>
>>>>> In `DateTimeFormatter`, the code should be
>>>>>
>>>>> .parseLenient()
>>>>> .appendOffsetId()
>>>>> .parseStrict()
>>>>>
>>>>> and the same in the other case. This ensures that existing callers who
>>>>> then embed the formatter in another formatter (like the
>>>>> ZONED_DATE_TIME constant) are unaffected.
>>>>>
>>>>>
>>>>> The logic for lenient parsing does not look right as it only handles
>>>>> types 5 and 6. This table shows the mappings needed:
>>>>>
>>>>> "+HH",  -> "+HHmmss" or "+HH:mm:ss"
>>>>> "+HHmm",  -> "+HHmmss",
>>>>> "+HH:mm",  -> "+HH:mm:ss",
>>>>> "+HHMM",  -> "+HHmmss",
>>>>> "+HH:MM",  -> "+HH:mm:ss",
>>>>> "+HHMMss",  -> "+HHmmss",
>>>>> "+HH:MM:ss",  -> "+HH:mm:ss",
>>>>> "+HHMMSS",  -> "+HHmmss",
>>>>> "+HH:MM:SS",  -> "+HH:mm:ss",
>>>>> "+HHmmss",
>>>>> "+HH:mm:ss",
>>>>>
>>>>> Note that the "+HH" pattern is a special case, as we don't know
>>>>> whether to use the colon or non-colon pattern. Whether to require
>>>>> colon or not is based on whether the next character after the HH is a
>>>>> colon or not.
>>>>>
>>>>> Proposed appendOffsetId() Javadoc:
>>>>>
>>>>> * Appends the zone offset, such as '+01:00', to the formatter.
>>>>> * <p>
>>>>> * This appends an instruction to format/parse the offset ID to the
>>>>> builder.
>>>>> * This is equivalent to calling {@code appendOffset("+HH:MM:ss", "Z")}.
>>>>> * See {@link #appendOffset(String, String)} for details on formatting
>>>>> and parsing.
>>>>>
>>>>> Proposed appendOffset(String, String) Javadoc:
>>>>>
>>>>> * During parsing, the offset...
>>>>>
>>>>> changed to:
>>>>>
>>>>> * When parsing in strict mode, the input must contain the mandatory
>>>>> and optional elements are defined by the specified pattern.
>>>>> * If the offset cannot be parsed then an exception is thrown unless
>>>>> the section of the formatter is optional.
>>>>> * <p>
>>>>> * When parsing in lenient mode, only the hours are mandatory - minutes
>>>>> and seconds are optional.
>>>>> * The colons are required if the specified pattern contains a colon.
>>>>> * If the specified pattern is "+HH", the presence of colons is
>>>>> determined by whether the character after the hour digits is a colon
>>>>> or not.
>>>>> * If the offset cannot be parsed then an exception is thrown unless
>>>>> the section of the formatter is optional.
>>>>>
>>>>> thanks and sorry for delay
>>>>> Stephen
>>>>>
>>>>>
>>>>>
>>>>> On 11 February 2016 at 20:22, nadeesh tv <nadeesh.tv at oracle.com> wrote:
>>>>>
>>>>> Hi all,
>>>>>
>>>>> Please review a fix for
>>>>>
>>>>> Bug Id  https://bugs.openjdk.java.net/browse/JDK-8032051
>>>>>
>>>>> webrev http://cr.openjdk.java.net/~ntv/8032051/webrev.01/
>>>>>
>>>>> --
>>>>> Thanks and Regards,
>>>>> Nadeesh TV
>>>>>
>>>>> --
>>>>> Thanks and Regards,
>>>>> Nadeesh TV
>>>>>
>>>>>
>