RFR:JDK-8032051:"ZonedDateTime" class "parse" method fails with short time zone offset ("+01")

Thu Mar 3 20:59:55 UTC 2016

Hi Nadeesh,

Looks good.

Thanks, Roger

On 3/3/2016 1:37 PM, nadeesh tv wrote:
> Hi,
>
> Stephen, Roger Thanks for the comments.
>
> Please see the updated webrev 
> http://cr.openjdk.java.net/~ntv/8032051/webrev.04/
>
>
> Regards,
> Nadeesh
>
>
> On 3/1/2016 12:29 AM, Stephen Colebourne wrote:
>> I'm happy to go back to the spec I proposed before. That spec would
>> determine colons dynamically only for pattern HH. Otherwise, it would
>> use the presence/absence of a colon in the pattern as the signal. That
>> would deal with the ISO-8601 problem and resolve the original issue
>> (as ISO_OFFSET_DATE_TIME uses HH:MM:ss, which would leniently parse
>> using colons).
>>
>> Writing the spec wording is not easy however. I had:
>>
>> When parsing in lenient mode, only the hours are mandatory - minutes
>> and seconds are optional. The colons are required if the specified
>> pattern contains a colon. If the specified pattern is "+HH", the
>> presence of colons is determined by whether the character after the
>> hour digits is a colon or not. If the offset cannot be parsed then an
>> exception is thrown unless the section of the formatter is optional.
>>
>> which isn't too bad but alternatives are possible.
>>
>> Stephen
>>
>>
>>
>>
>> On 29 February 2016 at 15:52, Roger Riggs <Roger.Riggs at oracle.com> 
>> wrote:
>>> Hi Stephen,
>>>
>>> As a fix for the original issue[1], not correctly parsing a ISO defined
>>> offset, the use of lenient
>>> was a convenient implementation technique (hack).  But with the 
>>> expanded
>>> definition of lenient,
>>> it will allow many forms of the offset that are not allowed by the ISO
>>> specification
>>> and should not be accepted forDateTimeFormatter. ISO_OFFSET_DATE_TIME.
>>> In particular, ISO requires the ":" to separate the minutes.
>>> I'm not sure how to correctly fix the original issue with the new
>>> specification of lenient offset
>>> parsing without introducing some more specific implementation 
>>> information.
>>>
>>>
>>> WRT the lenient parsing mode for appendOffset:
>>>
>>> I was considering that the subfields of the offset were to be treated
>>> leniently but it seems
>>> you were treating the entire offset field and text as the unit to be 
>>> treated
>>> leniently.
>>> The spec for lenient parsing would be clearer if it were specified as
>>> allowing any
>>> of the patterns of appendOffset.  The current wording around the 
>>> character
>>> after the hour
>>> may be confusing.
>>>
>>> In the specification of appendOffset(pattern, noOffsetText) how about:
>>>
>>> "When parsing in lenient mode, the longest valid pattern that 
>>> matches the
>>> input is used. Only the hours are mandatory, minutes  and seconds are
>>> optional."
>>>
>>> Roger
>>>
>>>
>>> [1] https://bugs.openjdk.java.net/browse/JDK-8032051
>>>
>>>
>>>
>>>
>>>
>>> On 2/26/2016 1:10 PM, Stephen Colebourne wrote:
>>>> It is important to also consider the case where the user wants to
>>>> format using HH:MM but parse seconds if they are provided.
>>>>
>>>> As I said above, this is no different to SignStyle, where the user
>>>> requests something specific on format, but accepts anything on input.
>>>>
>>>> The pattern is still used for formatting and strict parsing under
>>>> these changes. It is effectively ignored in lenient parsing (which is
>>>> the very definition of leniency).
>>>>
>>>> Another way to look at it:
>>>>
>>>> using a pattern of HH:MM and strict:
>>>> +02 - disallowed
>>>> +02:00 - allowed
>>>> +02:00:00 - disallowed
>>>>
>>>> using a pattern of HH:mm and strict:
>>>> +02 - allowed
>>>> +02:00 - allowed
>>>> +02:00:00 - disallowed
>>>>
>>>> using any pattern and lenient:
>>>> +02 - allowed
>>>> +02:00 - allowed
>>>> +02:00:00 - allowed
>>>>
>>>> This covers pretty much anything a user needs when parsing.
>>>>
>>>> Stephen
>>>>
>>>>
>>>> On 26 February 2016 at 17:38, Roger Riggs <Roger.Riggs at oracle.com> 
>>>> wrote:
>>>>> Hi Stephen,
>>>>>
>>>>> Even in lenient mode the parser needs to stick to the fields 
>>>>> provided in
>>>>> the
>>>>> pattern.
>>>>> If the caller intends to parse seconds, the pattern should include
>>>>> seconds.
>>>>> Otherwise the caller has not been able to specify their intent.
>>>>> That's consistent with lenient mode used in the other fields.
>>>>> Otherwise, the pattern is irrelevant except for whether it 
>>>>> contains a ":"
>>>>> and makes
>>>>> the spec nearly useless.
>>>>>
>>>>> Roger
>>>>>
>>>>>
>>>>>
>>>>> On 2/26/2016 12:09 PM, Stephen Colebourne wrote:
>>>>>> On 26 February 2016 at 15:00, Roger Riggs <Roger.Riggs at oracle.com>
>>>>>> wrote:
>>>>>>> Hi Stephen,
>>>>>>>
>>>>>>> It does not seem natural to me with a pattern of HHMM for it to 
>>>>>>> parse
>>>>>>> more
>>>>>>> than 4 digits.
>>>>>>> I can see lenient modifying the behavior as it it were HHmm, but 
>>>>>>> there
>>>>>>> is
>>>>>>> no
>>>>>>> indication in the pattern
>>>>>>> that seconds would be considered.  How it would it be implied 
>>>>>>> from the
>>>>>>> spec?
>>>>>> The spec is being expanded to define what happens. Previously it
>>>>>> didn't define it at all, and would throw an error.
>>>>>>
>>>>>> Lenient parsing typically accepts much more than the strict parsing.
>>>>>>
>>>>>> When parsing numbers, you may set the SignStyle to NEVER, but the 
>>>>>> sign
>>>>>> will still be parsed in lenient mode
>>>>>>
>>>>>> When parsing text, you may select the short output format, but any
>>>>>> length of text will be parsed in lenient mode.
>>>>>>
>>>>>> As such, it is very much in line with the behavour of the API to 
>>>>>> parse
>>>>>> a much broader format than the one requested in lenient mode. 
>>>>>> (None of
>>>>>> this affects strict mode).
>>>>>>
>>>>>> Stephen
>>>>>>
>>>>>>
>>>>>>> In the original issue, appendOffsetId is defined as using the 
>>>>>>> +HH:MM:ss
>>>>>>> pattern and
>>>>>>> specific to ISO the MM should be allowed to be optional.   There 
>>>>>>> is no
>>>>>>> question of parsing
>>>>>>> extra digits not included in the requested pattern.
>>>>>>>
>>>>>>> Separately, this is specifying the new lenient behavior of
>>>>>>> appendOffset(pattern, noffsetText).
>>>>>>> In that case, I don't think it will be understood that patterns
>>>>>>> 'shorter'
>>>>>>> than the input will
>>>>>>> gobble up extra digits and ':'s.
>>>>>>>
>>>>>>> Roger
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 2/26/2016 9:42 AM, Stephen Colebourne wrote:
>>>>>>>
>>>>>>> Lenient can be however lenient we define it to be. Allowing minutes
>>>>>>> and seconds to be parsed when not specified in the pattern is 
>>>>>>> the key
>>>>>>> part of the change. Whether the parser copes with both colons and
>>>>>>> no-colons is the choice at hand here. It seems to me that since the
>>>>>>> parser can easily handle figuring out whether the colon is 
>>>>>>> present or
>>>>>>> not, we should just allow the parser to be fully lenient.
>>>>>>>
>>>>>>> Stephen
>>>>>>>
>>>>>>>
>>>>>>> On 26 February 2016 at 14:15, Roger Riggs <Roger.Riggs at oracle.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>> HI Stephen,
>>>>>>>
>>>>>>> How lenient is lenient supposed to be? Looking at the offset test
>>>>>>> cases,
>>>>>>> it
>>>>>>> seems to allow  minutes
>>>>>>> and seconds digits to be parsed even if the pattern did not include
>>>>>>> them.
>>>>>>>
>>>>>>> + @DataProvider(name="lenientOffsetParseData")
>>>>>>> + Object[][] data_lenient_offset_parse() {
>>>>>>> + return new Object[][] {
>>>>>>> + {"+HH", "+01", 3600},
>>>>>>> + {"+HH", "+0101", 3660},
>>>>>>> + {"+HH", "+010101", 3661},
>>>>>>> + {"+HH", "+01", 3600},
>>>>>>> + {"+HH", "+01:01", 3660},
>>>>>>> + {"+HH", "+01:01:01", 3661},
>>>>>>>
>>>>>>> Thanks, Roger
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 2/26/2016 6:16 AM, Stephen Colebourne wrote:
>>>>>>>
>>>>>>> I don't think this is quite right.
>>>>>>>
>>>>>>> if ((length > position + 3) && (text.charAt(position + 3) == 
>>>>>>> ':')) {
>>>>>>>        parseType = 10;
>>>>>>> }
>>>>>>>
>>>>>>> This code will *always* select type 10 (colons) if a colon is 
>>>>>>> found at
>>>>>>> position+3. Whereas the spec now says that it should only do 
>>>>>>> this if
>>>>>>> the pattern is "HH". For other patterns, the colon/no-colon 
>>>>>>> choice is
>>>>>>> defined to be based on the pattern.
>>>>>>>
>>>>>>> That said, I'm thinking it is better to make the spec more 
>>>>>>> lenient to
>>>>>>> match the behaviour as implemented:
>>>>>>>
>>>>>>>
>>>>>>> When parsing in lenient mode, only the hours are mandatory - 
>>>>>>> minutes
>>>>>>> and seconds are optional. If the character after the hour digits 
>>>>>>> is a
>>>>>>> colon
>>>>>>> then the parser will parse using the pattern "HH:mm:ss", 
>>>>>>> otherwise the
>>>>>>> parser will parse using the pattern "HHmmss".
>>>>>>>
>>>>>>>
>>>>>>> Additional TCKDateTimeFormatterBuilder tests will be needed to
>>>>>>> demonstrate the above. There should also be a test for data 
>>>>>>> following
>>>>>>> the lenient parse. The following should all succeed:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> DateTimeFormatterBuilder().parseLenient().appendOffset("HH:MM").appendZoneId(); 
>>>>>>>
>>>>>>> "+01:00Europe/London"
>>>>>>> "+0100Europe/London"
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> DateTimeFormatterBuilder().parseLenient().appendOffset("HH:MM").appendLiteral(":").appendZoneId(); 
>>>>>>>
>>>>>>> "+01:Europe/London"
>>>>>>>
>>>>>>> Note this special case, where the colon affects the parse type, 
>>>>>>> but is
>>>>>>> not ultimately part of the offset, thus it is left to match the
>>>>>>> appendLiteral(":")
>>>>>>>
>>>>>>> You may want to think of some additional nasty edge cases!
>>>>>>>
>>>>>>> Stephen
>>>>>>>
>>>>>>> On 25 February 2016 at 15:44, nadeesh tv <nadeesh.tv at oracle.com> 
>>>>>>> wrote:
>>>>>>>
>>>>>>> Hi all,
>>>>>>> Please see the updated webrev
>>>>>>> http://cr.openjdk.java.net/~ntv/8032051/webrev.02/
>>>>>>>
>>>>>>> Thanks and Regards,
>>>>>>> Nadeesh
>>>>>>>
>>>>>>> On 2/23/2016 5:17 PM, Stephen Colebourne wrote:
>>>>>>>
>>>>>>> Thanks for the changes.
>>>>>>>
>>>>>>> In `DateTimeFormatter`, the code should be
>>>>>>>
>>>>>>> .parseLenient()
>>>>>>> .appendOffsetId()
>>>>>>> .parseStrict()
>>>>>>>
>>>>>>> and the same in the other case. This ensures that existing 
>>>>>>> callers who
>>>>>>> then embed the formatter in another formatter (like the
>>>>>>> ZONED_DATE_TIME constant) are unaffected.
>>>>>>>
>>>>>>>
>>>>>>> The logic for lenient parsing does not look right as it only 
>>>>>>> handles
>>>>>>> types 5 and 6. This table shows the mappings needed:
>>>>>>>
>>>>>>> "+HH",  -> "+HHmmss" or "+HH:mm:ss"
>>>>>>> "+HHmm",  -> "+HHmmss",
>>>>>>> "+HH:mm",  -> "+HH:mm:ss",
>>>>>>> "+HHMM",  -> "+HHmmss",
>>>>>>> "+HH:MM",  -> "+HH:mm:ss",
>>>>>>> "+HHMMss",  -> "+HHmmss",
>>>>>>> "+HH:MM:ss",  -> "+HH:mm:ss",
>>>>>>> "+HHMMSS",  -> "+HHmmss",
>>>>>>> "+HH:MM:SS",  -> "+HH:mm:ss",
>>>>>>> "+HHmmss",
>>>>>>> "+HH:mm:ss",
>>>>>>>
>>>>>>> Note that the "+HH" pattern is a special case, as we don't know
>>>>>>> whether to use the colon or non-colon pattern. Whether to require
>>>>>>> colon or not is based on whether the next character after the HH 
>>>>>>> is a
>>>>>>> colon or not.
>>>>>>>
>>>>>>> Proposed appendOffsetId() Javadoc:
>>>>>>>
>>>>>>> * Appends the zone offset, such as '+01:00', to the formatter.
>>>>>>> * <p>
>>>>>>> * This appends an instruction to format/parse the offset ID to the
>>>>>>> builder.
>>>>>>> * This is equivalent to calling {@code appendOffset("+HH:MM:ss", 
>>>>>>> "Z")}.
>>>>>>> * See {@link #appendOffset(String, String)} for details on 
>>>>>>> formatting
>>>>>>> and parsing.
>>>>>>>
>>>>>>> Proposed appendOffset(String, String) Javadoc:
>>>>>>>
>>>>>>> * During parsing, the offset...
>>>>>>>
>>>>>>> changed to:
>>>>>>>
>>>>>>> * When parsing in strict mode, the input must contain the mandatory
>>>>>>> and optional elements are defined by the specified pattern.
>>>>>>> * If the offset cannot be parsed then an exception is thrown unless
>>>>>>> the section of the formatter is optional.
>>>>>>> * <p>
>>>>>>> * When parsing in lenient mode, only the hours are mandatory - 
>>>>>>> minutes
>>>>>>> and seconds are optional.
>>>>>>> * The colons are required if the specified pattern contains a 
>>>>>>> colon.
>>>>>>> * If the specified pattern is "+HH", the presence of colons is
>>>>>>> determined by whether the character after the hour digits is a 
>>>>>>> colon
>>>>>>> or not.
>>>>>>> * If the offset cannot be parsed then an exception is thrown unless
>>>>>>> the section of the formatter is optional.
>>>>>>>
>>>>>>> thanks and sorry for delay
>>>>>>> Stephen
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 11 February 2016 at 20:22, nadeesh tv <nadeesh.tv at oracle.com> 
>>>>>>> wrote:
>>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> Please review a fix for
>>>>>>>
>>>>>>> Bug Id https://bugs.openjdk.java.net/browse/JDK-8032051
>>>>>>>
>>>>>>> webrev http://cr.openjdk.java.net/~ntv/8032051/webrev.01/
>>>>>>>
>>>>>>> -- 
>>>>>>> Thanks and Regards,
>>>>>>> Nadeesh TV
>>>>>>>
>>>>>>> -- 
>>>>>>> Thanks and Regards,
>>>>>>> Nadeesh TV
>>>>>>>
>>>>>>>
>