RFR:JDK-8032051:"ZonedDateTime" class "parse" method fails with short time zone offset ("+01")
Hi all, Please review a fix for Bug Id https://bugs.openjdk.java.net/browse/JDK-8032051 webrev http://cr.openjdk.java.net/~ntv/8032051/webrev.01/ -- Thanks and Regards, Nadeesh TV
Gentle Reminder On 2/12/2016 1:52 AM, nadeesh tv wrote:
Hi all,
Please review a fix for
-- Thanks and Regards, Nadeesh TV
Hi Nadeesh, Sorry for the delay. DateTimeFrmatterBuilder.java: 3387: avoid testing isStrict twice. - refactor the if's so there is an outer if for context.isStrict == false and inner if's (or tertiary ?: assignment) for the replacement parseType. if (context.isStrict() == false) { if ( type == 6) { parseType = 10; } else if (type == 5) { parseType = 9; } } Otherwise looks fine. Thanks Roger On 2/22/2016 3:50 AM, nadeesh tv wrote:
Gentle Reminder On 2/12/2016 1:52 AM, nadeesh tv wrote:
Hi all,
Please review a fix for
Thanks for the changes. In `DateTimeFormatter`, the code should be .parseLenient() .appendOffsetId() .parseStrict() and the same in the other case. This ensures that existing callers who then embed the formatter in another formatter (like the ZONED_DATE_TIME constant) are unaffected. The logic for lenient parsing does not look right as it only handles types 5 and 6. This table shows the mappings needed: "+HH", -> "+HHmmss" or "+HH:mm:ss" "+HHmm", -> "+HHmmss", "+HH:mm", -> "+HH:mm:ss", "+HHMM", -> "+HHmmss", "+HH:MM", -> "+HH:mm:ss", "+HHMMss", -> "+HHmmss", "+HH:MM:ss", -> "+HH:mm:ss", "+HHMMSS", -> "+HHmmss", "+HH:MM:SS", -> "+HH:mm:ss", "+HHmmss", "+HH:mm:ss", Note that the "+HH" pattern is a special case, as we don't know whether to use the colon or non-colon pattern. Whether to require colon or not is based on whether the next character after the HH is a colon or not. Proposed appendOffsetId() Javadoc: * Appends the zone offset, such as '+01:00', to the formatter. * <p> * This appends an instruction to format/parse the offset ID to the builder. * This is equivalent to calling {@code appendOffset("+HH:MM:ss", "Z")}. * See {@link #appendOffset(String, String)} for details on formatting and parsing. Proposed appendOffset(String, String) Javadoc: * During parsing, the offset... changed to: * When parsing in strict mode, the input must contain the mandatory and optional elements are defined by the specified pattern. * If the offset cannot be parsed then an exception is thrown unless the section of the formatter is optional. * <p> * When parsing in lenient mode, only the hours are mandatory - minutes and seconds are optional. * The colons are required if the specified pattern contains a colon. * If the specified pattern is "+HH", the presence of colons is determined by whether the character after the hour digits is a colon or not. * If the offset cannot be parsed then an exception is thrown unless the section of the formatter is optional. thanks and sorry for delay Stephen On 11 February 2016 at 20:22, nadeesh tv <nadeesh.tv@oracle.com> wrote:
Hi all,
Please review a fix for
Bug Id https://bugs.openjdk.java.net/browse/JDK-8032051
webrev http://cr.openjdk.java.net/~ntv/8032051/webrev.01/
-- Thanks and Regards, Nadeesh TV
Hi all, Please see the updated webrev http://cr.openjdk.java.net/~ntv/8032051/webrev.02/ Thanks and Regards, Nadeesh On 2/23/2016 5:17 PM, Stephen Colebourne wrote:
Thanks for the changes.
In `DateTimeFormatter`, the code should be
.parseLenient() .appendOffsetId() .parseStrict()
and the same in the other case. This ensures that existing callers who then embed the formatter in another formatter (like the ZONED_DATE_TIME constant) are unaffected.
The logic for lenient parsing does not look right as it only handles types 5 and 6. This table shows the mappings needed:
"+HH", -> "+HHmmss" or "+HH:mm:ss" "+HHmm", -> "+HHmmss", "+HH:mm", -> "+HH:mm:ss", "+HHMM", -> "+HHmmss", "+HH:MM", -> "+HH:mm:ss", "+HHMMss", -> "+HHmmss", "+HH:MM:ss", -> "+HH:mm:ss", "+HHMMSS", -> "+HHmmss", "+HH:MM:SS", -> "+HH:mm:ss", "+HHmmss", "+HH:mm:ss",
Note that the "+HH" pattern is a special case, as we don't know whether to use the colon or non-colon pattern. Whether to require colon or not is based on whether the next character after the HH is a colon or not.
Proposed appendOffsetId() Javadoc:
* Appends the zone offset, such as '+01:00', to the formatter. * <p> * This appends an instruction to format/parse the offset ID to the builder. * This is equivalent to calling {@code appendOffset("+HH:MM:ss", "Z")}. * See {@link #appendOffset(String, String)} for details on formatting and parsing.
Proposed appendOffset(String, String) Javadoc:
* During parsing, the offset...
changed to:
* When parsing in strict mode, the input must contain the mandatory and optional elements are defined by the specified pattern. * If the offset cannot be parsed then an exception is thrown unless the section of the formatter is optional. * <p> * When parsing in lenient mode, only the hours are mandatory - minutes and seconds are optional. * The colons are required if the specified pattern contains a colon. * If the specified pattern is "+HH", the presence of colons is determined by whether the character after the hour digits is a colon or not. * If the offset cannot be parsed then an exception is thrown unless the section of the formatter is optional.
thanks and sorry for delay Stephen
On 11 February 2016 at 20:22, nadeesh tv <nadeesh.tv@oracle.com> wrote:
Hi all,
Please review a fix for
Bug Id https://bugs.openjdk.java.net/browse/JDK-8032051
webrev http://cr.openjdk.java.net/~ntv/8032051/webrev.01/
-- Thanks and Regards, Nadeesh TV
-- Thanks and Regards, Nadeesh TV
I don't think this is quite right. if ((length > position + 3) && (text.charAt(position + 3) == ':')) { parseType = 10; } This code will *always* select type 10 (colons) if a colon is found at position+3. Whereas the spec now says that it should only do this if the pattern is "HH". For other patterns, the colon/no-colon choice is defined to be based on the pattern. That said, I'm thinking it is better to make the spec more lenient to match the behaviour as implemented: When parsing in lenient mode, only the hours are mandatory - minutes and seconds are optional. If the character after the hour digits is a colon then the parser will parse using the pattern "HH:mm:ss", otherwise the parser will parse using the pattern "HHmmss". Additional TCKDateTimeFormatterBuilder tests will be needed to demonstrate the above. There should also be a test for data following the lenient parse. The following should all succeed: DateTimeFormatterBuilder().parseLenient().appendOffset("HH:MM").appendZoneId(); "+01:00Europe/London" "+0100Europe/London" DateTimeFormatterBuilder().parseLenient().appendOffset("HH:MM").appendLiteral(":").appendZoneId(); "+01:Europe/London" Note this special case, where the colon affects the parse type, but is not ultimately part of the offset, thus it is left to match the appendLiteral(":") You may want to think of some additional nasty edge cases! Stephen On 25 February 2016 at 15:44, nadeesh tv <nadeesh.tv@oracle.com> wrote:
Hi all, Please see the updated webrev http://cr.openjdk.java.net/~ntv/8032051/webrev.02/
Thanks and Regards, Nadeesh
On 2/23/2016 5:17 PM, Stephen Colebourne wrote:
Thanks for the changes.
In `DateTimeFormatter`, the code should be
.parseLenient() .appendOffsetId() .parseStrict()
and the same in the other case. This ensures that existing callers who then embed the formatter in another formatter (like the ZONED_DATE_TIME constant) are unaffected.
The logic for lenient parsing does not look right as it only handles types 5 and 6. This table shows the mappings needed:
"+HH", -> "+HHmmss" or "+HH:mm:ss" "+HHmm", -> "+HHmmss", "+HH:mm", -> "+HH:mm:ss", "+HHMM", -> "+HHmmss", "+HH:MM", -> "+HH:mm:ss", "+HHMMss", -> "+HHmmss", "+HH:MM:ss", -> "+HH:mm:ss", "+HHMMSS", -> "+HHmmss", "+HH:MM:SS", -> "+HH:mm:ss", "+HHmmss", "+HH:mm:ss",
Note that the "+HH" pattern is a special case, as we don't know whether to use the colon or non-colon pattern. Whether to require colon or not is based on whether the next character after the HH is a colon or not.
Proposed appendOffsetId() Javadoc:
* Appends the zone offset, such as '+01:00', to the formatter. * <p> * This appends an instruction to format/parse the offset ID to the builder. * This is equivalent to calling {@code appendOffset("+HH:MM:ss", "Z")}. * See {@link #appendOffset(String, String)} for details on formatting and parsing.
Proposed appendOffset(String, String) Javadoc:
* During parsing, the offset...
changed to:
* When parsing in strict mode, the input must contain the mandatory and optional elements are defined by the specified pattern. * If the offset cannot be parsed then an exception is thrown unless the section of the formatter is optional. * <p> * When parsing in lenient mode, only the hours are mandatory - minutes and seconds are optional. * The colons are required if the specified pattern contains a colon. * If the specified pattern is "+HH", the presence of colons is determined by whether the character after the hour digits is a colon or not. * If the offset cannot be parsed then an exception is thrown unless the section of the formatter is optional.
thanks and sorry for delay Stephen
On 11 February 2016 at 20:22, nadeesh tv <nadeesh.tv@oracle.com> wrote:
Hi all,
Please review a fix for
Bug Id https://bugs.openjdk.java.net/browse/JDK-8032051
webrev http://cr.openjdk.java.net/~ntv/8032051/webrev.01/
-- Thanks and Regards, Nadeesh TV
-- Thanks and Regards, Nadeesh TV
HI Stephen, How lenient is lenient supposed to be? Looking at the offset test cases, it seems to allow minutes and seconds digits to be parsed even if the pattern did not include them. + @DataProvider(name="lenientOffsetParseData") + Object[][] data_lenient_offset_parse() { + return new Object[][] { + {"+HH", "+01", 3600}, + {"+HH", "+0101", 3660}, + {"+HH", "+010101", 3661}, + {"+HH", "+01", 3600}, + {"+HH", "+01:01", 3660}, + {"+HH", "+01:01:01", 3661}, Thanks, Roger On 2/26/2016 6:16 AM, Stephen Colebourne wrote:
I don't think this is quite right.
if ((length > position + 3) && (text.charAt(position + 3) == ':')) { parseType = 10; }
This code will *always* select type 10 (colons) if a colon is found at position+3. Whereas the spec now says that it should only do this if the pattern is "HH". For other patterns, the colon/no-colon choice is defined to be based on the pattern.
That said, I'm thinking it is better to make the spec more lenient to match the behaviour as implemented:
When parsing in lenient mode, only the hours are mandatory - minutes and seconds are optional. If the character after the hour digits is a colon then the parser will parse using the pattern "HH:mm:ss", otherwise the parser will parse using the pattern "HHmmss".
Additional TCKDateTimeFormatterBuilder tests will be needed to demonstrate the above. There should also be a test for data following the lenient parse. The following should all succeed:
DateTimeFormatterBuilder().parseLenient().appendOffset("HH:MM").appendZoneId(); "+01:00Europe/London" "+0100Europe/London"
DateTimeFormatterBuilder().parseLenient().appendOffset("HH:MM").appendLiteral(":").appendZoneId(); "+01:Europe/London"
Note this special case, where the colon affects the parse type, but is not ultimately part of the offset, thus it is left to match the appendLiteral(":")
You may want to think of some additional nasty edge cases!
Stephen
On 25 February 2016 at 15:44, nadeesh tv <nadeesh.tv@oracle.com> wrote:
Hi all, Please see the updated webrev http://cr.openjdk.java.net/~ntv/8032051/webrev.02/
Thanks and Regards, Nadeesh
On 2/23/2016 5:17 PM, Stephen Colebourne wrote:
Thanks for the changes.
In `DateTimeFormatter`, the code should be
.parseLenient() .appendOffsetId() .parseStrict()
and the same in the other case. This ensures that existing callers who then embed the formatter in another formatter (like the ZONED_DATE_TIME constant) are unaffected.
The logic for lenient parsing does not look right as it only handles types 5 and 6. This table shows the mappings needed:
"+HH", -> "+HHmmss" or "+HH:mm:ss" "+HHmm", -> "+HHmmss", "+HH:mm", -> "+HH:mm:ss", "+HHMM", -> "+HHmmss", "+HH:MM", -> "+HH:mm:ss", "+HHMMss", -> "+HHmmss", "+HH:MM:ss", -> "+HH:mm:ss", "+HHMMSS", -> "+HHmmss", "+HH:MM:SS", -> "+HH:mm:ss", "+HHmmss", "+HH:mm:ss",
Note that the "+HH" pattern is a special case, as we don't know whether to use the colon or non-colon pattern. Whether to require colon or not is based on whether the next character after the HH is a colon or not.
Proposed appendOffsetId() Javadoc:
* Appends the zone offset, such as '+01:00', to the formatter. * <p> * This appends an instruction to format/parse the offset ID to the builder. * This is equivalent to calling {@code appendOffset("+HH:MM:ss", "Z")}. * See {@link #appendOffset(String, String)} for details on formatting and parsing.
Proposed appendOffset(String, String) Javadoc:
* During parsing, the offset...
changed to:
* When parsing in strict mode, the input must contain the mandatory and optional elements are defined by the specified pattern. * If the offset cannot be parsed then an exception is thrown unless the section of the formatter is optional. * <p> * When parsing in lenient mode, only the hours are mandatory - minutes and seconds are optional. * The colons are required if the specified pattern contains a colon. * If the specified pattern is "+HH", the presence of colons is determined by whether the character after the hour digits is a colon or not. * If the offset cannot be parsed then an exception is thrown unless the section of the formatter is optional.
thanks and sorry for delay Stephen
On 11 February 2016 at 20:22, nadeesh tv <nadeesh.tv@oracle.com> wrote:
Hi all,
Please review a fix for
Bug Id https://bugs.openjdk.java.net/browse/JDK-8032051
webrev http://cr.openjdk.java.net/~ntv/8032051/webrev.01/
-- Thanks and Regards, Nadeesh TV
-- Thanks and Regards, Nadeesh TV
Lenient can be however lenient we define it to be. Allowing minutes and seconds to be parsed when not specified in the pattern is the key part of the change. Whether the parser copes with both colons and no-colons is the choice at hand here. It seems to me that since the parser can easily handle figuring out whether the colon is present or not, we should just allow the parser to be fully lenient. Stephen On 26 February 2016 at 14:15, Roger Riggs <Roger.Riggs@oracle.com> wrote:
HI Stephen,
How lenient is lenient supposed to be? Looking at the offset test cases, it seems to allow minutes and seconds digits to be parsed even if the pattern did not include them.
+ @DataProvider(name="lenientOffsetParseData") + Object[][] data_lenient_offset_parse() { + return new Object[][] { + {"+HH", "+01", 3600}, + {"+HH", "+0101", 3660}, + {"+HH", "+010101", 3661}, + {"+HH", "+01", 3600}, + {"+HH", "+01:01", 3660}, + {"+HH", "+01:01:01", 3661},
Thanks, Roger
On 2/26/2016 6:16 AM, Stephen Colebourne wrote:
I don't think this is quite right.
if ((length > position + 3) && (text.charAt(position + 3) == ':')) { parseType = 10; }
This code will *always* select type 10 (colons) if a colon is found at position+3. Whereas the spec now says that it should only do this if the pattern is "HH". For other patterns, the colon/no-colon choice is defined to be based on the pattern.
That said, I'm thinking it is better to make the spec more lenient to match the behaviour as implemented:
When parsing in lenient mode, only the hours are mandatory - minutes and seconds are optional. If the character after the hour digits is a colon then the parser will parse using the pattern "HH:mm:ss", otherwise the parser will parse using the pattern "HHmmss".
Additional TCKDateTimeFormatterBuilder tests will be needed to demonstrate the above. There should also be a test for data following the lenient parse. The following should all succeed:
DateTimeFormatterBuilder().parseLenient().appendOffset("HH:MM").appendZoneId(); "+01:00Europe/London" "+0100Europe/London"
DateTimeFormatterBuilder().parseLenient().appendOffset("HH:MM").appendLiteral(":").appendZoneId(); "+01:Europe/London"
Note this special case, where the colon affects the parse type, but is not ultimately part of the offset, thus it is left to match the appendLiteral(":")
You may want to think of some additional nasty edge cases!
Stephen
On 25 February 2016 at 15:44, nadeesh tv <nadeesh.tv@oracle.com> wrote:
Hi all, Please see the updated webrev http://cr.openjdk.java.net/~ntv/8032051/webrev.02/
Thanks and Regards, Nadeesh
On 2/23/2016 5:17 PM, Stephen Colebourne wrote:
Thanks for the changes.
In `DateTimeFormatter`, the code should be
.parseLenient() .appendOffsetId() .parseStrict()
and the same in the other case. This ensures that existing callers who then embed the formatter in another formatter (like the ZONED_DATE_TIME constant) are unaffected.
The logic for lenient parsing does not look right as it only handles types 5 and 6. This table shows the mappings needed:
"+HH", -> "+HHmmss" or "+HH:mm:ss" "+HHmm", -> "+HHmmss", "+HH:mm", -> "+HH:mm:ss", "+HHMM", -> "+HHmmss", "+HH:MM", -> "+HH:mm:ss", "+HHMMss", -> "+HHmmss", "+HH:MM:ss", -> "+HH:mm:ss", "+HHMMSS", -> "+HHmmss", "+HH:MM:SS", -> "+HH:mm:ss", "+HHmmss", "+HH:mm:ss",
Note that the "+HH" pattern is a special case, as we don't know whether to use the colon or non-colon pattern. Whether to require colon or not is based on whether the next character after the HH is a colon or not.
Proposed appendOffsetId() Javadoc:
* Appends the zone offset, such as '+01:00', to the formatter. * <p> * This appends an instruction to format/parse the offset ID to the builder. * This is equivalent to calling {@code appendOffset("+HH:MM:ss", "Z")}. * See {@link #appendOffset(String, String)} for details on formatting and parsing.
Proposed appendOffset(String, String) Javadoc:
* During parsing, the offset...
changed to:
* When parsing in strict mode, the input must contain the mandatory and optional elements are defined by the specified pattern. * If the offset cannot be parsed then an exception is thrown unless the section of the formatter is optional. * <p> * When parsing in lenient mode, only the hours are mandatory - minutes and seconds are optional. * The colons are required if the specified pattern contains a colon. * If the specified pattern is "+HH", the presence of colons is determined by whether the character after the hour digits is a colon or not. * If the offset cannot be parsed then an exception is thrown unless the section of the formatter is optional.
thanks and sorry for delay Stephen
On 11 February 2016 at 20:22, nadeesh tv <nadeesh.tv@oracle.com> wrote:
Hi all,
Please review a fix for
Bug Id https://bugs.openjdk.java.net/browse/JDK-8032051
webrev http://cr.openjdk.java.net/~ntv/8032051/webrev.01/
-- Thanks and Regards, Nadeesh TV
-- Thanks and Regards, Nadeesh TV
Hi Stephen, It does not seem natural to me with a pattern of HHMM for it to parse more than 4 digits. I can see lenient modifying the behavior as it it were HHmm, but there is no indication in the pattern that seconds would be considered. How it would it be implied from the spec? In the original issue, appendOffsetId is defined as using the +HH:MM:ss pattern and specific to ISO the MM should be allowed to be optional. There is no question of parsing extra digits not included in the requested pattern. Separately, this is specifying the new lenient behavior of appendOffset(pattern, noffsetText). In that case, I don't think it will be understood that patterns 'shorter' than the input will gobble up extra digits and ':'s. Roger On 2/26/2016 9:42 AM, Stephen Colebourne wrote:
Lenient can be however lenient we define it to be. Allowing minutes and seconds to be parsed when not specified in the pattern is the key part of the change. Whether the parser copes with both colons and no-colons is the choice at hand here. It seems to me that since the parser can easily handle figuring out whether the colon is present or not, we should just allow the parser to be fully lenient.
Stephen
On 26 February 2016 at 14:15, Roger Riggs <Roger.Riggs@oracle.com> wrote:
HI Stephen,
How lenient is lenient supposed to be? Looking at the offset test cases, it seems to allow minutes and seconds digits to be parsed even if the pattern did not include them.
+ @DataProvider(name="lenientOffsetParseData") + Object[][] data_lenient_offset_parse() { + return new Object[][] { + {"+HH", "+01", 3600}, + {"+HH", "+0101", 3660}, + {"+HH", "+010101", 3661}, + {"+HH", "+01", 3600}, + {"+HH", "+01:01", 3660}, + {"+HH", "+01:01:01", 3661},
Thanks, Roger
On 2/26/2016 6:16 AM, Stephen Colebourne wrote:
I don't think this is quite right.
if ((length > position + 3) && (text.charAt(position + 3) == ':')) { parseType = 10; }
This code will *always* select type 10 (colons) if a colon is found at position+3. Whereas the spec now says that it should only do this if the pattern is "HH". For other patterns, the colon/no-colon choice is defined to be based on the pattern.
That said, I'm thinking it is better to make the spec more lenient to match the behaviour as implemented:
When parsing in lenient mode, only the hours are mandatory - minutes and seconds are optional. If the character after the hour digits is a colon then the parser will parse using the pattern "HH:mm:ss", otherwise the parser will parse using the pattern "HHmmss".
Additional TCKDateTimeFormatterBuilder tests will be needed to demonstrate the above. There should also be a test for data following the lenient parse. The following should all succeed:
DateTimeFormatterBuilder().parseLenient().appendOffset("HH:MM").appendZoneId(); "+01:00Europe/London" "+0100Europe/London"
DateTimeFormatterBuilder().parseLenient().appendOffset("HH:MM").appendLiteral(":").appendZoneId(); "+01:Europe/London"
Note this special case, where the colon affects the parse type, but is not ultimately part of the offset, thus it is left to match the appendLiteral(":")
You may want to think of some additional nasty edge cases!
Stephen
On 25 February 2016 at 15:44, nadeesh tv <nadeesh.tv@oracle.com> wrote:
Hi all, Please see the updated webrev http://cr.openjdk.java.net/~ntv/8032051/webrev.02/
Thanks and Regards, Nadeesh
On 2/23/2016 5:17 PM, Stephen Colebourne wrote:
Thanks for the changes.
In `DateTimeFormatter`, the code should be
.parseLenient() .appendOffsetId() .parseStrict()
and the same in the other case. This ensures that existing callers who then embed the formatter in another formatter (like the ZONED_DATE_TIME constant) are unaffected.
The logic for lenient parsing does not look right as it only handles types 5 and 6. This table shows the mappings needed:
"+HH", -> "+HHmmss" or "+HH:mm:ss" "+HHmm", -> "+HHmmss", "+HH:mm", -> "+HH:mm:ss", "+HHMM", -> "+HHmmss", "+HH:MM", -> "+HH:mm:ss", "+HHMMss", -> "+HHmmss", "+HH:MM:ss", -> "+HH:mm:ss", "+HHMMSS", -> "+HHmmss", "+HH:MM:SS", -> "+HH:mm:ss", "+HHmmss", "+HH:mm:ss",
Note that the "+HH" pattern is a special case, as we don't know whether to use the colon or non-colon pattern. Whether to require colon or not is based on whether the next character after the HH is a colon or not.
Proposed appendOffsetId() Javadoc:
* Appends the zone offset, such as '+01:00', to the formatter. * <p> * This appends an instruction to format/parse the offset ID to the builder. * This is equivalent to calling {@code appendOffset("+HH:MM:ss", "Z")}. * See {@link #appendOffset(String, String)} for details on formatting and parsing.
Proposed appendOffset(String, String) Javadoc:
* During parsing, the offset...
changed to:
* When parsing in strict mode, the input must contain the mandatory and optional elements are defined by the specified pattern. * If the offset cannot be parsed then an exception is thrown unless the section of the formatter is optional. * <p> * When parsing in lenient mode, only the hours are mandatory - minutes and seconds are optional. * The colons are required if the specified pattern contains a colon. * If the specified pattern is "+HH", the presence of colons is determined by whether the character after the hour digits is a colon or not. * If the offset cannot be parsed then an exception is thrown unless the section of the formatter is optional.
thanks and sorry for delay Stephen
On 11 February 2016 at 20:22, nadeesh tv <nadeesh.tv@oracle.com> wrote:
Hi all,
Please review a fix for
Bug Id https://bugs.openjdk.java.net/browse/JDK-8032051
webrev http://cr.openjdk.java.net/~ntv/8032051/webrev.01/
-- Thanks and Regards, Nadeesh TV
-- Thanks and Regards, Nadeesh TV
On 26 February 2016 at 15:00, Roger Riggs <Roger.Riggs@oracle.com> wrote:
Hi Stephen,
It does not seem natural to me with a pattern of HHMM for it to parse more than 4 digits. I can see lenient modifying the behavior as it it were HHmm, but there is no indication in the pattern that seconds would be considered. How it would it be implied from the spec?
The spec is being expanded to define what happens. Previously it didn't define it at all, and would throw an error. Lenient parsing typically accepts much more than the strict parsing. When parsing numbers, you may set the SignStyle to NEVER, but the sign will still be parsed in lenient mode When parsing text, you may select the short output format, but any length of text will be parsed in lenient mode. As such, it is very much in line with the behavour of the API to parse a much broader format than the one requested in lenient mode. (None of this affects strict mode). Stephen
In the original issue, appendOffsetId is defined as using the +HH:MM:ss pattern and specific to ISO the MM should be allowed to be optional. There is no question of parsing extra digits not included in the requested pattern.
Separately, this is specifying the new lenient behavior of appendOffset(pattern, noffsetText). In that case, I don't think it will be understood that patterns 'shorter' than the input will gobble up extra digits and ':'s.
Roger
On 2/26/2016 9:42 AM, Stephen Colebourne wrote:
Lenient can be however lenient we define it to be. Allowing minutes and seconds to be parsed when not specified in the pattern is the key part of the change. Whether the parser copes with both colons and no-colons is the choice at hand here. It seems to me that since the parser can easily handle figuring out whether the colon is present or not, we should just allow the parser to be fully lenient.
Stephen
On 26 February 2016 at 14:15, Roger Riggs <Roger.Riggs@oracle.com> wrote:
HI Stephen,
How lenient is lenient supposed to be? Looking at the offset test cases, it seems to allow minutes and seconds digits to be parsed even if the pattern did not include them.
+ @DataProvider(name="lenientOffsetParseData") + Object[][] data_lenient_offset_parse() { + return new Object[][] { + {"+HH", "+01", 3600}, + {"+HH", "+0101", 3660}, + {"+HH", "+010101", 3661}, + {"+HH", "+01", 3600}, + {"+HH", "+01:01", 3660}, + {"+HH", "+01:01:01", 3661},
Thanks, Roger
On 2/26/2016 6:16 AM, Stephen Colebourne wrote:
I don't think this is quite right.
if ((length > position + 3) && (text.charAt(position + 3) == ':')) { parseType = 10; }
This code will *always* select type 10 (colons) if a colon is found at position+3. Whereas the spec now says that it should only do this if the pattern is "HH". For other patterns, the colon/no-colon choice is defined to be based on the pattern.
That said, I'm thinking it is better to make the spec more lenient to match the behaviour as implemented:
When parsing in lenient mode, only the hours are mandatory - minutes and seconds are optional. If the character after the hour digits is a colon then the parser will parse using the pattern "HH:mm:ss", otherwise the parser will parse using the pattern "HHmmss".
Additional TCKDateTimeFormatterBuilder tests will be needed to demonstrate the above. There should also be a test for data following the lenient parse. The following should all succeed:
DateTimeFormatterBuilder().parseLenient().appendOffset("HH:MM").appendZoneId(); "+01:00Europe/London" "+0100Europe/London"
DateTimeFormatterBuilder().parseLenient().appendOffset("HH:MM").appendLiteral(":").appendZoneId(); "+01:Europe/London"
Note this special case, where the colon affects the parse type, but is not ultimately part of the offset, thus it is left to match the appendLiteral(":")
You may want to think of some additional nasty edge cases!
Stephen
On 25 February 2016 at 15:44, nadeesh tv <nadeesh.tv@oracle.com> wrote:
Hi all, Please see the updated webrev http://cr.openjdk.java.net/~ntv/8032051/webrev.02/
Thanks and Regards, Nadeesh
On 2/23/2016 5:17 PM, Stephen Colebourne wrote:
Thanks for the changes.
In `DateTimeFormatter`, the code should be
.parseLenient() .appendOffsetId() .parseStrict()
and the same in the other case. This ensures that existing callers who then embed the formatter in another formatter (like the ZONED_DATE_TIME constant) are unaffected.
The logic for lenient parsing does not look right as it only handles types 5 and 6. This table shows the mappings needed:
"+HH", -> "+HHmmss" or "+HH:mm:ss" "+HHmm", -> "+HHmmss", "+HH:mm", -> "+HH:mm:ss", "+HHMM", -> "+HHmmss", "+HH:MM", -> "+HH:mm:ss", "+HHMMss", -> "+HHmmss", "+HH:MM:ss", -> "+HH:mm:ss", "+HHMMSS", -> "+HHmmss", "+HH:MM:SS", -> "+HH:mm:ss", "+HHmmss", "+HH:mm:ss",
Note that the "+HH" pattern is a special case, as we don't know whether to use the colon or non-colon pattern. Whether to require colon or not is based on whether the next character after the HH is a colon or not.
Proposed appendOffsetId() Javadoc:
* Appends the zone offset, such as '+01:00', to the formatter. * <p> * This appends an instruction to format/parse the offset ID to the builder. * This is equivalent to calling {@code appendOffset("+HH:MM:ss", "Z")}. * See {@link #appendOffset(String, String)} for details on formatting and parsing.
Proposed appendOffset(String, String) Javadoc:
* During parsing, the offset...
changed to:
* When parsing in strict mode, the input must contain the mandatory and optional elements are defined by the specified pattern. * If the offset cannot be parsed then an exception is thrown unless the section of the formatter is optional. * <p> * When parsing in lenient mode, only the hours are mandatory - minutes and seconds are optional. * The colons are required if the specified pattern contains a colon. * If the specified pattern is "+HH", the presence of colons is determined by whether the character after the hour digits is a colon or not. * If the offset cannot be parsed then an exception is thrown unless the section of the formatter is optional.
thanks and sorry for delay Stephen
On 11 February 2016 at 20:22, nadeesh tv <nadeesh.tv@oracle.com> wrote:
Hi all,
Please review a fix for
Bug Id https://bugs.openjdk.java.net/browse/JDK-8032051
webrev http://cr.openjdk.java.net/~ntv/8032051/webrev.01/
-- Thanks and Regards, Nadeesh TV
-- Thanks and Regards, Nadeesh TV
Hi Stephen, Even in lenient mode the parser needs to stick to the fields provided in the pattern. If the caller intends to parse seconds, the pattern should include seconds. Otherwise the caller has not been able to specify their intent. That's consistent with lenient mode used in the other fields. Otherwise, the pattern is irrelevant except for whether it contains a ":" and makes the spec nearly useless. Roger On 2/26/2016 12:09 PM, Stephen Colebourne wrote:
On 26 February 2016 at 15:00, Roger Riggs <Roger.Riggs@oracle.com> wrote:
Hi Stephen,
It does not seem natural to me with a pattern of HHMM for it to parse more than 4 digits. I can see lenient modifying the behavior as it it were HHmm, but there is no indication in the pattern that seconds would be considered. How it would it be implied from the spec? The spec is being expanded to define what happens. Previously it didn't define it at all, and would throw an error.
Lenient parsing typically accepts much more than the strict parsing.
When parsing numbers, you may set the SignStyle to NEVER, but the sign will still be parsed in lenient mode
When parsing text, you may select the short output format, but any length of text will be parsed in lenient mode.
As such, it is very much in line with the behavour of the API to parse a much broader format than the one requested in lenient mode. (None of this affects strict mode).
Stephen
In the original issue, appendOffsetId is defined as using the +HH:MM:ss pattern and specific to ISO the MM should be allowed to be optional. There is no question of parsing extra digits not included in the requested pattern.
Separately, this is specifying the new lenient behavior of appendOffset(pattern, noffsetText). In that case, I don't think it will be understood that patterns 'shorter' than the input will gobble up extra digits and ':'s.
Roger
On 2/26/2016 9:42 AM, Stephen Colebourne wrote:
Lenient can be however lenient we define it to be. Allowing minutes and seconds to be parsed when not specified in the pattern is the key part of the change. Whether the parser copes with both colons and no-colons is the choice at hand here. It seems to me that since the parser can easily handle figuring out whether the colon is present or not, we should just allow the parser to be fully lenient.
Stephen
On 26 February 2016 at 14:15, Roger Riggs <Roger.Riggs@oracle.com> wrote:
HI Stephen,
How lenient is lenient supposed to be? Looking at the offset test cases, it seems to allow minutes and seconds digits to be parsed even if the pattern did not include them.
+ @DataProvider(name="lenientOffsetParseData") + Object[][] data_lenient_offset_parse() { + return new Object[][] { + {"+HH", "+01", 3600}, + {"+HH", "+0101", 3660}, + {"+HH", "+010101", 3661}, + {"+HH", "+01", 3600}, + {"+HH", "+01:01", 3660}, + {"+HH", "+01:01:01", 3661},
Thanks, Roger
On 2/26/2016 6:16 AM, Stephen Colebourne wrote:
I don't think this is quite right.
if ((length > position + 3) && (text.charAt(position + 3) == ':')) { parseType = 10; }
This code will *always* select type 10 (colons) if a colon is found at position+3. Whereas the spec now says that it should only do this if the pattern is "HH". For other patterns, the colon/no-colon choice is defined to be based on the pattern.
That said, I'm thinking it is better to make the spec more lenient to match the behaviour as implemented:
When parsing in lenient mode, only the hours are mandatory - minutes and seconds are optional. If the character after the hour digits is a colon then the parser will parse using the pattern "HH:mm:ss", otherwise the parser will parse using the pattern "HHmmss".
Additional TCKDateTimeFormatterBuilder tests will be needed to demonstrate the above. There should also be a test for data following the lenient parse. The following should all succeed:
DateTimeFormatterBuilder().parseLenient().appendOffset("HH:MM").appendZoneId(); "+01:00Europe/London" "+0100Europe/London"
DateTimeFormatterBuilder().parseLenient().appendOffset("HH:MM").appendLiteral(":").appendZoneId(); "+01:Europe/London"
Note this special case, where the colon affects the parse type, but is not ultimately part of the offset, thus it is left to match the appendLiteral(":")
You may want to think of some additional nasty edge cases!
Stephen
On 25 February 2016 at 15:44, nadeesh tv <nadeesh.tv@oracle.com> wrote:
Hi all, Please see the updated webrev http://cr.openjdk.java.net/~ntv/8032051/webrev.02/
Thanks and Regards, Nadeesh
On 2/23/2016 5:17 PM, Stephen Colebourne wrote:
Thanks for the changes.
In `DateTimeFormatter`, the code should be
.parseLenient() .appendOffsetId() .parseStrict()
and the same in the other case. This ensures that existing callers who then embed the formatter in another formatter (like the ZONED_DATE_TIME constant) are unaffected.
The logic for lenient parsing does not look right as it only handles types 5 and 6. This table shows the mappings needed:
"+HH", -> "+HHmmss" or "+HH:mm:ss" "+HHmm", -> "+HHmmss", "+HH:mm", -> "+HH:mm:ss", "+HHMM", -> "+HHmmss", "+HH:MM", -> "+HH:mm:ss", "+HHMMss", -> "+HHmmss", "+HH:MM:ss", -> "+HH:mm:ss", "+HHMMSS", -> "+HHmmss", "+HH:MM:SS", -> "+HH:mm:ss", "+HHmmss", "+HH:mm:ss",
Note that the "+HH" pattern is a special case, as we don't know whether to use the colon or non-colon pattern. Whether to require colon or not is based on whether the next character after the HH is a colon or not.
Proposed appendOffsetId() Javadoc:
* Appends the zone offset, such as '+01:00', to the formatter. * <p> * This appends an instruction to format/parse the offset ID to the builder. * This is equivalent to calling {@code appendOffset("+HH:MM:ss", "Z")}. * See {@link #appendOffset(String, String)} for details on formatting and parsing.
Proposed appendOffset(String, String) Javadoc:
* During parsing, the offset...
changed to:
* When parsing in strict mode, the input must contain the mandatory and optional elements are defined by the specified pattern. * If the offset cannot be parsed then an exception is thrown unless the section of the formatter is optional. * <p> * When parsing in lenient mode, only the hours are mandatory - minutes and seconds are optional. * The colons are required if the specified pattern contains a colon. * If the specified pattern is "+HH", the presence of colons is determined by whether the character after the hour digits is a colon or not. * If the offset cannot be parsed then an exception is thrown unless the section of the formatter is optional.
thanks and sorry for delay Stephen
On 11 February 2016 at 20:22, nadeesh tv <nadeesh.tv@oracle.com> wrote:
Hi all,
Please review a fix for
Bug Id https://bugs.openjdk.java.net/browse/JDK-8032051
webrev http://cr.openjdk.java.net/~ntv/8032051/webrev.01/
-- Thanks and Regards, Nadeesh TV
-- Thanks and Regards, Nadeesh TV
It is important to also consider the case where the user wants to format using HH:MM but parse seconds if they are provided. As I said above, this is no different to SignStyle, where the user requests something specific on format, but accepts anything on input. The pattern is still used for formatting and strict parsing under these changes. It is effectively ignored in lenient parsing (which is the very definition of leniency). Another way to look at it: using a pattern of HH:MM and strict: +02 - disallowed +02:00 - allowed +02:00:00 - disallowed using a pattern of HH:mm and strict: +02 - allowed +02:00 - allowed +02:00:00 - disallowed using any pattern and lenient: +02 - allowed +02:00 - allowed +02:00:00 - allowed This covers pretty much anything a user needs when parsing. Stephen On 26 February 2016 at 17:38, Roger Riggs <Roger.Riggs@oracle.com> wrote:
Hi Stephen,
Even in lenient mode the parser needs to stick to the fields provided in the pattern. If the caller intends to parse seconds, the pattern should include seconds. Otherwise the caller has not been able to specify their intent. That's consistent with lenient mode used in the other fields. Otherwise, the pattern is irrelevant except for whether it contains a ":" and makes the spec nearly useless.
Roger
On 2/26/2016 12:09 PM, Stephen Colebourne wrote:
On 26 February 2016 at 15:00, Roger Riggs <Roger.Riggs@oracle.com> wrote:
Hi Stephen,
It does not seem natural to me with a pattern of HHMM for it to parse more than 4 digits. I can see lenient modifying the behavior as it it were HHmm, but there is no indication in the pattern that seconds would be considered. How it would it be implied from the spec?
The spec is being expanded to define what happens. Previously it didn't define it at all, and would throw an error.
Lenient parsing typically accepts much more than the strict parsing.
When parsing numbers, you may set the SignStyle to NEVER, but the sign will still be parsed in lenient mode
When parsing text, you may select the short output format, but any length of text will be parsed in lenient mode.
As such, it is very much in line with the behavour of the API to parse a much broader format than the one requested in lenient mode. (None of this affects strict mode).
Stephen
In the original issue, appendOffsetId is defined as using the +HH:MM:ss pattern and specific to ISO the MM should be allowed to be optional. There is no question of parsing extra digits not included in the requested pattern.
Separately, this is specifying the new lenient behavior of appendOffset(pattern, noffsetText). In that case, I don't think it will be understood that patterns 'shorter' than the input will gobble up extra digits and ':'s.
Roger
On 2/26/2016 9:42 AM, Stephen Colebourne wrote:
Lenient can be however lenient we define it to be. Allowing minutes and seconds to be parsed when not specified in the pattern is the key part of the change. Whether the parser copes with both colons and no-colons is the choice at hand here. It seems to me that since the parser can easily handle figuring out whether the colon is present or not, we should just allow the parser to be fully lenient.
Stephen
On 26 February 2016 at 14:15, Roger Riggs <Roger.Riggs@oracle.com> wrote:
HI Stephen,
How lenient is lenient supposed to be? Looking at the offset test cases, it seems to allow minutes and seconds digits to be parsed even if the pattern did not include them.
+ @DataProvider(name="lenientOffsetParseData") + Object[][] data_lenient_offset_parse() { + return new Object[][] { + {"+HH", "+01", 3600}, + {"+HH", "+0101", 3660}, + {"+HH", "+010101", 3661}, + {"+HH", "+01", 3600}, + {"+HH", "+01:01", 3660}, + {"+HH", "+01:01:01", 3661},
Thanks, Roger
On 2/26/2016 6:16 AM, Stephen Colebourne wrote:
I don't think this is quite right.
if ((length > position + 3) && (text.charAt(position + 3) == ':')) { parseType = 10; }
This code will *always* select type 10 (colons) if a colon is found at position+3. Whereas the spec now says that it should only do this if the pattern is "HH". For other patterns, the colon/no-colon choice is defined to be based on the pattern.
That said, I'm thinking it is better to make the spec more lenient to match the behaviour as implemented:
When parsing in lenient mode, only the hours are mandatory - minutes and seconds are optional. If the character after the hour digits is a colon then the parser will parse using the pattern "HH:mm:ss", otherwise the parser will parse using the pattern "HHmmss".
Additional TCKDateTimeFormatterBuilder tests will be needed to demonstrate the above. There should also be a test for data following the lenient parse. The following should all succeed:
DateTimeFormatterBuilder().parseLenient().appendOffset("HH:MM").appendZoneId(); "+01:00Europe/London" "+0100Europe/London"
DateTimeFormatterBuilder().parseLenient().appendOffset("HH:MM").appendLiteral(":").appendZoneId(); "+01:Europe/London"
Note this special case, where the colon affects the parse type, but is not ultimately part of the offset, thus it is left to match the appendLiteral(":")
You may want to think of some additional nasty edge cases!
Stephen
On 25 February 2016 at 15:44, nadeesh tv <nadeesh.tv@oracle.com> wrote:
Hi all, Please see the updated webrev http://cr.openjdk.java.net/~ntv/8032051/webrev.02/
Thanks and Regards, Nadeesh
On 2/23/2016 5:17 PM, Stephen Colebourne wrote:
Thanks for the changes.
In `DateTimeFormatter`, the code should be
.parseLenient() .appendOffsetId() .parseStrict()
and the same in the other case. This ensures that existing callers who then embed the formatter in another formatter (like the ZONED_DATE_TIME constant) are unaffected.
The logic for lenient parsing does not look right as it only handles types 5 and 6. This table shows the mappings needed:
"+HH", -> "+HHmmss" or "+HH:mm:ss" "+HHmm", -> "+HHmmss", "+HH:mm", -> "+HH:mm:ss", "+HHMM", -> "+HHmmss", "+HH:MM", -> "+HH:mm:ss", "+HHMMss", -> "+HHmmss", "+HH:MM:ss", -> "+HH:mm:ss", "+HHMMSS", -> "+HHmmss", "+HH:MM:SS", -> "+HH:mm:ss", "+HHmmss", "+HH:mm:ss",
Note that the "+HH" pattern is a special case, as we don't know whether to use the colon or non-colon pattern. Whether to require colon or not is based on whether the next character after the HH is a colon or not.
Proposed appendOffsetId() Javadoc:
* Appends the zone offset, such as '+01:00', to the formatter. * <p> * This appends an instruction to format/parse the offset ID to the builder. * This is equivalent to calling {@code appendOffset("+HH:MM:ss", "Z")}. * See {@link #appendOffset(String, String)} for details on formatting and parsing.
Proposed appendOffset(String, String) Javadoc:
* During parsing, the offset...
changed to:
* When parsing in strict mode, the input must contain the mandatory and optional elements are defined by the specified pattern. * If the offset cannot be parsed then an exception is thrown unless the section of the formatter is optional. * <p> * When parsing in lenient mode, only the hours are mandatory - minutes and seconds are optional. * The colons are required if the specified pattern contains a colon. * If the specified pattern is "+HH", the presence of colons is determined by whether the character after the hour digits is a colon or not. * If the offset cannot be parsed then an exception is thrown unless the section of the formatter is optional.
thanks and sorry for delay Stephen
On 11 February 2016 at 20:22, nadeesh tv <nadeesh.tv@oracle.com> wrote:
Hi all,
Please review a fix for
Bug Id https://bugs.openjdk.java.net/browse/JDK-8032051
webrev http://cr.openjdk.java.net/~ntv/8032051/webrev.01/
-- Thanks and Regards, Nadeesh TV
-- Thanks and Regards, Nadeesh TV
Hi Stephen, As a fix for the original issue[1], not correctly parsing a ISO defined offset, the use of lenient was a convenient implementation technique (hack). But with the expanded definition of lenient, it will allow many forms of the offset that are not allowed by the ISO specification and should not be accepted forDateTimeFormatter. ISO_OFFSET_DATE_TIME. In particular, ISO requires the ":" to separate the minutes. I'm not sure how to correctly fix the original issue with the new specification of lenient offset parsing without introducing some more specific implementation information. WRT the lenient parsing mode for appendOffset: I was considering that the subfields of the offset were to be treated leniently but it seems you were treating the entire offset field and text as the unit to be treated leniently. The spec for lenient parsing would be clearer if it were specified as allowing any of the patterns of appendOffset. The current wording around the character after the hour may be confusing. In the specification of appendOffset(pattern, noOffsetText) how about: "When parsing in lenient mode, the longest valid pattern that matches the input is used. Only the hours are mandatory, minutes and seconds are optional." Roger [1] https://bugs.openjdk.java.net/browse/JDK-8032051 On 2/26/2016 1:10 PM, Stephen Colebourne wrote:
It is important to also consider the case where the user wants to format using HH:MM but parse seconds if they are provided.
As I said above, this is no different to SignStyle, where the user requests something specific on format, but accepts anything on input.
The pattern is still used for formatting and strict parsing under these changes. It is effectively ignored in lenient parsing (which is the very definition of leniency).
Another way to look at it:
using a pattern of HH:MM and strict: +02 - disallowed +02:00 - allowed +02:00:00 - disallowed
using a pattern of HH:mm and strict: +02 - allowed +02:00 - allowed +02:00:00 - disallowed
using any pattern and lenient: +02 - allowed +02:00 - allowed +02:00:00 - allowed
This covers pretty much anything a user needs when parsing.
Stephen
On 26 February 2016 at 17:38, Roger Riggs <Roger.Riggs@oracle.com> wrote:
Hi Stephen,
Even in lenient mode the parser needs to stick to the fields provided in the pattern. If the caller intends to parse seconds, the pattern should include seconds. Otherwise the caller has not been able to specify their intent. That's consistent with lenient mode used in the other fields. Otherwise, the pattern is irrelevant except for whether it contains a ":" and makes the spec nearly useless.
Roger
On 2/26/2016 12:09 PM, Stephen Colebourne wrote:
On 26 February 2016 at 15:00, Roger Riggs <Roger.Riggs@oracle.com> wrote:
Hi Stephen,
It does not seem natural to me with a pattern of HHMM for it to parse more than 4 digits. I can see lenient modifying the behavior as it it were HHmm, but there is no indication in the pattern that seconds would be considered. How it would it be implied from the spec? The spec is being expanded to define what happens. Previously it didn't define it at all, and would throw an error.
Lenient parsing typically accepts much more than the strict parsing.
When parsing numbers, you may set the SignStyle to NEVER, but the sign will still be parsed in lenient mode
When parsing text, you may select the short output format, but any length of text will be parsed in lenient mode.
As such, it is very much in line with the behavour of the API to parse a much broader format than the one requested in lenient mode. (None of this affects strict mode).
Stephen
In the original issue, appendOffsetId is defined as using the +HH:MM:ss pattern and specific to ISO the MM should be allowed to be optional. There is no question of parsing extra digits not included in the requested pattern.
Separately, this is specifying the new lenient behavior of appendOffset(pattern, noffsetText). In that case, I don't think it will be understood that patterns 'shorter' than the input will gobble up extra digits and ':'s.
Roger
On 2/26/2016 9:42 AM, Stephen Colebourne wrote:
Lenient can be however lenient we define it to be. Allowing minutes and seconds to be parsed when not specified in the pattern is the key part of the change. Whether the parser copes with both colons and no-colons is the choice at hand here. It seems to me that since the parser can easily handle figuring out whether the colon is present or not, we should just allow the parser to be fully lenient.
Stephen
On 26 February 2016 at 14:15, Roger Riggs <Roger.Riggs@oracle.com> wrote:
HI Stephen,
How lenient is lenient supposed to be? Looking at the offset test cases, it seems to allow minutes and seconds digits to be parsed even if the pattern did not include them.
+ @DataProvider(name="lenientOffsetParseData") + Object[][] data_lenient_offset_parse() { + return new Object[][] { + {"+HH", "+01", 3600}, + {"+HH", "+0101", 3660}, + {"+HH", "+010101", 3661}, + {"+HH", "+01", 3600}, + {"+HH", "+01:01", 3660}, + {"+HH", "+01:01:01", 3661},
Thanks, Roger
On 2/26/2016 6:16 AM, Stephen Colebourne wrote:
I don't think this is quite right.
if ((length > position + 3) && (text.charAt(position + 3) == ':')) { parseType = 10; }
This code will *always* select type 10 (colons) if a colon is found at position+3. Whereas the spec now says that it should only do this if the pattern is "HH". For other patterns, the colon/no-colon choice is defined to be based on the pattern.
That said, I'm thinking it is better to make the spec more lenient to match the behaviour as implemented:
When parsing in lenient mode, only the hours are mandatory - minutes and seconds are optional. If the character after the hour digits is a colon then the parser will parse using the pattern "HH:mm:ss", otherwise the parser will parse using the pattern "HHmmss".
Additional TCKDateTimeFormatterBuilder tests will be needed to demonstrate the above. There should also be a test for data following the lenient parse. The following should all succeed:
DateTimeFormatterBuilder().parseLenient().appendOffset("HH:MM").appendZoneId(); "+01:00Europe/London" "+0100Europe/London"
DateTimeFormatterBuilder().parseLenient().appendOffset("HH:MM").appendLiteral(":").appendZoneId(); "+01:Europe/London"
Note this special case, where the colon affects the parse type, but is not ultimately part of the offset, thus it is left to match the appendLiteral(":")
You may want to think of some additional nasty edge cases!
Stephen
On 25 February 2016 at 15:44, nadeesh tv <nadeesh.tv@oracle.com> wrote:
Hi all, Please see the updated webrev http://cr.openjdk.java.net/~ntv/8032051/webrev.02/
Thanks and Regards, Nadeesh
On 2/23/2016 5:17 PM, Stephen Colebourne wrote:
Thanks for the changes.
In `DateTimeFormatter`, the code should be
.parseLenient() .appendOffsetId() .parseStrict()
and the same in the other case. This ensures that existing callers who then embed the formatter in another formatter (like the ZONED_DATE_TIME constant) are unaffected.
The logic for lenient parsing does not look right as it only handles types 5 and 6. This table shows the mappings needed:
"+HH", -> "+HHmmss" or "+HH:mm:ss" "+HHmm", -> "+HHmmss", "+HH:mm", -> "+HH:mm:ss", "+HHMM", -> "+HHmmss", "+HH:MM", -> "+HH:mm:ss", "+HHMMss", -> "+HHmmss", "+HH:MM:ss", -> "+HH:mm:ss", "+HHMMSS", -> "+HHmmss", "+HH:MM:SS", -> "+HH:mm:ss", "+HHmmss", "+HH:mm:ss",
Note that the "+HH" pattern is a special case, as we don't know whether to use the colon or non-colon pattern. Whether to require colon or not is based on whether the next character after the HH is a colon or not.
Proposed appendOffsetId() Javadoc:
* Appends the zone offset, such as '+01:00', to the formatter. * <p> * This appends an instruction to format/parse the offset ID to the builder. * This is equivalent to calling {@code appendOffset("+HH:MM:ss", "Z")}. * See {@link #appendOffset(String, String)} for details on formatting and parsing.
Proposed appendOffset(String, String) Javadoc:
* During parsing, the offset...
changed to:
* When parsing in strict mode, the input must contain the mandatory and optional elements are defined by the specified pattern. * If the offset cannot be parsed then an exception is thrown unless the section of the formatter is optional. * <p> * When parsing in lenient mode, only the hours are mandatory - minutes and seconds are optional. * The colons are required if the specified pattern contains a colon. * If the specified pattern is "+HH", the presence of colons is determined by whether the character after the hour digits is a colon or not. * If the offset cannot be parsed then an exception is thrown unless the section of the formatter is optional.
thanks and sorry for delay Stephen
On 11 February 2016 at 20:22, nadeesh tv <nadeesh.tv@oracle.com> wrote:
Hi all,
Please review a fix for
Bug Id https://bugs.openjdk.java.net/browse/JDK-8032051
webrev http://cr.openjdk.java.net/~ntv/8032051/webrev.01/
-- Thanks and Regards, Nadeesh TV
-- Thanks and Regards, Nadeesh TV
I'm happy to go back to the spec I proposed before. That spec would determine colons dynamically only for pattern HH. Otherwise, it would use the presence/absence of a colon in the pattern as the signal. That would deal with the ISO-8601 problem and resolve the original issue (as ISO_OFFSET_DATE_TIME uses HH:MM:ss, which would leniently parse using colons). Writing the spec wording is not easy however. I had: When parsing in lenient mode, only the hours are mandatory - minutes and seconds are optional. The colons are required if the specified pattern contains a colon. If the specified pattern is "+HH", the presence of colons is determined by whether the character after the hour digits is a colon or not. If the offset cannot be parsed then an exception is thrown unless the section of the formatter is optional. which isn't too bad but alternatives are possible. Stephen On 29 February 2016 at 15:52, Roger Riggs <Roger.Riggs@oracle.com> wrote:
Hi Stephen,
As a fix for the original issue[1], not correctly parsing a ISO defined offset, the use of lenient was a convenient implementation technique (hack). But with the expanded definition of lenient, it will allow many forms of the offset that are not allowed by the ISO specification and should not be accepted forDateTimeFormatter. ISO_OFFSET_DATE_TIME. In particular, ISO requires the ":" to separate the minutes. I'm not sure how to correctly fix the original issue with the new specification of lenient offset parsing without introducing some more specific implementation information.
WRT the lenient parsing mode for appendOffset:
I was considering that the subfields of the offset were to be treated leniently but it seems you were treating the entire offset field and text as the unit to be treated leniently. The spec for lenient parsing would be clearer if it were specified as allowing any of the patterns of appendOffset. The current wording around the character after the hour may be confusing.
In the specification of appendOffset(pattern, noOffsetText) how about:
"When parsing in lenient mode, the longest valid pattern that matches the input is used. Only the hours are mandatory, minutes and seconds are optional."
Roger
[1] https://bugs.openjdk.java.net/browse/JDK-8032051
On 2/26/2016 1:10 PM, Stephen Colebourne wrote:
It is important to also consider the case where the user wants to format using HH:MM but parse seconds if they are provided.
As I said above, this is no different to SignStyle, where the user requests something specific on format, but accepts anything on input.
The pattern is still used for formatting and strict parsing under these changes. It is effectively ignored in lenient parsing (which is the very definition of leniency).
Another way to look at it:
using a pattern of HH:MM and strict: +02 - disallowed +02:00 - allowed +02:00:00 - disallowed
using a pattern of HH:mm and strict: +02 - allowed +02:00 - allowed +02:00:00 - disallowed
using any pattern and lenient: +02 - allowed +02:00 - allowed +02:00:00 - allowed
This covers pretty much anything a user needs when parsing.
Stephen
On 26 February 2016 at 17:38, Roger Riggs <Roger.Riggs@oracle.com> wrote:
Hi Stephen,
Even in lenient mode the parser needs to stick to the fields provided in the pattern. If the caller intends to parse seconds, the pattern should include seconds. Otherwise the caller has not been able to specify their intent. That's consistent with lenient mode used in the other fields. Otherwise, the pattern is irrelevant except for whether it contains a ":" and makes the spec nearly useless.
Roger
On 2/26/2016 12:09 PM, Stephen Colebourne wrote:
On 26 February 2016 at 15:00, Roger Riggs <Roger.Riggs@oracle.com> wrote:
Hi Stephen,
It does not seem natural to me with a pattern of HHMM for it to parse more than 4 digits. I can see lenient modifying the behavior as it it were HHmm, but there is no indication in the pattern that seconds would be considered. How it would it be implied from the spec?
The spec is being expanded to define what happens. Previously it didn't define it at all, and would throw an error.
Lenient parsing typically accepts much more than the strict parsing.
When parsing numbers, you may set the SignStyle to NEVER, but the sign will still be parsed in lenient mode
When parsing text, you may select the short output format, but any length of text will be parsed in lenient mode.
As such, it is very much in line with the behavour of the API to parse a much broader format than the one requested in lenient mode. (None of this affects strict mode).
Stephen
In the original issue, appendOffsetId is defined as using the +HH:MM:ss pattern and specific to ISO the MM should be allowed to be optional. There is no question of parsing extra digits not included in the requested pattern.
Separately, this is specifying the new lenient behavior of appendOffset(pattern, noffsetText). In that case, I don't think it will be understood that patterns 'shorter' than the input will gobble up extra digits and ':'s.
Roger
On 2/26/2016 9:42 AM, Stephen Colebourne wrote:
Lenient can be however lenient we define it to be. Allowing minutes and seconds to be parsed when not specified in the pattern is the key part of the change. Whether the parser copes with both colons and no-colons is the choice at hand here. It seems to me that since the parser can easily handle figuring out whether the colon is present or not, we should just allow the parser to be fully lenient.
Stephen
On 26 February 2016 at 14:15, Roger Riggs <Roger.Riggs@oracle.com> wrote:
HI Stephen,
How lenient is lenient supposed to be? Looking at the offset test cases, it seems to allow minutes and seconds digits to be parsed even if the pattern did not include them.
+ @DataProvider(name="lenientOffsetParseData") + Object[][] data_lenient_offset_parse() { + return new Object[][] { + {"+HH", "+01", 3600}, + {"+HH", "+0101", 3660}, + {"+HH", "+010101", 3661}, + {"+HH", "+01", 3600}, + {"+HH", "+01:01", 3660}, + {"+HH", "+01:01:01", 3661},
Thanks, Roger
On 2/26/2016 6:16 AM, Stephen Colebourne wrote:
I don't think this is quite right.
if ((length > position + 3) && (text.charAt(position + 3) == ':')) { parseType = 10; }
This code will *always* select type 10 (colons) if a colon is found at position+3. Whereas the spec now says that it should only do this if the pattern is "HH". For other patterns, the colon/no-colon choice is defined to be based on the pattern.
That said, I'm thinking it is better to make the spec more lenient to match the behaviour as implemented:
When parsing in lenient mode, only the hours are mandatory - minutes and seconds are optional. If the character after the hour digits is a colon then the parser will parse using the pattern "HH:mm:ss", otherwise the parser will parse using the pattern "HHmmss".
Additional TCKDateTimeFormatterBuilder tests will be needed to demonstrate the above. There should also be a test for data following the lenient parse. The following should all succeed:
DateTimeFormatterBuilder().parseLenient().appendOffset("HH:MM").appendZoneId(); "+01:00Europe/London" "+0100Europe/London"
DateTimeFormatterBuilder().parseLenient().appendOffset("HH:MM").appendLiteral(":").appendZoneId(); "+01:Europe/London"
Note this special case, where the colon affects the parse type, but is not ultimately part of the offset, thus it is left to match the appendLiteral(":")
You may want to think of some additional nasty edge cases!
Stephen
On 25 February 2016 at 15:44, nadeesh tv <nadeesh.tv@oracle.com> wrote:
Hi all, Please see the updated webrev http://cr.openjdk.java.net/~ntv/8032051/webrev.02/
Thanks and Regards, Nadeesh
On 2/23/2016 5:17 PM, Stephen Colebourne wrote:
Thanks for the changes.
In `DateTimeFormatter`, the code should be
.parseLenient() .appendOffsetId() .parseStrict()
and the same in the other case. This ensures that existing callers who then embed the formatter in another formatter (like the ZONED_DATE_TIME constant) are unaffected.
The logic for lenient parsing does not look right as it only handles types 5 and 6. This table shows the mappings needed:
"+HH", -> "+HHmmss" or "+HH:mm:ss" "+HHmm", -> "+HHmmss", "+HH:mm", -> "+HH:mm:ss", "+HHMM", -> "+HHmmss", "+HH:MM", -> "+HH:mm:ss", "+HHMMss", -> "+HHmmss", "+HH:MM:ss", -> "+HH:mm:ss", "+HHMMSS", -> "+HHmmss", "+HH:MM:SS", -> "+HH:mm:ss", "+HHmmss", "+HH:mm:ss",
Note that the "+HH" pattern is a special case, as we don't know whether to use the colon or non-colon pattern. Whether to require colon or not is based on whether the next character after the HH is a colon or not.
Proposed appendOffsetId() Javadoc:
* Appends the zone offset, such as '+01:00', to the formatter. * <p> * This appends an instruction to format/parse the offset ID to the builder. * This is equivalent to calling {@code appendOffset("+HH:MM:ss", "Z")}. * See {@link #appendOffset(String, String)} for details on formatting and parsing.
Proposed appendOffset(String, String) Javadoc:
* During parsing, the offset...
changed to:
* When parsing in strict mode, the input must contain the mandatory and optional elements are defined by the specified pattern. * If the offset cannot be parsed then an exception is thrown unless the section of the formatter is optional. * <p> * When parsing in lenient mode, only the hours are mandatory - minutes and seconds are optional. * The colons are required if the specified pattern contains a colon. * If the specified pattern is "+HH", the presence of colons is determined by whether the character after the hour digits is a colon or not. * If the offset cannot be parsed then an exception is thrown unless the section of the formatter is optional.
thanks and sorry for delay Stephen
On 11 February 2016 at 20:22, nadeesh tv <nadeesh.tv@oracle.com> wrote:
Hi all,
Please review a fix for
Bug Id https://bugs.openjdk.java.net/browse/JDK-8032051
webrev http://cr.openjdk.java.net/~ntv/8032051/webrev.01/
-- Thanks and Regards, Nadeesh TV
-- Thanks and Regards, Nadeesh TV
Hi, Stephen, Roger Thanks for the comments. Please see the updated webrev http://cr.openjdk.java.net/~ntv/8032051/webrev.04/ Regards, Nadeesh On 3/1/2016 12:29 AM, Stephen Colebourne wrote:
I'm happy to go back to the spec I proposed before. That spec would determine colons dynamically only for pattern HH. Otherwise, it would use the presence/absence of a colon in the pattern as the signal. That would deal with the ISO-8601 problem and resolve the original issue (as ISO_OFFSET_DATE_TIME uses HH:MM:ss, which would leniently parse using colons).
Writing the spec wording is not easy however. I had:
When parsing in lenient mode, only the hours are mandatory - minutes and seconds are optional. The colons are required if the specified pattern contains a colon. If the specified pattern is "+HH", the presence of colons is determined by whether the character after the hour digits is a colon or not. If the offset cannot be parsed then an exception is thrown unless the section of the formatter is optional.
which isn't too bad but alternatives are possible.
Stephen
On 29 February 2016 at 15:52, Roger Riggs <Roger.Riggs@oracle.com> wrote:
Hi Stephen,
As a fix for the original issue[1], not correctly parsing a ISO defined offset, the use of lenient was a convenient implementation technique (hack). But with the expanded definition of lenient, it will allow many forms of the offset that are not allowed by the ISO specification and should not be accepted forDateTimeFormatter. ISO_OFFSET_DATE_TIME. In particular, ISO requires the ":" to separate the minutes. I'm not sure how to correctly fix the original issue with the new specification of lenient offset parsing without introducing some more specific implementation information.
WRT the lenient parsing mode for appendOffset:
I was considering that the subfields of the offset were to be treated leniently but it seems you were treating the entire offset field and text as the unit to be treated leniently. The spec for lenient parsing would be clearer if it were specified as allowing any of the patterns of appendOffset. The current wording around the character after the hour may be confusing.
In the specification of appendOffset(pattern, noOffsetText) how about:
"When parsing in lenient mode, the longest valid pattern that matches the input is used. Only the hours are mandatory, minutes and seconds are optional."
Roger
[1] https://bugs.openjdk.java.net/browse/JDK-8032051
On 2/26/2016 1:10 PM, Stephen Colebourne wrote:
It is important to also consider the case where the user wants to format using HH:MM but parse seconds if they are provided.
As I said above, this is no different to SignStyle, where the user requests something specific on format, but accepts anything on input.
The pattern is still used for formatting and strict parsing under these changes. It is effectively ignored in lenient parsing (which is the very definition of leniency).
Another way to look at it:
using a pattern of HH:MM and strict: +02 - disallowed +02:00 - allowed +02:00:00 - disallowed
using a pattern of HH:mm and strict: +02 - allowed +02:00 - allowed +02:00:00 - disallowed
using any pattern and lenient: +02 - allowed +02:00 - allowed +02:00:00 - allowed
This covers pretty much anything a user needs when parsing.
Stephen
On 26 February 2016 at 17:38, Roger Riggs <Roger.Riggs@oracle.com> wrote:
Hi Stephen,
Even in lenient mode the parser needs to stick to the fields provided in the pattern. If the caller intends to parse seconds, the pattern should include seconds. Otherwise the caller has not been able to specify their intent. That's consistent with lenient mode used in the other fields. Otherwise, the pattern is irrelevant except for whether it contains a ":" and makes the spec nearly useless.
Roger
On 2/26/2016 12:09 PM, Stephen Colebourne wrote:
On 26 February 2016 at 15:00, Roger Riggs <Roger.Riggs@oracle.com> wrote:
Hi Stephen,
It does not seem natural to me with a pattern of HHMM for it to parse more than 4 digits. I can see lenient modifying the behavior as it it were HHmm, but there is no indication in the pattern that seconds would be considered. How it would it be implied from the spec? The spec is being expanded to define what happens. Previously it didn't define it at all, and would throw an error.
Lenient parsing typically accepts much more than the strict parsing.
When parsing numbers, you may set the SignStyle to NEVER, but the sign will still be parsed in lenient mode
When parsing text, you may select the short output format, but any length of text will be parsed in lenient mode.
As such, it is very much in line with the behavour of the API to parse a much broader format than the one requested in lenient mode. (None of this affects strict mode).
Stephen
In the original issue, appendOffsetId is defined as using the +HH:MM:ss pattern and specific to ISO the MM should be allowed to be optional. There is no question of parsing extra digits not included in the requested pattern.
Separately, this is specifying the new lenient behavior of appendOffset(pattern, noffsetText). In that case, I don't think it will be understood that patterns 'shorter' than the input will gobble up extra digits and ':'s.
Roger
On 2/26/2016 9:42 AM, Stephen Colebourne wrote:
Lenient can be however lenient we define it to be. Allowing minutes and seconds to be parsed when not specified in the pattern is the key part of the change. Whether the parser copes with both colons and no-colons is the choice at hand here. It seems to me that since the parser can easily handle figuring out whether the colon is present or not, we should just allow the parser to be fully lenient.
Stephen
On 26 February 2016 at 14:15, Roger Riggs <Roger.Riggs@oracle.com> wrote:
HI Stephen,
How lenient is lenient supposed to be? Looking at the offset test cases, it seems to allow minutes and seconds digits to be parsed even if the pattern did not include them.
+ @DataProvider(name="lenientOffsetParseData") + Object[][] data_lenient_offset_parse() { + return new Object[][] { + {"+HH", "+01", 3600}, + {"+HH", "+0101", 3660}, + {"+HH", "+010101", 3661}, + {"+HH", "+01", 3600}, + {"+HH", "+01:01", 3660}, + {"+HH", "+01:01:01", 3661},
Thanks, Roger
On 2/26/2016 6:16 AM, Stephen Colebourne wrote:
I don't think this is quite right.
if ((length > position + 3) && (text.charAt(position + 3) == ':')) { parseType = 10; }
This code will *always* select type 10 (colons) if a colon is found at position+3. Whereas the spec now says that it should only do this if the pattern is "HH". For other patterns, the colon/no-colon choice is defined to be based on the pattern.
That said, I'm thinking it is better to make the spec more lenient to match the behaviour as implemented:
When parsing in lenient mode, only the hours are mandatory - minutes and seconds are optional. If the character after the hour digits is a colon then the parser will parse using the pattern "HH:mm:ss", otherwise the parser will parse using the pattern "HHmmss".
Additional TCKDateTimeFormatterBuilder tests will be needed to demonstrate the above. There should also be a test for data following the lenient parse. The following should all succeed:
DateTimeFormatterBuilder().parseLenient().appendOffset("HH:MM").appendZoneId(); "+01:00Europe/London" "+0100Europe/London"
DateTimeFormatterBuilder().parseLenient().appendOffset("HH:MM").appendLiteral(":").appendZoneId(); "+01:Europe/London"
Note this special case, where the colon affects the parse type, but is not ultimately part of the offset, thus it is left to match the appendLiteral(":")
You may want to think of some additional nasty edge cases!
Stephen
On 25 February 2016 at 15:44, nadeesh tv <nadeesh.tv@oracle.com> wrote:
Hi all, Please see the updated webrev http://cr.openjdk.java.net/~ntv/8032051/webrev.02/
Thanks and Regards, Nadeesh
On 2/23/2016 5:17 PM, Stephen Colebourne wrote:
Thanks for the changes.
In `DateTimeFormatter`, the code should be
.parseLenient() .appendOffsetId() .parseStrict()
and the same in the other case. This ensures that existing callers who then embed the formatter in another formatter (like the ZONED_DATE_TIME constant) are unaffected.
The logic for lenient parsing does not look right as it only handles types 5 and 6. This table shows the mappings needed:
"+HH", -> "+HHmmss" or "+HH:mm:ss" "+HHmm", -> "+HHmmss", "+HH:mm", -> "+HH:mm:ss", "+HHMM", -> "+HHmmss", "+HH:MM", -> "+HH:mm:ss", "+HHMMss", -> "+HHmmss", "+HH:MM:ss", -> "+HH:mm:ss", "+HHMMSS", -> "+HHmmss", "+HH:MM:SS", -> "+HH:mm:ss", "+HHmmss", "+HH:mm:ss",
Note that the "+HH" pattern is a special case, as we don't know whether to use the colon or non-colon pattern. Whether to require colon or not is based on whether the next character after the HH is a colon or not.
Proposed appendOffsetId() Javadoc:
* Appends the zone offset, such as '+01:00', to the formatter. * <p> * This appends an instruction to format/parse the offset ID to the builder. * This is equivalent to calling {@code appendOffset("+HH:MM:ss", "Z")}. * See {@link #appendOffset(String, String)} for details on formatting and parsing.
Proposed appendOffset(String, String) Javadoc:
* During parsing, the offset...
changed to:
* When parsing in strict mode, the input must contain the mandatory and optional elements are defined by the specified pattern. * If the offset cannot be parsed then an exception is thrown unless the section of the formatter is optional. * <p> * When parsing in lenient mode, only the hours are mandatory - minutes and seconds are optional. * The colons are required if the specified pattern contains a colon. * If the specified pattern is "+HH", the presence of colons is determined by whether the character after the hour digits is a colon or not. * If the offset cannot be parsed then an exception is thrown unless the section of the formatter is optional.
thanks and sorry for delay Stephen
On 11 February 2016 at 20:22, nadeesh tv <nadeesh.tv@oracle.com> wrote:
Hi all,
Please review a fix for
Bug Id https://bugs.openjdk.java.net/browse/JDK-8032051
webrev http://cr.openjdk.java.net/~ntv/8032051/webrev.01/
-- Thanks and Regards, Nadeesh TV
-- Thanks and Regards, Nadeesh TV
-- Thanks and Regards, Nadeesh TV
Hi Nadeesh, Looks good. Thanks, Roger On 3/3/2016 1:37 PM, nadeesh tv wrote:
Hi,
Stephen, Roger Thanks for the comments.
Please see the updated webrev http://cr.openjdk.java.net/~ntv/8032051/webrev.04/
Regards, Nadeesh
On 3/1/2016 12:29 AM, Stephen Colebourne wrote:
I'm happy to go back to the spec I proposed before. That spec would determine colons dynamically only for pattern HH. Otherwise, it would use the presence/absence of a colon in the pattern as the signal. That would deal with the ISO-8601 problem and resolve the original issue (as ISO_OFFSET_DATE_TIME uses HH:MM:ss, which would leniently parse using colons).
Writing the spec wording is not easy however. I had:
When parsing in lenient mode, only the hours are mandatory - minutes and seconds are optional. The colons are required if the specified pattern contains a colon. If the specified pattern is "+HH", the presence of colons is determined by whether the character after the hour digits is a colon or not. If the offset cannot be parsed then an exception is thrown unless the section of the formatter is optional.
which isn't too bad but alternatives are possible.
Stephen
On 29 February 2016 at 15:52, Roger Riggs <Roger.Riggs@oracle.com> wrote:
Hi Stephen,
As a fix for the original issue[1], not correctly parsing a ISO defined offset, the use of lenient was a convenient implementation technique (hack). But with the expanded definition of lenient, it will allow many forms of the offset that are not allowed by the ISO specification and should not be accepted forDateTimeFormatter. ISO_OFFSET_DATE_TIME. In particular, ISO requires the ":" to separate the minutes. I'm not sure how to correctly fix the original issue with the new specification of lenient offset parsing without introducing some more specific implementation information.
WRT the lenient parsing mode for appendOffset:
I was considering that the subfields of the offset were to be treated leniently but it seems you were treating the entire offset field and text as the unit to be treated leniently. The spec for lenient parsing would be clearer if it were specified as allowing any of the patterns of appendOffset. The current wording around the character after the hour may be confusing.
In the specification of appendOffset(pattern, noOffsetText) how about:
"When parsing in lenient mode, the longest valid pattern that matches the input is used. Only the hours are mandatory, minutes and seconds are optional."
Roger
[1] https://bugs.openjdk.java.net/browse/JDK-8032051
On 2/26/2016 1:10 PM, Stephen Colebourne wrote:
It is important to also consider the case where the user wants to format using HH:MM but parse seconds if they are provided.
As I said above, this is no different to SignStyle, where the user requests something specific on format, but accepts anything on input.
The pattern is still used for formatting and strict parsing under these changes. It is effectively ignored in lenient parsing (which is the very definition of leniency).
Another way to look at it:
using a pattern of HH:MM and strict: +02 - disallowed +02:00 - allowed +02:00:00 - disallowed
using a pattern of HH:mm and strict: +02 - allowed +02:00 - allowed +02:00:00 - disallowed
using any pattern and lenient: +02 - allowed +02:00 - allowed +02:00:00 - allowed
This covers pretty much anything a user needs when parsing.
Stephen
On 26 February 2016 at 17:38, Roger Riggs <Roger.Riggs@oracle.com> wrote:
Hi Stephen,
Even in lenient mode the parser needs to stick to the fields provided in the pattern. If the caller intends to parse seconds, the pattern should include seconds. Otherwise the caller has not been able to specify their intent. That's consistent with lenient mode used in the other fields. Otherwise, the pattern is irrelevant except for whether it contains a ":" and makes the spec nearly useless.
Roger
On 2/26/2016 12:09 PM, Stephen Colebourne wrote:
On 26 February 2016 at 15:00, Roger Riggs <Roger.Riggs@oracle.com> wrote: > Hi Stephen, > > It does not seem natural to me with a pattern of HHMM for it to > parse > more > than 4 digits. > I can see lenient modifying the behavior as it it were HHmm, but > there > is > no > indication in the pattern > that seconds would be considered. How it would it be implied > from the > spec? The spec is being expanded to define what happens. Previously it didn't define it at all, and would throw an error.
Lenient parsing typically accepts much more than the strict parsing.
When parsing numbers, you may set the SignStyle to NEVER, but the sign will still be parsed in lenient mode
When parsing text, you may select the short output format, but any length of text will be parsed in lenient mode.
As such, it is very much in line with the behavour of the API to parse a much broader format than the one requested in lenient mode. (None of this affects strict mode).
Stephen
> In the original issue, appendOffsetId is defined as using the > +HH:MM:ss > pattern and > specific to ISO the MM should be allowed to be optional. There > is no > question of parsing > extra digits not included in the requested pattern. > > Separately, this is specifying the new lenient behavior of > appendOffset(pattern, noffsetText). > In that case, I don't think it will be understood that patterns > 'shorter' > than the input will > gobble up extra digits and ':'s. > > Roger > > > > > > On 2/26/2016 9:42 AM, Stephen Colebourne wrote: > > Lenient can be however lenient we define it to be. Allowing minutes > and seconds to be parsed when not specified in the pattern is > the key > part of the change. Whether the parser copes with both colons and > no-colons is the choice at hand here. It seems to me that since the > parser can easily handle figuring out whether the colon is > present or > not, we should just allow the parser to be fully lenient. > > Stephen > > > On 26 February 2016 at 14:15, Roger Riggs <Roger.Riggs@oracle.com> > wrote: > > HI Stephen, > > How lenient is lenient supposed to be? Looking at the offset test > cases, > it > seems to allow minutes > and seconds digits to be parsed even if the pattern did not include > them. > > + @DataProvider(name="lenientOffsetParseData") > + Object[][] data_lenient_offset_parse() { > + return new Object[][] { > + {"+HH", "+01", 3600}, > + {"+HH", "+0101", 3660}, > + {"+HH", "+010101", 3661}, > + {"+HH", "+01", 3600}, > + {"+HH", "+01:01", 3660}, > + {"+HH", "+01:01:01", 3661}, > > Thanks, Roger > > > > On 2/26/2016 6:16 AM, Stephen Colebourne wrote: > > I don't think this is quite right. > > if ((length > position + 3) && (text.charAt(position + 3) == > ':')) { > parseType = 10; > } > > This code will *always* select type 10 (colons) if a colon is > found at > position+3. Whereas the spec now says that it should only do > this if > the pattern is "HH". For other patterns, the colon/no-colon > choice is > defined to be based on the pattern. > > That said, I'm thinking it is better to make the spec more > lenient to > match the behaviour as implemented: > > > When parsing in lenient mode, only the hours are mandatory - > minutes > and seconds are optional. If the character after the hour digits > is a > colon > then the parser will parse using the pattern "HH:mm:ss", > otherwise the > parser will parse using the pattern "HHmmss". > > > Additional TCKDateTimeFormatterBuilder tests will be needed to > demonstrate the above. There should also be a test for data > following > the lenient parse. The following should all succeed: > > > > > DateTimeFormatterBuilder().parseLenient().appendOffset("HH:MM").appendZoneId(); > > "+01:00Europe/London" > "+0100Europe/London" > > > > > DateTimeFormatterBuilder().parseLenient().appendOffset("HH:MM").appendLiteral(":").appendZoneId(); > > "+01:Europe/London" > > Note this special case, where the colon affects the parse type, > but is > not ultimately part of the offset, thus it is left to match the > appendLiteral(":") > > You may want to think of some additional nasty edge cases! > > Stephen > > On 25 February 2016 at 15:44, nadeesh tv <nadeesh.tv@oracle.com> > wrote: > > Hi all, > Please see the updated webrev > http://cr.openjdk.java.net/~ntv/8032051/webrev.02/ > > Thanks and Regards, > Nadeesh > > On 2/23/2016 5:17 PM, Stephen Colebourne wrote: > > Thanks for the changes. > > In `DateTimeFormatter`, the code should be > > .parseLenient() > .appendOffsetId() > .parseStrict() > > and the same in the other case. This ensures that existing > callers who > then embed the formatter in another formatter (like the > ZONED_DATE_TIME constant) are unaffected. > > > The logic for lenient parsing does not look right as it only > handles > types 5 and 6. This table shows the mappings needed: > > "+HH", -> "+HHmmss" or "+HH:mm:ss" > "+HHmm", -> "+HHmmss", > "+HH:mm", -> "+HH:mm:ss", > "+HHMM", -> "+HHmmss", > "+HH:MM", -> "+HH:mm:ss", > "+HHMMss", -> "+HHmmss", > "+HH:MM:ss", -> "+HH:mm:ss", > "+HHMMSS", -> "+HHmmss", > "+HH:MM:SS", -> "+HH:mm:ss", > "+HHmmss", > "+HH:mm:ss", > > Note that the "+HH" pattern is a special case, as we don't know > whether to use the colon or non-colon pattern. Whether to require > colon or not is based on whether the next character after the HH > is a > colon or not. > > Proposed appendOffsetId() Javadoc: > > * Appends the zone offset, such as '+01:00', to the formatter. > * <p> > * This appends an instruction to format/parse the offset ID to the > builder. > * This is equivalent to calling {@code appendOffset("+HH:MM:ss", > "Z")}. > * See {@link #appendOffset(String, String)} for details on > formatting > and parsing. > > Proposed appendOffset(String, String) Javadoc: > > * During parsing, the offset... > > changed to: > > * When parsing in strict mode, the input must contain the mandatory > and optional elements are defined by the specified pattern. > * If the offset cannot be parsed then an exception is thrown unless > the section of the formatter is optional. > * <p> > * When parsing in lenient mode, only the hours are mandatory - > minutes > and seconds are optional. > * The colons are required if the specified pattern contains a > colon. > * If the specified pattern is "+HH", the presence of colons is > determined by whether the character after the hour digits is a > colon > or not. > * If the offset cannot be parsed then an exception is thrown unless > the section of the formatter is optional. > > thanks and sorry for delay > Stephen > > > > On 11 February 2016 at 20:22, nadeesh tv <nadeesh.tv@oracle.com> > wrote: > > Hi all, > > Please review a fix for > > Bug Id https://bugs.openjdk.java.net/browse/JDK-8032051 > > webrev http://cr.openjdk.java.net/~ntv/8032051/webrev.01/ > > -- > Thanks and Regards, > Nadeesh TV > > -- > Thanks and Regards, > Nadeesh TV > >
In TCKDateTimeFormatterBuilder there is a commented out line in one of the new tests. Should be removed. No need for another review for that - happy otherwise. thanks for the work! Stephen On 3 March 2016 at 18:37, nadeesh tv <nadeesh.tv@oracle.com> wrote:
Hi,
Stephen, Roger Thanks for the comments.
Please see the updated webrev http://cr.openjdk.java.net/~ntv/8032051/webrev.04/
Regards, Nadeesh
On 3/1/2016 12:29 AM, Stephen Colebourne wrote:
I'm happy to go back to the spec I proposed before. That spec would determine colons dynamically only for pattern HH. Otherwise, it would use the presence/absence of a colon in the pattern as the signal. That would deal with the ISO-8601 problem and resolve the original issue (as ISO_OFFSET_DATE_TIME uses HH:MM:ss, which would leniently parse using colons).
Writing the spec wording is not easy however. I had:
When parsing in lenient mode, only the hours are mandatory - minutes and seconds are optional. The colons are required if the specified pattern contains a colon. If the specified pattern is "+HH", the presence of colons is determined by whether the character after the hour digits is a colon or not. If the offset cannot be parsed then an exception is thrown unless the section of the formatter is optional.
which isn't too bad but alternatives are possible.
Stephen
On 29 February 2016 at 15:52, Roger Riggs <Roger.Riggs@oracle.com> wrote:
Hi Stephen,
As a fix for the original issue[1], not correctly parsing a ISO defined offset, the use of lenient was a convenient implementation technique (hack). But with the expanded definition of lenient, it will allow many forms of the offset that are not allowed by the ISO specification and should not be accepted forDateTimeFormatter. ISO_OFFSET_DATE_TIME. In particular, ISO requires the ":" to separate the minutes. I'm not sure how to correctly fix the original issue with the new specification of lenient offset parsing without introducing some more specific implementation information.
WRT the lenient parsing mode for appendOffset:
I was considering that the subfields of the offset were to be treated leniently but it seems you were treating the entire offset field and text as the unit to be treated leniently. The spec for lenient parsing would be clearer if it were specified as allowing any of the patterns of appendOffset. The current wording around the character after the hour may be confusing.
In the specification of appendOffset(pattern, noOffsetText) how about:
"When parsing in lenient mode, the longest valid pattern that matches the input is used. Only the hours are mandatory, minutes and seconds are optional."
Roger
[1] https://bugs.openjdk.java.net/browse/JDK-8032051
On 2/26/2016 1:10 PM, Stephen Colebourne wrote:
It is important to also consider the case where the user wants to format using HH:MM but parse seconds if they are provided.
As I said above, this is no different to SignStyle, where the user requests something specific on format, but accepts anything on input.
The pattern is still used for formatting and strict parsing under these changes. It is effectively ignored in lenient parsing (which is the very definition of leniency).
Another way to look at it:
using a pattern of HH:MM and strict: +02 - disallowed +02:00 - allowed +02:00:00 - disallowed
using a pattern of HH:mm and strict: +02 - allowed +02:00 - allowed +02:00:00 - disallowed
using any pattern and lenient: +02 - allowed +02:00 - allowed +02:00:00 - allowed
This covers pretty much anything a user needs when parsing.
Stephen
On 26 February 2016 at 17:38, Roger Riggs <Roger.Riggs@oracle.com> wrote:
Hi Stephen,
Even in lenient mode the parser needs to stick to the fields provided in the pattern. If the caller intends to parse seconds, the pattern should include seconds. Otherwise the caller has not been able to specify their intent. That's consistent with lenient mode used in the other fields. Otherwise, the pattern is irrelevant except for whether it contains a ":" and makes the spec nearly useless.
Roger
On 2/26/2016 12:09 PM, Stephen Colebourne wrote:
On 26 February 2016 at 15:00, Roger Riggs <Roger.Riggs@oracle.com> wrote: > > Hi Stephen, > > It does not seem natural to me with a pattern of HHMM for it to parse > more > than 4 digits. > I can see lenient modifying the behavior as it it were HHmm, but > there > is > no > indication in the pattern > that seconds would be considered. How it would it be implied from > the > spec?
The spec is being expanded to define what happens. Previously it didn't define it at all, and would throw an error.
Lenient parsing typically accepts much more than the strict parsing.
When parsing numbers, you may set the SignStyle to NEVER, but the sign will still be parsed in lenient mode
When parsing text, you may select the short output format, but any length of text will be parsed in lenient mode.
As such, it is very much in line with the behavour of the API to parse a much broader format than the one requested in lenient mode. (None of this affects strict mode).
Stephen
> In the original issue, appendOffsetId is defined as using the > +HH:MM:ss > pattern and > specific to ISO the MM should be allowed to be optional. There is > no > question of parsing > extra digits not included in the requested pattern. > > Separately, this is specifying the new lenient behavior of > appendOffset(pattern, noffsetText). > In that case, I don't think it will be understood that patterns > 'shorter' > than the input will > gobble up extra digits and ':'s. > > Roger > > > > > > On 2/26/2016 9:42 AM, Stephen Colebourne wrote: > > Lenient can be however lenient we define it to be. Allowing minutes > and seconds to be parsed when not specified in the pattern is the key > part of the change. Whether the parser copes with both colons and > no-colons is the choice at hand here. It seems to me that since the > parser can easily handle figuring out whether the colon is present or > not, we should just allow the parser to be fully lenient. > > Stephen > > > On 26 February 2016 at 14:15, Roger Riggs <Roger.Riggs@oracle.com> > wrote: > > HI Stephen, > > How lenient is lenient supposed to be? Looking at the offset test > cases, > it > seems to allow minutes > and seconds digits to be parsed even if the pattern did not include > them. > > + @DataProvider(name="lenientOffsetParseData") > + Object[][] data_lenient_offset_parse() { > + return new Object[][] { > + {"+HH", "+01", 3600}, > + {"+HH", "+0101", 3660}, > + {"+HH", "+010101", 3661}, > + {"+HH", "+01", 3600}, > + {"+HH", "+01:01", 3660}, > + {"+HH", "+01:01:01", 3661}, > > Thanks, Roger > > > > On 2/26/2016 6:16 AM, Stephen Colebourne wrote: > > I don't think this is quite right. > > if ((length > position + 3) && (text.charAt(position + 3) == ':')) { > parseType = 10; > } > > This code will *always* select type 10 (colons) if a colon is found > at > position+3. Whereas the spec now says that it should only do this if > the pattern is "HH". For other patterns, the colon/no-colon choice is > defined to be based on the pattern. > > That said, I'm thinking it is better to make the spec more lenient to > match the behaviour as implemented: > > > When parsing in lenient mode, only the hours are mandatory - minutes > and seconds are optional. If the character after the hour digits is a > colon > then the parser will parse using the pattern "HH:mm:ss", otherwise > the > parser will parse using the pattern "HHmmss". > > > Additional TCKDateTimeFormatterBuilder tests will be needed to > demonstrate the above. There should also be a test for data following > the lenient parse. The following should all succeed: > > > > > > DateTimeFormatterBuilder().parseLenient().appendOffset("HH:MM").appendZoneId(); > "+01:00Europe/London" > "+0100Europe/London" > > > > > > DateTimeFormatterBuilder().parseLenient().appendOffset("HH:MM").appendLiteral(":").appendZoneId(); > "+01:Europe/London" > > Note this special case, where the colon affects the parse type, but > is > not ultimately part of the offset, thus it is left to match the > appendLiteral(":") > > You may want to think of some additional nasty edge cases! > > Stephen > > On 25 February 2016 at 15:44, nadeesh tv <nadeesh.tv@oracle.com> > wrote: > > Hi all, > Please see the updated webrev > http://cr.openjdk.java.net/~ntv/8032051/webrev.02/ > > Thanks and Regards, > Nadeesh > > On 2/23/2016 5:17 PM, Stephen Colebourne wrote: > > Thanks for the changes. > > In `DateTimeFormatter`, the code should be > > .parseLenient() > .appendOffsetId() > .parseStrict() > > and the same in the other case. This ensures that existing callers > who > then embed the formatter in another formatter (like the > ZONED_DATE_TIME constant) are unaffected. > > > The logic for lenient parsing does not look right as it only handles > types 5 and 6. This table shows the mappings needed: > > "+HH", -> "+HHmmss" or "+HH:mm:ss" > "+HHmm", -> "+HHmmss", > "+HH:mm", -> "+HH:mm:ss", > "+HHMM", -> "+HHmmss", > "+HH:MM", -> "+HH:mm:ss", > "+HHMMss", -> "+HHmmss", > "+HH:MM:ss", -> "+HH:mm:ss", > "+HHMMSS", -> "+HHmmss", > "+HH:MM:SS", -> "+HH:mm:ss", > "+HHmmss", > "+HH:mm:ss", > > Note that the "+HH" pattern is a special case, as we don't know > whether to use the colon or non-colon pattern. Whether to require > colon or not is based on whether the next character after the HH is a > colon or not. > > Proposed appendOffsetId() Javadoc: > > * Appends the zone offset, such as '+01:00', to the formatter. > * <p> > * This appends an instruction to format/parse the offset ID to the > builder. > * This is equivalent to calling {@code appendOffset("+HH:MM:ss", > "Z")}. > * See {@link #appendOffset(String, String)} for details on formatting > and parsing. > > Proposed appendOffset(String, String) Javadoc: > > * During parsing, the offset... > > changed to: > > * When parsing in strict mode, the input must contain the mandatory > and optional elements are defined by the specified pattern. > * If the offset cannot be parsed then an exception is thrown unless > the section of the formatter is optional. > * <p> > * When parsing in lenient mode, only the hours are mandatory - > minutes > and seconds are optional. > * The colons are required if the specified pattern contains a colon. > * If the specified pattern is "+HH", the presence of colons is > determined by whether the character after the hour digits is a colon > or not. > * If the offset cannot be parsed then an exception is thrown unless > the section of the formatter is optional. > > thanks and sorry for delay > Stephen > > > > On 11 February 2016 at 20:22, nadeesh tv <nadeesh.tv@oracle.com> > wrote: > > Hi all, > > Please review a fix for > > Bug Id https://bugs.openjdk.java.net/browse/JDK-8032051 > > webrev http://cr.openjdk.java.net/~ntv/8032051/webrev.01/ > > -- > Thanks and Regards, > Nadeesh TV > > -- > Thanks and Regards, > Nadeesh TV > >
-- Thanks and Regards, Nadeesh TV
participants (3)
-
nadeesh tv
-
Roger Riggs
-
Stephen Colebourne