[threeten-dev] Lenient parsing of 2 and 4 digit values (years)

roger riggs roger.riggs at oracle.com
Sat Apr 20 13:29:34 PDT 2013


Hi,

I'm not sure I understand all of the logic/code around parseAdjacent mode.
It appears to be active only after a non-fixed-width parser.

ReducedPrinterParser is defined to be statically fixed width, but whether
parsing is strict or lenient isn't known until the parser is invoked
during parsing.

Because it is treated as fixed width, the mechanism for accumulating
subsequentWidth is not active for it. If it were, ReducedPrinterParser
could avoid eagerly consuming all the digits and leave them for
subsequent parsers.
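
For illustration, a standalone sketch of that reservation (hypothetical class and method names, not the ThreeTen code; it assumes the input is a plain run of digits with no sign): the leading variable-width field takes only the digits left over after the widths of the following fixed-width fields have been set aside.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch of adjacent value parsing: a leading variable-width
 * field followed by fixed-width fields. Not the ThreeTen implementation.
 */
final class AdjacentParseSketch {

    /**
     * Splits a run of digits into fields. The first field has variable width;
     * field i+1 has the fixed width fixedWidths[i]. Returns null if the text
     * is not all digits or leaves no digit for the first field.
     */
    static long[] parse(String text, int[] fixedWidths) {
        // Digits that must be left for the fixed-width fields that follow
        // (the quantity accumulated as "subsequentWidth").
        int subsequentWidth = Arrays.stream(fixedWidths).sum();
        int firstWidth = text.length() - subsequentWidth;
        if (firstWidth < 1 || !text.chars().allMatch(Character::isDigit)) {
            return null;
        }
        long[] values = new long[fixedWidths.length + 1];
        // The variable-width field stops early instead of consuming every digit.
        values[0] = Long.parseLong(text.substring(0, firstWidth));
        int pos = firstWidth;
        for (int i = 0; i < fixedWidths.length; i++) {
            values[i + 1] = Long.parseLong(text.substring(pos, pos + fixedWidths[i]));
            pos += fixedWidths[i];
        }
        return values;
    }

    public static void main(String[] args) {
        // Variable-width year followed by MM and dd, as in yyyyMMdd.
        System.out.println(Arrays.toString(parse("20130420", new int[] {2, 2}))); // [2013, 4, 20]
    }
}
```

If the year field were greedy instead, it would swallow all eight digits and leave nothing for the month and day.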

If the strict vs lenient setting were visible while the formatter is being
built, ReducedPrinterParser could take it into account and be created
as fixed or variable width as appropriate.

I'll look at SimpleDateFormat...

Roger


On 4/20/2013 4:16 AM, Stephen Colebourne wrote:
> The CLDR document does not explicitly cover this
> http://www.unicode.org/reports/tr35/tr35-dates.html#Parsing_Dates_Times
>
> Personally, I'm not convinced that a single-digit user entry "1" is
> clear enough to treat as either year 2001 or year 1. SimpleDateFormat
> does, however, allow this, and chooses to interpret it as year 1. If
> we're going to allow it, then it should match SimpleDateFormat, not
> what you currently have.
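
SimpleDateFormat's documented rule can be checked directly. In this sketch the "MM/dd/yy" pattern, Locale.US and the 1950 window start are choices made for the example: only exactly two digits are mapped into the 100-year window, while any other digit count is taken literally, so "1" becomes year 1.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.Locale;

final class SimpleDateFormatYears {

    /** Parses text with the pattern "MM/dd/yy" and returns the calendar year. */
    static int parsedYear(String text) {
        // Locale.US keeps a Gregorian calendar regardless of the default locale.
        SimpleDateFormat format = new SimpleDateFormat("MM/dd/yy", Locale.US);
        // Fix the two-digit window at 1950..2049 so results don't depend on today's date.
        format.set2DigitYearStart(new GregorianCalendar(1950, Calendar.JANUARY, 1).getTime());
        try {
            Calendar calendar = new GregorianCalendar(format.getTimeZone(), Locale.US);
            calendar.setTime(format.parse(text));
            return calendar.get(Calendar.YEAR);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Unparseable: " + text, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parsedYear("04/20/13"));   // exactly two digits: windowed
        System.out.println(parsedYear("04/20/2013")); // four digits: literal
        System.out.println(parsedYear("04/20/1"));    // one digit: literal
    }
}
```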
>
> In adjacent parsing mode, the width needs to be fixed rather than
> varying with leniency. This handles a pattern like ddMMyy.
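
With the java.time API as eventually released in Java SE 8 (which may differ from the snapshot under discussion here), a ddMMyy pattern parses without separators because each field has a fixed width of two digits and "yy" is a reduced year with base 2000. The class and method names are only for the example.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

final class CompactPatternExample {

    static LocalDate parseCompact(String text) {
        // "dd" and "MM" are fixed width 2; "yy" is a reduced two-digit year,
        // so the three adjacent fields can be split without separators.
        DateTimeFormatter formatter = DateTimeFormatter.ofPattern("ddMMyy");
        return LocalDate.parse(text, formatter);
    }

    public static void main(String[] args) {
        System.out.println(parseCompact("200413")); // 2013-04-20
    }
}
```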
>
> thanks
> Stephen
>
>
>
> On 19 April 2013 21:48, roger riggs <roger.riggs at oracle.com> wrote:
>> Hi Sherman,
>>
>> Yes, I missed updating some of the javadoc comments.
>>
>> To work effectively, the reducedValue mechanism should be associated with
>> the "yy" patterns, but the default is strict. It seemed important to be
>> able to leverage the locale-specific date pattern info for the ordering
>> and separators of the fields. I suppose the question is whether this kind
>> of leniency is a reasonable interpretation or usage within the CLDR
>> descriptions.
>>
>> Expanding the definition of the reducedValue seemed like a reasonable
>> extension.
>>
>> Roger
>>
>>
>>
>> On 4/19/2013 3:20 PM, Xueming Shen wrote:
>>> I'm a little concerned about the approach of using the reducedValue
>>> parsing/lenient mode to address this 2 and 4 digit values issue,
>>> especially since it introduces uncertainty about whether the baseValue
>>> is added, depending on the width of the input value, while the method
>>> itself was originally designed to add the baseValue to a "reduced"
>>> input. There are also words later that say "This is a fixed width parser
>>> operating using 'adjacent value parsing'..."; does anything need
>>> rewording there?
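
One way to remove that uncertainty, sketched here as a hypothetical alternative rather than anything in the webrev, is to key the decision on how many digits were actually parsed instead of on the magnitude of the value. This mirrors SimpleDateFormat's documented rule for "yy", where only exactly two digits are windowed. The class name is illustrative, and the sketch assumes a non-negative baseValue and an all-digit input string.

```java
/**
 * Hypothetical alternative: add the base only when exactly `width` digits
 * were parsed; any other digit count is taken literally. Assumes a
 * non-negative baseValue and a string of ASCII digits.
 */
final class DigitCountReduction {

    static long resolve(String digits, int width, int baseValue) {
        long value = Long.parseLong(digits);
        if (digits.length() != width) {
            return value; // e.g. "1" -> 1, "0013" -> 13, "2013" -> 2013
        }
        long range = 1;
        for (int i = 0; i < width; i++) {
            range *= 10; // 10^width
        }
        long basePart = baseValue - baseValue % range; // e.g. 1900 for base 1950
        long result = basePart + value;
        if (result < baseValue) {
            result += range; // keep the result in [baseValue, baseValue + range)
        }
        return result;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"13", "99", "2013", "0013", "1"}) {
            System.out.println(s + " -> " + resolve(s, 2, 1950));
        }
    }
}
```

With base 1950, "13" gives 2013 and "99" gives 1999, while "0013" and "1" are read literally as years 13 and 1.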
>>>
>>> -Sherman
>>>
>>> On 04/19/2013 11:26 AM, roger riggs wrote:
>>>> Hi,
>>>>
>>>> As noted in #218 Parsing 2 and 4 digit years
>>>> <https://github.com/ThreeTen/threeten/issues/218>, the lenient mode of
>>>> reducedValue parsing should be defined to allow parsing of absolute
>>>> values with more digits than the defined field width.
>>>>
>>>> Webrev:
>>>>     http://cr.openjdk.java.net/~rriggs/webrev-lenient-parse-218/
>>>>
>>>> Modified DateTimeFormatterBuilder.appendValueReduced to define its
>>>> behavior in lenient mode.
>>>>
>>>> Modified NumberPrinterParser in lenient mode to accept 1..19 digits.
>>>> Note that the API/implementation only works for a base value below
>>>> Integer.MAX_VALUE (~10 digits), although the field size may be defined
>>>> larger. Removed the width constraint for "fixed" fields unless strict.
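
A rough sketch of a mode-dependent digit count (hypothetical names, not the actual NumberPrinterParser code): strict mode requires exactly `width` digits, lenient mode accepts 1 to 19. Nineteen is the most decimal digits a long can hold, but not every 19-digit string fits, so the sketch checks for overflow.

```java
/**
 * Hypothetical sketch (not the ThreeTen NumberPrinterParser): the accepted
 * digit count depends on the parse mode. Strict: exactly `width` digits.
 * Lenient: 1 to 19 digits, 19 being the most a long can hold.
 */
final class LenientDigitRun {

    /** Returns the parsed value, or null if the digit count or magnitude is rejected. */
    static Long parse(String text, int width, boolean strict) {
        int minDigits = strict ? width : 1;
        int maxDigits = strict ? width : 19;
        if (text.length() < minDigits || text.length() > maxDigits) {
            return null;
        }
        long total = 0;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < '0' || c > '9') {
                return null; // only ASCII digits; no sign is accepted
            }
            try {
                total = Math.addExact(Math.multiplyExact(total, 10L), c - '0');
            } catch (ArithmeticException overflow) {
                return null; // e.g. "9999999999999999999" exceeds Long.MAX_VALUE
            }
        }
        return total;
    }
}
```

With width 2, "2013" is rejected in strict mode but accepted as 2013 in lenient mode.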
>>>>
>>>> Modified ReducedPrinterParser.setValue to apply the base only if
>>>> isStrict or the parsed value is less than the range (10^width).
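
A minimal sketch of the setValue rule as described, not the code in the webrev: the class name is hypothetical, and it assumes a non-negative base and a range of 10^width (100 for a two-digit year).

```java
/**
 * Hypothetical sketch of the proposed ReducedPrinterParser.setValue rule:
 * apply the base when strict, or when the parsed value is below the range
 * 10^width; otherwise keep the value as is. Assumes a non-negative baseValue.
 */
final class ProposedReducedValue {

    static long setValue(long value, int width, int baseValue, boolean strict) {
        long range = 1;
        for (int i = 0; i < width; i++) {
            range *= 10; // e.g. 100 for a two-digit field
        }
        if (!strict && value >= range) {
            return value; // lenient and too large to be reduced: absolute value
        }
        long basePart = baseValue - baseValue % range; // e.g. 2000 for base 2000
        long result = basePart + value;
        if (result < baseValue) {
            result += range; // keep the result in [baseValue, baseValue + range)
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(setValue(13, 2, 2000, false));   // 2013
        System.out.println(setValue(2013, 2, 2000, false)); // 2013
        System.out.println(setValue(1, 2, 2000, false));    // 2001
    }
}
```

Under this rule "1", "01" and "0001" all parse to the value 1 and therefore yield 2001; the decision depends on the magnitude of the value, not on how many digits were typed.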
>>>>
>>>> It is not possible to enter negative values, since ReducedPrinterParser
>>>> prohibits any sign character. Making sign parsing lenient as well should
>>>> be considered. Currently, entering a sign is prohibited in a "fixed"
>>>> length field; in lenient mode, the definition of "fixed" is diluted.
>>>>
>>>> It is also not possible to enter absolute values smaller than the range
>>>> (for example, year 1 in a two-digit field): all small values are
>>>> combined with the base. The only interesting case, though, is the year,
>>>> where the likely base values are in the current or past centuries.
>>>>
>>>> Thanks, Roger
>>>>


