<i18n dev> Date parsing issues with SimpleDateFormat and DateTimeFormatter

Lothar Kimmeringer job at kimmeringer.de
Tue Oct 10 10:47:58 UTC 2023


Hi,

this is my first mail to this list, so please be gentle ;-)

I've encountered issues when trying to preserve date parsing functionality while
migrating from Java 8 to Java 11. This happened a while ago and I implemented
local workarounds, but on installations using more recent versions of Java 11
things broke again, so I'm not sure if I'm simply doing things wrong or if there
are actual bugs.

I've attached my JUnit class that covers the different issues (not as single
tests, but I will highlight them here in this mail). The class uses
SimpleDateFormat, but I've added a test that uses DateTimeFormatter to make
sure that the old classes aren't the cause and that the problem persists in
the new API as well.

Most issues come up when trying to parse abbreviated months with Locales
other than "en". Our use case is that data with the same date layout
but different Locales is parsed (e.g. Ebay revenue summary CSV files or FTP
servers on German Windows installations). The dates used there are of the form

  - Ebay: 18. Mär 2023
  - FTP server: Mär 14  2022

This worked well with MMM in the template up to Java 8. Then LLL was introduced,
and MMM now yields a four-character abbreviation that includes a dot. Btw: I
think the Javadoc that explains the template parts (e.g. in SimpleDateFormat)
should have an additional column containing an example for a non-EN Locale,
because

M    Month in year (context sensitive)  Month  July; Jul; 07
L    Month in year (standalone form)    Month  July; Jul; 07

doesn't help at all to see the effect of these two template parts, so e.g.

M    Month ... (context...)             Month   January;...   Januar; Jan.; 01
L    Month ... (standalone...)          Month   January;...   Januar; Jan; 01

might be better for understanding it.
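
To see the difference between the two template parts on a given JDK, a
minimal sketch along the following lines can be used. The class name
MonthFormDemo and the helper monthText are made up for illustration. The
printed values depend on the JDK version and its locale data, so no expected
output is listed.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

class MonthFormDemo {
    // Formats a fixed December date with the given month pattern and locale.
    static String monthText(String pattern, Locale locale) {
        return DateTimeFormatter.ofPattern(pattern, locale)
                .format(LocalDate.of(2016, 12, 23));
    }

    public static void main(String[] args) {
        for (Locale locale : new Locale[] { Locale.US, Locale.GERMANY }) {
            // Results vary with the JDK version and the locale data it ships,
            // which is exactly what this demo is meant to reveal.
            System.out.println(locale + " MMM -> " + monthText("MMM", locale));
            System.out.println(locale + " LLL -> " + monthText("LLL", locale));
        }
    }
}
```

Running this on both the old and the new runtime shows directly which
abbreviation each one expects when parsing.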

With the use of LLL all tests with dates without a dot can now be parsed
again using the same mask. But it's not possible to parse a date where
the month is always abbreviated with a dot in a consistent way, e.g.

23. Dez. 2016 11:12:13.456
using the template
dd. LLL. yyyy HH:mm:ss.SSS

It works with Locale en (with "Dec" as month of course) but not with "de".

The reason is that SimpleDateFormat uses all month display names when
parsing "month standalone". That also includes the abbreviated months with
dots. Because these are in general longer than their standalone
counterparts (except three-letter months like "Mai" in German),
matchString considers them the best match and "consumes" the dot in the text
to be parsed, which is then missing when the parsing continues.
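
The effect described above can be reproduced with a simplified model of a
greedy longest-match parser. This is not the JDK's matchString implementation;
the class LongestMatchSketch, its methods and the candidate lists are
assumptions made for illustration only.

```java
import java.util.Arrays;
import java.util.List;

class LongestMatchSketch {
    // Returns the longest candidate that the text starts with at offset pos, or null.
    static String longestMatch(String text, int pos, List<String> candidates) {
        String best = null;
        for (String c : candidates) {
            if (text.regionMatches(true, pos, c, 0, c.length())
                    && (best == null || c.length() > best.length())) {
                best = c;
            }
        }
        return best;
    }

    // Models the template "LLL." : a month name followed by a literal dot.
    static boolean parseMonthThenDot(String text, List<String> candidates) {
        String month = longestMatch(text, 0, candidates);
        if (month == null) {
            return false;
        }
        int pos = month.length();
        // The literal dot from the template must still be present in the text.
        return pos < text.length() && text.charAt(pos) == '.';
    }

    public static void main(String[] args) {
        // Assumed candidate sets: "de" offers both forms, "en" only one.
        List<String> deNames = Arrays.asList("Dez", "Dez.");
        List<String> enNames = Arrays.asList("Dec");
        System.out.println(parseMonthThenDot("Dez. 2016", deNames)); // false
        System.out.println(parseMonthThenDot("Dec. 2016", enNames)); // true
    }
}
```

In the "de" case the longer candidate "Dez." wins and swallows the dot, so the
literal dot from the template no longer finds a match.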

DateTimeFormatter seems to work differently because it doesn't fail at that
point (I haven't debugged it), but it fails when trying to parse Russian dates
without abbreviating dots. I assume that is because the ru Locale doesn't
seem to have values for the standalone month. I could live with that given
our user base, but the parser in java.time runs into problems when parsing
a time with milliseconds: you need to provide as many "S" as there are digits
in the value:

  - "23. Dec. 2016 11:12:13.456" needs "dd. LLL. yyyy HH:mm:ss.SSS",
    it doesn't work with "dd. LLL. yyyy HH:mm:ss.S"
  - "23. Dec. 2016 11:12:13.4"   needs "dd. LLL. yyyy HH:mm:ss.S",
    it doesn't work with "dd. LLL. yyyy HH:mm:ss.SSS"

When handling data from different sources where one source cuts off
trailing zeros and the other doesn't, you essentially need to pre-parse
the text to pick the correct template for the actual parsing.

SimpleDateFormat parses the date correctly, independently of the
number of "S" in the template and the actual number of digits
in the text to be parsed.
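
One possible way to accept a varying number of fraction digits in java.time is
DateTimeFormatterBuilder.appendFraction instead of a fixed run of "S". The
following is a minimal sketch under assumptions: the class name
FractionWidthSketch is made up, and a numeric month is used so that the month
name issues above stay out of the picture.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.temporal.ChronoField;

class FractionWidthSketch {
    // Numeric date pattern, chosen to keep localized month names out of this example.
    static final DateTimeFormatter FORMATTER = new DateTimeFormatterBuilder()
            .appendPattern("dd.MM.yyyy HH:mm:ss")
            // Accepts 1 to 3 fraction digits, preceded by a decimal point.
            .appendFraction(ChronoField.NANO_OF_SECOND, 1, 3, true)
            .toFormatter();

    static LocalDateTime parse(String text) {
        return LocalDateTime.parse(text, FORMATTER);
    }

    public static void main(String[] args) {
        System.out.println(parse("23.12.2016 11:12:13.456")); // 2016-12-23T11:12:13.456
        System.out.println(parse("23.12.2016 11:12:13.4"));   // 2016-12-23T11:12:13.400
    }
}
```

Note that the digits are read as a decimal fraction of a second here, so ".4"
becomes 400 milliseconds.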

While my lengthy explanation of the problems with LLL might result in
the answer "not a bug, go away" ;-) I definitely see the millisecond handling
in java.time.* as a bug.


Thanks for reading this far and best regards,

Lothar Kimmeringer
-------------- next part --------------
import static org.junit.Assert.assertEquals;

import java.text.ParsePosition;
import java.text.SimpleDateFormat;
import java.time.DateTimeException;
import java.time.Instant;
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;
import java.time.temporal.TemporalAccessor;
import java.util.Date;
import java.util.Locale;

import org.junit.Test;

public class __Test_LocaleDateParsing {

    @FunctionalInterface
    static interface DateParseFunction {
        Date executeParser(String toParse, String mask, Locale locale);
    }
    
    @Test
    public void testParsingWithSimpleDateFormatter() throws Exception {
        performTests(__Test_LocaleDateParsing::parseUsingSimpleDateFormat);
    }
    
    @Test
    public void testParsingWithDateTime() throws Exception {
        performTests(__Test_LocaleDateParsing::parseUsingDateTimeParsing);
    }
    
    void performTests(DateParseFunction parser) {
//        SimpleDateFormat sdf = new SimpleDateFormat("yyyy-LLL-dd", new Locale("ru"));
//        System.out.println(sdf.format(new Date(113, 11, 11, 12, 34, 56)));
        assertEquals("check parsing of short month", "Tue Sep 12 00:00:00 CEST 2023", String.valueOf(parser.executeParser("12. Sep 2023", "dd. LLL yyyy", new Locale("en", "US"))));
        assertEquals("check parsing of short month", "Sun Mar 12 00:00:00 CET 2023", String.valueOf(parser.executeParser("12 Mar 2023", "dd LLL yyyy", new Locale("en", "US"))));
        assertEquals("check parsing of short month", "Tue Sep 12 00:00:00 CEST 2023", String.valueOf(parser.executeParser("12 Sep 2023", "dd LLL yyyy", new Locale("en", "US"))));
        assertEquals("check parsing of short month", "Sun Mar 12 00:00:00 CET 2023", String.valueOf(parser.executeParser("12. Mar 2023", "dd. LLL yyyy", new Locale("en", "US"))));
        assertEquals("check parsing of short month", "Tue Dec 12 00:00:00 CET 2023", String.valueOf(parser.executeParser("12. Dec 2023", "dd. LLL yyyy", new Locale("en", "US"))));

        assertEquals("check parsing of short month", "Fri Dec 23 11:12:13 CET 2016", String.valueOf(parser.executeParser("23. Dec. 2016 11:12:13.456", "dd. LLL. yyyy HH:mm:ss.SSS", new Locale("en", "US"))));
        assertEquals("check parsing of short month", "Fri Dec 23 11:12:13 CET 2016", String.valueOf(parser.executeParser("23. Dec. 2016 11:12:13.456", "dd. LLL. yyyy HH:mm:ss.SSS", new Locale("en", "US"))));
        assertEquals("check parsing of short month", "Fri Dec 23 11:12:13 CET 2016", String.valueOf(parser.executeParser("23.Dec.2016 11:12:13.456", "dd.LLL.yyyy HH:mm:ss.SSS", new Locale("en", "US"))));
        assertEquals("check parsing of short month", "Fri Dec 23 11:12:13 CET 2016", String.valueOf(parser.executeParser("23. Dec. 2016 11:12:13.4", "dd. LLL. yyyy HH:mm:ss.S", new Locale("en", "US"))));
        assertEquals("check parsing of short month", "Fri Dec 23 11:12:13 CET 2016", String.valueOf(parser.executeParser("23.Dec.2016 11:12:13.4", "dd.LLL.yyyy HH:mm:ss.S", new Locale("en", "US"))));

        assertEquals("check parsing of short month", "Tue Sep 12 00:00:00 CEST 2023", String.valueOf(parser.executeParser("12. Sep 2023", "dd. LLL yyyy", new Locale("de", "DE"))));
        assertEquals("check parsing of short month", "Sun Mar 12 00:00:00 CET 2023", String.valueOf(parser.executeParser("12 Mär 2023", "dd LLL yyyy", new Locale("de", "DE"))));
        assertEquals("check parsing of short month", "Tue Sep 12 00:00:00 CEST 2023", String.valueOf(parser.executeParser("12 Sep 2023", "dd LLL yyyy", new Locale("de", "DE"))));
        assertEquals("check parsing of short month", "Sun Mar 12 00:00:00 CET 2023", String.valueOf(parser.executeParser("12. Mär 2023", "dd. LLL yyyy", new Locale("de", "DE"))));
        assertEquals("check parsing of short month", "Tue Dec 12 00:00:00 CET 2023", String.valueOf(parser.executeParser("12. Dez 2023", "dd. LLL yyyy", new Locale("de", "DE"))));

        assertEquals("check parsing of short month", "Fri Dec 23 11:12:13 CET 2016", String.valueOf(parser.executeParser("23. Dez. 2016 11:12:13.456", "dd. LLL. yyyy HH:mm:ss.SSS", new Locale("de", "DE"))));
        assertEquals("check parsing of short month", "Fri Dec 23 11:12:13 CET 2016", String.valueOf(parser.executeParser("23. Dez. 2016 11:12:13.456", "dd. LLL. yyyy HH:mm:ss.SSS", new Locale("de", "DE"))));
        assertEquals("check parsing of short month", "Fri Dec 23 11:12:13 CET 2016", String.valueOf(parser.executeParser("23.Dez.2016 11:12:13.456", "dd.LLL.yyyy HH:mm:ss.SSS", new Locale("de", "DE"))));
        assertEquals("check parsing of short month", "Fri Dec 23 11:12:13 CET 2016", String.valueOf(parser.executeParser("23. Dez. 2016 11:12:13.4", "dd. LLL. yyyy HH:mm:ss.S", new Locale("de", "DE"))));
        assertEquals("check parsing of short month", "Fri Dec 23 11:12:13 CET 2016", String.valueOf(parser.executeParser("23.Dez.2016 11:12:13.4", "dd.LLL.yyyy HH:mm:ss.S", new Locale("de", "DE"))));
        assertEquals("check parsing of short month", "Wed Dec 11 12:34:56 CET 2013", String.valueOf(parser.executeParser("11-дек-2013 12:34:56", "dd-LLL-yyyy HH:mm:ss", new Locale("ru"))));
        

        assertEquals("check parsing of short month", "Fri Dec 23 11:12:13 CET 2016", String.valueOf(parser.executeParser("23. Dez. 2016 11:12:13.456", "dd. MMM yyyy HH:mm:ss.SSS", new Locale("de", "DE"))));
        assertEquals("check parsing of short month", "Fri Dec 23 11:12:13 CET 2016", String.valueOf(parser.executeParser("23. Dec. 2016 11:12:13.456", "dd. MMM yyyy HH:mm:ss.SSS", new Locale("en", "US"))));
    }
    
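    // Parses strictly with SimpleDateFormat. parse(String, ParsePosition) returns null
    // instead of throwing, so a failed parse shows up in the assertions as "null".
    private static Date parseUsingSimpleDateFormat(String toParse, String mask, Locale locale) {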
        SimpleDateFormat formatter = new SimpleDateFormat(mask, locale);
        formatter.setLenient(false);
        ParsePosition pos = new ParsePosition(0);
        return formatter.parse(toParse, pos);
    }
    
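    // Parses with DateTimeFormatter and converts the result to java.util.Date, falling back
    // from Instant to LocalDateTime to LocalDate depending on which fields were parsed.
    private static Date parseUsingDateTimeParsing(String toParse, String mask, Locale locale) {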
        final DateTimeFormatter df = DateTimeFormatter.ofPattern(mask).withLocale(locale);
        TemporalAccessor res = df.parse(toParse);
        try {
            return Date.from(Instant.from(res));
        }
        catch(DateTimeException dte) {
            try {
                LocalDateTime ldt = LocalDateTime.from(res);
                return Date.from(ldt.atZone(ZoneId.systemDefault()).toInstant());
            }
            catch (DateTimeException dte2) {
//                dte2.printStackTrace();
                return Date.from(LocalDate.from(res).atStartOfDay(ZoneId.systemDefault()).toInstant());
            }
        }
    }
}

