Request for review: Race conditions in java.nio.charset.Charset

Thu Oct 8 04:40:39 UTC 2009

If you can show that a simple test program that appears to access
only 2 charsets in fact causes accesses to 3 or 4, that is a serious
problem with the 2-element cache.
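
One way to check this offline is to replay a logged sequence of lookup
names through a model of the cache and count the misses. The sketch below
is only a model: it assumes a two-slot, move-to-front cache keyed by the
exact requested name, so aliases such as UTF-8 and UTF8 count as different
keys. The class name TwoEntryCacheSim and the method misses are
hypothetical and are not part of the JDK.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model of a two-slot charset cache; not the JDK implementation.
final class TwoEntryCacheSim {

    // Counts how many requests miss both slots. Assumes keys are compared
    // by exact name, so different aliases of one charset are distinct keys.
    static int misses(List<String> requests) {
        String first = null;   // most recently used name
        String second = null;  // second most recently used name
        int misses = 0;
        for (String name : requests) {
            if (name.equals(first)) {
                continue;                  // hit in slot 1
            }
            if (name.equals(second)) {     // hit in slot 2: promote to slot 1
                second = first;
                first = name;
                continue;
            }
            misses++;                      // miss: new name evicts slot 2
            second = first;
            first = name;
        }
        return misses;
    }

    public static void main(String[] args) {
        // Two names fit in the cache; a third name forces evictions.
        System.out.println(misses(Arrays.asList("UTF-8", "Cp1252", "UTF-8", "Cp1252")));
        System.out.println(misses(Arrays.asList("UTF-8", "Cp1252", "IBM037", "UTF-8")));
    }
}
```

Feeding it the names from a real lookup log would show how often a
working set of 3 or 4 names defeats a cache of this shape.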

People at Google are working on better caches,
but I don't think they are quite ready today.

Perhaps, instead of a small charset cache,
we could cache all the charsets, but for the
large charsets like GB18030, we could,
inside the charset implementation, cache the
large data tables using a soft reference, and recompute
as needed.  Then most of the static memory used
by an unused charset could be reclaimed.
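
A minimal sketch of that idea, assuming the large table can be rebuilt
from scratch at any time. The class SoftTable and its build() method are
hypothetical stand-ins, not existing charset code; a real charset would
decode its packed mapping data in build().

```java
import java.lang.ref.SoftReference;

// Hypothetical holder for a large, recomputable mapping table.
final class SoftTable {
    // The collector may clear this reference when memory is tight.
    private volatile SoftReference<char[]> ref = new SoftReference<char[]>(null);
    private final int size;

    SoftTable(int size) {
        this.size = size;
    }

    // Returns the table, rebuilding it if the reference has been cleared.
    // Two threads may both rebuild; that wastes work but gives equal tables.
    char[] get() {
        char[] table = ref.get();
        if (table == null) {
            table = build();
            ref = new SoftReference<char[]>(table);
        }
        return table;
    }

    // Stand-in for the expensive computation (assumption: deterministic).
    private char[] build() {
        char[] table = new char[size];
        for (int i = 0; i < size; i++) {
            table[i] = (char) i;
        }
        return table;
    }
}
```

The small Charset object itself can then stay cached permanently, while
only the bulky table is subject to reclamation. Callers should hold the
array returned by get() in a local variable for the duration of an
encode or decode call so it cannot be cleared mid-operation.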

In general, high quality caching is hard,
much harder than it looks.

Martin

On Wed, Oct 7, 2009 at 15:58, Ulf Zibis <Ulf.Zibis at gmx.de> wrote:

>> I don't think it's worth a point fix here unless an actual wrong result
>> can be demonstrated.  I do think a more sophisticated charset cache
>> would be good, but hard to get right.
>>
>
> The other point is the size of the cache, see
> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6795535.
> I have logged the calls to the Charset.lookup() method from a simple test
> that uses only ISO-8859-1 and IBM037. As you can see, UTF-8 and
> Cp1252 (the default encoding on German Windows) are frequently requested by
> the VM, so IMO size 2 is too restrictive (note the different aliases UTF-8,
> utf-8 and UTF8):
> UTF-8
> utf-8
> UTF-8
> Cp1252
> UTF-8
> UTF-8
> UTF-8
> UTF-8
> UTF-8
> UTF-8
> UTF8
> UTF8
> Cp1252
> Cp1252
> Cp1252
> Cp1252
> Cp1252
> Cp1252
> Cp1252
> Cp1252
> Cp1252
> Cp1252
> Cp1252
> Cp1252
> UTF-8
> IBM037
> UTF-8
> UTF-8
> utf-8
> ISO-8859-1
> UTF-8
>
>
> -Ulf