URLConnection.guessContentTypeFromStream() does not support UTF-8 and UTF-32 with BOM

Charles Lee littlee at linux.vnet.ibm.com
Mon Mar 7 17:07:45 PST 2011


On 03/07/2011 09:56 PM, Chris Hegarty wrote:
> As I think Alan mentioned in an earlier mail, this is a legacy API and
> not widely used. I can't say I've come across its use more than a
> handful of times in more than 10 years.
>
> If you have a small patch that resolves a particular issue then maybe 
> we should just proceed with getting it in.
>
> -Chris.
>
> On 07/03/2011 06:57, Charles Lee wrote:
>>   On 03/04/2011 07:13 PM, Alan Bateman wrote:
>>> Charles Lee wrote:
>>>> Hi Alan,
>>>>
>>>> Sorry for the late reply. This test case comes from a larger test case,
>>>> which tests more types of streams.
>>> A test for this method should pass if "null" is returned, as the method
>>> does not specify the content types that it recognizes. I think it's
>>> okay to extend it as you are proposing, but I don't think anyone can
>>> really rely on it. I'm happy to create a request-for-feature (RFE) in
>>> the bug database for this and help you get it in if you want to
>>> pursue it.
>>>
>>> -Alan
>> Thanks Alan. I am confused by the hard-coding in that method. It shows
>> that it can only detect a small range of types. Should a comment be added
>> to the spec, such as "cannot be relied upon"?
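Alan's point is that a null result is an ordinary outcome that callers must handle. The following is a minimal sketch of such a caller. It assumes Java 8+ (for UncheckedIOException). GuessTypeUsage and guessOrDefault are hypothetical names, and only URLConnection.guessContentTypeFromStream is the real API. The stream is wrapped in a BufferedInputStream because the method peeks at the leading bytes with mark/reset and gives up (returns null) on streams that do not support marking.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URLConnection;

public class GuessTypeUsage {

    /**
     * Hypothetical helper: asks URLConnection for a guess and substitutes a
     * fallback when the guess is null. The caller should pass a stream that
     * supports mark/reset and keep reading from that same stream afterwards.
     */
    static String guessOrDefault(InputStream in, String fallback) {
        try {
            String guess = URLConnection.guessContentTypeFromStream(in);
            // The recognized types are unspecified, so null is a normal result.
            return guess != null ? guess : fallback;
        } catch (IOException e) {
            // Sketch choice: surface I/O failures as unchecked exceptions.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[][] samples = {
            {'G', 'I', 'F', '8', '9', 'a'}, // GIF signature
            {1, 2, 3, 4}                    // arbitrary bytes
        };
        for (byte[] sample : samples) {
            // BufferedInputStream supports mark/reset, so the peeked bytes are not lost.
            try (InputStream in = new BufferedInputStream(new ByteArrayInputStream(sample))) {
                System.out.println(guessOrDefault(in, "application/octet-stream"));
            }
        }
    }
}
```

The fallback value is the caller's choice. Nothing in the API promises that any particular type will be recognized.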
Hi Chris, what about this patch:
diff --git src/share/classes/java/net/URLConnection.java src/share/classes/java/net/URLConnection.java
--- src/share/classes/java/net/URLConnection.java
+++ src/share/classes/java/net/URLConnection.java
@@ -1422,7 +1422,7 @@
         if (!is.markSupported())
             return null;
 
-        is.mark(12);
+        is.mark(16);
         int c1 = is.read();
         int c2 = is.read();
         int c3 = is.read();
@@ -1434,6 +1434,11 @@
         int c9 = is.read();
         int c10 = is.read();
         int c11 = is.read();
+        int c12 = is.read();
+        int c13 = is.read();
+        int c14 = is.read();
+        int c15 = is.read();
+        int c16 = is.read();
         is.reset();
 
         if (c1 == 0xCA && c2 == 0xFE && c3 == 0xBA && c4 == 0xBE) {
@@ -1461,6 +1466,13 @@
             }
         }
 
+        // UTF-8 encoding (byte order does not apply), with BOM
+        if (c1 == 0xef && c2 == 0xbb && c3 == 0xbf) {
+            if (c4 == '<' && c5 == '?' && c6 == 'x') {
+                return "application/xml";
+            }
+        }
+
         // big and little endian UTF-16 encodings, with byte order mark
         if (c1 == 0xfe && c2 == 0xff) {
             if (c3 == 0 && c4 == '<' && c5 == 0 && c6 == '?' &&
@@ -1476,6 +1488,19 @@
             }
         }
 
+        // big and little endian UTF-32 encodings, with BOM
+        if (c1 == 0xff && c2 == 0xfe && c3 == 0x0 && c4 == 0x0) {
+            if (c5 == '<' && c9 == '?' && c13 == 'x') {
+                return "application/xml";
+            }
+        }
+
+        if (c1 == 0x0 && c2 == 0x0 && c3 == 0xfe && c4 == 0xff) {
+            if (c8 == '<' && c12 == '?' && c16 == 'x') {
+                return "application/xml";
+            }
+        }
+
         if (c1 == 'G' && c2 == 'I' && c3 == 'F' && c4 == '8') {
             return "image/gif";
         }
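To try the new byte patterns outside the JDK, the added checks can be copied into a standalone helper. The following is a minimal sketch, assuming Java 8+. BomXmlSniffer and guessXmlWithBom are hypothetical names and are not part of java.net.URLConnection. The helper reproduces only the UTF-8 and UTF-32 BOM branches from the patch. In UTF-32LE, '<' is encoded as 3C 00 00 00, so after the 4-byte BOM the ASCII characters land in bytes 5, 9 and 13. In UTF-32BE they land in bytes 8, 12 and 16. This is why the patch needs to read up to c16 and raise the mark limit to 16.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

public class BomXmlSniffer {

    /**
     * Hypothetical helper mirroring the patch: returns "application/xml" when
     * "<?x" follows a UTF-8 or UTF-32 BOM, otherwise null.
     */
    static String guessXmlWithBom(InputStream is) throws IOException {
        if (!is.markSupported()) {
            return null;
        }
        // Peek at up to 16 bytes, then rewind so the caller can read them again.
        is.mark(16);
        int[] c = new int[17]; // c[1]..c[16] correspond to c1..c16 in the patch
        for (int i = 1; i <= 16; i++) {
            c[i] = is.read(); // -1 once the stream is exhausted
        }
        is.reset();

        // UTF-8 BOM (EF BB BF); '<', '?', 'x' are one byte each
        if (c[1] == 0xEF && c[2] == 0xBB && c[3] == 0xBF) {
            if (c[4] == '<' && c[5] == '?' && c[6] == 'x') {
                return "application/xml";
            }
        }
        // UTF-32LE BOM (FF FE 00 00); each character's low byte comes first
        if (c[1] == 0xFF && c[2] == 0xFE && c[3] == 0x00 && c[4] == 0x00) {
            if (c[5] == '<' && c[9] == '?' && c[13] == 'x') {
                return "application/xml";
            }
        }
        // UTF-32BE BOM (00 00 FE FF); each character's low byte comes last
        if (c[1] == 0x00 && c[2] == 0x00 && c[3] == 0xFE && c[4] == 0xFF) {
            if (c[8] == '<' && c[12] == '?' && c[16] == 'x') {
                return "application/xml";
            }
        }
        return null;
    }

    /** Convenience overload for in-memory data. */
    static String guessXmlWithBom(byte[] data) {
        try {
            return guessXmlWithBom(new ByteArrayInputStream(data));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[][] samples = {
            {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, '<', '?', 'x', 'm', 'l'},
            {(byte) 0xFF, (byte) 0xFE, 0, 0, '<', 0, 0, 0, '?', 0, 0, 0, 'x', 0, 0, 0},
            {0, 0, (byte) 0xFE, (byte) 0xFF, 0, 0, 0, '<', 0, 0, 0, '?', 0, 0, 0, 'x'},
            {'G', 'I', 'F', '8', '9', 'a'}
        };
        for (byte[] sample : samples) {
            System.out.println(guessXmlWithBom(sample));
        }
    }
}
```

Like the patch, the UTF-32 branches inspect only the bytes that carry '<', '?' and 'x'. A stricter variant would also require the three padding bytes of each code unit to be zero.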



More information about the net-dev mailing list