URLConnection.guessContentTypeFromStream() does not support UTF8 and UTF32 with BOM
Charles Lee
littlee at linux.vnet.ibm.com
Mon Mar 7 17:07:45 PST 2011
On 03/07/2011 09:56 PM, Chris Hegarty wrote:
> I think Alan mentioned it in an earlier mail, this is a legacy API and
> not widely used. I can't say I've come across its use more than a
> handful of times in more than 10 years.
>
> If you have a small patch that resolves a particular issue then maybe
> we should just proceed with getting it in.
>
> -Chris.
>
> On 07/03/2011 06:57, Charles Lee wrote:
>> On 03/04/2011 07:13 PM, Alan Bateman wrote:
>>> Charles Lee wrote:
>>>> Hi Alan,
>>>>
>>>> Sorry for the late reply. This test case comes from a bigger test case,
>>>> which tests more types of streams.
>>> A test for this method should pass if "null" is returned as the method
>>> does not specify the content types that it recognizes. I think it's
>>> okay to extend it as you are proposing but I don't think anyone can
>>> really rely on it. I'm happy to create a request-for-feature (RFE) in
>>> the bug database for this and help you get it in if you want to
>>> pursue it.
>>>
>>> -Alan
>> Thanks Alan. I am confused by the hard-coding in that method. It shows
>> that it can only detect a small range of types. Should a comment be added
>> to the spec, such as "cannot be relied upon"?
Hi Chris,

What about the following patch?
diff --git src/share/classes/java/net/URLConnection.java src/share/classes/java/net/URLConnection.java
--- src/share/classes/java/net/URLConnection.java
+++ src/share/classes/java/net/URLConnection.java
@@ -1422,7 +1422,7 @@
if (!is.markSupported())
return null;
- is.mark(12);
+ is.mark(16);
int c1 = is.read();
int c2 = is.read();
int c3 = is.read();
@@ -1434,6 +1434,11 @@
int c9 = is.read();
int c10 = is.read();
int c11 = is.read();
+ int c12 = is.read();
+ int c13 = is.read();
+ int c14 = is.read();
+ int c15 = is.read();
+ int c16 = is.read();
is.reset();
if (c1 == 0xCA && c2 == 0xFE && c3 == 0xBA && c4 == 0xBE) {
@@ -1461,6 +1466,13 @@
}
}
+ // UTF-8 encoding, with BOM (UTF-8 has no byte-order variants)
+ if (c1 == 0xef && c2 == 0xbb && c3 == 0xbf) {
+ if (c4 == '<' && c5 == '?' && c6 == 'x') {
+ return "application/xml";
+ }
+ }
+
// big and little endian UTF-16 encodings, with byte order mark
if (c1 == 0xfe && c2 == 0xff) {
if (c3 == 0 && c4 == '<' && c5 == 0 && c6 == '?' &&
@@ -1476,6 +1488,19 @@
}
}
+ // big and little endian UTF-32 encodings, with BOM
+ if (c1 == 0xff && c2 == 0xfe && c3 == 0x0 && c4 == 0x0) {
+ if (c5 == '<' && c9 == '?' && c13 == 'x') {
+ return "application/xml";
+ }
+ }
+
+ if (c1 == 0x0 && c2 == 0x0 && c3 == 0xfe && c4 == 0xff) {
+ if (c8 == '<' && c12 == '?' && c16 == 'x') {
+ return "application/xml";
+ }
+ }
+
if (c1 == 'G' && c2 == 'I' && c3 == 'F' && c4 == '8') {
return "image/gif";
}
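
For anyone who wants to check the byte layouts without building the JDK, here is a minimal standalone sketch of the same checks. The class name BomXmlSniffer and the method guessXml are made up for illustration and are not part of java.net; the logic only mirrors the UTF-8 and UTF-32 hunks above, with the 1-based c1..c16 variables mapped to a 0-based array.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper, not part of the JDK: mirrors only the BOM checks
// proposed in the patch above.
public class BomXmlSniffer {

    /**
     * Returns "application/xml" if the stream starts with a UTF-8 or
     * UTF-32 BOM followed by "<?x", otherwise null. The stream position
     * is restored before returning.
     */
    static String guessXml(InputStream is) throws IOException {
        if (!is.markSupported()) {
            return null;
        }
        is.mark(16);
        int[] c = new int[16];
        for (int i = 0; i < c.length; i++) {
            c[i] = is.read(); // -1 once the stream is exhausted
        }
        is.reset();

        // UTF-8 BOM (EF BB BF): one byte per ASCII character.
        if (c[0] == 0xef && c[1] == 0xbb && c[2] == 0xbf
                && c[3] == '<' && c[4] == '?' && c[5] == 'x') {
            return "application/xml";
        }
        // UTF-32 little endian BOM (FF FE 00 00): low byte comes first,
        // so the characters sit at c5, c9, c13 (indexes 4, 8, 12).
        if (c[0] == 0xff && c[1] == 0xfe && c[2] == 0 && c[3] == 0
                && c[4] == '<' && c[8] == '?' && c[12] == 'x') {
            return "application/xml";
        }
        // UTF-32 big endian BOM (00 00 FE FF): low byte comes last,
        // so the characters sit at c8, c12, c16 (indexes 7, 11, 15).
        if (c[0] == 0 && c[1] == 0 && c[2] == 0xfe && c[3] == 0xff
                && c[7] == '<' && c[11] == '?' && c[15] == 'x') {
            return "application/xml";
        }
        return null;
    }

    public static void main(String[] args) throws IOException {
        byte[] utf8 = {(byte) 0xef, (byte) 0xbb, (byte) 0xbf,
                       '<', '?', 'x', 'm', 'l'};
        // prints application/xml
        System.out.println(guessXml(new ByteArrayInputStream(utf8)));
    }
}
```

Like the real method, the sketch returns null for anything it does not recognize, so a caller still needs its own fallback, which fits Alan's remark that nobody can rely on a particular type being detected.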