Very long response headers and java.net.http.HttpClient?
robert engels
rengels at ix.netcom.com
Mon Jul 29 22:37:57 UTC 2024
I have found that the OpenJDK net team is very open to receiving patches.
Have you filed an issue that has been accepted? This is usually the first step.
> On Jul 29, 2024, at 5:20 PM, Andy Boothe <andy.boothe at gmail.com> wrote:
>
> First, thank you both for the responses. I know how busy everyone is, and I really appreciate the time.
>
> We can talk about use cases and architecture and such, but I think we all agree that a developer should be able to make an HTTP request with HttpClient without worrying about whether or not it will cause an OOM. Or, at least, that whether or not it causes an OOM should be fully within their control. And that's not where this implementation is right now.
>
> I’d be very happy to work on a fix for this. Would it be out of order for me to propose a patch?
>
> Andy Boothe
> Email: andy.boothe at gmail.com
> Mobile: (979) 574-1089
>
>
> On Mon, Jul 29, 2024 at 3:57 PM robert engels <rengels at ix.netcom.com> wrote:
> Yes, but normally you fork a worker process that tracks progress and scrapes N sites. If the worker process dies processing a site, the site is marked “bad” and only periodically scraped after a retry/backoff period.
>
> There are probably a lot of ways to crash a worker process, intentionally or accidentally - a robust design is called for.
>
> As an aside, if I were writing a large-scale scraper I don’t think I would use HttpClient anyway - I think a custom URL accessor would be easier to monitor, etc.
>
>> On Jul 29, 2024, at 3:43 PM, Ethan McCue <ethan at mccue.dev> wrote:
>>
>> Scraping of unknown/untrusted websites is a common task in certain...fields? I don't want to comment on it too deeply, but I know that is something folks would do.
>>
>> Imagine a site where someone inputs a URL, clicks submit, and then with the power of funding they return a summary of the page.
>>
>> On Mon, Jul 29, 2024, 3:52 PM robert engels <rengels at ix.netcom.com> wrote:
>> Isn’t the HttpClient almost always used to access other services?
>>
>> Why would a developer access a malicious service?
>>
>> I also think there are lots of ways for a service to crash the client - e.g., it could attempt to return a very large response - if the client uses a memory-buffered reader, it will cause an OOM as well.
>>
>>> On Jul 29, 2024, at 2:42 PM, Andy Boothe <andy.boothe at gmail.com> wrote:
>>>
>>> Following up here.
>>>
>>> I believe I have discovered that it is possible to craft a malicious HTTP response that can cause the built-in HttpURLConnection and HttpClient implementations to throw exceptions. Specifically, HttpURLConnection can be made to throw a NegativeArraySizeException, and HttpClient can be made to throw an OutOfMemoryError. Proof of this behavior is in the attached (very simple) Java programs.
>>>
>>> This seems like A Bad Thing to me.
>>>
>>> I've moved from the dev list to this list based on a recommendation from that list. Is this the right list? If not, can you point me in the right direction? Perhaps a security list?
>>>
>>> Thank you,
>>>
>>> Andy Boothe
>>> Email: andy.boothe at gmail.com
>>> Mobile: (979) 574-1089
>>> On Wed, Jul 24, 2024 at 4:47 PM Andy Boothe <andy.boothe at gmail.com> wrote:
>>> Hello,
>>>
>>> I'm moving this thread from jdk-dev to this list on the sage advice of Pavel Rappo.
>>>
>>> As a brief recap, it looks like HttpClient and HttpURLConnection do not currently support a way to set the maximum acceptable response header length. As a result, sending HTTP requests with these classes that result in a response with very long headers causes an OutOfMemoryError and a NegativeArraySizeException, respectively. (Simple programs for reproducing the issue are attached.) This seems like A Bad Thing. There is a (very brief) discussion in the thread about how to handle it, but of course you guys are the experts.
>>>
>>> If my head is on straight and this turns out to be a real issue as opposed to a mistake on my part, I'm keen to help however I can.
>>>
>>> Andy Boothe
>>> Email: andy.boothe at gmail.com
>>> Mobile: (979) 574-1089
>>>
>>>
>>> ---------- Forwarded message ---------
>>> From: Pavel Rappo <pavel.rappo at oracle.com>
>>> Date: Wed, Jul 24, 2024 at 4:30 PM
>>> Subject: Re: Very long response headers and java.net.http.HttpClient?
>>> To: Andy Boothe <andy.boothe at gmail.com>
>>> Cc: jdk-dev at openjdk.org
>>>
>>>
>>> A proper list would be net-dev at openjdk.java.net.
>>>
>>> > On 24 Jul 2024, at 21:13, Andy Boothe <andy.boothe at gmail.com> wrote:
>>> >
>>> > Hello,
>>> >
>>> > I'm documenting some guidelines for using java.net.http.HttpClient defensively for my team. For example: "Always set a request timeout", "Don't assume HTTP response entities are small and/or will fit in memory", etc.
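
The two guidelines quoted above can be sketched roughly as follows. This is a minimal illustration, not code from the thread: the class name `DefensiveFetch`, the helper `readAtMost`, and the timeout values are all assumptions chosen for the example. It sets a connect timeout and a request timeout, and streams the body through a byte cap instead of buffering it blindly.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Hypothetical helper class; none of these names appear in the thread.
final class DefensiveFetch {

    /** Reads the whole stream, failing as soon as more than maxBytes have arrived. */
    static byte[] readAtMost(InputStream in, int maxBytes) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            if (out.size() + n > maxBytes) {
                throw new IOException("response body exceeds " + maxBytes + " bytes");
            }
            out.write(chunk, 0, n);
        }
        return out.toByteArray();
    }

    /** Fetches a URI with timeouts set and the body capped at maxBodyBytes. */
    static byte[] fetch(URI uri, int maxBodyBytes) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(5))   // assumed value
                .build();
        HttpRequest request = HttpRequest.newBuilder(uri)
                .timeout(Duration.ofSeconds(10))         // assumed value
                .GET()
                .build();
        // Stream the body rather than collecting it into a String or byte[] up front.
        HttpResponse<InputStream> response =
                client.send(request, HttpResponse.BodyHandlers.ofInputStream());
        try (InputStream body = response.body()) {
            return readAtMost(body, maxBodyBytes);
        }
    }
}
```

Note that this only bounds the body. It does nothing about the size of the response headers, which is the gap the rest of this message is about.
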
>>> >
>>> > One guideline I'd like to document is "Set a maximum for HTTP response header size." However, I can't seem to find a way to set that limit, either in documentation or in OpenJDK code.
>>> >
>>> > I tried my best to search the archives for this mailing list for any mentions, but came up empty.
>>> >
>>> > To make sure my head is on straight and there isn't an undocumented limit set by default, I wrote the attached (very quick and dirty) client and server programs. LongResponseHeaderDemoServer opens a raw server socket and reads (what it assumes is) a well-formed HTTP request, and then prints an HTTP response which includes a response header of infinite length. LongResponseHeaderDemoHttpClient uses java.net.http.HttpClient to make a request and print the response body.
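
For readers without the attachments, a server of the kind described above might look roughly like this. It is a sketch under stated assumptions and is not the attached LongResponseHeaderDemoServer: the class name `LongHeaderServerSketch`, the header name `X-Long`, and the filler byte are invented for illustration. Port 3000 and the three leading headers are taken from the curl transcript in this message.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical stand-in for the attached demo server; not the author's code.
final class LongHeaderServerSketch {

    /** Writes a response whose last header value is valueBytes long, then a 3-byte body. */
    static void writeResponse(OutputStream out, long valueBytes) throws IOException {
        String head = "HTTP/1.1 200 OK\r\n"
                + "Content-Type: text/plain\r\n"
                + "Connection: close\r\n"
                + "Content-Length: 3\r\n"
                + "X-Long: ";                       // assumed header name
        out.write(head.getBytes(StandardCharsets.US_ASCII));
        byte[] filler = new byte[8192];
        Arrays.fill(filler, (byte) 'a');
        long left = valueBytes;
        while (left > 0) {
            int n = (int) Math.min(left, filler.length);
            out.write(filler, 0, n);
            left -= n;
        }
        out.write("\r\n\r\nok\n".getBytes(StandardCharsets.US_ASCII));
    }

    public static void main(String[] args) throws IOException {
        try (ServerSocket server = new ServerSocket(3000)) {
            while (true) {
                try (Socket socket = server.accept()) {
                    BufferedReader in = new BufferedReader(new InputStreamReader(
                            socket.getInputStream(), StandardCharsets.US_ASCII));
                    String line;
                    // Skip the request line and headers up to the blank line.
                    while ((line = in.readLine()) != null && !line.isEmpty()) { }
                    // Long.MAX_VALUE bytes is effectively a never-ending header.
                    writeResponse(socket.getOutputStream(), Long.MAX_VALUE);
                } catch (IOException e) {
                    // The client gave up or crashed; wait for the next connection.
                }
            }
        }
    }
}
```
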
>>> >
>>> > When I run LongResponseHeaderDemoServer in one terminal and make a curl request to the server in another terminal, this is what curl spits out:
>>> >
>>> > $ curl -vvv -D - http://localhost:3000
>>> > * Host localhost:3000 was resolved.
>>> > * IPv6: ::1
>>> > * IPv4: 127.0.0.1
>>> > * Trying [::1]:3000...
>>> > * Connected to localhost (::1) port 3000
>>> > > GET / HTTP/1.1
>>> > > Host: localhost:3000
>>> > > User-Agent: curl/8.6.0
>>> > > Accept: */*
>>> > >
>>> > < HTTP/1.1 200 OK
>>> > HTTP/1.1 200 OK
>>> > < Content-Type: text/plain
>>> > Content-Type: text/plain
>>> > < Connection: close
>>> > Connection: close
>>> > < Content-Length: 3
>>> > Content-Length: 3
>>> > * Closing connection
>>> > curl: (100) A value or data field grew larger than allowed
>>> >
>>> > So curl detects the long response header and bails out. Safe and sane.
>>> >
>>> > However, when I run LongResponseHeaderDemoServer in one terminal and run LongResponseHeaderDemoHttpClient in another terminal, this is what happens:
>>> >
>>> > $ java LongResponseHeaderDemoHttpClient
>>> > Exception in thread "main" java.io.IOException: Requested array size exceeds VM limit
>>> > at java.net.http/jdk.internal.net.http.HttpClientImpl.send(HttpClientImpl.java:966)
>>> > at java.net.http/jdk.internal.net.http.HttpClientFacade.send(HttpClientFacade.java:133)
>>> > at LongResponseHeaderDemoHttpClient.main(LongResponseHeaderDemoHttpClient.java:13)
>>> > Caused by: java.lang.OutOfMemoryError: Requested array size exceeds VM limit
>>> > at java.base/java.util.Arrays.copyOf(Arrays.java:3541)
>>> > at java.base/java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:242)
>>> > at java.base/java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:806)
>>> > at java.base/java.lang.StringBuilder.append(StringBuilder.java:246)
>>> > at java.net.http/jdk.internal.net.http.Http1HeaderParser.readResumeHeader(Http1HeaderParser.java:250)
>>> > at java.net.http/jdk.internal.net.http.Http1HeaderParser.parse(Http1HeaderParser.java:124)
>>> > at java.net.http/jdk.internal.net.http.Http1Response$HeadersReader.handle(Http1Response.java:605)
>>> > at java.net.http/jdk.internal.net.http.Http1Response$HeadersReader.handle(Http1Response.java:536)
>>> > at java.net.http/jdk.internal.net.http.Http1Response$Receiver.accept(Http1Response.java:527)
>>> > at java.net.http/jdk.internal.net.http.Http1Response$HeadersReader.tryAsyncReceive(Http1Response.java:583)
>>> > at java.net.http/jdk.internal.net.http.Http1AsyncReceiver.flush(Http1AsyncReceiver.java:233)
>>> > at java.net.http/jdk.internal.net.http.Http1AsyncReceiver$$Lambda/0x00000008010dbd50.run(Unknown Source)
>>> > at java.net.http/jdk.internal.net.http.common.SequentialScheduler$LockingRestartableTask.run(SequentialScheduler.java:182)
>>> > at java.net.http/jdk.internal.net.http.common.SequentialScheduler$CompleteRestartableTask.run(SequentialScheduler.java:149)
>>> > at java.net.http/jdk.internal.net.http.common.SequentialScheduler$SchedulableTask.run(SequentialScheduler.java:207)
>>> > at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
>>> > at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
>>> > at java.base/java.lang.Thread.runWith(Thread.java:1596)
>>> > at java.base/java.lang.Thread.run(Thread.java:1583)
>>> >
>>> > Ostensibly, HttpClient just keeps on reading the never-ending header until it OOMs. This seems to confirm that there is no default limit to header size. It also seems like A Very Bad Thing to me. This suggests that any time a program makes an HTTP request to an untrusted source using HttpClient, for example when crawling the web, it is at risk of an OOM.
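
To make the idea of a header size limit concrete, here is a minimal sketch of a reader that stops once the header block passes a byte cap, in the spirit of what curl does above. It is not how HttpClient parses headers and not a proposed patch; the class name `BoundedHeaderReader` and its method are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of a capped header read; not JDK code.
final class BoundedHeaderReader {

    /**
     * Reads bytes up to and including the blank line ("\r\n\r\n") that ends an
     * HTTP/1.1 header block. Throws if the block would exceed maxBytes.
     */
    static String readHeaderBlock(InputStream in, int maxBytes) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        byte[] terminator = {'\r', '\n', '\r', '\n'};
        int matched = 0; // number of terminator bytes matched so far
        int b;
        while ((b = in.read()) != -1) {
            if (buf.size() >= maxBytes) {
                throw new IOException("header block exceeds " + maxBytes + " bytes");
            }
            buf.write(b);
            if (b == terminator[matched]) {
                matched++;
                if (matched == terminator.length) {
                    return buf.toString(StandardCharsets.ISO_8859_1);
                }
            } else {
                // A stray '\r' may still be the start of the terminator.
                matched = (b == '\r') ? 1 : 0;
            }
        }
        throw new EOFException("stream ended before the header block was complete");
    }
}
```

Because it reads one byte at a time, a caller would normally wrap the socket stream in a BufferedInputStream first. With a cap in place, memory use is bounded by maxBytes no matter what the server sends.
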
>>> >
>>> > For grins, I also wrote an application LongResponseHeaderDemoHttpURLConnection that does the same thing as LongResponseHeaderDemoHttpClient, just using HttpURLConnection instead of HttpClient. When I run LongResponseHeaderDemoServer in one terminal and LongResponseHeaderDemoHttpURLConnection in another terminal, this is what happens:
>>> >
>>> > $ java LongResponseHeaderDemoHttpURLConnection
>>> > Exception in thread "main" java.lang.NegativeArraySizeException: -1610612736
>>> > at java.base/sun.net.www.MessageHeader.mergeHeader(MessageHeader.java:526)
>>> > at java.base/sun.net.www.MessageHeader.parseHeader(MessageHeader.java:481)
>>> > at java.base/sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:804)
>>> > at java.base/sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:726)
>>> > at java.base/sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1688)
>>> > at java.base/sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1589)
>>> > at java.base/java.net.URL.openStream(URL.java:1161)
>>> > at LongResponseHeaderDemoHttpURLConnection.main(LongResponseHeaderDemoHttpURLConnection.java:12)
>>> >
>>> > So HttpURLConnection doesn't handle things gracefully either, but at least it doesn't OOM. That seems like a bug, too, but perhaps less severe.
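
One plausible reading of the value -1610612736 in the trace above is plain int overflow in a buffer that grows by doubling. The sketch below assumes a buffer that starts at 10 elements and doubles each time it fills; that starting size and growth policy are assumptions about MessageHeader.mergeHeader, not something this thread confirms. Under that assumption the size reaches 1342177280 after 27 doublings, and the next doubling wraps around to exactly the number in the exception.

```java
// Hypothetical arithmetic demo; the initial size of 10 is an assumption.
final class DoublingOverflow {

    /** Doubles an int size until it is no longer positive and returns that value. */
    static int firstNonPositiveSize(int initial) {
        int size = initial;
        while (size > 0) {
            size *= 2; // silently wraps around once the result passes Integer.MAX_VALUE
        }
        return size;
    }

    public static void main(String[] args) {
        System.out.println(firstNonPositiveSize(10)); // prints -1610612736
    }
}
```

Passing a negative length to an array constructor is what raises NegativeArraySizeException, which would explain why this client fails early instead of running out of heap.
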
>>> >
>>> > For reference, here's my java version:
>>> >
>>> > $ java -version
>>> > openjdk version "21.0.2" 2024-01-16 LTS
>>> > OpenJDK Runtime Environment Corretto-21.0.2.13.1 (build 21.0.2+13-LTS)
>>> > OpenJDK 64-Bit Server VM Corretto-21.0.2.13.1 (build 21.0.2+13-LTS, mixed mode, sharing)
>>> >
>>> > Can anyone check my work, and maybe reproduce? And ideally, can someone with more knowledge than me about java.net.http.HttpClient and/or java.net.HttpURLConnection please comment? Is this real, or have I made a mistake somewhere along the way? If it's real, what's next? A bug report?
>>> >
>>> > Andy Boothe
>>> > Email: andy.boothe at gmail.com
>>> > Mobile: (979) 574-1089
>>>
>>> <LongResponseHeaderDemoHttpClient.java><LongResponseHeaderDemoHttpURLConnection.java><LongResponseHeaderDemoServer.java>
>>
>