Very long response headers and java.net.http.HttpClient?
Andy Boothe
andy.boothe at gmail.com
Mon Jul 29 22:20:55 UTC 2024
First, thank you both for the responses. I know how busy everyone is, and I
really appreciate the time.
We can talk about use cases and architecture and such, but I think we all
agree that a developer should be able to make an HTTP request with
HttpClient without worrying about whether it will cause an OOM. Or, at
least, that whether it causes an OOM should be fully within their control.
And that's not where this implementation is right now.
I’d be very happy to work on a fix for this. Would it be out of order for
me to propose a patch?
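For anyone who wants to reproduce this without digging out the original
attachments, the server side boils down to something like the following
sketch. The port, header name, and chunk size are arbitrary choices here,
not necessarily what the attached programs used:

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// A raw-socket server whose response ends with a header value that never
// terminates. Any HTTP client that buffers headers without a limit will
// grow that buffer until something gives.
public class EndlessHeaderServer {

    // Handle one connection: skim the request, then stream an unbounded header.
    public static void serve(ServerSocket server) throws Exception {
        try (Socket client = server.accept();
             OutputStream out = client.getOutputStream()) {
            client.getInputStream().read(new byte[8192]); // ignore the request

            out.write(("HTTP/1.1 200 OK\r\n"
                    + "Content-Type: text/plain\r\n"
                    + "Connection: close\r\n"
                    + "Content-Length: 3\r\n"
                    + "X-Junk: ").getBytes(StandardCharsets.US_ASCII));

            byte[] junk = new byte[65536];
            Arrays.fill(junk, (byte) 'a');
            while (true) {
                out.write(junk); // the X-Junk value grows until the peer gives up
            }
        }
    }

    public static void main(String[] args) throws Exception {
        try (ServerSocket server = new ServerSocket(3000)) {
            while (true) {
                try {
                    serve(server);
                } catch (IOException e) {
                    // client disconnected or died; wait for the next one
                }
            }
        }
    }
}
```

Pointing a plain HttpClient.newHttpClient().send(...) at this server
reproduces the OutOfMemoryError shown in the stack trace further down the
thread, while curl 8.6+ bails out with its "value or data field grew larger
than allowed" error.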
Andy Boothe
*Email*: andy.boothe at gmail.com
*Mobile*: (979) 574-1089
On Mon, Jul 29, 2024 at 3:57 PM robert engels <rengels at ix.netcom.com> wrote:
> Yes, but normally you fork a worker process that tracks progress and
> scrapes N sites. If the worker process dies processing a site, the site is
> marked “bad” and only periodically scraped after a retry/backoff period.
>
> There are probably a lot of ways to crash a worker process, intentionally
> or accidentally - a robust design is called for.
>
> As an aside, if I was writing a large scale scraper I don’t think I would
> use HttpClient anyway - I think a custom url accessor would be easier to
> monitor, etc.
>
> On Jul 29, 2024, at 3:43 PM, Ethan McCue <ethan at mccue.dev> wrote:
>
> Scraping of unknown/untrusted websites is a common task in
> certain...fields? I don't want to comment on it too deeply, but I know that
> is something folks would do.
>
> Imagine a site where someone inputs a URL, clicks submit, and then with
> the power of funding they return a summary of the page.
>
> On Mon, Jul 29, 2024, 3:52 PM robert engels <rengels at ix.netcom.com> wrote:
>
>> Isn’t the HttpClient almost always used to access other services?
>>
>> Why would a developer access a malicious service?
>>
>> I also think there are lots of ways for a service to crash the client -
>> e.g. it could attempt to return a very large response - if the client uses
>> a memory buffered reader, it will cause an OOM as well.
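That large-response failure mode is at least within the caller's control:
HttpResponse.BodyHandlers.ofString() buffers the whole body in memory,
while BodyHandlers.ofInputStream() lets the caller stream and stop at a
cap. A sketch of the difference (the class name and the 1 MiB cap are
invented here):

```java
import java.io.InputStream;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// ofString() accumulates the entire body before send() returns, so a huge
// response can OOM the client. ofInputStream() hands back a stream the
// caller can bound however it likes.
public class CappedBodyRead {
    static final int MAX_BODY = 1 << 20; // 1 MiB, arbitrary

    public static String readCapped(HttpClient client, String url) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        HttpResponse<InputStream> response =
                client.send(request, HttpResponse.BodyHandlers.ofInputStream());
        try (InputStream in = response.body()) {
            // Reads at most MAX_BODY bytes no matter how large the body is.
            byte[] body = in.readNBytes(MAX_BODY);
            return new String(body, StandardCharsets.UTF_8);
        }
    }
}
```

Note this caps only the body; the response headers are still parsed in full
before send() returns, which is exactly the gap this thread is about.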
>>
>> On Jul 29, 2024, at 2:42 PM, Andy Boothe <andy.boothe at gmail.com> wrote:
>>
>> Following up here.
>>
>> I believe I have discovered that it is possible to craft a malicious HTTP
>> response that can cause the built-in HttpURLConnection and HttpClient
>> implementations to throw exceptions. Specifically, HttpURLConnection can be
>> made to throw a NegativeArraySizeException, and HttpClient can be made to
>> throw an OutOfMemoryError. Proof of this behavior is in the attached (very
>> simple) Java programs.
>>
>> This seems like A Bad Thing to me.
>>
>> I've moved from the dev list to this list based on a recommendation from
>> that list. Is this the right list? If not, can you point me in the right
>> direction? Perhaps a security list?
>>
>> Thank you,
>>
>> Andy Boothe
>> *Email*: andy.boothe at gmail.com
>> *Mobile*: (979) 574-1089
>> On Wed, Jul 24, 2024 at 4:47 PM Andy Boothe <andy.boothe at gmail.com>
>> wrote:
>>
>>> Hello,
>>>
>>> I'm moving this thread from jdk-dev to this list on the sage advice of
>>> Pavel Rappo.
>>>
>>> As a brief recap, it looks like HttpClient and HttpURLConnection do not
>>> currently support a way to set the maximum acceptable response header
>>> length. As a result, sending HTTP requests with these classes that result
>>> in a response with very long headers causes an OutOfMemoryError and a
>>> NegativeArraySizeException, respectively. (Simple programs for reproducing
>>> the issue are attached.) This seems like A Bad Thing. There is a (very
>>> brief) discussion in the thread about how to handle it, but of course you guys
>>> are the experts.
>>>
>>> If my head is on straight and this turns out to be a real issue as
>>> opposed to a mistake on my part, I'm keen to help however I can.
>>>
>>> Andy Boothe
>>> *Email*: andy.boothe at gmail.com
>>> *Mobile*: (979) 574-1089
>>>
>>>
>>> ---------- Forwarded message ---------
>>> From: Pavel Rappo <pavel.rappo at oracle.com>
>>> Date: Wed, Jul 24, 2024 at 4:30 PM
>>> Subject: Re: Very long response headers and java.net.http.HttpClient?
>>> To: Andy Boothe <andy.boothe at gmail.com>
>>> Cc: jdk-dev at openjdk.org <jdk-dev at openjdk.org>
>>>
>>>
>>> A proper list would be net-dev at openjdk.java.net.
>>>
>>> > On 24 Jul 2024, at 21:13, Andy Boothe <andy.boothe at gmail.com> wrote:
>>> >
>>> > Hello,
>>> >
>>> > I'm documenting some guidelines for using java.net.http.HttpClient
>>> defensively for my team. For example: "Always set a request timeout",
>>> "Don't assume HTTP response entities are small and/or will fit in memory",
>>> etc.
>>> >
>>> > One guideline I'd like to document is "Set a maximum for HTTP response
>>> header size." However, I can't seem to find a way to set that limit, either
>>> in documentation or in OpenJDK code.
>>> >
>>> > I tried my best to search the archives for this mailing list for any
>>> mentions, but came up empty.
>>> >
>>> > To make sure my head is on straight and there isn't an undocumented
>>> limit set by default, I wrote the attached (very quick and dirty) client
>>> and server programs. LongResponseHeaderDemoServer opens a raw server socket
>>> and reads (what it assumes is) a well-formed HTTP request, and then prints
>>> an HTTP response which includes a response header of infinite length.
>>> LongResponseHeaderDemoHttpClient uses java.net.http.HttpClient to make a
>>> request and print the response body.
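The client side is presumably little more than this (a sketch of the
scrubbed attachment, not the original source):

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Roughly what LongResponseHeaderDemoHttpClient boils down to.
public class HttpClientSketch {
    public static String fetch(String url) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        // send() parses all response headers before returning, so an
        // unbounded header is accumulated in memory right here.
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(fetch("http://localhost:3000/"));
    }
}
```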
>>> >
>>> > When I run LongResponseHeaderDemoServer in one terminal and make a
>>> curl request to the server in another terminal, this is what curl spits out:
>>> >
>>> > $ curl -vvv -D - http://localhost:3000
>>> > * Host localhost:3000 was resolved.
>>> > * IPv6: ::1
>>> > * IPv4: 127.0.0.1
>>> > * Trying [::1]:3000...
>>> > * Connected to localhost (::1) port 3000
>>> > > GET / HTTP/1.1
>>> > > Host: localhost:3000
>>> > > User-Agent: curl/8.6.0
>>> > > Accept: */*
>>> > >
>>> > < HTTP/1.1 200 OK
>>> > HTTP/1.1 200 OK
>>> > < Content-Type: text/plain
>>> > Content-Type: text/plain
>>> > < Connection: close
>>> > Connection: close
>>> > < Content-Length: 3
>>> > Content-Length: 3
>>> > * Closing connection
>>> > curl: (100) A value or data field grew larger than allowed
>>> >
>>> > So curl detects the long response header and bails out. Safe and sane.
>>> >
>>> > However, when I run LongResponseHeaderDemoServer in one terminal and
>>> run LongResponseHeaderDemoHttpClient in another terminal, this is what
>>> happens:
>>> >
>>> > $ java LongResponseHeaderDemoHttpClient
>>> > Exception in thread "main" java.io.IOException: Requested array size
>>> exceeds VM limit
>>> > at
>>> java.net.http/jdk.internal.net.http.HttpClientImpl.send(HttpClientImpl.java:966)
>>> > at
>>> java.net.http/jdk.internal.net.http.HttpClientFacade.send(HttpClientFacade.java:133)
>>> > at
>>> LongResponseHeaderDemoHttpClient.main(LongResponseHeaderDemoHttpClient.java:13)
>>> > Caused by: java.lang.OutOfMemoryError: Requested array size exceeds VM
>>> limit
>>> > at java.base/java.util.Arrays.copyOf(Arrays.java:3541)
>>> > at
>>> java.base/java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:242)
>>> > at
>>> java.base/java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:806)
>>> > at java.base/java.lang.StringBuilder.append(StringBuilder.java:246)
>>> > at
>>> java.net.http/jdk.internal.net.http.Http1HeaderParser.readResumeHeader(Http1HeaderParser.java:250)
>>> > at
>>> java.net.http/jdk.internal.net.http.Http1HeaderParser.parse(Http1HeaderParser.java:124)
>>> > at
>>> java.net.http/jdk.internal.net.http.Http1Response$HeadersReader.handle(Http1Response.java:605)
>>> > at
>>> java.net.http/jdk.internal.net.http.Http1Response$HeadersReader.handle(Http1Response.java:536)
>>> > at
>>> java.net.http/jdk.internal.net.http.Http1Response$Receiver.accept(Http1Response.java:527)
>>> > at
>>> java.net.http/jdk.internal.net.http.Http1Response$HeadersReader.tryAsyncReceive(Http1Response.java:583)
>>> > at
>>> java.net.http/jdk.internal.net.http.Http1AsyncReceiver.flush(Http1AsyncReceiver.java:233)
>>> > at
>>> java.net.http/jdk.internal.net.http.Http1AsyncReceiver$$Lambda/0x00000008010dbd50.run(Unknown
>>> Source)
>>> > at
>>> java.net.http/jdk.internal.net.http.common.SequentialScheduler$LockingRestartableTask.run(SequentialScheduler.java:182)
>>> > at
>>> java.net.http/jdk.internal.net.http.common.SequentialScheduler$CompleteRestartableTask.run(SequentialScheduler.java:149)
>>> > at
>>> java.net.http/jdk.internal.net.http.common.SequentialScheduler$SchedulableTask.run(SequentialScheduler.java:207)
>>> > at
>>> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
>>> > at
>>> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
>>> > at java.base/java.lang.Thread.runWith(Thread.java:1596)
>>> > at java.base/java.lang.Thread.run(Thread.java:1583)
>>> >
>>> > Ostensibly, HttpClient just keeps on reading the never-ending header
>>> until it OOMs. This seems to confirm that there is no default limit to
>>> header size. It also seems like A Very Bad Thing to me. This suggests that
>>> any program that makes an HTTP request to an untrusted source using
>>> HttpClient, for example when crawling the web, is at risk of an OOM.
>>> >
>>> > For grins, I also wrote an application
>>> LongResponseHeaderDemoHttpURLConnection that does the same thing as
>>> LongResponseHeaderDemoHttpClient, just using HttpURLConnection instead of
>>> HttpClient. When I run LongResponseHeaderDemoServer in one terminal and
>>> LongResponseHeaderDemoHttpURLConnection in another terminal, this is what
>>> happens:
>>> >
>>> > $ java LongResponseHeaderDemoHttpURLConnection
>>> > Exception in thread "main" java.lang.NegativeArraySizeException:
>>> -1610612736
>>> > at
>>> java.base/sun.net.www.MessageHeader.mergeHeader(MessageHeader.java:526)
>>> > at
>>> java.base/sun.net.www.MessageHeader.parseHeader(MessageHeader.java:481)
>>> > at
>>> java.base/sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:804)
>>> > at java.base/sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:726)
>>> > at
>>> java.base/sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1688)
>>> > at
>>> java.base/sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1589)
>>> > at java.base/java.net.URL.openStream(URL.java:1161)
>>> > at
>>> LongResponseHeaderDemoHttpURLConnection.main(LongResponseHeaderDemoHttpURLConnection.java:12)
>>> >
>>> > So HttpURLConnection doesn't handle things gracefully either, but at
>>> least it doesn't OOM. That seems like a bug, too, but perhaps less severe.
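That variant is presumably little more than the following (a guess at the
scrubbed attachment; the stack trace above shows the same URL.openStream()
entry point):

```java
import java.io.InputStream;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Roughly what LongResponseHeaderDemoHttpURLConnection boils down to:
// URL.openStream() routes through sun.net.www.MessageHeader parsing, where
// the NegativeArraySizeException surfaces.
public class UrlConnectionSketch {
    public static String fetch(String url) throws Exception {
        try (InputStream in = new URL(url).openStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws Exception {
        // Blows up when pointed at the never-ending-header demo server.
        System.out.println(fetch("http://localhost:3000"));
    }
}
```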
>>> >
>>> > For reference, here's my java version:
>>> >
>>> > $ java -version
>>> > openjdk version "21.0.2" 2024-01-16 LTS
>>> > OpenJDK Runtime Environment Corretto-21.0.2.13.1 (build 21.0.2+13-LTS)
>>> > OpenJDK 64-Bit Server VM Corretto-21.0.2.13.1 (build 21.0.2+13-LTS,
>>> mixed mode, sharing)
>>> >
>>> > Can anyone check my work, and maybe reproduce? And ideally, can
>>> someone with more knowledge than me about java.net.http.HttpClient and/or
>>> java.net.HttpURLConnection please comment? Is this real, or have I made a
>>> mistake somewhere along the way? If it's real, what's next? A bug report?
>>> >
>>> > Andy Boothe
>>> > Email: andy.boothe at gmail.com
>>> > Mobile: (979) 574-1089
>>>
>>> <LongResponseHeaderDemoHttpClient.java>
>> <LongResponseHeaderDemoHttpURLConnection.java>
>> <LongResponseHeaderDemoServer.java>
>>
>>
>>
>