<div dir="ltr"><div>First, thank you both for the responses. I know how busy everyone is, and I really appreciate the time.<br></div><div><br></div><div>We can talk about use cases and architecture and such, but I think we all agree that a developer should be able to make an HTTP request with HttpClient without worrying about whether or not it will cause an OOM. Or, at least, that whether or not it causes an OOM should be fully within their control. And that's not where this implementation is right now.</div><br><div>I’d be very happy to work on a fix for this. Would it be out of order for me to propose a patch?</div><div><br></div><div><div dir="ltr" class="gmail_signature" data-smartmail="gmail_signature"><div dir="ltr">Andy Boothe<div><b>Email</b>: <a href="mailto:andy.boothe@gmail.com" target="_blank">andy.boothe@gmail.com</a></div><div><b>Mobile</b>: (979) 574-1089<br></div></div></div></div><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Mon, Jul 29, 2024 at 3:57 PM robert engels <<a href="mailto:rengels@ix.netcom.com" target="_blank">rengels@ix.netcom.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div>Yes, but normally you fork a worker process that tracks progress and scrapes N sites. 
If the worker process dies processing a site, the site is marked “bad” and only periodically scraped after a retry/backoff period.<div><br></div><div>There are probably a lot of ways to crash a worker process, intentionally or accidentally - a robust design is called for.</div><div><br></div><div>As an aside, if I were writing a large-scale scraper I don’t think I would use HttpClient anyway - I think a custom URL accessor would be easier to monitor, etc.<br><div><br><blockquote type="cite"><div>On Jul 29, 2024, at 3:43 PM, Ethan McCue <<a href="mailto:ethan@mccue.dev" target="_blank">ethan@mccue.dev</a>> wrote:</div><br><div><div dir="auto">Scraping of unknown/untrusted websites is a common task in certain...fields? I don't want to comment on it too deeply, but I know that is something folks would do.<div dir="auto"><br></div><div dir="auto">Imagine a site where someone inputs a URL, clicks submit, and then with the power of funding they return a summary of the page.</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Mon, Jul 29, 2024, 3:52 PM robert engels <<a href="mailto:rengels@ix.netcom.com" target="_blank">rengels@ix.netcom.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div>Isn’t the HttpClient almost always used to access other services?<div><br></div><div>Why would a developer access a malicious service?</div><div><br></div><div>I also think there are lots of ways for a service to crash the client - e.g., it could attempt to return a very large response - if the client uses a memory buffered reader, it will cause an OOM as well.<br><div><br><blockquote type="cite"><div>On Jul 29, 2024, at 2:42 PM, Andy Boothe <<a href="mailto:andy.boothe@gmail.com" rel="noreferrer" target="_blank">andy.boothe@gmail.com</a>> wrote:</div><br><div><div dir="ltr"><div>Following up here.</div><div><br></div><div>I believe I have discovered that it is 
possible to craft a malicious HTTP response that can cause the built-in HttpURLConnection and HttpClient implementations to throw exceptions. Specifically, HttpURLConnection can be made to throw a NegativeArraySizeException, and HttpClient can be made to throw an OutOfMemoryError. Proof of this behavior is in the attached (very simple) Java programs.<br></div><div><br></div><div>This seems like A Bad Thing to me.</div><div><br></div><div>I've moved from the dev list to this list based on a recommendation from that list. Is this the right list? If not, can you point me in the right direction? Perhaps a security list?</div><div><br></div><div>Thank you,<br></div><div><br clear="all"></div><div dir="ltr"><div><div dir="ltr" class="gmail_signature"><div dir="ltr">Andy Boothe<div><b>Email</b>: <a href="mailto:andy.boothe@gmail.com" rel="noreferrer" target="_blank">andy.boothe@gmail.com</a></div><div><b>Mobile</b>: (979) 574-1089<br></div></div></div></div></div></div><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Jul 24, 2024 at 4:47 PM Andy Boothe <<a href="mailto:andy.boothe@gmail.com" rel="noreferrer" target="_blank">andy.boothe@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div>Hello,</div><div><br></div><div>I'm moving this thread from jdk-dev to this list on the sage advice of Pavel Rappo.</div><div><br></div><div>As a brief recap, it looks like HttpClient and HttpURLConnection do not currently support a way to set the maximum acceptable response header length. As a result, sending HTTP requests with these classes that result in a response with very long headers causes an OutOfMemoryError and a NegativeArraySizeException, respectively. (Simple programs for reproducing the issue are attached.) This seems like A Bad Thing. 
There is a (very brief) discussion in the thread about how to handle it, but of course you guys are the experts.</div><div><br></div><div>If my head is on straight and this turns out to be a real issue as opposed to a mistake on my part, I'm keen to help however I can. <br></div><div><br></div><div><div><div dir="ltr" class="gmail_signature"><div dir="ltr">Andy Boothe<div><b>Email</b>: <a href="mailto:andy.boothe@gmail.com" rel="noreferrer" target="_blank">andy.boothe@gmail.com</a></div><div><b>Mobile</b>: (979) 574-1089<br></div></div></div></div><br><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">---------- Forwarded message ---------<br>From: <b class="gmail_sendername" dir="auto">Pavel Rappo</b> <span dir="auto"><<a href="mailto:pavel.rappo@oracle.com" rel="noreferrer" target="_blank">pavel.rappo@oracle.com</a>></span><br>Date: Wed, Jul 24, 2024 at 4:30 PM<br>Subject: Re: Very long response headers and java.net.http.HttpClient?<br>To: Andy Boothe <<a href="mailto:andy.boothe@gmail.com" rel="noreferrer" target="_blank">andy.boothe@gmail.com</a>><br>Cc: <a href="mailto:jdk-dev@openjdk.org" rel="noreferrer" target="_blank">jdk-dev@openjdk.org</a> <<a href="mailto:jdk-dev@openjdk.org" rel="noreferrer" target="_blank">jdk-dev@openjdk.org</a>><br></div><br><br>
<div>
<div><font size="2"><span style="font-size:11pt">
<div>A proper list would be net-dev at <a href="http://openjdk.java.net/" rel="noreferrer" target="_blank">openjdk.java.net</a>.<br>
<br>
> On 24 Jul 2024, at 21:13, Andy Boothe <<a href="mailto:andy.boothe@gmail.com" rel="noreferrer" target="_blank">andy.boothe@gmail.com</a>> wrote:<br>
> <br>
> Hello,<br>
> <br>
> I'm documenting some guidelines for using java.net.http.HttpClient defensively for my team. For example: "Always set a request timeout", "Don't assume HTTP response entities are small and/or will fit in memory", etc.<br>
> <br>
> One guideline I'd like to document is "Set a maximum for HTTP response header size." However, I can't seem to find a way to set that limit, either in documentation or in OpenJDK code.<br>
> <br>
> I tried my best to search the archives for this mailing list for any mentions, but came up empty.<br>
> <br>
> To make sure my head is on straight and there isn't an undocumented limit set by default, I wrote the attached (very quick and dirty) client and server programs. LongResponseHeaderDemoServer opens a raw server socket and reads (what it assumes is) a well-formed
HTTP request, and then prints an HTTP response which includes a response header of infinite length. LongResponseHeaderDemoHttpClient uses java.net.http.HttpClient to make a request and print the response body.<br>
> <br>
> When I run LongResponseHeaderDemoServer in one terminal and make a curl request to the server in another terminal, this is what curl spits out:<br>
> <br>
> $ curl -vvv -D - <a href="http://localhost:3000/" rel="noreferrer" target="_blank">http://localhost:3000</a><br>
> * Host localhost:3000 was resolved.<br>
> * IPv6: ::1<br>
> * IPv4: 127.0.0.1<br>
> * Trying [::1]:3000...<br>
> * Connected to localhost (::1) port 3000<br>
> > GET / HTTP/1.1<br>
> > Host: localhost:3000<br>
> > User-Agent: curl/8.6.0<br>
> > Accept: */*<br>
> > <br>
> < HTTP/1.1 200 OK<br>
> HTTP/1.1 200 OK<br>
> < Content-Type: text/plain<br>
> Content-Type: text/plain<br>
> < Connection: close<br>
> Connection: close<br>
> < Content-Length: 3<br>
> Content-Length: 3<br>
> * Closing connection<br>
> curl: (100) A value or data field grew larger than allowed<br>
> <br>
> So curl detects the long response header and bails out. Safe and sane.<br>
> <br>
> However, when I run LongResponseHeaderDemoServer in one terminal and run LongResponseHeaderDemoHttpClient in another terminal, this is what happens:<br>
> <br>
> $ java LongResponseHeaderDemoHttpClient <br>
> Exception in thread "main" java.io.IOException: Requested array size exceeds VM limit<br>
> at java.net.http/jdk.internal.net.http.HttpClientImpl.send(HttpClientImpl.java:966)<br>
> at java.net.http/jdk.internal.net.http.HttpClientFacade.send(HttpClientFacade.java:133)<br>
> at LongResponseHeaderDemoHttpClient.main(LongResponseHeaderDemoHttpClient.java:13)<br>
> Caused by: java.lang.OutOfMemoryError: Requested array size exceeds VM limit<br>
> at java.base/java.util.Arrays.copyOf(Arrays.java:3541)<br>
> at java.base/java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:242)<br>
> at java.base/java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:806)<br>
> at java.base/java.lang.StringBuilder.append(StringBuilder.java:246)<br>
> at java.net.http/jdk.internal.net.http.Http1HeaderParser.readResumeHeader(Http1HeaderParser.java:250)<br>
> at java.net.http/jdk.internal.net.http.Http1HeaderParser.parse(Http1HeaderParser.java:124)<br>
> at java.net.http/jdk.internal.net.http.Http1Response$HeadersReader.handle(Http1Response.java:605)<br>
> at java.net.http/jdk.internal.net.http.Http1Response$HeadersReader.handle(Http1Response.java:536)<br>
> at java.net.http/jdk.internal.net.http.Http1Response$Receiver.accept(Http1Response.java:527)<br>
> at java.net.http/jdk.internal.net.http.Http1Response$HeadersReader.tryAsyncReceive(Http1Response.java:583)<br>
> at java.net.http/jdk.internal.net.http.Http1AsyncReceiver.flush(Http1AsyncReceiver.java:233)<br>
> at java.net.http/jdk.internal.net.http.Http1AsyncReceiver$$Lambda/0x00000008010dbd50.run(Unknown Source)<br>
> at java.net.http/jdk.internal.net.http.common.SequentialScheduler$LockingRestartableTask.run(SequentialScheduler.java:182)<br>
> at java.net.http/jdk.internal.net.http.common.SequentialScheduler$CompleteRestartableTask.run(SequentialScheduler.java:149)<br>
> at java.net.http/jdk.internal.net.http.common.SequentialScheduler$SchedulableTask.run(SequentialScheduler.java:207)<br>
> at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)<br>
> at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)<br>
> at java.base/java.lang.Thread.runWith(Thread.java:1596)<br>
> at java.base/java.lang.Thread.run(Thread.java:1583)<br>
> <br>
> Apparently, HttpClient just keeps on reading the never-ending header until it OOMs. This seems to confirm that there is no default limit on header size. It also seems like A Very Bad Thing to me. This suggests that any time a program makes an HTTP request
to an untrusted source using HttpClient, for example when crawling the web, it is at risk of an OOM.<br>
> <br>
> For grins, I also wrote an application LongResponseHeaderDemoHttpURLConnection that does the same thing as LongResponseHeaderDemoHttpClient, just using HttpURLConnection instead of HttpClient. When I run LongResponseHeaderDemoServer in one terminal and LongResponseHeaderDemoHttpURLConnection
in another terminal, this is what happens:<br>
> <br>
> $ java LongResponseHeaderDemoHttpURLConnection<br>
> Exception in thread "main" java.lang.NegativeArraySizeException: -1610612736<br>
> at java.base/sun.net.www.MessageHeader.mergeHeader(MessageHeader.java:526)<br>
> at java.base/sun.net.www.MessageHeader.parseHeader(MessageHeader.java:481)<br>
> at java.base/sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:804)<br>
> at java.base/sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:726)<br>
> at java.base/sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1688)<br>
> at java.base/sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1589)<br>
> at java.base/java.net.URL.openStream(URL.java:1161)<br>
> at LongResponseHeaderDemoHttpURLConnection.main(LongResponseHeaderDemoHttpURLConnection.java:12)<br>
> <br>
> So HttpURLConnection doesn't handle things gracefully either, but at least it doesn't OOM. That seems like a bug, too, but perhaps less severe.<br>
> <br>
> For reference, here's my java version:<br>
> <br>
> $ java -version<br>
> openjdk version "21.0.2" 2024-01-16 LTS<br>
> OpenJDK Runtime Environment Corretto-21.0.2.13.1 (build 21.0.2+13-LTS)<br>
> OpenJDK 64-Bit Server VM Corretto-21.0.2.13.1 (build 21.0.2+13-LTS, mixed mode, sharing)<br>
> <br>
> Can anyone check my work, and maybe reproduce? And ideally, can someone with more knowledge than me about java.net.http.HttpClient and/or java.net.HttpURLConnection please comment? Is this real, or have I made a mistake somewhere along the way? If it's real,
what's next? A bug report?<br>
> <br>
</div>
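For readers without the attachments, the test server described in the quoted message (read a request from a raw socket, then answer with an oversized response header) might look roughly like the sketch below. This is not the attached LongResponseHeaderDemoServer.java. The class name, the X-Padding header name, and the finite, configurable header length are all assumptions made for illustration; the original program is described as sending a header of infinite length.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch, not the program attached to the thread.
class LongHeaderServerSketch {

    // Writes an HTTP/1.1 response whose last header value is headerValueBytes long.
    // "X-Padding" is an invented header name used only for this illustration.
    static void writeResponse(OutputStream out, long headerValueBytes) throws IOException {
        out.write(ascii("HTTP/1.1 200 OK\r\n"));
        out.write(ascii("Content-Type: text/plain\r\n"));
        out.write(ascii("Connection: close\r\n"));
        out.write(ascii("Content-Length: 3\r\n"));
        out.write(ascii("X-Padding: "));

        // Stream the header value in chunks so the server itself stays small in memory.
        byte[] chunk = new byte[8192];
        Arrays.fill(chunk, (byte) 'a');
        long remaining = headerValueBytes;
        while (remaining > 0) {
            int n = (int) Math.min(chunk.length, remaining);
            out.write(chunk, 0, n);
            remaining -= n;
        }

        // End of headers, then the 3-byte body promised by Content-Length.
        out.write(ascii("\r\n\r\nabc"));
        out.flush();
    }

    // Accepts a single connection, skips the request head, and sends the response.
    static void serveOnce(int port, long headerValueBytes) throws IOException {
        try (ServerSocket server = new ServerSocket(port);
             Socket socket = server.accept()) {
            BufferedReader reader = new BufferedReader(
                    new InputStreamReader(socket.getInputStream(), StandardCharsets.US_ASCII));
            String line;
            // Assumes a well-formed request: read until the blank line that ends the head.
            while ((line = reader.readLine()) != null && !line.isEmpty()) {
                // Request lines are ignored in this sketch.
            }
            writeResponse(socket.getOutputStream(), headerValueBytes);
        }
    }

    private static byte[] ascii(String s) {
        return s.getBytes(StandardCharsets.US_ASCII);
    }
}
```

Calling LongHeaderServerSketch.serveOnce(3000, Long.MAX_VALUE) would approximate the never-ending header from the report; a client that gives up early will typically surface as an IOException on the server side.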
</span></font></div>
<div><font size="2"><span style="font-size:11pt">
<div><br>
</div>
</span></font></div>
</div>
</div></div></div>
</blockquote></div>
<span id="m_8593777315149732547m_-4338633962271655457m_-7505231832731471893cid:f_lz7e35y50"><LongResponseHeaderDemoHttpClient.java></span><span id="m_8593777315149732547m_-4338633962271655457m_-7505231832731471893cid:f_lz7e35yd1"><LongResponseHeaderDemoHttpURLConnection.java></span><span id="m_8593777315149732547m_-4338633962271655457m_-7505231832731471893cid:f_lz7e35ye2"><LongResponseHeaderDemoServer.java></span></div></blockquote></div><br></div></div></blockquote></div>
</div></blockquote></div><br></div></div></blockquote></div>
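As a rough illustration of the defensive guidelines mentioned in the original message ("Always set a request timeout", "Don't assume HTTP response entities are small and/or will fit in memory"), here is a minimal sketch using only public java.net.http APIs. The class name, the readBounded helper, and the specific timeout values are assumptions for illustration. Note that this caps only the response body; it does nothing about response header size, which is the gap this thread is about.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Hypothetical sketch of a defensive fetch; names and limits are illustrative.
class DefensiveFetch {

    // Reads the stream fully, but fails once more than maxBytes have been seen.
    static byte[] readBounded(InputStream in, int maxBytes) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        long total = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            total += n;
            if (total > maxBytes) {
                throw new IOException("response body exceeds " + maxBytes + " bytes");
            }
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }

    // Fetches a URI with connect and request timeouts and a capped body size.
    static byte[] fetch(URI uri, int maxBytes) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(5))   // assumed value
                .build();
        HttpRequest request = HttpRequest.newBuilder(uri)
                .timeout(Duration.ofSeconds(10))         // assumed value
                .GET()
                .build();
        // Stream the body instead of buffering it all with BodyHandlers.ofString().
        var response = client.send(request, HttpResponse.BodyHandlers.ofInputStream());
        try (InputStream body = response.body()) {
            return readBounded(body, maxBytes);
        }
    }
}
```

For example, DefensiveFetch.fetch(URI.create("http://localhost:3000/"), 1_000_000) would refuse any body larger than about 1 MB, while an oversized header would still be handled entirely inside the client before this code gets a chance to run.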