A parallel HttpClient sendAsync question

Weijun Wang weijun.wang at oracle.com
Tue Nov 13 03:18:05 UTC 2018



> On Nov 13, 2018, at 11:09 AM, Pavel Rappo <pavel.rappo at oracle.com> wrote:
> 
>> On 13 Nov 2018, at 02:35, Weijun Wang <weijun.wang at oracle.com> wrote:
>> 
>> I'm scanning a file and downloading links inside:
>> 
>> lines.flatMap(x -> Stream.ofNullable(findURIFrom(x)))
>>    .map(l -> download(c, l))
>>    .forEach(f -> f.join());
>> 
>> CompletableFuture<HttpResponse<Path>> download(HttpClient c, URI link) {
>>   return c.sendAsync(HttpRequest.newBuilder(link).build(),
>>           HttpResponse.BodyHandlers.ofFile(Path.of(link.getPath())));
>> }
>> 
>> However, it seems the downloads happen one by one rather than in parallel.
> 
> 1. CompletableFuture.join waits until the result (or exception) is available
> 2. I guess the "lines" stream is created from Files.lines(Path)?

It's from a blocking HTTP request:

   c.send(req, HttpResponse.BodyHandlers.ofLines()).body()

> If so, then
> 
>    * this method does not read all lines into a {@code List}, but instead
>    * populates lazily as the stream is consumed
> 
>> The only way I can think of is to collect the jobs into a list and then call join() on a CompletableFuture.allOf(...) built from that list. Is there a simpler way?
> 
> This seems to be a correct way of waiting until all the links have been
> downloaded (possibly in parallel, depending on the Executor you use for
> HttpClient) or an error occurred.
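
A likely reason the original pipeline runs one download at a time: streams are lazy and, by default, sequential, so each link passes through map() (which calls sendAsync) and then straight into forEach(), where join() blocks before the next line is even read. Collecting the futures first forces every sendAsync call to happen before anything waits. Below is a minimal sketch of that collect-then-allOf pattern. StartThenJoin, startAllThenJoin and startedCountsAtJoin are hypothetical names, and the simulated jobs stand in for real downloads so the example runs without a network:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class StartThenJoin {

    /** Starts one async job per item, waits for all of them, returns the results in order. */
    public static <T, R> List<R> startAllThenJoin(Stream<T> items,
                                                  Function<? super T, CompletableFuture<R>> start) {
        // collect() is a terminal operation: it drains the whole stream, so every
        // start.apply(...) (in the question, every sendAsync) runs before we wait.
        List<CompletableFuture<R>> jobs = items.map(start).collect(Collectors.toList());
        // allOf completes once every job has finished; join() rethrows if any failed.
        CompletableFuture.allOf(jobs.toArray(new CompletableFuture<?>[0])).join();
        // Every future is done here, so these join() calls return immediately.
        return jobs.stream().map(CompletableFuture::join).collect(Collectors.toList());
    }

    /**
     * Demo only: for each job, records how many jobs had been started at the
     * moment we began waiting on it.
     */
    public static List<Integer> startedCountsAtJoin(boolean collectFirst) {
        AtomicInteger started = new AtomicInteger();
        Function<Integer, CompletableFuture<Integer>> start = i -> {
            started.incrementAndGet();
            return CompletableFuture.supplyAsync(() -> i * i);  // stand-in for sendAsync
        };
        List<Integer> seen = new ArrayList<>();
        if (collectFirst) {
            // Start everything, then wait.
            List<CompletableFuture<Integer>> jobs =
                    Stream.of(1, 2, 3).map(start).collect(Collectors.toList());
            jobs.forEach(f -> { seen.add(started.get()); f.join(); });
        } else {
            // The question's shape: each element flows through map() and then
            // forEach() before the next one is pulled, so join() blocks in between.
            Stream.of(1, 2, 3).map(start).forEach(f -> { seen.add(started.get()); f.join(); });
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(startedCountsAtJoin(false));  // [1, 2, 3]: one job in flight at a time
        System.out.println(startedCountsAtJoin(true));   // [3, 3, 3]: all started before waiting
        System.out.println(startAllThenJoin(Stream.of(1, 2, 3),
                i -> CompletableFuture.supplyAsync(() -> i * 10)));  // [10, 20, 30]
    }
}
```

With the code from the question this would be roughly startAllThenJoin(lines.flatMap(x -> Stream.ofNullable(findURIFrom(x))), l -> download(c, l)). Note that allOf(...) only completes once every job has finished, and then completes exceptionally if any of them failed, so join() throws a CompletionException if even one download fails; per-link error handling (for example with handle()) is left out of the sketch.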

I didn't set an executor, but the @implNote says the default one is a thread pool.
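
For reference, HttpClient.Builder.executor(Executor) sets the executor "to be used for asynchronous and dependent tasks"; when it is not called, the client creates its own default. A small sketch assuming you want a caller-owned pool instead; the class name ClientWithExecutor and the pool size of 8 are illustrative choices, not recommendations:

```java
import java.net.http.HttpClient;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ClientWithExecutor {

    // Builds a client whose asynchronous and dependent tasks run on the given pool.
    public static HttpClient newClient(ExecutorService pool) {
        return HttpClient.newBuilder()
                .executor(pool)
                .build();
    }

    public static void main(String[] args) {
        // 8 threads is an arbitrary size for illustration.
        ExecutorService pool = Executors.newFixedThreadPool(8);
        try {
            HttpClient client = newClient(pool);
            // executor() is only non-empty when one was set on the builder.
            System.out.println(client.executor().isPresent());                    // true
            System.out.println(HttpClient.newHttpClient().executor().isPresent()); // false
        } finally {
            pool.shutdown();  // the caller owns the pool, so the caller shuts it down
        }
    }
}
```

The executor only decides where that work runs. In the original pipeline the one-at-a-time behaviour comes from join() being called before the next sendAsync, so a larger pool by itself would not change it.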

BTW, I am expecting 1000 links, but after 976 were downloaded it hangs and I have to Ctrl-C. I wish there were a way to stop the whole process when no new download has finished within 10 seconds, and to print out which jobs are not finished yet.
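
One way to get that "nothing finished for 10 seconds" behaviour is to wait on CompletableFuture.anyOf over the still-pending jobs with a timed get(), and to report whatever is left when that wait times out. A minimal sketch under those assumptions: StallWatch and awaitWithIdleTimeout are hypothetical names, jobs are keyed (for example by URI) so the unfinished ones can be printed, and a failed job counts as finished:

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class StallWatch {

    /**
     * Waits for every job, but gives up as soon as no job has completed for
     * {@code idle}. Returns the keys of the jobs that are still unfinished
     * (an empty list means everything completed, successfully or not).
     */
    public static <K> List<K> awaitWithIdleTimeout(Map<K, ? extends CompletableFuture<?>> jobs,
                                                   Duration idle) {
        Map<K, CompletableFuture<?>> remaining = new LinkedHashMap<>(jobs);
        while (true) {
            // Forget every job that has finished, normally or exceptionally.
            remaining.values().removeIf(CompletableFuture::isDone);
            if (remaining.isEmpty()) {
                return List.of();
            }
            // Completes as soon as any still-pending job completes.
            CompletableFuture<Object> any =
                    CompletableFuture.anyOf(remaining.values().toArray(new CompletableFuture<?>[0]));
            try {
                any.get(idle.toMillis(), TimeUnit.MILLISECONDS);
            } catch (ExecutionException e) {
                // A failed job is still progress; it is dropped on the next pass.
            } catch (TimeoutException e) {
                // Nothing finished within the idle window: report what is left.
                return new ArrayList<>(remaining.keySet());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return new ArrayList<>(remaining.keySet());
            }
        }
    }

    public static void main(String[] args) {
        Map<String, CompletableFuture<String>> jobs = new LinkedHashMap<>();
        jobs.put("a", CompletableFuture.supplyAsync(() -> "done"));            // finishes quickly
        jobs.put("b", CompletableFuture.failedFuture(new RuntimeException())); // already failed
        jobs.put("c", new CompletableFuture<>());                              // never completes
        List<String> stuck = awaitWithIdleTimeout(jobs, Duration.ofMillis(200));
        System.out.println("Unfinished: " + stuck);  // Unfinished: [c]
    }
}
```

For the downloads, the map could be built from the stream of links with .distinct().collect(Collectors.toMap(l -> l, l -> download(c, l), (a, b) -> a, LinkedHashMap::new)), which also starts every request up front, and then passed in with Duration.ofSeconds(10). The sketch only reports the stuck jobs; it does not cancel them or close their connections.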

Thanks
Max




More information about the net-dev mailing list