Parallel processing of lines in a file (was RE: Basic functional style question)

Millies, Sebastian Sebastian.Millies at softwareag.com
Thu Nov 28 02:24:04 PST 2013


Hello there,

Assuming very large log files, how well does that approach parallelize?
Can/should I use "Files.lines(path).parallel()"?

I assume that the JRE would buffer blocks of lines in an array, the elements of which would then
be processed in parallel. Is that right? Is there any way to tweak the buffer size etc.? And will the next
block be read in parallel with the processing of the previous one, or only after processing of each block has finished?
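
For concreteness, here is a minimal sketch of the kind of call I mean. The class name, the method name and the temporary sample file are made up for illustration, and the sketch makes no claim about how the JRE buffers or splits the input; it only shows the parallel() call on the stream returned by Files.lines, wrapped in try-with-resources so the file is closed afterwards.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.stream.Stream;

public class ParallelLines {
    // Hypothetical helper: counts the lines containing `needle`,
    // asking the stream library to process the lines in parallel.
    static long countMatching(Path path, String needle) throws IOException {
        // Files.lines keeps the file open until the stream is closed,
        // so the stream is managed with try-with-resources.
        try (Stream<String> lines = Files.lines(path)) {
            return lines.parallel()
                        .filter(line -> line.contains(needle))
                        .count();
        }
    }

    public static void main(String[] args) throws IOException {
        // Illustrative data only: a small temporary file stands in for a large log.
        Path tmp = Files.createTempFile("sample", ".log");
        try {
            Files.write(tmp,
                        Arrays.asList("Current a/", "other", "Current b|"),
                        StandardCharsets.UTF_8);
            System.out.println(countMatching(tmp, "Current"));
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

Whether this actually runs faster than the sequential version on a large log is exactly what I am unsure about.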

-- Sebastian

-----Original Message-----
From: lambda-dev-bounces at openjdk.java.net [mailto:lambda-dev-bounces at openjdk.java.net] On Behalf Of Stuart Marks
Sent: Wednesday, November 27, 2013 9:18 AM
To: mohan.radhakrishnan at polarisft.com; lambda-dev at openjdk.java.net
Subject: Re: Basic functional style question

I started scratching this out and got a sense of déjà vu, and then I realized that Kirk Pepperdine wrote a blog post last week about using lambdas to parse and pattern match GC logs. Were you at his talk at Øredev?

https://weblogs.java.net/blog/kcpeppe/archive/2013/11/10/fun-lambdas

The approach he ended up with (with help from Brian Goetz), applied to your example, would look something like this:

     Pattern pat = Pattern.compile("Current.*?[/|]");
     Files.lines(Paths.get(...))
          .map(pat::matcher)
          .filter(Matcher::find)
          .map(Matcher::group)
          .forEach(System.out::println);

Note this uses Pattern and Matcher instead of Scanner, as they're more flexible.

The insight is to map each input string to a Matcher so that midway through the pipeline we have Stream<Matcher>. We then filter the matchers to get only the successful ones, and then extract the matched string from them.
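
To make the stages concrete, here is a self-contained sketch of the same pipeline run over in-memory strings instead of a file. The class name, method name and sample lines are invented for illustration; only the pattern and the three pipeline steps come from the snippet above.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class MatcherPipeline {
    // Same pattern as in the snippet above.
    static final Pattern PAT = Pattern.compile("Current.*?[/|]");

    // Hypothetical helper: returns the first match from each line that has one.
    static List<String> firstMatches(Stream<String> lines) {
        return lines
                .map(PAT::matcher)      // Stream<String> becomes Stream<Matcher>
                .filter(Matcher::find)  // keep matchers that find a match; find() also records it
                .map(Matcher::group)    // extract the text matched by find()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Made-up sample lines, not real log data.
        Stream<String> sample = Stream.of(
                "Current value 42/ rest",
                "nothing here",
                "xx Current a|b/");
        firstMatches(sample).forEach(System.out::println);
    }
}
```

Note that the filter step has a side effect: Matcher::find advances the matcher, which is what lets the following Matcher::group call succeed. Because each line gets its own Matcher, that state is not shared between elements.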

Does this do what you want?

s'marks
