Data server design questions
Zhong Yu
zhong.j.yu at gmail.com
Tue Sep 4 18:02:48 PDT 2012
About the claim that traditional socket IO is 30% faster than
Selector-based NIO: it's unclear how they did the benchmark. At the time
I also ran some benchmarks myself and got the same 30% number. However,
the test case is too simplistic. Basically, on a single machine, some
clients write() data to the server and the server does read(). Measured
by throughput, the blocking version is 30% faster than the non-blocking
version.
There's a critical flaw though. The server calls read(byte[]) or
read(ByteBuffer), but doesn't do anything with the data received.
That's an extremely high-throughput but terribly useless server! As
soon as the server does the most basic thing with the data - reading
every byte once - the throughput drops significantly, and there's not
much difference between blocking and non-blocking any more.
If the 30% number they got is also from such extreme and unrealistic
setups, it has little value for evaluating real server performance.
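For illustration only (the benchmark code from this thread is not shown), here is a minimal Java sketch of the two server-side read loops being compared. One only calls read() and discards the buffer. The other touches every byte once, here by folding it into a simple checksum. The class ReadLoops and its methods are hypothetical names.

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper contrasting the two benchmark "servers" described above.
class ReadLoops {

    // Unrealistic: reads into the buffer but never looks at the data.
    // Returns only the number of bytes transferred.
    static long discardAll(InputStream in) throws IOException {
        byte[] buf = new byte[8192];
        long total = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            total += n;
        }
        return total;
    }

    // Minimal realistic work: every received byte is read once
    // and folded into a simple additive checksum.
    static long checksumAll(InputStream in) throws IOException {
        byte[] buf = new byte[8192];
        long sum = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            for (int i = 0; i < n; i++) {
                sum += buf[i] & 0xFF; // treat each byte as unsigned 0..255
            }
        }
        return sum;
    }
}
```

In a benchmark, either method would be called on each accepted socket's InputStream. Only the second does the minimal per-byte work that any real server has to do.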
Zhong Yu
On Tue, Sep 4, 2012 at 3:04 PM, Tom Lasseter <t.lasseter at comcast.net> wrote:
> I am developing a high-performance cloud data server primarily for the
> display of large engineering and scientific datasets. The client-side is a
> rich Java application, though a browser client would be possible if the GUI were
> limited.
>
> The challenge is the diversity of libraries and technologies available.
> I have spent weeks analyzing and prototyping. The decision on what path to
> take is a complex one requiring knowledge of not just the detailed software
> architecture of the various design possibilities, but
> projections/predictions of how effectively they will function in a
> heavy-load environment.
>
> Although most questions on this OpenJDK list are low-level, I
> believe this is the right venue for my questions as the experts who
> understand the code well enough to design and improve it are the only ones
> who can expertly answer the questions I’m posing.
>
> A data server is pretty general, so I will be more explicit about the
> requirements for this one.
>
> The data consists of ASCII files, JSON-like key-value files, each of which
> describes an object. Associated with each of these object files is a
> series of binary files that contain the bulk of the data. These data
> objects fall into two extreme groups: 1) large 3D/4D volumes from which the
> user wishes to extract a subset of the data; 2) smaller objects, many
> thousands of which the user may wish to display in full.
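A minimal sketch of extracting a subset from one of the large volumes follows. It rests on a hypothetical assumption: each volume's binary file holds big-endian 32-bit float samples, laid out so that k varies fastest. The class VolumeSlicer, the method sliceAtJ, and that layout are illustrative assumptions, not the actual file format. The code uses positional FileChannel reads so that only the requested slice is read from disk.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch: the volume layout below is an assumption, not a known format.
// Samples are big-endian 32-bit floats stored with k varying fastest:
//   offset(i, j, k) = ((i * ny + j) * nz + k) * 4
class VolumeSlicer {

    // Returns the 2D slice at fixed j as a flat array of nx * nz floats.
    // Each of the nx rows is one contiguous run of nz samples, so the
    // slice costs nx positional reads instead of reading the whole volume.
    static float[] sliceAtJ(Path file, int nx, int ny, int nz, int j) throws IOException {
        if (j < 0 || j >= ny) throw new IllegalArgumentException("j out of range: " + j);
        float[] out = new float[nx * nz];
        ByteBuffer row = ByteBuffer.allocate(nz * 4); // big-endian by default
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            for (int i = 0; i < nx; i++) {
                long pos = ((long) i * ny + j) * nz * 4L;
                row.clear();
                while (row.hasRemaining()) {
                    // Positional read: does not move the channel's own position,
                    // so in principle one channel could be shared by several workers.
                    int n = ch.read(row, pos + row.position());
                    if (n < 0) throw new IOException("unexpected end of volume file");
                }
                row.flip();
                row.asFloatBuffer().get(out, i * nz, nz);
            }
        }
        return out;
    }
}
```

Under this layout, a slice at fixed i is a single contiguous read, while a slice at fixed k would need one small read per (i, j) pair. The on-disk ordering therefore largely determines which cursor directions are cheap to serve.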
>
> For the large volume files, the user is often moving a cursor through the
> volume, so the server will need to deliver newly selected segments
> constantly. It makes sense to have a server thread that monitors requests
> for this volume, which may arrive simultaneously from multiple users, and
> starts worker threads to serve them. Client-server sockets should stay
> alive for some period of time, and the associated server worker thread
> might be kept alive as well until the user ceases activity.
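With blocking IO, one way to express "keep the socket and its worker thread alive until the user ceases activity" is a read timeout. Socket.setSoTimeout makes a blocked read throw SocketTimeoutException once the given idle period passes. The sketch below assumes an accept loop hands each accepted Socket to a new thread running this worker. The 4-byte integer request/response protocol and the class name KeepAliveWorker are hypothetical.

```java
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical per-connection worker. Wire protocol (an assumption for this sketch):
// the client sends one 4-byte int per request (e.g. a slice index); the server
// answers each with one 4-byte int standing in for the real slice payload.
class KeepAliveWorker implements Runnable {
    private final Socket socket;
    private final int idleMillis;

    KeepAliveWorker(Socket socket, int idleMillis) {
        this.socket = socket;
        this.idleMillis = idleMillis;
    }

    @Override
    public void run() {
        try (Socket s = socket) {
            // Any blocking read that waits longer than idleMillis throws
            // SocketTimeoutException, which we treat as "user went idle".
            s.setSoTimeout(idleMillis);
            DataInputStream in = new DataInputStream(s.getInputStream());
            DataOutputStream out = new DataOutputStream(
                    new BufferedOutputStream(s.getOutputStream()));
            while (true) {
                int request = in.readInt(); // blocks until the next request
                out.writeInt(request);      // placeholder for the selected segment
                out.flush();
            }
        } catch (SocketTimeoutException | EOFException e) {
            // Idle timeout or client closed the connection: let the thread finish.
        } catch (IOException e) {
            // Any other I/O failure also ends this connection's worker.
        }
    }
}
```

The timeout applies to each read call, so it measures idle time between requests rather than the total lifetime of the connection.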
>
> For the large sets of smaller object files, the options are to make
> individual requests for each file and have the client process and display
> the data as received, or to make a request for all the objects and have the
> server send them sequentially on a single socket. The optimization here is
> a bit tricky: the IO can be running on several sockets or sequentially on a
> single socket, and the processing on both server and client ends could be
> run on single or multiple threads in either case.
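For the single-socket option, the objects need framing so the receiver can tell where one ends and the next begins. A common approach is a length-prefixed stream, sketched here under assumptions of our own: a 4-byte length prefix per object and -1 as an end-of-batch marker. The class name Framing is hypothetical.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.List;

// Hypothetical framing for sending many objects back-to-back on one socket:
// each object is a 4-byte big-endian length followed by that many bytes,
// and a length of -1 marks the end of the batch.
class Framing {

    static void writeBatch(OutputStream os, List<byte[]> objects) throws IOException {
        DataOutputStream out = new DataOutputStream(os);
        for (byte[] obj : objects) {
            out.writeInt(obj.length);
            out.write(obj);
        }
        out.writeInt(-1); // end-of-batch marker
        out.flush();
    }

    // Reads frames until the end marker. A client could instead hand each
    // frame to a worker thread as soon as it arrives.
    static List<byte[]> readBatch(InputStream is) throws IOException {
        DataInputStream in = new DataInputStream(is);
        List<byte[]> result = new ArrayList<>();
        while (true) {
            int len = in.readInt(); // throws EOFException if the stream ends early
            if (len == -1) return result;
            if (len < 0) throw new IOException("bad frame length: " + len);
            byte[] obj = new byte[len];
            in.readFully(obj);
            result.add(obj);
        }
    }
}
```

Framing keeps the two choices in the paragraph above independent. The server can stream objects sequentially on one socket, while the client still dispatches each decoded frame to a pool of processing threads.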
>
> QUESTION 1: Looking at these requirements and scenarios, how would you
> design this client-server system in general terms?
>
> An interesting analysis and presentation was put together by Paul Tyma:
>
> “Thousands of Threads and Blocking I/O: The old way to write Java Servers is
> New again (and way better)”
>
> http://www.mailinator.com/tymaPaulMultithreaded.pdf
>
> where he makes the case that asynchronous IO was developed because of
> performance issues in creating and switching threads, but that hardware
> improvements have since significantly reduced these problems. His
> comparison shows that multithreaded blocking IO is simpler and more
> efficient than asynchronous IO. QUESTION 2: What are your comments on
> Paul’s analysis and has it changed since his analysis and presentation
> (2008)?
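Below is a minimal sketch of the thread-per-connection blocking style described above. One thread blocks in accept(), and each accepted socket gets its own thread doing plain blocking reads and writes. Echo stands in for real request handling. ThreadPerConnectionServer is a hypothetical name, and this is not code from the presentation.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Thread-per-connection blocking server: the cached pool creates a new
// thread whenever every existing one is busy serving a connection.
class ThreadPerConnectionServer {
    private final ServerSocket serverSocket;
    private final ExecutorService pool = Executors.newCachedThreadPool();

    ThreadPerConnectionServer(int port) throws IOException {
        serverSocket = new ServerSocket(port);
    }

    int port() {
        return serverSocket.getLocalPort();
    }

    // Starts the accept loop on its own thread and returns immediately.
    void start() {
        Thread acceptor = new Thread(new Runnable() {
            @Override
            public void run() {
                try {
                    while (true) {
                        final Socket s = serverSocket.accept();
                        pool.execute(new Runnable() {
                            @Override
                            public void run() {
                                echo(s);
                            }
                        });
                    }
                } catch (IOException e) {
                    // accept() fails once the server socket is closed: stop accepting.
                }
            }
        });
        acceptor.setDaemon(true);
        acceptor.start();
    }

    private static void echo(Socket s) {
        try (Socket socket = s;
             InputStream in = socket.getInputStream();
             OutputStream out = socket.getOutputStream()) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) { // blocks only this connection's thread
                out.write(buf, 0, n);
            }
        } catch (IOException e) {
            // Connection error: this thread simply ends.
        }
    }

    void stop() throws IOException {
        serverSocket.close();
        pool.shutdownNow();
    }
}
```

Each connection's logic stays a straight-line loop. The cost is one thread, and its stack, per connected client, which is the trade-off the analysis above says hardware improvements have made affordable.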
>
> The range of technologies out there is quite confusing. Here are the main
> ones I’ve studied in detail.
>
> 1) Java 7 with its outstanding NIO.2 API; Anghel Leonard’s book
> describing many of its features is amazing: there are dozens of examples
> which can be rapidly loaded and tested, and they all work (!); the problem is
> that no one seems to be adopting the Java 7 socket APIs; many, such as the
> Netty group, say they are somewhat incompatible and will be adopted at a low
> level under the existing Netty API; Jetty does not support asynchronous IO
> at all (as far as I know); QUESTION 3: What are the issues with the Java 7
> socket APIs that are inhibiting their adoption?
>
> 2) Apache MINA; a lightweight server with an FTP server implemented;
> uses blocking IO;
>
> 3) Netty asynchronous NIO; excellent design for an event-driven
> client-server system; does not have a TCP file-server implementation
> comparable to FTP;
>
> 4) Kaazing socket gateway server; closed source, not asynchronous;
>
> 5) Waarp FTP and R66 file servers; heavy-duty file server built on
> Netty by the French government; impressive and complex.
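For comparison with the list above, here is a minimal sketch of the Java 7 NIO.2 asynchronous socket API (AsynchronousServerSocketChannel with CompletionHandler callbacks) serving the same kind of echo logic. No thread blocks per connection; threads from the JDK's default channel group run the handlers when an accept, read, or write completes. AsyncEchoServer is a hypothetical name, and the sketch only illustrates the API style.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousServerSocketChannel;
import java.nio.channels.AsynchronousSocketChannel;
import java.nio.channels.CompletionHandler;

// Echo server on the NIO.2 asynchronous API: every step is a callback.
class AsyncEchoServer {
    private final AsynchronousServerSocketChannel server;

    AsyncEchoServer() throws IOException {
        server = AsynchronousServerSocketChannel.open()
                .bind(new InetSocketAddress("127.0.0.1", 0));
    }

    int port() throws IOException {
        return ((InetSocketAddress) server.getLocalAddress()).getPort();
    }

    void start() {
        server.accept(null, new CompletionHandler<AsynchronousSocketChannel, Void>() {
            @Override
            public void completed(AsynchronousSocketChannel ch, Void att) {
                server.accept(null, this); // re-arm for the next client
                readFrom(ch, ByteBuffer.allocate(8192));
            }

            @Override
            public void failed(Throwable exc, Void att) {
                // Typically the server channel was closed: stop accepting.
            }
        });
    }

    private static void readFrom(final AsynchronousSocketChannel ch, final ByteBuffer buf) {
        buf.clear();
        ch.read(buf, null, new CompletionHandler<Integer, Void>() {
            @Override
            public void completed(Integer n, Void att) {
                if (n < 0) { closeQuietly(ch); return; } // client closed
                buf.flip();
                writeAll(ch, buf);
            }

            @Override
            public void failed(Throwable exc, Void att) {
                closeQuietly(ch);
            }
        });
    }

    // A single write may be partial, so keep writing until the buffer drains,
    // then go back to reading.
    private static void writeAll(final AsynchronousSocketChannel ch, final ByteBuffer buf) {
        ch.write(buf, null, new CompletionHandler<Integer, Void>() {
            @Override
            public void completed(Integer n, Void att) {
                if (buf.hasRemaining()) ch.write(buf, null, this);
                else readFrom(ch, buf);
            }

            @Override
            public void failed(Throwable exc, Void att) {
                closeQuietly(ch);
            }
        });
    }

    private static void closeQuietly(AsynchronousSocketChannel ch) {
        try { ch.close(); } catch (IOException ignored) { }
    }

    void stop() throws IOException {
        server.close();
    }
}
```

The straight-line read/write loop of a blocking handler becomes a chain of callbacks that re-arm each other, including an explicit loop for partial writes.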
>
> QUESTION 4: Are there other projects and technologies I should be aware of?
>
> It seems to me that the best solution is to use Java 7 with simple
> blocking IO for the application I’ve described. The only issue is
> reliability under heavy load on a cloud server system.
>
> QUESTION 5: Can I rely on the cloud server itself (such as Amazon EC2) to
> provide all the bells and whistles needed, such as authentication,
> load balancing, etc., and do so reliably with the simple server I’m
> envisioning? QUESTION 6: Do any of the other technologies discussed above
> provide capabilities which cannot easily be built from scratch?
>
> Thank you very much for any and all input!
>
> Tom Lasseter
>
> Email: tom at gc-rt.com
>
> Website: gc-rt.com
More information about the nio-dev
mailing list