Data server design questions
Vitaly Davidovich
vitalyd at gmail.com
Tue Sep 4 13:58:47 PDT 2012
Hi Tom,
I'm on the phone so can't write too much but can give you my general view
on this.
The idea behind nio/async i/o is that on modern hardware, you can pretty
much saturate either the NIC or the network with data with just 1 (maybe 2)
threads. It's therefore unlikely that a thread-per-request design will
scale better. In fact, it may have negative effects.
I think a good networking setup to start with would be one i/o thread and a
pool of worker threads. The i/o thread handles accepting connections and
reading/writing data from/to clients. The workers handle the actual work
of servicing the requests. The i/o thread should not consume much CPU
(unless you have frequent SSL connections being established) and should
mostly be busy filling input/output buffers.
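A minimal sketch of that layout, using only the JDK's own NIO classes rather than Netty. Everything named here is an assumption made for illustration: the class name `IoThreadServer`, the loopback bind address, and the upper-casing `handle` method that stands in for the real work. It also treats each chunk read from a socket as one complete request, and replies on one connection can be reordered when several workers run at once; a real server would need message framing and per-connection ordering.

```java
import java.io.Closeable;
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.*;
import java.nio.charset.StandardCharsets;
import java.util.*;
import java.util.concurrent.*;

// Hypothetical sketch: the thread that calls run() is the single I/O thread.
final class IoThreadServer implements Runnable, AutoCloseable {
    private final Selector selector = Selector.open();
    private final ServerSocketChannel server = ServerSocketChannel.open();
    private final ExecutorService workers;
    // Connections with a reply waiting; filled by workers, drained by the I/O thread.
    private final Queue<SelectionKey> ready = new ConcurrentLinkedQueue<>();
    private volatile boolean running = true;

    IoThreadServer(int port, int workerCount) throws IOException {
        workers = Executors.newFixedThreadPool(workerCount);
        server.bind(new InetSocketAddress("127.0.0.1", port)); // port 0 picks a free port
        server.configureBlocking(false);
        server.register(selector, SelectionKey.OP_ACCEPT);
    }

    int port() { return server.socket().getLocalPort(); }
    @Override public void close() { running = false; selector.wakeup(); }

    @Override public void run() {
        try {
            while (running) {
                selector.select();
                for (SelectionKey k; (k = ready.poll()) != null; ) {
                    if (k.isValid()) k.interestOps(k.interestOps() | SelectionKey.OP_WRITE);
                }
                Iterator<SelectionKey> it = selector.selectedKeys().iterator();
                while (it.hasNext()) {
                    SelectionKey key = it.next();
                    it.remove();
                    if (key.isValid() && key.isAcceptable()) accept();
                    if (key.isValid() && key.isReadable()) read(key);
                    if (key.isValid() && key.isWritable()) write(key);
                }
            }
        } catch (IOException e) { // selector or accept failure: fall through and shut down
        } finally {
            for (SelectionKey k : new ArrayList<>(selector.keys())) closeQuietly(k.channel());
            closeQuietly(selector);
            workers.shutdown();
        }
    }

    private void accept() throws IOException {
        SocketChannel ch = server.accept();
        if (ch == null) return;
        ch.configureBlocking(false);
        // The attachment is this connection's queue of replies still to be written.
        ch.register(selector, SelectionKey.OP_READ, new ConcurrentLinkedQueue<ByteBuffer>());
    }

    private void read(SelectionKey key) {
        ByteBuffer in = ByteBuffer.allocate(4096);
        try {
            if (((SocketChannel) key.channel()).read(in) < 0) { key.channel().close(); return; }
        } catch (IOException e) { closeQuietly(key.channel()); return; }
        in.flip();
        if (!in.hasRemaining()) return;
        // Simplification: every chunk read counts as one complete request (no framing).
        workers.execute(() -> {
            outbox(key).add(handle(in)); // the actual work runs on a worker thread
            ready.add(key);
            selector.wakeup();           // ask the I/O thread to send the reply
        });
    }

    private void write(SelectionKey key) {
        Queue<ByteBuffer> out = outbox(key);
        try {
            for (ByteBuffer head; (head = out.peek()) != null; out.poll()) {
                ((SocketChannel) key.channel()).write(head);
                if (head.hasRemaining()) return; // socket buffer full: wait for the next OP_WRITE
            }
            key.interestOps(key.interestOps() & ~SelectionKey.OP_WRITE);
        } catch (IOException e) { closeQuietly(key.channel()); }
    }

    @SuppressWarnings("unchecked")
    private static Queue<ByteBuffer> outbox(SelectionKey key) { return (Queue<ByteBuffer>) key.attachment(); }

    // Placeholder for the real request processing: upper-cases the request text.
    static ByteBuffer handle(ByteBuffer request) {
        String text = StandardCharsets.UTF_8.decode(request).toString();
        return ByteBuffer.wrap(text.toUpperCase(Locale.ROOT).getBytes(StandardCharsets.UTF_8));
    }

    private static void closeQuietly(Closeable c) { try { c.close(); } catch (IOException ignored) { } }
}
```

To try it, something like `new Thread(new IoThreadServer(0, 8)).start()` runs the I/O loop on its own thread. Workers never touch the socket; they only queue a reply and wake the selector, so all reads and writes stay on the one I/O thread.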
For your specific situation, I'd consider memory mapping the backing
file(s) so that you can avoid caching the same data twice (if you were,
say, to read these files and cache their data in your Java app as well) and
also let the kernel manage the backing memory for them. The workers would
then be responsible for finding the data in the file(s), doing some
processing on the data (whatever it is that's needed), and then pumping this
data back to the client.
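As a rough illustration of the memory-mapping idea, here is a small sketch built on `FileChannel.map`. The class name `MappedVolume`, its `open` and `slice` methods, and the temp file in `main` are hypothetical names chosen for the example; it also assumes each file fits in a single mapping (under 2 GB), which the large volume files may well not.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper (not from the original message): a read-only mapping of one data file.
final class MappedVolume {
    private final MappedByteBuffer map;

    private MappedVolume(MappedByteBuffer map) { this.map = map; }

    static MappedVolume open(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            // The mapping remains valid after the channel is closed. One mapping covers
            // at most Integer.MAX_VALUE bytes, so larger files need several mappings.
            return new MappedVolume(ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size()));
        }
    }

    /** Returns a view of bytes [offset, offset + length) of the mapping, not a heap copy. */
    ByteBuffer slice(int offset, int length) {
        ByteBuffer view = map.duplicate(); // own position/limit, so workers do not interfere
        view.limit(offset + length);
        view.position(offset);
        return view.slice();
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("volume", ".bin"); // stand-in for a real data file
        Files.write(tmp, new byte[] {10, 20, 30, 40, 50, 60});
        ByteBuffer part = MappedVolume.open(tmp).slice(2, 3);
        System.out.println(part.get(0) + " " + part.get(1) + " " + part.get(2));
    }
}
```

A worker can pass such a slice straight to `SocketChannel.write`, so the requested region goes from the page cache to the socket without first being copied into a Java heap array.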
Depending on how much physical memory you have relative to the size of the
files, the workers may hit a hard page fault when accessing some region of
a file and thus be suspended while the kernel pages in the data. Therefore,
you may want to size the worker pool to be larger than the number of
CPUs/cores that you have (you may need to play with the sizing until you
get a good performance profile).
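One possible starting point for that sizing is the common rule of thumb threads = cores * (1 + wait time / compute time). This is a general heuristic, not something measured for this server, and the 0.5 ratio in `main` is an invented placeholder; the `WorkerPoolSizing` class and `poolSize` method are hypothetical names.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical helper: derives a starting worker count from a measured wait/compute ratio.
final class WorkerPoolSizing {
    /**
     * Rule of thumb: threads = cores * (1 + waitTime / computeTime), rounded up.
     * A ratio of 0 (workers never block) gives exactly one thread per core.
     */
    static int poolSize(int cores, double waitToComputeRatio) {
        if (cores < 1 || waitToComputeRatio < 0) {
            throw new IllegalArgumentException("cores must be >= 1 and ratio >= 0");
        }
        return (int) Math.ceil(cores * (1.0 + waitToComputeRatio));
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        double ratio = 0.5; // assumed value: time blocked on page faults / time computing
        int size = poolSize(cores, ratio);
        ExecutorService workers = Executors.newFixedThreadPool(size);
        System.out.println(cores + " cores -> " + size + " worker threads");
        workers.shutdown();
    }
}
```

The ratio would have to come from profiling under a realistic load, and the result is only a first guess to tune from.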
Netty is a good nio framework, so I'd try it out first.
Not sure if this helps, so feel free to ask follow-up questions.
Vitaly
Sent from my phone
On Sep 4, 2012 4:04 PM, "Tom Lasseter" <t.lasseter at comcast.net> wrote:
> I am developing a high-performance cloud data server primarily for the
> display of large engineering and scientific datasets. The client-side is a
> rich Java application though a client browser is possible if the GUI were
> limited.
>
> The challenge is the diverse libraries and technologies that are
> available. I have spent weeks analyzing and prototyping. The decision on
> what path to take is a complex one requiring knowledge of not just the
> detailed software architecture of the various design possibilities, but
> projections/predictions of how effectively they will function in a
> heavy-load environment.
>
> Despite the fact that most questions on this *open-jdk list* are
> low-level, I believe this is the right venue for my questions as the
> experts who understand the code well enough to design and improve it are
> the only ones who can expertly answer the questions I’m posing.
>
> A data server is pretty general, so I will be more explicit about the
> requirements for this one.
>
> The data consists of ASCII files which are JSON-like key-value files each
> of which describes an object. Associated with each of these object files
> is a series of binary files which contain the bulk of the data. These
> data objects fall into two extreme groups: 1) large 3D/4D volumes from
> which the user wishes to extract a subset of the data; 2) smaller objects
> of which the user may wish to completely display many thousands.
>
> For the large volume files, the user is often moving a cursor through the
> volume, so the server will need to be constantly delivering new segments
> selected. It makes sense to have a server thread monitoring requests for
> this volume which may be coming simultaneously from multiple users and
> starting worker threads to deliver the requests. Client-server sockets
> should stay alive for some period of time and the associated server work
> thread might be kept alive as well until the user ceases activity.
>
> For the large sets of smaller object files, the options are to make
> individual requests for each file and have the client process and display
> the data as received, or to make a request for all the objects and have the
> server send them sequentially on a single socket. The optimization here is
> a bit tricky: the IO can be running on several sockets or sequentially on
> a single socket, and the processing on both server and client ends could be
> run on single or multiple threads in either case.
>
> *QUESTION 1:* Looking at these requirements and scenarios, how would you
> design this client-server system in general terms?
>
>
> An interesting analysis and presentation was put together by Paul Tyma:
>
> *“Thousands of Threads and Blocking I/O: The old way to write Java
> Servers is New again (and way better)”*
>
> http://www.mailinator.com/tymaPaulMultithreaded.pdf
>
> where he makes the case that asynchronous IO was developed because of
> performance issues in creating and switching threads, but that hardware
> improvements have since significantly reduced these problems. His
> comparison shows that the multi-thread blocking IO is simpler and more
> efficient than asynchronous IO. *QUESTION 2:* What are your comments on
> Paul’s analysis, and has the picture changed since his presentation
> (2008)?
>
> Looking at the technologies out there is quite confusing. Here are the
> main ones I’ve studied in detail.
>
> 1) Java 7 with its outstanding NIO.2 API; Anghel Leonard’s book
> describing many of its features is amazing: there are dozens of examples
> which can be rapidly loaded and tested and they all work (!); the problem
> is that no one seems to be adopting the Java 7 socket APIs; many such as
> the Netty group say it’s somewhat incompatible and will be adopted at a low
> level under the existing Netty API; Jetty is not supporting asynchronous
> IO at all (I don’t believe); *QUESTION 3:* What are the issues with Java
> 7 socket APIs that are inhibiting their adoption?
>
> 2) Apache MINA; a lightweight server with an FTP server
> implemented; uses blocking IO;
>
> 3) Netty asynchronous NIO; excellent design for an event-driven
> client-server system; does not have a TCP file-server implementation
> comparable to FTP;
>
> 4) Kaazing socket gateway server; closed source, not
> asynchronous;
>
> 5) Waarp FTP and R66 file servers; heavy-duty file server built
> on Netty by the French government; impressive and complex.
>
> *QUESTION 4:* Are there other projects and technologies I should be aware
> of?
>
> It seems to me that the best solution is to use Java 7 and use simple
> blocking IO for the application I’ve described. The only issue is
> reliability under heavy load on a cloud server system.
>
> *QUESTION 5:* Can I rely on the cloud server itself (such as Amazon EC2)
> to provide me all the bells-and-whistles needed such as authentication,
> load-balancing, etc and do so reliably with the simple server I’m
> envisioning? *QUESTION 6:* Do any of the other technologies discussed
> above provide capabilities which cannot be easily built from scratch?
>
> Thank you very much for any and all input!
>
> Tom Lasseter
>
> Email: tom at gc-rt.com
>
> Website: gc-rt.com
>
More information about the nio-dev mailing list