Why Nothing Matters: The Impact of Zeroing

Xi Yang hiyangxi at gmail.com
Thu Sep 22 19:54:47 PDT 2011


Hi all,

We published a paper (
http://users.cecs.anu.edu.au/~steveb/downloads/pdf/zero-oopsla-2011.pdf
) at OOPSLA 2011 about zero initialization in JVMs. We found that the
cost of zero initialization is very high on modern x86 CMPs. By
zeroing the nursery space concurrently with non-temporal instructions,
we improve performance by 3.2% on average and by up to 9.3% on the
newest Sandy Bridge (i7-2600) machine, across 19 benchmarks from
DaCapo, SPECjvm98, and pjbb2005.

The speedup is not dramatic. However, compared with the current
zeroing approach in HotSpot, the design we propose is simpler. If
HotSpot developers are interested in the idea, you can implement it
within an hour. One hour of work for a 3.2% speedup is not a bad deal,
right?
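
To make the idea concrete, here is a minimal sketch of zeroing a
memory region with non-temporal (cache-bypassing) stores. It assumes
x86-64 with SSE2 and a 16-byte-aligned region. The function name
zero_nontemporal is hypothetical and is not taken from the paper or
from HotSpot. In a concurrent design, a helper like this would be run
by a separate thread ahead of the allocator; the threading is omitted
here.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2: _mm_setzero_si128, _mm_stream_si128 */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (illustration only): zero `bytes` bytes starting
 * at `start` using non-temporal stores, which write to memory without
 * pulling the zeroed lines into the cache.
 * Assumes `start` is 16-byte aligned and `bytes` is a multiple of 16. */
static void zero_nontemporal(void *start, size_t bytes)
{
    assert(((uintptr_t)start & 15) == 0);
    assert((bytes & 15) == 0);

    __m128i zero = _mm_setzero_si128();
    __m128i *p = (__m128i *)start;
    __m128i *end = (__m128i *)((char *)start + bytes);

    for (; p < end; p++)
        _mm_stream_si128(p, zero);  /* non-temporal 16-byte store */

    /* Non-temporal stores are weakly ordered; the fence makes the zeros
     * globally visible before the region is handed to an allocator. */
    _mm_sfence();
}
```

A caller would invoke zero_nontemporal(chunk, chunk_size) on a nursery
chunk before making that chunk available for allocation.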


Here is the paper link and abstract:

http://users.cecs.anu.edu.au/~steveb/downloads/pdf/zero-oopsla-2011.pdf

Managed languages use memory safety to defend against inadvertent and
malicious misuse of memory. Unmanaged native languages are
increasingly integrating memory safety for the same reasons. A
critical element of memory safety is initializing new memory before
the program obtains it. Our experiments show that zero initialization
is surprisingly expensive in a highly optimized managed runtime — on
average the direct cost of zeroing is 4% to 6% and up to 50% of total
application time on a variety of modern processors. Zeroing incurs
indirect costs as well, which include memory bandwidth consumption and
cache displacement. Existing virtual machines (VMs) either: a)
minimize direct costs by zeroing in large blocks, or b) minimize
indirect costs by integrating zeroing into the allocation sequence to
reduce cache displacement.
This paper first describes and evaluates zero initialization costs and
the two existing design points. Our microarchitectural analysis of
prior designs inspires two better designs that exploit concurrency and
non-temporal cache-bypassing instructions to reduce the direct and
indirect costs simultaneously. We show that the best strategy is to
adaptively choose between the two new designs based on CPU
utilization. This approach improves over widely used hot-path zeroing
by 3% on average and up to 15% on the newest Intel i7-2600 processor,
without slowing down any of the benchmarks. These results indicate
that zero initialization is a surprisingly important source of
overhead in existing VMs and that our new software strategies are
effective at reducing this overhead. These findings also invite other
optimizations, including software elision of zeroing and
microarchitectural support.
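
The two existing design points named in the abstract, (a) zeroing in
large blocks and (b) zeroing inside the allocation sequence, can be
sketched with a toy bump-pointer nursery. This is an illustration
under stated assumptions, not code from the paper or from any VM: the
type nursery_t and all function names are hypothetical, and memset
stands in for whatever zeroing loop a real runtime would use.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical bump-pointer nursery (illustration only). */
typedef struct {
    char *cursor;  /* next free byte */
    char *limit;   /* one past the last usable byte */
} nursery_t;

/* Design (a): bulk zeroing. The whole region is zeroed once up front,
 * so the allocation fast path is only a pointer bump. */
static void nursery_init_bulk(nursery_t *n, char *base, size_t size)
{
    memset(base, 0, size);
    n->cursor = base;
    n->limit = base + size;
}

static void *alloc_prezeroed(nursery_t *n, size_t bytes)
{
    if ((size_t)(n->limit - n->cursor) < bytes)
        return NULL;  /* nursery exhausted */
    void *obj = n->cursor;
    n->cursor += bytes;
    return obj;
}

/* Design (b): hot-path zeroing. The region is left untouched at setup
 * and each object is zeroed as part of its own allocation. */
static void nursery_init_lazy(nursery_t *n, char *base, size_t size)
{
    n->cursor = base;
    n->limit = base + size;
}

static void *alloc_zero_on_hot_path(nursery_t *n, size_t bytes)
{
    if ((size_t)(n->limit - n->cursor) < bytes)
        return NULL;  /* nursery exhausted */
    void *obj = n->cursor;
    n->cursor += bytes;
    return memset(obj, 0, bytes);  /* zero only this object */
}
```

In (a) the zeroing cost is paid in one large pass that sweeps the
whole region through the cache; in (b) it is paid per object, touching
only memory that is about to be used.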


Regards.

