Aggressive garbage collector strategy

You don’t mention which build of the JVM you’re running, which is crucial information. You also don’t mention how long the app tends to run for (e.g. is it up for the length of a working day? a week? less?)

A few other points

  1. If you are continually leaking objects into tenured because you’re allocating at a rate faster than your young gen can be swept, then your generations are incorrectly sized. You will need to do some proper analysis of your app’s behaviour to size them correctly; you can use visualgc for this.
  2. The throughput collector is designed to accept a single, larger pause rather than many smaller ones; the benefit is that it is a compacting collector and it delivers higher total throughput.
  3. CMS serves the other end of the spectrum, i.e. many more, much smaller pauses but lower total throughput. The downside is that it is not compacting, so fragmentation can be a problem. The fragmentation behaviour was improved in 6u26, so if you’re not on at least that build it may be upgrade time. Note that the “bleeding into tenured” effect you have remarked on exacerbates the fragmentation issue and, given time, this will lead to promotion failures (a.k.a. unscheduled full GCs and the associated STW pauses). I have previously written an answer about this on this question.
  4. If you’re running a 64bit JVM with >4GB RAM and a recent enough JVM, make sure you enable -XX:+UseCompressedOops, otherwise you’re simply wasting space, as a 64bit JVM occupies ~1.5x the space of a 32bit JVM for the same workload without it (and if you’re not on 64bit, upgrade to get access to more RAM). Example command lines for these collectors and flags are sketched just after this list.
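
For reference, the flags being discussed above look roughly like this in HotSpot (a sketch only; the command lines and jar name are illustrative, not taken from your setup):

    # throughput collector: compacting, fewer but longer pauses, higher total throughput
    java -XX:+UseParallelGC -XX:+UseParallelOldGC -jar yourapp.jar

    # CMS: many shorter pauses, lower total throughput, non-compacting
    java -XX:+UseConcMarkSweepGC -jar yourapp.jar

    # 64bit JVM: compressed oops to avoid most of the ~1.5x space overhead
    java -XX:+UseCompressedOops -jar yourapp.jar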

You may also want to read another answer I’ve written on this subject, which goes into sizing your survivor spaces & eden appropriately. Basically, what you want to achieve is:

  • eden big enough that it is not collected too often
  • survivor spaces sized to match the tenuring threshold
  • a tenuring threshold set to ensure, as much as possible, that only truly long-lived objects make it into tenured

Therefore, if you had, say, a 6G heap, you might do something like 5G eden + 16M survivor spaces + a tenuring threshold of 1.
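
In HotSpot flag terms that example would look something like the following (a sketch; the numbers assume 1G = 1024M, and the jar name is illustrative):

    # 6G heap, 5G eden, 2 x 16M survivor spaces, tenuring threshold of 1
    # young gen = eden + 2 survivors = 5120M + 32M = 5152M
    # SurvivorRatio=320 => each survivor is 1/322 of the young gen = 16M
    java -Xms6g -Xmx6g -Xmn5152m \
         -XX:SurvivorRatio=320 -XX:MaxTenuringThreshold=1 \
         -jar yourapp.jar

If you go with the throughput collector you may also want -XX:-UseAdaptiveSizePolicy so that the survivor sizes you set actually stick.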

The basic process is as follows (a toy program illustrating it is sketched just after this list):

  1. allocate into eden
  2. eden fills up
  3. live objects in eden are copied into the “to” survivor space
  4. live objects in the “from” survivor space are either copied to the “to” space or promoted to tenured (depending on the tenuring threshold, the space available and the number of times they have already been copied from one survivor space to the other)
  5. anything left in eden is garbage and is swept away
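
If you want to watch that process happening, a toy program along these lines (purely illustrative; the class name and sizes are invented for the example), run with -verbose:gc -XX:+PrintGCDetails -XX:+PrintTenuringDistribution, will show short-lived objects dying in eden while the retained ones age through the survivor spaces and get promoted:

    import java.util.ArrayList;
    import java.util.List;

    public class AllocationDemo {
        // long-lived data: survives young collections and eventually gets promoted to tenured
        private static final List<byte[]> retained = new ArrayList<byte[]>();

        public static void main(String[] args) {
            for (int i = 0; i < 1000000; i++) {
                // short-lived garbage: should die in eden before the next collection
                byte[] temp = new byte[1024];
                temp[0] = (byte) i;

                // a small fraction is kept alive, so it is copied between the
                // survivor spaces and, once its age exceeds the tenuring
                // threshold, promoted into tenured
                if (i % 10000 == 0) {
                    retained.add(new byte[1024]);
                }
            }
            System.out.println("retained " + retained.size() + " buffers");
        }
    }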

Therefore, given spaces appropriately sized for your application’s allocation profile, it’s perfectly possible to configure the system such that it handles the load nicely. A few caveats to this:

  1. you need some long-running tests to do this properly (e.g. it can take days to hit the CMS fragmentation problem)
  2. you need to do each test a few times to get good results
  3. you need to change 1 thing at a time in the GC config
  4. you need to be able to present a reasonably repeatable workload to the app, otherwise it will be difficult to objectively compare results from different test runs (capturing a GC log per run helps; see the sketch just after this list)
  5. this will get really hard to do reliably if the workload is unpredictable and has massive peaks/troughs
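
One thing that makes the comparison in points 2 and 4 easier is capturing a GC log per test run so the runs can be diffed objectively; something along these lines (the log file name is just an example):

    # one GC log per test run, so runs with different settings can be compared
    java -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps \
         -Xloggc:gc-run1.log \
         -jar yourapp.jar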

Points 1-3 mean this can take ages to get right. On the other hand, you may be able to make it good enough very quickly; it depends on how much of a perfectionist you are!

Finally, echoing Peter Lawrey’s point, you can save a lot of bother (albeit introducing some other bother) if you are really rigorous about object allocation.
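
As a trivial sketch of what “rigorous about object allocation” tends to mean in practice (the class and buffer size are invented for illustration): reuse a scratch buffer rather than allocating a fresh one per call, so steady-state processing generates very little young gen garbage in the first place.

    import java.io.IOException;
    import java.io.InputStream;

    public class FrameReader {
        // allocated once per reader and reused for every frame, instead of
        // allocating a new byte[] on each call and leaving it to the GC
        private final byte[] scratch = new byte[64 * 1024];

        // reads up to one buffer's worth of data without allocating per call
        public int readFrame(InputStream in) throws IOException {
            return in.read(scratch, 0, scratch.length);
        }
    }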
