The Cost of Creating Objects in Java: A Quick Sanity Check

Experienced Java programmers are often careful not to create unnecessary objects. This is not about the overall object structure of an application, but about tight, inner loops, where careless programming can easily lead to huge numbers of objects being created and then thrown away in a very short time.

I got my fingers burned with this in the early days of Java, version 1.1 or 1.2, when I had to read a few million records from a text file and put them into an in-memory data structure. The loop went something like this:

while (true) {
  String line = input.readLine();
  if (line == null) break;
  StringTokenizer tok = new StringTokenizer (line, "");
  String s1 = tok.nextToken();
  String s2 = tok.nextToken();
  ...
}

You can see how each line is returned as a new String object, then a StringTokenizer is created for each line, and the individual tokens are returned as new, individual String objects again. For a few million input lines, that’s several times a few million temporary objects. On that ancient 1.1 or 1.2 JVM, the program would simply not finish, because those millions of objects flooded the heap and the garbage collector couldn’t keep up. So I rewrote my code to use a single byte array buffer and simply avoid the creation of objects wherever possible. Bam! It finished in a few seconds.
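The rewritten loop itself is long gone, but the general idea can be sketched roughly as follows. This is only an illustration under assumptions: the class name FieldScanner, the method split, and the tab separator are made up for the example, since the original file format is not described here.

```java
import java.nio.charset.StandardCharsets;

class FieldScanner {
    // Assumption: fields are separated by a tab; the actual format is not known.
    static final byte SEPARATOR = '\t';

    /**
     * Records the start offset of each field of the line stored in
     * buf[0..len) into starts and returns the number of offsets recorded.
     * Scanning stops once starts is full. Nothing is allocated here.
     */
    static int split(byte[] buf, int len, int[] starts) {
        int count = 0;
        int fieldStart = 0;
        for (int i = 0; i <= len && count < starts.length; i++) {
            if (i == len || buf[i] == SEPARATOR) {
                starts[count++] = fieldStart;
                fieldStart = i + 1;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // In a real reader the buffer would be refilled from the input stream
        // and reused for every line; here it is filled once for illustration.
        byte[] buf = "12\t345\t6".getBytes(StandardCharsets.US_ASCII);
        int[] starts = new int[8]; // also reused across lines
        int n = split(buf, buf.length, starts);
        System.out.println(n + " fields, second one starts at offset " + starts[1]);
    }
}
```

The point of the design is that the only per-line work is index arithmetic on two arrays that live for the whole run, so the garbage collector has nothing to clean up inside the loop.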

Things have changed a lot since then. The garbage collectors in a modern Java 6 or Java 7 VM are so good that they can easily deal with millions of temporary objects thrown at them. Still… you would think that creating an object, which means talking to heap management and getting some storage reserved, initializing it, etc. is not exactly a trivial operation, certainly more expensive than a method call or an integer increment? When optimizing a tight, inner loop, perhaps it might still be a good idea to avoid creating too many temporary objects…?

I looked around on the net and came across this entry on Stack Exchange where an innocent programmer asked pretty much the same question (“people told me I should avoid creating objects. Really?”) and received an energetic, sarcastic lecture from another programmer, meant to settle the issue once and for all:

Your colleague has no idea what they are talking about. Your most expensive operation would be listening to them, they wasted your time mis-directing you to information at least a decade out of date as well as you having to spend time posting here and researching the Internet for the truth.

Hopefully they are just ignorantly regurgitating something they heard or read from a decade ago and don’t know any better. I would take anything else they say as suspect as well, this should be a well known fallacy by anyone that keeps up to date either way.

[...]

Object creation in Java due to its memory allocation strategies is faster than C++ in most cases and for all practical purposes compared to everything else in the JVM can be considered “free”.

And another programmer went on:

Actually, due to the memory management strategies that the Java language (or any other managed language) makes possible, object creation is little more than incrementing a pointer in a block of memory called the young generation. It’s much faster than C, where a search for free memory has to be done.

Now that sounds intriguing (and a bit intimidating, I almost did not dare to even think about the question anymore after that lecture). Object creation is just a pointer increment nowadays?

I decided to do a quick sanity check (not an exhaustive, scientific study) based on what I happened to be working on at the moment. This comes from h3270, a web-to-host adapter I have been involved with for a number of years. The task at hand was to convert a hexadecimal representation of a UTF-8 character into a Java char (UTF-16). For example, the string “61” would be converted to the character ‘a’, and “e282ac” to the euro sign ‘€’. This conversion would have to be done for each character on a terminal screen, altogether several thousand times per screen, so here’s our tight loop.

A naive approach is to create a byte array from the hex string and then to create a string from the byte array using the UTF-8 charset, from which we take the first (and only) character and return it (the function value() used below simply converts a hex digit into a number from 0 to 15):

public char decodeChar1 (String source) throws UnsupportedEncodingException {
  byte[] b = new byte[source.length() / 2];
  for (int i=0; i<b.length; i++) {
    int val = value(source.charAt(i*2)) * 16 + value(source.charAt(i*2+1) );
    b[i] = (byte)val;
  }
  return new String(b, "UTF-8").charAt(0);
}

This code creates two objects per call: the byte array b and the result string from which we return only the first character. If object creation were essentially free nowadays that would be fine, but it happens that Java offers a different API precisely to avoid creating these objects. Here’s another solution using a CharsetDecoder and re-usable buffers:

private CharsetDecoder charsetDecoder = Charset.forName("UTF-8").newDecoder();
private ByteBuffer codeBuffer = ByteBuffer.allocate(6); // max utf8 encoding length for a single character
private CharBuffer charBuffer = CharBuffer.allocate(1);

public char decodeChar2 (String source) {
  codeBuffer.clear();
  for (int i=0; i<source.length(); i+=2) {
    int val = value(source.charAt(i)) * 16 + value(source.charAt(i+1));
    codeBuffer.put((byte)val);
  }
  codeBuffer.flip();  // position = 0, limit = number of bytes just written
  charBuffer.clear();
  charsetDecoder.reset();
  charsetDecoder.decode(codeBuffer, charBuffer, true);
  charsetDecoder.flush (charBuffer);
  return charBuffer.get(0);
}

This code looks more complicated (and it is), but it avoids all object creations inside the method. (And if you walk through the library code you realize that actually both versions use the same API and the only meaningful difference is that the second version does not create any objects while decoding.)
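To try the two approaches side by side, something like the following self-contained sketch could be used. The names HexCharDecoder, viaString and viaDecoder are invented for this example, the value() shown is just one plausible implementation of the helper, and StandardCharsets (Java 7 and later) stands in for the "UTF-8" charset name.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.StandardCharsets;

class HexCharDecoder {
    // Reused across calls, so an instance must not be shared between threads.
    private final CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();
    private final ByteBuffer bytes = ByteBuffer.allocate(6);
    private final CharBuffer chars = CharBuffer.allocate(1);

    /** One possible value(): maps a hex digit to a number from 0 to 15. */
    static int value(char c) {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        throw new IllegalArgumentException("not a hex digit: " + c);
    }

    /** Allocating variant: one byte array and one String per call. */
    static char viaString(String hex) {
        byte[] b = new byte[hex.length() / 2];
        for (int i = 0; i < b.length; i++) {
            b[i] = (byte) (value(hex.charAt(2 * i)) * 16 + value(hex.charAt(2 * i + 1)));
        }
        return new String(b, StandardCharsets.UTF_8).charAt(0);
    }

    /** Buffer-reusing variant: no explicit allocation inside the method. */
    char viaDecoder(String hex) {
        bytes.clear();
        for (int i = 0; i < hex.length(); i += 2) {
            bytes.put((byte) (value(hex.charAt(i)) * 16 + value(hex.charAt(i + 1))));
        }
        bytes.flip();      // limit = bytes written, position = 0
        chars.clear();
        decoder.reset();
        decoder.decode(bytes, chars, true);
        decoder.flush(chars);
        return chars.get(0);
    }

    public static void main(String[] args) {
        HexCharDecoder d = new HexCharDecoder();
        for (String hex : new String[] {"61", "e282ac"}) {
            // Prints whether both variants agree on the decoded character.
            System.out.println(hex + ": " + (viaString(hex) == d.viaDecoder(hex)));
        }
    }
}
```

Because the buffers hold state between calls, the second variant trades thread safety for the saved allocations, which is usually acceptable for a decoder owned by a single screen renderer.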

Here are the running times for 10 million iterations on a 2 GHz Linux laptop under Java 1.4 and Java 6:

Method         Java 1.4    Java 6
decodeChar1    9.5 s       2.8 s
decodeChar2    3.3 s       1.4 s

That is a factor of 2-3 between the version that creates objects and the version that doesn’t. We also see that the JVM has indeed become much better between Java 1.4 and 6, but it remains a fact that object creation is a non-trivial operation, certainly not comparable to a pointer increment or even a method call. A rough back-of-the-envelope calculation puts it somewhere in the 100 nanosecond range for this example, which is one or two orders of magnitude more than a pointer increment on a 2 GHz processor. (These numbers are consistent with what other people report for an object creation in Java, at least to the order of magnitude.)
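The back-of-the-envelope calculation can be spelled out as a few lines of arithmetic. The sketch below assumes that the entire time difference between the two versions is due to the two objects created per call, which is a simplification; the class and method names are made up for the example.

```java
class AllocationCost {
    /**
     * Estimated cost of one object creation in nanoseconds, attributing the
     * whole difference between the two running times to object creation.
     */
    static double nanosPerObject(double slowSeconds, double fastSeconds,
                                 long iterations, int objectsPerCall) {
        double extraNanosPerCall = (slowSeconds - fastSeconds) * 1e9 / iterations;
        return extraNanosPerCall / objectsPerCall;
    }

    public static void main(String[] args) {
        long iterations = 10_000_000L; // 10 million calls, as in the measurement
        int objectsPerCall = 2;        // the byte array and the String
        System.out.printf("Java 1.4: %.0f ns per object%n",
                nanosPerObject(9.5, 3.3, iterations, objectsPerCall));
        System.out.printf("Java 6:   %.0f ns per object%n",
                nanosPerObject(2.8, 1.4, iterations, objectsPerCall));
    }
}
```

With the measured times this comes out at roughly 310 ns per object under Java 1.4 and roughly 70 ns under Java 6, so the 100 nanosecond figure is the right order of magnitude for both.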

In conclusion, it follows that:

  1. Yes, object creation has a non-trivial, measurable cost in Java, and avoiding object creation is therefore a reasonable optimization technique for tight, inner loops.

  2. This has hopefully been clear all along: For the large-scale structure of an object-oriented program, this is completely irrelevant. At the macroscopic level, structure is much more important than a few nanoseconds, and therefore objects should be used to the fullest degree everywhere, except in those tiny spots where nanosecond-level optimization really is relevant.
