Which one is faster: Java heap or native memory?


One of the advantages of the Java language is that you do not need to deal with memory allocation and deallocation yourself. Whenever you instantiate an object with the new keyword, the necessary memory is allocated in the JVM heap. The heap is then managed by the garbage collector, which reclaims the memory after the object goes out of scope. However, there is a backdoor to reach off-heap native memory from the JVM. In this article I am going to show how an object can be stored in memory as a sequence of bytes and how you can choose between storing those bytes in heap memory or in direct (i.e. native) memory. Then I will try to conclude which one is faster to access from the JVM: heap memory or direct memory.

Allocating and Deallocating with Unsafe

The sun.misc.Unsafe class allows you to allocate and deallocate native memory from Java as if you were calling malloc and free from C. The memory it gives you lives off the heap and is not managed by the garbage collector, so it becomes your responsibility to deallocate it after you are done with it. Here is my Direct utility class that gains access to the Unsafe class.

import java.lang.reflect.Field;

import sun.misc.Unsafe;

public class Direct implements Memory {

	private static Unsafe unsafe;
	private static boolean AVAILABLE = false;

	static {
		try {
			Field field = Unsafe.class.getDeclaredField("theUnsafe");
			field.setAccessible(true);
			unsafe = (Unsafe)field.get(null);
			AVAILABLE = true;
		} catch(Exception e) {
			// NOOP: throw exception later when allocating memory
		}
	}

	public static boolean isAvailable() {
		return AVAILABLE;
	}

	private static Direct INSTANCE = null;

	public static Memory getInstance() {
		if (INSTANCE == null) {
			INSTANCE = new Direct();
		}
		return INSTANCE;
	}

	private Direct() {

	}

	@Override
	public long alloc(long size) {
		if (!AVAILABLE) {
			throw new IllegalStateException("sun.misc.Unsafe is not accessible!");
		}
		return unsafe.allocateMemory(size);
	}

	@Override
	public void free(long address) {
		unsafe.freeMemory(address);
	}

	@Override
	public final long getLong(long address) {
		return unsafe.getLong(address);
	}

	@Override
	public final void putLong(long address, long value) {
		unsafe.putLong(address, value);
	}

	@Override
	public final int getInt(long address) {
		return unsafe.getInt(address);
	}

	@Override
	public final void putInt(long address, int value) {
		unsafe.putInt(address, value);
	}
}
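
The Memory interface that Direct implements is not listed in the post; a minimal sketch, inferred from the methods Direct overrides, would look like this:

// Sketch of the Memory interface, reconstructed from the calls above.
public interface Memory {

	long alloc(long size);

	void free(long address);

	long getLong(long address);

	void putLong(long address, long value);

	int getInt(long address);

	void putInt(long address, int value);
}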

Placing an object in native memory

Let’s move the following Java object to native memory:

public class SomeObject {

	private long someLong;
	private int someInt;

	public long getSomeLong() {
		return someLong;
	}
	public void setSomeLong(long someLong) {
		this.someLong = someLong;
	}
	public int getSomeInt() {
		return someInt;
	}
	public void setSomeInt(int someInt) {
		this.someInt = someInt;
	}
}

Note that all we are doing below is saving its properties in native memory through the Memory interface:

public class SomeMemoryObject {

	private final static int someLong_OFFSET = 0;
	private final static int someInt_OFFSET = 8;
	private final static int SIZE = 8 + 4; // one long + one int

	private long address;
	private final Memory memory;

	public SomeMemoryObject(Memory memory) {
		this.memory = memory;
		this.address = memory.alloc(SIZE);
	}

	@Override
	public void finalize() {
		// free the native memory when the GC collects this wrapper object
		memory.free(address);
	}

	public final void setSomeLong(long someLong) {
		memory.putLong(address + someLong_OFFSET, someLong);
	}

	public final long getSomeLong() {
		return memory.getLong(address + someLong_OFFSET);
	}

	public final void setSomeInt(int someInt) {
		memory.putInt(address + someInt_OFFSET, someInt);
	}

	public final int getSomeInt() {
		return memory.getInt(address + someInt_OFFSET);
	}
}
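
To make the pattern concrete, here is a minimal usage sketch (the values are arbitrary). Note that deallocation relies on finalize(), which the garbage collector may run late, so production code would usually expose an explicit free method as well:

// Usage sketch: a SomeMemoryObject backed by native memory.
Memory memory = Direct.getInstance();
SomeMemoryObject object = new SomeMemoryObject(memory);
object.setSomeLong(42L);
object.setSomeInt(7);
System.out.println(object.getSomeLong() + " " + object.getSomeInt()); // => 42 7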

Now let’s benchmark read/write access for two arrays: one with millions of SomeObjects and another one with millions of SomeMemoryObjects. The code can be seen here and the results are below:

// with JIT:
Number of Objects:  1,000     1,000,000     10,000,000    60,000,000
Heap Avg Write:      107         2.30          2.51         2.58       
Native Avg Write:    305         6.65          5.94         5.26
Heap Avg Read:       61          0.31          0.28         0.28
Native Avg Read:     309         3.50          2.96         2.16
// without JIT: (-Xint)
Number of Objects:  1,000     1,000,000     10,000,000    60,000,000
Heap Avg Write:      104         107           105         102       
Native Avg Write:    292         293           300         297
Heap Avg Read:       59          63            60          58
Native Avg Read:     297         298           302         299

Conclusion: Crossing the JVM barrier to reach native memory is approximately 10 times slower for reads and 2 times slower for writes. Notice, however, that each SomeMemoryObject allocates its own native memory space, so the reads and writes are not contiguous: each direct memory object reads and writes from and to its own allocated memory space, which can be located anywhere. Let's benchmark read/write access to contiguous direct and heap memory to try to determine which one is faster.

Accessing large chunks of contiguous memory

The test consists of allocating a byte array in the heap and a corresponding chunk of native memory to hold the same amount of data. Then we sequentially write and read each of them a couple of times to measure which one is faster. We also test random access to arbitrary locations and compare the results. The sequential test can be seen here. The random one can be seen here.
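
This is not the original benchmark linked above, but as a rough illustration (class and variable names are my own, and the timing is simplified to a single pass), the sequential write test boils down to something like this:

// Simplified sketch of the sequential test: write the same amount of
// data to a heap array and to a chunk of native memory, and compare
// the average time per operation. Reads are measured the same way.
public class SequentialSketch {

	public static void main(String[] args) {
		final int N = 1000000;

		int[] heapArray = new int[N];
		long t0 = System.nanoTime();
		for (int i = 0; i < N; i++) heapArray[i] = i; // sequential heap writes
		long heapNanos = System.nanoTime() - t0;

		Memory memory = Direct.getInstance();
		long address = memory.alloc(N * 4L); // room for N ints
		t0 = System.nanoTime();
		for (int i = 0; i < N; i++) memory.putInt(address + i * 4L, i); // sequential native writes
		long nativeNanos = System.nanoTime() - t0;
		memory.free(address);

		System.out.println("heap: " + ((double) heapNanos / N)
				+ " ns/op, native: " + ((double) nativeNanos / N) + " ns/op");
	}
}

The results: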

// with JIT and sequential access:
Number of Objects:  1,000     1,000,000     1,000,000,000
Heap Avg Write:      12          0.34           0.35 
Native Avg Write:    102         0.71           0.69 
Heap Avg Read:       12          0.29           0.28 
Native Avg Read:     110         0.32           0.32
// without JIT and sequential access: (-Xint)
Number of Objects:  1,000     1,000,000      10,000,000
Heap Avg Write:      8           8              8
Native Avg Write:    91          92             94
Heap Avg Read:       10          10             10
Native Avg Read:     91          90             94
// with JIT and random access:
Number of Objects:  1,000     1,000,000     1,000,000,000
Heap Avg Write:      61          1.01           1.12
Native Avg Write:    151         0.89           0.90 
Heap Avg Read:       59          0.89           0.92 
Native Avg Read:     156         0.78           0.84
// without JIT and random access: (-Xint)
Number of Objects:  1,000     1,000,000      10,000,000
Heap Avg Write:      55          55              55
Native Avg Write:    141         142             140
Heap Avg Read:       55          55              55 
Native Avg Read:     138         140             138

Conclusion: Heap memory is always faster than direct memory for sequential access. For random access on big chunks of data, heap memory is a little bit slower than direct memory, but not by much.

Final Conclusion

Working with native memory from Java has its uses, such as when you need to work with large amounts of data (> 2 gigabytes) or when you want to escape from the garbage collector [1]. However, in terms of latency, direct memory access from the JVM is not faster than heap access, as demonstrated above. The results actually make sense, since crossing the JVM barrier must have a cost. That's the same dilemma as choosing between a direct and a heap ByteBuffer. The speed advantage of the direct ByteBuffer is not access speed but the ability to talk directly with the operating system's native I/O operations. Another great example, discussed by Peter Lawrey, is the use of memory-mapped files when working with time series.
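
As an illustration of that last point, here is a minimal sketch of the native I/O case (the file name is an assumption): a direct buffer lets the channel transfer bytes straight into native memory, without an intermediate copy into a Java byte array.

import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

// Sketch: reading a file into a direct ByteBuffer avoids the extra
// copy into a heap byte[] that a heap ByteBuffer would require.
public class DirectBufferSketch {

	public static void main(String[] args) throws Exception {
		ByteBuffer direct = ByteBuffer.allocateDirect(4096); // off-heap buffer

		try (RandomAccessFile file = new RandomAccessFile("data.bin", "r")) {
			FileChannel channel = file.getChannel();
			channel.read(direct); // the OS writes straight into native memory
			direct.flip();
		}
	}
}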

[1] For more info about avoiding the GC you can read my previous article about Real-time programming without the GC.

9 Comments

  1. While it's not surprising to find examples where the heap is faster, you have to be careful to avoid situations where the JIT can eliminate or heavily optimise code which doesn't do anything useful. You would expect that it can do this more heavily for heap than for direct code. You have results where they are similar and the heap is tens of percent faster, and I am more likely to believe those; where you have a much wider difference, I would try to ensure your test is realistic enough that the JIT cannot “cheat”.

    I suggest testing an array of 1 million heap objects, and one contiguous piece of memory split into 1 million direct objects, i.e. one allocation of 12 MB. And include a full GC in the test with the objects retained ;) The benefit of using direct memory is that you have more options as to how to lay out the memory and the impact it has on the GC.

    Where direct memory wins IMHO is when you have billions rather than millions of objects, e.g. time series data. Or when you are performing IO and the data has to be transferred to/from native C space anyway. I use memory-mapped files in the situation where both apply: billions of entries AND the data needs to be read/persisted.

  2. I ran the same tests without JIT with the -Xint option and the relative results between heap and native were not very different as far as I can remember. I will run them again and publish the numbers.

    My point is that it is not fair to compare one million heap objects with their compact representation in direct memory, because you can also serialize these heap objects in the heap laying them down in a heap byte array, right? That’s why in my final test I compare read/write access on a big heap byte array versus a big chunk of direct memory.

    But I totally agree with you about the IO situation. That’s why a direct byte buffer is great, but correct me if I am wrong, its speed advantage is not access speed but the ability to talk directly with the operating system’s native I/O operations.

  3. Really nice article; I was not aware of the Unsafe class before I read it.

  4. You must remember to set -Xmx and -Xms for the JVM. For the type of tests you are doing, the amount of memory allocated by the JVM will make a big difference if the heap needs to be resized during the tests.

  5. To compare like with like, I benchmarked (https://github.com/turiot/benchUnsafe) safe and unsafe array access:
    time alloc_unsafe : 0,0 s
    time put_unsafe : 19,1 s
    time get_unsafe : 18,9 s
    time alloc_safe : 0,7 s
    time put_safe : 20,7 s
    time get_safe : 18,9 s
    time test_safe : 40,3 s
    I used Unsafe in an old project to prevent array bounds checking, but the JIT is now very clever;
    Unsafe causes no penalty on my config (Java 7, 32-bit, on Ubuntu) and brings faster, finer-grained allocation and deterministic deallocation.
    Thanks for all your good work.

  6. What about reading and writing int, long, and char values instead of just bytes? I've heard that the compiler does some optimization. In my project, writing an int value into a heap byte array takes at least 17 instructions (bit shifts and additions) because there is no other way to put primitive values into a heap byte array.
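
    A sketch of the manual encoding being described (big-endian order here; the method name is illustrative):

    // Writing an int into a byte[] by hand: four shifts, casts and
    // stores, versus a single Unsafe.putInt into native memory.
    public static void putInt(byte[] array, int offset, int value) {
        array[offset]     = (byte) (value >>> 24);
        array[offset + 1] = (byte) (value >>> 16);
        array[offset + 2] = (byte) (value >>> 8);
        array[offset + 3] = (byte) value;
    }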

  7. How does ByteBuffer.allocateDirect compare to using Unsafe? Unlike Unsafe, it’s guaranteed to be available.

  8. Very good blog comparing different types of allocation.

    I added another allocator that allocates one big array for all the objects and works out read/write locations based on an index; this type of allocation is more CPU cache friendly.

    I ran the test on Linux 2.6.18-92.el5 x86_64, JDK 1.6.0_13.
    It is a very old version of the JDK, but the results are very interesting.

    The BigArrayDirect Write/BigArrayDirect Read results are from the new code.

    *** java -Xms1g -Xmx1g playground.memory.WriteReadTest1 10000

    Heap Write: 2, 2, 2, 2, 2, 2, 2.3, 2, 2, 2
    Direct Write: 5.5, 5.4, 5.4, 5.4, 5.4, 5.4, 5.9, 5.4, 5.4, 5.4
    BigArrayDirect Write: 2.1, 2.2, 2.2, 2.2, 2.1, 2.1, 2.1, 2.2, 2.1, 2.1
    Heap Read: 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.6
    Direct Read: 3.5, 3.5, 3.5, 3.5, 3.4, 3.4, 3.5, 3.5, 3.4, 3.5
    BigArrayDirect Read: 0.2, 0.2, 0.2, 0.2, 0.3, 0.3, 0.2, 0.2, 0.3, 0.2

    **** java -Xms1g -Xmx1g playground.memory.WriteReadTest1 100000

    Heap Write: 4.23, 4.49, 4.23, 4.23, 4.46, 4.43, 4.23, 4.27, 4.76, 5.15
    Direct Write: 12.14, 12.08, 12.33, 12.06, 12.12, 12.1, 12.08, 12.33, 12.34, 12.47
    BigArrayDirect Write: 2.48, 2.5, 2.48, 2.89, 2.47, 2.48, 2.72, 2.82, 2.43, 2.44
    Heap Read: 1.17, 0.88, 0.89, 0.88, 0.87, 0.89, 0.88, 1.07, 0.89, 0.89
    Direct Read: 6.7, 6.96, 6.98, 6.72, 6.96, 6.95, 7.2, 6.77, 7.25, 7.29
    BigArrayDirect Read: 0.15, 0.14, 0.14, 0.15, 0.14, 0.14, 0.16, 0.14, 0.14, 0.14

    I see that allocating one big array is much faster than the heap. I start seeing significant improvement after 100K elements, so if you allocate one big array it can beat heap read/write by a big margin.
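
    A minimal sketch of the single-big-array layout this comment describes (the class name is an assumption): all objects share one native allocation, and each object's fields live at index * SIZE, so the data is laid out contiguously.

    // Sketch: one native allocation holds every object; field addresses
    // are computed from the object's index, which is cache friendly.
    public class BigArrayMemoryObjects {

        private static final int SIZE = 8 + 4; // one long + one int per object

        private final Memory memory;
        private final long baseAddress;

        public BigArrayMemoryObjects(Memory memory, int count) {
            this.memory = memory;
            this.baseAddress = memory.alloc((long) count * SIZE);
        }

        private long address(int index) {
            return baseAddress + (long) index * SIZE;
        }

        public void setSomeLong(int index, long value) {
            memory.putLong(address(index), value);
        }

        public long getSomeLong(int index) {
            return memory.getLong(address(index));
        }

        public void setSomeInt(int index, int value) {
            memory.putInt(address(index) + 8, value);
        }

        public int getSomeInt(int index) {
            return memory.getInt(address(index) + 8);
        }

        public void free() {
            memory.free(baseAddress);
        }
    }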
