The heap, garbage collection and scavenges
In the last post I discussed the heap, dead and live objects and the object graph. Recapping…
V8 uses an extremely simple technique for garbage collection. In this it scans the stack for any references(handles) which are active(live) and considers the referenced objects live. The rest of the heap is considered garbage(dead) and is marked to be reclaimed for re-use.
One of the problems we face is that V8 is single threaded. When garbage collection happens the rest of the program stops. This is why it is called “stop the world” garbage collector. This has performance implications because if the execution of the program is held up for long it may result in sluggish behavior. From a UI standpoint this may result in browser rendering to be less than 60 fps resulting in the user experience being less than optimal. More information on this sluggishness in browser rendering, also called “Jank” can be found here.
Collection types and cycles
If you think about it, most of the allocations we request are localized to the executing function. In comparison the allocations that require to be alive across multiple function calls(e.g. globals) are fewer. For this reason V8 uses a “Generational” garbage collection system where the lifetime of the objects determine their placement in the heap. In this the heap is divided into two major lifetime sections or generations. The new and the old generation(or space). The new space holds objects that are short lived while the old space holds objects that are intended to be around for a longer period of time.
This is how it works. Most objects(if they are not too big) start their life in the new space(young generation). When memory is required, V8 quickly scans the new space for any live objects and considers the rest of the objects as dead and re-usable. This quick scan and cleanup is called a scavenge(minor collection cycle) and usually lasts less than a millisecond.
If an object survives two scavenge cycles it is considered “old” and moved(promoted) to the old space. At some point when V8 determines that it may need more memory in the old space it triggers a major garbage collection cycle(mark-sweep and mark compact) and removes dead objects from the old space.
One consequence of garbage collection is memory fragmentation. Dead and live objects are intermixed in the heap. If there are many of these pockets of free memory(space occupied by dead objects it would make allocation of new memory slow as objects would have to be split up for storage. To fix this the garbage collector moves objects around and lays them out in a contiguous memory space. This process is called “memory compaction”.
Spaces in the Heap
To understand both the minor(scavenge) and major(mark-sweep and mark-compact) cycle, it is helpful to visualize how the heap is divided in to spaces and their functions. The heap is divided into multiple sections or spaces. The spaces that are relevant to us are shown in Figure 4.1. below which provides a pictorial representation.
Briefly, here is what each space does:
- New space: Short lived objects allocated here. Scavenge works in this space
- Old Space = Old data space + Old pointer space: Promoted objects from new space and other raw data objects and pointers respectively. Mark-sweep and mark-compact work in this space.
- Code space: Executable instructions
- Other miscellaneous spaces: These include the Cell space, property cell space and map space and they contain specialized data and pointers. No garbage collection takes place here.
Within these spaces we will concern ourselves only with the new space and the old space. Let’s look at scavenges and the new space first.
Consider this simple example(Figure 4.2) in which a string of 100,000 bytes in length is created and stored in the local variable text. The setInterval timer runs the aString function every 200ms till the program is aborted .
It is clear that there should not be any memory growth, because every time the function returns the variable “text” goes out of scope and is discarded. The program does not have any practical application but works well for learning about scavenges. Running this piece of code with the trace_gc option in the terminal we get the output in figure 4.3.
The dissected trace(the one in bold) is shown below in figure 4.4.
This is the simplest of the gc traces that V8 offers. It gives an excellent overview of memory allocation. This is also the first trace I run to check on how memory allocation is doing. The explanations given in the figure for the dissected trace are self explanatory.
Analyzing the above trace:
- It is a scavenge event, triggered as a result of allocation failure. In other words V8 ran out of memory in the new space and decided to reclaim dead space.
- Each gc event is less then 1.0 ms. This is ok. If this time tends to increase then there is a possibility of a leak. It means that V8 is finding it hard to gc.
- The difference in size of before and after objects which is about 1MB remains fairly constant.
- The before and after overall memory size(41.1MB) remains fairly constant. This indicates that all of the dead space is being cleaned up. If it did not, then this would be another indication of a leak. In real applications it is not this ideal and some variation should be expected.
Continuing with the analysis let us assume that the new space is 1MB in size and that it is only used for text. Visualizing at a very simplistic level:
- For a 1MB sized new space the start and end of new space in the heap would be between 0 and 1048575 bytes
- Initially the pointer is at 0th byte.
- Allocation is requested for 100,000 bytes
- The pointer moves 100,000 bytes from 0 to 99999 bytes. The pointer is at the 100000th position waiting for the next allocation.
- After the 10th allocation the pointer is at 1000000th position
- For the 11th allocation there is no memory left so a GC is triggered
- When allocation is requested for the text variable (11th allocation) V8 finds that there is no memory left in the new space.
- V8 reaches out to the stack and finds that there are no live objects. This means all of the new space contains dead objects and can be re-used.
Please keep in mind that V8 uses a variety of techniques to trigger a GC. Also, it tends to manipulate the size of new space depending on current conditions. This is the reason why the trigger points are never the same. It is the trend that is important and not the exactness of the values.
V8’s scavenge is based on Cheney’s algorithm. In this, the new space is divided into two semi-spaces, the to-space and from-space. There are five steps that the garbage collector goes through to GC.
- Memory allocation starts in the to-space
- Once the to-space is out of memory a GC event is triggered
- To and from spaces are swapped out. At this point to-space is empty and from-space is full.
- Live objects are copied from from-space to to-space, laying them next to each other in contiguous memory locations. This also compacts the memory. from-space now only contains dead objects while to-space contains only live objects.
Note: Live objects are in blue, while the dead objects are in gray.
from-space is now fully reclaimed and can be reused and the cycle continues. For a pictorial representation see figure 4.5 below.
We used a simple example to displayed the V8 GC trace and further dissected it to see how scavenges work. No special tools were used. As such, you can start using it with your existing projects. I will talk about Mark sweep and mark compact next time.