Does my NodeJS application leak memory? – 3

The heap, objects dead or live?

In the last post I discussed the stack. To quickly recap, the stack is a LIFO structure that is fast and managed automatically. It is small and stores only local values such as immediate small integers, along with references (handles) to everything else. Everything else is stored on the heap.

The heap, compared to the stack, is bigger, more freeform in nature, and stores reference types such as objects and strings. Variables that have to span function calls, including globals and variables captured by closures, are also stored here.

The heap is dynamically allocated by the OS. It is self-managed, that is, the running program (in this case V8) requests allocation and de-allocation from the OS. It is divided into multiple sections, or spaces. The spaces that are relevant to us are the new space, old space, code space, map space and large object space. More on these when I talk about garbage collection. For now, however, it is extremely important to understand how the stack and the heap interact while a function or an application is executing.
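To see these spaces in a running process, Node's built-in v8 module can list them. The snippet below is a minimal sketch rather than a diagnostic tool; the exact set of spaces reported depends on the Node and V8 version, so some of the names mentioned above may be missing and others may appear.

```javascript
// Node's built-in v8 module exposes per-space heap statistics.
var v8 = require('v8');

// Each entry describes one heap space; all sizes are in bytes.
var spaces = v8.getHeapSpaceStatistics();

spaces.forEach(function (space) {
    console.log(space.space_name + ': ' +
        space.space_used_size + ' of ' + space.space_size + ' bytes used');
});
```

Running this a few times while an application is under load gives a rough feel for which spaces grow and which stay flat.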

The heap can be imagined as a network of interconnected objects. Consider the example we used in the previous post, drawn a little differently.

Figure 3.1

In the figure above (3.1):

  1. aSmallInt is a SMI (immediate 31-bit integer) stored on the stack.
  2. aFloat is a heap number, so a reference (a handle in V8 speak) is stored on the stack while the number itself is stored on the heap.
  3. anObject is an object literal. A reference to this object is stored on the stack while the object itself is stored on the heap. The object in turn points to three string objects, one for each country code.
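For readers who do not have the previous post at hand, here is a rough reconstruction of the kind of function the figure describes. Only the names test, aSmallInt, aFloat and anObject come from the text; the values and the country code table are assumptions made for illustration.

```javascript
// Hypothetical reconstruction; the actual values in the previous post may differ.
function test() {
    var aSmallInt = 42;      // small integer: held as an immediate value (SMI)
    var aFloat = 3.14;       // non-integer number: a heap number behind a handle
    var anObject = {         // object literal: allocated on the heap
        no: "+47",
        us: "+1",
        uk: "+44"
    };

    // The storage difference is invisible from JavaScript: both numbers report 'number'.
    return [typeof aSmallInt, typeof aFloat, typeof anObject];
}

console.log(test());
```

The distinction between the first two variables exists only inside V8, which is why it has to be drawn rather than inspected from code.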

We can draw this in a different way:

Figure 3.2

This is a very simplistic rendering of an object graph. The local handles (or references) point to the objects on the heap. The fact that the heap is a network of interconnected objects becomes clear with an object graph.

As long as the local variables of the function test exist on the stack, their handles (references) exist and so do the objects on the heap. Such objects in the heap are called live objects. Once the function ends (returns), both handles go out of scope and the objects in the heap are now considered dead. V8 can now reclaim the memory used by these objects. Understanding dead and live objects is important for visualizing how the code affects garbage collection.

Objects, Dead or Live?

An object is considered live if it can be reached through some chain of pointers starting from a root object, that is, if a root or another live object references it. I will discuss two examples to illustrate this.
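This definition of liveness can be sketched as a small graph walk. The helper below is a toy model and not how V8 is implemented: the function name reachableFrom and the sample objects are made up for illustration, and only the object passed in as a root is treated as a root.

```javascript
// Collect every object reachable from the given roots by following property references.
function reachableFrom(roots) {
    var seen = new Set();
    var pending = roots.slice();

    while (pending.length > 0) {
        var current = pending.pop();

        // Primitives are not nodes in this toy graph; also skip objects already visited.
        if (current === null || typeof current !== 'object' || seen.has(current)) {
            continue;
        }
        seen.add(current);

        Object.keys(current).forEach(function (key) {
            pending.push(current[key]);
        });
    }

    return seen;
}

var shared = {name: 'shared'};
var rootObject = {child: {grandchild: shared}};
var orphan = {friend: shared};   // points at a live object, but nothing live points at it

var live = reachableFrom([rootObject]);
console.log(live.has(shared));   // true
console.log(live.has(orphan));   // false
```

Note that orphan references a live object and is still dead in this model: what matters is whether a chain of pointers leads to an object, not where the object itself points.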

Simple Example

Consider the following simple snippet. The function getCountryCode returns the country code if one is found and null otherwise:

var countryCodes = {no: "+47", us: "+1", uk: "+44"};
var cc = '';

function getCountryCode(countryAbbreviation) {
    var ret = null;

    if (countryAbbreviation in countryCodes) {
       ret = countryCodes[countryAbbreviation];
    }

    return ret;
}

cc = getCountryCode('no');

 

In this code snippet:

  • The object literal countryCodes is a global.
  • cc, the variable which will hold the result, is initialized with an empty string and is also a global.
  • countryAbbreviation is a local variable and a string.
  • ret, the return variable, is also a local variable. It starts out as null and refers to a string once a code is found.

1. Before getCountryCode runs, the graph looks like figure 3.4. Note that both global variables, cc and countryCodes, are stored on the heap and pointed to by global handles.

 

Figure 3.4

 

2. Just before getCountryCode returns, the object graph looks like figure 3.5. Now there are two local variables. Both are string objects stored on the heap and pointed to by local handles.

Figure 3.5

 

 

3. Once the function returns, the graph looks like figure 3.6. The local handles ret and countryAbbreviation no longer exist, so the string objects that were reachable only through them are dead and can be cleaned up by the garbage collector. The global variables cc and countryCodes, on the other hand, remain alive and the garbage collector will not touch them.

Figure 3.6

 

In summary:

  1. On entering the module, the countryCodes object and cc, which are global variables, are allocated space on the heap. Both globals are considered root objects. The stack is empty at this point.
  2. On entering getCountryCode, ret and countryAbbreviation, both local variables (not small integers), are stored on the heap with their references on the stack. Both local variables are considered root objects. They are also live objects, as they can be accessed within the function scope. In other words, they are live for the entire execution of the function.
  3. When the function returns, the global variable cc contains the string "+47", the country code for Norway.
  4. Both ret and countryAbbreviation go out of scope, are popped from the stack and discarded.
  5. The references to the heap for ret and countryAbbreviation are removed. The heap objects that were reachable only through ret and countryAbbreviation are considered dead.
  6. Both globals are still live, as they can be used again until the program terminates.
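One rough way to watch short-lived objects come and go is to sample process.memoryUsage().heapUsed around a batch of calls. This is a sketch under assumptions: lookUpManyTimes is a hypothetical helper, getCountryCode is condensed but behaves like the version above, and the printed numbers depend entirely on when the garbage collector happens to run, so no particular output should be expected.

```javascript
var countryCodes = {no: "+47", us: "+1", uk: "+44"};

// Condensed version of the function above: the code if found, null otherwise.
function getCountryCode(countryAbbreviation) {
    return countryAbbreviation in countryCodes ? countryCodes[countryAbbreviation] : null;
}

// Hypothetical helper: every iteration builds a fresh array and string
// that nothing references once the iteration is over.
function lookUpManyTimes(count) {
    var found = 0;

    for (var i = 0; i < count; i++) {
        var abbreviation = ['n', 'o'].join('');
        if (getCountryCode(abbreviation) !== null) {
            found++;
        }
    }

    return found;
}

var before = process.memoryUsage().heapUsed;
var found = lookUpManyTimes(100000);
var after = process.memoryUsage().heapUsed;

console.log('lookups that found a code: ' + found);
console.log('heapUsed before: ' + before + ' bytes, after: ' + after + ' bytes');
```

The difference between the two samples can be positive, negative or close to zero. The point is only that the temporaries created inside the loop are dead as soon as each iteration finishes, while countryCodes stays live throughout.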

 

Example with a Closure

Now consider the same example in the form of a closure, in which processCountryCodes returns a reference to the inner function getCountryCode. Please note that the variable getCountryCode to which the function is assigned is redundant and is used here for clarity. I could just as easily have returned the function directly.

var cc, code;

function processCountryCodes() {
    var countryCodes = {no: "+47", usa: "+1", uk: "+44"};
    var getCountryCode = function (countryAbbreviation) {
        var ret = null;

        if (countryAbbreviation in countryCodes) {
            ret = countryCodes[countryAbbreviation];
        }
        return ret;
    };

    return getCountryCode;
}

cc = processCountryCodes();
code = cc('no');
console.log(code); // +47

1. Before the function processCountryCodes runs, the object graph looks like figure 3.8 below. Both global variables are allocated space on the heap and are referenced via global handles. Simple enough.

Figure 3.8

 

2. Just before the function processCountryCodes returns, the object graph looks like figure 3.9:

Figure 3.9

The inner function getCountryCode is a closure and closes over the object countryCodes. In other words, before processCountryCodes returns it needs to save a reference to the countryCodes object, because once processCountryCodes returns, its stack frame is wiped clean. To retain the reference, V8 automatically creates an internal object on the heap called the “Context object” and stores countryCodes in it. For the same reason V8 also allocates space for the inner function on the heap; this function object (an instance of the JSFunction class in V8) keeps a reference to the context. Please remember that V8 creates the context object when it enters the outer function processCountryCodes.

3. After the function processCountryCodes returns, the object graph looks like figure 3.10. The global variable cc now holds a reference to the inner function getCountryCode.

Figure 3.10

 

4. After the inner function has executed, the graph looks like figure 3.11. The global variable cc still holds a reference to the inner function getCountryCode. The global variable code now contains the result “+47”.

Figure 3.11

 

Please note that I have not included the objects for the inner function because they are the same as in the earlier example.

In summary:

  1. On entering the module, cc and code, which are globals, are allocated space on the heap. The stack is empty at this point. cc and code are now considered live root objects.
  2. On entering processCountryCodes, V8 builds a special “Context” object on the heap and stores the countryCodes object in it.
  3. Before processCountryCodes returns, the local variable getCountryCode holds a reference to the inner function, which is allocated space on the heap.
  4. Once processCountryCodes returns, the global variable cc holds the reference to the inner function.
  5. The global variable code holds the result “+47” once the inner function is executed via the global variable cc.

The key point to note here is that even though both processCountryCodes and getCountryCode have been executed, the heap structure remains intact. The reason is that both global variables keep holding references to the objects in the heap until the program terminates.
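The retention described above can be observed from plain JavaScript. The following is a variation on the example, not the original code: processCountryCodes here returns two hypothetical functions, get and add, to show that functions created in the same call share one captured table while a second call gets its own.

```javascript
function processCountryCodes() {
    var countryCodes = {no: "+47", usa: "+1", uk: "+44"};

    return {
        // Both functions close over the same countryCodes variable.
        get: function (countryAbbreviation) {
            return countryAbbreviation in countryCodes ? countryCodes[countryAbbreviation] : null;
        },
        add: function (countryAbbreviation, code) {
            countryCodes[countryAbbreviation] = code;
        }
    };
}

var first = processCountryCodes();
var second = processCountryCodes();

// processCountryCodes has returned twice, yet each table is still alive.
first.add('se', '+46');
console.log(first.get('se'));    // +46
console.log(second.get('se'));   // null

// Dropping the only reference makes the first pair of closures, and the table
// they captured, unreachable. When they are actually collected is up to V8.
first = null;
```

As long as a global such as first or second points at a closure, everything that closure captured stays live. Releasing the reference is what allows the garbage collector to reclaim it.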

I hope that you now have the necessary tools to visualize your code in terms of an object graph. In the next post I will talk about garbage collection as it relates to the heap and build on the information in this post.

 
