How JavaScript works, memory management + how to handle 4 common memory leaks

[ ]

Alexander Zlatkov has written a nice article how does the memory work in Javascript. As we know, Javascript runs in the virtual machine as Java, thus user doesn’t need to explicitly handle the memory allocation and release.

garbage collection

Javascript embeds a piece of software called garbage collector which job is to track memory allocation and use in order to find when a piece of allocated memory is not needed any longer in which case, it will automatically free it.

As of 2012, all modern browsers ship a mark-and-sweep garbage-collector. All improvements made in the field of JavaScript garbage collection (generational/incremental/concurrent/parallel garbage collection) over the last years are implementation improvements of this algorithm (mark-and-sweep), but not improvements over the garbage collection algorithm itself, nor its goal of deciding whether an object is reachable or not.

In this article, you can read in a greater detail about tracing garbage collection that also covers mark-and-sweep along with its optimizations.

Although Garbage Collectors are convenient they come with their own set of trade-offs. One of them is non-determinism. In other words, GCs are unpredictable. You can’t really tell when a collection will be performed. This means that in some cases programs use more memory that it’s actually required. In other cases, short-pauses may be noticeable in particularly sensitive applications. Although non-determinism means one cannot be certain when a collection will be performed, most GC implementations share the common pattern of doing collection passes during allocation. If no allocations are performed, most GCs stay idle. Consider the following scenario:

  • A sizable set of allocations is performed.
  • Most of these elements (or all of them) are marked as unreachable (suppose we null a reference pointing to a cache we no longer need).
  • No further allocations are performed.

In this scenario, most GCs will not run any further collection passes. In other words, even though there are unreachable references available for collection, these are not claimed by the collector. These are not strictly leaks but still, result in higher-than-usual memory usage.

memory leaks

In essence, memory leaks can be defined as memory that is not required by the application anymore but for some reason is not returned to the operating system or the pool of free memory.

The four types of common JavaScript leaks

Global variables

JavaScript handles undeclared variables in an interesting way: a reference to an undeclared variable creates a new variable inside the global object. In the case of browsers, the global object is window.

In other words:

function foo(arg) {
    bar = "some text";
}

is the equivalent of:

function foo(arg) {
    window.bar = "some text";
}

If bar was supposed to hold a reference to a variable only inside the scope of the foo function and you forget to use var to declare it, an unexpected global variable is created.

Another way in which an accidental global variable can be created is through this:

function foo() {
    this.var1 = "potential accidental global";
}
// Foo called on its own, this points to the global object (window)
// rather than being undefined.
foo();

To prevent these mistakes from happening, add use strict; at the beginning of your JavaScript files. This enables a stricter mode of parsing JavaScript that prevents accidental global variables. Learn more about this mode of JavaScript execution.

Even though we talk about unsuspected globals, it’s still the case that much code is filled with explicit global variables. These are by definition non-collectible (unless assigned as null or reassigned). In particular, global variables that are used to temporarily store and process big amounts of information are of concern. If you must use a global variable to store lots of data, make sure to assign it as null or reassign it after you are done with it.

Timers or callbacks that are forgotten

The use of setInterval is quite common in JavaScript. Most libraries, that provide observers and other facilities that take callbacks, take care of making any references to the callback unreachable after their own instances become unreachable as well. In the case of setInterval, however, code like this is quite common:

var serverData = loadData();
setInterval(function() {
    var renderer = document.getElementById('renderer');
    if(renderer) {
        renderer.innerHTML = JSON.stringify(serverData);
    }
}, 5000); //This will be executed every ~5 seconds.

This example illustrates what can happen with timers: timers that make reference to nodes or data that is no longer required. The object represented by renderer may be removed in the future, making the whole block inside the interval handler unnecessary. However, the handler cannot be collected as the interval is still active, (the interval needs to be stopped for this to happen). If the interval handler cannot be collected, its dependencies cannot be collected either. This means that serverData, which presumably stores quite a big amount of data, cannot be collected either.

In the case of observers, it is important to make explicit calls to remove them once they are not needed anymore (or the associated object is about to be made unreachable).

In the past, this used to be particularly important as certain browsers (the good old IE 6) were not able to manage well cyclic references (see below for more info). Nowadays, most browsers can and will collect observer handlers once the observed object becomes unreachable, even if the listener is not explicitly removed. It remains good practice, however, to explicitly remove these observers before the object is disposed of. For instance:

var element = document.getElementById('launch-button');
var counter = 0;
function onClick(event) {
   counter++;
   element.innerHtml = 'text ' + counter;
}
element.addEventListener('click', onClick);
// Do stuff
element.removeEventListener('click', onClick);
element.parentNode.removeChild(element);
// Now when element goes out of scope,
// both element and onClick will be collected even in old browsers // that don't handle cycles well.

Nowadays, modern browsers (including Internet Explorer and Microsoft Edge) use modern garbage collection algorithms that can detect these cycles and deal with them correctly. In other words, it’s not strictly necessary to call removeEventListener before making a node unreachable.

Frameworks and libraries such as jQuery do remove listeners before disposing of a node (when using their specific APIs for that). This is handled internally by the libraries which also make sure that no leaks are produced, even when running under problematic browsers such as … yeah, IE 6.

Closures

A key aspect of JavaScript development are closures: an inner function that has access to the outer (enclosing) function’s variables. Due to the implementation details of the JavaScript runtime, it is possible to leak memory in the following way:

var theThing = null;
var replaceThing = function () {
  var originalThing = theThing;
  var unused = function () {
    if (originalThing) // a reference to 'originalThing'
      console.log("hi");
  };
  theThing = {
    longStr: new Array(1000000).join('*'),
    someMethod: function () {
      console.log("message");
    }
  };
};
setInterval(replaceThing, 1000);

This snippet does one thing: every time replaceThing is called, theThing gets a new object which contains a big array and a new closure (someMethod). At the same time, the variable unused holds a closure that has a reference to originalThing (theThing from the previous call to replaceThing). Already somewhat confusing, huh? The important thing is that once a scope is created for closures that are in the same parent scope, that scope is shared.

In this case, the scope created for the closure someMethod is shared with unused. unused has a reference to originalThing. Even though unused is never used, someMethod can be used through theThing outside of the scope of replaceThing (e.g. somewhere globally). And as someMethod shares the closure scope with unused, the reference unused has to originalThing forces it to stay active (the whole shared scope between the two closures). This prevents its collection.

When this snippet is run repeatedly a steady increase in memory usage can be observed. This does not get smaller when the GC runs. In essence, a linked list of closures is created (with its root in the form of the theThing variable), and each of these closures’ scopes carries an indirect reference to the big array, resulting in a sizable leak.

This issue was found by the Meteor team and they have a great article that describes the issue in great detail.

Out of DOM references

Sometimes it may be useful to store DOM nodes inside data structures. Suppose you want to rapidly update the contents of several rows in a table. It may make sense to store a reference to each DOM row in a dictionary or an array. When this happens, two references to the same DOM element are kept: one in the DOM tree and the other in the dictionary. If at some point in the future you decide to remove these rows, you need to make both references unreachable.

var elements = {
    button: document.getElementById('button'),
    image: document.getElementById('image')
};
function doStuff() {
    image.src = 'http://example.com/image_name.png';
}
function removeImage() {
    // The image is a direct child of the body element.
    document.body.removeChild(document.getElementById('image'));
    // At this point, we still have a reference to #button in the
    //global elements object. In other words, the button element is
    //still in memory and cannot be collected by the GC.
}

There’s an additional consideration that has to be taken into account when it comes to references to inner or leaf nodes inside a DOM tree. Say you keep a reference to a specific cell of a table (a <td> tag) in your JavaScript code. One day you decide to remove the table from the DOM but keep the reference to that cell. Intuitively one may suppose the GC will collect everything but that cell. In reality, this won’t happen: the cell is a child node of that table and children keep references to their parents. That is, the reference to the table cell from JavaScript code causes the whole table to stay in memory. Consider this carefully when keeping references to DOM elements.

We at SessionStack try to follow these best practices in writing code that handles memory allocation properly, and here’s why: Once you integrate SessionStack into your production web app, it starts recording everything: all DOM changes, user interactions, JavaScript exceptions, stack traces, failed network requests, debug messages, etc. With SessionStack, you replay issues in your web apps as videos and see everything that happened to your user. And all of this has to take place with no performance impact for your web app.

Since the user can reload the page or navigate your app, all observers, interceptors, variable allocations, etc. have to be handled properly, so they don’t cause any memory leaks or don’t increase the memory consumption of the web app in which we are integrated.

Written on September 13, 2017