A closer look at _.extend and copying properties

[2012-08-13] underscorejs, dev, javascript, jslang
Underscore.js is a useful complement to JavaScript’s sparse standard library. This blog post takes a closer look at its extend function. Along the way, it will give a detailed explanation of how to best copy properties in JavaScript. This post requires basic knowledge of JavaScript inheritance and prototype chains (which you can brush up on at [1]), but should be mostly self-explanatory.

_.extend

extend has the signature
    _.extend(destination, source1, ..., sourcen) 
There must be at least one source. extend copies the properties of each source to the destination, starting with source1. To understand where this function can be problematic, let’s take a look at its current source code:
    _.extend = function(obj) {
        each(slice.call(arguments, 1), function(source) {
            for (var prop in source) {
                obj[prop] = source[prop];
            }
        });
        return obj;
    };
The following sections explain the weaknesses of this function. Note, however, that they only exist because Underscore.js must be compatible with ECMAScript 3. We examine how you can do things differently if you can afford to rely on ECMAScript 5.
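To experiment with this behavior without loading Underscore.js, the function can be approximated as follows. This is a minimal sketch, not Underscore's actual code: the name extendSketch is hypothetical, and a plain loop over arguments stands in for the library's internal each and slice helpers.

```javascript
// Hypothetical stand-in for _.extend, written without Underscore's helpers.
function extendSketch(obj) {
    for (var i = 1; i < arguments.length; i++) {
        var source = arguments[i];
        // for-in visits enumerable properties, own and inherited
        for (var prop in source) {
            obj[prop] = source[prop];
        }
    }
    return obj;
}

var dest = { a: 1 };
var result = extendSketch(dest, { b: 2 }, { b: 3, c: 4 });
console.log(result === dest); // true: the destination itself is returned
console.log(result.b);        // 3: later sources overwrite earlier ones
```

Because the sources are processed from left to right, a property that appears in several sources ends up with the value from the last one.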

Problem: for-in

The for-in loop should generally be avoided [2]. Alas, under ECMAScript 3, it is the only way of iterating over the properties of an object. When it does so, it exhibits two problematic traits: First, it iterates over all properties, including inherited ones. Second, it ignores non-enumerable properties.

for-in iterates over all properties, including inherited ones

for-in iterating over all properties means that extend will also copy all of those properties – as opposed to only own (non-inherited) properties. To see why that can be problematic, let’s look at the following constructor for colors:
    function Color(name) {
        this.name = name;
    }
    Color.prototype.toString = function () {
        return "Color "+this.name;
    };
If you create an instance of Color and copy its properties to a fresh object then that object will also contain the toString method.
    > var c = new Color("red");
    > var obj = _.extend({}, c);
    > obj
    { name: 'red', toString: [Function] }
That is probably not what you wanted. The solution is to make toString non-enumerable. Then for-in will ignore it. Ironically, only ECMAScript 5 allows you to do that, via Object.defineProperties. Let’s use that function to add a non-enumerable toString to Color.prototype.
    Object.defineProperties(
        Color.prototype,
        {
            toString: {
                value: function () {
                    return "Color "+this.name;
                },
                enumerable: false
            }
        });
Now we only copy property name.
    > var c = new Color("red");
    > var obj = _.extend({}, c);
    > obj
    { name: 'red' }

Why built-in methods are non-enumerable

What we have just observed explains why all built-in methods are non-enumerable: they should be ignored by for-in. Take, for example, Object.prototype, which is in the prototype chain of most objects:
    > Object.prototype.isPrototypeOf({})
    true
    > Object.prototype.isPrototypeOf([])
    true
None of the properties of Object.prototype show up when you copy an object, because none of them are enumerable. You can verify that by using Object.keys, which sticks to own properties that are enumerable [3]:
    > Object.keys(Object.prototype)
    []
In contrast, Object.getOwnPropertyNames returns all own property names:
    > Object.getOwnPropertyNames(Object.prototype)
    [ '__defineSetter__',
      'propertyIsEnumerable',
      'toLocaleString',
      'isPrototypeOf',
      '__lookupGetter__',
      'valueOf',
      'hasOwnProperty',
      'toString',
      '__defineGetter__',
      'constructor',
      '__lookupSetter__' ]
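The same contrast can be observed on an object of your own. The following sketch (the names proto and obj are made up for illustration) sets up one property of each kind and compares for-in, Object.keys and Object.getOwnPropertyNames.

```javascript
var proto = { inherited: 1 };              // enumerable, will be inherited
var obj = Object.create(proto);
obj.visible = 2;                           // enumerable own property
Object.defineProperty(obj, "hidden", {     // non-enumerable own property
    value: 3,
    enumerable: false
});

var viaForIn = [];
for (var key in obj) {
    viaForIn.push(key);
}

console.log(viaForIn);                        // visible and inherited
console.log(Object.keys(obj));                // only visible
console.log(Object.getOwnPropertyNames(obj)); // visible and hidden
```

Only Object.getOwnPropertyNames reports the non-enumerable property, and only for-in reports the inherited one.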

for-in only iterates over enumerable properties

for-in ignoring non-enumerable properties was acceptable for inherited properties. But it also prevents extend from copying non-enumerable own properties. The following code illustrates this. obj1.foo is a non-enumerable own property.
    > var obj1 = Object.defineProperty({}, "foo",
                     { value: 123, enumerable: false });
    > var obj2 = _.extend({}, obj1)

    > obj2.foo
    undefined
    > obj1.foo
    123
We are faced with an interesting design decision for extend: Should non-enumerable properties be copied? If yes, then you can’t include inherited properties, because too many properties would be copied (usually at least all of Object.prototype’s properties).
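To get a feeling for how many properties that would be, one can walk the prototype chain and collect every property name, enumerable or not. The helper allPropertyNames below is a hypothetical illustration, not part of Underscore.js or ECMAScript 5.

```javascript
// Hypothetical helper: names of all own and inherited properties,
// including non-enumerable ones.
function allPropertyNames(obj) {
    var names = [];
    while (obj !== null) {
        Object.getOwnPropertyNames(obj).forEach(function (name) {
            if (names.indexOf(name) < 0) {
                names.push(name);
            }
        });
        obj = Object.getPrototypeOf(obj);
    }
    return names;
}

var point = { x: 1, y: 2 };
var names = allPropertyNames(point);
console.log(Object.keys(point));              // [ 'x', 'y' ]
console.log(names.indexOf("toString") >= 0);  // true: inherited from Object.prototype
console.log(names.length > 2);                // true: more than just x and y
```

Even for an object literal with two properties, the result includes every method of Object.prototype, which is rarely what a copy operation should transfer.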

Problem: assignment instead of definition

extend uses assignment to create new properties in the destination object. That is problematic, as will be explained in the following two sections. ECMAScript 5 gives you an additional way to create properties, definition [4], via methods such as Object.defineProperties.

Assignment invokes setters

If the source has a property that has the same name as a setter of the destination object, then extend will invoke the destination setter and the source property will not be copied. As an example, consider the following objects:
    var proto = {
        get foo() {
            return "a";
        },
        set foo(value) {
            console.log("Setter");
        }
    };
    var dest = Object.create(proto);
    var source = {
        foo: "b"
    };
dest is an empty object whose prototype is proto. extend will call the setter and not copy source.foo:
    > _.extend(dest, source);
    Setter
    {}
    > dest.foo
    'a'
Object.defineProperty does not exhibit this problem:
    > Object.defineProperty(dest, "foo", { value: source.foo });
    {}
    > dest.foo
    'b'
Now dest.foo is an own property that overrides the setter in proto. If you want dest.foo to be exactly like source.foo (including property attributes such as enumerability) then you should retrieve its property descriptor, not just its value:
    Object.defineProperty(dest, "foo",
        Object.getOwnPropertyDescriptor(source, "foo"))
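The difference between the two calls shows up in the property attributes. The following sketch replays the example in a self-contained form; the variables setterCalls, destAssign, destA and destB are introduced only for this illustration. Attributes missing from a descriptor passed to Object.defineProperty default to false when a new property is created, so the value-only variant produces a property that is read-only, non-enumerable and non-configurable.

```javascript
var setterCalls = 0;
var proto = {
    get foo() {
        return "a";
    },
    set foo(value) {
        setterCalls++;   // record that the inherited setter ran
    }
};
var source = { foo: "b" };

// Assignment: triggers the inherited setter, creates no own property
var destAssign = Object.create(proto);
destAssign.foo = source.foo;
console.log(setterCalls);                       // 1
console.log(destAssign.hasOwnProperty("foo"));  // false

// Definition with a value-only descriptor
var destA = Object.create(proto);
Object.defineProperty(destA, "foo", { value: source.foo });
console.log(Object.getOwnPropertyDescriptor(destA, "foo"));
// value 'b'; writable, enumerable and configurable are all false

// Definition with the source's full descriptor
var destB = Object.create(proto);
Object.defineProperty(destB, "foo",
    Object.getOwnPropertyDescriptor(source, "foo"));
console.log(Object.getOwnPropertyDescriptor(destB, "foo"));
// value 'b'; writable, enumerable and configurable are all true
```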

Read-only prototype properties prevent assignment

If there is a read-only property in one of the prototypes of the destination object, then extend will not copy a property with the same name from the source [4]. Consider, for example, the following objects:
    "use strict";  // enable strict mode
    var proto = Object.defineProperties({}, {
        foo: {
            value: "a",
            writable: false,
            configurable: true
        }
    });
    var dest = Object.create(proto);
    var source = {
        foo: "b"
    };
extend does not let us copy source.foo:
    > _.extend(dest, source);
    TypeError: dest.foo is read-only
Note: not all JavaScript engines prevent such assignments (yet), but Firefox already does. The exception is only thrown in strict mode [5]. Otherwise, the copying fails silently. We can still use definition to copy source.foo:
    > Object.defineProperty(dest, "foo",
             Object.getOwnPropertyDescriptor(source, "foo"))
    > dest.foo
    'b'
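The strict-mode behavior can be checked without Underscore.js by performing the assignment directly, which is what extend does internally. In the sketch below, the function name assignStrict is hypothetical, and strict mode is enabled only inside that function. The result assumes an engine that implements this part of ECMAScript 5.

```javascript
var proto = Object.defineProperty({}, "foo", {
    value: "a",
    writable: false,
    configurable: true
});
var source = { foo: "b" };

// Hypothetical helper: try the assignment in strict mode and report the outcome
function assignStrict(dest) {
    "use strict";
    try {
        dest.foo = source.foo;   // blocked by the inherited read-only property
        return "no error";
    } catch (e) {
        return e instanceof TypeError ? "TypeError" : "other error";
    }
}

var blocked = Object.create(proto);
console.log(assignStrict(blocked));            // 'TypeError'
console.log(blocked.hasOwnProperty("foo"));    // false: nothing was copied

// Definition is not affected by the inherited read-only property
var dest = Object.create(proto);
Object.defineProperty(dest, "foo",
    Object.getOwnPropertyDescriptor(source, "foo"));
console.log(dest.foo);                         // 'b'
```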

A better solution

My preferences for an extend-like function are to only copy own properties and to include non-enumerable properties. I prefer the former, because the first object in a prototype chain is where you usually find all of the state, even if subtyping is involved. I prefer the latter, because if you copy, you should copy everything. The following sections show how to achieve those preferences.

Ignoring inherited properties under ECMAScript 3

Under ECMAScript 3, there is no way to copy non-enumerable properties, but at least we can avoid copying inherited properties. To do so, we use Object.prototype.hasOwnProperty.
    function update(obj) {
        _.each(Array.prototype.slice.call(arguments, 1), function(source) {
            for (var prop in source) {
                if (Object.prototype.hasOwnProperty.call(source, prop)) {
                    obj[prop] = source[prop];
                }
            }
        });
        return obj;
    }    
The name extend is used by Underscore for historical reasons (the Prototype framework had a similar function). But that word is usually associated with inheritance. Hence, update is a better choice. We could also invoke hasOwnProperty via source, but that can cause trouble [6]. As a performance optimization, you can store hasOwnProperty in a variable beforehand (external to update), to avoid retrieving it each time via Object.prototype.
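Both remarks can be illustrated with a small sketch that uses only ECMAScript 3 features. The names hasOwn, updateEs3 and Base are made up for this example, and a plain loop over arguments stands in for Underscore's each.

```javascript
// Retrieved once, outside the function, instead of on every iteration
var hasOwn = Object.prototype.hasOwnProperty;

function updateEs3(obj) {
    for (var i = 1; i < arguments.length; i++) {
        var source = arguments[i];
        for (var prop in source) {
            if (hasOwn.call(source, prop)) {
                obj[prop] = source[prop];
            }
        }
    }
    return obj;
}

// A source with an inherited property and an own property that shadows
// hasOwnProperty. Calling source.hasOwnProperty(prop) would fail with a
// TypeError here, because the own property is a string, not a function.
function Base() {}
Base.prototype.inherited = true;
var source = new Base();
source.own = 1;
source.hasOwnProperty = "oops";

var copy = updateEs3({}, source);
console.log(copy.own);                     // 1
console.log(hasOwn.call(copy, "inherited")); // false: inherited property skipped
```

Going through Object.prototype (or a cached reference to its method) keeps the check working no matter which properties the source happens to have.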

A complete solution for ECMAScript 5

A more complete solution relies on ECMAScript 5 and uses definition instead of assignment:
    function update(target) {
        var sources = [].slice.call(arguments, 1);
        sources.forEach(function (source) {
            Object.getOwnPropertyNames(source).forEach(function(propName) {
                Object.defineProperty(target, propName,
                    Object.getOwnPropertyDescriptor(source, propName));
            });
        });
        return target;
    }
We have also used ECMAScript 5 functions instead of Underscore functions. The above function is therefore completely independent of that library. update gives you a simple way to clone an object:
    function clone(obj) {
        var copy = Object.create(Object.getPrototypeOf(obj));
        update(copy, obj);
        return copy;
    }
Note that this kind of cloning works for virtually all instances of user-defined types, but fails for some instances of built-in types [7].
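A short usage sketch shows what the ECMAScript 5 version preserves and where cloning breaks down. The functions update and clone are repeated so that the example runs on its own; the Point constructor and its properties are invented for illustration.

```javascript
function update(target) {
    var sources = [].slice.call(arguments, 1);
    sources.forEach(function (source) {
        Object.getOwnPropertyNames(source).forEach(function (propName) {
            Object.defineProperty(target, propName,
                Object.getOwnPropertyDescriptor(source, propName));
        });
    });
    return target;
}
function clone(obj) {
    var copy = Object.create(Object.getPrototypeOf(obj));
    update(copy, obj);
    return copy;
}

// Hypothetical type with a non-enumerable property and a getter
function Point(x, y) {
    this.x = x;
    this.y = y;
    Object.defineProperty(this, "id", { value: 7, enumerable: false });
    Object.defineProperty(this, "sum", {
        get: function () { return this.x + this.y; },
        enumerable: true,
        configurable: true
    });
}

var p = new Point(1, 2);
var q = clone(p);
q.x = 10;

console.log(q instanceof Point);  // true: same prototype
console.log(q.id);                // 7: non-enumerable property was copied
console.log(q.sum);               // 12: the getter was copied, not its value
console.log(p.x);                 // 1: the original is unaffected

// Built-in types with internal state are a different matter
var d = clone(new Date());
try {
    d.getTime();
} catch (e) {
    console.log(e instanceof TypeError);  // true: d is not a real Date
}
```

The clone is shallow: property values that are objects are shared between the original and the copy.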

References

  1. JavaScript inheritance by example
  2. Iterating over arrays and objects in JavaScript
  3. JavaScript properties: inheritance and enumerability [and how it affects operations such as Object.keys()]
  4. Properties in JavaScript: definition versus assignment [also explains property attributes and property descriptors]
  5. JavaScript’s strict mode: a summary
  6. The pitfalls of using objects as maps in JavaScript
  7. Subtyping JavaScript built-ins