2013-02-03

JavaScript: fixing categorization

Categorizing values in JavaScript is quirky. This blog post explains the quirks and one approach to fixing them. To understand everything, it helps to be familiar with how values are categorized in JavaScript. If you aren’t, consult [1].

Problems

Categorizing values in JavaScript is problematic in several ways.

Three different mechanisms for basic tasks

You need three different mechanisms, depending on what you want to check:
  • Distinguishing primitives from each other and from objects: use typeof [1].
  • Determining which constructor an object is an instance of: use instanceof [1].
  • Finding out whether a value is an array, even if it comes from another frame: use Array.isArray() [1].
JavaScript programmers shouldn’t have to know how these kinds of values differ just to categorize them. The language normally does a good job of hiding the differences, and people tend to handle the different kinds correctly by intuition.
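As a minimal sketch, the three checks listed above can be put side by side. The helper name describe() and the sample values are made up for illustration; they are not part of any library.

```javascript
// Hypothetical helper: reports what each of the three mechanisms
// says about a given value.
function describe(value) {
    return {
        typeofResult: typeof value,     // primitives vs. objects
        isDate: value instanceof Date,  // which constructor?
        isArray: Array.isArray(value)   // arrays, even from another frame
    };
}

console.log(describe('abc'));
console.log(describe(new Date()));
console.log(describe([]));
```

No single mechanism answers all three questions, which is why all of them show up in the result.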

Objects that are not instances of Object

Some objects are not instances of Object:
    > Object.create(null) instanceof Object
    false
    > Object.prototype instanceof Object
    false
You can use typeof to check whether a value is an object, but that’s a bit complicated (see below).
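The reason lies in the prototype chain: instanceof checks whether Object.prototype appears in the chain of the left-hand value, and for both objects above it does not. A short sketch (the variable name bare is illustrative):

```javascript
// instanceof walks the prototype chain of its left-hand side and
// looks for the .prototype of its right-hand side.
const bare = Object.create(null);

// `bare` has no prototype at all, so its chain is empty.
console.log(Object.getPrototypeOf(bare));             // null

// Object.prototype is the root of the chain; it is not its own prototype.
console.log(Object.getPrototypeOf(Object.prototype)); // null

// An ordinary object literal does have Object.prototype in its chain.
console.log(Object.getPrototypeOf({}) === Object.prototype); // true
console.log({} instanceof Object);                           // true
```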

typeof is quirky

The typeof operator has several well-known quirks that can’t be fixed, because that would break existing code:
  • typeof null is 'object'.
  • The type of objects is 'object', except for functions, whose type is 'function'.
Hence, using typeof to check whether a value is an object is a bit complicated:
    function isObject(value) {
        return (typeof value === 'object' && value !== null) ||
               (typeof value === 'function');
    }
The function in use:
    > isObject(Object.create(null))
    true
    > isObject(Object.prototype)
    true
    > isObject(null)
    false
    > isObject('xyz')
    false

The future: more value objects

In the future, we’ll probably get more kinds of value objects (beyond current primitives): at least large integers, possibly even user-defined value object types. Hence, categorization will become even more tricky. For reasons of backward compatibility, we won’t be able to change typeof to handle these cases.

One possible solution

Ideally, we would extend the instanceof operator so that it also takes over all of typeof’s duties. As we can’t do that, we’ll instead implement the following function:
    function isInstance(value, Type) {
        ...
    }
isInstance() will be explained in two steps. Its complete code is available as a gist.

Step 1: an extensible protocol

Given the function call
    isInstance(value, Type)
the protocol works as follows: if Type has a method hasInstance, return Type.hasInstance(value). Otherwise, return value instanceof Type.
    function isInstance(value, Type) {
        if (typeof Type.hasInstance === 'function') {
            return Type.hasInstance(value);
        } else {
            return value instanceof Type;
        }
    }
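
To illustrate how a type can opt into this protocol, here is a small sketch. EvenNumber is a hypothetical type invented for this example; isInstance() is repeated from above so that the snippet runs on its own.

```javascript
// isInstance() as defined in step 1.
function isInstance(value, Type) {
    if (typeof Type.hasInstance === 'function') {
        return Type.hasInstance(value);
    } else {
        return value instanceof Type;
    }
}

// Hypothetical type: not a constructor, just an object
// with a hasInstance() method.
const EvenNumber = {
    hasInstance: function (value) {
        return typeof value === 'number' && value % 2 === 0;
    }
};

console.log(isInstance(4, EvenNumber));     // true
console.log(isInstance(3, EvenNumber));     // false
// Date has no hasInstance(), so the call falls back to instanceof.
console.log(isInstance(new Date(), Date));  // true
```

Note that the protocol lets primitives be "instances" of a type, which plain instanceof can’t express here.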

Step 2: more types

We then complete JavaScript’s type hierarchy with more entries:
    ValueType
        Primitive
            PrimitiveBoolean
            PrimitiveNumber
            PrimitiveString
        (Future: user-defined value types)
    ReferenceType
        (Objects that are not an instance of Object)
        Object
            (Subtypes of Object)
The additions are only used for categorization and all have a method hasInstance. For example:
    isInstance.PrimitiveBoolean = {
        hasInstance: function (value) {
            return typeof value === 'boolean';
        }
    };
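
The remaining entries could be defined along the same lines. The following is only a sketch under assumptions: the definitions of Primitive and ReferenceType below are guesses based on the hierarchy above and may differ from the code in the gist.

```javascript
// isInstance() from step 1, repeated so that the sketch is self-contained.
function isInstance(value, Type) {
    if (typeof Type.hasInstance === 'function') {
        return Type.hasInstance(value);
    }
    return value instanceof Type;
}

isInstance.PrimitiveString = {
    hasInstance: function (value) {
        return typeof value === 'string';
    }
};

// Assumption: Primitive covers exactly the three subtypes
// listed in the hierarchy (boolean, number, string).
isInstance.Primitive = {
    hasInstance: function (value) {
        var t = typeof value;
        return t === 'boolean' || t === 'number' || t === 'string';
    }
};

// Assumption: ReferenceType covers every object, including functions
// and objects that are not instances of Object.
isInstance.ReferenceType = {
    hasInstance: function (value) {
        return (typeof value === 'object' && value !== null) ||
               (typeof value === 'function');
    }
};

console.log(isInstance('abc', isInstance.Primitive));                   // true
console.log(isInstance(Object.create(null), isInstance.ReferenceType)); // true
console.log(isInstance(null, isInstance.ReferenceType));                // false
```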

Examples

In action, isInstance looks like this:
    > isInstance(Object.create(null), Object)
    false
    > isInstance(Object.create(null), isInstance.ReferenceType)
    true
    > isInstance('abc', isInstance.PrimitiveString)
    true

Open issue

We haven’t solved the problem of objects coming from other frames. It does not look like there will ever be a simple solution for this.
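
To make the problem concrete, the following sketch simulates a second frame with Node.js’s built-in vm module. That is an assumption made for this example; in a browser, the foreign object would come from an iframe instead. Each context has its own copy of Array, so instanceof (and therefore isInstance() with its fallback) reports false, while Array.isArray() still works.

```javascript
const vm = require('vm');

// Create an array inside a separate context, which has its own global
// object and therefore its own Array constructor.
const foreignArray = vm.runInNewContext('[1, 2, 3]');

console.log(foreignArray instanceof Array);  // false
console.log(Array.isArray(foreignArray));    // true
```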

ECMAScript 6

The current draft of the ECMAScript 6 specification allows extending instanceof via the well-known symbol @@hasInstance, in a manner similar to what has been described above. However, the left-hand side of instanceof must still be an object.
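
For comparison, here is a sketch of the hook in engines that implement the finalized ES2015 specification, where @@hasInstance is exposed as Symbol.hasInstance. The name PrimitiveBoolean mirrors the earlier example and is illustrative. One difference from the draft described above: in the final specification, the hook also receives primitive left-hand operands.

```javascript
// An object (not a constructor) that customizes instanceof
// via the well-known symbol.
const PrimitiveBoolean = {
    [Symbol.hasInstance](value) {
        return typeof value === 'boolean';
    }
};

console.log(true instanceof PrimitiveBoolean);               // true
console.log(new Boolean(true) instanceof PrimitiveBoolean);  // false
console.log('abc' instanceof PrimitiveBoolean);              // false
```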

Reference

  1. Categorizing values in JavaScript

4 comments:

Claude said...

"Categorising" is a rather vague notion, and I'm not sure that there exists a universal notion which is right in all cases. instanceof and typeof each have their own precise and distinct notion of "categorisation".

It is a sad fact that typeof has an unfixable quirk for null (and a-posteriori justifications of typeof null === 'object' don't help), and is very limited for non-function objects. You can define a variation of typeof which corrects both issues by using Object.prototype.toString:

    function brandof(o) {
        var t = /^\[object (.*)\]$/.exec(Object.prototype.toString.call(o));
        return t && t[1];
    }

but its semantics are really different from those of instanceof (it doesn't check the prototype), and it still can't work with subtypes. It is nevertheless useful for distinguishing things like built-in dates and arrays in cases where duck typing prevents me from using the semantics of instanceof.

BTW, for checking if something is an object, you can use a direct method, without typeof:

    function isObject(value) {
        return value === Object(value);
    }

Axel Rauschmayer said...

“‘Categorising’ is a rather vague notion, and I'm not sure that there exists a universal notion which is right in all cases.”
I agree wholeheartedly, a keen observation.

Improving typeof: I tried something similar [1], but was never entirely happy with it. I’d throw an exception if the regex doesn’t match.



I never particularly liked typeof (even ignoring the quirks). Do you have a use case for typeof that isn’t covered by what has been described here?


[1] http://www.2ality.com/2011/11/improving-typeof.html

Joe Morgan said...

Kangax describes Object.prototype.toString.call as a solution for the cross frame issue here:

http://perfectionkills.com/instanceof-considered-harmful-or-how-to-write-a-robust-isarray/



Are there cases where this doesn't work?

Axel Rauschmayer said...

I’m using the same technique here: [1]. However, it’s yet another system of categorization that shouldn’t even be accessible from the language. And you don’t encounter these cross-frame objects very often.

[1] http://www.2ality.com/2011/11/improving-typeof.html
